Add New Sources

Create custom connectors for any data source in minutes

Missing a connector for your favorite app or internal system? No problem! Airweave’s open-source architecture makes adding new sources straightforward, and it usually takes just a few minutes to get up and running.

🚀 Coming Soon: Connector SDK

We’re building a standalone Airweave CLI + Connector SDK that will make this even easier. For now, you’ll need to fork our repository, but the process is simple and well-documented.

Why Add Custom Sources?

Proprietary Systems

Connect your internal databases, APIs, and custom applications to Airweave

Unsupported Apps

Add connectors for apps we don’t support yet, and contribute them back to the community

Specialized Workflows

Create custom data processing and transformation logic for your specific needs

Step-by-Step Implementation

Here’s how to build a complete source connector, using the Studio Ghibli API as our example:

1

Fork the Repository

Get the code and set up your development environment.

Clone and setup
# Clone your fork
git clone https://github.com/YOUR_USERNAME/airweave.git
cd airweave

# Start the development environment
./start.sh
2

Create Authentication Schema

Define what credentials your source needs in backend/airweave/platform/configs/auth.py.

configs/auth.py
# Add to backend/airweave/platform/configs/auth.py

class GhibliAuthConfig(AuthConfig):
    """Studio Ghibli authentication credentials schema."""

    # No authentication needed for this public API
    pass
3

Create Source Configuration

Define optional configuration options in backend/airweave/platform/configs/config.py.

configs/config.py
# Add to backend/airweave/platform/configs/config.py

class GhibliConfig(SourceConfig):
    """Studio Ghibli configuration schema."""

    include_rt_scores: bool = Field(
        default=True,
        title="Include RT Scores",
        description="Whether to include Rotten Tomatoes scores in the metadata",
    )
4

Define Entity Schemas

Create entity schemas in backend/airweave/platform/entities/your_source.py that define the structure of your data.

entities/ghibli.py
# Create backend/airweave/platform/entities/ghibli.py

from typing import List, Optional

from pydantic import Field

from airweave.platform.entities._base import ChunkEntity


class GhibliFilmEntity(ChunkEntity):
    """Schema for a Studio Ghibli film."""

    # Required fields
    film_id: str = Field(..., description="Unique ID of the film")
    title: str = Field(..., description="Title of the film")
    original_title: str = Field(..., description="Original Japanese title")
    director: str = Field(..., description="Director of the film")
    producer: str = Field(..., description="Producer of the film")
    release_date: str = Field(..., description="Release date")
    running_time: str = Field(..., description="Running time in minutes")

    # Optional fields
    rt_score: Optional[str] = Field(None, description="Rotten Tomatoes score")
    people: List[str] = Field(default_factory=list, description="Characters in the film")
    species: List[str] = Field(default_factory=list, description="Species in the film")
    locations: List[str] = Field(default_factory=list, description="Locations in the film")
    vehicles: List[str] = Field(default_factory=list, description="Vehicles in the film")

Key points about entities:

  • Inherit from ChunkEntity for searchable content (documents, posts, issues)
  • Inherit from FileEntity for downloadable files (PDFs, images, attachments)
  • Use Field(...) for required fields, Field(default=...) for optional ones
  • Add source-specific fields that are relevant for search and metadata
5

Implement Your Source

Create your source connector in backend/airweave/platform/sources/your_source.py.

sources/ghibli.py
# Create backend/airweave/platform/sources/ghibli.py

from typing import Any, AsyncGenerator, Dict, Optional

import httpx

from airweave.platform.auth.schemas import AuthType
from airweave.platform.decorators import source
from airweave.platform.entities.ghibli import GhibliFilmEntity
from airweave.platform.sources._base import BaseSource


@source(
    name="Studio Ghibli",
    short_name="ghibli",
    auth_type=AuthType.none,
    auth_config_class="GhibliAuthConfig",
    config_class="GhibliConfig",
    labels=["Entertainment", "API"],
)
class GhibliSource(BaseSource):
    """Studio Ghibli source implementation."""

    @classmethod
    async def create(
        cls,
        credentials=None,
        config: Optional[Dict[str, Any]] = None,
    ) -> "GhibliSource":
        """Create a new Ghibli source instance."""
        instance = cls()
        # The API is public, so credentials are unused; only config is stored
        instance.config = config or {}
        return instance

    async def generate_entities(self) -> AsyncGenerator[GhibliFilmEntity, None]:
        """Generate entities from the Ghibli API."""
        # Fetch the full film list in a single request
        async with httpx.AsyncClient() as client:
            response = await client.get("https://ghibli.rest/films")
            response.raise_for_status()
            films = response.json()

        include_rt_scores = self.config.get("include_rt_scores", True)

        for film in films:
            yield GhibliFilmEntity(
                entity_id=film["id"],
                film_id=film["id"],
                title=film["title"],
                original_title=film["original_title"],
                content=film["description"],  # Required by ChunkEntity
                director=film["director"],
                producer=film["producer"],
                release_date=film["release_date"],
                running_time=film["running_time"],
                # .get() avoids a KeyError if a film has no score
                rt_score=film.get("rt_score") if include_rt_scores else None,
                people=film.get("people", []),
                species=film.get("species", []),
                locations=film.get("locations", []),
                vehicles=film.get("vehicles", []),
            )

Key implementation points:

  • Import your custom entity classes
  • Use the @source() decorator with your auth and config classes
  • Implement create() classmethod that handles credentials and config
  • Implement generate_entities() that yields your custom entity objects
  • Handle authentication based on your auth type
  • Use config options to customize behavior
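
The field mapping inside generate_entities() can also be factored into a plain function, which makes the effect of a config option such as include_rt_scores easy to check without running Airweave. The following is a minimal sketch, assuming the film payload carries the keys used in the example above. film_to_fields is a hypothetical helper, not part of Airweave, and the sample payload is made-up placeholder data.

```python
from typing import Any, Dict


def film_to_fields(film: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Map one raw film payload to keyword arguments for GhibliFilmEntity.

    Hypothetical helper for illustration; not part of Airweave.
    """
    include_rt_scores = config.get("include_rt_scores", True)
    return {
        "entity_id": film["id"],
        "film_id": film["id"],
        "title": film["title"],
        "original_title": film["original_title"],
        "content": film["description"],
        "director": film["director"],
        "producer": film["producer"],
        "release_date": film["release_date"],
        "running_time": film["running_time"],
        # Drop the score entirely when the option is switched off
        "rt_score": film.get("rt_score") if include_rt_scores else None,
        "people": film.get("people", []),
        "species": film.get("species", []),
        "locations": film.get("locations", []),
        "vehicles": film.get("vehicles", []),
    }


# Made-up placeholder payload, only shaped like an API response
sample_film = {
    "id": "film-1",
    "title": "Example Film",
    "original_title": "Example Original Title",
    "description": "Placeholder description.",
    "director": "Example Director",
    "producer": "Example Producer",
    "release_date": "2000",
    "running_time": "100",
    "rt_score": "90",
}

with_scores = film_to_fields(sample_film, {})
without_scores = film_to_fields(sample_film, {"include_rt_scores": False})
```

Inside generate_entities(), such a helper would be used as `yield GhibliFilmEntity(**film_to_fields(film, self.config))`.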
6

Test Your Connector

Verify everything works by running Airweave and creating a test connection.

  1. Start Airweave: Your connector appears automatically in the dashboard
  2. Create a collection and add your new source
  3. Test the connection and verify data syncs correctly
  4. Search your data to confirm everything works end-to-end
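
Before going through the dashboard, it can help to drive generate_entities() directly and look at the first few results. The sketch below uses a stand-in source so that it runs on its own. StubSource and collect_entities are hypothetical names, not Airweave APIs; in a real check you would pass the instance returned by GhibliSource.create() instead of the stub.

```python
import asyncio
from typing import Any, AsyncGenerator, Dict, List


class StubSource:
    """Stand-in with the same generate_entities() shape as a real source (hypothetical)."""

    async def generate_entities(self) -> AsyncGenerator[Dict[str, Any], None]:
        for i in range(5):
            yield {"entity_id": f"film-{i}", "title": f"Example Film {i}"}


async def collect_entities(source: Any, limit: int = 3) -> List[Any]:
    """Pull at most `limit` entities from source.generate_entities()."""
    collected: List[Any] = []
    if limit <= 0:
        return collected
    generator = source.generate_entities()
    try:
        async for entity in generator:
            collected.append(entity)
            if len(collected) >= limit:
                break
    finally:
        # Close the generator explicitly because we may stop early
        await generator.aclose()
    return collected


first_entities = asyncio.run(collect_entities(StubSource(), limit=3))
for entity in first_entities:
    print(entity["entity_id"], entity["title"])
```

Capping the number of entities keeps the check fast for sources that would otherwise page through a large dataset.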

🎉 Your custom source is live! Users can now connect to your data source just like any built-in connector.

Authentication Types

Choose the Right Auth Type

Match the auth type to what your data source actually uses. Airweave supports many auth types, including API keys, various OAuth2 flows, and database connections. Check your data source’s API documentation to determine the correct AuthType to use in your @source() decorator.
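
As an illustration of how the auth type shapes create(), here is a sketch for a source that authenticates with an API key. It is a stand-in that runs without Airweave, and several things in it are assumptions: the class name ApiKeySourceSketch, credentials arriving as a plain dict with an api_key entry, and the Bearer header scheme. Check your API’s documentation and Airweave’s AuthType definitions for the real values.

```python
import asyncio
from typing import Any, Dict, Optional


class ApiKeySourceSketch:
    """Hypothetical stand-in for a BaseSource subclass that uses an API key."""

    def __init__(self) -> None:
        self.api_key: str = ""
        self.config: Dict[str, Any] = {}

    @classmethod
    async def create(
        cls,
        credentials: Dict[str, str],
        config: Optional[Dict[str, Any]] = None,
    ) -> "ApiKeySourceSketch":
        """Store the API key and config on a new instance."""
        api_key = credentials.get("api_key")  # assumed credential field name
        if not api_key:
            raise ValueError("An API key is required for this source")
        instance = cls()
        instance.api_key = api_key
        instance.config = config or {}
        return instance

    def auth_headers(self) -> Dict[str, str]:
        """Headers to send with each request (Bearer scheme is an assumption)."""
        return {"Authorization": f"Bearer {self.api_key}"}


sketch_source = asyncio.run(ApiKeySourceSketch.create({"api_key": "test-key"}))
headers = sketch_source.auth_headers()
```

Failing early in create() when the key is missing surfaces a misconfigured connection before any request is made.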

OAuth2 Setup

For OAuth2 sources, you’ll also need to add your integration to the dev.integrations.yaml file. Reach out to us for guidance on the OAuth2 flow setup.

File Structure

Your complete implementation will create files in these locations:

File structure
backend/airweave/platform/
├── configs/
│   ├── auth.py          # Add YourSourceAuthConfig class
│   └── config.py        # Add YourSourceConfig class
├── entities/
│   └── your_source.py   # Define entity schemas for your data
└── sources/
    └── your_source.py   # Your source implementation

Real Examples

Learn from existing connectors in our open-source repository.

Contributing Back

🤝 Help the Community

Consider contributing your connector back to the main repository! File a PR and help other users benefit from your work.