Concepts
Airweave connects to your apps, databases, and documents, then turns them into knowledge you can search. To understand how it works, you only need a few core concepts.
Source
A Source is an external application, database, or document store where your data lives. Sources are the systems Airweave pulls data from to build your retrieval layer.
Sources can be:
- Productivity tools: Notion, Slack, Asana, Jira, Confluence
- Cloud storage: Google Drive, Dropbox, OneDrive, SharePoint
- CRMs and sales tools: Salesforce, HubSpot, Pipedrive
- Developer tools: GitHub, GitLab, Bitbucket
Each source type has its own data structures, authentication methods, and API patterns. Airweave abstracts these differences so your agents can query across all of them through a single interface.
Connector
A Connector is the integration code that allows Airweave to communicate with a specific source. Each connector handles:
- Authentication: OAuth flows, API keys, or database credentials depending on the source
- Data extraction: Fetching records, documents, or rows from the source API
- Entity mapping: Transforming source-specific data structures into Airweave’s unified entity format
- Incremental sync: Tracking changes so only new or modified data is re-synced
Connectors handle rate limits, pagination, and API quirks, so you don’t have to build custom ingestion logic for each data source.
For the list of supported connectors and details on how they work, see the Connectors overview.
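To make these responsibilities concrete, here is a minimal sketch of a toy connector that covers pagination, entity mapping, and incremental sync. It is not Airweave's connector interface: the `Entity` fields, the `InMemoryConnector` class, and the integer `updated_at` cursor are all assumptions made for illustration, and authentication is left out because it depends entirely on the source.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Entity:
    """Hypothetical unified entity format (not Airweave's actual schema)."""
    entity_id: str
    source: str
    text: str
    updated_at: int  # toy timestamp used as the sync cursor


class InMemoryConnector:
    """Toy connector over a list of raw records, standing in for a source API."""

    def __init__(self, source_name: str, records: list[dict], page_size: int = 2):
        self.source_name = source_name
        self.records = records
        self.page_size = page_size
        self.cursor = 0  # highest updated_at seen so far

    def _pages(self, changed: list[dict]) -> Iterator[list[dict]]:
        # Pagination: hand records back in fixed-size pages.
        for start in range(0, len(changed), self.page_size):
            yield changed[start:start + self.page_size]

    def _to_entity(self, record: dict) -> Entity:
        # Entity mapping: source-specific fields become the unified format.
        return Entity(
            entity_id=f"{self.source_name}:{record['id']}",
            source=self.source_name,
            text=record["body"],
            updated_at=record["updated_at"],
        )

    def sync(self) -> list[Entity]:
        # Incremental sync: only records modified after the stored cursor.
        changed = [r for r in self.records if r["updated_at"] > self.cursor]
        entities = [self._to_entity(r) for page in self._pages(changed) for r in page]
        if entities:
            self.cursor = max(e.updated_at for e in entities)
        return entities


records = [
    {"id": "1", "body": "Kickoff notes", "updated_at": 10},
    {"id": "2", "body": "Roadmap draft", "updated_at": 20},
]
connector = InMemoryConnector("notion", records)
first = connector.sync()   # picks up both existing records
records.append({"id": "3", "body": "Retro summary", "updated_at": 30})
second = connector.sync()  # picks up only the new record
```

A real connector would track changes with whatever the source offers (timestamps, change tokens, webhooks), but the shape is the same: remember where the last sync stopped and fetch only what changed since.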
Source Connection
A Source Connection is a configured, authenticated instance of a connector linked to your specific account or workspace. While a connector defines how to integrate with a source, a source connection is the actual live connection using your credentials.
When you create a source connection, you:
- Select a connector (e.g., Slack)
- Authenticate with your credentials (e.g., OAuth login to your Slack workspace)
- Assign it to a collection
Once the source connection is created, Airweave continuously syncs data from it, keeping your retrieval layer up to date. You can have multiple source connections of the same type (e.g., one for each of several Slack workspaces).
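The distinction between a connector and a source connection can be sketched as a small data model. The class and field names below (`Connector`, `SourceConnection`, `short_name`, `collection_id`) and the placeholder credentials are hypothetical and chosen only to illustrate the relationship; they are not Airweave's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """Hypothetical: the reusable integration for one source type."""
    short_name: str
    auth_method: str


@dataclass
class SourceConnection:
    """Hypothetical: one authenticated instance of a connector."""
    name: str
    connector: Connector
    credentials: dict   # placeholder values, never real secrets
    collection_id: str  # the collection this connection feeds


slack = Connector(short_name="slack", auth_method="oauth")

# Two source connections of the same type, each with its own credentials.
connections = [
    SourceConnection("Engineering Slack", slack, {"token": "placeholder-a"}, "team-knowledge"),
    SourceConnection("Support Slack", slack, {"token": "placeholder-b"}, "team-knowledge"),
]

same_connector = all(c.connector is slack for c in connections)
```

Here one connector definition backs two live connections, which mirrors the case of connecting several Slack workspaces to the same collection.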
Entity
An Entity is a single, searchable item extracted from a source. Entities are the atomic units of data that get indexed and returned in search results.
Examples of entities:
- A Slack message or thread
- A Notion page or database row
- A GitHub issue, pull request, or code file
- A Google Doc or spreadsheet
- A Zendesk ticket or customer conversation
- An Airtable record or row
Each entity is processed through Airweave’s pipeline:
- Extracted from the source via its connector
- Transformed into a standardized format with metadata
- Chunked if the content is long (for better retrieval accuracy)
- Embedded as vectors for semantic search
- Indexed and stored in the retrieval layer
When an AI agent queries Airweave, it searches across these entities and receives results optimized for LLM consumption, with source attribution and links back to the original content.
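The pipeline stages can be sketched in a few lines. This is a toy illustration under stated assumptions, not Airweave's implementation: the word-count chunker, the hashed bag-of-words `embed` function (a stand-in for a real embedding model), and the record fields, including the example URL, are all hypothetical.

```python
import hashlib
import math


def chunk(text: str, max_words: int = 5) -> list[str]:
    # Chunking: split long content into fixed-size word windows.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str, dims: int = 16) -> list[float]:
    # Toy embedding: hash each word into a bucket, then L2-normalize.
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def process(entity: dict) -> list[dict]:
    # Transform, chunk, and embed; every chunk keeps its source attribution.
    return [
        {
            "entity_id": entity["id"],
            "source": entity["source"],
            "url": entity["url"],
            "chunk_index": i,
            "text": piece,
            "vector": embed(piece),
        }
        for i, piece in enumerate(chunk(entity["text"]))
    ]


# A 12-word message is split into chunks of 5, 5, and 2 words.
index = process({
    "id": "slack:msg-1",
    "source": "slack",
    "url": "https://example.com/archives/msg-1",
    "text": "The deploy failed because the staging database ran out of disk space",
})
```

Carrying the source and URL on every chunk is what lets a search result link back to the original content, whichever chunk happens to match.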
Collection
A Collection is a searchable knowledge base composed of entities from one or more source connections. Collections are what your AI agents actually query.
A single collection might include:
- Slack messages from your team workspace
- Documentation from Notion
- Issues and PRs from GitHub
- Customer data from Salesforce
When an agent searches a collection, the query runs across all entities from all connected sources, returning the most relevant results regardless of where the data originally came from. This is what makes Airweave a unified retrieval layer.
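As a rough sketch of what searching across sources means, the example below ranks entities from four sources with a single scoring function. The word-overlap `score` is a deliberately simple stand-in for semantic search, and the `search` helper and sample records are hypothetical, not Airweave's API.

```python
def score(query: str, text: str) -> float:
    # Toy relevance: fraction of query words that appear in the text.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


# One collection holding entities from several source connections.
collection = [
    {"source": "slack", "text": "refund policy changed for enterprise plans"},
    {"source": "notion", "text": "enterprise refund policy and billing guide"},
    {"source": "github", "text": "fix pagination bug in export endpoint"},
    {"source": "salesforce", "text": "acme asked about the enterprise refund"},
]


def search(entities: list[dict], query: str, limit: int = 3) -> list[dict]:
    # Score every entity the same way, regardless of where it came from.
    scored = [(score(query, e["text"]), e) for e in entities]
    scored = [(s, e) for s, e in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [{"score": s, **e} for s, e in scored[:limit]]


results = search(collection, "enterprise refund policy")
sources = [r["source"] for r in results]
```

The ranking depends only on relevance to the query, so results from Slack, Notion, and Salesforce interleave in one list while the unrelated GitHub entity is left out.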
Collections can be queried via the REST API, SDKs, or MCP, making them accessible to any AI agent, RAG pipeline, or application that needs grounded, up-to-date context.
To learn more about querying collections, including semantic search, hybrid search, filtering, and reranking, see the Search documentation.