Core Concepts

Airweave is built around several key concepts that work together to create a powerful agent knowledge platform. This guide will walk you through each concept and how they fit together to make your data searchable and accessible.

We’re here to help! If you have any questions or feedback, please contact us.

Data Sources

Figure: Selecting data sources

Data source connectors are the entry points for your information into Airweave. We support a wide range of sources to ensure you can connect all your valuable data.

Supported Source Types

Connect to external services through their REST APIs:

  • Enterprise tools like Slack and Gmail
  • Custom internal APIs
  • Third-party service integrations

Destinations

Your processed data is stored in vector databases optimized for semantic search operations.

Currently, Airweave supports only vector databases as destinations. Support for additional destination types, such as graph databases, is coming soon.

You can either use the native Weaviate database or connect to an existing vector database of your choice.
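To illustrate what "optimized for semantic search" means in practice, here is a minimal sketch of nearest-neighbor ranking by cosine similarity, the kind of operation a vector database performs at scale. The function names and the toy three-dimensional vectors are assumptions made for this example; they are not part of Airweave or Weaviate, and real embeddings come from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search(query: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the ids of the top_k stored vectors most similar to the query."""
    ranked = sorted(
        index,
        key=lambda entity_id: cosine_similarity(query, index[entity_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy "embeddings" keyed by hypothetical entity ids.
index = {
    "msg-1": [0.9, 0.1, 0.0],
    "msg-2": [0.0, 1.0, 0.0],
    "msg-3": [0.7, 0.3, 0.1],
}
results = search([1.0, 0.0, 0.0], index)
```

A production vector database replaces the linear scan with an approximate nearest-neighbor index, but the ranking idea is the same.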

Figure: Selecting destinations

Entities

Entities are the fundamental building blocks that power Airweave’s search capabilities. They provide a standardized way to represent information across different platforms and integrations, making your data uniformly searchable.

Entities are the smallest unit of data in Airweave. Every entity has an entity_id and belongs to a sync job, which in turn belongs to a sync. During ingestion, entities are hashed and version-tracked in the Airweave database. This allows for efficient deduplication and change detection.
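The following sketch shows one way hash-based deduplication and change detection can work. It is an illustration under stated assumptions, not Airweave's actual implementation: `content_hash`, `classify`, and the in-memory `seen` dictionary are hypothetical names, and the choice of SHA-256 over canonical JSON is ours.

```python
import hashlib
import json


def content_hash(fields: dict) -> str:
    """Stable SHA-256 digest of an entity's fields (key order does not matter)."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(entity_id: str, fields: dict, seen: dict[str, str]) -> str:
    """Compare against the stored hash and report 'insert', 'update', or 'skip'."""
    new_hash = content_hash(fields)
    old_hash = seen.get(entity_id)
    if old_hash is None:
        action = "insert"  # Never seen before
    elif old_hash != new_hash:
        action = "update"  # Same entity_id, changed content
    else:
        action = "skip"  # Unchanged, so no re-processing is needed
    seen[entity_id] = new_hash
    return action


seen: dict[str, str] = {}  # Hypothetical stand-in for the hash store
first = classify("msg-1", {"text": "hello", "channel_id": "C1"}, seen)
second = classify("msg-1", {"channel_id": "C1", "text": "hello"}, seen)
third = classify("msg-1", {"text": "hello, world", "channel_id": "C1"}, seen)
```

Here the first call inserts the entity, the second is skipped because only the key order differs, and the third is detected as an update.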

Core Structure

Every entity in Airweave inherits from a base structure that ensures consistency and searchability:

Base Entity Structure
class BaseEntity(BaseModel):
    """Base class for all entities."""

    entity_id: str  # Unique identifier from platform
    breadcrumbs: list[Breadcrumb]  # Tracks ancestry/hierarchy

Platform-specific entities extend this base structure, for example:

Slack Message Entity
class SlackMessageEntity(ChunkEntity):
    """Schema for Slack message entities."""

    channel_id: str
    user_id: Optional[str] = None
    text: Optional[str] = None
    ...
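To make the role of breadcrumbs concrete, the sketch below rebuilds the two classes above with standard-library dataclasses so it runs without Airweave installed. Several things here are assumptions: the `Breadcrumb` fields (`entity_id`, `name`) are hypothetical, the intermediate `ChunkEntity` class is left out, `channel_id` gets a default only because dataclass inheritance requires it, and `breadcrumb_path` is a helper invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Breadcrumb:
    # Hypothetical fields; the real Breadcrumb model may differ.
    entity_id: str
    name: str


@dataclass
class BaseEntity:
    entity_id: str  # Unique identifier from platform
    breadcrumbs: list[Breadcrumb] = field(default_factory=list)  # Ancestry


@dataclass
class SlackMessageEntity(BaseEntity):
    channel_id: str = ""
    user_id: Optional[str] = None
    text: Optional[str] = None


def breadcrumb_path(entity: BaseEntity) -> str:
    """Render the entity's ancestry from outermost to innermost container."""
    return " > ".join(crumb.name for crumb in entity.breadcrumbs)


message = SlackMessageEntity(
    entity_id="1700000000.000100",
    breadcrumbs=[Breadcrumb("T123", "Acme workspace"), Breadcrumb("C456", "#general")],
    channel_id="C456",
    text="Deploy finished",
)
path = breadcrumb_path(message)
```

Because every entity carries the same base fields, code like `breadcrumb_path` can work on entities from any platform without knowing their specific schema.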

Synchronization

A synchronization (sync) in Airweave orchestrates sync jobs and scopes your data in the target destination. A sync that is triggered manually or via the API may consist of a single sync job.

Each sync job brings together a source connection and a destination, keeping the data in the destination up to date with the source.

Figure: Sync overview of a manually triggered sync

Sync Jobs

Each sync contains one or more sync jobs. A sync job is a single execution of that sync.
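The relationship between a sync, its sync jobs, and the entities they ingest can be pictured with a small data model. This is a sketch only: the class and field names (`Sync`, `SyncJob`, `trigger`, `start_job`) are hypothetical and do not reflect Airweave's internal schema.

```python
from dataclasses import dataclass, field


@dataclass
class SyncJob:
    job_id: str
    trigger: str  # "manual", "scheduled", or "api"
    entity_ids: list[str] = field(default_factory=list)  # Entities ingested by this job


@dataclass
class Sync:
    sync_id: str
    source: str
    destination: str
    jobs: list[SyncJob] = field(default_factory=list)

    def start_job(self, trigger: str) -> SyncJob:
        """Record a new execution of this sync and return it."""
        job = SyncJob(job_id=f"{self.sync_id}-job-{len(self.jobs) + 1}", trigger=trigger)
        self.jobs.append(job)
        return job


sync = Sync(sync_id="slack-to-weaviate", source="slack", destination="weaviate")
job = sync.start_job("manual")
job.entity_ids.append("msg-1")  # The entity belongs to the job, which belongs to the sync
```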

We’re working on detailed logs and metrics, which will allow you to monitor performance and troubleshoot any issues that arise during the synchronization process.

Manual Sync

One-off sync jobs that you trigger via the UI for immediate execution. Great for:

  • Checking if the vectorization and search strategy works for your agent
  • Testing new connections
  • Debugging pipeline issues
  • Quick data refreshes

Scheduled Sync

Automated sync jobs that run on a predefined schedule. This is useful if you want to sync data on a regular basis, for example every hour, day or week. Airweave will automatically trigger the sync at the specified interval.
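As an illustration of interval-based scheduling, the sketch below computes when the next run falls for an hourly, daily, or weekly schedule. The `INTERVALS` table and the `next_run` function are hypothetical; Airweave's scheduler may represent and evaluate schedules differently.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schedule names mapped to their intervals.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_run(last_run: datetime, interval: str, now: datetime) -> datetime:
    """Return the first scheduled time strictly after `now`, aligned to `last_run`."""
    step = INTERVALS[interval]
    elapsed_steps = (now - last_run) // step  # Whole intervals already passed
    return last_run + (elapsed_steps + 1) * step


last_run = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)
hourly = next_run(last_run, "hourly", now)
daily = next_run(last_run, "daily", now)
```

Aligning to the last run rather than to the current time keeps the schedule from drifting when a job starts late.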

API Triggered

Programmatically control syncs through our API. This is useful if you want to trigger a sync in response to specific events in your agent or data source.

from airweave import AirweaveSDK

client = AirweaveSDK(api_key="YOUR_API_KEY")

client.sync.run_sync(
    sync_id="your-sync-id",
)

Refer to the Sync API Reference for more information on how to trigger syncs via API.
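When syncs are triggered by events, a burst of events can start many redundant sync jobs. The sketch below wraps the `client.sync.run_sync(sync_id=...)` call shown above in a simple throttle. The `ThrottledSyncTrigger` class, its parameters, and the stub client are assumptions made so the example runs without the SDK; in real use you would pass an `AirweaveSDK` client instead of the stub.

```python
import time
from types import SimpleNamespace


class ThrottledSyncTrigger:
    """Trigger a sync on events, but at most once per `min_interval` seconds."""

    def __init__(self, client, sync_id: str, min_interval: float = 60.0, clock=time.monotonic):
        self._client = client
        self._sync_id = sync_id
        self._min_interval = min_interval
        self._clock = clock  # Injectable so the behavior can be tested without waiting
        self._last_run = None

    def on_event(self) -> bool:
        """Run the sync unless one was started too recently. Returns True if it ran."""
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._min_interval:
            return False
        self._client.sync.run_sync(sync_id=self._sync_id)
        self._last_run = now
        return True


# Stub client that records calls instead of contacting the API (illustration only).
calls = []
stub_client = SimpleNamespace(
    sync=SimpleNamespace(run_sync=lambda sync_id: calls.append(sync_id))
)
fake_time = [0.0]
trigger = ThrottledSyncTrigger(
    stub_client, "your-sync-id", min_interval=60.0, clock=lambda: fake_time[0]
)

fired = [trigger.on_event()]  # t = 0s
fake_time[0] = 30.0
fired.append(trigger.on_event())  # t = 30s, inside the 60s window
fake_time[0] = 90.0
fired.append(trigger.on_event())  # t = 90s, window has passed
```

Of the three events, only the first and third start a sync; the second is dropped because it arrives within the throttle window.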

White Labeling Authentication

White labeling allows you to integrate third-party OAuth2 authentication with Airweave. This way, you can use Airweave to pull your users' data into your own application without having to handle OAuth2 authentication or store user credentials yourself.

Figure: White labeling authentication overview

Refer to the White Labeling documentation for more information on how to set up white labeling.