LlamaIndex | Airweave

The llama-index-tools-airweave package provides an AirweaveToolSpec that gives your LlamaIndex agents access to Airweave’s semantic search capabilities.

Prerequisites

Before you start, you’ll need:

A collection with data: at least one source connection must have completed its initial sync. See the Quickstart if you need to set this up.
An API key: Create one in the Airweave dashboard under API Keys.

Installation

$ pip install llama-index llama-index-tools-airweave

Quick Start

import os
import asyncio
from llama_index.tools.airweave import AirweaveToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

# Initialize the Airweave tool
airweave_tool = AirweaveToolSpec(
    api_key=os.environ["AIRWEAVE_API_KEY"],
)

# Create an agent with the Airweave tools
agent = FunctionAgent(
    tools=airweave_tool.to_tool_list(),
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="""You are a helpful assistant that can search through
    Airweave collections to answer questions about your organization's data.""",
)

# Use the agent to search your data
async def main():
    response = await agent.run(
        "Search the finance-data collection for Q4 revenue reports"
    )
    print(response)

if __name__ == "__main__":
    asyncio.run(main())

Available Tools

The AirweaveToolSpec provides five tools that your agent can use:

`search_collection`

Runs a simple search against a collection using default settings. This is the most common use case.

Parameter	Type	Description
`collection_id`	str	The readable ID of the collection
`query`	str	Your search query
`limit`	int	Max results to return (default: 10)
`offset`	int	Pagination offset (default: 0)
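
The `limit` and `offset` parameters can be combined to walk through a larger result set one page at a time. The sketch below is a hypothetical helper (`iter_all_results` is not part of the package). It assumes that `offset` counts results to skip, that the search callable returns a list, and that a page shorter than `limit` means nothing is left to fetch.

```python
from typing import Any, Callable, Iterator, List


def iter_all_results(
    search: Callable[..., List[Any]],
    collection_id: str,
    query: str,
    page_size: int = 10,
    max_results: int = 100,
) -> Iterator[Any]:
    """Yield results page by page using limit/offset.

    `search` is any callable that accepts collection_id, query, limit and
    offset keyword arguments, for example airweave_tool.search_collection.
    Hypothetical helper; the stopping rule (short page = last page) is an
    assumption, not documented behavior.
    """
    offset = 0
    while offset < max_results:
        # Never request more than the remaining budget.
        limit = min(page_size, max_results - offset)
        page = search(
            collection_id=collection_id,
            query=query,
            limit=limit,
            offset=offset,
        )
        yield from page
        if len(page) < limit:
            break  # short page: assume there is nothing left
        offset += len(page)
```

With a real tool spec this could be called as `iter_all_results(airweave_tool.search_collection, "finance-data", "Q4 revenue reports")`. The `max_results` cap keeps a broad query from paging indefinitely.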

`advanced_search_collection`

Advanced search with full control over retrieval parameters.

Parameter	Type	Description
`collection_id`	str	The readable ID of the collection
`query`	str	Your search query
`limit`	int	Max results to return (default: 10)
`offset`	int	Pagination offset (default: 0)
`retrieval_strategy`	str	`"hybrid"`, `"neural"`, or `"keyword"`
`temporal_relevance`	float	Weight recent content (0.0-1.0)
`expand_query`	bool	Generate query variations
`interpret_filters`	bool	Extract filters from natural language
`rerank`	bool	Use LLM-based reranking
`generate_answer`	bool	Generate natural language answer

Returns a dictionary with a `documents` list and an optional `answer` field.
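
Because `retrieval_strategy` and `temporal_relevance` only accept the values listed in the table, it can help to check them client-side before calling the tool. The following is a minimal sketch under that assumption; `build_advanced_kwargs` and `ALLOWED_STRATEGIES` are hypothetical names, and the package may perform its own validation.

```python
from typing import Any, Dict, Optional

# Values taken from the parameter table above.
ALLOWED_STRATEGIES = {"hybrid", "neural", "keyword"}


def build_advanced_kwargs(
    retrieval_strategy: Optional[str] = None,
    temporal_relevance: Optional[float] = None,
    **other: Any,
) -> Dict[str, Any]:
    """Validate the constrained options and return keyword arguments.

    Hypothetical helper. Options left as None are omitted so the tool's
    own defaults apply.
    """
    kwargs: Dict[str, Any] = dict(other)
    if retrieval_strategy is not None:
        if retrieval_strategy not in ALLOWED_STRATEGIES:
            raise ValueError(
                f"retrieval_strategy must be one of {sorted(ALLOWED_STRATEGIES)}"
            )
        kwargs["retrieval_strategy"] = retrieval_strategy
    if temporal_relevance is not None:
        if not 0.0 <= temporal_relevance <= 1.0:
            raise ValueError("temporal_relevance must be between 0.0 and 1.0")
        kwargs["temporal_relevance"] = temporal_relevance
    return kwargs
```

The returned dictionary can then be unpacked into the call, for example `airweave_tool.advanced_search_collection(collection_id="finance-data", query="Q4 revenue reports", **build_advanced_kwargs(retrieval_strategy="keyword"))`.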

`search_and_generate_answer`

Convenience method that searches and returns a direct natural language answer (RAG-style).

Parameter	Type	Description
`collection_id`	str	The readable ID of the collection
`query`	str	Your question in natural language
`limit`	int	Max results to consider (default: 10)
`use_reranking`	bool	Use reranking (default: True)

`list_collections`

List all collections in your organization.

Parameter	Type	Description
`skip`	int	Pagination skip (default: 0)
`limit`	int	Max collections to return (default: 100)

`get_collection_info`

Get detailed information about a specific collection.

Parameter	Type	Description
`collection_id`	str	The readable ID of the collection

Advanced Examples

Direct Tool Usage

You can use the tools directly without an agent:

from llama_index.tools.airweave import AirweaveToolSpec

airweave_tool = AirweaveToolSpec(api_key="your-key")

# List collections
collections = airweave_tool.list_collections()
print(f"Found {len(collections)} collections")

# Simple search
results = airweave_tool.search_collection(
    collection_id="finance-data",
    query="Q4 revenue reports",
    limit=5
)

for doc in results:
    print(f"Score: {doc.metadata.get('score', 'N/A')}")
    print(f"Text: {doc.text[:200]}...")
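
When results feed a log or a prompt, a small formatting helper keeps the output compact. This sketch relies only on the `.text` and `.metadata` attributes used in the loop above; `summarize` is a hypothetical name, and the stand-in documents exist only so the example runs without an API call.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def summarize(docs: Iterable[Any], max_chars: int = 200) -> List[str]:
    """Return one 'score | snippet' line per document.

    Hypothetical helper: appends an ellipsis only when the text was
    actually truncated.
    """
    lines = []
    for doc in docs:
        score = doc.metadata.get("score", "N/A")
        text = doc.text or ""
        snippet = text if len(text) <= max_chars else text[:max_chars] + "..."
        lines.append(f"{score} | {snippet}")
    return lines


# Stand-in documents (not real Airweave results).
fake_docs = [
    SimpleNamespace(text="Q4 revenue summary", metadata={"score": 0.91}),
    SimpleNamespace(text="x" * 300, metadata={}),
]
for line in summarize(fake_docs, max_chars=20):
    print(line)
```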

Advanced Search with All Options

result = airweave_tool.advanced_search_collection(
    collection_id="finance-data",
    query="Q4 revenue reports",
    limit=20,
    retrieval_strategy="hybrid",
    temporal_relevance=0.3,
    expand_query=True,
    interpret_filters=True,
    rerank=True,
    generate_answer=True,
)

documents = result["documents"]
if "answer" in result:
    print(f"Generated Answer: {result['answer']}")

RAG-Style Direct Answers

answer = airweave_tool.search_and_generate_answer(
    collection_id="finance-data",
    query="What was our Q4 revenue growth?",
    limit=10,
    use_reranking=True,
)
print(answer)  # "Q4 revenue grew by 23% to $45M compared to Q3..."

Using Different Retrieval Strategies

# Keyword search for exact term matching
results = airweave_tool.advanced_search_collection(
    collection_id="legal-docs",
    query="GDPR compliance",
    retrieval_strategy="keyword",
)

# Neural search for semantic understanding
results = airweave_tool.advanced_search_collection(
    collection_id="research-papers",
    query="papers about transformer architectures",
    retrieval_strategy="neural",
)

# Hybrid search (default) - best of both worlds
results = airweave_tool.advanced_search_collection(
    collection_id="all-docs",
    query="machine learning best practices",
    retrieval_strategy="hybrid",
)

Temporal Relevance

Weight recent documents higher in results:

results = airweave_tool.advanced_search_collection(
    collection_id="news-articles",
    query="AI breakthroughs",
    temporal_relevance=0.8,  # 0.0 = no recency bias, 1.0 = only recency matters
)

Custom Base URL

If you’re self-hosting Airweave, pass your instance URL as `base_url`:

airweave_tool = AirweaveToolSpec(
    api_key="your-api-key",
    base_url="https://your-airweave-instance.com",
)
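
To switch between the hosted service and a self-hosted instance without code changes, the constructor arguments can be assembled from the environment. This is a sketch under stated assumptions: `airweave_kwargs_from_env` is a hypothetical helper, and `AIRWEAVE_BASE_URL` is a variable name chosen for illustration, not one the package is known to read.

```python
import os
from typing import Dict


def airweave_kwargs_from_env() -> Dict[str, str]:
    """Collect AirweaveToolSpec keyword arguments from environment variables.

    AIRWEAVE_API_KEY matches the Quick Start; AIRWEAVE_BASE_URL is a
    hypothetical name used only in this sketch.
    """
    kwargs = {"api_key": os.environ["AIRWEAVE_API_KEY"]}
    base_url = os.environ.get("AIRWEAVE_BASE_URL", "").strip()
    if base_url:
        # Only override the default endpoint when a URL is provided.
        kwargs["base_url"] = base_url
    return kwargs
```

The result can be unpacked into the constructor: `airweave_tool = AirweaveToolSpec(**airweave_kwargs_from_env())`.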

Using with Local Models

$ pip install llama-index-llms-ollama

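from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.ollama import Ollama

# Reuses the airweave_tool created in the earlier examples
agent = FunctionAgent(
    tools=airweave_tool.to_tool_list(),
    llm=Ollama(model="llama3.1", request_timeout=360.0),
)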