LlamaIndex

Connect your LlamaIndex agent to Airweave for semantic search across all your synced data sources.

The llama-index-tools-airweave package provides an AirweaveToolSpec that gives your LlamaIndex agents access to Airweave’s semantic search capabilities.

Prerequisites

Before you start, you’ll need:

  • A collection with data: At least one source connection must have completed its initial sync. See the Quickstart if you need to set this up.
  • An API key: Create one in the Airweave dashboard under API Keys.

Installation

$ pip install llama-index llama-index-tools-airweave

Quick Start

import os
import asyncio
from llama_index.tools.airweave import AirweaveToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

# Initialize the Airweave tool
airweave_tool = AirweaveToolSpec(
    api_key=os.environ["AIRWEAVE_API_KEY"],
)

# Create an agent with the Airweave tools
agent = FunctionAgent(
    tools=airweave_tool.to_tool_list(),
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="""You are a helpful assistant that can search through
    Airweave collections to answer questions about your organization's data.""",
)

# Use the agent to search your data
async def main():
    response = await agent.run(
        "Search the finance-data collection for Q4 revenue reports"
    )
    print(response)

if __name__ == "__main__":
    asyncio.run(main())

Available Tools

The AirweaveToolSpec provides five tools that your agent can use:

search_collection

Simple search in a collection with default settings (most common use case).

| Parameter | Type | Description |
| --- | --- | --- |
| collection_id | str | The readable ID of the collection |
| query | str | Your search query |
| limit | int | Max results to return (default: 10) |
| offset | int | Pagination offset (default: 0) |
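
The limit and offset parameters can be combined to page through a larger result set. The sketch below shows one way to do that. Note that iter_search_pages is a hypothetical helper rather than part of the package, and it assumes that a page shorter than the requested size means there are no more results. fake_search stands in for airweave_tool.search_collection so the example runs without an API key.

```python
from typing import Callable, Iterator, List, Sequence


def iter_search_pages(
    search: Callable[..., Sequence],
    collection_id: str,
    query: str,
    page_size: int = 10,
    max_pages: int = 5,
) -> Iterator[List]:
    """Yield successive pages of results by advancing `offset`.

    `search` is expected to accept the keyword arguments listed in the
    table above (collection_id, query, limit, offset).
    Assumption: a page shorter than `page_size` is the last page.
    """
    for page in range(max_pages):
        results = list(
            search(
                collection_id=collection_id,
                query=query,
                limit=page_size,
                offset=page * page_size,
            )
        )
        if not results:
            break
        yield results
        if len(results) < page_size:
            break


# Stand-in for airweave_tool.search_collection so the sketch runs offline.
_FAKE_RESULTS = [f"doc-{i}" for i in range(23)]


def fake_search(collection_id: str, query: str, limit: int = 10, offset: int = 0):
    return _FAKE_RESULTS[offset : offset + limit]


pages = list(iter_search_pages(fake_search, "finance-data", "Q4 revenue reports"))
print([len(p) for p in pages])  # [10, 10, 3]
```

With the package installed, passing airweave_tool.search_collection in place of fake_search should work the same way, provided it accepts these keyword arguments as the table describes.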

advanced_search_collection

Advanced search with full control over retrieval parameters.

| Parameter | Type | Description |
| --- | --- | --- |
| collection_id | str | The readable ID of the collection |
| query | str | Your search query |
| limit | int | Max results to return (default: 10) |
| offset | int | Pagination offset (default: 0) |
| retrieval_strategy | str | "hybrid", "neural", or "keyword" |
| temporal_relevance | float | Weight recent content (0.0-1.0) |
| expand_query | bool | Generate query variations |
| interpret_filters | bool | Extract filters from natural language |
| rerank | bool | Use LLM-based reranking |
| generate_answer | bool | Generate natural language answer |

Returns a dictionary with a documents list and an optional answer field.

search_and_generate_answer

Convenience method that searches and returns a direct natural language answer (RAG-style).

| Parameter | Type | Description |
| --- | --- | --- |
| collection_id | str | The readable ID of the collection |
| query | str | Your question in natural language |
| limit | int | Max results to consider (default: 10) |
| use_reranking | bool | Use reranking (default: True) |

list_collections

List all collections in your organization.

| Parameter | Type | Description |
| --- | --- | --- |
| skip | int | Pagination skip (default: 0) |
| limit | int | Max collections to return (default: 100) |

get_collection_info

Get detailed information about a specific collection.

| Parameter | Type | Description |
| --- | --- | --- |
| collection_id | str | The readable ID of the collection |

Advanced Examples

Direct Tool Usage

You can use the tools directly without an agent:

from llama_index.tools.airweave import AirweaveToolSpec

airweave_tool = AirweaveToolSpec(api_key="your-key")

# List collections
collections = airweave_tool.list_collections()
print(f"Found {len(collections)} collections")

# Simple search
results = airweave_tool.search_collection(
    collection_id="finance-data",
    query="Q4 revenue reports",
    limit=5
)

for doc in results:
    print(f"Score: {doc.metadata.get('score', 'N/A')}")
    print(f"Text: {doc.text[:200]}...")
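
Since each returned document carries a score in its metadata, you may want to drop weak matches before passing results on. The following is a minimal sketch of that idea. top_scoring and FakeDoc are hypothetical names used for illustration, and the 0.5 threshold is arbitrary because this page does not state the score range.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class FakeDoc:
    """Stand-in for a returned document; only .text and .metadata are used."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def top_scoring(docs: Iterable[Any], min_score: float) -> List[Any]:
    """Keep documents whose metadata score is at least `min_score`, best first.

    Documents without a numeric score are dropped rather than guessed at.
    """
    scored = [
        d
        for d in docs
        if isinstance(d.metadata.get("score"), (int, float))
        and d.metadata["score"] >= min_score
    ]
    return sorted(scored, key=lambda d: d.metadata["score"], reverse=True)


# Made-up sample data standing in for search_collection results.
docs = [
    FakeDoc("Q4 forecast notes", {"score": 0.67}),
    FakeDoc("Office party photos", {"score": 0.12}),
    FakeDoc("Q4 revenue summary", {"score": 0.91}),
    FakeDoc("Untitled"),  # no score in metadata
]

for doc in top_scoring(docs, min_score=0.5):
    print(f"{doc.metadata['score']:.2f}  {doc.text}")
```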

Advanced Search with All Options

result = airweave_tool.advanced_search_collection(
    collection_id="finance-data",
    query="Q4 revenue reports",
    limit=20,
    retrieval_strategy="hybrid",
    temporal_relevance=0.3,
    expand_query=True,
    interpret_filters=True,
    rerank=True,
    generate_answer=True,
)

documents = result["documents"]
if "answer" in result:
    print(f"Generated Answer: {result['answer']}")
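
Because the answer field is optional, calling code usually needs a fallback for when it is missing. The sketch below shows one possible approach: prefer the generated answer, and otherwise show short snippets from the top documents. render_result is a hypothetical helper, and the sample dictionaries are made-up stand-ins shaped like the result described on this page.

```python
from types import SimpleNamespace
from typing import Any, Dict


def render_result(result: Dict[str, Any], max_snippets: int = 3) -> str:
    """Prefer the generated answer; otherwise fall back to document snippets."""
    answer = result.get("answer")
    if answer:
        return answer
    documents = result.get("documents", [])
    snippets = [doc.text[:200] for doc in documents[:max_snippets]]
    return "\n".join(snippets) if snippets else "No results."


# Stand-in results; real ones would come from advanced_search_collection.
with_answer = {
    "documents": [SimpleNamespace(text="Q4 revenue summary")],
    "answer": "Example generated answer.",
}
without_answer = {"documents": [SimpleNamespace(text="Q4 revenue summary")]}

print(render_result(with_answer))
print(render_result(without_answer))
```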

RAG-Style Direct Answers

answer = airweave_tool.search_and_generate_answer(
    collection_id="finance-data",
    query="What was our Q4 revenue growth?",
    limit=10,
    use_reranking=True,
)
print(answer)  # "Q4 revenue grew by 23% to $45M compared to Q3..."

Using Different Retrieval Strategies

# Keyword search for exact term matching
results = airweave_tool.advanced_search_collection(
    collection_id="legal-docs",
    query="GDPR compliance",
    retrieval_strategy="keyword",
)

# Neural search for semantic understanding
results = airweave_tool.advanced_search_collection(
    collection_id="research-papers",
    query="papers about transformer architectures",
    retrieval_strategy="neural",
)

# Hybrid search (default) - best of both worlds
results = airweave_tool.advanced_search_collection(
    collection_id="all-docs",
    query="machine learning best practices",
    retrieval_strategy="hybrid",
)
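
When it is unclear which strategy suits a collection, it can help to run the same query under each one and compare what comes back. The sketch below counts the documents returned per strategy. compare_strategies is a hypothetical helper, and fake_advanced_search is a stand-in with made-up counts so the example runs offline. It assumes the search callable returns a dictionary with a documents list, as described for advanced_search_collection.

```python
from typing import Any, Callable, Dict, Sequence

# The three strategy names listed in the parameter table.
STRATEGIES = ("hybrid", "neural", "keyword")


def compare_strategies(
    search: Callable[..., Dict[str, Any]],
    collection_id: str,
    query: str,
    strategies: Sequence[str] = STRATEGIES,
) -> Dict[str, int]:
    """Run one query under each retrieval strategy and count the documents."""
    counts: Dict[str, int] = {}
    for strategy in strategies:
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown retrieval_strategy: {strategy!r}")
        result = search(
            collection_id=collection_id,
            query=query,
            retrieval_strategy=strategy,
        )
        counts[strategy] = len(result["documents"])
    return counts


# Stand-in for airweave_tool.advanced_search_collection (made-up counts).
_FAKE_COUNTS = {"hybrid": 3, "neural": 2, "keyword": 1}


def fake_advanced_search(collection_id: str, query: str, retrieval_strategy: str):
    return {"documents": ["doc"] * _FAKE_COUNTS[retrieval_strategy]}


print(compare_strategies(fake_advanced_search, "all-docs", "machine learning"))
```

Result counts are only a rough signal. Inspecting the top few documents per strategy is usually more informative.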

Temporal Relevance

Weight recent documents higher in results:

results = airweave_tool.advanced_search_collection(
    collection_id="news-articles",
    query="AI breakthroughs",
    temporal_relevance=0.8,  # 0.0 = no recency bias, 1.0 = only recency matters
)
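
This page does not describe how Airweave applies temporal_relevance internally. Purely as intuition for the 0.0 to 1.0 range, one common way to express such a weight is a linear blend of a similarity score and a recency score. The function below is an illustrative assumption, not Airweave's scoring code, and the sample scores are made up.

```python
def blended_score(similarity: float, recency: float, temporal_relevance: float) -> float:
    """Illustrative linear blend: 0.0 ignores recency, 1.0 ignores similarity."""
    if not 0.0 <= temporal_relevance <= 1.0:
        raise ValueError("temporal_relevance must be between 0.0 and 1.0")
    return (1.0 - temporal_relevance) * similarity + temporal_relevance * recency


# Hypothetical documents: (label, similarity, recency), both scores in [0, 1].
candidates = [
    ("older, highly relevant", 0.9, 0.1),
    ("newer, less relevant", 0.6, 0.9),
]

for weight in (0.0, 0.8):
    ranked = sorted(
        candidates,
        key=lambda c: blended_score(c[1], c[2], weight),
        reverse=True,
    )
    print(weight, [label for label, _, _ in ranked])
```

Under this toy model, the older document ranks first at 0.0 and the newer one ranks first at 0.8.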

Custom Base URL

If you’re self-hosting Airweave:

airweave_tool = AirweaveToolSpec(
    api_key="your-api-key",
    base_url="https://your-airweave-instance.com",
)
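
To avoid hard-coding the key and URL, both values can be read from the environment. The sketch below builds the constructor arguments that way. AIRWEAVE_API_KEY matches the Quick Start, while AIRWEAVE_BASE_URL and the airweave_kwargs helper are hypothetical names chosen for this example.

```python
import os
from typing import Dict, Mapping


def airweave_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build AirweaveToolSpec keyword arguments from environment variables.

    AIRWEAVE_BASE_URL is a hypothetical variable name; when it is unset,
    base_url is omitted so the package default applies.
    """
    api_key = env.get("AIRWEAVE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set AIRWEAVE_API_KEY before creating the tool spec")
    kwargs = {"api_key": api_key}
    base_url = env.get("AIRWEAVE_BASE_URL", "").strip()
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs


# Example with an explicit mapping instead of the real environment.
example_env = {
    "AIRWEAVE_API_KEY": "your-api-key",
    "AIRWEAVE_BASE_URL": "https://your-airweave-instance.com",
}
print(airweave_kwargs(example_env))
```

With the package installed, the result can be unpacked into the constructor as AirweaveToolSpec(**airweave_kwargs()).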

Using with Local Models

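$ pip install llama-index-llms-ollama

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.ollama import Ollama

# Reuses the airweave_tool created in the Quick Start
agent = FunctionAgent(
    tools=airweave_tool.to_tool_list(),
    llm=Ollama(model="llama3.1", request_timeout=360.0),
)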

Learn More