Search

Airweave lets you search across all your connected data sources through one unified interface. When you query a collection, Airweave runs a multi-step search pipeline that combines AI understanding with keyword precision. You can start with the defaults or configure each step for full control.

Want to try out our search right now? Head to our interactive API documentation where you can test search queries directly in your browser!

Quick Reference

Here are the default settings Airweave uses. You can override any of these in your queries.

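Feature | Parameter | Default | What it does
--- | --- | --- | ---
Query Expansion | expansion_strategy | auto | Automatically creates query variations when AI is available
Search Method | search_method | hybrid | Combines AI semantic search with keyword matching
Query Interpretation | enable_query_interpretation | Off (false) | You control filters manually
AI Reranking | enable_reranking | On (true) | Improves result quality (adds about 10 seconds of latency)
Recency Boost | recency_bias | 0.3 | Slightly prefers newer content
Score Filter | score_threshold | None | Returns all matches
Response Format | response_type | raw | Returns actual documents, not AI summaries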

Which endpoint to use

Use the GET endpoint for simple searches, where parameters such as query, limit, and offset go in the URL. Use the POST endpoint when you need advanced configuration; it takes a JSON request body.

How Airweave search works

Each search runs through a multi-step pipeline. Understanding the stages explains why the different parameters exist and when to use them:

  1. Query expansion (expansion_strategy): Generate variations of the user query to capture synonyms and related terms.
  2. Retrieval (search_method): Use keyword, neural, or hybrid methods to fetch candidate documents.
  3. Filtering (filter, enable_query_interpretation): Apply structured metadata filters before or during retrieval.
  4. Recency bias (recency_bias): Optionally weight results toward fresher content.
  5. Reranking (enable_reranking): Use AI to reorder the top results for higher precision.
  6. Answer generation (response_type): Return raw documents or synthesize a natural language response.

Defaults are designed to work out of the box, and you can override any stage as needed.

Parameters

Query Expansion

Expands your query to catch related terms and synonyms that may not appear verbatim in your documents. This improves recall when wording differs but meaning is the same.

Options: expansion_strategy

  • auto (default): Uses AI to expand queries when available
  • llm: Always uses AI to create up to 4 query variations
  • no_expansion: Search only for your exact query

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "customer churn analysis",
    "expansion_strategy": "llm"
  }'

Search Method

The search method determines how Airweave retrieves matching documents. You can search by meaning with AI (semantic search), by exact keywords, or with both combined.

Options: search_method

  • hybrid (default): Best of both worlds; finds results by meaning and by exact keywords
  • neural: AI-powered search that understands what you mean, not just what you type
  • keyword: Traditional search that looks for exact word matches

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "authentication flow security vulnerabilities",
    "search_method": "hybrid"
  }'

Filtering Results

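Applies structured metadata filters before the search runs, so only the relevant subset of documents is scanned. This is useful for large datasets or when results must match specific attributes such as source, date, or status.

Parameter: filter. Every condition listed under must has to match, and any result that matches a condition listed under must_not is excluded.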

Example 1: Filter by source

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "deployment issues",
    "filter": {
      "must": [{
        "key": "source_name",
        "match": {"value": "GitHub"}
      }]
    }
  }'

Example 2: Multiple filters

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "customer feedback",
    "filter": {
      "must": [
        {
          "key": "source_name",
          "match": {"any": ["Zendesk", "Intercom", "Slack"]}
        },
        {
          "key": "created_at",
          "range": {
            "gte": "2024-01-01T00:00:00Z"
          }
        }
      ]
    }
  }'

Example 3: Exclude results

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "bug reports",
    "filter": {
      "must_not": [{
        "key": "status",
        "match": {"any": ["resolved", "closed", "done"]}
      }]
    }
  }'
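When you build filters in a script, a small helper keeps the JSON consistent. The sketch below is a hypothetical bash function, source_filter (not part of any Airweave tooling), that prints a filter restricting results to one or more source_name values using the match.any form from Example 2. It assumes source names contain no characters that need JSON escaping, since it does no escaping itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Airweave): prints a "filter" object that
# keeps only results whose source_name is one of the given arguments.
# Assumes names contain no characters that need JSON escaping (" or \).
source_filter() {
  local names="" name
  for name in "$@"; do
    # Add a comma separator before every name except the first.
    names+="${names:+,}\"${name}\""
  done
  printf '{"must":[{"key":"source_name","match":{"any":[%s]}}]}\n' "$names"
}

source_filter Zendesk Slack

# Example use in a request body (the query must not need JSON escaping either):
#   curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
#     -H 'x-api-key: YOUR_API_KEY' -H 'Content-Type: application/json' \
#     -d "{\"query\": \"customer feedback\", \"filter\": $(source_filter Zendesk Intercom Slack)}"
```

For anything beyond simple source lists, a JSON tool such as jq is a safer way to build the body, because it handles escaping for you.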

Query Interpretation

This feature is currently in beta. It can occasionally filter too narrowly, so verify result counts.

Query interpretation allows Airweave to automatically extract structured filters from a natural language query. Instead of manually defining metadata filters, you can simply describe what you are looking for, and Airweave will translate that description into filter conditions.

This feature is useful when you want to let end users search in plain English, for example “open GitHub issues from last week” or “critical bugs reported this month”. Airweave analyzes the query, identifies entities like dates, sources, or statuses, and applies them as filters.

Options: enable_query_interpretation

  • false (default): You control all filters manually
  • true: AI extracts filters from your natural language query

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "open asana tickets from last week",
    "enable_query_interpretation": true
  }'

Temporal Relevance

Learn more about this topic in our blog post: Deep Dive on Temporal Relevance.

Temporal relevance adjusts the result ranking to prefer newer documents. This is valuable for time-sensitive data such as messages, customer feedback, tickets, or news.

The scoring formula adjusts results based on age:

$$S_{\text{final}} = S_{\text{similarity}} \times \bigl(1 - \beta + \beta \cdot d(t)\bigr)$$

where

  • $S_{\text{final}}$ = final relevance score
  • $S_{\text{similarity}}$ = semantic similarity score
  • $\beta$ = recency bias parameter (0 to 1)
  • $d(t)$ = time decay factor (0 = oldest, 1 = newest)
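To see how the recency bias changes a score, the sketch below evaluates the formula above with awk. The function name recency_score is illustrative, and d(t) is passed in directly because this page does not specify how Airweave derives the decay factor from a document's timestamp.

```bash
#!/usr/bin/env bash
# Illustrative only: evaluates S_final = S_similarity * (1 - beta + beta * d(t)).
# The decay value d(t) is supplied by the caller (0 = oldest, 1 = newest);
# how Airweave computes d(t) from timestamps is not modeled here.
recency_score() {
  local similarity="$1" beta="$2" decay="$3"
  awk -v s="$similarity" -v b="$beta" -v d="$decay" \
    'BEGIN { printf "%.4f\n", s * (1 - b + b * d) }'
}

# Same similarity (0.8) for the newest (d = 1) and the oldest (d = 0) document.
for beta in 0.0 0.3 1.0; do
  echo "beta=$beta newest=$(recency_score 0.8 "$beta" 1) oldest=$(recency_score 0.8 "$beta" 0)"
done
```

The newest document always keeps its full score. With the default bias of 0.3, the oldest document keeps 70% of its similarity score (0.8 × 0.7 = 0.56), and at 1.0 its score drops to zero.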

Options: recency_bias (0.0 to 1.0)

  • 0.3 (default): Slightly prefer newer content
  • 0.0: Don’t care about dates, just find the best matches
  • 1.0: Heavily prioritize the newest content

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "project updates",
    "recency_bias": 0.7
  }'

Use this when freshness matters, for example to prioritize the latest bug reports or recent customer complaints over historical ones.

Pagination

Control how many results you get and navigate through large result sets.

Parameters:

  • limit: How many results to return (1-1000, default: 20)
  • offset: How many results to skip (for pagination, default: 0)

# GET search: second page of 50 results (offset=50 skips the first 50)
curl -X GET 'https://api.airweave.ai/collections/your-collection-id/search?query=data%20retention%20policies&limit=50&offset=50' \
  -H 'x-api-key: YOUR_API_KEY'
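To walk through pages in a script, compute the offset from a page number as (page − 1) × limit. The sketch below uses two hypothetical bash helpers, urlencode and page_url, to build the GET URL for a given page. The encoder only handles ASCII queries, an assumption made to keep it free of external dependencies.

```bash
#!/usr/bin/env bash
# Percent-encodes an ASCII string for use in a URL query parameter.
# Unreserved characters (RFC 3986) are kept; everything else becomes %XX.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      # "'$c" makes printf use the character's numeric code.
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: builds the GET search URL for a 1-based page number.
# offset = (page - 1) * limit; limit defaults to 20, matching the API default.
page_url() {
  local collection="$1" query="$2" page="$3" limit="${4:-20}"
  local offset=$(( (page - 1) * limit ))
  printf 'https://api.airweave.ai/collections/%s/search?query=%s&limit=%s&offset=%s\n' \
    "$collection" "$(urlencode "$query")" "$limit" "$offset"
}

# Page 2 with 50 results per page reproduces the URL in the example above.
page_url your-collection-id 'data retention policies' 2 50

# Example request:
#   curl -X GET "$(page_url your-collection-id 'data retention policies' 2 50)" \
#     -H 'x-api-key: YOUR_API_KEY'
```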

Filter by Relevance Score

Set a minimum relevance score to filter out weak matches. Useful when you only want high-quality results.

Options: score_threshold (0.0 to 1.0)

  • None (default): Return all matches
  • 0.7-0.8: Return only high-confidence matches

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "security vulnerability CVE-2024",
    "score_threshold": 0.8
  }'

Use this when you need very reliable matches and can tolerate lower recall, for example in compliance or legal document retrieval.

AI Reranking

AI reranking takes the top set of results from the initial search and reorders them using a large language model. This improves accuracy in cases where keyword or semantic similarity alone might be misleading.

Options: enable_reranking

  • true (default): AI reviews and reorders results for best relevance
  • false: Skip reranking for faster results

Reranking adds about 10 seconds to your search. Turn it off if you need fast results.

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "user authentication methods",
    "enable_reranking": false
  }'

Generate AI Answers

Airweave can return either raw results or a synthesized answer. When set to completion, a large language model generates a natural language response based on the top results, including sources when available.

Options: response_type

  • raw (default): Get the actual results, recommended when you want full control.
  • completion: Get a synthesized answer generated from the top search results.

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "What are our customer refund policies?",
    "response_type": "completion"
  }'

Complete example

Here’s everything together in one search:

curl -X POST 'https://api.airweave.ai/collections/your-collection-id/search' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "customer feedback about pricing",
    "expansion_strategy": "llm",
    "search_method": "hybrid",
    "filter": {
      "must": [{
        "key": "source_name",
        "match": {"any": ["Zendesk", "Slack"]}
      }]
    },
    "recency_bias": 0.5,
    "score_threshold": 0.7,
    "enable_reranking": true,
    "response_type": "raw",
    "limit": 50,
    "offset": 0
  }'

Ready to search?

Try these examples live in our interactive API documentation. You can execute real searches and see responses instantly!