Search
Instant Search
POST /collections/{id}/search/instant
Direct vector search. Use when speed is critical (~0.5sec).
The only parameter unique to instant is retrieval_strategy, which controls how the vector database matches your query:
- hybrid (default): Combines semantic and keyword search via Reciprocal Rank Fusion. Best for most queries.
- semantic: Dense vector cosine similarity. Finds conceptually similar content even when wording differs.
- keyword: BM25 text matching. Only returns content with your exact terms. Use for error codes, identifiers, or known phrases.
In classic and agentic search, the retrieval strategy is chosen automatically.
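As a rough illustration of an instant search call, the sketch below builds (but does not send) the POST request. The endpoint path and retrieval_strategy values come from this page; the base URL, the "query" body field, and the "x-api-key" header are assumptions made for the example, so check the API Reference for the real names.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical; use your deployment's API base URL
VALID_STRATEGIES = {"hybrid", "semantic", "keyword"}


def build_instant_request(collection_id, query, retrieval_strategy="hybrid",
                          api_key="YOUR_API_KEY"):
    """Build (but do not send) a POST request for instant search."""
    if retrieval_strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown retrieval_strategy: {retrieval_strategy!r}")
    # "query" as the body field name is an assumption; retrieval_strategy is documented.
    body = {"query": query, "retrieval_strategy": retrieval_strategy}
    return urllib.request.Request(
        url=f"{BASE_URL}/collections/{collection_id}/search/instant",
        data=json.dumps(body).encode("utf-8"),
        # The auth header name is an assumption, not taken from this page.
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Keyword matching suits exact identifiers such as error codes.
req = build_instant_request("my-collection", "ERR_CONN_RESET",
                            retrieval_strategy="keyword")
print(req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Validating retrieval_strategy on the client is optional; it only surfaces typos before the request leaves your process.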
Classic Search
POST /collections/{id}/search/classic
AI-optimized search strategy. Sensible default for most use cases (~2sec).
An LLM analyzes your query and generates an optimized search strategy.
Agentic Search
POST /collections/{id}/search/agentic
Agent that navigates through your collection to find the best results. Use when recall matters more than latency (<2min).
An AI agent iteratively searches your data using tool calling. It searches with multiple strategies, reads full documents, navigates entity hierarchies (parent/child/sibling), and builds a comprehensive result set.
Two parameters unique to agentic:
- thinking: Enables extended chain-of-thought reasoning before tool calls. Better search strategies, but slower and uses more tokens. Useful for complex or ambiguous queries.
- limit: Unlike instant/classic, where the vector database always returns up to limit results, the agent collects results based on relevance. It may return fewer if it decides there aren’t enough matches. Setting a limit caps the maximum: if the agent collects more, results are truncated. When null (default), there is no cap.
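The limit behavior for agentic search can be pictured with a small sketch. The function name apply_limit is hypothetical and this is only a model of the semantics described above, not the service's implementation.

```python
from typing import List, Optional


def apply_limit(collected: List[str], limit: Optional[int] = None) -> List[str]:
    """Model of the agentic limit: a cap on results, never a target count."""
    if limit is None:
        return list(collected)      # null (default): no cap
    return list(collected[:limit])  # truncate only when more were collected


print(apply_limit(["a", "b", "c"], limit=2))  # truncated to 2
print(apply_limit(["a"], limit=5))            # fewer than limit is fine
print(apply_limit(["a", "b", "c"]))           # no cap
```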
Streaming
POST /collections/{id}/search/agentic/stream
Real-time SSE events as the agent works. Events are delivered as data: {json}\n\n messages. The stream terminates after a done or error event.
started
Emitted once when the search begins.
thinking
Emitted once per iteration after the LLM responds. The thinking field contains extended reasoning (when enabled); the text field contains conversational output before tool calls.
tool_call
Emitted after each tool the agent calls. diagnostics.arguments has the full tool input, diagnostics.stats has the output. The stats shape depends on which tool was called:
search
read
add_to_results
remove_from_results
count
get_children
get_siblings
get_parent
review_results
return_results_to_user
reranking
Emitted after the agent’s collected results are reranked for final ordering.
done
Final event. Contains the full result set and run diagnostics.
error
Emitted when the search fails. Also terminates the stream.
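A minimal sketch of consuming the stream follows. It reads lines, decodes each data: {json} message, and stops after a done or error event. The "type" key used to identify the event is an assumption for this example; the actual payload fields are defined in the API Reference.

```python
import json
from typing import Iterable, Iterator

TERMINAL_EVENTS = {"done", "error"}


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from `data: {json}` lines until done/error."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators and any non-data SSE fields
        event = json.loads(line[len("data:"):].strip())
        yield event
        # Assumption: the event name is carried in a "type" key.
        if event.get("type") in TERMINAL_EVENTS:
            return


# Simulated stream; in practice, iterate over the lines of the HTTP response.
sample = [
    'data: {"type": "started"}\n', "\n",
    'data: {"type": "thinking", "text": "searching"}\n', "\n",
    'data: {"type": "done", "results": []}\n', "\n",
    'data: {"type": "started"}\n',  # never reached: the stream ended at "done"
]
for ev in iter_sse_events(sample):
    print(ev["type"])
```

Because the generator returns on the first terminal event, callers do not need their own stop condition.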
Filters
Filters constrain search results by metadata. They work across all three tiers.
In classic and agentic search, the AI generates its own filters internally. Your filters are AND’d into every search it performs, acting as constraints that cannot be bypassed.
Structure
Filters use a two-level structure:
- Conditions within a group are combined with AND
- Multiple groups are combined with OR
This allows expressions like: (A AND B) OR (C AND D)
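To make the AND-within-group, OR-across-groups rule concrete, here is a small evaluator sketch. The condition shape (field, operator, value keys) and the field names source and channel are assumptions chosen for illustration, and only two operators are covered.

```python
from typing import Any, Dict, List

Condition = Dict[str, Any]  # hypothetical shape: {"field", "operator", "value"}
Group = List[Condition]     # conditions inside a group are ANDed


def condition_holds(condition: Condition, entity: Dict[str, Any]) -> bool:
    actual = entity.get(condition["field"])
    operator = condition["operator"]
    if operator == "equals":
        return actual == condition["value"]
    if operator == "in":
        return actual in condition["value"]
    raise ValueError(f"operator not covered by this sketch: {operator!r}")


def entity_matches(groups: List[Group], entity: Dict[str, Any]) -> bool:
    # OR across groups, AND within each group.
    return any(all(condition_holds(c, entity) for c in group) for group in groups)


# (source is slack AND channel is general) OR (source is notion)
filters = [
    [
        {"field": "source", "operator": "equals", "value": "slack"},
        {"field": "channel", "operator": "equals", "value": "general"},
    ],
    [{"field": "source", "operator": "equals", "value": "notion"}],
]

print(entity_matches(filters, {"source": "slack", "channel": "general"}))  # True
print(entity_matches(filters, {"source": "slack", "channel": "random"}))   # False
print(entity_matches(filters, {"source": "notion"}))                       # True
```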
Filterable Fields
Operators
Examples
Filter by source:
Filter by time range (ISO 8601 timestamps required):
Filter by multiple sources (using in):
Combine groups with OR — Slack messages OR Notion pages:
Navigate hierarchy — find all entities inside a parent:
Validation Rules
- Date fields (created_at, updated_at) require ISO 8601 timestamps (e.g., 2025-01-15T00:00:00Z)
- Ordering operators (greater_than, less_than, etc.) only work on date and numeric fields
- contains only works on text fields
- in and not_in require array values
- Scalar operators (equals, contains, etc.) require a single value, not an array
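The rules above could be checked client-side before sending a request, roughly as sketched here. The field_type argument ("date", "number", or "text") is a hypothetical stand-in for the field metadata, the operator sets list only the operators named on this page, and the ISO 8601 check is deliberately loose.

```python
from datetime import datetime
from typing import Any, List

ORDERING_OPS = {"greater_than", "less_than"}  # partial: the page says "etc."
ARRAY_OPS = {"in", "not_in"}


def is_iso8601(value: Any) -> bool:
    if not isinstance(value, str):
        return False
    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True


def validate_condition(field_type: str, operator: str, value: Any) -> List[str]:
    """Return a list of rule violations (empty when the condition looks valid)."""
    errors = []
    is_array = isinstance(value, list)
    if operator in ARRAY_OPS and not is_array:
        errors.append(f"{operator} requires an array value")
    if operator not in ARRAY_OPS and is_array:
        errors.append(f"{operator} requires a single value, not an array")
    if operator in ORDERING_OPS and field_type not in {"date", "number"}:
        errors.append(f"{operator} only works on date and numeric fields")
    if operator == "contains" and field_type != "text":
        errors.append("contains only works on text fields")
    if field_type == "date":
        for item in (value if is_array else [value]):
            if not is_iso8601(item):
                errors.append(f"not an ISO 8601 timestamp: {item!r}")
    return errors


print(validate_condition("date", "greater_than", "2025-01-15T00:00:00Z"))
print(validate_condition("text", "in", "slack"))
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once.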
Response Format
All three tiers return the same SearchV2Response with a results array. See the API Reference for the full response schema and interactive examples.
Configuring the LLM provider chain
Self-hosted only
This section is only relevant to self-hosted deployments. The managed service ships with providers configured.
Classic and Agentic search call an LLM. Instant search does not — a backend with no LLM configured still answers instant queries, and Classic/Agentic return HTTP 503 until an API key is set.
Default chain
Out of the box, Airweave tries providers in this order:
1. together:zai-glm-5
2. anthropic:claude-sonnet-4.6
The first provider with an API key set that responds successfully handles the request. Subsequent entries are tried only on failure.
Setting API keys
Set at least one of the following environment variables on the backend:
If none are set, the backend boots normally; Classic/Agentic search return 503 Service Unavailable with a message listing these variables.
Overriding the chain
Set LLM_FALLBACK_CHAIN to a comma-separated list of provider:model pairs. Example:
Supported providers: cerebras, groq, anthropic, together, mistral. The full list of models per provider lives in backend/airweave/adapters/llm/registry.py.
The parser validates three things at startup:
- Every provider is a known provider.
- Every model is a known model.
- Every (provider, model) combination exists in the registry (e.g. together:mistral-large is rejected because mistral-large is hosted on Mistral, not Together).
Misconfiguration is caught at startup with an error that lists the accepted values.
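The three startup checks can be sketched as a small parser. This is an illustration only: the REGISTRY below is a hypothetical subset built from the pairs mentioned on this page, and the function name parse_fallback_chain is invented for the example. The real registry lives in backend/airweave/adapters/llm/registry.py.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical subset; not the real registry contents.
REGISTRY: Dict[str, Set[str]] = {
    "together": {"zai-glm-5"},
    "anthropic": {"claude-sonnet-4.6"},
    "mistral": {"mistral-large"},
}


def parse_fallback_chain(raw: str) -> List[Tuple[str, str]]:
    """Parse a comma-separated list of provider:model pairs, validating each."""
    known_models = set().union(*REGISTRY.values())
    chain = []
    for entry in raw.split(","):
        entry = entry.strip()
        provider, sep, model = entry.partition(":")
        if not sep or not provider or not model:
            raise ValueError(f"expected provider:model, got {entry!r}")
        if provider not in REGISTRY:
            raise ValueError(
                f"unknown provider {provider!r}; accepted: {sorted(REGISTRY)}")
        if model not in known_models:
            raise ValueError(
                f"unknown model {model!r}; accepted: {sorted(known_models)}")
        if model not in REGISTRY[provider]:
            raise ValueError(
                f"{model!r} is not available on {provider!r}; "
                f"accepted for {provider!r}: {sorted(REGISTRY[provider])}")
        chain.append((provider, model))
    return chain


print(parse_fallback_chain("together:zai-glm-5, anthropic:claude-sonnet-4.6"))
try:
    parse_fallback_chain("together:mistral-large")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Each error message lists the accepted values, mirroring the startup behavior described above.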
Fallback semantics
- Providers without an API key are silently skipped when the chain is built.
- Providers whose initialization raises are logged and skipped.
- If the resulting chain is empty, the backend wires a null LLM — instant search still works; Classic/Agentic return 503.
- When a call fails in a chained provider, the next one is tried; a circuit breaker temporarily removes providers that recently failed.
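The call-time behavior in the last two bullets might look something like the sketch below. The class name FallbackChain, the fixed cooldown, and the injectable clock are all assumptions for illustration; the page does not describe how the actual circuit breaker decides when a provider is re-admitted.

```python
import time
from typing import Callable, Dict, List, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (name, callable taking a prompt)


class FallbackChain:
    def __init__(self, providers: List[Provider], cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.providers = providers
        self.cooldown = cooldown_seconds  # assumed fixed cooldown per failure
        self.clock = clock
        self.tripped_until: Dict[str, float] = {}

    def call(self, prompt: str) -> str:
        if not self.providers:
            # Empty chain: the backend would answer Classic/Agentic with 503.
            raise RuntimeError("no LLM provider configured")
        last_error = None
        for name, fn in self.providers:
            if self.clock() < self.tripped_until.get(name, float("-inf")):
                continue  # circuit open: provider failed recently
            try:
                return fn(prompt)
            except Exception as exc:
                last_error = exc
                self.tripped_until[name] = self.clock() + self.cooldown
        raise RuntimeError("all providers failed or are cooling down") from last_error


def flaky(prompt: str) -> str:
    raise TimeoutError("provider timed out")


def healthy(prompt: str) -> str:
    return f"answer to {prompt!r}"


demo = FallbackChain([("primary", flaky), ("secondary", healthy)])
print(demo.call("hello"))  # primary fails, secondary answers
print(demo.call("again"))  # primary is skipped while its circuit is open
```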