Search
Instant Search
POST /collections/{id}/search/instant
Direct vector search. Use when speed is critical (~0.5sec).
The only parameter unique to instant is retrieval_strategy, which controls how the vector database matches your query:
- hybrid (default) — Combines semantic and keyword search via Reciprocal Rank Fusion. Best for most queries.
- semantic — Dense vector cosine similarity. Finds conceptually similar content even when wording differs.
- keyword — BM25 text matching. Only returns content with your exact terms. Use for error codes, identifiers, or known phrases.
In classic and agentic search, the retrieval strategy is chosen automatically.
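As a sketch of how an instant-search request body might be assembled: retrieval_strategy and its three values come from the docs above, while the query and limit field names and the request shape as a whole are assumptions, not confirmed API details.

```python
import json

def build_instant_request(query: str, strategy: str = "hybrid", limit: int = 10) -> dict:
    # retrieval_strategy is the one instant-only parameter; validate it
    # client-side before sending.
    allowed = {"hybrid", "semantic", "keyword"}
    if strategy not in allowed:
        raise ValueError(f"retrieval_strategy must be one of {sorted(allowed)}")
    return {"query": query, "retrieval_strategy": strategy, "limit": limit}

# keyword is the right strategy for an exact error code.
body = build_instant_request("connection timeout ERR_5021", strategy="keyword")
print(json.dumps(body))
```

The resulting dict would be POSTed to /collections/{id}/search/instant with whatever authentication your deployment uses.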
Classic Search
POST /collections/{id}/search/classic
AI-optimized search strategy. Sensible default for most use cases (~2sec).
An LLM analyzes your query and generates an optimized search strategy.
Agentic Search
POST /collections/{id}/search/agentic
Agent that navigates through your collection to find the best results. Use when recall matters more than latency (<2min).
An AI agent iteratively searches your data using tool calling. It searches with multiple strategies, reads full documents, navigates entity hierarchies (parent/child/sibling), and builds a comprehensive result set.
Two parameters unique to agentic:
- thinking — Enables extended chain-of-thought reasoning before tool calls. Better search strategies, but slower and uses more tokens. Useful for complex or ambiguous queries.
- limit — Unlike instant/classic, where the vector database always returns up to limit results, the agent collects results based on relevance. It may return fewer if it decides there aren’t enough matches. Setting a limit caps the maximum — if the agent collects more, results are truncated. When null (the default), there is no cap.
Streaming
POST /collections/{id}/search/agentic/stream
Real-time SSE events as the agent works. Events are delivered as data: {json}\n\n messages. The stream terminates after a done or error event.
started
Emitted once when the search begins.
thinking
Emitted once per iteration after the LLM responds. thinking contains extended reasoning (when enabled); text contains any conversational output before tool calls.
tool_call
Emitted after each tool the agent calls. diagnostics.arguments has the full tool input, diagnostics.stats has the output. The stats shape depends on which tool was called:
search
read
add_to_results
remove_from_results
count
get_children
get_siblings
get_parent
review_results
return_results_to_user
reranking
Emitted after the agent’s collected results are reranked for final ordering.
done
Final event. Contains the full result set and run diagnostics.
error
Emitted when the search fails. Also terminates the stream.
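The data: {json}\n\n framing above can be consumed with a small parser. A sketch, assuming each event arrives as a single data: line and that the JSON payload carries a "type" field naming the event (the exact wire shape beyond the framing is not specified above):

```python
import json

def parse_sse(stream_text: str) -> list[dict]:
    """Parse a raw SSE body into JSON events, stopping after done/error."""
    events = []
    for block in stream_text.split("\n\n"):
        block = block.strip()
        if not block.startswith("data:"):
            continue
        event = json.loads(block[len("data:"):].strip())
        events.append(event)
        if event.get("type") in ("done", "error"):
            break  # the stream terminates after a done or error event
    return events

raw = (
    'data: {"type": "started"}\n\n'
    'data: {"type": "tool_call", "tool": "search"}\n\n'
    'data: {"type": "done"}\n\n'
)
print([e["type"] for e in parse_sse(raw)])
```

A production client would read the response body incrementally rather than buffering the whole stream, but the framing logic is the same.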
Filters
Filters constrain search results by metadata. They work across all three tiers.
In classic and agentic search, the AI generates its own filters internally; your filters are AND’d into every search it performs, acting as constraints that cannot be bypassed.
Structure
Filters use a two-level structure:
- Conditions within a group are combined with AND
- Multiple groups are combined with OR
This allows expressions like: (A AND B) OR (C AND D)
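The two-level structure can be sketched as a JSON-style payload. The groups/conditions key names and the field/operator/value condition shape are assumptions chosen to illustrate (A AND B) OR (C AND D), not the documented schema:

```python
# (source = slack AND created_at > 2025-01-01)   <- group 1: A AND B
#   OR
# (source = notion AND created_at < 2024-12-31)  <- group 2: C AND D
filters = {
    "groups": [
        {"conditions": [
            {"field": "source", "operator": "equals", "value": "slack"},
            {"field": "created_at", "operator": "greater_than", "value": "2025-01-01T00:00:00Z"},
        ]},
        {"conditions": [
            {"field": "source", "operator": "equals", "value": "notion"},
            {"field": "created_at", "operator": "less_than", "value": "2024-12-31T00:00:00Z"},
        ]},
    ]
}
```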
Filterable Fields
Operators
Examples
Filter by source:
Filter by time range (ISO 8601 timestamps required):
Filter by multiple sources (using in):
Combine groups with OR — Slack messages OR Notion pages:
Navigate hierarchy — find all entities inside a parent:
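Two of the examples above, sketched with the same assumed groups/conditions shape (the real schema, the parent_id field name, and the entity id are all illustrative assumptions):

```python
# Filter by multiple sources (using in): a single group, one condition
# whose value is an array.
by_sources = {"groups": [{"conditions": [
    {"field": "source", "operator": "in", "value": ["slack", "notion"]},
]}]}

# Navigate hierarchy: match entities whose parent is a known entity id
# ("ent_123" is a made-up placeholder).
inside_parent = {"groups": [{"conditions": [
    {"field": "parent_id", "operator": "equals", "value": "ent_123"},
]}]}
```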
Validation Rules
- Date fields (created_at, updated_at) require ISO 8601 timestamps (e.g., 2025-01-15T00:00:00Z)
- Ordering operators (greater_than, less_than, etc.) only work on date and numeric fields
- contains only works on text fields
- in and not_in require array values
- Scalar operators (equals, contains, etc.) require a single value, not an array
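The array-vs-scalar rules above can be checked client-side before sending a request. A sketch (the field and operator names are the documented ones; the function itself is illustrative, not part of the API):

```python
DATE_FIELDS = {"created_at", "updated_at"}
ARRAY_OPS = {"in", "not_in"}

def check_condition(operator: str, value) -> None:
    """Raise ValueError if the value shape doesn't match the operator."""
    if operator in ARRAY_OPS:
        # in / not_in require array values
        if not isinstance(value, list):
            raise ValueError(f"{operator} requires an array value")
    else:
        # scalar operators (equals, contains, etc.) require a single value
        if isinstance(value, list):
            raise ValueError(f"{operator} requires a single value, not an array")
```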
Response Format
All three tiers return the same SearchV2Response with a results array. See the API Reference for the full response schema and interactive examples.