Using Filters

While vector search excels at finding semantically similar content, filters allow you to narrow results based on exact payload criteria. This combination of semantic search and filtering makes Airweave particularly powerful for finding specific information within large datasets.

Why Filters Matter

Consider searching for “payment processing issues” across your connected systems. Without filters, you might get results from three years ago, from test environments, or from unrelated payment systems. Filters let you specify exactly which subset of data to search within.

Filter Structure

Airweave uses Qdrant’s filtering system, which provides a flexible way to express complex conditions. Filters consist of conditions combined with logical operators.

Basic Filter Anatomy

from qdrant_client.http.models import Filter, FieldCondition, MatchValue

# Named stripe_filter to avoid shadowing Python's built-in filter()
stripe_filter = Filter(
    must=[
        FieldCondition(
            key="source_name",
            match=MatchValue(value="Stripe"),
        )
    ]
)

Logical Operators

Filters support three logical operators that can be combined to create complex queries:

Must (AND)

All conditions in the must array must be satisfied. Think of this as an AND operation.

Filter(
    must=[
        FieldCondition(key="source_name", match=MatchValue(value="GitHub")),
        FieldCondition(key="is_archived", match=MatchValue(value=False)),
    ]
)
# Returns: GitHub items that are NOT archived

Should (OR)

At least one condition in the should array must be satisfied. This creates an OR operation.

Filter(
    should=[
        FieldCondition(key="priority", match=MatchValue(value="high")),
        FieldCondition(key="priority", match=MatchValue(value="critical")),
    ]
)
# Returns: Items with high OR critical priority

Must Not (NOT)

None of the conditions in the must_not array may be satisfied. Use this to exclude results.

Filter(
    must_not=[
        FieldCondition(key="status", match=MatchValue(value="resolved"))
    ]
)
# Returns: All items except those with resolved status
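To make the three operators concrete, here is a minimal pure-Python sketch that applies them to in-memory payloads. It is an illustration of the semantics described above, not how Qdrant or Airweave evaluates filters. The `{"key": ..., "value": ...}` condition shape and the `matches` helper are hypothetical, and the sketch assumes that every clause present in a filter must hold at the same time.

```python
from typing import Any


def matches(payload: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Evaluate a simplified filter against a single payload dict.

    Illustrative only: each condition is {"key": ..., "value": ...}
    and is satisfied when the payload holds exactly that value.
    """
    def check(cond: dict[str, Any]) -> bool:
        return payload.get(cond["key"]) == cond["value"]

    must = flt.get("must", [])
    should = flt.get("should", [])
    must_not = flt.get("must_not", [])

    return (
        all(check(c) for c in must)                          # AND: every condition holds
        and (not should or any(check(c) for c in should))    # OR: at least one holds, if given
        and not any(check(c) for c in must_not)              # NOT: no condition holds
    )


items = [
    {"source_name": "GitHub", "is_archived": False, "status": "open"},
    {"source_name": "GitHub", "is_archived": True, "status": "open"},
    {"source_name": "Jira", "is_archived": False, "status": "resolved"},
]

# GitHub items that are not archived
flt = {
    "must": [{"key": "source_name", "value": "GitHub"}],
    "must_not": [{"key": "is_archived", "value": True}],
}

kept = [item for item in items if matches(item, flt)]
```

With the sample data, only the first item survives: the second is excluded by must_not and the third fails the must condition.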

Common Airweave Fields

Understanding the available fields is crucial for effective filtering. Here are the most commonly used fields across Airweave data sources:

source_name field

The data source identifier. Important: This field is case-sensitive.

# Correct - matches exactly
FieldCondition(key="source_name", match=MatchValue(value="Asana"))

# Incorrect - won't match "Asana"
FieldCondition(key="source_name", match=MatchValue(value="asana"))

Timestamps

Timestamps use ISO 8601 format. Use DatetimeRange for date filtering:

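from qdrant_client.http.models import DatetimeRange, FieldCondition
from datetime import datetime, timezone

# Items created during 2024 (UTC). The upper bound is exclusive so that
# all of 31 December is included, not just its first instant.
FieldCondition(
    key="created_at",
    range=DatetimeRange(
        gte=datetime(2024, 1, 1, tzinfo=timezone.utc),
        lt=datetime(2025, 1, 1, tzinfo=timezone.utc),
    ),
)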
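When a range is meant to cover whole days, an inclusive upper bound set to midnight on the last day leaves out the rest of that day. A half-open range (start inclusive, end exclusive) avoids this. The sketch below uses only the standard library. `year_bounds` is a hypothetical helper, and pairing its result with `gte` and `lt` on DatetimeRange is a suggestion rather than a documented Airweave pattern.

```python
from datetime import datetime, timezone


def year_bounds(year: int) -> tuple[datetime, datetime]:
    """Return (start, end) in UTC for a calendar year; end is exclusive."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return start, end


start, end = year_bounds(2024)

# ISO 8601 representations of the two bounds
start_iso = start.isoformat()  # '2024-01-01T00:00:00+00:00'
end_iso = end.isoformat()      # '2025-01-01T00:00:00+00:00'

# An item created on the afternoon of 31 December falls inside the
# half-open range but after a midnight "lte" bound on the same day.
late_item = datetime(2024, 12, 31, 15, 0, tzinfo=timezone.utc)
inside_half_open = start <= late_item < end
inside_midnight_lte = late_item <= datetime(2024, 12, 31, tzinfo=timezone.utc)
```

Here `inside_half_open` is True while `inside_midnight_lte` is False, which is the difference the exclusive bound is meant to capture.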

Nested Payload

Nested fields can be accessed using dot notation:

# Access nested metadata fields
FieldCondition(key="metadata.project_id", match=MatchValue(value="PROJ-123"))
FieldCondition(key="metadata.assignee", match=MatchValue(value="john@example.com"))
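To see what a dotted key refers to, the following sketch resolves one against a nested dictionary. The payload is made up for illustration and `resolve` is a hypothetical helper. It mirrors the idea of dot notation only and does not cover arrays or any other path syntax.

```python
from typing import Any, Optional


def resolve(payload: dict[str, Any], dotted_key: str) -> Optional[Any]:
    """Walk a nested dict along a dotted key; return None if any part is missing."""
    current: Any = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


# Hypothetical payload with a nested metadata object
payload = {
    "title": "Fix login bug",
    "metadata": {
        "project_id": "PROJ-123",
        "assignee": "john@example.com",
    },
}

project_id = resolve(payload, "metadata.project_id")  # "PROJ-123"
missing = resolve(payload, "metadata.reviewer")       # None
```

Checking a few payloads this way before writing a filter can help confirm that the key path you intend to filter on actually exists in your data.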

Practical Examples

Filter by Source

Find all content from a specific data source:

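# SearchRequest is assumed to come from your Airweave client library.
from qdrant_client.http.models import Filter, FieldCondition, MatchValue

search_request = SearchRequest(
    query="deployment procedures",
    filter=Filter(
        must=[
            FieldCondition(
                key="source_name",
                match=MatchValue(value="Confluence"),
            )
        ]
    ),
)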

Date Range Filtering

Find items created within the last 30 days:

from datetime import datetime, timedelta, timezone
from qdrant_client.http.models import DatetimeRange, FieldCondition, Filter

# Timezone-aware cutoff, computed at the moment the request is built
thirty_days_ago = datetime.now(timezone.utc) - timedelta(days=30)

search_request = SearchRequest(
    query="bug reports",
    filter=Filter(
        must=[
            FieldCondition(
                key="created_at",
                range=DatetimeRange(gte=thirty_days_ago),
            )
        ]
    ),
)

Complex Multi-Source Query

Find high-priority items from multiple support systems:

from qdrant_client.http.models import FieldCondition, Filter, MatchAny, MatchValue

search_request = SearchRequest(
    query="customer complaints",
    filter=Filter(
        must=[
            # Either support system qualifies
            FieldCondition(
                key="source_name",
                match=MatchAny(any=["Zendesk", "Intercom"]),
            ),
            FieldCondition(
                key="priority",
                match=MatchValue(value="high"),
            ),
        ],
        must_not=[
            # Exclude tickets that are already closed
            FieldCondition(
                key="status",
                match=MatchValue(value="closed"),
            )
        ],
    ),
)
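The same filter can also be written as a plain dictionary in the JSON shape used by Qdrant's REST API, which is convenient for logging or for sending over HTTP. The sketch below builds that dictionary with the standard library only. `build_support_filter` is a hypothetical helper, and whether your Airweave client accepts a dictionary in place of a Filter object is an assumption to verify.

```python
import json


def build_support_filter(
    sources: list[str], priority: str, excluded_status: str
) -> dict:
    """Build a filter dict: any of `sources`, exact `priority`, not `excluded_status`."""
    return {
        "must": [
            {"key": "source_name", "match": {"any": sources}},
            {"key": "priority", "match": {"value": priority}},
        ],
        "must_not": [
            {"key": "status", "match": {"value": excluded_status}},
        ],
    }


flt = build_support_filter(["Zendesk", "Intercom"], "high", "closed")

# Serialize for inspection or for use as a request body
flt_json = json.dumps(flt, indent=2)
```

Keeping the builder separate from the request makes it straightforward to reuse the same conditions across several queries.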

Handling Case Sensitivity

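Since source_name is case-sensitive, matching is exact. If a source may appear under several capitalizations, list each variant explicitly with MatchAny:

# Matches only the spellings listed here; other capitalizations will not match
FieldCondition(
    key="source_name",
    match=MatchAny(any=["Slack", "slack", "SLACK"])
)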
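Writing the variants by hand is error-prone, so they can be generated instead. This is a minimal sketch. `case_variants` is a hypothetical helper and covers only the original, lowercase, uppercase, and capitalized spellings, not every possible mix of cases.

```python
def case_variants(name: str) -> list[str]:
    """Return common capitalizations of `name`, without duplicates."""
    candidates = [name, name.lower(), name.upper(), name.capitalize()]
    # dict.fromkeys drops duplicates while preserving first-seen order
    return list(dict.fromkeys(candidates))


slack_variants = case_variants("Slack")    # ['Slack', 'slack', 'SLACK']
github_variants = case_variants("GitHub")  # ['GitHub', 'github', 'GITHUB', 'Github']
```

The resulting list can be passed as the `any` argument of MatchAny. If you control ingestion, storing source names in one canonical form is usually simpler than matching variants at query time.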

Advanced Filtering

Combining Conditions

Create sophisticated filters by nesting conditions:

Filter(
    must=[
        FieldCondition(key="source_name", match=MatchValue(value="GitHub")),
        Filter(
            should=[
                FieldCondition(key="labels", match=MatchAny(any=["bug", "critical"])),
                FieldCondition(key="assignee", match=MatchValue(value="unassigned")),
            ]
        ),
    ]
)
# Returns: GitHub issues that are either labeled as bug/critical OR unassigned

Null and Empty Checks

Check for missing or empty fields:

from qdrant_client.http.models import (
    Filter,
    IsEmptyCondition,
    IsNullCondition,
    PayloadField,
)

# Find items whose assignee field is present but null
Filter(
    must=[
        IsNullCondition(is_null=PayloadField(key="assignee"))
    ]
)

# Find items whose tags field is missing, null, or an empty array
Filter(
    must=[
        IsEmptyCondition(is_empty=PayloadField(key="tags"))
    ]
)
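The two checks are easy to confuse, so the sketch below models their difference in plain Python. It reflects one reading of Qdrant's documented behavior: a null check matches a field that exists with a null value, while an empty check matches a field that is missing, null, or an empty array. The helper functions are hypothetical and are not part of any library.

```python
from typing import Any


def is_null(payload: dict[str, Any], key: str) -> bool:
    """True when the field exists and its value is null (None)."""
    return key in payload and payload[key] is None


def is_empty(payload: dict[str, Any], key: str) -> bool:
    """True when the field is missing, null, or an empty list."""
    return key not in payload or payload[key] is None or payload[key] == []


samples = {
    "missing": {},
    "null": {"tags": None},
    "empty_list": {"tags": []},
    "populated": {"tags": ["bug"]},
}

# Map each sample name to (is_null, is_empty) for the "tags" field
results = {
    name: (is_null(payload, "tags"), is_empty(payload, "tags"))
    for name, payload in samples.items()
}
```

Under this model, only the "null" sample satisfies both checks, and the "populated" sample satisfies neither. It is worth confirming the behavior against your own collection before relying on it.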

Next Steps