Overview
Custom metadata allows you to attach arbitrary key-value pairs to your memories, enabling powerful filtering capabilities when querying. This is useful for categorizing documents by department, priority level, confidentiality, or any other custom attributes relevant to your application.
Text Memories
When adding text memories via /memories/add, include a metadata object with your custom fields:
from hyperspell import Hyperspell
client = Hyperspell(api_key="API_KEY", user_id="YOUR_USER_ID")
memory = client.memories.add(
    text="Q1 planning meeting notes discussing product roadmap.",
    resource_id="meeting-2024-q1",
    metadata={
        "department": "engineering",
        "priority": 5,
        "confidential": True,
        "meeting_date": "2024-01-15T09:00:00Z"
    }
)
print(memory.resource_id)
Supported Value Types
Metadata values can be:
- Strings: "department": "engineering"
- Numbers: "priority": 5 or "score": 0.95
- Booleans: "confidential": true (True in Python)
- ISO 8601 dates: "meeting_date": "2024-01-15T09:00:00Z"
File Uploads
When uploading files via /memories/upload, pass metadata as a JSON string in the form data:
import json
from hyperspell import Hyperspell
client = Hyperspell(api_key="API_KEY", user_id="YOUR_USER_ID")
with open("quarterly_report.pdf", "rb") as file:
    memory = client.memories.upload(
        file=file,
        collection="reports",
        metadata=json.dumps({
            "department": "finance",
            "quarter": "Q1",
            "year": 2024,
            "confidential": True
        })
    )
print(memory.resource_id)
For file uploads, metadata must be passed as a JSON-encoded string since the endpoint uses multipart form data.
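To make the encoding step concrete, the following sketch uses only the standard library to show what the JSON-encoded string looks like and that it decodes back to the same values. It does not call the Hyperspell API; the variable names are illustrative.

```python
import json

metadata = {
    "department": "finance",
    "quarter": "Q1",
    "year": 2024,
    "confidential": True,
}

# json.dumps turns the dict into a plain string suitable for a form field.
# Note that Python's True becomes JSON's lowercase true.
encoded = json.dumps(metadata)
print(encoded)
# {"department": "finance", "quarter": "Q1", "year": 2024, "confidential": true}

# Decoding the string recovers the original values and types.
decoded = json.loads(encoded)
print(decoded == metadata)  # True
```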
When you add a memory with the same resource_id, the metadata is merged with any existing metadata. New keys are added, and existing keys are overwritten:
# First call creates the resource
client.memories.add(
    text="Initial notes",
    resource_id="doc-123",
    metadata={
        "department": "engineering",
        "priority": 3
    }
)
# Second call merges metadata
client.memories.add(
    text="Updated notes",
    resource_id="doc-123",
    metadata={
        "priority": 5,
        "reviewed": True
    }
)
# Result: metadata = { "department": "engineering", "priority": 5, "reviewed": True }
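The merge rule described above (new keys added, existing keys overwritten) behaves like a shallow dictionary update. The sketch below is a local model of that rule, not the service's implementation; `merge_metadata` is a hypothetical helper name.

```python
def merge_metadata(existing: dict, incoming: dict) -> dict:
    """Shallow merge: keys in `incoming` win, all other keys are kept."""
    merged = dict(existing)   # copy so the original dict is not mutated
    merged.update(incoming)   # add new keys, overwrite existing ones
    return merged

first = {"department": "engineering", "priority": 3}
second = {"priority": 5, "reviewed": True}

print(merge_metadata(first, second))
# {'department': 'engineering', 'priority': 5, 'reviewed': True}
```

A practical consequence of this model is that re-adding a memory never drops a key: to change a value, send the key again with the new value.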
Using the Update Endpoint
For more granular updates, use the /memories/update endpoint. It can update metadata, title, collection, or text, and it skips re-indexing when text is not provided. It works with documents from any source (vault, slack, gmail, etc.):
from hyperspell import Hyperspell
client = Hyperspell(api_key="API_KEY", user_id="YOUR_USER_ID")
# Update only metadata without re-indexing
client.memories.update(
    source="vault",
    resource_id="doc-123",
    metadata={
        "priority": 5,
        "reviewed": True
    }
)
# Update multiple fields at once
client.memories.update(
    source="vault",
    resource_id="doc-123",
    title="New Title",
    collection="new-collection",
    metadata={
        "status": "published"
    }
)
# Remove a collection by setting to None
client.memories.update(
    source="vault",
    resource_id="doc-123",
    collection=None
)
The update endpoint only modifies fields you explicitly provide. Fields you don’t include remain unchanged. To remove a collection, explicitly set it to null (None in Python).
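The difference between omitting a field and setting it to null can be modeled locally with a sentinel value. This is a minimal sketch of the semantics, not the SDK's internals; `apply_update` and `_UNSET` are hypothetical names. Metadata is left out of the sketch because this page does not state whether the update endpoint merges or replaces it.

```python
_UNSET = object()  # sentinel meaning "the caller did not pass this field"

def apply_update(doc: dict, *, title=_UNSET, collection=_UNSET) -> dict:
    """Return a copy of `doc` with only the explicitly provided fields changed."""
    updated = dict(doc)
    if title is not _UNSET:
        updated["title"] = title
    if collection is not _UNSET:
        updated["collection"] = collection  # None clears the collection
    return updated

doc = {"resource_id": "doc-123", "title": "Old Title", "collection": "reports"}

# Omitted fields stay as they were.
print(apply_update(doc, title="New Title"))
# {'resource_id': 'doc-123', 'title': 'New Title', 'collection': 'reports'}

# Passing None explicitly is different from omitting the argument.
print(apply_update(doc, collection=None))
# {'resource_id': 'doc-123', 'title': 'Old Title', 'collection': None}
```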
Use the options.filter parameter when querying to filter results by metadata. Filters use MongoDB-style operators, and multiple conditions are combined with AND logic.
Basic Filtering
Filter by exact value match:
response = client.memories.search(
    query="product roadmap",
    sources=["vault"],
    options={
        "filter": {
            "department": "engineering"
        }
    }
)
print(response.documents)
Multiple Conditions
Multiple conditions are combined with AND logic:
response = client.memories.search(
    query="meeting notes",
    sources=["vault"],
    options={
        "filter": {
            "department": "engineering",
            "confidential": False
        }
    }
)
Comparison Operators
Use MongoDB-style operators for advanced filtering:
| Operator | Description | Example |
|---|---|---|
| $eq | Equal to (implicit) | {"priority": 5} or {"priority": {"$eq": 5}} |
| $ne | Not equal to | {"department": {"$ne": "sales"}} |
| $gt | Greater than | {"priority": {"$gt": 3}} |
| $gte | Greater than or equal | {"priority": {"$gte": 3}} |
| $lt | Less than | {"priority": {"$lt": 5}} |
| $lte | Less than or equal | {"priority": {"$lte": 5}} |
| $in | Value in list | {"department": {"$in": ["engineering", "marketing"]}} |
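To see how these operators combine, the sketch below evaluates a filter against a metadata dict locally. It is an illustrative model for reasoning about filters and for unit-testing them, not the service's implementation; `matches` and `OPERATORS` are hypothetical names. It treats a missing key as a non-match, which is a simplifying assumption (in particular, the service's behavior for $ne on a missing key may differ).

```python
import operator

# Map each operator in the table to a Python comparison.
OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}

def matches(metadata: dict, filter_spec: dict) -> bool:
    """Return True if `metadata` satisfies every condition (AND logic)."""
    for key, condition in filter_spec.items():
        # A bare value is shorthand for an implicit $eq.
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        if key not in metadata:
            return False  # assumption: a missing key never matches
        value = metadata[key]
        # Several operators on one key must all hold, e.g. a range.
        for op_name, operand in condition.items():
            if not OPERATORS[op_name](value, operand):
                return False
    return True

docs = [
    {"department": "engineering", "priority": 5, "confidential": False},
    {"department": "sales", "priority": 2, "confidential": True},
    {"department": "marketing", "priority": 8, "confidential": False},
]

range_filter = {"priority": {"$gt": 2, "$lte": 8}}
print([matches(d, range_filter) for d in docs])  # [True, False, True]

combined = {"department": {"$in": ["engineering", "marketing"]}, "confidential": False}
print([matches(d, combined) for d in docs])  # [True, False, True]
```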
Complex Filter Examples
High priority, non-confidential documents:
response = client.memories.search(
    query="important updates",
    sources=["vault"],
    options={
        "filter": {
            "priority": {"$gte": 4},
            "confidential": False
        }
    }
)
Documents from specific departments:
response = client.memories.search(
    query="team updates",
    sources=["vault"],
    options={
        "filter": {
            "department": {"$in": ["engineering", "product", "design"]}
        }
    }
)
Exclude certain categories:
response = client.memories.search(
    query="company news",
    sources=["vault"],
    options={
        "filter": {
            "category": {"$ne": "internal"}
        }
    }
)
Range queries on numeric values:
response = client.memories.search(
    query="documents",
    sources=["vault"],
    options={
        "filter": {
            "priority": {"$gt": 2, "$lte": 8}
        }
    }
)
Combining with Date Filters
Metadata filters can be combined with the after and before date range options:
response = client.memories.search(
    query="quarterly reports",
    sources=["vault"],
    options={
        "after": "2024-01-01",
        "before": "2024-12-31",
        "filter": {
            "department": "finance",
            "confidential": False
        }
    }
)
Filtering the List Endpoint
You can also use metadata filters when listing all memories via /memories/list. Pass the filter as a URL-encoded JSON string:
import json
from hyperspell import Hyperspell
client = Hyperspell(api_key="API_KEY", user_id="YOUR_USER_ID")
# List only engineering documents
response = client.memories.list(
    filter=json.dumps({"department": "engineering"})
)
print(f"Found {len(response.items)} documents")
All the same filter operators work with the list endpoint: $eq, $ne, $gt, $gte, $lt, $lte, and $in.
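If you call the list endpoint over raw HTTP rather than through the SDK, the URL encoding has to be done by hand. The sketch below shows one way to build the query string with the standard library. It assumes the query parameter is named `filter`, matching the SDK argument above; that name and the helper `build_list_query` are assumptions, and no request is actually sent.

```python
import json
from urllib.parse import parse_qs, urlencode

def build_list_query(filter_spec: dict) -> str:
    """Serialize the filter to JSON, then URL-encode it as a query string."""
    # Assumption: the endpoint reads the filter from a `filter` query parameter.
    return urlencode({"filter": json.dumps(filter_spec)})

filter_spec = {"department": "engineering", "priority": {"$gte": 4}}
query_string = build_list_query(filter_spec)
print(f"/memories/list?{query_string}")

# Decoding the query string recovers the original filter.
round_tripped = json.loads(parse_qs(query_string)["filter"][0])
print(round_tripped == filter_spec)  # True
```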
When you query documents, the custom metadata is included in each document’s metadata field alongside system fields:
{
    "documents": [
        {
            "resource_id": "meeting-2024-q1",
            "source": "vault",
            "title": "Q1 Planning Meeting",
            "metadata": {
                "status": "completed",
                "indexed_at": "2024-01-15T10:30:00Z",
                "department": "engineering",
                "priority": 5,
                "confidential": true,
                "meeting_date": "2024-01-15T09:00:00Z"
            },
            "highlights": [...]
        }
    ]
}
You can access this metadata in your code:
response = client.memories.search(
    query="engineering roadmap",
    sources=["vault"],
    options={
        "filter": {"department": "engineering"}
    }
)
for doc in response.documents:
    print(f"Title: {doc.title}")
    print(f"Department: {doc.metadata.get('department')}")
    print(f"Priority: {doc.metadata.get('priority')}")
Best Practices
- Use consistent key names across your application to enable reliable filtering
- Keep metadata flat - nested objects are not supported for filtering
- Use appropriate types - numbers for numeric comparisons, booleans for true/false values
- Plan your taxonomy - decide on standard values for categorical fields like department or category
- Don’t over-filter - metadata filtering happens after semantic search, so overly restrictive filters may exclude relevant results
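The "keep metadata flat" and "use appropriate types" practices can be checked on the client before a memory is sent. The sketch below is one possible pre-flight check, not part of the Hyperspell SDK; `check_metadata` and `looks_like_iso8601` are hypothetical helpers. It flags any value that is not one of the four supported value types, which is a conservative assumption about what the service accepts.

```python
from datetime import datetime

# Dates are sent as ISO 8601 strings, so they are covered by `str`.
ALLOWED_TYPES = (str, int, float, bool)

def looks_like_iso8601(value: str) -> bool:
    """Best-effort check that a string parses as an ISO 8601 date or datetime."""
    try:
        # Older Python versions do not accept a trailing "Z", so normalize it.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False

def check_metadata(metadata: dict) -> list:
    """Return a list of problems; an empty list means the metadata is flat and typed."""
    problems = []
    for key, value in metadata.items():
        if not isinstance(value, ALLOWED_TYPES):
            problems.append(f"{key}: unsupported type {type(value).__name__}")
    return problems

good = {
    "department": "engineering",
    "priority": 5,
    "confidential": True,
    "meeting_date": "2024-01-15T09:00:00Z",
}
print(check_metadata(good))                      # []
print(looks_like_iso8601(good["meeting_date"]))  # True

nested = {"tags": ["a", "b"], "owner": {"name": "x"}}
print(check_metadata(nested))
# ['tags: unsupported type list', 'owner: unsupported type dict']
```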