When Postgres isn't enough: Specialized vector databases for complex filtering in financial services

Quick recap: Last Sunday we covered how embeddings represent meaning and similarity, and explored how semantic spaces behave and what that implies for enterprise search.

When Performance Becomes Non-Negotiable

It's Monday morning. Your RAG system for customer service is live, running on Postgres with pgvector. It works well for 500K documents. Then your CFO announces: "We're launching in three new countries. By Q3, we'll have 15 million customer documents."

Your engineering lead looks worried. "Our current setup hits 200ms latency at 500K vectors. At 15M, we're looking at 800ms-1 second response times."

The product team isn't happy. "Our SLA is 100ms. Customers won't wait a second for answers."

Then Compliance adds another requirement: "We need to ensure that customer service agents in Germany can't retrieve documents from US customers due to data residency laws. And agents handling credit applications can't see collections documents. Can your current system handle that level of filtering?"

You check. Postgres with pgvector can filter by metadata, but complex multi-condition filters (country AND department AND customer_tier AND date_range) slow down dramatically at scale.

This is the moment many BFSI teams discover they've outgrown Postgres and need specialized vector databases. Not because Postgres is bad—it's excellent for getting started. But when you need:

  • Sub-100ms latency at 10M+ vectors

  • Complex metadata filtering that doesn't kill performance

  • Fine-grained access control (customer A's agent can't see customer B's data)

  • Multi-tenancy isolation at the database level

You need purpose-built vector databases. Specifically, Qdrant (for pure performance and filtering) or Weaviate (for rich features and hybrid search).

Here's what I've learned deploying these in banks: The choice isn't "which is better"—it's "which matches your specific constraints." Both excel at different things. Today, I'm showing you when to use each and how to implement PII controls properly.

Why Specialized Vector Databases Matter in BFSI

What Postgres Can't Do (At Scale)

Let's be honest about Postgres limitations:

At 1M+ vectors:

  • Query latency: 100-300ms (acceptable)

  • Filtered queries: 200-500ms (borderline)

  • Complex filters (3+ conditions): 400-800ms (too slow)

At 10M+ vectors:

  • Query latency: 300-800ms (too slow)

  • Filtered queries: 800ms-2s (unacceptable)

  • Complex filters: 2-5s (deal-breaker)

Why this happens: Postgres is a general-purpose database. Its vector index (IVFFlat/HNSW) is good, but not optimized exclusively for vectors. When you add complex metadata filtering, it's doing:

  1. Vector similarity search

  2. SQL WHERE clause evaluation

  3. JOIN operations if needed

At scale, these operations compound.
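A rough way to see the compounding: with post-filtering, returning the top k results that also satisfy a filter requires oversampling the vector search by roughly k divided by the filter's selectivity. A toy estimate (my own illustration, not a pgvector formula):

```python
import math

def oversample_factor(k: int, selectivity: float) -> int:
    """Expected number of nearest neighbours a post-filtering system
    must fetch so that roughly k survive a filter matching
    `selectivity` fraction of rows (simple expected-value estimate)."""
    if not 0 < selectivity <= 1:
        raise ValueError("selectivity must be in (0, 1]")
    return math.ceil(k / selectivity)

# Top-10 answers, but only 1% of rows pass the compliance filter:
# the vector index must surface ~1,000 candidates before filtering.
print(oversample_factor(10, 0.01))   # 1000

# A 3-condition filter with 0.1% combined selectivity needs 10,000.
print(oversample_factor(10, 0.001))  # 10000
```

The more restrictive the filter, the more wasted work a post-filtering design does per query.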

Enter Specialized Vector Databases

Qdrant (Rust-based, performance-first)

  • Optimized exclusively for vector operations

  • Advanced filtering before vector search (not after)

  • 50-100ms latency at 10M+ vectors

  • 2-3x faster than competitors on filtered queries

Weaviate (Go-based, feature-rich)

  • Hybrid database (vectors + graph relationships)

  • GraphQL API for complex queries

  • Built-in vectorization modules

  • Rich ecosystem integrations

When to use which vector DB

Qdrant: When Filtering Performance Matters

Core Architecture

Qdrant (pronounced "quadrant") is written in Rust for memory safety and performance. Key design:

  • Collections = namespaces for vectors (like tables)

  • Payloads = JSON metadata attached to each vector

  • Points = individual vectors + their payloads

The killer feature: Pre-filtering

Most vector databases filter after finding similar vectors:

  1. Find 1000 most similar vectors

  2. Apply filters

  3. Return top 10 that match filters

Qdrant filters before vector search:

  1. Apply filters to narrow candidate set

  2. Find most similar vectors within filtered set

  3. Return top 10

Why this matters: At 10M vectors, if only 100K match your filter (country=Germany, department=customer_service), Qdrant searches within 100K, not 10M. Massive performance difference.
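A brute-force sketch in plain Python (no Qdrant involved, purely illustrative) makes the difference concrete: post-filtering scores the whole corpus, while pre-filtering scores only the candidates that pass the filter.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# Toy corpus of (vector, payload) points, Qdrant-style
points = [
    ([1.0, 0.0], {"country": "DE"}),
    ([0.9, 0.1], {"country": "US"}),
    ([0.8, 0.2], {"country": "US"}),
    ([0.1, 0.9], {"country": "DE"}),
]

def post_filter_search(query, k=2):
    # Score EVERY vector, then filter, then truncate
    ranked = sorted(points, key=lambda p: cosine(p[0], query), reverse=True)
    hits = [p for p in ranked if p[1]["country"] == "DE"][:k]
    return hits, len(points)            # work done = whole corpus

def pre_filter_search(query, k=2):
    # Filter FIRST, score only the survivors
    candidates = [p for p in points if p[1]["country"] == "DE"]
    ranked = sorted(candidates, key=lambda p: cosine(p[0], query), reverse=True)
    return ranked[:k], len(candidates)  # work done = filtered subset

query = [1.0, 0.0]
_, post_work = post_filter_search(query)
_, pre_work = pre_filter_search(query)
print(post_work, pre_work)  # 4 2
```

Both return the same hits here, but the pre-filtering path scored half the vectors; at 10M points with a restrictive filter, that gap is the whole story.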

Implementation: KYC Document Search with Multi-Tenant Isolation

Let's build a system where:

  • Each customer's documents are isolated

  • Agents can only access documents they're authorized for

  • Searches are filtered by document type, date range, customer tier

# qdrant_setup.py
from qdrant_client import QdrantClient
from qdrant_client.models import (
    VectorParams, Distance, PointStruct,
    Filter, FieldCondition, MatchValue, Range
)

# Initialize client (self-hosted or cloud)
client = QdrantClient(
    url="https://your-qdrant-instance.com",
    api_key="your-api-key"  # Or use internal auth
)

# Create collection for KYC documents
client.create_collection(
    collection_name="kyc_documents",
    vectors_config=VectorParams(
        size=1536,  # OpenAI embedding dimension
        distance=Distance.COSINE
    )
)

# Index payload fields for fast filtering
client.create_payload_index(
    collection_name="kyc_documents",
    field_name="customer_id",
    field_schema="keyword"  # Exact match filtering
)

client.create_payload_index(
    collection_name="kyc_documents",
    field_name="document_type",
    field_schema="keyword"
)

client.create_payload_index(
    collection_name="kyc_documents",
    field_name="uploaded_date",
    field_schema="integer"  # Unix timestamp
)

client.create_payload_index(
    collection_name="kyc_documents",
    field_name="customer_tier",
    field_schema="keyword"
)

Key point: Indexing payload fields is critical for fast filtering. Without indexes, Qdrant scans all vectors.
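Conceptually, a payload index is an inverted index: a map from each field value to the IDs of points carrying that value, so an exact-match filter becomes a dictionary lookup instead of a full scan. A stripped-down sketch (class and method names are mine, not Qdrant internals):

```python
from collections import defaultdict

class PayloadIndex:
    """Toy inverted index over one keyword field (illustrative only)."""

    def __init__(self):
        self.value_to_ids = defaultdict(set)

    def add(self, point_id: str, value: str):
        self.value_to_ids[value].add(point_id)

    def lookup(self, value: str) -> set:
        # O(1) average: jump straight to the matching point IDs,
        # instead of scanning every point's payload
        return self.value_to_ids.get(value, set())

idx = PayloadIndex()
idx.add("p1", "passport")
idx.add("p2", "bank_statement")
idx.add("p3", "passport")

print(sorted(idx.lookup("passport")))  # ['p1', 'p3']
```

Without such an index, every filter condition degenerates into the scan this structure exists to avoid.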

Ingesting Documents with PII Controls

# qdrant_ingest.py
from datetime import datetime
import uuid

def ingest_kyc_document(
    document_text: str,
    embedding: list,
    customer_id: str,
    document_type: str,
    customer_tier: str,
    uploaded_by: str
):
    """
    Ingest document with comprehensive metadata for filtering
    """
    
    point = PointStruct(
        id=str(uuid.uuid4()),
        vector=embedding,
        payload={
            # Core identifiers
            "customer_id": customer_id,
            "document_type": document_type,  # 'passport', 'bank_statement', etc.
            
            # Access control
            "uploaded_by": uploaded_by,
            "authorized_departments": ["kyc_team", "compliance"],
            "data_residency": "EU",  # For GDPR compliance
            
            # Business metadata
            "customer_tier": customer_tier,  # 'retail', 'high_net_worth', 'corporate'
            "uploaded_date": int(datetime.now().timestamp()),
            
            # PII indicators (for filtering)
            "contains_ssn": True,
            "contains_financial_data": True,
            "sensitivity_level": "high",
            
            # Original text (for retrieval)
            "text": document_text[:500],  # Preview only
            "full_text_location": f"s3://secure-bucket/kyc/{customer_id}/doc.pdf"
        }
    )
    
    client.upsert(
        collection_name="kyc_documents",
        points=[point]
    )

Notice: We're storing comprehensive metadata for filtering, but NOT full PII in Qdrant—just references to secure storage.

Querying with Complex Filters

Here's where Qdrant shines—complex multi-condition filtering that stays fast:

# qdrant_search.py
from datetime import datetime, timedelta

from qdrant_client.models import Filter, FieldCondition, MatchAny, MatchValue, Range

def search_kyc_documents(
    query_embedding: list,
    current_user_id: str,
    customer_ids: list,  # User is authorized for these customers
    document_types: list = None,
    date_range_days: int = 90,
    customer_tier: str = None
):
    """
    Search with multi-condition filtering for access control
    """
    
    # Build filter conditions
    must_conditions = [
        # User can only search customers they're authorized for
        # (MatchAny for list membership; MatchValue is single-value)
        FieldCondition(
            key="customer_id",
            match=MatchAny(any=customer_ids)
        ),
        
        # Only documents from last N days
        FieldCondition(
            key="uploaded_date",
            range=Range(
                gte=int((datetime.now() - timedelta(days=date_range_days)).timestamp())
            )
        )
    ]
    
    # Optional filters
    if document_types:
        must_conditions.append(
            FieldCondition(
                key="document_type",
                match=MatchAny(any=document_types)
            )
        )
    
    if customer_tier:
        must_conditions.append(
            FieldCondition(
                key="customer_tier",
                match=MatchValue(value=customer_tier)
            )
        )
    
    # Execute search with pre-filtering
    results = client.search(
        collection_name="kyc_documents",
        query_vector=query_embedding,
        query_filter=Filter(must=must_conditions),
        limit=10,
        with_payload=True
    )
    
    return results

The magic: All filtering happens before vector search. At 10M vectors, if filter narrows to 50K candidates, search happens in 50K space, not 10M. Result: 50-80ms latency even with 4-5 filter conditions.

Weaviate: When You Need More Than Vectors

Core Architecture

Weaviate takes a different approach—it's a hybrid database combining:

  • Vector storage and search

  • Graph database (relationships between objects)

  • GraphQL API (flexible querying)

  • Built-in vectorization modules

When Weaviate wins:

  • You need to model relationships (customer → accounts → transactions)

  • Complex queries spanning multiple object types

  • You want GraphQL's flexibility

  • Built-in ML model integration matters

Implementation: Customer Service Knowledge Base

Scenario: Customer service needs to search across:

  • Policy documents

  • Customer FAQs

  • Product information

  • Historical support tickets

And understand relationships: Which FAQs relate to which policies? Which tickets reference which products?

# weaviate_setup.py
# Uses the Weaviate v3 Python client, whose dict-based schema API
# matches the client.schema.create_class() calls below.
import weaviate

client = weaviate.Client("http://localhost:8080")  # Or a cloud instance URL

# Define schema with relationships
policy_schema = {
    "class": "Policy",
    "description": "Internal policy documents",
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
            "description": "Policy title"
        },
        {
            "name": "content",
            "dataType": ["text"],
            "description": "Policy content"
        },
        {
            "name": "department",
            "dataType": ["text"],
            "indexInverted": True  # Enable inverted index for filtering
        },
        {
            "name": "effective_date",
            "dataType": ["date"]
        },
        {
            "name": "sensitivity_level",
            "dataType": ["text"],
            "indexInverted": True
        }
    ],
    "vectorizer": "text2vec-openai",  # Built-in vectorization
    "moduleConfig": {
        "text2vec-openai": {
            "model": "text-embedding-3-small",
            "dimensions": 1536
        }
    }
}

faq_schema = {
    "class": "FAQ",
    "properties": [
        {
            "name": "question",
            "dataType": ["text"]
        },
        {
            "name": "answer",
            "dataType": ["text"]
        },
        {
            "name": "related_policies",
            "dataType": ["Policy"],  # Cross-reference
            "description": "Links to related policy documents"
        },
        {
            "name": "customer_tier",
            "dataType": ["text"],
            "indexInverted": True
        }
    ],
    "vectorizer": "text2vec-openai"
}

client.schema.create_class(policy_schema)
client.schema.create_class(faq_schema)

Key difference from Qdrant: Weaviate models relationships between object types. An FAQ can reference related Policy documents.

Querying with GraphQL and Relationships

# weaviate_search.py

def search_with_relationships(
    query: str,
    user_customer_tier: str  # Caller's access level, e.g. 'retail'
):
    """
    Search that leverages relationships and filters by access level
    """
    
    # GraphQL query (Weaviate's native query language).
    # The cross-reference is resolved inline in the same query,
    # so no second round-trip per FAQ is needed.
    result = client.query.get(
        "FAQ",
        [
            "question",
            "answer",
            "customer_tier",
            # Traverse the cross-reference to the Policy class
            "related_policies { ... on Policy { title content department } }"
        ]
    ).with_near_text({
        "concepts": [query],
        "certainty": 0.7  # Minimum similarity threshold
    }).with_where({
        "path": ["customer_tier"],
        "operator": "Equal",
        "valueText": user_customer_tier
    }).with_limit(5).do()
    
    return result['data']['Get']['FAQ']

The power: One query retrieves FAQs AND their related policies, with filtering applied to both.

PII Control Patterns for Both Databases

Pattern 1: Multi-Tenant Isolation

Qdrant approach: Separate collections per tenant

# Create isolated collection per customer
def create_tenant_collection(tenant_id: str):
    collection_name = f"tenant_{tenant_id}_documents"
    
    client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
    )
    
    # Users can only access their tenant's collection
    return collection_name

Weaviate approach: Use class per tenant or strict filtering

# Filter by tenant in every query
def search_with_tenant_isolation(query, tenant_id):
    result = client.query.get("Document", ["content"]).with_near_text({
        "concepts": [query]
    }).with_where({
        "path": ["tenant_id"],
        "operator": "Equal",
        "valueText": tenant_id
    }).do()
    
    return result

Pattern 2: Field-Level PII Masking

Don't store raw PII in vector databases—store references:

# Good: Store reference, not PII
payload = {
    "customer_id": "CUST_12345",  # ID, not name
    "document_hash": "abc123...",  # Hash, not SSN
    "pii_location": "s3://secure/customer_12345.enc",  # Reference to encrypted storage
    "has_financial_data": True,  # Flag, not actual data
}

# Bad: Storing raw PII
payload = {
    "customer_name": "John Doe",  # ❌ PII in vector DB
    "ssn": "123-45-6789",  # ❌ Highly sensitive
    "account_balance": 50000,  # ❌ Financial data
}
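A small ingestion helper can enforce the references-not-PII rule mechanically: hash or flag sensitive fields before the payload ever reaches the vector database. A sketch, with an assumed (and deliberately incomplete) list of sensitive field names:

```python
import hashlib

# Assumed field list for illustration; a real deployment would
# drive this from a data classification catalog
SENSITIVE_FIELDS = {"ssn", "customer_name", "account_balance"}

def scrub_payload(raw: dict, salt: str = "per-deployment-secret") -> dict:
    """Replace sensitive fields with salted hashes and flags
    so only safe references reach the vector DB payload."""
    safe = {}
    for key, value in raw.items():
        if key in SENSITIVE_FIELDS:
            digest = hashlib.sha256(f"{salt}:{value}".encode()).hexdigest()
            safe[f"{key}_hash"] = digest  # joinable reference, not the value
        else:
            safe[key] = value
    safe["contains_pii"] = bool(SENSITIVE_FIELDS & raw.keys())
    return safe

payload = scrub_payload({"customer_id": "CUST_12345", "ssn": "123-45-6789"})
print("ssn" in payload, "ssn_hash" in payload)  # False True
```

Note that hashing low-entropy values like SSNs only pseudonymizes them: keep the salt secret, and treat the hashes as joinable references, not as encryption.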

Pattern 3: Audit Logging for Compliance

Both databases support logging, but implement differently:

Qdrant:

# Wrap searches with audit logging
import hashlib
from datetime import datetime

def audited_search(query_embedding, user_id, customer_ids):
    # Log query attempt
    audit_log = {
        "timestamp": datetime.now().isoformat(),
        "user_id": user_id,
        "customers_accessed": customer_ids,
        "query_hash": hashlib.sha256(str(query_embedding).encode()).hexdigest()
    }
    
    results = client.search(...)
    
    # Log results
    audit_log["results_count"] = len(results)
    audit_log["result_ids"] = [r.id for r in results]
    
    store_audit_log(audit_log)  # To separate audit database
    
    return results

Performance Comparison: Real Numbers

Based on benchmarks and production deployments:

Query Latency (10M vectors, no filtering):

  • Postgres: 300-800ms

  • Qdrant: 50-100ms

  • Weaviate: 80-150ms

Filtered Query (10M vectors, 3+ conditions):

  • Postgres: 800ms-2s

  • Qdrant: 50-120ms (pre-filtering advantage)

  • Weaviate: 100-250ms

Ingestion Speed (vectors/second):

  • Postgres: 30K/s

  • Qdrant: 45K/s

  • Weaviate: 35K/s

Memory Usage (10M vectors, 1536 dimensions):

  • Postgres: ~60GB

  • Qdrant: ~45GB (with quantization: ~20GB)

  • Weaviate: ~55GB

When to Choose What

Choose Qdrant When:

  • Query performance is critical (sub-100ms SLA)

  • Complex filtering is common (3+ conditions per query)

  • You need 10M+ vectors with good performance

  • Multi-tenant isolation via separate collections works

  • You're comfortable with REST/gRPC APIs

Choose Weaviate When:

  • You need to model relationships between object types

  • GraphQL's flexibility matters for your use case

  • Built-in vectorization simplifies your architecture

  • You want rich ecosystem integrations (LangChain, etc.)

  • Hybrid search (vector + keyword) is important

Stick with Postgres When:

  • You have < 1M vectors

  • Your team knows Postgres well

  • You value existing infrastructure integration

  • Query latency of 100-300ms is acceptable

  • You need transactional guarantees with other data

Looking Ahead: 2026-2030

2026-2027: Qdrant and Weaviate both adding built-in PII detection and auto-masking features

2027-2028: Native integration with data catalogs (tracking what PII exists where)

2028-2030: Federated vector search (query across multiple isolated databases without moving data)

The trend: Vector databases absorbing more compliance features, making PII control easier by default.

HIVE Summary

Key takeaways:

  • Specialized vector databases matter at scale—Postgres is excellent until 1-5M vectors, then Qdrant (performance) or Weaviate (features) become necessary for sub-100ms latency

  • Qdrant excels at filtered queries through pre-filtering—applies filters before vector search, giving 4x performance advantage on complex multi-condition queries at scale

  • Weaviate shines for relationship-rich data—hybrid database combining vectors + graph enables modeling connections between objects (FAQs → Policies → Products)

  • PII control requires multiple layers—network isolation + database filtering + data minimization + audit logging. Never store raw PII in vector databases—use references to secure storage

Start here:

  • Evaluating options: Benchmark with YOUR data at YOUR expected scale. Performance varies dramatically based on embedding dimensions, filter complexity, data distribution

  • Starting with Qdrant: Focus on collection design and payload indexing. Properly indexed payloads are the difference between 50ms and 500ms filtered queries

  • Starting with Weaviate: Design your schema carefully. Relationships are powerful but add complexity—only model relationships you'll actually query

Looking ahead (2026-2030):

  • Built-in PII detection and auto-masking

  • Native data catalog integration

  • Federated vector search across isolated databases

  • More BFSI-specific compliance features out of the box

Open questions:

  • How to handle vector database backups that contain (references to) sensitive data?

  • Best practices for key rotation when using encrypted payloads?

  • How to audit vector similarity itself (why did doc A rank higher than doc B)?

Jargon Buster

Pre-Filtering: Applying metadata filters before vector similarity search, not after. Dramatically faster for queries with restrictive filters because it searches a smaller candidate set.

Collection: Qdrant's equivalent of a database table—a namespace for storing related vectors with similar structure. Often used for multi-tenant isolation.

Payload: JSON metadata attached to each vector in Qdrant. Contains business data, access control flags, and searchable attributes.

GraphQL: Query language used by Weaviate for flexible, nested queries. Allows requesting exactly the fields you need and traversing relationships.

Hybrid Search: Combining vector similarity search with traditional keyword search. Weaviate excels at this—find semantically similar documents that also contain exact keywords.

Quantization: Compression technique reducing vector precision to save memory. Example: 32-bit floats → 8-bit integers. Qdrant supports this, reducing memory 4x with minimal accuracy loss.
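The 4x figure follows directly from the element sizes (32-bit float to 8-bit integer). A quick numpy sketch of scalar quantization (illustrative; Qdrant's production quantizer is more sophisticated):

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 1536)).astype(np.float32)  # float32 = 4 bytes/dim

# Scalar quantization: map each float into one of 255 int8 buckets
scale = float(np.abs(vectors).max()) / 127.0
quantized = np.round(vectors / scale).astype(np.int8)           # int8 = 1 byte/dim

print(vectors.nbytes // quantized.nbytes)  # 4

# Reconstruction error is bounded by the bucket width
max_error = float(np.abs(vectors - quantized.astype(np.float32) * scale).max())
print(max_error <= scale)  # True
```

In practice the buckets are computed per segment and combined with rescoring on the original vectors, which is why accuracy loss stays minimal.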

HNSW: Hierarchical Navigable Small Worlds—graph-based algorithm for fast approximate nearest neighbor search. Used by both Qdrant and Weaviate for vector indexing.

Multi-Tenancy: Isolating different customers' data in shared infrastructure. Achieved via separate collections (Qdrant) or strict filtering (Weaviate).

Fun Facts

On Qdrant's Rust Advantage: When a major European bank benchmarked Qdrant against competitors, they discovered Qdrant used 40% less memory for the same 10M vector dataset. The reason? Rust's zero-cost abstractions and explicit memory management meant no garbage collection pauses and more efficient data structures. For BFSI where infrastructure costs matter (storing 100M+ vectors across multiple environments), this 40% memory reduction translated to $200K+ annual savings in cloud costs. The lesson: Language choice matters at scale—Rust's performance benefits aren't just theoretical.

On Weaviate's GraphQL Surprise: A fintech implementing Weaviate for customer support discovered their support agents were constructing increasingly complex GraphQL queries through the UI—queries that joined 4-5 object types with nested filters. The agents didn't know they were writing GraphQL; the UI abstracted it. But the ability to say "find FAQs related to policies about mortgages for high-net-worth customers updated in the last quarter" in one query (impossible with simple REST APIs) reduced their average resolution time by 35%. Sometimes the "over-engineered" solution (GraphQL vs REST) delivers unexpected workflow improvements.

For Further Reading

  1. Qdrant Documentation (Qdrant, 2025)
    https://qdrant.tech/documentation/
    Comprehensive guide to collections, filtering, and optimization

  2. Weaviate Vector Database Docs (Weaviate, 2025)
    https://weaviate.io/developers/weaviate
    Schema design, GraphQL queries, and integration patterns

  3. Vector Database Benchmark (VectorDBBench, 2025)
    https://github.com/zilliztech/VectorDBBench
    Independent benchmarks comparing performance across databases

  4. Qdrant vs Weaviate Comparison (Enterprise AI Blog, 2025)
    https://research.aimultiple.com/vector-database-comparison/
    Detailed comparison with use case recommendations

  5. BFSI Vector Database Best Practices (AWS Financial Services, 2024)
    https://aws.amazon.com/financial-services/resources/
    Architecture patterns for regulated environments

Next Wednesday: SHAP + LIME Explainability Dashboard Toolkit—generating visual feature attributions suitable for audit and regulatory review. You'll see how to build explainability systems that satisfy compliance requirements while remaining practical for production use.

This is part of our ongoing work understanding AI deployment in financial systems. If you've deployed Qdrant or Weaviate in production, I'd love to hear what filtering patterns and PII controls work in your environment.

— Sanjeev @AITechHive Team
