Quick Recap: Banks have security architecture for databases (firewalls, encryption, access controls). Then they bolt an AI model onto the side with zero isolation, zero credential management, zero threat modeling. When the model gets compromised, it doesn't just expose data—it can make decisions. Here's how to think about AI security that doesn't just bolt on existing playbooks.
The Model No One Thinks Is a Security Risk
Your credit risk model lives in a container in a shared Kubernetes cluster. It reads from a database, makes decisions, logs the output. Security team says: "It's behind a firewall. It's fine."
Then a researcher publishes a paper: "Adversarial inputs can manipulate model decisions."
Your team reads it and thinks: "Cool research. Doesn't apply to us."
Then someone asks: "What if an attacker sends the model a crafted application with special values that make the model approve bad borrowers?"
Your team: "The model can't be 'hacked.' It's just software."
But here's the thing: A compromised database leaks data. A compromised model makes decisions that leak value.
Attacker scenario:
They craft a loan application that exploits model weakness
Model approves a $500K loan that should've been denied
That's a $500K loss, not a data breach
They repeat this 100 times and extract $50M in bad loans
That's not a data breach. That's using the model as an attack vector.
Banks with mature security thinking treat models differently than databases. Not less secure. Differently secure. Because the attack surface is different and the damage is different.
The Threat Model for AI Systems (What Can Actually Go Wrong)
Traditional security focuses on: "How do attackers get access to data?"
AI security needs to also focus on: "How do attackers manipulate decisions?"
Threat 1: Adversarial Inputs (Manipulated Decisions)
What it is: Attacker crafts input that causes model to make wrong decision.
Specific example:
Legitimate applicant: Income $100K, debt-to-income 30%, credit score 720 → Denied (below threshold)
Attacker modifies to: Income $100K, debt-to-income 23%, credit score 720 → Approved (changes DTI slightly in favorable direction)
If they can forge/manipulate DTI field, they approve themselves
Affected models: Any model making high-value decisions
Credit approvals ($500K+)
Fraud verdicts (blocks legitimate transactions)
Sanctions screening (clears bad entities)
Mitigation:
Input validation (DTI must be calculated from verified docs, not user-provided)
Anomaly detection (flag inputs that are statistically unusual)
Rate limiting (can't submit 100 applications with slight variations per second)
Challenger model (run second model on same input, compare)
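The challenger-model mitigation can be sketched in a few lines. This is a minimal illustration, not a production design: the two scoring functions below are hypothetical stand-ins for real models, and the threshold and disagreement gap are invented for the example.

```python
# Challenger-model check: score the same application with two independently
# built models and route to human review when they disagree too much.

def champion_score(app: dict) -> float:
    # Hypothetical primary model: weighted score in [0, 1].
    return 0.5 * (app["credit_score"] / 850) + 0.5 * (1 - app["dti"])

def challenger_score(app: dict) -> float:
    # Hypothetical second model trained separately, different weights.
    return 0.7 * (app["credit_score"] / 850) + 0.3 * (1 - app["dti"])

def decide(app: dict, threshold: float = 0.6, max_gap: float = 0.15) -> str:
    a, b = champion_score(app), challenger_score(app)
    if abs(a - b) > max_gap:
        return "REVIEW"          # models disagree: send to a human, log it
    return "APPROVED" if a >= threshold else "DENIED"
```

An adversarial input crafted against one model's decision boundary is unlikely to fool a second model with a different boundary, which is why the disagreement itself is the signal.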
Threat 2: Model Extraction (Stealing the Model)
What it is: Attacker repeatedly queries model, learns how it works, then replicates it.
Specific example:
Attacker submits 10,000 test applications with known outcomes
Maps input → decision → learns model's decision boundary
Now they know exactly how to structure their application to get approved
Or they sell the extracted model to competitors
Affected models: Any model with API access
Credit decision APIs
Fraud detection APIs
Any model behind a web service
Mitigation:
Query limits (rate limit API calls per user)
No direct confidence scores (API says approved/denied, not % certainty)
Differential privacy (add noise to prevent exact reconstruction)
Monitoring (flag accounts making thousands of queries)
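The query-limit and monitoring mitigations reduce to tracking per-user call volume over a sliding window. A minimal sketch, with illustrative limits (the real numbers depend on your legitimate traffic):

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Sliding-window query monitor: block (and in practice, alert on) any
# account whose API volume looks like a model-extraction attempt.

class QueryMonitor:
    def __init__(self, max_queries: int = 100, window_s: float = 86_400.0):
        self.max_queries = max_queries    # illustrative: 100 calls/day
        self.window_s = window_s
        self._calls = defaultdict(deque)  # user_id -> call timestamps

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._calls[user_id]
        while q and now - q[0] > self.window_s:
            q.popleft()                   # drop calls outside the window
        if len(q) >= self.max_queries:
            return False                  # over the limit: block / flag
        q.append(now)
        return True
```

In production this usually lives in the API gateway, not the model service, so a flood of queries never reaches the model at all.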
Threat 3: Model Poisoning (Corrupting Training Data)
What it is: Attacker inserts malicious data into training set, model learns bad patterns.
Specific example:
Attacker floods training data with female applicants marked as "bad" (all defaulted)
Model trains on this data → learns women are riskier
Model systematically denies women even if creditworthy
Looks like discrimination, actually sabotage
Affected models: Any model retrained on fresh data
Models you retrain monthly/quarterly
Models using real applicant outcomes as labels
Models trained on crowdsourced data
Mitigation:
Data validation (detect outliers in training data)
Training data versioning (track what data trained each model)
Outlier removal (filter extreme values before training)
Monitoring (compare new model to previous version, flag if behavior changes drastically)
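Two of the mitigations above, outlier removal and comparing the new model to the old, can be sketched as simple pre-retrain checks. The z-score cutoff and 5-point shift threshold are illustrative, and the row format is a hypothetical simplification:

```python
import statistics

# Pre-retrain sanity checks: filter extreme outliers from training rows,
# and flag a retrain if the label distribution shifted suspiciously.

def filter_outliers(rows: list, field: str, z_max: float = 4.0) -> list:
    vals = [r[field] for r in rows]
    mu, sd = statistics.mean(vals), statistics.pstdev(vals)
    if sd == 0:
        return rows
    return [r for r in rows if abs(r[field] - mu) / sd <= z_max]

def label_shift(old_rows: list, new_rows: list,
                label: str = "default", max_shift: float = 0.05) -> bool:
    old_rate = sum(r[label] for r in old_rows) / len(old_rows)
    new_rate = sum(r[label] for r in new_rows) / len(new_rows)
    return abs(new_rate - old_rate) > max_shift  # True => halt, investigate
```

A poisoning attack that floods the training set with mislabeled rows tends to show up as exactly this kind of distribution shift before it ever shows up in production decisions.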
Threat 4: Membership Inference (Learning Who Applied)
What it is: Attacker queries model to infer whether specific person's application was in training data.
Specific example:
Attacker submits application with John Doe's real info
Model behaves differently for John Doe than for random person
Attacker infers: John Doe's application was in training data (he applied and was approved/denied)
Privacy leak: Their application history revealed
Affected models: Any model trained on personal data
All credit/fraud/AML models
Mitigation:
Differential privacy (add noise to model, prevents exact inference)
Output randomization (sometimes return different decision even for same input)
Rate limiting (can't probe same identity repeatedly)
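The differential-privacy mitigation is often implemented as output perturbation: add calibrated Laplace noise to the raw score before returning it, so repeated probes can't pin down the exact boundary for one identity. A minimal sketch; the epsilon and sensitivity values are illustrative, and real calibration depends on the model and your privacy budget:

```python
import math
import random

# Output-perturbation sketch: Laplace noise on the 0-100 score.

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Inverse-CDF sampling of the Laplace distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def noisy_score(raw_score: float, epsilon: float = 1.0,
                sensitivity: float = 1.0, rng: random.Random = None) -> float:
    rng = rng or random.Random()
    noised = raw_score + laplace_noise(sensitivity / epsilon, rng)
    return min(100.0, max(0.0, noised))  # clamp to the score range
```

The trade-off is explicit: more noise (smaller epsilon) means stronger privacy but a less useful score, which is why regulators will eventually need to clarify acceptable privacy-loss levels.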
Threat 5: Supply Chain Compromise (Model Comes From Bad Source)
What it is: Model you bought/downloaded has backdoor or is deliberately poor quality.
Specific example:
You use open-source model from GitHub
Attacker contributed code with backdoor (approves applicants with special SSN pattern)
You deploy it without realizing
Bad actors use the pattern to approve themselves
Affected models: Any model from external source
Open-source models
Cloud-hosted models (ML-as-a-service)
Third-party models
Mitigation:
Model provenance (where did this model come from? Who trained it?)
Third-party audit (have external party validate model behavior)
Testing (does model have expected accuracy? Unusual biases?)
Sandboxing (run new models in isolated environment first)
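Model provenance can start as simply as pinning the artifact's hash. A minimal sketch: the manifest, model name, and artifact bytes below are all hypothetical (the digest shown is just the SHA-256 of the placeholder bytes used in the example):

```python
import hashlib

# Provenance check: refuse to deploy any model artifact whose SHA-256
# digest doesn't match a manifest of approved builds.

APPROVED_DIGESTS = {
    # Hypothetical entry; this is sha256(b"test"), used as the placeholder.
    "credit_risk_v2.3":
        "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def is_approved(model_name: str, artifact_bytes: bytes) -> bool:
    expected = APPROVED_DIGESTS.get(model_name)
    return expected is not None and digest(artifact_bytes) == expected
```

This doesn't prove the approved build is benign, that's what the audit and sandbox steps are for, but it does guarantee the thing you tested is the thing you deployed.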

Building Secure AI Architecture: The Layers
Layer 1: Input Isolation
Models are data-hungry and stateless. They'll process whatever you feed them. If the input is poisoned, the decision is poisoned.
Design principle: Never trust user input. Validate everything before the model sees it.
User Application
↓ (may contain attacks)
[INPUT VALIDATION LAYER]
✓ Verify format (is income a number?)
✓ Check bounds (income between $0 and $5M)
✓ Validate source (came from secure form, not direct API)
✓ Detect anomalies (income 1000x higher than average)
✓ Rate limit (one app per person per day, not 100 per second)
↓ (clean input only)
[MODEL]
Implementation:
Schema validation (fields, types, bounds)
Outlier detection (flag unusual values)
Rate limiting (API gateway, not model itself)
Audit logging (track every input before model)
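The schema and bounds checks in the layer above can be sketched as one validation pass that runs before anything reaches the model. Field names and bounds here are illustrative:

```python
# Input-validation sketch: schema, type, and bounds checks before the model.

BOUNDS = {
    "income": (0, 5_000_000),   # annual income in dollars (illustrative)
    "dti": (0.0, 1.0),          # debt-to-income ratio
    "credit_score": (300, 850),
}

def validate(app: dict) -> list:
    errors = []
    for field, (lo, hi) in BOUNDS.items():
        value = app.get(field)
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            errors.append(f"{field}: missing or not numeric")
        elif not lo <= value <= hi:
            errors.append(f"{field}: {value} outside [{lo}, {hi}]")
    return errors               # empty list => safe to pass to the model
```

Rejected applications should be logged, not silently dropped: a burst of out-of-bounds inputs is itself a detection signal.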
Layer 2: Model Isolation
Models live in containers. But containers share networks, storage, possibly compute. If one model is compromised, what can it access?
Design principle: Model should not see data it doesn't need.
Model Container (Least Privilege):
✓ Reads: Only current application data
✗ Cannot: Write to database, read historical data, call other APIs
✓ Credentials: Scoped tokens (can only query specific table)
✗ Cannot: SSH to other containers, see environment variables from other services
Implementation:
Service account with minimal permissions (read current data only)
Network policy (container can call decision DB, can't call customer database)
Secrets management (credentials in Vault, not in container)
Immutable container (no runtime changes, can't be modified during execution)
Layer 3: Decision Logging (Everything Is Auditable)
Models make decisions. Decisions drive value. You need to know what the model saw, what it decided, when it changed.
Design principle: Every decision is traced and immutable.
Application
↓
[MODEL] → Decision: APPROVED, Confidence: 82%, Factors: [Income, Credit Score, DTI]
↓
[EVIDENCE LOG] (Immutable, timestamped, signed)
{
  "timestamp": "2025-07-15T14:23:45Z",
  "applicant_id": 12345,
  "model_version": "v2.3",
  "inputs_hash": "abc123def456",   // hash of inputs (can't be faked)
  "decision": "APPROVED",
  "confidence": 0.82,
  "top_factors": {"Income": 0.35, "Credit": 0.28, "DTI": 0.18},
  "signature": "<digital-signature-proving-not-tampered>"
}
↓
[AUDIT SYSTEM]
• Monthly: Verify decision integrity (sample 1% of decisions, recalculate, compare)
• Quarterly: Audit for anomalies (unusual decision patterns)
• On-demand: Regulator audit ("Show me all decisions from Jan-Mar")
Implementation:
Hash inputs before model (prove they weren't changed after)
Sign decisions with private key (prove they came from real model, not faked)
Store in immutable log (blockchain-style or append-only database)
Regular validation (replay decisions through audit logic)
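The hash-and-sign steps above can be sketched with stdlib primitives. This uses a symmetric HMAC for brevity; a production system would use asymmetric signatures and an append-only store, and the key below is a placeholder that belongs in a vault:

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"replace-with-managed-secret"  # placeholder; keep in a vault

def log_decision(applicant_id: int, inputs: dict, decision: str,
                 model_version: str) -> dict:
    record = {
        "timestamp": time.time(),
        "applicant_id": applicant_id,
        "model_version": model_version,
        # Hash the inputs so later tampering with them is detectable.
        "inputs_hash": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()).hexdigest(),
        "decision": decision,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()
    return record

def verify(record: dict) -> bool:
    body = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()
    return hmac.compare_digest(expected, record["signature"])
```

Anyone who changes a logged decision after the fact breaks the signature, which is exactly what the monthly integrity check looks for.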
Layer 4: Monitoring & Detection
You can't prevent all attacks. But you can detect when something strange happens.
Detection signals:
Unusual approval patterns (suddenly approving 80% instead of 50%)
Unusual inputs (income values 10x outside normal range)
Unusual query patterns (API user submitting 1000 queries/day instead of 10)
Unusual model behavior (confidence drops, accuracy changes)
Response:
Detection → Alert → Investigation → Action
Examples:
Approval rate jumped 15% → Halt model, investigate data/config change
API query volume 100x normal → Block user, check for extraction attack
Model accuracy dropped 20% → Halt model, check for poisoning
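The approval-rate signal above reduces to comparing a recent window against a baseline. A minimal sketch; the 15-point trigger mirrors the example but is illustrative:

```python
# Approval-rate monitor: halt the model when the recent approval rate
# jumps too far above the baseline.

def approval_rate(decisions: list) -> float:
    return sum(d == "APPROVED" for d in decisions) / len(decisions)

def should_halt(baseline: list, recent: list,
                max_jump: float = 0.15) -> bool:
    return approval_rate(recent) - approval_rate(baseline) > max_jump
```

The same pattern, baseline window versus recent window with a trigger threshold, applies to the other signals: input distributions, query volumes, and confidence drift.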
Threat Modeling Exercise: Credit Risk Model
Let's walk through how to think about security for a specific model.
Model: Credit risk score (0-100). The applicant submits an application, the model returns a score, and a loan officer makes the decision based on that score.
Step 1: Identify Assets
Asset 1: The decision itself (loan officers act on the model's score)
Asset 2: Trust in the score (it's taken at face value, not independently verified)
Asset 3: The loan capital at risk (a bad approval costs up to $500K)
Step 2: Identify Threats
Threat A: Attacker submits fake application, model approves
Threat B: Attacker queries model 10K times, learns decision logic, crafts perfect exploit
Threat C: Attacker injects bad training data, model learns to approve based on attacker's criteria
Threat D: Model gets compromised, starts approving all applications
Step 3: Evaluate Likelihood & Impact
| Threat | Likelihood | Impact | Risk |
|---|---|---|---|
| Threat A (fake app) | Medium (easy to forge basic data) | High ($500K) | HIGH |
| Threat B (extraction) | Low (needs many queries) | Medium (knowledge of system) | MEDIUM |
| Threat C (poisoning) | Low (hard to inject training data) | High (systematic bias) | MEDIUM |
| Threat D (compromise) | Low (model is just a container) | Critical ($50M exposure) | MEDIUM |
Step 4: Design Mitigations
For Threat A (fake application):
Mitigation: Input validation from trusted source only
Application must come from verified loan officer system (not direct API)
Income verified against bank's systems (not user-provided)
Add anomaly detection (flag apps more than 5 standard deviations outside normal)
Cost: Low (just validation layer)
Residual risk: Low
For Threat B (extraction):
Mitigation: Rate limit API, hide confidence
Max 100 queries per user per day
API returns score only (0-100), not confidence %
Cost: Low (API gateway)
Residual risk: Medium (someone determined could still extract over time)
For Threat C (poisoning):
Mitigation: Validate training data, monitor model behavior
Check training data for outliers before retraining
Compare new model to old model, flag if accuracy/fairness changes >5%
Version everything (data, model, config)
Cost: Medium (monitoring infrastructure)
Residual risk: Low
For Threat D (compromise):
Mitigation: Assume compromise will happen, focus on detection
Evidence logging (every decision is signed and logged)
Regular audits (monthly: recalculate 1% of decisions, compare)
If discrepancy found: Halt model immediately
Cost: Medium (audit infrastructure)
Residual risk: Low (detection is fast)
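The "recalculate 1% of decisions" audit can be sketched as a sampled replay: re-score logged entries with the current model and surface any disagreement. The `score` function below is a hypothetical stand-in for the real model, and the log format is a simplification:

```python
import random

# Replay audit: re-score a random sample of logged decisions and return
# any entries where the recomputed decision disagrees with the log.

def score(app: dict) -> str:
    # Hypothetical stand-in for the production model.
    return "APPROVED" if app.get("credit_score", 0) >= 700 else "DENIED"

def audit(log: list, sample_frac: float = 0.01, seed: int = 0) -> list:
    rng = random.Random(seed)
    n = max(1, int(len(log) * sample_frac))
    discrepancies = []
    for entry in rng.sample(log, n):
        if score(entry["inputs"]) != entry["decision"]:
            discrepancies.append(entry)  # mismatch => halt model, investigate
    return discrepancies
```

A non-empty result is the "discrepancy found" condition: the logged decision could not have come from the model you think is running.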
Looking Ahead (2026-2030)
2026-2027: Adversarial robustness becomes regulatory requirement.
Fed expects: "How do you handle adversarial inputs?"
Banks conduct red team exercises (try to manipulate models)
Model robustness testing becomes part of validation
2027-2028: Supply chain security gets serious.
Regulators require provenance for all models (where did it come from?)
Banks can't use random open-source models
Third-party model audit becomes mandatory
2028-2030: Differential privacy and federated learning emerge.
Privacy-preserving techniques become standard (not nice-to-have)
Models trained without ever seeing raw data (federated learning)
Regulatory guidance clarifies acceptable privacy loss levels

HIVE Summary
Key takeaways:
AI security is fundamentally different from database security. Compromised database = data leak. Compromised model = manipulated decisions. The threats are different.
Threat modeling for AI requires thinking about: adversarial inputs, model extraction, data poisoning, membership inference, supply chain compromise. Each needs specific mitigation.
Secure architecture has five layers: Input isolation (validate everything), Model isolation (least privilege), Decision logging (immutable audit trail), Monitoring (detect anomalies), Response (quick halt).
Evidence logging is critical. Every decision must be timestamped, signed, and immutable. This enables audits and proves the model wasn't tampered with.
Defense assumes compromise will happen. Focus on detection and rapid response, not just prevention.
Start here:
If you have models but no threat model: Spend a day on threat modeling. For each model, ask: "What decisions does this make?" "What's the worst-case damage?" "What attacks could cause that?" That's your threat model.
If you have no input validation: Add it immediately. Models process whatever you feed them. Garbage in = garbage out (or worse, adversarial decisions).
If you don't have decision logging: Build immutable evidence logging now. Every decision needs to be auditable, non-repudiable, and timestamped. This is both a security and compliance requirement.
Looking ahead (2026-2030):
Adversarial robustness testing becomes regulatory requirement. Regulators will ask: "How do you test for adversarial inputs?"
Supply chain security becomes mandatory. You can't use random models. Provenance and audit required.
Privacy-preserving techniques (differential privacy, federated learning) shift from research to production requirement.
Open questions:
How much adversarial robustness is "enough"? (Regulators still figuring this out. Probably around 98%+ robustness expected by 2028.)
Can you use open-source models safely? (Yes, but require code review, sandboxed testing, validation against your data.)
What's the acceptable false positive rate for anomaly detection? (High = too many false alarms, team ignores alerts. Low = might miss real attacks. Usually aim for <1% false positive.)
Jargon Buster
Adversarial Input: Maliciously crafted input designed to cause model to make wrong decision. Example: Application with special values that fool model into approving bad borrower. Why it matters in BFSI: High-value decisions can be manipulated by small input changes. Need input validation.
Model Extraction: Attacker queries model repeatedly to learn how it works, then replicates it. Why it matters in BFSI: Extracted model reveals decision logic. Attacker knows exactly how to structure applications to get approved.
Model Poisoning: Attacker corrupts training data, model learns bad patterns. Why it matters in BFSI: Model might learn discrimination (reject women) or systematic bias (approve attacker's criteria). Looks like unfair decisions, actually sabotage.
Membership Inference: Attacker queries model to infer whether specific person's data was in training set. Why it matters in BFSI: Privacy leak. Reveals who applied, what their profile was.
Differential Privacy: Technique that adds noise to model/data so individual records can't be reverse-engineered. Why it matters in BFSI: Prevents membership inference and model extraction while maintaining model utility.
Evidence Logging: Immutable record of every decision: what inputs, what decision, what factors. Why it matters in BFSI: Proves decision came from real model (not faked). Enables audit, supports regulatory inquiry.
Threat Model: Systematic thinking about what can go wrong with a system and how to defend. Why it matters in BFSI: Without threat modeling, you defend against imaginary threats and miss real ones. Essential for security architecture.
Least Privilege: Service/user has minimum permissions needed to function, nothing more. Why it matters in BFSI: Limits damage if account is compromised. Model service can read current app data, can't read historical data or write anywhere.
Fun Facts
On AI-Specific Threats: A bank deployed a credit model with no input validation. Attacker discovered that setting "years_employed: 999" bypassed key risk check. They could manufacture profiles that guaranteed approval. They weren't stealing data. They were using the model as an approval machine. Loss: $15M in bad loans before detection. Lesson: Model threats aren't data threats. Think about manipulation, not just access.
On Supply Chain Surprises: One bank used an open-source fraud detection model. Months into production, a researcher found suspicious behavior: the model flagged 95% of transactions from one specific IP range. The original developer had included code that deliberately performed poorly for that range (to advantage their own payment processor). Supply chain compromise. Lesson: Audit what you deploy, especially open-source.
Next up: Week 20 Wednesday dives into "Regulatory Document QA Bot (RAG + Citation Mode)"—using AI to answer compliance questions while proving where the answer came from. The challenge: Regulators want to see your sources.
This is part of our ongoing work understanding AI deployment in financial systems. If you're building threat models or security architecture for AI, share your experience—what threats did you miss? What mitigations surprised you?
