SR 11-7 meets generative AI: what actually has to change

The Federal Reserve framework was written for traditional models. Here is what holds, what breaks, and what to add when you bring it to LLMs.

By Matt Achachlouei

SR 11-7 is the closest thing US banking has to a constitution for model risk. Most of it survives the move to generative AI: the principles of independent validation, ongoing monitoring, and clear documentation translate cleanly. But three assumptions break, and pretending they don't is how organizations end up with unauditable LLM systems.

What still holds

The framework's core stays intact: a model inventory with risk tiering, validation independent from development, evaluation on out-of-sample data, and ongoing performance monitoring. Every one of these applies to LLMs and agentic systems, often more urgently than it did to logistic regressions.

What breaks

Determinism. Traditional models give you the same output for the same input. LLMs sampled at nonzero temperature do not, and even nominally deterministic settings can vary between runs. Validation has to account for distributional behavior, not just point performance.
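
One way to make "distributional behavior" concrete is to run the same input many times and summarize the spread of outcomes instead of a single result. The sketch below is illustrative only: `toy_model` is a hypothetical stand-in for a sampled LLM call, and `passes` is a hypothetical check for one input. Neither comes from SR 11-7 or from any particular library.

```python
import random
from collections import Counter

def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a sampled LLM call: answers vary run to run."""
    return rng.choices(["approve", "decline", "escalate"],
                       weights=[0.80, 0.15, 0.05])[0]

def passes(output: str) -> bool:
    """Hypothetical check: for this input, 'approve' is the expected answer."""
    return output == "approve"

def sample_behavior(prompt: str, n_samples: int, seed: int = 0) -> dict:
    """Run one input n_samples times and summarize the outcome distribution."""
    rng = random.Random(seed)  # seeded so the evaluation run itself is reproducible
    outputs = [toy_model(prompt, rng) for _ in range(n_samples)]
    counts = Counter(outputs)
    return {
        "n_samples": n_samples,
        "distinct_outputs": len(counts),
        "pass_rate": sum(passes(o) for o in outputs) / n_samples,
        "counts": dict(counts),
    }

summary = sample_behavior("Is this transaction within policy?", n_samples=200)
print(summary)
```

The validation evidence here is the whole summary: how often the system passes on this input, and how many distinct answers it produces, not whether one particular run happened to pass.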

Stable inputs. Prompt templates change. System messages change. Tool definitions change. A "model" in the SR 11-7 sense is now a model plus its scaffolding, and the scaffolding mutates.
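
A minimal sketch of treating scaffolding as part of the model is to fingerprint everything that shapes the output, so any change produces a new version that can be routed to validation. The field names below (`system_prompt`, `prompt_template`, `tools`, `retrieval_index_id`, `model_id`) and the example values are assumptions for illustration, not a prescribed schema.

```python
import hashlib
import json

def scaffolding_fingerprint(model_id: str, system_prompt: str, prompt_template: str,
                            tools: list, retrieval_index_id: str) -> str:
    """Return a stable SHA-256 hash over the model plus its scaffolding."""
    payload = {
        "model_id": model_id,
        "system_prompt": system_prompt,
        "prompt_template": prompt_template,
        "tools": tools,  # order is kept, since tool order may affect behavior
        "retrieval_index_id": retrieval_index_id,
    }
    # Canonical JSON (sorted keys, fixed separators) so equal configs hash equally.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical configuration values, for illustration only.
base = dict(
    model_id="example-llm-v1",
    system_prompt="You are a credit policy assistant.",
    prompt_template="Question: {question}",
    tools=[{"name": "lookup_policy", "parameters": {"section": "string"}}],
    retrieval_index_id="policies-2024-01",
)
v1 = scaffolding_fingerprint(**base)
v2 = scaffolding_fingerprint(**{**base, "system_prompt": "You are a cautious credit policy assistant."})

print(v1 != v2)  # a one-word prompt edit yields a different version
```

Recording this fingerprint in the model inventory alongside each validation result makes it visible when production is running scaffolding that was never validated.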

Closed input space. A logistic regression takes a fixed feature vector. An LLM takes natural language. Your validation has to reason about adversarial prompts, jailbreaks, and inputs you never saw in development.

What to add

A practical extended framework adds three things:

  1. Distributional metrics. Pass-rate over a held-out set, with confidence intervals. Not "did this prompt pass."
  2. Scaffolding versioning. Treat the prompt, tools, and retrieval index as components of the model. Version them. Validate changes.
  3. Adversarial evaluation. A jailbreak suite, a prompt-injection suite, and a refusal-behavior suite. These are not optional for production deployments.
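
As a rough sketch of how items 1 and 3 might combine, the code below computes a pass rate with a Wilson score interval for each evaluation suite and gates release on the lower bound of the interval, not on the point estimate. The Wilson interval is one common choice for binomial proportions, not something SR 11-7 mandates. The suite names, result counts, and thresholds are made-up assumptions.

```python
import math
from typing import Dict, List, Tuple

def wilson_interval(n_pass: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("need at least one test case")
    p = n_pass / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def gate(results: Dict[str, List[bool]], min_lower: Dict[str, float]) -> Dict[str, bool]:
    """A suite clears the gate only if its interval's lower bound meets the threshold."""
    decisions = {}
    for suite, outcomes in results.items():
        lower, _ = wilson_interval(sum(outcomes), len(outcomes))
        decisions[suite] = lower >= min_lower[suite]
    return decisions

# Hypothetical results: True means the case passed.
results = {
    "held_out": [True] * 90 + [False] * 10,   # 90 of 100
    "jailbreak": [True] * 48 + [False] * 2,   # 48 of 50
}
thresholds = {"held_out": 0.80, "jailbreak": 0.90}  # illustrative thresholds only

decisions = gate(results, thresholds)
print(decisions)
```

In this example the jailbreak suite has a 96% point estimate, but with only 50 cases the lower bound is roughly 0.87, so it does not clear a 0.90 threshold. That is the practical difference between reporting a distributional metric and reporting a raw pass count.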

The good news: examiners are not asking for a new framework. They are asking for SR 11-7 applied honestly to a probabilistic system. The teams that do this work up front pass review. The teams that don't end up doing it twice.

Working with us: Rizmi Labs builds extended SR 11-7 frameworks for banks introducing GenAI into regulated workflows. Get in touch to discuss.