Extending SR 11-7 for an LLM-powered compliance assistant

A US regional bank introduced a retrieval-augmented LLM into its compliance review workflow. We built the validation framework that let it pass internal model risk review.

The situation

The bank's compliance team had built an internal RAG-based assistant to help analysts surface relevant policy guidance during review. The system worked. The problem was that nobody, including the model risk team, knew how to validate it under SR 11-7, the Federal Reserve's supervisory guidance on model risk management. The traditional framework had no answer for a probabilistic system that took natural-language input.

What we did

We worked with the model risk team and the engineering team in parallel. The deliverables were three artifacts that fit inside the bank's existing model risk management (MRM) policy:

A model definition that explicitly named the LLM weights, the prompt template, the retrieval index, and the tool definitions as components of the model.
A validation playbook that specified out-of-sample test sets, hallucination metrics, prompt-injection coverage, and human-review rubrics.
A monitoring plan that defined ongoing performance and drift thresholds, plus a re-validation trigger that fired whenever any model component changed.
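
To make the link between the model definition and the re-validation trigger concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the bank's implementation: the class name, field names, and identifier values are all hypothetical. The idea is that if the four named components together are "the model", then a fingerprint over all four can flag when the deployed system no longer matches what was validated.

```python
import hashlib
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ModelDefinition:
    # Hypothetical identifiers for the four components named in the model definition.
    llm_weights_id: str       # e.g. a vendor model version string
    prompt_template: str      # full text of the system prompt template
    retrieval_index_id: str   # e.g. a snapshot ID of the policy index
    tool_definitions: str     # serialized tool schemas

    def fingerprint(self) -> str:
        # Hash every field in a fixed (sorted) order so the result is stable.
        digest = hashlib.sha256()
        for name, value in sorted(asdict(self).items()):
            digest.update(name.encode("utf-8"))
            digest.update(b"\x00")
            digest.update(value.encode("utf-8"))
            digest.update(b"\x00")
        return digest.hexdigest()


def changed_components(validated: ModelDefinition, deployed: ModelDefinition) -> list[str]:
    # Names of the components that differ between the two definitions.
    v, d = asdict(validated), asdict(deployed)
    return sorted(name for name in v if v[name] != d[name])


def needs_revalidation(validated: ModelDefinition, deployed: ModelDefinition) -> bool:
    # Any change to any component counts as a change to the model.
    return validated.fingerprint() != deployed.fingerprint()


# Illustrative values only.
validated = ModelDefinition(
    llm_weights_id="example-llm-v1",
    prompt_template="Cite the policy section before answering.",
    retrieval_index_id="policy-index-snapshot-001",
    tool_definitions="[]",
)
deployed = replace(validated, prompt_template="Answer briefly.")
print(needs_revalidation(validated, deployed), changed_components(validated, deployed))
```

In this sketch a prompt edit alone is enough to trip the trigger, which mirrors the decision to treat the prompt template as part of the model rather than as configuration.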

Alongside the documentation, the technical work included tightening retrieval, adjusting the system prompt to enforce citation-first behavior, and adding refusal patterns for adversarial inputs.
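
One way citation-first behavior could be checked automatically is sketched below. This is an assumption-laden illustration, not the checks the bank ran: it assumes citations appear inline as bracketed IDs such as [POL-12] (a hypothetical format), reads "citation-first" as "every sentence carries a citation", and treats any citation to a document that was not retrieved as unsupported.

```python
import re

# Hypothetical citation format: a bracketed policy ID such as [POL-12].
CITATION = re.compile(r"\[([A-Z]+-\d+)\]")


def check_citations(answer: str, retrieved_ids: set[str]) -> dict:
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    # Sentences that make a claim without citing any source.
    uncited = [s for s in sentences if not CITATION.search(s)]
    # Citations that point at documents the retriever never returned.
    cited_ids = set(CITATION.findall(answer))
    unsupported = sorted(cited_ids - retrieved_ids)
    return {
        "uncited_sentences": uncited,
        "unsupported_citations": unsupported,
        "passes": not uncited and not unsupported,
    }


# Illustrative input only.
example = check_citations(
    "Transfers above the limit need dual approval [POL-12]. "
    "Exceptions go to the compliance officer [POL-99]. "
    "This is always acceptable.",
    retrieved_ids={"POL-12"},
)
print(example)
```

A rule-based check like this only catches missing or unsupported citations. Whether a cited passage actually supports the claim still needs the human-review rubric.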

Outcome

The system passed independent model risk review on first submission. The compliance team moved from running pilots to running production analyst workflows. The framework has since been re-used to validate two additional GenAI systems inside the bank.