Building an AI Risk Assessment Process

Every new AI project should go through risk assessment before deployment. But most banks don’t have a process that works for AI.

They have traditional risk assessment processes - great for evaluating technical projects, infrastructure changes, new software deployments. Those ask questions about availability, performance, security, disaster recovery.

All important questions. Also insufficient for AI.

AI-specific risks don’t show up in traditional risk assessments. Model drift, hallucination, bias, prompt injection, data leakage - these need different questions, different evaluation criteria, different mitigation strategies.

I’m going to walk you through a practical risk assessment process specifically designed for AI. This is based on the FINOS heuristic assessment methodology. It works. Takes 2-4 hours for a typical use case, produces a documented risk assessment that maps risks to specific mitigations.

Why Traditional Risk Assessment Falls Short

Traditional IT risk assessment template asks questions like:

What’s the system availability requirement?
What’s the disaster recovery plan?
What security controls are in place?
What’s the data backup strategy?
Who approves changes?

All reasonable. But they miss AI-specific concerns.

Traditional process doesn’t ask:

Can the model hallucinate false information?
How do we detect if model performance degrades over time?
What happens if the model exhibits bias against protected groups?
Can users manipulate the model through prompt injection?
Does the model leak training data or cross customer boundaries?

These risks aren’t hypothetical. I’ve seen AI systems that passed traditional risk assessment and then:

Hallucinated customer data that didn’t exist (data quality incident)
Silently degraded over months (model drift nobody noticed)
Showed bias in customer treatment (potential regulatory violation)
Leaked information across access boundaries (security incident)

Traditional risk assessment didn’t catch these because it wasn’t asking the right questions.

You need AI-adapted risk assessment methodology.

The 8-Step FINOS Heuristic Assessment

The FINOS framework provides an 8-step heuristic assessment process. I’m going to walk through each step with a concrete banking example: a loan underwriting assistant that uses an LLM to analyze applications and suggest approve/deny decisions.

This isn’t a real system (anonymized and simplified), but it’s representative of actual use cases banks are building.

Step A: Define Use Case and Context

Questions to answer:

What business problem are you solving?
Who are the users?
What decisions will the AI make or inform?
What’s the business value?

Example: Loan Underwriting Assistant

Business problem: Loan underwriters spend hours reviewing applications, reading documents, checking criteria. Process is slow and inconsistent.

Users: Commercial loan underwriters (internal staff, ~50 people)

Decision: AI analyzes loan application documents (financial statements, business plans, credit history) and suggests approve/deny with reasoning. Underwriter reviews the suggestion and makes final decision.

Business value: Faster underwriting (reduce decision time from 2 days to 4 hours), more consistent application of criteria, free up underwriter time for complex cases.

Why this matters: Defining context clearly helps identify relevant risks. Customer-facing vs. internal? Automated decision vs. advisory? High volume vs. occasional use? Each changes the risk profile.

For our example: Internal users (lower risk than customer-facing), but decisions affect customers (higher stakes than pure productivity tool). Advisory not automated (human oversight is a control), but suggestions will heavily influence final decisions (can’t assume humans always override bad suggestions).

Step B: Identify Data Involved

Questions to answer:

What data does the AI access?
Where does it come from?
What’s the sensitivity level?
What privacy regulations apply?

Example: Loan Underwriting Assistant

Data accessed:

Loan application forms (applicant name, business details, loan amount requested)
Financial statements (3 years of business financials)
Credit reports (from credit bureaus)
Internal customer history (previous loans, payment history)
Public business information (business registrations, litigation records)

Sources:

Loan origination system
Credit bureau APIs
Internal customer database
Public records databases

Sensitivity: High - contains PII (personal identifying information), financial data, credit information

Regulations: FCRA (Fair Credit Reporting Act), ECOA (Equal Credit Opportunity Act), state privacy laws, internal data governance policies

Why this matters: Data sensitivity drives security and privacy requirements. High-sensitivity data needs stronger controls. Regulated data (like credit reports) has specific compliance requirements.

For our example: We’re dealing with highly sensitive financial and credit data. Any data leakage is a serious incident. Privacy regulations apply. Need strong access controls and audit trails.

Step C: Assess Model and Technology

Questions to answer:

What type of AI? (LLM, traditional ML, hybrid)
Vendor model or custom?
What’s the architecture?
How is the model deployed?

Example: Loan Underwriting Assistant

Type: LLM-based with RAG (retrieval-augmented generation)

Model: GPT-4 via Azure OpenAI (vendor model)

Architecture:

User uploads loan application documents
RAG system indexes documents and extracts key information
System constructs detailed prompt with underwriting criteria and document analysis
GPT-4 analyzes and provides approve/deny suggestion with reasoning
Response returned to underwriter for review

Deployment: Azure cloud, private endpoint, data stays in US region

Why this matters: Different AI technologies have different risks. LLMs can hallucinate, RAG systems can leak data across access boundaries, vendor models can change without notice.

For our example: RAG system is a risk (need to ensure it doesn’t leak data between applicants). Vendor model means we don’t control model updates (need version pinning). LLM can hallucinate (need validation that suggestions are based on actual document content).

Step D: Evaluate Output and Decision Impact

Questions to answer:

How are outputs used?
Is there human-in-the-loop or fully automated?
What happens if output is wrong?
What’s the customer and business impact?

Example: Loan Underwriting Assistant

Output use: Suggestion with reasoning (“Recommend APPROVE because [reasons]” or “Recommend DENY because [reasons]”)

Human oversight: Yes - underwriter reviews suggestion and makes final decision. Underwriter can override.

Impact if wrong:

False positive (suggest approve for risky loan):

Business impact: Credit loss if loan defaults
Customer impact: None direct (customer gets loan)

False negative (suggest deny for good loan):

Business impact: Lost revenue, relationship damage
Customer impact: Wrongful denial, potential Fair Lending violation

Why this matters: Decision impact determines risk tier and governance requirements. Automated high-impact decisions need maximum controls. Advisory decisions with human oversight allow some error tolerance.

For our example: Human oversight is a significant control - bad suggestions should get caught. But we can’t assume 100% override rate (humans tend to follow AI suggestions). Wrong decisions have material financial and regulatory consequences. This is Tier 2 or borderline Tier 1 risk.

Step E: Map Regulatory Requirements

Questions to answer:

What regulations apply to this use case?
What compliance requirements must you satisfy?
Are there industry-specific standards?

Example: Loan Underwriting Assistant

Applicable regulations:

FCRA (Fair Credit Reporting Act): Proper use of credit reports, adverse action notices
ECOA (Equal Credit Opportunity Act): No discrimination based on protected characteristics
Fair Lending laws: Equal treatment, no disparate impact
GDPR (if any EU applicants): Data privacy, right to explanation
Internal model governance policies: (bank-specific governance requirements)

Compliance requirements:

Ability to explain denial reasons (FCRA adverse action)
Testing for bias/fairness (ECOA, Fair Lending)
Audit trail of decisions (regulatory examinations)
Data privacy controls (GDPR, state privacy laws)
Documentation of model validation (internal governance)

Why this matters: Regulatory requirements aren’t optional. They define mandatory controls. If you can’t satisfy regulatory requirements, you can’t deploy the system.

For our example: Fair Lending compliance is critical - any bias in suggestions is a major problem. Need ability to explain decisions (challenge for LLMs). Need robust testing and monitoring for discriminatory patterns.

Step F: Consider Security Aspects

Questions to answer:

What security risks are present?
What’s the attack surface?
What data protection measures are needed?

Example: Loan Underwriting Assistant

Security risks:

Prompt injection: Could an underwriter manipulate the AI by crafting malicious text in application documents?

Data leakage: Could the RAG system leak information from one applicant’s documents into another applicant’s analysis?

Access control: Do underwriters only see applications they’re authorized to handle?

Data exfiltration: Could the AI be tricked into extracting and exposing sensitive data?

Vendor data handling: What happens to data sent to Azure OpenAI API?

Attack surface:

Document upload (malicious file uploads)
RAG system (data isolation between applications)
API calls to Azure OpenAI (data in transit, vendor access)
User interface (access control, audit logging)

Why this matters: Security incidents in financial services are expensive (regulatory fines, reputation damage, customer impact). Security risks need specific mitigations.

For our example: Data leakage between applicants would be a critical incident (privacy violation, potential bias). Need strong isolation in RAG. Prompt injection via uploaded documents is a real risk (need input validation and output filtering).

Step G: Identify Controls and Safeguards

Questions to answer:

What mitigations should you implement?
Which FINOS mitigations apply?
What’s the implementation effort?

Example: Loan Underwriting Assistant

Based on identified risks, selected FINOS mitigations:

MI-2 (Data Filtering): Filter sensitive PII before sending to LLM, mask or redact unnecessary personal details

MI-3 (Firewalling): Input validation on uploaded documents, output filtering to detect data leakage or inappropriate content

MI-4 (Observability): Log all suggestions with reasoning, document analysis, user actions, maintain audit trail

MI-5 (Testing): Validate system with test applications (known good, known bad, edge cases), test for bias across demographic groups

MI-10 (Version Pinning): Pin to specific GPT-4 version, test new versions before production deployment

MI-11 (Feedback Loops): Underwriters can provide feedback on suggestion quality, track override rates and reasons

MI-16 (Access Control Preservation): RAG system respects user permissions, underwriters only access applications they’re authorized to see

Implementation effort estimate:

Data filtering: 2 weeks
Firewalling/validation: 2 weeks
Observability/logging: 1 week
Testing/validation: 3 weeks (includes bias testing)
Version pinning: 1 week (configuration)
Feedback mechanism: 1 week
Access control: 2 weeks

Total: ~12 weeks implementation effort

Why this matters: Identifying mitigations turns risk assessment into action plan. You know what controls to build and how long it’ll take.

For our example: Significant implementation effort, but appropriate for the risk level. Testing and bias validation take the most time - that’s expected for a lending use case with Fair Lending requirements.

Step H: Make Decision and Document

Questions to answer:

Approve, deny, or approve with conditions?
What conditions must be met before deployment?
How is this documented?

Example: Loan Underwriting Assistant

Decision: Approve with conditions

Conditions for deployment:

Implement all identified mitigations (Step G)
Complete bias testing across protected demographic groups, document results
Validate suggestion quality with 200 test applications (measured accuracy, false positive/negative rates)
Implement monitoring dashboard with key metrics (suggestion quality, override rate, bias metrics)
Train underwriters on system limitations and override procedures
Develop incident response plan for quality degradation or bias detection
Schedule quarterly governance reviews

Documentation:

Risk assessment document (this 8-step process, documented)
Mitigation implementation plan with timeline
Validation test results
Training materials for users
Monitoring dashboard
Entry in model inventory with risk tier (Tier 2)

Approval: Risk committee approves, contingent on conditions being met

Why this matters: Documentation creates accountability and audit trail. When regulators ask “how did you assess risk for this system?”, you have evidence of systematic process.

For our example: Conditional approval is appropriate - use case has business value, risks are manageable with proper controls, but those controls must be implemented before deployment. Not all projects get approved - if risks outweigh benefits or can’t be adequately mitigated, deny.

Practical Tips for Implementation

You’ve seen the 8-step process with a detailed example. Here’s how to actually implement this in your organization.

Use a template: Create a standard template with the 8 steps and key questions. Makes the process repeatable and ensures consistency across different use cases.

Involve multiple perspectives: Don’t do risk assessment alone. Include:

Technical team (understands the AI system)
Business team (understands use case and value)
Risk team (understands risk management)
Legal/compliance (understands regulatory requirements)
Security team (understands security risks)

Cross-functional assessment catches risks that single perspective would miss.

Be realistic about risk: Don’t under-assess to speed approval. Be honest about risks and consequences. Better to identify risks upfront than discover them in production.

Document decisions: Write down your risk assessment, identified mitigations, approval decision. This is your audit trail. You’ll need it for internal reviews, external audits, and regulatory examinations.

Review periodically: Risk assessment isn’t one-and-done. Systems change, threats evolve, regulations update. Review risk assessments quarterly or annually, or when significant changes occur.

This Process is Implementable Today

Risk assessment shouldn’t be a blocker that kills all AI projects. But it should be systematic and documented.

This 8-step process takes 2-4 hours for a typical use case. That’s not burdensome - it’s reasonable diligence for deploying AI systems that make material business decisions.

Template approach scales. Once you’ve done this for a few use cases, the process becomes familiar. Common patterns emerge (most customer service chatbots have similar risks, most document analysis tools have similar requirements). You get faster.

Start building your AI risk assessment muscle now. Document your process. Create templates. Train teams on the methodology.

When you’re scaling to 10, 20, 50 AI use cases, systematic risk assessment is what makes that manageable. Without it, you’re making ad-hoc decisions and hoping nothing goes wrong.

Build the process now. Use it consistently. Document everything. That’s how you scale AI deployment responsibly.