Most banks think: “We’re using Azure OpenAI, so Microsoft handles the risk.” Nope.
Or: “We’re using Claude through AWS Bedrock, so Amazon’s responsible.” Also nope.
You own the risk, even if you don’t own the model.
This misconception is everywhere. Banks assume that vendor AI means vendor responsibility for governance. They're treating LLM APIs the way they'd treat an Oracle database - yes, Oracle maintains the database software, but you're still responsible for data governance, access controls, backup strategies, and everything you do with it.
Same applies to AI vendors. OpenAI doesn’t govern your use of GPT-4. Anthropic doesn’t validate your Claude implementation. AWS doesn’t ensure your Bedrock-powered system complies with banking regulations.
That’s all still your job.
The Vendor AI Reality
Something like 90% of financial services AI is vendor-provided: OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure AI. This makes total sense - you don't need to train your own large language model from scratch. The foundation models are commodities (expensive commodities, but commodities).
The business value is in how you use them. Your prompts, your data, your use cases, your integration into business processes.
Vendor AI is the right choice for most organizations. But it doesn’t make governance someone else’s problem - it changes what you need to govern.
What You Still Own
Let me be specific about what stays your responsibility when using vendor AI.
Use case risk: You decide how to use the AI. Customer service chatbot? Loan underwriting assistant? Compliance document review? Each use case has different risk profiles, different regulatory requirements, different consequences when things go wrong.
OpenAI doesn’t assess whether your specific use case is appropriate or high-risk. That’s your call, and your responsibility to get right.
Data risk: You control what data goes into prompts. Customer PII, confidential business information, trade secrets - you decide what the AI sees.
If you leak customer data because you sent it to an LLM API without proper safeguards, that’s on you. The vendor provided an API. You chose to send sensitive data through it.
Data governance - classification, access controls, privacy compliance - stays your responsibility.
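To make that concrete, here's a minimal sketch of the kind of guardrail that sits on your side of the boundary: redact obvious PII before a prompt ever leaves your environment. The patterns and helper names are illustrative assumptions - a real implementation would lean on your existing DLP and data classification tooling, not ad-hoc regexes.

```python
import re

# Illustrative patterns only - a real deployment would use proper data
# classification / DLP tooling rather than ad-hoc regexes.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD_NUMBER": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact_pii(text: str) -> str:
    """Replace obvious PII with placeholders before text leaves your boundary."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def build_prompt(customer_message: str) -> str:
    # Only the redacted text is ever sent to the vendor API.
    return f"Summarize the customer's request:\n{redact_pii(customer_message)}"
```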
Integration risk: You build the system around the AI. The prompts, the RAG layer, the business logic, the error handling, the user interface. All of that is custom code that you own and maintain.
When your RAG system leaks data across access boundaries (as I wrote about a few weeks ago), that’s not the LLM vendor’s fault. That’s your integration architecture failing to preserve access controls.
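As a rough illustration of what "preserving access controls" means in a RAG integration: carry the ACL with each indexed chunk and filter retrieval results against the caller's entitlements before anything reaches the prompt. The `Document` structure and group model below are assumptions for the sketch, not any specific product's API.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # ACL captured at index time, alongside the chunk

def build_context(user_groups: set, candidates: list, limit: int = 5) -> str:
    """Filter retrieved chunks against the caller's entitlements before prompting."""
    readable = [d for d in candidates if user_groups & d.allowed_groups]
    return "\n\n".join(d.text for d in readable[:limit])

# Usage: candidates come from your vector search (not shown); the LLM only
# ever sees context the requesting user was already allowed to read.
```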
Validation: You must validate that the AI system works for YOUR use case. Not “does GPT-4 work in general” (OpenAI handles that) but “does our loan underwriting assistant that uses GPT-4 produce accurate, fair, compliant recommendations?”
That’s a validation question only you can answer. It depends on your prompts, your data, your use case, your risk tolerance.
You can’t outsource validation to your vendor. They don’t know your business requirements or regulatory obligations.
Monitoring: You must detect when things go wrong. Model drift, quality degradation, inappropriate outputs, cost spikes, security incidents.
The vendor monitors their API (uptime, performance, abuse). You monitor your AI system (business outcomes, user satisfaction, compliance, cost).
Those are different responsibilities. The vendor's API can be working perfectly (99.9% uptime) while your use case is failing (the hallucination rate spiked, users are getting wrong answers).
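A hedged sketch of what your side of that monitoring can look like: wrap every LLM call and emit your own structured record - use case, tokens, estimated cost, latency, quality flags. The rate and field names here are made up for illustration; the point is that this telemetry is yours to build, because the vendor won't.

```python
import json
import logging
import time

logger = logging.getLogger("ai_monitoring")

USD_PER_1K_TOKENS = 0.01  # illustrative rate - use your vendor's actual price card

def record_llm_call(use_case: str, model: str, prompt_tokens: int,
                    completion_tokens: int, latency_s: float, quality_flag: bool) -> None:
    """Emit one structured record per call for your own dashboards and alerts."""
    total_tokens = prompt_tokens + completion_tokens
    logger.info(json.dumps({
        "ts": time.time(),
        "use_case": use_case,
        "model": model,
        "tokens": total_tokens,
        "est_cost_usd": round(total_tokens / 1000 * USD_PER_1K_TOKENS, 4),
        "latency_s": round(latency_s, 3),
        "quality_flag": quality_flag,  # e.g. failed an automated check or was escalated
    }))
```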
An analogy that might help: using an Oracle database doesn't mean Oracle governs your data. You're responsible for schema design, access controls, backup strategies, performance tuning, and data quality. Oracle maintains the database engine; you maintain your use of it.
Same split with AI vendors.
Vendor-Specific Risks
Using vendor AI creates specific risks that don’t exist when you control the model yourself. The FINOS AI Governance Framework identifies several of these.
Model changes without notice: Vendors update models continuously. GPT-4 in January behaves differently than GPT-4 in July. Sometimes vendors announce this, sometimes they don’t.
Your system’s behavior can change because the underlying model changed, not because you changed anything. I’ve seen banks get surprised by this - their carefully tuned prompts suddenly work differently after a vendor model update.
This is why version pinning matters (FINOS MI-10). Pin to a specific model version, test updates before accepting them, maintain control over when changes happen.
Service availability: Your AI system depends on the vendor’s API being up. If Azure OpenAI goes down, your customer service chatbot stops working. If AWS Bedrock has regional outages, your loan processing system is offline.
You need fallback strategies. Can you degrade gracefully? Queue requests and process them later? Fail over to a different vendor or a simpler non-AI system?
Your business continuity plan must account for vendor API failures.
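Here's a minimal sketch of that kind of degradation path, assuming a hypothetical wrapper around your vendor SDK and a simple retry queue. The class and function names are illustrative, not a prescribed design.

```python
import queue

class VendorUnavailableError(Exception):
    """Raised when the vendor API is down or timing out (hypothetical)."""

retry_queue: "queue.Queue[str]" = queue.Queue()

def call_primary_llm(question: str) -> str:
    # Placeholder for your vendor SDK call (Azure OpenAI, Bedrock, etc.).
    raise VendorUnavailableError("simulated outage")

def canned_response(question: str) -> str:
    # Simpler, non-AI degradation path: scripted reply, human handoff, etc.
    return "We're experiencing delays; an agent will follow up shortly."

def answer_customer(question: str) -> str:
    try:
        return call_primary_llm(question)
    except VendorUnavailableError:
        retry_queue.put(question)  # queue for later processing when the API recovers
        return canned_response(question)
```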
Data residency and sovereignty: Where is your data processed when you call the LLM API? Different vendors have different approaches. Some keep data in specific regions, some don’t guarantee data residency.
For banks operating under GDPR in Europe, in jurisdictions with data sovereignty laws, or with other specific regulatory requirements, this matters. You can't just send data to any API endpoint without knowing where it's processed.
Contract terms and vendor architecture need to address this.
Contract terms and data handling: What happens to your data when you send it to the vendor? Is it used for model training? Is it logged? How long is it retained? Who can access it?
Most enterprise vendors now offer zero-retention agreements (data is processed but not stored). But you need to verify this in contracts and through vendor assessments.
One bank I know sent customer data to a free-tier LLM API for “testing.” That API’s terms allowed data use for model training. Oops.
Read the contracts. Understand the data handling. Don’t assume.
Lock-in: Once you've built your system around a specific vendor's API, switching is hard. Your prompts are tuned for GPT-4 - they might not work well with Claude or Gemini. Your token limits, response formats, and API patterns are vendor-specific.
This is commercial risk, not just technical risk. If the vendor raises prices 3x or changes terms unfavorably, can you switch? How long would it take? How much would it cost?
Consider this upfront. Some organizations deliberately build abstraction layers to support multiple vendors. It costs more initially but provides flexibility later.
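A rough sketch of what such an abstraction layer can look like: business logic depends on a thin provider interface rather than a specific vendor SDK. The provider classes below are stubs for illustration, not real SDK calls.

```python
from typing import Protocol

class ChatProvider(Protocol):
    """Thin seam between business logic and any single vendor's SDK."""
    def complete(self, prompt: str) -> str: ...

class OpenAIProvider:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("wire up the OpenAI SDK here")

class BedrockProvider:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("wire up the AWS Bedrock SDK here")

def underwriting_summary(provider: ChatProvider, application_text: str) -> str:
    # Business logic depends only on the interface, so switching vendors means
    # swapping the provider, not rewriting the application.
    return provider.complete(f"Summarize the key credit risks:\n{application_text}")
```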
Practical Vendor Governance
Here's what effective vendor AI governance looks like, per the FINOS framework (MI-7: Legal/Contractual and MI-10: Version Pinning).
Vendor assessment questions for procurement:
- Where is data processed geographically?
- What is your data retention policy?
- Is our data used for model training?
- What are your uptime SLAs?
- How do you handle security incidents?
- Do you support version pinning?
- What notice do you provide before model updates?
- Can we audit your data handling practices?
These should be standard questions in your vendor RFP process.
Contractual protections:
- Zero-persistence clauses (data not retained after processing)
- Data breach notification requirements (if vendor is compromised, they must tell you within X hours)
- SLAs for availability and performance (with penalties for violations)
- Version pinning rights (you control when model versions change)
- Data residency guarantees (data stays in specified regions)
- Termination terms (what happens to your data if you leave)
Get these in writing, in contracts, not just in vendor marketing materials.
Version pinning strategy: Don’t auto-update to new model versions. Pin to specific versions. Test new versions in staging environments before production promotion.
Example: Use gpt-4-0613, not just gpt-4. The first is a specific version; the second is a moving target that changes whenever OpenAI updates it.
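In configuration terms, that might look something like the snippet below. The model identifiers are examples only - confirm the current pinned versions against your vendor's documentation.

```python
# Pin exact model identifiers in configuration, not a floating alias.
# Identifiers below are illustrative - verify them against your vendor's docs.
MODEL_CONFIG = {
    "customer_service_bot": {
        "provider": "azure_openai",
        "model": "gpt-4-0613",  # pinned snapshot, not "gpt-4"
    },
    "document_review": {
        "provider": "bedrock",
        "model": "anthropic.claude-3-sonnet-20240229-v1:0",  # dated model ID
    },
}
```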
When a new version is available, test it:
- Run your validation suite
- Compare outputs to current version
- Check for quality differences
- Verify performance and cost impacts
- Only promote to production when you’re confident
This is more work, but it prevents surprises in production.
Testing before updates: Build a test suite that validates AI behavior for your use cases. When a vendor releases a new model version:
- Run automated tests (check outputs for known inputs)
- Manual review (spot-check quality)
- Performance benchmarks (latency, token usage)
- Cost analysis (new version might be cheaper or more expensive)
Document the results and make an informed decision about whether to upgrade.
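As a sketch of what that regression check might look like: run a small set of golden prompts through both the pinned version and the candidate, then compare. The test cases and the `call_model` wrapper are hypothetical placeholders, not a specific SDK.

```python
# call_model(model_version, prompt) is a hypothetical wrapper around your vendor SDK.
GOLDEN_CASES = [
    {"prompt": "Classify this transaction dispute: ...", "must_contain": "dispute"},
    {"prompt": "Summarize the KYC exceptions in: ...", "must_contain": "KYC"},
]

def run_suite(model_version: str, call_model) -> dict:
    results = {"passed": 0, "failed": 0}
    for case in GOLDEN_CASES:
        output = call_model(model_version, case["prompt"])
        key = "passed" if case["must_contain"].lower() in output.lower() else "failed"
        results[key] += 1
    return results

# current   = run_suite("gpt-4-0613", call_model)
# candidate = run_suite("gpt-4-1106-preview", call_model)  # example newer snapshot
# Promote only if the candidate matches or beats the version you have pinned.
```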
Monitoring for vendor-driven drift: Even with version pinning, monitor for behavior changes. Vendors sometimes update models without changing version numbers (bug fixes, safety improvements).
Track quality metrics over time. If they change unexpectedly, investigate whether the vendor made changes.
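One simple way to track that, sketched below: keep a rolling pass rate over recent evaluations and alert when it drops below a threshold you've chosen. The window size and threshold here are illustrative, not recommendations.

```python
from collections import deque

class DriftMonitor:
    """Rolling pass rate over recent evaluations; alert when it drops.

    Window size and threshold are illustrative - tune them to your use case.
    """
    def __init__(self, window: int = 500, alert_below: float = 0.90):
        self.recent = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, passed: bool) -> None:
        self.recent.append(passed)

    def pass_rate(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 1.0

    def should_alert(self) -> bool:
        # Only alert once the window has enough samples to be meaningful.
        return len(self.recent) >= 100 and self.pass_rate() < self.alert_below
```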
Vendor AI Is the Right Choice
I’m not arguing against vendor AI. For most financial institutions, it’s absolutely the right approach. Building and maintaining your own LLMs is expensive, complex, and offers little competitive advantage.
But vendor AI doesn’t outsource governance - it changes what you govern.
You’re not governing model training data or model architecture (vendor’s job). You’re governing use cases, data handling, integration architecture, validation, and monitoring (your job).
Focus on the integration points. How does the vendor AI fit into your business processes? What data goes in? What decisions are made with the outputs? What happens when it fails?
Those are the governance questions that matter for vendor AI. Answer them the same way you’d answer them for any critical business system - with appropriate controls, documentation, and oversight.
If your governance plan assumes “the vendor handles it,” you don’t have a governance plan. You have an assumption that will get tested when something goes wrong.
Fix that before the regulator asks about it.