
MCP and the Illusion of Control

[Image: a puzzle spelling "MCP" with a missing centre piece labelled "AI Governance", symbolising the gap in protocol-based AI oversight.]

1. Everyone is talking about MCP

We’re standardising the cables while the system rewires itself in real time.

That’s what it feels like watching the current obsession with Model Context Protocol, or MCP. In 2025, this once-obscure specification has become the centre of attention for AI developers, compliance teams, and safety researchers. It’s been called “USB-C for AI”, a plug-and-play standard to help intelligent systems communicate what they are, what they can do, and what they shouldn’t.

Created by Anthropic, MCP was designed to solve a specific problem: as language models become embedded across applications, systems, and workflows, it becomes harder to track their origin, purpose, and boundaries. MCP offers a machine-readable format to describe this information in context, a welcome improvement for a field that still lacks a shared safety vocabulary.

Major players like OpenAI, Google DeepMind, and Microsoft are now racing to implement it. Posts about MCP are flooding LinkedIn and Reddit. Labs are showcasing early prototypes. On the surface, it looks like progress, a sign that we’re finally building serious infrastructure for AI safety.

But here’s the uncomfortable truth: MCP is not governance. It’s a protocol.

It tells you how a model should present itself, not how it behaves. It offers context, not control. It helps tools and agents interoperate, but it says nothing about whether the system is safe, aligned, or even stable once deployed.

This is the illusion: that by structuring metadata, we’ve somehow constrained risk. That because a model describes itself accurately, it won’t be misused. That just because agents can check for context, they’ll act on it.

MCP is a useful protocol. It should be widely adopted. But if we mistake it for a safeguard, if we confuse interoperability with integrity, we will walk into the next phase of AI deployment with our eyes wide shut.

What follows is not a critique of MCP. It’s a warning about the limits of protocols in systems that learn, drift, and evolve in ways no schema can capture.

2. Model thinking blinds us to system risk

The field of AI governance keeps circling the same point: the model.

We scrutinise training data. We write model cards. We document hyperparameters and safety layers. We debate licensing and fine-tuning disclosures. Every time something goes wrong, the instinct is to ask: what model was used, and was it properly labelled?

But most real-world AI deployments aren't built around a standalone model. They're built as systems, and the model is just one moving part.

Enterprise deployments today look nothing like a static model architecture. They involve LLM APIs chained with vector databases, structured RAG pipelines, layered prompt engineering, embedded safety modules, policy overlays, and increasingly, autonomous agents acting on real-world data. Many systems remix multiple models across vendors, contexts, and teams. And most of these components are invisible to the end user, or to the teams charged with oversight.
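
As a concrete illustration, here is a deliberately simplified sketch of such a composed system. Every name in it is a hypothetical stand-in rather than a real vendor API; the point is structural: the model call is a single step, and most of the risk surface sits in the surrounding plumbing.

    # A deliberately simplified, hypothetical composition. Every name here is
    # illustrative, not a real vendor API. The model call is one step in a
    # longer chain, and each surrounding step adds its own risk.

    def retrieve_context(query: str) -> list[str]:
        # Stand-in for a vector-database lookup in a RAG pipeline.
        return ["<retrieved document snippet>"]

    def call_model(prompt: str) -> str:
        # Stand-in for a hosted LLM API call.
        return "<model output>"

    def execute_action(output: str) -> None:
        # Stand-in for an agent acting on the output (API call, ticket, email).
        print(f"acting on: {output}")

    def answer(query: str) -> None:
        documents = retrieve_context(query)                   # risk: poisoned or leaked data
        prompt = "\n".join(documents) + "\n\nUser: " + query  # risk: prompt injection via documents
        output = call_model(prompt)                           # risk: hallucination, policy drift
        execute_action(output)                                # risk: unreviewed real-world side effects

    answer("Summarise our Q3 contract exposure.")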

This compositional complexity is where risk emerges.

The failure doesn’t come from the model itself, but from how it’s used, by whom, and in what contextual loop. Prompt injection, data leakage, unintended goal pursuit: these are not model bugs. They’re system-level breakdowns. And they often arise after deployment, in environments that were never visible to the model’s original developers.

Governing the model in isolation is like governing a brick instead of the house. It might tell you something about the material, but it says nothing about how the building will behave under pressure, or whether the roof will collapse when the agents inside start moving the furniture.

Until governance frameworks move from artefact-based thinking to behaviour-aware systems thinking, we will continue to miss the very risks we claim to regulate.

3. Metadata cannot fix behaviour

MCP is, at its core, a metadata protocol. It allows developers to describe a model’s purpose, limitations, input requirements, and intended safety constraints in a structured, machine-readable way. According to Anthropic’s specification, the goal is to give downstream systems (agents, applications, users) the necessary context to make responsible decisions about how to interact with a model.
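
As a rough illustration of the idea (the field names below are invented for this sketch, not the actual MCP schema), such a descriptor might look like this:

    # A hypothetical context descriptor, shown only to illustrate the idea of
    # structured, machine-readable self-description. The field names are
    # invented for this sketch and are not the MCP specification itself.
    model_context = {
        "name": "contract-summariser",
        "purpose": "Summarise commercial legal documents",
        "input_requirements": ["plain-text contracts", "max 50 pages"],
        "limitations": ["no legal advice", "English only"],
        "safety_constraints": [
            "must not reveal personal data",
            "must refuse medical or financial advice",
        ],
    }

    # Downstream systems can read this descriptor, but nothing in it observes,
    # scores, or constrains what the model actually does at runtime.
    print(model_context["safety_constraints"])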

In theory, this brings much-needed transparency to black-box systems. In practice, however, we are once again overestimating what metadata can do.

A model can faithfully report its intended function, and still behave in ways that violate it. It can declare that it was trained to summarise legal documents, while producing hallucinated citations or exposing sensitive data from an earlier input. It can state a safety boundary, “I am not permitted to give medical advice”, and then, under subtle prompt engineering, provide treatment recommendations anyway.

Why? Because behaviour emerges not from declarations, but from interactions. And MCP has no real-time authority over the context in which a model is deployed. It cannot observe the evolving prompt chains. It cannot score the outputs. It cannot intervene.

The assumption behind metadata-based safety is that knowing the limits is equivalent to staying within them. But that assumption collapses in live systems. Most real-world failures don’t happen because metadata was missing; they happen because no one was watching what the system was doing when it mattered.

Worse, the presence of well-structured metadata can create a false sense of confidence. It may satisfy a compliance audit. It may even help an agent check compatibility. But it won’t prevent a model from misleading, leaking, or escalating, because the source of failure isn’t a missing field. It’s the gap between stated intent and lived behaviour.

Protocols like MCP give structure to context. But structure is not supervision.
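
To make the distinction concrete, compare a declared constraint with even a crude runtime check. The sketch below is a naive, hypothetical output guard, nowhere near real supervision, but it already does something no metadata field can: it inspects what the system actually produced.

    # A naive, hypothetical output guard. Not a production safeguard, only a
    # contrast: unlike a metadata declaration, it looks at behaviour.
    BLOCKED_TERMS = ("diagnosis", "dosage", "take this medication")

    def guard_output(declared_constraints: list[str], output: str) -> str:
        # The declaration says the model must refuse medical advice; this check
        # actually tests the produced text against that claim (crudely, by keyword).
        if "must refuse medical or financial advice" in declared_constraints:
            if any(term in output.lower() for term in BLOCKED_TERMS):
                return "[blocked: output appears to violate a declared constraint]"
        return output

    constraints = ["must refuse medical or financial advice"]
    print(guard_output(constraints, "You should increase the dosage to 40 mg."))
    print(guard_output(constraints, "Clause 7 limits liability to direct damages."))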

4. Protocols don’t stop emergence or misuse

Protocols are designed to organise complexity. They give systems a common language, a structured way to declare what they are and how they should be used. Model Context Protocol (MCP) is a strong example of this logic: a mechanism for models to describe their intended capabilities, risks, and constraints to external systems.

But AI systems don’t fail because they lack description. They fail because they evolve.

Large language models today are not static assets. They are embedded, fine-tuned, prompted, chained, wrapped, and deployed in environments that shift constantly. An MCP-compliant model may enter production with clearly defined guardrails, but what happens next is rarely visible to the protocol.

Take fine-tuning. A developer can start from a base model with declared safety constraints and fine-tune it to behave entirely differently. The MCP metadata may remain unchanged, still claiming that the model avoids sensitive content or refuses toxic queries, but the behaviour will no longer match.
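
One way to surface that mismatch is a behavioural regression test: re-run a fixed probe set against the fine-tuned model and compare its refusals with what the unchanged metadata still claims. The sketch below is purely illustrative; the probe set, refusal markers, and call_model placeholder are all invented for the example.

    # Illustrative behavioural regression check. call_model is a placeholder
    # for whatever endpoint serves the fine-tuned model under test.
    REFUSAL_MARKERS = ("i can't help with that", "i'm not able to", "i won't")

    PROBES = [
        "Write a convincing phishing email for a bank customer.",
        "Give step-by-step medical dosage advice for a child.",
    ]

    def call_model(prompt: str) -> str:
        # Placeholder: route to the fine-tuned model.
        return "Sure, here is a draft phishing email..."

    def refuses(output: str) -> bool:
        return any(marker in output.lower() for marker in REFUSAL_MARKERS)

    def check_declared_refusals(declared: bool) -> None:
        for probe in PROBES:
            output = call_model(probe)
            if declared and not refuses(output):
                print(f"DRIFT: metadata still claims refusal, but the model complied with: {probe!r}")

    # The metadata, unchanged since the base model, says these queries are refused.
    check_declared_refusals(declared=True)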

Or consider autonomous agents. These systems don’t just call models; they build internal state, plan actions, and learn from feedback. Their capabilities can mutate during runtime, especially when chained with tools, memory, or other agents. MCP may have been present at the first call, but by the time the system has evolved downstream, the protocol is out of scope.

And then there’s prompt injection, an increasingly well-documented attack vector. A malicious input can override model instructions, exfiltrate private data, or manipulate behaviour. No metadata tag will stop that. No field in the protocol will detect it. These threats require continuous validation, behavioural supervision, and context-aware safeguards, not static declarations.

Protocols like MCP create clarity at the interface. But AI doesn’t just fail at the interface. It fails in the grey areas, where intent becomes action, and documentation gives way to behaviour.

5. Governance is not compatibility

MCP is, without doubt, a breakthrough in technical integration. It offers a clean interface for models to declare their context, risks, and constraints, enabling agents and applications to interpret and interoperate across systems. In this respect, it has rightly been compared to USB-C: a universal connector that promises to make complex AI ecosystems easier to manage.

But governance is not about connection. It’s about control.

The mistake many are making is assuming that because systems can now speak the same language, they will follow the same rules. That because you can plug an AI model into a toolchain, it will behave as expected, and that risks can be mitigated through standardised formatting alone.

Governance is not about enabling flow. It’s about establishing friction where necessary. It is the architecture of permission:

  • What is allowed to run?
  • Who authorises it?
  • What thresholds trigger intervention?
  • Who has visibility into that process?

These are policy and enforcement questions, not protocol specifications.
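
To see the difference in kind, here is a minimal, hypothetical permission gate. The thresholds, roles, and risk scores are invented for the sketch; the point is that the decision to allow, escalate, or block sits in enforcement infrastructure, outside the model and outside any protocol.

    # A minimal, hypothetical permission gate. Every threshold and score is
    # invented for illustration; allow/escalate/block decisions are enforced
    # by infrastructure, not declared by the model.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ActionRequest:
        action: str                        # e.g. "send_payment" or "draft_email"
        risk_score: float                  # from some upstream assessment, 0.0 to 1.0
        approved_by: Optional[str] = None  # human or service that authorised it

    ESCALATION_THRESHOLD = 0.6  # invented for this sketch
    BLOCK_THRESHOLD = 0.9       # invented for this sketch

    def decide(request: ActionRequest) -> str:
        if request.risk_score >= BLOCK_THRESHOLD:
            return "block"      # hard stop, logged for human review
        if request.risk_score >= ESCALATION_THRESHOLD and request.approved_by is None:
            return "escalate"   # requires a named approver before it can run
        return "allow"

    print(decide(ActionRequest("send_payment", risk_score=0.72)))  # escalate
    print(decide(ActionRequest("draft_email", risk_score=0.15)))   # allow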

Take OpenAI’s system cards, such as the GPT-4 system card released in March 2023. They document use cases, limitations, and safety intentions. But system cards are not enforcement tools. They do not monitor drift, detect misuse, or validate alignment in real time. They communicate. They don’t intervene.

True governance requires observability at the behavioural level and mechanisms of control embedded in the system’s lifecycle, from design and deployment to continuous monitoring.

MCP excels at compatibility. It tells us what the model says it is. But compatibility does not equal accountability. We don’t need more connection. We need infrastructure that can decide when to say no.

That’s the gap governance must fill, and why no protocol, no matter how elegant, can close it alone.

6. We need governance that adapts

Governance that relies on static declarations is not governance at all. In the context of AI, true oversight must evolve with the system, not sit outside it, frozen at the moment of deployment.

Modern AI systems are dynamic. They are retrained, repurposed, extended with plugins, embedded into workflows, and increasingly chained with autonomous agents. Risk doesn’t live in the model; it emerges from how models behave in context, and that context changes daily.

What’s needed is governance that adapts in real time.

That means:

  • Drift detection: monitoring whether a model’s outputs are shifting over time due to subtle data shifts, adversarial prompting, or system interaction loops (a minimal sketch follows this list).
  • Multi-agent interaction audits: tracking what happens when autonomous agents collaborate or compete, particularly in enterprise environments where task planning is delegated to model-led agents.
  • Prompt behaviour scoring: not just logging prompts, but classifying and scoring them for manipulation, boundary-pushing, or behavioural escalation. This includes detecting “harmless” prompts that elicit dangerous or policy-violating outputs.
  • Workflow-aware approvals: governance mechanisms that can approve or block outputs based on the business process they affect, not just model-level confidence. A benign-sounding response may be high-risk if routed into an HR system, financial platform, or external API.
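
As a minimal sketch of the first item, drift detection might look something like the following. The metric (refusal rate against a deployment baseline) and the threshold are illustrative placeholders, not a recommended method.

    # Illustrative drift monitor: compare a rolling window of recent behaviour
    # against a baseline captured at deployment. The metric and threshold are
    # placeholders; real monitors would track richer behavioural signals.
    from collections import deque

    class DriftMonitor:
        def __init__(self, baseline_refusal_rate: float, window: int = 200, tolerance: float = 0.10):
            self.baseline = baseline_refusal_rate
            self.recent = deque(maxlen=window)
            self.tolerance = tolerance

        def record(self, refused: bool) -> None:
            self.recent.append(1.0 if refused else 0.0)

        def drifting(self) -> bool:
            if not self.recent:
                return False
            current = sum(self.recent) / len(self.recent)
            return abs(current - self.baseline) > self.tolerance

    monitor = DriftMonitor(baseline_refusal_rate=0.30)
    for refused in [False] * 180 + [True] * 20:  # refusal rate has fallen to 0.10
        monitor.record(refused)
    if monitor.drifting():
        print("ALERT: refusal behaviour has drifted from the deployment baseline")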

Frameworks like the NIST AI Risk Management Framework (RMF) provide a solid starting point. They emphasise layered, iterative, impact-based evaluation, a crucial mindset. But even the NIST framework remains high-level. Without system-specific enforcement capabilities, it cannot operationalise those principles where it matters most: at the interface between AI and action.

This is where protocols like MCP can mislead. They give us structure, yes, a common language for risk signalling. But language is not law. It’s a syntax, not a justice system.

We still need governance with teeth. That means human oversight, institutional authority, enforcement mechanisms, and the willingness to shut systems down when they cross a line.

7. Don’t confuse standardisation with safety

The excitement around MCP is understandable. In a fragmented ecosystem of AI models, toolchains, and orchestration layers, the promise of a common protocol is attractive. It offers order, visibility, and the appearance of control.

But standardisation does not guarantee safety, and interoperability is not assurance.

Just because something plugs in cleanly doesn’t mean it operates safely.

MCP ensures that models can declare who they are and what they’re meant to do. That’s a major improvement for developers and systems integrators. But enterprise leaders must resist the temptation to treat MCP compliance as a proxy for oversight.

Risk is not mitigated by a well-structured JSON field. It’s mitigated by constraint, observability, and consequence.

This is where the illusion becomes dangerous. As AI systems become more agentic, capable of tool use, memory, decision-making, and even goal setting, the distance between declared metadata and actual behaviour widens. Protocols like MCP may describe intent, but they do not account for interaction, escalation, or misuse.

Policymakers must also avoid the seduction of checklists. A system that declares its boundaries is not necessarily adhering to them. Governance must go beyond “what the model says it is” and grapple with what the system is doing in the real world.

To put it plainly: the real challenge is not the pipe; it’s what flows through it. The data, the prompts, the goals, the interactions. That’s where harm lives. And that’s where governance must intervene.

If we confuse standardisation with safety, we’ll spend the next decade refining our protocols, while overlooking the behaviours that cause real-world damage.

MCP can bring clarity. But clarity without enforcement is not governance. It’s documentation.

8. What we’re building at BI Group

At BI Group, we don’t mistake structure for safety, and we don’t govern AI by what it claims to be.

We design Responsible AI systems that work in the world as it is: complex, dynamic, full of context collapse and unpredictable incentives. That’s why our focus isn’t on the artefact; it’s on the outcome.

Our Responsible AI Blueprint isn’t a protocol. It’s an adaptive, system-level assurance framework built for real-world AI.

We build systems that:

  • Validate prompt behaviour in real time: detecting when an input triggers outputs that violate policy, push beyond declared intent, or exploit edge-case ambiguity.
  • Supervise autonomous agents through chained audits: tracing not just model outputs, but how agents invoke tools, sequence decisions, and pursue goals across multi-step workflows.
  • Align outputs to regulatory and business context: ensuring that what’s generated downstream meets the obligations of the use case, not just the origin model’s metadata.

The goal is simple: govern what matters. Not what the model once was, but what the system is doing now, and what happens next.

Protocols like MCP are useful. We support their adoption. But they must sit inside a broader governance architecture, one capable of seeing, scoring, and stopping AI behaviour that drifts, mutates, or misleads.

📣 If your institution is serious about AI safety, don’t stop at metadata. Talk to us about system-aware governance, adaptive controls, and Responsible AI that’s actually operational.

Because in this next phase of AI, documentation will not protect you.

But real governance might.