Integrating Claude with Model Context Protocol (MCP): Building Production-Grade MCP Servers

If you’ve tried to give Claude access to your internal tools — a database, an API, a proprietary data source — you’ve probably cobbled together something with function calling and hoped for the best. Claude MCP server integration gives you a standardized, production-ready alternative. The Model Context Protocol (MCP) is Anthropic’s open protocol for connecting Claude to external tools and data sources in a way that’s composable, reusable, and actually maintainable. This article covers how to build custom MCP servers from scratch, how the architecture fits together, and what breaks when you move from local testing to production.

What MCP Actually Is (And What the Docs Gloss Over)

MCP is a client-server protocol where Claude acts as the client and your custom server exposes tools, resources, and prompts. When Claude needs to call a tool, it sends a JSON-RPC request to your MCP server over stdio or HTTP+SSE. Your server handles it and returns a structured response. That’s the whole thing.

The reason this matters in practice: without MCP, every agent you build needs its own bespoke tool integration. With MCP, you build the server once and any MCP-compatible client — Claude Desktop, Claude.ai (with MCP enabled), or your own agent code — can use it. The protocol handles capability negotiation, error formatting, and the schema layer so you’re not writing that boilerplate repeatedly.

What Anthropic’s docs don’t emphasize enough: MCP is transport-agnostic. Stdio works great for local development and single-process deployments. For production, you’ll want SSE (Server-Sent Events) over HTTP, which lets you run the server as a persistent service. The architecture choice matters before you write a single line of tool logic.

The Three Things an MCP Server Exposes

Tools — Functions Claude can call. Think: query a database, post a Slack message, run a calculation. These are the main event.
Resources — Read-only data that Claude can pull in as context. Files, database rows, API responses. Think of these as a managed way to give Claude reference material.
Prompts — Pre-defined prompt templates that users or orchestrators can invoke. Useful for standardizing workflows across agents.

For most production use cases you’ll focus on tools. Resources are useful when you need Claude to reference large static datasets without putting them in the system prompt.

Setting Up Your First MCP Server

The official Python SDK is the fastest path. Install it alongside the MCP package:

pip install anthropic mcp

Here’s a minimal but production-relevant MCP server that exposes two tools — a database query tool and a cache-busting utility. I’m using SQLite for brevity, but the pattern is identical for Postgres or any other backend:

import sqlite3
import json
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp import types

# Initialize the MCP server with a name Claude will see
app = Server("internal-data-server")

# In production, load this from env vars, not hardcoded paths
DB_PATH = "/var/data/production.db"

@app.list_tools()
async def list_tools() -> list[types.Tool]:
    """Tell Claude what tools are available — called during capability negotiation."""
    return [
        types.Tool(
            name="query_database",
            description=(
                "Run a read-only SQL query against the internal database. "
                "Only SELECT statements are permitted."
            ),
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "A valid SQL SELECT statement"
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Max rows to return (default 50, max 500)",
                        "default": 50
                    }
                },
                "required": ["query"]
            }
        ),
        types.Tool(
            name="get_schema",
            description="Return the table schema for a given table name.",
            inputSchema={
                "type": "object",
                "properties": {
                    "table_name": {"type": "string"}
                },
                "required": ["table_name"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    """Route tool calls to the appropriate handler."""
    
    if name == "query_database":
        query = arguments["query"].strip()
        limit = min(arguments.get("limit", 50), 500)
        
        # Enforce read-only — crude but effective for most cases
        if not query.upper().startswith("SELECT"):
            return [types.TextContent(
                type="text",
                text=json.dumps({"error": "Only SELECT queries are permitted."})
            )]
        
        # Append LIMIT if not present to prevent accidental full-table scans
        if "LIMIT" not in query.upper():
            query = f"{query} LIMIT {limit}"
        
        try:
            conn = sqlite3.connect(DB_PATH)
            conn.row_factory = sqlite3.Row
            cursor = conn.execute(query)
            rows = [dict(row) for row in cursor.fetchall()]
            conn.close()
            return [types.TextContent(
                type="text",
                text=json.dumps({"rows": rows, "count": len(rows)})
            )]
        except Exception as e:
            return [types.TextContent(
                type="text",
                text=json.dumps({"error": str(e)})
            )]
    
    elif name == "get_schema":
        table_name = arguments["table_name"]
        try:
            conn = sqlite3.connect(DB_PATH)
            cursor = conn.execute(
                "SELECT sql FROM sqlite_master WHERE type='table' AND name=?",
                (table_name,)
            )
            row = cursor.fetchone()
            conn.close()
            result = row[0] if row else f"Table '{table_name}' not found."
            return [types.TextContent(type="text", text=result)]
        except Exception as e:
            return [types.TextContent(type="text", text=f"Error: {e}")]
    
    return [types.TextContent(type="text", text=f"Unknown tool: {name}")]

if __name__ == "__main__":
    import asyncio
    asyncio.run(stdio_server(app))

Run this with python server.py and point Claude Desktop at it via its config file. You’ll see both tools appear in the available tools list automatically — that’s the capability negotiation working.

Moving to HTTP+SSE for Production Deployments

Stdio is fine when the MCP server runs on the same machine as the client. The moment you want your server running as a shared service — say, a single database proxy used by multiple Claude agents across different machines — you need SSE transport.

from mcp.server.sse import SseServerTransport
from starlette.applications import Starlette
from starlette.routing import Route, Mount
import uvicorn

# Reuse the `app` Server object from above — transport is separate from logic
sse = SseServerTransport("/messages")

async def handle_sse(request):
    async with sse.connect_sse(
        request.scope, request.receive, request._send
    ) as streams:
        await app.run(
            streams[0], streams[1], app.create_initialization_options()
        )

async def handle_messages(request):
    await sse.handle_post_message(request.scope, request.receive, request._send)

starlette_app = Starlette(routes=[
    Route("/sse", endpoint=handle_sse),
    Mount("/messages", app=handle_messages),
])

if __name__ == "__main__":
    # Bind to 0.0.0.0 for container deployments; use a reverse proxy in front
    uvicorn.run(starlette_app, host="0.0.0.0", port=8080)

Deploy this behind nginx or a cloud load balancer. Add your auth layer at the reverse proxy level — basic bearer token validation is enough for internal services. Don’t implement auth inside the MCP server itself; keep the protocol handler clean and push auth to the edge.

Authentication, Rate Limiting, and the Things That Break at Scale

The MCP spec doesn’t define authentication. That’s left to the transport layer, which is pragmatic but means you have to think about it yourself. Here’s what actually breaks in production:

Token Injection Attacks

If Claude is summarizing user-provided content and then calling your MCP tools based on that content, you’re exposed to prompt injection. A malicious document can instruct Claude to call your delete_record tool. The mitigation: keep destructive tools out of your MCP server entirely unless you have a confirmation layer, and scope tool permissions tightly. A read-only database tool is much safer than a read-write one.

Latency Budgets

Claude’s tool-calling timeout is effectively bounded by your API timeout. If your MCP tool calls a slow downstream service, you’ll hit timeouts before getting results. Keep tool response times under 5 seconds for anything in the hot path. For long-running operations, return a job ID immediately and expose a separate get_job_status tool — polling is better than blocking.

Error Formatting

Claude handles errors better when they’re structured. Return JSON with an error key rather than plain text error strings. Claude will incorporate the error context into its reasoning rather than treating it as a successful result with garbage data.

Schema Specificity

Vague tool descriptions produce vague tool calls. Write descriptions as if you’re writing for a junior developer who has never seen your system. “Query the orders table filtered by customer ID and date range” beats “query the database”. The more specific the description and the tighter the JSON schema, the fewer hallucinated arguments you’ll see.

Connecting Your MCP Server to Claude Agents Programmatically

Claude Desktop is great for testing. In production, you’ll invoke MCP servers from within your own agent code using the Anthropic Python SDK with tool definitions that mirror your MCP server’s schema. The cleanest pattern is to auto-generate the tool definitions by calling your server’s list_tools endpoint at agent startup:

import anthropic
import httpx
import json

async def get_tools_from_mcp_server(server_url: str) -> list[dict]:
    """Fetch tool definitions from a running SSE MCP server's metadata endpoint."""
    # In practice you'd use the MCP client library for this;
    # this shows the underlying HTTP call for clarity
    async with httpx.AsyncClient() as client:
        response = await client.get(f"{server_url}/tools")
        return response.json()["tools"]

async def run_agent_with_mcp(user_message: str):
    client = anthropic.Anthropic()
    
    # These would be fetched dynamically from your MCP server
    tools = [
        {
            "name": "query_database",
            "description": "Run a read-only SQL SELECT query against internal data.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "limit": {"type": "integer", "default": 50}
                },
                "required": ["query"]
            }
        }
    ]
    
    messages = [{"role": "user", "content": user_message}]
    
    # Agentic loop — keep going until Claude stops requesting tools
    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        
        if response.stop_reason == "end_turn":
            # Extract final text response
            return next(
                block.text for block in response.content
                if hasattr(block, "text")
            )
        
        if response.stop_reason == "tool_use":
            # Process all tool calls in this turn
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # Route to your MCP server here
                    result = await call_mcp_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })
            
            # Append assistant turn and tool results, then loop
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

At claude-opus-4-5 pricing, a tool-calling loop with 3–4 turns costs roughly $0.015–0.04 per run depending on context size. For high-volume workflows, switch to claude-haiku-4-5 and you’re looking at around $0.001–0.003 per run — a 10–15x cost difference that matters at scale.

When Claude MCP Server Integration Makes Sense (And When It Doesn’t)

MCP is worth the setup cost when: you’re building multiple agents that need the same tools, you want a clean separation between agent logic and data access, or you need non-technical teammates to connect Claude to your data without touching agent code.

Skip MCP and use raw function calling when: you’re building a single-agent, single-tool integration, you need maximum control over the tool execution lifecycle, or you’re prototyping something that will change shape weekly. The protocol overhead isn’t worth it for throwaway scripts.

For teams building serious agent infrastructure, Claude MCP server integration is the right architectural bet. The ecosystem is growing fast — there are already community MCP servers for GitHub, Postgres, Slack, and a dozen other services. Building your custom servers to the same standard means you can mix and match them as your needs evolve without rewriting agent code.

Integrating Claude with Model Context Protocol (MCP): Building Production-Grade MCP Servers

Claude MCP servers: complete setup guide for production tool integrations

Prompt token optimization: reducing LLM API costs without sacrificing quality

Building Claude agents with persistent memory: architecture for multi-session state management

Stacking multiple Claude models in a single workflow: when to use Haiku vs Sonnet vs Opus

Building Claude agents with Starlette 1.0: modern Python web framework integration

Holotron-12B for computer use agents: building high-throughput vision-based automation

Integrating Claude with Model Context Protocol (MCP): Building Production-Grade MCP Servers

What MCP Actually Is (And What the Docs Gloss Over)

The Three Things an MCP Server Exposes

Setting Up Your First MCP Server

Moving to HTTP+SSE for Production Deployments

Authentication, Rate Limiting, and the Things That Break at Scale

Token Injection Attacks

Latency Budgets

Error Formatting

Schema Specificity

Connecting Your MCP Server to Claude Agents Programmatically

When Claude MCP Server Integration Makes Sense (And When It Doesn’t)

Recommended Path by Reader Type

Related Posts

Claude MCP servers: complete setup guide for production tool integrations

Prompt token optimization: reducing LLM API costs without sacrificing quality

Building Claude agents with persistent memory: architecture for multi-session state management

Stacking multiple Claude models in a single workflow: when to use Haiku vs Sonnet vs Opus

Building Claude agents with Starlette 1.0: modern Python web framework integration

Holotron-12B for computer use agents: building high-throughput vision-based automation