Sunday, April 5

Performance Engineer: The Claude Code Agent That Eliminates Performance Guesswork

Performance optimization is one of the most time-consuming and frustrating parts of software development — not because the fixes are hard, but because finding the right things to fix is genuinely difficult. Developers routinely spend hours chasing the wrong bottlenecks, optimizing code paths that aren’t hot, adding caches that don’t meaningfully reduce latency, or writing load tests that don’t resemble actual traffic patterns. The result: wasted effort, unchanged user experience, and a growing backlog of vague tickets labeled “make it faster.”

The Performance Engineer agent for Claude Code attacks this problem directly. It brings a structured, measurement-first methodology to every optimization task — profiling before recommending, benchmarking before and after, and ranking changes by actual user impact rather than theoretical improvement. Whether you’re debugging a slow API endpoint, setting up a CDN, tuning PostgreSQL queries, or getting your Core Web Vitals into the green, this agent keeps the work grounded in numbers rather than intuition.

For senior developers and engineering teams shipping production software, this means less time debating what might be slow and more time shipping verified improvements.

When to Use the Performance Engineer Agent

This agent is designed for proactive use. Don't wait until users complain or an incident fires. The following scenarios are ideal entry points:

Pre-launch Performance Audits

Before a major release, run the agent across your critical paths. It can help establish performance budgets, identify likely regression points, and generate load test scripts that simulate realistic user behavior — all before a single real user hits the new code.

Diagnosing Production Slowdowns

When you have observability data showing elevated p95 or p99 latency but aren’t sure where to look, the agent can help interpret profiling output, suggest instrumentation strategies, and prioritize investigation across CPU, memory, I/O, and network dimensions simultaneously.

Database Query Optimization

Slow queries are among the most common causes of application performance problems and among the most under-examined. The agent understands query plans, index strategies, N+1 patterns, and ORM behavior across major databases. Feed it a slow query log and it returns actionable recommendations with expected impact.

Caching Strategy Design

Deciding where to cache, what TTLs to set, how to handle cache invalidation, and whether to use Redis, a CDN, or browser caching requires system-level reasoning that’s easy to get wrong. The agent treats caching as a multi-layer architecture problem rather than a single tactical decision.
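The innermost of those layers, an in-process TTL cache, is simple enough to sketch directly. This is only the TTL mechanic, with Redis and the CDN assumed to sit in front of it in a real deployment:

```javascript
// Minimal in-process TTL cache sketch: entries expire lazily on read.
class TtlCache {
  constructor() {
    this.store = new Map();
  }

  set(key, value, ttlMs) {
    this.store.set(key, { value, expiresAt: Date.now() + ttlMs });
  }

  get(key) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() >= entry.expiresAt) {
      this.store.delete(key); // lazy expiry on read
      return undefined;
    }
    return entry.value;
  }
}

// Wrap a loader so misses fall through to the source of truth.
async function cached(cache, key, ttlMs, loader) {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await loader();
  cache.set(key, value, ttlMs);
  return value;
}
```

The hard part the agent helps with is not this mechanic but the policy around it: which keys get which TTLs, and which data (carts, checkout) must never enter the cache at all.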

Frontend Core Web Vitals Remediation

LCP, CLS, and INP failures are increasingly tied to search ranking and user retention. The agent understands the full stack of frontend performance — render-blocking resources, image optimization, JavaScript bundle analysis, and server response time — and can generate prioritized fix lists with expected CWV impact.

Load Testing Design and Execution

Most load tests are either too simple (constant arrival rate, single endpoint) or misconfigured (unrealistic ramp patterns, missing headers). The agent generates production-quality test scripts for JMeter, k6, and Locust, calibrated to realistic traffic shapes.
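A concrete difference between a naive test and a realistic one is the load shape. The sketch below builds the `stages` array that k6 accepts in its `options` object, replacing a flat constant rate with warm-up, ramp, sustained peak, and cool-down; the durations and the 10% warm-up fraction are illustrative defaults, not recommendations:

```javascript
// Sketch of a realistic load-shape builder for k6's `options.stages`.
// Models warm-up, ramp to peak, sustained peak, and cool-down instead of
// a flat constant arrival rate. All targets/durations are illustrative.
function rampStages(peakVus) {
  return [
    { duration: "2m", target: Math.ceil(peakVus * 0.1) }, // warm-up
    { duration: "5m", target: peakVus },                  // ramp to peak
    { duration: "10m", target: peakVus },                 // sustain peak
    { duration: "2m", target: 0 },                        // cool-down
  ];
}

// In a k6 script this would be used as:
//   export const options = { stages: rampStages(200) };
```

The sustain phase matters most: problems like connection pool exhaustion or GC pressure often only appear after several minutes at peak, which a short spike test never reveals.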

Real-World Examples

Scenario 1: API Endpoint Latency Investigation

A team notices their product search API is averaging 800ms response time under moderate load. They turn to the Performance Engineer agent to diagnose and prioritize fixes.

Developer: Our /api/search endpoint is sitting at 800ms average, 
2.1s p99 under about 200 concurrent users. We're on Node.js, 
PostgreSQL, no caching layer yet. Where do we start?

Agent: Before recommending changes, let's establish a baseline 
and locate the actual bottleneck. Here's the investigation sequence:

1. Instrument the request lifecycle first. Add timing spans at:
   - DB query execution (are you hitting N+1 here?)
   - External API calls (any third-party enrichment?)
   - Serialization/response construction
   
2. Run EXPLAIN ANALYZE on your search query. Specifically look 
   for Seq Scan on large tables — that's usually the first signal.

3. At 200 concurrent users and an 800ms average, the database is likely 
   CPU-bound or missing an index. Share the query plan output and I'll 
   map the optimization path.

Initial hypothesis ranking by typical impact for this profile:
- Missing full-text search index: 40-70% latency reduction
- Query result caching (Redis, TTL 60s): 80%+ reduction for repeat queries
- Connection pool exhaustion: would show as p99 spike specifically
- N+1 on related data: would scale linearly with result set size

What does your query plan show?

Scenario 2: Setting Up a Caching Architecture

An engineering team is preparing to implement caching across their e-commerce platform and needs a coherent multi-layer strategy rather than ad-hoc Redis calls.

Developer: We need to add caching to our e-commerce platform. 
We have product pages, user carts, inventory data, and a 
recommendations engine. Where and how do we cache?

Agent: E-commerce caching needs to be layered precisely because 
the data has very different staleness tolerances. Here's the 
architecture I'd implement:

CDN Layer (Cloudflare/Fastly):
- Product pages: Cache with 5-minute TTL, purge on inventory update
- Static assets: Immutable cache headers, content-hashed filenames
- Do NOT cache: Cart pages, checkout flow, authenticated sessions

Redis Layer:
- Product catalog: TTL 300s, invalidate on CMS publish event
- Inventory counts: TTL 30s — users will accept slight staleness
- Recommendations: TTL 3600s per user_id, LRU eviction policy
- Sessions: TTL 86400s with sliding expiration

Application Layer (in-process):
- Category/taxonomy data: TTL 600s, shared across requests
- Configuration/feature flags: TTL 60s

Cart data: Never cache. Always read from source of truth. 
The cost of a stale cart is customer trust.

I'll generate the Redis implementation with TTL strategy, 
cache warming scripts for your catalog, and a CDN configuration. 
Want k6 load tests to validate the cache hit rate improvement 
before you deploy?
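The Redis-layer plan from that conversation can be captured as a policy table the application consults when writing cache entries, so TTL decisions live in one place instead of being scattered across call sites. The event names and key names below are illustrative:

```javascript
// The Redis-layer TTL plan from the conversation above, as a policy table.
// Event names and key conventions are illustrative, not a real API.
const redisTtlPolicy = {
  productCatalog: { ttlSeconds: 300, invalidateOn: "cms.publish" },
  inventoryCount: { ttlSeconds: 30 },                // slight staleness is acceptable
  recommendations: { ttlSeconds: 3600, perKey: "user_id" },
  session: { ttlSeconds: 86400, sliding: true },     // refresh TTL on access
};

// Cart data is deliberately absent: it is never cached.
function ttlFor(kind) {
  const policy = redisTtlPolicy[kind];
  if (!policy) {
    throw new Error(`no cache policy for "${kind}" — read from source of truth`);
  }
  return policy.ttlSeconds;
}
```

Making the absence of a policy an error, rather than a silent default TTL, enforces the "never cache the cart" rule at the code level.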

What Makes This Agent Powerful

Measure Before You Optimize

The agent’s first principle is instrumentation over intuition. Every optimization conversation starts with establishing what the current state actually is — latency distributions, profiling data, query plans. This alone prevents the most common performance mistake: optimizing things that don’t need optimizing.

Impact-Ranked Recommendations

The agent doesn’t output a flat list of suggestions. It ranks recommendations by expected impact on user-perceived performance, which means your engineering time goes to the changes that move the needle rather than the ones that are easiest to implement.

Full-Stack Coverage

From flamegraphs and CPU profiling to Core Web Vitals to CDN configuration to PostgreSQL query plans — the agent operates across the entire performance stack without switching context. A slow page load might have root causes in three different layers simultaneously, and the agent can reason across all of them.

Production-Quality Tooling Output

The agent generates ready-to-run artifacts: k6 and Locust scripts calibrated to realistic traffic patterns, Redis caching implementations with proper TTL strategies, monitoring dashboard configurations, and before/after benchmark frameworks. These aren’t boilerplate — they’re configured to your specific scenario.

Performance Budget Thinking

One of the most underused practices in frontend and API performance is setting explicit budgets and enforcing them in CI. The agent understands performance budgets as first-class engineering constraints and can help you implement them in your build and test pipeline.
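A budget gate of this kind is a small amount of code: compare measured metrics against explicit thresholds and fail the CI job on any breach. The budget numbers below are placeholders for your own targets (the LCP/INP/CLS values happen to match the commonly cited "good" Core Web Vitals thresholds):

```javascript
// Sketch of a performance-budget gate for CI: compare measured metrics
// against explicit budgets and fail the build on any breach.
// Budget values are placeholders for your own targets.
const budgets = {
  lcpMs: 2500,   // Largest Contentful Paint
  inpMs: 200,    // Interaction to Next Paint
  cls: 0.1,      // Cumulative Layout Shift (unitless)
  apiP95Ms: 400, // p95 latency for the critical endpoint
};

function checkBudgets(measured) {
  const breaches = Object.keys(budgets).filter(
    (metric) => measured[metric] > budgets[metric]
  );
  return { pass: breaches.length === 0, breaches };
}

// In CI: call process.exit(1) when checkBudgets(results).pass is false,
// so a regression blocks the merge instead of shipping silently.
```

The value is less in the check itself than in the conversation it forces: agreeing on the numbers up front turns "make it faster" tickets into pass/fail criteria.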

Specific Numbers, Not Vague Guidance

The agent consistently works with concrete benchmarks. When it recommends an index, it estimates the query time reduction. When it suggests a caching TTL, it explains the staleness tradeoff in user terms. This makes it substantially easier to prioritize and justify performance work to stakeholders.

How to Install the Performance Engineer Agent

Installing this agent takes about two minutes. Claude Code loads agents automatically from a designated directory in your project.

Step 1: In the root of your project, create the directory .claude/agents/ if it doesn’t already exist.

Step 2: Create a new file at the following path:

.claude/agents/performance-engineer.md
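If you prefer the command line, Steps 1 and 2 together are:

```shell
# From the project root: create the agents directory and the agent file.
mkdir -p .claude/agents
touch .claude/agents/performance-engineer.md
# Then paste the system prompt from Step 3 into that file.
```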

Step 3: Paste the following system prompt into that file:

You are a performance engineer specializing in application 
optimization and scalability.

## Focus Areas
- Application profiling (CPU, memory, I/O)
- Load testing with JMeter/k6/Locust
- Caching strategies (Redis, CDN, browser)
- Database query optimization
- Frontend performance (Core Web Vitals)
- API response time optimization

## Approach
1. Measure before optimizing
2. Focus on biggest bottlenecks first
3. Set performance budgets
4. Cache at appropriate layers
5. Load test realistic scenarios

## Output
- Performance profiling results with flamegraphs
- Load test scripts and results
- Caching implementation with TTL strategy
- Optimization recommendations ranked by impact
- Before/after performance metrics
- Monitoring dashboard setup

Include specific numbers and benchmarks. Focus on 
user-perceived performance.

Step 4: Save the file. Claude Code will automatically detect and load agents from .claude/agents/ — no additional configuration required. The Performance Engineer agent will now be available in your Claude Code sessions.

You can commit this file to your repository so the entire team benefits from the same agent configuration.

Conclusion and Next Steps

The Performance Engineer agent is most valuable when you use it before performance becomes a crisis. Add it to your project now, and the next time a query runs slow, a load test reveals unexpected behavior, or you’re architecting a caching layer, you have a structured, measurement-first collaborator ready to work through it with you.

Practical next steps to get value immediately:

  • Install the agent and run it against your slowest API endpoint with a copy of your current query plan or profiling data
  • Use it to generate a k6 load test script for your most critical user journey before your next release
  • Ask it to audit your current caching architecture and identify TTL strategy gaps
  • Set up performance budgets for your Core Web Vitals with the agent’s help, then enforce them in CI

Performance work done well is invisible to users — pages load, APIs respond, systems stay stable under load. Done poorly, it’s the reason users churn. The Performance Engineer agent gives you the methodology and tooling to consistently get it right.

Agent template sourced from the claude-code-templates open source project (MIT License).
