Terraform Specialist: The Claude Code Agent That Eliminates IaC Toil
Infrastructure as Code should make your life easier. In practice, it often means hours debugging cryptic state errors, untangling provider version conflicts, refactoring copy-pasted module configurations across environments, or reverse-engineering what some long-gone contractor actually provisioned. Terraform is powerful, but the cognitive overhead of doing it well — remote state backends, workspace strategies, version locking, drift detection — is substantial.
The Terraform Specialist agent for Claude Code targets exactly this problem. It’s a purpose-built sub-agent that brings deep Terraform expertise into your editor, ready to scaffold production-quality modules, reason about state management strategies, debug plan failures, and generate the backend configurations and CI/CD integrations that most teams either skip or get wrong. It’s the difference between asking a generalist LLM “how do Terraform workspaces work?” and working alongside someone who knows when workspaces are the wrong answer entirely.
If your team manages cloud infrastructure with Terraform — even occasionally — this agent will reclaim hours you currently spend on documentation, boilerplate, and painful trial-and-error cycles.
When to Use the Terraform Specialist
This agent is explicitly designed to be invoked proactively, not just when you’re already stuck. Here are the scenarios where it earns its keep:
Designing New Modules From Scratch
Starting a new Terraform module involves a lot of decisions: input variable structure, output definitions, version constraints, whether to expose the full resource interface or a curated subset. The Terraform Specialist generates complete, opinionated modules with proper variable definitions, defaults, and .tfvars examples — not just a skeleton you have to fill in yourself.
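To make that concrete, here is a rough sketch of the kind of variables.tf excerpt such a module might contain. The names, defaults, and validation bounds are illustrative, not the agent's literal output:

```hcl
# Hypothetical excerpt of a generated variables.tf -- names and
# defaults are illustrative.
variable "instance_class" {
  description = "RDS instance class, e.g. db.t3.medium"
  type        = string
  default     = "db.t3.medium"
}

variable "backup_retention_period" {
  description = "Days to retain automated backups (0 disables them)"
  type        = number
  default     = 7

  validation {
    condition     = var.backup_retention_period >= 0 && var.backup_retention_period <= 35
    error_message = "backup_retention_period must be between 0 and 35 days."
  }
}
```

Validation blocks like the one above are part of what separates an opinionated module from a bare skeleton: bad inputs fail at plan time instead of at apply time.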
Remote State Setup and Migration
Whether you’re moving from local state to S3, Azure Storage, or Terraform Cloud, or consolidating multiple state files into a coherent structure, this agent knows the backend configuration patterns, the necessary IAM and RBAC permissions, and the migration procedures that won’t corrupt your state.
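For reference, a typical S3 backend block with DynamoDB state locking looks roughly like this; the bucket, key, and table names are placeholders you would replace with your own:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"          # placeholder bucket name
    key            = "network/prod/terraform.tfstate"  # placeholder state path
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"                 # enables state locking
  }
}
```

The agent also generates the IAM policy granting s3:GetObject/PutObject on the state key and dynamodb:GetItem/PutItem/DeleteItem on the lock table, which is the part teams most often get wrong.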
Multi-Environment Workspace Strategies
The “workspaces vs. directory-per-environment vs. separate root modules” debate has real consequences for your team’s ability to manage infrastructure at scale. The Terraform Specialist can walk through the tradeoffs and generate the directory structure, variable files, and CI/CD configuration appropriate for your situation.
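As one illustration of the tradeoff: a single root module keyed off terraform.workspace stays DRY but ties every environment to the same backend and code version. A sketch of that pattern, with invented instance sizes:

```hcl
# Workspace-keyed environment settings -- values are illustrative.
# `terraform workspace select prod` picks the prod row.
locals {
  env_settings = {
    dev     = { instance_class = "db.t3.micro",  multi_az = false }
    staging = { instance_class = "db.t3.medium", multi_az = false }
    prod    = { instance_class = "db.r6g.large", multi_az = true }
  }
  settings = local.env_settings[terraform.workspace]
}
```

Directory-per-environment trades that coupling for duplication; separate root modules trade it for wiring overhead. The agent's value is matching the pattern to your team's blast-radius tolerance, not picking a universal winner.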
Importing Existing Infrastructure
Few tasks are more tedious than bringing existing cloud resources under Terraform management. The agent can generate terraform import commands, write the corresponding resource configurations, and produce a plan for validating that the imported state matches reality.
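On Terraform 1.5 and later, this can also be done declaratively with import blocks rather than one-off CLI commands; a minimal sketch with an invented resource identifier:

```hcl
# Declarative import (Terraform >= 1.5). The ID is illustrative.
import {
  to = aws_db_instance.legacy
  id = "legacy-postgres-prod"
}
```

Paired with `terraform plan -generate-config-out=generated.tf`, Terraform drafts the matching resource configuration from the live resource, which you then curate before committing.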
CI/CD Pipeline Integration
Wiring Terraform into GitHub Actions, GitLab CI, or Azure Pipelines correctly — with plan-on-PR, apply-on-merge, state locking, and proper secret handling — is non-trivial. The agent generates pipeline configurations with the right sequencing and safety checks built in.
Drift Detection and Remediation
When terraform plan shows unexpected changes, you need to understand whether your code drifted from reality or reality drifted from your code. This is where the agent’s focus on terraform plan output analysis and import workflows pays off.
Real-World Examples
Scenario: Bootstrapping a Production-Ready AWS Module
A backend engineer is tasked with creating a reusable Terraform module for provisioning RDS instances across dev, staging, and production environments. They’ve been handed a half-finished module with hardcoded values and no variable documentation.
Developer: I need to create a reusable Terraform module for RDS PostgreSQL instances.
We need to support dev, staging, and prod with different instance sizes and backup
retention. Currently we have some hardcoded configs scattered across three directories.
Terraform Specialist: I'll scaffold a complete RDS module that consolidates those
configs. Here's the structure:
modules/rds-postgres/
├── main.tf       # aws_db_instance, aws_db_subnet_group, aws_security_group
├── variables.tf  # instance_class, allocated_storage, backup_retention_period, etc.
├── outputs.tf    # endpoint, port, db_name, security_group_id
└── versions.tf   # required AWS provider >= 5.0
I'll expose instance_class, storage, backup_retention, and deletion_protection
as inputs with sensible defaults. Multi-AZ and enhanced monitoring get their own
flags. I'll also generate dev.tfvars, staging.tfvars, and prod.tfvars so the
differences between environments are explicit and reviewable.
One thing to flag: if those three directories have separate state files, you'll
want to run terraform state mv before decommissioning them, not just delete the old
configs. Want me to generate the migration plan alongside the module?
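The per-environment .tfvars files mentioned in the exchange might look something like this; all values are illustrative:

```hcl
# dev.tfvars (illustrative)
instance_class          = "db.t3.micro"
backup_retention_period = 1
deletion_protection     = false

# prod.tfvars (illustrative)
# instance_class          = "db.r6g.large"
# backup_retention_period = 30
# deletion_protection     = true
```

Keeping the environment differences in small, parallel files like these makes them reviewable in a single diff, which is the point of consolidating the three directories.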
Scenario: Debugging State Corruption After a Failed Apply
A platform engineer ran terraform apply during a network interruption. The state file is now inconsistent with what’s actually provisioned. They need to recover without destroying live infrastructure.
Developer: Our apply failed halfway through with a network timeout. Now terraform plan
shows it wants to recreate resources that already exist. State is in S3. How do we
recover without blowing away production?
Terraform Specialist: First, do not run apply yet. Here's the recovery sequence:
1. Pull the current state: terraform state pull > backup-$(date +%Y%m%d).tfstate
Keep this. Even if it's wrong, you want a restore point.
2. Run terraform apply -refresh-only (the current replacement for the deprecated
   terraform refresh) to sync state with actual cloud resources. This won't
   change infrastructure, only the state file's view of it, and it shows you
   the state changes for approval before writing them.
3. Compare plan output before and after refresh. If resources that "want to be
recreated" disappear from the plan, refresh fixed it. If they're still there,
the resources likely don't exist in AWS at all — they need to be imported or
recreated.
4. For any resources that exist in AWS but not in state, use terraform import.
Give me the resource types and IDs from the AWS console and I'll generate
the import commands and matching resource configs.
5. Once plan shows zero unexpected changes, re-enable state locking if you
disabled it during recovery. Your S3 backend should have DynamoDB locking —
if it doesn't, let's add that now to prevent this class of problem going forward.
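The locking table from step 5 is a small piece of Terraform in its own right. A minimal sketch: the LockID hash key of type S is what the S3 backend requires, while the table name is a placeholder that must match the dynamodb_table setting in your backend block:

```hcl
# Minimal DynamoDB table for S3 backend state locking.
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"  # must match the backend's dynamodb_table
  billing_mode = "PAY_PER_REQUEST"  # no capacity planning for a lock table
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```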
Scenario: Setting Up Terraform in CI/CD
A DevOps engineer needs to configure GitHub Actions to run Terraform plan on PRs and apply on merge to main, with proper secret management and plan output posted as a PR comment.
Developer: Need GitHub Actions for Terraform — plan on PR, apply on merge.
AWS credentials are in GitHub secrets. We want the plan diff visible in the PR.
Terraform Specialist: I'll generate a two-workflow setup. The PR workflow runs
fmt, validate, and plan, then posts the plan output as a PR comment using
actions/github-script. The merge workflow runs apply with -auto-approve only
on pushes to main.
Key details I'll include: OIDC-based AWS authentication instead of static
access keys (much safer, no rotation burden), a -lock-timeout on plan and
apply so stuck state locks don't block your pipeline indefinitely, and a manual
approval step before apply that you can enable when you're ready to require it.
I'll also add a .terraform.lock.hcl commit check — if your lockfile isn't
committed, your CI environment can silently use different provider versions
than local development. That's a class of bug worth eliminating upfront.
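An abbreviated sketch of the PR-side workflow described above; the action versions, role ARN, and names are placeholders, and the PR-comment step is omitted for brevity:

```yaml
# .github/workflows/terraform-plan.yml -- abbreviated sketch.
name: terraform-plan
on:
  pull_request:
permissions:
  id-token: write      # required for OIDC auth to AWS
  contents: read
  pull-requests: write # needed to post the plan as a PR comment
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci  # placeholder
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      - run: terraform fmt -check
      - run: terraform validate
      - run: terraform plan -input=false -lock-timeout=5m
```

The apply workflow mirrors this but triggers on push to main and swaps the last step for terraform apply.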
What Makes This Agent Powerful
Opinionated by Design
The agent operates from a clear set of principles: DRY module design, state files treated as sacred artifacts, plan-before-apply discipline, version locking for reproducibility, and data sources over hardcoded values. This isn’t a neutral assistant that will help you do things the wrong way if you ask nicely — it will flag risky patterns and suggest the safer path.
Complete Outputs, Not Fragments
The agent is configured to produce full modules with input variables, backend configurations, provider requirements with version constraints, Makefile targets for common operations, and pre-commit hooks for validation. You get something you can actually commit, not a starting point you have to complete yourself.
State Management Expertise
State management is where most Terraform disasters originate. The agent’s explicit focus on backup-first, refresh-before-apply, and import workflows reflects hard-won operational experience. It won’t let you skip the steps that prevent data loss.
Multi-Cloud Backend Knowledge
Whether your remote state lives in Azure Storage, AWS S3 with DynamoDB locking, or Terraform Cloud, the agent knows the configuration patterns and the permissions required. You don’t have to cross-reference three different documentation sites to get a working backend block.
How to Install the Terraform Specialist
Installing this agent takes about sixty seconds. Claude Code automatically discovers and loads agents defined in your project’s .claude/agents/ directory.
Step 1: In your project root (or your home directory for a global agent), create the directory:
mkdir -p .claude/agents
Step 2: Create the file .claude/agents/terraform-specialist.md and paste the following system prompt as the file contents:
---
name: terraform-specialist
description: Terraform and Infrastructure as Code specialist. Use PROACTIVELY for Terraform modules, state management, IaC best practices, provider configurations, workspace management, and drift detection.
---
You are a Terraform specialist focused on infrastructure automation and state management.
## Focus Areas
- Module design with reusable components
- Remote state management (Azure Storage, S3, Terraform Cloud)
- Provider configuration and version constraints
- Workspace strategies for multi-environment
- Import existing resources and drift detection
- CI/CD integration for infrastructure changes
## Approach
1. DRY principle - create reusable modules
2. State files are sacred - always backup
3. Plan before apply - review all changes
4. Lock versions for reproducibility
5. Use data sources over hardcoded values
## Output
- Terraform modules with input variables
- Backend configuration for remote state
- Provider requirements with version constraints
- Makefile/scripts for common operations
- Pre-commit hooks for validation
- Migration plan for existing infrastructure
Always include .tfvars examples. Show both plan and apply outputs.
Step 3: Claude Code loads agents automatically on startup. No CLI commands, no configuration files to edit. Open Claude Code in any project and the agent will be available.
To invoke it, either reference it directly in your prompt (“Using the Terraform specialist, scaffold a VPC module…”) or let Claude Code route to it automatically when your request involves Terraform or IaC work.
Next Steps
Once the agent is installed, put it to work immediately on the highest-leverage problems in your infrastructure codebase. If you have hardcoded values scattered across environment-specific configurations, start there — ask the agent to refactor them into a proper module with variable definitions. If your state is still local, ask it to generate the backend migration plan for your cloud provider. If your CI pipeline runs terraform apply without a plan review step, have it generate the corrected workflow.
The Terraform Specialist works best when you give it context: your provider, your cloud, your current directory structure, your constraints. The more specific your input, the more directly usable the output. Treat it as a pairing partner who already knows Terraform deeply, and use it proactively — before you’ve spent two hours on a problem, not after.
Agent template sourced from the claude-code-templates open source project (MIT License).
