Terraform Specialist: The Claude Code Agent That Eliminates IaC Toil
Infrastructure as Code should make your life easier. In practice, it often means hours debugging cryptic state errors, untangling provider version conflicts, refactoring copy-pasted module configurations across environments, or reverse-engineering what some long-gone contractor actually provisioned. Terraform is powerful, but the cognitive overhead of doing it well — remote state backends, workspace strategies, version locking, drift detection — is substantial.
The Terraform Specialist agent for Claude Code targets exactly this problem. It’s a purpose-built sub-agent that brings deep Terraform expertise into your editor, ready to scaffold production-quality modules, reason about state management strategies, debug plan failures, and generate the backend configurations and CI/CD integrations that most teams either skip or get wrong. It’s the difference between asking a generalist LLM “how do Terraform workspaces work?” and working alongside someone who knows when workspaces are the wrong answer entirely.
If your team manages cloud infrastructure with Terraform — even occasionally — this agent will reclaim hours you currently spend on documentation, boilerplate, and painful trial-and-error cycles.
When to Use the Terraform Specialist
This agent is explicitly designed to be invoked proactively, not just when you’re already stuck. Here are the scenarios where it earns its keep:
Designing New Modules From Scratch
Starting a new Terraform module involves a lot of decisions: input variable structure, output definitions, version constraints, whether to expose the full resource interface or a curated subset. The Terraform Specialist generates complete, opinionated modules with proper variable definitions, defaults, and .tfvars examples — not just a skeleton you have to fill in yourself.
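To make that concrete, here is a rough sketch of the kind of variables.tf excerpt such a module might contain. The names, defaults, and validation bounds are illustrative, not the agent's literal output:

```hcl
# Hypothetical excerpt of a generated variables.tf -- names and
# defaults are illustrative.
variable "instance_class" {
  description = "RDS instance class, e.g. db.t3.medium"
  type        = string
  default     = "db.t3.medium"
}

variable "backup_retention_period" {
  description = "Days to retain automated backups (0 disables them)"
  type        = number
  default     = 7

  validation {
    condition     = var.backup_retention_period >= 0 && var.backup_retention_period <= 35
    error_message = "backup_retention_period must be between 0 and 35 days."
  }
}
```

Validation blocks like the one above are part of what separates an opinionated module from a bare skeleton: bad inputs fail at plan time instead of at apply time.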
Remote State Setup and Migration
Whether you’re moving from local state to S3, Azure Storage, or Terraform Cloud, or consolidating multiple state files into a coherent structure, this agent knows the backend configuration patterns, the necessary IAM and RBAC permissions, and the migration procedures that won’t corrupt your state.
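For reference, a typical S3 backend block with DynamoDB state locking looks roughly like this; the bucket, key, and table names are placeholders you would replace with your own:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"          # placeholder bucket name
    key            = "network/prod/terraform.tfstate"  # placeholder state path
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"                 # enables state locking
  }
}
```

The agent also generates the IAM policy granting s3:GetObject/PutObject on the state key and dynamodb:GetItem/PutItem/DeleteItem on the lock table, which is the part teams most often get wrong.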
Multi-Environment Workspace Strategies
The “workspaces vs. directory-per-environment vs. separate root modules” debate has real consequences for your team’s ability to manage infrastructure at scale. The Terraform Specialist can walk through the tradeoffs and generate the directory structure, variable files, and CI/CD configuration appropriate for your situation.
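As one illustration of the tradeoff: a single root module keyed off terraform.workspace stays DRY but ties every environment to the same backend and code version. A sketch of that pattern, with invented instance sizes:

```hcl
# Workspace-keyed environment settings -- values are illustrative.
# `terraform workspace select prod` picks the prod row.
locals {
  env_settings = {
    dev     = { instance_class = "db.t3.micro",  multi_az = false }
    staging = { instance_class = "db.t3.medium", multi_az = false }
    prod    = { instance_class = "db.r6g.large", multi_az = true }
  }
  settings = local.env_settings[terraform.workspace]
}
```

Directory-per-environment trades that coupling for duplication; separate root modules trade it for wiring overhead. The agent's value is matching the pattern to your team's blast-radius tolerance, not picking a universal winner.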
Importing Existing Infrastructure
Few tasks are more tedious than bringing existing cloud resources under Terraform management. The agent can generate terraform import commands, write the corresponding resource configurations, and produce a plan for validating that the imported state matches reality.
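On Terraform 1.5 and later, this can also be done declaratively with import blocks rather than one-off CLI commands; a minimal sketch with an invented resource identifier:

```hcl
# Declarative import (Terraform >= 1.5). The ID is illustrative.
import {
  to = aws_db_instance.legacy
  id = "legacy-postgres-prod"
}
```

Paired with `terraform plan -generate-config-out=generated.tf`, Terraform drafts the matching resource configuration from the live resource, which you then curate before committing.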
CI/CD Pipeline Integration
Wiring Terraform into GitHub Actions, GitLab CI, or Azure Pipelines correctly — with plan-on-PR, apply-on-merge, state locking, and proper secret handling — is non-trivial. The agent generates pipeline configurations with the right sequencing and safety checks built in.
Drift Detection and Remediation
When terraform plan shows unexpected changes, you need to understand whether your code drifted from reality or reality drifted from your code. This is where the agent’s focus on terraform plan output analysis and import workflows pays off.
Real-World Examples
Scenario: Bootstrapping a Production-Ready AWS Module
A backend engineer is tasked with creating a reusable Terraform module for provisioning RDS instances across dev, staging, and production environments. They’ve been handed a half-finished module with hardcoded values and no variable documentation.
Developer: I need to create a reusable Terraform module for RDS PostgreSQL instances.
We need to support dev, staging, and prod with different instance sizes and backup
retention. Currently we have some hardcoded configs scattered across three directories.
Terraform Specialist: I'll scaffold a complete RDS module that consolidates those
configs. Here's the structure:
modules/rds-postgres/
├── main.tf       # aws_db_instance, aws_db_subnet_group, aws_security_group
├── variables.tf  # instance_class, allocated_storage, backup_retention_period, etc.
├── outputs.tf    # endpoint, port, db_name, security_group_id
└── versions.tf   # required AWS provider >= 5.0
I'll expose instance_class, storage, backup_retention, and deletion_protection
as inputs with sensible defaults. Multi-AZ and enhanced monitoring get their own
flags. I'll also generate dev.tfvars, staging.tfvars, and prod.tfvars so the
differences between environments are explicit and reviewable.
One thing to flag: if those three directories have separate state files, you'll
want to run terraform state mv before decommissioning them, not just delete the old
configs. Want me to generate the migration plan alongside the module?
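The per-environment .tfvars files mentioned in the exchange might look something like this; all values are illustrative:

```hcl
# dev.tfvars (illustrative)
instance_class          = "db.t3.micro"
backup_retention_period = 1
deletion_protection     = false

# prod.tfvars (illustrative)
# instance_class          = "db.r6g.large"
# backup_retention_period = 30
# deletion_protection     = true
```

Keeping the environment differences in small, parallel files like these makes them reviewable in a single diff, which is the point of consolidating the three directories.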
Scenario: Debugging State Corruption After a Failed Apply
A platform engineer ran terraform apply during a network interruption. The state file is now inconsistent with what’s actually provisioned. They need to recover without destroying live infrastructure.
Developer: Our apply failed halfway through with a network timeout. Now terraform plan
shows it wants to recreate resources that already exist. State is in S3. How do we
recover without blowing away production?
Terraform Specialist: First, do not run apply yet. Here's the recovery sequence:
1. Pull the current state: terraform state pull > backup-$(date +%Y%m%d).tfstate
Keep this. Even if it's wrong, you want a restore point.
2. Run terraform apply -refresh-only (the current replacement for the deprecated
   terraform refresh) to sync state with actual cloud resources. This won't
   change infrastructure, only the state file's view of it, and it shows you
   the state changes for approval before writing them.
3. Compare plan output before and after refresh. If resources that "want to be
recreated" disappear from the plan, refresh fixed it. If they're still there,
the resources likely don't exist in AWS at all — they need to be imported or
recreated.
4. For any resources that exist in AWS but not in state, use terraform import.
Give me the resource types and IDs from the AWS console and I'll generate
the import commands and matching resource configs.
5. Once plan shows zero unexpected changes, re-enable state locking if you
disabled it during recovery. Your S3 backend should have DynamoDB locking —
if it doesn't, let's add that now to prevent this class of problem going forward.
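The locking table from step 5 is a small piece of Terraform in its own right. A minimal sketch: the LockID hash key of type S is what the S3 backend requires, while the table name is a placeholder that must match the dynamodb_table setting in your backend block:

```hcl
# Minimal DynamoDB table for S3 backend state locking.
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"  # must match the backend's dynamodb_table
  billing_mode = "PAY_PER_REQUEST"  # no capacity planning for a lock table
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```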
Scenario: Setting Up Terraform in CI/CD
A DevOps engineer needs to configure GitHub Actions to run Terraform plan on PRs and apply on merge to main, with proper secret management and plan output posted as a PR comment.
Developer: Need GitHub Actions for Terraform — plan on PR, apply on merge.
AWS credentials are in GitHub secrets. We want the plan diff visible in the PR.
Terraform Specialist: I'll generate a two-workflow setup. The PR workflow runs
fmt, validate, and plan, then posts the plan output as a PR comment using
actions/github-script. The merge workflow runs apply with -auto-approve only
on pushes to main.
Key details I'll include: OIDC-based AWS authentication instead of static
access keys (much safer, no rotation burden), a -lock-timeout on plan and
apply so stuck state locks don't block your pipeline indefinitely, and a manual
approval step before apply that you can enable when you're ready to require it.
I'll also add a .terraform.lock.hcl commit check — if your lockfile isn't
committed, your CI environment can silently use different provider versions
than local development. That's a class of bug worth eliminating upfront.
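An abbreviated sketch of the PR-side workflow described above; the action versions, role ARN, and names are placeholders, and the PR-comment step is omitted for brevity:

```yaml
# .github/workflows/terraform-plan.yml -- abbreviated sketch.
name: terraform-plan
on:
  pull_request:
permissions:
  id-token: write      # required for OIDC auth to AWS
  contents: read
  pull-requests: write # needed to post the plan as a PR comment
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci  # placeholder
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      - run: terraform fmt -check
      - run: terraform validate
      - run: terraform plan -input=false -lock-timeout=5m
```

The apply workflow mirrors this but triggers on push to main and swaps the last step for terraform apply.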
What Makes This Agent Powerful
Opinionated by Design
The agent operates from a clear set of principles: DRY module design, state files treated as sacred artifacts, plan-before-apply discipline, version locking for reproducibility, and data sources over hardcoded values. This isn’t a neutral assistant that will help you do things the wrong way if you ask nicely — it will flag risky patterns and suggest the safer path.
Complete Outputs, Not Fragments
The agent is configured to produce full modules with input variables, backend configurations, provider requirements with version constraints, Makefile targets for common operations, and pre-commit hooks for validation. You get something you can actually commit, not a starting point you have to complete yourself.
State Management Expertise
State management is where most Terraform disasters originate. The agent’s explicit focus on backup-first, refresh-before-apply, and import workflows reflects hard-won operational experience. It won’t let you skip the steps that prevent data loss.
Multi-Cloud Backend Knowledge
Whether your remote state lives in Azure Storage, AWS S3 with DynamoDB locking, or Terraform Cloud, the agent knows the configuration patterns and the permissions required. You don’t have to cross-reference three different documentation sites to get a working backend block.
How to Install the Terraform Specialist
Installing this agent takes about sixty seconds. Claude Code automatically discovers and loads agents defined in your project’s .claude/agents/ directory.
Step 1: In your project root (or your home directory for a global agent), create the directory:
mkdir -p .claude/agents
Step 2: Create the file .claude/agents/terraform-specialist.md and paste the following system prompt as the file contents:
---
name: terraform-specialist
description: Terraform and Infrastructure as Code specialist. Use PROACTIVELY for Terraform modules, state management, IaC best practices, provider configurations, workspace management, and drift detection.
---
You are a Terraform specialist focused on infrastructure automation and state management.
## Focus Areas
- Module design with reusable components
- Remote state management (Azure Storage, S3, Terraform Cloud)
- Provider configuration and version constraints
- Workspace strategies for multi-environment
- Import existing resources and drift detection
- CI/CD integration for infrastructure changes
## Approach
1. DRY principle - create reusable modules
2. State files are sacred - always backup
3. Plan before apply - review all changes
4. Lock versions for reproducibility
5. Use data sources over hardcoded values
## Output
- Terraform modules with input variables
- Backend configuration for remote state
- Provider requirements with version constraints
- Makefile/scripts for common operations
- Pre-commit hooks for validation
- Migration plan for existing infrastructure
Always include .tfvars examples. Show both plan and apply outputs.
Step 3: Claude Code loads agents automatically on startup. No CLI commands, no configuration files to edit. Open Claude Code in any project and the agent will be available.
To invoke it, either reference it directly in your prompt (“Using the Terraform specialist, scaffold a VPC module…”) or let Claude Code route to it automatically when your request involves Terraform or IaC work.
Next Steps
Once the agent is installed, put it to work immediately on the highest-leverage problems in your infrastructure codebase. If you have hardcoded values scattered across environment-specific configurations, start there — ask the agent to refactor them into a proper module with variable definitions. If your state is still local, ask it to generate the backend migration plan for your cloud provider. If your CI pipeline runs terraform apply without a plan review step, have it generate the corrected workflow.
The Terraform Specialist works best when you give it context: your provider, your cloud, your current directory structure, your constraints. The more specific your input, the more directly usable the output. Treat it as a pairing partner who already knows Terraform deeply, and use it proactively — before you’ve spent two hours on a problem, not after.
Agent template sourced from the claude-code-templates open source project (MIT License).
