Your First AI Red Team Skill: From Zero to Weaponized in 30 Minutes

This is Part 1 of The Operator's Field Manual: Claude Code for Offensive Security, a 7-part series on building AI-augmented red team operations from scratch.

Why Claude Code (and Why It's Not the Only Option)

This series uses Claude Code because it is the best tool I have found for offensive security work. The combination of agentic execution, custom skills, MCP server integration, and the ability to reason about complex attack surfaces in real time puts it ahead of everything else I have tested.

That said, the concepts in this series are not exclusive to Claude Code. Google's Gemini CLI, OpenAI's Codex, and other capable AI coding agents can run similar workflows. The skill structures, the MCP integrations, the phased methodology: all of it translates. If you are already deep in another ecosystem, you can adapt what you learn here. The principles are the same: AI that can read code, execute commands, iterate on output, and reason about what it finds is the new operating model for red teams. Claude Code is how I build it. It does not have to be how you build it.

What matters is the shift. The traditional approach to offensive security (manually grinding through targets, chasing rabbit holes, memorizing techniques from a pile of notes) hasn't caught up with what AI makes possible. The question is no longer "how do I memorize every technique?" It is "what skills should I be building, and what should I be creating with AI?"

This series answers that question. Let's start with your first skill.

Setting Up Claude Code for Offensive Work

Before you build anything, you need a project structure. Claude Code uses a CLAUDE.md file in the root of your project directory to understand what it's working on and how it should behave. For offensive security work, this file is your engagement brief.

Here's a starter config:

# Red Team Operations Workspace
 
## Scope
- Target: [engagement target or lab environment]
- Rules of engagement: [authorized testing boundaries]
- Out of scope: [what not to touch]
 
## Context
You are assisting an authorized offensive security operator. All testing
is conducted within legal boundaries against authorized targets. Follow
responsible disclosure practices.
 
## Offensive Workflow
1. Recon: enumerate the target, identify services, map the attack surface
2. Analysis: prioritize findings by exploitability and impact
3. Exploitation: test identified vulnerabilities with appropriate techniques
4. Documentation: log every finding with reproduction steps
 
## Tools Available
- Bash: full terminal access for running security tools
- Read/Write: access to project files, notes, and findings
- Grep/Glob: searching through enumeration output and logs
 
## Reporting
- Document every finding in findings/ with a severity rating
- Include reproduction steps for every vulnerability
- Screenshot or log evidence for the final report

Your project structure should look like this:

red-team-ops/
├── CLAUDE.md              # Engagement config (above)
├── .claude/
│   └── skills/            # Your custom offensive skills
│       ├── recon/
│       │   └── SKILL.md
│       ├── exploit/
│       │   └── SKILL.md
│       └── report/
│           └── SKILL.md
├── targets/               # Per-target working directories
├── findings/              # Documented vulnerabilities
├── tools/                 # Custom scripts and utilities
└── reports/               # Final deliverables
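If you'd rather not create that tree by hand, a few lines of Python can scaffold it. This is a minimal sketch: scaffold is a hypothetical helper name, CLAUDE.md and each SKILL.md start out empty (you paste the config and skills in afterwards), and existing files are never truncated, so rerunning it on a live engagement directory is safe.

```python
from pathlib import Path

# Directory names taken from the layout above; file contents start empty.
SKILLS = ["recon", "exploit", "report"]
DIRS = ["targets", "findings", "tools", "reports"]


def scaffold(root: Path) -> None:
    """Create the red-team-ops layout under root, leaving existing files untouched."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "CLAUDE.md").touch()  # touch() never truncates an existing file
    for skill in SKILLS:
        skill_dir = root / ".claude" / "skills" / skill
        skill_dir.mkdir(parents=True, exist_ok=True)
        (skill_dir / "SKILL.md").touch()
    for name in DIRS:
        (root / name).mkdir(exist_ok=True)
```

Calling scaffold(Path("red-team-ops")) once from wherever you keep engagements sets up the directories; per-target working folders then go under targets/ as the engagement grows.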

The key concept: skills are reusable offensive capabilities. Each skill is a SKILL.md file in its own directory that tells Claude Code what to do, how to do it, and what output to produce. Think of them as playbooks that Claude Code executes with full access to your terminal, files, and tools.

Subagents extend this further. Claude Code can hand work to subagents that run in parallel: one does recon while another analyzes previous findings, or one probes a web application while another enumerates internal services.

Hooks are shell commands that fire automatically on events. A PostToolUse hook could log every command Claude Code runs to an audit trail. A PreToolUse hook could enforce scope boundaries and block out-of-scope testing before a command ever runs.
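The scope-enforcement idea can be sketched as a PreToolUse hook in Python. Treat it as a heuristic under stated assumptions, not a complete shell parser: the SCOPE list, the list of network tools, and the script path tools/scope_guard.py are hypothetical, and the hook contract it relies on (the pending tool call arrives as JSON on stdin; exit code 2 blocks it and returns stderr to Claude) is worth confirming against the current Claude Code hooks documentation. It fails closed on commands it cannot parse, but it will miss targets it does not recognize, such as nmap ranges like 10.10.0.1-50.

```python
"""Hypothetical PreToolUse scope guard for Claude Code (tools/scope_guard.py).
Registered in .claude/settings.json roughly like this:
  {"hooks": {"PreToolUse": [{"matcher": "Bash",
    "hooks": [{"type": "command", "command": "python3 tools/scope_guard.py"}]}]}}
Claude Code sends the pending tool call as JSON on stdin; exit code 2 blocks
the call and feeds stderr back to Claude. Heuristic sketch, not a shell parser.
"""
import ipaddress
import json
import re
import shlex
import sys
from typing import List, Optional
from urllib.parse import urlparse

# Assumed engagement scope: domains match themselves and their subdomains.
SCOPE = ["target.example.com", "10.10.0.0/16"]
# Commands whose arguments are treated as network targets (illustrative list).
NETWORK_TOOLS = {"nmap", "curl", "wget", "dig", "host", "whois", "nc"}
SEPARATORS = {"|", "||", "&", "&&", ";", "(", ")"}
# Arguments like scan.txt or out.html look like hostnames; skip these suffixes.
FILE_EXTS = {"txt", "json", "xml", "html", "md", "log", "csv", "gnmap", "nmap"}
HOST_RE = re.compile(r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}", re.I)


def extract_host(token: str) -> Optional[str]:
    """Return the host an argument refers to, or None if it is not a target."""
    if token.startswith(("-", "+", "@")):  # flags, dig +options, dig @resolver
        return None
    if "://" in token:
        return urlparse(token).hostname
    try:
        ipaddress.ip_network(token, strict=False)  # single IP or CIDR range
        return token
    except ValueError:
        pass
    if HOST_RE.fullmatch(token) and token.rsplit(".", 1)[-1].lower() not in FILE_EXTS:
        return token.lower()
    return None


def in_scope(host: str, scope: List[str]) -> bool:
    """True if host sits inside an in-scope CIDR range or domain."""
    try:
        target = ipaddress.ip_network(host, strict=False)
    except ValueError:
        target = None
    for entry in scope:
        try:
            allowed = ipaddress.ip_network(entry, strict=False)
        except ValueError:  # domain entry
            if target is None and (host == entry or host.endswith("." + entry)):
                return True
            continue
        if target is not None and target.version == allowed.version and target.subnet_of(allowed):
            return True
    return False


def out_of_scope_targets(command: str, scope: List[str]) -> List[str]:
    """Hosts passed to network tools in a shell command that fall outside scope."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    violations, tool, at_start = [], None, True
    for token in lexer:
        if token in SEPARATORS:  # new pipeline segment or chained command
            tool, at_start = None, True
        elif at_start:
            if token != "sudo":
                tool, at_start = token.rsplit("/", 1)[-1], False
        elif tool in NETWORK_TOOLS:
            host = extract_host(token)
            if host and not in_scope(host, scope):
                violations.append(host)
    return violations


def main(raw: str) -> int:
    """Return 2 to block the tool call, 0 to allow it."""
    payload = json.loads(raw)
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    try:
        violations = out_of_scope_targets(command, SCOPE)
    except ValueError:  # e.g. unbalanced quotes: fail closed
        print("scope guard: could not parse command", file=sys.stderr)
        return 2
    if violations:
        print("Blocked, out of scope: " + ", ".join(violations), file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__" and not sys.stdin.isatty():
    stdin_text = sys.stdin.read()
    if stdin_text.strip():
        sys.exit(main(stdin_text))
```

The design choice here is where scope lives. Scope written into CLAUDE.md is guidance the model can misread; a hook runs on every matching tool call regardless of what the model decides, which is what you want for a hard boundary.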

MCP (Model Context Protocol) servers give Claude Code access to specialized tools: a Burp Suite bridge for web testing, a Kali toolset for exploitation, or an OSINT data source for reconnaissance. These extend Claude Code's capabilities beyond what's available through the terminal.

Building Your First Recon Skill

A Claude Code skill is a SKILL.md file that lives in its own directory under .claude/skills/ inside your project. When you ask Claude Code to use a skill (or your request matches the skill's description), it reads the file and follows the instructions. Simple concept, powerful execution.

Here's a complete recon skill:

---
name: recon
description: Automated reconnaissance and attack surface mapping
  for authorized targets. Enumerates services, fingerprints
  technology stacks, and identifies AI components.
---
 
# Recon Skill
 
Perform complete reconnaissance on an authorized target.
 
## Usage
 
Invoke with: "run recon on [target]" or "use the recon skill on [target]"
 
## Workflow
 
### Phase 1: Passive Reconnaissance
 
1. DNS enumeration: resolve the target domain, check for subdomains
   using dig, host, and any available subdomain enumeration tools
2. WHOIS and registration data: identify registrar, nameservers,
   and registration dates
3. Certificate transparency: search crt.sh for related domains
   and subdomains
4. Technology fingerprinting: inspect response headers with curl, analyze
   responses, and use any available fingerprinting tools to identify:
   - Web framework and server
   - JavaScript libraries and frontend stack
   - API patterns and endpoints
   - CDN and hosting provider
 
### Phase 2: AI Component Detection
 
Check for indicators of AI integration (probing endpoints and interfaces
sends requests to the target, so confirm this is in scope first):
1. Look for common AI/LLM endpoints: /api/chat, /api/completion,
   /v1/messages, /api/ask, /graphql with AI-related queries
2. Check response headers for AI service indicators
   (x-openai, x-anthropic, x-model, etc.)
3. Analyze JavaScript bundles for AI SDK references
   (openai, anthropic, langchain, llamaindex)
4. Test for chatbot interfaces, search with AI features,
   or agent-like behavior
5. Check robots.txt and sitemap for AI-related paths
6. Look for vector database indicators (pinecone, weaviate,
   chromadb, qdrant references)
 
### Phase 3: Active Service Enumeration
 
Only if authorized for active testing:
1. Port scan the target (top 1000 ports)
2. Service version detection on open ports
3. Web directory discovery on HTTP/HTTPS services
4. API endpoint enumeration from JavaScript source analysis
 
## Output Format
 
Produce a structured recon report:
 
### Target: [domain]
#### Infrastructure
- IP addresses and hosting
- CDN and WAF detection
- DNS configuration
 
#### Technology Stack
- Server and framework
- Frontend libraries
- API patterns detected
 
#### AI Components Detected
- LLM endpoints found
- AI SDK references
- Chatbot/agent interfaces
- RAG/vector database indicators
 
#### Attack Surface Summary
- Total services exposed
- AI-specific attack vectors identified
- Priority targets for next phase
 
#### Recommended Next Steps
- Specific tests to run based on findings
- AI components to probe further
- Potential vulnerability classes to investigate

Save this as .claude/skills/recon/SKILL.md in your project.
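Before relying on the skill, a quick pre-flight check of its frontmatter can save a confusing debugging session. Anthropic's skill documentation, as I read it, expects name to use only lowercase letters, numbers, and hyphens (64 characters max) and description to be non-empty (1024 characters max). The sketch below checks just those two rules. Its parser is deliberately minimal (flat key: value lines with indented continuation lines, not full YAML), and parse_frontmatter and check_skill are hypothetical helpers.

```python
import re
from typing import Dict, List

# Assumed constraint from Anthropic's skill docs: lowercase letters, digits, hyphens.
NAME_RE = re.compile(r"[a-z0-9-]{1,64}")
MAX_DESCRIPTION = 1024


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Read flat 'key: value' frontmatter between --- lines (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- line")
    fields: Dict[str, str] = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line[:1] in (" ", "\t") and key:
            fields[key] += " " + line.strip()  # fold indented continuation lines
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            fields[key] = value.strip()
    raise ValueError("missing closing --- line")


def check_skill(text: str) -> List[str]:
    """Return human-readable problems with a SKILL.md's name and description."""
    fields = parse_frontmatter(text)
    problems = []
    name = fields.get("name", "")
    if not NAME_RE.fullmatch(name):
        problems.append(f"name {name!r}: use lowercase letters, digits, hyphens (max 64)")
    description = fields.get("description", "")
    if not description:
        problems.append("description is empty")
    elif len(description) > MAX_DESCRIPTION:
        problems.append(f"description exceeds {MAX_DESCRIPTION} characters")
    return problems
```

An empty list means both fields pass; a capitalized name such as Recon comes back as a problem to fix.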

To use it, open Claude Code in your project directory and say:

run recon on target.example.com

Claude Code reads the skill, follows the workflow phases in order, uses your terminal to run the actual commands (dig, curl, nmap, whatever tools you have installed), and produces the structured output.
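Phase 2 of the skill is mostly pattern matching over content that has already been fetched. Here is a minimal sketch of that matching, the kind of check Claude Code might write and run for you, assuming you have saved a JavaScript bundle and the response headers from an in-scope request (fetching them is a separate step). The indicator lists come from the skill itself; scan is a hypothetical function name, and a regex hit is a lead to verify, not a confirmed AI component.

```python
import re
from typing import Dict, List, Mapping

# Indicator lists taken from Phase 2 of the skill; extend them per engagement.
SDK_PATTERNS = {
    "openai": re.compile(r"\bopenai\b", re.I),
    "anthropic": re.compile(r"\banthropic\b", re.I),
    "langchain": re.compile(r"\blangchain\b", re.I),
    "llamaindex": re.compile(r"\bllama[_-]?index\b", re.I),
}
VECTOR_DB_PATTERNS = {
    name: re.compile(rf"\b{name}\b", re.I)
    for name in ("pinecone", "weaviate", "chromadb", "qdrant")
}
# Quoted path literals (", ', or `) that start with a known LLM-style route.
ENDPOINT_RE = re.compile(r"""["'`](/(?:api/(?:chat|completion|ask)|v1/messages)[^"'`]*)["'`]""")
HEADER_PREFIXES = ("x-openai", "x-anthropic", "x-model")


def scan(bundle: str, headers: Mapping[str, str]) -> Dict[str, List[str]]:
    """Report AI indicators found in a saved JS bundle and its response headers."""
    return {
        "sdks": sorted(n for n, p in SDK_PATTERNS.items() if p.search(bundle)),
        "vector_dbs": sorted(n for n, p in VECTOR_DB_PATTERNS.items() if p.search(bundle)),
        "endpoints": sorted(set(ENDPOINT_RE.findall(bundle))),
        "headers": sorted(h.lower() for h in headers if h.lower().startswith(HEADER_PREFIXES)),
    }
```

Keeping the matching separate from the fetching means the same function works on any bundle Claude Code has already saved under targets/, and its dictionary maps directly onto the AI Components Detected section of the report.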

Here's what the output looks like in practice:

### Target: target.example.com

#### Infrastructure
- IP: 104.21.x.x (Cloudflare)
- CDN: Cloudflare detected (cf-ray headers)
- DNS: 3 A records, MX pointing to Google Workspace

#### Technology Stack
- Server: Next.js on Vercel
- Frontend: React 18, Tailwind CSS
- API: REST endpoints at /api/*, GraphQL at /graphql

#### AI Components Detected
- Chat endpoint at /api/chat (POST, accepts JSON with "message" field)
- OpenAI SDK reference in main bundle (openai@4.x)
- LangChain reference in API route handlers
- Pinecone vector database client detected in server bundle

#### Attack Surface Summary
- 4 exposed services (HTTPS, SSH, GraphQL, WebSocket)
- 3 AI-specific vectors: chat endpoint, RAG pipeline, agent tool access
- Priority: /api/chat endpoint shows direct LLM interaction

#### Recommended Next Steps
- Test /api/chat for prompt injection (direct and indirect)
- Probe RAG pipeline for context poisoning via Pinecone
- Enumerate tool access through the chat interface
- Test GraphQL for introspection and schema leakage

In my experience, that recon takes a human operator 2-4 hours of manual work. With the skill, it typically runs in under 15 minutes. And the output is structured, consistent, and feeds directly into the next phase of your operation.

Extending the Skill

The recon skill above is a starting point. As you build your toolkit, you extend it:

Add a phase for social engineering reconnaissance (LinkedIn profiles, org charts, email patterns)
Add screenshot capture for web services
Add integration with Shodan or Censys for deeper infrastructure mapping
Add a phase specifically for MCP server detection and enumeration
Build a second skill that takes the recon output and automatically generates an attack plan

Each skill you build compounds. Your second skill reads the output of your first. Your third skill chains the first two together. After a month of building, you have an arsenal that runs coordinated operations you would never have attempted manually.
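Chaining works because the recon report has a fixed shape. As a minimal sketch, assuming the #### headings from the recon skill's output format, a second skill (or a helper script it calls) could pull the Recommended Next Steps bullets into a work queue. parse_sections and next_tasks are hypothetical names, not part of Claude Code.

```python
from typing import Dict, List, Optional


def parse_sections(report: str) -> Dict[str, List[str]]:
    """Map each '#### Heading' in a recon report to the '- ' bullets under it.

    Sections with the same heading (e.g. from multiple targets) are merged.
    """
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            level, _, title = stripped.partition(" ")
            # Only level-4 headings open a section; '### Target:' resets it.
            current = title.strip() if level == "####" else None
            if current:
                sections.setdefault(current, [])
        elif current and stripped.startswith("- "):
            sections[current].append(stripped[2:].strip())
    return sections


def next_tasks(report: str) -> List[str]:
    """The 'Recommended Next Steps' bullets, ready to hand to a follow-on skill."""
    return parse_sections(report).get("Recommended Next Steps", [])
```

Writing that list to a JSON file next to the report gives the next skill a structured input instead of free text to re-interpret.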

Why This Changes Everything

Let me be direct about what this means for red team operations.

This is not about replacing pentesters. An AI does not understand the business context of a vulnerability. It does not know that the CEO's email address in a breached database is more valuable than a SQL injection on a test server. It does not have the judgment to decide when to stop testing and when to push harder.

What it does is remove the bottleneck that kills most engagements: time.

A single operator with Claude Code and a well-built skill library can do the reconnaissance, initial exploitation, and reporting work that traditionally requires a team of three. Not because the AI is smarter than three pentesters. Because it doesn't get tired. It doesn't chase rabbit holes for emotional reasons. It doesn't lose focus at 3am. It doesn't forget to check the service running on port 8443 because it got distracted by an interesting finding on port 80.

The difference between a vulnerability scanner and an AI operator is the difference between a checklist and a thinker. Scanners run known payloads against known patterns. An AI reads the recon output, reasons about what's unusual, generates hypotheses about what might be exploitable, and tests those hypotheses in real time.

That is not a marginal improvement. That is a different category of capability.

The operators who figure this out first will have an asymmetric advantage for the next two to three years. After that, everyone will have it, and the advantage shifts to whoever built the best skills and workflows.

Build now. The window is open.

What's Next

This is Part 1 of a 7-part series. We're just getting started.

Part 2: Reconnaissance at Machine Speed goes deep on building a full reconnaissance pipeline. Parallel agents running subdomain enumeration, technology fingerprinting, and AI component detection simultaneously. MCP server integration for tool access. Output formats that feed directly into exploitation workflows.

The goal of this series is simple: by Part 7, you will have a complete AI-augmented red team operation. Recon, weaponization, exploitation, coordination, and reporting. All built on Claude Code. All practical. All working code.

If that sounds useful, dig into more of the research in the guides, or follow along in the blog for new work as it ships. Everything we build, we share. Everything we learn, we publish.

The field manual continues next week.