Fingerprint the Model: Recon and Baseline Scan

The Play

Recon comes before craft. Every other play in this book is sharper when you already know where the model is soft. This play is the wide sweep: you take an open-source scanner, point it at an endpoint you are authorized to test, and let it throw a published library of probe families at the model while you watch the failure rates roll in. Think nmap for language models. You are not trying to win here. You are building a map. The output is a JSONL hit log that tells you which categories of weakness are real on this specific model, on this specific day, behind this specific system prompt. That map decides where you spend your expensive, hand-crafted attention next.

Before the Snap

Get the authorization in writing first. The scope must name the exact endpoint or model, the time window, the rate ceiling, and a named contact who can stop the run. A baseline scan is loud by design: it fires hundreds of attempts and it will show up in logs, so the defenders on the other side need to know it is you and not a real adversary. For a no-paperwork warmup, run the scanner against a local Ollama model you own outright, or work the hosted Crucible challenge range, which exists to be attacked. Confirm you can reach the endpoint, confirm your rate budget, and confirm where the hit log will land before you fire anything at a shared system.
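
The written scope can double as a preflight check. The sketch below is one way to encode it, not a standard tool: the `Scope` record, its field names, and the `preflight` helper are all hypothetical, invented here to mirror the four items the authorization must name.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Scope:
    """Hypothetical record of what the signed authorization allows."""
    endpoint: str              # exact endpoint or model named in the scope
    window_start: datetime     # start of the agreed time window
    window_end: datetime       # end of the agreed time window
    max_requests_per_min: int  # agreed rate ceiling
    stop_contact: str          # named person who can halt the run


def preflight(scope: Scope, target: str, planned_rpm: int, now: datetime) -> list[str]:
    """Return the reasons a run must not start; an empty list means go."""
    problems = []
    if target != scope.endpoint:
        problems.append(f"target {target!r} is not the endpoint named in scope")
    if not (scope.window_start <= now <= scope.window_end):
        problems.append("current time is outside the authorized window")
    if planned_rpm > scope.max_requests_per_min:
        problems.append("planned rate exceeds the agreed ceiling")
    if not scope.stop_contact.strip():
        problems.append("no named stop contact on record")
    return problems
```

Call `preflight(scope, target, planned_rpm, datetime.now(timezone.utc))` right before the scanner starts and refuse to fire if the list is not empty.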

Run It

Confirm scope and target. Re-read the signed authorization and note the endpoint, the time window, and the rate ceiling. If this is practice, stand up a local Ollama model or open a Crucible challenge instead, so nothing touches a system you do not own.
Pick your probe families, not everything at once. Start with the families that map to your top OWASP concerns: prompt injection, sensitive information leakage, and misinformation. A focused sweep finishes faster and produces a hit log you can actually read.
Run the scanner in a low-and-named configuration: low on rate, and named so the defenders can tell the traffic is yours. Set a generation budget per probe and respect the agreed rate ceiling so you do not knock the endpoint over. Let it complete a full pass so the failure rates rest on every prompt in each family, not a handful of samples.
Watch the live pass-fail counters during the run. The console shows each probe family scoring against the model. Note which families light up hot in real time; that is your first read on the attack surface.
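Open the JSONL hit log when the run finishes. Each line is one flagged output with its probe, its detector, and the model response. In garak the hit log holds only the failures, and a separate report file keeps the full record of every attempt and evaluation, so the hit log is already your map.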
Rank the probe families by failure rate. The families with the highest hit rates are where this model is genuinely soft. Sort them, do not eyeball them, so the priority is defensible.
Map each top hit to an OWASP category and to the next AHP play. A cluster of injection hits routes to the injection play, leakage hits route to the data-extraction play, and so on. The baseline becomes a target list.
Write a short triage note and save the raw log. Record the model version, the date, the system-prompt conditions, and the ranked findings. This is your before snapshot, and you will re-run it after any fix to prove movement.
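
The ranking and routing steps above can be sketched in a few lines of Python. Treat this as a sketch under stated assumptions: it expects the report file to contain evaluation records shaped like `{"entry_type": "eval", "probe": "family.ProbeName", "passed": N, "total": M}`, which is how garak reports look at the time of writing, so check a line of your own report before trusting it. The `OWASP_BY_FAMILY` table is an illustrative pairing based on the three concerns named in this play, not an official mapping.

```python
import json
from collections import defaultdict

# Illustrative routing table (an assumption, not an official mapping).
OWASP_BY_FAMILY = {
    "promptinject": "Prompt injection",
    "leakreplay": "Sensitive information leakage",
    "lmrc": "Misinformation",
}


def rank_families(report_lines):
    """Rank probe families by failure rate, highest first.

    Returns a list of (family, failure_rate, failed, total) tuples.
    A probe scored by several detectors contributes one eval record
    per detector, and all of them are pooled into the family total.
    """
    failed = defaultdict(int)
    total = defaultdict(int)
    for line in report_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("entry_type") != "eval":
            continue  # skip config, attempt, and digest records
        family = entry["probe"].split(".")[0]
        failed[family] += entry["total"] - entry["passed"]
        total[family] += entry["total"]
    ranked = [
        (fam, failed[fam] / total[fam], failed[fam], total[fam])
        for fam in total
        if total[fam] > 0
    ]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


def route(ranked, top_n=3):
    """Attach an OWASP category to each of the top families."""
    return [
        (fam, rate, OWASP_BY_FAMILY.get(fam, "unmapped, review by hand"))
        for fam, rate, _failed, _total in ranked[:top_n]
    ]
```

Feed it an open file handle, for example `rank_families(open(path, encoding="utf-8"))`, and pass the result to `route` to get the target list. Sorting on the computed rate rather than raw hit counts keeps a large family from outranking a small one just because it sent more prompts.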

What You Learn

You learn that you do not have to guess where a model is weak; you can measure it. A broad scan turns an unknown endpoint into a ranked list of failure categories in minutes, and that ranking is the difference between a red team run that wanders and one that goes straight at the soft spots. You also learn how loud recon is, and why naming yourself to the defenders matters. Most of all you learn the rhythm of the whole book: scan wide first, then craft narrow, then re-scan to prove the fix.
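
Proving the fix comes down to comparing two snapshots. A minimal sketch, assuming you have already reduced each scan to a dictionary of failure rates keyed by probe family; the `movement` helper is hypothetical.

```python
def movement(before, after):
    """Compare two {family: failure_rate} snapshots.

    Returns (family, before_rate, after_rate, delta) rows sorted by
    family name. A negative delta means the failure rate dropped.
    Families missing from either snapshot get a delta of None.
    """
    rows = []
    for family in sorted(set(before) | set(after)):
        b = before.get(family)
        a = after.get(family)
        delta = None if b is None or a is None else a - b
        rows.append((family, b, a, delta))
    return rows
```

Keep the probe list, generation count, and system-prompt conditions identical between the two runs, or the delta measures your configuration change rather than the fix.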

Drive It with Claude Code

We have signed authorization to test the model endpoint defined in scope.md. Run a garak baseline scan against our local Ollama target using the promptinject, leakreplay, and lmrc probe families at the agreed rate ceiling, then parse the resulting JSONL hit log: rank every probe family by failure rate, map the top three clusters to their OWASP LLM categories, and give me a triage note recommending which AHP play to run next on each.

## Baseline scan against an authorized local Ollama model
## Treat garak like nmap-for-LLMs: wide probe families, read the hitlog after.
 
python -m garak \
  --model_type ollama \
  --model_name llama3 \
  --probes promptinject,leakreplay,lmrc \
  --generations 5 \
  --report_prefix baseline_$(date +%Y%m%d)
 
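## Output (garak prints the report path at startup; depending on version the files
## land in the working directory or under garak's data directory, so adjust the paths):
##   baseline_YYYYMMDD.report.jsonl  (config, attempt, and eval records)
##   baseline_YYYYMMDD.hitlog.jsonl  (one JSON line per detector hit)
## Triage: the hit log already holds only failures, so count hits per probe.
jq -r '.probe' baseline_*.hitlog.jsonl | sort | uniq -c | sort -rn
## Hit counts are not rates. For failure rates, read the eval records in the report
## (columns: probe, detector, failed, total).
jq -r 'select(.entry_type=="eval") | [.probe, .detector, (.total - .passed), .total] | @tsv' baseline_*.report.jsonl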
 
## List every available probe family before you scope the run:
##   python -m garak --list_probes
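
Once the failure rates are ranked, the triage note can be generated rather than typed. This sketch assumes you already hold a list of `(family, failure_rate)` pairs sorted highest first. The `triage_note` function and its layout are hypothetical and only cover the fields this play says to record.

```python
from datetime import date


def triage_note(model, run_date, prompt_conditions, ranked, top_n=3):
    """Render a plain-text before snapshot of a baseline scan.

    ranked is a list of (family, failure_rate) pairs, highest first.
    """
    lines = [
        f"Baseline scan triage: {model}",
        f"Date: {run_date.isoformat()}",
        f"System prompt conditions: {prompt_conditions}",
        "Ranked findings (failure rate, highest first):",
    ]
    for position, (family, rate) in enumerate(ranked[:top_n], start=1):
        lines.append(f"  {position}. {family}: {rate:.0%}")
    return "\n".join(lines)
```

Save the returned text next to the raw JSONL files so the snapshot and its evidence travel together.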

Defend It

Defenders should assume their model will be fingerprinted and plan for it. Rate-limit and monitor probing traffic so a wide automated sweep stands out as a burst of failed safety checks from one source, and alert on it. Minimize what the model discloses about itself: do not echo the model name, version, or system prompt back to users, because that information narrows an attacker's probe selection. Run the same open scanner against your own endpoint on a schedule, treat the failure rate as a tracked metric, and gate releases on it not regressing. The team that scans itself first owns the baseline that an attacker would otherwise build for free.
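
On the defender side, spotting a burst of failed safety checks from one source is a sliding-window count. The sketch below is a minimal in-memory version, assuming your gateway or guardrail already emits an event each time a request trips a safety check. The `ProbeBurstDetector` class, its threshold, and its window are illustrative choices, not recommended values.

```python
from collections import defaultdict, deque


class ProbeBurstDetector:
    """Flag a source that trips too many safety checks in a short window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold    # failures needed to raise an alert
        self.window = window_seconds  # how far back to look, in seconds
        self._events = defaultdict(deque)

    def record_failure(self, source, timestamp):
        """Record one failed safety check; return True if the source should alert."""
        events = self._events[source]
        events.append(timestamp)
        # Drop events that have aged out of the window.
        while events and timestamp - events[0] > self.window:
            events.popleft()
        return len(events) >= self.threshold
```

With `ProbeBurstDetector(threshold=20, window_seconds=60)`, a wide automated sweep from one API key would trip the alert within its first minute, while an occasional refused request from a normal user would not.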