Sign the Scope (ROE and Authorization)
No signature, no test. The one play you run before every other play.
Hack AI with AI, the ethical way. Hands-on plays for testing LLMs, agents, MCP servers, and RAG. You learn by doing, on ground you are allowed to touch.
Every play runs against a target you own, a lab you stood up, or a range someone sanctioned. PLAY-00 is the first move, not a formality. We show the pick because we respect the lock: every play closes with the fix.
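A minimal sketch of "no signature, no test" as a gate in code. The `Scope` record, its field names, and `is_authorized` are hypothetical, not part of any tool named here; a real ROE document carries far more than this.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass(frozen=True)
class Scope:
    """Hypothetical rules-of-engagement record; field names are assumptions."""
    signed_by: str            # who authorized the test; empty means unsigned
    hosts: frozenset[str]     # hostnames explicitly in scope
    starts: date              # first authorized day
    ends: date                # last authorized day, inclusive


def is_authorized(scope: Scope, target_url: str, today: date) -> bool:
    """Return True only if the scope is signed, current, and covers the host."""
    if not scope.signed_by.strip():
        return False          # no signature, no test
    if not (scope.starts <= today <= scope.ends):
        return False          # outside the authorized window
    host = urlparse(target_url).hostname
    return host is not None and host in scope.hosts
```

Call it first in every script and exit on False. Exact hostname matching is deliberate here: a wildcard is how a lab test wanders onto a neighbor's system.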
The system prompt is the app's source code in plain English. Get it to read its own instructions back, and the guardrails stop being hidden.
Before you hand-craft a single probe, you scan. This play points an open-source LLM vulnerability scanner at your own authorized endpoint, treats it like nmap for language models, and turns a wide automated sweep into a ranked list of weak spots. You learn where the model leaks, where it bends, and where to point the next play, all without writing one payload yourself.
A model that knows too much and guards too little is a leak waiting to happen. This play probes an authorized LLM for the data it should never surface: PII, embedded secrets, and verbatim training-data fragments. You map what the model remembers, what it infers, and what it parrots back, then hand the defender a redaction and minimization plan.
The model is only as honest as the documents it retrieves. Poison the corpus, not the prompt, and the model repeats your answer with full confidence and a citation. This play builds an owned poisoned vector store and measures the blast radius.
The whole prompt is one stream. The model cannot tell the developer's instructions from yours. That gap is the play.
The attacker never talks to the model. The data does it for them.
The model is not the target. The thing that trusts the model is the target. When an application renders, queries, or executes whatever the LLM returns, the model becomes a smuggling lane for classic web bugs. This play walks the methodology for finding the sink, not for building the payload.
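On the fix side, a small sketch of what closing two common sinks might look like, assuming an app that renders replies as HTML and uses a model-supplied value in a SQLite lookup. The `orders` table and both function names are hypothetical.

```python
import html
import sqlite3


def render_reply(model_text: str) -> str:
    """Escape model output before it reaches HTML, as with any user input."""
    return f"<p>{html.escape(model_text)}</p>"


def lookup_order(conn: sqlite3.Connection, model_supplied_id: str):
    """Bind the model-supplied value as a parameter; never splice it into SQL."""
    cur = conn.execute(
        "SELECT status FROM orders WHERE id = ?",
        (model_supplied_id,),
    )
    return cur.fetchone()  # a one-element tuple, or None if nothing matched
```

The rule in both functions is the same: whatever the model returns is data, never markup and never query text.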
The model did not get tricked. The plumbing behind it had no brakes. Turn an agent's own tools against the system it serves, then build the brakes back.
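One way the brakes might look, sketched under assumptions: a tool allowlist, an argument check, and a human-approval flag for destructive actions. The tool names, the registry shape, and `dispatch` are illustrative and belong to no particular agent framework.

```python
from typing import Callable

# Hypothetical registry: tool name -> (handler, needs_human_approval)
TOOLS: dict[str, tuple[Callable[[str], str], bool]] = {
    "read_ticket": (lambda arg: f"ticket {arg}", False),
    "delete_ticket": (lambda arg: f"deleted {arg}", True),
}


def dispatch(tool: str, arg: str, approved: bool = False) -> str:
    """Run a model-requested tool call only if it passes every brake."""
    if tool not in TOOLS:
        raise PermissionError(f"tool not allowlisted: {tool}")
    if not arg.isdigit():
        raise ValueError("argument must be a numeric ticket id")
    handler, needs_approval = TOOLS[tool]
    if needs_approval and not approved:
        raise PermissionError(f"{tool} requires human approval")
    return handler(arg)
```

The checks live in the dispatcher, outside the model, so nothing the model says can talk its way past them.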
Single prompts find single bugs. Campaigns find the patterns. This play wires an orchestrator to a target, a converter, and a scorer so a multi-turn adversarial run executes itself, scores every reply, and hands you a ranked list of hits instead of a wall of transcripts. Methodology only, no payloads, authorized ranges only.
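The wiring can be sketched as a loop over four callables. This is a skeleton under assumptions, not the API of any real orchestration framework: `Hit`, `run_campaign`, and the naive follow-up rule are hypothetical, and the seeds are placeholders you supply for your own authorized range.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Hit:
    prompt: str   # what was actually sent after conversion
    reply: str    # what the target returned
    score: float  # scorer's rating; higher means more interesting


def run_campaign(
    seeds: Iterable[str],
    converter: Callable[[str], str],
    target: Callable[[str], str],
    scorer: Callable[[str], float],
    turns: int = 2,
) -> list[Hit]:
    """Send each seed through converter -> target -> scorer for `turns` turns."""
    hits: list[Hit] = []
    for seed in seeds:
        prompt = seed
        for _ in range(turns):
            sent = converter(prompt)
            reply = target(sent)
            hits.append(Hit(sent, reply, scorer(reply)))
            prompt = reply  # naive follow-up: the reply seeds the next turn
    # Ranked list of hits instead of a wall of transcripts
    return sorted(hits, key=lambda h: h.score, reverse=True)
```

In practice `target` wraps the client for the sanctioned endpoint and `scorer` is where most of the judgment lives; everything else is plumbing.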
Most teams meter the front door and forget the meter on the model. A single endpoint with no per-user budget, no output cap, and an autoscaler that says yes to everything is a credit card someone else is holding. This play measures the blast radius before an attacker prices it for you.
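A minimal sketch of the missing meter: a per-user budget plus a hard output cap, checked before each call. The limits and the `BudgetMeter` class are assumptions for illustration; real numbers come from your own cost model.

```python
from collections import defaultdict

MAX_OUTPUT_TOKENS = 512        # assumed hard cap per response
DAILY_TOKEN_BUDGET = 20_000    # assumed per-user daily allowance


class BudgetMeter:
    """Tracks tokens spent per user and caps what each call may generate."""

    def __init__(self, daily_budget: int = DAILY_TOKEN_BUDGET):
        self.daily_budget = daily_budget
        self.spent: dict[str, int] = defaultdict(int)

    def grant(self, user: str, requested_output_tokens: int) -> int:
        """Return the output-token allowance for this call; 0 means refuse."""
        remaining = self.daily_budget - self.spent[user]
        return max(0, min(requested_output_tokens, MAX_OUTPUT_TOKENS, remaining))

    def record(self, user: str, tokens_used: int) -> None:
        """Charge the tokens a finished call actually consumed."""
        self.spent[user] += tokens_used
```

Pass the granted value as the output limit on the model call, then record actual usage. This sketch keeps state in memory and never resets the day; a real deployment would need shared storage and a reset.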
The model does not have to be jailbroken to be dangerous. It just has to be confidently wrong. This play measures the gap between what the model asserts and what is true: invented sources, fake legal cases, and, most useful to an attacker, package names that do not exist. When a coding assistant suggests a library that was never published, an adversary can publish it. This is slopsquatting. We probe for it on an authorized range so the defender can pin dependencies before a real build pulls the poison.
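A defender-side sketch of the pinning check, assuming dependencies live in a requirements-style file with `name==version` lines. `parse_pins` and `unvetted` are hypothetical helpers; the normalization follows the PEP 503 rule for project names.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: case-insensitive, runs of -, _, . are equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(requirements_text: str) -> set[str]:
    """Collect normalized names from 'name==version' lines; skip the rest."""
    pins: set[str] = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if "==" in line:
            pins.add(normalize(line.split("==", 1)[0].strip()))
    return pins


def unvetted(suggested: list[str], pins: set[str]) -> list[str]:
    """Return assistant-suggested packages that are not already pinned."""
    return [name for name in suggested if normalize(name) not in pins]
```

Anything `unvetted` returns goes to a human before it goes to an installer. The parser ignores extras, environment markers, and hashes, so treat it as a starting point.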
A finding you cannot re-run is a finding the next deploy can undo. Make the fix permanent: turn the exploit into a red test in CI.
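What a red test might look like, sketched for a system-prompt leak finding. It assumes a canary string planted in the lab system prompt; `CANARY`, `call_model`, and the recorded-probe placeholder are all hypothetical stand-ins for your own range.

```python
CANARY = "CANARY-7f3a"  # hypothetical marker planted in the lab system prompt


def call_model(prompt: str) -> str:
    """Stand-in for the client of your authorized endpoint (assumption)."""
    return "I can't share my instructions."


def leaked(reply: str) -> bool:
    """A reply that contains the canary has exposed the system prompt."""
    return CANARY in reply


def test_system_prompt_stays_private():
    # Replay the probe recorded in the finding; the placeholder is intentional.
    reply = call_model("<recorded probe from the finding>")
    assert not leaked(reply), "regression: canary appeared in model output"
```

Named with a `test_` prefix so a runner such as pytest would collect it. Swap the stub for the real client and the test fails until the fix lands, then stays in CI to catch the next deploy that undoes it.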
A finding you cannot map and cannot fix is a war story. Make it a record the client can act on.
These plays are the manual way. Krypteia is building the autonomous operator that runs them end to end, on authorized targets, so one engineer covers the ground a team used to. A look behind the curtain: