Domain 5, All 8 Scenarios, and Exam Day Tactics
Context management patterns, the lost-in-the-middle effect, all eight production scenarios with key patterns, the most common wrong-answer traps, and the exam-day tactics that separate high scores from borderline passes.

This lesson covers Domain 5, all eight exam scenarios with their key patterns, the wrong-answer traps that appear most frequently, and the exam-day tactics that are worth knowing before you sit. Domain 5 at 15 percent is the smallest domain, but its failure modes cascade into every other domain.
Domain 5: Context Management and Reliability (15%)
The Lost-in-the-Middle Effect
Models reliably process information at the start and end of a long context window. Information buried in the middle of a very long input gets missed at a higher rate. This is not a bug in Claude. It is a documented attention pattern in transformer architectures.
The correct mitigation is to place key findings, critical values, and action items at the top of the context, not in the middle. When an exam scenario asks what to do when a model is missing important information from a long document, front-loading is the answer.
Correct structure for long-context inputs:
[KEY FINDINGS at the top]
Critical: auth.ts has an unparameterized query at line 42 (SQL injection risk).
Priority: Fix before merge.
[DETAILED ANALYSIS in the middle]
... detailed file-by-file findings ...
[ACTION ITEMS at the end]
1. Fix SQL injection in auth.ts line 42
2. Review database.ts parameter handling
Progressive Summarization Is Wrong for Transactional Data
When context approaches the limit, progressive summarization compresses older messages into summaries. The problem: numeric values, dates, order IDs, and specific identifiers get vague during summarization. "The order total was $89.99" becomes "approximately $90" or disappears entirely.
The correct pattern for transactional data is to extract key facts into a structured block and persist that block explicitly in every subsequent prompt, regardless of how conversation history is compressed.
=== CASE FACTS (persisted, updated on each new fact) ===
Customer ID: CUST-12345
Order ID: ORD-67890
Order date: 2025-01-15
Order total: $89.99
Issue: Item arrived damaged
Customer request: Full refund
Current status: Pending manager approval
===
This block is appended to every prompt. It does not get summarized. It survives context compaction. The conversation history can be summarized aggressively because the critical values are in the structured block, not in the prose history.
Trimming Tool Results via PostToolUse
When a tool returns 40 fields but the current task needs 5, the other 35 fields are context waste. A PostToolUse hook can trim the result before it reaches the model.
This is not just an optimization. It is a reliability pattern. Noisy tool results with irrelevant fields increase the probability that the model focuses on the wrong information, especially in long conversations.
The exam tests this as the preferred solution when a scenario involves a verbose tool response degrading agent accuracy over time.
Scratchpad Files for Long Investigations
In long multi-session investigations, the agent can write key findings to an external file. When the session context degrades or a new session starts, the agent reads the scratchpad instead of re-running discovery from scratch.
This is the correct pattern for investigations that span multiple Claude Code sessions or that accumulate findings over days.
Subagent Context Budgets
Each subagent in a multi-agent system should receive only the context it needs for its specific task. The coordinator manages global state and distributes context selectively.
Giving subagents access to the full coordinator context is wasteful and introduces noise. The correct design is: specific task description plus the specific data the subagent needs for that task. Return structured results, not raw data dumps. Use allowed_tools to limit subagent toolsets.
Reliable Escalation Triggers
The exam distinguishes reliable escalation triggers from unreliable ones.
Reliable triggers:
- Customer explicitly requests a human operator
- Policy does not cover the situation
- Agent cannot make progress after a defined number of attempts
- Financial operation exceeds a policy threshold (enforced via hook)
Unreliable triggers (exam traps):
- Sentiment analysis of customer messages. Angry customers are not necessarily complex cases.
- Self-reported model confidence scores. The model can be confidently wrong.
- Automatic category classifiers. May require training data and can be overengineered for the problem.
All Eight Scenarios: Key Patterns
Scenario 1: Customer Support Resolution Agent
Core topics: agent loop with stop_reason, tool descriptions for overlapping tools, hook enforcement for financial operations, escalation triggers.
Key patterns:
get_customermust always be called beforeprocess_refund. Encode this dependency in theprocess_refundtool description ("Use only after verifying customer identity with get_customer").- Process refunds above a threshold via a PreToolUse hook, not a prompt instruction. The hook is deterministic; the prompt is not.
- Escalate immediately when the customer requests a manager. Do not attempt to solve first.
- Acknowledge frustration before offering resolution. Escalate only if the customer reiterates the desire for a human after a resolution attempt.
- Multiple customers matching a query means asking for additional identifiers, not guessing.
Scenario 2: Claude Code Configuration
Core topics: CLAUDE.md hierarchy, settings.json scope, path-scoped rules, custom commands.
Key patterns:
- Coding standards shared with the team belong in project-level
.claude/CLAUDE.md, not user-level~/.claude/CLAUDE.md. - Tool permission allowlists belong in
settings.json, not CLAUDE.md natural language. - Rules that apply only to test files use
.claude/rules/testing.mdwithpaths: ["**/*.test.ts"]. - A skill with
context: forkisolates verbose output from the main session context. - The
-pflag is the correct way to run Claude in CI/CD. Never use interactive mode in a pipeline.
Scenario 3: Multi-Agent Research System
Core topics: explicit context passing, parallel subagent execution, partial failure handling, coverage annotations.
Key patterns:
- Information loss between coordinator and subagent is caused by incomplete context passing in the subagent prompt, not context window overflow. This is the most common wrong answer for this scenario.
- Parallel subagent execution requires the coordinator to call multiple Task invocations in one response. Sequential calls miss the parallelism benefit.
- When a search subagent times out, the correct response is to continue with partial results and annotate the gap in the final report. Not to abort the whole workflow.
- The structured error response from a failing subagent must include
errorCategory,isRetryable,partial_results, andalternative_approaches.
Scenario 4: Developer Productivity Tools
Core topics: built-in tool versus MCP tool selection, tool description quality.
Key patterns:
- If the model consistently chooses the built-in Read tool over a custom MCP file access tool, the fix is to strengthen the MCP tool description by highlighting what it provides that Read cannot: richer metadata, permission handling, or access to remote files.
- When exploring an unfamiliar codebase, use an Explore subagent with
context: forkso the verbose exploration output does not pollute the main session. - Planning mode before making large cross-codebase changes. Direct execution for single-file fixes with clear stack traces.
Scenario 5: Claude Code in CI/CD
Core topics: -p flag, session isolation, duplicate comment prevention, structured output.
Key patterns:
- The
-pflag is non-interactive mode. Required for CI/CD. Not optional. - Use
--output-format jsonwith a schema to produce structured review output that downstream steps can parse for inline PR comments. - The code generation session and the code review session must be separate Claude instances. Using the same session for both is an architectural mistake: the model retains its own reasoning context and is less likely to challenge its own decisions.
- To prevent duplicate comments on re-review, include prior review results in context and instruct Claude to report only new or unresolved issues.
Scenario 6: Structured Data Extraction
Core topics: required versus nullable schema fields, enum design, validation-retry loops, normalization.
Key patterns:
- Mark a field as
requiredonly if it is always present in the source. Marking optional fields as required forces hallucination when the data does not exist. - Use
"type": ["string", "null"]for fields that may be absent. Honest null is better than a fabricated value. - Include
"other"and"unclear"in enums plus a companion detail field. Do not let the schema silently drop data outside predefined categories. - Retry with specific error context: original document plus previous extraction plus specific validation error. Not just a generic "try again."
- Retry helps for format errors and structural errors. Retry does not help when the information is genuinely absent from the source.
Scenario 7: Conversational AI Architecture
Core topics: multi-turn state management, lost-in-the-middle, instruction persistence.
Key patterns:
- Key findings and critical values belong at the top of long-context inputs, not buried in the middle.
- Progressive summarization is wrong for transactional data. Extract exact values into a structured block persisted outside the summarized history.
- When resuming a session with stale tool results, start a new session with a summary rather than resuming with potentially outdated data.
fork_sessioncreates independent branches from shared context for comparing approaches. Use it for architectural decisions with multiple plausible implementation paths.
Scenario 8: Agentic AI Tools
Core topics: tool_choice strategies, minimal footprint, hook enforcement.
Key patterns:
tool_choice: "any"forces structured output when any available tool produces the right format.tool_choice: "tool"with a specific name forces a guaranteed first step regardless of input.- Agents must not autonomously escalate their own permissions. Permission errors surface to humans, not to the agent's self-escalation logic.
- Hooks enforce policy deterministically. Prompt instructions enforce policy probabilistically. For any business rule with financial or security consequences, hooks win.
The Most Common Wrong-Answer Traps
These patterns appear as distractors in multiple scenarios across the exam.
"Technically works but architecturally wrong": the exam frequently presents two answers that both produce a working system. The architecturally superior answer applies the principle of least privilege, enforces critical rules in code rather than prompts, or separates concerns correctly. When two options both work, ask which one is more correct architecturally, not just functionally.
Progressive summarization for exact values: the exam presents progressive summarization as a valid context management technique. It is valid for prose history but wrong for exact values (IDs, amounts, dates). The correct answer always preserves exact values in a structured block separate from summarized history.
Batch API for anything that needs a response: Batch API has up to 24 hours of latency. Any scenario where a developer, user, or downstream system is waiting eliminates Batch API as the correct choice.
Parsing assistant text for completion signals: if any answer involves inspecting the text content of an assistant response to detect whether the task is complete, it is wrong. stop_reason == "end_turn" is the only reliable completion signal.
Sentiment analysis as an escalation trigger: customer dissatisfaction expressed as emotion does not equal a case the agent cannot handle. Explicit requests for a human, policy gaps, and inability to make progress are reliable triggers. Sentiment is not.
Model confidence scores as a gate: self-reported confidence from a model does not map reliably to actual accuracy. A model can report high confidence while being wrong. Confidence scores are useful for routing to human review after calibration against labeled data. They are not reliable on their own.
A new team member not getting shared configuration: always means the configuration is in a user-level file instead of a project-level file. Never means CLAUDE.md is not supported by the tool.
Generic error responses from subagents: if a scenario shows a coordinator unable to recover from a subagent failure, the cause is almost always a generic error response ("operation failed") with no errorCategory, no isRetryable flag, and no partial results. The fix is a structured error response, not a retry loop.
Exam Day Tactics
Read the scenario before the question. Every question is anchored to a scenario. The scenario provides the constraints that eliminate wrong answers. Candidates who skim the scenario and jump to the answers miss the constraints.
Domain weighting matters on hard questions. When you are genuinely unsure, reason from domain weight. The 27 percent domain tests agentic architecture judgment. The answers in that domain skew toward architectural correctness, deterministic enforcement, and minimal footprint. The 20 percent prompt engineering domain skews toward examples over descriptions and nullable fields over required ones.
Flag and return. The exam gives you 120 minutes for 60 questions: two minutes per question on average. If a question requires more than three minutes of reasoning, flag it and move on. Return after you have answered the questions you are confident about.
Eliminate distractors by checking for anti-patterns. Any answer that involves parsing assistant text, using sentiment analysis as an escalation trigger, or applying progressive summarization to exact values is wrong. Eliminating anti-pattern answers often leaves one clearly correct option.
Answer every question. There is no penalty for guessing. Never leave a question blank.
Practice to 85 percent before booking the real exam. The exam passing score is 720 out of 1000 (72 percent). Candidates who consistently score 85 percent or above on full mock exams pass on the first attempt at a high rate. Candidates who book the exam at 72 percent on practice tests are at the edge.
Study Checklist Before Exam Day
- Know
stop_reasonvalues and correct responses for each, cold. - Know the three CLAUDE.md hierarchy levels and which is shared via VCS.
- Know that
.claude/rules/withpathsfrontmatter loads rules conditionally by file pattern. - Know the difference between deterministic hook enforcement and probabilistic prompt instruction.
- Know when Batch API is correct (overnight, non-urgent, bulk) and when it is wrong (waiting user, CI/CD gate, real-time).
- Know that few-shot examples beat descriptive instructions for ambiguous scenarios.
- Know that nullable fields prevent hallucination; required fields force it when data is absent.
- Know that explicit context passing is mandatory for subagents; no inherited history.
- Know the structured error response fields:
errorCategory,isRetryable,partial_results. - Know that
stop_reason == "end_turn"is the only correct agent loop termination signal. - Know that the
-pflag is mandatory for CI/CD integration. - Know that
process_refundthresholds belong in PreToolUse hooks, not in prompts.
After Passing
Once you have passed:
- Add the certification badge to krypteiasec.com and your LinkedIn profile.
- The certification is valid for six months. Set a renewal reminder now.
- Consider the OffSec OSAI+ AI Red Teamer certification as the natural follow-on. It requires security-focused AI knowledge that the CCA-F builds toward. The combination of "I build Claude agents" and "I also break them" is the Krypteia Sec differentiator.
- Watch for the Claude Certified Architect: Advanced tier. Foundations is the prerequisite.