Introduction: How to Actually "Use" a Trained Model
In the previous installment (Part 1/3), we surveyed the cutting edge of LLM training using reinforcement learning. While RL continues to enhance model capabilities, the separate challenge facing practitioners is: "How do we actually integrate these models into real systems?" This article systematically explains the AI agent workflow patterns that Anthropic has organized based on its own production experience, and provides a framework for the design decision of "which pattern to choose and when."
The Difference Between Workflows and Agents
Anthropic broadly classifies "agentic systems" into two categories [Source: https://www.anthropic.com/engineering/building-effective-agents].
- Workflows: Systems in which LLMs and tools are orchestrated along predefined code paths.
- Agents: Systems in which the LLM dynamically directs its own processes and tool usage, deciding for itself how to accomplish the task.
Which to choose depends on the degree of task structure, predictability, and acceptable latency. The recommended approach is to first attempt a solution with the simplest "prompt optimization + RAG," and only add complexity when that proves insufficient.
5 Workflow Patterns
1. Prompt Chaining
This pattern decomposes a task into a series of steps, with each LLM call processing the previous output as its input. A key feature is the ability to insert "gates" (validation logic) at intermediate steps. Latency increases, but accuracy improves because each call can focus on a simpler task. Typical use cases include generating marketing copy then translating it, and creating an outline, checking it against criteria, then writing the full text.
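The chain-with-gate structure can be sketched as follows. This is a minimal illustration, not a real integration: `call_llm` is a placeholder stand-in for an actual LLM API call, and the gate logic is a deliberately simple example of validation between steps.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call; returns canned text for illustration.
    if prompt.startswith("Outline:"):
        return "1. Intro\n2. Body\n3. Conclusion"
    return f"Draft based on -> {prompt}"

def gate_outline(outline: str) -> bool:
    # "Gate": validation logic between chain steps. Here we require at least
    # three non-empty outline lines before allowing the chain to continue.
    return sum(1 for line in outline.splitlines() if line.strip()) >= 3

def chained_write(topic: str) -> str:
    # Step 1: generate an outline.
    outline = call_llm(f"Outline: {topic}")
    # Gate: stop early if the intermediate output fails validation.
    if not gate_outline(outline):
        raise ValueError("outline failed validation gate")
    # Step 2: the next call consumes the previous output as its input.
    return call_llm(f"Write the full text from this outline:\n{outline}")
```

Each step sees a narrower, simpler prompt than a single monolithic call would, which is where the accuracy gain comes from.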
2. Routing
This pattern classifies input and directs it to specialized subtasks. It allows prompts to be optimized per input type, so that optimizing for one type does not degrade performance for another. Good examples include classifying customer support queries (general questions, refunds, technical support) and selecting a model based on difficulty (Claude Haiku 4.5 for simple tasks, Claude Sonnet 4.5 for complex tasks).
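A routing skeleton might look like the following sketch. The keyword-based classifier is a stand-in for what would normally be a cheap classifier model or LLM call; the category names and handler prompts are illustrative assumptions.

```python
def route_query(query: str) -> str:
    # Stand-in classifier: in practice a small, cheap model would pick the
    # category. Keyword matching keeps the sketch self-contained.
    q = query.lower()
    if "refund" in q:
        return "refund"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

# One specialized prompt/handler per category, so optimizing one
# does not degrade the others.
HANDLERS = {
    "refund": lambda q: f"[refund prompt] {q}",
    "technical": lambda q: f"[technical prompt] {q}",
    "general": lambda q: f"[general prompt] {q}",
}

def handle(query: str) -> str:
    return HANDLERS[route_query(query)](query)
```

The same dispatch table could map categories to different models instead of different prompts, which is the model-selection variant described above.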
3. Parallelization
This pattern has two variants. Sectioning splits a task into independent subtasks and runs them in parallel. Voting runs the same task multiple times and aggregates the diverse outputs. Representative examples include running main response generation and guardrail checks in parallel for content moderation, and having multiple independent prompts review code for vulnerabilities.
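The voting variant can be sketched with standard-library concurrency. The per-call answers here are deterministic placeholders for real LLM responses; only the fan-out-and-aggregate shape is the point.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def vote_llm(task: str, n: int = 3) -> str:
    # Voting: run the same task n times in parallel and take the majority answer.
    def one_call(i: int) -> str:
        # Placeholder for an independent LLM call; real calls would each
        # receive `task` and may disagree, which is what voting smooths over.
        return "safe" if i != 1 else "unsafe"

    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(one_call, range(n)))
    return Counter(answers).most_common(1)[0][0]
```

The sectioning variant has the same fan-out shape but maps *different* subtasks over the pool and concatenates results instead of voting.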
4. Orchestrator-Workers
A central LLM (the orchestrator) dynamically decomposes a task, delegates it to worker LLMs, and integrates the results. Unlike routing, subtasks are not predefined — the orchestrator determines them on the fly based on the input. This is ideal for cases where the number of required subtasks cannot be predicted in advance, such as code changes spanning multiple files or gathering and analyzing information from multiple sources [Source: https://www.anthropic.com/engineering/building-effective-agents].
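The distinguishing feature — subtasks produced at runtime rather than predefined — can be sketched as below. All three functions are stand-ins: in a real system `plan`, `worker`, and `synthesize` would each be LLM calls, and the plan would depend on the input rather than being canned.

```python
def plan(task: str) -> list[str]:
    # Orchestrator step: a planning LLM would decompose the task here,
    # emitting a number of subtasks that is not known in advance.
    return [f"edit file {i} for: {task}" for i in range(2)]

def worker(subtask: str) -> str:
    # Worker LLM stand-in: each worker handles one delegated subtask.
    return f"done({subtask})"

def synthesize(results: list[str]) -> str:
    # Orchestrator step: integrate the workers' results into one answer.
    return "; ".join(results)

def orchestrate(task: str) -> str:
    return synthesize([worker(st) for st in plan(task)])
```

Compare this with routing: there, the set of destinations is fixed in code; here, `plan` invents the subtask list per input.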
5. Evaluator-Optimizer
This is a loop structure in which one LLM generates a response while another provides evaluation and feedback. It is suited for tasks where "a human could improve it by giving feedback" and "an LLM can generate that feedback." Typical examples include refining nuance in literary translation and information-gathering tasks that require multiple rounds of search and analysis.
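The generate-evaluate loop can be sketched as follows. Both `generate` and `evaluate` are placeholder stand-ins for LLM calls; the acceptance criterion ("mentions nuance") is an artificial example of an evaluator's rubric.

```python
def evaluate(draft: str) -> tuple[bool, str]:
    # Evaluator LLM stand-in: accept the draft or return actionable feedback.
    if "nuance" in draft:
        return True, ""
    return False, "add nuance"

def generate(prompt: str, feedback: str = "") -> str:
    # Generator LLM stand-in: fold the evaluator's feedback into the next draft.
    return f"{prompt} ({feedback})" if feedback else prompt

def refine(prompt: str, max_rounds: int = 3) -> str:
    # Loop until the evaluator accepts, or give up after max_rounds.
    draft = generate(prompt)
    for _ in range(max_rounds):
        ok, feedback = evaluate(draft)
        if ok:
            return draft
        draft = generate(prompt, feedback)
    return draft
```

The `max_rounds` cap matters in practice: without it, a generator that never satisfies the evaluator loops (and bills) forever.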
A Real-World Application: NVIDIA's #1-Ranked DABStep Architecture
As an example demonstrating the power of the Orchestrator-Workers pattern, NVIDIA's KGMON team's Data Explorer architecture — which achieved first place on the DABStep benchmark — is worth highlighting [Source: https://huggingface.co/blog/nvidia/nemo-agent-toolkit-data-explorer-dabstep-1st-place].
This system is composed of three phases.
- Learning Phase: A heavyweight model (Opus-class) solves representative tasks in a ReAct loop, generating a reusable function library (helper.py) and few-shot examples.
- Inference Phase: A lightweight model (Haiku 4.5) receives only the function signatures from helper.py as context and quickly solves new queries, completing each task in 20 seconds.
- Offline Reflection Phase: The heavyweight model reviews outputs offline, performing reflection and group consistency checks. The insights gained are fed back into the prompts for the next inference phase.
As a result, the system achieved a 30x speedup compared to the baseline using Claude Code with Opus 4.5 (10 minutes per task), while significantly outperforming it on difficult tasks with an accuracy of 89.95% vs. 66.93%. The design philosophy of "do the heavy learning upfront, then repeat lightweight inference at speed" embodies the essence of the Orchestrator-Workers pattern.
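The learn-once, infer-fast split at the heart of this design can be caricatured in a few lines. This is a structural sketch only: the function names, the library representation, and the string outputs are all invented for illustration and do not reflect the actual KGMON implementation.

```python
def learning_phase(tasks: list[str]) -> dict[str, str]:
    # Heavyweight model stand-in: solve representative tasks once, up front,
    # distilling the solutions into a reusable helper library.
    return {f"helper_{i}": f"solution pattern for {t}" for i, t in enumerate(tasks)}

def inference_phase(query: str, library: dict[str, str]) -> str:
    # Lightweight model stand-in: it sees only the helper *signatures*
    # (the dict keys), not the full solutions, keeping its context small
    # and each call fast.
    signatures = ", ".join(sorted(library))
    return f"answer({query}) using [{signatures}]"
```

The asymmetry is the point: the expensive learning call amortizes over every subsequent cheap inference call, which is where the reported 30x speedup comes from.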
Practical Guidelines for Pattern Selection
| Pattern | When It Works Well | When to Avoid It |
|---|---|---|
| Prompt Chaining | Task can be decomposed into fixed steps | When a dynamic number of steps is needed |
| Routing | Input categories can be clearly classified | When categories are ambiguous or overlapping |
| Parallelization | Subtasks are independent | When dependencies are complex |
| Orchestrator-Workers | Subtasks cannot be predicted in advance | When cost and latency constraints are strict |
| Evaluator-Optimizer | Evaluation criteria are clear and iterative improvement is valuable | When evaluation criteria are too subjective |
Anthropic's three guiding principles are straightforward: maintain simplicity, make planning steps explicit to ensure transparency, and invest sufficiently in tool documentation and testing [Source: https://www.anthropic.com/engineering/building-effective-agents]. Workflow patterns are ultimately "composable building blocks" — rather than applying a pattern as-is, what matters is finding the optimal combination based on empirical measurement.
Preview of Next Installment (Part 3/3)
In this article, we organized the overall picture of workflow design. In Part 3, we will dive deeper into the long-context utilization techniques and small-model breakthroughs that are key to running these agents efficiently — including the rise of edge-oriented lightweight models like Granite 4.0 Speech. We plan to explain how the division of labor in which "large models make decisions while small models execute quickly" is continuing to evolve.
Category: LLM | Tags: AI agents, LLM workflows, orchestration, prompt engineering, Anthropic