SAFe Feature Spec System
Multi-agent pipeline that classifies, interviews, generates, and evaluates SAFe feature specifications.
- Problem solved: The original Feature Spec Generator worked but couldn't handle different feature types or improve its own output. This system adds classification, context-gathering, and self-evaluation.
- Architecture: Router → Interviewer → Generator → Evaluator
- Tech stack: Python, Streamlit, Anthropic API, SQLite
- Week built: Week 8
What it does
Accepts raw stakeholder requests and produces polished SAFe feature specifications through a multi-stage pipeline. A Router classifies the request type (capability, webpage, experience), an Interviewer gathers missing context, a Generator produces the spec, and an Evaluator scores it against type-specific rubrics.
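The four stages can be sketched as plain functions wired in sequence. This is a minimal illustration, not the real implementation: the agent internals here are keyword rules and presence checks standing in for the actual model-backed prompts, and all names are hypothetical.

```python
from dataclasses import dataclass

FEATURE_TYPES = ("capability", "webpage", "experience")

@dataclass
class SpecResult:
    feature_type: str
    context: dict
    spec: str
    score: float

def route(request: str) -> str:
    """Classify the request type (keyword stand-in for the Router)."""
    text = request.lower()
    if "page" in text or "landing" in text:
        return "webpage"
    if "journey" in text or "onboarding" in text:
        return "experience"
    return "capability"

def interview(request: str, feature_type: str) -> dict:
    """Gather missing context; placeholder answers stand in for a real conversation."""
    questions = {
        "capability": ["Which teams consume this capability?"],
        "webpage": ["What is the page's primary call to action?"],
        "experience": ["What user journey does this cover?"],
    }[feature_type]
    return {q: "(stakeholder answer)" for q in questions}

def generate(request: str, feature_type: str, context: dict) -> str:
    """Produce a structured spec (stand-in for the Generator)."""
    lines = [f"Feature type: {feature_type}", f"Request: {request}"]
    lines += [f"- {q} {a}" for q, a in context.items()]
    return "\n".join(lines)

def evaluate(spec: str, feature_type: str) -> float:
    """Score against a type-specific rubric; here, simple presence checks."""
    rubric = {
        "capability": ["feature type", "request"],
        "webpage": ["feature type", "call to action"],
        "experience": ["feature type", "journey"],
    }[feature_type]
    hits = sum(1 for item in rubric if item in spec.lower())
    return hits / len(rubric)

def run_pipeline(request: str) -> SpecResult:
    ftype = route(request)
    ctx = interview(request, ftype)
    spec = generate(request, ftype, ctx)
    return SpecResult(ftype, ctx, spec, evaluate(spec, ftype))
```

The value of the shape is that each stage has one input and one output, so any stage can be swapped for a better prompt or model without touching the others.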
Architecture decisions
The multi-agent approach was justified here because the subtasks are genuinely different: classification requires pattern matching, interviewing requires conversational skill, generation requires structured output, and evaluation requires critical judgment. Each agent uses a specialized system prompt optimized for its task.
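One way to keep that specialization explicit is a per-agent prompt registry. The prompts below are short hypothetical stand-ins; the real system prompts are presumably longer and task-tuned.

```python
# Hypothetical per-agent system prompts (illustrative, not the project's actual text).
AGENT_PROMPTS = {
    "router": "Classify the request as capability, webpage, or experience. Reply with one word.",
    "interviewer": "Ask concise questions that fill gaps in the stakeholder request.",
    "generator": "Write a SAFe feature spec with a benefit hypothesis and acceptance criteria.",
    "evaluator": "Score the spec against the rubric for its feature type. Be critical.",
}

def system_prompt(agent: str) -> str:
    """Look up the specialized system prompt for a given agent."""
    return AGENT_PROMPTS[agent]
```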
The Router uses Claude Haiku (fast, cheap) while the Generator and Evaluator use Claude Sonnet (higher quality where it matters). This is the Optimization Trilemma in practice — allocating quality budget where it has the most impact.
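That allocation can live in one small mapping alongside the prompt registry. Model names here are shorthand, not real API model IDs, and the Interviewer's tier is my assumption since only the Router, Generator, and Evaluator are specified above.

```python
# Quality-budget allocation: cheap/fast model for classification, stronger
# model where output quality matters most. Names are shorthand, not API IDs.
MODEL_FOR_AGENT = {
    "router": "claude-haiku",
    "interviewer": "claude-haiku",  # assumption: tier not stated in the write-up
    "generator": "claude-sonnet",
    "evaluator": "claude-sonnet",
}
```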
What I learned
The biggest lesson was about evaluation infrastructure. You can’t improve what you can’t measure. Building the golden test set and baseline scoring system before attempting improvements meant every change had a clear before/after comparison. Without that discipline, I would have been guessing whether my prompt changes were actually helping.
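The before/after discipline reduces to something like this: hold a fixed golden set, score every spec with the same Evaluator, and compare per-item deltas. The requests and scores below are invented for illustration.

```python
# Sketch of baseline-vs-candidate comparison over a golden test set.
# Requests are made up; scores would come from the Evaluator agent.
GOLDEN_SET = [
    "Add SSO login capability",
    "Build a pricing page",
    "Redesign the onboarding experience",
]

def compare(baseline_scores, candidate_scores):
    """Return per-item deltas and the mean improvement across the golden set."""
    deltas = [c - b for b, c in zip(baseline_scores, candidate_scores)]
    return deltas, sum(deltas) / len(deltas)
```

A positive mean delta says the prompt change helped on average; the per-item deltas show whether it regressed any individual feature type.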