# Measuring Impact
Everyone wants to measure agent impact. Most measurements measure the wrong thing, and sometimes agents actually slow you down.
## The Measurement Trap

When you measure the wrong things, people optimize for the metrics instead of the outcomes.
| Bad metric | Gaming behavior |
|---|---|
| Lines of code generated | Verbose, less clean code |
| Tasks completed per sprint | Task inflation, tiny pieces |
| Time using AI tools | Running agents on tasks that would be faster done manually |
## What to Actually Measure

### Leading Indicators (early signals)
- Acceptance rate: What % of suggestions is accepted vs. rejected? Low rates suggest poor tool fit or skill gaps. (A small tracking sketch follows this list.)
- Iteration count: How many prompt cycles before useful output? Decreasing = improving skills.
- Task scope: Are engineers tackling larger tasks with agent help? Growing confidence.
- Review feedback: Are reviewers catching fewer issues in agent-assisted PRs over time?
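Acceptance rate and iteration count are easy to compute once sessions are logged somewhere. A minimal sketch in Python, assuming a hypothetical per-suggestion log with `accepted` and `prompt_cycles` fields; adapt the shape to whatever your tooling actually records:

```python
# Minimal sketch: acceptance rate and average iteration count from a
# hypothetical log of agent sessions. The record shape is an assumption,
# not a standard export format.
from dataclasses import dataclass
from statistics import mean

@dataclass
class AgentSession:
    suggestion_id: str
    accepted: bool        # was the final output kept/merged?
    prompt_cycles: int    # prompt iterations before a useful result

def leading_indicators(sessions: list[AgentSession]) -> dict[str, float]:
    if not sessions:
        return {"acceptance_rate": 0.0, "avg_prompt_cycles": 0.0}
    return {
        "acceptance_rate": sum(s.accepted for s in sessions) / len(sessions),
        "avg_prompt_cycles": mean(s.prompt_cycles for s in sessions),
    }

# Example: 2 of 3 suggestions accepted, ~2.3 prompt cycles on average.
sessions = [
    AgentSession("a1", accepted=True, prompt_cycles=2),
    AgentSession("a2", accepted=False, prompt_cycles=4),
    AgentSession("a3", accepted=True, prompt_cycles=1),
]
print(leading_indicators(sessions))
```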
### Lagging Indicators (outcomes)

- Velocity: Look at trends, not absolutes, and compare against teams not using agents. (Careful: still gameable.)
- Bug rates: Is the bug count per feature changing? Be careful when attributing bugs to agent-written versus human-written code.
- Time to production: Time from feature start to deploy. Harder to game (see the sketch after this list).
- Developer satisfaction: Survey your team. Happy devs are productive devs.
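Time to production is the most mechanical of these to compute. A minimal sketch, assuming you can export start and deploy timestamps per feature from your tracker and deploy pipeline; the data shapes here are hypothetical:

```python
# Minimal sketch: median time-to-production (feature start -> deploy).
# The {feature_id: datetime} inputs are hypothetical; in practice they
# might come from your issue tracker and deploy pipeline.
from datetime import datetime
from statistics import median

def median_lead_time_days(started: dict[str, datetime],
                          deployed: dict[str, datetime]) -> float:
    durations = [
        (deployed[fid] - started[fid]).total_seconds() / 86400
        for fid in started
        if fid in deployed
    ]
    return median(durations) if durations else float("nan")

started = {"FEAT-1": datetime(2024, 3, 1), "FEAT-2": datetime(2024, 3, 4)}
deployed = {"FEAT-1": datetime(2024, 3, 8), "FEAT-2": datetime(2024, 3, 15)}
print(median_lead_time_days(started, deployed))  # 9.0 days
```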
## What Not to Measure

- Lines of code: irrelevant and easy to game
- Tool usage time: usage ≠ value
- Cost of AI tools: matters for ROI, not for effectiveness
- Prompt count: more prompts might just mean someone is learning
## The Attribution Problem

Who gets credit for AI-generated code? Who takes the blame?
Don’t solve this. Treat agent-assisted code like any other. The human who committed it owns it.
This simplifies everything: no separate metrics, normal accountability, no need to track percentages.
## Qualitative Signals

Numbers don’t tell the whole story. Watch for:
- Team sentiment: Excitement or frustration? Positive talk about agents?
- Adoption patterns: Senior engineers using agents is a quality signal
- Knowledge sharing: Organic prompt sharing indicates value
- Problem selection: Engineers tackling harder problems is often the real win
## Running an Experiment

If you need rigorous measurement:
- Control group: Some work happens without agents
- Clear metrics: Define before you start
- Time bound: 4-6 weeks to account for learning curves
- Survey participants: Qualitative data matters
But most teams don’t need academic proof—just signals that adoption is working.
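If you do run the comparison, the analysis can stay simple. A sketch of the kind of summary most teams need, using illustrative cycle-time numbers rather than a formal significance test:

```python
# Minimal sketch: compare task cycle times (hours) between a control group
# and an agent-assisted group over a 4-6 week window. The numbers are
# illustrative; a median comparison is usually enough signal for a team
# decision, even though it is not a statistical test.
from statistics import median

control = [14.0, 9.5, 20.0, 11.0, 16.5]          # hours per task, no agents
agent_assisted = [10.0, 8.0, 15.5, 7.5, 12.0]    # hours per task, with agents

diff = median(control) - median(agent_assisted)
print(f"Median cycle time: control={median(control)}h, "
      f"agent-assisted={median(agent_assisted)}h, delta={diff}h")
```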
## The Real Question

Don’t ask “Are agents making us more productive?”
Ask “Are we building what we need, at the quality we need, without burning out?”
If yes, your approach is working.
## When Agents Help

### High-volume repetitive tasks
Tests for multiple functions, docs across files, API boilerplate, migration scripts. The same thing, many times over: agents thrive here.
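As a concrete illustration of “same thing, many times”: table-driven tests like the sketch below are the kind of pattern an agent can extend quickly once it has seen one example. The `slugify` helper and its cases are hypothetical stand-ins for your own utilities.

```python
# Illustration of a high-volume repetitive task: one test pattern,
# repeated for many cases. slugify() is a hypothetical utility.
import pytest

def slugify(text: str) -> str:
    return "-".join(text.lower().split())

@pytest.mark.parametrize("raw, expected", [
    ("Hello World", "hello-world"),
    ("  Already  spaced  ", "already-spaced"),
    ("MiXeD CaSe Input", "mixed-case-input"),
    ("single", "single"),
])
def test_slugify(raw, expected):
    assert slugify(raw) == expected
```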
### New territory exploration

Unfamiliar framework? The agent scaffolds while you learn. New language? Get working examples. Unknown API? Generate integration code to understand its patterns.
### Clear spec, straightforward implementation

CRUD with defined schemas, form validation with known rules, utilities with well-defined I/O. Low ambiguity, well-understood problem space.
### Tedious but necessary

Mocks and fixtures, logging and error handling, consistent formatting, config updates across many places. Takes time but not thought.
## When Agents Slow You Down

### High-context tasks
If understanding the task requires reading complex business logic, historical decisions, or unwritten conventions, you’d have to explain all of that to the agent anyway. It’s often faster to just do it yourself.
### Tasks faster done manually

Prompting + waiting + reviewing > manual coding? Just code it. Especially true for single-line changes, familiar patterns, and quick fixes.
Build intuition for your personal break-even point.
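The break-even test is just the inequality above made explicit. The minutes in this sketch are illustrative estimates, not measurements:

```python
# The break-even check from the paragraph above, made explicit.
# All times are rough estimates you supply yourself; the point is the
# comparison, not precision.
def worth_delegating(prompt_min: float, wait_min: float,
                     review_min: float, manual_min: float) -> bool:
    return prompt_min + wait_min + review_min < manual_min

# A one-line fix: 3 + 2 + 4 = 9 minutes of agent overhead vs. 5 minutes by hand.
print(worth_delegating(prompt_min=3, wait_min=2, review_min=4, manual_min=5))  # False
```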
### Novel algorithms

Agents pattern-match against their training data. For new algorithmic approaches, domain-specific optimization, or unusual data structures, solve the core problem yourself and let agents help with the boring parts around it.
### Highly coupled changes

Changes that touch many tightly interdependent parts are hard for agents: they may not understand the connections, errors compound, and validation requires whole-system understanding. Break these apart or do them manually.
### Ambiguous requirements

“Make it better” or “improve performance” without specifics wastes cycles. Agents need clear success criteria, defined constraints, and a specific scope. If you can’t articulate these, you’re not ready to delegate.
## Team-Level Patterns

Task assignment: Don’t assign agent-hostile tasks expecting agents to help.
Sprint planning: Don’t assume agent help for all tasks. Call out which are agent-friendly. Account for validation overhead.
Retrospectives: Review where agents helped and hindered. What task types worked? Where did you waste time prompting?
## Building Team Judgment

- Share examples: “This task would have been faster manually—here’s why.”
- Celebrate good choices: Acknowledge when someone correctly decides not to use an agent.
- Create a reference: Maintain a guide of task types and recommended approaches.
- Review periodically: As tools improve, patterns change.
## Resources

### Essential

- Does AI Actually Boost Developer Productivity? – Yegor Denisov-Blanch, Stanford. A 100k-developer study: ~20% average boost, with significant variance.
- Stop Looking for AI Coding Spending Caps - Why caps cost more than they save
- ML-Enhanced Code Completion – Google Research - Google’s productivity impact research
### Deep dives

- The reality of AI-Assisted software engineering productivity - Balanced take on productivity claims
- Vibe coding is already dead - Critical perspective on when AI tools backfire