
Measuring Impact

Everyone wants to measure agent impact. Most measurements are wrong. And sometimes agents slow you down.

When you measure the wrong things, people optimize for the metrics, not the outcomes. Each bad metric invites its own gaming behavior:

  • Lines of code generated: verbose, less clean code
  • Tasks completed per sprint: task inflation, work split into tiny pieces
  • Time using AI tools: running agents on things that are faster done manually
Metrics worth tracking instead (a rough computation sketch follows these lists):

  • Acceptance rate: What % of suggestions are accepted vs. rejected? Low rates suggest poor fit or skill gaps.
  • Iteration count: How many prompt cycles before useful output? Decreasing = improving skills.
  • Task scope: Are engineers tackling larger tasks with agent help? Growing confidence.
  • Review feedback: Are reviewers catching fewer issues in agent-assisted PRs over time?
  • Velocity: Look at trends, not absolutes. Compare to teams not using agents. (Careful—gameable.)
  • Bug rates: Are bugs per feature changing? Be careful about how you attribute bugs to agent-assisted vs. hand-written code.
  • Time to production: Feature start to deploy. Harder to game.
  • Developer satisfaction: Survey your team. Happy devs are productive devs.
Metrics not worth tracking:

  • Lines of code: irrelevant and gameable
  • Tool usage time: usage ≠ value
  • Cost of AI tools: matters for ROI, not for effectiveness
  • Prompt count: more prompts might just mean someone is learning
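
To make a couple of these concrete, here is a minimal sketch in Python that computes acceptance rate and average iteration count from a hypothetical log of agent interactions. The log format and field names ("accepted", "prompt_cycles") are assumptions for illustration; adapt them to whatever your tooling actually records.

```python
# Minimal sketch: summarize agent-interaction logs into two of the signals above.
# Assumes a JSON Lines file where each record describes one agent task, e.g.:
#   {"task": "add retry logic", "accepted": true, "prompt_cycles": 3}
# Both the file format and the field names are hypothetical.
import json
from pathlib import Path


def summarize(log_path: str) -> dict:
    lines = Path(log_path).read_text().splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return {"acceptance_rate": None, "avg_prompt_cycles": None}
    accepted = sum(1 for r in records if r.get("accepted"))
    cycles = [r["prompt_cycles"] for r in records if "prompt_cycles" in r]
    return {
        # share of suggestions kept
        "acceptance_rate": accepted / len(records),
        # average prompt cycles before useful output
        "avg_prompt_cycles": sum(cycles) / len(cycles) if cycles else None,
    }


if __name__ == "__main__":
    print(summarize("agent_log.jsonl"))
```

Watch the trend week over week rather than the absolute values: a rising acceptance rate and a falling iteration count are the signals that matter.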

Who gets credit for AI-generated code? Who takes blame?

Don’t try to solve this. Treat agent-assisted code like any other: the human who committed it owns it.

This simplifies everything: no separate metrics, normal accountability, and no need to track what percentage of each change the agent wrote.

Numbers don’t tell the whole story. Watch for:

  • Team sentiment: Excitement or frustration? Positive talk about agents?
  • Adoption patterns: Senior engineers using agents is a quality signal
  • Knowledge sharing: Organic prompt sharing indicates value
  • Problem selection: Engineers tackling harder problems is often the real win

If you need rigorous measurement, run a structured pilot (a comparison sketch follows the checklist):

  1. Control group: Some work happens without agents
  2. Clear metrics: Define before you start
  3. Time bound: 4-6 weeks to account for learning curves
  4. Survey participants: Qualitative data matters
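
If you do run that pilot, the comparison itself can be simple arithmetic. The sketch below contrasts per-task cycle times for an agent-assisted group and a control group; the numbers are invented placeholders, and the choice of mean/median cycle time as the metric is an assumption, not a prescribed methodology.

```python
# Sketch: compare a pilot (agent-assisted) group against a control group.
# The cycle-time numbers are made up for illustration.
from statistics import mean, median

agent_group = [3.5, 2.0, 4.0, 1.5, 3.0]    # days per task, agent-assisted
control_group = [4.0, 3.5, 5.0, 2.5, 4.5]  # days per task, no agents

for name, days in (("agent-assisted", agent_group), ("control", control_group)):
    print(f"{name:>15}: mean {mean(days):.1f}d, median {median(days):.1f}d, n={len(days)}")
```

With samples this small, treat any difference as directional at best, and pair the numbers with the participant survey from step 4.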

But most teams don’t need academic proof—just signals that adoption is working.

Don’t ask “Are agents making us more productive?”

Ask “Are we building what we need, at the quality we need, without burning out?”

If yes, your approach is working.


Agents thrive on repetition: tests for multiple functions, docs across files, API boilerplate, migration scripts. Same thing, many times.

Unfamiliar framework? An agent can scaffold while you learn. New language? Get working examples. Unknown API? Generate integration code to understand the patterns.

Clear spec, straightforward implementation


CRUD with defined schemas, form validation with known rules, utilities with well-defined I/O. Low ambiguity, well-understood problem space.
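
As an illustration of what well-defined I/O looks like in practice, the stub below pins down inputs, outputs, and edge cases before any code is written, which leaves an agent little room to guess. The function name and rules are hypothetical, not taken from this guide.

```python
# Hypothetical example of an agent-friendly spec: the I/O contract and edge
# cases are fixed up front; the implementation is the part you delegate.
def slugify(title: str, max_length: int = 60) -> str:
    """Convert a title into a URL slug.

    Rules:
      - lowercase; runs of non-alphanumeric characters collapse to a single "-"
      - no leading or trailing "-"
      - truncate to max_length characters

    Examples:
      slugify("Hello, World!")          -> "hello-world"
      slugify("  Spaces   everywhere")  -> "spaces-everywhere"
    """
    raise NotImplementedError  # the spec is the point; the agent fills this in
```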

Boilerplate takes time but not thought: mocks and fixtures, logging and error handling, consistent formatting, config updates across many places.

Other tasks cut the other way. If understanding a change requires reading complex business logic, historical decisions, or unwritten conventions, you’d have to explain it all to the agent anyway. It’s often faster to just do it yourself.

Prompting + waiting + reviewing > manual coding? Just code it. Especially true for single-line changes, familiar patterns, quick fixes.

Build intuition for your personal break-even point.

Agents pattern-match against their training data. For novel algorithmic approaches, domain-specific optimization, or unusual data structures, solve the core problem yourself and let agents help with the boring parts around it.

Changes touching many tightly interdependent parts are hard for agents: they may not understand the connections, errors compound, and validation requires whole-system understanding. Break these apart or do them manually.

“Make it better” or “improve performance” without specifics wastes cycles. Agents need clear success criteria, defined constraints, specific scope. If you can’t articulate these, you’re not ready to delegate.

Task assignment: Don’t assign agent-hostile tasks expecting agents to help.

Sprint planning: Don’t assume agent help for all tasks. Call out which are agent-friendly. Account for validation overhead.

Retrospectives: Review where agents helped and hindered. What task types worked? Where did you waste time prompting?

  • Share examples: “This task would have been faster manually—here’s why.”
  • Celebrate good choices: Acknowledge when someone correctly decides not to use an agent.
  • Create a reference: Maintain a guide of task types and recommended approaches.
  • Review periodically: As tools improve, patterns change.