Quality Assurance with Agents

AI-generated code changes the code review and testing dynamic. Your policies need to adapt.

Should AI-generated code be reviewed differently? Most teams land somewhere between “treat it like any other code” and “give every AI-assisted PR special scrutiny”:

  • No differentiation: the same review process for all code. Best for small, high-trust teams with rigorous reviews.
  • Disclosure only: authors flag significant AI contributions. Best for transparency without bureaucracy.
  • Tiered review: extra scrutiny on critical paths and AI-generated changes. Best for codebases where risk varies.
  • AI-assisted review: AI pre-reviews; humans focus on what the AI missed. Best for teams with high PR volume.

If you ask for disclosure, make it easy:

  • Clear trigger: “Disclose when >20% of the PR was generated by AI tools”
  • Simple mechanism: Checkbox in PR template or a tag/label
  • No stigma: Disclosure is information, not judgment
  • Useful metadata: Which tool? What prompts? Helps with learning
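If the checkbox lives in the PR template, a small CI step can make sure it doesn’t get deleted. The sketch below is a minimal example, not a standard tool: the checkbox wording and the PR_BODY environment variable are assumptions to adapt to your own template and CI system.

```python
# check_ai_disclosure.py -- hypothetical CI step that verifies the PR
# description still contains the AI-disclosure checkbox (checked or not).
# Assumes the CI job exposes the PR description in the PR_BODY env var.
import os
import re
import sys

# Matches a Markdown checkbox line mentioning "AI-generated", e.g.
# "- [x] More than ~20% of this PR was AI-generated".
DISCLOSURE_PATTERN = re.compile(r"- \[( |x)\] .*AI-generated", re.IGNORECASE)

def main() -> int:
    body = os.environ.get("PR_BODY", "")
    if not DISCLOSURE_PATTERN.search(body):
        print("Missing AI-disclosure checkbox: please keep it in the PR "
              "description, whether checked or unchecked.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Checking for presence rather than the answer preserves the “no stigma” principle: the gate enforces that the question gets asked, not how it is answered.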

Train reviewers to watch for AI-specific issues:

  • Hallucinated APIs or methods
  • Plausible but incorrect logic
  • Missing edge case handling
  • Inconsistent with existing patterns
  • Over-engineered for the task

Plus standard checks: meets requirements, handles errors, security addressed, tests adequate, docs updated.
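It also helps to show reviewers what these issues look like in practice. The snippet below is an invented example of “plausible but incorrect logic” and “missing edge case handling”; the function and its flaws are hypothetical, not taken from any real review.

```python
# Hypothetical agent-generated helper: it reads cleanly, but a trained
# reviewer should flag the two issues called out in the comments.
def chunk_records(records, batch_size):
    """Split records into batches of at most batch_size items."""
    batches = []
    # Plausible but incorrect: batch_size is never validated, so a caller
    # passing 0 gets an obscure ValueError from range() instead of a clear
    # error message at the boundary.
    for start in range(0, len(records), batch_size):
        batches.append(records[start:start + batch_size])
    # Missing edge case: an empty `records` silently returns []; whether
    # that matches the requirement was never specified or tested.
    return batches
```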

Some reviewers catch AI mistakes better than others. This is trainable.

  • Share examples of AI failures caught in review
  • Pair junior reviewers with experienced ones
  • Create a team knowledge base of AI pitfalls

Avoid these anti-patterns:

  • Don’t create a separate “AI code” branch: it creates integration nightmares
  • Don’t require manager approval for AI usage: it kills adoption
  • Don’t ignore the conversation: pretending AI isn’t changing things helps no one

Traditional: Write code → Test → Review → Deploy

Agentic: Write spec → Generate tests → Generate code → Verify tests pass → Review → Deploy

Tests come before implementation.

  • Tests define success criteria for the agent
  • Agents can run tests to self-validate
  • Fewer iterations when the goal is clear
  • Tests document intent

Write tests as part of the task definition. Before asking an agent to implement a feature, write (or generate) the tests it should pass.

Use TDD prompting. “Here are the tests. Write code that makes them pass.”
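A minimal sketch of what that hand-off can look like, using pytest and an invented `slugify` task; the module name and the exact behaviors are assumptions, not a required format.

```python
# test_slugify.py -- written before any implementation exists.
# These tests are the task definition handed to the agent:
# "Here are the tests. Write slugify() in text_utils.py so they pass."
import pytest

from text_utils import slugify  # hypothetical module the agent will create

def test_basic_lowercasing_and_hyphens():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Rock & Roll!") == "rock-roll"

def test_collapses_repeated_separators():
    assert slugify("a  --  b") == "a-b"

@pytest.mark.parametrize("bad_input", ["", "   ", "!!!"])
def test_rejects_inputs_with_no_usable_characters(bad_input):
    with pytest.raises(ValueError):
        slugify(bad_input)
```

The tests double as acceptance criteria: when they pass, the task is done; when they fail, the failure output becomes the next prompt.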

Treat test failures as agent feedback. The test suite catches bugs, not you.

Agents excel at writing tests. Use this.

Unit tests: Give agent a function, get test cases back. Review for coverage gaps.

Edge cases: Agents are good at imagining cases you might miss.

Integration tests: More complex, but agents can generate scaffolding.

  1. Point agent at a module
  2. Request tests covering happy path, edges, and errors
  3. Review for gaps and hallucinated behavior
  4. Refine until coverage is meaningful

When reviewing generated tests, watch for:

  • Tests that pass for wrong reasons
  • Mocked dependencies hiding real issues
  • Tests that don’t actually test the requirement
  • Copy-paste tests that don’t add coverage
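An invented example of the first and third pitfalls, with the tighter version a reviewer might ask for; the discount rule and numbers are made up for illustration.

```python
# Implementation under review (hypothetical): "orders of 100 or more
# get a 10% discount".
def apply_discount(total):
    return total * 0.9 if total >= 100 else total

# Weak agent-generated test: passes, but only exercises one comfortable
# point, so it would still pass if the >= boundary were mistyped as >.
def test_discount_weak():
    assert apply_discount(200) == 180

# Better: pins the boundary and the no-discount case the requirement implies.
def test_discount_boundary():
    assert apply_discount(100) == 90
    assert apply_discount(99.99) == 99.99
```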

High test coverage makes agentic development safer.

  • Minimum coverage gates: Don’t let agent-generated code reduce coverage
  • Critical path requirements: Some paths need 100% coverage with meaningful tests
  • Coverage trends: Track whether agent adoption correlates with coverage changes
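Coverage gates are usually a one-line CI setting, e.g. coverage.py’s fail_under option or pytest-cov’s --cov-fail-under flag. If you want the gate as an explicit, auditable step, a sketch like the one below also works; it assumes `coverage json` has already produced a coverage.json report with a totals.percent_covered field, and the threshold is an example.

```python
# coverage_gate.py -- hypothetical CI step enforcing a minimum coverage gate.
import json
import sys

MINIMUM_PERCENT = 80.0  # example threshold; tune per repo or per path

def main() -> int:
    with open("coverage.json") as fh:
        report = json.load(fh)
    percent = report["totals"]["percent_covered"]
    if percent < MINIMUM_PERCENT:
        print(f"Coverage {percent:.1f}% is below the {MINIMUM_PERCENT:.1f}% gate.")
        return 1
    print(f"Coverage {percent:.1f}% meets the gate.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```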

Verification isn’t just about testing the code; it also means testing agent behavior itself.

Acceptance criteria: Define what the code should do, what it shouldn’t do, which edge cases matter, and how you’ll verify it.

Canary testing (for automated workflows):

  • Run agent changes through extended test suites before merge
  • Stage behind feature flags
  • Monitor for anomalies after deployment
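A minimal sketch of the feature-flag idea, with invented function names and a hard-coded rollout fraction standing in for whatever flagging system you already use.

```python
import logging
import random

CANARY_FRACTION = 0.05  # hypothetical rollout knob; normally read from config

def price_quote_v1(order_total):
    # Existing, trusted implementation (placeholder).
    return round(order_total * 1.08, 2)

def price_quote_v2(order_total):
    # Agent-generated rewrite being canaried (placeholder).
    return round(order_total * 1.08, 2)

def price_quote(order_total):
    """Route a small fraction of calls to the new path; fall back on error."""
    if random.random() < CANARY_FRACTION:
        try:
            return price_quote_v2(order_total)
        except Exception:
            logging.exception("canary price_quote_v2 failed; falling back")
    return price_quote_v1(order_total)
```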

Regression tracking: Notice patterns—do certain task types introduce more bugs? Are there problematic codepaths?

The test pyramid still applies, with agent-specific twists:

  • Unit tests (foundation): Fast, focused, run constantly. Agent-generated with human review.
  • Integration tests (middle): Verify components work together. Human-guided generation.
  • E2E tests (top): Verify full user flows. Fewer but critical. Often still human-written.
  • Contract tests (boundaries): Verify API contracts. Especially important when agents modify interfaces.
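A small example of what a contract test can look like in the pytest style, with an invented payload shape; the point is that an agent refactoring the handler cannot silently rename or drop fields without a failing test.

```python
# Hypothetical contract test for a user endpoint's response shape.
EXPECTED_FIELDS = {
    "id": int,
    "email": str,
    "created_at": str,  # ISO-8601 string per the documented contract
}

def get_user_payload(user_id):
    # Placeholder for the real handler or client call under test.
    return {
        "id": user_id,
        "email": "a@example.com",
        "created_at": "2024-01-01T00:00:00Z",
    }

def test_user_payload_matches_contract():
    payload = get_user_payload(1)
    assert set(payload) == set(EXPECTED_FIELDS), "contract fields changed"
    for field, expected_type in EXPECTED_FIELDS.items():
        assert isinstance(payload[field], expected_type), field
```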