# Quality Assurance with Agents
AI-generated code changes the code review and testing dynamic. Your policies need to adapt.
## Code Review Policy Options

Should AI-generated code be reviewed differently? Most teams land in the middle.
| Policy | How it works | Best for |
|---|---|---|
| No differentiation | Same review for all code | Small, high-trust teams with rigorous reviews |
| Disclosure only | Authors flag significant AI code | Transparency without bureaucracy |
| Tiered review | Extra scrutiny on critical paths + AI | Risk-varying codebases |
| AI-assisted review | AI pre-reviews; humans focus on what AI misses | High PR volume teams |
## If You Require Disclosure

Make it easy:
- Clear trigger: "Disclose when >20% of the PR was generated by AI tools"
- Simple mechanism: Checkbox in PR template or a tag/label
- No stigma: Disclosure is information, not judgment
- Useful metadata: Which tool? What prompts? Helps with learning
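A disclosure checkbox is easiest to enforce when a CI step checks it mechanically. A minimal sketch, assuming a hypothetical checkbox wording in the PR template (the `has_ai_disclosure` helper is illustrative, not a standard tool):

```python
import re

# Hypothetical convention: the PR template contains a checkbox the author
# ticks when more than ~20% of the diff came from AI tools.
AI_CHECKBOX = re.compile(r"-\s*\[[xX]\]\s*.*\bAI\b")

def has_ai_disclosure(pr_body: str) -> bool:
    """Return True if the AI-disclosure checkbox in the PR body is ticked."""
    return bool(AI_CHECKBOX.search(pr_body))

pr_body = """### Checklist
- [x] More than 20% of this PR was generated by AI tools
- [ ] Breaking change
"""
print(has_ai_disclosure(pr_body))  # True
```

A CI job could run this against the PR description and apply a label, which keeps disclosure information without adding review friction.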
## Review Checklist for AI Code

Train reviewers to watch for AI-specific issues:
- Hallucinated APIs or methods
- Plausible but incorrect logic
- Missing edge case handling
- Inconsistent with existing patterns
- Over-engineered for the task
Plus standard checks: meets requirements, handles errors, security addressed, tests adequate, docs updated.
## Building Reviewer Skills

Some reviewers catch AI mistakes better than others. This is trainable.
- Share examples of AI failures caught in review
- Pair junior reviewers with experienced ones
- Create a team knowledge base of AI pitfalls
## What Not to Do

- Don't create a separate "AI code" branch; it leads to integration nightmares
- Don't require manager approval for AI usage; it kills adoption
- Don't ignore the conversation; pretending AI isn't changing things helps no one
## Shift Testing Left

Traditional: Write code → Test → Review → Deploy

Agentic: Write spec → Generate tests → Generate code → Verify tests pass → Review → Deploy

Tests come before implementation.
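The spec-first ordering can be shown in miniature: the tests exist before the implementation, and "done" means they pass. A toy sketch (the `slugify` function is illustrative):

```python
import re

# Step 1: the spec, expressed as tests, exists before any implementation.
def test_slugify_lowercases():
    assert slugify("Hello World") == "hello-world"

def test_slugify_strips_punctuation():
    assert slugify("Hi, there!") == "hi-there"

# Step 2: the implementation is written (by an agent or a human)
# until the tests above pass.
def slugify(text: str) -> str:
    text = re.sub(r"[^\w\s-]", "", text.lower())   # drop punctuation
    return re.sub(r"\s+", "-", text).strip("-")    # spaces become hyphens

test_slugify_lowercases()
test_slugify_strips_punctuation()
```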
## Why This Works

- Tests define success criteria for the agent
- Agents can run tests to self-validate
- Fewer iterations when the goal is clear
- Tests document intent
## How to Implement

Write tests as part of the task definition. Before asking an agent to implement a feature, write (or generate) the tests it should pass.

Use TDD prompting: "Here are the tests. Write code that makes them pass."

Treat test failures as agent feedback. The test suite catches bugs, not you.
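Treating failures as feedback is just a loop: run the suite, hand the failure output back to the agent, retry. A sketch with the agent call stubbed out (`run_tests` and `request_patch` are hypothetical callables, not a real agent API):

```python
def agent_fix_loop(run_tests, request_patch, max_iters=3):
    """Run the suite; on failure, hand the output back to the agent and retry.

    run_tests:     () -> (passed, output)  e.g. a wrapper around `pytest -q`
    request_patch: (output) -> None        hypothetical call into the agent
    """
    for _ in range(max_iters):
        passed, output = run_tests()
        if passed:
            return True
        request_patch(output)  # the failure text *is* the feedback
    return False

# Simulated session: first run fails, agent patches, second run is green.
results = iter([(False, "1 failed: test_rounding"), (True, "2 passed")])
feedback = []
print(agent_fix_loop(lambda: next(results), feedback.append))  # True
```

The `max_iters` cap matters: an agent that cannot converge after a few attempts needs a human, not more retries.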
## Using Agents for Test Generation

Agents excel at writing tests. Use this.
Unit tests: Give agent a function, get test cases back. Review for coverage gaps.
Edge cases: Agents are good at imagining cases you might miss.
Integration tests: More complex, but agents can generate scaffolding.
## The Workflow

- Point the agent at a module
- Request tests covering happy path, edges, and errors
- Review for gaps and hallucinated behavior
- Refine until coverage is meaningful
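Applied to a small module, the workflow above might yield tests like these (the `parse_port` function and its tests are illustrative):

```python
def parse_port(value: str) -> int:
    port = int(value)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port

# Tests an agent might produce; review them for coverage gaps and for
# assertions about behavior the code doesn't actually have.
def test_happy_path():
    assert parse_port("8080") == 8080

def test_edge_bounds():
    assert parse_port("1") == 1
    assert parse_port("65535") == 65535

def test_errors():
    for bad in ("0", "70000", "abc"):
        try:
            parse_port(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")

test_happy_path(); test_edge_bounds(); test_errors()
```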
## Watch For

- Tests that pass for the wrong reasons
- Mocked dependencies hiding real issues
- Tests that don't actually test the requirement
- Copy-paste tests that don't add coverage
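The first two failure modes are easy to demonstrate. In this sketch the weak test passes no matter what the code does, because a bare mock's return value is always truthy; the stronger version pins down the actual interaction (illustrative code using Python's standard `unittest.mock`):

```python
from unittest.mock import MagicMock

def charge_customer(gateway, amount):
    return gateway.charge(amount)

# Passes for the wrong reason: MagicMock().charge(...) returns a truthy
# MagicMock, so this asserts nothing -- even a negative amount "succeeds".
def weak_test():
    gateway = MagicMock()
    assert charge_customer(gateway, -50)

# Stronger: pin the stubbed response and assert on the real interaction.
def better_test():
    gateway = MagicMock()
    gateway.charge.return_value = {"status": "ok"}
    result = charge_customer(gateway, 50)
    gateway.charge.assert_called_once_with(50)
    assert result["status"] == "ok"

weak_test()    # passes, but proves nothing
better_test()  # passes and actually checks behavior
```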
## Test Coverage as Guardrail

High test coverage makes agentic development safer.
- Minimum coverage gates: Don't let agent-generated code reduce coverage
- Critical path requirements: Some paths need 100% coverage with meaningful tests
- Coverage trends: Track whether agent adoption correlates with coverage changes
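A coverage gate can be a small CI script that compares the current report against a recorded baseline and enforces the critical-path rule. An illustrative sketch; the report format and `check_gates` helper are assumptions, not a real tool's API (the percentages would come from your coverage tool, e.g. coverage.py output):

```python
def check_gates(report, baseline, critical_paths):
    """Return a list of violations; an empty list means the build may proceed.

    report: per-file coverage percentages, e.g. {"app/auth.py": 100.0}
    """
    failures = []
    total = sum(report.values()) / len(report)
    if total < baseline:
        failures.append(f"total coverage {total:.1f}% is below baseline {baseline}%")
    for path in sorted(critical_paths):
        if report.get(path, 0.0) < 100.0:
            failures.append(f"critical path {path} is below 100%")
    return failures

report = {"app/auth.py": 100.0, "app/billing.py": 92.0, "app/util.py": 74.0}
print(check_gates(report, baseline=80.0, critical_paths={"app/auth.py"}))  # []
```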
## Testing Agent Output

This means testing not just the code but the agent's behavior itself.

Acceptance criteria: Define what the code should do, what it shouldn't do, the edge cases, and how each will be verified.
Canary testing (for automated workflows):
- Run agent changes through extended test suites before merge
- Stage behind feature flags
- Monitor for anomalies after deployment
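Staging behind feature flags means the agent-generated path serves only a slice of traffic until it proves itself. A toy sketch; real teams would use a flag service, and the flag name and `FeatureFlags` class here are hypothetical:

```python
import random

class FeatureFlags:
    """Toy in-process flag store; real systems use a flag service."""
    def __init__(self, rollout):
        self.rollout = rollout  # flag name -> fraction of requests enabled

    def enabled(self, name, rng=random.random):
        return rng() < self.rollout.get(name, 0.0)

# Canary the hypothetical agent-refactored billing path at 5% of traffic.
flags = FeatureFlags({"agent-refactored-billing": 0.05})

def bill(customer_id, flags=flags):
    if flags.enabled("agent-refactored-billing"):
        return "new-path"  # agent-generated implementation (hypothetical)
    return "old-path"      # known-good implementation
```

If monitoring shows anomalies on the new path, rolling back is a flag change, not a revert and redeploy.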
Regression tracking: Notice patterns. Do certain task types introduce more bugs? Are there problematic codepaths?
## The Test Pyramid for Agents

- Unit tests (foundation): Fast, focused, run constantly. Agent-generated with human review.
- Integration tests (middle): Verify components work together. Human-guided generation.
- E2E tests (top): Verify full user flows. Fewer but critical. Often still human-written.
- Contract tests (boundaries): Verify API contracts. Especially important when agents modify interfaces.
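A contract test pins down the response shape so that an agent editing a handler cannot silently change the interface. A minimal sketch; the `/users/{id}` contract shown is hypothetical:

```python
# Hypothetical contract for a /users/{id} response; if an agent modifies
# the handler, this test catches silent changes to the interface shape.
USER_CONTRACT = {"id": int, "email": str, "active": bool}

def satisfies_contract(payload, contract):
    """Check that every contracted field is present with the expected type."""
    return set(payload) >= set(contract) and all(
        isinstance(payload[field], expected) for field, expected in contract.items()
    )

good = {"id": 7, "email": "a@example.com", "active": True}
bad = {"id": "7", "email": "a@example.com", "active": True}  # id became a string

print(satisfies_contract(good, USER_CONTRACT))  # True
print(satisfies_contract(bad, USER_CONTRACT))   # False
```

Extra fields are allowed here (the check is a superset test), which is a common contract-testing choice: consumers tolerate additions but break on removals or type changes.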
## Resources

### Essential

- Your job is to deliver code you have proven to work - standards for reviewing AI-assisted code
- Agent Readiness (Eno Reyes, Factory AI) - how testing infrastructure affects agent reliability