Pipeline
Every PR runs through one explicit pipeline whether it enters via the GitHub App webhook, the CLI, the VS Code extension, or the GitHub Action. The same runPipeline() function backs all four entry points, which means findings are reproducible and replayable across surfaces.
One pipeline, four entry points
All four entry points resolve to runPipeline(). The CLI and VS Code extension call it in-process; the GitHub Action calls our API. The ten stages execute in the same order regardless.
Ten-stage pipeline + post-merge outcome
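The shared entry point can be pictured as a fixed, ordered stage list folded over one accumulating context. A minimal sketch, assuming nothing beyond this page: stage names are real, the context shape and stage bodies are illustrative.

```typescript
// Illustrative sketch: ten stages, fixed order, one shared context.
type StageName =
  | "sanitize" | "pre-ingest" | "enrich" | "static" | "ai-review"
  | "cross-check" | "deep-audit" | "auto-fix" | "merge-gate" | "explain";

interface Finding { stage: StageName; severity: string; message: string }

interface PipelineContext { findings: Finding[]; proceed: boolean }

type Stage = (ctx: PipelineContext) => PipelineContext;

// Hypothetical registry; real stage bodies are described below.
const stages: Array<[StageName, Stage]> = [
  ["sanitize", (ctx) => ctx],
  ["pre-ingest", (ctx) => ctx],
  // ...remaining stages elided in this sketch
];

function runPipeline(ctx: PipelineContext): PipelineContext {
  // Each stage sees the accumulated context; a stage may clear `proceed`
  // to short-circuit the rest (e.g. Pre-ingest under STRICT policy).
  return stages.reduce((acc, [, stage]) => (acc.proceed ? stage(acc) : acc), ctx);
}
```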
Stage 1 — Sanitize
First touch. Strips prompt-injection from anything we'll later send to an LLM: PR title, body, commit messages, file contents. Detects literal injection strings (ignore previous instructions, system: blocks), base64-encoded instruction payloads, zero-width characters, and unusual unicode direction marks.
Rule-based scrubbing first, then a Haiku call for ambiguous cases. Output is a sanitized payload plus a sanitizationFindings[] array, logged for audit but not user-blocking.
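The rule-based pass can be sketched as a pattern table applied before any LLM call. The detection categories come from this page; the specific regexes and the finding shape are illustrative, and the Haiku escalation for ambiguous cases is not shown.

```typescript
// Illustrative rule-based scrubber; base64 payload detection and the
// Haiku fallback for ambiguous cases are elided.
interface SanitizationFinding { kind: string; excerpt: string }

const INJECTION_PATTERNS: Array<[string, RegExp]> = [
  ["literal-injection", /ignore (all )?previous instructions/i],
  ["system-block", /^\s*system:/im],
  ["zero-width", /[\u200B-\u200D\uFEFF]/],
  ["direction-mark", /[\u202A-\u202E\u2066-\u2069]/],
];

function sanitize(text: string): { clean: string; findings: SanitizationFinding[] } {
  const findings: SanitizationFinding[] = [];
  let clean = text;
  for (const [kind, pattern] of INJECTION_PATTERNS) {
    const match = clean.match(pattern);
    if (match) {
      // Log an audit finding, then strip every occurrence.
      findings.push({ kind, excerpt: match[0].slice(0, 40) });
      clean = clean.replace(new RegExp(pattern.source, pattern.flags + "g"), "");
    }
  }
  return { clean, findings };
}
```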
Stage 2 — Pre-ingest
Hygiene checks before any model sees the diff. Default checks:
- PR description present (configurable minimum length)
- Tests touched if src/ was touched (paths configurable)
- No binary blobs over 1MB
- No committed .env files or credentials
Outputs preIngestFindings[] plus a proceed flag. Under STRICT or LOCKED policy, a proceed: false result fails the check-run early without spending LLM tokens.
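The default checks above can be sketched as one pure function. Checks, policy names, and the proceed semantics are from this page; thresholds, field names, and the test-file heuristic are illustrative assumptions.

```typescript
// Illustrative defaults; minimum lengths and paths are configurable per-repo.
interface PrMeta {
  description: string;
  changedFiles: string[];
  maxBlobBytes: number; // size of the largest binary blob in the diff
}

interface PreIngestResult { findings: string[]; proceed: boolean }

function preIngest(pr: PrMeta, policy: "RELAXED" | "STRICT" | "LOCKED"): PreIngestResult {
  const findings: string[] = [];
  if (pr.description.trim().length < 20) findings.push("missing-description");
  const touchedSrc = pr.changedFiles.some((f) => f.startsWith("src/"));
  const touchedTests = pr.changedFiles.some((f) => /\.(test|spec)\./.test(f));
  if (touchedSrc && !touchedTests) findings.push("no-tests-touched");
  if (pr.maxBlobBytes > 1_000_000) findings.push("binary-blob-over-1mb");
  if (pr.changedFiles.some((f) => /(^|\/)\.env$/.test(f))) findings.push("committed-env-file");
  // Under STRICT or LOCKED, any finding fails the check-run before LLM spend.
  const proceed = policy === "RELAXED" || findings.length === 0;
  return { findings, proceed };
}
```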
Stage 3 — Enrich
Assemble the context bundle every downstream LLM stage reads from. Cached in Redis keyed by ${prId}-${headSha} for one hour.
- Repo profile — language, frameworks, test conventions, fingerprinted from the last 200 commits (7-day TTL).
- Author profile — last 50 PRs by the author: merge rate, revert rate, avg findings per PR, primary languages.
- Similar PRs — file-overlap lookup against the last 200 PRs in the repo. Embedding-based similarity comes in v2.
- Dep graph — direct dep changes; new packages flagged for slopsquatting check.
- Agent identity — see Agents.
- Cross-PR detect — file overlap with other open PRs, revert detection, dup-diff signals.
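The cache-aside lookup around the bundle can be sketched as follows. The key shape and one-hour TTL are from this page; the bundle field names and the minimal Redis-client interface are assumptions, not the real schema.

```typescript
// Sketch of the enrich-bundle cache; `redis` stands in for any client
// exposing get/set-with-TTL. Field names are illustrative.
interface EnrichBundle {
  repoProfile: unknown;
  authorProfile: unknown;
  similarPrs: unknown[];
  depFlags: string[];
  agentIdentity: string | null;
}

const ONE_HOUR_SECONDS = 3600;

function enrichCacheKey(prId: string, headSha: string): string {
  return `${prId}-${headSha}`;
}

async function getEnrichBundle(
  redis: {
    get(k: string): Promise<string | null>;
    set(k: string, v: string, ttlSeconds: number): Promise<void>;
  },
  prId: string,
  headSha: string,
  build: () => Promise<EnrichBundle>,
): Promise<EnrichBundle> {
  const key = enrichCacheKey(prId, headSha);
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit); // repeat stage reads within the hour are free
  const bundle = await build();
  await redis.set(key, JSON.stringify(bundle), ONE_HOUR_SECONDS);
  return bundle;
}
```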
Stage 4 — Static
Deterministic rules, no LLM. Combines the security ruleset, AI-pattern detection, slopsquatting checks for new dependencies, and the dependency license check (disallowed licenses configurable per-org).
Findings here are cheap, fast, and have rule IDs you can reference in policy.
Stage 5 — AI Review
Sized routing. Tiny diffs go to Haiku, mid-size to Sonnet, large to Opus — see Models. Prompts are pulled from the PromptTemplate table at the active version, so admins can ship prompt changes without a deploy.
The system prompt is augmented with the enrich bundle: repo profile summary, author summary, dep flags, agent identity. The cache point marker before the enrich block lets us hit Bedrock prompt caching on repeat-PR-same-repo cases.
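The sized routing reduces to a threshold function. The tier order is from this page; the concrete cut-points below are placeholders, since the real ones live in the Models doc.

```typescript
// Illustrative thresholds only; real cut-points are defined in Models.
type Model = "haiku" | "sonnet" | "opus";

function routeBySize(changedLines: number): Model {
  if (changedLines <= 50) return "haiku";   // tiny diffs
  if (changedLines <= 600) return "sonnet"; // mid-size
  return "opus";                            // large
}
```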
Stage 6 — Cross-check
For every AI Review finding above HIGH severity, we run a second cheap Haiku call: "is this finding genuinely a problem? yes/no plus one-line reason." Race with a 5s timeout.
Disagreements get demoted to LOW and routed into the disputed list for Deep Audit. This is the yoloClassifier pattern — it cuts false positives without an Opus call on every finding.
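The race-with-timeout shape can be sketched like this. The 5s cap and the demote-on-disagreement rule are from this page; treating a timeout as agreement (fail-open) is an assumption, and `verify` stands in for the Haiku call.

```typescript
interface Verdict { agrees: boolean; reason: string }

// Only an explicit "no" demotes a finding; keeping it on timeout
// (fail-open) is an assumption, not documented behavior.
function decide(verdict: Verdict): "keep" | "demote-to-low" {
  return verdict.agrees ? "keep" : "demote-to-low";
}

function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((resolve) => setTimeout(() => resolve(fallback), ms)),
  ]);
}

async function crossCheck(verify: () => Promise<Verdict>): Promise<"keep" | "demote-to-low"> {
  const verdict = await withTimeout(verify(), 5000, { agrees: true, reason: "timeout" });
  return decide(verdict);
}
```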
Stage 7 — Deep Audit
Conditional. Runs only when one of:
- Any CRITICAL finding from stages 4–6
- disputedFindings[] from Cross-check is non-empty
- Trust score below configured threshold
Opus reads the same enrich bundle plus the disputed findings and produces a final verdict. Most PRs skip this stage entirely.
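The trigger conditions reduce to a single predicate; a sketch with illustrative field names:

```typescript
// One true condition is enough to run Deep Audit; most PRs hit none.
interface AuditInput {
  findings: Array<{ severity: "LOW" | "MEDIUM" | "HIGH" | "CRITICAL" }>;
  disputedFindings: unknown[];
  trustScore: number;
  trustThreshold: number; // configured per-repo or per-org
}

function shouldDeepAudit(input: AuditInput): boolean {
  return (
    input.findings.some((f) => f.severity === "CRITICAL") ||
    input.disputedFindings.length > 0 ||
    input.trustScore < input.trustThreshold
  );
}
```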
Stage 8 — Auto-fix
For findings tagged fixable, generate patch suggestions the author can apply with one click. Plan-gated to Pro and above.
Stage 9 — Merge Gate
Read the merge policy for the repo (falling back to the org default), evaluate against accumulated findings plus agent identity plus changed files, and produce the check-run conclusion: success, failure, or neutral.
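A minimal sketch of the evaluation, assuming a policy shape that is not documented here: the three conclusion values are from this page, the `blockOn`/`warnOnly` fields are hypothetical.

```typescript
// Illustrative gate: blocking severities fail, unless the policy is
// advisory-only, in which case the check-run lands neutral.
type Conclusion = "success" | "failure" | "neutral";

interface MergePolicy { blockOn: string[]; warnOnly: boolean }

function mergeGate(
  policy: MergePolicy,
  findings: Array<{ severity: string }>,
): Conclusion {
  const blocking = findings.some((f) => policy.blockOn.includes(f.severity));
  if (!blocking) return "success";
  return policy.warnOnly ? "neutral" : "failure";
}
```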
Stage 10 — Explain
Every finding written to the database carries a provenance object: the stage that produced it, the rule ID (for static) or prompt version + AI call ID (for AI stages), and an evidence excerpt.
The PR comment renders findings as [stage:rule] message. Owners and admins can click through to the full AI call detail. See Audit Trail.
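The provenance fields and the comment format above can be sketched together; the rule-vs-prompt fallback in the renderer is an assumption.

```typescript
// Provenance fields per this page; static findings carry a ruleId,
// AI findings carry a prompt version and AI call ID instead.
interface Provenance {
  stage: string;
  ruleId?: string;        // static stages
  promptVersion?: string; // AI stages
  aiCallId?: string;      // AI stages
  evidence: string;       // excerpt backing the finding
}

function renderFinding(p: Provenance, message: string): string {
  const ref = p.ruleId ?? `prompt@${p.promptVersion}`;
  return `[${p.stage}:${ref}] ${message}`;
}
```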
Post-merge: Outcome tracking
The ten numbered stages finish when the check-run is written. An async outcome loop runs after merge — it's a feedback channel, not a pipeline stage, so it sits outside the numbered flow.
On pull_request.closed with merged: true, we enqueue a delayed BullMQ job that fires 14 days later to check for revert PRs and incident-tagged commits in the same files.
Writes a PRReviewOutcome row. Aggregated weekly into per-rule false-positive rates that surface in the admin feedback dashboard and feed the model-routing tuning loop.
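The enqueue can be sketched as follows. The event trigger and 14-day delay are from this page, and the `add(name, data, { delay })` shape mirrors BullMQ's API; the job name and payload fields are illustrative.

```typescript
// Sketch of scheduling the outcome check; `queue` stands in for a
// BullMQ Queue (queue.add(name, data, { delay }) is its real shape).
const FOURTEEN_DAYS_MS = 14 * 24 * 60 * 60 * 1000;

interface OutcomeJob { prId: string; repoId: string; mergedSha: string }

function scheduleOutcomeCheck(
  queue: { add(name: string, data: OutcomeJob, opts: { delay: number }): Promise<void> },
  job: OutcomeJob,
): Promise<void> {
  // The worker later scans the same files for reverts and
  // incident-tagged commits, then writes a PRReviewOutcome row.
  return queue.add("pr-outcome-check", job, { delay: FOURTEEN_DAYS_MS });
}
```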
Cost and latency budgets
When a PR's accumulated LLM spend exceeds its configured budget, a BUDGET_EXCEEDED finding is emitted. Latency target is 90s p95 from webhook to check-run.