I made Cursor do the boring 80 percent of code review so I could be meaner about the rest
A blunt reviewer's Cursor setup: two review subagents on Opus, GitHub and Sentry over MCP, and rules that ban nitpicks so the bot flags the bug that pages you instead of your quote style.
I have a reputation on my team and it is mostly deserved. I am the reviewer who will block your pull request, walk over to your desk, and make you explain why the retry loop has no ceiling. I am also, apparently, the reviewer who left a 31-comment review on a 40-line PR in 2023 where 28 of those comments were about import ordering. I think about that review a lot. It was the moment I realized I was the noise.
Here is the thing nobody says out loud about code review: most of it is mechanical. Did the formatter run. Is there a test. Did you catch the obvious null. That stuff is real, it matters, and it is also death by a thousand cuts to read manually across twelve open PRs on a Tuesday. The interesting 20 percent, the part where you trace data flow and find the deadlock, that is the part I am actually good at. So I spent a weekend building a Cursor setup that eats the boring 80 percent and hands me the rest sharpened.
This is that setup. It is Cursor, Opus 4.8 behind two review subagents, GitHub and Sentry wired in over MCP, and a set of rules whose entire purpose is to stop the bot from doing the thing that got me roasted in 2023. I will show you every file. None of it is clever. Cleverness is how review setups end up generating forty nits nobody reads.
The model and why I do not cheap out here
Opus 4.8. Every subagent runs on it. I get the appeal of putting a cheaper model on review to save tokens, and for inline autocomplete I would agree. But review is the one task where being wrong is expensive in a sneaky way: a reviewer that misses a real bug is worse than no reviewer, because it gives everyone false confidence to merge. A model that confidently approves a race condition has actively made your codebase less safe. So I pay for the reasoning. My cost per review runs about 41 cents. The incident it caught last month cost a lot more than that.
| Part | What I chose | Why |
|---|---|---|
| Model | Claude Opus 4.8 | Review punishes confident wrong answers; pay for the reasoning |
| MCP | github, sentry | Read the real PR, weight files that already crash in prod |
| Subagents | bughunter-reviewer, style-reviewer | Split correctness from cosmetics so neither drowns the other |
| Hooks | pre-commit lint, on-pr post-inline-comments | Let automation own style; let the bot own bugs |
Rules: the part everyone gets backwards
Cursor rules live as .mdc files in .cursor/rules/, version-controlled, with frontmatter that decides when they apply. Most people fill these with style preferences. That is exactly backwards for a review bot. My rules spend almost all their words telling the model what NOT to say. Here is the core one, always on.
---
description: How to review a diff in this repo. Always on.
alwaysApply: true
---
You are reviewing a pull request, not writing a poem about it.
Priority order, top to bottom. Do not skip down a level until the
level above is clean:
1. Correctness. Does it do what the PR description says? Trace the
data flow by hand. Off-by-one, null that can actually be null,
the await someone forgot, the error path that swallows.
2. Safety. Auth checks, input validation at the boundary, anything
that touches money, PII, or a delete.
3. Tests. New behavior with no test that asserts it is not done.
A test that runs but asserts nothing is worse than no test.
4. Then, and only then, style.
Hard rules:
- Cite the exact file and line, and say WHY it is wrong, not just
that it is. "Bug" is not a review. "This returns before the lock
is released, so a thrown error leaks the connection" is.
- One comment per real problem. Do not leave six comments that all
say the same thing.
- If it is correct and tested, approve it. Do not invent work.
- No nitpick spam. If the linter or formatter can catch it, it is
not your job to mention it. The hook already did.
The priority order is load-bearing. Without it, a language model will happily lead with the easy, safe observation, the variable name, because finding a real bug is hard and pointing at a name is not. Forcing it to clear correctness and safety before it is allowed to mention style is the single change that turned this from annoying to useful.
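If it helps to see the gating as logic rather than prose, here is a minimal sketch. It is an illustration only, not part of the setup: `Finding`, `LEVELS`, and `gate_findings` are hypothetical names, and the real gating happens inside the model as it follows the rule. The sketch shows the one behavior that matters: anything below the highest-priority dirty level is held back.

```python
from dataclasses import dataclass

# Priority order from the rule, highest first. The labels are illustrative.
LEVELS = ["correctness", "safety", "tests", "style"]


@dataclass(frozen=True)
class Finding:
    level: str     # one of LEVELS
    location: str  # "file:line"
    why: str       # the reason it is wrong, not just "bug"


def gate_findings(findings: list[Finding]) -> list[Finding]:
    """Return only the findings at the highest-priority level that has any."""
    for level in LEVELS:
        at_level = [f for f in findings if f.level == level]
        if at_level:
            return at_level  # lower levels wait until this one is clean
    return []  # nothing at any level: approve


findings = [
    Finding("style", "api/user.py:14", "name says 'get' but the function writes"),
    Finding("correctness", "api/user.py:52", "returns before the lock is released"),
]
gated = gate_findings(findings)
print([f.location for f in gated])  # ['api/user.py:52']
```

The style finding is not lost, just deferred. It only surfaces once the correctness finding is gone.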
And then the rule I am most proud of, because it is the one that exorcises my 2023 ghost. It is a list of things the bot is flat-out forbidden to comment on.
---
description: Things you are banned from commenting on.
alwaysApply: true
---
The formatter and eslint run on pre-commit. They own these. You do
not get to re-litigate them in prose:
- import ordering, quote style, semicolons, trailing commas
- line length, indentation, blank lines
- "could be a const" when it changes nothing
- renaming a variable you simply would have named differently
- "consider extracting this" on a function that is fine
If your comment would survive being prefixed with "nit:", it is a
nit. Cut it. The reviewer who leaves forty nits trains everyone to
stop reading the reviewer. Spend the attention budget on the line
that pages someone at 3am.
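The mechanical half of that rule can be sketched as a filter. This is a hedged illustration, not something the setup ships: the category labels and the `is_nit` and `drop_nits` helpers are hypothetical, and the judgment call ("would this survive a nit: prefix?") still belongs to the model. The code only covers the obvious cases.

```python
# Categories the formatter and linter already own, taken from the rule above.
# The label strings themselves are made up for this example.
BANNED_CATEGORIES = {
    "import-order", "quote-style", "semicolon", "trailing-comma",
    "line-length", "indentation", "blank-lines",
    "could-be-const", "rename-preference", "extract-suggestion",
}


def is_nit(category: str, text: str) -> bool:
    """True if the comment is in a banned category or labels itself a nit."""
    return category in BANNED_CATEGORIES or text.strip().lower().startswith("nit:")


def drop_nits(comments: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the (category, text) comments that are not nits."""
    return [(cat, text) for cat, text in comments if not is_nit(cat, text)]


comments = [
    ("import-order", "Sort these imports alphabetically."),
    ("correctness", "Retry loop has no ceiling, so a dead host spins forever."),
    ("naming", "nit: I would call this `items`."),
]
kept = drop_nits(comments)
print(kept)
```

Three comments go in, one comes out, and it is the one about the retry loop.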
MCP: read the actual PR, and ask prod what hurts
Two MCP servers, no more. I have seen people bolt eight servers onto a review setup and then wonder why it is slow and unfocused. Every server you add is more tool definitions competing for the model's attention. For review you need exactly two things: the real pull request, and a sense of which code is already on fire.
{
  "mcpServers": {
    "github": {
      "type": "http",
      "url": "https://api.githubcopilot.com/mcp/",
      "headers": { "Authorization": "Bearer ${GITHUB_PAT}" }
    },
    "sentry": {
      "type": "http",
      "url": "https://mcp.sentry.dev/mcp"
    }
  }
}
- github pulls the diff, the PR description, the linked issue, and lets the bot post comments back inline instead of dumping a wall of text in the editor.
- sentry is the secret weapon. Before judging a file, the bughunter checks whether that module already throws in production. A change to code that is currently paging the on-call is the highest-risk change in the diff, and most reviewers, human or bot, have no idea which file that is.
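One cheap sanity check before blaming the bot for a silent GitHub server: make sure the token placeholder actually has a value behind it. The sketch below is an assumption-laden illustration, not part of the setup. It assumes placeholders look like the `${GITHUB_PAT}` in the config above, and `missing_env_vars` is a hypothetical helper. How Cursor itself resolves the placeholder is not something this code knows about.

```python
import json
import re
from collections.abc import Mapping

# Matches ${NAME} placeholders like the one in the Authorization header.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text: str, env: Mapping[str, str]) -> dict[str, list[str]]:
    """Map each server name to the placeholder variables that env leaves unset."""
    config = json.loads(config_text)
    missing: dict[str, list[str]] = {}
    for name, server in config.get("mcpServers", {}).items():
        # Serialize the entry so placeholders inside nested values are found too.
        wanted = PLACEHOLDER.findall(json.dumps(server))
        unset = sorted({var for var in wanted if not env.get(var)})
        if unset:
            missing[name] = unset
    return missing


CONFIG = """
{
  "mcpServers": {
    "github": {
      "type": "http",
      "url": "https://api.githubcopilot.com/mcp/",
      "headers": { "Authorization": "Bearer ${GITHUB_PAT}" }
    },
    "sentry": { "type": "http", "url": "https://mcp.sentry.dev/mcp" }
  }
}
"""

print(missing_env_vars(CONFIG, {}))                               # {'github': ['GITHUB_PAT']}
print(missing_env_vars(CONFIG, {"GITHUB_PAT": "example-token"}))  # {}
```

In real use you would pass `os.environ` and the contents of `.cursor/mcp.json` instead of the inline string.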
Two subagents, because one reviewer wears two hats badly
I split review into two subagents on purpose. A bughunter-reviewer that only cares about correctness and safety, and a style-reviewer that handles the cosmetic layer the linter cannot, things like a misleading function name or a comment that lies. Keeping them separate means the bughunter never softens a real finding by burying it next to a naming suggestion, and the style pass never gets to pretend a preference is a defect. Here is the bughunter brief.
# Review brief: bughunter-reviewer
You receive a unified diff and the PR title/body. You have the
github and sentry MCP tools. Use them.
Workflow:
1. Read the diff fully before commenting on any single line. A bug
on line 12 is often only a bug because of line 80.
2. For any file the diff touches, check Sentry for recent errors in
that module. A change to code that already throws in prod is the
most likely place to add a new incident. Weight those higher.
3. Comment only on correctness and safety (see review-core.mdc).
Hand style to style-reviewer; do not duplicate it.
4. Post findings as inline GitHub comments via the github MCP, one
per real issue, file:line + the why + a concrete fix.
5. End with a one-line verdict: APPROVE, COMMENT, or REQUEST_CHANGES.
REQUEST_CHANGES only for a correctness or safety bug you can name.
If the diff is clean, say so in one line and approve. Resist the
urge to find something. A clean PR exists.
Notice step 2. The Sentry check is what makes this more than a fancier linter. The bughunter does not treat every file equally. It reads the diff, then asks production which of these files has been throwing, and weights its attention accordingly. That is exactly what a good human reviewer does in their head when they think "oh, the payments module again" and slows down. I just made it explicit.
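The weighting in step 2 boils down to a sort. Here is a minimal sketch, assuming you already have a per-file count of recent production errors. In the real setup that count would come from the Sentry MCP tools, and this sketch does not show that call. The `rank_by_prod_errors` name, the file paths, and the numbers are all made up for illustration.

```python
def rank_by_prod_errors(changed_files: list[str], recent_errors: dict[str, int]) -> list[str]:
    """Order changed files so the ones already throwing in prod come first."""
    # Files with no recorded errors count as zero; ties fall back to the path
    # so the ordering is deterministic.
    return sorted(changed_files, key=lambda path: (-recent_errors.get(path, 0), path))


changed = ["src/ui/button.ts", "src/payments/charge.ts", "src/util/dates.ts"]
errors_last_7d = {"src/payments/charge.ts": 37, "src/util/dates.ts": 2}  # made-up counts

ranked = rank_by_prod_errors(changed, errors_last_7d)
print(ranked)
# ['src/payments/charge.ts', 'src/util/dates.ts', 'src/ui/button.ts']
```

The payments file goes to the front of the line, which is the "oh, the payments module again" instinct written down.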
What it looks like running
Day to day I open a PR in Cursor, point the bughunter at the diff, and let the on-pr hook post the results straight to GitHub. Here is a real-shaped session. The good part is the last line: it found something I would have caught, but on PR number nine of the day, honestly, maybe I would not have.
Honest limits
I am not going to oversell this. It agrees with my own manual review on roughly 85 percent of the same PRs, which means about one time in seven it either misses something I would flag or flags something I would not. That is fine for a first pass. It is not fine as the only pass. I still read every REQUEST_CHANGES myself before it goes out, because a bot that wrongly blocks a teammate's PR erodes trust faster than one that misses a bug. The bot triages. I still own the merge button. That is the whole arrangement, and I would not give it up.
- It is great at local correctness: null paths, missing awaits, stale caches, swallowed errors.
- It is weak on cross-file architecture and on whether the feature should exist at all. That is still yours.
- Sentry weighting is only as good as your Sentry hygiene. Garbage in, garbage priority.
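For anyone checking my math on the 85 percent figure, here is the arithmetic spelled out, plus what it implies for a twelve-PR Tuesday. The twelve is just the example day from earlier in this post, not a measured average.

```python
agreement = 0.85                  # share of PRs where the bot matches my manual review
disagreement = 1 - agreement      # it misses something or over-flags
prs_per_disagreement = 1 / disagreement
expected_on_busy_day = 12 * disagreement  # twelve open PRs

print(f"{disagreement:.0%} disagree, about 1 in {prs_per_disagreement:.1f} PRs")
print(f"expected disagreements across 12 PRs: {expected_on_busy_day:.1f}")
```

That works out to a disagreement on about one PR in 6.7, which I round to one in seven, and just under two PRs on a busy day where my read and the bot's differ. That is why the bot is a first pass and not the only pass.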
Take it
Everything above ships in this build: the two rules, the .cursor/mcp.json, and both subagent briefs. Install it, then do the one thing that matters, which is editing the no-nitpicks list to match what your linter already enforces, so the bot never repeats it. After that, point it at your noisiest open PR and see whether it finds the thing you were dreading reading for yourself.