How to Write a Performance Review for a Software Engineer

Engineering reviews are different from every other kind of review. The evidence lives in different places, the language has to mean something to people who read code for a living, and generic prose gets noticed instantly. Here's how to draft one that holds up.

11 min read · Updated 12 May 2026

Most engineering managers I know dread performance review week. Not because they don’t have opinions about their engineers, but because turning those opinions into a document that holds up under scrutiny (from HR, from the engineer, from yourself six months later) is harder than the work itself. I want to give you a way through that.

This is a guide for managing engineers specifically. Reviewing an engineer is not the same as reviewing a sales rep or a designer. The signal lives in different places: pull requests, design docs, on-call rotations, the unfiltered Slack DMs from peer leads. And the language has to actually mean something to a person who reads code for a living. Generic praise is recognised as generic praise.

Why software engineering reviews are different

Three things make these reviews trickier than most. First, the outcomes are often invisible. A senior engineer can spend a quarter preventing an incident that never happened, and the only way to evidence that is to read the design doc they wrote and the decisions they made. There is no quota chart.

Second, the people you’re reviewing are calibration-sensitive. Engineers compare notes. If your review of one engineer sounds like a generic praise sandwich and their peer’s review reads like a specific story about real work, the first engineer notices, often immediately, and trust in the process drops for years.

Third, you’re writing for two audiences at once. The HR partner you calibrate with is rarely technical and needs the review to be readable. The engineer you’re reviewing needs the review to be technically credible. Those two needs pull in different directions, and most weak reviews resolve the tension by going generic on both sides. That’s the trap.

Start by collecting evidence, not opinions

The single biggest unlock I’ve found is to separate evidence collection from review writing. They are different jobs. If you try to do them together, you’ll lean on whatever you happen to remember, which is usually the last sprint and the most emotionally charged moments. Both of those bias the review.

Block an hour for evidence collection before you write a word. Here’s where to look:

Pull requests they authored. Skim titles and descriptions, not the diffs. You want to see the scope of work they shipped, the granularity (lots of small PRs vs a few big ones), and the quality of their PR descriptions.
Pull requests they reviewed. This is where mid-level engineers earn their stripes. Read 10-15 of their review comments. Are they catching real issues? Asking good questions? Being kind?
Design docs, RFCs, ADRs. Whatever your team calls them. The presence and quality of these is a strong signal of judgement. An engineer who shipped three medium features without a single design doc is a different signal from one who shipped one feature with a careful proposal.
On-call rotations. Pull the on-call log or incident retro docs. Look at how they ran incidents, how they handed off, what they documented.
Your 1:1 notes. If you take them (you should), this is the most honest record of what they actually worked on and worried about. If you don’t, now is a good time to start.
Peer-lead conversations. Ask two or three colleagues who worked with them: “What did Sam do this half that I might not have seen?” The answers reveal blind spots in both directions.

At the end of the hour you should have a doc with bullet points under each source. Not a review yet. Just evidence. You’ll be amazed how much you missed when you tried to write reviews from memory.
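If your team lives on GitHub, you can pull PR counts for the whole period instead of estimating them. Here’s a minimal sketch in Python that builds the search queries for one engineer, assuming GitHub’s issue/PR search endpoint and its `author:`, `reviewed-by:` and `created:` qualifiers. The username, org name and dates are hypothetical placeholders, and the `reviewed-by:` query is only an approximation: it filters on when the PR was opened, not on when the review happened.

```python
from __future__ import annotations

import json
import urllib.request
from datetime import date
from urllib.parse import urlencode

# GitHub's issue/PR search endpoint; the JSON response includes `total_count`.
SEARCH_API = "https://api.github.com/search/issues"


def pr_queries(username: str, org: str, start: date, end: date) -> dict[str, str]:
    """Build search queries for one engineer's PR activity in a review period."""
    period = f"{start.isoformat()}..{end.isoformat()}"
    scope = f"is:pr org:{org}"
    return {
        # PRs the engineer opened during the period.
        "authored": f"{scope} author:{username} created:{period}",
        # PRs opened by others during the period that the engineer reviewed.
        # Caveat: this filters on PR creation date, not on the review date.
        "reviewed": f"{scope} reviewed-by:{username} -author:{username} created:{period}",
    }


def search_url(query: str) -> str:
    """Encode a query as an API URL (the raw query text also works in GitHub's search box)."""
    return f"{SEARCH_API}?{urlencode({'q': query})}"


def total_count(query: str, token: str | None = None) -> int:
    """Fetch the number of matching PRs. Needs network access; private repos need a token."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(search_url(query), headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["total_count"]


if __name__ == "__main__":
    # Hypothetical engineer, org and review period.
    queries = pr_queries("sam", "example-org", date(2025, 11, 1), date(2026, 4, 30))
    for label, query in queries.items():
        print(f"{label}: {search_url(query)}")
```

The counts are a starting point for the evidence doc, not a verdict: a raw number of reviews says nothing about whether those reviews caught real issues, so you still need to read a sample of the comments.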

The four-bucket framework

Once you have evidence, sort it into four buckets. They reflect what I think actually matters in a software engineer’s contribution, and they line up cleanly with the section structure most review templates use.

1. Technical delivery

What did they ship, and how much of it would have happened without them? Look for end-to-end ownership, not just merged PRs. Pick two or three specific projects to cite by name. For each, name the outcome (a specific metric if you can, a specific user problem if you can’t).

Weak version: “Shipped consistently across the half.”

Stronger: “Owned the billing migration end-to-end across six weeks, including a rollback plan that prevented a Stripe API rate-limit incident from reaching customers.”

2. Judgement and system design

Did they pick the right things to work on? Did they push back on bad ideas? Did they design for the long-tail failure case or only the happy path? This is the bucket where you can talk about scope, trade-offs, and the calls they made that nobody saw.

The artefacts to point to are design docs, RFC reviews, and scoping decisions. A line like “Argued for cutting the migration scope to the highest-impact 30%, which let us ship in Q2 instead of Q4” says more than any number of paragraphs about being “strategic.”

3. Collaboration

This is where mid-level engineers prove they are not just individual contributors with a higher level number. How do they review other people’s code? How do they support juniors? How do they show up in design review meetings? Do peers want to work with them?

Be specific. “Strong collaborator” is the canonical AI-prose phrase. “Reviewed 47 PRs from juniors with detailed feedback that the juniors actually read and applied” is a sentence with evidence in it. (And the number comes from the GitHub UI, not from memory.)

4. Growth trajectory

Where are they relative to six months ago? Where do they need to go in the next six? This is the section that most reviews bungle because the temptation is to either fluff (“continued strong growth”) or fabricate development areas to satisfy a template. Be honest. If they’re flat, say so kindly and say what would unflatten it. If they’re ready for the next level, name the evidence and the gaps that remain.

Common traps to avoid

Reviews fail in predictable ways. Knowing the failure modes is half the battle.

The 10x trap

One outlier project does not make a performance period. If an engineer shipped one impressive thing and otherwise had a quiet half, name both. The review should reflect the whole period, not the most visible week of it. The 10x trap reads as fawning to anyone calibrating against other engineers in the org.

The recency trap

Whatever happened in the last six weeks looms larger than it should. This is why evidence collection from the full period matters. If you write the review from memory, you will under-credit the work from month one of the period and over-weight the emotional residue of the last two sprints.

The “great team player” trap

If a sentence could be moved verbatim into anyone else’s review without changing the meaning, cut it. “Great team player,” “consistent contributor,” “reliable performer” all fail this test. They flatter no one and evidence nothing. Replace them with a specific behaviour or a specific outcome.

The nice-in-private, harsh-in-writing trap

If you have feedback you’ve been delivering in 1:1s, the review is not the time to reframe it as a bigger deal than the engineer believed it was. Either you’ve been under-signalling in 1:1s (a 1:1 problem to fix going forward) or the written version is harsher than the conversation warranted (a review problem to fix now). Don’t use the review to deliver feedback for the first time.

The 90-minute drafting flow

Here’s the flow I’d use if I had to draft a clean review from scratch.

Minutes 0–30. Evidence collection. Open GitHub, Linear, your design-doc folder and your 1:1 notes. Dump bullets into a scratch doc under a heading for each source. Don’t write any narrative. Just collect.

Minutes 30–60. Bucket assessment. Drop each evidence bullet into one of the four buckets (delivery, judgement, collaboration, growth). Some bullets fit two buckets; pick the primary one. By the end of these 30 minutes you should see clusters: which bucket is full of evidence, which is light, which is mixed.

Minutes 60–90. Draft and sharpen. Write each section in 4–6 sentences, leading with the strongest evidence. Then go back through and cut every sentence that could appear in someone else’s review unchanged. What’s left is the review.

Ninety minutes is the realistic floor. If you have eight reports and you’re trying to do all eight in one Saturday morning, the quality is going to suffer no matter how good your process is. Spread it across the week.

What to do when you’re stuck

Three common stuck-points and what I’d do about each.

“I don’t have specific examples.” You do, you just haven’t looked at the artefacts yet. Go back to the evidence step. If you genuinely cannot find specific examples after looking, that itself is evidence: this engineer had a quiet half and the review should reflect that honestly.

“They’re underperforming and I don’t know how to say it.” Separate behaviour from worth. Name the specific behaviour that’s the issue (“design proposals are typically written after implementation has started”), name what it costs (“the team has had to redo work twice this half because key trade-offs weren’t flagged in advance”), and name what would change it (“a design doc circulated before implementation”). Stay in the work, out of the person.

“They’ve grown but I can’t quantify it.” Growth often shows up as the absence of previous patterns. If six months ago they’d ask you for guidance on every architectural call and now they propose two options and ask which you’d pick, that’s growth. Write the before-state and the after-state next to each other.

The drafting itself doesn’t have to be the bottleneck

Most of the work above is the thinking, not the writing. Once you’ve done the evidence collection and bucket assessment, turning bullets into prose is maybe 30 minutes of typing per engineer. If you want to skip that 30 minutes, this is exactly the kind of work AI is well-suited for: take your bullets, hand them to a tool that knows what a good engineering review reads like, edit the draft. Crestento is built for that pattern, with a system prompt tuned specifically to mid-level software engineer reviews. It won’t replace the thinking, but it will give you 80% of a draft from your bullets so you can spend the time editing instead of generating.

For worked examples of good reviews, guidance on what to write in each section, and the engineer-side version of all this, see the rest of this series:

Performance review examples for software engineers covers five worked examples for different scenarios with notes on what makes each one work.
Software engineer self-evaluation examples takes the same approach from the engineer’s side.
Performance review tips for software engineers collects tactical tips for both sides of the review.

Frequently asked questions

How long should a software engineer's performance review be?

About 400 to 700 words for the written review itself. Long enough to cover the four buckets (delivery, judgement, collaboration, growth) with concrete evidence, short enough that the engineer reads it twice and the HR partner can calibrate it quickly. Anything over a thousand words usually means the evidence is being padded with adjectives.

What if I have to write reviews for engineers I barely worked with?

Lean heavily on peer-lead conversations and artefacts. Ask two or three colleagues who worked closely with the engineer what stood out, then pair their input with what you can pull from PR history, design docs, and incident logs. Be explicit in the review about the basis of your assessment so the engineer knows it wasn't pulled from thin air.

How do I write a performance review for an underperforming software engineer?

Name the specific behaviour pattern, name the cost to the team, and name what would change it. Stay in the work, not the person. Don't use the review to deliver feedback for the first time. If the engineer is hearing something for the first time in writing, that's a 1:1 cadence problem to fix going forward.

Should I include the engineer's self-evaluation in my review?

Reference it where it adds signal, particularly where your view and theirs diverge. If they identified a growth area you agree with, acknowledge it. If they flagged a strength you hadn't noticed, look for evidence to confirm or correct. The self-eval is a useful cross-check; it isn't a summary you copy.

How do I avoid bias in a software engineer performance review?

Three habits help. Collect evidence from the full period, not just the last six weeks, so you don't lean on recency. Pull peer-lead input so you correct for blind spots. And run a final cut where you delete any sentence that could appear verbatim in someone else's review. That last pass catches the generic praise that hides bias.

Draft your next Software Engineer — Mid-level review with Crestento

Bullet points in, polished draft out. Two free reviews, no card required. The free tier IS the trial.
