How to Write Release Notes for Performance Improvements

Learn to communicate performance gains credibly: use user-facing metrics, pair claims with context, qualify conditions, and avoid vague promises that erode user trust.

June 18, 2026

The Doc Holiday Team

Performance improvements are the hardest thing to communicate in a release note. Not because the engineering is hard to explain — though it often is — but because users have been burned so many times by vague claims that they've learned to distrust them by default. "Faster and more responsive" is the changelog equivalent of "we hear your feedback." It says nothing, and everyone knows it.

The challenge is real: performance gains involve technical complexity, variability across hardware and network conditions, and the genuine risk of overpromising to users whose environments you can't fully control. But that's not a reason to retreat into vagueness. It's a reason to be more precise, more honest, and more useful. This guide covers how to do that.

Figure: A skeptical user reading a generic performance claim, surrounded by years of identical promises. Trust, once lost to vagueness, requires more than one good release note to rebuild.

What Belongs in a Release Note vs. What Belongs Elsewhere

The first decision is what to report at all. Not every performance metric is worth surfacing to users, and mixing internal engineering signals with user-facing outcomes is one of the most common ways performance notes go wrong.

User-facing metrics are the ones that belong in a release note. These are measurements that correspond to something the user actually experiences: page load time, query response time, time-to-first-render, or how quickly a search returns results. Google's web.dev defines a useful framework for thinking about this, distinguishing between metrics like Largest Contentful Paint (LCP), which measures when the main content is visible, and Interaction to Next Paint (INP), which measures how quickly the page responds to user input. The key question is: does this metric correspond to something the user feels?

Internal metrics — CPU cycles saved, cache hit rate improvements, memory allocation reductions — are valuable engineering signals, but they typically belong in a technical deep-dive, an engineering blog post, or detailed documentation rather than a changelog. The exception is when an internal metric has a clear, direct consequence for users. If you reduced memory consumption by 40%, and that means the application no longer crashes on devices with 4GB of RAM, that's worth saying. The memory number is the evidence; the crash prevention is the story.

Jakob Nielsen's foundational research on response time thresholds is still the clearest articulation of why this distinction matters: users perceive systems as instantaneous below 0.1 seconds, feel a delay but stay in flow up to 1 second, and lose their train of thought entirely beyond 10 seconds. If your improvement moves a user from the "lost my train of thought" zone to the "barely noticed" zone, that's meaningful. If it shaves 50 milliseconds off an operation that was already imperceptible, it might not be worth the real estate in a release note.
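One rough way to apply those thresholds is to check whether a change moves an operation from one perceptual band to another. The sketch below is illustrative only: the function names are hypothetical, the label for the 1 to 10 second range is our own wording, and treating a band crossing as the signal for "worth mentioning" is an assumption, not a rule from the research.

```python
def perceived_band(seconds: float) -> str:
    """Map a response time to a rough perceptual band (thresholds: 0.1s, 1s, 10s)."""
    if seconds < 0.1:
        return "instantaneous"
    if seconds <= 1.0:
        return "noticeable, still in flow"
    if seconds <= 10.0:
        # Label chosen for this sketch; the text above does not name this range.
        return "flow interrupted"
    return "train of thought lost"


def crosses_band(before_s: float, after_s: float) -> bool:
    """True when the improvement changes how the operation is likely to feel."""
    return perceived_band(before_s) != perceived_band(after_s)


if __name__ == "__main__":
    # A 12s operation brought down to 0.8s changes the experience.
    print(crosses_band(12.0, 0.8))
    # Shaving 50ms off an already imperceptible operation does not.
    print(crosses_band(0.09, 0.04))
```

A check like this is a prompt for judgment rather than a filter: an improvement that stays within one band can still matter for operations users repeat hundreds of times a day.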

Relative vs. Absolute Numbers — and Why Both Require Context

The debate between "30% faster" and "reduced from 4 seconds to 2.8 seconds" is not really about which format is better. It's about what information the reader needs to evaluate the claim.

Relative improvements are easy to read but easy to game. A 50% improvement on a 20-millisecond operation is a 10-millisecond gain — meaningful in some contexts, invisible in others. Absolute numbers give users a concrete baseline to compare against their own experience. If a user's queries currently take 6 seconds, knowing that the improvement targets operations in the 4-second range tells them something useful. If their queries take 200 milliseconds, the same note is irrelevant to them.

The more honest approach is to use both, with explicit conditions. Brendan Gregg, whose benchmarking methodology has influenced how performance engineers at Netflix and elsewhere think about measurement, argues that every benchmark claim should be interrogated with a simple question: "Why not double?" If you can't answer what the limiter is — what prevented a 2x improvement instead of a 1.3x improvement — you probably don't understand the improvement well enough to publish it. That's a useful test to apply before writing the release note.
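The arithmetic behind reporting both forms is small enough to sketch. The helper below (a hypothetical name, not part of any tool) takes a before and after timing and returns the absolute saving, the percentage reduction in time, and the speedup factor. It also shows why "30% faster" is ambiguous: a 30% reduction in time is roughly a 1.43x speedup, not 1.3x.

```python
def summarize(before_ms: float, after_ms: float) -> dict:
    """Express one before/after measurement in absolute and relative terms."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("timings must be positive")
    saved_ms = before_ms - after_ms
    return {
        "absolute_saving_ms": saved_ms,
        "time_reduction_pct": 100 * saved_ms / before_ms,
        "speedup_factor": before_ms / after_ms,
    }


if __name__ == "__main__":
    # 4s down to 2.8s: 1200ms saved, 30% less time, about a 1.43x speedup.
    print(summarize(4000, 2800))
    # 20ms down to 10ms: 50% less time, but only 10ms saved.
    print(summarize(20, 10))
```

Publishing the absolute figures next to the percentage lets readers see which of these quantities the headline number refers to.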

How to Qualify Claims Without Burying the Lead

Qualification is not the same as hedging. Hedging is when you add so many caveats that the claim becomes meaningless. Qualification is when you give users the information they need to assess whether the improvement applies to them.

The conditions that matter most are: the workload type, the data size or volume, the infrastructure tier, and whether the improvement is automatic or requires configuration changes. Snowflake's approach to this is instructive — their 2025 performance improvement notes explicitly state that "performance improvements often target specific query patterns or workloads" and that improvements "might or might not have a material impact on a specific workload." That's not a disclaimer designed to lower expectations; it's an honest acknowledgment that performance is contextual.

Firebolt's release notes take a similar approach, specifying not just what improved but the mechanism: "Late materialization optimization is now available for top-k queries. This change significantly improves the speed of eligible queries, potentially making them 10 times faster." The phrase "eligible queries" is doing important work there — it signals that this is a targeted improvement, not a universal one, and invites users to check whether their workloads qualify.

Tradeoffs also deserve explicit mention. Modern systems routinely trade memory for speed, or throughput for latency. If a performance gain comes at the cost of increased memory usage, or if it only applies when a new configuration flag is enabled, say so. Users who discover an undisclosed tradeoff after upgrading will trust you less than users who were warned upfront.

Before-and-After Examples

The most credible performance notes pair a claim with the conditions under which it was measured. Here's what that looks like in practice:

Weak Version	Stronger Version
"Improved dashboard load times."	"Dashboard initial load time reduced from ~4.2s to ~2.8s (median, measured on 4G mobile connections with 50+ widgets). Users on faster connections will see smaller gains."
"Faster search."	"Full-text search queries on indexes larger than 10M records now return results in under 800ms, down from an average of 2.3s. Smaller indexes are unaffected."
"Reduced memory usage."	"Baseline memory footprint reduced by ~120MB. Note: peak memory during bulk exports may increase slightly due to the new streaming pipeline."
"Performance improvements to the editor."	"Reduced input latency in the editor by ~40ms for documents over 500 pages. Smaller documents are unaffected."

The pattern is consistent: state the metric, state the baseline, state the conditions, and flag any exceptions or tradeoffs. This is not more words for the sake of it — each piece of information serves a reader who is trying to decide whether this improvement matters to them.
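For teams that want to enforce the pattern, one option is to treat each claim as a small record that cannot be rendered until the metric, baseline, result, and conditions are all filled in. This is a minimal sketch under that assumption; the class and field names are hypothetical, and the sentence template is just one possible phrasing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfClaim:
    metric: str        # what was measured, in user-facing terms
    before: str        # the baseline value
    after: str         # the new value
    conditions: str    # how and where it was measured
    caveat: str = ""   # optional exception or tradeoff

    def render(self) -> str:
        """Build the release-note sentence, refusing incomplete claims."""
        required = ("metric", "before", "after", "conditions")
        missing = [name for name in required if not getattr(self, name).strip()]
        if missing:
            raise ValueError(f"incomplete claim, missing: {', '.join(missing)}")
        line = f"{self.metric} reduced from {self.before} to {self.after} ({self.conditions})."
        if self.caveat.strip():
            line += f" {self.caveat.strip()}"
        return line


if __name__ == "__main__":
    claim = PerfClaim(
        metric="Dashboard initial load time",
        before="~4.2s",
        after="~2.8s",
        conditions="median, measured on 4G mobile connections with 50+ widgets",
        caveat="Users on faster connections will see smaller gains.",
    )
    print(claim.render())
```

The example reproduces the dashboard row from the table above. The useful part is the failure mode: a claim with no stated conditions raises an error instead of shipping as "Improved dashboard load times."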

What Respected Engineering Organizations Actually Do

Microsoft's .NET performance posts are a useful benchmark for thoroughness. The .NET 8 release included detailed documentation of over 500 pull requests, with reproducible microbenchmarks and explicit notes about the hardware and operating system used for testing. The methodology section alone — explaining how to set up the benchmark environment and why results may vary — is more honest than most companies' entire release notes.
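The same habit, recording the environment alongside the number, can be sketched in a few lines. The example below is a Python stand-in using only the standard library, not the tooling the .NET team uses; the function name and the choice of seven repeats are assumptions made for illustration.

```python
import platform
import statistics
import timeit


def measure(func, repeats: int = 7, number: int = 100) -> dict:
    """Time a callable and return the result together with the test environment."""
    # timeit.repeat returns one total duration (in seconds) per repeat.
    totals = timeit.repeat(func, repeat=repeats, number=number)
    per_call_ms = [1000 * total / number for total in totals]
    return {
        "median_ms": statistics.median(per_call_ms),
        "min_ms": min(per_call_ms),
        "max_ms": max(per_call_ms),
        "repeats": repeats,
        "calls_per_repeat": number,
        # Environment details a reader needs to judge whether results transfer.
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    report = measure(lambda: sorted(range(10_000), reverse=True))
    for key, value in report.items():
        print(f"{key}: {value}")
```

Reporting the median with the minimum and maximum gives readers a sense of the spread, and the environment fields make it clear that the figures describe one machine rather than every user's setup.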

PostgreSQL takes a different but equally credible approach. The PostgreSQL 17 release notes describe improvements in terms of the specific operations affected: "various query performance improvements, including for sequential reads using streaming I/O, write throughput under high concurrency, and searches over multiple values in a btree index." No headline numbers, but precise enough that a DBA can immediately assess relevance.

SQLite's changelog is a model of concision: "Performance enhancements to JSON processing results in a 2x performance improvement for some kinds of processing on large JSON strings." It gives a number, scopes it to a specific operation type, and qualifies it with "large JSON strings." That's three pieces of information in one sentence.

Figure: Four-box flow diagram showing metric, baseline, conditions, and tradeoffs in sequence. A credible claim is transparent about scope; vague claims hide which part doesn't apply to you.

The Workflow Problem Behind the Communication Problem

Research on release note production confirms what most documentation teams already know from experience: there is a significant gap between what engineering produces and what documentation needs. A 2020 study of 32,425 release notes across 1,000 GitHub projects found that most release notes list only 6–26% of the issues addressed in a given release, and that there are meaningful discrepancies between what producers and users consider well-formed. The information exists; the problem is extracting and structuring it consistently.

Writing good performance release notes requires access to accurate benchmarks, reproducible test environments, and close collaboration between engineering and documentation teams. That collaboration is hard to maintain at scale, especially as release velocity increases. Engineers know what changed; technical writers know how to communicate it; and the handoff between them is where precision gets lost.

This is the type of structured, evidence-backed content that Doc Holiday generates automatically from engineering telemetry and QA benchmarks — then provides a validation workflow so a skilled technical writer can verify claims, add necessary qualifiers, and ensure the final release note is both accurate and useful. It's a practical operational solution to the workflow problem described here: not replacing the judgment of a good writer, but giving them the raw material they need to exercise it.

The goal is release notes that users actually trust. That means being specific enough to be credible, honest enough to be useful, and disciplined enough to say what you don't know. That's a higher bar than "faster and better" — but it's the bar that earns the benefit of the doubt the next time you ship.