What Grade Inflation Teaches Us About Vendor Reviews

A student emailed me recently to dispute a grade. She had received an A. She wanted to know why it wasn’t an A+.

I teach design and media art at a large, prestigious research university. The student in question was a CS major in my UX design class – someone training to make things that work, things that communicate, things that solve real problems for real people. In my classroom, an A represents genuinely very good work: original, more than merely competent, and demonstrating real understanding of both the conceptual and the craft dimensions of design. It is not a consolation prize. But this student had moved through an educational system in which A+ had become the expected baseline for competent effort, which meant that anything short of it registered not as a strong grade but as a rebuke. The grade itself had become unreadable – stripped of meaning by the inflation surrounding it, an A could only be interpreted as a signal that something had gone wrong.

I’ve been thinking about that conversation in the context of vendor reviews, because exactly the same dynamic is at work – and it’s producing exactly the same problem for organizations trying to make good procurement decisions.

Two Systems, One Distortion

Grade inflation and review inflation are parallel phenomena driven by parallel pressures. In academic settings, the pressure runs through institutional incentives: student satisfaction scores affect faculty evaluations, grade appeals consume administrative time, and there is always more friction in defending a B than in awarding an A. The rational individual response to these pressures – across thousands of faculty, over decades – has produced a systemic shift in what grades mean. A recent Harvard Magazine piece, arriving in my mailbox just as I was writing this post, puts the numbers in stark relief: solid A’s made up 24 percent of final grades at Harvard College in 2005, 40.3 percent in 2015, and 60.2 percent by the spring of 2025 – meaning roughly three of every five letter grades are now solid A’s (Harvard Magazine, 2025; data from the Harvard FAS Office of Undergraduate Education). The signal has collapsed under the weight of its own inflation.

What’s striking about the Harvard report is not just the numbers but the psychological mechanism it describes. Grade inflation, counterintuitively, does not produce complacent students. As Amanda Claybaugh, Harvard’s dean of undergraduate education, put it: “One might expect that a world where everyone got A’s would be a very relaxed world, but actually, it’s the most stressed-out world of all.” When A’s become the expected floor rather than a meaningful ceiling, students lose the ability to read their own performance – and any deviation from perfect is experienced as catastrophe rather than information. Faculty teaching large introductory courses routinely field about 200 grade change requests for every exam. The grade has stopped functioning as feedback and become a form of currency that must be defended at all costs.


Online vendor reviews have followed an almost identical trajectory through almost identical mechanisms. Agencies solicit reviews from their most satisfied clients. Platforms surface highly rated vendors and bury lower-rated ones, which means vendors who want visibility have strong incentives to manage their ratings aggressively. Clients who have ongoing relationships with vendors – who may need them again, or who feel uncomfortable with public criticism – give five stars as a matter of social lubrication rather than honest assessment. My colleague Jonathan Blessing recently wrote about the structural result: in a market where positivity is engineered, visible differentiation collapses, and sentiment is not performance. His argument is about platform mechanics. What I want to add is that the distortion runs deeper than the mechanics – it has recalibrated expectations in ways that actively damage decision-making, and the mechanism driving that damage is the same anxious logic Claybaugh describes. Buyers filtering vendors by star rating are not evaluating quality; they are managing the fear of making an indefensible choice in a landscape where everything looks identical.

What a B Actually Means

When grades meant what they were designed to mean, a B was a good grade. It indicated solid, competent work – work that demonstrated understanding, met its objectives, and would have served the student and the field well. It was not a failure. It was not even a near-failure. It was a reliable indicator of quality that, in an uninflated system, you would be glad to receive.

The same logic applies to a 3.8- or 4.2-star vendor. In a world where reviews reflected genuine assessments distributed across the full range, a vendor averaging four stars out of five would be telling you something meaningful and largely positive: clients are generally satisfied, the work generally delivers, and there are probably some rough edges worth understanding but nothing catastrophic. That is actually useful information. It is, in many cases, information that should send you toward a vendor rather than away from one.

What has happened instead is that the baseline has shifted and the interpretation has inverted. A 4.2 now triggers the same suspicion that a C used to – a sense that something is being hidden, that the lower rating reflects a pattern of failure rather than the natural distribution of human experience with any complex service. Buyers who would otherwise have found an excellent match rule out vendors before any real evaluation begins, on the basis of a rating that means almost nothing in the current environment.

A 4.2 now triggers the same suspicion that a C used to – a sense that something is being hidden, rather than a reflection of the natural spread of human experience with any complex service.

The Fit Problem

Here is what vendor ratings cannot tell you: whether this vendor is right for your organization.

A five-star agency that excels at fast-moving, loosely structured startup engagements may be a genuinely poor fit for a large research institution with layered approval processes and a twelve-week decision cycle. An agency that received three reviews mentioning communication challenges may have since restructured its client management, or those reviews may have come from clients whose expectations were misaligned from the start, or the communication style that frustrated one client may be exactly what works in your organizational culture.

Ratings aggregate experience across all clients, all projects, all contexts. They cannot disaggregate for your specific context. A 4.2-star agency that has deep experience with your type of institution, that has navigated the specific technical constraints your project involves, that works in a style that complements how your team operates – that agency is a better choice for you than a 4.9-star firm that has never worked with an organization remotely like yours. The rating doesn’t tell you that. Only real evaluation does.

This is the argument for moving beyond ratings as a primary selection criterion, and it’s an argument that applies equally to the grade inflation problem in education. The student who complains about an A rather than an A+ has lost the ability to read her own performance clearly – to understand what she did well, where she has room to grow, and what the grade is actually trying to communicate. The buyer who filters vendors by star rating has lost the ability to read the landscape clearly, to distinguish between a firm that is right for her project and one that is merely universally inoffensive.

Recalibrating

What would it mean to take a four-star review seriously as a positive signal rather than a disqualifying one?

It would mean reading the content of reviews rather than aggregating their scores – understanding what clients actually said, what the specific friction points were, and whether those friction points are relevant to your context. It would mean weighting recent reviews more heavily than old ones, since agencies change over time in both directions. And it would mean looking for patterns of honest assessment rather than suspiciously uniform praise, since a vendor whose reviews are all identical in their enthusiasm is probably managing their review profile as carefully as a student managing a GPA.
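For readers who think in code, here is a minimal sketch of those two heuristics – recency weighting and a uniformity check. Everything in it is an assumption for illustration: the Review structure, the one-year half-life, and the 0.3 spread threshold are placeholders I chose for the example, not any platform’s API or a validated cutoff.

```python
from dataclasses import dataclass
from statistics import pstdev

@dataclass
class Review:
    rating: float   # 1.0 through 5.0
    age_days: int   # days since the review was posted

def recency_weighted_rating(reviews: list[Review],
                            half_life_days: float = 365.0) -> float:
    """Average ratings with exponential decay, so a review from three
    years ago counts far less than one from last quarter."""
    if not reviews:
        raise ValueError("no reviews to weight")
    weights = [0.5 ** (r.age_days / half_life_days) for r in reviews]
    return sum(w * r.rating for w, r in zip(weights, reviews)) / sum(weights)

def looks_managed(reviews: list[Review], min_spread: float = 0.3) -> bool:
    """Flag a profile whose ratings are suspiciously uniform. Honest
    experience with any complex service produces some spread; a wall
    of identical fives hints that the profile is being curated."""
    ratings = [r.rating for r in reviews]
    return len(ratings) >= 5 and pstdev(ratings) < min_spread
```

On this sketch, a vendor with ten straight 5.0s gets flagged for uniformity, while a 4.2 average with genuine spread – mostly fives, some fours, the occasional three – passes cleanly. That is precisely the inversion of how most buyers currently read those two profiles.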

It would also mean being honest about what you’re actually selecting for. The goal of vendor selection is not to find the agency with the highest aggregate satisfaction score across all of its clients. It is to find the agency that is most likely to do excellent work on your specific project, in your specific institutional context, with your specific team. Those are different optimization problems, and conflating them is how organizations end up with vendors who look perfect on paper and perform poorly in practice.
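The contrast between the two problems is easy to state in code. This is a hedged sketch in which the fit factors, weights, and example vendors are entirely hypothetical – the point is that the objective function changes, not that these particular numbers are right.

```python
from typing import NamedTuple

class Vendor(NamedTuple):
    name: str
    average_rating: float           # 0.0 through 5.0
    institution_experience: float   # 0..1, hypothetical fit factors
    technical_experience: float     # 0..1
    style_match: float              # 0..1

vendors = [
    Vendor("universally liked", 4.9, 0.1, 0.2, 0.3),
    Vendor("right for us",      4.2, 0.9, 0.8, 0.9),
]

# Optimization problem 1: reputation. Maximize aggregate satisfaction.
best_by_rating = max(vendors, key=lambda v: v.average_rating)

# Optimization problem 2: fit. Maximize expected success on *your*
# project; the rating is one signal among several, not the objective.
def fit_score(v: Vendor) -> float:
    return (2.0 * v.institution_experience
            + 1.5 * v.technical_experience
            + 1.0 * v.style_match
            + 0.5 * v.average_rating / 5.0)

best_by_fit = max(vendors, key=fit_score)
# best_by_rating picks "universally liked"; best_by_fit picks "right for us".
```

Two one-line objective functions, two different winners – which is the entire argument of this post in miniature.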

Harvard is asking a version of this question about grades right now. A faculty committee proposed in February 2026 to cap undergraduate A grades at roughly 20 percent and introduce an internal ranking system (Harvard FAS Educational Policy Committee proposal, reported by The Harvard Crimson) – an attempt to restore the signal value that inflation has eroded. Students protested. Faculty expressed cautious support. The proposal is controversial precisely because it requires the institution to acknowledge openly that its own currency has been debased. The same acknowledgment would be required of any review platform serious about restoring meaningful differentiation: that the current scale is broken, that a 4.2 is not a warning sign, and that the distance between a 4.8 and a 5.0 tells you almost nothing about who should build your next product. Some platforms are beginning to move in this direction – surfacing review content over aggregate scores, flagging recency, distinguishing verified project engagements from general recommendations. It is slow going. The incentives run the other way.

At Launch Day, our evaluation framework is built around fit rather than reputation alone. We use ratings and reviews as one signal among several – useful for flagging genuine patterns of failure, less useful as a primary ranking criterion. What we’re actually trying to answer is a different question: not which vendor has the highest score, but which vendor is most likely to succeed with you. Sometimes that’s a four-star firm with deep relevant experience and a communication style that matches your organization’s rhythms. Sometimes it’s a newer agency without enough reviews to have generated a meaningful rating at all.

The five-star vendor that isn’t right for your project will still give you a three-star outcome. The four-star vendor that is precisely right for your context might give you the best project experience your organization has ever had.

A good grade, in the end, is the one that accurately reflects the work. We’ve forgotten what that looks like. It’s worth remembering.

Launch Day Advisors is a buyer-side advisory that evaluates partners on fit for your specific context – not aggregate star ratings. We work on your behalf – not the vendor’s.
