You Only Need to Test with 5 Users (2000)

[+] an_opabinia|6 years ago|reply

This is an excellent point, and the corollaries are even more fascinating:

- a product designer/manager of something with 1,000,000 users won’t learn more about usability than a product designer/manager of something with 15 users. All those measurements of flows and secret at-scale analytics data are sort of worthless for the purposes of usability.

- people with 15 users’ worth of learning about usability instead of 0 users are way undervalued, while people with 1,000,000 are way overvalued

- “we don’t know until we test it,” the #1 refrain of big company design nowadays, is intellectually bankrupt for most free software, since if it looked bad for 15 users it’s probably going to still be bad for 1,000,000

- after you’ve shown something to 15 people and they don’t like it due to usability problems, you’re extremely unlikely to find 1,000,000 more who will like it.

This also appeals to my instinct that there is something learnable about design, that great design is achieved by people and not by massive datasets.

[+] noir_lord|6 years ago|reply

It might be even worse, since this model seems (I may be wrong) to assume that the probability of finding a usability bug is constant. It might be that the share of bugs discovered is skewed towards the first few users, such that the first user finds more than the formula would predict.
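A rough sketch of what dropping the constant-probability assumption does. It assumes the article's formula is found(n) = 1 - (1 - L)^n with L = 0.31; the two-tier mix of "easy" and "hard" problems is a made-up example with the same average, not data from any study.

```python
def found_constant(n, L=0.31):
    """Share of problems found by n users if every problem has the same per-user chance L."""
    return 1 - (1 - L) ** n


def found_mixed(n, probs):
    """Share found when each problem has its own per-user detection chance."""
    return sum(1 - (1 - p) ** n for p in probs) / len(probs)


# Hypothetical mix: half the problems are easy to hit, half are hard.
# The average per-user chance is still 0.31.
MIXED = [0.57, 0.05]

for n in (1, 3, 5, 15):
    print(n, round(found_constant(n), 3), round(found_mixed(n, MIXED), 3))
```

In this toy mix the first user finds the same share either way, but every later user adds less than the constant-L curve predicts, so the returns drop off faster after the first few users and the hard problems take far more than 5 users to surface.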

It's certainly been my observation that cynical developers who test things as they go, by deliberately putting silly things into stuff they just wrote, seem to get hung up less in testing.

Case in point: with the system I inherited at work, the first thing I did when I got an instance spun up was put a negative value in the quote line quantity (which immediately broke... well, almost everything), then decimal values in quantity fields where only integers made sense, then text in number fields, and so on, each time breaking something in a new and interesting way.
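For illustration only, a minimal sketch of the kind of check that would have rejected those inputs. The function name `parse_quantity` and the rule "quantity is a positive whole number" are assumptions, not details of the system described.

```python
def parse_quantity(raw):
    """Return a positive integer quantity, or raise ValueError for anything else."""
    text = str(raw).strip()
    try:
        value = int(text)  # rejects "2.5", "abc" and the empty string
    except ValueError:
        raise ValueError(f"quantity must be a whole number, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"quantity must be positive, got {value}")
    return value


# The same silly inputs: negative, decimal, text.
for attempt in ("3", "-1", "2.5", "ten"):
    try:
        print(attempt, "->", parse_quantity(attempt))
    except ValueError as err:
        print(attempt, "-> rejected:", err)
```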

Sometimes I think it's hard not to be cynical about enterprise systems.

My old lecturer put it (somewhat pithily): "Almost all the testing in the world means nothing compared to 15 minutes in the hands of the 17 year old office junior"

[+] ddebernardy|6 years ago|reply

+1 but several of your points are incorrect or need caveats.

Your 1st point isn't correct, in that you will learn interesting things from 1m users that you won't from 15. The thing is that the 15 will tell you why, whereas the 1m won't unless you ask precisely the right question (which is a problem in itself). It basically takes experience here... (And this is something an AI may eventually become very good at.)

Your 3rd and 4th points aren't correct, in that given some sampling error you may very well find that what 15 people don't like, 1m will.

I'm in full agreement with you in principle, in the sense that I firmly hold that far too many start-ups do quantitative without qualitative and end up with the wrong conclusions; but you can't just wave away quantitative data like that.

[+] skybrian|6 years ago|reply

This may be true for software used when working alone, but for software used by groups, you are not going to figure out group effects without testing with different group sizes. Consider the differences testing a game with 1, 2, 5, 100, or 1000 users.

[+] Spooky23|6 years ago|reply

Agree, as the massive datasets are telling you more about what people want than about how the product works.

That’s a big distinction, as when you look at a million people, 80% are either the same, or don’t really understand what they are doing at all.

Modern designers usually assume that people are dolts and need simplicity. Optimal in my opinion is providing quality on-boarding and understanding behavior of someone who knows what they are doing as well.

[+] mntmoss|6 years ago|reply

I definitely think there is a learnable aspect to design. Throwing data at the problem is expensive and does not create many deep insights. But focusing on communication and how the product communicates a coherent philosophy through features and UX yields a more systematic, standalone framework to test hypotheses against, something which data can further shape. It's more important to find "must" and "cannot" boundary conditions than preferences, and it's also easier to see if you are satisfying those conditions.

My formula is therefore: Pick a few principles to apply to the solution you're making, make sure as best you can that they agree with each other. Test to see if potential users agree with those principles. Select features based on the resulting design framework. Then develop the product as a way of discovering more about this framework.

Lots of companies are doing this implicitly through their founding teams and hiring practices: they just end up with culture that values certain principles, so they get upheld at every meeting and the results end up in the product. But it can also be communicated and imposed more explicitly, and I think that's where design becomes more visible in the process.

[+] nojvek|6 years ago|reply

It’s not exclusive, usability is a duality. At the micro and macro levels. At the micro level you want to watch a couple of individual users and watch whether every little thing added to the UI makes sense to them. Whether the product solves their problem.

If micro fails, macro will most likely fail.

At the macro level you’re tracking higher-level metrics: how many users engage with the different parts of the UI, where the fallout is, what the retention of users is, what power users do differently from new users, etc.

This kind of holistic view of the product at micro and macro levels gives you a much better understanding of the bottlenecks in the product.

[+] bluGill|6 years ago|reply

If you have 1,000,000 users you need to ask very different questions, so you better learn different things.

At 15 users, the cost of teaching everybody how to use your mess of a UI is lower than the cost of fixing the mess of a UI. At 1,000,000 users your cost of training is higher, but you also have more users to spread the cost of preparing the training among.

[+] perl4ever|6 years ago|reply

My impression is that most people instinctively assume that to get accurate results from sampling, you need a sample size proportional to the population, and that isn't generally the case.

Maybe for many purposes, 5 won't cut it and you need 100 or 1,000, but you don't need 1%.
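A small sketch of why that is, using the textbook margin of error for a proportion with a finite population correction. It assumes a simple random sample and the worst case p = 0.5; the population sizes are arbitrary examples.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: close to 1 whenever the population dwarfs the sample.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


# Same sample of 1,000 drawn from very different population sizes.
for population in (10_000, 1_000_000, 100_000_000):
    print(population, round(margin_of_error(1_000, population), 4))
```

With 1,000 respondents the margin stays at about 3 percentage points whether the population is ten thousand or a hundred million, so the precision is driven by the absolute sample size rather than by the fraction sampled.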

[+] miki123211|6 years ago|reply

I don't have much to do with UX, but I will add one insight. Try finding users with esoteric ways of working, and ensure the site works for them. By esoteric I mean blind users of screen readers, users of old, non-JS browsers, people on corp or school networks where things may be blocked, people on extremely small screens or extremely large ones (TVs with remotes), people who don't own or don't want to use a credit card (there's a widespread credit card phobia in, e.g., Poland when it comes to online purchases), people using a different language and keyboard layout, particularly ltr, significant when it comes to desktop apps etc. I'm a screen reader user myself, and I constantly find websites that might be beautiful but are utterly inaccessible. I've either encountered or witnessed usability difficulties in all of the categories I outlined. For each one, I could provide an example of a website or app that I or someone else had to abandon for just that reason, and this is just me. I'm sure there are more niches I haven't thought about.

[+] iamaelephant|6 years ago|reply

This is going to depend heavily on your target market. In many of the SaaS applications I have been involved in we really don't care about users with non-JS browsers, or extremely small screens, or TVs, or people without credit cards. Some of the applications I have worked on will never be translated.

[+] achow|6 years ago|reply

This has been questioned (5 user theory).

From 2008 research paper: http://www.simplifyinginterfaces.com/wp-content/uploads/2008...

Excerpt:

Historic reason: Both Nielsen (1993) and Virzi (1992) were writing in a climate in which the concepts of usability were still being introduced..They were striving to lighten the requirements of usability testing in order to make usability practices more attractive to those working with the strained budgets.

Conclusion: It is advisable to run the maximum number of participants that schedules, budgets, and availability allow. The mathematical benefits of adding test users should be cited. More test users means greater confidence that the problems that need to be fixed will be found; as is shown in the analysis for this study, increasing the number from 5 to 10 can result in a dramatic improvement in data confidence. Increasing the number tested to 20 can allow the practitioner to approach increasing level

[+] chakintosh|6 years ago|reply

A few weeks ago, I tested an app with 50 users from different backgrounds, affinities, abilities ... etc.

Towards the end, we realized the last 20 or so tests had been a waste of time. Issues and improvements that arose from the first 20-ish tests kept repeating themselves throughout every remaining session.

This sure increases confidence in your data, but really when you're in an MVP stage or you don't have much funding, you're better off testing with around 15 to 20 people and fixing the issues that they find, because most likely, those issues are in fact very problematic and deserve more priority. More users will just yield more granular bugs and issues that you can schedule for later.

[+] tedivm|6 years ago|reply

The author ends by saying you need five users from each distinct type of user, with the example of parent and child.

Unfortunately the author neglected to mention that the vast majority of projects already have multiple distinct groups: abled people, blind people, and deaf people are just a start. In many places these are legally required considerations.

[+] linuxdude314|6 years ago|reply

The 10x engineer uses no users for the test, but instead microdoses during testing for an altered subjective experience.

[+] swagasaurus-rex|6 years ago|reply

I think I discovered a way to become a 420x engineer.

[+] awillen|6 years ago|reply

This is one of those things that takes a very abstract concept and tries to boil it down a bit too far with a mathematical model. Also, the 5 number in the headline is just misleading, since he clearly points out that the real number is 15.

The reality is that the number of people you need to test with to get the right number of insights (along with the depth of testing for any given user) is going to vary drastically across products of varying purposes and level of complexity. 5/15 users may be a reasonable average, but this is a case where an average of many different things isn't a particularly useful measure for any one of those things.

That doesn't even take into account the quality of people that you're testing with. Five experienced testers are different from five people with domain expertise but no testing experience, who are in turn different from five people off the street.

[+] zild3d|6 years ago|reply

> Also, the 5 number in the headline is just misleading, since he clearly points out that the real number is 15.

The 5 number isn't misleading at all, the author shows that it's the 80/20 point. The first 5 users in your usability test give you 85% of the value of testing with 15. The takeaway is to not do the same exact test with 15 users, do an iteration with 5, and rinse and repeat.

"Let us say that you do have the funding to recruit 15 representative customers and have them test your design. Great. Spend this budget on 3 studies with 5 users each!"
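A quick check of that 85% figure, assuming the article's model is found(n) = 1 - (1 - L)^n with L = 0.31 (the per-user detection rate as I recall it from the article; treat it as an assumption).

```python
def share_found(n, L=0.31):
    """Share of usability problems found after testing with n users."""
    return 1 - (1 - L) ** n


five, fifteen = share_found(5), share_found(15)
print(f"5 users: {five:.1%}  15 users: {fifteen:.1%}  ratio: {five / fifteen:.1%}")

# Marginal gain from each additional user, to show the diminishing returns.
for n in range(1, 8):
    print(n, f"{share_found(n) - share_found(n - 1):.1%}")
```

Under those assumptions, 5 users find roughly 84 to 85% of what 15 would, and each extra user adds noticeably less than the one before. That is the arithmetic behind splitting 15 users into three rounds of 5.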

[+] zhoujianfu|6 years ago|reply

A friend who worked at Google told me once they would invariably get better insight on usability by just following a few individual users around a task vs. analyzing their zillions and zillions of site visits.

[+] dalbasal|6 years ago|reply

The caveat here is that it depends on what you mean by "test" and what "insights" you are interested in.

If you are interested in (for example) determining the optimal price and only 1/20 users buy something... you still probably need about 1,000 X (price points you want to test).

In that "test" you are basically trying to uncover the demand curve (or points on it). It's a statistical question.
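A back-of-the-envelope sketch of where a number like 1,000 per price point can come from. It uses the standard sample-size formula for estimating a proportion; the 5% base rate follows the "1/20" above, while the margins and the count of four price points are arbitrary assumptions.

```python
import math


def users_needed(rate, margin, z=1.96):
    """Users needed to estimate a purchase rate to within +/- margin at ~95% confidence."""
    return math.ceil(z ** 2 * rate * (1 - rate) / margin ** 2)


base_rate = 0.05   # 1 in 20 users buys
price_points = 4   # hypothetical number of prices to compare

for margin in (0.02, 0.015, 0.01):
    per_point = users_needed(base_rate, margin)
    print(f"+/-{margin:.1%}: {per_point} per price point, {per_point * price_points} total")
```

Pinning a 5% purchase rate down to within about one percentage point already takes on the order of one to two thousand users per price point, which is a very different regime from n=5.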

Say you have a dating app, and you are trying a new matching algorithm. It will also take thousands of matches before you have the data to make determinations.

All that said, I totally agree with the author. I would just frame it differently.

The question you need to ask is "do I need statistics?" Statistics have become habitual, but much (most?) of the time, we don't need statistics.

If you want to learn if a user can write and publish a blog using your software or install a water filter under their sink... you don't need statistics. You need to know where most people get into trouble, and n=5 will work fine for that.

This is intuitive if we just think outside of the "testing" vocabulary. You write a CV/essay/article. You ask 1-3 friends to read it and advise. You don't produce statistics.

[+] arathore|6 years ago|reply

The article talks specifically about usability testing. That isn't in the title, but it is in the first line of the article. I don't think pricing strategies or matching algorithms and such would fall under this domain.

[+] julienreszka|6 years ago|reply

you probably don't but you probably should

[+] anbop|6 years ago|reply

Also... when you have 15 users (and it’s slow growing) it’s because you satisfy a unique need. These users are actually willing to talk to you for HOURS because they need your product, they know it’s niche, and that their feedback can actually affect the product development. Speaking from experience, I had a customer fly to ME to give me feedback.

[+] piyush_soni|6 years ago|reply

Wow. Skype/Email didn't suffice? :)

[+] calmchaos|6 years ago|reply

Some key questions:

1. How extensively do those 5 people test the software? Do they test all features or just part of the software?

2. What is the background of those 5 people testing the software? Do they understand UX/good UI design and how well?

3. Are these 5 people just random users or professional test engineers?

4. How passionate are these 5 people about the product/service they are testing? How meaningful is it for them that the product actually works _really well_?

5. What is the quality level of feedback these 5 people can provide? Is it like "meh, this is ugly" or is it detailed, concrete and contains practical improvement ideas that can be easily implemented?

[+] Kluny|6 years ago|reply

Ha, as a web designer I long for the ability to test with even one user before launch day. Rarely do I get room in the budget for that :(

[+] pkamb|6 years ago|reply

Sit in Starbucks and offer $5 gift cards...

[+] bravura|6 years ago|reply

You don't have $10? Come on.

[+] nitwit005|6 years ago|reply

The fundamental assumption here seems to be that users are basically interchangeable:

> There is no real need to keep observing the same thing multiple times

That may be true for some simpler products, but I helped out with some user research on an analytics tool, and there was quite a diversity of feedback from the first two batches of users.

[+] gdcohen|6 years ago|reply

There is a trend towards not testing at all! Instead builds are deployed straight to production or canary (mini-subset of production) and then very carefully and closely monitored. If a problem is uncovered, a rollback is performed. If canary is done well, then the problem can be caught before it has widespread impact.
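A minimal sketch of the monitoring decision described above, comparing the canary's error rate with the rest of production. The function name, the thresholds, and the three verdicts are all hypothetical; real setups differ in what they measure and how they decide.

```python
def canary_verdict(canary_errors, canary_requests,
                   baseline_errors, baseline_requests,
                   max_ratio=2.0, min_requests=500):
    """Return 'wait', 'rollback' or 'promote' for a canary build.

    Assumes baseline_requests > 0. All thresholds are illustrative placeholders.
    """
    if canary_requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / baseline_requests
    if baseline_rate == 0:
        # No errors in the baseline: any canary error is treated as a regression.
        return "rollback" if canary_rate > 0 else "promote"
    return "rollback" if canary_rate / baseline_rate > max_ratio else "promote"


print(canary_verdict(1, 100, 100, 10_000))     # too little traffic so far
print(canary_verdict(30, 1_000, 100, 10_000))  # canary errors at 3x the baseline rate
print(canary_verdict(10, 1_000, 100, 10_000))  # canary in line with the baseline
```

A check like this only sees what is instrumented (errors, latency and so on); a confusing screen that produces no errors would pass it.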

[+] imhoguy|6 years ago|reply

By "not testing" you mean automated testing and no manual testing. Somebody still needs to manually test some cases and then write them down as code to cover a constantly changing product, and even the best automation won't resolve unknown unknowns. Canary is only a way to reduce the impact of a single deployment.

[+] CodiePetersen|6 years ago|reply

Wow, great article and perfect timing for me. I'm about to go into the testing phase myself and this is something I've never considered. I was worried because we're an indie group and I wasn't sure how we were going to get a lot of people, but it looks like smaller is good enough and even optimal.

Thanks.

[+] EGreg|6 years ago|reply

I totally disagree this is true for most sites today.

Usability these days is not just about what a single user will do. We build multi-user apps. So the following things are emergent phenomena of actions MANY people take:

Engagement

Retention

Viral Spread

Collaboration

Notifications

Real time updates

Chicken and Egg problems

To test these things, you often need tons of real or fake accounts. You get some people trying interest X, and others interest Y. Sometimes things with the exact same interface usability take off in one country and not another. Like Orkut in Brazil!

Sites sometimes arrive at breakthroughs by A/B testing many things automatically across millions of sessions. That’s far more efficient but requires a large enough sample.
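To illustrate the "large enough sample" part, here is a sketch using a standard two-proportion z-test. The conversion counts are invented: the same 5.0% vs 5.5% difference, once with 1,000 sessions per variant and once with 100,000.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates (pooled standard error)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_b - rate_a) / standard_error


small = two_proportion_z(50, 1_000, 55, 1_000)
large = two_proportion_z(5_000, 100_000, 5_500, 100_000)
print(round(small, 2), round(large, 2))
```

With 1,000 sessions per variant the z statistic is well under the usual 1.96 cutoff, so the difference is indistinguishable from noise; with 100,000 per variant the same difference is far past it.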

In fact, the main reason famous sites are famous is that they successfully got a lot of people to keep coming back and doing something. They probably got them to invite friends. And so on.

[+] dbg31415|6 years ago|reply

I did some work for a major telco and we weren't allowed to do user testing outside of the company courtyard.

It consistently blew my mind how "out there" the feedback was with about a half-a-day-worth of testing.

So I don't know about 5-users, but I can say having about 25-50 is good for getting a broad sample.

If you get the wrong 5 users... you're going to get some really skewed results. For example, if you grab a group of 5 people and all of them are in tech your results are going to be dramatically different than if they are 5 people from accounting, or 5 people from janitorial services.

[+] z3t4|6 years ago|reply

I think the key here is to find five representative customers, not just five random people. These users can be hard to find unless you have good market fit. But if you have very good market fit, my experience is that users will overcome just about every hurdle. Take digging for gold and other minerals, for example: people will do a lot of work if it's valuable. But if it's not as valuable, you might need to hand it to them on a silver plate.

[+] TrialError|6 years ago|reply

I have loved this rule ever since I came across it as a student in 2002 and have been using it successfully for user acceptance testing strategies in large projects.

I interpret it as a sort of fixed Pareto principle for projects where you limit effort and team size for maximum gain. N=5 also happens to be close to the ideal team size in agile frameworks. This rule is smart in a lot of ways and was ahead of its time.

[+] andy_ppp|6 years ago|reply

I find it very irritating where I am that the design team don’t engage with the programmers’ concerns about the designs’ difficulties and instead just point to user research. A lot of design issues can be fixed by talking to not just your users but any human walking by. Hallway usability testing works well and just about anyone, even a programmer, will make your design better with a fresh pair of eyes.

[+] buro9|6 years ago|reply

I always worked on the premise: "Find five people who care"

It's nice to get the "five" verified, but I still think it's important to make sure they care, to have them be an essential part of making the product better. The trick though is not to be drawn into making anything for any one of these users (it's still got to be a generic usability for all users).

107 comments