The effects of this are easiest to see with online recipes; the highest-ranking recipes are all thousand-word ramblings with the actual recipe tacked on at the end. Google sees that you spent more time on the site (i.e., time wasted scrolling) and concludes you were more 'engaged.'
I guess it's up to you to define the metric to be measured and optimized. Defining the right metric and targets is not easy and is probably business-specific, i.e., not the same for every business.
It depends on your goal. If you want to show more ads for longer, it's a local maximum of profit. If you want to provide something like a quick reference guide, then yes, it's terrible. Instead of optimizing for some arbitrary metric, you should focus on doing whatever you're doing right.
Behavioral analytics on user interaction can help you fix a bad design but shouldn't matter that much.
Not necessarily - if I have a website that's trying to offer people information about something, then it's better for everyone if it's more navigable/understandable, which session length / clicks might be a proxy for.
'Optimize' has been around for quite some time, in case anybody thinks this is new. In fact, I believe this feature has been around for over a year now.
Does anyone have a rule of thumb on when A/B testing becomes important for startups?
We have a few thousand visitors a month and are starting to convert, but my guess is A/B testing language and buttons would be premature optimization for us. Just curious at what point that's no longer the case.
The last heavily trafficked site I worked on wouldn't perform an A/B test unless they could experiment with tens of thousands of daily active users. The experiments would last 2-3 weeks to gain statistical significance. A page usually had a 70/30 control/experiment split.
The challenge is gathering statistically significant data. I think it's easier for an early-stage company to talk to its customers than to sit through the duration of a split test.
Some pretty easy rules of thumb, assuming you have a decent grasp of your economics. Look at it as a "low hanging fruit" optimization problem -- do you put resources into running a test (+ opportunity cost for lost sales), or into something else?
Suppose you have 10k monthly sessions with a 0.5% conversion rate (50 conversions). How many more customers would you need in order to prioritize running a test? If 55 conversions in a given month means you crush important KPIs, then that's probably worth testing -- you just need a 10% lift.*
Also keep in mind that running A/B tests (1 control, 1 treatment) is suboptimal. That tests, "does this beat what I have now?" The more important question is "what is my best option?".
OTOH, if other things like messaging and product are stable, you can test a smaller traffic site by leaving it running longer.
My rough estimate is 100 conversion events in the time the test runs. So if I have 100 conversion events in 1 month, it may make sense to run a 2-3 option + 1 control test for 1 month.
(You can also test much larger things than buttons. For startups, I like to suggest trying out positioning or value statements and seeing how visitors respond!)
* however, it'll take a long time for you to reach statistical confidence for a 10% lift in rate, with only 50 conversion events across all tests.
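To put a rough number on that footnote, here's a minimal sketch of the required sample size, using the standard normal-approximation formula for a two-proportion test (plain Python; the 0.5% baseline and 10% relative lift are taken from the example above, and 80% power is my assumption, not the commenter's):

```python
from statistics import NormalDist

def sample_size_per_arm(p_base, rel_lift, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm to detect a relative
    lift over a baseline conversion rate at the given alpha and power."""
    p_alt = p_base * (1 + rel_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided alpha
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_base + p_alt) / 2
    numerator = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p_base * (1 - p_base)
                          + p_alt * (1 - p_alt)) ** 0.5) ** 2
    return numerator / (p_alt - p_base) ** 2

# 0.5% baseline, 10% relative lift -> on the order of 330,000 sessions per arm
n = sample_size_per_arm(0.005, 0.10)
```

At 10k sessions a month split 50/50, that's several years of traffic, which is exactly why the footnote warns against chasing a 10% lift at low volume.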
The idea is that to get results, you need some combination of a lot of data and a big impact from the changes you're testing. If you just change the color of the signup button, it probably won't have a major impact on the conversion rate, so you'll need a lot more data to reach a conclusion. But if you test a completely new landing page, it has a better chance of being meaningfully different (better or worse, who knows until you test?), so you won't need as many visitors to get a result.
In order to be useful, you probably want to see AB tests reaching a conclusion in under 30 days. I'd say for a conversion rate goal, this is going to be when you have around 100k visitors a month.
There are a few variables to consider:
- What goal would you like to AB test? Conversion rate is an end-of-funnel goal that needs a lot of traffic; you can use upper-funnel goals like product views, add to bag, etc. to get quicker conclusions (not as accurate, but often a good approximation)
- The stats engine/AB testing tool you are using. Simpler tools might conclude quicker, but in my experience they can be so inaccurate they are counterproductive. Usually a long time to conclude = reliable results. I've never used Google Optimize so I'm not sure where it stands.
- How many people are being exposed to the AB test; for example, is it all web traffic or just mobile?
- How much of an effect the AB test has on behavior. A button color/text change will normally take longer to conclude than a feature that's really helping your users.
- How confident do you want to be before reaching a conclusion? I'd recommend looking for 95% confidence in uplift before concluding an AB test.
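On that last point, a pooled two-proportion z-test is one common way to compute such a confidence figure. A minimal sketch (the function name and the example counts below are mine, for illustration only):

```python
from statistics import NormalDist

def uplift_confidence(conv_a, n_a, conv_b, n_b):
    """One-sided confidence that variant B's conversion rate beats
    variant A's, via a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return NormalDist().cdf(z)
```

For example, 50 conversions from 10,000 sessions against 70 from 10,000 comes out around 97% confident, just over the 95% bar; identical rates give exactly 0.5 (a coin flip).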
At least thinking about it is important NOW, regardless of your startup's stage. It can be extremely difficult to slot a 3rd-party A/B testing solution into your product (or really hard to roll your own) if your infra doesn't support it from the start. Also, hire a data scientist!
(Disclaimers: I am not a data scientist, I just think everyone needs one! I have worked on an experimentation system at $BIG_COMPANY.)
I'd suggest thinking about the following BEFORE YOU RUN A SINGLE A/B TEST:
1) Key Metrics: Define these. They are the general "I don't care what your experiment is about; these numbers are important" metrics. Every experiment you run should automatically track them. You should also provide the ability to define custom metrics, since an experiment that changes some random button color probably wants to look at how many people clicked the button, which is almost definitely NOT a key metric.
2) Logging Infrastructure: Make sure that you have an easy-to-use, reliable data pipeline set up for logging and processing events. Bad logging == bad experiment results. Also consider streaming vs. batch processing for updating experiment results.
3) Population Management: How do your experiments segment users? Are variants calculated in realtime? Batched with some SLA for lag? Are they sticky?
4) Mutual Exclusion: People running experiments often want "their" users excluded from other experiments.
5) Guardrails: Do your experiments automatically shut off if there is a catastrophic decline in one or more key metrics? What safety measures do you have for determining whether an experiment is safe/valid? How do you handle cleaning up data when there's a problem? What sorts of actions invalidate an experiment's existing results? Does your entire site break if your A/B testing service is down for whatever reason?
6) Cleanup/Ownership: Experiments don't run forever (at least they shouldn't!). Cleaning up old features, populations, etc. can be a pain, especially when the people who wrote the stuff originally no longer work at the company. Make cleanup mandatory and as easy as possible.
There's a lot more, but I'm tired now. A/B testing is complex. There are lots of resources out there, though. Look for white papers on the subject, they're surprisingly approachable. Example from Microsoft: https://exp-platform.com/Documents/2017-08%20KDDMetricInterp...
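On the "sticky" question in point 3, one common approach is deterministic hashing: assignment is a pure function of user and experiment, so it is sticky with no assignment table to store, and salting with the experiment name decorrelates buckets across experiments (which also bears on point 4). A minimal sketch; the function and salt scheme are illustrative, not any particular product's API:

```python
import hashlib

def assign_variant(user_id, experiment, variants):
    """Sticky, deterministic bucketing: the same (user, experiment)
    pair always maps to the same variant. The experiment name acts
    as a salt so one test's buckets don't line up with another's."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Weighted splits (like the 70/30 mentioned upthread) are a small extension: map the hash to a float in [0, 1) and compare against cumulative weights.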
I am wondering whether it is based on their own Google HyperTune / Vizier [1, 2], modified to better deal with uncertainties, or whether it is a completely independent in-house development.
That said, it certainly can be.
(not necessarily true, but it has happened before with YouTube)
[1] https://cloud.google.com/ml-engine/docs/tensorflow/using-hyp...
[2] https://ai.google/research/pubs/pub46180
(I'm not complaining–it's interesting and I'll give it a try. But it's not brand new)
https://github.com/dwyl/learn-google-optimize/issues/8