Investigating how the New York Times A/B tests their headlines

[+] awhitby|5 years ago|reply

This is interesting but unless I missed it the author doesn't really explain why they believe they observe all A/B tests. They kind of assume that the randomization is over time (so every reader within a window sees the same headline, and then it changes) rather than within cohorts at a fixed time.

But the quote included suggests the NYT does do the latter: "Half of readers will see one headline, and the other half will see an alternative headline, for about half an hour."

So given that the author mostly observes long consistent blocks of time with the same headline, that suggests the NYT is allocating them to a subgroup in a persistent way (by IP or whatever). Then perhaps the cases where they didn't observe A/B testing were just cases where they were randomized into the optimal (hence final) headline subgroup by accident at the beginning, and never saw any different.

[+] tqi|5 years ago|reply

Agree, that is absolutely not how A/B tests are run. No one runs tests sequentially on 100% of traffic - that would be like a McDonald's offering two versions of the Egg McMuffin, one from 7a-11a and the second from 11a-6p, and declaring v1 a clear winner because it had more sales.

Assignments are also almost certainly sticky based on a browser cookie.
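
A minimal sketch of what cookie-based sticky assignment could look like, assuming the visitor ID is read from a cookie and the bucket is derived by hashing it. The function name `assign_variant`, the ID strings, and the even split are illustrative assumptions, not anything the NYT has described.

```python
import hashlib
from typing import Sequence

def assign_variant(visitor_id: str, test_id: str, variants: Sequence[str]) -> str:
    """Deterministically map a visitor to one variant of a test (hypothetical helper)."""
    # Hash the test ID together with the visitor ID so the same visitor
    # can land in different buckets for different tests.
    digest = hashlib.sha256(f"{test_id}:{visitor_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

Because the result depends only on the two IDs, the same cookie keeps seeing the same headline for the life of the test. A client that never sends a cookie would have to be bucketed some other way, for example randomly per request or by IP.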

[+] tomjcleveland|5 years ago|reply

OP here. I don't know if anyone reads old HN threads but thought I'd clarify some things:

1. My scraper runs every 5 minutes, with a randomly-generated user-agent and never sends cookie headers

2. The charts are bucketed by half-hour periods, so even if the headline flips back and forth many times in half an hour, the colors are grouped together

3. Agreed that in the SpaceX situation (and maybe the Cuomo situation) the headlines change because the stories change. But, e.g., in the Meghan Markle situation, the first headline appears _after_ the interview is over. But that's something to watch out for!

And charts like this one[0] look (to me) like a clear example of A/B testing. But would be interested to hear other explanations!

[0] https://nyt.tjcx.me/articles/1163e0c4-e609-5cfa-aff1-b0945f1...
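
For anyone following along, the half-hour bucketing in point 2 could be sketched roughly like this. It assumes the scraper yields `(timestamp, headline)` pairs; the function names are hypothetical and this is not the code behind the charts.

```python
from collections import Counter, defaultdict
from datetime import datetime

def half_hour_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its half-hour window."""
    return ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)

def headline_shares(observations):
    """observations: iterable of (timestamp, headline) pairs from a scraper.

    Returns {bucket_start: {headline: fraction of observations in that bucket}}.
    """
    counts = defaultdict(Counter)
    for ts, headline in observations:
        counts[half_hour_bucket(ts)][headline] += 1
    shares = {}
    for bucket, counter in counts.items():
        total = sum(counter.values())
        shares[bucket] = {h: n / total for h, n in counter.items()}
    return shares
```

With a 5-minute scrape interval each bucket holds about six observations, so a bucket with a mixed share means the headline flipped inside that window even though the chart draws it as one grouped block.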

[+] lolc|5 years ago|reply

It took me a while to understand this, but I think the long consistent blocks are a sorting artifact, because the observed headlines are bucketed by the hour. You don't see the title flip within the hour, you just see the percentage.

It would have been more interesting to see the title flipping back and forth, because that would reveal how long those tests last. Less neat though.

[+] asdfaoeu|5 years ago|reply

In addition, there's only one case where it changes back to the original, and that's a minor "In" to "in" change.

Edit: The only example of clickbait here is OP's title.

[+] h00kwurm|5 years ago|reply

you’re right! as someone who has built (and builds the future of) the testing/algorithmic machinery at the NYT, this is one of the “misses” in the post.

[+] baking|5 years ago|reply

Also, if the content of the story changes with the headline, then it is probably just reflecting a developing story. That's what the SpaceX and Cuomo stories seemed to be doing. Assuming that all headline changes are just because an algorithm thought they were more dramatic seems to be jumping the gun.

[+] dang|5 years ago|reply

Ok, I've consed 'investigating' onto the title above, which hopefully adds an appropriate degree of uncertainty. Thanks!

[+] rosstex|5 years ago|reply

+1, they didn't explain that at all.

[+] JasonFruit|5 years ago|reply

Should a news source be optimizing for engagement, or for accuracy in communicating facts? I understand it's a business, but it's analogous to a bakery that labels all its goods as its best-selling items, only to confuse and disappoint the buyer when they open the box and find something different inside. It may sell that product that time, but it goes against the purpose of the organization as a whole, and doesn't seem like a sustainable practice.

[+] mxcrossr|5 years ago|reply

If people don’t like click bait I recommend not using news aggregators that only show the headline and are curated based on the votes of users who don’t read the articles!

[+] nkozyra|5 years ago|reply

Headlines have become advertisements. They're almost never written by the journalists who wrote the story, they're written for space constraints that change on a whim ...

They're very rarely "accurate" the way stories are intended to be for these reasons.

[+] tootie|5 years ago|reply

It's not an either/or. I work at a place that produces serious news. User engagement pays the bills and it also gets vital news in front of more eyes. Summing up a complex story into one sentence that convinces a reader to click and learn more is an art unto itself.

[+] refurb|5 years ago|reply

This is my takeaway. There is a huge incentive for the media to make things more sensational or dramatic because it drives clicks and views.

I might also argue that online news drives this more so than print - with print you tend to buy the paper based on the totality of their reporting and reputation versus a zero-cost click of a headline on your screen.

[+] throw_m239339|5 years ago|reply

> Should a news source be optimizing for engagement, or for accuracy in communicating facts?

A journal is a business with something to sell: the news to its readers, and the attention of those readers to advertisers.

> The New York Times is a big deal. As they tell their advertisers, the NYT is the #1 news source for young, rich thought leaders:

Obviously it ultimately reflects badly on the profession, since all these news sites are using the same clickbait techniques, from Breitbart to the Daily Mail to the NYT. They probably hire the same consultancies when it comes to clickbait design, or at the very least, people who come from the same marketing circles/education.

[+] rkho|5 years ago|reply

It's the goal of Meredith Levien, their current CEO, for the NYT to be considered a "world class tech company"[1]. I believe doing A/B tests is a way to work towards that goal.

[1]: https://www.theinformation.com/articles/meredith-levien-want...

[+] afavour|5 years ago|reply

Is there actually an either/or here? Surely it’s possible to write two (or more) headlines for a story that are all equally accurate?

[+] samdfonseca|5 years ago|reply

The article is what communicates facts. The headline is just an advertisement to draw the user in to read it. Optimizing the headline means the article is able to inform more people; in line with the mission of the organization. NYT wants to increase subscribers, not ad revenue, so it's incentivized to not confuse and disappoint readers with clickbait headlines.

[+] moscovium|5 years ago|reply

Is it not the goal of a headline to engage?

[+] monch1962|5 years ago|reply

It _usually_ makes sense to optimise for engagement, particularly in the US where there's no state-funded media outlet (cue discussion about socialism that will be ignored). The UK has the BBC, Australia has the ABC, and both are state-funded media outlets that aren't (at least overtly) driven by views. I'm sure their funding largely depends on how many people are consuming their product, but it's not as though some editor is going to be fired that afternoon if their "Oprah/Meghan" story loses out to the Murdoch equivalent. I'd be curious to compare how A/B testing works for state-funded media vs pay-for-access media outlets.

[+] NetOpWibby|5 years ago|reply

This is why I’m letting my NYT subscription expire. I see no difference between them and tabloids.

Sensationalist junk.

[+] justapassenger|5 years ago|reply

It’s fundamentally possible to have both. Although, yes, it’s very tempting to sacrifice one for the other, once you start to optimize.

[+] ftio|5 years ago|reply

If you're interested in observing this in realtime, there's a great Twitter account called @nyt_diff[0] that automatically posts when headlines on the front page change. Really cool to watch articles get updated as stories evolve. It's particularly interesting to witness headline writers grapple with how and how much to editorialize.

[0] https://twitter.com/nyt_diff

[+] TeeMassive|5 years ago|reply

Cool bot. I'm a keyboard warrior and I've been burned multiple times by stealth changes at the NYT and other "mainstream" news sites. Now when a big story comes out I save the page to Dropbox using SingleFile, and when the edited part comes up in conversation I compare both versions, copy-pasted with Copy PlainText, in Meld, which lets me post nice screenshots of the diff.

[+] drak0n1c|5 years ago|reply

There's a project inspired by nyt_diff and NewsDiffs that is working on improving the algorithm and supporting many more news outlets:

https://github.com/DocNow/diffengine

[+] exikyut|5 years ago|reply

Agh, this is using a horribly noisy diffing algorithm.

With quite a few of the posts I'd just present them side by side, with the differences highlighted.
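
As a rough illustration of a quieter alternative, a word-level diff can be built on Python's standard `difflib`. This is a sketch, not what the bot or diffengine actually does; the `[-...-]` and `{+...+}` markers and the function name `word_diff` are arbitrary choices.

```python
import difflib

def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-word-] and added words as {+word+}."""
    a, b = old.split(), new.split()
    out = []
    # Compare word lists rather than characters so small edits stay readable.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)
```

Diffing on whole words keeps unchanged runs intact, which tends to read better for short texts like headlines than a character-level diff.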

[+] motohagiography|5 years ago|reply

Valuable anecdata. News is defined by conflict, (literally, no conflict = not news) and this shows how headlines that express the most conflict get the most clicks.

The integrity issues come up when even though the facts reported are real, the conflict that frames them is manufactured - and this is why people reject news. They don't reject it because of fake facts, they reject it because of fake conflict. When I want the real news, I go to fringe websites, because they get the real conflict right, and if I need details and facts, I can look those up. The reason they get the real conflict right is because by definition the fringe lives in that conflict, and they are the real anti-establishment that creates a counter balance narrative, whereas a conflict produced by setting mundane events and facts against the backdrop of an ideology designed to manufacture conflict is unreadable tripe.

If I needed to sustain the dissonance of establishment narratives, I would read the NYTimes to keep up the appearances, but since my livelihood and aspirations do not depend on that, I have the freedom not to engage it. If you think you are being played and manipulated, watching these A/B tests should be enlightening.

[+] unknown|5 years ago|reply

[deleted]

[+] TeeMassive|5 years ago|reply

> literally, no conflict = not news

Read local news.

In fact that gives me an idea, a local news aggregator which ignores redundant news and can find the most impactful ones.

[+] clairity|5 years ago|reply

it's not simply conflict, but value-laden judgement and manipulative framings to try to coerce, not invite, agreement.

[+] ARandomerDude|5 years ago|reply

Meh. This is the narrative of Marx and Foucault, and one I think is demonstrably false.

For multiple counter-examples, see pretty much everything on https://www.msn.com/en-us/news/good-news

[+] h00kwurm|5 years ago|reply

I’m really happy that folks are digging into the perceptual outcomes of the AB testing from headlines. Frankly, it provides context to the NYT’s own research and development of the algorithms and human processes that go into such efforts.

If the narrative is entirely that if we don't actively consider capturing interest we'd be doddering and hard to track, and if we do we're abusive, then media is forever doomed to be unsatisfactory. We all hope to improve.

In the world of headlines, the “spiciness” that’s been advanced as a function of engagement hunting is something that’s currently contended with through human intervention. All headlines are human created and the outcomes of AB tests are more about improving the understanding between author/editor and captured audience than manipulation or future interest conditioning.

[disclosure: i lead ML platforms and the algo related eng products at NYT. all thoughts are my own presentation of what i have experienced, and not company opinions]

[+] rossdavidh|5 years ago|reply

What we measure determines what gets maximized (or minimized, if it's something we don't like). The factfulness of an article, or how well informed the reader is after they read it, is not easily measured. Clicks are now easy to measure. What gets maximized is the clicks.

It would be nice if we had a news source with a feature (opt in) where the reader got an email quiz the next day, asking just a few questions about the facts in the article. The feedback loop would tell us which articles actually make the reader better informed.

[+] Rastonbury|5 years ago|reply

Emotionally charged, slanted or dramatic headlines are one of the reasons I cancelled my NYT subscription a couple of years ago. I'd picked up a student NYT subscription back then and had an FT subscription from school; reading them back to back makes the NYT's slant and editorialization pretty obvious, even if it isn't the most in-your-face.

[+] freebuju|5 years ago|reply

Correct me if I'm wrong, but doesn't this mean they are eventually going to move entirely to a click-bait model of acquiring readers?

If this level of desperation is actually seen as acceptable for a media house, then journalism or whatever is left of it, is in dire need of help.

[+] ggm|5 years ago|reply

Great article. Stories also change over time and with press engagement. Some of these may be feedback-loop outcomes (Cuomo), whereas others (the SpaceX land/explode story) I don't think are.

[+] bArray|5 years ago|reply

I would suggest that this can be more 'evil' than just trying to get more clicks.

A number of times I have seen on some place like Facebook where the initial article has some extreme headline, and then hours after when engagement is up, the headline is swapped to one that is less inflammatory. A few more extreme-perspective friends will send me an article saying "see?!?!" - and by the time I see it it's already been rewritten.

A few times I would click on an article going viral and find the headline in the article itself doesn't match the one cached by Facebook. Remember that most people aren't even reading the headlines and just assume that NYT are trustworthy. The first headline is the one they end up internalizing.

Bear in mind, there is zero consequence for doing this either. The newspaper can claim they were "correcting an editorial error", whilst openly spreading misinformation about some hot topic.

I think at the very least it should be mandatory to maintain a list of edits to an article once published - and to indicate to the reader clearly that the article has received a number of edits.

[+] trident5000|5 years ago|reply

"And yet it rarely attracts the kind of close scrutiny of, say, a Fox News. And that’s totally reasonable! Fox News is an absurd clown show and deserves every criticism it gets"

I find it comical that any write-up critical of a left-wing organization needs to include something like this to avoid being flagged, down-voted, whatever, on the platform it's being posted on.

[+] iujjkfjdkkdkf|5 years ago|reply

I don't usually read any US news as a rule, because of how sensational it all is. But I think I see most NYT headlines because someone inevitably posts them to HN. And honestly, when I compare them to what I saw on Fox News (which admittedly I only looked at for about a week around the election), I see no difference in the level of partisanship or spin. If anything there is less on Fox News, just because the headline writers are not trying to show their undiscovered writing talent in the same way as at the NYT.

[+] unknown|5 years ago|reply

[deleted]

[+] sodality2|5 years ago|reply

If excluded, I doubt anyone would have cared or complained.

[+] Waterluvian|5 years ago|reply

CBC did this years ago.

Whenever I'd load the website, sometimes all the headlines would suddenly switch to different wordings, once sufficiently loaded.

[+] unknown|5 years ago|reply

[deleted]

[+] minimaxir|5 years ago|reply

The article is trying to paint NYT's A/B testing as nefarious, but unfortunately in the current social-media-oriented landscape, headlines are very, very important for clarity, and A/B testing headlines is something every media publisher with a sufficient tech infra does.

As frequent Hacker News submitters know, headlines alone can determine whether a post gets upvoted.

[+] reilly3000|5 years ago|reply

Good piece, but damn I was hoping for some details on their implementation. Are the variations coming from the server, CDN/edge, or client? How does this impact SEO? Do they change the article slug if the title is a winner? How much is automated and how much is human curated? When beset with new assignments who has the time to babysit these things?

[+] Hippocrates|5 years ago|reply

The value in A/B testing headlines is 30% boosting engagement directly and 70% creating a feedback loop where the newsroom can learn how to write a more engaging headline. Avoiding clickbait headlines requires some restraint and they are well aware of that slippery slope.

[+] exikyut|5 years ago|reply

Feature request:

Service that scrapes websites known to A/B test things, caches the A/B test content, and presents all of the variants at once.

In this context, yes, I want all the headlines. Maybe they could be presented as a diff. With timestamps!
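
A rough sketch of the caching half of that idea, assuming some scraper already yields `(article_id, timestamp, headline)` tuples. The function name `collect_variants` and the sample data are hypothetical.

```python
from datetime import datetime

def collect_variants(observations):
    """observations: iterable of (article_id, timestamp, headline) tuples.

    Returns {article_id: {headline: (first_seen, last_seen)}}.
    """
    variants = {}
    for article_id, ts, headline in observations:
        seen = variants.setdefault(article_id, {})
        # Default to (ts, ts) the first time a headline shows up.
        first, last = seen.get(headline, (ts, ts))
        seen[headline] = (min(first, ts), max(last, ts))
    return variants

# Made-up observations, purely for illustration.
sample = [
    ("article-1", datetime(2021, 3, 8, 10, 0), "Headline A"),
    ("article-1", datetime(2021, 3, 8, 10, 5), "Headline B"),
    ("article-1", datetime(2021, 3, 8, 10, 10), "Headline A"),
]
for headline, (first, last) in collect_variants(sample)["article-1"].items():
    print(f"{first:%H:%M}-{last:%H:%M}  {headline}")
```

Overlapping first-seen/last-seen ranges for two headlines of the same article would hint at a concurrent test rather than a one-off edit.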

[+] unethical_ban|5 years ago|reply

I think there is some bad intent at times - I think a reputable institution should be optimizing for accuracy and engagement, not sensation for its own sake. Different headline styles are suitable for different sections of the paper, as well.

I saw one the other day from Nick Kristof: "America is not designed for the anatomically correct". It was about the need for public restrooms, but the headline didn't make me read it. It sounded goofy and, other than knowing Kristof writes decent articles, I passed it over.

Later, the headline was "America isn't designed for those who pee". A bit coarse, but very much more clear as to what the topic of the editorial was.

[+] benlumen|5 years ago|reply

Doesn't surprise me in the least. I've long felt that "journalism" has more in common with marketing and SEO than much else these days.

[+] wp381640|5 years ago|reply

This is very reminiscent of how the YouTube algorithm when optimized for views and clicks drove a lot of people into deep rabbit holes.

120 comments