This is interesting but unless I missed it the author doesn't really explain why they believe they observe all A/B tests. They kind of assume that the randomization is over time (so every reader within a window sees the same headline, and then it changes) rather than within cohorts at a fixed time.
But the quote included suggests the NYT does do the latter: "Half of readers will see one headline, and the other half will see an alternative headline, for about half an hour."
So given that the author mostly observes long consistent blocks of time with the same headline, that suggests the NYT is allocating them to a subgroup in a persistent way (by IP or whatever). Then perhaps the cases where they didn't observe A/B testing were just cases where they were randomized into the optimal (hence final) headline subgroup by accident at the beginning, and never saw any different.
Agree, that is absolutely not how A/B tests are run. No one runs tests sequentially on 100% of traffic - that would be like a McDonald's offering two versions of the Egg McMuffin, one from 7a-11a and the second from 11a-6p, and declaring v1 a clear winner because it had more sales.
Assignments are also almost certainly sticky based on a browser cookie.
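To make the "concurrent split, sticky per reader" point concrete, here is a minimal sketch of deterministic assignment. It is not the NYT's implementation, which nobody in this thread has described; `assign_variant`, `reader_id` (standing in for a cookie value) and `test_id` are made-up names for illustration.

```python
import hashlib


def assign_variant(reader_id: str, test_id: str, variants: list[str]) -> str:
    """Deterministically map a reader to one variant of a test.

    reader_id is a hypothetical stable identifier (e.g. a cookie value).
    Hashing it together with test_id keeps the assignment sticky for a
    reader while staying independent across different tests.
    """
    digest = hashlib.sha256(f"{test_id}:{reader_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


headlines = ["Headline A", "Headline B"]

# The same reader always lands in the same group for a given test.
first = assign_variant("cookie-123", "article-42", headlines)
second = assign_variant("cookie-123", "article-42", headlines)

# Across many readers the split is roughly even, and both groups
# are served during the same time window.
counts = {h: 0 for h in headlines}
for i in range(10_000):
    counts[assign_variant(f"cookie-{i}", "article-42", headlines)] += 1
```

A client that sends no cookie has no stable `reader_id`, so under this scheme it would be re-randomized on every request.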
OP here. I don't know if anyone reads old HN threads but thought I'd clarify some things:
1. My scraper runs every 5 minutes, with a randomly-generated user-agent and never sends cookie headers
2. The charts are bucketed by half-hour periods, so even if the headline flips back and forth many times in half an hour, the colors are grouped together
3. Agreed that in the SpaceX situation (and maybe the Cuomo situation) the headlines change because the stories change. But, e.g., in the Meghan Markle situation, the first headline appears _after_ the interview is over. But that's something to watch out for!
And charts like this one[0] look (to me) like a clear example of A/B testing. But would be interested to hear other explanations!
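For anyone wondering what the half-hour bucketing in point 2 does to the data, here is a rough sketch. It assumes the scraper yields `(timestamp, headline)` pairs; the function names and the sample data are invented for illustration and are not taken from OP's code.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def half_hour_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its half-hour window."""
    return ts.replace(minute=0 if ts.minute < 30 else 30, second=0, microsecond=0)


def headline_shares(observations):
    """Return {bucket_start: {headline: share of samples in that bucket}}."""
    buckets = defaultdict(Counter)
    for ts, headline in observations:
        buckets[half_hour_bucket(ts)][headline] += 1
    shares = {}
    for bucket_start, counts in buckets.items():
        total = sum(counts.values())
        shares[bucket_start] = {h: n / total for h, n in counts.items()}
    return shares


# Made-up samples: one every 5 minutes, alternating between two headlines.
start = datetime(2021, 3, 8, 9, 0)
observations = [
    (start + timedelta(minutes=5 * i), "A" if i % 2 == 0 else "B")
    for i in range(6)
]
shares = headline_shares(observations)
```

Here the headline flips on every sample, but the bucket only records a 50/50 split, which is why the charts cannot show how often the headline actually changed within a window.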
It took me a while to understand this, but I think the long consistent blocks are a sorting artifact, as the observed headlines are bucketed by the hour. You don't see the title flip within the hour, you just see the percentage.
It would have been more interesting to see the title flipping back and forth, because that would reveal how long those tests last. Less neat though.
you’re right! as someone who has built (and builds the future of) the testing/algorithmic machinery at the NYT, this is one of the “misses” in the post.
Also, if the content of the story changes with the headline, then it is probably just reflecting a developing story. That's what the SpaceX and Cuomo stories seemed to be doing. Assuming that all headline changes are just because an algorithm thought they were more dramatic seems to be jumping the gun.
Should a news source be optimizing for engagement, or for accuracy in communicating facts? I understand it's a business, but it's analogous to a bakery that labels all its goods as its best-selling items, only to confuse and disappoint the buyer when they open the box and find something different inside. It may sell that product that time, but it goes against the purpose of the organization as a whole, and doesn't seem like a sustainable practice.
If people don’t like click bait I recommend not using news aggregators that only show the headline and are curated based on the votes of users who don’t read the articles!
Headlines have become advertisements. They're almost never written by the journalists who wrote the story, they're written for space constraints that change on a whim ...
They're very rarely "accurate" the way stories are intended to be for these reasons.
It's not an either/or. I work at a place that produces serious news. User engagement pays the bills and it also gets vital news in front of more eyes. Summing up a complex story into one sentence that convinces a reader to click and learn more is an art unto itself.
This is my takeaway. There is a huge incentive for the media to make things more sensational or dramatic because it drives clicks and views.
I might also argue that online news drives this more so than print - with print you tend to buy the paper based on the totality of their reporting and reputation versus a zero-cost click of a headline on your screen.
> Should a news source be optimizing for engagement, or for accuracy in communicating facts?
A journal is a business with something to sell: the news, and also the attention of its readers to advertisers.
> The New York Times is a big deal. As they tell their advertisers, the NYT is the #1 news source for young, rich thought leaders:
Obviously ultimately it reflects badly on the profession, since all these news sites are using the same clickbait techniques, from Breitbart to the Daily Mail to NYT, since they probably hire the same consultancies when it comes to clickbaiting design, or at the very least, people who come from the same marketing circles/education.
It's the goal of Meredith Levien, their current CEO, for the NYT to be considered a "world class tech company"[1]. I believe doing A/B tests is a way to work towards that goal.
The article is what communicates facts. The headline is just an advertisement to draw the user in to read it. Optimizing the headline means the article is able to inform more people; in line with the mission of the organization. NYT wants to increase subscribers, not ad revenue, so it's incentivized to not confuse and disappoint readers with clickbait headlines.
It _usually_ makes sense to optimise for engagement, particularly in the US where there's no state-funded media outlet (cue discussion about socialism that will be ignored).
The UK has the BBC, Australia has the ABC, and both are state funded media outlets that aren't (at least overtly) driven by views. I'm sure their funding largely depends on how many people are consuming their product, but it's not as though some editor is going to be fired that afternoon if their "Oprah/Meghan" story loses out to the Murdoch equivalent.
I'd be curious to compare how A/B testing works for state-funded media vs pay-for-access media outlets
If you're interested in observing this in realtime, there's a great Twitter account called @nyt_diff[0] that automatically posts when headlines on the front page change. Really cool to watch articles get updated as stories evolve. It's particularly interesting to witness headline writers grapple with how and how much to editorialize.
Cool bot. I'm a keyboard warrior and I've been burned multiple times by stealth changes at the NYT and other "mainstream" news sites. Now when a big story comes out I save the page to Dropbox using SingleFile, and when the edited part comes up in conversation I compare both versions (copied as plain text with Copy PlainText) in Meld, which lets me post nice screenshots of the diff.
Valuable anecdata. News is defined by conflict, (literally, no conflict = not news) and this shows how headlines that express the most conflict get the most clicks.
The integrity issues come up when even though the facts reported are real, the conflict that frames them is manufactured - and this is why people reject news. They don't reject it because of fake facts, they reject it because of fake conflict. When I want the real news, I go to fringe websites, because they get the real conflict right, and if I need details and facts, I can look those up. The reason they get the real conflict right is because by definition the fringe lives in that conflict, and they are the real anti-establishment that creates a counter balance narrative, whereas a conflict produced by setting mundane events and facts against the backdrop of an ideology designed to manufacture conflict is unreadable tripe.
If I needed to sustain the dissonance of establishment narratives, I would read the NYTimes to keep up the appearances, but since my livelihood and aspirations do not depend on that, I have the freedom not to engage it. If you think you are being played and manipulated, watching these A/B tests should be enlightening.
I’m really happy that folks are digging into the perceptual outcomes of the AB testing from headlines. Frankly, it provides context to the NYT’s own research and development of the algorithms and human processes that go into such efforts.
If the narrative is entirely that if we don’t actively consider capturing interest we’re doddering and hard to track, and if we do we’re abusive, then media is forever doomed to be unsatisfactory. We all hope to improve.
In the world of headlines, the “spiciness” that’s been advanced as a function of engagement hunting is something that’s currently contended with through human intervention. All headlines are human created and the outcomes of AB tests are more about improving the understanding between author/editor and captured audience than manipulation or future interest conditioning.
[disclosure: i lead ML platforms and the algo related eng products at NYT. all thoughts are my own presentation of what i have experienced, and not company opinions]
What we measure determines what gets maximized (or minimized, if it's something we don't like). The factfulness of an article, or how well informed the reader is after reading it, is not easily measured. Clicks are now easy to measure. What gets maximized is clicks.
It would be nice if we had a news source with a feature (opt in) where the reader got an email quiz the next day, asking just a few questions about the facts in the article. The feedback loop would tell us which articles actually make the reader better informed.
Emotionally charged, slanted or dramatic headlines are one of the reasons I cancelled my NYT subscription a couple of years ago. I'd picked up a student NYT subscription back then and had an FT subscription from school; reading them back to back makes their slant and editorialization pretty obvious, even if it isn't the most in-your-face.
Great article. Stories also change over time and with press engagement. Some of these may be feedback-loop outcomes (Cuomo), where others (the SpaceX landing-and-explosion story) I don't think are.
I would suggest that this can be more 'evil' than just trying to get more clicks.
A number of times I have seen on some place like Facebook where the initial article has some extreme headline, and then hours after when engagement is up, the headline is swapped to one that is less inflammatory. A few more extreme-perspective friends will send me an article saying "see?!?!" - and by the time I see it it's already been rewritten.
A few times I would click on an article going viral and find the headline in the article itself doesn't match the one cached by Facebook. Remember that most people aren't even reading the headlines and just assume that NYT are trustworthy. The first headline is the one they end up internalizing.
Bear in mind, there is zero consequence for doing this either. The newspaper can claim they were "correcting an editorial error", whilst openly spreading misinformation about some hot topic.
I think at the very least it should be mandatory to maintain a list of edits to an article once published - and to indicate to the reader clearly that the article has received a number of edits.
"And yet it rarely attracts the kind of close scrutiny of, say, a Fox News. And that’s totally reasonable! Fox News is an absurd clown show and deserves every criticism it gets"
I find it comical that any write-up critical of a left wing organization needs to include something like this to avoid being flagged, down-voted, whatever, on the platform it's being posted on.
I don't usually read any US news as a rule, because of how sensational it all is. But I think I see most NYT headlines because someone inevitably posts them to HN. And honestly, when I compare them to what I saw on Fox News (which admittedly I only looked at for about a week around the election), I see no difference in the level of partisanship or spin. If anything there is less on Fox News just because the headline writers are not trying to show their undiscovered writing talent in the same way as NYT.
The article is trying to paint NYT's A/B testing as nefarious, but unfortunately in the current social-media-oriented landscape, headlines are very, very important for clarity, and A/B testing headlines is something every media publisher with a sufficient tech infra does.
As frequent Hacker News submitters know, headlines alone can determine whether a post gets upvoted.
Good piece, but damn I was hoping for some details on their implementation. Are the variations coming from the server, CDN/edge, or client? How does this impact SEO? Do they change the article slug if the title is a winner? How much is automated and how much is human curated? When beset with new assignments who has the time to babysit these things?
The value in A/B testing headlines is 30% boosting engagement directly and 70% creating a feedback loop where the newsroom can learn how to write a more engaging headline. Avoiding clickbait headlines requires some restraint and they are well aware of that slippery slope.
I think there is some bad intent at times - I think a reputable institution should be optimizing for accuracy and engagement, not sensation for its own sake. Different headline styles are suitable for different sections of the paper, as well.
I saw one the other day from Nick Kristof: "America is not designed for the anatomically correct". It was about the need for public restrooms, but the headline didn't make me read it. It sounded goofy and, other than knowing Kristof writes decent articles, I passed it over.
Later, the headline was "America isn't designed for those who pee". A bit coarse, but very much more clear as to what the topic of the editorial was.
[+] [-] awhitby|5 years ago|reply
[+] [-] tqi|5 years ago|reply
[+] [-] tomjcleveland|5 years ago|reply
[0] https://nyt.tjcx.me/articles/1163e0c4-e609-5cfa-aff1-b0945f1...
[+] [-] lolc|5 years ago|reply
[+] [-] asdfaoeu|5 years ago|reply
Edit: The only example of clickbait here is OP's title.
[+] [-] h00kwurm|5 years ago|reply
[+] [-] baking|5 years ago|reply
[+] [-] dang|5 years ago|reply
[+] [-] rosstex|5 years ago|reply
[+] [-] JasonFruit|5 years ago|reply
[+] [-] mxcrossr|5 years ago|reply
[+] [-] nkozyra|5 years ago|reply
[+] [-] tootie|5 years ago|reply
[+] [-] refurb|5 years ago|reply
[+] [-] throw_m239339|5 years ago|reply
[+] [-] rkho|5 years ago|reply
[1]: https://www.theinformation.com/articles/meredith-levien-want...
[+] [-] afavour|5 years ago|reply
[+] [-] samdfonseca|5 years ago|reply
[+] [-] moscovium|5 years ago|reply
[+] [-] monch1962|5 years ago|reply
[+] [-] NetOpWibby|5 years ago|reply
Sensationalist junk.
[+] [-] justapassenger|5 years ago|reply
[+] [-] ftio|5 years ago|reply
[0] https://twitter.com/nyt_diff
[+] [-] TeeMassive|5 years ago|reply
[+] [-] drak0n1c|5 years ago|reply
https://github.com/DocNow/diffengine
[+] [-] exikyut|5 years ago|reply
With quite a few of the posts I'd just present them side by side, with the differences highlighted.
[+] [-] motohagiography|5 years ago|reply
[+] [-] TeeMassive|5 years ago|reply
Read local news.
In fact that gives me an idea, a local news aggregator which ignores redundant news and can find the most impactful ones.
[+] [-] clairity|5 years ago|reply
[+] [-] ARandomerDude|5 years ago|reply
For multiple counter-examples, see pretty much everything on https://www.msn.com/en-us/news/good-news
[+] [-] h00kwurm|5 years ago|reply
[+] [-] rossdavidh|5 years ago|reply
[+] [-] Rastonbury|5 years ago|reply
[+] [-] freebuju|5 years ago|reply
If this level of desperation is actually seen as acceptable for a media house, then journalism or whatever is left of it, is in dire need of help.
[+] [-] ggm|5 years ago|reply
[+] [-] bArray|5 years ago|reply
[+] [-] trident5000|5 years ago|reply
[+] [-] iujjkfjdkkdkf|5 years ago|reply
[+] [-] sodality2|5 years ago|reply
[+] [-] Waterluvian|5 years ago|reply
Whenever I'd load the website, sometimes all the headlines would suddenly switch to different wordings, once sufficiently loaded.
[+] [-] minimaxir|5 years ago|reply
[+] [-] reilly3000|5 years ago|reply
[+] [-] Hippocrates|5 years ago|reply
[+] [-] exikyut|5 years ago|reply
Service that scrapes websites known to A/B test things, caches the A/B test content, and presents all of the variants at once.
In this context, yes, I want all the headlines. Maybe they could be presented as a diff. With timestamps!
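A rough sketch of the "present them as a diff, with timestamps" part, using only Python's standard `difflib`. The helper names and the `[-removed-] {+added+}` markup are arbitrary choices for illustration; the two headlines are the Kristof ones quoted elsewhere in this thread, and the `T+0h` style timestamps are placeholders, not real observation times.

```python
import difflib


def headline_diff(old: str, new: str) -> str:
    """Word-level diff: removed words in [-...-], added words in {+...+}."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(old_words[i1:i2])
        else:
            if i2 > i1:  # words present only in the old headline
                parts.append("[-" + " ".join(old_words[i1:i2]) + "-]")
            if j2 > j1:  # words present only in the new headline
                parts.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(parts)


def diff_timeline(snapshots):
    """Return (timestamp, diff) for each change between consecutive snapshots."""
    changes = []
    for (_, prev), (ts, curr) in zip(snapshots, snapshots[1:]):
        if prev != curr:
            changes.append((ts, headline_diff(prev, curr)))
    return changes


# Placeholder timestamps; a real service would record scrape times.
snapshots = [
    ("T+0h", "America is not designed for the anatomically correct"),
    ("T+1h", "America is not designed for the anatomically correct"),
    ("T+2h", "America isn't designed for those who pee"),
]
timeline = diff_timeline(snapshots)
for ts, diff in timeline:
    print(ts, diff)
```

Note that this only diffs snapshots taken in sequence. Showing all variants of a live A/B test "at once" would additionally need several cookieless fetches per scrape, since a single fetch sees only one variant.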
[+] [-] unethical_ban|5 years ago|reply
[+] [-] benlumen|5 years ago|reply
[+] [-] wp381640|5 years ago|reply