The Surprising Path to a Faster NYTimes.com

[+] saturdaysaint|11 years ago|reply

Improvements like that at Nytimes.com have had me happily paying for my daily news for the first time. The user experience is just orders of magnitude better than anything I've found that's available for free. Almost reminds me of when Google maps came out and completely changed expectations for a map site. The iOS app also impressed me: it's great at caching stories, so even in crappy network areas it feels like a "broadband" newsreading experience (kind of amazing how long it took for a news app to accomplish this).

[+] Curmudgel|11 years ago|reply

I find the user experience on the new site to be much worse than the new site. When I'm reading an article I really don't care about navigating to the rest of the site. I have a small widescreen laptop, which means that the fixed headers at the top give me less room to read the article. I don't need gobs of white space to read the news article in print, so why is white space so important on the web that navigation has to be hidden in one of those awful hamburger buttons? The text is harder to read because it has lower contrast. The new comment system is totally unusable. I don't really appreciate large, useless images interrupting the flow of the article. And I don't need Javascript to read a newspaper article (the site loads like a pig with Javascript) because I'm going to be navigating to a different page when I read another article. It's faster to just open the home page and open each article in a new tab (with JS disabled).

[+] nanoscopic|11 years ago|reply

The slides say there are 1 million pages, and "republishing" them would take 90 days. Maths -> 7.8 seconds on average to "republish" a page. Modern templates systems can convert a page constructed out of structured data in less than 1/4 of a second. ( and that is a high estimate ) That is ~30 times faster, meaning all pages could be "republished" in 3 days instead of 90 had a more efficient system been used from the start.

Focusing on shifting all "rendering" into front end JS seems like it will lead to more difficulty in the long run instead of using a more efficient structured page creation mechanism.

I am curious how the static pages were created. Others here are speculating that templating was not done. If not; what does "republishing" mean exactly?

[+] marbletiles|11 years ago|reply

Their source material here is 1 million pages of HTML, they don't have (on my reading) some separate source of "structured data" for the modern template system to use.

It seems reasonable (and possibly low) for a 90 days estimate to extract the content from the variety of versions of static page, structure it, and then publish it in a more modern fashion.

It's all very well to say they should have used a more efficient system from the start, but "the start" in this case is 1996, which is the wild west in terms of best practices.

[+] eitanmk|11 years ago|reply

As a disclaimer, this falls under the "Back then" section of the talk which is an overview of how things used to work. We no longer do things this way.

Publishing a page is running content from a CMS through a templating system. However, the time spent executing templates isn't the only factor in the duration of the "publish". The slides refer to a compilation step (which actually also included a preprocessor step), and includes delegating to a service to copy the resulting page to disk and ensure the write succeeds for all data centers. For data consistency and system monitoring, we essentially treat that entire process as an atomic action and wait for all parts to finish. Additionally, since "publishing" is a core process for us, we avoid doing massive publishes that might risk the systems involved in the successful publishing of current articles. So increasing the number of these for the sake of pushing code is considered too risky. Yes, there are ways of mitigating that risk, but dealing with this legacy problem once and for all is a better path forward than scaling up this solution.

[+] ben336|11 years ago|reply

"From the beginning" in this case going back to the mid 90s. Pretty sure "Modern template systems" have mostly been written since then.

[+] Zizzle|11 years ago|reply

It also seems like a task with inherent parallelism.

Upload the corpus and throw a bunch of cloud instances at it.

[+] hyperpape|11 years ago|reply

The point about elements on the page shifting around is huge. I've personally dreamed of a browser change that would make reflows that are out of the current browser viewport not change your position on the page. If you have a slow connection or view certain types of content (liveblogs, etc) this can become a huge pain.

Just thinking about it raises all sorts of questions about whether the browser/rendering engine can actually reliably know that information, but it doesn't mean I can't dream.

[+] malyk|11 years ago|reply

If you have fixed ad sizes and known locations you could make those spaces empty boxes of the correct size and then asynchronously fill them with data. I haven't tried it, but it seems like it should work.

[+] stdbrouw|11 years ago|reply

I can imagine static pages getting really annoying at this scale, and it also seems like a no-brainer to have your content in a database... but the nerd in me did think "page rendering can be trivially parallelized – why not throw some map/reduce at it?"

[+] jamessantiago|11 years ago|reply

I've been really impressed with the quality of new york times posts as of late. The post "Norway the Slow Way" posted here a few days back was impressive in its use of a variety of frontend display techniques to tell a single story. Even their web console output had some neat ascii art and a hiring call to interested developers.

[+] gizzlon|11 years ago|reply

http://www.nytimes.com/interactive/2014/09/19/travel/reif-la...

[+] ljosa|11 years ago|reply

I just found that annoying. I simply wanted to read the text.

[+] eitanmk|11 years ago|reply

Hello Hacker News.

I'd first like to say that this is the deck from a presentation at Velocity NY last week. Like most other talks, separating the slides from the presenter can make interpreting the context difficult. I did try to make an effort to have my slides provide useful information without me presenting them, but I acknowledge that I may not have done enough in that regard. I also received feedback from people present that there were too many bullet points and my font was too small. Can't please everyone I guess. But if you have a link to what you consider the "perfect" slide deck where unambiguous context is maintained without video of the talk, I'd love to study it in order to improve.

Other replies will be directed at the specific comment thread.

[+] x110dc|11 years ago|reply

Is there a video of your presentation? I'm interested in watching it.

[+] DanielBMarkham|11 years ago|reply

Static pages are a barrier to scaling if you have a bunch of other stuff tied in with them Stuff like HTML macros, CSS, and so forth.

My physical NYT copy from 1980 is fine. It was "published" this way, and it stays this way.

What we're really saying is that if you want to go the static route, you can't go half-way: everything that _is_ the page gets deployed in one file. I doubt very many people who think they have static pages actually do.

[+] andreasvc|11 years ago|reply

What is WPO?

[+] hawtshot|11 years ago|reply

Web Performance Optimization

[+] gulbrandr|11 years ago|reply

I was wondering the same thing, going back in the previous slides, trying to find the definition of it.

[+] vkb|11 years ago|reply

What really strikes me here, aside from the technical aspects, is the note on p. 21 about how the project was supported from the top because SEO was lagging as a result of site load time, and this line especially: "NYT became an e-commerce site since the last redesign."

Once you are focusing on e-commerce and SEO as an executive team, are you still committed to journalism?

[+] acdha|11 years ago|reply

If you're selling subscriptions, wouldn't that leave you more committed to the journalistic quality readers want rather than letting advertisers dominate that discussion?

[+] eitanmk|11 years ago|reply

Why must it be either-or? The web is another medium for the journalism. There's no reason to assume performance was of greater importance than our core mission. The point of that slide is to explain why this redesign had a performance goal at all. Perhaps the slides don't explain this point well, but I think you're reading too much into it.

[+] akgerber|11 years ago|reply

Perhaps off-topic, but recently it's appeared that NYT pages have had some sort of JS memory leak when left open for a long time in Chrome.

[+] ck2|11 years ago|reply

They were keeping a million static pages on disk without any templating?

Whoa.

[+] unknown|11 years ago|reply

[deleted]

[+] unknown|11 years ago|reply

[deleted]

[+] untilHellbanned|11 years ago|reply

What was so surprising about the path? Wasn't clear from the deck.

[+] jrochkind1|11 years ago|reply

That they needed to step away from "everything possible async loaded" in order to avoid having the page move around in front of the user's eyes, and that this resulted in 'objectively' slower page loads (more time to DOMReady), but an actually increased perception of performance and load speed from users.

At least I think that's what they were saying. Always challenging to deal with a slide deck meant to be presented by a human, but without the human doing the presentation.

[+] sp332|11 years ago|reply

Really? Each surprise gets a whole slide in bold text all to itself. #1. A lot of static pages are a barrier to optimization. #2. Performance increase demanded as part of redesign. #3. Sometime you have to slow down to seem faster.

#1 really did surprise me, because I had always assumed that serving static pages would be really fast. I guess I never thought about sites with millions of pages.

33 comments