top | item 9530766

netnichols | 10 years ago

They probably don't care about that content.

My first guess would be that they snapshot the DOM in the JS tick immediately after window.onload completes. Maybe they have a short pause to let any fast timeouts or callbacks complete, but there's got to be a cutoff at some point (e.g. to stop an infinite wait for pages that continuously update a relative date). Of course, with their own JS engine, I bet they can get really fancy with the heuristics to determine when to take that snapshot.
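To make the cutoff idea concrete, here is a minimal sketch of a timer queue driven by a virtual clock, so that "run timers until the cutoff, then snapshot" never waits on a page that keeps re-arming itself. Everything in it (`VirtualTimers`, `snapshot_after_load`, the 500 ms cutoff, the toy `dom` dict) is made up for illustration and says nothing about how any real crawler is built.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class VirtualTimers:
    """Toy setTimeout queue driven by a virtual clock (milliseconds)."""

    def __init__(self) -> None:
        self.now_ms = 0
        self._queue: List[Tuple[int, int, Callable[[], None]]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO for equal due times

    def set_timeout(self, callback: Callable[[], None], delay_ms: int) -> None:
        due = self.now_ms + max(0, delay_ms)
        heapq.heappush(self._queue, (due, next(self._seq), callback))

    def run_until(self, cutoff_ms: int, max_callbacks: int = 10_000) -> int:
        """Fire timers due at or before cutoff_ms; return how many fired.

        max_callbacks guards against zero-delay timers that re-arm forever.
        """
        fired = 0
        while (self._queue and self._queue[0][0] <= cutoff_ms
               and fired < max_callbacks):
            due, _, callback = heapq.heappop(self._queue)
            self.now_ms = due  # jump the clock instead of sleeping
            callback()
            fired += 1
        self.now_ms = cutoff_ms
        return fired


def snapshot_after_load(cutoff_ms: int = 500) -> dict:
    """Simulate a page's post-onload timers, then copy the 'DOM'."""
    timers = VirtualTimers()
    dom = {"content": "loading...", "age": "0s ago"}

    def fill_content() -> None:
        dom["content"] = "late-loaded text"

    def update_age() -> None:
        # The "relative date" updater: re-arms itself every second, forever.
        dom["age"] = f"{timers.now_ms // 1000}s ago"
        timers.set_timeout(update_age, 1000)

    def too_late() -> None:
        dom["content"] = "changed after the cutoff"

    timers.set_timeout(fill_content, 100)
    timers.set_timeout(update_age, 0)
    timers.set_timeout(too_late, 5000)

    timers.run_until(cutoff_ms)
    return dict(dom)  # the snapshot
```

With the default cutoff, the fast 100 ms callback makes it into the snapshot, the 5 s one does not, and the self-rescheduling updater cannot stall the run because the clock simply stops at the cutoff.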

KMag|10 years ago

Actually, we did care about this content. I'm not at liberty to explain the details, but we did execute setTimeouts up to some time limit.

If they're smart, they actually make the exact timeout a function of an HMAC of the loaded source, to make it very difficult to experiment, find the exact limits, and fool the indexing system. Back in 2010, it was still a fixed time limit.
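A rough sketch of what an HMAC-derived limit could look like, purely as an illustration of the idea and not a description of any real pipeline: the function name `timeout_ms_for`, the secret key, and the 4000 ms base / 2000 ms jitter numbers are all assumptions.

```python
import hashlib
import hmac


def timeout_ms_for(source: bytes, secret_key: bytes,
                   base_ms: int = 4000, jitter_ms: int = 2000) -> int:
    """Derive a per-page time limit from a keyed hash of the page source.

    The same source always maps to the same limit, but without the key
    an outsider cannot predict the limit for a given page.
    base_ms and jitter_ms are hypothetical values.
    """
    digest = hmac.new(secret_key, source, hashlib.sha256).digest()
    # Fold the first 8 bytes of the MAC into the range [0, jitter_ms).
    offset = int.from_bytes(digest[:8], "big") % jitter_ms
    return base_ms + offset


if __name__ == "__main__":
    key = b"hypothetical-secret"
    page = b"<html><script>setTimeout(render, 4500)</script></html>"
    print(timeout_ms_for(page, key))
```

Because any change to the page source changes the MAC, someone probing for the limit by tweaking their page one delay at a time gets a different limit on every attempt.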

Source: executing JavaScript in Google's indexing pipeline was my job from 2006 to 2010.

blumkvist|10 years ago

What about AJAX? Does it load/read/index data after the fact?