When Google got flu wrong

29 points| ananyob | 13 years ago |nature.com | reply

12 comments

[+] calinet6|13 years ago|reply
Here's the actual graph image of the last 3 years:

http://www.nature.com/polopoly_fs/7.8976.1360689365!/image/F...

The data is there, it's correct, it accurately measures how much people are talking about the flu. The fact that it's lined up with past flu seasons is simply a good sign of correlation in the past.

The area between the inflated Google trend and the real number of cases is the amount of hype. It's been talked about in the media and online like it was an epidemic of 10 times the size that it actually was, and this likely had a positive effect on vaccination rates and conscientiousness.

The Google data is still extremely useful as a measure of our collective attention, but the article really fails to give it credit and seems to think that it's failed somehow. It could be further refined, sure, but it's still extremely useful and extremely true. It shows that this year, the flu went viral, and the attention was amplified.

Maybe we need a Google Flu Tracker Tracker? (http://3.bp.blogspot.com/_Otk-knCm-nw/ShT1qAUsWiI/AAAAAAAAAQ...)

[+] davidjhall|13 years ago|reply
I am in Connecticut and there was an article on Gizmodo[1] that said Connecticut was the safest place in the US according to Google's analysis. When everyone then started to search for "flu connecticut" on Google, we became the worst! To me, that showed the flaw of using Google searches to determine flu outbreaks.

[1]http://gizmodo.com/5974671/you-will-not-be-able-to-escape-th...

[+] ChuckMcM|13 years ago|reply
This is exactly right, it only works when it isn't observed by the people it's trying to observe. I had this discussion with one of the project managers when it came out and pointed out the flaw: if you let people know, they could search for it and then that would mess up your graph. They believed they had a good handle on differentiating between people looking for flu information and people looking for Google Trends flu information. Apparently it still needs some work :-)
[+] 127001brewer|13 years ago|reply
Why wouldn't Google use your physical location (determined by your IP Address) to graph trends instead of using, as an example, the "flu connecticut" search term? (Granted, determining your physical location by IP Address is not always accurate.)
[+] jstalin|13 years ago|reply
...or maybe more people are searching for flu information than are going to the doctor to treat it. I had the flu and didn't go to a doctor because I know there's little that can be done.
[+] CodeCube|13 years ago|reply
This is a great point here. I got pretty sick in January, I assume it was the flu ... but I didn't go to the doctor because I know there's nothing they can really do. I just got my rest, kept hydrated, and tried to stay as comfortable as possible with mucinex, advil cold & flu, etc. and in the end, got better.
[+] greenyoda|13 years ago|reply
And a lot of people who are searching may not have the flu at all. It's possible that they have a bad cold and are trying to determine whether they have a cold or the flu by searching for information on flu symptoms. It's also possible that they just want to be prepared in case they do get the flu.
[+] jholman|13 years ago|reply
I think this article discusses the issue fairly, but only if you read what it actually says, instead of just reading one or two sentences and guessing the rest.

Google doesn't claim that GFT replaces epidemiologists, and epidemiologists agree. And the article didn't say otherwise, if you actually read it.

Google, and the CDC, and the article, agree that Google returns data much faster than the CDC does, which is a service of some value. Similarly, Google can sometimes give finer-grained geographic resolution.

Everyone agrees that it would be nice to know about flu trends in countries that don't have good epidemiological analysis. GFT helps with that.

The article isn't saying that GFT is useless, nor spreading FUD, nor is it anti-strange-new-technology. Nor are the epidemiologists saying that. The article is pointing out that GFT this year didn't do as well as it has most years since its inception, and that the modest-but-proud claims of the GFT team are pretty much exactly proportional to GFT's efficacy, rather than being false modesty.

Note that there are other Google Trends published, many with similar benefits and faults. For example, Google's guesses about unemployment rates are released in real time, but are presumably less accurate than government reports. That's a tradeoff.

tl;dr: GFT is useful but not miraculous, and if you read the article carefully, the article says that.

[+] Jabbles|13 years ago|reply
Meh. A bit of a weak article IMO. It just seems to be a bit biased against this "strange new technology".

The main (and interesting) point is that heavy media coverage of flu caused people who weren't ill to search for it, which Google's algorithm misinterpreted.

[+] ananyob|13 years ago|reply
Bit unfair. The article repeatedly says there are promising techniques for crowd-sourcing etc. and that Google Flu Trends has been quite accurate in the past. It also says that a few methods have entered the mainstream. It's a rather good overview of the efforts for tracking flu IMHO.
[+] RyanMcGreal|13 years ago|reply
It seems the slightly increased virulence of the flu this winter triggered an availability cascade that disproportionately increased searches.
[+] danso|13 years ago|reply
This seems a bit FUD, but to be fair, not much more so than the hype given to projects that claim to find insights via Tweet sentiment or your Facebook friend network.

I think traditional researchers should scrutinize new-tech methods applied by Google and others, as their domain expertise is valuable in finding mistakes/discounting assumptions by an algorithm.

But -- and I don't speak from expertise -- I'm thinking that the data that traditional researchers use for these kinds of assessments has usually been very structured and dependent on the reliability and frequency of official reports. Google and machine learning bring a whole new capability of interpreting unstructured, seemingly unrelated data that may consist of a lot of noise but also contains insights that were otherwise impossible to get through the traditional research and data-collecting process.