
Analyzing IMDB Data on 90,000 TV Series

23 points| dfkoz | 11 years ago |dfkoz.tumblr.com | reply

5 comments

[+] jonnathanson|11 years ago|reply
Some very interesting analysis, and in some ways, it raises even more interesting questions about the sampling at IMDB. Who is filling out IMDB reviews and ratings? How many segments are there among the raters, and to what extent are they truly representative of the general, TV-watching population?

Looking at the list, for instance, you see that Dragon Ball Z shows up as one of the highest-rated shows of its decade. Now, perhaps a truly randomized, representative sample of the population really would rank Dragon Ball Z as one of the greatest TV series ever made. But my gut tells me that's unlikely. More likely is that a big cluster of fans of the show have self-selected into rating it highly. What you're actually seeing, in this case, is the power of a very vocal, very rabid niche. [1]

This same effect, in reverse, can artificially deflate the ratings of shows and movies that a vocal minority of the population rallies against. This appears to be the case with Gunday, one of the lowest-rated movies on the site, which David Goldenberg of FiveThirtyEight discusses in a very interesting post: http://fivethirtyeight.com/features/the-story-behind-the-wor...

Finally, we need to filter the data set by the number of ratings each show has. Ice TV, a show from 1996 with all of 6 ratings on its IMDB page, probably wouldn't pass our gut check for "Best of the 90s," alongside shows like The Sopranos and Friends, which probably got ratings more or less representative of public opinion.
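To make the size filter concrete, here is a minimal Python sketch. It is not the author's method: the `Show` record, the `min_votes` cutoff, the `prior_mean` and `prior_weight` parameters, and all of the sample numbers are made up for illustration. It shows two options: a hard cutoff on the number of ratings, and a softer weighted average that pulls shows with few ratings toward an assumed overall mean.

```python
from dataclasses import dataclass


@dataclass
class Show:
    title: str
    rating: float  # mean user rating on a 1-10 scale
    votes: int     # number of ratings behind that mean


def filter_by_votes(shows, min_votes):
    """Hard cutoff: keep only shows with at least min_votes ratings."""
    return [s for s in shows if s.votes >= min_votes]


def shrunk_rating(show, prior_mean, prior_weight):
    """Weighted average of the show's own mean and an assumed overall mean.

    prior_weight acts like a number of 'phantom' votes at prior_mean, so a
    show with few ratings stays close to prior_mean, while a show with many
    ratings stays close to its own mean.
    """
    total = show.votes * show.rating + prior_weight * prior_mean
    return total / (show.votes + prior_weight)


# Hypothetical data, not taken from IMDB.
shows = [
    Show("Tiny Show", 9.8, 6),
    Show("Popular Show A", 9.0, 100_000),
    Show("Popular Show B", 8.5, 250_000),
]

kept = filter_by_votes(shows, min_votes=1_000)

ranked = sorted(
    shows,
    key=lambda s: shrunk_rating(s, prior_mean=7.0, prior_weight=1_000),
    reverse=True,
)
```

With these made-up numbers, the hard cutoff drops "Tiny Show" entirely, while the weighted version keeps it but ranks it below both popular shows. Neither approach addresses the clustering problem (a large, self-selected fanbase still has plenty of votes); they only handle the small-sample case.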

This is a fascinating subject and a really compelling start. But the next step in this sort of analysis is developing a reasonable methodology for parsing and reconciling quality, quantity, and clusters. To give the author due credit, he expresses the appropriate amount of skepticism or reservation when showing results. And his observations about genre and overall trends are fascinating to see.

[1] Of course, this raises some intriguing questions about what, exactly, a "quality" show is. A show that has a large, thriving fanbase is a great show to that fanbase. And to the extent it's attracted a sizable, rabid following, it can probably be said to be a great show in general -- even if it's not everyone's cup of tea. And that's probably fine. Obviously not every show is going to appeal to everyone in the population. Nevertheless, strong-niche shows confound our idea of "generally" great shows and make normative comparisons very hard to do, unless we start factoring for things like strength of sentiment.

[+] BenderV|11 years ago|reply
"Using a simple scraper written in Ruby, I was able to grab data on the >90,000 TV shows in the IMDB database"

I know it may be a bit harder to use than a crawler (sadly), but IMDB gives away its dataset for free (under conditions): http://www.imdb.com/interfaces

(IMDB strictly forbids crawlers)

Edit: nice analysis! Thanks!

[+] yaeger|11 years ago|reply
True, but I think the rationale behind that is that imdb doesn't want someone creating software that gets used by a lot of people and that implements such a crawler.

I doubt, although I can't say for certain, that imdb cares much about someone who made a scraper to gather data for a one-time aggregation like this.

Of course, for a use case like OP has, the static data imdb offers would also have sufficed, I'd say. Of course, that data is not as "up to date" as the data you can scrape off of the web pages, but it would have been sufficiently "up to date" for such an analysis, I think.

[+] stephenaturner|11 years ago|reply
Interesting. Though, as mentioned, the ratings of shows are largely skewed by time. Because no one was reviewing shows from the 50s-80s at the time of airing, they are all viewed through nostalgia and history, so the ratings skew toward certain shows and the dreck never gets reviewed at all.

When you get into the 90s and beyond, shows were being viewed and reviewed contemporaneously, so everything was covered, across a greater range of quality, and overall ratings for the period actually went down...

Nice analysis of the available data anyway. Also interesting to see the appearance of certain non-English language shows as well.

[+] gggggggg|11 years ago|reply
What happened to The West Wing...?

That aside, it's hard to compare years with such a service: as hard-core fans come on board to IMDB, this will affect the results for new shows. As he says in the link, it's hard to compare them to shows from years ago.