
Steve Huffman on Lessons Learned at Reddit

100 points| natsel | 15 years ago |thinkvitamin.com

24 comments

[+] shazow|15 years ago|reply
I'm always reluctant about going completely schemaless. Reminds me of the blog post by FriendFeed about how they use MySQL, highly recommended: http://bret.appspot.com/entry/how-friendfeed-uses-mysql

I feel like there needs to be a better middle ground: having some schema, but being able to augment it easily with metadata that you're not querying against (yet), then later extracting that metadata into queryable columns.

I wrote a post outlining some ideas of how to do this: https://github.com/shazow/everything/blob/master/idea/arbitr...

I've only implemented bits and pieces of this in practice, but it has been a huge convenience so far.
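For anyone who wants to see the shape of the idea, here is a minimal sketch. It is not taken from the linked post or from FriendFeed's design; the `links` table, the `meta` JSON column, and the `add_link` and `promote` helpers are all made up for illustration. Fixed columns hold what you query today, a JSON blob holds everything else, and `promote` later pulls one key out of the blob into a real indexed column.

```python
import json
import sqlite3

# Hypothetical schema: one real column (title) plus a JSON blob for
# metadata that nobody queries against yet.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " meta TEXT NOT NULL DEFAULT '{}')"
)


def add_link(title, **meta):
    """Insert a row; any extra keyword arguments go into the JSON blob."""
    cur = conn.execute(
        "INSERT INTO links (title, meta) VALUES (?, ?)",
        (title, json.dumps(meta)),
    )
    return cur.lastrowid


def promote(field, sql_type="TEXT"):
    """Move one metadata key out of the blob into its own indexed column.

    SQL identifiers cannot be bound as parameters, so `field` must be a
    trusted, hard-coded name, never user input.
    """
    conn.execute(f"ALTER TABLE links ADD COLUMN {field} {sql_type}")
    rows = conn.execute("SELECT id, meta FROM links").fetchall()
    for row_id, meta_json in rows:
        meta = json.loads(meta_json)
        value = meta.pop(field, None)  # rows without the key get NULL
        conn.execute(
            f"UPDATE links SET {field} = ?, meta = ? WHERE id = ?",
            (value, json.dumps(meta), row_id),
        )
    conn.execute(f"CREATE INDEX idx_links_{field} ON links ({field})")


add_link("Lessons learned at Reddit", domain="thinkvitamin.com", nsfw=False)
add_link("How FriendFeed uses MySQL", domain="bret.appspot.com")

# Later, once we actually need to query by domain:
promote("domain")
rows = conn.execute(
    "SELECT title FROM links WHERE domain = ?", ("bret.appspot.com",)
).fetchall()
```

The backfill here runs in one pass, which is fine for a toy table; on a large table you would presumably want to batch it.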

[+] joshu|15 years ago|reply
I think this is exactly right. Have a document store and a (separate?) index store.
[+] iamclovin|15 years ago|reply
James Golick implemented this idea in Ruby last year: https://github.com/jamesgolick/friendly

We used Friendly at my previous company, wego.com, and it worked quite well, although I don't know whether they are still using it. (Friendly doesn't look like it's in active development.)

[+] runningdogx|15 years ago|reply
I don't understand the fetish with caching every variation for every user.

Here's a thought: dynamic apps should render a page exactly one way in html, and rely on javascript and cookies to post-process the site so it APPEARS to be customized for that particular user. That includes admin widgets.

That particularly applies for displaying how many minutes ago something was generated. Serve it with a date-time format, use javascript to post-process into "x seconds ago" or "x minutes ago".
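A rough sketch of that post-processing step follows. In a browser this would be JavaScript; it is written in Python here only to show the logic, and the `relative_age` name and the unit thresholds are assumptions. The point is that the cached page carries one absolute timestamp and the "x minutes ago" text is computed at view time.

```python
from datetime import datetime, timedelta, timezone


def relative_age(generated_at, now):
    """Turn an absolute timestamp into 'x seconds/minutes/hours ago'."""
    seconds = max(0, int((now - generated_at).total_seconds()))
    for unit, size in (("hour", 3600), ("minute", 60), ("second", 1)):
        if seconds >= size:
            count = seconds // size
            plural = "" if count == 1 else "s"
            return f"{count} {unit}{plural} ago"
    return "just now"


# The server embeds this value once; every viewer formats it locally.
served = datetime(2011, 2, 26, 12, 0, 0, tzinfo=timezone.utc)
label = relative_age(served, served + timedelta(minutes=5, seconds=30))
```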

If someone isn't logged in, or isn't an admin, and they hack their javascript to display user or admin stuff, who cares? The user/admin requests sent to the webserver won't succeed anyway, because they rely on having an admin session cookie.

If there's meaningful content, rather than just UI stuff, that users or admins get to see, then you have to cache that separately. But you can even load it dynamically with js, so the publicly visible (cached) content can still be used and you cut down the amount of stuff your server has to auto-generate. It can cache the separate pages (xml, json, whatever) that serve the logged-in user content, as well.
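To make the split concrete, here is a toy sketch under stated assumptions: an in-process dict stands in for the real cache, and `get_page`, `get_user_overlay`, and `delete_post` are hypothetical names, not anything reddit runs. The page is rendered once per path no matter who asks, the per-user bits travel as a small JSON document, and the server still checks the session on every privileged request.

```python
import json

page_cache = {}   # path -> rendered HTML, shared by every visitor
render_count = 0  # only here to show how often we really render


def render_public_page(path):
    """Stand-in for the expensive render; identical for all users."""
    global render_count
    render_count += 1
    return f"<html><body data-path='{path}'>public content</body></html>"


def get_page(path):
    """Serve the single cached copy, rendering it on first request."""
    if path not in page_cache:
        page_cache[path] = render_public_page(path)
    return page_cache[path]


def get_user_overlay(user):
    """Small per-user JSON that client-side script layers onto the page."""
    if user is None:
        return json.dumps({"logged_in": False})
    return json.dumps({
        "logged_in": True,
        "name": user["name"],
        "is_admin": user.get("is_admin", False),
    })


def delete_post(session_user, post_id):
    """Privileged action: the check lives on the server, not in the UI."""
    if not session_user or not session_user.get("is_admin", False):
        return 403
    return 200


alice = {"name": "alice", "is_admin": True}
bob = {"name": "bob"}
html_for_alice = get_page("/r/programming")
html_for_bob = get_page("/r/programming")
```

Someone who edits their local script to show the admin widgets still gets a 403 from `delete_post`, which is the "who cares?" argument above.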

[+] apu|15 years ago|reply
Note: the talk is from May 2010.

I wonder whether, if Steve (or rather, jedberg or someone else at reddit) were to give the talk today, 'memcache' and 'memcachedb' would both be replaced by Redis.

[+] jedberg|15 years ago|reply
We replaced memcachedb with Cassandra a while ago, because memcachedb pretty much hit a wall at some point.

As for replacing memcached, I'm certainly open to it, but from what I've read, the performance of memcache for what we use it for is better than redis.

[+] spez|15 years ago|reply
Maybe? There's something to be said for memcache's simplicity, however.
[+] p90x|15 years ago|reply
Another lesson could be: "don't make a site that appeals to people who use ad-block. revenue won't keep up with demand."
[+] patrickod|15 years ago|reply
Many users (myself included) add an exception to adblock for reddit. I don't mind helping their ad revenue when the ads aren't that intrusive.
[+] code_duck|15 years ago|reply
That's a good basic overview of what it takes to keep a site the size of Reddit (a year ago) afloat. I need to think more about caching, personally.

Reddit has done a great job of serving a massive amount of traffic, given the size of their staff especially.

[+] dwc|15 years ago|reply
Watching the video made me feel a little uncomfortable. I come away with the impression that they almost, but not quite, really understood the important lessons.

Still, it was well worth watching and I'm glad Huffman decided to go there.

[+] natsel|15 years ago|reply
Why did they use Python in the first place? Reddit is still kind of unstable.
[+] simonw|15 years ago|reply
Are you trolling? Site stability issues very rarely have anything to do with the underlying programming language, unless you're using some experimental language that no one else is using for web development (and even then, Arc seems to be working pretty well for HN).
[+] Pewpewarrows|15 years ago|reply
It's half a dozen employees constantly trying to keep up with what is now a site with 1 billion page views a month. From what I understand they're carefully trying to balance getting new hires (which Conde Nast now lets them do) with keeping the damn ship afloat.
[+] pjscott|15 years ago|reply
In the very first place, actually, they wrote the initial version in Lisp. The Python code was a later rewrite.
[+] curtis|15 years ago|reply
Reddit was stable for a long time. It's really only been the last few months (~6?) that they've been suffering serious service outages.
[+] irahul|15 years ago|reply
The site being unstable doesn't mean Python is to blame. Perhaps you can elaborate on why you think so.
[+] random42|15 years ago|reply
I don't think reddit's stability is as bad as it looks. It's just that Reddit's community is pretty vocal about it. IIRC, they have availability in the high 90s (maybe jedberg can confirm).