"A few decades ago [...] if you'd wanted to use a hash table, if you even knew what a hash table was, you'd have to write your own."
I agree with you about the quality of the BSD code, and I'm glad such great code is readily available. But a) I had definitely been programming before 1990 (the copyright date on that BSD code); b) back then, hash tables were far less tightly integrated into programming languages than they are today, and fewer people knew about them; and c) if you want to be pedantic, hash tables have been around since 1953 (http://en.wikipedia.org/wiki/Hash_table#History), well before most programming languages still in use today. They are, however, much more commonly understood, and in ubiquitous use, today.
'And if you'd wanted to use a hash table, if you even knew what a hash table was, you'd have to write your own.'
And just for the record, Common Lisp has had hash tables since 1984 (and I'd guess Maclisp had them before that), but earlier Lisp dialects had things like plists and alists.
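For readers who never had to do it, "writing your own" meant something like the following: a minimal separate-chaining hash table. This is a sketch of the general technique, not the BSD implementation.

```python
# Minimal separate-chaining hash table: a sketch of what "writing your
# own" meant before languages shipped one built in.
class HashTable:
    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # Pick a bucket by hashing the key modulo the bucket count.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # overwrite an existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: chain it in the bucket

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

t = HashTable()
t.put("lisp", 1984)
t.put("lisp", 1958)                      # overwrites the earlier value
print(t.get("lisp"))                     # -> 1958
```

Collision handling, resizing, and deletion are where the real work (and the real bugs) lived; this sketch skips resizing entirely.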
This post seems to miss the point that the major hurdle faced by a 14-year-old trying to learn how to program (finding the right libraries, tools, etc.) is solved by Google itself.
I thought Google's real innovation was their technique of using the interconnectedness of the web to determine the true value of content. So rather than only looking at the content of a page, they also look at the content from incoming links to that page. What package out there implements the algorithms for this, and is well-documented and trivial enough to use that a 14-year-old can understand them?
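For what it's worth, the core link-analysis idea is small enough to sketch. Here is a toy PageRank by power iteration: an illustration of the published algorithm, not Google's implementation (the damping factor and iteration count are conventional choices, not anything from this thread).

```python
# Toy PageRank via power iteration. Every page that appears as a link
# target must also appear as a key of `links` (an assumption of this sketch).
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:             # dangling page: spread rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
            else:                        # split rank among outgoing links
                for target in outlinks:
                    new[target] += damping * rank[page] / len(outlinks)
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))         # "c" collects the most link weight
```

The hard part in 1998 was never these twenty lines; it was running them over tens of millions of pages.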
As far as I can tell, this article says 1) Shucks, hardware sure is cheap these days! and 2) There sure is a lot of software out there that you can mash together! Those things make it easier to start a company, but they don't provide the essential insights that make that company truly revolutionary.
I don't think the point is that the breakthrough idea of today is within the means of some real fourteen-year-old. The breakthrough idea of today is something that today's concepts, economics, and best practices are NOT well-suited to handle; otherwise it wouldn't be much of a breakthrough. The amazing thing is how quickly something has gone from the realm of obsessed genius to the realm of the mundane. It goes back to Whitehead's observation that, "Civilization advances by extending the number of important operations which we can perform without thinking of them."
"Without thinking" is an exaggeration for some of the items in the post, but consider the problem of storing 200GB of data. "Um... on a hard drive?" "And how will you finance that?" "Gee, maybe with the money in my wallet right now? When do these questions get hard?" Shucks, hardware sure is cheap these days! Problems simply disappear from being challenges to not requiring any thought at all. The exponential increase in the power of affordable hardware may not be surprising, but to me it seems worth thinking about even though it's been normal and predictable my whole life.
I've said this before; I'll try to sum it up as succinctly as possible:
Google's innovation was threefold: better search algorithms (PageRank), which used the implicit data in the interconnectedness of the web to judge the relevance and rank of search results; revolutionary data-center operations (commodity hardware with heavy reliance on automation); and state-of-the-art software engineering (sharding, MapReduce, etc.). The last two enabled the first to run efficiently on a rather small set of hardware and to scale up speed just by adding more machines. The end result was better results, delivered faster, and at lower cost to Google.
This led to a much better product for end users (better and faster results) and allowed them to acquire a huge portion of search market share quickly. And the low cost of operations meant they could better take advantage of advertising: a lower cost per search means that even lower revenue per search can be profitable.
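The software-engineering half is easy to illustrate in miniature. The MapReduce pattern is just "map over input shards, group intermediate pairs by key, reduce each group"; the sketch below runs it in one process, whereas the whole point of the real system was distributing those three phases across machines.

```python
# The MapReduce pattern in miniature, over word-count: map, shuffle, reduce.
# Single-process illustration only; real MapReduce distributes each phase.
from collections import defaultdict

def map_phase(shard):
    for word in shard.split():
        yield word, 1                    # emit (key, value) pairs

def reduce_phase(word, counts):
    return word, sum(counts)             # combine all values for one key

def mapreduce(shards):
    groups = defaultdict(list)           # the "shuffle": group by key
    for shard in shards:
        for key, value in map_phase(shard):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

shards = ["the web is big", "the web is fast"]
print(mapreduce(shards))   # {'the': 2, 'web': 2, 'is': 2, 'big': 1, 'fast': 1}
```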
What package out there implements the algorithms for this, and is well-documented and trivial enough to use that a 14-year-old can understand them?
Nutch[1].
Nutch doesn't deal with modern web spam particularly well, but I'd say it matched early Google pretty well. Specifically, it implements PageRank, has a reliable web crawler, and has a web-scale data store.
[1] http://nutch.apache.org/about.html
even if you'd had the same brilliant insights into the graph structure of the web when they did, you most likely would have failed, because it was prohibitively expensive (the cost in the article is probably underestimated by orders of magnitude). it's simply a fact that:
1) getting the data, 2) computing the eigenvector of a large matrix, and 3) serving that data to users wasn't cheap in 1998. it's comparatively dirt cheap today.
not to diss larry and sergey's impressive achievement - they were brilliant and they pulled it off - but i think back then the game was so costly that a lot of brilliant people never made it to the starting line. it's cool to see that it's become a much more level playing field now. i'm curious what cool stuff we missed out on because of the people who didn't make it to the starting line!

The real message is that servers are cheap, albeit brought forward in a long, vague buildup, and it's hardly novel information.
"By the end of 1998, Google had an index of about 60 million pages"
Sounds like a marvelous challenge. Anyone have other similar "technological frontier then, high-school science fair project now" challenges? The OP notes BioCurious as one. A major factor in education is walking kids through a subject from basic principles to the state of the art, recreating historical milestones along the way.
Content publishing: Weekend project. Rails, memcached and CloudFront and you're done.
IM and Buddy Lists: 1.5 million simultaneous users doing n^2 pub/sub-type distributed transactions.
Mail: 4,000 emails per second with live unsend and recipient read/unread status. I think PostgreSQL tops out in the millions of rows per second nowadays.
Web caching/acceleration: pick your favorite proxy solution and configure it.
Single sign-on: Form strategic partn-- Hey, you said technical challenge, not political.

Opening a web shop. Building robots (at today's kids' level). Designing really complex and fast digital circuits (using FPGAs and IP blocks). Building a global, scalable, and complex database application (using something like MS LightSwitch).
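The IM and buddy-list item above is the one with real protocol meat. The core logic is trivial, as the sketch below shows; what made presence hard at 1998 scale was fanning it out to ~1.5 million concurrent users across many servers, not the data structure.

```python
# Minimal in-memory publish/subscribe core, the heart of IM presence.
# The logic is the easy part; cross-server fan-out is the hard part.
from collections import defaultdict

class PubSub:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver to every subscriber of the topic, in subscription order.
        for callback in self.subscribers[topic]:
            callback(message)

bus = PubSub()
seen = []
bus.subscribe("alice/presence", seen.append)
bus.publish("alice/presence", "online")
print(seen)                                    # ['online']
```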
For extremely contrived definitions of "1998's Google", yes. But if all it took was a pile of servers and hard drives for 1998's Google to succeed, then a lot more companies would have done so as well. It takes more than that to build a company.
I was writing this more in the sense that kids at BioCurious (and the DIY Bio Movement in general) are doing electrophoresis to transfer DNA from glowing jellyfish to bacteria. This is just a few (two?) years after someone got a Nobel prize for that.
That's progress. If stuff that used to be hard falls into kids' hands, you're gonna see impressive stuff happening.
However, I fully agree that it takes more than that to build a company. (Also, I wouldn't try to compete with 2012 Google using 1998 technology.)
I think the article was more about "Google the search technology" rather than "Google the company". It wasn't about startups or entrepreneurship but rather about technological progress.
Google's value doesn't come so much from search any more (it's good at it, though there are now grumblings from the Googluminati), but from its advertising network (and the concomitant connections and contracts associated with it), and the value-added services built on top of Google's underlying search technology, to the extent that those leverage Google's base tools and/or expertise.
The chinks in Google's armor are starting to show though:
- Cheap and/or federated search is now available.
- OpenStreetMap is providing mapping data (and APIs) to rival Google Maps.
- There's a lot of grumbling going on over privacy especially in the social and mobile spaces. Neither has quite fully coalesced, but if you look at the volatility in both spaces (consider what the largest social network and most popular smartphones were 5 years ago vs. today), things could again change quickly.
- Most tellingly, trust in Google to "not be evil" is eroding, rapidly in some quarters.
Google is valuable because it dominates advertising and has the users to monetize that. Chip away at the user base and it could find its hegemony starting to fail.
The fact that it's very, very cheap to replicate Google's underlying tech helps with this. DuckDuckGo is essentially a one-man shop. Yes, it has a very small fraction of Google's traffic, but it compares favorably with everyone else who's tackling Google, including Microsoft's Bing, with ... more than one man-equivalent last I checked.
A pile of servers and a special algorithm. Now that the algorithm is published, rather than yet-to-be-invented, it would be very possible. So "Dad's credit card and a few late nights reading papers".
I think the heart of Google (at least at the get-go) was PageRank. Sure you had to write a web crawler, but that wasn't the magic sauce that made Google's search so good. I don't think most 14 year olds could understand the math behind PageRank, much less derive it from scratch.
I'm not sure whether this is applicable, but my main objection to this article is that the numbers don't add up. How many Ph.D. candidates do you know who are granted a budget of $10k+ to do their research? Surely something else must have been going on to shrink the expenses to a more acceptable amount.
Then again, according to the Wikipedia page, the original BackRub was conceived when the web was only 10 million pages large; $2,000 is considerably more acceptable for a Ph.D. project.
"The SDLP is notable in the history of Google as a primary sources of funding for Lawrence Page's and Sergey Brin (Brin was also supported by a NSF Graduate Research Fellowship) during the period they developed the precursors and initial versions of the Google search engine prior to the incorporation of Google as a private entity"
This included a $4,516,573 NSF grant (which didn't go to Larry and Sergey in full, but probably helped their project's infrastructure quite a bit).
http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=9411... http://en.wikipedia.org/wiki/Stanford_Digital_Library_Projec...
On the expense side, I've probably actually underestimated by orders of magnitude: bandwidth wasn't cheap back then, and the storage requirements were probably significantly higher.
tl;dr version: Computers and disks are a lot cheaper now.
Basically, the article boils down to this: what counted as a 'cluster' in 1998 is a single system in 2008, and what used to take hundreds of disk drives to store, you can store on one today.
Not particularly deep, but useful to think about from time to time. There is a quote, perhaps apocryphal, which says:
"There are two ways to solve a problem that would take 1,000 computers 10 years to solve. One is to buy 1,000 computers and start crunching the numbers; the other is to party for nine years, use as much of the money as you need to buy the best computer you can at the end of the ninth year, and compute the answer in one day."
The idea is that computers get more powerful every year, and that in 10 years they will be more than 1000x as powerful as the ones you would have started with, so one machine can solve the same problem.
Of course, they haven't been getting more powerful as quickly as they once did, but the amount of data you can store per disk has continued to outperform.
The point is that if you are designing for the long haul (say, 10 years from now), you can probably assume a much more powerful compute base and a lot more data storage.
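The arithmetic behind the quote is easy to check, under the generous assumption that affordable compute doubles every year (the classic 18-month figure gives a much smaller multiple):

```python
# Growth arithmetic behind the "more than 1000x in 10 years" claim.
# Assumes exponential growth; the doubling periods are assumptions.
def growth(years, doubling_period_years=1.0):
    return 2 ** (years / doubling_period_years)

print(growth(10))        # 1024.0 -> "more than 1000x" needs yearly doubling
print(growth(10, 1.5))   # ~101x at an 18-month doubling period
```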
That's not even close to what he's saying - I thought that was actually a rhetorical weakness, to tell the truth.
What he's saying is that the existence of the cloud, and library advances such as MapReduce and public APIs, lower the bar for writing new software to an extent that's hard even to comprehend.
Every time I get a module from CPAN, I still get a shiver down my spine, remembering trying to do new and interesting things in the '80s and early '90s and, every single time, ending up trying to build a lathe to build a grinder to grind a chisel to hack out my reinvented wheel.
> what used to take hundreds of disk drives to store, you can store on 1 today
Though, given that hard drives very much do not obey Moore's Law, a well-designed 1998 solution with hundreds of disks may well have far faster IO than the 2012 one-disk solution.
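Rough numbers make the point. The per-drive figures below are assumptions for illustration (a late-'90s drive streaming on the order of 10 MB/s, a 2012 drive around 150 MB/s), not measurements from the article:

```python
# Aggregate sequential throughput: hundreds of slow 1998 disks vs. one
# fast 2012 disk. Per-drive speeds are rough assumptions, not measurements.
disks_1998, mb_per_s_1998 = 200, 10     # ~10 MB/s per late-90s drive
disks_2012, mb_per_s_2012 = 1, 150      # ~150 MB/s for one modern drive

agg_1998 = disks_1998 * mb_per_s_1998   # 2000 MB/s across the array
agg_2012 = disks_2012 * mb_per_s_2012   # 150 MB/s

print(agg_1998 / agg_2012)              # the old array wins by ~13x
```

The same logic applies even more strongly to seeks: spindles parallelize, a single disk does not.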
Where does the 200GB figure come from? I was quite busy building a web crawler at the time, too, and I can distinctly remember that our crawlers had about 17TB of storage. So let's say we had crawled something like 15TB of data to get a meaningful sample of the web.
I agree with the gist of the blog post, though.
If you think a 14 year old could build something as complicated as 1998's google.com, think of what an adult with training could do at the same time with the same resources. As technology advances, so do our expectations.
The author makes a great point about technology advancing so quickly that the bleeding edge of just yesterday looks cute compared with what we have now, and about how cheap commodity server hardware has become.
Unfortunately, he had to use the 14-year-old girl analogy and exaggerate the ease with which we could build circa-'98 Google today. Now his whole point is lost to the click-clacking of a thousand pedants' keyboards. Guys, this isn't about 14-year-old girls, nor is it about Google per se, so much as it is about the fast pace of tech innovation, the ease and cost of acquiring infrastructure, and, to a lesser extent, a tiny bit about how totally spoiled we are compared to what we had to work with 14 years ago.
The stuff about Google and 14-year-old girls is just a literary device (along with some mild hyperbole) to help illustrate his point, which so far is being completely missed. Come on, guys: is this Hacker News or Pedantic Literary Scholar News? Focus on the point, not little Google girls. PLSN does have a nice ring to it, but no, we're not on PLSN. At least not yet.
So, just a gripe about your startup plug at the end of the article.
Look, I don't care whether your product cures cancer, dispenses oral sexual favors, and mints pure gold doubloons-- I will not give you my email address without a damned good reason.
Every single goddamn link on your page brings me to an "Enter your email here" prompt, except for the company tab, which brings me instead to a pile of vapid marketing bullshit.
Flotype Inc. is a venture-backed company building a suite of enterprise technology for real-time messaging. Flotype takes a unique approach by building developer-friendly technologies focused on ease-of-use and simplicity, while still exceeding enterprise-grade performance expectations.
Flotype licenses enterprise-grade middleware, Bridge, to customers ranging from social web and software enterprises to financial and fleet management groups.
What does that even mean? Are you using carrier pigeons? Dwarves? Cyborgs? UDP? ZeroMQ? Smoke signals?
You don't even tell me how my email is going to be used.
Fix your shit.
^ This kind of post just drags HN down, and it's the kind of thing jacquesm was talking about. Seeing something this rude at the top of HN, on a post this guy worked hard on, is probably not what he expected, and it made his lunch taste a little worse today.
There's a time and a place for profanity and verbal hostility, and feedback to a stranger on his website's UX isn't it; the intensity and anger are just dialed wrong. I wish pg would implement a filter for this kind of comment.