We're looking at implementing a tagging system for navigating through a large proprietary datastore of oil and gas well data.
The problem we're running into is scaling conjunction (AND) queries without blowing out our hardware budget: hundreds of tags per item, tens of millions of items, updated daily.
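A conjunction query over tags is commonly answered from an inverted index. Each tag maps to the set (the "posting list") of item ids that carry it, and an AND query intersects those sets, smallest first. The sketch below is an in-memory illustration under that assumption. The `TagIndex` class and the tag names are hypothetical. At tens of millions of items, the posting lists would presumably need to live on disk in a compressed form rather than as Python sets.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory inverted index: tag -> set of item ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, item_id, tags):
        # Record the item under each of its tags.
        for tag in tags:
            self.postings[tag].add(item_id)

    def remove(self, item_id, tags):
        # Drop the item from each tag; delete posting lists that become empty.
        for tag in tags:
            ids = self.postings.get(tag)
            if ids is not None:
                ids.discard(item_id)
                if not ids:
                    del self.postings[tag]

    def query_all(self, tags):
        """Return ids of items carrying every tag in `tags` (an AND query)."""
        if not tags:
            return set()
        lists = []
        for tag in tags:
            ids = self.postings.get(tag)  # .get() does not create empty entries
            if not ids:
                return set()  # an unknown tag means nothing can match
            lists.append(ids)
        # Intersect smallest-first so the working set shrinks as fast as possible.
        lists.sort(key=len)
        result = set(lists[0])
        for ids in lists[1:]:
            result &= ids
            if not result:
                break
        return result
```

Intersecting smallest-first means the work is roughly bounded by the rarest tag in the query. Selective tags therefore keep these queries cheap even when common tags have very large posting lists.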
I had seen your comment on the original article and was very interested. At the time I checked your site just in case you had written on the subject. I think this would be very useful.
This matches all my intuitions about how I was going to do this; now I have a cookbook (several recipes, in fact)!
This is why Hacker News is so valuable. Nothing beats experienced advisors.
I tried this method for my app, but it turned out that I needed a lot of fields in their own columns, since I had to use them in WHERE or ORDER BY clauses.
So be careful when choosing which columns to serialize (or JSON-encode) and dump into a single field.
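The sketch below illustrates the trade-off with Python's built-in `sqlite3` module. The `items` table and its fields are hypothetical. `user_id` and `created_at` get real, indexed columns because they appear in WHERE and ORDER BY. `color` is left inside the JSON body, so filtering on it means loading and decoding every row in application code.

```python
import json
import sqlite3

# Hypothetical schema: fields used in WHERE / ORDER BY get real columns
# (plus an index); everything else is dumped into a JSON "body" column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " created_at TEXT NOT NULL,"
    " body TEXT NOT NULL)"
)
conn.execute("CREATE INDEX items_user_created ON items (user_id, created_at)")

items = [
    (1, 7, "2009-02-01", {"title": "first", "color": "red"}),
    (2, 7, "2009-01-15", {"title": "second", "color": "blue"}),
    (3, 9, "2009-02-10", {"title": "third", "color": "red"}),
]
conn.executemany(
    "INSERT INTO items (id, user_id, created_at, body) VALUES (?, ?, ?, ?)",
    [(i, u, c, json.dumps(b)) for i, u, c, b in items],
)

# user_id and created_at are real columns, so the database itself can
# filter and sort on them (and can use the index to do so).
by_user = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM items WHERE user_id = ? ORDER BY created_at", (7,)
    )
]

# "color" exists only inside the JSON body, so every row must be fetched
# and decoded in Python before it can be filtered.
red = [
    row_id
    for row_id, body in conn.execute("SELECT id, body FROM items ORDER BY id")
    if json.loads(body)["color"] == "red"
]
```

The second query is a full scan no matter how the table is indexed. That is the cost of serializing a field you later need to query on.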
Part of this article talks about the index tables they use. The rationale for separate index tables is that they can be created or dropped without locking the main tables. Keeping them up to date takes a little more manual work, though.
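The following is a minimal sketch of the index-table idea. It assumes entities are stored as opaque JSON bodies and that each index is a separate table the application maintains itself. It uses Python's `sqlite3` module. The table names (`entities`, `index_user_id`) and helper functions are hypothetical, and the sketch does not model the locking behaviour of any particular database.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: entities hold an opaque JSON body; each index is its
# own table that the application keeps in sync.
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")


def index_exists(conn, name):
    """True if a table with this name exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def save_entity(conn, entity_id, body):
    """Write the entity, then update the index table if it currently exists."""
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (entity_id, json.dumps(body)),
    )
    if index_exists(conn, "index_user_id"):
        conn.execute("DELETE FROM index_user_id WHERE entity_id = ?", (entity_id,))
        conn.execute(
            "INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
            (body["user_id"], entity_id),
        )


def build_user_index(conn):
    """Create the index table and backfill it from the stored bodies."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_user_id ("
        " user_id INTEGER NOT NULL,"
        " entity_id INTEGER NOT NULL UNIQUE,"
        " PRIMARY KEY (user_id, entity_id))"
    )
    for entity_id, body in conn.execute("SELECT id, body FROM entities").fetchall():
        conn.execute(
            "INSERT OR REPLACE INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
            (json.loads(body)["user_id"], entity_id),
        )


def entities_for_user(conn, user_id):
    """Look up entity ids through the index table instead of scanning bodies."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT e.id FROM index_user_id i JOIN entities e ON e.id = i.entity_id"
            " WHERE i.user_id = ? ORDER BY e.id",
            (user_id,),
        )
    ]
```

The index table is derived entirely from the bodies, so it can be dropped and rebuilt with `build_user_index`. The price is that every write path has to remember to update it, which is the manual work mentioned above.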
I too use a hybrid approach on my site: only columns that aren't specifically referenced in indexes are kept in the serialized attribute hash.
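Here is one way such a hybrid could be organized, sketched in Python. It assumes the attribute hash is serialized as JSON; the comment doesn't say which format is used. A hypothetical `INDEXED_FIELDS` set decides which fields stay as real columns, and everything else goes into the serialized hash.

```python
import json

# Hypothetical set of fields that some index or query references; these stay
# as real columns, and everything else goes into the serialized attribute hash.
INDEXED_FIELDS = frozenset({"id", "user_id", "created_at"})


def to_row(record, indexed=INDEXED_FIELDS):
    """Split a record into (column values, serialized attribute hash)."""
    columns = {k: v for k, v in record.items() if k in indexed}
    attributes = {k: v for k, v in record.items() if k not in indexed}
    return columns, json.dumps(attributes, sort_keys=True)


def from_row(columns, attributes_json):
    """Rebuild the full record from its columns and the attribute hash."""
    record = json.loads(attributes_json)
    record.update(columns)
    return record
```

Promoting a field to a real column then only means adding it to the set and migrating the existing rows. Code that reads whole records through `from_row` does not change.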
joshu | 16 years ago
eserorg | 16 years ago
mnmdev | 16 years ago
bravura | 16 years ago
hackernews | 16 years ago
http://apps.ycombinator.com/item?id=496946
dpapathanasiou | 16 years ago
JoeAltmaier | 16 years ago
SingAlong | 16 years ago
Just my 2c.
technoweenie | 16 years ago
cnu | 16 years ago