One of the coolest things about Postgres is how easy it is to make your own datatypes[1]. At previous jobs we've had custom types and aggregates that allowed super fact set operations with comparatively small data storage requirements.
One of the little-noticed (yet) features of the upcoming 9.4 release greatly increases the ability to optimize moving aggregates for certain types of numerical operations, including custom aggregates. See http://www.depesz.com/2014/04/13/waiting-for-9-4-provide-mov... for a better explanation.
If your ORM doesn't let you take advantage of these goodies, maybe it's time you went past ORM? Java's JOOq library is stellar at this kind of raw SQL access stuff, while still being expressive and well-typed:
PostGIS absolutely deserves some attention, as it's one of if not the best GIS DB out there. It felt too big to include in here, but I should at least add some small callout.
PostGIS is awesome... the geometry functions built into vanilla Postgres provide an interesting set of functionality that almost no one seems to know about. points, boxes, circles, intersection/distance/etc functions. Really cool stuff!
Also worth mentioning that a lot of people go for PostGIS when really they just need the earthdistance module. Much more lightweight and very very useful.
Hstore, and the supplied functions for converting to/from records has been very useful for making a "one size fits all" history table.
Currently I'm working with numeric range types to model order quantities and stock quantities, and forecasting which order is going to consume what stock by joining on rows with overlapping ranges, then taking the intersections of those overlaps. Again, Postgres supplies functions and operators for range overlaps, intersections, etc.
In the absence of those datatypes, there'd be a lot more work required to achieve either of these.
The ltree module is really awesome. If you have to do any hierarchical stuff I feel it's much better than nested set implementations. Great performance with gist indexes and high level operators for querying.
I have to add that RhodiumToad is a class act on IRC, and an SQL wizard. I was using the ip4r datatype for something really annoying at work, and eventually needed some hairy SQL. He steered me away from recursive queries into window functions, and wrote a correct query to solve my problem (basically normalizing overlapping IP ranges and some associated metadata with each range) in less than 30 minutes.
I really couldn't say how much time and money he saved me, but it was a lot.
Joe, this is definitely an awesome datatype, what is painful is building it typically or at least it was years ago when I dealt with it. I'm sure it's gotten better but still, I'd love to see this packaged up as a contrib just like hstore.
Why aren't they getting a lot of use? Is it because they're new or ORMs don't have them yet? (Note: This might sound like a false dichotomy but these are the only 2 reasons I know so feel free to add more reasons.)
Hstore is definitely an odd one in that it's an extension so sort of core, sort of not. Other's its likely just that people default to what the ORM does. Hstore actually sees 13% usage across production databases on Heroku, so while something I'd still consider it much lower than what it probably should be.
Because these are Postgres-specific features I think.
I suspect that for a lot of developers, the only database features they use are the ones their ORM of choice supports. The popular ORMs out there tend to be database-agnostic, which means they'll only natively support the lowest-common-denominator database features.
I keep seeing the uuid idea for pkeys, but have yet to see them used by any big site. Last I looked, twitter, facebook are using bigints all over the place. Also, instagram went bigint and blogged about it here http://instagram-engineering.tumblr.com/post/10853187575/sha...
Also, I tried the uuid extension once... It is not well supported, had to make a small change to a c header file to get it to compile on ubuntu, for dev on os x I think I gave up.
The "api userID" for facebook/twitter/etc. is numeric because it's always been numeric (and facebook explicitly doesn't guarantee anything about their lengths IIRC); that doesn't imply anything about what their real pkey is, just that it was numeric back in 2003. It's pretty easy to transform a UUID back and forth into a number.
UUID aren't sortable by order of creation (on a per server basis which is useful in distributed algorithms), and while you'll hear a lot of talk about meteors, lightnings and winning the lottery, you do risk collisions (yes, when it comes to thousands of rps per server, millions of servers, and extended periods of time, you reach ridiculous collision rates even on a 128-bit UUID, thanks to the Birthday paradox).
UUID has no guaranteed properties you can rely on in high volumes.
The only desirable property it has is that it's an easy way out in tutorials.
The better solution is simply longer to explain (have a unique name for each server, an incremental generation number every time the server is restarted, and an incremental number on top of that for the runtime of a given generation - way Erlang does it).
There’s seldom a case you shouldn’t be using these.
For recording most event times that's true but:
Birthdates (which can include times) shouldn't use them.
Calendaring should think very carefully too, should the 9am meeting move to 10am due to DST? or should it "stay put" but potentially move for other timezones?
Definitely some great points and valid cases. In the calendaring case I actually may enjoy the shift, but that would probably be a surprise to most. Will work to add these examples where you wouldn't want the with timezone behavior in.
The postgresql manpage says it loud: don't fucking use timestamp with timezone.
It is bounded to political decision that makes it unusable compared to an UTC TS.
Plus good practices says: store UTC present in local
Does any one Read The Fantastics Manual these days?
Kind of confused here: I was under the impression that timestamp with timezone only did the converting to-and-from your system timezone, but still always stored in UTC. Is that incorrect?
I wish arrays and hstore-like types were available in sqlite (not through plugins, but standard-wise).
I've used something else from my experiments in PostgreSQL - the TXID - this way I was able to track down changes in the database (by keeping previous TXID and some other various bits, and then by polling again (or making the server call me)) - polling again and only instructing to get the rows that have changed since my last TXID.
Because pretty soon you will deal with a non US currency.....
For a "datatype" as commonly needed as "money", it has always struck me as strange that a more standardized and widely agreed upon approach to dealing with them, have not emerged.
Yes, amazing. About to work on stats caching and that Ranges datatype will be suuuper useful for using ranges of timestamps when caching a stat every week, month, etc.
[+] [-] zrail|12 years ago|reply
[1]: http://www.postgresql.org/docs/9.3/static/sql-createtype.htm...
[+] [-] Jweb_Guru|12 years ago|reply
[+] [-] callesgg|12 years ago|reply
Want to store an order id create a OrderId domain. Want to store weight create a gram domain. Want to store quantity create a quantity domain.
That way it becomes much more easy to spot logical errors and maintain data consistency.
Also one can use it to create something like an auto ORM.
[A domain Is like a user defined child type of the system types that can have some extra restrictions.]
[+] [-] mahmud|12 years ago|reply
http://www.jooq.org/
[+] [-] ebiester|12 years ago|reply
"If you have to ask, you can't afford it."
[+] [-] wlievens|12 years ago|reply
[+] [-] craigkerstiens|12 years ago|reply
[+] [-] skyebook|12 years ago|reply
http://www.postgresql.org/docs/9.3/static/datatype-geometric...
[+] [-] baudehlo|12 years ago|reply
[+] [-] radiowave|12 years ago|reply
Currently I'm working with numeric range types to model order quantities and stock quantities, and forecasting which order is going to consume what stock by joining on rows with overlapping ranges, then taking the intersections of those overlaps. Again, Postgres supplies functions and operators for range overlaps, intersections, etc.
In the absence of those datatypes, there'd be a lot more work required to achieve either of these.
[+] [-] apinstein|12 years ago|reply
[+] [-] joevandyk|12 years ago|reply
(https://github.com/RhodiumToad/ip4r-historical/blob/master/R...)
[+] [-] empthought|12 years ago|reply
I really couldn't say how much time and money he saved me, but it was a lot.
[+] [-] craigkerstiens|12 years ago|reply
[+] [-] yawboakye|12 years ago|reply
[+] [-] craigkerstiens|12 years ago|reply
[+] [-] yen223|12 years ago|reply
I suspect that for a lot of developers, the only database features they use are the ones their ORM of choice supports. The popular ORMs out there tend to be database-agnostic, which means they'll only natively support the lowest-common-denominator database features.
[+] [-] troyk|12 years ago|reply
Also, I tried the uuid extension once... It is not well supported, had to make a small change to a c header file to get it to compile on ubuntu, for dev on os x I think I gave up.
[+] [-] lmm|12 years ago|reply
I've used UUID-as-pkey and had it work well.
[+] [-] mantrax5|12 years ago|reply
UUID has no guaranteed properties you can rely on in high volumes.
The only desirable property it has is that it's an easy way out in tutorials.
The better solution is simply longer to explain (have a unique name for each server, an incremental generation number every time the server is restarted, and an incremental number on top of that for the runtime of a given generation - way Erlang does it).
[+] [-] bwooce|12 years ago|reply
Timestamp with Timezone
There’s seldom a case you shouldn’t be using these.
For recording most event times that's true but:
Birthdates (which can include times) shouldn't use them.
Calendaring should think very carefully too, should the 9am meeting move to 10am due to DST? or should it "stay put" but potentially move for other timezones?
[+] [-] craigkerstiens|12 years ago|reply
[+] [-] ragsagar|12 years ago|reply
I don't understand why you don't want to use timezone with birthdates(with time). Can you explain this, please?
[+] [-] julie1|12 years ago|reply
The postgresql manpage says it loud: don't fucking use timestamp with timezone. It is bounded to political decision that makes it unusable compared to an UTC TS. Plus good practices says: store UTC present in local
Does any one Read The Fantastics Manual these days?
[+] [-] groupmuse|12 years ago|reply
[+] [-] saurik|12 years ago|reply
[+] [-] malkia|12 years ago|reply
I've used something else from my experiments in PostgreSQL - the TXID - this way I was able to track down changes in the database (by keeping previous TXID and some other various bits, and then by polling again (or making the server call me)) - polling again and only instructing to get the rows that have changed since my last TXID.
[+] [-] Alex3917|12 years ago|reply
[+] [-] Ixiaus|12 years ago|reply
[+] [-] dholowiski|12 years ago|reply
[+] [-] dmm|12 years ago|reply
http://slashdot.org/story/06/11/09/1534204/slashdot-posting-...
Slashdot got over 16,777,216 comments overflowing a MySQL mediumint they were using for an index.
[+] [-] olefoo|12 years ago|reply
http://www.postgresql.org/docs/9.3/static/routine-vacuuming....
Fortunately the PostgreSQL has autovacuum so you probably don't need to worry about this.
[+] [-] jeltz|12 years ago|reply
[+] [-] troyk|12 years ago|reply
So if your only dealing with US currency, why not love the money type?
[+] [-] rpedela|12 years ago|reply
http://www.postgresql.org/message-id/b42b73150805270629h309f...
[+] [-] stuki|12 years ago|reply
For a "datatype" as commonly needed as "money", it has always struck me as strange that a more standardized and widely agreed upon approach to dealing with them, have not emerged.
[+] [-] callesgg|12 years ago|reply
As the currency of money is not known it is pretty obvious that one needs to store that in another column.
[+] [-] frankpinto|12 years ago|reply