top | item 7712599

Rarely Used Postgres Datatypes

129 points| narfz | 12 years ago |craigkerstiens.com | reply

67 comments

order
[+] zrail|12 years ago|reply
One of the coolest things about Postgres is how easy it is to make your own datatypes[1]. At previous jobs we've had custom types and aggregates that allowed super fact set operations with comparatively small data storage requirements.

[1]: http://www.postgresql.org/docs/9.3/static/sql-createtype.htm...

[+] callesgg|12 years ago|reply
Jup I recommend creating [Postgres domains] for everything that is not specifically already a Postgres type.

Want to store an order id create a OrderId domain. Want to store weight create a gram domain. Want to store quantity create a quantity domain.

That way it becomes much more easy to spot logical errors and maintain data consistency.

Also one can use it to create something like an auto ORM.

[A domain Is like a user defined child type of the system types that can have some extra restrictions.]

[+] mahmud|12 years ago|reply
If your ORM doesn't let you take advantage of these goodies, maybe it's time you went past ORM? Java's JOOq library is stellar at this kind of raw SQL access stuff, while still being expressive and well-typed:

http://www.jooq.org/

[+] ebiester|12 years ago|reply
How much is it?

"If you have to ask, you can't afford it."

[+] wlievens|12 years ago|reply
Let's certainly not forget the awesome PostGIS extension with its spatial types and functions.
[+] craigkerstiens|12 years ago|reply
PostGIS absolutely deserves some attention, as it's one of if not the best GIS DB out there. It felt too big to include in here, but I should at least add some small callout.
[+] baudehlo|12 years ago|reply
Also worth mentioning that a lot of people go for PostGIS when really they just need the earthdistance module. Much more lightweight and very very useful.
[+] radiowave|12 years ago|reply
Hstore, and the supplied functions for converting to/from records has been very useful for making a "one size fits all" history table.

Currently I'm working with numeric range types to model order quantities and stock quantities, and forecasting which order is going to consume what stock by joining on rows with overlapping ranges, then taking the intersections of those overlaps. Again, Postgres supplies functions and operators for range overlaps, intersections, etc.

In the absence of those datatypes, there'd be a lot more work required to achieve either of these.

[+] apinstein|12 years ago|reply
The ltree module is really awesome. If you have to do any hierarchical stuff I feel it's much better than nested set implementations. Great performance with gist indexes and high level operators for querying.
[+] joevandyk|12 years ago|reply
The ip4/ip4r data type is fantastic as well if you just need to store an ip address.

(https://github.com/RhodiumToad/ip4r-historical/blob/master/R...)

[+] empthought|12 years ago|reply
I have to add that RhodiumToad is a class act on IRC, and an SQL wizard. I was using the ip4r datatype for something really annoying at work, and eventually needed some hairy SQL. He steered me away from recursive queries into window functions, and wrote a correct query to solve my problem (basically normalizing overlapping IP ranges and some associated metadata with each range) in less than 30 minutes.

I really couldn't say how much time and money he saved me, but it was a lot.

[+] craigkerstiens|12 years ago|reply
Joe, this is definitely an awesome datatype, what is painful is building it typically or at least it was years ago when I dealt with it. I'm sure it's gotten better but still, I'd love to see this packaged up as a contrib just like hstore.
[+] yawboakye|12 years ago|reply
Why aren't they getting a lot of use? Is it because they're new or ORMs don't have them yet? (Note: This might sound like a false dichotomy but these are the only 2 reasons I know so feel free to add more reasons.)
[+] craigkerstiens|12 years ago|reply
Hstore is definitely an odd one in that it's an extension so sort of core, sort of not. Other's its likely just that people default to what the ORM does. Hstore actually sees 13% usage across production databases on Heroku, so while something I'd still consider it much lower than what it probably should be.
[+] yen223|12 years ago|reply
Because these are Postgres-specific features I think.

I suspect that for a lot of developers, the only database features they use are the ones their ORM of choice supports. The popular ORMs out there tend to be database-agnostic, which means they'll only natively support the lowest-common-denominator database features.

[+] troyk|12 years ago|reply
I keep seeing the uuid idea for pkeys, but have yet to see them used by any big site. Last I looked, twitter, facebook are using bigints all over the place. Also, instagram went bigint and blogged about it here http://instagram-engineering.tumblr.com/post/10853187575/sha...

Also, I tried the uuid extension once... It is not well supported, had to make a small change to a c header file to get it to compile on ubuntu, for dev on os x I think I gave up.

[+] lmm|12 years ago|reply
The "api userID" for facebook/twitter/etc. is numeric because it's always been numeric (and facebook explicitly doesn't guarantee anything about their lengths IIRC); that doesn't imply anything about what their real pkey is, just that it was numeric back in 2003. It's pretty easy to transform a UUID back and forth into a number.

I've used UUID-as-pkey and had it work well.

[+] mantrax5|12 years ago|reply
UUID aren't sortable by order of creation (on a per server basis which is useful in distributed algorithms), and while you'll hear a lot of talk about meteors, lightnings and winning the lottery, you do risk collisions (yes, when it comes to thousands of rps per server, millions of servers, and extended periods of time, you reach ridiculous collision rates even on a 128-bit UUID, thanks to the Birthday paradox).

UUID has no guaranteed properties you can rely on in high volumes.

The only desirable property it has is that it's an easy way out in tutorials.

The better solution is simply longer to explain (have a unique name for each server, an incremental generation number every time the server is restarted, and an incremental number on top of that for the runtime of a given generation - way Erlang does it).

[+] bwooce|12 years ago|reply
the only bit I'd prefer clarification on is:

Timestamp with Timezone

There’s seldom a case you shouldn’t be using these.

For recording most event times that's true but:

Birthdates (which can include times) shouldn't use them.

Calendaring should think very carefully too, should the 9am meeting move to 10am due to DST? or should it "stay put" but potentially move for other timezones?

[+] craigkerstiens|12 years ago|reply
Definitely some great points and valid cases. In the calendaring case I actually may enjoy the shift, but that would probably be a surprise to most. Will work to add these examples where you wouldn't want the with timezone behavior in.
[+] ragsagar|12 years ago|reply
Birthdates (which can include times) shouldn't use them.

I don't understand why you don't want to use timezone with birthdates(with time). Can you explain this, please?

[+] julie1|12 years ago|reply
http://www.postgresql.org/docs/9.1/static/datatype-datetime....

The postgresql manpage says it loud: don't fucking use timestamp with timezone. It is bounded to political decision that makes it unusable compared to an UTC TS. Plus good practices says: store UTC present in local

Does any one Read The Fantastics Manual these days?

[+] groupmuse|12 years ago|reply
Kind of confused here: I was under the impression that timestamp with timezone only did the converting to-and-from your system timezone, but still always stored in UTC. Is that incorrect?
[+] malkia|12 years ago|reply
I wish arrays and hstore-like types were available in sqlite (not through plugins, but standard-wise).

I've used something else from my experiments in PostgreSQL - the TXID - this way I was able to track down changes in the database (by keeping previous TXID and some other various bits, and then by polling again (or making the server call me)) - polling again and only instructing to get the rows that have changed since my last TXID.

[+] Alex3917|12 years ago|reply
The array type is actually one of the most interesting, I'd love to see the performance implications of those specifically covered in more detail.
[+] Ixiaus|12 years ago|reply
Postgres is a great database, I'm particularly in love with hstore for the data that it makes sense for.
[+] dholowiski|12 years ago|reply
While I don't disagree with the coolness of uuid, how exactly do you 'run out of integers' ?
[+] jeltz|12 years ago|reply
Yeah, running out of a bigint in PostgreSQL would be hard since they are 63 bits + sign.
[+] troyk|12 years ago|reply
I think the money type is awesome! Under the hood it uses int (I believe int32 even on 64bit) so it takes 50% less space than decimal.

So if your only dealing with US currency, why not love the money type?

[+] stuki|12 years ago|reply
Because pretty soon you will deal with a non US currency.....

For a "datatype" as commonly needed as "money", it has always struck me as strange that a more standardized and widely agreed upon approach to dealing with them, have not emerged.

[+] callesgg|12 years ago|reply
Why would it make any difference if one used different currency's.

As the currency of money is not known it is pretty obvious that one needs to store that in another column.

[+] frankpinto|12 years ago|reply
Yes, amazing. About to work on stats caching and that Ranges datatype will be suuuper useful for using ranges of timestamps when caching a stat every week, month, etc.