Design better databases

[+] dang|6 years ago|reply

The original URL was http://dbpatterns.com/, which no longer points to the right content, so we replaced it with the closest archive.org copy of the original.

[+] mortehu|10 years ago|reply

Tip for SQL users:

If you give all your ID fields unique names, e.g. by calling your field "reservation_id" instead of "id", even in the reservation table, you can do stuff like:

SELECT * FROM reservation JOIN guest USING (reservation_id);

By doing "USING (reservation_id)" instead of "ON reservation.id = guest.reservation_id", the field will be automatically deduplicated, so you don't have to qualify it elsewhere in the query, and "SELECT *" will return only one copy.

[+] Tenhundfeld|10 years ago|reply

I actually prefer just using id as the primary key, and I like the explicitness of seeing the table/alias before the column in complex queries. I don't care too much about typing it out; reservation.id isn't longer than reservation_id, and the savings of USING vs ON seem minimal. I also don't care about deduplicating that one field, as I'll likely need to consider other duplicate fields in the results, like created_at or description.

But I can see arguments in both directions. I think the big thing is consistency. If you consistently use <table name>_id as the PK and your tooling doesn't fight it, then cool.

I do feel like this benefit breaks down rather quickly though. For example, if you wanted to track who created a guest record, you probably wouldn't name the column user_id as it's ambiguous. So you might use creator_id to signify it points to the user who created the guest, not to the user invited to the reservation – thus negating your ability to join to a user table via USING.

That's not to say it's a bad tip, just because there are cases where it doesn't fit. Definitely worth knowing. Thanks for sharing.

[+] developer2|10 years ago|reply

Not really a fan. When your table name is "reservation", you're prefixing "id" with the table name ("reservation_id"), but you don't do the same for the other columns. I've never liked the inconsistency of having "reservation_id", but other columns like "name" instead of "reservation_name". Especially on longer table names where you wind up with columns like "this_really_long_table_id".

All just to shorten join clauses with a natural join or USING? You already get shortened joins - less so, but still a gain - by using table aliases so as not to repeat the long table names in each query multiple times. IMO it also takes an extra moment to parse the ambiguous "id" columns to resolve them mentally to which id is being referred to.

[+] leepowers|10 years ago|reply

I prefer longer, more descriptive table names coupled with shorter columns names. Then use aliases when writing queries.

    SELECT * FROM long_table_name l LEFT JOIN another_table_here a ON l.id = a.rel_id

Also, most tables end up being used to populate objects. It's simpler to reference an object with `Reservation.id` than `Reservation.reservation_id`

Regardless of the design choices made it's more important to be consistent in naming conventions and to document why a particular design/schema was chosen.

[+] daxelrod|10 years ago|reply

It's long bothered me that SQL doesn't have a way of saying "join these two tables based on the key relationships I've already defined in my schema". (Or maybe some variants do?)

The database already knows that guest.reservation_id is a foreign key referencing reservation.id. Why should you have to repeat yourself?

[+] ComodoHacker|10 years ago|reply

> the field will be automatically deduplicated, so you don't have to qualify it elsewhere in the query, and "SELECT " will return only one copy.

You still have in outer joins. Also, SELECT * is a bad practice (with some exceptions) when used in code, suitable only for ad-hoc queries.

[+] petepete|10 years ago|reply

Or if you're feeling lucky:

SELECT * FROM reservation NATURAL INNER JOIN guest;

[+] sokoloff|10 years ago|reply

I don't like using select * as it will cause the shape of the result set to change upon schema modifications. We use a DB access layer that "knows" the column indices of result sets at compile time and our zero-downtime process relies on that not changing (for the time period where old code is running against new schema).

Imagine two two-column tables, A with id and name, and B with id and value. (Doesn't matter what the columns are; this is just for concreteness.)

select * from A JOIN B using (id) will give you a three column result set, id, name, value.

Now, suppose you want to do a schema modification to add "last _login" to A. Suddenly, your query is returning a 4 column result set, id, name, last_login, value and code using ordinal positions is now reading last_login into the "value" slot.

Maybe it's a corner-case for our use case and zero-downtime release processes, but it's something to be aware of anyway.

[+] rusanu|10 years ago|reply

I don't like this because it leads to the misleading assumption that it doesn't matter which value you refer (left or right), since they are equal. But once you consider collations is easy to see the fallacy:

   left.id = 'Something' COLLATE case insensitive
   right.id = 'SOMETHING' COLLATE case insensitive

   SELECT id FROM left JOIN right USING (id);

Will the value be 'Something' or 'SOMETHING'? Better avoid the surprise and be specific...

[+] NicoJuicy|10 years ago|reply

I like using id as key, if you want to "decouple" info, i just use reservation.id as reservation_id

This way, my db fetch is always on id and not a longer name ( reservation_id), which on the end of an webapp, makes me save some reading / writing :)

[+] ciroduran|10 years ago|reply

And if you're using a schema you did not create, don't forget to use shorthands! SELECT * FROM reservation r JOIN guest g ON (r.id = g.reservation_id), queries can get messy if you don't use them! :)

[+] swah|10 years ago|reply

Another option:

    SELECT * FROM reservation r JOIN guest USING (r.id);

[+] piers|10 years ago|reply

I really hope you're using SELECT * as an example and that you don't really do that.

[+] voltagex_|10 years ago|reply

In which dialect?

[+] rlonstein|10 years ago|reply

But, but, but ActiveRecord doesn't do that by default so it can't be a good idea [sarcasm]

[+] clay_to_n|10 years ago|reply

It's an old-looking website, but I've found this site has some really cool data models: http://www.databaseanswers.org/data_models/index.htm

As someone in the healthcare space, looking at some of these models gives me a better idea of how various aspects of the healthcare industry work, and the things they interact with. Ex) http://www.databaseanswers.org/data_models/patient_data_ware...

[+] sakopov|10 years ago|reply

I was just about to post this. I've been referring to this site for years all the way since University days. Invaluable resource! Thanks for posting!

[+] nradov|10 years ago|reply

That healthcare industry data model is very simplistic and would not be adequate for the majority of real world use cases involving patient visits. If you have to model something in the healthcare domain then don't reinvent the wheel. Start with the HL7 RIM, and then constrain it down to what you really need.

https://www.hl7.org/implement/standards/product_brief.cfm?pr...

[+] klodolph|10 years ago|reply

I just clicked on the second "featured" pattern and found this hot mess:

http://dbpatterns.com/documents/5091f74289cbad03bc958bc0/

It has the "let's put a UUID on every row" disease common to designers who have never really learned anything other than object oriented design. Price is a string (I guess so you can put "market price" on the fish?), and there's a currency symbol on every "delivery". The whole thing just makes no sense. 18 stars. Unbelievable.

[+] skrebbel|10 years ago|reply

I'm daft. What's wrong with using uuids as primary keys everywhere? I'd appreciate it if you could elaborate :)

[+] Mister_Snuggles|10 years ago|reply

But where do you even put the price on a food item to start with?

The only price is on the 'delivery' table, which seems to imply that the price is only set when someone orders something. I guess this is a delivery-only place? But because the 'delivery' table can only have one food_uuid, you'd have to put in for a bunch of deliveries to order more than one thing.

Maybe the 'price' on the menu is in that 'Additional Info' table?

    SELECT A.NAME, A.DESCRIPTION, B.VALUE FROM FOOD A LEFT OUTER JOIN "Additional Info" B ON (A.UUID = B.UUID AND B.TYPE = 'Price');

There's so much wrong here...

[+] Smudge|10 years ago|reply

I don't disagree, but there is a case to be made for using UUID over sequential integer ids, and that is for horizontal scaling. When inserting records with UUID primary keys, you don't have to have your db cluster synchronized around which sequential ID should be next. There are other ways around this problem, of course, but using UUIDs sidesteps the issue entirely.

[+] nodefortytwo|10 years ago|reply

Yeah, i found that that too, spent 10 mins poking at it:

http://dbpatterns.com/documents/5707794a1514b4252236dcbe/

better, but not finished...

[+] r0m4n0|10 years ago|reply

And only one comment that says "thhx" haha

Like any open advice community like this, its important there are checks in place to not perpetuate bad ideas...

[+] wahsd|10 years ago|reply

Did it cross your mind that there is a whole host of reasons why it may have 18 stars that have nothing to do with the quality of the pattern?

[+] zenogais|10 years ago|reply

Kinda misleading title. Doesn't really have anything to do with database design. It has to do with sharing Entity-Relationship diagrams. Maybe "Design better relational models".

[+] jordanlev|10 years ago|reply

If anyone is interested, there is a 2-volume set of books called "The Data Model Resource Book". They've been around for a while, so for more traditional businesses, but very thorough and broken out by industry:

http://www.wiley.com/WileyCDA/WileyTitle/productCd-047138023...

http://www.wiley.com/WileyCDA/WileyTitle/productCd-047135348...

[+] rlonstein|10 years ago|reply

+1. Definitely worth reading, I have Vols 1 & 2 (Vol 3 came out later). I found it useful because the author explained the decisions about the models and traps to avoid.

[+] koolba|10 years ago|reply

More of an ER modeler than a database designer. The interface is pretty slick/simple. Here's a sample I just cooked up: http://dbpatterns.com/documents/570699101514b428de8893a1

I can't imagine using it for anything real but it was fun to play with.

If the creator is reading this, here's a bug report: If the user hasn't hit "Save" when they export the schema, they get an empty text file and they'll think the app is broken. It should either auto-save or prompt the user "You have unsaved content, do you want to save before exporting?"

UPDATE: One more bug report, the export doesn't escape the columns names at all. It just naively adds double quotes around them. This breaks with double quotes in field names (yeah yeah which is stupid but still..):

    CREATE TABLE "foo" (
	"id" int,
	"baz" varchar(255),
	"bar" varchar(255),
	""test"" varchar(255)
    );

[+] beachy|10 years ago|reply

The social aspects of this are awesome.

But the actual modelling functionality itself is really lacking. The lines joining the entities don't even indicate cardinality (does a ninja have many ninja_nelts? Or is it the other way around?). That's pretty much 101 for any visual database modelling.

Also the lines overlap, meaning you can't see whether ninja_weapon is related to ninja, or to ninja_belt.

I can answer these questions myself by perusing the foreign key columns - but that makes the lines redundant. Since several decades ago, Chen, Bachman, Grady Booch and a bunch of others have given ways to denote cardinality etc., and these are widely used. This tool should use one.

(apologies in advance if I have this wrong and the tool does support this but the diagram authors elected not to use them).

[+] astazangasta|10 years ago|reply

I found the interface very off-putting. If this is a site about exploring data models, a "public feed" and "search" is not a great way to interact with those. In fact, a timeline chronology of when these things were made seems a great way to expose mostly irrelevant information to a large number of users.

How do I browse these models? Is there a taxonomy of organization? How many models are in the system? Are there models relevant to general domains of interest to me? None of this is made apparent.

[+] karmakaze|10 years ago|reply

Thanks for your sample link. It's way more interesting than the feed of model snippets in the original link.

[+] collyw|10 years ago|reply

Isn't that how you design databases? Start with an ER model? Its easy to translate to database tables.

[+] cmrdporcupine|10 years ago|reply

Y'know I really miss designing relational schemas. As much as I hate SQL the language I love the relational model that it is a (butchered and ugly and compromised) implementation of. But working where I do on the systems I do now this is something I never have to deal with anymore.

There's something very therapeutic about organizing data using a system of rules, and the relational data model is a powerful one.

[+] dv35z|10 years ago|reply

Completely agree with you. I have always hoped that some of the GUI modelers would allow you to add notes/explanations on tables & relations, explaining the motivation behind design decisions. Example, "M2M relation here allows us to 'tag' objects with multiple attributes".

[+] iamleppert|10 years ago|reply

Looking around I see lots of people just blindly using UUIDs for everything. At a prior job, I had a boss who forced me to use UUIDs for everything, before we had even collected a single row of data.

It hurts my pragmatism to solve a distributed systems problem before we even had a distributed system! Don't be a sheep and use UUIDs, rather than a simple integer primary key or composite key that is natural to the table. People say it's just an id, who cares? But I say people making these kinds of decisions are probably making other poor choices backed by something they read and don't understand.

Here's a great article on locality and the need for uniqueness and why UUIDs should only be introduced when needed and to solve very specific distributed systems problems. The reality is almost all applications will never need to have multiple concurrent writers.

https://eager.io/blog/how-long-does-an-id-need-to-be/?hn

[+] gholevas|10 years ago|reply

I built an electron app that allows you to design, generate, and share Mongoose Schemas if anyone finds it useful: http://mongomulch.com

[+] moron4hire|10 years ago|reply

These schemas seem like pretty simple, straightforward adaptations of specific problems. This doesn't actually look like any sort of actionable advice on how to design better schemas. Are we supposed to just osmose the knowledge somehow? I was expecting something more along the lines of recommendations like "foreign key relationships should always be indexed, nobody ever came up with a realistic example where they shouldn't! Why the hell isn't this the default behavior?"

[+] igrekel|10 years ago|reply

Interesting idea but it the way it is currently done isn't that great.

Most of the patterns I've seen are obvious simple things but they are missing a lot of content like : - What was the design's intended properties? - Expected volumes, access patterns? - Good places for indices? Maybe the comments could be of some help but really haven't seen much. I've actually found more interesting content in the HN comments than on the site itself.

[+] hghar|10 years ago|reply

I don't think this must be called "patterns" in computer science patterns are a model applied to give solutions to recurring problems. I was expecting something like Martin Fowler Patterns-Enterprise-Application-Architecture but for databases.

This should be called something like database designs.

[+] nxzero|10 years ago|reply

One of the keys to understanding SQL, and on that note, I would highly recommend SQLZoo's interactive tutorial that allows someone to query a (fake) database and progress using baby steps: http://sqlzoo.net

[+] haddr|10 years ago|reply

Cool website, for off the shelf models. For anything more advanced you should be careful, as modeling real world scenarios is rarely so abstract and mamy times you need to make some scenario-specific tradeoffs during the modeling phase to fulfill your requirements.

[+] barryosull|10 years ago|reply

Looks like they’re trying to create a repo of open source DB schemas for domains. That's putting the cart before the horse in my mind. People don’t start with DB schemas when building domains, they end up with DB schemas after modelling them.

[+] Scirra_Tom|10 years ago|reply

Had a quick look through, don't understand what audience this is targeted towards.

[+] tacone|10 years ago|reply

I like this site. As simple as it is, it has a lot of potential community-wise.

- it could allow sql exports for various platforms - it could feature an API so people can write their own framework drivers (for example creating migrations, importing existing schemas etc)

Lets hope that it gains users, so that the voting system (the star) can become more useful to filter out the garbage.

Edit: please remove the login wall to see the starred items, it raises the entry barrier quite a bit. Do you really want new visitors to see low rated schemas as the first thing?

[+] adwf|10 years ago|reply

As a database guy this makes me feel good. From looking at a lot of these schemas, my job is not going to disappear anytime soon...

Most egregious example for me is probably the prevalence of a lot of "type" tables when a simple enum column would do. Or maybe the sheer number of UUIDs that are being thrown around.

I even saw a circular ID chain in one. Would be fun setting up foreign key constraints for that!

[+] raziel2p|10 years ago|reply

Most of these seem extremely simple. I would like to see some examples where a good database schema isn't so obvious.

175 comments