top | item 37533687

jimwhite42 | 2 years ago

> The "normal forms" could merely be suggestions for a database designer, not a technical limitation enforced by the software itself.

I think most of the motivation for normal forms is avoiding 'update anomalies', which essentially means: don't represent the same information in two places in your base relation variables (a.k.a. tables, in SQL). So you can have repeated values or nested relations in query results, and you can have them in base tables that are morally normalized, as long as there's no possibility of these leading to the same information being recorded in two distinct places.
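The 'same information in two places' failure mode is easy to demonstrate. Here's a minimal sketch using Python's built-in sqlite3 (the schema and names are invented for illustration): a customer's name is duplicated into every order row, so a partial update leaves the database contradicting itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized: the customer's name is repeated in every order row.
conn.execute("""CREATE TABLE orders
                (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                 customer_name TEXT, item TEXT)""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 7, "Alice", "keyboard"),
    (2, 7, "Alice", "mouse"),
])

# A careless single-row update produces an update anomaly:
conn.execute("UPDATE orders SET customer_name = 'Alicia' WHERE order_id = 1")

names = {row[0] for row in conn.execute(
    "SELECT customer_name FROM orders WHERE customer_id = 7")}
print(names)  # two conflicting names for one customer
```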

When people talk about 'denormalizing', and it's justified, I think they mean breaking this rule about representing information in two or more places in exchange for performance. If you do this, the application programmer has to be careful to keep the multiple locations in sync, a kind of consistency you don't have to think about in a clean database design. I think database management software in general cannot enforce normalisation; it can only make it easier or more difficult to work with normalized databases.
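To make the sync burden concrete, here's a hedged sqlite3 sketch (schema invented for illustration): the orders table carries a denormalized copy of the customer name for read speed, so the application must route every rename through a helper that updates both copies in one transaction. Nothing in the DBMS forces you to do this; forget one UPDATE and you're back to the anomaly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     customer_name TEXT,  -- denormalized copy for read speed
                     item TEXT);
INSERT INTO customers VALUES (7, 'Alice');
INSERT INTO orders VALUES (1, 7, 'Alice', 'keyboard');
""")

def rename_customer(conn, customer_id, new_name):
    # The application, not the DBMS, keeps every copy in sync;
    # 'with conn' wraps both updates in a single transaction.
    with conn:
        conn.execute("UPDATE customers SET name = ? WHERE customer_id = ?",
                     (new_name, customer_id))
        conn.execute("UPDATE orders SET customer_name = ? WHERE customer_id = ?",
                     (new_name, customer_id))

rename_customer(conn, 7, "Alicia")
```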

In theory, the DBMS itself could directly support 'physical denormalization' and make this performance optimisation easier to implement and transparent to the application code. I think some SQL DBMSs have attempted to do things like this.

6345dhjdsf | 2 years ago

(Posted under a different account because I'm being slow-posted again by HN)

> In theory, the DBMS itself could directly support 'physical denormalization' and make this performance optimisation easier to implement and transparent to the application code. I think some SQL DBMSs have attempted to do things like this.

Automatically managed, application-transparent physical denormalisation, handled entirely by the database, is something I am very, very interested in. Unfortunately I've been able to find almost nothing describing what it would look like or how it would be done. If you can provide any links, that would be incredibly helpful!

It gets mentioned in the Date/Darwen books as being the right way to do things, but no actual information seems to be given.

jimwhite42 | 2 years ago

I'm a bit fuzzy on the details, but I think Vertica lets you store duplicate copies of a table in multiple sort orders, with the appropriate copy picked automatically by the query optimiser. So it works not that differently from an index (which is also DBMS-managed performance denormalization).

There are also materialized views: if you have automatically, incrementally updated materialized views that are transparently substituted into queries, that's along these lines. I think a lot of progress is being made here, and plenty of compromise implementations have been in production use for a long time.
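SQLite has no materialized views, but the incremental-update half of the idea can be sketched by hand with a trigger-maintained summary table (sqlite3, schema invented for illustration). What this sketch deliberately omits is the transparent-substitution half: queries must name the summary table explicitly, whereas a DBMS-level feature would rewrite queries against the base table automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);

-- Hand-rolled stand-in for an incrementally updated materialized view:
-- a summary table kept current by a trigger on the base table.
CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total INTEGER NOT NULL);

CREATE TRIGGER sales_after_insert AFTER INSERT ON sales BEGIN
  INSERT OR IGNORE INTO sales_by_region VALUES (NEW.region, 0);
  UPDATE sales_by_region SET total = total + NEW.amount
   WHERE region = NEW.region;
END;
""")

conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("eu", 10), ("eu", 5), ("us", 3)])
totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
print(totals)  # running totals stayed current without a full recompute
```

A real implementation would also need triggers for UPDATE and DELETE on the base table; incremental maintenance of anything beyond simple aggregates is where the hard research problems live.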

I think there's some ambitious work on materialized views being done in Postgres.

> It gets mentioned in the Date/Darwen books as being the right way to do things, but no actual information seems to be given.

I don't think they ever convincingly got into the details on it.

smaddox | 2 years ago

> Automatically managed, application-transparent, physical denormalisation entirely managed by the database is something I am very, very interested in.

Sounds a bit like Noria: https://github.com/mit-pdos/noria

bazoom42 | 2 years ago

> I think most of the motivation for normal forms is to avoid 'update anomalies', which is essentially, don't represent the same information in two places

This is true for second and higher normal forms, but not for first normal form, which is about eliminating nested tables rather than redundant data.
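For a concrete picture of what first normal form rules out, here's a small sqlite3 sketch (schema invented for illustration): a comma-separated tags column is effectively a nested table stuffed into one cell, and flattening it to one row per value is the 1NF fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Violates 1NF: one attribute holds a list, a "table inside a column".
conn.execute("CREATE TABLE articles_flat (article_id INTEGER, tags TEXT)")
conn.execute("INSERT INTO articles_flat VALUES (1, 'db,sql,theory')")

# 1NF: one atomic value per column; repeated values become extra rows.
conn.execute("CREATE TABLE article_tags (article_id INTEGER, tag TEXT)")
conn.executemany("INSERT INTO article_tags VALUES (1, ?)",
                 [(t,) for t in "db,sql,theory".split(",")])

# "Which articles are tagged sql?" now needs no string parsing.
rows = conn.execute(
    "SELECT article_id FROM article_tags WHERE tag = 'sql'").fetchall()
print(rows)  # [(1,)]
```

Note there is no redundancy in either version here, which is the parent's point: 1NF is about the shape of the values, not about duplicated facts.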