top | item 2658443

DavidMcLaughlin | 14 years ago

This seems incredibly naive.

ORMs reduce code duplication. They speed up development, especially when you're treating the underlying data storage as a "dumb" datastore that could just as easily be sqlite or H2 as MySQL or Postgres.

As for ORMs having some sort of negative impact on the queries sent to the underlying database: it really depends on which ORM you use, but every ORM I've used has supported pre-loading relationships in advance when required, removing that N+1 problem.
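To illustrate the N+1 pattern and the pre-loading fix described above, here is a minimal sketch using Python's stdlib sqlite3 (the `author`/`post` schema is made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO post VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

# N+1: one query for the authors, then one more query per author.
authors = conn.execute("SELECT id FROM author ORDER BY id").fetchall()
n_plus_1 = {
    a_id: [t for (t,) in conn.execute(
        "SELECT title FROM post WHERE author_id = ? ORDER BY id", (a_id,))]
    for (a_id,) in authors
}

# Eager loading: fetch the whole relationship in a single extra IN query.
ids = [a_id for (a_id,) in authors]
marks = ",".join("?" * len(ids))
eager = {a_id: [] for a_id in ids}
for a_id, title in conn.execute(
        "SELECT author_id, title FROM post "
        f"WHERE author_id IN ({marks}) ORDER BY id", ids):
    eager[a_id].append(title)

# Same data either way, but 1 + N round trips versus 2.
```

ORMs that support eager loading generate something close to the second shape when you ask for the relationship up front.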

I also want to add that I wrote an ORM for the first company I worked for, and when it was finished it was a drop-in replacement for 90% of the queries in our application - and I mean that literally: the SQL generated by the ORM was exactly the same as the SQL being replaced. The queries it couldn't replace (mainly reporting queries) already had an aggressively tuned caching layer in front of them anyway, because they were so hairy.

But the real point is this: the performance of the ORM didn't really matter because we were a database driven website that needed to scale - so we had layers upon layers of caching to deal with that issue.

And that is an extremely important point - the way ORMs generalise a lot of queries (every query for an object is always the same no matter what columns you really need) lends itself to extremely good cache performance. Take the query cache of MySQL for example - it stores result sets in a sort of LRU. If you make n queries for the same row in a DB but select different columns each time - you store the same "entity" n times in the query cache. Depending on how big n is, that can cause much worse cache hit performance than simply storing one representation of that entity and letting all n use cases use the attributes they need.

Now, relying on MySQL's query cache for anything would not be smart, but replace it with memcached or Redis or whatever memory-is-at-a-premium cache and the same point stands. Another example to drive the point home is a result set where you join the result entities to the user query so that you can get all the results back in a single query. In theory this is a great way to reduce the number of queries sent to your DB. But if you have caching, there are many cases where you would see very low cache hit ratios for user queries, since they tend to be unique (for example, they use the user id), yet where you could still get great cache hit performance - if certain entities appear often across all those result sets - by leaving out the join and doing N+1 fetches instead.
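The trade-off in the last example can be sketched with plain dicts standing in for the cache (the user names, entity shapes, and key formats here are invented for illustration):

```python
# Two users run "their" personalised query; both result sets happen to
# contain the same two entities.
results_by_user = {
    "user-1": [{"id": 10, "name": "widget"}, {"id": 11, "name": "gadget"}],
    "user-2": [{"id": 11, "name": "gadget"}, {"id": 10, "name": "widget"}],
}

# Strategy A: cache the joined result set under a per-user key.
# User-specific keys are rarely shared, so every entity is stored per user.
query_cache = {}
for user, rows in results_by_user.items():
    query_cache[f"results:{user}"] = rows

# Strategy B: leave out the join - cache only the id list per user, and
# cache each entity once under its own key, shared across all users.
entity_cache, id_lists = {}, {}
for user, rows in results_by_user.items():
    id_lists[f"results:{user}"] = [r["id"] for r in rows]
    for r in rows:
        entity_cache[f"entity:{r['id']}"] = r

# A stores four entity copies; B stores two, one per distinct entity.
copies_in_a = sum(len(rows) for rows in query_cache.values())
```

With many users and heavily shared entities, strategy B's per-entity keys are where the cache hit ratio comes from, at the cost of the extra fetches.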

ORMs prevent you from scaling as much as using Python or Ruby over C does.

So I guess that leaves the point about leaky or broken abstractions. Well, I would never claim that you can abstract across a whole bunch of databases anyway; I think that's a ridiculous claim that most ORMs make. These kinds of abstractions, where people try to hide the underlying technology, are really just a lowest common denominator of all the feature sets. So if you chose some technology because you really wanted a differentiating feature, then most likely you will find yourself working against such abstractions. Interestingly enough, the dire support for cross-database queries (which are perfectly legal in MySQL but not in other vendors' databases) is the reason I had to roll my own ORM. But the productivity and maintainability benefits were well worth it.

So yeah, I guess what I'm saying is: premature optimization is the root of all evil, there are no silver bullets, and performance and scalability are about measuring and optimising where needed. And finally: ORMs are not an anti-pattern.

akeefer|14 years ago

Calling the author naive is, I think, uncharitable. I've also written my own ORM layer, used for large-scale, high-performance enterprise applications, and I largely agree with the original author's post, so I certainly don't think it's naive. It sounds to me like the thoughts of someone who has run into some real problems in real situations.

I don't disagree with the points you've made around caching, but I do think you're simplifying the problem a bit. Not all performance tuning in DB-intensive applications is about caching; it often involves query tuning, indexing, and traditional DB-level work.

A large part of the abstraction leak around ORMs is around both the caching and that DB-level performance tuning. You have to understand what code is going to generate what queries so that, at the very least, you can tune them by adding the appropriate indexes in the database. All of a sudden, you're living in SQL land, examining query plans, etc. But if you decide that the change you need to make is to the SQL itself, the ORM layer suddenly gets in your way: you either have to bypass the ORM layer to drop into raw SQL, which at worst is hard to do and at best tends to massively reduce the value proposition of the ORM framework, or you have to try to tweak your code to get it to generate the query that you want, which is often frustrating and far more difficult than just writing the SQL yourself. I don't think I'm that much in the minority in having had an experience like, "Hmm, the query I really need to write needs to use an ORDER BY statement that includes a few case clauses . . . now how do I convince this query-generation framework to spit that out so that I don't have to pull back all the results and do the sorting in memory?"

It's also worth mentioning that caching doesn't help tune writes, so if scaling your product requires scaling writes, you're probably going to be mucking around in SQL land.
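The ORDER BY-with-case-clauses query that is awkward to coax out of a query generator is trivial as raw SQL. A hypothetical example, again via sqlite3 (the `ticket` schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, status TEXT, created INTEGER);
    INSERT INTO ticket VALUES
        (1, 'closed', 100), (2, 'open', 50), (3, 'urgent', 200);
""")

# Sort urgent tickets first, then open, then closed - newest first within
# each group. The database does the sorting; nothing is sorted in memory.
rows = conn.execute("""
    SELECT id, status FROM ticket
    ORDER BY CASE status
                 WHEN 'urgent' THEN 0
                 WHEN 'open'   THEN 1
                 ELSE 2
             END,
             created DESC
""").fetchall()
```

Whether a given ORM's query API can express this at all varies; the point is that the SQL itself is a couple of lines.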

There's a similar problem around query-generation layers that attempt to allow you to just write normal methods and have things executed on the database; because the code is so far removed from the SQL, it makes it really, really easy to write really terribly-performing queries or to write things that will do hugely unnecessary amounts of work.

On a more trivial point, fetching all columns when you only need a subset of them really is an issue sometimes, especially if A) you have to join across a bunch of tables, B) the columns that you want could be retrieved from indexes rather than requiring actual row reads, or C) the columns that you care about are several removes away from the original search table, but the ORM layer loads everything in between. (For example, Foo->Bar->Baz: my WHERE clause is on Foo, but the only columns I care about are the id on Foo, which is in the index, and a few columns on Baz . . . how do I tell my ORM layer to load nothing else from Foo and nothing at all from Bar? It's a different problem from pre-fetching, because I just don't want anything loaded.)
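The Foo->Bar->Baz case in the parenthetical maps to a query like the following (schema and column names are made up to match the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, kind TEXT, payload TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, foo_id INTEGER, payload TEXT);
    CREATE TABLE baz (id INTEGER PRIMARY KEY, bar_id INTEGER, label TEXT);
    INSERT INTO foo VALUES (1, 'wanted', 'big blob'), (2, 'other', 'big blob');
    INSERT INTO bar VALUES (10, 1, 'big blob'), (20, 2, 'big blob');
    INSERT INTO baz VALUES (100, 10, 'hit'), (200, 20, 'miss');
""")

# Filter on foo, but pull back only foo.id and baz.label - nothing else
# from foo, nothing at all from bar. With a covering index on
# foo(kind, id), the foo side needs no row reads at all.
rows = conn.execute("""
    SELECT foo.id, baz.label
    FROM foo
    JOIN bar ON bar.foo_id = foo.id
    JOIN baz ON baz.bar_id = bar.id
    WHERE foo.kind = 'wanted'
""").fetchall()
```

Hand-written, the narrow projection is one SELECT list; through an entity-mapping layer you typically get all three tables' rows materialised unless the ORM has explicit partial-load support.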

Now, that's not to say that ORM layers can't be made to perform; of course they can, pretty much all of them have the sorts of hooks you describe, and there's plenty of empirical evidence to that effect. But sometimes the way you make them perform is by just bypassing them.

There's another abstraction point, which is that supporting multiple databases often leads to a least-common-denominator functionality approach; for example, if you want to use a db-specific spatial data type, the ORM has to either provide db-specific functionality, or it might just not support handling that type of data well. The same often applies to things like db-specific functions or query hints; if the ORM layer doesn't handle those things for you, you have to bypass it and drop into raw SQL when you need them.

So really, the argument is not "ORMs are not functional and no one should use them"; it's about the value proposition of an ORM layer. The value proposition is: "This tool will make your life easier, will save you from having to write SQL, and will help you work across multiple databases." If the tool makes life harder than it otherwise would be, then it's not useful, even if it's still possible to do work in it.

So the question is largely whether they make life easier. In the simple case, I think the answer is yes, they do: they make it easier for beginners to get off the ground, they make it easy to do simple queries and writes, and the performance probably doesn't matter anyway.

When things get more complicated, though, the question becomes a lot less clear. Yes, the ORM layer makes it easier to have structured queries that can be cached . . . it also makes it harder to have one-off queries that can be tuned easily based on exactly what data is needed and tweaked to convince the database to generate the right query plan, and it makes it much harder to look at some DB stats, identify a poorly-performing query, and then map it back to the code that generated that query. I know of applications that have basically had to bypass ActiveRecord more and more as they scaled to just do raw SQL queries because making ActiveRecord perform was simply too hard or not possible.

So personally, I prefer an ORM approach that does minimum stuff to let me do the simple things simply (pull rows back and map them to an object, execute simple queries directly on that table), but that's designed from the ground up with the idea that dropping straight into SQL is a normal, accepted part of the workflow, rather than some one-off thing that you should rarely do. But it really depends on your project and your comfort level with SQL.

DavidMcLaughlin|14 years ago

> Yes, the ORM layer makes it easier to have structured queries that can be cached . . . it also makes it harder to have one-off queries that can be tuned easily based on exactly what data is needed and tweaked to convince the database to generate the right query plan, and it makes it much harder to look at some DB stats, identify a poorly-performing query, and then map it back to the code that generated that query.

The problem I have with your post is that you are repeatedly conflating the high-level idea of an ORM with the (seemingly) Spartan implementations that you have used.

Here is an ORM that provides support for automatically profiling the performance of queries over a period of time:

http://squeryl.org/performance-profiling.html

Sample output:

http://squeryl.org/profileOfH2Tests.html

I cannot begin to tell you how much time this has saved me when optimising performance of my webapp.

Note that that particular ORM also has the advantage of type-safe queries - i.e. it can tell at compile time if there's a syntax error in your query (subject to bugs in the ORM :)) - even in dynamically generated queries. In practice this is a fantastic feature, as it is so much safer than building up SQL queries with string manipulation and dealing with multiple code paths that depend on user input. The test paths alone in such code (even if you have a "query builder" layer) are the stuff of nightmares.

There are many features missing from Squeryl though that I've had in other ORMs because it makes different trade-offs. But this is what you do when you choose a library, and it's important to understand what trade-offs you're making upfront... otherwise you might find yourself writing off an entire approach to software development as an anti-pattern because you picked the wrong library.

ssmoot|14 years ago

An IdentityMap would resolve the issue with your second-layer cache, allowing you to select only the columns necessary, which can be a very significant win.
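The IdentityMap idea mentioned here - one in-memory object per primary key, with partial loads merging into it - can be sketched in a few lines (this is a generic illustration of the pattern, not any particular library's API):

```python
class IdentityMap:
    """One shared object per (table, primary key); partial loads merge in."""

    def __init__(self):
        self._objects = {}  # (table, pk) -> attribute dict

    def merge(self, table, pk, attrs):
        # Reuse the existing entry so every caller shares one
        # representation; newly loaded columns merge into it.
        obj = self._objects.setdefault((table, pk), {})
        obj.update(attrs)
        return obj

imap = IdentityMap()
# First query selected only the columns it needed...
a = imap.merge("user", 1, {"id": 1, "name": "alice"})
# ...a later query pulls the expensive TEXT column; same object is reused.
b = imap.merge("user", 1, {"id": 1, "bio": "long TEXT value"})
```

Because both lookups return the same object, you can select only the columns each call site needs without ending up with n divergent copies of the entity.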

Ruby O/RMs for example generally take a big hit from loading date/time-columns. If you don't need the columns most of the time, then why load them?

Many (not all) databases will take a significant hit when loading TEXT columns. Again, most of the time you probably don't need them, and could be better served with a 'summary' column.

Just nit-picking really. Though I do generally try to think of caching as a last resort; too often it feels like a hammer for compromised design.

keithnoizu|14 years ago

Caching can only go so far . . . eventually you are going to need to refactor or tweak schemas and adjust queries, especially if you are dealing with user-centric, write-heavy data - although yes, caching on write can also reduce some strain.

Personally I would take hand-rolled DAOs over magically generated ORM/AR logic any day. The upfront cost is marginally higher, but modifying, adjusting, and logically grouping data is much more straightforward.
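A minimal sketch of the hand-rolled DAO approach being advocated (the `UserDao` name, schema, and dict return shape are invented for the example; sqlite3 stands in for the real database):

```python
import sqlite3

class UserDao:
    """Hand-rolled DAO: every query is explicit, visible SQL."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS user "
            "(id INTEGER PRIMARY KEY, name TEXT)")

    def insert(self, name):
        cur = self.conn.execute(
            "INSERT INTO user (name) VALUES (?)", (name,))
        return cur.lastrowid

    def find_by_id(self, user_id):
        row = self.conn.execute(
            "SELECT id, name FROM user WHERE id = ?", (user_id,)).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}

dao = UserDao(sqlite3.connect(":memory:"))
uid = dao.insert("alice")
user = dao.find_by_id(uid)
```

The SQL is right there to tune or index against, which is the "modifying and adjusting is straightforward" trade-off the comment describes, paid for with a bit of boilerplate per table.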

webjprgm|14 years ago

That's what I do. I have a DB class in PHP that makes most queries easy (much easier than Pear or PDO), and all DAOs are similar to each other so it's a copy-paste-modify to add a new one.

Then I have static methods to do the special queries that the post author talks about, which may return direct result sets, an integer, generated objects with fewer columns, or large object trees depending on the case.

ORM would be faster (developer time) for a prototype site, but do-it-yourself DAOs work just fine.

gaius|14 years ago

> treating the underlying data storage as a "dumb" datastore that could just as easily be sqlite or H2 as MySQL or Postgres

If you're doing that then you've missed the point of a database and might as well be using flatfiles.