
Goodbye MongoDB, Hello PostgreSQL

802 points | YorickPeterse | 11 years ago | developer.olery.com

374 comments

[+] jgrahamc|11 years ago|reply
As a greying developer I am most amused by people discovering that 'old' technologies like SQL databases work really well.

The only useful piece of advice I can give a younger developer is... be careful when drinking the newtech koolaid.

And one more thing:

    star = Sequel.lit('*')

    User.select(:locale)
      .select_append { count(star).as(:amount) }
      .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
      .group(:locale)
      .order(Sequel.desc(:percentage))
just makes me want to cry. Learn SQL rather than wrapping it.
[+] YorickPeterse|11 years ago|reply
It's not so much about not wanting to write/understand SQL (both are still very much required), but about composability. If you want to re-use bits of a SQL query written as a string literal, your only option is string concatenation or using some kind of string builder/template system. In both cases there's little validation of the query's correctness (syntax-wise) until you actually run it.

While I agree that many ORMs go too far, or, even worse, don't implement certain powerful features, Sequel (and similar tools, I imagine) strikes a nice balance. The particular examples I gave are not super useful on their own, but we already have quite a few queries that are composed/re-used without having to concat strings.

So, tl;dr: it's about composability, not being "lazy" or "ignorant" about SQL.
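To illustrate the composability point outside Sequel, here is a minimal, hypothetical query-object sketch in Python (this is not Sequel's API, just the idea of composing query fragments instead of concatenating strings):

```python
import sqlite3

class Query:
    """A tiny, hypothetical query object: composable, rendered to SQL only at run time."""
    def __init__(self, table, columns=("*",), wheres=None, order=None):
        self.table, self.columns = table, columns
        self.wheres = list(wheres or [])
        self.order = order

    def where(self, clause, *params):
        # Returns a NEW query, so callers can keep composing without mutation.
        return Query(self.table, self.columns, self.wheres + [(clause, params)], self.order)

    def sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        params = []
        if self.wheres:
            sql += " WHERE " + " AND ".join(c for c, _ in self.wheres)
            for _, p in self.wheres:
                params.extend(p)
        if self.order:
            sql += f" ORDER BY {self.order}"
        return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, locale TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [("ann", "en", 30), ("bob", "nl", 25), ("eve", "en", 41)])

base = Query("users", ("name",))
english = base.where("locale = ?", "en")   # a reusable fragment
adults = english.where("age >= ?", 30)     # composed further, no string-smashing
sql, params = adults.sql()
rows = [r[0] for r in conn.execute(sql, params)]
# rows == ['ann', 'eve']
```

The SQL is still validated by the database, but the pieces are plain objects you can pass around and extend, which is the point being made above.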

[+] Pxtl|11 years ago|reply
I use Entity Framework as an ORM and I have to say I like it. As much as there have been a lot of false-starts related to trying to reinvent SQL, I think Entity Framework hits many of the right notes for me:

1) All my code lives in Visual Studio with compile-time type-checking. No maintaining stored procedures outside of my main codebase, no mucking about with strings. And because Entity Framework puts the Select after the From clause, I even get good intellisense on column-names.

2) I can quickly shift code into being in-memory queries from database queries.

3) While it's a little muddy mixing SQL and C#, there are places where C#/EF's syntax is so much briefer and cleaner since it's hyper-aware of relationships. For example, instead of writing 2 joins to get the country of a user, I can say User.Centre.Country.

Of course, the Entity Framework is adding a layer of complexity to my work - you still have to know what's going on in SQL, and that mental overhead is a downside. But it pays off.

SQL is a brilliant language, and that's why it's stood the test of time in spite of its advanced age.

But seriously, it has a lot of bad flaws - it's often difficult to make reusable components in SQL like you can in other platforms.

The underlying relational algebra is brilliant. That's why I like ORMs - they admit that SQL is ideologically beautiful, but the SQL language itself could use some loving.

[+] JonnieCache|11 years ago|reply
Sequel is the best ORM I've ever seen or used, but that is probably not a good application of it, IMO.

It gives you total choice over what level of abstraction you want, so this is a particularly egregious use of it, being as nothing about that query is dynamic :)

You can use it purely to execute handwritten SQL queries loaded from files, or stored procedures, or any level of abstraction between raw SQL and the monstrosity you posted. The model layer is totally optional and is built cleanly on top of the query/relational layer, not inside of it.

The thing is a fucking work of art, to be honest. The design shows impeccable taste, Ruby's more exotic features are deployed only when really needed, and everything is so clear, especially if you're used to ActiveRecord.

BTW also the developer is super helpful and pops up everywhere to answer questions.

[+] api|11 years ago|reply
Agreed, but it's important to understand what drove the adoption of NoSQL and schema-less stuff as well as the related trend of dynamic languages like Ruby and JavaScript.

(1) SQL server software itself was clunky, hard to scale, complex to deploy in a clustered fashion, and generally "old" in a bad way. This made NoSQL seem like a breath of fresh air.

(2) Startups! Ship now! MVP! Fail fast! The whole industry has been operating on this ship-fast-fail-fast way of doing things that prioritizes development speed and agility over correctness. That's because half the time what you are developing will be thrown away anyway -- it will fail, or you'll "pivot" and rip it all apart and almost start over. No point in static languages, SQL schemas, or other explicit design elements in that world.

#2 is I think the more significant of the two for things like schema-less MongoDB or dynamic languages. Once you get product/market fit, you typically end up taking the crap you wrote with no types and schemas and re-doing it in a "real language" with a "real database." But as they say in Startupistan, that's a "good problem to have."

[+] Sivart13|11 years ago|reply
I assure you that whoever wrote that code knows SQL.

The builder syntax is convenient because it allows queries to be composable without ugly string-smashing. For instance, you could take the object built out of that "User.select..." and pass it around into other functions that append extra 'where' clauses, etc.

[+] chousuke|11 years ago|reply
I may just be rehashing sibling arguments here, but to me that particular API looks very much like what I think an SQL-wrapping library ought to be: a replacement for string concatenation and something that allows you to treat SQL queries as data. I don't know anything about Sequel, but that example still feels close enough to SQL.

My experience with ORMs has been that I eventually end up regretting using one if I try to model my data as objects because with ORMs it's easy to code yourself into a corner where you end up wishing that your design supported the relational model instead.

I do understand the desire for a good ORM, though. SQL is extremely powerful and a well-designed database is a joy to work with, but a sequence of tuples is often not the most convenient datastructure to process in most programming languages.

[+] drzaiusapelord|11 years ago|reply
> As a greying developer

The problem with being a young person who wants to make his or her mark on the world is that if the people before you did an excellent job, you can only make things worse. In fact, I think this is one of our larger problems, not only in technology but in society (see the recent move to extremism in many aspects of our political and religious life, pushed for by the very people who grew up in more moderate times).

Some things should be boring and stable. Build on top of those things. Redoing the foundation every six months is just sub-optimal. Accept that you may not make that big mark in the world, or if you do, it'll be in ways you completely did not expect, just like everyone else who made their mark.

[+] coriny|11 years ago|reply
PostgreSQL is a very different beast to what it was 10 years ago.

So for a while it didn't support some use cases as well as various NoSQL/NewSQL databases. However, PostgreSQL has also been built by a great bunch of developers who are happy to adapt and implement the new, instead of harping on about the past.

So personally I'd adapt that advice to "use evidence to guide decisions" instead.

[+] stcredzero|11 years ago|reply
The lack of a schema may sound interesting, and in some cases it can certainly have its benefits. However, for many the usage of a schemaless storage engine leads to the problem of implicit schemas.

As another greying developer, one thing I saw with ORMs was implicit schemas, often motivated as bureaucracy shortcuts.

So, you need a dictionary of configuration data for your user, but the bureaucratic overhead of adding that to your schema is too much? Why, then take your fancy OO serialization technology and save that little dictionary as a binary blob and stick it into just one new column in the database.

Problem solved...until years later when information in one of those blobs becomes tied (by a maintenance programmer who didn't entirely understand the system architecture) to what is effectively an implicit schema.
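The blob trick above is easy to sketch. A contrived Python/SQLite example (names made up) of how that "just one new column" becomes an implicit schema the database can't see or enforce:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, prefs TEXT)")

# The bureaucracy shortcut: serialize a config dict into one opaque column
# instead of adding real, typed columns to the schema.
prefs = {"theme": "dark", "digest": True}
conn.execute("INSERT INTO users (name, prefs) VALUES (?, ?)",
             ("ann", json.dumps(prefs)))

# Years later: the blob's keys ARE a schema, just one with no types, no
# constraints, and no visibility to the database. Every reader must agree on
# the keys and re-implement defaults and validation by hand.
row = conn.execute("SELECT prefs FROM users WHERE name = 'ann'").fetchone()
loaded = json.loads(row[0])
theme = loaded.get("theme", "light")
# theme == 'dark'
```

The database can index and validate `name`, but knows nothing about `theme`, which is exactly the maintenance trap described above.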

[+] matwood|11 years ago|reply
I started with SQL in the 90s and completely agree with you. People who do not know SQL tend to reinvent it poorly in a language not designed to work with sets of data.

We have started using jOOQ and it strikes a great balance of raw SQL/typed SQL and result -> object mapping. I don't ever see us going back to a heavyweight ORM again. There is just too much magic, and too much time lost figuring out performance issues, for a minimal gain in ramp-up for those who do not know SQL.

[+] serve_yay|11 years ago|reply
Gosh, no kidding. (I sorta disagree about ORMs though -- if you're selecting by ID it's monkey work to write those queries, but anything complex, sure, use SQL)
[+] mtanski|11 years ago|reply
I would like to say: let's step back and not conflate SQL and relational databases. Clearly, SQL the language is the primary way most people interact with relational databases.

In my mind, SQL as a language is a huge PITA. First, parsing of complex statements is expensive (there are workloads where SQL parsing takes more time than processing the results).

Second, as SQL exists today (SQL2011) it's a large, complex language that's not implemented uniformly. So I understand why people want to build programmable interfaces for generating queries versus writing giant string statements that expand to SQL.

I personally wish there were an alternative language for interacting with relational databases that isn't SQL. Just expose the whole relational expression tree to users, say, à la S-expressions. The relational engine is going to optimize/re-order it anyway.

I mean something along the lines of:

  [ GROUP_AGGREGATE,
    [ "name" ],
    [ [ SUM, "COUNT(*)" ],
      [ SUM, "cost" ] ],
    [ JOIN, [ ==, "user_id", "customer_id" ],
      [ FILTER,
        [ &&,
          [ >=, "age", "30" ],
          [ IN, "state", "NY", "CT", "NJ" ] ],
        [ TABLE, "customer" ] ],
      [ TABLE, "orders" ] ] ]
Is it more verbose? Yes. But it's much easier for software to compose, parse, and machine-transform (code is data). Also, it makes you think in terms of relational operations/expressions versus SQL the language.
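The "code is data" claim is easy to demonstrate. A toy Python sketch (hypothetical node names, much simpler than the tree above) that walks a nested expression tree and renders it to SQL:

```python
import sqlite3

# Toy relational-expression tree. Each node is a tuple:
#   ("table", name) | ("filter", predicate, child) | ("project", columns, child)
def to_sql(node):
    op = node[0]
    if op == "table":
        return f"SELECT * FROM {node[1]}"
    if op == "filter":
        _, pred, child = node
        return f"SELECT * FROM ({to_sql(child)}) WHERE {pred}"
    if op == "project":
        _, cols, child = node
        return f"SELECT {', '.join(cols)} FROM ({to_sql(child)})"
    raise ValueError(f"unknown op: {op}")

tree = ("project", ["name"],
        ("filter", "age >= 30",
         ("table", "customer")))

# Because the query is plain data, software can inspect and transform it
# (e.g. push a filter down) before rendering, with no SQL parser involved.
sql = to_sql(tree)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [("ann", 41), ("bob", 25)])
rows = [r[0] for r in conn.execute(sql)]
# rows == ['ann']
```

A real implementation would render to the engine's native plan rather than back to SQL text, but the composability property is the same.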
[+] baldfat|11 years ago|reply
The real issue isn't "newtech koolaid"; it is the definition of the problem.

I find that people who don't understand the problem like MongoDB for the ability to be "flexible" in designing and modifying down the road. Though I'd argue you have flexibility in SQL also.

[+] jgreen10|11 years ago|reply
SQL databases had their imminent death coming. They were stuck. However, PostgreSQL is under more active development than most other databases, SQL or NoSQL. It is new and cool technology.

In the end, whether you speak SQL or use NoSQL constructs matters about as much as whether you write a program in C or Pascal. You can have endless debates about form and function, and you might want to see and identify with a winner, but in the end, whether the tools and the community surrounding a technology fit people's needs is the only thing that really matters.

[+] collyw|11 years ago|reply
Whenever I see someone shoehorning something into MongoDB or something else, I ask the reasoning for not using a relational database. It's always something along the lines of "We don't do that now" or "schemaless" (which, as the article points out, means more work for the dev) or something similar, with "fast" added somewhere into the description.

The crazy thing is most of them are working on a single node, so no horizontal scaling. And they seem to think non-relational is the new way to do things.

[+] theseoafs|11 years ago|reply
If the API surrounding the ORM is well-designed, using an SQL wrapper like this eliminates the possibility of SQL injection. There are arguments for not always doing things the same way.
[+] joevandyk|11 years ago|reply
The advantage of this is that you can pass the relational dataset that's a result of this code around to other methods. Those methods can add additional filters, different ordering, etc. You can't do that with raw sql strings.
[+] jjtheblunt|11 years ago|reply
Agreed. And one might note that there's a really good reason Oracle has dominated for decades, and Larry Ellison (who read the SQL research paper and formed a company to implement it nearly 40 years ago) is a billionaire.
[+] spydum|11 years ago|reply
I think there is a line to be drawn. If the ORM is saving you no effort, I fully agree. However, I suspect most ORMs boost early productivity quite a bit in the short term.

It's like writing an app from scratch in assembly, versus C.. The compiler will do quite a bit of the dirty work for you, and if you look at it in a disassembler no doubt you could find tons of improvements to be made (errmmm depending on the compiler I suppose). Doesn't mean either one is the wrong way to go about it. Now if all you do is stuff ASM code in your C app, yeah.. Why bother?

[+] scott_s|11 years ago|reply
I agree with the commenters who argue for the value in a library that wraps strings that happen to be SQL statements, but none that I saw touched on this point: I think most users of such a library will have to know SQL. These libraries seem to have a one-to-one matching between concepts in them, and concepts in SQL. Hence, the point is not protecting developers from learning SQL (because you can't, for these needs), but in finding the least painful way of integrating SQL into the application logic.
[+] Animats|11 years ago|reply
Right. If you write SQL directly, you can also use commands such as EXPLAIN to see what the lookup strategy will be. If there's a full table scan of a large table involved, maybe you need a different query or a new index. If the SQL generation is hidden by some library, you can't do that.
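For illustration, the same check works in SQLite (shown here from Python; the principle is identical with PostgreSQL's EXPLAIN or MySQL's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (user_id INTEGER, msg TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether SQLite will scan the table or use an
    # index; the human-readable detail is the last column of each row.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT * FROM logs WHERE user_id = 42")   # full table scan
conn.execute("CREATE INDEX idx_logs_user ON logs (user_id)")
after = plan("SELECT * FROM logs WHERE user_id = 42")    # index search
# 'SCAN' appears in `before`; 'SEARCH' and the index name appear in `after`
```

When a library generates the SQL for you, you can usually still log the generated statement and run EXPLAIN on it by hand, but the point stands: the farther the SQL is hidden, the harder this feedback loop gets.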

Wikipedia runs on MySQL. They have replicated read-only copies of the database and nginx caches which handle most read-type operations. Is your site busier than Wikipedia?

[+] evantahler|11 years ago|reply
Wait... does preferring to write SQL (and knowing how) make me old now?

Dang.

[+] jakejake|11 years ago|reply
I think the problem is that NoSQL has been the cool new thing, which causes people to use it without really understanding why it is better/worse than other solutions.

Tools like MongoDB can do things that are extremely difficult or impossible with MySQL/PostgreSQL and they are a great choice for those situations. Using it simply out of laziness or misunderstanding, though, is probably going to create problems later.

[+] ttty|11 years ago|reply
I sincerely still prefer MongoDB syntax because:

- Fits well with a programming language: {a: data.x} is better than 'WHERE A="' + data.x + '"' (sanitize?) or similar, which is harder to read. SQL queries are good for direct input; MongoDB-style queries are better suited for use from a programming language.
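Worth noting: the string-concatenation danger alluded to here has a standard SQL-side fix too, namely placeholders rather than concatenation. A small Python/SQLite sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (a TEXT)")
conn.execute("INSERT INTO users VALUES ('x')")

data_x = "x' OR '1'='1"   # hostile input

# Concatenation: the input is interpreted as SQL (classic injection),
# so the WHERE clause matches every row.
unsafe = conn.execute(
    "SELECT count(*) FROM users WHERE a = '" + data_x + "'").fetchone()[0]

# Placeholder: the driver passes the value out-of-band; no sanitizing needed,
# and the value is compared as a literal string.
safe = conn.execute(
    "SELECT count(*) FROM users WHERE a = ?", (data_x,)).fetchone()[0]
# unsafe == 1, safe == 0
```

So "SQL forces string building" is really an argument against concatenation, not against SQL itself; every mainstream driver supports parameters.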

[+] jv22222|11 years ago|reply
I agree; using ORMs and such hides the real SQL and makes it much harder to optimize using EXPLAIN etc.
[+] JustGotHere|11 years ago|reply
What I'd like to see is a universal SQL that can be translated to whatever dialect of SQL my current database is using. That way I won't have to relearn SQL every time I start a project with a different database engine.
[+] limelight|11 years ago|reply
This post reflects an interesting technical narrative of companies switching off MongoDB to more traditional relational databases as they grow.

Importantly, I don't think that's an indictment of MongoDB. Instead, it highlights the key advantages of NoSQL: ease of use and rapid iteration.

When you're first working on a project, MongoDB is very easy to slap in. You don't even have to create tables/collections. As you iterate, you don't have to constantly be updating schemas as your product direction shifts. Thus, MongoDB is perfect for early startups.

As a company scales, of course its warts start to show. The very attributes (like schemaless storage) which were an asset when just a few developers were hacking on the project become a drawback as the company grows in size. Reliability becomes more of a concern.

Naturally, people shift to a more traditional database. Personally, I'm almost ready to switch some of our core functionality from MongoDB to Postgres. This isn't an indictment of Mongo though.

[+] IgorPartola|11 years ago|reply
> Another way of handling this is defining a schema in your models. For example, Mongoid, a popular MongoDB ODM for Ruby, lets you do just that. However, when defining a schema using such tools one should wonder why they aren’t defining the schema in the database itself.

Bah. It's like they didn't know that schema-free data stores mean "there is no schema; different objects may have different fields". This is the whole point of MongoDB: you assume the responsibility of managing the schema. That's a "feature": you get greater flexibility by assuming more responsibility. Whether it's a useful feature, I won't say.

Their second reason is much more valid: MongoDB is not consistent/durable, all that. It's good for a cache, but not for long term data.

Here's a third reason I'd give against it as your primary data store: it's expensive. You have to keep your entire dataset in RAM, and that's not always necessary. My favorite example is from Foursquare. They had every single check-in ever in MongoDB, in RAM. That's absolutely unnecessary, and quite silly. Old check-ins are archived data. You don't need them. No user ever wants to know when/where they checked in three years ago. This is why at the time they were paying for 68 GB RAM Amazon boxes instead of 4-8 GB boxes. (I have no idea what they do now. I remember chatting with them on HN after a catastrophic out-of-memory failure when they filled up the entire 68 GB.)

[+] pizza234|11 years ago|reply
There is a mistake in the article, due to the OP not knowing an arguably basic notion about MySQL.

> when defining a field as int(11) you can just happily insert textual data and MySQL will try to convert it.

This is dependent on the SQL mode, which is quite flexible. For example, STRICT_ALL_TABLES will prevent strings from being inserted into INT fields:

    mysql> create table example ( `number` int(11) not null );

    mysql> insert into example (number) values ('wat');
    Query OK, 1 row affected, 1 warning (0,00 sec)

    mysql> SET sql_mode = 'STRICT_ALL_TABLES';

    mysql> insert into example (number) values ('wat');
    ERROR 1366 (HY000): Incorrect integer value: 'wat' for column 'number' at row 1

There are certainly advantages in choosing PostgreSQL over MySQL... this is just not one of them :-)

[+] bgentry|11 years ago|reply
> In the future we might also move our Rails applications over to Sequel, but considering Rails is so tightly coupled to ActiveRecord we’re not entirely sure yet if this is worth the time and effort.

Actually, with modern versions of Rails, using Sequel in place of ActiveRecord isn't bad at all. Nothing in Rails is really tied to ActiveRecord anymore. There are dependencies on ActiveModel, but you can easily make Sequel::Model objects conform to this interface. Sequel-rails helps with most of that: https://github.com/TalentBox/sequel-rails

It's really just the migration that's difficult, as that's tough to do piecemeal and requires good test coverage.

Here's a recent side project of mine that I switched from ActiveRecord to Sequel pretty quickly once I remembered how extremely limiting the querying capabilities of ActiveRecord can be: https://github.com/bgentry/portfolio-api

And if you really want to keep the option for schemaless data storage, Postgres can now do that with better performance than MongoDB, while keeping full indexing capabilities: http://blogs.enterprisedb.com/2014/09/24/postgres-outperform...

[+] aravan|11 years ago|reply
The article is sort of weird: why expect a document, no-SQL, no-schema DB to be a schema DB? One fundamental fact: you liked MongoDB in the early days (the first 5 years) because it was damn easy to handle anything you threw at it, and you learned to build the business around it. After 5 years you have learned enough, the business model has matured, and there aren't many changes in the data model. Development has matured, you have a team to work with, and your worries turn to consistency. You might want to go back to SQL. Now you know what schemas/tables you need (string lengths, data types, constraints, relationships, etc.) because you learned that over those 5 years. You don't need to deal with new developers screwing up the document DB (like storing an ObjectID as a string in a few places).

Developers who read the article, beware of reality: you can look back 5 years now, but you can't look forward 5 years ahead. Don't waste time dealing with database columns and schemas; build the business faster instead. I found MongoDB, or any document DB, a good fit for agility.

Thousands of businesses move to document DBs because they are schema-less.

While working on a document DB, you should be a master at writing stand-alone scripts in Python/Perl/Ruby to run every time you break the structure, or to fix inconsistencies.

edit: fixed typo errors

[+] code_duck|11 years ago|reply
I had a business based on another company's API years ago, and they started changing a lot of their systems to work with Mongo. The data lost a few useful points, like IDs, and the speed and reliability didn't seem to be improved. My thought at the time was: how about you guys just use PostgreSQL? My comment on the mailing list about that seemed to be taken as an insult or naïveté. However, sure enough, two or three years later they said they had all sorts of problems with Mongo and were switching to… Postgres.
[+] super_sloth|11 years ago|reply
Is there anybody here who has run MongoDB at moderate scale with good results?

As in a few terabytes of data, >10k ops/second territory.

I've been really disappointed with its reliability and performance in situations where I've been around that.

[+] esilverberg2|11 years ago|reply
The author's assertion that "Another problem with MySQL is that any table modification (e.g. adding a column) will result in the table being locked for both reading and writing. This means that any operation using such a table will have to wait until the modification has completed." is no longer correct as of MySQL 5.6:

http://dev.mysql.com/doc/refman/5.7/en/innodb-create-index-o...

If you specify ALGORITHM=INPLACE, LOCK=NONE you can alter a table without blocking reads and writes. We have used this method successfully in Amazon RDS when updating schemas.

[+] DonnyV|11 years ago|reply
As soon as you have more than one app talking to your database, it's time to either create a common library that talks to your database and can be shared, or wrap it in a service. This has nothing to do with NoSQL vs SQL. It has to do with architectural composition.

Plus, your problem with checking if a field exists is because Ruby doesn't support property attributes. This is easily solved in C# using attributes, which you can fake in Ruby: http://stackoverflow.com/questions/1085070/how-do-i-fake-c-s...

Data consistency is what your models are for. No data should be inserted before being assigned to a model.

Again it sounds like you needed a shared library or just a service wrapped around your database.

[+] jhoffner|11 years ago|reply
Something that I need to call out here. In talking about how to handle the lack of fixed schemas within Mongo, the OP refers to using if/else blocks to get around field name changes. This shows a severe lack of understanding of how to work with Mongo at a fundamental level. If using Mongoid (which he is), you simply need to run `Post.all.rename(title: :post_title)` to do the schema change. This may seem like a nitpick to some people, but honestly, if that concept was lost on the developer, then it's clear that there is little to no credibility to be had in the reporting of all the other Mongo issues.

I'm glad that the OP is growing as a developer and starting to understand the merits of SQL and why having defined schemas is important. SQL is awesome and it's not going to be replaced by NoSQL (at least not fully). However, please make sure that you are building the foundation of your understanding by fully grasping the technologies that you are already using.

[+] yawz|11 years ago|reply
> ... we value the following:
> Consistency

With all due respect, you should have had this list before selecting MongoDB. Consistency, as in "eventual consistency" and also in the way that you describe it is better supported outside the NoSQL group.

Also, I don't fully agree with the "schemaless" discussion. The moment you think about your data, you build a schema in your head, which, then, is translated into code etc. It's about how far you take this rigid model. Maybe you should see it as "flexible schema".

[+] tannerj|11 years ago|reply
So let me ask a question. What should I use when I do need a schemaless database? Is NoSQL never the answer? I've got a project that needs to allow clients to create registration forms for different events that my company hosts. A lot of the registration data will have a defined schema, e.g. name, email, address. I feel like that stuff should go in an RDBMS, but all the event-specific stuff needs to be schemaless. I know I can do custom key/value tables in an RDBMS, but that doesn't feel right either. Is MongoDB useless as a database, or are people being bitten for thinking it's a silver bullet and throwing it at every problem?
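For what it's worth, one common pattern (not the only answer, and purely illustrative) is a hybrid: keep the fixed fields in real, constrained columns and the per-event extras in a serialized document column. A hypothetical Python/SQLite sketch, with made-up names:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed, schema-ful part: typed columns with constraints the DB can enforce.
# The `extras` column holds the per-event answers as a JSON document.
conn.execute("""CREATE TABLE registrations (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL,
    extras TEXT
)""")

conn.execute(
    "INSERT INTO registrations (name, email, extras) VALUES (?, ?, ?)",
    ("Ann", "ann@example.com",
     json.dumps({"tshirt_size": "M", "diet": "vegan"})))

# Query the structured part with plain SQL; parse the flexible part in the app.
row = conn.execute(
    "SELECT name, extras FROM registrations WHERE email = ?",
    ("ann@example.com",)).fetchone()
extras = json.loads(row[1])
# row[0] == 'Ann'; extras['tshirt_size'] == 'M'
```

As other comments in this thread note, PostgreSQL's JSONB columns give you this natively, including the ability to index inside the document, so you don't necessarily need a second database for the schemaless part.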
[+] imfletcher|11 years ago|reply
While I've not spent much time with the "nosql" products, it's mostly because I've decided to stay with SQL. The bottom line is that most humans think of data the same way SQL thinks about data. When you talk to people outside of engineering (i.e. mgmt or customers), they expect the product to be able to do things that SQL does naturally (normalized data for easy changes to things like 'user name', joining and sorting based on a variety of cross-cutting properties, etc.). You can always make both do the same thing eventually, but one will fight you more than the other....
[+] Tyguy7|11 years ago|reply
I did a similar migration last year. Loving postgres.
[+] alttab|11 years ago|reply
This post outlines some of the most expensive parts of deploying, operating, and maintaining an application that operates on "NoSQL" databases like Mongo, and in my experience, DynamoDB:

1) Implicit Schemas. We avoid this completely by doing production migrations and re-indexes on every new field we add. It's expensive, and we have to write/test/run scripts in production. At times I wish I could just write a Rails migration on an RDS instance and call it a day. N-1 compatibility isn't hard to accomplish with good code reviews.

2) Search. Want to search and join like in the old days? Good luck. Using a NoSQL DB as the primary authoritative store of record is great, but you'll need secondary indexes for any searches you want to do. If you need to look up an object on a new field (or even one that already exists in all of your data) and you haven't built an index for it, you'll have to build one.

3) Serving Clients. Because of the schemaless-blobby nature of writing clients, things get real messy the minute you have multiple services or applications reading or writing to a DB. To get around this, you have to put a service in front of it and serve the data from some RPC technology, which is an extra step and requires more development and maintenance.

4) Administration. There isn't a long history yet of robust toolsets and "science" behind the different schemaless NoSQL databases. That means which one you choose has a huge impact on your ability to fine-tune it, debug consistency or other expectation issues, and do things like proper failover, backups, and restores. Knowledge between similar NoSQL DBs doesn't transfer as well, so your mileage will vary more than it does on SQL databases (MySQL vs PostgreSQL, for instance).

NoSQL has its place and purpose, but it is rarely as the "one database that rules them all" that many businesses end up with. I'd be interested in counter stories.

[+] st3fan|11 years ago|reply
Yes if you have one MongoDB database that uses `title` and another one that uses `post_title` then you have to adjust your code for that.

Guess what. Same thing applies to SQL.

[+] jqm|11 years ago|reply
I haven't used MongoDB in production (the comments regarding reliability have been around for a while), but playing around with it, I do like the JSON format and query structure.

The issue I have with SQL (MS SQL in the case of work) is the amount of cleverness involved in some queries I have seen. Among the old timers, it seems almost a badge of honor to develop the longest, most clever SQL query that does everything in one step. Inevitably, there are problems, and people have trouble figuring out why, because they can't debug parts of the statement in isolation. In this case, it's as if they have written an entire program in one line and can't test parts in isolation. No doubt this is abuse of the language... and admittedly my SQL skills are not world class, but I keep scratching my head and wondering why they do this to themselves. If a query is so complex you can't tell what it is doing, and it misbehaves, perhaps you would have been better off with some smaller queries you join in the program... nothing against joins... one or two or three of them... but really....

I keep asking myself if I'm missing something, but I sort of doubt it.
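One structural fix for the giant-clever-query problem, without splitting the work across application round-trips, is common table expressions: each WITH step is a named piece you can SELECT from on its own while debugging. A small sketch (run here against SQLite from Python, but CTEs are standard SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("ann", 50), ("ann", 70), ("bob", 10)])

# Each WITH step is a named, self-contained piece of the pipeline. While
# debugging you can run `SELECT * FROM totals` by itself instead of
# unpicking one giant nested statement.
sql = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),
big_spenders AS (
    SELECT customer FROM totals WHERE total > 100
)
SELECT customer FROM big_spenders ORDER BY customer
"""
rows = [r[0] for r in conn.execute(sql)]
# rows == ['ann']  (ann's total is 120; bob's is only 10)
```

It's still one statement the optimizer can plan as a whole, but each stage is individually testable, which addresses exactly the "can't debug parts in isolation" complaint.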

[+] harel|11 years ago|reply
I switched a new product in planning from MongoDB to PostgreSql 9.4. With the new JSONB columns you get the best of both worlds.
[+] hbbio|11 years ago|reply
Interesting to read both the article and the comments. It's cool to hate Mongo nowadays, but we use it successfully (for PEPS, open source: https://github.com/MLstate/PEPS).

The version of MongoDB used is not mentioned in the article. The changes between versions are a problem with Mongo, but it has improved much recently.

One problem clearly identified is the lack of a model. There are solutions to this. For instance, with Opa (http://opalang.org) we use strong static typing to generate a database model and guarantee that the whole application sticks to it. That leaves out model changes, but there are solutions for that too.

Also, there is no need to have the whole collections in RAM, but clearly enough RAM for the "working set" helps.

In the end, Mongo is no magic, and neither are SQL databases, which have their share of problems too.