top | item 29765253


mbell | 4 years ago

Truncating the DB between every test is indeed horrifically slow. However, it's much faster to wrap the test in a transaction and roll it back at the end. Transaction-based cleaning also allows parallel testing. That mostly leaves the argument of not writing tests that rely on the state of the database being clean. I have mixed feelings on this one.

Just last week I opened a PR to fix some tests that should not have been passing but were, due to an issue along these lines. The tests were making assertions about id columns from different tables, and despite the code being incorrect the tests were passing because the sequence generators were clean and thus in sync. The order in which the test records were created happened to line up in just the right way that an id in one table matched the id in another table.
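
That failure mode is easy to reproduce. A hypothetical sketch (sqlite3 stand-in; the `orders`/`invoices` schema is invented for illustration) of how fresh, in-sync sequences let a buggy cross-table assertion pass:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders   (id INTEGER PRIMARY KEY, total INTEGER);
CREATE TABLE invoices (id INTEGER PRIMARY KEY, order_id INTEGER);
""")

# On a pristine database both id sequences start at 1, so the first
# order and the first invoice happen to share the same id...
order_id = conn.execute("INSERT INTO orders (total) VALUES (100)").lastrowid
invoice_id = conn.execute(
    "INSERT INTO invoices (order_id) VALUES (?)", (order_id,)
).lastrowid

# ...and a buggy assertion comparing ids across tables passes by luck:
assert invoice_id == order_id  # only true because both sequences were fresh

# Pre-seeding one table (as fixture data would) breaks the coincidence:
conn.execute("INSERT INTO orders (total) VALUES (0)")
order2 = conn.execute("INSERT INTO orders (total) VALUES (200)").lastrowid
invoice2 = conn.execute(
    "INSERT INTO invoices (order_id) VALUES (?)", (order2,)
).lastrowid
assert invoice2 != order2  # the accidental alignment is gone
```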

So, I get the pain. But I'm not yet convinced it's worth a change.

Another option that I think isn't a bad approach is the default testing setup that Rails uses. Every test runs in a transaction but the test database is also initially seeded with a bunch of realistic data (fixtures in Rails lingo). This makes it impossible to write a test that assumes a clean database while also starting every test with a known state.


baskethead | 4 years ago

You must have this backwards.

Truncating a table is extremely fast. Rolling back a transaction is very slow. If you're not seeing this then there's something wrong with your setup.

amirkdv | 4 years ago

(Not grandparent commenter) I think you're usually right but I doubt it makes a difference at the scale of 2-5 objects created in a test case. The big game changers IME are in-memory dbs (SQLite) or parallel execution of tests.

This idea of "transaction rollback in test teardown because performance" has a life of its own. The recommended base class for Django unit tests (informally recommended, via code comments, not actual docs) uses transaction rollbacks instead of table truncation [0].

On top of this, I think db truncation sometimes gets mixed up with table truncation too. For example, from OP:

> The time taken to clean the database is usually proportional to the number of tables

... only if you're truncating the whole db and re-initializing the schema, no?

And people sometimes actually do clear the whole db between tests! One unfortunate reason is functionally necessary data migrations mixed up with schema-producing migrations, meaning truncating tables doesn't take you back to "zero".

[0]: https://docs.djangoproject.com/en/2.2/_modules/django/test/t...

jeltz | 4 years ago

This is definitely not true in PostgreSQL. In PostgreSQL, rolling back a transaction just requires writing a couple of bytes, while truncating requires taking a lock on every table you want to truncate, then probably writing more data per table than the rollback would, and finally doing a commit, which is itself more expensive than a rollback.

mbell | 4 years ago

Not with PG. A couple weeks ago I was working on a project that used truncation cleaning; the test suite took 3m40s. I switched it to cleaning with transactions and the test suite ran in 5.8s.

Truncation cleaning is extremely slow, not only because the cleaning is slower but because you actually have to commit everything your test code does.

pydry | 4 years ago

I used to effectively do this in postgres with rsync on a known fixture snapshot of the data files. It would usually take under two seconds to reset the state and restart the servers, which was easily fast enough to do effective TDD.
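
The snapshot-restore idea can be mimicked with any file-backed database. A hedged sketch (using a SQLite file and shutil in place of a PostgreSQL data directory and rsync; all paths are made up): build the fixture state once, copy the raw data file aside, and restore it to reset:

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "test.db")
snapshot = os.path.join(workdir, "snapshot.db")

# Build the fixture state once and snapshot the raw data file.
conn = sqlite3.connect(live)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.commit()
conn.close()
shutil.copyfile(live, snapshot)

def reset():
    """Throw away whatever the last test did by restoring the snapshot."""
    shutil.copyfile(snapshot, live)

# A test dirties the database...
conn = sqlite3.connect(live)
conn.execute("DELETE FROM users")
conn.commit()
conn.close()

# ...and reset() brings back the known fixture state.
reset()
conn = sqlite3.connect(live)
assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1
```

With a real server like PostgreSQL you'd additionally have to stop the server before swapping the data files and restart it afterwards, which is where the "under two seconds" above goes.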

I had a few other ideas to speed it up, also.

JoshTriplett | 4 years ago

Unless the test is checking performance, another option might be to start up multiple instances of the database, and run many tests concurrently each on identically prepopulated separate databases.
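
A toy version of that setup, as a sketch with invented names (one SQLite file per test standing in for one database instance per test, and a thread pool standing in for the test runner's parallelism):

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

def make_db(path):
    """Prepopulate one database identically for each test."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('alice')")
    conn.commit()
    conn.close()

def run_test(path):
    """Each test gets its own database; writes can't affect other tests."""
    conn = sqlite3.connect(path)
    conn.execute("INSERT INTO users (name) VALUES ('bob')")
    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    conn.close()
    return count

workdir = tempfile.mkdtemp()
paths = [os.path.join(workdir, f"test_{i}.db") for i in range(4)]
for p in paths:
    make_db(p)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_test, paths))

# Every test saw one seeded row plus its own insert, independently.
assert results == [2, 2, 2, 2]
```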

commandlinefan | 4 years ago

> Truncating the DB between every test is indeed horrifically slow

Using a database at all in unit tests is horrifically slow - one of the (many) reasons you shouldn’t.

vlovich123 | 4 years ago

They’re for integration tests, not unit tests. Although the distinction is frequently treated as meaningful by purists, I only use it as a way to distinguish conceptually how many complex layers are being stacked, since both usually run under “unit test frameworks” for reporting and assertion purposes. I view mocking as usually an anti-pattern. Careful DI usually gets you far enough and is easier to work with. You want the code under test to resemble what’s happening in production as much as possible. The more “extra” you have, the more time you’re wasting maintaining the test infrastructure itself, which is generally negative value (your users don’t care that the feature was late because you were refactoring the codebase to make each function easier to test in isolation).

Empty databases should generally start quickly unless there’s some distributed consensus that’s happening (and even then, it’s all on a local machine…). You also don’t even need to tear it down all the way - just drop all tables.
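
The "just drop all tables" reset can be done generically by reading the schema catalog. A sketch using sqlite3 (where the catalog is `sqlite_master`; in PostgreSQL you'd query `pg_tables` or `information_schema.tables` instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY);
""")

def drop_all_tables(c):
    """Reset to an empty schema without restarting the database."""
    tables = [row[0] for row in
              c.execute("SELECT name FROM sqlite_master WHERE type='table'")]
    for name in tables:
        c.execute(f'DROP TABLE "{name}"')

drop_all_tables(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type='table'").fetchone()[0]
assert remaining == 0
```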

ramchip | 4 years ago

Transactions / savepoints and parallelism make a huge difference. I have an app using Ecto and PostgreSQL, and running its ~550 tests takes under 5 seconds. Almost all of them hit the DB many times. The DB is empty and each test starts from a blank slate, inserting any fixture it needs.

An important trick when doing this is to respect unique constraints in fixtures. For instance, if you have a users table with an email column as primary key, make the user fixture/factory generate a unique email each time ("user-1@example.com", "user-2@example.com", ...). Then you don't get slowdowns or deadlocks when many tests run in parallel.
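
A minimal version of such a factory (hypothetical names, not Ecto's actual API): a shared counter guarantees every generated user has a distinct email, so parallel tests never contend on the unique column:

```python
import itertools

# Shared counter; each fixture call gets the next number.
_counter = itertools.count(1)

def user_fixture(**overrides):
    """Build a user dict with a guaranteed-unique email.

    Callers can override any field except they rarely should
    override the email, which is the unique/primary-key column.
    """
    n = next(_counter)
    user = {"email": f"user-{n}@example.com", "name": f"user-{n}"}
    user.update(overrides)
    return user

a = user_fixture()
b = user_fixture(name="alice")
assert a["email"] != b["email"]  # no unique-constraint collisions
assert b["name"] == "alice"      # overrides still work
```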

srer | 4 years ago

One supposes "horrifically slow" might be a bit subjective.

I notice in a VM on my laptop establishing the initial connection to postgres seems to take 2-3ms, and running a trivial query takes 300-1000us.

I routinely involve the database in unit tests, it is certainly slower but my primary concern is the correct behavior of production code which uses real databases.

ryanbrunner | 4 years ago

It depends on what is under test. If you're testing a model file that is highly coupled to the database, and whose entire purpose is more or less to function as an interface to the DB, tests need to include the DB almost by necessity. The alternative is to mock so much out that you're essentially testing your mocked code more than the unit under test.

goto11 | 4 years ago

What is the purpose of automated testing? Is it to ensure the code works correctly or is it to "run fast"?