Why you should be using Database Cleaner's `deletion` strategy

me: Your regularly scheduled reminder that Database Cleaner’s deletion strategy is so much faster than the truncation strategy.
my coworker: You should write a blog post about it!
— from Twitter

One of the Rails apps I maintain at work has always had an amazingly slow Cucumber test suite. And when I say amazingly slow, I mean really amazingly slow:

91 scenarios (91 passed)
545 steps (545 passed)
14m50.612s

It’s so bad that I don’t run the Cucumber tests, ever. If it’s going to fail, our CI server will pick it up - I won’t. No-one has ever known why they were slow - they were nearly all written several years ago, when Cucumber was just becoming popular, and it seemed to be taken for granted that it was just ‘one of those things’.

When doing some major refactoring lately I needed to run the Cucumber tests repeatedly, while rewriting some of them and fixing others. I thought I was going to lose my mind while waiting for them, so I decided to take the time out to look at why they were so slow.

I started with the env.rb file, which holds the configuration for the test suite. Abridged, it looked like this:

Dir[File.expand_path(File.join(Rails.root,'spec','support','blueprints','*.rb'))].each do |f|
  require f
end

Capybara.default_driver = :webkit
Capybara.javascript_driver = :webkit

# For those pesky tests that don't work with capybara webkit.
Before '@selenium' do
  Capybara.javascript_driver = :selenium
end

After '@selenium' do
  Capybara.javascript_driver = :webkit
end

Capybara.server_boot_timeout = 50
SunspotTest.solr_startup_timeout = 20

begin
  DatabaseCleaner.strategy = :truncation
rescue NameError
  raise "You need to add database_cleaner to your Gemfile (in the
    :test group) if you wish to use it."
end

Before do
  DatabaseCleaner.clean
end

It looked fairly innocent (apart from “those pesky tests that don’t work with capybara-webkit”), but my eyes zeroed in on the Database Cleaner configuration.

I have a love-hate relationship with Database Cleaner, especially in the last few months. I’ve done a lot of reading about why it works sometimes and not others, and what’s the best way to configure it, including this post by Avdi Grimm. One article I read (I thought it was that one, but apparently not…) mentioned that some people get better results with the deletion strategy, while some people swear by truncation. So I thought I’d try the deletion strategy instead - what harm could it do, except perhaps waste another 15min of my time?

Changing that one configuration option had an astonishing effect:

91 scenarios (91 passed)
545 steps (545 passed)
5m10.963s

That’s a saving of nearly 10 minutes on a 15min test suite. But why is it such a radical difference???

This app is using PostgreSQL as its data store, and I suspect that the two commands run by the two different strategies (DELETE FROM products vs TRUNCATE products) don’t do the same thing in PostgreSQL. This post on StackOverflow (and the subsequent discussion on the PostgreSQL performance mailing list) is a great explanation of exactly what the two commands do, and how they differ.

The tl;dr version of the threads are as follows:

  • TRUNCATE will give you back your tables in near-pristine state. The table gets emptied, the index is reset, the foreign keys are updated, a vacuum is run, and everything is back to as if you had just run a CREATE TABLE command. It has virtually a fixed cost per table, no matter how much data is in it.
  • DELETE is quicker and dirtier - it just deletes the data, nothing more. Indexes aren’t reset, foreign keys arent updated, vacuuming isn’t run, and you’re left with your table in a non-pristine state. The speed of running a DELETE is also very dependent on the amount of data in it.

TRUNCATE would be the better approach, for database consistency, efficiency and cleanliness, but for our testing purposes, do we really need it? We don’t.

  • Each of our tests will only create a couple of rows in a couple of tables, meaning the DELETE to empty them will be very very fast, much faster than a TRUNCATE.
  • The indexes and vaccuuming are irrelevant - PostgreSQL will run its automated vaccuums often enough that this will never cause an issue.
  • The foreign keys would be an issue if we weren’t emptying the entire database at the same time, after every single test. Because we are emptying the entire database, we won’t have any dangling foreign key references to cause problems.

So for all uses of Database Cleaner, I believe that the preferred cleaning strategy in all of your Rails apps should be deletion, not truncation. See how much time you can save in your test suite runs!