Parallelize your specs and don’t waste time

A year ago we started working on a new project. We spent a few months adding new features and making old code better. The build was green and fast. The time flew by and our test suite grew – a natural thing, you would say. Eventually our test suite took about 12 minutes. When we were working intensively, a few new commits in the repository caused a long delay before we got feedback from our Continuous Integration server. That was annoying.

Balancing

An obvious thing happened: we decided to add extra CI nodes and split our tests across them. The simple solution for the split was to assign an equal number of test files to each CI node. Some test files, like unit tests, were super fast, and others, like end-to-end tests, took much longer. The simple split wasn’t smart. We ended up with three fast CI nodes and one very slow one.
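To make the problem concrete, here is a minimal sketch of such a count-based split. The file names and timings are made up for illustration, and `split_by_count` is a hypothetical helper, not something our CI server or Knapsack provides.

```ruby
# Hypothetical timings in seconds; file names and numbers are invented.
SPEC_TIMES = {
  "spec/models/user_spec.rb"       => 2,
  "spec/models/order_spec.rb"      => 3,
  "spec/lib/parser_spec.rb"        => 1,
  "spec/lib/formatter_spec.rb"     => 2,
  "spec/helpers/date_spec.rb"      => 2,
  "spec/mailers/welcome_spec.rb"   => 1,
  "spec/features/checkout_spec.rb" => 120,
  "spec/features/admin_spec.rb"    => 150
}.freeze

# Naive split: contiguous chunks with the same number of files per node,
# ignoring how long each file takes to run.
def split_by_count(files, node_total)
  nodes = Array.new(node_total) { [] }
  files.each_with_index do |file, i|
    nodes[i * node_total / files.size] << file
  end
  nodes
end

# Total run time of each node, in seconds.
def node_durations(nodes, spec_times)
  nodes.map { |files| files.sum { |file| spec_times.fetch(file) } }
end

nodes = split_by_count(SPEC_TIMES.keys, 4)
node_durations(nodes, SPEC_TIMES).each_with_index do |seconds, index|
  puts "node #{index}: #{seconds}s"
end
```

With these invented numbers every node gets two files, yet three nodes finish within a few seconds while the last one runs for 270 seconds.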

[Figure: execution time per CI node without Knapsack]

It was sad seeing three CI nodes wasting their time.

Time is your friend

We tried a few solutions, but calculating time was the best one. I started working on a gem called Knapsack. The name is based on the knapsack problem. :) This gem helps parallelize specs across CI server nodes based on each spec file’s execution time. It generates a report of spec execution times and uses it for future test runs.
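The idea of balancing by time can be sketched with a simple greedy approach: take the slowest file first and always hand it to the node with the least work so far. This is only an illustration of the idea and is not claimed to be the algorithm Knapsack itself uses. The timings and the `split_by_time` helper are hypothetical.

```ruby
# Hypothetical timings in seconds; file names and numbers are invented.
TIMINGS = {
  "spec/features/admin_spec.rb"    => 150,
  "spec/features/checkout_spec.rb" => 120,
  "spec/features/signup_spec.rb"   => 90,
  "spec/features/search_spec.rb"   => 60,
  "spec/controllers/orders_spec.rb" => 30,
  "spec/controllers/users_spec.rb" => 20,
  "spec/models/order_spec.rb"      => 10,
  "spec/models/user_spec.rb"       => 10
}.freeze

# Greedy time-based split: walk the files from slowest to fastest and
# assign each one to the node with the smallest total so far.
def split_by_time(spec_times, node_total)
  nodes = Array.new(node_total) { { files: [], total: 0.0 } }
  spec_times.sort_by { |_file, seconds| -seconds }.each do |file, seconds|
    lightest = nodes.min_by { |node| node[:total] }
    lightest[:files] << file
    lightest[:total] += seconds
  end
  nodes
end

split_by_time(TIMINGS, 4).each_with_index do |node, index|
  puts "node #{index}: #{node[:total]}s (#{node[:files].size} files)"
end
```

With these numbers the node totals come out as 150, 120, 110 and 110 seconds, so no node runs much longer than the slowest single file. A greedy split like this is not guaranteed to be optimal, but it is cheap to compute and usually close enough.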

Don’t waste your CI nodes’ time

Now, with Knapsack, our test suite is split across CI nodes in a more efficient way. Here is an example of how execution time looks for each CI node with Knapsack.

[Figure: execution time per CI node with Knapsack]

Get started with Knapsack

Add the gem to your Gemfile and run the bundle command:

gem 'knapsack'

You need to bind the Knapsack RSpec adapter at the beginning of your spec_helper.rb:

require 'knapsack'
Knapsack::Adapters::RspecAdapter.bind

Finally, edit your Rakefile and add these lines:

require 'knapsack'
Knapsack.load_tasks

Generate an execution time report for your spec files

After you add Knapsack to your project, you need to generate a report with your spec files’ execution time data. Run the rspec command on one of your CI nodes:

$ KNAPSACK_GENERATE_REPORT=true bundle exec rspec spec

It will run all your specs and generate a file called knapsack_report.json. The contents of this file will be output at the end of the test suite. You then need to commit knapsack_report.json into your repository. Knapsack will use this file for better test balancing across your CI nodes.
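If you are curious which files dominate your suite, you can inspect the report yourself. The sketch below assumes the report is a flat JSON object that maps each spec file path to its run time in seconds. Check your own knapsack_report.json before relying on that. The sample data and the `slowest_specs` helper are hypothetical.

```ruby
require "json"

# Invented sample, used only when no real report is present.
# Assumed shape: { "path/to/file_spec.rb": seconds, ... }
SAMPLE_REPORT = <<~JSON
  {
    "spec/features/checkout_spec.rb": 42.7,
    "spec/models/user_spec.rb": 1.3,
    "spec/lib/parser_spec.rb": 0.4
  }
JSON

# Returns [file, seconds] pairs, slowest first.
def slowest_specs(report_json, limit = 5)
  JSON.parse(report_json)
      .sort_by { |_file, seconds| -seconds }
      .first(limit)
end

report_path = "knapsack_report.json"
report_json = File.exist?(report_path) ? File.read(report_path) : SAMPLE_REPORT

slowest_specs(report_json).each do |file, seconds|
  puts format("%8.2fs  %s", seconds, file)
end
```

Run it from the project root so it picks up your committed report; otherwise it falls back to the sample.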

This report needs to be updated only after you add a lot of new slow tests or change existing ones in a way that causes a big execution time difference between CI nodes. Either way, you will get a time offset warning at the end of the rspec results, which reminds you when it’s a good time to regenerate the Knapsack report.

Using Knapsack on your CI nodes

Run this command on each CI node, where CI_NODE_TOTAL is the total number of nodes and CI_NODE_INDEX is the index of the current node (counting usually starts at 0, so the second node gets CI_NODE_INDEX=1).

$ CI_NODE_TOTAL=2 CI_NODE_INDEX=0 bundle exec rake knapsack:rspec

The epic split: no problem

We are happier now because our CI feedback is much faster and we know that at any time we can add another CI node and have the epic spec split out of the box thanks to Knapsack.

There is always room for improvement

Do you want to help? There are a few things we can improve, like adding adapters other than RSpec or just improving the spec assignment algorithm. Feel free to fork Knapsack or just give us your feedback. Many thanks!

Oh, and one more thing: check the README, because Knapsack has even more features than described here.
