Streamlining code reviews with automated deploys ✈︎

Code reviews are an important part of developing reliable software. Reviewers need to make sure not only that the proposed code changes make sense and meet code style guidelines, but also that they work as intended. Cloning the pull request branch for testing often means stashing work in progress on a different branch. This context switch between developing and reviewing slows down the review process, especially when testing front-end changes. Some reviewers may be tempted to skip this step in their code reviews, increasing the risk of regressions when these pull requests are merged and deployed.

Developers are not the only people who review pull requests. Designers often need to approve UI changes made in pull requests. Some developers share the local server running on their dev machine using an IP address or a .local address. This makes it easier for designers to see the changes in a browser, but it ties up the developer's machine while the changes are being tested.

We also have a staging environment that is sometimes used for testing. Changes from multiple pull requests may be combined and deployed to staging, but only one set of changes may be tested at a time. Because of this, teams often need to coordinate and schedule time to use staging. In some cases, especially for external demos, a team may need to reserve staging for weeks at a time.

Introducing automated test pulls

There was a need for a solution that made it easier for code reviewers to test front-end changes without interrupting the developer's workflow. The initial solution was something we called a test pull. Each test pull is made up of a small set of services that are unique to each pull request, such as the main website and API endpoints. The test pull uses the staging cluster for all of the other services that it needs. This means that each test pull is fully functional, yet small enough that a single test pull host can serve several pull requests. We use a separate set of Docker containers for each test pull, and use Nginx to serve each one under a unique FQDN. Every test pull is accessible to all Hipmunk employees.

Test pull host diagram

A diagram illustrating a number of test pulls on the test pull host. Nginx uses the FQDN of the test pull to proxy requests to a Docker container for a specific test pull. These containers connect to other Docker containers for some services. The staging cluster is used for any services that aren't present on the test pull host.
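
To make the routing concrete, here is a minimal sketch in Python of how a deploy script might generate one Nginx server block per test pull. The domain pattern, port assignment, and config path are hypothetical stand-ins, not our actual setup:

    # Render one Nginx server block per test pull so that a unique FQDN
    # (pr-1234.testpulls.example.com is a made-up pattern) proxies to that
    # pull request's Docker container on a distinct local port.

    NGINX_TEMPLATE = """
    server {{
        listen 80;
        server_name pr-{pr_number}.testpulls.example.com;

        location / {{
            # Forward requests to this test pull's container.
            proxy_pass http://127.0.0.1:{container_port};
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }}
    }}
    """

    def write_nginx_config(pr_number: int, container_port: int,
                           conf_dir: str = "/etc/nginx/conf.d") -> str:
        """Write the server block for one test pull; returns the config path."""
        path = f"{conf_dir}/testpull-{pr_number}.conf"
        with open(path, "w") as f:
            f.write(NGINX_TEMPLATE.format(pr_number=pr_number,
                                          container_port=container_port))
        return path  # reload Nginx afterwards to pick up the new block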

We wrote some scripts that developers could use to deploy and remove test pulls. However, these scripts had to be run by hand each time a test pull needed to be created or updated. As a result, very few pull requests were running an up-to-date test pull, and stale test pulls often took up resources on the test pull host. We automated the creation of test pulls using a Jenkins freestyle job triggered on every commit to a pull request branch. A different job cleans up test pulls after their corresponding pull request is closed. Test pull deployment status is reported using GitHub's Deployments API, which adds links to the console logs and the test pull itself to the pull request's timeline whenever a test pull is created or updated. This not only reduces the workload on developers, but also makes sure we are making the most effective use of our single test pull host.

GitHub deployment statuses

Two deployment statuses in a single GitHub pull request: The first is an inactive deployment with a link to a console log. The second is an active deployment with a link to the console log and the test pull.
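
The reporting step looks roughly like the sketch below, which uses Python's requests library against GitHub's Deployments API. The repository, token, and URLs are placeholders, and depending on the API version in use, the log_url and environment_url fields may require a preview media type:

    # Sketch: register a deployment for a pull request branch, then mark it
    # deployed with links that appear on the pull request's timeline.
    import requests

    API = "https://api.github.com/repos/example-org/example-repo"  # placeholder
    HEADERS = {"Authorization": "token <github-token>",  # placeholder token
               "Accept": "application/vnd.github+json"}

    def create_deployment(branch: str, pr_number: int) -> int:
        """Create a deployment record for the pull request branch."""
        resp = requests.post(f"{API}/deployments", headers=HEADERS, json={
            "ref": branch,
            "environment": f"test-pull-{pr_number}",
            "auto_merge": False,      # deploy the branch exactly as pushed
            "required_contexts": [],  # don't block on status checks
        })
        resp.raise_for_status()
        return resp.json()["id"]

    def mark_deployed(deployment_id: int, pr_number: int, build_url: str) -> None:
        """Flip the deployment to 'success' and attach the timeline links."""
        resp = requests.post(
            f"{API}/deployments/{deployment_id}/statuses",
            headers=HEADERS,
            json={
                "state": "success",
                "log_url": build_url,  # e.g. the Jenkins console log
                "environment_url": f"https://pr-{pr_number}.testpulls.example.com",
            },
        )
        resp.raise_for_status()

A cleanup job could post an "inactive" status through the same statuses endpoint once the pull request closes, which is how the first deployment in the screenshot above ends up marked inactive.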

Successes

Fully automating the deployment and removal of test pulls brought some immediate benefits to developer productivity. Since there were no new commands to learn, developers and reviewers were able to start using test pulls right away:

  • Front-end developers were the first to incorporate test pulls into their workflows; almost immediately after we turned on the automated processes, we noticed that changes were being tested more consistently.
  • Designers and product managers could see changes in development at any time and provide feedback.
  • Mobile developers could also use test pulls to test changes on real devices. Using a test pull is more convenient than figuring out how to connect to a server running on a developer's machine, especially for testing performance over cellular data connections.
  • Mobile clients (native and mobile web) could be configured to connect to a test pull to test API changes. Since test pulls run on their own domain, it is simply a matter of configuring the app to use the test pull FQDN as the base domain name (see the sketch after this list).
  • In general, we've seen test pulls encourage exchange and feedback across teams while keeping our development and testing overhead light.
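
As mentioned above, pointing a client at a test pull only requires swapping the base domain. A minimal sketch, with a hypothetical FQDN pattern and endpoint path:

    # Point API calls at a test pull instead of production by swapping the
    # base domain. The FQDN below is a made-up example.
    BASE_DOMAIN = "pr-1234.testpulls.example.com"  # "example.com" in production

    def api_url(path: str) -> str:
        """Build an API endpoint URL against the configured base domain."""
        return f"https://{BASE_DOMAIN}/api/{path.lstrip('/')}"

    print(api_url("flights/search"))
    # -> https://pr-1234.testpulls.example.com/api/flights/search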

Learnings and next steps

Test pulls do not have to run every service in the app to be useful. We only run a handful of services in our test pulls, which is enough to see most UI and API changes to the site on web and mobile. Adding more services would let us test even more kinds of changes, but would require scaling up our test pull host. Starting small let us quickly put together something that works.

There are only so many services we can run on a single test pull host. We are beginning to outgrow it and are looking to scale up to a multi-host setup using Docker Swarm. We can use the additional machines to power even more services for each test pull and cover more kinds of changes. In a way, we are transitioning our staging environment into something that exclusively powers our test pulls, serving not just code review but also external partner demos and other kinds of non-production deploys.

A fully automated test pull system like this takes some engineering effort to put together, but greatly improves the consistency of code reviews across an entire organization. By making it easier to properly exercise and test each change before a deploy, we make teams more productive and prevent trivial regressions that lead to rollbacks.