At Hipmunk we prefer to use "boring" technology. We'll take the tried-and-true, battle-tested tech over the current flavor of the month any day. We strive to punch above our weight, and one of our biggest challenges is doing the most we can with the small, high-caliber team that we have. Doing so requires us to keep things simple and leverage what we know best. We experiment with new technologies from time to time (as you'll see later), but it's important that the new tech is actually solving a problem for us and that we're not just buying into the hype.
When Hipmunk was founded six years ago we chose AWS for hosting. For the most part, we've been happy with that decision and still continue to use them to this day. We do most of the provisioning ourselves using custom Python scripts and Puppet for configuration management. We host most services ourselves on EC2 but that's starting to change as hosted solutions become cheaper and more reliable.
PostgreSQL is our primary database. We host it ourselves on EC2, although we use RDS for a few things. We have many workloads that depend on geospatial data, and PostGIS has been indispensable for that. We have a custom-built ORM, similar to Uber's Schemaless, which lets us make most schema changes without having to run a data migration. It's optimized for reads, though, which means it can be a problem for heavy write workloads.
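Our ORM isn't open source, but the core idea behind a Schemaless-style store can be sketched in a few lines: writes append immutable cells of JSON keyed by entity and column, and reads simply take the latest cell, so adding a field never requires a migration. The class and key names below are hypothetical, and a dict stands in for the Postgres-backed storage.

```python
import json
from collections import defaultdict

class SchemalessStore(object):
    """Toy sketch of a Schemaless-style store: each write appends an
    immutable (entity_key, column, json_body) cell; a read returns the
    latest cell, so new fields never require a schema migration."""

    def __init__(self):
        # entity_key -> column -> append-only log of JSON blobs
        self._cells = defaultdict(lambda: defaultdict(list))

    def put(self, entity_key, column, body):
        self._cells[entity_key][column].append(json.dumps(body))

    def get(self, entity_key, column):
        cells = self._cells[entity_key][column]
        return json.loads(cells[-1]) if cells else None

store = SchemalessStore()
store.put("trip:42", "itinerary", {"from": "SFO", "to": "JFK"})
store.put("trip:42", "itinerary", {"from": "SFO", "to": "JFK", "stops": 1})
# The latest read now includes the new "stops" field with no migration.
latest = store.get("trip:42", "itinerary")
```

The read-optimized trade-off mentioned above falls out of this design: reads only touch the newest cell, but every write grows the log.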
We used to use Memcached as our primary tool for caching, but today we use Redis across the board. Redis is only used as a cache, and we don't enable persistent storage on any of the instances. Like Postgres, we host Redis ourselves on EC2. We've found Redis to be flexible, fast, and reliable; most of the problems we've had with it were our own fault.
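Since we treat Redis purely as a cache, most usage follows the standard cache-aside pattern: check the cache, and on a miss, compute the value and store it with a TTL. Here's a minimal sketch of that pattern, with an in-memory dict standing in for the Redis client (the real thing would be a `GET` plus `SETEX`); the key names are hypothetical.

```python
import time

# In-memory stand-in for a Redis client: key -> (value, expiry time).
_cache = {}

def cached(key, ttl_seconds, compute):
    """Cache-aside: return a fresh cached value, or recompute and store."""
    entry = _cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value  # cache hit, still within its TTL
    value = compute()  # cache miss (or expired): recompute
    _cache[key] = (value, time.time() + ttl_seconds)
    return value

# Computed on the first call; served from the cache for the next 5 minutes.
rate = cached("hotel:123:nightly_rate", 300, lambda: 189)
```

Because nothing is persisted, losing a Redis instance just means a burst of recomputation, not data loss, which is what makes the no-persistence setup safe for us.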
We use Zookeeper for configuration management and are experimenting with using Cassandra for some of our write-heavy workloads.
We use a few different tools for logging. Server logs are aggregated by a self-hosted ELK (Elasticsearch, Logstash, Kibana) stack. We aggregate most errors using Sentry. Sentry is great because it shows you complete stack traces, dedupes similar errors, and even allows you to inspect the state of variables when the error occurred. We also log events for analytics using a custom event pipeline built with ZMQ, Postgres, S3, and HBase. This makes up the data pipeline powering our internal analytics platform, which is also self-hosted on AWS.
Our analytics platform.
We use Jenkins to run our automated tests as well as deploy our code to production. Every patch must pass tests and be code reviewed before it's shipped to production. We've got close to 80% coverage of our backend codebase. On the frontend, we have unit tests plus CasperJS for functional testing.
Most of our dev team develops on Macs but we do have a handful of Linux users. We develop locally on our own machines using Docker (we don't use Docker in production yet). We have an install script that can get you a fully working dev env in a few hours.
Hipmunk's platform is built to serve all of our clients: web, mobile web, iOS, Android, Hello Hipmunk, and even some of our partners. It's pretty much entirely built on Python (2.7), and we use Tornado as our web framework. Tornado is an asynchronous web framework and takes some getting used to, but it's perfect for us because we frequently have to make many API calls to external services in parallel. We use the built-in Tornado server to serve requests and use HAProxy to load balance requests between the processes running on each machine.
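The win from async is the fan-out: a single search can query many external providers at once, so total latency is bounded by the slowest provider rather than the sum of all of them. In production this is Tornado's non-blocking HTTP client inside a coroutine, but the shape of the pattern is the same in the self-contained sketch below, which uses a thread pool and stub fetchers; the provider names and `fetch_quotes` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_quotes(provider):
    # Stub standing in for a non-blocking HTTP call to one provider's
    # API (in production: Tornado's async HTTP client in a coroutine).
    return {"provider": provider, "quotes": []}

def search_flights(providers):
    # Fan out to every provider at once and wait for all responses,
    # so latency is the slowest provider, not the sum of all of them.
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        return list(pool.map(fetch_quotes, providers))

results = search_flights(["providerA", "providerB", "providerC"])
```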
The bulk of Hipmunk still runs as a monolithic web application. However, we do segment our machines by route so that we can control the type of workload they handle. For example, flight requests go to a different cluster than hotel requests. We're not at the scale where it makes sense to use microservices, so the monolith approach enables us to move quickly with minimal overhead. The same codebase is deployed to all of our machines at once, but only a subset of it is actually being exercised on any given machine. Requests first go through Fastly, our CDN, which load balances into one of our HAProxy machines. Those machines then forward the request to the appropriate server based on the request path.
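The actual routing lives in HAProxy ACLs on the request path, but conceptually it's just a prefix-to-cluster table with a default. Here's a sketch of that shape; the prefixes and cluster names are hypothetical.

```python
# Conceptual version of our HAProxy path-based routing: longest-prefix
# style match on the request path, falling back to a default cluster.
ROUTES = [
    ("/flights", "flight-cluster"),
    ("/hotels", "hotel-cluster"),
]
DEFAULT_CLUSTER = "web-cluster"

def cluster_for(path):
    for prefix, cluster in ROUTES:
        if path.startswith(prefix):
            return cluster
    return DEFAULT_CLUSTER
```

Because every cluster runs the same deployed codebase, changing this table is purely an ops decision; no code has to move.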
We use Rundeck to schedule our background jobs. It has some nice advantages over Cron including a web UI, automatic capturing of log output, alerting on failed jobs, and the ability to retry failed jobs. We use Upstart for daemons that need to be continually running.
Web and Mobile Web
Most of our templating is done client-side. A year ago this was mostly done with CoffeeScript and Drykup. Since then we've transitioned to using ES6/React/Redux for new features. The website and mobile website have separate entrypoints but share React components and core libraries with each other. On the rare occasion where we need to do server-side templating, we use Mako.
iOS and Android
We adopted Swift fairly quickly when it came out for iOS. New development is done in Swift, but we still have a large codebase of Objective-C. We expect 80% of our code to be in Swift a year from now. On Android, we use Java. Both apps are beginning to use web views to power features so that they can reuse React components from the mobile website. By doing this, we maintain a consistent experience across platforms with as little effort on our part as possible, while keeping performance up to our standards.
Hello Hipmunk is built with Python. Most of the NLP is done in-house but we outsource a shrinking portion of it to wit.ai and use Elasticsearch to query our hotel dataset. Hello is currently integrated with Slack, Skype, Messenger, and email. Hello is treated like any other client from the platform's perspective and communicates with it mostly via API calls.
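Once the NLP layer has extracted structured intent from a message, querying the hotel dataset is a fairly ordinary Elasticsearch search. As an illustration, a request like "hotels near the waterfront under $250" might translate into a bool query with a text match plus range and geo-distance filters. The index fields and helper below are hypothetical, not our actual schema; the sketch just builds the query body and doesn't call Elasticsearch.

```python
def hotel_query(text, lat, lon, max_price, radius="2km"):
    """Build an Elasticsearch query body (hypothetical field names):
    full-text match on the hotel name, filtered by nightly rate and
    distance from the point the NLP layer extracted."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                "filter": [
                    {"range": {"nightly_rate": {"lte": max_price}}},
                    {"geo_distance": {
                        "distance": radius,
                        "location": {"lat": lat, "lon": lon},
                    }},
                ],
            }
        }
    }

body = hotel_query("waterfront", 37.7955, -122.3937, 250)
```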
There you have it. As you can see we like to keep things simple but we're not afraid to modify our toolchain as our needs change.
Interested in joining the team? Check out our jobs page!