Putting Pants on a Python


Most of the code that keeps AddThis going is JavaScript in the browser or Java on our backend servers. However, supporting the infrastructure is a fair amount of Python “glue” code. This glue covers everything from simple “scripts” to provisioning servers, analyzing DNS zones, migrating containers, generating dashboards from service discovery data, and CI jobs that build more CI jobs.

We traditionally used the standard Python virtualenv to manage dependencies, but didn’t find it to be a great fit in this case. While second nature to Python developers, the virtualenv setup-and-source dance was a source of friction for everyone else. From the point of view of the Python novice it felt a bit like being asked to fiddle with autotools every time you wanted to run /bin/ls.

As virtual environments are generally neither portable nor relocatable, they also have drawbacks as a server-side deployment mechanism. Instead of being built as an artifact in a continuous integration pipeline, they are typically built directly on the server. If we were not careful to specify every transitive dependency explicitly, an environment built on Tuesday could differ from the one built on Monday. This was obviously not ideal, and worse, the bugs that result tend to be subtle.

Finally, as short scripts became long scripts, and long scripts became projects, an increasing amount of copy-paste and related technical debt accumulated. At one particularly embarrassing point we had approximately 30 definitions of a function to do mkdir -p. We wanted it to be easy to discover, navigate, and share code without paying the cost of a brittle network of interdependent micro-packages. Fixing a bug in a function should be as easy as developing a fix and passing code review, not writing a fix and then fiddling with version requirements for 50 downstream projects. Or to put it another way, propagating a bug fix should ideally take O(1) effort with respect to the number of internal projects, not O(n).
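To make the mkdir -p example concrete, here is a minimal sketch of the kind of helper that ends up copied into every script. The name mkdir_p is only illustrative, and the sketch assumes Python 3.2 or later, where os.makedirs accepts exist_ok; it is not our actual shared code.

```python
import os


def mkdir_p(path):
    """Create `path` and any missing parent directories.

    Behaves like `mkdir -p`: it is not an error if the directory
    already exists. Assumes Python 3.2+ for the `exist_ok` flag.
    """
    os.makedirs(path, exist_ok=True)
    return path
```

Kept in one shared library, a fix to a function like this lands once instead of roughly 30 times.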

To improve on all of these pain points we incrementally converted our Python code to use Pants, a multi-language build system focused on managing a cohesive set of fine-grained targets sharing a single repository. With Pants, dependencies are managed in BUILD files that live alongside the code. Rearranging classes and their dependencies becomes no harder than any other refactoring operation within a repository. As scripts were converted to use Pants they could take advantage of a growing body of shared library code, simplifying them while paying down technical debt.
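As a rough illustration, a BUILD file for a shared library and a script that uses it might look like the sketch below. The directory layout, target names, and file names are hypothetical, and the target types follow the Pants 1.x Python backend as we understand it; check the Pants documentation for the exact syntax in your version.

```python
# src/python/common/BUILD (hypothetical path)
python_library(
    name='fileutil',
    sources=['fileutil.py'],  # e.g. the one shared mkdir -p helper
)

# src/python/provision/BUILD (hypothetical path)
python_binary(
    name='provision',
    source='provision.py',
    dependencies=[
        'src/python/common:fileutil',  # internal code, referenced by repo path
        '3rdparty/python:PyYAML',      # third-party code, defined once elsewhere
    ],
)
```

Because dependencies are addresses within the repository rather than versioned packages, moving fileutil.py means updating a path, not publishing a release.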

Pants can create PEX files: self-contained Python executable artifacts, comparable to a “fat jar” on the JVM or a statically compiled Go binary. This lets us build an immutable artifact on our continuous integration server, know it will never change, and know it will run on any of our servers. Pants can also just run a target locally, which was perfect for giving everyone a simple way to run scripts on their workstation without maintaining dozens of virtual environments. Running a PEX or a Pants run target doesn’t require any knowledge of the inner workings of Python or Pants.

A side benefit of switching to Pants was centralizing our dependencies on 3rd-party open source projects. Previously each requirements file would have its own list of version constraints for popular projects like Fabric, Jinja2, or PyYAML. Inevitably these version constraints were slightly different and out of sync. With the Pants 3rd-party pattern we could collapse that to a single definition per project, making for easier debugging and future maintenance.
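A minimal sketch of that pattern, assuming the conventional 3rdparty/python/BUILD location: each external project gets exactly one target, and every internal target depends on it by address. The version pins shown are placeholders, not the versions we actually use.

```python
# 3rdparty/python/BUILD
python_requirement_library(
    name='PyYAML',
    requirements=[python_requirement('PyYAML==3.11')],  # placeholder pin
)

python_requirement_library(
    name='Jinja2',
    requirements=[python_requirement('Jinja2==2.8')],  # placeholder pin
)
```

Upgrading a library then becomes a one-line change in this file rather than a hunt through dozens of requirements files.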

Pants has seen significant improvements just in the short time we have been using it and this week brings the release of Pants 1.0. Congratulations to the entire Pants team! If you or your organization are facing any of these problems, take a look at Pants.