How to Speed Up Your GitLab CI Pipelines for Node Apps by 40%

Article by Drew Tabor, an engineer at AddThis

Are you limited in moving to full CI/CD because of resource constraints? Are you having trouble scaling or concerned about performance? Or maybe you’re tired of waiting on your pipelines to run?

To explore how to improve all of the above, let’s dive into one of the core principles from the Agile Manifesto: the art of maximizing the amount of work not done.

What Your Pipeline Really Needs to Do

Take this example of a pipeline that builds and deploys to a test environment:
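
As a rough sketch, such a .gitlab-ci.yml might look something like this; the stage and job names mirror the ones discussed below, and the scripts are placeholders rather than the exact configuration:

stages:
  - install_dependencies
  - build
  - test
  - dockerize
  - deploy

install_dependencies:
  stage: install_dependencies
  script:
    - npm ci
  artifacts:
    paths:
      - node_modules/

build:
  stage: build
  script:
    - gulp build
  artifacts:
    paths:
      - build

test:
  stage: test
  script:
    - npm test   # or whatever your test runner is

dockerize:
  stage: dockerize
  script:
    - <build a Docker image from the build output and push it to your registry>

test_deploy:
  stage: deploy
  script:
    - <deploy script that pushes a remote Docker image to Kubernetes>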

Seems like a reasonably intuitive pipeline, right? Obviously, you need to install your npm packages. You can’t deploy a build if there isn’t a build. You also want to make sure all your tests pass. And sticking your build in a Docker container (like in this example) and deploying it certainly doesn’t seem optional.

However, if we zoom out, we start to see some redundancies. Here’s a pipeline that deploys to a production environment. Notice that the test job already doesn’t run. A bit of foreshadowing…
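
Concretely, the production pipeline’s stage list might look roughly like this (again, the names are illustrative), with no test stage in sight:

stages:
  - install_dependencies
  - build
  - dockerize
  - deploy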

I give you this:
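
What I’m giving you is a single pipeline that carries a commit from the test environment all the way to production. Roughly, its stages might look like this sketch; the gate stage name matches the manual job shown a bit further down, and deploy_prod is just an illustrative name:

stages:
  - install_dependencies
  - build
  - test
  - dockerize
  - deploy                  # deploy to the test environment
  - Begin_deploy_to_prod    # manual gate, see the job below
  - deploy_prod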

“But Drew, I don’t want my code going straight to production immediately after the test environment!” you scream, shaking your laptop. “I need to do some manual testing first!”

Don’t worry. You can have your cake and eat it, too.

See the following:


click to deploy to prod:
  stage: Begin_deploy_to_prod
  script:
    - <notification that prod is deploying>
  when: manual
  allow_failure: false

Adding `when: manual` together with `allow_failure: false` turns this into a blocking manual job: the pipeline pauses here and waits for you to resume it. Don’t want or need to deploy a particular commit from the test environment to production? Don’t. It won’t hurt anything.

Maximize your work not done.

The Tricky Part

GitLab CI has a caching mechanism we can use to do even less work if we set it up the right way.

How often do you update your dependencies? Probably not that often, right? And if you’re not updating them from one commit to the next, `npm install` or `npm ci` is doing the exact same thing from one pipeline to the next. If only there were a way to just run that when it needed to be run…

Well, as of October 2018 (when GitLab 11.4 introduced `only: changes`), this is very easy to implement.

Before:

install_dependencies:
  stage: install_dependencies
  script:
    - npm ci
  artifacts:
    paths:
      - node_modules/

After:

install_dependencies:
  stage: install_dependencies
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_NAME
    paths:
      - node_modules/
  script:
    - npm ci
  only:
    changes:
      - package-lock.json

In the code block above, the purpose of install_dependencies has shifted from “install all the node modules from scratch and pass them downstream” to “update the cached node modules only when the lockfile changes.” If you’re not familiar with `npm ci`, it’s worth a quick read of the official npm documentation: it does a clean install strictly from package-lock.json.

The build job, in turn, reads in the node modules from the cache instead of an artifact:

build:
  stage: build
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_NAME
    paths:
      - node_modules/
    policy: pull
  script:
    - gulp build
  artifacts:
    paths:
      - build

Note the cache key – this will render out to something like drewsBranch-drewsProject. While it’s certainly possible to get this functional cross-branch, in my opinion, it introduces a lot of potential brittleness and edge cases that aren’t worth dealing with.

Best to keep it simple in this case and stick with one cache per branch. Be sure to specify that you only want to pull the cache in the policy – the build job doesn’t need to upload the cache again after running since nothing in it has changed.
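
If you want to shave off a little more, the install_dependencies job itself doesn’t need to download the existing cache either: `npm ci` deletes node_modules before installing, so pulling the old cache is wasted work. A push-only policy uploads the fresh result without downloading first (this is an optional extra, not something the pipeline needs in order to be correct):

install_dependencies:
  stage: install_dependencies
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_NAME
    paths:
      - node_modules/
    policy: push    # upload the refreshed cache, skip downloading the old one
  script:
    - npm ci
  only:
    changes:
      - package-lock.json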

Squeezing Out More Improvement

One last thing I want to touch on is selectively pulling artifacts. This is a more incremental improvement than the previous strategies, but it can still compound into nice gains depending on the size of your project and the number of stages in your pipeline.

In the example pipeline from above, the “build” job creates an artifact that is passed to downstream jobs. The downstream jobs, by default, download all upstream artifacts before starting the script you provide. However, you can specify which, if any, artifacts a given job actually needs!

For us, the only job that needs the build artifact is “dockerize,” so we can tell the remaining jobs not to download anything, thereby speeding them up even more:

test_deploy:
  stage: deploy
  script:
    - <deploy script that pushes a remote Docker image to Kubernetes>
  dependencies: []
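
The dockerize job, on the other hand, is the one place to keep an explicit artifact dependency, since it actually needs the build output. A sketch, with the script left as a placeholder in the same style as above:

dockerize:
  stage: dockerize
  script:
    - <build a Docker image containing the build output and push it to your registry>
  dependencies:
    - build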

This principle also applies to the cache – don’t pull it if you don’t need it!
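
For instance, if you declare the cache at the top level of your .gitlab-ci.yml instead of per job, any job that never touches node_modules can opt out of it with an empty cache definition (the job name here is just an example):

prod_deploy:
  stage: deploy
  script:
    - <deploy script that pushes a remote Docker image to Kubernetes>
  dependencies: []   # no artifacts needed
  cache: {}          # no cache needed either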

Show Me the Numbers

This is all well and good in theory, but how about some real numbers to paint a real picture? I took one of our projects at AddThis and gave it a CI facelift while writing this article. Here is the before and after:

Install dependencies

Before: ~3 minutes

After: 0 minutes*

*For master, without any changes to dependencies; more often than not, this project does not have dependency changes.

Time to deploy to test, from merge to master until finish

Before: ~7 minutes

After: ~4 minutes

Improvement: 43%

Time to deploy to production, from button click until finish

Before: ~7 minutes

After: ~1 minute

Improvement: 86%

Bonus Numbers: Resource Usage

We maintain our own runners, so we’re keenly interested in resource usage as owners, in addition to speed as users. The more efficiently projects use the runners, the fewer resources we need to support those projects.

Before this facelift, two pipelines (one to test, one to prod) ran 11 jobs – Install Dependencies (2), Build (2), Test, Dockerize (2), Deploy (3), and Purge CDN (1).

Afterward, a single pipeline runs only 7 – Build, Test, Dockerize, Deploy (3), and Purge CDN (Install Dependencies now runs only when package-lock.json changes, so it usually doesn’t run at all). That’s a 37% improvement in jobs run! We can now support almost 5 projects instead of 3 with the same amount of resources: 3 projects × 11 jobs is 33 job runs of capacity, and 33 ÷ 7 ≈ 4.7 projects.

What does your pipeline look like?