In my experience, build optimisation is more often than not driven by a ‘gut’ feel rather than actual data. This is often because the data that tells you where you need to make adjustments can be very hard to retrieve.
In this short post you will learn how the cloud Software Team at Adaptavist saved money and reduced CPU usage by optimising resource allocation in Bamboo. We hope that this post will help other engineers facing similar challenges.
Optimise memory allocation to reduce uptime for remote agents
The cloud Software Team at Adaptavist runs a few hundred builds per day on a Bamboo server. We use Atlassian’s Elastic Bamboo as a scheduling and orchestration framework, which allows us to use computing resources from the Amazon Elastic Compute Cloud (EC2) to run over 100 agents at peak times.
To optimise our resource allocation, we wanted to identify the jobs that would benefit from a larger EC2 instance with more memory and CPU cores. We hoped this would make those jobs run quicker, which would keep their agents engaged for a shorter amount of time and therefore reduce costs. To achieve this, we needed more visibility over our remote agents.
Gaining visibility over agent lifecycle events
Adaptavist uses DataDog, which allows us to see which instances are up and to identify the CPU and memory usage associated with them. However, it does not tell us whether an agent is active and running jobs. So, if an agent runs different types of jobs and CPU usage is high, we can’t tell which job is responsible for the usage.
For each job, we needed to know the project and the EC2 instance it’s running on. To achieve this, we looked at how to enrich DataDog with information about lifecycle events, such as which job ran, which project it belongs to, the job duration and the queue times.
Initially, we looked at writing and implementing something that would capture the lifecycle events we wanted directly on the agents, but this approach had a serious drawback. We would have had to rely on scraping logs (i.e. ‘if we see this log message then it means this event has happened’), so if Atlassian changed the log message format or text, or if we internally changed how verbose our logging is, the scraper would break.
Fortunately, ScriptRunner for Bamboo saved the day.
Using ScriptRunner listeners to supplement info in DataDog
If you’re familiar with any of the other ScriptRunner apps, you’ll find that ScriptRunner for Bamboo shares the same main features: you can create complex automations, customisations and integrations by running Groovy scripts.
In our case, the ScriptRunner feature we required was a listener that would perform a custom action whenever standard Bamboo system events were fired. Ours pulls out information from the actual build based on a ‘BuildResultEvent’, then it constructs a JSON payload which is sent to a DataDog endpoint.
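To make the shape of that payload concrete, here is a minimal sketch. It is written in Python purely for illustration (our actual listener is a Groovy script running inside ScriptRunner), and it does not reproduce our script. The function name, the argument names and the tag names are all hypothetical, and the `title`/`text`/`tags` layout is an assumption modelled on a typical DataDog event body.

```python
import json


def build_event_payload(project_key, plan_key, job_name, instance_id,
                        instance_type, duration_seconds, queue_seconds,
                        successful):
    """Build an event body describing one finished Bamboo job.

    Every field and tag name here is illustrative, not the team's real schema.
    """
    tags = [
        f"project:{project_key}",
        f"plan:{plan_key}",
        f"job:{job_name}",
        f"instance_id:{instance_id}",
        f"instance_type:{instance_type}",
        f"result:{'success' if successful else 'failure'}",
    ]
    return {
        "title": f"Bamboo job finished: {plan_key} / {job_name}",
        "text": f"duration={duration_seconds}s queue={queue_seconds}s",
        "tags": tags,
    }


# Hypothetical values standing in for what a build result event would supply.
payload = build_event_payload(
    project_key="CLOUD",
    plan_key="CLOUD-BUILD",
    job_name="unit-tests",
    instance_id="i-0123456789abcdef0",
    instance_type="m5.xlarge",
    duration_seconds=420,
    queue_seconds=35,
    successful=True,
)

# Serialise to JSON; the listener would send this string to the endpoint.
body = json.dumps(payload)
```

The tags are what make the data useful later: because each event carries the project and the instance, the dashboard can group and filter on them.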
The data sent into DataDog was enriched with a lot of tags via the ScriptRunner listener. This helped us know exactly what project the job ran for and what EC2 instance it was on. In DataDog, the dashboard collecting all the data about the agent lifecycle looks like this:
Given the cost of the instance a job runs on and the time the job takes to run, the ‘Build Compute Cost’ dashboard calculates the total cost of the job and helps us answer questions such as ‘If we use a larger instance with 8 CPU cores rather than 4, will the job run faster and cost less?’. The answer may seem obvious, but it is not all about the cost of a job. In some instances you might need the speed despite its higher cost, as faster builds can increase the velocity of a development team. This is why we need quite a lot of data to get the full picture and understand which ‘knobs and dials’ we have to adjust in order to make savings.
ScriptRunner was a blessing, and that’s not just because it’s an Adaptavist product. Groovy is very similar to Java, so the script was fairly easy to write. All we had to do then was add it to the listener and click ‘Run’.
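The arithmetic behind that question is simple enough to sketch. The example below is in Python for illustration only; the hourly prices and durations are made-up numbers, not our real figures, and `job_cost` and `break_even_speedup` are hypothetical helper names.

```python
def job_cost(hourly_rate_usd, duration_minutes):
    """Compute cost of one job: instance price per hour times hours used."""
    return hourly_rate_usd * duration_minutes / 60.0


def break_even_speedup(small_rate_usd, large_rate_usd):
    """How many times faster the larger instance must be to cost the same."""
    return large_rate_usd / small_rate_usd


# Assumed prices: the 8-core instance costs twice as much per hour.
small_rate, large_rate = 0.20, 0.40

# Assumed durations: the job takes 30 minutes on 4 cores, 12 minutes on 8.
small_cost = job_cost(small_rate, 30)
large_cost = job_cost(large_rate, 12)

# The larger instance wins on cost only if the speedup beats the price ratio.
needed = break_even_speedup(small_rate, large_rate)
actual = 30 / 12
larger_is_cheaper = actual > needed
```

With these assumed numbers the job runs 2.5 times faster on an instance that costs twice as much, so the larger instance is both faster and cheaper. A job that does not parallelise well would fall below the break-even point and cost more.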
Copy the script - tested on Bamboo 6 and Bamboo 7
If you’re curious about the code, you’re welcome to copy ours and use it for your own optimisation needs. It works for both Bamboo 6 and Bamboo 7.
Understanding where in Bamboo to find the information about builds and durations can be harder if you’re new to Bamboo (like I was), but once you have that knowledge you can use the listener to bring in pretty much any information you need and send it almost anywhere you want. You’re not limited to DataDog either: the listener works the same way with Splunk, CloudWatch, etc.
This ScriptRunner listener helped us improve the resource allocation for our jobs. By enriching the data coming from the DataDog agent with information about which job was being processed, we could easily see which jobs needed more memory and CPU.
Once we knew what was possible, we discovered some extra improvements that come from using ScriptRunner for Bamboo. It allows you to:
- Control the agent’s lifecycle from within the listener, for example to terminate an agent when it runs out of disk space
- Use a listener to enforce that an agent only runs a certain number of jobs
- Use scripts to tidy up after a build: when the job finishes, the listener runs a script that clears down Docker images, temporary folders etc., so the developers don't have to spend their time on routine chores
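As a rough illustration of the second idea in the list above, the bookkeeping a listener needs in order to cap the number of jobs per agent is small. This Python sketch shows only the counting logic; the class and method names are hypothetical, and how the agent is actually retired in Bamboo is left out.

```python
from collections import Counter


class AgentJobLimiter:
    """Track finished jobs per agent and flag agents that reached their limit."""

    def __init__(self, max_jobs):
        self.max_jobs = max_jobs
        self.jobs_run = Counter()

    def record_job(self, agent_id):
        """Count one finished job; return True once the agent should be retired."""
        self.jobs_run[agent_id] += 1
        return self.jobs_run[agent_id] >= self.max_jobs


# Assumed limit of three jobs per agent.
limiter = AgentJobLimiter(max_jobs=3)
decisions = [limiter.record_job("agent-1") for _ in range(3)]
```

In this sketch the third call for the same agent is the first one that returns `True`, which is the point at which a listener would stop handing that agent new work.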
ScriptRunner for Bamboo is an amazingly versatile tool that lets you customise, extend and automate your Bamboo builds as required. Once Atlassian releases Bamboo Data Center, ScriptRunner for Bamboo Data Center will be available. See what you can achieve with a 30-day free trial.