DevOps Decrypted Ep. 10 - Tipping the CI/CD scales

Charlotte Hester
27 April 22
DevOps Decrypted

Summary

In this month's episode, ‘Tipping the CI/CD scales’, we discuss the recent Atlassian outage and the new Spring Java framework zero-day. We also have an exclusive interview with Adrian Waters at GitLab about CI/CD pipelines, and we discuss one thing you should be doing in DevOps.

Transcript

Romy Greenfield:

Hello everyone and welcome to another episode of DevOps Decrypted. This is our 10th episode now, 'Tipping the CI/CD scales'. I'm your host, Romy Greenfield and joining me today, we have Jobin, and we also have Rasmus. Say hi, guys.

Jobin Kuruvilla:

Hello, hello.

Rasmus Praestholm:

Hello.

Romy Greenfield:

Hi. So we've got a treat for you today. We've been doing an interview with GitLab about CI/CD and scaling, so that will be coming in a little bit.

Romy Greenfield:

But we're going to talk a bit about DevOps in the news. So I noticed that Atlassian had a little bit of a boo-boo, recently.

Jobin Kuruvilla:

At the worst time possible. Because it was happening right when the Atlassian Team event, what was previously called Summit, was happening. I don't know. Rasmus, have you heard about the event at all?

Rasmus Praestholm:

Yep, yep. We were helping monitor booths and things, so indirectly I heard about the outage. It's not the best time. We started hearing about the background of it, and I'm trying to think about how that would fit into something like a CI/CD pipeline. That's what's on my mind.

Jobin Kuruvilla:

Speaking of the event. So what has likely happened. I was actually reading the article published by Three Visions, about what happened. I think it was, again, a classic case where everything went wrong. Right?

Romy Greenfield:

Yeah.

Jobin Kuruvilla:

So there was a communication gap, at first. So this all happened because of the add-on app Atlassian had called Insight. So this has now become part of the Atlassian functionality, a core feature within Atlassian.

Jobin Kuruvilla:

But since it was an app earlier, they had to clean up instances, which already had this app installed. So what they wanted to do was, they wanted to clean up this app.

Jobin Kuruvilla:

And the team that provided the IDs of this app instead provided the IDs of the actual application itself. So basically, instead of providing the ID of the app...

Jobin Kuruvilla:

They gave the ID of the Jira instance itself. That was the first thing that went wrong. So there was the communication gap.

Romy Greenfield:

Big woops.

Jobin Kuruvilla:

Communication is important. Right?

Romy Greenfield:

Yes. Apparently so.

Jobin Kuruvilla:

Apparently so. So what happened next was, Atlassian usually marks it for deletion and then removes it later but here what happened was, the script actually permanently deleted it.

Jobin Kuruvilla:

So boom! Instead of deleting the Insight app, what happened was, they deleted the entire application itself. So all those customers who had Insight previously installed, it's all gone. Boom! So that's what happened.

Jobin Kuruvilla:

Now, why couldn't it be reverted within a day, or maybe in a few hours, given that Atlassian had the backups? But here's where it gets interesting.

Jobin Kuruvilla:

The customers live together in an instance, so they had the backup for the entire ecosystem. So they could revert it back immediately.

Jobin Kuruvilla:

But what they couldn't do was, they couldn't pick and choose the customer and revert back that particular customer alone. That's where things get trickier.

Jobin Kuruvilla:

Now you have like 400 plus customers who have lost their data, but many more customers living together. Which means if you're trying to restore those 400 plus customers, you're also putting the other customers back to the time when the event occurred.

Jobin Kuruvilla:

Which means they will be losing data, and Atlassian couldn't risk doing that. So what they decided was, "Let's take these 400 plus customers and start selectively restoring data for those customers." And that was not automated.

Jobin Kuruvilla:

So instead of doing it in an hour or 10 hours, it took two weeks. Still happening? I don't know. But that was a big outage.

Rasmus Praestholm:

It seems like a classic case of, "Why wasn't that already automated?" Both on the operational front of, "Here's a change. We need to deactivate this old thing. Let's just have somebody go run a script to do the thing."

Rasmus Praestholm:

And then that introduces this huge risk of human error, getting all of these, not testing it. Whereas if you codify it, even in a pipeline with test environments, and tests, and so on, you can hit the button that goes...

Rasmus Praestholm:

And messes with the test environment. You can validate that it deleted what it was meant to. And then you just say, "Okay. We have it codified. We're going to run the exact same thing in this other environment and it's going to be great."

Rasmus Praestholm:

And it would've been great. Then when it starts you can say, "That thing wasn't automated, the recovery wasn't automated," and so on. So in my head, everything nowadays needs to be a pipeline. It's just everything.
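
To make the idea of a codified, testable clean-up concrete, here is a minimal Python sketch. It is not Atlassian's tooling; `Resource`, `Environment`, the `kind` labels and the function names are all hypothetical. It shows two guards discussed above: checking that every ID really is an app ID before touching anything, and marking for deletion rather than deleting permanently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Resource:
    """Something that can be deleted. `kind` is a hypothetical label."""
    resource_id: str
    kind: str  # "app" (an add-on) or "site" (a whole customer instance)
    deleted_at: Optional[datetime] = None  # None means the resource is live


@dataclass
class Environment:
    """A hypothetical environment (test or production) holding resources."""
    name: str
    resources: Dict[str, Resource] = field(default_factory=dict)


def mark_apps_for_deletion(env: Environment, ids: List[str]) -> List[str]:
    """Soft-delete the given app IDs, refusing anything that is not an app."""
    # Validate the whole batch first, so one wrong ID stops the run
    # before any resource has been changed.
    for rid in ids:
        res = env.resources.get(rid)
        if res is None:
            raise KeyError(f"{rid} not found in {env.name}")
        if res.kind != "app":
            raise ValueError(f"{rid} is a {res.kind}, not an app; aborting")
    now = datetime.now(timezone.utc)
    for rid in ids:
        env.resources[rid].deleted_at = now  # marked, not destroyed
    return ids


def restore(env: Environment, resource_id: str) -> None:
    """Undo a soft delete for a single resource."""
    env.resources[resource_id].deleted_at = None
```

Because the step is plain code, the same function can be run against a test `Environment` first and then against production with the same ID list, and a single resource can be restored without rolling anything else back.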

Romy Greenfield:

Who hasn't accidentally deleted some production data before?

Jobin Kuruvilla:

Exactly. This is not the first time that it has happened. This probably won't be the last.

Romy Greenfield:

Definitely not.

Jobin Kuruvilla:

But as Rasmus said, "Figure out, where are the possible cases where things can go wrong?" And probably this is why chaos engineering is such a big thing these days.

Jobin Kuruvilla:

You have to think outside of the box and create some chaos which you never anticipated, and then see how our systems react to it.

Romy Greenfield:

And I think that's a good point because they obviously had thought about, "If every single instance had accidentally been deleted. Cool. We've got a quick automated way of getting that straight back up and running.

Romy Greenfield:

We didn't think that we'd ever be mistaken enough to do it one-by-one and only get a small subset." They have the big picture, but the detail there was missing for an individual instance.

Rasmus Praestholm:

There may also be some signs of short-term thinking and cultural problems. If you live in a culture where you try to do all the things in a nice long-term way, like, "We'll automate these things because that might come in handy later."

Rasmus Praestholm:

Rather than, "This happened several times before already, but let's just keep doing it the same way and hope..." You know, think long term, adjust your culture to be healthy, and then these issues will be found.

Jobin Kuruvilla:

And I must say that Atlassian is usually at the forefront of doing that. They advocate doing that. And the irony is that this has happened to Atlassian.

Romy Greenfield:

Yeah.

Jobin Kuruvilla:

There's a little bit of irony in there. But again, that also shows us that this can happen to anyone, any company. So we always have to keep that in mind while designing our systems and designing our disaster recovery process.

Romy Greenfield:

Yep. No matter how big you are, you can still make mistakes.

Jobin Kuruvilla:

Exactly.

Romy Greenfield:

So what else has been going on in the news?

Jobin Kuruvilla:

I think there's another vulnerability, a key issue, that was with Spring Boot. We talked about the Log4j one in one of our previous episodes, so I don't want to go into the details of this.

Jobin Kuruvilla:

But this, again, proves the same point. So all these open source libraries that you're using, there's going to be some vulnerability out there which will come up sooner or later, so you have to be prepared.

Jobin Kuruvilla:

Again, going back to the point that Rasmus was making, make sure you have a robust CI/CD pipeline. So you should be able to quickly fix it and turn around with a new deployment that goes into production, once a vulnerability is identified. That's very, very important.

Rasmus Praestholm:

You really should both have the right security-type scanning in there, so you yourself will find out when some issue comes up or a CVE is announced.

Rasmus Praestholm:

And then have the robustness and speed to be able to say, "Okay. This is an issue now. We need to fix it here, here, and here. Let the pipes roll, deploy all the updates. Okay. We're good."

Jobin Kuruvilla:

That's a very good point and good that you mentioned about having our own security checks and vulnerability scanning tools like GitLab.

Jobin Kuruvilla:

So probably when we are going on the interview with... Probably today, we'll be speaking more on the CI/CD pipelines and scaling it.

Jobin Kuruvilla:

But this also shows the importance of having tools like GitLab, which addresses a lot of that security scanning, vulnerability scanning, as part of the core product itself.

Romy Greenfield:

Yeah it's important to have that in. So now we're going to go over to our interview that we recorded earlier with GitLab. Enjoy.

Romy Greenfield:

Hello everyone. Today we are joined by Adrian Waters, who is a Senior Solutions Architect for GitLab. And we're going to talk a bit about GitLab CI/CD, and what they've got on offer.

Adrian Waters:

Okay. Thanks, Romy. Just to give a brief bit of information about myself, as Romy mentioned, I'm a solution architect with GitLab.

Adrian Waters:

I've been here a good few years now, and I work within our channel organisation. So work with some of our great partners such as Adaptavist.

Adrian Waters:

Going back to previous lives, I've been in the DevOps type space for many, many years, before it was actually called DevOps. So I'm really pleased to be here today.

Jobin Kuruvilla:

That's awesome. Welcome to the show, Adrian. So a lot of our customers are using GitLab these days, and we all know what they use GitLab for. It covers all of the stages of DevOps.

Jobin Kuruvilla:

There are so many wonderful features like Auto DevOps, which helps us get up and running really, really fast. From your perspective, what are some of the things that we need to watch out for in GitLab? What is standing out?

Adrian Waters:

I think if you look back over the last four or five years, and the way that GitLab has expanded from being a version control tool with some CI capabilities, to cater for that full lifecycle.

Adrian Waters:

I think that was a really inspired move back in time, because it's now becoming the way that analysts are seeing the market going, into this more value stream delivery platform approach.

Adrian Waters:

So reducing the number of tools that you have to have in your tool chain, which simplifies effort in terms of maintaining them, costs associated with that.

Adrian Waters:

But it also gives a much better user experience, because you've got a common user interface, it's easier to onboard users, you have better data storage, which aids value stream analytics, compliance, audit, et cetera.

Adrian Waters:

So I think that that approach that GitLab has and has had for many years, really puts us in a good position moving forward.

Jobin Kuruvilla:

That sounds great. So what you're saying is, GitLab is not just an SCM or CI/CD. There's a lot more to it than that.

Adrian Waters:

Yeah. Absolutely. From initial planning ideas through to the development, through to the testing and security side of things, packaging up artifacts, deploying them into your production environments, monitoring them in your production environments, and so on.

Adrian Waters:

That's all part and parcel of the GitLab platform approach, and gives you so many benefits. What we're not saying is, "Never use any other tools."

Adrian Waters:

Because there may always be a reason why you want to use a particular tool. And so, having great integration capabilities is also really important.

Adrian Waters:

But for every third party tool that you can take out of your tool chain, you're simplifying life for yourself, as an organisation.

Adrian Waters:

And you're allowing more focus on your core business of developing software and releasing software. We see that as a pretty compelling argument.

Adrian Waters:

And it's now picked up by people such as Gartner and so on, who recognise this as a real trend within the industry, of where companies see themselves now going to try and simplify their tool chains.

Jobin Kuruvilla:

Absolutely. It's good that you brought up the integration part because I think just in the last few years, Atlassian, for example, released their Open DevOps platform and it integrates with GitLab. So I can see where it is going. Good trend indeed.

Jobin Kuruvilla:

Now the focus of the podcast is obviously CI/CD and the pipelines. So let me ask you one question about that. What I really liked about GitLab is the Auto DevOps capability.

Jobin Kuruvilla:

I don't know if you can maybe expand a little bit more but that is indeed helping customers taking on GitLab and starting up with developing pipelines and creating pipelines.

Adrian Waters:

Yeah. Absolutely. We know what's involved in creating a pipeline. However the pipeline is defined, you've got to start with something and build it up, and add the capabilities to build your applications, and to test them, and to deploy them.

Adrian Waters:

With Auto DevOps, GitLab does that for you. So without having to create any pipeline, it will analyse your application, determine the best way to build it.

Adrian Waters:

If you want to give it a bit of a clue about how to build your application, you can. But it will then go on to run a wide variety of security tests, whether that's static security, dynamic, container scanning, et cetera, on your application.

Adrian Waters:

It will deploy that into a preview environment, where you can then test the application with the changes that you've just made. You can then take it further and deploy it out into production and so on, if you so wish.

Adrian Waters:

And that's all without having to create anything. Without having to write any pipeline. So from the speed of starting a project to getting a best in breed pipeline up and running, it just takes all that effort away from you.

Adrian Waters:

Now, not everybody will want to do things in, if you like, the prescriptive way that Auto DevOps determines what this pipeline should look like.

Adrian Waters:

So you can incorporate parts of that pipeline into your own pipeline, if you want to, or you can configure it to work in slightly different ways.

Adrian Waters:

But the basic premise of Auto DevOps and the pipeline it creates, is based on many, many years of experience of what a good pipeline would look like.

Adrian Waters:

To be able to have that delivered to you, more or less out of the box, without you having to configure anything and develop anything, is really powerful.

Adrian Waters:

And then being able to take that on and customise it, if you so wish, gives you that flexibility to do the things that are maybe outside of that more prescriptive approach.

Romy Greenfield:

That sounds really cool, especially if it's your first experience dealing with CI/CD. You have no idea what's going on, just to teach you about it. Even if you don't even end up using GitLab, that's an amazing feature to give people exposure.

Adrian Waters:

Yeah. Absolutely. I think the end-to-end structure of it is really sound but if you look at what's going on within software now around security and vulnerabilities.

Adrian Waters:

If you were to look at, "Okay. From day one, we want to make sure that our software is as robust as possible, from a security posture perspective."

Adrian Waters:

Being able to have all of that, as I say, whether it's static, it's secret detection, it's looking at what versions of open source libraries you are pulling into your code, maybe unwittingly.

Adrian Waters:

What open source licences are you pulling into your application, again, maybe unwittingly? Without the developer having to really think about that, having all that in place by using Auto DevOps, is really amazing. It's slightly mind blowing when you actually see it in action.

Jobin Kuruvilla:

And I agree with Romy. It is super helpful when you're starting from scratch and you don't have to worry about creating pipelines and stuff.

Jobin Kuruvilla:

But even for experienced folks like myself and probably people from my team, when they're creating complex pipelines, you can still take advantage of Auto DevOps, because you have a template to start from.

Jobin Kuruvilla:

That's where it gets interesting. All those things that you just mentioned, Adrian, the security testing, vulnerability testing, all of those, you don't have to build anything from scratch. You know you need those, but you still have a template to work from. That is actually awesome.

Adrian Waters:

Even if you're writing your pipeline from scratch, being able to, say, pick particular jobs, in effect, the templates out of Auto DevOps, is also really powerful.

Adrian Waters:

So you might have your own pipeline, but you want to do some DAST, some dynamic security testing. We've got a job as part of Auto DevOps, that will do that for you.

Adrian Waters:

So just include that template into your pipeline. You've now got dynamic security testing, without really having to understand what's behind it.

Jobin Kuruvilla:

We were actually working on a white paper for everything as code, and one of the key areas we touched upon was pipelines as code.

Jobin Kuruvilla:

And this is a very good example of pipeline as code: Auto DevOps and the GitLab CI, and the other templates that we have in place. So everything you have is as code, and you take it, you modify it as you like. So I think, super stuff.

Romy Greenfield:

Can you get all of the templates as YAML and then just-

Adrian Waters:

Yes. So in effect, for those who maybe aren't aware, a GitLab CI/CD pipeline is defined in YAML, and Auto DevOps is a collection of YAML templates to handle the different jobs within an Auto DevOps pipeline.

Adrian Waters:

So one job to build the software, one to do static security testing, maybe one to create a preview environment, one to do a deployment, et cetera.

Adrian Waters:

So Auto DevOps wraps all of those up into a single super template. But you have access to that and you have access to all the individual templates for all those individual jobs. So you can just include the template into your own pipeline and away you go.

Romy Greenfield:

Perfect.

Adrian Waters:

And more importantly, you can see the content of that template as well. So if you want to use it as a learning exercise or you want to customise it, maybe it doesn't work in quite the way that you want, you have access to that YAML, so you can enhance it and change it as your needs demand.

Jobin Kuruvilla:

That's awesome and that's also a perfect segue into our next question, which is about pipeline architecture. So I do see that GitLab gives, on a high level, three different architectures.

Jobin Kuruvilla:

A basic architecture, there's the next one called DAG. I should get the acronym correct here, directed acyclic graph. And then you have child/parent pipelines. Can you talk a bit more about the different architectures?

Adrian Waters:

Yeah. Certainly. So I would say, if you go back in time, the standard pipeline, the typical pipeline that you might describe, would be sequential.

Adrian Waters:

So you're starting to do something on the left. And when that finishes, you do the next thing, then you do the next thing, and then you do the next thing.

Adrian Waters:

And ultimately, you get to the right hand side, which is probably deploying into production or staging. And then to improve the performance of that pipeline, you might run certain jobs in parallel.

Adrian Waters:

So the first job will often be, build your software. The second stage in that pipeline might be running several tests; different types of tests.

Adrian Waters:

But they can be done in parallel. So you shorten the overall duration of the pipeline. And when all of those parallel jobs are finished, you then go on and start running jobs in the next stage of that pipeline.

Adrian Waters:

You can't deploy the software until it's been built. I suppose you could deploy before it's been tested, whether that's a good idea or not, probably depends where you're deploying it to.

Adrian Waters:

It's a fairly structured approach. Start on the left, work across to the right with some parallelisation, where it's appropriate.

Adrian Waters:

But you also get situations where there isn't really a need to wait for jobs to finish. A job further to the right of a pipeline, isn't actually dependent on jobs further to the left.

Adrian Waters:

So you could say, for example, "If I want to do some static analysis of my code, I don't need to wait for the software to be built to do that." There isn't necessarily a dependency on that.

Adrian Waters:

Whereas, if I want to deploy my changes into a pod running on Kubernetes, I have to have created that pod before I can do that. So there's a dependency.

Adrian Waters:

Even within a more static pipeline, you can remove the dependencies. So a job that would typically be in, say, stage two or three, could start running straight away as soon as the pipeline has kicked off, or it could start running at the end of the first stage. Because it's not dependent on it.

Adrian Waters:

So you can define that dependency or that lack of dependency. I've got to think now. Yeah. That's it. You go there.

Adrian Waters:

That is a much more flexible pipeline, where really, you're not focusing on that left to right behaviour. You've got relationships between different jobs that will control when and how they're executed.

Adrian Waters:

So within GitLab, there's a different view that allows you to look at that scenario because if you look at just the standard pipeline, you can't see those dependencies, you can't see the relationship between those jobs.

Adrian Waters:

Other than that, they look as if they're moving from left to right. Whereas in the DAG environment, you have a different view that highlights where the dependencies between jobs are. So it allows you to track it through.
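
The difference between a stage-by-stage pipeline and a DAG can be sketched in a few lines of Python. This is only an illustration of the scheduling idea, not GitLab's scheduler; the job names and the `needs` mapping are made up for the example. The standard library's `graphlib.TopologicalSorter` works out which jobs are ready once their dependencies have finished.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Classic staged layout: every job in a stage waits for the whole previous stage.
STAGES: List[List[str]] = [
    ["build"],
    ["static_analysis", "unit_tests"],
    ["deploy"],
]

# DAG layout: each job lists only the jobs it really depends on.
# static_analysis needs nothing, so it does not have to wait for build.
NEEDS: Dict[str, Set[str]] = {
    "build": set(),
    "static_analysis": set(),
    "unit_tests": {"build"},
    "deploy": {"unit_tests"},
}


def execution_waves(needs: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into waves; a job joins the first wave where its needs are met."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises CycleError if the dependencies form a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With the staged layout, `static_analysis` sits in the second stage and waits for `build`. With the dependency mapping, `execution_waves(NEEDS)` puts it in the first wave alongside `build`. Grouping into waves is a simplification; in a real DAG a job can start the moment its own dependencies finish.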

Jobin Kuruvilla:

I must say that, I especially like that view. Because as you said, it gives you dependency between different jobs. So that helps, especially when you're debugging against something.

Adrian Waters:

I think it can also help when you're looking to optimise performance of the pipeline as well, because you can see where the dependency is and the critical paths are.

Jobin Kuruvilla:

Good point.

Adrian Waters:

And then the final one you mentioned was...

Jobin Kuruvilla:

Child/parent pipelines

Adrian Waters:

Parent/child pipelines. Yeah. So there's maybe a fourth, that is slightly different, which is the multi-project pipelines.

Adrian Waters:

But either way, it allows you to say, "Rather than having one large pipeline, you can split that up into multiple pipelines and you can use one pipeline to trigger another pipeline or trigger multiple other pipelines."

Adrian Waters:

There's a distinction with multi-project pipelines. You're looking at where your code is split up into multiple projects.

Adrian Waters:

Maybe you've got one for building software, and then you've got other projects that will actually do deployments or something. So you can trigger one from the other.

Adrian Waters:

The parent/child pipelines are similar, but they're within the same project. So if you've got one large code base, you can have a pipeline that will trigger other pipelines within that same project.

Adrian Waters:

So that could be useful in a monorepo type environment, maybe. Where you don't want to run all the pipelines every time, because it's a monorepo. It's large. It's going to take a long time. That's the distinction between the two.

Adrian Waters:

And I think the parent/child pipeline can be useful if you've got a large project and you've got a complex pipeline, then it can start to become more difficult to understand and to manage, and so on.

Adrian Waters:

And being able to separate that out, can make it more readable for one thing. But I think more importantly, it can become more selective as to when you actually execute that child pipeline. So only execute it if something has changed, that is relevant to what that pipeline is doing.

Romy Greenfield:

We actually use parent/child in ScriptRunner, because it has so many unique features. So we do have that functionality and it helps us a lot. It's really sped up us deploying code to production.

Romy Greenfield:

For example, if we only change something as part of the enhanced search feature, we don't need to run all of the functionality, all of the tests against scheduled jobs, script listeners, all of that.

Romy Greenfield:

So actually, we've saved a hell of a lot of time deploying enhanced search because of our parent/child relationship. So it can be really, really useful when you do have a big project.

Adrian Waters:

Yeah, yeah. That's a good use case.

Jobin Kuruvilla:

Yeah. Absolutely. And you mentioned monorepos. I still remember that was a big issue probably a few years back, because not many tools actually supported monorepos.

Jobin Kuruvilla:

So if I remember correctly, Google, 95% of the code for Google resides in a single repository, and they had to actually write specific different tools to handle that monorepo situation.

Jobin Kuruvilla:

They spend a lot of money writing their own CI/CD mechanisms. And with tools like GitLab, it becomes a lot, lot easier.

Jobin Kuruvilla:

When I was using one of the other tools in the past, I had to actually write a plugin to handle the scenario, because there were so many dependencies and everything was in a single repository.

Jobin Kuruvilla:

Obviously, you didn't want to kickstart the entire build when a particular folder in that changed. That was a big problem but I think tools like GitLab, it actually handles that really, really well. Do you want to talk a little bit more about that? How can you build a monorepo using GitLab?

Adrian Waters:

Yeah. Sure. I think there's a couple of different areas of challenges, really. One is almost inherent to Git. In that, Git was not really designed for super large repos.

Adrian Waters:

Although there have been, I guess, extensions and so on, like LFS, to help with that. At its core, it's not really what Git was designed for back in the day.

Adrian Waters:

So having capabilities that will support, if somebody wants to suddenly have a 5, 10, 20 gigabyte repo or whatever it may be. And for it not to grind things to a halt, that is one consideration.

Adrian Waters:

And then the other side of it is, is it being usable and having functionality now that makes it usable for individual developers, individual teams? That's a slightly different side to it.

Adrian Waters:

On the more just volume based challenges that it can bring then supporting things like LFS, is really important, which GitLab has done.

Adrian Waters:

Our architecture uses a service called Gitaly to optimise the performance of reads and writes to the underlying data. That's really important to give you that throughput.

Adrian Waters:

Being able to do partial clones, so that whenever a developer is working on this 20 gig repo, that they don't have to pull the whole 20 gig down.

Adrian Waters:

That they can isolate certain parts, whether it's limiting file size, or file paths, or object types, and so on. So all of those are important to make it usable, in the first place, and not get swamped by the volume of data.

Adrian Waters:

I think, on the more functionality side, what we already mentioned there about parent/child pipelines, is really important.

Adrian Waters:

Being able to structure your monorepo into folders or directories that have their own defined purpose. And so, being able to have pipelines within those, which you then control using the GitLab rules syntax within the YAML.

Adrian Waters:

So to define, "I'm only going to run this job if something in this part of the repo changes. Otherwise, I'm not going to bother." That becomes really important.
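
The "only run this job if this part of the repo changed" idea can be modelled in Python as below. This is a sketch of the selection logic only, not GitLab's rules engine. The directory and job names are hypothetical, borrowed from the ScriptRunner example earlier in the conversation, and the matching uses Python's `fnmatchcase`, whose `*` also matches `/`, which may differ from GitLab's own glob behaviour.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical mapping of job name -> path patterns that should trigger it.
JOB_RULES: Dict[str, List[str]] = {
    "test-enhanced-search": ["enhanced-search/*"],
    "test-scheduled-jobs": ["scheduled-jobs/*"],
    "test-script-listeners": ["script-listeners/*"],
}


def jobs_to_run(changed_files: List[str],
                job_rules: Dict[str, List[str]] = JOB_RULES) -> List[str]:
    """Return the jobs whose patterns match at least one changed file."""
    selected = []
    for job, patterns in job_rules.items():
        if any(fnmatchcase(path, pattern)
               for path in changed_files
               for pattern in patterns):
            selected.append(job)
    return selected
```

A change confined to `enhanced-search/` selects only the enhanced search job, and a change that touches none of the listed directories selects nothing, which is where the time saving in a monorepo comes from.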

Jobin Kuruvilla:

We were actually specifically using those GitLab rules. That's so helpful, because you can define, "Okay. If this changes and build a particular repository only if the dependency changes and so on." Different ways you can handle that using that rules keyword. Right?

Adrian Waters:

Yeah, yeah. And I think one of the other areas of functionality that is really useful is around code owners because if you've got a very large repo with thousands and thousands of files in it, that everybody has got access to and can change, you can create a bit of the Wild West out there.

Adrian Waters:

And so, Git, again, its permission model was not designed as tightly as maybe some other version control systems from the past.

Adrian Waters:

And so, being able to use code owners to give some governance to who is allowed to change which files in which parts of this monorepo.

Adrian Waters:

But also going beyond that and into the approval process that you can build with GitLab, is then saying, "Okay. If somebody's changed something in this peripheral bit of software that isn't too important, I'm not so worried about the details of the approval process.

Adrian Waters:

If somebody changes something at the core of the application, or they change the API structure, or something, then I want to make sure our DRI in that area has eyes on that change and approves it before it gets brought into the code base."

Adrian Waters:

So I think that having that flexibility to control different parts of the repo, whether it's through a pipeline or whether it's through things like the approvals and code owners, those are really important aspects.
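
As a rough illustration of how code owners can drive who must approve a change, here is a small Python model. It is not GitLab's implementation; the patterns and owner handles are invented, and the "last matching pattern wins" rule is an assumption based on the usual CODEOWNERS file convention.

```python
from fnmatch import fnmatchcase
from typing import List, Set, Tuple

# Hypothetical ownership table, read top to bottom like a CODEOWNERS file.
CODE_OWNERS: List[Tuple[str, List[str]]] = [
    ("*", ["@any-maintainer"]),   # default for peripheral code
    ("core/*", ["@core-dri"]),    # core of the application
    ("api/*", ["@api-dri"]),      # API structure
]


def required_approvers(changed_files: List[str],
                       code_owners: List[Tuple[str, List[str]]] = CODE_OWNERS
                       ) -> Set[str]:
    """Collect the owners who must approve, given the files a change touches."""
    approvers: Set[str] = set()
    for path in changed_files:
        owners: List[str] = []
        for pattern, pattern_owners in code_owners:
            if fnmatchcase(path, pattern):
                owners = pattern_owners  # later entries override earlier ones
        approvers.update(owners)
    return approvers
```

A change to `docs/readme.md` only needs the default maintainer, while a change under `core/` pulls in the person responsible for that area.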

Jobin Kuruvilla:

That's actually awesome. Because what you're telling me is, it's not just accountability. You can control who is doing what, where. And security and compliance teams are probably going to love that aspect of GitLab.

Adrian Waters:

Yeah. Approvals are an interesting topic, because you want to use approvals to apply governance, but you don't want them to become a block.

Adrian Waters:

You don't want them to be onerous and stop the flow of developing and releasing software quickly. So the ability within those approvals to say, "Okay. I'm only going to involve Fred if something has changed that really Fred needs to be aware of."

Adrian Waters:

Or maybe flipping it to look at security, "I'm only going to involve the security team, from an approvals perspective, if there are still some serious critical vulnerabilities that have been introduced by the changes Jess made, that they need to give their approval for.

Adrian Waters:

If we've not got any security vulnerabilities, then we don't want to include the security team. There's no need to waste their time doing that."

Adrian Waters:

So having that intelligence around how those approvals are applied to control what then becomes part of your code base, is again key. And I think with a monorepo, that all tends to be exacerbated, because you've got everything in that one place.
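
The "only involve the security team when it matters" rule reduces to a small predicate. The sketch below is an assumption-laden model rather than GitLab's approval logic: the `Finding` type, the severity labels and the default threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Finding:
    """A vulnerability newly introduced by a change (hypothetical shape)."""
    identifier: str
    severity: str  # e.g. "low", "medium", "high", "critical"


def needs_security_approval(
    new_findings: Iterable[Finding],
    blocking: FrozenSet[str] = frozenset({"critical"}),
) -> bool:
    """True only if the change introduces a finding at a blocking severity."""
    return any(f.severity in blocking for f in new_findings)
```

With no new findings, or only low-severity ones, the function returns `False` and the security team is left alone; widening `blocking` to include `"high"` is a policy choice rather than a code change.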

Jobin Kuruvilla:

That makes total sense. Sorry, Romy, you were going to say something.

Romy Greenfield:

Sorry. I was just agreeing.

Jobin Kuruvilla:

Sounds good. Now you mentioned a good point about a lot of developers working on those single code bases. Especially if it is a monorepo, that's going to be the case.

Jobin Kuruvilla:

One of the things in GitLab that I liked a lot is merge trains. The way you can queue your merges. So do you want to talk a little bit more about that?

Adrian Waters:

Yeah. I love merge trains. I think it's a great capability and really powerful. And as you say, especially in larger repos or monorepos.

Adrian Waters:

In an ideal world, developers make a change, they test it. It's great. And they press the merge button, and that's their job done.

Adrian Waters:

The reality in the real world is, it's not always like that. You've got many developers working in the same area, creating changes, trying to merge them at, or around, the same time.

Adrian Waters:

So as a developer, what do you do? You know that when changes merge, you potentially get conflicts. So do you say, "Okay. I'll let my colleague go first and then I'll see what is the result of his merges.

Adrian Waters:

And then I'll come along and try mine. And then my other colleague, he can wait until I finish." And so on. It's not very DevOps really, is it?

Romy Greenfield:

No.

Adrian Waters:

What merge trains do is bring some automation and some intelligence to it. It's something that's quite difficult to describe. A visual really, really helps.

Adrian Waters:

But basically, when you come to do your merge, your merge request goes into a train of merge requests. So if you've got four or five developers all working on the same monorepo, all wanting to merge, the merge request gets queued into a train.

Adrian Waters:

The first merge request in that train will do a simulated merge of what it would look like if it actually merged into the parent branch.

Adrian Waters:

But it doesn't actually apply to the parent branch because what you don't want to do is break the parent branch. You don't want to break the build.

Adrian Waters:

So it will do a merge that will result in a branch in exactly the same way as if you were merging into the parent. And if it finds a problem, then that will get flagged up to the developer; the owner of the merge request.

Adrian Waters:

But that merge request will be taken out of the merge train. So the rest of the merge requests in that train can progress. But the clever bit is that it doesn't do them in sequence. It does them in parallel.

Adrian Waters:

So, for example, if you've got three merge requests in a merge train, it's going to try and merge all three in parallel. The first one, it will merge changes that the developer has made with the current state of the parent branch.

Adrian Waters:

The second merge request in the train is going to do a simulated merge of the current status of the parent branch and the changes that have been made in the first merge request in the train.

Adrian Waters:

And the third merge request is going to do a simulated merge based on the content of the parent branch and each of the first two merge requests ahead of it in the train. I said it was difficult to explain.
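
Editor's note:

Since a visual helps here, the following is a minimal sketch of what each pipeline in the train tests, based only on the description above. It is not GitLab code; the function name `simulated_merges` and the labels `main`, `MR1`, `MR2` and `MR3` are made up for illustration.

```python
def simulated_merges(parent, train):
    """For each merge request in the train, list what its pipeline tests together.

    Each merge request is tested against the parent branch plus every
    merge request ahead of it in the train, plus its own changes.
    """
    return [[parent] + train[: i + 1] for i in range(len(train))]


for combination in simulated_merges("main", ["MR1", "MR2", "MR3"]):
    print(" + ".join(combination))
# main + MR1
# main + MR1 + MR2
# main + MR1 + MR2 + MR3
```

The three combinations do not depend on each other's results, which is why the pipelines can run in parallel rather than in sequence.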

Jobin Kuruvilla:

No, no, no. So let me get this right. So you have MRs one, two, and three. What you're saying is, they will be happening in parallel, although slightly different.

Jobin Kuruvilla:

Because one came first, two came second, three came third. But they'll still be happening in parallel. And one and three might be completed.

Jobin Kuruvilla:

So first one will be merged with master, then three will be merged. And by the time two is completed, two will be merged with that one plus two plus... So one plus three. So everything will be merged together at the end.

Adrian Waters:

Yeah. So basically, if a merge request near to the front of the train passes without any problems, then that's fantastic.

Adrian Waters:

Because we know we've already been testing the merge for the merge requests to the right, in the train, with the changes in that successful merge. So they can just carry on; those merge requests can carry on.

Adrian Waters:

If, say, the merge request at the head of the train fails, conflict is identified or something, then it is taken out of the train and all of the pipelines of all the merge requests in that train, they will all be restarted, taking those changes of the failed merge request out of their scenario.
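
Editor's note:

The failure case can be sketched in the same spirit. This is an assumption-laden illustration, not GitLab's implementation: the function `remove_failed` is hypothetical, and it assumes that only the merge requests queued behind the failed one need their pipelines re-run. When the failure is at the head of the train, as in Adrian's example, that is every remaining merge request.

```python
def remove_failed(train, failed):
    """Drop a failed merge request and report which pipelines must restart.

    Returns (remaining_train, restarted), where `restarted` holds the merge
    requests that were being tested together with the failed changes.
    """
    position = train.index(failed)
    remaining = train[:position] + train[position + 1:]
    restarted = train[position + 1:]
    return remaining, restarted


remaining, restarted = remove_failed(["MR1", "MR2", "MR3"], "MR1")
print(remaining)  # ['MR2', 'MR3']
print(restarted)  # ['MR2', 'MR3']
```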

Jobin Kuruvilla:

That makes sense.

Adrian Waters:

So as a developer, you don't really need to worry about it. You finish your changes, you've done your code review, you've got all your tests passed, you've got your approvals through, you hit the button to say, "Add this to the merge train, please." And away you go.

Adrian Waters:

You go off and move on to the next thing. The only time you will then get drawn back into any investigation, is if your specific merge request causes a conflict. Otherwise, you will gradually move along the train and at some point then be merged into the parent branch.

Jobin Kuruvilla:

So what you're saying is, no matter how hard it is to explain without a diagram, as a developer, I don't have to worry about it because I'm getting pulled in only if my code fails. That's great.

Adrian Waters:

Yes and that's fair enough.

Jobin Kuruvilla:

That is fair enough. Second thing is, I would say, there is wonderful documentation out there. I have seen it. I like the diagrams that you have and data.

Jobin Kuruvilla:

So if you didn't quite catch it during this podcast, you can still go back to the documentation and take a look at it. It's wonderful. I like the idea of merge trains. As I had mentioned, that's why I brought it up in the first place.

Adrian Waters:

There is a video on our GitLab Unfiltered channel as well, that shows it in action; a demonstration with a project and a few pipelines.

Jobin Kuruvilla:

That sounds good. Probably a topic for one of our webinars too; upcoming webinars. I know there are a lot of other things that we can talk about, because GitLab has a lot of other features that I really wanted to talk about.

Jobin Kuruvilla:

Merge request pipelines, merged results pipelines, how to customise a pipeline configuration using GitLab CI. But I think we are going to be out of time soon, Romy.

Romy Greenfield:

Yeah. That's all been brilliant. It's been really informative. I love the sound of merge trains. I wish that that existed when I first started developing, because that would've saved me so much hassle.

Romy Greenfield:

But yes. Thank you so much for joining us today, Adrian, and giving us all of that insider information. It's been a pleasure having you.

Adrian Waters:

No problem. I've enjoyed it.

Romy Greenfield:

Excellent and maybe we'll have you back one day.

Adrian Waters:

Okay.

Jobin Kuruvilla:

We will talk about all the topics that we couldn't cover in this particular podcast. Thanks a lot, Adrian.

Adrian Waters:

Thanks for inviting me along.

Romy Greenfield:

Thanks for listening to the GitLab interview. Hope you enjoyed it. To wrap up today's show, let's talk about one thing that you think that we should be doing in DevOps. Is there anything based on today's episode that you think that people could be doing?

Jobin Kuruvilla:

That's an interesting question. I can go first, Rasmus, if you don't mind.

Rasmus Praestholm:

Please go ahead.

Jobin Kuruvilla:

We did discuss a lot of advanced techniques in that GitLab interview. Adrian was talking about merge trains, a lot of things that we can do.

Jobin Kuruvilla:

But I will actually go back to the incident that we were talking about earlier, where Atlassian had a long outage of two weeks.

Jobin Kuruvilla:

This just makes me think that, again, going back to the point I was talking about chaos engineering. See how to look for opportunities, or at least, look for scenarios where everything can go wrong.

Jobin Kuruvilla:

We don't usually do that. We put our scientific hat on and we think, "Okay. This is what could happen." And we are preparing for it.

Jobin Kuruvilla:

So we automate the scripts preventing such things happening. But we have to sometimes think outside of the box and imagine a scenario where two teams make the mistake, like what happened with Atlassian.

Jobin Kuruvilla:

And then, everything goes wrong because of that mistake, and how do we recover the systems back? It could happen in our company. So let's look for opportunities like that, where we can improve the whole process and how we recover our systems back.

Rasmus Praestholm:

And for me, it's hard to pick just one thing you should be doing in DevOps because it really is this holistic view of all the things.

Rasmus Praestholm:

The closest I could get to that would be cheating by calling out the one thing I think is the most important, which is just the culture of doing things differently, doing things well, to be prepared, to automate, to test things well.

Rasmus Praestholm:

All the tools, to me, almost come after that. Just because if you have the right mindset, in the first place, all of the other good things will follow.

Rasmus Praestholm:

But then there are some things that can help you arrive at that mindset. If you're still organising yourselves in emails, and IM, and phone calls, and these old structures, then you will tend to also structure your tools, and your silos, and your teams, as for how you think internally makes sense.

Rasmus Praestholm:

So to me, there are some interesting things like ChatOps, that really challenge you on how you communicate. Which in turn, informs your culture, which in turn, informs your tools.

Rasmus Praestholm:

So at the end of the day, you get all three: people, process, tools. But you really have to start somewhere that helps you get the mindset to take care of all that stuff.

Jobin Kuruvilla:

Rasmus, if I follow up with a question, how do you think that cultural shift will happen? Does it happen top down or bottom up?

Rasmus Praestholm:

I would say that my favourite is when it happens grassroots style and then gains support from the top. Because it's so much easier to start with good employees that are already plugged in and eager to work in new ways.

Rasmus Praestholm:

Rather than those that sit around in the back room, smoking cigars while talking on the phone with somebody. They're going to be a lot harder, whether you're doing up or down. So if you have the right people...

Rasmus Praestholm:

Because the masses are following the stuff already, and then you just bless it with support from up top, and you're good. It's going to be a lot harder if you just come in and say, "Okay. Everybody, we are doing ChatOps. Any questions?"

Jobin Kuruvilla:

I took note that you said it starts with good employees and then the Rasmus style. It means you're a good employee. I get it. Fair point.

Romy Greenfield:

I think my advice would be, whenever I've been writing a script that's been potentially messing with some really important production customer data, I've always blocked out the actual doing of the changes until I'm absolutely confident and I've done it on a trial instance. That's just one thing.
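
Editor's note:

One common way to "block out the actual doing of the changes" is a dry-run flag that defaults to on. The sketch below is illustrative only: the function `purge_inactive`, the record layout and the `active` field are all hypothetical, not taken from any script discussed on the show.

```python
def purge_inactive(records, dry_run=True):
    """Delete inactive records, but only report them unless dry_run is False."""
    to_delete = [r for r in records if not r["active"]]
    if dry_run:
        # Safe default: show what would happen and leave the data untouched.
        for r in to_delete:
            print(f"[dry run] would delete record {r['id']}")
        return to_delete
    # Destructive path: reached only when the caller opts in explicitly.
    for r in to_delete:
        records.remove(r)
    return to_delete


records = [{"id": 1, "active": True}, {"id": 2, "active": False}]
purge_inactive(records)                 # prints "[dry run] would delete record 2"
purge_inactive(records, dry_run=False)  # actually removes record 2
```

Making the destructive path opt-in means a script run by accident, or against the wrong instance, reports what it would do instead of doing it.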

Romy Greenfield:

Because we've all been there. Like I said, I've accidentally deleted production data before. Not here; not at Adaptavist. Don't fire me.

Romy Greenfield:

It happens. If there's not the tools in place to stop it from happening, then someone like me coming along and doing it, will hopefully encourage you to add the tools to stop it.

Jobin Kuruvilla:

If anything, it tells us that it can happen again. So be prepared. Have your DR strategy in place. Right?

Romy Greenfield:

Yeah. Be prepared. If you fail to prepare, you prepare to fail.

Jobin Kuruvilla:

There we go.

Romy Greenfield:

Cool. I think that wraps it up for today's episode. So thank you for listening. You can connect with us on our socials at Adaptavist.

Romy Greenfield:

Let us know what you think of the show. But from me, Romy, and from our speakers, Rasmus and Jobin, thank you for listening. And we'll see you next time on DevOps Decrypted, which is a part of the Adaptavist Live Network.