DevOps Decrypted: Ep.26 - DevOps-Specific AI Beds In

In this episode, we discuss the state of play in AI for DevOps, Jobin's inner poet, the tenth anniversary of Kubernetes – and talk to special guest Lilly Holden about Developer Experience (DevEx).

Summary

Is AI finally in a place where it's useful in DevOps? It might just be, as we discover in another fascinating panel discussion from the usual DevOps Decrypted gang – plus special guest Lilly Holden.

Lilly joined us to speak about her role in the Platform Team at Adaptavist, where she's working on a DevEx initiative to de-complexify Kubernetes. It's no coincidence that we're doing this episode as Kubernetes celebrates its ten-year anniversary—it gives us a chance to look back over its history and try to figure out where it all got so complicated…

We discuss what humans will do when AI does all the things (including whether Jobin is secretly a great poet), how open source started closing doors – and whether DevEx and security are best left to us humans or to machines.

This was a super fun episode, and we'll be sure to have Lilly back to update us on her DevEx journey. Stay tuned for more, and please give us your feedback on the show at devopsdecrypted@adaptavist.com!

Laura Larramore:

Welcome to DevOps Decrypted, where we talk all things DevOps.

I'm your host, Laura Larramore. I'm here today with our usual roundup, Matt, Jobin, Jon and Rasmus – and today we have a special guest, Lilly Holden, who's here to talk to us a little bit about Kubernetes in light of its 10-year anniversary, hence the very clever pun… KuberTENes!

So, yay! Lilly, would you like to tell us a little bit about what's going on in your world over there?

Lilly Holden:

Hi, all. Yeah, so I'm Lilly. I'm part of the platform team at Adaptavist, which, among other things, looks after our Kubernetes clusters. I'm working on a DevEx initiative, which is developer experience, because Kubernetes is complicated enough that we need to abstract it a bit and make it easy for developers to use.

So it's to kind of simplify it and take the platform out of the picture a bit.

So yeah, that's kind of what I'm working on. I rely on lots of Argo products: Argo CD, Argo Workflows, Argo Events, etc. And yeah, I'm trying to make Helm charts simple to use as well, by using Helm chart libraries and our standardised application Helm charts, job Helm charts, etc.

Jobin Kuruvilla:

Lilly, quick question: it's funny how you mentioned Kubernetes is complicated enough. I mean, it was supposed to make everything simple, right? I remember those days when we had Docker containers, and we had to manually make sure that they were up and running in production, and the orchestration of those containers was why it came into being in the first place.

So why is it complicated? What makes it so complicated?

Lilly Holden:

I think there's just so many choices. I think if you go and look at various Kubernetes clusters, it's unlikely you'll find two that are the same, you know, using, you know, the same packages inside and kind of having the same experience.

Yes, you will kind of find similarities. It is an ecosystem, but they're all going to be customised in one way or another, using different ingress, you know, different service meshes, and there is a relatively steep learning curve to each one of those.

So yeah, I think we shouldn't say it's not complicated. It is! It's just once you're in that world, you understand it. And yeah, it's kind of cool.

Matt Saunders:

It's almost like a double-edged sword, isn't it? One of the things I love about Kubernetes is it's open source. And you know, Google were very careful to put it out to the community in a way which didn't mean that they got to dominate. Whether that's been successful or not is an exercise for the reader, I think.

But then, yeah, you got like, oh, you can choose whatever ingress you want. You can choose whatever storage you want, whatever back-end way of deploying your clusters you want, and part of me is like… all these things. We've got all the things. Which one should I actually use? Someone, please tell me how to do it!

But then, is that part of the reason why it's become so successful? Because, you know, the market's been able to sort itself out and find, like, the best way of doing things?

Lilly Holden:

I think it's about some of the things in Kubernetes that have a very simple interface. So, like, whole GitOps, you know, I just commit the file. And all of a sudden, things are spinning and running and resources are being created. All of it is happening for you – but in order for that to happen, someone else needs to do the complicated stuff for you and hide it behind, just like a simple YAML file. And that's kind of, yeah, that's quite impressive, actually.

Rasmus Praestholm:

I wonder, just thinking about the history of Kubernetes overall, whether right when it started, it was like, we need to organise Docker containers better. Let's do that.

But then somebody kept getting bright ideas about new, powerful things to add to it, and it felt like the overall tooling was struggling to keep up with that. It's like one graph of increasing complexity. And then the tool graph slowly tries to keep up, but it never quite gets there, so it always keeps getting more complicated.

Jon Mort:

Yeah, Rasmus, I think this is a really interesting point, because I think that there's an inherent complexity to running distributed applications in any cluster and things like that. And I think that most engineering teams underestimate that complexity.

And they don't actually… you know, it's a hard thing to see all of the various different parts of a distributed application across the network, and I think any kind of attempt to simplify things down to super simple is kind of bound to break at certain points.

So I wonder whether the complexity that you see in Kubernetes is actually a simplification of what it could be, and of what the real world sort of looks like, which is – to Lilly's point – why each system looks different. It's because it has different requirements and those kinds of things.

Lilly Holden:

But also, if you think maybe 10, 20 years ago, you would have separate teams doing each one of those functions. So you would have a team looking after the database. You'd have a team looking after networking, and you'd have a team installing servers and things like that. And now all of that is within one platform team. So obviously, you need to kind of know all of those different topics.

So that's maybe why it seems a bit more complicated: because you need to be a bit more multi-skilled.

Jobin Kuruvilla:

And I like the way Matt described it as, you know, a double-edged sword, because if you really think about it, the complexity comes because we are spoilt with all the options that you can have in there, right? Take any good best-in-class tool. Rasmus, you just mentioned the scale of problems; think about Jenkins. With Jenkins, I mean, we had the exact same problem.

The tool is so good that you can do so many things with it. And suddenly people are complaining about, oh, Jenkins is so bad, which is not really the case.

As an Atlassian partner, we know the same story about Jira. We know it's the best tool out there, but people complain about how complex it is. This is mostly because of all the things we can do with it.

I think Kubernetes has the same problem because of all the options out there that can supplement Kubernetes as a core platform. We are spoiled with choices, and you know, we add complexities on top of the existing platform.

Matt Saunders:

So, Rasmus, you're probably the closest thing we've got to an open-source Guru.

So, I'm interested in how you feel about all this proliferation. You know, you're an open-source maintainer. Picking up on what Jobin said, take, just arbitrarily, service mesh.

You want a service mesh for your cluster and…

Rasmus Praestholm:

Which one?!

Matt Saunders:

EXACTLY!

But my point here is, I don't know. Linkerd, Istio, and other service meshes are available. I'm just looking at the way that Kubernetes has grown, and grown, and grown, and grown. With all these proliferations of options, look at the Istio people and the Linkerd people. I'm assuming they're different people.

Do you think these individual products are kind of spurred on by the fact that there are two or three almost competing, slightly overlapping, slightly Venn-diagramming options, and they all end up being better like that?

Or would it have been better if someone had just said, "No, we're doing the service mesh" (I know you couldn't do it because of the licensing) and we just had one option, and then we'd be like, oh, we don't have to be confused by all of these?

Rasmus Praestholm:

Yeah, that's an interesting topic. And indeed, one that's close to my heart. And I wish, especially as an open-source maintainer who's run a mid-sized open-source community for more than a decade...

I wish for less fragmentation. Honestly.

I am so tired of it.

But I almost wonder if this is more about human psychology and tribalism than it is about technology.

Matt Saunders:

It always is…

Rasmus Praestholm:

Yeah, like, when you get something like Kubernetes: awesome, let's all standardise on that. Oh, wait! There are others like OpenShift and Mirantis that are kind of satellites, kinda like it, but because somebody had to go make money off it over here, or because someone didn't like the colour of the icon or something like that, they're going to go do their own thing over here.

And it's just... it's crazy. It's too much.

And what's worse is it still happens.

I could sort of almost understand it. If it's a startup that needs to make money, they try to make everything their own, everything custom, and so on because they have to make money off it somehow.

But even more open-source products are spinning up just because maybe 2 maintainers disagreed or something.

It's wild.

Jobin Kuruvilla:

But at the same time, all those people who are making money off of it. They're doing it, and people are still purchasing it for a reason. Right? I mean, there must be something good that they're doing, and there must be something that customers want.

Rasmus Praestholm:

Yep. But then it goes back to this very dangerous thing with a famous cartoon about how everything open source, Kubernetes, and all that ultimately relies on this one guy in Nebraska, thanklessly maintaining some library that's going to be open for vulnerabilities and all that.

And "with enough eyes, all bugs are shallow" works if you get enough eyes on them.

But if everybody kind of runs off to their own camps, and so on, that starts breaking apart. And if you have the, you know, commercial entities that are like, yeah, yeah, we just take everything open source over here, and then just add our thing on top. Everything will be fine!

That doesn't work, and then they don't contribute back. They don't add more eyes. People get into altercations about licences.

Less fragmentation. Please.

Matt Saunders:

I love that, you know, the kind of the history of open source where you could run just everything purely open source. But we've got like abstraction on abstraction, on abstraction ad nauseam these days.

None of us have time to buy something, and I find it quite interesting, bringing it back to the Kubernetes thing, how we have this proliferation of services you can buy to take care of things. I mean, Rasmus, as you say, "everything will be fine" if you just take some open-source things, cobble them together, and put a support contract over the top of it.

I find it interesting how different companies have been trying to evaluate where the line is drawn over the past ten years. Where do you have your techies—I don't mean that derogatorily—focusing their efforts? And where do you say, "I'm just going to buy that?"

And seeing things like, well, you mentioned Mirantis. So Mirantis bought the remains of Docker Enterprise: Docker's commercial offering around Docker Swarm and multi-server, multi-container stuff, which, actually, I quite liked back in the day.

They seem to have made a viable business out of it, so people are buying these container orchestration solutions.

What's interesting, then, is if you put those models onto the cloud providers (Google, AWS, Microsoft) and look at the sort of things that we're paying them for. A lot of it's compute and a fairly thin layer of running the control plane for us.

And seeing how more people are doing that sort of thing rather than actually running their own control planes these days, are there fewer people doing that than are running the likes of Mirantis, Nomad, etc.?

It's, yeah. It's interesting where you draw the line, I think.

Rasmus Praestholm:

Yep. And I don't see it right now. Sure, everything gets better over time to a point. But then you also get things like HashiCorp and Redis, and all these things.

And again, that one guy in Nebraska is still maintaining that one thing on his own and like, when is something going to change? And I kind of wonder if we are going to face a paradigm shift thanks to AI. Will that make it easier?

But first, it takes a whole lot of power to run that thing…

Laura Larramore:

So, are we going to run out of energy trying to run our AI?! So, what do you guys think about this news story that we passed around about Oracle providing infra to OpenAI?

Rasmus Praestholm:

Yeah, it's like OpenAI can't get enough compute. I think I read something somewhere, too, that they're also using part of GCP, so just like more compute, more compute!

Jon Mort:

Yeah. Well, I think you see all of this in the shortage of GPUs and Nvidia's share price, right? I think that's kind of where all this… I just think it's fascinating that there's that level of collaboration between competitors to just scramble around to make sure that OpenAI has enough GPUs. I think that's a fascinating thing. I'm not quite sure what I make of it, but it's fascinating, yeah. Fascinating things happen.

Rasmus Praestholm:

It's wild.

Matt Saunders:

Yeah, it's almost come from nowhere. I mean, I've been having interesting conversations with people like, Oh, my God! Have you heard of this company called Nvidia?!

Suddenly it is more valuable, allegedly, than Apple and Microsoft.

I mean. Obviously, I think I'm among fellow nerds here who have been tinkering with graphics cards for far too many years and know who they are. But yeah, it's like, it's a massive hype thing.

And you're getting – I'm going to tread quite carefully here with what I say about Oracle – companies that previously maybe had a reputation for not necessarily entirely friendly business practices all of a sudden cosying up to big rivals to get what they need, given the scarcity of this stuff.

So, that was what struck me in this story.

It's like everyone's collaborating together to make OpenAI as big as it possibly could be, which is quite a scary thing, really.

Rasmus Praestholm:

It is.

It's hard to play as a little player in this game, where it used to be that you could be this scrappy startup doing most anything.

But that's tough with the AI.

And I wonder if it's a one-time transition phase where we're like, okay, we need all the GPUs and make this one giant model. Do we come out on the other side, and then things go back to normal, or is it a new paradigm, or oh, hello singularity!

We don't need to work anymore.

Like, what happens? That's… it's hard to know.

But I kind of wondered, just to tie back to Kubernetes, because I know that there are a lot of tools coming out to try to help make Kubernetes less complex.

But then, does it become different when you suddenly get to a point where an AI trained on anything Kubernetes can just generate all the stuff for you, anyway?

Jon Mort:

Well, just to add to that, OpenAI have always been quite vocal about some of their training clusters and how all of that's running on Kubernetes. That's the big orchestrator. So you've got this thing which learns by being run on Kubernetes clusters, generating Kubernetes things. So you end up with this circular phenomenon that ends up helping build itself.

Jobin Kuruvilla:

Yeah. But if you really think about it, one of the major problems that Kubernetes has is security. Right? I mean.

Because people think it's so simple, they overlook security, and then eventually you produce platforms that are less secure. I mean, AI definitely has a good place to, you know, do something there.

In many ways, you know, we are already seeing some plugins coming on top of Kubernetes which look specifically at security, and detect and highlight the flaws.

Because, you know, monitoring is not so good in Kubernetes. I mean, you have to obviously go for something else, a third-party add-on, to help us monitor it. If AI can do that job for us and highlight the issues that you have in the platform, that will certainly help.

That's been the problem all along, right? People who are using Kubernetes clusters are not so good in terms of, you know, spinning up clusters. All they care about is, yep, just go and spin up an EKS cluster, you know. That's pretty much what people are doing now. They don't realise the underlying complexities. If AI can do that part for us, that will make it easier.

Matt Saunders:

Yeah, we could. We could tell AI to do that for us, couldn't we?

I mean, the Kubernetes security thing has been a concern for a long time. I remember probably 5 or 6 years ago, when Docker Swarm was still a thing before Docker sold it off, I was training people on Docker Swarm, and they made a big thing, in the materials, of saying how Docker is secure by default.

This raised a little bit of a giggle. But actually, in the context of what Kubernetes lets you do, this is absolutely true.

And yeah, one of the biggest benefits of the platform is all this flexibility. You can do exactly what you want, including shooting yourself in the foot: letting endless hackers from wherever into your system and, as soon as they're in, letting them go all over it, unless you actually get into proper things like pod security policies and ingress and out– ingress and outgress?

Ingress and egress network policies. All that sort of stuff.
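
[Editor's note: as a rough illustration of the ingress and egress network policies Matt mentions, here is a minimal sketch in Python that builds a default-deny NetworkPolicy manifest and checks its shape. The helper names `default_deny_policy` and `is_default_deny` and the `payments` namespace are hypothetical and were not discussed in the episode. A policy like this only takes effect on clusters whose network plugin enforces NetworkPolicy.]

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules denies traffic both ways.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def is_default_deny(policy: dict) -> bool:
    """Check that a manifest is a deny-all policy with no allow rules added."""
    spec = policy.get("spec", {})
    return (
        policy.get("kind") == "NetworkPolicy"
        and spec.get("podSelector") == {}
        and set(spec.get("policyTypes", [])) == {"Ingress", "Egress"}
        and not spec.get("ingress")
        and not spec.get("egress")
    )


if __name__ == "__main__":
    policy = default_deny_policy("payments")  # hypothetical namespace
    print(json.dumps(policy, indent=2))
    print("default deny:", is_default_deny(policy))
```

[The printed JSON could be reviewed by a human and then applied to a cluster; specific allow rules would be layered on top as separate policies.]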

And this actually, maybe, if we want to pivot now onto some of the things in the news at the moment. As we see, there's so much hype around the AI stuff, especially around DevOps and getting people to deploy software better. And this stuff now seems to be heading down a route where we're actually getting AI that's actually useful in DevOps.

One way we're seeing that is in security. Put that together with platform engineering-type principles and paved roads, and we can use AI to poke holes, pun intended, in our security policies and make this sort of stuff actually secure where it isn't by default.

Is that the overall answer? Do I trust an AI if I say, "Hey, ChatGPT! Can you write me a network security policy that's gonna ensure that no one could ever hack my data?" and just trust it? NO! But… Go on, Rasmus.

Rasmus Praestholm:

Yup. That's where I started playing with something that I wonder might have some staying power. Because, yeah, no one should just tell an AI, "give me X", and then take it at face value that, yep, that's gotta be secure and all those things.

I've started out trying behaviour-driven development, where you write features for whatever you're trying to do, break those features into scenarios, and then run them as actual unit tests.

And I've done that for deploying Kubernetes, things like deploying Helm charts and adjusting things, and so on.

And I have a feeling that you can separately ask the AI for your feature writeup, and then you, as a human, are like: yep, checks out.

Then you ask for the scenarios, and, yep, they check out. Then you ask for the unit tests. Yep, they check out, and they pass?

That way, suddenly, you're in the loop yourself as a human, and you have the automated testing to actually validate things.

That's a little more difficult if you try to have AI write a poem for you.

But at least in Kubernetes, and to some degree, software development, if you can combine it with automated testing to help validate it, you don't have to trust the AI. You can verify.
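
[Editor's note: a minimal sketch of the loop Rasmus describes, written in plain Python rather than a real BDD framework. A feature is split into scenarios, each scenario runs as an automated check, and a human reviews each step before moving on. `render_deployment` is a hypothetical stand-in for rendering a Helm chart, and all names and values here are illustrative assumptions, not his actual setup.]

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

FEATURE = "Deploy a web application from chart values"


@dataclass
class Scenario:
    name: str                       # human-readable scenario title
    given: dict                     # input values, similar to Helm values
    check: Callable[[dict], bool]   # assertion on the rendered manifest


def render_deployment(values: dict) -> dict:
    """Hypothetical stand-in for chart rendering: values in, manifest out."""
    image = f'{values["image"]}:{values.get("tag", "latest")}'
    return {
        "kind": "Deployment",
        "spec": {
            "replicas": values.get("replicas", 1),
            "template": {
                "spec": {"containers": [{"name": values["name"], "image": image}]}
            },
        },
    }


def run_scenarios(scenarios: list[Scenario]) -> dict[str, bool]:
    """Render each scenario's input and record whether its check holds."""
    return {s.name: bool(s.check(render_deployment(s.given))) for s in scenarios}


# Scenarios an AI might draft and a human would review before trusting them.
SCENARIOS = [
    Scenario(
        "defaults to one replica",
        {"name": "web", "image": "nginx"},
        lambda m: m["spec"]["replicas"] == 1,
    ),
    Scenario(
        "pins the requested image tag",
        {"name": "web", "image": "nginx", "tag": "1.27"},
        lambda m: m["spec"]["template"]["spec"]["containers"][0]["image"]
        == "nginx:1.27",
    ),
]

if __name__ == "__main__":
    print("Feature:", FEATURE)
    for name, ok in run_scenarios(SCENARIOS).items():
        print("  PASS" if ok else "  FAIL", name)
```

[The point of the structure is that the generated output is never accepted on trust: a failing scenario surfaces immediately, whoever or whatever wrote the code.]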

Lilly Holden:

Yeah, I agree. That's exactly how I use AI at the moment. It is almost pair programming with an AI, so I still verify, and I still have that conversation going on. And that's where you can really be accelerated, versus just expecting AI to do things for you and be right the first time.

Jobin Kuruvilla:

I love that idea, pair programming with AI, because I think that's how we can utilise all the capabilities that AI can give us. Again, cross-check. Make sure it all tests out, as Rasmus said, so that way we can save a lot of time and actually write the poems that we want to write, because I don't want AI to write the poems for me.

I do want it to do some programming so I can save time there. And again, make sure: automated testing is where all the DevOps principles apply, I guess.

Laura Larramore:

Yeah, AI is actually really useful in that realm for helping you generate the tests. And of course, you have to check them. I also find it useful in deployment scenarios where I need to provide documentation. I can get a little help with the wording and all of that stuff.

And I think that that helps, too, and that can help in the development lifecycle to be able to quickly generate that documentation and not have to, you know, bear down on that and get back to the poems we like to write.

Rasmus Praestholm:

That does make me think, one day… we're gonna automate all of the things. Then what are we gonna do? Are we all gonna become poets?

Matt Saunders:

Oh, I hope so.

Yeah.

Laura Larramore:

It sounds lovely. I would love to just sit and play in my code poet life! That would be so nice.

Jobin Kuruvilla:

I mean, if you really think about it, AI can write better poems than me, so I don't know what I'm going to do!

Matt Saunders:

No, Jobin, I find that hard to believe. I think there's an artist within us all.

Hopefully, we can get… We shouldn't be letting these AIs do the artistic things. Let them do the science.

I'm done with science.

They can do the science, and we do the arty bits.

Rasmus Praestholm:

I do like that when we're just using AI to augment ourselves rather than just handing our jobs over like that.

Matt Saunders:

Yeah. And that seems to be the reality of how all this stuff is emerging.

I mean, you know, inevitably, it's not as simple as "AI is not going to take your job". I think we covered this before, actually, probably a few months ago. And what I really like is, now we are seeing tooling emerge with that in mind.

So maybe a year ago we were like, oh, Visual Studio's now got an AI assistant that's going to write all your code for you.

No. But these days, dragging it back on topic, just looking at a news article here, Pulumi adds generative AI co-pilot to manage cloud infrastructure. And the whole thing here is it's like, yes, we know the mistakes that AI can make. We know that AI is never going to have the full context, and the developer will. But we can still innovate and bring out tools that save us a lot of time.

We can pair program with it, as with a Pulumi AI cloud infrastructure generator tool. It's one of those things.

So maybe it's not going to be terrible after all.

Jobin Kuruvilla:

It's also interesting how Pulumi does it. Right? I mean, it's adding basically a lot of AI assistance on top of the DevOps pipeline that we have. But going back to the point that Jon was making early on, in a distributed environment, there are so many dependencies and so many complexities that no AI is going to solve. I mean, that's where human intervention is needed.

Granted, at the end of the day. We don't need 100 DevOps engineers with all the capabilities that Pulumi and other providers are coming up with. But we may still need 5 engineers who are really good at, you know, using these capabilities that AI is adding.

And then, you know, we need to make sure that we are also filling in the gaps: all those complexities that come from our distributed application architecture and maybe multiple cloud providers, right? So…

Laura Larramore:

I think it'll push us forward. And then you see that news article where Oracle and OpenAI are kind of joining together. And you think all these very large corporations with lots of resources are driving this. And I think it's very important that, as people in development, we hold on to what we have, and we continue to push back and say, "No, I'm not going to use that in that way. I would like to use it this way", and that can help drive some of that change.

They have a lot of resources to drive it one way, but I think that the many people who use it can have a voice and pull it in the direction we would like it to go.

Rasmus Praestholm:

So I know we've been looking at Foil dot AI or dot io – I lose track… Is that the one that can kind of sort of help teach itself when it screws up DevOps? Because it sounds like that's one of those DevOps-related ones that we could really find useful.

Jon Mort:

Yeah. I think this is an interesting tool because it's very much that kind of companion thing of, like, helping you with some things, and then, if there are errors, it can help you understand those errors. So I think that's an interesting assistive tool. And I think this is where the discussions are going, right? Those assistive tools, the ones that kind of give you superpowers, those are great. The ones that will go and take over, not so much…

So that's where I kind of want to see tooling going: making sure that the human in the loop is important, and just making you better at doing what you do best.

That's got to be the best kind of AI, and I think Foil looks to be in that region.

Jobin Kuruvilla:

I completely agree. And I remember reading somewhere, and actually, we have seen it happening, that there is AI that can produce a complete podcast – but I'm not sure if it can actually cover the wide range of topics that we did in this one, from Kubernetes today, coming back to security and how we can use AI and Kubernetes.

So it's interesting. Yes, I would still see it as a great assistant, but something that replaces me?...

Laura Larramore:

We should put it in our tagline – DevOps Decrypted: still better than AI!

Jobin Kuruvilla:

I will take it!

Rasmus Praestholm:

So, with AI, are we going to get an improved developer experience, or are we just going to outsource the developer experience?

Where are we going with DevEx in an AI-enabled world?

Jobin Kuruvilla:

We should probably have a podcast just along that topic.

Lilly Holden:

Does AI really know what a good developer experience is? Or do you need a human to say "I like this" or "I don't like it"?

Jobin Kuruvilla:

I love that thought…

Matt Saunders:

Lilly, I think you just invited yourself back onto the next podcast!

I mean, you know, DevEx – DevEx is still emergent, and I almost see DevEx as a little bit of an anti-AI thing.

Yeah, just basically agreeing with you there, Lilly. It's like, we automate things, we do platform engineering. And we give our users, e.g. software developers, a load of tools and expect them to just go and click on things.

We give people buttons to do things. And I like that DevEx is now coming around as a re-emergence of, like, hang on. Wait a minute. These are humans who are writing our software for us.

And I'm a little uneasy about how we mesh the emergence of AI with that initiative. Or maybe the requirements for DevEx, other than, like, "do you like this?", become almost automatable. And then we let the machines take over again? I don't know.

Rasmus Praestholm:

Yup, yup. And from there, you can also kind of get to: DevEx is not just about being, you know, happy developing code and all. I would like to think it slowly extends into a more sensitive future in which we're thinking about, you know, mental health and all those kinds of things as human beings. That could perhaps tie back to that thing about open source and fragmentation, and just help cut down on that, to where we can all just live together in a happier world.

We could sort of become… poet psychologists.

Matt Saunders:

No more Java writing forms.

Yeah.

Let's train those AIs so that we can all become poets.

TL;DR, if you're just joining us.

Laura Larramore:

This has been a fascinating discussion. You guys, y'all are so great.

Lilly, thanks for coming on and talking to us a little bit about Kubernetes in light of its ten years. I appreciate everyone's thoughts and discussions around AI.

On the last topic, DevEx, we will be talking about that next month with Jennifer Davis. So please come and join us for that.

And for our panel here – Matt, Jon, Jobin, Rasmus, and me – this has been DevOps Decrypted.

Why not leave us a review on your podcast platform of choice? Let us know how we're doing or highlight topics you would like us to discuss in our upcoming episodes.

We truly love to hear your feedback, and we will be giving out some free Adaptavist swag bags to say thank you for your ongoing support!