DevOps Decrypted: Ep. 15 - Are Kubernetes Tools Doomed to be Complicated?


Summary

Welcome to DevOps Decrypted Episode 15, where we ask: are Kubernetes tools doomed to be complicated?

In this episode, some new voices (and faces, if you’re watching on YouTube) join Romy and Rasmus. We welcome Michael Guarino, CTO and Co-founder of Plural, who shares his knowledge on deploying open-source software onto Kubernetes.

We also speak with two more of our fellow Adaptavists: Timothy Chin, Platform Team Lead for our new venture, Venue DevOps – and Daniel “Chalky” Chalk, Engineering Manager for our business services infrastructure.

It’s an in-depth episode that’ll resonate with anyone who’s juggled deployment with Kubernetes, Crossplane, Terraform, Rancher… And the main takeaway?

Kubernetes is hard. Will software and DevOps practitioners ever find a silver bullet solution?

Romy Greenfield:

Hello, everybody. Welcome to another episode of DevOps Decrypted. I'm your host, Romy Greenfield, and today joining me, we have some new guests, as well as some familiar faces.

So we have Rasmus – we have Michael, Timothy and Daniel.

Do you all want to take a sec to introduce yourselves? Michael, do you want to go first?

Michael Guarino:

Yeah, so I'm Michael; I'm a Co-founder and CTO of a company based out of New York City called Plural – and what we primarily do is make it very easy for people to deploy a large variety of open-source software onto Kubernetes in their own cloud environments.

So we've spent a lot of time playing around with Kubernetes, and all 3 of the "Big 3" clouds, so we have a fair amount of knowledge to share there and hopefully can contribute to the conversation.

Romy Greenfield:

Awesome. Thank you!

Timothy, do you want to go next?

Timothy Chin:

Hi, Timothy here!

My job title is Team Lead, or Platform Team Lead at Venue DevOps, which is part of Adaptavist Group. So we are building a developer experience platform here for DevOps. So, yeah.

Romy Greenfield:

Thank you, and welcome. Daniel!

Daniel Chalk:

Yeah. So yeah, I’m Dan. I'm Engineering Manager for an infrastructure team that works inside Business Services. You could say we're like a platform team, but generally, we do a bit of everything.

Romy Greenfield:

Awesome. Thanks for joining us today. So today, I think we're going to talk a little bit about Kubernetes and it actually being quite hard.

Does anyone want to kick off the conversation?

Rasmus Praestholm:

Sure, I can chip in, since I threw the topic out there. Nowadays, I think we've passed the point where people have recognized that, “Oh, wait, yeah. Kubernetes is actually still hard.”

Just using it raw and looking at the gigantic, you know, Cloud Native Computing Foundation landscape map of, you know, hundreds of different things you can do with Kubernetes, it's gotten pretty crazy.

It seems like it hasn't really gotten easier to use yet. There have been a lot of attempts to build different layers on top of Kubernetes to make it easy.

So what have we done? What have others done? What are we even trying to do? What are you trying to do? Are you just trying to do better workflow management, or workload management? Are you trying to make the developer experience easier?

There's so much to go on.

With Adaptavist, we've tried a bunch of different things, which is why I invited Dan, AKA Chalky, here on the call, to give us a little intro on what we've done internally at Adaptavist about this over time.

Daniel Chalk:

We've done quite a few things. One thing I'll be honest about: we haven't actually got anything that I'd say looks like a platform yet. I think we've all seen people go on this “digital transformation” journey – I'm doing air quotes here – that takes years, and then nothing's delivered.

Because actually, it's hard to deliver a platform for people. And that's what people expect Kubernetes to be. But it's not a platform. It's a framework, really, as far as I'm concerned – it gives you the foundations to build some things, you know. And the most important part of the journey actually isn't Kubernetes.

It’s actually about the minimum viable platform for your organisation. And the answer to that question is different depending on the size of your business and how many business units you have. What does “good” look like? That will also change depending on who you ask.

And so part of our earlier journey was a product that's actually Adaptavist’s equivalent to Plural.

You know, you could look at it that way. It is a service catalogue. You can run a turnkey application on it. It's just that the turnkey applications are purely Atlassian-focused. And, you know, we want to run things like Jira DC and those sorts of things on there.

Rasmus Praestholm:

Is one of the [inaudible] Rancher?

Daniel Chalk:

Yes. And it still is, and Plural might actually be a suitable replacement for it, because one of the problems with Rancher is that end users are somewhat scared of it!

So it gives you this service catalogue – it's really cool, but actually, it's a bit of a power-user tool when you go into it; there are a lot of knobs and dials to change and stuff.

And some of these people on the Atlassian side, they're not developers. They're not SREs, not DevOps-y people. They’re consultants. They know how to manage Jira.

And they just see all these options, and they don't know what to do. They just have to follow some guidelines and hope for the best.

Rasmus Praestholm:

It's still just a UI for Kubernetes, much like you would see a nice dashboard on…

Daniel Chalk:

Yeah – underneath, it's just a chart. You know, it's just that. It's just a Rancher flavour of a Helm chart, you know, and ultimately you can abstract that any way you want it to. It's just that for the sake of getting something done quickly, we picked Rancher.

We also picked it because it covered some other use cases out of the box, by the way, things like SAML SSO, that kind of stuff. So the path of least resistance was to pick that tool for this. But the sacrifice was the user experience.

And over time, we will correct that. And again, this goes back to the minimum viable platform idea. What's the correct thing at this point in time, and in the future? You just keep iterating.

Rasmus Praestholm:

Yep, and I do also remember that we started on Backstage at Adaptavist as well. Much like you said, Kubernetes gets called a platform, but it's really kind of like a framework to build a platform. Backstage is similar: yeah, it's an IDP, but it's almost more like a framework for building an IDP; you really have to dress it up yourself a lot.

Daniel Chalk:

Yeah, eventually, it will be a framework. You know, it's still in the phase of transitioning away from being an internal company project, and so it's having to change its shape to accommodate the fact that the community runs it now.

So it doesn't have things like a robust plugin system yet; it's still got some of the technical debt from being internal, where you just change some boilerplate and things happen. It's not bad. It's just part of the evolution of being open-sourced.

And eventually, those things will mature.

So until then, I wouldn't necessarily even call it a framework. Not yet. It's not in the right shape to be a framework, but it's still a powerful tool, right? You still get some good stuff out of the box.

Rasmus Praestholm:

They are weirdly similar in a sense. Kubernetes also came out of Google, became open source, and got adopted and worked on and so on…

Even though you might still have a hard time if you ask somebody, “Hey, what does Kubernetes do?”

Daniel Chalk:

It could do anything, if you so wish it to, or if you have the know-how to do it yourself! In simplest terms… it's an orchestrator, right? It doesn't even necessarily have to be containers if you know what you're doing. Crossplane is a perfect example, huh?

Take Crossplane as a perfect example of that, right: it's not containers we're orchestrating there – we've just given it a different DSL to run infrastructure through.

Rasmus Praestholm:

Yeah – real quick on that, then: internally, we have been playing with something we call Kubera, which also uses Crossplane, and so on. Can you tell us a little about how that works and what it does?

Daniel Chalk:

Yeah. So Kubera is more of an internal code name for what Kubernetes looks like at Adaptavist and its numerous business units, so it's more of a discovery exercise for us, again.

Talking a bit about, you know, the minimum viable platform.

And so, we’re kind of eating our own dog food, because we write our own services and stuff, and we need to run them somewhere, and everyone's got a slightly different approach to doing it. Actually, teams at Adaptavist have the same problem: teams in different business units have made their own platforms because there wasn't one that pre-existed.

And so we just tried to work out what the minimum viable one is, so that we could give people options. And we want to start with the developer experience; that's about feedback. It's not about giving them a production environment.

It's not about telling everyone they need to move to this. But can we improve their feedback cycles? Can they get feedback fast as part of the development lifecycle? It's those sorts of things we care about more.

So in this, we started off looking at things like Argo CD, and we've started looking at remote developer environments: how quickly we can get an environment set up and running.

And we're looking at things like Telepresence as well, so they could actually run a container, get feedback from it, and basically integrate with adjacent services and stuff.

But simply put, Kubera is just about us doing that discovery and working out what the minimum viable platform is. But we also know that there is some baseline knowledge we have to acquire as a team first, before we can go further and look at all these other add-ons, services and products.

Rasmus Praestholm:

Right? And that's getting me excited to talk about Plural.

But let me get through one quick thing first. I know Plural does some of those things. But you've also worked with Crossplane – and I know Tim has been working with Crossplane because we're working on the whole Venue thing together.

What are your thoughts on Crossplane? I mean, how do you use it correctly? How does it orchestrate things through Kubernetes, and so on and so forth?

Daniel Chalk:

Well, honestly, I don't even know what using it correctly looks like yet – I know what using Terraform correctly looks like, and maybe I'll work backwards from that.

So the first thing we noticed when we installed Crossplane was the sheer number of CRDs, which dragged the control plane, if not to a halt, then close enough. So what did we do? We went, “Okay, let's just focus on the one CRD we care about”, and that is the Terraform module.

Because we've got loads of modules we already use, and if we can just provide a suitable abstraction for those, I think we'd be happy. We’d be cooking on gas, rather than implementing the same thing under a different DSL. That was also our way of controlling the load on the control plane, because it only has one CRD now, or rather, that's the one CRD we care about. We don't care about all the others, because we'll let Terraform do the rest for us.

And that's kind of how we've managed it for now. We've not had any real big gotchas other than that. Half the time, it's just us learning to read the manual rather than actually understanding the product. But we've not used it in anger enough to learn any of the painful lessons yet.

And one nice thing about sticking to the Terraform module provider is that I can use my current domain knowledge to at least triage problems.

Because then the troubleshooting goes back to the module, to Terraform, not to Kubernetes or Crossplane itself. I've essentially shifted the problem. I've managed to contain the team's cognitive load by choosing something that could be seen as lesser, but that is, again, the right choice at this point in time, and we might change our minds as we go forward.

Rasmus Praestholm:

That brings us to the next step with Crossplane and Terraform. Do you use them together, or can you use just one or the other? I know Plural just uses Terraform, but we're also trying to use Crossplane for Venue. Tim, what are your thoughts on the whole setup?

Timothy Chin:

So, yeah, just like Daniel (Chalky) mentioned, Crossplane does have different providers, for example for AWS, for different clouds, and even for Terraform. I've tried using Crossplane with the Terraform provider, and one lesson I learned is to make sure the Terraform you pass into Crossplane is a manageable size, right? In the sense that you shouldn't throw in Terraform that runs for 20 or 30 minutes, because you won't get much output from the Terraform provider in Crossplane.

So that's one of the main things I learned from that. Otherwise, it's again just re-implementing Terraform in another wrapper, just like, yeah!

Rasmus Praestholm:

So that brings up a thought.

Because this is really technical and awesome and all that. But pretend for a moment I'm a dumb user – because I am.

If I'm looking at Kubernetes, Terraform, Crossplane and these things… ultimately, I want to host things there. So what are we actually trying to do by introducing Crossplane instead of, or in addition to, Terraform? Terraform is about multi-cloud, and Crossplane is about multi-cluster. What are we talking about here, really?

Daniel Chalk:

So the Terraform thing is just to manage our cognitive load. You know, there are other things we're trying to learn, and we're trying not to put too much weight on the control plane. That's why we've opted to use Terraform in that way. For the end user, there should be a suitable enough abstraction that they… not that they say, “I don't care”, but they should care less, or worry about it less. So I am thinking of using something like KubeVela.

With that, you're introducing annotations to decorate your charts, and all that person is saying is, “I want a database”, you know? What actually gets pulled in is hidden from them, at least to a degree.

Timothy Chin:

I think what the user wants is a nice, friendly user experience, right? What they want is a site that shows up, which is friendly, has good UX and good UI, and shows them the infrastructure they want to build, correct?

So Crossplane does facilitate the back end of that, right? For the developers who build the platform, it helps us abstract that information and only capture the information users need, which is, for example, “Oh, you want a database? Okay, do you want PostgreSQL or MSSQL?”

That's it. So yeah, I think that's what the user wants, and Crossplane does help to do that in the back end. But there still needs to be something for the user to use.

Rasmus Praestholm:

It almost sounds like there should be a front end – something like Plural! That makes it all nice and neat! Maybe we should talk about that for a little bit?

Michael Guarino:

Yeah! So I'll speak to Crossplane specifically and then go into what we're doing. We have used Crossplane at Plural a few times, and the issue you mentioned about it jamming up the control plane, we hit pretty quickly.

And I don't know how deep you guys dug into it, but it's basically a bug in Kubernetes itself that gets triggered. When you add a CRD to a control plane, there's a process in the control plane that basically reconstructs the entire OpenAPI specification for the Kubernetes API on each CRD load, and that's an extremely heavy process.

And then, on the other side, when the Kubernetes client, the Go client specifically, makes any API call, it'll do a discovery call against that OpenAPI spec to do some basic pattern matching in case you don't provide the fully specified API.

And so if you put in a few hundred CRDs like Crossplane does, that API discovery call the client makes becomes extremely slow. It'll do things like make all the operators running in your cluster effectively inoperable, or make it very difficult for you to use kubectl because it times out all the time, and cause all sorts of annoying problems if you just install everything in Crossplane at once.

Eventually, Kubernetes will fix that, but from what we can see, it just hasn't happened yet. And in Plural's case, a lot of what we do is deploy a lot of different applications, all in the same cluster, for efficiency.

And so, if you deploy Crossplane alongside other applications that need operators and provision CRDs, it will really degrade the cluster pretty quickly. So that was one of the more annoying things we saw with it, and we sort of stopped using it as a result. But there are other solutions that do what Crossplane does, specific to certain clouds.

Google has Config Connector, for example, and there are other CRD-based cloud resource provisioners out there that reduce the CRD scope and keep your cluster from blowing up, basically!

Rasmus Praestholm:

So is it plausible, then, to have a completely dedicated cluster, just so you can use something like Crossplane as a cross-cloud operator, and–

Michael Guarino:

Yeah, that would mitigate the problem significantly. Or if you just use a single cloud provider for Crossplane and don't install, like, three, that'll also mitigate the problem, because the footprint will be smaller.

You'll create fewer CRDs, basically. There's just a number of CRDs at which your control plane will probably start having a lot of trouble…

Timothy Chin:

So it's basically kind of like a cluster of clusters?!

Michael Guarino:

Yeah.

Timothy Chin:

Where the Crossplane CRDs live?

Michael Guarino:

Yeah, I think the right pattern with Crossplane is to have a dedicated Crossplane management cluster. It might manage other clusters, or it might just create resources and stuff like that.

There are some other interesting things you could talk about, like the user feedback dynamic around it. Because if you're using Crossplane to provision resources, you're kind of hoping that your user is familiar enough with kubectl, at least, to understand what's going on with the Crossplane resource, because they're going to have to look at its status field to know whether it was created successfully or not, and all of that.

And what we frequently found is that people are just not comfortable with kubectl at all, but they can sort of understand how to use the Terraform DSL. So what you guys are doing, where you're effectively just using Crossplane as a way to execute Terraform, makes a lot of sense from a user experience standpoint, because I think it's probably easier to teach people Terraform, or at least find people who already know it, than it would be to get them to understand how the reconciliation of a Crossplane CRD is happening and how to debug it if they misconfigured it in some way.

But that was another thing.

And in the worst case, if something is really weird for whatever reason, you end up having to go into the logs for the Crossplane operator to understand what's going on, which is a pretty unusual experience, and a lot of users will have some trouble with it. That said, if you wanted to create a front end for provisioning resources, Crossplane is probably amazing for that, because you have these wonderful little YAML objects that you can make API calls against to provision resources. Whereas with Terraform that's basically impossible: you just can't dynamically generate Terraform code easily. But yeah.
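As a rough illustration of the status-field point above, here is a minimal Python sketch that summarises a Crossplane managed resource the way a friendlier front end might, instead of asking users to read raw YAML. It assumes the resource has already been fetched as JSON (for example with kubectl get with -o json) and that it reports Crossplane's usual Synced and Ready conditions; the sample resource, its name and its reason strings are hypothetical.

```python
def condition_status(resource: dict, cond_type: str) -> str:
    """Return "True", "False" or "Unknown" for the named status condition."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status", "Unknown")
    return "Unknown"


def summarize(resource: dict) -> str:
    """One-line, human-readable summary of a managed resource's health."""
    name = resource.get("metadata", {}).get("name", "<unnamed>")
    if condition_status(resource, "Synced") == "True" and condition_status(resource, "Ready") == "True":
        return f"{name}: ready"
    # Otherwise surface the first condition that is not True, with its reason and message.
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("status") != "True":
            reason = cond.get("reason", "no reason given")
            message = cond.get("message", "")
            return f"{name}: {cond.get('type')} is {cond.get('status')} ({reason}) {message}".rstrip()
    return f"{name}: not ready yet (no failing condition reported)"


# Hypothetical resource, shaped like the JSON that kubectl returns for a managed resource.
sample = {
    "metadata": {"name": "orders-db"},
    "status": {
        "conditions": [
            {"type": "Synced", "status": "True", "reason": "ReconcileSuccess"},
            {"type": "Ready", "status": "False", "reason": "Creating",
             "message": "instance is being created"},
        ]
    },
}

if __name__ == "__main__":
    print(summarize(sample))
```

The point is less the code than the shape: a front end can turn the status block into one sentence a Jira consultant can act on.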

Rasmus Praestholm:

Yeah, it was beginning to sound like Crossplane is like Rancher just without the UI.

Michael Guarino:

Yeah, that's basically its current status! I don't know if Upbound, which is the company that is trying to commercialise it, has created a decent UI; they actually might have. That probably is what they're primarily selling. So it might be worth it to look into that if you're interested in using Crossplane. But yeah.

Rasmus Praestholm:

So how does Plural add the UI and all this, and make it user-friendly for people who actually just want a thing?

Michael Guarino:

Yeah. So Plural is interesting – it's a different solution to the same problem. What we're trying to do is make it really easy for people to take off-the-shelf software like Airflow, Airbyte or Dagster (oftentimes it's data software, but there are also things like Grafana for metrics) and deploy it into their own cloud. So you have an AWS account, you know, and you want an Airflow instance in it, or you want Grafana on it; we want to make it as easy as possible for you to get from point A to point B.

And what we have is a catalogue of those open-source applications that have been packaged for each of the clouds, and the packaging ultimately resolves to Helm charts and Terraform modules, with a dependency tree amongst them. So if you wanted to deploy Airflow, there's going to be an Airflow Helm chart and an Airflow Terraform module.

But then within that, there's going to be a dependency on a PostgreSQL operator to provision a PostgreSQL database, and a dependency on the core Kubernetes runtime (things like external-dns, cert-manager, ingress controllers, stuff like that), and then Terraform to actually provision the cluster from zero to one.

So you can't run any of this without a Kubernetes cluster in the first place. Our command-line tooling will actually generate a full Git repo for you with those resources loaded into it. So you'll run plural init and create a Git repository as part of the process, plural build will generate all the Helm and Terraform for you, and plural deploy will understand the dependency ordering of everything and then just execute it one by one to provision those resources.
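The "understand the dependency ordering, then execute it one by one" step can be sketched as a topological sort. This is not Plural's implementation; it is a minimal Python sketch using the standard library's graphlib (Python 3.9+), and the step names below are hypothetical, loosely modelled on the Airflow example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency tree: each step maps to the steps that must finish before it.
DEPENDENCIES = {
    "cluster-terraform": set(),                                # cluster from zero to one
    "bootstrap-helm": {"cluster-terraform"},                   # external-dns, cert-manager, ingress
    "postgres-operator-helm": {"bootstrap-helm"},
    "airflow-terraform": {"cluster-terraform"},                # e.g. buckets, IAM
    "airflow-helm": {"airflow-terraform", "postgres-operator-helm"},
}


def deploy_order(deps: dict[str, set[str]]) -> list[str]:
    """Return every step after all of its dependencies.

    Raises graphlib.CycleError (a ValueError) if the tree contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


def deploy_all(deps: dict[str, set[str]], apply_step) -> None:
    """Apply steps one at a time, in dependency order."""
    for step in deploy_order(deps):
        apply_step(step)


if __name__ == "__main__":
    deploy_all(DEPENDENCIES, lambda step: print("applying", step))
```

A real deploy engine would also track which steps already succeeded and stop on the first failure; the sort only guarantees that nothing runs before what it depends on.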

Rasmus Praestholm:

One of the things I found most impressive was just the onboarding, especially considering (and I want to highlight this, because you only briefly mentioned it) that you can use Plural across different clouds almost seamlessly. Like, if you set it up on Google, well, okay, here are the apps you can use, and they have, I guess, this thin adaptor for Google or AWS or Azure, and it just works. Which brings me back to how Chalky was saying that, well, maybe we did this thing in Rancher, but it almost sounds like Plural would be cool for using these additional bits and pieces.

Michael Guarino:

There are also some other things. So we're giving you the ability to deploy Airflow, but we don't want the trade-off to be that you have to manage Kubernetes as a result. If the problem you're trying to solve is “I don't know how to stand up and deploy Airflow”, you probably can't also solve the problem “I don't know how to stand up and deploy Kubernetes”. So we made a few key decisions there. One is that we always use managed control planes, because we don't want people to have the operational burden of understanding how to troubleshoot things…

And then, we also made a very good, featureful UI for cluster administration that's somewhat geared towards beginners. But in reality, I haven't found any Kubernetes use case that I couldn't solve with it. It has things like interactive runbooks to resize resources, like databases, within the cluster as appropriate. It will actually accept and apply upgrades for all the applications over the air for you. And you can configure OpenID Connect providers, and add or remove users, for the applications in that UI.

And you can visualise cost information. Basically, it’s a full DevOps control plane for the applications we vendor for you. It's highly configurable as well: we can tweak it in various different ways using CRDs to improve the operational experience. But the goal is, sort of like you said, Rasmus, that you really don't have to understand anything about Kubernetes at all. You just have to know that you want Airflow or Airbyte, or any open-source application, or theoretically the Atlassian products, click a few buttons and run a few commands, and ultimately you have it deployed in your cloud, provided you have the permissions to provision those resources.

Rasmus Praestholm:

Cool. So yeah, that does seem to line up with what we've been using Rancher for internally, which is spinning up well-known applications in environments for use by, you know, individuals and so on. That was our use case internally, because we need demo environments for consultants to play with and actually do work on the applications.

But then there's also the challenge of developing applications in the first place. So that gets into another interesting area of, okay... you have Kubernetes. Maybe now you have it orchestrated to a point. Then what? It’s like… there's some degree of “we need some apps”.

But a lot of people are also developing their own things. That's where things like Backstage come in, trying to handle the developer side. But how do they meet in the middle? Are you trying to reach a point where you can also take your own homebrew applications, deploy them and manage them across the environment tiers (dev, QA, production), that sort of thing?

Michael Guarino:

Yeah. So the way we segment this is that there are two categories of applications. We can call one category third-party or vendor software. Open source is of that flavour, but so are things like the Atlassian products: software that's developed by an organisation outside your own purview, which you're bringing in-house to run.

And then there are first-party applications, things that you would actually be developing and building for yourself or your own business. And that's actually the next thing that we're putting on our plate. We ultimately want it to be as easy an experience as possible to run anything you want on Kubernetes anywhere. That's kind of going to be our overarching mission going forward, and being able to solve for the first party would be the final step of that.

And I think there are a few interesting pain points I've seen here. One obvious pain point is that a lot of people just straight up do not know how to provision a Kubernetes cluster properly and understand its credential chain; it's actually non-trivially hard to create a Kubernetes cluster in Terraform.

We've done it many times now, and there are many footguns. So we're actually investing effort into using Cluster API properly, because we think that's a better way of managing the lifecycle of Kubernetes long term.

And the really big thing is that maybe you get the cluster up, but then the process of upgrading the cluster is just an incredible nightmare. So we think the upgrade flow within Cluster API, with its constant reconciliation process, is going to be a lot smoother than doing it with standard infrastructure as code.

Another big pain point we've seen is that the toolchain around authoring Kubernetes manifests just isn't that great. Like, Helm is cool. We use it. But if I were developing my own applications, I would prefer not to, honestly.

So we're investing in doing a really good job with cdk8s, which is a cool project that allows you to author Kubernetes resources in standard programming languages. JavaScript and TypeScript are the primary ones, and TypeScript is a pretty flexible language for that sort of thing, but they also support Go and Java and some other languages as well.

So we think having a good Kubernetes manifest authoring experience is a key portion to this.

And then the other thing is actually being able to deploy the applications seamlessly is still a little bit tricky for people, especially because they don't know how to provision Kubernetes in the first place.

And they don't know how to manage the credential chain and all of that. So the last thing would be building a proper deployment engine that lets you take cdk8s resources from a Git repository anywhere and have a good staged pipeline deployment system, from dev to staging to production, with approvals, change management or integration tests in between, and all of that.

But we think that if you get all of those together and have them interoperate seamlessly, you've created what would hopefully be a pretty good developer environment around Kubernetes.

And so that's what we're focusing on.
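A staged dev-to-staging-to-production flow with gates in between can be sketched roughly as follows. The Stage type, the promote function and the gate callables are hypothetical names for illustration, not part of Plural or cdk8s; a real engine would also need rollbacks, persistence and audit trails.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Stage:
    name: str
    # Checks run after deploying to this stage (integration tests, approvals, ...).
    gates: list[Callable[[str], bool]] = field(default_factory=list)


def promote(version: str, stages: list[Stage], deploy: Callable[[str, str], None]) -> list[str]:
    """Deploy `version` stage by stage, stopping at the first stage whose gates fail.

    Returns the names of the stages the version was deployed to.
    """
    reached = []
    for stage in stages:
        deploy(stage.name, version)
        reached.append(stage.name)
        if not all(gate(version) for gate in stage.gates):
            break  # never promote past a stage whose checks did not pass
    return reached


if __name__ == "__main__":
    pipeline = [
        Stage("dev", gates=[lambda v: True]),        # e.g. integration tests passed
        Stage("staging", gates=[lambda v: False]),   # e.g. approval not given yet
        Stage("production"),
    ]
    print(promote("1.4.0", pipeline, lambda stage, v: print("deploying", v, "to", stage)))
```

Here the version reaches dev and staging but stops before production, because the staging gate fails.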

Rasmus Praestholm:

Cool.

Daniel Chalk:

Oh, you'll get no disagreement from me regarding Helm. I've actually been looking at cdk8s as well, for similar reasons. And the same is true for Terraform and CloudFormation; their CDK equivalents are actually quite useful for avoiding loads of boilerplate or very odd operations in code. If you don't want a resource to exist, just omit it, please! Whereas generally, it has to be there, and you then have to resort to hacks to hide it.
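To illustrate the "just omit it" point: when manifests are generated by ordinary code, an optional resource is simply left out of the output, where a Helm chart would typically wrap it in a conditional template block. The sketch below uses plain Python dictionaries rather than the cdk8s API, and the web_service function and its parameters are hypothetical.

```python
from typing import Optional


def web_service(name: str, image: str, port: int, ingress_host: Optional[str] = None) -> list[dict]:
    """Return Kubernetes manifests for a simple web service as plain dictionaries."""
    labels = {"app": name}
    manifests = [
        {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": name, "labels": labels},
            "spec": {
                "replicas": 2,
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {"containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]},
                },
            },
        },
        {
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": name},
            "spec": {"selector": labels, "ports": [{"port": 80, "targetPort": port}]},
        },
    ]
    # No ingress host means no Ingress object at all: nothing to template away.
    if ingress_host is not None:
        manifests.append({
            "apiVersion": "networking.k8s.io/v1",
            "kind": "Ingress",
            "metadata": {"name": name},
            "spec": {"rules": [{
                "host": ingress_host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": name, "port": {"number": 80}}},
                }]},
            }]},
        })
    return manifests
```

Calling web_service("docs", "nginx:1.25", 8080) yields a Deployment and a Service; passing ingress_host adds the Ingress, and serialising the list to YAML is left to whatever tool applies it.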

Michael Guarino:

Yeah, Helm is unavoidable now because it's gotten so big and so prevalent, so there are going to be Helm charts you're going to want to use, and fortunately, cdk8s is compatible with it: you can inject a Helm chart into cdk8s. But there are two things I've personally seen, having written a ton of Helm charts now.

The template language kind of sucks… It's just very clunky and weird, and you can't do a lot of things you would ideally want. I've used other template languages before that were a lot better. It's just not in that league.

And the other big thing that bugs me is that there's basically no good story for testing it. Ideally, I'd want any Kubernetes manifest I'm writing to have a unit test associated with it that can run in CI when there's a pull request against it, and validate that it at least looks sort of kind of good before sending it into a cluster, and that's a really tricky flow to do well with Helm.

And I think it'd be very easy to do if you had something like JavaScript as your authoring language, because you could just use a standard JavaScript unit test framework.

So you could have a better CI flow on the front end of your Kubernetes development, and then have your integration tests later down the pipeline. It would just tie together a better lifecycle around all of it.

Rasmus Praestholm:

That does sound like another missing piece of the orchestration.

Because I think, pre-Kubernetes, we were almost getting to the point of doing better automated testing and really putting testing first, and all those kinds of things. Yeah.

Then, Kubernetes! Have fun!

Daniel Chalk:

I don't know if it was easier… And the reason I say that is, testing infrastructure is hard.

It's really, really hard, nigh-on impossible.

Often, the best you can do is actually compliance: testing these things, ensuring that if you do deploy something, it works and it's not insecure. I'm speaking specifically from a Terraform front, or even a CloudFormation one.

And actually, with Kubernetes, we've mostly just shifted the complexity. All you're really trying to do is ensure some guarantees that it's going to deploy, rather than that it integrates, if you see what I mean, because generally, when we deploy these things, we're bundling modules together to create an end-to-end solution.

No one writes one Terraform module for their stack. It's a collection of state files, and you're wiring them together to make something useful. And Kubernetes is largely the same.

The same goes for add-ons: when you deploy one, it's got to be configured correctly for the next thing you want to deploy on top of it. So I don't necessarily see it as a solved problem.

Everyone's flavour of Kubernetes is different, so tests will have to be different as well. You couldn't make an industry-standard “here's a testing framework you can employ” that works for everybody, just like you can't take someone else's Helm chart and expect it to just work on your cluster, because you've configured it differently.

So it's always going to be an ongoing problem.

What actually matters is the quality of the tools and having them enable you. That's why something like cdk8s is important: it gives you the right tools to actually do it with.

Rasmus Praestholm:

Yeah, I'm intrigued now because, again, as a well-known dumb user, about the best I've done is get into Helm, think “wow, this is complex,” and run into some of those templating issues. Then, what if I want this environment to look a little bit different? For a while, I got into Kustomize and Helm running in different phases and all that kind of stuff, and it just felt like it was getting worse. So, whatever cdk8s is, save me!

Michael Guarino:

Yeah, I think no matter what, you're ultimately going to have to manage the creation of a couple of dozen or more objects to deploy a pretty significant service on Kubernetes. It's going to involve probably a few deployments, a number of config maps and services, and some ingresses.

But the idea is to create a development flow around those objects that gives you fast feedback on whether you're screwing yourself up, basically, and some form of testing is the best way I can think of to do that.
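One flavour of that fast feedback is checking the objects against each other, not just one at a time. The sketch below is plain Python with hypothetical names and simplified object shapes (it only inspects `spec.rules` on Ingresses, not `defaultBackend`, and skips Services without a selector). It flags a Service whose selector matches no Deployment's pods, and an Ingress that routes to a Service that isn't defined.

```python
# Hypothetical pre-deploy consistency check across a set of manifest dicts.

def selector_matches(selector: dict, labels: dict) -> bool:
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def find_problems(objects: list[dict]) -> list[str]:
    problems = []
    pod_label_sets = [
        o["spec"]["template"]["metadata"]["labels"]
        for o in objects if o["kind"] == "Deployment"
    ]
    service_names = {o["metadata"]["name"] for o in objects if o["kind"] == "Service"}

    for o in objects:
        name = o["metadata"]["name"]
        if o["kind"] == "Service":
            sel = o["spec"].get("selector", {})
            # Services without a selector (manually managed endpoints) are skipped.
            if sel and not any(selector_matches(sel, labels) for labels in pod_label_sets):
                problems.append(f"Service {name}: selector {sel} matches no Deployment pods")
        elif o["kind"] == "Ingress":
            for rule in o["spec"].get("rules", []):
                for path in rule.get("http", {}).get("paths", []):
                    backend = path["backend"]["service"]["name"]
                    if backend not in service_names:
                        problems.append(f"Ingress {name}: backend Service {backend} not defined")
    return problems


objects = [
    {"kind": "Deployment", "metadata": {"name": "web"},
     "spec": {"template": {"metadata": {"labels": {"app": "web"}}}}},
    {"kind": "Service", "metadata": {"name": "web"},
     "spec": {"selector": {"app": "wbe"}}},  # typo in the selector
    {"kind": "Ingress", "metadata": {"name": "web"},
     "spec": {"rules": [{"http": {"paths": [
         {"path": "/", "pathType": "Prefix",
          "backend": {"service": {"name": "web-svc", "port": {"number": 80}}}}]}}]}},
]

for p in find_problems(objects):
    print(p)
```

Both mistakes in the example would otherwise only surface once traffic fails to reach the pods in a live cluster.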

And then there's the Go templating, which just isn't fit for purpose. If I were able to choose, I would choose not to use it. We use Helm within Plural specifically because it's the industry-standard tool. We want deploying something like an Airflow cluster to be close to what people likely already know, and it's also nice that I can call it directly using the Helm SDK from our Go code.

But for a proper end-to-end Kubernetes development experience, I would use something else, and cdk8s looks like the best solution at the moment.

And there's a similar play; I don't know if you guys are familiar with Pulumi. It's a similar alternative to Terraform for standard cloud resources: you can write Pulumi stacks in JavaScript, Go, or all sorts of other standard programming languages.

And it has a lot of advantages. For one thing, you can loop in it, and you can have proper branching. There's so much stuff that you can't really do well in Terraform that you can do in Pulumi, and it just works because it's an actual programming language.
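As a rough illustration of the loops-and-branching point, here is plain Python generating manifests for several environments. This is not Pulumi's or cdk8s's actual API; the environment table, replica counts, image and 70% CPU target are made-up values. The loop plays the role of a Helm `range`, and the `if` replaces a template conditional.

```python
# Hypothetical per-environment manifest generation with ordinary loops and branches.

ENVIRONMENTS = {
    "dev": {"replicas": 1, "autoscale": False},
    "staging": {"replicas": 2, "autoscale": False},
    "prod": {"replicas": 3, "autoscale": True},
}


def manifests_for(app: str, env: str, settings: dict) -> list[dict]:
    name = f"{app}-{env}"
    labels = {"app": app, "env": env}
    objs = [{
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": settings["replicas"],
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": app, "image": f"example.com/{app}:1.0.0"}]},
            },
        },
    }]
    # Plain branching: only some environments get an autoscaler.
    if settings["autoscale"]:
        objs.append({
            "apiVersion": "autoscaling/v2",
            "kind": "HorizontalPodAutoscaler",
            "metadata": {"name": name},
            "spec": {
                "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
                "minReplicas": settings["replicas"],
                "maxReplicas": settings["replicas"] * 4,
                "metrics": [{
                    "type": "Resource",
                    "resource": {"name": "cpu",
                                 "target": {"type": "Utilization", "averageUtilization": 70}},
                }],
            },
        })
    return objs


# Plain looping: one manifest set per environment.
all_objects = [obj for env, s in ENVIRONMENTS.items() for obj in manifests_for("web", env, s)]
print([(o["kind"], o["metadata"]["name"]) for o in all_objects])
# [('Deployment', 'web-dev'), ('Deployment', 'web-staging'), ('Deployment', 'web-prod'), ('HorizontalPodAutoscaler', 'web-prod')]
```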

Rasmus Praestholm:

Another thing I want to ask: you talked a little bit about deploying applications off the shelf, and potentially developing your own, and how you do your own Kubernetes, and so on.

If you're in a large enterprise and you have bunches of different teams, who does what? Do we need to get into this newfangled platform engineering? Is that one team that takes care of all the hard stuff, and then developers can kind of go:

Hey! It's a candy store. I'll have some of that, some of these.

Michael Guarino:

Yeah, that's an interesting question. I've worked at Amazon and Twitter previously, and a lot of different companies, and I've seen two different ways the division of responsibilities can flow. There are a lot of places that will have consolidated platform teams.

And then there are some places where operations is part of the standard software development engineer's job responsibilities. Amazon was actually the latter.

So every engineer at Amazon was expected to do some degree of Ops, whereas places like Twitter or Facebook, and actually Google as well, lean more towards a centralised platform team.

I think if Kubernetes is done well, you should have a very self-service experience around provisioning new Kubernetes clusters and then creating a deployment pipeline to those clusters in a way that is quite reproducible, and that's hopefully what we'll be able to build.

That would move people more towards the Amazon model, I think, which I thought was actually really, really good, because, realistically, the knowledge of how an application is going to break is probably much closer to the head of the application's developer than to a centralised team's.

But if your infrastructure is just too unwieldy for various reasons, you can't actually do that; you need someone who specialises in just keeping the lights on.

But if an individual team or business unit can create a cluster for themselves, that cluster is going to be conformant in various ways; maybe you put in your security scanning and all of that by default.

And then they can just slap in their infrastructure, their Kubernetes resources, defined in a consistent way.

I feel like that could reduce the need for that degree of specialisation around keeping the lights on and managing the platform. But we're still definitely in the early days, and it's ultimately up to each team how they decide to organise things. So there isn't really one right answer, from what I've seen.

Rasmus Praestholm:

I think we've also been going back and forth a little ourselves, trying to think of new products and ideas around things like Kubernetes orchestration. Is it like a Venn diagram, where you have the Kubernetes orchestration circle, and then there's other stuff that's more about developer experience? How much do they overlap? Is there actually a sharp point where one ends and the other begins?

Kubernetes is hard.

Michael Guarino:

It's very complicated. Yeah, but it's like a palette, a blank canvas. There are tons of things you can do with it, and you have to create the appropriate guard rails for your organisation to use it in an effective way.

From what I've seen, some places do it really well. Lots of places don't… it's hard to say.

Daniel Chalk:

I think all places are going to start with not doing it very well. I think that's actually the best place to start because at least you've started. That is probably how I’d put it.

That's probably where we are as well. Not that we do it wrong, necessarily, but we're doing it the best way we know how.

If you see what I mean? And then you've just got to keep trying: get the feedback, and then just keep adjusting. And yeah, you might make the platform available for people to change as well. I don't necessarily mean you leave the doors open and everyone can just change what they like.

But I think you should never stop your internal customers from opening a merge request to say, thinking of Crossplane, for example, “I've got a Terraform module I would actually like to expose. Could you wrap it for me, or could I just add it to a registry somewhere?” Enable people to contribute to it, because if you leave it all up to a siloed team or a platform team, your customers will overwhelm you pretty quickly if they're not contributing.

Michael Guarino:

Yeah, yeah. That team oftentimes gets overwhelmed very quickly by all the demands on them, so they become a bottleneck and shed work, and in effect, the whole thing becomes inefficient as well.

Timothy Chin:

Yeah. I always go back to user experience, right? You make it so they can bring their own templates, their own Helm charts if it's Helm, their own CRDs, and integrate them into the platform relatively easily, and then the general masses can use them.

So yeah, I think Rasmus’s favourite phrase is “eating our own dog food”? Right?!

Rasmus Praestholm:

Yup, yup!

I want us to go further with Kubera and Backstage and all these things, and figure out what they mean to us. Are they meaningful enough that we can think of a way of building products? As Adaptavist, we've had so many clients over the years that usually use some large Atlassian suite, but a lot of them also use all kinds of DevOps tooling.

Can we somehow distil something we've figured out how to do well and put it into a platform somebody else can use?

That's why I find it interesting to think about platforms like Plural, Backstage and others, and to try to spot areas that still aren't being served sufficiently by the different actors.

So that's where my interest comes in. I'm seeing Plural move more into first-party developer stuff, and that's cool. That's great. I'm seeing Backstage grow up a little bit, and I'm thinking about who uses Backstage. Is it only developers, or would someone like a PM or some sort of team lead ever be welcome in there?

Would they have something they could figure out and look at?

Would they ever want to see something like Plural?

Who is where is my big question at times. But I'm not sure there's a clear answer to that quite yet.

With that in mind, I have one other question that I'm just curious about.

Kubernetes is great and all. It's also hard. We've ascertained that.

What about enterprises that do things other than just Kubernetes, like serverless? Is there room for serverless in a platform we might think of building at Adaptavist, or in something like Plural? Or are we locked into Kubernetes–

Daniel Chalk:

There has to be. There's a lot of love for Lambda internally at Adaptavist. The problem is, there isn't really anything like it, either. What I mean by that is, obviously there are other products that do serverless, but the developer experience Lambda has is pretty solid and well known, and I enjoy it. I wouldn't want to take that away from developers; I wouldn't want to say “by the way, you can't use this anymore”.

And actually, it's nice and atomic, right? They've got this one little repo with a small bit of code. They make a change, that one change goes live, and half of the things they'd have to worry about are gone, because that's the whole point of using that sort of stuff.

Those systems still have limits, and sometimes you'll want to bring things back in-house. For some of our products, that's actually the case: we monitor them and go, “actually, running this in a container, or even on EC2, might be a better option now.”

But you still have to make room for the serverless stuff, because it enables people to prototype really, really quickly.

And it enables small teams that aren't as cross-functional as they would like to be to run something without necessarily caring about what they're running it on. They can focus on literally the function itself and get everything else for free, even scaling. That's the first thing you can look at, right? You don't even have to think about it; it just happens. Okay, you introduce things like cold starts, but that's a very small part of the learning experience, and once you've got it, you're pretty good.

Rasmus Praestholm:

And that's the wonderful thing about serverless: that level of simplicity.

And I don't know if this is more about the industry or me just being a crazy nerd who finds all the challenges, but I feel like I've seen more out there about how to use Kubernetes to enable serverless, essentially your own Lambda replacement (why would you ever do that? I don't know), than I've seen actual platforms that really aim to support serverless and make it easy.

Though maybe it's already sufficiently easy if you're just looking at Lambda on AWS.

Would you ever see something like that in a thing like Backstage?

Like, you want to do serverless? Click this button! Put some stuff in a Git repo.

Daniel Chalk:

You probably will, eventually. You can now; it's just whether you want to make that jump. What I mean by that is, you already trust Lambda.

It already responds really, really quickly. At that point, you rely on AWS, right? The second you start doing it on Kubernetes, you now rely on one of your Ops teams to do that. Can they scale as quickly? Can they do it as well? So it's a trust exercise to some degree, and a maturity one.

But it is doable. The case for it is whether your business cares about being multi-cloud or not; I think that would be the bigger one. If your company is like, “you know what, we've already drunk the Kool-Aid, we're with this provider,” you might not want to.

We’re having a chat about Kubernetes, and I’m saying not to! But yeah, it's the right tool for the right job kind of thing.

Yeah, I think there might be high-level reasons why you wouldn't want to do that, and there are obviously reasons why you'd want to as well. If you want to be multi-cloud, you can write this function once, and as long as you've got a cluster, it'll work, or at least as long as it's a cluster that's provisioned a certain way.

Whereas if you write a function for AWS, it only runs on AWS; you'd have to change it to run elsewhere. That's the main benefit.

I'm reluctant to say it wouldn't scale like AWS Lambda, because, of course, you probably could get it to. It's the endeavour you'd have to go through to make it do that.

But I think it's quite a steep… an expensive commitment is probably how I'd put it.

Rasmus Praestholm:

So that brings me to an interesting conundrum. Kubernetes is hard, but everybody is trying to do it. Serverless is kind of supposed to be easy, but I don't see it getting that much traction or adoption, even after being out so long.

So if Kubernetes is hard and everybody is trying to do it, but they're kind of bad at it, and serverless is easy and some people are trying to do it, but it just isn't taking off, what's going on here? Are our problems as developers just so fundamentally difficult that we can't find the magic silver bullet that does it right? Are we just gluttons for punishment?

Michael Guarino:

We used serverless a lot at a previous company, and some interesting things came out of it. The lack of infrastructure to manage is one of the benefits; the other real benefit is that it effectively gives you scale-to-zero trivially, so if you have very bursty work, it's perfect.

You actually save a ton of money. The company was Frame.io, which is a video collaboration SaaS platform, and we had a service that ran on CloudFront Lambda@Edge, resizing videos on the fly and putting them straight into CloudFront. That's the perfect use case for serverless.

The problems are, one, it gets really expensive if you actually have consistent work going on, because you're effectively allowing Amazon to tax you per function invocation instead of for the amount of CPU you're using consistently.

So if you have something that makes a lot of function calls but doesn't use a lot of actual compute resources, they're just going to drain you.
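One way to read the “tax per function” point: Lambda-style billing charges a fee per request plus a fee for memory multiplied by duration. The prices below are assumptions based on AWS's published x86 list prices at one point in time (free tier ignored), and the workload is hypothetical, but the shape holds: for very short, low-memory invocations, the per-request fee dominates the bill even though little compute is actually used.

```python
# Illustrative Lambda-style monthly bill: per-request fee + per-GB-second fee.
# Prices are ASSUMPTIONS (approximate AWS x86 list prices at one point in time);
# check the current pricing page. The free tier is ignored.
PRICE_PER_MILLION_REQUESTS = 0.20      # USD, assumed
PRICE_PER_GB_SECOND = 0.0000166667     # USD, assumed


def monthly_cost(requests: int, avg_ms: float, memory_gb: float) -> dict:
    request_fee = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    gb_seconds = requests * (avg_ms / 1000) * memory_gb
    compute_fee = gb_seconds * PRICE_PER_GB_SECOND
    return {
        "request_fee": request_fee,
        "compute_fee": compute_fee,
        "total": request_fee + compute_fee,
    }


# Hypothetical chatty workload: 500M tiny (5 ms, 128 MB) invocations a month.
c = monthly_cost(requests=500_000_000, avg_ms=5, memory_gb=0.125)
print(f"request fee ${c['request_fee']:.2f}, compute fee ${c['compute_fee']:.2f}")
# request fee $100.00, compute fee $5.21
```

Roughly 95% of this hypothetical bill is the per-invocation charge, which is the scenario described above.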

And two, the developer experience can actually be kind of weird.

You really can't run a Lambda function without putting it into Amazon, for instance, whereas with the Kubernetes development flow, you can always just run things in Docker Compose, and that's going to be effectively equivalent to running it in Kubernetes.

That's not entirely true, of course, but it's true enough. What we found is that our developers were much more productive with a bog-standard Docker-based workflow than with the serverless workflow, because, to get the full service experience, they would have to toggle between ten or so different serverless functions that all orchestrate with each other but don't really work locally.

So you'd have to deploy it into the cloud, but then you have to go through approvals to deploy into the cloud, and it just became a real pain in the butt!

So I think if you have that perfect use case for it, it's really, really cool.

Another thing: AWS has Aurora Serverless databases now. So if you have a dev database, you can make it serverless, and it'll scale to zero during off hours when no developers are up and actually using dev.

But there are a lot of use cases where it's still not exactly fit for purpose. Then there are the Kubernetes serverless projects: there's Knative out there, and I think there are probably a few others.

They're pretty good, and they can solve the cost problem, obviously, because it's just going to be running in your Kubernetes cluster. But then you have the problem that they don't really give you scale-to-zero, because you've still got to have the Kubernetes cluster underneath.

So you kind of have a $200 minimum to run the service on that individual Kubernetes cluster, and that might be more expensive than you were expecting to run that service, so… the gap is still a little bit awkward.

But if you had a very large cluster and could throw Knative on it, you could probably outperform Lambda, Google Cloud Run, or any of the other competitors from a cost perspective.
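The “$200 minimum” gap can be sketched as a break-even calculation: below some monthly request volume, paying per invocation is cheaper than keeping a cluster running, and above it the cluster wins. This uses assumed Lambda-style list prices (verify against current pricing), treats the roughly $200/month floor mentioned here as a flat cost that absorbs this service's load (a big simplification), and uses a hypothetical 100 ms / 512 MB workload.

```python
# Break-even sketch: per-invocation billing vs. a flat monthly cluster floor.
# ASSUMED prices (check current pricing); the free tier is ignored.
PRICE_PER_MILLION_REQUESTS = 0.20      # USD, assumed
PRICE_PER_GB_SECOND = 0.0000166667     # USD, assumed
CLUSTER_FLOOR_USD = 200.0              # rough monthly floor from the conversation

SECONDS_PER_MONTH = 30 * 24 * 3600     # 30-day month


def cost_per_request(avg_ms: float, memory_gb: float) -> float:
    """Request fee plus memory-duration fee for a single invocation."""
    request_part = PRICE_PER_MILLION_REQUESTS / 1_000_000
    compute_part = (avg_ms / 1000) * memory_gb * PRICE_PER_GB_SECOND
    return request_part + compute_part


def break_even_requests(avg_ms: float, memory_gb: float,
                        floor: float = CLUSTER_FLOOR_USD) -> float:
    """Monthly request volume at which serverless costs the same as the cluster floor."""
    return floor / cost_per_request(avg_ms, memory_gb)


n = break_even_requests(avg_ms=100, memory_gb=0.5)
print(f"break-even ~{n / 1e6:.0f}M requests/month (~{n / SECONDS_PER_MONTH:.0f} req/s sustained)")
```

Below that volume, or for bursty traffic that sits idle most of the month, scale-to-zero billing comes out ahead; a very large shared cluster spreads its floor across many services, which is the situation where Knative can undercut the hosted offerings.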

Rasmus Praestholm:

It almost feels like these cloud providers just want to make money off us!

Daniel Chalk:

Well, they are businesses!

That's what I was getting at about scaling, but you put it much more clearly: you can scale to zero. That's the plus point; you don't have resources just sitting there costing you money when they're not being utilised.

If they are being utilised heavily, you are just spending extra money, essentially.

Michael Guarino:

A lot of extra money, and you aren't necessarily getting a better development workflow out of it either. If the developer workflow were utterly beautiful, then maybe you'd claw that money back by needing fewer engineers.

But that's not actually true; at least, it wasn't true at the time when we were using serverless.

Maybe it's gotten a lot better.

But the stuff we've seen that's really, really cool is the edge serverless stuff, like Lambda@Edge. I think Cloudflare has Workers now, which is effectively a serverless offering as well.

There's a lot of compute you can do at the edge, like resizing images or video, that can then be quickly cached. And there's probably a lot of room for using serverless in those capacities.

Daniel Chalk:

To be fair, those would be good in front of a Kubernetes cluster as well, used in conjunction. We're almost having a conversation as if one's better than the other, but actually, it's very complicated!

Rasmus Praestholm:

I think I’m going to go with… Kubernetes is hard – and so is development. But there are a lot of cool toys and tools out there for us to play with.

And maybe that's enough to keep your career interesting and exciting.

Michael Guarino:

Yeah!

Rasmus Praestholm:

And with that, I think we're about ready to wrap up. So, thank you so much for coming, all of you – Michael, Tim, and Chalky.

I hope to talk to you all again soon!

Timothy Chin:

Thank you very much for having us.

Michael Guarino:

Yeah, thanks for having me as well.

Daniel Chalk:

Catch you later, peeps.

Romy Greenfield:

Thanks for joining us to discuss Kubernetes and lots of other interesting tools today – this has been DevOps Decrypted, which is a podcast on the Adaptavist Live network.

Please contact us on social media to let us know how you're finding the show, and thanks for joining us.

Thanks, everybody!

Why not leave us a review on your podcast platform of choice? Let us know how we're doing or highlight topics you would like us to discuss in our upcoming episodes.

We truly love to hear your feedback, and as a thank you for your ongoing support, we will be giving out some free Adaptavist swag bags!