Kelsey Hightower on Kubernetes, DevOps engineers and digital transformations
Kelsey Hightower is a Staff Developer Advocate at Google, co-chair of KubeCon, the largest Kubernetes conference, and the author of the very first “here’s how you install Kubernetes on your laptop and play with it” post.
In reality, Kelsey needs absolutely no introduction in the cloud native world. I’m always amazed at how open and approachable this guy is! I managed to grab some of his time, and here are some parts of what turned out to be one of the most interesting conversations I have had on cloud native topics.
Kelsey, thanks a lot for your time!
Ivan: Kubernetes has for sure revolutionized the way applications are built and delivered. A good Kubernetes engineer is worth their weight in gold, but with public clouds and their services like Google Kubernetes Engine, AWS Fargate, and Azure Kubernetes Service, are we at a point in time where Kubernetes has become a commodity? Should a brand new engineer fresh out of college invest in learning Kubernetes or should they just accept it as a given piece of infrastructure?
Kelsey: You’re starting to see something like the first distro of Kubernetes. If you go to Google Cloud to do the integration work there, you have to have so much insight into how the cloud provider works. If you want to do IAM integration or network integration, it’s just easier if you have access to those things at a very low level as we do at Google Cloud, like Amazon has with EKS and the same thing for Azure and so forth. If you’re in the cloud provider, the whole reason you’re there is for them to give you some services, right? If you’re just doing VMs and do everything yourself, that doesn’t make a lot of sense. So I think in the cloud you have two options.
One is if you just want a decent Kubernetes set up where it’s just going to work (my guess is 85-95% of all the Kubernetes will just be in the managed offering), which will deploy to VMs.
But there are going to be cases where maybe you want some custom type of setup, maybe have a different set of security requirements where you really need to control the host. I think this is still where it’s going to be important to know Kubernetes. Some people want different networking options, some people are using a different type of security agent and even when Kubernetes is managed, you still may have issues with Docker. You may still run out of disk space, you may have other concerns around how the node is configured. So I still think it’s okay to learn a little bit about how it works. Just like Linux. You still need to know how bash works, you still need to know how to control memory and CPU utilization. I think it’s going to be more about how not everyone is going to have to install Kubernetes in every environment, that’s the key. On-prem you have options like Pivotal PKS, OpenShift, Google Anthos, but you may just want to roll your own, too. You may just say, “No, we don’t need all of the features. We just want just enough Kubernetes.” So I still think for the next five years it’s really worth learning because we’re still in a period where you have to troubleshoot even the managed installs, like: “Oh, the logging is broken!” Well, you have to go look at the agent and reconfigure the logging.
Ivan: DevOps skills are in high demand these days and a lot of organizations are hiring junior people and trying to raise them straight as “DevOps engineers” without any significant experience in development or operations. What is your experience, is it realistic? Or do you believe that “there simply are no junior DevOps engineers”?
Kelsey: You need to think about the work required. At all levels of managing infrastructure, there’s the kind of stuff that you don’t want to do every day. Like for example, you have to restart it, you have to troubleshoot it, you have to monitor it. You have “trial and error”, touch this config, touch that config… And not every company can hire or afford to hire people who have lots of experience. In that case, they just have to get someone who kind of knows the tools well enough maybe to use Kubernetes and not manage it. I think that’s where a good cloud provider or just buying Red Hat comes in. Just buy something and then just use it and then just call the 1-800 number or bring in a consultancy. And I think day-to-day is going to be like: you need to watch it just to make sure that if it breaks we have a heads up or just migrate things to the backup. I think you’ll always need that and you’re right, things are changing.
Even people with experience are finding themselves like fish out of water. If you have been managing VMware for five years and now Kubernetes shows up, it feels like you’re starting from scratch. So now you’re like “Oh, what’s a distributed system? I don’t know anything about containers. I don’t know anything about these things.” I think we’ll always have a need for junior DevOps engineers. But they have to understand what they don’t know. They still need to learn. They still need to know how the process works. They still need to learn about security. And I think what happens is that sometimes people early in their career believe that they don’t need to learn anything else.
It’s like, I’m just going to be a Kubernetes engineer and everything’s going to be fine. But then there’s a problem with Linux or there’s a problem with networking and they’re just like “Oh my God”. They keep searching for Kubernetes solutions when really it’s the Linux solution. And so we just have to do our job to educate them. So when they come in we will say, “Hey, Kubernetes is a tool on top of Linux machines, but you still need to learn Linux. Maybe you don’t have to learn Linux immediately to get some value, but just know that you’re going to have to eventually go to that level if you really want to be strong and be able to troubleshoot complex issues.”
Ivan: When you see a team of enthusiasts in a typical large enterprise organization that wants to change something and introduce DevOps culture… How would you suggest they start? What have you seen so far as the low-hanging fruit?
Kelsey: To be honest, if I think back to my time working at one of these companies trying to do digital transformation, we had lots of mainframes, IBM MQ series, Solaris, DB2… and of course, it was easy to come in and say we can do something better, but the why is normally missing. Like why do we even want to do better?
So I did one thing, and at the time, instead of Kubernetes, it was Puppet. This is when configuration management was really new. And the first thing I did was just ask “What are we missing?” For example, at that time we just couldn’t really deploy very fast because we just didn’t have a good framework for writing automation. We just had lots of scripts. So when I brought in Puppet, instead of saying “Hey, look at Puppet and we’re doing something new”, it was more like “Hey, I can now set up an environment in five minutes instead of three hours”. And you just show that. And then people say, “Hey, how?” And then you mention Puppet.
So I think in the Kubernetes case, one thing I would do is… most companies don’t really have a good catalog of their software. For example, most companies that are kind of big will say things like “We have 10,000 applications”. I’ll say “Great, where’s the list of those applications and how do you deploy them?” And that’s where everything gets fuzzy. No one knows. It’s like “this team does it this way, that team does it that way…” And now we want to do CI/CD. Well, the first step is a catalog of all the binaries. Then I can tell you which ones can actually be containerized – if they’re on Linux, probably the majority. And once I have that catalog, then we start to talk about artifacts like RPMs and container images and then based on the artifact, I will say “Hey if we have a container, I can now guarantee that I can give a consistent deployment model even without CI/CD. With one command I can reliably deploy the software and if a machine crashes I can move to another machine”. And for most companies doing digital transformation, that’s a big step up from where they are. This is why I always start with that. I don’t really start with the whole auto-scaling and density because that’s really not the core problem they have. For them, it’s just more consistency and being able to repeatably deploy something.
Kelsey Hightower. Source: HashiCorp (https://www.hashicorp.com/resources/hashinetes-combining-kubernetes-hashicorp-kelsey-hightower)
Ivan: Our experience with large enterprises and financial institutions is that they are reluctant to move to the public cloud. And the usual suspect is security, they don’t think that public cloud is secure enough. What is your take on this thinking? Should these enterprises really be worried or are these just false arguments?
Kelsey: They should be worried, I think. Because if you think about it, you don’t know what you don’t know, so I can understand why you’re going to be cautious. Because if you’re the person responsible for the security and you can’t say it’s secure, then you’re going to be very hesitant and also a little sceptical of something new. But even in the financial services world, they use a lot of public infrastructure. They use the public internet today, they have to deal with all these government regulations and systems. Those are also public, so they have some experience using shared infrastructure already today. When I run into these situations, I try to get as many people across the organization, including InfoSec and people in Ops, in one room, and then I get the whiteboard and say “Why are you not comfortable with the security?”
Then they ask some very good questions. Sometimes what I find is that they will say “You know what? On premise, we’ve been hiding all of our security problems behind three firewalls. We have a lot of obscurity and it’s really, really hard to get into those environments”. Now here’s the thing, if you get into those environments, it’s game over. We know this. In the cloud, we’re asking them to remove two of those three walls and now they’re nervous and say, “You know what? We know none of our apps have encryption.” Most of the apps don’t even have authentication. If you hit the port, you’re into the app. So that’s why they’re really hesitant. That’s really the thing that they don’t want to say out loud in front of teammates.
That’s where I start recommending solutions. Like putting Envoy on the VM. So if the app doesn’t have security, I can still put mutual TLS authentication in front of the app. I can still put metrics in front of the app in a way that I don’t have to wait for the developer to do the right thing. And that’s starting to make the InfoSec team more comfortable. There is a firewall close to the app and you can reuse some of your existing stuff. So I’m just very patient with them and just really try to show them every step of the way. But usually, when they get the confidence, we kind of see the worry go away over time and they say “Okay, now I understand what’s going on here.”
Ivan: What future do you foresee for Go as a language? Should DevOps engineers invest in getting more familiar with it? What are your experiences?
Kelsey: I saw a list the other day: Kubernetes is written in Go, every tool from HashiCorp, Vault, Consul… Almost every tool you see in our world is written in Go. So that lets you know that this is the language that you’ll need to extend it or troubleshoot it. Very similar to when everything was written in C. You have to learn it if you really want to be able to contribute to that ecosystem.
Python, for example, is a very advanced language in terms of mapping functions and all of these fancy things. But for Ops and DevOps kind of work, we just need something to read the file, transform from XML to JSON, call some API and be done with it. Go makes it really simple to build these tools. So I think you’re gonna see it just more and more. The IDE support is also getting great now.
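To make the "transform from XML to JSON" step concrete, here is a minimal Go sketch using only the standard library. The `Service` type, its fields and the sample document are hypothetical and chosen purely for illustration; reading the input from a file and calling an API afterwards are left out to keep the sketch short.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"os"
)

// Service is a hypothetical record; the field names are illustrative only.
type Service struct {
	Name string `xml:"name" json:"name"`
	Port int    `xml:"port" json:"port"`
}

// xmlToJSON decodes one XML element into a Service and re-encodes it as JSON.
func xmlToJSON(in []byte) ([]byte, error) {
	var s Service
	if err := xml.Unmarshal(in, &s); err != nil {
		return nil, fmt.Errorf("decode XML: %w", err)
	}
	return json.Marshal(s)
}

func main() {
	// Inline sample input; a real tool would read this from a file.
	in := []byte(`<service><name>billing</name><port>8080</port></service>`)
	out, err := xmlToJSON(in)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(out)) // prints {"name":"billing","port":8080}
}
```

The same struct carries both the `xml` and `json` tags, so the conversion needs no mapping code of its own.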
Ivan: I’m sure that a lot of people and clients talk to you about cloud-native architecture. In your experience, what are the biggest benefits that organizations (especially large ones) get from this approach?
Kelsey: I always tell the customer, number one is to decouple yourself from the machine. And they say “What does that mean?” Right now some customers believe that they need to have Red Hat 6 and some specific JVM and if they don’t have that, then they can’t do anything. And this is how they get locked into their environment. We try to decouple it. One way of decoupling is with the container image. We just put Red Hat 6 in the container image, we put the JVM, we put the app, and now we have the container image that can run on multiple Linux machines. So the first piece of cloud native is that we want to decouple from the machine so we can get the movement, that agility of moving. I didn’t say rewrite the app. I didn’t say do something wholly different.
I don’t say go from JBoss to Spring Boot. That doesn’t really matter. I’m just saying you have to get a format that allows you to move to the cloud. Now Dev and Ops people can go from on-prem to the cloud without taking the whole machine image with them and all the tools. So that’s step one.
The next one is that I need some help from the app. If a database goes down, I need you to do basic retry, don’t just crash and then make everybody run around. And don’t make me start the app in a certain order. So a part of being cloud native is knowing that we’re going to be in a network environment and we know that things may go down. So just very simple things like if the configuration file is not there, just log that config file is missing and keep trying and that’s going to feel really cloud native. That also means I can deploy the app and the order doesn’t matter. It becomes so much simpler. Just that little thing around network retries and retrying loading the file.
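The "log it and keep trying" behaviour described above could look roughly like the following Go sketch. The `retry` helper, the `errNotReady` error and the simulated dependency in `main` are all assumptions made for this example, not part of any library. The attempt limit exists only so the demo terminates; a long-running service might keep retrying indefinitely.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// errNotReady stands in for "database unreachable" or "config file missing".
var errNotReady = errors.New("dependency not ready")

// retry runs op until it succeeds or the attempts run out, logging every
// failure instead of crashing the process.
func retry(name string, attempts int, wait time.Duration, op func() error) error {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		log.Printf("%s failed (attempt %d/%d): %v", name, i, attempts, err)
		if i < attempts {
			time.Sleep(wait)
		}
	}
	return fmt.Errorf("%s: giving up after %d attempts: %w", name, attempts, err)
}

func main() {
	calls := 0
	// Simulated dependency that only becomes available on the third try.
	err := retry("connect to database", 5, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("connected after %d attempts", calls)
}
```

Because the application waits for its dependencies rather than exiting, the order in which the app and the database are started stops mattering.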
The next thing you can do is health checks. Wouldn’t it be nice if every app had just /health? If you hit it, it goes back and does all of its own checks: check the database, check this, check that and if there’s something wrong gives a JSON response “Hey, this is the issue, I’m trying to connect to this database at this IP. It isn’t working and I’m going to keep trying. If you fix it, I’ll keep going.” That’s a real cloud native thing to do. Most enterprises don’t do that.
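A `/health` endpoint of the kind described here might be sketched in Go as follows. The `check` type, `runChecks`, `healthHandler`, the probe functions and the address in the error message are hypothetical names and values invented for this example. To keep the sketch self-terminating, `main` calls the handler in-process through `net/http/httptest`; a real service would register it with `http.Handle("/health", ...)` and call `http.ListenAndServe`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errDBDown is a placeholder failure; 192.0.2.10 is a documentation-only address.
var errDBDown = errors.New("cannot connect to database at 192.0.2.10:5432, still retrying")

// check is one named dependency probe; a nil error means healthy.
type check struct {
	name  string
	probe func() error
}

// runChecks executes every probe and returns the overall status plus a
// per-dependency message ("ok" or the error text).
func runChecks(checks []check) (bool, map[string]string) {
	healthy := true
	results := make(map[string]string, len(checks))
	for _, c := range checks {
		if err := c.probe(); err != nil {
			healthy = false
			results[c.name] = err.Error()
			continue
		}
		results[c.name] = "ok"
	}
	return healthy, results
}

// healthHandler reports the check results as JSON, with 503 if any check fails.
func healthHandler(checks []check) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		healthy, results := runChecks(checks)
		w.Header().Set("Content-Type", "application/json")
		if !healthy {
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		json.NewEncoder(w).Encode(results)
	}
}

func main() {
	checks := []check{
		{name: "config", probe: func() error { return nil }},
		{name: "database", probe: func() error { return errDBDown }},
	}
	// In-process request so the example exits; see the note above.
	rec := httptest.NewRecorder()
	healthHandler(checks)(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	fmt.Println(rec.Code)
	fmt.Print(rec.Body.String())
}
```

Returning a non-200 status alongside the JSON body lets load balancers and orchestrators act on the result while the body tells a human which dependency is failing.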
Ivan: Can you share with us what you are reading these days and who the people you follow are that make you think differently and do better?
Kelsey: If you look on my Twitter profile, I’m only following 36 people. And it’s just because they’re all from various backgrounds. I’m also reading about things that have nothing to do with tech, like biographies by Jimmy Carter, I’m reading about a sportswriter and how they got into the sports world, I’m reading some science fiction. I’m reading about how metal has progressed with humans going from iron to steel. Because if you look at all of those things, they represent the same challenges we have in tech. But they approached the problem differently and in many cases way different than we think because those industries are way more mature. And it gives me a lot of ideas on how to approach what we’re doing with a little bit more maturity and ideas that have worked at other places. I’m reading stuff that’s not immediately within my domain, just so I can broaden a little bit and get more perspectives.