“No one thinks about what comes after the deployment”

Hype topics such as tools or the developer side of things are often a little more in the spotlight of the DevOps movement. In our interview at DevOpsCon 2019 in Berlin, we talked to Damon Edwards about the Ops perspective. He explained which modern practices in this part of DevOps could revolutionize the daily work of operators and what Site Reliability Engineering (SRE) actually is.

Besides Site Reliability Engineering (SRE) and modern working techniques for operators, the topic Serverless – of course – was also on the table again. We talked to our speaker about whether and how Serverless will change work in the DevOps context, and how forward-thinking developers and Ops people can already adjust to it.

JAXenter: Hello Damon. When it comes to DevOps, the “Ops” part is usually not the element of focus. We often focus on the cultural aspect of it. Why do you think this is?

Damon Edwards: I think a lot of people think that Ops is in the focus, but I think they mistake deployment for operations. So the part that’s not in the focus is what happens after deployment. No one thinks about what comes after deployment. We’ve gotten fixated on this idea of deploy, deploy, deploy. The original example was Flickr, which deployed up to ten times a day. Other organizations would think, that’s crazy, that’s incredible, how can we deploy ten times a day?

When the original conversation started, it was about the relationship between development and operations. The flashpoint, where things go wrong, is at that time and point where development and operations connect. So a lot of focus was put on deployment and I think since then, what’s happened is a lot of the DevOps conversations have been about Dev towards Ops. How do we build, test, and deploy application code as quickly as possible?

But if you stand back and look at the end-to-end lifecycle, that’s just one piece of what has to go on. There’s this whole other ‘what happens after deployment?’ topic, along with all the other operations concerns that have to be taken care of. That tends not to be discussed as often in the DevOps conversation.

JAXenter: There are many practices in the Ops sector that are not very up-to-date. Can you expand upon that a little bit?

Damon Edwards: I think operations and their individual skills and applications are actually quite up to date. In terms of automation, in terms of the platforms and the technologies, there’s a lot of great momentum on the skill side.

But what’s kind of out of sync with the rest of the IT lifecycle are ideas like Agile, flow, fast feedback, and working in small batches. Those ideas have been sinking into the development side of the house for almost twenty years. Whether people have been doing Agile practices or not, the thinking has been there, the books have been there, the tools and terminology are all there. These ideas of fast feedback, flow, small batch sizes, and product-aligned teams have a long history there.

Whereas, on the Ops side of the house, the ways of working are really rooted in the classic kind of ITIL functional silos, command and control way of working that’s been around since the 90s or early 2000s. So it’s not just a matter of the individuals needing to change their skills or they don’t know how to do things. It’s more about how we approach and structure the work of operations that needs to catch up with what’s going on in the Dev side of the house.

There are good reasons why it’s not just a one-to-one transfer. There are other considerations that operations have that development doesn’t have. So, it’s not just a forceful way of Devs taking over Ops. It’s a matter of giving operations space so they can ingest a lot of these lean and Agile ideas in their own way. Then we get that true harmony between Devs and Ops.

JAXenter: Are there any other operational techniques that could revolutionize the way Ops works?

Damon Edwards: I’m not sure if you could call it a technique, it’s more of a design pattern. But a problem in operations is these extreme functional silos. So it’s like, we have the Linux server team, the Windows server team, the storage team, the DBA team, the firewall team, the DNS team. Everyone’s in these very functional groups, but work needs to flow horizontally across those different teams. And because we have all these different specialists and special kinds of know-how, in some cases we have access issues. If there’s customer data in an environment, maybe only one team can access the environment, yet all the work needs to go on there.

So what happens is we end up with ticket queues that drive all these interruptions and waiting. You are either constantly being interrupted by somebody from a different functional group trying to get you to do something or when you have time to get back to your work, you’re waiting in a ticket queue for somebody else to get something for you. A lot of time gets eaten up by the interruptions, the waiting, all that coordination overhead that goes into that.

One of the things that we noticed, and founded the Rundeck company upon, is this idea: how do you replace all of those interruptions and waiting with self-service? How can you take all the knowledge that is in those functional teams’ heads and help them turn it into standard operating procedures that they can then safely delegate to other people? Instead of being constantly interrupted for these repetitive things, they can hand them off as self-service. Likewise, the teams that need something from them have a self-service way to get that operations task done instead of waiting.

What that leads to is the ability to distribute operations and to take operations actions to where they’re needed most in the organization. You can make the workflow better throughout the organization.

JAXenter: Can you explain what SRE is and what role it plays in operations?

Damon Edwards: SRE (Site Reliability Engineering) is a rethinking of how operations work gets done and what the role of operations is. The term was coined at Google, but it’s really something that’s been going on at a lot of web-scale companies. The fundamental idea is: what if we applied software discipline and software development thinking to how we run operations? On the surface, people get excited about how it’s like taking the mindset and skills of software engineers and injecting that into operations.

But it really has some fundamentally different points of view that come from the fact that these companies don’t exist to write software. They exist to run software. In SRE, there are some key principles, such as this idea that we don’t want to have our operations team constantly buried under what they call “toil”. Otherwise they are constantly in this churning mode, doing a lot of repetitive work. In this SRE model, we should limit the amount of that repetitive, automatable work they do. Instead, we should make sure that they have at least fifty percent of their time available to do engineering work, to do things that will move the organization forward.
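As a rough illustration of the fifty-percent idea (a sketch that is not part of the interview; the names `WeekLog`, `toil_fraction`, `exceeds_toil_cap`, and `TOIL_CAP` are hypothetical), a team could track how its hours split between toil and engineering work and flag weeks where toil takes more than half:

```python
from dataclasses import dataclass

# Assumption: the cap mirrors the "at least fifty percent engineering time"
# rule of thumb mentioned above; a real team may choose a different value.
TOIL_CAP = 0.5


@dataclass
class WeekLog:
    """Hours one team spent in a week (hypothetical structure)."""
    toil_hours: float         # repetitive, automatable operational work
    engineering_hours: float  # work that moves the organization forward


def toil_fraction(log: WeekLog) -> float:
    """Return the share of logged time that went to toil."""
    total = log.toil_hours + log.engineering_hours
    if total == 0:
        return 0.0  # nothing logged, so nothing to flag
    return log.toil_hours / total


def exceeds_toil_cap(log: WeekLog, cap: float = TOIL_CAP) -> bool:
    """True when toil takes up more of the week than the cap allows."""
    return toil_fraction(log) > cap


if __name__ == "__main__":
    week = WeekLog(toil_hours=30, engineering_hours=10)
    print(toil_fraction(week), exceeds_toil_cap(week))
```

A week flagged this way would be a prompt to automate or hand off some of the repetitive work, not a precise measurement.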

They also talk a lot about shared responsibility. In the classic world, the idea of an SLA was that operations agreed to have a penalty on them if the service fell below a certain level. If you look at the SLO (service level objective), the SRE version of the same idea, it’s about a shared responsibility model. In that model, if we drop below the SLO, the development side, the business side, and operations all have to basically stop what they’re doing and try to figure out how to get the service back above that SLO.
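The shared-responsibility reading of an SLO can be sketched in a few lines (an illustrative example, not something from the interview; the function names, the request counts, and the 99.9% target are all assumptions). The point is that a miss pulls in every group, not only operations:

```python
# Assumption: availability is measured as the share of successful requests.
# Real SLOs may use latency, error rates, or other indicators instead.
SHARED_OWNERS = ["development", "business", "operations"]


def availability(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded in the measurement window."""
    if total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return good_requests / total_requests


def slo_met(good_requests: int, total_requests: int, slo_target: float) -> bool:
    """True when measured availability is at or above the SLO target."""
    return availability(good_requests, total_requests) >= slo_target


def who_responds(good_requests: int, total_requests: int, slo_target: float) -> list[str]:
    """Return the groups that must stop and help when the SLO is missed."""
    if slo_met(good_requests, total_requests, slo_target):
        return []
    return list(SHARED_OWNERS)


if __name__ == "__main__":
    # Hypothetical window: 9,980 of 10,000 requests succeeded, target 99.9%.
    print(who_responds(9_980, 10_000, slo_target=0.999))
```

Under a classic SLA, the equivalent of `who_responds` would return only operations; returning all three groups is what makes it a shared model.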

So fundamentally it’s a kind of modern rethinking about what operations work is, what kind of people and skills we want to apply to that, and what are the different thought processes and design patterns.

JAXenter: Serverless is on the rise and it will change how operations work. What impact will it have exactly?

Damon Edwards: I think it’s the same kind of impact as containerization, virtualization, and the cloud. It’s another architectural design pattern that we can use. There are some economic impacts as well. If everything is a function, we can easily track the cost of things. I think it will have some far-reaching implications.

What it won’t get rid of is the notion of operations. You can talk to people who are currently going down the full serverless path. One of the greatest examples is Patrick Debois, who coined the term DevOps. The word DevOps is around because of him. He has a startup that is all based on serverless and Lambda, those types of technologies, and it is all in the cloud. When you read his Twitter feed, it’s fascinating, because it’s all operations questions and operations work. It’s just in a different context. But if you look at what he is talking about, asking what his system is, what the limits are, the weird behaviors that happen, how do my systems break, how do I respond, this is the kind of first-responder mindset that comes into operations. He’s doing all of that except it’s all in this serverless world.

The technology is changing to distribute who does what operational tasks at what time. It’s being redistributed and rethought. But the fundamental domain and discipline of operations is just as relevant today, tomorrow in the serverless world, as it was yesterday in the Java web app on VM world, classic mainframe world. Operations as a discipline is there, as a skillset it’s there, it’s just being redistributed and the infrastructure and tooling looks different.

The post “No one thinks about what comes after the deployment” appeared first on JAXenter.
