HumanOps: It’s time to make DevOps personal
In this article, David Mytton, founder and CEO of Server Density, explains why we need a return to empathetic operations.
Over the years, as our infrastructure has moved to the cloud, it has become more abstracted and self-correcting. During this time it has been far too easy to forget that when something goes seriously wrong, there is still a human at the other end of the system who has to wake up and fix it.
The tech world’s mantra of ‘move fast and break things’ has spread to many other industries and sectors that rely on IT infrastructure for their products and services. However, this way of working risks breaking their most important asset: the health and morale of their employees.
With scale comes complexity, and with complexity comes unpredictability. Today’s on-call IT worker simply doesn’t know when he or she will next be woken up at four in the morning to fix an incident, and this is a problem. Errors and delays cost millions, so it’s understandable that businesses need critical failures fixed as soon as possible. How, then, can we move forward and create a more appealing, healthy environment for engineers?
Extending DevOps with empathy
Expectations of rapid patch releases and continuous delivery have placed strain on enterprises seeking to improve their products. Understandably, many companies are turning to DevOps practices to ease this burden, restructuring their teams so that development and operations work together in a more efficient and agile way.
However, this methodology can be extended further. While IT managers are by now used to looking at how teams work together, the industry as a whole is far less adept at considering the individual, their relationship to the job, and the damage an ‘always on’ culture can do to IT workers.
If you’re woken up at 3 am to work on an urgent fix, you’re not going to be very productive at work the next day. This may sound obvious, but a surprising number of businesses fail to take it into account, which is especially concerning given that some research suggests interrupted sleep may be more harmful than simply getting less sleep. Sysadmins and on-call workers are expected to be available at any hour of the day or night, but they are not ‘superheroes’ or ‘ninjas’. This expectation, and the sustained stress it creates, can be very detrimental to employee health.
Workers with low job satisfaction are likely to leave sooner, and according to some estimates, replacing an employee can cost up to 21% of that employee’s annual salary. Paying attention to how employees are coping from a health and wellbeing perspective can therefore have a serious impact on a business’s bottom line. GitLab’s recent data loss incident was also a high-profile example of the catastrophic consequences a lapse in concentration can have.
I know this from experience: for the first few years after I started my own company, I was on call. Our team was just a couple of people, so I was frequently called away from evenings out with my friends. This prompted me to look at how we as a company could make life on call easier when dealing with IT alerts, and also at how the industry could come together around the same goals.
So what’s to be done?
It may seem strange for the founder of an infrastructure monitoring company to be talking about this: after all, server monitoring includes sending alerts and waking people up. But we realised that we have a responsibility to advocate more sustainable working practices for IT teams, and we call these practices HumanOps.
There are several core HumanOps principles, but the most important one to remember is that human health impacts business health. As Richard Branson likes to say, “If you look after your people, your customers and bottom line will be rewarded too”. If businesses prioritise their employees’ wellbeing, the result will be better system performance, higher staff retention and greater productivity.
These are great ideas in theory, but how do we turn them into practical policies? Developers can start by working with their managers to re-evaluate how issues are escalated, who they are escalated to, and who has oversight of on-call workload. Build internal protocols to ensure that all alternatives are exhausted before anyone is called in, and ideally make sure that important knowledge does not rest with just one or two people who then end up fixing every problem.
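As an illustration of the kind of escalation protocol described above, here is a minimal sketch, assuming a hypothetical setup where each alert is tagged as critical or not, automated remediation steps can be tried first, and a simple count of pages per person is kept for the night. The names `Alert`, `escalate`, `auto_fixes`, `rota`, `pages_tonight` and `max_pages` are all invented for this example and do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Alert:
    name: str
    critical: bool  # hypothetical flag: must a human act on this right now?


def escalate(
    alert: Alert,
    auto_fixes: List[Callable[[Alert], bool]],
    rota: List[str],
    pages_tonight: Dict[str, int],
    out_of_hours: bool,
    max_pages: int = 1,
) -> Optional[str]:
    """Return who to page for `alert`, or None if nobody needs waking."""
    # 1. Exhaust automated remediation first; each fix returns True if it resolved the alert.
    for fix in auto_fixes:
        if fix(alert):
            return None
    # 2. Non-critical alerts raised out of hours can wait for the working day.
    if out_of_hours and not alert.critical:
        return None
    # 3. Prefer someone in the rota who has not already hit the page limit tonight.
    for person in rota:
        if pages_tonight.get(person, 0) < max_pages:
            return person
    # 4. Everyone has already been paged: fall back to the first person in the rota, if any.
    return rota[0] if rota else None
```

For example, a critical database alert that no automated fix resolves would skip someone already woken once tonight and go to the next person in the rota, which spreads the load instead of repeatedly waking the same engineer.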
Implementing policies such as blameless postmortems, and raising general awareness of the impact of sleep loss, can go a long way towards bringing empathy back to on-call IT work. You could even ask your manager to join the on-call rotation and experience the disruption first hand. Many companies take this approach, because there’s nothing like being constantly alerted to raise the priority of a fix!
Software makers have a responsibility to ensure their products work as intended, and this should extend to making sure that staff are equipped to work in a healthy and sustainable way. Metrics such as interruption cost in human hours, time spent on call per employee and the number of alerts triggered out of hours can help managers identify the highest-priority problem areas in their particular company.
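The metrics mentioned above are straightforward to compute once on-call shifts and pages are recorded. The following is a minimal sketch, assuming hypothetical `Shift` and `Page` records and an assumed working day of 09:00 to 18:00, Monday to Friday; none of these names or thresholds come from the article, and a real report would pull the data from whatever scheduling and alerting tools a team already uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

WORK_START, WORK_END = 9, 18  # assumed working hours: 09:00 to 18:00, Monday to Friday


@dataclass
class Shift:
    engineer: str
    start: datetime
    end: datetime


@dataclass
class Page:
    engineer: str
    raised: datetime
    minutes_spent: int  # time the engineer spent handling the alert


def is_out_of_hours(ts: datetime) -> bool:
    # Weekends, or weekday times outside the assumed working day.
    return ts.weekday() >= 5 or not (WORK_START <= ts.hour < WORK_END)


def on_call_report(shifts: List[Shift], pages: List[Page]) -> Dict[str, Dict[str, float]]:
    """Per-engineer on-call hours, interruption hours and out-of-hours alert count."""
    report: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"on_call_hours": 0.0, "interruption_hours": 0.0, "out_of_hours_alerts": 0}
    )
    for s in shifts:
        # Time spent on call per employee.
        report[s.engineer]["on_call_hours"] += (s.end - s.start).total_seconds() / 3600
    for p in pages:
        stats = report[p.engineer]
        # Interruption cost in human hours, counting every page.
        stats["interruption_hours"] += p.minutes_spent / 60
        if is_out_of_hours(p.raised):
            stats["out_of_hours_alerts"] += 1
    return dict(report)
```

For instance, an overnight shift from 18:00 on a Monday to 09:00 on the Tuesday, with a 90-minute incident at 04:00 and a 30-minute one at 10:00, would show 15 hours on call, 2 interruption hours and one out-of-hours alert for that engineer. Comparing these figures across a team makes it easier to see who is carrying a disproportionate share of the load.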
There’s so much energy and enthusiasm around DevOps and what its practices can do for businesses, but it’s important not to focus solely on the technical practices and forget the human aspects of DevOps. It’s about time we recognised that engineers are humans who get stressed and need downtime, and that there are strong business as well as social reasons why these needs should be met. As the old saying goes, the most important assets go home every night, so let them get some sleep.