The need for advanced systems engineers
Recently, we have been thinking about the changes that have taken place in the technology industry from the very start of our careers up until today. There is no denying that we have come a long way. We have noticed significant change in two very different, yet overlapping, spheres: technology and methodology.
The systems many of us worked on when we first started out were the first generation of client-server applications. These systems were without a doubt very different to the prior generation: terminals connecting to centralised apps running on mainframe or midrange systems. This is where things started to change. Suddenly, engineers started to understand the logic of their application client as well as the server powering it. This meant there were new issues to consider in order to manage these systems effectively, including connectivity, the transmission of data, security, latency and performance, and the synchronisation of state between the client and the server.
This increase in sophistication spawned commensurate changes to the complexity of the methodologies and skills required to manage those systems. New types of systems meant learning new skills and understanding new tools, frameworks, and programming languages. We can trace back to this moment the spawning of numerous new specialisations that had previously been more concentrated in single roles: front-end engineers, back-end engineers, data scientists, designers, UX/UI specialists, and myriad other specialities. We can perhaps also trace back to this period the construction of more siloed functions and the increased complexity in transitions between those silos: the same silos that the DevOps and SRE communities are attempting to dismantle today.
Since the first generation of client-server systems, we’ve seen significant evolution. Much of it has been driven by the emergence of technology as mission critical to doing business, for any business in every industry. This has been coupled with customer demand for fast, immediate functionality available on every device, delivered seamlessly across different geographies and fabrics. Take, for example, the evolution from renting videos at the corner video store to streaming on Netflix, Hulu, and their peers. Our expectation of latency for the delivery of content has dropped from hours or minutes to seconds. We expect that content to be available to us 24x7x365 on every device we own and in every location: from our homes and offices to being on the move. We, as customers, also don’t care about the infrastructure or the complexity of the systems required to deliver this: we just want to binge watch the new season of Making a Murderer.
Each iteration of this evolution has required the technology, the systems, and the skills we need to build and manage that technology to change. In almost every case, those changes have introduced more complexity. The skills and knowledge we once needed to manage our client-server systems are vastly different from those needed for modern distributed systems, with their requirements for resilience, low latency, and high availability. So, what do we need to know now that we didn’t before?
Building for tomorrow
As practitioners, we’ve had to build better. With availability and resilience being prime concerns, an application’s minimum viable product has had to be redefined. Good design goals now have to include a baseline architecture for operability, security, performance, and observability. Every engineer, from a front-end engineer working on a React component to a back-end engineer building a distributed data store, needs to consider how their piece of the system will impact the overall system.
This is especially true because the performance demands of our users have created new constraints in the computational models and state management strategies available to our systems. Computational models are turning to serverless and edge computing architectures to reduce latency for users. The new lesson we’ve learned: it’s always more efficient to perform computations as close to the end user as possible.
This is also true for state management. Applications are being deployed from inception with distributed state, shared storage, and possibly even the migration of data (or some segment of data) from centralised stores to the edge and the cloud. Being closer to the end user enables faster decisions, but at the expense of greatly increasing the complexity of our applications.
Both of these constraints mean engineers need to understand how their part of the stack pairs with the other pieces and what implications a seemingly small change might have for the overall system. And when this can’t be modelled mentally, due to complexity or lack of insight into the systems, then it has to be modelled programmatically via observability, instrumentation, tracing, and tests.
We can no longer rely only on simplistic probing to identify failures or to provide sufficient information to debug faults. Applications with complex architectures and distributed state that look fully functional to probes may not be performing optimally or accurately for end users. Even when looking at metrics and events, which in turn require correlation and levelling across disparate systems, we struggle to gain a full picture, as traditional approaches and even calculations of latency are less accurate for distributed systems.
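To make the point about probes and latency calculations concrete, here is a small sketch with made-up numbers. The latency samples, the `probe_ok` helper, and its threshold are assumptions chosen for illustration, not measurements from any real system. It shows how a single-request probe and an average can both look healthy while a tail percentile reveals that some users are having a very different experience.

```python
import statistics

# Hypothetical request latencies in milliseconds: 95 fast requests, 5 very slow ones.
latencies_ms = [20] * 95 + [2000] * 5

def probe_ok(samples, threshold_ms=500):
    """A simplistic probe: it judges health from a single request (the first)."""
    return samples[0] < threshold_ms

def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print("probe healthy:", probe_ok(latencies_ms))
print("mean:", mean_ms, "ms; p50:", p50_ms, "ms; p99:", p99_ms, "ms")
```

With these sample values the probe passes and the mean is 119 ms, yet the 99th percentile is 2000 ms: one request in twenty is a hundred times slower than the typical one, and neither the probe nor the average reveals it.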
The instrumentation of your applications is now a mandatory step in the development process and no longer an afterthought. Every engineer needs to consider how to articulate the state, performance, and observability of their aspects of the system. This requires engineers to develop the skills and adopt the techniques to ship these new capabilities.
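One way to treat instrumentation as part of development rather than an afterthought is to wrap application functions so that they report on themselves. The sketch below uses only the Python standard library. The `instrumented` decorator, the `METRICS` store, and the `render_profile` function are hypothetical names for this example, and a real service would export these values to a metrics backend rather than keep them in memory.

```python
import functools
import time
from collections import defaultdict

# In-memory metrics store, keyed by metric name (illustrative only).
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})

def instrumented(name):
    """Record call count, error count, and cumulative duration for a function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                METRICS[name]["errors"] += 1
                raise
            finally:
                # Runs on success and on failure, so every call is counted.
                METRICS[name]["calls"] += 1
                METRICS[name]["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator

@instrumented("render_profile")
def render_profile(user_id):
    # Hypothetical piece of application logic.
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"profile:{user_id}"

render_profile(7)
try:
    render_profile(-1)
except ValueError:
    pass
print(dict(METRICS["render_profile"]))
```

Because the decorator is applied where the function is defined, the engineer who writes the code also decides what it reports, which is the shift in responsibility described above.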
A new tech ecosystem
New frameworks, architectures, processes, and a thriving ecosystem of tools have emerged to help us meet those challenges. Some of these are in an embryonic state, but rapid adoption is driving quick maturity. We’ve seen this evolution in compute: it’s only been four years since containers became a mainstream technology, and we are now working with complex application-level abstractions enabled by tools like Kubernetes. A similar evolution is occurring with deployment, serverless, edge-computing technology, security, performance, and system observability.
No change can exist in a human and organisational vacuum. To successfully create truly cross-functional teams and enable the rapid iteration required to build increasingly advanced systems, it is vital that we focus on developing the necessary leadership skills. By continuing the work of our DevOps and SRE communities, we can prevent stagnation in the industry and break down barriers whilst streamlining transitions between teams. Ultimately, teams structured around swiftly delivering high-quality, secure, and performant applications create highly innovative products and organisations. If we listen to the successes of those around us, we can, in turn, continue to navigate the future complexities that businesses face each day.