Canary deployments for IT operations
Canary deployments are a commonly used DevOps practice for staggered rollouts: updates go to small groups first so that issues can be caught and fixed before they reach everyone.
IT operations teams have been doing their jobs using traditional practices, such as ITIL, for many years. Their focus has been on consistency, reliability, and stability, using precise and standardized approaches for ensuring performance and service availability. On the other hand, DevOps is defined by speed and flexibility. Yet modern IT organizations need both. That’s why there’s been a movement in recent years to merge DevOps principles into IT operations, given the more dynamic, unpredictable nature of IT infrastructure today.
Canary deployments, a popular DevOps practice for staggered rollouts, are a prime example of how DevOps can positively influence enterprise IT operations. This deployment method sends updates to small groups incrementally, with the goal of catching and fixing issues quickly rather than deploying to the entire population of users at once. In a mass deployment, a significant bug costs far more money and pain: the problem surfaces across the whole user base, and the fix then has to be redeployed to everyone. Canary deployments also allow for continual improvement, since each stage is a chance to fine-tune the release for a better user experience or outcome.
Consider the “canary in the coal mine” analogy. It comes from a real practice in the mining industry, where canaries were ruthlessly sent into mines to test for poisonous gases and protect the humans. In the same way, canary deployments aim to confine the impact of software defects to a small audience. Thankfully, mining operations no longer sacrifice innocent birds to maintain worker safety, but in software and IT, canaries play a valuable role in reducing the risk of the rapid release and change cycle. And nobody dies.
Canary deployments have been around for several years, and while common in DevOps and SaaS organizations, they are still rare in enterprise IT environments. When done correctly, they can result in faster, cleaner, and more successful changes. In principle, this practice could be useful in many IT operations change events, but I think the two below will provide the most immediate, measurable benefit.
Canary use cases in IT Ops
- Patching desktops, servers, and operating systems is a routine yet important IT operations task. If something goes wrong during the update and, heaven forbid, takes down the network or results in terrible response times for the entire company for a day, it’s going to make a lot of people unproductive and unhappy. Instead, you can create logical segments for an incremental canary rollout: this could be by device type, cluster, geography or data center, business unit, or even by customer, such as at an MSP. Depending on the type of release you’re doing, you could stagger each rollout by an hour or even a day, monitoring the new environment for any issues. Once a batch is clear, you move on to the next one; if it isn’t, you fix the issue first.
- Nightly backups are another area where canary deployments can be useful. Let’s say you want to back up the virtual disks (VMDK files) managed by two vCenters to a cloud provider like AWS. In a large environment where you’ve got 100 or more VMs, having to stop the backup midstream, deal with any issues such as corrupt files, and then repeat the entire backup process is time-consuming. By segmenting that workload into four groups of 25 VMs and then using various tools (including syslog monitoring and cloud service monitoring), you’ll know quickly if there have been incomplete transfers or any other problems before starting the other backup segments.
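Both use cases follow the same pattern: split the targets into groups, change one group, check it, and only then continue. Here is a minimal Python sketch of that loop, not a production tool. The function names (`make_batches`, `canary_rollout`), the VM names, and the simulated failure on `vm-030` are all made up for illustration; in practice `apply_change` and `is_healthy` would call your own patching, backup, and monitoring tools, and `wait` would pause for the hour or day you want between batches.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def make_batches(targets: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split targets into consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(targets[i:i + batch_size])
            for i in range(0, len(targets), batch_size)]


def canary_rollout(
    batches: Sequence[Sequence[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    wait: Callable[[], None] = lambda: None,
) -> Tuple[int, Optional[int]]:
    """Apply a change batch by batch, halting at the first unhealthy batch.

    Returns (number of batches that passed, index of the failed batch or None).
    """
    for index, batch in enumerate(batches):
        for target in batch:
            apply_change(target)
        wait()  # hypothetical soak period between batches (no-op by default)
        if not all(is_healthy(target) for target in batch):
            return index, index  # stop here; later batches are never touched
    return len(batches), None


# Illustrative run: 100 hypothetical VMs in four groups of 25.
vms = [f"vm-{n:03d}" for n in range(1, 101)]
batches = make_batches(vms, 25)

touched: List[str] = []  # stands in for "patched" or "backed up"
passed, failed_at = canary_rollout(
    batches,
    apply_change=touched.append,
    is_healthy=lambda vm: vm != "vm-030",  # simulated failure, for illustration
)
print(f"{passed} batch(es) passed, failed batch index: {failed_at}, "
      f"{len(touched)} VMs touched")
```

In this simulated run, the problem shows up in the second group, so the rollout halts there and the remaining 50 VMs are left alone until the issue is fixed.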
A few considerations for running canary deployments
While canary deployments look straightforward from the outside, there are a few things to keep in mind when running them:
Put in the time at the beginning: Most companies have a patching tool or service like Ansible to automate change events. But many of these IT automation tools don’t natively support segmentation and scheduling. So while you can use your existing toolsets, you’ll likely need to adapt them to canary rollouts. Initially, configuration will be time-consuming, but will get easier with more practice.
Consider the tradeoffs: With staggered deployments, not everybody benefits from the new release at once. If your team must fix bugs before releasing to other groups, that can create tension in the business over eagerly anticipated updates. For a mission-critical end-user service, such as an update to the sales management software during the end-of-quarter reporting period, it may not be wise to do a canary release. Your people may need that update ASAP to meet their deadlines.
Re-orient toward user experience: As with many changes in IT, mindsets must also shift. The age-old “set and forget” IT management practices are becoming less relevant in today’s distributed, always-changing, hybrid cloud environment. With a DevOps orientation focused on delivering an unforgettable end-user experience, IT operations teams can adopt these new practices swiftly. IT Ops will need to give up some of their tried-and-true practices to experiment with new, more agile ways of completing tasks.
Ultimately, experimenting with DevOps practices such as canary deployments can help IT (and IT operations) bridge the gap with the business and deliver more value, faster.