Is chaos engineering the key to lockdown cybersecurity?
The practice of chaos engineering can help unearth security problems that you would otherwise never discover. By applying chaos engineering best practices, your teams can keep up with how a potential hacker might infiltrate your network, gain a better understanding of the weaknesses in your infrastructure, and decide what security measures to take.
For companies across every industry, technology resources have become a critical asset. Applications, databases, and websites are now key drivers of business, and every time a cybercriminal strikes, operations are put at risk. So it’s no surprise that IT security has become a fundamental focus at the enterprise level.
Installing a suite of cybersecurity tools is a good first step to locking down your infrastructure and software. However, you also need to expect the unexpected given that hackers are always looking for new ways to infiltrate organizations. The practice of chaos engineering can help to uncover vulnerabilities that you didn’t even know existed.
In this article, we’ll dive into the background of chaos engineering and how it can benefit your organization from a security perspective.
History of chaos engineering
The idea behind chaos engineering is to purposely break certain elements of your network or systems to better understand where the weaknesses and dependencies are. Embracing this practice can be a challenge for some organizations, as they prefer to maintain stable operations at all times, no matter what.
Netflix was one of the pioneers of chaos engineering. Around 2010, as the company moved its growing online streaming service onto cloud infrastructure, its engineers began to understand the risk of outages. If the service went down for just a few minutes, customers would get frustrated and possibly cancel their subscriptions.
So Netflix built an internal tool called Chaos Monkey, which would randomly take down a piece of its infrastructure and force engineering teams to react appropriately. Since then, the scope of chaos experiments has expanded, and they now often target scenarios related to hacking and security.
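The core idea of taking down a random piece of infrastructure can be sketched in a few lines of Python. This is a minimal illustration, not Netflix's actual implementation: the function names, the instance names, and the `terminate` callback are all hypothetical, and in a real setup the callback would call whatever API your platform uses to stop a machine.

```python
import random


def pick_victim(instances, rng=None):
    """Choose one instance at random (hypothetical helper, not Chaos Monkey's API)."""
    if not instances:
        raise ValueError("no instances to choose from")
    if rng is None:
        rng = random.Random()
    return rng.choice(instances)


def run_experiment(instances, terminate, rng=None):
    """Pick a random instance, hand it to `terminate`, and report which one it was."""
    victim = pick_victim(instances, rng)
    terminate(victim)  # assumption: caller supplies the real shutdown logic
    return victim


if __name__ == "__main__":
    # Hypothetical staging fleet; printing stands in for a real shutdown call.
    fleet = ["web-1", "web-2", "db-1"]
    run_experiment(fleet, lambda name: print(f"terminating {name}"))
```

Passing the random generator in as a parameter makes a run repeatable, which helps when a team wants to replay the same failure later.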
Assume the hacker mindset
The first step in the chaos engineering process is to think up an experiment that will teach your organization something about its internal technology. This can be difficult for developers and system administrators, as their normal focus is on making things work right and stay stable.
To boost your organization’s security profile, try adopting the mindset of a hacker. Start by reviewing your IT assets and ranking them based on their perceived value to an outsider. Then research the latest types of cyberattacks in use; ransomware is a particularly common one right now.
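One way to make the asset ranking concrete is to give each asset a rough score and sort by it. The sketch below is only an illustration: the `Asset` fields, the 1 to 5 sensitivity scale, and the doubling for internet-facing systems are assumptions chosen for the example, not an established scoring model.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    data_sensitivity: int   # assumed scale: 1 (public) to 5 (highly sensitive)
    internet_exposed: bool  # reachable from outside the network?


def attacker_value(asset):
    """Rough, hypothetical estimate of how attractive an asset is to an outsider."""
    exposure_weight = 2 if asset.internet_exposed else 1
    return asset.data_sensitivity * exposure_weight


def rank_assets(assets):
    """Return assets ordered from most to least attractive target."""
    return sorted(assets, key=attacker_value, reverse=True)


if __name__ == "__main__":
    inventory = [
        Asset("internal-wiki", 2, False),
        Asset("customer-db", 5, False),
        Asset("payments-api", 4, True),
    ]
    for asset in rank_assets(inventory):
        print(asset.name, attacker_value(asset))
```

The assets at the top of the list are reasonable first candidates for a security-focused chaos experiment.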
Chaos tests should always be executed in a test or staging environment so that you do not directly impact any live systems. After simulating the hacking attack, you should review every element of your network to understand the impact. This information will feed into your analysis to determine what actions to take next.
Chaos in the cloud
Why did it take companies so long to start running chaos experiments against their own systems? A large part of the answer is that the need only became pressing once the cloud computing trend significantly increased the complexity of enterprise infrastructures. Now, instead of operating its own group of internal servers, a company runs various distributed systems across a global network.
Take cloud storage. In the old days, databases would reside on local drives with redundant backups on-site. Today, the cheapest way to store data is through a cloud storage provider in a managed data center. The cost benefits are obvious, but relying on a third-party storage vendor inevitably adds risk for your own organization.
A good chaos experiment to run would be to simulate a hacker compromising a storage volume or taking it completely offline. During this test, you want to monitor every level of your cloud architecture to see how it reacts to the failure and what other systems are affected by the outage.
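Before running such a test, it can help to predict which systems should be affected, so you can compare the prediction with what monitoring actually shows. The sketch below walks a dependency map to find everything that relies, directly or indirectly, on a failed component. The service names and the shape of the `depends_on` map are hypothetical.

```python
from collections import deque


def affected_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`.

    `depends_on` maps a service name to the components it relies on.
    """
    # Invert the map: component -> services that depend on it.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk outward from the failed component.
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return seen - {failed}


if __name__ == "__main__":
    # Hypothetical architecture used only for illustration.
    architecture = {
        "orders-db": ["volume-1"],
        "orders-api": ["orders-db"],
        "reports": ["orders-api"],
        "wiki": ["volume-2"],
    }
    print(sorted(affected_by("volume-1", architecture)))
```

If the experiment takes down a system that this walk did not predict, you have found an undocumented dependency, which is exactly the kind of weakness the test is meant to expose.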
It’s smart to automate chaos
We’ve discussed the idea of injecting controlled chaos into your network, one element at a time, in order to find the system limits. As in many other areas of technology, automation pays off here, and it is where chaos testing is headed. After the initial test is complete, plan to take advantage of the following tools to ramp up testing by injecting new destructive factors, changing the parameters, or scaling the test.
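Ramping up a test by changing its parameters can be as simple as generating every combination of target, fault, and intensity, then feeding each one to whatever runs the experiment. The sketch below only builds that list; the field names and example values are assumptions made for illustration.

```python
from itertools import product


def build_test_matrix(targets, faults, intensities):
    """Return one experiment description per (target, fault, intensity) combination."""
    return [
        {"target": target, "fault": fault, "intensity": intensity}
        for target, fault, intensity in product(targets, faults, intensities)
    ]


if __name__ == "__main__":
    # Hypothetical values; replace with the systems and faults you care about.
    matrix = build_test_matrix(
        targets=["storage-volume", "auth-service"],
        faults=["offline", "high-latency"],
        intensities=[0.25, 0.5, 1.0],
    )
    print(len(matrix), "experiments planned")
    print(matrix[0])
```

Adding a new destructive factor then means appending one value to a list rather than writing a new test by hand.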
Here’s a quick rundown of a few proven tools for automating network chaos testing.
Metasploit: This open source framework lets white hat testers play the part of malicious hackers, using the same techniques and tools that attackers rely on to get past typical protections such as a firewall or antivirus suite. Using a VPN in conjunction with Metasploit lets you probe for flaws without putting your data or system at risk. More on that later.
Nmap: Another open source tool, this network mapper offers a bird’s-eye view of the entire system you’re testing by scanning for live hosts, open ports, and the services running on them, and it can report details such as host latency and an uptime estimate. Of particular value to chaos testers is remote host detection, which keeps you from going into a situation blind.
We’ll close this short tool discussion by returning to the virtual private network (VPN). Alongside Metasploit and Nmap, a VPN plays a mostly defensive role, thanks to the military-grade encryption (AES-256) applied to web traffic going in both directions. Testers should also be aware of another standard feature offered by most consumer-focused VPN services: the ability to mask your IP address and, with it, your geographical location. This second feature is an important part of an additional type of chaos testing.
Attacking your target network through a VPN server located in a different region allows you to simulate users from other countries as a variable. This models the kind of decentralized network that hackers often deploy when they launch a Distributed Denial of Service (DDoS) attack.
Embrace long-term chaos
It’s important to remember that chaos engineering can only succeed when it is run as a continuous activity across your organization. If you simply run one set of experiments and never follow up on them or re-test them in the future, the original effort is wasted and you will remain vulnerable to external attacks.
The outcomes should be documented at the end of every chaos experiment. System weaknesses should be prioritized, with anything related to cybersecurity ranked at the top. Then specific people must be made responsible for finding and implementing a fix for each problem.
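The prioritization described above can be expressed as a simple sort: security-related findings first, then by severity within each group. The `Finding` record, its fields, and the 1 to 5 severity scale are hypothetical and meant only to show the ordering rule.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    summary: str
    security_related: bool
    severity: int              # assumed scale: 1 (low) to 5 (critical)
    owner: str = "unassigned"  # person responsible for the fix


def prioritize(findings):
    """Order findings with security issues first, then by descending severity."""
    return sorted(findings, key=lambda f: (not f.security_related, -f.severity))


def unowned(findings):
    """List findings that still need someone assigned to them."""
    return [f for f in findings if f.owner == "unassigned"]


if __name__ == "__main__":
    results = [
        Finding("slow failover to replica", False, 5, owner="ops-lead"),
        Finding("unencrypted backup bucket", True, 3),
        Finding("admin port reachable from staging", True, 4, owner="sec-lead"),
    ]
    for finding in prioritize(results):
        print(finding.summary, "->", finding.owner)
```

Note that a moderate security finding still outranks a critical non-security one here, which mirrors the rule that anything related to cybersecurity goes to the top.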
When possible, organizations should seek out cybersecurity products that make use of artificial intelligence and machine learning platforms. These are becoming more and more prominent in today’s fast-moving cloud computing world. It’s no longer feasible for a single IT team to monitor all of an enterprise’s distributed systems and react quickly to issues. With AI tools, problems can be discovered sooner or, better yet, blocked at the source.
But even the smartest security products will still require human oversight and intervention. For this reason, it’s important to pair your chaos engineering efforts with a disaster recovery (DR) plan. Everyone in the organization should know what to do in the case of a major outage or breach that hits cloud-hosted systems.
The principles of chaos engineering can seem a little strange at first. Why would a company want to waste time and money on something that involves breaking things on purpose? Shouldn’t those resources be dedicated to stabilizing the existing systems and building new functionality?
In reality, the cost of online outages can be disastrous for a modern business. Using chaos engineering best practices will help you avoid or minimize these types of disruptions, which makes the investment worthwhile in the long run. There are many unknowns in the cloud computing world, and chaos engineering is perhaps the best way to uncover them proactively.
The biggest variable of all on the internet is the threat of hackers. There’s no way to predict when a cybercriminal might strike or how they will infiltrate your network. But by running and analyzing proper chaos experiments, you can gain a better understanding of your infrastructure weaknesses and of the improvements needed to lock down your systems.