Kubernetes Cost Management: Challenges and Notable Tools

Saqib Jang

This article describes Kubernetes cost management challenges and highlights select tools from among the market leaders. Kubecost, Opsani, Rafay, and StormForge are prominent examples of the value that standalone tools can bring to K8S cost efficiency, and of why you should consider investing in them.

Kubernetes (K8S) has rapidly become one of the leading cloud technologies because it simplifies the deployment, management, and scaling of modern container-based, cloud-native applications. Simply put, K8S is about letting teams move quickly and with more agility, deliver a better user experience, scale, and use resources more efficiently, which lowers cost.

However, most enterprises confront a wide range of challenges before they can fully realize the benefits of running Kubernetes. These span performance, reliability, scalability and capacity management, and cost efficiency; for most organizations, cost efficiency is the most important goal behind the move to cloud-native architectures.

K8S Spend Drivers

Practically speaking, the biggest obstacle to large-scale Kubernetes deployment is uncontrolled cost growth. A recent FinOps/CNCF survey of 195 respondents, 75% of whom reported running Kubernetes in production, highlights how difficult Kubernetes costs are to manage: 68% of respondents reported that their Kubernetes costs increased over the past year, and among those whose spending increased, half saw it jump by more than 20%.

Sub-optimal resource provisioning

K8S application developers focus on implementing innovative features rapidly, typically by drawing on readily available repositories of existing code that is usually not optimized for the new application's needs. Because time to market is the dominant factor, optimizing code for performance takes a lower priority, and the resulting inefficiency is typically offset by programmatically provisioning additional resources on demand.

While developing and deploying K8S applications may be relatively straightforward, wasted resources can quickly hit the company’s bottom line as applications scale up. Enterprises are frequently alarmed to discover that most of their K8S-based applications in production use only a small percentage of the compute, memory, and storage resources allocated to them, resulting in substantial inefficiency and waste.

Similarly, the 2021 StormForge cloud spend survey of 105 IT professionals found that, on average, over 48% of cloud spend is wasted. A major contributor is the complexity of Kubernetes-based cloud environments, which makes it hard to estimate the resources an application needs and leads teams to over-provision intentionally as a safety margin for application performance.
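To make "using only a small percentage of allocated resources" concrete, the sketch below compares what containers request with what they actually use and reports the idle share of requested CPU and memory. The `ContainerUsage` record and the sample numbers are hypothetical; in practice the requests would come from pod specs and the usage from a metrics source, averaged over some window.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContainerUsage:
    """Requested vs. observed CPU (cores) and memory (GiB) for one container (hypothetical record)."""
    name: str
    cpu_request: float
    cpu_used: float
    mem_request_gib: float
    mem_used_gib: float


def idle_fraction(requested: float, used: float) -> float:
    """Share of a request that sits idle; 0.0 if usage meets or exceeds the request."""
    if requested <= 0:
        return 0.0
    return max(0.0, (requested - used) / requested)


def cluster_waste(containers: list[ContainerUsage]) -> dict[str, float]:
    """Aggregate idle share of requested CPU and memory across all containers."""
    cpu_req = sum(c.cpu_request for c in containers)
    # Cap usage at the request so bursting above it does not hide idle capacity elsewhere.
    cpu_used = sum(min(c.cpu_used, c.cpu_request) for c in containers)
    mem_req = sum(c.mem_request_gib for c in containers)
    mem_used = sum(min(c.mem_used_gib, c.mem_request_gib) for c in containers)
    return {
        "cpu_idle": idle_fraction(cpu_req, cpu_used),
        "mem_idle": idle_fraction(mem_req, mem_used),
    }


if __name__ == "__main__":
    pods = [
        ContainerUsage("api", cpu_request=2.0, cpu_used=0.4, mem_request_gib=4.0, mem_used_gib=1.0),
        ContainerUsage("worker", cpu_request=1.0, cpu_used=0.6, mem_request_gib=2.0, mem_used_gib=1.5),
    ]
    print(cluster_waste(pods))  # roughly two-thirds of requested CPU and 58% of memory sit idle
```

Capping usage at the request is a deliberate simplification: it measures how much reserved capacity goes unused, not how much a container bursts beyond its reservation.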

Cost Allocation for Multi-tenant Clusters

Kubernetes clusters are typically shared infrastructure that multiple teams use to run various applications. Once deployed, each application consumes some of the cluster’s resources and contributes to the cluster’s aggregate cost.

Now picture multiple teams working on many discrete applications. Identifying each application’s contribution to the overall cluster cost is very difficult, because it is not clear how much of a multi-tenant cluster’s compute, memory, and storage an individual K8S application uses. This makes it hard to calculate costs per application and to allocate them to a business unit or to the enterprise as a whole.
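One common way to attack the allocation problem is sketched below, under stated assumptions: each namespace is charged for the larger of what it requested and what it used, priced at hypothetical per-core-hour and per-GiB-hour rates, and any cluster cost that no namespace claimed (idle capacity) is spread in proportion to each namespace's direct cost. This is an illustrative model, not the exact method of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Workload:
    namespace: str
    cpu_request: float  # cores
    cpu_used: float     # cores, averaged over the billing window
    mem_request: float  # GiB
    mem_used: float     # GiB, averaged over the billing window


def allocate_costs(workloads: list[Workload], cluster_cost: float,
                   cpu_hourly: float, mem_hourly: float, hours: float) -> dict[str, float]:
    """Charge each namespace for max(request, usage), then share idle cost proportionally."""
    direct: dict[str, float] = {}
    for w in workloads:
        # A workload "holds" whatever it reserved, or more if it burst past its request.
        cpu = max(w.cpu_request, w.cpu_used)
        mem = max(w.mem_request, w.mem_used)
        cost = (cpu * cpu_hourly + mem * mem_hourly) * hours
        direct[w.namespace] = direct.get(w.namespace, 0.0) + cost

    total_direct = sum(direct.values())
    # Cost of capacity nobody claimed; zero if the cluster is fully (or over-) committed.
    idle = max(0.0, cluster_cost - total_direct)
    return {
        ns: cost + (idle * cost / total_direct if total_direct else 0.0)
        for ns, cost in direct.items()
    }


if __name__ == "__main__":
    workloads = [
        Workload("team-a", cpu_request=2, cpu_used=1, mem_request=4, mem_used=2),
        Workload("team-b", cpu_request=1, cpu_used=1.5, mem_request=2, mem_used=1),
    ]
    # Hypothetical prices: $1 per core-hour, $0.50 per GiB-hour, 10-hour window, $130 cluster bill.
    print(allocate_costs(workloads, cluster_cost=130, cpu_hourly=1.0, mem_hourly=0.5, hours=10))
    # {'team-a': 80.0, 'team-b': 50.0}
```

Whether idle cost should be shared, charged to a platform team, or reported separately is a policy choice; the proportional split shown here is only one option.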

Non-K8S cloud costs

While there is strong impetus toward adopting Kubernetes, non-cluster cloud usage, such as cloud SQL instances and S3 storage, continues to be the major component of cloud costs. Customers may find themselves using multiple tools across their cloud infrastructure, which can create discrepancies in reporting and a lack of trust in the insights uncovered. This puts pressure on engineering and financial management teams, which need a full picture of cloud costs to derive the most value from cloud services.

Finance versus development role split

Heightening the difficulty is the split, in both attitude and role, between developers and the corporate finance function that is ultimately responsible for paying the cloud bills. Continual increases in cloud costs have created much apprehension among finance teams. Development teams, meanwhile, bear primary responsibility for rising cloud fees; they are known for rapid execution but are typically unaware of the cost challenges their choices create.

Kubecost for monitoring and reducing Kubernetes spend

Kubecost is an open source-based platform that gives teams running Kubernetes visibility, operational insights, and ongoing cost allocation management. It provides granular cost visibility into Kubernetes clusters at the namespace, label, cluster, and individual pod or container level. With visibility by cost center or business unit, Kubecost lets developers drill down into the units that drive costs.

The commercial version of Kubecost offers cost allocation insights, cost monitoring, and alerts, along with additional features such as a longer metric retention period, data aggregation across clusters, user authentication, saved reports, and enterprise support.

Kubecost can track costs for Kubernetes applications running on Amazon Elastic Kubernetes Service (Amazon EKS), Google Kubernetes Engine (GKE), and Microsoft Azure Kubernetes Service (AKS), as well as in multi-cloud environments. It can also track costs for on-premises and high-security, air-gapped Kubernetes clusters.

Kubecost recently announced visibility into cloud services outside of Kubernetes clusters. With this step, Kubecost gives developers a single view of both in-cluster and out-of-cluster spending on cloud resources.

“Kubecost’s new cohesive cost view allows users to see the complete picture – combining in-cluster spend on nodes, disks, and more with out-of-cluster spend on external services like AWS S3 and RDS,” said Rob Faraj, Founding Partner at StackWatch, creators of Kubecost. “For the first time, engineering, finance, and executive leadership teams can get complete visibility across their infrastructure in a few minutes.”
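The idea of a single view across in-cluster and out-of-cluster spend can be sketched as a roll-up keyed by owning team. The sketch assumes namespaces map to teams (for example via a label) and that external cloud bill lines carry a `team` tag; both mappings and all figures are hypothetical, and anything untagged lands in an `unallocated` bucket so it stays visible instead of disappearing.

```python
from __future__ import annotations

from collections import defaultdict


def unified_cost_view(in_cluster: dict[str, float],
                      namespace_team: dict[str, str],
                      cloud_bill: list[dict]) -> dict[str, dict[str, float]]:
    """Roll up per-namespace Kubernetes cost and tagged external cloud cost per team."""
    view: dict[str, dict[str, float]] = defaultdict(lambda: {"kubernetes": 0.0, "external": 0.0})
    for namespace, cost in in_cluster.items():
        team = namespace_team.get(namespace, "unallocated")
        view[team]["kubernetes"] += cost
    for line in cloud_bill:
        # Bill lines without a team tag are kept, but flagged as unallocated.
        team = line.get("tags", {}).get("team", "unallocated")
        view[team]["external"] += line["cost"]
    return dict(view)


if __name__ == "__main__":
    view = unified_cost_view(
        in_cluster={"checkout": 80.0, "search": 50.0},
        namespace_team={"checkout": "payments", "search": "discovery"},
        cloud_bill=[
            {"service": "S3", "cost": 12.0, "tags": {"team": "payments"}},
            {"service": "RDS", "cost": 30.0, "tags": {"team": "discovery"}},
            {"service": "S3", "cost": 5.0},  # untagged bucket
        ],
    )
    for team, costs in view.items():
        print(team, costs)
```

The size of the `unallocated` bucket is itself a useful signal: it shows how much spend the tagging scheme fails to attribute.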

Opsani for Kubernetes cost optimization

Opsani is a provider of AI-driven optimization for cloud applications enabling companies to reduce costs through continuous cloud optimization while meeting customer-set performance goals. The Opsani Learning Autoscaler (OLAS) is a drop-in replacement for the Kubernetes Horizontal Pod Autoscaler (HPA). Unlike HPA, OLAS provides predictive and proactive autoscaling. Using machine learning, OLAS learns the traffic patterns of your service and scales up just ahead of time, to ensure that there are enough pods ready to take the increased traffic. When traffic subsides, OLAS quickly scales down to reduce cloud costs.
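The difference between reactive and predictive scaling can be illustrated with two small functions. The first follows the replica formula documented for the Kubernetes HPA, `ceil(currentReplicas * currentMetric / targetMetric)`, leaving out its tolerance band and stabilization window. The second is a deliberately simple stand-in for prediction: a seasonal-naive forecast that averages the load seen at the same time slot in earlier periods and sizes the deployment before that load arrives. OLAS uses machine learning; this sketch does not reproduce its model, and the period length, per-pod capacity, and headroom are hypothetical parameters.

```python
from __future__ import annotations

import math


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Reactive HPA rule: ceil(currentReplicas * currentMetric / targetMetric).

    Omits the HPA's tolerance band and stabilization window for brevity.
    """
    return math.ceil(current_replicas * current_metric / target_metric)


def seasonal_forecast(history: list[float], period: int) -> float:
    """Seasonal-naive forecast for the next slot: mean load at the same slot in earlier periods."""
    cycles = len(history) // period
    if cycles == 0:
        raise ValueError("need at least one full period of history")
    # history[-k * period] is the observation k periods before the slot being forecast.
    samples = [history[-k * period] for k in range(1, cycles + 1)]
    return sum(samples) / len(samples)


def predictive_replicas(history: list[float], period: int, per_pod_capacity: float,
                        headroom: float = 0.2) -> int:
    """Size the deployment for forecast load plus headroom before the load arrives."""
    forecast = seasonal_forecast(history, period)
    return max(1, math.ceil(forecast * (1 + headroom) / per_pod_capacity))


if __name__ == "__main__":
    # Toy "day" of four slots (requests/s); the next slot is the daily peak.
    history = [100, 400, 800, 200, 120, 420]
    print(predictive_replicas(history, period=4, per_pod_capacity=100))  # 10, ready before the peak
    # Reactive: 4 pods at 90% utilization against a 60% target only scale once load is already high.
    print(hpa_desired_replicas(4, current_metric=90, target_metric=60))  # 6
```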

According to Opsani, OLAS reduces cloud costs by up to 80% for applications with variable daily traffic, compared with provisioning resources for peak load. Because it is predictive, OLAS scales just before a load increase. In contrast, HPA scales reactively, after demand has already increased, which often leads to service-level objective violations.
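Why tracking load beats provisioning for peak comes down to simple arithmetic: savings equal one minus the replica-hours actually needed divided by peak replicas times hours. The sketch computes that for a hypothetical 24-hour traffic curve and an assumed per-pod capacity, treating perfect hour-by-hour tracking as an upper bound. It does not reproduce Opsani's 80% figure; the result depends entirely on the shape of the curve.

```python
from __future__ import annotations

import math


def peak_vs_tracking_savings(hourly_load: list[float], per_pod_capacity: float,
                             min_replicas: int = 1) -> float:
    """Fraction of replica-hours saved by following load hourly instead of holding peak capacity."""
    # Replicas needed each hour, never below the floor.
    needed = [max(min_replicas, math.ceil(load / per_pod_capacity)) for load in hourly_load]
    peak_replica_hours = max(needed) * len(needed)
    return 1 - sum(needed) / peak_replica_hours


if __name__ == "__main__":
    # Hypothetical day (requests/s): quiet night, morning ramp, short peak, evening tail.
    day = [50] * 8 + [400] * 4 + [1000] * 2 + [400] * 4 + [100] * 6
    print(round(peak_vs_tracking_savings(day, per_pod_capacity=100), 3))  # 0.725
```

Spiky, short peaks produce large savings; flat traffic produces almost none, which is why such figures are quoted as "up to".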

Using advanced machine learning, OLAS automatically chooses scale-up and scale-down points that meet the specified service-level objective (SLO) while respecting constraints such as minimum replicas and maximum cost. Unlike reactive HPAs, which are frequently misconfigured, OLAS makes it easy to autoscale all your services.
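Constraints such as a replica floor and a cost ceiling can be applied as a final clamp on whatever an autoscaler recommends. The sketch below assumes a uniform hypothetical per-pod hourly cost and resolves conflicts in favor of the floor (availability before budget); that precedence is an assumption, and a real system might alert instead.

```python
import math


def constrained_replicas(desired: int, min_replicas: int,
                         max_hourly_cost: float, pod_hourly_cost: float) -> int:
    """Clamp an autoscaler's recommendation to a replica floor and an hourly cost ceiling."""
    # Largest replica count whose total hourly cost stays within budget.
    cost_cap = math.floor(max_hourly_cost / pod_hourly_cost)
    # If the floor exceeds the cost cap, the floor wins.
    return max(min_replicas, min(desired, cost_cap))


if __name__ == "__main__":
    print(constrained_replicas(12, min_replicas=2, max_hourly_cost=5.0, pod_hourly_cost=0.5))  # 10
    print(constrained_replicas(1, min_replicas=2, max_hourly_cost=5.0, pod_hourly_cost=0.5))   # 2
```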

OLAS integrates with Opsani Continuous Cloud Optimization to provide optimal performance at the lowest possible cost. It continuously monitors traffic levels, SLOs, and costs, providing single-pane-of-glass reporting for autoscaling all your services. A web dashboard summarizes costs and SLOs across all services.

“I would like to bring up the fact that cloud-native has both positive and negative business implications. On the plus side, we get faster feature delivery, quicker time to market, and a richer set of services. On the negative side, we get more inadvertent sprawl in every dimension: API versions, redundant code repos, excess resource utilization, etc.,” says Amir Sharif, VP, Marketing at Opsani. “So, the negative has to be managed to make the positive’s punch powerful. Given the dynamism and complexity that cloud-native creates, organizations are seeing that AIOps is the only solution. Here at Opsani, even more than ever, our role is helping enterprises maximize the positive by minimizing the negative risks.”

Rafay cluster template for Kubernetes cost visibility

Rafay Systems’ Kubernetes Operations Platform (KOP) provides a comprehensive approach to managing modern infrastructure by streamlining the lifecycle management of Kubernetes clusters and cloud-native applications. With Rafay, enterprises can use any Kubernetes distribution and gain centralized automation, security, visibility, and governance capabilities.

Rafay has partnered with Kubecost to make Kubernetes cost reporting and visibility easier for teams operating Kubernetes. Organizations using Rafay KOP can take advantage of Rafay’s new Kubernetes Cost Management recipe to automate the deployment of Kubecost across their clusters. This enables SRE and DevOps teams to understand their costs in every Kubernetes cluster, giving them a basis for Kubernetes cost optimization and control.

“The Rafay-Kubecost cluster template enables enterprises to seamlessly deploy Kubecost to each new cluster as the cluster gets created,” said Mohan Atreya, SVP Products and Solutions, Rafay Systems. “It vastly simplifies the process of configuring Kubecost across Kubernetes clusters—allowing you to gain deep insights into your cluster’s performance and take advantage of optimization opportunities identified by Kubecost.”
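The underlying idea, installing Kubecost automatically whenever a cluster is created, can be approximated outside Rafay with a hook that runs an idempotent Helm command against the new cluster's kubeconfig context. The sketch is not Rafay's template mechanism: `on_cluster_created` is a hypothetical hook, the `kubecost` release and namespace names are choices made here, and it assumes the Kubecost Helm repository has already been added locally under the name `kubecost` (check Kubecost's current installation docs for the chart location).

```python
from __future__ import annotations

import shlex
import subprocess

# Chart reference in the Kubecost Helm repo; assumes that repo was added locally as "kubecost".
KUBECOST_CHART = "kubecost/cost-analyzer"


def kubecost_install_command(kube_context: str, namespace: str = "kubecost") -> list[str]:
    """Build an idempotent Helm command that installs or upgrades Kubecost in one cluster."""
    return [
        "helm", "upgrade", "--install", "kubecost", KUBECOST_CHART,
        "--namespace", namespace, "--create-namespace",
        "--kube-context", kube_context,
    ]


def on_cluster_created(kube_context: str, dry_run: bool = True) -> list[str]:
    """Hypothetical hook, called once for each newly provisioned cluster."""
    cmd = kubecost_install_command(kube_context)
    if dry_run:
        print(shlex.join(cmd))  # show what would run
    else:
        subprocess.run(cmd, check=True)  # requires helm on PATH and a valid kubeconfig context
    return cmd


if __name__ == "__main__":
    for ctx in ["prod-us-east", "prod-eu-west"]:
        on_cluster_created(ctx)
```

Using `helm upgrade --install` rather than `helm install` means rerunning the hook on an existing cluster upgrades the release instead of failing.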

StormForge platform for Kubernetes resource efficiency

The StormForge platform is built on the premise that K8S resource efficiency requires a holistic solution, one that treats cost optimization as an obvious goal while also ensuring that applications will perform as expected before they are deployed to production. Additionally, developers cannot effectively optimize cost and application performance by hand; they should instead be freed to focus on innovation that drives business value.

The StormForge platform uses a patent-pending machine learning algorithm to automatically find the application configuration that will result in the best outcomes, based on the goals for application performance, cost, and resource utilization. It allows optimizing for multiple objectives, acknowledging that there’s always a trade-off between performance, cost, and time/effort.
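Optimizing for multiple objectives usually means looking at the Pareto front: the configurations for which no alternative is both cheaper and faster. The sketch filters a set of hypothetical trial results down to that front for two objectives, monthly cost and p95 latency. It illustrates the trade-off concept only; it is not StormForge's patent-pending algorithm, which searches for configurations rather than just filtering finished trials.

```python
from __future__ import annotations

from typing import NamedTuple


class Trial(NamedTuple):
    name: str
    monthly_cost: float    # objective 1: lower is better
    p95_latency_ms: float  # objective 2: lower is better


def dominates(a: Trial, b: Trial) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.monthly_cost <= b.monthly_cost and a.p95_latency_ms <= b.p95_latency_ms
    strictly_better = a.monthly_cost < b.monthly_cost or a.p95_latency_ms < b.p95_latency_ms
    return no_worse and strictly_better


def pareto_front(trials: list[Trial]) -> list[Trial]:
    """Keep only trials that no other trial dominates (O(n^2), fine for tens of trials)."""
    return [t for t in trials if not any(dominates(other, t) for other in trials)]


if __name__ == "__main__":
    trials = [
        Trial("A", 900, 120),
        Trial("B", 600, 180),
        Trial("C", 700, 200),  # dominated by B: costs more and is slower
        Trial("D", 400, 350),
        Trial("E", 1200, 110),
    ]
    print([t.name for t in pareto_front(trials)])  # ['A', 'B', 'D', 'E']
```

Every configuration on the front is a defensible choice; picking among them is exactly the business trade-off between performance and cost that the text describes.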

Unlike the other K8S cost management offerings discussed earlier, StormForge takes a proactive approach, placing real-world loads on applications in pre-production environments to perform “what if” analyses that predict how each application will behave at scale. The goal is to provide developer-ready infrastructure so teams can make smart decisions without spending hours or days manually tuning applications.
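Once pre-production load tests have been run, the "what if" question often reduces to a constrained choice: the cheapest configuration whose measured latency and error rate under the target load still meet the SLO. The sketch assumes hypothetical load-test results, a p95 latency SLO, and a 1% error-rate budget; it returns `None` when no tested configuration qualifies, which signals that more capacity (or code changes) must be explored.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoadTestResult:
    config: str            # hypothetical label, e.g. replicas x CPU per pod
    monthly_cost: float    # projected cost of running this configuration
    p95_latency_ms: float  # measured under the pre-production load
    error_rate: float      # fraction of failed requests under that load


def cheapest_meeting_slo(results: list[LoadTestResult], slo_p95_ms: float,
                         max_error_rate: float = 0.01) -> Optional[LoadTestResult]:
    """Return the lowest-cost configuration that met the SLO under test load, or None."""
    feasible = [r for r in results
                if r.p95_latency_ms <= slo_p95_ms and r.error_rate <= max_error_rate]
    return min(feasible, key=lambda r: r.monthly_cost) if feasible else None


if __name__ == "__main__":
    results = [
        LoadTestResult("2 pods x 1 CPU", 300, 450, 0.04),
        LoadTestResult("3 pods x 1 CPU", 450, 210, 0.002),
        LoadTestResult("2 pods x 2 CPU", 600, 190, 0.001),
        LoadTestResult("6 pods x 1 CPU", 900, 150, 0.0),
    ]
    best = cheapest_meeting_slo(results, slo_p95_ms=250)
    print(best.config if best else "no configuration meets the SLO")  # 3 pods x 1 CPU
```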

“We believe that Kubernetes complexity is the main driver of over-provisioning and cloud spend waste, which cannot be optimized in a vacuum,” said Rich Bentley, Sr. Director, Product Marketing, StormForge. “StormForge provides a whole solution addressing the trade-offs among cost, performance, agility, and effort.”

No one doubts the value and financial benefits of an effective move toward cloud-native application infrastructure, or that Kubernetes is crucial to accomplishing this goal. While the path is winding, fraught with challenges, and unique to every enterprise, developing the ideal direction starts with choosing the tools that optimize infrastructure deployment costs.


Saqib Jang is Founder and Principal of Margalla Communications, a market analysis and consulting firm with deep domain expertise in cloud infrastructure and services. He is an accomplished marketing and business development executive with over 20 years’ experience in setting product and marketing strategy and delivering market-leading infrastructure solutions for cloud and enterprise markets.
