Five tips for reducing on-premise platform costs
Businesses are looking to reduce spending across cloud providers. Clusters are expensive and poorly optimised applications come at the price of high maintenance costs. Take a look at these five areas to work towards reducing on-premise platform costs. They will not only increase reliability but save the cost of buying more nodes as the business expands.
Across cloud providers, customers are looking for ways to reduce spending and bring down overall costs for their applications. Often as businesses expand and more use cases are onboarded, there is a need to buy more clusters to facilitate their applications.
However, clusters are expensive. Moreover, with poorly optimised applications incurring exorbitant maintenance costs, there is a latent opportunity for organisations to optimise their existing applications for cost and efficiency – both of which positively impact the business. With this in mind, here are five tips for reducing on-premise platform costs.
Identify resource wasting applications
A good starting point is identifying your resource-wasting applications. With various sets of app developers submitting applications, some will inevitably waste resources and affect mission-critical applications. This is, of course, detrimental to overall application performance, but it also has a financial impact. These applications can be identified easily by using point-in-time KPIs to give developers a comprehensive view of individual clusters.
With a view of clusters on a more granular level, plans can be developed to remediate issues – and these plans can be refined further as management teams add more nodes. For instance, if developers find an application with issues, they can use point-in-time KPIs to identify its exact state at the point of failure. As it stands, operations teams have the tools in-house to understand their clusters but are in a reactive – rather than proactive – state.
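As a rough illustration of the idea, the sketch below compares the memory each application requested with the memory it actually used at its peak, and flags the ones that used only a small fraction. The `AppSnapshot` fields, the 50% threshold and the sample figures are assumptions made for this example, not the output of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class AppSnapshot:
    """Point-in-time KPIs for one application (hypothetical fields)."""
    name: str
    requested_memory_gb: float
    peak_used_memory_gb: float


def find_wasteful_apps(snapshots, min_utilisation=0.5):
    """Return (name, utilisation) pairs for apps that used less than
    min_utilisation of the memory they requested, worst first."""
    wasteful = []
    for snap in snapshots:
        if snap.requested_memory_gb <= 0:
            continue  # nothing allocated, so nothing to compare against
        utilisation = snap.peak_used_memory_gb / snap.requested_memory_gb
        if utilisation < min_utilisation:
            wasteful.append((snap.name, utilisation))
    return sorted(wasteful, key=lambda pair: pair[1])


# Illustrative figures only.
snapshots = [
    AppSnapshot("nightly_etl", 64, 60),
    AppSnapshot("adhoc_report", 128, 32),
    AppSnapshot("ml_scoring", 32, 4),
]
wasteful_apps = find_wasteful_apps(snapshots)
print(wasteful_apps)
```

In practice the threshold would be tuned per cluster, and the same comparison could be made for CPU or disk.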
In other words, teams can only identify and respond to issues after they occur. Teams wanting to adopt a more proactive stance now have the ability to set up ‘auto actions’ that remediate issues faster. For example, by creating an automated response when ‘criteria X’ is met, management systems can prevent an application from slowing down other critical applications unnecessarily.
Another step for teams looking to adopt a proactive stance is to establish priority queues, which ensure that critical applications maintain performance when less-critical apps are draining resources. Simply put, low-priority applications can be killed remotely.
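An auto action combined with priorities might look something like the sketch below: when cluster memory use passes a limit, the least critical applications are selected to be stopped until usage is back under the limit. The `RunningApp` fields, the priority convention (a higher number means more critical) and the thresholds are all assumptions for illustration; the function only returns names and leaves the actual kill to whatever scheduler is in use.

```python
from dataclasses import dataclass


@dataclass
class RunningApp:
    name: str
    priority: int      # higher number = more critical (assumed convention)
    memory_gb: float


def select_apps_to_kill(apps, cluster_memory_gb,
                        max_utilisation=0.9, protected_priority=5):
    """Pick low-priority apps to stop until memory use is within the limit."""
    used = sum(app.memory_gb for app in apps)
    limit = cluster_memory_gb * max_utilisation
    # Least critical first; among equals, the largest consumer first.
    candidates = sorted(
        (app for app in apps if app.priority < protected_priority),
        key=lambda app: (app.priority, -app.memory_gb),
    )
    to_kill = []
    for app in candidates:
        if used <= limit:
            break
        to_kill.append(app.name)
        used -= app.memory_gb
    return to_kill


# Illustrative figures only: 1,150 GB in use on a 1,000 GB cluster.
apps = [
    RunningApp("fraud_detection", priority=9, memory_gb=500),
    RunningApp("weekly_reporting", priority=3, memory_gb=300),
    RunningApp("sandbox_test", priority=1, memory_gb=200),
    RunningApp("adhoc_query", priority=1, memory_gb=150),
]
kill_list = select_apps_to_kill(apps, cluster_memory_gb=1000)
print(kill_list)
```

Applications at or above `protected_priority` are never selected, which mirrors the aim of keeping mission-critical workloads running.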
Maximise existing cluster resources
A common situation developers find themselves in is working within a cluster and finding they are running out of resources. It’s common for them to ask for more nodes to increase performance but it’s important to establish first whether they are actually making the most of their existing resources.
For instance, it is fairly common for developers to copy and paste pipeline jobs but keep the same launch time. As a consequence, cluster resources are imbalanced and, with people typically working and running applications during business hours, we see peak usage between 9 am and 5 pm. This means that there are 16 hours of the day where applications could be run with more abundant resources.
Making the most of existing resources in this way not only improves the performance of existing applications but also allows more applications to be onboarded on existing hardware – avoiding the need to buy more clusters.
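To make the scheduling point concrete, here is a minimal sketch that spreads copied jobs across the off-peak hours instead of launching them all at once. The 9-to-5 peak window comes from the text above; the job names and the simple round-robin assignment are assumptions for the example.

```python
def off_peak_hours(peak_start=9, peak_end=17):
    """Hours of the day outside the peak window, starting right after it ends."""
    evening = list(range(peak_end, 24))
    morning = list(range(0, peak_start))
    return evening + morning


def stagger_launches(job_names, peak_start=9, peak_end=17):
    """Assign each job an off-peak launch hour, round-robin."""
    hours = off_peak_hours(peak_start, peak_end)
    return {job: hours[i % len(hours)] for i, job in enumerate(job_names)}


# Hypothetical jobs that were all copied with the same launch time.
jobs = ["ingest_orders", "ingest_customers", "ingest_payments"]
schedule = stagger_launches(jobs)
print(len(off_peak_hours()))  # 24 hours minus the 8-hour peak window
print(schedule)
```

A real schedule would also need to respect dependencies between jobs and any deadlines for their output.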
Optimise existing HDFS storage
While it may seem a basic issue, HDFS storage can be a bottleneck for organisations running out of disk space for their applications. This is particularly true of huge organisations with massive data workloads – especially as archival and queries are a large burden for most clusters.
To address this, we need to understand how teams are using data and from there develop data archival strategies. While there is the option to spend more and increase available hardware, the more economically viable option is to address this issue through smarter data use. By using data to see who has built which tables, when they were last accessed, how many partitions there are, and so forth, a proper archival strategy can be developed that deletes tables not used by the business.
Moreover, much of this process can be automated by allowing policies to prioritise and create tables based on quantifiable metrics.
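As a sketch of what such a policy could look like, the example below flags tables that have not been accessed for a set number of days and ranks them by size, so the largest savings are reviewed first. The `TableStats` fields, the 180-day cut-off and the sample tables are assumptions for illustration; the code only produces a candidate list and does not delete anything.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TableStats:
    """Usage metadata for one table (hypothetical fields)."""
    name: str
    owner: str
    last_accessed: date
    partitions: int
    size_gb: float


def archival_candidates(tables, today, max_idle_days=180):
    """Return tables idle for longer than max_idle_days, largest first."""
    stale = [t for t in tables if (today - t.last_accessed).days > max_idle_days]
    return sorted(stale, key=lambda t: t.size_gb, reverse=True)


# Illustrative metadata only.
tables = [
    TableStats("sales_2019", "finance", date(2022, 3, 1), 365, 900.0),
    TableStats("clickstream", "web", date(2023, 12, 20), 720, 4000.0),
    TableStats("tmp_export", "analytics", date(2023, 2, 1), 1, 120.0),
]
candidates = archival_candidates(tables, today=date(2024, 1, 1))
reclaimable_gb = sum(t.size_gb for t in candidates)
print([t.name for t in candidates], reclaimable_gb)
```

The owner field is kept so that the team that built a table can confirm it is no longer needed before it is archived or removed.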
Properly size your applications
Creating efficient applications that don’t over-allocate resources is another key way of reducing costs. However, this process can often take hours or days as teams need to manually compile the necessary data to inform these decisions.
For a Spark developer, for instance, all this information would have to be collated personally: you would need to browse through logs, open up the Spark UI, look at the Spark history server, look at the resource manager, then use trial and error to figure out what the issue is. What is needed is a self-service model where data teams can reduce the time scale down to hours because information about applications is readily available.
With metrics down to the operation level, teams are empowered with information on which applications are slow, which can be fixed on the configuration side and which have data skew. And with a lower mean time to resolution, both the time spent on maintenance issues and the spend on personnel supporting applications can be reduced. Instead, support personnel can work on other areas that provide more value to the business.
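Once peak usage is known, a sizing recommendation can be very simple. The sketch below suggests an allocation of peak usage plus some headroom, rounded up to a whole gigabyte; the 20% headroom and the sample figures are assumptions, and the function name is invented for this example.

```python
import math


def recommend_memory_gb(peak_used_gb, headroom=0.2):
    """Suggest an allocation: observed peak plus headroom, rounded up to whole GB."""
    return math.ceil(peak_used_gb * (1 + headroom))


# Illustrative figures: an application that requested 16 GB but peaked at 6.5 GB.
requested_gb = 16
recommended_gb = recommend_memory_gb(6.5)
saved_gb = requested_gb - recommended_gb
print(recommended_gb, saved_gb)
```

For a Spark job the result might feed a setting such as `spark.executor.memory`, although a real recommendation would consider more than a single peak value.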
Promote better applications to production
Another way of ensuring that application resources are maximised from the beginning is to promote better applications to production. Bad applications being promoted to production with flawed code often result in other applications suffering. By looking to promote better efficiencies in the software development cycle, future applications will drain fewer resources. This needs to be a manual effort from the team leader who is doing manual code review – they are the ones determining that the code is good enough to be pushed along to different stages.
However, code review on this scale is time-consuming. There are likely various teams creating data apps, but how do you manage code quality from development to stage to UAT to production? Moreover, how do you do so efficiently?
Again, automation is a boon to data teams. By providing API-driven recommendations, team leaders will only have to review code once it is ready. This can be implemented by creating check-ins that independently check code and return it to the developer before final approval. This promotes efficient applications that are properly sized during production, where resource wastage is minimised. The main advantage is the time saved in the development process, as team leaders don’t need to manually check all the code, allowing more apps to be created at a faster velocity.
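An automated check-in of this kind could be as simple as the sketch below, which inspects metrics from a test run and returns a list of issues for the developer to fix before a team leader is asked to review. The metric names, the thresholds and the function itself are hypothetical and stand in for whatever a recommendation API would actually provide.

```python
def review_before_promotion(metrics, max_runtime_minutes=60,
                            min_memory_utilisation=0.5):
    """Return a list of issues; an empty list means ready for human review."""
    issues = []
    if metrics["runtime_minutes"] > max_runtime_minutes:
        issues.append("runtime exceeds limit")
    requested = metrics["requested_memory_gb"]
    if requested > 0:
        utilisation = metrics["peak_used_memory_gb"] / requested
        if utilisation < min_memory_utilisation:
            issues.append("memory is over-allocated")
    if metrics.get("has_data_skew", False):
        issues.append("data skew detected")
    return issues


# Illustrative metrics from two hypothetical test runs.
needs_work = {"runtime_minutes": 45, "requested_memory_gb": 64,
              "peak_used_memory_gb": 20, "has_data_skew": True}
ready = {"runtime_minutes": 30, "requested_memory_gb": 64,
         "peak_used_memory_gb": 48, "has_data_skew": False}
print(review_before_promotion(needs_work))
print(review_before_promotion(ready))
```

Run as a gate between stages, a check like this returns the application to the developer automatically and only passes it on for manual review when the list is empty.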
By looking at these five areas, organisations can increase the efficiency of their data workloads. This not only increases reliability but also saves the cost of buying more nodes as the business expands. For long-term growth, this approach is essential to ensure IT teams can devote their time to areas other than fixing applications.
Think actionable data – understand the environment, and make that data work anywhere to better optimise performance, automate troubleshooting and keep costs in check.