Data science applications: Containers or serverless?
Computing solutions like serverless and containers have become all the rage. In this article, Kayla Matthews explains the differences between serverless and containers and what each means for data science applications.
The rise of cloud computing, coupled with a growing need for always-on, connected experiences, is driving adoption of cutting-edge IT and a push toward innovative technology. As a result, an offshoot of cloud computing called serverless computing has recently emerged. It competes with a similar technology that has been popular for years: containers, or container-based deployment.
As one might expect, it’s not always a good idea to adopt something new just because it’s trendy. It may work, and it may well be more efficient, but that doesn’t necessarily make it the right solution for a given team or organization.
That said, one of the newer debates in the industry concerns these two technologies, one already prominent and the other rising. Between containers and serverless solutions, which is better for the average data science application? More importantly, what’s the difference, and how does it affect the results?
What is serverless computing and programming?
Right off the bat, it’s important to understand that “serverless” does not mean no servers are involved. In fact, the technology still runs on remote, cloud-based servers.
If that’s the case, why is it called serverless? Because a third-party service provider handles all of the IT operations and maintenance, such as provisioning, patching and scaling the servers. In other words, the application still runs on the provider’s cloud infrastructure, but the team writes and deploys its code without ever managing the machines underneath.
This allows the programming team to develop and distribute the application itself, while the hardware and systems infrastructure are handled remotely. It removes the burden of provisioning, powering and maintaining servers from the development team and lets them focus on their specialty: product and software development.
That’s precisely why many hail serverless computing as the ideal solution, particularly in the form of Amazon’s AWS Lambda.
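To give a sense of what the developer actually writes, here is a minimal sketch of a Python function in the shape AWS Lambda expects: a handler that receives an `event` and a `context` object. The event layout (a `values` list) and the statusCode/body response are assumptions chosen for illustration. Notice that nothing in the code starts, patches or scales a server; that is the provider’s job.

```python
import json
import statistics


def lambda_handler(event, context):
    """Summarize a list of numbers sent in the invocation event.

    The "values" key is a hypothetical event layout for this sketch.
    The platform supplies `event` and `context` when it invokes the function.
    """
    values = event.get("values", [])
    if not values:
        # Reject empty input with an error response.
        return {"statusCode": 400, "body": json.dumps({"error": "no values"})}
    summary = {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }
    return {"statusCode": 200, "body": json.dumps(summary)}


if __name__ == "__main__":
    # Local smoke test: call the handler directly, passing None for context.
    print(lambda_handler({"values": [1, 2, 3, 4]}, None))
```

Because the handler is an ordinary function, it can be exercised locally like this before it is uploaded.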
How is serverless computing different from containers?
Containers are essentially what the name describes: a self-contained software package that gets delivered and run as a standalone application environment. Most commonly, they are used to package an application so that it runs the same way wherever it is deployed.
Everything needed to run and interact with a piece of software is packaged together. That typically means bundling the application code, runtime, system tools, system libraries and default settings.
Containers built with tools like Docker help solve the common problems that arise when software moves across platforms. When moving from one computing environment to another, developers often run into obstacles.
If the supporting software is not identical, it can cause a series of hiccups. And since developers routinely move code from a personal or work computer to a test environment, or from staging into production, this becomes a widespread issue. That covers only the software itself; further problems can arise from differing network topologies, security and privacy policies, and tools or technologies.
Containers solve this by wrapping everything up into a single runtime environment. Virtual machines follow a similar idea, except the package that moves between systems is an entire virtual machine, including a full guest operating system, whereas containers share the host’s operating system kernel.
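To make the “works on one machine, breaks on another” problem concrete, here is a small sketch that compares two dependency manifests. The package names are real libraries, but the version numbers and the `diff_environments` helper are hypothetical, invented for illustration. A container image sidesteps this mismatch by shipping one pinned set of dependencies to every environment.

```python
def diff_environments(dev, prod):
    """Return packages whose versions differ, or are missing, between two environments.

    `dev` and `prod` map package names to version strings. They are
    hypothetical manifests, not output from any real tool.
    """
    mismatches = {}
    for name in sorted(set(dev) | set(prod)):
        dev_version = dev.get(name)  # None means "not installed here"
        prod_version = prod.get(name)
        if dev_version != prod_version:
            mismatches[name] = (dev_version, prod_version)
    return mismatches


laptop = {"numpy": "1.26.4", "pandas": "2.2.2", "scikit-learn": "1.4.2"}
server = {"numpy": "1.24.0", "pandas": "2.2.2"}

# With a container, both sides would run the same image and the diff would be empty.
for pkg, (on_laptop, on_server) in diff_environments(laptop, server).items():
    print(f"{pkg}: laptop={on_laptop}, server={on_server}")
```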
Even with container-based solutions, companies and teams still need servers, whether on premises or rented from a cloud provider, to run the containers and manage the data. How many servers they need depends, and always will, on the data load and requirements.
OK, which is better?
Unsurprisingly, the ideal solution depends on the project at hand. Some projects are certainly more suitable for a serverless computing environment, whereas others would be better within a container-based solution.
Containers, by nature, allow for bigger and more complex applications and deployments. Virtualization has made them even more viable, because it allows highly complicated and even monolithic solutions to be structured and delivered within the appropriate environment. As a result, teams get robust control over both individual containers and the system as a whole.
By comparison, serverless computing means relying on a service provider, not least on its goodwill and security capabilities. The customer must trust that the provider has its infrastructure and security policies in order. For reference, it does help to know that the major players in this space, Amazon (AWS Lambda), Microsoft (Azure Functions) and Google (Cloud Functions), are big, trustworthy providers.
Serverless costs are also much more manageable, and can be inexpensive, because a company pays the provider only for the resources its code actually consumes, such as the number of requests and the execution time. Because pricing is based on active computing resources only, idle time costs little or nothing.
That contrasts with containers, which typically run on provisioned capacity that is billed whether or not it is busy; as container deployments grow larger and more complex, costs tend to rise. Serverless instead relies on small, simple tasks with almost no overhead. With serverless computing, the emphasis also stays on the core of software development: writing code.
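The cost difference comes down to two billing models, which this sketch compares. It assumes pay-per-use billing is charged per request plus per GB-second of execution (memory multiplied by run time), and that always-on capacity is charged per instance-hour. All rates below are made-up placeholders, not any provider’s actual prices, and the sketch ignores free tiers, data transfer and storage.

```python
def pay_per_use_cost(requests, avg_seconds, memory_gb,
                     price_per_million_requests, price_per_gb_second):
    """Monthly cost when billing is per request plus per GB-second of execution."""
    gb_seconds = requests * avg_seconds * memory_gb
    return requests / 1_000_000 * price_per_million_requests + gb_seconds * price_per_gb_second


def always_on_cost(instances, hourly_rate, hours=730):
    """Monthly cost of capacity that is billed whether or not it is busy (~730 h/month)."""
    return instances * hourly_rate * hours


def break_even_requests(avg_seconds, memory_gb, price_per_million_requests,
                        price_per_gb_second, monthly_fixed_cost):
    """Monthly request volume at which both billing models cost the same."""
    per_request = (price_per_million_requests / 1_000_000
                   + avg_seconds * memory_gb * price_per_gb_second)
    return monthly_fixed_cost / per_request


# Illustrative placeholder rates only; check your provider's price list.
RATES = {"price_per_million_requests": 0.20, "price_per_gb_second": 0.00002}

serverless = pay_per_use_cost(2_000_000, 0.5, 0.5, **RATES)
containers = always_on_cost(instances=2, hourly_rate=0.10)
print(f"serverless ${serverless:.2f} vs always-on ${containers:.2f}")
print(f"break-even at about "
      f"{break_even_requests(0.5, 0.5, monthly_fixed_cost=containers, **RATES):,.0f} requests/month")
```

With zero requests, the pay-per-use bill is zero, which is the “idle time costs little or nothing” point. The break-even figure marks the volume above which steady, heavy traffic could make provisioned capacity the cheaper option under these assumed rates.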
For instance, a programmer can write a function, such as one backing an external service, entirely on its own, because it does not need to be integrated with a container image or runtime ecosystem. But there’s a downside: less control and weaker integration mean less power to debug, test and monitor an application, especially across platforms, and fewer performance metrics to boot.
The fuller control that container-based solutions offer lets users test and understand what happens both inside and outside the containers, resulting in more detailed analytics at every level of deployment.
When working with containers, teams can often identify and optimize performance issues as the project progresses, so the entire system keeps delivering the desired results.
Which is better for data science?
Containers still have their place, but serverless computing is definitely the up-and-coming star in the world of big data. For data science specifically, serverless platforms don’t require infrastructure management or extra teams to run things like Hadoop or Spark clusters.
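Much of the appeal for data work is that a job can be split into independent, stateless pieces, each handled by a separate function invocation, with no cluster to manage. The sketch below illustrates that map-and-merge pattern with a word count. It is a local simulation under stated assumptions: threads from `concurrent.futures` stand in for separate function invocations, and `count_words`, `merge` and `fan_out` are hypothetical names, not part of any serverless API.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """A stateless 'function': it sees only its own input and returns a result."""
    counts = {}
    for line in chunk:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def merge(partials):
    """Combine the per-chunk results (the 'reduce' step)."""
    total = {}
    for part in partials:
        for word, n in part.items():
            total[word] = total.get(word, 0) + n
    return total


def fan_out(lines, chunk_size=2, workers=4):
    """Split the input into chunks and process each one independently."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Locally, a thread pool stands in for parallel function invocations.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(count_words, chunks))


docs = ["serverless scales out", "containers scale too", "serverless is hands-off"]
print(fan_out(docs))
```

Because each chunk is processed without shared state, the result does not depend on how the input is split, which is what lets a platform run the pieces wherever and whenever it likes.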
Not to mention, serverless platforms continuously monitor resource usage and scale up or down automatically as demand changes. That’s incredibly helpful for smaller teams that lack the bandwidth to scale on their own, but it’s equally beneficial for larger organizations because of the cost savings.
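How much capacity does that automatic scaling actually track? A common rule of thumb, Little’s law, says the number of instances busy at once is roughly the request rate multiplied by the average time each request takes. The sketch below applies it to a hypothetical daily traffic profile; the traffic figures and the 0.25-second duration are invented for illustration. On a serverless platform the provider adjusts capacity along these lines, whereas a container deployment would need to be sized for the peak by the team itself.

```python
import math


def required_concurrency(requests_per_second, avg_duration_seconds):
    """Instances busy at steady state: arrival rate x time each request occupies one."""
    return math.ceil(requests_per_second * avg_duration_seconds)


# A hypothetical daily traffic profile (requests per second by period).
profile = {"night": 5, "morning": 120, "peak": 400, "evening": 60}
for period, rps in profile.items():
    print(period, required_concurrency(rps, avg_duration_seconds=0.25))
```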
Serverless is remarkably hands-off, which is one of its advantages for deploying TensorFlow-like models. The code gets uploaded to the provider, which then handles the deployment. Before uploading, a programmer can develop and write the code in whatever environment, and with whatever methods, they prefer.
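As a rough illustration of serving a model this way, here is a minimal sketch of a Lambda-style handler. To keep it self-contained, the “model” is a hypothetical three-weight linear scorer standing in for a real trained model, and the `rows` event field is an assumed input format. The design choice worth noting is that the model is set up at module level rather than inside the handler: module-level code runs once when a function instance starts, and warm instances then reuse it across invocations instead of reloading it for every request.

```python
import json

# Hypothetical stand-in for a trained model. A real function might load a saved
# TensorFlow model from storage here, once per instance rather than per request.
MODEL = {"weights": [0.4, -0.2, 0.1], "bias": 0.05}


def predict(features):
    """Score one row of features with the linear stand-in model."""
    return sum(w * x for w, x in zip(MODEL["weights"], features)) + MODEL["bias"]


def lambda_handler(event, context):
    """Score every row in the event; "rows" is an assumed input format."""
    rows = event.get("rows", [])
    scores = [predict(row) for row in rows]
    return {"statusCode": 200, "body": json.dumps({"scores": scores})}


if __name__ == "__main__":
    print(lambda_handler({"rows": [[1, 0, 0], [0, 0, 0]]}, None))
```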
But that doesn’t mean containers are obsolete — not yet, anyway. In fact, they are still incredibly useful and reliable — in some cases, it’s possible to use both forms of technology interchangeably.
As serverless computing solutions become more advanced and capable, it’s possible this will change, but right now it sure seems like it’s going to be a while before containers get phased out entirely.
This article is part of the latest JAX Magazine issue. You can download it now for free.
Have you adopted serverless and loved it, or do you prefer containers? Are you still unsure and want to know more before making a decision? This JAX Magazine issue will give you everything you need to know about containers and serverless computing, but it won’t decide for you.