The idea of collecting and analyzing data to gather insights isn’t really new. However, the specific roles involved in the collection and analysis of data have grown and evolved considerably over the last decade as the amount of data being created has increased at a staggering rate. In this article, Cher Zavala explains why data engineers are so important.
Containers are revolutionizing the way modern software is developed and operated. We talked to Johannes Unterstein, Distributed Applications Engineer at Mesosphere and JAX DevOps speaker, about container tools and technologies and the usefulness of containers in a DevOps context.
It’s been one year since Yahoo open-sourced CaffeOnSpark, and the tech giant has found a fitting way to celebrate: by open-sourcing TensorFlowOnSpark, its latest open source framework for distributed deep learning on big data clusters.
Apache Beam has successfully graduated from incubation, becoming a new Top-Level Project at the Apache Software Foundation. We invited the Apache Software Foundation’s Davor Bonaci and Jean-Baptiste Onofré to talk about the project’s journey to becoming a Top-Level Project and concrete plans for its future.
Big Data is changing. Buzzwords such as Hadoop, Storm, Pig and Hive are no longer the darlings of the industry; they are being replaced by a powerful duo: Fast Data and SMACK. Such a rapid shift in such a (relatively) young ecosystem raises several questions: What is wrong with the current approach? What is the difference between Fast Data and Big Data? And what is SMACK?
Netflix Hollow is a Java library and comprehensive toolset for harnessing small to moderately sized in-memory datasets that are disseminated from a single producer to many consumers for read-only access. It is designed with servers busily serving requests at or near maximum capacity in mind, and it aims to address the scaling challenges of in-memory datasets. Let’s look at the advantages of using Netflix Hollow.
As a (new) member of the R Consortium, IBM will work side by side with the R user community and support the project’s mission to pinpoint, create and implement infrastructure projects that drive standards and best practices for R code.
Version 1.8 of the Clojure Lisp dialect offers new string functions, as well as the possibility of direct linking, among other features.
It’s touted as the industry’s only open-source, enterprise-grade unified stream and batch processing platform. Apache Apex community manager Desmond Chan shows us what exactly that means and how this open-source engine handles big data.
After a preview version had been published at the end of November 2015, the final version of Apache Spark 1.6 is at long last ready for download. The update contains a total of over 1,000 changes; release highlights include a variety of performance improvements, the new Dataset API and expanded data science functions.
VMTurbo founders Yechiam Yemini and Yuri Rabover, as well as Principal Solutions Engineer Eric Wright, have ventured a look into the future and identified a few trends for the upcoming year.
If you search Google Scholar for “machine learning”, it returns over 1,800,000 publications. As the buzz around this technology grows, so too does its complexity. Sebastian Raschka, author of Packt’s “Python Machine Learning”, introduces us to the three types of machine learning.
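To make the distinction concrete, here is a minimal, self-contained sketch contrasting two of the three paradigms Raschka describes: supervised learning (learning a mapping from labeled examples, shown here as a toy 1-nearest-neighbor classifier) and unsupervised learning (finding structure in unlabeled data, shown here as a tiny 1-D two-means clustering). The data points and labels are hypothetical, purely for illustration; reinforcement learning, the third type, involves an agent learning from rewards and is omitted for brevity.

```python
# Supervised learning: learn a mapping from labeled examples.
labeled = [(1.0, "small"), (1.2, "small"), (8.0, "large"), (8.5, "large")]

def nearest_neighbor(x):
    # Predict the label of the closest training example (1-NN).
    return min(labeled, key=lambda p: abs(p[0] - x))[1]

print(nearest_neighbor(1.1))  # "small"
print(nearest_neighbor(9.0))  # "large"

# Unsupervised learning: discover structure without labels (1-D 2-means).
data = [1.0, 1.2, 8.0, 8.5]
c1, c2 = min(data), max(data)          # initialize two cluster centers
for _ in range(10):
    g1 = [x for x in data if abs(x - c1) <= abs(x - c2)]
    g2 = [x for x in data if abs(x - c1) > abs(x - c2)]
    c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)

print(sorted(g1), sorted(g2))  # two clusters recovered without any labels
```

The key difference is visible in the inputs: the supervised model consumes (example, label) pairs, while the clustering step sees only the raw values and must infer the grouping itself.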
At SAP TechEd in Barcelona, SAP brought its new technology down to developer level, showcasing the latest in SAP Hana, such as the Hana Cloud Platform’s use of Cloud Foundry, while calling on IT to innovate and build a ‘digital core’ rather than just integrate.
It’s been 10 years since Big Data first made the rounds as a mainstream concept, and many questions are still unanswered. In his W-JAX keynote, Emmanuel Letouzé looks at the relationship between data, ethics, politics and human rights.