How well do you know your Apache Spark trivia?
We’re bringing our long-running trivia series to an end. One of our final subjects is Apache Spark, the popular open source data processing engine for machine learning and SQL datasets.
Europe’s biggest community event for Apache Spark and artificial intelligence comes to London in October.
To get you ready, we worked with David Wyatt, Vice President EMEA of Databricks, on a quiz covering Apache Spark and machine learning. Test your knowledge of AI and Apache Spark using the quiz below.
1. When was Apache Spark first created?
2. What is Apache Spark?
a) A unified analytics engine for large-scale data processing
b) A distributed systems kernel
c) An open-source project to manage computer clusters
d) An open source stream-processing software platform
3. Announced at Spark Summit US in 2018, what is the latest open source project focused on the end-to-end machine learning lifecycle?
4. What project was introduced by the committers of Apache Spark this year, aimed at unifying Apache Spark and Deep Learning frameworks?
a) Project Hydrogen
b) Project Oxygen
c) Project Neon
d) Project Sulphur
5. What API interface do you need to use with MLflow?
6. What percentage of enterprise organisations are considering AI projects today?
7. According to research, how many AI projects are actually moved to production?
a) 1 in 3
b) 2 in 3
c) 3 in 4
d) 4 in 5
8. At which United States university was the Spark project created?
a) Stanford University
b) UC Berkeley
9. How many tools do companies have to manage their AI and analytics deployments, on average?
10. What percentage of companies have problems with AI due to poor collaboration and siloed data science and engineering teams?
1. b) 2009 was the year when the Spark project was originally conceived by Matei Zaharia at UC Berkeley AMPLab.
2. a) Apache Spark is a unified analytics engine for large-scale data processing.
3. a) MLflow is a new open source platform for managing the end-to-end machine learning lifecycle, and it works with Spark as well as other machine learning libraries. MLflow’s open format makes it easy to share workflow steps and models across organisations if you wish to open source your code. See “Introducing MLflow: an Open Source Machine Learning Platform” for more details.
4. a) Project Hydrogen was announced at Spark Summit US and is being built to “reconcile fundamental incompatibility between Spark and distributed ML frameworks”. The project was announced by Reynold Xin in a presentation entitled ‘Project Hydrogen: State-of-the-Art Learning on Apache Spark’.
5. a) REST – MLflow is built around REST APIs and simple data formats (e.g., a model can be viewed as a lambda function) that can be used from a variety of tools, instead of only providing a small set of built-in functionality.
6. d) 80% – according to research by Databricks and IDG, 80% of companies are looking to implement new AI projects. In fact, the average company had around six projects powered by AI and machine learning technologies planned! See the survey report for more details.
7. a) Sadly, only one in three AI projects is implemented successfully and put into production. This poor rate of success was most commonly linked to issues with getting data ready for analysis.
8. b) The Apache Spark project was originally created in the AMPLab at UC Berkeley.
9. c) According to the recent Databricks and IDG survey, companies have up to seven different tools and machine learning frameworks in place.
10. d) 80% of companies surveyed admitted that there were issues caused by data science and data engineering teams running in silos, rather than collaborating. Unifying analytics workflows can help avoid that problem.
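If answer 5 left you wondering what “a model can be viewed as a lambda function” might look like, here is a minimal sketch in plain Python. It is not MLflow’s API: `make_linear_model` and `handle_request` are hypothetical names invented for illustration, and the JSON shape is an assumption. It only shows the general idea of a model as a function that takes a simple data format in and hands one back, the way a REST-style service would.

```python
import json
from typing import Callable, List

# Hypothetical stand-in for a trained model: just a function from inputs to an output.
Model = Callable[[List[float]], float]


def make_linear_model(weights: List[float], bias: float) -> Model:
    """Return a model as a plain function (illustrative only, not MLflow's API)."""
    return lambda features: sum(w * x for w, x in zip(weights, features)) + bias


def handle_request(model: Model, body: str) -> str:
    """Mimic a REST-style call: JSON text in, JSON text out (assumed payload shape)."""
    features = json.loads(body)["features"]
    return json.dumps({"prediction": model(features)})


# Build a toy model and send it one request.
model = make_linear_model(weights=[2.0, 1.0], bias=1.0)
response = handle_request(model, json.dumps({"features": [1.0, 3.0]}))
print(response)
```

Because the model is just a function and the payload is just JSON, any tool that can send text over HTTP could in principle call it, which is the point the answer is making.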
How well did you do?
0-3 correct: You’re just starting out with Spark. Getting to know more about the project could help you out.
4-6 correct: You’re pretty solid in your Apache Spark trivia, but you still might need to refresh your knowledge.
7-9 correct: Nice! You really know your stuff!
10 correct: You are an Apache Spark professional.
Whatever your score, Spark+AI Summit Europe will help you keep up with the latest projects and developments for Apache Spark. The event is taking place from 2nd October to 4th October 2018 at London’s ExCeL centre. For more information, visit here.
Programming Pub Quiz: Have you tried our other pub quizzes? Test your knowledge of other topics!