Trivial pursuit

How well do you know your Apache Spark trivia?

David Wyatt

We’re bringing our long-running trivia series to an end. One of our final subjects is Apache Spark, the popular open source data processing engine for machine learning and SQL datasets.

Europe’s biggest community event for Apache Spark and artificial intelligence comes to London in October.

To get you ready, we worked with David Wyatt, Vice President EMEA of Databricks, on a quiz covering Apache Spark and machine learning. Test your knowledge of AI and Apache Spark with the quiz below.

1. When was Apache Spark first created?
a) 2007
b) 2009
c) 2011
d) 2013

2. What is Apache Spark?
a) A unified analytics engine for large-scale data processing
b) A distributed systems kernel
c) An open-source project to manage computer clusters
d) An open source stream-processing software platform

3. Announced at Spark Summit US in 2018, what is the latest open source project focused on the end-to-end machine learning lifecycle?
a) MLflow
b) SparkUP
c) TensorSpark
d) SparkSML

4. What project was introduced by the committers of Apache Spark this year, aimed at unifying Apache Spark and Deep Learning frameworks?
a) Project Hydrogen
b) Project Oxygen
c) Project Neon
d) Project Sulphur

5. What API interface do you need to use with MLflow?
d) RPC

6. What percentage of enterprise organisations are considering AI projects today?
a) 20%
b) 40%
c) 60%
d) 80%

7. According to research, how many AI projects are actually moved to production?
a) 1 in 3
b) 2 in 3
c) 3 in 4
d) 4 in 5

8. At which United States university was the Spark project created?
a) Stanford University
b) UC Berkeley
c) MIT
d) Harvard

9. How many tools do companies have to manage their AI and analytics deployments, on average?
a) 3
b) 5
c) 7
d) 9

10. What percentage of companies have problems with AI due to poor collaboration and siloed data science and engineering teams?
a) 20%
b) 40%
c) 60%
d) 80%


1. b) 2009 was the year when the Spark project was originally conceived by Matei Zaharia at UC Berkeley AMPLab.

2. a) Apache Spark is a unified analytics engine for large-scale data processing.

3. a) MLflow is an open source platform for managing the end-to-end machine learning lifecycle. Its open format makes it easy to share workflow steps and models across organisations if you wish to open source your code. See Introducing MLflow: an Open Source Machine Learning Platform for more details.

4. a) Project Hydrogen was announced at Spark Summit US, and is being built to “reconcile fundamental incompatibility between Spark and distributed ML frameworks”. The project was announced by Reynold Xin in a presentation entitled ‘Project Hydrogen: State-of-the-Art Learning on Apache Spark’.

5. a) REST – MLflow is built around REST APIs and simple data formats (e.g., a model can be viewed as a lambda function) that can be used from a variety of tools, instead of only providing a small set of built-in functionality.

6. d) 80% – according to research by Databricks and IDG, 80% of companies are looking to implement new AI projects. In fact, the average company had around six projects powered by AI and machine learning technologies planned. See the Survey report for more details.

7. a) Sadly, only one in three AI projects is implemented successfully and put into production. This poor rate of success was most commonly linked to issues with getting data ready for analysis.

8. b) The Apache Spark project was originally created in the AMPLab at UC Berkeley.

9. c) According to the recent Databricks and IDG survey, companies have up to seven different tools and machine learning frameworks in place.

10. d) 80% of companies surveyed admitted that there were issues caused by data science and data engineering teams running in silos, rather than collaborating. Unifying analytics workflows can help avoid that problem.
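
Answer 5 mentions that MLflow is built around REST APIs. As a rough illustration of what that means in practice, here is a minimal sketch that builds (but does not send) a request to create a run on an MLflow tracking server. The server address, experiment ID and timestamp are hypothetical, and the endpoint path is taken from the current MLflow REST documentation; older releases may use a different prefix, so treat it as an assumption rather than a guaranteed match for your version.

```python
import json
import urllib.request


def build_create_run_request(tracking_uri, experiment_id, start_time_ms):
    """Build (but do not send) a POST request asking an MLflow server to create a run."""
    # Plain JSON body: the experiment to attach the run to and a start time in milliseconds.
    payload = {"experiment_id": experiment_id, "start_time": start_time_ms}
    return urllib.request.Request(
        url=tracking_uri.rstrip("/") + "/api/2.0/mlflow/runs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical local tracking server; nothing goes over the network here.
    req = build_create_run_request("http://localhost:5000", "0", 1538438400000)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Because the interface is just HTTP and JSON, the same request could be issued from any language or tool. To actually send it from Python you would pass the request object to urllib.request.urlopen against a running tracking server.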


How well did you do?

0-3 correct: You’re just starting out with Spark. Getting to know more about the project could help you out.
4-6 correct: You’re pretty solid in your Apache Spark trivia, but you still might need to refresh your knowledge.
7-9 correct: Nice! You really know your stuff!
10 correct: You are an Apache Spark professional.

Whatever your score, Spark+AI Summit Europe will help you keep up with the latest projects and developments for Apache Spark. The event is taking place from 2nd October to 4th October 2018 at London’s Excel Centre. For more information, visit here.



David Wyatt is Vice President EMEA at Databricks, responsible for the company’s growth and development in Europe. Prior to Databricks, he was Vice President EMEA at MuleSoft.
