MOA: Machine learning for the Internet of Things
Machine learning may sound futuristic, but it’s not. Speech recognition systems such as Cortana and search in e-commerce systems have already shown us the benefits and challenges that go hand in hand with this technology. In our machine learning series, we introduce several tools that make all this possible. Next stop: MOA, open source software for machine learning and data mining on data streams in real time.
This article is part of our machine learning series. Our next expert is Albert Bifet, co-leader of MOA and author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. In this interview, he talks about what MOA is for and shows how to use it.
JAXenter: What is the idea behind MOA?
Albert Bifet: MOA is open source software for machine learning and data mining on data streams in real time. In this setting, data arrives continuously and we need to process it very efficiently, in terms of both time and memory. One example is the data produced by Internet of Things (IoT) devices. We also need to consider that the data may change over time, so we need to update our models continuously.
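To make the idea of continuous updating under tight time and memory limits concrete, here is a minimal Scala sketch. It is not MOA code: `FadingMean`, its `decay` parameter, and the synthetic stream are hypothetical names chosen for illustration. It keeps a running mean that gives older values exponentially less weight, so each arriving value costs constant time and memory, and the estimate follows the data when it changes.

```scala
// Hypothetical helper, not part of MOA: a mean that gradually forgets old data.
final class FadingMean(decay: Double) {
  require(decay > 0.0 && decay <= 1.0, "decay must be in (0, 1]")

  private var weightedSum = 0.0
  private var totalWeight = 0.0

  // Constant time and memory per arriving value; nothing is stored.
  def add(x: Double): Unit = {
    weightedSum = decay * weightedSum + x
    totalWeight = decay * totalWeight + 1.0
  }

  // Current estimate; 0.0 before any value has arrived.
  def value: Double =
    if (totalWeight == 0.0) 0.0 else weightedSum / totalWeight
}

object FadingMeanDemo {
  def main(args: Array[String]): Unit = {
    val mean = new FadingMean(0.9)
    // Synthetic stream: 200 readings of 1.0, then an abrupt change to 5.0.
    val readings = Iterator.fill(200)(1.0) ++ Iterator.fill(200)(5.0)
    readings.foreach(x => mean.add(x))
    println("estimate after the change: " + mean.value)
  }
}
```

With `decay` set to 1.0 the class computes an ordinary mean that never forgets; smaller values adapt faster to change but give noisier estimates.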
JAXenter: Tell us more about what’s under this project’s hood: What language do you use?
Albert Bifet: MOA is developed in Java and can easily be used with Weka and ADAMS. MOA can be used for large evolving datasets and data streams.
JAXenter: Can you give us an example?
Albert Bifet: It is very easy to use MOA objects inside Scala. Let’s see an example, using the Scala interactive interpreter. First, we need to start it, telling it where the MOA library is:
    scala -cp moa.jar
    Welcome to Scala version 2.9.2.
    Type in expressions to have them evaluated.
    Type :help for more information.
Let’s run a very simple experiment: using a decision tree (Hoeffding Tree) with data generated from an artificial stream generator (RandomRBFGenerator).
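As background on the learner named here: a Hoeffding Tree decides when a leaf has seen enough instances to split by using the Hoeffding bound, which says that after n observations of a quantity with range R, the observed mean is within epsilon of the true mean with probability 1 - delta. The sketch below only evaluates that formula; it does not call MOA, and the object name `HoeffdingBound` and the parameter values are illustrative assumptions.

```scala
object HoeffdingBound {
  // epsilon = sqrt(R^2 * ln(1 / delta) / (2 * n))
  def epsilon(range: Double, delta: Double, n: Long): Double = {
    require(range > 0.0, "range must be positive")
    require(delta > 0.0 && delta < 1.0, "delta must be in (0, 1)")
    require(n > 0, "n must be positive")
    math.sqrt(range * range * math.log(1.0 / delta) / (2.0 * n))
  }

  def main(args: Array[String]): Unit = {
    val range = 1.0   // information gain with two classes lies in [0, 1]
    val delta = 1e-7  // illustrative confidence parameter
    // The bound tightens as more instances reach the leaf.
    for (n <- Seq(200L, 2000L, 20000L))
      println("n = " + n + ", epsilon = " + epsilon(range, delta, n))
  }
}
```

In a Hoeffding Tree, a leaf is split once the observed gap between the two best candidate attributes exceeds epsilon, so the tree can commit to a split without storing the instances it has seen.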
We start by importing the classes that we need and defining the stream and the learner.
    scala> import moa.classifiers.trees.HoeffdingTree
    import moa.classifiers.trees.HoeffdingTree

    scala> import moa.streams.generators.RandomRBFGenerator
    import moa.streams.generators.RandomRBFGenerator

    scala> val learner = new HoeffdingTree();
    learner: moa.classifiers.trees.HoeffdingTree =
    Model type: moa.classifiers.trees.HoeffdingTree
    model training instances = 0
    model serialized size (bytes) = -1
    tree size (nodes) = 0
    tree size (leaves) = 0
    active learning leaves = 0
    tree depth = 0
    active leaf byte size estimate = 0
    inactive leaf byte size estimate = 0
    byte size estimate overhead = 0
    Model description:
    Model has not been trained.

    scala> val stream = new RandomRBFGenerator();
    stream: moa.streams.generators.RandomRBFGenerator =
Now, we need to initialize the stream and the classifier:
    scala> stream.prepareForUse()

    scala> learner.setModelContext(stream.getHeader())

    scala> learner.prepareForUse()
Now, let’s load an instance from the stream, and use it to train the decision tree:
    scala> import weka.core.Instance
    import weka.core.Instance

    scala> val instance = stream.nextInstance()
    instance: weka.core.Instance = 0.210372,1.009586,0.0919,0.272071,0.450117,0.226098,0.212286,0.37267,0.583146,0.297007,class2

    scala> learner.trainOnInstance(instance)
And finally, let’s use it to make a prediction:
    scala> learner.getVotesForInstance(instance)
    res9: Array[Double] = Array(0.0, 0.0)

    scala> learner.correctlyClassifies(instance)
    res7: Boolean = false
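To read the output above: the array returned by getVotesForInstance holds one vote per class index, and the predicted class is the index with the largest vote. The following sketch does not call MOA. `VotesDemo` and `predictedClass` are hypothetical names, and it assumes that ties go to the lowest index and that "class2" corresponds to class index 1. Under those assumptions, an all-zero vote array predicts index 0, which would be consistent with correctlyClassifies returning false for an instance labelled class2.

```scala
object VotesDemo {
  // Index of the largest vote; ties go to the lowest index (an assumption).
  // Returns -1 when there are no votes at all.
  def predictedClass(votes: Array[Double]): Int =
    if (votes.isEmpty) -1 else votes.indexOf(votes.max)

  def main(args: Array[String]): Unit = {
    val votes = Array(0.0, 0.0) // the votes shown in the session above
    val trueClass = 1           // assumed index of "class2"
    val predicted = predictedClass(votes)
    println("predicted class index: " + predicted)
    println("correct: " + (predicted == trueClass))
  }
}
```

Repeating the predict-then-train step for every instance of the stream and counting the correct predictions gives a running accuracy estimate without needing a separate test set.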
Read the entire example here.
JAXenter: What does the future hold for MOA?
Albert Bifet: We would like to continue adding more methods to MOA, and continue helping developers and researchers to create better algorithms.
JAXenter: What is so fascinating about machine learning?
Albert Bifet: That in the near future, it can be used to learn new concepts from datasets that are too large to be processed today.
JAXenter: Do you think machines will someday take over the world? Are those fears well-founded?
Albert Bifet: No, this is only science-fiction at the moment :)
JAXenter: What are the top three blogs/movies/books that come to your mind when someone says they would like to know more about machine learning?
Albert Bifet: I would recommend the following books: The Master Algorithm by Pedro Domingos, and Data Mining: Practical Machine Learning Tools and Techniques by Ian Witten, Eibe Frank, and Mark Hall. I would also recommend the website/blog www.kdnuggets.com.
We asked Albert Bifet to finish the following sentences:
In 50 years’ time, machine learning will still be in its infancy.
If machines become more intelligent than humans, humans will live happier and longer lives.
Compared to a human being, a machine will never enjoy doing nothing.
Without the help of machine learning, mankind would never (be able to) cure cancer.
Thank you very much!
Take a look at our machine learning initiative:
- The vital role of CognitiveJ in Machine Learning
- Machine intelligence vs. machine learning
- Apache MLlib — Making practical machine learning easy and scalable
- “Python is the most popular programming language today for machine learning”
- Weka — An interface to a collection of machine learning algorithms in Java