Diving deep into Machine Learning

Weka — An interface to a collection of machine learning algorithms in Java

Eibe Frank

Machine learning may sound futuristic, but it's not. Speech recognition systems such as Cortana and search in e-commerce systems have already shown us the benefits and challenges that go hand in hand with these systems. In our machine learning series we will introduce you to several tools that make all this possible. Next stop: Weka.

This article is part of a Machine Learning series. Our fourth expert is Dr. Eibe Frank, Associate Professor (Computer Science) at the University of Waikato, New Zealand. In this article, he talks about Weka and reveals what’s under its hood. 

What is Weka?

The idea behind Weka was to provide a uniform interface to a collection of machine learning algorithms in Java. This includes a graphical user interface, a command-line interface, and an API.

Weka is implemented in Java, but there are packages for Weka that enable use of code written in Python, and R can also be used from Weka. It is also possible to script Weka using Groovy or Jython. Development of Weka was started in 1997, when Java was still very young (and slow). The latest version, Weka 3.8, requires Java 7 or later. Weka’s strength lies in classification, so applications that require automatic classification of data can benefit from it, but it also supports clustering, association rule mining, time series prediction, feature selection, and anomaly detection.

How to use Weka in your Java code

Here is an example of how to use Weka in your Java code. Another example can be found here — it trains a naive Bayes classifier on a dataset stored in an ARFF file. ARFF is Weka’s default data format but there is support for a number of other data formats as well, including CSV files. It is also possible to extract data from a database.
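As a rough illustration of the kind of code described above, the following minimal sketch trains a naive Bayes classifier through Weka's Java API. It assumes weka.jar (3.8) is on the classpath. The class name NaiveBayesSketch and the tiny in-memory dataset are made up for this illustration; they are not the examples the article links to.

```java
import java.io.StringReader;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;

public class NaiveBayesSketch {

    // A tiny, made-up dataset in ARFF format (hypothetical, for illustration only).
    static final String TOY_ARFF = String.join("\n",
        "@relation toy-weather",
        "@attribute outlook {sunny,overcast,rainy}",
        "@attribute windy {TRUE,FALSE}",
        "@attribute play {yes,no}",
        "@data",
        "sunny,FALSE,no",
        "sunny,TRUE,no",
        "overcast,FALSE,yes",
        "rainy,FALSE,yes",
        "rainy,TRUE,no",
        "overcast,TRUE,yes",
        "sunny,FALSE,yes",
        "rainy,FALSE,yes");

    // Parse the ARFF text and mark the last attribute as the class to predict.
    public static Instances loadToyData() throws Exception {
        Instances data = new Instances(new StringReader(TOY_ARFF));
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    // Train a naive Bayes model on the given data.
    public static NaiveBayes train(Instances data) throws Exception {
        NaiveBayes model = new NaiveBayes();
        model.buildClassifier(data);
        return model;
    }

    public static void main(String[] args) throws Exception {
        Instances data = loadToyData();
        NaiveBayes model = train(data);

        // Classify the first instance and translate the class index to its label.
        double predicted = model.classifyInstance(data.instance(0));
        System.out.println("Predicted: " + data.classAttribute().value((int) predicted));

        // Estimate performance with 4-fold cross-validation on a fresh classifier.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new NaiveBayes(), data, 4, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

To read from a file instead, the data can be loaded with new DataSource("data.arff").getDataSet() from weka.core.converters.ConverterUtils.DataSource, where data.arff is a placeholder path.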

A more interesting example is this one. It applies text classification by representing user-specified text using the so-called bag-of-words model. The bag-of-words representation is obtained by applying Weka’s StringToWordVector filter. The decision tree learner J48 is then run on the bag-of-words data.
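To make the bag-of-words idea concrete, here is a small plain-Java sketch that turns texts into word-count vectors over a shared vocabulary. It only illustrates the representation. It is not Weka's implementation: the StringToWordVector filter has its own tokenizers and options, which this sketch does not reproduce. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class BagOfWordsSketch {

    // Split text into lower-case word tokens; a deliberately simple tokenizer.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect every distinct word across all documents, in sorted order.
    public static List<String> vocabulary(List<String> docs) {
        TreeSet<String> vocab = new TreeSet<>();
        for (String doc : docs) {
            vocab.addAll(tokenize(doc));
        }
        return new ArrayList<>(vocab);
    }

    // Represent one document as word counts; word order in the text is discarded.
    public static int[] vectorize(String doc, List<String> vocab) {
        int[] counts = new int[vocab.size()];
        for (String t : tokenize(doc)) {
            int i = vocab.indexOf(t);
            if (i >= 0) {
                counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Weka is written in Java",
            "Java is fast, Java is everywhere");
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab);
        for (String doc : docs) {
            System.out.println(Arrays.toString(vectorize(doc, vocab)));
        }
    }
}
```

Each text becomes a fixed-length numeric vector, which is the kind of tabular input a learner such as J48 can then be trained on.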

Plans for Weka

Weka 3.8 has a package management system and we expect that Weka will now primarily be expanded through the contribution of new packages, offering new learning algorithms and visualization tools. We are not currently planning any major changes to the base system.

Machine learning: Resources for getting started

A gentle introduction to practical machine learning is “Data Mining: Practical Machine Learning Tools and Techniques” by I. H. Witten et al. A fourth edition of this book will come out later this year, with material on deep learning and probabilistic modeling. Disclaimer: I am one of the co-authors.

Will machines rule the world?

As far as I can tell, nobody is even close to solving the mystery of consciousness. As long as machines only do what they are told, whether or not this is based on optimizing performance using machine learning, we should be safe, provided we can prevent the ruling elites from abusing the machinery.

We asked Eibe Frank to finish the following sentences:

In 50 years’ time machine learning will be ubiquitous.
If machines become more intelligent than humans, the latter will need to find new ways of occupying themselves because many jobs will disappear.
Compared to a human being, a machine will never … A human being is a biological machine, so I cannot complete this sentence.
Without the help of machine learning, mankind would never (be able to) exploit all the useful information in the enormous amount of data being collected today.



Eibe Frank

Dr. Eibe Frank is an Associate Professor (Computer Science) at the University of Waikato, New Zealand. He is involved in the development of the WEKA software and programmed the first components for WEKA as a PhD student in 1997. His main field of activity is machine learning (and its applications).
