Deep Learning: It’s time to democratize technology
Deep learning is probably one of the hottest topics in the field of software development at the moment. We spoke with Shirin Glander and Uwe Friedrichsen, who are both giving an introduction to deep learning at JAX 2018, about its future prospects.
Deep Learning = Software 2.0?
JAXenter: Hello Shirin, hello Uwe! We would like to talk to you a little bit about deep learning. Some experts are predicting the dawn of a new era, which will also lead to the development of a wholly new set of software. What do you think of such predictions? What would this software 2.0 be like?
Uwe Friedrichsen: I think that right now the magic crystal ball is still very cloudy; a prognosis is still difficult to make from my point of view.
What distinguishes deep learning (DL) from many other traditional AI approaches is the fact that it can be used quite successfully in areas which require a certain kind of “intuition”. In other words, DL can be used for tasks which are easy for humans while they are quite difficult for the traditional AI methods (for example, the recognition of objects in images).
I think that DL has the potential to have a similar effect on white-collar workers, including software developers, as robots once had on blue-collar workers. However, I think it is still too difficult to predict whether this will happen and how far-reaching the effects will be, since factors beyond pure technology, such as political, economic and social developments, will play a much more decisive role in this scenario.
Shirin Glander: These predictions about the seemingly endless ways in which AI, and deep learning in particular, will change our world are not limited to software development. They stem from the fact that in some cases deep learning has been surprisingly successful at solving tasks that were previously considered impossible for a computer, for example “inventing” new moves in Go or creating images, texts and videos so realistic that people cannot distinguish them from real ones.
Applied to software development, these concepts could mean, for instance, that an algorithm generates code or recognizes problematic code during development that would otherwise only surface as errors in the test stages. And, analogous to the game of Go, such an algorithm could also develop new and more effective ways of writing code.
These possibilities are not that far away. Of course, we still don’t know whether they can be implemented in practice. But I do think that software development will be supported and supplemented by AI in the future.
JAXenter: Well now, the field of AI wasn’t established just yesterday. As far back as the 1960s, there have been some wild expectations associated with the field of AI which were fulfilled only in the rarest of cases. How does the current wave of machine learning differ from previous AI approaches?
Uwe Friedrichsen: It differs in that expectations have become even wilder (laughs!). No, but seriously, from my point of view the most significant differences are the much greater computing power and the much larger amount of data available for learning.
Shirin Glander: Exactly, this is the essential difference. To put it simply, many ideas and concepts were developed in theory back then, but only today’s computing power and storage capacity make it possible to use them in a way that lets their ingenuity and performance shine through. Of course, new algorithms are still being developed today, but they are all actually based on what was conceptually developed years ago.
Uwe Friedrichsen: Yes, I can still remember the wave of connectionism of the 80s and 90s quite well. During that time, it took almost forever to train even a simple three-layer neural net with only a few thousand examples. And of course, it was only successful if you painstakingly extracted all suitable characteristics from the raw data yourself and found a good initialization for the parameters to be learned. This felt like searching for the proverbial needle in a haystack.
The DL networks of today are able to find the appropriate characteristics themselves. They are shown the raw data and, basically, the first layers of the network do nothing but extract the best characteristics for the task. The correct configuration of the so-called hyperparameters, the parameters that control the behavior of the network, such as the number of layers or the learning rate, is still not trivial. This is why one is now moving towards a kind of “meta-learning”: one tries to learn the best hyperparameter settings for a class of problems by machine instead of painstakingly finding and optimizing them manually.
But this requires much more computing power than we had at the beginning of the 90s. Today, with almost unlimited computing power available via the cloud, this is no longer a technical problem. Thanks to the cloud and Big Data, neither is the storage and provision of the extremely large amounts of data that enable much more accurate training of the networks.
Today we can move into fields of machine learning (ML) and DL that were simply closed to us previously due to a lack of computing resources. This is also where I currently see one of the greatest dangers in AI, especially in the DL sector: even if it is no longer a technical problem, computing power of this magnitude costs a lot of money, money that only a few very large companies are able or willing to raise. With a few exceptions, such as China, little is reported about state funding or the like.
This means that not only applied research but also basic research in this field is increasingly concentrated in a relatively small number of companies. This is understandable: if I “just” need 10,000 GPUs to validate my hypothesis in a finite time, this is no problem for Google, Amazon or a comparable company. At a university, it is difficult at best and usually simply not possible.
Shirin Glander: The situation is similar in terms of data. In order to get a good DL model, we need a large amount of data, which must also be prepared accordingly, for instance by labeling. Here, too, companies like Google and Facebook have a data monopoly, which gives them an enormous advantage. It is no coincidence that so many of today’s popular pre-trained models and algorithms, such as GoogLeNet/Inception, DeepDream and Prophet, come from such companies. But even here, only a part, and probably a small part, has been made open source and is available to the public. We can only imagine what else is being developed behind the scenes.
Uwe Friedrichsen: And if we link this situation to my hypothesis that DL has the potential to significantly affect professional and private areas that were previously “reserved” for people, the danger becomes clear. History teaches us that powerful tools in the hands of a few have rarely produced good results for the masses. In my view, we urgently need to work towards a greater democratization of technology.
How to start with Deep Learning
JAXenter: Deep learning is not known for being particularly easily accessible. And in this context one question is asked over and over again: How much math do I actually need in order to be able to use deep learning techniques?
Uwe Friedrichsen: I will be brief here since my last answer was quite long. You don’t need to know a lot of math in order to use DL in a simple manner. The existing ecosystem allows easy, largely math-free access, so you can simply try out what works and what does not. But if you want to get to know the subject in more depth, I do think there is no way around refreshing your mathematics skills a little.
Shirin Glander: As a data scientist who has reasonably good mathematical knowledge but is far from being a mathematician, I can confirm that. To apply the existing libraries and algorithms, it is sufficient to be able to judge which algorithm suits which problem and then apply it according to the documentation of the selected framework, such as TensorFlow.
Uwe already mentioned so-called hyperparameter tuning, the automated adjustment of the many knobs that can be turned in a neural network in order to achieve the best possible result. With this tuning, one could theoretically get quite far with a machine learning model by simply trying out various hyperparameter combinations without any mathematical knowledge. However, this completely “blind” approach costs more time and computing power, which can also mean considerably higher financial costs, for example if you train larger models on cloud instances.
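The “blind” trial-and-error tuning described here can be sketched as a random search. The following is only a minimal illustration under stated assumptions, not how any particular framework implements tuning: the tiny logistic regression, the synthetic data, the function names (`make_data`, `train`, `random_search`) and the search ranges are all invented for this example, which uses only the Python standard library.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, rng):
    """Synthetic 2-D points from two noisy clusters, labeled 0 or 1 (made-up data)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def train(data, learning_rate, epochs):
    """Fit a logistic regression with plain stochastic gradient descent."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            error = sigmoid(w1 * x1 + w2 * x2 + b) - y
            w1 -= learning_rate * error * x1
            w2 -= learning_rate * error * x2
            b -= learning_rate * error
    return w1, w2, b


def accuracy(model, data):
    """Fraction of points whose predicted class matches the label."""
    w1, w2, b = model
    hits = sum((w1 * x1 + w2 * x2 + b > 0) == (y == 1) for (x1, x2), y in data)
    return hits / len(data)


def random_search(n_trials=10, seed=0):
    """Try random hyperparameter combinations and keep the best one."""
    rng = random.Random(seed)
    train_set = make_data(200, rng)
    valid_set = make_data(100, rng)
    trials = []
    for _ in range(n_trials):
        # Hypothetical search space: the ranges are arbitrary choices for this sketch.
        params = {
            "learning_rate": 10 ** rng.uniform(-4, 0),
            "epochs": rng.choice([1, 5, 20]),
        }
        model = train(train_set, **params)
        trials.append({**params, "accuracy": accuracy(model, valid_set)})
    best = max(trials, key=lambda t: t["accuracy"])
    return best, trials


best, trials = random_search()
print(best)
```

Each trial retrains the model from scratch, which is why this approach quickly becomes expensive for larger models.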
JAXenter: One difference to the past is certainly that some deep learning projects are now available which in principle can be used by everyone. Which tools, frameworks or libraries can you recommend?
Shirin Glander: TensorFlow and PyTorch are particularly popular at the moment. But there are also many others like Caffe, Theano, CNTK, MXNet, etc. However, beginners do not need a special deep learning tool. Most of the more general machine learning libraries can also be used to train neural networks, and they are more flexible in the choice of algorithms. This matters because deep learning is not always the best method for training a model; other algorithms such as Random Forest, Gradient Boosting or Naive Bayes are also worth a comparison! The two clear favorites among such general machine learning libraries are caret for R and scikit-learn for Python.
Uwe Friedrichsen: I would perhaps add Keras, as it offers a uniform facade for various DL frameworks such as TensorFlow, CNTK, Theano and MXNet. You can use Keras to do DL locally on your computer, but also as a frontend for the offerings of the big cloud providers. If you want to work solely on the JVM, you should perhaps take a look at Eclipse Deeplearning4J. Nevertheless, I recommend focusing more on Python at the moment. Its ML/DL ecosystem is simply incomparably larger and better.
Deep learning obstacles
JAXenter: In your JAX session you will also deal with typical obstacles and problems of using deep learning. Can you give us an example?
Shirin Glander: Obstacles and problems can arise on several levels. There is, of course, the purely technical side, but social and ethical/moral implications should not be ignored when developing an algorithm either. From a technical point of view, the first thing we need is the appropriate data and the necessary computing power. Another frequent problem is that the data, even if it is available in sufficient quantity, needs a so-called label for the most common application, classification. Classification refers to machine learning on the basis of historical data for which we know the result; the machine then learns a mathematical representation of the data by trying to reproduce the known result as accurately as possible. These known results are described in the so-called label. For example, if we train a model that recognizes cats in pictures, we need a lot of pictures that show cats and a lot that show other, unrelated things, and each of these pictures must be manually labeled “cat” or “no cat”. This is of course extremely painstaking!
Fortunately, there is now a whole series of pre-trained models for image recognition, such as Google’s Inception trained on the ImageNet dataset, that you can use for free as a starting point for training your own models. With such pre-trained models, you can get ahead with only a little data. However, for many other, more specialized applications this is not the case, and the data itself is the bottleneck.
Another problem may be that deep learning models are so complex that we can no longer understand their decisions and what exactly they have learned, which is why we also call them black boxes. From a purely technical point of view, it is of course not necessary to understand them, as long as the result is correct. Nevertheless, I would argue that in most cases it makes sense (or is even necessary) to use techniques that give us at least an approximate understanding. If we understand our models better, we can, among other things, catch possible errors at an early stage, for example by recognizing whether a hidden bias in our training data leads our model to learn the wrong characteristics in an image.
The most striking example of this is a model that was very good at distinguishing tanks from civilian vehicles in training images. However, when this model was applied to new images, almost all civilian vehicles were also recognized as tanks. It turned out that most of the tanks in the training images had been photographed in bright sunlight, while the civilian vehicles had a much darker background. So the model had learned not the shapes of the vehicles but the brightness of the background.
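The effect can be reproduced with a toy simulation. The sketch below is purely illustrative and not a reconstruction of the original experiment: each “image” is reduced to two made-up numbers, `brightness` and `shape`, and the “model” is a simple one-feature threshold rule (a decision stump) rather than a neural network. All names and values are assumptions chosen for this example.

```python
import random

FEATURES = ("brightness", "shape")


def make_image(label, bright_background, rng):
    """Reduce one hypothetical image to two made-up features."""
    brightness = rng.gauss(0.8 if bright_background else 0.2, 0.05)
    # Shape is what the model *should* use, but here it is a noisy signal.
    shape = rng.gauss(0.6 if label == "tank" else 0.4, 0.3)
    return {"brightness": brightness, "shape": shape, "label": label}


def predict(image, feature, threshold):
    """Label an image by comparing a single feature to a threshold."""
    return "tank" if image[feature] > threshold else "civilian"


def fit_stump(images):
    """Choose the single feature whose threshold best fits the training labels.

    Assumes tanks have the higher mean value for every feature (true here).
    """
    best = None
    for feature in FEATURES:
        tanks = [i[feature] for i in images if i["label"] == "tank"]
        others = [i[feature] for i in images if i["label"] != "tank"]
        threshold = (sum(tanks) / len(tanks) + sum(others) / len(others)) / 2
        hits = sum(predict(i, feature, threshold) == i["label"] for i in images)
        accuracy = hits / len(images)
        if best is None or accuracy > best[2]:
            best = (feature, threshold, accuracy)
    return best


rng = random.Random(42)
# Training set with a hidden bias: every tank is sunlit, every civilian vehicle is not.
training = [make_image("tank", True, rng) for _ in range(100)]
training += [make_image("civilian", False, rng) for _ in range(100)]
feature, threshold, train_accuracy = fit_stump(training)

# New images: civilian vehicles photographed in bright sunlight.
new_images = [make_image("civilian", True, rng) for _ in range(100)]
false_tanks = sum(predict(i, feature, threshold) == "tank" for i in new_images)

print("feature used:", feature)
print("training accuracy:", train_accuracy)
print("sunlit civilian vehicles labeled as tanks:", false_tanks, "of", len(new_images))
```

Because brightness separates the training labels far more cleanly than the noisy shape signal, the rule latches onto brightness and then fails on sunlit civilian vehicles, even though its training accuracy looks excellent.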
Today, there are some new approaches that help to avoid such cases by showing us which part of our data has led to a decision, such as LIME (Local Interpretable Model-agnostic Explanations). But a better understanding of the decisions can also help us to trust them more. This is of course particularly important when we use models in medicine, for example to detect breast cancer on mammography images or to diagnose other diseases. But business decision-makers generally also want to know whether they really should make a possibly risky decision based on a decision made by a machine.
This issue will become even more relevant from 25 May 2018, when the new EU General Data Protection Regulation (GDPR) becomes effective. Articles 13, 14, 15 and 22 in particular are currently under discussion, because they give everyone affected by algorithm-based decisions the right to be informed and to receive an explanation of the logic behind them.
Another problem that affects all models that learn from historical data is fairness and social bias. If the data used for learning contains a bias because people of certain skin colors or genders were socially disadvantaged, then the models naturally learn this bias as well. This leads to a self-fulfilling prophecy, for example that people with darker skin will end up in prison more often and are therefore judged less suitable for a suspended sentence than those with white skin. I mention this only briefly, but it actually concerns more use cases of deep learning than one would expect, and we should be aware of that!
Uwe Friedrichsen: I think these were enough obstacles for now (laughs!).
JAXenter: What is the core message of your session that everyone should take home with them?
Uwe Friedrichsen: Well, our initial goal is to demystify the topic of DL without trivializing it. We want to convey to the participants that although it is a broad subject, and that is no secret, it is not a secret, highly complex art reserved for only a few, and that one can and should dare to approach it as a “normal” IT person.
Shirin Glander: Exactly, I want to show how you can approach the topic of deep learning practically, even as a complete beginner. Thanks to the many freely available frameworks and datasets, you can try out a lot of things. And because there is also a very large community around this topic, it is very easy to find instructions with code examples if you have questions. This is why I’ll show a few ways to build your first neural network relatively easily or to teach a computer to play simple games, and I hope that some of the participants will try it on their own after our session.
Uwe Friedrichsen: I am glad about every person who looks into DL, shows interest in it and thus advances the democratization of the topic a little bit, especially against the backdrop of the threat of its monopolization by a few companies.
And that’s why my core message is: DL is no witchcraft. Take a look, participate and develop!
Thank you very much!