Machine learning for you and me

Fitting models with AWS AI

Cyrus Vahid
© Shutterstock / Jirsak

Amazon has been using predictive models for decades. Now, they want to put machine learning in the hands of every developer with their AWS AI division. In this article, Cyrus Vahid, an AI specialist from AWS, explains some basic models for deep learning and goes over how Amazon has a service for every use case.

Machine learning, in short, is software that learns patterns and rules from experience (a.k.a. data), and improves on them, without those rules having to be implemented explicitly. In machine learning, instead of implementing rules, we fit a model. Let us look at a simple example of a linear model. If our data points are scattered as in the following distribution, we can fit a model to the data, shown here as the green line.


Figure 1: Linear regression

A 2D line can be represented by the following equation:

f(x) = ax + b

Here x is the input data. Fitting a model means finding the parameters a and b for which f best represents the data, including data points it has not yet seen. This is what gives a model the ability to generalize beyond direct experience.
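As a minimal sketch of what fitting means here, the following code estimates a and b for a line of the form f(x) = ax + b using ordinary least squares. The article does not name a fitting method, so least squares is an assumption, and the function name fit_line and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimate of a and b in f(x) = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    a = covariance / variance
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


# Hypothetical, slightly noisy observations scattered around a line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

a, b = fit_line(xs, ys)


def f(x):
    """The fitted model, usable on inputs that were not in the data."""
    return a * x + b


print(f"a = {a:.2f}, b = {b:.2f}, f(5.0) = {f(5.0):.2f}")
```

Calling f on an x that was not among the observations is the generalization step: the model answers from the fitted parameters rather than from a stored rule.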

Amazon has been building such predictive models since the mid-1990s. Over the years, ML has become a core component of Amazon and consequently of AWS. The mission of the AWS AI team is to “Put machine learning in the hands of every developer and data scientist”.

To achieve this goal, AWS services cover a large set of use cases including Computer Vision, Natural Language Processing, pre-trained algorithms for most common use cases, ML tools and frameworks, GPU, and FPGA.

Deep learning

Complex tasks such as vision often require solving differential equations in high-dimensional spaces. Such equations generally cannot be solved analytically. Instead, these problems are addressed through function approximation, in which we find a function that produces the same results to within an acceptable error threshold ϵ. More formally, if f is our high-dimensional function, we want to find a function F so that, for every input x of interest:

|F(x) − f(x)| < ϵ

Deep Neural Networks are proven universal function approximators and are, thus, capable of solving high dimensional equations through approximation given that certain conditions are met.
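To make the idea of approximation within an error threshold concrete, here is a small sketch. It builds a network with one hidden layer of ReLU units whose weights are set by hand (not learned) so that it interpolates a target function, then checks that the largest gap between the network F and the target f stays below a chosen ϵ. The target function (sine), the interval, the number of units, and ϵ = 0.01 are all assumptions chosen for illustration.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_approximator(f, lo, hi, segments):
    """Return F(x) = f(lo) + sum_k c_k * relu(x - x_k).

    This is a one-hidden-layer ReLU network with hand-set weights that
    matches f at evenly spaced knots and is linear in between.
    """
    step = (hi - lo) / segments
    knots = [lo + k * step for k in range(segments)]
    # Slope of the straight line joining f at the ends of each segment.
    slopes = [(f(x + step) - f(x)) / step for x in knots]
    # Output weight of hidden unit k is the change in slope at knot k.
    weights = [slopes[0]] + [
        slopes[k] - slopes[k - 1] for k in range(1, segments)
    ]
    bias = f(lo)

    def F(x):
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots))

    return F


def max_error(f, F, lo, hi, samples=1000):
    """Largest |F(x) - f(x)| over an evenly spaced grid on [lo, hi]."""
    points = (lo + (hi - lo) * i / samples for i in range(samples + 1))
    return max(abs(F(x) - f(x)) for x in points)


epsilon = 0.01
F = build_relu_approximator(math.sin, 0.0, math.pi, segments=32)
error = max_error(math.sin, F, 0.0, math.pi)
print(f"max error {error:.5f} is below epsilon: {error < epsilon}")
```

In practice the weights are found by training rather than constructed by hand, and the inputs are high-dimensional, but the acceptance criterion is the same: the error has to stay under the threshold on the inputs that matter.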

As in most fields of science and engineering, nature is often the best source of inspiration for solving a complex problem, and the human brain was the natural place to look. Our brain encodes information in a distributed network of neurons linked through connections called synapses. When a stimulus evokes a recall, all related components of a specific memory are called up together and the information is reconstructed. Connections between neurons have different strengths, which control how much a piece of information participates in the recall.

In the most abstract way, neural networks mimic this process by encoding data in floating point vector representations, connecting these nodes to one another, and assigning a weight to each connection to represent the importance of a node for activation.

A deep neural network has an input layer, an output layer, and one or more “hidden” layers, which act as intermediary computational nodes. The value of each node is computed by multiplying each input to the node by its associated weight and adding up the results, that is, by computing a weighted sum of all the inputs to the node.

Figure 2: Multilayer Perceptron

The value of hidden node h11 is computed as:

h11 = Φ(I1 × w11 + I2 × w21 + ⋯ + In × wn1)

in which Φ is a non-linear function called the activation function. More generally, if the input to a node is X = {x1, x2, …, xn}, the associated weight vector is W = {w1, w2, …, wn}, and b is a bias term, then:

f(X, W) = Φ(b + ∑i wi × xi)
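The node computation can be sketched in a few lines. This example assumes the sigmoid as the activation function Φ, which the article does not specify, and the names node_value and layer_values, like the input and weight values, are made up for illustration.

```python
import math


def sigmoid(z):
    """One common choice for the non-linear activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def node_value(inputs, weights, bias, activation=sigmoid):
    """Activation applied to the bias plus the weighted sum of inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(bias + weighted_sum)


def layer_values(inputs, weights_per_node, biases, activation=sigmoid):
    """Values of every node in one layer, each with its own weights."""
    return [
        node_value(inputs, weights, bias, activation)
        for weights, bias in zip(weights_per_node, biases)
    ]


# Hypothetical input vector and weights for a hidden layer of two nodes.
inputs = [1.0, 2.0]
hidden = layer_values(
    inputs,
    weights_per_node=[[0.5, -0.25], [0.1, 0.3]],
    biases=[0.0, -0.2],
)
print(hidden)
```

Stacking calls to layer_values, with the output of one layer used as the input of the next, gives the forward pass of a multilayer perceptron.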

The learning process involves: (1) computing output values layer by layer in the forward pass, (2) comparing the computed output to the output we were expecting, (3) computing how much the computed output differs from the expected output, and (4) adjusting the weights in the backward pass by distributing blame on the basis of the difference calculated in step (3).
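The four steps can be sketched on the smallest possible network: a single sigmoid node trained with gradient descent. Everything specific here is an assumption made for illustration, including the squared-error loss, the learning rate, the number of epochs, and the logical OR data set. In a network with hidden layers, step (4) applies the chain rule to pass the blame further back, layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))


def train_neuron(samples, epochs=2000, learning_rate=1.0):
    """Train one sigmoid node with per-sample gradient descent."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # (1) Forward pass: compute the output for this sample.
            output = predict(weights, bias, inputs)
            # (2) and (3) Compare with the expected output and measure
            # the difference.
            error = output - target
            # (4) Backward pass: gradient of 0.5 * error**2 with respect
            # to the weighted sum, then blame each weight in proportion
            # to the input it carried.
            delta = error * output * (1.0 - output)
            weights = [
                w - learning_rate * delta * x
                for w, x in zip(weights, inputs)
            ]
            bias -= learning_rate * delta
    return weights, bias


# Logical OR: inputs and the expected output for each.
OR_SAMPLES = [
    ((0.0, 0.0), 0.0),
    ((0.0, 1.0), 1.0),
    ((1.0, 0.0), 1.0),
    ((1.0, 1.0), 1.0),
]

weights, bias = train_neuron(OR_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in OR_SAMPLES]
print([round(p, 2) for p in predictions])
```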

Deep learning has become an essential part of machine learning in recent times, thanks to the development of GPUs that perform over 10^12 floating point operations per second (TFLOPS) and to an explosion in available data. These improvements have resulted in the development of powerful algorithms that solve complex problems such as translation.

On the software side, the mathematics of deep learning is now being abstracted in libraries and deep learning frameworks. Some of the most popular frameworks include:

  • TensorFlow
  • Apache MXNet
  • PyTorch
  • Caffe

Deep learning frameworks are built to exploit parallel processing on GPUs and to optimize training computation for the underlying hardware. Using these frameworks, a complex image classification problem can be reduced to a few hundred lines of code, without requiring detailed knowledge of the mathematics behind the scenes and while leaving most of the optimization to the framework. In short, deep learning frameworks turn a research problem into a programming task.

AWS provides a Deep Learning AMI that supports all the above-mentioned frameworks and more. The AMI is available as a community AMI and comes in Ubuntu and Amazon Linux flavors. Customers like ZocDoc use the Deep Learning AMI to build patient confidence using TensorFlow.

Deep learning use cases

Some of the most researched areas of deep learning are:

Computer Vision

Image classification, scene detection, face detection, and many more use cases for both videos and still pictures fall under this category. The application of computer vision in medical imagery and diagnosis is on the rise, while self-driving cars heavily make use of computer vision algorithms. There are several methods for a developer to benefit from application of computer vision in the products they develop without a requirement to implement models themselves.

The simplest method is to use the Amazon Rekognition API for images and videos. The service provides SDKs and a REST endpoint for various computer vision tasks.


Figure 3: Example for image labelling using Amazon Rekognition Java API
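The original Java listing is not reproduced here. As a rough Python counterpart, the sketch below labels an image stored in Amazon S3 through the detect_labels operation of the Rekognition client in boto3, the AWS SDK for Python. The helper name label_image and its default thresholds are assumptions made for illustration. The client is passed in as an argument so the function itself needs no credentials to be defined.

```python
def label_image(client, bucket, key, max_labels=10, min_confidence=75.0):
    """Return (label, confidence) pairs for an image stored in S3.

    `client` is expected to be a boto3 Rekognition client, for example
    boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    # Each entry in "Labels" carries a name and a confidence percentage.
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]
```

With boto3 installed and AWS credentials configured, a call would look like label_image(boto3.client("rekognition"), "my-bucket", "photos/beach.jpg"), where the bucket and object names are placeholders.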

If the API-based service is not sufficient, then there are further choices available to explore.

The next best choice is the Image Classification algorithm, one of the Amazon SageMaker built-in algorithms. With this algorithm, you can use your own dataset and labels to fit the model to your data. Training a model then takes no more than a few lines of code using the Amazon SageMaker Python SDK or the SageMaker APIs.


Figure 4: Image Classification in Amazon SageMaker

If this solution is still not sufficient, there is always the possibility of building your own algorithms or using a model zoo. Amazon SageMaker provides you with a fully managed, end-to-end, zero-setup ML environment, which you can use to develop your ML code, train your model, and publish the endpoint to an elastic environment based on Amazon Elastic Container Service. Training and hosting a model in Amazon SageMaker takes a single line of code per task using the Python SDK, and a few lines should you choose to use the SageMaker API. You can also develop your models from an existing code base, altering the model to fit your problem.

Using MXNet and gluon model_zoo, you have access to a variety of pre-trained models. You can simply import a model from model_zoo, train it with your data and perform inference. There are several available models in model_zoo.


Figure 5: Using model_zoo for a computer vision task

If you would still like to go deeper, you can start your model development from scratch. The following sample code builds a simple network for hand-written digit recognition using Apache MXNet and gluon.


Figure 6: Gluon sample code for MNIST

The final level of depth is to implement your own code, if you choose not to use MXNet or TensorFlow. You can still develop your model elsewhere, using your platform of choice, and host it on Amazon SageMaker.

Other notable use cases for ML are:

Natural Language Processing

AWS provides Translation, Conversational Chat Bot, Text to Speech, and Voice to Text (ASR) services as API services, while Amazon SageMaker built-in algorithms provide Topic Modeling, Word Embedding, and Translation algorithms.


Amazon SageMaker built-in algorithms include a state-of-the-art distributed factorization machine that can train across several GPU instances.


Amazon SageMaker built-in algorithms include a state-of-the-art forecasting algorithm called DeepAR. Using this algorithm, you can perform time-series prediction on your own dataset.

Other algorithms are implemented as part of the SageMaker built-in algorithms. I encourage you to refer to the documentation for further details.

ML-related AWS services


Amazon Rekognition Video can track people, detect activities, and recognize objects, faces, and more in videos. Its API is powered by computer vision models that are trained to accurately detect thousands of objects and activities, and to extract motion-based context from both live video streams and video content stored in Amazon S3. The solution can automatically tag specific sections of video with labels and locations (e.g. beach, sun, child), detect activities (e.g. running, jumping, swimming), recognize and analyze faces, and track multiple people, even if they are partially hidden from view in the video.

AWS DeepLens is a deep-learning enabled, fully programmable video camera. It can run sophisticated deep learning computer vision models in real-time and comes with sample projects, example code, and pre-trained models so developers with no machine learning experience can run their first deep learning model in a very short time.

Amazon SageMaker makes model building and training easier by providing pre-built development notebooks, popular machine learning algorithms optimized for petabyte-scale datasets, and automatic model tuning. It simplifies and accelerates the training process, automatically provisioning and managing the infrastructure to both train models and run inference to make predictions using these models.

Natural language processing

Amazon Translate uses neural machine translation techniques to provide highly accurate translation of text from one language to another. Currently it supports translation between English and six other languages (Arabic, French, German, Portuguese, Simplified Chinese, and Spanish), with many more to come this year.
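As a brief sketch of how such a service is called from code, the function below wraps the translate_text operation of the Amazon Translate client in boto3, the AWS SDK for Python. The helper name translate and its default language codes are assumptions made for illustration.

```python
def translate(client, text, source_language="en", target_language="de"):
    """Translate `text` and return the translated string.

    `client` is expected to be a boto3 Translate client, for example
    boto3.client("translate").
    """
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source_language,
        TargetLanguageCode=target_language,
    )
    return response["TranslatedText"]
```

With boto3 installed and AWS credentials configured, translate(boto3.client("translate"), "Hello, world") would request an English to German translation.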

Amazon Transcribe converts speech to text, allowing developers to turn audio into accurate, fully punctuated text. It supports English and Spanish with more languages to follow. In the coming months, Amazon Transcribe will have the ability to recognize multiple speakers in an audio file, and will also allow developers to upload custom vocabulary for more accurate transcription for those words.

Amazon Lex is a service for building conversational interfaces into any application using voice and text. The service provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable you to build applications with highly engaging user experiences and lifelike conversational interactions. Amazon Lex enables you to easily build sophisticated, natural language, conversational bots (“chatbots”).

Amazon Polly turns text into lifelike speech, allowing programmers to create applications that talk, and build entirely new categories of speech-enabled products. It uses advanced deep learning technologies to sound like a human voice.

With dozens of voices across a variety of languages, developers can select the ideal voice and build speech-enabled applications that work in different countries.

Amazon Comprehend can understand natural language text from documents, social network posts, articles, or any other textual data. The service uses deep learning techniques to identify text entities (e.g. people, places, dates, organizations), the language the text is written in, the sentiment expressed in the text, and key phrases with concepts and adjectives, such as ‘beautiful,’ ‘warm,’ or ‘sunny.’ Amazon Comprehend has been trained on a wide range of datasets, including product descriptions and customer reviews, to build language models that extract key insights from text. It also has a topic modeling capability that helps applications extract common topics from a corpus of documents.
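The capabilities listed above map onto separate operations of the Amazon Comprehend client in boto3, the AWS SDK for Python. The sketch below combines three of them, detect_sentiment, detect_entities, and detect_key_phrases, into one summary. The helper name analyze_text and the shape of the returned dictionary are assumptions made for illustration.

```python
def analyze_text(client, text, language_code="en"):
    """Summarize sentiment, entities, and key phrases for `text`.

    `client` is expected to be a boto3 Comprehend client, for example
    boto3.client("comprehend").
    """
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    return {
        # Overall sentiment label returned by the service.
        "sentiment": sentiment["Sentiment"],
        # Each entity has the matched text and a type such as PERSON or DATE.
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }
```

Each operation is a separate request to the service, so an application that only needs sentiment can call detect_sentiment on its own.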

Further information




Cyrus Vahid

Cyrus is an AI specialist, working as a Principal Solution Architect on the AWS Deep Learning team. His current work includes Natural Language Processing and [Deep] Reinforcement Learning. Cyrus mostly uses Apache MXNet for developing ANNs.

He has been working in the field of software for over 20 years in a combination of startups and enterprises with entrepreneurial spirits. For the last 7 years, he has been dedicated to big data and Machine Learning solutions.
