*An article by Martin Heller*

Machine learning uses algorithms to turn a data set into a model. Which algorithm works best depends on the problem.

Machine learning and deep learning have been widely embraced, and even more widely misunderstood. In this article, I’d like to step back and explain both machine learning and deep learning in basic terms, discuss some of the most common machine learning algorithms, and explain how those algorithms relate to the other pieces of the puzzle of creating predictive models from historical data.

**What are machine learning algorithms?**

Recall that machine learning is a class of methods for automatically creating models from data. Machine learning algorithms are the engines of machine learning, meaning it is the algorithms that turn a data set into a model. Which kind of algorithm works best (supervised, unsupervised, classification, regression, etc.) depends on the kind of problem you’re solving, the computing resources available, and the nature of the data.

**How machine learning works**

Ordinary programming algorithms tell the computer what to do in a straightforward way. For example, sorting algorithms turn unordered data into data ordered by some criteria, often the numeric or alphabetical order of one or more fields in the data.

Linear regression algorithms fit a straight line, *or another function that is linear in its parameters, such as a polynomial*, to numeric data, typically by performing matrix inversions to minimize the squared error between the line and the data. Squared error is used as the metric because you don’t care whether the regression line is above or below the data points; you only care about the distance between the line and the points.
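As a minimal sketch of the idea, the following fits a straight line to a handful of made-up points. For a single input variable, the matrix solution reduces to the two closed-form expressions used here. The function names and the data are illustrative assumptions, not part of any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the line and the points."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


# Made-up data that lies roughly along y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"squared error={squared_error(xs, ys, slope, intercept):.4f}")
```

Because the squared error is a quadratic function of the slope and intercept, this fit is found in one pass, with no iteration. Any other line through the same points gives a larger squared error.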

Nonlinear regression algorithms, *which fit curves that are not linear in their parameters to data*, are a little more complicated, because, unlike linear regression problems, they generally can’t be solved directly in closed form. Instead, the nonlinear regression algorithms implement some kind of iterative minimization process, often some variation on the method of steepest descent.

Steepest descent computes the squared error and its gradient at the current parameter values, picks a step size (aka learning rate), steps in the direction opposite the gradient (“down the hill”), and then recomputes the squared error and its gradient at the new parameter values. Eventually, with luck, the process converges. The variants on steepest descent try to improve the convergence properties.
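The loop described above can be sketched for a deliberately tiny nonlinear problem: fitting the single parameter `b` in the curve y = exp(b·x). The model, the fixed learning rate, the starting value, and the synthetic data are all assumptions chosen to keep the example short; real problems usually have many parameters and need a more careful choice of step size.

```python
import math


def squared_error(b, xs, ys):
    """Sum of squared differences between exp(b * x) and the data."""
    return sum((math.exp(b * x) - y) ** 2 for x, y in zip(xs, ys))


def gradient(b, xs, ys):
    """Derivative of the squared error with respect to b."""
    return sum(
        2.0 * (math.exp(b * x) - y) * x * math.exp(b * x)
        for x, y in zip(xs, ys)
    )


def steepest_descent(xs, ys, b=0.0, learning_rate=0.01, steps=500):
    """Repeatedly step against the gradient, recording the error each time."""
    history = []
    for _ in range(steps):
        history.append(squared_error(b, xs, ys))
        b -= learning_rate * gradient(b, xs, ys)  # move "down the hill"
    return b, history


# Synthetic, noise-free data generated with b = 0.5.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 * x) for x in xs]

b_fit, history = steepest_descent(xs, ys)
print(f"fitted b = {b_fit:.4f}")
print(f"error went from {history[0]:.4f} to {history[-1]:.2e}")
```

With a learning rate that is too large, the same loop can overshoot and diverge, which is one reason the variants mentioned above exist.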

Machine learning algorithms are even less straightforward than nonlinear regression, partly because machine learning dispenses with the constraint of fitting to a specific mathematical function, such as a polynomial. There are two major categories of problems that are often solved by machine learning: regression and classification. Regression predicts a numeric value (e.g., What is the likely income for someone with a given address and profession?) and classification predicts a category (e.g., Will the applicant default on this loan?).

Prediction problems (e.g. What will the opening price be for Microsoft shares tomorrow?) are a subset of regression problems for time series data. Classification problems are sometimes divided into binary (yes or no) and multi-category problems (animal, vegetable, or mineral).

**Supervised learning vs. unsupervised learning**

Independent of these divisions, there are two more kinds of machine learning algorithms: supervised and unsupervised. In *supervised learning*, you provide a training data set with answers, such as a set of pictures of animals along with the names of the animals. The goal of that training would be a model that could correctly identify a picture (of a kind of animal that was included in the training set) that it had not previously seen.
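To make the idea of “training data with answers” concrete, here is a minimal sketch of one of the simplest supervised methods, a nearest-neighbor classifier. Instead of pictures, each animal is described by two hypothetical measurements (weight in kilograms, height in centimeters); the measurements, labels, and function name are invented for illustration.

```python
import math


def nearest_neighbor_predict(training_set, features):
    """Return the label of the training example closest to `features`."""
    _, closest_label = min(
        training_set,
        key=lambda example: math.dist(example[0], features),
    )
    return closest_label


# Labeled training data: ((weight_kg, height_cm), label). Values are made up.
training_set = [
    ((4.0, 30.0), "cat"),
    ((5.0, 25.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
]

# An animal that was not in the training set.
print(nearest_neighbor_predict(training_set, (6.0, 28.0)))  # prints "cat"
```

The “model” here is just the stored training set. Most supervised algorithms instead distill the training data into a set of fitted parameters.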

In *unsupervised learning*, the algorithm goes through the data on its own, without answers, and tries to come up with meaningful results. The result might be, for example, a set of clusters, each grouping data points that appear to be related to one another. That works better when the clusters don’t overlap.
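Clustering can be sketched with k-means, one common unsupervised algorithm (the article does not name a specific one, so this choice is an assumption). The version below uses a simple deterministic “farthest point” starting rule and a fixed number of iterations to stay short; the sample points are made up and deliberately well separated.

```python
import math


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=20):
    """Group points into k clusters by alternating assignment and averaging."""
    # Start with the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, clusters = k_means(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are supplied anywhere; the grouping comes entirely from the distances between points. When groups overlap, the assignments near the boundary become much less reliable.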

Training and evaluation turn supervised learning algorithms into models by optimizing their parameters to find the set of values that best matches the ground truth of your data.

The algorithms often rely on variants of steepest descent for their optimizers, for example stochastic gradient descent (SGD), which is essentially steepest descent in which each step estimates the gradient from a randomly selected sample or small batch of the training data rather than from the whole data set. Common refinements on SGD add factors that correct the direction of the gradient based on momentum or adjust the learning rate based on progress from one pass through the data (called an epoch) to the next.
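A minimal sketch of one common formulation of SGD with momentum follows: each step uses the gradient from a randomly chosen mini-batch of the training data, and a running “velocity” smooths the direction of travel. The model (a straight line), the hyperparameter values, the seed, and the synthetic data are assumptions picked for illustration. Learning-rate adjustment between epochs is left out to keep the example short.

```python
import random


def sgd_momentum(xs, ys, learning_rate=0.02, momentum=0.9,
                 batch_size=4, epochs=500, seed=0):
    """Fit y = w * x + b by mini-batch SGD with classical momentum."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0          # model parameters
    vw, vb = 0.0, 0.0        # momentum ("velocity") terms
    indices = list(range(len(xs)))
    for _ in range(epochs):  # one epoch = one pass through the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only.
            grad_w = sum(2.0 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2.0 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Blend the previous direction with the new downhill step.
            vw = momentum * vw - learning_rate * grad_w
            vb = momentum * vb - learning_rate * grad_b
            w += vw
            b += vb
    return w, b


# Synthetic, noise-free data from y = 2x + 1.
xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd_momentum(xs, ys)
print(f"w={w:.3f} b={b:.3f}")
```

Because each step looks at only a few examples, it is cheap to compute, which matters when the training set is too large to process in full on every update.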


*Martin Heller is a contributing editor and reviewer for InfoWorld. Formerly a web and Windows programming consultant, he developed databases, software, and websites from 1986 to 2010. More recently, he has served as VP of technology and education at Alpha Software and chairman and CEO at Tubifi.*

Copyright © 2019 IDG Communications, Inc.