When most people hear "Machine Learning," they picture a robot: a dependable butler or a deadly Terminator, depending on whom you ask. But Machine Learning is not just a futuristic fantasy; it’s already here. It has been around for decades in some specialized applications, such as Optical Character Recognition (OCR).
But the first ML application that became truly mainstream, improving the lives of hundreds of millions of people, took over the world back in the 1990s: the spam filter. It’s not exactly a self-aware Skynet, but it does technically qualify as Machine Learning. It was followed by hundreds of ML applications that now quietly power hundreds of products and features that you use regularly, from better recommendations to voice search.
What is Machine Learning?
Machine Learning is the science and art of programming computers so they can learn from data.
Your spam filter is a Machine Learning program that, given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called ham) emails, can learn to flag spam. The examples that the system uses to learn are called the training set. Each training example is called a training instance.
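To make the vocabulary concrete, here is a minimal sketch of a training set and a toy learner. The four emails, the word-counting rule, and the function names `train` and `predict` are all assumptions made up for illustration; real spam filters use far more data and more robust models.

```python
from collections import Counter

# Hypothetical training set: each training instance is (email text, label).
training_set = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(instances):
    """Count how often each word appears in spam emails and in ham emails."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in instances:
        counts[label].update(text.split())
    return counts


def predict(counts, text):
    """Flag an email as spam if its words were seen more often in spam."""
    words = text.split()
    spam_score = sum(counts["spam"][word] for word in words)
    ham_score = sum(counts["ham"][word] for word in words)
    return "spam" if spam_score > ham_score else "ham"


model = train(training_set)
print(predict(model, "free prize inside"))
print(predict(model, "agenda for the meeting"))
```

Nothing in `predict` hard-codes which words are suspicious; whatever the program knows about spam comes from the training instances it was given.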
If you download a copy of Wikipedia, your computer has a lot more data, but it is not suddenly better at any task. Thus, downloading a copy of Wikipedia is not Machine Learning.
Types of Machine Learning Systems
There are so many different types of Machine Learning systems that it is useful to classify them in broad categories, based on the following criteria:
- Whether or not they are trained with human supervision (supervised, unsupervised, and Reinforcement Learning)
- Whether or not they can learn incrementally on the fly (online versus batch learning)
- Whether they work by merely comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do (instance-based versus model-based learning)
These criteria are not exclusive; you can combine them in any way you like. For example, a state-of-the-art spam filter may learn on the fly using a deep neural network model trained using instances of spam and ham; this makes it an online, model-based, supervised learning system.
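As a rough illustration of how the criteria combine, the sketch below uses a perceptron, a much simpler model than the deep neural network mentioned above, swapped in to keep the code short. It is supervised (each instance comes with a label), model-based (it keeps weights rather than stored examples), and online (it updates after every instance). The class name, the `learn_one` method, and the two features (counts of the words "free" and "meeting") are hypothetical.

```python
class OnlinePerceptron:
    """A tiny linear model that is updated one training instance at a time."""

    def __init__(self, n_features, learning_rate=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        """Return 1 (spam) if the weighted sum is positive, else 0 (ham)."""
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        """Adjust the weights only when the current prediction is wrong."""
        error = y - self.predict(x)
        if error != 0:
            step = self.learning_rate * error
            self.w = [wi + step * xi for wi, xi in zip(self.w, x)]
            self.b += step


# Hypothetical stream of labeled emails: ([count of "free", count of "meeting"], label).
stream = [([2, 0], 1), ([0, 1], 0), ([1, 0], 1), ([0, 2], 0)]

model = OnlinePerceptron(n_features=2)
for features, label in stream:
    model.learn_one(features, label)  # the model improves as each email arrives

print(model.predict([3, 0]))
print(model.predict([0, 1]))
```

A batch learner would instead need the whole list up front and would be retrained from scratch whenever new emails arrived.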
Let’s look at each of these criteria a bit more closely.
Supervised and Unsupervised Learning
Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are two major categories:
- Supervised Learning
- Unsupervised Learning
In supervised learning, the training set you feed to the algorithm includes the desired solutions, called labels.
A typical supervised learning task is classification. The spam filter is an excellent example of this. It is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.
Here are some of the essential supervised learning algorithms:
- K-Nearest Neighbors
- Linear Regression
- Logistic Regression
- Support Vector Machines
- Decision Trees and Random Forests
- Neural Networks
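As an illustration of the first algorithm in the list, here is a minimal K-Nearest Neighbors classifier written from scratch. The two features (number of links and number of exclamation marks per email) and the six labeled instances are invented for the example; in practice you would more likely use a library implementation.

```python
import math
from collections import Counter

# Hypothetical labeled training set: ((links, exclamation marks), class).
training_set = [
    ((8, 5), "spam"),
    ((7, 6), "spam"),
    ((9, 4), "spam"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((2, 1), "ham"),
]


def knn_predict(instances, x, k=3):
    """Classify x by majority vote among its k closest training instances."""
    nearest = sorted(instances, key=lambda inst: math.dist(inst[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(training_set, (7, 5)))
print(knn_predict(training_set, (1, 1)))
```

The labels are what make this supervised: every vote cast for a new email comes from an instance whose class was supplied in advance.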
In unsupervised learning, as you might guess, the training data is unlabeled. The system tries to learn without a teacher.
For example, say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors. At no point do you tell the algorithm which group a visitor belongs to; it finds those connections without your help.
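The visitor example can be sketched with a very small agglomerative (hierarchical) clustering routine. The visitor names, the two features (visits per week and average minutes per visit), the single-linkage distance, and the choice of two clusters are all assumptions for illustration. Note that the data carries no labels.

```python
import math

# Hypothetical unlabeled data: visitor -> (visits per week, minutes per visit).
visitors = {
    "ana": (1, 2), "ben": (2, 1), "cho": (1, 1),
    "dev": (9, 30), "eli": (10, 28), "fay": (8, 31),
}


def single_link(points, a, b):
    """Distance between two clusters: that of their closest pair of members."""
    return min(math.dist(points[p], points[q]) for p in a for q in b)


def agglomerate(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the closest two."""
    clusters = [[name] for name in points]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_link(
            points, clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is still valid
        clusters[i].extend(merged)
    return clusters


clusters = agglomerate(visitors, n_clusters=2)
print(clusters)
```

With this toy data the routine separates the occasional, short-visit readers from the frequent, long-visit ones, even though it was never told that such groups exist.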
Here are some of the most important unsupervised learning algorithms:
- Hierarchical Cluster Analysis
- One-class SVM
- Isolation Forest
- Principal Component Analysis
- Kernel PCA
- Locally Linear Embedding
- t-Distributed Stochastic Neighbor Embedding