Machine Learning basics focused on Supervised Learning

Mar 1, 2021

Introduction

Hello, my name is Anthony Hosemann and I am an IT software development student at SAIT. In this blog I will be discussing some of the basics of machine learning and focusing primarily on supervised learning and decision trees.

What is machine learning?

Machine learning is a branch of AI (artificial intelligence) that focuses on applications that use data to learn and improve their accuracy. Machine learning has multiple different algorithms that can be utilized to create these applications that seem to be able to learn. Some of these algorithms are trained with data sets to find patterns in order to make decisions and predictions based on new data.

Machine Learning history

Machine learning and AI have been researched since the 1950s. Early machine learning research was partly inspired by how human brain cells work. One of the first machine learning programs was created by Arthur Samuel in the 1950s. It learned to play checkers and was built using search trees. Because computer memory was limited at that time, Samuel implemented a scoring function: instead of searching each path until the game’s conclusion, the program would assess the board at any given time and choose the move with the highest chance of resulting in a win. Since then machine learning has advanced considerably, and its practical uses can be seen almost everywhere. Some of the most common applications are image and speech recognition, email spam filtering, and online fraud prevention.

Types of Machine Learning

In machine learning there are three main types of learning: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning programs are trained using labelled training data, meaning each piece of data that is input into the program is already tagged with the answer/label. Unsupervised learning, on the other hand, is trained with unlabelled data. Finally, reinforcement learning is where a program uses trial and error: each decision the program makes gives it either a “reward” or a “penalty”, and its goal is to maximize the reward. In this blog I will be talking primarily about supervised learning.

Classifiers

What is a classifier? A classifier is the part of the program that makes decisions and chooses what the correct output should be for any given input data. There are a lot of different types of classifiers such as the k-nearest neighbor or random forest classifiers which each use their own algorithms to find the correct output. In supervised learning the classifier needs to be trained with a training set of data to be accurate. These training sets of data hold the features/characteristics of the objects that the classifier has to discern as well as the label of the object. This allows the classifier to learn from the training data set and create rules that will help it make more accurate decisions.
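To make the idea of training and predicting more concrete, here is a minimal sketch of a nearest neighbor style classifier in plain Python. The function names, the feature values, and the tiny training set are all assumptions made up for illustration; a real project would normally use a library rather than code like this.

```python
from collections import Counter

def distance(a, b):
    """Count how many feature values differ between two examples."""
    return sum(x != y for x, y in zip(a, b))

def predict(training_set, features, k=1):
    """Return the most common label among the k closest training examples."""
    nearest = sorted(training_set, key=lambda ex: distance(ex[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled training set: ((color, skin texture), label).
training_set = [
    (("orange", "bumpy"), "orange"),
    (("yellow", "smooth"), "banana"),
    (("red", "smooth"), "apple"),
]

print(predict(training_set, ("red", "smooth")))  # apple
```

The "training" here is nothing more than storing the labelled examples. Other classifiers, such as the decision tree discussed later, build explicit rules from the training set instead.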

A basic Machine Learning program

Let’s say I want to create a program that can tell the difference between a banana, an apple, and an orange. We as humans can tell the difference right away just by looking at the three fruits, but for a machine without eyes it is a bit more difficult. So let’s define some features of these fruits to help the program discern between them. One feature of an orange is that it has bumpy skin, in contrast to bananas and apples, which have smooth skin. Another feature of these fruits is their color. An orange is orange and bananas are yellow. Apples come in multiple colors, such as red, yellow, and green. Now that the fruits are defined, we can create a training set of data which might look something like this:

Now that we know what the features of our fruits are, we can begin creating a program using a machine learning algorithm to solve this problem.
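As a rough sketch, a labelled training set built from these two features could be written in Python as a list of rows. The specific rows below are hypothetical values chosen for illustration, with the last item of each row being the label.

```python
from collections import Counter

# Hypothetical rows: (color, skin texture, label).
# Entries 2 and 4 deliberately share the same features but not the same label.
training_data = [
    ("orange", "bumpy", "orange"),   # entry 1
    ("yellow", "smooth", "banana"),  # entry 2
    ("red", "smooth", "apple"),      # entry 3
    ("yellow", "smooth", "apple"),   # entry 4
    ("green", "smooth", "apple"),    # entry 5
]

def label_counts(rows):
    """Count how many times each label appears in a list of rows."""
    return Counter(row[-1] for row in rows)

print(label_counts(training_data))
```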

So how does a decision tree classifier work?

A decision tree classifier is one of the easiest machine learning classifiers to make and visualize. To describe decision trees, I will be using the example problem that I provided above. In machine learning there are multiple different algorithms used to create decision trees, but I will be using one called CART (Classification and Regression Trees).

To begin creating a decision tree, you create the root node of the tree, which receives the entire training set of data as its input. Each node of a decision tree asks a true-or-false question about one of the features to split the data. The question asked at each node must help the program discern between different labels, or fruits in this case. The classifier generates the question by looking at all the features and figuring out which question splits the data the best.
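A node's true-or-false question can be sketched as a simple equality test on one feature. The rows and the helper names below are hypothetical and only illustrate the idea of splitting data with a question.

```python
# Hypothetical rows: (color, skin texture, label).
training_data = [
    ("orange", "bumpy", "orange"),
    ("yellow", "smooth", "banana"),
    ("red", "smooth", "apple"),
    ("yellow", "smooth", "apple"),
    ("green", "smooth", "apple"),
]

def matches(row, column, value):
    """Answer the question 'does this row have `value` in `column`?'."""
    return row[column] == value

def partition(rows, column, value):
    """Split rows into those that answer the question true and those that answer false."""
    true_rows = [row for row in rows if matches(row, column, value)]
    false_rows = [row for row in rows if not matches(row, column, value)]
    return true_rows, false_rows

# Question: "Is the skin bumpy?" (column 1 holds the skin texture)
true_rows, false_rows = partition(training_data, 1, "bumpy")
print(true_rows)        # [('orange', 'bumpy', 'orange')]
print(len(false_rows))  # 4
```

Asking "Is the skin bumpy?" isolates the orange in one step, while asking "Is the color yellow?" would leave both sides mixed. Either question is a valid split; what remains is deciding which question splits the data best.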

How well a question splits the data can be quantified by two values: Gini impurity and information gain. Gini impurity is a value between 0 and 1 that quantifies how much uncertainty there is at a node, where 0 means every entry at the node has the same label. Information gain quantifies how much a question reduces that impurity.
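Both values are short to compute. The sketch below uses the common definition of Gini impurity (1 minus the sum of the squared label proportions) and measures information gain as the parent's impurity minus the weighted impurity of the two child nodes. The rows are hypothetical illustration data.

```python
from collections import Counter

# Hypothetical rows: (color, skin texture, label).
training_data = [
    ("orange", "bumpy", "orange"),
    ("yellow", "smooth", "banana"),
    ("red", "smooth", "apple"),
    ("yellow", "smooth", "apple"),
    ("green", "smooth", "apple"),
]

def gini(rows):
    """Gini impurity: 1 minus the sum of squared label proportions."""
    counts = Counter(row[-1] for row in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())

def info_gain(parent, true_rows, false_rows):
    """Impurity of the parent minus the weighted impurity of its children."""
    p = len(true_rows) / len(parent)
    return gini(parent) - p * gini(true_rows) - (1 - p) * gini(false_rows)

# Evaluate the question "Is the skin bumpy?"
bumpy = [row for row in training_data if row[1] == "bumpy"]
smooth = [row for row in training_data if row[1] != "bumpy"]

print(round(gini(training_data), 2))                      # 0.56
print(round(info_gain(training_data, bumpy, smooth), 2))  # 0.26
```

With three apples, one banana, and one orange, the impurity of the full set is 1 - (0.6² + 0.2² + 0.2²) = 0.56. Splitting on bumpy skin leaves a pure orange node and a smooth node with impurity 0.375, so the gain is 0.56 - 0.8 × 0.375 = 0.26.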

The split data is then fed as an input into child nodes that will either split the data further if it is still mixed or become a leaf node if it is unmixed. The goal of the decision tree is to unmix the data until only one label remains in each leaf node. Below I have drawn a visual representation of the decision tree I have been describing.

As you have probably noticed, one of the leaf nodes is still mixed. I created the data set so that entry 2 and entry 4 cannot be told apart by their features, to showcase one of the flaws with decision trees. Compared to some other machine learning algorithms, a single decision tree can be less accurate.
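The whole procedure, including the mixed leaf, can be sketched in a few dozen lines of Python. This is a simplified, CART-style illustration rather than a full implementation: the rows are hypothetical, questions are limited to equality tests, and the helper names are made up for this example.

```python
from collections import Counter

# Hypothetical rows: (color, skin texture, label).
training_data = [
    ("orange", "bumpy", "orange"),   # entry 1
    ("yellow", "smooth", "banana"),  # entry 2
    ("red", "smooth", "apple"),      # entry 3
    ("yellow", "smooth", "apple"),   # entry 4
    ("green", "smooth", "apple"),    # entry 5
]

def gini(rows):
    """Gini impurity: 1 minus the sum of squared label proportions."""
    counts = Counter(row[-1] for row in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())

def partition(rows, column, value):
    """Split rows by the question 'is row[column] equal to value?'."""
    true_rows = [row for row in rows if row[column] == value]
    false_rows = [row for row in rows if row[column] != value]
    return true_rows, false_rows

def find_best_split(rows):
    """Try every (column, value) question and keep the one with the highest gain."""
    best_gain, best_question = 0.0, None
    current = gini(rows)
    for column in range(len(rows[0]) - 1):  # the last column is the label
        for value in sorted({row[column] for row in rows}):
            true_rows, false_rows = partition(rows, column, value)
            if not true_rows or not false_rows:
                continue  # this question does not split the data at all
            p = len(true_rows) / len(rows)
            gain = current - p * gini(true_rows) - (1 - p) * gini(false_rows)
            if gain > best_gain:
                best_gain, best_question = gain, (column, value)
    return best_gain, best_question

def build_tree(rows):
    """Return a leaf (Counter of labels) or a node (column, value, true_branch, false_branch)."""
    _, question = find_best_split(rows)
    if question is None:
        return Counter(row[-1] for row in rows)  # no useful question left: leaf
    column, value = question
    true_rows, false_rows = partition(rows, column, value)
    return (column, value, build_tree(true_rows), build_tree(false_rows))

def classify(tree, features):
    """Follow the questions down the tree and return the label counts at the leaf."""
    while isinstance(tree, tuple):
        column, value, true_branch, false_branch = tree
        tree = true_branch if features[column] == value else false_branch
    return tree

tree = build_tree(training_data)
print(dict(classify(tree, ("orange", "bumpy"))))   # {'orange': 1}
print(dict(classify(tree, ("yellow", "smooth"))))  # {'banana': 1, 'apple': 1}
```

The last line shows the mixed leaf: a yellow, smooth fruit matches both a banana and an apple in the training data, so no question about these two features can separate them. Adding another feature, such as shape, would be one way to resolve it.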

Video

Below is a video where I talk about the content discussed in this blog.

Conclusion

Thank you for taking the time to read through my blog post and I hope you have learned something new about machine learning. I will be creating a blog post later this year providing coded examples and more in-depth information on supervised learning.