Supervised vs Unsupervised Machine Learning
By Picklai on Medium, March 7, 2022
In the twenty-first century, machine learning has ushered in unprecedented changes in the way we work, live and exist. From your favorite social media application to your preferred search engine, chances are you rely heavily on machine learning!
Machine learning is a vast domain. Some practitioners count more than fifty types of it, but the two most fundamental are supervised and unsupervised learning. Let's dive in to understand each of them.
Simply stated, a supervised machine learning model tries to capture the mathematical relationship between a set of inputs and outputs. We provide input variables together with their known output(s), called targets or labels, and the model takes on the task of learning the relationship between them.
This learning is carried out by an optimization algorithm, which adjusts the model step by step to minimize (or, if needed, maximize) an objective function in a long, iterative process. No manual intervention is needed: we do not tell the model how to carry out the process, only what we want.
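To make "minimizing an objective function" concrete, here is a minimal sketch of gradient descent, one common optimization algorithm (the post does not name a specific one). The objective f(w) = (w - 3)^2, the learning rate and the number of steps are illustrative assumptions. The algorithm only ever looks at the slope of the objective and steps downhill. Nobody tells it where the minimum is.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to minimize an objective."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)  # move a small step downhill
    return w


# Hypothetical objective f(w) = (w - 3)^2, which is smallest at w = 3.
# Its derivative is f'(w) = 2 * (w - 3).
def grad_f(w):
    return 2 * (w - 3)


w_best = gradient_descent(grad_f, start=0.0)
print(round(w_best, 4))  # → 3.0
```

Each step shrinks the distance to the minimum by a constant factor here, which is why 100 steps are more than enough for this toy objective.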
For example, suppose we have some sample data consisting of inputs x and their corresponding outputs y.
The model starts with a function of the form f(x) = b1·x + b0 and tries to find the values of b1 and b0. In each iteration, its task is to minimize the difference between f(x), the output of the function the model is currently using, and y, the actual output. Once learned, the relation can be applied to unseen data to make predictions about the unknown.
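One common way to measure "the difference between f(x) and y" over n training pairs (x_i, y_i) is the mean squared error. The choice of this objective, and of gradient descent with a learning rate η as the optimizer, is our assumption, since the post does not specify either. The derivatives show how each parameter should be nudged to reduce the error:

```latex
f(x) = b_1 x + b_0
\qquad
J(b_0, b_1) = \frac{1}{n} \sum_{i=1}^{n} \left( b_1 x_i + b_0 - y_i \right)^2

\frac{\partial J}{\partial b_1} = \frac{2}{n} \sum_{i=1}^{n} \left( b_1 x_i + b_0 - y_i \right) x_i
\qquad
\frac{\partial J}{\partial b_0} = \frac{2}{n} \sum_{i=1}^{n} \left( b_1 x_i + b_0 - y_i \right)

b_1 \leftarrow b_1 - \eta \, \frac{\partial J}{\partial b_1}
\qquad
b_0 \leftarrow b_0 - \eta \, \frac{\partial J}{\partial b_0}
```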
The parameters (the technical term for b1 and b0) turn out to be 2.3 and 0, respectively. Note that this is a trivial scenario; in reality, there could be thousands of rows of labeled data, with multiple x variables (inputs) mapped to one or more outputs.
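The sketch below fits f(x) = b1·x + b0 by gradient descent on the mean squared error. It uses hypothetical points generated from y = 2.3x, not the post's own sample data, so the fitted parameters should land very close to 2.3 and 0. The learning rate and step count are illustrative choices.

```python
# Hypothetical training data generated from y = 2.3x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.3 * x for x in xs]


def fit_line(xs, ys, learning_rate=0.02, steps=5000):
    """Fit y = b1*x + b0 by gradient descent on the mean squared error."""
    b1, b0 = 0.0, 0.0  # start from an arbitrary guess
    n = len(xs)
    for _ in range(steps):
        # Residuals f(x_i) - y_i for the current parameters.
        errors = [b1 * x + b0 - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_b1 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b0 = 2 / n * sum(errors)
        # Step both parameters downhill.
        b1 -= learning_rate * grad_b1
        b0 -= learning_rate * grad_b0
    return b1, b0


b1, b0 = fit_line(xs, ys)
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

With real, noisy data the fitted values would only approximate the underlying relation, and in practice a library routine would usually replace this hand-written loop.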
What you just read is in fact the simplest form of linear regression, which is used in domains such as finance, economics, the social sciences and biology. The relation to be learned could just as well have been non-linear, and this is where supervised learning shines: sufficiently flexible models can approximate a remarkably wide range of relationships!
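One simple way to learn a non-linear relation with the same machinery is polynomial regression. The input x is expanded into the features 1, x and x^2, and a model that is still linear in its parameters is fitted to them. This is only one of many options, because the post does not name a specific non-linear model. The quadratic data y = x^2 and the hyperparameters are hypothetical.

```python
# Hypothetical data following a non-linear (quadratic) relation y = x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]


def features(x):
    """Expand one input into [1, x, x^2] so a linear model can fit a curve."""
    return [1.0, x, x ** 2]


def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Gradient descent on mean squared error for y ~ w0 + w1*x + w2*x^2."""
    w = [0.0, 0.0, 0.0]
    n = len(xs)
    rows = [features(x) for x in xs]
    for _ in range(steps):
        # Residuals of the current prediction for every training point.
        errors = [sum(wj * fj for wj, fj in zip(w, row)) - y
                  for row, y in zip(rows, ys)]
        # Gradient of the mean squared error with respect to each weight.
        grads = [2 / n * sum(e * row[j] for e, row in zip(errors, rows))
                 for j in range(3)]
        w = [wj - learning_rate * gj for wj, gj in zip(w, grads)]
    return w


w0, w1, w2 = fit(xs, ys)
print([round(v, 3) for v in (w0, w1, w2)])
```

The learned weights should end up near 0, 0 and 1, meaning the model has recovered the curve y = x^2 even though each update rule is the same as in the straight-line case.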
You might be wondering: what if there were no targets? Then there would be no input-output relationship to learn. All we would have is unlabeled data, which we now need to make sense of in some insightful way.
This is where unsupervised learning comes to the rescue: it groups data points on the basis of shared characteristics. For instance, assume that you run a website where users upload pictures of their pets, and you'd like each picture to be displayed under the relevant animal category.
Here, you have three choices. The first is to have a human supervisor sift through all the submissions and classify them manually. Depending on the number of daily submissions, this may not be feasible for a single person. It would be very time-consuming and therefore resource-intensive.
The second choice is to ask a human supervisor to label some of the images as cats, dogs, fish, etc., and then use those labels to train a supervised machine learning model that handles the rest. While this could be a good solution, it may not always be feasible owing to a shortage of resources and/or time.
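The "label a few, let a model do the rest" idea can be sketched with a 1-nearest-neighbour classifier, one of the simplest supervised models. The post does not say which model it has in mind. The two-number feature vectors stand in for images and are hypothetical: a real system would first extract features from the pixels.

```python
import math

# Hypothetical 2-D feature vectors standing in for images,
# labeled by the human supervisor.
labeled = [
    ((0.9, 0.1), "cat"),
    ((0.8, 0.2), "cat"),
    ((0.1, 0.9), "dog"),
    ((0.2, 0.8), "dog"),
]


def predict(point, examples):
    """1-nearest-neighbour: copy the label of the closest labeled example."""
    _, closest_label = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return closest_label


# New, unlabeled submissions are classified automatically.
unlabeled = [(0.85, 0.15), (0.15, 0.95)]
print([predict(p, labeled) for p in unlabeled])  # → ['cat', 'dog']
```

The quality of such a model depends heavily on how many images the supervisor labels, which is exactly the cost this choice tries to keep small.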
The third choice is to use an unsupervised ML algorithm. You specify the number of categories in advance, and the model then uses statistical techniques to group the pictures on the basis of their visual similarity. Naturally, it makes the most sense to set the number of categories equal to the number of animal types.
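Grouping items into a number of categories fixed in advance, using similarity alone, is what k-means does. K-means is one widely used clustering algorithm, although the post does not name a specific one. Below is a minimal sketch under these assumptions:

* The 2-D feature vectors are hypothetical stand-ins for image features.
* Farthest-first initialization is a simplification chosen so the result is deterministic. Libraries such as scikit-learn default to k-means++ initialization instead.

```python
import math


def k_means(points, k, iterations=20):
    """Group points into k clusters by alternating assignment and update steps."""
    # Farthest-first initialization: start at the first point, then keep adding
    # the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical features for six pet pictures forming two visual groups.
points = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.15),
          (0.9, 0.8), (0.85, 0.95), (0.8, 0.9)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

The algorithm returns unnamed groups. Deciding that one group is "cats" and the other "dogs" is still up to a person looking at the results.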
The approach described above is typically implemented using a technique called clustering. Beyond this example, unsupervised learning is used in fraud detection, customer segmentation, recommender systems and many other applications.
It makes a lot of sense to learn both techniques right at the beginning of your machine learning journey. Experts often use unsupervised learning to explore the data at hand before turning to supervised learning for more concrete solutions. The two are therefore complementary.
If you would like to learn these techniques, log on to Pickl.ai to check out our courses. We start from scratch and level up gradually, introducing the basics first (statistics and programming in Python) before taking a deep dive into machine learning and its two principal varieties.