A Quick Guide to Unsupervised Learning

What is unsupervised learning?

Seth Adler

Machine learning (ML) is a subset of artificial intelligence (AI) in which algorithms and statistical models learn patterns from data to accomplish tasks, rather than following explicitly programmed rules. Data scientists train ML algorithms through two main approaches: supervised learning and unsupervised learning.

A learning model is a mathematical representation of a real-world process. In supervised learning, the inputs to the algorithm are labeled: the algorithm receives a paired dataset that contains each sample along with the label for that sample. These samples are also called observations, examples, or instances. In unsupervised learning, the data is unlabeled, so the algorithm must find structure in it, such as groups of similar samples, on its own, without predefined labels to guide it.

For example, in a supervised learning model whose goal is to classify animals as cats or dogs, the training samples are pictures of cats and dogs, each labeled with the correct animal. If the samples are of the right quality and quantity, the ML algorithm can learn from the labeled data and then classify new pictures of cats and dogs quickly and accurately.

In unsupervised learning, the examples are unlabeled. In this case, the ML algorithm has to find patterns on its own. It may group the pictures by hair length, color, ear shape, and a number of other characteristics. If those characteristics separate the two animals well, the model can end up with one group that is mostly cats and another that is mostly dogs. It can sort the animals into these groups appropriately, but it cannot name them; a person still has to recognize which group is “cat” and which is “dog.”



Data Classification

Raw data is not usually actionable until it is organized into a data matrix. A data matrix is a collection of data arranged in rows and columns, much like an Excel spreadsheet. By convention, each row holds one observation (for example, one customer or one image), and each column holds one variable, or characteristic, measured for every observation. Organizing data this way makes it easier to clean, for example by removing outliers or irrelevant data points, and to sort it into consistent, usable groups, or data classifications. From these data classifications, data models can then be formed.
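To make the row-and-column convention concrete, here is a minimal Python sketch (standard library only) that turns a few hypothetical customer records into a data matrix with one row per observation and one column per variable. The field names, the valid ranges, and the rule of dropping rows with missing or out-of-range values are all illustrative assumptions; real data cleaning is usually more careful than a simple range check.

```python
# Hypothetical raw records: each dict is one observation (one customer).
raw_records = [
    {"age": 34, "annual_spend": 1200.0, "visits": 12},
    {"age": 29, "annual_spend": None, "visits": 7},     # missing value
    {"age": 41, "annual_spend": 860.0, "visits": 9},
    {"age": 230, "annual_spend": 950.0, "visits": 10},  # implausible age
    {"age": 52, "annual_spend": 2100.0, "visits": 20},
]

# Column order of the matrix: one column per variable.
COLUMNS = ["age", "annual_spend", "visits"]

# Assumed plausible range for each variable; values outside are treated as errors.
VALID_RANGES = {"age": (0, 120), "annual_spend": (0.0, 1e6), "visits": (0, 365)}


def to_data_matrix(records):
    """Build a data matrix: one row per observation, one column per variable.

    Rows with a missing value, or with a value outside its assumed valid range,
    are dropped as a simple stand-in for data cleaning.
    """
    matrix = []
    for record in records:
        row = [record.get(col) for col in COLUMNS]
        if any(value is None for value in row):
            continue  # missing value: drop this observation
        if all(VALID_RANGES[col][0] <= value <= VALID_RANGES[col][1]
               for col, value in zip(COLUMNS, row)):
            matrix.append([float(value) for value in row])
    return matrix


data_matrix = to_data_matrix(raw_records)
for row in data_matrix:
    print(row)
```

With these hypothetical records, the row with a missing spend value and the row with an implausible age are dropped, leaving a three-row, three-column matrix.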


Cluster Analysis

The most common strategy used in unsupervised learning is cluster analysis. Because it automatically finds groups in data without labels, it is used in data analytics and classification tasks across industries, from healthcare to marketing. Clustering also fits naturally with data mining and deep learning, because it can take massive amounts of unlabeled data and group them using a variety of methods.

Different clustering algorithms suit different use cases. The “shape” the data takes on when plotted (compact blobs, elongated or irregular groups, scattered outliers) helps determine which algorithm is best, as does the ultimate use case. New clustering algorithms are proposed all the time, but a few common ones are widely used across many applications.


  • K-means clustering – The value k is the desired number of clusters, which is chosen manually before the algorithm runs. A k-means algorithm starts by randomly selecting k data points as the initial cluster centers, or centroids. Every data point is then assigned to the centroid it is closest to. Next, each centroid is recalculated as the mean of the data points assigned to it, and the assign-and-update steps repeat until the clusters stop changing. K-means clustering works best with compact, roughly spherical, well-separated clusters.

K-means clustering has been used for fraud detection, customer segmentation, and delivery store optimization, and it is commonly applied to large datasets because each iteration is fast. It is one of the simplest clustering algorithms, but it has drawbacks: it is sensitive to outliers, its result depends on the randomly chosen starting centroids, and it always returns exactly k clusters, whether or not that number matches the natural structure of the data.
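The assign-and-update loop described above can be sketched in a few lines of standard-library Python. This is a minimal illustration on two-dimensional points, assuming Euclidean distance and random initial centroids drawn from the data; the function name `kmeans` and its parameters are hypothetical, and libraries such as scikit-learn provide tested, faster implementations.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Plain k-means on 2-D points: returns (centroids, labels).

    labels[i] is the index of the cluster that points[i] belongs to.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centers
    labels = [-1] * len(points)
    for _ in range(max_iters):
        # Assignment step: every point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Hypothetical data: two compact, well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are random, different `seed` values can give different results on messier data, which is one reason practitioners often run k-means several times and keep the best result.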

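K-nearest neighbors (KNN) is a related method that is often confused with k-means, but it is a supervised classification method rather than a clustering method. The target attribute (the label) is known beforehand for every training point, and a new data point is assigned the label that is most common among the k labeled points closest to it, using the same kind of distance measure as k-means clustering.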
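To show how KNN differs from k-means, the following hypothetical sketch classifies a new point by majority vote among its k nearest labeled neighbors. The two features (ear pointiness and snout length) and their values are invented for the cat-and-dog example; note that, unlike the input to k-means, every training point here already carries a label.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest labeled training points.

    `train` is a list of ((x, y), label) pairs; the labels are known in advance,
    which is what makes KNN a supervised method.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples: (ear_pointiness, snout_length) -> animal.
training_data = [
    ((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"), ((0.85, 0.25), "cat"),
    ((0.3, 0.8), "dog"), ((0.2, 0.9), "dog"), ((0.35, 0.7), "dog"),
]

print(knn_classify(training_data, (0.8, 0.2)))    # cat
print(knn_classify(training_data, (0.25, 0.85)))  # dog
```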


  • Hierarchical clustering – Hierarchical clustering builds a tree of clusters, either bottom-up (agglomerative), by repeatedly merging clusters, or top-down (divisive), by repeatedly splitting them. The result is drawn as a dendrogram, a tree diagram that shows which clusters merge (bottom-up) or separate (top-down) at each step and how far apart they were.

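Unlike k-means clustering, which partitions all the data into a fixed number of clusters on every pass, agglomerative hierarchical clustering works one merge at a time: it starts with each data point as its own cluster, combines the two closest clusters into one new cluster, and repeats until only one cluster remains. Which clusters count as closest depends on the chosen proximity (linkage) measure, such as the distance between their nearest members or between their farthest members.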

Because hierarchical clustering compares clusters pairwise at every step, it is more time-consuming than k-means clustering and struggles with large amounts of data. It has been used for managing investment portfolios, determining risk scores in banking, and grouping DNA sequences to trace evolutionary relationships among animal species. Hierarchical clustering is sometimes used as a first step before k-means clustering, for example to help choose k or the starting centroids. Hybrid approaches to data clustering are being used more and more as computing power and storage increase.
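The bottom-up (agglomerative) procedure can be sketched as follows. This minimal, hypothetical example starts with every point in its own cluster, repeatedly merges the two closest clusters, and records each merge with its distance, which is the information a dendrogram plots. It assumes Euclidean distance and single linkage (the distance between the closest members of two clusters) by default; the nested loop over all cluster pairs also shows why this approach slows down on large datasets.

```python
import math


def agglomerative(points, linkage=min):
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair.

    Returns the merge history [(cluster_a, cluster_b, distance), ...], where clusters
    are lists of point indices. linkage=min gives single linkage (closest members);
    passing max instead gives complete linkage (farthest members).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Compare every pair of current clusters (the expensive part).
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(
                    math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b]
                )
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((clusters[a], clusters[b], d))
        # Replace the two merged clusters with their union.
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical data: two tight pairs of points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
merges = agglomerative(points)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 2))
```

Reading the merge history from top to bottom traces the dendrogram from its leaves to its root: the two nearby pairs merge first, and the distant point at (20, 20) joins last.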


  • Expectation-maximization (EM) – EM clustering produces soft clusters, meaning its assignments aren’t definitive. Instead of placing each data point in exactly one cluster, it estimates the probability that the point belongs to each cluster. EM works well with missing data and unobserved (latent) features. For example, EM algorithms, of which there are several variants, can take data points about an individual, such as whether they smoke, how much exercise they get, and what they eat, and infer their risk of heart disease. EM clustering and k-means clustering are sometimes used together, for example when a dataset that would otherwise be handled by k-means has missing values.
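A common way to use EM for clustering is to fit a Gaussian mixture model, which assumes each cluster follows a normal (bell-shaped) distribution. The sketch below is a minimal, one-dimensional version with hypothetical data and starting means. It alternates an E-step, which computes each point's probability of belonging to each cluster (the soft assignment), with an M-step, which re-estimates each cluster's weight, mean, and variance from those probabilities. It does not handle missing values, and the heart-disease example is not reproduced here.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, init_means, n_iters=50):
    """EM for a 1-D Gaussian mixture with one component per initial mean.

    Returns (weights, means, variances, resp), where resp[i][c] is the probability
    that data[i] belongs to component c: a soft cluster assignment.
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k    # assumed starting spread for every component
    weights = [1.0 / k] * k  # start with equal mixing proportions
    resp = []
    for _ in range(n_iters):
        # E-step: probability of each component given each point (Bayes' rule).
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate each component from responsibility-weighted points.
        for c in range(k):
            r_c = [r[c] for r in resp]
            n_c = sum(r_c)
            weights[c] = n_c / len(data)
            means[c] = sum(r * x for r, x in zip(r_c, data)) / n_c
            var_c = sum(r * (x - means[c]) ** 2 for r, x in zip(r_c, data)) / n_c
            variances[c] = max(var_c, 1e-6)  # floor keeps a variance from collapsing to 0
    return weights, means, variances, resp


# Hypothetical 1-D data with two groups, and assumed starting means.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.2]
weights, means, variances, resp = em_gmm_1d(data, init_means=[0.0, 6.0])
print([round(m, 3) for m in means])
print([round(p, 3) for p in resp[0]])  # membership probabilities of the first point
```

In practice the starting means are often taken from a quick k-means run rather than chosen by hand, as they are in this sketch.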


  • Fuzzy c-means clustering (FCM) – Like EM clustering, fuzzy c-means clustering calculates a degree of membership for each data point in each of several clusters. Like k-means clustering, it iteratively moves cluster centers toward nearby data points, but in a weighted, or soft, way: each center is updated as a weighted mean of all the points, with points that belong more strongly to that cluster counting more, instead of as the plain mean of only the points assigned to it. Fuzzy c-means clustering has been an important tool in image analysis, including medical image segmentation. Marketers also use it to group target audiences loosely, since one customer can partly belong to several segments.
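The following minimal sketch of fuzzy c-means on one-dimensional data shows the two alternating updates: each point gets a membership degree in every cluster based on its relative distance to each center, and each center moves to a membership-weighted mean of all points. The data, the starting centers, and the common default fuzzifier m = 2 are assumptions chosen for illustration.

```python
def fuzzy_c_means_1d(data, init_centers, m=2.0, n_iters=100):
    """Fuzzy c-means on 1-D data; returns (centers, memberships).

    memberships[i][j] is the degree (0 to 1) to which data[i] belongs to cluster j,
    and each row sums to 1. The fuzzifier m > 1 controls how soft the clusters are.
    """
    centers = list(init_centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iters):
        # Membership step: the closer a center is, the larger its share of the point.
        memberships = []
        for x in data:
            dists = [abs(x - c) for c in centers]
            zeros = dists.count(0.0)
            if zeros:
                # The point sits exactly on one or more centers; split membership among them.
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
            memberships.append(row)
        # Center step: weighted mean of ALL points, each weighted by membership ** m.
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            centers[j] = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return centers, memberships


# Hypothetical data: two groups plus one point (3.0) in between.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 3.0]
centers, memberships = fuzzy_c_means_1d(data, init_centers=[0.0, 6.0])
print([round(c, 2) for c in centers])
print([round(u, 2) for u in memberships[-1]])  # memberships of the in-between point
```

The point at 3.0 lies between the two groups, so its memberships come out close to an even split, while points inside either group belong almost entirely to one cluster.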


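  • Density-based clustering (DBSCAN) – Like k-means clustering, density-based clustering groups data points by spatial proximity, but it does not require a preset number of clusters. It builds clusters from regions where points are densely packed, so it can find irregularly shaped clusters, and it is good at distinguishing between clusters and outliers, which it labels as noise. However, it requires careful selection of its two parameters: the neighborhood radius (often called eps) and the minimum number of points needed to form a dense region (often called minPts).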
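As a minimal sketch of density-based clustering, the function below follows the DBSCAN idea: a point with at least `min_pts` neighbors within distance `eps` (counting itself) is a core point, clusters grow outward from core points, and points that are not reachable from any core point are labeled as noise. The data and parameter values are hypothetical, and the neighbor search is a simple scan over all points rather than the spatial index a production implementation would typically use.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reachable from a core point
            if labels[j] is not None:
                continue  # already in a cluster (border points are not expanded)
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8),
    (6.0, 6.0), (6.2, 5.9), (5.9, 6.2), (6.1, 6.1),
    (3.5, 10.0),
]
print(dbscan(points, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

No cluster count is passed in: the two groups emerge from the density settings, and the isolated point is reported as noise rather than forced into a cluster.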


Unsupervised learning algorithms are highly mathematical and complex, and the right choice depends on the use case. Selecting and tuning them has traditionally been left almost exclusively to highly trained data scientists in the ML industry. However, third-party software, programming languages and libraries, and ML platforms are now reaching the market in a way that lets businesses and laypeople take advantage of unsupervised learning as well.



Data Science & Unsupervised Learning

Choosing the right algorithm in unsupervised learning starts with examining the data at hand. Some algorithms require numerical data, while others can work with voice recordings and images. Some work well with data that is clumped into compact groups, while others can handle spread-out or irregularly shaped data. Some require more computing power and are therefore practical only for small datasets, while others can process massive amounts of data at high speed.

Data scientists today often use a hybrid approach to machine learning. Unsupervised learning is used to sort unlabeled data into rough clusters, which people can then examine and label. Once labeled, this data can be used to train supervised learning algorithms that deliver more precise answers to complex problems.
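The label-then-train workflow can be sketched as follows. This is a hypothetical, minimal example: the cluster assignments stand in for the output of an unsupervised step such as k-means, the mapping from cluster to name stands in for a person inspecting each cluster, and a nearest-centroid classifier is used as the supervised step only because it keeps the sketch short; in practice any supervised algorithm could be trained on the labeled data.

```python
import math

# Hypothetical output of an unsupervised step (e.g., k-means): one cluster id per point.
points = [(0.9, 0.2), (0.8, 0.3), (0.3, 0.8), (0.2, 0.9)]
cluster_ids = [0, 0, 1, 1]

# A person inspects a few members of each cluster and names it (assumed mapping).
cluster_names = {0: "cat", 1: "dog"}

# Propagate the human-chosen names to every point, producing a labeled dataset.
labeled = [(p, cluster_names[c]) for p, c in zip(points, cluster_ids)]


def train_nearest_centroid(examples):
    """Supervised step: learn one centroid per label from labeled examples."""
    sums = {}
    for (x, y), label in examples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Assign the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


model = train_nearest_centroid(labeled)
print(predict(model, (0.85, 0.25)))  # cat
```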

It is important to note that unsupervised learning simply means the data isn’t labeled. Human oversight and intervention are still necessary to set parameters, reduce or eliminate unintended biases, and ensure accuracy and reliability, especially in customer-facing applications such as chatbots and Netflix or Spotify recommendations.


