A Quick Guide to Predictive Analytics

Predictive analytics defined, explained and simplified

Seth Adler

What is Predictive Analytics?

In its simplest form, predictive analytics takes historical data and garners insight from that data to predict future outcomes. Because of the massive computing power available today, Big Data has unlocked the power of predictive analytics in new and actionable ways. Human-built algorithms no longer need to be populated by hand. Data mining, artificial intelligence, and machine learning do those laborious, time-consuming tasks at a scale and cost humans never could. Human effort now goes into the business intelligence (BI) field, creating a model that best suits an organization and adapts to its goals. Predictive analytics is being deployed across all industries, from healthcare and banking to astrology and dating, with incredible results.


How Machine Learning (ML) Enhances Predictive Analytics

While the two terms are often used interchangeably, predictive analytics is an area of study that has been around since the 1940s. Machine learning automates the process and allows predictive analytics outputs to change and evolve as new internal and external data become available. Machine learning can do predictive analytics, but machine learning is not predictive analytics. Other uses of machine learning outside of predictive analytics include natural language processing and facial recognition. Machine learning encompasses unsupervised and supervised learning models.


  • Unsupervised learning provides the ML algorithm with no predefined labels or categories. In unsupervised learning, the unstructured data is examined for patterns, then sorted.


Data Mining

Data mining discovers hidden patterns within massive amounts of data, which are then leveraged by predictive analytics. A dataset is a collection of the mined data, and Big Data is the term coined to describe massive datasets. Data is collected from everywhere today. In 2017, 2.5 exabytes (EB) of data were created daily. One billion gigabytes (GB) make up an exabyte. Data comes from sensors, social media, club cards, transactional data, and self-reported data, to name a few.

Breast cancer detection is a good example of leveraged data mining and Big Data. Large swaths of breast cancer scans are collected through Big Data processes, and patterns are unearthed through data mining. These patterns inform doctors or artificial intelligence (AI), which are then able to non-invasively spot cancer earlier and more accurately than by a single scan alone. Mined data is also used in the business world to spot customer trends and recognize fraudulent activities. Fierce competition can be overcome through the correct predictive analytics strategy, and it starts with data mining and Big Data.


Jamie Campbell, marketing lead at the financial services company Bud, shares the company's approach to handling customer data

Source: The AIIA Network Podcast


Because the amount of data available is so vast, data preparation, including data cleansing, is a vital task to perform before data can be effectively plugged into a predictive analytics algorithm.


Data Classification

Collected data is not typically actionable until it is put into a data matrix. A data matrix is a collection of data organized into rows and columns (think Excel spreadsheets). The characteristics, or variables, are usually stored in columns, and the individual records are stored in rows. Data matrices help clean up data by removing outliers or irrelevant data points, and they sort and organize information into functional data, or data classifications. From data classifications, data models can then be formed.
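As a rough illustration of the idea, the sketch below stores a handful of records as a small data matrix and drops rows that are incomplete or look like outliers. The column names, the sample values, and the "five times the median" cutoff are all assumptions made for this example, not a recommended rule.

```python
from statistics import median

# Hypothetical customer records: each row is one record,
# each column is one characteristic (all values are invented).
columns = ["customer_id", "age", "monthly_spend"]
rows = [
    [1, 34, 120.0],
    [2, 45, 95.5],
    [3, 29, 110.0],
    [4, 51, 9999.0],  # suspiciously large value, likely an entry error
    [5, 38, None],    # missing value
    [6, 42, 130.0],
]

def clean(rows, col_index, max_ratio=5.0):
    """Drop rows with a missing value or a value far above the column median."""
    present = [r for r in rows if r[col_index] is not None]
    mid = median(r[col_index] for r in present)
    return [r for r in present if r[col_index] <= max_ratio * mid]

spend = columns.index("monthly_spend")
cleaned = clean(rows, spend)
for row in cleaned:
    print(dict(zip(columns, row)))
```

In practice the cutoff for what counts as an outlier depends on the data and the business question, so the threshold here should be read as a placeholder.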


Data Modeling

After mining and sorting data, a mathematical formula called an algorithm can apply that data to models. Data modeling turns historical data into insights and predictions by mapping past action and simulating future action. Data modeling takes some creativity and experimentation. Once the model is built, its values can be changed to help spot obvious or obscure trends and predictors. Data clusters and decision trees are two common types of data models.
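To make "turning historical data into predictions" concrete, here is a minimal sketch that assumes the simplest possible model, a straight-line trend fitted by least squares. The article does not prescribe this model; the monthly sales figures and the names `fit_line`, `months`, and `sales` are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales for months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

slope, intercept = fit_line(months, sales)

# Map past action onto a future point: the forecast for month 7.
forecast = slope * 7 + intercept
print(f"trend: {slope:.1f} per month, month 7 forecast: {forecast:.1f}")

# Change a model value to explore a what-if scenario (growth cut in half).
what_if = (slope / 2) * 7 + intercept
print(f"month 7 forecast with half the growth: {what_if:.1f}")
```

Real models are rarely this simple, but the workflow is the same: fit to past data, predict forward, then vary the model's values to see how the prediction responds.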

Data Clusters

Data clustering is a machine learning technique that creates data models by grouping the data into sets with like characteristics. Data clusters are one modeling avenue for predictive analytics, used to predict the future behavior or outcomes of a particular cluster. There are different ways to cluster data, each with its own specific use case.

  • K-means clustering - A K-means clustering algorithm starts by randomly assigning dataset points as cluster representatives. From there, each remaining dataset point is pulled into the cluster of the representative it is closest to. New centers are calculated from the points in each cluster, and the process is repeated until the clusters remain the same. K-means algorithms use the unsupervised learning method and are commonly deployed with large datasets. While K-means clustering algorithms are the simplest form of data clustering, they are also sensitive to outliers and produce only a general set of clusters.


  • Nearest neighbor clustering - K-nearest neighbor (KNN) is a supervised learning method that is commonly used for classification. The target attribute is known beforehand, and a new data point is assigned the label most common among the labeled points closest to it, using a distance measure similar to the one in k-means clustering.


  • Biologically inspired clustering - Just as neural networks in machine learning are modeled after the human brain, clustering can also be modeled after nature. In data analytics, “bird flocking” and “ant colonizing” algorithms are commonly used to cluster data in an organic way. Essentially, these methods cluster groups based on what keeps data points apart from one another, what keeps a data point moving congruently with another, and which data points move together. These algorithms are powerful predictive analytics tools because, with enough data points, one person’s actions, such as their buying habits, can help predict the buying habits of their dataset peer group, allowing for a holistic target marketing approach.
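The first two methods in the list above can be sketched in a few lines of plain Python. This is a simplified illustration, not a production implementation: the two-dimensional points, the "low"/"high" labels, the fixed random seed, and the function names `k_means` and `knn_classify` are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def k_means(points, k, seed=0, max_iter=100):
    """Group points into k clusters; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random points as initial representatives
    clusters = []
    for _ in range(max_iter):
        # Pull every point into the cluster of its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recalculate each center as the mean of its cluster's points
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # clusters no longer change
            break
        centers = new_centers
    return centers, clusters

def knn_classify(labeled, query, k=3):
    """Label a new point by majority vote among its k nearest labeled points."""
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Unsupervised: k-means is given no labels and finds the groups itself.
centers, clusters = k_means(points, k=2)
print(clusters)

# Supervised: KNN is given the labels up front and classifies a new point.
labeled = [(p, "low") for p in points[:3]] + [(p, "high") for p in points[3:]]
print(knn_classify(labeled, (2.0, 1.5)))
```

The contrast between the two calls mirrors the text: k-means receives only the raw points, while KNN needs labeled examples before it can say anything about a new point.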

Decision Trees

A decision tree is a directed supervised learning model. Decision trees are less susceptible to outliers than clustering, and they are simple to comprehend. The model is tree-like visually and structurally in that it starts at the root and branches out into leaves based on varying factors. A decision tree looks like a flow chart and uses a rule-based tactic in predictive learning. Observations are represented as branches, and outcomes are represented as leaves or nodes. Decision trees are outcome-specific and use data to reach particular conclusions, whereas clustering algorithms have no defined direction and cluster accordingly.


  • Classification Tree – In a classification tree, data points are already recognized and defined. New data is then plugged into the algorithm and fed down the tree until it reaches a definitive leaf. Animal classification is an illustrative example. The limbs of the tree categorize by characteristics such as: does it breathe air, does it lay eggs, does it have fur, et cetera. In this way, a classification tree can take an unknown data point and come to certain conclusions about it.


By combining classification trees, an ensemble is born. Ensembles are supervised learning algorithms made up of overlapping yet differentiated classification trees. The power of ensembles lies in their ability to weed out outliers, decrease biases, and develop a more robust predictive model.


  • Regression Tree – A regression tree has no categories, because its target variable is continuous. A regression tree uses variables as inputs and numbers as outputs. For example, based on age, weight, and sex, a regression tree can predict mortality rates for patients with heart disease.


Classification and Regression Tree (CART) is the blanket term for decision tree learning. Ultimately, a classification tree produces an A/B outcome, as in, yes it is a mammal or no it is not; and a regression tree offers a continuous and numerical data outcome, such as the forecasting of housing market prices.
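The difference between the two tree types can be shown with a small hand-built sketch. These trees are written out by hand rather than learned from data, and every rule, threshold, and price in them is invented for illustration; the helper name `walk` is likewise an assumption of this example.

```python
def walk(node, sample):
    """Feed a sample down the tree until it reaches a leaf, then return the leaf."""
    while isinstance(node, dict):
        branch = "yes" if node["test"](sample) else "no"
        node = node[branch]
    return node

# Classification tree: every leaf is a category (a toy mammal / not-a-mammal rule).
animal_tree = {
    "test": lambda a: a["breathes_air"],
    "yes": {
        "test": lambda a: a["has_fur"],
        "yes": "mammal",
        "no": "not a mammal",
    },
    "no": "not a mammal",
}

# Regression tree: every leaf is a number (invented house price estimates).
price_tree = {
    "test": lambda h: h["square_feet"] >= 1500,
    "yes": {
        "test": lambda h: h["bedrooms"] >= 4,
        "yes": 450_000.0,
        "no": 350_000.0,
    },
    "no": 220_000.0,
}

print(walk(animal_tree, {"breathes_air": True, "has_fur": True}))
print(walk(price_tree, {"square_feet": 1800, "bedrooms": 3}))
```

The same walking logic serves both trees; only the kind of value stored at the leaves changes. A learned tree would choose its questions and thresholds from training data instead of having them written in by hand.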


Business Intelligence (BI) in Predictive Analytics

At its core, predictive analytics is a powerful computing tool. However, it is the practice of business intelligence that prioritizes data, takes aim at desired outcomes, and designs a solution around the pattern recognition and forecasting of predictive analytics. In this way, predictive analytics doesn’t replace a human’s capacity to plan, strategize, and creatively execute solutions. The term business intelligence houses not just the technological tools used in predictive analytics and other ML business applications, but the tools, strategies, and best practices brought forth by those capabilities as well.

BI software and platforms offer frontend services like dashboards, reports, and visual tools in the form of graphs and charts. Such an offering allows businesses to scale their data insights companywide, leading to greater collaboration, elimination of silos, and an agile workplace.


Rob McCargow, Programme Leader Artificial Intelligence at PwC, discusses AI as a tool for business during AI Live 2018

Source: AIIA.net: AI and Intelligent Automation


Use Cases for Predictive Analytics

The uses for predictive analytics are ever increasing. It is no longer confined by limited computing, processing, and storing capacities. Big Data, the cloud, and an increase in processing power have allowed industries across the board to leverage the power of predictive analytics and machine learning. In addition, third-party software and vendors are inexpensively providing these services, which now makes them affordable to small businesses or industries that don’t have access to their own mathematicians and data scientists.


  • Healthcare – Predictive analytics in healthcare has the power to take current science and healthcare processes and expand them on a scale humans alone cannot. Symptom calculators, genetic screening, and early intervention and disease prevention all benefit from predictive analytics.


As science better understands the human genome, Big Data and predictive analytics take this new dataset and offer medical professionals the ability to customize healthcare options to patients based on their predicted future risks and needs.


  • Retail – In brick-and-mortar retail, predictive analytics assists with inventory management, pricing, and revenue forecasting. New and old customers benefit from the customized offerings predictive analytics allows, and retailers decrease customer churn and increase customer loyalty as a result.


  • eCommerce – Thanks largely to predictive analytics, eCommerce has exploded in recent years. It is predictive analytics that is behind Netflix’s recommendation engine, Amazon’s suggestions, and Facebook’s networking capabilities.


Personalized ads on social media platforms are all made possible by predictive analytics. Done correctly, both the customer and e-tailer benefit. The customer discovers new products and services organically and noninvasively, and the brand strategically spends its marketing dollars in a highly specific and targeted way.


  • Banking – Financial institutions use predictive analytics to detect and prevent fraud, to score credit, and to approve loans. Instantly signing up for a credit card or loan online is possible because of predictive analytics.


  • Cybersecurity – Predictive analytics can help discover data breaches far earlier than the human eye. It can also forecast the when and where of potential cybercrimes, leading to concentrated fortification and detection efforts. While there is still a long way to go in this arena, with the explosion of cybercrime, predictive analytics is shaping up to be a powerful cybersecurity tool.



Strategic models are relying on predictive analytics software and technologies like never before. To stay competitive in today’s fast-moving landscape, understanding and applying predictive analytics is becoming a necessity rather than a nice-to-have across all public and private sectors. Utilizing current and past knowledge to predict the future has limitless potential.


