In Machine Learning, the Apriori algorithm is used for mining association rules. In this article, I will take you through Market Basket Analysis using the Apriori algorithm with the Python programming language.
What is Association Mining?
Association mining is typically performed on transaction data from a retail marketplace or an online e-commerce store. Since most transaction datasets are large, the Apriori algorithm makes it easier to find purchase patterns, or rules, quickly.
Association rules are used to analyze retail or transactional data. Based on the concept of strong rules, they are intended to identify strong relationships in that data using measures of interestingness.
How does the Apriori Algorithm Work?
The Apriori algorithm is one of the most popular algorithms for mining association rules. It finds the most frequent combinations of items in a database and identifies association rules between them, based on three important measures:
- Support: the probability that X and Y occur together, i.e. the fraction of transactions that contain both.
- Confidence: the conditional probability of Y given X. In other words, how often Y occurs in transactions that contain X.
- Lift: the ratio of the confidence to the support of Y. A lift of 2 means that Y is twice as likely to be bought when X is bought as it is in general.
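To make the three measures concrete, here is a minimal sketch that computes them by hand. The toy transactions and the helper names `support`, `confidence` and `lift` are my own assumptions for illustration; they are not part of the groceries dataset or of any library.

```python
# Toy transactions (hypothetical data, not the groceries dataset).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk"},
    {"butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Estimate of P(Y | X): support(X and Y) divided by support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """confidence(X -> Y) divided by support(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)

rule_support = support({"milk", "bread"}, transactions)
rule_confidence = confidence({"milk"}, {"bread"}, transactions)
rule_lift = lift({"milk"}, {"bread"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

Here milk and bread appear together in 2 of 5 baskets, milk appears in 3 and bread in 3, so the rule milk -> bread has a support of 2/5, a confidence of 2/3 and a lift of 10/9. A lift above 1 suggests that the two items are bought together more often than chance alone would predict.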
Apriori uses a “bottom-up” approach, in which frequent itemsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data. The algorithm ends when no further successful extension is found.
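The bottom-up search can be sketched in plain Python as follows. This is a simplified illustration under my own assumptions (the function name `frequent_itemsets` and the toy baskets are hypothetical), not the implementation used by any particular library, and it only finds frequent itemsets without deriving rules from them.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: keep the single items that are frequent on their own.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Test the surviving candidates against the data.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1  # stops once no candidate of the next size survives
    return frequent

# Hypothetical baskets for illustration.
toy = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "bread"},
]
result = frequent_itemsets(toy, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so {milk, bread, butter} is discarded without scanning the data because {milk, butter} already failed the threshold.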
Now, I will take you through the task of Market Basket Analysis using the Apriori algorithm and Python.
Market Basket Analysis with Apriori Algorithm using Python
Market basket analysis, also known as association rule learning or affinity analysis, is a data mining technique that can be used in various fields, such as marketing, bioinformatics, education, nuclear science, etc.
The main goal of market basket analysis in marketing is to provide the retailer with the information necessary to understand the buyer’s purchasing behaviour, which can help the retailer make better decisions.
There are different algorithms for performing market basket analysis, such as Apriori, FP-Growth and Eclat. Like most of them, the standard Apriori algorithm operates on a static set of transactions, so capturing changes over time means re-running it on updated data.
I will start this task of Market Basket Analysis with the Apriori Algorithm by importing the necessary Python libraries:
Now let’s load the dataset. The dataset that I am using in this task can be downloaded from here:
data = pd.read_csv("Groceries_dataset.csv")
data.head()
| | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit |
| 1 | 2552 | 05-01-2015 | whole milk |
| 2 | 2300 | 19-09-2015 | pip fruit |
| 3 | 1187 | 12-12-2015 | other vegetables |
| 4 | 3037 | 01-02-2015 | whole milk |
Data Exploration
Let’s first have a look at the top 10 best-selling products:
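The original plot is not reproduced here, so the following is only a sketch of how such a ranking could be computed. It uses the five item descriptions from the preview table as stand-in data; with the full dataset, `items` would come from the `itemDescription` column, and `data["itemDescription"].value_counts().head(10)` in pandas gives the same counts.

```python
from collections import Counter

# Stand-in data: the item descriptions shown in the data.head() preview.
items = ["tropical fruit", "whole milk", "pip fruit", "other vegetables", "whole milk"]

# Count how often each product was bought and keep the ten most frequent.
top_products = Counter(items).most_common(10)
for name, count in top_products:
    print(f"{name}: {count}")
```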

Now let’s explore which months have the highest sales:
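Assuming the missing plot showed the number of purchases per month, as the observations suggest, a minimal sketch of that aggregation could look like this. The dates are the five from the preview table, used purely as stand-in data.

```python
from collections import Counter
from datetime import datetime

# Stand-in data: the dates shown in the data.head() preview.
dates = ["21-07-2015", "05-01-2015", "19-09-2015", "12-12-2015", "01-02-2015"]

# Dates in this dataset are day-month-year strings; bucket them by year and month.
month_counts = Counter(
    datetime.strptime(d, "%d-%m-%Y").strftime("%Y-%m") for d in dates
)
for month in sorted(month_counts):
    print(month, month_counts[month])
```

Parsing with an explicit `%d-%m-%Y` format matters here, because a date such as 05-01-2015 would otherwise be easy to misread as May instead of January.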

Observations:
From the above visualizations we can observe that:
- Milk is bought the most, followed by vegetables.
- Most shopping takes place in August / September, while February / March see the least demand.
Implementation of Apriori Algorithm using Python
Now, I will implement the Apriori algorithm in machine learning by using the Python programming language for the task of market basket analysis:
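The `apriori` call quoted in the comments below expects a `transactions` variable: a list in which each element is the list of items in one basket. How the original notebook builds it is not shown in this article, so the following is a sketch under the assumption that one basket is everything a member bought on a given date. The helper name `build_transactions` and the sample rows are hypothetical.

```python
from collections import defaultdict

def build_transactions(rows):
    """Group (member, date, item) rows into one item list per member and date."""
    baskets = defaultdict(list)
    for member, date, item in rows:
        baskets[(member, date)].append(item)
    return list(baskets.values())

# Hypothetical rows in the same shape as the dataset columns.
rows = [
    (1808, "21-07-2015", "tropical fruit"),
    (1808, "21-07-2015", "whole milk"),
    (2552, "05-01-2015", "whole milk"),
]
transactions = build_transactions(rows)
print(transactions)
```

With the DataFrame loaded earlier, the rows could be supplied as `data[["Member_number", "Date", "itemDescription"]].itertuples(index=False, name=None)`. The resulting list is the kind of input that `apriori(transactions, ...)` iterates over.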
RelationRecord(items=frozenset({'liver loaf', 'fruit/vegetable juice'}), support=0.00040098910646260775, ordered_statistics=[OrderedStatistic(items_base=frozenset({'liver loaf'}), items_add=frozenset({'fruit/vegetable juice'}), confidence=0.12, lift=3.5276227897838903)])
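A record like this can be turned into the readable listing that follows with a small formatting loop. The original loop is not shown, so this is a sketch: to stay runnable without the `apyori` package installed, it defines stand-in namedtuples that mirror the field names visible in the record above, and `format_rule` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins mirroring the field names visible in the printed record.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

association_results = [
    RelationRecord(
        items=frozenset({"liver loaf", "fruit/vegetable juice"}),
        support=0.00040098910646260775,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"liver loaf"}),
                items_add=frozenset({"fruit/vegetable juice"}),
                confidence=0.12,
                lift=3.5276227897838903,
            )
        ],
    )
]

def format_rule(record):
    """Render the first ordered statistic of a record as a readable rule."""
    stat = record.ordered_statistics[0]
    lhs = ", ".join(sorted(stat.items_base))
    rhs = ", ".join(sorted(stat.items_add))
    return (
        f"Rule : {lhs} -> {rhs}\n"
        f"Support : {record.support}\n"
        f"Confidence : {stat.confidence}\n"
        f"Lift : {stat.lift}"
    )

for record in association_results:
    print(format_rule(record))
    print("=============================")
```

With real results, the same loop would run over `list(rules)` as returned by the `apriori` call.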
Rule : liver loaf -> fruit/vegetable juice
Support : 0.00040098910646260775
Confidence : 0.12
Lift : 3.5276227897838903
=============================
Rule : ham -> pickled vegetables
Support : 0.0005346521419501437
Confidence : 0.05970149253731344
Lift : 3.4895055970149254
=============================
Rule : roll products -> meat
Support : 0.0003341575887188398
Confidence : 0.06097560975609757
Lift : 3.620547812620984
=============================
Rule : misc. beverages -> salt
Support : 0.0003341575887188398
Confidence : 0.05617977528089888
Lift : 3.5619405827461437
=============================
Rule : spread cheese -> misc. beverages
Support : 0.0003341575887188398
Confidence : 0.05
Lift : 3.170127118644068
=============================
Rule : soups -> seasonal products
Support : 0.0003341575887188398
Confidence : 0.10416666666666667
Lift : 14.704205974842768
=============================
Rule : spread cheese -> sugar
Support : 0.00040098910646260775
Confidence : 0.06
Lift : 3.3878490566037733
=============================
I hope you liked this article on the Apriori algorithm in Machine Learning by using the Python programming language. Feel free to ask your valuable questions in the comments section below.
Hi Aman. Sorry for the question. The code returns this error:
rules = apriori(transactions, min_support = 0.00030, min_confidence = 0.05, min_lift = 3, max_length = 2, target = "rules")
NameError: name 'transactions' is not defined
The code:
import numpy as np
import pandas as pd
import plotly.express as px
import apyori
from apyori import apriori
rules = apriori(transactions, min_support = 0.00030, min_confidence = 0.05, min_lift = 3, max_length = 2, target = "rules")
association_results = list(rules)
print(association_results[0])
RelationRecord(items=frozenset({'liver loaf', 'fruit/vegetable juice'}), \
support=0.00040098910646260775, \
ordered_statistics=[OrderedStatistic(items_base=frozenset({'liver loaf'}),\
items_add=frozenset({'fruit/vegetable juice'}), \
confidence=0.12, lift=3.5276227897838903)])
Complete code can be found here: https://github.com/amankharwal/Website-data/blob/master/association_rule_market_basket_analysis.ipynb