Apriori Algorithm using Python

In Machine Learning, the Apriori algorithm is used to mine association rules from data. In this article, I will take you through Market Basket Analysis with the Apriori algorithm using the Python programming language.

What is Association Mining?

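Association mining is typically performed on transaction data from a retail marketplace or an online e-commerce store, to find patterns such as items that are frequently bought together. Because transaction data is usually large, checking every possible combination of items is impractical; the Apriori algorithm prunes that search so these patterns, or rules, can be found efficiently.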

Association rules are used to analyze retail or transactional data. The goal is to identify strong rules, that is, rules that score highly on measures of interestingness such as support, confidence, and lift.

How does the Apriori Algorithm Work?

The Apriori algorithm is one of the most popular algorithms for mining association rules. It finds the most frequent combinations of items (frequent itemsets) in a database and derives association rules between items from them, based on 3 important measures:

  1. Support: the fraction of all transactions that contain both X and Y, i.e. an estimate of P(X and Y).
  2. Confidence: the conditional probability of Y given X, i.e. support(X and Y) / support(X). In other words, how often Y appears in transactions that already contain X.
  3. Lift: the confidence of the rule divided by the support of Y, i.e. P(Y | X) / P(Y). A lift of 2 means that Y is twice as likely to be bought when X is bought as it is in general; a lift of 1 means X and Y are independent.
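The three measures can be computed directly from a list of transactions. The sketch below uses a small hypothetical set of baskets (not the article's dataset) and defines lift as confidence(X -> Y) / support(Y); the helper names support, confidence and lift are illustrative only.

```python
# Hypothetical toy transactions: each set is one shopping basket
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"milk"},
    {"eggs"},
    {"butter"},
    {"butter", "eggs"},
    {"apples"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Estimate of P(Y | X): support(X and Y together) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """confidence(X -> Y) / support(Y); 1.0 would mean X and Y look independent."""
    return confidence(x, y, transactions) / support(y, transactions)


print(support({"milk", "bread"}, transactions))       # 0.25
print(confidence({"milk"}, {"bread"}, transactions))  # 0.5
print(lift({"milk"}, {"bread"}, transactions))        # 2.0
```

Milk is in half of the baskets and bread in a quarter, but half of the milk baskets contain bread, so the lift of milk -> bread is 0.5 / 0.25 = 2: bread is twice as likely in a basket that contains milk as in a random basket.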

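Apriori uses a “bottom-up” approach, in which frequent itemsets are extended one item at a time (a step called candidate generation), and each group of candidates is tested against the data. It relies on the fact that every subset of a frequent itemset must also be frequent, so any candidate that contains an infrequent subset can be discarded without being counted. The algorithm terminates when no further successful extensions are found.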
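A minimal, unoptimized sketch of this level-wise search in plain Python. The function name apriori_frequent_itemsets is hypothetical and this is not the apyori implementation used later; it re-scans every transaction at each level, trading speed for readability.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise (bottom-up) search; returns {frozenset(itemset): support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One pass over the data: count each candidate, keep those meeting min_support
        counts = {c: 0 for c in candidates}
        for basket in transactions:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items
    frequent = frequent_among({frozenset([item]) for t in transactions for item in t})
    result = dict(frequent)
    size = 2
    while frequent:
        prev = set(frequent)
        # Candidate generation: unions of two frequent itemsets that are one item larger
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Pruning: drop candidates that have any infrequent (size - 1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, size - 1))
        }
        frequent = frequent_among(candidates)
        result.update(frequent)
        size += 1
    return result


transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"butter"},
]
itemsets = apriori_frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

Here butter is dropped at level 1, all three pairs survive with support 0.4, and the only three-item candidate {bread, eggs, milk} is generated but fails the threshold (support 0.2), so the search stops.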

Now, I will take you through the task of Market Basket Analysis with the Apriori algorithm using Python.

Market Basket Analysis with Apriori Algorithm using Python

Market basket analysis, also known as association rule learning or affinity analysis, is a data mining technique that can be used in various fields, such as marketing, bioinformatics, education, and nuclear science.

The main goal of market basket analysis in marketing is to provide the retailer with the information needed to understand the buyer’s purchasing behaviour, which can help the retailer make better-informed decisions.

There are different algorithms for performing market basket analysis. Many of them, Apriori included, work on a static snapshot of the data and do not by themselves capture how the data changes over time. Because Apriori is simple to rerun, one way to account for such changes is to apply it to successive time windows and compare the resulting rules.

I will start this task of Market Basket Analysis with the Apriori algorithm by importing the necessary Python libraries:

Now let’s load the dataset. The dataset that I am using in this task can be downloaded from here:

import pandas as pd

# Each row is one item bought by a member on a given date
data = pd.read_csv("Groceries_dataset.csv")
data.head()
   Member_number        Date   itemDescription
0           1808  21-07-2015    tropical fruit
1           2552  05-01-2015        whole milk
2           2300  19-09-2015         pip fruit
3           1187  12-12-2015  other vegetables
4           3037  01-02-2015        whole milk

Data Exploration

Let’s first have a look at the 10 best-selling products:

[Figure: the 10 best-selling products]
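The code behind this chart isn't shown. The counting it needs is simple; here it is sketched with collections.Counter on the five rows from data.head() above. The helper top_items is hypothetical; for the full DataFrame, pandas' data["itemDescription"].value_counts().head(10) computes the equivalent counts.

```python
from collections import Counter

# The five rows shown by data.head() above: (Member_number, Date, itemDescription)
rows = [
    (1808, "21-07-2015", "tropical fruit"),
    (2552, "05-01-2015", "whole milk"),
    (2300, "19-09-2015", "pip fruit"),
    (1187, "12-12-2015", "other vegetables"),
    (3037, "01-02-2015", "whole milk"),
]


def top_items(rows, n=10):
    """Return the n most frequently purchased items as (item, count) pairs."""
    return Counter(item for _member, _date, item in rows).most_common(n)


print(top_items(rows))
```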

Now let’s explore when sales are highest:

[Figure: sales over time]
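Since the observations below are stated in terms of months, one way to get such a view is to count purchased items per calendar month. This sketch parses the DD-MM-YYYY dates seen in data.head() with datetime.strptime; items_per_month is a hypothetical helper and not necessarily how the original chart was built. For the whole DataFrame, pd.to_datetime(data["Date"], format="%d-%m-%Y").dt.month.value_counts().sort_index() gives the same kind of counts.

```python
from collections import Counter
from datetime import datetime

# The five rows shown by data.head() above: (Member_number, Date, itemDescription)
rows = [
    (1808, "21-07-2015", "tropical fruit"),
    (2552, "05-01-2015", "whole milk"),
    (2300, "19-09-2015", "pip fruit"),
    (1187, "12-12-2015", "other vegetables"),
    (3037, "01-02-2015", "whole milk"),
]


def items_per_month(rows):
    """Count purchased items per calendar month (1-12); Date is DD-MM-YYYY."""
    return Counter(
        datetime.strptime(date, "%d-%m-%Y").month for _member, date, _item in rows
    )


for month, count in sorted(items_per_month(rows).items()):
    print(month, count)
```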

Observations:

From the above visualizations we can observe that:

  1. Whole milk is the most frequently bought item, followed by other vegetables.
  2. Most shopping takes place in August and September, while February and March see the least.

Implementation of Apriori Algorithm using Python

Now, I will implement the Apriori algorithm in Python for the task of market basket analysis:
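The code for this step isn't reproduced here, but the first comment at the end of the article quotes the call that was used, apriori(transactions, min_support = 0.00030, min_confidence = 0.05, min_lift = 3, max_length = 2, ...) from the apyori package, and its NameError shows that transactions must be built from the DataFrame first. Below is a minimal plain-Python sketch of both steps. It assumes, without confirmation from the article, that one transaction is everything one member bought on one date; build_transactions and pair_rules are hypothetical helpers, not apyori's code. Because max_length = 2 only involves single items and pairs, pair_rules counts pairs directly instead of running the full level-wise search.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_transactions(rows):
    """Group (Member_number, Date, item) rows into one basket per member per date.
    Assumption: a transaction is everything one member bought on one day."""
    baskets = defaultdict(set)
    for member, date, item in rows:
        baskets[(member, date)].add(item)
    return list(baskets.values())


def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Rules X -> Y between single items (the max_length=2 case) that pass all
    three thresholds, as (base, add, support, confidence, lift) tuples."""
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in basket)
    pair_counts = Counter(
        frozenset(pair) for basket in transactions for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for pair, count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        a, b = sorted(pair)
        for base, add in ((a, b), (b, a)):  # each pair yields two candidate rules
            confidence = count / item_counts[base]
            lift = confidence / (item_counts[add] / n)
            if confidence >= min_confidence and lift >= min_lift:
                rules.append((base, add, support, confidence, lift))
    return rules


# Hypothetical rows; with the real data you could pass
# data[["Member_number", "Date", "itemDescription"]].itertuples(index=False)
rows = [
    (1, "01-01-2015", "milk"),
    (1, "01-01-2015", "bread"),
    (1, "05-01-2015", "eggs"),
    (2, "01-01-2015", "milk"),
    (2, "01-01-2015", "bread"),
    (3, "02-01-2015", "butter"),
]
transactions = build_transactions(rows)  # 4 baskets: member 1 shopped on two dates
for rule in pair_rules(transactions, min_support=0.3, min_confidence=0.05, min_lift=1.5):
    print(rule)
```

The toy thresholds are looser than the article's because four baskets cannot reach a lift of 3. On the real data, the list returned by build_transactions should be usable as the transactions argument of the apyori call quoted above.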

RelationRecord(items=frozenset({'liver loaf', 'fruit/vegetable juice'}), support=0.00040098910646260775, ordered_statistics=[OrderedStatistic(items_base=frozenset({'liver loaf'}), items_add=frozenset({'fruit/vegetable juice'}), confidence=0.12, lift=3.5276227897838903)])
Rule :  liver loaf  -> fruit/vegetable juice
Support :  0.00040098910646260775
Confidence :  0.12
Lift :  3.5276227897838903
=============================
Rule :  ham  -> pickled vegetables
Support :  0.0005346521419501437
Confidence :  0.05970149253731344
Lift :  3.4895055970149254
=============================
Rule :  roll products   -> meat
Support :  0.0003341575887188398
Confidence :  0.06097560975609757
Lift :  3.620547812620984
=============================
Rule :  misc. beverages  -> salt
Support :  0.0003341575887188398
Confidence :  0.05617977528089888
Lift :  3.5619405827461437
=============================
Rule :  spread cheese  -> misc. beverages
Support :  0.0003341575887188398
Confidence :  0.05
Lift :  3.170127118644068
=============================
Rule :  soups  -> seasonal products
Support :  0.0003341575887188398
Confidence :  0.10416666666666667
Lift :  14.704205974842768
=============================
Rule :  spread cheese  -> sugar
Support :  0.00040098910646260775
Confidence :  0.06
Lift :  3.3878490566037733
=============================
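The loop that prints the rules above isn't shown either. Here is a minimal sketch that produces the same layout, assuming records shaped like the RelationRecord printed above. To keep it runnable without apyori, it defines stand-in namedtuples with the same field names (an assumption about apyori's types, not its actual classes). It reads the rule direction from items_base and items_add; taking the two elements of the unordered items frozenset instead would not guarantee that the arrow points the right way.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the fields of the record printed above
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])


def format_rules(records):
    """Render every ordered statistic of every record in the 'Rule : X -> Y' layout."""
    lines = []
    for record in records:
        for stat in record.ordered_statistics:
            base = ", ".join(sorted(stat.items_base))
            add = ", ".join(sorted(stat.items_add))
            lines.append(f"Rule :  {base}  -> {add}")
            lines.append(f"Support :  {record.support}")
            lines.append(f"Confidence :  {stat.confidence}")
            lines.append(f"Lift :  {stat.lift}")
            lines.append("=============================")
    return "\n".join(lines)


records = [RelationRecord(
    items=frozenset({"liver loaf", "fruit/vegetable juice"}),
    support=0.00040098910646260775,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({"liver loaf"}),
        items_add=frozenset({"fruit/vegetable juice"}),
        confidence=0.12,
        lift=3.5276227897838903,
    )],
)]
print(format_rules(records))
```

As a sanity check on the first rule: lift = confidence / support(Y), so fruit/vegetable juice must appear in about 0.12 / 3.53 ≈ 3.4% of transactions, and support / confidence puts liver loaf in about 0.0004 / 0.12 ≈ 0.33% of them.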

I hope you liked this article on the Apriori algorithm in Machine Learning using Python. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal

Data Strategist at Statso. My aim is to decode data science for the real world in the most simple words.

Articles: 1611

2 Comments

  1. Hi Aman. Sorry for the question. The code returns this error:

    rules = apriori(transactions, min_support = 0.00030, min_confidence = 0.05, min_lift = 3, max_length = 2, target = "rules")
    NameError: name 'transactions' is not defined

    The code:
    import numpy as np
    import pandas as pd
    import plotly.express as px
    import apyori
    from apyori import apriori

    rules = apriori(transactions, min_support = 0.00030, min_confidence = 0.05, min_lift = 3, max_length = 2, target = "rules")
    association_results = list(rules)
    print(association_results[0])

    RelationRecord(items=frozenset({'liver loaf', 'fruit/vegetable juice'}), support=0.00040098910646260775, ordered_statistics=[OrderedStatistic(items_base=frozenset({'liver loaf'}), items_add=frozenset({'fruit/vegetable juice'}), confidence=0.12, lift=3.5276227897838903)])

Discover more from thecleverprogrammer
