Market Basket Analysis is a powerful tool to enhance the bottom line of every shop, offline or online. Affinity analysis and association rule learning are often used as synonyms for Market Basket Analysis (MBA). No matter what you call it, the results are hopefully the same: a set of association rules showing patterns in consumer behavior.


Market Basket Analysis

Market Basket Analysis is a technique used by businesses to understand the buying habits of their customers. Imagine you own a grocery store and want to know what items people tend to buy together. This analysis helps you find patterns in customers' purchases, allowing you to make smarter decisions on product placement, promotions, and inventory.

How It Works

Think of it like this: Every time a customer shops, their purchases are recorded. Market Basket Analysis looks at these records and identifies which products are frequently bought together. For example, if many customers who buy bread also buy butter, this analysis will highlight that connection.
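
To make the "which products are bought together" step concrete, here is a minimal Python sketch that counts how many baskets contain each pair of products. The basket data and the `count_pairs` helper are hypothetical and exist only for illustration. A real analysis would read the purchase records from your shop or analytics data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: each inner list is one customer's basket.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
]


def count_pairs(baskets):
    """Count in how many baskets each unordered pair of products occurs."""
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so that ("bread", "butter") and
        # ("butter", "bread") are counted as the same pair.
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
    return pair_counts


if __name__ == "__main__":
    # Print the most frequent pairs first.
    for pair, count in count_pairs(baskets).most_common(3):
        print(pair, count)
```

In this toy data, ("bread", "butter") and ("butter", "milk") each appear in two baskets. That is exactly the kind of connection the analysis would highlight.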


Why It's Useful:

  • Product Placement: If you know that customers often buy chips and soda together, you might place these items near each other in the store to make it easier for customers to find and purchase both.
  • Promotions: If certain items are often bought together, you can create bundle deals to encourage customers to buy more.
  • Inventory Management: Understanding what items are commonly purchased together helps you ensure those items are always in stock.

Simple Example:

Imagine you notice that customers who buy spaghetti often buy tomato sauce too. With this insight, you could place tomato sauce next to the spaghetti on the shelves, making it more convenient for customers and potentially boosting sales.



The Metrics

In market basket analysis, support and confidence are key metrics used to identify and evaluate the relationships between items in a dataset, typically in the context of retail transactions.

Support

Definition: Support refers to the proportion of transactions in the dataset that contain a specific item or set of items. It is a measure of how frequently an itemset appears in the database.

Formula:

$$\text{Support}(A, B) = \frac{\text{Number of transactions containing both } A \text{ and } B}{\text{Total number of transactions}}$$

Interpretation: A higher support value indicates that the itemset is common across transactions, which may be indicative of a strong overall purchasing trend.
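
The support formula translates directly into code. The minimal sketch below assumes each transaction is stored as a Python set of item names. The `support` function name and the sample transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    # A transaction "contains" the itemset if the itemset is a subset of it.
    matches = sum(1 for t in transactions if itemset.issubset(t))
    return matches / len(transactions)


# Hypothetical sample data: one set of items per transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk"},
]

print(support(transactions, {"bread"}))            # 3 of 4 transactions -> 0.75
print(support(transactions, {"bread", "butter"}))  # 2 of 4 transactions -> 0.5
```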


Confidence

Definition: Confidence is a measure of how often items in a set appear together, given that one of the items is already present. It indicates the likelihood that a transaction containing item A also contains item B.

Formula:
$$\text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A, B)}{\text{Support}(A)}$$

Interpretation: Confidence is used to evaluate the reliability of a rule, such as "If a customer buys A, they are likely to also buy B." A higher confidence value suggests a stronger association between the items.
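
Confidence is a ratio of two supports, so a sketch only needs a support helper plus one division. The rule is directional: Confidence(A → B) and Confidence(B → A) generally differ. The function names and data below are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence(A -> B) = Support(A, B) / Support(A)."""
    support_a = support(transactions, antecedent)
    if support_a == 0:
        # The rule is undefined if A never occurs.
        raise ValueError("antecedent never occurs in the transactions")
    support_ab = support(transactions, set(antecedent) | set(consequent))
    return support_ab / support_a


# Hypothetical sample data: one set of items per transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk"},
]

print(confidence(transactions, {"bread"}, {"butter"}))  # 0.5 / 0.75, about 0.667
print(confidence(transactions, {"butter"}, {"bread"}))  # 0.5 / 0.5 = 1.0
```

Every basket containing butter also contains bread, so butter → bread reaches a confidence of 1.0. The reverse rule, bread → butter, only reaches about 0.67.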


Example

Suppose we have the following list of transactions from the online shop Vorhees Supplies:

Transaction ID | Items Purchased
---|---
1 | Chainsaw, Band Aid
2 | Chainsaw, Beer, Mask
3 | Band Aid, Beer, Mask
4 | Chainsaw, Band Aid, Beer
5 | Chainsaw, Band Aid, Mask


The support for some items is calculated as follows:

Support(Chainsaw):
Chainsaw appears in four out of five transactions.
$$\text{Support}(\text{Chainsaw}) = \frac{4}{5} = 0.8 \text{ or } 80\%$$

Support(Chainsaw, Band Aid):
The combination of Chainsaw and Band Aid appears in three out of five transactions.
$$\text{Support}(\text{Chainsaw}, \text{Band Aid}) = \frac{3}{5} = 0.6 \text{ or } 60\%$$

Let's calculate the confidence that a customer will buy a Band Aid if they have already bought a Chainsaw.
We know that Support(Chainsaw) is 0.8 and Support(Chainsaw, Band Aid) is 0.6, so we just need to plug these values into the formula:
$$\text{Confidence}(\{\text{Chainsaw}\} \rightarrow \{\text{Band Aid}\}) = \frac{\text{Support}(\text{Chainsaw}, \text{Band Aid})}{\text{Support}(\text{Chainsaw})} = \frac{0.6}{0.8} = 0.75 \text{ or } 75\%$$
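
As a quick check, the minimal Python sketch below recomputes these numbers from the five Vorhees Supplies transactions. It stores each transaction as a set of item names. It uses `fractions.Fraction` so the results come out exact (4/5, 3/5, 3/4) instead of as rounded floats. The `support` and `confidence` helpers are hypothetical names chosen for this illustration.

```python
from fractions import Fraction

# The five Vorhees Supplies transactions from the example table.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(itemset):
    """Exact fraction of transactions containing every item in `itemset`."""
    matches = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(matches, len(transactions))


def confidence(antecedent, consequent):
    """Confidence(A -> B) = Support(A, B) / Support(A)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


s_chainsaw = support({"Chainsaw"})              # 4/5
s_both = support({"Chainsaw", "Band Aid"})      # 3/5
conf = confidence({"Chainsaw"}, {"Band Aid"})   # 3/4

print(float(s_chainsaw), float(s_both), float(conf))  # 0.8 0.6 0.75
```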

Results:

  • Support tells us how often a particular item or combination of items appears in all transactions. For example, Chainsaw and Band Aid appear together in 60% of all transactions.
  • Confidence tells us the likelihood of a customer buying one item if they've already bought another. For example, if a customer buys a Chainsaw, there's a 75% chance they'll also buy a Band Aid.



Lift

Definition: Lift compares the actual probability of purchasing two items together to the probability of purchasing them independently. Essentially, it tells you how much one item boosts the chances of the other item being bought.

Formula:
$$\text{Lift}(A \rightarrow B) = \frac{\text{Support}(A, B)}{\text{Support}(A) \times \text{Support}(B)}$$

Where:

  • Support(A, B): The probability that both A and B are bought together.
  • Support(A): The probability that A is bought.
  • Support(B): The probability that B is bought.

Interpretation:
  • Lift = 1: Indicates no association between the items A and B. The occurrence of A does not affect the likelihood of B being purchased.
  • Lift > 1: Indicates a positive association. If Lift is 1.5, it means that A and B are 1.5 times more likely to be purchased together than if they were independent.
  • Lift < 1: Indicates a negative association. It suggests that the presence of A decreases the likelihood of B being purchased.
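
Running the lift formula on the Vorhees Supplies example is instructive. Band Aid appears in four of the five baskets (Support = 0.8), so Lift(Chainsaw → Band Aid) = 0.6 / (0.8 × 0.8) = 0.9375. Despite the 75% confidence, buying a chainsaw makes a band aid slightly less likely than it is across all baskets. The minimal sketch below computes that value and, for contrast, Lift(Beer → Mask). The helper names are hypothetical.

```python
from fractions import Fraction

# The five Vorhees Supplies transactions from the example table.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(itemset):
    """Exact fraction of transactions containing every item in `itemset`."""
    matches = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(matches, len(transactions))


def lift(a, b):
    """Lift(A -> B) = Support(A, B) / (Support(A) * Support(B))."""
    return support(set(a) | set(b)) / (support(a) * support(b))


print(float(lift({"Chainsaw"}, {"Band Aid"})))  # 0.9375 (below 1: slightly negative)
print(float(lift({"Beer"}, {"Mask"})))          # 10/9, about 1.11 (above 1: slightly positive)
```

The formula is symmetric in A and B, so Lift(Band Aid → Chainsaw) has the same value as Lift(Chainsaw → Band Aid).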

Advanced Metrics

In market basket analysis, beyond support, confidence, and lift, there are other metrics like leverage, conviction, and Zhang's metric that help assess the strength and significance of associations between items.



Leverage

Definition: Leverage measures the difference between the observed frequency of an itemset and the frequency expected if the items were independent. It indicates whether the presence of one item increases or decreases the likelihood of the other item being present.

Formula:
$$\text{Leverage}(A, B) = \text{Support}(A, B) - \text{Support}(A) \times \text{Support}(B)$$

Interpretation:

  • A leverage value of 0 indicates no association between the items (they are independent).
  • A positive leverage value indicates a positive association (items are often bought together more than expected).
  • A negative leverage value indicates a negative association (items are bought together less often than expected).
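
Lift divides the observed support by the support expected under independence; leverage subtracts it instead. The minimal sketch below applies the formula to the Vorhees Supplies transactions. The helper names are hypothetical. Chainsaw and Band Aid co-occur slightly less often than independence would predict (0.6 − 0.64 = −0.04). Beer and Mask co-occur slightly more often (0.4 − 0.36 = 0.04).

```python
from fractions import Fraction

# The five Vorhees Supplies transactions from the example table.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(itemset):
    """Exact fraction of transactions containing every item in `itemset`."""
    matches = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(matches, len(transactions))


def leverage(a, b):
    """Leverage(A, B) = Support(A, B) - Support(A) * Support(B)."""
    return support(set(a) | set(b)) - support(a) * support(b)


print(float(leverage({"Chainsaw"}, {"Band Aid"})))  # -0.04: together less often than expected
print(float(leverage({"Beer"}, {"Mask"})))          # 0.04: together more often than expected
```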



Conviction

Definition: Conviction measures the strength of an implication rule in terms of the ratio of the expected frequency of the rule being wrong (if the items were independent) to the observed frequency of the rule being wrong. It is an alternative to confidence, especially when the itemset has a low support.

Formula:
$$\text{Conviction}(A \rightarrow B) = \frac{1 - \text{Support}(B)}{1 - \text{Confidence}(A \rightarrow B)}$$

Interpretation:

  • A conviction value close to 1 suggests a weak or no association.
  • A higher conviction value suggests a stronger association.
  • Conviction tends to infinity when the confidence is 1 (i.e., the rule is always true).
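
A conviction sketch needs confidence as an intermediate step. It also has to handle a confidence of exactly 1, where the formula would divide by zero. The version below returns `float("inf")` in that case, in line with the interpretation above. It is a minimal illustration on the Vorhees Supplies transactions, and the helper names are hypothetical. For Chainsaw → Band Aid it yields (1 − 0.8) / (1 − 0.75) = 0.8. Like the lift of 0.9375, this points to a slightly negative association.

```python
from fractions import Fraction

# The five Vorhees Supplies transactions from the example table.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(itemset):
    """Exact fraction of transactions containing every item in `itemset`."""
    matches = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(matches, len(transactions))


def confidence(a, b):
    """Confidence(A -> B) = Support(A, B) / Support(A)."""
    return support(set(a) | set(b)) / support(a)


def conviction(a, b):
    """Conviction(A -> B) = (1 - Support(B)) / (1 - Confidence(A -> B))."""
    conf = confidence(a, b)
    if conf == 1:
        # The rule is never wrong, so conviction is unbounded.
        return float("inf")
    return (1 - support(b)) / (1 - conf)


print(float(conviction({"Chainsaw"}, {"Band Aid"})))  # (1 - 0.8) / (1 - 0.75) = 0.8
print(float(conviction({"Beer"}, {"Mask"})))          # (1 - 0.6) / (1 - 2/3) = 1.2
```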



Zhang's Metric

Definition: Zhang's metric measures the strength of the association between two items by considering their joint probability and their individual probabilities. (The name "interest" usually refers to lift, not to Zhang's metric.) It is designed to overcome some limitations of other metrics like confidence and lift.

Formula:
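$$\text{Zhang's Metric}(A \rightarrow B) = \frac{\text{Support}(A, B) - \text{Support}(A) \times \text{Support}(B)}{\max\big(\text{Support}(A, B) \times (1 - \text{Support}(A)),\ \text{Support}(A) \times (\text{Support}(B) - \text{Support}(A, B))\big)}$$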

Interpretation:

  • Zhang's metric value ranges between -1 and 1.
  • A value close to 0 indicates no association.
  • A positive value indicates a positive association (items are frequently bought together).
  • A negative value indicates a negative association (items are rarely bought together).
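
The sketch below computes Zhang's metric in the form commonly cited in the literature and used by libraries such as mlxtend:

(Support(A, B) − Support(A) × Support(B)) / max(Support(A, B) × (1 − Support(A)), Support(A) × (Support(B) − Support(A, B)))

Unlike lift and leverage, this is not symmetric in A and B in general. The denominator is zero only when the numerator is zero as well. This sketch treats that 0/0 case as "no association" (0), which is an assumption. The data are the Vorhees Supplies transactions, and the helper names are hypothetical.

```python
from fractions import Fraction

# The five Vorhees Supplies transactions from the example table.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(itemset):
    """Exact fraction of transactions containing every item in `itemset`."""
    matches = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(matches, len(transactions))


def zhangs_metric(a, b):
    """Zhang's metric for the rule A -> B, ranging from -1 to 1."""
    s_a, s_b = support(a), support(b)
    s_ab = support(set(a) | set(b))
    numerator = s_ab - s_a * s_b  # this is the leverage of the rule
    denominator = max(s_ab * (1 - s_a), s_a * (s_b - s_ab))
    if denominator == 0:
        # 0/0 only occurs when the numerator is also 0; treat it as
        # "no association" (an assumption of this sketch).
        return Fraction(0)
    return numerator / denominator


print(float(zhangs_metric({"Chainsaw"}, {"Band Aid"})))  # -0.25
print(float(zhangs_metric({"Beer"}, {"Mask"})))          # 0.25
```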

Summary of Advanced Metrics

Leverage tells us how much more (or less) often items are bought together than expected by chance.

Conviction helps to understand how strongly one item implies the presence of another, considering how often that implication would be wrong.

Zhang's Metric gives a balanced view of association strength, correcting some biases inherent in other metrics.

Diapers, beer and urban myths

Market Basket Analysis had its fifteen minutes of fame through the urban legend of Diapers & Beer. Supposedly, Walmart ran an MBA on its customer data and found that diapers and beer are most commonly bought together. Sadly, this is more or less an urban legend. The diapers and beer example goes back to 1992 and a company named Osco Drug, which used the findings to remove slow-moving SKUs and place fast-moving items closer together.

And what about Walmart?

Well, everyone's favorite retailer from Arkansas does use market basket analysis. Under the easy-to-remember name Luminate Shopper Behavior, you can get insights into the performance of your brand's products, if you are listed at Walmart.
For most SMBs, startups, and online shops, this is not very helpful. So, MBA on your own data it is.

Challenges with Market Basket Analysis

Theoretically, you can run MBA in Excel. (But let's be honest, you can do almost anything in Excel, even a flight simulator. That does not mean it is a smart move.)
The main challenge does not lie in the actual programming. You can find tons of ready-made scripts for R (brrr) and Python (yay) on GitHub. The problem is the data.
Either you have too few data points or too many. Which algorithm do you choose, Apriori or FP-Growth? How far back do you look? Do you segment the data before you run your analysis? Lots of headaches before you can run the algorithm for the first time.
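
To show what the Apriori option does, here is a minimal, unoptimized pure-Python sketch of its level-wise search:

  • Find the frequent single items.
  • Join them into larger candidate itemsets.
  • Prune any candidate that has an infrequent subset.
  • Repeat until no candidates are left.

This is an illustration under simplifying assumptions, not a production implementation. It assumes all data fits in memory and that `min_support` is a fraction between 0 and 1. FP-Growth finds the same frequent itemsets with a different, tree-based strategy. Ready-made implementations of both exist, for example `apriori` and `fpgrowth` in mlxtend. The `apriori_sketch` function is hypothetical and is not any library's API.

```python
from itertools import combinations

# The five Vorhees Supplies transactions from the example table.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with
    support >= min_support, found level by level (Apriori style)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        # Count how many baskets contain each candidate; keep the frequent ones.
        result = {}
        for candidate in candidates:
            share = sum(1 for b in baskets if candidate <= b) / n
            if share >= min_support:
                result[candidate] = share
        return result

    # Level 1: every single item is a candidate.
    frequent = keep_frequent({frozenset([item]) for b in baskets for item in b})
    all_frequent = dict(frequent)
    k = 2
    while frequent:
        previous = list(frequent)
        # Join step: merge two frequent (k-1)-itemsets into a k-itemset.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        frequent = keep_frequent(candidates)
        all_frequent.update(frequent)
        k += 1
    return all_frequent


for itemset, share in sorted(apriori_sketch(transactions, 0.6).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), share)
```

With `min_support=0.6`, only the four single items and the pair {Chainsaw, Band Aid} survive. Lowering the threshold to 0.4 keeps all six pairs but still no triple. The support threshold is one of the knobs that decides how much output you end up sifting through.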

Market Basket Analysis in the Digital Twin

The Digital Twin was developed to make your life as a marketer or sales guy much easier.
If the Digital Twin has access to your Google Analytics or shop data, you can run MBA right off the bat.
Just prompt the Digital Twin to do it for you:

What are the top three products most often bought together?
Which products are bought together on mobile and which on tablets?
Would my customers from California who buy product X prefer to also buy product Y or Z when the first touchpoint is Google Ads?

Contact us

Get in touch with us for a demo of Mnemonic AI or request further information.