The Metrics
In market basket analysis, support and confidence are key metrics used to identify and evaluate the relationships between items in a dataset, typically in the context of retail transactions.
Support
Definition: Support refers to the proportion of transactions in the dataset that contain a specific item or set of items. It is a measure of how frequently an itemset appears in the database.
Formula:
Support(A, B) = Number of transactions containing both A and B / Total number of transactions
Interpretation: A higher support value indicates that the itemset is common across transactions, which may be indicative of a strong overall purchasing trend.
Confidence
Definition: Confidence is a measure of how often items in a set appear together, given that one of the items is already present. It indicates the likelihood that a transaction containing item A also contains item B.
Formula:
Confidence(A → B) = Support(A, B) / Support(A)
Interpretation: Confidence is used to evaluate the reliability of a rule, such as "If a customer buys A, they are likely to also buy B." A higher confidence value suggests a stronger association between the items.
Example
Suppose we have the following list of transactions from the online shop Vorhees Supplies:
Transaction ID | Items Purchased
1 | Chainsaw, Band Aid
2 | Chainsaw, Beer, Mask
3 | Band Aid, Beer, Mask
4 | Chainsaw, Band Aid, Beer
5 | Chainsaw, Band Aid, Mask
The support for some items is calculated as follows:
Support(Chainsaw):
Chainsaw appears in four out of five transactions, so Support(Chainsaw) = 4 / 5 = 0.8.
Support(Chainsaw, Band Aid):
The combination of Chainsaw and Band Aid appears in three out of five transactions, so Support(Chainsaw, Band Aid) = 3 / 5 = 0.6.
Let's calculate the confidence that a customer will buy a Band Aid if they have already bought a Chainsaw.
We know that Support(Chainsaw) is 0.8 and Support(Chainsaw, Band Aid) is 0.6, so we just need to plug these values into the formula:
Confidence(Chainsaw → Band Aid) = Support(Chainsaw, Band Aid) / Support(Chainsaw) = 0.6 / 0.8 = 0.75
Results:
- Support tells us how often a particular item or combination of items appears in all transactions. For example, Chainsaw and Band Aid appear together in 60% of all transactions.
- Confidence tells us the likelihood of a customer buying one item if they've already bought another. For example, if a customer buys a Chainsaw, there's a 75% chance they'll also buy a Band Aid.
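The two calculations can be reproduced in a few lines of Python. This is only a minimal sketch: the helper names support and confidence are made up for illustration, and the baskets are the five Vorhees Supplies transactions from the example.

```python
# The five transactions from the Vorhees Supplies example.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(items, transactions):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(A, B) / Support(A) for the rule A -> B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(f"{support({'Chainsaw'}, transactions):.2f}")                   # 0.80
print(f"{support({'Chainsaw', 'Band Aid'}, transactions):.2f}")       # 0.60
print(f"{confidence({'Chainsaw'}, {'Band Aid'}, transactions):.2f}")  # 0.75
```

Storing each basket as a set keeps the "contains all of these items" check down to a single subset comparison.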
Lift
Definition: Lift compares the actual probability of purchasing two items together to the probability of purchasing them independently. Essentially, it tells you how much one item boosts the chances of the other item being bought.
Formula:
Lift(A → B) = Support(A, B) / (Support(A) × Support(B))
Where:
- Support(A, B): The probability that both A and B are bought together.
- Support(A): The probability that A is bought.
- Support(B): The probability that B is bought.
Interpretation:
- Lift = 1: Indicates no association between the items A and B. The occurrence of A does not affect the likelihood of B being purchased.
- Lift > 1: Indicates a positive association. If Lift is 1.5, it means that A and B are 1.5 times more likely to be purchased together than if they were independent.
- Lift < 1: Indicates a negative association. It suggests that the presence of A decreases the likelihood of B being purchased.
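To see lift in action, here is a small sketch that applies it to the Vorhees Supplies transactions. The function names are illustrative, not taken from any library.

```python
# The five transactions from the Vorhees Supplies example.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(items, transactions):
    """Share of transactions that contain every item in `items`."""
    hits = sum(1 for basket in transactions if set(items) <= basket)
    return hits / len(transactions)


def lift(a, b, transactions):
    """Support(A, B) / (Support(A) * Support(B))."""
    joint = support(set(a) | set(b), transactions)
    return joint / (support(a, transactions) * support(b, transactions))


chainsaw_band_aid = lift({"Chainsaw"}, {"Band Aid"}, transactions)
print(f"{chainsaw_band_aid:.4f}")  # 0.9375
```

Band Aid appears in four of the five baskets (support 0.8), so the lift is 0.6 / (0.8 × 0.8) ≈ 0.94. Despite the 75% confidence, the value sits slightly below 1: Chainsaw buyers are marginally less likely to buy a Band Aid than the average customer, who buys one 80% of the time.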
Advanced Metrics
In market basket analysis, beyond support, confidence, and lift, there are other metrics like leverage, conviction, and Zhang's metric that help assess the strength and significance of associations between items.
Leverage
Definition: Leverage measures the difference between the observed frequency of an itemset and the frequency expected if the items were independent. It indicates whether the presence of one item increases or decreases the likelihood of the other item being present.
Formula:
Leverage(A → B) = Support(A, B) − Support(A) × Support(B)
Interpretation:
- A leverage value of 0 indicates no association between the items (they are independent).
- A positive leverage value indicates a positive association (items are often bought together more than expected).
- A negative leverage value indicates a negative association (items are bought together less often than expected).
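As a quick check, leverage can be computed for the Chainsaw and Band Aid pair from the earlier example. The sketch below uses illustrative helper names and the five example transactions.

```python
# The five transactions from the Vorhees Supplies example.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(items, transactions):
    """Share of transactions that contain every item in `items`."""
    hits = sum(1 for basket in transactions if set(items) <= basket)
    return hits / len(transactions)


def leverage(a, b, transactions):
    """Observed joint support minus the joint support expected under independence."""
    observed = support(set(a) | set(b), transactions)
    expected = support(a, transactions) * support(b, transactions)
    return observed - expected


chainsaw_band_aid = leverage({"Chainsaw"}, {"Band Aid"}, transactions)
print(f"{chainsaw_band_aid:.2f}")  # -0.04
```

The pair shows up in 60% of baskets, while independence would predict 0.8 × 0.8 = 64%, hence the slightly negative value.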
Conviction
Definition: Conviction measures the strength of an implication rule in terms of the ratio of the expected frequency of the rule being wrong (if the items were independent) to the observed frequency of the rule being wrong. It is an alternative to confidence, especially when the itemset has a low support.
Formula:
Conviction(A → B) = (1 − Support(B)) / (1 − Confidence(A → B))
Interpretation:
- A conviction value close to 1 suggests a weak or no association.
- A higher conviction value suggests a stronger association.
- Conviction tends to infinity when the confidence is 1 (i.e., the rule is always true).
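A minimal sketch of conviction for the rule Chainsaw → Band Aid, again on the five example transactions. The helper names are illustrative, and returning math.inf for a confidence of exactly 1 is a choice made for this sketch.

```python
import math

# The five transactions from the Vorhees Supplies example.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(items, transactions):
    """Share of transactions that contain every item in `items`."""
    hits = sum(1 for basket in transactions if set(items) <= basket)
    return hits / len(transactions)


def conviction(a, b, transactions):
    """(1 - Support(B)) / (1 - Confidence(A -> B)); infinite if the rule never fails."""
    conf = support(set(a) | set(b), transactions) / support(a, transactions)
    if conf == 1:
        return math.inf  # sketch-level choice: the rule is never wrong
    return (1 - support(b, transactions)) / (1 - conf)


chainsaw_band_aid = conviction({"Chainsaw"}, {"Band Aid"}, transactions)
print(f"{chainsaw_band_aid:.2f}")  # 0.80
```

Here the rule is wrong in 25% of Chainsaw baskets, while a customer picked at random lacks a Band Aid only 20% of the time, so the conviction is 0.2 / 0.25 = 0.8, which is below 1.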
Zhang's Metric (Interest)
Definition: Zhang's metric, also known as interest, measures the strength of the association between two items by considering their joint probability and their individual probabilities. It is designed to overcome some limitations of other metrics like confidence and lift.
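Formula:
Zhang(A → B) = (Support(A, B) − Support(A) × Support(B)) / max(Support(A, B) × (1 − Support(A)), Support(A) × (Support(B) − Support(A, B)))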
Interpretation:
- Zhang's metric value ranges between -1 and 1.
- A value close to 0 indicates no association.
- A positive value indicates a positive association (items are frequently bought together).
- A negative value indicates a negative association (items are rarely bought together).
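Zhang's metric is the fiddliest of the three to compute by hand, so a short sketch may help. It assumes the commonly used formulation: the leverage numerator divided by the larger of Support(A, B) × (1 − Support(A)) and Support(A) × (Support(B) − Support(A, B)). The helper names are illustrative, and returning 0.0 when the denominator is zero is an assumption of this sketch.

```python
# The five transactions from the Vorhees Supplies example.
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(items, transactions):
    """Share of transactions that contain every item in `items`."""
    hits = sum(1 for basket in transactions if set(items) <= basket)
    return hits / len(transactions)


def zhang(a, b, transactions):
    """Zhang's metric for the rule A -> B, bounded between -1 and 1."""
    s_a = support(a, transactions)
    s_b = support(b, transactions)
    s_ab = support(set(a) | set(b), transactions)
    numerator = s_ab - s_a * s_b
    denominator = max(s_ab * (1 - s_a), s_a * (s_b - s_ab))
    if denominator == 0:
        return 0.0  # assumption for this sketch: treat degenerate cases as "no association"
    return numerator / denominator


chainsaw_band_aid = zhang({"Chainsaw"}, {"Band Aid"}, transactions)
print(f"{chainsaw_band_aid:.2f}")  # -0.25
```

For Chainsaw → Band Aid the numerator is 0.6 − 0.64 = −0.04 and the denominator is max(0.12, 0.16) = 0.16, giving −0.25, a mild negative association.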
Summary of Advanced Metrics
Leverage tells us how much more (or less) often items are bought together than expected by chance.
Conviction helps to understand how strongly one item implies the presence of another, considering how often that implication would be wrong.
Zhang's Metric gives a balanced view of association strength, correcting some biases inherent in other metrics.
Diapers, beer and urban myths
Market basket analysis (MBA) had its fifteen minutes of fame through the story of diapers and beer. Supposedly, Walmart ran an MBA on its customer data and found that diapers and beer were the items most commonly bought together. Sadly, this is something of an urban legend. The diapers-and-beer example goes way back to 1992 and a company named Osco Drug. The company used the findings to remove slow-moving SKUs and place fast-moving items closer together.
And what about Walmart?
Well, everyone's favorite retailer from Arkansas uses market basket analysis. Under the easy-to-remember name Luminate Shopper Behavior, you can get insights into the performance of your brand's products if you are listed at Walmart.
For most SMBs, startups and online shops, this is not very helpful. So, MBA on owned data it is.
Challenges with Market Basket Analysis
Theoretically, you can run MBA in Excel. (But let's be honest, you can do almost everything in Excel, including a flight simulator. That does not mean it is a smart move.)
The main challenge does not lie in the actual programming. You can find tons of ready-made scripts for R (brrr) and Python (yay) on GitHub. The problem is the data.
Either you have too few data points or too many. Which algorithm do you choose, Apriori or FP-Growth? How far back do you look? Do you segment the data before you run your analysis? Lots of headaches before you can run the algorithm for the first time.
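To give a feel for what the Apriori approach does under the hood, here is a bare-bones, level-wise sketch in plain Python. It is a teaching toy, not a production implementation and not FP-Growth: the function name apriori, its parameters, and the 0.5 threshold are chosen for illustration, and the data is the five-basket Vorhees Supplies example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset that meets min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: every single item is a candidate.
    singles = {frozenset([item]) for basket in baskets for item in basket}
    current = keep_frequent(singles)

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join: merge frequent (k-1)-itemsets into k-itemset candidates.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate survives only if all its (k-1)-subsets are frequent.
        pruned = {
            c for c in joined
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(pruned)
    return frequent


transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]

frequent = apriori(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.2f}")
```

With a minimum support of 0.5, the four single items and the pair Chainsaw and Band Aid survive. Lower the threshold and more pairs appear, which is exactly the "too few or too many" trade-off described above.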
Market Basket Analysis in the Digital Twin
The Digital Twin was developed to make your life as a marketer or sales guy much easier.
If the Digital Twin has access to your Google Analytics or shop data, you can run MBA right off the bat.
Just prompt the Digital Twin to do it for you:
What are the top three associated products bought together?
Which products are bought together on mobile and which on tablets?
Would my customers from California who buy product X prefer to also buy product Y or Z when the first touch point is Google Ads?