Challenges with association rules mining and how to address them
In market basket analysis, support and confidence are key metrics used to identify and evaluate the relationships between items in a dataset, typically in the context of retail transactions.
Definition: Support refers to the proportion of transactions in the dataset that contain a specific item or set of items. It is a measure of how frequently an itemset appears in the database.
Formula:
$\mathrm{Support}(A,B)=\frac{\text{Number of transactions containing both } A \text{ and } B}{\text{Total number of transactions}}$
Interpretation: A higher support value indicates that the itemset is common across transactions, which may be indicative of a strong overall purchasing trend.
Definition: Confidence is a measure of how often items in a set appear together, given that one of the items is already present. It indicates the likelihood that a transaction containing item A also contains item B.
Formula:
$\mathrm{Confidence}(A\to B)=\frac{\mathrm{Support}(A,B)}{\mathrm{Support}\left(A\right)}$
Interpretation: Confidence is used to evaluate the reliability of a rule, such as "If a customer buys A, they are likely to also buy B." A higher confidence value suggests a stronger association between the items.
Suppose we have the following list of transactions from the online shop Vorhees Supplies:
| Transaction ID | Items Purchased |
|---|---|
| 1 | Chainsaw, Band Aid |
| 2 | Chainsaw, Beer, Mask |
| 3 | Band Aid, Beer, Mask |
| 4 | Chainsaw, Band Aid, Beer |
| 5 | Chainsaw, Band Aid, Mask |
Results:
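As a rough check, support and confidence for the transactions listed above can be recomputed with a few lines of Python. This is a minimal sketch, not the original analysis: the helper names `support` and `confidence` are illustrative, and the fifth row of the table is treated as its own transaction.

```python
# Transactions from the Vorhees Supplies example.
# Assumption: the fifth row is a separate transaction (ID 5).
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print("Support(Chainsaw):", support({"Chainsaw"}, transactions))
print("Support(Chainsaw, Band Aid):", support({"Chainsaw", "Band Aid"}, transactions))
print("Confidence(Chainsaw -> Band Aid):",
      round(confidence({"Chainsaw"}, {"Band Aid"}, transactions), 4))
```

For the rule Chainsaw → Band Aid this gives a support of 0.6 (3 of 5 transactions) and a confidence of about 0.75 (3 of the 4 transactions that contain a chainsaw).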
Definition: Lift compares the actual probability of purchasing two items together to the probability of purchasing them independently. Essentially, it tells you how much one item boosts the chances of the other item being bought.
Formula:
$\mathrm{Lift}(A\to B)=\frac{\mathrm{Support}(A,B)}{\mathrm{Support}\left(A\right)\times \mathrm{Support}\left(B\right)}$
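Where: a lift above 1 means A and B appear together more often than expected if they were independent, a lift of exactly 1 means they are independent, and a lift below 1 means they appear together less often than expected.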
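To make the lift formula concrete, here is a small sketch that applies it to the Vorhees Supplies transactions. The function names are illustrative assumptions, and the fifth row of the example is treated as its own transaction.

```python
# Transactions from the Vorhees Supplies example.
# Assumption: the fifth row is a separate transaction (ID 5).
transactions = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Joint support divided by the product of the individual supports."""
    joint = support(set(antecedent) | set(consequent), transactions)
    expected = support(antecedent, transactions) * support(consequent, transactions)
    return joint / expected


print("Lift(Chainsaw -> Band Aid):",
      round(lift({"Chainsaw"}, {"Band Aid"}, transactions), 4))
print("Lift(Beer -> Mask):",
      round(lift({"Beer"}, {"Mask"}, transactions), 4))
```

Chainsaw → Band Aid comes out at about 0.94 (0.6 / 0.64), slightly below 1, while Beer → Mask comes out at about 1.11 (0.4 / 0.36). Note that lift is symmetric: swapping A and B gives the same value.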
In market basket analysis, beyond support and confidence, there are other metrics like leverage, conviction, and Zhang's metric that help assess the strength and significance of associations between items.
Definition: Leverage measures the difference between the observed frequency of an itemset and the frequency expected if the items were independent. It indicates whether the presence of one item increases or decreases the likelihood of the other item being present.
Formula:
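$\mathrm{Leverage}(A,B)=\mathrm{Support}(A,B)-\mathrm{Support}\left(A\right)\times \mathrm{Support}\left(B\right)$
Interpretation: A positive leverage means the items appear together more often than expected under independence, zero means they are independent, and a negative leverage means they appear together less often than expected.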
Definition: Conviction measures the strength of an implication rule in terms of the ratio of the expected frequency of the rule being wrong (if the items were independent) to the observed frequency of the rule being wrong. It is an alternative to confidence, especially when the itemset has a low support.
Formula:
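$\mathrm{Conviction}(A\to B)=\frac{1-\mathrm{Support}\left(B\right)}{1-\mathrm{Confidence}(A\to B)}$
Interpretation: A conviction of 1 means A and B are independent. Values above 1 mean the rule is wrong less often than it would be under independence, and conviction grows without bound as confidence approaches 1.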
Definition: Zhang's metric measures the strength of the association between two items by considering their joint probability and their individual probabilities. It is designed to overcome some limitations of other metrics like confidence and lift.
Formula:
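$\text{Zhang's metric}(A\to B)=\frac{\mathrm{Support}(A,B)-\mathrm{Support}\left(A\right)\times \mathrm{Support}\left(B\right)}{\max\left(\mathrm{Support}(A,B)\times \left(1-\mathrm{Support}\left(A\right)\right),\ \mathrm{Support}\left(A\right)\times \left(\mathrm{Support}\left(B\right)-\mathrm{Support}(A,B)\right)\right)}$
Interpretation: Zhang's metric ranges from -1 to 1. Positive values indicate association, negative values indicate dissociation, and 0 indicates independence.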
Leverage tells us how much more (or less) often items are bought together than expected by chance.
Conviction helps to understand how strongly one item implies the presence of another, considering how often that implication would be wrong.
Zhang's Metric gives a balanced view of association strength, correcting some biases inherent in other metrics.
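The three metrics can be sketched in Python directly from support values. This is an illustrative sketch, not a reference implementation: the function names and the argument order (joint support first) are assumptions, and Zhang's metric is computed in its commonly published form, with the denominator max(Support(A,B) × (1 − Support(A)), Support(A) × (Support(B) − Support(A,B))). The sample numbers are the supports of Chainsaw and Band Aid in the Vorhees Supplies example, counting its five rows as five transactions.

```python
import math


def leverage(s_ab, s_a, s_b):
    """Observed joint support minus the joint support expected under independence."""
    return s_ab - s_a * s_b


def conviction(s_ab, s_a, s_b):
    """(1 - Support(B)) / (1 - Confidence(A -> B)); infinite if the rule never fails."""
    conf = s_ab / s_a
    if conf >= 1.0:
        return math.inf
    return (1 - s_b) / (1 - conf)


def zhang(s_ab, s_a, s_b):
    """Zhang's metric in its commonly published form (assumed here), in [-1, 1]."""
    numerator = s_ab - s_a * s_b
    denominator = max(s_ab * (1 - s_a), s_a * (s_b - s_ab))
    # A zero denominator only occurs in degenerate cases where the numerator is 0 too.
    return numerator / denominator if denominator else 0.0


# Supports for A = Chainsaw, B = Band Aid: 4/5, 4/5, and 3/5 jointly.
s_a, s_b, s_ab = 0.8, 0.8, 0.6
print("Leverage:", round(leverage(s_ab, s_a, s_b), 4))
print("Conviction:", round(conviction(s_ab, s_a, s_b), 4))
print("Zhang:", round(zhang(s_ab, s_a, s_b), 4))
```

For this pair the values are roughly -0.04, 0.8, and -0.25. All three point the same way: chainsaws and band aids are bought together slightly less often than independence would predict.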
Market Basket Analysis (MBA) had its fifteen minutes of fame through the story of diapers and beer. Supposedly, Walmart ran an MBA on its customer data and found that diapers and beer are most commonly bought together. Sadly, this is something of an urban legend. The diapers and beer example actually goes back to 1992 and a company named Osco Drug, which used what it learned to remove slow-moving SKUs and place fast-moving items closer together.
Theoretically, you can run MBA in Excel. (But let's be honest: you can do almost everything in Excel, even build a flight simulator. That does not mean it is a smart move.)
The main challenge does not lie in the actual programming. You can find tons of ready-made scripts for R (brrr) and Python (yay) on GitHub. The problem is the data.
Either you have too few data points or too many. Which algorithm do you choose, Apriori or FP-Growth? How far back do you look? Do you segment the data before you run your analysis? Lots of headaches before you can run the algorithm for the first time.
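To give a feel for what the algorithm choice involves, here is a minimal level-wise (Apriori-style) search in plain Python. It is a teaching sketch under stated assumptions, not a production implementation: the function name `frequent_itemsets`, the sample baskets, and the `min_support` threshold of 0.6 are all illustrative. FP-Growth finds the same itemsets but avoids generating candidates by building a compressed prefix tree, which tends to matter on larger datasets.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    # Level 1: frequent single items.
    items = sorted({item for basket in baskets for item in basket})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = {}
    k = 1
    while current:
        result.update(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only candidates that meet the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return result


# Hypothetical sample baskets, reusing the Vorhees Supplies items.
baskets = [
    {"Chainsaw", "Band Aid"},
    {"Chainsaw", "Beer", "Mask"},
    {"Band Aid", "Beer", "Mask"},
    {"Chainsaw", "Band Aid", "Beer"},
    {"Chainsaw", "Band Aid", "Mask"},
]
for itemset, s in sorted(frequent_itemsets(baskets, 0.6).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The threshold illustrates the "too few or too many" problem: at 0.6 only one pair survives, while lowering it to 0.4 lets all six pairs through.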
The Digital Twin was developed to make your life as a marketer or sales guy much easier.
If the Digital Twin has access to your Google Analytics or shop data, you can run MBA right off the bat.
Just prompt the Digital Twin to do it for you:
What are the top three associated products bought together?
Which products are bought together on mobile and which on tablets?
Would my customers from California who buy product X prefer to also buy product Y or Z when the first touchpoint is Google Ads?
Get in touch with us for a demo of Mnemonic AI or request further information.