This case was made it during the course "My First Experience as a Data Analyst" and the idea was create a Rule Engine System for increasing the sales in one purchase.
Python, Pandas, Jupyter Notebook, Power BI.
Market Basket Analysis (or Association Rule Analysis) was used in this case to discover associations between products purchased by customers. This technique is based on the idea that certain products tend to be purchased together more frequently than would be expected by chance. Its implementation require the construction of the Support, Confidence and Lift indexes.
Support: Measures the frequency with which a product is found in the entire set of products. This is the probability of that product. This is Support(A) = P(A). In business terms, Support(A) tells us how likely it is that product A will appear on a purchase ticket. Ergo, Support(A∩B) = P(A∩B).
Confidence: It tells us the probability that a subsequent product B will be purchased once the antecedent product A has been purchased, that is, if there exist a relationship between those two products. This relationship will be expressed as A => B, which is the P(B/A).
Lift: The Lift allows us to quantify how many times the probability of buying both products together is greater, equal to, or less than the probability of buying each one independently. This is done to control for the effect of products that are naturally bought very frequently, i.e., highly popular products. The numerator of the Lift represents the products that are bought together, and the denominator represents the products that are bought independently. So, the resultant value of the Lift index can be interpreted as:
Lift = 1; implies that there is no association between A & B.
Lift > 1; implies that B is likely to be purchased if A is purchased.
Lift < 1; implies that B is unlikely to be purchased if A is purchased.
Before creating the sales dashboard with Power BI, you must create a Business Functional Requirements Document. This document defines the scope of the project and the specific tasks to be performed, including a mock up with the design of the final dashboard. This document also prevents sudden changes from being made and you can see it below.
Use of Python within a Jupyter Notebook, Pandas, and DataFrames.
Understanding of probabilistic concepts associated with the development of the rule-generating engine.
How to work with the Support, Confidence and Lift indicators from Market Basket Analysis.
How to deal with business people about the requirements through the Business Functional Requirement Document.
To know how to create a dashboard in Power BI so anyone in the company can easily understand the data. Specifically, the addition of a new rules table and its relationship with the other tables.
A Jupyter Notebook was used for running the Python code and get the resultant rules.
The company's database was provided to our Jupyter Notebook via a .db file and the sqlite3 library was used to establish the connection.
Pandas was used for loading an entire table into a DataFrame to operate with the data.
To create the dashboard in Power BI, the rules generated by the rules engine were taken from a .CSV file and a new table was generated inside Power BI, which was linked to the existing sections table in a many-to-one relationship. Also, there is a tickets table containing all of the company's sales. The sections table contains the names and images of the sections where the company's products are applied.
Market Basket Analysis answers the question: What is the probability that a customer will buy product B if he already have product A in their shopping basket? So, if a customer already has bananas in their shopping basket, what is the probability that this customer will also buy lemons, apples, or strawberries?
Based on the sales history (tickets table), the algorithm searches for product combinations that appear together in a purchase, creating a set of rules (or patterns) that, in turn, contain three indicators: Support, Confidence, and Lift. These rules allow, given a product chosen by the customer, to recommend another product that is highly likely to be purchased in addition to the first, thus increasing the average purchase ticket.
The algorithm starts from the purchases database, which contains all purchased products via the tickets table. From there, the combinations of all products are generated in terms of antecedent and subsequent products, calculating the indicators for each combination. From this, we obtain a table with the results from which we will be able to identify those patterns, those purchasing patterns that exist in our products, so we can later exploit them on our e-commerce site. In such a way that, if a very strong relationship has been identified between A and C (A => C), when the customer is on A's page, or has loaded product A into their cart, then within their recommended products section, the product C will appears in the position number one. Then, the details of each product, such as its identifier and the section and department to which it belongs, will be merged into this resulting table in order to get the full table of rules.
In order to make the results more readable and understandable for the business people, a new mock up is created for building a new dashboard that display the results and rules.
You can download the entire Python code and the full Power BI project file from here. Files MarketBasketAnalysis_colab.ipynb y "Sano y Fresco - Sales Dashboard".pbix.