To solve the problem of finding the percentage variance of sales from the previous day in a “sales” table, let’s break it down step-by-step. This will involve calculating how much the sales change from one day to the next and then expressing that change as a percentage.
Problem Breakdown
- Understand the Sales Table: The table likely contains at least the following columns:
sale_date
: Date of the sale.sales_amount
: The total sales amount for the day.
- Percentage Variance Formula: The percentage variance formula is given as:
- Percentage Variance = (Current Day’s Sales−Previous Day’s Sales)/ Previous Day’s Sales×100
We will calculate this variance for each day, except the first date in the dataset, as it does not have a prior day to compare.
- Percentage Variance = (Current Day’s Sales−Previous Day’s Sales)/ Previous Day’s Sales×100
Step 1: Create the sales
Table
We need to create a table named sales
with two columns:
sale_date
: The date of the sales transaction.sales_amount
: The total sales amount on that day.
Here’s the SQL query to create the table:
CREATE TABLE sales (
sale_date DATE PRIMARY KEY,
sales_amount DECIMAL(10, 2)
);
sale_date
: This is the date of the sale, defined as aDATE
data type, and it’s the primary key to ensure each day’s data is unique.sales_amount
: This represents the sales for the day, defined asDECIMAL(10, 2)
to store values with two decimal places (e.g., 100.50).
Step 2: Insert Dummy Data
Let’s insert 20 rows of sales data to simulate real-world sales amounts over a span of 20 consecutive days.
Here’s the SQL query for inserting data into the sales
table:
INSERT INTO sales (sale_date, sales_amount)
VALUES
('2024-09-01', 500.00),
('2024-09-02', 550.00),
('2024-09-03', 530.00),
('2024-09-04', 600.00),
('2024-09-05', 620.00),
('2024-09-06', 610.00),
('2024-09-07', 650.00),
('2024-09-08', 670.00),
('2024-09-09', 690.00),
('2024-09-10', 700.00),
('2024-09-11', 680.00),
('2024-09-12', 720.00),
('2024-09-13', 750.00),
('2024-09-14', 770.00),
('2024-09-15', 800.00),
('2024-09-16', 810.00),
('2024-09-17', 790.00),
('2024-09-18', 820.00),
('2024-09-19', 830.00),
('2024-09-20', 850.00);
Explanation of the Data:
- We inserted 20 rows of data spanning from
2024-09-01
to2024-09-20
. - The sales amounts fluctuate slightly, simulating a real-world scenario where sales numbers vary daily.
Percentage Variance Query for MySQL
After creating the table and inserting the data, you can now run the percentage variance query provided earlier:
SELECT
sale_date,
sales_amount,
LAG(sales_amount) OVER (ORDER BY sale_date) AS previous_day_sales,
ROUND(((sales_amount - LAG(sales_amount) OVER (ORDER BY sale_date)) /
LAG(sales_amount) OVER (ORDER BY sale_date)) * 100, 2) AS percentage_variance
FROM
sales
ORDER BY
sale_date;
Output:
Explanation:
LAG(sales_amount) OVER (ORDER BY sale_date)
:- This retrieves the
sales_amount
of the previous row (previous day) based on the ordering ofsale_date
. This value is stored asprevious_day_sales
.
- This retrieves the
(sales_amount - LAG(sales_amount) OVER (ORDER BY sale_date)) / LAG(sales_amount) OVER (ORDER BY sale_date)
:- This part calculates the difference between the current day’s sales and the previous day’s sales, then divides that by the previous day’s sales to compute the percentage variance.
ROUND(..., 2)
:- Rounds the percentage variance to two decimal places for better readability.
ORDER BY sale_date
:- Ensures the data is ordered by the sale date for proper comparison.
Edge Cases and Assumptions:
- First Day Handling: Since the first date has no previous day to compare to, the result for this row will be
NULL
. This is a natural limitation of theLAG()
function. - Missing Dates: If the dataset has gaps in the dates (e.g., no sales on weekends or holidays), the calculation still works but may give misleading results unless the data is continuous or padded with zeroes for missing dates.
- Zero Sales on a Previous Day: If a previous day has
sales_amount = 0
, the percentage variance formula would result in a division by zero error. You can handle this case by using aCASE
statement to check for zero values.
Practice Questions:
- Modify the query to calculate the percentage variance based on a moving average of the last three days of sales.
- Write a query to find the days with the highest positive percentage variance in sales.
- How would you adjust the query if sales data were missing for certain days (e.g., weekends or holidays)?
- What changes would you make to handle multiple stores’ sales data, where each store has its own daily sales?
- Extend the query to return only the days where the percentage variance exceeds 10%.