Joe runs a pet shop. Business is good, but another pet shop has opened nearby and competition is getting tougher by the day. Many aspects of the business could be improved, but one may yield an immediate advantage: sales forecasting. With accurate sales forecasts, Joe can keep a lower inventory and free up money that can be invested in other areas, such as marketing and communication.
Joe knows that his average daily sales of dog food are $535.66. This average is his benchmark for sales forecasting, and he is not satisfied with it: he needs something more accurate. Fortunately, he came across Brainiac Cloud.
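Joe's benchmark amounts to forecasting every day with the historical average. Below is a minimal Python sketch of that benchmark and of its mean squared error (MSE), the metric Joe later uses to compare models. The five sales figures are invented for illustration and are not Joe's data.

```python
# Hypothetical daily dog food sales in $ (illustrative values, not Joe's data).
sales = [510.0, 540.0, 525.0, 560.0, 545.0]

def mean_forecast(history):
    """Benchmark: forecast every day with the historical average."""
    return sum(history) / len(history)

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

benchmark = mean_forecast(sales)
benchmark_mse = mse(sales, [benchmark] * len(sales))
print(benchmark, benchmark_mse)
```

Any model Joe trains is worth adopting only if its MSE is lower than this benchmark's.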
Step 1 – Joe’s Excel File
Joe starts with an Excel file containing two columns: Date and Sales. The file is exported from his local software and contains, for each day, the total sales of dog food products in dollars. Joe was careful to enter the sales as plain numbers, omitting symbols such as the $ sign.
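A small Python sketch shows why the numeric format matters. It parses a CSV version of such an export with the standard csv module. float() accepts "510.00" but raises ValueError on "$510.00". The file contents and dates are hypothetical, and load_sales is an illustrative helper, not part of Brainiac Cloud.

```python
import csv
import io

# Hypothetical export with the two columns Joe's file contains.
raw = """Date,Sales
2024-02-01,510.00
2024-02-02,540.00
2024-02-03,525.00
"""

def load_sales(text):
    """Parse Date/Sales rows; raises ValueError if a Sales value is not a plain number."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # float() rejects values such as "$510.00", which is why symbols must be omitted.
        rows.append((record["Date"], float(record["Sales"])))
    return rows

rows = load_sales(raw)
```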
Step 2 – Build the Dataset
Joe wants to predict Sales (the target variable). Sales form a time series: a sequence of values ordered in time. In its simplest form, today's sales are influenced by yesterday's sales, yesterday's sales by those of two days ago, and so on. So what will influence tomorrow's sales? Easy: today's sales. In time series modelling, this setting is called an autoregressive model with one lag, or AR(1). One lag means that the quantity being analyzed depends only on its value one step earlier, here one day ago.
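In the usual notation, an AR(1) model can be written as follows. Here y_t is the sales on day t, c is a constant, \varphi measures how strongly yesterday's sales carry over to today, and \varepsilon_t is a random shock with zero mean. Because the shock is unpredictable, the forecast for tomorrow uses only today's value.

```latex
y_t = c + \varphi \, y_{t-1} + \varepsilon_t
\qquad\Longrightarrow\qquad
\hat{y}_{t+1} = c + \varphi \, y_t
```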
For more details, have a look at https://en.wikipedia.org/wiki/Autoregressive_model.
To process a time series with machine learning, Joe must first transform the data into a table that machine learning models can interpret, with one row per example. Which information should the model use to predict sales? Based on his experience and on what he has read, Joe's first guess is that today's sales are affected by yesterday's sales. He edits his Excel file so that each row contains the following information:
- Sales on that Date (the target)
- Sales on the previous day (Sales_Lag_1)
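The transformation Joe performs in Excel can be sketched in Python as follows. The sample rows are hypothetical, and add_lag_1 is an illustrative helper. The sketch assumes the rows are sorted by date with exactly one row per day. If days were missing, the previous row would not be the previous day.

```python
# Hypothetical (Date, Sales) rows in chronological order, one per day.
rows = [
    ("2024-02-01", 510.0),
    ("2024-02-02", 540.0),
    ("2024-02-03", 525.0),
    ("2024-02-04", 560.0),
]

def add_lag_1(rows):
    """Pair each day's Sales with the previous day's Sales (Sales_Lag_1).

    The first day has no previous day, so it is dropped rather than filled in.
    """
    out = []
    for (_, prev_sales), (date, sales) in zip(rows, rows[1:]):
        out.append({"Date": date, "Sales": sales, "Sales_Lag_1": prev_sales})
    return out

dataset = add_lag_1(rows)
```

Dropping the first row is a design choice. Filling it with a made-up value would feed the model a lag that never happened.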
Step 3 – Upload the Dataset
Joe saves the file in CSV format and uploads it into Brainiac Cloud.
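In practice, Excel's Save As handles the conversion. As an illustration of the layout the uploaded file should have, here is a Python sketch that writes lag rows with the standard csv module. The rows are hypothetical, and the column names follow the ones used in this walkthrough.

```python
import csv
import io

# Hypothetical rows after adding the lag column (Sales_Lag_1 = previous day's Sales).
dataset = [
    {"Date": "2024-02-02", "Sales": 540.0, "Sales_Lag_1": 510.0},
    {"Date": "2024-02-03", "Sales": 525.0, "Sales_Lag_1": 540.0},
]

def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Date", "Sales", "Sales_Lag_1"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_csv(dataset)
```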
Step 4 – Explore the Data
In ML Lab, Joe selects the dataset and chooses Sales as the target variable. He also excludes the Date column from the analysis, since it makes little sense for sales to depend on a specific calendar date such as February 4th or February 11th.
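Conceptually, these ML Lab choices split each row into model inputs and a target. The following Python sketch illustrates that idea and is not Brainiac Cloud's API. The rows and the split_features_target helper are hypothetical.

```python
# Hypothetical rows as they might look after uploading the CSV.
dataset = [
    {"Date": "2024-02-02", "Sales": 540.0, "Sales_Lag_1": 510.0},
    {"Date": "2024-02-03", "Sales": 525.0, "Sales_Lag_1": 540.0},
]
TARGET = "Sales"
EXCLUDED = {"Date"}  # the calendar date is not used as a predictor

def split_features_target(rows, target, excluded):
    """Return (features, targets): every column except the target and excluded ones."""
    features = [
        {k: v for k, v in row.items() if k != target and k not in excluded}
        for row in rows
    ]
    targets = [row[target] for row in rows]
    return features, targets

X, y = split_features_target(dataset, TARGET, EXCLUDED)
```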
Step 5 – Train the Model
Joe selects XGBoost as the regression model and starts training. After a few seconds, Brainiac Cloud returns diagnostics for the trained model. Looking at the mean squared error (MSE), Joe finds that the model is 4.3% better than his original benchmark.
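The XGBoost model trained by Brainiac Cloud is not reproduced here. As a plainly named stand-in, this sketch fits an ordinary least-squares line on Sales_Lag_1, which is a simple AR(1)-style regression and not XGBoost. It then computes a relative MSE improvement over the average benchmark. The sketch assumes "4.3% better" means the model's MSE is 4.3% lower than the benchmark's. The sales figures and the train/evaluation split are hypothetical, so the printed percentage will not match Joe's result.

```python
# Stand-in for XGBoost: ordinary least squares on one feature (Sales_Lag_1).

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def fit_lag1_linear(x, y):
    """Least-squares intercept and slope for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # requires at least two distinct x values
    return mean_y - slope * mean_x, slope

def relative_improvement(benchmark_mse, model_mse):
    """Percentage by which the model's MSE is lower than the benchmark's."""
    return 100.0 * (benchmark_mse - model_mse) / benchmark_mse

# Hypothetical daily sales in $ (illustrative only).
sales = [510.0, 540.0, 525.0, 560.0, 545.0, 570.0, 555.0, 580.0]
x, y = sales[:-1], sales[1:]  # x = Sales_Lag_1, y = Sales
split = 5                     # fit on the first 5 pairs, evaluate on the rest
intercept, slope = fit_lag1_linear(x[:split], y[:split])

held_out_y = y[split:]
model_pred = [intercept + slope * xi for xi in x[split:]]
benchmark = sum(sales[: split + 1]) / (split + 1)  # average of the days seen while fitting
benchmark_pred = [benchmark] * len(held_out_y)

improvement = relative_improvement(mse(held_out_y, benchmark_pred), mse(held_out_y, model_pred))
print(f"MSE improvement over benchmark: {improvement:.1f}%")
```

The comparison uses days not seen during fitting, because in-sample errors tend to flatter a model.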
Step 6 – Use the Model
To use the model, Joe can either use the ML Lab Quick Model Application feature to fill in the data needed for a prediction, or use the Apply to File feature with an Excel file that contains, in each row, the data required for prediction (i.e., Sales_Lag_1).
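With one lag, the input for tomorrow's forecast is simply today's sales. Here is a Python sketch of preparing that single input row. The history is hypothetical, next_day_input is an illustrative helper, and the exact file format Apply to File expects is not specified here beyond the Sales_Lag_1 column.

```python
from datetime import date, timedelta

# Hypothetical (Date, Sales) history; the last row is the most recent day.
history = [("2024-02-03", 525.0), ("2024-02-04", 560.0)]

def next_day_input(rows):
    """Return the date being forecast and the feature row the lag-1 model needs.

    Tomorrow's Sales_Lag_1 is today's Sales.
    """
    last_date, last_sales = rows[-1]
    forecast_date = date.fromisoformat(last_date) + timedelta(days=1)
    return forecast_date.isoformat(), {"Sales_Lag_1": last_sales}

forecast_date, features = next_day_input(history)
```

Forecasting further ahead with this model would mean feeding each prediction back in as the next Sales_Lag_1, so errors can accumulate over the horizon.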