Valery is an undergraduate student. This academic year, she attended the Quantitative Methods for Management course. Time flies, and the exam session is approaching.
The Professor gave her an assignment about data analysis and Machine Learning. She has to solve it by writing Python code dedicated to each request. Before she starts coding, she would like some initial insights to find out whether she is heading in the right direction.
Her assignment is as follows:
Data Analysis and Machine Learning assignment
- Upload the dataset from the .csv file
- Compute the main quantitative moments for the variables
- Develop a histogram for one or more variables
- Write on an external file the quantitative moments
- Save the histogram
- Plot and analyze the temporal behavior of a selected variable
- Plot and analyze the multivariate behavior of two selected variables
- Apply a machine learning approach (K-Nearest Neighbor, Decision Tree, Random Forest) on a selected variable. Train the model and predict the next three temporal instants.
A friend suggested that Valery have a look at Brainiac Cloud, an innovative platform to which she could upload her data and get some preliminary results quickly and easily. She decided to give Brainiac a try.
Step 1 – TOUR
First, she read the TOUR, where she learnt about Laura’s problem and how Laura solved it through Brainiac. The TOUR was a nice introduction to the platform.
Step 2 – Upload the dataset from the .csv file
Valery’s data are organized as shown in the picture below.
In the ML Lab, it is possible to upload a dataset. Valery could upload her data simply by clicking on ADD DATASET.
The dataset was uploaded in a few seconds. After that, Valery had to click on the dataset icon to see the initial exploratory analysis.
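For the coding part of the assignment, the same upload step can be sketched in Python. This is only a minimal sketch: it assumes the pandas library, and the sample rows and column names (company, year, invest, kstock, profit) are invented stand-ins for Valery's real file.

```python
import io

import pandas as pd

# Stand-in for the real file: a tiny invented sample with assumed column names.
# With the real data, pass the path of the .csv file to pd.read_csv instead.
csv_source = io.StringIO(
    "company,year,invest,kstock,profit\n"
    "31,2000,10.5,100.0,20.1\n"
    "31,2001,11.0,101.5,21.3\n"
    "35,2000,5.2,50.0,9.8\n"
    "35,2001,5.9,55.0,10.4\n"
)

df = pd.read_csv(csv_source)

print(df.head())   # first rows, to check the layout
print(df.dtypes)   # column types as inferred by pandas
print(df.shape)    # (number of rows, number of columns)
```

Checking the inferred types right after loading plays the same role as the first look at the DATA EXPLORER.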
Step 3 – Compute the main quantitative moments for the variables
In the DATA EXPLORER, Valery could make some initial observations about her variables. All variables were recognized as continuous. Since the dataset contains information on nine companies, she decided to change the type of the company variable from numerical to categorical. In the dataset the companies are coded with numbers, but they could equally be identified with letters without changing the meaning of the variable. The second column of the DATA EXPLORER gives the average value of each variable, which corresponds to the first moment of a statistical distribution. If the variable is recognized as categorical, the mode is given instead.
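The type change and the two summary values can be reproduced in code. The sketch below assumes pandas and uses a small invented table; only the column names company and kstock come from the text.

```python
import pandas as pd

# Invented sample: company codes are labels, kstock is a continuous quantity.
df = pd.DataFrame({
    "company": [31, 31, 35, 35, 35],
    "kstock": [100.0, 101.5, 50.0, 55.0, 60.0],
})

# Treat the numeric company code as a label, not as a quantity.
df["company"] = df["company"].astype("category")

mean_kstock = df["kstock"].mean()        # first moment of a continuous variable
mode_company = df["company"].mode()[0]   # most frequent label of a categorical variable

print(mean_kstock, mode_company)
```

Once company is categorical, an average of the codes is no longer meaningful, which is why the mode is reported instead.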
The fourth column shows a histogram for each variable. The histogram is computed on a subset of the original data and gives an initial visualization of each variable. For a more complete visualization, you should use Excel or a programming language such as Python or R. For instance, Valery can use scipy.stats.moment to compute moments and numpy.percentile to compute percentiles.
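A possible way to cover the moments, the external file and the saved histogram in one script is sketched below. It assumes numpy, scipy and matplotlib are installed. The sample values are randomly generated rather than taken from Valery's dataset, and the output file names moments.txt and histogram.png are hypothetical.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Invented sample standing in for one dataset column (for example kstock).
rng = np.random.default_rng(0)
values = rng.normal(loc=100.0, scale=15.0, size=200)

summary = {
    "mean": np.mean(values),
    "variance": stats.moment(values, 2),   # second central moment
    "skewness": stats.skew(values),
    "kurtosis": stats.kurtosis(values),    # excess kurtosis (0 for a normal law)
    "p25": np.percentile(values, 25),
    "median": np.percentile(values, 50),
    "p75": np.percentile(values, 75),
}

# Write the moments to an external text file (hypothetical file name).
lines = [f"{name}: {value:.4f}" for name, value in summary.items()]
Path("moments.txt").write_text("\n".join(lines) + "\n")

# Draw and save the histogram (hypothetical file name).
fig, ax = plt.subplots()
ax.hist(values, bins=20, edgecolor="black")
ax.set_xlabel("kstock")
ax.set_ylabel("frequency")
ax.set_title("Histogram of kstock")
fig.savefig("histogram.png", dpi=150)
plt.close(fig)
```

Note that scipy.stats.moment returns central moments, so the second one is the population variance (the same value as numpy.var with its default settings).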
Step 4 – Plot and analyze the temporal behavior of a selected variable
Using the DRAG & DROP feature, Valery was able to make some initial observations about the stock of plant and equipment (kstock variable). First, she dragged and dropped the year variable onto the x-axis. Then, she dragged and dropped the kstock variable onto the y-axis. Since she is interested in the behavior of each company in the dataset, she selected the company variable as the target variable.
In the Brainiac plot, only a subset of companies is shown. The plot makes it easy to see that company 4 (label 35) follows a linear trend over time, whereas company 0 (label 31), after an initial stable period, shows a slow decline and then a sudden increase in stock.
For a complete plot, Valery can use matplotlib.pyplot.
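A minimal sketch of such a plot, with one line per company, could look as follows. It assumes pandas and matplotlib. The numbers are invented for illustration and the output file name kstock_over_time.png is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented values for two companies; only the column names follow the text.
df = pd.DataFrame({
    "company": [31, 31, 31, 31, 35, 35, 35, 35],
    "year": [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003],
    "kstock": [100.0, 100.0, 95.0, 130.0, 50.0, 60.0, 70.0, 80.0],
})

fig, ax = plt.subplots()
# One line per company, with the points ordered in time.
for label, group in df.groupby("company"):
    group = group.sort_values("year")
    ax.plot(group["year"], group["kstock"], marker="o", label=f"company {label}")

ax.set_xlabel("year")
ax.set_ylabel("kstock")
ax.set_title("Stock of plant and equipment over time")
ax.legend()
fig.savefig("kstock_over_time.png", dpi=150)  # hypothetical file name
plt.close(fig)
```

Unlike the platform preview, this loop draws every company found in the data, so all nine would appear with the full dataset.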
Step 5 – Plot and analyze the multivariate behavior of two selected variables
To create a multivariate plot, Valery dragged and dropped the export variable onto the x-axis. The plot suggests that the more a company invests, the more profit it makes. It also shows that Valery is considering companies of different sizes.
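A two-variable view of this kind can be sketched in Python as a scatter plot plus a correlation coefficient. The choice of invest and profit as the two columns is an assumption based on the relationship described above. The values are invented and the file name invest_vs_profit.png is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented values: a small company (31) and a larger one (35).
df = pd.DataFrame({
    "company": [31, 31, 31, 31, 35, 35, 35, 35],
    "invest": [10.0, 12.0, 14.0, 16.0, 40.0, 45.0, 50.0, 55.0],
    "profit": [20.0, 23.0, 29.0, 31.0, 80.0, 92.0, 99.0, 110.0],
})

fig, ax = plt.subplots()
# One colour per company, so differences in size show up as separate clusters.
for label, group in df.groupby("company"):
    ax.scatter(group["invest"], group["profit"], label=f"company {label}")

ax.set_xlabel("invest")
ax.set_ylabel("profit")
ax.legend()
fig.savefig("invest_vs_profit.png", dpi=150)  # hypothetical file name
plt.close(fig)

# Pearson correlation as a numerical summary of the relationship.
corr = df["invest"].corr(df["profit"])
print(f"correlation between invest and profit: {corr:.3f}")
```

A correlation computed over all companies together mixes size effects with the within-company relationship, so it is worth looking at each cluster separately as well.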
Step 6 – Apply a machine learning approach (K-Nearest Neighbor, Decision Tree, Random Forest) on a selected variable. Train the model and predict the next three temporal instants.
Valery did not know how to use Brainiac Cloud for machine learning. She decided to browse the platform in search of useful tips. On the home page, she found the CASE STUDIES section and started reading the examples carefully. The BUSINESS example is about sales forecasting and explains how to set up the data for time series analysis and prediction. Following the BUSINESS example, Valery produced a new file for her final task. She decided to focus on a single company (label 34) and on the profit variable.
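The exact layout used in the BUSINESS example is not reproduced here. One common way to prepare such a file is sketched below, assuming pandas: keep a single company, then add the previous year's investment as a lagged column. The invest_lag_1 name matches the variable used later on the platform. The values and the file name company_34_profit.csv are invented.

```python
import pandas as pd

# Invented panel data for two companies.
df = pd.DataFrame({
    "company": [34] * 5 + [35] * 5,
    "year": list(range(2000, 2005)) * 2,
    "invest": [5.0, 6.0, 5.5, 7.0, 8.0, 40.0, 42.0, 45.0, 47.0, 50.0],
    "profit": [11.0, 12.5, 12.0, 14.0, 16.0, 80.0, 85.0, 90.0, 93.0, 99.0],
})

# Keep one company and order its rows in time.
one = df[df["company"] == 34].sort_values("year").reset_index(drop=True)

# Lag feature: the investment of the previous year.
one["invest_lag_1"] = one["invest"].shift(1)

# The first year has no previous value, so it is dropped.
ts = one.dropna(subset=["invest_lag_1"])[["year", "invest_lag_1", "profit"]]

ts.to_csv("company_34_profit.csv", index=False)  # hypothetical file name
print(ts)
```

Using a lagged value as the input means that each row only contains information that was already known before the profit of that year was observed.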
She then uploaded the new dataset.
In the PICK YOUR MODEL section, she selected the K-Nearest Neighbor model. At first, she kept the default parameters in order to get some initial results. In a few seconds, she got some really good results. The three indicators (MAE, MSE and Brainiac Meter) were satisfactory for a first try. Reading the tips next to each indicator gave her a better idea of their meaning.
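For the coding part, a comparable first attempt can be sketched with scikit-learn, which is an assumption, as the text does not say which library the platform uses. The data are randomly generated, the chronological train/test split is one possible choice, and the Brainiac Meter is specific to the platform, so only MAE and MSE are computed.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

# Invented yearly series for one company.
rng = np.random.default_rng(0)
years = np.arange(2000, 2020)
invest = 10.0 + 0.5 * (years - 2000) + rng.normal(0.0, 0.3, size=years.size)
profit = 2.0 * invest + rng.normal(0.0, 0.5, size=years.size)

# Features: the year and the previous year's investment (invest_lag_1).
X = np.column_stack([years[1:], invest[:-1]])
y = profit[1:]

# Chronological split: train on the earlier years, test on the last four.
split = len(X) - 4
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = KNeighborsRegressor()  # default parameters (n_neighbors=5)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)  # mean of |error|
mse = mean_squared_error(y_test, pred)   # mean of error squared
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")
```

The other two models named in the assignment follow the same fit and predict pattern through DecisionTreeRegressor (sklearn.tree) and RandomForestRegressor (sklearn.ensemble). One point to keep in mind for the analysis: a K-Nearest Neighbor prediction is an average of training targets, so it cannot go beyond the range of values seen during training.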
It is time to predict! Using the QUICK MODEL APPLICATION section, she could predict the next year simply by changing the year and the invest_lag_1 variables.
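The assignment asks for the next three temporal instants, and beyond the first year the future value of invest_lag_1 is not known. One way around this, shown here as a sketch and not as the platform's method, is a recursive forecast that uses only past profit values as inputs and feeds each prediction back in as the newest lag. It assumes scikit-learn. The profit series, the number of lags and the number of neighbours are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Invented yearly profit series for one company.
profit = np.array([10.0, 11.0, 12.5, 12.0, 13.5, 14.0, 15.5, 15.0, 16.5, 17.0])
n_lags = 2  # assumed number of past values used as features

# Each row holds n_lags consecutive values; the target is the value that follows.
X = np.array([profit[i:i + n_lags] for i in range(len(profit) - n_lags)])
y = profit[n_lags:]

model = KNeighborsRegressor(n_neighbors=3).fit(X, y)

history = list(profit)
forecast = []
for _ in range(3):
    # Use the latest n_lags values (observed or predicted) as features.
    features = np.array(history[-n_lags:]).reshape(1, -1)
    next_value = float(model.predict(features)[0])
    forecast.append(next_value)
    history.append(next_value)  # feed the prediction back in as a lag

print(forecast)
```

Errors accumulate in a recursive forecast, because later steps rely on earlier predictions, so the third value deserves less confidence than the first.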