# UNIVERSITY EXAMPLE

Valery is an undergraduate student. This academic session, she attended the Quantitative Methods for Management course. Time flies, and the exam session is approaching.

The Professor gave her an assignment on data analysis and Machine Learning. She has to solve it by developing Python code for each request. Before starting to code, she would like some initial insights to know whether she is heading in the right direction.

Her assignment is as follows:

**Data Analysis and Machine Learning assignment**

- Upload the dataset from the .csv file
- Compute the main quantitative moments for the variables
- Develop a histogram for one or more variables
- Write the quantitative moments to an external file
- Save the histogram
- Plot and analyze the temporal behavior of a selected variable
- Plot and analyze the multivariate behavior of two selected variables
- Apply a machine learning approach (K-Nearest Neighbor, Decision Tree, Random Forest) on a selected variable. Train the model and predict the next three temporal instants.

A friend suggested that Valery have a look at Brainiac Cloud, an innovative platform where she could upload her data and get some preliminary results quickly and easily. She decided to give Brainiac a try.

#### Step 1 – TOUR

First, she read the TOUR, where she learnt about Laura’s problem and how Laura solved it through Brainiac. The TOUR was a nice introduction to the platform.

#### Step 2 – Upload the dataset from the .csv file

Valery’s data are organized as shown in the picture below.

The ML Lab makes it possible to upload a dataset, so Valery could upload her data simply by clicking ADD DATASET.

The dataset was uploaded in a few seconds. After that, Valery only had to click the dataset icon to see the initial exploratory analysis.
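Outside Brainiac, the same upload step can be reproduced in Python with pandas. The sketch below is a minimal example under stated assumptions: the column names (*company*, *year*, *invest*, *profit*, *kstock*, *export*) are inferred from the variables mentioned in this example, and the inline rows are made-up placeholders standing in for the real .csv file, which is not available here.

```python
import io

import pandas as pd

# Hypothetical stand-in for Valery's .csv file: column names are assumed from
# the variables mentioned in this example, and the numbers are made up.
CSV_TEXT = """company,year,invest,profit,kstock,export
1,2001,10.5,3.2,50.0,7.1
1,2002,12.0,3.9,55.5,7.4
2,2001,4.1,1.0,20.3,2.2
2,2002,4.8,1.3,22.0,2.5
"""


def load_dataset(source) -> pd.DataFrame:
    """Read comma-separated data (a file path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


df = load_dataset(io.StringIO(CSV_TEXT))
print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # every column is parsed as numeric, including company
```

With the real file, the call would be `load_dataset("valery_data.csv")`, where the file name is hypothetical.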

#### Step 3 – Compute the main quantitative moments for the variables

In the DATA EXPLORER, Valery could make some initial considerations about her variables. All variables were recognized as continuous. Since the dataset contains information on nine companies, she decided to change the type of the *company* variable from numerical to categorical: in the dataset the companies are coded with numbers, but they could equally be identified with letters without changing the meaning of the variable. The second column of the DATA EXPLORER gives the average value of each variable, which corresponds to the first moment of a statistical distribution. If a variable is recognized as categorical, the mode is given instead.
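The type change and the DATA EXPLORER's second column (mean for numerical variables, mode for categorical ones) can be mirrored in pandas roughly as follows. This is a sketch on hypothetical toy values, not the platform's actual computation; `central_value` is an illustrative helper name.

```python
import pandas as pd

# Toy data with made-up values; company codes are integers, as in the original file.
df = pd.DataFrame({
    "company": [1, 1, 2, 2, 2],
    "kstock": [50.0, 55.0, 20.0, 22.0, 23.0],
    "profit": [3.0, 4.0, 1.0, 1.5, 2.0],
})

# Company codes are labels, not quantities, so treat them as categorical.
df["company"] = df["company"].astype("category")


def central_value(series: pd.Series):
    """Mean (first moment) for numerical columns, mode for categorical ones."""
    if isinstance(series.dtype, pd.CategoricalDtype):
        return series.mode().iloc[0]
    return series.mean()


summary = {col: central_value(df[col]) for col in df.columns}
print(summary)
```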

In the fourth column, a histogram is proposed for each variable. The histogram is computed on a subset of the original data and gives an initial visualization of each variable. For a more complete analysis, you should use Excel or a programming language such as Python or R. For instance, Valery can use *scipy.stats.moment* for moment calculation and *numpy.percentile* for percentile calculation.
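A sketch of the remaining Step 3 tasks of the assignment (moments, percentiles, writing them to an external file and saving a histogram) might look like this. It assumes pandas, SciPy and matplotlib are installed; the helper names, the output file names `moments.csv` and `profit_hist.png`, and the toy values are hypothetical. Note that `scipy.stats.moment` returns central moments by default, so the mean is computed directly, and the second central moment gives the population variance.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats


def describe_moments(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, variance, skewness, kurtosis and quartiles of every numerical column."""
    rows = {}
    for col in df.select_dtypes(include="number").columns:  # categorical columns are skipped
        x = df[col].dropna().to_numpy(dtype=float)
        rows[col] = {
            "mean": x.mean(),                # first raw moment
            "variance": stats.moment(x, 2),  # second central moment (divides by n)
            "skewness": stats.skew(x),       # standardized third moment
            "kurtosis": stats.kurtosis(x),   # excess kurtosis (0 for a normal distribution)
            "p25": np.percentile(x, 25),
            "median": np.percentile(x, 50),
            "p75": np.percentile(x, 75),
        }
    return pd.DataFrame(rows).T  # one row per variable


def save_histogram(series: pd.Series, path: str, bins: int = 10) -> None:
    """Draw a histogram of one variable and save it as an image file."""
    fig, ax = plt.subplots()
    ax.hist(series.dropna(), bins=bins, edgecolor="black")
    ax.set_xlabel(series.name)
    ax.set_ylabel("frequency")
    fig.savefig(path)
    plt.close(fig)


# Toy data with made-up values, standing in for the real dataset.
df = pd.DataFrame({"company": pd.Categorical([1, 1, 2, 2]),
                   "profit": [1.0, 2.0, 3.0, 4.0]})

moments = describe_moments(df)
moments.to_csv("moments.csv")                    # write the moments to an external file
save_histogram(df["profit"], "profit_hist.png")  # save the histogram
print(moments)
```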

#### Step 4 – Plot and analyze the temporal behavior of a selected variable

Using the DRAG & DROP feature, Valery was able to make some initial considerations about the stock of plant and equipment (*kstock* variable). First, she dragged and dropped the *year* variable on the *x-axis*. Then, she dragged and dropped the *kstock* variable on the *y-axis*. Since she is interested in the behavior of each company in the dataset, she selected the *company* variable as the target variable.

For a complete plot, Valery can use *matplotlib.pyplot*.
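The drag-and-drop chart of Step 4 can be approximated with *matplotlib.pyplot* by drawing one line per company. A minimal sketch, assuming the columns *company*, *year* and *kstock* and using made-up values; `plot_over_time` and the output file name are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_over_time(df, value, time="year", group="company"):
    """Draw one line per group showing how `value` evolves over `time`."""
    fig, ax = plt.subplots()
    for key, sub in df.groupby(group, observed=True):
        sub = sub.sort_values(time)  # make sure each line follows the time order
        ax.plot(sub[time], sub[value], marker="o", label=f"{group} {key}")
    ax.set_xlabel(time)
    ax.set_ylabel(value)
    ax.legend(title=group)
    return fig, ax


# Toy data with made-up values for two companies.
df = pd.DataFrame({
    "company": [1, 1, 1, 2, 2, 2],
    "year": [2001, 2002, 2003, 2001, 2002, 2003],
    "kstock": [50.0, 55.5, 61.2, 20.3, 22.0, 23.1],
})
fig, ax = plot_over_time(df, "kstock")
fig.savefig("kstock_by_company.png")
```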

#### Step 5 – Plot and analyze the multivariate behavior of two selected variables

In order to create a multivariate plot, Valery decided to drag and drop the *export* variable on the *x-axis* and […] the more a company invests, the more profit it gets. From the plot, it can also be seen that Valery is considering companies of different sizes.
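Since the variable placed on the *y-axis* cannot be recovered from the text above, the sketch below takes both axes as parameters. The usage example plots *profit* against *invest*, an assumption chosen because the observation above concerns those two variables; the values are hypothetical. Along with the scatter plot, the helper returns the Pearson correlation as a simple numerical check of the relationship.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_group(df, x, y, group="company"):
    """Scatter plot of `y` against `x` with one colour per group, plus Pearson correlation."""
    fig, ax = plt.subplots()
    for key, sub in df.groupby(group, observed=True):
        ax.scatter(sub[x], sub[y], label=f"{group} {key}")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=group)
    return fig, ax, df[x].corr(df[y])  # correlation computed over all groups together


# Toy data with made-up values; the choice of invest and profit is an assumption.
df = pd.DataFrame({
    "company": [1, 1, 2, 2],
    "invest": [1.0, 2.0, 3.0, 4.0],
    "profit": [2.0, 4.0, 6.0, 8.0],
})
fig, ax, r = scatter_by_group(df, "invest", "profit")
fig.savefig("invest_vs_profit.png")
print(f"Pearson correlation: {r:.2f}")
```

A single correlation over all companies can be dominated by size differences between them, so it is worth comparing it with the per-company pattern visible in the plot.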

#### Step 6 – Apply a machine learning approach (K-Nearest Neighbor, Decision Tree, Random Forest) on a selected variable. Train the model and predict the next three temporal instants.

Valery did not know how to use Brainiac Cloud for Machine Learning purposes. She decided to explore the platform looking for some useful tips. On the home page, she found the CASE STUDIES section and started reading the examples carefully. The BUSINESS example is about sales forecasting and explains how to set up the data for time series analysis and prediction. Following the BUSINESS example, Valery produced a new file for her final task. She decided to focus on a single company (label 34) and on the *profit* variable.
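Preparing data for time series prediction usually means restricting it to one entity and adding lagged copies of a variable as features. A sketch of that preparation, assuming the columns *company*, *year* and *profit*; the `_lag_k` suffix mimics the *invest_lag_1* name that appears in this example, while the helper name and the toy values are hypothetical, and the BUSINESS example's exact procedure is not reproduced.

```python
import pandas as pd


def make_lagged_series(df, company, target="profit", n_lags=2):
    """Keep one company, sort it by year and add lagged copies of `target`."""
    one = df[df["company"] == company].sort_values("year").reset_index(drop=True)
    out = one[["year", target]].copy()
    for k in range(1, n_lags + 1):
        # value of `target` k years earlier (NaN for the first k rows)
        out[f"{target}_lag_{k}"] = out[target].shift(k)
    return out.dropna().reset_index(drop=True)  # drop rows with incomplete lags


# Toy data with made-up values: company 34 plus another company that gets filtered out.
df = pd.DataFrame({
    "company": [34] * 5 + [7] * 2,
    "year": [2001, 2002, 2003, 2004, 2005, 2001, 2002],
    "profit": [1.0, 1.5, 2.0, 2.6, 3.1, 9.0, 9.5],
})
series = make_lagged_series(df, company=34)
print(series)
```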

She then uploaded the new dataset.

*year* and the *invest_lag_1* variables.
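For the last task, the three models named in the assignment are available in scikit-learn. The sketch below trains each of them on lagged profit values and forecasts the next three years recursively, feeding each prediction back in as the newest lag. This is an assumed setup: it uses lagged *profit* as the only features rather than the *year* and *invest_lag_1* variables mentioned above, and the profit series is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def forecast_next(model, history, n_lags=2, steps=3):
    """Fit `model` on lagged values of `history` and forecast `steps` points recursively."""
    y = np.asarray(history, dtype=float)
    # Row i of X holds the n_lags values preceding y[i + n_lags] (oldest first).
    X = np.array([y[i:i + n_lags] for i in range(len(y) - n_lags)])
    model.fit(X, y[n_lags:])
    window = list(y[-n_lags:])  # the most recent observed values
    preds = []
    for _ in range(steps):
        nxt = float(model.predict(np.array([window]))[0])
        preds.append(nxt)
        window = window[1:] + [nxt]  # slide the window, feeding the forecast back in
    return preds


profit = [1.0, 1.5, 2.0, 2.6, 3.1, 3.5, 4.2, 4.6]  # hypothetical yearly profit of company 34
models = {
    "K-Nearest Neighbor": KNeighborsRegressor(n_neighbors=2),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
forecasts = {name: forecast_next(m, profit) for name, m in models.items()}
for name, values in forecasts.items():
    print(name, [round(v, 2) for v in values])
```

K-Nearest Neighbor, Decision Tree and Random Forest regressors can only return values inside the range of targets seen during training, so none of them can extrapolate a steadily growing trend; forecasts for a rising series tend to flatten out near the largest observed values.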