Project: ML - Neural Networks (Flower Type & Wine Cultivator)


Problem 1 (Flower Type):

  • Predict species of flower from flower properties: sepal width and length, petal width and length
  • Binary classification (only two species of flowers), using MLPClassifier (NN)

Problem 2 (Wine Cultivator):

  • Predict cultivator from wine properties (data: alcohol, malic acid, color intensity, hue, magnesium, etc)
  • Multiclass classification (3 cultivators), using MLPClassifier (NN)


Tools:

  • Feature engineering: rescaling with StandardScaler and reshuffling the dataframe rows
  • Models: MLPClassifier (feed-forward neural network with three hidden layers)
  • Model validation: holdout validation via train_test_split (default test_size=0.25)
  • Error metrics: AUC, classification_report, confusion_matrix


Load defaults

In [32]:
import numpy as np
import pandas as pd
import seaborn as sns
import re
import requests 

%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
from matplotlib import rcParams
import matplotlib.dates as mdates
from datetime import datetime
from IPython.display import display, Math

from functions import *

plt.style.use('seaborn')
plt.rcParams.update({'axes.titlepad': 20, 'font.size': 12, 'axes.titlesize':20})

colors = [(0/255,107/255,164/255), (255/255, 128/255, 14/255), 'red', 'green', '#9E80BA', '#8EDB8E', '#58517A']
Ncolors = 10
color_map = plt.cm.Blues_r(np.linspace(0.2, 0.5, Ncolors))
#color_map = plt.cm.tab20c_r(np.linspace(0.2, 0.5, Ncolors))


#specific to this project
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
#to normalize data
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix

print("Defaults Loaded")
Defaults Loaded


Problem 1: Predict Flower Type from flower properties

In [66]:
# Read in dataset
iris = pd.read_csv("./data/iris.csv")

display(iris[:3])

# shuffle rows (train_test_split shuffles too, but this also removes
# the original ordering by species from the dataframe itself)
shuffled_rows = np.random.permutation(iris.index)
iris = iris.loc[shuffled_rows,:]

# encode the two species labels as integers
species_map = {'Iris-versicolor': 1, 'Iris-virginica': 2}
iris['species'] = iris['species'].map(species_map)

display(iris.describe().transpose())
   sepal_length  sepal_width  petal_length  petal_width          species
0           7.0          3.2           4.7          1.4  Iris-versicolor
1           6.4          3.2           4.5          1.5  Iris-versicolor
2           6.9          3.1           4.9          1.5  Iris-versicolor

              count   mean       std  min    25%  50%    75%  max
sepal_length  100.0  6.262  0.662834  4.9  5.800  6.3  6.700  7.9
sepal_width   100.0  2.872  0.332751  2.0  2.700  2.9  3.025  3.8
petal_length  100.0  4.906  0.825578  3.0  4.375  4.9  5.525  6.9
petal_width   100.0  1.676  0.424769  1.0  1.300  1.6  2.000  2.5
species       100.0  1.500  0.502519  1.0  1.000  1.5  2.000  2.0
In [44]:
X = iris.drop('species',axis=1)
y = iris['species']

#default split: 75% train / 25% test
X_train, X_test, y_train, y_test = train_test_split(X, y)

#normalize data using StandardScaler, fit on the training set only
#so that no information from the test set leaks into the transform
scaler = StandardScaler()
scaler.fit(X_train)

# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

print("Data normalized")
Data normalized
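As an aside, scikit-learn's Pipeline can bundle the scaler and the classifier so the fit/transform bookkeeping happens automatically and the scaler is always fit on training data only. A minimal sketch of an equivalent setup, assuming it is run on the raw (untransformed) output of train_test_split:

from sklearn.pipeline import make_pipeline

#the pipeline fits the scaler on the training data and applies it
#before every fit/predict call on the classifier
pipe = make_pipeline(StandardScaler(),
                     MLPClassifier(hidden_layer_sizes=(13,13,13), max_iter=500))
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)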
In [45]:
# 3 hidden layers of 13 units each, training capped at 500 iterations
mlp = MLPClassifier(hidden_layer_sizes=(13,13,13),max_iter=500)
mlp.fit(X_train,y_train)
Out[45]:
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(13, 13, 13), learning_rate='constant',
       learning_rate_init=0.001, max_iter=500, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)
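As a quick convergence check, the adam and sgd solvers record the training loss at each iteration in loss_curve_; plotting it shows whether max_iter=500 was enough. A minimal sketch, using the fitted mlp above:

#training loss recorded at each iteration (available for the sgd/adam solvers)
plt.plot(mlp.loss_curve_)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.show()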


Error Metrics: Confusion Matrix and AUC

In [63]:
predictions = mlp.predict(X_test)

#classification report
print(classification_report(y_test,predictions))

#confusion matrix: rows = actual class, columns = predicted class
#(class 1 is taken as the positive class, class 2 as the negative class)
cm = confusion_matrix(y_test, predictions)
True_Positives = cm[0][0]   #actual 1, predicted 1
False_Negatives = cm[0][1]  #actual 1, predicted 2
False_Positives = cm[1][0]  #actual 2, predicted 1
True_Negatives = cm[1][1]   #actual 2, predicted 2

print("True_Positives: {:d}".format(True_Positives))
print("False_Negatives: {:d}".format(False_Negatives))
print("False_Positives: {:d}".format(False_Positives))
print("True_Negatives: {:d}".format(True_Negatives))

#AUC (computed here from the hard class predictions; see the
#probability-based variant after the output below)
auc = roc_auc_score(y_test, predictions)
print("\nAUC: {:0.3f}".format(auc))
              precision    recall  f1-score   support

           1       1.00      0.83      0.91        12
           2       0.87      1.00      0.93        13

   micro avg       0.92      0.92      0.92        25
   macro avg       0.93      0.92      0.92        25
weighted avg       0.93      0.92      0.92        25

True_Positives: 10
False_Negatives: 2
False_Positives: 0
True_Negatives: 13

AUC: 0.917
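Note that the AUC above is computed from hard class predictions, which ties the score to a single decision threshold. Scoring the predicted probability of the positive class instead gives the usual threshold-independent AUC; a minimal sketch, assuming the fitted mlp and the test split above (scikit-learn takes the larger label, 2, as the positive class):

#column 1 of predict_proba is P(species == 2), since mlp.classes_ is [1, 2]
probs = mlp.predict_proba(X_test)[:, 1]
print("AUC from probabilities: {:0.3f}".format(roc_auc_score(y_test, probs)))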


Problem 2: Predict cultivator from Wine properties

In [22]:
columns = ["Cultivator", "Alcohol", "Malic_Acid", "Ash", "Alcalinity_of_Ash",
           "Magnesium", "Total_phenols", "Flavanoids", "Nonflavanoid_phenols",
           "Proanthocyanins", "Color_intensity", "Hue", "OD280", "Proline"]
wine = pd.read_csv('./data/wine_data.csv', names = columns)
display(wine.iloc[:3,:12])
   Cultivator  Alcohol  Malic_Acid   Ash  Alcalinity_of_Ash  Magnesium  Total_phenols  Flavanoids  Nonflavanoid_phenols  Proanthocyanins  Color_intensity   Hue
0           1    14.23        1.71  2.43               15.6        127           2.80        3.06                  0.28             2.29             5.64  1.04
1           1    13.20        1.78  2.14               11.2        100           2.65        2.76                  0.26             1.28             4.38  1.05
2           1    13.16        2.36  2.67               18.6        101           2.80        3.24                  0.30             2.81             5.68  1.03
In [23]:
wine.describe().transpose()
Out[23]:
                      count        mean         std     min       25%      50%       75%      max
Cultivator            178.0    1.938202    0.775035    1.00    1.0000    2.000    3.0000     3.00
Alcohol               178.0   13.000618    0.811827   11.03   12.3625   13.050   13.6775    14.83
Malic_Acid            178.0    2.336348    1.117146    0.74    1.6025    1.865    3.0825     5.80
Ash                   178.0    2.366517    0.274344    1.36    2.2100    2.360    2.5575     3.23
Alcalinity_of_Ash     178.0   19.494944    3.339564   10.60   17.2000   19.500   21.5000    30.00
Magnesium             178.0   99.741573   14.282484   70.00   88.0000   98.000  107.0000   162.00
Total_phenols         178.0    2.295112    0.625851    0.98    1.7425    2.355    2.8000     3.88
Flavanoids            178.0    2.029270    0.998859    0.34    1.2050    2.135    2.8750     5.08
Nonflavanoid_phenols  178.0    0.361854    0.124453    0.13    0.2700    0.340    0.4375     0.66
Proanthocyanins       178.0    1.590899    0.572359    0.41    1.2500    1.555    1.9500     3.58
Color_intensity       178.0    5.058090    2.318286    1.28    3.2200    4.690    6.2000    13.00
Hue                   178.0    0.957449    0.228572    0.48    0.7825    0.965    1.1200     1.71
OD280                 178.0    2.611685    0.709990    1.27    1.9375    2.780    3.1700     4.00
Proline               178.0  746.893258  314.907474  278.00  500.5000  673.500  985.0000  1680.00
In [25]:
print(wine.shape)
(178, 14)

178 data points with 13 features and 1 label column

In [28]:
X = wine.drop('Cultivator',axis=1)
y = wine['Cultivator']

#default split: 75% train / 25% test
X_train, X_test, y_train, y_test = train_test_split(X, y)

#normalize data using StandardScaler, again fit on the training set only
scaler = StandardScaler()
scaler.fit(X_train)

# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

print("Data normalized")
Data normalized
/Users/BrunoHenriques/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py:617: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
  return self.partial_fit(X, y)
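This DataConversionWarning only notes that integer-typed feature columns (such as Magnesium) were cast to float64 by StandardScaler. Casting the features explicitly beforehand silences it; a minimal sketch:

#explicit cast to float avoids the DataConversionWarning from StandardScaler
X = wine.drop('Cultivator', axis=1).astype(float)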


Train the model

In [30]:
# same architecture as before: 3 hidden layers of 13 units each
# (13 matches the number of input features here), up to 500 iterations
mlp = MLPClassifier(hidden_layer_sizes=(13,13,13),max_iter=500)
mlp.fit(X_train,y_train)
Out[30]:
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(13, 13, 13), learning_rate='constant',
       learning_rate_init=0.001, max_iter=500, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

Hyperparameters (e.g. hidden_layer_sizes, max_iter, activation, solver) can be adjusted

In [36]:
predictions = mlp.predict(X_test)
#confusion matrix: rows = actual class, columns = predicted class
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test,predictions))
[[18  0  0]
 [ 1 12  0]
 [ 0  1 13]]
              precision    recall  f1-score   support

           1       0.95      1.00      0.97        18
           2       0.92      0.92      0.92        13
           3       1.00      0.93      0.96        14

   micro avg       0.96      0.96      0.96        45
   macro avg       0.96      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45
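The report and confusion matrix above give a per-class picture; for a single AUC-style summary in the multiclass case, newer scikit-learn releases (0.22 and later) let roc_auc_score consume class probabilities with a one-vs-rest strategy. A minimal sketch, assuming the fitted mlp above and a sufficiently recent scikit-learn:

#one-vs-rest AUC averaged over the 3 cultivator classes (requires scikit-learn >= 0.22)
probs = mlp.predict_proba(X_test)
print("multiclass AUC (OvR): {:0.3f}".format(roc_auc_score(y_test, probs, multi_class='ovr')))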

  • coefs_ is a list of weight matrices, where the matrix at index i holds the weights between layer i and layer i+1.
  • intercepts_ is a list of bias vectors, where the vector at index i holds the bias values added to layer i+1.
In [34]:
print(len(mlp.coefs_))          #4 weight matrices: input->h1, h1->h2, h2->h3, h3->output
print(len(mlp.coefs_[0]))       #first matrix has 13 rows, one per input feature
print(len(mlp.intercepts_[0]))  #13 biases, one per unit in the first hidden layer
4
13
13
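To inspect the full architecture at once, the weight matrices and bias vectors can be iterated in parallel; a minimal sketch using the fitted mlp above:

#shape of every layer-to-layer weight matrix and bias vector
for i, (W, b) in enumerate(zip(mlp.coefs_, mlp.intercepts_)):
    print("layer {} -> layer {}: weights {}, biases {}".format(i, i + 1, W.shape, b.shape))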