This project aimed to build a classifier that sorts wine samples into seven classes based on their quality. An MLP classifier was chosen for the task since this project was part of my Neural Networks course. The classifier learns from 10 features in the collected data.
Data Collection and Preprocessing
Data Source: Wine Quality Dataset from UCI Machine Learning Repository
Sample data:
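Loading and previewing the data might look like the following sketch. The file name winequality-white.csv and the variables X and Y are assumptions made for illustration (df and X_ros do appear in the later code); the CSVs distributed by UCI use ';' as the separator.

import pandas as pd

# Hedged sketch: load the CSV and separate the features from the quality label
df = pd.read_csv("winequality-white.csv", sep=";")  # file name is an assumption
X = df.drop("quality", axis=1)                      # physicochemical features
Y = df["quality"]                                   # quality scores (3 to 9) used as class labels
print(df.head())                                    # preview the first rows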
Data Preprocessing
Checked for missing values.
# Identify NaNs in each column
def num_missing(x):
    return x.isnull().sum()

# Apply the check per column
print("Missing values per column:")
print(df.apply(num_missing, axis=0), '\n')
As you can see, there were no missing values.
Applied random oversampling to handle the imbalanced classes, as sketched below.
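A minimal sketch of this step with imblearn's RandomOverSampler, assuming the features and labels are stored in X and Y; the resampled X_ros and Y_ros are what the train/test split later in this write-up uses.

from imblearn.over_sampling import RandomOverSampler

# Randomly duplicate minority-class samples until all quality classes are balanced
ros = RandomOverSampler(random_state=789)
X_ros, Y_ros = ros.fit_resample(X, Y)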
Applied a standard scaler.
A standard scaler transforms your data so that each feature has:
- A mean of 0
- A standard deviation of 1
This process is called standardization.
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
Methodology and Modeling
Used a stratified split to separate the training and testing data.
from sklearn.model_selection import train_test_split

# Split the oversampled data, preserving class proportions in both sets
X_train, X_test, Y_train, Y_test = train_test_split(X_ros, Y_ros, test_size=0.25, random_state=789, stratify=Y_ros)
Used an MLP (Multilayer Perceptron) classifier.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(11, 9, 7), activation='tanh', solver='adam', max_iter=1000, verbose=True, random_state=789, batch_size=200)
mlp.fit(X_train, Y_train)
This MLP classifier classifies the data obtained from different wine samples. It learns from 10 features in the collected data, and there are seven classes based on the quality of the wine samples (scores 3 to 9).
Trained the model multiple times while varying hyperparameters such as the activation function and the optimizer; a sketch of such a sweep follows.
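As a hedged sketch only (the exact combinations tried are not recorded in this write-up), a small loop over activations and solvers could look like this:

# Hypothetical parameter sweep; the candidate values below are assumptions
for activation in ('tanh', 'relu', 'logistic'):
    for solver in ('adam', 'sgd'):
        candidate = MLPClassifier(hidden_layer_sizes=(11, 9, 7), activation=activation, solver=solver,
                                  max_iter=1000, random_state=789, batch_size=200)
        candidate.fit(X_train, Y_train)
        print(activation, solver, candidate.score(X_test, Y_test))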
Plotted the loss curve while training the model.
import matplotlib.pyplot as plt

# Training loss recorded after each iteration
plt.plot(mlp.loss_curve_)
plt.title("Loss over epochs", fontsize=14)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()
Made predictions on the test data.
predictions = mlp.predict(X_test)
Results and Evaluation
from sklearn.metrics import accuracy_score

actr = accuracy_score(Y_train, mlp.predict(X_train))  # training-set accuracy
print(actr * 100, "%")
Training accuracy: 72.8%
act = accuracy_score(Y_test, predictions)  # test-set accuracy
print(act * 100, "%")
Testing accuracy: 69.9%
Used a confusion matrix and a classification report to summarize the model's performance.
from sklearn.metrics import confusion_matrix, classification_report

m = confusion_matrix(Y_test, predictions)
print(classification_report(Y_test, predictions))
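For readability, the confusion matrix can also be drawn as a heatmap. This is a minimal sketch using scikit-learn's ConfusionMatrixDisplay; it is not part of the original code.

from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Hedged sketch: render the confusion matrix m computed above as a heatmap
disp = ConfusionMatrixDisplay(confusion_matrix=m, display_labels=mlp.classes_)
disp.plot()
plt.show()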
Libraries
- NumPy
- Pandas
- imblearn
- sklearn
- Matplotlib