This project aimed to build a classifier that sorts wine samples into seven classes based on their quality. An MLP classifier was chosen for the task since this project was part of my Neural Networks course. The classifier learns from 10 features in the collected data.
Data Collection and Preprocessing
Data Source: Wine Quality Dataset from UCI Machine Learning Repository
Sample data:
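Loading and previewing the data might look like the following sketch. The file name winequality-white.csv and the variables X and Y are assumptions made for illustration (df and X_ros do appear in the later code); the CSVs distributed by UCI use ';' as the separator.

import pandas as pd

# Hedged sketch: load the CSV and separate the features from the quality label
df = pd.read_csv("winequality-white.csv", sep=";")  # file name is an assumption
X = df.drop("quality", axis=1)                      # physicochemical features
Y = df["quality"]                                   # quality scores (3 to 9) used as class labels
print(df.head())                                    # preview the first rows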
Data Preprocessing
Checked for missing values.
# Identify NaNs in each column
def num_missing(x):
    return x.isnull().sum()

# Apply the check per column
print("Missing values per column:")
print(df.apply(num_missing, axis=0), '\n')
As you can see, there were no missing values.
Applied random oversampling to handle the imbalanced classes, as sketched below.
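A minimal sketch of this step with imblearn's RandomOverSampler, assuming the features and labels are stored in X and Y; the resampled X_ros and Y_ros are what the train/test split later in this write-up uses.

from imblearn.over_sampling import RandomOverSampler

# Randomly duplicate minority-class samples until all quality classes are balanced
ros = RandomOverSampler(random_state=789)
X_ros, Y_ros = ros.fit_resample(X, Y)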
Applied a standard scaler.
A standard scaler transforms your data so that each feature has:
- A mean of 0
- A standard deviation of 1
This process is called standardization.
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
Methodology and Modeling
Used a stratified split to separate the training and testing data.
from sklearn.model_selection import train_test_split

# Split the oversampled data, preserving class proportions in both sets
X_train, X_test, Y_train, Y_test = train_test_split(X_ros, Y_ros, test_size=0.25, random_state=789, stratify=Y_ros)
Used an MLP (Multilayer Perceptron) classifier.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(11, 9, 7), activation='tanh', solver='adam', max_iter=1000, verbose=True, random_state=789, batch_size=200)
mlp.fit(X_train, Y_train)
This MLP classifier classifies the data obtained from different wine samples. It learns from 10 features in the collected data, and there are seven classes based on the quality of the wine samples (scores 3 to 9).
Trained the model multiple times while varying hyperparameters such as the activation function and the optimizer; a sketch of such a sweep follows.
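As a hedged sketch only (the exact combinations tried are not recorded in this write-up), a small loop over activations and solvers could look like this:

# Hypothetical parameter sweep; the candidate values below are assumptions
for activation in ('tanh', 'relu', 'logistic'):
    for solver in ('adam', 'sgd'):
        candidate = MLPClassifier(hidden_layer_sizes=(11, 9, 7), activation=activation, solver=solver,
                                  max_iter=1000, random_state=789, batch_size=200)
        candidate.fit(X_train, Y_train)
        print(activation, solver, candidate.score(X_test, Y_test))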
Plotted the loss curve while training the model.
import matplotlib.pyplot as plt

# Training loss recorded after each iteration
plt.plot(mlp.loss_curve_)
plt.title("Loss over epochs", fontsize=14)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()
Made predictions on the test data.
predictions = mlp.predict(X_test)
Results and Evaluation
from sklearn.metrics import accuracy_score

actr = accuracy_score(Y_train, mlp.predict(X_train))  # training-set accuracy
print(actr * 100, "%")
Training accuracy: 72.8%
act = accuracy_score(Y_test, predictions)  # test-set accuracy
print(act * 100, "%")
Testing accuracy: 69.9%
Used a confusion matrix and a classification report to summarize the model's performance.
from sklearn.metrics import confusion_matrix, classification_report

m = confusion_matrix(Y_test, predictions)
print(classification_report(Y_test, predictions))
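For readability, the confusion matrix can also be drawn as a heatmap. This is a minimal sketch using scikit-learn's ConfusionMatrixDisplay; it is not part of the original code.

from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Hedged sketch: render the confusion matrix m computed above as a heatmap
disp = ConfusionMatrixDisplay(confusion_matrix=m, display_labels=mlp.classes_)
disp.plot()
plt.show()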
Libraries
- NumPy
- Pandas
- imblearn
- sklearn
- Matplotlib