Lab Walkthrough: Machine Learning Classifier
Unit 8: Machine Learning Foundations (Capstone) — Lab Walkthrough
Use this walkthrough only after you have attempted the lab yourself. Academic integrity requires that your submitted notebook represents your own original work. Use this walkthrough to understand why each step works and to check your approach, not to copy code. Understanding the reasoning behind each decision is far more valuable than having working output.
Overview
This walkthrough provides a complete solution to the Unit 8 Machine Learning Classifier Lab. We will train and evaluate Decision Tree and k-NN classifiers on the wine quality dataset, compare performance, diagnose overfitting, and make a justified algorithm recommendation.
Part 1: Data Loading and Exploration
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, precision_score,
recall_score, f1_score, confusion_matrix)
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
# Load the dataset
df = pd.read_csv('wine_quality.csv')
print("First 5 rows:")
print(df.head())
print("\nDataset info:")
df.info()
print("\nMissing values:", df.isnull().sum().sum())
# Create binary target
df['good_quality'] = (df['quality'] >= 6).astype(int)
print("Class distribution:")
print(df['good_quality'].value_counts())
print("\nAs percentages:")
print(df['good_quality'].value_counts(normalize=True).mul(100).round(1))
Expected output:
Class distribution:
0    614   (61.4% -- normal quality)
1    386   (38.6% -- good quality)
Part 2: Train-Test Split
X = df.drop(['quality', 'good_quality'], axis=1)
y = df['good_quality']
X_train, X_test, y_train, y_test = train_test_split(
X, y,
test_size=0.2,
random_state=42,
stratify=y # preserve class proportions in both splits
)
print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
print(f"\nTrain class distribution:\n{y_train.value_counts()}")
print(f"\nTest class distribution:\n{y_test.value_counts()}")
Expected output:
Training set: 800 samples
Test set: 200 samples

Train class distribution:
0    491
1    309

Test class distribution:
0    123
1     77
The stratify=y parameter is essential here.
Without it, random chance could produce a test set that has 80% normal quality wines, making the evaluation unrepresentative.
Stratified splitting ensures both sets reflect the original 61.4% / 38.6% class balance.
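The effect is easy to verify on a toy example (synthetic 60/40 labels standing in for good_quality; nothing here depends on the wine data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels with a 60/40 imbalance, mimicking the lab's class balance
y_toy = np.array([0] * 60 + [1] * 40)
X_toy = np.arange(100).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2,
                                    random_state=42, stratify=y_toy)

# With stratify, both splits keep the original 40% positive rate
print(y_tr.mean(), y_te.mean())  # 0.4 in both splits
```

Dropping `stratify=y_toy` and rerunning a few times with different seeds shows the test-set proportion drifting, which is exactly the risk the lab's split avoids.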
Part 3: Decision Tree Classifier
# Train default (unlimited depth) decision tree
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)
y_pred_dt = dt_model.predict(X_test)
# Compute metrics
print(f"Accuracy: {accuracy_score(y_test, y_pred_dt):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_dt):.4f}")
print(f"Recall: {recall_score(y_test, y_pred_dt):.4f}")
print(f"F1 Score: {f1_score(y_test, y_pred_dt):.4f}")
# Confusion matrix
cm = confusion_matrix(y_test, y_pred_dt)
print(f"\nConfusion Matrix:\n{cm}")
print(f"TN={cm[0,0]} FP={cm[0,1]}")
print(f"FN={cm[1,0]} TP={cm[1,1]}")
Typical output:
Accuracy:  0.7150
Precision: 0.6667
Recall:    0.5974
F1 Score:  0.6301

Confusion Matrix:
[[102  21]
 [ 31  46]]
TN=102 FP=21
FN=31  TP=46
Depth Experiments
depths = [3, 5, 10, None]
print(f"{'Depth':<10} {'Accuracy':<12} {'F1 Score':<10}")
print("-" * 35)
for depth in depths:
    dt = DecisionTreeClassifier(max_depth=depth, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    label = str(depth)  # prints 'None' for the unlimited-depth tree
    print(f"{label:<10} {acc:<12.4f} {f1:<10.4f}")
Typical output:
Depth      Accuracy     F1 Score
-----------------------------------
3          0.7100       0.6370
5          0.7200       0.6444
10         0.7200       0.6444
None       0.7150       0.6301
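A quick way to see why unlimited depth invites overfitting is to compare training and test accuracy at each depth, not just test accuracy. The sketch below uses a synthetic make_classification dataset as a stand-in for the wine data, so the exact gaps will differ from the lab's:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 1000 samples, 11 features, like the wine data's shape
X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=42, stratify=y)

for depth in [3, 5, 10, None]:
    dt = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_tr, y_tr)
    gap = dt.score(X_tr, y_tr) - dt.score(X_te, y_te)
    print(f"depth={depth}: train-test gap = {gap:.3f}")
```

The unlimited-depth tree reaches 100% training accuracy by memorizing the training set, so its gap is the largest; that gap, not the training score itself, is the overfitting signal.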
Part 4: K-Nearest Neighbors Classifier
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)
y_pred_knn = knn_model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred_knn):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_knn):.4f}")
print(f"Recall: {recall_score(y_test, y_pred_knn):.4f}")
print(f"F1 Score: {f1_score(y_test, y_pred_knn):.4f}")
Typical output:
Accuracy:  0.7350
Precision: 0.6892
Recall:    0.6623
F1 Score:  0.6755
k Value Experiments
k_values = [1, 3, 5, 10, 20]
print(f"{'k':<6} {'Accuracy':<12} {'F1 Score':<10}")
print("-" * 30)
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    print(f"{k:<6} {acc:<12.4f} {f1:<10.4f}")
Typical output:
k      Accuracy     F1 Score
------------------------------
1      0.6950       0.6588
3      0.7250       0.6623
5      0.7350       0.6755
10     0.7400       0.6887
20     0.7300       0.6797
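If you want to demystify what KNeighborsClassifier is doing, its prediction can be reproduced by hand on a tiny toy set: compute the distance from the query to every training point, keep the k closest, and take a majority vote:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Tiny 1-D toy set: class 0 clustered near 0-2, class 1 near 10-12
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
query = np.array([[3.0]])

# Manual k=3 vote: the three nearest points (0, 1, 2) are all class 0
dists = np.abs(X_toy.ravel() - query.ravel())
nearest_labels = y_toy[np.argsort(dists)[:3]]
manual = np.bincount(nearest_labels).argmax()

knn = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy)
print(manual, knn.predict(query)[0])  # both 0
```

This also explains the k experiments above: small k lets one noisy neighbor flip the vote, while large k averages over points that may not be very similar at all.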
Part 5: Model Comparison and Overfitting Analysis
# Retrain with best hyperparameters
best_dt = DecisionTreeClassifier(max_depth=5, random_state=42)
best_dt.fit(X_train, y_train)
y_pred_best_dt = best_dt.predict(X_test)
best_knn = KNeighborsClassifier(n_neighbors=10)
best_knn.fit(X_train, y_train)
y_pred_best_knn = best_knn.predict(X_test)
# Training accuracy for overfitting analysis
train_acc_dt = accuracy_score(y_train, best_dt.predict(X_train))
train_acc_knn = accuracy_score(y_train, best_knn.predict(X_train))
test_acc_dt = accuracy_score(y_test, y_pred_best_dt)
test_acc_knn = accuracy_score(y_test, y_pred_best_knn)
print("Model Comparison (Test Set)")
print("=" * 60)
for name, y_pred in [("Decision Tree (depth=5)", y_pred_best_dt),
                     ("k-NN (k=10)", y_pred_best_knn)]:
    print(f"\n{name}")
    print(f"  Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
    print(f"  Precision: {precision_score(y_test, y_pred):.4f}")
    print(f"  Recall:    {recall_score(y_test, y_pred):.4f}")
    print(f"  F1 Score:  {f1_score(y_test, y_pred):.4f}")
print("\nOverfitting Analysis")
print("=" * 60)
print(f"Decision Tree -- Train: {train_acc_dt:.4f} Test: {test_acc_dt:.4f} Gap: {train_acc_dt-test_acc_dt:.4f}")
print(f"k-NN -- Train: {train_acc_knn:.4f} Test: {test_acc_knn:.4f} Gap: {train_acc_knn-test_acc_knn:.4f}")
Typical output:
Model Comparison (Test Set)
============================================================

Decision Tree (depth=5)
  Accuracy:  0.7200
  Precision: 0.6667
  Recall:    0.6234
  F1 Score:  0.6444

k-NN (k=10)
  Accuracy:  0.7400
  Precision: 0.7027
  Recall:    0.6753
  F1 Score:  0.6887

Overfitting Analysis
============================================================
Decision Tree -- Train: 0.7363  Test: 0.7200  Gap: 0.0163
k-NN          -- Train: 0.7725  Test: 0.7400  Gap: 0.0325
Part 6: Sample Reflection Answers
Q1: Which model performed better, and why?
k-NN with k=10 outperformed the Decision Tree on every metric (74.0% vs 72.0% accuracy; 68.9% vs 64.4% F1). This makes intuitive sense for wine quality: wines with similar chemical compositions tend to be similar in quality. k-NN naturally exploits this by looking directly at similar training examples. The Decision Tree makes hard axis-aligned cuts that may miss nuanced chemical relationships.
Q2: Did you observe overfitting?
No severe overfitting.
The Decision Tree had only a 1.6% training-test gap; k-NN had a 3.3% gap.
Both gaps are small.
This is because we constrained complexity: max_depth=5 prevented the decision tree from growing arbitrarily deep, and k=10 smoothed out k-NN’s sensitivity to individual training points.
Q3: Which hyperparameter settings worked best?
max_depth=5 for the Decision Tree; k=10 for k-NN.
Both represent the "just right" balance: complex enough to learn real patterns, simple enough to avoid memorizing noise.
Going simpler (depth=3, k=20) produced underfitting; going more complex (unlimited depth, k=1) produced overfitting.
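One caveat worth knowing: this lab picks depth and k by comparing test-set scores, which mildly leaks test information into the choice. In practice, cross-validation on the training set is the safer pattern. A minimal sketch with GridSearchCV, run on synthetic stand-in data rather than the wine CSV:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in dataset shaped roughly like the lab's
X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=42, stratify=y)

# 5-fold cross-validated search over k, scored by F1, using training data only
search = GridSearchCV(KNeighborsClassifier(),
                      {'n_neighbors': [1, 3, 5, 10, 20]},
                      scoring='f1', cv=5)
search.fit(X_tr, y_tr)
print(search.best_params_, f"CV F1 = {search.best_score_:.3f}")
```

The test set is then touched exactly once, at the end, to report the chosen model's performance.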
Q4: Recommendation for the wine quality problem?
I would recommend the Decision Tree despite its slightly lower accuracy, for this application context. Winemakers and quality managers need to understand why a wine was rated good or poor in order to act on that information. A decision tree gives them explicit, interpretable rules: "IF alcohol > 10.5 AND volatile_acidity < 0.4 THEN good quality." This is actionable. A k-NN prediction — "the 10 most similar wines in our database said good quality" — is much harder for a domain expert to verify or act on. For a 2% accuracy tradeoff, interpretability is worth it in this context.
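scikit-learn can make that interpretability argument concrete: export_text prints the fitted tree's rules directly. A small sketch on synthetic data with made-up feature names; in the lab you would pass best_dt and list(X.columns) instead:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Small synthetic problem; feat_0..feat_3 are placeholder names
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
dt = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# Prints nested IF/THEN rules with thresholds and leaf class labels
print(export_text(dt, feature_names=[f"feat_{i}" for i in range(4)]))
```

There is no comparable artifact to hand a winemaker from a fitted k-NN model, which is the heart of the recommendation above.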
Key Learnings from This Lab
You experienced the complete professional ML workflow: load, explore, split, train, tune, evaluate, compare.
Five key takeaways:
- Accuracy alone misleads: on imbalanced data, always check precision, recall, and F1.
- Overfitting is real and diagnosable: compare training vs. test accuracy; a large gap is a red flag.
- Hyperparameter tuning matters: 5-6% accuracy differences from tuning depth and k are typical.
- No single best algorithm: k-NN won on accuracy; the Decision Tree wins on interpretability. "Best" depends on the problem constraints, not just the numbers.
- The gap is more informative than training accuracy: a small gap between training and test is a better sign of generalization than a high training score.
Lab code uses scikit-learn, BSD License.
Wine Quality dataset from the UCI Machine Learning Repository, CC BY 4.0.
This work is licensed under CC BY-SA 4.0.