Handling Imbalanced Data: SMOTE, Class Weights, and Better Metrics

Many real-world datasets are imbalanced, for example 99% “normal” transactions and 1% fraud. Plain accuracy is misleading here: a model that predicts “normal” for everything scores 99% while catching no fraud at all. Here’s how to build models that work when classes aren’t equal.

Why imbalance breaks Machine Learning

Problems:

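  • Models can ignore the rare class and still reach 99% accuracy.
  • The default 0.5 decision threshold favors the majority class, because predicted probabilities for the rare class tend to stay low.
  • Aggregate metrics such as accuracy hide poor minority-class performance.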
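The first and third problems can be seen with a few lines of arithmetic. This is a minimal sketch in plain Python; the 990/10 label split is a made-up example matching the 99% / 1% ratio above, not real data.

```python
# Hypothetical labels: 990 normal (0) and 10 fraud (1), i.e. a 99% / 1% split.
y_true = [0] * 990 + [1] * 10

# A "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall on the fraud class: caught frauds / all frauds.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(f"accuracy={accuracy:.2%}  fraud recall={recall:.2%}")
```

Accuracy looks excellent while fraud recall is zero, which is why the rest of this post leans on recall, precision, and PR curves instead.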

Three families of fixes: resampling, cost-sensitive learning, and better metrics with threshold tuning.

Method 1: Resampling strategies

Undersampling: Remove majority samples → balanced classes, but less training data.
Oversampling: Duplicate minority samples → balanced classes, but the model can overfit to the repeated points.
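Both random strategies come down to index selection. The sketch below uses NumPy on made-up toy data (95 majority rows, 5 minority rows); the variable names are illustrative, and in practice you would resample only the training split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 95 majority rows (label 0), 5 minority rows (label 1).
X = rng.normal(size=(100, 2))
y = np.array([0] * 95 + [1] * 5)

maj_idx = np.flatnonzero(y == 0)
min_idx = np.flatnonzero(y == 1)

# Random undersampling: keep only as many majority rows as there are minority rows.
keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
under_idx = np.concatenate([keep_maj, min_idx])
X_under, y_under = X[under_idx], y[under_idx]

# Random oversampling: draw minority rows with replacement until the classes match.
extra_min = rng.choice(min_idx, size=len(maj_idx), replace=True)
over_idx = np.concatenate([maj_idx, extra_min])
X_over, y_over = X[over_idx], y[over_idx]

print(np.bincount(y_under))  # class counts after undersampling
print(np.bincount(y_over))   # class counts after oversampling
```

Note that the oversampled minority class still contains only five distinct rows, which is where the overfitting risk comes from.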

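SMOTE (Synthetic Minority Over-sampling Technique):

  • For each minority sample, find its k nearest minority-class neighbors.
  • Generate synthetic samples at random points along the line segments joining the sample to those neighbors.
  • Preserves local structure better than plain duplication.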
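The interpolation step is easier to see in code. The following is a simplified NumPy sketch of the idea, not the implementation from any particular library; the function name `smote_sketch` and the six toy points are assumptions made for illustration.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic points from minority samples X_min (n >= 2 rows)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other samples

    # Pairwise Euclidean distances between minority samples only.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # random base sample per new point
    pick = rng.integers(0, k, size=n_new)   # which of its k neighbors to use
    nbr = neighbors[base, pick]
    gap = rng.random(size=(n_new, 1))       # position along the segment, in [0, 1)

    # Interpolate: base + gap * (neighbor - base).
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Hypothetical minority class: 6 points in 2-D.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
X_syn = smote_sketch(X_min, n_new=10, k=3)
print(X_syn.shape)
```

Every synthetic point lies between two real minority points, so the new samples stay inside the region the minority class already occupies rather than repeating existing rows.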

Method 2: Algorithm tweaks

Class weights: Penalize errors on the minority class more heavily in the loss.

sklearn: class_weight='balanced'

XGBoost: scale_pos_weight = (number of negative samples) / (number of positive samples)
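To make those two settings concrete, here is a small sketch that computes the numbers by hand. The label counts (980 negatives, 20 positives) are hypothetical; the "balanced" formula is the one scikit-learn documents for class_weight='balanced', namely n_samples / (n_classes * count of each class).

```python
from collections import Counter

# Hypothetical training labels: 980 negatives (0) and 20 positives (1).
y_train = [0] * 980 + [1] * 20

counts = Counter(y_train)
n_samples = len(y_train)
n_classes = len(counts)

# "Balanced" heuristic: weight_c = n_samples / (n_classes * count_c).
balanced = {c: n_samples / (n_classes * n) for c, n in counts.items()}

# XGBoost-style ratio: number of negatives / number of positives.
scale_pos_weight = counts[0] / counts[1]

print(balanced)
print(scale_pos_weight)
```

Here the rare class gets a weight of 25 and the common class about 0.51, so a single misclassified fraud counts roughly as much as 49 misclassified normal transactions, the same ratio as scale_pos_weight.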

Ensembles: Train each bagged or boosted base learner on a different undersampled, balanced subset of the data, then combine their predictions.

Method 3: Threshold Tuning + Metrics

Key metrics:

  • Precision/recall trade-off: the PR curve is more informative than ROC under heavy imbalance.
  • F1 score: harmonic mean of precision and recall; it is low whenever either one is low.
  • AUC-PR: area under the precision-recall curve.
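The harmonic mean in F1 is what stops a model from hiding weak recall behind strong precision. A minimal sketch, using made-up confusion-matrix counts and a hypothetical helper name:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: a model that flags very little, but is right when it does.
p, r, f1 = precision_recall_f1(tp=2, fp=0, fn=18)
print(p, r, round(f1, 3))
```

With precision 1.0 and recall 0.1, the arithmetic mean would be 0.55, but F1 is about 0.18, much closer to the weaker of the two.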

Tune the decision threshold on a validation set to minimize business cost (false positives vs. false negatives).
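One simple way to do this is a grid sweep over thresholds, scoring each by total cost. The sketch below assumes a $100 cost per false negative and $10 per false positive; the validation labels, scores, and the function name `best_threshold` are invented for illustration.

```python
import numpy as np

def best_threshold(y_true, scores, cost_fn, cost_fp):
    """Return the threshold (from a fixed grid) with the lowest total cost."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    best_t, best_cost = None, float("inf")
    for t in np.linspace(0.05, 0.95, 19):       # 0.05, 0.10, ..., 0.95
        pred = scores >= t                       # flag as fraud at this threshold
        fn = np.sum((y_true == 1) & ~pred)       # missed frauds
        fp = np.sum((y_true == 0) & pred)        # false alarms
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:                     # keep the first, lowest-cost threshold
            best_t, best_cost = float(t), float(cost)
    return best_t, best_cost

# Hypothetical validation labels and predicted fraud probabilities.
y_val = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
p_val = [0.03, 0.08, 0.12, 0.18, 0.31, 0.36, 0.62, 0.27, 0.44, 0.81]

t, cost = best_threshold(y_val, p_val, cost_fn=100, cost_fp=10)
print(t, cost)
```

On these made-up scores the cheapest threshold sits well below 0.5, because missing a fraud is priced at ten times a false alarm. The chosen threshold should then be confirmed on a separate test set.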

Example: Fraud Detection

Dataset: 98% normal, 2% fraud.

Baseline: Predict all normal → Accuracy 98%, Recall 0%

Class weights → Recall 75%, Precision 60%

SMOTE + threshold → Recall 85%, Precision 55%

Pick based on cost, e.g. $100 per missed fraud (FN) vs. $10 per false alarm (FP).
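The comparison can be worked through numerically. This sketch assumes a batch of 1,000 transactions (a figure chosen here for illustration, giving 20 frauds at the 2% rate) and backs out false positives from the stated precision and recall.

```python
# Assumption: 1,000 transactions at the 2% fraud rate from the example above.
n_fraud = 20
COST_FN, COST_FP = 100, 10  # dollars per missed fraud / per false alarm

def expected_cost(recall, precision):
    """Total error cost implied by a recall/precision pair."""
    tp = recall * n_fraud
    fn = n_fraud - tp
    # precision = tp / (tp + fp)  =>  fp = tp * (1 - precision) / precision
    fp = tp * (1 - precision) / precision if precision > 0 else 0.0
    return COST_FN * fn + COST_FP * fp

results = {
    "baseline (all normal)": expected_cost(recall=0.0, precision=0.0),
    "class weights": expected_cost(recall=0.75, precision=0.60),
    "SMOTE + threshold": expected_cost(recall=0.85, precision=0.55),
}
for name, cost in results.items():
    print(f"{name}: ${cost:,.0f}")
```

Under these assumptions the baseline costs $2,000, class weights $600, and SMOTE + threshold about $439, so the higher-recall option wins despite its lower precision. With a different cost ratio the ranking could flip.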

Try this: Grab a fraud/credit dataset. Fit 3 models: baseline, class weights, SMOTE. Plot PR curves.
