What I Learned Fine-Tuning a Deep Learning Model for a Kaggle Competition

I recently participated in a Kaggle competition where the task was to detect AI-generated images vs. human-created ones. This story isn’t about my journey to becoming a Kaggle Grandmaster (though I do plan to be one someday 😉). It’s about what I learned while fine-tuning deep learning models — starting with ResNet18 and eventually switching to ResNet50 and later to EfficientNet — for an image classification challenge that pushed me outside my comfort zone in the best possible way.

What Was the Problem?

The challenge was hosted by Women in AI (WAI). The dataset for this challenge, provided by Shutterstock and DeepMedia, combines authentic and AI-generated images to create a robust foundation for training and evaluation. Authentic images are sourced from Shutterstock’s platform, including a balanced selection where one-third of the images feature humans. These are paired with their AI-generated counterparts, created by DeepMedia using state-of-the-art generative models. This pairing ensures a direct comparison between real and AI-generated content, enhancing model training and enabling the development of robust image authenticity detection systems.

The goal is to build a model that can accurately detect whether an image was generated by AI or created by a human. At its core, this is a binary classification task: simple to state, not so simple in practice.

The dataset consisted of:

  • ~80,000 training images, evenly split between AI-generated and human-generated.
  • A test set without labels, which would be evaluated privately by Kaggle.

The catch? AI-generated images have become incredibly realistic, sometimes even more polished than human-made ones. So I knew the model had to go beyond surface-level features to generalize well.

A transfer learning approach using a pre-trained model (like ResNet-50 or EfficientNet) fine-tuned on the dataset is more efficient and likely to perform better than training a CNN from scratch here, since the dataset is large and the generated images are somewhat realistic.

Prepping the Data

The images were referenced in a train.csv file, so I built a custom PyTorch Dataset class to load them and apply transformations.
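Here's a minimal sketch of that Dataset class. The column names (`file_name`, `label`) and the label encoding are assumptions for illustration; the real train.csv schema may differ:

```python
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class AIvsHumanDataset(Dataset):
    """Loads images referenced in a CSV with (assumed) file_name/label columns."""
    def __init__(self, csv_file, transform=None):
        self.df = pd.read_csv(csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(row["file_name"]).convert("RGB")  # force 3 channels
        label = int(row["label"])  # 0 = human, 1 = AI (assumed encoding)
        if self.transform:
            image = self.transform(image)
        return image, label
```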

For training:

I used augmentations to help the model generalize better:

from torchvision import transforms

transforms.Compose([
    transforms.Resize((224, 224)),  # Resize to 224x224 (the standard input size for models like ResNet)
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),  # Convert images to PyTorch tensors
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225])
])

For validation and test:

No augmentations, just resizing and normalization to keep it consistent. Splitting the data into train and validation sets (80/20) helped me evaluate generalization clearly.

Starting Small: ResNet18

I kicked off the project with ResNet18, a lightweight model that’s fast, simple, and easy to debug.

Here’s why that worked well:

  • Quick to train on Kaggle’s free GPU
  • Ideal for testing the entire training loop and preprocessing pipeline
  • Surprisingly good performance (~88–91% validation accuracy)

Setup:

import torch.nn as nn
from torchvision import models

# Note: the pretrained=True flag is deprecated; weights=... is the current API
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

I froze the earlier layers and trained just the final fc layer. Once everything ran smoothly and gave promising results, I was ready to go deeper.

Loss Function & Optimizer

CrossEntropyLoss:

Built for multi-class classification, but it works directly with a two-logit head, so it fits this binary task as well.

AdamW Optimizer:

Helps with regularization and improves convergence.

Trained for 5 epochs with a batch size of 32.

Check out the notebook.

Scaling Up: ResNet50

With the base pipeline working, I switched to ResNet50 for more feature extraction power.

What I changed:

  • Unfroze layer4 and the fc layer
  • Added dropout for regularization
  • Used AdamW optimizer with weight_decay and a StepLR scheduler
  • Implemented early stopping

Result

  • Validation accuracy peaked at 99.59%.
  • Consistently high performance across multiple epochs.
  • This became my strongest model overall.

Check out the notebook.

Switching to EfficientNet-B0

EfficientNet-B0 is lightweight and efficient, so I gave it a shot to see if it could match or beat ResNet50.

My EfficientNet Setup:

import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

# Load pre-trained EfficientNet-B0
model = efficientnet_b0(weights=EfficientNet_B0_Weights.DEFAULT)

# Modify the classifier to match the number of classes (2 in this case)
num_ftrs = model.classifier[1].in_features
model.classifier[1] = nn.Linear(num_ftrs, 2)

I unfroze the last three feature blocks.

Result

  • Validation accuracy: 99.27%
  • Faster than ResNet50
  • Slightly underperformed compared to ResNet50, but was still impressive for its size

While it didn’t outperform ResNet50 in my setup, I can see it being useful in production scenarios where inference time and model size matter more.

Check out the notebook.

What Worked & What Didn’t

What Worked:

  • Starting small with ResNet18 to test ideas quickly
  • Trying EfficientNet-B0 as a lightweight, nearly-as-accurate alternative
  • Dropout + weight decay + AdamW = less overfitting
  • Early stopping and LR scheduling
  • Tracking precision, recall, and F1 (not just accuracy)

What Didn’t Help:

  • Increasing image size (more cost, no gain)
  • Training all layers from the start; overfitting happened fast

What I’d Try Next Time

  • Use test-time augmentation (TTA) to improve predictions
  • Combine models with ensembling
  • Try ConvNeXt or Vision Transformers
  • Use differential learning rates (higher for classifier, lower for backbone)
  • Log precision/recall during training — not just at the end
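
For instance, the TTA idea could be as simple as averaging predictions over a horizontal flip (a minimal sketch, not something I ran in the competition):

```python
import torch

def predict_with_tta(model, images):
    """Average softmax probabilities over the original batch and a
    horizontally flipped copy, then take the argmax."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(images), dim=1)
        probs = probs + torch.softmax(model(torch.flip(images, dims=[3])), dim=1)
    return (probs / 2).argmax(dim=1)
```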

Final Thoughts

Fine-tuning pre-trained models is both an art and a science. I learned more from debugging weird results, failed submissions, and last-minute fixes than from any tutorial. This project wasn’t just about improving validation accuracy; it was about building intuition, confidence, and resilience.

If you’re new to competitions, I’d 100% recommend starting with transfer learning. And if you’re like me, juggling a bunch of passions and projects, just remember: one model, one tweak, one submission at a time 💪

You can support me in my journey to becoming a Kaggle grandmaster by going to my Kaggle profile and upvoting some of my notebooks😄.

Check out the notebooks on my Kaggle profile.
