Motivation

ImageNet, CIFAR10, … are all image dataset, why do we need to train a new model for every new dataset?

We know that before the last layer, the network is just doing feature extraction, I think if we get rid of last layer, we can treat the network as “feature extractor” and just train the last classification layer!

This is why we come up with “feature extraction” and “fine-tuning”

Feature Extraction

We remove the last layer of the network, freeze the hyperparameters in the other parts of the network, then train a new last layer with the new dataset

Fine-Tuning

Method

With large dataset, instead of only retrain the last layer, we can continue training the entire model or few layers on the new dataset

Tricks

Train with feature extraction first before fine-tuning
Lower the learning rate, commonly make it 1/10
Sometimes freeze lower layers to save computation resource - low layers tends to extract low level characteristics which share between different datasets

Downstream tasks in image classification like object detection may wrap image classification models, and we'll fine-tune the model before putting into other tasks

When to use them?

	Dataset Similar to ImageNet	Data very different from ImageNet
very little data (10s to 100s)	Use Linear Classifier on top layer	You’re in trouble… Try linear classifier from different stage
quite lots of data (100s to 1000s)	Finetune a few layers	Finetune a larger number of layers

The upper right means we can try to put the last layer (classification layer) in different stage of the old network to find the best place to insert last layer

Chilfox

目錄

D-DL4CV-Lec11e-TransferLearning

Motivation

Feature Extraction

Fine-Tuning

Method

Tricks

When to use them?

關係圖譜

反向連結