Training Dynamics

Learning Rate Schedule

This section introduces ways to adjust the learning rate during training.
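As a rough illustration, two commonly used schedules are step decay and cosine decay. The sketch below is not taken from the lecture; the function names, the decay factor, and the choice to decay all the way to zero are assumptions made for the example.

```python
import math

def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

def step_lr(step, base_lr, drop_every, factor=0.1):
    """Multiply the learning rate by `factor` once every `drop_every` steps."""
    return base_lr * factor ** (step // drop_every)

# Print both schedules at a few points of a hypothetical 100-step run.
for s in (0, 25, 50, 75, 100):
    print(s, cosine_lr(s, 100, 0.1), step_lr(s, 0.1, 40))
```

In a training loop, the returned value would be used as the optimizer's learning rate for that step.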

Choosing Hyperparameters

This section introduces grid search and random search, two ways of deciding which sets of hyperparameters to try.
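The difference between the two can be sketched as follows. The hyperparameter names (`lr`, `weight_decay`), their ranges, and the log-uniform sampling are assumptions chosen for illustration, not values from the lecture.

```python
import itertools
import math
import random

def grid_candidates(lrs, weight_decays):
    """Grid search: every combination of the listed values."""
    return [{"lr": lr, "weight_decay": wd}
            for lr, wd in itertools.product(lrs, weight_decays)]

def random_candidates(n, lr_range, wd_range, seed=0):
    """Random search: n configurations, each value drawn log-uniformly
    from its (low, high) range."""
    rng = random.Random(seed)

    def log_uniform(low, high):
        return 10 ** rng.uniform(math.log10(low), math.log10(high))

    return [{"lr": log_uniform(*lr_range),
             "weight_decay": log_uniform(*wd_range)}
            for _ in range(n)]

# Same budget of 9 trials for both strategies.
grid = grid_candidates([1e-3, 1e-2, 1e-1], [1e-5, 1e-4, 1e-3])
rand = random_candidates(9, (1e-4, 1e-1), (1e-6, 1e-3))
```

With the same budget, the grid tries only three distinct values per hyperparameter, while random search tries nine. Each candidate would then be trained and compared on a validation set.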

Choose Hyperparameters without tons of GPUs

Most of the time GPU resources are limited, so we need a smart way to find suitable hyperparameters in as little time as possible. In this section, the lecturer introduces his approach to finding hyperparameters.

After Training

Model Ensembles

Sometimes we can train multiple models and average their results to make better predictions. This section introduces different ways to accomplish this.
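A minimal sketch of the averaging step, assuming each model outputs a list of class probabilities for the same input. The helper names and the example numbers are made up for illustration.

```python
def ensemble_average(prob_lists):
    """Average per-class probabilities across models for one input."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models
            for c in range(n_classes)]

def predict(probs):
    """Index of the class with the highest probability."""
    return max(range(len(probs)), key=lambda c: probs[c])

# Hypothetical outputs of three models on one two-class input.
model_outputs = [
    [0.6, 0.4],
    [0.4, 0.6],
    [0.3, 0.7],
]
averaged = ensemble_average(model_outputs)
print(averaged, predict(averaged))
```

Here the first model alone would pick class 0, while the averaged prediction picks class 1.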

Transfer Learning

We don’t need to train a new model from scratch for every new dataset. Instead, we can take a model trained on a similar dataset and fine-tune it to fit the new one.
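One simple variant of this idea is to freeze the pretrained part of the model and train only a new output layer on the new dataset. The toy sketch below assumes that variant. `pretrained_features` is a hypothetical stand-in for a real pretrained network, and the dataset is invented for the example.

```python
import math

def pretrained_features(x):
    """Hypothetical stand-in for a frozen, pretrained feature extractor."""
    return [x[0] + x[1], x[0] - x[1]]

def train_head(data, labels, lr=0.5, epochs=200):
    """Fit a new logistic-regression head on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    feats = [pretrained_features(x) for x in data]  # extractor is never updated
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [w[0] - lr * g * f[0], w[1] - lr * g * f[1]]
            b -= lr * g
    return w, b

def predict(x, w, b):
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# Tiny invented "new dataset": label is 1 when x[0] + x[1] > 0.
data = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0), (-0.5, -2.0)]
labels = [1, 1, 0, 0]
w, b = train_head(data, labels)
```

Full fine-tuning would also update the extractor's weights, usually with a smaller learning rate; that step is omitted here.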

Distributed Training

When many GPUs are available, how do we distribute the work among them? This section introduces ways to distribute tasks across multiple GPUs to reduce training time.
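One common strategy is data parallelism: each GPU receives a slice of the batch, computes gradients on its slice, and the gradients are averaged before the weights are updated. The sketch below simulates this on the CPU with a one-parameter linear model. The worker loop stands in for GPUs running in parallel, and all names are assumptions made for the example.

```python
def gradient(w, shard):
    """Gradient of the mean squared error for y ~ w * x over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, batch, n_workers, lr):
    """Split the batch evenly, compute one gradient per worker,
    average the gradients, and apply a single update.
    Assumes len(batch) is divisible by n_workers."""
    size = len(batch) // n_workers
    shards = [batch[i * size:(i + 1) * size] for i in range(n_workers)]
    grads = [gradient(w, s) for s in shards]  # one GPU per shard in practice
    avg_grad = sum(grads) / n_workers         # stands in for an all-reduce
    return w - lr * avg_grad

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_parallel = data_parallel_step(0.0, batch, n_workers=2, lr=0.01)
w_single = data_parallel_step(0.0, batch, n_workers=1, lr=0.01)
print(w_parallel, w_single)
```

With equal-sized shards, the averaged gradient matches the full-batch gradient, so the update is the same as on a single device while each worker processes only part of the batch.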