Single-Frame CNN

Idea

We treat each frame of the video as an independent image. We train a 2D CNN on these frames, run it on every frame of a video at test time, and average the per-frame predictions to get the final answer for the whole video.
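The averaging step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `predict_video` and `toy_classifier` are hypothetical names, a frame is represented as a flat list of pixel intensities in [0, 1], and `toy_classifier` is a stand-in for a trained 2D CNN that returns one probability per class.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]  # assumption: a frame is a flat list of pixel values in [0, 1]


def predict_video(
    frames: Sequence[Frame],
    frame_classifier: Callable[[Frame], List[float]],
) -> Tuple[int, List[float]]:
    """Classify each frame independently, then average the class probabilities."""
    if not frames:
        raise ValueError("video has no frames")
    # One probability vector per frame; no frame sees any other frame.
    per_frame = [frame_classifier(frame) for frame in frames]
    num_classes = len(per_frame[0])
    # Average each class probability over all frames.
    averaged = [
        sum(probs[c] for probs in per_frame) / len(per_frame)
        for c in range(num_classes)
    ]
    # The video-level label is the class with the highest averaged probability.
    label = max(range(num_classes), key=lambda c: averaged[c])
    return label, averaged


def toy_classifier(frame: Frame) -> List[float]:
    """Hypothetical stand-in for a trained 2D CNN: two classes, scored by mean brightness."""
    brightness = sum(frame) / len(frame)
    return [1.0 - brightness, brightness]


# A three-frame "video": one dark frame followed by two bright frames.
video = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
label, averaged = predict_video(video, toy_classifier)
```

In a real setup the classifier would be a trained network and the frames would be image tensors, but the aggregation logic stays the same: one forward pass per frame, then a mean over frames.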

Problem

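This method completely ignores the temporal dimension of the input video: each frame is classified on its own, so the order of the frames and the motion between them play no role in the prediction.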
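One way to see this is that a mean over frames does not depend on frame order. The small sketch below uses made-up per-frame scores (not real model outputs) and a hypothetical helper `average_scores` to show that playing the video backwards or shuffling its frames leaves the averaged prediction unchanged.

```python
import math
import random
from typing import List, Sequence


def average_scores(per_frame_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame class scores into a single video-level score vector."""
    num_frames = len(per_frame_scores)
    num_classes = len(per_frame_scores[0])
    return [
        sum(scores[c] for scores in per_frame_scores) / num_frames
        for c in range(num_classes)
    ]


# Made-up per-frame scores for a 4-frame, 2-class video (illustrative only).
scores = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]

# The same frames played backwards, and in a random order.
reversed_scores = scores[::-1]
shuffled_scores = scores[:]
random.Random(0).shuffle(shuffled_scores)

original_avg = average_scores(scores)
reversed_avg = average_scores(reversed_scores)
shuffled_avg = average_scores(shuffled_scores)

# Up to floating-point rounding, all three averages are identical.
order_is_ignored = all(
    math.isclose(a, b) and math.isclose(a, c)
    for a, b, c in zip(original_avg, reversed_avg, shuffled_avg)
)
```

As a consequence, two actions that differ only in direction of motion (for example, a clip and the same clip reversed) receive the same averaged score under this scheme.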

In practice, a single-frame CNN still works pretty well on video classification, which makes it a strong baseline for the task.