Nearest Neighbor Classifier

Introduction

Training

In the training step we simply memorize all the training data and labels

The time complexity is $O(1)$

Predict

The algorithm finds the training image most similar to the input image by computing the distance between them

The time complexity is $O(N)$, since we compare the input with all $N$ training images

This complexity profile is backwards: we can afford slow training, but we need fast testing
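The train/predict split above can be sketched as a minimal NumPy implementation (the class and method names here are illustrative, not prescribed by the notes):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # Training is O(1): just memorize all data and labels.
        self.X_train = X  # shape (N, D), one flattened image per row
        self.y_train = y  # shape (N,)

    def predict(self, X):
        # Prediction is O(N) per query: compare against every training image.
        y_pred = np.empty(len(X), dtype=self.y_train.dtype)
        for i, x in enumerate(X):
            # L1 (Manhattan) distance from the query to every training image
            distances = np.abs(self.X_train - x).sum(axis=1)
            y_pred[i] = self.y_train[np.argmin(distances)]
        return y_pred
```

Note how all the work happens at prediction time, which is exactly the problem described above.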

Distance Metric to compare images

L1 (Manhattan) Distance

$$d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$$

where $p$ ranges over all pixels, and $I_1^p$, $I_2^p$ refer to the pixel (RGB) values of the two images
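A direct NumPy translation of the L1 distance (assuming the two images are equal-shape integer arrays; the cast avoids unsigned-overflow on subtraction):

```python
import numpy as np

def l1_distance(img1, img2):
    # Sum of absolute per-pixel differences.
    return np.abs(img1.astype(np.int64) - img2.astype(np.int64)).sum()
```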

L2 (Euclidean) Distance

$$d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}$$

The right choice of distance metric can not only improve your accuracy, but also allow you to apply the $k$-nearest neighbors learning algorithm to other types of data besides images
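The L2 distance translates to NumPy just as directly (again assuming equal-shape arrays):

```python
import numpy as np

def l2_distance(img1, img2):
    # Square root of the sum of squared per-pixel differences.
    diff = img1.astype(np.float64) - img2.astype(np.float64)
    return np.sqrt((diff ** 2).sum())
```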

Classification Regions

When the pixel values of an image fall in a certain range, its nearest neighbor will belong to a particular category. We call this range a classification region

Decision Boundaries

There will be boundaries between different classification regions. We call these boundaries the decision boundaries


$k$-Nearest Neighbors

Problem

Sometimes outliers in the training data create inappropriate classification regions, which cause classification errors. We need a way to reduce the influence of outliers

Solution

Instead of copying the label from the single nearest neighbor, we take a majority vote among the $k$ closest points

However, sometimes two or more labels receive the same number of votes. In this situation, we'll need some other rule to determine the label for the input
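The majority vote can be sketched as follows. The notes leave the tie-breaking rule open; this sketch breaks ties by preferring the label of the closer neighbor, which is one common choice, not the only one:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # L1 distances from the query to every training point.
    distances = np.abs(X_train - x).sum(axis=1)
    # Indices of the k nearest neighbors, closest first.
    nearest = np.argsort(distances)[:k]
    # Count votes among the k neighbors' labels.
    votes = Counter(y_train[i] for i in nearest)
    best_count = max(votes.values())
    tied = {label for label, c in votes.items() if c == best_count}
    # Tie-break: among tied labels, return the closest neighbor's label.
    for i in nearest:
        if y_train[i] in tied:
            return y_train[i]
```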


Universal Approximation

Introduction

As the number of training samples approaches infinity, the data becomes arbitrarily dense in the input space. This means that for any query point $x$, we can find training points arbitrarily close to it. Since the nearest neighbor classifier simply returns the function value of the closest training point, and that point can be made arbitrarily close to $x$, the prediction approaches the true function value

Problem: Curse of Dimensionality

As we increase the dimension of the space, the number of data points required to maintain the same density of coverage grows exponentially
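A quick worked example of this exponential growth: covering the unit cube [0, 1]^d with a regular grid at spacing 0.1 along each axis takes 10^d points (the density of 10 points per axis is just an illustrative choice):

```python
def grid_points_needed(points_per_axis, dim):
    # A regular grid covering [0, 1]^dim at a fixed per-axis density
    # needs points_per_axis ** dim samples in total.
    return points_per_axis ** dim

for d in (1, 2, 3, 10):
    print(d, grid_points_needed(10, d))
```

So the same per-axis coverage that takes 10 points in 1D already takes ten billion points in 10D.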