Implementation of K-nearest Neighbor Classification Model

Background

Installing Panda, numpy, scipy, sklearn libraries

  • Go to terminal in Pycharm
  • Type pip install pandas
  • Or you can install all by going to Settings -> Project Interpreter -> +-> sklearn -> Install Package

image 18 - Python Learning Week 8

KNN- Classifier

In pattern recognition and machine learning, k-nearest neighbors (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g. distance). KNN is a non-parametric method where the input consists of the k closest training examples in the feature space. The output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

image 19 - Python Learning Week 8

 

KNN Classification Model

The algorithm for KNN classification model is given below:

Algorithm: KNearestNeighbors
Input: Training data X              
Training data labels Y              
Sample x to classify
Output: Decision yp about sample x  
for<-– 1 to m do   
Compute distance between training sample Xi and unlabeled sample x i.e. d(Xi, x) end for  
Compute set I containing the indices for the k smallest distances d(Xi, x)  
Compute the decision class yp by measuring the majority label Y from I
return yp

Example

image 20 - Python Learning Week 8

Fit does training on Training Data (X_train, Y_Train)

Predict does testing on Test data (X_test) and predicts outputs

 

Task 1

Write a program to implement KNN classifier and classify given vector. (for k = 3)

AgeLoanClass (Defaulter)
2540000N
3560000N
4580000N
2020000N
35120000N
5218000N
2395000Y
4062000Y
60100000Y
48220000Y
33150000Y
48142000?

 

 

Task 2

Modify Task 1  and perform it for all odd values of k from 1 to 10.

  • Find accuracy for each value of k and display.
  • Find the best k which gives highest value of accuracy
  • Also compute confusion matrix for the best value of k.

Task 3

Implement KNN algorithm yourself in python for Iris Dataset without using built-in KNN classifier library.

  1. Load dataset
  2. Split dataset into test and train sets
  3. Perform KNN algorithm to make predictions for k=5
  4. Compute accuracy and confusion matrix

Solution:

Check solution here Python Learning Week 8 (Solution)