Implementation of K-nearest Neighbor Classification Model
Background
Installing Panda, numpy, scipy, sklearn libraries
- Go to terminal in Pycharm
- Type pip install pandas
- Or you can install all by going to Settings -> Project Interpreter -> +-> sklearn -> Install Package
KNN- Classifier
In pattern recognition and machine learning, k-nearest neighbors (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g. distance). KNN is a non-parametric method where the input consists of the k closest training examples in the feature space. The output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
KNN Classification Model
The algorithm for KNN classification model is given below:
Algorithm: KNearestNeighbors |
Input: Training data X Training data labels Y Sample x to classify Output: Decision yp about sample x fori <-– 1 to m do Compute distance between training sample Xi and unlabeled sample x i.e. d(Xi, x) end for Compute set I containing the indices for the k smallest distances d(Xi, x) Compute the decision class yp by measuring the majority label Y from I return yp |
Example
Fit does training on Training Data (X_train, Y_Train)
Predict does testing on Test data (X_test) and predicts outputs
Task 1
Write a program to implement KNN classifier and classify given vector. (for k = 3)
Age | Loan | Class (Defaulter) |
25 | 40000 | N |
35 | 60000 | N |
45 | 80000 | N |
20 | 20000 | N |
35 | 120000 | N |
52 | 18000 | N |
23 | 95000 | Y |
40 | 62000 | Y |
60 | 100000 | Y |
48 | 220000 | Y |
33 | 150000 | Y |
48 | 142000 | ? |
Task 2
Modify Task 1 and perform it for all odd values of k from 1 to 10.
- Find accuracy for each value of k and display.
- Find the best k which gives highest value of accuracy
- Also compute confusion matrix for the best value of k.
Task 3
Implement KNN algorithm yourself in python for Iris Dataset without using built-in KNN classifier library.
- Load dataset
- Split dataset into test and train sets
- Perform KNN algorithm to make predictions for k=5
- Compute accuracy and confusion matrix
Solution:
Check solution here Python Learning Week 8 (Solution)