# What is the K-nearest neighbors algorithm?

The k-nearest-neighbors (k-NN) algorithm is a classification approach that makes no assumptions about the form of the relationship between the class membership (Y) and the predictors X_{1}, X_{2}, …, X_{n}.

This is a nonparametric method because it does not involve estimating parameters in an assumed functional form, such as the linear form assumed in linear regression. Instead, it draws information from similarities among the predictor values of the records in the dataset.

The idea behind k-nearest-neighbors methods is to identify the k records in the training dataset that are most similar to the new record we wish to classify. These similar (neighboring) records are then used to assign the new record to a class, namely the predominant class among the neighbors. The predictor values of the new record are denoted X_{1}, X_{2}, …, X_{n}.

A central question is how to measure the distance between records based on their predictor values. The best-known distance measure is the Euclidean distance. The Euclidean distance between two records (X_{1}, X_{2}, …, X_{n}) and (U_{1}, U_{2}, …, U_{n}) is

$$\mathrm{\sqrt{(X_1-U_1)^2+(X_2-U_2)^2+...+(X_n-U_n)^2}}$$

The k-NN algorithm relies on many distance computations (between each record to be predicted and each record in the training set), so the Euclidean distance, which is computationally inexpensive, is the most popular choice in k-NN.
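As a minimal sketch, the Euclidean distance formula above can be computed directly from two records of predictor values (the function name and example records are illustrative, not from the original text):

```python
import math

def euclidean_distance(x, u):
    """Euclidean distance between two records of predictor values:
    sqrt((X1-U1)^2 + (X2-U2)^2 + ... + (Xn-Un)^2)."""
    return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))

# Example with two 3-predictor records: sqrt(3^2 + 4^2 + 0^2) = 5.0
d = euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0))
```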

To balance the different scales that the predictors may have, in most cases the predictors should be standardized before computing a Euclidean distance. The means and standard deviations used to standardize a new record are those of the training data; the new record itself is not involved in calculating them. The validation data, like new data, are also excluded from this calculation.

After computing the distances between the record to be classified and the existing records, we need a rule for assigning a class to the new record based on the classes of its neighbors.
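The usual rule is a majority vote among the k nearest training records, as described above. The sketch below assumes a small training set of (predictors, label) pairs, invented here for illustration:

```python
import math
from collections import Counter

def knn_classify(train, new_record, k=3):
    """Assign new_record to the predominant class among its k nearest
    training records, using Euclidean distance on the predictors.
    `train` is a list of (predictor_tuple, class_label) pairs."""
    def dist(x, u):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, u)))
    neighbors = sorted(train, key=lambda rec: dist(rec[0], new_record))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-class training set.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((5.5, 4.5), "B")]

# The new record sits near the "A" cluster, so the vote goes to "A".
label = knn_classify(train, (1.1, 0.9), k=3)
```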

The simplest case is k = 1, where we look for the single closest record (the nearest neighbor) and classify the new record as belonging to the same class as that neighbor.

It is a remarkable fact that this simple, intuitive idea of using a single nearest neighbor to classify records can perform well when the training set contains many records. It turns out that the misclassification rate of the 1-nearest-neighbor scheme is no more than twice the error rate that would be achieved if the probability density functions of each class were known exactly.
