Hello everyone, welcome to my channel. In this particular video we will talk about one of the popular machine learning algorithms called KNN, K nearest neighbors, where K represents a number from 1 to n, and the algorithm means that I am as good as my K neighbors, ok.

So by the end of this particular video you will understand what KNN is and why you should make use of it. Now before I go ahead, just a reminder: please subscribe to my YouTube channel and press the bell icon. You can also follow me on social media; I have provided the link in the description, ok.

So let’s start. Now KNN, or K nearest neighbors, is a supervised learning algorithm. You know that there are three types of machine learning, called supervised learning, unsupervised learning and reinforcement learning. Reinforcement learning is not the same thing as semi-supervised learning, which mixes labeled and unlabeled data. Now supervised learning is of two types, classification and regression. Linear regression is the regression type and KNN is the classification type. Now just remember that KNN can also be used for regression, but in this particular video we will talk only about classification, because that’s the primary area where KNN is used. So let’s go ahead and understand it.

Now to understand this particular algorithm let’s come up with a problem. Here is the problem statement: let’s assume that a group of people are traveling to London for a marathon.

Ok, now let’s also assume that we have the data of the participants up to last year. Now the problem states that, based on the distance they traveled, we guess the continent they are arriving from. The problem is simple: we get the number of kilometers or miles they traveled and we need to guess the continent they are coming from. Now here is how our data looks, which we have collected up to last year. We have collected data from 2,000 kilometers to 10,000-plus kilometers in the east, west, north and south directions, ok. So this graph is self-explanatory.

Ok, now let’s move further and try to classify the people based on the continents. What we can do is safely assume that people from 2,000 to 5,000 kilometers on the east side are coming from Europe. Now one important thing: when I am talking about east of London, east means that if you open Google Maps, the right side is east and the left side is west. This is how we can relate this. Again, one disclaimer: this is totally imaginary data, it is only for learning purposes, and it may not accurately represent the continents. So, 2,000 to 5,000 kilometers on the east side is Europe, beyond that Asia, beyond that Australia. On the west side it will be either North America or South America. To the north and south, within 2,000 kilometers it will still be Europe, and to the south, from 8,000 to 10,000 kilometers and beyond, it is Africa. So this is the data which we have.

Now let’s try to solve our problem. What happens if somebody comes and asks me, please tell me my continent? This is the problem we need to solve. So let’s first try K equal to one.

Remember, in the beginning I told you that K is a number, the number of neighbors. K equal to one means find the single nearest data point, that is, the person whose distance traveled matches most closely. From the visual reference you can see that the nearest person to this person is in Africa. So it looks like things are simple: the result is that this person is from Africa, or traveling from Africa.

Now what happens when we change K to two? In this case we need to consider the two nearest data points. So let’s consider the two nearest data points to this person. The first data point is in Africa, and visually you can see that the second nearest person is also in Africa. No issue: both neighbors are from Africa, which means that this person is also traveling from Africa. That’s the result. Now, with K equal to two, consider this person.

This person wants you to tell which continent he is traveling from, but there is a catch. This person is equidistant from the two nearest people, and they are in different continents. You can see that the person in Africa is at the same distance as the person in North or South America. So what will be your choice? What will your algorithm say: is this person traveling from Africa, or from North or South America? Well, there is no clear answer. You can guess it, or you can generate some random number or use some other tie-breaking rule to get a value which represents either Africa or North or South America. Now the problem is that even numbers with two options are bad, because in that case there is no clear decision point. So you can think, okay, let’s not use even numbers, and that’s a good assumption. With even numbers you can always run into this problem: if you are getting two against two, three against three or four against four votes, what will you decide, ok?
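The tie problem described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the neighbors' continents are already known; the function name `majority_vote` and the label strings are made up for this example and do not come from any library.

```python
from collections import Counter


def majority_vote(neighbor_labels):
    """Return the label with the most votes, or None when the top labels tie."""
    if not neighbor_labels:
        raise ValueError("need at least one neighbor label")
    counts = Counter(neighbor_labels).most_common()
    # A tie means the two most common labels have the same number of votes.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Two neighbors that agree: a clear answer.
print(majority_vote(["Africa", "Africa"]))
# Two neighbors that disagree: one vote each, so no clear answer.
print(majority_vote(["Africa", "America"]))
# Two against two: still no clear answer.
print(majority_vote(["Africa", "America", "Africa", "America"]))
```

Returning `None` here is just one way to signal "no clear answer"; picking at random, as mentioned above, would be another choice.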

So let’s not use even numbers for now, ok. Now let’s try to solve this problem by using K equal to 3, which means that we have to consider the three nearest data points. From this particular picture, if you try to find the third nearest person, that third nearest person also resides in North or South America. So in this particular case, with the voting, the majority wins, and that’s how KNN works: if there is more than one option, the voting decides which particular class the point belongs to. In this particular case the person belongs to North or South America. So now you understand how KNN works and why you should not use even numbers; you can use odd numbers instead.

Let’s see one more example. In this case it is North or South America, but what if some person like this asks you to find his or her continent? Now the problem is that this person is equidistant from three neighbors, and all three of them are in different continents: one is in Africa, one is in North or South America, one is in Europe. Now we are back to square one; you got into the same issue even though we used an odd number. If you are getting into this kind of issue, that means your K value is not optimized. You have to optimize your K value based on your data set, and this is what you need to remember when you decide your K value. People generally select K arbitrarily, 1, 2, 5, 10, 15, 20, but you have to understand that if your data points are getting into this kind of situation then your K value is not correct, okay. Now, KNN is easy to implement, okay.
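To give a feel for how small the algorithm is, here is a rough sketch of KNN classification in plain Python. Everything in it is an assumption for illustration: the data points are invented (kilometers east and north of London, negative meaning west or south), the names `TRAINING` and `knn_classify` are hypothetical, and Euclidean distance is used simply because it is the common default.

```python
import math
from collections import Counter

# Invented training data: (km east of London, km north of London) -> continent.
# Negative numbers mean west or south. Not real geography.
TRAINING = [
    ((3000.0, 0.0), "Europe"),
    ((4000.0, 500.0), "Europe"),
    ((7000.0, 0.0), "Asia"),
    ((8000.0, 1000.0), "Asia"),
    ((-6000.0, 0.0), "America"),
    ((-8000.0, -3000.0), "America"),
    ((0.0, -8000.0), "Africa"),
    ((500.0, -9000.0), "Africa"),
]


def knn_classify(point, training, k):
    """Vote among the k nearest training points; return None on a tied vote."""
    # Sort all training points by Euclidean distance and keep the k closest.
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return None  # no majority, K is a poor fit for this point
    return votes[0][0]


# A traveler close to the African data points.
print(knn_classify((200.0, -8500.0), TRAINING, k=1))    # Africa
# A traveler exactly 5000 km from one American and one African point.
print(knn_classify((-3000.0, -4000.0), TRAINING, k=2))  # None (tie)
print(knn_classify((-3000.0, -4000.0), TRAINING, k=3))  # America
```

Note that the sketch sorts the whole data set for every question, which is fine for eight points but is exactly why a real library is a better choice for real data.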

I would not recommend that you implement it yourself if you really want to make use of your data to find valuable information; for that, use a machine learning library such as scikit-learn, or any other library. But if you just want to try out your programming skills, this is the best algorithm to start with. If you are just learning and want to brush up your skills and see how far your programming knowledge goes, just think about implementing this algorithm. The other thing is that it takes more memory, because the whole training data set has to be loaded into memory, so there is a limit beyond which KNN becomes impractical. And you know we talked about the distance between two data points to decide which is the nearest neighbor. In KNN we can use multiple ways of calculating the distance; some of them are Euclidean distance, Hamming distance, Manhattan distance and Minkowski distance. In general people use Euclidean distance, but you should try each one of them to see where you get the best results.

So thanks a lot guys, this is what KNN is about, and I hope I was able to explain the idea behind the KNN algorithm in the best possible way. Please subscribe for programming, machine learning and cloud computing videos. Thanks a lot, thanks for watching, good day.
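As a small follow-up, the four distance measures named earlier can be written in plain Python roughly like this. The function names are chosen only for this example, and the sample points are made up; it is a sketch of the standard definitions, not the code of any particular library.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """Generalization: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


london = (0.0, 0.0)
traveler = (3.0, 4.0)  # made-up offsets, in thousands of km
print(euclidean(london, traveler))       # 5.0
print(manhattan(london, traveler))       # 7.0
print(minkowski(london, traveler, p=2))  # 5.0
print(hamming("karolin", "kathrin"))     # 3
```

Hamming distance compares categories or characters position by position, so it suits non-numeric features, while the other three work on numeric coordinates like the distances in the marathon example.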