Machine Learning - KNN(K-Nearest Neighbor)

KNN (K-Nearest Neighbor)

Suppose we have data whose categories are already labeled, as shown here.

When a new data point appears, which category should it be assigned to?

Why was it classified as red?

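The KNN algorithm

Decide how many nearby neighbors to check. That number is K.

When a new data point arrives, pick the K neighbors closest to it by Euclidean distance.

Check the categories of those K neighbors.

Assign the new data point to the category that appears most often among them.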
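The steps above can be sketched from scratch in a few lines of NumPy. This is only a minimal illustration, not how scikit-learn implements it: the function name `knn_predict` and the toy points and labels are made up for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify one point by majority vote among its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Count the categories of those neighbors and return the most common one
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two small, well separated groups
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                  [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_toy = np.array(["red", "red", "red", "blue", "blue", "blue"])

print(knn_predict(X_toy, y_toy, [2.0, 2.0], k=3))
print(knn_predict(X_toy, y_toy, [6.0, 6.5], k=3))
```

A point near the first group has three red neighbors and is labeled red; a point near the second group is labeled blue. With an even K, ties are possible, which is one reason odd values of K are often chosen for two-class problems.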

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Splitting into X and y

# df is the DataFrame holding the dataset (its loading step is not shown in this post)
X = df.loc[:, 'Age':'EstimatedSalary']
y = df['Purchased']

Feature Scaling

from sklearn.preprocessing import StandardScaler, MinMaxScaler

scaler = StandardScaler()
X = scaler.fit_transform(X)
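Why scale at all? KNN relies on Euclidean distance, so a column with large numbers (such as a salary) can drown out a column with small numbers (such as an age). The sketch below shows this on three made-up rows; the values and the helper `euclidean` are assumptions for illustration, not taken from the dataset used in this post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [age, salary]
X_raw = np.array([[25.0, 50000.0],   # row 0
                  [55.0, 51000.0],   # row 1: very different age, similar salary
                  [26.0, 80000.0]])  # row 2: similar age, very different salary


def euclidean(a, b):
    """Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(((a - b) ** 2).sum()))


# Before scaling, the salary column dominates the distance
raw_d01 = euclidean(X_raw[0], X_raw[1])
raw_d02 = euclidean(X_raw[0], X_raw[2])
print("raw:   ", raw_d01, raw_d02)

# After standardization, both columns contribute on a comparable scale
X_scaled = StandardScaler().fit_transform(X_raw)
scaled_d01 = euclidean(X_scaled[0], X_scaled[1])
scaled_d02 = euclidean(X_scaled[0], X_scaled[2])
print("scaled:", scaled_d01, scaled_d02)
```

On the raw values, row 1 looks far closer to row 0 than row 2 does, even though it is 30 years older, simply because salaries are measured in much larger numbers. After scaling, the two distances come out about the same, so an age gap and a salary gap of similar relative size count equally.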

Splitting into training and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)
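The post scales the whole of X before splitting. A commonly used variation, shown here only as an alternative, is to split first and fit the scaler on the training rows alone, so that no statistics from the test rows leak into preprocessing. The data below is synthetic and the names `X_demo`, `y_demo`, and `scaler_demo` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an age / salary table (assumption, not the post's data)
rng = np.random.default_rng(0)
X_demo = np.column_stack([rng.uniform(18, 60, 100),
                          rng.uniform(15000, 150000, 100)])
y_demo = (X_demo[:, 0] > 40).astype(int)

# Split first ...
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=3)

# ... then learn the mean and standard deviation from the training rows only
scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)
# The test rows are transformed with the training statistics, not refitted
X_te_scaled = scaler_demo.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

With `test_size=0.2` and 100 rows, 80 rows go to training and 20 to testing. The scaled training columns have mean 0, while the scaled test columns are usually only close to 0, which is expected.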

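Modeling with KNN

from sklearn.neighbors import KNeighborsClassifier

# With no arguments, scikit-learn uses n_neighbors=5 (K = 5)
classifier = KNeighborsClassifier()
classifier.fit(X_train, y_train)

# Predict the category of each test row
y_pred = classifier.predict(X_test)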
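The model above uses the default K. One way to choose K more deliberately is to compare several values with cross-validation. The sketch below is an assumption-laden illustration: the data is synthetic, the labeling rule is invented, and the candidate list of K values is arbitrary.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): label depends on age and salary
rng = np.random.default_rng(3)
n = 200
age = rng.uniform(18, 60, n)
salary = rng.uniform(15000, 150000, n)
X_demo = np.column_stack([age, salary])
y_demo = ((age > 40) | (salary > 90000)).astype(int)

scores_by_k = {}
for k in [1, 3, 5, 7, 9, 11]:
    # The pipeline refits the scaler inside every cross-validation fold
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    # Mean accuracy over 5 folds
    scores_by_k[k] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

best_k = max(scores_by_k, key=scores_by_k.get)
print(scores_by_k)
print("best K:", best_k)
```

A small K follows the training data closely and can be noisy; a large K smooths the decision boundary and can blur real structure. The best value depends on the dataset, so the result here says nothing about the data used in this post.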

Confusion matrix

from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

cm = confusion_matrix(y_test, y_pred)
cm

accuracy_score(y_test, y_pred)

print(classification_report(y_test, y_pred))
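To see how these numbers relate, the sketch below rebuilds accuracy, precision, and recall by hand from a confusion matrix. The label arrays are made up for illustration, and the example assumes a two-class problem with labels 0 and 1.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical true labels and predictions
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Rows are true classes, columns are predicted classes
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm_demo.ravel()

accuracy = (tp + tn) / cm_demo.sum()   # correct predictions / all predictions
precision = tp / (tp + fp)             # of predicted 1s, how many were truly 1
recall = tp / (tp + fn)                # of true 1s, how many were found

print(cm_demo)
print(accuracy, precision, recall)

# The hand-computed values match scikit-learn's helpers
print(accuracy_score(y_true_demo, y_pred_demo),
      precision_score(y_true_demo, y_pred_demo),
      recall_score(y_true_demo, y_pred_demo))
```

Here the matrix is `[[5, 1], [1, 3]]`, giving an accuracy of 0.8 and a precision and recall of 0.75 each for class 1. These are the per-class figures that `classification_report` prints.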

import seaborn as sb

plt.figure(figsize=(10,8))
# Cells hold integer counts, so format the annotations as integers
sb.heatmap(data=cm, cmap='RdPu', annot=True, fmt='d', linewidths=0.5)
plt.show()
