Machine Learning - CountVectorizer (analyzer)

Count Vectorizer

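Once the meaningless punctuation and the meaningless words (stopwords) have been removed,

the words that remain have to be converted into numbers.

Converting words into numbers is called vectorizing. CountVectorizer does this by counting how many times each word appears in each document.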

from sklearn.feature_extraction.text import CountVectorizer

sample_data = ['This is the first document', 'I loved them', 'This document is the second document', 'I am loving you', 'And this is the third one']
vec = CountVectorizer()
X = vec.fit_transform(sample_data)
X = X.toarray()
X

vec.get_feature_names_out()

Analyzer

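If we pass the function we wrote earlier, which removes punctuation and stopwords,

to the analyzer parameter of CountVectorizer,

CountVectorizer first cleans each string with that function and then converts the result into numbers.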

vec = CountVectorizer(analyzer=message_cleaning)
X = vec.fit_transform( spam_df['text'] )
X = X.toarray()
vec.get_feature_names_out()

vec = CountVectorizer(analyzer=message_cleaning)

This means: apply message_cleaning first, before converting the text into numbers.

X = vec.fit_transform( spam_df['text'] )

fit_transform takes the words found in the text,

sorts them, makes each word a column, and converts every document into a row of counts for those columns.
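The two halves of fit_transform can also be run separately, which may make the description above easier to follow. This is a sketch on made-up sentences; `docs`, `vocab`, and `new_counts` are illustrative names only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["spam spam offer", "hello friend"]

vec = CountVectorizer()

# fit: collect the distinct words and assign each one a column index
vec.fit(docs)
vocab = vec.vocabulary_
print(vocab)

# transform: count the words of each document into those columns
X_two_step = vec.transform(docs).toarray()

# fit_transform does both steps in one call
X_one_step = CountVectorizer().fit_transform(docs).toarray()
same = np.array_equal(X_two_step, X_one_step)

# New text is counted against the learned columns; unseen words are ignored
new_counts = vec.transform(["spam from a new friend"]).toarray()
print(new_counts)
```

Keeping the fitted `vec` around matters later, because new messages have to be transformed with the same columns that the training data produced.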
