[NLP] Day 35 - Topic Clustering

1. Topic Clustering with LSA

Building on the previous material, we apply LSA to news data.

In [1]:
import os 
corpus=[]
for file in os.listdir('./News'):
        if file.startswith('정치'):
            with open('./News/'+file,encoding='utf-8') as f:
                corpus.append([file,f.read()])
In [2]:
len(corpus)
Out[2]:
40
In [3]:
corpus[0][1]
Out[3]:
"\n\n\n\n\n// flash 오류를 우회하기 위한 함수 추가\nfunction _flash_removeCallback() {}\n\n  김부겸 행정안전부장관이 14일 서울 여의도 국회 행정안전위원회에 출석해 얼굴을 어루만지고 있다. [뉴스1]           김부겸 행정안전부 장관이 정부의 개각 인사 발표 방식에 대해 “늘 하던 방식이 아닌 출신고별로 발표하는 발상은 누가 했는지 모르지만, 상당히 치졸하다고 생각한다”며 비판적 태도를 보였다.        김 장관은 14일 국회에서 열린 행정안전위원회 업무보고 오후 질의에서 윤재옥 자유한국당 의원의 질문에 이같이 답했다. 이날 질의는 사실상 자신의 마지막 국회 업무보고다.      윤 의원은 “장관 일곱 분 개각이 됐는데 TK(대구ㆍ경북) 출신은 한 명도 없다”며 “정략적으로 고립화한다는 지역 여론이 있다”고 했다. 또 “출신 지역을 숨기고 출신고를 발표했는데 그 결과 호남 출신은 한 명도 없는 것으로 나왔으나 실제로는 4명이었다”며 “특정 지역이 소외감을 느끼는 불균형 인사는 빨리 시정돼야 한다. 국회로 돌아오면 목소리를 같이 내 달라”고 질의했다.      이에 김 장관은 “대한민국에서 인사를 하면 늘 그런 식으로 평가가 엇갈리게 마련이지만, 그런 측면이 있더라도 한 국가의 인사에 그런 잣대를 들이대는 것은 지나치다”고 답했다. 이에 김 장관은 ‘출신고 기준’ 발표 방식이 치졸하다면서 “앞으로는 제가 국회로 돌아가서 그런 문제에 앞장서겠다”고 말했다.      앞서 지난 8일 문재인 대통령은 진영 의원을 새 행안부 장관에 내정했다. 당시 청와대는 개각 명단을 발표하면서 이번에 처음으로 출신지를 제외하고 출생연도와 출신 고교ㆍ대학 등 주요 학력과 경력만을 공개했다.      문재인 대통령이 지난 8일 7개 부처에 대한 중폭 개각을 단행했다. 왼쪽 위부터 시계방향으로 중소벤처기업부장관에 내정된 박영선 더불어민주당 의원, 행안부장관에 내정된 진영 더불어민주당 의원, 통일부장관에 내정된 김연철 통일연구원장, 국토부장관에 내정된 최정호 전 국토부 2차관, 과기부장관에 내정된 조동호 카이스트 교수, 해수부장관에 내정된 문성혁 세계해사대교수, 문체부장관에 내정된 박양우 전 문화관광부 차관. [사진 청와대]           장관 후보자 중 서울 지역 고등학교 졸업자는 조동호 과학기술정보통신부 장관 후보자(서울 배문고), 진영 행정안전부 장관 후보자(서울 경기고), 문성혁 해양수산부 장관 후보자(서울 대신고), 박영선 중소벤처기업부 장관 후보자(서울 수도여고) 등 4명이다. 김연철 통일부 장관 후보자는 강원 북평고, 박양우 문화체육관광부 장관 후보자는 인천 제물포고, 최정호 국토교통부 장관 후보자는 경북 금오공고를 나왔다. 고등학교 기준으로 하면 서울 4명, 인천 1명, 경북 1명, 강원 1명의 분포다.        그러나 종전의 출생지 기준으로 재분류를 하면 전북이 3명(진영ㆍ조동호ㆍ최정호)이고 광주 1명(박양우), 부산 1명(문성혁), 경남 1명(박영선), 강원 1명(김연철)의 분포가 된다. 청와대 발표에는 안 보이던 호남 출신이 4명이다.        당시 청와대는 “지연 중심 문화를 탈피해야 한다는 데 사회의 공감대가 있다”면서 “출신지라는 게 객관적이지도 않아서 그곳에서 태어나 오랫동안 성장한 사람이 있는가 하면 출생만 하고 성장은 다른 곳에서 해온 분들도 있다. 불필요한 논란을 끌지 않기 위해 이번에 고등학교 중심으로 발표했다”고 설명했다.        한영혜 기자 han.younghye@joongang.co.kr  ▶ 중앙일보 '홈페이지' / '페이스북' 친구추가▶ 네이버 구독 1위 신문, 중앙일보ⓒ중앙일보(https://joongang.co.kr), 무단 전재 및 재배포 금지\n\t\n"
In [4]:
from string import punctuation
# punc = list(“,”)
punc = ['“','”']
punc
for i in punc:
    punctuation+=i
    
punctuation
Out[4]:
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~“”'
In [5]:
corpus2 = list()
for i in range(len(corpus)):
    corpus[i][1] = corpus[i][1].translate(str.maketrans('', '', punctuation))
    corpus[i][1] = corpus[i][1].replace('flash 오류를 우회하기 위한 함수 추가\nfunction flashremoveCallback', '')
    corpus2.append((corpus[i][0],corpus[i][1]))
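
The cleaning step above can be checked on a short string. This is a minimal sketch, assuming a made-up `sample` string (not taken from the corpus); it shows why the later `replace` call looks for `flashremoveCallback` without underscores or brackets: `translate` has already deleted them.

```python
from string import punctuation

# ASCII punctuation plus the curly quotes that appear in the articles.
punct = punctuation + '“”'
table = str.maketrans('', '', punct)

# Hypothetical sample that mimics the script residue at the top of each file.
sample = '// flash 오류\nfunction _flash_removeCallback() {}\n“테스트”, 문장!'
cleaned = sample.translate(table)
print(repr(cleaned))
```

Because the punctuation is removed first, any pattern passed to `replace` afterwards has to be written in its punctuation-free form.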
In [7]:
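## Build the DTM from nouns only
from collections import defaultdict
from konlpy.tag import Kkma

DTM = defaultdict(lambda: defaultdict(int))
dictNoun = list()
kkma = Kkma()  # create the tagger once instead of once per document

for i in range(len(corpus2)):
    for t in kkma.pos(corpus2[i][1]):
        # keep tokens longer than one character whose POS tag starts with 'N' (nouns)
        if len(t[0]) >1 and t[1].startswith('N'):
            DTM[i][t[0]] += 1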
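
The noun filter can be tried without running the tagger. The sketch below assumes hand-written `(token, tag)` pairs in the style of Kkma output; `tagged_docs` and its tags are hypothetical, not real tagger results.

```python
from collections import defaultdict

# Hypothetical pre-tagged documents: lists of (token, POS tag) pairs.
tagged_docs = [
    [('장관', 'NNG'), ('이', 'JKS'), ('국회', 'NNG'), ('장관', 'NNG'), ('것', 'NNB')],
    [('국회', 'NNG'), ('에서', 'JKM'), ('질의', 'NNG')],
]

dtm = defaultdict(lambda: defaultdict(int))
for i, tagged in enumerate(tagged_docs):
    for token, tag in tagged:
        # Same rule as above: longer than one character and a noun tag.
        if len(token) > 1 and tag.startswith('N'):
            dtm[i][token] += 1

print({i: dict(terms) for i, terms in dtm.items()})
```

The one-character noun `것` is dropped by the length check, and the particles are dropped by the tag check.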
            
In [256]:
# DTM
In [9]:
from math import log10

def rawTF(freq):
    return freq

def normTF(freq,totalCount):
    return (freq / totalCount)

def logTF(freq):
    if freq > 0:
        return 1 + log10(freq)
    else:
        return 0

def maxTF(a,freq,maxFreq):   # double normalization K -  doc : 0 / query : 0.5
    return a + ((1-a)* (freq/maxFreq))
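
To see how these weighting schemes differ, here is a small worked example. It assumes made-up term counts (`counts` is hypothetical) and re-declares two of the functions under snake_case names so the block runs on its own.

```python
from math import log10

def log_tf(freq):
    # 1 + log10(freq) for positive counts, 0 otherwise.
    return 1 + log10(freq) if freq > 0 else 0

def max_tf(a, freq, max_freq):
    # Double normalization: a = 0 for documents, a = 0.5 for queries.
    return a + (1 - a) * (freq / max_freq)

counts = {'장관': 10, '국회': 5, '개각': 1}  # hypothetical counts for one document
max_freq = max(counts.values())

weights_doc = {t: max_tf(0, f, max_freq) for t, f in counts.items()}
weights_query = {t: max_tf(0.5, f, max_freq) for t, f in counts.items()}
weights_log = {t: log_tf(f) for t, f in counts.items()}
print(weights_doc, weights_query, weights_log)
```

With `a = 0` the weight is the count divided by the largest count in the document. With `a = 0.5` every term that occurs gets at least 0.5, which flattens the differences between terms.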
In [10]:
def convertInvertedDocument(DTM):
    TDM = defaultdict(lambda: defaultdict(int))
    
    for fileName, termList in DTM.items():  
        maxFreq = max(termList.values())
        for term, freq in termList.items():
            TDM[term][fileName] = maxTF(0,freq,maxFreq)
            
    return TDM
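
A short usage sketch for the inversion step, assuming a toy two-document DTM (`dtm` below is hypothetical). The function is re-declared in snake_case so the block is self-contained.

```python
from collections import defaultdict

def max_tf(a, freq, max_freq):
    return a + (1 - a) * (freq / max_freq)

def convert_inverted_document(dtm):
    # Turn {doc: {term: count}} into {term: {doc: weight}}.
    tdm = defaultdict(lambda: defaultdict(int))
    for doc_id, term_counts in dtm.items():
        max_freq = max(term_counts.values())
        for term, freq in term_counts.items():
            tdm[term][doc_id] = max_tf(0, freq, max_freq)
    return tdm

dtm = {0: {'장관': 4, '국회': 2}, 1: {'국회': 3}}  # hypothetical counts
tdm = convert_inverted_document(dtm)
print({term: dict(docs) for term, docs in tdm.items()})
```

Each term now maps to the documents that contain it, and each weight is normalized by the most frequent term in that document.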
In [145]:
termList = list([] for _ in range(len(corpus2)))
freqList = list([] for _ in range(len(corpus2)))
zero = list([0] for _ in range(2979))
In [60]:
totalword = []
for i in range(len(corpus2)):
    for j,d in DTM[i].items():
        totalword.append(j)
In [146]:
for i in range(len(corpus2)):
    for j,d in DTM[i].items():
        termList[i].append(j)
In [257]:
# termList
In [193]:
coldata = list(set(totalword))
# coldata
In [99]:
import numpy as np
coldata = np.array(coldata)
In [107]:
len(termList)
Out[107]:
40
In [194]:
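# wordList and docName are not defined in this notebook; the vocabulary (coldata)
# and corpus2 give the same 40 x 2979 shape shown in the output below.
case = np.zeros((len(corpus2), len(coldata)), dtype=int)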
In [195]:
import pandas as pd
import tqdm
data = pd.DataFrame(case,columns=coldata)
In [196]:
data
Out[196]:
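    발언  30  포스  절전  두고  분산  촉구  강물  평가  사이트  ...  핵심  이제  배제  감정  비판적  문체부  일방적  메시지  보험  현종
0   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
1   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
2   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
3   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
4   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
5   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
6   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
7   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
8   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
9   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
10  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
11  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
12  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
13  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
14  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
15  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
16  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
17  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
18  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
19  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
20  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
21  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
22  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
23  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
24  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
25  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
26  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
27  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
28  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
29  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
30  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
31  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
32  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
33  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
34  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
35  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
36  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
37  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
38  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
39  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0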

40 rows × 2979 columns

In [197]:
for i in tqdm.tqdm(range(len(corpus2))):
    for t in termList[i]:
        # data[i:(i+1)][t] += 1 assigns to a copy of a slice (SettingWithCopyWarning);
        # .loc writes into the DataFrame itself.
        if t in data.columns:
            data.loc[i, t] += 1
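
The same 0/1 document-term matrix can also be filled in a NumPy array first, which avoids cell-by-cell DataFrame updates. This is a minimal sketch, assuming toy documents (`docs` is hypothetical) and a sorted vocabulary; the notebook itself uses an unordered `set`.

```python
import numpy as np

# Hypothetical noun lists, one per document.
docs = [['장관', '국회'], ['국회', '개각'], ['개각']]

vocab = sorted({t for terms in docs for t in terms})
col = {t: j for j, t in enumerate(vocab)}  # term -> column index

matrix = np.zeros((len(docs), len(vocab)), dtype=int)
for i, terms in enumerate(docs):
    for t in set(terms):
        matrix[i, col[t]] = 1  # presence/absence, not a count

print(vocab)
print(matrix)
```

The array can then be wrapped with `pd.DataFrame(matrix, columns=vocab)` if labelled columns are needed.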
In [198]:
data
Out[198]:
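    발언  30  포스  절전  두고  분산  촉구  강물  평가  사이트  ...  핵심  이제  배제  감정  비판적  문체부  일방적  메시지  보험  현종
0   0  0  0  0  0  0  0  0  1  0  ...  0  0  0  0  1  1  0  0  0  0
1   0  1  0  0  0  0  0  0  0  1  ...  0  0  0  0  0  0  0  0  1  0
2   0  0  0  0  0  0  1  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
3   0  0  0  0  0  0  0  0  1  0  ...  0  0  0  0  0  0  0  0  0  1
4   0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  1  0
5   0  0  1  0  0  0  0  1  1  0  ...  1  0  0  0  0  0  0  0  0  0
6   1  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  1  0  0  0
7   0  1  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
8   0  1  0  0  0  0  0  0  0  0  ...  1  0  0  0  0  0  0  0  0  0
9   0  0  0  0  0  0  1  0  0  0  ...  1  0  0  0  0  0  0  0  0  0
10  1  0  0  0  0  0  1  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
11  0  0  0  0  0  0  1  0  0  0  ...  0  1  0  0  0  0  0  0  0  0
12  1  1  0  0  0  0  0  0  0  0  ...  1  0  0  0  0  0  0  0  0  0
13  1  0  0  0  0  0  1  0  0  0  ...  0  1  0  1  0  0  0  0  0  0
14  0  1  0  0  0  0  0  0  1  0  ...  0  0  0  0  0  0  0  0  0  0
15  0  0  0  1  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  1  0  0
16  0  0  0  0  0  0  1  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
17  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
18  1  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
19  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
20  0  0  0  0  0  0  0  0  0  1  ...  0  0  0  0  0  0  0  0  0  0
21  1  0  0  0  0  0  0  0  1  0  ...  0  0  0  0  0  0  0  1  0  0
22  0  0  0  0  0  0  0  0  0  0  ...  0  0  1  0  0  0  0  0  0  0
23  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
24  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
25  0  0  0  0  0  0  1  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
26  1  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
27  0  0  0  0  0  0  0  0  0  1  ...  0  0  0  0  0  0  0  0  0  0
28  1  0  0  0  0  0  0  0  0  0  ...  1  0  0  0  0  0  0  0  0  0
29  1  1  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
30  0  0  0  0  1  0  0  0  0  0  ...  0  0  1  0  0  0  0  1  0  0
31  0  1  0  0  0  0  0  0  1  0  ...  0  0  0  0  0  0  0  0  0  0
32  1  1  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
33  1  0  0  0  0  0  0  0  1  0  ...  0  0  0  0  0  0  0  0  0  0
34  0  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
35  0  1  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
36  0  0  0  0  0  1  0  0  0  0  ...  1  0  0  0  0  0  0  0  0  0
37  1  0  0  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
38  1  0  0  0  0  0  1  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
39  1  0  0  0  1  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0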

40 rows × 2979 columns

In [111]:
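# The per-document term lists have different lengths, so an object array is needed;
# recent NumPy versions raise an error for ragged input otherwise.
termList = np.array(termList, dtype=object)
freqList = np.array(freqList, dtype=object)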
In [199]:
U,sigma,Vt = np.linalg.svd(data.T,full_matrices=False)
In [202]:
U.shape, sigma.shape, Vt.shape
Out[202]:
((2979, 40), (40,), (40, 40))
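
The shapes above follow from `full_matrices=False`: for a terms × documents matrix with more terms than documents, `U` keeps one column per document. A minimal sketch on a small random 0/1 matrix (the 6 × 4 size and the seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical term-document matrix: 6 terms x 4 documents.
A = rng.integers(0, 2, size=(6, 4)).astype(float)

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, sigma.shape, Vt.shape)

# Multiplying the three factors back together recovers A up to rounding error.
reconstructed = U @ np.diag(sigma) @ Vt
print(np.allclose(A, reconstructed))
```

`sigma` comes back as a 1-D vector sorted in descending order, which is why the notebook wraps it in `np.diag` before multiplying.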
In [203]:
_sigma = np.diag(sigma)
_sigma.shape
Out[203]:
(40, 40)
In [211]:
_D = np.round(U.dot(_sigma.dot(Vt)))
pd.DataFrame(_D,index=coldata)
Out[211]:
      0  1  2  3  4  5  6  7  8  9  ...  30  31  32  33  34  35  36  37  38  39
발언  0.0  0.0  0.0  -0.0  0.0  -0.0  1.0  -0.0  -0.0  0.0  ...  0.0  0.0  1.0  1.0  0.0  -0.0  -0.0  1.0  1.0  1.0
30  0.0  1.0  0.0  -0.0  0.0  -0.0  -0.0  1.0  1.0  -0.0  ...  -0.0  1.0  1.0  0.0  0.0  1.0  -0.0  0.0  -0.0  -0.0
포스  -0.0  0.0  -0.0  -0.0  -0.0  1.0  0.0  -0.0  -0.0  -0.0  ...  0.0  -0.0  0.0  -0.0  -0.0  -0.0  0.0  0.0  -0.0  -0.0
절전  0.0  0.0  0.0  0.0  0.0  0.0  0.0  -0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  -0.0  0.0
두고  -0.0  -0.0  0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  ...  1.0  -0.0  0.0  0.0  -0.0  -0.0  -0.0  0.0  -0.0  1.0
분산  -0.0  0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  -0.0  0.0  ...  -0.0  0.0  -0.0  0.0  -0.0  0.0  1.0  -0.0  0.0  -0.0
촉구  0.0  0.0  1.0  0.0  0.0  -0.0  -0.0  -0.0  -0.0  1.0  ...  -0.0  0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  1.0  0.0
강물  -0.0  0.0  -0.0  -0.0  -0.0  1.0  -0.0  -0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0  -0.0
평가  1.0  0.0  0.0  1.0  0.0  1.0  0.0  -0.0  -0.0  -0.0  ...  0.0  1.0  0.0  1.0  0.0  -0.0  -0.0  0.0  0.0  0.0
사이트  0.0  1.0  0.0  -0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  ...  0.0  -0.0  0.0  0.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0
나발  -0.0  0.0  -0.0  -0.0  -0.0  1.0  -0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0  -0.0  -0.0  0.0
일훈  0.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0  ...  0.0  0.0  0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  0.0
참사  0.0  0.0  0.0  -0.0  0.0  -0.0  0.0  0.0  -0.0  0.0  ...  0.0  0.0  1.0  -0.0  0.0  -0.0  -0.0  -0.0  0.0  0.0
부업  0.0  -0.0  0.0  0.0  1.0  -0.0  0.0  -0.0  0.0  0.0  ...  -0.0  0.0  -0.0  -0.0  -0.0  0.0  -0.0  -0.0  0.0  -0.0
3000  0.0  0.0  0.0  0.0  0.0  -0.0  0.0  0.0  0.0  -0.0  ...  -0.0  -0.0  0.0  0.0  0.0  0.0  -0.0  0.0  0.0  0.0
강경  0.0  0.0  -0.0  -0.0  0.0  -0.0  1.0  0.0  -0.0  -0.0  ...  0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0
경원  0.0  0.0  0.0  -0.0  0.0  -0.0  1.0  1.0  -0.0  -0.0  ...  0.0  0.0  0.0  1.0  0.0  -0.0  -0.0  0.0  1.0  1.0
말씀  0.0  -0.0  -0.0  -0.0  1.0  1.0  0.0  -0.0  0.0  0.0  ...  0.0  -0.0  0.0  0.0  0.0  -0.0  1.0  -0.0  -0.0  -0.0
보안법  0.0  -0.0  0.0  -0.0  0.0  -0.0  0.0  0.0  -0.0  0.0  ...  0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  0.0  -0.0  0.0
채널  0.0  1.0  1.0  1.0  0.0  -0.0  -0.0  -0.0  1.0  1.0  ...  0.0  -0.0  1.0  1.0  1.0  -0.0  -0.0  1.0  1.0  0.0
자동차  -0.0  0.0  -0.0  -0.0  -0.0  1.0  -0.0  0.0  0.0  0.0  ...  -0.0  -0.0  0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  0.0
만이  0.0  0.0  -0.0  -0.0  0.0  -0.0  1.0  -0.0  -0.0  -0.0  ...  0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  1.0  -0.0
민주화  -0.0  0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  -0.0  0.0  ...  0.0  -0.0  0.0  -0.0  -0.0  -0.0  -0.0  1.0  0.0  -0.0
신기  0.0  -0.0  -0.0  -0.0  0.0  -0.0  0.0  -0.0  0.0  0.0  ...  0.0  -0.0  0.0  0.0  -0.0  -0.0  -0.0  -0.0  -0.0  -0.0
여건  0.0  0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  0.0  0.0  ...  0.0  0.0  -0.0  -0.0  0.0  1.0  -0.0  0.0  -0.0  -0.0
시베리아  0.0  0.0  0.0  0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  ...  -0.0  0.0  0.0  0.0  -0.0  0.0  -0.0  0.0  0.0  -0.0
호통  -0.0  0.0  0.0  0.0  1.0  -0.0  0.0  -0.0  0.0  -0.0  ...  0.0  0.0  -0.0  0.0  -0.0  0.0  -0.0  1.0  0.0  -0.0
시스  0.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0  ...  0.0  0.0  -0.0  1.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0
호치민  0.0  0.0  0.0  0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  ...  -0.0  0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0
교통부  1.0  0.0  0.0  -0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  ...  -0.0  0.0  0.0  -0.0  0.0  -0.0  0.0  -0.0  0.0  0.0
...
부끄러움  0.0  0.0  0.0  0.0  1.0  -0.0  0.0  -0.0  -0.0  0.0  ...  0.0  -0.0  0.0  0.0  1.0  0.0  -0.0  1.0  0.0  0.0
가능  -0.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0  0.0  1.0  1.0  ...  -0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  0.0  1.0  0.0
벤처  1.0  0.0  0.0  -0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  ...  -0.0  0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0  -0.0  0.0
어제  0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  -0.0  1.0  ...  0.0  0.0  1.0  0.0  0.0  -0.0  -0.0  0.0  0.0  0.0
칭호  -0.0  0.0  -0.0  -0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  ...  0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  -0.0
아이러니  0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0  ...  -0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0
기준  1.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0  ...  -0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  1.0  0.0
개입  0.0  0.0  1.0  -0.0  0.0  -0.0  -0.0  -0.0  -0.0  1.0  ...  -0.0  0.0  0.0  -0.0  0.0  0.0  -0.0  0.0  0.0  0.0
있음  -0.0  -0.0  -0.0  1.0  0.0  -0.0  -0.0  -0.0  -0.0  -0.0  ...  -0.0  -0.0  -0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  0.0
매진  0.0  0.0  0.0  -0.0  1.0  -0.0  0.0  -0.0  1.0  0.0  ...  0.0  0.0  -0.0  0.0  1.0  1.0  -0.0  0.0  -0.0  0.0
소란  0.0  -0.0  -0.0  0.0  1.0  -0.0  0.0  -0.0  0.0  0.0  ...  -0.0  -0.0  -0.0  0.0  0.0  0.0  -0.0  -0.0  0.0  0.0
아시아  0.0  -0.0  -0.0  -0.0  0.0  1.0  0.0  -0.0  0.0  0.0  ...  0.0  -0.0  0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  0.0
확립  -0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0  ...  0.0  0.0  -0.0  -0.0  -0.0  -0.0  -0.0  0.0  0.0  0.0
균형  -0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0  0.0  -0.0  -0.0  ...  0.0  0.0  0.0  0.0  -0.0  -0.0  1.0  -0.0  0.0  0.0
이달  -0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  0.0  -0.0  ...  0.0  0.0  0.0  -0.0  0.0  0.0  -0.0  0.0  0.0  -0.0
의뢰  0.0  0.0  1.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  ...  0.0  0.0  1.0  1.0  0.0  0.0  -0.0  0.0  0.0  0.0
변화  -0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  0.0  -0.0  ...  0.0  -0.0  0.0  0.0  0.0  0.0  -0.0  0.0  0.0  1.0
혼자  0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0  -0.0  -0.0  -0.0  ...  0.0  0.0  -0.0  1.0  0.0  0.0  -0.0  0.0  -0.0  0.0
박지원  -0.0  0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  -0.0  0.0  ...  0.0  0.0  -0.0  0.0  0.0  0.0  -0.0  1.0  0.0  -0.0
부분적  -0.0  -0.0  -0.0  1.0  0.0  -0.0  -0.0  -0.0  -0.0  -0.0  ...  -0.0  -0.0  -0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  0.0
핵심  -0.0  0.0  -0.0  -0.0  0.0  1.0  0.0  0.0  1.0  1.0  ...  0.0  0.0  -0.0  0.0  0.0  -0.0  1.0  0.0  0.0  0.0
이제  -0.0  -0.0  -0.0  0.0  0.0  -0.0  -0.0  -0.0  0.0  -0.0  ...  -0.0  0.0  0.0  -0.0  0.0  0.0  -0.0  0.0  0.0  -0.0
배제  -0.0  -0.0  -0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  -0.0  ...  1.0  -0.0  0.0  0.0  0.0  -0.0  -0.0  -0.0  -0.0  -0.0
감정  0.0  -0.0  -0.0  0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  ...  -0.0  0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  -0.0  -0.0
비판적  1.0  0.0  0.0  -0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  ...  -0.0  0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0  -0.0  0.0
문체부  1.0  0.0  0.0  -0.0  0.0  -0.0  0.0  0.0  -0.0  -0.0  ...  -0.0  0.0  0.0  -0.0  -0.0  -0.0  -0.0  0.0  -0.0  0.0
일방적  0.0  0.0  -0.0  -0.0  0.0  -0.0  1.0  -0.0  -0.0  -0.0  ...  0.0  0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  -0.0  0.0
메시지  0.0  -0.0  0.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  ...  1.0  -0.0  0.0  -0.0  0.0  -0.0  -0.0  0.0  0.0  -0.0
보험  0.0  1.0  0.0  -0.0  1.0  -0.0  0.0  -0.0  0.0  -0.0  ...  -0.0  -0.0  0.0  0.0  0.0  0.0  -0.0  -0.0  -0.0  0.0
현종  -0.0  -0.0  -0.0  1.0  0.0  -0.0  -0.0  -0.0  -0.0  -0.0  ...  -0.0  -0.0  -0.0  0.0  0.0  -0.0  -0.0  0.0  0.0  0.0

2979 rows × 40 columns

In [212]:
U.shape
Out[212]:
(2979, 40)
In [214]:
pd.DataFrame(U.dot(_sigma),index=coldata)
Out[214]:
      0  1  2  3  4  5  6  7  8  9  ...  30  31  32  33  34  35  36  37  38  39
발언  2.369581  -0.836695  -0.144384  0.813850  -1.161821  0.162323  -0.100678  0.091714  -0.049772  0.148860  ...  0.076590  0.459355  0.066907  -0.013506  0.001914  0.221040  0.100225  -0.362063  -0.728112  0.331359
30  1.232922  0.046106  0.403684  0.361928  0.610832  -0.143363  0.693665  0.565884  0.170358  0.525898  ...  0.367807  -0.424507  -0.014371  -0.421198  0.165903  0.086279  0.013750  0.119869  -0.260407  -0.595102
포스  0.197766  0.577285  0.536741  -0.425677  -0.366042  -0.053780  -0.019975  -0.007174  -0.004273  -0.089186  ...  -0.023613  0.012388  -0.020455  0.007404  0.010441  -0.000078  0.011467  -0.008349  -0.006670  0.002912
절전  0.156872  0.066606  0.141700  0.041088  0.251585  0.107481  -0.346418  0.105462  -0.241583  0.056318  ...  0.005409  -0.037925  -0.010387  -0.031392  -0.008988  -0.034426  0.017780  0.001282  0.032712  -0.006153
두고  0.274984  -0.061751  0.049259  0.090926  0.132552  -0.102541  0.059663  0.153236  0.008006  -0.169971  ...  0.079298  0.128989  -0.016994  0.011060  0.100688  0.007199  0.065299  0.023569  0.023708  -0.027495
분산  0.239403  0.607510  -0.737459  -0.079645  0.104458  -0.004554  0.010982  -0.006272  0.000990  -0.048650  ...  -0.015430  0.009186  -0.000426  0.027339  0.000319  -0.000260  -0.008391  0.011644  0.008795  0.000428
촉구  1.218676  -0.186607  -0.005394  0.705432  -0.459797  -0.185152  0.533796  0.166890  -0.249799  0.670231  ...  -0.015529  0.407165  0.073284  0.280464  -0.034711  -0.007589  0.186745  0.783091  -0.242910  0.203356
강물  0.197766  0.577285  0.536741  -0.425677  -0.366042  -0.053780  -0.019975  -0.007174  -0.004273  -0.089186  ...  -0.023613  0.012388  -0.020455  0.007404  0.010441  -0.000078  0.011467  -0.008349  -0.006670  0.002912
평가  1.062935  0.561659  0.895490  -0.358161  0.318224  -0.063948  -0.631898  0.561459  0.145027  -0.047984  ...  0.181124  0.236198  0.242852  0.338859  -0.214197  -0.026540  -0.052922  -0.085309  0.016330  -0.001337
사이트  0.430040  -0.270124  -0.021477  -0.173082  0.159062  -0.600739  -0.118443  -0.414830  0.116138  -0.262634  ...  0.100191  0.121265  -0.157790  0.111194  0.428429  0.302030  -0.436008  -0.057473  -0.001548  0.008746
나발  0.197766  0.577285  0.536741  -0.425677  -0.366042  -0.053780  -0.019975  -0.007174  -0.004273  -0.089186  ...  -0.023613  0.012388  -0.020455  0.007404  0.010441  -0.000078  0.011467  -0.008349  -0.006670  0.002912
일훈  0.179643  -0.036318  0.028300  0.065591  0.018432  -0.207722  -0.086565  -0.090929  -0.288729  0.066226  ...  0.040127  0.021626  0.025197  -0.034516  0.015118  0.034850  0.029041  0.022119  -0.013406  -0.046313
참사  0.316409  -0.104357  0.010679  0.058574  -0.007663  -0.455446  -0.079217  0.208707  -0.210603  0.409731  ...  0.103416  0.109178  -0.014763  0.120089  -0.142586  0.099503  0.006561  -0.087782  0.058818  -0.071663
부업  0.177889  -0.211081  -0.075403  -0.313003  0.016532  0.067352  0.014599  0.048049  -0.090268  0.042147  ...  -0.412176  0.108497  0.199895  0.074994  -0.062623  0.187080  0.099474  0.028052  -0.038459  0.003520
3000  0.146436  -0.177308  -0.064159  -0.267153  0.032358  0.067886  -0.008218  0.046320  -0.092291  -0.009298  ...  0.328133  -0.099486  -0.019106  0.042915  0.227396  0.334372  -0.333997  -0.020548  -0.021204  0.020148
강경  0.362483  -0.063681  -0.015307  0.398809  -0.460040  0.153092  -0.137744  0.016463  -0.142026  0.056521  ...  -0.016167  0.464930  -0.088088  0.066577  0.034248  -0.059507  -0.008024  0.229384  0.018940  -0.032632
경원  1.995896  -0.365314  -0.016019  1.401452  -1.140363  -0.360019  -0.397663  -0.643030  -0.283948  0.354143  ...  0.390465  -0.111428  0.148737  0.000038  0.563293  -0.101973  0.086519  0.375483  0.328855  -0.240790
말씀  0.742564  0.950088  -0.251962  -0.805080  -0.238978  -0.065928  0.092798  0.037636  -0.166707  -0.137240  ...  -0.398463  -0.052499  0.133233  0.099505  0.127384  0.219081  0.119312  -0.170996  -0.043838  -0.033214
보안법  0.205547  -0.068561  0.031243  0.098642  0.089656  -0.594459  -0.148476  -0.552418  0.256667  -0.233459  ...  -0.005443  0.024705  0.008247  -0.005403  0.015003  0.014114  -0.009206  -0.010712  -0.007240  0.003638
채널  3.088727  -0.969424  0.316349  -0.089523  0.874423  0.204035  -0.486886  -0.131872  0.917133  -0.280083  ...  0.131316  0.351365  -0.867931  0.124607  0.086200  0.063269  -0.148243  0.507811  -0.001623  -0.566629
자동차  0.197766  0.577285  0.536741  -0.425677  -0.366042  -0.053780  -0.019975  -0.007174  -0.004273  -0.089186  ...  -0.023613  0.012388  -0.020455  0.007404  0.010441  -0.000078  0.011467  -0.008349  -0.006670  0.002912
만이  0.710152  -0.047355  0.017757  0.596906  -0.545343  0.189023  -0.452087  0.382782  0.768650  0.632477  ...  -0.115039  0.507876  -0.081620  -0.045790  0.129570  -0.072574  -0.025557  0.184233  0.034024  -0.019336
민주화  0.222092  -0.294330  -0.135534  -0.482251  -0.051334  0.190610  -0.087395  -0.192593  0.154683  0.038338  ...  0.026472  0.004324  0.071936  -0.008209  0.081331  0.157660  0.042345  0.006350  -0.029562  -0.012229
신기  0.263535  -0.033261  0.031035  0.157818  -0.163540  -0.042264  0.022192  -0.027389  -0.124940  0.032316  ...  0.103256  0.040767  -0.098607  0.043941  0.195717  0.010641  0.011477  0.563328  -0.000051  -0.085494
여건  0.123399  0.071824  0.110044  0.019551  0.180713  0.094155  0.328738  -0.112077  -0.007991  0.167470  ...  0.158870  0.166873  0.550972  -0.554074  -0.123399  -0.055217  -0.012488  0.000057  0.031717  0.017503
시베리아  0.156872  0.066606  0.141700  0.041088  0.251585  0.107481  -0.346418  0.105462  -0.241583  0.056318  ...  0.005409  -0.037925  -0.010387  -0.031392  -0.008988  -0.034426  0.017780  0.001282  0.032712  -0.006153
호통  0.545251  -0.670008  -0.284991  -1.022469  -0.056182  0.292303  -0.113777  -0.173441  -0.000505  0.105931  ...  -0.352639  0.150712  0.055726  0.012635  -0.253473  -0.258008  -0.369652  0.028466  0.011099  0.033373
시스  0.525090  -0.049418  0.099217  0.310381  -0.003447  0.081755  -0.234558  0.694147  0.477752  -0.343681  ...  -0.123197  0.123392  0.091550  -0.194787  0.074263  0.060611  -0.036826  0.042755  -0.027192  0.036038
호치민  0.156872  0.066606  0.141700  0.041088  0.251585  0.107481  -0.346418  0.105462  -0.241583  0.056318  ...  0.005409  -0.037925  -0.010387  -0.031392  -0.008988  -0.034426  0.017780  0.001282  0.032712  -0.006153
교통부  0.144714  -0.055486  0.067303  -0.073055  0.026855  -0.097888  0.050656  -0.000477  -0.112808  0.154227  ...  -0.031898  -0.019856  -0.144409  0.047383  0.052654  0.044239  -0.018052  0.029880  -0.026980  0.013759
...
부끄러움  0.525449  -0.660899  -0.277153  -1.015897  0.011664  0.309758  -0.039148  -0.180731  0.031667  0.063505  ...  -0.302967  0.076489  0.005942  -0.082149  -0.050345  0.007979  0.847159  0.022324  -0.017174  -0.039672
가능  0.672915  0.118303  0.252102  0.357578  0.371721  0.098156  0.287876  0.255191  0.887020  0.809933  ...  -0.085434  0.117597  -0.544208  0.105501  -0.346018  0.400869  -0.054870  -0.055393  0.209869  -0.045066
벤처  0.144714  -0.055486  0.067303  -0.073055  0.026855  -0.097888  0.050656  -0.000477  -0.112808  0.154227  ...  -0.031898  -0.019856  -0.144409  0.047383  0.052654  0.044239  -0.018052  0.029880  -0.026980  0.013759
어제  0.471845  -0.127601  0.019384  0.112517  0.051449  -0.628583  0.078015  0.325739  -0.342193  0.409083  ...  0.153711  0.238796  -0.289976  -0.014860  -0.664807  0.509221  -0.003573  -0.095797  0.276327  -0.126526
칭호  0.130440  0.022425  0.087373  0.003087  0.214820  -0.116077  -0.204506  0.529317  0.477387  0.086242  ...  -0.078207  0.018645  0.021820  -0.035588  0.034603  0.000164  0.009421  -0.004965  0.011169  0.007691
아이러니  0.111598  -0.039413  0.027120  0.003489  -0.010765  -0.130006  -0.014811  0.056837  -0.041237  0.067703  ...  0.266914  0.124168  0.282405  0.451492  -0.334273  -0.127715  -0.029909  -0.095920  0.066272  -0.091411
기준  0.473540  -0.100998  0.040113  0.125444  -0.284033  -0.075886  -0.073992  -0.106638  0.279245  0.711643  ...  0.214351  0.128613  0.122644  0.422096  -0.220900  -0.096707  -0.074916  -0.106226  0.043207  -0.072047
개입  0.593166  -0.045645  0.134718  0.159660  0.306130  -0.331935  -0.051707  0.425573  -0.623508  0.196482  ...  -0.000446  0.002466  0.265481  0.296231  -0.142671  0.102223  0.225198  0.056329  0.546281  -0.196162
있음  0.158328  0.084021  0.149270  0.105622  0.387031  0.361035  -0.430081  -0.199925  -0.088406  -0.244106  ...  0.028519  0.036895  0.036162  -0.043585  0.033306  0.001253  0.018745  -0.010929  0.001080  0.029669
매진  0.596566  -0.169524  0.178759  -0.408556  0.641623  0.448665  0.821972  -0.328376  -0.023074  0.427263  ...  -0.207428  0.184071  0.209515  -0.275197  -0.174195  -0.200679  0.765124  0.013805  0.021381  -0.013440
소란  0.177889  -0.211081  -0.075403  -0.313003  0.016532  0.067352  0.014599  0.048049  -0.090268  0.042147  ...  -0.412176  0.108497  0.199895  0.074994  -0.062623  0.187080  0.099474  0.028052  -0.038459  0.003520
아시아  0.461301  0.544024  0.567776  -0.267858  -0.529581  -0.096044  0.002217  -0.034563  -0.129213  -0.056869  ...  0.079643  0.053155  -0.119063  0.051345  0.206158  0.010562  0.022944  0.554979  -0.006721  -0.082582
확립  0.162838  -0.091667  0.028184  -0.025076  0.024234  -0.026400  0.133683  0.029569  0.290536  -0.251457  ...  -0.013472  0.083443  -0.014298  0.027256  -0.034940  -0.073571  0.011902  -0.012061  0.033229  0.008907
균형  0.419046  0.571192  -0.709159  -0.014054  0.122890  -0.212276  -0.075583  -0.097201  -0.287739  0.017576  ...  0.024697  0.030811  0.024771  -0.007177  0.015437  0.034590  0.020650  0.033763  -0.004611  -0.045885
이달  0.113932  -0.003755  0.060966  0.011110  0.086732  0.022500  0.241971  0.056626  0.091929  -0.055611  ...  0.072604  0.034421  -0.098965  -0.090965  0.053405  -0.023499  -0.031495  -0.002136  0.039804  0.002156
의뢰  0.699156  -0.336653  -0.045144  -0.045728  0.046333  -0.108698  -0.134518  0.196913  -0.124050  0.014475  ...  -0.036544  -0.022197  0.099950  0.150080  -0.214865  -0.358695  -0.455873  0.040130  -0.270180  -0.433357
변화  0.266071  -0.039293  0.069045  0.070967  0.084302  0.041202  0.261547  0.054417  0.074815  -0.061984  ...  0.105890  0.150717  -0.075866  -0.027782  0.080440  0.055592  -0.006594  -0.001841  0.065624  -0.022228
혼자  0.197166  -0.015012  0.029416  0.112143  -0.021130  0.024773  -0.182621  0.012911  -0.080117  0.026364  ...  0.034826  0.061218  0.054531  -0.077955  0.018379  0.066178  -0.037166  0.003066  -0.010343  0.029035
박지원  0.222092  -0.294330  -0.135534  -0.482251  -0.051334  0.190610  -0.087395  -0.192593  0.154683  0.038338  ...  0.026472  0.004324  0.071936  -0.008209  0.081331  0.157660  0.042345  0.006350  -0.029562  -0.012229
부분적  0.158328  0.084021  0.149270  0.105622  0.387031  0.361035  -0.430081  -0.199925  -0.088406  -0.244106  ...  0.028519  0.036895  0.036162  -0.043585  0.033306  0.001253  0.018745  -0.010929  0.001080  0.029669
핵심  1.104896  1.158787  0.074804  -0.305325  0.238107  -0.230230  0.640343  -0.185935  -0.025133  -0.089090  ...  0.001050  0.201294  -0.560658  0.245351  -0.450402  0.374877  0.006683  0.003112  0.216733  -0.092429
이제  0.317579  -0.037037  0.056990  0.208030  -0.054266  0.211192  0.329048  -0.041612  -0.118889  -0.083444  ...  -0.078451  0.111508  -0.154888  -0.015114  0.040884  -0.000271  -0.002623  -0.015915  0.022769  0.002153
배제  0.241414  -0.010720  0.086217  0.053568  0.275454  -0.137406  -0.145999  0.374430  0.125597  -0.020558  ...  0.226042  0.069864  -0.091348  -0.032288  0.028683  -0.130606  0.049429  0.039650  -0.034989  -0.008817
감정  0.203647  -0.033283  -0.003976  0.196920  -0.140998  0.188692  0.087077  -0.098238  -0.210818  -0.027833  ...  -0.151055  0.077088  -0.055923  0.075851  -0.012521  0.023228  0.028873  -0.013779  -0.017035  -0.000003
비판적  0.144714  -0.055486  0.067303  -0.073055  0.026855  -0.097888  0.050656  -0.000477  -0.112808  0.154227  ...  -0.031898  -0.019856  -0.144409  0.047383  0.052654  0.044239  -0.018052  0.029880  -0.026980  0.013759
문체부  0.144714  -0.055486  0.067303  -0.073055  0.026855  -0.097888  0.050656  -0.000477  -0.112808  0.154227  ...  -0.031898  -0.019856  -0.144409  0.047383  0.052654  0.044239  -0.018052  0.029880  -0.026980  0.013759
일방적  0.226454  -0.054046  -0.022182  0.254235  -0.290426  0.120410  -0.072744  0.046885  -0.090241  -0.017346  ...  -0.066666  0.241592  -0.035261  0.012403  0.017777  -0.037808  -0.002739  -0.536287  0.011487  0.012788
메시지  0.391314  0.000980  0.210000  0.075646  0.375803  -0.143768  -0.321142  0.317744  -0.257700  -0.039576  ...  0.318334  0.098935  0.231925  0.367976  -0.269608  -0.234033  0.028268  -0.071363  0.096871  -0.100675
보험  0.255946  -0.235336  -0.063964  -0.317574  0.053579  -0.006814  0.052849  0.139317  -0.138506  0.022270  ...  -0.634675  0.304543  0.052963  0.148677  0.123407  0.140623  0.006669  0.001840  -0.011563  -0.011519
현종  0.158328  0.084021  0.149270  0.105622  0.387031  0.361035  -0.430081  -0.199925  -0.088406  -0.244106  ...  0.028519  0.036895  0.036162  -0.043585  0.033306  0.001253  0.018745  -0.010929  0.001080  0.029669

2979 rows × 40 columns

In [215]:
_sigma = np.diag(sigma[:2])
_sigma
Out[215]:
array([[35.64359912,  0.        ],
       [ 0.        , 19.45585383]])
In [216]:
U[:,:2].dot(_sigma)
Out[216]:
array([[ 2.36958057e+00, -8.36695147e-01],
       [ 1.23292230e+00,  4.61061318e-02],
       [ 1.97766076e-01,  5.77285049e-01],
       ...,
       [ 3.91314446e-01,  9.79699246e-04],
       [ 2.55945976e-01, -2.35336430e-01],
       [ 1.58327760e-01,  8.40213821e-02]])
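
Keeping only the first two singular values gives every term and every document a 2-D coordinate. The sketch below assumes a tiny hand-made term-document matrix with two obvious topics (`A` is hypothetical). Comparing documents by cosine similarity in the reduced space is one common follow-up and is an illustrative choice here, not necessarily the method used later in this series.

```python
import numpy as np

# Hypothetical 4 terms x 4 documents: docs 0-1 share terms 0-1, docs 2-3 share terms 2-3.
A = np.array([[2, 2, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
sigma_k = np.diag(sigma[:k])

term_coords = U[:, :k] @ sigma_k         # one k-dimensional row per term
doc_coords = (sigma_k @ Vt[:k, :]).T     # one k-dimensional row per document
A_k = U[:, :k] @ sigma_k @ Vt[:k, :]     # rank-k approximation of A

# Cosine similarity between documents in the reduced space.
unit = doc_coords / np.linalg.norm(doc_coords, axis=1, keepdims=True)
cos = unit @ unit.T
print(np.round(cos, 3))
```

Because this toy matrix has rank 2, `A_k` reproduces `A` exactly. On the news data, rank 2 is a lossy summary, which is why the K = 2 reconstruction differs from the original 0/1 matrix.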
In [218]:
# K = 2
pd.DataFrame(U[:,:2].dot(_sigma.dot(Vt[:2,:])), index=coldata)
Out[218]:
      0  1  2  3  4  5  6  7  8  9  ...  30  31  32  33  34  35  36  37  38  39
발언  0.389336  0.205256  0.332963  0.304870  0.598133  -0.014389  0.581822  0.308318  0.297605  0.387768  ...  0.313022  0.301451  0.284804  0.479761  0.427403  0.232309  0.058985  0.772529  0.519844  0.390240
30  0.175862  0.095120  0.148449  0.199080  0.209591  0.270447  0.276709  0.169237  0.215135  0.190569  ...  0.150249  0.150994  0.133113  0.242398  0.147524  0.155453  0.323176  0.260251  0.267545  0.185937
포스  -0.003412  0.001435  -0.005539  0.079816  -0.086674  0.372369  0.013585  0.037581  0.105871  0.017321  ...  0.009162  0.017290  0.003509  0.030327  -0.064947  0.065867  0.398052  -0.125990  0.039440  0.009572
절전  0.019006  0.010629  0.015760  0.030434  0.013847  0.069474  0.031925  0.022645  0.034979  0.022835  ...  0.017525  0.018473  0.015035  0.029930  0.009326  0.024142  0.078019  0.015236  0.033671  0.021499
두고  0.043220  0.022962  0.036819  0.038349  0.061951  0.018735  0.065609  0.036427  0.038962  0.044178  ...  0.035399  0.034553  0.031944  0.055144  0.044103  0.029498  0.028318  0.079247  0.060111  0.044030
분산  0.000937  0.003952  -0.002002  0.088948  -0.085647  0.398052  0.021381  0.043821  0.116726  0.023091  ...  0.013485  0.022040  0.007106  0.038082  -0.064423  0.073176  0.426383  -0.125639  0.048300  0.014833
촉구  0.186713  0.099652  0.158692  0.177271  0.256178  0.133287  0.286060  0.163030  0.183575  0.193764  ...  0.154599  0.152073  0.138844  0.243083  0.181920  0.136981  0.178390  0.325582  0.265870  0.192040
강물  -0.003412  0.001435  -0.005539  0.079816  -0.086674  0.372369  0.013585  0.037581  0.105871  0.017321  ...  0.009162  0.017290  0.003509  0.030327  -0.064947  0.065867  0.398052  -0.125990  0.039440  0.009572
평가  0.122657  0.069346  0.101102  0.215483  0.070529  0.534450  0.210351  0.155459  0.250828  0.152163  ...  0.115853  0.123829  0.098419  0.201143  0.046033  0.171506  0.595684  0.070756  0.227474  0.141754
사이트  0.077221  0.040119  0.066519  0.045391  0.133517  -0.070891  0.111983  0.053789  0.039200  0.073123  ...  0.059909  0.056147  0.055390  0.088844  0.095957  0.033665  -0.061150  0.175014  0.095064  0.075026
나발  -0.003412  0.001435  -0.005539  0.079816  -0.086674  0.372369  0.013585  0.037581  0.105871  0.017321  ...  0.009162  0.017290  0.003509  0.030327  -0.064947  0.065867  0.398052  -0.125990  0.039440  0.009572
일훈  0.028012  0.014903  0.023846  0.025391  0.039623  0.014561  0.042644  0.023871  0.025957  0.028767  ...  0.023020  0.022524  0.020743  0.035965  0.028187  0.019559  0.020943  0.050587  0.039245  0.028621
참사  0.051579  0.027229  0.044081  0.041328  0.078313  0.002331  0.077292  0.041304  0.040662  0.051607  ...  0.041605  0.040163  0.037799  0.063952  0.055926  0.031549  0.012352  0.100987  0.069370  0.051847
부업  0.037455  0.019005  0.032632  0.010429  0.076200  -0.086674  0.051692  0.020431  0.003775  0.032557  ...  0.027386  0.024434  0.026023  0.038242  0.055140  0.006791  -0.085647  0.101635  0.039930  0.034565
3000  0.031029  0.015731  0.027045  0.008287  0.063476  -0.073397  0.042744  0.016754  0.002663  0.026883  ...  0.022637  0.020157  0.021533  0.031534  0.045942  0.005335  -0.072659  0.084709  0.032891  0.028580
강경  0.055990  0.029839  0.047622  0.052041  0.077924  0.034925  0.085528  0.048342  0.053579  0.057823  ...  0.046198  0.045332  0.041554  0.072425  0.055382  0.040156  0.048093  0.099248  0.079130  0.057411
경원  0.309103  0.164654  0.262973  0.285312  0.432159  0.183830  0.471723  0.265911  0.293177  0.318727  ...  0.254761  0.249784  0.229263  0.399007  0.307224  0.220053  0.255892  0.550795  0.435794  0.316636
말씀  0.054743  0.034918  0.041906  0.197396  -0.068452  0.695326  0.116809  0.118815  0.245065  0.093337  ...  0.066316  0.079724  0.051293  0.132146  -0.054559  0.159871  0.754961  -0.114722  0.155512  0.079209
보안법  0.033550  0.017707  0.028676  0.026783  0.051037  0.001071  0.050253  0.026818  0.026319  0.033543  ...  0.027048  0.026100  0.024579  0.041556  0.036450  0.020440  0.007557  0.065830  0.045069  0.033708
채널  0.500771  0.264610  0.427772  0.407579  0.754078  0.051211  0.751849  0.404108  0.403102  0.502634  ...  0.404845  0.391465  0.367445  0.623544  0.538271  0.311518  0.150517  0.971312  0.676873  0.504368
자동차  -0.003412  0.001435  -0.005539  0.079816  -0.086674  0.372369  0.013585  0.037581  0.105871  0.017321  ...  0.009162  0.017290  0.003509  0.030327  -0.064947  0.065867  0.398052  -0.125990  0.039440  0.009572
만이  0.105396  0.056581  0.089312  0.108458  0.136324  0.113107  0.163377  0.096126  0.114661  0.111484  ...  0.088480  0.087870  0.078986  0.140729  0.096465  0.084231  0.141245  0.171657  0.154554  0.109725
민주화  0.048471  0.024475  0.042327  0.010433  0.101635  -0.125990  0.066201  0.024944  0.000857  0.041363  ...  0.034998  0.030880  0.033454  0.048207  0.073630  0.006266  -0.125639  0.135955  0.050040  0.044249
신기  0.039983  0.021377  0.033951  0.038930  0.053901  0.032917  0.061476  0.035385  0.040586  0.041736  ...  0.033246  0.032799  0.029803  0.052459  0.038237  0.030131  0.042885  0.068319  0.057450  0.041276
여건  0.013872  0.007890  0.011396  0.025572  0.006791  0.065867  0.024062  0.018169  0.029948  0.017511  ...  0.013276  0.014295  0.011219  0.023252  0.004315  0.020386  0.073176  0.006266  0.026368  0.016221
시베리아  0.019006  0.010629  0.015760  0.030434  0.013847  0.069474  0.031925  0.022645  0.034979  0.022835  ...  0.017525  0.018473  0.015035  0.029930  0.009326  0.024142  0.078019  0.015236  0.033671  0.021499
호통  0.116081  0.058812  0.101208  0.030033  0.238420  -0.278953  0.159686  0.062203  0.008690  0.100326  ...  0.084544  0.075172  0.080485  0.117563  0.172590  0.019161  -0.276501  0.318299  0.122531  0.106765
시스  0.078730  0.042186  0.066779  0.078984  0.103839  0.075317  0.121580  0.070812  0.082977  0.082767  ...  0.065800  0.065147  0.058854  0.104272  0.073566  0.061246  0.095686  0.131163  0.114366  0.081643
호치민  0.019006  0.010629  0.015760  0.030434  0.013847  0.069474  0.031925  0.022645  0.034979  0.022835  ...  0.017525  0.018473  0.015035  0.029930  0.009326  0.024142  0.078019  0.015236  0.033671  0.021499
교통부  0.024021  0.012642  0.020561  0.018250  0.037455  -0.003412  0.035770  0.018749  0.017626  0.023784  ...  0.019232  0.018463  0.017531  0.029366  0.026784  0.013872  0.000937  0.048471  0.031774  0.023988
...
부끄러움  0.112710  0.057045  0.098316  0.027664  0.232975  -0.277611  0.154709  0.059665  0.006468  0.097036  ...  0.081873  0.072627  0.078038  0.113522  0.168689  0.017372  -0.275709  0.311221  0.118173  0.103428
가능  0.090816  0.049656  0.076225  0.116481  0.094733  0.201374  0.145991  0.094073  0.129081  0.101846  ...  0.079563  0.081278  0.069735  0.130900  0.066035  0.091534  0.232968  0.114629  0.145455  0.098172
벤처  0.024021  0.012642  0.020561  0.018250  0.037455  -0.003412  0.035770  0.018749  0.017626  0.023784  ...  0.019232  0.018463  0.017531  0.029366  0.026784  0.013872  0.000937  0.048471  0.031774  0.023988
어제  0.075363  0.039926  0.064293  0.063985  0.110870  0.019653  0.113748  0.062108  0.064145  0.076308  ...  0.061308  0.059552  0.055491  0.094947  0.079042  0.049060  0.035442  0.142350  0.103277  0.076321
칭호  0.017632  0.009638  0.014802  0.022537  0.018470  0.038742  0.028327  0.018226  0.024958  0.019754  ...  0.015436  0.015761  0.013534  0.025382  0.012879  0.017707  0.044851  0.022369  0.028199  0.019048
아이러니  0.018337  0.009667  0.015682  0.014357  0.028171  -0.000682  0.027402  0.014520  0.014015  0.018262  ...  0.014742  0.014197  0.013413  0.022595  0.020130  0.010940  0.002773  0.036385  0.024483  0.018379
기준  0.074132  0.039413  0.063130  0.066489  0.105556  0.035345  0.112694  0.062827  0.067764  0.075953  ...  0.060819  0.059437  0.054843  0.094882  0.075118  0.051180  0.052010  0.134896  0.103482  0.075633
개입  0.088372  0.047408  0.074913  0.090079  0.115153  0.090958  0.136792  0.080179  0.095009  0.093261  ...  0.074064  0.073469  0.066165  0.117637  0.081521  0.069918  0.114276  0.145172  0.129131  0.091866
있음  0.018250  0.010321  0.015041  0.032127  0.010429  0.079816  0.031313  0.023163  0.037407  0.022657  ...  0.017247  0.018440  0.014649  0.029956  0.006801  0.025572  0.088948  0.010433  0.033881  0.021102
매진  0.095737  0.050678  0.081709  0.080209  0.141906  0.020117  0.144257  0.078375  0.080074  0.096669  ...  0.077729  0.075393  0.070415  0.120167  0.101209  0.061440  0.039832  0.182388  0.130625  0.096786
소란  0.037455  0.019005  0.032632  0.010429  0.076200  -0.086674  0.051692  0.020431  0.003775  0.032557  ...  0.027386  0.024434  0.026023  0.038242  0.055140  0.006791  -0.085647  0.101635  0.039930  0.034565
아시아  0.036571  0.022812  0.028412  0.118746  -0.032773  0.405287  0.075061  0.072965  0.146456  0.059057  ...  0.042408  0.050089  0.033311  0.082786  -0.026710  0.095998  0.440937  -0.057672  0.096890  0.050848
확립  0.028651  0.014934  0.024641  0.018080  0.048316  -0.020714  0.041830  0.020562  0.016173  0.027442  ...  0.022407  0.021131  0.020642  0.033482  0.034684  0.013510  -0.016705  0.063145  0.035932  0.028032
균형  0.028949  0.018855  0.021844  0.114339  -0.046024  0.412614  0.064024  0.067692  0.142683  0.051858  ...  0.036505  0.044564  0.027849  0.074047  -0.036236  0.092735  0.447326  -0.075052  0.087545  0.043454
이달  0.016696  0.008984  0.014131  0.017723  0.021060  0.020364  0.026003  0.015492  0.018877  0.017796  ...  0.014094  0.014051  0.012552  0.022520  0.014879  0.013789  0.024995  0.026408  0.024772  0.017467
의뢰  0.119857  0.062739  0.102866  0.082410  0.195433  -0.056075  0.176522  0.089327  0.076567  0.116500  ...  0.094712  0.090037  0.086844  0.142903  0.140067  0.062095  -0.037140  0.254364  0.153930  0.118333
변화  0.040684  0.021722  0.034572  0.038825  0.055625  0.029937  0.062377  0.035621  0.040261  0.042271  ...  0.033715  0.033184  0.030268  0.053050  0.039493  0.030011  0.039828  0.070657  0.058038  0.041876
혼자  0.029366  0.015754  0.024892  0.029956  0.038242  0.030327  0.045460  0.026654  0.031601  0.030996  ...  0.024614  0.024419  0.021988  0.039100  0.027072  0.023252  0.038082  0.048207  0.042922  0.030530
박지원  0.048471  0.024475  0.042327  0.010433  0.101635  -0.125990  0.066201  0.024944  0.000857  0.041363  ...  0.034998  0.030880  0.033454  0.048207  0.073630  0.006266  -0.125639  0.135955  0.050040  0.044249
부분적  0.018250  0.010321  0.015041  0.032127  0.010429  0.079816  0.031313  0.023163  0.037407  0.022657  ...  0.017247  0.018440  0.014649  0.029956  0.006801  0.025572  0.088948  0.010433  0.033881  0.021102
핵심  0.095597  0.058138  0.075482  0.272299  -0.048049  0.887461  0.187581  0.172124  0.332726  0.144806  ...  0.105356  0.121725  0.084303  0.200452  -0.041548  0.219572  0.968491  -0.095678  0.232948  0.126917
이제  0.048013  0.025688  0.040757  0.047170  0.064312  0.041425  0.073919  0.042697  0.049290  0.050224  ...  0.039984  0.039488  0.035819  0.063172  0.045605  0.036529  0.053529  0.081433  0.069213  0.049632
배제  0.035531  0.019104  0.030084  0.037322  0.045208  0.041555  0.055249  0.032776  0.039652  0.037774  ...  0.029937  0.029806  0.026683  0.047760  0.031957  0.029020  0.051283  0.056771  0.052507  0.037109
감정  0.031317  0.016703  0.026626  0.029447  0.043252  0.021061  0.047916  0.027205  0.030414  0.032428  ...  0.025889  0.025438  0.023267  0.040652  0.030726  0.022739  0.028534  0.055024  0.044441  0.032166
비판적  0.024021  0.012642  0.020561  0.018250  0.037455  -0.003412  0.035770  0.018749  0.017626  0.023784  ...  0.019232  0.018463  0.017531  0.029366  0.026784  0.013872  0.000937  0.048471  0.031774  0.023988
문체부  0.024021  0.012642  0.020561  0.018250  0.037455  -0.003412  0.035770  0.018749  0.017626  0.023784  ...  0.019232  0.018463  0.017531  0.029366  0.026784  0.013872  0.000937  0.048471  0.031774  0.023988
일방적  0.035770  0.018987  0.030486  0.031313  0.051692  0.013585  0.054203  0.029940  0.031686  0.036456  ...  0.029235  0.028494  0.026407  0.045460  0.036816  0.024062  0.021381  0.066201  0.049522  0.036373
메시지  0.056574  0.030521  0.047819  0.062038  0.069404  0.077954  0.088562  0.053464  0.066572  0.060802  ...  0.048045  0.048090  0.042676  0.077139  0.048945  0.048358  0.094277  0.086619  0.084999  0.059499
보험  0.050097  0.025686  0.043430  0.020750  0.095205  -0.085239  0.070679  0.030648  0.013993  0.045254  ...  0.037610  0.034324  0.035301  0.053997  0.068705  0.014681  -0.081695  0.126110  0.057034  0.047303
현종  0.018250  0.010321  0.015041  0.032127  0.010429  0.079816  0.031313  0.023163  0.037407  0.022657  ...  0.017247  0.018440  0.014649  0.029956  0.006801  0.025572  0.088948  0.010433  0.033881  0.021102

2979 rows × 40 columns

In [220]:
# Original matrix: multiply the SVD factors back together
pd.DataFrame(U.dot(np.diag(sigma)).dot(Vt),index=coldata)
Out[220]:
0  1  2  3  4  5  6  7  8  9  ...  30  31  32  33  34  35  36  37  38  39
발언4.511668e-161.142421e-151.071848e-15-1.765938e-152.210952e-15-1.067545e-141.000000e+00-1.165556e-15-2.292612e-151.235156e-15...1.382884e-157.612421e-161.000000e+001.000000e+003.307632e-15-1.217122e-15-9.911487e-151.000000e+001.000000e+001.000000e+00
303.336287e-161.000000e+001.529144e-16-4.689882e-152.082784e-15-4.170081e-15-8.098133e-161.000000e+001.000000e+00-2.153584e-15...-1.284910e-151.000000e+001.000000e+002.341709e-161.272994e-151.000000e+00-5.632642e-151.457603e-15-1.259300e-15-1.814579e-15
포스-1.639695e-172.515408e-16-6.861209e-17-2.553020e-16-6.310523e-161.000000e+005.844549e-16-4.850555e-16-3.787971e-16-1.694962e-17...3.932637e-16-3.869439e-164.402897e-16-3.042305e-16-2.638965e-16-3.757404e-168.143798e-182.803780e-16-2.187497e-15-7.326933e-16
절전4.597195e-162.605088e-171.747082e-166.190021e-162.299841e-163.819965e-161.053478e-16-1.318783e-161.812797e-156.606538e-16...2.092256e-164.534301e-164.135441e-161.460791e-155.581437e-161.355660e-153.994348e-161.682267e-15-9.774098e-171.812806e-16
두고-9.180194e-17-1.326430e-174.000906e-16-6.704289e-162.870614e-16-5.066714e-162.815634e-15-4.790930e-17-1.967159e-165.823311e-16...1.000000e+00-2.566413e-164.755388e-161.249544e-15-8.239925e-16-8.174645e-17-1.826759e-159.388768e-16-3.698883e-161.000000e+00
분산-2.440442e-161.496535e-164.972386e-173.986480e-17-4.885358e-16-6.848354e-165.208930e-162.819547e-16-2.020721e-163.177013e-17...-1.337750e-162.098993e-16-3.612618e-165.166253e-16-7.044050e-167.309626e-161.000000e+00-4.136106e-165.195663e-17-6.396306e-16
촉구5.457131e-161.710637e-161.000000e+001.961934e-162.756229e-15-4.571046e-15-2.350097e-15-4.038126e-16-1.390469e-151.000000e+00...-9.196288e-163.200655e-161.497461e-15-1.115237e-151.462282e-15-3.325721e-16-3.729321e-154.344065e-151.000000e+006.762754e-16
강물-2.468931e-172.512921e-16-9.468087e-17-1.243414e-16-3.090262e-161.000000e+00-1.002925e-16-1.741639e-167.681702e-161.348089e-15...6.354412e-164.937574e-163.883982e-161.479050e-15-1.054803e-16-7.584244e-16-1.750137e-15-1.566978e-168.868654e-16-4.231038e-17
평가1.000000e+001.368110e-163.204847e-161.000000e+001.174513e-151.000000e+005.640569e-17-4.087164e-16-1.651178e-15-9.184983e-16...2.452692e-161.000000e+009.705906e-161.000000e+007.212501e-16-3.409187e-15-3.183174e-151.916508e-153.044472e-164.516649e-16
사이트3.929215e-161.000000e+005.630138e-16-2.180729e-164.098521e-16-1.660619e-157.063678e-166.006623e-16-3.579097e-16-7.800961e-16...5.940034e-16-1.116837e-152.745919e-163.315917e-167.164963e-16-1.390471e-15-3.728440e-158.808758e-17-1.502040e-15-6.700611e-17
나발-4.542461e-172.577704e-16-1.012918e-16-1.020088e-16-3.116262e-161.000000e+00-1.393816e-161.210821e-161.051449e-166.287759e-17...5.953511e-165.553037e-161.509967e-151.160403e-156.589474e-163.466416e-168.615199e-16-1.705346e-16-1.762902e-152.870787e-15
일훈3.006819e-162.217434e-16-2.640711e-16-5.038979e-163.987142e-16-5.382811e-16-1.101200e-162.215327e-16-6.029598e-16-8.623856e-17...1.160702e-171.830631e-157.902401e-183.914009e-162.786844e-16-1.391746e-16-1.132388e-159.442215e-162.189720e-165.009731e-16
참사3.021628e-161.140508e-168.276296e-16-1.106820e-157.501103e-16-1.109252e-151.352104e-162.183515e-16-5.940927e-167.727204e-16...9.802820e-168.650158e-161.000000e+00-5.618253e-162.474275e-15-4.107559e-16-9.597439e-16-8.225012e-163.288632e-162.120899e-15
부업2.666319e-16-1.499422e-165.507429e-171.656491e-161.000000e+00-9.128753e-164.659564e-16-2.311182e-163.378026e-161.109353e-16...-3.335360e-176.550148e-17-4.654981e-16-6.514475e-17-2.170014e-166.502927e-18-1.873559e-15-2.198138e-151.054924e-15-1.129764e-16
30001.945806e-166.705649e-172.457529e-162.233831e-16-2.335262e-17-6.564076e-162.515107e-164.623586e-161.094529e-16-1.293552e-16...-7.948876e-16-2.203896e-161.832962e-165.945960e-161.625308e-163.999249e-16-7.416788e-162.989951e-163.129241e-176.032096e-16
강경3.979218e-162.075387e-16-4.968640e-16-4.595904e-161.114898e-15-1.178534e-151.000000e+001.418369e-16-1.731612e-16-4.658473e-16...1.443939e-163.973923e-161.116800e-174.762633e-165.272900e-16-6.557216e-16-1.124734e-151.439950e-15-9.307373e-16-3.423256e-16
경원1.289635e-158.717351e-165.435182e-16-2.268144e-153.274809e-15-7.706843e-151.000000e+001.000000e+00-3.136795e-15-1.089577e-15...2.001833e-157.736920e-164.064411e-161.000000e+002.635201e-15-1.601846e-15-9.738861e-153.964623e-151.000000e+001.000000e+00
말씀1.633328e-16-1.102160e-16-3.678172e-16-1.744641e-161.000000e+001.000000e+006.472868e-16-6.478147e-165.073468e-164.828009e-16...5.303748e-16-7.048546e-167.724059e-169.321540e-172.041717e-15-9.455163e-161.000000e+00-9.450553e-16-2.034927e-16-2.318517e-17
보안법1.180645e-16-7.398370e-183.564967e-17-2.723914e-161.059019e-16-6.846583e-163.885914e-161.757864e-16-5.177088e-166.983360e-17...8.978497e-165.215270e-16-4.413166e-151.178671e-155.986048e-16-5.750490e-16-4.024094e-162.214893e-16-1.642945e-151.836392e-17
채널1.407318e-151.000000e+001.000000e+001.000000e+004.630984e-15-1.086535e-14-1.233736e-16-1.194865e-151.000000e+001.000000e+00...1.082688e-15-9.661867e-161.000000e+001.000000e+001.000000e+00-2.539260e-15-1.533968e-141.000000e+001.000000e+002.884800e-15
자동차-6.313339e-182.669006e-16-8.218223e-17-1.343520e-16-2.465707e-161.000000e+00-9.776055e-171.360615e-166.109589e-177.372563e-17...-1.134761e-15-1.129681e-166.586643e-161.218094e-162.513428e-16-5.075660e-16-8.593307e-167.527663e-164.097772e-173.614712e-16
만이5.062875e-174.685358e-16-6.774259e-16-6.473326e-162.024655e-15-2.315017e-151.000000e+00-3.180736e-16-9.357807e-16-7.241269e-16...-1.276827e-17-3.275366e-175.773009e-16-1.284923e-151.972824e-15-1.960978e-16-1.211241e-152.403594e-151.000000e+00-9.307931e-16
민주화-7.885390e-172.659273e-161.499327e-163.337361e-16-1.454992e-16-1.260415e-153.386095e-165.406702e-17-8.827522e-172.749863e-16...4.430796e-16-3.101933e-161.465007e-16-5.806546e-161.820320e-17-4.209278e-16-3.018863e-151.000000e+001.122912e-15-3.567432e-17
신기4.187398e-16-3.464448e-16-9.676182e-16-2.753753e-182.821835e-16-8.138597e-162.525641e-16-4.761422e-162.221648e-161.930931e-16...1.304184e-15-4.709820e-169.833192e-166.964750e-16-5.152953e-16-4.814632e-16-2.083809e-15-1.308965e-16-1.606136e-15-5.142800e-16
여건1.220887e-162.506478e-162.254202e-16-7.037113e-161.931531e-16-4.827899e-16-2.299824e-16-1.865925e-161.898593e-162.339320e-16...5.948592e-174.690015e-16-2.033156e-16-8.059062e-175.631545e-171.000000e+00-3.458103e-162.524948e-16-8.198460e-16-2.849071e-16
시베리아5.862109e-162.691324e-171.774368e-163.656700e-169.031377e-17-3.895657e-161.098044e-17-2.101936e-16-4.556776e-16-1.800539e-16...-1.484648e-153.570037e-169.933804e-162.594783e-16-5.111784e-161.205011e-16-8.961277e-166.252006e-161.187277e-15-5.102914e-16
호통-9.331040e-171.079745e-167.419846e-164.794175e-161.000000e+00-2.812850e-159.231259e-16-8.856258e-167.925280e-16-9.764165e-17...1.074559e-155.816594e-16-3.052167e-161.455149e-15-6.168711e-161.903078e-16-3.834766e-151.000000e+001.407430e-15-4.013017e-17
시스1.721318e-163.824994e-16-3.103303e-16-6.615510e-165.632508e-16-1.733232e-15-2.015049e-16-4.742026e-16-9.297764e-162.851297e-16...2.116321e-151.689165e-15-1.346595e-151.000000e+001.392831e-15-2.286779e-16-2.090036e-159.913909e-16-1.259248e-16-2.141422e-16
호치민5.042455e-169.002023e-182.131416e-164.408888e-161.174915e-16-4.206274e-166.190753e-17-2.153976e-16-4.687856e-16-1.901210e-16...-4.200499e-169.477919e-167.593250e-16-8.339204e-162.593931e-16-2.306440e-16-1.006097e-159.275090e-16-4.191583e-16-5.165477e-17
교통부1.000000e+001.283010e-161.682673e-16-1.346571e-161.922969e-16-8.951294e-167.842355e-173.575696e-17-1.878072e-16-1.169783e-17...-1.325842e-162.245894e-163.257213e-16-1.131536e-152.569633e-16-2.566153e-162.254166e-16-1.602070e-165.092335e-171.710269e-16
..................................................................
부끄러움1.269359e-17-1.736880e-185.042811e-165.068361e-161.000000e+00-2.795013e-151.188689e-15-3.514061e-17-5.229979e-177.186929e-16...6.535474e-16-2.897751e-167.576655e-176.418363e-161.000000e+005.169170e-17-4.091547e-151.000000e+001.030852e-163.656845e-16
가능1.565054e-171.388220e-16-5.947793e-16-1.533307e-151.204327e-15-2.122293e-15-6.123341e-162.697781e-171.000000e+001.000000e+00...-4.059552e-172.159083e-16-7.996950e-166.531443e-171.364121e-15-8.186006e-16-2.237341e-152.408964e-151.000000e+003.251166e-16
벤처1.000000e+001.017455e-161.768183e-16-1.055311e-162.225640e-16-8.616252e-164.797987e-176.331525e-17-1.730787e-16-2.263154e-17...-1.575282e-168.585896e-172.679593e-16-2.277127e-16-7.375566e-17-2.096159e-17-7.628661e-162.659221e-16-1.277343e-161.124512e-16
어제6.874632e-16-2.057264e-164.562329e-16-1.693421e-151.123881e-15-1.586557e-15-4.060148e-174.246575e-16-9.149167e-161.000000e+00...2.715356e-161.703363e-161.000000e+002.710436e-161.263911e-15-1.618567e-16-2.320301e-153.333905e-166.797370e-169.946882e-16
칭호-3.023440e-163.256248e-16-9.636503e-17-1.763011e-163.343751e-16-3.238237e-163.222644e-161.248441e-16-4.184457e-16-3.545329e-16...2.520550e-161.551038e-16-2.522339e-161.202454e-161.894667e-16-6.155780e-16-6.150971e-163.675251e-163.822671e-16-2.937485e-16
아이러니1.600374e-16-3.533154e-166.601704e-16-5.908843e-165.455166e-16-4.477240e-16-1.793479e-16-1.472798e-16-2.507661e-162.055540e-16...-5.275390e-17-3.007479e-162.286264e-16-3.714682e-167.616648e-17-2.113635e-16-6.291345e-16-2.911885e-16-1.637698e-162.084617e-16
기준1.000000e+00-4.422601e-167.437923e-16-6.165874e-168.869455e-16-2.283853e-15-9.780911e-16-5.864349e-16-7.094002e-162.900493e-16...-1.955109e-16-1.796286e-167.301612e-16-7.106162e-164.322825e-16-1.994184e-16-1.777998e-159.208457e-161.000000e+009.223277e-17
개입1.150305e-157.173939e-161.000000e+00-2.448920e-167.142415e-16-1.720427e-15-5.075911e-16-9.878014e-17-2.146405e-151.000000e+00...-2.063949e-161.939317e-161.168311e-15-4.594299e-163.695250e-167.081793e-17-2.819325e-152.105320e-153.987370e-166.541624e-16
있음-1.428390e-16-3.466876e-16-2.386881e-171.000000e+004.704339e-16-5.033291e-16-4.017279e-16-3.798596e-16-4.623307e-16-1.496321e-16...-4.256385e-16-7.553894e-16-3.096594e-172.794540e-161.948332e-16-2.457131e-16-7.229393e-166.593767e-161.654162e-161.724140e-16
매진3.106727e-161.435648e-166.932490e-16-1.258877e-151.000000e+00-2.509813e-157.212015e-16-4.724614e-171.000000e+005.827422e-16...4.192791e-165.856879e-17-5.279074e-164.442819e-161.000000e+001.000000e+00-3.330754e-151.012998e-15-5.786201e-167.057395e-16
소란2.073391e-16-1.764376e-163.247439e-171.851661e-161.000000e+00-9.536697e-165.239234e-16-2.032505e-164.262527e-166.608960e-17...-8.706450e-17-3.120747e-16-2.308651e-174.800220e-162.119441e-163.454101e-16-1.247311e-15-3.913659e-169.147912e-173.779023e-16
아시아5.084913e-16-4.393881e-17-1.181097e-15-1.889898e-165.749909e-171.000000e+00-1.225573e-17-3.788992e-161.812197e-163.118365e-16...9.263653e-16-1.276196e-176.350603e-163.146377e-163.087344e-16-7.518176e-16-1.549606e-156.060662e-166.894905e-163.784213e-16
확립-1.951192e-164.002892e-16-2.477151e-161.227879e-164.330421e-17-6.425090e-16-3.128465e-171.098239e-16-6.019509e-17-2.225742e-16...1.236228e-161.378799e-16-2.429613e-16-1.899113e-16-6.777494e-17-2.984465e-17-1.001192e-151.248363e-168.227746e-172.760995e-16
균형-2.273374e-173.994881e-16-2.087908e-16-3.848397e-16-1.759686e-16-1.052719e-152.781553e-163.203560e-16-6.002874e-16-1.595189e-16...2.113147e-164.253309e-162.519920e-171.971263e-16-1.198314e-16-6.156241e-171.000000e+00-2.605469e-162.622405e-161.631526e-16
이달-8.912446e-17-3.796699e-167.725550e-17-8.359802e-171.401538e-16-3.683840e-16-5.032360e-16-4.418148e-163.442211e-16-4.756102e-16...3.742773e-161.889777e-168.301251e-18-6.376420e-171.432189e-172.204580e-16-4.037298e-162.933496e-162.037987e-16-6.051803e-17
의뢰2.207398e-165.251605e-161.000000e+00-9.668526e-161.460238e-15-2.604289e-151.847022e-17-9.146816e-16-7.068540e-16-3.211261e-16...1.251900e-152.291694e-161.000000e+001.000000e+001.027399e-163.318693e-16-3.901783e-151.860345e-152.513289e-161.230600e-15
변화-1.200636e-16-3.231444e-164.442827e-16-4.310002e-164.821337e-17-1.162889e-15-5.872965e-16-5.351615e-169.550488e-16-3.161273e-16...2.624641e-16-1.195166e-166.596578e-179.351860e-184.495845e-171.062827e-16-1.366489e-153.824295e-162.050309e-161.000000e+00
혼자1.616156e-161.190886e-16-1.237939e-16-4.727851e-16-3.316469e-17-7.678253e-161.185351e-16-2.598935e-16-3.814766e-161.150709e-17...4.669069e-162.468203e-16-5.732111e-161.000000e+001.553072e-161.715588e-16-6.511977e-161.125986e-15-1.732467e-166.372093e-16
박지원-5.665050e-172.862771e-161.475419e-163.608009e-16-1.971568e-16-1.263449e-153.335290e-168.629444e-17-1.158498e-162.509216e-16...2.966789e-162.205559e-16-4.983018e-171.154984e-161.724135e-163.546417e-17-2.047834e-151.000000e+001.432283e-16-1.690714e-16
부분적-1.411649e-16-3.558471e-16-3.618491e-171.000000e+004.794992e-16-5.051486e-16-3.876431e-16-3.734489e-16-4.673708e-16-1.485214e-16...-4.106196e-16-7.469626e-16-2.262881e-172.958177e-161.958099e-16-2.238599e-16-7.087331e-166.864393e-161.669806e-161.668000e-16
핵심-5.165226e-178.930469e-16-9.937413e-16-1.692253e-154.007618e-161.000000e+006.476327e-178.918484e-161.000000e+001.000000e+00...4.469924e-168.449680e-16-6.680409e-169.600721e-184.916428e-16-3.664129e-161.000000e+009.786720e-164.122725e-161.281340e-15
이제-1.978152e-17-2.846498e-166.244395e-174.424334e-161.078370e-15-1.214058e-15-5.694480e-16-4.221536e-161.956979e-16-2.150542e-16...-1.709340e-165.492586e-163.608405e-16-6.576984e-178.124091e-163.364827e-16-7.728318e-168.070056e-164.797450e-17-6.968775e-17
배제-3.413509e-17-1.668367e-16-3.315866e-17-1.114470e-165.481836e-16-6.394383e-165.062442e-16-1.671202e-16-5.829587e-16-4.615162e-17...1.000000e+00-4.079784e-169.285057e-171.223855e-167.365089e-16-6.228133e-16-1.087219e-15-2.312131e-16-2.009094e-16-4.499170e-16
감정2.545475e-173.174320e-18-3.546369e-175.714128e-168.877861e-16-8.265459e-161.404766e-17-1.151509e-16-2.968444e-163.122962e-16...-6.280357e-161.398827e-163.031013e-16-8.445067e-177.035527e-16-1.286183e-16-3.939719e-165.902020e-16-1.389763e-16-1.072180e-16
비판적1.000000e+001.304477e-161.824123e-16-1.461290e-161.599323e-16-8.585250e-163.159116e-176.584922e-17-1.716097e-16-1.181601e-17...-9.085250e-171.301921e-162.440096e-16-1.893784e-16-1.262320e-16-5.148840e-17-7.322998e-163.560617e-16-1.570949e-163.235906e-17
문체부1.000000e+001.582033e-161.824123e-16-1.426595e-161.460545e-16-8.602598e-163.506061e-176.584922e-17-1.854874e-16-2.569380e-17...-6.222956e-171.440699e-162.370707e-16-1.789701e-16-1.262320e-16-2.373282e-17-7.305651e-163.560617e-16-1.432172e-161.327710e-17
일방적1.007401e-162.471163e-16-1.248165e-16-5.636086e-165.524940e-16-8.829365e-161.000000e+00-1.222899e-16-1.801963e-16-2.451677e-16...4.453535e-161.884088e-161.565845e-16-1.184261e-181.700217e-16-7.309473e-17-4.045143e-165.810719e-16-1.052485e-151.378215e-16
메시지6.623673e-16-3.987856e-169.564588e-16-3.271697e-169.715764e-16-1.051376e-151.801281e-16-9.127584e-16-1.139223e-151.181844e-16...1.000000e+002.132760e-186.171452e-16-2.203443e-162.421658e-16-9.396799e-16-1.616319e-151.741008e-16-6.131815e-18-2.251124e-16
보험2.074556e-161.000000e+001.592850e-16-6.650689e-171.000000e+00-1.403406e-153.984420e-16-2.250567e-164.049823e-16-1.181218e-16...-3.856852e-16-8.476240e-164.089501e-165.585453e-161.676714e-162.166523e-16-1.731809e-15-3.141403e-16-4.640783e-171.758170e-16
현종-1.481710e-16-3.571441e-16-4.303619e-171.000000e+004.635859e-16-4.481806e-16-3.684724e-16-3.732042e-16-5.042715e-16-1.261339e-16...-3.985930e-16-7.667819e-16-4.758792e-173.353757e-161.826931e-16-2.473741e-16-7.490404e-166.756209e-161.293977e-161.497172e-16

2979 rows × 40 columns
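
The tiny entries in the output above (for example 4.5e-16) are floating-point rounding error, so the product of the three factors matches the original matrix only up to a small tolerance. Below is a minimal sketch of how that could be checked. It uses a small made-up matrix `X_toy` (hypothetical, not the news data) and assumes the factors come from `np.linalg.svd` with `full_matrices=False`, which is consistent with the `(2979, 40)` shape of `U` here but is not shown in this part of the post.

```python
import numpy as np

# Hypothetical toy term-document matrix (4 words x 3 documents), not the news data.
X_toy = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

# Reduced SVD: U_toy is (4, 3), sigma_toy is (3,), Vt_toy is (3, 3).
U_toy, sigma_toy, Vt_toy = np.linalg.svd(X_toy, full_matrices=False)

# Multiply the factors back together, as in the cell above.
X_rebuilt = U_toy.dot(np.diag(sigma_toy)).dot(Vt_toy)

# The entries differ from X_toy only by rounding error, so compare with a tolerance.
print(np.allclose(X_rebuilt, X_toy))

# Rounding makes the result easier to read than raw values like 4.5e-16.
print(np.round(X_rebuilt, 6))
```

Comparing with `np.allclose` rather than `==` avoids false mismatches caused by the rounding error.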

Word-to-word cosine similarity

In [221]:
_U = U.dot(np.diag(sigma))
_U.shape
Out[221]:
(2979, 40)
In [224]:
# Cosine similarity between every pair of words
# (each row of _U is one word vector)
pd.DataFrame(_U.dot(_U.T) / (np.linalg.norm(_U, axis=1).reshape(2979,1) * np.linalg.norm(_U.T, axis=0).reshape(1,2979)),index=coldata,columns=coldata)
Out[224]:
발언  30  포스  절전  두고  분산  촉구  강물  평가  사이트  ...  핵심  이제  배제  감정  비판적  문체부  일방적  메시지  보험  현종
발언1.000000e+002.672612e-01-4.031939e-151.547491e-151.889822e-01-3.079241e-152.834734e-01-2.098961e-152.020305e-016.602492e-16...2.182179e-011.889822e-014.523622e-162.672612e-011.569177e-161.485725e-162.672612e-011.543033e-018.295622e-16-1.824265e-16
302.672612e-011.000000e+00-1.805800e-151.205687e-15-4.431395e-16-1.872207e-15-2.980433e-16-1.141319e-152.519763e-011.924501e-01...2.721655e-012.428841e-16-8.793233e-164.596401e-18-4.104865e-17-4.041213e-18-4.404116e-17-1.210510e-152.357023e-01-1.997698e-15
포스-4.031939e-15-1.805800e-151.000000e+007.312311e-16-6.906868e-16-1.029443e-16-3.118161e-151.000000e+003.779645e-01-9.608891e-16...4.082483e-01-1.731820e-15-3.422214e-16-1.448835e-15-8.184704e-16-8.202051e-16-3.368393e-16-5.469588e-16-1.120320e-15-3.992337e-16
절전1.547491e-151.205687e-157.312311e-161.000000e+005.678522e-167.002979e-161.368988e-156.886505e-165.943712e-16-4.055891e-16...1.143539e-15-2.766023e-16-2.454099e-17-3.161317e-161.906062e-175.182829e-18-1.969417e-165.773503e-014.087718e-168.092316e-16
두고1.889822e-01-4.431395e-16-6.906868e-165.678522e-161.000000e+00-1.701985e-159.934992e-17-1.121916e-178.016095e-17-5.637757e-16...5.326886e-168.212049e-175.000000e-01-2.239780e-161.299002e-161.403266e-162.088913e-154.082483e-01-1.632470e-16-4.189833e-16
분산-3.079241e-15-1.872207e-15-1.029443e-167.002979e-16-1.701985e-151.000000e+00-8.995014e-16-1.862092e-15-1.865657e-15-1.718013e-15...4.082483e-01-5.156016e-16-7.403722e-16-3.839905e-16-8.175738e-16-8.175738e-164.207800e-17-1.387303e-15-1.092039e-15-8.893801e-16
촉구2.834734e-01-2.980433e-16-3.118161e-151.368988e-159.934992e-17-8.995014e-161.000000e+00-1.007217e-15-7.169167e-16-1.160861e-15...1.443376e-015.000000e-01-2.787964e-163.535534e-01-1.508455e-17-2.489763e-17-7.814280e-164.547199e-164.215397e-161.630744e-16
강물-2.098961e-15-1.141319e-151.000000e+006.886505e-16-1.121916e-17-1.862092e-15-1.007217e-151.000000e+003.779645e-01-1.273260e-15...4.082483e-01-1.340623e-15-8.124476e-17-1.732294e-15-8.125122e-16-8.142470e-16-1.020719e-15-4.700472e-16-8.895341e-16-2.685442e-16
평가2.020305e-012.519763e-013.779645e-015.943712e-168.016095e-17-1.865657e-15-7.169167e-163.779645e-011.000000e+00-2.194008e-16...1.543033e-01-2.750663e-17-4.065218e-164.421610e-163.779645e-013.779645e-01-5.085478e-162.182179e-01-5.045943e-163.779645e-01
사이트6.602492e-161.924501e-01-9.608891e-16-4.055891e-16-5.637757e-16-1.718013e-15-1.160861e-15-1.273260e-15-2.194008e-161.000000e+00...-8.732159e-161.720707e-17-5.186785e-165.191784e-161.566534e-161.566534e-161.011546e-15-4.871482e-164.082483e-012.645252e-18
나발-1.990028e-15-1.360914e-151.000000e+001.645808e-151.996906e-157.510281e-16-2.923233e-151.000000e+003.779645e-01-4.229613e-16...4.082483e-01-8.799124e-163.822305e-17-3.214967e-16-8.449298e-16-8.466645e-16-1.063285e-156.829410e-16-8.942223e-16-2.443685e-16
일훈2.672612e-013.333333e-01-9.861456e-17-2.209970e-169.079508e-16-1.349699e-151.410664e-16-4.086480e-167.261561e-161.228368e-15...4.082483e-018.732022e-165.959272e-168.876378e-167.907563e-177.213674e-17-6.186277e-17-1.198068e-164.861792e-16-6.090468e-16
참사4.629100e-011.924501e-01-7.112279e-161.164378e-151.497523e-15-1.534184e-151.033762e-15-6.393180e-162.182179e-019.706597e-16...-7.390398e-164.714437e-16-9.795538e-172.786236e-162.361971e-162.311894e-16-5.851668e-173.333333e-013.519996e-16-8.712875e-16
부업1.070807e-158.478870e-16-1.179621e-156.466294e-16-3.233341e-17-1.781403e-159.543648e-16-8.835599e-167.982555e-174.334738e-16...-2.624818e-163.406332e-163.216397e-161.802722e-162.511500e-162.303333e-166.695421e-169.948147e-167.071068e-013.291406e-16
30001.532009e-151.038522e-15-5.776181e-16-7.818372e-16-7.495561e-16-1.428608e-159.934465e-163.035353e-162.482833e-165.773503e-01...-6.100627e-163.412069e-16-1.212552e-163.859861e-162.097688e-161.958910e-164.790479e-16-1.087276e-152.417842e-166.849652e-16
강경1.889822e-01-1.508250e-16-6.596480e-16-9.513827e-161.342132e-15-3.764467e-162.500000e-01-4.808572e-16-3.514756e-171.295287e-16...-5.429130e-16-7.275317e-16-4.822874e-16-4.120275e-162.377516e-162.426582e-167.071068e-01-9.363677e-164.320060e-16-5.896999e-16
경원5.640761e-013.015113e-01-3.764325e-155.454889e-162.132007e-01-1.995141e-154.264014e-01-2.054144e-151.139606e-011.740777e-01...1.230915e-012.132007e-017.934186e-163.015113e-011.753291e-161.690526e-163.015113e-017.684498e-175.978865e-16-4.458086e-16
말씀-2.022059e-15-1.213589e-155.000000e-015.325882e-16-8.647979e-165.000000e-01-1.416709e-155.000000e-011.889822e-01-1.170850e-15...4.082483e-01-3.560547e-16-2.738033e-16-2.348565e-16-5.874369e-16-5.874369e-16-2.718188e-16-3.601166e-163.535534e-01-5.811463e-16
보안법-1.187865e-15-1.490129e-15-8.834375e-16-2.424362e-16-3.356497e-179.541403e-16-8.473820e-16-2.294154e-152.278574e-165.773503e-01...7.976262e-17-1.886646e-168.423345e-16-2.185495e-163.524890e-171.096277e-171.194490e-15-5.833658e-16-4.987346e-16-1.937304e-16
채널3.585686e-012.981424e-01-2.779920e-152.236068e-016.742444e-16-3.752754e-153.162278e-01-9.008705e-162.535463e-013.872983e-01...2.738613e-014.867044e-16-2.263283e-169.073905e-168.719741e-179.340374e-172.812267e-161.290994e-011.581139e-012.236068e-01
자동차-2.423509e-15-1.796684e-151.000000e+002.775422e-16-9.820412e-16-9.696599e-16-1.724844e-151.000000e+003.779645e-01-4.717490e-16...4.082483e-01-7.413625e-16-1.432484e-15-5.726770e-16-8.010210e-16-8.010210e-16-1.023398e-15-8.822222e-16-8.499310e-16-3.059445e-16
만이2.672612e-011.666667e-01-1.593529e-15-3.594393e-165.543611e-16-8.018047e-163.535534e-01-3.818932e-161.889822e-01-4.067924e-16...-9.940897e-16-7.994824e-162.728278e-16-6.089410e-16-7.187808e-17-6.493919e-175.000000e-01-1.822301e-161.985616e-16-4.797663e-16
민주화2.672612e-014.894895e-16-9.111893e-161.694840e-151.236439e-15-3.042822e-152.042087e-15-1.338968e-15-3.329743e-163.446112e-16...-1.004869e-156.903158e-165.190756e-167.337792e-161.564993e-161.842549e-167.287847e-164.200757e-161.388336e-168.034669e-16
신기1.179237e-168.097646e-16-5.003771e-16-1.270199e-162.197437e-16-1.672671e-152.500000e-01-3.286274e-17-2.368109e-16-4.099551e-16...-5.769326e-16-1.815313e-164.518657e-169.935570e-172.403939e-162.502070e-166.723099e-16-5.810992e-171.401019e-17-5.209760e-17
여건-6.828427e-163.333333e-01-5.588181e-161.007427e-15-2.564426e-161.825763e-16-5.453939e-16-9.233485e-16-1.554519e-15-4.279073e-16...-1.177946e-16-1.744475e-16-4.984487e-16-1.389139e-16-6.544481e-17-6.544481e-17-1.756768e-16-7.773842e-164.083820e-16-7.908555e-16
시베리아8.840799e-16-6.344168e-17-7.147487e-171.000000e+00-1.122578e-15-6.025291e-161.036844e-15-8.412810e-17-1.501819e-16-1.425827e-15...-9.402002e-16-6.929379e-16-1.529615e-15-5.618828e-161.271070e-161.132293e-16-2.826490e-165.773503e-013.010031e-166.754328e-16
호통1.543033e-011.136791e-15-1.381280e-151.427338e-151.044314e-15-2.482535e-152.220736e-15-1.565057e-158.090835e-166.429731e-16...-4.612072e-161.045819e-158.378038e-161.063494e-151.291103e-161.210980e-161.124866e-151.231586e-154.082483e-017.405610e-16
시스3.086067e-011.924501e-01-8.626135e-164.306964e-169.839793e-16-7.754337e-16-6.230571e-168.858978e-184.364358e-014.183501e-16...-1.246887e-15-2.708082e-161.033099e-151.629011e-17-5.575005e-17-5.775314e-17-9.281776e-17-3.070160e-171.882064e-16-2.767250e-16
호치민2.758671e-16-2.086169e-16-6.223133e-171.000000e+00-4.660727e-17-7.119020e-163.356821e-16-1.017457e-16-3.044839e-16-1.001298e-15...-1.199733e-15-7.764533e-16-7.385508e-16-6.071734e-166.487416e-175.099637e-17-2.437566e-165.773503e-012.973974e-166.931617e-16
교통부-2.450926e-16-3.139535e-16-8.489829e-165.671229e-172.119374e-161.446691e-167.322861e-17-8.394943e-163.779645e-011.860673e-16...-5.944910e-161.632883e-16-2.938332e-169.582427e-171.000000e+001.000000e+001.401776e-161.633676e-161.375423e-16-1.108862e-16
..................................................................
부끄러움1.543033e-018.204404e-16-1.694243e-151.798978e-155.765568e-16-2.620976e-151.925605e-15-1.664377e-154.182624e-164.764923e-16...-6.492529e-169.815038e-166.925882e-161.073444e-151.389439e-161.349377e-161.136582e-159.342037e-164.082483e-018.293488e-16
가능1.336306e-013.333333e-01-2.129361e-159.936105e-161.840266e-16-1.527726e-153.535534e-011.890843e-161.889822e-01-6.790769e-16...4.082483e-01-1.795059e-16-1.500591e-165.775927e-17-4.130845e-16-4.200234e-16-3.341468e-16-3.263729e-16-2.504497e-17-9.831505e-16
벤처1.465616e-16-6.833597e-17-8.098341e-16-1.065432e-171.469673e-16-8.491971e-16-4.979419e-17-8.151178e-163.779645e-011.692606e-16...-9.723270e-161.897723e-16-2.974879e-168.601694e-171.000000e+001.000000e+001.340201e-169.909559e-171.181449e-16-7.568870e-17
어제4.008919e-011.666667e-01-1.015545e-151.498499e-158.610362e-16-1.907849e-151.767767e-01-2.652066e-161.889822e-013.949892e-16...2.041241e-015.321215e-16-3.598428e-168.412977e-161.984790e-161.967442e-16-2.481363e-162.886751e-013.289840e-16-1.099642e-15
칭호1.655131e-163.333333e-013.954247e-16-5.674830e-16-1.946303e-16-1.424620e-15-2.607685e-16-5.395634e-163.779645e-01-2.314916e-16...-1.363442e-15-4.375153e-163.850119e-163.403021e-17-2.899521e-16-2.886511e-168.454096e-16-4.635769e-16-3.770104e-16-5.462657e-16
아이러니2.672612e-01-7.527508e-16-9.246011e-166.275262e-163.179035e-16-1.730666e-155.120363e-16-1.016764e-153.779645e-01-1.480828e-16...-1.667039e-165.677957e-17-2.670453e-161.957916e-161.988127e-162.126905e-16-5.343087e-165.773503e-011.503970e-16-7.397933e-16
기준3.086067e-01-9.366787e-16-2.620001e-154.153618e-161.353856e-16-1.791282e-152.041241e-01-8.977938e-164.364358e-01-4.703431e-16...-7.909057e-166.885576e-17-2.080051e-16-1.129141e-165.773503e-015.773503e-01-1.018439e-153.333333e-013.265807e-17-1.570521e-16
개입1.374363e-15-4.968431e-16-1.134199e-155.000000e-011.917039e-16-6.435044e-165.303301e-01-3.850819e-16-9.315854e-16-2.281346e-16...2.041241e-01-2.474854e-16-1.311648e-162.436532e-161.694543e-161.555765e-16-4.154837e-162.886751e-015.874866e-16-2.733263e-16
있음-2.130203e-16-2.038423e-15-4.180614e-167.614281e-16-4.156377e-16-8.489675e-161.963651e-16-3.147073e-163.779645e-016.724061e-18...-1.469305e-155.713513e-16-2.784531e-168.663778e-16-1.265129e-16-1.230434e-16-1.019196e-15-2.338447e-16-9.010830e-171.000000e+00
매진6.490871e-163.333333e-01-1.893976e-152.035965e-151.886556e-17-1.648659e-154.880647e-16-1.251826e-15-1.089814e-151.133122e-16...2.041241e-015.631877e-16-1.654325e-165.823554e-16-1.258138e-16-1.327526e-166.147019e-16-6.003738e-173.535534e-01-6.087419e-16
소란1.001860e-157.224031e-16-1.249689e-156.915044e-162.976880e-16-1.168708e-151.039647e-15-9.278779e-162.728889e-16-2.083528e-16...6.399128e-175.980881e-162.979605e-165.089443e-161.885977e-161.677811e-167.135770e-168.046681e-167.071068e-013.657744e-16
아시아-8.652082e-16-2.228156e-165.773503e-014.976037e-171.577740e-16-1.127167e-152.041241e-015.773503e-012.182179e-01-8.752627e-16...2.357023e-01-5.476476e-16-9.193232e-17-2.342298e-16-2.082962e-16-1.842592e-16-5.714552e-17-5.572780e-16-4.561401e-16-2.264465e-16
확립2.672612e-01-1.314673e-17-1.723770e-15-2.214627e-161.073361e-15-2.225639e-157.943987e-16-3.342901e-163.241804e-17-5.001838e-17...4.082483e-014.187739e-16-3.553113e-174.652016e-16-1.719474e-16-1.997030e-16-2.430389e-166.492411e-164.630460e-16-5.053392e-17
균형1.889822e-012.357023e-01-1.121773e-171.234640e-16-2.685847e-167.071068e-01-8.061260e-16-1.489033e-15-1.228053e-15-7.973260e-16...5.773503e-01-2.873849e-169.763444e-17-3.111645e-16-5.711362e-16-5.527366e-16-1.379065e-16-7.318125e-16-4.540800e-16-9.980189e-16
이달6.622936e-168.526817e-17-9.845618e-16-7.320804e-173.062687e-16-3.642769e-163.535534e-01-1.646712e-16-5.415693e-16-5.175999e-16...-4.307152e-167.071068e-011.040311e-16-1.544741e-167.915257e-177.915257e-17-5.453564e-17-1.561105e-16-2.996262e-16-9.470969e-18
의뢰2.390457e-011.490712e-01-6.080700e-169.646622e-161.008042e-15-2.042172e-151.581139e-012.038280e-161.690309e-011.046467e-15...-1.320712e-158.006338e-162.320170e-166.617438e-169.946819e-179.946819e-172.121939e-167.006740e-169.204320e-16-2.199244e-16
변화1.889822e-01-2.461831e-16-1.768328e-15-3.708481e-165.000000e-01-1.371866e-152.500000e-01-6.804547e-16-4.704942e-16-5.262290e-16...-9.313993e-175.000000e-012.146736e-16-1.103007e-162.279462e-162.083200e-16-9.624912e-17-1.280167e-16-1.557247e-16-1.149629e-16
혼자2.672612e-01-3.280069e-18-9.317744e-161.413438e-151.088538e-15-2.338691e-16-6.922492e-168.602065e-163.779645e-011.541006e-16...-6.177635e-163.502385e-175.875253e-173.417332e-17-1.907159e-16-1.941854e-164.134129e-17-1.933867e-162.012455e-16-6.292850e-17
박지원2.672612e-017.008149e-16-8.828364e-161.707508e-151.057155e-15-2.071522e-151.649440e-15-1.288796e-151.484051e-163.777626e-16...-6.784538e-167.096127e-163.589086e-167.730656e-161.804374e-161.804374e-167.185000e-163.366222e-161.644674e-168.297184e-16
부분적-1.890682e-16-1.988226e-15-4.194743e-167.491553e-16-4.209950e-16-8.490727e-162.042810e-16-2.873347e-163.779645e-01-9.783881e-18...-1.475632e-155.699671e-16-3.076840e-168.545778e-16-1.164905e-16-1.130210e-16-1.017308e-15-2.070046e-16-1.029779e-161.000000e+00
핵심2.182179e-012.721655e-014.082483e-011.143539e-155.326886e-164.082483e-011.443376e-014.082483e-011.543033e-01-8.732159e-16...1.000000e+00-2.319058e-16-4.784199e-165.610211e-17-9.168681e-16-9.225337e-16-6.431829e-16-6.645728e-16-1.822889e-16-1.452701e-15
이제1.889822e-012.428841e-16-1.731820e-15-2.766023e-168.212049e-17-5.156016e-165.000000e-01-1.340623e-15-2.750663e-171.720707e-17...-2.319058e-161.000000e+00-2.966956e-167.071068e-011.604476e-161.604476e-16-5.739071e-16-4.460868e-167.260418e-175.495369e-16
배제4.523622e-16-8.793233e-16-3.422214e-16-2.454099e-175.000000e-01-7.403722e-16-2.787964e-16-8.124476e-17-4.065218e-16-5.186785e-16...-4.784199e-16-2.966956e-161.000000e+00-6.072885e-16-1.998530e-16-1.986264e-166.078213e-164.082483e-01-1.117246e-16-2.964576e-16
감정2.672612e-014.596401e-18-1.448835e-15-3.161317e-16-2.239780e-16-3.839905e-163.535534e-01-1.732294e-154.421610e-165.191784e-16...5.610211e-177.071068e-01-6.072885e-161.000000e+001.069518e-161.277685e-16-7.740290e-16-5.045745e-163.156100e-168.508213e-16
비판적1.569177e-16-4.104865e-17-8.184704e-161.906062e-171.299002e-16-8.175738e-16-1.508455e-17-8.125122e-163.779645e-011.566534e-16...-9.168681e-161.604476e-16-1.998530e-161.069518e-161.000000e+001.000000e+001.211008e-161.379819e-161.257630e-16-1.162324e-16
문체부1.485725e-16-4.041213e-18-8.202051e-165.182829e-181.403266e-16-8.175738e-16-2.489763e-17-8.142470e-163.779645e-011.566534e-16...-9.225337e-161.604476e-16-1.986264e-161.277685e-161.000000e+001.000000e+001.106925e-161.540065e-161.257630e-16-1.127629e-16
일방적2.672612e-01-4.404116e-17-3.368393e-16-1.969417e-162.088913e-154.207800e-17-7.814280e-16-1.020719e-15-5.085478e-161.011546e-15...-6.431829e-16-5.739071e-166.078213e-16-7.740290e-161.211008e-161.106925e-161.000000e+00-1.799724e-164.458281e-16-9.938009e-16
메시지1.543033e-01-1.210510e-15-5.469588e-165.773503e-014.082483e-01-1.387303e-154.547199e-16-4.700472e-162.182179e-01-4.871482e-16...-6.645728e-16-4.460868e-164.082483e-01-5.045745e-161.379819e-161.540065e-16-1.799724e-161.000000e+001.129956e-16-2.154203e-16
보험8.295622e-162.357023e-01-1.120320e-154.087718e-16-1.632470e-16-1.092039e-154.215397e-16-8.895341e-16-5.045943e-164.082483e-01...-1.822889e-167.260418e-17-1.117246e-163.156100e-161.257630e-161.257630e-164.458281e-161.129956e-161.000000e+00-1.081326e-16
현종-1.824265e-16-1.997698e-15-3.992337e-168.092316e-16-4.189833e-16-8.893801e-161.630744e-16-2.685442e-163.779645e-012.645252e-18...-1.452701e-155.495369e-16-2.964576e-168.508213e-16-1.162324e-16-1.127629e-16-9.938009e-16-2.154203e-16-1.081326e-161.000000e+00

2979 rows × 2979 columns

1.1 Changing K

In [226]:
# K = 5
# Find relationships between words
_sigma = np.diag(sigma[:5])
_sigma
_U = U[:,:5].dot(_sigma)
In [227]:
_U.shape
Out[227]:
(2979, 5)
In [228]:
pd.DataFrame(_U.dot(_U.T) / (np.linalg.norm(_U, axis=1).reshape(2979,1) * np.linalg.norm(_U.T, axis=0).reshape(1,2979)),index=coldata,columns=coldata)
Out[228]:
발언30포스절전두고분산촉구강물평가사이트...핵심이제배제감정비판적문체부일방적메시지보험현종
발언1.0000000.563623-0.0045720.0375780.649830-0.0072160.962735-0.0045720.2767210.570105...0.2335390.8954180.2659230.9082810.5375950.5375950.8661020.3074000.356009-0.059951
300.5636231.0000000.0749000.8447760.9678390.0411170.6637160.0749000.7569660.656030...0.6069270.7895230.9235750.4950630.7011340.7011340.2731020.9494990.2779500.777660
포스-0.0045720.0749001.0000000.107744-0.129727-0.002145-0.0010771.0000000.676156-0.121059...0.5958030.008641-0.095320-0.0422730.2937630.293763-0.0005270.036084-0.008466-0.058237
절전0.0375780.8447760.1077441.0000000.749531-0.0103400.1810100.1077440.7456330.419364...0.5537840.3824910.9411010.0166020.5139470.513947-0.2219700.9530130.0980760.977899
두고0.6498300.967839-0.1297270.7495311.000000-0.0043060.713470-0.1297270.6028240.757554...0.4430620.8148130.9030160.5465670.7117520.7117520.3317670.9067970.3822400.700025
분산-0.0072160.041117-0.002145-0.010340-0.0043061.0000000.052678-0.002145-0.001878-0.026795...0.587395-0.0277380.0323100.003335-0.216768-0.216768-0.028645-0.046929-0.0076480.023601
촉구0.9627350.663716-0.0010770.1810100.7134700.0526781.000000-0.0010770.3348410.453729...0.3269820.9721940.3644230.9643760.4409230.4409230.8890760.4065990.1505770.105389
강물-0.0045720.0749001.0000000.107744-0.129727-0.002145-0.0010771.0000000.676156-0.121059...0.5958030.008641-0.095320-0.0422730.2937630.293763-0.0005270.036084-0.008466-0.058237
평가0.2767210.7569660.6761560.7456330.602824-0.0018780.3348410.6761561.0000000.452452...0.8005080.4503110.6622150.1573780.7433640.7433640.0098690.7549650.2835260.594916
사이트0.5701050.656030-0.1210590.4193640.757554-0.0267950.453729-0.1210590.4524521.000000...0.2720730.4617430.6546810.2252610.8897020.8897020.0880770.6397590.8905240.328859
나발-0.0045720.0749001.0000000.107744-0.129727-0.002145-0.0010771.0000000.676156-0.121059...0.5958030.008641-0.095320-0.0422730.2937630.293763-0.0005270.036084-0.008466-0.058237
일훈0.8477890.910736-0.0251930.5559660.947671-0.0164280.892785-0.0251930.5733360.705296...0.4385740.9441680.7293330.7634730.6970530.6970530.5982070.7600290.3479250.476378
참사0.9120420.810776-0.0420630.3906790.884700-0.0029420.890033-0.0420630.4952410.805279...0.3757560.8915800.6172380.7471060.7591770.7591770.6163730.6419980.5302460.293710
부업0.2732490.1209690.000099-0.0386380.229524-0.0080250.0444740.0000990.1802170.807560...0.065464-0.0362770.164173-0.1511710.7053370.705337-0.1619130.1450030.986921-0.139561
30000.2401280.130652-0.016765-0.0062970.237632-0.0019230.015926-0.0167650.1847430.812221...0.068793-0.0564990.192442-0.1851340.7037340.703734-0.2041850.1687120.987466-0.101730
강경0.8625840.2860040.036051-0.2034310.334607-0.0289010.8909610.0360510.0416020.078320...0.0803340.810132-0.0683170.9689290.1017520.1017520.999122-0.009936-0.125058-0.260944
경원0.9557740.559227-0.0014580.0623060.6133600.0136920.989436-0.0014580.2425590.350555...0.2413560.9448990.2374780.9894750.3462630.3462630.9456790.2853070.070554-0.008258
말씀0.1447290.1905080.6724400.1009950.0577060.6614210.1217300.6724400.5581420.202931...0.8507200.0546980.067743-0.0089360.3251440.325144-0.0191370.1004790.316198-0.025640
보안법0.6968150.931701-0.2250150.6781440.988535-0.0548650.761701-0.2250150.4986810.720575...0.3341510.8539900.8519810.6228640.6470690.6470690.4201380.8517890.3281480.643704
채널0.7185990.881148-0.0183160.5984230.9338390.0046870.691410-0.0183160.6284150.921042...0.4587060.7382160.7956990.4841650.8807700.8807700.3078460.8073970.6663570.501257
자동차-0.0045720.0749001.0000000.107744-0.129727-0.002145-0.0010771.0000000.676156-0.121059...0.5958030.008641-0.095320-0.0422730.2937630.293763-0.0005270.036084-0.008466-0.058237
만이0.9121570.4789670.064061-0.0052470.5121640.0220860.9612020.0640610.2104620.211744...0.2358660.9105640.1348520.9934680.2411950.2411950.9731130.192697-0.052430-0.073829
민주화0.254681-0.0003880.041099-0.1740710.1070080.0118640.0104320.0410990.1076890.724507...0.028377-0.0980250.021544-0.1619540.6304610.630461-0.1366460.0065150.955455-0.278290
신기0.9528990.5644980.1218990.0747670.596691-0.0276820.9804280.1218990.3270950.347608...0.2900060.9393290.2252610.9741370.4005180.4005180.9369730.2918430.085866-0.018807
여건0.0302530.8377850.1995600.9945340.7219360.0367260.1755280.1995600.7956530.395235...0.6274390.3716990.9177670.0082470.5187130.518713-0.2241040.9378980.0907920.959637
시베리아0.0375780.8447760.1077441.0000000.749531-0.0103400.1810100.1077440.7456330.419364...0.5537840.3824910.9411010.0166020.5139470.513947-0.2219700.9530130.0980760.977899
호통0.2847550.0601670.017656-0.1202150.1709840.0067840.0467490.0176560.1374930.768689...0.045843-0.0523400.084478-0.1352600.6675160.667516-0.1255740.0677010.972724-0.223790
시스0.8529640.867097-0.0037390.5015010.885200-0.0041590.938991-0.0037390.5196030.526391...0.4252560.9897820.6427990.8606770.5431150.5431150.7122760.6830850.1380310.435320
호치민0.0375780.8447760.1077441.0000000.749531-0.0103400.1810100.1077440.7456330.419364...0.5537840.3824910.9411010.0166020.5139470.513947-0.2219700.9530130.0980760.977899
교통부0.5375950.7011340.2937630.5139470.711752-0.2167680.4409230.2937630.7433640.889702...0.4215990.4857170.6335910.2175301.0000001.0000000.0941450.6896990.7916480.358778
..................................................................
부끄러움0.2558390.0725670.001353-0.0870060.1824670.0080590.0214200.0013530.1443810.778056...0.048052-0.0692690.114636-0.1666820.6713830.671383-0.1654860.0938310.977522-0.185978
가능0.5112710.9794860.0549190.8537760.9285620.0647300.6589410.0549190.7103730.498722...0.5971740.7995400.8972770.5285780.5520000.5520000.3062680.9227800.0811240.810711
벤처0.5375950.7011340.2937630.5139470.711752-0.2167680.4409230.2937630.7433640.889702...0.4215990.4857170.6335910.2175301.0000001.0000000.0941450.6896990.7916480.358778
어제0.8600390.878445-0.0735780.5035840.9398960.0350210.868726-0.0735780.5330660.798630...0.4240470.8970500.7148500.7172560.7438930.7438930.5576190.7311350.4831190.421401
칭호0.0397130.8338430.0215810.9812060.7707590.0098400.1447560.0215810.7061020.546963...0.5151810.3322750.966377-0.0495600.5848920.584892-0.2922100.9598830.2587920.960651
아이러니0.8765610.7837110.1355630.3922830.833298-0.1537290.8272800.1355630.6049250.816188...0.3781100.8370420.5806590.6735230.8647480.8647480.5598890.6366170.5901860.258129
기준0.9801580.5477160.1886550.0356220.592887-0.0300760.9465110.1886550.3826450.505323...0.3189430.8813250.2159420.8959700.5597230.5597230.8692120.2849130.316766-0.090454
개입0.5918400.991602-0.0241980.8137840.9897550.0490860.671340-0.0241980.6941760.727545...0.5508920.7851670.9312280.4934520.7208580.7208580.2702470.9423100.3569240.754857
있음-0.0599510.777660-0.0582370.9778990.7000250.0236010.105389-0.0582370.5949160.328859...0.4487140.3134040.920700-0.0333070.3587780.358778-0.2747820.906593-0.0072261.000000
매진0.1567090.7072360.0559350.7485820.7062840.0076330.1171370.0559350.6684720.843564...0.4581120.2216300.833355-0.1401850.8343680.834368-0.3329980.8219300.7237240.674090
소란0.2732490.1209690.000099-0.0386380.229524-0.0080250.0444740.0000990.1802170.807560...0.065464-0.0362770.164173-0.1511710.7053370.705337-0.1619130.1450030.986921-0.139561
아시아0.3032800.2501330.9473430.1219550.074715-0.0108790.3153350.9473430.7194890.002224...0.6345620.310900-0.0138760.2759000.3959600.3959600.3018150.1269220.020016-0.058947
확립0.7382840.751371-0.0199330.4390300.836594-0.1725990.650264-0.0199330.5407230.951300...0.2815910.6704810.6576860.4537330.9239280.9239280.3204400.6808590.7693980.325383
균형0.1587410.218609-0.0070390.0985820.1811430.9806860.226472-0.0070390.1103060.111603...0.6637300.1574030.1744580.152629-0.076924-0.0769240.0888350.1025010.0605350.116408
이달0.3827320.9609570.1077530.9206940.919063-0.0760240.4599610.1077530.8029190.693045...0.5570280.6152820.9653800.2626420.7686360.7686360.0324590.9918840.3652760.850039
의뢰0.8225820.727826-0.1009780.3439850.8313470.0059780.739796-0.1009780.4437800.933685...0.3141000.7238410.6029160.5569100.8366300.8366300.4330380.6069180.7454200.245244
변화0.7104100.9756930.0200920.7236850.985536-0.0267330.7679240.0200920.6891060.745228...0.5115900.8600970.8594900.6028340.7631030.7631030.4035140.8879650.3843370.643799
혼자0.8891030.8278250.0268860.4294090.8470820.0229530.9652960.0268860.4985150.507648...0.4304300.9963040.5767810.8955610.5274050.5274050.7648300.6209690.1396120.355849
박지원0.254681-0.0003880.041099-0.1740710.1070080.0118640.0104320.0410990.1076890.724507...0.028377-0.0980250.021544-0.1619540.6304610.630461-0.1366460.0065150.955455-0.278290
부분적-0.0599510.777660-0.0582370.9778990.7000250.0236010.105389-0.0582370.5949160.328859...0.4487140.3134040.920700-0.0333070.3587780.358778-0.2747820.906593-0.0072261.000000
핵심0.2335390.6069270.5958030.5537840.4430620.5873950.3269820.5958030.8005080.272073...1.0000000.3671480.4944240.1761880.4215990.4215990.0526010.5329680.1458620.448714
이제0.8954180.7895230.0086410.3824910.814813-0.0277380.9721940.0086410.4503110.461743...0.3671481.0000000.5278780.9222310.4857170.4857170.8038820.5757110.0933170.313404
배제0.2659230.923575-0.0953200.9411010.9030160.0323100.364423-0.0953200.6622150.654681...0.4944240.5278781.0000000.1726390.6335910.633591-0.0760130.9868430.3109570.920700
감정0.9082810.495063-0.0422730.0166020.5465670.0033350.964376-0.0422730.1573780.225261...0.1761880.9222310.1726391.0000000.2175300.2175300.9680370.218188-0.064936-0.033307
비판적0.5375950.7011340.2937630.5139470.711752-0.2167680.4409230.2937630.7433640.889702...0.4215990.4857170.6335910.2175301.0000001.0000000.0941450.6896990.7916480.358778
문체부0.5375950.7011340.2937630.5139470.711752-0.2167680.4409230.2937630.7433640.889702...0.4215990.4857170.6335910.2175301.0000001.0000000.0941450.6896990.7916480.358778
일방적0.8661020.273102-0.000527-0.2219700.331767-0.0286450.889076-0.0005270.0098690.088077...0.0526010.803882-0.0760130.9680370.0941450.0941451.000000-0.022689-0.112132-0.274782
메시지0.3074000.9494990.0360840.9530130.906797-0.0469290.4065990.0360840.7549650.639759...0.5329680.5757110.9868430.2181880.6896990.689699-0.0226891.0000000.2944240.906593
보험0.3560090.277950-0.0084660.0980760.382240-0.0076480.150577-0.0084660.2835260.890524...0.1458620.0933170.310957-0.0649360.7916480.791648-0.1121320.2944241.000000-0.007226
현종-0.0599510.777660-0.0582370.9778990.7000250.0236010.105389-0.0582370.5949160.328859...0.4487140.3134040.920700-0.0333070.3587780.358778-0.2747820.906593-0.0072261.000000

2979 rows × 2979 columns
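The cell above hard-codes the vocabulary size (2979) in its `reshape` calls. As a minimal sketch of the same computation without fixed shapes, the snippet below normalizes each row and takes a single matrix product. The helper names `cosine_similarity_rows` and `truncated_term_vectors` and the toy matrix `A` are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def cosine_similarity_rows(X, eps=1e-12):
    """Cosine similarity between every pair of rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # eps guards against division by zero for all-zero rows
    unit = X / np.maximum(norms, eps)
    return unit @ unit.T

def truncated_term_vectors(A, k):
    """Rank-k term coordinates U_k * Sigma_k of a term-document matrix A."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    # Broadcasting scales column j by sigma[j]; same as U[:, :k].dot(np.diag(sigma[:k]))
    return U[:, :k] * sigma[:k]

# Toy term-document matrix (4 terms x 3 documents), made up for illustration
A = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0]], dtype=float)

sim = cosine_similarity_rows(truncated_term_vectors(A, k=2))
print(sim.round(3))
```

Because the shapes are inferred from the input, the same helper should work unchanged when K or the vocabulary size changes.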

1.2 Document-to-Document Cosine Similarity

In [229]:
_U = np.diag(sigma).dot(Vt)
In [230]:
# The columns correspond to documents
_U.shape
Out[230]:
(40, 40)
In [231]:
# Cosine similarity between documents
pd.DataFrame(_U.T.dot(_U) / (np.linalg.norm(_U.T, axis=1).reshape(40,1) * np.linalg.norm(_U, axis=0).reshape(1,40)))
Out[231]:
0123456789...30313233343536373839
01.0000000.1082390.1567200.1055570.2007590.1408020.1653980.1622620.1278390.154765...0.1053240.1672640.1638150.1746600.1688340.1969040.0849090.1948960.1471170.181832
10.1082391.0000000.1115970.0666300.1175070.0678700.1063010.1222100.0616220.170959...0.1343880.1529110.0987030.1052380.1186820.1107320.0744150.1031100.0768240.073039
20.1567200.1115971.0000000.0837170.1637700.0861450.2164410.1608620.1034170.207129...0.1252980.1029230.2568890.2077830.1255330.1327400.1193770.1753060.1866850.144210
30.1055570.0666300.0837171.0000000.0914840.1249470.1573520.1021560.2232090.124501...0.1196170.0871490.0875070.1924320.1003880.1212110.1198710.1236990.1317270.132283
40.2007590.1175070.1637700.0914841.0000000.0963390.2218100.1554320.1126610.160604...0.1602380.1602230.1569190.1725360.3706250.1383180.1220020.3961510.1315290.156760
50.1408020.0678700.0861450.1249470.0963391.0000000.1259350.1408070.1614650.116972...0.0874090.1046230.0800400.1393430.0614400.1675620.1581670.1053880.1251940.101799
60.1653980.1063010.2164410.1573520.2218100.1259351.0000000.2801200.1506600.213741...0.1597210.1115190.1851050.2914720.1360170.1510770.1776740.1838200.3115420.228800
70.1622620.1222100.1608620.1021560.1554320.1408070.2801201.0000000.1096980.167460...0.0909380.1041950.1681440.2156920.1032560.1333920.1786630.1341780.2436220.191615
80.1278390.0616220.1034170.2232090.1126610.1614650.1506600.1096981.0000000.181725...0.1375600.1231360.0953810.1677980.1141410.4596090.1219470.1014840.1549470.150102
90.1547650.1709590.2071290.1245010.1606040.1169720.2137410.1674600.1817251.000000...0.1934880.1552820.1683940.1763360.1666670.1555010.1537830.1535740.1742740.169706
100.1470770.1668120.1852620.0871490.1574610.1046230.2310040.1855980.1329870.484481...0.1611260.1250000.2044070.1656060.1727870.1422460.1561350.1442290.1771310.201414
110.1144940.0766510.1816100.0854310.1340470.1153800.1346990.1149090.2009800.164399...0.1184620.1164090.1265540.1217560.1195630.1812740.0942940.1161380.1198110.141644
120.2183250.0920830.1623310.1466150.1673080.1210080.2512690.1663910.1444930.188092...0.1468290.1261760.1493180.2039800.1474810.1316190.1372000.1516520.1564480.154691
130.1663300.1163840.1787250.1711500.1644680.1117450.3112260.2604960.1741140.190678...0.1110290.1240350.1334400.2640970.1575850.2058400.1590790.1533370.2636450.206375
140.1053040.0888280.1440650.1149140.1059150.1096560.1163460.1129510.1348890.136083...0.1416390.1369310.1178510.1535770.0865980.0865680.0950210.1044940.1250460.101259
150.1341370.1099180.1595760.2657330.1284910.1586060.1778430.1029870.1695320.156009...0.1387860.0981940.1534560.1907370.0945510.1352660.1175800.1192620.1274290.157497
160.1328060.1066930.1667690.1424480.1554860.1338340.5842100.2998960.1197130.174801...0.1068730.0923870.1467950.2353820.1126820.1374820.1379080.1288560.3021160.186702
170.1975810.0821020.2350510.1029450.4111590.0915450.1881890.1692360.0814550.168753...0.1057390.1312500.2118040.2049370.3681890.1045500.1273470.4435020.1450700.193094
180.1593470.0914400.1624860.0445870.1453710.0713690.1676720.1199430.0842380.130744...0.1256150.0803970.6133170.1452460.0891440.0914890.1027050.1204730.1429260.130569
190.1074100.1065940.1503290.1538080.1452550.0594270.1745190.1141400.1348890.197320...0.1765050.1217160.1099940.1554960.2078350.1177320.0895910.1705340.1293580.159882
200.1571640.1090440.1529750.1319280.1414240.1055870.2017080.1666960.1464140.200024...0.1626110.1100960.1599010.1996910.1510670.1252860.1289240.1776940.1833140.168722
210.1852140.1287660.1101690.1076360.1432990.1148600.1442940.1367330.0790830.176443...0.1695210.1440930.2480310.1623310.1422710.1093150.0872370.1583410.1866850.115368
220.1180070.0853230.1052900.1436350.1695590.0991000.1629760.1265770.1209290.114374...0.1404110.0568330.1173930.1398330.0831810.1034780.1102860.0983630.1352990.143335
230.1573880.1120730.1361710.1741530.1439100.1297690.2502980.1696250.1771690.138675...0.1480390.0826900.1245440.2086700.1260680.1411470.1244970.1363000.2167750.195514
240.1671200.1127790.1829090.1066170.1792960.1212580.1907990.1434050.1370060.208768...0.1659960.1545330.1163760.1645180.1413600.1685260.1206400.1379740.1313890.162392
250.1814240.1356990.1935030.0945260.1977570.1134790.2303990.1907140.1442450.431174...0.1424020.1385940.1867040.1938810.1543410.1302870.1491910.1589210.1690710.183636
260.1281190.0988110.1889960.1376600.1374530.1032890.3161360.2372360.1531730.204199...0.1378620.1332780.1083350.1915220.1384740.1123450.1420330.1708670.2140440.212678
270.1558910.1052010.1706190.0994530.4181210.0837850.1607550.1267290.0798750.154449...0.1483900.1621690.1085560.1477800.4248880.1230280.1085150.3514590.1174540.132546
280.1684170.1014770.1669640.1272370.1989710.1093760.1705710.1331090.1246460.169278...0.2091050.1081480.1535800.1637480.1582860.1292220.1049320.2094590.2022910.170471
290.1255030.1099920.1764510.1197170.2021890.1034790.4727160.1610260.1412770.178159...0.1652670.1428660.1702510.2235950.1340380.1375640.1250140.1992600.2289220.167454
300.1053240.1343880.1252980.1196170.1602380.0874090.1597210.0909380.1375600.193488...1.0000000.1133850.1309710.1694190.1091780.1222370.1038240.0934480.0862560.137964
310.1672640.1529110.1029230.0871490.1602230.1046230.1115190.1041950.1231360.155282...0.1133851.0000000.0860660.1156610.1626230.1327630.1263950.1419390.1086410.099247
320.1638150.0987030.2568890.0875070.1569190.0800400.1851050.1681440.0953810.168394...0.1309710.0860661.0000000.1900410.1049730.1224260.1295800.1300430.1646460.135665
330.1746600.1052380.2077830.1924320.1725360.1393430.2914720.2156920.1677980.176336...0.1694190.1156610.1900411.0000000.1795430.1136710.1583070.1689830.2190270.204414
340.1688340.1186820.1255330.1003880.3706250.0614400.1360170.1032560.1141410.166667...0.1091780.1626230.1049730.1795431.0000000.1233730.1133530.3797460.1267450.149532
350.1969040.1107320.1327400.1212110.1383180.1675620.1510770.1333920.4596090.155501...0.1222370.1327630.1224260.1136710.1233731.0000000.1311420.1042080.1236290.146158
360.0849090.0744150.1193770.1198710.1220020.1581670.1776740.1786630.1219470.153783...0.1038240.1263950.1295800.1583070.1133530.1311421.0000000.1317440.1801620.132823
370.1948960.1031100.1753060.1236990.3961510.1053880.1838200.1341780.1014840.153574...0.0934480.1419390.1300430.1689830.3797460.1042080.1317441.0000000.2180040.173215
380.1471170.0768240.1866850.1317270.1315290.1251940.3115420.2436220.1549470.174274...0.0862560.1086410.1646460.2190270.1267450.1236290.1801620.2180041.0000000.188621
390.1818320.0730390.1442100.1322830.1567600.1017990.2288000.1916150.1501020.169706...0.1379640.0992470.1356650.2044140.1495320.1461580.1328230.1732150.1886211.000000

40 rows × 40 columns

In [232]:
# Switch to K = 5
_U = _sigma.dot(Vt[:5,:])
_U.shape
Out[232]:
(5, 40)
In [233]:
# Reducing the dimensionality discards information, which is what brings out the latent relationships between documents.
pd.DataFrame(_U.T.dot(_U) / (np.linalg.norm(_U.T, axis=1).reshape(40,1) * np.linalg.norm(_U, axis=0).reshape(1,40)))
Out[233]:
0123456789...30313233343536373839
01.0000000.9583100.9106560.6010130.8075390.4081360.5869770.6867330.5947850.889547...0.8678240.9372990.8967790.8413930.7941270.7503680.1538270.7389880.5845500.883858
10.9583101.0000000.9536270.7162220.7244010.2412530.6476510.7384930.6894660.975465...0.9549800.9460110.9650650.9144790.7282050.8126510.2846110.6351150.6354570.950619
20.9106560.9536271.0000000.5246990.7015820.1966740.8283510.8788830.4950140.959872...0.8508760.8484790.9930200.9590120.6888650.6450010.2919110.6333350.8202780.986170
30.6010130.7162220.5246991.0000000.1881220.1857080.2182820.3160450.9932890.734031...0.8821000.6797250.6004270.6216110.2279090.9604560.2088690.0574750.1777200.613423
40.8075390.7244010.7015820.1881221.0000000.1408880.2975810.3981940.1485870.582854...0.5527370.8143690.6574890.4851220.9958440.3281450.1717190.9905720.3237510.587497
50.4081360.2412530.1966740.1857080.1408881.0000000.1761390.3331090.2813690.205208...0.1577350.3303080.1865970.2742600.0828460.4127370.1580440.1463300.2469980.225939
60.5869770.6476510.8283510.2182820.2975810.1761391.0000000.9648730.2095430.749117...0.5252110.4350240.8108600.8788080.2572690.3387390.2264890.2597820.9898590.848187
70.6867330.7384930.8788830.3160450.3981940.3331090.9648731.0000000.3127890.822013...0.6080140.5923750.8734610.9233930.3599960.4701510.4120250.3577740.9829090.898096
80.5947850.6894660.4950140.9932890.1485870.2813690.2095430.3127891.0000000.706928...0.8543460.6557740.5687570.6084920.1813750.9702030.1810320.0206860.1725650.590577
90.8895470.9754650.9598720.7340310.5828540.2052080.7491170.8220130.7069281.000000...0.9492930.8760410.9834240.9693700.5860950.8135400.3498450.4869970.7332840.983190
100.8787580.9576680.9843510.6292160.6036010.2010770.8230900.8856900.6007200.989150...0.8946690.8475850.9973100.9830770.5982830.7282470.3716820.5213030.8146240.996572
110.9165080.9634880.8689540.8551710.5632030.3383540.5703130.6694660.8471440.952764...0.9809900.9040340.8982540.8900560.5703150.9326020.2340880.4556680.5486300.903220
120.9001720.9638280.9778550.6657350.5826590.2516550.8150280.8716620.6462250.990653...0.9123050.8434560.9887360.9887820.5743970.7674670.2900100.4949010.7982330.996995
130.6916430.7722480.9008920.4071520.3586290.1891720.9787710.9712230.3934930.864603...0.6847990.5744130.8991560.9551790.3306150.5136580.2755820.2974880.9634630.931062
140.7966770.8658440.6917740.9573700.4540570.2756660.3148210.4368110.9473110.843510...0.9580050.8560860.7453820.7262570.4829420.9786690.2402140.3346100.2893840.742350
150.7462060.8189030.6491000.9749840.3285690.3409880.3344880.4503500.9781580.819814...0.9297190.7883010.7085220.7271690.3523320.9961930.2243600.2058990.3070550.722231
160.6052330.6573080.8255670.2574520.2780930.2584690.9955460.9739220.2579680.759173...0.5411080.4513550.8114110.8919430.2350360.3889810.2360540.2372980.9891580.854186
170.8084940.7309710.7362260.1517680.9951380.1399840.3732880.4663230.1114200.603027...0.5427360.8038840.6891040.5242830.9860360.3013210.2002600.9895820.4021860.622084
180.8926710.9537250.9967970.5661870.6495450.1844680.8400940.8885270.5364040.974408...0.8690540.8370710.9975890.9752800.6396310.6744020.3110830.5745310.8286680.995300
190.9059890.9784730.9243890.7466820.6772610.0637650.6121990.6655330.7056590.960317...0.9697960.8924520.9397740.8835900.6933400.7915770.1881490.5765040.5748820.925134
200.8780520.9674900.9536490.7376310.5631160.1352050.7497580.7948690.7070950.992507...0.9541080.8413800.9735480.9625280.5684760.7978630.2497970.4622340.7171450.977632
210.9556190.9655900.9860320.5701030.7036960.3186150.7913210.8527640.5560030.955567...0.8695180.8704470.9760600.9547930.6856050.7081500.2174800.6320170.7817330.977315
220.8387970.9256850.7995150.9229340.4897140.2609720.4742830.5886420.9063870.926087...0.9856760.8870470.8494320.8347570.5121770.9617600.3243230.3734210.4543210.847407
230.8625300.9329590.9271090.7378300.4630170.3409300.7918280.8604350.7323450.977691...0.9137280.8108020.9510680.9844060.4541170.8362370.3093510.3669820.7764600.972927
240.9437590.9765660.9814510.6389770.6602250.3280940.7813960.8586430.6245650.979587...0.9024870.8880250.9851300.9707210.6476300.7666180.2958420.5805020.7755990.986951
250.8947540.9634590.9904450.6144530.6334090.2098130.8189960.8821000.5863090.986701...0.8920430.8579130.9991770.9793390.6266000.7198180.3545680.5536010.8111370.996595
260.6517990.7179230.8776670.2919730.3615060.1649720.9949490.9701580.2779000.808862...0.6035180.5131650.8637570.9174270.3268930.4078770.2388090.3145200.9816570.895225
270.8008970.7204660.6868250.2036710.9991450.1270150.2663530.3699420.1626340.575360...0.5570850.8173530.6452860.4683780.9982520.3364350.1714000.9878240.2915640.573031
280.9726920.9801550.9729950.5810360.7915780.2302800.6962720.7611060.5545500.941304...0.8897920.9114570.9613840.9025400.7839880.7005230.1859770.7185620.6817830.946564
290.6878000.7261040.8910080.2373450.4418430.1702350.9854540.9558740.2229100.794587...0.5872340.5293460.8637470.9011410.4030780.3679650.1764160.4027020.9717190.890142
300.8678240.9549800.8508760.8821000.5527370.1577350.5252110.6080140.8543460.949293...1.0000000.8880710.8888710.8597480.5753060.9124580.2403440.4371030.4910510.883715
310.9372990.9460110.8484790.6797250.8143690.3303080.4350240.5923750.6557740.876041...0.8880711.0000000.8639680.7688840.8240750.7960100.4079680.7415130.4556600.824169
320.8967790.9650650.9930200.6004270.6574890.1865970.8108600.8734610.5687570.983424...0.8888710.8639681.0000000.9711010.6519190.7043990.3558480.5795180.8033770.993330
330.8413930.9144790.9590120.6216110.4851220.2742600.8788080.9233930.6084920.969370...0.8597480.7688840.9711011.0000000.4699620.7280230.3120300.4017030.8646300.991164
340.7941270.7282050.6888650.2279090.9958440.0828460.2572690.3599960.1813750.586095...0.5753060.8240750.6519190.4699621.0000000.3473350.1879670.9803610.2797270.576866
350.7503680.8126510.6450010.9604560.3281450.4127370.3387390.4701510.9702030.813540...0.9124580.7960100.7043990.7280230.3473351.0000000.2679120.2104400.3220850.718980
360.1538270.2846110.2919110.2088690.1717190.1580440.2264890.4120250.1810320.349845...0.2403440.4079680.3558480.3120300.1879670.2679121.0000000.1664390.3287460.309756
370.7389880.6351150.6333350.0574750.9905720.1463300.2597820.3577740.0206860.486997...0.4371030.7415130.5795180.4017030.9803610.2104400.1664391.0000000.2964860.505994
380.5845500.6354570.8202780.1777200.3237510.2469980.9898590.9829090.1725650.733284...0.4910510.4556600.8033770.8646300.2797270.3220850.3287460.2964861.0000000.834408
390.8838580.9506190.9861700.6134230.5874970.2259390.8481870.8980960.5905770.983190...0.8837150.8241690.9933300.9911640.5768660.7189800.3097560.5059940.8344081.000000

40 rows × 40 columns
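The jump from similarities around 0.1 to 0.2 (all singular values) to values close to 1 (K = 5) can be reproduced on a small example. The sketch below assumes a made-up term-document matrix `A` and hypothetical helper names. It also illustrates one property worth keeping in mind: when every singular value is kept, the document vectors Σ·Vᵀ give exactly the same cosine similarities as the raw columns of `A`, so only the truncation changes the picture.

```python
import numpy as np

def cosine_similarity_columns(X, eps=1e-12):
    """Cosine similarity between every pair of columns of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    unit = X / np.maximum(norms, eps)   # eps avoids division by zero
    return unit.T @ unit

def document_vectors(A, k=None):
    """Document coordinates Sigma_k * Vt_k; k=None keeps every singular value."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    if k is None:
        k = len(sigma)
    return sigma[:k, None] * Vt[:k, :]  # row j of Vt scaled by sigma[j]

# Toy term-document matrix (5 terms x 4 documents), made up for illustration
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 1, 2],
              [0, 0, 2, 1],
              [1, 0, 0, 1]], dtype=float)

raw_sim = cosine_similarity_columns(A)                          # no SVD at all
full_sim = cosine_similarity_columns(document_vectors(A))       # all singular values
low_sim = cosine_similarity_columns(document_vectors(A, k=2))   # truncated to K = 2

print(np.allclose(raw_sim, full_sim))
print(low_sim.round(3))
```

In the truncated matrix, documents that share a dominant direction tend to move toward similarity 1, which is the same effect visible in the K = 5 output above.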

Similarity between words within documents

In [234]:
# K = 5
_sigma = np.diag(sigma[:5])
_U = U[:,:5].dot(_sigma)
In [236]:
cluster = pd.DataFrame(_U,index=coldata)
cluster
Out[236]:
01234
발언2.369581-0.836695-0.1443840.813850-1.161821
301.2329220.0461060.4036840.3619280.610832
포스0.1977660.5772850.536741-0.425677-0.366042
절전0.1568720.0666060.1417000.0410880.251585
두고0.274984-0.0617510.0492590.0909260.132552
분산0.2394030.607510-0.737459-0.0796450.104458
촉구1.218676-0.186607-0.0053940.705432-0.459797
강물0.1977660.5772850.536741-0.425677-0.366042
평가1.0629350.5616590.895490-0.3581610.318224
사이트0.430040-0.270124-0.021477-0.1730820.159062
나발0.1977660.5772850.536741-0.425677-0.366042
일훈0.179643-0.0363180.0283000.0655910.018432
참사0.316409-0.1043570.0106790.058574-0.007663
부업0.177889-0.211081-0.075403-0.3130030.016532
30000.146436-0.177308-0.064159-0.2671530.032358
강경0.362483-0.063681-0.0153070.398809-0.460040
경원1.995896-0.365314-0.0160191.401452-1.140363
말씀0.7425640.950088-0.251962-0.805080-0.238978
보안법0.205547-0.0685610.0312430.0986420.089656
채널3.088727-0.9694240.316349-0.0895230.874423
자동차0.1977660.5772850.536741-0.425677-0.366042
만이0.710152-0.0473550.0177570.596906-0.545343
민주화0.222092-0.294330-0.135534-0.482251-0.051334
신기0.263535-0.0332610.0310350.157818-0.163540
여건0.1233990.0718240.1100440.0195510.180713
시베리아0.1568720.0666060.1417000.0410880.251585
호통0.545251-0.670008-0.284991-1.022469-0.056182
시스0.525090-0.0494180.0992170.310381-0.003447
호치민0.1568720.0666060.1417000.0410880.251585
교통부0.144714-0.0554860.067303-0.0730550.026855
..................
부끄러움0.525449-0.660899-0.277153-1.0158970.011664
가능0.6729150.1183030.2521020.3575780.371721
벤처0.144714-0.0554860.067303-0.0730550.026855
어제0.471845-0.1276010.0193840.1125170.051449
칭호0.1304400.0224250.0873730.0030870.214820
아이러니0.111598-0.0394130.0271200.003489-0.010765
기준0.473540-0.1009980.0401130.125444-0.284033
개입0.593166-0.0456450.1347180.1596600.306130
있음0.1583280.0840210.1492700.1056220.387031
매진0.596566-0.1695240.178759-0.4085560.641623
소란0.177889-0.211081-0.075403-0.3130030.016532
아시아0.4613010.5440240.567776-0.267858-0.529581
확립0.162838-0.0916670.028184-0.0250760.024234
균형0.4190460.571192-0.709159-0.0140540.122890
이달0.113932-0.0037550.0609660.0111100.086732
의뢰0.699156-0.336653-0.045144-0.0457280.046333
변화0.266071-0.0392930.0690450.0709670.084302
혼자0.197166-0.0150120.0294160.112143-0.021130
박지원0.222092-0.294330-0.135534-0.482251-0.051334
부분적0.1583280.0840210.1492700.1056220.387031
핵심1.1048961.1587870.074804-0.3053250.238107
이제0.317579-0.0370370.0569900.208030-0.054266
배제0.241414-0.0107200.0862170.0535680.275454
감정0.203647-0.033283-0.0039760.196920-0.140998
비판적0.144714-0.0554860.067303-0.0730550.026855
문체부0.144714-0.0554860.067303-0.0730550.026855
일방적0.226454-0.054046-0.0221820.254235-0.290426
메시지0.3913140.0009800.2100000.0756460.375803
보험0.255946-0.235336-0.063964-0.3175740.053579
현종0.1583280.0840210.1492700.1056220.387031

2979 rows × 5 columns

By convention, the first column is discarded because it has little discriminative power.

In [251]:
# 2nd column (index 1)
temp = cluster.sort_values(by=[1],ascending=False)
ranking = temp[temp[1] > 0 ][1].to_dict()
print(list(ranking.keys())[:5])
print(list(ranking.values())[:5])
['성과', '제안', '지금', '역사', '보기']
[1.4484458092330283, 1.2961552344051241, 1.2302949924041506, 1.2142382529094597, 1.2031049279258397]
In [253]:
# 3rd column (index 2)
temp = cluster.sort_values(by=[2],ascending=False)
ranking = temp[temp[2] > 0 ][2].to_dict()
print(list(ranking.keys())[:5])
print(list(ranking.values())[:5])
['최근', '이번', '장관', '정상', '정책']
[1.261455103177059, 1.2496831743653438, 1.1790329587285966, 1.129323254832694, 1.0620208403636429]
In [254]:
temp = cluster.sort_values(by=[3],ascending=False)
ranking = temp[temp[3] > 0 ][3].to_dict()
print(list(ranking.keys())[:5])
print(list(ranking.values())[:5])
['대표', '원내', '단체', '교섭', '경원']
[1.7764712272588647, 1.5966030961602944, 1.5616105565313059, 1.4331206562274619, 1.4014516862874566]
In [255]:
temp = cluster.sort_values(by=[4],ascending=False)
ranking = temp[temp[4] > 0 ][4].to_dict()
print(list(ranking.keys())[:5])
print(list(ranking.values())[:5])
['국무', '청와대', '지시', '분석', '대통령']
[1.4519332749861376, 1.393792647999382, 1.3629042239054683, 1.3176724673796523, 1.2828038733880196]
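The four cells above repeat the same sort-and-filter steps for columns 1 through 4. A minimal sketch of that pattern as one reusable function is shown below. The name `top_terms`, the toy `terms` list, and the toy `vectors` array are assumptions for illustration; in the notebook the inputs would be the K = 5 term coordinates and the vocabulary held in `coldata`.

```python
import numpy as np

def top_terms(term_vectors, terms, topic, n=5):
    """Return up to n (term, weight) pairs with the largest positive weight on one topic axis."""
    weights = np.asarray(term_vectors, dtype=float)[:, topic]
    order = np.argsort(-weights)[:n]   # indices sorted by descending weight
    return [(terms[i], float(weights[i])) for i in order if weights[i] > 0]

# Toy data: 4 terms x 3 latent dimensions, made up for illustration
terms = ["alpha", "beta", "gamma", "delta"]
vectors = np.array([[0.9,  0.1, -0.3],
                    [0.8, -0.5,  0.7],
                    [0.7,  0.6,  0.2],
                    [0.6,  0.4, -0.1]])

# Skip column 0, following the convention of discarding the first dimension
for topic in range(1, vectors.shape[1]):
    print(topic, top_terms(vectors, terms, topic, n=2))
```

Looping from index 1 keeps the "drop the first column" convention in one place instead of repeating it in every cell.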

2. Implementing LDA

In [1]:
# Each inner list is one document.
collection = [
    ["Hadoop", "Big Data", "HBase", "Java", "Spark", "Storm", "Cassandra"],
    ["NoSQL", "MongoDB", "Cassandra", "HBase", "Postgres"],
    ["Python", "scikit-learn", "scipy", "numpy", "statsmodels", "pandas"],
    ["R", "Python", "statistics", "regression", "probability"],
    ["machine learning", "regression", "decision trees", "libsvm"],
    ["Python", "R", "Java", "C++", "Haskell", "programming languages"],
    ["statistics", "probability", "mathematics", "theory"],
    ["machine learning", "scikit-learn", "Mahout", "neural networks"],
    ["neural networks", "deep learning", "Big Data", "artificial intelligence"],
    ["Hadoop", "Java", "MapReduce", "Big Data"],
    ["statistics", "R", "statsmodels"],
    ["C++", "deep learning", "artificial intelligence", "probability"],
    ["pandas", "R", "Python"],
    ["databases", "HBase", "Postgres", "MySQL", "MongoDB"],
    ["libsvm", "regression", "support vector machines"]
]
In [3]:
type(collection)
Out[3]:
list
In [401]:
from collections import defaultdict

documents = defaultdict(lambda: defaultdict(int)) # DTM
vocabulary = list()
# i: document index / d: list of terms in the i-th document
for i, d in enumerate(collection):  
    for term in d:
        documents[i][term.lower()] += 1
        vocabulary.append(term.lower())
        
vocabulary = list(set(vocabulary))
In [402]:
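# Inputs: the documents D and the hyperparameters a, b
# a, b play the role of the Dirichlet priors alpha and beta
a = 0.1
b = 0.1

K = 3 # total number of topics
M = len(documents) # total number of documents
V = len(vocabulary) # vocabulary size (number of distinct terms)
# N (the number of words in a document) differs from document to document.

# Number of words assigned to each topic -> used as the denominator
# topic : sum(words)
topicTermCount = defaultdict(int)

# Number of times each topic is assigned within a document, regardless of the word
docTopicDistribution = defaultdict(lambda: defaultdict(int))
# [document][topic 0: number of words, topic 1: number of words, ...]

# Number of times each word is assigned to a topic, regardless of the document
topicTermDistribution = defaultdict(lambda: defaultdict(int))
# [topic][vocab 0: count, ... , vocab n: count]

# Z_ml = topic of the l-th word in the m-th document
# For each of the M documents -> each of its N words -> one topic
termTopicAssignmentMatrix = defaultdict(lambda:defaultdict(int))
# Z[document][word position] = topic
# n(i,(j,r)) = count for the i-th topic, given the r-th word of the j-th document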
In [403]:
from random import randrange,seed

seed(0)

for i,termList in enumerate(collection):
    for j, term in enumerate(termList):
        token = term.lower()
        topic = randrange(K)
        
        topicTermCount[topic] += 1
        docTopicDistribution[i][topic] += 1
        topicTermDistribution[topic][term] += 1
        termTopicAssignmentMatrix[i][j] = topic
In [408]:
from random import random

def collapsedGibbsSampling(i,term):
    sampling = list()
    # Unnormalized probability of each topic k for this term in document i
    for k in range(K):
        sampling.append(likelighoodAlpha(i,k) * likelighoodBeta(k,term))
    # random() is in [0, 1), so the threshold lies in [0, sum(sampling))
    threshold =  sum(sampling) * random()   
    
    # Walk the cumulative weights until the threshold is used up
    for topicNo, topicProbability in enumerate(sampling):
        threshold -= topicProbability
        
        if threshold <= 0.0:
            return topicNo
    
    # Fallback for floating-point round-off: return the last topic
    return K - 1
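
The threshold loop above draws a topic index with probability proportional to its unnormalized weight. Below is a minimal standalone sketch of that idea; the helper name `sample_index` and the weights are made up for illustration and are not part of the notebook.

```python
from random import Random

def sample_index(weights, rng):
    """Draw an index with probability proportional to its non-negative weight."""
    threshold = sum(weights) * rng.random()
    for index, weight in enumerate(weights):
        threshold -= weight
        if threshold <= 0.0:
            return index
    return len(weights) - 1  # guard against floating-point round-off

rng = Random(0)
weights = [0.5, 1.5, 2.0]  # hypothetical unnormalized topic scores
draws = [sample_index(weights, rng) for _ in range(10000)]
# Empirical frequencies should be close to weights / sum(weights)
freqs = [draws.count(k) / len(draws) for k in range(len(weights))]
print(freqs)
```

The standard library offers the same kind of draw through `random.choices(population, weights=...)`, which may be preferable when there is no need to spell out the loop.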
In [409]:
def likelighoodAlpha(i,k):
    return docTopicDistribution[i][k] + a
In [410]:
def likelighoodBeta(k,term):
    return  (topicTermDistribution[k][term] +b) / (topicTermCount[k] + b * V)
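
Taken together, the two functions above appear to compute the usual collapsed Gibbs conditional for LDA, up to a factor that does not depend on the topic. As a sketch, with $n_{m,k}$ standing for `docTopicDistribution[m][k]`, $n_{k,t}$ for `topicTermDistribution[k][t]`, $n_k$ for `topicTermCount[k]`, $\alpha$ for `a` and $\beta$ for `b` (this notation is an assumption, and the superscript means the current token is excluded from the counts):

```latex
p\left(z_{m,l} = k \mid \mathbf{z}_{\neg(m,l)}, \mathbf{w}\right)
\;\propto\;
\left( n_{m,k}^{\neg(m,l)} + \alpha \right)
\cdot
\frac{ n_{k,t}^{\neg(m,l)} + \beta }{ n_{k}^{\neg(m,l)} + \beta V }
```

The document-side denominator is the same for every $k$, which is presumably why `likelighoodAlpha` leaves it out.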
In [411]:
iterationNumber = 1000

for _ in range(iterationNumber):
    # Fix document m and term position l, then resample that entry of termTopicAssignmentMatrix
    # (m, l) correspond to (i, j) below
    for i,termList in enumerate(collection):
        for j,term in enumerate(termList):
            topic = termTopicAssignmentMatrix[i][j]
            
            # Remove the current assignment from all counts
            topicTermCount[topic] -= 1
            docTopicDistribution[i][topic] -= 1
            topicTermDistribution[topic][term] -= 1
            
            # Sample a new topic conditioned on every other assignment
            topic = collapsedGibbsSampling(i,term)
            
            # Add the new assignment back into the counts
            topicTermCount[topic] += 1
            docTopicDistribution[i][topic] += 1
            topicTermDistribution[topic][term] += 1
            
            termTopicAssignmentMatrix[i][j] = topic
In [412]:
topicTermCount
Out[412]:
defaultdict(int, {1: 23, 0: 24, 2: 20})
In [413]:
topicTermDistribution
Out[413]:
defaultdict(<function __main__.<lambda>()>,
            {1: defaultdict(int,
                         {'Hadoop': 2,
                          'Big Data': 3,
                          'Java': 3,
                          'Storm': 1,
                          'Cassandra': 2,
                          'NoSQL': 1,
                          'MongoDB': 2,
                          'scipy': 0,
                          'R': 0,
                          'machine learning': 0,
                          'programming languages': 0,
                          'statistics': 0,
                          'probability': 0,
                          'Mahout': 0,
                          'neural networks': 0,
                          'deep learning': 0,
                          'artificial intelligence': 0,
                          'Python': 0,
                          'HBase': 3,
                          'Spark': 1,
                          'Postgres': 2,
                          'scikit-learn': 0,
                          'numpy': 0,
                          'statsmodels': 0,
                          'pandas': 0,
                          'regression': 0,
                          'decision trees': 0,
                          'libsvm': 0,
                          'C++': 0,
                          'Haskell': 0,
                          'mathematics': 0,
                          'theory': 0,
                          'MapReduce': 1,
                          'databases': 1,
                          'MySQL': 1,
                          'support vector machines': 0}),
             0: defaultdict(int,
                         {'HBase': 0,
                          'Postgres': 0,
                          'scikit-learn': 2,
                          'numpy': 1,
                          'statsmodels': 0,
                          'probability': 0,
                          'regression': 3,
                          'libsvm': 2,
                          'Haskell': 1,
                          'machine learning': 2,
                          'Big Data': 0,
                          'Hadoop': 0,
                          'Java': 0,
                          'C++': 2,
                          'pandas': 0,
                          'MongoDB': 0,
                          'Spark': 0,
                          'Storm': 0,
                          'Cassandra': 0,
                          'NoSQL': 0,
                          'Python': 2,
                          'scipy': 0,
                          'R': 0,
                          'statistics': 0,
                          'decision trees': 1,
                          'programming languages': 0,
                          'mathematics': 0,
                          'theory': 0,
                          'Mahout': 1,
                          'neural networks': 2,
                          'deep learning': 2,
                          'artificial intelligence': 2,
                          'MapReduce': 0,
                          'databases': 0,
                          'MySQL': 0,
                          'support vector machines': 1}),
             2: defaultdict(int,
                         {'Spark': 0,
                          'HBase': 0,
                          'Python': 2,
                          'pandas': 2,
                          'statistics': 3,
                          'regression': 0,
                          'decision trees': 0,
                          'C++': 0,
                          'mathematics': 1,
                          'theory': 1,
                          'scikit-learn': 0,
                          'neural networks': 0,
                          'artificial intelligence': 0,
                          'MapReduce': 0,
                          'R': 4,
                          'statsmodels': 2,
                          'deep learning': 0,
                          'databases': 0,
                          'MySQL': 0,
                          'support vector machines': 0,
                          'Hadoop': 0,
                          'Big Data': 0,
                          'Java': 0,
                          'Storm': 0,
                          'Cassandra': 0,
                          'NoSQL': 0,
                          'MongoDB': 0,
                          'Postgres': 0,
                          'scipy': 1,
                          'numpy': 0,
                          'probability': 3,
                          'machine learning': 0,
                          'libsvm': 0,
                          'Haskell': 0,
                          'programming languages': 1,
                          'Mahout': 0})})
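
The nested dictionaries are easier to read as a short list of top terms per topic. A minimal sketch, assuming a hypothetical helper `top_terms` (not part of the notebook) and using only the non-zero counts of topic 2 copied from the output above:

```python
def top_terms(term_counts, n=3):
    """Return the n most frequent (term, count) pairs, ties broken alphabetically."""
    ranked = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]

# Non-zero counts for topic 2, copied from the output above
topic2_counts = {
    "Python": 2, "pandas": 2, "statistics": 3, "mathematics": 1, "theory": 1,
    "R": 4, "statsmodels": 2, "scipy": 1, "probability": 3,
    "programming languages": 1,
}

print(top_terms(topic2_counts))
# → [('R', 4), ('probability', 3), ('statistics', 3)]
```

The counts sum to 20, which matches `topicTermCount[2]`; on this reading, topic 2 groups the statistics-related terms.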
In [414]:
docTopicDistribution
Out[414]:
defaultdict(<function __main__.<lambda>()>,
            {0: defaultdict(int, {1: 7, 0: 0, 2: 0}),
             1: defaultdict(int, {1: 5, 2: 0, 0: 0}),
             2: defaultdict(int, {2: 3, 0: 3, 1: 0}),
             3: defaultdict(int, {1: 0, 2: 3, 0: 2}),
             4: defaultdict(int, {1: 0, 0: 4, 2: 0}),
             5: defaultdict(int, {2: 3, 1: 1, 0: 2}),
             6: defaultdict(int, {1: 0, 2: 4, 0: 0}),
             7: defaultdict(int, {0: 4, 2: 0, 1: 0}),
             8: defaultdict(int, {2: 0, 1: 1, 0: 3}),
             9: defaultdict(int, {0: 0, 2: 0, 1: 4}),
             10: defaultdict(int, {2: 3, 0: 0, 1: 0}),
             11: defaultdict(int, {0: 3, 2: 1, 1: 0}),
             12: defaultdict(int, {0: 0, 2: 3, 1: 0}),
             13: defaultdict(int, {2: 0, 0: 0, 1: 5}),
             14: defaultdict(int, {0: 3, 2: 0, 1: 0})})
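
These raw counts can be turned into smoothed per-document topic proportions. A minimal sketch, assuming the common estimate (count + alpha) / (document length + K * alpha); the function name `doc_topic_proportions` is hypothetical, and the two count dictionaries are copied from the output above:

```python
def doc_topic_proportions(topic_counts, num_topics, alpha):
    """Smoothed topic proportions for one document from its topic counts."""
    total = sum(topic_counts.get(k, 0) for k in range(num_topics))
    denom = total + num_topics * alpha
    return [(topic_counts.get(k, 0) + alpha) / denom for k in range(num_topics)]

# Counts for documents 0 and 2, copied from the output above
counts_doc0 = {1: 7, 0: 0, 2: 0}
counts_doc2 = {2: 3, 0: 3, 1: 0}

theta0 = doc_topic_proportions(counts_doc0, num_topics=3, alpha=0.1)
theta2 = doc_topic_proportions(counts_doc2, num_topics=3, alpha=0.1)

print(theta0)  # dominated by topic 1
print(theta2)  # split evenly between topics 0 and 2
```

Document 0 falls almost entirely into topic 1, while document 2 is an even mix of topics 0 and 2, so a hard cluster label for the latter would be arbitrary.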
In [415]:
termTopicAssignmentMatrix
Out[415]:
defaultdict(<function __main__.<lambda>()>,
            {0: defaultdict(int, {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}),
             1: defaultdict(int, {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}),
             2: defaultdict(int, {0: 0, 1: 0, 2: 2, 3: 0, 4: 2, 5: 2}),
             3: defaultdict(int, {0: 2, 1: 0, 2: 2, 3: 0, 4: 2}),
             4: defaultdict(int, {0: 0, 1: 0, 2: 0, 3: 0}),
             5: defaultdict(int, {0: 2, 1: 2, 2: 1, 3: 0, 4: 0, 5: 2}),
             6: defaultdict(int, {0: 2, 1: 2, 2: 2, 3: 2}),
             7: defaultdict(int, {0: 0, 1: 0, 2: 0, 3: 0}),
             8: defaultdict(int, {0: 0, 1: 0, 2: 1, 3: 0}),
             9: defaultdict(int, {0: 1, 1: 1, 2: 1, 3: 1}),
             10: defaultdict(int, {0: 2, 1: 2, 2: 2}),
             11: defaultdict(int, {0: 0, 1: 0, 2: 0, 3: 2}),
             12: defaultdict(int, {0: 2, 1: 2, 2: 2}),
             13: defaultdict(int, {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}),
             14: defaultdict(int, {0: 0, 1: 0, 2: 0})})

