Here is the code I am using to compute a word co-occurrence matrix of adjacent-word counts. I found the following code online; it uses SVD.
import numpy as np
la = np.linalg
words = ['I','like','enjoying','deep','learning','NLP','flying','.']
# Co-occurrence matrix: entry [i, j] counts how often words[j] appears immediately before or after words[i] (e.g. 'like' appears next to 'I' 2 times)
arr = np.array([[0,2,1,0,0,0,0,0],[2,0,0,1,0,1,0,0],[1,0,0,0,0,0,1,0],[0,0,0,1,0,0,0,1],[0,1,0,0,0,0,0,1],[0,0,1,0,0,0,0,8],[0,2,1,0,0,0,0,0],[0,0,1,1,1,0,0,0]])
u, s, v = la.svd(arr, full_matrices=False)
import matplotlib.pyplot as plt
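for i in range(len(words)):
    # x = first element (column 0) and y = second element (column 1) of the row of U for words[i]
    plt.text(u[i,0], u[i,1], words[i])
plt.axis([-1, 1, -1, 1])  # entries of U lie in [-1, 1]; plt.text does not rescale the axes by itself
plt.show()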
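In the last line of the code, the first element of each row of U is used as the x-coordinate and the second element as the y-coordinate to plot the words and see which ones are similar. What is the intuition behind this approach? Why are the first and second elements of each row (each row represents one word) taken as x and y to represent a word? Please help.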
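To make the question concrete, here is a minimal sketch of one way to check what the first two columns of U carry. It assumes the same matrix as in the code above. The names `vt`, `energy`, `approx`, `coords` and `k` are my own and do not come from the code I found. NumPy returns the singular values in descending order, so columns 0 and 1 of U pair with the two largest singular values, and by the Eckart-Young theorem the rank-2 truncation is the best rank-2 approximation of the matrix in Frobenius norm.

```python
import numpy as np

# Same co-occurrence matrix as in the question, copied verbatim.
arr = np.array([[0,2,1,0,0,0,0,0],[2,0,0,1,0,1,0,0],[1,0,0,0,0,0,1,0],[0,0,0,1,0,0,0,1],[0,1,0,0,0,0,0,1],[0,0,1,0,0,0,0,8],[0,2,1,0,0,0,0,0],[0,0,1,1,1,0,0,0]])

# np.linalg.svd returns U, the singular values, and V transposed.
u, s, vt = np.linalg.svd(arr, full_matrices=False)

# Cumulative share of the squared Frobenius norm captured by the first k singular values.
energy = np.cumsum(s**2) / np.sum(s**2)

k = 2  # number of dimensions kept for plotting (hypothetical choice, matches the 2-D plot)

# Rank-k reconstruction built from the first k singular triplets only.
approx = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# Frobenius-norm error of the rank-k reconstruction.
err = np.linalg.norm(arr - approx)

# Eckart-Young: this error equals the root of the sum of the discarded squared singular values.
expected_err = np.sqrt(np.sum(s[k:]**2))

# The 2-D coordinates used in the plot: one row per word, first two columns of U.
coords = u[:, :k]

print("singular values:", s)
print("share of squared norm kept with k=2:", energy[k - 1])
print("rank-2 reconstruction error:", err, "expected:", expected_err)
```

If `energy[k - 1]` is close to 1, the two plotted coordinates keep most of the structure of the matrix; if it is small, the 2-D picture drops a lot of information.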