Converting a matrix to a document-term matrix in R


I have a character vector that looks like this:

charVec[1:10]
[1] "dentistry"  "free"       "cache"      "key"        "containing" "cite"       "templates"  "deprecated" "errors"     "dates"  

I then generated all three-word combinations of the vector:

combwords <- t(combn(charVec,3))

This gives me the following matrix, combwords:
    [,1]     [,2]     [,3]       
[1,] "import" "school" "dentistry"
[2,] "import" "school" "school"   
[3,] "import" "school" "log"      
[4,] "import" "school" "search"   
[5,] "import" "school" "current"  
[6,] "import" "school" "advanced" 

Now I want to create a document-term matrix (DTM) in which each row of the combwords matrix is one document:
word_corpus <- Corpus(VectorSource(combwords))

This doesn't work. How can I make each row of the matrix (combwords) a single document in the corpus?
1 Answer

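library(tm)

# Collapse each row of combwords into a single space-separated string,
# so that every row becomes one document
foo <- apply(combwords, 1, paste, collapse = " ")
foo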

##  [1] "dentistry free cache"       "dentistry free key"        
##  [3] "dentistry free containing"  "dentistry free cite"       
##  [5] "dentistry cache key"        "dentistry cache containing"
##  [7] "dentistry cache cite"       "dentistry key containing"  
##  [9] "dentistry key cite"         "dentistry containing cite" 
## [11] "free cache key"             "free cache containing"     
## [13] "free cache cite"            "free key containing"       
## [15] "free key cite"              "free containing cite"      
## [17] "cache key containing"       "cache key cite"            
## [19] "cache containing cite"      "key containing cite" 
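
A note on why the collapse step matters, as a hedged sketch: a matrix in R is a vector with a `dim` attribute, and `VectorSource()` is documented as treating each element of its input as a document. Passing `combwords` directly therefore most likely yields one document per cell instead of one per row. The base-R snippet below only compares the lengths involved; the six-word `charVec` is an assumption chosen to match the 20 strings printed above.

```r
# Assumed sample input: the six words that appear in the output above
charVec <- c("dentistry", "free", "cache", "key", "containing", "cite")
combwords <- t(combn(charVec, 3))

dim(combwords)     # rows x columns of the combination matrix
length(combwords)  # number of cells: what a per-element source would see

# One string per row: this is the unit we want as a document
docs <- apply(combwords, 1, paste, collapse = " ")
length(docs)       # equals nrow(combwords)
```

If `length(docs)` equals `nrow(combwords)`, the corpus built from `docs` should contain one document per row.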

# Build a corpus with one document per string, then compute the DTM
tt <- Corpus(VectorSource(foo))
DocumentTermMatrix(tt)

## A document-term matrix (20 documents, 6 terms)
## 
## Non-/sparse entries: 60/60
## Sparsity           : 50%
## Maximal term length: 10 
## Weighting          : term frequency (tf)

as.matrix(DocumentTermMatrix(tt))

##     Terms
## Docs cache cite containing dentistry free key
##   1      1    0          0         1    1   0
##   2      0    0          0         1    1   1
##   3      0    0          1         1    1   0
##   4      0    1          0         1    1   0
##   5      1    0          0         1    0   1
##   6      1    0          1         1    0   0
##   7      1    1          0         1    0   0
##   8      0    0          1         1    0   1
##   9      0    1          0         1    0   1
##   10     0    1          1         1    0   0
##   11     1    0          0         0    1   1
##   12     1    0          1         0    1   0
##   13     1    1          0         0    1   0
##   14     0    0          1         0    1   1
##   15     0    1          0         0    1   1
##   16     0    1          1         0    1   0
##   17     1    0          1         0    0   1
##   18     1    1          0         0    0   1
##   19     1    1          1         0    0   0
##   20     0    1          1         0    0   1
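
As a sanity check on the matrix above, the same counts can be reproduced in base R without `tm`. This is only an illustrative sketch, not how `tm` computes the DTM internally; the names `tokens`, `terms`, `counts`, and `dtm_base` are made up for this example, and the six-word `charVec` is assumed from the terms shown in the output.

```r
# Assumed sample input: the six terms that appear in the DTM above
charVec <- c("dentistry", "free", "cache", "key", "containing", "cite")
docs <- apply(t(combn(charVec, 3)), 1, paste, collapse = " ")

# Split each document into words and collect the sorted vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# Count each term per document; vapply returns terms x docs, so transpose
counts <- vapply(
  tokens,
  function(tok) tabulate(match(tok, terms), nbins = length(terms)),
  integer(length(terms))
)
dtm_base <- t(counts)
dimnames(dtm_base) <- list(Docs = as.character(seq_along(docs)), Terms = terms)

dim(dtm_base)      # documents x terms
rowSums(dtm_base)  # words per document
colSums(dtm_base)  # documents containing each term
```

Every row sums to 3 (three words per combination) and every column to 10, which is consistent with the 60 non-sparse entries reported by `DocumentTermMatrix(tt)`.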

Perfect! Great answer. Thanks! - Cybernetic
