I have a document in Elasticsearch whose id is AVosj8FEIaetdb3CXpP-. I want to access the tf-idf value of each word in a field, so I ran the following:
GET /cnn/cnn_article/AVosj8FEIaetdb3CXpP-/_termvectors
{
  "fields" : ["author_wording"],
  "term_statistics" : true,
  "field_statistics" : true
}
The response I received:
{
  "_index": "dailystormer",
  "_type": "dailystormer_article",
  "_id": "AVosj8FEIaetdb3CXpP-",
  "_version": 3,
  "found": true,
  "took": 1,
  "term_vectors": {
    "author_wording": {
      "field_statistics": {
        "sum_doc_freq": 3408583,
        "doc_count": 16111,
        "sum_ttf": 7851321
      },
      "terms": {
        "318": {
          "doc_freq": 4,
          "ttf": 4,
          "term_freq": 1,
          "tokens": [
            {
              "position": 121,
              "start_offset": 688,
              "end_offset": 691
            }
          ]
        },
        "742": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 122,
              "start_offset": 692,
              "end_offset": 695
            }
          ]
        },
        "9971": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 123,
              "start_offset": 696,
              "end_offset": 700
            }
          ]
        },
        "a": {
          "doc_freq": 14921,
          "ttf": 163268,
          "term_freq": 11,
          "tokens": [
            {
              "position": 1,
              "start_offset": 13,
              "end_offset": 14
            },
            ...
        "you’re": {
          "doc_freq": 1112,
          "ttf": 1647,
          "term_freq": 1,
          "tokens": [
            {
              "position": 80,
              "start_offset": 471,
              "end_offset": 477
            }
          ]
        }
      }
    }
  }
}
It returns some interesting fields, such as the term frequency (tf), but no tf-idf. Should I compute it myself? Is that a good idea? If so, how do I do it?
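Computing tf-idf on the client side from this response is straightforward, because it already carries the three ingredients: `term_freq` (tf of the term in this document), `doc_freq` (df, the number of documents containing the term) and `field_statistics.doc_count` (N, the number of documents that have the field). Below is a minimal Python sketch, assuming the textbook definition tf × ln(N / df). The helper name `tf_idf_scores` is hypothetical, and this is not the formula Elasticsearch itself uses for relevance scoring: Lucene's classic TF-IDF similarity applies its own tf and idf transforms, and Elasticsearch 5.0 and later default to BM25. Also, according to the term vectors documentation, term and field statistics are not exact: deleted documents are not taken into account, and the values come only from the shard that holds the requested document.

```python
import math

# A trimmed copy of the _termvectors response above: only three terms are
# kept (and their "tokens" arrays dropped), with the statistics as returned.
response = {
    "term_vectors": {
        "author_wording": {
            "field_statistics": {
                "sum_doc_freq": 3408583,
                "doc_count": 16111,
                "sum_ttf": 7851321,
            },
            "terms": {
                "318": {"doc_freq": 4, "ttf": 4, "term_freq": 1},
                "742": {"doc_freq": 1, "ttf": 1, "term_freq": 1},
                "a": {"doc_freq": 14921, "ttf": 163268, "term_freq": 11},
            },
        }
    }
}


def tf_idf_scores(response: dict, field: str) -> dict:
    """Textbook tf-idf per term: raw term_freq * ln(N / doc_freq).

    N is taken from field_statistics.doc_count (documents that have this
    field). doc_freq is only present when term_statistics=true was requested.
    """
    field_tv = response["term_vectors"][field]
    n_docs = field_tv["field_statistics"]["doc_count"]
    scores = {}
    for term, stats in field_tv["terms"].items():
        tf = stats["term_freq"]
        df = stats["doc_freq"]  # at least 1, since this document contains the term
        scores[term] = tf * math.log(n_docs / df)
    return scores


scores = tf_idf_scores(response, "author_wording")
# Print terms from most to least characteristic of this document.
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}\t{score:.3f}")
```

Other weightings, such as log-scaled tf or a smoothed idf like ln((N + 1) / (df + 1)) + 1, drop into the same loop; which one fits depends on what the scores will be used for.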