I have a 20 GB CSV file in the following format:
date,ip,dev_type,env,time,cpu_usage
2015-11-09,10.241.121.172,M2,production,11:01,8
2015-11-09,10.241.121.172,M2,production,11:02,9
2015-11-09,10.241.121.243,C1,preproduction,11:01,4
2015-11-09,10.241.121.243,C1,preproduction,11:02,8
2015-11-10,10.241.121.172,M2,production,11:01,3
2015-11-10,10.241.121.172,M2,production,11:02,9
2015-11-10,10.241.121.243,C1,preproduction,11:01,4
2015-11-10,10.241.121.243,C1,preproduction,11:02,8
I imported it into Elasticsearch, where each document looks like this:
{
  "_index": "cpuusage",
  "_type": "logs",
  "_id": "AVFOkMS7Q4jUWMFNfSrZ",
  "_score": 1,
  "_source": {
    "date": "2015-11-10",
    "ip": "10.241.121.172",
    "dev_type": "M2",
    "env": "production",
    "time": "11:02",
    "cpu_usage": "9"
  },
  "fields": {
    "date": [
      1447113600000
    ]
  }
}
...
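When I look up the maximum cpu_usage of each IP per day, how can I also output all the fields (date, IP, dev_type, env, cpu_usage) of the document that holds that maximum? My current query: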
curl -XGET 'localhost:9200/cpuusage/_search?pretty' -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "events_by_date": {
      "date_histogram": {
        "field": "date",
        "interval": "day"
      },
      "aggs": {
        "genders": {
          "terms": {
            "field": "ip",
            "size": 100000,
            "order": { "_count": "asc" }
          },
          "aggs": {
            "cpu_usage": { "max": { "field": "cpu_usage" } }
          }
        }
      }
    }
  }
}'
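One commonly used way to get the whole document behind each maximum is a top_hits sub-aggregation, sorted by cpu_usage in descending order with size 1, placed next to the max aggregation. The following is a minimal sketch, not a verified answer. It assumes that Elasticsearch listens on localhost:9200 and that cpu_usage is mapped as a number; the numeric max values in the output suggest that it is. The helper name search_max_cpu_rows and the sub-aggregation name top_row are hypothetical. Everything else mirrors the query above.

```shell
# Hypothetical helper: same query as above, plus a top_hits sub-aggregation
# that returns the full document with the highest cpu_usage per (day, ip) bucket.
# Assumes Elasticsearch on localhost:9200 and a numeric mapping for cpu_usage.
search_max_cpu_rows() {
  curl -s -XGET 'localhost:9200/cpuusage/_search?pretty' \
    -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "events_by_date": {
      "date_histogram": { "field": "date", "interval": "day" },
      "aggs": {
        "genders": {
          "terms": { "field": "ip", "size": 100000, "order": { "_count": "asc" } },
          "aggs": {
            "cpu_usage": { "max": { "field": "cpu_usage" } },
            "top_row": {
              "top_hits": {
                "size": 1,
                "sort": [ { "cpu_usage": { "order": "desc" } } ]
              }
            }
          }
        }
      }
    }
  }
}'
}
```

If this works as intended, each IP bucket in the response also contains top_row.hits.hits. The single entry in that array has a _source with date, ip, dev_type, env, time and cpu_usage. If several documents tie on the maximum, this sketch does not control which one is returned. On Elasticsearch 7.2 and later, the interval parameter of date_histogram is deprecated in favour of calendar_interval.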
---cut---
----output ----
"aggregations" : {
  "events_by_date" : {
    "buckets" : [ {
      "key_as_string" : "2015-11-09T00:00:00.000Z",
      "key" : 1447027200000,
      "doc_count" : 4,
      "genders" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [ {
          "key" : "10.241.121.172",
          "doc_count" : 2,
          "cpu_usage" : {
            "value" : 9.0
          }
        }, {
          "key" : "10.241.121.243",
          "doc_count" : 2,
          "cpu_usage" : {
            "value" : 8.0
          }
        } ]
      }
    },