Indexing a CSV into ElasticSearch with Python

Question


python csv elasticsearch python-3.5 elasticsearch-dsl

9

I would like to index a CSV file into ElasticSearch using the elasticsearch-dsl high-level library, without using Logstash.

Suppose there is a CSV file with a header row, for example:

name,address,url
adam,hills 32,http://rockit.com
jane,valleys 23,http://popit.com

What is the best way to index all of the data by field? Ultimately, I want each row to look like this:

{
"name": "adam",
"address": "hills 32",
"url":  "http://rockit.com"
}

- bluesummers
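
As a rough illustration of the target shape described in the question, the standard library's csv.DictReader already turns each row into a dict keyed by the header line. The sketch below assumes the sample data from the question and uses io.StringIO in place of a real file; the names SAMPLE_CSV and rows_as_dicts are illustrative and not part of the original post.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for a real file.
SAMPLE_CSV = """name,address,url
adam,hills 32,http://rockit.com
jane,valleys 23,http://popit.com
"""


def rows_as_dicts(fileobj):
    """Return one plain dict per CSV row, keyed by the header line."""
    return [dict(row) for row in csv.DictReader(fileobj)]


rows = rows_as_dicts(io.StringIO(SAMPLE_CSV))
print(rows[0])
```

Each dict has the same fields as the JSON document in the question, so it can serve as the body of one indexed document.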

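It looks like elasticsearch-dsl depends on the elasticsearch-py library. Take a look at the examples in the elasticsearch-py documentation on how to insert documents. - user378704

The question is not about indexing a document, but about a technique for indexing an entire .csv file into Elasticsearch. - bluesummers

2 Answers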

1

If you want to build an Elasticsearch index from a .tsv/.csv file with strict types and a model for better filtering, you can do something like this:

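from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from elasticsearch_dsl import DocType, Text, connections

# elasticsearch-dsl needs a default connection for init() and save()
connections.create_connection(hosts=['localhost'])

class ElementIndex(DocType):
    # one typed field per CSV column (ROWNAME_1/ROWNAME_2 are placeholders)
    ROWNAME_1 = Text()
    ROWNAME_2 = Text()

    class Meta:
        index = 'index_name'

def indexing(row):
    # row is a dict holding one line of the source file (NAME_1/NAME_2 are placeholders)
    obj = ElementIndex(
        ROWNAME_1=str(row['NAME_1']),
        ROWNAME_2=str(row['NAME_2'])
    )
    obj.save(index="index_name")
    return obj.to_dict(include_meta=True)

def bulk_indexing(result):
    # result: an iterable of dicts with the data from your source

    # ElementIndex.init(index="index_name")
    ElementIndex.init()
    es = Elasticsearch()

    r = bulk(client=es, actions=(indexing(c) for c in result))
    es.indices.refresh()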

- Alex


- Honza Král · Accepted Answer

41

This kind of task is much easier with the lower-level elasticsearch-py library:

from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch()

with open('/tmp/x.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='my-index', doc_type='my-type')

- Honza Král

This is the answer I was looking for. I'll try it in a few hours and report back, thanks! - bluesummers

1

Regarding the mapping, how do we make it know the type of each field? - Souad

A small detail in your code snippet: there is a typo in Elasicsearch (it should be ElasticSearch). - Montenegrodr

1

@shinz4u Just wrap the reader in something that adds the desired id to each dict under the _id key, and elasticsearch will pick it up. - Honza Král
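
One possible reading of the "wrap the reader" suggestion is a small generator that copies each row and adds an _id key. This is only a sketch: the helper name with_ids and the choice of the name column as the id are assumptions made for illustration, and the sample row is taken from the question.

```python
import csv
import io


def with_ids(rows, id_field):
    """Yield a copy of each row with an extra '_id' key taken from id_field."""
    for row in rows:
        action = dict(row)
        action["_id"] = row[id_field]
        yield action


# io.StringIO stands in for the real CSV file.
sample = io.StringIO("name,address,url\nadam,hills 32,http://rockit.com\n")
actions = list(with_ids(csv.DictReader(sample), "name"))
print(actions[0]["_id"])

# With a running cluster, the generator could be passed to the bulk helper
# in place of the bare reader, for example:
# helpers.bulk(es, with_ids(reader, "name"), index='my-index', doc_type='my-type')
```

Because the wrapper is a generator, rows are still streamed to the bulk helper one at a time rather than being loaded into memory first.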

2

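@seamaner That means elasticsearch was not able to process the data you sent fast enough. You can increase the timeout (the default is 10 seconds) by passing timeout=N (where N > 10) to Elasticsearch when instantiating it. - Honza Král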
