I am trying to extract information like this from the paragraphs below:
women_ran  men_ran  kids_ran  walked
1          2        1         3
2          4        3         1
3          6        5         2
text = [
    "On Tuesday, one women ran on the street while 2 men ran and 1 child ran on the sidewalk. Also, there were 3 people walking.",
    "One person was walking yesterday, but there were 2 women running as well as 4 men and 3 kids running.",
    "The other day, there were three women running and also 6 men and 5 kids running on the sidewalk. Also, there were 2 people walking in the park.",
]
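I am using Python's spaCy as my NLP library. I am fairly new to NLP and would appreciate some guidance on the best way to extract this tabular information from sentences like these. If I only needed to detect whether someone was running or walking, I would fit a classification model with sklearn, but the information I need is clearly more fine-grained (I am trying to retrieve each subcategory and its value). Any guidance would be greatly appreciated.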
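As a point of comparison for the goal described above, here is a minimal rule-based sketch that uses only the standard library `re` module, not spaCy. It assumes every count appears as "<number> <group noun>" and belongs to the next running or walking word that follows it. The names `NUMBER_WORDS`, `GROUPS`, and `extract_counts` are made up for this illustration, and the rules are tuned to the three example sentences, so they will likely miss other phrasings.

```python
import re

# Assumption: only small number words occur; extend the map as needed.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# Assumption: each noun form maps to one group label used in the column names.
GROUPS = {"woman": "women", "women": "women", "man": "men", "men": "men",
          "kid": "kids", "kids": "kids", "child": "kids", "children": "kids",
          "person": "people", "people": "people"}

# A count (digits or a number word) followed by a group noun.
COUNT_NOUN = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+(" + "|".join(GROUPS) + r")\b",
    re.IGNORECASE,
)
# Running or walking words; \b keeps "sidewalk" from matching "walk".
ACTIVITY = re.compile(r"\b(ran|run|running|walk|walked|walking)\b", re.IGNORECASE)


def extract_counts(sentence):
    """Return one table row (column name -> count) for a paragraph."""
    row = {"women_ran": 0, "men_ran": 0, "kids_ran": 0, "walked": 0}
    for match in COUNT_NOUN.finditer(sentence):
        raw, noun = match.group(1).lower(), match.group(2).lower()
        count = int(raw) if raw.isdigit() else NUMBER_WORDS[raw]
        # Attach the count to the first activity word after the noun.
        verb = ACTIVITY.search(sentence, match.end())
        if verb is None:
            continue
        if verb.group(1).lower().startswith("walk"):
            row["walked"] += count
        else:
            key = GROUPS[noun] + "_ran"
            if key in row:  # e.g. "people ran" has no column, so it is skipped
                row[key] += count
    return row


text = [
    "On Tuesday, one women ran on the street while 2 men ran and 1 child ran "
    "on the sidewalk. Also, there were 3 people walking.",
    "One person was walking yesterday, but there were 2 women running as well "
    "as 4 men and 3 kids running.",
    "The other day, there were three women running and also 6 men and 5 kids "
    "running on the sidewalk. Also, there were 2 people walking in the park.",
]
rows = [extract_counts(paragraph) for paragraph in text]
for row in rows:
    print(row)
```

A baseline like this is brittle by design: it has no notion of grammar, so a sentence such as "the men who walked past saw 2 kids running" would need a parser-based approach to get right.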
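id or class (the DOM is the data structure for HTML/XML documents and trees, used in JavaScript and elsewhere). So you can filter by ID and class to find elements. In NLP, a dependency parser turns unstructured text into an HTML-like tree data structure, with labels that can be queried using DOM selector filters and XPath queries. - hobs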
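To make the tree analogy in the comment above concrete, here is a small sketch that stores a dependency tree as XML and queries it with the limited XPath support in Python's `xml.etree.ElementTree`. The tree is hand-written for the fragment "2 men ran and 1 child ran" and is not real parser output. The `tok` tag, its attributes, and the label names (`nsubj`, `nummod`, and so on) are assumptions chosen to resemble common dependency labels.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical parse of "2 men ran and 1 child ran".
# Each <tok> is a word; nesting means "depends on the enclosing word".
PARSE = """
<tok text="ran" dep="ROOT" pos="VERB">
  <tok text="men" dep="nsubj" pos="NOUN">
    <tok text="2" dep="nummod" pos="NUM"/>
  </tok>
  <tok text="and" dep="cc" pos="CCONJ"/>
  <tok text="ran" dep="conj" pos="VERB">
    <tok text="child" dep="nsubj" pos="NOUN">
      <tok text="1" dep="nummod" pos="NUM"/>
    </tok>
  </tok>
</tok>
"""

root = ET.fromstring(PARSE.strip())

triples = []
# Visit every word, keep the verbs, then select children by label,
# much like filtering DOM elements by class.
for verb in root.iter("tok"):
    if verb.get("pos") != "VERB":
        continue
    for subject in verb.findall("tok[@dep='nsubj']"):
        number = subject.find("tok[@dep='nummod']")
        if number is not None:
            triples.append((verb.get("text"), subject.get("text"), int(number.get("text"))))

print(triples)  # [('ran', 'men', 2), ('ran', 'child', 1)]
```

With a real parser the same idea applies, except that the tree comes from the parser and the labels depend on the model in use.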