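Is there an open-source Java library/algorithm for determining whether a given piece of text is a question?
I am working on a question answering system and need to analyze whether the text entered by the user is a question.
I think this could probably be solved with an open-source NLP library, but it is obviously more involved than simple part-of-speech tagging. So it would be great if someone could tell me how to approach it with an existing open-source NLP library.
If you know of a library/toolkit that uses data mining to solve this problem, please let me know. Getting sufficient training data will be difficult, but I could use Stack Exchange data for training.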
In a syntactic parse of a question, the expected structure has the form:
(SBARQ (WH+ (W+) ...)
       (SQ ...*
           (V+) ...*)
       (?))
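So, using any of the available syntactic parsers, a tree with an SBARQ node that has an (optional) embedded SQ indicates that the input is a question. The WH+ node (WHNP/WHADVP/WHADJP) contains the question stem (who/what/when/where/why/how), and the SQ holds the inverted phrase.
For example: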
(SBARQ
  (WHNP
    (WP What))
  (SQ
    (VBZ is)
    (NP
      (DT the)
      (NN question)))
  (. ?))
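Of course, a lot of preceding clauses will cause parse errors (which can be worked around), and so will really poorly written questions. For example, the title of this post, "How to tell whether a sentence is a question?", will have an SBARQ but no SQ.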
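The SBARQ/SQ check described above can be sketched in plain Java. This is only a minimal illustration, not a definitive implementation: it assumes the parse is already available as a Penn Treebank style bracketed string (the format that constituency parsers such as the Stanford Parser or Apache OpenNLP can print), and the class and method names (`QuestionTreeCheck`, `looksLikeQuestion`, `hasInvertedClause`) are hypothetical. Treating a top-level SQ as a yes/no question is an extra assumption based on the Penn Treebank tag set; the answer itself only mentions SBARQ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: inspects a Penn Treebank style bracketed parse
// for the question clause labels discussed above. It does not run a parser;
// the bracketed string is assumed to come from one.
final class QuestionTreeCheck {

    // Collects every constituent label, i.e. the token directly after "(".
    static List<String> labels(String bracketedParse) {
        List<String> labels = new ArrayList<>();
        int i = 0;
        while ((i = bracketedParse.indexOf('(', i)) >= 0) {
            int start = ++i;
            while (i < bracketedParse.length()
                    && !Character.isWhitespace(bracketedParse.charAt(i))
                    && bracketedParse.charAt(i) != '('
                    && bracketedParse.charAt(i) != ')') {
                i++;
            }
            if (i > start) {
                labels.add(bracketedParse.substring(start, i));
            }
        }
        return labels;
    }

    // First label below an optional ROOT/TOP wrapper, or "" if there is none.
    static String topClause(String bracketedParse) {
        for (String label : labels(bracketedParse)) {
            if (!label.equals("ROOT") && !label.equals("TOP")) {
                return label;
            }
        }
        return "";
    }

    // SBARQ at the top marks a wh-question. Assumption: a bare SQ at the top
    // is also accepted, to cover yes/no questions.
    static boolean looksLikeQuestion(String bracketedParse) {
        String top = topClause(bracketedParse);
        return top.equals("SBARQ") || top.equals("SQ");
    }

    // True when an inverted clause (SQ) appears anywhere in the tree.
    static boolean hasInvertedClause(String bracketedParse) {
        return labels(bracketedParse).contains("SQ");
    }

    public static void main(String[] args) {
        String question =
            "(SBARQ (WHNP (WP What)) (SQ (VBZ is) (NP (DT the) (NN question))) (. ?))";
        String statement =
            "(ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NNS questions))) (. .)))";
        System.out.println(looksLikeQuestion(question));   // true
        System.out.println(looksLikeQuestion(statement));  // false
    }
}
```

Keeping `hasInvertedClause` separate from `looksLikeQuestion` lets a caller accept inputs like the post title, which parse to an SBARQ without an SQ, while still being able to tell the two cases apart.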
User: I would like to know how many angels fit on the head of a pin.
System: Do you have a question about angels?
User: Yes.
System: Do you want to know if angels are fit?
User: No.
System: Do you want to know if angels have heads?
User: Possibly.
System: Do you want to know if angels have pins?
User: No.
System: Do you want to know if angels are numerous?
User: No.
System: Do you want to know the dimensions of an angel?
User: Yes.
System: Do you mean angels in paintings?
User: No.
System: Do you mean angels in myth and religious writing?
User: Yes.
System: Angels are metaphysical beings.
User: I hear that Pennsylvania was founded by William Penn. Is that true?
System: Do you have a question about hearing?
User: No.
System: Do you have a question about Pennsylvania?
User: Yes.
System: Pennsylvania was founded by William Penn.
User: When?
System: 1682.
User: What does the name mean?
System: What name?
User: Pennsylvania!
System: Do you want to know the meaning of Pennsylvania?
User: Yes.
System: Pennsylvania means Penn's Woods.