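Using the R package openNLP, is it possible to extract noun+noun or (adj|noun)+noun sequences? In other words, I want to use a linguistic filter to extract candidate noun phrases. Could you show me how to do this?
Many thanks.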
Thanks for the reply. Here is the code:
library("openNLP")

acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in
pipeline and terminal operations for 12.2 mln dlrs. The company said
the sale is subject to certain post closing adjustments,
which it did not explain. Reuter."

# Tag every token. The result is a single string of "word/TAG" items
# separated by spaces. (tagPOS() belongs to older openNLP releases.)
acqTag <- tagPOS(acq)
acqTagSplit <- strsplit(acqTag, " ")
acqTagSplit

# Pull the tag out of every "word/TAG" item. The tag is taken as the piece
# after the last "/", so a word that itself contains "/" is still handled.
tag <- character(length(acqTagSplit[[1]]))
for (i in seq_along(acqTagSplit[[1]])) {
  parts <- strsplit(acqTagSplit[[1]][i], "/")[[1]]
  tag[i] <- parts[length(parts)]
}

# Record the position i of every adjacent pair (i, i + 1) that matches
# noun+noun or adj+noun. seq_len() avoids the 1:0 trap for short inputs,
# and %in% is safe when a tag is missing.
index <- integer(0)
for (i in seq_len(length(tag) - 1)) {
  if (tag[i] %in% c("NN", "NNS", "JJ") && tag[i + 1] %in% c("NN", "NNS")) {
    index <- c(index, i)
  }
}
index
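Each value in index is the position in acqTagSplit of the first word of a matching pair, so you can use it to look up the noun+noun or (adj|noun)+noun phrases. (The code is not optimal, but it works. If you have any ideas, please let me know.)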
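A more compact variant of the same pair filter might look like the sketch below. It is only an illustration: extract_pairs is a hypothetical helper, not an openNLP function, and the sample input is written by hand in the same "word/TAG" format so that the code runs without the tagger.

```r
# Hypothetical helper (not part of openNLP): return the words of every
# adjacent pair that matches (adj|noun) followed by noun.
extract_pairs <- function(tagged) {
  # tagged: character vector of "word/TAG" items, e.g. acqTagSplit[[1]]
  word <- sub("/[^/]*$", "", tagged)  # text before the last "/"
  tag  <- sub("^.*/", "", tagged)     # text after the last "/"
  n <- length(tag)
  if (n < 2) return(character(0))
  first  <- tag[-n] %in% c("NN", "NNS", "JJ")  # tags at positions 1..n-1
  second <- tag[-1] %in% c("NN", "NNS")        # tags at positions 2..n
  i <- which(first & second)
  paste(word[i], word[i + 1])
}

# Hand-written sample; the tags are illustrative, not real tagger output.
sample_tagged <- c("the/DT", "sale/NN", "is/VBZ", "subject/JJ", "to/TO",
                   "certain/JJ", "post/NN", "closing/NN", "adjustments/NNS")
extract_pairs(sample_tagged)
```

With real data, extract_pairs(acqTagSplit[[1]]) should return the phrases directly rather than their positions.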
I also have a follow-up question:
Justeson and Katz (1995) proposed another linguistic filter for extracting candidate noun phrases:
((Adj|Noun)+|((Adj|Noun)*(Noun-Prep)?)(Adj|Noun)*)Noun
I can't quite work out what it means. Could you explain it to me, or show how to write this filter rule in R? Many thanks.
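As a starting point, one possible reading of the pattern is: a run of adjectives and nouns, optionally containing one noun followed by a preposition, and always ending in a noun. The sketch below encodes each token as a single letter (A, N, P, or O for anything else) and applies the pattern as a regular expression over that string. Several things here are assumptions of the sketch rather than part of the paper or of openNLP: the helper name jk_filter, the mapping of Penn Treebank tags JJ*, NN*, and IN to A, N, and P, and the choice to keep only matches of two or more tokens.

```r
# Hypothetical helper: apply the filter to a vector of "word/TAG" items.
jk_filter <- function(tagged) {
  word <- sub("/[^/]*$", "", tagged)  # text before the last "/"
  tag  <- sub("^.*/", "", tagged)     # text after the last "/"

  # One letter per token, so a character position equals a token position.
  code <- rep("O", length(tag))
  code[grepl("^JJ", tag)] <- "A"      # assumption: JJ, JJR, JJS are adjectives
  code[grepl("^NN", tag)] <- "N"      # assumption: NN, NNS, NNP, NNPS are nouns
  code[tag == "IN"] <- "P"            # assumption: IN stands for a preposition
  codes <- paste(code, collapse = "")

  # The pattern from the question, with Adj = A, Noun = N, Noun-Prep = NP.
  pattern <- "((A|N)+|((A|N)*(NP)?)(A|N)*)N"
  m <- gregexpr(pattern, codes, perl = TRUE)[[1]]
  start <- as.integer(m)
  len <- attr(m, "match.length")

  # Assumption: drop single nouns; this also drops the "no match" marker (-1).
  keep <- which(len >= 2)
  vapply(keep, function(j) {
    paste(word[start[j]:(start[j] + len[j] - 1)], collapse = " ")
  }, character(1))
}

# Hand-written sample; the tags are illustrative, not real tagger output.
sample_tagged <- c("the/DT", "degrees/NNS", "of/IN", "freedom/NN", "in/IN",
                   "a/DT", "linear/JJ", "function/NN", "are/VBP", "few/JJ")
jk_filter(sample_tagged)
```

Because the matches come from gregexpr, they never overlap, and a longer phrase is preferred over the shorter phrases nested inside it. Whether that matches the intended behaviour of the filter is something to check against the paper.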