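I'm writing a Python script, but I can't get it to produce the right result. It takes two inputs:
- a data file
- a stop file
The aim of the script is to:
- remove the entire line if the string in column 1 of the data file matches a string in the stop file.
Here is an example of the data file: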
abandonment-n after+n-the+n-a-j stop-n 1
abandonment-n against+n-the+ns leave-n 1
cake-n against+n-the+vg rest-v 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1
Here is an example of the stop file:
apple-n
banana-n
cake-n
pigeon-n
Here is the code I have so far:
with open("input1", "rb") as oIndexFile:
    for line in oIndexFile:
        lemma = line.split()
        #print lemma
with open ("input2", "rb") as oSenseFile:
    with open("output", "wb") as oOutFile:
        for line in oSenseFile:
            concept, slot, filler, freq = line.split()
            nounsInterest = [concept, slot, filler, freq]
            #print concept
            if concept != lemma:
                outstring = '\t'.join(nounsInterest)
                oOutFile.write(outstring + '\n')
            else:
                pass
The desired output is:
abandonment-n after+n-the+n-a-j stop-n 1
abandonment-n against+n-the+ns leave-n 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1
Any insights?
As of now, the output I get is the following, which is essentially the data file printed back unchanged:
abandonment-n after+n-the+n-a-j stop-n 1
abandonment-n against+n-the+ns leave-n 1
cake-n against+n-the+vg rest-v 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1
Things I have tried that still do not work:
Changing
    if concept != lemma:
to
    if concept not in lemma:
gives the same output as before.
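A small check may help show why both conditions let every row through. It assumes that, once the first loop has finished, lemma holds only the split of the last stop-file line; the literal values below are copied from the sample files for illustration.

```python
# Assumption: lemma is whatever the final loop iteration left behind,
# i.e. the split of the last line of the stop file.
lemma = "pigeon-n\n".split()    # a one-element list: ['pigeon-n']
concept = "cake-n"              # first column of the row that should be dropped

# A string is never equal to a list, so this is True for every row.
always_different = concept != lemma

# Membership is tested against the single remaining stop word only,
# so "cake-n" is still treated as not being a stop word.
not_in_last_word = concept not in lemma

print(always_different, not_in_last_word)
```

Under that assumption, both conditions are true for the cake-n row, so it is written to the output either way.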
I also suspected that the script was not actually using the first input file, but even after nesting that loop inside the code:
with open ("input2", "rb") as oSenseFile:
    with open("tinput1", "rb") as oIndexFile:
        for line in oIndexFile:
            lemma = line.split()
            with open("out", "wb") as oOutFile:
                for line in oSenseFile:
                    concept, slot, filler, freq = line.split()
                    nounsInterest = [concept, slot, filler, freq]
                    if concept not in lemma:
                        outstring = '\t'.join(nounsInterest)
                        oOutFile.write(outstring + '\n')
                    else:
                        pass
This code produces an empty output file.
I also tried a different approach, following a linked example:
filename = "input1.txt"
filename2 = "input2.txt"
filename3 = "output1"
def fixup(filename):
    fin1 = open(filename)
    fin2 = open(filename2, "r")
    fout = open(filename3, "w")
    for word in filename:
        words = word.split()
    for line in filename2:
        concept, slot, filler, freq = line.split()
        nounsInterest = [concept, slot, filler, freq]
        if True in [concept in line for word in toRemove]:
            pass
        else:
            outstring = '\t'.join(nounsInterest)
            fout.write(outstring + '\n')
    fin1.close()
    fin2.close()
    fout.close()
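This was adapted from that example, but it did not work either. In this case, no output is produced at all.
Could someone point me in the right direction on how to solve this task? The sample files are small, but I will have to run this on a large file.
Any help is appreciated.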
Every call to line.split() creates a new list. In your case, lemma is ["pigeon-n"] after the loop, which is why the output is not what you expected. - flyingfoxlee
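One possible way to act on this comment is to collect every stop word into a set first and only then stream through the data file. The sketch below is a minimal illustration, not a definitive fix: the function names load_stop_words and filter_rows are hypothetical, it is written for Python 3 (the question's code looks like Python 2), and it assumes the files are whitespace-separated UTF-8 text with the stop word in the first column.

```python
def load_stop_words(stop_path):
    """Read one stop word per line into a set (hypothetical helper)."""
    with open(stop_path, encoding="utf-8") as stop_file:
        # strip() removes the trailing newline; blank lines are skipped
        return {line.strip() for line in stop_file if line.strip()}


def filter_rows(data_path, stop_path, out_path):
    """Copy rows whose first column is not a stop word; return the number kept."""
    stop_words = load_stop_words(stop_path)
    kept = 0
    with open(data_path, encoding="utf-8") as data_file:
        with open(out_path, "w", encoding="utf-8") as out_file:
            # Iterating over the file object reads one line at a time,
            # so a large data file is never loaded into memory at once.
            for line in data_file:
                fields = line.split()
                if not fields:
                    continue  # ignore empty lines
                if fields[0] in stop_words:
                    continue  # first column is a stop word: drop the row
                out_file.write("\t".join(fields) + "\n")
                kept += 1
    return kept
```

With the file names from the question, a call would look like filter_rows("input2", "input1", "output"). The set makes each lookup roughly constant time, which matters more than the loop structure once the data file gets large.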