输入文件的格式如下:
CHROMOSOME_IV ncRNA gene 5723085 5723105 . - . ID=Gene:WBGene00045518 CHROMOSOME_IV ncRNA ncRNA 5723085 5723105 . - . Parent=Gene:WBGene00045518
替换值的文件格式如下:
WBGene00045518 21ur-5153
以下是我的代码:
infile1 = open('f1.txt', 'r')
infile2 = open('f2.txt', 'r')
outfile = open('out.txt', 'w')
import re
from datetime import datetime
startTime = datetime.now()
udict = {}
for line in infile1:
line = line.strip()
linelist = line.split('\t')
udict1 = {linelist[0]:linelist[1]}
udict.update(udict1)
mult10K = []
for x in range(100):
mult10K.append(x * 10000)
linecounter = 0
for line in infile2:
for key, value in udict.items():
matches = line.count(key)
if matches > 0:
print key, value
line = line.replace(key, value)
outfile.write(line + '\n')
else:
outfile.write(line + '\n')
linecounter += 1
if linecounter in mult10K:
print linecounter
print (datetime.now()-startTime)
infile1.close()
infile2.close()
outfile.close()