How to print the three rows with the highest values?


I have a tab-separated input file:

10N06_64  sc635516  93.93   100.0
10N06_64  sc711028  93.99   100.0
10N06_64  sc255425  93.46   95.8
10N06_64  sc115511  87.5    93.0
116F19_238  sc121016    91.30   12.1
116F19_238  sc1132492   90.94   6.1
116F19_238  sc513573    87.38   6.1
116F19_238  sc68511 75.93   10.5

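I need to group the rows by line[0], iterate within each group, and print the 3 rows with the highest line[3] value (using line[2] to break ties), so that the output file looks like this: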

10N06_64  sc635516  93.93   100.0
10N06_64  sc711028  93.99   100.0
10N06_64  sc255425  93.46   95.8
116F19_238  sc121016    91.30   12.1
116F19_238  sc68511 75.93   10.5
116F19_238  sc1132492   90.94   6.1

Here is my attempt, but it only prints the single best row. How can I modify it to print the 3 best matches?

import csv
from itertools import groupby
from operator import itemgetter
with open('myfile', newline='') as f1:
    with open('outfile', 'w', newline='') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        for group, rows in groupby(reader, itemgetter(0)):
            best = max(rows, key=lambda r: (float(r[3]), float(r[2])))
            writer1.writerow(best)
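For reference, here is a small sketch of how the key (float(r[3]), float(r[2])) ranks rows. Tuples compare element by element, so column 3 decides first and column 2 only matters on a tie. The float() calls matter too: compared as strings, "95.8" beats "100.0". The rows are copied from the question; score is just a named stand-in for the lambda.

```python
# The 10N06_64 group from the question's sample input
rows = [
    ["10N06_64", "sc635516", "93.93", "100.0"],
    ["10N06_64", "sc711028", "93.99", "100.0"],
    ["10N06_64", "sc255425", "93.46", "95.8"],
    ["10N06_64", "sc115511", "87.5", "93.0"],
]

def score(row):
    # Compare column 3 first; column 2 only breaks ties
    return (float(row[3]), float(row[2]))

best = max(rows, key=score)
print(best[1])  # sc711028: ties with sc635516 on 100.0, wins on 93.99

ordered = sorted(rows, key=score, reverse=True)
print([r[1] for r in ordered])  # ['sc711028', 'sc635516', 'sc255425', 'sc115511']

# Without float(), the values are compared as text, character by character
as_strings = max(r[3] for r in rows)
print(as_strings)  # 95.8
```

Because of the tie-break, sc711028 (93.99) is ranked ahead of sc635516 (93.93), even though the expected output in the question lists them the other way round.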
4 answers

You can use the heapq.nlargest() function to get the rows with the highest values:
#!/usr/bin/env python
import csv
import sys
from heapq import nlargest
from itertools import groupby

writerows = csv.writer(sys.stdout, delimiter='\t').writerows
for _, rows in groupby(csv.reader(sys.stdin, delimiter='\t'), key=lambda r: r[0]):
    writerows(nlargest(3, rows, key=lambda row: (float(row[3]), float(row[2]))))

Example:

$ <input.csv ./your-script >output.csv

Output:

10N06_64    sc711028    93.99   100.0
10N06_64    sc635516    93.93   100.0
10N06_64    sc255425    93.46   95.8
116F19_238  sc121016    91.30   12.1
116F19_238  sc68511 75.93   10.5
116F19_238  sc1132492   90.94   6.1

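nlargest() avoids loading a whole input group into memory: while scanning the group it keeps only the n best rows seen so far. If the groups are always small, you can also use sorted(iterable, key=key, reverse=True)[:n].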
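A quick check, using the 116F19_238 rows from the question, that the heap-based and sort-based forms pick the same rows. The rows are copied from the question; score is a named version of the answer's lambda, and iter() is used so nlargest() sees a one-pass iterator, as it does with a groupby group.

```python
from heapq import nlargest

# The 116F19_238 group from the question's sample input
rows = [
    ["116F19_238", "sc121016", "91.30", "12.1"],
    ["116F19_238", "sc1132492", "90.94", "6.1"],
    ["116F19_238", "sc513573", "87.38", "6.1"],
    ["116F19_238", "sc68511", "75.93", "10.5"],
]

def score(row):
    # Column 3 first, column 2 as the tie-breaker (same key as the answer)
    return (float(row[3]), float(row[2]))

# Heap-based: holds at most 3 rows while scanning the iterator
top_heap = nlargest(3, iter(rows), key=score)
# Sort-based: builds a fully sorted copy, then slices off the first 3
top_sorted = sorted(rows, key=score, reverse=True)[:3]

print([r[1] for r in top_heap])  # ['sc121016', 'sc68511', 'sc1132492']
print(top_heap == top_sorted)    # True
```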


I would improve your code by sorting each group instead of taking the max:

Input:

10N06_64    sc635516    93.93   100.0
10N06_64    sc711028    93.99   100.0
10N06_64    sc255425    93.46   95.8
10N06_64    sc115511    87.5    93.0
116F19_238  sc121016    91.30   12.1
116F19_238  sc1132492   90.94   6.1
116F19_238  sc513573    87.38   6.1
116F19_238  sc68511 75.93   10.5

Code:

import csv
from itertools import groupby
from operator import itemgetter
with open('word.txt', newline='') as f1:
    reader = csv.reader(f1, delimiter='\t')
    for group, rows in groupby(reader, itemgetter(0)):
        # sort the whole group best-first and keep the top 3
        best = sorted(rows, key=lambda r: (float(r[3]), float(r[2])), reverse=True)[:3]
        for a in best:
            print(a)
        print("\n")

Output:

['10N06_64', 'sc711028', '93.99', '100.0']
['10N06_64', 'sc635516', '93.93', '100.0']
['10N06_64', 'sc255425', '93.46', '95.8']


['116F19_238', 'sc121016', '91.30', '12.1']
['116F19_238', 'sc68511', '75.93', '10.5']
['116F19_238', 'sc1132492', '90.94', '6.1']
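One caveat that applies to every groupby-based answer here: itertools.groupby only merges adjacent rows with the same key, so these solutions assume the input is already ordered by column 0, as the sample file is. A minimal sketch, with a few of the question's rows deliberately interleaved (that ordering is made up for illustration), showing the problem and one fix: sort by column 0 first, at the cost of reading all rows into memory.

```python
from heapq import nlargest
from itertools import groupby
from operator import itemgetter

# Rows from the question, interleaved on purpose (hypothetical ordering)
rows = [
    ["10N06_64", "sc635516", "93.93", "100.0"],
    ["116F19_238", "sc121016", "91.30", "12.1"],
    ["10N06_64", "sc711028", "93.99", "100.0"],
]

def score(row):
    return (float(row[3]), float(row[2]))

# groupby splits 10N06_64 into two groups because its rows are not adjacent
naive = [name for name, _ in groupby(rows, key=itemgetter(0))]
print(naive)  # ['10N06_64', '116F19_238', '10N06_64']

# Sorting by column 0 first makes each group contiguous again
fixed = {
    name: [r[1] for r in nlargest(3, grp, key=score)]
    for name, grp in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0))
}
print(fixed)  # {'10N06_64': ['sc711028', 'sc635516'], '116F19_238': ['sc121016']}
```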

Looks like we came up with the same solution ;) - avenet
@avenet Yes, almost at the same time :) - The6thSense
@avenet http://budapestbeacon.com/wp-content/uploads/2015/02/there-can-be-only-one.jpg - yuvi

You can try this:
import csv
from itertools import groupby
from operator import itemgetter

take = 3

with open('myfile', newline='') as f1:
    with open('outfile', 'w', newline='') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        for group, rows in groupby(reader, itemgetter(0)):
            sorted_items = sorted(rows, key=lambda r: (float(r[3]), float(r[2])), reverse=True)
            for item in sorted_items[:take]:
                writer1.writerow(item)

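sorted() accepts the same key argument as max(), but instead of returning only the single largest item it returns all items ordered by that key. With reverse=True the best rows come first, and sorted_items[:take] keeps the top take of them.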
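A small sketch of that relationship on the question's 116F19_238 rows: the first element of the reverse-sorted list is exactly what max() would return, and slicing past the end of a short group is safe. Note that in the answer rows is a one-shot groupby iterator, so it can only be consumed once; here it is a plain list (an assumption for the demo) so it can be ranked twice.

```python
# The 116F19_238 group from the question's sample input, as a list
rows = [
    ["116F19_238", "sc121016", "91.30", "12.1"],
    ["116F19_238", "sc1132492", "90.94", "6.1"],
    ["116F19_238", "sc513573", "87.38", "6.1"],
    ["116F19_238", "sc68511", "75.93", "10.5"],
]

def score(row):
    return (float(row[3]), float(row[2]))

take = 3
ranked = sorted(rows, key=score, reverse=True)

print(ranked[0] == max(rows, key=score))   # True: the head of the sort is the max
print([r[1] for r in ranked[:take]])       # ['sc121016', 'sc68511', 'sc1132492']
print(len(ranked[:10]))                    # 4: a slice past the end just stops
```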



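You need to use if statements to pick out the top three results, for example:

number1 = number2 = number3 = float('-inf')  # start below any real value
for x in table:  # table: the values to rank (assumed to be defined already)
    if x > number1:
        # new best: shift the old first and second places down one slot
        number1, number2, number3 = x, number1, number2
    elif x > number2:
        number2, number3 = x, number2
    elif x > number3:
        number3 = x

print(number1, number2, number3)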
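The same if/elif idea can be applied per group to whole rows, using the tuple key from the question. When a new row beats a slot, the lower slots have to shift down, otherwise earlier winners are silently lost. This is a sketch under assumptions: top3 is a hypothetical helper name, the rows are the question's sample data held in a list, and each slot stores a (key, row) pair with (-inf,) marking an empty slot.

```python
from itertools import groupby
from operator import itemgetter

def top3(rows, key):
    """Return up to 3 best rows (best first) in one pass, keeping 3 slots."""
    empty = ((float("-inf"),), None)  # sentinel: loses to any real key
    first = second = third = empty
    for row in rows:
        k = key(row)
        if k > first[0]:
            first, second, third = (k, row), first, second
        elif k > second[0]:
            second, third = (k, row), second
        elif k > third[0]:
            third = (k, row)
    return [row for _, row in (first, second, third) if row is not None]

def score(row):
    return (float(row[3]), float(row[2]))

# The question's sample input
rows = [
    ["10N06_64", "sc635516", "93.93", "100.0"],
    ["10N06_64", "sc711028", "93.99", "100.0"],
    ["10N06_64", "sc255425", "93.46", "95.8"],
    ["10N06_64", "sc115511", "87.5", "93.0"],
    ["116F19_238", "sc121016", "91.30", "12.1"],
    ["116F19_238", "sc1132492", "90.94", "6.1"],
    ["116F19_238", "sc513573", "87.38", "6.1"],
    ["116F19_238", "sc68511", "75.93", "10.5"],
]

result = [row[1] for _, grp in groupby(rows, key=itemgetter(0))
          for row in top3(grp, score)]
print(result)
# ['sc711028', 'sc635516', 'sc255425', 'sc121016', 'sc68511', 'sc1132492']
```

It picks the same rows as the nlargest() and sorted() answers while keeping only three rows per group in memory.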


Content provided by Stack Overflow.