Formatting data from a CSV file in Python (calculating averages)

Question


import csv
with open('Class1scores.csv') as inf:
    for line in inf:
        parts = line.split() 
        if len(parts) > 1:   
            print (parts[4])   


f = open('Class1scores.csv')
csv_f = csv.reader(f)
newlist = []
for row in csv_f:

    row[1] = int(row[1])
    row[2] = int(row[2])
    row[3] = int(row[3])

    maximum = max(row[1:3])
    row.append(maximum)
    average = round(sum(row[1:3])/3)
    row.append(average)
    newlist.append(row[0:4])

averageScore = [[x[3], x[0]] for x in newlist]
print('\nStudents Average Scores From Highest to Lowest\n')

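This code is supposed to read the CSV file and, for the three score columns (column 0 is the username), add all three scores together and divide by three. It does not calculate the correct average, though. It just takes the score from the last column.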

- Billy

Could you post the first few lines of your CSV file? - Igor

What is the point of opening the file twice? - Seekheart

Billy, take a look at my answer. You can remove the parts you don't need and adapt it to your own requirements. - Igor

2 Answers

Here is one way to do it, in two parts. First, we build a dictionary with each name as a key and a list of results as its value.

import csv


fileLineList = []
averageScoreDict = {}

with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

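for row in fileLineList:
    highest = 0
    lowest = None  # None means "no score seen yet", so a real score of 0 is still handled
    total = 0
    count = 0
    for column in row:
        column = column.strip()  # tolerate spaces after commas, e.g. "John, 10, 7, 9"
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if lowest is None or column < lowest:
                lowest = column
            total += column
            count += 1
    if count == 0:
        continue  # skip rows with no scores, e.g. a header or a blank line
    average = total / count  # divide by the number of scores actually found
    averageScoreDict[row[0]] = [highest, lowest, round(average)]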

print(averageScoreDict)

Output:

{'Milky': [7, 4, 5], 'Billy': [6, 5, 6], 'Adam': [5, 2, 4], 'John': [10, 7, 9]}

Now that we have the dictionary, we can produce the final output you want by sorting a list. See this updated code:

import csv
from operator import itemgetter


fileLineList = []
averageScoreDict = {} # Creating an empty dictionary here.

with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

for row in fileLineList:
    highest = 0
    lowest = None  # None means "no score seen yet", so a real score of 0 is still handled
    total = 0
    count = 0
    for column in row:
        column = column.strip()  # tolerate spaces after commas, e.g. "John, 10, 7, 9"
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if lowest is None or column < lowest:
                lowest = column
            total += column
            count += 1
    if count == 0:
        continue  # skip rows with no scores, e.g. a header or a blank line
    average = total / count
    # Here is where we put the empty dictionary created earlier to good use.
    # The key is the contents of the first column of the CSV (the name),
    # and it is assigned a list of values.
    # For the first line of the file, the key would be 'John'.
    # We assign John a list of 3 integers:
    #   highest, lowest and average (a float, which we round)
    averageScoreDict[row[0]] = [highest, lowest, round(average)]

averageScoreList = []

# Here we "unpack" the dictionary we have created into a list of [key, value] pairs,
# keeping only the name and the single value we want, in this case the average.
for key, value in averageScoreDict.items():
    averageScoreList.append([key, value[2]])

# Sort the list by the value (the average) instead of the name.
# reverse=True puts the highest average first.
averageScoreList.sort(key=itemgetter(1), reverse=True)

print('\nStudents Average Scores From Highest to Lowest\n')
print(averageScoreList)

Output:

Students Average Scores From Highest to Lowest

[['John', 9], ['Billy', 6], ['Milky', 5], ['Adam', 4]]
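Billy asks further down why reverse has to be True. Here is a minimal sketch using the four [name, average] pairs from the outputs above. Both sort() and sorted() order ascending by default, so reverse=True is what puts the highest average first. itemgetter(1) makes the sort compare the second element of each pair (the average) instead of the name.

```python
from operator import itemgetter

# [name, rounded average] pairs, taken from the outputs above
averageScoreList = [['Milky', 5], ['Billy', 6], ['Adam', 4], ['John', 9]]

# Default order is ascending: lowest average first
ascending = sorted(averageScoreList, key=itemgetter(1))

# reverse=True flips the order: highest average first
descending = sorted(averageScoreList, key=itemgetter(1), reverse=True)

print(ascending)   # [['Adam', 4], ['Milky', 5], ['Billy', 6], ['John', 9]]
print(descending)  # [['John', 9], ['Billy', 6], ['Milky', 5], ['Adam', 4]]
```

sorted() returns a new list. list.sort(), as used in the answer, reorders the existing list in place. Both accept the same key and reverse arguments.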

- Igor

Could you add a comment to each line so I can better understand the concept of dictionaries? - Billy

One last question: how do you know what to use as the keys in the dictionary? And why does reverse need to be set to True? - Billy

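I have added some comments to the code sample as well. This approach is a bit involved, but it is a good way to learn. In Python we usually call arrays lists. The simplest way to print the contents of a list is with a for loop, e.g. "for i in averageScoreList:" and then "print(i)" on the next line. You can also import pprint and use it to print dictionaries and lists, which sometimes makes the data structure easier to see, e.g. "pprint.pprint(averageScoreList)". - Igor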
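Here is a small sketch of the two printing options from the comment, using the list and dictionary shown in the outputs above. The width=30 argument is only an illustrative choice that makes pprint break the dictionary onto several lines. pprint also sorts dictionary keys by default.

```python
import pprint

averageScoreList = [['John', 9], ['Billy', 6], ['Milky', 5], ['Adam', 4]]
averageScoreDict = {'Milky': [7, 4, 5], 'Billy': [6, 5, 6], 'Adam': [5, 2, 4], 'John': [10, 7, 9]}

# A for loop prints one [name, average] pair per line
for i in averageScoreList:
    print(i)

# pprint.pprint lays out nested lists and dicts readably;
# a narrow width forces one dictionary entry per line
pprint.pprint(averageScoreDict, width=30)

# pprint.pformat returns the same text as a string instead of printing it
formatted = pprint.pformat(averageScoreList)
```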

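I used the name as the dictionary key because later on we can simply write something like "print(averageScoreDict['John'])" to get all the values associated with that key, in this case a list of 3 integers. It is just a way of structuring the data so that it is easy to refer to later in our code. - Igor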
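To show what name-as-key buys you, here is a minimal sketch using the dictionary printed in the first part of the answer (name -> [highest, lowest, average]). The name 'Zoe' is hypothetical and only illustrates a missing key.

```python
# name -> [highest, lowest, average], as printed in the first part of the answer
averageScoreDict = {'Milky': [7, 4, 5], 'Billy': [6, 5, 6], 'Adam': [5, 2, 4], 'John': [10, 7, 9]}

# Look up everything stored for one student by using the name as the key
print(averageScoreDict['John'])  # [10, 7, 9]

# Index into that list to pull out one statistic (index 2 is the average)
johnAverage = averageScoreDict['John'][2]

# Unpack the three values into named variables
highest, lowest, average = averageScoreDict['Billy']

# .get() returns a default instead of raising KeyError for a name that is not present
# ('Zoe' is a hypothetical name that does not appear in the data)
missing = averageScoreDict.get('Zoe', None)
```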

Page content provided by Stack Overflow.

- Patrick the Cat · Accepted Answer

Basically, you want statistics for each row. Normally you would do it like this:

import csv

with open('data.csv', 'r', newline='') as f:
    rows = csv.reader(f)
    for row in rows:
        name = row[0]
        # csv.reader returns strings, so convert the scores to ints before doing arithmetic
        scores = [int(score) for score in row[1:]]

        # calculate statistics of scores
        attributes = {
           'NAME': name,
           'MAX' : max(scores),
           'MIN' : min(scores),
           'AVE' : sum(scores) / len(scores)  # / is true division in Python 3
        }

        output_mesg = "name: {NAME:s} \t high: {MAX:d} \t low: {MIN:d} \t ave: {AVE:f}"
        print(output_mesg.format(**attributes))
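csv.reader returns every field as a string, so the scores have to be converted to numbers before max(), min() or sum() are applied. Compared as strings, '9' beats '10'. Below is a minimal sketch of the same per-row statistics. It assumes hypothetical sample rows (the real CSV was never posted), uses io.StringIO in place of the file, and swaps in statistics.mean for the hand-written average.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for the real CSV file
sample = io.StringIO("John,10,7,9\nAdam,5,2,4\n")

# String comparison goes character by character, so '9' > '10'
print(max(['10', '7', '9']))  # prints 9, not 10

results = {}
for row in csv.reader(sample):
    name = row[0]
    scores = [int(score) for score in row[1:]]  # convert before doing arithmetic
    results[name] = {
        'MAX': max(scores),
        'MIN': min(scores),
        'AVE': statistics.mean(scores),
    }
    print("name: {} \t high: {} \t low: {} \t ave: {:.2f}".format(
        name, results[name]['MAX'], results[name]['MIN'], results[name]['AVE']))
```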

Try not to worry about whether doing something locally is inefficient. A good Python script should be as readable as possible for everyone.

I found two bugs in your code:

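1. Appending to row has no visible effect. The appended maximum and average land at indices 4 and 5, but newlist.append(row[0:4]) keeps only indices 0 to 3 (the name and the three scores), and the row list itself is discarded once the loop moves on. As a result, x[3] in averageScore is the last score column, not the average.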

2. row[1:3] gives only the second and third elements. row[1:4] gives what you want, and so does row[1:]. Slicing in Python is end-exclusive.
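Here is a short sketch of both points. It assumes a hypothetical row ['John', 10, 9, 7] as it would look after the question's int() conversions. It shows the end-exclusive slices, and it shows how the row[0:4] slice that the question stores in newlist cuts off the values appended at indices 4 and 5.

```python
# Hypothetical row after the question's int() conversions
row = ['John', 10, 9, 7]

firstTwo = row[1:3]   # [10, 9]: end index 3 is excluded, so the third score is missing
allThree = row[1:4]   # [10, 9, 7]
openEnded = row[1:]   # [10, 9, 7]: everything after the name
print(firstTwo, allThree, openEnded)

row.append(max(row[1:4]))              # maximum lands at index 4
row.append(round(sum(row[1:4]) / 3))   # average lands at index 5
kept = row[0:4]                        # stops before index 4, dropping both
print(kept)     # ['John', 10, 9, 7]
print(kept[3])  # 7: the last score, not the average (9)
```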

Here are some questions for you to consider:

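If I can open the file in Excel and it is not that big, why not just do it in Excel? Can I use every tool available to get the job done as quickly as possible with minimal effort? Can I finish this task in 30 seconds?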