Tips for looping through a dictionary in Python

Question

python pandas loops dictionary key

4

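I am practicing Pandas and have the following task:

Create a list whose elements are the number of columns in each .csv file.

The .csv files are stored in the dictionary directory, keyed by year.

I use a dictionary comprehension, dataframes (also keyed by year), to store the .csv files as Pandas DataFrames.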

import pandas

directory = {2009: 'path_to_file/data_2009.csv', ... , 2018: 'path_to_file/data_2018.csv'}

dataframes = {year: pandas.read_csv(file) for year, file in directory.items()}

# My Approach 1 
columns = [df.shape[1] for year, df in dataframes.items()]

# My Approach 2
columns = [dataframes[year].shape[1] for year in dataframes]

Which approach is more "Pythonic"? Or is there a better way to solve this?

- Vivek Jha

2

Can you use [df.shape[1] for df in dataframes.values()]? - Peter Gibson

@PeterGibson That is exactly what I was looking for! I didn't know there was a dict.values() method. - Vivek Jha
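
The three ways of looping mentioned so far (by key, with .items(), and with .values()) can be compared side by side. The following is a minimal sketch; the column_names dictionary is a hypothetical stand-in for dataframes, used only so the example runs without any .csv files.

```python
# Hypothetical stand-in for the dataframes dictionary: year -> list of column names.
column_names = {
    2009: ["id", "name", "value"],
    2010: ["id", "name"],
    2011: ["id", "name", "value", "note"],
}

# Iterating over a dict yields its keys, so each value needs a lookup.
by_key = [len(column_names[year]) for year in column_names]

# .items() yields (key, value) pairs; useful when both are needed.
by_item = [len(cols) for _, cols in column_names.items()]

# .values() yields only the values; the shortest form when keys are not needed.
by_value = [len(cols) for cols in column_names.values()]

print(by_key, by_item, by_value)
```

All three lists come out identical. The forms differ only in how much of each key/value pair the loop body has to handle.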

4 Answers

4

Your approach 2:

columns = [dataframes[year].shape[1] for year in dataframes]

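is more Pythonic and concise for later use of the DataFrames in merging, plotting, manipulation, and so on, because the key is implicit in the comprehension and shape[1] gives the number of columns.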

- privatevoid

3

You can use the following:

columns = [len(dataframe.columns) for dataframe in dataframes.values()]

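As @piRSquared mentioned, if your only goal is to get the number of columns in each DataFrame, you should not read the entire csv file. Use the nrows keyword argument of the read_csv function instead.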
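
A minimal sketch of that suggestion follows. It assumes pandas is installed and that the first line of each file is a header row; the temporary files and their contents are made up for illustration. Passing nrows=0 asks read_csv to parse only the header, so no data rows are loaded.

```python
import os
import tempfile

import pandas as pd

# Create two small throwaway .csv files (hypothetical data, for illustration only).
tmp_dir = tempfile.mkdtemp()
samples = {2009: "a,b,c\n1,2,3\n4,5,6\n", 2010: "x,y\n7,8\n"}
directory = {}
for year, text in samples.items():
    path = os.path.join(tmp_dir, f"data_{year}.csv")
    with open(path, "w") as f:
        f.write(text)
    directory[year] = path

# nrows=0 reads the header only, so each DataFrame has columns but no rows.
columns = [pd.read_csv(path, nrows=0).shape[1] for path in directory.values()]
print(columns)
```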

- theSanjeev

2

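import os

# Use this to find files under a given directory; filter the list if other files are present.
path = 'path_to_file/'
target_files = os.listdir(path)
columns = list()
for filename in target_files:
    # In your scenario @piRSquared's answer would be more efficient.
    # Placeholder filled in here with the first-line comma count from that answer.
    with open(os.path.join(path, filename)) as f:
        columns.append(f.readline().count(',') + 1)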

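If you want the column counts keyed by the year taken from each filename, you can extract the year from the filename and update the dictionary like this: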

import re
year = re.sub(r'[^0-9]', '', filename)  # str.replace does not interpret regular expressions
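
As a sketch of how the extracted year could be used as a dictionary key: the filenames below are hypothetical, and the code assumes each relevant name contains a four-digit year, as in the data_2009.csv pattern from the question.

```python
import re

# Hypothetical filenames, following the data_<year>.csv pattern from the question.
filenames = ["data_2009.csv", "data_2010.csv", "notes.txt"]

directory = {}
for filename in filenames:
    # Look for a four-digit run; skip files that do not contain one.
    match = re.search(r"\d{4}", filename)
    if match is None:
        continue
    directory[int(match.group())] = filename

print(directory)
```

Searching for a four-digit run rather than stripping every non-digit avoids picking up unrelated digits elsewhere in a filename.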

- Shihe Zhang

- piRSquared · Accepted Answer

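Your approaches get the job done... but I don't like reading an entire file and creating a DataFrame just to count columns. You can do the same thing by reading only the first line of each file and counting the commas. Note that I add 1, because the number of commas is always one less than the number of columns.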

columns = [open(f).readline().count(',') + 1 for _, f in directory.items()]
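
Counting commas works when no header field contains a comma itself. If a header might contain a quoted field such as "name, full", the standard-library csv module can parse the first line instead. The following is a sketch under that assumption; count_columns is a hypothetical helper name and the sample file is made up for illustration.

```python
import csv
import os
import tempfile


def count_columns(path):
    """Return the number of fields in the first line of a .csv file."""
    with open(path, newline="") as f:
        # csv.reader honors quoting, so a comma inside quotes is not a separator.
        first_row = next(csv.reader(f), [])
    return len(first_row)


# Throwaway sample file whose header contains a quoted comma.
fd, sample_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write('id,"name, full",value\n1,"Doe, Jane",3.5\n')

with open(sample_path) as f:
    naive = f.readline().count(",") + 1  # counts the quoted comma too

print(naive, count_columns(sample_path))
```

The with blocks also close each file as soon as its first line has been read, which matters when looping over many files.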