Convert lists of tuples to a pandas.DataFrame

I have three lists of tuples. The first element of each tuple is a year, as shown below:

list1 = [
    ('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0), ('2013', 1694062.0), ('2014', 1906527.0), 
    ('2015', 1908661.0), ('2016', 2492979.0), ('2017', 2846997.0), ('2018', 2930313.0), ('2019', 2654724.0)
]

list2 = [
    ('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0), ('2013', 285066.0), ('2014', 282003.0), 
    ('2015', 354500.0), ('2016', 275383.0), ('2017', 322074.0), ('2018', 366909.0), ('2019', 297942.0)
]

list3 =[
    ('2010', 149036.0), ('2011', 144112.0), ('2012', 173944.0), ('2013', 205724.0), ('2014', 214019.0), 
    ('2015', 261462.0), ('2016', 260646.0), ('2017', 279267.0), ('2018', 288120.0), ('2019', 277106.0)
]

I want to build a pandas.DataFrame from these lists, with the years as the row index:

          list1     list2     list3
2010  1783675.0  302816.0  149036.0
2011  1815815.0  229549.0  144112.0
2012  1633258.0  323063.0  173944.0
2013  1694062.0  285066.0  205724.0
2014  1906527.0  282003.0  214019.0
2015  1908661.0  354500.0  261462.0
2016  2492979.0  275383.0  260646.0
2017  2846997.0  322074.0  279267.0
2018  2930313.0  366909.0  288120.0
2019  2654724.0  297942.0  277106.0
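For comparison, a minimal sketch of a shorter route, assuming (as in the sample data) that each list has at most one entry per year: dict() turns a list of (year, value) pairs into a {year: value} mapping, and a dict of such mappings becomes a DataFrame whose outer keys are the columns and whose inner keys (the years) form the row index. The lists are shortened here only to keep the example small.

```python
import pandas as pd

# Shortened copies of the question's lists (first three years only).
list1 = [('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0)]
list2 = [('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0)]
list3 = [('2010', 149036.0), ('2011', 144112.0), ('2012', 173944.0)]

# Each dict(...) maps year -> value; the outer dict maps column name -> that mapping,
# so pandas uses the column names as columns and the years as the row index.
df = pd.DataFrame({"list1": dict(list1), "list2": dict(list2), "list3": dict(list3)})
print(df)
```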
4 answers

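Besides the answers already given, another option is Python's collections.defaultdict, which makes it easy to gather the data into a single dictionary before reading it into a DataFrame: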

from collections import defaultdict
from itertools import chain

import pandas as pd

# Chain the three lists into one iterable, then collect
# all values for the same year into one list:
d = defaultdict(list)

for k, v in chain(list1, list2, list3):
    d[k].append(v)

# Read the data into a pandas DataFrame (one row per year):
df = pd.DataFrame.from_dict(d, orient='index', columns=['list1', 'list2', 'list3'])

          list1      list2       list3
2010    1783675.0   302816.0    149036.0
2011    1815815.0   229549.0    144112.0
2012    1633258.0   323063.0    173944.0
2013    1694062.0   285066.0    205724.0
2014    1906527.0   282003.0    214019.0
2015    1908661.0   354500.0    261462.0
2016    2492979.0   275383.0    260646.0
2017    2846997.0   322074.0    279267.0
2018    2930313.0   366909.0    288120.0
2019    2654724.0   297942.0    277106.0
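The defaultdict approach above relies on every list containing every year; if list1 or list2 skipped a year, the remaining values for that year could slide into the wrong column. A minimal sketch of a variant, using hypothetical shortened inputs in which list2 deliberately lacks 2011: it keys a nested dict by year and then by column name, so a missing value becomes NaN in the correct column.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical inputs: list2 is deliberately missing 2011.
list1 = [('2010', 1783675.0), ('2011', 1815815.0)]
list2 = [('2010', 302816.0)]
list3 = [('2010', 149036.0), ('2011', 144112.0)]

# Nested mapping: year -> {column name -> value}.
d = defaultdict(dict)
for name, lst in [('list1', list1), ('list2', list2), ('list3', list3)]:
    for year, value in lst:
        d[year][name] = value

# orient='index' makes the outer keys (years) the row index; any
# (year, column) pair that never appeared becomes NaN.
df = pd.DataFrame.from_dict(d, orient='index', columns=['list1', 'list2', 'list3'])
print(df)
```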

You can create a DataFrame for each list and combine them with the merge method.
import pandas as pd 

list1 = [('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0), ('2013', 1694062.0),
('2014', 1906527.0),  ('2015', 1908661.0), ('2016', 2492979.0), ('2017', 2846997.0), ('2018', 2930313.0),
 ('2019', 2654724.0)]

list2 = [('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0), ('2013', 285066.0),
 ('2014', 282003.0), ('2015', 354500.0), ('2016', 275383.0), ('2017', 322074.0), ('2018', 366909.0),
 ('2019', 297942.0)]

list3 =[('2010', 149036.0), ('2011', 144112.0), ('2012', 173944.0), ('2013', 205724.0),
 ('2014', 214019.0), ('2015', 261462.0), ('2016', 260646.0), ('2017', 279267.0), ('2018', 288120.0),
 ('2019', 277106.0)]

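# Build one two-column frame per list and merge them on "year",
# then make "year" the row index as the question asks.
df = (pd.DataFrame(data=list1, columns=["year", "list1"])
        .merge(pd.DataFrame(data=list2, columns=["year", "list2"]), on="year")
        .merge(pd.DataFrame(data=list3, columns=["year", "list3"]), on="year")
        .set_index("year"))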
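The chained merge above is written out for exactly three lists. A minimal sketch of the same idea for any number of lists, using functools.reduce; the lists dict and the shortened data are assumptions made for the example. It also passes how='outer', so a year present in only some lists is kept (with NaN) rather than dropped, which is what merge's default how='inner' would do.

```python
from functools import reduce

import pandas as pd

# Hypothetical, shortened stand-ins for the question's lists.
lists = {
    'list1': [('2010', 1783675.0), ('2011', 1815815.0)],
    'list2': [('2010', 302816.0), ('2011', 229549.0)],
    'list3': [('2010', 149036.0), ('2011', 144112.0)],
}

# One two-column frame per list: "year" plus a value column named after the list.
frames = [pd.DataFrame(data, columns=['year', name]) for name, data in lists.items()]

# Merge the frames pairwise on "year", keeping years missing from some lists.
merged = reduce(lambda left, right: left.merge(right, on='year', how='outer'), frames)
df = merged.set_index('year')
print(df)
```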

Another solution is to build a pandas.Series for each list in a for loop and combine them with pandas.concat. The code:
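import pandas as pd

series = []

# Turn each list into a Series indexed by year and named after the list:
for l, name in [(list1, 'list1'), (list2, 'list2'), (list3, 'list3')]:
    series.append(pd.Series({tup[0]: tup[1] for tup in l}, name=name))

# Concatenate side by side; each Series name becomes a column name:
df = pd.concat(series, axis=1)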

The result looks like this:

>>> print(df)
          list1     list2     list3
2010  1783675.0  302816.0  149036.0
2011  1815815.0  229549.0  144112.0
2012  1633258.0  323063.0  173944.0
2013  1694062.0  285066.0  205724.0
2014  1906527.0  282003.0  214019.0
2015  1908661.0  354500.0  261462.0
2016  2492979.0  275383.0  260646.0
2017  2846997.0  322074.0  279267.0
2018  2930313.0  366909.0  288120.0
2019  2654724.0  297942.0  277106.0
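pd.concat also accepts a dict of objects, in which case the keys label the pieces; with axis=1 they become the column names, so the loop and the name= argument can be folded into one expression. A minimal sketch with shortened, hypothetical data:

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the question's lists.
lists = {
    'list1': [('2010', 1783675.0), ('2011', 1815815.0)],
    'list2': [('2010', 302816.0), ('2011', 229549.0)],
    'list3': [('2010', 149036.0), ('2011', 144112.0)],
}

# dict(data) maps year -> value; the outer dict keys become the column names.
df = pd.concat({name: pd.Series(dict(data)) for name, data in lists.items()}, axis=1)
print(df)
```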

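You can iterate over the lists, build a dictionary in the right shape, and convert it to a DataFrame. Note that this assumes the lists are sorted and that every list contains the same years.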
import pandas as pd

list1 = [('2010', 1783675.0), ('2011', 1815815.0), ('2012', 1633258.0),
    ('2013', 1694062.0), ('2014', 1906527.0), ('2015', 1908661.0),
    ('2016', 2492979.0), ('2017', 2846997.0), ('2018', 2930313.0),
    ('2019', 2654724.0)]

list2 = [('2010', 302816.0), ('2011', 229549.0), ('2012', 323063.0),
    ('2013', 285066.0), ('2014', 282003.0), ('2015', 354500.0),
    ('2016', 275383.0), ('2017', 322074.0), ('2018', 366909.0),
    ('2019', 297942.0)]

list3 =[('2010', 149036.0), ('2011', 144112.0), ('2012', 173944.0),
    ('2013', 205724.0), ('2014', 214019.0), ('2015', 261462.0),
    ('2016', 260646.0), ('2017', 279267.0), ('2018', 288120.0),
    ('2019', 277106.0)]

df_dict = {}
years = [el[0] for el in list1]

df_dict["list1"] = [el[1] for el in list1]
df_dict["list2"] = [el[1] for el in list2]
df_dict["list3"] = [el[1] for el in list3]

df = pd.DataFrame(df_dict, index=years)
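Because this answer takes the years from list1 alone, a mismatch in any other list would go unnoticed. A minimal sketch that checks the stated assumption first; frame_from_aligned_lists is a hypothetical helper name, and the data is shortened for the example:

```python
import pandas as pd

list1 = [('2010', 1783675.0), ('2011', 1815815.0)]
list2 = [('2010', 302816.0), ('2011', 229549.0)]
list3 = [('2010', 149036.0), ('2011', 144112.0)]


def frame_from_aligned_lists(named):
    """Hypothetical helper: build the frame only if all lists share the same years in order."""
    # Take the reference years from the first list, as the answer does with list1.
    years = [year for year, _ in next(iter(named.values()))]
    for name, lst in named.items():
        if [year for year, _ in lst] != years:
            raise ValueError(f"{name}: years differ from the first list")
    # Safe to drop the years from the values now that they are known to line up.
    return pd.DataFrame(
        {name: [value for _, value in lst] for name, lst in named.items()},
        index=years,
    )


df = frame_from_aligned_lists({'list1': list1, 'list2': list2, 'list3': list3})
print(df)
```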

Content provided by Stack Overflow; see the original English post.