apply() returns a DataFrame instead of a Series

Question



In the following code:
import pandas as pd
import sqlite3
import math
import numpy
con = sqlite3.connect(r'C:\Python34\factbook.db')
facts = pd.read_sql_query('select * from facts;', con)
facts.dropna(inplace=True)
facts = facts[facts['area_land']!=0][:]
facts = facts[facts['population']!=0][:]
facts.reset_index(drop=True, inplace=True)
def pop_50(name):
    pop = facts[facts['name'] == name]['population']
    perc = facts[facts['name'] == name]['population_growth']
    new_pop = pop*(math.e**(35*perc))
    return new_pop


x=pd.Series(data=facts['name'])
z = x.apply(pop_50)

x is a Series:

0                                        Afghanistan
1                                            Albania
2                                            Algeria
3                                            Andorra
4                                             Angola
5                                Antigua and Barbuda
6                                          Argentina
7                                            Armenia

and so on...

But z is not. Here is a link showing what it is (a DataFrame): https://www.scribd.com/document/357697929/Doc1

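I don't understand why. The pop_50 function returns a single result (I tested it), so why is z a DataFrame? How can pop_50 return a Series? It takes the row where facts['name'] == name, pulls a single value out of it (under the population column), and calls it pop. It then does the same for perc. new_pop is a mathematical combination of two single values, so it is also a single value, and the function returns just that value. Isn't that right?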

Thanks.

- Moran Reznik

Can you post the contents of z? In my test, it is a pandas Series object. - James

This is because your return value new_pop is a Series. Try returning a single number instead, e.g. new_pop.values[0]. - Jan Zeiseweis
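A minimal sketch of what the comment describes, using a hypothetical two-row stand-in for the factbook facts table (the numbers are made up, not real data). It shows that filtering by name and then selecting a column gives a one-element Series rather than a bare number, that arithmetic on it stays a Series, and that .values[0] pulls out the single value.

```python
import math

import pandas as pd

# Hypothetical two-row stand-in for the factbook "facts" table (made-up numbers).
facts = pd.DataFrame({
    "name": ["Afghanistan", "Albania"],
    "population": [32_000_000, 3_000_000],
    "population_growth": [0.0232, 0.003],
})

name = "Albania"
# A boolean mask keeps the DataFrame shape, so selecting a column afterwards
# gives a Series with one element, not a bare number.
pop = facts[facts["name"] == name]["population"]
perc = facts[facts["name"] == name]["population_growth"]
new_pop = pop * (math.e ** (35 * perc))

print(type(new_pop).__name__)  # Series
print(new_pop.index.tolist())  # [1]: the row label from facts is kept

# Pulling the single value out of the Series yields a scalar.
value = new_pop.values[0]
print(type(value).__name__)    # float64
```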

1 Answer

Content from Stack Overflow.

- piRSquared · Accepted Answer

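pop_50 returns a pd.Series. x.apply(pop_50) calls pop_50 on every row of x, passing that row's value to pop_50 as the argument name. So for the first row of x you get back a Series, and the same for the second row. You end up with a Series of Series... which is a DataFrame. Also, the index of x becomes the columns of your result, because each returned Series keeps the row label of its matching row in facts.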
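The behaviour can be reproduced with a small hypothetical facts table (three made-up rows, not the real factbook data). Because pop_50 returns a one-element Series for each name, apply() expands the results into a DataFrame whose columns are the row labels of x, with one non-NaN value per row.

```python
import math

import pandas as pd

# Hypothetical three-row stand-in for the "facts" table (made-up numbers).
facts = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "population": [32_000_000, 3_000_000, 39_000_000],
    "population_growth": [0.0232, 0.003, 0.0184],
})

def pop_50(name):
    # Same logic as the question: the result is a one-element Series.
    pop = facts[facts["name"] == name]["population"]
    perc = facts[facts["name"] == name]["population_growth"]
    return pop * (math.e ** (35 * perc))

x = facts["name"]
z = x.apply(pop_50)

print(type(z).__name__)            # DataFrame
print(z.columns.tolist())          # [0, 1, 2]: the labels of x's index
print(int(z.notna().sum().sum()))  # 3: one non-NaN value per row
```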

Try the following instead:

facts2 = facts.set_index('name')

def pop_50(name):
    # .at looks up a single cell by (row label, column label) and returns a scalar
    pop = facts2.at[name, 'population']
    perc = facts2.at[name, 'population_growth']
    new_pop = pop*(math.e**(35*perc))
    return new_pop
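A self-contained sketch of the .at version, again on a hypothetical three-row facts table with made-up numbers. It assumes each country name appears only once, since .at needs a unique row label to return a single cell. With scalar lookups, pop_50 returns a plain number and apply() produces a Series indexed like x.

```python
import math

import pandas as pd

# Hypothetical three-row stand-in for the "facts" table (made-up numbers).
facts = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "population": [32_000_000, 3_000_000, 39_000_000],
    "population_growth": [0.0232, 0.003, 0.0184],
})
# Assumes names are unique so that .at resolves to exactly one cell.
facts2 = facts.set_index("name")

def pop_50(name):
    # .at returns a scalar, so the arithmetic below yields a scalar too.
    pop = facts2.at[name, "population"]
    perc = facts2.at[name, "population_growth"]
    return pop * (math.e ** (35 * perc))

x = facts["name"]
z = x.apply(pop_50)
print(type(z).__name__)  # Series
print(z.index.tolist())  # [0, 1, 2]
```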

You can also use pd.Series.squeeze.

def pop_50(name):
    pop = facts[facts['name'] == name]['population'].squeeze()
    perc = facts[facts['name'] == name]['population_growth'].squeeze()
    new_pop = pop*(math.e**(35*perc))
    return new_pop

If for some reason you can't change pop_50, wrap it in a lambda:

z = x.apply(lambda name: pop_50(name).squeeze())
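Both squeeze-based variants can be checked side by side on a hypothetical facts table (made-up numbers). pop_50_squeezed is a name introduced here for illustration; it squeezes inside the function, while the lambda squeezes the unchanged pop_50's result. Either way apply() gets a scalar per row and returns a Series. Note that squeeze() only collapses a length-1 Series; if a name matched zero or several rows, the Series would come back unchanged.

```python
import math

import pandas as pd

# Hypothetical three-row stand-in for the "facts" table (made-up numbers).
facts = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "population": [32_000_000, 3_000_000, 39_000_000],
    "population_growth": [0.0232, 0.003, 0.0184],
})

def pop_50(name):
    # Unchanged from the question: returns a one-element Series.
    pop = facts[facts["name"] == name]["population"]
    perc = facts[facts["name"] == name]["population_growth"]
    return pop * (math.e ** (35 * perc))

def pop_50_squeezed(name):
    # Hypothetical variant: squeeze() turns each one-element Series into a scalar.
    pop = facts[facts["name"] == name]["population"].squeeze()
    perc = facts[facts["name"] == name]["population_growth"].squeeze()
    return pop * (math.e ** (35 * perc))

x = facts["name"]
z1 = x.apply(pop_50_squeezed)
# Wrapping the original function leaves pop_50 itself untouched.
z2 = x.apply(lambda name: pop_50(name).squeeze())
print(type(z1).__name__, type(z2).__name__)  # Series Series
```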