Finding the median of a Pandas DataFrame

Question

python pandas

11

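I am trying to find the median flow of the entire DataFrame. The first part of this is to select only certain items in the DataFrame.

There were two problems with this: it included parts of the DataFrame that are not in "states". Also, the median was not a single value; it was computed per row. How do I get the overall median of all the data in the DataFrame?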

- ksalerno

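For future reference, images do not work well on Stack Overflow; we would rather see the output of your code as text. Do you have "state" and "value" columns, or is each state its own column? - MattR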

Each one is a column. - ksalerno

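Can you provide a sample of your DataFrame's data? That way we can copy it to create our own data, which will help us help you. You will most likely need to use the melt function. - MattR

I think that if you want a single median of all the data in a DataFrame, you have chosen the wrong data structure or made some other wrong design decision. - Elmex80s

I only put part of the DataFrame in there. - ksalerno

Never mind, I solved it myself. - ksalerno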

2 Answers

2

The DataFrame you pasted is a bit messy because of some stray spaces. But you need to melt the DataFrame and then call median() on the new melted DataFrame:

df2 = pd.melt(df, id_vars=['U.S.'])
print(df2['value'].median())

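Your DataFrame may be slightly different, but the concept is the same. See the comment I left about pd.melt(), specifically the value_vars and id_vars parameters.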
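As a rough illustration of how id_vars and value_vars interact, here is a minimal sketch. The column names ("year", "Ohio", "Texas", "Utah") and the numbers are made up for the example and are not the asker's data. Restricting value_vars is one way to keep columns that are not states out of the median.

```python
import pandas as pd

# Hypothetical wide-format data: one identifier column plus one column per state.
df = pd.DataFrame({
    "year": [2015, 2016, 2017],
    "Ohio": [10.0, 20.0, 30.0],
    "Texas": [40.0, 50.0, 60.0],
    "Utah": [70.0, 80.0, 90.0],
})

# id_vars are repeated on every output row; value_vars are the columns to unpivot.
# Omitting value_vars would melt every column not listed in id_vars.
long_df = pd.melt(df, id_vars=["year"], value_vars=["Ohio", "Texas"])

# The melted frame has the columns: year, variable, value.
overall_median = long_df["value"].median()
print(overall_median)
```

Here only the "Ohio" and "Texas" values end up in the "value" column, so the median is taken over those six numbers and "Utah" is left out.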

Here is my very verbose way of cleaning the data and getting the correct answer:

import pandas as pd

# reading in from the clipboard
df = pd.read_clipboard()

# printing it out to see and also the column names
print(df)
print(df.columns)

# melting the DF and then printing the result
df2 = pd.melt(df, id_vars=['U.S.'])
print(df2)

# Creating a new DF so that no nulls are in there for ease of code readability
# using .copy() to avoid the Pandas warning about working on top of a copy
df3 = df2.dropna().copy()

# there were some funky values in the 'value' column. Just overwriting those
# (note: 99 is an arbitrary placeholder and still counts toward the median)
df3.loc[df3.value.isin(['Columbia', 'of']), 'value'] = 99

# printing out the cleaned version and getting the median
print(df3)
print(df3['value'].median())
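An alternative to overwriting the stray strings with a placeholder might be to coerce the column to numeric and let the bad entries become NaN, so they do not take part in the median at all. This is only a sketch on invented data; the frame `melted` and its contents are assumptions, not the pasted DataFrame.

```python
import pandas as pd

# Hypothetical melted column with stray text fragments mixed in.
melted = pd.DataFrame({
    "variable": ["Ohio", "Ohio", "Texas", "Texas", "Utah"],
    "value": ["10", "Columbia", "30", "of", "50"],
})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
melted["value"] = pd.to_numeric(melted["value"], errors="coerce")

# Series.median() skips NaN by default (skipna=True).
clean_median = melted["value"].median()
print(clean_median)
```

Whether dropping or replacing the bad entries is the right call depends on what those cells were supposed to contain.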

- MattR

- ayhan · Accepted Answer

Two options:

1) A pandas option:

df.stack().median()

2) A NumPy option:

np.median(df.values)
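One difference between the two options shows up when the data contains missing values. The sketch below uses a small invented frame (the column names and numbers are assumptions) to compare them, and adds np.nanmedian as a NaN-aware NumPy variant.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value.
df = pd.DataFrame({
    "Ohio": [1.0, 2.0, np.nan],
    "Texas": [4.0, 5.0, 6.0],
})

# pandas: Series.median() skips NaN by default, so only real values count.
pandas_median = df.stack().median()

# NumPy: np.median propagates NaN, so a single missing value yields nan.
numpy_median = np.median(df.values)

# NumPy, NaN-aware: np.nanmedian ignores NaN entries.
nan_median = np.nanmedian(df.values)

print(pandas_median, numpy_median, nan_median)
```

If the DataFrame has no missing values, all three calls should agree.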