How can I use a weighted-mean estimator (including bootstrapping) in seaborn factor plots?

4

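I have a dataframe in which each row has a specific weight that needs to be taken into account when computing the mean. I like seaborn factorplots and their bootstrapped 95% confidence intervals, but I can't get seaborn to accept a new weighted-mean estimator.

Here is an example of what I would like to do.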

import numpy as np
import seaborn as sns

tips_all = sns.load_dataset("tips")
tips_all["weight"] = 10 * np.random.rand(len(tips_all))
# note: factorplot was renamed to catplot in seaborn 0.9
sns.factorplot(x="size", y="total_bill",
               data=tips_all, kind="point")
# here I would like to have a mean estimator that computes a weighted mean
# the bootstrapped confidence intervals should also use this weighted mean estimator
# something like (tips_all["weight"] * tips_all["total_bill"]).sum() / tips_all["weight"].sum()
# but on bootstrapped samples (for the confidence interval)
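
For reference, the computation described in those comments can be sketched without seaborn. This is only a minimal standard-library sketch, not seaborn's implementation: it resamples (value, weight) pairs together, with replacement, and takes percentiles of the resulting weighted means. The names `weighted_mean` and `bootstrap_ci`, the number of resamples, the simple percentile rule, and the toy numbers are all assumptions made for illustration.

```python
import random


def weighted_mean(values, weights):
    """Return sum(w * v) / sum(w); weights are assumed to be positive."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def bootstrap_ci(values, weights, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap interval for the weighted mean.

    Rows are resampled as (value, weight) pairs, so every value
    keeps its own weight in each bootstrap sample.
    """
    rng = random.Random(seed)
    pairs = list(zip(values, weights))
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]  # resample with replacement
        vals, wts = zip(*sample)
        stats.append(weighted_mean(vals, wts))
    stats.sort()
    tail = (100 - ci) / 200  # e.g. 0.025 for a 95% interval
    lo_idx = int(tail * (n_boot - 1))
    hi_idx = int((1 - tail) * (n_boot - 1))
    return stats[lo_idx], stats[hi_idx]


# Toy data (hypothetical, not the tips dataset)
bills = [10.0, 20.0, 30.0, 40.0]
weights = [1.0, 1.0, 1.0, 5.0]
estimate = weighted_mean(bills, weights)
low, high = bootstrap_ci(bills, weights)
```

The key point is that the weight has to travel with its value through the resampling step, which is what makes this awkward to express as a single-column estimator.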

Any ideas would be greatly appreciated! - Tim
@mwaskom: Any thoughts on how to implement this? - Tim
2 Answers

5

From @mwaskom: https://github.com/mwaskom/seaborn/issues/722

It's not really supported, but I think it's possible to hack together a solution. This seems like it might work?

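import numpy as np
import seaborn as sns

tips = sns.load_dataset("tips")
tips["weight"] = 10 * np.random.rand(len(tips))

# pack each value with its weight; list(...) is needed on Python 3,
# where zip returns an iterator
tips["tip_and_weight"] = list(zip(tips.tip, tips.weight))

def weighted_mean(x, **kws):
    # unpack the (value, weight) tuples back into two arrays
    val, weight = map(np.asarray, zip(*x))
    return (val * weight).sum() / weight.sum()

g = sns.factorplot("size", "tip_and_weight", data=tips,
                   estimator=weighted_mean, orient="v")
g.set_axis_labels("size", "tip")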

1

From @fkloosterman in the same GitHub thread: a working solution for seaborn v0.11.0 and later (confirmed on v0.11.2):

import numpy as np
import seaborn as sns

tips = sns.load_dataset("tips")
tips["weight"] = 10 * np.random.rand(len(tips))
# encode each row as a complex number: value + weight * 1j
tips["tip_and_weight"] = [v + w * 1j for v, w in zip(tips.tip, tips.weight)]

def weighted_mean(x, **kws):
    # real part holds the value, imaginary part holds the weight
    return np.sum(np.real(x) * np.imag(x)) / np.sum(np.imag(x))

sns.pointplot(x="size", y="tip_and_weight", data=tips, estimator=weighted_mean, orient='v')
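
The complex-number packing works because each value carries its weight in the imaginary part, so any resampling of that single column keeps the pairs together. As a quick sanity check, here is a minimal sketch in plain Python, using made-up numbers rather than the tips data (the name `weighted_mean_packed` is hypothetical). It shows that the estimator applied to the packed values matches the direct weighted mean:

```python
def weighted_mean_packed(packed):
    """Weighted mean of complex numbers encoded as value + weight * 1j."""
    numerator = sum(z.real * z.imag for z in packed)
    denominator = sum(z.imag for z in packed)
    return numerator / denominator


# Toy data (hypothetical): values with their weights
values = [1.0, 2.0, 4.0]
weights = [2.0, 1.0, 1.0]
packed = [complex(v, w) for v, w in zip(values, weights)]

# Direct weighted mean, computed from the two separate lists
direct = sum(v * w for v, w in zip(values, weights)) / sum(weights)
# Same quantity recovered from the packed representation
via_packed = weighted_mean_packed(packed)
```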
