How do I sum, per row, the columns that start with the same string?

Question

4

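Edit: the column names actually start with more than one character, delimited by sep='_', so they look more like AAA_BBB, AAA_DDD, BBB_EEE, BBB_FFF, etc.

Thanks for the groupby solutions!

I have a pandas DataFrame like this (taken from another question):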

df =

C1    C2    T3  T5
28    34    11  22
45    100   33  66

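How can I get a new DataFrame containing the sums of the columns that share the same "starting string" (e.g. "C", "T")? Thanks!

Unfortunately, I have to work with this DataFrame structure, and the frame has about 1000 columns that look like A1, A2, A3, B1, B2, B3, etc.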

- user3275943

3 Answers

3

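`pandas.DataFrame.groupby` with `axis=1`

The OP didn't spell out what the column names look like in general. Read through the options below and decide which one fits your particular case.

`callable` version #1

Assuming your column prefixes are single characters...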

from operator import itemgetter

df.groupby(itemgetter(0), axis=1).sum()

     C   T
0   62  33
1  145  99

When you pass a callable to pandas.DataFrame.groupby, it maps that callable onto the index (or onto the columns if axis=1) and uses the unique results as the grouping keys.
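To make that mapping concrete, here is a minimal sketch on the frame from the question. The helper name `first_char` is made up for illustration. As far as I know, recent pandas releases deprecate the `axis` argument of `groupby`, so this sketch transposes the frame instead. That is my substitution, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"C1": [28, 45], "C2": [34, 100], "T3": [11, 33], "T5": [22, 66]})

def first_char(label):
    # Hypothetical helper: the group key is the first character of the label.
    return label[0]

# groupby applies the callable to every label; these are the resulting keys.
keys = [first_char(c) for c in df.columns]  # ['C', 'C', 'T', 'T']

# Transposing turns the columns into the row index, so a plain groupby
# (no axis argument) groups the original columns. Transpose back afterwards.
result = df.T.groupby(first_char).sum().T
print(result)
```

The result has one column per distinct key (`C` and `T`), each holding the row-wise sum of the columns that mapped to it.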

`callable` version #2: custom function

Slightly more involved, but it should be robust to prefixes longer than a single character. It also needs no imports.

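def yield_while_alpha(x):
    # Yield leading alphabetic characters; stop at the first non-letter.
    # A plain for loop also handles names made up entirely of letters.
    for y in x:
        if not y.isalpha():
            return
        yield y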

def get_prefix(x):
    return ''.join(yield_while_alpha(x))

df.groupby(get_prefix, axis=1).sum()

     C   T
0   62  33
1  145  99

Same idea, but using itertools instead

from itertools import takewhile

df.groupby(
    lambda x: ''.join(takewhile(str.isalpha, x)),
    axis=1
).sum()

     C   T
0   62  33
1  145  99
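For the underscore-separated names mentioned in the question's edit, it may help to look at what the prefix function returns on its own. This is a small sketch; the sample names beyond those in the question are made up for illustration.

```python
from itertools import takewhile

def get_prefix(label):
    # Keep characters up to (but not including) the first non-letter.
    return "".join(takewhile(str.isalpha, label))

# "ABC" is a hypothetical all-letter name; the others follow the question.
names = ["C1", "T5", "AAA_BBB", "BBB_EEE", "ABC"]
prefixes = [get_prefix(n) for n in names]
print(prefixes)  # ['C', 'T', 'AAA', 'BBB', 'ABC']
```

Because `'_'` is not alphabetic, the prefix stops at the underscore, so names like AAA_BBB and AAA_DDD end up in the same group.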

`pandas.Index.str.extract`

We can use a callable, or we can do without one.

df.groupby(df.columns.str.extract(r'(\D+)', expand=False), axis=1).sum()

     C   T
0   62  33
1  145  99
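Note that `\D+` matches any run of non-digits, so for names like AAA_BBB it would capture the whole name. If the prefix is everything before the first underscore, as the question's edit suggests, splitting on `'_'` is one possible variation. The frame `df2` and its values are made up for illustration, and the transpose stands in for `axis=1`, which recent pandas releases deprecate as far as I know.

```python
import pandas as pd

# Hypothetical frame with underscore-separated column names.
df2 = pd.DataFrame(
    {"AAA_BBB": [1, 2], "AAA_DDD": [3, 4], "BBB_EEE": [5, 6], "BBB_FFF": [7, 8]}
)

# Everything before the first underscore becomes the group key.
prefix = df2.columns.str.split("_").str[0]

# Group the transposed frame by those keys, then transpose back.
result = df2.T.groupby(prefix).sum().T
print(result)
```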

- piRSquared

2

Another option using a MultiIndex:

df.columns = [df.columns.str[0], df.columns]
df.groupby(level=0, axis=1).sum()
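The snippet above overwrites `df.columns` in place. A sketch of the same idea that leaves the original frame untouched might look like this. The name `wide` is hypothetical, and the transpose again replaces `axis=1` as my own substitution.

```python
import pandas as pd

df = pd.DataFrame({"C1": [28, 45], "C2": [34, 100], "T3": [11, 33], "T5": [22, 66]})

# Work on a copy: level 0 is the one-character prefix, level 1 the original name.
wide = df.copy()
wide.columns = pd.MultiIndex.from_arrays([df.columns.str[0], df.columns])

# After transposing, the MultiIndex sits on the rows; group on its first level.
result = wide.T.groupby(level=0).sum().T
print(result)
```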

- Code Different

- Scott Boston · Accepted Answer

Use:

 df.groupby(df.columns.str[0], axis=1).sum()

Output:

     C   T
0   62  33
1  145  99