如果Pandas数据框中所有行的列只有一个值，则折叠行

Question

如果Pandas数据框中所有行的列只有一个值，则折叠行

5

我有以下数据框（DF）

         col1  |  col2   | col3   | col4   | col5  | col6
    0    -     |   15.0  |  -     |  -     |   -   |  -
    1    -     |   -     |  -     |  -     |   -   |  US
    2    -     |   -     |  -     |  Large |   -   |  -
    3    ABC1  |   -     |  -     |  -     |   -   |  -
    4    -     |   -     |  24RA  |  -     |   -   |  -
    5    -     |   -     |  -     |  -     |   345 |  -

我希望将行折叠为以下格式

    output DF:
         col1  |  col2    | col3   | col4   | col5  | col6
    0    ABC1  |   15.0   |  24RA  |  Large |   345 |  US

我不想遍历列，而是想使用pandas来实现这一点。

- Test Test

1

除了有效值，字面破折号（*-*）或NaN还有哪些其他值？ - Psidom

它是NaN - 我不好意思，为了更好的可读性，我把它替换成了'-'。 - Test Test

2个回答

1

你可以使用max，但是你需要将字符串列中的空值转换一下（这有点丑陋）。

>>> df = pd.DataFrame({'col1':[np.nan, "ABC1"], 'col2':[15.0, np.nan]})

>>> df.apply(lambda c: c.fillna('') if c.dtype is np.dtype('O') else c).max()
col1    ABC1
col2      15
dtype: object

你可以使用向前填充和向后填充的组合来填补空缺，如果只想将其应用于某些列，则这可能很有用：

>>> df.apply(lambda c: c.fillna(method='bfill').fillna(method='ffill'))

- maxymoo

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- piRSquared · Accepted Answer

选项0
超级简单

pd.concat([pd.Series(df[c].dropna().values, name=c) for c in df], axis=1)

   col1  col2  col3   col4   col5 col6
0  ABC1  15.0  24RA  Large  345.0   US

我们能够处理每个列中超过一个值吗？
当然可以！

df.loc[2, 'col3'] = 'Test'

   col1  col2  col3   col4   col5 col6
0  ABC1  15.0  Test  Large  345.0   US
1   NaN   NaN  24RA    NaN    NaN  NaN

选项 1
使用像外科医生一样的 np.where 泛化解决方案

v = df.values
i, j = np.where(np.isnan(v))

s = pd.Series(v[i, j], df.columns[j])

c = s.groupby(level=0).cumcount()
s.index = [c, s.index]
s.unstack(fill_value='-')  # <-- don't fill to get NaN

   col1  col2  col3   col4 col5 col6
0  ABC1  15.0  24RA  Large  345   US

df.loc[2, 'col3'] = 'Test'

v = df.values
i, j = np.where(np.isnan(v))

s = pd.Series(v[i, j], df.columns[j])

c = s.groupby(level=0).cumcount()
s.index = [c, s.index]
s.unstack(fill_value='-')  # <-- don't fill to get NaN

   col1  col2  col3   col4 col5 col6
0  ABC1  15.0  Test  Large  345   US
1     -     -  24RA      -    -    -

选项2
mask 生成空值，然后使用 stack 去除它们

或者我们可以这样做

# This should work even if `'-'` are NaN
# but you can skip the `.mask(df == '-')`
s = df.mask(df == '-').stack().reset_index(0, drop=True)
c = s.groupby(level=0).cumcount()
s.index = [c, s.index]
s.unstack(fill_value='-')

   col1  col2  col3   col4 col5 col6
0  ABC1  15.0  Test  Large  345   US
1     -     -  24RA      -    -    -