I have a dataframe in a "yes/no" format like
      7   22
1   NaN    t
25    t  NaN
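where "t" stands for "yes". I need to convert it to an X-Y table, where the column name is the X coordinate and the index is the Y coordinate:
    X   Y
1  22   1
2   7  25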
A pseudocode-like example:
if a cell = "t":
    newdf.X = df.column(t)
    newdf.Y = df.index(t)
Try this:
import numpy as np
import pandas as pd

# Use np.where to get the integer positions of the 't' cells in the dataframe
r, c = np.where(df == 't')
# Build the result from the column labels (X) and index labels (Y) at those positions
df_out = pd.DataFrame({'X': df.columns[c], 'Y': df.index[r]})
df_out
Output:
    X   Y
0  22   1
1   7  25
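For anyone who wants to run this end to end, here is a minimal self-contained sketch. How the sample frame is built (integer labels, NaN for the empty cells) is my assumption, since the question only shows the printed frame.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; integer row/column labels are an assumption.
df = pd.DataFrame(
    {7: [np.nan, "t"], 22: ["t", np.nan]},
    index=[1, 25],
)

# Boolean mask of the 't' cells as a plain 2-D NumPy array.
mask = (df == "t").to_numpy()

# np.where on a 2-D boolean array returns (row positions, column positions).
r, c = np.where(mask)

# Map positions back to labels: column labels give X, index labels give Y.
df_out = pd.DataFrame({"X": df.columns[c], "Y": df.index[r]})
print(df_out)
```

np.where scans the mask row by row, so the result follows the row order of the original frame.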
Update based on @RajeshC's comment:
Given the dataframe df,
      7   22
1   NaN    t
13  NaN  NaN
25    t  NaN
Then:
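r, c = np.where(df == 't')
# Index the result by row position so rows without a 't' can be restored
df_out = pd.DataFrame({'X': df.columns[c], 'Y': df.index[r]}, index=r)
# Reindex over every row position; rows with no 't' become NaN
df_out = df_out.reindex(range(df.shape[0]))
df_out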
Output:
     X     Y
0   22   1.0
1  NaN   NaN
2    7  25.0
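The floats in the output come from the NaN row: once reindex adds a missing value, a plain integer column is upcast to float. Below is a rough sketch of the same steps on a frame I built by hand (integer labels are an assumption), plus an optional conversion to pandas' nullable "Int64" dtype, which is my addition and not part of the answer above. It also assumes at most one 't' per row, because reindex raises on duplicate index labels.

```python
import numpy as np
import pandas as pd

# Three-row sample frame; integer row/column labels are an assumption.
df = pd.DataFrame(
    {7: [np.nan, np.nan, "t"], 22: ["t", np.nan, np.nan]},
    index=[1, 13, 25],
)

r, c = np.where((df == "t").to_numpy())

# Index by row position, then reindex so rows without a 't' show up as NaN.
df_out = pd.DataFrame({"X": df.columns[c], "Y": df.index[r]}, index=r)
df_out = df_out.reindex(range(df.shape[0]))

# Optional: nullable integers keep whole numbers alongside missing values.
df_int = df_out.astype("Int64")
print(df_int)
```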
Another option, using stack:
pd.DataFrame.from_records(
df.stack().index.swaplevel(),
columns=['X', 'Y'])
Output:
    X   Y
0  22   1
1   7  25
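One caveat with the stack approach: it treats every non-NaN cell as a hit, and it depends on stack dropping the NaN cells, which as far as I know the newer stack implementation (future_stack=True) no longer does. A hedged sketch that filters for 't' explicitly is below; the sample frame and the use of MultiIndex.to_frame are my own choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; integer row/column labels are an assumption.
df = pd.DataFrame(
    {7: [np.nan, "t"], 22: ["t", np.nan]},
    index=[1, 25],
)

# stack() gives a Series indexed by (row label, column label).
stacked = df.stack()

# Keep only the 't' cells explicitly instead of relying on NaN being dropped.
hits = stacked[stacked == "t"]

# The index levels are (Y, X); name them and put X first.
df_out = hits.index.to_frame(index=False, name=["Y", "X"])[["X", "Y"]]
print(df_out)
```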
(
    df.stack()
    .to_frame()
    .reset_index()
    .drop(0, axis=1)
    .rename(columns={'level_0': 'Y', 'level_1': 'X'})
    .reindex(columns=['X', 'Y'])
)
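The chain above leans on the auto-generated names 'level_0', 'level_1' and 0, which only appear when the index and columns are unnamed. A possible variant, sketched under the assumption of the same sample frame, names the axes up front with rename_axis; the variable name long_df and the column name "value" are hypothetical, picked just for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; integer row/column labels are an assumption.
df = pd.DataFrame(
    {7: [np.nan, "t"], 22: ["t", np.nan]},
    index=[1, 25],
)

# Name the axes first so the stacked index levels come out as Y and X.
long_df = (
    df.rename_axis(index="Y", columns="X")
    .stack()
    .reset_index(name="value")
)

# Keep the 't' rows and only the coordinate columns, X first.
df_out = long_df.loc[long_df["value"] == "t", ["X", "Y"]].reset_index(drop=True)
print(df_out)
```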
PS - Thanks to Kristian Canler for the edit.