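I have a dataset that I want to transform; part of it is shown below. One column, named "Hospital", has values that repeat throughout the dataset. I want to transform the dataset so that only the first row (the one named "prelim_arm_1") is kept, and the rows for the other three groups (the "arms") are deleted.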
import pandas as pd
import numpy as np
# Initialize the data as a dict of lists.
data = {
    'Hospital': ['prelim_arm_1', '24_hour_review_arm_1', '48_hour_review_arm_1',
                 '72_hour_review_arm_1', 'discharge_informat_arm_1',
                 'prelim_arm_1', '24_hour_review_arm_1', '48_hour_review_arm_1',
                 '72_hour_review_arm_1', 'discharge_informat_arm_1'],
    'Bug_Hosp': ['133', 'NAN', 'NAN', 'NAN', 'NAN', '133', 'NAN', 'NAN', 'NAN', 'NAN'],
    'code': ['G45', 'NAN', 'NAN', 'NAN', 'NAN', 'G45', 'NAN', 'NAN', 'NAN', 'NAN'],
    'cont': ['T256', 'NAN', 'NAN', 'NAN', 'NAN', 'T256', 'NAN', 'NAN', 'NAN', 'NAN'],
    'IPC': ['NAN', 'NAN', 'NAN', '567TY', 'NAN', 'NAN', 'NAN', 'NAN', '567Tu', 'NAN'],
    'NO_CT': ['NAN', 'NAN', 'NAN', 'NAN', '5667', 'NAN', 'NAN', 'NAN', '3456', 'NAN'],
}
# Create DataFrame
df_final = pd.DataFrame(data)
# Print the output.
print(df_final)
The final dataset should look like this:
import pandas as pd
import numpy as np
# Initialize the data as a dict of lists.
data = {
    'Hospital': ['prelim_arm_1'],
    'Bug_Hosp': ['133'],
    'code': ['G45'],
    'cont': ['T256'],
    'IPC': ['567TY'],
    'NO_CT': ['5667'],
}
# Create DataFrame
df_final = pd.DataFrame(data)
# Print the output.
print(df_final)
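The dataset is very large and contains repeated groups of rows. For every group of 4 rows, I want to keep only the prelim_arm_1 data and delete the other 3 rows, so the final table contains only the prelim_arm_1 data from each 4-row group.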
df[df.Hospital == 'prelim_arm_1'], since the other fields are all "NAN". - morganics
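To make the difference between the two readings of the question concrete, here is a minimal sketch, not a definitive solution. Option 1 is the plain filter from the comment above. Option 2 collapses each block of rows into a single row, which is what the expected output seems to ask for, since its IPC and NO_CT values come from later rows of the block. The sketch assumes that every block starts with a prelim_arm_1 row and that the string 'NAN' marks a missing value. The names df, filtered, group_id, and collapsed are illustrative only. Because blocks are detected from the prelim_arm_1 marker rather than a fixed row count, it works whether a block has four rows or five, as in the sample.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (two blocks of five arms each).
arms = ['prelim_arm_1', '24_hour_review_arm_1', '48_hour_review_arm_1',
        '72_hour_review_arm_1', 'discharge_informat_arm_1']
df = pd.DataFrame({
    'Hospital': arms * 2,
    'Bug_Hosp': ['133', 'NAN', 'NAN', 'NAN', 'NAN'] * 2,
    'code': ['G45', 'NAN', 'NAN', 'NAN', 'NAN'] * 2,
    'cont': ['T256', 'NAN', 'NAN', 'NAN', 'NAN'] * 2,
    'IPC': ['NAN', 'NAN', 'NAN', '567TY', 'NAN',
            'NAN', 'NAN', 'NAN', '567Tu', 'NAN'],
    'NO_CT': ['NAN', 'NAN', 'NAN', 'NAN', '5667',
              'NAN', 'NAN', 'NAN', '3456', 'NAN'],
})

# Option 1: keep only the prelim_arm_1 rows, exactly as they are.
# IPC and NO_CT stay 'NAN' here, because those values live in other rows.
filtered = df[df['Hospital'] == 'prelim_arm_1']

# Option 2: collapse each block into one row.
# Assumption: a new block begins at every prelim_arm_1 row, so a running
# count of those rows gives each block its own id (1, 2, ...).
group_id = (df['Hospital'] == 'prelim_arm_1').cumsum().rename('group_id')

collapsed = (
    df.replace('NAN', np.nan)    # turn the 'NAN' strings into real missing values
      .groupby(group_id)
      .first()                   # first non-missing value per column in each block
      .reset_index(drop=True)
)

print(filtered)
print(collapsed)
```

On the sample data, the first row of collapsed matches the expected output in the question (prelim_arm_1, 133, G45, T256, 567TY, 5667). If the real data never stores values outside the prelim_arm_1 rows, the simpler filter in Option 1 should be enough.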