Split one column of a Pandas DataFrame into two columns based on a condition

3

I have a DataFrame that looks like this:

             0  
    ____________________________________
0     Country| India  
60        Delhi  
62       Mumbai  
68       Chennai  
75    Country| Italy  
78        Rome  
80       Venice  
85        Milan  
88    Country| Australia  
100      Sydney  
103      Melbourne  
107      Perth  

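I want to split the data into two columns, one holding the country and the other holding the city. I don't know where to start. I would like a result like the one below: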

             0                    1
    ____________________________________
0     Country| India           Delhi
1     Country| India           Mumbai
2     Country| India           Chennai         
3    Country| Italy           Rome
4    Country| Italy           Venice   
5    Country| Italy           Milan        
6    Country| Australia       Sydney
7   Country| Australia       Melbourne
8   Country| Australia       Perth     

Any ideas on how to do this?
2 Answers

3
Look for the rows that contain | and put them into another column, then fill the newly created column downward.
(
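    # the question's column is labelled with the integer 0;
    # use "0" instead if your column label is the string "0"
    df.rename(columns={0: "city"})
    # this looks for rows that contain '|' and puts them into a 
    # new column called Country. rows that do not match will be
    # null in the new column. the raw string keeps the regex
    # escape \| intact so that | is matched literally.
    .assign(Country=lambda x: x.loc[x.city.str.contains(r"\|"), "city"])
    # fill down on the Country column, this also has the benefit
    # of linking the Country with the City
    .ffill()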
    # here we get rid of duplicate Country entries in city and Country
    # this ensures that only Country entries are in the Country column
    # and cities are in the City column
    .query("city != Country")
    # here we reverse the column positions to match your expected output 
    .iloc[:, ::-1]
)


      Country           city
60  Country| India      Delhi
62  Country| India      Mumbai
68  Country| India      Chennai
78  Country| Italy      Rome
80  Country| Italy      Venice
85  Country| Italy      Milan
100 Country| Australia  Sydney
103 Country| Australia  Melbourne
107 Country| Australia  Perth
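
The chained answer above can be unpacked into separate steps. The following is a minimal sketch, not the answerer's own code: the sample frame is rebuilt from the question's data, the column is assumed to be labelled with the integer 0, and the intermediate names (`renamed`, `is_country`, `with_country`, `filled`, `result`) are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: one column labelled 0).
df = pd.DataFrame(
    {0: ["Country| India", "Delhi", "Mumbai", "Chennai",
         "Country| Italy", "Rome", "Venice", "Milan",
         "Country| Australia", "Sydney", "Melbourne", "Perth"]},
    index=[0, 60, 62, 68, 75, 78, 80, 85, 88, 100, 103, 107],
)

renamed = df.rename(columns={0: "city"})

# Step 1: boolean mask, True only on the header rows that contain '|'.
is_country = renamed["city"].str.contains("|", regex=False)

# Step 2: copy the header rows into a new column. Assignment aligns on the
# index, so every row that is not a header gets NaN in "Country".
with_country = renamed.assign(Country=renamed.loc[is_country, "city"])

# Step 3: forward-fill so each city row inherits the last header above it.
filled = with_country.ffill()

# Step 4: drop the header rows themselves (both columns are equal there)
# and put Country first.
result = filled[filled["city"] != filled["Country"]][["Country", "city"]]
print(result)
```

Printing `with_country` and `filled` between the steps shows how the NaN values appear and are then filled in.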

Could you please explain this solution? - Ashish Kumar

2

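Use DataFrame.insert with Series.where and Series.str.startswith to replace the non-matching values with missing values, forward-fill the missing values with ffill, and then use Series.ne (not equal) in boolean indexing to remove the rows where both columns hold the same value: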

df.insert(0, 'country', df[0].where(df[0].str.startswith('Country')).ffill())
df = df[df['country'].ne(df[0])].reset_index(drop=True).rename(columns={0:'city'})
print(df)
             country       city
0      Country|India      Delhi
1      Country|India     Mumbai
2      Country|India    Chennai
3      Country|Italy       Rome
4      Country|Italy     Venice
5      Country|Italy      Milan
6  Country|Australia     Sydney
7  Country|Australia  Melbourne
8  Country|Australia      Perth
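
To make the role of each call easier to see, here is a self-contained sketch of the same where / ffill / ne idea that builds a new frame instead of modifying `df` in place. It is only an illustration under assumptions: the sample data is rebuilt from the question, the column is assumed to be labelled with the integer 0, and the names `headers`, `country`, `keep`, and `out` are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: one column labelled 0).
df = pd.DataFrame(
    {0: ["Country| India", "Delhi", "Mumbai", "Chennai",
         "Country| Italy", "Rome", "Venice", "Milan",
         "Country| Australia", "Sydney", "Melbourne", "Perth"]},
    index=[0, 60, 62, 68, 75, 78, 80, 85, 88, 100, 103, 107],
)

# Series.where keeps a value where the condition is True, NaN elsewhere,
# so only the "Country| ..." header rows survive here.
headers = df[0].where(df[0].str.startswith("Country"))

# Forward-fill so each row carries the most recent country header.
country = headers.ffill()

# Rows whose value differs from their country are the city rows.
keep = country.ne(df[0])

# Assemble the two columns and renumber the index from 0.
out = pd.DataFrame(
    {"country": country[keep], "city": df.loc[keep, 0]}
).reset_index(drop=True)
print(out)
```

The reset_index(drop=True) call is what produces the 0 to 8 index shown in the question's desired output; leaving it out keeps the original row labels (60, 62, ...).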
