I have two PySpark DataFrames.
df1:
person_id  Name  serialNo  Maritalstatus  Location_name
01         abc   10        M              America
02         xyz   13        S              London
03         def   14        M              Europe
04         qwe   15        M              Australia
05         asd   16        M              Europe
06         fgh   17        M              London
07         aka   18        M              Australia
08         fgi   19        M              London
09         aba   20        M              Australia
df2:
Code  Location_Name  Location_Id
111   Australia      AUS
112   America        USA
123   London         UK
124   Europe         EU
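I want to add a column named Location_Id to df1 that takes the matching ID from df2 (matching df1.Location_name against df2.Location_Name), like this: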
person_id  Name  serialNo  Maritalstatus  Location_name  Location_Id
01         abc   10        M              America        USA
02         xyz   13        S              London         UK
03         def   14        M              Europe         EU
04         qwe   15        M              Australia      AUS
05         asd   16        M              Europe         EU
06         fgh   17        M              London         UK
07         aka   18        M              Australia      AUS
08         fgi   19        M              London         UK
09         aba   20        M              Australia      AUS
How can I do this?
df1.join(df2.withColumnRenamed('Location_Name', 'Location_name'),
         on='Location_name', how='left').drop('Code')
- pythonic833