I have a Pandas DataFrame and would like to combine the 'lat' and 'long' columns into a single column of tuples.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 205482 entries, 0 to 209018
Data columns:
Month 205482 non-null values
Reported by 205482 non-null values
Falls within 205482 non-null values
Easting 205482 non-null values
Northing 205482 non-null values
Location 205482 non-null values
Crime type 205482 non-null values
long 205482 non-null values
lat 205482 non-null values
dtypes: float64(4), object(5)
The code I tried:
def merge_two_cols(series):
    return (series['lat'], series['long'])

sample['lat_long'] = sample.apply(merge_two_cols, axis=1)
However, this raised the following error:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-261-e752e52a96e6> in <module>()
2 return (series['lat'], series['long'])
3
----> 4 sample['lat_long'] = sample.apply(merge_two_cols, axis=1)
5
...
AssertionError: Block shape incompatible with manager
How can I fix this?
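A minimal sketch, assuming pandas 0.23 or later, where DataFrame.apply takes a result_type argument. Passing result_type="reduce" asks apply to return a Series with one tuple per row instead of trying to expand each tuple into separate columns. The info() output above looks like it comes from a much older pandas release, so this illustrates the idea on a current version rather than a fix verified for the version in the question. The two-row sample frame is made-up data.

```python
import pandas as pd

# Made-up stand-in for the question's DataFrame.
sample = pd.DataFrame({"lat": [51.5, 51.6], "long": [-0.12, -0.13]})

def merge_two_cols(series):
    # Return the pair as a plain Python tuple.
    return (series["lat"], series["long"])

# result_type="reduce" keeps each tuple as a single value, so apply
# returns a Series of tuples that can be assigned as one column.
sample["lat_long"] = sample.apply(merge_two_cols, axis=1, result_type="reduce")
print(sample["lat_long"].tolist())
```

Note that apply with axis=1 still builds a Series for every row, so on large frames it is slow compared with working on the columns directly.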
The following code should work:
df['new_col'] = list(zip(df.lat, df.long))
- paulwasit

list(zip(df.lat, df.long)) is far more efficient than df[['lat', 'long']].apply(tuple, axis=1): it finished in 124 ms, while the latter needed 14.2 s on 900k rows, a difference of more than 100x. - Pengju Zhao

I tried df['new_col'] = list(zip(df[cols_to_keep])), but I keep getting the error: Length of values does not match length of index. Any suggestions? - seeiespi

df['new_col'] = list(zip(*[df[c] for c in cols_to_keep]))
- jedge
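A small sketch of the zip approaches from the answer and comments, using a made-up three-column frame; cols_to_keep is a hypothetical list of column names. It also shows why zip(df[cols_to_keep]) fails: iterating over a DataFrame yields its column labels, not its rows, so zip produces one 1-tuple per column, and the resulting list has the wrong length whenever the number of columns differs from the number of rows. The last line uses DataFrame.itertuples(index=False, name=None), a different built-in technique that also yields one plain tuple per row.

```python
import pandas as pd

# Made-up data standing in for the question's DataFrame.
df = pd.DataFrame({
    "lat": [51.5, 51.6],
    "long": [-0.12, -0.13],
    "month": ["2013-01", "2013-02"],
})

# Two columns: zip pairs the values row by row; list() materialises the
# Python 3 zip iterator into one tuple per row.
df["lat_long"] = list(zip(df.lat, df.long))

# Hypothetical list of columns to combine.
cols_to_keep = ["lat", "long", "month"]

# Iterating a DataFrame yields column labels, so this gives one 1-tuple
# per column (3 items) instead of one tuple per row (2 rows).
print(list(zip(df[cols_to_keep])))  # [('lat',), ('long',), ('month',)]

# Unpacking one Series per column lets zip walk the rows instead.
df["new_col"] = list(zip(*[df[c] for c in cols_to_keep]))

# Alternative: itertuples with index=False and name=None yields plain tuples.
df["new_col_alt"] = list(df[cols_to_keep].itertuples(index=False, name=None))
```

Both zip variants work on the column data directly instead of constructing a Series per row, which is the usual explanation for the large speed gap over apply(tuple, axis=1) reported above.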