How do I get from the df table shown below to the df1 table?
import numpy as np
import databricks.koalas as koalas
df = koalas.DataFrame({"teams": [["SF", "NYG"] for _ in range(7)], "teams1": [np.random.randint(0, 10) for _ in range(7)]})
df
output:
       teams  teams1
0  [SF, NYG]       0
1  [SF, NYG]       5
2  [SF, NYG]       8
3  [SF, NYG]       1
4  [SF, NYG]       2
5  [SF, NYG]       8
6  [SF, NYG]       5
df1 = koalas.DataFrame({"col1": ["SF" for _ in range(7)],
                        "col2": ["NYG" for _ in range(7)],
                        "teams1": [np.random.randint(0, 10) for _ in range(7)]})
df1
output:
  col1 col2  teams1
0   SF  NYG       8
1   SF  NYG       2
2   SF  NYG       9
3   SF  NYG       4
4   SF  NYG       8
5   SF  NYG       3
6   SF  NYG       1
I found a pandas solution (linked here), but that approach collects all the data onto the driver, which is not what I want. I need a Koalas (pandas on PySpark) solution.
sdf = kdf.to_spark()
new_sdf = sdf.withColumn('col1', sdf.teams[0]).withColumn('col2', sdf.teams[1])
- samkart