I have a two different dataframes:
users
:
+-------+---------+--------+
|user_id| movie_id|timestep|
+-------+---------+--------+
| 100 | 1000 |20200728|
| 101 | 1001 |20200727|
| 101 | 1002 |20200726|
+-------+---------+--------+
电影
:
+--------+---------+--------------------------+
|movie_id| title | genre |
+--------+---------+--------------------------+
| 1000 |Toy Story|Adventure|Animation|Chil..|
| 1001 | Jumanji |Adventure|Children|Fantasy|
| 1002 | Iron Man|Action|Adventure|Sci-Fi |
+--------+---------+--------------------------+
如何获得以下格式的数据框?这样我就可以通过相似度分数比较不同用户的口味偏好。
+-------+---------+---------+---------+--------+-----+
|user_id| Action |Adventure|Animation|Children|Drama|
+-------+---------+---------+---------+---------+----+
| 100 | 0 | 1 | 1 | 1 | 0 |
| 101 | 1 | 2 | 0 | 1 | 0 |
+-------+---------+---------+---------+--------+-----+
F.explode(F.split("genre", '\|'))
之后再进行连接。 - undefined