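I want to use BigQuery instead of Pandas to create dummy variables (one-hot encoding) for my categorical variables. The result will have about 200 columns, so I can't hardcode them by hand.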
Test dataset (the real dataset is much larger than this):
WITH table AS (
  SELECT 1001 AS ID, 'blue' AS Color, 'big' AS size UNION ALL
  SELECT 1002 AS ID, 'yellow' AS Color, 'medium' AS size UNION ALL
  SELECT 1003 AS ID, 'red' AS Color, 'small' AS size UNION ALL
  SELECT 1004 AS ID, 'blue' AS Color, 'small' AS size)
SELECT *
FROM table
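One possible BigQuery-only approach, sketched under assumptions: generate the dummy-column expressions from the distinct category values with `STRING_AGG`, then run the generated query with `EXECUTE IMMEDIATE`, so the ~200 output columns are never typed by hand. The sketch assumes the columns to encode are `Color` and `size`, that every category value is a valid identifier fragment (letters, digits, underscores) with no single quotes, and it uses a hypothetical temp table `test_data` in place of the `WITH` clause, because a CTE is not visible inside the dynamically executed statement.

```sql
-- Sketch: dynamic one-hot encoding with BigQuery scripting.
-- Assumption: Color and size are the categorical columns to encode, and
-- their values can be spliced safely into quoted literals and column aliases.
DECLARE one_hot_cols STRING;

-- Hypothetical temp table holding the test data; a WITH clause would not be
-- visible to the statement run by EXECUTE IMMEDIATE.
CREATE TEMP TABLE test_data AS
SELECT 1001 AS ID, 'blue' AS Color, 'big' AS size UNION ALL
SELECT 1002 AS ID, 'yellow' AS Color, 'medium' AS size UNION ALL
SELECT 1003 AS ID, 'red' AS Color, 'small' AS size UNION ALL
SELECT 1004 AS ID, 'blue' AS Color, 'small' AS size;

-- Build one "IF(col = 'value', 1, 0) AS col_value" expression per distinct
-- (column, value) pair and join them with commas.
SET one_hot_cols = (
  SELECT STRING_AGG(
           FORMAT("IF(%s = '%s', 1, 0) AS %s_%s", col, val, col, val),
           ', ' ORDER BY col, val)
  FROM (
    SELECT DISTINCT 'Color' AS col, Color AS val FROM test_data
    UNION ALL
    SELECT DISTINCT 'size' AS col, size AS val FROM test_data
  )
);

-- Run the generated SELECT: one row per ID, one 0/1 column per category value.
EXECUTE IMMEDIATE FORMAT(
  "SELECT ID, %s FROM test_data ORDER BY ID", one_hot_cols);
```

Adding another categorical column only means adding one more `SELECT DISTINCT ... UNION ALL` branch; the dummy columns themselves are derived from the data. If the list of input columns is also long, it could in principle be read from `INFORMATION_SCHEMA.COLUMNS` the same way, though that is not shown here.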
Expected result: