First, there seems to be no way to load a 2-column CSV file directly into a map. Suppose I have a simple map.csv:
1,2
3,4
5,6
and I try to load it as a map:
m = load 'map.csv' using PigStorage(',') as (M: []);
dump m;
I get three empty tuples:
()
()
()
So I tried loading tuples and then generating the map from them:
m = load 'map.csv' using PigStorage(',') as (key:chararray, val:chararray);
b = foreach m generate [key#val];
ERROR 1000: Error during parsing. Encountered " "[" "[ "" at line 1, column 24.
...
Many syntax variants fail as well (e.g., generate [$0#$1]).
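One avenue that may be worth checking is a sketch along the following lines. It assumes a Pig release that ships the TOMAP builtin, which does not appear in my attempts above, so treat it as an untested assumption rather than a confirmed fix. The alias names pairs and as_map are made up for illustration.

```pig
-- Load the CSV as ordinary (key, val) tuples first.
pairs = load 'map.csv' using PigStorage(',') as (key:chararray, val:chararray);
-- Assumption: the TOMAP builtin is available in this Pig version.
-- It builds a one-entry map per row from the key/value expressions.
as_map = foreach pairs generate TOMAP(key, val) as M;
dump as_map;
```

Note that this still yields one single-entry map per row, not one map holding all three entries.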
OK, so I converted my mapping into Pig's map literal format, as map.pig:
[1#2]
[3#4]
[5#6]
and loaded it:
m = load 'map.pig' as (M: []);
Now let's load some keys and try a lookup:
k = load 'keys.csv' as (key);
dump k;
3
5
1
c = foreach k generate m#key; /* Or m[key], or... what? */
ERROR 1000: Error during parsing. Invalid alias: m in {M: map[ ]}
Hmm, OK. Since two relations are involved, maybe we need a join:
c = join k by key, m by /* ...um, what? */ $0;
dump c;
ERROR 1068: Using Map as key not supported.
c = join k by key, m by m#key;
dump c;
Error 1000: Error during parsing. Invalid alias: m in {M: map[ ]}
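Fail. How do I refer to a map's keys (or values)? The map schema syntax doesn't even seem to let you name the key and the value (the mailing list says there is no way to assign types).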
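For comparison, here is a minimal sketch of the same lookup done without the map type at all, assuming the lookup table is kept as plain two-column tuples and that keys.csv holds one key per line. The alias names pairs, wanted, joined, and looked_up are made up for illustration.

```pig
-- Keep the lookup table as plain (key, val) tuples instead of a map.
pairs = load 'map.csv' using PigStorage(',') as (key:chararray, val:chararray);
wanted = load 'keys.csv' as (key:chararray);
-- Join the two relations on the shared key column.
joined = join wanted by key, pairs by key;
-- Project the key and the value that was looked up for it.
looked_up = foreach joined generate wanted::key, pairs::val;
dump looked_up;
```

This sidesteps the question rather than answering it: it treats the data as a relation, which is what join operates on, instead of as a map.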
Finally, I'd just like to be able to find all the keys in my map:
d = foreach m generate ...oh, forget it.
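What I am after would look something like the sketch below. It assumes a Pig release that provides a KEYSET builtin for maps; I have not verified that it is available in the version used above, so the function's presence is an assumption. The alias names maps and map_keys are made up for illustration.

```pig
-- Load the file of map literals, one map per line.
maps = load 'map.pig' as (M: []);
-- Assumption: the KEYSET builtin exists in this Pig version.
-- It returns the keys of each row's map as a bag.
map_keys = foreach maps generate KEYSET(M);
dump map_keys;
```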
Is Pig's map type half-baked? What am I missing?