Technically, six digits are enough to provide a unique mapping for a vocabulary of size 7:
1. Sunday [0,0,0,0,0,0]
2. Monday [1,0,0,0,0,0]
3. Tuesday [0,1,0,0,0,0]
4. Wednesday [0,0,1,0,0,0]
5. Thursday [0,0,0,1,0,0]
6. Friday [0,0,0,0,1,0]
7. Saturday [0,0,0,0,0,1]
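The six-digit mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `DAYS` and `dummy_encode` are hypothetical, and the first vocabulary entry (Sunday) is assumed to be the reference category that maps to all zeros.

```python
# Hypothetical vocabulary, ordered as in the list above.
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def dummy_encode(value, vocabulary=DAYS):
    """Return a vector of length len(vocabulary) - 1.

    The first vocabulary entry is the reference category and is
    encoded as all zeros; every other entry sets exactly one position.
    """
    index = vocabulary.index(value)  # raises ValueError for unknown values
    vector = [0] * (len(vocabulary) - 1)
    if index > 0:
        vector[index - 1] = 1
    return vector


for day in DAYS:
    print(day, dummy_encode(day))
```

Because the all-zeros vector is already taken by the reference category, this sketch has no spare pattern left to signal an unknown or missing value, so it raises an error instead.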
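Dummy coding is the more compact representation. It is preferred in statistical models, which perform better when their inputs are linearly independent.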
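Modern machine learning algorithms, however, do not require their inputs to be linearly independent, and they use methods such as L1 regularization to prune redundant inputs. Using all seven digits (one-hot encoding) adds a degree of freedom that lets the framework transparently handle missing inputs in production, because every missing input is treated as all zeros: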
1. Sunday [0,0,0,0,0,0,1]
2. Monday [0,0,0,0,0,1,0]
3. Tuesday [0,0,0,0,1,0,0]
4. Wednesday [0,0,0,1,0,0,0]
5. Thursday [0,0,1,0,0,0,0]
6. Friday [0,1,0,0,0,0,0]
7. Saturday [1,0,0,0,0,0,0]
For missing values: [0,0,0,0,0,0,0]
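The seven-digit encoding above, including the all-zeros fallback, could be reproduced along these lines. It is a minimal sketch, not any framework's actual implementation: `DAYS` and `one_hot_encode` are hypothetical names, and the reversed bit order (Sunday in the last position) is chosen only to match the list above.

```python
# Hypothetical vocabulary, ordered as in the list above.
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def one_hot_encode(value, vocabulary=DAYS):
    """Return a vector of length len(vocabulary).

    Known values set exactly one position. Anything not in the
    vocabulary (including None) falls back to all zeros.
    """
    vector = [0] * len(vocabulary)
    if value in vocabulary:
        # Reverse the position so Sunday lands in the last slot,
        # matching the ordering used in the list above.
        position = len(vocabulary) - 1 - vocabulary.index(value)
        vector[position] = 1
    return vector


for day in DAYS + [None]:
    print(day, one_hot_encode(day))
```

Since every known day sets exactly one position, the all-zeros vector is never produced by a valid input and remains free to stand for a missing value.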