I am trying to convert strings into numeric vectors:
### Clean the string
import re

def names_to_words(names):
    print('a')
    # Replace every non-letter with a space, lowercase, then split on whitespace
    words = re.sub("[^a-zA-Z]", " ", names).lower().split()
    print('b')
    return words
### Vectorization
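from sklearn.feature_extraction.text import CountVectorizer

def Vectorizer():
    # Bag-of-words vectorizer capped at the 5000 most frequent terms
    vectorizer = CountVectorizer(
        analyzer="word",
        tokenizer=None,
        preprocessor=None,
        stop_words=None,
        max_features=5000)
    return vectorizer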
### Test a string
s = 'abc...'
r = names_to_words(s)
feature = Vectorizer().fit_transform(r).toarray()
But when the cleaned input looks like this:
['g', 'o', 'm', 'd']
I get this error:
ValueError: empty vocabulary; perhaps the documents only contain stop words
It seems that single-character strings are the problem. What should I do? Thanks.
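
For reference, here is a minimal sketch of what may be happening, assuming the vectorizer is using the default `token_pattern` of `CountVectorizer`, which is `r"(?u)\b\w\w+\b"` and only matches tokens of two or more word characters. The snippet uses only the standard `re` module to apply that pattern to the failing input. The names `DEFAULT_PATTERN`, `RELAXED_PATTERN`, and `docs` are illustrative and not part of the original code.

```python
import re

# Default token_pattern of scikit-learn's CountVectorizer:
# a token must contain at least two word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# Relaxed pattern (an assumption for illustration) that also
# accepts single-character tokens.
RELAXED_PATTERN = r"(?u)\b\w+\b"

# The input that triggers the error; each list element is one document.
docs = ['g', 'o', 'm', 'd']

# With the default pattern no document yields any token,
# which would leave the vocabulary empty.
default_tokens = [re.findall(DEFAULT_PATTERN, d) for d in docs]

# With the relaxed pattern every single letter survives as a token.
relaxed_tokens = [re.findall(RELAXED_PATTERN, d) for d in docs]

print(default_tokens)  # [[], [], [], []]
print(relaxed_tokens)  # [['g'], ['o'], ['m'], ['d']]
```

If this is indeed the cause, passing `token_pattern=r"(?u)\b\w+\b"` to `CountVectorizer` should keep single-character tokens. Note also that `fit_transform` treats each element of the list as a separate document, so a list of words produces one row per word rather than one row per original string.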