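I am trying to use the keys of a dictionary to replace a string with its value. However, the column contains whole sentences, so I first have to tokenize each sentence, check whether any word in it corresponds to a key in the dictionary, and then replace that word with the corresponding value.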
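The three steps described above (tokenize, look up, replace) can be sketched in plain Python as follows. This is an illustrative sketch, not the asker's code: the helper name `replace_tokens` is hypothetical, and it tokenizes with `re.split` on runs of non-word characters (kept through a capture group) instead of `str.split()`, so punctuation such as the period in "LA." becomes a separate token and the sentence can be rebuilt exactly.

```python
import re

def replace_tokens(sentence, id_dict):
    """Hypothetical helper: swap every token that is a key of id_dict for its value."""
    # Step 1: tokenize. The capture group makes re.split keep the separators
    # (spaces, punctuation), so joining with '' reproduces the original spacing.
    tokens = re.split(r'(\W+)', sentence)
    # Steps 2 and 3: look each token up; tokens that are not keys are kept as-is.
    return ''.join(id_dict.get(tok, tok) for tok in tokens)

id_dict = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}
print(replace_tokens('The rock was found in LA.', id_dict))
# The rock was found in Los Angeles.
```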
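However, the result I keep getting is None. Is there a better, more Pythonic way to solve this?
Here is my current MCVE (minimal example). I have marked in the comments where the problem occurs.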
import pandas as pd

data = {'Categories': ['animal','plant','object'],
        'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big','The cat from the UK is small','The rock was found in LA.']
       }
ids = {'Id':['NYC','LA','UK'],
       'City':['New York City','Los Angeles','United Kingdom']}
df = pd.DataFrame(data)
ids = pd.DataFrame(ids)

def col2dict(ids):
    data = ids[['Id', 'City']]
    idDict = data.set_index('Id').to_dict()['City']
    return idDict
def replaceIds(data,idDict):
    ids = idDict.keys()
    types = idDict.values()
    data['commentTest'] = data['Comment']
    words = data['commentTest'].apply(lambda x: x.split())
    for (i,word) in enumerate(words):
        #Here we can see that the words appear
        print word
        print ids
        if word in ids:
            #Here we can see that they are not being recognized. What happened?
            print ids
            print word
            words[i] = idDict[word]
    data['commentTest'] = words.apply(lambda x: ' '.join(x))
    return data
idDict = col2dict(ids)
results = replaceIds(df, idDict)
Result:
None
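A likely explanation for the "they are not being recognized" comment: `data['commentTest'].apply(lambda x: x.split())` produces one *list* of words per row, so `enumerate(words)` yields `(row_index, list_of_words)` pairs rather than single words. In Python 2.7, `idDict.keys()` returns a list of strings, and a list never equals any of those strings, so `word in ids` is always False. The pandas-free sketch below reproduces this with plain lists (emulating the Python 2 `keys()` list with `list(...)`) and then looks up each token individually; the names `words`, `ids`, and `fixed` are illustrative only.

```python
comments = ['The NYC tree is very big',
            'The cat from the UK is small',
            'The rock was found in LA.']
id_dict = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}

# Equivalent of .apply(lambda x: x.split()): one list of words per row.
words = [c.split() for c in comments]
# In Python 2.7, dict.keys() returns a list; list() emulates that here.
ids = list(id_dict.keys())

for i, word in enumerate(words):
    # 'word' is a whole row such as ['The', 'NYC', 'tree', ...],
    # so comparing it against the string keys never succeeds.
    print(i, word, word in ids)   # the last value is always False

# Fix: look up every token inside each row, then join the row back together.
fixed = [' '.join(id_dict.get(tok, tok) for tok in row) for row in words]
print(fixed)
```

Because `str.split()` leaves punctuation attached, the third row still contains the token `LA.`, which is not a key, so it stays `The rock was found in LA.`; matching the expected output needs a tokenizer that separates punctuation or a word-boundary regex.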
I am using Python 2.7, and when I print the dict, its strings show up with the Unicode prefix u'.
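The `u'...'` prefix is only how Python 2 displays `unicode` objects in `repr()`; it is not part of the key. In Python 2, an ASCII-only `str` and the equivalent `unicode` compare equal and hash the same, so a plain `'NYC'` still finds a `u'NYC'` key. A minimal check (it also runs on Python 3.3+, where `u''` literals are ordinary strings):

```python
# Keys as they appear when printed under Python 2 with unicode data.
id_dict = {u'NYC': u'New York City', u'LA': u'Los Angeles', u'UK': u'United Kingdom'}

# A plain string literal still finds the unicode key.
print('NYC' in id_dict)   # True
print(id_dict['UK'])      # United Kingdom
```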
My expected output is:
   Categories                       Comment  Type                               commentTest
0      animal      The NYC tree is very big  tree        The New York City tree is very big
1       plant  The cat from the UK is small   dog  The cat from the United Kingdom is small
2      object     The rock was found in LA.  rock        The rock was found in Los Angeles.
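One way to build that `commentTest` column without an explicit loop is sketched below, assuming pandas 0.23 or later (where `Series.str.replace` takes a `regex` argument and accepts a callable replacement when `regex=True`). It joins the dictionary keys into a single word-boundary pattern, so `LA.` matches `LA` while a longer word that merely contains those letters would not; the names `id_dict` and `pattern` are illustrative.

```python
import re
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'Categories': ['animal', 'plant', 'object'],
    'Type': ['tree', 'dog', 'rock'],
    'Comment': ['The NYC tree is very big',
                'The cat from the UK is small',
                'The rock was found in LA.'],
})
ids = pd.DataFrame({'Id': ['NYC', 'LA', 'UK'],
                    'City': ['New York City', 'Los Angeles', 'United Kingdom']})

# Id -> City lookup, equivalent to col2dict() in the question.
id_dict = ids.set_index('Id')['City'].to_dict()

# One alternation of all keys (longest first, escaped), anchored on word
# boundaries so "LA." still matches "LA" but "LAB" would not.
pattern = r'\b(?:' + '|'.join(
    re.escape(k) for k in sorted(id_dict, key=len, reverse=True)) + r')\b'

# The callable receives each match object and returns the replacement text.
df['commentTest'] = df['Comment'].str.replace(
    pattern, lambda m: id_dict[m.group(0)], regex=True)

print(df)
```

A single alternation scans each sentence once, and because the replacement comes from the matched text, a city name that happens to contain another key is never re-substituted.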
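`regex=True`? From the docs I thought it should be False: "Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Otherwise, to_replace must be None because this parameter will be interpreted as a regular expression or a list, dict, or array of regular expressions." - pceccon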
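On the `regex=True` question: with the default `regex=False`, `Series.replace` treats each dictionary key as a whole cell value, so a key like `'NYC'` only replaces a cell whose entire content is `'NYC'`, and sentences are left untouched. With `regex=True`, the keys are treated as regular expressions and matching substrings inside each cell are substituted. A small comparison (the `\b` word boundaries are an added assumption, used to avoid matching inside longer words):

```python
import re
import pandas as pd

s = pd.Series(['The NYC tree is very big', 'The rock was found in LA.'])
id_dict = {'NYC': 'New York City', 'LA': 'Los Angeles'}

# Default regex=False: a key must equal the entire cell, so nothing changes here.
unchanged = s.replace(id_dict)

# regex=True: keys are patterns; \b...\b keeps "LA" from matching inside other words.
patterns = {r'\b' + re.escape(k) + r'\b': v for k, v in id_dict.items()}
replaced = s.replace(patterns, regex=True)

print(unchanged.tolist())
print(replaced.tolist())
```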