I have the following code:
businessdata = ['Name of Location','Address','City','Zip Code','Website','Yelp',
'# Reviews', 'Yelp Rating Stars','BarRestStore','Category',
'Price Range','Alcohol','Ambience','Latitude','Longitude']
business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata)
print '\n\nBusiness\n'
print business[:6]
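It reads my file and creates a pandas table that I can work with. What I need to do is count how many categories are in the "Category" variable of each row and store that number in a new column called "# Categories". Here is a sample of the target column: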
Category
French
Adult Entertainment , Lounges , Music Venues
American (New) , Steakhouses
American (New) , Beer, Wine & Spirits , Gastropubs
Chicken Wings , Sports Bars , American (New)
Japanese
Desired output:
Category # Categories
French 1
Adult Entertainment , Lounges , Music Venues 3
American (New) , Steakhouses 2
American (New) , Beer, Wine & Spirits , Gastropubs 4
Chicken Wings , Sports Bars , American (New) 3
Japanese 1
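Edit 1:
Raw input = CSV file. Target column: "Category". I can't post a screenshot right now. I don't think the values to be counted are lists.
Here is my code: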
business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata, skip_blank_lines=True)
#business = pd.read_csv('FL_Yelp_Data_v2.csv')
business['Category'].str.split(',').apply(len)
#not sure where to declare the df part in the suggestions that use it.
print business[:6]
But I keep getting the following error:
TypeError: object of type 'float' has no len()
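A likely cause, assuming some rows in the CSV have an empty Category cell: pandas reads an empty cell as NaN, which is a float, and .str.split leaves NaN untouched, so len() ends up being called on a float. Note also that the result of the split/apply line is never assigned to a column. Below is a minimal sketch of one way around both points. The sample data is made up for illustration, and counting a missing category as 0 is an assumption, not something the file confirms.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN stands in for an empty Category cell.
business = pd.DataFrame({'Category': [
    'French',
    'Adult Entertainment , Lounges , Music Venues',
    np.nan,
    'American (New) , Beer, Wine & Spirits , Gastropubs',
]})

# .str.split returns a list for each string and leaves NaN as NaN.
parts = business['Category'].str.split(',')

# Take the length only where a list exists; treat a missing Category as 0
# (an assumption). Assign the result so the new column is actually stored.
business['# Categories'] = parts.apply(
    lambda p: len(p) if isinstance(p, list) else 0
)

print(business)
```

This counts every comma-separated piece, so "Beer, Wine & Spirits" counts as two, which matches the desired output above.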
Edit 2:
I give up. Thanks for all your help, but I will have to figure out something else. I tried
print type(business['Category']) is [all types of variables]
but it always returns False. - Danilo