Counting a column in a CSV file with Python

4

https://istack.dev59.com/Le696.webp

I want to count male and female email accounts separately, but the code I wrote doesn't work. Can anyone help? My code is below; thanks in advance.

import csv

mailAcc = {}
femailAcc = {}

with open('1000 Records.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for i in csv_reader:
        email = i[6]
        gender = i[5]
        doman = email.split('@')[-1]
        if doman in mailAcc:
            if gender == 'm':
                 mailAcc[doman] = mailAcc[doman] + 1
        else:
            mailAcc[doman] = 1

        if doman in femailAcc:
            if gender == 'F':
                femailAcc[doman] = femailAcc[doman] + 1
        else:
            femailAcc[doman] = 1
            
    print('Mail Email accounts: ', mailAcc)
    print('Femail Email Accounts: ', femailAcc)

Welcome to SO. Please avoid screenshots; copy and paste some dummy data as a [mcve] instead. Also: I really hope those email addresses aren't real... - JoSSte
2
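They are fake emails. - imhamza3333
Do you just want the total number of male and female accounts, or do you want to count them per domain? If you only want to count males and females, there is no need to check the domain. - accdias
I want to do it per domain, e.g. how many male accounts and how many female accounts there are on Gmail. There are 9 male accounts here, but my code gives me more than 9, and the same goes for the female accounts. - imhamza3333
Please post the sample input data as text; that makes it easier for people trying to help to reproduce the problem. - accdias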
3 Answers

1
Use pandas.
import pandas as pd

df = pd.read_csv('your_csv_file.csv') # read in csv
df['domain'] = df['email'].apply(lambda x: x[x.index('@')+1:]) # column with just domain

male = {} # setup male dictionary
female = {} # setup female dictionary

# iterate on unique domains to get a count of male/female and populate in dictionaries
for domain in df['domain'].unique():   
    male[domain] = df[(df['gender']=='M') & (df['domain']==domain)].shape[0]
    female[domain] = df[(df['gender']=='F') & (df['domain']==domain)].shape[0]

I can't. It isn't allowed. - imhamza3333

1

This can be done with pandas. Since your columns have no names, pass header=None when reading the CSV file and access the columns by number:

import pandas as pd

df = pd.read_csv('1000 Records.csv', header=None)
df['mailhosts'] = df[6].str.split('@').str[-1]

gp = df.groupby(5)

#count e-mail accounts per gender:
print('Female Email Accounts:', gp.get_group('F')['mailhosts'].value_counts())
print('Male Email Accounts:', gp.get_group('M')['mailhosts'].value_counts())

Thanks for your help, but I can't use pandas because it isn't allowed. - imhamza3333
1
@imhamza3333, if you can't use pandas, why did you mark this answer as accepted? - accdias

0
Here is a solution that uses only standard Python modules to count male and female accounts by domain:
import csv
from collections import Counter

males = Counter()
females = Counter()

with open('1000 Records.csv', newline='') as f:
    records = csv.reader(f)
    for record in records:
        # Column 6 holds the email address, column 5 the gender.
        domain = record[6].rsplit('@', 1)[-1].lower()
        gender = record[5].strip().lower()
        if gender == 'm':
            males[domain] += 1
        elif gender == 'f':
            # Count only rows explicitly marked female; skip anything else.
            females[domain] += 1

print('Total male accounts:', sum(males.values()))
print('Total male accounts by domain')
for k, v in males.items():
    print(k, v)

print('Total female accounts:', sum(females.values()))
print('Total female accounts by domain')
for k, v in females.items():
    print(k, v)
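
For reference, the loop in the question overcounts because each `else` branch sets a domain's count to 1 the first time that domain is seen, whatever the gender of that row; it also compares against lowercase 'm' but uppercase 'F'. Below is a minimal sketch of a plain-dict fix that checks the gender first and only then counts the domain. It assumes the same column positions as the question (gender at index 5, email at index 6). The sample rows and the function name `count_by_gender` are made up for illustration and are not taken from the real file.

```python
import csv
import io

# Hypothetical sample rows standing in for '1000 Records.csv'; only columns
# 5 (gender) and 6 (email) matter, matching the indices used in the question.
SAMPLE = """\
1,Mr.,Al,B,Smith,M,al.smith@gmail.com
2,Ms.,Bea,C,Jones,F,bea.jones@gmail.com
3,Mr.,Cy,D,Brown,M,cy.brown@yahoo.com
4,Mr.,Dan,E,White,M,dan.white@gmail.com
"""


def count_by_gender(rows):
    """Return (male_counts, female_counts), each a dict of domain -> count."""
    male_acc = {}
    female_acc = {}
    for row in rows:
        gender = row[5].strip().upper()
        domain = row[6].rsplit('@', 1)[-1].lower()
        # Choose the dictionary by gender first, then count the domain in it.
        if gender == 'M':
            male_acc[domain] = male_acc.get(domain, 0) + 1
        elif gender == 'F':
            female_acc[domain] = female_acc.get(domain, 0) + 1
    return male_acc, female_acc


male_acc, female_acc = count_by_gender(csv.reader(io.StringIO(SAMPLE)))
print('Male email accounts:', male_acc)
print('Female email accounts:', female_acc)
```

To run it against the real file, pass `csv.reader(open('1000 Records.csv', newline=''))` in place of the in-memory sample.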
                                              
