How do I find the mean of the differences between each row?

Question



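Good morning, everyone. I'm trying to find the difference between rows. I've been trying to write a function for this, but I suspect there is a simpler answer, and my code isn't right. Here is my sample dataset.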

import pandas as pd

cols = ['Name', 'Math', 'Science', 'English', 'History']
data = [['Tom', 100, 93, 95, 92], ['Nick', 89, 75, 82, 57], ['Julie', 99, 89, 76, 88], ['Sarah', 79, 78, 94, 88]]
df = pd.DataFrame(data, columns=cols)
df

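The output looks like this:

    Name  Math  Science  English  History
0    Tom   100       93       95       92
1   Nick    89       75       82       57
2  Julie    99       89       76       88
3  Sarah    79       78       94       88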

My current (non-working) function is:

students = ['Tom', 'Nick', 'Julie', 'Sarah']
differences = []

def student_diff(student):
    for col in df.columns[1:]:
        for classmate in students:
            differences.append(abs(student[col] - classmate[col]))
            print (student, differences.mean())
              
student_diff('Tom')

The error is:

TypeError: string indices must be integers

Overall, I'd like the output to look like this (for Tom, for example):

Nick 19.25

- E_Sarousi

1 Answer


- It_is_Chris · Accepted Answer

# student to compare against everyone else
student = 'Tom'
# boolean mask that is True on the row whose name matches the student
mask = df['Name'].eq(student)
# put the student's row first, followed by all other rows, and index by name
data = pd.concat([df[mask], df[~mask]]).set_index('Name')
# mean absolute difference between the student's row and each remaining row
abs(data.iloc[0, :] - data.iloc[1:, :]).mean(axis=1)

Name
Nick     19.25
Julie     7.00
Sarah    10.25
dtype: float64
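
The same result can probably be reached without reordering the rows. The following is only a sketch, not part of the original answer: it assumes the names in `Name` are unique, and the variable names `scores` and `result` are introduced here for illustration. It indexes the frame by name, drops the student's own row, and subtracts the student's scores from every classmate with `DataFrame.sub`.

```python
import pandas as pd

cols = ['Name', 'Math', 'Science', 'English', 'History']
data = [['Tom', 100, 93, 95, 92], ['Nick', 89, 75, 82, 57],
        ['Julie', 99, 89, 76, 88], ['Sarah', 79, 78, 94, 88]]
df = pd.DataFrame(data, columns=cols)

student = 'Tom'
# index by name so rows can be selected by label (assumes unique names)
scores = df.set_index('Name')
# drop the student's own row, subtract the student's scores from each
# classmate (aligned on the subject columns), then average the absolute gaps
result = (
    scores.drop(student)
          .sub(scores.loc[student], axis='columns')
          .abs()
          .mean(axis=1)
)
print(result)
```

Because the subtraction is aligned on column labels, this form does not depend on the student's row being first.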

Or, if you would like a function:

def student_diff(df: pd.DataFrame, student: str) -> pd.Series:
    # boolean mask that is True on the row whose name matches the student
    mask = df['Name'].eq(student)
    # put the student's row first, followed by all other rows, and index by name
    data = pd.concat([df[mask], df[~mask]]).set_index('Name')
    # mean absolute difference between the student's row and each remaining row
    return abs(data.iloc[0, :] - data.iloc[1:, :]).mean(axis=1)


student_diff(df=df, student='Nick')

Name
Tom      19.25
Julie    15.25
Sarah    14.00
dtype: float64
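
If the comparison is needed for every student at once, one possible extension is to build the full pairwise table in a single step with NumPy broadcasting. This is a sketch that goes beyond the original answer: the helper name `pairwise_mean_abs_diff` and its `name_col` parameter are hypothetical, and it assumes every column other than the name column is numeric.

```python
import numpy as np
import pandas as pd

cols = ['Name', 'Math', 'Science', 'English', 'History']
data = [['Tom', 100, 93, 95, 92], ['Nick', 89, 75, 82, 57],
        ['Julie', 99, 89, 76, 88], ['Sarah', 79, 78, 94, 88]]
df = pd.DataFrame(data, columns=cols)


def pairwise_mean_abs_diff(df: pd.DataFrame, name_col: str = 'Name') -> pd.DataFrame:
    """Hypothetical helper: mean absolute score gap for every pair of rows."""
    scores = df.set_index(name_col)
    values = scores.to_numpy(dtype=float)  # shape (n_students, n_subjects)
    # broadcast (n, 1, k) against (1, n, k) to get every pair: shape (n, n, k)
    diffs = np.abs(values[:, None, :] - values[None, :, :])
    # average over the subject axis, leaving an (n, n) table
    return pd.DataFrame(diffs.mean(axis=2), index=scores.index, columns=scores.index)


matrix = pairwise_mean_abs_diff(df)
# one student's comparisons are a single row of the table, minus the self entry
print(matrix.loc['Tom'].drop('Tom'))
```

The table is symmetric with zeros on the diagonal, so `matrix.loc['Tom', 'Nick']` and `matrix.loc['Nick', 'Tom']` hold the same value. The intermediate array grows with the square of the number of rows, so this suits small frames like the one in the question.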