Aggregating time series data to make a scatter plot

3
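I want to make a time series scatter plot for my time series data. The data has categorical columns that first need to be aggregated by group to build the plotting data, and then I want to draw the scatter plot with seaborn or matplotlib. My data is product sales price time series data, and I want to see the price trend of each product owner across different market thresholds. I tried pandas.pivot_table and groupby to shape the plotting data, but I could not get the plot I want.
Reproducible data:
Here is the sample product data I used; I want to see each dealer's price trend for the different protein types with respect to threshold.
My attempt:
This is how I have tried to aggregate the data for plotting so far, but it does not give me the right plot. I suspect my way of aggregating the plotting data is incorrect. Can anyone point out how to do it correctly to get the desired plot?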
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn

mydf = pd.read_csv('foo.csv')
mydf=mydf.drop(mydf.columns[0], axis=1)
mydf['expected_price'] = mydf['price']*76/mydf['threshold']

g = mydf.groupby(['dealer','protein_type'])
newdf= g.apply(lambda x: pd.Series([np.average(x['threshold'])])).unstack()

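But the attempt above does not work, because I want to plot each dealer's market purchase price for the different protein types and thresholds along the daily time series. I don't know how to handle this time series. Can anyone suggest or correct how to do this properly?
I also tried pandas.pivot_table to aggregate my data, but it still does not produce the plotting data.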
pv_df= pd.pivot_table(mydf, index=['date'], columns=['dealer', 'protein_type', 'threshold'],values=['price'])
pv_df= pv_df.fillna(0)
pv_df.groupby(['dealer', 'protein_type', 'threshold'])['price'].unstack().reset_index()

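But this attempt does not work either. In addition, the dates in my data are not continuous, so I think I could make a monthly time series line plot instead.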
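
The monthly idea mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the tiny dataframe is made up, and taking the mean price per month and group is one possible aggregation, not something the data itself dictates.

```python
import pandas as pd

# Tiny stand-in for the real data (values are made up for illustration).
df = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-08', '2002-01-20', '2002-02-05', '2002-02-06']),
    'dealer': ['Alpha Food Corps'] * 4,
    'protein_type': ['beef'] * 4,
    'threshold': [85, 85, 85, 85],
    'price': [1.8, 2.0, 1.8, 1.9],
})

# Collapse the irregular daily rows into one mean price per month and group.
df['month'] = df['date'].dt.to_period('M')
monthly = (df.groupby(['dealer', 'protein_type', 'threshold', 'month'], as_index=False)['price']
             .mean())

# Convert the monthly periods back to timestamps so they plot on a date axis.
monthly['month'] = monthly['month'].dt.to_timestamp()
print(monthly)
```

The resulting frame has one row per dealer, protein_type, threshold and month, which is a shape that seaborn can plot directly with 'month' on the x-axis.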

My attempt at making the plot:

def scatterplot(x_data, y_data, x_label, y_label, title):
    fig, ax = plt.subplots()
    ax.scatter(x_data, y_data, s = 30, color = '#539caf', alpha = 0.75)

    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    fig.autofmt_xdate()

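Desired output:

I would like a line plot or scatter plot where the x-axis shows the monthly time series and the y-axis shows the price of each protein_type at the different threshold values for the different dealers in each month. Here is an example of the line plot I want: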

example line chart

2 Answers

7

Update using threshold

Option 1

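  • This option was implemented after seeing the result of Option 2.
    • The plots there contain a lot of unexplained information and do not present the data clearly.
  • To present the data clearly, each plot should contain only 3 dimensions of data (e.g. date, values and cats) for a single dealer, a single threshold and a single protein_type.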
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import timedelta

# read the data in and parse the date column
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# calculate expected price
df['expected_price'] = df.price*76/df.threshold

# set threshold as a category
df.threshold = df.threshold.astype('category')

# set the index
df = df.set_index(['date', 'dealer', 'protein_type', 'threshold'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination', 'quantity']).stack().reset_index().rename(columns={'level_4': 'cats', 0: 'values'})

# plot
for pt in dfl.protein_type.unique():
    for t in dfl.threshold.unique():
        data = dfl[(dfl.protein_type == pt) & (dfl.threshold == t)]
        if not data.empty:
            utc = len(data.threshold.unique())
            f, axes = plt.subplots(nrows=utc, ncols= 2, figsize=(20, 4), squeeze=False)
            for j in range(utc):
                for i, d in enumerate(dfl.dealer.unique()):
                    data_d = data[data.dealer == d].sort_values(['cats', 'date']).reset_index(drop=True)
                    p = sns.scatterplot(x='date', y='values', data=data_d, hue='cats', ax=axes[j, i])
                    if not data_d.empty:
                        p.set_title(f'{d}\nThreshold: {t}\n{pt}')
                        p.set_xlim(data_d.date.min() - timedelta(days=60), data_d.date.max() + timedelta(days=60))
                    else:
                        p.set_title(f'{d}: No Data Available\nThreshold: {t}\n{pt}')
                    
            plt.show()

First four plots

enter image description here

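Option 2

  • This produces 4 separate plots with threshold as a category dtype.
  • expected_price must be calculated while threshold is still an int, before the conversion.
  • Note that my copy of the data does not have the extra unnamed column, so you still need to drop it from yours; that step is not shown in the code below.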
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read the data in and parse the date column
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# calculate expected price
df['expected_price'] = df.price*76/df.threshold

# set threshold as a category
df.threshold = df.threshold.astype('category')

# set the index
df = df.set_index(['date', 'dealer', 'protein_type', 'threshold'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination', 'quantity']).stack().reset_index().rename(columns={'level_4': 'cats', 0: 'values'})

# plot four plots with threshold
for d in dfl.dealer.unique():
    for pt in dfl.protein_type.unique():
        plt.figure(figsize=(13, 7))
        data = dfl[(dfl.protein_type == pt) & (dfl.dealer == d)]
        sns.lineplot(x='date', y='values', data=data, hue='threshold', style='cats')
        plt.yscale('log')
        plt.title(f'{d}: {pt}')
        plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)

enter image description here enter image description here
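
The note above about computing expected_price before converting threshold can be checked on a small example. This is a minimal sketch with made-up rows; only the column names and the formula come from the code above.

```python
import pandas as pd

df = pd.DataFrame({'price': [0.5, 1.8], 'threshold': [50, 85]})

# Works: threshold is still numeric at this point.
df['expected_price'] = df['price'] * 76 / df['threshold']

# Only now convert, so seaborn treats each threshold as a discrete level.
df['threshold'] = df['threshold'].astype('category')

# Repeating the arithmetic on the categorical column is rejected by pandas.
try:
    df['price'] * 76 / df['threshold']
    division_failed = False
except TypeError:
    division_failed = True

print(df.dtypes)
print('division on category failed:', division_failed)
```

If the conversion has to happen first for some reason, casting back with astype(int) before the calculation would be one way around it.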

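Original content, without threshold as a category

  • I don't understand what you are doing with the following:
    • newdf= g.apply(lambda x: pd.Series([np.average(x['threshold'])])).unstack()
    • I don't think this is the main problem; the main problem is plotting the data.
  • First, the dataframe needs to be converted to long form, and the 'destination' column needs to be dropped.
  • There are too many dimensions to plot on a single figure.
    • x = 'date', y = 'values', hue = 'cats', style = 'dealer'
    • 'protein_type' needs a separate plot.
    • However, the data overlaps too much for 'dealer' to be distinguished, so 4 plots are needed.

Dataframe setup:

  • Note that my copy of the data does not have the extra unnamed column, so you still need to drop it from yours; that step is not shown in the code below.
  • Use pandas.DataFrame.stack to convert the dataframe to long form.

Option 1: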

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read the data in
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# your calculation
df['expected_price'] = df['price']*76/df['threshold']

# set the index
df = df.set_index(['date', 'dealer', 'protein_type'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination']).stack().reset_index().rename(columns={'level_3': 'cats', 0: 'values'})

# display(dfl.head())
        date            dealer protein_type            cats    values
0 2001-12-22  Alpha Food Corps      chicken       threshold     50.00
1 2001-12-22  Alpha Food Corps      chicken        quantity  39037.00
2 2001-12-22  Alpha Food Corps      chicken           price      0.50
3 2001-12-22  Alpha Food Corps      chicken  expected_price      0.76
4 2001-12-27  Alpha Food Corps         beef       threshold     85.00
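
The same set_index / stack / reset_index chain can be tried on a two-row frame to see where the 'level_3' and 0 column names come from. The rows here are illustrative only, and long_df is a name chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2001-12-22', '2001-12-27']),
    'dealer': ['Alpha Food Corps', 'Alpha Food Corps'],
    'protein_type': ['chicken', 'beef'],
    'price': [0.5, 1.8],
    'expected_price': [0.76, 1.61],
})

# stack() moves the remaining column labels into a new, unnamed index level.
# reset_index() then names that level 'level_3' (three index levels precede it)
# and the stacked values end up in a column labelled 0.
long_df = (df.set_index(['date', 'dealer', 'protein_type'])
             .stack()
             .reset_index()
             .rename(columns={'level_3': 'cats', 0: 'values'}))
print(long_df)
```

With a fourth index level such as threshold, the unnamed level becomes 'level_4', which is why the updated code earlier in this answer renames that label instead.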

Option 2: rolling mean

df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])
df['expected_price'] = df['price']*76/df['threshold']
df = df.set_index('date')

# groupby aggregate rolling mean and stack
dfl = df.groupby(['dealer', 'protein_type'])[['expected_price', 'price']].rolling(7).mean().stack().reset_index().rename(columns={'level_3': 'cats', 0: 'values'})
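
One detail worth keeping in mind: rolling(7) averages over 7 rows per group, not over 7 calendar days, and the dates in this data are irregular. The sketch below contrasts a row window with a time-based window on made-up values; whether a calendar window is actually wanted here is an assumption for the reader to decide.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-01', '2002-01-02', '2002-01-03', '2002-01-20']),
    'dealer': ['Alpha Food Corps'] * 4,
    'price': [1.0, 2.0, 3.0, 4.0],
}).set_index('date')

# Window of 3 rows: the last value averages the rows dated Jan 2, Jan 3 and Jan 20.
by_rows = df.groupby('dealer')['price'].rolling(3).mean()

# Window of 3 calendar days: the Jan 20 row only sees itself.
by_days = df.groupby('dealer')['price'].rolling('3D').mean()

print(by_rows)
print(by_days)
```

The time-based form needs a sorted DatetimeIndex within each group, which set_index('date') provides when the file is already in date order.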

Option 1: Two plots

  • The 'dealer' data is too similar to tell apart (possible price collusion?)
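from datetime import datetime  # needed for the plt.xlim bounds below

for pt in dfl.protein_type.unique():
    plt.figure(figsize=(9, 5))
    data = dfl[dfl.protein_type == pt]
    # x and y are passed as keywords; recent seaborn releases reject them as positional arguments
    sns.lineplot(x='date', y='values', data=data, hue='cats', style='dealer')
    plt.xlim(datetime(2001, 11, 1), datetime(2004, 8, 1))
    plt.yscale('log')
    plt.title(pt)
    plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)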

enter image description here

  • Even with only 'price' and 'expected_price', the 'dealer' cannot be distinguished.

enter image description here

Option 2: Four plots

seaborn.FacetGrid

g = sns.FacetGrid(data=dfl, col='dealer', row='protein_type', hue='cats', height=5, aspect=1.5)
g.map(sns.lineplot, 'date', 'values').add_legend()
plt.yscale('log')
g.set_xticklabels(rotation=90)

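enter image description here

  • Plot of the rolling mean data

enter image description here

Nested loops

  • This code produces a column of 4 figures, selecting dealer first and then protein_type.
  • Optionally, the order of dealer and protein_type can be swapped.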
from datetime import datetime  # needed for the plt.xlim bounds below

for d in dfl.dealer.unique():
    for pt in dfl.protein_type.unique():
        plt.figure(figsize=(10, 5))
        data = dfl[(dfl.protein_type == pt) & (dfl.dealer == d)]
        # x and y are passed as keywords; recent seaborn releases reject them as positional arguments
        sns.lineplot(x='date', y='values', data=data, hue='cats')
        plt.xlim(datetime(2001, 11, 1), datetime(2004, 8, 1))
        plt.yscale('log')
        plt.title(f'{d}: {pt}')
        plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)

CSV sample:

date,dealer,threshold,quantity,price,protein_type,destination
2001-12-22,Alpha Food Corps,50,39037,0.5,chicken,UK
2001-12-27,Alpha Food Corps,85,35432,1.8,beef,UK
2001-12-29,Alpha Food Corps,50,32142,0.5,chicken,UK
2001-12-30,Alpha Food Corps,85,34516,1.8,beef,UK
2002-01-02,Alpha Food Corps,85,39930,1.8,beef,UK
2002-01-04,Alpha Food Corps,85,40709,1.8,beef,UK
2002-01-08,Alpha Food Corps,94,37641,2.2,beef,UK
2002-01-08,Alpha Food Corps,85,37545,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,37564,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,37607,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,41706,1.8,beef,UK
2002-01-08,Alpha Food Corps,90,41628,2.1,beef,UK
2002-01-08,Alpha Food Corps,65,35720,0.9,chicken,UK
2002-01-09,Alpha Food Corps,94,1581,2.2,beef,UK
2002-01-09,Alpha Food Corps,85,11426,1.8,beef,UK
2002-01-09,Alpha Food Corps,85,37489,1.8,beef,UK
2002-01-09,Alpha Food Corps,90,15630,2.1,beef,UK
2002-01-09,Alpha Food Corps,80,3136,1.6,beef,UK
2002-01-10,Alpha Food Corps,85,41919,1.8,beef,UK
2002-01-10,Alpha Food Corps,90,39932,2.1,beef,UK
2002-01-10,Alpha Food Corps,90,41665,2.1,beef,UK
2002-01-10,Alpha Food Corps,90,41860,2.1,beef,UK
2002-01-10,Alpha Food Corps,65,39879,0.9,chicken,UK
2002-01-10,Alpha Food Corps,65,39884,0.9,chicken,UK
2002-01-11,Alpha Food Corps,90,37613,2.1,beef,UK
2002-01-12,Alpha Food Corps,90,41855,2.1,beef,UK
2002-01-13,Alpha Food Corps,90,37585,2.1,beef,UK
2002-01-15,Alpha Food Corps,85,41618,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41721,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41869,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41990,1.8,beef,UK
2002-01-15,Alpha Food Corps,90,41744,2.1,beef,UK
2002-01-15,Alpha Food Corps,90,41936,2.1,beef,UK
2002-01-15,Alpha Food Corps,65,41684,1.0,chicken,UK
2002-01-15,Alpha Food Corps,65,41776,1.0,chicken,UK
2002-01-16,Alpha Food Corps,94,35891,2.2,beef,UK
2002-01-16,Alpha Food Corps,85,39985,1.8,beef,UK
2002-01-16,Alpha Food Corps,85,41754,1.8,beef,UK
2002-01-16,Alpha Food Corps,85,41811,1.8,beef,UK
2002-01-16,Alpha Food Corps,90,39838,2.1,beef,UK
2002-01-16,Alpha Food Corps,80,3244,1.7,beef,UK
2002-01-17,Alpha Food Corps,94,22245,2.2,beef,UK
2002-01-17,Alpha Food Corps,85,5186,1.8,beef,UK
2002-01-17,Alpha Food Corps,90,2016,2.1,beef,UK
2002-01-17,Alpha Food Corps,90,40875,2.1,beef,UK
2002-01-17,Alpha Food Corps,65,41440,1.0,chicken,UK
2002-01-18,Alpha Food Corps,94,12525,2.2,beef,UK
2002-01-18,Alpha Food Corps,94,31325,2.2,beef,UK
2002-01-18,Alpha Food Corps,85,15486,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,29992,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,39938,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,41777,1.8,beef,UK
2002-01-18,Alpha Food Corps,90,9475,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,9960,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,41676,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,41816,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,42036,2.1,beef,UK
2002-01-18,Alpha Food Corps,65,41673,1.0,chicken,UK
2002-01-19,Alpha Food Corps,85,19961,1.8,beef,UK
2002-01-19,Alpha Food Corps,90,19955,2.1,beef,UK
2002-01-19,Alpha Food Corps,90,40437,2.1,beef,UK
2002-01-19,Alpha Food Corps,65,41574,1.0,chicken,UK
2002-01-19,Alpha Food Corps,65,41700,1.0,chicken,UK
2002-01-20,Alpha Food Corps,94,23278,2.2,beef,UK
2002-01-20,Alpha Food Corps,85,9230,1.8,beef,UK
2002-01-20,Alpha Food Corps,85,38842,1.8,beef,UK
2002-01-20,Alpha Food Corps,90,9173,2.1,beef,UK
2002-01-20,Alpha Food Corps,90,38608,2.1,beef,UK
2002-01-20,Alpha Food Corps,50,39191,0.8,chicken,UK
2002-01-22,Alpha Food Corps,94,41741,2.2,beef,UK
2002-01-22,Alpha Food Corps,85,39879,1.8,beef,UK
2002-01-22,Alpha Food Corps,85,41683,1.8,beef,UK
2002-01-22,Alpha Food Corps,85,41958,1.8,beef,UK
2002-01-22,Alpha Food Corps,90,41833,2.1,beef,UK
2002-01-23,Alpha Food Corps,94,20294,2.2,beef,UK
2002-01-23,Alpha Food Corps,85,15553,1.8,beef,UK
2002-01-23,Alpha Food Corps,85,40753,1.8,beef,UK
2002-01-23,Alpha Food Corps,85,41740,1.8,beef,UK
2002-01-23,Alpha Food Corps,90,1892,2.1,beef,UK
2002-01-23,Alpha Food Corps,90,39850,2.1,beef,UK
2002-01-23,Alpha Food Corps,80,3231,1.7,beef,UK
2002-01-23,Alpha Food Corps,65,41415,1.1,chicken,UK
2002-01-24,Alpha Food Corps,90,35473,2.1,beef,UK
2002-01-24,Alpha Food Corps,90,41824,2.1,beef,UK
2002-01-24,Alpha Food Corps,65,41721,1.1,chicken,UK
2002-01-25,Alpha Food Corps,85,19983,1.8,beef,UK
2002-01-25,Alpha Food Corps,85,35823,1.8,beef,UK
2002-01-25,Alpha Food Corps,90,19949,2.1,beef,UK
2002-01-25,Alpha Food Corps,90,41800,2.1,beef,UK
2002-01-25,Alpha Food Corps,65,40990,1.1,chicken,UK
2002-01-26,Alpha Food Corps,90,39938,2.1,beef,UK
2002-01-26,Alpha Food Corps,90,40641,2.1,beef,UK
2002-01-26,Alpha Food Corps,90,41550,2.1,beef,UK
2002-01-27,Alpha Food Corps,94,16589,2.2,beef,UK
2002-01-27,Alpha Food Corps,85,11669,1.8,beef,UK
2002-01-27,Alpha Food Corps,90,24982,2.1,beef,UK
2002-01-27,Alpha Food Corps,65,29819,1.1,chicken,UK
2002-01-29,Alpha Food Corps,94,37516,2.2,beef,UK
2002-01-29,Alpha Food Corps,85,37378,1.8,beef,UK
2002-01-29,Alpha Food Corps,85,37535,1.8,beef,UK
2002-01-29,Alpha Food Corps,85,40174,1.8,beef,UK
2002-01-29,Alpha Food Corps,90,37831,2.1,beef,UK
2002-01-30,Alpha Food Corps,94,34435,2.2,beef,UK
2002-01-30,Alpha Food Corps,94,39640,2.2,beef,UK
2002-01-30,Alpha Food Corps,85,1619,1.8,beef,UK
2002-01-30,Alpha Food Corps,85,3058,1.8,beef,UK
2002-01-30,Alpha Food Corps,85,20929,1.8,beef,UK
2002-01-30,Alpha Food Corps,90,3641,2.1,beef,UK
2002-01-30,Alpha Food Corps,90,20974,2.1,beef,UK
2002-01-30,Alpha Food Corps,90,31160,2.1,beef,UK
2002-01-30,Alpha Food Corps,92,38189,2.3,beef,UK
2002-01-31,Alpha Food Corps,94,8804,2.2,beef,UK
2002-01-31,Alpha Food Corps,85,17398,1.8,beef,UK
2002-01-31,Alpha Food Corps,90,13963,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,37673,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,40330,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,40511,2.2,beef,UK
2002-01-31,Alpha Food Corps,80,38290,1.9,beef,UK
2002-01-31,Alpha Food Corps,92,37193,2.3,beef,UK
2002-02-01,Alpha Food Corps,94,5011,2.2,beef,UK
2002-02-01,Alpha Food Corps,85,18783,1.8,beef,UK
2002-02-01,Alpha Food Corps,85,41827,1.8,beef,UK
2002-02-01,Alpha Food Corps,90,16394,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,23013,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,39923,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,41417,2.1,beef,UK
2002-02-01,Alpha Food Corps,80,15592,1.7,beef,UK
2002-02-01,Alpha Food Corps,80,38364,1.9,beef,UK
2002-02-01,Alpha Food Corps,92,37605,2.3,beef,UK
2002-02-01,Alpha Food Corps,92,39234,2.3,beef,UK
2002-02-02,Alpha Food Corps,90,34578,2.1,beef,UK
2002-02-02,Alpha Food Corps,90,41661,2.1,beef,UK
2002-02-02,Alpha Food Corps,80,3157,1.7,beef,UK
2002-02-02,Alpha Food Corps,65,41272,1.2,chicken,UK
2002-02-02,Alpha Food Corps,65,41503,1.2,chicken,UK
2002-02-02,Alpha Food Corps,92,36207,2.3,beef,UK
2002-02-05,Alpha Food Corps,94,41559,2.2,beef,UK
2002-02-05,Alpha Food Corps,85,41549,1.8,beef,UK
2002-02-05,Alpha Food Corps,85,41753,1.8,beef,UK
2002-02-05,Alpha Food Corps,85,41908,1.8,beef,UK
2002-02-05,Alpha Food Corps,90,39813,2.1,beef,UK
2002-02-05,Alpha Food Corps,90,41526,2.1,beef,UK
2002-02-05,German Food Corps,80,36031,1.9,beef,UK
2002-02-05,German Food Corps,50,38538,0.9,chicken,UK
2002-02-05,Alpha Food Corps,50,38772,0.9,chicken,UK
2002-02-05,German Food Corps,50,39099,0.9,chicken,UK
2002-02-05,German Food Corps,50,39132,0.9,chicken,UK
2002-02-05,German Food Corps,50,39207,0.9,chicken,UK
2002-02-06,Alpha Food Corps,85,41947,1.8,beef,UK
2002-02-06,German Food Corps,80,37287,1.9,beef,UK
2002-02-06,Alpha Food Corps,89,43201,2.1,beef,UK
2002-02-06,German Food Corps,50,38553,0.9,chicken,UK
2002-02-06,German Food Corps,50,38837,0.9,chicken,UK
2002-02-06,Alpha Food Corps,50,38985,0.9,chicken,UK
2002-02-06,German Food Corps,65,40386,1.4,chicken,UK
2002-02-06,Alpha Food Corps,65,41851,1.2,chicken,UK
2002-02-06,Alpha Food Corps,92,38405,2.3,beef,UK
2002-02-06,German Food Corps,73,37731,1.5,chicken,UK
2002-02-07,Alpha Food Corps,85,41097,1.9,beef,UK
2002-02-07,Alpha Food Corps,90,39582,2.1,beef,UK
2002-02-07,German Food Corps,65,38832,1.4,chicken,UK
2002-02-07,German Food Corps,50,39269,0.9,chicken,UK
2002-02-07,German Food Corps,50,40129,0.9,chicken,UK
2002-02-07,German Food Corps,50,41124,0.8,chicken,UK
2002-02-07,German Food Corps,65,41739,1.2,chicken,UK
2002-02-08,Alpha Food Corps,85,20034,1.8,beef,UK
2002-02-08,German Food Corps,85,33503,1.9,beef,UK
2002-02-08,German Food Corps,85,40780,1.9,beef,UK
2002-02-08,Alpha Food Corps,90,19913,2.1,beef,UK
2002-02-08,Alpha Food Corps,90,36682,2.1,beef,UK
2002-02-08,Alpha Food Corps,90,41624,2.1,beef,UK
2002-02-08,German Food Corps,65,37503,1.4,chicken,UK
2002-02-08,German Food Corps,50,38973,0.9,chicken,UK
2002-02-08,German Food Corps,50,39069,0.9,chicken,UK
2002-02-08,German Food Corps,50,40697,0.9,chicken,UK
2002-02-08,German Food Corps,92,36103,2.3,beef,UK
2002-02-08,Alpha Food Corps,92,38278,2.3,beef,UK
2002-02-09,Alpha Food Corps,90,39842,2.1,beef,UK
2002-02-09,Alpha Food Corps,90,16553,2.3,beef,UK
2002-02-09,Alpha Food Corps,80,18739,1.9,beef,UK
2002-02-09,German Food Corps,80,36349,1.9,beef,UK
2002-02-09,German Food Corps,65,35238,1.4,chicken,UK
2002-02-09,German Food Corps,50,38391,0.9,chicken,UK
2002-02-09,Alpha Food Corps,50,38819,0.9,chicken,UK
2002-02-09,German Food Corps,50,41691,0.9,chicken,UK
2002-02-09,Alpha Food Corps,92,40245,2.3,beef,UK
2002-02-09,German Food Corps,73,37323,1.5,chicken,UK
2002-02-09,German Food Corps,90,40312,2.2,beef,UK
2002-02-10,Alpha Food Corps,90,42108,2.1,beef,UK
2002-02-10,German Food Corps,65,37831,1.4,chicken,UK
2002-02-11,Alpha Food Corps,50,38591,0.9,chicken,UK
2002-02-12,Alpha Food Corps,94,41559,2.3,beef,UK
2002-02-12,Alpha Food Corps,85,40968,1.8,beef,UK
2002-02-12,Alpha Food Corps,85,41985,1.8,beef,UK
2002-02-12,German Food Corps,50,38931,0.9,chicken,UK
2002-02-12,German Food Corps,50,38986,0.9,chicken,UK
2002-02-12,German Food Corps,92,39684,2.3,beef,UK
2002-02-12,German Food Corps,73,36619,1.5,chicken,UK
2002-02-13,Alpha Food Corps,85,41291,1.8,beef,UK
2002-02-13,Alpha Food Corps,85,41892,1.8,beef,UK

3
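In a line plot, as far as I know, you can represent only 4 dimensions:
- the x-axis, which you can use for 'date'
- the y-axis, which you can use for 'price'
- the line 'hue', which you can use for 'threshold'
- the line 'style', which you can use for 'dealer'
However, you want to consider a fifth dimension: protein type. For that, I suggest using subplots, as in the code below: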
# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)

# convert 'date' type to datetime and sort values by threshold, then by date
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = 1,
                       figsize = (10, 10),
                       sharex = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # filter dataframe
    df_filtered = mydf[mydf['protein_type'] == protein_type]

    # set up plot
    sns.lineplot(ax = ax[i],
                 data = df_filtered,
                 x = 'date',
                 y = 'price',
                 hue = 'threshold',
                 style = 'dealer',
                 legend = 'full',
                 ci = False)

    # set up subplot title and legend
    ax[i].set_title(f'Protein type = {protein_type}')
    ax[i].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.85,
                    bottom = 0.05,
                    left = 0.05,
                    hspace = 0.15)

# show the plot
plt.show()

enter image description here


In the plot above it is hard to tell the dealers apart, so you can separate them into another grid of subplots, as in the code below:
# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)

# convert 'date' type to datetime and sort values by threshold, then by date
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type, one column for each dealer
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = len(mydf['dealer'].unique()),
                       figsize = (10, 10),
                       sharex = True,
                       sharey = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # loop over dealer
    for j, dealer in enumerate(mydf['dealer'].unique(), 0):

        # filter dataframe
        df_filtered = mydf[(mydf['protein_type'] == protein_type) & (mydf['dealer'] == dealer)]

        # set up plot
        sns.lineplot(ax = ax[i, j],
                     data = df_filtered,
                     x = 'date',
                     y = 'price',
                     hue = 'threshold',
                     legend = 'full',
                     ci = False)

        # set up subplot title and legend
        ax[i, j].set_title(f'Protein type = {protein_type} | Dealer = {dealer}')
        ax[i, j].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.9,
                    bottom = 0.05,
                    left = 0.05,
                    wspace = 0.3,
                    hspace = 0.2)

# show the plot
plt.show()

enter image description here


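Finally, if you want to compare price and expected_price, you can use the style dimension for that.
This requires a different reshaping of the dataframe: you need to stack the price and expected_price columns into a single column. You can use the pd.melt function to do this.
See the following code: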
# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)
mydf['expected_price'] = mydf['price']*76/mydf['threshold']

# convert 'date' type to datetime
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')

# reshape dataframe
mydf = pd.melt(frame = mydf,
               id_vars = ['date', 'dealer', 'threshold', 'quantity', 'protein_type', 'destination'],
               value_vars = ['price', 'expected_price'],
               var_name = 'price type',
               value_name = 'price value')

# sort values by threshold, then by date
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type, one column for each dealer
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = len(mydf['dealer'].unique()),
                       figsize = (10, 10),
                       sharex = True,
                       sharey = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # loop over dealer
    for j, dealer in enumerate(mydf['dealer'].unique(), 0):

        # filter dataframe
        df_filtered = mydf[(mydf['protein_type'] == protein_type) & (mydf['dealer'] == dealer)]

        # set up plot
        sns.lineplot(ax = ax[i, j],
                     data = df_filtered,
                     x = 'date',
                     y = 'price value',
                     hue = 'threshold',
                     style = 'price type',
                     legend = 'full',
                     ci = False)

        # set up subplot title and legend
        ax[i, j].set_title(f'Protein type = {protein_type} | Dealer = {dealer}')
        ax[i, j].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.9,
                    bottom = 0.05,
                    left = 0.05,
                    wspace = 0.3,
                    hspace = 0.2)

# show the plot
plt.show()

enter image description here
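
To see what the pd.melt step does on its own, here is a minimal sketch on two made-up rows. The column names match the code above; the values and the long_df name are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-08', '2002-01-09']),
    'threshold': [85, 90],
    'price': [1.8, 2.1],
})
df['expected_price'] = df['price'] * 76 / df['threshold']

# Each original row becomes two rows: one for 'price', one for 'expected_price'.
# The id_vars columns are repeated, the old column name goes into 'price type'
# and the number itself goes into 'price value'.
long_df = pd.melt(frame=df,
                  id_vars=['date', 'threshold'],
                  value_vars=['price', 'expected_price'],
                  var_name='price type',
                  value_name='price value')
print(long_df)
```

Because both kinds of price now share one value column, seaborn can map 'price type' to the line style while 'price value' stays on the y-axis.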

