Plotly: How to plot a regression line using Plotly and Plotly Express?

4

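I have a dataframe, df, with the columns pm1 and pm25. I want to use Plotly to show a graph of how correlated those two signals are. So far I have managed to show the scatter plot, but I cannot draw the fit line for the correlation between the signals. This is what I have tried so far: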

denominator=df.pm1**2-df.pm1.mean()*df.pm1.sum()
print('denominator',denominator)
m=(df.pm1.dot(df.pm25)-df.pm25.mean()*df.pm1.sum())/denominator
b=(df.pm25.mean()*df.pm1.dot(df.pm1)-df.pm1.mean()*df.pm1.dot(df.pm25))/denominator
y_pred=m*df.pm1+b


lineOfBestFit = go.Scattergl(
    x=df.pm1,
    y=y_pred,
    name='Line of best fit',
    line=dict(
        color='red',
    )
)

data = [dataPoints, lineOfBestFit]
figure = go.Figure(data=data)

figure.show()

Plot:

enter image description here

How can I get the line of best fit drawn correctly?

2 Answers

24

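Update 1:

Now that Plotly Express handles both long-format and wide-format data with ease (yours is the latter), the only thing you need in order to plot a regression line is: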

fig = px.scatter(df, x='X', y='Y', trendline="ols")

The complete code snippet for wide data is further down in this answer.


If you would like the regression line to stand out, you can specify trendline_color_override in:
fig = px.scatter([...], trendline_color_override='red')

Or set the line color after building the figure:

fig.data[1].line.color = 'red'


You can access regression parameters such as alpha and beta through:
model = px.get_trendline_results(fig)
alpha = model.iloc[0]["px_fit_results"].params[0]
beta = model.iloc[0]["px_fit_results"].params[1]

And you can even request a non-linear fit (LOWESS smoothing) through:
fig = px.scatter(df, x='X', y='Y', trendline="lowess")


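And what about long format? That is where Plotly Express shows some of its real strength. Taking the built-in dataset px.data.gapminder as an example, you can get a separate trend line for each continent by specifying color="continent":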

enter image description here

Complete snippet for long format

import plotly.express as px

df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", color="continent", trendline="lowess")
fig.show()

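If you would like more flexibility with regard to model choice and output, you can always fall back on my original answer to this post below. But first, here is the complete snippet for the example at the start of this updated answer:

Complete code snippet for wide data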

import plotly.graph_objects as go
import plotly.express as px
import statsmodels.api as sm
import pandas as pd
import numpy as np
import datetime

# data
np.random.seed(123)
numdays=20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})

# figure with regression
# fig = px.scatter(df, x='X', y='Y', trendline="ols")
fig = px.scatter(df, x='X', y='Y', trendline="lowess")

# make the regression line stand out
fig.data[1].line.color = 'red'

# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

fig.show()

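Original answer:

For regression analysis, I like to use statsmodels.api or sklearn.linear_model. I also like to organize both the data and the regression results in a pandas dataframe. Here is one way to do what you are looking for in a clean and organized way:

Plot using sklearn or statsmodels: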

enter image description here

Code using sklearn:

from sklearn.linear_model import LinearRegression
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import datetime

# data
np.random.seed(123)
numdays=20

X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})

# regression (scikit-learn expects a 2-D feature matrix, hence df[['X']])
reg = LinearRegression().fit(df[['X']], df['Y'])
df['bestfit'] = reg.predict(df[['X']])

# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))

# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

fig.show()
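
If you also need the numbers behind the line, the fitted scikit-learn model exposes them directly. This is a small sketch on the same synthetic data as above; the variable names `slope`, `intercept` and `r_squared` are my own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same synthetic data as in the snippet above
np.random.seed(123)
numdays = 20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum() + 100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum() + 100).tolist()
df = pd.DataFrame({"X": X, "Y": Y})

# scikit-learn expects a 2-D feature matrix, hence the double brackets
reg = LinearRegression().fit(df[["X"]], df["Y"])

slope = reg.coef_[0]                       # one coefficient per feature
intercept = reg.intercept_
r_squared = reg.score(df[["X"]], df["Y"])  # coefficient of determination

print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r_squared:.4f}")
```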

Code using statsmodels:

import plotly.graph_objects as go
import statsmodels.api as sm
import pandas as pd
import numpy as np
import datetime

# data
np.random.seed(123)
numdays=20

X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()

df = pd.DataFrame({'X': X, 'Y':Y})

# regression
df['bestfit'] = sm.OLS(df['Y'],sm.add_constant(df['X'])).fit().fittedvalues

# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))


# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

fig.show()
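
Coming back to the hand-written computation in the question: the formulas for m and b are the standard closed-form least-squares expressions, but the denominator uses df.pm1**2, which is an element-wise Series rather than the scalar sum of squares. The sketch below assumes that this is what went wrong. It uses synthetic data in place of the asker's pm1 and pm25 columns and cross-checks the result against np.polyfit.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (values are made up)
rng = np.random.default_rng(0)
pm1 = rng.uniform(0, 50, size=100)
pm25 = 1.5 * pm1 + 2.0 + rng.normal(0, 1, size=100)
df = pd.DataFrame({"pm1": pm1, "pm25": pm25})

x, y = df.pm1, df.pm25

# Sum of squares via dot product, so the denominator is a scalar
denominator = x.dot(x) - x.mean() * x.sum()
m = (x.dot(y) - y.mean() * x.sum()) / denominator
b = (y.mean() * x.dot(x) - x.mean() * x.dot(y)) / denominator
y_pred = m * x + b

# Cross-check against NumPy's degree-1 polynomial fit
slope, intercept = np.polyfit(x, y, deg=1)
print(f"m={m:.4f} (polyfit {slope:.4f}), b={b:.4f} (polyfit {intercept:.4f})")
```

With a scalar m and b, y_pred describes a single straight line and can be passed to go.Scatter with mode='lines' as in the snippets above.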

4

Plotly also comes with a native wrapper around statsmodels for plotting (non-)linear trend lines:

Quoting from their documentation at https://plotly.com/python/linear-fits/


import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()



1
Wow, this is a very intuitive and fast way to do what was asked in the question. - Lukas Fink
