pickle/joblib AttributeError: module '__main__' has no attribute 'thing' in pytest

Question


6

I have built a custom sklearn pipeline, as follows:

pipeline = make_pipeline(
    SelectColumnsTransfomer(features_to_use),
    ToDummiesTransformer('feature_0', prefix='feat_0', drop_first=True,  dtype=bool), # Dummify customer_type
    ToDummiesTransformer('feature_1', prefix='feat_1'), # Dummify the feature
    ToDummiesTransformer('feature_2', prefix='feat_2'), # Dummify 
    ToDummiesTransformer('feature_3', prefix='feat_3'), # Dummify
)
pipeline.fit(df)

The classes SelectColumnsTransfomer and ToDummiesTransformer are custom sklearn steps that implement BaseEstimator and TransformerMixin. To serialize this object I use

from sklearn.externals import joblib
joblib.dump(pipeline, 'data_pipeline.joblib')

but when I deserialize it with

pipeline = joblib.load('data_pipeline.joblib')

I get the error `AttributeError: module '__main__' has no attribute 'SelectColumnsTransfomer'`.

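I have read other similar questions and followed the instructions in this blog post here, but I could not solve the problem. I am copying and pasting the classes and importing them in the code. If I create a simplified version of this exercise, the whole thing works. The problem occurs when I run some tests with pytest: when I run pytest, it does not seem to see my custom classes. In fact, there is another part of the error: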

self = <sklearn.externals.joblib.numpy_pickle.NumpyUnpickler object at 0x7f821508a588>, module = '__main__', name = 'SelectColumnsTransfomer'

This suggests to me that the NumpyUnpickler cannot see SelectColumnsTransfomer, even though it is imported in the test.
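For readers hitting the same message: pickle (and joblib, which builds on it) stores a class by reference, that is, as a module name plus a class name, and looks that name up again at load time. The snippet below is only a minimal sketch of that behavior using the standard library. The module name `fake_main` and the toy class `SelectColumns` are hypothetical stand-ins for `__main__` and the custom transformer, not code from the question.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script that originally dumped the pipeline.
fake_main = types.ModuleType("fake_main")
sys.modules["fake_main"] = fake_main


class SelectColumns:
    """Toy stand-in for the custom transformer in the question."""

    def __init__(self, columns):
        self.columns = columns


# Pretend the class was defined at the top level of fake_main,
# so pickle records the reference "fake_main.SelectColumns".
SelectColumns.__module__ = "fake_main"
SelectColumns.__qualname__ = "SelectColumns"
fake_main.SelectColumns = SelectColumns

# The payload holds the module and class names, not the class body.
payload = pickle.dumps(SelectColumns(["a", "b"]))

# Loading works while the recorded module still exposes the class.
restored = pickle.loads(payload)

# Mimic a process whose module of that name lacks the class
# (for example, pytest's own __main__).
del fake_main.SelectColumns
load_failed = False
try:
    pickle.loads(payload)
except AttributeError:
    load_failed = True

# Clean up the fake module registration.
sys.modules.pop("fake_main", None)
```

Importing the class into the test module does not help in this situation, because the unpickler looks it up on the module recorded in the file, which the traceback above reports as `__main__`.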

My test code:

import pytest
from sklearn.externals import joblib
from app.pipeline import * # the pipeline objects 
                          # SelectColumnsTransfomer and ToDummiesTransformer 
                          # are here!


@pytest.fixture(scope="module")
def clf():
    pipeline = joblib.load("persistence/data_pipeline.joblib")
    return pipeline  # return the loaded pipeline, not the fixture function

def test_fake(clf):
    assert True

- DarioB

Possible duplicate of Joblib.load __main__ AttributeError. - BenP

2 Answers

0

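I ran into a similar problem with sklearn and a complex pipeline.

I used cloudpickle 2.0.0 / py3.10 (instead of pickle or joblib) to dump the model, and then loaded it with joblib without any error.

Hope this helps.

Note: the model was dumped from a Jupyter notebook and loaded in a Python script.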

- Julien


- Madhav Malhotra · Accepted Answer

I got the same error message when I tried to save a PyTorch class like this:

import torch.nn as nn

class custom(nn.Module):
    def __init__(self):
        super(custom, self).__init__()
        print("Class loaded")

model = custom()

Then I exported the model with Joblib like this:

from joblib import dump
dump(model, 'some_filepath.jobjib')

The problem was that I ran the code above in a Kaggle kernel, and then tried to load the dumped file locally with this script:

from joblib import load
model = load('some_filepath.jobjib')  # load() takes the file path as its first argument

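I solved this by running all of these snippets on my local machine, instead of creating the class and dumping it on Kaggle and then loading it on my local machine. I wanted to add this here because @DarioB's comment on the answer confused me: it mentions a "function" that does not apply in my simpler case.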
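A related approach that may avoid the problem across environments is to keep the class in an importable module that exists on both the dumping and the loading side, rather than defining it inline in a notebook or script. The sketch below illustrates the idea with the standard library's pickle only. The module name `shared_models` and the class `Custom` are hypothetical, and the temporary directory merely stands in for a package installed in both places.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical shared module holding the class definition.
MODULE_SOURCE = '''
class Custom:
    def __init__(self, name):
        self.name = name
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "shared_models.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Dump side: import the class from the module instead of defining it inline,
        # so the pickle records "shared_models.Custom" rather than "__main__.Custom".
        shared_models = importlib.import_module("shared_models")
        payload = pickle.dumps(shared_models.Custom("demo"))

        # Load side: forget the module to mimic a fresh interpreter.
        # pickle re-imports shared_models by name and finds the class there.
        del sys.modules["shared_models"]
        restored = pickle.loads(payload)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("shared_models", None)

restored_module = type(restored).__module__
restored_name = restored.name
```

The load succeeds here only because a module with the same name and class is importable at load time; the file itself still does not contain the class definition.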
