Immutable pandas DataFrame

29

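I am interested in using an immutable DataFrame as a reference table in a program, with read-only behaviour enforced once it has been constructed (in my case, during a class's def __init__() method).

I see that Index objects are frozen.

Is there a way to make a whole DataFrame immutable?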
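To illustrate the difference the question points at, here is a minimal sketch (the column name and values are made up for the example): element assignment on an Index is rejected, while the DataFrame itself still accepts in-place changes.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Element-wise assignment on an Index is rejected with a TypeError.
try:
    df.index[0] = 10
    index_is_frozen = False
except TypeError:
    index_is_frozen = True

# The DataFrame's data, by contrast, can still be changed in place.
df.loc[0, "x"] = 99
```

Note that the index as a whole can still be replaced (df.index = ...), so a frozen Index alone does not make the frame read-only.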


1
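The question is much like asking how to make a list immutable. It comes down to how the data structure is designed, and that probably cannot be changed. If you really want to enforce this so that a reference to the DataFrame cannot accidentally modify it, write your own getter and setter: store df in self._df and create a getter that always returns a copy of df rather than the original reference. You can still reach the original data through self._df and change it, but it gives you an extra layer of abstraction. - Joop
4 Answers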

21

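The StaticFrame package (of which I am an author) implements a Pandas-like interface and many common Pandas operations, while enforcing immutability in the underlying NumPy arrays and in immutable Series and Frame containers.

You can make an entire Pandas DataFrame immutable by converting it to a StaticFrame Frame with static_frame.Frame.from_pandas(df). You can then use it as a truly read-only table.

See the StaticFrame documentation for this method: https://static-frame.readthedocs.io/en/latest/api_detail/frame.html#frame-constructor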


2
Nice work! At what level is immutability enforced? I wonder where we can expect overhead, especially when using StaticFrames to feed numpy arrays to sklearn without copying. - dawid
5
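Thanks! Immutability is enforced at the NumPy array level: since NumPy arrays implement the buffer protocol (PEP 3118), we can set the boolean array.flags.writeable attribute to enforce immutability. I believe this adds no overhead compared with a mutable array. StaticFrame takes advantage of immutable arrays to make many Series and Frame operations fast and lightweight, because the underlying arrays do not need to be copied. This terminal animation demonstrates the benefit: https://raw.githubusercontent.com/InvestmentSystems/static-frame/master/doc/images/animate-low-memory-ops-verbose.svg - flexatone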
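The flag mentioned in the comment above can be tried on a plain NumPy array. This is only a sketch of the NumPy mechanism, not of StaticFrame's internals; the array contents are arbitrary.

```python
import numpy as np

arr = np.arange(4)
arr.flags.writeable = False  # mark the array's buffer as read-only

# Writing to a read-only array raises ValueError.
try:
    arr[0] = 99
    write_blocked = False
except ValueError:
    write_blocked = True

# Slicing gives a view on the same memory (no copy), and the view
# inherits the read-only flag.
view = arr[1:]
view_is_readonly = not view.flags.writeable
shares_memory = np.shares_memory(arr, view)
```

Because the data cannot change, handing out views instead of copies is safe, which is the saving the comment describes.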

13

Try something like this:

import pandas as pd


class Bla(object):
    def __init__(self):
        self._df = pd.DataFrame(index=[1,2,3])

    @property
    def df(self):
        return self._df.copy()

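This lets you get the DataFrame with b.df, but you cannot assign to it. In short, the DataFrame held by the class behaves like an "immutable DataFrame" in the sense that changes made by callers cannot reach the original. The returned object is still a mutable DataFrame, however, so in other respects it will not behave like an immutable one. For example, you cannot use it as a dictionary key.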
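A short usage sketch of this pattern follows. The class mirrors the one above; the column "x" and its values are invented for the demonstration.

```python
import pandas as pd


class Bla:
    def __init__(self):
        self._df = pd.DataFrame({"x": [1, 2, 3]})

    @property
    def df(self):
        # Every access hands out a fresh copy of the internal frame.
        return self._df.copy()


b = Bla()

# A property without a setter cannot be rebound from outside.
try:
    b.df = pd.DataFrame()
    rebinding_blocked = False
except AttributeError:
    rebinding_blocked = True

# Mutating the returned object only touches the copy.
copy_of_df = b.df
copy_of_df.loc[0, "x"] = 99
internal_unchanged = b.df.loc[0, "x"] == 1

# The protection is by convention only: the private attribute is
# still reachable and mutable.
b._df.loc[0, "x"] = -1
```

The cost is one full copy per access, so for large frames it may be worth caching the copy in the caller.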


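@sanguineturtle, it looks like you have already asked a very similar question and got a similar answer! - Joop
1
The other question was about read-only attributes on Python classes. I wanted this one to focus on immutable DataFrames as a way to protect data inside a class. That way I can keep an unchangeable "raw_data" DataFrame internally, plus a class attribute that cannot be updated from outside the class. Returning a copy is a great idea. - sanguineturtle
Copying is expensive, but this is quite clever. - lnNoam
My impression is that we could still change df with self.df.append() without assigning to it. - stucash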

5
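If you really want the DataFrame to behave as immutable, rather than using @Joop's copy solution (which I would recommend), you could build on the following structure.

Note that it is only a starting point.

It is basically a proxy data object that hides everything that would change state and allows itself to be hashed; all instances wrapping the same original data will have the same hash. There are probably modules that do the below in cooler ways, but I thought this could be useful as an educational example.

Some caveats:

  • Depending on how the string representation of the proxied object is constructed, two different proxied objects could get the same hash. However, the implementation is compatible with DataFrames, among other objects.

  • Changes to the original object will affect the proxy object.

  • Equality will lead to some nasty infinite recursion if the other object tosses the equality question back (this is why list has a special case).

  • The DataFrame proxy maker helper is only a start. The problem is that any method that changes the state of the original object must either be manually overridden in the helper or be masked out entirely through the extraFilter parameter when instantiating _ReadOnly. See DataFrameProxy.sort.

  • The proxy will not show up as derived from the proxied object's type.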
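The first caveat can be made concrete. The sketch below assumes the hash is taken from str(obj), as in the proxy that follows; repr_hash is a hypothetical helper written for this example. It also shows pd.util.hash_pandas_object as one content-based alternative, which is a swapped-in technique and not part of this answer's design.

```python
import hashlib

import pandas as pd


def repr_hash(obj):
    """Hypothetical helper: hash an object via its string representation."""
    return int(hashlib.md5(str(obj).encode("utf-8")).hexdigest(), 16)


a = pd.DataFrame({"x": range(100)})
b = a.copy()
b.loc[50, "x"] = 51  # differs from `a` only in a row hidden by truncation

# With a truncated display, both frames print the same head and tail rows,
# so a hash of the string representation cannot tell them apart.
with pd.option_context("display.max_rows", 10):
    same_repr_hash = repr_hash(a) == repr_hash(b)

frames_equal = a.equals(b)

# Hashing the content row by row does see the difference.
content_hashes_equal = pd.util.hash_pandas_object(a).equals(
    pd.util.hash_pandas_object(b)
)
```

Whether such collisions matter depends on how the proxies are used; as dictionary keys they would make two different tables look like the same key.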

The Generic Read-Only Proxy

This could be used on any object.

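import hashlib  # the Python 2 md5 module does not exist in Python 3
import warnings

class _ReadOnly(object):

    def __init__(self, obj, extraFilter=tuple()):

        # Write through __dict__ because __setattr__ is blocked below.
        self.__dict__['_obj'] = obj
        self.__dict__['_d'] = None
        self.__dict__['_extraFilter'] = extraFilter
        # The hash is derived from the string representation of the object.
        self.__dict__['_hash'] = int(
            hashlib.md5(str(obj).encode('utf-8')).hexdigest(), 16)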

    @staticmethod                                                                                       
    def _cloak(obj):                                                                                    
        try:                                                                                            
            hash(obj)                                                                                   
            return obj                                                                                  
        except TypeError:                                                                               
            return _ReadOnly(obj)                                                                       

    def __getitem__(self, value):                                                                       

        return _ReadOnly._cloak(self._obj[value])                                                       

    def __setitem__(self, key, value):                                                                  

        raise TypeError(                                                                                
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))                              

    def __delitem__(self, key):                                                                         

        raise TypeError(                                                                                
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))                              

    def __getattr__(self, value):                                                                       

        if value in self.__dir__():                                                                     
            return _ReadOnly._cloak(getattr(self._obj, value))                                          
        elif value in dir(self._obj):                                                                   
            raise AttributeError("{0} attribute {1} is cloaked".format(                                 
                type(self._obj), value))                                                                
        else:                                                                                           
            raise AttributeError("{0} has no {1}".format(                                               
                type(self._obj), value))                                                                

    def __setattr__(self, key, value):                                                                  

        raise TypeError(                                                                                
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))                              

    def __delattr__(self, key):                                                                         

        raise TypeError(                                                                                
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))                              

    def __dir__(self):                                                                                  

        if self._d is None:                                                                             
            self.__dict__['_d'] = [                                                                     
                i for i in dir(self._obj) if not i.startswith('set')                                    
                and i not in self._extraFilter]                                                         
        return self._d                                                                                  

    def __repr__(self):                                                                                 

        return self._obj.__repr__()                                                                     

    def __call__(self, *args, **kwargs):                                                                

        if hasattr(self._obj, "__call__"):                                                              
            return self._obj(*args, **kwargs)                                                           
        else:                                                                                           
            raise TypeError("{0} not callable".format(type(self._obj)))                                 

    def __hash__(self):                                                                                 

        return self._hash                                                                               

    def __eq__(self, other):                                                                            

        try:                                                                                            
            return hash(self) == hash(other)                                                            
        except TypeError:                                                                               
            if isinstance(other, list):                                                                 
                try:                                                                                    
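                    # Compare element by element; all(zip(...)) would be
                    # True for any non-empty pair of sequences.
                    return all(a == b for a, b in zip(self, other))
                except Exception:
                    return False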
            return other == self    

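The DataFrame Proxy

It should really be extended with more methods like sort, and with filtering of all other state-changing methods that are not of interest.

You can instantiate it either with a DataFrame instance as the only argument, or with the same arguments you would use to create a DataFrame.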

import pandas as pd

class DataFrameProxy(_ReadOnly):                                                                        

    EXTRA_FILTER = ('drop', 'drop_duplicates', 'dropna')                                                

    def __init__(self, *args, **kwargs):                                                                

        if (len(args) == 1 and                                                                          
                not len(kwargs) and                                                                     
                isinstance(args[0], pd.DataFrame)):

            super(DataFrameProxy, self).__init__(args[0],                                               
                DataFrameProxy.EXTRA_FILTER)                                                            

        else:                                                                                           

            super(DataFrameProxy, self).__init__(pd.DataFrame(*args, **kwargs),                         
                DataFrameProxy.EXTRA_FILTER)                                                            



    def sort(self, inplace=False, *args, **kwargs):                                                     

        if inplace:                                                                                     
            warnings.warn("Inplace sorting overridden")                                                 

        # DataFrame.sort was removed in pandas 0.20; sort_values replaces it.
        return self._obj.sort_values(*args, **kwargs)

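Finally:

However, although it is fun to build this contraption, why not simply have a DataFrame that you don't alter? If it is only exposed to you, it is better just to make sure yourself that you don't change it...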


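There are some really great and interesting ideas here. Regarding the last question - others will contribute to this class in a collaborative environment, and I want to make sure that other methods use the private "__raw_data" attribute and make their modifications through "_dataset". - sanguineturtle
1
The really hard part is that the original DataFrame can return views of the data, and those views may have methods that can reach and affect the state of the original DataFrame. So, depending on how much you trust your collaborators, it may be worth putting type-specific lookups in _ReadOnly._cloak (rather than at module level) so that a specific proxy is supplied whenever possible. Also, perhaps add a .to_mutable function so people can retrieve a copy of _obj. - deinonychusaur
2
Nice work. I have always wished for an immutable DataFrame, and this is a good start. I would add self._obj.values.flags.writeable=False to the mix, and maybe override the __setitem__ of whatever the locators return (e.g. df.iloc[0.0]=999). It may not give total control over mutability, but you already have a good foundation. - Phil Cooper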

5

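By studying the pandas implementation and making use of pandas capabilities, one can patch a DataFrame object to achieve this behaviour. I wrote a method called make_dataframe_immutable(dataframe) to solve this problem. Written for pandas==0.25.3.

EDIT: added solutions for pandas==1.0.5 and pandas==1.1.4.

Newer pandas versions may require adjustments - hopefully that will not be too hard with the tests below.

This solution is new and not thoroughly tested - any feedback would be greatly appreciated.

It would be nice if someone could post an inverse make_dataframe_mutable() method here.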
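On the inverse operation: the approach described here sets the replacement functions as attributes on the instance, where they shadow the class's own methods. The sketch below shows that mechanism, and one way to undo it, on a hypothetical Table class rather than on a real DataFrame. Table, ImmutableError, freeze and unfreeze are all names invented for this illustration. Whether removing the instance attributes is enough for a DataFrame depends on pandas internals, and it would not undo any class-level patch, so treat this as an illustration of the idea only.

```python
import functools


class ImmutableError(Exception):
    """Hypothetical exception type for this sketch."""


def _raise_immutable(*args, **kwargs):
    raise ImmutableError("object is frozen; work on a copy instead")


class Table:
    """Hypothetical stand-in for a mutable container."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)


_PATCHED = ("insert",)


def freeze(obj):
    for name in _PATCHED:
        original = getattr(obj, name)
        # The instance attribute shadows the method defined on the class.
        setattr(obj, name, functools.wraps(original)(_raise_immutable))


def unfreeze(obj):
    for name in _PATCHED:
        # Dropping the instance attribute exposes the class method again.
        obj.__dict__.pop(name, None)


t = Table()
t.insert(1)
freeze(t)
try:
    t.insert(2)
    frozen_blocked = False
except ImmutableError:
    frozen_blocked = True
unfreeze(t)
t.insert(3)
```

Other Table instances are never affected, since nothing on the class itself is changed.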

import functools

import numpy as np
import pandas as pd
from pandas.core.indexing import _NDFrameIndexer


def make_dataframe_immutable(df: pd.DataFrame):
    """
    Makes the given DataFrame immutable.
    I.e. after calling this method - one cannot modify the dataframe using pandas interface.

    Upon a trial to modify an immutable dataframe, an exception of type ImmutablePandas is raised.
    """
    if getattr(df, "_is_immutable", False):
        return
    df._is_immutable = True
    df._set_value = functools.wraps(df._set_value)(_raise_immutable_exception)
    df._setitem_slice = functools.wraps(df._setitem_slice)(_raise_immutable_exception)
    df._setitem_frame = functools.wraps(df._setitem_frame)(_raise_immutable_exception)
    df._setitem_array = functools.wraps(df._setitem_array)(_raise_immutable_exception)
    df._set_item = functools.wraps(df._set_item)(_raise_immutable_exception)
    df._data.delete = functools.wraps(df._data.delete)(_raise_immutable_exception)
    df.update = functools.wraps(df.update)(_raise_immutable_exception)
    df.insert = functools.wraps(df.insert)(_raise_immutable_exception)

    df._get_item_cache = _make_result_immutable(df._get_item_cache)

    # prevent modification through numpy arrays
    df._data.as_array = _make_numpy_result_readonly(df._data.as_array)

    _prevent_inplace_argument_in_function_calls(
        df,
        # This list was obtained by manual inspection +
        #  [attr for attr in dir(d) if hasattr(getattr(pd.DataFrame, attr, None), '__code__') and
        #  'inplace' in getattr(pd.DataFrame, attr).__code__.co_varnames]
        (
            'bfill',
            'clip',
            'clip_lower',
            'clip_upper',
            'drop',
            'drop_duplicates',
            'dropna',
            'eval',
            'ffill',
            'fillna',
            'interpolate',
            'mask',
            'query',
            'replace',
            'reset_index',
            'set_axis',
            'set_index',
            'sort_index',
            'sort_values',
            'where',
            "astype",
            "assign",
            "reindex",
            "rename",
        ),
    )


def make_series_immutable(series: pd.Series):
    """
    Makes the given Series immutable.
    I.e. after calling this method - one cannot modify the series using pandas interface.


    Upon a trial to modify an immutable dataframe, an exception of type ImmutablePandas is raised.
    """
    if getattr(series, "_is_immutable", False):
        return
    series._is_immutable = True
    series._set_with_engine = functools.wraps(series._set_with_engine)(_raise_immutable_exception)
    series._set_with = functools.wraps(series._set_with)(_raise_immutable_exception)
    series.set_value = functools.wraps(series.set_value)(_raise_immutable_exception)

    # prevent modification through numpy arrays
    series._data.external_values = _make_numpy_result_readonly(series._data.external_values)
    series._data.internal_values = _make_numpy_result_readonly(series._data.internal_values)
    series._data.get_values = _make_numpy_result_readonly(series._data.get_values)

    _prevent_inplace_argument_in_function_calls(
        series,
        # This list was obtained by manual inspection +
        #  [attr for attr in dir(d) if hasattr(getattr(pd.Series, attr, None), '__code__') and
        #  'inplace' in getattr(pd.Series, attr).__code__.co_varnames]
        (
            "astype",
            'bfill',
            'clip',
            'clip_lower',
            'clip_upper',
            'drop',
            'drop_duplicates',
            'dropna',
            'ffill',
            'fillna',
            'interpolate',
            'mask',
            'replace',
            'reset_index',
            'set_axis',
            'sort_index',
            'sort_values',
            "valid",
            'where',
            "_set_name",
        ),
    )


class ImmutablePandas(Exception):
    pass


def _raise_immutable_exception(*args, **kwargs):
    raise ImmutablePandas("Cannot modify immutable dataframe. Please use df.copy()")


def _get_df_or_series_from_args(args):
    if len(args) >= 2 and (isinstance(args[1], pd.DataFrame) or isinstance(args[1], pd.Series)):
        return args[1]


def _safe__init__(self, *args, **kwargs):
    super(_NDFrameIndexer, self).__init__(*args, **kwargs)
    df_or_series = _get_df_or_series_from_args(args)
    if df_or_series is not None:
        if getattr(df_or_series, "_is_immutable", False):
            self._get_setitem_indexer = functools.wraps(self._get_setitem_indexer)(_raise_immutable_exception)


# This line is the greatest foul in this module - as it performs a global patch.
# Notice that a reload of this module incurs overriding this variable again and again. It is supported.
_NDFrameIndexer.__init__ = functools.wraps(_NDFrameIndexer.__init__)(_safe__init__)


def _make_numpy_result_readonly(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        if isinstance(res, np.ndarray):
            res.flags.writeable = False
        return res

    return wrapper


def _make_result_immutable(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        if isinstance(res, pd.Series):
            make_series_immutable(res)
        return res

    return wrapper


def _prevent_inplace_operation(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # TODO: here we assume that in-place is not given as a positional.
        #  remove this assumption, either by hard-coding the position for each method or by parsing the
        #  function signature.
        if kwargs.get("inplace", False):
            _raise_immutable_exception()
        return func(*args, **kwargs)

    return wrapper


def _prevent_inplace_argument_in_function_calls(obj, attributes):
    for attr in attributes:
        member = getattr(obj, attr)
        setattr(obj, attr, _prevent_inplace_operation(member))
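The TODO in _prevent_inplace_operation notes that inplace is assumed to be passed by keyword. One possible way to drop that assumption, sketched here with inspect.signature, is to bind the call's arguments to the wrapped function's parameters and look inplace up by name. ImmutableError, prevent_inplace and sort_values below are stand-ins written for this example, not pandas code.

```python
import functools
import inspect


class ImmutableError(Exception):
    """Hypothetical exception type for this sketch."""


def prevent_inplace(func):
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # bind() maps positional and keyword arguments to parameter names,
        # so `inplace` is found however the caller passed it.
        bound = sig.bind(*args, **kwargs)
        if bound.arguments.get("inplace", False) or kwargs.get("inplace", False):
            raise ImmutableError("in-place operation on an immutable object")
        return func(*args, **kwargs)

    return wrapper


def sort_values(by, ascending=True, inplace=False):
    """Stand-in with a pandas-like signature."""
    return (by, ascending, inplace)


guarded = prevent_inplace(sort_values)

result = guarded("x")

blocked = []
for call in (lambda: guarded("x", True, True),
             lambda: guarded("x", inplace=True)):
    try:
        call()
        blocked.append(False)
    except ImmutableError:
        blocked.append(True)
```

For a bound method, inspect.signature already leaves out self, so the same wrapper could be applied to what getattr(obj, attr) returns.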


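pytest unit tests

import importlib
import warnings

import pandas as pd
import pytest

import immutable_pandas
from immutable_pandas import ImmutablePandas
from immutable_pandas import make_dataframe_immutable
from immutable_pandas import make_series_immutable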



def create_immutable_dataframe() -> pd.DataFrame:
    # Cannot be used as a fixture because pytest copies objects transparently, which makes the tests flaky
    immutable_dataframe = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]})
    make_dataframe_immutable(immutable_dataframe)
    return immutable_dataframe


def test_immutable_dataframe_cannot_change_with_direct_access():
    immutable_dataframe = create_immutable_dataframe()
    immutable_dataframe2 = immutable_dataframe.query("x == 2")
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        immutable_dataframe2["moshe"] = 123
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.x = 2
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["moshe"] = 56
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.insert(0, "z", [1, 2, 3, 4])


def test_immutable_dataframe_cannot_change_with_inplace_operations():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.eval("y=x+1", inplace=True)
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.assign(y=2, inplace=True)


def test_immutable_dataframe_cannot_change_with_loc():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.loc[2] = 1
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.iloc[1] = 4


def test_immutable_dataframe_cannot_change_with_columns_access():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["x"][2] = 123
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["x"].loc[2] = 123


def test_immutable_dataframe_cannot_del_column():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        del immutable_dataframe["x"]


def test_immutable_dataframe_cannot_be_modified_through_values():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ValueError, match="read-only"):
        immutable_dataframe.values[0, 0] = 1
    with pytest.raises(ValueError, match="read-only"):
        immutable_dataframe.as_matrix()[0, 0] = 1


def test_immutable_series_cannot_change_with_loc():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ImmutablePandas):
        series.loc[0] = 1
    with pytest.raises(ImmutablePandas):
        series.iloc[0] = 1


def test_immutable_series_cannot_change_with_inplace_operations():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ImmutablePandas):
        series.sort_index(inplace=True)
    with pytest.raises(ImmutablePandas):
        series.sort_values(inplace=True)
    with pytest.raises(ImmutablePandas):
        series.astype(int, inplace=True)


def test_series_cannot_be_modified_through_values():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ValueError, match="read-only"):
        series.get_values()[0] = 1234
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ValueError, match="read-only"):
        series.values[0] = 1234


def test_reloading_module_immutable_pandas_does_not_break_immutability():
    # We need to test the effects of reloading the module, because we modify the global variable
    #       _NDFrameIndexer.__init__ upon every reload of the module.
    df = create_immutable_dataframe()
    df2 = df.copy()
    immutable_pandas2 = importlib.reload(immutable_pandas)
    with pytest.raises(immutable_pandas2.ImmutablePandas):
        df.astype(int, inplace=True)
    df2.astype(int, inplace=True)
    immutable_pandas2.make_dataframe_immutable(df2)
    with pytest.raises(immutable_pandas2.ImmutablePandas):
        df2.astype(int, inplace=True)


EDIT: here is an update tested with pandas==1.0.5 and pandas==1.1.4.

"""
Two methods to make pandas objects immutable.
    make_dataframe_immutable()
    make_series_immutable()
"""
import functools

import numpy as np
import pandas as pd
from pandas.core.indexing import _iLocIndexer
from pandas.core.indexing import _LocIndexer
from pandas.core.indexing import IndexingMixin


def make_dataframe_immutable(df: pd.DataFrame):
    """
    Makes the given DataFrame immutable.
    I.e. after calling this method - one cannot modify the dataframe using pandas interface.

    Upon a trial to modify an immutable dataframe, an exception of type ImmutablePandas is raised.
    """
    if getattr(df, "_is_immutable", False):
        return
    df._is_immutable = True
    df._set_value = functools.wraps(df._set_value)(_raise_immutable_exception)
    df._setitem_slice = functools.wraps(df._setitem_slice)(_raise_immutable_exception)
    df._setitem_frame = functools.wraps(df._setitem_frame)(_raise_immutable_exception)
    df._setitem_array = functools.wraps(df._setitem_array)(_raise_immutable_exception)
    df._set_item = functools.wraps(df._set_item)(_raise_immutable_exception)
    if hasattr(df, "_mgr"):
        # pandas==1.1.4
        df._mgr.idelete = functools.wraps(df._mgr.idelete)(_raise_immutable_exception)
    elif hasattr(df, "_data"):
        # pandas==1.0.5
        df._data.delete = functools.wraps(df._data.delete)(_raise_immutable_exception)
    df.update = functools.wraps(df.update)(_raise_immutable_exception)
    df.insert = functools.wraps(df.insert)(_raise_immutable_exception)

    df._get_item_cache = _make_result_immutable(df._get_item_cache)

    # prevent modification through numpy arrays
    df._data.as_array = _make_numpy_result_readonly(df._data.as_array)

    _prevent_inplace_argument_in_function_calls(
        df,
        # This list was obtained by manual inspection +
        #  [attr for attr in dir(d) if hasattr(getattr(pd.DataFrame, attr, None), '__code__') and
        #  'inplace' in getattr(pd.DataFrame, attr).__code__.co_varnames]
        (
            "bfill",
            "clip",
            "drop",
            "drop_duplicates",
            "dropna",
            "eval",
            "ffill",
            "fillna",
            "interpolate",
            "mask",
            "query",
            "replace",
            "reset_index",
            "set_axis",
            "set_index",
            "sort_index",
            "sort_values",
            "where",
            "astype",
            "assign",
            "reindex",
            "rename",
        ),
    )


def make_series_immutable(series: pd.Series):
    """
    Makes the given Series immutable.
    I.e. after calling this method - one cannot modify the series using pandas interface.


    Upon a trial to modify an immutable dataframe, an exception of type ImmutablePandas is raised.
    """
    if getattr(series, "_is_immutable", False):
        return
    series._is_immutable = True
    series._set_with_engine = functools.wraps(series._set_with_engine)(_raise_immutable_exception)
    series._set_with = functools.wraps(series._set_with)(_raise_immutable_exception)

    # prevent modification through numpy arrays
    series._data.external_values = _make_numpy_result_readonly(series._data.external_values)
    series._data.internal_values = _make_numpy_result_readonly(series._data.internal_values)

    _prevent_inplace_argument_in_function_calls(
        series,
        # This list was obtained by manual inspection +
        #  [attr for attr in dir(d) if hasattr(getattr(pd.Series, attr, None), '__code__') and
        #  'inplace' in getattr(pd.Series, attr).__code__.co_varnames]
        (
            "astype",
            "bfill",
            "clip",
            "drop",
            "drop_duplicates",
            "dropna",
            "ffill",
            "fillna",
            "interpolate",
            "mask",
            "replace",
            "reset_index",
            "set_axis",
            "sort_index",
            "sort_values",
            "where",
            "_set_name",
        ),
    )


class ImmutablePandas(Exception):
    pass


def _raise_immutable_exception(*args, **kwargs):
    raise ImmutablePandas("Cannot modify immutable dataframe. Please use df.copy()")


def _get_df_or_series_from_args(args):
    if len(args) >= 2 and (isinstance(args[1], pd.DataFrame) or isinstance(args[1], pd.Series)):
        return args[1]


def _protect_indexer(loc_func):
    @functools.wraps(loc_func)
    def wrapper(*args, **kwargs):
        res = loc_func(*args, **kwargs)
        return res

    return wrapper


def _safe__init__(cls, self, *args, **kwargs):
    super(cls, self).__init__(*args, **kwargs)
    df_or_series = _get_df_or_series_from_args(args)
    if df_or_series is not None:
        if getattr(df_or_series, "_is_immutable", False):
            self._get_setitem_indexer = functools.wraps(self._get_setitem_indexer)(_raise_immutable_exception)


@functools.wraps(IndexingMixin.loc)
def _safe_loc(self):
    loc = _LocIndexer("loc", self)
    if getattr(self, "_is_immutable", False):
        # Edit also loc._setitem_with_indexer
        loc._get_setitem_indexer = functools.wraps(loc._get_setitem_indexer)(_raise_immutable_exception)
    return loc


@functools.wraps(IndexingMixin.iloc)
def _safe_iloc(self):
    iloc = _iLocIndexer("iloc", self)
    if getattr(self, "_is_immutable", False):
        # Edit also iloc._setitem_with_indexer
        iloc._get_setitem_indexer = functools.wraps(iloc._get_setitem_indexer)(_raise_immutable_exception)
    return iloc


# wraps
pd.DataFrame.loc = property(_safe_loc)
pd.Series.loc = property(_safe_loc)
pd.DataFrame.iloc = property(_safe_iloc)
pd.Series.iloc = property(_safe_iloc)


def _make_numpy_result_readonly(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        if isinstance(res, np.ndarray):
            res.flags.writeable = False
        return res

    return wrapper


def _make_result_immutable(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        if isinstance(res, pd.Series):
            make_series_immutable(res)
        return res

    return wrapper


def _prevent_inplace_operation(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # TODO: here we assume that in-place is not given as a positional.
        #  remove this assumption, either by hard-coding the position for each method or by parsing the
        #  function signature.
        if kwargs.get("inplace", False):
            _raise_immutable_exception()
        return func(*args, **kwargs)

    return wrapper


def _prevent_inplace_argument_in_function_calls(obj, attributes):
    for attr in attributes:
        member = getattr(obj, attr)
        setattr(obj, attr, _prevent_inplace_operation(member))


And the pytest file

import importlib
import warnings

import pandas as pd
import pytest

import immutable_pandas
from immutable_pandas import ImmutablePandas
from immutable_pandas import make_dataframe_immutable
from immutable_pandas import make_series_immutable


def create_immutable_dataframe() -> pd.DataFrame:
    # Cannot be used as a fixture because pytest copies objects transparently, which makes the tests flaky
    immutable_dataframe = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]})
    make_dataframe_immutable(immutable_dataframe)
    return immutable_dataframe


def test_immutable_dataframe_cannot_change_with_direct_access():
    immutable_dataframe = create_immutable_dataframe()
    immutable_dataframe2 = immutable_dataframe.query("x == 2")
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        immutable_dataframe2["moshe"] = 123
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.x = 2
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["moshe"] = 56
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.insert(0, "z", [1, 2, 3, 4])


def test_immutable_dataframe_cannot_change_with_inplace_operations():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.eval("y=x+1", inplace=True)
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.assign(y=2, inplace=True)


def test_immutable_dataframe_cannot_change_with_loc():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.loc[2] = 1
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.iloc[1] = 4


def test_immutable_dataframe_cannot_change_with_columns_access():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["x"][2] = 123
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["x"].loc[2] = 123


def test_immutable_dataframe_cannot_del_column():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        del immutable_dataframe["x"]


def test_immutable_dataframe_cannot_be_modified_through_values():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ValueError, match="read-only"):
        immutable_dataframe.values[0, 0] = 1
    # with pytest.raises(ValueError, match="read-only"):
    #     immutable_dataframe.as_matrix()[0, 0] = 1


def test_immutable_series_cannot_change_with_loc():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ImmutablePandas):
        series.loc[0] = 1
    with pytest.raises(ImmutablePandas):
        series.iloc[0] = 1


def test_immutable_series_cannot_change_with_inplace_operations():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ImmutablePandas):
        series.sort_index(inplace=True)
    with pytest.raises(ImmutablePandas):
        series.sort_values(inplace=True)
    with pytest.raises(ImmutablePandas):
        series.astype(int, inplace=True)


def test_series_cannot_be_modified_through_values():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ValueError, match="read-only"):
        series.values[0] = 1234


def test_reloading_module_immutable_pandas_does_not_break_immutability():
    # We need to test the effects of reloading the module, because we modify the global variable
    #       pd.DataFrame.loc, pd.DataFrame.iloc,
    #       pd.Series.loc, pd.Series.iloc
    #       upon every reload of the module.
    df = create_immutable_dataframe()
    df2 = df.copy()
    immutable_pandas2 = importlib.reload(immutable_pandas)
    with pytest.raises(immutable_pandas2.ImmutablePandas):
        df.astype(int, inplace=True)
    immutable_pandas2.make_dataframe_immutable(df2)
    with pytest.raises(immutable_pandas2.ImmutablePandas):
        df2.astype(int, inplace=True)


def test_at_and_iat_crash():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(immutable_pandas.ImmutablePandas):
        immutable_dataframe.iat[0, 0] = 1
    with pytest.raises(immutable_pandas.ImmutablePandas):
        immutable_dataframe.at[0, "x"] = 1


