Recursively walk a Python object with dir() to find values of a certain type or with a certain value

10
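I have a complex Python data structure (if it matters, it's a large music21 Score object) that won't pickle because of a weakref somewhere deep inside the object structure. I've debugged problems like this before with stack traces and the Python debugger, but it's always a big pain. Is there a tool that runs dir() recursively on all attributes of an object, finds objects hidden in lists, tuples, dicts, and so on, and returns those that match a certain criterion (given as a lambda function or the like)? A big problem is recursive references, so some sort of memo (like the one copy.deepcopy uses) is needed. I tried: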
import weakref
def findWeakRef(streamObj, memo=None):
    weakRefList = []
    if memo is None:
        memo = {}
    for x in dir(streamObj):
        xValue = getattr(streamObj, x)
        if id(xValue) in memo:
            continue
        else:
            memo[id(xValue)] = True
        if type(xValue) is weakref.ref:
            weakRefList.append((x, xValue, streamObj))  # append() takes one item, so pack a tuple
        if hasattr(xValue, "__iter__"):
            for i in xValue:
                if id(i) in memo:
                    pass
                else:
                    memo[id(i)] = True
                    weakRefList.extend(findWeakRef(i, memo))  # memo belongs to the recursive call
        else:
            weakRefList.extend(findWeakRef(xValue, memo))
    return weakRefList

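I can probably keep patching this (for instance, __iter__ isn't what I'd want for dicts), but before I put more time into it, I'd like to know whether there is an easier answer. It could be a pretty useful general tool.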


1
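I haven't seen a ready-made solution. Using gc.get_referents instead of dir might get you a bit further. - Pankrat
It does cut down on the cruft (__eq__, etc.), but at the cost that the format of what is returned changes with the type of object, so it may be a faster solution but not a simpler one. It doesn't seem to support recursion either. Thanks! - Michael Scott Asato Cuthbert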
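
For what it's worth, recursion can be layered on top of gc.get_referents with an explicit stack and an id-based memo. The following is only a rough Python 3 sketch, not a tested tool: `find_weakrefs` and the `skip` tuple are names made up here, and skipping classes, modules, and functions is an assumption meant to keep the walk from wandering into the whole interpreter.

```python
import gc
import types
import weakref


def find_weakrefs(root):
    """Return every weakref.ref reachable from root via gc.get_referents."""
    # Assumption: classes, modules and functions are not worth descending into.
    skip = (type, types.ModuleType, types.FunctionType, types.BuiltinFunctionType)
    seen = {id(root)}          # memo of object ids, guards against cycles
    stack = [root]
    found = []
    while stack:
        obj = stack.pop()
        if isinstance(obj, weakref.ref):
            found.append(obj)  # report it, but do not follow the weak reference
            continue
        for child in gc.get_referents(obj):
            if id(child) in seen or isinstance(child, skip):
                continue
            seen.add(id(child))
            stack.append(child)
    return found


class Target:
    pass


class Node:
    def __init__(self):
        self.items = []


target = Target()
node = Node()
node.items.append({"ref": weakref.ref(target)})
node.loop = node               # recursive reference
found = find_weakrefs(node)
```

Note that this reports the weakref objects themselves, not the path that leads to them, which is the trade-off mentioned in the comment above.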
1
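Have you considered subclassing pickle.Pickler? The source is included in .../Lib/pickle.py. Doing so should let you reuse a lot of code and trap PickleError to do what you describe, while piggybacking on the well-established pickling protocol Python already has in place. - martineau
Upvoted. I need this script for other things too, but that seems like a good approach. - Michael Scott Asato Cuthbert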
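
One way the Pickler suggestion might look in Python 3 is sketched below. It relies on the documented `persistent_id` hook, which the pickler calls for each object it is about to save; returning None means "pickle it normally". `TracingPickler` and `find_unpicklable` are hypothetical names, and treating the last object seen as the culprit is an assumption that holds when the failure happens while saving that object.

```python
import io
import pickle
import weakref


class TracingPickler(pickle.Pickler):
    """Remember the most recent object handed to the pickler."""

    def __init__(self, file, protocol=None):
        super().__init__(file, protocol)
        self.last_seen = None

    def persistent_id(self, obj):
        self.last_seen = obj
        return None  # None: no persistent id, pickle the object as usual


def find_unpicklable(obj):
    """Return (culprit, exception), or (None, None) if obj pickles cleanly."""
    pickler = TracingPickler(io.BytesIO())
    try:
        pickler.dump(obj)
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        return pickler.last_seen, exc
    return None, None


class Target:
    pass


target = Target()
data = {"title": "score", "parts": [1, 2, {"site": weakref.ref(target)}]}
culprit, error = find_unpicklable(data)
```

This identifies the offending object but not where it lives in the structure; recording the sequence of objects seen instead of only the last one would give a rough trail.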
2 Answers

5
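Here is a simpler, if somewhat naive, solution: a depth-first search down the attribute tree. If a value is a primitive it stops; otherwise it goes deeper. This gives you the attribute path and the value at each leaf.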
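def recursive_dir(obj, path):
    # Descend only into objects that carry their own attribute dict;
    # primitives, containers and anything without __dict__ count as leaves.
    # There is no cycle detection, so recursive references will hit the recursion limit.
    if hasattr(obj, "__dict__") and not isinstance(obj, (str, float, int, list, dict, set)):
        for attr, val in vars(obj).items():
            recursive_dir(val, path + [attr])
    else:
        print(path, "--->", obj)
        print("")

recursive_dir(x, [])  # x is the object to inspect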

1
Looking inside iterables (such as lists) also matters for finding those pesky weakrefs, but the simplification is great. - Michael Scott Asato Cuthbert
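
A possible way to combine the two ideas (attribute descent plus a look inside lists, tuples, and dicts, with a memo for recursive references) is sketched below for Python 3. `find_paths` and its parameters are hypothetical names, and the choice of which types count as atoms is an assumption; sets and other containers are deliberately left out.

```python
import weakref

ATOMS = (str, bytes, int, float, bool, type(None))


def find_paths(obj, predicate, path="obj", seen=None):
    """Yield (path, value) for each reachable value that satisfies predicate."""
    if seen is None:
        seen = set()
    is_atom = isinstance(obj, ATOMS)
    if not is_atom:
        if id(obj) in seen:
            return                      # already visited: breaks cycles
        seen.add(id(obj))
    if predicate(obj):
        yield path, obj
    if is_atom:
        return                          # nothing to descend into
    if isinstance(obj, dict):
        for key, val in obj.items():
            yield from find_paths(val, predicate, f"{path}[{key!r}]", seen)
    elif isinstance(obj, (list, tuple)):
        for i, val in enumerate(obj):
            yield from find_paths(val, predicate, f"{path}[{i}]", seen)
    elif hasattr(obj, "__dict__"):
        for name, val in vars(obj).items():
            yield from find_paths(val, predicate, f"{path}.{name}", seen)


class Target:
    pass


class Part:
    def __init__(self, owner):
        self.notes = [1, 2]
        self.sites = {"owner": weakref.ref(owner)}


target = Target()
score = [Part(target), 5]
score.append(score)                     # recursive reference
hits = list(find_paths(score, lambda v: isinstance(v, weakref.ref), "score"))
```

Reading attributes through `vars()` means properties and slots are not visited, which keeps the walk cheap but can miss values stored outside `__dict__`.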

4

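This seems to be the start of an answer. I had to backport some items from Python 3.2's inspect.getattr_static to get it to work, so that it doesn't call properties that just keep generating new objects. Here's the code I've come up with: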

#-------------------------------------------------------------------------------
# Name:         treeYield.py
# Purpose:      traverse a complex datastructure and yield elements
#               that fit a given criteria
#
# Authors:      Michael Scott Cuthbert
#
# Copyright:    Copyright © 2012 Michael Scott Cuthbert
# License:      CC-BY
#-------------------------------------------------------------------------------
import types

class TreeYielder(object):
    def __init__(self, yieldValue = None):
        '''
        `yieldValue` should be a lambda function that
        returns True/False or a function/method call that
        will be passed the value of a current attribute
        '''        
        self.currentStack = []
        self.yieldValue = yieldValue
        self.stackVals = []
        t = types
        self.nonIterables = [t.IntType, t.StringType, t.UnicodeType, t.LongType,
                             t.FloatType, t.NoneType, t.BooleanType]

    def run(self, obj, memo = None):
        '''
        traverse all attributes of an object looking
        for subObjects that meet a certain criteria.
        yield them.

        `memo` is a dictionary to keep track of objects
        that have already been seen

        The original object is added to the memo and
        also checked for yieldValue
        '''
        if memo is None:
            memo = {}
        self.memo = memo
        if id(obj) in self.memo:
            self.memo[id(obj)] += 1
            return
        else:
            self.memo[id(obj)] = 1

        if self.yieldValue(obj) is True:
            yield obj


        ### now check for sub values...
        self.currentStack.append(obj)

        tObj = type(obj)
        if tObj in self.nonIterables:
            pass
        elif tObj == types.DictType:
            for keyX in obj:
                dictTuple = ('dict', keyX)
                self.stackVals.append(dictTuple)
                x = obj[keyX]
                for z in self.run(x, memo=memo):
                    yield z
                self.stackVals.pop()

        elif tObj in [types.ListType, types.TupleType]:
            for i,x in enumerate(obj):
                listTuple = ('listLike', i)
                self.stackVals.append(listTuple)
                for z in self.run(x, memo=memo):
                    yield z
                self.stackVals.pop()

        else: # objects or uncaught types...
            ### from http://bugs.python.org/file18699/static.py
            try:
                instance_dict = object.__getattribute__(obj, "__dict__")
            except AttributeError:
                ## probably uncaught static object
                self.currentStack.pop()  # keep the stack balanced on early exit
                return

            for x in instance_dict:
                try:
                    gotValue = object.__getattribute__(obj, x)
                except Exception: # ?? property that relies on something else being set.
                    continue
                objTuple = ('getattr', x)
                self.stackVals.append(objTuple)
                try:
                    for z in self.run(gotValue, memo=memo):
                        yield z
                except RuntimeError:
                    raise Exception("Maximum recursion on:\n%s" % self.currentLevel())
                self.stackVals.pop()                

        self.currentStack.pop()

    def currentLevel(self):
        currentStr = ""
        for stackType, stackValue in self.stackVals:
            if stackType == 'dict':
                if isinstance(stackValue, str):
                    currentStr += "['" + stackValue + "']"
                elif isinstance(stackValue, unicode):
                    currentStr += "[u'" + stackValue + "']"
                else: # numeric key...
                    currentStr += "[" + str(stackValue) + "]"
            elif stackType == 'listLike':
                currentStr += "[" + str(stackValue) + "]"
            elif stackType == 'getattr':
                currentStr += ".__getattribute__('" + stackValue + "')"
            else:
                raise Exception("Cannot get attribute of type %s" % stackType)
        return currentStr
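
As an aside for readers on Python 3.2 or later, where no backport is needed: the short sketch below illustrates the problem that inspect.getattr_static avoids. The `Lazy` class and its `fresh` property are made up for this illustration; the point is only that a plain getattr runs the property and produces a new object each time, while the static lookup returns the property descriptor without calling it.

```python
import inspect


class Lazy:
    calls = 0

    @property
    def fresh(self):
        # Every access builds a brand-new object, which would defeat an id-based memo.
        Lazy.calls += 1
        return object()


obj = Lazy()

static_value = inspect.getattr_static(obj, "fresh")  # does not run the property
calls_after_static = Lazy.calls

dynamic_value = getattr(obj, "fresh")                 # runs the property
calls_after_dynamic = Lazy.calls
```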

With TreeYielder you can run something like this:
class Mock(object):
    def __init__(self, mockThing, embedMock = True):
        self.abby = 30
        self.mocker = mockThing
        self.mockList = [mockThing, mockThing, 40]
        self.embeddedMock = None
        if embedMock is True:
            self.embeddedMock = Mock(mockThing, embedMock = False)

mockType = lambda x: x.__class__.__name__ == 'Mock'

subList = [100, 60, -2]
myList = [5, 20, [5, 12, 17], 30, {'hello': 10, 'goodbye': 22, 'mock': Mock(subList)}, -20, Mock(subList)]
myList.append(myList)

ty = TreeYielder(mockType)
for val in ty.run(myList):
    print(val, ty.currentLevel())

and get:

(<__main__.Mock object at 0x01DEBD10>, "[4]['mock']")
(<__main__.Mock object at 0x01DEF370>, "[4]['mock'].__getattribute__('embeddedMock')")
(<__main__.Mock object at 0x01DEF390>, '[6]')
(<__main__.Mock object at 0x01DEF3B0>, "[6].__getattribute__('embeddedMock')")

or run:

high = lambda x: isinstance(x, (int, float)) and x > 10
ty = TreeYielder(high)
for val in ty.run(myList):
    print(val, ty.currentLevel())

and get:

(20, '[1]')
(12, '[2][1]')
(17, '[2][2]')
(30, '[3]')
(22, "[4]['goodbye']")
(100, "[4]['mock'].__getattribute__('embeddedMock').__getattribute__('mocker')[0]")
(60, "[4]['mock'].__getattribute__('embeddedMock').__getattribute__('mocker')[1]")
(40, "[4]['mock'].__getattribute__('embeddedMock').__getattribute__('mockList')[2]")

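I'm still trying to figure out why .abby isn't being found, but I figure it's worth posting even at this point, since it is much more in the right direction than where I was when I started.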
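
A likely explanation for the missing .abby, assuming CPython: small integers are cached, so the 30 stored in `abby` is the same object as the 30 at `myList[3]`. Because `run` records every object's id in the memo before testing it, the second 30 looks like something already visited. The sketch below reproduces that effect in isolation; `matches` and `memoize_atoms` are hypothetical names, not part of TreeYielder.

```python
def matches(values, predicate, memoize_atoms):
    """Return the values satisfying predicate, skipping ids already seen."""
    seen = set()
    found = []
    for value in values:
        is_atom = isinstance(value, (int, float, str, bytes, type(None)))
        if memoize_atoms or not is_atom:
            if id(value) in seen:
                continue            # treated as "already visited"
            seen.add(id(value))
        if predicate(value):
            found.append(value)
    return found


high = lambda x: isinstance(x, (int, float)) and x > 10

# Two separately produced 30s, standing in for myList[3] and Mock.abby.
# In CPython both names refer to the one cached int object.
values = [30, int("30")]

with_memo = matches(values, high, memoize_atoms=True)
without_memo = matches(values, high, memoize_atoms=False)
```

If this is indeed the cause, one option would be to apply the memo check in `run` only to types outside `self.nonIterables`, since those values cannot contain cycles anyway.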

