How do I determine the type of a nested data structure in Python?

Question


11

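I am currently translating some Python code, specifically code for neural networks and deep learning.

To make sure the data structures are translated correctly, I need the details of the nested types in Python. The type() function works for simple types but not for nested ones.

For example, in Python: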

> data = ([[1,2,3],[4,5,6],[7,8,9]],["a","b","c"])
> type(data)
<type 'tuple'>

This gives only the type of the first level; it says nothing about the arrays inside the tuple.

I was hoping for something like what F# does:

> let data = ([|[|1;2;3|];[|4;5;6|];[|7;8;9|]|],[|"a";"b";"c"|]);;

val data : int [] [] * string [] =
  ([|[|1; 2; 3|]; [|4; 5; 6|]; [|7; 8; 9|]|], [|"a"; "b"; "c"|])

which returns a signature that is independent of the values:

int [] [] * string []

*         is a tuple item separator  
int [] [] is a two dimensional jagged array of int  
string [] is a one dimensional array of string

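Is this possible in Python?

TL;DR: I currently use PyCharm with the debugger and, in the Variables window, click the view option for an individual variable to see its details. The problem is that the output mixes values and types together, and I only need the type signature. When a variable looks like (float[50000][784], int[50000]), the values get in the way. Yes, I am shrinking the variables for now, but that is a workaround, not a solution.

For example, using PyCharm Community: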

(array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ...,     
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32),
  array([7, 2, 1, ..., 4, 5, 6]))

Using Spyder

Using Visual Studio Community with Python Tools for Visual Studio:

(array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],    
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        ...,   
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32),  
  array([5, 0, 4, ..., 8, 4, 8], dtype=int64))

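EDIT:

Since someone was looking for more detail, here is my modified version, which also handles numpy ndarray. Thanks to Vlad for the initial version.

Also, because it uses a variation of run-length encoding, ? is no longer used for heterogeneous types.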

# Note: Typing for elements of iterable types such as Set, List, or Dict 
# use a variation of Run Length Encoding.

def type_spec_iterable(iterable, name):
    def iterable_info(iterable):
        # For two iterables to be comparable, the identity must contain
        # the name and length and, for the elements, the type, order and count.
        length = 0
        types_list = []
        previous_identity_type = None
        previous_identity_type_count = 0
        first_item_done = False
        for e in iterable:
            item_type = type_spec(e)
            if item_type != previous_identity_type:
                if not first_item_done:
                    first_item_done = True
                else:
                    types_list.append((previous_identity_type, previous_identity_type_count))
                previous_identity_type = item_type
                previous_identity_type_count = 1
            else:
                previous_identity_type_count += 1
            length += 1
        # Record the final run only if at least one element was seen;
        # otherwise an empty iterable would contribute (None, 0).
        if first_item_done:
            types_list.append((previous_identity_type, previous_identity_type_count))
        return (length, types_list)
    (length, identity_list) = iterable_info(iterable)
    element_types = ""
    for (identity_item_type, identity_item_count) in identity_list:
        if element_types != "":
            element_types += ","
        element_types += identity_item_type
        # Omit the count for a single item or a run spanning the whole iterable.
        if (identity_item_count != length) and (identity_item_count != 1):
            element_types += "[" + str(identity_item_count) + "]"
    result = name + "[" + str(length) + "]<" + element_types + ">"
    return result

def type_spec_dict(mapping, name):
    def dict_info(mapping):
        # For two dicts to be comparable, the identity must contain
        # the name and length and, for the key and value combinations,
        # the type, order and count.
        length = 0
        types_list = []
        previous_identity_type = None
        previous_identity_type_count = 0
        first_item_done = False
        for (k, v) in mapping.items():
            key_type = type_spec(k)
            value_type = type_spec(v)
            item_type = (key_type, value_type)
            if item_type != previous_identity_type:
                if not first_item_done:
                    first_item_done = True
                else:
                    types_list.append((previous_identity_type, previous_identity_type_count))
                previous_identity_type = item_type
                previous_identity_type_count = 1
            else:
                previous_identity_type_count += 1
            length += 1
        # Record the final run only if the dict was not empty.
        if first_item_done:
            types_list.append((previous_identity_type, previous_identity_type_count))
        return (length, types_list)
    (length, identity_list) = dict_info(mapping)
    element_types = ""
    for ((identity_key_type, identity_value_type), identity_item_count) in identity_list:
        if element_types != "":
            element_types += ","
        identity_item_type = "(" + identity_key_type + "," + identity_value_type + ")"
        element_types += identity_item_type
        # Omit the count for a single item or a run spanning the whole dict.
        if (identity_item_count != length) and (identity_item_count != 1):
            element_types += "[" + str(identity_item_count) + "]"
    result = name + "[" + str(length) + "]<" + element_types + ">"
    return result

def type_spec_tuple(tup, name):
    return name + "<" + ", ".join(type_spec(e) for e in tup) + ">"

def type_spec(obj):
    object_type = type(obj)
    name = object_type.__name__
    # Python 3 has a single int type, so the Python 2 'long' check is dropped.
    if object_type in (int, str, bool, float):
        result = name
    elif object_type is type(None):
        result = "(none)"
    elif object_type in (list, set):
        result = type_spec_iterable(obj, name)
    elif object_type is dict:
        result = type_spec_dict(obj, name)
    elif object_type is tuple:
        result = type_spec_tuple(obj, name)
    elif name == 'ndarray':
        # numpy array: report shape and element type, e.g. ndarray[50000,784]<float32>
        ndarray_shape = "[" + ",".join(str(n) for n in obj.shape) + "]"
        ndarray_data_type = str(obj.dtype)
        result = name + ndarray_shape + "<" + ndarray_data_type + ">"
    else:
        # Must be a string (not a tuple) so callers can concatenate it.
        result = "Unknown type: " + name
    return result

I don't consider it finished, but so far it has worked on everything I have needed.
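The run-length idea used above can also be sketched with itertools.groupby. This is a minimal illustration, not the code from the edit: it handles only one flat level (no recursion into nested containers), and the helper names run_length_types and describe are made up for this example.

```python
from itertools import groupby


def run_length_types(items):
    """Collapse consecutive elements of the same type into (type_name, count) runs."""
    names = (type(item).__name__ for item in items)
    return [(name, sum(1 for _ in run)) for name, run in groupby(names)]


def describe(items, name="list"):
    """Render the runs as 'name[length]<type[count],...>' (hypothetical format helper)."""
    runs = run_length_types(items)
    length = sum(count for _, count in runs)
    parts = []
    for type_name, count in runs:
        # Same convention as above: omit the count for a single item
        # or when one run covers the whole container.
        if count != 1 and count != length:
            parts.append("%s[%d]" % (type_name, count))
        else:
            parts.append(type_name)
    return "%s[%d]<%s>" % (name, length, ",".join(parts))


print(describe([1, 2, 3]))                    # list[3]<int>
print(describe([1, 2, "a", "b", "c", 3.0]))   # list[6]<int[2],str[3],float>
```

Because the runs keep order and count, two containers get the same description only when their element types line up position by position, which is what makes the signature comparable.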

- Guy Coder

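You could probably write something yourself for tuples, but not for lists or dictionaries, because they are untyped (so are tuples, but at least they are immutable). What type should [1, 2, 'c'] be? - L3viathan

Does data come from a predictable, structured source, or is it just incidental? - Kyle Pittman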

1

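Are you trying to infer the types of variables from a running script, or from the code itself? Generating a list or tuple and then noting the type at each level as you iterate over it is one thing. Inferring what the code produces just by looking at it, without running it, is something else entirely. - hpaulj

@Monkpit For the tutorial I am working through, the data comes from MNIST, known as the Hello World of neural networks. - Guy Coder

@hpaulj I would prefer to get the type signature by looking at the code, but I realize that Python's type system is not static. Since this is a relatively simple project, I will get through it as best I can. For now I run the code and use the debugger or, for simple snippets, an interactive session. If I use Python more in the future, I expect I will be asking this question often. - Guy Coder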

2 Answers

3

One way to do it by hand:

def type_spec_iterable(obj, name):
    tps = set(type_spec(e) for e in obj)
    if len(tps) == 1:
        return name + "<" + next(iter(tps)) + ">"
    else:
        return name + "<?>"


def type_spec_dict(obj):
    # items() works on both Python 2 and 3; iteritems() is Python 2 only.
    tps = set((type_spec(k), type_spec(v)) for (k, v) in obj.items())
    keytypes = set(k for (k, v) in tps)
    valtypes = set(v for (k, v) in tps)
    kt = next(iter(keytypes)) if len(keytypes) == 1 else "?"
    vt = next(iter(valtypes)) if len(valtypes) == 1 else "?"
    return "dict<%s, %s>" % (kt, vt)


def type_spec_tuple(obj):
    return "tuple<" + ", ".join(type_spec(e) for e in obj) + ">"


def type_spec(obj):
    t = type(obj)
    res = {
        int: "int",
        str: "str",
        bool: "bool",
        float: "float",
        type(None): "(none)",
        list: lambda o: type_spec_iterable(o, 'list'),
        set: lambda o: type_spec_iterable(o, 'set'),
        dict: type_spec_dict,
        tuple: type_spec_tuple,
    }.get(t, lambda o: type(o).__name__)
    return res if type(res) is str else res(obj)


if __name__ == "__main__":
    class Foo(object):
        pass
    for obj in [
        1,
        2.3,
        None,
        False,
        "hello",
        [1, 2, 3],
        ["a", "b"],
        [1, "h"],
        (False, 1, "2"),
        set([1.2, 2.3, 3.4]),
        [[1,2,3],[4,5,6],[7,8,9]],
        [(1,'a'), (2, 'b')],
        {1:'b', 2:'c'},
        [Foo()], # todo - inheritance?
    ]:
        print(repr(obj) + " : " + type_spec(obj))

This prints (output recorded under Python 2):

1 : int
2.3 : float
None : (none)
False : bool
'hello' : str
[1, 2, 3] : list<int>
['a', 'b'] : list<str>
[1, 'h'] : list<?>
(False, 1, '2') : tuple<bool, int, str>
set([2.3, 1.2, 3.4]) : set<float>
[[1, 2, 3], [4, 5, 6], [7, 8, 9]] : list<list<int>>
[(1, 'a'), (2, 'b')] : list<tuple<int, str>>
{1: 'b', 2: 'c'} : dict<int, str>
[<__main__.Foo object at 0x101de6c50>] : list<Foo>

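One question is how far you want to take this, and what trade-off you want between speed and accuracy. For example, do you want to go through every item in a large list? Do you want to handle custom types (and find the common ancestor of those types)?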
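One possible way to trade accuracy for speed is to inspect only the first few elements of each container. The sketch below is an assumption about how that could look, not part of the answer's code: the name sampled_spec, the limit parameter and the trailing ~ marker for "guessed from a partial scan" are all invented for illustration. Tuples are still walked in full, since each position can have its own type.

```python
from itertools import islice


def sampled_spec(obj, limit=100):
    """Describe obj, inspecting at most `limit` elements of each list or set."""
    if isinstance(obj, tuple):
        return "tuple<" + ", ".join(sampled_spec(e, limit) for e in obj) + ">"
    if isinstance(obj, (list, set, frozenset)):
        # Only the first `limit` elements are examined.
        seen = {sampled_spec(e, limit) for e in islice(obj, limit)}
        inner = seen.pop() if len(seen) == 1 else "?"
        # '~' flags a result based on a partial scan, i.e. a guess.
        suffix = "~" if len(obj) > limit else ""
        return type(obj).__name__ + "<" + inner + ">" + suffix
    return type(obj).__name__


big = [1] * 1000 + ["x"]
print(sampled_spec(big, limit=10))   # list<int>~  (the trailing "x" is never seen)
print(sampled_spec([1, "h"]))        # list<?>
```

The first call shows the cost of sampling: the result is fast but wrong about the last element, which is why the marker is worth keeping in the output.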

I'm not sure whether it applies, but the PEP on type hints (PEP 484) is also worth a read.
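PEP 484 is about annotations written in source code, not about inspecting values, so the following is only a loose connection: a sketch that builds typing-module objects from a runtime value, so the result reads like an annotation. The function names infer_hint and union_of are hypothetical, and every element is visited, so this is not suited to very large containers.

```python
from typing import Any, Dict, List, Set, Tuple, Union


def union_of(hints):
    """Combine hints into a Union; a single distinct hint collapses to itself."""
    hints = list(hints)
    if not hints:
        return Any  # nothing to infer from an empty container
    return Union[tuple(hints)]


def infer_hint(obj):
    """Build a PEP 484 style hint from a runtime value (illustrative only)."""
    if isinstance(obj, list):
        return List[union_of(infer_hint(e) for e in obj)]
    if isinstance(obj, set):
        return Set[union_of(infer_hint(e) for e in obj)]
    if isinstance(obj, dict):
        return Dict[union_of(infer_hint(k) for k in obj),
                    union_of(infer_hint(v) for v in obj.values())]
    if isinstance(obj, tuple):
        if not obj:
            return Tuple[()]
        return Tuple[tuple(infer_hint(e) for e in obj)]
    return type(obj)


data = ([[1, 2, 3], [4, 5, 6], [7, 8, 9]], ["a", "b", "c"])
print(infer_hint(data) == Tuple[List[List[int]], List[str]])   # True
print(infer_hint([1, "h"]) == List[Union[int, str]])           # True
```

Comparing against typing objects rather than printing them avoids depending on how a particular Python version formats these hints.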

- Vlad

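Nice detail. Python is not a language I use often, so I will have to try it out and take some time to understand it. There is enough here for me to dig into and enjoy. Did you write this after seeing the question? If so, I'm impressed; if it was just copied and pasted with a few modifications, that's fine too. - Guy Coder

I wrote it from scratch after seeing the question. I did not search online, but I assume someone must have done something similar before. - Vlad

With duck typing, a bug can go unnoticed for a long time if the part of the code that contains it has never been run. But Python is great when you need to get something done quickly. In the time it takes to set up the includes and main() in C, you have already finished writing the code. - Vlad

Why do you iterate over the types in type_spec instead of using a dictionary? - L3viathan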

1

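@Vlad Regarding PEP 484: although the type hints are for linting and are not used while the code runs, I find that adding assertions such as assert type(x) == y is just as useful. Maybe there should be a new PEP for converting the type information into assertions that run with the code. If you have ever done untyped lambda calculus and then typed lambda calculus, this should make more sense. - Guy Coder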


Content provided by Stack Overflow.

- L3viathan · Accepted Answer

As I commented, this is impossible in Python, because lists are untyped.

You can still pretend to do it:

def typ(something, depth=0):
    if depth > 63:
        return "..."
    if type(something) == tuple:
        return "<class 'tuple': <" + ", ".join(typ(ding, depth+1) for ding in something) + ">>"
    elif type(something) == list:
        return "<class 'list': " + (typ(something[0], depth+1) if something else '(empty)') + ">"
    else:
        return str(type(something))

For your example, this returns the string <class 'tuple': <<class 'list': <class 'list': <class 'int'>>>, <class 'list': <class 'str'>>>>.

EDIT: To make it look more like F#, you could do this instead:

def typ(something, depth=0):
    if depth > 63:
        return "..."
    if type(something) == tuple:
        return " * ".join(typ(ding, depth+1) for ding in something)
    elif type(something) == list:
        # Pass depth+1 so the recursion limit above also applies to nested lists.
        return (typ(something[0], depth+1) if something else 'empty') + " []"
    else:
        # type() takes a single argument here; use the bare type name.
        return type(something).__name__

For your example, this returns int [] [] * str [].
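Both versions above look only at the first element of a list, so [1, 2, 'c'] would be reported as int []. A variant that checks every element and reports the distinct signatures could look like the sketch below. The name typ_all and the (a | b) notation for mixed lists are assumptions made for this example; they are not F# syntax and not part of the answer above.

```python
def typ_all(something, depth=0):
    """Like typ, but inspects every list element instead of only the first."""
    if depth > 63:
        return "..."
    if type(something) == tuple:
        return " * ".join(typ_all(item, depth + 1) for item in something)
    if type(something) == list:
        if not something:
            return "empty []"
        # Collect the distinct element signatures in first-seen order.
        seen = []
        for item in something:
            signature = typ_all(item, depth + 1)
            if signature not in seen:
                seen.append(signature)
        inner = seen[0] if len(seen) == 1 else "(" + " | ".join(seen) + ")"
        return inner + " []"
    return type(something).__name__


data = ([[1, 2, 3], [4, 5, 6], [7, 8, 9]], ["a", "b", "c"])
print(typ_all(data))          # int [] [] * str []
print(typ_all([1, 2, "c"]))   # (int | str) []
```

This visits every element, so it is slower on large lists than the first-element version.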