Why do tuples in a list comprehension need parentheses?

8
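It is well known that tuples are defined by commas, not by parentheses. Quoting the official Python documentation: "A tuple consists of a number of values separated by commas." Therefore: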
myVar1 = 'a', 'b', 'c'
type(myVar1)
# Result:
<type 'tuple'>

Another striking example is this:
myVar2 = ('a')
type(myVar2)
# Result:
<type 'str'>  

myVar3 = ('a',)
type(myVar3)
# Result:
<type 'tuple'>
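
The outputs above come from Python 2. As a minimal sketch for readers on Python 3 (the variable names here are illustrative, not from the original post), the same point can be checked like this:

```python
# Commas, not parentheses, are what create a tuple.
my_var1 = 'a', 'b', 'c'   # three values separated by commas: a tuple
my_var2 = ('a')           # parentheses alone only group: still a str
my_var3 = ('a',)          # the trailing comma makes a one-element tuple
my_var4 = 'a',            # the same one-element tuple, no parentheses at all

# On Python 3 these print <class 'tuple'> / <class 'str'> rather than <type ...>.
for value in (my_var1, my_var2, my_var3, my_var4):
    print(type(value))
```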

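Even a single-element tuple needs a comma, and the parentheses are only ever there to avoid confusion. My question is: why can't we omit the parentheses around a tuple in a list comprehension? For example: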

myList1 = ['a', 'b']
myList2 = ['c', 'd']

print([(v1,v2) for v1 in myList1 for v2 in myList2])
# Works, result:
[('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]

print([v1,v2 for v1 in myList1 for v2 in myList2])
# Does not work, result:
SyntaxError: invalid syntax

Isn't the second list comprehension just syntactic sugar for the following loop, which does work?
myTuples = []
for v1 in myList1:
    for v2 in myList2:
        myTuple = v1,v2
        myTuples.append(myTuple)
print(myTuples)
# Result:
[('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
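
The comparison in the question can be reproduced without crashing the script by handing the source text to the built-in compile() function. This is only a sketch: the helper name compiles and the variable names with_parens and loop_result are made up for illustration.

```python
myList1 = ['a', 'b']
myList2 = ['c', 'd']

# Parenthesized tuple as the comprehension element: accepted.
with_parens = [(v1, v2) for v1 in myList1 for v2 in myList2]

# The equivalent explicit loop, where the bare "v1, v2" tuple is fine.
loop_result = []
for v1 in myList1:
    for v2 in myList2:
        pair = v1, v2
        loop_result.append(pair)

def compiles(source):
    """Return True if `source` is a syntactically valid expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

print(with_parens == loop_result)
print(compiles("[(v1, v2) for v1 in myList1 for v2 in myList2]"))
print(compiles("[v1, v2 for v1 in myList1 for v2 in myList2]"))
```

The loop and the parenthesized comprehension build the same list, while the unparenthesized form is rejected at compile time, before any of it runs.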
3 Answers

11

Python's grammar is LL(1), meaning that the parser only looks one token ahead while parsing.

[(v1, v2) for v1 in myList1 for v2 in myList2]

Here, the parser sees something like this.

[ # An opening bracket; must be some kind of list
[( # Okay, so a list containing some value in parentheses
[(v1
[(v1,
[(v1, v2
[(v1, v2)
[(v1, v2) for # Alright, list comprehension

Without the parentheses, however, it has to make a decision earlier.

[v1, v2 for v1 in myList1 for v2 in myList2]

[ # List-ish thing
[v1 # List containing a value; alright
[v1, # List containing at least two values
[v1, v2 # Here's the second value
[v1, v2 for # Wait, what?

Parsers that backtrack tend to be very slow, so LL(1) parsers do not backtrack. Hence the ambiguous syntax is forbidden.


2
This answer turns out to be wrong. See user2357112's answer to this more recent question for details. - John Kugelman

3

Since I felt that "because the grammar forbids it" was a little too snarky, I came up with a reason.

It begins parsing the expression as a list/set/tuple and expects a ',' but instead encounters a 'for' token.

For example:

$ python3.6 test.py
  File "test.py", line 1
    [a, b for a, b in c]
            ^
SyntaxError: invalid syntax

It tokenizes as follows:

$ python3.6 -m tokenize test.py
0,0-0,0:            ENCODING       'utf-8'        
1,0-1,1:            OP             '['            
1,1-1,2:            NAME           'a'            
1,2-1,3:            OP             ','            
1,4-1,5:            NAME           'b'            
1,6-1,9:            NAME           'for'          
1,10-1,11:          NAME           'a'            
1,11-1,12:          OP             ','            
1,13-1,14:          NAME           'b'            
1,15-1,17:          NAME           'in'           
1,18-1,19:          NAME           'c'            
1,19-1,20:          OP             ']'            
1,20-1,21:          NEWLINE        '\n'           
2,0-2,0:            ENDMARKER      ''     
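
The same token stream can be obtained from within Python using the standard tokenize module. The sketch below assumes the one-line source from test.py above. It suggests that tokenizing succeeds and that the rejection only happens later, when the tokens are parsed.

```python
import io
import tokenize

source = "[a, b for a, b in c]\n"

# Tokenizing works even though the line is not valid Python syntax.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
strings = [tok.string for tok in tokens
           if tok.type in (tokenize.NAME, tokenize.OP)]
print(strings)

# The failure shows up only when the source is actually parsed.
try:
    compile(source, "test.py", "eval")
    parsed = True
except SyntaxError:
    parsed = False
print(parsed)
```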

2

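No parser issue motivated this restriction. Contrary to Silvio Mayolo's answer, an LL(1) parser could have parsed the no-parentheses syntax just fine. The parentheses were optional in early versions of the original list comprehension patch; they were made mandatory only to make the meaning clearer.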

Quoting Guido van Rossum from 2000, in a reply to someone worried that [x, y for ...] would cause parser problems:

Don't worry. Greg Ewing had no problem expressing this in Python's own grammar, which is about as restricted as parsers come. (It's LL(1), which is equivalent to pure recursive descent with one lookahead token, i.e. no backtracking.)

Here's Greg's grammar:

  atom: ... | '[' [testlist [list_iter]] ']' | ...
  list_iter: list_for | list_if
  list_for: 'for' exprlist 'in' testlist [list_iter]
  list_if: 'if' test [list_iter]

Note that before, the list syntax was '[' [testlist] ']'. Let me explain it in different terms:

The parser parses a series comma-separated expressions. Previously, it was expecting ']' as the sole possible token following this. After the change, 'for' is another possible following token. This is no problem at all for any parser that knows how to parse matching parentheses!

If you'd rather not support [x, y for ...] because it's ambiguous (to the human reader, not to the parser!), we can change the grammar to something like:

'[' test [',' testlist | list_iter] ']'

(Note that | binds less than concatenation, and [...] means an optional part.)

Also see the next response in the thread, where Greg Ewing runs the following:

>>> seq = [1,2,3,4,5]
>>> [x, x*2 for x in seq]
[(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]

This was run on an early version of the list comprehension patch, where it worked just fine.
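
Guido's point that a single token of lookahead is enough can be sketched with a toy recursive-descent parser modelled loosely on Greg's grammar. This is not CPython's parser. The class ToyParser, the helper lex, and the simplification of testlist and exprlist to comma-separated names are all assumptions made for illustration.

```python
import re

def lex(source):
    """Split a one-line source into names and the symbols '[', ']' and ','."""
    return re.findall(r"[A-Za-z_]\w*|[\[\],]", source)

class ToyParser:
    """Toy LL(1) parser: every decision looks only at self.peek()."""

    KEYWORDS = ("for", "in", "if")

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, value):
        if self.peek() != value:
            raise SyntaxError("expected %r, got %r" % (value, self.peek()))
        self.pos += 1

    def name(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok in self.KEYWORDS:
            raise SyntaxError("expected a name, got %r" % (tok,))
        self.pos += 1
        return tok

    def name_list(self):
        # Simplified stand-in for testlist / exprlist: comma-separated names.
        items = [self.name()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.name())
        return items

    def atom(self):
        # atom: '[' [testlist [list_iter]] ']'
        self.expect("[")
        if self.peek() == "]":
            self.pos += 1
            return ("list", [])
        items = self.name_list()
        if self.peek() == "for":  # one lookahead token decides; no backtracking
            clauses = self.list_iter()
            self.expect("]")
            return ("listcomp", items, clauses)
        self.expect("]")
        return ("list", items)

    def list_iter(self):
        # list_iter: list_for | list_if, each optionally followed by another
        clauses = []
        while self.peek() in ("for", "if"):
            if self.peek() == "for":
                self.pos += 1
                targets = self.name_list()
                self.expect("in")
                clauses.append(("for", targets, self.name_list()))
            else:
                self.pos += 1
                clauses.append(("if", self.name()))
        return clauses

print(ToyParser(lex("[a, b]")).atom())
print(ToyParser(lex("[x, y for x in xs for y in ys]")).atom())
```

After the comma-separated items, the parser only has to check whether the next token is ']' or 'for', which matches the explanation in the quoted message.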
