Choosing which Python function to call based on a regular expression

Question

55

Is it possible to put a function in a data structure without first giving it a name with def?

# This is the behaviour I want. Prints "hi".
def myprint(msg):
    print msg
f_list = [ myprint ]
f_list[0]('hi')
# The word "myprint" is never used again. Why litter the namespace with it?

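The body of a lambda function is severely limited, so I can't use them.

Edit: For reference, this is closer to the real code in which I ran into the problem.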

def handle_message( msg ):
    print msg
def handle_warning( msg ):
    global num_warnings, num_fatals
    num_warnings += 1
    if ( is_fatal( msg ) ):
        num_fatals += 1
handlers = (
    ( re.compile( r'^<\w+> (.*)' ), handle_message ),
    ( re.compile( r'^\*{3} (.*)' ), handle_warning ),
)
# There are really 10 or so handlers, of similar length.
# The regexps are uncomfortably separated from the handler bodies,
# and the code is unnecessarily long.

for line in open( "log" ):
    for ( regex, handler ) in handlers:
        m = regex.search( line )
        if ( m ): handler( m.group(1) )

- Tim

1

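Not this way. "# The word "myprint" is never used again. Why litter the namespace with it?" Why would you spend so much time getting rid of one line of code that does not affect you at all? - phant0m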

2

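@phant0m, @Udi: I want my code to be pretty and easy to read. In real life, I have a list of regex-function pairs, and I run the function/handler on strings that match the regex. The handlers are small enough that defining them outside the list would be ugly and out of place. - Tim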

1

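I have now added the real problem. I usually prefer not to do this, because it makes the question more specific. I may learn more from the post, but future visitors who find the question through its title will not. (@phant0m) - Tim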

1

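The function names are good documentation. If you make the functions anonymous, readers of your code have to spend more mental effort to understand what they do. - Steven Rumbalski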

1

@sleepless: I would prefer Perl, but it is not available on the target machine. - Tim

14 Answers

16

A more DRY way to solve your actual problem:

def message(msg):
    print msg
message.re = r'^<\w+> (.*)'

def warning(msg):
    global num_warnings, num_fatals
    num_warnings += 1
    if ( is_fatal( msg ) ):
        num_fatals += 1
warning.re = r'^\*{3} (.*)'

handlers = [(re.compile(x.re), x) for x in [
        message,
        warning,
        foo,
        bar,
        baz,
    ]]

- Udi

Much better than my attempt. I really should read up on the available data structures before coming up with more ideas. Thanks! - Tim

15

Continuing Gareth's clean approach with a modular, self-contained solution:

import re

# in util.py
class GenericLogProcessor(object):

    def __init__(self):
        self.handlers = []  # List of pairs (regexp, handler)

    def register(self, regexp):
        """Declare a function as handler for a regular expression."""
        def gethandler(f):
            self.handlers.append((re.compile(regexp), f))
            return f
        return gethandler

    def process(self, file):
        """Process a file line by line and execute all handlers by registered regular expressions"""
        for line in file:
            for regex, handler in self.handlers:
                m = regex.search(line)
                if m:
                    handler(m.group(1))

# in log_processor.py
log_processor = GenericLogProcessor()

@log_processor.register(r'^<\w+> (.*)')
def handle_message(msg):
    print msg

@log_processor.register(r'^\*{3} (.*)')
def handle_warning(msg):
    global num_warnings, num_fatals
    num_warnings += 1
    if is_fatal(msg):
        num_fatals += 1

# in your code
with open("1.log") as f:
    log_processor.process(f)

- Udi

I must say, this is nice and compact. Everything is gathered in one place. - sleeplessnerd

Excellent use of decorators; it makes the code much cleaner! - TyrantWave

13

If you want to keep a clean namespace, use del:

def myprint(msg):
    print msg
f_list = [ myprint ]
del myprint
f_list[0]('hi')

- Udi

9

As you said, this cannot be done. But you can approximate it.

def create_printer():
  def myprint(x):
    print x
  return myprint

x = create_printer()

Here myprint is effectively anonymous, since the variable scope in which it was created is no longer accessible to the caller. (See closures in Python.)

- robert

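I don't know how I feel about that last line. Isn't the whole point of closures in Python that the scope persists "inside"? That said, I really like this solution. - Wilduck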

6

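If you are worried about polluting the namespace, create your functions inside another function. That way you only "pollute" the local namespace of the create_functions function, not the outer namespace.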

def create_functions():
    def myprint(msg):
        print msg
    return [myprint]

f_list = create_functions()
f_list[0]('hi')

- FogleBird

2

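Apart from namespace pollution, two things bother me: 1. having to use a throwaway name; 2. having to define the function somewhere other than where it is used, even though it is used only once. This solves the namespace problem but makes the other two worse. - Tim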

5

You shouldn't do this, because eval is evil, but you can compile function code at run time using FunctionType and compile:

>>> def f(msg): print msg
>>> type(f)
 <type 'function'>
>>> help(type(f))
...
class function(object)
 |  function(code, globals[, name[, argdefs[, closure]]])
 |
 |  Create a function object from a code object and a dictionary.
 |  The optional name string overrides the name from the code object.
 |  The optional argdefs tuple specifies the default argument values.
 |  The optional closure tuple supplies the bindings for free variables.    
...

>>> help(compile)
Help on built-in function compile in module __builtin__:

compile(...)
    compile(source, filename, mode[, flags[, dont_inherit]]) -> code object

    Compile the source string (a Python module, statement or expression)
    into a code object that can be executed by the exec statement or eval().
    The filename will be used for run-time error messages.
    The mode must be 'exec' to compile a module, 'single' to compile a
    single (interactive) statement, or 'eval' to compile an expression.
    The flags argument, if present, controls which future statements influence
    the compilation of the code.
    The dont_inherit argument, if non-zero, stops the compilation inheriting
    the effects of any future statements in effect in the code calling
    compile; if absent or zero these statements do influence the compilation,
    in addition to any features explicitly specified.
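
Putting the two help texts together, here is a minimal sketch in Python 3 syntax. The source string and the names module_code, func_code and handler are invented for illustration and do not come from the answer. It compiles a def statement, pulls the function's code object out of the compiled module's constants, and wraps it with FunctionType:

```python
from types import CodeType, FunctionType

# Source for a single function; the name "_" is a throwaway placeholder.
source = "def _(msg):\n    return 'got: ' + msg\n"

# Compiling in 'exec' mode yields a code object for the whole module.
module_code = compile(source, "<generated>", "exec")

# The function body's code object is stored among the module's constants.
func_code = next(c for c in module_code.co_consts if isinstance(c, CodeType))

# Build a function object directly, without ever binding "_" in a namespace.
handler = FunctionType(func_code, globals(), "handler")

result = handler("hi")
```

The usual caution from the answer still applies: only compile strings you trust.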

- Udi

Nice idea. Out of interest, what is the moral difference between eval and compile? - Tim

@Tim: https://dev59.com/questions/t3E95IYBdhLWcg3wp_gg - Mark Fowler

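This is a fantastic answer. This novel approach not only defines Python callables (e.g., functions, methods) dynamically at run time, it also supports closures! I searched for a long time for something exactly like this. Thanks for putting the puzzle pieces together, Udi. - Cecil Curry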

3

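As everyone has said, lambda is the only way, but you need to think about how to work around its limitations. For example, you can use lists, dicts, comprehensions and so on to get what you want: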

funcs = [lambda x,y: x+y, lambda x,y: x-y, lambda x,y: x*y, lambda x: x]
funcs[0](1,2)
>>> 3
funcs[1](funcs[0](1,2),funcs[0](2,2))
>>> -1
[func(x,y) for x,y in zip(xrange(10),xrange(10,20)) for func in funcs[:3]]  # funcs[3] takes only one argument

Edited to add printing (take a look at the pprint module) and control flow:

add = True
(funcs[0] if add else funcs[1])(1,2)
>>> 3

from pprint import pprint
printMsg = lambda isWarning, msg: pprint('WARNING: ' + msg) if isWarning else pprint('MSG:' + msg)

- Artsiom Rudzenka

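This seems rather complicated, and mostly suited to math expressions without flow control. Is it possible to write a printer this way? - Tim

I think this can be used not only for math but also for flow control; please see my update - Artsiom Rudzenka

I see, that could be useful for some small workarounds. Thanks. - Tim

You're welcome, and sorry if my solution doesn't fit your needs. I'm just a beginner experimenting with Python. - Artsiom Rudzenka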

3

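Python really, really doesn't want to do this. Not only does it have no way to define a multi-line anonymous function, but function definitions don't return a function, so even if this were syntactically valid...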

mylist.sort(key=def _(v):
                    try:
                        return -v
                    except:
                        return None)

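...it still wouldn't work, because the function definition would not return the function. (Although I guess that if it were valid syntax, they would make function definitions return the function, so it would work.) So you can write your own function that builds a function from a string (using exec, of course) and pass it a triple-quoted string. It's a bit ugly syntactically, but it works: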

def function(text, cache={}):

    # strip everything before the first paren in case it's "def foo(...):"
    if not text.startswith("("):
        text = text[text.index("("):]

    # keep a cache so we don't recompile the same func twice
    if text in cache:
        return cache[text]

    exec "def func" + text
    func.__name__ = "<anonymous>"

    cache[text] = func
    return func

    # never executed; forces func to be local (a tiny bit more speed)
    func = None

Usage:

mylist.sort(key=function("""(v):
                                try:
                                    return -v
                                except:
                                    return None"""))

- kindall

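Apart from the syntax highlighting, I don't think the triple-quoted custom function is ugly. I didn't know the func = None trick; where is it documented? - Tim

If you assign to a variable anywhere in a function, that variable is local. This is determined at compile time, not at run time, so the assignment can lie outside the actual execution path. See http://docs.python.org/reference/executionmodel.html, in particular "If a name is bound in a block, it is a local variable of that block." - kindall

On the speed advantage of local variables: http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Local_Variables I figure that if I'm going to do something this hacky, I should at least make it as fast as possible. :-) - kindall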
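
The compile-time scoping rule described in these comments can be checked with a small sketch (Python 3 syntax; the function names are made up for illustration). An assignment that is never executed still makes the name local to the function:

```python
x = "global"

def reads_global():
    # No assignment to x in this function, so x refers to the module-level name.
    return x

def assigns_later():
    try:
        # x is local to this function because of the assignment below,
        # so reading it here fails instead of falling back to the global.
        return x
    except UnboundLocalError:
        return "unbound"
    x = "never reached"  # never executed, but still makes x a local name

first = reads_global()
second = assigns_later()
```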

3

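The only way to create an anonymous function is with lambda, and as you know, a lambda can contain only a single expression.

You can create several functions with the same name, so that at least you don't have to think up a new name for each one.

It would be great to have truly anonymous functions, but Python's syntax can't easily support them.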
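
A minimal sketch of the reuse-one-name idea, in Python 3 syntax. The name _ and the handlers list are illustrative choices, not part of the answer:

```python
handlers = []

def _(msg):
    return "message: " + msg
handlers.append(_)  # store the function before the name is rebound

def _(msg):
    return "warning: " + msg
handlers.append(_)

del _  # optional: remove the throwaway name once everything is stored

outputs = [h("disk full") for h in handlers]
```

Each def rebinds the same name, but the list keeps a reference to every function object, so none of them is lost.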

- Ned Batchelder

1

"But Python's syntax can't easily support them." Could you elaborate on that (or provide a link to a resource)? Thanks. - phant0m

3

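@phant0m: Indentation-based block delimiting does not support starting or ending a block inside an expression (for example, inside parentheses), and that seems hard to add: it would either be very complicated to implement or make indentation significant in all expressions, which would break multi-line expressions. So you cannot allow multiple statements in a lambda expression, because you cannot tell where it ends (there is no DEDENT token). I bet there is more material on this on the mailing lists. - user395760

Ah yes, that makes sense! I hadn't thought much about where and how they would actually be used, and what that implies :) Thanks! - phant0m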

1

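I can't find a good write-up, but this has been hashed over many times, and @delnan is right: indentation-based syntax cannot accommodate statements inside expressions. - Ned Batchelder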
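
The restriction discussed in these comments can be observed directly. The sketch below (Python 3 syntax; compiles_as_expression is a hypothetical helper written for this example) asks the compiler whether a piece of source is a valid expression. A lambda whose body is an expression is accepted, one containing a statement is rejected, and line breaks inside parentheses carry no block structure:

```python
def compiles_as_expression(source):
    """Return True if the source text is a syntactically valid expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# A lambda body must be a single expression.
expression_body = compiles_as_expression("lambda x: x + 1")

# "return" is a statement, so it cannot appear inside a lambda.
statement_body = compiles_as_expression("lambda x: return x + 1")

# Inside parentheses, newlines and indentation are ignored, so they
# cannot be used to mark where a block would start or end.
wrapped_body = compiles_as_expression("(lambda x:\n        x + 1)")
```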

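By contrast, R, which is based on Scheme, allows the following: function(x,y,z) {w<- y+z ; x+w} where, for the uninitiated, "function" is the keyword that takes the place of Python's "lambda". The main reason to use such anonymous functions is to pass them as arguments to other functions. - user1544219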

- Gareth Rees · Accepted Answer

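This is based on Udi's nice answer.

I think the difficulty of creating anonymous functions is a bit of a red herring. What you really want to do is keep related code together and keep the code neat. So I think decorators may work for you.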

import re

# List of pairs (regexp, handler)
handlers = []

def handler_for(regexp):
    """Declare a function as handler for a regular expression."""
    def gethandler(f):
        handlers.append((re.compile(regexp), f))
        return f
    return gethandler

@handler_for(r'^<\w+> (.*)')
def handle_message(msg):
    print msg

@handler_for(r'^\*{3} (.*)')
def handle_warning(msg):
    global num_warnings, num_fatals
    num_warnings += 1
    if is_fatal(msg):
        num_fatals += 1
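
For readers on Python 3, here is a self-contained sketch of the same decorator registry wired to the dispatch loop from the question. The messages list, the counts dictionary, the dispatch function and the sample log lines are assumptions made so the example can run on its own; they are not part of the answer:

```python
import re

handlers = []               # list of (compiled regexp, handler) pairs
messages = []               # collected message texts (illustrative)
counts = {"warnings": 0}    # stands in for the global counters

def handler_for(regexp):
    """Declare a function as handler for a regular expression."""
    def register(f):
        handlers.append((re.compile(regexp), f))
        return f
    return register

@handler_for(r'^<\w+> (.*)')
def handle_message(msg):
    messages.append(msg)

@handler_for(r'^\*{3} (.*)')
def handle_warning(msg):
    counts["warnings"] += 1

def dispatch(lines):
    """Run every handler whose pattern matches each line."""
    for line in lines:
        for regex, handler in handlers:
            m = regex.search(line)
            if m:
                handler(m.group(1))

dispatch(["<tim> hello", "*** disk full", "unmatched line"])
```

Each pattern now sits directly above the function it triggers, which is the readability goal stated in the question.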