Find all floats or integers in a given string.

Question

3

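Given a string "Hello4.2this.is random 24 text42", I want to return all the ints or floats: [4.2, 24, 42]. The solutions to other questions only return 24. I want to return floats even when non-digit characters are right next to the number. Since I am new to Python, I am trying to avoid regex or other complicated imports. I have no idea how to start. Please help. Here is a research attempt: "Python: Extract numbers from a string", whose solution does not recognize 4.2 and 42. There are other similar questions, but none of them recognize 4.2 and 42.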

- Nairit

4

You will have a hard time doing this well without regular expressions. Regex is built for exactly this task. - Alex Huszagh

@AlexanderHuszagh: "You can't do this well without regular expressions." OK, that sounds like a challenge... - Warren Weckesser

@WarrenWeckesser, the key word is "well". It is doable, but without the re module the result will be inefficient, hard to read, and likely slow. - Alex Huszagh

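After looking at the re module, I realized that re was created to do exactly this kind of thing. It made me realize how far I still have to go in mastering basic Python. - Nairit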

3 Answers

2

A regular expression will probably give you the most concise code for this problem. It is hard to beat the brevity of

re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", text)

from pythad's answer.
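As a rough sketch of how that one-liner could be applied to the string from the question: the pattern is the one quoted above, while the names `NUMBER_RE` and `find_numbers` are made up for this illustration. Because the pattern allows spaces after an optional sign, matches can carry leading spaces, so the sketch removes them before converting.

```python
import re

# Pattern quoted from pythad's answer; the names below are illustrative only.
NUMBER_RE = re.compile(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

def find_numbers(text):
    """Return every int or float found in text as a Python number."""
    results = []
    for token in NUMBER_RE.findall(text):
        # Drop any spaces matched between the sign and the digits.
        token = token.replace(" ", "")
        if "." in token or "e" in token.lower():
            results.append(float(token))
        else:
            results.append(int(token))
    return results

print(find_numbers("Hello4.2this.is random 24 text42"))
# -> [4.2, 24, 42]
```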

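However, you said "I am trying to avoid regex", so here is a solution that does not use regular expressions. This solution is obviously a bit longer than the regex one (and probably much slower), but it is not complicated.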

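The code loops over the input one character at a time. As each character is read from the string, it is appended to current (a string holding the number currently being parsed) if appending it still leaves a valid start of a number. When the code hits a character that cannot be appended to current, current is saved to the list of numbers, but only if current itself is not one of '', '.', '-' or '-.'; these are strings that could begin a number but are not valid numbers on their own.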
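To make the "can this character be appended?" rule easier to see in isolation, here is a small sketch of that test as its own function. The names `can_extend` and `INCOMPLETE` are hypothetical and do not appear in the answer; the conditions mirror the ones the answer's function checks inline.

```python
# Strings that may begin a number but are not numbers themselves.
INCOMPLETE = ('', '.', '-', '-.')

def can_extend(current, c):
    """Return True if character c may be appended to the partial number current."""
    if c.isdigit():
        return True
    if c == 'e':
        # An exponent marker needs a mantissa before it and may appear only once.
        return 'e' not in current and current not in INCOMPLETE
    if c == '.':
        # Only one decimal point, and never inside the exponent.
        return 'e' not in current and '.' not in current
    if c == '+':
        # A plus sign is accepted only directly after the exponent marker.
        return current.endswith('e')
    if c == '-':
        # A minus sign may start a number or follow the exponent marker.
        return current == '' or current.endswith('e')
    return False
```

For instance, `can_extend('4', '.')` is true, while `can_extend('4.2', '.')` is false because the partial number already has a decimal point.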

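When current is saved, a trailing 'e', 'e-' or 'e+' is removed. This happens, for example, with a string such as '1.23eA'. While parsing that string, current eventually becomes '1.23e', but then 'A' is encountered, which means the string does not contain a valid exponent part, so the 'e' is discarded.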
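The trimming step on its own might look like the sketch below. The helper name `strip_dangling_exponent` is invented for this example; the answer performs the same check inline.

```python
def strip_dangling_exponent(current):
    """Remove an exponent marker that was never followed by digits."""
    if current.endswith(('e-', 'e+')):
        return current[:-2]
    if current.endswith('e'):
        return current[:-1]
    return current

print(strip_dangling_exponent('1.23e'))     # -> 1.23
print(strip_dangling_exponent('6.022e23'))  # -> 6.022e23
```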

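After current is saved, it is reset. Normally current is reset to '', but when the character that triggered the save is '.' or '-', current is set to that character instead, because it might be the start of a new number.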

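Here is the function extract_numbers(s). The line just before return numbers converts the list of strings into a list of integer and floating point values. If you want just the strings, remove that line.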

def extract_numbers(s):
    """
    Extract numbers from a string.

    Examples
    --------
    >>> extract_numbers("Hello4.2this.is random 24 text42")
    [4.2, 24, 42]

    >>> extract_numbers("2.3+45-99")
    [2.3, 45, -99]

    >>> extract_numbers("Avogadro's number, 6.022e23, is greater than 1 million.")
    [6.022e+23, 1]
    """
    numbers = []
    current = ''
    for c in s.lower() + '!':
        if (c.isdigit() or
            (c == 'e' and ('e' not in current) and (current not in ['', '.', '-', '-.'])) or
            (c == '.' and ('e' not in current) and ('.' not in current)) or
            (c == '+' and current.endswith('e')) or
            (c == '-' and ((current == '') or current.endswith('e')))):
            current += c
        else:
            if current not in ['', '.', '-', '-.']:
                if current.endswith('e'):
                    current = current[:-1]
                elif current.endswith('e-') or current.endswith('e+'):
                    current = current[:-2]
                numbers.append(current)
            if c == '.' or c == '-':
                current = c
            else:
                current = ''

    # Convert from strings to actual python numbers.
    numbers = [float(t) if ('.' in t or 'e' in t) else int(t) for t in numbers]

    return numbers

- Warren Weckesser

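+1 Thanks! Your code is great. I mainly wanted a solution without regex so that I could understand the logic behind the actual coding, and as far as I know, JS and C don't have regex. - Nairit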

0

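If you want to get integers or floats from a string, follow pythad's approach...

If you want to get both integers and floats from a single string, do the following: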

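import re

string = "These are floats: 10.5, 2.8, 0.5; and these are integers: 2, 1000, 1975, 308 !! :D"

numbers = []

for line in string.splitlines():
    for actualValue in line.split():
        # Tokens containing a dot are matched as floats, the rest as integers.
        if "." in actualValue:
            value = re.findall(r'\d+\.\d+', actualValue)
        else:
            value = re.findall(r'\d+', actualValue)

        numbers += value

print(numbers)
# -> ['10.5', '2.8', '0.5', '2', '1000', '1975', '308']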

- Riyasat - TheCodeHeist


- pythad · Accepted Answer

A regular expression from perldoc perlretut:

import re

# Raw string so the backslashes reach the regex engine unchanged.
re_float = re.compile(r"""(?x)
   ^
      [+-]?\ *      # first, match an optional sign *and space*
      (             # then match integers or f.p. mantissas:
          \d+       # start out with a ...
          (
              \.\d* # mantissa of the form a.b or a.
          )?        # ? takes care of integers of the form a
         |\.\d+     # mantissa of the form .b
      )
      ([eE][+-]?\d+)?  # finally, optionally match an exponent
   $""")
m = re_float.match("4.5")
print(m.group(0))
# -> 4.5
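Because this pattern is anchored with ^ and $, it checks whether a whole string is a number instead of searching inside a longer string. A minimal sketch of that use follows; the names `RE_FLOAT` and `is_number` are chosen here for illustration and are not part of the answer.

```python
import re

# Same pattern as above, without the inline comments.
RE_FLOAT = re.compile(r"""(?x)
   ^
      [+-]?\ *
      (
          \d+
          (
              \.\d*
          )?
         |\.\d+
      )
      ([eE][+-]?\d+)?
   $""")

def is_number(text):
    """Return True if the whole string looks like an integer or a float."""
    return RE_FLOAT.match(text) is not None

print(is_number("4.5"))      # -> True
print(is_number("4.5 foo"))  # -> False
```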

To get all the numbers from a string:

text = "4.5 foo 123 abc .123"
print(re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", text))
# -> ['4.5', ' 123', ' .123']