Regex - replace all leading spaces with periods

Question

6

I can use vim, sed, awk, python, etc. for this, but I have not managed to do it with any of them.

For input like this:

top           f1    f2    f3
   sub1       f1    f2    f3
   sub2       f1    f2    f3
      sub21   f1    f2    f3
   sub3       f1    f2    f3

I want:

top           f1    f2    f3
...sub1       f1    f2    f3
...sub2       f1    f2    f3
......sub21   f1    f2    f3
...sub3       f1    f2    f3

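I want to load this into Excel (delimited by spaces) and still be able to see the hierarchy in the first column!

I tried many approaches, but they all end up losing the hierarchy information.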

- shikhanshu

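Not sure what you mean by "load it in Excel". Do you want to format it so that it can easily be pasted into a spreadsheet? Is that part of your question, or are you just asking how to replace leading spaces with dots? - CAustin

@CAustin Sorry for the confusion.. the 'excel' part is not part of the question, it is just the reason I want this. - shikhanshu

"I tried many approaches but could not get it done": if you don't add at least one of them, the question looks like a request for free code. - Sundeep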

5 Answers

5

In vim, there are two different ways to do this.

With a regex:
```
:%s/^\s\+/\=repeat('.', len(submatch(0)))
```
This is fairly straightforward, but a little verbose. It uses a sub-replace expression (\=) to generate a string of '.'s the same length as the run of whitespace at the beginning of each line.
With a norm command:
```
:%norm ^hviwr.
```
This is a much shorter command, although it's a little harder to understand. It visually selects the spaces at the beginning of a line and replaces the whole selection with dots. If there is no leading space, the command fails at ^h: ^ puts the cursor on the first non-blank character, which is already in column 1, so h cannot move left, :norm aborts, and the line is left unchanged.

To see how this works, try typing ^hviwr. on a line that has leading spaces to see it happen.

- DJMcMayhem

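Both the regex and the norm command work perfectly. I tried it without the ^h on a line with no leading spaces, and now I see why the ^h is needed :) Thanks for the solution! - shikhanshu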

3

Since you mentioned Python:

#!/usr/bin/env python
import re, sys
for line in sys.stdin:
    sys.stdout.write(re.sub('^ +', lambda m: len(m.group(0)) * '.', line))

(For each line, we replace the longest prefix run of spaces, '^ +', with an equally long run of dots, len(m.group(0)) * '.'.)

The result:

$ ./dottify.py <file
top           f1    f2    f3
...sub1       f1    f2    f3
...sub2       f1    f2    f3
......sub21   f1    f2    f3
...sub3       f1    f2    f3

And since you mentioned awk:

$ awk '{ match($0,/^ +/); p=substr($0,1,RLENGTH); gsub(" ",".",p); print p""substr($0,RLENGTH+1) }' file
top           f1    f2    f3
...sub1       f1    f2    f3
...sub2       f1    f2    f3
......sub21   f1    f2    f3
...sub3       f1    f2    f3

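(For each line, we use match to find the longest prefix of spaces, extract it with substr, replace each space with a dot via gsub, and print the modified prefix p followed by the rest of the input line. After match(), the RSTART and RLENGTH variables hold the start position and length of the matched pattern.)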
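The same three steps (match the prefix, extract it, replace inside it) can be mirrored in Python. This is only a minimal sketch for comparison; the helper name `dot_prefix` is hypothetical, and it uses 0 rather than awk's -1 for the "no match" length.

```python
import re

def dot_prefix(line):
    """Replace the leading run of spaces in `line` with dots (hypothetical helper)."""
    m = re.match(r" +", line)            # like awk's match($0, /^ +/)
    n = m.end() if m else 0              # like RLENGTH, but 0 instead of -1 on no match
    prefix = line[:n].replace(" ", ".")  # like gsub(" ", ".", p) on the extracted prefix
    return prefix + line[n:]             # modified prefix followed by the rest of the line

print(dot_prefix("      sub21   f1    f2    f3"))  # prints ......sub21   f1    f2    f3
```

Slicing at `n` plays the role of both substr calls, so the spaces between the later fields are never touched.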

- randomir

1

The Python one is great! The 'awk' one definitely takes some thinking, thanks for the detailed explanation! I wish SO let me accept more than one answer. - shikhanshu

3

In awk, this replaces the first space with a period, but only while that space is preceded by nothing but periods:

$ awk '{while(/^\.* / && sub(/ /,"."));}1' file
top           f1    f2    f3
...sub1       f1    f2    f3
...sub2       f1    f2    f3
......sub21   f1    f2    f3
...sub3       f1    f2    f3
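For readers who find the one-liner dense, the loop can be sketched in Python. This is an illustrative equivalent under the same assumption as the awk version (leading whitespace consists of spaces only); the function name `dot_prefix_loop` is hypothetical.

```python
import re

def dot_prefix_loop(line):
    """Turn leading spaces into dots one at a time (hypothetical helper)."""
    # While the line starts with zero or more dots followed by a space,
    # replace the first space in the line. That first space is exactly the
    # one the pattern just matched, because only dots precede it.
    while re.match(r"\.* ", line):
        line = line.replace(" ", ".", 1)
    return line

print(dot_prefix_loop("   sub1       f1    f2    f3"))
```

Like the awk loop, this also rewrites a line that already starts with dots followed by a space, so it fits input where that cannot occur.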

Here is a Perl version:

$ perl -p -e 'while(s/(^\.*) /\1./){;}' file
top           f1    f2    f3
...sub1       f1    f2    f3
...sub2       f1    f2    f3
......sub21   f1    f2    f3
...sub3       f1    f2    f3

- James Brown

2

If you are using Perl, you can use the e flag... perl -pe 's/^ */"." x length($&)/e' - Sundeep

1

You and your pursuit of short code :P In that case your code can be shortened to perl -pe 'while(s/^\.*\K /./){}' .... but my focus was more on performance ;) - Sundeep

2

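By the way, it is fast. With a million records and four leading spaces each, it starts to make a difference against the other solutions (OK, I only tested mine, the sed one and the other awk one). This one is faster, though: gawk 'BEGIN{FS=OFS=""}{i=1;while($i==" ")$(i++)="."}1' file (on my biased laptop :). - James Brown

You should totally post the speed test results as an answer :) - Sundeep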

1

Clever approach and an elegant awk solution. Nice! - randomir

I learned about \G; you can shorten it further to perl -pe 's/\G /./g'. - Sundeep

1

A bit verbose, but still a fun exercise:

# Function to count the number of leading spaces in a string
# Basically, this counts the number of consecutive elements that satisfy being spaces
def count_leading_spaces(s):
    idx = 0
    # Advance while still inside the string and looking at a space
    # (the bounds check also covers empty and all-space strings)
    while idx < len(s) and s[idx] == ' ':
        idx += 1
    return idx

Finally, open the file and do some work:

with open('file.txt', 'r') as f:
    data = []
    for i, line in enumerate(f):
        # Don't do anything to the field names
        if i == 0:
            new_line = line.rstrip()
        else:
            n_leading_spaces = count_leading_spaces(line)
            # Impute periods for spaces
            new_line = ('.'*n_leading_spaces + line.lstrip()).rstrip()
        data.append(new_line)

Result:

>>> print('\n'.join(data))
top           f1    f2    f3
...sub1       f1    f2    f3
...sub2       f1    f2    f3
......sub21   f1    f2    f3
...sub3       f1    f2    f3

You can also do it this way, which is simpler:

with open('file.txt', 'r') as f:
    data = []
    for i, line in enumerate(f):
        # Don't do anything to the field names
        if i == 0:
            new_line = line.rstrip()
        else:
            n_leading_spaces = len(line) - len(line.lstrip())
            # Impute periods for spaces
            new_line = ('.' * n_leading_spaces + line.lstrip()).rstrip()
        data.append(new_line)

- blacksite

1

"len(line) - len(line.lstrip())" is a neat trick, I don't know why I didn't think of it. Thanks for your answer. - shikhanshu

- John1024 · Accepted Answer

With this as input:

$ cat file
top           f1    f2    f3
   sub1       f1    f2    f3
   sub2       f1    f2    f3
      sub21   f1    f2    f3
   sub3       f1    f2    f3

Try:

$ sed -E ':a; s/^( *) ([^ ])/\1.\2/; ta' file
top           f1    f2    f3
...sub1       f1    f2    f3
...sub2       f1    f2    f3
......sub21   f1    f2    f3
...sub3       f1    f2    f3

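How it works:

:a

This creates a label a.

s/^( *) ([^ ])/\1.\2/

If the line begins with spaces, this replaces the last of the leading spaces with a period.

In more detail, ^( *) matches all the leading spaces except the last one and stores them in group 1. The rest of the regex (which, despite how Stack Overflow may render it, is a space followed by ([^ ])) matches a space followed by a non-space and stores the non-space in group 2.

\1.\2 replaces the matched text with group 1, a period, and group 2.

ta

If the substitute command made a substitution, jump back to label a and try again.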
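The label/substitute/branch loop can be traced in Python with the same pattern. This is a rough sketch rather than a replacement for the sed command; the name `dot_prefix_sed_style` is hypothetical, and the function expects a line without its trailing newline, as in sed's pattern space.

```python
import re

# Same pattern as the sed command: leading spaces, one more space, one non-space.
PATTERN = re.compile(r"^( *) ([^ ])")

def dot_prefix_sed_style(line):
    """Repeat the single substitution until it stops matching (hypothetical helper)."""
    while True:                                          # plays the role of label a
        # count=1 mirrors sed's s command without the g flag
        line, n = PATTERN.subn(r"\1.\2", line, count=1)
        if n == 0:                                       # like ta: loop only after a substitution
            return line

print(dot_prefix_sed_style("   sub1       f1    f2    f3"))
```

Each pass converts the rightmost remaining leading space, so "   sub1" goes through "  .sub1" and " ..sub1" before ending as "...sub1".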

Compatibility:

The above was tested on modern GNU sed. For BSD/OSX sed, one might or might not need to use:
```
sed -E -e :a -e 's/^( *) ([^ ])/\1.\2/' -e ta file
```
On ancient GNU sed, one needs to use -r in place of -E:
```
sed -r ':a; s/^( *) ([^ ])/\1.\2/; ta' file
```
The above assumed that the spaces were blanks. If they are tabs, then you will have to decide what your tabstop is and make substitutions accordingly.
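If the indentation does contain tabs, one possible way to "decide what your tabstop is" is to expand the leading whitespace to a column width first. The sketch below assumes a tab stop of 8 by default; the function name `dot_prefix_with_tabs` and the `tabsize` parameter are hypothetical, so adjust them to however the file was written.

```python
import re

def dot_prefix_with_tabs(line, tabsize=8):
    """Replace leading spaces and tabs with dots, one dot per column (hypothetical helper)."""
    # Split off the leading whitespace (spaces and tabs only); the pattern
    # can match the empty string, so a match object is always returned.
    m = re.match(r"[ \t]*", line)
    lead, rest = line[:m.end()], line[m.end():]
    # expandtabs turns each tab into spaces up to the next tab stop,
    # so the length of the result is the indentation width in columns.
    width = len(lead.expandtabs(tabsize))
    return "." * width + rest

print(dot_prefix_with_tabs("\tsub1       f1    f2    f3", tabsize=4))
```

Only the leading whitespace is expanded, so any tabs between the later fields are left as they are.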