How do I iterate over the characters of a string in a POSIX shell script?

Question

10

A POSIX-compliant shell should provide a mechanism like the following for iterating over a collection of strings:

for x in $(seq 1 5); do
    echo $x
done

But how do I iterate over each character of each word?

- Luis Lavaire.

1

(As an aside, seq is not specified by POSIX; one mechanism for counting to 5 in POSIX might be i=0; while [ "$i" -lt 5 ]; do echo "$i"; i=$((i + 1)); done) - Charles Duffy

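I was trying to demonstrate how iteration is performed. The seq command is not part of the mechanism that performs the iteration. But you are right that your example is POSIX-compliant. - Luis Lavaire.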

4 Answers

3

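Use getopts to process the input one character at a time. The colon (:) in the option string tells getopts to ignore illegal options and set OPTARG. The leading hyphen (-) in the input makes getopts treat the string as a set of options.

If getopts encounters a colon, it does not set OPTARG, so the script uses parameter expansion to return : when OPTARG is unset or empty.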

#!/bin/sh
IFS='
'
split_string () {
  OPTIND=1;
  while getopts ":" opt "-$1"
    do echo "'${OPTARG:-:}'"
  done
}

while read -r line;do
  split_string "$line"
done
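For a quick check without piping anything on stdin, the function can be called on a literal string. This is only a minimal sketch: the sample string "abc" is an assumption chosen for illustration, and printf replaces echo so that the output does not depend on how a given echo treats backslashes.

```shell
#!/bin/sh
# Same technique as above: hand the string to getopts as one bundle of
# single-letter options, so that each character arrives in OPTARG.
split_string () {
  OPTIND=1                        # restart option parsing on every call
  while getopts ":" opt "-$1"; do
    # Fall back to ":" when OPTARG is unset or empty (the colon case
    # described in the answer).
    printf '%s\n' "${OPTARG:-:}"
  done
}

# Sample input (an assumption for illustration only).
split_string "abc"
```

Each character of the sample string should come out on its own line.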

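Like the accepted answer, this processes the input byte by byte rather than character by character, so it breaks multibyte code points. The trick is to detect multibyte code points, concatenate their bytes, and then print them: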

#!/bin/sh
IFS='
'
split_string () {
  OPTIND=1;
  while getopts ":" opt "$1";do
    case "${OPTARG:=:}" in
      ([[:print:]])
        [ -n "$multi" ] && echo "$multi" && multi=
        echo "$OPTARG" && continue
    esac
    multi="$multi$OPTARG"
    case "$multi" in
      ([[:print:]]) echo "$multi" && multi=
    esac
  done
  [ -n "$multi" ] && echo "$multi"
}
while read -r line;do
  split_string "-$line"
done

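An additional case "$multi" is used here to detect whether the multi buffer holds a printable character. This works in shells such as Bash and Zsh, but Dash and busybox ash ignore the locale and do not pattern-match multibyte code points.

The result is still reasonably good: Dash/ash treats a run of multibyte code points as a single character, but correctly handles multibyte characters that are surrounded by single-byte characters.

Depending on your needs, it may even be preferable not to split consecutive multibyte code points, because the next code point may be a combining character that modifies the previous one.

This approach cannot handle a single-byte character followed by a combining character.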

- David Farrell

1

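Interesting. It might improve the performance problem with large strings. I don't know whether it works, but I have tried to improve on your idea. split_string() { OPTIND=1; while getopts ":" opt "-$1"; do echo "'$OPTARG'"; done; } - Koichi Nakashima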

@KoichiNakashima Very nice! On my laptop it parsed a 312Kb text file 1.4x faster, and 55x faster than using 'sed'. - David Farrell

2

This works in both dash and busybox:

echo 'ab * cd' | grep -o .

Output:

a
b

*

c
d

- agc

1

As @Gordon Davisson mentioned, grep's -o option is not POSIX-compliant. - Luis Lavaire.

1

@LuisLavaire, thanks. I'll leave it up for now, since it works with busybox grep... - agc
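Since -o is not specified by POSIX, one possible pipeline-style alternative is fold, which POSIX does specify. Note that this is a swapped-in technique, not what the answer above uses. The sketch below is a rough illustration only: chars_fold is a hypothetical helper name, and how multibyte characters are split depends on the fold implementation and the locale.

```shell
#!/bin/sh
# chars_fold is a hypothetical helper name, not taken from the answer.
# fold -w 1 wraps its input at a width of one column, which for plain
# single-byte text puts every character on its own line.
chars_fold () {
  printf '%s\n' "$1" | fold -w 1
}

chars_fold 'ab * cd'
```

Like the grep version, this starts an external process, so it is likely better suited to occasional use than to a tight loop over many short strings.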

-1

I'm working on a script that needs a stack... so we can use it to iterate over a string

#!/bin/sh
# posix script

pop () {
#    $1 top
#    $2 stack
    eval $1='$(expr "'\$$2'" : "\(.\).*")'
    eval $2='$(expr "'\$$2'" : ".\(.*\)" )'
}

string="ABCDEFG"
while [ "$string" != "" ]
do
    pop c string
    echo "--" "$c"
done

- macemurez
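The pop above runs expr twice per character, and expr can misread strings that look like its own operators or start with a hyphen. As a hedged variation (not the author's code), the same interface can be sketched with parameter expansion only. The helper variable _pop_rest is a hypothetical name introduced here, and both arguments are assumed to be valid variable names.

```shell
#!/bin/sh
# Variation on pop using only parameter expansion (no expr).
#   $1  name of the variable that receives the first character
#   $2  name of the variable holding the string ("stack")
# _pop_rest is a hypothetical scratch variable introduced for this sketch.
pop () {
    eval "_pop_rest=\${$2#?}"            # everything after the first character
    eval "$1=\${$2%\"\$_pop_rest\"}"     # strip that tail to get the first character
    eval "$2=\$_pop_rest"                # the stack keeps the remainder
}

string="ABCDEFG"
while [ -n "$string" ]; do
    pop c string
    printf '%s %s\n' "--" "$c"
done
```

Because nothing is handed to expr, a string such as "-xy" should be consumed one character at a time like any other.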


- Gordon Davisson · Accepted Answer

This is a bit roundabout, but I think it will work in any POSIX-compliant shell. I have tried it in dash, but I don't have busybox handy to test with.

var='ab * cd'

tmp="$var"    # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
    rest="${tmp#?}"    # All but the first character of the string
    first="${tmp%"$rest"}"    # Remove $rest, and you're left with the first character
    echo "$first"
    tmp="$rest"
done

Output:

a
b

*

c
d

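Note that the double quotes around the right-hand side of the assignments are not needed; I just prefer to put double quotes around all expansions rather than trying to remember where it is safe to leave them off. On the other hand, the double quotes in [ -n "$tmp" ] are absolutely necessary, and the inner double quotes in first="${tmp%"$rest"}" are needed if the string contains "*".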
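The point about the inner double quotes can be seen with a two-character string. In this minimal sketch the sample value 'a*' and the variable names quoted and unquoted are assumptions chosen for illustration. Without the inner quotes, the expanded $rest is used as a pattern, and * can match the empty suffix, so nothing is removed.

```shell
#!/bin/sh
tmp='a*'
rest="${tmp#?}"              # rest is now the single character *

quoted="${tmp%"$rest"}"      # inner quotes: * is removed as a literal suffix
unquoted="${tmp%$rest}"      # no inner quotes: * is treated as a glob pattern

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

With the inner quotes, first would hold just the first character; without them it would hold the whole string for this input.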