How to find the common characters between two strings in bash?

Question

5

For example:

s1="my_foo"
s2="not_my_bar"

The expected result would be my_o. How can this be done in bash?

- johannes

Is the underscore meant to act as a delimiter? - ajreal

No, the point is that I want to get all the characters that s1 and s2 have in common. - johannes

In shell scripting there is a huge gap between the simplicity of the task and the complexity of the solution. Very nice! - Karoly Horvath

8 Answers

2

A late entry; I only just found this page:

# Compares the strings position by position; FS="" (one character per field) is a gawk extension.
echo "$str2" |
  awk 'BEGIN{FS=""}
  { n=1; while(n<=NF) {
   if ($n == substr(test,n,1)) { if(!found[$n]) printf("%s",$n); found[$n]=1;} n++;
  } print ""}' test="$str1"

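Here is another one; it builds a regular expression to match against (note: it does not work with special characters, but that is easy to fix with another sed):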

echo "$str1" |
  grep -E -o ^`echo -n "$str2" | sed 's/\(.\)/(|\1/g'; echo "$str2" | sed 's/./)/g'`

- Karoly Horvath

Using awk is a good idea, but it does not work with this example:

awk 'BEGIN{FS=""} { n=0; while(n<=NF) {if ($n == substr(test,n,1)) {printf("%c",$n);} n++;} print ""}' test="/aa/ba/" <<< "/aa/bb/"

It prints /aa/b/ instead of /aa/b. Please try to fix your answer. Thanks. - oHo

1

@olibre: a slightly odd report :) I fixed it. - Karoly Horvath

2

Assuming the strings do not contain embedded newlines:

s1='my_foo' s2='my_bar'
intersect=$(
  comm -12 \
    <(fold -w1 <<< "$s1" | sort -u) \
    <(fold -w1 <<< "$s2" | sort -u) |
    tr -d '\n'
)

printf '%s\n' "$intersect"

And another one:

tr -dc "$s2" <<< "$s1"

- Dimitre Radoulov

1

Your second solution, using tr, is nice, but it does not remove duplicates. - dogbane

@dogbane, good point! I should have mentioned that. To remove duplicates, both values should be passed through the "fold .. | sort .." filter. - Dimitre Radoulov
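The deduplication mentioned in this comment might look roughly like the sketch below. It is only an illustration, not the answerer's own code: the variable name common is made up here, LC_ALL=C is added so the sort order is predictable, and only s1 is filtered because tr treats its argument as a set anyway. It also assumes s2 contains nothing that tr treats specially (a leading -, ranges such as a-z, backslashes, brackets).

```shell
s1="my_foo"
s2="not_my_bar"

# Split s1 into one character per line, drop duplicates, then delete
# every character that is NOT in s2 (this also removes the newlines).
common=$(printf '%s\n' "$s1" | fold -w 1 | LC_ALL=C sort -u | tr -dc "$s2")

printf '%s\n' "$common"   # prints _moy
```

Because of the sort, the characters come out in byte order rather than in the order they appear in s1.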

1

This should be a portable solution:

s1="my_foo"  
s2="my_bar"
while [ -n "$s1" -a -n "$s2" ]
do
    if [ "${s1:0:1}" = "${s2:0:1}" ]
    then
        printf %s "${s1:0:1}"
    else
        break
    fi
    s1="${s1:1:${#s1}}"
    s2="${s2:1:${#s2}}"
done

- l0b0

This only matches characters at the same index in both strings. So if you have my_foo_bar and my_bar, it will not work. - dogbane

1

comm=""
for ((i=0;i<${#s1};i++))
do
  # Quote both sides so that an empty or special character does not break test
  if test "${s1:$i:1}" = "${s2:$i:1}"
  then
    comm=${comm}${s1:$i:1}
  fi
done

- ajreal

1

A solution that runs in a single sed invocation:

echo -e "$s1\n$s2" | sed -e 'N;s/^/\n/;:begin;s/\n\(.\)\(.*\)\n\(.*\)\1\(.*\)/\1\n\2\n\3\4/;t begin;s/\n.\(.*\)\n\(.*\)/\n\1\n\2/;t begin;s/\n\n.*//'

Like all cryptic sed scripts, it needs an explanation, given here as a sed script file that can be run with echo -e "$s1\n$s2" | sed -f script:

# Read the next line so s1 and s2 are in the pattern space only separated by a \n.
N
# Put a \n at the beginning of the pattern space.
s/^/\n/
# During the script execution, the pattern space will contain <result so far>\n<what is left of s1>\n<what is left of s2>.
:begin
# If the 1st char of s1 is found in s2, remove it from s1 and s2, append it to the result and do this again until it fails.
s/\n\(.\)\(.*\)\n\(.*\)\1\(.*\)/\1\n\2\n\3\4/
t begin
# When previous substitution fails, remove 1st char of s1 and try again to find 1st char of s1 in s2.
s/\n.\(.*\)\n\(.*\)/\n\1\n\2/
t begin
# When previous substitution fails, s1 is empty so remove the \n and what is left of s2.
s/\n\n.*//

If you want to remove duplicates, add the following at the end of the script:

:end;s/\(.\)\(.*\)\1/\1\2/;t end

Edit: I realized that dogbane's pure shell solution uses the same algorithm and is probably more efficient.
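For readers who find the sed script hard to follow, the loop it describes can be sketched in plain bash. This is an illustration under assumptions, not the answerer's code: multiset_common is a hypothetical name, and the sketch removes the first matching character from the second string where the sed regex, being greedy, removes the last one. That difference does not change the result.

```shell
#!/usr/bin/env bash
# multiset_common is a hypothetical helper name used for illustration only.
multiset_common() {
  local a=$1 b=$2 out="" c i
  for ((i = 0; i < ${#a}; i++)); do
    c=${a:i:1}
    # If the current character of a still occurs in b, keep it and
    # consume one occurrence from b, so each character of b is used once.
    if [[ $b == *"$c"* ]]; then
      out+=$c
      b=${b/"$c"/}
    fi
  done
  printf '%s\n' "$out"
}

multiset_common "my_foo" "not_my_bar"   # prints my_o
```

Because every match consumes a character of the second string, a repeated character is reported only as many times as it occurs in both strings.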

- jfg956

0

"flower","flow","flight" --> output fl

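s="flower"
t="flow"
i=0
# First pass: characters of s that also occur in t, without duplicates.
while [ $i -ne ${#s} ]
do
    c=${s:$i:1}
    # "$c" is quoted so that glob characters are matched literally.
    if [[ $result != *"$c"* && $t == *"$c"* ]]
    then
      result=$result$c
    fi
    ((i++))
done
echo "$result"
p=$result
q="flight"
j=0

# Second pass: intersect the first result with the third string.
while [ $j -ne ${#p} ]
do
    c1=${p:$j:1}
    if [[ $result1 != *"$c1"* && $q == *"$c1"* ]]
    then
      result1=$result1$c1
    fi
    ((j++))
done
echo "$result1"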

- Uzzal Basak

0

Since everybody loves punctuation-laden Perl one-liners:

perl -e '$a{$_}++ for split "",shift; $b{$_}++ for split "",shift; for (sort keys %a){print if defined $b{$_}}' my_foo not_my_bar

This builds the hashes %a and %b from the input strings,
then prints the characters that occur in both strings.

Output:

_moy
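The same hash-based idea can be sketched in bash itself. This is an assumption-laden illustration rather than a translation of the one-liner: common_sorted is a hypothetical name, associative arrays need bash 4 or later, LC_ALL=C is added to make the sort order predictable, and the strings are assumed to contain ordinary characters without newlines.

```shell
#!/usr/bin/env bash
# common_sorted is a hypothetical helper name; requires bash 4+.
common_sorted() {
  local a=$1 b=$2 c i out=""
  local -A in_a=() in_b=()
  # Record every character of each string as a key.
  for ((i = 0; i < ${#a}; i++)); do c=${a:i:1}; in_a["$c"]=1; done
  for ((i = 0; i < ${#b}; i++)); do c=${b:i:1}; in_b["$c"]=1; done
  # Walk the keys of the first table in sorted order and keep those
  # that are also keys of the second table.
  while IFS= read -r c; do
    if [[ -n $c && -n ${in_b["$c"]+x} ]]; then
      out+=$c
    fi
  done < <(printf '%s\n' "${!in_a[@]}" | LC_ALL=C sort)
  printf '%s\n' "$out"
}

common_sorted "my_foo" "not_my_bar"   # prints _moy
```

As with the Perl version, the result is sorted and free of duplicates, so the original character order is lost.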

- Chris Koknat

Page content provided by Stack Overflow.

- dogbane · Accepted Answer

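My solution is as follows: use fold to split each string into one character per line, sort to sort the lists, comm to compare the two lists, and finally tr to delete the newlines.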

comm -12 <(fold -w1 <<< "$s1" | sort -u) <(fold -w1 <<< "$s2" | sort -u) | tr -d '\n'

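Alternatively, here is a pure Bash solution (which also preserves the order of the characters). It iterates over the first string and checks whether each character occurs in the second string.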

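s="temp_foo_bar"
t="temp_bar"
result=""
i=0
while [ $i -ne ${#s} ]
do
    c=${s:$i:1}
    # Keep c only if it is not already in result and occurs in t;
    # "$c" is quoted so that glob characters are matched literally.
    if [[ $result != *"$c"* && $t == *"$c"* ]]
    then
      result=$result$c
    fi
    ((i++))
done
echo "$result"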

Output: temp_bar
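For reuse, the pure Bash loop can be wrapped in a function. The sketch below is one possible shape, not part of the original answer: common_chars is a hypothetical name, and "$c" is quoted inside the patterns so that characters such as * or ? are compared literally instead of acting as wildcards.

```shell
#!/usr/bin/env bash
# common_chars is a hypothetical helper name, not part of the original answer.
common_chars() {
  local s=$1 t=$2 result="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    # Append c if it is new to result and also occurs in t.
    if [[ $result != *"$c"* && $t == *"$c"* ]]; then
      result+=$c
    fi
  done
  printf '%s\n' "$result"
}

common_chars "my_foo" "not_my_bar"       # prints my_o
common_chars "temp_foo_bar" "temp_bar"   # prints temp_bar
common_chars "a*b" "xyz"                 # prints an empty line
```

With an unquoted $c, the last call would report * as a common character, because the pattern would then match any string.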