在字符串列表中查找相同位置的字符的算法？

Question

在字符串列表中查找相同位置的字符的算法？

algorithmstring

3

假设我有以下这些字符串：

Toby Tiny Tory Tily

是否有一种算法可以轻松地创建一个列表，其中包含所有这些字符串中相同位置的公共字符？（在这种情况下，相同位置的公共字符是第0个位置上的'T'和第3个位置上的'y'）

我尝试查看了一些用于DNA序列匹配的算法，但它们似乎大多只用于查找无论其位置如何都具有相同子字符串。

- ilitirit

8个回答

1

一些通用代码，其性能相当差，为O(n^2)

str[] = { "Toby", "Tiny", "Tory", "Tily" };
result = null;
largestString = str.getLargestString(); // Made up function
str.remove(largestString)
for (i = 0; i < largestString.length; i++) {
   hits = 0;
   foreach (str as value) {
      if (i < value.length) {
         if (value.charAt(i) == largestString.charAt(i))
            hits++;
      }
   }
   if (hits == str.length)
      result += largestString.charAt(i);
}
print(str.items);

- Josh Smeaton

1

我想不出任何特别优化的东西。

你可以做这样的事情，这应该不太难：

        //c# -- assuming your strings are in a List<string> named Names
        int shortestLength = Names[0].Length, j;
        char[] CommonCharacters;
        char single;

        for (int i = 1; i < Names.Count; i++)
        {
            if (Names[i].Length < shortestLength) shortestLength = Names[i].Length;
        }

        CommonCharacters = new char[shortestLength];
        for (int i = 0; i < shortestLength; i++)
        {
            j = 1;
            single = Names[0][i];
            CommonCharacters[i] = single;
            while (j < shortestLength)
            {
                if (single != Names[j][i])
                {
                    CommonCharacters[i] = " "[0];
                    break;
                }
                j++;
            }
        }

这将为您提供一个在列表中所有内容中都相同的字符数组。

- theo

1

这样的东西怎么样？

strings = %w(Tony Tiny Tory Tily)
positions = Hash.new { |h,k| h[k] = Hash.new { |h,k| h[k] = 0 } }
strings.each { |str| 
  0.upto(str.length-1) { |i| 
    positions[i][str[i,1]]+=1 
  }
}

执行结束时，结果将为：

positions = {
  0=>{"T"=>4},
  1=>{"o"=>2, "i"=>2}, 
  2=>{"l"=>1, "n"=>2, "r"=>1},
  3=>{"y"=>4}
}

- hoyhoy

这就是我喜欢 Ruby 的原因。它让你快速完成任务。 - Josh Smeaton

1

这是一段5行ruby算法的代码示例：

#!/usr/bin/env ruby
chars = STDIN.gets.chomp.split("")
STDIN.each do |string|
  chars = string.chomp.split("").zip(chars).map {|x,y| x == y ? x : nil }
end
chars.each_index {|i| puts "#{chars[i]}  #{i}" if chars[i] }

请将以下内容放入commonletters.rb文件中。示例使用方法：

$ commonletters.rb < input.txt
T  0
y  3

假设input.txt包含：

Toby
Tiny
Tory
Tily

它应该能处理您输入的任何内容。如果输入文件为空，它将会出错，但您可以自己修复。这是O(n)（n是输入中字符的总数）。

- tgamblin

1

这里是一个Python的简单版本：

items = ['Toby', 'Tiny', 'Tory', 'Tily']
tuples = sorted(x for item in items for x in enumerate(item))
print [x[0] for x in itertools.groupby(tuples) if len(list(x[1])) == len(items)]

这将打印：

[(0, 'T'), (3, 'y')]

编辑：这是一个更好的版本，它不需要创建（可能）巨大的元组列表：

items = ['Toby', 'Tiny', 'Tory', 'Tily']
minlen = min(len(x) for x in items)
print [(i, items[0][i]) for i in range(minlen) if all(x[i] == items[0][i] for x in items)]

- Arachnid

1

#include <iostream>

int main(void)
{
    char words[4][5] = 
    {
        "Toby",
        "Tiny",
        "Tory",
        "Tily"
    };

    int wordsCount = 4;
    int lettersPerWord = 4;

    int z;
    for (z = 1; z < wordsCount; z++)
    {
        int y;
        for (y = 0; y < lettersPerWord; y++)
        {
            if (words[0][y] != words[z][y])
            {
                words[0][y] = ' ';
            }
        }
    }

    std::cout << words[0] << std::endl;

    return 0;
}

- EvilTeach

0

在Lisp中：

CL-USER> (defun common-chars (&rest strings)
           (apply #'map 'list #'char= strings))
COMMON-CHARS

只需传入字符串：

CL-USER> (common-chars "Toby" "Tiny" "Tory" "Tily")
(T NIL NIL T)

如果您想要字符本身：

CL-USER> (defun common-chars2 (&rest strings)
           (apply #'map
                  'list
                  #'(lambda (&rest chars)
                      (when (apply #'char= chars)
                        (first chars))) ; return the char instead of T
                  strings))
COMMON-CHARS2

CL-USER> (common-chars2 "Toby" "Tiny" "Tory" "Tily")
(#\T NIL NIL #\y)

如果您不关心位置，只想要一个常见字符列表：

CL-USER> (format t "~{~@[~A ~]~}" (common-chars2 "Toby" "Tiny" "Tory" "Tily"))
T y 
NIL

我承认这不是一个算法...只是一种使用现有功能在Lisp中完成它的方法。

如果你想手动完成它，正如已经说过的，你需要循环比较给定索引处的所有字符。如果它们都匹配，保存匹配的字符。

- Morikal

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Brian R. Bondy · Accepted Answer

在特定位置找到所有字符串都包含的字符列表非常简单。只需要对每个字符位置的每个字符串进行迭代。如果任何一个字符串的字符与其最接近的邻居字符串的字符不匹配，则该位置不包含公共字符。

对于任何i = 0到length-1...一旦找到Si[x]！= Si + 1[x]，就可以跳转到下一个位置x + 1。

其中Si是列表中的第i个字符串。[x]是位置x上的字符。