match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] )
只要两个列表中的所有元素顺序相同,就会返回True。
我正在处理对象列表,但为了简单起见,上面使用了字符串。
match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] )
看起来你并没有使用字符串,而是在比较对象。因此,我将提供一个显式算法——正则表达式提供了针对字符串的好解决方案,但从你在问题评论中所说的内容来看,似乎一个显式、简单的算法会让事情变得更容易。
结果证明,这个问题可以用比之前的回答更简单的算法来解决:
def matcher (l1, l2):
if (l1 == []):
return (l2 == [] or l2 == ['*'])
if (l2 == [] or l2[0] == '*'):
return matcher(l2, l1)
if (l1[0] == '*'):
return (matcher(l1, l2[1:]) or matcher(l1[1:], l2))
if (l1[0] == l2[0]):
return matcher(l1[1:], l2[1:])
else:
return False
关键思想是遇到通配符时,可以探索两个选项:
True
*则停止探索其余部分(如果有很多星号,则可能)。仍然存在指数级的情况(例如,在两个模式中每隔一个字符就有1个星号),但它们应该开始变得罕见了。无论如何,这种递归排序不是Python的急切求值语义吗?即左侧返回之前甚至不看右侧调用? - Francois Gimport re
def match(pat, lst):
regex = ''.join(term if term != '*' else '.*' for term in pat) + '$'
s = ''.join(lst)
return re.match(regex, s) is not None
print match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] )
它使用正则表达式。通配符(*
)被更改为.*
,而所有其他搜索术语保持不变。
一个注意点是,如果您的搜索术语可能包含在正则表达式语言中具有特殊含义的内容,则需要正确转义这些内容。在match
函数中处理这个问题非常容易,我只是不确定这是否是您所需的。
我建议将['A', 'B', '*', 'D']
转换为'^AB.*D$'
,将['A', 'B', 'C', 'C', 'C', 'D']
转换为'ABCCCD'
,然后使用re
模块(正则表达式)进行匹配。
如果您的列表元素只有一个字符且为字符串,则此方法有效。
类似于:
import(re)
def myMatch( patternList, stringList ):
# convert pattern to flat string with wildcards
# convert AB*D to valid regex ^AB.*D$
pattern = ''.join(patternList)
regexPattern = '^' + pattern.replace('*','.*') + '$'
# perform matching
against = ''.join(stringList) # convert ['A','B','C','C','D'] to ABCCCD
# return whether there is a match
return (re.match(regexPattern,against) is not None)
如果列表包含数字或单词,请选择一个您不希望出现在其中的字符,例如#
。然后将['Aa','Bs','Ce','Cc','CC','Dd']
转换为Aa#Bs#Ce#Cc#CC#Dd
,通配符模式['Aa','Bs','*','Dd']
可以转换为^Aa#Bs#.*#Dd$
,并执行匹配。
实际上,这只是意味着在myMatch
中所有的''.join(...)
都变成了'#'.join(...)
。
''.join
很快,而且正则表达式也相当简单(没有回顾)。 - mathematical.coffee我同意关于可以使用正则表达式完成此操作的评论。例如:
import re
lst = ['A', 'B', 'C', 'C', 'C', 'D']
pattern = ['A', 'B', 'C+', 'D']
print re.match(''.join(pattern), ''.join(lst)) # Will successfully match
编辑:正如评论所指出的那样,有时我们可能只知道需要匹配某个字符,但不知道具体是哪一个。这种情况下,正则表达式仍然非常有用:
import re
lst = ['A', 'B', 'C', 'C', 'C', 'D']
pattern = r'AB(\w)\1*D'
print re.match(pattern, ''.join(lst)).groups()
我同意,正则表达式通常是处理这种事情的方法。虽然这个算法可以工作,但对我来说看起来很复杂。不过写它还是很有趣的。
def match(listx, listy):
listx, listy = map(iter, (listx, listy))
while 1:
try:
x = next(listx)
except StopIteration:
# This means there are values left in listx that are not in listy.
try:
y = next(listy)
except StopIteration:
# This means there are no more values to be compared in either
# listx or listy; since no exception was raied elsewhere, the
# lists match.
return True
else:
# This means that there are values in listy that are not in
# listx.
return False
else:
try:
y = next(listy)
except StopIteration:
# Similarly, there are values in listy that aren't in listx.
return False
if x == y:
pass
elif x == '*':
try:
# Get the value in listx after '*'.
x = next(listx)
except StopIteration:
# This means that listx terminates with '*'. If there are any
# remaining values of listy, they will, by definition, match.
return True
while 1:
if x == y:
# I didn't shift to the next value in listy because I
# assume that a '*' matches the empty string and well as
# any other.
break
else:
try:
y = next(listy)
except StopIteration:
# This means there is at least one remaining value in
# listx that is not in listy, because listy has no
# more values.
return False
else:
pass
# Same algorithm as above, given there is a '*' in listy.
elif y == '*':
try:
y = next(listy)
except StopIteration:
return True
while 1:
if x == y:
break
else:
try:
x = next(listx)
except StopIteration:
return False
else:
pass
我有一段 C++ 代码,似乎可以实现你想要的功能(输入是字符串而不是字符数组,但你需要进行适当的调整)。
bool Utils::stringMatchWithWildcards (const std::string str, const std::string strWithWildcards)
PRINT("Starting in stringMatchWithWildcards('" << str << "','" << strWithWildcards << "')");
const std::string wildcard="*";
const bool startWithWildcard=(strWithWildcards.find(wildcard)==0);
int pos=strWithWildcards.rfind(wildcard);
const bool endWithWildcard = (pos!=std::string::npos) && (pos+wildcard.size()==strWithWildcards.size());
// Basically, the point is to split the string with wildcards in strings with no wildcard.
// Then search in the first string for the different chunks of the second in the correct order
std::vector<std::string> vectStr;
boost::split(vectStr, strWithWildcards, boost::is_any_of(wildcard));
// I expected all the chunks in vectStr to be non-empty. It doesn't seem the be the case so let's remove them.
vectStr.erase(std::remove_if(vectStr.begin(), vectStr.end(), std::mem_fun_ref(&std::string::empty)), vectStr.end());
// Check if at least one element (to have first and last element)
if (vectStr.empty())
{
const bool matchEmptyCase = (startWithWildcard || endWithWildcard || str.empty());
PRINT("Match " << (matchEmptyCase?"":"un") << "successful (empty case) : '" << str << "' and '" << strWithWildcards << "'");
return matchEmptyCase;
}
// First Element
std::vector<std::string>::const_iterator vectStrIt = vectStr.begin();
std::string aStr=*vectStrIt;
if (!startWithWildcard && str.find(aStr, 0)!=0) {
PRINT("Match unsuccessful (beginning) : '" << str << "' and '" << strWithWildcards << "'");
return false;
}
// "Normal" Elements
bool found(true);
pos=0;
std::vector<std::string>::const_iterator vectStrEnd = vectStr.end();
for ( ; vectStrIt!=vectStrEnd ; vectStrIt++)
{
aStr=*vectStrIt;
PRINT( "Searching '" << aStr << "' in '" << str << "' from " << pos);
pos=str.find(aStr, pos);
if (pos==std::string::npos)
{
PRINT("Match unsuccessful ('" << aStr << "' not found) : '" << str << "' and '" << strWithWildcards << "'");
return false;
} else
{
PRINT( "Found at position " << pos);
pos+=aStr.size();
}
}
// Last Element
const bool matchEnd = (endWithWildcard || str.rfind(aStr)+aStr.size()==str.size());
PRINT("Match " << (matchEnd?"":"un") << "successful (usual case) : '" << str << "' and '" << strWithWildcards);
return matchEnd;
}
/* Tested on these values :
assert( stringMatchWithWildcards("ABC","ABC"));
assert( stringMatchWithWildcards("ABC","*"));
assert( stringMatchWithWildcards("ABC","*****"));
assert( stringMatchWithWildcards("ABC","*BC"));
assert( stringMatchWithWildcards("ABC","AB*"));
assert( stringMatchWithWildcards("ABC","A*C"));
assert( stringMatchWithWildcards("ABC","*C"));
assert( stringMatchWithWildcards("ABC","A*"));
assert(!stringMatchWithWildcards("ABC","BC"));
assert(!stringMatchWithWildcards("ABC","AB"));
assert(!stringMatchWithWildcards("ABC","AB*D"));
assert(!stringMatchWithWildcards("ABC",""));
assert( stringMatchWithWildcards("",""));
assert( stringMatchWithWildcards("","*"));
assert(!stringMatchWithWildcards("","ABC"));
*/
这不是我真正引以为豪的事情,但目前似乎还可以。希望你能发现它有用。