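I have two postcodes as char* that I want to compare, ignoring case.
Is there a function to do this?
Or do I have to loop through each character, apply tolower, and then do the comparison?
Any idea how such a function would treat digits in the string?
Thanks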
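There is no function that does this in the C standard. Unix systems that comply with POSIX are required to have strcasecmp in the header strings.h; Microsoft systems have stricmp. To stay portable, write your own: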
int strcicmp(char const *a, char const *b)
{
for (;; a++, b++) {
int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
if (d != 0 || !*a)
return d;
}
}
But note that none of these solutions work with UTF-8 strings, only ASCII ones.
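On the question about digits: tolower() returns any byte that is not an upper-case letter unchanged, so digits and spaces are compared exactly as strcmp() would compare them. The sketch below repeats strcicmp() from the answer and adds same_postcode(), a hypothetical helper name used only for illustration.

```c
#include <ctype.h>

/* strcicmp() as in the answer above: fold both bytes, then subtract. */
int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

/* Hypothetical helper: nonzero when two postcodes match, ignoring case. */
int same_postcode(char const *a, char const *b)
{
    return strcicmp(a, b) == 0;
}
```

With this, same_postcode("sw1a 1aa", "SW1A 1AA") is true, while "SW1A 1AA" and "SW1A 2AA" still differ because the digits are compared as they are.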
This will fail horribly if a or b is NULL. - YoTengoUnLCD
Crashing when a and/or b is NULL is commonly accepted practice, since a null pointer does not point to a string. It is not a bad check to add, yet what should it return? Should cmp("", NULL) return 0 or INT_MIN? There is no consensus on this. Note: C allows strcmp(NULL, "abc"); to be undefined behavior. - chux - Reinstate Monica
Take a look at strcasecmp() in strings.h.
[...] int strcasecmp(const char *s1, const char *s2); in strings.h. - Brigham
[...] stricmp. @entropo: strings.h is a header for compatibility with 1980s Unix systems. - Fred Foo
[...] strings.h. It also defines the strcasecmp function, to be declared in that header. ISO C does not have it, though. - Fred Foo
[...] strcasecmp should be declared there. But every compiler I have used declares strcasecmp in string.h. At least cl, g++, and the forte c++ compiler do. - Mihran Hovsepyan
I found these functions built in to <strings.h>, which declares string functions beyond those in the standard header <string.h>.
Here are the relevant signatures:
int strcasecmp(const char *, const char *);
int strncasecmp(const char *, const char *, size_t);
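A minimal usage sketch for these two functions, assuming a POSIX system where <strings.h> is available. The helper names postcode_matches() and outward_code_matches() are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <strings.h>  /* POSIX header, not the ISO C <string.h> */

/* Hypothetical helper: nonzero when the whole strings match, ignoring case. */
int postcode_matches(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}

/* Hypothetical helper: nonzero when the first n bytes match, ignoring case. */
int outward_code_matches(const char *a, const char *b, size_t n)
{
    return strncasecmp(a, b, n) == 0;
}
```

Both functions return an integer less than, equal to, or greater than zero, like strcmp(), so only the sign or a comparison with zero should be relied on.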
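I also found an equivalent in the xnu kernel (osfmk/device/subrs.c), implemented as shown in the following code, so you should not expect any difference in behavior on digits compared with the original strcmp function.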
static int tolower(unsigned char ch) {
if (ch >= 'A' && ch <= 'Z')
ch = 'a' + (ch - 'A');
return ch;
}
int strcasecmp(const char *s1, const char *s2) {
const unsigned char *us1 = (const u_char *)s1,
*us2 = (const u_char *)s2;
while (tolower(*us1) == tolower(*us2++))
if (*us1++ == '\0')
return (0);
return (tolower(*us1) - tolower(*--us2));
}
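[...] the strncasecmp() function! - Mike C.
strcasecmp() and strncasecmp() are not part of the standard C library, but they are common extensions on *nix. - chux - Reinstate Monica
[...] tolower() function. tolower() is a required function per literally every version of the C standard. - Andrew Henle
Comparing as lower case or as upper case? (a common enough issue)
Both functions below return 0 for strcicmpL("A", "a") and strcicmpU("A", "a").
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return results of different sign, because '_' often lies between the upper-case and lower-case letters.
This affects the sort order when used with qsort(..., ..., ..., strcicmp). Non-standard library C functions such as the commonly available stricmp() or strcasecmp() tend to be well defined and favor comparing via lower case. Yet variations exist.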
int strcicmpL(char const *a, char const *b) {
while (*b) {
int d = tolower(*a) - tolower(*b);
if (d) {
return d;
}
a++;
b++;
}
return tolower(*a);
}
int strcicmpU(char const *a, char const *b) {
while (*b) {
int d = toupper(*a) - toupper(*b);
if (d) {
return d;
}
a++;
b++;
}
return toupper(*a);
}
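To make the effect on sort order concrete, here is a small sketch assuming ASCII and the default "C" locale, where '_' (95) sits between 'Z' (90) and 'a' (97). The names cmp_folded(), by_lower() and by_upper() are hypothetical; they are not part of any library.

```c
#include <ctype.h>
#include <stdlib.h>

/* Compare two strings after folding every byte with fold (tolower or toupper). */
static int cmp_folded(const char *a, const char *b, int (*fold)(int))
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    for (;; ua++, ub++) {
        int d = fold(*ua) - fold(*ub);
        if (d != 0 || *ua == '\0')
            return d;
    }
}

/* qsort() comparators for an array of const char * (hypothetical names). */
int by_lower(const void *p, const void *q)
{
    return cmp_folded(*(const char *const *)p, *(const char *const *)q, tolower);
}

int by_upper(const void *p, const void *q)
{
    return cmp_folded(*(const char *const *)p, *(const char *const *)q, toupper);
}
```

Sorting the array {"a", "_", "B"} with qsort(v, 3, sizeof *v, by_lower) puts "_" first, while by_upper puts "_" last, so the two folds are interchangeable only for equality tests.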
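char can have a negative value. (not rare)
toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char were converted to unsigned char, regardless of whether char is signed or unsigned.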
tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct (Almost - see following)
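char can have a negative value and not use 2's complement. (rare)
The above does not handle -0 or other negative values properly, because the bit pattern should be interpreted as unsigned char. To handle all integer encodings properly, change the pointer type first.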
// tolower((unsigned char) *a);
tolower(*(const unsigned char *)a); // Correct
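Locale (less common)
Although character sets using ASCII code (0-127) are ubiquitous, the remaining codes tend to have locale-specific issues. So strcasecmp("\xE4", "a") might return 0 on one system and non-zero on another.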
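A related effect can be probed directly: whether a byte above 127 is folded at all depends on the active LC_CTYPE locale. The sketch below asks whether tolower() maps 0xC4 to 0xE4 (the Latin-1 codes for upper-case and lower-case a-umlaut) under a named locale. The function name is hypothetical, and which locale names are installed is an assumption that varies by system; only "C" is guaranteed to exist.

```c
#include <ctype.h>
#include <locale.h>
#include <string.h>

/*
 * Returns 1 if tolower(0xC4) == 0xE4 under locale_name, 0 if not,
 * and -1 if the locale cannot be selected. The previous LC_CTYPE
 * setting is restored before returning.
 */
int a_umlaut_folds_in_locale(const char *locale_name)
{
    char saved[128];
    const char *current = setlocale(LC_CTYPE, NULL);

    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);  /* later setlocale() calls may overwrite current */

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;           /* locale not installed on this system */

    int folds = (tolower(0xC4) == 0xE4);
    setlocale(LC_CTYPE, saved);
    return folds;
}
```

In the "C" locale only A-Z are folded, so the call with "C" reports 0; a Latin-1 locale, where one is installed, may report 1.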
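Unicode (the way of the future)
If a solution needs to handle more than ASCII, consider a unicode_strcicmp(). As the C library does not provide such a function, a pre-coded function from some alternate library is recommended. Writing your own unicode_strcicmp() is a daunting task.
Do all letters map one lower to one upper? (pedantic)
[A-Z] maps one-to-one with [a-z], yet various locales map several lower-case characters to one upper-case character, and vice versa. Further, some upper-case characters may lack a lower-case equivalent, and again vice versa.
This obliges code to convert through both tolower() and toupper().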
int d = tolower(toupper(*a)) - tolower(toupper(*b));
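Again, sorting may give different results depending on whether code uses tolower(toupper(*a)) or toupper(tolower(*a)).
Portability
@B. Nadolson recommends avoiding rolling your own strcicmp(), and this is reasonable, except when code needs highly equivalent portable functionality.
Below is an approach that even performed faster than some system-provided functions. It does a single compare per loop rather than two, by using 2 different tables that differ at '\0'. Your results may vary.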
static unsigned char low1[UCHAR_MAX + 1] = {
    0, 1, 2, 3, ...
    '@', 'a', 'b', 'c', ... 'z', '[', ...  // @ABC... Z[...
    '`', 'a', 'b', 'c', ... 'z', '{', ...  // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
    // v--- Not zero, but A which matches none in `low1[]`
    'A', 1, 2, 3, ...
    '@', 'a', 'b', 'c', ... 'z', '[', ...
    '`', 'a', 'b', 'c', ... 'z', '{', ...
};
int strcicmp_ch(char const *a, char const *b) {
// compare using tables that differ slightly.
while (low1[*(const unsigned char *)a] == low2[*(const unsigned char *)b]) {
a++;
b++;
}
// Either strings differ or null character detected.
// Perform subtraction using same table.
return (low1[*(const unsigned char *)a] - low1[*(const unsigned char *)b]);
}
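Since the table literals above are elided, here is one way to get a runnable version: fill both tables at start-up from tolower() instead of writing them out. This is a sketch under the assumption of the default "C" locale, where no byte folds to 'A', so 'A' is safe as the marker in low2[0]. The names cmp_tables_init() and cmp_tables() are hypothetical.

```c
#include <ctype.h>
#include <limits.h>

static unsigned char low1[UCHAR_MAX + 1];
static unsigned char low2[UCHAR_MAX + 1];

/* Fill both tables once, before the first comparison. */
void cmp_tables_init(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++) {
        low1[i] = (unsigned char)tolower(i);
        low2[i] = low1[i];
    }
    /* The tables differ only here: 'A' never occurs in low1[]. */
    low2[0] = 'A';
}

/* One table comparison per loop iteration; the null character ends the loop. */
int cmp_tables(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;

    while (low1[*ua] == low2[*ub]) {
        ua++;
        ub++;
    }
    /* Either the strings differ or a null character was reached. */
    return low1[*ua] - low1[*ub];
}
```

The loop stops at the end of either string because low1[0] is 0 while low2[0] is 'A', and no other entry in either table can produce that pair.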
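[...] stricmp(). It compares two strings without regard to case.
strncmpci(), a direct, drop-in, case-insensitive string comparison replacement for strncmp() and strcmp()
I'm not really a fan of the most-upvoted answer here (in part because it seems incorrect: it should continue if only one of the strings, rather than both, has reached its null terminator, and it does not), so I wrote my own.
This is a direct drop-in replacement for strncmp(), and it has been tested with numerous test cases, as shown below.
It is identical to strncmp() except:
1. It is case-insensitive.
2. The behavior is NOT undefined (it is well-defined) if either string is a null ptr. Regular strncmp() has undefined behavior if either string is a null ptr (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
3. It returns INT_MIN as a special sentinel error value if either input string is a NULL ptr.
int strncmpci(const char * str1, const char * str2, size_t num)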
{
int ret_code = 0;
size_t chars_compared = 0;
if (!str1 || !str2)
{
ret_code = INT_MIN;
return ret_code;
}
while ((chars_compared < num) && (*str1 || *str2))
{
ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
if (ret_code != 0)
{
break;
}
chars_compared++;
str1++;
str2++;
}
return ret_code;
}
Fully-commented version:
/// \brief Perform a case-insensitive string compare (`strncmp()` case-insensitive) to see
/// if two C-strings are equal.
/// \note 1. Identical to `strncmp()` except:
/// 1. It is case-insensitive.
/// 2. The behavior is NOT undefined (it is well-defined) if either string is a null
/// ptr. Regular `strncmp()` has undefined behavior if either string is a null ptr
/// (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
/// 3. It returns `INT_MIN` as a special sentinel value for certain errors.
/// - Posted as an answer here: https://dev59.com/WW025IYBdhLWcg3wvIu2#55293507.
/// - Aided/inspired, in part, by `strcicmp()` here:
/// https://dev59.com/WW025IYBdhLWcg3wvIu2#5820991.
/// \param[in] str1 C string 1 to be compared.
/// \param[in] str2 C string 2 to be compared.
/// \param[in] num max number of chars to compare
/// \return A comparison code (identical to `strncmp()`, except with the addition
/// of `INT_MIN` as a special sentinel value):
///
/// INT_MIN (usually -2147483648 for int32_t integers) Invalid arguments (one or both
/// of the input strings is a NULL pointer).
/// <0 The first character that does not match has a lower value in str1 than
/// in str2.
/// 0 The contents of both strings are equal.
/// >0 The first character that does not match has a greater value in str1 than
/// in str2.
int strncmpci(const char * str1, const char * str2, size_t num)
{
int ret_code = 0;
size_t chars_compared = 0;
// Check for NULL pointers
if (!str1 || !str2)
{
ret_code = INT_MIN;
return ret_code;
}
// Continue doing case-insensitive comparisons, one-character-at-a-time, of `str1` to `str2`, so
// long as 1st: we have not yet compared the requested number of chars, and 2nd: the next char
// of at least *one* of the strings is not zero (the null terminator for a C-string), meaning
// that string still has more characters in it.
// Note: you MUST check `(chars_compared < num)` FIRST or else dereferencing (reading) `str1` or
// `str2` via `*str1` and `*str2`, respectively, is undefined behavior if you are reading one or
// both of these C-strings outside of their array bounds.
while ((chars_compared < num) && (*str1 || *str2))
{
ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
if (ret_code != 0)
{
// The 2 chars just compared don't match
break;
}
chars_compared++;
str1++;
str2++;
}
return ret_code;
}
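Because INT_MIN is a sentinel rather than an ordering result, callers may want to check for it before interpreting the return value. The sketch below carries a condensed copy of strncmpci() (this copy converts each byte to unsigned char before calling tolower()) and adds two hypothetical wrappers, strcmpci() and postcodes_equal(), which are not part of the original answer.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Condensed copy of strncmpci(); returns INT_MIN for NULL arguments. */
int strncmpci(const char *str1, const char *str2, size_t num)
{
    if (!str1 || !str2)
        return INT_MIN;

    int ret_code = 0;
    size_t chars_compared = 0;
    while ((chars_compared < num) && (*str1 || *str2)) {
        ret_code = tolower((unsigned char)*str1) - tolower((unsigned char)*str2);
        if (ret_code != 0)
            break;
        chars_compared++;
        str1++;
        str2++;
    }
    return ret_code;
}

/* Hypothetical wrapper: compare whole strings, in the manner of strcmp(). */
int strcmpci(const char *str1, const char *str2)
{
    return strncmpci(str1, str2, SIZE_MAX);
}

/* Hypothetical helper: true only for two valid strings that match. */
bool postcodes_equal(const char *a, const char *b)
{
    int rc = strcmpci(a, b);
    if (rc == INT_MIN)
        return false;  /* invalid argument, not an ordering result */
    return rc == 0;
}
```

Passing SIZE_MAX as the limit is safe here because the loop also stops once both strings have reached their null terminators.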
Download the full sample code, with unit tests, from my eRCaGuy_hello_world repository here: "strncmpci.c":
(this is just a snippet)
int main()
{
printf("-----------------------\n"
"String Comparison Tests\n"
"-----------------------\n\n");
int num_failures_expected = 0;
printf("INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!\n");
EXPECT_EQUALS(strncmpci("hey", "HEY", 3), 'h' - 'H');
num_failures_expected++;
printf("------ beginning ------\n\n");
const char * str1;
const char * str2;
size_t n;
// NULL ptr checks
EXPECT_EQUALS(strncmpci(NULL, "", 0), INT_MIN);
EXPECT_EQUALS(strncmpci("", NULL, 0), INT_MIN);
EXPECT_EQUALS(strncmpci(NULL, NULL, 0), INT_MIN);
EXPECT_EQUALS(strncmpci(NULL, "", 10), INT_MIN);
EXPECT_EQUALS(strncmpci("", NULL, 10), INT_MIN);
EXPECT_EQUALS(strncmpci(NULL, NULL, 10), INT_MIN);
EXPECT_EQUALS(strncmpci("", "", 0), 0);
EXPECT_EQUALS(strncmp("", "", 0), 0);
str1 = "";
str2 = "";
n = 0;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 0);
str1 = "hey";
str2 = "HEY";
n = 0;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 0);
str1 = "hey";
str2 = "HEY";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');
str1 = "heY";
str2 = "HeY";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');
str1 = "hey";
str2 = "HEdY";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 'y' - 'd');
EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');
str1 = "heY";
str2 = "hEYd";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 'e' - 'E');
str1 = "heY";
str2 = "heyd";
n = 6;
EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
EXPECT_EQUALS(strncmp(str1, str2, n), 'Y' - 'y');
str1 = "hey";
str2 = "hey";
n = 6;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 0);
str1 = "hey";
str2 = "heyd";
n = 6;
EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
EXPECT_EQUALS(strncmp(str1, str2, n), -'d');
str1 = "hey";
str2 = "heyd";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 0);
str1 = "hEY";
str2 = "heyYOU";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');
str1 = "hEY";
str2 = "heyYOU";
n = 10;
EXPECT_EQUALS(strncmpci(str1, str2, n), -'y');
EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');
str1 = "hEYHowAre";
str2 = "heyYOU";
n = 10;
EXPECT_EQUALS(strncmpci(str1, str2, n), 'h' - 'y');
EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 0);
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 'n' - 'N');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice to meet you.,;", 100), 0);
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO UEET YOU.,;", 100), 'm' - 'u');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice to uEET YOU.,;", 100), 'm' - 'u');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice to UEET YOU.,;", 100), 'm' - 'U');
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 0);
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 'n' - 'N');
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 5), 0);
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice eo uEET YOU.,;", 5), 0);
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 100), 't' - 'e');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice eo uEET YOU.,;", 100), 't' - 'e');
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
if (globals.error_count == num_failures_expected)
{
printf(ANSI_COLOR_GRN "All unit tests passed!" ANSI_COLOR_OFF "\n");
}
else
{
printf(ANSI_COLOR_RED "FAILED UNIT TESTS! NUMBER OF UNEXPECTED FAILURES = %i"
ANSI_COLOR_OFF "\n", globals.error_count - num_failures_expected);
}
assert(globals.error_count == num_failures_expected);
return globals.error_count;
}
$ gcc -Wall -Wextra -Werror -ggdb -std=c11 -o ./bin/tmp strncmpci.c && ./bin/tmp
-----------------------
String Comparison Tests
-----------------------
INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!
FAILED at line 250 in function main! strncmpci("hey", "HEY", 3) != 'h' - 'H'
a: strncmpci("hey", "HEY", 3) is 0
b: 'h' - 'H' is 32
------ beginning ------
All unit tests passed!
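The snippet above relies on EXPECT_EQUALS, globals.error_count, and the ANSI_COLOR_* macros, none of which are shown here. The following is a minimal stand-in written to match the printed output format; it is an assumption for illustration, not the repository's actual definitions, and the escape sequences are simply the standard ANSI color codes.

```c
#include <stdio.h>

/* Assumed values: standard ANSI escape sequences for reset, green, and red. */
#define ANSI_COLOR_OFF "\033[m"
#define ANSI_COLOR_GRN "\033[32m"
#define ANSI_COLOR_RED "\033[31m"

/* Assumed shape of the test bookkeeping; only error_count is used above. */
typedef struct {
    int error_count;
} globals_t;

globals_t globals = { .error_count = 0 };

/* Evaluate each expression once; count and report any mismatch. */
#define EXPECT_EQUALS(int_a, int_b)                                       \
    do {                                                                  \
        int a_ = (int_a);                                                 \
        int b_ = (int_b);                                                 \
        if (a_ != b_) {                                                   \
            globals.error_count++;                                        \
            printf(ANSI_COLOR_RED "FAILED at line %d in function %s! "    \
                   "%s != %s" ANSI_COLOR_OFF "\n"                         \
                   "  a: %s is %d\n  b: %s is %d\n\n",                    \
                   __LINE__, __func__, #int_a, #int_b,                    \
                   #int_a, a_, #int_b, b_);                               \
        }                                                                 \
    } while (0)
```

The do { ... } while (0) wrapper lets the macro be used like a single statement, and the # operator turns each argument into the text shown in the failure message.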
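[...] ret_code is now initialized to 0 rather than INT_MIN (or -9999, as in the code you tested), and it is set to INT_MIN only if one of the input strings is a NULL ptr. Now it works perfectly. The problem was simply that when n was 0, none of the blocks were entered (neither the if nor the while), so it just returned whatever I had initialized ret_code to. Anyway, it is fixed now, and I have cleaned up my unit tests a lot and added the test you mentioned. Hope you upvote now. - Gabriel Staples
As others have stated, there is no portable function that works on all systems. You can partially circumvent this with a simple ifdef: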
#include <stdio.h>
#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif
int main() {
printf("%d", strcasecmp("teSt", "TEst"));
}
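A variation on the same idea wraps the platform functions in small functions instead of redefining a name with a macro, and covers the length-limited comparison as well. This is a sketch: ci_compare() and ci_compare_n() are hypothetical names, and it assumes the Microsoft runtime's _stricmp and _strnicmp on Windows and POSIX strcasecmp and strncasecmp elsewhere.

```c
#include <stddef.h>

#ifdef _WIN32
#include <string.h>   /* declares _stricmp and _strnicmp on Microsoft toolchains */
#else
#include <strings.h>  /* declares strcasecmp and strncasecmp on POSIX systems */
#endif

/* Hypothetical wrapper: case-insensitive comparison of whole strings. */
int ci_compare(const char *a, const char *b)
{
#ifdef _WIN32
    return _stricmp(a, b);
#else
    return strcasecmp(a, b);
#endif
}

/* Hypothetical wrapper: case-insensitive comparison of at most n bytes. */
int ci_compare_n(const char *a, const char *b, size_t n)
{
#ifdef _WIN32
    return _strnicmp(a, b, n);
#else
    return strncasecmp(a, b, n);
#endif
}
```

Keeping the #ifdef inside the wrappers means the rest of the program never sees a platform-specific name.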
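[...] strings.h (with an s) and string.h are not the same thing. I wasted some time looking for strcasecmp in the wrong header. - Gustavo Vargas
If your library does not provide one, you can get an idea of how to implement an efficient one from here.
It uses a table covering all 256 chars.
Then we just need to traverse the strings and compare the table cells for the given chars: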
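/* Index the table through unsigned char so bytes >= 0x80 cannot become negative indices. */
const unsigned char *cm = (const unsigned char *)charmap,
    *us1 = (const unsigned char *)s1,
    *us2 = (const unsigned char *)s2;
while (cm[*us1] == cm[*us2++])
    if (*us1++ == '\0')
        return (0);
return (cm[*us1] - cm[*--us2]);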
int str_case_ins_cmp(const char* a, const char* b) {
int rc;
while (1) {
rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);
if (rc || !*a) {
break;
}
++a;
++b;
}
return rc;
}
static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
int k;
for (k = 0; k < length; k++)
{
if ((str1[k] | 32) != (str2[k] | 32))
break;
}
if (k != length)
return 1;
return 0;
}
[...] ignoreCaseComp("`", "@", 1) and, more importantly, ignoreCaseComp("\0", " ", 1) (that is, cases where every bit except bit 5 (decimal 32) is identical) both evaluate to 0 (a match). - user966939
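One way to avoid the false matches described in the comment is to set bit 5 only for bytes that are ASCII upper-case letters. The sketch below assumes ASCII; ascii_fold() and ignoreCaseCompFixed() are hypothetical names, and the early stop at the null terminator is an addition that the original function does not have.

```c
/* Fold an ASCII upper-case letter to lower case; leave every other byte untouched. */
static unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c | 32) : c;
}

/* Same contract as ignoreCaseComp(): 0 on match, 1 on mismatch. */
int ignoreCaseCompFixed(const char *str1, const char *str2, int length)
{
    for (int k = 0; k < length; k++) {
        unsigned char c1 = ascii_fold((unsigned char)str1[k]);
        unsigned char c2 = ascii_fold((unsigned char)str2[k]);
        if (c1 != c2)
            return 1;
        if (c1 == '\0')
            break;  /* both strings ended before length: stop reading */
    }
    return 0;
}
```

With this change '`' and '@' no longer compare equal, and neither do '\0' and ' ', while letters that differ only in case still match.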