How to restore a string after using strtok()

Question

3

I have a project where I need to sort multiple lines of text by the second, third, etc. word of each line rather than by the first word. For example:

this line is first

but this line is second

finally there is this line

If you choose to sort by the second word, this becomes:

this line is first

finally there is this line

but this line is second

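I have a pointer to an array of chars that holds each line. What I have done so far is use strtok() to split each line up to the second word, but that reduces the entire string to just that word, which is then stored in my array. My tokenizing code looks like this: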

```c
char *token = strtok(line, " ");
strcpy(myArray[i], token);
```

Is there a way to split the line and keep the original string intact? I need to be able to access the entire line later in my code.

 for (i = 0; i < numLines; i++) {
   char* token = strtok(labels[i], " ");
   token = strtok(NULL, " ");
   labels[i] = token;
 }

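This gives me the second word of each line, since I call strtok twice. I then sort those words (line, this, there). However, I need to put the strings back in their original form. I know strtok replaces the delimiters with '\0', but I have not found a way to get the original string back.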

I believe the answer lies in using pointers, but I am not sure what I need to do next.

I should mention that I am reading the lines from an input file, as shown:

for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
  labels[i] = strdup(buffer);
}

Edit: my find_offset method

size_t find_offset(const char *s, int n) {
  size_t len;
  while (n > 0) {
     len = strspn(s, " ");
     s += len;
  }

  return len;
}

Edit 2: the relevant code for sorting

//Getting the line and offset
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
   labels[i].line = strdup(buffer);
   labels[i].offset = find_offset(labels[i].line, nth);
}


int n = sizeof(labels) / sizeof(labels[0]);
qsort(labels, n, sizeof(*labels), myCompare);
for (i = 0; i < numLines; i++)
  printf("%d: %s", i, labels[i].line); //Print the sorted lines


int myCompare(const void* a, const void* b) { //Compare function
  xline *xlineA = (xline *)a;
  xline *xlineB = (xline *)b;

  return strcmp(xlineA->line + xlineA->offset, xlineB->line + xlineB->offset);
}

- nhlyoung

7

The simplest approach is to copy the string first. - paddy

1

Warning: if you put the strings back together, labels[i] will no longer point to nice substrings. Are you sure you want to do that? - chux - Reinstate Monica

If I copy the string, how do I then put it in the new order? - nhlyoung

while (n > 0) { len = strspn(s, " "); s += len; } is an infinite loop. - chux - Reinstate Monica

strdup is the only viable solution, especially if there are multiple delimiters and not just spaces, and it is probably the fastest. - 0___________

2 Answers

1

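I need to put the strings back in their original form. I know strtok replaces the delimiters with '\0', but I have not found a way to get the original string back.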

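If you want to preserve the strings, it is best to avoid damaging the originals in the first place, and above all to avoid losing the pointers to them. Provided it is safe to assume that every line has at least three words, and that the second word is separated from the first and the third by exactly one space each, you could undo strtok()'s replacement of delimiters with string terminators. However, once you have lost the start of the overall string, there is no safe or reliable way to recover it.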
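As a rough illustration of undoing strtok()'s damage, here is a minimal sketch. It assumes the caller still holds the pointer to the start of the line, recorded the line's length before tokenizing, and passed only a space as the delimiter set. The names `restore_spaces` and `second_word` are hypothetical and do not come from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: put back the separators that strtok() overwrote.
 * `line` must still point at the start of the buffer, and `len` must be
 * the strlen() of the line taken BEFORE tokenizing. Assumes the only
 * delimiter given to strtok() was ' ', so every '\0' found inside the
 * first `len` characters used to be a space. */
static void restore_spaces(char *line, size_t len)
{
    assert(line != NULL);
    for (size_t i = 0; i < len; i++) {
        if (line[i] == '\0')
            line[i] = ' ';
    }
}

/* Return a pointer to the second word of `line`, or NULL if there is
 * none. The buffer is modified by strtok(); call restore_spaces()
 * afterwards to repair it. */
static char *second_word(char *line)
{
    if (strtok(line, " ") == NULL)
        return NULL;
    return strtok(NULL, " ");
}
```

The key point is that the original pointer and length are saved before strtok() runs. If `labels[i]` is overwritten with the token, as in the question, that information is gone and this repair is no longer possible.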

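I suggest creating an auxiliary array in which you record information about the second word of each sentence, obtained without damaging the original sentences, and then co-sorting the auxiliary array and the sentence array. The information recorded in the auxiliary array could be copies of the second words, their offsets and lengths, or something similar.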
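One possible shape for such an auxiliary array is sketched below, using the offset-and-length variant. The type `word_key` and the functions `make_key` and `compare_keys` are hypothetical names. As a simplification of co-sorting, each record also carries a pointer to its line, so sorting the auxiliary array alone yields the new line order. Treating '\n' as a separator alongside the space is an assumption, made so that a key at the end of a line does not include the newline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical auxiliary record: where the key word sits in its line. */
typedef struct {
    const char *line;   /* the untouched original line */
    size_t offset;      /* start of the second word within line */
    size_t length;      /* number of characters in the second word */
} word_key;

/* Locate the second word without writing to the line. */
static word_key make_key(const char *line)
{
    assert(line != NULL);
    const char *delims = " \n";
    const char *s = line + strspn(line, delims);  /* skip leading blanks */
    s += strcspn(s, delims);                      /* skip the first word */
    s += strspn(s, delims);                       /* skip the separator  */
    word_key k = { line, (size_t)(s - line), strcspn(s, delims) };
    return k;
}

/* qsort() comparator: order by the recorded word only. */
static int compare_keys(const void *a, const void *b)
{
    const word_key *ka = a, *kb = b;
    size_t n = ka->length < kb->length ? ka->length : kb->length;
    int c = memcmp(ka->line + ka->offset, kb->line + kb->offset, n);
    if (c != 0)
        return c;
    /* Equal prefixes: the shorter word sorts first. */
    return (ka->length > kb->length) - (ka->length < kb->length);
}
```

After filling an array of `word_key` with `make_key` and calling `qsort(keys, count, sizeof keys[0], compare_keys)`, printing `keys[i].line` in order gives the sorted lines, and the original strings are never modified.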

- John Bollinger

- chux - Reinstate Monica · Accepted Answer

Perhaps, instead of strtok(), use strspn() and strcspn() to parse the string for tokens. That way the original string can even be const.

#include <stdio.h>
#include <string.h>

int main(void) {
  const char str[] = "this line is first";
  const char *s = str;
  while (*(s += strspn(s, " ")) != '\0') {
    size_t len = strcspn(s, " ");

    // Instead of printing, use the nth parsed token for key sorting
    printf("<%.*s>\n", (int) len, s);

    s += len;
  }
}

Output

<this>
<line>
<is>
<first>

Or

Do not sort the lines.

Sort structures.

typedef struct {
  char *line;
  size_t offset;
} xline;

Pseudo code

int fcmp(a, b) {
  return strcmp(a->line + a->offset, b->line + b->offset);
}

size_t find_offset_of_nth_word(const char *s, int n) {
  while (n > 0) {
    use strspn(), strcspn() like above to step over one word, then n--
  }
}

main() {
  int nth = ...;
  xline labels[numLines];
  for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
     labels[i].line = strdup(buffer);
     labels[i].offset = find_offset_of_nth_word(labels[i].line, nth);
  }

  qsort(labels, i, sizeof *labels, fcmp);

}
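The pseudo code above could be fleshed out roughly as follows. This is a sketch under stated assumptions, not the answerer's exact implementation: words are numbered from 1, only spaces separate words, and a line with fewer than n words gets an empty key. Unlike the `find_offset` attempt in the question, the loop counts n down, so it terminates, and it returns the offset from the start of the line rather than the last span length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *line;      /* the whole line, never modified */
    size_t offset;   /* where the nth word starts */
} xline;

/* Offset of the nth word (1-based) in s; words are separated by spaces.
 * If the line has fewer than n words, the offset of the end of the
 * string is returned, so such lines compare as an empty key. */
static size_t find_offset_of_nth_word(const char *s, int n)
{
    assert(s != NULL && n >= 1);
    const char *p = s + strspn(s, " ");   /* start of word 1 */
    while (--n > 0) {
        p += strcspn(p, " ");             /* skip the current word */
        p += strspn(p, " ");              /* skip the spaces after it */
    }
    return (size_t)(p - s);
}

/* qsort() comparator: compare the lines from their nth word onward. */
static int fcmp(const void *a, const void *b)
{
    const xline *xa = a, *xb = b;
    return strcmp(xa->line + xa->offset, xb->line + xb->offset);
}
```

Because `fcmp` compares from the nth word to the end of the line, ties on the nth word are broken by the words that follow it. Note also that the element count passed to qsort() should be the number of lines actually read (the loop counter `i` in the pseudo code), not a value derived from `sizeof`.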

Or

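After reading each line, use strspn() and strcspn() to find the nth token and re-form the line from "aaa bbb ccc ddd \n" to "ccc ddd \naaa bbb ", then sort, and afterwards rotate each line back into its original order.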
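The rotation idea can be sketched as below. The helper names `rotate_left` and `unrotate` are hypothetical, and the sketch assumes every line ends with its only '\n', which then serves as the marker for where the original line ended.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Rotate the string s left by k characters in place:
 * "aaa bbb ccc\n" rotated by 4 becomes "bbb ccc\naaa ".
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
static int rotate_left(char *s, size_t k)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len == 0 || (k %= len) == 0)
        return 0;
    char *tmp = malloc(k);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, s, k);               /* save the leading k characters    */
    memmove(s, s + k, len - k);      /* slide the remainder to the front */
    memcpy(s + len - k, tmp, k);     /* append the saved prefix          */
    free(tmp);
    return 0;
}

/* Undo the rotation. Assumes the original line ended with its only
 * '\n', so whatever follows the '\n' is the part that was moved. */
static int unrotate(char *s)
{
    assert(s != NULL);
    const char *nl = strchr(s, '\n');
    if (nl == NULL)
        return 0;                    /* no marker: nothing to undo */
    return rotate_left(s, (size_t)(nl - s) + 1);
}
```

With the lines rotated by the offset of their nth word, a plain strcmp()-based sort orders them by that word, and calling `unrotate` on each line afterwards restores the original text.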

In all cases, do not use strtok(): too much information is lost.