Grouping an array of strings

Question

3

I have created an array of strings and am trying to group them into categories.

So far my code looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char *argv[]) {
    char *results[] = {"Canada", "Cycling", "Canada", "Swimming", "India", "Swimming", "New Mexico",
                       "Cycling", "New Mexico", "Cycling", "New Mecico", "Swimming"};



    int nelements, i, country_count;

    nelements = sizeof(results) / sizeof(results[0]);

    for (i = 0 ; i < nelements; i++) {
        printf("%s\n", results[i]);
    }

    return 0;
}

This outputs the following:

Canada
Cycling
Canada
Swimming
India
Swimming
New Mexico
Cycling
New Mexico
Cycling
New Mexico
Swimming

I want to group the sports under each country, together with their counts, so that the result looks like this:

Canada
    Cycling  1
    Swimming 1

India
    Swimming 1

New Mexico
    Cycling  2
    Swimming 1

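I was thinking of taking each element of the array, classifying the countries, and using strcmp to eliminate the duplicate country strings, but I'm not sure how to handle the sport counts for each country. I don't really know how to go about this. Any kind of help would be appreciated.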

- RoadRunner

1

You could start by simplifying with char *results[][2]. - MotKohn

1

Can't you use a data structure such as map in C++? - zhujs

7 Answers

2

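Consider using lists of cities and countries instead of an array of strings (in the code below, each "city" node holds a sport name).

The following code shows the simplest implementation, with two structures and two operations on each: adding a new element and searching for an element.

Try running this code and study it: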

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct city
{
    struct city * next;
    char * cityName;
    int counter;
};

struct country
{
    struct country * next;
    char * coutryName;
    struct city * cities;
};

struct country * findCountry(struct country * coutries, char * country)
{
    struct country * searchResult = NULL;
    while (coutries != NULL)
    {
        if (strcmp(country, coutries->coutryName) == 0)
        {
            searchResult = coutries;
            break;
        }
        coutries = coutries->next;
    }
    return searchResult;
}

struct country * addCountry(struct country * coutries, char * country)
{
    struct country * newCountry = malloc(sizeof(struct country));
    newCountry->next = coutries;
    newCountry->coutryName = country;
    newCountry->cities = NULL;
    return newCountry;
}

struct city * findCity(struct city * cities, char * city)
{
    struct city * searchResult = NULL;
    while (cities != NULL)
    {
        if (strcmp(city, cities->cityName) == 0)
        {
            searchResult = cities;
            break;
        }
        cities = cities->next;
    }
    return searchResult;
}

struct city * addCity(struct city * cities, char * city)
{
    struct city * newCity = malloc(sizeof(struct city));
    newCity->cityName = city;
    newCity->next = cities;
    newCity->counter = 0;
    return newCity;
}

int main(void) 
{
    char *results[] = { "Canada", "Cycling", "Canada", "Swimming", "India", "Swimming", "New Mexico",
        "Cycling", "New Mexico", "Cycling", "New Mexico", "Swimming" };

    struct country * countries = NULL;
    int nelements = sizeof(results) / sizeof(results[0]);
    // filling list of countries with sublists of cities
    int i;
    for (i = 0; i < nelements; i+=2)
    {
        struct country * pCountry = findCountry(countries, results[i]);
        if (!pCountry)
        {
            countries = addCountry(countries, results[i]);
            pCountry = countries;
        }
        struct city * pCity = findCity(pCountry->cities, results[i+1]);
        if (!pCity)
        {
            pCountry->cities = addCity(pCountry->cities, results[i + 1]);
            pCity = pCountry->cities;
        }
        pCity->counter++;
    }

    // reading cities from all countries
    struct country * pCountry = countries;
    while (pCountry != NULL)
    {
        printf("%s\n",pCountry->coutryName);
        struct city * pCity = pCountry->cities;
        while (pCity != NULL)
        {
            printf("    %s %d\n", pCity->cityName, pCity->counter);
            pCity = pCity->next;
        }
        printf("\n");
        pCountry = pCountry->next;
    }

    return 0;
}

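Note: in your code the last "New Mexico" was written as "New Mecico"; I have corrected this typo in my code.

Update:

Note 2: because I add elements at the beginning of the lists, the order of countries and cities is the reverse of the order of their first appearance in the source array.

If the order matters, you have two options:

1) rewrite my code to add new items at the end of the lists (this is the longer way)

2) rewrite the for loop in main so that it reads the initial array from the end (this is the easiest way):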

// filling list of countries with sublists of cities
int i;
for (i = nelements - 2; i >= 0; i -= 2)
{
    . . .

- VolAnd

1

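The idea of this solution is to build a map, that is, a table whose rows correspond to countries and whose columns correspond to sport events (or sport names).

Memory for the largest possible map (nelements/2 x nelements/2) is allocated with calloc, but if char *results[] does not change, an int[6][6] would actually be enough.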

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) 
{
    char *results[] = { "Canada", "Cycling", "Canada", "Swimming", "India", "Swimming", "New Mexico",
        "Cycling", "New Mexico", "Cycling", "New Mexico", "Swimming" };
    int nelements = sizeof(results) / sizeof(results[0]);
    int i;
    // making empty map
    int ** map = calloc(nelements/2, sizeof(int*));
    for (i = 0; i < nelements / 2; i++)
        map[i] = calloc(nelements/2, sizeof(int));
    char ** rowNames = calloc(nelements / 2, sizeof(char*));
    int usedRows = 0;
    char ** colNames = calloc(nelements / 2, sizeof(char*));
    int usedCols = 0;

    // filling the map
    // the outer loop for countries
    int c;
    for (c = 0; c < nelements; c+=2) {
        int row = -1;
        // Find country in the map (loop for rows)
        for (i = 0; i < usedRows; i++) 
        {
            if (strcmp(results[c], rowNames[i]) == 0)
            {
                row = i;
                break;
            }
        }
        // or add if it is new country
        if (row < 0)
        {
            row = usedRows;
            rowNames[usedRows] = results[c];
            usedRows++;
        }
        // Find sport in the map (loop for columns)
        int col = -1;
        for (i = 0; i < usedCols; i++)
        {
            if (strcmp(results[c+1], colNames[i]) == 0)
            {
                col = i;
                break;
            }
        }
        // or add if it is new sport
        if (col < 0)
        {
            col = usedCols;
            colNames[usedCols] = results[c+1];
            usedCols++;
        }
        // Just count sport event in the current country
        map[row][col]++;
    }

    // print results from map
    // the outer loop for countries (loop for rows in map)
    for (c = 0; c < usedRows; c++) {
        printf("%s\n", rowNames[c]);
        // the inner loop for sport
        for (i = 0; i < usedCols; i++)
            if (map[c][i])
                printf("   %s %d\n", colNames[i], map[c][i]);
        printf("\n");
    }

    return 0;
}

So once map, rowNames (the countries) and colNames (the sports) are filled, we can output the data in any way we like.

- VolAnd

1

I would use a struct (if you are not familiar with structs, I refresh my memory with myStruct.c whenever needed) with two arrays as data members, like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define COUNTRY_LENGTH 15
#define MAX_SPORTS 5

enum sport_name { CYCLING, SWIMMING };

typedef struct Record {
  char country[COUNTRY_LENGTH];
  int sports[MAX_SPORTS];
} Record;

// return index of 'country' in 'array' if the 'country'
// is found inside 'array', else -1
int exists(char country[], Record* array, int size) {
    int i;
    for(i = 0; i < size; ++i)
        if(!strcmp(array[i].country, country))
            return i;
    return -1;
}

int find_sport_index(char sport[]) {
    if(!strcmp(sport, "Cycling"))
        return CYCLING;
    if(!strcmp(sport, "Swimming"))
        return SWIMMING;
    printf("I couldn't find a sport index for %s\n!!! Do something...Undefined Behavior!", sport);
    return -1;
}

char* find_sport_string(int sport) {
    if(sport == CYCLING)
        return "Cycling";
    if(sport == SWIMMING)
        return "Swimming";
    printf("I couldn't find a sport string for sport index %d\n!!! Do something...", sport);
    return NULL;
}

int main(int argc, char *argv[]) {
    // you had a typo, New Mecico, I corrected it..Also you could have used a struct here... ;)
    char *results[] = {"Canada", "Cycling", "Canada", "Swimming", "India", "Swimming", "New Mexico",
                       "Cycling", "New Mexico", "Cycling", "New Mexico", "Swimming"};



    int nelements, i, j;

    nelements = sizeof(results) / sizeof(results[0]);

    const int records_size = nelements/2;

    Record record[records_size];
    for(i = 0; i < records_size; i++) {
        for(j = 0; j < COUNTRY_LENGTH; j++) 
            record[i].country[j] = 0;
        for(j = 0; j < MAX_SPORTS; j++)
            record[i].sports[j] = 0;
    }

    int country_index, records_count = 0;
    for(i = 0; i < nelements; ++i) {
        // results[i] is a country
        if(i % 2 == 0) {
            country_index = exists(results[i], record, records_size);
            if(country_index == -1) {
                country_index = records_count++;
                strcpy(record[country_index].country, results[i]);
            }
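        } else {
            // results[i] is a sport; skip names that have no known index,
            // since find_sport_index() returns -1 for them
            int sport_index = find_sport_index(results[i]);
            if(sport_index >= 0)
                record[country_index].sports[sport_index]++;
        }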
    }    


    for(i = 0; i < records_size; ++i) {
        if(strlen(record[i].country)) {
            printf("%s\n", record[i].country);
            for(j = 0; j < MAX_SPORTS; j++) {
                if(record[i].sports[j] != 0) {
                    printf("    %s %d\n", find_sport_string(j), record[i].sports[j]);
                }
            }
        }    
    }

    return 0;
}

Output:

C02QT2UBFVH6-lm:~ gsamaras$ ./a.out 
Canada
    Cycling 1
    Swimming 1
India
    Swimming 1
New Mexico
    Cycling 2
    Swimming 1

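The idea is:

The struct Record holds a record of the Olympics and the relevant sports.
Record.country holds the name of the country (I assume it is at most 14 characters, plus 1 for the null terminator, so I defined it as 15).
Record.sports is an array of size MAX_SPORTS. The size would be equal to the number of all sports in the Olympics, but I assumed it is 5. Every position of this array is a counter (of the medals the country won in one sport). For example, Record.sports[1] = 2 means that this country has 2 medals in swimming. But how do I know it is swimming? As the programmer, I decided beforehand that the first counter is tied to cycling, the second to swimming, and so on. I used an enum to make that more readable than "magic numbers".
You defined results[] in an odd way, since you really should have used a struct, but I went with your code... So I needed an array of Records large enough for the maximum possible number of countries, i.e. half the size of results[]. Note that because you defined results[] to hold implicit country-sport pairs, dividing by two is enough to determine the size of the Record array.
I loop over results[] to fill record[], using the counter i of the for loop. When i is even, results[i] holds a country; otherwise it holds a sport. I use the modulo operator (%) to determine that easily.
If the country doesn't exist in record[], I insert it; otherwise I don't insert it again. In both cases I remember its index in record[], so that in the next iteration, when we process the sport, we know which position of record[] to look at and update.
Now, when I process a sport, I increment the counter of that sport, but only for the corresponding country (remember, I stored the index of the country processed in the previous iteration).
Then I just print, and that's it! :)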

- gsamaras

1

Nice. It took me a while to figure out what !strcmp means, because it is a bit unintuitive. - jian

1

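From the array you provided, I can see that country names and sport names alternate. If the data always comes in this format, you can follow the code below.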

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
   char *results[] = {"Canada", "Cycling", "Canada", "Swimming", "India","Swimming", "New Mexico",
               "Cycling", "New Mexico", "Cycling", "New Mexico", "Swimming"};



   int nelements, i, sport_count=0,country_change =0;
   char country[50];char sport[50];
   strcpy(country,results[0]);
   printf("%s\n", country);
   strcpy(sport,results[1]);
   nelements = sizeof(results) / sizeof(results[0]);

   for (i = 1 ; i < nelements; i++) 
   {
      if(((i%2)==0) && (strcmp(country,results[i])))
      {
         //sport_count++;
         printf("\t%s %d\n", sport,sport_count);
         country_change =1;
         strcpy(country,results[i]);
         printf("%s\n", country);
      }
      else if((i%2)==1)
      {
          if(country_change)
          {
             strcpy(sport,results[i]);
             country_change = 0;
             sport_count = 0;
          }

          if(!strcmp(sport,results[i]))
          {
              sport_count++;
          }
          else
          {
              printf("\t%s %d\n", sport,sport_count);
              strcpy(sport,results[i]);
              sport_count = 1;
          }
             //strcpy(country,results[i]);
       }

    }
    printf("\t%s %d\n", sport,sport_count);

 return 0;
}

Basically, this is what I am trying to do here:

Store the first country name in a variable.
Then, in each even iteration, check whether the country name equals the stored name. If not, update the stored name.
In each odd iteration, compare the sport name with the stored one: increment the count if it matches, otherwise print the stored sport and its count.
The sport name is stored in a variable, and an int variable sport_count keeps the count.
When a new country arrives, print the name of the previous sport first, then update the sport name and the related variables.

The last sport name is printed outside the loop.

Output

Canada
        Cycling 1
        Swimming 1
India
        Swimming 1
New Mexico
        Cycling 2
        Swimming 1

- Denis

1

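There are a number of ways to approach this task, as you can see from the number of answers. One element you need, for either the countries or the events (but not both), is a simple lookup table holding the country entries or the event entries, so that you can tell whether a value in results is a country name or an event name. A simple countries lookup (defined globally here, but it could be at function scope as well) looks like this: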

char *countries[] = { "Canada", "India", "New Mexico" }; /* countries lookup */

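Another shortcut is to recognize that the pointers in results have function scope, so there is no need to copy or allocate memory to hold them; the strings they point to already exist in read-only memory.

Another structural element that helps is keeping a count of the events associated with each country, e.g. eventcnt, which can be incremented each time an event is added under a country. You can use a country/events struct like the following: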

typedef struct {
    char *country;
    char *event[MAXE];
    int eventcnt;
} host;

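MAXE is a simple constant for the maximum number of events, which lets you use automatic storage for your array of structs. (It can easily be changed to allocate/reallocate storage as needed.)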

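Then you simply need to loop over the results array once, understanding that events always follow the country that precedes them. Using a few nested loops keeps the number of passes over results to one. Basically, you walk each pointer in results and determine whether it points to a country name. If it does, you add it as a host.country value if it is not already present, or skip it if it is (there is no need to update the pointer to the last occurrence of the country name).

Since nested loops are involved, a simple goto provides all the control you need to determine when you are handling a country name and when you are handling an event name, and lets you take the action needed in each case.

Then it is just a matter of printing or otherwise using the results, which are now held in your array of structs, with hidx (the host index) holding the total number of unique hosts involved.

Putting the pieces together, you could do something like the following: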

#include <stdio.h>
#include <string.h>

/* constants max(countries, events) */
enum { MAXC = 8, MAXE = 16 };

char *countries[] = { "Canada", "India", "New Mexico" }; /* countries lookup */

typedef struct {
    char *country;
    char *event[MAXE];
    int eventcnt;
} host;

int main (void) {

    char *results[] = { "Canada", "Cycling", "Canada", "Swimming", 
                        "India", "Swimming", "New Mexico", "Cycling", 
                        "New Mexico", "Cycling", "New Mexico", "Swimming"};
    host hosts[MAXC] = {{ .country = NULL }};
    int hidx = 0, i, j, country_count, current = 0, nelements;

    country_count = sizeof countries/sizeof *countries;
    nelements = sizeof results / sizeof *results;

    for (i = 0 ; i < nelements; i++) {          /* for each element */
        for (j = 0; j < country_count; j++) {   /* check if country */
            if (strcmp (results[i], countries[j]) == 0) { /* if so */
                int k;
                for (k = 0; k < hidx &&  /* check if already assigned */
                    strcmp (hosts[k].country, countries[j]); k++) {}
                if (k == hidx && hidx < MAXC) /* if not, assign ptr, increment */
                    hosts[hidx++].country = results[i];
                if (k < hidx)   /* events that follow belong to this host */
                    current = k;
                goto nextc; /* skip event adding */
            }
        } /* results[i] is not a country, check if event exists for host */
        if (hosts[current].eventcnt < MAXE) {   /* if it doesn't, add it */
            int k;
            for (k = 0; k < hosts[current].eventcnt; k++)
                if (strcmp (results[i], hosts[current].event[k]) == 0)
                    goto nextc;  /* already exists for host, skip add */
            hosts[current].event[hosts[current].eventcnt++] = results[i];
        }
        nextc:;
    }

    for (i = 0; i < hidx; i++) {    /* output countries & events for each */
        printf (" %s\n", hosts[i].country);
        for (j = 0; j < hosts[i].eventcnt; j++)
            printf ("     %s\n", hosts[i].event[j]);
    }

    return 0;
}

Example Use/Output

$ ./bin/events
 Canada
     Cycling
     Swimming
 India
     Swimming
 New Mexico
     Cycling
     Swimming

Look over all the answers. There are many good points in them. Let me know if you have any questions.

- David C. Rankin

1

I would enumerate the sports and the locations, adding NUM_x as the last element so that enumerators can easily be added in the future...

typedef enum _sport_t
{
  CYCLING,
  SWIMMING,
  NUM_SPORTS
} sport_t;

typedef enum _location_t
{
  CANADA,
  INDIA,
  NEW_MEXICO,
  NUM_LOCATIONS
} location_t;

Now you can define arrays of strings to use whenever you want to print the names...

char* sports_name[NUM_SPORTS] = {"Cycling", "Swimming"};
char* location_name[NUM_LOCATIONS] = {"Canada", "India", "New Mexico"};

This approach saves a little storage and is more efficient, because when you categorize the list you compare enums (integers) rather than strings.

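You might also want to consider a two-dimensional boolean array over all locations and all sports, indicating whether each location has each sport.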

typedef enum _bool_t
{
  FALSE,
  TRUE
} bool_t;

bool_t sports_array[NUM_LOCATIONS][NUM_SPORTS] =
{ 
  {TRUE,TRUE},  // Canada
  {FALSE,TRUE}, // India
  {TRUE,TRUE},  // New Mexico
};

So your loop would look like this...

location_t l;
sport_t s;

for (l = (location_t)0; l < NUM_LOCATIONS; l++)
{
  printf( " %s\n", location_name[l] );
  for (s = (sport_t)0; s < NUM_SPORTS; s++)
  {
    if (sports_array[l][s])
    {
      printf( "     %s\n", sports_name[s] );
    }
  }
}

- DiegoSunDevil

- Ankita Mehta · Accepted Answer

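The solution depends on which approach you want to take. Keeping a single array of strings (results* in your code) would not make your data dynamic. Essentially, you want a dictionary data structure that stores (nested, if needed) key-value pairs. In C, I would use structs to make it modular.

First, you need a struct to store each sport and its count (say, the number of medals).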

struct sport {
  char *sport_name;
  int medal_count;
  //Any other details you want to store
};

A country can then take part in several sports, so we need a Country struct.

struct Country{
  char *country_name;
  struct sport* results;
  //Any other details you want to store
};

Now let's create an array of country data.

#define NO_OF_COUNTRIES 3  //You may fix this or make it dynamic
struct Country country_data[NO_OF_COUNTRIES];

Now you can fill in the data as you need. Hope this helps.