基于选择查询的群组最常见值

7

我很苦恼如何在SQL中实现这个。 我有一个表:

| User_id | Question_ID | Answer_ID |
|    1    |     1       |    1      |
|    1    |     2       |    10     |
|    2    |     1       |    2      |
|    2    |     2       |    11     |
|    3    |     1       |    1      |
|    3    |     2       |    10     |
|    4    |     1       |    1      |
|    4    |     2       |    10     |

这个表格保存了用户对于一个特定问题的回答。一个问题可能有多个回答,但一个用户不能重复回答同一个问题。(因此,每个{User_id, Question_ID}只有一个Answer_ID)

我正在尝试找到一个查询的答案:对于一个特定的问题和答案id(与同一问题相关),我想找到其他用户给出的针对不同问题的最常见答案。

例如,对于上面的表格:

For question_id = 1 -> For Answer_ID = 1 - (Question 2 - Answer ID 10)
                       For Answer_ID = 2 - (Question 2 - Answer ID 11)

是否可以在一个查询中完成?是否应该在一个查询中完成?还是我应该只使用存储过程或Java来完成这个任务?


将其分解成一组,并考虑“当前”记录..1)选择where question_id = current_question_id and user_id <> current_user_Id。2)创建临时表,其中包含问题<>当前问题的列表,按问题ID排序/排序聚合记录计数。如果要包括原始用户的答案,则可以将结果加1...然后从那里开始。但这是我的思维过程...将其分解为一系列可以重新连接的临时表。 - JGFMK
3个回答

4

@rick-james说得对,但是如果你不知道像这样查询通常在MySQL中如何编写,那么开始可能并不容易。

  1. You need a query to find out the most common answers to questions:

    SELECT 
      question_id, 
      answer_id, 
      COUNT(*) as cnt 
    FROM user_answers
    GROUP BY 1, 2
    ORDER BY 1, 3 DESC
    

    This would return a table where for each question_id we output counts in descending order.

    | 1 |  1 | 3 |
    | 1 |  2 | 1 |
    | 2 | 10 | 3 |
    | 2 | 11 | 1 |
    
  2. And now we should solve a so called greatest-n-per-group task. The problem is that in MySQL for the sake of performance the tasks like this are usually solved not in pure SQL, but using hacks which rest on knowledge how the queries are processed internally.

    In this case we know that we can define a variable and then iterating over the ready table, have knowledge about the previous row, which allows us to distinguish between the first row in a group and the others.

    SELECT 
      question_id, answer_id, cnt,
      IF(question_id=@q_id, NULL, @q_id:=question_id) as v
    FROM (
      SELECT 
         question_id, answer_id, COUNT(*) as cnt 
      FROM user_answers
      GROUP BY 1, 2
      ORDER BY 1, 3 DESC) cnts
    JOIN (
      SELECT @q_id:=-1
    ) as init;
    

    Make sure that you have initialised the variable (and respect its data type on initialisation, otherwise it may be unexpectedly casted later). Here is the result:

    | 1 |  1 | 3 |    1 |
    | 1 |  2 | 1 |(null)|
    | 2 | 10 | 3 |    2 |
    | 2 | 11 | 1 |(null)|
    
  3. Now we just need to filter out rows with NULL in the last column. Since the column is actually not needed we can move the same expression into the WHERE clause. The cnt column is actually not needed either, so we can skip it as well:

    SELECT 
      question_id, answer_id
    FROM (
      SELECT 
        question_id, answer_id
      FROM user_answers
      GROUP BY 1, 2
      ORDER BY 1, COUNT(*) DESC) cnts
    JOIN (
      SELECT @q_id:=-1
    ) as init
    WHERE IF(question_id=@q_id, NULL, @q_id:=question_id) IS NOT NULL;
    
  4. The last thing worth mentioning, for the query to be efficient you should have correct indexes. This query requires an index starting with (question_id, answer_id) columns. Since you anyway need a UNIQUE index, it make sense to define it in this order: (question_id, answer_id, user_id).

    CREATE TABLE user_answers (
      user_id INTEGER,
      question_id INTEGER,
      answer_id INTEGER,
      UNIQUE INDEX (question_id, answer_id, user_id) 
    ) engine=InnoDB;
    

这里有一个sqlfiddle供您使用:http://sqlfiddle.com/#!9/bd12ad/20


3
你是想要一条鱼吗?还是想学会钓鱼?
你的问题似乎有多个步骤。
1. 检索关于“给定答案的用户提出的问题”的信息。设计这个SELECT语句,想象结果形成一个新表格。
2. 应用“其他”限制。这可能是在SELECT#1中添加的一个小的AND ...!= ...。
3. 现在找到“最常见的答案”。这可能涉及ORDER BY COUNT(*) DESC LIMIT 1。它很可能使用派生表:
SELECT ...
    FROM ( select#2 )

1

你的问题有多个条件,你需要从Question表中获取第一个提问用户的问题:

select question_id,user_id from question

然后插入对所问问题的答案,并在您的Java代码中进行一些检查,例如(用户是否回答了与提问此问题的用户相同的问题,用户是否多次回答了此问题)。
select question_id,user_id from question where user_id=asking-user_id // gets all questions and show on UI
select answer_id,user_id from answer where user_id=answering-user_id // checks the answers that particular user

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接