高效算法：确定列表中不常见的值

Question

高效算法：确定列表中不常见的值

4

我正在构建一个问答应用程序，该应用程序从问题池中随机提取问题。但是，有一个要求，即问题池仅限于用户尚未看到的问题。如果用户已经看过所有问题，则算法应“重置”，仅显示用户已经看过一次的问题。也就是说，始终向用户展示他们从未见过的问题，或者如果他们已经看过所有问题，则始终向他们展示他们更少见的问题，然后才是他们看过更多次数的问题。

列表(L)的创建方式如下：列表(I)中的任何值可能存在一次或在列表中多次重复出现。让我们定义列表中的另一个值J，使其不同于I。那么，0 <= abs(frequency(I) - frequency(J)) <= 1将始终为真。

换句话说：如果一个值在列表中重复5次，并且5次是列表中任何值重复的最大次数，则列表中的所有值都将重复4次或5次。算法应该在返回任何frequency == 5的值之前返回列表中所有frequency == 4的值。

抱歉这么冗长，我很难简洁地定义这个问题。请随时留下评论，如果需要进一步说明，我会进一步澄清。

非常感谢您提供的任何帮助。

澄清：

感谢迄今为止提出的答案。我认为它们都还不够。让我进一步解释。

我没有与用户进行交互并问他们问题。我正在将问题ID分配给考试记录，以便当用户开始考试时，确定他们可以访问的问题列表。因此，我有两个数据结构要处理：

1. 用户可以访问的可能问题ID列表 2. 此用户以前被分配过的所有问题ID的列表。这就是上面描述的列表L。

因此，除非我弄错了，否则解决此问题的算法/解决方案将需要使用上述两个列表进行列表和/或集合操作。

结果将是一个问题ID列表，我可以将其与考试记录相关联，然后插入到数据库中。

- Randy Syring

我认为这个问题比你所描述的要简单得多。我在下面提供了解决方案，它不使用多个列表或枢轴，而且编码更容易。如果有不足之处，请告诉我。你是在使用Python 2还是3？ - Russia Must Remove Putin

@AaronHall，请查看我问题中的澄清。另外，我正在使用Python 2。 - Randy Syring

要求蔓延是难以抵抗的，即使是程序员也一样。我相信我的修改后的答案可以为您省去一些步骤，并得到相同的信息。 - cphlewis

5个回答

5

为什么不创建两个列表，一个用于尚未选择的问题，另一个用于已选择的问题。最初，未选择列表将是满的，您将从中选择元素，并将其从列表中删除并添加到已选择列表中。一旦未选择列表为空，重复上述相同的过程，这次使用完整的已选择列表作为未选择列表，反之亦然。

- arshajii

每次都对完整列表进行洗牌，以满足随机请求？不错。（当然，随机选取。算了。） - cphlewis

一个单独的列表比仅仅一次遍历洗牌后的列表要低效得多。 - Russia Must Remove Putin

@cphlewis 是的，完全正确。 :) - arshajii

@AaronHall 我不认为这里会有性能问题。根据原帖的问题，我了解到将“已选择”和“未选择”视为两个不同的元素集合在逻辑上是有意义的，这就是为什么我更喜欢这种方法的原因。 - arshajii

谢谢您的想法。因为我是通过查询数据库来获取我的两个列表，所以我不能像您建议的那样做。请查看我的更新问题，了解我开始使用的两个列表的性质。 - Randy Syring

@Randy，你的评论毫无意义。你不能将建议的方法应用于特定场景并不意味着它们无效。与其重申你的问题，不如直接提出一个新问题。 - Niklas B.

3

要实现您的算法，只需要对列表进行随机排序并依次遍历，在完成后重复该过程。

不需要复制列表或在两个列表之间移动项目，只需使用以下控制流程，例如：

import random

def ask_questions(list_of_questions):
    while True:
        random.shuffle(list_of_questions)
        for question in list_of_questions:
            print(question)
            # Python 3 use input not raw_input
            cont = raw_input('Another question?') 
            if not cont:
                break
        if not cont:
            break

- Russia Must Remove Putin

0

这是我想出来的：

from collections import Counter
import random

# the number of question ids I need returned to
# assign to the exam
needed = 3

# the "pool" of possible question ids the user has access to
possible = [1,2,3,4,5]

# examples of lists of question ids I might see that represent
# questions a user has already answered
answered1 = []
answered2 = [1,3]
answered3 = [5,4,3,2]
answered4 = [5,4,3,2,1,1,2]
answered5 = [5,4,3,2,1,1,2,3,4,5,1]
answered6 = [5,4,3,2,1]

def getdiff(answered):
    diff = set(possible) - set(answered)
    still_needed = needed - len(diff)
    if still_needed > 0:
        not_already_selected = list(set(possible) - diff)
        random.shuffle(not_already_selected)
        diff = list(diff) + not_already_selected[0:still_needed]
        random.shuffle(diff)
        return diff
    diff = list(diff)
    random.shuffle(diff)
    if still_needed == 0:
        return diff
    return diff[0:needed]

def workit(answered):
    """ based on frequency, reduce the list down to only
        those questions we want to consider "answered"
    """
    have_count = 0
    if len(possible) > len(answered):
        return getdiff(answered)
    counted = Counter(answered)
    max_count = max(counted.values())
    # the key here is to think of "answered" questions as
    # only those that have been seen with max frequency
    new_answered = []
    for value, count in counted.iteritems():
        if count == max_count:
            new_answered.append(value)
    return getdiff(new_answered)

print 1, workit(answered1)
print 2, workit(answered2)
print 3, workit(answered3)
print 4, workit(answered4)
print 5, workit(answered5)
print 6, workit(answered6)

"""
>>> 
1 [2, 4, 3]
2 [2, 5, 4]
3 [5, 2, 1]
4 [5, 3, 4]
5 [2, 4, 3]
6 [2, 3, 5]
>>> ================================ RESTART ================================
>>> 
1 [3, 1, 4]
2 [5, 2, 4]
3 [2, 4, 1]
4 [5, 4, 3]
5 [4, 5, 3]
6 [1, 5, 3]
>>> ================================ RESTART ================================
>>> 
1 [1, 2, 3]
2 [4, 2, 5]
3 [4, 1, 5]
4 [5, 4, 3]
5 [2, 5, 4]
6 [2, 1, 4]
"""

- Randy Syring

0

让我们定义一个“枢轴”，它将列表分成两个部分。枢轴将数组划分为这样一种方式，即枢轴之前的所有数字都比枢轴之后的数字选了一个（或者更普遍地说，枢轴之前的所有数字都不符合选取条件，而枢轴之后的所有数字都符合选取条件）。

您只需要从枢轴后的数字列表中随机选择一个项目，将其与枢轴上的数字交换，然后将枢轴增加。当枢轴达到列表的末尾时，您可以将其重置回开头。

另外，您还可以使用两个列表，这样做实现起来要容易得多，但效率略低，因为它需要扩展/缩小列表。大多数情况下，实现的简易性会胜过低效，因此我通常会选择使用两个列表。

- Lie Ryan

我认为你提出的建议有价值。问题是，我如何确定我的枢轴在哪里？请查看我更新的问题，以了解我需要处理的两个列表的性质。 - Randy Syring

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- cphlewis · Accepted Answer

使用填充伪代码重写了与数据库相关的内容。

如果我正确理解问题，我会把问题（或它们的ID作为代理）视为一副实际的纸牌：对于每个用户，洗牌并逐个发放问题；如果他们需要超过len(deck)个问题，就重新开始：将牌洗入新的顺序并再次执行。当一个问题第n次被看到时，所有其他问题都已被看到n或n-1次。

为了跟踪用户可用的问题，我们将未使用的问题ID放回数据库，并在需要新的发牌时增加“通过多少次”计数器。

类似于：

from random import shuffle

def deal():
    question_IDs = get_all_questions(dbconn) # all questions
    shuffle(question_IDs)
    increment_deal_count(dbconn, userID) # how often this student has gotten questions
    return question_IDs


count_deals = get_stored_deals(dbconn, userID) # specific to this user
if count_deals: 
    question_IDs = get_stored_questions(dbconn, userID) # questions stored for this user 
else: # If 0 or missing, this is the first time for this student
    question_IDs = deal()


while need_another_question(): #based on exam requirements
    try:
        id = question_IDs.pop()
    except IndexError:
        question_IDs = deal()
        id = question_IDs.pop() # Trouble if db is ever empty. 

    use_question(id) # query db with the ID, then put question in print, CMS, whatever

# When we leave that while loop, we have used at least some of the questions
# question_IDs lists the *unused* ones for this deal
# and we know how many times we've dealt.

store_in_db(dbconn, userinfo, question_IDs)
# If you want to know how many times a question has been available, it's
# count_deals - (ID in question_IDs)
# because True evaluates to 1 if you try to subtract it from an integer.