Following a dynamic score

Question

python statistics artificial-intelligence agent

3

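I have no formal training in discrete mathematics, and I've run into a small problem. I'm trying to write an agent that reads a human player's (arbitrary) score and scores points itself at intervals. The agent needs to "fall behind" and "catch up" from time to time so that the human player believes there is some competition. The agent must then win or lose against the human, depending on the condition.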

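I've tried a few different techniques, including a weird probability loop that failed miserably. I think this problem calls for something like an emission hidden Markov model (HMM), but I'm not sure how to implement it (or even whether it's the best approach).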

I have a gist, but it's terrible.

I hope the __main__ function gives some insight into the goal of this agent. It will be called from pygame.

- Octaflop

2

More detail might help. What kind of "game" is this? Is scoring frequent, as in pinball, or infrequent, as in soccer (unless you're Brazil)? - Seth

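The game is Tetris. I've written the code so that the player gets 10 points for each piece placed, and when the player clears lines, they get 100 times the square of the number of lines cleared. - Octaflop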

1

Just curious, why not have the agent actually compete and earn its score by really playing the game? - MattH

This is for a psychology experiment; there need to be win and lose conditions. - Octaflop

2 Answers

0

I'm assuming the human can't see the computer agent playing. If that's the case, here's an idea you could try.

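Create a list of all the possible point values that any given move can score. For each move, work out the score range you want the agent to land in once the current turn is over. Narrow the set of possible move values down to only those that land the agent in that range, and pick one at random. As the conditions change for how far behind or ahead you want the agent to be, just adjust your range accordingly.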
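
The idea above can be sketched roughly as follows. This is a minimal sketch, not the answerer's code: it assumes the scoring rule from the comments (10 points per placed piece, plus 100 times the square of the number of lines cleared), and the names `possible_move_scores`, `pick_agent_move`, and the `low`/`high` window (offsets relative to the human's score) are all hypothetical.

```python
import random


def possible_move_scores(max_lines=4):
    """Scores a single placement can yield under the rule from the comments:
    10 points per piece plus 100 * (lines cleared) ** 2."""
    return [10 + 100 * lines ** 2 for lines in range(max_lines + 1)]


def pick_agent_move(agent_score, human_score, low, high, rng=random):
    """Pick a move value that leaves the agent within [human + low, human + high].

    `low` and `high` are offsets relative to the human's score (negative
    means trailing). If no move lands in the window, fall back to the move
    whose result is closest to it.
    """
    moves = possible_move_scores()
    lo_target, hi_target = human_score + low, human_score + high
    in_range = [m for m in moves if lo_target <= agent_score + m <= hi_target]
    if in_range:
        return rng.choice(in_range)

    def distance(move):
        # Only called when every move misses the window.
        total = agent_score + move
        if total < lo_target:
            return lo_target - total
        return total - hi_target

    return min(moves, key=distance)


# Agent at 300, human at 500, agent should end between 100 behind and 50 ahead.
agent_score = 300
agent_score += pick_agent_move(agent_score, human_score=500, low=-100, high=50)
print(agent_score)
```

To make the agent trail for a while and then pull ahead, the caller would shift `low` and `high` over time, and set a strictly positive or strictly negative window near the end to force the win or loss the experimental condition requires.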

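If you're looking for something with built-in, researched psychological effects, I can't help you there. If you want anything more specific than this, you'll need to define more of the rules for us.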

- Nick Larsen

I like this a lot. I'll try implementing it on Saturday. - Octaflop

- Cerin · Accepted Answer

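I think you may be overthinking this. You can use simple probabilities to control how often, and by how much, the computer's score "catches up". In addition, you can compute the difference between the computer's score and the human's score and feed it into a sigmoid-like function to determine how much the computer's score increases.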

Python example:

#!/usr/bin/env python3
import math
import random

human_score = 0
computer_score = 0
trials = 100
computer_ahead_factor = 5  # maximum number of points the computer can jump ahead by
computer_catchup_prob = 0.33  # probability of the computer catching up
computer_ahead_prob = 0.5  # probability of the computer jumping ahead of the human
computer_advantage_count = 0
for i in range(trials):
    # Simulate player score increase.
    human_score += random.randint(0, 5)  # add an arbitrary random amount
    # Map the score gap into (0, 1) with a sigmoid-like (arctangent) function.
    # p scales how much of the gap the computer closes when it catches up.
    score_diff = human_score - computer_score
    p = (math.atan(score_diff) / (math.pi / 2.) + 1) / 2.
    if random.random() < computer_ahead_prob:
        # Jump level with, or slightly ahead of, the human.
        computer_score = human_score + random.randint(0, computer_ahead_factor)
    elif random.random() < computer_catchup_prob:
        # Otherwise, sometimes close part of the gap.
        computer_score += int(abs(score_diff) * p)
    # Display scores.
    print('Human score:', human_score)
    print('Computer score:', computer_score)
    computer_advantage_count += computer_score > human_score
print('Effective computer advantage ratio: %.6f' % (computer_advantage_count / trials,))
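
To see what the arctangent mapping in the example does, the short sketch below evaluates it for a few score differences. The helper name `catchup_weight` is hypothetical; the expression itself is the one used for `p` in the example.

```python
import math


def catchup_weight(score_diff):
    """Map a score difference (human minus computer) into the open interval (0, 1).

    Same expression as `p` in the example: atan squashes the difference into
    (-pi/2, pi/2), which is then rescaled to (0, 1).
    """
    return (math.atan(score_diff) / (math.pi / 2.0) + 1.0) / 2.0


for diff in (-20, -1, 0, 1, 20):
    print(f"diff={diff:+3d}  p={catchup_weight(diff):.3f}")
```

The weight is 0.5 when the scores are level, 0.75 when the computer trails by one point, and already about 0.98 at a gap of 20. For Tetris-sized scores, where a single move is worth 10 points or more, the function would saturate almost immediately, so dividing the difference by a scale constant before taking the arctangent is probably needed; that constant is an assumption to tune, not something the answer specifies.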