Markov chain in Python (beginner)


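I'm new to Python and am trying to build a Markov chain. Other examples I've seen use object instances, and I haven't gotten that far yet. I haven't written the part that randomly selects a value, but basically I'm confused by the output of my code so far.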

filename = open("dr-suess.txt")

def make_list(filename):
    """make file a list and a list of tuple tup_pairs"""
    file_string = filename.read()  #read whole file
    file_list = file_string.split()   #split on whitespace (not worrying about 
                                      # punctuation right now)
    tup_pairs = []
    for i in range(len(file_list)-1):  
        tup_pairs.append((file_list[i], file_list[i+1]))  #making my tuple pair list
    return tup_pairs, file_list  #return after the loop has finished, not inside it

def mapping(filename):
    tup_pairs, file_list = make_list(filename)  
    dictionary = {} 
    for pair in tup_pairs:
        dictionary[pair] = []  #setting the value of dict to empty list
    tup_pairs = set(tup_pairs)   #throwing out repeated tuples 
    for word in file_list:
        word_number = file_list.index(word)  #index number of iter word
        if word_number > 1:   #because there is no -2/-1 index 
            compared_tuple = (file_list[word_number-2], file_list[word_number-1]) #to find
                                                            #preceding pair to compare
            for pair in tup_pairs:
                if compared_tuple == pair: 
                    dictionary[pair].append(word)  #should append the word to my dict value (list)

    print dictionary  #getting weird results (some words should appear that don't, some
                   # appear that shouldn't)

mapping(filename)

Output:

Lindsays-MBP:markov lindsayg$ python markov.py 
{('a', 'fox?'): [], ('Sam', 'I'): ['am?'], ('you,', 'could'): ['you', 'you', 'you', 'you', 'you', 'you'], ('could', 'you'): ['in', 'with', 'in', 'with'], ('you', 'with'): [], ('box?', 'Would'): [], ('ham?', 'Would'): [], ('I', 'am?'): [], ('you', 'in'): ['a', 'a', 'a', 'a'], ('a', 'house?'): [], ('like', 'green'): ['eggs'], ('like', 'them,'): ['Sam'], ('and', 'ham?'): [], ('Would', 'you'): ['like', 'like'], ('a', 'mouse?'): [], ('them,', 'Sam'): ['I'], ('in', 'a'): ['house?', 'box?'], ('with', 'a'): ['mouse?', 'fox?'], ('house?', 'Would'): [], ('a', 'box?'): [], ('Would', 'you,'): ['could', 'could', 'could', 'could'], ('green', 'eggs'): ['and'], ('you', 'like'): ['green', 'them,'], ('mouse?', 'Would'): [], ('fox?', 'Would'): [], ('eggs', 'and'): ['ham?']}

An example of the odd output (there should be only 4 'you' values, but there are 6):
('you,', 'could'): ['you', 'you', 'you', 'you', 'you', 'you']

Text of the input file:

Would you, could you in a house?
Would you, could you with a mouse?
Would you, could you in a box?
Would you, could you with a fox?
Would you like green eggs and ham?
Would you like them, Sam I am?

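I think you need to revisit the algorithm you are trying to implement. From what I read, for every "you" the code will get the compared tuple ("Would you"), which exists 6 times in your case. Perhaps you need to use "for idx, word in enumerate(file_list)" and use idx instead of word_number. - Tasos Vogiatzoglou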
1 Answer

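Your problem is in how you find the index of the word: index only returns the position of the first instance. There are 6 occurrences of 'you' (plus 4 separate occurrences of 'you,'), and every 'you' gets the same index, word_number = 3, so all of them are appended to the same pair, ('you,', 'could').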
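A quick way to see this behaviour is to call index on a small list. The snippet below is only an illustration; it uses the first two lines of the file as a stand-in for file_list, and the names first_you, words and positions are made up for the example.

```python
# First line of the file, split on whitespace as in make_list().
file_list = "Would you, could you in a house?".split()

# list.index() returns the position of the first match only.
first_you = file_list.index("you")
print(first_you)  # 3

# Looking up each word of a longer text this way maps every repeat
# of a word to that same first position.
text = ("Would you, could you in a house? "
        "Would you, could you with a mouse?")
words = text.split()
positions = [words.index(w) for w in words if w == "you"]
print(positions)  # [3, 3]
```

The second 'you' actually sits at position 10, but index reports 3 for it as well, which is why the preceding pair is always computed from the start of the text.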
To get the index, use the built-in enumerate function instead:
for word_number, word in enumerate(file_list):
    ...
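Putting that together, one possible shape for the whole mapping step is sketched below. This is not the asker's code: build_chain and generate are hypothetical names, collections.defaultdict replaces the manual setup of empty lists, and generate is only a guess at the random-selection part the question says is still unwritten. The file text is inlined so the sketch runs on its own.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each pair of consecutive words to the words that follow it."""
    chain = defaultdict(list)
    # enumerate() yields the true position of every word, repeats included.
    for word_number, word in enumerate(words):
        if word_number > 1:  # the first two words have no preceding pair
            pair = (words[word_number - 2], words[word_number - 1])
            chain[pair].append(word)
    return dict(chain)


def generate(chain, start, length, rng=random):
    """Hypothetical helper: walk the chain from a starting pair."""
    out = list(start)
    while len(out) < length:
        followers = chain.get((out[-2], out[-1]))
        if not followers:  # dead end: nothing ever followed this pair
            break
        out.append(rng.choice(followers))
    return " ".join(out)


text = """Would you, could you in a house?
Would you, could you with a mouse?
Would you, could you in a box?
Would you, could you with a fox?
Would you like green eggs and ham?
Would you like them, Sam I am?"""

words = text.split()
chain = build_chain(words)
print(chain[("you,", "could")])  # ['you', 'you', 'you', 'you']
print(generate(chain, ("Would", "you,"), 8))
```

With positions taken from enumerate, ('you,', 'could') collects the expected four 'you' values. Duplicates are kept in each list on purpose, so that random.choice picks a follower in proportion to how often it appeared in the text.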
