Calculating BLEU score in Python

Question

python nltk

14

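I have a test sentence and a reference sentence. How can I write a Python script that measures the similarity between these two sentences using the BLEU metric, which is used in automatic machine translation evaluation?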

- Alapan Kuila

2

Apart from BLEU, if you want to use machine translation metrics to measure similarity, see: http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval015.pdf - alvas

6 Answers

12

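You are actually asking for two different things. I will try to address each of them.

Part I: Computing the BLEU score

You can calculate the BLEU score using the BLEU module under nltk. See here.

From there you can easily compute the score between the candidate sentence and the reference sentence.

Part II: Computing the similarity

If you aim to measure similarity based on the reference sentence, I would not suggest using the BLEU score as a similarity measure between the first candidate and the second candidate.

Let me elaborate. A BLEU score tells you how closely one candidate matches the reference. Comparing it with another candidate's BLEU score against the same reference only tells you which candidate is closer to the reference; it does not tell you how similar the two candidates are to each other.

If you intend to measure the similarity between two sentences, word2vec would be a better method. You can compute the cosine similarity (the cosine of the angle) between the two sentence vectors to see how similar they are.

For a thorough understanding of what the BLEU metric does, I'd suggest reading this, as well as this for word2vec similarity.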
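
The cosine-similarity idea in this answer can be sketched in plain Python. This is only an illustration under loud assumptions: the three-dimensional vectors in TOY_VECTORS are made up (a real setup would load trained word2vec embeddings), and the helper names sentence_vector and cosine_similarity are hypothetical. A sentence vector is taken here as the average of its word vectors, which is one common choice rather than the only one.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# Real word2vec vectors are learned from data and have far more dimensions.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sits": [0.0, 0.9, 0.1],
    "rests": [0.1, 0.8, 0.2],
    "stock": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.2, 0.8],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of all tokens that have an entry in `vectors`."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token of the sentence has a vector")
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


s1 = sentence_vector("the cat sits".split(), TOY_VECTORS)
s2 = sentence_vector("the kitten rests".split(), TOY_VECTORS)
s3 = sentence_vector("the stock fell".split(), TOY_VECTORS)

sim_close = cosine_similarity(s1, s2)  # sentences with related words
sim_far = cosine_similarity(s1, s3)    # sentences with unrelated words
print(sim_close, sim_far)
```

With these toy vectors the first pair scores much higher than the second, even though neither pair shares any content word, which is exactly the case where n-gram overlap metrics such as BLEU give little signal.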

- Semih Yagcioglu

Don't use word2vec; the more advanced doc2vec (or any other sentence embedding) is a better way to find sentence similarity. - Rajarshee Mitra

5

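You may want to use the Python package SacréBLEU (Python 3 only):

SacréBLEU provides hassle-free computation of shareable, comparable, and reproducible BLEU scores. Inspired by Rico Sennrich's multi-bleu-detok.perl, it produces the official WMT scores but works with plain text. It also knows all the standard test sets and handles downloading, processing, and tokenization for you.

Why use this version of BLEU?

It automatically downloads common WMT test sets and processes them to plain text
It produces a short version string that facilitates cross-paper comparisons
It properly computes scores on detokenized outputs, using WMT (Conference on Machine Translation) standard tokenization
It produces the same values as the official script (mteval-v13a.pl) used by WMT
It outputs the BLEU score without the comma, so you don't have to remove it with sed (looking at you, multi-bleu.perl)

To install: pip install sacrebleu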

- Franck Dernoncourt

4

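Here is code that computes the BLEU score between two files, averaging the sentence-level scores over corresponding lines.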

from nltk.translate.bleu_score import sentence_bleu
import argparse


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--reference', type=str, default='summaries.txt', help='Reference file')
    parser.add_argument('--candidate', type=str, default='candidates.txt', help='Candidate file')
    return parser.parse_args()


args = parse_args()

# Read both files; line i of the reference file pairs with line i of the candidate file.
with open(args.reference, 'r') as f:
    reference = f.readlines()
with open(args.candidate, 'r') as f:
    candidate = f.readlines()

if len(reference) != len(candidate):
    raise ValueError('The number of sentences in both files do not match.')

# Average the sentence-level BLEU scores over all line pairs.
# Note that this average is not the same quantity as corpus-level BLEU.
score = 0.
for ref_line, cand_line in zip(reference, candidate):
    # Tokenize on whitespace; sentence_bleu expects a list of reference token lists.
    score += sentence_bleu([ref_line.strip().split()], cand_line.strip().split())

score /= len(reference)
print("The bleu score is: " + str(score))

Run it with the command python file_name.py --reference file1.txt --candidate file2.txt

- Ameet Deshpande

0

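If the test and reference sentences are known, here are a few examples of how to calculate the BLEU score.

You can even pass in the two sentences as strings and convert them to lists.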

from nltk.translate.bleu_score import sentence_bleu
reference = [['the', 'cat',"is","sitting","on","the","mat"]]
test = ["on",'the',"mat","is","a","cat"]
score = sentence_bleu(  reference, test)
print(score)


from nltk.translate.bleu_score import sentence_bleu
reference = [['the', 'cat',"is","sitting","on","the","mat"]]
test = ["there",'is',"cat","sitting","cat"]
score = sentence_bleu(  reference, test)
print(score)
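
A minimal sketch of the string-to-list conversion mentioned in this answer, assuming that lowercasing and splitting on whitespace is an acceptable tokenization for your data. The helper name to_tokens is made up for illustration; real evaluations often use a proper tokenizer instead.

```python
def to_tokens(sentence):
    """Lowercase a sentence and split it on whitespace."""
    return sentence.lower().split()


# A list of references (here just one), each reference being a token list.
reference = [to_tokens("The cat is sitting on the mat")]
# The test (candidate) sentence as a single token list.
test = to_tokens("On the mat is a cat")

print(reference)
print(test)
```

The resulting lists have the same shape as the hand-written ones in the snippets above, so they can be passed to sentence_bleu(reference, test) in the same way.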

- Aryan Singh

0

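If someone is using TensorFlow, you need to compute y_true and y_pred.

Example:

English input (y_true is some vector of the sentence below) - I loved the movie very much.

French output (y_pred is some vector of the sentence below; you can use tf.argmax() to get the highest probabilities) - j'ai beaucoup aimé le film.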

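import tensorflow as tf

# BATCH_SIZE is not defined in this snippet; it must be set elsewhere
# to the batch size used when the metric is evaluated.


class BLEU(tf.keras.metrics.Metric):
    # Note: this computes, per sample, the fraction of predicted words that can be
    # matched to a word in y_true (a unigram-precision-style score). It is not the
    # full n-gram BLEU with brevity penalty.

    def __init__(self, name='bleu_score'):
        super(BLEU, self).__init__(name=name)
        self.bleu_score = 0

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Turn per-token probabilities into predicted token ids.
        y_pred = tf.argmax(y_pred, -1)
        self.bleu_score = 0
        for i, j in zip(y_pred, y_true):
            tf.autograph.experimental.set_loop_options()

            # Number of non-zero tokens in the prediction.
            total_words = tf.math.count_nonzero(i)
            total_matches = 0
            for word in i:
                # Stop at the first 0 in the prediction.
                if word == 0:
                    break
                for q in range(len(j)):
                    if j[q] == 0:
                        break
                    if word == j[q]:
                        total_matches += 1
                        # Remove the matched position so it cannot be matched twice.
                        j = tf.boolean_mask(j, [False if y == q else True for y in range(len(j))])
                        break

            self.bleu_score += total_matches / total_words

    def result(self):
        return self.bleu_score / BATCH_SIZE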

- ZKS

- ccy · Accepted Answer

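The BLEU score consists of two parts: modified precision and the brevity penalty. See the paper for details. In NLTK you can use the nltk.translate.bleu_score module (older NLTK releases exposed it as nltk.align.bleu_score). Here is a code example: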

import nltk

hypothesis = ['It', 'is', 'a', 'cat', 'at', 'room']
reference = ['It', 'is', 'a', 'cat', 'inside', 'the', 'room']
#there may be several references
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis)
print(BLEUscore)

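Note that the default BLEU score uses n=4, which covers unigrams through 4-grams. If your sentence is shorter than 4 tokens, you need to reset the N value; otherwise a ZeroDivisionError: Fraction(0, 0) error is raised.

So you should reset the weights like this: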

import nltk

hypothesis = ["open", "the", "file"]
reference = ["open", "file"]
# the longest n-gram used is the bigram, so split the weight evenly between unigrams and bigrams
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = (0.5, 0.5))
print(BLEUscore)
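
To make the two components named in this answer concrete, here is a rough pure-Python sketch of modified (clipped) n-gram precision and the brevity penalty, combined through a weighted geometric mean. It is a simplified illustration, not NLTK's implementation: the function names ngrams, modified_precision, brevity_penalty and simple_bleu are made up, no smoothing is applied, and the score is simply 0.0 whenever any n-gram order has no match.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(references, hypothesis, n):
    """Clipped n-gram precision: a hypothesis n-gram is credited at most as
    many times as it occurs in the single reference where it is most frequent."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    if not hyp_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


def brevity_penalty(references, hypothesis):
    """Penalize hypotheses that are shorter than the closest reference."""
    c = len(hypothesis)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((len(ref) for ref in references),
            key=lambda ref_len: (abs(ref_len - c), ref_len))
    return 1.0 if c > r else math.exp(1 - r / c)


def simple_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25)):
    """Unsmoothed sentence-level BLEU; len(weights) sets the maximum n-gram order."""
    precisions = [modified_precision(references, hypothesis, n)
                  for n in range(1, len(weights) + 1)]
    if min(precisions) == 0:
        return 0.0  # the geometric mean collapses when any precision is zero
    log_mean = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(references, hypothesis) * math.exp(log_mean)


hypothesis = ['It', 'is', 'a', 'cat', 'at', 'room']
reference = ['It', 'is', 'a', 'cat', 'inside', 'the', 'room']
print(simple_bleu([reference], hypothesis))
```

On the first example of this answer the precisions are 5/6, 3/5, 2/4 and 1/3 and the brevity penalty is exp(1 - 7/6), which gives a score of about 0.45. The sketch also shows why short sentences are a problem with the default weights: a hypothesis with fewer than four tokens has no 4-grams at all, so its 4-gram precision cannot be positive.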