
BLEU in Machine Translation Explained in Detail

2022-07-07 07:09:00 【aelum】

Contents

1. $n$-grams
2. BLEU (Bilingual Evaluation Understudy)
References

1. $n$-grams

An $n$-gram is a sequence of $n$ consecutive tokens in a text. For $n = 1, 2, 3$, an $n$-gram is also called a unigram, a bigram, and a trigram, respectively.
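To make the definition concrete, here is a minimal sketch that lists the $n$-grams of a tokenized sentence. The helper name `ngrams` and the example sentence are illustrative choices, not part of the original article.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical example sentence, already split into tokens
tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
print(ngrams(tokens, 3))  # trigrams
```

A sequence of $T$ tokens contains $T - n + 1$ $n$-grams when $T \geq n$, and none otherwise.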

An $n$-gram model is a probabilistic language model based on a Markov chain of order $n - 1$ (that is, the probability of a word depends only on the $n - 1$ words that precede it):

$\begin{aligned} \text{unigram:}\quad&P(w_1,w_2,\cdots,w_T)=\prod_{i=1}^T P(w_i) \\ \text{bigram:}\quad&P(w_1,w_2,\cdots,w_T)=P(w_1)\prod_{i=1}^{T-1} P(w_{i+1}|w_i) \\ \text{trigram:}\quad&P(w_1,w_2,\cdots,w_T)=P(w_1)P(w_2|w_1)\prod_{i=1}^{T-2} P(w_{i+2}|w_{i},w_{i+1}) \end{aligned}$
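As an illustration of the bigram factorization, the following sketch estimates $P(w_1)$ and $P(w_{i+1}|w_i)$ from raw counts in a toy corpus. The function name, the toy corpus, and the unsmoothed count-ratio estimates are all assumptions made for this example; the article itself does not say how the probabilities are obtained.

```python
from collections import Counter


def bigram_sequence_probability(sentence, corpus):
    """Estimate P(w_1) * prod_i P(w_{i+1} | w_i) from counts (no smoothing).

    Assumes sentence and corpus are non-empty lists of tokens.
    """
    unigram_counts = Counter(corpus)
    bigram_counts = Counter(zip(corpus, corpus[1:]))
    # P(w_1): relative frequency of the first token in the corpus
    prob = unigram_counts[sentence[0]] / len(corpus)
    for prev, cur in zip(sentence, sentence[1:]):
        if unigram_counts[prev] == 0:
            return 0.0  # unseen context: the estimate is taken to be zero
        # P(cur | prev) is approximated by count(prev, cur) / count(prev)
        prob *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return prob


corpus = "a b a b a c".split()  # hypothetical toy corpus
print(bigram_sequence_probability(["a", "b"], corpus))
print(bigram_sequence_probability(["c", "a"], corpus))
```

Here the first call multiplies $P(a) = 3/6$ by $P(b|a) = 2/3$, while the second returns zero because the bigram $(c, a)$ never occurs in the corpus.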

2. BLEU (Bilingual Evaluation Understudy)

2.1 Definition of BLEU

BLEU (pronounced like the word "blue") was first used to evaluate machine translation output, but it is now widely used to measure the quality of output sequences in many other applications. For any $n$-gram in the predicted sequence pred, BLEU checks whether that $n$-gram appears in the label sequence label.

BLEU is defined as follows:

$\text{BLEU}=\exp\left(\min\left(0,1-\frac{\text{len(label)}}{\text{len(pred)}}\right)\right)\prod_{n=1}^kp_n^{1/2^n}$

where $\text{len}(*)$ denotes the number of tokens in sequence $*$, $k$ is the length of the longest $n$-gram used for matching (usually $4$), and $p_n$ denotes the precision of the $n$-grams.
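The leading exponential factor depends only on the two sequence lengths. A minimal sketch (the helper name `brevity_penalty` is an assumption chosen for this example) shows how it behaves for a label of length $6$ and predictions of various lengths:

```python
import math


def brevity_penalty(len_label, len_pred):
    """Length factor exp(min(0, 1 - len(label) / len(pred))) from the formula above."""
    return math.exp(min(0, 1 - len_label / len_pred))


for len_pred in (3, 5, 6, 9):
    print(len_pred, round(brevity_penalty(6, len_pred), 4))
# 3 0.3679
# 5 0.8187
# 6 1.0
# 9 1.0
```

The factor falls below $1$ only when pred is shorter than label; predictions that are as long as or longer than the label are not penalized by this term.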

Concretely, suppose label is $A, B, C, D, E, F$ and pred is $A, B, B, C, D$, and take $k = 3$.

First consider how to compute $p_1$. List every unigram in pred: $(A), (B), (B), (C), (D)$, then every unigram in label: $(A), (B), (C), (D), (E), (F)$, and count the matches between them (each unigram in label can be matched at most once, so the matching must be one-to-one). There are $4$ matches and pred has $5$ unigrams in total, so $p_1=4/5$.

Next consider $p_2$. List every bigram in pred: $(A, B), (B, B), (B, C), (C, D)$, then every bigram in label: $(A, B), (B, C), (C, D), (D, E), (E, F)$, and count the matches between them. There are $3$ matches and pred has $4$ bigrams in total, so $p_2=3/4$.

Finally consider $p_3$. List every trigram in pred: $(A, B, B), (B, B, C), (B, C, D)$, then every trigram in label: $(A, B, C), (B, C, D), (C, D, E), (D, E, F)$, and count the matches between them. There is only $1$ match and pred has $3$ trigrams in total, so $p_3=1/3$.
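The three precisions computed by hand above can be checked with a short sketch. It uses the multiset intersection of two `Counter` objects to enforce the one-to-one matching rule; the helper name `clipped_precision` is an assumption for this example, and the function assumes pred contains at least $n$ tokens.

```python
from collections import Counter
from fractions import Fraction


def clipped_precision(label, pred, n):
    """Fraction of n-grams in pred that can be matched one-to-one in label."""
    label_counts = Counter(tuple(label[i:i + n]) for i in range(len(label) - n + 1))
    pred_counts = Counter(tuple(pred[i:i + n]) for i in range(len(pred) - n + 1))
    # Counter & Counter keeps the minimum count per n-gram, so a repeated
    # n-gram in pred cannot be matched more often than it occurs in label
    matches = sum((label_counts & pred_counts).values())
    return Fraction(matches, sum(pred_counts.values()))


label = "A B C D E F".split()
pred = "A B B C D".split()
for n in (1, 2, 3):
    print(n, clipped_precision(label, pred, n))
# 1 4/5
# 2 3/4
# 3 1/3
```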

The BLEU score in this example is therefore

$\begin{aligned} \text{BLEU}&=\exp(\min(0,1-6/5))\cdot p_1^{1/2}\cdot p_2^{1/4}\cdot p_3^{1/8} \\ &=e^{-0.2}\cdot \left(\frac45\right)^{1/2}\cdot \left(\frac34\right)^{1/4}\cdot\left(\frac13\right)^{1/8} \\ &\approx0.5940 \end{aligned}$

2.2 Discussion of BLEU

By the definition of BLEU, when the predicted sequence is identical to the label sequence (and contains at least $k$ tokens, so that every $p_n$ is defined), BLEU equals $1$. On the other hand, since $0<\exp(\min(0,\cdot))\leq1$ and $0\leq p_n\leq1$, we have

$\text{BLEU}\in[0,1]$

The closer BLEU is to $1$, the better the prediction; the closer it is to $0$, the worse the prediction.

In addition, because longer $n$-grams are harder to match, BLEU assigns greater weight to the precision of longer $n$-grams (for a fixed $a\in(0,1)$, $a^{1/2^n}$ increases with $n$). Moreover, because shorter predicted sequences tend to yield higher values of $p_n$, the coefficient $\exp(\cdot)$ is used to penalize predicted sequences that are shorter than the label.
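A quick numeric check of the weighting claim, as a sketch with an arbitrarily chosen precision $a = 0.5$ (the value is an assumption for illustration only):

```python
a = 0.5  # the same precision value assumed at every n-gram order
factors = [a ** (1 / 2 ** n) for n in range(1, 5)]
print([round(f, 4) for f in factors])
# [0.7071, 0.8409, 0.917, 0.9576]
```

The same precision therefore lowers the score less when it is obtained on longer $n$-grams, which is the sense in which longer matches receive greater weight.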

2.3 A simple implementation of BLEU

import math
from collections import Counter


def bleu(label, pred, k=4):
    # Assume label and pred have already been tokenized (lists of tokens)
    score = math.exp(min(0, 1 - len(label) / len(pred)))
    for n in range(1, k + 1):
        # Count every n-gram in label with a hash table
        hashtable = Counter([' '.join(label[i:i + n]) for i in range(len(label) - n + 1)])
        # Number of successful matches
        num_matches = 0
        for i in range(len(pred) - n + 1):
            ngram = ' '.join(pred[i:i + n])
            if ngram in hashtable and hashtable[ngram] > 0:
                num_matches += 1
                hashtable[ngram] -= 1
        score *= math.pow(num_matches / (len(pred) - n + 1), math.pow(0.5, n))
    return score

For example:

label = 'A B C D E F'
pred = 'A B B C D'
for i in range(4):
    print(bleu(label.split(), pred.split(), k=i + 1))
# 0.7322950476607851
# 0.6814773296495302
# 0.5940339360503315
# 0.0