
CRF (conditional random field) learning summary

2022-06-30 09:46:00 【A grain of sand in the vast sea of people】

1. Why CRF (conditional random field)

If you apply softmax to classify every frame in a sequence independently, the output context is not taken into account directly.

CRFs are mainly used for sequence labeling, which can be understood simply as classifying every frame in a sequence. Since this is classification, the natural approach is to encode the sequence with a CNN or an RNN and then attach a fully connected layer with softmax activation, as shown in the figure below.

Figure: frame-by-frame softmax does not directly consider the output context

Conditional random field

However, when we design the labels, for example when we use the four tags s, b, m, and e for character-level word segmentation, the target output sequence itself carries contextual constraints: s cannot be followed by m or e, and so on. Per-frame softmax does not consider this output-level context, which means these constraints have to be captured at the encoding level in the hope that the model learns them on its own, and sometimes that asks too much of the model.

A CRF is more direct: it factors out the output-level associations, which makes the model's job easier:

Figure: a CRF explicitly considers context at the output level
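To make the output-level constraints concrete, here is a minimal sketch in plain Python. The names `ALLOWED_NEXT`, `is_valid_path`, and `framewise_argmax` are hypothetical, the scores are made-up numbers, and the hard allow-list is only an illustration: an actual CRF learns soft transition scores rather than a fixed table.

```python
# Tags that may follow each tag under the s/b/m/e word-segmentation scheme.
# Illustrative allow-list only; a CRF learns transition scores instead.
ALLOWED_NEXT = {
    "s": {"s", "b"},  # after a single-character word, a new word starts
    "b": {"m", "e"},  # a begun word must continue or end
    "m": {"m", "e"},
    "e": {"s", "b"},  # after a word ends, a new word starts
}


def is_valid_path(tags):
    """Return True if the tag sequence is a well-formed segmentation."""
    if not tags:
        return True
    if tags[0] not in {"s", "b"} or tags[-1] not in {"s", "e"}:
        return False
    return all(nxt in ALLOWED_NEXT[cur] for cur, nxt in zip(tags, tags[1:]))


def framewise_argmax(frame_scores):
    """Pick the highest-scoring tag for each frame independently."""
    return [max(scores, key=scores.get) for scores in frame_scores]


# Made-up per-frame scores for a two-character input.
frame_scores = [
    {"s": 0.6, "b": 0.3, "m": 0.05, "e": 0.05},
    {"s": 0.1, "b": 0.2, "m": 0.5, "e": 0.2},
]
greedy = framewise_argmax(frame_scores)
print(greedy, is_valid_path(greedy))
print(is_valid_path(list("bebess")))
```

With these scores the independent per-frame choice is s followed by m, which is not a legal segmentation. That is the kind of error that modeling tag-to-tag associations at the output is meant to discourage.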


2. What is a CRF

Of course, simply introducing correlations between outputs is not all there is to a CRF. Its real elegance is that it takes the whole path as the unit and models the probability of that path.

2.1 Model overview

If an input has n frames and each frame's label has k possible values, then in theory there are k^n different outputs. We can visualize this with the network diagram below. In the diagram, each point represents one possible label, each line between points represents an association between labels, and every labeling result corresponds to one complete path through the graph.

Figure: output network for a 4-tag word segmentation model

In a sequence labeling task, there is generally only one correct answer. Take the six-character Chinese sentence 今天天气不错 ("it's a nice day today"): if the correct segmentation is 今天 / 天气 / 不 / 错, then the target output sequence is bebess, and no other path meets the requirement. In other words, the basic unit of study in sequence labeling should be the path. The task is to pick the correct one out of k^n paths, which means that, viewed as a classification problem, it is a k^n-way classification problem that selects a single class!

This is the fundamental difference between frame-by-frame softmax and a CRF: the former treats sequence labeling as n separate k-way classification problems, while the latter treats it as a single k^n-way classification problem.
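The text does not spell out how a path is scored, so the sketch below assumes the standard linear-chain form: a path's score is the sum of its per-frame (emission) scores plus a transition score for each adjacent pair of tags. Under that assumption, the normalizer over all k^n paths can be computed either by brute-force enumeration or by the forward recursion in O(n·k²). All function names and numbers are hypothetical.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def path_score(emissions, transitions, path):
    """Unnormalized score of one path: emission plus transition terms."""
    score = emissions[0][path[0]]
    for t in range(1, len(path)):
        score += transitions[path[t - 1]][path[t]] + emissions[t][path[t]]
    return score


def log_partition_brute_force(emissions, transitions):
    """Log of the summed exp-scores over all k**n paths (tiny inputs only)."""
    n, k = len(emissions), len(emissions[0])
    all_paths = itertools.product(range(k), repeat=n)
    return logsumexp([path_score(emissions, transitions, p) for p in all_paths])


def log_partition_forward(emissions, transitions):
    """Same quantity via the forward recursion."""
    k = len(emissions[0])
    alpha = list(emissions[0])
    for frame in emissions[1:]:
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(k)]) + frame[j]
            for j in range(k)
        ]
    return logsumexp(alpha)


# Made-up scores: n = 3 frames, k = 2 tags, so there are 2**3 = 8 paths.
emissions = [[0.5, 1.0], [1.2, 0.3], [0.1, 0.8]]
transitions = [[0.2, -0.4], [0.6, 0.1]]

log_z = log_partition_forward(emissions, transitions)
gold_path = (1, 0, 1)
# Log-probability of one whole path under the k**n-way softmax.
gold_log_prob = path_score(emissions, transitions, gold_path) - log_z
print(log_z, log_partition_brute_force(emissions, transitions), gold_log_prob)
```

The negative of `gold_log_prob` is the per-sequence loss in this formulation: one softmax over whole paths rather than n separate softmaxes over tags.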

3. Tagging schemes in sequence labeling models

3.1 SBME Tagging

S marks a single-character word (single); B marks the beginning of a word (begin), i.e., its first character; M marks the middle of a word (middle), i.e., any interior character; E marks the end of a word (end), i.e., its last character. The tags are usually represented as numbers:
# -1, unknown
# 0-> 'S'
# 1-> 'B'
# 2-> 'M'
# 3-> 'E'

Example: 我爱使用小米手机玩王者荣耀 ("I love using a Xiaomi phone to play Honor of Kings") -> 我<S> 爱<S> 使<B> 用<E> 小<B> 米<M> 手<M> 机<E> 玩<S> 王<B> 者<M> 荣<M> 耀<E>
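The SBME rules can be sketched as a small function that turns an already segmented sentence into tags. The helper name `sbme_tags` is hypothetical, and the word list is an assumed reconstruction of the example sentence's segmentation; `TAG_TO_ID` mirrors the numeric encoding listed above.

```python
# Numeric encoding from the list above (-1 is reserved for unknown).
TAG_TO_ID = {"S": 0, "B": 1, "M": 2, "E": 3}


def sbme_tags(words):
    """Return one S/B/M/E tag per character for a list of segmented words."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


# Assumed segmentation of the example sentence.
words = ["我", "爱", "使用", "小米手机", "玩", "王者荣耀"]
tags = sbme_tags(words)
ids = [TAG_TO_ID[tag] for tag in tags]
print("".join(tags))  # SSBEBMMESBMME
print(ids)
```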

3.2 CS Tagging

C means that the current character and the next character are contiguous and belong to the same word; S means that the current character and the next character belong to two different words.
The tags are usually represented as numbers:
# -1, unknown
# 0 -> 'C'
# 1 -> 'S'

Example: 我爱使用小米手机玩王者荣耀 ("I love using a Xiaomi phone to play Honor of Kings") -> 我<S> 爱<S> 使<C> 用<S> 小<C> 米<C> 手<C> 机<S> 玩<S> 王<C> 者<C> 荣<C> 耀<S>
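A minimal sketch of CS tagging and its inverse, assuming (as the example suggests) that every character of a word is tagged C except the last, which is tagged S. The helper names `cs_tags` and `words_from_cs` are hypothetical, and the word list is an assumed reconstruction of the example's segmentation.

```python
def cs_tags(words):
    """Tag every character C, except the last character of each word (S)."""
    tags = []
    for word in words:
        tags.extend(["C"] * (len(word) - 1) + ["S"])
    return tags


def words_from_cs(chars, tags):
    """Rebuild the segmentation by cutting after every S tag."""
    words, current = [], ""
    for char, tag in zip(chars, tags):
        current += char
        if tag == "S":
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends on C
        words.append(current)
    return words


# Assumed segmentation of the example sentence.
words = ["我", "爱", "使用", "小米手机", "玩", "王者荣耀"]
tags = cs_tags(words)
print("".join(tags))  # SSCSCCCSSCCCS
print(words_from_cs("".join(words), tags) == words)  # True
```

Compared with SBME, this scheme needs only two labels, at the cost of giving the model less information about a character's position inside a word.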

3.3. IOB (inside-outside-beginning) tagging

IOB is a tagging scheme. The IOB format is a common format for tagging tokens in computational linguistics.

The B- prefix indicates that the token is the beginning of a chunk; the I- prefix indicates that the token is inside a chunk; the O tag indicates that the token does not belong to any chunk.

The B- tag is used only when a chunk immediately follows another chunk of the same type, with no O tag between the two.

An example with IOB format:

Alex I-PER
is O
going O
to O
Los I-LOC
Angeles I-LOC
in O
California I-LOC
Alex is going to Los Angeles in California
I-PER O O O I-LOC I-LOC O I-LOC

Notice how "Alex", "Los" and "California", although first tokens of their chunk, have the "I-" prefix.

Another example

Alex I-PER
going O
Los I-LOC
Angeles I-LOC
California B-LOC

Notice how "California" now has the "B-" prefix, because it immediately follows another LOC chunk.

3.4. IOB2 format

Another similar format which is widely used is IOB2 format, which is the same as the IOB format except that the B- tag is used in the beginning of every chunk (i.e. all chunks start with the B- tag).

Example

Alex B-PER
is O
going O
to O
Los B-LOC
Angeles I-LOC
in O
California B-LOC
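The difference between the two formats can be sketched as a conversion from IOB to IOB2: an I- tag becomes B- whenever it opens a chunk, that is, when the previous tag is O or belongs to a different type. The function name `iob_to_iob2` is hypothetical.

```python
def iob_to_iob2(tags):
    """Convert IOB tags to IOB2 so that every chunk starts with a B- tag."""
    converted = []
    previous = "O"
    for tag in tags:
        starts_chunk = tag.startswith("I-") and (
            previous == "O" or previous[2:] != tag[2:]
        )
        converted.append("B-" + tag[2:] if starts_chunk else tag)
        previous = tag  # compare against the original, unconverted tag
    return converted


# Tags from the first IOB example ("Alex is going to Los Angeles in California").
iob = ["I-PER", "O", "O", "O", "I-LOC", "I-LOC", "O", "I-LOC"]
print(iob_to_iob2(iob))
# ['B-PER', 'O', 'O', 'O', 'B-LOC', 'I-LOC', 'O', 'B-LOC']
```

Existing B- tags are left untouched, so adjacent chunks of the same type stay separated after conversion.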

3.5. BIOES

Related tagging schemes sometimes include "START/END: This consists of the tags B, E, I, S or O where S is used to represent a chunk containing a single token. Chunks of length greater than or equal to two always start with the B tag and end with the E tag."[4]

Other tagging schemes include BIOES/BILOU, where 'E' (ending) or 'L' (last) marks the final token of a multi-token chunk, and 'S' (single) or 'U' (unit) marks a chunk consisting of a single token.

Alex S-PER
is O
going O
with O
Marty B-PER
A. I-PER
Rick E-PER
to O
Los B-LOC
Angeles E-LOC
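As a sketch of how BIOES relates to IOB2, the conversion below looks one tag ahead: a chunk token keeps B- or I- if the chunk continues, and otherwise becomes S- (single-token chunk) or E- (last token). The function name `iob2_to_bioes` is hypothetical and the input is assumed to be well-formed IOB2.

```python
def iob2_to_bioes(tags):
    """Convert well-formed IOB2 tags to BIOES."""
    converted = []
    for index, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        following = tags[index + 1] if index + 1 < len(tags) else "O"
        chunk_continues = following == "I-" + label
        if prefix == "B":
            converted.append(("B-" if chunk_continues else "S-") + label)
        else:  # prefix == "I"
            converted.append(("I-" if chunk_continues else "E-") + label)
    return converted


# IOB2 tags for "Alex is going with Marty A. Rick to Los Angeles".
iob2 = ["B-PER", "O", "O", "O", "B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(iob2_to_bioes(iob2))
# ['S-PER', 'O', 'O', 'O', 'B-PER', 'I-PER', 'E-PER', 'O', 'B-LOC', 'E-LOC']
```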

References
Wikipedia: Inside-outside-beginning

Text Chunking using Transformation-Based Learning, Ramshaw and Marcus, 1995

4. Code implementation

Install tensorflow-addons. In TensorFlow 1 the CRF implementation lives in tf.contrib; in TensorFlow 2 it lives in the tensorflow_addons package.

pip install tensorflow-addons

Test example

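import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

# Unnormalized per-frame tag scores: batch of 2 sequences, 10 frames, 5 tags.
inputs = tf.random.truncated_normal([2, 10, 5])
# Random gold tag indices in [0, 5) for every frame.
target = tf.convert_to_tensor(np.random.randint(5, size=(2, 10)), dtype=tf.int32)
# Per-frame softmax, shown only for comparison; the CRF calls below use the raw scores.
out = tf.nn.softmax(inputs, axis=-1)

# True (unpadded) length of each sequence in the batch.
lens = tf.convert_to_tensor([9, 6], dtype=tf.int32)
# No transition matrix is passed in, so crf_log_likelihood creates one and returns it.
log_likelihood, tran_paras = tfa.text.crf_log_likelihood(inputs, target, lens)
# Viterbi decoding: best tag sequence and its score for each sequence.
batch_pred_sequence, batch_viterbi_score = tfa.text.crf_decode(inputs, tran_paras, lens)
# Negative log-likelihood summed over the batch, used as the loss.
loss = tf.reduce_sum(-log_likelihood)
print('log_likelihood is :', log_likelihood.numpy())
print('batch_pred_sequence is :', batch_pred_sequence.numpy())
print('loss is :', loss.numpy())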

Output

log_likelihood is : [-18.046837 -14.958561]
batch_pred_sequence is : [[0 3 1 4 3 4 2 0 4 3]
 [3 0 3 3 2 2 4 1 4 1]]
loss is : 33.005398
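To show what the decoding step computes conceptually, here is a plain-Python sketch of Viterbi decoding for a single sequence. It is not the tensorflow_addons implementation: the function name `viterbi` is hypothetical, the scores are made up, and batching and sequence lengths are ignored.

```python
def viterbi(emissions, transitions):
    """Return (best_path, best_score) for one sequence.

    emissions[t][j] is the score of tag j at frame t;
    transitions[i][j] is the score of moving from tag i to tag j.
    """
    k = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for frame in emissions[1:]:
        new_score, pointers = [], []
        for j in range(k):
            # Best previous tag for ending at tag j in this frame.
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + frame[j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    last = max(range(k), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Made-up scores: 4 frames, 3 tags.
emissions = [
    [1.0, 0.2, 0.1],
    [0.3, 0.9, 0.4],
    [0.5, 0.1, 1.2],
    [0.2, 0.9, 0.3],
]
transitions = [
    [0.1, 0.6, -0.5],
    [-0.3, 0.2, 0.7],
    [0.4, -0.2, 0.1],
]
best_path, best_score = viterbi(emissions, transitions)
print(best_path, round(best_score, 6))  # [0, 1, 2, 1] 5.1
```

The recursion keeps only the best score per tag at each frame, so the best of the k^n paths is found in O(n·k²) time instead of by enumeration.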

Copyright notice
This article was written by [A grain of sand in the vast sea of people]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202160524516224.html