Text Error Correction: the CRASpell Model
2022-06-13 11:39:00 【xuanningmeng】
CRASpell: A Contextual Typo Robust Approach to Improve Chinese Spelling Correction was published at ACL 2022 and is SOTA on the Chinese spelling correction (CSC) task. CSC models built on BERT-style pre-trained models have two limitations:
(1) They perform poorly on multi-typo text. When misspelled text contains more than one typo, each misspelled character acts as noise in the context of the others, and this noisy context degrades performance on multi-typo text.
(2) Because of BERT's masked-language-model pre-training, these models tend to over-correct, rewriting valid but less frequent expressions toward high-frequency words.
CRASpell constructs a noisy sample for each training sample and forces the correction model to produce similar output distributions on the original and noisy contexts, which addresses the multi-typo problem. To address over-correction, it adds a copy mechanism that lets the model keep an input character when that character is already valid in the given context.
Paper link: article
Code link: code
Model
Task description
The goal of Chinese spelling correction is to detect and correct spelling errors in text. Formally, $X = \{x_1, x_2, \dots, x_n\}$ is a text of length $n$ containing spelling errors, and $Y = \{y_1, y_2, \dots, y_n\}$ is the corresponding correct text of length $n$; the model takes $X$ as input and generates the correct text $Y$.
CRASpell Model

(Figure: overall CRASpell architecture.) On the left is the Correction Module; on the right is the Noise Modeling Module. The two are described in detail below.
(1) Correction Module
Given the input text $X = \{x_1, x_2, \dots, x_n\}$, we obtain the embedding matrix $E = \{e_1, e_2, \dots, e_n\}$, where $e_i$ is the embedding vector of character $x_i$. Feeding $E$ into the Transformer encoder yields the hidden-state matrix $H = \{h_1, h_2, \dots, h_n\}$, where $h_i \in \mathbb{R}^{768}$ is the feature of character $x_i$ produced by the Transformer encoder.
(2) Generative Distribution
After the Transformer encoder maps $X = \{x_1, x_2, \dots, x_n\}$ to the feature vectors $H = \{h_1, h_2, \dots, h_n\}$, a feed-forward linear layer followed by a softmax layer produces the generative distribution over the vocabulary for each token:
$$p_g = \mathrm{softmax}(W_g h_i + b_g)$$
where $W_g \in \mathbb{R}^{n_v \times 768}$, $b_g \in \mathbb{R}^{n_v}$, and $n_v$ is the vocabulary size of the pre-trained model. (The bias must lie in $\mathbb{R}^{n_v}$, since the output is a distribution over the vocabulary.)
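As a concrete illustration, here is a minimal PyTorch sketch of this generation head. The hidden size 768 and the vocabulary size $n_v$ follow the text; the class and variable names, and the example vocabulary size 21128 (the Chinese BERT vocabulary), are illustrative assumptions, not taken from the paper's released code.

```python
import torch
import torch.nn as nn

class GenerationHead(nn.Module):
    """Maps each encoder hidden state h_i to a generative distribution p_g."""
    def __init__(self, hidden_size: int = 768, vocab_size: int = 21128):
        super().__init__()
        # implements p_g = softmax(W_g h_i + b_g)
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, 768)
        return torch.softmax(self.proj(hidden_states), dim=-1)

# usage with a fake encoder output
head = GenerationHead()
h = torch.randn(2, 10, 768)
p_g = head(h)                # (2, 10, 21128), each row sums to 1
```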
(3) Copy Distribution
The copy distribution $p_c \in \{0,1\}^{n_v}$ of $x_i$ is the one-hot representation of $idx(x_i)$, the index of $x_i$ in the vocabulary:
$$p_c[j] = \begin{cases} 1, & j = idx(x_i) \\ 0, & \text{otherwise} \end{cases}$$
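In code this is just a one-hot lookup. A minimal sketch, reusing the illustrative vocabulary size from above:

```python
import torch
import torch.nn.functional as F

def copy_distribution(input_ids: torch.Tensor, vocab_size: int) -> torch.Tensor:
    # input_ids: (batch, seq_len), holding idx(x_i) for each input character;
    # returns p_c: (batch, seq_len, vocab_size), one-hot at each input token
    return F.one_hot(input_ids, num_classes=vocab_size).float()

p_c = copy_distribution(torch.tensor([[12, 7, 3]]), vocab_size=21128)
print(p_c.shape)  # torch.Size([1, 3, 21128])
```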
(4) Copy Probability
The copy probability is the output $\omega \in \mathbb{R}$ of the Copy Block in the model diagram: the hidden feature $h_i$ from the Transformer encoder passes through two feed-forward linear layers and layer normalization to obtain $\omega$:
$$h_c = W_{ch} f_{ln}(h_i) + b_{ch}$$
$$h_c' = f_{ln}(f_{act}(h_c))$$
$$\omega = \mathrm{Sigmoid}(W_c h_c')$$
where $W_{ch} \in \mathbb{R}^{d_c \times 768}$, $b_{ch} \in \mathbb{R}^{d_c}$, $W_c \in \mathbb{R}^{1 \times d_c}$, $f_{ln}$ is layer normalization, and $f_{act}$ is the activation function; the activation used in the code is gelu. See the code for details.
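A minimal PyTorch sketch of the Copy Block, under the assumption of an intermediate size $d_c = 256$ (the text leaves $d_c$ as a hyperparameter); the names are illustrative:

```python
import torch
import torch.nn as nn

class CopyBlock(nn.Module):
    """Computes omega = Sigmoid(W_c * LN(gelu(W_ch * LN(h_i) + b_ch)))."""
    def __init__(self, hidden_size: int = 768, d_c: int = 256):
        super().__init__()
        self.ln_in = nn.LayerNorm(hidden_size)    # f_ln applied to h_i
        self.fc1 = nn.Linear(hidden_size, d_c)    # W_ch, b_ch
        self.act = nn.GELU()                      # f_act (gelu)
        self.ln_mid = nn.LayerNorm(d_c)           # f_ln after the activation
        self.fc2 = nn.Linear(d_c, 1, bias=False)  # W_c (no bias in the formula)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        h_c = self.fc1(self.ln_in(hidden_states))
        h_c2 = self.ln_mid(self.act(h_c))
        return torch.sigmoid(self.fc2(h_c2))      # omega: (batch, seq_len, 1)
```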
The Copy Block output probability $\omega$ combines the generative distribution $p_g$ and the copy distribution $p_c$ into the final output distribution $p$:
$$p = \omega \times p_c + (1 - \omega) \times p_g$$
The difference from prior CSC models is that CRASpell incorporates the copy distribution $p_c$ into the model's final output, which gives the model more chances to keep an input character that is valid in context but is not BERT's most likely candidate, thereby avoiding over-correction.
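Putting the pieces together, reusing the GenerationHead, copy_distribution, and CopyBlock sketches above (shapes and values are fake, for illustration only):

```python
import torch

h = torch.randn(2, 10, 768)                   # fake encoder output
input_ids = torch.randint(0, 21128, (2, 10))  # fake input token indices
p_g = GenerationHead()(h)                     # (2, 10, 21128)
p_c = copy_distribution(input_ids, 21128)     # (2, 10, 21128)
omega = CopyBlock()(h)                        # (2, 10, 1), broadcasts over vocab
p = omega * p_c + (1.0 - omega) * p_g         # final distribution, rows sum to 1
```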
(5) Noise Modeling Module
The Noise Modeling Module forces the correction model to produce similar distributions for the original context and a noisy context, which addresses the interference caused by misspellings elsewhere in the context. As shown on the right side of the architecture figure above, the Noise Modeling Module proceeds roughly as follows:
a. Generate a noisy context $\widetilde{X}$ from the input sample $X$.
b. Feed the noisy context $\widetilde{X}$ into the Transformer encoder to obtain the hidden feature vectors $\widetilde{H}$.
c. Compute the generative distribution $\widetilde{p_g}$ from the hidden feature vectors $\widetilde{H}$.
d. Encourage $\widetilde{p_g}$ to stay close to the distribution $p_g$ produced by the correction model on the original context, by minimizing the bidirectional Kullback-Leibler divergence:
$$\mathcal{L}_{KL} = \frac{1}{2}\left(\mathcal{D}_{KL}(p_g \Vert \widetilde{p_g}) + \mathcal{D}_{KL}(\widetilde{p_g} \Vert p_g)\right)$$
Note: the Noise Modeling Module is used only during training; inference uses only the correction network.
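A minimal sketch of the bidirectional KL term in PyTorch, assuming p_g and p_g_tilde are probability tensors of shape (batch, seq_len, vocab):

```python
import torch
import torch.nn.functional as F

def bidirectional_kl(p_g: torch.Tensor, p_g_tilde: torch.Tensor,
                     eps: float = 1e-9) -> torch.Tensor:
    """L_KL = 0.5 * (KL(p_g || p_g~) + KL(p_g~ || p_g))."""
    log_p = (p_g + eps).log()
    log_q = (p_g_tilde + eps).log()
    # F.kl_div(input, target) expects log-probs as input and probs as target,
    # and computes KL(target || input)
    kl_pq = F.kl_div(log_q, p_g, reduction="batchmean")        # KL(p_g || p_g~)
    kl_qp = F.kl_div(log_p, p_g_tilde, reduction="batchmean")  # KL(p_g~ || p_g)
    return 0.5 * (kl_pq + kl_qp)
```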
Noisy Block
The following describes how the noisy data are constructed. A noisy sample is generated by replacing characters of the original training sample. Only positions within $d_t$ characters of an existing misspelled character are candidates for replacement; if a training sample contains no misspelling, no noisy sample is generated from it. As shown in the figure below:
(Figure: experimental results for selecting $d_t$.)
Each selected position is replaced with a similar character drawn from the publicly available confusion set. Specifically, the replacement character at a selected position is chosen as follows (see the sketch after this list):
(i) 70% of the time, a randomly chosen phonetically similar character;
(ii) 15% of the time, a randomly chosen visually similar (glyph) character;
(iii) 15% of the time, a character chosen at random from the vocabulary.
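A minimal sketch of this noise-injection policy; the data structures (a confusion set mapping each character to phonetically and visually similar characters) and all names are illustrative assumptions, not the released code's API:

```python
import random

def make_noisy_context(tokens, typo_positions, confusion_set, vocab, d_t=2):
    """Replace one character near an existing typo to build a noisy sample.

    tokens: list of characters; typo_positions: indices of known misspellings;
    confusion_set: dict char -> {"pinyin": [...], "shape": [...]};
    vocab: list of all characters; d_t: window size around a typo.
    """
    if not typo_positions:
        return tokens                            # error-free samples get no noise
    tokens = list(tokens)
    t = random.choice(typo_positions)
    # candidate positions within d_t of the typo, excluding the typo itself
    cands = [i for i in range(max(0, t - d_t), min(len(tokens), t + d_t + 1))
             if i != t]
    if not cands:
        return tokens
    pos = random.choice(cands)
    entry = confusion_set.get(tokens[pos], {})
    r = random.random()
    if r < 0.70 and entry.get("pinyin"):         # 70%: phonetically similar
        tokens[pos] = random.choice(entry["pinyin"])
    elif r < 0.85 and entry.get("shape"):        # 15%: visually similar
        tokens[pos] = random.choice(entry["shape"])
    else:                                        # 15%: random vocabulary character
        tokens[pos] = random.choice(vocab)
    return tokens
```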
Loss
Given a training sample $(X, Y)$, where $X$ is the input sample containing errors and $Y$ is the corrected sample, the correction loss for each corrected character $Y_i$ is
$$\mathcal{L}_c^i = -\log\, p(Y_i \mid X)$$
where $p$ is the mixed distribution introduced above,
$$p = \omega \times p_c + (1 - \omega) \times p_g$$
The overall model loss $\mathcal{L}$ at position $i$ is
$$\mathcal{L}^i = (1 - \alpha_i)\,\mathcal{L}_c^i + \alpha_i\,\mathcal{L}_{KL}^i$$
where $\alpha_i$ is the trade-off factor between $\mathcal{L}_c$ and $\mathcal{L}_{KL}$. The constructed noisy samples never participate in training as targets themselves; they serve only as contexts. This strategy ensures that the constructed noisy data does not change the ratio of positive and negative samples in the training corpus.
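A minimal sketch of the combined per-token loss, under the assumption that the KL term and $\alpha_i$ have already been computed per token:

```python
import torch

def craspell_loss(p, target_ids, l_kl, alpha, eps=1e-9):
    """L^i = (1 - alpha_i) * L_c^i + alpha_i * L_KL^i, averaged over tokens.

    p:          (batch, seq_len, vocab) final mixed distribution
    target_ids: (batch, seq_len) indices of the correct characters Y_i
    l_kl:       (batch, seq_len) per-token bidirectional KL term
    alpha:      (batch, seq_len) per-token trade-off factor
    """
    # L_c^i = -log p(Y_i | X)
    p_target = torch.gather(p, -1, target_ids.unsqueeze(-1)).squeeze(-1)
    l_c = -torch.log(p_target + eps)
    return ((1.0 - alpha) * l_c + alpha * l_kl).mean()
```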
Experimental Results
Results on the SIGHAN datasets will be added later.
(Figure: CRASpell experimental results.)