Word vector - demo
2022-07-31 06:15:00 【Young_win】
word2vec and BERT are both landmark works in language representation: the former represents the word embedding paradigm, the latter the pre-training paradigm. Beyond modeling polysemy, a good language representation also needs to capture the complex characteristics of words, including syntax and semantics.
word2vec
Starting from the distributional hypothesis ("the meaning of a word is given by the words that frequently appear in its context"), word2vec ultimately learns a look-up table that maps each word to a unique dense vector.
It is a static word representation: the vector is independent of context, so it cannot handle polysemy.
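As a quick illustration, here is a minimal sketch of training such a look-up table with gensim; the toy corpus and hyperparameters are hypothetical, chosen only to show the API.

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (hypothetical example data).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=1 selects skip-gram, sg=0 selects CBOW; vector_size and window are illustrative.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

vec = model.wv["cat"]        # static look-up: the same vector in every sentence
print(vec.shape)             # (100,)
print(model.wv.most_similar("cat", topn=3))
```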
BERT
BERT uses the Transformer encoder as its feature extractor and trains on a large-scale corpus with a denoising objective such as MLM (masked language modeling); the resulting representations are very helpful for downstream tasks.
ELMo, BERT and other pre-training methods learn a deep network, so after pre-training, features at different levels can be read off different layers of the network. The features produced by the higher layers are more abstract and context-dependent, while the features produced by the lower layers are more concerned with syntax.
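As a sketch of how these layer-wise features can be inspected, the snippet below uses the Hugging Face transformers library and the bert-base-uncased checkpoint (both assumptions on my part, not something the post specifies) to pull out the hidden states of every layer.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: tuple of (num_layers + 1) tensors, each of shape (batch, seq_len, hidden).
# Index 0 is the embedding layer; lower layers tend to capture more surface/syntactic
# information, higher layers more abstract, context-dependent information.
hidden_states = outputs.hidden_states
print(len(hidden_states), hidden_states[-1].shape)
```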
word2vec is a bag-of-words model and does not model position at all. Its embedding matrix is the model itself: whether in CBOW or skip-gram mode, there are no parameters other than the embeddings.
word2vec vocabularies are generally huge, often around one million words. Because of the constraints of the MLM task and the popularity of BPE tokenizers, BERT-style models for a single language generally use a vocabulary of roughly 30k-50k tokens. BERT-family models generally model tokens and the sentence together, which is also an advantage of the Transformer: the representation of each token at a given layer is the result of attending over all tokens in the previous layer, so obtaining a sentence representation is very simple: just use a special token such as [CLS], which also puts the full capacity of the model to use.
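For illustration, a minimal sketch of taking the [CLS] vector as the sentence representation (again assuming transformers and bert-base-uncased; whether [CLS] or another pooling strategy works best is task-dependent):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT models tokens and the sentence together.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

cls_vec = out.last_hidden_state[:, 0]   # position 0 holds the [CLS] token
print(cls_vec.shape)                    # torch.Size([1, 768])
```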
The word2vec model has no particularly good way of producing a sentence-level representation directly. The common practice is to average all the word vectors, but this is not equivalent to an explicitly modeled sentence representation. From the point of view of the training objective, both of word2vec's objectives, CBOW and skip-gram, are simpler than BERT's MLM. Of the two, CBOW is closer to BERT's objective: CBOW can be regarded as an MLM task that masks only one token.
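A sketch of that averaging heuristic, reusing the kind of hypothetical gensim model shown earlier; note that this is a heuristic, not a trained sentence representation:

```python
import numpy as np
from gensim.models import Word2Vec

# Hypothetical toy model, as in the earlier sketch.
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]
w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

def sentence_vector(model, tokens):
    """Average the static vectors of the tokens that are in the vocabulary."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

print(sentence_vector(w2v, ["the", "cat", "sat", "on", "the", "mat"]).shape)  # (100,)
```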
From a modeling point of view, there is no essential difference between BERT and word2vec in CBOW mode: both train with a masked language model objective. The real progress is that BERT successfully applies the Transformer, a deep network that is both highly expressive and easy to optimize, to the masked language modeling task. If each input sample masked only one word during BERT training, BERT would resemble a CBOW model with a sliding window of 512, except that the word representation is computed by the Transformer encoder rather than by simply summing context embeddings.
In BERT, the word vector at each position passes through a multi-layer Transformer. At every layer, self-attention transforms the vector at each position using the vectors at all other positions, so the final vector at each position integrates information from the whole sentence. With word2vec, what we get is just a parameter matrix of the network, and a word's representation does not change with the sentence it appears in. The improvement of BERT over word2vec is therefore that the word vectors output by the stacked Transformer layers carry contextual information, and BERT can model dependencies between distant words more directly than earlier RNN-based models, something word2vec cannot do at all.
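The contrast can be made concrete with a small sketch (assuming transformers and bert-base-uncased): the same surface word receives different BERT vectors in different sentences, whereas its word2vec vector would be identical everywhere.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_of(sentence, word):
    """Return the last-layer BERT vector of `word` (assumed to be a single vocab token)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    idx = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    with torch.no_grad():
        return model(**inputs).last_hidden_state[0, idx]

v1 = vector_of("I deposited cash at the bank.", "bank")
v2 = vector_of("We sat on the bank of the river.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # < 1: the two "bank" vectors differ by context
```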
In terms of training method, BERT uses a denoising setup, predicting the tokens at randomly masked positions. Because it works on subword tokens, the vocabulary is smaller than a whole-word vocabulary, so BERT can predict the masked tokens with a plain softmax and train with a cross-entropy loss. word2vec slides a window over the text and either predicts the middle word from its context (CBOW) or the surrounding words from the middle word (skip-gram). Because it uses whole words as the basic unit, the vocabulary is large, so word2vec is typically trained with hierarchical softmax or negative sampling. Of course, setting aside the amount of computation, it is hard to say whether a full softmax actually works better than negative sampling.
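For reference, a minimal PyTorch sketch of the skip-gram negative-sampling loss; the sizes, word ids and uniform noise distribution are illustrative assumptions (word2vec actually samples negatives from a smoothed unigram distribution), not gensim's exact implementation.

```python
import torch
import torch.nn.functional as F

vocab_size, dim, k = 10_000, 100, 5              # k negative samples per positive pair (toy sizes)
in_emb  = torch.nn.Embedding(vocab_size, dim)    # "input" (center word) embeddings
out_emb = torch.nn.Embedding(vocab_size, dim)    # "output" (context word) embeddings

center    = torch.tensor([42])                   # hypothetical center word id
context   = torch.tensor([7])                    # hypothetical true context word id
negatives = torch.randint(0, vocab_size, (1, k)) # noise words (uniform here for simplicity)

v_c   = in_emb(center)                           # (1, dim)
u_o   = out_emb(context)                         # (1, dim)
u_neg = out_emb(negatives)                       # (1, k, dim)

pos_score = (v_c * u_o).sum(-1)                                   # dot product for the true pair
neg_score = torch.bmm(u_neg, v_c.unsqueeze(-1)).squeeze(-1)       # dot products for noise pairs

# Negative-sampling loss: push the true pair's score up and the noise pairs' scores down.
loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(-1)).mean()
loss.backward()
```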
From the user's point of view, word2vec is also a pre-trained model; its parameters are the word vectors themselves. As a more powerful pre-trained model, BERT can output, through the Transformer, word vectors enriched with contextual information for the words in a sentence, and can also produce sentence vectors. BERT's pre-training is also far more thorough.
Both BERT and word2vec can be used as components of other models, and BERT is the stronger of the two. The price of that strength is computational efficiency: word2vec inference is just a table look-up and is extremely cheap, while BERT's cost limits its use in many latency-sensitive applications.