Week 6 Learning Representation: Word Embedding (symbolic → numeric)
2022-07-26 05:09:00 【Jinzhou hungry bully】
One. Learning representation in machine learning and deep learning
1. Review of RNN


2. Comparison of traditional and modern feature extraction



Two. Word embedding
1. Definition of word embedding
- Embedding is a term from mathematics: it refers to mapping one object X into another object Y, i.e., a map f : X → Y, for example embedding the rational numbers into the real numbers.
- Word embedding is the collective name in NLP for a family of language modeling and feature learning techniques that map the words or phrases in a vocabulary to vectors of real numbers.
- Word embedding learns, automatically from data, a mapping f from the input space to a distributed-representation space.
- The simplest word embedding methods are the one-hot representation based on the bag-of-words (BOW) model and the co-occurrence matrix.
This process is called word embedding: high-dimensional word vectors are embedded into a lower-dimensional space, as shown in the figure.
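To make one of these simple representations concrete, here is a minimal Python sketch that builds a word-level co-occurrence matrix from a toy corpus (the corpus and window size are invented purely for illustration):

```python
import numpy as np

# Toy corpus and window size, chosen only for illustration.
corpus = [
    "i like deep learning",
    "i like nlp",
    "i enjoy flying",
]
window = 1  # count neighbours within +/- 1 position

# Build the vocabulary and an index for each word.
vocab = sorted({w for sent in corpus for w in sent.split()})
index = {w: i for i, w in enumerate(vocab)}

# counts[i, j] = how often word j appears within `window` positions of word i.
counts = np.zeros((len(vocab), len(vocab)), dtype=int)
for sent in corpus:
    words = sent.split()
    for pos, w in enumerate(words):
        left = max(0, pos - window)
        for ctx in words[left:pos] + words[pos + 1:pos + window + 1]:
            counts[index[w], index[ctx]] += 1

print(vocab)
print(counts)  # each row is a crude, sparse word vector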

2. One-hot representation
2.1 Definition
One-hot encoding, also known as one-bit-effective encoding, uses an N-bit status register to encode N states: each state has its own register bit, and at any moment only one bit is active. For example, suppose we have four samples (rows), each with three features (columns), as shown in the figure:

feature_1 has two possible values, e.g., male/female; here male is represented by 1 and female by 2. feature_2 and feature_3 each have 4 possible values (states). One-hot encoding guarantees that, for every feature of every sample, exactly one bit is 1 and all the others are 0. The one-hot encoding of the states above is shown in the figure below:

Consider three features:
- ["male", "female"]
- ["from Europe", "from US", "from Asia"]
- ["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"]
After replacing them with one-hot codes, we get:
- feature1=[01,10]
- feature2=[001,010,100]
- feature3=[0001,0010,0100,1000]
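Below is a minimal sketch of how such a one-hot code could be produced by hand (the category ordering used here is an assumption; a library such as scikit-learn's OneHotEncoder would choose its own ordering):

```python
# Categories for each feature, in the order listed above (ordering is an assumption).
features = {
    "feature1": ["male", "female"],
    "feature2": ["from Europe", "from US", "from Asia"],
    "feature3": ["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"],
}

def one_hot(value, categories):
    """Return a list with a single 1 at the position of `value`, 0 elsewhere."""
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1
    return vec

# Encode one sample: concatenating the per-feature codes gives a sparse
# binary vector of length 2 + 3 + 4 = 9.
sample = {"feature1": "male", "feature2": "from US", "feature3": "uses Safari"}
encoded = []
for name, categories in features.items():
    encoded += one_hot(sample[name], categories)

print(encoded)  # [1, 0, 0, 1, 0, 0, 0, 1, 0]
```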
2.2 Advantages and disadvantages
Advantages:
- It solves the problem that classifiers have difficulty handling discrete (categorical) data;
- To some extent it also plays the role of expanding the feature space.
Disadvantages:
- As a representation of text features it has several shortcomings:
- First, it is a bag-of-words model that ignores the order of words (the order of words in a text is also important information);
- Second, it assumes that words are independent of each other (in most cases, words influence one another);
- Finally, the features it produces are discrete and sparse.
Three. Word2vec
1. Definition of Word2vec
The word2vec model is in fact a simplified neural network. Word2vec uses a neural network with a single hidden layer (e.g., CBOW) to map sparse one-hot word vectors to dense n-dimensional vectors (n is usually a few hundred). To speed up training, it relies on tricks such as hierarchical softmax, negative sampling, and Huffman trees.
In NLP, the most fine-grained objects are words. If we want to do part-of-speech tagging, the usual approach is to collect a set of samples (x, y), where x is a word and y is its part of speech, and then to find a mapping x -> y; traditional methods include Bayes, SVM, and so on. But mathematical models generally take numerical input, whereas the words used in NLP are abstract, symbolic summaries created by humans (Chinese, English, Latin, and so on). They therefore need to be converted into numerical form, or in other words, embedded into a mathematical space. This way of embedding is called word embedding, and Word2vec is one kind of word embedding.

The input is a one-hot vector; the hidden layer has no activation function, i.e., it is purely linear; the output layer has the same dimensionality as the input layer and uses softmax regression. Once the model is trained, we do not use the trained model itself for new tasks; what we really need are the parameters the model has learned from the training data, such as the hidden-layer weight matrix. How does the model define its input and output? There are generally two variants: CBOW (Continuous Bag-of-Words) and Skip-Gram.
- In the CBOW model, the training input is the word vectors of the context words surrounding a target word, and the output is the word vector of that target word. CBOW is better suited to small corpora, while Skip-Gram performs better on large corpora.
- The Skip-Gram model works the other way around: the input is the word vector of a specific word, and the output is the word vectors of its context words.
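As a minimal illustration of the two variants, here is a sketch using the gensim library (assuming gensim 4.x is installed; the toy corpus is invented). The sg parameter switches between CBOW and Skip-Gram, while negative and hs correspond to the negative-sampling and hierarchical-softmax tricks mentioned earlier:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences, made up for illustration.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# CBOW (sg=0): predict the centre word from its context, with negative sampling.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                sg=0, negative=5, epochs=50)

# Skip-Gram (sg=1): predict the context from the centre word,
# here with hierarchical softmax instead of negative sampling.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                    sg=1, hs=1, negative=0, epochs=50)

print(cbow.wv["cat"].shape)                     # a dense 50-dimensional vector
print(skipgram.wv.most_similar("cat", topn=3))  # nearest neighbours of "cat"
```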
Word2Vec actually consists of two parts: first build and train the model, then obtain the embedded word vectors from it. The whole modeling process is very similar in spirit to an auto-encoder: we build a neural network on the training data, and once the model is trained, we do not use it directly for new tasks; what we really need are the parameters the model has learned from the training data, such as the hidden-layer weight matrix — in Word2Vec these weights are exactly the "word vectors" we are trying to learn.
The approach described above also appears in unsupervised feature learning, most typically in the auto-encoder: the input is encoded and compressed in the hidden layer, and then decoded at the output layer to reconstruct the original input; after training, the output layer is "cut off" and only the hidden layer is kept.
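To make the "train, then keep only the hidden-layer weights" idea concrete, here is a minimal NumPy sketch of the architecture described above (one-hot input, linear hidden layer, softmax output over the vocabulary); the vocabulary size and embedding dimension are arbitrary, the weights are random, and no training loop is shown:

```python
import numpy as np

vocab_size, embed_dim = 10, 4                  # arbitrary sizes for illustration

rng = np.random.default_rng(0)
W_in = rng.normal(size=(vocab_size, embed_dim))   # hidden-layer weight matrix
W_out = rng.normal(size=(embed_dim, vocab_size))  # output-layer weight matrix

def forward(word_index):
    """One forward pass: one-hot input -> linear hidden layer -> softmax output."""
    x = np.zeros(vocab_size)
    x[word_index] = 1.0                        # one-hot input vector
    h = x @ W_in                               # linear hidden layer, no activation
    scores = h @ W_out                         # output layer, same size as the input
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()                     # softmax over the whole vocabulary

probs = forward(3)
print(probs.shape, probs.sum())                # (10,) 1.0 (approximately)

# After training, the output layer is discarded: row i of W_in is kept
# as the dense word vector of word i.
print(W_in[3])
```

Because the input is one-hot, multiplying it by W_in simply selects one row of the matrix, which is why the rows of the trained hidden-layer weight matrix can be read off directly as word vectors.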
2. Continuous Bag-of-Words (CBOW)
3. Skip-gram
4. Negative sampling
Four. Something to Vector
1. Node2Vec
2. Doc2Vec