How gensim freezes some word vectors for incremental training
2022-07-02 07:52:00 【MezereonXP】
Gensim is a Python library for topic modeling and for generating word vectors.
For NLP preprocessing tasks like these, it lets you get results quickly and easily.
With Word2Vec, for example, a few lines of code are enough to train word vectors:
import gensim
from numpy import float32 as REAL
import numpy as np
# Word2Vec expects an iterable of tokenized sentences, i.e. a list of token lists
word_list = [["I", "love", "you", "."]]
model = gensim.models.Word2Vec(sentences=word_list, vector_size=200, window=10, min_count=1, workers=4)
# Print word vector
print(model.wv["I"])
# Save the model
model.save("w2v.out")
I use Gensim to generate word vectors, and I ran into the following need: I already have a trained word-vector model, and I want to extend its vocabulary with new words without modifying the vectors of the words it already contains.
Gensim's documentation does not describe how to freeze word vectors, but its source code contains an experimental attribute that does exactly this:
# EXPERIMENTAL lockf feature; create minimal no-op lockf arrays (1 element of 1.0)
# advanced users should directly resize/adjust as desired after any vocab growth
self.wv.vectors_lockf = np.ones(1, dtype=REAL)  # 0.0 values suppress word-backprop-updates; 1.0 allows
This code is in gensim's word2vec.py.
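To make the meaning of that comment concrete, here is a minimal NumPy sketch of a single SGD step scaled by a per-word lock factor. It is an illustration, not gensim's actual Cython training loop. The helper name locked_update and the learning rate alpha are hypothetical. The modulo indexing (index % len(lockf)) is an assumption about how a 1-element default array can apply the same factor to every word.

```python
import numpy as np

REAL = np.float32


def locked_update(vectors, lockf, index, grad, alpha=0.025):
    """Apply one SGD step to row `index`, scaled by its lock factor.

    Hypothetical helper for illustration only (not gensim code).
    A lock factor of 0.0 suppresses the update entirely; 1.0 applies it
    in full. Indexing with `index % len(lockf)` lets a 1-element array
    act as a single default factor shared by every word.
    """
    factor = lockf[index % len(lockf)]
    vectors[index] += factor * alpha * grad
    return vectors


vectors = np.zeros((3, 2), dtype=REAL)
grad = np.ones(2, dtype=REAL)

# Default 1-element array of 1.0: every word is trainable.
locked_update(vectors, np.ones(1, dtype=REAL), 0, grad)

# Per-word array: word 1 is frozen, word 2 is trainable.
lockf = np.array([1.0, 0.0, 1.0], dtype=REAL)
locked_update(vectors, lockf, 1, grad)
locked_update(vectors, lockf, 2, grad)

print(vectors)
```

Row 1 stays at zero because its factor is 0.0. That is the property the freezing trick below relies on.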
We can therefore use vectors_lockf to meet our requirement. We give every existing word a lock factor of 0.0 and every newly added word a lock factor of 1.0. The code is as follows:
import gensim
import numpy as np
from numpy import float32 as REAL
# Read the old word vector model
model = gensim.models.Word2Vec.load("w2v.out")
old_key = set(model.wv.index_to_key)
# New training data, again as a list of tokenized sentences
new_word_list = [["You", "are", "a", "good", "man", "."]]
model.build_vocab(new_word_list, update=True)
# Get the length of the updated vocabulary
length = len(model.wv.index_to_key)
# Freeze all the previous words (lock factor 0.0) ...
model.wv.vectors_lockf = np.zeros(length, dtype=REAL)
# ... and let only the newly added words be trained (lock factor 1.0)
for i, k in enumerate(model.wv.index_to_key):
    if k not in old_key:
        model.wv.vectors_lockf[i] = 1.
model.train(new_word_list, total_examples=model.corpus_count, epochs=model.epochs)
model.save("w2v-new.out")
This freezes the vectors of the original words. Any existing models trained on the old word vectors are therefore unaffected, which matters if you have already built downstream models on top of them.
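A quick way to confirm that the freeze worked is to snapshot the old vectors after build_vocab and before train, then compare them afterwards. The sketch below keeps the check in plain NumPy so it can run without a trained model. The helper name frozen_keys_changed is hypothetical. The gensim calls in the usage comment (model.wv[key]) assume gensim 4.x.

```python
import numpy as np


def frozen_keys_changed(snapshot, lookup, atol=0.0):
    """Return the keys whose current vectors differ from their snapshot.

    `snapshot` maps each frozen key to a copy of its vector taken before
    training; `lookup` is any callable returning the current vector for a
    key (for a gensim model, e.g. `lambda k: model.wv[k]`).
    With the default atol=0.0 the comparison is exact.
    An empty list means every frozen vector is unchanged.
    """
    return [k for k, v in snapshot.items()
            if not np.allclose(v, lookup(k), rtol=0.0, atol=atol)]


# Usage with gensim (assumed, not run here):
#   model.build_vocab(new_word_list, update=True)
#   snapshot = {k: model.wv[k].copy() for k in old_key}
#   model.train(...)
#   assert not frozen_keys_changed(snapshot, lambda k: model.wv[k])

# Self-contained demo: plain arrays stand in for model.wv.
current = {"I": np.array([0.1, 0.2]), "love": np.array([0.3, 0.4])}
snapshot = {k: v.copy() for k, v in current.items()}
current["love"] += 0.01  # simulate an unintended update to a "frozen" word
changed = frozen_keys_changed(snapshot, current.__getitem__)
print(changed)
```

The helper takes a callable instead of a model so that the same check works on any key-to-vector mapping.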