当前位置：网站首页>How gensim freezes some word vectors for incremental training

How gensim freezes some word vectors for incremental training

2022-07-02 07:52:00 【MezereonXP】

Gensim It can be used for topic model extraction , Word vector generated python The library of .

It's like something NLP Preprocessing , You can use this library to generate easily and quickly .

Like Word2Vec, We can generate word vectors with a few lines of code , As shown below ：

import gensim
from numpy import float32 as REAL
import numpy as np

word_list = ["I", "love", "you", "."]
model = gensim.models.Word2Vec(sentences=word_list, vector_size=200, window=10, min_count=1, workers=4)
#  Print word vector 
print(model.wv["I"])
#  Save the model 
model.save("w2v.out")

The author uses Gensim Generate word vectors , But there is a need , There is already a word vector model , Now we want to expand the original vocabulary , But I don't want to modify the word vector of existing words .

Gensim There is no document describing how to freeze word vectors , But we check its source code , It is found that there is an experimental variable that can help us .

# EXPERIMENTAL lockf feature; create minimal no-op lockf arrays (1 element of 1.0)
# advanced users should directly resize/adjust as desired after any vocab growth
self.wv.vectors_lockf = np.ones(1, dtype=REAL)  
# 0.0 values suppress word-backprop-updates; 1.0 allows

This code can be found in gensim Of word2vec.py You can find

therefore , We can use this vectos_lockf To meet our needs , The corresponding code is directly given here

#  Read the old word vector model 
model = gensim.models.Word2Vec.load("w2v.out")
old_key = set(model.wv.index_to_key)

new_word_list = ["You", "are", "a", "good", "man", "."]
model.build_vocab(new_word_list, update=True)
#  Get the length of the updated vocabulary 
length = len(model.wv.index_to_key)
#  Freeze all the previous words 
model.wv.vectors_lockf = np.zeros(length, dtype=REAL)
for i, k in enumerate(model.wv.index_to_key):
    if k not in old_key:
        model.wv.vectors_lockf[i] = 1.
        
model.train(new_word_list, total_examples=model.corpus_count, epochs=model.epochs)
model.save("w2v-new.out")

In this way, the word vector is frozen , It will not affect some existing models （ We may train some models based on old word vectors ）.

原网站

版权声明
本文为[MezereonXP]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/183/202207020623037389.html

当前位置：网站首页>How gensim freezes some word vectors for incremental training

How gensim freezes some word vectors for incremental training

边栏推荐

猜你喜欢

随机推荐