Reading a paper: ChineseBERT
2022-07-03 03:39:00 【7frog7】
Recently I came across a very interesting BERT variant that incorporates character radicals and pinyin. According to the paper, it works well.
As usual, I started by measuring cosine similarity:
from datasets.bert_dataset import BertDataset
from models.modeling_glycebert import GlyceBertModel
from sklearn.metrics.pairwise import cosine_similarity
import torch
import numpy
CHINESEBERT_PATH = "C:\\Users\\1323231\\ChineseBERT-base"
tokenizer = BertDataset(CHINESEBERT_PATH)
chinese_bert = GlyceBertModel.from_pretrained(CHINESEBERT_PATH)
sentence1 = ' I like cats '
sentence2 = ' I also like cats '
sentence3 = ' Yueji ちぬ、 The wind も blow きぬべし'  # vocabulary.txt contains kana, so this sentence is meant as a control; strictly it should have been ちぬ べし
long_sen1 = ' We must grasp the same from the opposite , Grasp opposites in the same , Use contact , Developing , A comprehensive view , Especially from a contradictory point of view . Have a deep understanding of the opposition between the two sides of the contradiction and the relationship between the two sides , The two are of different nature , But it is based on the common essence . Both see the contradiction between the two sides , See unity and transformation again , Only in this way can we really grasp the contradiction , Grasp the development of things .'
long_sen2 = ' Everything contains contradictions . Both sides of the contradiction are opposite , Again unified , Both struggle with each other , And interdependent , Infiltration and mutual transformation under certain conditions .'
def sentence2vec(sentence):
    # Tokenize into token ids and pinyin ids (8 pinyin slots per token).
    input_ids, pinyin_ids = tokenizer.tokenize_sentence(sentence)
    length = input_ids.shape[0]
    # Add a batch dimension of size 1.
    input_ids = input_ids.view(1, length)
    pinyin_ids = pinyin_ids.view(1, length, 8)
    # Take the first model output: the per-token hidden states.
    output_hidden = chinese_bert.forward(input_ids, pinyin_ids)[0]
    num = output_hidden.shape[1]
    # Mean-pool over the token dimension to get one vector per sentence.
    output_hidden = torch.sum(output_hidden, 1)
    output_hidden = output_hidden / num
    return output_hidden
opt1 = sentence2vec(sentence1).detach().numpy()
opt2 = sentence2vec(sentence2).detach().numpy()
opt3 = sentence2vec(sentence3).detach().numpy()
lopt1 = sentence2vec(long_sen1).detach().numpy()
lopt2 = sentence2vec(long_sen2).detach().numpy()
cosine_similarity1 = cosine_similarity(opt1,opt2)
cosine_similarity2 = cosine_similarity(opt1,opt3)
print(cosine_similarity1)
print(cosine_similarity2)
cosine_similarity3 = cosine_similarity(lopt1,lopt2)
print(cosine_similarity3)
cosine_similarity4 = cosine_similarity(opt1,lopt1)
print(cosine_similarity4)
'''
[[0.96489584]]
[[0.8336769]]
[[0.96856475]]
[[0.86439896]]
'''
My guess is that anything scoring below 0.9 is unrelated text.
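To make that cutoff concrete, here is a minimal sketch that applies it to the four scores printed above. The pair labels and the `is_related` helper are my own shorthand, and 0.9 is only the rough guess from this post, not a tuned threshold.

```python
# Scores copied from the printed output above, in the order they were printed.
# The keys are illustrative labels, not names used by any library.
scores = {
    "sentence1 vs sentence2": 0.96489584,
    "sentence1 vs sentence3": 0.8336769,
    "long_sen1 vs long_sen2": 0.96856475,
    "sentence1 vs long_sen1": 0.86439896,
}

THRESHOLD = 0.9  # rough cutoff guessed in the text, not a tuned value


def is_related(score, threshold=THRESHOLD):
    """Return True when a cosine score reaches the cutoff."""
    return score >= threshold


for pair, score in scores.items():
    label = "related" if is_related(score) else "unrelated"
    print(f"{pair}: {score:.4f} -> {label}")
```

Under this cutoff the two paraphrase-like pairs count as related, while the Japanese control and the short-versus-long pair do not. Note how high even the "unrelated" scores are, which is why the cutoff has to sit so close to 1.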
More testing would probably help.
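For testing more sentences at once, a pairwise similarity matrix is convenient. The sketch below is dependency-free and uses made-up two-dimensional token vectors in place of real model outputs; `mean_pool`, `cosine`, and `similarity_matrix` are hypothetical helper names. It mirrors the pooling used in `sentence2vec` (average over tokens, then cosine similarity).

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(sentence_vectors):
    """Cosine similarity of every sentence vector against every other."""
    return [[cosine(u, v) for v in sentence_vectors] for u in sentence_vectors]


# Toy token embeddings (assumed values, NOT real ChineseBERT outputs).
toy_tokens = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 1.0], [3.0, 3.0]],
    "c": [[1.0, 0.0], [1.0, 0.0]],
}

names = list(toy_tokens)
vectors = [mean_pool(toy_tokens[name]) for name in names]
matrix = similarity_matrix(vectors)

for name, row in zip(names, matrix):
    print(name, [round(x, 4) for x in row])
```

With real data, the vectors would come from `sentence2vec(...).detach().numpy()` instead of the toy dictionary, and the matrix could be scanned for pairs above or below a chosen cutoff.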
I actually wanted to go on and fine-tune the model a little, but I don't know where to find a suitable dataset. Copying a pitiful handful of samples several times over is not an option.