Reading a paper: ChineseBERT
2022-07-03 03:39:00 【7frog7】
Recently I came across a very interesting BERT variant, ChineseBERT, which combines glyph (radical) and pinyin information. According to the paper, it works well.
As usual, I started by measuring the cosine similarity between sentence embeddings.
from datasets.bert_dataset import BertDataset
from models.modeling_glycebert import GlyceBertModel
from sklearn.metrics.pairwise import cosine_similarity
import torch
import numpy
CHINESEBERT_PATH = "C:\\Users\\1323231\\ChineseBERT-base"
tokenizer = BertDataset(CHINESEBERT_PATH)
chinese_bert = GlyceBertModel.from_pretrained(CHINESEBERT_PATH)
sentence1 = ' I like cats '
sentence2 = ' I also like cats '
# I looked at vocabulary.txt and found kana in it, so this sentence should work as a control (although it should really have been ちぬ べし)
sentence3 = ' Yueji ちぬ、 The wind も blow きぬべし'
long_sen1 = ' We must grasp the same from the opposite , Grasp opposites in the same , Use contact , Developing , A comprehensive view , Especially from a contradictory point of view . Have a deep understanding of the opposition between the two sides of the contradiction and the relationship between the two sides , The two are of different nature , But it is based on the common essence . Both see the contradiction between the two sides , See unity and transformation again , Only in this way can we really grasp the contradiction , Grasp the development of things .'
long_sen2 = ' Everything contains contradictions . Both sides of the contradiction are opposite , Again unified , Both struggle with each other , And interdependent , Infiltration and mutual transformation under certain conditions .'
def sentence2vec(sentence):
    # Tokenize into character ids and pinyin ids (8 pinyin slots per token).
    input_ids, pinyin_ids = tokenizer.tokenize_sentence(sentence)
    length = input_ids.shape[0]
    input_ids = input_ids.view(1, length)
    pinyin_ids = pinyin_ids.view(1, length, 8)
    # Last hidden state, shape (1, length, hidden_size).
    output_hidden = chinese_bert(input_ids, pinyin_ids)[0]
    # Mean-pool over the token dimension to get one vector per sentence.
    return output_hidden.mean(dim=1)
opt1 = sentence2vec(sentence1).detach().numpy()
opt2 = sentence2vec(sentence2).detach().numpy()
opt3 = sentence2vec(sentence3).detach().numpy()
lopt1 = sentence2vec(long_sen1).detach().numpy()
lopt2 = sentence2vec(long_sen2).detach().numpy()
cosine_similarity1 = cosine_similarity(opt1,opt2)
cosine_similarity2 = cosine_similarity(opt1,opt3)
print(cosine_similarity1)
print(cosine_similarity2)
cosine_similarity3 = cosine_similarity(lopt1,lopt2)
print(cosine_similarity3)
cosine_similarity4 = cosine_similarity(opt1,lopt1)
print(cosine_similarity4)
'''
[[0.96489584]]
[[0.8336769]]
[[0.96856475]]
[[0.86439896]]
'''
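The scores above are cosine similarities between mean-pooled token vectors. As a rough illustration of those two steps, here is a minimal plain-Python sketch. The helper names `mean_pool` and `cosine` and the toy 3-dimensional vectors are made up for this example; they are not part of ChineseBERT or scikit-learn, and the numbers are not real model output.

```python
import math

def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

def cosine(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "token vectors" (made up, not real model output).
sent_a = mean_pool([[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])  # two tokens
sent_b = mean_pool([[2.0, 1.0, 1.0]] * 3)               # three identical tokens
print(sent_a)                  # [2.0, 1.0, 1.0]
print(cosine(sent_a, sent_b))  # very close to 1.0: same direction
```

Because mean pooling divides by the token count and cosine similarity ignores vector length, a short and a long sentence can be compared directly, which is what the last comparison above (short sentence against long paragraph) relies on.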
My guess is that anything scoring below 0.9 is unrelated text.
More tests would be needed to confirm this.
I actually wanted to go on and fine-tune the model a bit, but I don't know where to find a suitable dataset, and simply copying my few lines of data several times won't do.
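One way to test more sentences at once would be to score every pair and list those above a cutoff. The sketch below assumes sentence vectors are already available as plain lists; the helper names `normalize` and `pairs_above`, the dict input format, and the toy vectors are all hypothetical. The 0.9 cutoff in this post is only a guess, so the threshold is left as a parameter.

```python
import math
from itertools import combinations

def normalize(vec):
    """Scale vec to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def pairs_above(embeddings, threshold):
    """Return (name_a, name_b, score) for every pair scoring >= threshold.

    `embeddings` maps a label to a sentence vector (hypothetical input format).
    """
    unit = {name: normalize(vec) for name, vec in embeddings.items()}
    hits = []
    for a, b in combinations(unit, 2):
        score = sum(x * y for x, y in zip(unit[a], unit[b]))
        if score >= threshold:
            hits.append((a, b, score))
    return hits

# Made-up vectors standing in for sentence2vec outputs.
toy = {
    "s1": [1.0, 0.0],
    "s2": [4.0, 3.0],  # cosine with s1 is 0.8
    "s3": [0.0, 1.0],  # orthogonal to s1
}
for a, b, score in pairs_above(toy, threshold=0.7):
    print(a, b, round(score, 3))
```

With real embeddings, each value in the dict would be something like `sentence2vec(text).detach().numpy()[0].tolist()`, and trying several thresholds on sentence pairs known to be related or unrelated would show whether 0.9 is a sensible cutoff.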