Reading a paper: ChineseBERT
2022-07-03 03:39:00 【7frog7】
Recently I came across a very interesting BERT variant that combines glyph (radical) and pinyin information. According to the paper, it performs well.
As usual, I started by measuring cosine similarity:
from datasets.bert_dataset import BertDataset
from models.modeling_glycebert import GlyceBertModel
from sklearn.metrics.pairwise import cosine_similarity
import torch
import numpy
CHINESEBERT_PATH = "C:\\Users\\1323231\\ChineseBERT-base"
tokenizer = BertDataset(CHINESEBERT_PATH)
chinese_bert = GlyceBertModel.from_pretrained(CHINESEBERT_PATH)
sentence1 = ' I like cats '
sentence2 = ' I also like cats '
sentence3 = ' Yueji ちぬ、 The wind も blow きぬべし'  # vocabulary.txt contains kana, so this Japanese sentence serves as a control group (although it should have been ちぬ べし)
long_sen1 = ' We must grasp the same from the opposite , Grasp opposites in the same , Use contact , Developing , A comprehensive view , Especially from a contradictory point of view . Have a deep understanding of the opposition between the two sides of the contradiction and the relationship between the two sides , The two are of different nature , But it is based on the common essence . Both see the contradiction between the two sides , See unity and transformation again , Only in this way can we really grasp the contradiction , Grasp the development of things .'
long_sen2 = ' Everything contains contradictions . Both sides of the contradiction are opposite , Again unified , Both struggle with each other , And interdependent , Infiltration and mutual transformation under certain conditions .'
def sentence2vec(sentence):
    # Tokenize into character ids and pinyin ids (8 pinyin slots per token).
    input_ids, pinyin_ids = tokenizer.tokenize_sentence(sentence)
    length = input_ids.shape[0]
    # Add a batch dimension of size 1.
    input_ids = input_ids.view(1, length)
    pinyin_ids = pinyin_ids.view(1, length, 8)
    # Take the first model output: the per-token hidden states.
    output_hidden = chinese_bert.forward(input_ids, pinyin_ids)[0]
    # Mean-pool over the token dimension to get one vector per sentence.
    num = output_hidden.shape[1]
    output_hidden = torch.sum(output_hidden, 1)
    output_hidden = output_hidden / num
    return output_hidden
opt1 = sentence2vec(sentence1).detach().numpy()
opt2 = sentence2vec(sentence2).detach().numpy()
opt3 = sentence2vec(sentence3).detach().numpy()
lopt1 = sentence2vec(long_sen1).detach().numpy()
lopt2 = sentence2vec(long_sen2).detach().numpy()
cosine_similarity1 = cosine_similarity(opt1,opt2)
cosine_similarity2 = cosine_similarity(opt1,opt3)
print(cosine_similarity1)
print(cosine_similarity2)
cosine_similarity3 = cosine_similarity(lopt1,lopt2)
print(cosine_similarity3)
cosine_similarity4 = cosine_similarity(opt1,lopt1)
print(cosine_similarity4)
'''
[[0.96489584]]
[[0.8336769]]
[[0.96856475]]
[[0.86439896]]
'''
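To make explicit what the script above computes, here is a minimal sketch of mean pooling followed by cosine similarity, written in plain Python so it runs without the model. The helper names `mean_pool` and `cosine` and the toy vectors are made up for illustration; they are not part of the ChineseBERT code.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 2-dimensional "hidden states" (made-up numbers, not model output).
sent_a = [[1.0, 0.0], [0.0, 1.0]]  # two tokens
sent_b = [[2.0, 2.0]]              # one token
sent_c = [[1.0, -1.0]]             # one token

vec_a = mean_pool(sent_a)
vec_b = mean_pool(sent_b)
vec_c = mean_pool(sent_c)

print(cosine(vec_a, vec_b))  # same direction, so close to 1
print(cosine(vec_a, vec_c))  # orthogonal, so close to 0
```

Cosine similarity only compares direction, which is why `vec_a` and `vec_b` score as near-identical even though their lengths differ.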
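Judging from the scores above (0.96 and 0.97 for the related pairs, 0.83 and 0.86 for the unrelated ones), my guess is that anything below 0.9 is unrelated text.
More pairs would need to be tested to confirm this.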
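As a rough sketch, the 0.9 rule of thumb can be applied to the four scores printed earlier. The pair labels and the `is_related` helper are my own shorthand, and 0.9 is only the guessed cut-off from this small test, not a validated threshold.

```python
# Scores copied from the printed output; the keys are shorthand for the pairs.
scores = {
    "sentence1 vs sentence2": 0.96489584,
    "sentence1 vs sentence3": 0.8336769,
    "long_sen1 vs long_sen2": 0.96856475,
    "sentence1 vs long_sen1": 0.86439896,
}

THRESHOLD = 0.9  # guessed cut-off, not tuned on any dataset


def is_related(score, threshold=THRESHOLD):
    """Treat a pair as related when its cosine similarity reaches the threshold."""
    return score >= threshold


for pair, score in scores.items():
    label = "related" if is_related(score) else "unrelated"
    print(f"{pair}: {score:.4f} -> {label}")
```

With only four pairs, this says very little about where the boundary really lies; a labelled set of sentence pairs would be needed to pick the threshold properly.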
I had actually wanted to go on and fine-tune the model a little, but I don't know where to find a suitable dataset, and I can't just copy the few lines of data I have several times over.