Reading a paper: ChineseBERT
2022-07-03 03:39:00 【7frog7】
Recently I came across a very interesting BERT variant that incorporates character glyph (radical) and pinyin information. According to the paper, it performs well.
Portal
As usual, I started by measuring cosine similarity between sentence vectors:
from datasets.bert_dataset import BertDataset
from models.modeling_glycebert import GlyceBertModel
from sklearn.metrics.pairwise import cosine_similarity
import torch
import numpy
CHINESEBERT_PATH = "C:\\Users\\1323231\\ChineseBERT-base"
tokenizer = BertDataset(CHINESEBERT_PATH)
chinese_bert = GlyceBertModel.from_pretrained(CHINESEBERT_PATH)
sentence1 = ' I like cats '
sentence2 = ' I also like cats '
sentence3 = ' Yueji ちぬ、 The wind も blow きぬべし'  # vocabulary.txt contains kana, so this sentence serves as a control (although it should probably have been ちぬ べし)
long_sen1 = ' We must grasp the same from the opposite , Grasp opposites in the same , Use contact , Developing , A comprehensive view , Especially from a contradictory point of view . Have a deep understanding of the opposition between the two sides of the contradiction and the relationship between the two sides , The two are of different nature , But it is based on the common essence . Both see the contradiction between the two sides , See unity and transformation again , Only in this way can we really grasp the contradiction , Grasp the development of things .'
long_sen2 = ' Everything contains contradictions . Both sides of the contradiction are opposite , Again unified , Both struggle with each other , And interdependent , Infiltration and mutual transformation under certain conditions .'
def sentence2vec(sentence):
    # Tokenize into token ids and pinyin ids (8 pinyin slots per token)
    input_ids, pinyin_ids = tokenizer.tokenize_sentence(sentence)
    length = input_ids.shape[0]
    # Add a batch dimension of size 1
    input_ids = input_ids.view(1, length)
    pinyin_ids = pinyin_ids.view(1, length, 8)
    # Token-level hidden states, shape (1, length, hidden_size)
    output_hidden = chinese_bert(input_ids, pinyin_ids)[0]
    # Mean-pool over the token dimension to get one vector per sentence
    return output_hidden.mean(dim=1)
opt1 = sentence2vec(sentence1).detach().numpy()
opt2 = sentence2vec(sentence2).detach().numpy()
opt3 = sentence2vec(sentence3).detach().numpy()
lopt1 = sentence2vec(long_sen1).detach().numpy()
lopt2 = sentence2vec(long_sen2).detach().numpy()
cosine_similarity1 = cosine_similarity(opt1,opt2)
cosine_similarity2 = cosine_similarity(opt1,opt3)
print(cosine_similarity1)
print(cosine_similarity2)
cosine_similarity3 = cosine_similarity(lopt1,lopt2)
print(cosine_similarity3)
cosine_similarity4 = cosine_similarity(opt1,lopt1)
print(cosine_similarity4)
'''
[[0.96489584]]
[[0.8336769]]
[[0.96856475]]
[[0.86439896]]
'''
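For reference, what sentence2vec and cosine_similarity compute can be sketched in plain Python, without ChineseBERT or PyTorch. This is only an illustration: the helper names mean_pool and cosine are hypothetical, and the toy "hidden states" are made-up numbers, not model output.

```python
import math

def mean_pool(token_vectors):
    """Average a list of per-token vectors into a single sentence vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy hidden states (assumed values): two "sentences" with 2-dim token vectors
tokens_a = [[1.0, 0.0], [0.0, 1.0]]
tokens_b = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

vec_a = mean_pool(tokens_a)
vec_b = mean_pool(tokens_b)
score = cosine(vec_a, vec_b)
print(score)
```

The two pooled vectors point in the same direction, so the score comes out at (approximately) 1 even though the sentences differ in length and scale. Cosine similarity only compares direction, which is why a short and a long sentence can be compared at all.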
My rough estimate is that any score below 0.9 indicates unrelated text.
More testing would probably help confirm this.
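As a starting point for further tests, the 0.9 cutoff can be applied to the four scores printed earlier. This is a minimal sketch: the threshold is only the rough estimate from this post, not a validated value, and the is_related helper is a hypothetical name.

```python
THRESHOLD = 0.9  # rough estimate from this post, not a validated cutoff

# Scores copied from the printed results earlier in the post
scores = {
    ("sentence1", "sentence2"): 0.96489584,
    ("sentence1", "sentence3"): 0.8336769,
    ("long_sen1", "long_sen2"): 0.96856475,
    ("sentence1", "long_sen1"): 0.86439896,
}

def is_related(score, threshold=THRESHOLD):
    """Treat a pair as related when its score reaches the threshold."""
    return score >= threshold

labels = {pair: is_related(s) for pair, s in scores.items()}

for (a, b), s in scores.items():
    verdict = "related" if labels[(a, b)] else "unrelated"
    print(f"{a} vs {b}: {s:.4f} -> {verdict}")
```

With only four pairs this says little about where the real boundary lies. Adding more pairs to the dictionary, both related and unrelated, would show whether 0.9 holds up.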
I had actually wanted to go on and fine-tune the model a little, but I don't know where to find a suitable dataset. Simply copying the few lines of data I have several times over won't do.