Reading a paper: ChineseBERT
2022-07-03 03:39:00 【7frog7】
Recently I came across a very interesting BERT variant, ChineseBERT, which combines character radicals (glyphs) and pinyin. According to the paper, it works well.
Portal
As usual, I start by measuring cosine similarity:
from datasets.bert_dataset import BertDataset
from models.modeling_glycebert import GlyceBertModel
from sklearn.metrics.pairwise import cosine_similarity
import torch
import numpy
CHINESEBERT_PATH = "C:\\Users\\1323231\\ChineseBERT-base"
tokenizer = BertDataset(CHINESEBERT_PATH)
chinese_bert = GlyceBertModel.from_pretrained(CHINESEBERT_PATH)
sentence1 = 'I like cats'
sentence2 = 'I also like cats'
sentence3 = ' Yueji ちぬ、 The wind も blow きぬべし'  # I noticed kana in vocabulary.txt, so this sentence should work as a control group, although it should have been ちぬ べし
long_sen1 = 'We must grasp the same from the opposite, grasp opposites in the same, use contact, developing, a comprehensive view, especially from a contradictory point of view. Have a deep understanding of the opposition between the two sides of the contradiction and the relationship between the two sides, the two are of different nature, but it is based on the common essence. Both see the contradiction between the two sides, see unity and transformation again, only in this way can we really grasp the contradiction, grasp the development of things.'
long_sen2 = 'Everything contains contradictions. Both sides of the contradiction are opposite, again unified, both struggle with each other, and interdependent, infiltration and mutual transformation under certain conditions.'
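def sentence2vec(sentence):
    # Tokenize into character ids and pinyin ids (8 pinyin slots per token).
    input_ids, pinyin_ids = tokenizer.tokenize_sentence(sentence)
    length = input_ids.shape[0]
    # Add a batch dimension of 1.
    input_ids = input_ids.view(1, length)
    pinyin_ids = pinyin_ids.view(1, length, 8)
    # The first output holds the per-token hidden states: (1, length, hidden_size).
    output_hidden = chinese_bert.forward(input_ids, pinyin_ids)[0]
    num = output_hidden.shape[1]
    #output_hidden = output_hidden.squeeze(0)
    # Mean-pool over the token dimension to get one vector per sentence.
    output_hidden = torch.sum(output_hidden, 1)
    output_hidden = output_hidden / num
    return output_hidden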
opt1 = sentence2vec(sentence1).detach().numpy()
opt2 = sentence2vec(sentence2).detach().numpy()
opt3 = sentence2vec(sentence3).detach().numpy()
lopt1 = sentence2vec(long_sen1).detach().numpy()
lopt2 = sentence2vec(long_sen2).detach().numpy()
cosine_similarity1 = cosine_similarity(opt1,opt2)
cosine_similarity2 = cosine_similarity(opt1,opt3)
print(cosine_similarity1)
print(cosine_similarity2)
cosine_similarity3 = cosine_similarity(lopt1,lopt2)
print(cosine_similarity3)
cosine_similarity4 = cosine_similarity(opt1,lopt1)
print(cosine_similarity4)
'''
[[0.96489584]]
[[0.8336769]]
[[0.96856475]]
[[0.86439896]]
'''
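To make the two steps in sentence2vec and cosine_similarity concrete, here is a minimal pure-Python sketch of mean pooling followed by cosine similarity. It does not call ChineseBERT at all: the "hidden states" are made-up toy numbers, and the helper names mean_pool and cosine are hypothetical, chosen only for this illustration.

```python
import math

def mean_pool(hidden):
    """Average a list of token vectors into a single sentence vector."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "hidden states" (hypothetical values): one row per token.
hidden_a = [[1.0, 0.0, 2.0, 0.0],
            [3.0, 0.0, 0.0, 0.0],
            [2.0, 0.0, 1.0, 0.0]]
# Same directions as hidden_a, but every value doubled.
hidden_b = [[2.0 * x for x in row] for row in hidden_a]
# Uses only the dimensions that hidden_a leaves at zero.
hidden_c = [[0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 5.0]]

vec_a = mean_pool(hidden_a)
vec_b = mean_pool(hidden_b)
vec_c = mean_pool(hidden_c)

print(cosine(vec_a, vec_b))  # close to 1: scaling does not change the angle
print(cosine(vec_a, vec_c))  # 0.0: the two vectors share no dimension
```

Because cosine similarity ignores vector length, only the direction of the pooled vector matters, which is why sentences of very different lengths can still be compared this way.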
My rough guess is that any pair scoring below 0.9 is unrelated text.
More test pairs would be needed to confirm this.
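One way to check the 0.9 guess as more pairs are added is to label each pair by the threshold and compare against a human judgment. The sketch below uses only the four scores printed above; the EXPECTED labels are my own reading of which pairs ought to count as related, and label_pairs and agreement are hypothetical helper names.

```python
# Scores copied from the output above, keyed by the variable names in the script.
REPORTED = {
    ("sentence1", "sentence2"): 0.96489584,
    ("sentence1", "sentence3"): 0.8336769,
    ("long_sen1", "long_sen2"): 0.96856475,
    ("sentence1", "long_sen1"): 0.86439896,
}

# Assumed human judgments (not from any dataset): True means "related".
EXPECTED = {
    ("sentence1", "sentence2"): True,
    ("sentence1", "sentence3"): False,
    ("long_sen1", "long_sen2"): True,
    ("sentence1", "long_sen1"): False,
}

THRESHOLD = 0.9  # the rough guess from this post, not a validated cutoff

def label_pairs(scores, threshold=THRESHOLD):
    """Mark a pair as related when its score reaches the threshold."""
    return {pair: score >= threshold for pair, score in scores.items()}

def agreement(predicted, expected):
    """Fraction of pairs where the thresholded label matches the judgment."""
    hits = sum(predicted[pair] == expected[pair] for pair in expected)
    return hits / len(expected)

labels = label_pairs(REPORTED)
for pair, related in labels.items():
    print(pair, "related" if related else "unrelated")
print("agreement:", agreement(labels, EXPECTED))
```

Four pairs are far too few to settle on a cutoff; the point is only that new pairs can be appended to both dictionaries and the agreement recomputed.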
I actually wanted to go on and fine-tune the model a little, but I don't know where to find a suitable dataset, and I can't just copy my few rows of data several times over.