Reading a paper: ChineseBERT
2022-07-03 03:39:00 【7frog7】
Recently I came across a very interesting BERT variant, ChineseBERT. It combines character radicals (glyph information) and pinyin, and according to the paper the results are good.
As usual, I started by measuring cosine similarity.
from datasets.bert_dataset import BertDataset
from models.modeling_glycebert import GlyceBertModel
from sklearn.metrics.pairwise import cosine_similarity
import torch
import numpy
CHINESEBERT_PATH = "C:\\Users\\1323231\\ChineseBERT-base"
tokenizer = BertDataset(CHINESEBERT_PATH)
chinese_bert = GlyceBertModel.from_pretrained(CHINESEBERT_PATH)
sentence1 = ' I like cats '
sentence2 = ' I also like cats '
sentence3 = ' Yueji ちぬ、 The wind も blow きぬべし'  # vocabulary.txt turned out to contain kana, so this sentence serves as a control (although it should have been ちぬ べし)
long_sen1 = ' We must grasp the same from the opposite , Grasp opposites in the same , Use contact , Developing , A comprehensive view , Especially from a contradictory point of view . Have a deep understanding of the opposition between the two sides of the contradiction and the relationship between the two sides , The two are of different nature , But it is based on the common essence . Both see the contradiction between the two sides , See unity and transformation again , Only in this way can we really grasp the contradiction , Grasp the development of things .'
long_sen2 = ' Everything contains contradictions . Both sides of the contradiction are opposite , Again unified , Both struggle with each other , And interdependent , Infiltration and mutual transformation under certain conditions .'
def sentence2vec(sentence):
    # Tokenize into token ids and pinyin ids (8 pinyin slots per token)
    input_ids, pinyin_ids = tokenizer.tokenize_sentence(sentence)
    length = input_ids.shape[0]
    # Add a batch dimension of 1
    input_ids = input_ids.view(1, length)
    pinyin_ids = pinyin_ids.view(1, length, 8)
    # First output: last-layer hidden states, shape (1, length, hidden_size)
    output_hidden = chinese_bert(input_ids, pinyin_ids)[0]
    # Mean-pool over the token dimension to get one vector per sentence
    num = output_hidden.shape[1]
    output_hidden = torch.sum(output_hidden, 1) / num
    return output_hidden
opt1 = sentence2vec(sentence1).detach().numpy()
opt2 = sentence2vec(sentence2).detach().numpy()
opt3 = sentence2vec(sentence3).detach().numpy()
lopt1 = sentence2vec(long_sen1).detach().numpy()
lopt2 = sentence2vec(long_sen2).detach().numpy()
cosine_similarity1 = cosine_similarity(opt1,opt2)
cosine_similarity2 = cosine_similarity(opt1,opt3)
print(cosine_similarity1)
print(cosine_similarity2)
cosine_similarity3 = cosine_similarity(lopt1,lopt2)
print(cosine_similarity3)
cosine_similarity4 = cosine_similarity(opt1,lopt1)
print(cosine_similarity4)
'''
[[0.96489584]]   sentence1 vs sentence2
[[0.8336769]]    sentence1 vs sentence3
[[0.96856475]]   long_sen1 vs long_sen2
[[0.86439896]]   sentence1 vs long_sen1
'''
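The function above averages the last-layer hidden states over all tokens and then compares the resulting vectors with cosine similarity. A minimal NumPy sketch of just those two steps follows. It uses random arrays as stand-ins for real hidden states, the hidden size of 768 is an assumption, and the helper names `mean_pool` and `cosine` are hypothetical, not part of ChineseBERT.

```python
import numpy as np

def mean_pool(hidden):
    """Average token vectors: (1, seq_len, dim) -> (1, dim)."""
    return hidden.sum(axis=1) / hidden.shape[1]

def cosine(a, b):
    """Cosine similarity between two (1, dim) row vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Toy stand-ins for model output; 768 is an assumed hidden size
hidden_a = rng.normal(size=(1, 5, 768))   # a 5-token "sentence"
hidden_b = rng.normal(size=(1, 9, 768))   # a 9-token "sentence"

vec_a = mean_pool(hidden_a)
vec_b = mean_pool(hidden_b)
print(vec_a.shape)            # (1, 768)
print(cosine(vec_a, vec_a))   # a vector compared with itself: 1.0 up to rounding
print(cosine(vec_a, vec_b))   # some value between -1 and 1
```

Because the pooled vector has the same size regardless of sentence length, a short sentence and a long paragraph can be compared directly, which is what the sentence1 vs long_sen1 comparison does.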
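My guess is that any pair scoring below 0.9 is unrelated text. Even the kana control sentence still scores 0.83 against sentence1, so the useful range is narrow.
More test pairs would help confirm this.
I actually wanted to go on and fine-tune the model a little, but I don't know where to find a suitable dataset, and simply duplicating my few lines of data several times won't do.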
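One way to test more pairs at once is to stack the sentence vectors into a matrix, compute every pairwise cosine similarity, and list the pairs that fall under the threshold. The sketch below does this with NumPy on made-up 2-dimensional vectors. The 0.9 cutoff is only the guess stated above, and `pairwise_cosine` and `pairs_below` are hypothetical helper names.

```python
import numpy as np

def pairwise_cosine(vectors):
    """Return the (n, n) matrix of cosine similarities between rows."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T

def pairs_below(sim, names, threshold=0.9):
    """List (name_i, name_j, score) for every pair scoring under threshold."""
    out = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sim[i, j] < threshold:
                out.append((names[i], names[j], float(sim[i, j])))
    return out

# Toy data: "a" and "b" point in nearly the same direction, "c" does not
names = ["a", "b", "c"]
vectors = np.array([
    [1.0, 0.0],
    [1.0, 0.1],
    [0.0, 1.0],
])

sim = pairwise_cosine(vectors)
for name_i, name_j, score in pairs_below(sim, names):
    print(f"{name_i} vs {name_j}: {score:.3f}")
```

With real data, the rows would be the outputs of sentence2vec stacked with np.vstack, one row per sentence.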