NLP model Bert: from introduction to mastery (2)
2020-11-06 01:22:00 【Elementary school students in IT field】
Named entity recognition
First, install the corresponding bert-base package:
pip install bert-base==0.0.9 -i https://pypi.python.org/simple
You can also refer to the official website for installation instructions.
What the package currently supports:
1. Named entity recognition training
2. A client/server (C/S) service for named entity recognition
3. All of the BERT services of the excellent open-source project bert_as_service (hanxiao)
4. A text classification service
More features will be added over time.
Training a named entity recognition model from the command line:
After you install bert-base, two command-line tools become available. One of them, bert-base-ner-train, trains a named entity recognition model. You only need to specify the directory of the training data and the locations of the BERT model files. You can use the following command to view the help:
An example training command:
bert-base-ner-train \
-data_dir {your dataset dir} \
-output_dir {training output dir} \
-init_checkpoint {Google BERT model dir} \
-bert_config_file {bert_config.json under the Google BERT model dir} \
-vocab_file {vocab.txt under the Google BERT model dir}
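In the command above, each trailing backslash is a shell line continuation. If it has no space before it, the next flag is glued onto the previous path. If you launch training from a Python script instead, the minimal sketch below assembles the same flags as a list and hands it to subprocess, so no quoting or line continuations are involved. The helper names build_ner_train_cmd and run_training and the example directory names are hypothetical. Only the flags come from the command above. It assumes the BERT model directory holds bert_config.json and vocab.txt, as the parameter notes describe.

```python
import os
import shlex
import subprocess
from typing import List


def build_ner_train_cmd(data_dir: str, output_dir: str, bert_dir: str) -> List[str]:
    """Build the bert-base-ner-train argument list from three directories.

    Assumes bert_dir is an unpacked Google BERT model directory that
    contains bert_config.json and vocab.txt.
    """
    return [
        "bert-base-ner-train",
        "-data_dir", data_dir,
        "-output_dir", output_dir,
        "-init_checkpoint", bert_dir,
        "-bert_config_file", os.path.join(bert_dir, "bert_config.json"),
        "-vocab_file", os.path.join(bert_dir, "vocab.txt"),
    ]


def run_training(cmd: List[str]) -> None:
    """Run the tool; raises CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own directories.
    cmd = build_ner_train_cmd("data", "output", "chinese_L-12_H-768_A-12")
    # shlex.join (Python 3.8+) prints a copy-pasteable shell command.
    print(shlex.join(cmd))
```

Passing a list to subprocess.run means every path is a single argument even if it contains spaces. Call run_training(cmd) once the printed command looks right.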
Parameter description
data_dir: the directory that holds your data. The training, validation, and test files must be named train.txt, dev.txt, and test.txt respectively. Please use exactly these names, otherwise an error will be reported.
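Because a misnamed file only shows up as an error once training starts, it can help to check data_dir first. The sketch below is a hypothetical pre-flight helper, not part of bert-base. It simply reports which of the three required file names are missing.

```python
import tempfile
from pathlib import Path
from typing import List

# File names that data_dir must contain, per the naming rule above.
REQUIRED_FILES = ("train.txt", "dev.txt", "test.txt")


def missing_data_files(data_dir: str) -> List[str]:
    """Return the required file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Demo on a throwaway directory that only contains train.txt.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "train.txt").write_text("", encoding="utf-8")
        print(missing_data_files(tmp))  # prints ['dev.txt', 'test.txt']
```

An empty list means the directory satisfies the naming rule. It says nothing about whether the file contents are well formed.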
The training data format is as follows. There is one Chinese character per line. The sample sentence 海钓比赛地点在厦门与金门之间的海域。 means "The sea-fishing competition is held in the waters between Xiamen and Kinmen":
海 O
钓 O
比 O
赛 O
地 O
点 O
在 O
厦 B-LOC
门 I-LOC
与 O
金 B-LOC
门 I-LOC
之 O
间 O
的 O
海 O
域 O
。 O
Each line holds a character (token) followed by its label, separated by a single space (' '). Please make sure to use a space. Separate sentences with a blank line. The program will then read your data automatically.
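To inspect or validate a file in this format before training, a minimal reader like the one below can help. It is a sketch, not the parser bert-base uses internally. parse_ner_lines splits the file into sentences at blank lines and rejects any line that is not exactly "token label" separated by one space. bio_to_entities then collects the B-/I- labelled spans, such as the two LOC entities in the sample. Both function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# One sentence = (tokens, labels), aligned by position.
Sentence = Tuple[List[str], List[str]]


def parse_ner_lines(lines: Iterable[str]) -> List[Sentence]:
    """Parse 'token label' lines; a blank line ends the current sentence."""
    sentences: List[Sentence] = []
    tokens: List[str] = []
    labels: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the sentence if one is open.
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        parts = line.split(" ")
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'token label', got {line!r}")
        tokens.append(parts[0])
        labels.append(parts[1])
    if tokens:  # the file may not end with a blank line
        sentences.append((tokens, labels))
    return sentences


def bio_to_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (type, text) spans from B-/I- labels.

    Simplification: an I- label that does not continue an open span of the
    same type is treated like O. Tokens are joined without spaces, which
    suits character-level Chinese data.
    """
    entities: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if cur_type is not None:
                entities.append((cur_type, "".join(cur_tokens)))
            cur_type, cur_tokens = lab[2:], [tok]
        elif lab.startswith("I-") and cur_type == lab[2:]:
            cur_tokens.append(tok)
        else:
            if cur_type is not None:
                entities.append((cur_type, "".join(cur_tokens)))
            cur_type, cur_tokens = None, []
    if cur_type is not None:
        entities.append((cur_type, "".join(cur_tokens)))
    return entities


if __name__ == "__main__":
    sample = ["厦 B-LOC\n", "门 I-LOC\n", "与 O\n", "金 B-LOC\n", "门 I-LOC\n", "\n"]
    for toks, labs in parse_ner_lines(sample):
        print(bio_to_entities(toks, labs))  # prints [('LOC', '厦门'), ('LOC', '金门')]
```

For a real file, open it with encoding="utf-8" and pass the file object to parse_ner_lines. A tab-separated line raises ValueError with its line number, which makes the "use spaces" rule easy to enforce.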
output_dir: the output path for the trained model. The model checkpoint and some label mapping tables are stored here. When you start the service, pass this path as -ner_model_dir.
init_checkpoint: the directory of the downloaded Google BERT model
bert_config_file: bert_config.json in the Google BERT model directory
vocab_file: vocab.txt in the Google BERT model directory
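output_dir also receives "label mapping tables". A mapping of this kind typically assigns each label string an integer id for the model and provides the reverse table for decoding predictions. The sketch below shows one plausible way to build and save such a table. The file name label2id.json, the sorted ordering, and the helper names are assumptions for illustration, not the format bert-base actually writes.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def build_label_maps(label_seqs: Iterable[List[str]]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map every label seen in the data to an integer id and back.

    Labels are sorted so the ids are reproducible across runs.
    """
    labels = sorted({lab for seq in label_seqs for lab in seq})
    label2id = {lab: i for i, lab in enumerate(labels)}
    id2label = {i: lab for lab, i in label2id.items()}
    return label2id, id2label


def save_label_map(label2id: Dict[str, int], output_dir: str) -> Path:
    """Write label2id as JSON; 'label2id.json' is a hypothetical file name."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "label2id.json"
    path.write_text(json.dumps(label2id, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    label2id, id2label = build_label_maps([["O", "B-LOC", "I-LOC"], ["O"]])
    print(label2id)  # prints {'B-LOC': 0, 'I-LOC': 1, 'O': 2}
```

Whatever ordering is chosen, the same table must be used at training and serving time. That is why it is stored next to the checkpoint.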
After training, you can view the results in the output_dir you specified.
More operations:
Another wrapper around the BERT model
This article was written by [Elementary school students in IT field]. Please include a link to the original when reposting. Thanks.