fastText text classification
2022-07-02 04:56:00 【MasonYyp】
1 Install fastText
Official Facebook Research repository
https://github.com/facebookresearch/fastText
fastText installation packages (prebuilt Windows wheels)
https://www.lfd.uci.edu/~gohlke/pythonlibs/#fasttext
Installing from the tar source archive is cumbersome, so installing from a prebuilt .whl file is recommended:
pip install fasttext-0.9.2-cp38-cp38-win_amd64.whl
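After installing the wheel, it can help to confirm that Python can actually find the module before running any training code. The following is a minimal sketch using only the standard library; the helper name `is_installed` is a hypothetical one chosen for illustration and is not part of fastText.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True once the fasttext wheel has been installed into this environment.
    print("fasttext installed:", is_installed("fasttext"))
```

If this prints False, the wheel was probably installed into a different Python environment than the one running the script.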
Development documentation
# Python module documentation
https://fasttext.cc/docs/en/python-module.html
# JavaScript (WebAssembly) module documentation
https://fasttext.cc/docs/en/webassembly-module.html
2 Source code
import fasttext

# Suppress the warning printed by load_model:
# "`load_model` does not return WordVectorModel or SupervisedModel any more,
# but a `FastText` object which is very similar."
fasttext.FastText.eprint = lambda x: None

# File name of the classification model
classifier_model_name = "model_classify.bin"


def train_model():
    """Train a supervised classifier on train.txt and save it to disk."""
    # Typical parameter ranges: lr in [0.1, 1.0], epoch in [5, 50], wordNgrams in [1, 5]
    # The loss parameter:
    #   loss='ova' (one-vs-all) for multi-label classification
    #   loss='hs' (hierarchical softmax) when the data set is large
    model = fasttext.train_supervised("train.txt", lr=0.1, epoch=25, wordNgrams=4, loss='softmax', label_prefix='__label__')
    # Save the model
    model.save_model(classifier_model_name)


def test_model():
    """Evaluate the model on test.txt and write per-line predictions to predict.txt."""
    # Load the model
    classifier = fasttext.load_model(classifier_model_name)
    # Evaluate on the test data; the result is (number of samples, precision@1, recall@1)
    res_test = classifier.test("test.txt")
    print("Number of samples:", res_test[0])
    print("Precision@1:", res_test[1])
    print("Recall@1:", res_test[2])

    with open('test.txt', encoding='utf-8') as fp, \
            open('predict.txt', 'w', encoding='utf-8') as predict_file:
        # Format of each line: label + text, where the label is '__label__' + category
        for line in fp:
            line = line.strip()
            if not line:
                continue  # skip blank lines, which carry no text to classify
            # Write the predicted label followed by the original line
            predict_file.write(classifier.predict(line)[0][0] + ',\t' + line + '\n')


def predict_text():
    """Predict labels for new text with the saved model."""
    classifier = fasttext.load_model(classifier_model_name)
    # text is a list of strings to classify; k is the number of labels to return (-1 returns all labels)
    res_predict = classifier.predict(text=["novel coronavirus pneumonia", "Intelligent development"], k=-1)
    print("Labels and probabilities:", res_predict)
    # text is a single string; by default only the most probable label and its probability are returned
    res_predict = classifier.predict(text="epidemic situation")
    print("Label and probability:", res_predict)


if __name__ == '__main__':
    train_model()
    test_model()
    predict_text()
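The second and third values returned by `classifier.test` are precision and recall computed over the top-ranked prediction. To make those numbers easier to interpret, here is a minimal pure-Python sketch of how precision and recall at k can be computed from gold labels and ranked predictions. It follows the definitions given in the fastText documentation (correct labels among predicted labels, and correct labels among true labels); the function name `precision_recall_at_k` and its input layout are assumptions made for illustration, not fastText's own code.

```python
def precision_recall_at_k(gold, predicted, k=1):
    """Compute (precision, recall) at k.

    gold:      list of sets, the true labels of each sample
    predicted: list of lists, the labels predicted for each sample, best first
    """
    n_correct = 0    # predicted labels that are also true labels
    n_predicted = 0  # all labels that were predicted (at most k per sample)
    n_gold = 0       # all true labels
    for true_labels, ranked in zip(gold, predicted):
        top_k = ranked[:k]
        n_correct += sum(1 for label in top_k if label in true_labels)
        n_predicted += len(top_k)
        n_gold += len(true_labels)
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = [{"__label__0"}, {"__label__1"}]
    predicted = [["__label__0"], ["__label__0"]]
    print(precision_recall_at_k(gold, predicted, k=1))
```

When every sample has exactly one label and k is 1, precision and recall are equal and both match plain accuracy, which is why the two numbers printed by the test function are usually identical for single-label data.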
3 Data format
Training samples
__label__0 Intelligent scientific development , Artificial intelligence science
__label__1 Novel coronavirus pneumonia
__label__0 Deep learning techniques , Machine learning technology
__label__1 The epidemic situation has been effectively controlled
Test samples
__label__0 China has made some progress in the field of artificial intelligence
__label__1 China has effectively controlled the epidemic
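Each sample occupies one line: one or more labels carrying the `__label__` prefix, followed by the text. A small sketch of helpers that build and parse such lines is shown below. The names `to_fasttext_line` and `parse_fasttext_line` are hypothetical and only illustrate the format; they are not part of the fastText library.

```python
LABEL_PREFIX = "__label__"


def to_fasttext_line(labels, text):
    """Build one sample line: prefixed labels followed by the text."""
    # Collapse newlines and repeated whitespace so the sample stays on a single line.
    clean_text = " ".join(text.split())
    prefixed = " ".join(LABEL_PREFIX + str(label) for label in labels)
    return f"{prefixed} {clean_text}"


def parse_fasttext_line(line):
    """Split a sample line into (labels, text); tokens with the prefix are labels."""
    tokens = line.split()
    labels = [t[len(LABEL_PREFIX):] for t in tokens if t.startswith(LABEL_PREFIX)]
    words = [t for t in tokens if not t.startswith(LABEL_PREFIX)]
    return labels, " ".join(words)


if __name__ == "__main__":
    samples = [
        ([0], "Deep learning techniques , Machine learning technology"),
        ([1], "The epidemic situation has been effectively controlled"),
    ]
    for labels, text in samples:
        print(to_fasttext_line(labels, text))
```

Passing more than one label to `to_fasttext_line` produces a multi-label line, which pairs with the `loss='ova'` setting mentioned in the training code.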