fastText text classification
2022-07-02 04:56:00 【MasonYyp】
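1 Install fastText
Official repository (Facebook Research)
https://github.com/facebookresearch/fastText
Prebuilt fastText installation packages (unofficial Windows wheels)
https://www.lfd.uci.edu/~gohlke/pythonlibs/#fasttext
Installing from the tar source archive is troublesome because the package has to be compiled, so installing a prebuilt wheel (.whl) is recommended. For example, for Python 3.8 on 64-bit Windows:
pip install fasttext-0.9.2-cp38-cp38-win_amd64.whl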
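The `cp38-cp38-win_amd64` part of the wheel name has to match the interpreter: `cp38` means CPython 3.8 and `win_amd64` means 64-bit Windows. As a small sketch (the helpers `wheel_filename` and `current_windows_platform_tag` are hypothetical; the sketch assumes CPython 3.8 or later, where the ABI tag no longer carries the old `m` suffix, and an x86/x64 Windows machine), the expected file name for the running interpreter can be derived like this:

```python
import struct
import sys


def wheel_filename(version, major, minor, platform_tag):
    """Build a wheel file name in the same form as the one above (hypothetical helper)."""
    tag = f"cp{major}{minor}"
    # Assumption: CPython 3.8+, where the Python tag and the ABI tag are identical (cp38-cp38)
    return f"fasttext-{version}-{tag}-{tag}-{platform_tag}.whl"


def current_windows_platform_tag():
    # A 64-bit interpreter has 8-byte pointers; assumes x86/x64 Windows (win_amd64 vs win32)
    return "win_amd64" if struct.calcsize("P") == 8 else "win32"


if __name__ == "__main__":
    print(wheel_filename("0.9.2", sys.version_info.major, sys.version_info.minor,
                         current_windows_platform_tag()))
```

This only constructs the name; a wheel for that exact combination still has to exist on the download page.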
Developer documentation
# Python module documentation
https://fasttext.cc/docs/en/python-module.html
# JavaScript (WebAssembly) module documentation
https://fasttext.cc/docs/en/webassembly-module.html
2 Source code
import fasttext

# Suppress the following warning printed by load_model:
# Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar
fasttext.FastText.eprint = lambda x: None

# File name of the classification model
classifier_model_name = "model_classify.bin"


# Train the model
def train_model():
    # Typical parameter ranges: lr = [0.1, 1.0], epoch = [5, 50], wordNgrams = [1, 5]
    # loss parameter:
    #   multi-label classification: loss='ova' (one-vs-all)
    #   many labels: loss='hs' (hierarchical softmax) trains faster than the full softmax
    model = fasttext.train_supervised("train.txt", lr=0.1, epoch=25, wordNgrams=4,
                                      loss='softmax', label_prefix='__label__')
    # Save the model
    model.save_model(classifier_model_name)


# Evaluate the model
def test_model():
    # Load the model
    classifier = fasttext.load_model(classifier_model_name)
    # test() returns (number of samples, precision@1, recall@1)
    res_test = classifier.test("test.txt")
    print("Number of samples:", res_test[0])
    print("Precision@1:", res_test[1])
    print("Recall@1:", res_test[2])
    # Format of each line: label + text, where the label is '__label__' + category
    with open('test.txt', encoding='utf-8') as fp, \
            open('predict.txt', 'w', encoding='utf-8') as predict_file:
        for line in fp:
            line = line.strip()
            if not line:
                continue  # skip blank lines; predict() would return no label for them
            # Write the predicted label followed by the original line (which still holds the true label)
            predict_file.write(classifier.predict(line)[0][0] + ',\t' + line + '\n')


# Predict new texts
def predict_text():
    classifier = fasttext.load_model(classifier_model_name)
    # text is a list of texts; k is the number of labels returned per text (-1 returns all labels)
    res_predict = classifier.predict(text=["novel coronavirus pneumonia", "intelligent development"], k=-1)
    print("Labels and probabilities:", res_predict)
    # text is a single string; by default only the most probable label and its probability are returned
    res_predict = classifier.predict(text="epidemic situation")
    print("Top label and probability:", res_predict)


if __name__ == '__main__':
    train_model()
    test_model()
    predict_text()
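The `loss` comment in `train_model()` distinguishes `softmax` from `ova`. Conceptually (this is a plain-Python illustration with made-up scores, not fastText's internal code), softmax turns the per-label scores into probabilities that sum to 1, so the labels compete and one wins, while one-vs-all applies an independent sigmoid to each label, so several labels can pass a threshold at the same time:

```python
import math


def softmax(scores):
    # Subtract the max score for numerical stability; the results sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_vs_all(scores):
    # Independent sigmoid per label; the results need not sum to 1
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


def labels_above(probs, threshold):
    # Indices of the labels whose probability reaches the threshold
    return [i for i, p in enumerate(probs) if p >= threshold]


if __name__ == "__main__":
    scores = [2.0, 1.5, -3.0]  # hypothetical scores for labels 0, 1, 2
    print(labels_above(softmax(scores), 0.5))     # [0]
    print(labels_above(one_vs_all(scores), 0.5))  # [0, 1]
```

For a model trained with `loss='ova'`, the Python API's `predict` takes `k=-1` together with a `threshold` argument (for example `classifier.predict(text, k=-1, threshold=0.5)`) to return every label above the threshold.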
3 Data format
Training samples (train.txt)
__label__0 Intelligent scientific development, artificial intelligence science
__label__1 Novel coronavirus pneumonia
__label__0 Deep learning techniques, machine learning technology
__label__1 The epidemic situation has been effectively controlled
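Each line holds one or more labels (`__label__` + category) and the text. fastText splits tokens on whitespace and treats any token starting with the label prefix as a label, so text in languages written without spaces, such as Chinese, has to be word-segmented first. A minimal sketch for writing and reading this format (the helpers `to_fasttext_line` and `parse_fasttext_line` are hypothetical, not part of the `fasttext` package):

```python
PREFIX = "__label__"


def to_fasttext_line(labels, text):
    # Labels first, then the text; collapsing whitespace removes newlines that would split a sample
    label_part = " ".join(PREFIX + str(label) for label in labels)
    clean_text = " ".join(text.split())
    return f"{label_part} {clean_text}"


def parse_fasttext_line(line):
    # Tokens carrying the prefix are labels; all other tokens form the text
    tokens = line.split()
    labels = [t[len(PREFIX):] for t in tokens if t.startswith(PREFIX)]
    words = [t for t in tokens if not t.startswith(PREFIX)]
    return labels, " ".join(words)


if __name__ == "__main__":
    samples = [(["0"], "Deep learning techniques, machine learning technology"),
               (["1"], "The epidemic situation has been effectively controlled")]
    for labels, text in samples:
        print(to_fasttext_line(labels, text))
```

Writing these lines to train.txt, one sample per line, gives the training file used by `train_model()`.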
Test samples (test.txt)
__label__0 China has made some progress in the field of artificial intelligence
__label__1 China has effectively controlled the epidemic
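`classifier.test("test.txt")` in `test_model()` reports precision and recall at k=1 over samples like these. To make those numbers concrete, here is a plain-Python sketch of the usual definitions (assumed here, not copied from fastText's source): precision is correct predictions divided by predictions made, and recall is correct predictions divided by gold labels. With exactly one gold label per line, as in the samples above, both reduce to accuracy:

```python
def precision_recall_at_1(gold, predicted):
    """gold: list of label lists per line; predicted: top-1 label per line (hypothetical inputs)."""
    hits = sum(1 for g, p in zip(gold, predicted) if p in g)
    total_gold = sum(len(g) for g in gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / total_gold if total_gold else 0.0
    # Same shape as the tuple returned by test(): (number of samples, precision, recall)
    return len(gold), precision, recall


if __name__ == "__main__":
    gold = [["0"], ["1"], ["1"]]
    predicted = ["0", "0", "1"]
    print(precision_recall_at_1(gold, predicted))
```

Recall only drops below precision when some lines carry more than one gold label, which is the multi-label (`loss='ova'`) case.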