fastText text classification
2022-07-02 04:56:00 【MasonYyp】
1 Install fastText
Official repository (Facebook Research):
https://github.com/facebookresearch/fastText
Prebuilt fastText installation packages (unofficial Windows wheels):
https://www.lfd.uci.edu/~gohlke/pythonlibs/#fasttext
Installing from the tar source archive is troublesome, so installing from a .whl file is recommended (the wheel below targets CPython 3.8 on 64-bit Windows):
pip install fasttext-0.9.2-cp38-cp38-win_amd64.whl
Developer documentation
# Python module documentation
https://fasttext.cc/docs/en/python-module.html
# JavaScript (WebAssembly) module documentation
https://fasttext.cc/docs/en/webassembly-module.html
2 Source code
import fasttext

# Suppress the warning that load_model prints:
# Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar
fasttext.FastText.eprint = lambda x: None

# File name of the classification model
classifier_model_name = "model_classify.bin"


# Train the model
def train_model():
    # Typical parameter ranges: lr in [0.1, 1.0], epoch in [5, 50], wordNgrams in [1, 5]
    # loss parameter:
    #   for multi-label classification use loss='ova' (one-vs-all)
    #   for large amounts of data use loss='hs' (hierarchical softmax)
    model = fasttext.train_supervised("train.txt", lr=0.1, epoch=25, wordNgrams=4, loss='softmax', label_prefix='__label__')
    # Save the model
    model.save_model(classifier_model_name)


# Evaluate the model
def test_model():
    # Load the model
    classifier = fasttext.load_model(classifier_model_name)
    # test() returns (number of samples, precision@1, recall@1)
    res_test = classifier.test("test.txt")
    print("Number of samples:", res_test[0])
    print("Precision@1:", res_test[1])
    print("Recall@1:", res_test[2])

    with open('test.txt', encoding='utf-8') as fp, \
            open('predict.txt', 'w', encoding='utf-8') as predict_file:
        # Format of each line: label + text, where the label is '__label__' + category
        for line in fp:
            line = line.strip()
            if not line:
                continue
            # Drop the gold '__label__' token(s) so only the text is classified
            text = ' '.join(t for t in line.split() if not t.startswith('__label__'))
            # Write "predicted label,<TAB>original line"
            predict_file.write(classifier.predict(text)[0][0] + ',\t' + line + '\n')


# Predict new text
def predict_text():
    classifier = fasttext.load_model(classifier_model_name)
    # text is a list of texts to classify; k is the number of labels to return, -1 returns all labels
    res_predict = classifier.predict(text=["novel coronavirus pneumonia", "Intelligent development"], k=-1)
    print("Labels and probabilities:", res_predict)
    # text is a single string; by default only the most probable label and its probability are returned
    res_predict = classifier.predict(text="epidemic situation")
    print("Label and probability:", res_predict)


if __name__ == '__main__':
    train_model()
    test_model()
    predict_text()
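The three numbers printed by test_model come from classifier.test("test.txt"), which reports the sample count, precision@1 and recall@1 rather than a separate "accuracy". The sketch below is a plain-Python re-computation of those two metrics under the assumption of exactly one prediction per line (k=1, default threshold); the helper names split_labels and precision_recall_at_1 are hypothetical and not part of the fasttext API. It shows why, for single-label data like the samples in this post, precision@1 and recall@1 are equal and both amount to plain accuracy.

```python
from typing import List, Sequence, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def split_labels(line: str, prefix: str = LABEL_PREFIX) -> Tuple[List[str], str]:
    """Hypothetical helper: split a labeled line into (gold labels, text)."""
    tokens = line.split()
    labels = [t for t in tokens if t.startswith(prefix)]
    words = [t for t in tokens if not t.startswith(prefix)]
    return labels, " ".join(words)


def precision_recall_at_1(gold: Sequence[Sequence[str]],
                          predicted: Sequence[str]) -> Tuple[int, float, float]:
    """Return (N, precision@1, recall@1), assuming one top-1 prediction per sample."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # A prediction is correct if it is one of the sample's true labels
    correct = sum(1 for labels, pred in zip(gold, predicted) if pred in labels)
    n_predictions = len(predicted)                 # one prediction per sample
    n_gold = sum(len(labels) for labels in gold)   # total number of true labels
    precision = correct / n_predictions if n_predictions else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return len(gold), precision, recall


if __name__ == "__main__":
    test_lines = [
        "__label__0 China has made some progress in the field of artificial intelligence",
        "__label__1 China has effectively controlled the epidemic",
    ]
    gold, texts = zip(*(split_labels(line) for line in test_lines))
    # Hypothetical predictions; with a trained model these would be
    # classifier.predict(text)[0][0] for each text
    predicted = ["__label__0", "__label__0"]
    print(precision_recall_at_1(gold, predicted))  # (2, 0.5, 0.5)
```

With multi-label data (loss='ova') a line can carry several __label__ tokens, so recall@1 drops below precision@1 because a single prediction can cover at most one of them.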
3 Data format
Training samples (train.txt):
__label__0 Intelligent scientific development , Artificial intelligence science
__label__1 Novel coronavirus pneumonia
__label__0 Deep learning techniques , Machine learning technology
__label__1 The epidemic situation has been effectively controlled
Test samples (test.txt):
__label__0 China has made some progress in the field of artificial intelligence
__label__1 China has effectively controlled the epidemic
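Files in this format can be produced from (category, text) pairs with a few lines of code. The following is a minimal sketch, assuming the raw data is already a list of pairs; the function names to_fasttext_line, split_train_test and write_lines are hypothetical. Because fastText reads one sample per line and splits tokens on whitespace, the sketch collapses newlines and tabs inside the text and rejects labels containing spaces. Text in languages written without spaces, such as Chinese, would additionally need word segmentation before this step.

```python
import random
from typing import Iterable, List, Sequence, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(label: str, text: str) -> str:
    """Format one sample as '__label__<category> <text>' on a single line."""
    if not label or any(ch.isspace() for ch in label):
        raise ValueError("label must be a single non-empty token")
    # Collapse newlines/tabs/repeated spaces so the sample stays on one line
    clean = " ".join(text.split())
    return f"{LABEL_PREFIX}{label} {clean}"


def split_train_test(samples: Sequence[Tuple[str, str]], test_ratio: float = 0.2,
                     seed: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle formatted lines reproducibly and split them into (train, test)."""
    lines = [to_fasttext_line(label, text) for label, text in samples]
    rng = random.Random(seed)
    rng.shuffle(lines)
    n_test = int(len(lines) * test_ratio)
    return lines[n_test:], lines[:n_test]


def write_lines(path: str, lines: Iterable[str]) -> None:
    """Write one sample per line in UTF-8, e.g. to train.txt or test.txt."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

For example, train, test = split_train_test(samples) followed by write_lines("train.txt", train) and write_lines("test.txt", test) produces the two files read by train_model and test_model.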