CSDN Blog Summarization (I): A Simple First-Version Implementation
2022-07-06 10:42:00 【Alexxinlu】
Contents
Series articles
Team blog: CSDN AI team
1. Background
2. Blog summary
2.1 Blog structuring
A blog post contains many kinds of elements, and summarizing it directly as plain text seriously degrades summary quality. So the first step is to structure the blog. After structuring, the elements in the body are clearly distinguished, for example head (heading), code, table, text (paragraph), img (image) and link. This makes it easier to retrieve each part accurately, gives the preprocessing and rule logic in the later summarization steps clear structured information, and gives the model better input. The following figure shows an example of a structured blog:
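As a rough illustration of the structuring step, here is a minimal sketch that assumes the post is available as HTML and uses only the Python standard library. The names `Block`, `TAG_TO_KIND`, `BlogStructurer` and `structure_blog` are hypothetical and the tag mapping is an assumption; this is not the team's actual implementation.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Assumed mapping from HTML tags to the element types named in the text.
TAG_TO_KIND = {"h1": "head", "h2": "head", "h3": "head",
               "pre": "code", "table": "table", "p": "text"}


@dataclass
class Block:
    kind: str       # head, code, table, text, img or link
    content: str    # text content, or the src/href for img and link
    level: int = 0  # heading level for head blocks, 0 otherwise


class BlogStructurer(HTMLParser):
    """Collects typed blocks while the HTML is parsed (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._tag = None  # tag of the block currently being collected
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.blocks.append(Block("img", attrs.get("src") or ""))
        elif tag == "a":
            # The link target becomes its own block; the link text stays
            # inside the enclosing paragraph.
            self.blocks.append(Block("link", attrs.get("href") or ""))
        elif tag in TAG_TO_KIND and self._tag is None:
            self._tag, self._buf = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            content = "".join(self._buf).strip()
            if content:
                kind = TAG_TO_KIND[tag]
                level = int(tag[1]) if kind == "head" else 0
                self.blocks.append(Block(kind, content, level))
            self._tag = None


def structure_blog(html):
    """Return the list of typed blocks found in an HTML string."""
    parser = BlogStructurer()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling `structure_blog(html)` yields a flat list of blocks that later steps can filter by `kind`. Nested blocks of the same type (for example a table inside a table) are not handled in this sketch.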
2.2 Rules
- Rule 1: Check whether the post has an introductory module such as "Preface" or "Written up front". If so, extract the content of that module directly and truncate it to the specified length (default: 256).
- Rule 2: Check whether there is content before the first-level heading. If so, extract that part directly and truncate it to the specified length (default: 256).
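The two rules can be sketched roughly as follows. This is only an illustration under stated assumptions: the post is given as a list of `(kind, level, content)` tuples, the keyword list `PREFACE_TITLES` is hypothetical, and "first-level heading" is interpreted as the first heading of level 1.

```python
# Hypothetical list of headings that mark an introductory module.
PREFACE_TITLES = ("preface", "foreword", "written up front")
MAX_LEN = 256  # default summary length from the text


def rule_summary(blocks, max_len=MAX_LEN):
    """Return a rule-based summary, or None if neither rule applies.

    blocks: list of (kind, level, content) tuples in document order.
    """
    # Rule 1: take the paragraphs under a preface-like heading.
    for i, (kind, _level, content) in enumerate(blocks):
        if kind == "head" and content.strip().lower() in PREFACE_TITLES:
            parts = []
            for next_kind, _, next_content in blocks[i + 1:]:
                if next_kind == "head":
                    break  # the preface ends at the next heading
                if next_kind == "text":
                    parts.append(next_content)
            if parts:
                return " ".join(parts)[:max_len]

    # Rule 2: take the paragraphs that precede the first level-1 heading.
    parts = []
    for kind, level, content in blocks:
        if kind == "head" and level == 1:
            break
        if kind == "text":
            parts.append(content)
    else:
        # No level-1 heading was found, so rule 2 does not apply.
        parts = []
    if parts:
        return " ".join(parts)[:max_len]
    return None
```

A return value of `None` signals that the post should fall through to the model-based step.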
2.3 Model
If the rules cannot extract a summary, the TextRank model is used to summarize the post. The model input is the text that remains after excluding elements such as head (heading), code, table, img (image) and link. The process is as follows:
- a) For posts that do not match the rules, extract all text except images, code, headings, tables of contents and similar elements;
- b) Split the text into sentences and feed them to the TextRank model for summarization;
- c) TextRank scores each sentence by its importance (the scores of all sentences sum to 1);
- d) Sort the sentences by score from high to low and concatenate them in that order until the summary is as close as possible to the specified length without exceeding it (default: 256).
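Steps b) to d) can be sketched as below. This is a self-contained illustration, not the team's implementation: it assumes the word-overlap similarity from the original TextRank paper, a damping factor of 0.85, and a simple regular-expression tokenizer (Chinese text would need a proper word segmenter). The function names are hypothetical.

```python
import math
import re


def split_sentences(text):
    # Split after Latin end punctuation followed by whitespace,
    # or directly after CJK end punctuation.
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text)
    return [p.strip() for p in parts if p.strip()]


def similarity(words_a, words_b):
    # Shared words, normalised by the log sizes of both word sets.
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom <= 0:
        return 0.0
    return len(words_a & words_b) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence; the scores sum to 1."""
    n = len(sentences)
    if n == 0:
        return []
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    total = sum(scores)
    return [s / total for s in scores]


def splice(sentences, scores, max_len=256):
    """Concatenate sentences from highest to lowest score within max_len."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    picked, length = [], 0
    for i in order:
        extra = len(sentences[i]) + (1 if picked else 0)  # +1 for the space
        if length + extra > max_len:
            break  # stop at the first sentence that would exceed the limit
        picked.append((sentences[i], scores[i]))
        length += extra
    return " ".join(s for s, _ in picked), picked
```

Stopping at the first sentence that does not fit is one reading of step d); skipping it and trying shorter, lower-ranked sentences would be an alternative.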
2.4 Score setting
- Score range: [0, 1]
- Default score for rule-based summaries: 0.5
- Model score: the sum of the scores of all concatenated sentences
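The scoring scheme can be expressed in a few lines. In this sketch, `score_summary` and its arguments are hypothetical: `rule_hit` is the rule-based summary (or `None`), and `picked` is the list of `(sentence, score)` pairs selected by the model.

```python
from typing import List, Optional, Tuple

RULE_SCORE = 0.5  # default score for rule-based summaries


def score_summary(rule_hit: Optional[str],
                  picked: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return (summary, score) with the score in [0, 1]."""
    if rule_hit is not None:
        return rule_hit, RULE_SCORE
    summary = " ".join(sentence for sentence, _ in picked)
    score = sum(weight for _, weight in picked)
    # Sentence scores sum to 1 over the whole post, so a subset stays in
    # [0, 1]; the clamp only guards against floating-point drift.
    return summary, min(1.0, max(0.0, score))
```

Because the model score is the share of total sentence importance covered by the summary, it can be compared against the fixed rule score of 0.5.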
3. Next steps
The current version is preliminary and needs further optimization. Next steps include:
- Build a test set and evaluate quality quantitatively, using BLEU and ROUGE as metrics;
- Improve sentence concatenation: select sentences by score from high to low, then concatenate them in their order in the original post, until the length is close to the specified length;
- Take word weights into account when TextRank builds the sentence graph, for example by computing each word's weight with a TF-IDF-like algorithm over all blogs under the same tag.
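The planned concatenation change could look roughly like this. It is a sketch of one possible reading: sentences are still selected by score, but the selected ones are emitted in their original order. The function name and the space-joined output are assumptions.

```python
def splice_in_original_order(sentences, scores, max_len=256):
    """Select sentences by score, then join them in document order."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, length = [], 0
    for i in order:
        extra = len(sentences[i]) + (1 if chosen else 0)  # +1 for the space
        if length + extra > max_len:
            break
        chosen.append(i)
        length += extra
    chosen.sort()  # restore the order of the original post
    return " ".join(sentences[i] for i in chosen)
```

For example, with sentences `["First point.", "Second point.", "Third point."]`, scores `[0.2, 0.3, 0.5]` and `max_len=30`, the two highest-scoring sentences are selected and returned as `"Second point. Third point."`.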
P.S.
This series will be updated continuously. We hope that colleagues, teachers and experts in NLP and related fields can offer valuable advice. Thank you!