CSDN Q&A Module Title Recommendation Task (I): Building the Basic Framework
2022-07-06 10:41:00 【Alexxinlu】
Series articles
- CSDN Q&A Module Title Recommendation Task (I): Building the Basic Framework
- CSDN Q&A Module Title Recommendation Task (II): Effect Optimization
Team blog: CSDN AI team
1 Problem definition
1.1 Background
In CSDN's Q&A module, many beginners' question titles carry little useful information, for example:
Help the kid out!
Experts, help me!!! Help me finish this problem
Asking the experts for help!!!
For the example above, a better title would be "How do I add a scroll bar to a mobile page?"
Titles like these contain very little useful information, so readers cannot quickly grasp what is being asked from the title alone. This hurts both the efficiency of answerers and the overall user experience. Such data also degrades downstream NLP tasks such as question classification and question recommendation.
To improve the quality of question titles, once the user has entered a title and a problem description, the system should use the description to recommend a more accurate title and prompt the user to change it.
1.2 Input
The available input information is shown below :
"id": 998678,
"title": “ For help , Very simple. C# problem ”,
"body": “ The teacher asked to build a score management system , Classification login is required , But this cheng’xu If the user is a student , Even if comboBox You can log in smoothly without selecting students ,admin There is no such problem , Ask the boss why ?\r\n\t\r\n\t\r\n\tprivate void button1_Click(object sender, EventArgs e)\r\n {\r\n string sUser = txtUser.Text.ToString();\r\n string sPassword = txtPassword.Text.ToString();\r\n\r\n if (sUser == “admin” && sPassword == “1234” && comboBoxLeixing.Text == “ Administrators ”)\r\n {\r\n Menuadmin main = new Menuadmin();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == “ Xuguangrui ” || sUser == “ Cao Guang ” || sUser == “ Cao ziyue ” || sUser == “ Chen Sijia ” || sUser == “ Chen Xu ” || sUser == “ Huang Wenguang ” ||\r\n sUser == “ Lei Zhangshu ” || sUser == “ Liuqingqing ” || sUser == “ Qi SHIMENG ” || sUser == “ Shen bin ” || sUser == “ Shuaixing ” || sUser == “ Sun Quanwei ” ||\r\n sUser == “ Wang Heng ” || sUser == “ Wang Rui ” || sUser == “ Xiang Meng ” || sUser == “ Zhang Guoliang ” || sUser == “ Zhangzongyou ” || sUser == “ Zhang Shumin ”\r\n && sPassword == “1234” && comboBoxLeixing.Text == “ Student ”)\r\n {\r\n Menustudent main = new Menustudent();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == “ Liu Zhaoliang ” || sUser == “ Longlong ” || sUser == “ Feng Wei ” || sUser == “ Liu shanyong ” ||\r\n sUser == “ India forest ” || sUser == “ Cheng Leli ” || sUser == “ Liu Yan ” || sUser == “ Zhao Junwei ”\r\n && sPassword == “1234” && comboBoxLeixing.Text== “ Teachers' ”)\r\n {\r\n Menuteacher main = new Menuteacher();\r\n main.Show();\r\n this.Hide();\r\n }\r\n \r\n else\r\n label3.Text = “ Wrong user name or password , Please re-enter !”;\r\n }”,
"tag_id": 95,
"tag_name": “c Language ”
The input consists mainly of the five fields above, of which title is the one that needs improvement.
Currently only the title and body fields are used as input.
1.3 Output
An improved question title.
2 Solution
This article frames the problem as a text summarization task in NLP. The implementation steps are as follows:
2.1 Data preprocessing
At present, preprocessing consists mainly of the following operations:
- Remove irrelevant information, for example code segments, URLs, and irrelevant characters;
- Split paragraphs into sentences, using delimiters such as line breaks, full stops, question marks, and exclamation marks.
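The two preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the article does not list its exact rules, so the patterns `CODE_BLOCK`, `URL`, and `SENTENCE`, and the function name `preprocess`, are all assumptions. In particular, it only removes code that is explicitly delimited (fenced or wrapped in `<pre>`/`<code>` tags).

```python
import re

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable

# Hypothetical patterns (assumptions): the article does not publish its rules.
CODE_BLOCK = re.compile(
    FENCE + r".*?" + FENCE + r"|<pre\b.*?</pre>|<code\b.*?</code>",
    re.DOTALL | re.IGNORECASE,
)
URL = re.compile(r"https?://[\w\-./?%&=#:~+]+", re.ASCII)
# A sentence is a run of non-delimiter characters plus any trailing
# end punctuation; line breaks also act as delimiters.
SENTENCE = re.compile(r"[^。？！?!\r\n]+[。？！?!]*")


def preprocess(body: str) -> list[str]:
    """Strip code blocks and URLs, then split the text into sentences."""
    text = CODE_BLOCK.sub(" ", body)
    text = URL.sub(" ", text)
    sentences = []
    for piece in SENTENCE.findall(text):
        piece = re.sub(r"\s+", " ", piece).strip()  # collapse tabs and spaces
        if piece:
            sentences.append(piece)
    return sentences
```

The end punctuation is kept on each sentence on purpose, since a later step may want to know whether a sentence ends with a question mark.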
2.2 Model
2.2.1 Coarse ranking
The current scheme uses the classic extractive model TextRank to rank all input sentences, and selects the top N sentences as recommendations.
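For readers unfamiliar with TextRank, the sketch below shows the general idea: sentences are nodes in a graph, edges are weighted by word overlap, and a PageRank-style iteration scores each sentence. It is an illustration under stated assumptions, not the implementation used here. The `tokenize` function is a simple stand-in (ASCII words plus single CJK characters); a real system would likely use a proper Chinese word segmenter. The damping factor and iteration count are conventional defaults, not values from the article.

```python
import math
import re


def tokenize(sentence: str) -> list[str]:
    # Stand-in tokenizer (assumption): ASCII words plus single CJK characters.
    return re.findall(r"[a-z0-9_#+]+|[\u4e00-\u9fff]", sentence.lower())


def similarity(a: list[str], b: list[str]) -> float:
    # Overlap similarity from the TextRank paper: number of shared tokens
    # divided by the sum of the log lengths of the two sentences.
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences: list[str], top_n: int = 3,
             damping: float = 0.85, iterations: int = 50) -> list[str]:
    """Return the top_n sentences by TextRank score, best first."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Symmetric weight matrix with no self-loops.
    weights = [[0.0 if i == j else similarity(tokens[i], tokens[j])
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence collects score from its neighbours, in proportion
        # to the edge weight relative to the neighbour's total weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_n]]
```

Because the method is extractive, the returned sentences are copied verbatim from the input; nothing is shortened or rewritten.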
2.2.2 Fine ranking
Because the goal is to recommend a title for a question, interrogative sentences should be given priority.
A dictionary-based approach is used here to identify all interrogative sentences in the input. These sentences are then moved to the top of the coarse-ranking result.
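A minimal sketch of this re-ranking step is shown below. The cue list `QUESTION_CUES` is purely hypothetical, since the article does not publish its dictionary, and the function names are illustrative only.

```python
# Hypothetical cue dictionary (assumption): question marks and a few
# common Chinese interrogative words.
QUESTION_CUES = ("?", "？", "吗", "怎么", "如何", "为什么", "为何", "怎样")


def is_question(sentence: str) -> bool:
    """Return True if the sentence contains any cue from the dictionary."""
    return any(cue in sentence for cue in QUESTION_CUES)


def rerank(ranked: list[str]) -> list[str]:
    """Move interrogative sentences ahead of the others.

    sorted() is stable, so the coarse-ranking order is preserved
    inside both the question group and the non-question group.
    """
    return sorted(ranked, key=lambda s: not is_question(s))
```

Using a stable sort keeps the coarse score as the tie-breaker, so the best-scoring question ends up first without any extra bookkeeping.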
2.3 Experimental results and error data analysis
The preliminary analysis results are shown in the figure below:
As the figure shows, the main problems at present are:
- Sample problems: the body of some questions contains only images, code snippets, and the like, with no useful Chinese text.
- Titles that are too long: the current preprocessing is too simple, so some sentences remain too long after segmentation. Because the current model is an extractive summarization algorithm, it does not modify the input sentences, so some recommended titles are too long, whereas question titles are generally concise.
3 Next steps
- Classify the samples. For samples that contain only images or code snippets, that content must first be recognized and interpreted before a title can be recommended.
- Shorten the titles. Consider using question templates or generative (abstractive) summarization methods.
P.S.
This series will be updated continuously. What we have done so far is quite simple and the results are not yet satisfactory. We hope that colleagues, teachers, and experts in NLP and other fields can offer valuable advice. Thank you!