CSDN Q&A Module Title Recommendation Task (I): Building the Basic Framework
2022-07-06 10:41:00 【Alexxinlu】
Series articles
- CSDN Q&A Module Title Recommendation Task (I): Building the Basic Framework
- CSDN Q&A Module Title Recommendation Task (II): Effect Optimization
Team blog: CSDN AI team
1 Problem definition
1.1 Background
In CSDN's Q&A module, many beginners write question titles that carry almost no useful information, for example:
- Somebody please help this poor kid!
- Experts, help me!!! Help me finish this problem
- Calling all gurus!!!
In the example above, a better title would be "How do I add a scroll bar to a mobile page?"
Titles like these contain very little useful information, so readers cannot quickly understand the question from the title alone. This lowers the efficiency of people answering questions and hurts the user experience. Such data also degrades downstream NLP tasks, such as question classification and question recommendation.
Therefore, to improve the quality of question titles, we want to use the information in the problem description: after the user enters a title and a problem description, the system recommends a more accurate title and prompts the user to change it.
1.2 Input
The available input information is shown below:
{
"id": 998678,
"title": "For help, a very simple C# problem",
"body": "The teacher asked us to build a grade management system that requires role-based login. But in this program, a student user can log in even when \"Student\" is not selected in the comboBox; admin has no such problem. Can anyone tell me why?\r\n\t\r\n\t\r\n\tprivate void button1_Click(object sender, EventArgs e)\r\n {\r\n string sUser = txtUser.Text.ToString();\r\n string sPassword = txtPassword.Text.ToString();\r\n\r\n if (sUser == \"admin\" && sPassword == \"1234\" && comboBoxLeixing.Text == \"Administrator\")\r\n {\r\n Menuadmin main = new Menuadmin();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == \"Xuguangrui\" || sUser == \"Cao Guang\" || sUser == \"Cao ziyue\" || sUser == \"Chen Sijia\" || sUser == \"Chen Xu\" || sUser == \"Huang Wenguang\" ||\r\n sUser == \"Lei Zhangshu\" || sUser == \"Liuqingqing\" || sUser == \"Qi SHIMENG\" || sUser == \"Shen bin\" || sUser == \"Shuaixing\" || sUser == \"Sun Quanwei\" ||\r\n sUser == \"Wang Heng\" || sUser == \"Wang Rui\" || sUser == \"Xiang Meng\" || sUser == \"Zhang Guoliang\" || sUser == \"Zhangzongyou\" || sUser == \"Zhang Shumin\"\r\n && sPassword == \"1234\" && comboBoxLeixing.Text == \"Student\")\r\n {\r\n Menustudent main = new Menustudent();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == \"Liu Zhaoliang\" || sUser == \"Longlong\" || sUser == \"Feng Wei\" || sUser == \"Liu shanyong\" ||\r\n sUser == \"India forest\" || sUser == \"Cheng Leli\" || sUser == \"Liu Yan\" || sUser == \"Zhao Junwei\"\r\n && sPassword == \"1234\" && comboBoxLeixing.Text== \"Teacher\")\r\n {\r\n Menuteacher main = new Menuteacher();\r\n main.Show();\r\n this.Hide();\r\n }\r\n \r\n else\r\n label3.Text = \"Wrong user name or password. Please re-enter!\";\r\n }",
"tag_id": 95,
"tag_name": "C language"
}
The input consists mainly of the five fields above, of which title is the one that needs improvement.
Currently only the title and body fields are used as input.
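As a minimal sketch of this field selection, assuming each question arrives as a JSON object with the fields listed above (the function name `extract_input` is hypothetical and not part of the original pipeline):

```python
import json


def extract_input(record_json):
    """Keep only the two fields the current approach uses: title and body."""
    record = json.loads(record_json)
    return {"title": record["title"], "body": record["body"]}


if __name__ == "__main__":
    # Tiny made-up record with the same five fields as the sample above.
    sample = json.dumps({
        "id": 1,
        "title": "Help me!!!",
        "body": "How do I add a scroll bar to a mobile page?",
        "tag_id": 95,
        "tag_name": "C language",
    })
    print(extract_input(sample))
```

The id, tag_id, and tag_name fields are simply dropped here; they could be carried along if later stages need them.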
1.3 Output
An improved question title.
2 Solution
This article abstracts the problem as a text summarization task in NLP. The implementation steps are as follows:
2.1 Data preprocessing
At present, the following preprocessing operations are performed:
- Remove irrelevant information, for example code segments, URLs, and irrelevant characters;
- Split the paragraph into sentences, based on delimiters such as newlines, full stops, question marks, and exclamation marks.
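The two preprocessing steps above might be sketched as follows. This is only an illustration under stated assumptions: code is assumed to be wrapped in `<pre>` or `<code>` tags (the article does not say how code segments are detected), and the helper name `preprocess` is hypothetical.

```python
import re

# Assumption: code segments are wrapped in <pre> or <code> tags in the body.
CODE_RE = re.compile(r"<pre\b.*?</pre>|<code\b.*?</code>", re.DOTALL | re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+")
# Any remaining HTML tags count as irrelevant characters in this sketch.
TAG_RE = re.compile(r"<[^>]+>")
# A sentence is a run of non-delimiter characters plus its trailing end punctuation.
# The ASCII period is deliberately not a delimiter, so "3.5" is not split.
SENTENCE_RE = re.compile(r"[^\r\n。？?！!]+[。？?！!]*")


def preprocess(body):
    """Remove code, URLs, and tags, then split the text into sentences."""
    text = CODE_RE.sub("\n", body)
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    sentences = []
    for raw in SENTENCE_RE.findall(text):
        sentence = re.sub(r"\s+", " ", raw).strip()
        if sentence:
            sentences.append(sentence)
    return sentences


if __name__ == "__main__":
    body = ("How do I add a scroll bar? See https://example.com/x for details!\n"
            "<pre>int a = 1;</pre>Thanks")
    print(preprocess(body))
```

Keeping the end punctuation attached to each sentence is a design choice: it lets a later step recognize questions by their question mark.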
2.2 Model
2.2.1 Coarse ranking
The current scheme uses the classic extractive model TextRank to rank all input sentences, and then selects the top N sentences as recommendations.
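A compact, pure-Python sketch of TextRank sentence ranking is shown below. It is not the team's implementation: the tokenizer is a simple word regex (Chinese text would need a real word segmenter), and the damping factor, iteration count, and function names are illustrative assumptions.

```python
import math
import re


def tokenize(sentence):
    # Assumption: lowercase word tokens; Chinese input would need a word segmenter.
    return re.findall(r"\w+", sentence.lower())


def similarity(a, b):
    """Word overlap normalized by the log of both sentence lengths."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, top_n=3, damping=0.85, iterations=50):
    """Return the top_n sentences ranked by a PageRank-style score."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Weighted, undirected sentence graph with no self-loops.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_n]]


if __name__ == "__main__":
    sentences = [
        "How to add a scroll bar to a mobile page",
        "The scroll bar does not show on the mobile page",
        "Thanks in advance",
    ]
    print(textrank(sentences, top_n=2))
```

Sentences that share vocabulary with many others receive higher scores, so an isolated pleasantry such as "Thanks in advance" falls to the bottom of the ranking.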
2.2.2 Fine ranking
Because the goal is to recommend a title for a question, interrogative sentences should be given priority.
A dictionary-based approach is used here to identify all interrogative sentences in the input. The coarse ranking result is then reordered so that these sentences come first.
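The dictionary-based reordering could look roughly like the sketch below. The cue lists are hypothetical examples, not the dictionary actually used by the team, and `is_question` and `rerank` are illustrative names.

```python
import re

# Hypothetical cue dictionaries; a real dictionary would be larger and curated.
QUESTION_MARKS = ("?", "？")
QUESTION_WORDS_EN = {"how", "why", "what", "which"}
QUESTION_WORDS_ZH = ("怎么", "如何", "为什么", "吗")


def is_question(sentence):
    """Return True if the sentence matches any cue in the dictionaries."""
    if sentence.rstrip().endswith(QUESTION_MARKS):
        return True
    # Match English cues on whole words so that "show" does not trigger "how".
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if words & QUESTION_WORDS_EN:
        return True
    return any(cue in sentence for cue in QUESTION_WORDS_ZH)


def rerank(ranked_sentences):
    """Move questions to the top while keeping the coarse order within each group."""
    # sorted() is stable and False sorts before True, so questions come first.
    return sorted(ranked_sentences, key=lambda s: not is_question(s))


if __name__ == "__main__":
    coarse = [
        "Boss, help me!!!",
        "How do I add a scroll bar?",
        "It is a mobile page",
        "如何添加滚动条",
    ]
    print(rerank(coarse))
```

Because the sort is stable, the TextRank order is preserved among the questions and among the remaining sentences.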
2.3 Experimental results and error data analysis
The preliminary analysis results are shown in the figure below:
As the figure above shows, the main problems at present are:
- Sample problems: the body of some questions contains only pictures, code snippets, and the like, with no useful Chinese text.
- Titles that are too long: the current preprocessing is too simple, so some sentences are still too long after segmentation. Because the current model is an extractive summarization algorithm, it does not modify the input sentences, so some recommended titles are too long, whereas question titles are generally concise.
3 Next step
- Classify the samples. Samples that contain only images or code snippets need to be identified and their content interpreted before a title is recommended.
- Shorten the titles. Consider question templates or generative (abstractive) text summarization methods for improvement.
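As one possible starting point for these two steps, the sketch below flags samples with almost no Chinese text left after preprocessing and flags candidate titles that exceed a length limit. The thresholds `min_cjk_chars` and `max_chars`, and both function names, are assumptions made for illustration; the article does not specify any of them.

```python
import re

# Basic CJK Unified Ideographs block; covers common Chinese characters.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def classify_sample(sentences, min_cjk_chars=5):
    """Label a sample 'text' if enough Chinese characters remain, else 'no_text'."""
    cjk_count = sum(len(CJK_RE.findall(s)) for s in sentences)
    return "text" if cjk_count >= min_cjk_chars else "no_text"


def is_too_long(title, max_chars=30):
    """Flag a candidate title that exceeds the (hypothetical) length limit."""
    return len(title) > max_chars


if __name__ == "__main__":
    print(classify_sample(["如何给移动端页面添加滚动条？"]))
    print(classify_sample(["int a = 1;"]))
    print(is_too_long("a" * 31))
```

Samples labeled "no_text" would then need a separate path, such as interpreting the image or code, before any title is recommended.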
P.S.
This series of articles will be updated continuously. What we have done so far is very simple and the results are not yet satisfactory. We hope that colleagues, teachers, and experts in NLP and other fields can offer valuable advice. Thank you!