CSDN Q&A module title recommendation task (I): building the basic framework
2022-07-06 10:41:00 【Alexxinlu】
Series articles
- CSDN Q&A module title recommendation task (I): building the basic framework
- CSDN Q&A module title recommendation task (II): effect optimization
Team blog: CSDN AI team
1 Problem definition
1.1 Background
In CSDN's Q&A module, many beginners' question titles lack useful information, for example:
Please help this kid!
Experts, help me!!! Help me finish this problem
Asking the experts for help!!!
For the examples above, a better title would be "How to add scroll bars to mobile pages?"
Titles like these contain very little useful information, so readers cannot quickly grasp what the question is about. This reduces answerers' efficiency and hurts the user experience. Such data also degrades downstream NLP tasks such as question classification and question recommendation.
Therefore, to improve the quality of question titles, once the user has entered a title and a problem description, the system should recommend a more accurate title based on the description and prompt the user to change it.
1.2 Input
The available input information is shown below:
"id": 998678,
"title": "For help, a very simple C# problem",
"body": "The teacher asked us to build a score management system that requires role-based login. But in this program, if the user is a student, login succeeds even when Student is not selected in the comboBox; admin does not have this problem. Can an expert explain why?\r\n\t\r\n\t\r\n\tprivate void button1_Click(object sender, EventArgs e)\r\n {\r\n string sUser = txtUser.Text.ToString();\r\n string sPassword = txtPassword.Text.ToString();\r\n\r\n if (sUser == \"admin\" && sPassword == \"1234\" && comboBoxLeixing.Text == \"Administrator\")\r\n {\r\n Menuadmin main = new Menuadmin();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == \"Xuguangrui\" || sUser == \"Cao Guang\" || sUser == \"Cao ziyue\" || sUser == \"Chen Sijia\" || sUser == \"Chen Xu\" || sUser == \"Huang Wenguang\" ||\r\n sUser == \"Lei Zhangshu\" || sUser == \"Liuqingqing\" || sUser == \"Qi SHIMENG\" || sUser == \"Shen bin\" || sUser == \"Shuaixing\" || sUser == \"Sun Quanwei\" ||\r\n sUser == \"Wang Heng\" || sUser == \"Wang Rui\" || sUser == \"Xiang Meng\" || sUser == \"Zhang Guoliang\" || sUser == \"Zhangzongyou\" || sUser == \"Zhang Shumin\"\r\n && sPassword == \"1234\" && comboBoxLeixing.Text == \"Student\")\r\n {\r\n Menustudent main = new Menustudent();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == \"Liu Zhaoliang\" || sUser == \"Longlong\" || sUser == \"Feng Wei\" || sUser == \"Liu shanyong\" ||\r\n sUser == \"India forest\" || sUser == \"Cheng Leli\" || sUser == \"Liu Yan\" || sUser == \"Zhao Junwei\"\r\n && sPassword == \"1234\" && comboBoxLeixing.Text== \"Teacher\")\r\n {\r\n Menuteacher main = new Menuteacher();\r\n main.Show();\r\n this.Hide();\r\n }\r\n \r\n else\r\n label3.Text = \"Wrong user name or password, please re-enter!\";\r\n }",
"tag_id": 95,
"tag_name": "C language"
The input consists of the five fields above; title is the one that needs improvement.
Currently, only the title and body fields are used as input.
1.3 Output
An improved question title.
2 Solution
This article abstracts the problem as a text summarization task in NLP. The implementation steps are as follows:
2.1 Data preprocessing
Currently, the following preprocessing steps are applied:
- Remove irrelevant information, for example code segments, URLs, and irrelevant characters;
- Split paragraphs into sentences, based on delimiters such as newlines, full stops, question marks, and exclamation marks.
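The two preprocessing steps can be sketched roughly as follows. This is only a minimal illustration, not the team's actual code: the regular expressions and the function names (clean_text, split_sentences, preprocess) are hypothetical, and the sketch assumes that code in the body is wrapped in <pre> or <code> tags. The ASCII full stop is deliberately not used as a delimiter here, so that identifiers such as txtUser.Text are not cut apart.

```python
import re

# Assumption: code segments are wrapped in <pre>...</pre> or <code>...</code> tags.
CODE_RE = re.compile(r"<(pre|code)\b[^>]*>.*?</\1>", re.DOTALL | re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+")
# A sentence is a run of non-delimiter characters plus any trailing end punctuation.
SENTENCE_RE = re.compile(r"[^。？！?!\r\n]+[。？！?!]*")


def clean_text(body):
    """Remove code segments and URLs, then collapse runs of spaces and tabs."""
    text = CODE_RE.sub("\n", body)
    text = URL_RE.sub(" ", text)
    return re.sub(r"[ \t]+", " ", text)


def split_sentences(text):
    """Split on newlines and on Chinese/ASCII question marks, exclamation marks, and '。'."""
    return [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]


def preprocess(body):
    """Return the list of candidate sentences for one question body."""
    return split_sentences(clean_text(body))


if __name__ == "__main__":
    sample = (
        "How do I add a scroll bar to a mobile page?\n"
        "See https://example.com/a for details.\n"
        "<pre>int x = 1;</pre>\n"
        "Thanks!"
    )
    for sentence in preprocess(sample):
        print(sentence)
```

A real body without any markup around its code, like the sample in section 1.2, would need an additional heuristic to detect code lines.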
2.2 Model
2.2.1 Coarse ranking
The current scheme uses the classic extractive model TextRank to rank all input sentences, and finally selects the top N sentences to recommend.
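For readers unfamiliar with TextRank, the idea can be sketched in pure Python as below. This is a simplified illustration under stated assumptions, not the implementation used in this article: the tokenizer (single CJK characters plus lowercase Latin words) is a stand-in for a real Chinese word segmenter, and the function names and default parameters (top_n, damping, iterations) are hypothetical. Sentences are nodes, the edge weight is the token overlap normalized by sentence lengths, and scores are obtained by iterating a weighted PageRank update.

```python
import math
import re


def tokenize(sentence):
    """Assumed tokenizer: lowercase Latin words and single CJK characters."""
    return re.findall(r"[a-z0-9_#+]+|[\u4e00-\u9fff]", sentence.lower())


def similarity(tokens_a, tokens_b):
    """Token overlap divided by the sum of log sentence lengths."""
    if not tokens_a or not tokens_b:
        return 0.0
    overlap = len(set(tokens_a) & set(tokens_b))
    denom = math.log(len(tokens_a)) + math.log(len(tokens_b))
    if overlap == 0 or denom <= 0:
        return 0.0
    return overlap / denom


def textrank(sentences, top_n=3, damping=0.85, iterations=50):
    """Return the top_n sentences ranked by weighted PageRank score."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Symmetric similarity matrix with a zero diagonal.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    # sorted() is stable, so tied sentences keep their original order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_n]]


if __name__ == "__main__":
    candidates = [
        "login page does not scroll on mobile",
        "how to add a scroll bar to the mobile page",
        "thanks a lot",
    ]
    print(textrank(candidates, top_n=2))
```

In this toy input the second sentence shares tokens with both of the others, so it receives the highest score.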
2.2.2 Fine ranking
Because this article recommends titles for questions, interrogative sentences should be given priority.
A dictionary-based approach is used to identify all interrogative sentences in the input. Then, within the coarse ranking result, the interrogative sentences are moved to the top.
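A minimal sketch of this re-ranking step is shown below. The cue dictionaries (CN_CUES, EN_CUES) and the function names are hypothetical examples; the article does not list the actual dictionary. Sentences detected as questions are moved to the front, and the coarse ranking order is preserved within each group.

```python
import re

# Hypothetical cue dictionaries; the real dictionary is not given in the article.
CN_CUES = ("怎么", "如何", "为什么", "为啥", "是否", "吗")
EN_CUES = {"how", "why", "what", "which"}


def is_question(sentence):
    """Dictionary-based check: trailing question mark or a known question cue."""
    text = sentence.strip().lower()
    if text.endswith(("?", "？")):
        return True
    if any(cue in text for cue in CN_CUES):
        return True
    # Match English cues as whole words so that "show" does not trigger "how".
    return any(word in EN_CUES for word in re.findall(r"[a-z]+", text))


def rerank(ranked_sentences):
    """Put questions first, keeping the coarse ranking order inside each group."""
    questions = [s for s in ranked_sentences if is_question(s)]
    others = [s for s in ranked_sentences if not is_question(s)]
    return questions + others


if __name__ == "__main__":
    coarse = [
        "The page is too long.",
        "如何给移动端页面添加滚动条",
        "Why does login succeed?",
        "Thanks",
    ]
    print(rerank(coarse))
```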
2.3 Experimental results and error data analysis
The preliminary analysis results are shown in the figure below:
As the figure shows, the main problems at present are:
- Sample problems: some question bodies contain only images, code snippets, and so on, with no useful Chinese text.
- Overly long titles: the current preprocessing is too simple, so some sentences remain too long after segmentation. Since the current model is an extractive summarization algorithm, it does not modify the input sentences. As a result, some recommended titles are too long, whereas question titles are generally concise.
3 Next steps
- Classify the samples. For samples that contain only images or code snippets, the information in them must first be identified and interpreted before a title can be recommended.
- Simplify the titles. Consider question templates or generative (abstractive) text summarization methods as improvements.
P.S.
This series will be updated continuously. The current work is still quite simple and the results are not yet satisfactory. We hope that colleagues, teachers, and experts in NLP and other fields can offer valuable advice. Thank you!