
CSDN Q&A module title recommendation task (I): building the basic framework

2022-07-06 10:41:00 Alexxinlu

Series articles

Team blog: CSDN AI team

1 Problem definition

1.1 Background

In CSDN's Q&A module, many beginners write question titles that carry no useful information, for example:

Help a kid out!
Experts, help me!!! Help me finish this problem
Begging the experts!!!

[Figure: example question]
For the example above, a better title would be "How to add scroll bars to mobile pages?"

Titles like these contain very little useful information, so readers cannot quickly understand the question from the title alone. This reduces the efficiency of people answering questions and hurts the user experience. Such data also degrades downstream NLP tasks, for example question classification and question recommendation.

To improve the quality of question titles, we therefore want to use the information in the problem description: after the user enters a title and a problem description, the system recommends a more accurate title and prompts the user to change it.

1.2 Input

The available input information is shown below:

"id": 998678,
"title": "Help wanted: a very simple C# problem",
"body": "The teacher asked us to build a grade management system that requires login by user type. But in this program, if the user is a student, they can log in even when Student is not selected in the comboBox; admin has no such problem. Can anyone explain why?\r\n\t\r\n\t\r\n\tprivate void button1_Click(object sender, EventArgs e)\r\n {\r\n string sUser = txtUser.Text.ToString();\r\n string sPassword = txtPassword.Text.ToString();\r\n\r\n if (sUser == “admin” && sPassword == “1234” && comboBoxLeixing.Text == “ Administrators ”)\r\n {\r\n Menuadmin main = new Menuadmin();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == “ Xuguangrui ” || sUser == “ Cao Guang ” || sUser == “ Cao ziyue ” || sUser == “ Chen Sijia ” || sUser == “ Chen Xu ” || sUser == “ Huang Wenguang ” ||\r\n sUser == “ Lei Zhangshu ” || sUser == “ Liuqingqing ” || sUser == “ Qi SHIMENG ” || sUser == “ Shen bin ” || sUser == “ Shuaixing ” || sUser == “ Sun Quanwei ” ||\r\n sUser == “ Wang Heng ” || sUser == “ Wang Rui ” || sUser == “ Xiang Meng ” || sUser == “ Zhang Guoliang ” || sUser == “ Zhangzongyou ” || sUser == “ Zhang Shumin ”\r\n && sPassword == “1234” && comboBoxLeixing.Text == “ Student ”)\r\n {\r\n Menustudent main = new Menustudent();\r\n main.Show();\r\n this.Hide();\r\n }\r\n\r\n if (sUser == “ Liu Zhaoliang ” || sUser == “ Longlong ” || sUser == “ Feng Wei ” || sUser == “ Liu shanyong ” ||\r\n sUser == “ India forest ” || sUser == “ Cheng Leli ” || sUser == “ Liu Yan ” || sUser == “ Zhao Junwei ”\r\n && sPassword == “1234” && comboBoxLeixing.Text== “ Teachers' ”)\r\n {\r\n Menuteacher main = new Menuteacher();\r\n main.Show();\r\n this.Hide();\r\n }\r\n \r\n else\r\n label3.Text = “ Wrong user name or password , Please re-enter !”;\r\n }",
"tag_id": 95,
"tag_name": "C language"

The input consists of the five fields above, of which title is the title to be improved.
Currently only the title and body fields are used as input.

1.3 Output

An improved question title.

2 Solution

This article abstracts the problem as an NLP text summarization task. The implementation steps are as follows:

2.1 Data preprocessing

At present, preprocessing consists of the following operations:

  1. Remove irrelevant information, for example code snippets, URLs, and irrelevant characters;
  2. Split paragraphs into sentences, segmenting on delimiters such as newlines, full stops, question marks, and exclamation marks.
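The two preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the function name `preprocess`, the URL pattern, and the line-based heuristic for spotting code are all assumptions, since the article does not give the exact rules. Only the Chinese full stop "。" is treated as a full stop here, so that "." inside identifiers such as `txtUser.Text` does not split a sentence.

```python
import re

# Assumed patterns; the article does not specify the exact rules it uses.
URL_RE = re.compile(r"https?://\S+")
# Heuristic: a line looks like code if it ends with ; { or }, or starts with
# a common keyword and contains a parenthesis.
CODE_LINE_RE = re.compile(
    r"[;{}]\s*$|^\s*(?:if|else|for|while|private|public)\b.*[()]"
)
# A sentence is a run of non-delimiter characters plus an optional
# terminating full stop, question mark, or exclamation mark.
SENTENCE_RE = re.compile(r"[^。？！?!\r\n]+[。？！?!]?")


def preprocess(body):
    """Drop URLs and code-like lines, then split the rest into sentences."""
    kept_lines = []
    for line in body.splitlines():
        line = URL_RE.sub(" ", line)
        if CODE_LINE_RE.search(line):
            continue  # treat the line as a code snippet and discard it
        kept_lines.append(line)
    text = "\n".join(kept_lines)
    # Collapse runs of whitespace inside each sentence and drop empty ones.
    sentences = [re.sub(r"\s+", " ", s).strip() for s in SENTENCE_RE.findall(text)]
    return [s for s in sentences if s]
```

The terminating punctuation is kept on each sentence so that a later step can still recognize questions by their question mark.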

2.2 Model

2.2.1 Coarse ranking

The current scheme uses the classic extractive model TextRank to rank all input sentences, and selects the top N sentences as recommendations.
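As a rough illustration of this step, the sketch below ranks sentences with a small, dependency-free TextRank. It is not the team's implementation: the tokenizer (ASCII words plus single CJK characters), the damping factor of 0.85, the fixed iteration count, and the function names are all assumptions. A production system for Chinese text would more likely use a proper word segmenter.

```python
import math
import re


def tokenize(sentence):
    # Assumed tokenizer: ASCII word-like runs and individual CJK characters.
    return re.findall(r"[a-z0-9_#+]+|[\u4e00-\u9fff]", sentence.lower())


def similarity(a, b):
    # Word overlap normalized by the log of the sentence lengths.
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:
        return 0.0  # both sentences have a single token
    return len(set(a) & set(b)) / denom


def textrank(sentences, top_n=3, damping=0.85, iterations=50):
    """Return the top_n sentences by weighted PageRank score."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Symmetric similarity matrix with a zero diagonal.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in order[:top_n]]
```

Because the method is extractive, the returned sentences are copied unchanged from the input; only their order and number change.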

2.2.2 Fine ranking

Because the goal is to recommend a title for a question, interrogative sentences should be given priority.

A dictionary-based method is used to identify all interrogative sentences in the input. The coarse-ranking results are then reordered so that these sentences come first.
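The reordering can be sketched as below, under stated assumptions: the cue lists are a tiny hypothetical dictionary (the article does not publish its own), and the names `is_question` and `rerank_questions_first` are made up for illustration.

```python
import re

# Hypothetical question dictionary; the real one is not given in the article.
QUESTION_CUES_ZH = ("怎么", "如何", "为什么", "吗")
QUESTION_CUES_EN = re.compile(r"\b(?:how|why|what)\b", re.IGNORECASE)


def is_question(sentence):
    """Return True if the sentence ends with a question mark or has a cue word."""
    if sentence.rstrip().endswith(("?", "？")):
        return True
    if any(cue in sentence for cue in QUESTION_CUES_ZH):
        return True
    return bool(QUESTION_CUES_EN.search(sentence))


def rerank_questions_first(ranked):
    # sorted() is stable, so within the question group and the non-question
    # group the coarse-ranking order is preserved.
    return sorted(ranked, key=lambda s: not is_question(s))
```

Using a stable sort keeps the coarse-ranking score as the tie-breaker, so the fine-ranking step only promotes questions and does not otherwise reshuffle the candidates.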

2.3 Experimental results and error analysis

The preliminary analysis results are shown in the figure below:
[Figure: preliminary analysis results]
As the figure shows, the main problems at present are:

  1. Sample problems: the body of some questions contains only pictures, code snippets, and the like, with no useful Chinese text.
  2. Overlong titles: the current preprocessing is too simple, so some sentences are still too long after segmentation. Because the current model is an extractive summarization algorithm, it does not modify the input sentences, so some recommended titles are too long, whereas question titles are generally concise.

3 Next steps

  1. Classify the samples. For samples that contain only images or code snippets, first recognize and interpret that content, then make a title recommendation.
  2. Shorten the titles. Consider using question templates or generative (abstractive) text summarization methods to improve them.


This series of articles will be updated continuously. Our current approach is still very simple and the results are not yet satisfactory. We hope that colleagues, teachers, and experts in NLP and other fields can offer valuable advice. Thank you!

