CSDN Q&A Tag Skill Tree (I): Building the Basic Framework

2022-07-06 10:41:00 Alexxinlu

Series articles

Team blog: CSDN AI team

1 Problem definition

1.1 Background

At present, questions in the CSDN Q&A module are classified only coarsely, into categories such as Python, Java, and C, rather than being mapped to specific knowledge points within each category. For example, the question below belongs to data visualization in Python.
[Figure: Sample question 1]
Classifying questions at a finer grain lets askers see more clearly where their question sits in the knowledge system, and lets the system recommend more relevant learning and reference material to them.

To address this, this post first builds a programming-language skill tree for each category, then maps previously answered questions (those with an accepted answer) onto specific nodes of the tree. Finally, for a new question, it finds the most similar node in the tree and recommends the accepted questions attached to that node.

2 Solution

2.1 Knowledge gathering

To build a programming skill tree, we first need to collect the relevant knowledge. This post uses the Python programming language as the concrete example.

Through online searching and research, we identified the following two channels:

2.2 Building the skill tree

Once the knowledge resources have been collected, they need to be stored in a tree structure. This post uses the treelib package for that.

To make merging the trees easier in the next section, the directory is limited to four levels:

  • Part title, e.g. Part 1
  • Chapter title, e.g. Chapter 1
  • Section title, e.g. 1.1
  • Subsection title, e.g. 1.1.1

The resulting tree structure is shown in the figure below:

[Figure: Example 4]
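A minimal sketch of turning a parsed table of contents into a tree of at most four levels follows. It assumes the directory has already been parsed into `(level, title)` pairs, and it uses a small `Node` dataclass instead of treelib so that it runs without extra packages; `build_tree`, `render`, and `MAX_DEPTH` are illustrative names, not the post's code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_DEPTH = 4  # part > chapter > section > subsection


@dataclass
class Node:
    title: str
    children: List["Node"] = field(default_factory=list)


def build_tree(root_title: str, toc: List[Tuple[int, str]]) -> Node:
    """Build a tree from (level, title) pairs; level 1 = part ... 4 = subsection.

    Entries deeper than MAX_DEPTH are skipped so every tree has at most
    four layers below the root.
    """
    root = Node(root_title)
    stack = [root]  # stack[d] is the most recent node at depth d
    for level, title in toc:
        if level > MAX_DEPTH:
            continue  # drop fifth-level (and deeper) headings
        if level < 1 or level > len(stack):
            raise ValueError(f"invalid or skipped level {level} for {title!r}")
        del stack[level:]  # pop back to this entry's parent
        node = Node(title)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def render(node: Node, indent: int = 0) -> str:
    """Indented text view of the tree, one title per line."""
    lines = ["  " * indent + node.title]
    for child in node.children:
        lines.append(render(child, indent + 1))
    return "\n".join(lines)
```

For example, `build_tree("Python", [(1, "Part 1"), (2, "Chapter 1"), (3, "1.1"), (4, "1.1.1"), (5, "1.1.1.1")])` keeps the first four entries and drops the fifth-level one. With treelib itself, each insertion would instead be a call such as `tree.create_node(title, node_id, parent=parent_id)`, and `tree.show()` prints a similar indented view.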

2.3 Merging the skill trees

After skill trees have been built from the directories and knowledge-system resources of different sources, they must be merged into a single Python skill tree.

When merging trees, this post mainly considers the following aspects:

  • Merge layer by layer, starting from the root node
    • Merge multiple trees recursively
  • Merge similar nodes within the same layer
    • Use a heuristic clustering method (which does not require the number of clusters to be fixed in advance) to divide the nodes into clusters
    • Measure similarity during clustering by combining the longest common subsequence (LCS) ratio with the Levenshtein ratio (an edit-distance ratio)
    • Name each merged node with the longest common subsequence of its members' titles. For example, the LCS of the three nodes "if statements", "Using if statements to process lists", and "Formatting if statements" is "if statements", which becomes the value of the merged node.
  • Remove useless nodes
    • Use tree pruning plus a dictionary of useless titles to remove nodes such as "Chapter summary", "Further reading", and "Project" from the skill tree
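The similarity, clustering, and naming steps above can be sketched as follows. This is an illustrative, dependency-free sketch, not the team's code: the post does not say how the two ratios are defined or combined, so here the LCS ratio divides by the shorter title, the Levenshtein ratio is `1 - distance / longer length`, the two are averaged with a hypothetical weight `w = 0.5`, and the `0.6` threshold is arbitrary. The unnamed "heuristic clustering" is stood in for by a simple single-pass, threshold-based scheme, which likewise needs no cluster count in advance.

```python
from typing import List


def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forwards to recover the subsequence itself
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str, w: float = 0.5) -> float:
    """Hypothetical blend: w * LCS ratio + (1 - w) * Levenshtein ratio."""
    if not a or not b:
        return 0.0
    lcs_ratio = len(lcs(a, b)) / min(len(a), len(b))
    lev_ratio = 1 - levenshtein(a, b) / max(len(a), len(b))
    return w * lcs_ratio + (1 - w) * lev_ratio


def cluster(titles: List[str], threshold: float = 0.6) -> List[List[str]]:
    """Single-pass clustering: join the best cluster if similar enough, else start one."""
    clusters: List[List[str]] = []
    for t in titles:
        # Similarity to a cluster = best match against any of its members
        scores = [max(similarity(t, m) for m in c) for c in clusters]
        if scores and max(scores) >= threshold:
            clusters[scores.index(max(scores))].append(t)
        else:
            clusters.append([t])
    return clusters


def merged_title(titles: List[str]) -> str:
    """Name a merged node by folding pairwise LCS over its members' titles.

    Folding yields a subsequence common to all titles, but not necessarily
    the longest one (the exact multi-string LCS is much more expensive).
    """
    common = titles[0]
    for t in titles[1:]:
        common = lcs(common, t)
    return " ".join(common.split())  # collapse stray spaces left by the LCS
```

For example, `cluster(["if statements", "Using if statements to process lists", "Formatting if statements", "Dictionaries"])` puts the three if-titles in one cluster and "Dictionaries" in its own, and `merged_title` of the first cluster returns "if statements". Note that `Levenshtein.ratio` in the python-Levenshtein package normalizes differently (it is based on an insertion/deletion-only distance), so its scores will not match this sketch.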

The merged skill tree is shown in the figure below :

[Figure: Example 5]
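The recursive, layer-by-layer merge and the dictionary-based pruning can be sketched together as follows. This is again an illustration under assumptions: trees are plain `Node` objects rather than treelib trees, `USELESS_TITLES` is a hypothetical pruning dictionary (the post does not publish its list), `difflib.SequenceMatcher.ratio` with a 0.6 threshold stands in for the LCS + Levenshtein similarity, and a merged node simply takes its shortest member title instead of the LCS-based name.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List

# Hypothetical dictionary of useless titles; matching nodes are pruned with their subtrees
USELESS_TITLES = {"chapter summary", "further reading", "project"}


@dataclass
class Node:
    title: str
    children: List["Node"] = field(default_factory=list)


def similar(a: str, b: str, threshold: float = 0.6) -> bool:
    # Stand-in similarity; the post uses an LCS ratio + Levenshtein ratio blend
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_level(nodes: List[Node]) -> List[Node]:
    """Merge one layer of sibling nodes gathered from several trees, then recurse."""
    groups: List[List[Node]] = []
    for node in nodes:
        if node.title.strip().lower() in USELESS_TITLES:
            continue  # prune the node together with its whole subtree
        for g in groups:
            if similar(node.title, g[0].title):  # compare with the group's first node
                g.append(node)
                break
        else:
            groups.append([node])
    merged = []
    for g in groups:
        title = min((n.title for n in g), key=len)  # simplification of LCS naming
        kids = [c for n in g for c in n.children]  # pool the next layer of all members
        merged.append(Node(title, merge_level(kids)))
    return merged


def merge_trees(roots: List[Node], root_title: str = "Python") -> Node:
    """Merge several skill trees layer by layer, starting below their roots."""
    return Node(root_title, merge_level([c for r in roots for c in r.children]))
```

Because `merge_level` pools the children of every node in a group before recursing, subtrees that hang under different but similar parents are compared with one another at the next layer.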

2.4 Matching questions to the skill tree

Once the skill tree is built, all Python questions with accepted answers are mapped to their corresponding nodes. For a new question, the most similar node in the tree is found, and the accepted questions on that node are recommended.

The matching algorithm used in this post is the Levenshtein ratio (a normalized edit-distance similarity): the ratio is computed between the question and each node, and the node with the highest ratio is taken as the best match.
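A minimal sketch of this matching step follows. It assumes the Levenshtein ratio is defined as `1 - distance / length of the longer string` (libraries normalize differently), that comparison is case-insensitive, and that node titles and accepted questions are plain strings; `best_node`, `index_questions`, and `recommend` are hypothetical helpers, and `k = 3` recommendations is an arbitrary choice.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def lev_ratio(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def best_node(question: str, node_titles: List[str]) -> Tuple[str, float]:
    """Return (title, score) of the node most similar to the question."""
    q = question.lower()
    return max(((t, lev_ratio(q, t.lower())) for t in node_titles),
               key=lambda pair: pair[1])


def index_questions(questions: List[str], node_titles: List[str]) -> Dict[str, List[str]]:
    """Attach each previously accepted question to its best-matching node."""
    index: Dict[str, List[str]] = {t: [] for t in node_titles}
    for q in questions:
        index[best_node(q, node_titles)[0]].append(q)
    return index


def recommend(question: str, index: Dict[str, List[str]], k: int = 3) -> List[str]:
    """Recommend up to k accepted questions stored on the best-matching node."""
    node, _ = best_node(question, list(index))
    return index[node][:k]
```

With nodes "if statements", "Dictionaries", and "Data visualization", the question "Nested if statements" scores 0.65 against "if statements" and is attached there. Because a long question is penalized by its extra length, this sketch also shows why edit distance struggles when questions and node titles are phrased so differently.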

3 Summary and next steps


This post has mainly implemented the construction and merging of a programming-language skill tree, and the matching of questions to nodes in that tree. Only preliminary functionality is in place so far, and the results need further optimization. The main current problems are:

  • Irrelevant nodes are not removed thoroughly enough
  • The similarity measure used in clustering, and naming a merged node by the longest common subsequence of its members, are both flawed; for example, "Running Python versions" and "Python code snippets" are put into the same cluster and merged into "Python"
  • Questions and skill-tree nodes are written in very different styles (one is a question, the other a knowledge point), so measuring their similarity with the Levenshtein ratio is not appropriate
  • ……

Next steps

To address these problems, we plan to:

  • Further improve the quality of the merged skill tree
  • Improve the matching between questions and the tree
