[Knowledge Graph] Practice: a question answering system based on a medical knowledge graph (Part 3): rule-based question classification
2022-07-25 16:52:00 【Coriander Chrysanthemum】
Previous articles in this series:
- [Knowledge Graph] Practice: a question answering system based on a medical knowledge graph (Part 1): project introduction and environment preparation
- [Knowledge Graph] Practice: a question answering system based on a medical knowledge graph (Part 2): graph data preparation and import
Background
After the previous chapters, we can assume that a knowledge base capable of answering medical questions already exists. In a pipeline-style Q&A task, the first step after receiving a question is usually to classify it, so that each category can be processed and answered in its own way. This question classification is also commonly called intent recognition. In essence, both are text classification, and traditional machine learning or deep learning algorithms can handle it. When no annotated corpus is available, however, rules are probably the most practical option, and that is the approach the original project takes.
Rule-based question classification
The knowledge graph data storage module provides a function for exporting entity data. In addition, the original source code supplies a list of negation words, deny.txt, which I also placed in the dict folder. Together, these files are the feature words that rule-based classification relies on. The main purpose of question classification is to give the next stage, question parsing, the category it needs, in preparation for building the retrieval query.
Now let's design the class for question classification, in KGQAMedicine\question_classify\rule_question_classify.py.
To quickly check whether a question contains words from the feature dictionaries, we use the ahocorasick package. Install it with: pip install pyahocorasick
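To make clear what the automaton does, here is a minimal pure-Python sketch of Aho-Corasick multi-pattern matching. It is an illustration under stated assumptions, not pyahocorasick's implementation: the function names `build_automaton` and `find_all` and the English sample words are hypothetical.

```python
from collections import deque


def build_automaton(words):
    """Build a trie with failure links (Aho-Corasick) for the given words."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link of each state
    output = [[]]    # words recognised when a state is reached
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)

    # Breadth-first pass: compute failure links and inherit their outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(automaton, text):
    """Return (end_index, word) for every occurrence of a word in text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for index, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((index, word))
    return matches


automaton = build_automaton(["cold", "common cold", "honey"])
matches = find_all(automaton, "is honey good for a common cold")
print(matches)  # [(7, 'honey'), (30, 'common cold'), (30, 'cold')]
```

The text is scanned once, however many dictionary words there are, which is why this approach suits a large entity dictionary. Note that both "common cold" and its substring "cold" are reported, so overlapping matches still have to be filtered afterwards.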
The first step of question classification is to determine whether the question mentions any entity that exists in the graph database. If it does not, no query can be built and the question cannot be answered.
Rule-based classification mainly relies on keyword matching. The following question types are supported:
| Question type | Meaning | Example question |
|---|---|---|
| disease_symptom | Symptoms of a disease | What are the symptoms of breast cancer? |
| symptom_disease | Possible diseases for a known symptom | I keep getting a runny nose lately; what should I do? |
| disease_cause | Cause of a disease | Why do some people have insomnia? |
| disease_accompany | Complications of a disease | What are the complications of insomnia? |
| disease_not_food | Foods to avoid with a disease | What should people with insomnia not eat? |
| disease_do_food | Foods recommended for a disease | What should I eat for tinnitus? |
| food_not_disease | Diseases for which a food should be avoided | Who had better not eat honey? |
| food_do_disease | Diseases for which a food is beneficial | What are the benefits of eating goose? |
| disease_drug | Drugs to take for a disease | What medicine should I take for liver disease? |
| drug_disease | Diseases a drug can treat | What disease can Banlangen Granule treat? |
| disease_check | Examinations needed for a disease | How can meningitis be detected? |
| check_disease | Diseases an examination can detect | What can a complete blood count find? |
| disease_prevent | Preventive measures | How can we prevent kidney deficiency? |
| disease_lasttime | Treatment duration | How long does it take for a cold to get better? |
| disease_cureway | Treatment methods | How is hypertension treated? |
| disease_cureprob | Probability of cure | Can leukaemia be cured? |
| disease_easyget | People susceptible to a disease | Who is prone to high blood pressure? |
| disease_desc | Disease description | Diabetes |
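The idea behind the table can be sketched as a small rule list: a question type fires when one of its trigger keywords appears in the question and an entity of the required kind was found. The sketch below is a toy under stated assumptions; the `RULES`, `FOOD_WORDS` and `DENY_WORDS` lists and the `classify` function are illustrative English stand-ins, not the project's dictionaries.

```python
# Toy rules: (question type, trigger keywords, required entity kind).
RULES = [
    ("disease_symptom", ["symptom", "sign"], "disease"),
    ("disease_cause", ["why", "cause"], "disease"),
    ("disease_drug", ["medicine", "drug"], "disease"),
    ("drug_disease", ["treat", "used for"], "drug"),
]
FOOD_WORDS = ["eat", "food", "diet"]
DENY_WORDS = ["not", "avoid"]  # negation words, the role deny.txt plays


def classify(question, entity_kinds):
    """Return every question type whose keyword and entity kind both match."""
    text = question.lower()
    kinds = []
    for question_type, keywords, entity_kind in RULES:
        # Plain substring matching, as in keyword-based rules.
        if entity_kind in entity_kinds and any(kw in text for kw in keywords):
            kinds.append(question_type)
    # Food questions split on negation: "should not eat" vs "should eat".
    if "disease" in entity_kinds and any(kw in text for kw in FOOD_WORDS):
        negated = any(kw in text for kw in DENY_WORDS)
        kinds.append("disease_not_food" if negated else "disease_do_food")
    # Fallback: a bare disease name asks for its description.
    if not kinds and "disease" in entity_kinds:
        kinds.append("disease_desc")
    return kinds


print(classify("What are the symptoms of breast cancer?", {"disease"}))
# ['disease_symptom']
print(classify("What should people with insomnia not eat?", {"disease"}))
# ['disease_not_food']
print(classify("diabetes", {"disease"}))
# ['disease_desc']
```

Because the match is a plain substring test, short keywords can fire inside unrelated words, so keyword lists for this kind of classifier need to be chosen with care.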
The implementation is as follows:
import os
import ahocorasick
import tqdm
from utils.config import SysConfig
class RuleQuestionClassifier(object):
    # Feature words loaded from the dictionary files, one list per entity kind
    disease_feature_words = []
    department_feature_words = []
    check_feature_words = []
    drug_feature_words = []
    food_feature_words = []
    producer_feature_words = []
    symptom_feature_words = []
    region_feature_words = set()
    deny_feature_words = []
    # Question keywords (interrogative words) for each question type
    symptom_qwds = [' symptoms ', ' characterization ', ' The phenomenon ', ' Symptoms ', ' performance ']
    cause_qwds = [' reason ', ' origin ', ' Why? ', ' How could ', ' How ', ' How can I ', ' How ', ' How can ', ' Why? ', ' why ', ' How can I ', ' How can you ', ' It can lead to ', ' Can cause ']
    acompany_qwds = [' complications ', ' Concurrent ', ' Together ', ' Together ', ' Come together ', ' Together ', ' Together ', ' Come together ', ' Accompanied by ', ' Along with ', ' Altogether ']
    food_qwds = [' diet ', ' Drinking ', ' eat ', ' food ', ' food ', ' Diet ', ' drink ', ' food ', ' Avoid food ', ' Supplements ', ' Health care products ', ' The recipe ', ' menu ', ' eating ', ' food ', ' Supplements ']
    drug_qwds = [' drug ', ' drug ', ' Medication ', ' capsule ', ' Oral liquid ', ' Inflammatory tablets ']
    prevent_qwds = [' The prevention of ', ' To guard against ', ' boycott ', ' To resist ', ' prevent ', ' avoid ', ' escape ', ' To avoid the ', ' lest ', ' Escape ', ' To avoid the ', ' Avoid ', ' Get out of the way ', ' Hide ', ' bypass ',
                    ' How can we not ', ' How can we not ', ' How can we not ', ' How can I not ', ' How can we not ', ' How not to ', ' Why not ', ' Why not ', ' Why not ',
                    ' How not ', ' How can we not ', ' How can I not ', ' How can I not ', ' How can I not ', ' How can I not ', ' How can we not ', ' Why not ',
                    ' How can we not ', ' Why not ', ' How not ']
    lasttime_qwds = [' cycle ', ' How long? ', ' How long ', ' How much time ', ' A few days ', ' A few years ', ' How many days? ', ' How many hours ', ' A few hours ', ' How many years? ']
    cureway_qwds = [' How to treat ', ' How to treat ', ' How to treat ', ' How to treat ', ' How to cure ', ' How to cure ', ' Treatment ', ' therapy ', ' How to cure ', ' What do I do ', ' To do ', ' How to cure ']
    cureprob_qwds = [' How likely is it to be cured ', ' How likely is it to be cured ', ' Is there much hope for cure ', ' probability ', ' Almost ', ' The proportion ', ' possibility ', ' Able to cure ', ' Curable ', ' Can cure ', ' Can cure ']
    easyget_qwds = [' Susceptible people ', ' Susceptible to infection ', ' Vulnerable people ', ' Who ', ' Who ', ' infection ', ' Get it ', ' It's up to ']
    check_qwds = [' Check ', ' Inspection items ', ' Find out ', ' Check ', ' measure ', ' Try it out ']
    belong_qwds = [' What department does it belong to ', ' Belong to ', ' What subject ', ' department ']
    cure_qwds = [' Treat what ', ' Cure what ', ' Treat what ', ' Cure what ', ' Cure what ', ' What is the main treatment ', ' What is the main treatment ', ' What's the usage? ', ' What's the use ', ' use ', ' purpose ',
                 ' What are the benefits ', ' What are the benefits ', ' What are the benefits ', ' be used for ', ' What to do ', ' Used for what ', ' need ', ' want ']

    def __init__(self):
        self.region_actree = None
        self.word_kind_dict = None
        self._init()

    @staticmethod
    def _load_line_file(file_path):
        # Read a dictionary file and return its non-empty lines, stripped
        print(f"load file {file_path}")
        data_list = []
        with open(file_path, 'r', encoding='utf8') as reader:
            for line in reader:
                if not line.strip():
                    continue
                data_list.append(line.strip())
        return data_list
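
    def _init(self):
        # load data: each file fills the matching "<kind>_feature_words" attribute
        file_list = ["disease", "department", "check", "drug", "food", "producer", "symptoms", "deny"]
        for file_name in file_list:
            data_list = self._load_line_file(os.path.join(SysConfig.DATA_DICT_DIR, file_name + ".txt"))
            # symptoms.txt must fill symptom_feature_words, the name the rest of the class reads
            attr_name = "symptom" if file_name == "symptoms" else file_name
            setattr(self, attr_name + "_feature_words", data_list)
            # negation words are question features, not graph entities, so keep them out of the automaton
            if file_name != "deny":
                self.region_feature_words.update(data_list)
        # build actree
        self.region_actree = self._get_actree(list(self.region_feature_words))
        # build word kind dict
        self._build_word_kind_dict()
        print("object init over")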

    def _build_word_kind_dict(self):
        # Map every entity word to the list of entity kinds it belongs to
        word_kind_dict = {}
        for word in tqdm.tqdm(self.region_feature_words, desc='building word kind dict'):
            word_kind_dict.setdefault(word, [])
            if word in self.disease_feature_words:
                word_kind_dict[word].append("disease")
            if word in self.department_feature_words:
                word_kind_dict[word].append("department")
            if word in self.check_feature_words:
                word_kind_dict[word].append("check")
            if word in self.drug_feature_words:
                word_kind_dict[word].append("drug")
            if word in self.food_feature_words:
                word_kind_dict[word].append("food")
            if word in self.symptom_feature_words:
                word_kind_dict[word].append("symptom")
            if word in self.producer_feature_words:
                word_kind_dict[word].append("producer")
        self.word_kind_dict = word_kind_dict

    @staticmethod
    def _get_actree(key_list):
        # Build an Aho-Corasick automaton over all entity words
        actree = ahocorasick.Automaton()
        for index, word in enumerate(key_list):
            actree.add_word(word, (index, word))
        actree.make_automaton()
        return actree

    def classify(self, question):
        classify_res = {}
        medical_dict = self.check_query(question)
        if not medical_dict:
            return {}
        classify_res['args'] = medical_dict
        region_word_kinds = []
        for kinds in medical_dict.values():
            region_word_kinds.extend(kinds)
        question_kinds = []
        # disease symptom
        self.sub_classify(self.symptom_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_symptom")
        # symptom disease
        self.sub_classify(self.symptom_qwds, question, 'symptom', region_word_kinds, question_kinds, "symptom_disease")
        # disease cause
        self.sub_classify(self.cause_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_cause")
        # disease accompany
        self.sub_classify(self.acompany_qwds, question, 'disease', region_word_kinds, question_kinds,
                          "disease_accompany")
        # disease food
        if self.check_words(self.food_qwds, question) and 'disease' in region_word_kinds:
            deny_status = self.check_words(self.deny_feature_words, question)
            if deny_status:
                question_kind = "disease_not_food"
            else:
                question_kind = "disease_do_food"
            question_kinds.append(question_kind)
        # food disease
        if self.check_words(self.food_qwds + self.cure_qwds, question) and 'food' in region_word_kinds:
            deny_status = self.check_words(self.deny_feature_words, question)
            if deny_status:
                question_kind = 'food_not_disease'
            else:
                question_kind = 'food_do_disease'
            question_kinds.append(question_kind)
        # disease_drug
        self.sub_classify(self.drug_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_drug")
        # drug disease
        self.sub_classify(self.cure_qwds, question, 'drug', region_word_kinds, question_kinds, "drug_disease")
        # disease check
        self.sub_classify(self.check_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_check")
        # check disease
        self.sub_classify(self.check_qwds + self.cure_qwds, question, 'check', region_word_kinds, question_kinds,
                          "check_disease")
        # disease prevent
        self.sub_classify(self.prevent_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_prevent")
        # disease last time
        self.sub_classify(self.lasttime_qwds, question, 'disease', region_word_kinds, question_kinds,
                          "disease_lasttime")
        # disease cure way
        self.sub_classify(self.cureway_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_cureway")
        # disease cure prob
        self.sub_classify(self.cureprob_qwds, question, 'disease', region_word_kinds, question_kinds,
                          "disease_cureprob")
        # disease easy get
        self.sub_classify(self.easyget_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_easyget")
        # others deal
        if question_kinds == [] and 'disease' in region_word_kinds:
            question_kinds.append('disease_desc')
        if question_kinds == [] and 'symptom' in region_word_kinds:
            question_kinds.append('symptom_disease')
        classify_res['question_kinds'] = question_kinds
        return classify_res

    def sub_classify(self, kind_qkwds, question, key, region_word_kinds, question_kinds, kind_type):
        # Add kind_type when a question keyword matches and an entity of kind `key` is present
        if self.check_words(kind_qkwds, question) and (key in region_word_kinds):
            question_kinds.append(kind_type)

    @staticmethod
    def check_words(kws, question):
        # True if any keyword occurs in the question (substring match)
        for kw in kws:
            if kw in question:
                return True
        return False

    def check_query(self, question):
        # Collect every dictionary word that occurs in the question
        region_feature_words = []
        for _end_index, (_word_index, feature_word) in self.region_actree.iter(question):
            region_feature_words.append(feature_word)
        # Drop words contained in a longer match; every pair is compared,
        # so the result does not depend on the order in which matches are reported
        inner_words = []
        for wi in region_feature_words:
            for wj in region_feature_words:
                if wi in wj and wi != wj:
                    inner_words.append(wi)
        final_dict = {
            word: self.word_kind_dict.get(word) for word in
            filter(lambda x: x not in inner_words, region_feature_words)}
        return final_dict
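The substring filtering at the end of check_query can be looked at in isolation. The following is a minimal sketch, assuming the intent is to keep only the longest overlapping entity; the helper name `drop_inner_words` and the sample words are hypothetical, and the sketch compares each match against all others.

```python
def drop_inner_words(matched_words):
    """Keep only the matches that are not contained in a longer match."""
    unique_words = list(dict.fromkeys(matched_words))  # de-duplicate, keep order
    return [
        word for word in unique_words
        if not any(word != other and word in other for other in unique_words)
    ]


print(drop_inner_words(["common cold", "cold", "honey"]))
# ['common cold', 'honey']
```

Without this step, a question that mentions one long entity name would also be treated as mentioning every shorter dictionary word inside it, and the later query stage would receive spurious arguments.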
Testing the result:
The results are basically in line with expectations. Of course, we could also use entity recognition to identify the target entities and a deep-learning model to classify the questions, which would improve the generalization and recall of question classification. Optimizing with deep learning, however, requires a large amount of annotated data.