[Knowledge Graph] In Practice: a question-answering system built on a medical knowledge graph (Part 3): rule-based question classification
2022-07-25 16:38:00 【科皮子菊】
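Background
Building on the previous parts, we can assume that a knowledge base capable of answering medical questions is now in place. In a pipeline-style question-answering task, the first step after receiving a question is usually to classify it, so that it can be handled and answered in a more fine-grained way. This question classification is also commonly called intent recognition. Intent recognition, or question classification, is essentially text classification, so it can be tackled with traditional machine learning algorithms or with deep learning. When labeled data is lacking, however, rules may be the best option, and that is the approach the original project takes.
Rule-based question classification
The module that loads the knowledge graph data into the database also provides an entity export function, and the exported data is simply lists of entities. In addition, the original source code ships a file of negation words, deny.txt, which I have also placed in the dict folder. Together these make up the feature words used for rule-based classification. The purpose of question classification is mainly to prepare for the later steps: parsing the question according to its category, and querying for the answer.
Now let's design the question classification class, in KGQAMedicine\question_classify\rule_question_classify.py.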
To quickly check whether a question contains any word from the feature-word dictionaries, we bring in the ahocorasick package. Install it with: pip install pyahocorasick
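To make the role of the automaton concrete, here is a minimal pure-Python sketch of the Aho-Corasick idea: every dictionary word goes into one trie, failure links are added, and the question is then scanned once while reporting each dictionary word it contains. `TinyAutomaton` is a hypothetical teaching class written for this illustration, not the pyahocorasick implementation; it only mimics the add_word / make_automaton / iter call sequence used in this article, and the sample words are made up.

```python
from collections import deque


class TinyAutomaton:
    """Hypothetical, simplified Aho-Corasick matcher for illustration only."""

    def __init__(self):
        self.goto = [{}]   # goto[state] maps a character to the next state
        self.fail = [0]    # fail[state] is the fallback state on a mismatch
        self.out = [[]]    # out[state] lists the words that end at this state

    def add_word(self, word):
        state = 0
        for ch in word:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(word)

    def make_automaton(self):
        # Breadth-first traversal: shallower states are finished before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit the words that end at the fallback state (suffix matches).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def iter(self, text):
        """Yield (end_index, word) for every dictionary word found in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for word in self.out[state]:
                yield i, word


automaton = TinyAutomaton()
for word in ["感冒", "感冒灵颗粒", "颗粒", "失眠"]:
    automaton.add_word(word)
automaton.make_automaton()

matches = list(automaton.iter("感冒灵颗粒能治失眠吗"))
print(matches)
```

The text is read once regardless of how many words the dictionary holds, which is why this kind of matcher suits dictionaries with many thousands of entity names. Note that overlapping words are all reported, so nested matches still have to be filtered afterwards.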
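The first step of question classification is to check whether the question mentions any entity stored in the graph database. If it does not, no query can be made to answer it.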
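As a rough sketch of this first step, the snippet below looks up which known entity words occur in a question and which entity kinds each word belongs to. The function names `build_word_kinds` and `find_entities` and the miniature dictionaries are assumptions made for illustration, and a plain substring scan stands in for the automaton.

```python
def build_word_kinds(dictionaries):
    """Invert {kind: [words]} into {word: [kinds]}."""
    word_kinds = {}
    for kind, words in dictionaries.items():
        for word in words:
            word_kinds.setdefault(word, []).append(kind)
    return word_kinds


def find_entities(question, word_kinds):
    """Return the known entity words that occur in the question, with their kinds."""
    return {word: kinds for word, kinds in word_kinds.items() if word in question}


# Hypothetical miniature dictionaries; the real ones come from the exported entity files.
dictionaries = {
    "disease": ["失眠", "感冒"],
    "symptom": ["失眠"],
    "food": ["蜂蜜"],
}
word_kinds = build_word_kinds(dictionaries)

# A word can belong to several kinds at once (here 失眠 is both a disease and a symptom).
print(find_entities("失眠有哪些并发症?", word_kinds))
# No known entity in the question: the result is empty and nothing can be queried.
print(find_entities("今天天气怎么样?", word_kinds))
```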
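The rule-based approach relies mainly on keyword matching. The following question types are supported:
| Question type | Meaning | Example question |
|---|---|---|
| disease_symptom | Symptoms of a disease | 乳腺癌的症状有哪些? (What are the symptoms of breast cancer?) |
| symptom_disease | Possible diseases for a given symptom | 最近老流鼻涕怎么办? (My nose keeps running lately, what should I do?) |
| disease_cause | Causes of a disease | 为什么有的人会失眠? (Why do some people suffer from insomnia?) |
| disease_accompany | Complications of a disease | 失眠有哪些并发症? (What complications does insomnia have?) |
| disease_not_food | Foods to avoid with a disease | 失眠的人不要吃啥? (What should people with insomnia not eat?) |
| disease_do_food | Foods recommended for a disease | 耳鸣了吃点啥? (What should I eat for tinnitus?) |
| food_not_disease | Diseases for which a food is best avoided | 哪些人最好不要吃蜂蜜? (Who had better not eat honey?) |
| food_do_disease | Diseases a food is good for | 鹅肉有什么好处? (What are the benefits of goose meat?) |
| disease_drug | Drugs to take for a disease | 肝病要吃啥药? (What medicine should be taken for liver disease?) |
| drug_disease | Diseases a drug can treat | 板蓝根颗粒能治啥病? (What can Banlangen granules treat?) |
| disease_check | Examinations needed for a disease | 脑膜炎怎么才能查出来? (How can meningitis be detected?) |
| check_disease | Diseases an examination can detect | 全血细胞计数能查出啥来? (What can a complete blood count detect?) |
| disease_prevent | Preventive measures | 怎样才能预防肾虚? (How can kidney deficiency be prevented?) |
| disease_lasttime | Treatment duration | 感冒要多久才能好? (How long does a cold take to clear up?) |
| disease_cureway | Treatment methods | 高血压要怎么治? (How is high blood pressure treated?) |
| disease_cureprob | Probability of cure | 白血病能治好吗? (Can leukemia be cured?) |
| disease_easyget | Susceptible population | 什么人容易得高血压? (Who is prone to high blood pressure?) |
| disease_desc | Disease description | 糖尿病 (Diabetes) |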
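The keyword-matching rule behind this table can be sketched in a few lines: a question type is assigned when the question contains one of that type's keywords and an entity of the required kind was found, and a negation word switches "what to eat" into "what to avoid". The sketch below covers only three of the types; `classify_question` is a hypothetical helper, the keyword lists are small subsets of the lists in this article, and the negation words are assumed because the contents of deny.txt are not shown here.

```python
# Small subsets of the keyword lists used in this article.
SYMPTOM_QWDS = ['症状', '表现']
FOOD_QWDS = ['吃', '喝', '饮食', '忌口']
# Assumed negation words; the actual contents of deny.txt are not shown in the article.
DENY_WORDS = ['不', '别', '忌']


def contains_any(keywords, question):
    """True if at least one keyword occurs in the question."""
    return any(kw in question for kw in keywords)


def classify_question(question, entity_kinds):
    """Return the question types triggered by the keywords and entity kinds."""
    kinds = []
    if contains_any(SYMPTOM_QWDS, question) and 'disease' in entity_kinds:
        kinds.append('disease_symptom')
    if contains_any(FOOD_QWDS, question) and 'disease' in entity_kinds:
        # A negation word turns "what to eat" into "what not to eat".
        if contains_any(DENY_WORDS, question):
            kinds.append('disease_not_food')
        else:
            kinds.append('disease_do_food')
    # Fallback: a disease was mentioned but no rule fired, so describe the disease.
    if not kinds and 'disease' in entity_kinds:
        kinds.append('disease_desc')
    return kinds


print(classify_question("失眠的人不要吃啥?", ['disease']))
print(classify_question("耳鸣了吃点啥?", ['disease']))
```

Because each rule is checked independently, one question can receive several types at once, which matches how the full classifier collects its results in a list.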
The implementation is as follows:
import os

import ahocorasick
import tqdm

from utils.config import SysConfig


class RuleQuestionClassifier(object):
    # Feature words loaded from the dictionary files, one list per entity kind
    disease_feature_words = []
    department_feature_words = []
    check_feature_words = []
    drug_feature_words = []
    food_feature_words = []
    producer_feature_words = []
    symptom_feature_words = []
    region_feature_words = set()
    deny_feature_words = []
    # Question keywords for each question type
    symptom_qwds = ['症状', '表征', '现象', '症候', '表现']
    cause_qwds = ['原因', '成因', '为什么', '怎么会', '怎样才', '咋样才', '怎样会', '如何会', '为啥', '为何', '如何才会', '怎么才会', '会导致', '会造成']
    acompany_qwds = ['并发症', '并发', '一起发生', '一并发生', '一起出现', '一并出现', '一同发生', '一同出现', '伴随发生', '伴随', '共现']
    food_qwds = ['饮食', '饮用', '吃', '食', '伙食', '膳食', '喝', '菜', '忌口', '补品', '保健品', '食谱', '菜谱', '食用', '食物', '补品']
    drug_qwds = ['药', '药品', '用药', '胶囊', '口服液', '炎片']
    prevent_qwds = ['预防', '防范', '抵制', '抵御', '防止', '躲避', '逃避', '避开', '免得', '逃开', '避开', '避掉', '躲开', '躲掉', '绕开',
                    '怎样才能不', '怎么才能不', '咋样才能不', '咋才能不', '如何才能不', '怎样才不', '怎么才不', '咋样才不', '咋才不',
                    '如何才不', '怎样才可以不', '怎么才可以不', '咋样才可以不', '咋才可以不', '如何可以不', '怎样才可不', '怎么才可不',
                    '咋样才可不', '咋才可不', '如何可不']
    lasttime_qwds = ['周期', '多久', '多长时间', '多少时间', '几天', '几年', '多少天', '多少小时', '几个小时', '多少年']
    cureway_qwds = ['怎么治疗', '如何医治', '怎么医治', '怎么治', '怎么医', '如何治', '医治方式', '疗法', '咋治', '怎么办', '咋办', '咋治']
    cureprob_qwds = ['多大概率能治好', '多大几率能治好', '治好希望大么', '几率', '几成', '比例', '可能性', '能治', '可治', '可以治', '可以医']
    easyget_qwds = ['易感人群', '容易感染', '易发人群', '什么人', '哪些人', '感染', '染上', '得上']
    check_qwds = ['检查', '检查项目', '查出', '检查', '测出', '试出']
    belong_qwds = ['属于什么科', '属于', '什么科', '科室']
    cure_qwds = ['治疗什么', '治啥', '治疗啥', '医治啥', '治愈啥', '主治啥', '主治什么', '有什么用', '有何用', '用处', '用途',
                 '有什么好处', '有什么益处', '有何益处', '用来', '用来做啥', '用来作甚', '需要', '要']

    def __init__(self):
        self.region_actree = None
        self.word_kind_dict = None
        self._init()

    @staticmethod
    def _load_line_file(file_path):
        """Read a dictionary file and return its non-empty lines, stripped."""
        print(f"load file {file_path}")
        data_list = []
        with open(file_path, 'r', encoding='utf8') as reader:
            for line in reader:
                if not line.strip():
                    continue
                data_list.append(line.strip())
        return data_list
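
    def _init(self):
        # load data
        file_list = ["disease", "department", "check", "drug", "food", "producer", "symptoms", "deny"]
        region_feature_words = set()
        for file_name in file_list:
            data_list = self._load_line_file(os.path.join(SysConfig.DATA_DICT_DIR, file_name + ".txt"))
            # symptoms.txt must fill symptom_feature_words, the attribute that
            # _build_word_kind_dict reads
            kind = "symptom" if file_name == "symptoms" else file_name
            setattr(self, kind + "_feature_words", data_list)
            # negation words are not graph entities, so keep them out of the entity set
            if kind != "deny":
                region_feature_words.update(data_list)
        # store the set on the instance instead of mutating the shared class attribute
        self.region_feature_words = region_feature_words
        # build actree
        self.region_actree = self._get_actree(list(self.region_feature_words))
        # build word kind dict
        self._build_word_kind_dict()
        print("object init over")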

    def _build_word_kind_dict(self):
        """Map every entity word to the list of entity kinds it belongs to."""
        kinds = ["disease", "department", "check", "drug", "food", "symptom", "producer"]
        # sets make each membership test O(1) instead of scanning a list
        kind_words = {kind: set(getattr(self, kind + "_feature_words")) for kind in kinds}
        word_kind_dict = {}
        for word in tqdm.tqdm(self.region_feature_words, desc='building word kind dict'):
            word_kind_dict[word] = [kind for kind in kinds if word in kind_words[kind]]
        self.word_kind_dict = word_kind_dict

    @staticmethod
    def _get_actree(key_list):
        """Build an Aho-Corasick automaton over all entity words."""
        actree = ahocorasick.Automaton()
        for index, word in enumerate(key_list):
            actree.add_word(word, (index, word))
        actree.make_automaton()
        return actree

    def classify(self, question):
        classify_res = {}
        medical_dict = self.check_query(question)
        # no known entity in the question: nothing can be queried
        if not medical_dict:
            return {}
        classify_res['args'] = medical_dict
        region_word_kinds = []
        for kinds in medical_dict.values():
            region_word_kinds.extend(kinds)
        question_kinds = []
        # disease symptom
        self.sub_classify(self.symptom_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_symptom")
        # symptom disease
        self.sub_classify(self.symptom_qwds, question, 'symptom', region_word_kinds, question_kinds, "symptom_disease")
        # disease cause
        self.sub_classify(self.cause_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_cause")
        # disease accompany
        self.sub_classify(self.acompany_qwds, question, 'disease', region_word_kinds, question_kinds,
                          "disease_accompany")
        # disease food
        if self.check_words(self.food_qwds, question) and 'disease' in region_word_kinds:
            deny_status = self.check_words(self.deny_feature_words, question)
            if deny_status:
                question_kind = "disease_not_food"
            else:
                question_kind = "disease_do_food"
            question_kinds.append(question_kind)
        # food disease
        if self.check_words(self.food_qwds + self.cure_qwds, question) and 'food' in region_word_kinds:
            deny_status = self.check_words(self.deny_feature_words, question)
            if deny_status:
                question_kind = 'food_not_disease'
            else:
                question_kind = 'food_do_disease'
            question_kinds.append(question_kind)
        # disease_drug
        self.sub_classify(self.drug_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_drug")
        # drug disease
        self.sub_classify(self.cure_qwds, question, 'drug', region_word_kinds, question_kinds, "drug_disease")
        # disease check
        self.sub_classify(self.check_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_check")
        # check disease
        self.sub_classify(self.check_qwds + self.cure_qwds, question, 'check', region_word_kinds, question_kinds,
                          "check_disease")
        # disease prevent
        self.sub_classify(self.prevent_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_prevent")
        # disease last time
        self.sub_classify(self.lasttime_qwds, question, 'disease', region_word_kinds, question_kinds,
                          "disease_lasttime")
        # disease cure way
        self.sub_classify(self.cureway_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_cureway")
        # disease cure prob
        self.sub_classify(self.cureprob_qwds, question, 'disease', region_word_kinds, question_kinds,
                          "disease_cureprob")
        # disease easy get
        self.sub_classify(self.easyget_qwds, question, 'disease', region_word_kinds, question_kinds, "disease_easyget")
        # others deal: fall back to a default type when no rule fired
        if not question_kinds and 'disease' in region_word_kinds:
            question_kinds.append('disease_desc')
        if not question_kinds and 'symptom' in region_word_kinds:
            question_kinds.append('symptom_disease')
        classify_res['question_kinds'] = question_kinds
        return classify_res

    def sub_classify(self, kind_qkwds, question, key, region_word_kinds, question_kinds, kind_type):
        # add kind_type when a keyword matches and an entity of kind `key` was found
        if self.check_words(kind_qkwds, question) and (key in region_word_kinds):
            question_kinds.append(kind_type)

    @staticmethod
    def check_words(kws, question):
        """Return True if any keyword occurs in the question."""
        return any(kw in question for kw in kws)

    def check_query(self, question):
        """Return {entity word: [kinds]} for the entity words found in the question."""
        # iter() yields (end_index, value); value is the (index, word) pair stored by add_word
        region_feature_words = [value[1] for _, value in self.region_actree.iter(question)]
        # drop words contained in a longer matched word; every pair is compared,
        # because the shorter word may be reported after the longer one
        inner_words = {wi for wi in region_feature_words
                       for wj in region_feature_words if wi in wj and wi != wj}
        final_dict = {word: self.word_kind_dict.get(word)
                      for word in region_feature_words if word not in inner_words}
        return final_dict
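The last step of check_query, discarding entity words that sit inside a longer matched word, can be tried in isolation. The helper below is a small sketch with an assumed name, `drop_nested_matches`, and made-up matcher output; it compares every pair of matches so the result does not depend on the order in which the matcher reports them.

```python
def drop_nested_matches(words):
    """Keep only the words that are not contained in another, longer matched word."""
    nested = {a for a in words for b in words if a != b and a in b}
    return [w for w in words if w not in nested]


# Hypothetical matcher output for a question about 感冒灵颗粒 (a drug name):
# the disease 感冒 and the fragment 颗粒 are both nested inside the drug name.
matched = ["感冒", "感冒灵颗粒", "颗粒"]
print(drop_nested_matches(matched))
```

Without this filter, the question would also be treated as mentioning the disease 感冒, and disease-related rules could fire for what is really a question about a drug.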
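In testing, the results are largely as expected. Of course, one could also use named entity recognition to extract the target entities and a deep-learning model to classify the question, which would improve generalization and recall. Optimizing with deep learning, however, means that a large amount of labeled data is required.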