Research on a medical knowledge graph question answering system (I)
2022-07-01 04:41:00 【Necther】
1. Project background
To deepen my understanding of knowledge graphs through hands-on practice, I went through almost every open-source project and video tutorial I could find online.
Persistence paid off: I found an open-source project on GitHub by Liu Huanyong of the Institute of Software, Chinese Academy of Sciences, a medical question answering project built on a knowledge graph, QABasedOnMedicaKnowledgeGraph.
Project address: https://github.com/liuhuanyong/QASystemOnMedicalKG
It took two evenings to set up two copies, one on Mac and one on Windows, and both ran successfully.
The project builds, from scratch, a disease-centered knowledge graph of the medical domain at a reasonable scale, and uses it to provide automatic question answering and analysis. It takes a vertical medical website as its data source and disease as the core entity, producing a knowledge graph with about 44,000 entities of 7 types and about 300,000 entity relations of 11 types. The project consists of two parts:
1. Building the medical knowledge graph from vertical website data
2. Automatic question answering based on the medical knowledge graph
2. Project environment
2.1 Windows
There are quite a few pitfalls in the setup, so take care.
Requirements: a Neo4j database and the corresponding Python dependencies. Note the Neo4j user name and password, and update the relevant files accordingly.
Install Neo4j. Neo4j depends on Java JDK 1.8 or later:
For installing the JDK, see "Installing JDK 8 on Windows". Download address: https://download.oracle.com/otn-pub/java/jdk/8u201-b09/42970487e3af4f5aa5bca3f542482c60/jdk-8u201-windows-x64.exe
For installing Neo4j, see the blog post "Installing Neo4j on Windows". Download address: https://go.neo4j.com/download-thanks.html?edition=community&release=3.4.1&flavour=winzip
For installing Python, see "Installing Python 2.7 in a Windows environment".
Using the port, account, and password chosen when installing Neo4j, update the connection settings in the project files answer_search.py and build_medicalgraph.py (the project can also be downloaded from GitHub with git, as needed).
Data import: run python build_medicalgraph.py. There is a lot of data to import, so expect it to take a few hours.
Before running python build_medicalgraph.py to import the data, the main function in that file needs an addition:

build_medicalgraph.py

Start question answering: python chatbot_graph.py
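The configuration step above asks for the Neo4j port, account, and password to be written into the two scripts. As a minimal sketch of keeping those values in one place, the snippet below uses a hypothetical `Neo4jSettings` class that is not part of the project; the password shown is a placeholder, and only the default HTTP port 7474 and the default `neo4j` account name are Neo4j defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    """Hypothetical holder for the values chosen when installing Neo4j."""

    host: str = "127.0.0.1"       # local installation, as in this walkthrough
    http_port: int = 7474         # Neo4j's default HTTP port
    user: str = "neo4j"           # Neo4j's default account name
    password: str = "change-me"   # placeholder only; use your own password

    def http_uri(self) -> str:
        """Build the HTTP address a client library could be pointed at."""
        return f"http://{self.host}:{self.http_port}"


# Override only what differs from the defaults on your machine.
settings = Neo4jSettings(password="my-secret")
print(settings.http_uri())
```

The same values would then be copied into the connection line of answer_search.py and build_medicalgraph.py, whatever form that line takes in the version of the project you downloaded.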
2.2 Mac
On the Mac, Python and a Java JDK environment were already available, so the Neo4j graph database can be installed directly. The remaining steps are essentially the same as on Windows.
Troubleshooting:
If you run into any problem during installation, contact me on WeChat: dandan-sbb.
2.3 Neo4j database view

2.4 The question answering system in action

3. Project introduction
The project's data comes from the vertical medical website 寻医问药 ("seek medical advice") and is collected with the crawler script data_spider.py. Focusing on structured data, the project builds a disease-centered medical knowledge graph with about 44,000 entities and about 300,000 entity relations. The schema was designed from the structured data that was collected, which was obtained by parsing the web pages with XPath.
Data is stored in the Neo4j graph database, the question answering system is implemented with rule matching, and data operations use Cypher, Neo4j's declarative query language.
A shortcoming of the project is that fields such as disease cause and prevention are returned as long paragraphs. Event extraction could be introduced here so that causes are represented in structured form.
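To make the storage step more concrete, here is a minimal sketch of how one structured disease record could be turned into parameterised Cypher statements. The record fields, node labels, and relationship names below are assumptions chosen for illustration, not the project's confirmed schema, and the statements are only printed, not sent to a database.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical record; the field names are assumptions for illustration only.
SAMPLE_LINE = (
    '{"name": "cold", "symptom": ["cough", "fever"], '
    '"cure_department": ["internal medicine"]}'
)

# Maps a record field to (target node label, relationship type); assumed names.
FIELD_TO_RELATION: Dict[str, Tuple[str, str]] = {
    "symptom": ("Symptom", "has_symptom"),
    "cure_department": ("Department", "belongs_to"),
}


def record_to_cypher(record: dict) -> List[Tuple[str, dict]]:
    """Turn one disease record into (Cypher statement, parameters) pairs."""
    name = record["name"]
    # The disease node is the center of the graph, so it is created first.
    statements = [("MERGE (d:Disease {name: $name})", {"name": name})]
    for field, (label, rel_type) in FIELD_TO_RELATION.items():
        for value in record.get(field, []):
            # MERGE avoids duplicate nodes and relations on repeated imports.
            query = (
                "MATCH (d:Disease {name: $name}) "
                f"MERGE (t:{label} {{name: $target}}) "
                f"MERGE (d)-[:{rel_type}]->(t)"
            )
            statements.append((query, {"name": name, "target": value}))
    return statements


statements = record_to_cypher(json.loads(SAMPLE_LINE))
for query, params in statements:
    print(query, params)
```

Passing values as parameters rather than formatting them into the query string keeps entity names containing quotes from breaking the statement.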

3.1 Project directory
.
├── README.md
├── __pycache__ \\ Directory holding compiled bytecode
│ ├── answer_search.cpython-36.pyc
│ ├── question_classifier.cpython-36.pyc
│ └── question_parser.cpython-36.pyc
├── answer_search.py
├── answer_search.pyc
├── build_medicalgraph.py \\ Script that loads the knowledge graph data into the database
├── chatbot_graph.py \\ Question answering script
├── data
│ └── medicaln.json \\ All of the project's data, imported into neo4j by build_medicalgraph.py
├── dict
│ ├── check.txt \\ Diagnostic check item entity library
│ ├── deny.txt \\ Negation word list
│ ├── department.txt \\ Medical department entity library
│ ├── disease.txt \\ Disease entity library
│ ├── drug.txt \\ Drug entity library
│ ├── food.txt \\ Food entity library
│ ├── producer.txt \\ Marketed drug entity library
│ └── symptom.txt \\ Disease symptom entity library
├── document
│ ├── chat1.png \\ Screenshot of a question answering session 01
│ ├── chat2.png \\ Screenshot of a question answering session 02
│ ├── kg_route.png \\ Knowledge graph construction framework
│ ├── qa_route.png \\ Question answering system framework
├── img \\README.md Pictures used in
│ ├── chat1.png
│ ├── chat2.png
│ ├── graph_summary.png
│ ├── kg_route.png
│ └── qa_route.png
├── prepare_data
│ ├── build_data.py \\ Database operation script
│ ├── data_spider.py \\ Web data collection script
│ └── max_cut.py \\ Dictionary-based forward/backward maximum matching script
├── question_classifier.py \\ Question type classification script
├── question_classifier.pyc
├── question_parser.py \\ Question parsing script
├── question_parser.pyc
3.2 Entity types of the knowledge graph

3.3 Entity relation types of the knowledge graph

3.4 Attribute types of the knowledge graph

3.5 How the question answering system works

The project's question answering system is entirely rule-based. It classifies each question by keyword matching. Because medical questions form a closed-domain scenario, the possible question types can be enumerated and classified in advance. The system then queries Neo4j with Cypher MATCH statements, assembles an answer from the returned data, and returns the result.
Keyword matching in the question:

Classifying the question according to the matched keywords:

Question parsing:

Looking up the relevant data:

Assembling the answer from the returned data:
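The five steps above can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the project's code: the keyword lists, question type names, Cypher templates, and answer templates are invented for the example, plain substring matching stands in for whatever matching the project uses, and the database lookup is replaced by a hard-coded list of rows.

```python
from typing import Dict, List, Optional

# Toy dictionaries; the project itself keeps its entity lists under dict/.
DISEASES = {"cold", "diabetes"}
QUESTION_KEYWORDS: Dict[str, List[str]] = {
    "disease_symptom": ["symptom", "sign"],
    "disease_drug": ["drug", "medicine"],
}
# One Cypher template and one answer template per question type (assumed names).
CYPHER_TEMPLATES = {
    "disease_symptom": "MATCH (d:Disease)-[:has_symptom]->(s:Symptom) "
                       "WHERE d.name = $name RETURN s.name",
    "disease_drug": "MATCH (d:Disease)-[:common_drug]->(m:Drug) "
                    "WHERE d.name = $name RETURN m.name",
}
ANSWER_TEMPLATES = {
    "disease_symptom": "Symptoms of {name} include: {items}",
    "disease_drug": "Drugs commonly used for {name} include: {items}",
}
FALLBACK = "Sorry, I could not answer that question."


def find_disease(question: str) -> Optional[str]:
    """Step 1: keyword matching, here a plain substring search."""
    q = question.lower()
    return next((d for d in sorted(DISEASES) if d in q), None)


def classify(question: str) -> Optional[str]:
    """Step 2: pick the first question type whose keywords appear."""
    q = question.lower()
    for qtype, keywords in QUESTION_KEYWORDS.items():
        if any(k in q for k in keywords):
            return qtype
    return None


def build_query(question: str) -> Optional[dict]:
    """Step 3: parse the question into a Cypher template plus parameters."""
    disease, qtype = find_disease(question), classify(question)
    if disease is None or qtype is None:
        return None
    return {"type": qtype, "cypher": CYPHER_TEMPLATES[qtype],
            "params": {"name": disease}}


def assemble_answer(query: Optional[dict], rows: List[str]) -> str:
    """Step 5: fill the answer template with the rows the lookup returned."""
    if query is None or not rows:
        return FALLBACK
    return ANSWER_TEMPLATES[query["type"]].format(
        name=query["params"]["name"], items=", ".join(rows))


query = build_query("What are the symptoms of a cold?")
rows = ["cough", "fever"]  # Step 4 stand-in for the rows Neo4j would return
print(assemble_answer(query, rows))
```

Because every question type needs its own keyword list and templates, coverage grows only as fast as new types are enumerated by hand, which is the trade-off of a closed-domain, rule-based design.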

3.6 Question types supported by the question answering system

4. Project summary
A rule-based question answering system involves no complicated algorithms. It generally uses template matching to find the answer with the highest match score, so answer quality depends on how well the question types and the template corpus cover what users ask. For known questions it gives the right answer. For questions or question types that no template matches, three kinds of responses are common:
1. Giving a nonsensical answer;
2. Saying it does not know and prompting the user to rephrase;
3. Changing the subject to dodge the question.
For example, this project uses a tactful reply to say that it does not know:

The defining feature of a knowledge graph based question answering system is the knowledge graph itself. The system relies on the entities of one or more domains and reasons or deduces over the graph to answer users' questions in depth. It is better at answering knowledge questions and, unlike a template-based chatbot, gives users more direct and intuitive answers. For questions it cannot answer or does not recognize, it generally just returns a failure rather than changing the subject to avoid embarrassment.
The quality of the whole system depends on the quantity and quality of the knowledge in the graph, which is both a strength and a weakness. A knowledge graph scales well: expanding the graph expands the system's knowledge base. If a question falls within the graph's coverage, it is easy to answer. If it misses, the user experience drops sharply.