Research on a medical knowledge graph question answering system (I)
2022-07-01 04:41:00 【Necther】
1. Project background
To deepen my understanding of knowledge graphs through hands-on practice, I went through almost every open source project and video tutorial I could find online.
Sure enough, patience paid off: I found an open source project on GitHub by Liu Huanyong of the Institute of Software, Chinese Academy of Sciences. It is a knowledge-graph-based question answering project for the medical domain, QABasedOnMedicaKnowledgeGraph.
Project address: https://github.com/liuhuanyong/QASystemOnMedicalKG
It took me two evenings to set it up twice, once on Mac and once on Windows, and both run successfully.
The project builds, from scratch, a disease-centered knowledge graph of the medical domain at a meaningful scale, and uses it to provide automatic question answering and analysis services. It takes a vertical medical website as its data source and disease as the core entity, and builds a knowledge graph with about 44,000 entities of 7 types and about 300,000 entity relations of 11 types. The project consists of two parts:
1. Building the medical knowledge graph from vertical website data
2. Automatic question answering based on the medical knowledge graph
2. Project environment
2.1 Windows
There are quite a few pitfalls during setup, so take care.
Requirements: a Neo4j database and the corresponding Python dependency packages. Remember the Neo4j user name and password, and update the corresponding files with them.
Install Neo4j. Neo4j depends on Java JDK 1.8 or later:
For installing the Java JDK, see: Installing JDK 8 on Windows. Download address: https://download.oracle.com/otn-pub/java/jdk/8u201-b09/42970487e3af4f5aa5bca3f542482c60/jdk-8u201-windows-x64.exe
For installing Neo4j, see the blog post: Installing Neo4j on Windows. Download address: https://go.neo4j.com/download-thanks.html?edition=community&release=3.4.1&flavour=winzip
For installing Python, see: Installing Python 2.7 on Windows
Set the port, account, and password chosen during the Neo4j installation in the project files answer_search.py and build_medicalgraph.py (the project can also be fetched from GitHub with git, as you prefer).
Data import: python build_medicalgraph.py. There is a lot of data to import, so expect it to take a few hours.
Before importing data with python build_medicalgraph.py, code needs to be added to the main function of that file:

build_medicalgraph.py

Start the question answering: python chatbot_graph.py
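The connection values mentioned in the steps above (port, account, password) could be kept in one place, roughly as in the sketch below. This is only an illustration: the class name Neo4jSettings, the helper connect, and the default values are assumptions, and the project's own scripts may build their connection differently. The py2neo call assumes a recent py2neo release that accepts a Bolt URI and an auth tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    """Connection values chosen when Neo4j was installed (hypothetical defaults)."""
    host: str = "127.0.0.1"
    bolt_port: int = 7687            # Neo4j's default Bolt port
    user: str = "neo4j"
    password: str = "your-password"  # placeholder, replace with your own

    @property
    def uri(self) -> str:
        """Bolt URI built from host and port."""
        return f"bolt://{self.host}:{self.bolt_port}"


def connect(settings: Neo4jSettings):
    """Open a py2neo connection; imported lazily so the settings work without py2neo."""
    from py2neo import Graph  # third-party: pip install py2neo
    return Graph(settings.uri, auth=(settings.user, settings.password))


SETTINGS = Neo4jSettings()
print(SETTINGS.uri)  # bolt://127.0.0.1:7687
```

Keeping the values in one object means answer_search.py and build_medicalgraph.py would only need to agree on a single definition instead of two hand-edited copies.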
2.2 Mac
The Mac comes with Python and a Java JDK environment, so the Neo4j graph database can be installed directly. The remaining steps are essentially the same as on Windows.
Troubleshooting:
If you run into problems during installation, contact WeChat: dandan-sbb.
2.3 Neo4j database view

2.4 The question answering system in action

3. Project introduction
The project's data comes from the vertical medical website Xunyiwenyao (寻医问药, "seek medical advice") and is collected by the crawler script data_spider.py. Focusing on structured data, the project builds a disease-centered medical knowledge graph with about 44,000 entities and about 300,000 entity relations. The schema is designed from the collected structured data, which is extracted from the web pages by XPath parsing.
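As a rough illustration of XPath-based extraction, the sketch below pulls a structured record out of a small page. The markup, class names, and field names are invented for the example and are not those of the real site or of data_spider.py. It uses the standard library's ElementTree, which only handles well-formed markup and a subset of XPath; a real crawler would more likely use an HTML-tolerant parser such as lxml.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed stand-in for a disease page (invented markup).
PAGE = """
<html><body>
  <h1 class="disease-name">Common cold</h1>
  <div class="symptoms"><a>cough</a><a>runny nose</a></div>
  <div class="department"><a>Internal medicine</a></div>
</body></html>
"""


def parse_disease_page(markup: str) -> dict:
    """Turn one page into a structured record using XPath-style queries."""
    root = ET.fromstring(markup)
    name = root.find(".//h1[@class='disease-name']").text.strip()
    symptoms = [a.text for a in root.findall(".//div[@class='symptoms']/a")]
    departments = [a.text for a in root.findall(".//div[@class='department']/a")]
    # Field names below are hypothetical, chosen only for this example.
    return {"name": name, "symptom": symptoms, "cure_department": departments}


record = parse_disease_page(PAGE)
print(record)
```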
The project stores its data in the Neo4j graph database, implements question answering by rule matching, and operates on the data with Cypher, Neo4j's declarative query language.
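To make the storage step more concrete, here is a minimal sketch of how one structured record could be turned into Cypher statements. The record fields, node labels, and relation names (has_symptom, belongs_to) are assumptions for illustration and may not match the project's actual schema. The sketch only builds the statements and does not talk to a database.

```python
import json

# One line of a JSON-lines export; the field names are illustrative assumptions.
LINE = '{"name": "Common cold", "symptom": ["cough", "runny nose"], "cure_department": ["Internal medicine"]}'

# Maps a record field to (target node label, relation type); names are hypothetical.
RELATIONS = {
    "symptom": ("Symptom", "has_symptom"),
    "cure_department": ("Department", "belongs_to"),
}


def record_to_statements(record: dict) -> list:
    """Build (cypher, parameters) pairs that create one disease and its relations."""
    statements = [("MERGE (d:Disease {name: $name})", {"name": record["name"]})]
    for field, (label, rel_type) in RELATIONS.items():
        for target in record.get(field, []):
            cypher = (
                f"MATCH (d:Disease {{name: $name}}) "
                f"MERGE (t:{label} {{name: $target}}) "
                f"MERGE (d)-[:{rel_type}]->(t)"
            )
            statements.append((cypher, {"name": record["name"], "target": target}))
    return statements


statements = record_to_statements(json.loads(LINE))
for cypher, params in statements:
    print(cypher, params)
```

Values are passed as query parameters rather than pasted into the query text. Labels and relation types cannot be parameterized in Cypher, so they are interpolated from the fixed RELATIONS table. MERGE keeps repeated imports from creating duplicate nodes.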
A weakness of the project is that disease causes, prevention, and similar fields are returned as large blocks of text. Event extraction could be introduced here so that causes are expressed in structured form.

3.1 Project directory
.
├── README.md
├── __pycache__ \\ Directory holding the compiled bytecode
│ ├── answer_search.cpython-36.pyc
│ ├── question_classifier.cpython-36.pyc
│ └── question_parser.cpython-36.pyc
├── answer_search.py
├── answer_search.pyc
├── build_medicalgraph.py \\ Script that loads the knowledge graph data into the database
├── chatbot_graph.py \\ Question answering script
├── data
│ └── medicaln.json \\ All project data, imported into neo4j by build_medicalgraph.py
├── dict
│ ├── check.txt \\ Diagnostic check item entity dictionary
│ ├── deny.txt \\ Negation word dictionary
│ ├── department.txt \\ Medical department entity dictionary
│ ├── disease.txt \\ Disease entity dictionary
│ ├── drug.txt \\ Drug entity dictionary
│ ├── food.txt \\ Food entity dictionary
│ ├── producer.txt \\ Dictionary of drugs on sale
│ └── symptom.txt \\ Disease symptom entity dictionary
├── document
│ ├── chat1.png \\ Screenshot of a question answering session 01
│ ├── chat2.png \\ Screenshot of a question answering session 01
│ ├── kg_route.png \\ Knowledge graph construction framework
│ └── qa_route.png \\ Question answering system framework
├── img \\ Pictures used in README.md
│ ├── chat1.png
│ ├── chat2.png
│ ├── graph_summary.png
│ ├── kg_route.png
│ └── qa_route.png
├── prepare_data
│ ├── build_data.py \\ Database operation script
│ ├── data_spider.py \\ Web data collection script
│ └── max_cut.py \\ Dictionary-based forward / backward maximum matching script
├── question_classifier.py \\ Question type classification script
├── question_classifier.pyc
├── question_parser.py \\ Question parsing script
└── question_parser.pyc
3.2 Entity types of the knowledge graph

3.3 Entity relation types of the knowledge graph

3.4 Attribute types of the knowledge graph

3.5 How the question answering works

The project's question answering system is based entirely on rule matching. It classifies questions by keyword matching; because medical questions belong to a closed-domain scenario, the questions in the domain can be enumerated and classified. It then queries Neo4j with Cypher MATCH statements, assembles the answer from the returned data, and returns the result.
Keyword matching in the question:

Classify questions according to the matching keywords

Question analysis

Find relevant data

Assemble the answer according to the returned data
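
The five steps above can be sketched in miniature as follows. This is not the project's code: the vocabulary, question types, relation names, and answer templates are invented for illustration, and the database call is replaced by a stand-in function so the sketch runs without Neo4j.

```python
# Hypothetical miniature of the rule-based pipeline; all names are illustrative.
ENTITY_DICT = {"common cold": "disease", "cough": "symptom"}
QUESTION_WORDS = {
    "disease_symptom": ["symptom", "sign of"],
    "disease_department": ["which department", "what department"],
}
TEMPLATES = {
    "disease_symptom": (
        "MATCH (d:Disease)-[:has_symptom]->(s:Symptom) WHERE d.name = $name RETURN s.name",
        "Symptoms of {name} include: {items}",
    ),
    "disease_department": (
        "MATCH (d:Disease)-[:belongs_to]->(p:Department) WHERE d.name = $name RETURN p.name",
        "{name} is usually treated in: {items}",
    ),
}


def classify(question: str) -> dict:
    """Steps 1-2: match dictionary entities and cue words, then pick question types."""
    text = question.lower()
    entities = [word for word in ENTITY_DICT if word in text]
    diseases = [e for e in entities if ENTITY_DICT[e] == "disease"]
    types = [qtype for qtype, cues in QUESTION_WORDS.items()
             if diseases and any(cue in text for cue in cues)]
    return {"diseases": diseases, "types": types}


def parse(classified: dict) -> list:
    """Step 3: turn each (question type, disease) pair into a Cypher query."""
    return [(qtype, TEMPLATES[qtype][0], {"name": name})
            for qtype in classified["types"] for name in classified["diseases"]]


def answer(question: str, run_query) -> str:
    """Steps 4-5: run the queries and assemble the answer from the returned rows."""
    replies = []
    for qtype, cypher, params in parse(classify(question)):
        rows = run_query(cypher, params)
        if rows:
            replies.append(TEMPLATES[qtype][1].format(name=params["name"], items=", ".join(rows)))
    return "\n".join(replies)


def fake_run_query(cypher: str, params: dict) -> list:
    """Stand-in for the Neo4j call so the sketch runs without a database."""
    return ["cough", "runny nose"] if "has_symptom" in cypher else ["Internal medicine"]


reply = answer("What are the symptoms of the common cold?", fake_run_query)
print(reply)  # Symptoms of common cold include: cough, runny nose
```

Because every question type is an entry in a table, supporting a new type in this sketch means adding cue words and a template rather than new control flow, which is the main appeal of the rule-based approach in a closed domain.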

3.6 Question types supported by the question answering system

4. Project summary
A rule-based question answering system involves no complicated algorithm. It generally uses template matching to find the best-matching answer, so answer quality depends on how well the question types and the template corpus cover the input. It can answer known questions correctly; for questions or question types that no template matches, three kinds of replies are common:
1. Give a nonsensical answer;
2. Say it does not know and prompt the user to rephrase;
3. Change the subject and avoid the question.
For example, this project answers "I don't know" in a tactful way:
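
A minimal sketch of such a fallback, assuming the answer function returns an empty string when no template matches. The wording of the default reply and the function names are hypothetical, not taken from the project.

```python
# Hypothetical default reply used when no template matches.
FALLBACK = "Sorry, I could not find an answer. Could you rephrase the question?"


def reply_or_fallback(question: str, answer_fn) -> str:
    """Return the assembled answer, or a polite default when nothing matched."""
    result = answer_fn(question)
    return result if result else FALLBACK


def toy_answer(question: str) -> str:
    """Toy answer function for the example: it only knows one question."""
    known = {"symptoms of cold": "cough, runny nose"}
    return known.get(question.lower().strip(" ?"), "")


print(reply_or_fallback("Symptoms of cold?", toy_answer))
print(reply_or_fallback("Tell me a joke", toy_answer))
```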

The defining feature of a knowledge-graph-based question answering system is the knowledge graph itself. The system relies on entities from one or more domains and reasons or deduces over the graph to answer users' questions in depth. Such a system is better at answering knowledge questions and, unlike a template-based chatbot, gives users more direct and intuitive answers. For questions it cannot answer or does not recognize, it generally returns a failure directly instead of changing the subject to avoid embarrassment.
The quality of the whole question answering system depends on the quantity and quality of the knowledge in the knowledge graph, which is both a strength and a weakness. A knowledge graph scales well: expanding the graph expands the system's knowledge base. Questions within its range are easy to answer, but when a question falls outside it, the experience degrades sharply.