Research on a medical knowledge graph question answering system (I)
2022-07-01 04:41:00 【Necther】
1、 Project background
To deepen my understanding of knowledge graphs through hands-on practice, I searched through almost all the open source projects and practical video tutorials I could find online.
Sure enough, patience paid off: I found an open source project on GitHub by Liu Huanyong of the Institute of Software, Chinese Academy of Sciences, a question answering project for the medical domain based on a knowledge graph, QABasedOnMedicaKnowledgeGraph.
Project address: https://github.com/liuhuanyong/QASystemOnMedicalKG
It took two evenings to set it up twice, once on Mac and once on Windows, and both ran successfully!
The project builds, from scratch, a disease-centered medical knowledge graph of a certain scale and uses it to provide automatic question answering and analysis services. It is based on the medical domain, takes a vertical medical website as its data source and disease as its core, and builds a knowledge graph containing about 44,000 knowledge entities of 7 types and about 300,000 entity relations of 11 types. The project consists of the following two parts:
1、 Construction of a medical knowledge graph from vertical website data
2、 Automatic question answering based on the medical knowledge graph
2、 Project environment
2.1 Windows
There are many pitfalls along the way, so proceed with care.
Configuration requirements: the Neo4j database and the corresponding Python dependency packages are required. Remember the Neo4j user name and password, and update the corresponding files accordingly.
Install Neo4j. Neo4j depends on Java JDK 1.8 or above:
For installing the Java JDK, refer to: "Installing JDK8 on Windows". Download address: https://download.oracle.com/otn-pub/java/jdk/8u201-b09/42970487e3af4f5aa5bca3f542482c60/jdk-8u201-windows-x64.exe
For installing Neo4j, refer to the blog post: "Installing Neo4j on Windows". Download address: https://go.neo4j.com/download-thanks.html?edition=community&release=3.4.1&flavour=winzip
For installing Python, refer to: "Installing Python 2.7 in a Windows environment"
Using the port, account, and password chosen when installing Neo4j, edit the connection settings in the project files answer_search.py and build_medicalgraph.py (the project can be downloaded from GitHub or cloned with git, whichever you prefer).
Data import: python build_medicalgraph.py. There is a lot of data to import, so expect it to take a few hours.
Before running python build_medicalgraph.py to import the data, the following needs to be added to the main function of that file:

build_medicalgraph.py

Start question answering: python chatbot_graph.py
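Since the same port, account, and password have to be set in both answer_search.py and build_medicalgraph.py, one option is to keep them in a single place. The following is only a minimal sketch of that idea, not code from the project: the class Neo4jSettings, the function load_settings, and the NEO4J_* environment variable names are all hypothetical. The only facts assumed about Neo4j are its default ports (7687 for Bolt, 7474 for HTTP).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    """Connection settings shared by the import and question answering scripts."""
    host: str = "127.0.0.1"
    port: int = 7687  # Neo4j's default Bolt port (the HTTP port defaults to 7474)
    user: str = "neo4j"
    password: str = ""

    @property
    def uri(self) -> str:
        """Bolt URI built from host and port."""
        return f"bolt://{self.host}:{self.port}"


def load_settings(env=os.environ) -> Neo4jSettings:
    """Read settings from hypothetical NEO4J_* environment variables."""
    return Neo4jSettings(
        host=env.get("NEO4J_HOST", "127.0.0.1"),
        port=int(env.get("NEO4J_PORT", "7687")),
        user=env.get("NEO4J_USER", "neo4j"),
        password=env.get("NEO4J_PASSWORD", ""),
    )
```

Both scripts could then call load_settings() instead of hard-coding the values, so a changed password only needs to be updated once.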
2.2 Mac
A Mac already comes with Python and a Java JDK environment, so the Neo4j graph database can be installed directly. The steps to run the project are essentially the same as on Windows.
Troubleshooting:
If you run into any problems during installation, contact WeChat: dandan-sbb.
2.3 Neo4j database display

2.4 The question answering system in action

3、 Project introduction
The data of this project comes from the vertical medical website 寻医问药 ("seek medical advice"). Using the crawler script data_spider.py and focusing on structured data, a disease-centered medical knowledge graph was built, with about 44,000 entities and about 300,000 entity relations. The schema is designed around the structured data that was collected, and the structured data on the web pages is parsed with XPath.
Data is stored in the Neo4j graph database, the question answering system is implemented with rule matching, and data is queried with Cypher, Neo4j's declarative query language.
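To make the storage step more concrete, here is a minimal sketch of how entity names and relations could be turned into batched, parameterized Cypher statements. It is not the project's build_medicalgraph.py. The function names, the batch size, and the labels used in the test (Disease, Symptom, has_symptom) are illustrative assumptions. The sketch only builds (query, parameters) pairs; executing them is left to whichever Neo4j client is in use.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Labels and relationship types cannot be passed as Cypher parameters,
# so they are validated before being placed in the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(identifier: str) -> str:
    """Return the identifier unchanged, or raise if it is not a plain name."""
    if not _IDENTIFIER.match(identifier):
        raise ValueError(f"unsafe Cypher identifier: {identifier!r}")
    return identifier


def batches(items: Iterable, size: int) -> Iterator[List]:
    """Yield lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def node_statements(
    label: str, names: Iterable[str], size: int = 1000
) -> Iterator[Tuple[str, Dict]]:
    """Build (query, parameters) pairs that MERGE one node per distinct name."""
    query = f"UNWIND $names AS name MERGE (n:{_check(label)} {{name: name}})"
    for batch in batches(sorted(set(names)), size):
        yield query, {"names": batch}


def relation_statements(
    start_label: str,
    end_label: str,
    rel_type: str,
    pairs: Iterable[Tuple[str, str]],
    size: int = 1000,
) -> Iterator[Tuple[str, Dict]]:
    """Build (query, parameters) pairs that MERGE one relation per distinct pair."""
    query = (
        "UNWIND $pairs AS pair "
        f"MATCH (a:{_check(start_label)} {{name: pair[0]}}) "
        f"MATCH (b:{_check(end_label)} {{name: pair[1]}}) "
        f"MERGE (a)-[:{_check(rel_type)}]->(b)"
    )
    for batch in batches(sorted(set(pairs)), size):
        yield query, {"pairs": [list(p) for p in batch]}
```

Deduplicating first and sending names in batches keeps the number of round trips to the database small, and passing values as parameters avoids building queries by string concatenation.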
A shortcoming of the project is that disease causes, prevention, and similar information are returned as large paragraphs. Event extraction could be introduced here so that causes are expressed in structured form.

3.1 Project directory
.
├── README.md
├── __pycache__ \\ Directory holding compiled bytecode
│ ├── answer_search.cpython-36.pyc
│ ├── question_classifier.cpython-36.pyc
│ └── question_parser.cpython-36.pyc
├── answer_search.py
├── answer_search.pyc
├── build_medicalgraph.py \\ Script that loads the knowledge graph data into the database
├── chatbot_graph.py \\ Question answering script
├── data
│ └── medicaln.json \\ All data of the project, imported into Neo4j by build_medicalgraph.py
├── dict
│ ├── check.txt \\ Diagnostic check item entity dictionary
│ ├── deny.txt \\ Negation word dictionary
│ ├── department.txt \\ Medical department entity dictionary
│ ├── disease.txt \\ Disease entity dictionary
│ ├── drug.txt \\ Drug entity dictionary
│ ├── food.txt \\ Food entity dictionary
│ ├── producer.txt \\ On-sale drug dictionary
│ └── symptom.txt \\ Disease symptom entity dictionary
├── document
│ ├── chat1.png \\ Screenshot of the running question answering system 01
│ ├── chat2.png \\ Screenshot of the running question answering system 01
│ ├── kg_route.png \\ Knowledge graph construction framework
│ ├── qa_route.png \\ Question answering system framework
├── img \\ Pictures used in README.md
│ ├── chat1.png
│ ├── chat2.png
│ ├── graph_summary.png
│ ├── kg_route.png
│ └── qa_route.png
├── prepare_data
│ ├── build_data.py \\ Database operation script
│ ├── data_spider.py \\ Web data collection script
│ └── max_cut.py \\ Dictionary-based maximum forward/backward matching segmentation script
├── question_classifier.py \\ Question type classification script
├── question_classifier.pyc
├── question_parser.py \\ Question parsing script
├── question_parser.pyc
3.2 Entity types of the knowledge graph

3.3 Entity relation types of the knowledge graph

3.4 Attribute types of the knowledge graph

3.5 How the question answering is implemented

The question answering system in this project is based entirely on rule matching. Questions are classified by keyword matching. Because medical questions belong to a closed-domain scenario, the questions in the domain can be enumerated and classified. Cypher MATCH queries are then used to look up the matching data in Neo4j, the answer is assembled from the returned data, and the result is returned.
Keyword matching in the question:

Classify the question according to the matched keywords

Parse the question

Query the relevant data

Assemble the answer from the returned data
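The five steps above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the project's question_classifier.py, question_parser.py, or answer_search.py. The tiny dictionaries, the question type names, the Cypher labels and relationship types, the answer templates, and the fallback text are all hypothetical. The database call is passed in as run_query so the sketch runs without Neo4j.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical miniature dictionaries; the project keeps its entity words in dict/*.txt.
DISEASES = {"influenza", "gastritis"}
SYMPTOM_WORDS = ("symptom",)
DRUG_WORDS = ("drug", "medicine")

# One Cypher query and one answer template per question type (illustrative names).
CYPHER = {
    "disease_symptom": "MATCH (d:Disease {name: $name})-[:has_symptom]->(s:Symptom) "
                       "RETURN s.name AS value",
    "disease_drug": "MATCH (d:Disease {name: $name})-[:recommends_drug]->(m:Drug) "
                    "RETURN m.name AS value",
}
TEMPLATES = {
    "disease_symptom": "Symptoms of {name} include: {values}",
    "disease_drug": "Drugs commonly used for {name} include: {values}",
}
FALLBACK = "Sorry, I could not find an answer. Please try asking in another way."


def match_entities(question: str, vocabulary) -> List[str]:
    """Step 1, keyword matching: dictionary entries that occur in the question."""
    text = question.lower()
    return sorted(word for word in vocabulary if word in text)


def classify(question: str) -> Optional[str]:
    """Step 2, classification: map cue words to a question type."""
    text = question.lower()
    if any(word in text for word in SYMPTOM_WORDS):
        return "disease_symptom"
    if any(word in text for word in DRUG_WORDS):
        return "disease_drug"
    return None


def parse(question: str) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Step 3, parsing: turn the question into (type, Cypher query, parameters)."""
    qtype = classify(question)
    diseases = match_entities(question, DISEASES)
    if qtype is None or not diseases:
        return None
    return qtype, CYPHER[qtype], {"name": diseases[0]}


def answer(question: str, run_query: Callable[[str, Dict[str, str]], List[str]]) -> str:
    """Steps 4 and 5: run the query and assemble the answer from the returned values."""
    parsed = parse(question)
    if parsed is None:
        return FALLBACK
    qtype, query, params = parsed
    values = run_query(query, params)
    if not values:
        return FALLBACK
    return TEMPLATES[qtype].format(name=params["name"], values=", ".join(values))
```

In use, run_query would wrap the Neo4j client and return the value column of the result. Adding a new question type then means adding cue words, one query, and one template.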

3.6 Question types supported by the question answering system

4、 Project summary
A rule-based question answering system involves no complicated algorithms. It generally uses template matching to find the answer with the highest matching score, and the quality of the answer depends on how well the question types and template corpus cover what users ask. For known questions it can give the right answer. For questions or question types that no template matches, three kinds of answers are commonly seen:
1、 Give an irrelevant answer;
2、 Admit not knowing and prompt the user to ask in another way;
3、 Change the subject and avoid the question;
For example, this project uses a tactful way of saying it does not know:

The defining feature of a knowledge-graph-based question answering system is the knowledge graph itself. The system relies on entities from one or more domains and performs reasoning or deduction over the graph to answer users' questions in depth. Such a system is better at answering knowledge questions and, unlike a template-based chatbot, gives users more direct and intuitive answers. For questions it cannot answer or does not recognize, it generally returns a failure directly instead of changing the subject to avoid embarrassment.
The quality of the whole question answering system depends on the quantity and quality of the knowledge in the knowledge graph, which is both a strength and a weakness. A knowledge graph scales well: expanding the graph expands the knowledge base of the question answering system. If a question falls within its coverage, it is easy to answer. If it falls outside, the user experience suffers greatly.