Pattern matching: The gestalt approach, a sequence-based text similarity method
2020-11-06 01:28:00 【Elementary school students in IT field】
When reprinting, please credit the original: https://blog.csdn.net/HHTNAN
Pattern matching: The gestalt approach
Python can compare the similarity of two sequences directly, with no word segmentation required.
Case study 1
import difflib
a = " Do uterine fibroids minimally invasive surgery with how much money "
b = " What does tinea cruris look like ? How to treat tinea cruris good ?"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.06666666666666667
Case study 2
import difflib
a = " Do uterine fibroids minimally invasive surgery with how much money "
b = " Do uterine fibroids minimally invasive surgery specific costs "
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.769230769
Case study 3
import difflib
a = " Do uterine fibroids minimally invasive surgery with how much money "
b = " Specific cost to do uterine fibroids minimally invasive surgery "
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6923076923076923
Case study 4
import difflib
a = " Do uterine fibroids minimally invasive surgery with how much money "
b = " Specific cost of uterine fibroids to do minimally invasive surgery "
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6153846153846154
The cases above show that the algorithm measures sequence similarity only. It ignores the subject matter and the semantics of the text.
The score returned by the algorithm is twice the number of matched characters divided by the total number of characters in the two strings. The original description of the algorithm returns the score as an integer that reflects a percentage match; difflib's ratio() returns it as a float between 0 and 1, as the outputs above show.
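The formula above can be checked against difflib directly. The following is a minimal sketch: the helper name ratio_from_blocks and the sample strings are illustrative assumptions, not part of the original post. It sums the sizes of the blocks returned by get_matching_blocks() and applies 2*M/T.

```python
import difflib


def ratio_from_blocks(a, b):
    """Recompute the similarity score as 2*M/T (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # M: total number of matched elements over all matching blocks.
    # The final block is a zero-length sentinel, so it adds nothing.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # T: combined length of both sequences.
    total = len(a) + len(b)
    return 2.0 * matched / total if total else 1.0


# Illustrative strings, not the inputs used in the cases above.
a = "minimally invasive surgery cost"
b = "cost of minimally invasive surgery"
print(ratio_from_blocks(a, b))
print(difflib.SequenceMatcher(None, a, b).ratio())
```

Both print statements should show the same value, since the helper only restates the documented definition of ratio().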
My current guess at how the score is calculated is as follows.
If the matching characters are not in the same positions in both sequences, as in case 3, the score is 9/13 (equivalently 2*9/(13+13)), where 9 is the length of the longest common substring and 13 is the number of characters in each string. These counts refer to the original input strings, not to the English renderings shown in the code. Case 4 gives 8/13, which can be read as (4+4)/13: two common substrings of four characters each. The question is then why case 2, whose longest common substring is also 9 characters long, scores higher. My guess is that a match in the same position earns an extra +1, so the result is (9+1)/13. These conjectures are based on tests only; they are not verified and not authoritative. I will find the paper, read it, and revise this later. (It is worth noting that the matching process takes the characters of b as its basis.)
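One way to test the conjecture above is to look at which blocks difflib actually matched. The sketch below uses made-up 13-character stand-in strings (not the original inputs) that share a 9-character run, placed once at the same position and once at a shifted position; the helper name show_blocks is also an assumption. difflib documents ratio() as 2.0*M/T, with M counting every matched element, so a single extra match outside the longest run would raise the score just as a positional bonus would. Whether that is what happens in case 2 has to be checked on the real strings.

```python
import difflib


def show_blocks(a, b):
    """Print each non-empty matching block and return the match count."""
    matcher = difflib.SequenceMatcher(None, a, b)
    matched = 0
    for block in matcher.get_matching_blocks():
        if block.size:  # skip the zero-length sentinel at the end
            print(block.a, block.b, repr(a[block.a:block.a + block.size]))
            matched += block.size
    print("matched:", matched, "ratio:", matcher.ratio())
    return matched


# Hypothetical stand-ins: each string is 13 characters long.
a = "ABCDEFGHIJKLM"
same_position = "ABCDEFGHIxyzJ"  # shared run starts at index 0 in both
shifted = "xyzJABCDEFGHI"        # shared run starts at index 4 in b

show_blocks(a, same_position)
show_blocks(a, shifted)
```

With these stand-ins the first call reports 10 matched characters (the 9-character run plus a separate "J") and the second reports 9, which correspond to 10/13 and 9/13.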
Case study 5
import difflib
a = "10 Anemia in a month old baby "
b = "10 A month old baby has nosebleed "
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.8235294117647058
2*7/(len(a)+len(b)) = 14/(8+9) = 0.8235294117647058