Pattern matching: The gestalt approach, a sequence-based text similarity method
2020-11-06 01:28:00 【Elementary school students in IT field】
When reposting, please credit the original: https://blog.csdn.net/HHTNAN
Pattern matching: The gestalt approach
Python's difflib module compares the similarity of two sequences directly, with no word segmentation step needed.
Case study 1
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "What does tinea cruris look like? How is tinea cruris best treated?"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.06666666666666667
Case study 2
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Do uterine fibroids minimally invasive surgery specific costs"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.769230769
Case study 3
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Specific cost to do uterine fibroids minimally invasive surgery"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6923076923076923
Case study 4
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Specific cost of uterine fibroids to do minimally invasive surgery"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6153846153846154
The cases above show that the algorithm measures sequence similarity only. It ignores the meaning of the text, that is, its semantics.
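To make this concrete, here is a small sketch. The English strings are invented for illustration and are not from the cases above, and `similarity` is just a helper name chosen here. It suggests that a sentence with the opposite meaning can score higher than a reordering that means the same thing.

```python
import difflib

def similarity(a, b):
    """Character-level similarity in [0, 1], as reported by difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# Hypothetical example strings, chosen only to illustrate the point.
reordered = similarity("red apple", "apple red")     # same words, different order
opposite = similarity("I like cats", "I hate cats")  # opposite meaning, similar spelling

print(reordered, opposite)
print(opposite > reordered)  # True: spelling overlap wins over meaning
```

The reordered pair shares only the 5 characters of "apple" in sequence (10/18), while the opposite pair shares 8 characters in order (16/22).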
The score is twice the number of matching characters found by the algorithm, divided by the total number of characters in the two strings. In the original description of the algorithm the score is returned as an integer reflecting a percentage match; difflib's ratio() instead returns a float between 0 and 1.
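The formula can be checked by hand. The sketch below recomputes the score as 2*M/T from the blocks that `get_matching_blocks()` reports; `ratio_from_blocks` is a helper name made up for this example, and the strings "abcd" and "bcde" are illustrative, not from the cases above.

```python
import difflib

def ratio_from_blocks(a, b):
    """Recompute the similarity score as 2*M/T from the matching blocks."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # Each block is (start in a, start in b, size); the last one is a size-0 sentinel.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    return 2.0 * matched / total if total else 1.0

a, b = "abcd", "bcde"
print(difflib.SequenceMatcher(None, a, b).ratio())  # 0.75
print(ratio_from_blocks(a, b))                      # 0.75, since M = 3 ("bcd") and T = 8
```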
My current guess at how the score is calculated is as follows.
When the matching characters do not line up position by position, as in case 3, the score is 9/13, that is 2 × 9 / (13 + 13): 9 is the length of the longest common substring and 13 is the length of each original string. Case 4 gives 8/13, which I read as (4 + 4)/13, two common substrings of 4 characters each. So why does case 2, whose longest common substring is also 9 characters, score higher? My guess was that a character sitting at the same position in both strings adds 1, so the result is (9 + 1)/13. These conjectures come from testing only; they are not validated and not authoritative, and I will find the paper, read it, and revise this later. Note that the difflib documentation defines ratio() as 2.0*M/T, where M is the number of matched elements and T is the combined length of both sequences, so the extra 1 in case 2 is more plausibly one further character matched to the left or right of the longest common substring than a bonus for position. (It is also worth noting that the matching is organized around the characters of b, so swapping the two arguments can change the result.)
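For readers who want to test such guesses, here is a minimal sketch of the gestalt idea as it is usually described: find the longest common substring, then repeat on the pieces to its left and to its right, and add up everything that matched. The function names are made up for this example. This is a simplified illustration, not difflib's actual implementation, which is iterative and also applies junk heuristics to long sequences.

```python
import difflib

def gestalt_matches(a, b):
    """Count matched characters: longest common substring, then recurse left and right."""
    if not a or not b:
        return 0
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    if m.size == 0:
        return 0
    left = gestalt_matches(a[:m.a], b[:m.b])
    right = gestalt_matches(a[m.a + m.size:], b[m.b + m.size:])
    return m.size + left + right

def gestalt_ratio(a, b):
    """Score = 2 * matched characters / combined length."""
    total = len(a) + len(b)
    return 2.0 * gestalt_matches(a, b) / total if total else 1.0

print(gestalt_ratio("abcd", "bcde"))  # 0.75
# Argument order matters: ties between equally long matches are broken
# by position, so the two calls below give different scores.
print(gestalt_ratio("tide", "diet"))  # 0.25
print(gestalt_ratio("diet", "tide"))  # 0.5
```

The "tide"/"diet" pair is the example the difflib documentation uses to warn that ratio() can depend on argument order.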
Case study 5
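import difflib
a = "10 Anemia in a month old baby"
b = "10 A month old baby has nosebleed"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.8235294117647058
Here 7 characters match, and the original strings have lengths len(a) = 8 and len(b) = 9, so the score is 2 × 7 / (len(a) + len(b)) = 14 / (8 + 9) = 0.8235294117647058.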
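The arithmetic of case 5 can be reproduced with stand-in strings of the same shape. The strings below are hypothetical placeholders (lengths 8 and 9, a shared run of 6 characters plus 1 shared final character), not the strings used in the case, and the sketch simply lists the blocks difflib matches.

```python
import difflib

# Hypothetical stand-ins: lengths 8 and 9, sharing "abcdef" and a final "h".
a = "abcdefXh"
b = "abcdefYZh"

matcher = difflib.SequenceMatcher(None, a, b)
blocks = matcher.get_matching_blocks()
for block in blocks:
    # The final block always has size 0 and only marks the end.
    print(block.a, block.b, block.size, repr(a[block.a:block.a + block.size]))

matched = sum(block.size for block in blocks)
print(matched)                            # 7
print(2 * matched / (len(a) + len(b)))    # 14 / 17
print(matcher.ratio())                    # same value
```

Listing the blocks like this is a quick way to see which characters contribute to M when a score looks surprising.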