Pattern matching: The gestalt approach, a sequence-based text similarity method
2020-11-06 01:28:00 【Elementary school students in IT field】
When reprinting, please credit the original: https://blog.csdn.net/HHTNAN
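Python's difflib can compare the similarity of two sequences directly, with no word segmentation required. The example strings below are shown in English translation; the reported scores appear to come from the original Chinese strings (13 characters each in Cases 2 to 4), so running the translated text gives different values.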
Case study 1
import difflib
a=" Do uterine fibroids minimally invasive surgery with how much money "
b=" What does tinea cruris look like ? How to treat tinea cruris good ?"
print (difflib.SequenceMatcher(None,a,b).ratio())
Output :
0.06666666666666667
Case study 2
import difflib
a=" Do uterine fibroids minimally invasive surgery with how much money "
b=" Do uterine fibroids minimally invasive surgery specific costs "
print (difflib.SequenceMatcher(None,a,b).ratio())
Output :
0.769230769
Case study 3
import difflib
a=" Do uterine fibroids minimally invasive surgery with how much money "
b=" Specific cost to do uterine fibroids minimally invasive surgery "
print (difflib.SequenceMatcher(None,a,b).ratio())
Output :
0.6923076923076923
Case study 4
import difflib
a=" Do uterine fibroids minimally invasive surgery with how much money "
b=" Specific cost of uterine fibroids to do minimally invasive surgery "
print (difflib.SequenceMatcher(None,a,b).ratio())
Output :
0.6153846153846154
The cases above show that the algorithm measures sequence similarity: it compares characters in order and ignores the topic and the semantics of the text.
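A minimal sketch of that point, using short English words chosen purely for illustration (the helper name `similarity` is hypothetical, not part of difflib): a synonym that shares few characters scores lower than an unrelated word that happens to look similar.

```python
import difflib

def similarity(a, b):
    """Character-level similarity as reported by difflib (hypothetical helper)."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# "large" means the same as "big" but shares only one character with it.
synonym_score = similarity("big", "large")
# "bag" is unrelated in meaning but shares two of its three characters.
lookalike_score = similarity("big", "bag")

print("big vs large:", synonym_score)
print("big vs bag:  ", lookalike_score)
```

The look-alike pair scores higher than the synonym pair, which is what one would expect from a measure that only sees character sequences.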
The score returned by the algorithm is twice the number of matching characters it finds, divided by the total number of characters in the two strings. In the original description of the algorithm the score is returned as an integer reflecting a percentage match; difflib's ratio() instead returns a float between 0 and 1.
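The scoring rule just described can be sketched as follows. This is a simplified illustration of the gestalt idea (take the longest common block, then repeat on the text to its left and to its right), not difflib's actual implementation, which adds junk handling and other heuristics. The function names are hypothetical.

```python
def longest_common_block(a, b):
    """Return (i, j, size) for the longest common substring of a and b.

    On ties, the block that starts earliest in a (then earliest in b) wins.
    """
    best_i, best_j, best_size = 0, 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: common run ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_size:
                    best_i, best_j, best_size = i - cur[j], j - cur[j], cur[j]
        prev = cur
    return best_i, best_j, best_size

def matched_chars(a, b):
    """Count matched characters: longest block plus matches on each side of it."""
    i, j, size = longest_common_block(a, b)
    if size == 0:
        return 0
    left = matched_chars(a[:i], b[:j])
    right = matched_chars(a[i + size:], b[j + size:])
    return size + left + right

def gestalt_ratio(a, b):
    """Twice the matched characters divided by the combined length."""
    total = len(a) + len(b)
    return 2.0 * matched_chars(a, b) / total if total else 1.0

print(gestalt_ratio("abcd", "bcde"))  # 0.75
```

Here "abcd" and "bcde" share the block "bcd", so the score is 2*3/(4+4).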
My current guess at how the formula works is as follows.
When the positions in the two sequences do not line up exactly, as in Case 3, the score is 9/13, where 9 is the length of the longest common substring and 13 is the number of characters in each string. Case 4 gives 8/13, which can be read as (4+4)/13, that is, two common blocks of 4 characters each. So why does Case 2, whose longest common substring is also 9 characters, score higher? My guess was that a match at the same position earns an extra 1, so the result reads as (9+1)/13. Note, however, that difflib documents ratio() as 2.0*M/T, where M is the total number of matched characters and T is the combined length of both sequences, so the extra 1 may simply be one more matched character found outside the longest block. These conjectures are based on tests only; they are not validated and not authoritative. I will find the paper, read it, and revise this later. (Note that during matching, the comparison is carried out relative to the characters of b.)
Case study 5
import difflib
a="10 Anemia in a month old baby "
b="10 A month old baby has nosebleed "
print (difflib.SequenceMatcher(None,a,b).ratio())
Output
0.8235294117647058
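That is, 2*7/(len(a)+len(b)) = 14/(8+9) = 0.8235294117647058, where 7 is the number of matched characters and 8 and 9 are the lengths of a and b in the original text.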
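A decomposition like this can be checked for any pair of strings by summing the sizes of the blocks that difflib reports through get_matching_blocks(). The sketch below uses short English strings as stand-ins, and the helper name `ratio_from_blocks` is hypothetical.

```python
import difflib

def ratio_from_blocks(a, b):
    """Recompute the score as 2 * matched / total from difflib's matching blocks."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # Each block is a Match(a, b, size) triple; the final one is a
    # zero-size sentinel, so it adds nothing to the sum.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    return 2.0 * matched / total if total else 1.0

for a, b in [("abcd", "bcde"), ("tide", "diet"), ("diet", "tide")]:
    direct = difflib.SequenceMatcher(None, a, b).ratio()
    print(a, b, ratio_from_blocks(a, b), direct)
```

The last two pairs contain the same strings in swapped order yet score differently, so the order in which a and b are passed can change the result.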