Pattern matching: The gestalt approach, a sequence-based text similarity method
2020-11-06 01:28:00 【Elementary school students in IT field】
When reprinting, please credit the original: https://blog.csdn.net/HHTNAN
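Python's difflib module can compare the similarity of two sequences directly, with no word segmentation required. (The strings in the cases below appear to be English renderings of the original Chinese inputs; the scores shown were presumably computed on the originals, so running the English text will give different values.)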
Case study 1
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "What does tinea cruris look like? How to treat tinea cruris well?"
# ratio() returns a similarity score between 0 and 1
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.06666666666666667
Case study 2
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Do uterine fibroids minimally invasive surgery specific costs"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.769230769
Case study 3
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Specific cost to do uterine fibroids minimally invasive surgery"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6923076923076923
Case study 4
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Specific cost of uterine fibroids to do minimally invasive surgery"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6153846153846154
The cases above show that the algorithm measures sequence similarity only. It ignores the topic and the semantics of the text.
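The point about ignoring semantics can be illustrated with a minimal sketch. The helper name `similarity` and the English word pairs are assumptions chosen for illustration; they are not from the original tests.

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Return difflib's character-level similarity ratio for two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# Unrelated meanings, nearly identical spelling: the shared run "ight" dominates.
print(similarity("night", "light"))  # 0.8

# Related meanings, different spelling: only the letter "g" is shared.
print(similarity("big", "large"))    # 0.25
```

The pair that is close in meaning scores far lower than the pair that merely shares characters, which is the behaviour the cases above suggest.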
The score returned by the algorithm is twice the number of characters it finds in common, divided by the total number of characters in the two strings. That description has the score returned as an integer reflecting a percentage match; difflib's ratio() instead returns a float between 0 and 1, as the outputs above show.
My current guess at how the score is computed is as follows.
When the positions in the two sequences do not line up exactly, as in case 3, the score is 9/13, where 9 is the length of the longest common substring and 13 is the total number of characters in the sequence. Case 4 gives 8/13, which can be read as (4+4)/13. The open question is why case 2, whose longest common substring is also 9 characters, scores higher. My guess is that a match sitting at the same position in both strings earns an extra +1, so the result can be read as (9+1)/13. These conjectures are based on my tests only; they are unverified and not authoritative, and I will find the paper, read it, and revise this later. For reference, the difflib documentation defines ratio() as 2.0*M/T, where M is the number of matched elements and T is the total number of elements in both sequences; if both strings are 13 characters long this reduces to M/13, so 10/13 would correspond to ten matched characters in total. (It is worth noting that the matching process takes the characters of b as its basis.)
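One way to test a conjecture like the one above is to recompute the score from the matching blocks that difflib itself reports. The sketch below assumes the documented definition of `ratio()` as 2.0*M/T; the helper name `manual_ratio` and the sample strings are illustrative only, not taken from the original tests.

```python
import difflib

def manual_ratio(a: str, b: str) -> float:
    """Recompute the score as 2*M/T from the reported matching blocks."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # M: total size of all matching blocks (the final sentinel block has size 0).
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # T: total number of characters in both strings.
    total = len(a) + len(b)
    return 2.0 * matched / total if total else 1.0

for a, b in [("tide", "diet"), ("diet", "tide"), ("abxcd", "abcd")]:
    builtin = difflib.SequenceMatcher(None, a, b).ratio()
    print(a, b, manual_ratio(a, b), builtin)
```

For these inputs the recomputed value equals the built-in one, with no positional bonus involved. The first two pairs also show that swapping the arguments can change the score (0.25 for "tide", "diet" and 0.5 for "diet", "tide"), which may be related to the remark that matching is based on the characters of b.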
Case study 5
import difflib
a = "10 Anemia in a month old baby"
b = "10 A month old baby has nosebleed"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.8235294117647058
2*7/(len(a)+len(b)) = 2*7/(8+9) = 14/17 = 0.8235294117647058
Here 7 is the number of matched characters, and 8 and 9 are the lengths of a and b.
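As a rough cross-check, the same arithmetic can be applied to the other cases. This sketch rests on assumptions: the match counts 10, 9 and 8 and the length 13 for both strings are read off the x/13 fractions discussed above, and the helper name `ratio_from_counts` is hypothetical.

```python
def ratio_from_counts(matched: int, len_a: int, len_b: int) -> float:
    """Compute 2*M/T from a match count and the two string lengths."""
    return 2.0 * matched / (len_a + len_b)

# (matched characters, len(a), len(b)). The 13s are an assumption read off
# the x/13 fractions in the text; the last row comes from the case 5 formula.
cases = {
    "case 2": (10, 13, 13),
    "case 3": (9, 13, 13),
    "case 4": (8, 13, 13),
    "case 5": (7, 8, 9),
}

for name, (matched, len_a, len_b) in cases.items():
    print(name, ratio_from_counts(matched, len_a, len_b))
```

Under these assumptions, twice the match count over the combined length agrees with the scores printed for cases 2 to 5, so a separate position bonus is not needed to explain them.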