Pattern matching: The gestalt approach, a sequence-based text similarity method
2020-11-06 01:28:00 【Elementary school students in IT field】
When reprinting, please credit the original: https://blog.csdn.net/HHTNAN
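Python's difflib module can compare the similarity of two sequences directly, with no word segmentation required. The scores in the cases below were presumably computed on the original Chinese strings, so the translated strings shown here may not reproduce them exactly.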
Case study 1
import difflib

a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "What does tinea cruris look like ? How to treat tinea cruris good ?"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.06666666666666667
Case study 2
import difflib

a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Do uterine fibroids minimally invasive surgery specific costs"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.769230769
Case study 3
import difflib

a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Specific cost to do uterine fibroids minimally invasive surgery"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6923076923076923
Case study 4
import difflib

a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Specific cost of uterine fibroids to do minimally invasive surgery"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6153846153846154
The cases above show that the algorithm measures sequence similarity only. It ignores the meaning of the text and its semantics.
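As a small illustration of this point, the sketch below compares two pairs of English strings chosen only for this example (they are not from the cases above). The helper name `similarity` is hypothetical. The first pair contains the same words in a different order, the second pair differs in meaning but shares almost the same character sequence.

```python
import difflib


def similarity(a, b):
    """Return difflib's ratio for two strings (hypothetical helper)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Same words, different order: only the block "apple" lines up.
same_words_reordered = similarity("red apple", "apple red")

# Different meaning, nearly identical character sequence.
different_meaning = similarity("I like cats", "I like hats")

print(same_words_reordered)
print(different_meaning)
```

The reordered pair scores lower than the pair with different meanings, which is what one would expect from a measure that only looks at character order.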
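In the original description of the algorithm, the score is twice the number of matching characters found, divided by the total number of characters in the two strings, and it is returned as an integer that reflects a percentage match. difflib's ratio() uses the same formula, 2.0*M/T (M matched characters, T total characters in both strings), but returns a float in the range [0, 1] instead of an integer percentage.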
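The formula can be checked by hand with `get_matching_blocks()`, which lists the matched blocks that `ratio()` is based on. This is a minimal sketch; the two strings are made up for the example, and the variable names (`matched`, `total`, `manual_ratio`, `percent`) are illustrative only.

```python
import difflib

a = "minimally invasive surgery cost"
b = "minimally invasive surgery price"

matcher = difflib.SequenceMatcher(None, a, b)

# Each block is a (a_start, b_start, size) triple; the final block is a
# zero-size sentinel, so summing sizes gives M, the matched characters.
blocks = matcher.get_matching_blocks()
matched = sum(block.size for block in blocks)

# T is the total number of characters in both strings.
total = len(a) + len(b)

manual_ratio = 2.0 * matched / total
percent = round(100 * manual_ratio)  # integer percentage form

print(matched, total)
print(manual_ratio, matcher.ratio())
print(percent)
```

The manually computed value and `matcher.ratio()` should agree, and `percent` shows the integer percentage form mentioned above.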
My current guess at how the algorithm computes the score is as follows.
When the matching parts are not in the same positions, as in case 3, the score is 9/13, where 9 is the length of the longest common substring and 13 is the character count of the sequence. Case 4 gives 8/13, which I read as (4+4)/13. So why does case 2, whose longest common substring is also 9, score higher? My guess is that one more character matching in a consistent position adds 1, so the result is (9+1)/13. This is consistent with the documented formula 2.0*M/T: if both strings are 13 characters long, T is 26 and the score reduces to M/13, where M counts every matched character, not only the longest block. These conjectures are based on tests only. They are not validated and not authoritative; I will find the paper, read it, and tidy this up later. (Note that the matching process works on the basis of the characters of b.)
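The conjecture can be tested against a from-scratch sketch of gestalt matching: find the longest common block, then repeat on the pieces to its left and right, and sum the sizes. The function names below (`longest_common_block`, `matched_chars`, `gestalt_ratio`) are hypothetical, and this is a simplified reading of the approach, not difflib's actual source; it ignores difflib's junk handling, which does not apply to short strings with no junk function.

```python
import difflib


def longest_common_block(a, b):
    """Return (i, j, size) of the longest common substring of a and b.

    On ties, the block starting earliest in a (then earliest in b) wins.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # common-suffix lengths for the previous row
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def matched_chars(a, b):
    """Count M: the longest block plus matches to its left and right."""
    i, j, size = longest_common_block(a, b)
    if size == 0:
        return 0
    left = matched_chars(a[:i], b[:j])
    right = matched_chars(a[i + size:], b[j + size:])
    return size + left + right


def gestalt_ratio(a, b):
    """Return 2*M/T, or 1.0 when both strings are empty."""
    total = len(a) + len(b)
    return 2.0 * matched_chars(a, b) / total if total else 1.0


a, b = "abcdefgh", "abcxefgh"
print(matched_chars(a, b))  # longest block "efgh" (4) plus "abc" (3)
print(gestalt_ratio(a, b), difflib.SequenceMatcher(None, a, b).ratio())
```

In this example the longest block contributes 4 and a second block to its left contributes 3, which mirrors the (9+1)/13 reading of case 2: the extra 1 would be a further matched character outside the longest block.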
Case study 5
import difflib

a = "10 Anemia in a month old baby"
b = "10 A month old baby has nosebleed"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.8235294117647058
With 7 matched characters and string lengths of 8 and 9:
2*7 / (len(a) + len(b)) = 14 / (8 + 9) = 0.8235294117647058