Efficiency difference: calling set.add() directly vs. calling it only after a membership check
2022-07-05 07:26:00 【work-harder】
Background:
- A test file has more than 2 million lines. The task is to collect the distinct lines and count how often each one occurs, line by line. The set is lineset = set().
- After reading each line with linetmp = f.readline(), the line is processed with one of 4 methods. See the code below.
- win10, anaconda 4.8.3, python 3.8.3
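The code in this post only collects the distinct lines. The counting part of the task could be sketched with `collections.Counter` from the standard library. This is a minimal sketch, not the author's code: the function name `count_lines` is hypothetical, and stripping the trailing newline is an assumption about what counts as the line's content.

```python
from collections import Counter


def count_lines(path):
    """Return a Counter mapping each distinct line to its number of occurrences.

    `count_lines` is a hypothetical helper, not part of the original post.
    """
    with open(path, "r") as f:
        # Iterating over the file object yields one line at a time, so the
        # whole file is never held in memory at once.
        # Assumption: the trailing newline is not part of the line's content.
        return Counter(line.rstrip("\n") for line in f)
```

The keys of the returned `Counter` are the distinct lines, so `len(counts)` gives the number of distinct lines and `counts[some_line]` gives how often that line appears.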
Result:
- set_add_if() takes the shortest time.
- In other words, even without thinking about efficiency, following the usual check-then-add flowchart gives the fastest program here. (This applies to large amounts of data. With little data, nobody cares about efficiency.)
#
# Compare the time taken by set().add(newitem) with
# "if newitem is not in the set, then set().add(newitem)".
# Conclusion: set_add_if() is the fastest way to check 2M+ lines.
#
test_set = set()  # shared by all four functions; not cleared between timings
test_file = "test_set_if.dxf"


def set_add_directly():
    """Add every line to the set without checking first."""
    global test_set
    with open(test_file, 'r') as f:
        line_tmp = f.readline()
        while line_tmp:
            test_set.add(line_tmp)
            line_tmp = f.readline()


def set_add_if():
    """Add a line only if it is not already in the set."""
    global test_set
    with open(test_file, 'r') as f:
        line_tmp = f.readline()
        while line_tmp:
            if line_tmp not in test_set:
                test_set.add(line_tmp)
            line_tmp = f.readline()


def set_line_split():
    """Read all lines, drop the trailing newline, then build the set."""
    global test_set
    with open(test_file, 'r') as f:
        lines = f.readlines()
    # The posted code had `line.split` (never called), which stores method
    # objects instead of text. Calling split() would give unhashable lists,
    # so removing the trailing newline is assumed to be the intent.
    linelist = [line.rstrip('\n') for line in lines]
    test_set = set(linelist)
    # print('set function:', test_set, flush=True)


def set_f():  # each element keeps its trailing \n
    """Build the set directly from f.readlines()."""
    global test_set
    with open(test_file, 'r') as f:
        test_set = set(f.readlines())


# main
if __name__ == '__main__':
    from timeit import Timer

    # One round takes more than 1 s, so use timeit(1) instead of 10000.
    timer1 = Timer('set_add_directly()', 'from __main__ import set_add_directly')
    t1 = timer1.timeit(1)
    timer2 = Timer('set_add_if()', 'from __main__ import set_add_if')
    t2 = timer2.timeit(1)
    timer3 = Timer('set_line_split()', 'from __main__ import set_line_split')
    t3 = timer3.timeit(1)
    timer4 = Timer('set_f()', 'from __main__ import set_f')
    t4 = timer4.timeit(1)

    print('set_add_directly - set_add_if:', t1 - t2, flush=True)
    print('set_add_directly - set_line_split:', t1 - t3, flush=True)
    print('set_line_split - set_add_if:', t3 - t2, flush=True)
    print('set_f - set_add_if:', t4 - t2, flush=True)
- Output of one run (time differences in seconds):
---------- Python ----------
set_add_directly - set_add_if: 0.06032050000000011
set_add_directly - set_line_split: -0.24342030000000003
set_line_split - set_add_if: 0.30374080000000014
set_f - set_add_if: 0.10101979999999977
Output completed (3 sec consumed) - Normal Termination
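In the script above, `test_set` is a module-level set that is not cleared between timings, so `set_add_if()` starts with the set already filled by `set_add_directly()`, and each function is timed only once. One way to compare the approaches on equal footing might look like the sketch below. It rests on assumptions: the data is a synthetic in-memory list rather than the author's test file, and the function names and the sizes (200,000 lines, 1,000 distinct values) are hypothetical. It makes no claim about which variant wins, since that depends on the data and the Python version.

```python
import timeit

# Synthetic stand-in for the test file (an assumption): 200,000 lines
# drawn from 1,000 distinct values, so most lines are repeats.
LINES = [f"line {i % 1000}\n" for i in range(200_000)]


def add_directly(lines):
    """Add every line without checking first."""
    seen = set()  # fresh set on every call
    for line in lines:
        seen.add(line)
    return seen


def add_if_missing(lines):
    """Add a line only when it is not already in the set."""
    seen = set()  # fresh set on every call
    for line in lines:
        if line not in seen:
            seen.add(line)
    return seen


def build_from_iterable(lines):
    """Let the set constructor consume the whole list."""
    return set(lines)


if __name__ == "__main__":
    for func in (add_directly, add_if_missing, build_from_iterable):
        # repeat() returns one total time per repetition; the minimum is
        # usually the least affected by background noise.
        best = min(timeit.repeat(lambda: func(LINES), number=5, repeat=3))
        print(f"{func.__name__}: {best:.4f} s for 5 calls")
```

All three functions return the same set, so any difference in the printed times comes from how the set is built rather than from what it contains.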