Efficiency difference: calling set.add() directly vs. calling it after a membership check
2022-07-05 07:26:00 【work-harder】
Background:
- A test file has more than 2 million lines. The task is to find, line by line, the distinct lines and how many times each one occurs, using a set: lineset = set()
- The lines are read (with f.readline() or f.readlines()) and processed with four different methods; see the code below.
- Windows 10, Anaconda 4.8.3, Python 3.8.3
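A set only answers "which lines are distinct"; counting how often each line occurs needs a mapping from line to count. A minimal sketch, swapping in the standard-library collections.Counter (not used in the original post); the function name count_lines and the sample data are hypothetical:

```python
from collections import Counter
from typing import Iterable


def count_lines(lines: Iterable[str]) -> Counter:
    """Map each distinct line to the number of times it occurs."""
    # Counter(iterable) tallies every element; its keys are the distinct lines.
    return Counter(lines)


if __name__ == "__main__":
    # Hypothetical stand-in for the lines read from the test file.
    sample = ["a\n", "b\n", "a\n", "c\n", "a\n"]
    counts = count_lines(sample)
    print(len(counts))    # number of distinct lines -> 3
    print(counts["a\n"])  # occurrences of "a\n" -> 3
```

For the real file, the open file object can be passed directly, because iterating over it yields one line at a time: with open(test_file) as f: counts = count_lines(f)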
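Results:
- set_add_if() took the shortest time in the measured run.
- In other words, even without knowing how the efficiency works internally, a plain "check, then add" flow gave the fastest small program for a large amount of data. (With little data, nobody cares about efficiency anyway.) One caveat: in the script below, set_add_if() runs right after set_add_directly() on the same global test_set, so every "line_tmp not in test_set" check is already False and set_add_if() never calls add(). The measured gap therefore compares building the set against only looking lines up.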
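In the script below, all four functions share the global test_set, and each one also pays for reading the file. To compare only the two add styles, each call can start from an empty set and work on the same in-memory data. A minimal sketch using timeit.repeat; the synthetic data (100,000 lines drawn from 20,000 distinct values) and the function names are assumptions, and no particular timing result is implied:

```python
import functools
import random
import timeit

# Hypothetical synthetic input: many repeated lines, standing in for a large file.
random.seed(0)
LINES = [f"line {random.randrange(20_000)}\n" for _ in range(100_000)]


def add_directly(lines):
    """Call add() for every line; add() on an existing element changes nothing."""
    s = set()
    for line in lines:
        s.add(line)
    return s


def add_if(lines):
    """Test membership first and call add() only for lines not seen yet."""
    s = set()
    for line in lines:
        if line not in s:
            s.add(line)
    return s


def build_at_once(lines):
    """Let the set() constructor run the loop internally."""
    return set(lines)


if __name__ == "__main__":
    for fn in (add_directly, add_if, build_at_once):
        # Every call creates a fresh, empty set, so no run reuses a set filled
        # by an earlier one; min() over several repeats reduces timing noise.
        best = min(timeit.repeat(functools.partial(fn, LINES), number=5, repeat=3))
        print(f"{fn.__name__:>14}: {best:.4f} s per 5 calls")
```

Which version wins can depend on the data: for a line already in the set, "line not in s" avoids a method call, while for a new line add_if() looks it up twice (once for the test, once inside add()). The share of repeated lines in the real file therefore matters, and set(lines) skips the Python-level loop entirely.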
#
# Compare the run time of test_set.add(newitem) with
# if newitem not in test_set: test_set.add(newitem)
# Conclusion: set_add_if() was the fastest in the measured run on a 2M+ line file.
#
test_set = set()
test_file = "test_set_if.dxf"

def set_add_directly():
    global test_set
    with open(test_file, 'r') as f:
        line_tmp = f.readline()
        while line_tmp:              # readline() returns '' at end of file
            test_set.add(line_tmp)   # add() on an existing element is a no-op
            line_tmp = f.readline()

def set_add_if():
    global test_set
    # When run after set_add_directly(), test_set is already full.
    with open(test_file, 'r') as f:
        line_tmp = f.readline()
        while line_tmp:
            if line_tmp not in test_set:
                test_set.add(line_tmp)
            line_tmp = f.readline()

def set_line_split():
    global test_set
    with open(test_file, 'r') as f:
        lines = f.readlines()
    # strip the trailing '\n' from each line before building the set
    linelist = [line.rstrip('\n') for line in lines]
    test_set = set(linelist)
    # print('set function:', test_set, flush=True)

def set_f():  # every element keeps its trailing '\n'
    global test_set
    with open(test_file, 'r') as f:
        test_set = set(f.readlines())

# main
if __name__ == '__main__':
    from timeit import Timer
    # one round takes more than 1 s, so timeit(1) rather than timeit(10000)
    timer1 = Timer('set_add_directly()', 'from __main__ import set_add_directly')
    t1 = timer1.timeit(1)
    timer2 = Timer('set_add_if()', 'from __main__ import set_add_if')
    t2 = timer2.timeit(1)
    timer3 = Timer('set_line_split()', 'from __main__ import set_line_split')
    t3 = timer3.timeit(1)
    timer4 = Timer('set_f()', 'from __main__ import set_f')
    t4 = timer4.timeit(1)
    print('set_add_directly - set_add_if:', t1-t2, flush=True)
    print('set_add_directly - set_line_split:', t1-t3, flush=True)
    print('set_line_split - set_add_if:', t3-t2, flush=True)
    print('set_f - set_add_if:', t4-t2, flush=True)
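If the trailing newline should not be part of each element (the point noted on set_f() above), it can be stripped while the set is built, iterating over the file object directly instead of calling readline() in a loop or readlines(). A minimal sketch; the function name set_stripped and the demo file contents are hypothetical:

```python
import os
import tempfile


def set_stripped(path):
    """Return the distinct lines of a file, each without its trailing newline."""
    with open(path, "r") as f:
        # Iterating over a file yields one line at a time, so the whole file
        # is never held as a list the way f.readlines() does.
        return {line.rstrip("\n") for line in f}


if __name__ == "__main__":
    # Hypothetical demo file with repeated lines.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("0\nLINE\n0\nLINE\nEOF\n")
        path = tmp.name
    try:
        print(sorted(set_stripped(path)))  # -> ['0', 'EOF', 'LINE']
    finally:
        os.remove(path)
```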
- Output of one run:
---------- Python ----------
set_add_directly - set_add_if: 0.06032050000000011
set_add_directly - set_line_split: -0.24342030000000003
set_line_split - set_add_if: 0.30374080000000014
set_f - set_add_if: 0.10101979999999977
Output completed (3 sec consumed) - Normal Termination