Efficiency difference: calling set.add() directly vs. checking membership before calling add()
2022-07-05 07:26:00 【work-harder】
Background:
- A test file has more than 2 million lines. The task is to find the distinct lines and how many times each one occurs, working line by line. Distinct lines are collected in a set, lineset = set() (named test_set in the code below).
- Each line is read with linetmp = f.readline() and then processed in one of 4 ways. See the code below.
- Environment: Windows 10, Anaconda 4.8.3, Python 3.8.3
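A set only records which lines are distinct, not how often each one occurs. As a minimal sketch of the counting part of the task, assuming the standard library's collections.Counter is acceptable here, the following counts each distinct line. The function name count_lines and the sample data are hypothetical and are not part of the original script.

```python
from collections import Counter
import io


def count_lines(f):
    """Map each distinct line (without its trailing newline) to its count."""
    counts = Counter()
    for line in f:
        counts[line.rstrip("\n")] += 1
    return counts


# Hypothetical sample data standing in for the real test file.
sample = io.StringIO("LINE\n0\nLINE\n8\n0\nLINE\n")
line_counts = count_lines(sample)

print(len(line_counts))            # number of distinct lines
print(line_counts.most_common(1))  # the most frequent line and its count
```

The keys of the Counter are the same distinct lines a set would hold, so the set can be recovered with set(line_counts) if it is needed separately.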
Result:
- set_add_if() takes the shortest time.
- In other words, even without knowing anything about the internals, following the ordinary "check first, then add" flowchart gives the most efficient program in this test. (This applies to large data sets. With little data, nobody cares about efficiency.)
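How the two strategies compare is likely to depend on how many of the lines are duplicates, because the membership test skips the add() call for lines that are already in the set. The sketch below is one way to check this in memory, without file I/O. The function names and the synthetic data are assumptions made for illustration, and the timings will vary by machine and Python version.

```python
import timeit


def add_directly(items):
    """Add every item; the set discards duplicates itself."""
    seen = set()
    for item in items:
        seen.add(item)
    return seen


def add_if_missing(items):
    """Test membership first and call add() only for new items."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
    return seen


# Hypothetical in-memory data: one list with heavy repetition, one with none.
mostly_duplicates = [str(i % 100) for i in range(200_000)]
all_unique = [str(i) for i in range(200_000)]

for name, data in (("mostly duplicates", mostly_duplicates),
                   ("all unique", all_unique)):
    # Best of 3 runs, each on a fresh set, to reduce timing noise.
    t_direct = min(timeit.repeat(lambda: add_directly(data), number=1, repeat=3))
    t_check = min(timeit.repeat(lambda: add_if_missing(data), number=1, repeat=3))
    print(f"{name}: direct={t_direct:.4f}s check-first={t_check:.4f}s")
```

Both functions return the same set, so the choice between them is purely a matter of speed.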
#
# Compare the time taken by set().add(newitem) with
# "if newitem is not in the set, then set().add(newitem)".
# Conclusion: set_add_if() is the fastest way to check 2M+ lines.
#
# Shared global: set_add_directly() and set_add_if() add to it without clearing it first.
test_set = set()
test_file = "test_set_if.dxf"
def set_add_directly():
    global test_set
    with open(test_file, 'r') as f:
        line_tmp = f.readline()
        while line_tmp:
            test_set.add(line_tmp)
            line_tmp = f.readline()

def set_add_if():
    global test_set
    with open(test_file, 'r') as f:
        line_tmp = f.readline()
        while line_tmp:
            if line_tmp not in test_set:
                test_set.add(line_tmp)
            line_tmp = f.readline()
def set_line_split():
    global test_set
    with open(test_file, 'r') as f:
        lines = f.readlines()
    # Strip surrounding whitespace, including the trailing newline, from each line.
    linelist = [line.strip() for line in lines]
    test_set = set(linelist)
    # print('set function:', test_set, flush=True)

def set_f():  # each element keeps its trailing \n; this takes more time.
    global test_set
    with open(test_file, 'r') as f:
        test_set = set(f.readlines())
# main
if __name__ == '__main__':
    from timeit import Timer

    timer1 = Timer('set_add_directly()', 'from __main__ import set_add_directly')
    t1 = timer1.timeit(1)  # a single run takes more than 1 s, so use timeit(1), not 10000.
    timer2 = Timer('set_add_if()', 'from __main__ import set_add_if')
    t2 = timer2.timeit(1)
    timer3 = Timer('set_line_split()', 'from __main__ import set_line_split')
    t3 = timer3.timeit(1)
    timer4 = Timer('set_f()', 'from __main__ import set_f')
    t4 = timer4.timeit(1)

    # Each value is a time difference in seconds; a positive value means the second function was faster.
    print('set_add_directly - set_add_if:', t1-t2, flush=True)
    print('set_add_directly - set_line_split:', t1-t3, flush=True)
    print('set_line_split - set_add_if:', t3-t2, flush=True)
    print('set_f - set_add_if:', t4-t2, flush=True)
- Results of one run:
---------- Python ----------
set_add_directly - set_add_if: 0.06032050000000011
set_add_directly - set_line_split: -0.24342030000000003
set_line_split - set_add_if: 0.30374080000000014
set_f - set_add_if: 0.10101979999999977
Output completed (3 sec consumed) - Normal Termination
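A single timed run per function is sensitive to disk caching and to whatever else the machine is doing. As a sketch of a more repeatable measurement, assuming each function builds and returns its own set instead of sharing a global, the following uses timeit.repeat and keeps the best of several runs. The function names and the temporary sample file are hypothetical stand-ins, not the original test_set_if.dxf.

```python
import os
import tempfile
import timeit


def build_direct(path):
    """Add every line to a fresh set."""
    seen = set()
    with open(path, "r") as f:
        for line in f:  # iterating a file object yields one line at a time
            seen.add(line)
    return seen


def build_check_first(path):
    """Add a line to a fresh set only if it is not already present."""
    seen = set()
    with open(path, "r") as f:
        for line in f:
            if line not in seen:
                seen.add(line)
    return seen


# Hypothetical stand-in for the real test file: 100,000 lines, 50 distinct values.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"{i % 50}\n" for i in range(100_000))
    sample_path = tmp.name

try:
    results = {}
    for func in (build_direct, build_check_first):
        func(sample_path)  # warm-up run so both functions see a cached file
        results[func.__name__] = min(
            timeit.repeat(lambda: func(sample_path), number=1, repeat=5)
        )
    distinct = build_direct(sample_path)
    same = distinct == build_check_first(sample_path)
finally:
    os.remove(sample_path)

for name, seconds in results.items():
    print(f"{name}: {seconds:.4f}s")
```

Taking the minimum of several runs is the approach the timeit documentation suggests for comparing snippets, since higher values usually reflect interference from other processes rather than the code itself.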