Efficiency difference: calling set.add() directly vs. calling set.add() after a membership check
2022-07-05 07:26:00 【work-harder】
Background:
- A test file has more than 2 million lines. The task is to find the distinct lines and count how often each occurs, line by line. The collection is a set: lineset = set() (named test_set in the code below).
- After reading each line with linetmp = f.readline(), it is processed with one of 4 methods. See the code below.
- Environment: Windows 10, Anaconda 4.8.3, Python 3.8.3
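The background mentions counting how often each distinct line occurs, which a plain set cannot record on its own. A minimal sketch of one way to do that, assuming `collections.Counter` is acceptable; `count_lines` and the in-memory `sample` data are hypothetical names used only for illustration, not part of the original script:

```python
from collections import Counter
import io


def count_lines(lines):
    """Map each distinct line (trailing newline removed) to its number of occurrences."""
    return Counter(line.rstrip("\n") for line in lines)


# Hypothetical stand-in for the real test file. With a file on disk,
# pass the open file object instead of this StringIO.
sample = io.StringIO("LINE\n0\nLINE\n8\n0\nLINE\n")
counts = count_lines(sample)

print(len(counts))            # number of distinct lines
print(counts.most_common())   # (line, count) pairs, most frequent first
```

Iterating over the file object reads one line at a time, so the whole file does not have to be held in memory.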
Result:
- set_add_if() takes the shortest time.
- In other words, even without knowing the efficiency details, following the ordinary check-then-add flow gives the most efficient program. (This applies to large inputs. With small inputs, nobody cares about efficiency anyway.)
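Because set.add() ignores an element that is already present, adding directly and checking first should produce the same set, so the comparison is only about time. A small sketch that checks this; `add_directly`, `add_if_absent` and `data` are hypothetical names for illustration:

```python
def add_directly(items):
    """Add every item; duplicates are ignored by the set itself."""
    result = set()
    for item in items:
        result.add(item)
    return result


def add_if_absent(items):
    """Test membership first and add only items not yet in the set."""
    result = set()
    for item in items:
        if item not in result:
            result.add(item)
    return result


data = ["a\n", "b\n", "a\n", "c\n", "b\n"]
same = add_directly(data) == add_if_absent(data)
print(same)
```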
#
# Compare the time of set().add(newitem) against
# "if newitem not in set, then set().add(newitem)".
# Conclusion: set_add_if() is the best way to check 2M+ lines.
#
from timeit import Timer

test_set = set()
test_file = "test_set_if.dxf"


def set_add_directly():
    # Add every line; the set itself ignores duplicates.
    global test_set
    with open(test_file, 'r') as f:
        line_tmp = f.readline()
        while line_tmp:
            test_set.add(line_tmp)
            line_tmp = f.readline()


def set_add_if():
    # Check membership first and add only lines not yet in the set.
    global test_set
    with open(test_file, 'r') as f:
        line_tmp = f.readline()
        while line_tmp:
            if line_tmp not in test_set:
                test_set.add(line_tmp)
            line_tmp = f.readline()


def set_line_split():
    # Read all lines, strip each one, then build the set in one call.
    global test_set
    with open(test_file, 'r') as f:
        lines = f.readlines()
    # The original wrote `line.split` without calling it, which stores bound
    # method objects instead of text; strip() is assumed to be the intent.
    linelist = [line.strip() for line in lines]
    test_set = set(linelist)
    # print('set function:', test_set, flush=True)


def set_f():  # each element keeps its trailing \n; more time is needed.
    global test_set
    with open(test_file, 'r') as f:
        test_set = set(f.readlines())


# main
if __name__ == '__main__':
    # Note: test_set is shared and never cleared between timings, so
    # set_add_if() runs against the set already filled by set_add_directly().
    timer1 = Timer('set_add_directly()', 'from __main__ import set_add_directly')
    t1 = timer1.timeit(1)  # one round takes more than 1 s, so timeit(1), not 10000.
    timer2 = Timer('set_add_if()', 'from __main__ import set_add_if')
    t2 = timer2.timeit(1)
    timer3 = Timer('set_line_split()', 'from __main__ import set_line_split')
    t3 = timer3.timeit(1)
    timer4 = Timer('set_f()', 'from __main__ import set_f')
    t4 = timer4.timeit(1)
    print('set_add_directly - set_add_if:', t1-t2, flush=True)
    print('set_add_directly - set_line_split:', t1-t3, flush=True)
    print('set_line_split - set_add_if:', t3-t2, flush=True)
    print('set_f - set_add_if:', t4-t2, flush=True)
- Results of one run:
---------- Python ----------
set_add_directly - set_add_if: 0.06032050000000011
set_add_directly - set_line_split: -0.24342030000000003
set_line_split - set_add_if: 0.30374080000000014
set_f - set_add_if: 0.10101979999999977
Output completed (3 sec consumed) - Normal Termination
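The timings above come from a single run in which all four functions share one global set and read the same file one after another, so later functions may benefit from an already filled set and a warm file cache. A hedged sketch of a more isolated comparison follows. It is not the author's method: it assumes synthetic in-memory lines instead of the .dxf file, starts each run from an empty set, and takes the minimum of several repeats. `make_lines`, `best_time` and the sizes are hypothetical choices:

```python
import timeit


def make_lines(total=200_000, distinct=1_000):
    """Build synthetic lines with many repeats (assumed stand-in for the test file)."""
    return [f"line {i % distinct}\n" for i in range(total)]


def add_directly(lines):
    s = set()
    for line in lines:
        s.add(line)
    return s


def add_if_absent(lines):
    s = set()
    for line in lines:
        if line not in s:
            s.add(line)
    return s


def build_at_once(lines):
    return set(lines)


def best_time(func, lines, repeat=5):
    """Return the fastest of `repeat` single runs; each run starts from an empty set."""
    return min(timeit.repeat(lambda: func(lines), number=1, repeat=repeat))


if __name__ == "__main__":
    lines = make_lines()
    for func in (add_directly, add_if_absent, build_at_once):
        print(f"{func.__name__}: {best_time(func, lines):.4f} s")
```

Which variant comes out ahead is likely to depend on the share of duplicate lines and on the Python version, so the numbers are worth measuring on the real file as well.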