当前位置:网站首页>Interpretation of iterator related "itertools" module usage
Interpretation of iterator related "itertools" module usage
2022-07-06 13:58:00 【Algorithm channel】
0 Preface
When it comes to processing cycles , We are used to using for, while etc. , For example, print the characters in each list in turn :
lis = ['I', 'love', 'python']
for i in lis:
print(i)
I
love
python
When the number of bytes of printed content is small , When all is loaded in memory , Re print , No problem . But , If there are millions of vehicle tracks now , Ask you to analyze the travel rules of each of them , Traffic jams, etc , If it's on a single computer .
You may have to face... First , It can also be ignored by you , After the final code is written , A problem that could be exposed :outofmemory
, This is often encountered in practical projects .
This question reminds us of , While processing data , How to write programs that make efficient use of memory , It's very important . today , Let's explore how to use memory efficiently , Save memory and get things done .
Actually ,Python There is a module ready to deal with this , It is itertools
modular , The functions of these functions are actually well understood .
I'm not going to give you a general idea of what they can do , I want to analyze the implementation code behind these functions , How to save memory efficiently ,Python How did the kernel contributors write beautiful code , This is very interesting. , isn't it? ?
OK,let's go. Hope you enjoy the journey!
1 Patchwork elements
itertools Medium chain Function implementation element splicing , The prototype is as follows , Parameters * Represents a variable number of parameters
chain
(iterables)
The application is as follows :
In [33]: list(chain(['I','love'],['python'],['very', 'much']))
Out[33]: ['I', 'love', 'python', 'very', 'much']
wow , It's no longer easy to use , It's a little join The smell of , But compared to join strong , It focuses on the fact that the parameters are all iteratable instances .
that ,chain How to achieve efficient memory saving ?chain The general implementation code is as follows :
def chain(*iterables):
for it in iterables:
for element in it:
yield element
The above code is not difficult to understand ,chain Essence returns a generator , So it actually reads one element at a time into memory , So save the most memory
.
2 One by one
Returns the cumulative total value of the list , Prototype :
accumulate
(iterable[, func, *, initial=None])
The application is as follows :
In [36]: list(accumulate([1,2,3,4,5,6],lambda x,y: x*y))
Out[36]: [1, 2, 6, 24, 120, 720]
accumulate The general implementation code is as follows :
def accumulate(iterable, func=operator.add, *, initial=None):
it = iter(iterable)
total = initial
if initial is None:
try:
total = next(it)
except StopIteration:
return
yield total
for element in it:
total = func(total, element)
yield total
Above code , Are you all right? ? And chain ordinary yield Different , It's a little more complicated here ,yield It's kind of like return, therefore yield total
That line directly returns an element , That is to say iterable The first element of , Because the first element returned by this function at any time is its first . Again because yield Back to a generator object , Such as name gen, therefore next(gen) when , The code will execute to for element in it:
This line , And the iterator at this time it Has pointed to iterable The second element of ,OK, I believe you understand !
3 Funnel screening
It is compress function , Function is similar to funnel function , So I call it funnel screening , Prototype :
compress
(data, selectors)
In [38]: list(compress('abcdefg',[1,1,0,1]))
Out[38]: ['a', 'b', 'd']
Easy to see ,compress The number of elements returned is equal to the shorter list length of the two parameters .
Its general implementation code :
def compress(data, selectors):
return (d for d, s in zip(data, selectors) if s)
This function is very useful
4 Segment screening
Scan list , Start to hold back if you don't meet the conditions , The prototype is as follows :
dropwhile
(predicate, iterable)
Application examples :
In [39]: list(dropwhile(lambda x: x<3,[1,0,2,4,1,1,3,5,-5]))
Out[39]: [4, 1, 1, 3, 5, -5]
The general code to implement it is as follows :
def dropwhile(predicate, iterable):
iterable = iter(iterable)
for x in iterable:
if not predicate(x):
yield x
break
for x in iterable:
yield x
5 Segment screening 2
Scan list , Return elements from iteratable objects as long as conditions are met , Until the conditions are not met , The prototype is as follows :
takewhile
(predicate, iterable)
Application examples :
In [43]: list(takewhile(lambda x: x<5, [1,4,6,4,1]))
Out[43]: [1, 4]
The general code to implement it is as follows :
def takewhile(predicate, iterable):
for x in iterable:
if predicate(x):
yield x
else:
break # Return immediately
6 Selection of defective products
Scan list , Keep... As long as you don't meet the conditions , The prototype is as follows :
dropwhile
(predicate, iterable)
Application examples :
In [40]: list(filterfalse(lambda x: x%2==0, [1,2,3,4,5,6]))
Out[40]: [1, 3, 5]
The general code to implement it is as follows :
def dropwhile(predicate, iterable):
iterable = iter(iterable)
for x in iterable:
if not predicate(x):
yield x
break
for x in iterable:
yield x
7 Slice screening
Python General slicing operation in , such as :
lis = [1,3,2,1]
lis[:1]
Their defect is still lis You have to load all the memory , So more memory saving operations islice, The prototype is as follows :
islice
(iterable, start, stop[, step])
Application examples :
In [41]: list(islice('abcdefg',1,4,2))
Out[41]: ['b', 'd']
The general code to implement it is as follows :
def islice(iterable, *args):
s = slice(*args)
start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
it = iter(range(start, stop, step))
try:
nexti = next(it)
except StopIteration:
for i, element in zip(range(start), iterable):
pass
return
try:
for i, element in enumerate(iterable):
if i == nexti:
yield element
nexti = next(it)
except StopIteration:
for i, element in zip(range(i + 1, stop), iterable):
pass
Clever use of the generator will throw an exception at the end of the iteration StopIteration
, Do something about boundary handling .
8 cell division
tee The function is similar to the well-known cell division , It can replicate the original iterator n individual , The prototype is as follows :
tee
(iterable, n=2)
The application is as follows , It can be seen that the two iterators are independent
a = tee([1,4,6,4,1],2)
In [51]: next(a[0])
Out[51]: 1
In [52]: next(a[1])
Out[52]: 1
The code to implement it is as follows :
def tee(iterable, n=2):
it = iter(iterable)
deques = [collections.deque() for i in range(n)]
def gen(mydeque):
while True:
if not mydeque:
try:
newval = next(it)
except StopIteration:
return
for d in deques:
d.append(newval)
yield mydeque.popleft()
return tuple(gen(d) for d in deques)
tee The implementation uses a queue type internally deques, At first, an empty queue is generated , Add elements... To each of the copied queues newval, meanwhile yield Currently called mydeque The leftmost element in .
9 map variant
starmap You can view it as map A variation of the , It can save more memory , meanwhile iterable The element of must also be an iteratable object , The prototype is as follows :
starmap
(function, iterable)
Apply it :
In [63]: list(starmap(lambda x,y: str(x)+'-'+str(y), [('a',1),('b',2),('c',3)]))
Out[63]: ['a-1', 'b-2', 'c-3']
starmap The implementation details are as follows :
def starmap(function, iterable):
for args in iterable:
yield function(*args)
10 Copy elements
repeat Implement copy elements n Time , The prototype is as follows :
repeat
(object[, times])
The application is as follows :
In [66]: list(repeat(6,3))
Out[66]: [6, 6, 6]
In [67]: list(repeat([1,2,3],2))
Out[67]: [[1, 2, 3], [1, 2, 3]]
Its implementation details are as follows :
def repeat(object, times=None):
if times is None:# If times Not set up , Will always repeat down
while True:
yield object
else:
for i in range(times):
yield object
11 The cartesian product
The effect of Cartesian product is the same as that of :
((x,y) for x in A for y in B)
therefore , The effect of Cartesian product is as follows :
In [68]: list(product('ABCD', 'xy'))
Out[68]:
[('A', 'x'),
('A', 'y'),
('B', 'x'),
('B', 'y'),
('C', 'x'),
('C', 'y'),
('D', 'x'),
('D', 'y')]
Its implementation details :
def product(*args, repeat=1):
pools = [tuple(pool) for pool in args] * repeat
result = [[]]
for pool in pools:
result = [x+[y] for x in result for y in pool]
for prod in result:
yield tuple(prod)
12 Enhanced Edition zip
Combination value . If the length of the iteratable object is not aligned , Based on the fillvalue Fill in missing values , Be careful : Iterations continue to the longest consuming iteratable object
, The effect is as follows :
In [69]: list(zip_longest('ABCD', 'xy', fillvalue='-'))
Out[69]: [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
Its implementation details :
def zip_longest(*args, fillvalue=None):
iterators = [iter(it) for it in args]
num_active = len(iterators)
if not num_active:
return
while True:
values = []
for i, it in enumerate(iterators):
try:
value = next(it)
except StopIteration:
num_active -= 1
if not num_active:
return
iterators[i] = repeat(fillvalue)
value = fillvalue
values.append(value)
yield tuple(values)
It uses repeat, That is, when the length of the iteratable object is not aligned , according to fillvalue Fill in missing values . The key to understanding the code above is the iterator object (iter),next The particularity of the method :
In [74]: for i, it in enumerate([iter([1,2,3]),iter(['x','y'])]):
...: print(next(it))
# Output :
1
x
Combined with this prompt to understand the above code , It's not going to work .
summary
Python Of itertools Memory efficient iterator provided by the module , Basically, the implementation inside is based on the generator , So on the one hand, understand this 12 The basic functions implemented by functions , At the same time, it can also deepen the generator (generator) The understanding of the , Write for us more efficiently 、 concise 、 Beautiful code laid a solid foundation .
Read more :
The most straightforward explanation of machine learning , Just look at this article !
Python List generator 12 A little function , Which ones do you often use ?
Python The most underrated Library , Use it well and improve efficiency 10 times !
Long press QR code , Follow my public number
边栏推荐
- [面試時]——我如何講清楚TCP實現可靠傳輸的機制
- 7-14 error ticket (PTA program design)
- 【数据库 三大范式】一看就懂
- Programme de jeu de cartes - confrontation homme - machine
- Detailed explanation of redis' distributed lock principle
- 7-9 make house number 3.0 (PTA program design)
- Principles, advantages and disadvantages of two persistence mechanisms RDB and AOF of redis
- Middleware vulnerability recurrence Apache
- Record a penetration of the cat shed from outside to inside. Library operation extraction flag
- [the Nine Yang Manual] 2019 Fudan University Applied Statistics real problem + analysis
猜你喜欢
1143_ SiCp learning notes_ Tree recursion
【黑马早报】上海市监局回应钟薛高烧不化;麦趣尔承认两批次纯牛奶不合格;微信内测一个手机可注册俩号;度小满回应存款变理财产品...
强化學習基礎記錄
2022 Teddy cup data mining challenge question C idea and post game summary
canvas基础2 - arc - 画弧线
Relationship between hashcode() and equals()
FAQs and answers to the imitation Niuke technology blog project (III)
Principles, advantages and disadvantages of two persistence mechanisms RDB and AOF of redis
PriorityQueue (large root heap / small root heap /topk problem)
HackMyvm靶机系列(7)-Tron
随机推荐
7-3 构造散列表(PTA程序设计)
7-1 output all primes between 2 and n (PTA programming)
Leetcode.3 无重复字符的最长子串——超过100%的解法
Strengthen basic learning records
Experiment 6 inheritance and polymorphism
[面試時]——我如何講清楚TCP實現可靠傳輸的機制
Get started with typescript
MATLAB打开.m文件乱码解决办法
[insert, modify and delete data in the headsong educator data table]
实验六 继承和多态
7-6 矩阵的局部极小值(PTA程序设计)
【黑马早报】上海市监局回应钟薛高烧不化;麦趣尔承认两批次纯牛奶不合格;微信内测一个手机可注册俩号;度小满回应存款变理财产品...
Force deduction 152 question multiplier maximum subarray
实验四 数组
Zatan 0516
Strengthen basic learning records
[during the interview] - how can I explain the mechanism of TCP to achieve reliable transmission
Leetcode. 3. Longest substring without repeated characters - more than 100% solution
记一次猫舍由外到内的渗透撞库操作提取-flag
1. First knowledge of C language (1)