当前位置：网站首页>Interpretation of iterator related "itertools" module usage

Interpretation of iterator related "itertools" module usage

2022-07-06 13:58:00 【Algorithm channel】

0 Preface

When it comes to processing cycles , We are used to using for, while etc. , For example, print the characters in each list in turn ：

lis = ['I', 'love', 'python']
for i in lis:
    print(i)
I
love
python

When the number of bytes of printed content is small , When all is loaded in memory , Re print , No problem . But , If there are millions of vehicle tracks now , Ask you to analyze the travel rules of each of them , Traffic jams, etc , If it's on a single computer .

You may have to face... First , It can also be ignored by you , After the final code is written , A problem that could be exposed :outofmemory, This is often encountered in practical projects .

This question reminds us of , While processing data , How to write programs that make efficient use of memory , It's very important . today , Let's explore how to use memory efficiently , Save memory and get things done .

Actually ,Python There is a module ready to deal with this , It is itertools modular , The functions of these functions are actually well understood .

I'm not going to give you a general idea of what they can do , I want to analyze the implementation code behind these functions , How to save memory efficiently ,Python How did the kernel contributors write beautiful code , This is very interesting. , isn't it? ？

OK,let's go. Hope you enjoy the journey!

1 Patchwork elements

itertools Medium chain Function implementation element splicing , The prototype is as follows , Parameters * Represents a variable number of parameters

chain(iterables)

The application is as follows ：

In [33]: list(chain(['I','love'],['python'],['very', 'much']))
Out[33]: ['I', 'love', 'python', 'very', 'much']

wow , It's no longer easy to use , It's a little join The smell of , But compared to join strong , It focuses on the fact that the parameters are all iteratable instances .

that ,chain How to achieve efficient memory saving ？chain The general implementation code is as follows ：

def chain(*iterables):
    for it in iterables:
        for element in it:
            yield element

The above code is not difficult to understand ,chain Essence returns a generator , So it actually reads one element at a time into memory , So save the most memory .

2 One by one

Returns the cumulative total value of the list , Prototype ：

accumulate(iterable[, func, *, initial=None])

The application is as follows ：

In [36]: list(accumulate([1,2,3,4,5,6],lambda x,y: x*y))
Out[36]: [1, 2, 6, 24, 120, 720]

accumulate The general implementation code is as follows ：

def accumulate(iterable, func=operator.add, *, initial=None):
    it = iter(iterable)
    total = initial
    if initial is None:
        try:
            total = next(it)
        except StopIteration:
            return
    yield total
    for element in it:
        total = func(total, element)
        yield total

Above code , Are you all right? ？ And chain ordinary yield Different , It's a little more complicated here ,yield It's kind of like return, therefore yield total That line directly returns an element , That is to say iterable The first element of , Because the first element returned by this function at any time is its first . Again because yield Back to a generator object , Such as name gen, therefore next(gen) when , The code will execute to for element in it: This line , And the iterator at this time it Has pointed to iterable The second element of ,OK, I believe you understand ！

3 Funnel screening

It is compress function , Function is similar to funnel function , So I call it funnel screening , Prototype ：

compress(data, selectors)

In [38]: list(compress('abcdefg',[1,1,0,1]))
Out[38]: ['a', 'b', 'd']

Easy to see ,compress The number of elements returned is equal to the shorter list length of the two parameters .

Its general implementation code ：

def compress(data, selectors):
    return (d for d, s in zip(data, selectors) if s)

This function is very useful

4 Segment screening

Scan list , Start to hold back if you don't meet the conditions , The prototype is as follows ：

dropwhile(predicate, iterable)

Application examples ：

In [39]: list(dropwhile(lambda x: x<3,[1,0,2,4,1,1,3,5,-5]))
Out[39]: [4, 1, 1, 3, 5, -5]

The general code to implement it is as follows ：

def dropwhile(predicate, iterable):
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x

5 Segment screening 2

Scan list , Return elements from iteratable objects as long as conditions are met , Until the conditions are not met , The prototype is as follows ：

takewhile(predicate, iterable)

Application examples ：

In [43]: list(takewhile(lambda x: x<5, [1,4,6,4,1]))
Out[43]: [1, 4]

The general code to implement it is as follows ：

def takewhile(predicate, iterable):
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break # Return immediately

6 Selection of defective products

Scan list , Keep... As long as you don't meet the conditions , The prototype is as follows ：

dropwhile(predicate, iterable)

Application examples ：

In [40]: list(filterfalse(lambda x: x%2==0, [1,2,3,4,5,6]))
Out[40]: [1, 3, 5]

The general code to implement it is as follows ：

def dropwhile(predicate, iterable):
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x

7 Slice screening

Python General slicing operation in , such as ：

lis = [1,3,2,1]
lis[:1]

Their defect is still lis You have to load all the memory , So more memory saving operations islice, The prototype is as follows ：

islice(iterable, start, stop[, step])

Application examples ：

In [41]: list(islice('abcdefg',1,4,2))
Out[41]: ['b', 'd']

The general code to implement it is as follows ：

def islice(iterable, *args):
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        for i, element in zip(range(i + 1, stop), iterable):
            pass

Clever use of the generator will throw an exception at the end of the iteration StopIteration, Do something about boundary handling .

8 cell division

tee The function is similar to the well-known cell division , It can replicate the original iterator n individual , The prototype is as follows ：

tee(iterable, n=2)

The application is as follows , It can be seen that the two iterators are independent

a = tee([1,4,6,4,1],2)
In [51]: next(a[0])
Out[51]: 1

In [52]: next(a[1])
Out[52]: 1

The code to implement it is as follows ：

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:            
                try:
                    newval = next(it)   
                except StopIteration:
                    return
                for d in deques:     
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

tee The implementation uses a queue type internally deques, At first, an empty queue is generated , Add elements... To each of the copied queues newval, meanwhile yield Currently called mydeque The leftmost element in .

9 map variant

starmap You can view it as map A variation of the , It can save more memory , meanwhile iterable The element of must also be an iteratable object , The prototype is as follows ：

starmap(function, iterable)

Apply it ：

In [63]: list(starmap(lambda x,y: str(x)+'-'+str(y), [('a',1),('b',2),('c',3)]))
Out[63]: ['a-1', 'b-2', 'c-3']

starmap The implementation details are as follows ：

def starmap(function, iterable):
    for args in iterable:
        yield function(*args)

10 Copy elements

repeat Implement copy elements n Time , The prototype is as follows ：

repeat(object[, times])

The application is as follows ：

In [66]: list(repeat(6,3))
Out[66]: [6, 6, 6]

In [67]: list(repeat([1,2,3],2))
Out[67]: [[1, 2, 3], [1, 2, 3]]

Its implementation details are as follows ：

def repeat(object, times=None):
    if times is None:#  If times Not set up , Will always repeat down 
        while True: 
            yield object
    else:
        for i in range(times):
            yield object

11 The cartesian product

The effect of Cartesian product is the same as that of ：

((x,y) for x in A for y in B)

therefore , The effect of Cartesian product is as follows ：

In [68]: list(product('ABCD', 'xy'))
Out[68]:
[('A', 'x'),
 ('A', 'y'),
 ('B', 'x'),
 ('B', 'y'),
 ('C', 'x'),
 ('C', 'y'),
 ('D', 'x'),
 ('D', 'y')]

Its implementation details ：

def product(*args, repeat=1):
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

12 Enhanced Edition zip

Combination value . If the length of the iteratable object is not aligned , Based on the fillvalue Fill in missing values , Be careful ： Iterations continue to the longest consuming iteratable object , The effect is as follows ：

In [69]: list(zip_longest('ABCD', 'xy', fillvalue='-'))
Out[69]: [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]

Its implementation details ：

def zip_longest(*args, fillvalue=None):
    iterators = [iter(it) for it in args]
    num_active = len(iterators)
    if not num_active:
        return
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                num_active -= 1
                if not num_active:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)

It uses repeat, That is, when the length of the iteratable object is not aligned , according to fillvalue Fill in missing values . The key to understanding the code above is the iterator object (iter),next The particularity of the method ：

In [74]: for i, it in enumerate([iter([1,2,3]),iter(['x','y'])]):
    ...:     print(next(it))
    # Output ：
    1
    x

Combined with this prompt to understand the above code , It's not going to work .

summary

Python Of itertools Memory efficient iterator provided by the module , Basically, the implementation inside is based on the generator , So on the one hand, understand this 12 The basic functions implemented by functions , At the same time, it can also deepen the generator (generator) The understanding of the , Write for us more efficiently 、 concise 、 Beautiful code laid a solid foundation .

The most straightforward explanation of machine learning , Just look at this article ！

Python List generator 12 A little function , Which ones do you often use ？

Python The most underrated Library , Use it well and improve efficiency 10 times ！