More routines for operating on iterables, beyond itertools

Overview

More Itertools

https://readthedocs.org/projects/more-itertools/badge/?version=latest

Python's itertools library is a gem - you can compose elegant solutions for a variety of problems with the functions it provides. In more-itertools we collect additional building blocks, recipes, and routines for working with Python iterables.

Grouping chunked, ichunked, sliced, distribute, divide, split_at, split_before, split_after, split_into, split_when, bucket, unzip, grouper, partition
Lookahead and lookback spy, peekable, seekable
Windowing windowed, substrings, substrings_indexes, stagger, windowed_complete, pairwise
Augmenting count_cycle, intersperse, padded, mark_ends, repeat_last, adjacent, groupby_transform, padnone, ncycles
Combining collapse, sort_together, interleave, interleave_longest, zip_offset, zip_equal, dotproduct, convolve, flatten, roundrobin, prepend, value_chain
Summarizing ilen, unique_to_each, sample, consecutive_groups, run_length, map_reduce, exactly_n, is_sorted, all_equal, all_unique, first_true, quantify
Selecting islice_extended, first, last, one, only, strip, lstrip, rstrip, filter_except map_except nth_or_last, nth, take, tail, unique_everseen, unique_justseen
Combinatorics distinct_permutations, distinct_combinations, circular_shifts, partitions, set_partitions, product_index, combination_index, permutation_index, powerset, random_product, random_permutation, random_combination, random_combination_with_replacement, nth_product nth_permutation nth_combination
Wrapping always_iterable, always_reversible, consumer, with_iter, iter_except
Others locate, rlocate, replace, numeric_range, side_effect, iterate, difference, make_decorator, SequenceView, time_limited, consume, tabulate, repeatfunc

Getting started

To get started, install the library with pip:

pip install more-itertools

The recipes from the itertools docs are included in the top-level package:

>>> from more_itertools import flatten
>>> iterable = [(0, 1), (2, 3)]
>>> list(flatten(iterable))
[0, 1, 2, 3]

Several new recipes are available as well:

>>> from more_itertools import chunked
>>> iterable = [0, 1, 2, 3, 4, 5, 6, 7, 8]
>>> list(chunked(iterable, 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]

>>> from more_itertools import spy
>>> iterable = (x * x for x in range(1, 6))
>>> head, iterable = spy(iterable, n=3)
>>> list(head)
[1, 4, 9]
>>> list(iterable)
[1, 4, 9, 16, 25]

For the full listing of functions, see the API documentation.

Links elsewhere

Blog posts about more-itertools:

Development

more-itertools is maintained by @erikrose and @bbayles, with help from many others. If you have a problem or suggestion, please file a bug or pull request in this repository. Thanks for contributing!

Comments
  • New stagger() and zip_offsets() itertools

    New stagger() and zip_offsets() itertools

    This addition is adapted from @joshbode's ActiveState recipe.

    Updated, see thread below.


    ~~offset_groups()~~ stagger() lets you specify a set of offsets by which to lead or lag the iterable. It will yield groups of items, the ith of which corresponds to the ith offset. This is a sort of generalization of a sliding window.

    For example, to get a sliding window with both lookback and lookahead:

    >>> list(stagger('abcdefg', offsets=(-2, -1, 0, 1)))
    [(None, None, 'a', 'b'),
     (None, 'a', 'b', 'c'),
     ('a', 'b', 'c', 'd'),
     ('b', 'c', 'd', 'e'),
     ('c', 'd', 'e', 'f'),
     ('d', 'e', 'f', 'g')]
    

    The longest parameter will continue until the last item of the iterable is the first item of the last group:

    >>> list(stagger('abcdefg', offsets=(0, 1, 2), longest=True))
    [('a', 'b', 'c'),
     ('b', 'c', 'd'),
     ('c', 'd', 'e'),
     ('d', 'e', 'f'),
     ('e', 'f', 'g'),
     ('f', 'g', None),
     ('g', None, None)]
    

    fillvalue can be used in place of None as well. offsets is (-1, 0, 1) by default to get (before, during after) groups:

    >>> for before, during, after in stagger('abcd', longest=True, fillvalue='?'):
    ...     print(before, during, after, sep='\t')
    ?	a	b
    a	b	c
    b	c	d
    c	d	?
    d	?	?
    
    opened by bbayles 20
  • Bucket (was

    Bucket (was "Separate" and "Partition")

    This PR builds on the ideas in #26 and adds a separate function. By default it will split an input iterable into two - items that have bool(item) == True and items that have bool(item) == False. A dictionary with True and False keys is returned:

    >>> iterable = [0, '', 1, 'x', None, [1]]
    >>> D = separate(iterable)
    >>> list(D[False])
    [0, '', None]
    >>> list(D[True])
    [1, 'x', [1]]
    

    You can customize the keys of the returned dictionary as well as the function that operates on the items:

    >>> iterable = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    >>> D = separate(iterable, keys=(0, 1, 2), fn=lambda x: x % 3)
    >>> list(D[0])  # Divisible by 3
    [0, 3, 6]
    >>> list(D[1])  # Remainder 1 when divided by 3
    [1, 4, 7]
    >>> list(D[2])  # Remainder 2 when divided by 3
    [2, 5, 8]
    

    Since we can't know what the return values of the function will be until we run it on all of the items, you must specify up-front what you think they'll be (obviously bool only returns True and False). I considered having an "other" key in the dictionary for items that don't match any keys, but decided against it.

    opened by bbayles 20
  • Request: pushback

    Request: pushback

    I'd like to suggest adding a wrapper that allows pushing a value back on to an iterator, so that the next call to next(it) will return the pushed value before the next element from the underlying iterable. I find myself wanting this from time to time (usually in parsing applications), and I could have sworn it was implemented somewhere standard, but I looked around and couldn't find it. Would this be a good addition to more-itertools?

    I do have code to offer, but I'm posing this as an issue instead of a pull request because I have a dilemma. I've come up with two implementations, one as a generator function

    def pushback(iterable, maxlen=None):
        iterable = iter(iterable)
        # add 1 to account for the append(None)
        stack = deque(maxlen=maxlen + 1 if maxlen is not None else None)
        while True:
            if stack:
                e = stack.pop()
            else:
                e = next(iterable)
            sent = yield e
            if sent is not None:
                stack.append(sent)
                stack.append(None) # dummy value to return from send()
    

    and the other as a class

    class pushback:
        def __init__(self, iterable, maxlen=None):
            self.iterable = iter(iterable)
            self.stack = deque(maxlen=maxlen)
        def __iter__(self):
            return self
        def __next__(self):
            return self.stack.pop() if self.stack else next(self.iterable)
        def send(self, value):
            self.stack.append(value)
    

    The function implementation is about twice as fast in my preliminary tests (using IPython)

    In [13]: %timeit list(pushback_function(range(10)))
    100000 loops, best of 3: 5.45 µs per loop
    In [14]: %timeit list(pushback_class(range(10)))
    100000 loops, best of 3: 10.8 µs per loop
    

    On the other hand the class implementation is conceptually cleaner, and also does not need to be "primed" by calling next(it) before sending in a value with it.send(x).

    Now, in most cases, you can prime the generator iterator without losing an item by running it.send(next(it)), and that could be done in a wrapper function to make it transparent to client code. But only the class implementation allows pushing in front of an empty iterable (admittedly a rather pathological use case):

    >>> it = pushback([])
    >>> it.send(10)
    >>> list(it)
    [10]
    

    So my point is: if this is something you want for more-itertools, which implementation to use? Or is there a way to "fix" one of them to make it strictly better than the other, that I'm not seeing? (Or does this whole thing already exist and I wasted an evening?)

    opened by diazona 18
  • Request: Sort iterables by

    Request: Sort iterables by

    I'm not sure if this is a fitting addition to more-itertools but it's a method I use quite often. This function sorts iterables using a defined order of priority. So you can sort iterables in concordance with a given sort pattern. I suppose it's tough to explain so here are three examples.

    # Will sort all iterables based on the ascending sort order of the first iterable
    >>>sort_iterables_by([['a', 'd', 'c', 'd'], [1, 3, 2, 4]], key_list=(0,))
    [('a', 'c', 'd', 'd'), (1, 2, 3, 4)]
    
    # Will sort all iterables based on the ascending sort order of the first iterable,
    # then the second iterable
    >>>sort_iterables_by([['d', 'd', 'd', 'c'], [4, 3, 7, 10], [1, 2, 3, 4]],
                          key_list=(0, 1))
    [('c', 'd', 'd', 'd'), (10, 3, 4, 7), (4, 2, 1, 3)]
    
    # Will sort all iterables based on the descending sort order of the first iterable,
    # then the second iterable
    >>>sort_iterables_by([['a', 'b', 'b'], [1, 3, 2]],
    >>>                   key_list=(0, 1),
    >>>                   reverse=True))
    [('b', 'b', 'a'), (3, 2, 1)]
    

    Here is the function I propose

    import operator
    
    def sort_iterables_by(iterables, key_list=(0,), reverse=False):
    
        return list(zip(*sorted(zip(*iterables),
                                key=operator.itemgetter(*key_list),
                                reverse=reverse)))
    

    What do you guys think? A useful addition? One remark is that because zip is used, iterables are returned trimmed to the length of the shortest iterable before sorting. An alternate form of the function could be used with zip_longest although for lists with heterogeneous objects no fillvalue will make obvious sense.

    Example:

    import operator
    import itertools
    
    def sort_iterables_by(iterables, key_list=(0,), reverse=False, fillvalue=None):
    
        return list(zip(*sorted(itertools.zip_longest(*iterables, fillvalue=fillvalue),
                                key=operator.itemgetter(*key_list),
                                reverse=reverse)))
    
    opened by clintval 18
  • Error in more.py during pytest in Python 2.7

    Error in more.py during pytest in Python 2.7

    I received this error in version 6.0 (released just now in PyPi) when running pytest which depends on more-itertools. I am using python 2.7

    def _collate(*iterables, key=lambda a: a, reverse=False):
                               ^
    SyntaxError: invalid syntax
    

    Edited to add: For people finding this issue from Google, the issue is with Python 2.7. For a version more more-itertools that works with that version of Python, pip install more_itertools==5.0.0.

    opened by joshuahendinata 16
  • Improve asymptotic running time of interleave_longest

    Improve asymptotic running time of interleave_longest

    interleave_longest() currently works by using zip_longest() with a fillvalue of _marker, and then filtering out all the _marker references in the returned iterable.

    This has a running time of O(len(iterables) * max(map(ilen,iterables)). This is particularly inefficient if one iterable is much larger than any of others.

    Using the proposed OrderedDict based implementation in distributions where OrderedDict exists, the running time can be changed to O(sum(map(ilen, iterables)) + len(iterables)), which is better in cases where one iterable is much larger than others without realistically affecting the runtime of the other cases.

    opened by michael-celani 15
  • Consider making side_effect(file_obj=) arg more generic

    Consider making side_effect(file_obj=) arg more generic

    Closing a file after yielding its last line is all well and good, but I wonder: can we be more general and add power without losing much brevity?

    side_effect(log, some_file, last=lambda: some_file.close())
    
    side_effect(ingest, listdir(some_dir), last=lambda: rmtree(some_dir))
    

    @bbayles? @yardsale8?

    opened by erikrose 15
  • Question about first() and life cycle of provided iterable

    Question about first() and life cycle of provided iterable

    This is a question rather than a bug:

    Take the first() function as an example https://github.com/more-itertools/more-itertools/blob/master/more_itertools/more.py#L160. It accepts an iterable as its first parameter, which can also be an iterator or a generator. I would expect the functionality of the method to be the same no matter what kind of iterable I pass in.

    Per design the function only extracts the first element of the provided iterable, but the behaviour for repeated invocation on the same iterable is different:

    • if the iterable is a not an iterator or generator, then repeated invocation will yield the same result each time (each time the first element)
    • if the iterable is an iterator (or generator) then repeated invocation will return one element after the other of the iterator, each time a different value. At least I would have expected that a second invocation yielded the first element on the first invocation and from there on the default value, because there is no more than one first element in an iterator.

    The other aspect is that of the handling of generators as an input. A generator has a close method that finalizes the generator when not exhausted completely. Since the first operation is by design not exhausing a generator with more than one element, shouldn't the method close the input sequence in case it's a generator?

    opened by CarstenLeue 14
  • [MRG] Sampling iterables

    [MRG] Sampling iterables

    Work in progress. Addresses #344.

    TODO

    • [x] Review literature.
    • [ ] Sampling with/without replacement.
    • [x] Sampling with/without weights.
    • [x] Perform empirical analysis of running time.
    • [x] Decide on function signature.

    Decided not to add sampling with replacement in this PR. Three reasons: (1) with a large number of elements, sampling with replacement is not that interesting, (2) there's no general algorithm, so we would have to split up into four cases: weighted/unweighted and with/without replacement and (3) I couldn't find a paper for the weighted case with replacements. Could be generalized from the other papers, but I don't want to spend time trying to do the math right now. Out of scope for this PR.

    opened by tommyod 14
  • The new `context` itertool is bad

    The new `context` itertool is bad

    The new context itertool tries to expose a context manager as an iterable. This breaks the context manager manager guarantee that __exit__ will be called. It's not enough to tell the caller that he has to iterate over the whole iterable. Even if there are no break or return statements in the loop, there is always the possibility of exceptions. The whole point of context managers is to guarantee that the __exit__ is always called when a block terminates. This is why context managers and iterables are orthogonal concepts; in general, one cannot be made to look like the other.

    Please remove context because it encourages people to write bad code.

    There is no benefit to context in any case. Even the motivating example in the documentation is just:

    consume(print(x, file=f) for f in context(file_obj) for x in it)
    

    which can be written just as succinctly

    with file_obj as f:
        consume(print(x, file=f) for x in it)
    
    opened by NeilGirdhar 14
  • Add ``zip_with_scalars()`` function

    Add ``zip_with_scalars()`` function

    Description

    Would you be interested in adding a function that mostly works like zip(), except that it treats any non-iterable arguments as values that should be included in each output tuple (without affecting how the iterable arguments are zipped)?

    References

    I needed a function like this when I was writing a function for reading keys from JSON files. The most common way to call the function would be with one key and one file, but it was also important to allow multiple keys or files to be specified. For example, consider these two files:

    # foo.json
    {"a": 1, "b": 2}
    
    # bar.json
    {"a": 3, "c": 4}
    

    I needed to be able to do:

    >>> f("foo.json", "a")
    [1]
    >>> f("foo.json", ["a", "b"])
    [1, 2]
    >>> f(["foo.json", "bar.json"], "a")
    [1, 3]
    >>> f(["foo.json", "bar.json"], ["b", "c"])
    [2, 4]
    

    This is pretty hard to write in a general way. You can't just wrap the scalars in itertools.repeat, because you'll get an infinite loop if only scalars are specified.

    Examples

    Here are some examples to show how it works:

    >>> list(zip_with_scalars(1, 2))
    [(1, 2)]
    >>> list(zip_with_scalars(1, [2]))
    [(1, 2)]
    >>> list(zip_with_scalars(1, [2, 3]))
    [(1, 2), (1, 3)]
    >>> list(zip_with_scalars([1, 2], [3, 4]))
    [(1, 3), (2, 4)]
    

    Some special cases:

    >>> list(zip_with_scalars("ab"))
    [("ab",)]
    >>> list(zip_with_scalars("ab", not_iterable=None))
    [("a", "b")]
    >>> list(zip_with_scalars([1], [2, 3], strict=True))
    Traceback...
    UnequalIterableError
    

    Implementation

    Here's an implementation. If you're interested in adding this itertool, I can turn this into a PR with tests and docs:

    def zip_with_scalars(*objs, strict=False, not_iterable=(str, bytes)):
        from builtins import zip
        from operator import itemgetter
        from more_itertools import zip_equal
    
        def is_scalar(obj):
            if not_iterable and isinstance(obj, not_iterable):
                return True
            try:
                iter(obj)
            except TypeError:
                return True
            else:
                return False
    
        iterables = []
        formatters = []
    
        for obj in objs:
            if is_scalar(obj):
                # The double-lambdas are necessary to create a closure.
                formatters.append((lambda x: lambda _: x)(obj))
            else:
                formatters.append(itemgetter(len(iterables)))
                iterables.append(obj)
    
        if not iterables:
            if not objs:
                return
            else:
                yield tuple(objs)
                return
    
        if strict:
            zip = zip_equal
    
        for values in zip(*iterables):
            yield tuple(f(values) for f in formatters)
    
    pr-welcome 
    opened by kalekundert 13
  • dzip (zip but for dictionaries)

    dzip (zip but for dictionaries)

    Description

    Zips multiple mappings to an iterable of (k, (v1, v2, ..., vn)).

    References

    My use case: DictEncoder is a torch module that stores a dict of modules and encodes data sample of dict[str, Tensor] by applying every module on the corresponding value in the given dict. Basically, every method was a dzip on (self, sample) so I wanted to abstract this away.

    Examples

    Here is an example implementation:

    
    import functools
    
    def dzip(*mappings):
        keys = functools.reduce(
            lambda a, b: a & b,
            (mapping.keys() for mapping in mappings),
        )
    
        for k in keys:
            yield k, tuple(mapping[k] for mapping in mappings)
    
    deferred 
    opened by khoda81 2
  • Suggestion to improve performance of `partitions`

    Suggestion to improve performance of `partitions`

    This is the current implementation of partitions:

    def partitions(iterable):
        sequence = list(iterable)
        n = len(sequence)
        for i in powerset(range(1, n)):
            yield [sequence[i:j] for i, j in zip((0,) + i, i + (n,))]
    

    This converts the iterable to a list and slices it many times to generate partitions.

    >>> %timeit consume(partitions(range(20)))
    1.89 s ± 18.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    I thought of a different (recursive) algorithm, which is very similar to how set_partitions is implemented:

    def partitions(iterable):
        iterable = iter(iterable)
        try:
            item = next(iterable)
        except StopIteration:
            yield [()]
            return
        for first_part, *other_parts in partitions(iterable):
            yield [(item,) + first_part, *other_parts]
            if first_part:
                yield [(item,), first_part, *other_parts]
    

    which gives

    >>> %timeit consume(partitions(range(20)))
    557 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    I haven't done thorough performance testing but it seems to give better performance on average, so I suggest switching to it.

    A few points to consider:

    • This algorithm returns partitions in a different order than the current implementation returns. The documentation makes no guarantees about this order, but maybe someone depends on the current one? Also not sure if one order is "better" than another, i.e. if there is any use-case which requires iterating partitions in a particular order.

    • Each item yielded by this new algorithm is a list of tuples, whereas the current algorithm yields lists of lists. If we change this to yield lists of lists, then other_parts will be a list of lists, and since its contents are yielded twice and lists are mutable the user might modify them and get weird results. So we need to copy the lists inside:

      def partitions(iterable):
          iterable = iter(iterable)
          try:
              item = next(iterable)
          except StopIteration:
              yield [[]]
              return
          for first_part, *other_parts in partitions(iterable):
              yield [[item] + first_part, *map(list.copy, other_parts)]
              if first_part:
                  yield [[item], first_part, *other_parts]
      

      which gives

      >>> %timeit consume(partitions(range(20)))
      1.59 s ± 8.71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      

      which is still faster than the current implementation but not by much. I'd vote for using tuples, and I think users expect this since this is what builtin functions return, such as zip. But maybe users rely on it yielding lists of lists?

    • I also think changing it to yield tuples of tuples (rather than lists of tuples) is a good idea, even though it doesn't matter, since the builtin itertools (e.g. permutations, combinations) yield tuples.

    • This is recursive therefore there might be concern that stack overflows will occur, but the amount of partitions this yields for an iterable of length n is 2**(n-1), and since the number of seconds since the big bang is around 2**58.6 I think it's safe to say we won't be using more than 50 stack frames, and the default limit is 1000.

    What do you think?

    opened by NotWearingPants 3
  • Add `iter_with` as mirror/companion utility to `with_iter`

    Add `iter_with` as mirror/companion utility to `with_iter`

    Description

    Utility adds the "flip side" of the existing with_iter, namely iter_with, allowing any Iterator to be used in a with statement.

    References

    I've used the actual code below in multiple situations to help streamline / clean up sections of code that otherwise would be unnecessarily allocation heavy or just... uglier 😅.

    Implementation

    from contextlib import contextmanager
    from typing import TypeVar, Iterable, Generator
    
    T = TypeVar("T")
    
    @contextmanager
    def iter_with(obj: Iterable[T]) -> Generator[Iterator[T], None, None]:
        """Use Python's built-in `iter` function as a context manager."""
        yield iter(obj)
    
    
    opened by the-wondersmith 1
  • Link

    Link "Getting Started" aka. Installation in Read the Docs sidebar

    Not to much of a problem. I was just confused for a second because I usually expect the Getting Started section to be the first one in the Docs.

    Maybe one could at least link it to the side bar menu image

    pr-welcome 
    opened by Sammeeey 1
  • set_partitions ignoring duplicates

    set_partitions ignoring duplicates

    Description

    Configure set_partitions to ignore duplicates whenever there is a repeat element

    References

    set_partitions works fine when all the element in the source list/iterator are distinct. But whenever there is a repeat it generates redundant sets.

    Examples

    lst = [2, 2,  3, 5]
    pp.pprint(list(mit.set_partitions(lst, 2)))
    [[[2], [2, 3, 5]],
     [[2, 2], [3, 5]],
     [[2], [2, 3, 5]],
     [[2, 2, 3], [5]],
     [[2, 3], [2, 5]],
     [[2, 3], [2, 5]],
     [[3], [2, 2, 5]]]
    

    As we can see row 0 and row 2(zero indexed) are identical ie [[2], [2, 3, 5]], set_partitions needs an argument to ignore duplicates. Don't you think so?

    pr-welcome 
    opened by boomboomjassi 3
Releases(v9.0.0)
A functional standard library for Python.

Toolz A set of utility functions for iterators, functions, and dictionaries. See the PyToolz documentation at https://toolz.readthedocs.io LICENSE New

4.1k Jan 03, 2023
Cython implementation of Toolz: High performance functional utilities

CyToolz Cython implementation of the toolz package, which provides high performance utility functions for iterables, functions, and dictionaries. tool

894 Jan 02, 2023
Make your functions return something meaningful, typed, and safe!

Make your functions return something meaningful, typed, and safe! Features Brings functional programming to Python land Provides a bunch of primitives

dry-python 2.5k Jan 05, 2023
粤语编程语言.The Cantonese programming language.

粤语编程语言.The Cantonese programming language.

Stepfen Shawn 895 Dec 24, 2022
Simple, elegant, Pythonic functional programming.

Coconut Coconut (coconut-lang.org) is a variant of Python that adds on top of Python syntax new features for simple, elegant, Pythonic functional prog

Evan Hubinger 3.6k Jan 03, 2023
More routines for operating on iterables, beyond itertools

More Itertools Python's itertools library is a gem - you can compose elegant solutions for a variety of problems with the functions it provides. In mo

2.9k Jan 04, 2023
A fancy and practical functional tools

Funcy A collection of fancy functional tools focused on practicality. Inspired by clojure, underscore and my own abstractions. Keep reading to get an

Alexander Schepanovski 2.9k Dec 29, 2022
Functional programming in Python: implementation of missing features to enjoy FP

Fn.py: enjoy FP in Python Despite the fact that Python is not pure-functional programming language, it's multi-paradigm PL and it gives you enough fre

Oleksii Kachaiev 3.3k Jan 04, 2023