Better directory iterator and faster os.walk(), now in the Python 3.5 stdlib

Overview

scandir, a better directory iterator and faster os.walk()


scandir() is a directory iteration function like os.listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding unnecessary calls to os.stat() in most cases.

Now included in a Python near you!

scandir has been included in the Python 3.5 standard library as os.scandir(), and the related performance improvements to os.walk() have also been included. So if you're lucky enough to be using Python 3.5 (release date September 13, 2015), you get the benefit immediately. Otherwise, download this module from PyPI, install it with pip install scandir, and then do something like this in your code:

# Use the built-in version of scandir/walk if possible, otherwise
# use the scandir module version
try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk

PEP 471, which is the PEP that proposes including scandir in the Python standard library, was accepted in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.

This scandir module is intended to work on Python 2.7+ and Python 3.4+ (and it has been tested on those versions).

Background

Python's built-in os.walk() is significantly slower than it needs to be, because -- in addition to calling listdir() on each directory -- it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.
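As a rough illustration of the point above, here is a minimal sketch of one walk-style step that separates directories from files using only what scandir() yields. The function name split_entries is hypothetical and not part of the module; the sketch assumes Python 3.5+ so that os.scandir() is available.

```python
import os

def split_entries(path):
    """Split one directory into (dirs, files), as a single walk() step would."""
    dirs, files = [], []
    for entry in os.scandir(path):
        # is_dir() normally uses the type information returned with the
        # directory listing, so no separate os.stat() call is made per entry.
        if entry.is_dir():
            dirs.append(entry.name)
        else:
            files.append(entry.name)
    return dirs, files
```

With os.listdir(), the same split would need an os.path.isdir() call, and therefore a stat, for every name.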

In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X. So we're not talking about micro-optimizations. See more benchmarks in the "Benchmarks" section below.

Somewhat relatedly, many people have also asked for a version of os.listdir() that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories.
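To make the memory point concrete, the following sketch reads only the first few entries of a directory instead of building the full list. The names first_names and limit are hypothetical, chosen for this example, and the sketch assumes Python 3.5+ with os.scandir().

```python
import os
from itertools import islice

def first_names(path, limit=10):
    """Return up to `limit` entry names from path without listing the rest."""
    it = os.scandir(path)
    try:
        # islice() stops pulling entries from the iterator after `limit` items.
        return [entry.name for entry in islice(it, limit)]
    finally:
        # close() exists on Python 3.6+; older versions rely on garbage collection.
        close = getattr(it, "close", None)
        if close is not None:
            close()
```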

So as well as a faster walk(), scandir adds a new scandir() function. They're pretty easy to use, but see "The API" below for the full docs.

Benchmarks

Below are results showing how many times faster scandir.walk() is than os.walk() on various systems, found by running benchmark.py with no arguments:

System version           Python version   Times as fast
Windows 7 64-bit         2.7.7 64-bit     10.4
Windows 7 64-bit SSD     2.7.7 64-bit     10.3
Windows 7 64-bit NFS     2.7.6 64-bit     36.8
Windows 7 64-bit SSD     3.4.1 64-bit     9.9
Windows 7 64-bit SSD     3.5.0 64-bit     9.5
Ubuntu 14.04 64-bit      2.7.6 64-bit     5.8
Mac OS X 10.9.3          2.7.5 64-bit     3.8

All of the above tests were done using the fast C version of scandir (source code in _scandir.c).

Note that the gains are less than the above on smaller directories and greater on larger directories. This is why benchmark.py creates a test directory tree with a standardized size.
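For a quick feel of the difference on a single directory, a rough timing sketch along these lines may help. It is not what benchmark.py does (that script walks a generated tree); the helper names names_listdir, names_scandir and compare are hypothetical, and the numbers will vary with directory size, file system and caching.

```python
import os
import timeit

def names_listdir(path):
    """Subdirectory names via os.listdir() plus one isdir() check per name."""
    return [n for n in os.listdir(path) if os.path.isdir(os.path.join(path, n))]

def names_scandir(path):
    """Subdirectory names via os.scandir() and DirEntry.is_dir()."""
    return [e.name for e in os.scandir(path) if e.is_dir()]

def compare(path, repeat=5):
    """Return (listdir_seconds, scandir_seconds), best of `repeat` runs each."""
    t_listdir = min(timeit.repeat(lambda: names_listdir(path), number=1, repeat=repeat))
    t_scandir = min(timeit.repeat(lambda: names_scandir(path), number=1, repeat=repeat))
    return t_listdir, t_scandir
```

Calling compare() on a small directory will likely show little difference, in line with the note above about smaller directories.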

The API

walk()

The API for scandir.walk() is exactly the same as os.walk(), so just read the Python docs.
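Since the signature and the yielded tuples are the same, existing os.walk() loops should carry over unchanged. A small sketch follows, using os.walk() from Python 3.5+ (swapping in scandir.walk from this module is assumed to behave the same way); the function name count_tree is hypothetical.

```python
import os

def count_tree(top):
    """Return (directories visited, files seen) under top, including top itself."""
    num_dirs = num_files = 0
    for root, dirs, files in os.walk(top):
        num_dirs += 1            # one iteration per directory visited
        num_files += len(files)  # non-directory entries in that directory
    return num_dirs, num_files
```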

scandir()

The full docs for scandir() and the DirEntry objects it yields are available in the Python documentation. A brief summary follows.

scandir(path='.') -> iterator of DirEntry objects for given path

Like listdir, scandir calls the operating system's directory iteration system calls to get the names of the files in the given path, but it's different from listdir in two ways:

  • Instead of returning bare filename strings, it returns lightweight DirEntry objects that hold the filename string and provide simple methods that allow access to the additional data the operating system may have returned.
  • It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

scandir() yields a DirEntry object for each file and sub-directory in path. Just like listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each DirEntry object has the following attributes and methods:

  • name: the entry's filename, relative to the scandir path argument (corresponds to the return values of os.listdir)
  • path: the entry's full path name (not necessarily an absolute path) -- the equivalent of os.path.join(scandir_path, entry.name)
  • is_dir(*, follow_symlinks=True): similar to pathlib.Path.is_dir(), but the return value is cached on the DirEntry object; doesn't require a system call in most cases; doesn't follow symbolic links if follow_symlinks is False
  • is_file(*, follow_symlinks=True): similar to pathlib.Path.is_file(), but the return value is cached on the DirEntry object; doesn't require a system call in most cases; doesn't follow symbolic links if follow_symlinks is False
  • is_symlink(): similar to pathlib.Path.is_symlink(), but the return value is cached on the DirEntry object; doesn't require a system call in most cases
  • stat(*, follow_symlinks=True): like os.stat(), but the return value is cached on the DirEntry object; does not require a system call on Windows (except for symlinks); doesn't follow symbolic links (like os.lstat()) if follow_symlinks is False
  • inode(): return the inode number of the entry; the return value is cached on the DirEntry object

Here's a very simple example of scandir() showing use of the DirEntry.name attribute and the DirEntry.is_dir() method:

import os

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name

This subdirs() function will be significantly faster with scandir than os.listdir() and os.path.isdir() on both Windows and POSIX systems, especially on medium-sized or large directories.
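The stat() method and the follow_symlinks parameter can be combined in a similar way. The sketch below, along the lines of the example given in PEP 471, sums file sizes in a tree; it assumes Python 3.5+ with os.scandir().

```python
import os

def get_tree_size(path):
    """Return total size in bytes of the files under path, not following symlinks."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # Recurse into real subdirectories only, never through symlinks.
            total += get_tree_size(entry.path)
        else:
            # Behaves like os.lstat(); the result is cached on the entry.
            total += entry.stat(follow_symlinks=False).st_size
    return total
```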

Further reading

  • The Python docs for scandir
  • PEP 471, the (now-accepted) Python Enhancement Proposal that proposed adding scandir to the standard library -- a lot of details here, including rejected ideas and previous discussion

Flames, comments, bug reports

Please send flames, comments, and questions about scandir to Ben Hoyt:

http://benhoyt.com/

File bug reports for the version in the Python 3.5 standard library here, or file bug reports or feature requests for this module at the GitHub project page:

https://github.com/benhoyt/scandir

Comments
  • scandir fails to compile with PyPy and PyPy3 virtualenvs

    I've been testing my package with various combinations of additional optional packages, and in my PyPy and PyPy3 virtual environments, compilation always fails. This is on OS X with PyPy, PyPy3 and CPython 2.7.10 (with pip) installed through Homebrew and with virtualenv installed through pip, and all packages on the pip installation fully updated.

    PyPy:

    (pypy)Carloss-MacBook-Pro:~ aarzee$ pip install scandir
    Collecting scandir
      Using cached scandir-1.1.tar.gz
    Building wheels for collected packages: scandir
      Running setup.py bdist_wheel for scandir
      Complete output from command /Users/aarzee/airship/envs/pypy/bin/pypy -c "import setuptools;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-WoU6on/scandir/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/tmpUXh6s1pip-wheel-:
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.10-x86_64-2.7
      copying scandir.py -> build/lib.macosx-10.10-x86_64-2.7
      running build_ext
      building '_scandir' extension
      creating build/temp.macosx-10.10-x86_64-2.7
      cc -arch x86_64 -O2 -fPIC -Wimplicit -I/Users/aarzee/airship/envs/pypy/include -c _scandir.c -o build/temp.macosx-10.10-x86_64-2.7/_scandir.o
      _scandir.c:17:10: fatal error: 'osdefs.h' file not found
      #include <osdefs.h>
               ^
      1 error generated.
      error: command 'cc' failed with exit status 1
    
      ----------------------------------------
      Failed building wheel for scandir
    Failed to build scandir
    Installing collected packages: scandir
      Running setup.py install for scandir
        Complete output from command /Users/aarzee/airship/envs/pypy/bin/pypy -c "import setuptools, tokenize;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-WoU6on/scandir/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-Yy7_Eu-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aarzee/airship/envs/pypy/include/site/python2.7/scandir:
        running install
        running build
        running build_py
        running build_ext
        building '_scandir' extension
        cc -arch x86_64 -O2 -fPIC -Wimplicit -I/Users/aarzee/airship/envs/pypy/include -c _scandir.c -o build/temp.macosx-10.10-x86_64-2.7/_scandir.o
        _scandir.c:17:10: fatal error: 'osdefs.h' file not found
        #include <osdefs.h>
                 ^
        1 error generated.
        error: command 'cc' failed with exit status 1
    
        ----------------------------------------
    Command "/Users/aarzee/airship/envs/pypy/bin/pypy -c "import setuptools, tokenize;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-WoU6on/scandir/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-Yy7_Eu-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aarzee/airship/envs/pypy/include/site/python2.7/scandir" failed with error code 1 in /private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-WoU6on/scandir
    

    PyPy3:

    Collecting scandir
      Using cached scandir-1.1.tar.gz
    Building wheels for collected packages: scandir
      Running setup.py bdist_wheel for scandir
      Complete output from command /Users/aarzee/airship/envs/pypy3/bin/pypy3 -c "import setuptools;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-2tdqnn/scandir/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/tmp_9g_ogpip-wheel-:
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.10-x86_64-3.2
      copying scandir.py -> build/lib.macosx-10.10-x86_64-3.2
      running build_ext
      building '_scandir' extension
      creating build/temp.macosx-10.10-x86_64-3.2
      cc -arch x86_64 -O2 -fPIC -Wimplicit -I/Users/aarzee/airship/envs/pypy3/include -c _scandir.c -o build/temp.macosx-10.10-x86_64-3.2/_scandir.o
      _scandir.c:17:10: fatal error: 'osdefs.h' file not found
      #include <osdefs.h>
               ^
      1 error generated.
      error: command 'cc' failed with exit status 1
    
      ----------------------------------------
      Failed building wheel for scandir
    Failed to build scandir
    Installing collected packages: scandir
      Running setup.py install for scandir
        Complete output from command /Users/aarzee/airship/envs/pypy3/bin/pypy3 -c "import setuptools, tokenize;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-2tdqnn/scandir/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-dh64dj-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aarzee/airship/envs/pypy3/include/site/python3.2/scandir:
        running install
        running build
        running build_py
        running build_ext
        building '_scandir' extension
        cc -arch x86_64 -O2 -fPIC -Wimplicit -I/Users/aarzee/airship/envs/pypy3/include -c _scandir.c -o build/temp.macosx-10.10-x86_64-3.2/_scandir.o
        _scandir.c:17:10: fatal error: 'osdefs.h' file not found
        #include <osdefs.h>
                 ^
        1 error generated.
        error: command 'cc' failed with exit status 1
    
        ----------------------------------------
    Command "/Users/aarzee/airship/envs/pypy3/bin/pypy3 -c "import setuptools, tokenize;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-2tdqnn/scandir/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-dh64dj-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aarzee/airship/envs/pypy3/include/site/python3.2/scandir" failed with error code 1 in /private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-2tdqnn/scandir
    
    opened by carlosliam 18
  • _scandir extension doesn't compile on Solaris 11

    gcc -m64 -fno-strict-aliasing -g -O2 -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python2.7 -c _scandir.c -o build/temp.solaris-2.11-i86pc.64bit-2.7/_scandir.o
    _scandir.c: In function '_fi_next':
    _scandir.c:484:60: error: 'struct dirent' has no member named 'd_type'
    _scandir.c:486:60: error: 'struct dirent' has no member named 'd_type'
    _scandir.c:488:1: warning: control reaches end of non-void function
    error: command 'gcc' failed with exit status 1

    See http://stackoverflow.com/questions/2197918/cross-platform-way-of-testing-whether-a-file-is-a-directory

    opened by jc0n 16
  • Release on PyPI

    It would be much easier to depend on scandir in my own code if it were registered on PyPI. Even better would be if you published wheels as well. Is there any reason not to release it?

    opened by pfmoore 16
  • test_walk fails on Linux

    ======================================================================
    FAIL: test_traversal (test_walk.WalkTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "C:\Work\scandir\tests\test_walk.py", line 66, in test_traversal
        self.assertEqual(all[3 - 2 * flipped], sub2_tree)
    AssertionError: Tuples differ: ('C:\\Work\\scandir\\tests\\te... != ('C:\\Work\\scandir\\tests\\te...
    
    First differing element 1:
    []
    ['link']
    
    - ('C:\\Work\\scandir\\tests\\temp\\TEST1\\SUB2', [], ['link', 'tmp3'])
    ?                                                 ----
    
    + ('C:\\Work\\scandir\\tests\\temp\\TEST1\\SUB2', ['link'], ['tmp3'])
    ?                                                        +  +
    
    opened by benhoyt 15
  • Jython support

    Right now, it seems doing a pip install scandir will fail on Jython with:

    error: Compiling extensions is not supported on Jython

    As it seems like the library should work without extensions, I think there should be no issues in running it with Jython 2.7 (provided that it could be installed with pip).

    Full output:

    λ pip install scandir
    Downloading/unpacking scandir
      Downloading scandir-1.7.tar.gz
      Running setup.py (path:C:\Users\fabio\AppData\Local\Temp\pip_build_fabio\scandir\setup.py) egg_info for package scandir
    
    Installing collected packages: scandir
      Running setup.py install for scandir
        building '_scandir' extension
        error: Compiling extensions is not supported on Jython
        Complete output from command C:\bin\jython2.7.0\bin\jython.exe -c "import setuptools, tokenize;__file__='C:\\Users\\fabio\\AppData\\Local\\Temp\\pip_build_fabio\\scandir\\setup.py';exec(compile
    (getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record C:\Users\fabio\AppData\Local\Temp\pip-inf_f4-record\install-record.txt --single-version
    -externally-managed --compile:
        running install
    
    running build
    
    running build_py
    
    creating build
    
    creating build\lib.java1.8.0_172-2.7
    
    copying scandir.py -> build\lib.java1.8.0_172-2.7
    
    running build_ext
    
    building '_scandir' extension
    
    error: Compiling extensions is not supported on Jython
    
    ----------------------------------------
    Cleaning up...
    Command "C:\bin\jython2.7.0\bin\jython.exe -c "import setuptools, tokenize;__file__='C:\\Users\\fabio\\AppData\\Local\\Temp\\pip_build_fabio\\scandir\\setup.py';exec(compile(getattr(tokenize, 'open
    ', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record C:\Users\fabio\AppData\Local\Temp\pip-inf_f4-record\install-record.txt --single-version-externally-managed --co
    mpile" failed with error code 1 in C:\Users\fabio\AppData\Local\Temp\pip_build_fabio\scandir
    Storing debug log for failure in C:\Users\fabio\.pip\pip.log
    
    opened by fabioz 13
  • Failed to build scandir

    Attempting to install Azure CLI 2.0, I've encountered an error that debugging seems to narrow down to a problem with "scandir". To narrow down whether the problem is with the CLI installer or with scandir, the following commands were suggested:

    curl -O https://bootstrap.pypa.io/get-pip.py
    sudo python get-pip.py
    sudo pip install virtualenv
    virtualenv myenv
    source myenv/bin/activate
    pip install scandir

    That resulted in an error (the same one encountered with the CLI installer) as follows:

    Thomass-MacBook-Pro:~ thomaswagner$ curl -O https://bootstrap.pypa.io/get-pip.py
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 1558k  100 1558k    0     0  5104k      0 --:--:-- --:--:-- --:--:-- 7018k

    Thomass-MacBook-Pro:~ thomaswagner$ sudo python get-pip.py
    Password:
    The directory '/Users/thomaswagner/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
    The directory '/Users/thomaswagner/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
    Collecting pip
      Downloading pip-9.0.1-py2.py3-none-any.whl (1.3MB)
        100% |████████████████████████████████| 1.3MB 654kB/s
    Collecting setuptools
      Downloading setuptools-36.0.1-py2.py3-none-any.whl (476kB)
        100% |████████████████████████████████| 481kB 1.6MB/s
    Collecting wheel
      Downloading wheel-0.29.0-py2.py3-none-any.whl (66kB)
        100% |████████████████████████████████| 71kB 5.8MB/s
    Installing collected packages: pip, setuptools, wheel
    Successfully installed pip-9.0.1 setuptools-36.0.1 wheel-0.29.0
    Thomass-MacBook-Pro:~ thomaswagner$ sudo pip install virtualenv
    The directory '/Users/thomaswagner/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
    The directory '/Users/thomaswagner/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
    Collecting virtualenv
      Downloading virtualenv-15.1.0-py2.py3-none-any.whl (1.8MB)
        100% |████████████████████████████████| 1.8MB 426kB/s
    Installing collected packages: virtualenv
    Successfully installed virtualenv-15.1.0
    Thomass-MacBook-Pro:~ thomaswagner$ virtualenv myenv
    New python executable in /Users/thomaswagner/myenv/bin/python
    Installing setuptools, pip, wheel...done.
    Thomass-MacBook-Pro:~ thomaswagner$ source myenv/bin/activate
    (myenv) Thomass-MacBook-Pro:~ thomaswagner$ pip install scandir
    Collecting scandir
      Downloading scandir-1.5.tar.gz
    Building wheels for collected packages: scandir
      Running setup.py bdist_wheel for scandir ... error
      Complete output from command /Users/thomaswagner/myenv/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/f4/0zjyjj1d6db6z__bfq55m6rm0000gn/T/pip-build-AuAhdj/scandir/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /var/folders/f4/0zjyjj1d6db6z__bfq55m6rm0000gn/T/tmpvtFbzqpip-wheel- --python-tag cp27:
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.6-x86_64-2.7
      copying scandir.py -> build/lib.macosx-10.6-x86_64-2.7
      running build_ext
      building '_scandir' extension
      creating build/temp.macosx-10.6-x86_64-2.7
      gcc -fno-strict-aliasing -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Applications/MAMP/Library/include/python2.7 -c _scandir.c -o build/temp.macosx-10.6-x86_64-2.7/_scandir.o
      _scandir.c:14:10: fatal error: 'Python.h' file not found
      #include <Python.h>
               ^
      1 error generated.
      error: command 'gcc' failed with exit status 1


      Failed building wheel for scandir
      Running setup.py clean for scandir
    Failed to build scandir
    Installing collected packages: scandir
      Running setup.py install for scandir ... error
        Complete output from command /Users/thomaswagner/myenv/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/f4/0zjyjj1d6db6z__bfq55m6rm0000gn/T/pip-build-AuAhdj/scandir/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /var/folders/f4/0zjyjj1d6db6z__bfq55m6rm0000gn/T/pip-Oo02mY-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/thomaswagner/myenv/include/site/python2.7/scandir:
        running install
        running build
        running build_py
        creating build
        creating build/lib.macosx-10.6-x86_64-2.7
        copying scandir.py -> build/lib.macosx-10.6-x86_64-2.7
        running build_ext
        building '_scandir' extension
        creating build/temp.macosx-10.6-x86_64-2.7
        gcc -fno-strict-aliasing -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Applications/MAMP/Library/include/python2.7 -c _scandir.c -o build/temp.macosx-10.6-x86_64-2.7/_scandir.o
        _scandir.c:14:10: fatal error: 'Python.h' file not found
        #include <Python.h>
                 ^
        1 error generated.
        error: command 'gcc' failed with exit status 1

    ----------------------------------------
    

    Command "/Users/thomaswagner/myenv/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/f4/0zjyjj1d6db6z__bfq55m6rm0000gn/T/pip-build-AuAhdj/scandir/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /var/folders/f4/0zjyjj1d6db6z__bfq55m6rm0000gn/T/pip-Oo02mY-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/thomaswagner/myenv/include/site/python2.7/scandir" failed with error code 1 in /private/var/folders/f4/0zjyjj1d6db6z__bfq55m6rm0000gn/T/pip-build-AuAhdj/scandir/
    (myenv) Thomass-MacBook-Pro:~ thomaswagner$

    opened by TMWagner 13
  • Additional windows attributes

    I'd like to be able to tell if a file is hidden on windows, but unless I'm mistaken this information is lost when the generic stat object is created (with the C extension at least). Is there a way I'm overlooking? Thoughts?

    Thanks.

    opened by woodlandhunter 11
  • The benchmark test doesn't work

    C:\Users\username\Documents\Python Scripts\scandir>python benchmark.py
    Creating tree at benchtree: depth=4, num_dirs=5, num_files=50
    Using fast C version of scandir
    Comparing against builtin version of os.walk()
    Priming the system's cache...
    Traceback (most recent call last):
      File "benchmark.py", line 255, in <module>
        benchmark(tree_dir, get_size=options.size)
      File "benchmark.py", line 160, in benchmark
        do_scandir_walk()
      File "benchmark.py", line 155, in do_scandir_walk
        for root, dirs, files in scandir.walk(path):
      File "C:\Users\username\Documents\Python Scripts\scandir\scandir.py", line 604, in walk
        for entry in scandir(top):
      File "C:\Users\username\Documents\Python Scripts\scandir\scandir.py", line 405, in scandir_c
        for name, stat in scandir_helper(path):
    TypeError: iter() returned non-iterator of type 'tuple'

    C:\Users\username\Documents\Python Scripts\scandir>

    opened by stephenboulet 10
  • drop old/dead pythons?

    considering the "start backporting from scratch to get the latest features from 3.7/3.8 stdlib scandir" suggested in #108:

    i could have a look, but when looking at the current setup.py it seems that py26, py27 and py34 are currently supported by scandir pypi edition - this is definitely something i would not like to work on, considering these are all dead now or will be in the near future.

    supporting py35, py36[, py37] with the latest stuff from py37 [or py38] stdlib seems way easier and less error prone also.

    the question is just whether this is doable / wanted within benhoyt/scandir and would just result in a new major release (and all users wanting the old stuff would just require a lower version) or whether a new project is more appropriate.

    opened by ThomasWaldmann 9
  • mark the scandir extension as optional

    build errors no longer stop the build. this may hide errors, but it ensures that installing on jython or on linux without gcc works fine

    this can be refined to be conditional on jython/pypy and/or gcc missing

    opened by RonnyPfannschmidt 9
  • Possible memory leak?

    I was just testing scandir.walk on a big directory structure. The structure is a top-level directory with about 16K subdirectories directly underneath it, each of which just holds files and no further subdirectories. There are a total of about 150K files.

    The speed improvement over os.walk is great - 3 seconds rather than 11 seconds on a local disk (I've not tried it on the NAS drive yet :-) This is just for a simple test:

    import time
    import scandir

    start = time.perf_counter()
    l = list(scandir.walk('C:\\Test\\TestCollection'))
    end = time.perf_counter()
    print(end-start)
    

    However, I noticed when looking at the memory usage, that the working set of python.exe increases by about 100M each time I run the test. In contrast, a version using os.walk uses a constant 50M no matter how many times I run the test.

    The higher memory usage is fine, easily explained by the fact that we're using DirEntry objects rather than simple strings. But the memory growth is worrying, as it implies something isn't being garbage collected. I tried a gc.collect() but that made no difference.

    This is on Windows 7, 64-bit, using Python 3.4 and the latest version of scandir from PyPI built myself via pip install scandir.

    opened by pfmoore 9
  • 3 fields are labelled "unnamed field" in `repr(scandir.DirEntry.stat())`

    Using current master (rev 34a0cc1dd2b8f31d6f8a859db7b287c491f50fd9) the last 3 fields are mislabelled as unnamed field

    (v) $ pip install git+https://github.com/benhoyt/scandir.git                        
    ...
    Successfully installed scandir-1.10.1
    (v) $ python3 -c"import scandir; print(list(scandir.scandir('.'))[0].stat())"
    scandir.stat_result(st_mode=16877, st_ino=5942134, st_dev=16777229, st_nlink=12, st_uid=501, st_gid=20, st_size=384, unnamed field=1637099040, unnamed field=1637099040, unnamed field=1637099040)
    

    using the most recent PyPI release they're labelled as expected (st_atime, ...)

    (v) $ pip install scandir==1.10.0                                            
    ...
          Successfully uninstalled scandir-1.10.1
    Successfully installed scandir-1.10.0
    (v) $ python3 -c"import scandir; print(list(scandir.scandir('.'))[0].stat())"
    os.stat_result(st_mode=16877, st_ino=5045619, st_dev=16777229, st_nlink=9, st_uid=501, st_gid=20, st_size=288, st_atime=1660286912, st_mtime=1660286842, st_ctime=1660286842)
    

    System details, just in case

    (v) $ python --version
    Python 3.10.5
    (v) $ uname -a        
    Darwin kintha 21.6.0 Darwin Kernel Version 21.6.0: Sat Jun 18 17:07:22 PDT 2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T6000 arm64
    
    opened by moreati 3
  • Wheel support for linux aarch64

    Summary: Installing scandir on aarch64 with "pip3 install scandir" builds the wheel from source code.

    Problem description: scandir doesn't have a wheel for aarch64 on PyPI. So when installing scandir via pip on aarch64, pip builds the wheel itself, which makes installation take longer. Making a wheel available for aarch64 would benefit aarch64 users by minimizing scandir installation time.

    Expected output: pip should be able to download a scandir wheel from PyPI rather than building it from source code.

    @scandir-team, please let me know if I can help you building wheel/uploading to PyPI repository. I am curious to make scandir wheel available for aarch64. It will be a great opportunity for me to work with you.

    opened by odidev 1
  • Non unicode path error on Linux when scanning a dir with unicode as input

    With Python 2.7.13, on Linux (fs.encoding is UTF-8) I get this error:

    $ touch foo$'\261'bar
    $ touch plain
    $ python2 -c "import scandir;print list(scandir.scandir('.'))"
    [<DirEntry 'foo\xb1bar'>, <DirEntry 'plain'>]
    
    $ python2 -c "import scandir;print list(scandir.scandir(u'.'))"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/venv/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb1 in position 5: invalid start byte
    

    FWIW, os.listdir is completely out of whack on Python 2, returning a mix of bytes and unicode:

    $ python2 -c "import os;print os.listdir('.');print os.listdir(u'.')"
    ['foo\xb1bar', 'plain']
    ['foo\xb1bar', u'plain']
    

    While Python 3 uses surrogate escape for decoding to unicode:

    $ python3 -c "import os;print(os.listdir('.'))"
    ['foo\udcb1bar', 'plain']
    
    I was hoping that scandir would not have the shortcomings of os.listdir on Python 2....
    opened by pombredanne 3
  • Unicode issues in Linux and Unix when running the tests.

    This happens with scandir 1.5 and Python 2.7.11 in all our Linux, AIX, Solaris, FreeBSD and OpenBSD build slaves, but not on Windows and OS X / Mac OS.

    Firstly, test_basic fails with the following error:

    ======================================================================
    ERROR: test_basic (test_scandir.TestScandirC)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/srv/buildslave/runtime/build-ubuntu1404-x64/slave/python-package-ubuntu-1404/build/python-modules/python-scandir-1.5/test/test_scandir.py", line 301, in setUp
        TestMixin.setUp(self)
      File "/srv/buildslave/runtime/build-ubuntu1404-x64/slave/python-package-ubuntu-1404/build/python-modules/python-scandir-1.5/test/test_scandir.py", line 101, in setUp
        setup_main()
      File "/srv/buildslave/runtime/build-ubuntu1404-x64/slave/python-package-ubuntu-1404/build/python-modules/python-scandir-1.5/test/test_scandir.py", line 62, in setup_main
        os.mkdir(join(TEST_PATH, 'subdir', 'unidir\u018F'))
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u018f' in position 144: ordinal not in range(128)
    

    Subsequently, most tests that follow fail with No such file or directory errors, e.g.:

    ======================================================================
    ERROR: test_bytes (test_scandir.TestScandirC)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/srv/buildslave/runtime/build-ubuntu1404-x64/slave/python-package-ubuntu-1404/build/python-modules/python-scandir-1.5/test/test_scandir.py", line 301, in setUp
        TestMixin.setUp(self)
      File "/srv/buildslave/runtime/build-ubuntu1404-x64/slave/python-package-ubuntu-1404/build/python-modules/python-scandir-1.5/test/test_scandir.py", line 104, in setUp
        setup_symlinks()
      File "/srv/buildslave/runtime/build-ubuntu1404-x64/slave/python-package-ubuntu-1404/build/python-modules/python-scandir-1.5/test/test_scandir.py", line 74, in setup_symlinks
        os.mkdir(join(TEST_PATH, 'linkdir', 'linksubdir'))
    OSError: [Errno 2] No such file or directory: '/srv/buildslave/runtime/build-ubuntu1404-x64/slave/python-package-ubuntu-1404/build/python-modules/python-scandir-1.5/test/testdir/linkdir/linksubdir'
    
    ======================================================================
    

    Actually, this breaks all scandir tests except the following three:

    test_traversal (test_walk.TestWalk) ... ok
    test_symlink_to_directory (test_walk.TestWalkSymlink) ... ok
    test_symlink_to_file (test_walk.TestWalkSymlink) ... ok
    

    All excerpts are from an Ubuntu 16.04 build slave, but the errors are common across Linux distributions and Unix varieties and versions. However, OS X / Mac OS and Windows are not affected.
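
The first traceback is consistent with the test path being encoded with an ASCII file system encoding, as can happen when the build runs under the C/POSIX locale. That cause is an assumption; the report does not confirm it. A minimal sketch of the same encoding failure, independent of scandir (the helper `can_encode` is hypothetical):

```python
import sys

# The directory name the test setup tries to create.
name = u'unidir\u018F'

def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# The encoding the interpreter uses for file names on this machine.
print(sys.getfilesystemencoding())

ascii_ok = can_encode(name, 'ascii')
utf8_ok = can_encode(name, 'utf-8')
print(ascii_ok, utf8_ok)
```

If the printed file system encoding on a failing build machine is an ASCII variant, the locale settings of the build environment would be the first thing to check.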

    opened by dumol 4
  • Add new features from CPython (finalizer, close, "with" support, dir_fd handling)

    Since the last changes to this scandir module, CPython has fixed a couple of issues like adding a finalizer and adding a .close() method to the iterator. Python 3.6 added support for the with statement too. These changes from CPython should be included here.
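
For reference, a minimal sketch of the CPython behaviour this issue describes, assuming Python 3.6 or later and the standard library `os.scandir()` rather than this module. The helper name `list_names` is hypothetical.

```python
import os
import tempfile

def list_names(path):
    """Return sorted entry names, closing the scandir iterator promptly."""
    # On Python 3.6+, the iterator is a context manager; leaving the
    # block calls its close() method and releases the directory handle.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)

# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ('a.txt', 'b.txt'):
        open(os.path.join(tmp, filename), 'w').close()
    names = list_names(tmp)

print(names)
```

Whether the same `with` form works with the scandir module depends on the release in use; the issue above is the request to add it.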

    opened by benhoyt 1
Releases(v1.10.0)
Owner
Ben Hoyt
By day I’m a software engineer at Canonical, by night a Go hacker and husband/father.