Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.

Overview

DiskCache: Disk Backed Cache

DiskCache is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django.

The cloud-based computing of 2021 puts a premium on memory. Gigabytes of empty space are left on disks as processes vie for memory. Among these processes is Memcached (and sometimes Redis), which is used as a cache. Wouldn't it be nice to leverage empty disk space for caching?

Django is Python's most popular web framework and ships with several caching backends. Unfortunately, the file-based cache in Django is essentially broken. Its culling method is random, and large caches repeatedly scan the cache directory, so operations slow linearly as the cache grows. Can you really allow it to take sixty milliseconds to store a key in a cache with a thousand items?

In Python, we can do better. And we can do it in pure-Python!

In [1]: import pylibmc
In [2]: client = pylibmc.Client(['127.0.0.1'], binary=True)
In [3]: client[b'key'] = b'value'
In [4]: %timeit client[b'key']

10000 loops, best of 3: 25.4 µs per loop

In [5]: import diskcache as dc
In [6]: cache = dc.Cache('tmp')
In [7]: cache[b'key'] = b'value'
In [8]: %timeit cache[b'key']

100000 loops, best of 3: 11.8 µs per loop

Note: Micro-benchmarks have their place but are not a substitute for real measurements. DiskCache offers cache benchmarks to defend its performance claims. Micro-optimizations are avoided but your mileage may vary.

DiskCache efficiently makes gigabytes of storage space available for caching. By leveraging rock-solid database libraries and memory-mapped files, cache performance can match and exceed industry-standard solutions. There's no need for a C compiler or running another process. Performance is a feature, and testing has 100% coverage with unit tests and hours of stress testing.

Testimonials

Daren Hasenkamp, Founder --

"It's a useful, simple API, just like I love about Redis. It has reduced the amount of queries hitting my Elasticsearch cluster by over 25% for a website that gets over a million users/day (100+ hits/second)."

Mathias Petermann, Senior Linux System Engineer --

"I implemented it into a wrapper for our Ansible lookup modules and we were able to speed up some Ansible runs by almost 3 times. DiskCache is saving us a ton of time."

Does your company or website use DiskCache? Send us a message and let us know.

Features

  • Pure-Python
  • Fully Documented
  • Benchmark comparisons (alternatives, Django cache backends)
  • 100% test coverage
  • Hours of stress testing
  • Performance matters
  • Django compatible API
  • Thread-safe and process-safe
  • Supports multiple eviction policies (LRU and LFU included)
  • Keys support "tag" metadata and eviction
  • Developed on Python 3.9
  • Tested on CPython 3.6, 3.7, 3.8, 3.9
  • Tested on Linux, Mac OS X, and Windows
  • Tested using GitHub Actions

Quickstart

Installing DiskCache is simple with pip:

$ pip install diskcache

You can access documentation in the interpreter with Python's built-in help function:

>>> import diskcache
>>> help(diskcache)                             # doctest: +SKIP

The core of DiskCache is three data types intended for caching. Cache objects manage a SQLite database and filesystem directory to store key and value pairs. FanoutCache provides a sharding layer to utilize multiple caches and DjangoCache integrates that with Django:

>>> from diskcache import Cache, FanoutCache, DjangoCache
>>> help(Cache)                                 # doctest: +SKIP
>>> help(FanoutCache)                           # doctest: +SKIP
>>> help(DjangoCache)                           # doctest: +SKIP

Built atop the caching data types are Deque and Index, which work as cross-process, persistent replacements for Python's collections.deque and dict. These implement the sequence and mapping container base classes:

>>> from diskcache import Deque, Index
>>> help(Deque)                                 # doctest: +SKIP
>>> help(Index)                                 # doctest: +SKIP

Finally, a number of recipes for cross-process synchronization are provided using an underlying cache. Features like memoization with cache stampede prevention, cross-process locking, and cross-process throttling are available:

>>> from diskcache import memoize_stampede, Lock, throttle
>>> help(memoize_stampede)                      # doctest: +SKIP
>>> help(Lock)                                  # doctest: +SKIP
>>> help(throttle)                              # doctest: +SKIP

Python's docstrings are a quick way to get started but are not intended as a replacement for the DiskCache Tutorial and DiskCache API Reference.

User Guide

For those wanting more details, this part of the documentation covers the tutorial, benchmarks, API, and development.

Comparisons

Comparisons to popular projects related to DiskCache.

Key-Value Stores

DiskCache is mostly a simple key-value store. Feature comparisons with four other projects are shown in the tables below.

  • dbm is part of Python's standard library and implements a generic interface to variants of the DBM database — dbm.gnu or dbm.ndbm. If none of these modules is installed, the slow-but-simple dbm.dumb is used.
  • shelve is part of Python's standard library and implements a “shelf” as a persistent, dictionary-like object. The difference with “dbm” databases is that the values can be anything that the pickle module can handle.
  • sqlitedict is a lightweight wrapper around Python's sqlite3 database with a simple, Pythonic dict-like interface and support for multi-thread access. Keys are arbitrary strings, values arbitrary pickle-able objects.
  • pickleDB is a lightweight and simple key-value store. It is built upon Python's simplejson module and was inspired by Redis. It is licensed with the BSD three-clause license.
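To make the dbm and shelve distinction above concrete, here is a small standard-library sketch. The file names and stored values are arbitrary.

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # dbm stores bytes: str keys and values are encoded on the way in.
    with dbm.open(os.path.join(tmp, 'plain'), 'c') as db:
        db['count'] = '3'
        raw = db['count']  # comes back as bytes, not str

    # shelve pickles values, so picklable Python objects round-trip.
    with shelve.open(os.path.join(tmp, 'shelf')) as shelf:
        shelf['config'] = {'retries': 3, 'hosts': ['a', 'b']}
        config = shelf['config']

print(raw, config)
```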

Features

Feature           diskcache      dbm      shelve   sqlitedict    pickleDB
Atomic?           Always         Maybe    Maybe    Maybe         No
Persistent?       Yes            Yes      Yes      Yes           Yes
Thread-safe?      Yes            No       No       Yes           No
Process-safe?     Yes            No       No       Maybe         No
Backend?          SQLite         DBM      DBM      SQLite        File
Serialization?    Customizable   None     Pickle   Customizable  JSON
Data Types?       Mapping/Deque  Mapping  Mapping  Mapping       Mapping
Ordering?         Insert/Sorted  None     None     None          None
Eviction?         LRU/LFU/more   None     None     None          None
Vacuum?           Automatic      Maybe    Maybe    Manual        Automatic
Transactions?     Yes            No       No       Maybe         No
Multiprocessing?  Yes            No       No       No            No
Forkable?         Yes            No       No       No            No
Metadata?         Yes            No       No       No            No

Quality

Project        diskcache      dbm      shelve   sqlitedict  pickleDB
Tests?         Yes            Yes      Yes      Yes         Yes
Coverage?      Yes            Yes      Yes      Yes         No
Stress?        Yes            No       No       No          No
CI Tests?      Linux/Windows  Yes      Yes      Linux       No
Python?        2/3/PyPy       All      All      2/3         2/3
License?       Apache2        Python   Python   Apache2     3-Clause BSD
Docs?          Extensive      Summary  Summary  Readme      Summary
Benchmarks?    Yes            No       No       No          No
Sources?       GitHub         GitHub   GitHub   GitHub      GitHub
Pure-Python?   Yes            Yes      Yes      Yes         Yes
Server?        No             No       No       No          No
Integrations?  Django         None     None     None        None

Timings

These are rough measurements. See DiskCache Cache Benchmarks for more rigorous data.

Operation  diskcache  dbm     shelve  sqlitedict  pickleDB
get        25 µs      36 µs   41 µs   513 µs      92 µs
set        198 µs     900 µs  928 µs  697 µs      1,020 µs
delete     248 µs     740 µs  702 µs  1,717 µs    1,020 µs

Caching Libraries

  • joblib.Memory provides caching functions and works by explicitly saving the inputs and outputs to files. It is designed to work with non-hashable and potentially large input and output data types such as numpy arrays.
  • klepto extends Python’s lru_cache to utilize different keymaps and alternate caching algorithms, such as lfu_cache and mru_cache. Klepto uses a simple dictionary-style interface for all caches and archives.

Data Structures

  • dict is a mapping object that maps hashable keys to arbitrary values. Mappings are mutable objects. There is currently only one standard Python mapping type, the dictionary.
  • pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.
  • Sorted Containers is an Apache2 licensed sorted collections library, written in pure-Python, and fast as C-extensions. Sorted Containers implements sorted list, sorted dictionary, and sorted set data types.

Pure-Python Databases

  • ZODB supports an isomorphic interface for database operations, which means there's little impact on your code to make objects persistent and there's no database mapper that partially hides the database.
  • CodernityDB is an open source, pure-Python, multi-platform, schema-less, NoSQL database and includes an HTTP server version, and a Python client library that aims to be 100% compatible with the embedded version.
  • TinyDB is a tiny, document oriented database optimized for your happiness. If you need a simple database with a clean API that just works without lots of configuration, TinyDB might be the right choice for you.

Object Relational Mappings (ORM)

  • Django ORM provides models that are the single, definitive source of information about data and contains the essential fields and behaviors of the stored data. Generally, each model maps to a single SQL database table.
  • SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. It provides a full suite of well known enterprise-level persistence patterns.
  • Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use. Peewee supports Sqlite, MySQL, and PostgreSQL with tons of extensions.
  • SQLObject is a popular Object Relational Manager for providing an object interface to your database, with tables as classes, rows as instances, and columns as attributes.
  • Pony ORM is a Python ORM with beautiful query syntax. Use Python syntax for interacting with the database. Pony translates such queries into SQL and executes them in the database in the most efficient way.

SQL Databases

  • SQLite is part of Python's standard library and provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language.
  • MySQL is one of the world’s most popular open source databases and has become a leading database choice for web-based applications. MySQL includes a standardized database driver for Python platforms and development.
  • PostgreSQL is a powerful, open source object-relational database system with over 30 years of active development. Psycopg is the most popular PostgreSQL adapter for the Python programming language.
  • Oracle DB is a relational database management system (RDBMS) from the Oracle Corporation. Originally developed in 1977, Oracle DB is one of the most trusted and widely used enterprise relational database engines.
  • Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it stores and retrieves data as requested by other software applications.

Other Databases

  • Memcached is a free and open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
  • Redis is an open source, in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, and more.
  • MongoDB is a cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. PyMongo is the recommended way to work with MongoDB from Python.
  • LMDB is a lightning-fast, memory-mapped database. With memory-mapped files, it has the read performance of a pure in-memory database while retaining the persistence of standard disk-based databases.
  • BerkeleyDB is a software library intended to provide a high-performance embedded database for key/value data. Berkeley DB is a programmatic toolkit that provides built-in database support for desktop and server applications.
  • LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Data is stored sorted by key and users can provide a custom comparison function.

Reference

License

Copyright 2016-2021 Grant Jenks

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Owner
Grant Jenks