Toolkit for storing files and attachments in web applications

Overview

https://raw.github.com/amol-/depot/master/docs/_static/logo.png

DEPOT - File Storage Made Easy

https://travis-ci.org/amol-/depot.png?branch=master https://coveralls.io/repos/amol-/depot/badge.png?branch=master

DEPOT is a framework for easily storing and serving files in web applications on Python 2.6+ and Python 3.2+.

DEPOT supports storing files in multiple backends, like:

  • Local Disk
  • In Memory (for tests)
  • On GridFS
  • On Amazon S3 (or compatible services)

and integrates with your database by attaching files to your SQLAlchemy or Ming/MongoDB models, respecting transaction behaviour (files are rolled back together with the data).

Installing

DEPOT can be installed from PyPI by installing the filedepot distribution:

$ pip install filedepot

Getting Started

To start using DEPOT, refer to the Documentation.

DEPOT was presented at PyConUK and PyConFR in 2014.

Standalone

Here is a simple example of using DEPOT standalone to store files on MongoDB:

from depot.manager import DepotManager

# Configure a *default* depot to store files on MongoDB GridFS
DepotManager.configure('default', {
    'depot.backend': 'depot.io.gridfs.GridFSStorage',
    'depot.mongouri': 'mongodb://localhost/db'
})

depot = DepotManager.get()

# Save the file and get the fileid
fileid = depot.create(open('/tmp/file.png', 'rb'))

# Get the file back
stored_file = depot.get(fileid)
print(stored_file.filename)
print(stored_file.content_type)

Models

Or you can use DEPOT with SQLAlchemy to store attachments:

from depot.fields.sqlalchemy import UploadedFileField
from depot.fields.specialized.image import UploadedImageWithThumb


class Document(Base):
    __tablename__ = 'document'

    uid = Column(Integer, autoincrement=True, primary_key=True)
    name = Column(Unicode(16), unique=True)
    content = Column('content_col', UploadedFileField)  # plain attached file

    # photo field will automatically generate thumbnail
    photo = Column(UploadedFileField(upload_type=UploadedImageWithThumb))


# Store documents with attached files, the source can be a file or bytes
doc = Document(name=u'Foo',
               content=b'TEXT CONTENT STORED AS FILE',
               photo=open('/tmp/file.png', 'rb'))
DBSession.add(doc)
DBSession.flush()

# DEPOT is session aware, commit/rollback to keep or delete the stored files.
DBSession.commit()

ChangeLog

0.8.0

  • Replaced the unidecode dependency with anyascii for better compatibility with the MIT license.

0.7.1

  • Fix a bug in AWS-S3 support for unicode filenames.

0.7.0

  • Support for the storage_class option in the depot.io.boto3.S3Storage backend. Defaults to STANDARD.

0.6.0

  • Officially support Python 3.7
  • Fix DEPOT wrongly serving requests for any URL that starts with the mountpoint (e.g. /depotsomething was wrongly served for the /depot mountpoint)
  • In SQLAlchemy, properly handle deletion of objects removed through a relationship (e.g. parent.children.remove(X))
  • In SQLAlchemy, properly handle entities deleted through cascade='delete-orphan'

0.5.2

  • Fixed a "start_response called a second time without providing exc_info" error with storages supporting public URLs

0.5.1

  • URLs generated by DepotMiddleware are now guaranteed to be plain ASCII
  • [Breaking change]: Bucket existence checks with S3 storages are now more reliable when the bucket didn't already exist, but they require an additional AWS policy, s3:ListAllMyBuckets, that wasn't required on 0.5.0

0.5.0

  • depot.io.boto3.S3Storage now provides support for accessing S3 with boto3. The previously existing depot.io.awss3.S3Storage can still be used to store files on S3 using boto.
  • SQLAlchemy integration now handles deletion of files on rollback when session is not flushed. Previously flushing the session was required before a rollback too.
  • It is now possible to run tests through tox and build docs through tox -e docs
  • DEPOT is now tested against Python 3.6
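
The boto3 backend is configured like any other. A configuration sketch (the bucket name and placeholder credentials are invented; real AWS credentials are needed for this to actually connect, since the storage is instantiated at configure time):

```python
from depot.manager import DepotManager

# Placeholder credentials: replace with real values before running.
DepotManager.configure('default', {
    'depot.backend': 'depot.io.boto3.S3Storage',
    'depot.access_key_id': 'YOUR-ACCESS-KEY',
    'depot.secret_access_key': 'YOUR-SECRET-KEY',
    'depot.bucket': 'my-bucket',
})
```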

0.4.1

  • Fixed installation error on non-UTF8 systems
  • Improved support for polymorphic subtypes in SQLAlchemy

0.4.0

  • Support for Python 3.5
  • Fixed Content-Disposition header for filenames including a comma

0.3.2

  • MemoryFileStorage now accepts any option, for easier testing configuration

0.3.1

  • Fixed Content-Disposition header when serving from S3 directly
  • Fixed size of SQLAlchemy field on Oracle (was bigger than the allowed maximum)

0.3.0

  • MemoryFileStorage provides in memory storage for files. This is meant to provide a convenient way to speed up test suites and avoid fixture clean up issues.
  • S3Storage can now generate public URLs for private files (expiring in 1 year)
  • Files created from plain bytes are now named "unnamed" instead of missing a filename.

0.2.1

  • S3Storage now supports the prefix option to store files in a subpath

0.2.0

  • Storages now provide a list method to list the files available in the store (this is not meant to retrieve files uploaded by DEPOT, as it lists all the files).
  • DepotExtension for Ming is now properly documented

0.1.2

  • It is now possible to use multiple WithThumbnailFilter to generate multiple thumbnails with different resolutions.
  • Better documentation for MongoDB UploadedFileProperty

0.1.1

  • Fixed a bug with Ming support when accessing UploadedFileProperty as a class property
  • Improved support for DEPOT inside TurboGears admin when using MongoDB

0.1.0

  • Added DepotManager.alias to configure aliases to storage. This allows easy migration from one storage to another by switching where the alias points.
  • UploadedFileField now permits specifying upload_storage to link a Model Column to a specific storage.
  • Added policy and encrypt_key options to S3Storage to upload private and encrypted files.

0.0.6

  • Added host option to S3Storage to allow using providers different from AWS.

0.0.5

  • Added FileIntent to explicitly provide content_type and filename to uploaded content.

0.0.4

  • Added Content-Disposition header with original filename in WSGI middleware

0.0.3

  • Work-Around for issue with wsgi.file_wrapper provided by Waitress WSGI Server

0.0.2

  • Official Support for AWS S3 on Python3

Comments

  • Ability to specify prefix for S3 storage

    This change adds an S3PrefixedStorage class which extends S3Storage with a required "prefix" argument. I thought it would be nice to be able to store all files within one folder in S3, so I implemented it after reviewing this discussion: https://github.com/amol-/depot/issues/13#issuecomment-142432557 As suggested, the prefix here is set in the storage, not in the file field.

    Can you please review and provide feedback? One thing I am not sure about in this implementation is that the prefix ends up in UploadedFile['file_id']. Perhaps it would be clearer not to leak the prefix outside of the storage class.

    opened by eprikazc 16
  • How do I configure s3 to use with depot and flask application factory way?

    How do I configure s3 settings to use with depot? I'm looking for an example. What should the Depot manager syntax look like?

    So far, I've been trying to look at your code and documentation to figure this out but nothing comes up.

    from depot.manager import DepotManager
    from depot.io.awss3 import S3Storage
    
    S3_LOCATION = 'test.s3-website-us-west-2.amazonaws.com'
    S3_KEY = 'XXXX'
    S3_SECRET = 'XXXX'
    S3_UPLOAD_DIRECTORY = 'media'
    S3_BUCKET = 'test'
    #
    s3 = S3Storage(S3_KEY,S3_SECRET,bucket=S3_BUCKET)
    
    DepotManager.configure('media',{
        'depot.backend': depot.io.awss3.S3Storage????
    })
    
    # DepotManager.configure('media', {'depot.storage_path': 'media/'})
    
    app.wsgi_app = DepotManager.make_middleware(app.wsgi_app)
    

    Thanks!

    opened by rlam3 13
  • File stored even though SQLAlchemy transaction was rolled back

    I have observed this behavior:

    1. Create an SQLAlchemy model with a Depot file field, stored using the LocalFileStorage
    2. Add the model to a session
    3. Rollback the session

    Here I would not expect the file from step 1 to have been stored to disk, but it seems to me that it is. Is this intentional?

    Performing session.flush() before the rollback will however result in the file being removed during the rollback.

    opened by davidparsson 11
  • Is there a way to integrate celery async uploads?

    @amol-

    I'm doing multiple image uploads per POST request via wtforms. I'm trying to enable async background upload. Is there a way to integrate celery async uploads with DEPOT? If so could you please provide us with a solution for this? I'm currently using Celery with RabbitMQ.

    Suggestions would be good too.

    Thanks!

    opened by rlam3 8
  • tox.ini for locally run automated tests with multiple python versions

    Currently the package defines integration tests by means of .travis.yml. This allows automated testing after changes are committed; however, a developer cannot run such tests locally without the Travis service.

    Proposed solution (keep it simple)

    • create tox.ini with definition of environments for all target python versions. Such environments would be run by default (would be listed in top section).
    • each environment would run all the tests, which are easy to handle. Candidates for tests to skip are all tests requiring access to Internet or other installed services (like Mongo).
    • .travis.yml would be kept as it is now as it runs more extensive set of tests.

    Possible extensions of proposed solution

    • drive even the extended tests (requiring access to the Internet or using external services like Mongo) via tox.ini. Possibly allow skipping such tests, e.g. by means of environment variables or similar methods.
    • modify .travis.yml to use what extended tox.ini is providing (I have seen couple of such examples). This way automated and local tests could be the same (or very similar).

    If such an idea sounds acceptable, I could provide a PR with the initial proposed solution, which would conclude this issue.

    The extended solution may follow, but it could involve more work and extend the basic functionality too far, blocking possible simplification of locally run tests for developers.

    opened by vlcinsky 6
  • file_id generation customization

    Hi,

    First, thanks for this awesome library, it fits almost everything I need except one use case.

    I have a mixed tree of static and dynamic files:

    http://ex.com/x/y/z1 can be a dynamic object (my logic here)
    http://ex.com/x/y/z2 can be a static picture (static serving using filedepot lookup)
    

    Therefore, I need to look up the static files by their request path (e.g. /x/y/z2, their actual URI ID). When using create(), file_id is generated as an arbitrary UUID, so I can't use get() to look them up by their URI ID. Would it be possible to allow overriding the way file_id is generated, so I could compute it from the URI?

    Thanks,

    Guillaume

    opened by glibersat 6
  • Remove unidecode

    Fixes #64 by using URL encoding when unidecode isn't installed. It also removes the dependency on unidecode so that it is only installed when users want it/can use it.

    opened by jpmccu 5
  • boto3 and therefore minio support

    For our development environment, I set up minio to simulate S3 storage. But communicating with it using the existing depot and boto code was impossible (it had to be SSL, needed a patch to use a custom port, a custom S3 authentication mechanism, installing a CA to verify certs, ...)

    So I gave up and wrote a layer that works with boto3; I figured I'd share it with you guys. There is one thing missing - encrypt_key support. I don't understand what that's supposed to do in the original code, therefore I have no clue how to migrate it to boto3.

    Find the boto3 wrapper attached to this issue:

    depot_minio.py.zip

    Here is how I call it

    DepotManager.configure('s3devel', {'depot.backend': 'depot_minio.MinioStorage',
                'depot.access_key_id': '...',
                'depot.secret_access_key': '....',
                'depot.bucket': 'my-bucket',
                'depot.endpoint': 'http://localhost:9000',
                'depot.prefix': 'my-prefix',
            })
    

    I am willing to help integrate this into your project, if you are interested in supporting boto3 and minio. Just let me know how I can help. br, mike

    opened by multiwave 5
  • Add a way of listing all stored objects

    I miss a way of listing all objects stored in a depot, which would be really useful. Was this left out of the API intentionally, or could this be added?

    opened by fmarczin 5
  • How do I use depot to specify which bucket to update/save an image to?

    Currently I'm using SQLAlchemy as the ORM, and following your docs, this is what my folder looks like currently:

    /media
       /8404b4c7-2106-11e5-827f-685b358e848d
         /metadata.json
         /file

    I want to separate out, say for example, profile avatars and user images.

    Also, is there a way to get the size of the image that is stored within the folder/bucket?

    opened by rlam3 5
  • [SQLALCHEMY] Deleting multiple rows with the query system doesn't delete the associated files

    I use the SQLAlchemy ORM system. When I get an object and delete it, the associated file is deleted too.

    d = DBSession.query(Document).filter_by(name=u_('Foo2')).first()
    DBSession.delete(d)
    

    But when I create a query to delete multiple rows, the rows are deleted in the DB but the associated files are not deleted

    DBSession.query(Document).filter_by(name=u_('Foo2')).delete()
    DBSession.commit()
    
    opened by arnaudiko 4
  • Deprecation warning: Image.BILINEAR (pillow >=9.1.0)

    Thanks for the development & maintenance of the filedepot package!

    I recently started seeing this deprecation warning when using the WithThumbnailFilter class with pillow v9.2.0 (I'm running on Python 3.10.7):

    lib/python3.10/site-packages/depot/fields/filters/thumbnails.py:37: DeprecationWarning: 
    BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
        thumbnail.thumbnail(self.thumbnail_size, Image.BILINEAR)
    

    It looks like Image.BILINEAR was deprecated in pillow 9.1.0 in favour of Image.Resampling.BILINEAR.

    opened by paulgoetze 2
  • need verify ssl with boto3

    Hello!

    I need SSL verification and an option to set the path to my self-signed certificate when working with S3.

    There's no support for the verify and use_ssl parameters of boto3.resource (boto3 docs).

    Can you add this support to your depot.io.boto3.S3Storage?

    Something like this

    class S3Storage:
         ...
    
         def __init__(self, access_key_id, secret_access_key, bucket=None, region_name=None,
                     policy=None, storage_class=None, endpoint_url=None, prefix='',
                     use_ssl=None, verify=None):
             ...
             kw = {}
             ...
             if use_ssl is not None:
                 kw['use_ssl'] = use_ssl
             if verify is not None:
                 kw['verify'] = verify
             self._s3 = self._conn.resource('s3', **kw)
             ...
    

    Or even this (support for all S3 configuration parameters)

    class S3Storage:
         ...
    
         def __init__(
            self,
            access_key_id,
            secret_access_key,
            bucket=None,
            policy=None,
            storage_class=None,
            prefix='',
            s3_params=None,
        ):
             ...
             s3_params = s3_params or {}
             self._s3 = self._conn.resource('s3', **s3_params)
             ...
    

    so I can configure depot like this

    depot.manager.DepotManager.configure(
        'default',
        {
            'depot.backend': 'depot.io.boto3.S3Storage',
            'depot.access_key_id': settings.depot_access_key_id,
            'depot.secret_access_key': settings.depot_secret_access_key,
            'depot.bucket': 'bububucket',
            'depot.policy': settings.depot_policy,
    
            'depot.s3_params': {
                'endpoint_url': settings.depot_endpoint_url,
                'use_ssl': True,
                'verify': 'certs.pem',
                'config': botocore.config.Config(
                    retries={
                        "max_attempts": MAX_RETRIES,
                        "mode": "standard",
                    },
                    s3={
                        "addressing_style": 'virtual',
                    },
                ),
            },
            'depot.prefix': 'default_prefix'
        },
    )
    
    opened by qxiddd 0
  • Reprocessing filters

    Not really an issue, more of a question 😅

    Is there a way to reprocess all filters without re-uploading the main file? I guess the file will have to be downloaded anyway to be processed, but I'd like to avoid the extra upload 👀

    I poked around a bit and the only thing that came to mind was to download/replace/upload every file.

    opened by olgeni 0
  • Added support for setting file IDs explicitly.

    Use with care, especially with MongoDB. This is probably going to be a controversial PR, but I wanted to put it out there. It should be 100% backwards compatible.

    opened by jpmccu 5
  • Need to be able to replace nonexistent files in order to do proper backup/restore or import/export

    If a depot is stored locally, the backup and restore can obviously just use a local archiving tool. But if it's remote and someone wants to either export the depot to another installation or do some other sort of extraction, we need to be able to replicate depots exactly, including IDs. One easy way to do this is to allow file ids to optionally be set on create.

    This is needed because if the database is storing file ids, then it would break everything to have the new store use a new ID.

    opened by jpmccu 0
Releases
  • 0.9.0(Dec 11, 2022)

    • Support for SQLAlchemy 1.4 and 2.0
    • Support for SQLAlchemy objects deleted with .delete(synchronize_session="fetch")
    • Tests migrated to unittest
  • 0.8.0(Jul 27, 2020)

  • 0.7.1(Nov 26, 2019)

  • 0.7.0(Aug 13, 2019)

  • 0.5.0(May 7, 2017)

Owner
Alessandro Molina