Schematics

Python Data Structures for Humans™.

About

Project documentation: https://schematics.readthedocs.io/en/latest/

Schematics is a Python library to combine types into structures, validate them, and transform the shapes of your data based on simple descriptions.

The internals are similar to ORM type systems, but there is no database layer in Schematics. Instead, we believe that building a database layer is made significantly easier when Schematics handles everything but writing the query.

Further, it can be used for a range of tasks where having a database involved may not make sense.

Some common use cases:

• Design and document specific data structures
• Convert structures to and from different formats, such as JSON or MsgPack
• Validate API inputs
• Remove fields marked as private information
• Define message formats for communications protocols, like an RPC
• Custom persistence layers

Example

This is a simple Model.

>>> from schematics.models import Model
>>> from schematics.types import StringType, URLType
>>> class Person(Model):
...     name = StringType(required=True)
...     website = URLType()
...
>>> person = Person({'name': u'Joe Strummer',
...                  'website': 'http://soundcloud.com/joestrummer'})
>>> person.name
u'Joe Strummer'

Serializing the data to JSON.

>>> import json
>>> json.dumps(person.to_primitive())
{"name": "Joe Strummer", "website": "http://soundcloud.com/joestrummer"}

Let's try validating without a name value, since it's required.

>>> person = Person()
>>> person.website = 'http://www.amontobin.com/'
>>> person.validate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "schematics/models.py", line 231, in validate
    raise DataError(e.messages)
schematics.exceptions.DataError: {'name': ['This field is required.']}

Add the field and validation passes.

>>> person = Person()
>>> person.name = 'Amon Tobin'
>>> person.website = 'http://www.amontobin.com/'
>>> person.validate()
>>>

Testing & Coverage support

Run coverage and check the missing statements.

$ coverage run --source schematics -m py.test && coverage report

Issues and pull requests
  • RFC: The Undefined a.k.a. dealing with unset fields

    Everybody probably agrees that Schematics needs the ability to distinguish between fields that have been set to None and fields that have not been assigned any value. The issue pops up all the time in various guises, recently as #249.

    I have written a patch that will resolve this once and for all. It's available for testing at https://github.com/bintoro/schematics/tree/undef-value-v1.

    Diff vs master: https://github.com/bintoro/schematics/compare/undef-value-v1

    Documentation: https://gist.github.com/bintoro/c767f314fe827d8082a2

    I posted on the dev list a while back, but since the list is so dead, let's try again here. Also, since then, I've been able to simplify the code and documentation quite a bit by removing a certain needlessly complicated feature.

    All kinds of feedback would be appreciated, for example:

    • How to deal with required fields? Should the required parameter be changed to be able to assume three different values representing "not required", "required", and "something better than None required"?
    • What does the upgrade path look like? This is a breaking change of sorts, so it's unfortunate that v1.0 is already out of the oven. If you ran the code, how did your application deal with the change in the meaning of obj.somefield is None? Should the feature ship as disabled by default until some future release?
    • Is there actually any use case for serializing Undefined as anything other than None? If yes, are the presently available hooks sufficient?
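
    For readers new to the pattern, here is a minimal sentinel sketch (names are illustrative; see the linked branch for the real implementation):

    # Illustrative sketch only, NOT the patch's actual code.
    # A singleton distinct from None lets code tell "never assigned"
    # apart from "explicitly set to None".
    class UndefinedType(object):
        _instance = None

        def __new__(cls):
            if cls._instance is None:
                cls._instance = super(UndefinedType, cls).__new__(cls)
            return cls._instance

        def __repr__(self):
            return 'Undefined'

        def __bool__(self):
            return False
        __nonzero__ = __bool__  # Python 2 spelling

    Undefined = UndefinedType()

    # person.name is Undefined  -> the field was never assigned
    # person.name is None       -> the field was explicitly set to None
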
    feature discussion 
    opened by bintoro 42
  • subclassed models are not serialized

    Consider the following:

    from schematics.models import Model
    from schematics.types import StringType
    from schematics.types.compound import ModelType
    
    class Asset(Model):
        file_name = StringType()
    
    class S3Asset(Asset):
        bucket_name = StringType()
    
    class Product(Model):
        title = StringType()
        asset = ModelType(Asset)
    
    asset = S3Asset({'bucket_name': 'assets_bucket', 'file_name': 'bar'})
    
    product = Product({'title': 'baz', 'asset': asset})
    print product.to_native()
    

    This code outputs: {'asset': {'file_name': u'bar'}, 'title': u'baz'}

    Because "Product.asset" is defined as an "Asset", rather than an S3Asset, schematics does not serialize the S3Asset-specific fields.

    I believe this is because the list of fields to be serialized is defined in ModelMeta, before the model "knows" that it is being passed a subclass.

    I'm not yet sure how I'd fix this; perhaps to_primitive could inspect self.__class__ at serialization time, rather than relying on the field list computed in ModelMeta?
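
    One possible workaround, assuming a schematics version that provides PolyModelType, is to declare the field as polymorphic so that the concrete model class is resolved per instance (reusing the Asset and S3Asset classes from above):

    from schematics.types.compound import PolyModelType

    class Product(Model):
        title = StringType()
        # PolyModelType picks the matching model class for each value,
        # so S3Asset-specific fields survive serialization.
        asset = PolyModelType([Asset, S3Asset])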

    opened by poundifdef 23
  • Py3

    Per schematics/schematics#241, I have attempted a Python 3 port using the six library. Almost all tests are passing locally for me on Python 2.7 and 3.4. Some of the code is a bit hacky and needs attention and review, but it seems good enough to get me going with schematics on Python 3.

    opened by meantheory 22
  • support arbitrary kwargs to a field

    This would allow us to document additional domain-specific attributes on the schema and potentially use them for certain things.

    An example would be a type attribute like unique=True, where all fields could be enumerated for unique attributes to auto-generate database constraints.
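
    One way to approximate this today is a sketch like the following, assuming a schematics version whose BaseType accepts a metadata dict (the _fields attribute name may also vary between versions):

    from schematics.models import Model
    from schematics.types import StringType

    class User(Model):
        # Domain-specific attributes ride along in `metadata`.
        email = StringType(metadata={'unique': True})
        name = StringType()

    # Enumerate fields flagged as unique, e.g. to generate constraints:
    unique_fields = [n for n, f in User._fields.items()
                     if getattr(f, 'metadata', None) and f.metadata.get('unique')]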

    opened by smcclstocks 21
  • Introduce MixedType

    I've thought about PR #340 by @jmsdnns and decided to try to write a similar thing from scratch instead of adapting PolyModelType, since that one mostly consists of model-related machinery.

    As explained in #340, saying "store value x with type y" is just not possible in the current Schematics architecture. However, as long as the type can be inferred from the value itself, it works great.

    For now this thing is called MixedType.

    Examples

    A field that takes a numeric ID or a UUID

    >>> id_field = MixedType((IntType, UUIDType))
    >>> result = id_field.to_native('ee5a16cb-0ee1-46bc-ac40-fb3018b1d29b')
    >>> result
    UUID('ee5a16cb-0ee1-46bc-ac40-fb3018b1d29b')
    >>> id_field.to_primitive(result)
    'ee5a16cb-0ee1-46bc-ac40-fb3018b1d29b'
    >>> id_field.to_native('99999')
    99999
    

    A list of ints or floats

    >>> numbers = ListType(MixedType((IntType, FloatType)))
    >>> result = numbers.to_native(["2", "0.5", "123", "2.999"])
    >>> result
    [2, 0.5, 123, 2.999]
    >>> [type(item).__name__ for item in result]
    ['int', 'float', 'int', 'float']
    

    A dict of ints or a list of ints

    >>> ints = MixedType((DictType, ListType), field=IntType)
    >>> ints.to_native([1, 2, 3, 4])
    [1, 2, 3, 4]
    >>> ints.to_native(dict(a=1, b=2, c=3, d=4))
    {'a': 1, 'c': 3, 'b': 2, 'd': 4}
    

    Complex situations

    All the previous cases rely on to_native to detect the intended type. You can get quite far with that method by exploiting the fact that the types are evaluated in the original order. The most restrictive type must therefore appear first. In the second example, FloatType would have caught all items had the order been (FloatType, IntType).

    For more complex cases, make a subclass of MixedType and declare the resolution logic there:

    class ShoutType(StringType):
        def to_primitive(self, value, context):
            return StringType.to_native(self, value) + '!!!'
    
    class QuestionType(StringType):
        def to_primitive(self, value, context):
            return StringType.to_native(self, value) + '???'
    
    class FancyStringType(MixedType):
    
        # Declare the possible types, either as classes or parameterized instances.
        types = (ShoutType, QuestionType)
    
        # Override the method that detects the correct type. Return the appropriate class.
        def resolve(self, value, context):
            if value.endswith('?'):
                return QuestionType
            elif value.endswith('!'):
                return ShoutType
    
    >>> FancyStringType().to_primitive("Hello, world!")
    'Hello, world!!!!'
    >>> FancyStringType().to_primitive("Who's a good boy?")
    "Who's a good boy????"
    

    A resolver function can be provided in the field definition too:

    fancy_field = MixedType((ShoutType, QuestionType), resolver=resolve_func)
    

    Alternatively, edit the inner types and orchestrate their to_native methods so that the first successful conversion is guaranteed to be the right one.
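
    As an illustrative sketch (not part of this PR), an inner type can be made to reject ambiguous input so that first-match resolution stays correct:

    from schematics.exceptions import ConversionError
    from schematics.types import IntType

    class StrictIntType(IntType):
        # Refuse float-looking input, so MixedType((StrictIntType, FloatType))
        # never misclassifies a float as an int.
        def to_native(self, value, context=None):
            if isinstance(value, float) or (isinstance(value, str) and '.' in value):
                raise ConversionError('Value is not a plain integer.')
            return super(StrictIntType, self).to_native(value, context)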

    Parameterizing types

    Arguments destined for the inner types may appear directly in the MixedType definition and are passed on, much like with compound types.

    >>> FancyStringType(max_length=20).validate("What is the meaning of life?")
    Traceback (most recent call last):
      ...
    schematics.exceptions.ValidationError: [u'String value is too long.']
    

    Since there can be quite dissimilar types that accept various sets of parameters, MixedType inspects all __init__() argument lists in the types' MROs in order to figure out which parameters go where.

    >>> field = MixedType((IntType, QuestionType), required=True, max_value=100, max_length=50)
    >>> field.required
    True
    >>> field._types[IntType].required
    True
    >>> field._types[QuestionType].required
    True
    >>> field._types[IntType].max_value
    100
    >>> field._types[QuestionType].max_length
    50
    

    Parameterizing types separately

    Just make instances as usual.

    >>> field = MixedType((IntType(min_value=1), FloatType(min_value=0.5)))
    >>> field.validate(0)
    Traceback (most recent call last):
      ...
    schematics.exceptions.ValidationError: [u'Int value should be greater than or equal to 1.']
    >>> field.validate(0.75)
    >>>
    
    opened by bintoro 19
  • StringType() allows casting from int to str

    Hi,

    I noticed that StringType defines allow_casts = (str, int). I wonder what the motivation is for this. I was quite surprised that a StringType() field silently casts int input to str.

    I can always make a subclass that overrides the allow_casts class variable though.
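
    Such a subclass is tiny; a sketch, assuming conversion consults the allow_casts class attribute as described above:

    from schematics.types import StringType

    class StrictStringType(StringType):
        # Disable implicit casting: int input now fails conversion
        # instead of being silently turned into a string.
        allow_casts = ()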

    Thanks for your help!

    opened by xihan 19
  • No way to override strict attributes

    I'm curious as to why the new strict behavior defaults to False everywhere except during Model initialization. In any case, it'd be great to be able to override this.

    feature 
    opened by nside 19
  • Create shared environment across nested import/validation loops via a metadata object

    Summary

    This patch introduces a metadata object that is propagated through every level of a recursive import/validation process. It is used to carry options and other data that need to be available throughout the entire process, not just during the outermost loop.

    Background

    One thing currently blocking the Undefined feature is the fact that import/validation parameters (importantly, partial=True) only affect the first level of processing. If you have nested models, your settings no longer apply.

    Apparently someone has noticed this before and created this construct in import_loop to pass the deserialization mapping down to nested models:

    try:
        ...
        raw_value = field_converter(field, raw_value, mapping=model_mapping)
    except Exception:
        raw_value = field_converter(field, raw_value)
    

    And there was a similar try/except inside the field_converter in question.

    The patch

    EDIT: Changed param name from meta to env.

    Instead of adding yet another parameter here, I have future-proofed this by turning the third parameter into a multipurpose object representing the current environment. Import converters and validators can now propagate options and whatever data they please in the env parameter. One practical application might be bookkeeping during validation; currently it's possible to have a ModelType structure that gets into an infinite loop.

    The above code now looks like this:

        sub_env = env.copy() # Make a copy
        sub_env.mapping = model_mapping  # Put the relevant mapping in there
        raw_value = field_converter(field, raw_value, env=sub_env) # Pass it on
    

    The way this works is that any function (including the initial caller) can decide to set up the Environment object. Once it enters the processing chain, its contents override any overlapping parameters passed as keyword args (EDIT: not by necessity, of course, but because import_loop chooses to treat it that way).
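
    A minimal sketch of such an environment object (hypothetical names; the actual class in the patch may differ):

    class Environment(object):
        def __init__(self, **options):
            self.__dict__.update(options)

        def copy(self):
            # Shallow copy: each recursion level may override entries
            # (e.g. `mapping`) without disturbing its caller.
            new = Environment()
            new.__dict__.update(self.__dict__)
            return new

    # The outermost loop locks in settings for the whole recursion:
    env = Environment(strict=False, partial=True, context=None)
    sub_env = env.copy()
    sub_env.mapping = {'nested_field': 'source_key'}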

    Right now, the setup is the responsibility of the outermost import_loop. It takes the strict and partial arguments it receives and locks them in for the rest of the import recursion.

    As an aside, if all options were collated into an env object from the start, the parameter signature of import_loop, convert, and friends could look like this:

    def ...(cls, instance_or_dict, env)
    

    Although incorporating many options into a single object may be considered a bit opaque, it's clearly preferable to always passing a dozen values to every method and their aunts:

    model.validate(...)
    => validate(...)
    => import_loop(...)
    => field_converter(...)
    => field.validate(...)
    => submodel.validate(...)
    

    That's five calls for each level of recursion.

    BREAKING CHANGE WARNING

    I've gotten rid of the try/except constructs that otherwise would have been all over the place.

    Considering that the env container is primarily needed in a recursive process, the method signature dilemma has been solved by requiring compound types (MultiType descendants) to accept the env argument in their to_native and validate_ methods. Simple types are called without env.

    Since custom compound types need a (trivial) change as a result, this would be a great opportunity also to get rid of the context parameter that appears in a ton of places. It would be better placed inside env as well.

    Other changes incorporated here

    • Model.__init__ now accepts the partial parameter too (default=True because it was the de facto default already, specified in the main convert func).
    • Fixed a bug where ListType.validate_items() was added to the list of validators twice: once explicitly and once via collation by TypeMeta.
    opened by bintoro 16
  • make_json_ownersafe doesn't traverse list of EmbeddedDocumentFields

    >>> class Test(EmbeddedDocument):
    ...     text = StringField()
    ...
    >>> class Tester(Document):
    ...     items = ListField(EmbeddedDocumentField(Test))
    ...
    >>> t = Tester(items=[Test(text='mytest')])
    >>> Tester.make_json_ownersafe(t)
    '{"items": [{"_types": ["Test"], "text": "mytest", "_cls": "Test"}]}'

    Without the ListField wrapping the embedded documents, it works just fine.

    opened by chaselee 16
  • Propagate environment during recursive import/export/validation

    A rewrite of #266.

    This is a huge PR, but it's split into meaningful commits, each of which represents a working state with all tests passing.

    env is now always passed to all to_primitive, to_native, and export_loop methods on types. Consequently, the standard field_converter interface also has env as a mandatory component: field_converter(field, value, env).

    The env object will be set up by either import_loop or export_loop during the first iteration if the application hasn't supplied it by then.

    context

    The overloading of "context" with a number of uses has been resolved as follows:

    • context in import_loop() and validate() renamed to trusted_data
    • context as an argument to to_native(), to_primitive(), validate(), and mock() superseded by env
    • context as an argument to expand in transforms.py renamed to expanded_data
    • context used when accessing the private context object replaced by env.context.

    Basically, env.context is just a recommendation for establishing a private namespace. Apparently the only thing in the library that accesses env.context is MultilingualStringType, which looks for a locale there.

    export_loop inspection gone

    [DELETED]

    Miscellaneous changes

    • ModelType.strict option removed for now. To support something like this, we would need to differentiate between at least two flavors of settings:

      • static settings (specified as function defaults, model options, or ModelType definitions), where later (as in deeper level of recursion) overrides earlier
      • explicit settings (specified by application at runtime) that would override everything else.

      I suspect ModelType.strict may have been a band-aid to provide some degree of control where the runtime strict setting hasn't been doing anything.

    • BaseType.validate_required() removed. import_loop enforces required=True for fields directly on a model, so this was only needed for nested fields. They now get this validator in MultiType.__init__() as needed.

    Validators

    env is now passed to all validators, including model-level validators. Although the object is absolutely needed only in compound type validators, it's probably best to make the change everywhere at once, since appending env to the argument list is all that is needed to adapt existing code.
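
    As a concrete illustration (a sketch of the adaptation, not code from this PR), an existing model-level validator changes only its signature:

    # Sketch: adapting an existing validator is just a matter of
    # appending `env` to the signature (ignore it if unused).
    # Before: def validate_website(self, data, value): ...
    def validate_website(self, data, value, env):
        return value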

    opened by bintoro 15
  • Add 'description' to fields and models

    Descriptions are a fundamental aspect of a well-designed schema. Not only are they great documentation for other developers; they can also be programmatically inspected to populate UIs, etc. They're a key part of the JSON Schema specification, so adding them to schematics would enable tools like jsonschematics to create more compliant output, which would in turn make it more compatible with other tools that take JSON Schema documents as input.

    For types, the description makes sense as a new argument to BaseType.__init__. For Models, the value could be taken from the object's __doc__ attribute (or perhaps just the first sentence).
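
    A sketch of the proposal for types (hypothetical; description is not part of the current schematics API):

    from schematics.types import StringType

    class DescribedStringType(StringType):
        def __init__(self, description=None, **kwargs):
            # Stored for introspection, e.g. by a JSON Schema exporter.
            self.description = description
            super(DescribedStringType, self).__init__(**kwargs)

    # name = DescribedStringType(description="The person's legal name.")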

    Thanks for listening.

    feature 
    opened by chadrik 15
  • docs: Fix a few typos

    There are small typos in:

    • docs/usage/extending.rst
    • schematics/compat.py
    • schematics/util.py

    Fixes:

    • Should read werkzeug rather than werzeug.
    • Should read insensitive rather than insenstive.
    • Should read accommodate rather than accomodate.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 0
  • the traceback including the line number where the validation error occurred

    I want to know the line number where the validation error occurred. Currently it gives me something like this:

    (DataError({'url': ValidationError([ErrorMessage('Not a well-formed URL.', None)])}))
    

    When I print the traceback for this error, it points me to the validate.py file, which is not what I want. How do I get a traceback pointing to the line where the url was entered incorrectly?

    opened by ravi140222 0
  • cannot collect all model errors

    from schematics.models import Model
    from schematics.types import StringType, FloatType

    class schema(Model):
        f1 = StringType(choices=['a'])
        f2 = FloatType()

    sample = {'f1': 'b', 'f2': 'x'}
    instance = schema(sample)
    instance.validate()

    The exception only shows the float error, without even initializing the instance. How do I collect all errors according to the documentation?

    opened by dedominicisfa 1
  • Python 3.10 - Schematics finally broke

    Schematics has been ignoring this:

    $ python
    Python 3.9.7 (default, Aug 31 2021, 13:28:12) 
    [GCC 11.1.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from collections import Iterable
    <stdin>:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
    

    With the release of Python 3.10, it is finally broken:

    $ python                
    Python 3.10.0rc2 (default, Oct  4 2021, 11:48:47) [GCC 11.1.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from collections import Iterable
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: cannot import name 'Iterable' from 'collections' (/usr/local/lib/python3.10/collections/__init__.py)
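
    A common compatibility shim for this (a sketch; not necessarily how schematics resolved it) falls back across both import locations:

    try:
        from collections.abc import Iterable  # Python 3.3+
    except ImportError:
        from collections import Iterable      # older Pythons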
    
    opened by ahopkins 4
  • schematics.exceptions.ConversionError: [u'No default or explicit locales were given.']

    from schematics.models import Model
    from schematics.types import MultilingualStringType 
    
    class TestModel(Model):
        mls = MultilingualStringType()
    
    mls_test = TestModel({'mls': {
            'en_US': 'Hello, world!',
            'fr_FR': 'Bonjour tout le monde!',
            'es_MX': '¡Hola, mundo!', }})
    print(mls_test.to_primitive(context={'locale': 'en_US'}))
    

    error: schematics.exceptions.ConversionError: [u'No default or explicit locales were given.']
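
    A possible workaround, assuming a schematics 2.x release where per-call export options travel in app_data rather than context (verify against your installed version):

    print(mls_test.to_primitive(app_data={'locale': 'en_US'}))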

    opened by phpsxg 1

Releases: v1.1.1