schemasheets - structuring your data using spreadsheets

Last update: Dec 01, 2022

Overview

Create a data dictionary / schema for your data using simple spreadsheets - no coding required.

Author your schema as a google sheet or excel spreadsheet
Generate schemas:
- LinkML
- SHACL and ShEx
- JSON-Schema
- SQL DDL
- OWL
Validate data automatically

See the test google sheets for examples

See also the examples folder which has an end-to-end example

How it works

The following example shows a schema sheet for a schema that is focused around the concept of a Person. The rows in the sheet describe either classes or slots (fields)

record	field	key	multiplicity	range	desc	schema.org
`>` class	slot	identifier	cardinality	range	description	`exact_mappings: {curie_prefix: sdo}`
-	id	yes	1	string	any identifier	identifier
-	description	no	0..1	string	a textual description	description
Person		n/a	n/a	n/a	a person, living or dead	Person
Person	id	yes	1	string	identifier for a person	identifier
Person, Organization	name	no	1	string	full name	name
Person	age	no	0..1	decimal	age in years	-
Person	gender	no	0..1	GenderType	gender	-
Person	has medical history	no	0..*	MedicalEvent	medical history	-
MedicalEvent		n/a	n/a	n/a	-	-

personinfo google sheet

The sheet is structured as follows:

The first line is a header line. You get to decide the column headers
Subsequent lines starting with > are column descriptors
- these provide a way to interpret the columns
- descriptors can be drawn from the linkml vocabulary
Remaining rows are elements of your schema
- Each element gets its own row
- A row can represent a class (record, table), field (column), enumeration, or other element types
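The layout described above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the schemasheets implementation; the function name `split_sheet` and the cut-down sheet are made up for the example, and it assumes any row whose first cell starts with `>` is a column descriptor.

```python
import csv
import io

# A cut-down copy of the personinfo sheet (tab-separated); illustrative only.
SHEET = (
    "record\tfield\tkey\tmultiplicity\trange\tdesc\n"
    "> class\tslot\tidentifier\tcardinality\trange\tdescription\n"
    "Person\t\tn/a\tn/a\tn/a\ta person\n"
    "Person\tid\tyes\t1\tstring\tidentifier for a person\n"
)


def split_sheet(text):
    """Return (header, descriptor_rows, element_rows) for one TSV sheet.

    Hypothetical helper: rows whose first cell starts with '>' are treated
    as column descriptors, everything else as schema elements.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    descriptors, elements = [], []
    for row in body:
        if row and row[0].startswith(">"):
            # Strip the leading '>' marker from the first cell.
            cleaned = [row[0][1:].strip()] + [cell.strip() for cell in row[1:]]
            descriptors.append(dict(zip(header, cleaned)))
        else:
            elements.append(dict(zip(header, row)))
    return header, descriptors, elements


header, descriptors, elements = split_sheet(SHEET)
print(descriptors[0]["record"])  # class
print(len(elements))             # 2
```

Here the header `record` is interpreted as `class` and `field` as `slot`, which is how the descriptor row tells the tool what each of your own column names means.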

The most basic schema concepts are classes and slots

classes represent record types, similar to tables in a database or sheets in a spreadsheet
slots represent fields, similar to columns in a database or spreadsheet

These can be used in combination:

If a class is provided, but a slot is not, then the row represents a class.
If a slot is provided, but a class is not, then the row represents a slot (field)
If both class and slot are provided, then the row represents the usage of a slot in the context of a class
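The three rules above can be written as a small decision function. This is a sketch for illustration only; `row_kind` is a hypothetical name, and it assumes that an unfilled cell arrives as an empty string (or `None`).

```python
def row_kind(class_name, slot_name):
    """Classify a row by which of the class/slot columns are filled.

    Hypothetical helper mirroring the three rules in the text.
    """
    has_class = bool(class_name and class_name.strip())
    has_slot = bool(slot_name and slot_name.strip())
    if has_class and has_slot:
        return "slot_usage"  # a slot in the context of a class
    if has_class:
        return "class"
    if has_slot:
        return "slot"
    return "unknown"


print(row_kind("Person", ""))    # class
print(row_kind("", "id"))        # slot
print(row_kind("Person", "id"))  # slot_usage
```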

Generating schemas

Assuming your schema is arranged as a set of sheets (TSV files) in the src folder:

sheets2project -d . src/*.tsv

This will generate individual folders for jsonschema, shacl, ... as well as a website that can be easily hosted on github.

To create only LinkML yaml:

sheets2linkml -o my.yaml src/*.tsv

Simple data dictionaries

This framework allows you to represent complex relation-style schemas using spreadsheets/TSVs. But it also allows for representation of simple "data dictionaries" or "minimal information lists". These can be thought of as "wide tables", e.g. representing individual observations or observable units such as persons or samples.

TODO

Prefixes

If you specify a column descriptor of prefix, then rows with that column populated will represent prefixes. The prefix expansion is specified using prefix_reference

Example:

prefix	URI
`>` prefix	prefix_reference
sdo	http://schema.org/
personinfo	https://w3id.org/linkml/examples/personinfo/
famrel	https://example.org/FamilialRelations#
GSSO	http://purl.obolibrary.org/obo/GSSO_

We recommend you specify prefixes in their own sheet.

If prefixes are not provided and you do not specify --no-repair, then prefixes will be inferred using bioregistry.
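To make the role of the prefix sheet concrete, the following sketch builds a prefix map from the example rows and uses it to expand a CURIE. The function `expand_curie` is a hypothetical helper written for this example; it is not part of schemasheets.

```python
# (prefix, prefix_reference) pairs copied from the example sheet.
PREFIX_ROWS = [
    ("sdo", "http://schema.org/"),
    ("personinfo", "https://w3id.org/linkml/examples/personinfo/"),
    ("famrel", "https://example.org/FamilialRelations#"),
    ("GSSO", "http://purl.obolibrary.org/obo/GSSO_"),
]


def expand_curie(curie, prefix_map):
    """Expand 'prefix:local_id' to a full URI using the prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise KeyError(f"No prefix declared for {curie!r}")
    return prefix_map[prefix] + local_id


prefix_map = dict(PREFIX_ROWS)
print(expand_curie("famrel:01", prefix_map))
# https://example.org/FamilialRelations#01
```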

Schema-level metadata

If you specify a column descriptor of schema, then rows with that column populated will represent schemas.

Example:

Schema	uri	Desc	Schema Prefix
`>` schema	id	description	default_prefix
PersonInfo	https://w3id.org/linkml/examples/personinfo	Information about people, based on schema.org	personinfo

The list of potential descriptors for a schema can be found by consulting SchemaDefinition in the LinkML metamodel.

Both id and name are required; they will be auto-filled if you leave them blank.

Populating the fields description and license is strongly encouraged.

Currently, multiple schemas are not supported. We recommend providing a single-row sheet for schema metadata.

Enums

Two descriptors are provided for enumerations:

enum
permissible_value

These can be used in combination:

If enum is provided, and permissible_value is not, then the row represents an enumeration
If both enum and permissible_value are provided, the row represents a particular enum value

The following example includes two enums:

ValueSet	Value	Mapping	Desc
`>` enum	permissible_value	meaning	description
FamilialRelationshipType	-	-	familial relationships
FamilialRelationshipType	SIBLING_OF	famrel:01	share the same parent
FamilialRelationshipType	PARENT_OF	famrel:02	biological parent
FamilialRelationshipType	CHILD_OF	famrel:03	inverse of parent
GenderType	-	-	gender
GenderType	nonbinary man	GSSO:009254	-
GenderType	nonbinary woman	GSSO:009253	-
...	...	...	-

enums google sheet

All other descriptors are optional, but we recommend that you provide a description for each enumeration and use the meaning descriptor, which maps a value to a vocabulary or ontology term.
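As a rough illustration of the enum/permissible_value rules, the sketch below groups rows from the example sheet into enumerations. The output structure (plain dictionaries) and the name `build_enums` are assumptions made for this example; the real tool produces LinkML objects.

```python
# Rows keyed by descriptor name, taken from the example sheet.
ROWS = [
    {"enum": "FamilialRelationshipType", "permissible_value": "",
     "meaning": "", "description": "familial relationships"},
    {"enum": "FamilialRelationshipType", "permissible_value": "SIBLING_OF",
     "meaning": "famrel:01", "description": "share the same parent"},
    {"enum": "FamilialRelationshipType", "permissible_value": "PARENT_OF",
     "meaning": "famrel:02", "description": "biological parent"},
    {"enum": "GenderType", "permissible_value": "",
     "meaning": "", "description": "gender"},
    {"enum": "GenderType", "permissible_value": "nonbinary man",
     "meaning": "GSSO:009254", "description": ""},
]


def build_enums(rows):
    """Group rows into {enum name: {description, permissible_values}}."""
    enums = {}
    for row in rows:
        enum = enums.setdefault(
            row["enum"], {"description": None, "permissible_values": {}}
        )
        if row["permissible_value"]:
            # Both enum and permissible_value are set: the row is a value.
            enum["permissible_values"][row["permissible_value"]] = {
                "meaning": row["meaning"] or None,
                "description": row["description"] or None,
            }
        else:
            # Only enum is set: the row describes the enumeration itself.
            enum["description"] = row["description"] or None
    return enums


enums = build_enums(ROWS)
print(list(enums["FamilialRelationshipType"]["permissible_values"]))
# ['SIBLING_OF', 'PARENT_OF']
```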

For more on enumerations, see the linkml tutorial

Specifying cardinality

See configschema.yaml for all possible vocabularies. These include:

UML strings, e.g. '0..1'
text strings matching the cardinality vocabulary, e.g. 'zero to one'
codes used in cardinality vocabulary, e.g. O, M, ...

The vocabulary maps to underlying LinkML primitives:
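As an illustration, UML-style strings can be read as a pair of booleans. The sketch below assumes the usual UML meaning (a lower bound of 1 means the value is required, an upper bound of * means it is multivalued) and uses the LinkML slot names `required` and `multivalued`; the authoritative mapping, including the text strings and codes, lives in configschema.yaml and is not reproduced here. `parse_uml_cardinality` is a hypothetical helper.

```python
def parse_uml_cardinality(text):
    """Translate '1', '0..1', '0..*' or '1..*' into required/multivalued.

    Assumption: lower bound 1 -> required, upper bound '*' -> multivalued.
    """
    lower, sep, upper = text.strip().partition("..")
    if not sep:
        # A single number such as '1' means exactly that many values.
        upper = lower
    if lower not in ("0", "1") or upper not in ("1", "*"):
        raise ValueError(f"Unsupported cardinality string: {text!r}")
    return {"required": lower == "1", "multivalued": upper == "*"}


print(parse_uml_cardinality("0..1"))  # {'required': False, 'multivalued': False}
print(parse_uml_cardinality("1..*"))  # {'required': True, 'multivalued': True}
```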

Slot-class grids

If you have a large number of fields/columns, with varying applicability/cardinality across different classes, it can be convenient to specify this as a grid.

An example is a minimal information standard that includes different packages or checklists, e.g. MIxS.

For example:

term	title	desc	mi_patient	mi_mod	mi_terrestrial	mi_marine	mi_extraterrestrial
`>` slot	title	description	cardinality	cardinality	cardinality	cardinality	cardinality
`>`			`applies_to_class: MI patient`	`applies_to_class: MI model organism`	`applies_to_class: MI terrestrial sample`	`applies_to_class: MI marine sample`	`applies_to_class: MI extraterrestrial sample`
id	unique identifier	a unique id	M	M	M	M	M
alt_ids	other identifiers	any other identifiers	O	O	O	O	O
body_site	body site	location where sample is taken from	M	R	-	-	-
disease	disease status	disease the patient had	M	O	-	-	-
age	age	age	M	R	-	-	-
depth	depth	depth in ground or water	-	-	R	R	R
alt	altitude	height above sea level			R	R	R
salinity	salinity	salinity			R	R	R
porosity	porosity	porosity
location	location	location on earth
astronomical_body	astronomical body	planet or other astronomical object where sample was collected					M

data dictionary google sheet

Here the applies_to_class descriptor indicates that the column value for the slot indicated in the row is interpreted as slot usage for that class.
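One way to picture the grid is to "unpivot" it into per-class slot usages. The sketch below does this for three of the columns in the example. It keeps the cardinality codes (M, R, O) as opaque strings rather than interpreting them, and it assumes that an empty cell or a `-` means the slot is not used by that class; `expand_grid` is a hypothetical helper.

```python
# Column header -> class named in its applies_to_class setting.
COLUMN_CLASSES = {
    "mi_patient": "MI patient",
    "mi_mod": "MI model organism",
    "mi_terrestrial": "MI terrestrial sample",
}

# A few rows from the example grid.
GRID = [
    {"term": "id", "mi_patient": "M", "mi_mod": "M", "mi_terrestrial": "M"},
    {"term": "body_site", "mi_patient": "M", "mi_mod": "R", "mi_terrestrial": "-"},
    {"term": "depth", "mi_patient": "-", "mi_mod": "-", "mi_terrestrial": "R"},
]


def expand_grid(rows, column_classes):
    """Return {class name: {slot name: cardinality code}}."""
    usage = {cls: {} for cls in column_classes.values()}
    for row in rows:
        for column, cls in column_classes.items():
            code = row.get(column, "").strip()
            if code and code != "-":  # assumption: '-' or blank = not applicable
                usage[cls][row["term"]] = code
    return usage


usage = expand_grid(GRID, COLUMN_CLASSES)
print(usage["MI patient"])             # {'id': 'M', 'body_site': 'M'}
print(usage["MI terrestrial sample"])  # {'id': 'M', 'depth': 'R'}
```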

Metatype fields

In all of the examples above, distinct descriptors are used for class names, slot names, type names, enum names, etc

An alternative pattern is to mix element types in a single sheet, indicating the name of the element using name and the type using metatype.

For example:

type	item	applies to	key	multiplicity	range	parents	desc	schema.org	wikidata	belongs	status	notes
`>` metatype	name	class	identifier	cardinality	range	is_a	description	`exact_mappings: {curie_prefix: sdo}`	`exact_mappings: {curie_prefix: wikidata}`	in_subset	status	ignore
`> vmap: {C: class, F: slot}`
`>`									curie_prefix: wikidata		`vmap: {T: testing, R: release}`
F	id		yes	1	string		any identifier	identifier
F	name	Person, Organization	no	1	string		full name	name
F	description		no	0..1	string		a textual description	description
F	age	Person	no	0..1	decimal		age in years
F	gender	Person	no	0..1	GenderType		gender
F	has medical history	Person	no	0..*	MedicalEvent		medical history				T
C	Person						a person, living or dead	Person	Q215627		R
C	Event						grouping class for events		Q1656682	a	R
C	MedicalEvent					Event	a medical encounter			b	T
C	ForProfit					Organization
C	NonProfit					Organization			Q163740			foo

personinfo with types
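The interplay of metatype, name and vmap can be sketched as follows. This is a simplified illustration, not the library code: `focal_element` is a hypothetical helper, the column names `type` and `item` are the headers from the example sheet, and the check that a name is present reflects the requirement that a name column accompany a metatype column.

```python
# Value map from the 'vmap: {C: class, F: slot}' setting in the example.
VMAP = {"C": "class", "F": "slot"}

ROWS = [
    {"type": "F", "item": "id"},
    {"type": "C", "item": "Person"},
    {"type": "C", "item": "MedicalEvent"},
]


def focal_element(row, vmap, metatype_column="type", name_column="item"):
    """Return (metatype, name) for a row in a mixed-element sheet."""
    raw = row[metatype_column]
    metatype = vmap.get(raw, raw)  # translate C/F; pass other values through
    name = row.get(name_column)
    if not name:
        raise ValueError("name column must be set when metatype column is set")
    return metatype, name


for row in ROWS:
    print(focal_element(row, VMAP))
# ('slot', 'id')
# ('class', 'Person')
# ('class', 'MedicalEvent')
```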

Formal specification

In progress. The following is a sketch. Please refer to the above examples for elucidation.

The first line is a HEADER line.
- Each column must be non-null and unique
- In future, grouping columns may be possible
Subsequent lines starting with > are column configurations
- A column configuration can be split over multiple lines
- Each line must be a valid yaml string (note that a single token is valid yaml for that token)
- The first config line must include a descriptor
- Subsequent lines are settings for that descriptor
- A descriptor can be one of:
  - Any LinkML metamodel slot (e.g. description, comments, required, recommended, multivalued)
  - The keyword cardinality
  - An element metatype (schema, prefix, class, enum, slot, type, subset, permissible_value)
- Settings can be taken from configschema.yaml
  - vmap provides a mapping used to translate column values. E.g. a custom "yes" or "no" to "true" or "false"
  - various keys provide ways to auto-prefix or manipulate strings
Remaining rows are elements of your schema
- Each element gets its own row
- A row can represent a class (record, table), field (column), enumeration, or other element types
- The type of the row is indicated by whether columns with metatype descriptors are filled
  - E.g. if a column header "field" has a descriptor "slot" then any row with a non-null value is interpreted as a slot
- If a metatype descriptor is present then this is used
- A row must represent exactly one element type
- If both class and slot descriptors are present then the row is interpreted as a slot in the context of that class (see slot_usage)
All sheets/TSVs are combined together into a single LinkML schema as YAML
This LinkML schema can be translated to other formats as per the LinkML generators

Working with files / google sheets

This tool takes as input a collection of sheets, which are stored as TSV files.

You can make use of various ways of managing/organizing these:

TSV files maintained in GitHub
Google sheets
Excel spreadsheets
SQLite databases

Tips for each of these and for organizing your information are provided below

Multiple sheets vs single sheets

It is up to you whether you represent your schema as a single sheet or as multiple sheets

However, if your schema includes a mixture of different element types, you may end up with a lot of null values if you have a single sheet. It can be more intuitive to "normalize" your schema description into different sheets:

sheets for classes/slots
sheets for enums
sheets for types

Currently schemasheets has no built-in facilities for interacting directly with google sheets; it is up to you to both download and upload these.

TODO: scripts for merging/splitting sheets

Manual upload/download

Note that you can create a URL from a google sheet to the TSV download - TODO
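Until this section is filled in, here is a tentative sketch of building such a URL. It assumes the Google Sheets export endpoint of the form `/spreadsheets/d/<id>/export?format=tsv&gid=<tab id>`, which is a property of Google Sheets rather than anything provided by schemasheets, and which only works for sheets you are allowed to read. `tsv_export_url` and the sheet ID below are placeholders.

```python
from urllib.parse import urlencode


def tsv_export_url(sheet_id, gid="0"):
    """Build a TSV download URL for one tab of a google sheet.

    Assumption: the sheet is readable by whoever fetches the URL, and
    'gid' is the numeric tab id shown in the sheet's own URL.
    """
    query = urlencode({"format": "tsv", "gid": gid})
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{query}"


# 'YOUR_SHEET_ID' is a placeholder, not a real document.
print(tsv_export_url("YOUR_SHEET_ID"))
# https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID/export?format=tsv&gid=0
```

The downloaded file can then be saved under src/ and passed to the command-line tools like any other TSV.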

COGS

We recommend the COGS framework for working with google sheets

cogs

A common pattern is a single sheet document for a schema, with different sheets/tabs for different parts of the schema

TODO: example

Working with Excel spreadsheets

Currently there is no direct support; it is up to you to load/save as individual TSVs.

Working with SQLite

Comments

Three tests failing on Mark's laptop but not in GH Actions or on several other people's computers

FAILED                               [ 30%]
test_schema_exporter.py:174 (test_types)
self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
file_name = '/Users/MAM/Documents/gitrepos/schemasheets/tests/output/mini.tsv'
delimiter = '\t'

    def merge_sheet(self, file_name: str, delimiter='\t') -> None:
        """
        Merge information from the given schema sheet into the current schema
    
        :param file_name: schema sheet
        :param delimiter: default is tab
        :return:
        """
        logging.info(f'READING {file_name} D={delimiter}')
        #with self.ensure_file(file_name) as tsv_file:
        #    reader = csv.DictReader(tsv_file, delimiter=delimiter)
        with self.ensure_csvreader(file_name, delimiter=delimiter) as reader:
            schemasheet = SchemaSheet.from_dictreader(reader)
            line_num = schemasheet.start_line_number
            # TODO: check why this doesn't work
            #while rows and all(x for x in rows[-1] if not x):
            #    print(f'TRIMMING: {rows[-1]}')
            #    rows.pop()
            logging.info(f'ROWS={len(schemasheet.rows)}')
            for row in schemasheet.rows:
                try:
>                   self.add_row(row, schemasheet.table_config)

../schemasheets/schemamaker.py:105: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
row = {'Desc': 'my string', 'Extends': 'string', 'Type': '', 'base': '', ...}
table_config = TableConfig(name=None, columns={'Type': ColumnConfig(name='Type', maps_to='type', settings=ColumnSettings(curie_prefix...], all_of=[]), is_element_type=None)}, column_by_element_type={'type': 'Type'}, metatype_column=None, name_column=None)

    def add_row(self, row: Dict[str, Any], table_config: TableConfig):
>       for element in self.row_focal_element(row, table_config):

../schemasheets/schemamaker.py:111: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
row = {'Desc': 'my string', 'Extends': 'string', 'Type': '', 'base': '', ...}
table_config = TableConfig(name=None, columns={'Type': ColumnConfig(name='Type', maps_to='type', settings=ColumnSettings(curie_prefix...], all_of=[]), is_element_type=None)}, column_by_element_type={'type': 'Type'}, metatype_column=None, name_column=None)
column = None

    def row_focal_element(self, row: Dict[str, Any], table_config: TableConfig,
                          column: COL_NAME = None) -> Generator[None, Element, None]:
        """
        Each row must have a single focal element, i.e the row is about a class, a slot, an enum, ...
    
        :param row:
        :param table_config:
        :return:
        """
        vmap = {}
        main_elt = None
        if table_config.metatype_column:
            tc = table_config.metatype_column
            if tc in row:
                typ = self.normalize_value(row[tc], table_config.columns[tc])
                if not table_config.name_column:
                    raise ValueError(f'name column must be set when type column ({tc}) is set; row={row}')
                name_val = row[table_config.name_column]
                if not name_val:
                    raise ValueError(f'name column must be set when type column ({tc}) is set')
                if typ == 'class':
                    vmap[T_CLASS] = [self.get_current_element(ClassDefinition(name_val))]
                elif typ == 'slot':
                    vmap[T_SLOT] = [self.get_current_element(SlotDefinition(name_val))]
                else:
                    raise ValueError(f'Unknown metatype: {typ}')
        if table_config.column_by_element_type is None:
            raise ValueError(f'No table_config.column_by_element_type')
        for k, elt_cls in tmap.items():
            if k in table_config.column_by_element_type:
                col = table_config.column_by_element_type[k]
                if col in row:
                    v = self.normalize_value(row[col])
                    if v:
                        if '|' in v:
                            vs = v.split('|')
                        else:
                            vs = [v]
                        if elt_cls == Prefix:
                            if len(vs) != 1:
                                raise ValueError(f'Cardinality of prefix col must be 1; got: {vs}')
                            pfx = Prefix(vs[0], 'TODO')
                            self.schema.prefixes[pfx.prefix_prefix] = pfx
                            vmap[k] = [pfx]
                        elif elt_cls == SchemaDefinition:
                            if len(vs) != 1:
                                raise ValueError(f'Cardinality of schema col must be 1; got: {vs}')
                            self.schema.name = vs[0]
                            vmap[k] = [self.schema]
                        else:
                            vmap[k] = [self.get_current_element(elt_cls(v)) for v in vs]
        def check_excess(descriptors):
            diff = set(vmap.keys()) - set(descriptors + [T_SCHEMA])
            if len(diff) > 0:
                raise ValueError(f'Excess slots: {diff}')
        if column:
            cc = table_config.columns[column]
            if cc.settings.applies_to_class:
                if T_CLASS in vmap and vmap[T_CLASS]:
                    raise ValueError(f'Cannot use applies_to_class in class-focused row')
                else:
                    cls = self.get_current_element(ClassDefinition(cc.settings.applies_to_class))
                    vmap[T_CLASS] = [cls]
        if T_SLOT in vmap:
            check_excess([T_SLOT, T_CLASS])
            if len(vmap[T_SLOT]) != 1:
                raise ValueError(f'Cardinality of slot field must be 1; got {vmap[T_SLOT]}')
            main_elt = vmap[T_SLOT][0]
            if T_CLASS in vmap:
                # TODO: attributes
                c: ClassDefinition
                for c in vmap[T_CLASS]:
                    #c: ClassDefinition = vmap[T_CLASS]
                    if main_elt.name not in c.slots:
                        c.slots.append(main_elt.name)
                    if self.unique_slots:
                        yield main_elt
                    else:
                        c.slot_usage[main_elt.name] = SlotDefinition(main_elt.name)
                        main_elt = c.slot_usage[main_elt.name]
                        yield main_elt
            else:
                yield main_elt
        elif T_CLASS in vmap:
            check_excess([T_CLASS])
            for main_elt in vmap[T_CLASS]:
                yield main_elt
        elif T_ENUM in vmap:
            check_excess([T_ENUM, T_PV])
            if len(vmap[T_ENUM]) != 1:
                raise ValueError(f'Cardinality of enum field must be 1; got {vmap[T_ENUM]}')
            this_enum: EnumDefinition = vmap[T_ENUM][0]
            if T_PV in vmap:
                for pv in vmap[T_PV]:
                    #pv = PermissibleValue(text=v)
                    this_enum.permissible_values[pv.text] = pv
                    yield pv
            else:
                yield this_enum
        elif T_PREFIX in vmap:
            for main_elt in vmap[T_PREFIX]:
                yield main_elt
        elif T_TYPE in vmap:
            for main_elt in vmap[T_TYPE]:
                yield main_elt
        elif T_SUBSET in vmap:
            for main_elt in vmap[T_SUBSET]:
                yield main_elt
        elif T_SCHEMA in vmap:
            for main_elt in vmap[T_SCHEMA]:
                yield main_elt
        else:
>           raise ValueError(f'Could not find a focal element for {row}')
E           ValueError: Could not find a focal element for {'Type': '', 'base': '', 'uri': '', 'Desc': 'my string', 'Extends': 'string'}

../schemasheets/schemamaker.py:318: ValueError

The above exception was the direct cause of the following exception:

    def test_types():
        """
        tests a specification that is dedicated to types
        """
        sb = SchemaBuilder()
        schema = sb.schema
        # TODO: add this functionality to SchemaBuilder
        t = TypeDefinition('MyString', description='my string', typeof='string')
        schema.types[t.name] = t
>       _roundtrip(schema, TYPES_SPEC)

test_schema_exporter.py:184: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_schema_exporter.py:94: in _roundtrip
    schema2 = sm.create_schema(MINISHEET)
../schemasheets/schemamaker.py:61: in create_schema
    self.merge_sheet(f, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
file_name = '/Users/MAM/Documents/gitrepos/schemasheets/tests/output/mini.tsv'
delimiter = '\t'

    def merge_sheet(self, file_name: str, delimiter='\t') -> None:
        """
        Merge information from the given schema sheet into the current schema
    
        :param file_name: schema sheet
        :param delimiter: default is tab
        :return:
        """
        logging.info(f'READING {file_name} D={delimiter}')
        #with self.ensure_file(file_name) as tsv_file:
        #    reader = csv.DictReader(tsv_file, delimiter=delimiter)
        with self.ensure_csvreader(file_name, delimiter=delimiter) as reader:
            schemasheet = SchemaSheet.from_dictreader(reader)
            line_num = schemasheet.start_line_number
            # TODO: check why this doesn't work
            #while rows and all(x for x in rows[-1] if not x):
            #    print(f'TRIMMING: {rows[-1]}')
            #    rows.pop()
            logging.info(f'ROWS={len(schemasheet.rows)}')
            for row in schemasheet.rows:
                try:
                    self.add_row(row, schemasheet.table_config)
                    line_num += 1
                except ValueError as e:
>                   raise SchemaSheetRowException(f'Error in line {line_num}, row={row}') from e
E                   schemasheets.schemamaker.SchemaSheetRowException: Error in line 2, row={'Type': '', 'base': '', 'uri': '', 'Desc': 'my string', 'Extends': 'string'}

../schemasheets/schemamaker.py:108: SchemaSheetRowException

opened by turbomam 2

Feature Requests for GA4GH-VA schema Web Docs
Summarizing requests related to Web documentation content and format in this ticket. Providing as one long list for now, but happy to break out into tickets for specific feature requests as needed. @sujaypatil96 @sierra-moxon hope we can coordinate soon on these!

Content/Sections I’d like to see in each Class page (in the following order):

Definition: a. already provided, and looks fine b. content comes from the s/s "description" field.

UML-style diagram: a. already provided using YUML, but I find these YUML diagrams hard to read and not all that useful. b. It sounds like a new framework will be used to generate diagrams in the near future, so I will hold off on any requests here until I see how the new diagrams look.

Parents: a. already present as a section on the page, and looks fine

Description:
a. A new section with the title "Description". b. This should contain content in the 'comments' field of the s/s. Ideally as a bulleted list of sentences rather than one long paragraph/block of text, for improved readability. c. At present text from the 'comments' column is in a table at the end of each Class page - but I’d like it front and center directly under the Definition.

Implementation and Use: a. A new section with the title "Implementation and Use". b. Content would ideally be derived from the s/s, but I'm not sure how to do this in practice. I hear that the Annotations feature might let me just create a new 'Implementation and Use' column and give it whatever name I want. Not sure what tooling would be needed to generate a proper section in the Class web page that holds the content. c. I'd also want this presented as a bulleted list of sentences/short paragraphs, rather than one long blob of text.

Own Attributes: a. This section already exists in each Class page. b. Content of course comes from the s/s. c. Prefer 'expanded' form - not tables - as this better accommodates the types and amount of text I want to provide in describing each attribute (see below). d. Don't think we need the class -> attribute pattern for 'own' attributes (no need for class context when you are already on the class page and the section says 'own').

Inherited Attributes a. This section already exists in each Class page b. content generated from s/s but pulling in all attributes from parents of a given class

Data Examples: a. A new section called "Data Examples". b. Content would be nicely formatted yaml or json data examples (e.g. like those in the VRS RTD docs here), ideally with some lead-in text that describes what is being represented (but this could be part of the data example text block, as a # comment preceding the data itself). c. Chris suggested housing these in a 'Data Examples' directory in the repo, and pulling relevant examples into a Class web page from these example files automatically. These data examples could then serve multiple purposes (documentation, testing/validation, etc.)

Content/Fields I’d like to see for each Attribute of a class, as shown in a Class page

The attribute name, description, cardinality, and range are already provided and look good as is.

I’d also like to include a 'Comments:' field that holds text from the ‘comments’ column in the s/s - to provide additional clarification on meaning and usage of an attribute.
opened by mbrush 2
unintuitively, non-string values require protection by a leading `'` in sheets2linkml gsheet-id mode
This works

sheets2linkml \
  --output $@ \
  --gsheet-id 1zsxvjvifDcmkt72v9m1_VKa2m73_THDJapJYK6dqidw core

In that sheet, I protected numerical values and Booleans in the examples column by preceding them with '. I think the same thing is required for dates, and the affirmative boolean value must be represented as 'true, not the magic value of TRUE.

But switch term MIXS:0000001's example to 555, and you get

sheets2linkml \
  --output $@ \
  --gsheet-id 1zsxvjvifDcmkt72v9m1_VKa2m73_THDJapJYK6dqidw core_example_555_num

Traceback (most recent call last):
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 105, in merge_sheet
    self.add_row(row, schemasheet.table_config)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 111, in add_row
    for element in self.row_focal_element(row, table_config):
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 233, in row_focal_element
    raise ValueError(f'No table_config.column_by_element_type')
ValueError: No table_config.column_by_element_type

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/bin/sheets2linkml", line 8, in <module>
    sys.exit(convert())
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 578, in convert
    schema = sm.create_schema(list(tsv_files))
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 61, in create_schema
    self.merge_sheet(f, **kwargs)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 108, in merge_sheet
    raise SchemaSheetRowException(f'Error in line {line_num}, row={row}') from e
schemasheets.schemamaker.SchemaSheetRowException: Error in line 1, row={'Structured comment name > slot > >': 'samp_size', 'Item (rdfs:label) title ': 'amount or size of sample collected', 'Definition description ': 'The total amount or size (volume (ml), mass (g) or area (m2) ) of sample collected.', 'Expected value annotations inner_key: expected_value': 'measurement value', 'Value syntax structured_pattern ': '{float} {unit}', 'Example examples internal_separator: "|"': '555', 'Section slot_group ': 'nucleic acid sequence source', 'migs_eu annotations applies_to_class: migs_eu inner_key: cardinality': 'X', 'migs_ba annotations applies_to_class: migs_ba inner_key: cardinality': 'X', 'migs_pl annotations applies_to_class: migs_pl inner_key: cardinality': 'X', 'migs_vi annotations applies_to_class: migs_vi inner_key: cardinality': 'X', 'migs_org annotations applies_to_class: migs_org inner_key: cardinality': 'X', 'mims annotations applies_to_class: mims inner_key: cardinality': 'C', 'mimarks_s annotations applies_to_class: mimarks_s inner_key: cardinality': 'C', 'mimarks_c annotations applies_to_class: mimarks_c inner_key: cardinality': 'X', 'misag annotations applies_to_class: misag inner_key: cardinality': 'C', 'mimag annotations applies_to_class: mimag inner_key: cardinality': 'C', 'miuvig annotations applies_to_class: miuvig inner_key: cardinality': 'C', 'Preferred unit annotations inner_key: preferred_unit': 'millliter, gram, milligram, liter', 'Occurrence multivalued vmap: {s: false, m: true}': 's', 'MIXS ID slot_uri ': 'MIXS:0000001', 'MIGS ID (mapping to GOLD) annotations inner_key: gold_migs_id': ''}
make: *** [generated/MIxS6_from_gsheet_templates_bad.yaml] Error 1
opened by turbomam 1
mkdoc uses wrong branch name for "Edit" link

https://linkml.io/schemasheets/intro/converting/ link Edit on GitHub uses branch name "master" but the correct branch is "main".

Please fix the mkdoc configuration.

opened by VladimirAlexiev 1
problematic urllib3 or chardet versions for new verbatim stuff

when running schemasheets/get_metaclass_slotvals.py or schemasheets/verbatim_sheets.py

/Users/MAM/Library/Caches/pypoetry/virtualenvs/schemasheets-FMUhH2LU-py3.9/lib/python3.9/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn(

opened by turbomam 1
linkml2sheets alternative?
I have found that the experimental linkml2sheets can't sheetify several of the elements and attributes I care about, and it can't seem to do even a minimal dump on complex/large schemas like MIxS. I have written some code that approximates a LinkML/sheets round trip on the following metaclasses:

See the turbomam/linkml-abuse project.Makefile

annotations
class_definitions
enum_definitions
prefixes
schema_definitions
slot_definitions
subset_definitions
type_definitions

The prefixes and subsets don't seem to include the content of imports

I'm not using a template to determine what gets written to the sheets. I'm iterating over all slots, except for the skipped slots listed below.

If my code was going to be included in any LinkML repo, it would need refactoring for performance and readability. I can do some of that. Even as it is, I have already used this for QC'ing the MIxS schema and plan to use it for round-tripping the NMDC submission portal schema (within sheets_and_friends)

There are some minor systematic changes between the before and after schemas. These are crudely reported in target/roundtrip.yaml

skipped slots:

all_of

alt_descriptions

annotations

any_of

attributes

classes

classification_rules

default_curi_maps

enum_range

enums

exactly_one_of

extensions

from_schema

implicit_prefix

imports

local_names

name

none_of

prefixes

rules

slot_definitions

slot_usage

slots

structured_aliases

subsets

unique_keys

type_uri
opened by turbomam 1
Add better documentation for when to use metatype

Notes from @cmungall:

metatype is useful for cases where you want to have a single column always represent the element name, and the element type to switch depending on the row. If you do it this style, you always need a “name” column. Further up in the stack trace it was complaining about the name field missing.
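As an illustration of the note above, here is a minimal sketch of a metatype-style sheet and of how one column can carry the element name while another switches the element type per row. The sheet contents and the `group_by_metatype` helper are hypothetical and written only for this example; they are not part of the schemasheets API.

```python
import csv
import io

# Hypothetical sheet: a header row, a descriptor row starting with ">",
# then one row per schema element.
SHEET = (
    "type\titem\tdesc\n"
    "> metatype\tname\tdescription\n"
    "class\tPerson\ta person\n"
    "slot\tage\tage in years\n"
)


def group_by_metatype(tsv_text):
    """Group element names by the element type given in the metatype column."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    # The second row holds the column descriptors; drop the leading ">".
    descriptors = [d.lstrip(">").strip() for d in rows[1]]
    if "name" not in descriptors:
        # Mirrors the note above: this style always needs a "name" column.
        raise ValueError("a metatype-style sheet needs a 'name' column")
    type_col = descriptors.index("metatype")
    name_col = descriptors.index("name")
    grouped = {}
    for row in rows[2:]:
        grouped.setdefault(row[type_col], []).append(row[name_col])
    return grouped


elements = group_by_metatype(SHEET)
```

Here `elements` maps each element type to the names found in the single name column, which is the layout the note describes.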

opened by sujaypatil96 1
Issue 25 range override

issue #25

My contributions from #24 also got committed here

The range override and OWL consequences are triggered by make examples/output/range_override_examples_reasoned.ttl

opened by turbomam 1
no such option: -d
% poetry run sheets2linkml --help

gives the output below, but the script is called sheets2linkml, not schemasheets and the -d option doesn't seem to be implemented

/usr/local/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
Usage: sheets2linkml [OPTIONS] [TSV_FILES]...

  Convert schemasheets to a LinkML schema

  schemasheets -d . my_schema/*tsv

Options:
  -o, --output FILENAME  output file
  -v, --verbose
  --help                 Show this message and exit.
opened by turbomam 1
invoke with sheets2linkml?
As opposed to schemasheets? See #10

% sheets2linkml --help

Traceback (most recent call last):
  File "/Users/MAM/my_first_ss/venv/bin/sheets2linkml", line 5, in <module>
    from fairstructure.schemamaker import convert
ModuleNotFoundError: No module named 'fairstructure'
opened by turbomam 1
make all -> No such file or directory~/edirect/pytest
% make all

poetry run pytest
/usr/local/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
Creating virtualenv schemasheets-FMUhH2LU-py3.9 in /Users/MAM/Library/Caches/pypoetry/virtualenvs

FileNotFoundError

[Errno 2] No such file or directory: b'/Users/MAM/edirect/pytest'

at /usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/os.py:607 in _execvpe
    603│ path_list = map(fsencode, path_list)
    604│ for dir in path_list:
    605│     fullname = path.join(dir, file)
    606│     try:
  → 607│         exec_func(fullname, *argrest)
    608│     except (FileNotFoundError, NotADirectoryError) as e:
    609│         last_exc = e
    610│     except OSError as e:
    611│         last_exc = e
make: *** [test] Error 1
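The frame shown above is the loop in Python's os module that tries each PATH directory in turn. When every attempt fails with FileNotFoundError, the error from the last directory tried is the one reported, so the edirect path in the message most likely just reflects the last PATH entry rather than a misconfiguration of edirect itself. A minimal sketch for checking whether a command resolves at all, using only the standard library (the `diagnose` helper is a name made up for this example):

```python
import os
import shutil


def diagnose(command, path=None):
    """Return (location, []) if command is found, else (None, directories searched)."""
    found = shutil.which(command, path=path)
    if found:
        return found, []
    # Fall back to the process PATH when no explicit search path is given.
    search_path = path if path is not None else os.environ.get("PATH", "")
    return None, search_path.split(os.pathsep)


# Example with a deliberately empty search space: nothing can be found,
# and the last directory searched is the one an exec-style error would name.
location, searched = diagnose(
    "pytest", path=os.pathsep.join(["/nonexistent/a", "/nonexistent/b"])
)
```

If `diagnose("pytest")` returns no location inside the project's virtualenv, pytest is probably not installed there, which would be consistent with the virtualenv having just been created in the output above.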
opened by turbomam 1
Add ignore rows feature

I love the ignore column specification. Is there some way to ignore rows? That would help illustrate content from an upstream provider that is being excluded from the model.

Could the metatype specification be repurposed to allow for ignoring rows? (If it doesn't support that already?)
good first issue

opened by turbomam 2
Does the documentation mention that Google Sheets tab names have some constraints?
For example, if a user wants to use the "test enums" tab from https://docs.google.com/spreadsheets/d/1wVoaiFg47aT9YWNeRfTZ8tYHN8s8PAuDx5i2HUcDpvQ/edit#gid=823426713 , then their sheets2linkml command would look like this

sheets2linkml --gsheet-id 1wVoaiFg47aT9YWNeRfTZ8tYHN8s8PAuDx5i2HUcDpvQ test+enums

We should discourage tab names that contain characters requiring more aggressive URL encoding than replacing a space with +.
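One way to check a tab name against that guideline is to compare its full URL encoding with the result of only replacing spaces. The sketch below uses `urllib.parse.quote_plus` from the Python standard library; the helper names are hypothetical, and it is an assumption that the encoding sheets2linkml expects matches `quote_plus`.

```python
from urllib.parse import quote_plus


def encode_tab_name(tab_name: str) -> str:
    """URL-encode a tab name, turning spaces into '+'."""
    return quote_plus(tab_name)


def needs_more_than_plus(tab_name: str) -> bool:
    """True if encoding the name requires more than swapping spaces for '+'."""
    return quote_plus(tab_name) != tab_name.replace(" ", "+")


simple = encode_tab_name("test enums")
risky = needs_more_than_plus("slots & classes")
```

A tab named "test enums" encodes to "test+enums" and passes the check, whereas a name containing "&" would need percent-encoding and is the kind of name the guideline discourages.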
documentation
opened by turbomam 0
Issue 76 schema template
#76

@cmungall I wrote test_schema_metadata and added

elif t == T_SCHEMA and isinstance(element, SchemaDefinition)

to schemasheets/schema_exporter.py, following your https://github.com/linkml/schemasheets/pull/77

but I think the exporting steps may have to be different, because we won't be iterating over multiple schema rows, like we would for slots, classes or prefixes.
opened by turbomam 0
linkml2sheets doesn't work when given a directory of templates
@putmantime and I have observed that running linkml2sheets on a directory of templates doesn't work, even when all of the individual templates do work

the linkml2sheets help gives this example:

linkml2sheets -s my_schema.yaml sheets/*.tsv -d sheets --overwrite

In the nmdc-schema repo, the following two work

schemasheets/tsv_output/slots.tsv: clean_schemasheets
	linkml2sheets \
		--schema src/schema/nmdc.yaml \
		--output-directory schemasheets/tsv_output/ \
		schemasheets/templates/slots.tsv

schemasheets/tsv_output/classes.tsv: clean_schemasheets
	linkml2sheets \
		--schema src/schema/nmdc.yaml \
		--output-directory schemasheets/tsv_output/ \
		schemasheets/templates/classes.tsv

but this doesn't work

schemasheets/tsv_output/all.tsv: clean_schemasheets
	linkml2sheets \
		--schema src/schema/nmdc.yaml \
		--output-directory schemasheets/tsv_output/ \
		schemasheets/templates/*.tsv

Even though

ls -l schemasheets/templates

-rw-r--r--@ 1 MAM staff   71 Aug 16 17:22 classes.tsv
-rw-r--r--@ 1 MAM staff   58 Aug 16 17:26 prefixes.tsv
-rw-r--r--@ 1 MAM staff 2005 Aug 16 18:01 slots.tsv

The error is

Traceback (most recent call last):
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/bin/linkml2sheets", line 8, in <module>
    sys.exit(export_schema())
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/schemasheets/schema_exporter.py", line 297, in export_schema
    exporter.export(sv, specification=f, to_file=outpath)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/schemasheets/schema_exporter.py", line 90, in export
    writer.writerow(row)
  File "/usr/local/Cellar/python@3.9/3.9.13_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/csv.py", line 154, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/usr/local/Cellar/python@3.9/3.9.13_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'class'
make: *** [schemasheets/tsv_output/all.tsv] Error 1
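The last frames of the traceback are in Python's csv.DictWriter, which raises this ValueError whenever a row dictionary contains a key that is not among the writer's fieldnames. The sketch below reproduces only that generic behaviour with made-up column names; it is not a diagnosis of the linkml2sheets bug, though it may help when checking which template's columns a writer was built from.

```python
import csv
import io


def write_rows(fieldnames, rows):
    """Write dict rows as TSV; unexpected keys raise ValueError (the default)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical mismatch: the writer knows the columns of one template,
# but the row carries a "class" key from a different one.
try:
    write_rows(["slot", "range"], [{"class": "Person", "slot": "id"}])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Passing `extrasaction="ignore"` to `csv.DictWriter` would silence the error by dropping the extra key, but that would hide a column mismatch rather than fix it.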
opened by turbomam 1
three tests failing in main

FAILED tests/test_schema_exporter.py::test_types - schemasheets.schemamaker.SchemaSheetRowException: Error in line 2, row={'Type': '', 'base': '', 'uri': '', 'Desc': 'my string', 'Extends': 'string'}
FAILED tests/test_schemamaker.py::test_types - AttributeError: 'TypeDefinition' object has no attribute 'type'
FAILED tests/test_schemamaker.py::test_combined - AttributeError: 'TypeDefinition' object has no attribute 'type'

opened by turbomam 1

Releases(v0.1.17)

v0.1.17(Dec 6, 2022)
What's Changed

Additional options: use-attributes and table-config-path by @cmungall in https://github.com/linkml/schemasheets/pull/86

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.16...v0.1.17
v0.1.16(Oct 3, 2022)
What's Changed

Adding correct normalization for inner keys, fixes #67 by @turbomam in https://github.com/linkml/schemasheets/pull/79

Adding documentation on data dictionaries by @cmungall in https://github.com/linkml/schemasheets/pull/80

update dependency by @sierra-moxon in https://github.com/linkml/schemasheets/pull/82

New Contributors

@sierra-moxon made their first contribution in https://github.com/linkml/schemasheets/pull/82

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.15...v0.1.16
v0.1.15(Sep 16, 2022)
What's Changed

Test and fix for #70, linkml2sheets for prefixes by @cmungall in https://github.com/linkml/schemasheets/pull/77

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.14...v0.1.15
v0.1.14(Aug 26, 2022)
What's Changed

fixing pipe-separated whitespace bug in schemamaker by @turbomam in https://github.com/linkml/schemasheets/pull/63

Fixing bug where empty inner objects were not pre-populated. Fixes #72 by @cmungall in https://github.com/linkml/schemasheets/pull/73

bumping linkml version by @cmungall in https://github.com/linkml/schemasheets/pull/74

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.13...v0.1.14
v0.1.13(Jul 20, 2022)
What's Changed

Fixed handling of inner_key in annotations for both import and export. Fixes #59 by @cmungall in https://github.com/linkml/schemasheets/pull/61

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.12...v0.1.13
v0.1.12(Jul 13, 2022)
Google sheet IDs can now be passed in directly

sheets2linkml --gsheet-id 1wVoaiFg47aT9YWNeRfTZ8tYHN8s8PAuDx5i2HUcDpvQ personinfo types prefixes -o personinfo.yaml

What's Changed

AutoDocs using Github Actions by @hrshdhgd in https://github.com/linkml/schemasheets/pull/55

Adding support for google sheets by @cmungall in https://github.com/linkml/schemasheets/pull/57

New Contributors

@hrshdhgd made their first contribution in https://github.com/linkml/schemasheets/pull/55

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.11...v0.1.12
v0.1.11(Jul 12, 2022)
What's Changed

Issue 25 range override by @turbomam in https://github.com/linkml/schemasheets/pull/27

fixes #45 by @VladimirAlexiev in https://github.com/linkml/schemasheets/pull/46

Extending schema_exporter to handle enums and types by @cmungall in https://github.com/linkml/schemasheets/pull/50

documentation and additional tests by @cmungall in https://github.com/linkml/schemasheets/pull/53

New Contributors

@VladimirAlexiev made their first contribution in https://github.com/linkml/schemasheets/pull/46

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.10...v0.1.11
v0.1.10(Apr 28, 2022)
What's Changed

additional docs and reporting by @cmungall in https://github.com/linkml/schemasheets/pull/36

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.9...v0.1.10
v0.1.9(Mar 16, 2022)
What's Changed

explicitly instantiate example values, in order to eliminate "argument must be a mapping, not extended_str" error by @turbomam in https://github.com/linkml/schemasheets/pull/24

Quick fix to tests badge in README by @sujaypatil96 in https://github.com/linkml/schemasheets/pull/28

Adding a schemasheet exporter. Refactoring: abstracting common methods and datamodel into shared class. by @cmungall in https://github.com/linkml/schemasheets/pull/30

Added CLI for export, plus documentation by @cmungall in https://github.com/linkml/schemasheets/pull/31

Note: We skipped release v0.1.8 because the release to PyPI was made manually using the relevant poetry commands rather than the GitHub release interface. To bring the two into sync, we are skipping v0.1.8 and creating v0.1.9 directly.

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.7...v0.1.9
v0.1.7(Feb 8, 2022)
What's Changed

started #17 by @turbomam in https://github.com/linkml/schemasheets/pull/18

closes #19 by @turbomam in https://github.com/linkml/schemasheets/pull/21

closes #20 by @turbomam in https://github.com/linkml/schemasheets/pull/22

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.6...v0.1.7
v0.1.6(Feb 3, 2022)
What's Changed

changed method calls in click CLIs by @turbomam in https://github.com/linkml/schemasheets/pull/16

New Contributors

@turbomam made their first contribution in https://github.com/linkml/schemasheets/pull/16

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.5...v0.1.6
v0.1.5(Feb 2, 2022)
What's Changed

update references to fairstructure to schemasheets by @sujaypatil96 in https://github.com/linkml/schemasheets/pull/6

update version number automatically on PyPI by @sujaypatil96 in https://github.com/linkml/schemasheets/pull/7

docs by @cmungall in https://github.com/linkml/schemasheets/pull/9

issue 14 by @cmungall in https://github.com/linkml/schemasheets/pull/15

New Contributors

@cmungall made their first contribution in https://github.com/linkml/schemasheets/pull/9

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.4...v0.1.5
v0.1.4(Jan 13, 2022)
Create release to make sure auto publishing Action works.

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.3...v0.1.4
v0.1.3(Jan 13, 2022)
What's Changed

Github Action for auto publishing package to PyPI by @sujaypatil96 in https://github.com/linkml/schemasheets/pull/4

Note: This release was renumbered to v0.1.3 because a file name reuse error occurred when uploading v0.1.2 to PyPI.

Full Changelog: https://github.com/linkml/schemasheets/compare/v0.1.1...v0.1.2
v0.1.1(Jan 11, 2022)
show readme contents on pypi homepage

v0.1.0(Jan 11, 2022)
First official release of the schemasheets package

See README.md for details on how to use the package



Owner

Linked data Modeling Language
