Usage of Datatractor Schema
Usage example
This repository is intended to be used as a git submodule, cloned and used by your downstream code. As an example, consider the Datatractor Yard:
A screenshot of Datatractor Yard. Note that schemas is a git submodule, pointing to the Schema repository at a certain commit (here: c03a732, corresponding to the 1.0 release).
After initializing and updating the submodule, the yaml files defining the FileType and Extractor schemas are available in the <submodule>/schemas/ directory.
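A submodule that has not been initialized is checked out as an empty directory, which is easy to overlook. The following is a minimal sketch of a check that downstream code could run before using the schemas. The helper name list_schema_files is hypothetical and not part of this repository; only the schemas/ layout is taken from the description above.

```python
from pathlib import Path


def list_schema_files(submodule_root):
    """Return the schema YAML files found in <submodule_root>/schemas/.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    schema_dir = Path(submodule_root) / "schemas"
    # Accept both common YAML extensions; the examples here use `.yml`.
    files = sorted(
        p for p in schema_dir.glob("*") if p.suffix in {".yml", ".yaml"}
    )
    if not files:
        # An uninitialized submodule leaves an empty (or missing) directory.
        raise FileNotFoundError(
            f"No schema files in {schema_dir}; "
            "try running `git submodule update --init` first."
        )
    return files
```

Calling list_schema_files("<submodule>") with the real submodule path should then return the schema files, or fail early with a hint if the submodule still needs to be initialized.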
Validation
The schema definitions contained in this repository can be used to locally validate your own FileTypes and Extractors. Several examples are provided for this purpose in the examples folder.
To get started, first make sure LinkML is installed in your Python environment:
pip install linkml~=1.3
Then, you can check the validity of your filetype or extractor definition against the provided schemas using linkml-validate. For example, to validate the provided example FileType definition in netcdf.yml against the FileType schema, run:
linkml-validate -s <submodule>/schemas/filetype.yml -C FileType <submodule>/examples/filetype/netcdf.yml
If successful, you should see ✓ No problems found returned by linkml-validate.
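To validate several definitions at once, the same command can be driven from Python. The sketch below only uses the -s and -C options shown above. The function names are hypothetical, and it assumes that linkml-validate signals a failed validation with a non-zero exit status.

```python
import subprocess
from pathlib import Path


def build_validate_command(schema, target_class, data_file):
    """Build the linkml-validate call shown above as an argument list."""
    return [
        "linkml-validate",
        "-s", str(schema),
        "-C", target_class,
        str(data_file),
    ]


def validate_directory(schema, target_class, data_dir):
    """Validate every .yml file in data_dir and return the files that failed."""
    failed = []
    for data_file in sorted(Path(data_dir).glob("*.yml")):
        command = build_validate_command(schema, target_class, data_file)
        # Assumption: a non-zero exit status means validation failed.
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(data_file)
    return failed
```

For instance, validate_directory("<submodule>/schemas/filetype.yml", "FileType", "<submodule>/examples/filetype") would check all example FileType definitions and return an empty list if every file passes.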
Translation
The LinkML schemas provided here can be automatically translated to other formats, including JSONSchema, Python dataclasses, or Pydantic classes:
gen-json-schema <submodule>/schemas/filetype.yml > filetype.json
gen-python <submodule>/schemas/filetype.yml > filetype.py
gen-pydantic <submodule>/schemas/filetype.yml > filetype_pydantic.py
The generated files are intended to be used in downstream code, such as the validation function of the Datatractor Yard.
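Downstream code may want to regenerate these files whenever the submodule is updated. One possible approach, sketched below, is a small build step that runs a generator and writes its standard output to a file, replacing any previous output instead of appending to it. The helper name write_generated and the generated/ output directory are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def write_generated(command, out_path):
    """Run a generator command and write its standard output to out_path."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "w" truncates the file, so a rerun replaces stale output.
    with out_path.open("w") as handle:
        # check=True raises CalledProcessError if the generator fails.
        subprocess.run(command, stdout=handle, check=True)
    return out_path
```

A call such as write_generated(["gen-json-schema", "<submodule>/schemas/filetype.yml"], "generated/filetype.json") would then mirror the first command above.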