Class: Extractor

A script, code, or web service that, when executed, can extract information from a supplied “file” with a specific FileType.

Comments

The supplied “file” may also be a set of files with a given structure.
The extracted information may be verbatim or transformed, and include the scientific data and/or metadata contained within the file.

URI: datatractor_schema:Extractor

erDiagram
Extractor {
    string id
    string name
    string description
    stringList subject
    string source_repository
    string documentation
    string instructions
}
Installation {
    InstallerTypes method
    string requires_python
    string requirements
    stringList packages
}
Usage {
    UsageTypes method
    string setup
    string command
    UsageScope scope
    stringList supported_filetypes
}
SupportedFileType {
    string id
    string description
}
UsageTemplate {
    string input_path
    string input_type
    string output_path
    string output_type
}
License {
    string uri
    string spdx
}
Citation {
    string uri
    stringList creators
    stringList contributors
    string title
    string type
}

Extractor ||--}o Citation : "citations"
Extractor ||--|| License : "license"
Extractor ||--}| SupportedFileType : "supported_filetypes"
Extractor ||--}o SupportedFileType : "supported_output_filetypes"
Extractor ||--}o Usage : "usage"
Extractor ||--}o Installation : "installation"
SupportedFileType ||--|o UsageTemplate : "template"
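The entity-relationship diagram above can be mirrored in plain Python, for example when writing tests or tooling around registry entries. The sketch below is a hypothetical in-memory model, not a published Datatractor API: the field names follow the diagram, while the `Optional`/`List` typing and the choice to enforce only the "one or more" cardinality of `supported_filetypes` in `__post_init__` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model of the diagram; field names follow the diagram,
# the typing choices are assumptions.

@dataclass
class License:
    uri: Optional[str] = None   # URL/URI of the license document
    spdx: Optional[str] = None  # SPDX identifier, e.g. "Apache-2.0"

@dataclass
class Citation:
    title: Optional[str] = None
    uri: Optional[str] = None
    type: Optional[str] = None
    creators: List[str] = field(default_factory=list)
    contributors: List[str] = field(default_factory=list)

@dataclass
class UsageTemplate:
    input_path: Optional[str] = None
    input_type: Optional[str] = None
    output_path: Optional[str] = None
    output_type: Optional[str] = None

@dataclass
class SupportedFileType:
    id: str
    description: Optional[str] = None
    template: Optional[UsageTemplate] = None  # "|o": zero or one template

@dataclass
class Usage:
    method: str                   # a UsageTypes value, e.g. "python" or "cli"
    command: Optional[str] = None
    setup: Optional[str] = None
    scope: Optional[str] = None   # a UsageScope value, e.g. "meta+data"
    supported_filetypes: List[str] = field(default_factory=list)

@dataclass
class Installation:
    method: str                   # an InstallerTypes value, e.g. "pip"
    requires_python: Optional[str] = None
    requirements: Optional[str] = None
    packages: List[str] = field(default_factory=list)

@dataclass
class Extractor:
    id: str
    name: str
    description: str
    license: License                               # exactly one
    supported_filetypes: List[SupportedFileType]   # one or more
    subject: List[str] = field(default_factory=list)
    citations: List[Citation] = field(default_factory=list)
    supported_output_filetypes: List[SupportedFileType] = field(default_factory=list)
    usage: List[Usage] = field(default_factory=list)
    installation: List[Installation] = field(default_factory=list)
    source_repository: Optional[str] = None
    documentation: Optional[str] = None
    instructions: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the "one or more" cardinality of supported_filetypes.
        if not self.supported_filetypes:
            raise ValueError("supported_filetypes requires at least one entry")
```

For instance, `Extractor(id="datatree", name="Datatree", description="...", license=License(spdx="Apache-2.0"), supported_filetypes=[SupportedFileType(id="netcdf")])` builds a minimal entry, while passing an empty `supported_filetypes` list raises `ValueError`.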

Slots

Name	Cardinality and Range	Description	Inheritance
id	1 String	A unique identifier for the entry within the Datatractor Yard namespace; this should be a shorthand label rather than a UUID. Only lower-case alphanumeric and dash (“-”) characters are permitted.	direct
name	1 String	A recognisable name for the entry.	direct
description	1 String	A human-readable outline of the entry, its format, data content and uses.	direct
subject	* String	Any keywords, phrases or classification codes that are relevant to the entry, e.g., particular scientific domains of applicability, or experimental techniques.	direct
citations	* Citation	A citation or citations for the entry, to be provided should it be used in academic work (or otherwise).	direct
license	1 License	A URL, URI or SPDX license identifier for a legal document giving official permission to do something with the resource.	direct
supported_filetypes	1..* SupportedFileType	An enumeration of the `FileType`s that an `Extractor` supports, each matching a `FileType` present in the registry. The `FileType->id` slot can be passed to the `Extractor`; see the `Usage` class.	direct
supported_output_filetypes	* SupportedFileType	An enumeration of the possible output formats of an `Extractor`. These should match `FileTypes` present in the registry. They can be specified when the extractor is executed, using the templates described in the `Extractor->usage->command` slot; see the `Usage` class.	direct
source_repository	0..1 String	A URL or URI for a source code repository associated with this extractor.	direct
documentation	0..1 String	A URL or URI for any online documentation associated with this extractor.	direct
usage	* Usage	A set of machine-actionable instructions for the usage of the `Extractor`. The described usage pattern shall be available after the instructions specified in the `Extractor->installation` slot have been followed.	direct
installation	* Installation	A machine-actionable set of installation instructions to obtain a working set-up of the `Extractor`.	direct
instructions	0..1 String	Any human-readable usage notes or installation instructions for this `Extractor`. This field is intended for human use only and is not intended to be machine-actionable. Please use the `Extractor->installation` and `Extractor->usage` slots for that purpose.	direct
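The cardinalities in the table translate into a few mechanical checks on an entry loaded from YAML. The sketch below is illustrative only and is not the schema's own validator (LinkML tooling would normally do this). Two details are assumptions: the `id` regular expression `^[a-z]+[a-z0-9-]*[a-z0-9]+$` is our reading of the "lower-case alphanumeric and dash" rule, and `registry_ids` stands in for however the list of known `FileType` ids is obtained.

```python
import re
from typing import Any, Dict, Iterable, List

# Assumed regex for `id`: starts with a lower-case letter, continues with
# lower-case letters, digits or dashes, and ends with a letter or digit
# (so ids are at least two characters long).
ID_PATTERN = re.compile(r"^[a-z]+[a-z0-9-]*[a-z0-9]+$")

# Slots with cardinality 1 or 1..* in the table above.
REQUIRED = ("id", "name", "description", "license", "supported_filetypes")

def check_extractor(entry: Dict[str, Any], registry_ids: Iterable[str]) -> List[str]:
    """Return human-readable problems found in `entry`; empty means none found."""
    problems: List[str] = []
    for slot in REQUIRED:
        if not entry.get(slot):  # absent, None, or an empty list/string
            problems.append(f"missing required slot: {slot}")
    entry_id = entry.get("id")
    if isinstance(entry_id, str) and not ID_PATTERN.fullmatch(entry_id):
        problems.append(f"invalid id: {entry_id!r}")
    known = set(registry_ids)
    # Both input and output file types should refer to FileTypes in the registry.
    for key in ("supported_filetypes", "supported_output_filetypes"):
        for filetype in entry.get(key) or []:
            if filetype.get("id") not in known:
                problems.append(f"{key}: unknown FileType {filetype.get('id')!r}")
    return problems
```

Run against the datatree example with a registry containing `netcdf`, this returns an empty list; an id such as `Data_Tree` would be reported as invalid.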

Identifier and Mapping Information

Schema Source

from schema: https://datatractor.github.io/schema/main/datatractor_schema/

Mappings

Mapping Type	Mapped Value
self	datatractor_schema:Extractor
native	datatractor_schema:Extractor
close	schema_org:SoftwareApplication, schema_org:ServiceChannel, dcmitype:Software, dcmitype:Service

Examples

Example: Extractor-datatree

---
id: >-
    datatree
name: >-
    Datatree
description: >-
    Datatree is a prototype implementation of a tree-like hierarchical
    data structure for xarray. Extractor for netCDF files using xarray's Datasets.
subject:
    - data science
source_repository: >-
    https://github.com/xarray-contrib/datatree
documentation: >-
    https://xarray-datatree.readthedocs.io/en/latest/
usage:
    - method: python
      scope: meta+data
      setup: datatree
      command: datatree.open_datatree({{ file_path }})
installation:
    - method: pip
      packages:
          - xarray-datatree==0.0.12
      requires_python: '>=3.9'
instructions: >-
    Install the xarray-datatree package into a Python 3.9+ environment with
    `pip install xarray-datatree`. After importing `datatree`, netCDF files can
    be read as DataTrees using the `datatree.open_datatree()` function.
citations:
    - title: Datatree documentation
      uri: https://xarray-datatree.readthedocs.io/en/latest/
      creators:
          - T. Nicholas
      type: software
supported_filetypes:
    - id: netcdf
      description: >-
          Can load netCDF files into Datatree objects.
license:
    spdx: Apache-2.0
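A `method: python` usage entry such as the datatree one pairs a `setup` value with a `command` that names a callable and a templated argument. The sketch below shows one hypothetical way a runner could act on such an entry without passing the command string to `eval()`. It assumes that `setup` is an importable module name and that the command has the single-argument form `package.function({{ placeholder }})`; neither assumption is stated by the schema, and `run_python_usage` is not part of any Datatractor package.

```python
import importlib
import re
from typing import Any

# Hypothetical accepted form: a dotted callable path with exactly one
# templated argument, e.g. "datatree.open_datatree({{ file_path }})".
PY_COMMAND = re.compile(r"^\s*([\w.]+)\(\s*\{\{\s*(\w+)\s*\}\}\s*\)\s*$")

def run_python_usage(setup: str, command: str, **values: Any) -> Any:
    """Import `setup`, resolve the callable named in `command`, and call it."""
    match = PY_COMMAND.match(command)
    if match is None:
        raise ValueError(f"unsupported command: {command!r}")
    dotted, placeholder = match.groups()
    importlib.import_module(setup)  # assumption: `setup` names a module to import
    module_name, _, func_name = dotted.rpartition(".")
    func = getattr(importlib.import_module(module_name), func_name)
    # Pass the supplied value directly instead of evaluating a formatted string.
    return func(values[placeholder])
```

With xarray-datatree installed, `run_python_usage("datatree", "datatree.open_datatree({{ file_path }})", file_path="example.nc")` would call `datatree.open_datatree("example.nc")`. The same mechanism can be tried with the standard library, e.g. `run_python_usage("os.path", "os.path.basename({{ file_path }})", file_path="/data/sample.nc")`.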

Example: Extractor-example

---
id: >-
    example
name: >-
    Example Extractor
description: >-
    An example extractor entry, using most features of the schema.
subject:
    - science
    - engineering
source_repository: >-
    https://github.com/example/extractor
documentation: >-
    https://example.github.io/extractor
usage:
    - method: cli
      command: parse --type={{ input_type }} {{ file_path }}
installation:
    - method: pip
      packages:
          - example-extractor
      requires_python: ==3.4
instructions: >-
    Install the package into a Python 3.4 environment with
    `pip install example-extractor`. After activating the
    environment, the `parse` entry point will be available at
    the command-line, and functions can be directly invoked
    from Python code.
citations:
    - title: An example extractor paper using DOI
      uri: doi:10.1000/182
      creators:
          - A. Uthor
          - M. A. Nuscript
      contributors:
          - E. Ditor
      type: article
    - title: Example extractor code repo
      uri: https://github.com/example/extractor
      creators:
          - S. Omeone
          - A. Nother
      contributors:
          - A. Person
      type: software
supported_filetypes:
    - id: example-filetype
      description: >-
          Example Extractor can parse example-filetype once in a blue moon.
      template:
          input_type: example
license:
    uri: https://example.com
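In this example, the `cli` usage `command` contains `{{ input_type }}` and `{{ file_path }}` placeholders, and the `template` of the `example-filetype` entry supplies `input_type: example`. The sketch below shows one way such a command could be filled in and split into an argument list. The `{{ name }}` substitution rule, the use of `shlex` quoting, and the `render_cli` helper are all assumptions for illustration rather than the behaviour of any particular Datatractor runner.

```python
import re
import shlex
from typing import Dict, List

# Matches "{{ name }}" placeholders, tolerating optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_cli(command: str, values: Dict[str, str]) -> List[str]:
    """Substitute {{ name }} placeholders, then split into an argv list."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        # Quote each value so that paths containing spaces stay one argument.
        return shlex.quote(values[name])
    return shlex.split(PLACEHOLDER.sub(substitute, command))
```

Combining the template's `input_type` with a user-supplied path, `render_cli("parse --type={{ input_type }} {{ file_path }}", {"input_type": "example", "file_path": "my data/run 1.ex"})` yields `["parse", "--type=example", "my data/run 1.ex"]`, which could then be handed to `subprocess.run` without invoking a shell.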

LinkML Source

Direct

name: Extractor
description: A script, code, or web service that, when executed, can extract information
  from a supplied "file" with a specific [`FileType`](FileType.md).
comments:
- The supplied "file" may also be a set of files with a given structure.
- The extracted information may be verbatim or transformed, and include the scientific
  data and/or metadata contained within the file.
from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
close_mappings:
- schema_org:SoftwareApplication
- schema_org:ServiceChannel
- dcmitype:Software
- dcmitype:Service
rank: 1000
slots:
- id
- name
- description
- subject
- citations
- license
attributes:
  supported_filetypes:
    name: supported_filetypes
    description: An enumeration of the `FileType`s that an `Extractor` supports, each
      matching a `FileType` present in the registry. The `FileType->id` slot can be
      passed to the `Extractor`; [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
    - Usage
    range: SupportedFileType
    required: true
    multivalued: true
  supported_output_filetypes:
    name: supported_output_filetypes
    description: An enumeration of the possible output formats of an `Extractor`.
      These should match `FileTypes` present in the registry. They can be specified
      when the extractor is executed, using the templates described in the `Extractor->usage->command`
      slot; [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
    range: SupportedFileType
    required: false
    multivalued: true
  source_repository:
    name: source_repository
    description: A URL or URI for a source code repository associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
  documentation:
    name: documentation
    description: A URL or URI for any online documentation associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
  usage:
    name: usage
    description: A set of machine-actionable instructions for the usage of the `Extractor`.
      The described usage pattern shall be available after the instructions specified
      in the `Extractor->installation` slot have been followed.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
    range: Usage
    multivalued: true
    inlined: true
    inlined_as_list: true
  installation:
    name: installation
    description: A machine-actionable set of installation instructions to obtain a
      working set-up of the `Extractor`.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
    range: Installation
    multivalued: true
    inlined: true
    inlined_as_list: true
  instructions:
    name: instructions
    description: Any human-readable usage notes or installation instructions for this
      `Extractor`. This field is intended for human use only and is not intended to
      be machine-actionable. Please use the `Extractor->installation` and `Extractor->usage`
      slots for that purpose.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
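Comparing the Direct source with the Induced source shows what induction adds for this class: the shared slots listed under `slots:` are merged in as attributes, each attribute gains `alias` and `owner`, and attributes without an explicit `range` (such as `source_repository`) end up with `range: string`. The sketch below approximates only those visible differences. It is not LinkML's induction algorithm (which also handles inheritance, `slot_usage` and more; in practice linkml-runtime's `SchemaView`, e.g. its `induced_class` method, would be used), and `induce_attributes` and its `default_range="string"` parameter are hypothetical.

```python
import copy
from typing import Any, Dict

def induce_attributes(class_name: str,
                      direct_attributes: Dict[str, Dict[str, Any]],
                      shared_slots: Dict[str, Dict[str, Any]],
                      default_range: str = "string") -> Dict[str, Dict[str, Any]]:
    """Approximate the Induced view: merge slots, add alias/owner, fill range."""
    # Copy so the caller's Direct definitions are not mutated; attributes come
    # first, followed by the shared slots, matching the order shown above.
    merged = {**copy.deepcopy(direct_attributes), **copy.deepcopy(shared_slots)}
    for name, attr in merged.items():
        attr.setdefault("alias", name)          # alias defaults to the slot name
        attr.setdefault("owner", class_name)    # the class the slot is induced on
        attr.setdefault("range", default_range) # assumed schema default range
    return merged
```

For example, inducing `{"source_repository": {...}, "usage": {"range": "Usage"}}` together with an `id` slot gives `source_repository` a `string` range while `usage` keeps `Usage`, mirroring the two listings above.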

Induced

name: Extractor
description: A script, code, or web service that, when executed, can extract information
  from a supplied "file" with a specific [`FileType`](FileType.md).
comments:
- The supplied "file" may also be a set of files with a given structure.
- The extracted information may be verbatim or transformed, and include the scientific
  data and/or metadata contained within the file.
from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
close_mappings:
- schema_org:SoftwareApplication
- schema_org:ServiceChannel
- dcmitype:Software
- dcmitype:Service
rank: 1000
attributes:
  supported_filetypes:
    name: supported_filetypes
    description: An enumeration of the `FileType`s that an `Extractor` supports, each
      matching a `FileType` present in the registry. The `FileType->id` slot can be
      passed to the `Extractor`; [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: supported_filetypes
    owner: Extractor
    domain_of:
    - Extractor
    - Usage
    range: SupportedFileType
    required: true
    multivalued: true
  supported_output_filetypes:
    name: supported_output_filetypes
    description: An enumeration of the possible output formats of an `Extractor`.
      These should match `FileTypes` present in the registry. They can be specified
      when the extractor is executed, using the templates described in the `Extractor->usage->command`
      slot; [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: supported_output_filetypes
    owner: Extractor
    domain_of:
    - Extractor
    range: SupportedFileType
    required: false
    multivalued: true
  source_repository:
    name: source_repository
    description: A URL or URI for a source code repository associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: source_repository
    owner: Extractor
    domain_of:
    - Extractor
    range: string
  documentation:
    name: documentation
    description: A URL or URI for any online documentation associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: documentation
    owner: Extractor
    domain_of:
    - Extractor
    range: string
  usage:
    name: usage
    description: A set of machine-actionable instructions for the usage of the `Extractor`.
      The described usage pattern shall be available after the instructions specified
      in the `Extractor->installation` slot have been followed.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: usage
    owner: Extractor
    domain_of:
    - Extractor
    range: Usage
    multivalued: true
    inlined: true
    inlined_as_list: true
  installation:
    name: installation
    description: A machine-actionable set of installation instructions to obtain a
      working set-up of the `Extractor`.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: installation
    owner: Extractor
    domain_of:
    - Extractor
    range: Installation
    multivalued: true
    inlined: true
    inlined_as_list: true
  instructions:
    name: instructions
    description: Any human-readable usage notes or installation instructions for this
      `Extractor`. This field is intended for human use only and is not intended to
      be machine-actionable. Please use the `Extractor->installation` and `Extractor->usage`
      slots for that purpose.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: instructions
    owner: Extractor
    domain_of:
    - Extractor
    range: string
  id:
    name: id
    description: A unique identifier for the entry within the Datatractor Yard namespace;
      this should be a shorthand label rather than a UUID. Only lower-case alphanumeric
      and dash ("-") characters are permitted.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: schema_org:identifier
    identifier: true
    alias: id
    owner: Extractor
    domain_of:
    - Extractor
    - SupportedFileType
    - FileType
    range: string
    required: true
    pattern: ^[a-z]+[a-z0-9-]*[a-z0-9]+$
  name:
    name: name
    description: A recognisable name for the entry.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: schema_org:name
    alias: name
    owner: Extractor
    domain_of:
    - Extractor
    - FileType
    range: string
    required: true
  description:
    name: description
    description: A human-readable outline of the entry, its format, data content and
      uses.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: schema_org:description
    alias: description
    owner: Extractor
    domain_of:
    - Extractor
    - SupportedFileType
    - FileType
    range: string
    required: true
  subject:
    name: subject
    description: Any keywords, phrases or classification codes that are relevant to
      the entry, e.g., particular scientific domains of applicability, or experimental
      techniques.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: dc_terms:subject
    alias: subject
    owner: Extractor
    domain_of:
    - Extractor
    - FileType
    range: string
    multivalued: true
  citations:
    name: citations
    description: A citation or citations for the entry, to be provided should it be
      used in academic work (or otherwise).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: dcmitype:BibliographicReference
    alias: citations
    owner: Extractor
    domain_of:
    - Extractor
    range: Citation
    required: false
    multivalued: true
  license:
    name: license
    description: A URL, URI or SPDX license identifier for a legal document giving
      official permission to do something with the resource.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: dc_terms:license
    alias: license
    owner: Extractor
    domain_of:
    - Extractor
    range: License
    required: true
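Several slots in the Induced source declare a `slot_uri` (`schema_org:identifier`, `schema_org:name`, `schema_org:description`, `dc_terms:subject`, `dcmitype:BibliographicReference`, `dc_terms:license`), which is what lets an `Extractor` entry be read as linked data. The sketch below re-keys an entry by those URIs to produce a JSON-LD-style dictionary. It is a simplification under stated assumptions: the CURIE prefixes are left unexpanded (a real JSON-LD document would need an `@context` defining them), slots without a `slot_uri` are simply dropped, and `to_linked_data` is a hypothetical helper.

```python
from typing import Any, Dict

# slot_uri values copied from the Induced source above; prefixes left as CURIEs.
SLOT_URIS = {
    "id": "schema_org:identifier",
    "name": "schema_org:name",
    "description": "schema_org:description",
    "subject": "dc_terms:subject",
    "citations": "dcmitype:BibliographicReference",
    "license": "dc_terms:license",
}

def to_linked_data(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Re-key slots that declare a slot_uri; other slots are left out."""
    out: Dict[str, Any] = {"@type": "datatractor_schema:Extractor"}
    for slot, uri in SLOT_URIS.items():
        if slot in entry:
            out[uri] = entry[slot]
    return out
```

Applied to the datatree example, `id`, `name`, `description`, `subject`, `citations` and `license` would be carried over under their URIs, while `documentation`, `usage` and the other slots without a `slot_uri` would not appear in the output.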