Class: Extractor

A script, code, or web service that, when executed, can extract information from a supplied “file” with a specific FileType.

Comments

  • The supplied “file” may also be a set of files with a given structure.

  • The extracted information may be verbatim or transformed, and include the scientific data and/or metadata contained within the file.

URI: datatractor_schema:Extractor

erDiagram
    Extractor {
        string id
        string name
        string description
        stringList subject
        string source_repository
        string documentation
        string instructions
    }
    Installation {
        InstallerTypes method
        string requires_python
        string requirements
        stringList packages
    }
    Usage {
        UsageTypes method
        string setup
        string command
        UsageScope scope
        stringList supported_filetypes
    }
    SupportedFileType {
        string id
        string description
    }
    UsageTemplate {
        string input_path
        string input_type
        string output_path
        string output_type
    }
    License {
        string uri
        string spdx
    }
    Citation {
        string uri
        stringList creators
        stringList contributors
        string title
        string type
    }

    Extractor ||--}o Citation : "citations"
    Extractor ||--|| License : "license"
    Extractor ||--}| SupportedFileType : "supported_filetypes"
    Extractor ||--}o SupportedFileType : "supported_output_filetypes"
    Extractor ||--}o Usage : "usage"
    Extractor ||--}o Installation : "installation"
    SupportedFileType ||--|o UsageTemplate : "template"
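The entity-relationship diagram above can be read as a set of nested record types. The following is a minimal sketch of that structure as Python dataclasses, not the schema's official bindings. The field names come from the diagram and the Slots list on this page; the choice of `Optional` defaults for the non-required fields, and the decision to keep `citations`, `usage` and `installation` as plain dictionaries for brevity, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class License:
    # Either a URI or an SPDX identifier is expected to be given.
    uri: Optional[str] = None
    spdx: Optional[str] = None


@dataclass
class UsageTemplate:
    input_path: Optional[str] = None
    input_type: Optional[str] = None
    output_path: Optional[str] = None
    output_type: Optional[str] = None


@dataclass
class SupportedFileType:
    id: str
    description: Optional[str] = None  # assumed optional in this sketch
    template: Optional[UsageTemplate] = None


@dataclass
class Extractor:
    # Slots with cardinality 1..1 or 1..* have no default.
    id: str
    name: str
    description: str
    license: License
    supported_filetypes: List[SupportedFileType]
    # Slots with cardinality 0..1 or 0..* default to empty.
    subject: List[str] = field(default_factory=list)
    supported_output_filetypes: List[SupportedFileType] = field(default_factory=list)
    citations: List[dict] = field(default_factory=list)
    usage: List[dict] = field(default_factory=list)
    installation: List[dict] = field(default_factory=list)
    source_repository: Optional[str] = None
    documentation: Optional[str] = None
    instructions: Optional[str] = None


extractor = Extractor(
    id="datatree",
    name="Datatree",
    description="Extractor for netCDF files using xarray's Datasets.",
    license=License(spdx="Apache-2.0"),
    supported_filetypes=[SupportedFileType(id="netcdf")],
)
```

Only the five slots marked as mandatory need to be passed; everything else falls back to an empty value.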

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| id | 1..1 String | A unique identifier for the entry within the Datatractor Yard namespace; this should be a shorthand label rather than a UUID. Only lower-case alphanumeric and dash (“-”) characters are permitted. | direct |
| name | 1..1 String | A recognisable name for the entry. | direct |
| description | 1..1 String | A human-readable outline of the entry, its format, data content and uses. | direct |
| subject | 0..* String | Any keywords, phrases or classification codes that are relevant to the entry, e.g., particular scientific domains of applicability, or experimental techniques. | direct |
| citations | 0..* Citation | A citation or citations for the entry, to be provided should it be used in academic work (or otherwise). | direct |
| license | 1..1 License | A URL, URI or SPDX license identifier for a legal document giving official permission to do something with the resource. | direct |
| supported_filetypes | 1..* SupportedFileType | An enumeration of the FileTypes that an Extractor supports, matching FileTypes present in the registry. The FileType->id slot can be passed to the Extractor; see the Usage class. | direct |
| supported_output_filetypes | 0..* SupportedFileType | An enumeration of the possible output formats of an Extractor. These should match FileTypes present in the registry. They can be specified on extractor execution using the templates described in the Extractor->Usage->command slot; see the Usage class. | direct |
| source_repository | 0..1 String | A URL or URI for a source code repository associated with this extractor. | direct |
| documentation | 0..1 String | A URL or URI for any online documentation associated with this extractor. | direct |
| usage | 0..* Usage | Machine-actionable instructions for the usage of the Extractor. The described usage pattern shall be available after the instructions specified in the Extractor->installation slot have been followed. | direct |
| installation | 0..* Installation | A machine-actionable set of installation instructions to obtain a working set-up of the Extractor. | direct |
| instructions | 0..1 String | Any human-readable usage notes or installation instructions for this Extractor. This field is intended for human use only and is not intended to be machine-actionable. Please use the Extractor->installation and Extractor->usage slots for that purpose. | direct |
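The constraint on `id` can be checked mechanically. The sketch below reuses the `pattern` value listed for the `id` slot in the induced LinkML source on this page; the helper name `is_valid_id` is hypothetical. Note that, as written, the commas inside the character classes are literal, so the pattern also accepts commas, and it requires at least two characters. Both are slightly different from the prose description of the slot.

```python
import re

# Pattern copied verbatim from the `id` slot in the induced LinkML source.
ID_PATTERN = re.compile(r"^[a-z]+[a-z,0-9,-]*[a-z,0-9]+$")


def is_valid_id(candidate: str) -> bool:
    """Return True if `candidate` matches the schema's `id` pattern."""
    return ID_PATTERN.fullmatch(candidate) is not None


for candidate in ["datatree", "example-filetype", "Datatree", "a", "abc-", "a,b"]:
    print(candidate, is_valid_id(candidate))
```

Under this pattern, `datatree` and `example-filetype` are accepted, while `Datatree` (upper case), `a` (too short) and `abc-` (trailing dash) are rejected. The string `a,b` is accepted because of the literal commas.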

Identifier and Mapping Information

Schema Source

  • from schema: https://datatractor.github.io/schema/main/datatractor_schema/

Mappings

| Mapping Type | Mapped Value |
| --- | --- |
| self | datatractor_schema:Extractor |
| native | datatractor_schema:Extractor |
| close | schema_org:SoftwareApplication, schema_org:ServiceChannel, dcmitype:Software, dcmitype:Service |

Examples

Example: Extractor-example

---
id: >-
    example
name: >-
    Example Extractor
description: >-
    An example extractor entry, using all features from the schema.
subject:
    - science
    - engineering
source_repository: >-
    https://github.com/example/extractor
documentation: >-
    https://example.github.io/extractor
usage:
    - method: cli
      command: parse --type={{ input_type }} {{ file_path }}
installation:
    - method: pip
      packages:
          - example-extractor
      requires_python: ==3.4
instructions: >-
    Install the package into a Python 3.4 environment with
    `pip install example-extractor`. After activating the
    environment, the `parse` entrypoint will be available at
    the command-line, and functions can be directly invoked
    from Python code.
citations:
    - title: An example extractor paper using DOI
      uri: doi:10.1000/182
      creators:
          - A. Uthor
          - M. A. Nuscript
      contributors:
          - E. Ditor
      type: article
    - title: Example extractor code repo
      uri: https://github.com/example/extractor
      creators:
          - S. Omeone
          - A. Nother
      contributors:
          - A. Person
      type: software
supported_filetypes:
    - id: example-filetype
      description: >-
          Example Extractor can parse example-filetype once in a blue moon.
      template:
          input_type: example
license:
    uri: https://example.com
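
The `command` in the example above contains double-brace placeholders. How a consumer fills them in is not specified on this page, so the following is only a sketch under the assumption that each `{{ name }}` is replaced by a caller-supplied string. The function `render_command` and the file path `data/run1.dat` are hypothetical; the value `example` for `input_type` is taken from the `template` of the supported filetype in the example.

```python
import re

# Matches "{{ name }}" with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_command(command: str, values: dict) -> str:
    """Substitute every {{ placeholder }} in `command` from `values`."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder {key!r}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, command)


command = "parse --type={{ input_type }} {{ file_path }}"
rendered = render_command(
    command,
    {"input_type": "example", "file_path": "data/run1.dat"},
)
print(rendered)  # parse --type=example data/run1.dat
```

Raising on an unknown placeholder, rather than leaving it in place, avoids passing a literal `{{ ... }}` to the shell.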

Example: Extractor-datatree

---
id: >-
    datatree
name: >-
    Datatree is a prototype implementation of a tree-like hierarchical
    data structure for xarray.
description: >-
    Extractor for netCDF files using xarray's Datasets.
subject:
    - data science
source_repository: >-
    https://github.com/xarray-contrib/datatree
documentation: >-
    https://xarray-datatree.readthedocs.io/en/latest/
usage:
    - method: python
      scope: meta+data
      setup: datatree
      command: datatree.open_datatree({{ file_path }})
installation:
    - method: pip
      packages:
          - xarray-datatree==0.0.12
      requires_python: '>=3.9'
instructions: >-
    Install the xarray-datatree package into a Python 3.9+ environment with
    `pip install xarray-datatree`. After importing, netCDF files can be read
    as DataTrees using the 'datatree.open_datatree()' function.
citations:
    - title: Datatree documentation
      uri: https://xarray-datatree.readthedocs.io/en/latest/
      creators:
          - T. Nicholas
      type: software
supported_filetypes:
    - id: netcdf
      description: >-
          Can load netCDF files into Datatree objects.
license:
    spdx: Apache-2.0
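
Because the `installation` entries are meant to be machine-actionable, a consumer can turn one into a concrete command. The sketch below builds (but does not run) a `pip` invocation from the `installation` entry of the Datatree example. The helper `pip_install_argv` is hypothetical, and handling only `method: pip` is a simplification; the `requires_python` constraint is ignored here.

```python
import sys
from typing import List


def pip_install_argv(installation: dict) -> List[str]:
    """Build the argument list for installing the listed packages with pip."""
    if installation.get("method") != "pip":
        raise ValueError("this sketch only handles the 'pip' method")
    packages = installation.get("packages", [])
    if not packages:
        raise ValueError("no packages listed")
    # Use the running interpreter so the packages land in its environment.
    return [sys.executable, "-m", "pip", "install", *packages]


installation = {
    "method": "pip",
    "packages": ["xarray-datatree==0.0.12"],
    "requires_python": ">=3.9",
}
argv = pip_install_argv(installation)
print(" ".join(argv[1:]))  # -m pip install xarray-datatree==0.0.12
```

The resulting list could be handed to `subprocess.run(argv, check=True)` once the caller has decided that installing into the current environment is acceptable.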

LinkML Source

Direct

name: Extractor
description: A script, code, or web service that, when executed, can extract information
  from a supplied "file" with a specific [`FileType`](FileType.md).
comments:
- The supplied "file" may also be a set of files with a given structure.
- The extracted information may be verbatim or transformed, and include the scientific
  data and/or metadata contained within the file.
from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
close_mappings:
- schema_org:SoftwareApplication
- schema_org:ServiceChannel
- dcmitype:Software
- dcmitype:Service
rank: 1000
slots:
- id
- name
- description
- subject
- citations
- license
attributes:
  supported_filetypes:
    name: supported_filetypes
    description: An enumeration of the `FileType` that an `Extractor` supports, matching
      `FileTypes` present in the registry. The `FileType->id` slot can be passed to
      the `Extractor`, [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    domain_of:
    - Extractor
    - Usage
    range: SupportedFileType
    required: true
  supported_output_filetypes:
    name: supported_output_filetypes
    description: An enumeration of the possible output formats of an `Extractor`.
      These should match `FileTypes` present in the registry. They can be specified
      on extractor execution using the templates described in the `Extractor->Usage->command`
      slot, [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    domain_of:
    - Extractor
    range: SupportedFileType
    required: false
  source_repository:
    name: source_repository
    description: A URL or URI for a source code repository associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
  documentation:
    name: documentation
    description: A URL or URI for any online documentation associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
  usage:
    name: usage
    description: Machine-actionable instructions for the usage of the `Extractor`.
      The described usage pattern shall be available after the instructions specified
      in the `Extractor->installation` slot have been followed.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    domain_of:
    - Extractor
    range: Usage
    inlined: true
    inlined_as_list: true
  installation:
    name: installation
    description: A machine-actionable set of installation instructions to obtain a
      working set-up of the `Extractor`.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    domain_of:
    - Extractor
    range: Installation
    inlined: true
    inlined_as_list: true
  instructions:
    name: instructions
    description: Any human-readable usage notes or installation instructions for this
      `Extractor`. This field is intended for human use only and is not intended to
      be machine-actionable. Please use the `Extractor->installation` and `Extractor->usage`
      slots for that purpose.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor

Induced

name: Extractor
description: A script, code, or web service that, when executed, can extract information
  from a supplied "file" with a specific [`FileType`](FileType.md).
comments:
- The supplied "file" may also be a set of files with a given structure.
- The extracted information may be verbatim or transformed, and include the scientific
  data and/or metadata contained within the file.
from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
close_mappings:
- schema_org:SoftwareApplication
- schema_org:ServiceChannel
- dcmitype:Software
- dcmitype:Service
rank: 1000
attributes:
  supported_filetypes:
    name: supported_filetypes
    description: An enumeration of the `FileType` that an `Extractor` supports, matching
      `FileTypes` present in the registry. The `FileType->id` slot can be passed to
      the `Extractor`, [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    alias: supported_filetypes
    owner: Extractor
    domain_of:
    - Extractor
    - Usage
    range: SupportedFileType
    required: true
  supported_output_filetypes:
    name: supported_output_filetypes
    description: An enumeration of the possible output formats of an `Extractor`.
      These should match `FileTypes` present in the registry. They can be specified
      on extractor execution using the templates described in the `Extractor->Usage->command`
      slot, [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    alias: supported_output_filetypes
    owner: Extractor
    domain_of:
    - Extractor
    range: SupportedFileType
    required: false
  source_repository:
    name: source_repository
    description: A URL or URI for a source code repository associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: source_repository
    owner: Extractor
    domain_of:
    - Extractor
    range: string
  documentation:
    name: documentation
    description: A URL or URI for any online documentation associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: documentation
    owner: Extractor
    domain_of:
    - Extractor
    range: string
  usage:
    name: usage
    description: Machine-actionable instructions for the usage of the `Extractor`.
      The described usage pattern shall be available after the instructions specified
      in the `Extractor->installation` slot have been followed.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    alias: usage
    owner: Extractor
    domain_of:
    - Extractor
    range: Usage
    inlined: true
    inlined_as_list: true
  installation:
    name: installation
    description: A machine-actionable set of installation instructions to obtain a
      working set-up of the `Extractor`.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    alias: installation
    owner: Extractor
    domain_of:
    - Extractor
    range: Installation
    inlined: true
    inlined_as_list: true
  instructions:
    name: instructions
    description: Any human-readable usage notes or installation instructions for this
      `Extractor`. This field is intended for human use only and is not intended to
      be machine-actionable. Please use the `Extractor->installation` and `Extractor->usage`
      slots for that purpose.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: instructions
    owner: Extractor
    domain_of:
    - Extractor
    range: string
  id:
    name: id
    description: A unique identifier for the entry within the Datatractor Yard namespace;
      this should be a shorthand label rather than a UUID. Only lower-case alphanumeric
      and dash ("-") characters are permitted.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: schema_org:identifier
    identifier: true
    alias: id
    owner: Extractor
    domain_of:
    - Extractor
    - SupportedFileType
    - FileType
    range: string
    required: true
    pattern: ^[a-z]+[a-z,0-9,-]*[a-z,0-9]+$
  name:
    name: name
    description: A recognisable name for the entry.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: schema_org:name
    alias: name
    owner: Extractor
    domain_of:
    - Extractor
    - FileType
    range: string
    required: true
  description:
    name: description
    description: A human-readable outline of the entry, its format, data content and
      uses.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: schema_org:description
    alias: description
    owner: Extractor
    domain_of:
    - Extractor
    - SupportedFileType
    - FileType
    range: string
    required: true
  subject:
    name: subject
    description: Any keywords, phrases or classification codes that are relevant to
      the entry, e.g., particular scientific domains of applicability, or experimental
      techniques.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: dc_terms:subject
    multivalued: true
    alias: subject
    owner: Extractor
    domain_of:
    - Extractor
    - FileType
    range: string
  citations:
    name: citations
    description: A citation or citations for the entry, to be provided should it be
      used in academic work (or otherwise).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: dcmitype:BibliographicReference
    multivalued: true
    alias: citations
    owner: Extractor
    domain_of:
    - Extractor
    range: Citation
    required: false
  license:
    name: license
    description: A URL, URI or SPDX license identifier for a legal document giving
      official permission to do something with the resource.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: dc_terms:license
    alias: license
    owner: Extractor
    domain_of:
    - Extractor
    range: License
    required: true
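
The `required: true` flags in the induced source above mark five slots as mandatory: `id`, `name`, `description`, `license` and `supported_filetypes`. As a rough illustration, and not a replacement for validating against the schema itself, an entry loaded into a dictionary could be checked as follows. The helper `missing_required_slots` is hypothetical, and treating an empty value as missing is an assumption made to reflect the `1..*` cardinality of `supported_filetypes`.

```python
# Slots flagged `required: true` in the induced LinkML source.
REQUIRED_SLOTS = ("id", "name", "description", "license", "supported_filetypes")


def missing_required_slots(entry: dict) -> list:
    """Return the required slots that are absent or empty in `entry`."""
    return [slot for slot in REQUIRED_SLOTS if not entry.get(slot)]


entry = {
    "id": "datatree",
    "name": "Datatree",
    "description": "Extractor for netCDF files using xarray's Datasets.",
    "supported_filetypes": [{"id": "netcdf"}],
}
print(missing_required_slots(entry))  # ['license']
```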