Class: Extractor
A script, code, or web service that, when executed, can extract information from a supplied “file” with a specific FileType.
Slots
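Name | Cardinality and Range | Description | Inheritance
---|---|---|---
id | 1..1, String | A unique identifier for the entry within the Datatractor Yard namespace; this should be a shorthand label rather than a UUID. Only lower-case alphanumeric and dash ("-") characters are permitted. | direct
name | 1..1, String | A recognisable name for the entry. | direct
description | 1..1, String | A human-readable outline of the entry, its format, data content and uses. | direct
subject | 0..*, String | Any keywords, phrases or classification codes that are relevant to the entry, e.g., particular scientific domains of applicability, or experimental techniques. | direct
citations | 0..*, Citation | A citation or citations for the entry, to be provided should it be used in academic work (or otherwise). | direct
license | 1..1, License | A URL, URI or SPDX license identifier for a legal document giving official permission to do something with the resource. | direct
supported_filetypes | 1..*, SupportedFileType | An enumeration of the `FileType`s that an `Extractor` supports, matching `FileTypes` present in the registry. The `FileType->id` slot can be passed to the `Extractor` (see the [`Usage`](Usage.md) class). | direct
supported_output_filetypes | 0..*, SupportedFileType | An enumeration of the possible output formats of an `Extractor`. These should match `FileTypes` present in the registry. They can be specified on extractor execution using the templates described in the `Extractor->Usage->command` slot (see the [`Usage`](Usage.md) class). | direct
source_repository | 0..1, String | A URL or URI for a source code repository associated with this extractor. | direct
documentation | 0..1, String | A URL or URI for any online documentation associated with this extractor. | direct
usage | 0..*, Usage | Machine-actionable instructions for the usage of the `Extractor`. The described usage pattern shall be available after the instructions in the `Extractor->installation` slot have been followed. | direct
installation | 0..*, Installation | A machine-actionable set of installation instructions to obtain a working set-up of the `Extractor`. | direct
instructions | 0..1, String | Any human-readable usage notes or installation instructions for this `Extractor`. This field is for human use only and is not machine-actionable; use the `Extractor->installation` and `Extractor->usage` slots for that purpose. | direct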
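The cardinalities in the table can be checked mechanically. The following is a minimal sketch, assuming an entry has already been parsed from YAML into a plain Python dict. `CARDINALITY` and `check_cardinality` are hypothetical helpers written for illustration, not part of any Datatractor tooling, and they only check top-level slot presence and whether a value is a list; nested classes such as `Citation`, `License` or `Usage` are not validated.

```python
from typing import Any

# Hypothetical encoding of the slots table: slot -> (required, multivalued).
CARDINALITY: dict[str, tuple[bool, bool]] = {
    "id": (True, False),
    "name": (True, False),
    "description": (True, False),
    "subject": (False, True),
    "citations": (False, True),
    "license": (True, False),
    "supported_filetypes": (True, True),
    "supported_output_filetypes": (False, True),
    "source_repository": (False, False),
    "documentation": (False, False),
    "usage": (False, True),
    "installation": (False, True),
    "instructions": (False, False),
}


def check_cardinality(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the top-level slots look valid."""
    problems = []
    for slot, (required, multivalued) in CARDINALITY.items():
        value = entry.get(slot)
        if value is None:
            if required:
                problems.append(f"missing required slot: {slot}")
            continue
        if multivalued:
            if not isinstance(value, list):
                problems.append(f"{slot} should be a list")
            elif required and not value:
                # A 1..* slot needs at least one item.
                problems.append(f"{slot} needs at least one item (1..*)")
        elif isinstance(value, list):
            problems.append(f"{slot} should be a single value")
    return problems


entry = {
    "id": "example",
    "name": "Example Extractor",
    "description": "An example extractor entry.",
    "license": {"uri": "https://example.com"},
    "supported_filetypes": [{"id": "example-filetype"}],
}
print(check_cardinality(entry))  # → []
```

A full validator would instead load the schema itself and check every class and slot, which this sketch deliberately does not attempt.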
Identifier and Mapping Information
Schema Source
from schema: https://datatractor.github.io/schema/main/datatractor_schema/
Mappings
Mapping Type | Mapped Value
---|---
self | datatractor_schema:Extractor
native | datatractor_schema:Extractor
close | schema_org:SoftwareApplication, schema_org:ServiceChannel, dcmitype:Software, dcmitype:Service
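The mapped values are CURIEs (compact URIs) of the form `prefix:local`. A minimal sketch of expanding them into full URIs follows. The prefix table is an assumption written for illustration: the `schema_org` and `dcmitype` expansions are the conventional Schema.org and DCMI Type namespaces, and the `datatractor_schema` expansion is assumed from the schema source URL, so check all three against the prefixes declared in the schema itself.

```python
# Assumed prefix map; verify against the `prefixes` declared in the schema.
PREFIXES = {
    "datatractor_schema": "https://datatractor.github.io/schema/main/datatractor_schema/",
    "schema_org": "http://schema.org/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local' into a full URI using the given prefix map."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + local


close = "schema_org:SoftwareApplication, schema_org:ServiceChannel, dcmitype:Software, dcmitype:Service"
for curie in close.split(", "):
    print(expand_curie(curie))
```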
Examples
Example: Extractor-example
```yaml
---
id: >-
  example
name: >-
  Example Extractor
description: >-
  An example extractor entry, using all features from the schema.
subject:
  - science
  - engineering
source_repository: >-
  https://github.com/example/extractor
documentation: >-
  https://example.github.io/extractor
usage:
  - method: cli
    command: parse --type={{ input_type }} {{ file_path }}
installation:
  - method: pip
    packages:
      - example-extractor
    requires_python: ==3.4
instructions: >-
  Install the package into a Python 3.4 environment with
  `pip install example-extractor`. After activating the
  environment, the `parse` entrypoint will be available at
  the command-line, and functions can be directly invoked
  from Python code.
citations:
  - title: An example extractor paper using DOI
    uri: doi:10.1000/182
    creators:
      - A. Uthor
      - M. A. Nuscript
    contributors:
      - E. Ditor
    type: article
  - title: Example extractor code repo
    uri: https://github.com/example/extractor
    creators:
      - S. Omeone
      - A. Nother
    contributors:
      - A. Person
    type: software
supported_filetypes:
  - id: example-filetype
    description: >-
      Example Extractor can parse example-filetype once in a blue moon.
    template:
      input_type: example
license:
  uri: https://example.com
```
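In this example the `usage.command` string contains `{{ ... }}` placeholders: `{{ file_path }}` stands for the input file, and `{{ input_type }}` lines up with the `template` mapping given under the matching `supported_filetypes` entry. A minimal sketch of filling them in follows, assuming plain textual substitution. `render_command` is a hypothetical helper, the file path is made up, and shell-quoting the path with `shlex.quote` is a choice made here for safety; the real Datatractor tooling may differ in quoting and in which variables it supplies.

```python
import re
import shlex


def render_command(command: str, variables: dict[str, str]) -> str:
    """Replace each {{ name }} placeholder with variables[name] (hypothetical helper)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for template variable {name!r}")
        return variables[name]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, command)


usage_command = "parse --type={{ input_type }} {{ file_path }}"
template = {"input_type": "example"}  # from supported_filetypes[0].template
variables = {**template, "file_path": shlex.quote("data/sample file.ex")}
print(render_command(usage_command, variables))
# → parse --type=example 'data/sample file.ex'
```

Raising on an unknown placeholder, rather than leaving it in place, keeps a half-rendered command from ever being executed.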
Example: Extractor-datatree
```yaml
---
id: >-
  datatree
name: >-
  Datatree is a prototype implementation of a tree-like hierarchical
  data structure for xarray.
description: >-
  Extractor for netCDF files using xarray's Datasets.
subject:
  - data science
source_repository: >-
  https://github.com/xarray-contrib/datatree
documentation: >-
  https://xarray-datatree.readthedocs.io/en/latest/
usage:
  - method: python
    scope: meta+data
    setup: datatree
    command: datatree.open_datatree({{ file_path }})
installation:
  - method: pip
    packages:
      - xarray-datatree==0.0.12
    requires_python: '>=3.9'
instructions: >-
  Install the xarray-datatree package into a Python 3.9+ environment with
  `pip install xarray-datatree`. After importing, netCDF files can be read
  as DataTrees using the 'datatree.open_datatree()' function.
citations:
  - title: Datatree documentation
    uri: https://xarray-datatree.readthedocs.io/en/latest/
    creators:
      - T. Nicholas
    type: software
supported_filetypes:
  - id: netcdf
    description: >-
      Can load netCDF files into Datatree objects.
license:
  spdx: Apache-2.0
```
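The `installation` entries are intended to be machine-actionable. A minimal sketch of acting on a `pip` entry such as the one in the datatree example follows, assuming the entry has been parsed into a dict. `pip_command` and `python_satisfies` are hypothetical helpers; the version check is deliberately simplified and only understands a single clause using `>=`, `<=`, `>`, `<` or `==` with numeric release segments. A real installer should use a full PEP 440 implementation such as `packaging.specifiers.SpecifierSet`.

```python
import re
import sys


def pip_command(installation: dict) -> list[str]:
    """Build a pip argument list for an installation entry with method 'pip'."""
    if installation.get("method") != "pip":
        raise ValueError("only the 'pip' method is handled in this sketch")
    return [sys.executable, "-m", "pip", "install", *installation["packages"]]


def python_satisfies(spec: str, version: tuple[int, ...]) -> bool:
    """Simplified check of a single-clause requires_python spec such as '>=3.9'."""
    match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*(\d+(?:\.\d+)*)\s*", spec)
    if match is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    op, target_text = match.groups()
    target = tuple(int(part) for part in target_text.split("."))
    # Pad both versions with zeros so that 3.9 compares equal to 3.9.0.
    width = max(len(version), len(target))
    current = tuple(version) + (0,) * (width - len(version))
    target = target + (0,) * (width - len(target))
    return {
        ">=": current >= target,
        "<=": current <= target,
        "==": current == target,
        ">": current > target,
        "<": current < target,
    }[op]


datatree_install = {
    "method": "pip",
    "packages": ["xarray-datatree==0.0.12"],
    "requires_python": ">=3.9",
}
print(python_satisfies(datatree_install["requires_python"], sys.version_info[:3]))
print(pip_command(datatree_install)[-1])  # → xarray-datatree==0.0.12
```

Returning an argument list rather than a single shell string means it can be passed straight to `subprocess.run` without any shell quoting.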
LinkML Source
Direct
```yaml
name: Extractor
description: A script, code, or web service that, when executed, can extract information
  from a supplied "file" with a specific [`FileType`](FileType.md).
comments:
- The supplied "file" may also be a set of files with a given structure.
- The extracted information may be verbatim or transformed, and include the scientific
  data and/or metadata contained within the file.
from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
close_mappings:
- schema_org:SoftwareApplication
- schema_org:ServiceChannel
- dcmitype:Software
- dcmitype:Service
rank: 1000
slots:
- id
- name
- description
- subject
- citations
- license
attributes:
  supported_filetypes:
    name: supported_filetypes
    description: An enumeration of the `FileType` that an `Extractor` supports, matching
      `FileTypes` present in the registry. The `FileType->id` slot can be passed to
      the `Extractor`, [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    domain_of:
    - Extractor
    - Usage
    range: SupportedFileType
    required: true
  supported_output_filetypes:
    name: supported_output_filetypes
    description: An enumeration of the possible output formats of an `Extractor`.
      These should match `FileTypes` present in the registry. They can be specified
      on extractor execution using the templates described in the `Extractor->Usage->command`
      slot, [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    domain_of:
    - Extractor
    range: SupportedFileType
    required: false
  source_repository:
    name: source_repository
    description: A URL or URI for a source code repository associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
  documentation:
    name: documentation
    description: A URL or URI for any online documentation associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
  usage:
    name: usage
    description: A machine-actionable instructions for the usage of the Extractor.
      The described usage pattern shall be available after the instructions specified
      in `Extractor->installation` slot have been followed.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    domain_of:
    - Extractor
    range: Usage
    inlined: true
    inlined_as_list: true
  installation:
    name: installation
    description: A machine-actionable set of installation instructions to obtain a
      working set-up of the `Extractor`.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    domain_of:
    - Extractor
    range: Installation
    inlined: true
    inlined_as_list: true
  instructions:
    name: instructions
    description: Any human-readable usage notes or installation instructions for this
      `Extractor`. This field is intended for human use only and is not intended to
      be machine-actionable. Please use the `Extractor->installation` and `Extractor->usage`
      slots for that purpose.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    domain_of:
    - Extractor
```
Induced
```yaml
name: Extractor
description: A script, code, or web service that, when executed, can extract information
  from a supplied "file" with a specific [`FileType`](FileType.md).
comments:
- The supplied "file" may also be a set of files with a given structure.
- The extracted information may be verbatim or transformed, and include the scientific
  data and/or metadata contained within the file.
from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
close_mappings:
- schema_org:SoftwareApplication
- schema_org:ServiceChannel
- dcmitype:Software
- dcmitype:Service
rank: 1000
attributes:
  supported_filetypes:
    name: supported_filetypes
    description: An enumeration of the `FileType` that an `Extractor` supports, matching
      `FileTypes` present in the registry. The `FileType->id` slot can be passed to
      the `Extractor`, [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    alias: supported_filetypes
    owner: Extractor
    domain_of:
    - Extractor
    - Usage
    range: SupportedFileType
    required: true
  supported_output_filetypes:
    name: supported_output_filetypes
    description: An enumeration of the possible output formats of an `Extractor`.
      These should match `FileTypes` present in the registry. They can be specified
      on extractor execution using the templates described in the `Extractor->Usage->command`
      slot, [see the `Usage` class](Usage.md).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    alias: supported_output_filetypes
    owner: Extractor
    domain_of:
    - Extractor
    range: SupportedFileType
    required: false
  source_repository:
    name: source_repository
    description: A URL or URI for a source code repository associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: source_repository
    owner: Extractor
    domain_of:
    - Extractor
    range: string
  documentation:
    name: documentation
    description: A URL or URI for any online documentation associated with this extractor.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: documentation
    owner: Extractor
    domain_of:
    - Extractor
    range: string
  usage:
    name: usage
    description: A machine-actionable instructions for the usage of the Extractor.
      The described usage pattern shall be available after the instructions specified
      in `Extractor->installation` slot have been followed.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    alias: usage
    owner: Extractor
    domain_of:
    - Extractor
    range: Usage
    inlined: true
    inlined_as_list: true
  installation:
    name: installation
    description: A machine-actionable set of installation instructions to obtain a
      working set-up of the `Extractor`.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    multivalued: true
    alias: installation
    owner: Extractor
    domain_of:
    - Extractor
    range: Installation
    inlined: true
    inlined_as_list: true
  instructions:
    name: instructions
    description: Any human-readable usage notes or installation instructions for this
      `Extractor`. This field is intended for human use only and is not intended to
      be machine-actionable. Please use the `Extractor->installation` and `Extractor->usage`
      slots for that purpose.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    alias: instructions
    owner: Extractor
    domain_of:
    - Extractor
    range: string
  id:
    name: id
    description: A unique identifier for the entry within the Datatractor Yard namespace,
      this should be a shorthand label rather than a UUID. Only lower-case alphanumeric
      and dash ("-") characters are permitted.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: schema_org:identifier
    identifier: true
    alias: id
    owner: Extractor
    domain_of:
    - Extractor
    - SupportedFileType
    - FileType
    range: string
    required: true
    pattern: ^[a-z]+[a-z,0-9,-]*[a-z,0-9]+$
  name:
    name: name
    description: A recognisable name for the entry.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: schema_org:name
    alias: name
    owner: Extractor
    domain_of:
    - Extractor
    - FileType
    range: string
    required: true
  description:
    name: description
    description: A human-readable outline of the entry, its format, data content and
      uses.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: schema_org:description
    alias: description
    owner: Extractor
    domain_of:
    - Extractor
    - SupportedFileType
    - FileType
    range: string
    required: true
  subject:
    name: subject
    description: Any keywords, phrases or classification codes that are relevant to
      the entry, e.g., particular scientific domains of applicability, or experimental
      techniques.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: dc_terms:subject
    multivalued: true
    alias: subject
    owner: Extractor
    domain_of:
    - Extractor
    - FileType
    range: string
  citations:
    name: citations
    description: A citation or citations for the entry, to be provided should it be
      used in academic work (or otherwise).
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: dcmitype:BibliographicReference
    multivalued: true
    alias: citations
    owner: Extractor
    domain_of:
    - Extractor
    range: Citation
    required: false
  license:
    name: license
    description: A URL, URI or SPDX license identifier for a legal document giving
      official permission to do something with the resource.
    from_schema: https://datatractor.github.io/schema/main/datatractor_schema/
    rank: 1000
    slot_uri: dc_terms:license
    alias: license
    owner: Extractor
    domain_of:
    - Extractor
    range: License
    required: true
```
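The `pattern` on the `id` slot can be exercised directly with Python's `re` module. The sketch below uses the pattern exactly as written in the schema; `is_valid_id` is a hypothetical helper. Read literally, the pattern requires at least two characters, a leading lower-case letter and a final letter or digit. Its character classes also admit commas (inside `[a-z,0-9,-]` the `,` is a literal), which the prose description of the slot does not mention.

```python
import re

# Pattern copied verbatim from the `id` slot definition.
ID_PATTERN = re.compile(r"^[a-z]+[a-z,0-9,-]*[a-z,0-9]+$")


def is_valid_id(candidate: str) -> bool:
    """Return True if the whole candidate string matches the id pattern."""
    # fullmatch avoids `$` also accepting a trailing newline.
    return ID_PATTERN.fullmatch(candidate) is not None


for candidate in ["example", "example-filetype", "netcdf", "Example", "x", "ends-", "a,b"]:
    print(f"{candidate!r}: {is_valid_id(candidate)}")
```

Under this pattern `example`, `example-filetype`, `netcdf` and `a,b` are accepted, while `Example`, `x` and `ends-` are rejected.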
Comments
The supplied “file” may also be a set of files with a given structure.
The extracted information may be verbatim or transformed, and include the scientific data and/or metadata contained within the file.
URI: datatractor_schema:Extractor