beam package
Module contents
This script is intended as a usage example and reference implementation of the
API endpoints exposed by the registry (by default, https://yard.datatractor.org/api). Currently, it can be used to:
- query the registry of Extractors for extractors that support a given file type,
- install those extractors in a fresh Python virtual environment via pip,
- invoke the extractor either in Python or at the CLI, producing Python objects or files on disk.
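As a rough illustration, the three steps above can be driven through a single call to `extract`, whose signature is documented below. This is a minimal sketch, not a definitive recipe: the wrapper name `parse_with_beam` is hypothetical, and the import is deferred so that the snippet can be loaded even where the `beam` package is not installed.

```python
from pathlib import Path


def parse_with_beam(input_path, input_type):
    """Look up, install and run an extractor for a single file.

    `parse_with_beam` is a hypothetical helper written for this example;
    only `beam.extract` and its keyword arguments come from the API
    documented on this page.
    """
    # Deferred import: the sketch can be loaded without beam installed.
    from beam import extract

    return extract(
        Path(input_path),  # path (or URL) of the file to parse
        input_type,        # ID of a FileType in the registry
        install=True,      # install the extractor package first (the default)
    )
```

A call such as `parse_with_beam("example.mpr", "biologic-mpr")` would then request the extraction; both the file name and the file type ID here are placeholders chosen for illustration, and the call needs network access to reach the registry.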
- beam.extract(input_path, input_type, output_path=None, output_type=None, preferred_mode=SupportedExecutionMethod.PYTHON, preferred_scope=SupportedUsageScope.DATA, install=True, use_venv=True, extractor_definition=None, registry_base_url='https://yard.datatractor.org/api')
Parse a file given its path and file type.
- Parameters:
  - input_path (Path | str) – The path or URL of the file to parse.
  - input_type (str) – The ID of the FileType in the registry.
  - output_path (Union[Path, str, None]) – The path to write the output to. If not provided, the output will be requested to be written to a file with the same name as the input file, but with an extension as defined using the output_type. Defaults to {input_path}.out.
  - output_type (Optional[str]) – A string specifying the desired output type.
  - preferred_mode (SupportedExecutionMethod) – The preferred execution method. If the extractor supports both Python and CLI, this will be used to determine which to use. If the extractor only supports one method, this will be ignored. Accepts the SupportedExecutionMethod values of “cli” or “python”.
  - preferred_scope (SupportedUsageScope) – The preferred extraction scope. Accepts the SupportedUsageScope values of “meta+data” (default) or “meta-only”.
  - install (bool) – Whether to install the extractor package before running it. Defaults to True.
  - extractor_definition (Optional[dict]) – A dictionary containing the extractor definition to use instead of a registry lookup.
  - registry_base_url (str) – The base URL of the registry to use. Defaults to https://yard.datatractor.org/api.
- Return type:
Any
- Returns:
The output of the extractor, either a Python object or nothing.
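
The fallback rule described for `preferred_mode` (honour the preference when the extractor supports it, otherwise use the only method available) can be sketched as follows. This illustrates the documented behaviour only and is not beam's actual implementation; `ExecutionMethod` and `choose_method` are hypothetical names, with the enum standing in for `SupportedExecutionMethod`.

```python
from enum import Enum


class ExecutionMethod(Enum):
    """Hypothetical stand-in for beam's SupportedExecutionMethod."""

    CLI = "cli"
    PYTHON = "python"


def choose_method(supported, preferred=ExecutionMethod.PYTHON):
    """Pick an execution method following the documented preference rule."""
    supported = set(supported)
    if not supported:
        raise ValueError("extractor declares no supported execution method")
    # The preference applies whenever the extractor supports it.
    if preferred in supported:
        return preferred
    # Otherwise the preference is ignored and an available method is used;
    # sorting keeps the choice deterministic.
    return sorted(supported, key=lambda method: method.value)[0]
```

Under these assumptions, an extractor that only offers a CLI entry point is run at the CLI even when the caller prefers Python.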