pixano.datasets.builders.folders.base
FolderBaseBuilder(source_dir, target_dir, dataset_item, info, url_prefix=None)
Bases: DatasetBuilder
Base class for building datasets from a folder structure.
The folder structure should be as follows:
- source_dir/{split}/{item}.{ext}
- source_dir/{split}/metadata.jsonl
The metadata file should be a JSONL file containing records with the following format:
[
{
"item": "item1",
"metadata1": "value1",
"metadata2": "value2",
...
"entities": {
"attr1": [val1, val2, ...],
"attr2": [val1, val2, ...],
...
}
},
{
"item": "item2",
"metadata1": "value1",
"metadata2": "value2",
...
"entities": {
"attr1": [val1, val2, ...],
"attr2": [val1, val2, ...],
...
}
},
...
]
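As a rough illustration, the sketch below writes a `metadata.jsonl` file for one split using only the Python standard library. It assumes the conventional JSON Lines layout (one JSON object per line), which the `.jsonl` extension suggests; the bracketed listing above may be only illustrative of the record fields. The helper `write_metadata`, the `train` split name, and the sample values are hypothetical and not part of Pixano.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(split_dir: Path, records: list[dict]) -> Path:
    """Write one JSON object per line to split_dir/metadata.jsonl (hypothetical helper)."""
    split_dir.mkdir(parents=True, exist_ok=True)
    path = split_dir / "metadata.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # Assumption: JSON Lines convention, one record per line.
            f.write(json.dumps(record) + "\n")
    return path


# Sample records shaped like the format above (values are made up).
records = [
    {
        "item": "item1",
        "metadata1": "value1",
        "entities": {"attr1": [1, 2], "attr2": ["a", "b"]},
    },
    {
        "item": "item2",
        "metadata1": "value1",
        "entities": {"attr1": [3], "attr2": ["c"]},
    },
]

# A temporary directory stands in for source_dir; "train" is an example split.
source_dir = Path(tempfile.mkdtemp())
metadata_path = write_metadata(source_dir / "train", records)
lines = metadata_path.read_text(encoding="utf-8").splitlines()
```

Each record's `item` value is meant to match the stem of a file in the same split folder, for example `train/item1.{ext}`.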
Note
Only one view and one entity are supported in folder builders.
Attributes:
Name | Type | Description |
---|---|---|
source_dir | | The source directory for the dataset. |
view_name | | The name of the view schema. |
view_schema | `type[View]` | The schema of the view. |
entity_name | | The name of the entities schema. |
entity_schema | `type[Entity]` | The schema of the entities. |
METADATA_FILENAME | `str` | The metadata filename. |
EXTENSIONS | `list[str]` | The list of supported extensions. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_dir | `Path \| str` | The source directory for the dataset. | required |
target_dir | `Path \| str` | The target directory for the dataset. | required |
dataset_item | `type[DatasetItem]` | The dataset item schema. | required |
info | `DatasetInfo` | User information (name, description, ...) for the dataset. | required |
url_prefix | `Path \| str \| None` | The path used to build relative URLs for the views. Useful when building dataset libraries, to pass the path relative to the media directory. | `None` |
Source code in pixano/datasets/builders/folders/base.py
generate_data()
Generate data from the source directory.
Returns:
Type | Description |
---|---|
`Iterator[dict[str, BaseSchema \| list[BaseSchema]]]` | An iterator over the data following the dataset schemas. |
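To make the traversal concrete, here is a minimal standard-library sketch of how a generator might walk `source_dir/{split}/` and pair each file with its metadata record. It is not Pixano's implementation: it yields plain dictionaries rather than `BaseSchema` objects, and the function `iter_items`, the constant values, and the JSON Lines parsing are all assumptions made for illustration.

```python
import json
import tempfile
from collections.abc import Iterator
from pathlib import Path

# Hypothetical values; the real class defines its own attributes.
METADATA_FILENAME = "metadata.jsonl"
EXTENSIONS = [".jpg", ".png"]


def iter_items(source_dir: Path) -> Iterator[dict]:
    """Yield one plain dict per supported file under source_dir/{split}/."""
    for split_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        # Index the split's metadata records by their "item" key.
        metadata: dict[str, dict] = {}
        meta_path = split_dir / METADATA_FILENAME
        if meta_path.exists():
            for line in meta_path.read_text(encoding="utf-8").splitlines():
                if line.strip():
                    record = json.loads(line)
                    metadata[record["item"]] = record
        for file in sorted(split_dir.iterdir()):
            if file.suffix.lower() in EXTENSIONS:
                yield {
                    "split": split_dir.name,
                    "path": file,
                    # Files without a matching record get empty metadata.
                    "metadata": metadata.get(file.stem, {}),
                }


# Build a tiny example tree in a temporary directory.
root = Path(tempfile.mkdtemp())
train = root / "train"
train.mkdir()
(train / "item1.jpg").touch()
(train / METADATA_FILENAME).write_text(
    json.dumps({"item": "item1", "metadata1": "value1"}) + "\n",
    encoding="utf-8",
)

rows = list(iter_items(root))
```

Writing this as a generator keeps memory use flat for large folders, which fits the iterator return type documented above.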