pixano.datasets.exporters.default_jsonl_dataset_exporter
DefaultJSONLDatasetExporter(dataset, export_dir, overwrite=False)
Bases: DatasetExporter
Default JSON Lines dataset exporter.
Source code in pixano/datasets/exporters/dataset_exporter.py
export_dataset_item(export_data, dataset_item)
Store the dataset item in the export_data list of dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
export_data | list[dict[str, Any]] | A list of dictionaries containing the dataset items to be exported. | required |
dataset_item | DatasetItem | The dataset item to be exported. | required |
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | A list of dictionaries containing the dataset items to be exported. |
Source code in pixano/datasets/exporters/default_jsonl_dataset_exporter.py
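The accumulate-and-return pattern described above can be sketched without Pixano installed. This is only an illustration, not the library's implementation: `export_dataset_item_sketch` is a hypothetical stand-in, and the items are plain dictionaries with made-up keys rather than real DatasetItem objects.

```python
from typing import Any


def export_dataset_item_sketch(
    export_data: list[dict[str, Any]], item: dict[str, Any]
) -> list[dict[str, Any]]:
    """Append one item (already converted to a plain dict) and return the list."""
    export_data.append(item)
    return export_data


# Hypothetical items; a real DatasetItem would first have to be converted to a dict.
export_data: list[dict[str, Any]] = []
export_data = export_dataset_item_sketch(export_data, {"id": "item_0", "split": "train"})
export_data = export_dataset_item_sketch(export_data, {"id": "item_1", "split": "train"})
```

Returning the list lets the caller keep threading the same export_data through successive calls.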
initialize_export_data(info, sources)
Initialize a list of dictionaries to be exported.
The first line contains the following elements:
- dataset info
- the sources
Parameters:
Name | Type | Description | Default |
---|---|---|---|
info | DatasetInfo | The dataset information. | required |
sources | list[Source] | The list of sources. | required |
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | A list of dictionaries containing the data to be exported. |
Source code in pixano/datasets/exporters/default_jsonl_dataset_exporter.py
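As a rough picture of the returned value, the sketch below builds a list whose single entry carries the dataset info and the sources. The key names `"info"` and `"sources"`, the function name, and the dictionary contents are assumptions made for illustration; the actual layout used by the library may differ.

```python
from typing import Any


def initialize_export_data_sketch(
    info: dict[str, Any], sources: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return a list whose only entry is the header record."""
    # Assumed key names; DatasetInfo and Source are replaced by plain dicts here.
    return [{"info": info, "sources": sources}]


header_data = initialize_export_data_sketch(
    info={"name": "demo"},
    sources=[{"name": "camera"}],
)
```

Because the header is the first entry of the list, it ends up as the first line of the file once the list is written out as JSON Lines.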
save_data(export_data, split, file_name, file_num)
Save data to the specified directory.
The saved directory has the following structure:

    export_dir/{split}_{file_name}_0.jsonl
              /...
              /{split}_{file_name}_{file_num}.jsonl
              /...
              /{split}_{file_name}_n.jsonl
Parameters:
Name | Type | Description | Default |
---|---|---|---|
export_data | list[dict[str, Any]] | The list of dictionaries containing the data to be saved. | required |
split | str | The split of the dataset item being saved. | required |
file_name | str | The name of the file to save the data in. | required |
file_num | int | The number of the file to save the data in. | required |
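
A minimal sketch of writing such a file with the standard library follows. It is not the library's code: `save_data_sketch` is a hypothetical helper, and the underscore separators in the file name are an assumption about how split, file_name and file_num are joined.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_data_sketch(
    export_dir: Path,
    export_data: list[dict[str, Any]],
    split: str,
    file_name: str,
    file_num: int,
) -> Path:
    """Write one JSON object per line and return the path of the written file."""
    export_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: {split}_{file_name}_{file_num}.jsonl
    path = export_dir / f"{split}_{file_name}_{file_num}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in export_data:
            f.write(json.dumps(record) + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    records = [{"id": "item_0"}, {"id": "item_1"}]
    written = save_data_sketch(Path(tmp), records, "train", "data", 0)
    written_name = written.name
    # Read the file back line by line to check the round trip.
    read_back = [
        json.loads(line)
        for line in written.read_text(encoding="utf-8").splitlines()
    ]
```

Each line of the file is an independent JSON document, so a reader can process it one line at a time without loading the whole file.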