
pixano.datasets.exporters.default_jsonl_dataset_exporter

DefaultJSONLDatasetExporter(dataset, export_dir, overwrite=False)

Bases: DatasetExporter

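Default JSON Lines dataset exporter. It writes the dataset info and sources to info.json and the dataset items to .jsonl files in the export directory.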

Source code in pixano/datasets/exporters/dataset_exporter.py
def __init__(self, dataset: Dataset, export_dir: str | Path, overwrite: bool = False):
    """Initialize a new instance of the DatasetExporter class.

    Args:
        dataset: The dataset to be exported.
        export_dir: The directory where the exported files will be saved.
        overwrite: Whether to overwrite an existing export directory.
    """
    self.dataset = dataset
    self.export_dir = Path(export_dir)
    self._overwrite = overwrite

export_dataset_item(export_data, dataset_item)

Store the dataset item in the export_data list of dictionaries.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| export_data | list[dict[str, Any]] | A list of dictionaries containing the dataset items to be exported. | required |
| dataset_item | DatasetItem | The dataset item to be exported. | required |

Returns:

| Type | Description |
|---|---|
| list[dict[str, Any]] | The same export_data list, with the serialized dataset item (timestamps excluded) appended. |

Source code in pixano/datasets/exporters/default_jsonl_dataset_exporter.py
def export_dataset_item(
    self, export_data: list[dict[str, Any]], dataset_item: DatasetItem
) -> list[dict[str, Any]]:
    """Store the dataset item in the `export_data` list of dictionaries.

    Args:
        export_data: A list of dictionaries containing the dataset items to be exported.
        dataset_item: The dataset item to be exported.

    Returns:
        A list of dictionaries containing the dataset items to be exported.
    """
    export_data.append(dataset_item.model_dump(exclude_timestamps=True))
    return export_data
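Conceptually, export_dataset_item serializes one item and appends it to the running list. The sketch below mimics that using only the standard library. It is not pixano code: the FakeItem dataclass is a hypothetical stand-in for DatasetItem, and dump_without_timestamps is a hypothetical analogue of model_dump(exclude_timestamps=True), assuming the timestamps live in created_at/updated_at fields.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any


# Hypothetical stand-in for a pixano DatasetItem (not the real class).
@dataclass
class FakeItem:
    id: str
    split: str
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)


# Assumption: timestamps are stored in these two fields.
TIMESTAMP_FIELDS = {"created_at", "updated_at"}


def dump_without_timestamps(item: FakeItem) -> dict[str, Any]:
    """Hypothetical analogue of model_dump(exclude_timestamps=True)."""
    return {k: v for k, v in asdict(item).items() if k not in TIMESTAMP_FIELDS}


def export_dataset_item(export_data: list[dict[str, Any]], item: FakeItem) -> list[dict[str, Any]]:
    # Append in place and return the same list object.
    export_data.append(dump_without_timestamps(item))
    return export_data


rows: list[dict[str, Any]] = []
export_dataset_item(rows, FakeItem(id="img_0", split="train"))
export_dataset_item(rows, FakeItem(id="img_1", split="train"))
print(rows)  # [{'id': 'img_0', 'split': 'train'}, {'id': 'img_1', 'split': 'train'}]
```

Because the list is mutated in place, the returned value is the same object that was passed in. Returning it simply lets callers write `export_data = exporter.export_dataset_item(export_data, item)` in a loop.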

initialize_export_data(info, sources)

Initialize a list of dictionaries to be exported.

The first element of the list is a header dictionary with two keys:
- info: the dataset information
- sources: the list of sources, serialized without timestamps

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| info | DatasetInfo | The dataset information. | required |
| sources | list[Source] | The list of sources. | required |

Returns:

| Type | Description |
|---|---|
| list[dict[str, Any]] | A list of dictionaries containing the data to be exported. |

Source code in pixano/datasets/exporters/default_jsonl_dataset_exporter.py
def initialize_export_data(self, info: DatasetInfo, sources: list[Source]) -> list[dict[str, Any]]:
    """Initialize a list of dictionaries to be exported.

    The first line contains the following elements:
    - dataset info
    - the sources

    Args:
        info: The dataset information.
        sources: The list of sources.

    Returns:
        A list of dictionaries containing the data to be exported.
    """
    export_data = [
        {"info": info.model_dump(), "sources": [s.model_dump(exclude_timestamps=True) for s in sources]}
    ]
    return export_data
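The list returned by initialize_export_data acts as a header followed by items: element 0 holds the info and sources, and later calls to export_dataset_item append items after it. This is the layout save_data relies on when it splits export_data[0] from export_data[1:]. The sketch below uses plain dictionaries in place of DatasetInfo and Source. The field names shown are hypothetical.

```python
from typing import Any


def initialize_export_data(info: dict[str, Any], sources: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # The header is the first (and initially only) element of the list.
    return [{"info": info, "sources": list(sources)}]


export_data = initialize_export_data(
    {"name": "demo", "description": "toy dataset"},  # hypothetical DatasetInfo fields
    [{"id": "src_0", "name": "ground truth"}],  # hypothetical source fields
)
export_data.append({"id": "img_0", "split": "train"})  # items follow the header

header, items = export_data[0], export_data[1:]
print(sorted(header))  # ['info', 'sources']
print(len(items))  # 1
```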

save_data(export_data, split, file_name, file_num)

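Save data to the export directory. The first element of export_data (the dataset info and sources) is written to info.json. The remaining elements are written to a .jsonl file, one JSON object per line.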

The saved directory has the following structure:

export_dir/{split}_{file_name}_0.jsonl
          /...
          /{split}_{file_name}_{file_num}.jsonl
          /...
          /{split}_{file_name}_n.jsonl
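The file names follow the f-string used in save_data. The sketch below makes the naming scheme concrete. The jsonl_path helper and the file_name value "dataset" are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def jsonl_path(export_dir: Path, split: str, file_name: str, file_num: int) -> Path:
    # Mirrors the f-string in save_data: {split}_{file_name}_{file_num}.jsonl
    return export_dir / f"{split}_{file_name}_{file_num}.jsonl"


export_dir = Path("export_dir")
names = [jsonl_path(export_dir, "train", "dataset", n).name for n in range(3)]
print(names)  # ['train_dataset_0.jsonl', 'train_dataset_1.jsonl', 'train_dataset_2.jsonl']
```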

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| export_data | list[dict[str, Any]] | The list of dictionaries containing the data to be saved. | required |
| split | str | The split of the dataset item being saved. | required |
| file_name | str | The name of the file to save the data in. | required |
| file_num | int | The number of the file to save the data in. | required |

Source code in pixano/datasets/exporters/default_jsonl_dataset_exporter.py
def save_data(self, export_data: list[dict[str, Any]], split: str, file_name: str, file_num: int) -> None:
    """Save data to the specified directory.

    The saved directory has the following structure:
        export_dir/{split}_{file_name}_0.jsonl
                  /...
                  /{split}_{file_name}_{file_num}.jsonl
                  /...
                  /{split}_{file_name}_n.jsonl


    Args:
        export_data: The list of dictionaries containing the data to be saved.
        split: The split of the dataset item being saved.
        file_name: The name of the file to save the data in.
        file_num: The number of the file to save the data in.
    """
    info, data = export_data[0], export_data[1:]

    info_path = self.export_dir / "info.json"
    info_path.write_text(json.dumps(info), encoding="utf-8")

    json_path = self.export_dir / f"{split}_{file_name}_{file_num}.jsonl"
    json_path.write_text("\n".join([json.dumps(jsonable_encoder(d)) for d in data]), encoding="utf-8")
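To see what save_data produces on disk, the sketch below reproduces its write logic with the standard library and reads the files back. It is a simplified standalone version, not the pixano method. json.dumps(..., default=str) is a rough stand-in for FastAPI's jsonable_encoder, and it formats values such as datetimes differently. The load_jsonl reader is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_data(export_dir: Path, export_data: list[dict[str, Any]], split: str, file_name: str, file_num: int) -> None:
    # First element is the header (info + sources); the rest are dataset items.
    info, data = export_data[0], export_data[1:]
    (export_dir / "info.json").write_text(json.dumps(info), encoding="utf-8")
    # One JSON object per line; default=str approximates jsonable_encoder.
    lines = [json.dumps(d, default=str) for d in data]
    (export_dir / f"{split}_{file_name}_{file_num}.jsonl").write_text("\n".join(lines), encoding="utf-8")


def load_jsonl(path: Path) -> list[dict[str, Any]]:
    # Skip blank lines so a missing or extra trailing newline does not matter.
    return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    export_data = [
        {"info": {"name": "demo"}, "sources": []},
        {"id": "img_0", "split": "train"},
        {"id": "img_1", "split": "train"},
    ]
    save_data(out, export_data, "train", "dataset", 0)
    header = json.loads((out / "info.json").read_text(encoding="utf-8"))
    items = load_jsonl(out / "train_dataset_0.jsonl")
    print(header["info"]["name"], [i["id"] for i in items])  # demo ['img_0', 'img_1']
```

info.json is rewritten on every call. Each (split, file_name, file_num) combination gets its own .jsonl file.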