pixano.datasets.builders.dataset_builder
DatasetBuilder(target_dir, dataset_item, info)
Bases: ABC
Abstract base class for dataset builders.
To build a dataset, inherit from this class, implement the generate_data method, and launch the build method.
Attributes:
Name | Type | Description |
---|---|---|
target_dir | Path | The target directory for the dataset. |
previews_path | Path | The path to the previews directory. |
info | DatasetInfo | Dataset information (name, description, ...). |
dataset_schema | DatasetSchema | The schema of the dataset. |
schemas | dict[str, type[BaseSchema]] | The schemas of the dataset tables. |
db | DBConnection | The connection to the |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_dir | Path \| str | The target directory for the dataset. | required |
dataset_item | type[DatasetItem] | The dataset item schema. | required |
info | DatasetInfo | Dataset information (name, description, ...). | required |
Source code in pixano/datasets/builders/dataset_builder.py
item_schema
property
The item schema for the dataset.
item_schema_name
property
The item schema name for the dataset.
add_ground_truth_source(metadata={})
Add a ground truth source to the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metadata | str \| dict[str, Any] | Metadata of the ground truth source. | {} |
Returns:
Type | Description |
---|---|
str | The id of the ground truth source. |
add_source(name, kind, metadata={}, id='')
Add a source to the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the source. | required |
kind | str \| SourceKind | Kind of source. | required |
metadata | str \| dict[str, Any] | Metadata of the source. If a dict is provided, it is converted to a JSON string. | {} |
id | str | The id of the source. If not provided, a random id is generated. | '' |
Returns:
Type | Description |
---|---|
str | The id of the source. |
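The metadata and id handling described above can be sketched in plain Python. This is an illustration only, not Pixano's code: `normalize_source_args` is a hypothetical helper, and `uuid4` is an assumed stand-in for whatever random id generator the library actually uses.

```python
from __future__ import annotations

import json
import uuid
from typing import Any


def normalize_source_args(
    metadata: str | dict[str, Any] | None = None, id: str = ""
) -> tuple[str, str]:
    """Hypothetical helper mirroring the documented metadata/id rules."""
    # None stands in for the documented {} default, avoiding a shared mutable default.
    if metadata is None:
        metadata = {}
    # A dict is converted to a JSON string; a str is passed through unchanged.
    if isinstance(metadata, dict):
        metadata = json.dumps(metadata)
    # An empty id means "not provided": generate a random one (uuid4 is an assumption).
    if id == "":
        id = uuid.uuid4().hex
    return metadata, id


meta, source_id = normalize_source_args({"model": "detector", "version": 2})
```

Passing a string for `metadata` together with an explicit `id` returns both values untouched, which matches the "if provided" wording in the table.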
build(mode='create', flush_every_n_samples=None, compact_every_n_transactions=None, check_integrity='raise')
Build the dataset.
It generates data from the source directory and inserts it into the tables of the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mode | Literal['add', 'create', 'overwrite'] | The mode for creating the tables in the database: "create" creates the tables and raises an error if they already exist; "overwrite" overwrites the tables; "add" appends to the tables. | 'create' |
flush_every_n_samples | int \| None | The number of samples accumulated from | None |
compact_every_n_transactions | int \| None | The number of transactions before compacting each table. If None, the dataset is compacted only at the end. | None |
check_integrity | Literal['raise', 'warn', 'none'] | The integrity check to perform after building the dataset: "raise" raises an error if integrity errors are found; "warn" prints a warning if integrity errors are found; "none" skips the check. | 'raise' |
Returns:
Type | Description |
---|---|
Dataset | The built dataset. |
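As a rough illustration of what flushing every n samples can look like, the sketch below groups generated samples per table and emits one batch each time n samples have accumulated. It is a plain-Python stand-in, not the body of build: `batched_flush` and its parameter `n` are hypothetical names, and it covers only the case where an integer is given.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Iterable, Iterator


def batched_flush(
    samples: Iterable[dict[str, Any]], n: int
) -> Iterator[dict[str, list[Any]]]:
    """Yield per-table batches, one batch for every n accumulated samples."""
    pending: dict[str, list[Any]] = defaultdict(list)
    count = 0
    for sample in samples:
        for table_name, rows in sample.items():
            # A value may be a single row or a list of rows.
            pending[table_name].extend(rows if isinstance(rows, list) else [rows])
        count += 1
        if count % n == 0:
            yield dict(pending)
            pending = defaultdict(list)
    # Emit whatever is left once the generator is exhausted.
    if pending:
        yield dict(pending)


samples = [{"item": i, "image": [i, i]} for i in range(5)]
batches = list(batched_flush(samples, n=2))
```

With five samples and `n=2`, the generator produces two full batches and a final partial one, so a larger n trades memory for fewer insert transactions.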
compact_dataset()
Compact the dataset by calling compact_table for each table in the database.
compact_table(table_name)
Compact a table by cleaning up old versions and compacting files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name | str | The name of the table to compact. | required |
create_tables(mode='create')
Create tables in the database.
Returns:
Type | Description |
---|---|
dict[str, Table] | The tables in the database. |
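The "create", "overwrite" and "add" modes documented for build can be mimicked with an in-memory dictionary. The sketch below is only a model of those semantics: `create_tables_sketch` and the `store` dictionary are hypothetical, and treating "add" as "reuse the table if it exists, otherwise start an empty one" is an assumption.

```python
from __future__ import annotations

from typing import Any


def create_tables_sketch(
    store: dict[str, list[Any]], names: list[str], mode: str = "create"
) -> dict[str, list[Any]]:
    """Model the three table-creation modes on a dict of lists."""
    for name in names:
        if mode == "create":
            # "create" refuses to touch a table that already exists.
            if name in store:
                raise ValueError(f"Table {name!r} already exists.")
            store[name] = []
        elif mode == "overwrite":
            # "overwrite" discards any existing rows.
            store[name] = []
        elif mode == "add":
            # "add" keeps existing rows so new ones can be appended (assumption).
            store.setdefault(name, [])
        else:
            raise ValueError(f"Unknown mode: {mode!r}")
    return {name: store[name] for name in names}


store: dict[str, list[Any]] = {}
create_tables_sketch(store, ["item"])["item"].append("row_0")
kept = create_tables_sketch(store, ["item"], mode="add")["item"]
```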
generate_data()
abstractmethod
Generate data from the source directory.
Each iteration should yield a dictionary whose keys are table names and whose values are the data for those tables: either a single schema instance or a list of them.
It must be implemented in the subclass.
Returns:
Type | Description |
---|---|
Iterator[dict[str, BaseSchema \| list[BaseSchema]]] | An iterator over the data following the dataset schemas. |
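The subclass-and-yield pattern can be sketched without the library. Everything below is a stand-in written for illustration: `SketchBuilder`, `FolderBuilder`, `Item` and `Image` are hypothetical names, the dataclasses replace real schema classes, and the simplified `build` only collects rows in memory instead of writing to a database.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class Item:  # hypothetical stand-in for a row of the "item" table
    id: str
    split: str


@dataclass
class Image:  # hypothetical stand-in for a row of an "image" table
    id: str
    item_id: str
    url: str


class SketchBuilder(ABC):
    """Stand-in for the builder pattern; not Pixano's DatasetBuilder."""

    @abstractmethod
    def generate_data(self) -> Iterator[dict[str, Any]]:
        """Yield one dict per sample, keyed by table name."""

    def build(self) -> dict[str, list[Any]]:
        # Collect yielded rows per table; a real builder would insert them in a database.
        tables: dict[str, list[Any]] = {}
        for sample in self.generate_data():
            for table_name, rows in sample.items():
                tables.setdefault(table_name, []).extend(
                    rows if isinstance(rows, list) else [rows]
                )
        return tables


class FolderBuilder(SketchBuilder):
    def __init__(self, file_names: list[str]) -> None:
        self.file_names = file_names

    def generate_data(self) -> Iterator[dict[str, Any]]:
        for index, name in enumerate(self.file_names):
            item_id = f"item_{index}"
            # One single row for "item", a list of rows for "image".
            yield {
                "item": Item(id=item_id, split="train"),
                "image": [Image(id=f"image_{index}", item_id=item_id, url=name)],
            }


tables = FolderBuilder(["a.jpg", "b.jpg"]).build()
```

Because `generate_data` is abstract, instantiating `SketchBuilder` directly raises a `TypeError`, which is the same guarantee an abstractmethod gives on the real base class.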