pixano.datasets.dataset_schema
DatasetItem(created_at=None, updated_at=None, **data)
Bases: BaseModel
Dataset Item.
It is a Pydantic model that represents an item in a dataset.
Attributes:
Name | Type | Description |
---|---|---|
id | str | The unique identifier of the item. |
split | str | The split of the item. |
created_at | datetime | The creation date of the item. |
updated_at | datetime | The last modification date of the item. |
Raises ValidationError if the input data cannot be validated to form a valid model.
The self parameter is explicitly positional-only so that self can also be used as a field name.
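The effect of making self positional-only can be seen with plain Python classes. This is a generic illustration of the language rule, not Pixano code: WithSlash and WithoutSlash are hypothetical stand-ins for a model whose constructor accepts arbitrary field data.

```python
from typing import Any


class WithoutSlash:
    """Hypothetical class whose 'self' can also be passed by keyword."""

    def __init__(self, **data: Any) -> None:
        self.data = data


class WithSlash:
    """Hypothetical class whose 'self' is positional-only (note the '/')."""

    def __init__(self, /, **data: Any) -> None:
        self.data = data


# With '/', a keyword named "self" lands in **data like any other field.
ok = WithSlash(self="field value", id="item_0")
print(ok.data)  # {'self': 'field value', 'id': 'item_0'}

# Without '/', the keyword "self" collides with the bound instance.
try:
    WithoutSlash(self="field value")
    collided = False
except TypeError:
    collided = True
print(collided)  # True
```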
Parameters:
Name | Type | Description | Default |
---|---|---|---|
created_at | datetime \| None | The creation date of the object. | None |
updated_at | datetime \| None | The last modification date of the object. | None |
data | Any | The data of the object validated by Pydantic. | {} |
Source code in pixano/datasets/dataset_schema.py
from_dataset_schema(dataset_schema, exclude_embeddings=True)
staticmethod
Create a dataset item model based on the schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_schema | DatasetSchema | The dataset schema. | required |
exclude_embeddings | bool | Whether to exclude embeddings from the dataset item model to reduce its size. | True |
Returns:
Type | Description |
---|---|
type[DatasetItem] | The dataset item model. |
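The idea of building an item model from a schema, optionally leaving out embedding tables, can be sketched with the standard library. This is an analogue only: dataclasses.make_dataclass stands in for Pydantic model creation, and the SCHEMA layout, the "embeddings" group name and build_item_model are hypothetical, not Pixano's implementation.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional

# Hypothetical, simplified schema: table name -> (group, type of one row).
SCHEMA = {
    "image": ("views", str),
    "objects": ("annotations", list),
    "image_embedding": ("embeddings", list),
}


def build_item_model(schema, exclude_embeddings=True):
    """Create an item class with one optional field per table (stdlib analogue)."""
    spec = [("id", str), ("split", str)]
    for table_name, (group, row_type) in schema.items():
        if exclude_embeddings and group == "embeddings":
            continue  # leave embedding tables out to keep items small
        spec.append((table_name, Optional[row_type], field(default=None)))
    return make_dataclass("DatasetItem", spec)


Light = build_item_model(SCHEMA)
Full = build_item_model(SCHEMA, exclude_embeddings=False)
print([f.name for f in fields(Light)])  # ['id', 'split', 'image', 'objects']
print([f.name for f in fields(Full)])  # ['id', 'split', 'image', 'objects', 'image_embedding']
```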
from_schemas_data(schemas_data)
staticmethod
Create a DatasetItem from schemas data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cls | DatasetItem | The DatasetItem class. | required |
schemas_data | dict[str, BaseSchema \| list[BaseSchema] \| None] | Schemas data. | required |
Returns:
Type | Description |
---|---|
DatasetItem | The created DatasetItem. |
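A minimal sketch of turning per-table data into a single flat item, assuming the item table's row is stored under the key "item" and every other key is a table name. Plain dictionaries replace BaseSchema instances, and from_schemas_data here is a hypothetical stand-in, not the library method.

```python
from typing import Any, Union

Row = dict[str, Any]  # hypothetical stand-in for a BaseSchema instance


def from_schemas_data(schemas_data: dict[str, Union[Row, list[Row], None]]) -> dict[str, Any]:
    """Merge per-table data into one flat item (hypothetical sketch)."""
    data = dict(schemas_data)  # copy so the caller's dict is not mutated
    item_row = data.pop("item")  # assumption: the item table is named "item"
    # Item-level columns become top-level keys; other tables become fields.
    return {**item_row, **data}


item = from_schemas_data({
    "item": {"id": "item_0", "split": "train"},
    "image": {"id": "img_0", "url": "img_0.jpg"},
    "objects": [],
    "caption": None,
})
print(item["split"], item["objects"], item["caption"])  # train [] None
```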
get_sub_dataset_item(selected_fields)
classmethod
Create a new dataset item model containing only the selected fields of the original dataset item.
Note: The id and split fields are always included in the sub dataset item.
Note: The sub dataset item does not have the methods and config of the original dataset item.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
selected_fields | list[str] | The selected fields. | required |
Returns:
Type | Description |
---|---|
type[Self] | The sub dataset item. |
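The two notes above can be illustrated with a standard-library analogue: a new class is built from a subset of fields, id and split are always kept, and methods of the original class are not carried over. Item and sub_item_class are hypothetical; the real method works on Pydantic models.

```python
from dataclasses import dataclass, fields, make_dataclass


@dataclass
class Item:
    id: str
    split: str
    image: str
    caption: str

    def describe(self) -> str:
        return f"{self.id} ({self.split})"


def sub_item_class(cls, selected_fields):
    """Keep id, split and the selected fields; drop everything else."""
    keep = {"id", "split", *selected_fields}
    spec = [(f.name, f.type) for f in fields(cls) if f.name in keep]
    # make_dataclass builds a fresh class: methods such as describe() are not copied.
    return make_dataclass(cls.__name__, spec)


Sub = sub_item_class(Item, ["caption"])
print([f.name for f in fields(Sub)])  # ['id', 'split', 'caption']
print(hasattr(Sub, "describe"))  # False
```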
model_copy(*, dataset, deep=False)
Returns a copy of the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | Dataset | The dataset the DatasetItem belongs to. | required |
deep | bool | Set to True to make a deep copy of the model. | False |
Returns:
Type | Description |
---|---|
Self | New model instance. |
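Assuming model_copy follows Pydantic's usual semantics, deep=False shares nested objects with the original while deep=True duplicates them. copy.copy and copy.deepcopy show the same distinction on a hypothetical dataclass; the dataset argument is not modelled here.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Item:
    id: str
    boxes: list = field(default_factory=list)


original = Item("item_0", [[0, 0, 10, 10]])
shallow = copy.copy(original)  # like deep=False: the nested list is shared
deep = copy.deepcopy(original)  # like deep=True: the nested list is duplicated

original.boxes.append([5, 5, 20, 20])
print(len(shallow.boxes))  # 2, the change is visible through the shared list
print(len(deep.boxes))  # 1, the deep copy is unaffected
```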
model_dump(exclude_timestamps=False, **kwargs)
Dump the model to a dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exclude_timestamps | bool | Exclude the timestamps "created_at" and "updated_at" from the model dump. Useful for comparing models without timestamps. | False |
kwargs | Any | Arguments for pydantic's model_dump. | {} |
Returns:
Type | Description |
---|---|
dict[str, Any] | The model dump. |
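The use case for exclude_timestamps, comparing two items that differ only in their timestamps, can be sketched with dataclasses.asdict. Item and dump are hypothetical stand-ins. In plain Pydantic v2 a similar result is available through model_dump(exclude={"created_at", "updated_at"}); whether Pixano implements the flag that way is not stated here.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class Item:
    id: str
    split: str
    created_at: datetime
    updated_at: datetime


def dump(item: Item, exclude_timestamps: bool = False) -> dict:
    data = asdict(item)
    if exclude_timestamps:
        # Drop bookkeeping fields so that only the content is compared.
        data.pop("created_at")
        data.pop("updated_at")
    return data


a = Item("item_0", "train", datetime(2024, 1, 1), datetime(2024, 1, 1))
b = Item("item_0", "train", datetime(2024, 6, 1), datetime(2024, 6, 2))
print(dump(a) == dump(b))  # False
print(dump(a, exclude_timestamps=True) == dump(b, exclude_timestamps=True))  # True
```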
to_dataset_schema()
classmethod
to_schemas_data(dataset_schema)
Convert DatasetItem to schemas data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_schema | DatasetSchema | The dataset schema used for the conversion. | required |
Returns:
Type | Description |
---|---|
dict[str, BaseSchema \| list[BaseSchema] \| None] | Schemas data. |
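A minimal sketch of the opposite direction, splitting a flat item into per-table data. It assumes item-level columns are grouped under a table named "item" and replaces the DatasetSchema argument with a plain set of table names; to_schemas_data here is hypothetical.

```python
from typing import Any


def to_schemas_data(item: dict[str, Any], table_names: set[str]) -> dict[str, Any]:
    """Split a flat item into per-table data (hypothetical sketch)."""
    schemas_data: dict[str, Any] = {"item": {}}  # assumed name of the item table
    for key, value in item.items():
        if key in table_names:
            schemas_data[key] = value  # a row, a list of rows, or None
        else:
            schemas_data["item"][key] = value  # item-level column
    return schemas_data


flat = {"id": "item_0", "split": "train", "image": {"url": "img_0.jpg"}, "objects": []}
result = to_schemas_data(flat, {"image", "objects"})
print(result)
# {'item': {'id': 'item_0', 'split': 'train'}, 'image': {'url': 'img_0.jpg'}, 'objects': []}
```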
DatasetSchema(**data)
Bases: BaseModel
A dataset schema that defines the tables and the relations between them.
Attributes:
Name | Type | Description |
---|---|---|
schemas | dict[str, type[BaseSchema]] | The mapping between the table names and their schema. |
relations | dict[str, dict[str, SchemaRelation]] | The relations between the item table and the other tables. |
groups | dict[SchemaGroup, set[str]] | The groups of tables. It is filled automatically based on the schemas. |
Source code in pydantic/main.py
add_schema(table_name, schema, relation_item)
Add a schema to the dataset schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name | str | Name of the table to add to the dataset schema. | required |
schema | type[BaseSchema] | Schema of the table. | required |
relation_item | SchemaRelation | Relationship with the item schema. | required |
Returns:
Type | Description |
---|---|
Self | The dataset schema. |
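How add_schema might keep schemas, relations and groups in sync can be sketched with a small hypothetical class. Base schema names are plain strings instead of BaseSchema subclasses, the base-to-group rule in GROUP_OF_BASE is invented for illustration, and only the "one_to_one" relation value is taken from the serialize() example; "one_to_many" is assumed.

```python
from enum import Enum


class Relation(Enum):
    """Hypothetical stand-in for SchemaRelation."""

    ONE_TO_ONE = "one_to_one"
    ONE_TO_MANY = "one_to_many"


# Hypothetical rule deciding which group a table falls into.
GROUP_OF_BASE = {"Item": "item", "Image": "views", "BBox": "annotations"}


class MiniDatasetSchema:
    def __init__(self) -> None:
        self.schemas: dict[str, str] = {}
        self.relations: dict[str, dict[str, Relation]] = {"item": {}}
        self.groups: dict[str, set[str]] = {}

    def add_schema(self, table_name: str, base_schema: str, relation_item: Relation) -> "MiniDatasetSchema":
        self.schemas[table_name] = base_schema
        self.relations["item"][table_name] = relation_item
        # groups is derived from the schemas rather than set by the caller
        self.groups.setdefault(GROUP_OF_BASE[base_schema], set()).add(table_name)
        return self  # returning self lets calls be chained


schema = (
    MiniDatasetSchema()
    .add_schema("image", "Image", Relation.ONE_TO_ONE)
    .add_schema("objects", "BBox", Relation.ONE_TO_MANY)
)
print(schema.groups)  # {'views': {'image'}, 'annotations': {'objects'}}
```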
deserialize(dataset_schema_json)
staticmethod
Deserialize the dataset schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_schema_json | dict[str, dict[str, Any]] | Serialized dataset schema. | required |
Returns:
Type | Description |
---|---|
DatasetSchema | The dataset schema. |
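Part of deserialization is turning the relation strings of the serialized format back into enum members. The sketch below covers only the "relations" section; resolving schema class names in "schemas" would need a registry of classes and is left out. Relation and deserialize_relations are hypothetical, and only the "one_to_one" value comes from the documented format.

```python
from enum import Enum
from typing import Any


class Relation(Enum):
    # Hypothetical stand-in for SchemaRelation; "one_to_many" is assumed.
    ONE_TO_ONE = "one_to_one"
    ONE_TO_MANY = "one_to_many"


def deserialize_relations(serialized: dict[str, dict[str, Any]]) -> dict[str, dict[str, Relation]]:
    """Turn the "relations" section back into enum members."""
    return {
        source: {table: Relation(value) for table, value in targets.items()}
        for source, targets in serialized["relations"].items()
    }


serialized = {"relations": {"item": {"image": "one_to_one"}}, "schemas": {}}
relations = deserialize_relations(serialized)
print(relations)  # {'item': {'image': <Relation.ONE_TO_ONE: 'one_to_one'>}}
```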
format_table_name(table_name)
staticmethod
from_dataset_item(dataset_item)
staticmethod
Create a dataset schema from a DatasetItem.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_item | type[DatasetItem] | The dataset item. | required |
Returns:
Type | Description |
---|---|
DatasetSchema | The dataset schema. |
from_json(json_fp)
staticmethod
Read a dataset schema from a JSON file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
json_fp | Path | Path to the JSON file. | required |
Returns:
Type | Description |
---|---|
DatasetSchema | The dataset schema. |
get_table_group(table_name)
Get the group of a table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name | str | Table name. | required |
Returns:
Type | Description |
---|---|
SchemaGroup | The group of the table. |
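Because groups maps each group to a set of table names, finding the group of a table is a reverse lookup. A minimal sketch, with plain strings standing in for SchemaGroup members; raising KeyError for an unknown table is an assumed behaviour, not documented above.

```python
def get_table_group(groups: dict[str, set[str]], table_name: str) -> str:
    """Return the group whose table set contains table_name (hypothetical sketch)."""
    for group, tables in groups.items():
        if table_name in tables:
            return group
    raise KeyError(f"Table {table_name!r} is not in any group")  # assumed error behaviour


groups = {"item": {"item"}, "views": {"image"}, "annotations": {"objects"}}
print(get_table_group(groups, "image"))  # views
```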
serialize()
Serialize the dataset schema.
The serialized schema is a dictionary with the following format:
```
{
    "relations": {
        "item": {
            "image": "one_to_one",
        }
    },
    "schemas": {
        "table1": {
            "schema": "CustomItem",
            "base_schema": "Item",
            "fields": {
                "id": {"type": "str", "collection": False},
                "split": {"type": "str", "collection": False},
                ...
            }
        }
    }
}
```
Returns:
Type | Description |
---|---|
dict[str, dict[str, Any]] | The serialized dataset schema. |
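The "fields" entry of the serialized format records a type name and a collection flag per field. The sketch below derives such entries from class annotations. It assumes that "collection" is True for list-typed fields, which the format suggests but does not state; CustomItem, describe_fields and the "item" table name are hypothetical, and a real implementation would inspect Pydantic fields instead.

```python
from typing import Any, get_args, get_origin


class CustomItem:
    # Hypothetical item schema; annotations stand in for Pydantic fields.
    id: str
    split: str
    tags: list[str]


def describe_fields(schema: type) -> dict[str, dict[str, Any]]:
    """Map each annotated field to {"type": ..., "collection": ...}."""
    out = {}
    for name, annotation in schema.__annotations__.items():
        is_collection = get_origin(annotation) is list  # assumed meaning of "collection"
        inner = get_args(annotation)[0] if is_collection else annotation
        out[name] = {"type": inner.__name__, "collection": is_collection}
    return out


serialized = {
    "relations": {"item": {}},
    "schemas": {
        "item": {
            "schema": CustomItem.__name__,
            "base_schema": "Item",  # hypothetical: name of the base schema class
            "fields": describe_fields(CustomItem),
        }
    },
}
print(serialized["schemas"]["item"]["fields"]["tags"])  # {'type': 'str', 'collection': True}
```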
to_json(json_fp)
Save the DatasetSchema to a JSON file.
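Assuming to_json writes the serialized dictionary and from_json reads it back before deserializing, the file round trip can be sketched with the json module. The file name and the dictionary contents are hypothetical; note that Python's False is written as JSON false and read back as False.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical serialized schema in the documented format.
serialized = {
    "relations": {"item": {"image": "one_to_one"}},
    "schemas": {
        "item": {
            "schema": "CustomItem",
            "base_schema": "Item",
            "fields": {"id": {"type": "str", "collection": False}},
        }
    },
}

with tempfile.TemporaryDirectory() as tmp:
    json_fp = Path(tmp) / "dataset_schema.json"  # hypothetical file name
    json_fp.write_text(json.dumps(serialized, indent=4), encoding="utf-8")  # to_json side
    loaded = json.loads(json_fp.read_text(encoding="utf-8"))  # from_json side

print(loaded == serialized)  # True
```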
SchemaRelation
Bases: Enum
Relation between tables.
Attributes:
Name | Type | Description |
---|---|---|
ONE_TO_MANY | | One to many relation. |
MANY_TO_ONE | | Many to one relation. |
ONE_TO_ONE | | One to one relation. |
MANY_TO_MANY | | Many to many relation. |