pixano.inference.types

Pixano's inference types.

This module defines Pixano's own types for inference operations, independent of any specific inference backend.

CompressedRLEData(size, counts) dataclass

Compressed RLE mask data.

Attributes:

Name Type Description
size list[int]

Mask size as [height, width].

counts bytes

Mask RLE encoding as bytes.

from_dict(data) classmethod

Create from dictionary.

Source code in pixano/inference/types.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "CompressedRLEData":
    """Create from dictionary."""
    counts = data["counts"]
    if isinstance(counts, str):
        counts = counts.encode("utf-8")
    return cls(size=data["size"], counts=counts)

to_dict()

Convert to dictionary.

Source code in pixano/inference/types.py
def to_dict(self) -> dict[str, Any]:
    """Convert to dictionary."""
    return {
        "size": self.size,
        "counts": self.counts.decode("utf-8") if isinstance(self.counts, bytes) else self.counts,
    }
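A round trip through the two methods shown above can be sketched as follows. The local dataclass here is a stand-in that reproduces the documented source verbatim, so the example runs without the library installed; the sample `counts` string is illustrative, not a real RLE encoding.

```python
from dataclasses import dataclass
from typing import Any


# Stand-in mirroring the documented CompressedRLEData, for illustration only.
@dataclass
class CompressedRLEData:
    size: list[int]
    counts: bytes

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "CompressedRLEData":
        """Create from dictionary."""
        counts = data["counts"]
        if isinstance(counts, str):
            counts = counts.encode("utf-8")
        return cls(size=data["size"], counts=counts)

    def to_dict(self) -> dict[str, Any]:
        """Convert to dictionary."""
        return {
            "size": self.size,
            "counts": self.counts.decode("utf-8") if isinstance(self.counts, bytes) else self.counts,
        }


# A string "counts" coming from JSON becomes bytes on the instance,
# and serializes back to a string.
mask = CompressedRLEData.from_dict({"size": [480, 640], "counts": "a3Nk0"})
assert isinstance(mask.counts, bytes)
assert mask.to_dict() == {"size": [480, 640], "counts": "a3Nk0"}
```

This asymmetry (bytes in memory, string over the wire) keeps the dataclass JSON-serializable while preserving the bytes representation models typically expect.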

ImageMaskGenerationInput(image, model, image_embedding=None, high_resolution_features=None, reset_predictor=True, points=None, labels=None, boxes=None, num_multimask_outputs=3, multimask_output=True, return_image_embedding=False) dataclass

Input for image mask generation.

Attributes:

Name Type Description
image str

Image as base64 string or URL.

model str

Model name to use.

image_embedding NDArrayData | None

Pre-computed image embedding (optional).

high_resolution_features list[NDArrayData] | None

Pre-computed high-res features (optional).

reset_predictor bool

Whether to reset the predictor state for a new image.

points list[list[list[int]]] | None

Points for mask generation [num_prompts, num_points, 2].

labels list[list[int]] | None

Labels for points [num_prompts, num_points].

boxes list[list[int]] | None

Bounding boxes [num_prompts, 4].

num_multimask_outputs int

Number of masks to generate per prompt.

multimask_output bool

Whether to return multiple masks per prompt.

return_image_embedding bool

Whether to return computed embeddings.
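The prompt fields follow the nested-list shapes given above. A minimal sketch of a consistent payload for two prompts, each with two points; the label semantics (1 = foreground, 0 = background) follow the common SAM convention and are an assumption here, as the document does not spell them out:

```python
# Shapes per the documented conventions:
#   points: [num_prompts, num_points, 2]
#   labels: [num_prompts, num_points]
#   boxes:  [num_prompts, 4]
points = [[[120, 200], [140, 220]], [[400, 310], [395, 305]]]
labels = [[1, 1], [1, 0]]  # assumed SAM convention: 1 foreground, 0 background
boxes = [[100, 180, 160, 240], [380, 290, 420, 330]]

# Consistency checks: one label per point, four coordinates per box.
assert all(len(pts) == len(lbls) for pts, lbls in zip(points, labels))
assert all(len(pt) == 2 for prompt in points for pt in prompt)
assert all(len(box) == 4 for box in boxes)
```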

ImageMaskGenerationOutput(masks, scores, image_embedding=None, high_resolution_features=None) dataclass

Output for image mask generation.

Attributes:

Name Type Description
masks list[list[CompressedRLEData]]

Generated masks [num_prompts, num_masks].

scores NDArrayData

Confidence scores.

image_embedding NDArrayData | None

Computed image embedding (if requested).

high_resolution_features list[NDArrayData] | None

Computed features (if requested).

ImageMaskGenerationResult(data, timestamp, processing_time, metadata, id='', status='SUCCESS') dataclass

Complete result of image mask generation.

Attributes:

Name Type Description
data ImageMaskGenerationOutput

The output data.

timestamp datetime

When the inference completed.

processing_time float

Time taken in seconds.

metadata dict[str, Any]

Additional metadata from the model.

id str

Unique identifier for the inference request.

status str

Status of the inference ("SUCCESS", "FAILURE").

ImageZeroShotDetectionInput(image, model, classes, box_threshold=0.5, text_threshold=0.5) dataclass

Input for zero-shot object detection.

Attributes:

Name Type Description
image str

Image as base64 string or URL.

model str

Model name to use.

classes list[str] | str

Class names to detect, as a list or a single string.

box_threshold float

Confidence threshold for boxes.

text_threshold float

Confidence threshold for text matching.
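A sketch of how box_threshold is typically applied: detections whose score falls below it are dropped. The parallel lists mirror the ImageZeroShotDetectionOutput layout below; the sample values are illustrative, and the filtering itself is an assumption about the backend's behavior, not documented library code.

```python
# Parallel lists as in ImageZeroShotDetectionOutput: boxes[i], scores[i],
# and classes[i] describe the same detection.
boxes = [[10, 20, 110, 220], [50, 60, 90, 120], [5, 5, 300, 300]]
scores = [0.92, 0.41, 0.77]
classes = ["cat", "cat", "dog"]

box_threshold = 0.5
kept = [
    (box, score, cls)
    for box, score, cls in zip(boxes, scores, classes)
    if score >= box_threshold
]
# kept holds the detections scoring 0.92 and 0.77; the 0.41 one is dropped.
```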

ImageZeroShotDetectionOutput(boxes, scores, classes) dataclass

Output for zero-shot object detection.

Attributes:

Name Type Description
boxes list[list[int]]

Detected bounding boxes as [x1, y1, x2, y2].

scores list[float]

Confidence scores for each detection.

classes list[str]

Class names for each detection.

ImageZeroShotDetectionResult(data, timestamp, processing_time, metadata, id='', status='SUCCESS') dataclass

Complete result of zero-shot object detection.

Attributes:

Name Type Description
data ImageZeroShotDetectionOutput

The output data.

timestamp datetime

When the inference completed.

processing_time float

Time taken in seconds.

metadata dict[str, Any]

Additional metadata from the model.

id str

Unique identifier for the inference request.

status str

Status of the inference ("SUCCESS", "FAILURE").

InferenceTask

Bases: str, Enum

Tasks supported by Pixano inference providers.

ModelConfig(name, task, path=None, config=dict(), processor_config=dict()) dataclass

Configuration for instantiating a model.

Attributes:

Name Type Description
name str

Name of the model.

task str

Task of the model.

path Path | str | None

Path to the model dump.

config dict[str, Any]

Configuration of the model.

processor_config dict[str, Any]

Configuration of the processor.

to_dict()

Convert to dictionary.

Source code in pixano/inference/types.py
def to_dict(self) -> dict[str, Any]:
    """Convert to dictionary."""
    return {
        "name": self.name,
        "task": self.task,
        "path": str(self.path) if self.path else None,
        "config": self.config,
        "processor_config": self.processor_config,
    }

ModelInfo(name, task, model_path=None, model_class=None, provider=None) dataclass

Information about an available model.

Attributes:

Name Type Description
name str

Name of the model.

task str

Task the model can perform.

model_path str | None

Path to the model weights (optional).

model_class str | None

Class name of the model (optional).

provider str | None

Provider backend for the model (optional).

NDArrayData(values, shape) dataclass

N-dimensional array data.

Attributes:

Name Type Description
values list[float]

Flat list of values.

shape list[int]

Shape of the array.

from_dict(data) classmethod

Create from dictionary.

Source code in pixano/inference/types.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "NDArrayData":
    """Create from dictionary."""
    return cls(values=data["values"], shape=data["shape"])

to_dict()

Convert to dictionary.

Source code in pixano/inference/types.py
def to_dict(self) -> dict[str, Any]:
    """Convert to dictionary."""
    return {"values": self.values, "shape": self.shape}
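NDArrayData keeps values flat and records the shape separately; restoring the array layout is left to the consumer. With numpy this is a one-line reshape; the pure-Python 2-D sketch below is illustrative, not library code.

```python
# Flat values plus a declared shape, as stored by NDArrayData.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
shape = [2, 3]

# Rebuild the nested 2-D layout by slicing row-major runs.
rows, cols = shape
assert len(values) == rows * cols
nested = [values[r * cols:(r + 1) * cols] for r in range(rows)]
# nested == [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```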

ProviderCapabilities(tasks, supports_batching=False, supports_streaming=False, max_image_size=None) dataclass

Capabilities of an inference provider.

Attributes:

Name Type Description
tasks list[InferenceTask]

List of supported inference tasks.

supports_batching bool

Whether the provider supports batch processing.

supports_streaming bool

Whether the provider supports streaming responses.

max_image_size int | None

Maximum supported image size (optional).

ServerInfo(app_name, app_version, app_description, num_cpus, num_gpus, num_nodes, gpus_used, gpu_to_model, models, models_to_task) dataclass

Information about the inference server.

Attributes:

Name Type Description
app_name str

Application name.

app_version str

Application version string.

app_description str

Application description.

num_cpus int | None

Number of CPUs available (None if unknown).

num_gpus int

Number of GPUs available.

num_nodes int

Number of nodes in the cluster.

gpus_used list[int]

List of GPU indices currently in use.

gpu_to_model dict[str, str]

Mapping of GPU index to model name.

models list[str]

List of loaded model names.

models_to_task dict[str, str]

Mapping of model names to their tasks.

TextImageConditionalGenerationInput(model, prompt, images=None, max_new_tokens=100, temperature=1.0) dataclass

Input for text-image conditional generation.

Attributes:

Name Type Description
model str

Model name to use.

prompt str | list[dict[str, Any]]

Prompt as string or list of message dicts.

images list[str | Path] | None

Optional list of image paths/base64 strings.

max_new_tokens int

Maximum tokens to generate.

temperature float

Sampling temperature.
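The prompt field accepts either a plain string or a list of message dicts. The chat-style message structure below follows the common role/content convention and is an assumption about what the backend expects; the validation helper is hypothetical, shown only to make both forms concrete.

```python
from typing import Any

# Both documented prompt forms.
plain_prompt = "Describe the image."
chat_prompt = [
    {"role": "user", "content": "Describe the image."},  # assumed message shape
]


def is_valid_prompt(prompt: Any) -> bool:
    """Hypothetical check: accept a string or a list of message dicts."""
    if isinstance(prompt, str):
        return True
    return isinstance(prompt, list) and all(isinstance(m, dict) for m in prompt)


assert is_valid_prompt(plain_prompt)
assert is_valid_prompt(chat_prompt)
```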

TextImageConditionalGenerationOutput(generated_text, usage, generation_config=dict()) dataclass

Output for text-image conditional generation.

Attributes:

Name Type Description
generated_text str

The generated text response.

usage UsageInfo

Token usage information.

generation_config dict[str, Any]

Generation configuration used.

TextImageConditionalGenerationResult(data, timestamp, processing_time, metadata, id='', status='SUCCESS') dataclass

Complete result of text-image conditional generation.

Attributes:

Name Type Description
data TextImageConditionalGenerationOutput

The output data.

timestamp datetime

When the inference completed.

processing_time float

Time taken in seconds.

metadata dict[str, Any]

Additional metadata from the model.

id str

Unique identifier for the inference request.

status str

Status of the inference ("SUCCESS", "FAILURE").

UsageInfo(prompt_tokens, completion_tokens, total_tokens) dataclass

Token usage information.

Attributes:

Name Type Description
prompt_tokens int

Number of tokens in the prompt.

completion_tokens int

Number of tokens generated.

total_tokens int

Total tokens used.
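The natural reading is that total_tokens equals prompt_tokens plus completion_tokens; the document does not state this invariant explicitly, so the check below is an assumption, with illustrative counts.

```python
# Illustrative usage payload; the additivity check is an assumed invariant.
usage = {"prompt_tokens": 37, "completion_tokens": 81, "total_tokens": 118}
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```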

VideoMaskGenerationInput(video, model, objects_ids, frame_indexes, points=None, labels=None, boxes=None) dataclass

Input for video mask generation.

Attributes:

Name Type Description
video list[str]

List of frame images as base64 or URLs.

model str

Model name to use.

objects_ids list[int]

IDs for each object to track.

frame_indexes list[int]

Frame indices for prompts.

points list[list[list[int]]] | None

Points for mask generation.

labels list[list[int]] | None

Labels for points.

boxes list[list[int]] | None

Bounding boxes.

VideoMaskGenerationOutput(objects_ids, frame_indexes, masks) dataclass

Output for video mask generation.

Attributes:

Name Type Description
objects_ids list[int]

IDs of tracked objects.

frame_indexes list[int]

Frame indices for each mask.

masks list[CompressedRLEData]

Generated masks for each frame.
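The output uses parallel lists: masks[i] belongs to objects_ids[i] at frame_indexes[i]. Grouping masks per tracked object is a typical consumer step, sketched here with placeholder strings standing in for CompressedRLEData instances; the grouping code is illustrative, not library code.

```python
from collections import defaultdict

# Parallel lists as in VideoMaskGenerationOutput; string placeholders
# stand in for the actual CompressedRLEData masks.
objects_ids = [1, 2, 1, 2]
frame_indexes = [0, 0, 1, 1]
masks = ["rle_a", "rle_b", "rle_c", "rle_d"]

# Collect each object's (frame, mask) pairs.
per_object: dict[int, list] = defaultdict(list)
for obj_id, frame, mask in zip(objects_ids, frame_indexes, masks):
    per_object[obj_id].append((frame, mask))

# per_object[1] == [(0, "rle_a"), (1, "rle_c")]
```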

VideoMaskGenerationResult(data, status, timestamp, processing_time, metadata, id='') dataclass

Complete result of video mask generation.

Attributes:

Name Type Description
data VideoMaskGenerationOutput

The output data.

status str

Status of the inference ("SUCCESS", "FAILURE").

timestamp datetime

When the inference completed.

processing_time float

Time taken in seconds.

metadata dict[str, Any]

Additional metadata from the model.

id str

Unique identifier for the inference request.