Custom Models Guide
This guide shows how to deploy your own models on Pixano-Inference without
modifying the pixano_inference package itself.
Overview
Custom deployment requires two pieces:
- A model class that subclasses one of the base classes from pixano_inference.models
- A Python config file that imports your class and passes it directly to ModelConfig
At startup, the server resolves the class, infers its capability from the base model class, and deploys it as a Ray Serve actor.
Choose a base class
| Base class | Capability | Input type | Output type |
|---|---|---|---|
| SegmentationModel | segmentation | SegmentationInput | SegmentationOutput |
| DetectionModel | detection | DetectionInput | DetectionOutput |
| TrackingModel | tracking | TrackingInput | TrackingOutput |
| VLMModel | vlm | VLMInput | VLMOutput |
These are the current HTTP-exposed model families in pixano_inference.models.
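The way a capability follows from the base class can be pictured with a small sketch. This is not the server's actual resolution code: the four classes below are empty stand-ins for the ones in pixano_inference.models, and infer_capability is a hypothetical helper that walks the method resolution order and returns the capability of the first known base class.

```python
from __future__ import annotations


# Empty stand-ins for the base classes in pixano_inference.models (illustration only).
class SegmentationModel: ...
class DetectionModel: ...
class TrackingModel: ...
class VLMModel: ...


# Capability names as listed in this guide.
CAPABILITIES: dict[type, str] = {
    SegmentationModel: "segmentation",
    DetectionModel: "detection",
    TrackingModel: "tracking",
    VLMModel: "vlm",
}


def infer_capability(model_class: type) -> str:
    """Hypothetical helper: capability of the first known base class in the MRO."""
    for base in model_class.__mro__:
        if base in CAPABILITIES:
            return CAPABILITIES[base]
    raise TypeError(f"{model_class.__name__} does not subclass a known base model class")


class MySegmenter(SegmentationModel):
    pass


print(infer_capability(MySegmenter))  # segmentation
```

A class that subclasses none of the four bases has no capability under this scheme, which is why choosing the base class comes first.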
Project layout
The examples in this guide assume a project root that contains a my_models/ package (holding segmenter.py) and a models.py config file next to it.
Example custom model
Create my_models/segmenter.py:

    from __future__ import annotations

    import numpy as np

    from pixano_inference.models import SegmentationInput, SegmentationModel, SegmentationOutput, register_model
    from pixano_inference.ray.config import ModelDeploymentConfig
    from pixano_inference.schemas import CompressedRLE, NDArrayFloat


    @register_model("MySegmenter")
    class MySegmenter(SegmentationModel):
        def __init__(self, config: ModelDeploymentConfig) -> None:
            super().__init__(config)
            # Optional parameter: defaults to 0.5 when "threshold" is not set.
            self._threshold = float(config.model_params.get("threshold", 0.5))

        def load_model(self) -> None:
            # Required parameter: a missing "path" raises KeyError.
            self._path = self.config.model_params["path"]

        def predict(self, input: SegmentationInput) -> SegmentationOutput:
            # Placeholder prediction: a fixed 16x16 square inside a 32x32 mask.
            mask = np.zeros((32, 32), dtype=np.uint8)
            mask[8:24, 8:24] = 1
            scores = NDArrayFloat.from_numpy(np.array([[self._threshold]], dtype=np.float32))
            return SegmentationOutput(
                masks=[[CompressedRLE.from_mask(mask)]],
                scores=scores,
            )
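The parameter handling in this class can be exercised without a running server. The sketch below does not use the library: FakeConfig and ParamsOnlyModel are hypothetical stand-ins that mirror only the model_params access pattern of MySegmenter (an optional "threshold" with a default, a required "path").

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeConfig:
    """Hypothetical stand-in for ModelDeploymentConfig; only model_params is mirrored."""

    model_params: dict[str, Any] = field(default_factory=dict)


class ParamsOnlyModel:
    """Hypothetical stand-in reproducing MySegmenter's parameter handling."""

    def __init__(self, config: FakeConfig) -> None:
        self.config = config
        # Optional parameter: falls back to 0.5 when absent.
        self._threshold = float(config.model_params.get("threshold", 0.5))

    def load_model(self) -> None:
        # Required parameter: a missing "path" raises KeyError here.
        self._path = self.config.model_params["path"]


model = ParamsOnlyModel(FakeConfig({"path": "my-org/my-segmentation-model"}))
model.load_model()
print(model._threshold)  # 0.5

try:
    ParamsOnlyModel(FakeConfig({})).load_model()
except KeyError as exc:
    print(f"missing parameter: {exc}")  # missing parameter: 'path'
```

Reading required keys in load_model() rather than in predict() means a misconfigured deployment fails when the model loads, not on the first request.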
Python config
Create models.py next to your package:

    from my_models.segmenter import MySegmenter
    from pixano_inference.configs import DeploymentConfig, ModelConfig

    models = [
        ModelConfig(
            name="my-segmenter",
            # Pass the class object itself, not a string name.
            model_class=MySegmenter,
            model_params={"path": "my-org/my-segmentation-model", "threshold": 0.6},
            deployment=DeploymentConfig(num_gpus=1, min_replicas=0, max_replicas=2, max_batch_size=4),
        )
    ]
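Before starting the server, it can help to sanity-check the deployment values in a config like this one. The sketch below is not part of Pixano-Inference: DeploymentSketch is a hypothetical dataclass that copies only the field names shown above, and the rules it checks (unique names, min_replicas not above max_replicas, a positive batch size, non-negative GPU count) are assumptions about what a sensible config looks like, not documented constraints.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeploymentSketch:
    """Hypothetical mirror of the DeploymentConfig fields used in this guide."""

    num_gpus: float = 0
    min_replicas: int = 0
    max_replicas: int = 1
    max_batch_size: int = 1


def check_deployments(entries: list[tuple[str, DeploymentSketch]]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    seen: set[str] = set()
    for name, dep in entries:
        if name in seen:
            problems.append(f"duplicate model name: {name}")
        seen.add(name)
        if dep.min_replicas > dep.max_replicas:
            problems.append(f"{name}: min_replicas exceeds max_replicas")
        if dep.max_batch_size < 1:
            problems.append(f"{name}: max_batch_size must be at least 1")
        if dep.num_gpus < 0:
            problems.append(f"{name}: num_gpus must not be negative")
    return problems


entries = [("my-segmenter", DeploymentSketch(num_gpus=1, min_replicas=0, max_replicas=2, max_batch_size=4))]
print(check_deployments(entries))  # []
```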
Start the server
If the module is not installed as a package, add the project root to the module search path (for example through the PYTHONPATH environment variable) before starting the server.
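Whether the project root resolves your modules can be checked from a plain Python session. The helper below is a hypothetical sketch, not a Pixano-Inference function, and it only affects the current process: it confirms that an import works from a given root, it does not start or configure the server.

```python
import importlib
import sys
from pathlib import Path


def import_from_root(project_root: str, module_name: str):
    """Import module_name after putting project_root at the front of sys.path."""
    root = str(Path(project_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    # Make sure freshly created files are visible to the import system.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Calling import_from_root(".", "my_models.segmenter") from the project root should succeed once pixano_inference and your model's dependencies are installed; a ModuleNotFoundError for my_models means the root is not on the search path.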
Verify the deployment

    curl -X POST http://localhost:7463/inference/segmentation/ \
      -H "Content-Type: application/json" \
      -d '{
        "model": "my-segmenter",
        "image": "data:image/png;base64,..."
      }'
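The same request can be sent from Python with the standard library. This sketch reuses the URL and the two JSON fields from the curl command; it assumes the image is a PNG and that the server answers with a JSON body, neither of which is stated in this guide. build_payload and post_segmentation are hypothetical helper names.

```python
import base64
import json
import urllib.request

# Endpoint taken from the curl example in this guide.
URL = "http://localhost:7463/inference/segmentation/"


def build_payload(image_bytes: bytes, model: str = "my-segmenter") -> dict:
    """Build the JSON body, encoding the image as a PNG data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"model": model, "image": f"data:image/png;base64,{encoded}"}


def post_segmentation(image_bytes: bytes, url: str = URL) -> dict:
    """POST the payload and return the decoded response (assumed to be JSON)."""
    body = json.dumps(build_payload(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, post_segmentation(open("image.png", "rb").read()) sends one image; image.png is a placeholder file name.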
Design rules
- Keep heavy imports inside load_model() or predict() when practical.
- Read deployment-specific options from self.config.model_params.
- Return the typed output models from pixano_inference.models.
- Use helper types from pixano_inference.schemas when the payload contains masks or array-like values.
- Pick the correct base model class first; it determines the endpoint family and request contract.
- Respect self.config.resources.num_gpus when selecting CPU vs GPU execution.
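Two of these rules, deferring heavy imports and respecting the GPU count, can be combined in one small helper. The sketch assumes a PyTorch-backed model, which this guide does not require; select_device is a hypothetical name, and the function falls back to CPU when no GPU was requested, when PyTorch is not installed, or when CUDA is unavailable.

```python
def select_device(num_gpus: float) -> str:
    """Return "cuda" only when GPUs were requested and are usable, else "cpu"."""
    if num_gpus <= 0:
        return "cpu"
    try:
        # Heavy import deferred until a GPU is actually requested.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(select_device(0))  # cpu
```

Inside load_model(), the call would be select_device(self.config.resources.num_gpus), so the module can be imported without loading the framework.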
Troubleshooting
- Model class 'MySegmenter' not found: Pass the class directly to model_class instead of a string name.
- Import errors inside the deployment: Install the third-party dependencies required by your model in the same environment that starts Pixano-Inference.
- CPU-only execution: Set num_gpus=0 in DeploymentConfig and make load_model() fall back to CPU.