Quickstart: Deploy Built-in Models

This guide walks you through deploying built-in models and running your first inference requests.

Prerequisites

  • Python 3.10+
  • A GPU (recommended for production use, not required for testing)

Installation

Install Pixano-Inference with the model-specific extras you need. For SAM2 segmentation and tracking:

uv sync --extra sam2

or with pip:

pip install pixano-inference[sam2]

For Transformers-based models (Grounding DINO, VLMs):

uv sync --extra transformers

or with pip:

pip install pixano-inference[transformers]

For everything, including vLLM:

uv sync --extra sam2 --extra transformers --extra vllm

or with pip:

pip install pixano-inference[sam2,transformers,vllm]

Write a Python config

Create a file called models.py that declares which models to deploy. Here is a minimal example deploying SAM2 for image segmentation:

from pixano_inference.configs import DeploymentConfig, ModelConfig, Sam2ImageParams


models = [
    ModelConfig(
        name="sam2-image",
        model_class="Sam2ImageModel",
        model_params=Sam2ImageParams(
            path="facebook/sam2-hiera-base-plus",
            torch_dtype="bfloat16",
        ),
        deployment=DeploymentConfig(
            num_gpus=1,
            min_replicas=0,
            max_replicas=2,
            max_batch_size=8,
        ),
    )
]

CPU-only testing

Set num_gpus=0 in DeploymentConfig and torch_dtype="float32" in the model parameters to run on CPU. This is useful for testing but not recommended for production.
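As a minimal sketch, here is a CPU-only variant of the SAM2 config above (the model name sam2-image-cpu is just an illustrative choice):

```python
from pixano_inference.configs import DeploymentConfig, ModelConfig, Sam2ImageParams

# CPU-only variant of the SAM2 config above: no GPUs, full-precision weights.
models = [
    ModelConfig(
        name="sam2-image-cpu",
        model_class="Sam2ImageModel",
        model_params=Sam2ImageParams(
            path="facebook/sam2-hiera-base-plus",
            torch_dtype="float32",  # bfloat16 is poorly supported on many CPUs
        ),
        deployment=DeploymentConfig(
            num_gpus=0,  # schedule replicas on CPU only
            max_replicas=1,
        ),
    )
]
```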

Start the server

From the CLI

pixano-inference --config models.py

Programmatically

from pixano_inference.ray import InferenceServer

server = InferenceServer()
server.register_from_config("models.py")
server.start(blocking=True)

Verify the deployment

Check the health endpoint:

curl http://localhost:7463/health

List deployed models:

curl http://localhost:7463/app/models/

You should see your model in the response:

[
  {
    "name": "sam2-image",
    "capability": "segmentation",
    "model_class": "Sam2ImageModel"
  }
]
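The listing can also be checked programmatically. A small sketch using only the standard library, with the response shape taken from the example above (find_models is a hypothetical helper, not part of the client API):

```python
import json


def find_models(listing_json: str, capability: str) -> list[str]:
    """Return the names of deployed models matching a capability."""
    return [
        m["name"]
        for m in json.loads(listing_json)
        if m["capability"] == capability
    ]


# Response shape from the /app/models/ listing above.
listing = (
    '[{"name": "sam2-image", "capability": "segmentation",'
    ' "model_class": "Sam2ImageModel"}]'
)
print(find_models(listing, "segmentation"))  # ['sam2-image']
```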

Send inference requests

With curl

# Encode an image to base64
IMAGE_B64=$(base64 -i your_image.png)

curl -X POST http://localhost:7463/inference/segmentation/ \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sam2-image",
    "image": "data:image/png;base64,'"$IMAGE_B64"'",
    "points": [[[200, 175]]],
    "labels": [[1]]
  }'
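The data-URL prefix used in the request body above can also be built in Python with the standard library (a sketch; to_data_url is a hypothetical helper and the path is a placeholder):

```python
import base64
from pathlib import Path


def to_data_url(path: str, mime: str = "image/png") -> str:
    """Encode an image file as a base64 data URL for the request body."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"


# url = to_data_url("your_image.png")
# Then send it as the "image" field of the request payload.
```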

With the Python client

import asyncio
from pixano_inference.client import PixanoInferenceClient
from pixano_inference.schemas import SegmentationRequest


async def main():
    client = PixanoInferenceClient.connect("http://localhost:7463")

    request = SegmentationRequest(
        model="sam2-image",
        image="data:image/png;base64,...",  # Base64-encoded image
        points=[[[200, 175]]],              # Point prompt (x, y)
        labels=[[1]],                       # 1 = foreground
    )
    response = await client.segmentation(request)

    print(f"Status: {response.status}")
    print(f"Processing time: {response.processing_time:.3f}s")
    print(f"Masks: {len(response.data.masks[0])}")
    print(f"Scores: {response.data.scores.to_numpy()}")

    # Decode a mask to a numpy array
    mask = response.data.masks[0][0].to_mask()
    print(f"Mask shape: {mask.shape}")


asyncio.run(main())

Built-in models

The following model classes are available out of the box:

| Model class | Capability | Extra required | Example model_params |
| --- | --- | --- | --- |
| Sam2ImageModel | segmentation | sam2 | path: facebook/sam2-hiera-base-plus |
| Sam2VideoModel | tracking | sam2 | path: facebook/sam2-hiera-large |
| GroundingDINOModel | detection | transformers | path: IDEA-Research/grounding-dino-base |
| TransformersVLMModel | vlm | transformers | path: llava-hf/llava-1.5-7b-hf |
| VLLMVLMModel | vlm | vllm | path: Qwen/Qwen2-VL-7B-Instruct |

Multi-model config

You can deploy multiple models in a single config file. Each model gets its own Ray actor with dedicated resources:

from pixano_inference.configs import (
    DeploymentConfig,
    GroundingDINOParams,
    ModelConfig,
    Sam2ImageParams,
    Sam2VideoParams,
)


models = [
    ModelConfig(
        name="sam2-image",
        model_class="Sam2ImageModel",
        model_params=Sam2ImageParams(path="facebook/sam2-hiera-base-plus"),
        deployment=DeploymentConfig(num_gpus=1, max_batch_size=8),
    ),
    ModelConfig(
        name="sam2-video",
        model_class="Sam2VideoModel",
        model_params=Sam2VideoParams(path="facebook/sam2-hiera-large"),
        deployment=DeploymentConfig(num_gpus=1, max_batch_size=1),
    ),
    ModelConfig(
        name="grounding-dino",
        model_class="GroundingDINOModel",
        model_params=GroundingDINOParams(path="IDEA-Research/grounding-dino-base"),
        deployment=DeploymentConfig(num_gpus=1, max_batch_size=8),
    ),
]

Tip

See custom_models.md for external model modules via model_module, and deploy/sam2_example.py for a typed config example.

Deployment configuration reference

Each ModelConfig(...) entry supports the following fields:

Model fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Unique model name |
| model_class | str \| type | required | Registered model class name or class object |
| model_module | str | None | Python module to import before resolving model_class (e.g. my_package.models). Used for external custom models. |
| model_params | dict \| BaseModelParams | {} | Parameters passed to the model (e.g. path, torch_dtype) |

Capability is derived automatically from model_class. For example: SegmentationModel subclasses deploy behind /inference/segmentation/, while DetectionModel subclasses deploy behind /inference/detection/.

Deployment fields (the deployment= argument)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| num_gpus | float | 0.0 | GPUs per replica |
| num_cpus | float | 1.0 | CPUs per replica |
| memory_mb | int | None | Memory limit in MB (None = no limit) |
| min_replicas | int | 0 | Minimum replicas (0 = scale to zero) |
| max_replicas | int | 4 | Maximum replicas |
| target_num_ongoing_requests_per_replica | int | 2 | Autoscaling target |
| downscale_delay_s | float | 60.0 | Seconds to wait before scaling down |
| upscale_delay_s | float | 5.0 | Seconds to wait before scaling up |
| max_batch_size | int | 8 | Maximum batch size for inference |
| batch_wait_timeout_s | float | 0.1 | Timeout for filling a batch (seconds) |
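To illustrate how the autoscaling fields interact, here is a rough sketch of target-based autoscaling (the general idea behind these fields, not the exact scheduler logic): aim for enough replicas that each handles about target_num_ongoing_requests_per_replica in-flight requests, clamped to [min_replicas, max_replicas].

```python
import math


def desired_replicas(
    ongoing_requests: int,
    target_per_replica: int = 2,
    min_replicas: int = 0,
    max_replicas: int = 4,
) -> int:
    """Rough target-based autoscaling: enough replicas so that each
    handles about `target_per_replica` in-flight requests."""
    wanted = math.ceil(ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(0))    # 0 -> scale to zero
print(desired_replicas(5))    # 3
print(desired_replicas(100))  # 4 -> capped at max_replicas
```

The downscale_delay_s and upscale_delay_s fields then control how long the measured load must persist before the replica count actually changes.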

Available capabilities

| Capability | Description |
| --- | --- |
| segmentation | Generate masks from images (SAM2) |
| tracking | Track prompted objects across video frames (SAM2) |
| detection | Detect objects and optionally return masks when the model supports it |
| vlm | Generate text from images and prompts (LLaVA, Qwen2-VL, etc.) |

Next steps

To deploy your own custom models (PyTorch, JAX, TensorFlow, or any framework), see the Custom Models Guide.