# Pixano Inference HTTP API

This document describes the HTTP API exposed by the Ray Serve-based Pixano Inference server.

Base URL: `http://<host>:<port>`, with the default being `http://127.0.0.1:7463`.

Start the server with a Python config file:
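For example, a launch might look like the following. The entry-point name here is an illustrative assumption and may differ in your install; only the `--config` flag is described by this document.

```shell
# Hypothetical launch command; check your install for the actual entry point.
# --config points at a Python file describing the models to deploy at startup.
python -m pixano_inference --config serve_config.py
```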
## Overview

- Models are loaded at startup from a Python `.py` config file passed to `--config`.
- All inference routes are synchronous `POST` endpoints.
- There are no runtime HTTP endpoints for deploying or undeploying models.
- Every inference request includes a `model` field that must match a deployed model name.
- Endpoint families are capability-based: segmentation, detection, tracking, and VLM.
## Service endpoints

| Method | Path | Purpose |
|---|---|---|
| GET | `/` | Basic API metadata and docs link |
| GET | `/health` | Liveness probe |
| GET | `/ready` | Readiness summary |
| GET | `/app/settings/` | Server settings and resource summary |
| GET | `/app/models/` | List deployed models |
### GET /app/settings/

Example response:

```json
{
  "app_name": "Pixano Inference",
  "app_version": "0.6.0",
  "app_description": "Pixano Inference API powered by Ray Serve",
  "num_cpus": 8,
  "num_gpus": 2,
  "num_nodes": 1,
  "gpus_used": 1.0,
  "gpu_to_model": {},
  "models": ["sam2-image"],
  "models_to_capability": {
    "sam2-image": "segmentation"
  }
}
```
### GET /app/models/

Returns a list of `ModelInfo` objects:

```json
[
  {
    "name": "sam2-image",
    "capability": "segmentation",
    "model_path": "facebook/sam2-hiera-base-plus",
    "model_class": "Sam2ImageModel"
  }
]
```
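Clients can use this listing to pick a deployed model for a given capability before issuing a request. A minimal sketch over the parsed JSON shape shown above (the helper is ours, not part of the client library):

```python
# Parsed JSON from GET /app/models/, shaped as in the example above.
models = [
    {
        "name": "sam2-image",
        "capability": "segmentation",
        "model_path": "facebook/sam2-hiera-base-plus",
        "model_class": "Sam2ImageModel",
    },
]


def models_for(capability: str, models: list[dict]) -> list[str]:
    """Return the names of deployed models that advertise a capability."""
    return [m["name"] for m in models if m["capability"] == capability]


print(models_for("segmentation", models))  # ['sam2-image']
```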
## Inference endpoints

| Method | Path | Request schema | Response schema | Python client helper |
|---|---|---|---|---|
| POST | `/inference/segmentation/` | `SegmentationRequest` | `SegmentationResponse` | `client.segmentation()` |
| POST | `/inference/detection/` | `DetectionRequest` | `DetectionResponse` | `client.detection()` |
| POST | `/inference/tracking/` | `TrackingRequest` | `TrackingResponse` | `client.tracking()` |
| POST | `/inference/vlm/` | `VLMRequest` | `VLMResponse` | `client.vlm()` |
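The four endpoint families map one-to-one onto the capabilities reported by `/app/models/`, so a client built on raw HTTP rather than the Python helpers could route requests with a simple lookup. A sketch (the paths come from the table above; the helper name is illustrative):

```python
# Capability name -> inference route, per the endpoint table above.
INFERENCE_PATHS = {
    "segmentation": "/inference/segmentation/",
    "detection": "/inference/detection/",
    "tracking": "/inference/tracking/",
    "vlm": "/inference/vlm/",
}


def inference_url(base_url: str, capability: str) -> str:
    """Build the full URL for a capability, or raise for an unknown one."""
    try:
        return base_url.rstrip("/") + INFERENCE_PATHS[capability]
    except KeyError:
        raise ValueError(f"unknown capability: {capability}") from None


print(inference_url("http://127.0.0.1:7463", "detection"))
# http://127.0.0.1:7463/inference/detection/
```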
If a model exists but does not support the endpoint capability, the server returns `400`.

The request and response models are available from `pixano_inference.schemas`.
### Example: segmentation

```json
{
  "model": "sam2-image",
  "image": "data:image/png;base64,...",
  "points": [[[200, 175]]],
  "labels": [[1]]
}
```
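The `image` field accepts either an HTTP(S) URL or a base64 data URI, as in the example above. Building the data URI from raw image bytes takes only the standard library; the helper name below is ours, not part of the API:

```python
import base64


def to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI for the `image` request field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# PNG magic bytes stand in for a real image here.
print(to_data_uri(b"\x89PNG\r\n\x1a\n")[:22])  # data:image/png;base64,
```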
### Example: detection

```json
{
  "model": "grounding-dino",
  "image": "http://images.cocodataset.org/val2017/000000039769.jpg",
  "classes": ["cat", "remote control"],
  "box_threshold": 0.3,
  "text_threshold": 0.2
}
```
## Response envelope

All inference endpoints return the same top-level envelope:

```json
{
  "id": "ray-sam2-image-1739000000000",
  "status": "SUCCESS",
  "timestamp": "2026-01-01T12:00:00",
  "processing_time": 0.234,
  "metadata": {
    "model_name": "sam2-image",
    "capability": "segmentation",
    "model_class": "Sam2ImageModel"
  },
  "data": {}
}
```
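Because the envelope is identical across capabilities, raw-HTTP callers can handle responses uniformly: check `status`, then read the capability-specific `data`. A sketch over a plain parsed-JSON dict (the helper is illustrative, not part of the client library):

```python
def unwrap(envelope: dict) -> dict:
    """Return the capability-specific payload, or raise if inference failed."""
    if envelope["status"] != "SUCCESS":
        raise RuntimeError(f"inference {envelope['id']} failed: {envelope['status']}")
    return envelope["data"]


envelope = {
    "id": "ray-sam2-image-1739000000000",
    "status": "SUCCESS",
    "timestamp": "2026-01-01T12:00:00",
    "processing_time": 0.234,
    "metadata": {"model_name": "sam2-image", "capability": "segmentation"},
    "data": {"masks": []},
}
print(unwrap(envelope))  # {'masks': []}
```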
| Field | Description |
|---|---|
| `id` | Server-generated request identifier |
| `status` | Inference status, typically `SUCCESS` |
| `timestamp` | Response timestamp |
| `processing_time` | End-to-end inference time in seconds |
| `metadata` | Deployment metadata for the model that handled the request |
| `data` | Capability-specific payload |
## Python client

```python
import asyncio

from pixano_inference.client import PixanoInferenceClient
from pixano_inference.schemas import SegmentationRequest


async def main() -> None:
    client = PixanoInferenceClient.connect("http://localhost:7463")
    request = SegmentationRequest(
        model="sam2-image",
        image="data:image/png;base64,...",
        points=[[[200, 175]]],
        labels=[[1]],
    )
    response = await client.segmentation(request)
    print(response.processing_time)
    print(response.data.scores.to_numpy())


asyncio.run(main())
```
## Error responses

- `400` when the model exists but does not support the requested capability.
- `404` when the requested model name is not deployed.
- `422` when the request body fails schema validation.
- `500` when inference fails inside the model deployment.

The Python client raises `fastapi.HTTPException` with the server error detail when a request is unsuccessful.
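When handling these errors programmatically, the list above can be mirrored in a small lookup table. The helper below is an illustrative sketch for raw-HTTP callers, not part of the client library:

```python
# HTTP status code -> meaning, per the error list above.
ERROR_MEANINGS = {
    400: "model does not support the requested capability",
    404: "requested model name is not deployed",
    422: "request body failed schema validation",
    500: "inference failed inside the model deployment",
}


def describe_error(status_code: int) -> str:
    """Map a Pixano Inference error status to a human-readable meaning."""
    return ERROR_MEANINGS.get(status_code, f"unexpected status {status_code}")


print(describe_error(404))  # requested model name is not deployed
```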