Getting started with Pixano-Inference
Installation
Install the extras for the model backends you need:
Usage
Pixano-Inference serves models with GPU-aware deployment, autoscaling, and request batching. Models are declared in a Python config file, the server deploys them, and you interact via the Python client or HTTP API.
The workflow is:
- Write a Python config
- Start the server
- Connect the client
- Run inference
See the examples section for detailed use cases.
Step 1: Write a Python config
Create a file called models.py that declares which models to deploy.
Here is an example deploying Grounding DINO for zero-shot detection:
from pixano_inference.configs import DeploymentConfig, GroundingDINOParams, ModelConfig
models = [
ModelConfig(
name="grounding-dino",
model_class="GroundingDINOModel",
model_params=GroundingDINOParams(path="IDEA-Research/grounding-dino-tiny"),
deployment=DeploymentConfig(num_gpus=1, min_replicas=0, max_replicas=2),
)
]
See the Server Deployment documentation for all configuration options, deploying built-in and custom models, and configuring autoscaling.
Step 2: Start the server
The default address is http://127.0.0.1:7463. You can override it with
--host and --port.
Step 3: Connect the client
The easiest way to interact with Pixano-Inference is through the Python client.
You can also use the Swagger UI at http://localhost:7463/docs.
from pixano_inference.client import PixanoInferenceClient
client = PixanoInferenceClient.connect(url="http://localhost:7463")
Step 4: Run an inference
Build a request and call the appropriate client method:
import asyncio
from pixano_inference.client import PixanoInferenceClient
from pixano_inference.schemas import DetectionRequest
async def main():
client = PixanoInferenceClient.connect(url="http://localhost:7463")
request = DetectionRequest(
model="grounding-dino",
image="http://images.cocodataset.org/val2017/000000039769.jpg",
classes=["cat", "remote control"],
box_threshold=0.3,
text_threshold=0.2,
)
response = await client.detection(request)
print(response.data.boxes)
print(response.data.classes)
asyncio.run(main())