# Pixano-Inference

## Overview
Pixano-Inference serves multimodal inference models behind a REST API powered by Ray Serve. Models are declared in Python config files, deployed as Ray actors, and invoked through either the HTTP API or the Python client.
Key capabilities:
- Python-config deployment -- Declare models in `models.py` with typed validation
- GPU-aware actors -- Assign CPU/GPU resources per model deployment
- Autoscaling and batching -- Tune replicas and batch size per model
- Multi-model serving -- Run several deployments in one server
- Custom models -- Register your own `InferenceModel` subclasses
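For the custom-model path, the general shape is subclassing `InferenceModel` and implementing its prediction hook. The base class's real interface is not documented here, so this sketch uses a minimal stand-in with an assumed `predict` method rather than the library's actual signature:

```python
# Hypothetical sketch: the real base class lives in pixano_inference;
# a minimal stand-in is defined here so the example is self-contained.
class InferenceModel:
    """Stand-in for the pixano_inference base class (assumed interface)."""

    def predict(self, inputs):
        raise NotImplementedError


class UppercaseEcho(InferenceModel):
    """Toy custom model: echoes text inputs back in upper case."""

    def predict(self, inputs):
        return [text.upper() for text in inputs]


model = UppercaseEcho()
print(model.predict(["hello", "pixano"]))  # ['HELLO', 'PIXANO']
```

A real subclass would load weights in its constructor and run the model in `predict`; the registration step itself is project-specific.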
## How it works

1. Write a Python config with `pixano_inference.configs.ModelConfig`
2. Start the server with `pixano-inference --config models.py`
3. Send requests via the Python client or HTTP API
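As a sketch of the first step, a `models.py` could look like the following. Apart from `ModelConfig` and its module path, which are named above, every field (`name`, `checkpoint`, `num_gpus`, `num_replicas`, `max_batch_size`) is an illustrative assumption, not the library's confirmed API:

```python
# models.py -- hypothetical sketch; ModelConfig's real fields are
# project-specific, and the field names below are assumptions.
from pixano_inference.configs import ModelConfig

MODELS = [
    ModelConfig(
        name="sam-vit-b",                    # assumed: deployment name used in requests
        checkpoint="facebook/sam-vit-base",  # assumed: weights to load
        num_gpus=1,                          # assumed: GPU share per Ray actor
        num_replicas=2,                      # assumed: replica count for autoscaling
        max_batch_size=8,                    # assumed: requests batched per call
    ),
]
```

The server would then be started against this file with `pixano-inference --config models.py`, as in the second step.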
## Next steps
- Getting Started -- Install, configure, and run your first inference
- Server Deployment -- Advanced configuration, autoscaling, and custom models
- HTTP API Reference -- Full endpoint and schema documentation
## Contributing

We welcome contributions from the community! Please open an issue for any bug or feature request, and pull requests are always appreciated.
## License
Pixano-Inference is released under the terms of the CeCILL-C license.