pixano_inference.ray.server
Inference server entry point for Ray Serve.
InferenceServer(config=None)
Server for managing Ray Serve deployments.
The FastAPI application runs via uvicorn in-process while model deployments run as separate Ray Serve actors. This avoids serialization issues with Pydantic models while keeping Ray's GPU management.
Example:

```python
from pixano_inference.ray import InferenceServer, RayServeConfig

config = RayServeConfig(host="0.0.0.0", port=7463, num_gpus=2)
server = InferenceServer(config)
server.start(blocking=True)
```
Using a Python config file:
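The config-file example was not captured here; a hypothetical sketch of what such a file might contain (the `MODELS` variable name and the `ModelConfig` fields shown are illustrative assumptions, not the library's confirmed schema):

```python
# model_config.py -- hypothetical Python config file.
# The MODELS name and the field names below are assumptions
# for illustration, not a documented contract.
from pixano_inference.ray import ModelConfig  # assumed import path

MODELS = [
    ModelConfig(
        name="sam2",
        checkpoint_path="/models/sam2.pt",
        num_gpus=1,
    ),
]
```

Such a file could then be passed to `register_from_config` (see below).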
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `RayServeConfig \| None` | Ray Serve configuration. If `None`, uses defaults. | `None` |
Source code in pixano_inference/ray/server.py
config *(property)*
Server configuration.

is_running *(property)*
Whether the server is running.
__enter__()
__exit__(exc_type, exc_val, exc_tb)
Context-manager protocol, allowing the server to be used in a `with` statement.
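The context-manager methods above are undocumented here; presumably they guarantee `stop()` runs when the `with` block exits, even on error. A minimal stdlib sketch of that pattern (the `ManagedServer` class is a toy stand-in, not the real implementation):

```python
class ManagedServer:
    """Toy stand-in for the context-manager pattern InferenceServer
    presumably follows: stop() is guaranteed on exit (an assumption,
    since the methods are undocumented here)."""

    def __init__(self):
        self.is_running = False

    def start(self):
        self.is_running = True

    def stop(self):
        self.is_running = False

    def __enter__(self):
        # Return the instance so `with ManagedServer() as server:` binds it.
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.stop()  # runs even if the with-body raised
        return False  # do not suppress exceptions
```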
register_from_config(config_path)
Load config file and add models to startup list.
Supports Python (.py) config files.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config_path` | `str \| Path` | Path to the configuration file (`.py`). | *required* |
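One plausible mechanism for loading a `.py` config file is the stdlib `importlib` machinery; a sketch under that assumption (the `MODELS` attribute name is a hypothetical convention, and `load_python_config` is an illustrative helper, not the library's actual loader):

```python
import importlib.util
from pathlib import Path


def load_python_config(config_path):
    """Load a .py file as a module and return its MODELS list.

    The MODELS attribute name is a hypothetical convention for
    illustration; the real loader may differ.
    """
    path = Path(config_path)
    if path.suffix != ".py":
        raise ValueError(f"Only .py config files are supported, got {path.suffix!r}")
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the config file
    return getattr(module, "MODELS", [])
```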
register_models(models)
Register typed model configs for deployment at startup.
Each ModelConfig is validated and converted to the internal
ModelDeploymentConfig format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `models` | `list[ModelConfig]` | List of typed model configurations. | *required* |
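The validate-then-convert step described above can be sketched with stdlib dataclasses (the field names and the internal dict shape are illustrative assumptions, not Pixano's actual `ModelConfig` / `ModelDeploymentConfig` definitions):

```python
from dataclasses import dataclass


@dataclass
class ToyModelConfig:
    """Illustrative stand-in for the typed ModelConfig."""

    name: str
    num_gpus: int = 0

    def __post_init__(self):
        # Validation step: reject obviously invalid values.
        if not self.name:
            raise ValueError("model name must be non-empty")
        if self.num_gpus < 0:
            raise ValueError("num_gpus must be >= 0")


def to_deployment_config(model: ToyModelConfig) -> dict:
    """Convert a validated config to a hypothetical internal format."""
    return {
        "deployment_name": model.name,
        "ray_actor_options": {"num_gpus": model.num_gpus},
    }
```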
start(host=None, port=None, blocking=True)
Start the inference server.
This method:

1. Initializes Ray for model deployment actors
2. Creates the FastAPI application with DeploymentManager
3. Deploys startup models from config via Ray Serve
4. Runs the FastAPI app via uvicorn
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `host` | `str \| None` | Host to bind to. Uses config value if not specified. | `None` |
| `port` | `int \| None` | Port to serve on. Uses config value if not specified. | `None` |
| `blocking` | `bool` | Whether to block until the server is stopped. | `True` |
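The `blocking` behavior can be illustrated with a stdlib sketch, where a toy event-wait loop stands in for uvicorn's serve loop (the `ToyServer` class is illustrative only, not the real server code):

```python
import threading
import time


class ToyServer:
    """Toy server illustrating blocking vs. non-blocking start()."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._thread = None
        self.is_running = False

    def _serve(self):
        # Stand-in for uvicorn's serve loop.
        self.is_running = True
        self._stop_event.wait()
        self.is_running = False

    def start(self, blocking=True):
        if blocking:
            self._serve()  # returns only once stop() is called
        else:
            self._thread = threading.Thread(target=self._serve, daemon=True)
            self._thread.start()
            while not self.is_running:  # wait for the loop to come up
                time.sleep(0.001)

    def stop(self):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
```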
stop()
Stop the inference server.