pixano_inference.ray.app
DeploymentManager and FastAPI app factory for Ray Serve.
DeploymentManager(config)
In-process manager for deployed models, handles, and GPU allocation.
Manages the lifecycle of Ray Serve model deployments, including:

- Resolving model classes from the registry
- Creating and running deployments
- Storing deployment handles for inference
- Tracking GPU allocations
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | RayServeConfig | Ray Serve configuration. | required |
Source code in pixano_inference/ray/app.py
config
property
Server configuration.
cancel_tracking_job(job_id)
Cancel an asynchronous tracking job on a best-effort basis.
Source code in pixano_inference/ray/app.py
deploy_model(config)
Deploy a model.
Resolves the model class from the registry, creates a Ray actor, and stores the handle.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | ModelDeploymentConfig | Model deployment configuration. | required |
Raises:

| Type | Description |
|---|---|
| ValueError | If the model is already deployed. |
| KeyError | If the model class is not found in the registry. |
Source code in pixano_inference/ray/app.py
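The deploy/undeploy contract can be sketched with a plain-Python analogue. This is a sketch under assumptions: `MODEL_REGISTRY`, `ManagerSketch`, and `EchoModel` are illustrative names, and the real manager creates a Ray actor where this sketch instantiates the class directly.

```python
from typing import Any

# Illustrative analogue of deploy_model/undeploy_model; in the real manager,
# deploying creates a Ray actor and undeploying kills it and frees its GPU.
MODEL_REGISTRY: dict[str, type] = {}


class EchoModel:
    """Stand-in model class registered under the name 'echo'."""


MODEL_REGISTRY["echo"] = EchoModel


class ManagerSketch:
    def __init__(self) -> None:
        self.handles: dict[str, Any] = {}

    def deploy_model(self, name: str, model_class: str) -> None:
        if name in self.handles:
            raise ValueError(f"Model '{name}' is already deployed")
        if model_class not in MODEL_REGISTRY:
            raise KeyError(f"Model class '{model_class}' not found in registry")
        # Real code would create a Ray actor here and store its handle.
        self.handles[name] = MODEL_REGISTRY[model_class]()

    def undeploy_model(self, name: str) -> None:
        if name not in self.handles:
            raise ValueError(f"Model '{name}' is not deployed")
        # Real code would kill the actor and free its GPU allocation here.
        del self.handles[name]
```

The two error paths mirror the documented `ValueError` (duplicate deploy) and `KeyError` (unknown model class) above.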
get_gpu_info()
Get GPU resource information from Ray.
Returns:

| Type | Description |
|---|---|
| dict[str, Any] | GPU info dictionary. |
Source code in pixano_inference/ray/app.py
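The exact keys of the GPU info dictionary are not documented here. The sketch below assumes a plausible shape derived from Ray-style resource mappings; the field names `total_gpus`, `available_gpus`, and `used_gpus` are hypothetical, and the inputs stand in for live `ray.cluster_resources()` / `ray.available_resources()` calls.

```python
# Hypothetical sketch: build a GPU info dict from Ray-style resource
# mappings. The key names below are assumptions, not the documented schema.
def gpu_info_from_resources(
    total: dict[str, float], available: dict[str, float]
) -> dict[str, int]:
    total_gpus = int(total.get("GPU", 0))
    available_gpus = int(available.get("GPU", 0))
    return {
        "total_gpus": total_gpus,
        "available_gpus": available_gpus,
        "used_gpus": total_gpus - available_gpus,
    }
```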
get_handle(name)
Get a deployment handle by model name.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Model name. | required |

Returns:

| Type | Description |
|---|---|
| Any \| None | DeploymentHandle or None if not found. |
get_model_capability(name)
get_model_metadata(name)
Get metadata for a deployed model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Model name. | required |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Model metadata dictionary. |
Source code in pixano_inference/ray/app.py
get_tracking_job(job_id)
Resolve the current state of an asynchronous tracking job.
Source code in pixano_inference/ray/app.py
list_models()
List all deployed models.
Returns:

| Type | Description |
|---|---|
| list[ModelInfo] | List of ModelInfo objects. |
Source code in pixano_inference/ray/app.py
submit_tracking_job(model_name, input_data)
Submit a tracking request as an asynchronous Ray job.
Source code in pixano_inference/ray/app.py
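The submit/get/cancel trio can be sketched with an in-memory analogue of `TrackingJobRecord`. This is illustrative only: `JobTrackerSketch`, the job-id format, and the status strings are assumptions, and the real implementation holds a Ray `ObjectRef` where this sketch stores the raw input.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass
class TrackingJobRecord:
    """In-memory status for an asynchronous tracking job (mirrors the doc)."""

    model_name: str
    object_ref: Any
    status: str = "queued"
    detail: str | None = None
    result: Any | None = None
    metadata: dict = field(default_factory=dict)
    timestamp: datetime = field(default_factory=datetime.now)
    submitted_at_monotonic: float = field(default_factory=time.time)
    processing_time: float = 0.0


class JobTrackerSketch:
    def __init__(self) -> None:
        self._jobs: dict[str, TrackingJobRecord] = {}
        self._next_id = 0

    def submit_tracking_job(self, model_name: str, input_data: Any) -> str:
        job_id = f"job-{self._next_id}"  # hypothetical id scheme
        self._next_id += 1
        # Real code would launch a Ray task and keep its ObjectRef here.
        self._jobs[job_id] = TrackingJobRecord(model_name, object_ref=input_data)
        return job_id

    def get_tracking_job(self, job_id: str) -> TrackingJobRecord | None:
        return self._jobs.get(job_id)

    def cancel_tracking_job(self, job_id: str) -> bool:
        # Best effort: a job already in a terminal state cannot be cancelled.
        record = self._jobs.get(job_id)
        if record is None or record.status in ("succeeded", "failed", "cancelled"):
            return False
        record.status = "cancelled"
        return True
```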
undeploy_model(name)
Undeploy a model.
Kills the Ray actor, frees GPU, and removes the handle.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Model name. | required |

Raises:

| Type | Description |
|---|---|
| ValueError | If the model is not deployed. |
Source code in pixano_inference/ray/app.py
TrackingJobRecord(model_name, object_ref, status='queued', detail=None, result=None, metadata=dict(), timestamp=datetime.now(), submitted_at_monotonic=time.time(), processing_time=0.0)
dataclass
In-memory status for an asynchronous tracking job.
create_ray_serve_app(config=None)
Create the FastAPI application and DeploymentManager for Ray Serve.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | RayServeConfig \| None | Ray Serve configuration. If None, uses defaults. | None |

Returns:

| Type | Description |
|---|---|
| tuple[FastAPI, DeploymentManager] | Tuple of (FastAPI app, DeploymentManager). |