pixano_inference.ray.config
Ray Serve configuration models.
AutoscalingConfig(**data)
Bases: BaseModel
Autoscaling configuration for Ray Serve deployments.
Attributes:
| Name | Type | Description |
|---|---|---|
| `min_replicas` | `int` | Minimum number of replicas. Can be 0 for scale-to-zero. |
| `max_replicas` | `int` | Maximum number of replicas. |
| `target_num_ongoing_requests_per_replica` | `int` | Target number of ongoing requests per replica before scaling up. |
| `downscale_delay_s` | `float` | Delay in seconds before scaling down. |
| `upscale_delay_s` | `float` | Delay in seconds before scaling up. |
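To make the interaction between the replica bounds and the per-replica target concrete, here is a minimal sketch. It is a simplified approximation, not Ray Serve's actual autoscaler: it ignores `upscale_delay_s` and `downscale_delay_s` entirely, and `AutoscalingSketch` and `desired_replicas` are hypothetical names that only mirror the documented fields.

```python
import math
from dataclasses import dataclass


@dataclass
class AutoscalingSketch:
    """Hypothetical stand-in mirroring three documented fields (not the real model)."""

    min_replicas: int
    max_replicas: int
    target_num_ongoing_requests_per_replica: int


def desired_replicas(cfg: AutoscalingSketch, total_ongoing_requests: int) -> int:
    """Estimate a replica count: enough replicas to meet the target, clamped to the bounds."""
    raw = math.ceil(
        total_ongoing_requests / cfg.target_num_ongoing_requests_per_replica
    )
    return max(cfg.min_replicas, min(cfg.max_replicas, raw))


# Illustrative values only; these are not documented defaults.
cfg = AutoscalingSketch(
    min_replicas=0,
    max_replicas=4,
    target_num_ongoing_requests_per_replica=2,
)
idle = desired_replicas(cfg, 0)        # no traffic, so scale to zero is allowed
moderate = desired_replicas(cfg, 5)    # ceil(5 / 2) replicas
saturated = desired_replicas(cfg, 100) # capped at max_replicas
```

With `min_replicas=0`, an idle deployment can drop to zero replicas, which is the scale-to-zero case mentioned in the table.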
ModelDeploymentConfig(**data)
Bases: BaseModel
Configuration for deploying a single model.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Unique model name. Optional for HuggingFace models (auto-derived from path). |
| `capability` | `str` | Capability string (e.g. `"segmentation"`). |
| `model_class` | `str` | Registered class name (e.g. `"Sam2ImageModel"`). |
| `model_module` | `str \| None` | Python module path to import before resolving `model_class` (e.g. `"my_package.models"`). Used for external custom models. |
| `model_params` | `dict` | Parameters passed to model init via config. |
| `resources` | `ResourceConfig` | Resource configuration for the deployment. |
| `autoscaling` | `AutoscalingConfig` | Autoscaling configuration for the deployment. |
| `max_batch_size` | `int` | Maximum batch size for inference. |
| `batch_wait_timeout_s` | `float` | Timeout in seconds for waiting to fill a batch. |
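The relationship between `model_module` and `model_class` can be sketched as follows: the module is imported first so that any classes it registers become visible, and the class name is then looked up. This is only an illustration of that ordering. The `registry` dictionary and the `resolve_model_class` helper are hypothetical and do not describe the library's actual registration mechanism.

```python
import importlib
from typing import Dict, Optional


def resolve_model_class(
    model_class: str,
    model_module: Optional[str],
    registry: Dict[str, type],
) -> type:
    """Import `model_module` (if given), then look `model_class` up in a registry.

    `registry` is a hypothetical name-to-class mapping used for illustration.
    """
    if model_module is not None:
        # Importing is assumed to trigger registration as a side effect.
        importlib.import_module(model_module)
    if model_class not in registry:
        raise ValueError(f"Unknown model class: {model_class!r}")
    return registry[model_class]


# Stand-in registry; a real one would map names such as "Sam2ImageModel" to classes.
registry = {"Dummy": dict}
resolved = resolve_model_class("Dummy", None, registry)
```

An unknown `model_module` raises `ModuleNotFoundError` at import time, before any class lookup happens.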
RayServeConfig(**data)
Bases: BaseModel
Top-level Ray Serve configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| `host` | `str` | Host to bind to. |
| `port` | `int` | Port to serve on. |
| `num_cpus` | `int \| None` | Total number of CPUs available to Ray. `None` means auto-detect. |
| `num_gpus` | `int \| None` | Total number of GPUs available to Ray. `None` means auto-detect. |
| `pip_packages` | `list[str] \| None` | List of pip packages to install in the Ray workers' runtime environment. |
| `working_dir` | `str \| None` | Working directory for Ray workers. |
| `models` | `list[ModelDeploymentConfig]` | List of models to deploy at startup. |
| `default_resources` | `ResourceConfig` | Default resource configuration for deployments. |
| `default_autoscaling` | `AutoscalingConfig` | Default autoscaling configuration for deployments. |
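The nesting of the configuration models may be easier to see as a plain dictionary. The sketch below uses only the documented field names; every value is illustrative and none is a documented default, and the model name `"sam2-image"` is hypothetical. Since the constructor is documented as `RayServeConfig(**data)`, a dictionary of this shape would presumably be passed as `RayServeConfig(**config)`, though that call is not exercised here.

```python
import json

# Illustrative values only; none of these are documented defaults.
resources = {"num_gpus": 1.0, "num_cpus": 2.0, "memory_mb": None}
autoscaling = {
    "min_replicas": 0,
    "max_replicas": 2,
    "target_num_ongoing_requests_per_replica": 4,
    "downscale_delay_s": 300.0,
    "upscale_delay_s": 10.0,
}

config = {
    "host": "127.0.0.1",
    "port": 8000,
    "num_cpus": None,  # None means auto-detect
    "num_gpus": None,  # None means auto-detect
    "pip_packages": None,
    "working_dir": None,
    "models": [
        {
            "name": "sam2-image",  # hypothetical model name
            "capability": "segmentation",
            "model_class": "Sam2ImageModel",
            "model_module": None,
            "model_params": {},
            "resources": resources,
            "autoscaling": autoscaling,
            "max_batch_size": 4,
            "batch_wait_timeout_s": 0.05,
        }
    ],
    "default_resources": resources,
    "default_autoscaling": autoscaling,
}

# The structure contains only JSON-compatible values, so it can be stored in a file.
serialized = json.dumps(config, indent=2)
```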
ResourceConfig(**data)
Bases: BaseModel
Resource configuration for a model deployment.
Attributes:
| Name | Type | Description |
|---|---|---|
| `num_gpus` | `float` | Number of GPUs per replica. |
| `num_cpus` | `float` | Number of CPUs per replica. |
| `memory_mb` | `int \| None` | Memory limit in MB. `None` means no limit. |
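As a rough illustration of how these fields could translate into per-replica resource requests, the sketch below builds a dictionary in the shape of Ray's `ray_actor_options`, where `memory` is expressed in bytes. Whether the library performs this exact mapping is an assumption, `to_ray_actor_options` is a hypothetical helper, and the conversion assumes 1 MB = 1024 * 1024 bytes.

```python
from typing import Any, Dict, Optional


def to_ray_actor_options(
    num_gpus: float,
    num_cpus: float,
    memory_mb: Optional[int],
) -> Dict[str, Any]:
    """Build a ray_actor_options-style dict from the documented resource fields."""
    options: Dict[str, Any] = {"num_gpus": num_gpus, "num_cpus": num_cpus}
    if memory_mb is not None:
        # Assumption: 1 MB is treated as 1024 * 1024 bytes.
        options["memory"] = memory_mb * 1024 * 1024
    return options


# Fractional GPU values let several replicas share one device.
shared_gpu = to_ray_actor_options(num_gpus=0.5, num_cpus=2.0, memory_mb=None)
with_limit = to_ray_actor_options(num_gpus=1.0, num_cpus=4.0, memory_mb=2048)
```

Leaving `memory_mb` as `None` omits the `memory` key, matching the "no limit" meaning in the table.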