pixano_inference.impls.transformers.vlm
Transformers-based VLM (Vision-Language Model).
TransformersVLMModel(config)
Bases: VLMModel
Native Ray Serve model for Transformers-based VLMs.
model_params contract:
- `path` (str, required): HuggingFace model ID or local checkpoint path.
- `processor_config` (dict, optional): Kwargs for `AutoProcessor.from_pretrained`.
- `config` (dict, optional): Kwargs for the model's `from_pretrained`.
- `model_type` (str, optional): Model type hint (e.g. `"llava"`, `"llava-next"`). If not provided, falls back to `AutoModelForVision2Seq`.
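For illustration, a `model_params` mapping satisfying this contract might look like the following sketch. Only the key names come from the contract; the model ID and kwarg values are placeholder assumptions, not recommendations:

```python
# Hypothetical model_params payload; only the four key names are taken
# from the contract above, the values are placeholder examples.
model_params = {
    "path": "llava-hf/llava-1.5-7b-hf",  # HF model ID or local checkpoint path
    "processor_config": {"use_fast": True},  # kwargs for AutoProcessor.from_pretrained
    "config": {"torch_dtype": "float16"},  # kwargs for the model's from_pretrained
    "model_type": "llava",  # optional hint; omit to fall back to AutoModelForVision2Seq
}

# Only "path" is required; the optional keys may be dropped entirely.
minimal_params = {"path": "/checkpoints/my-vlm"}
```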
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `ModelDeploymentConfig` | Model deployment configuration. | *required* |
Source code in pixano_inference/impls/transformers/vlm.py
metadata *(property)*
Model metadata.
load_model()
Load the Transformers VLM model and processor.
Source code in pixano_inference/impls/transformers/vlm.py
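The `model_type` fallback described in the `model_params` contract can be pictured as a small dispatch table. The class names below are the usual Transformers classes for these hints, but the actual resolution logic lives in `vlm.py`, so treat this as an illustrative sketch only:

```python
# Illustrative dispatch table for the model_type hint; the real
# resolution logic in pixano_inference may differ.
_MODEL_CLASSES = {
    "llava": "LlavaForConditionalGeneration",
    "llava-next": "LlavaNextForConditionalGeneration",
}


def resolve_model_class(model_type=None):
    """Return the Transformers class name to load for a model_type hint."""
    # Unknown or missing hints fall back to AutoModelForVision2Seq.
    return _MODEL_CLASSES.get(model_type, "AutoModelForVision2Seq")
```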
predict(input)
Run VLM generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input` | `VLMInput` | VLM input with prompt, images, and generation parameters. | *required* |
Returns:
| Type | Description |
|---|---|
| `VLMOutput` | VLM output with generated text, usage info, and generation config. |
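As a rough illustration of the request and response shapes, here is a hypothetical input and output rendered as plain dicts. The field names mirror the tables above, but the exact `VLMInput` and `VLMOutput` schemas are defined in `pixano_inference`, so this is a sketch of the shapes, not the real models:

```python
# Hypothetical VLMInput payload; field names are assumed from the
# description above (prompt, images, generation parameters).
vlm_input = {
    "prompt": "Describe the objects in this image.",
    "images": ["https://example.com/cat.jpg"],  # placeholder image reference
    "max_new_tokens": 64,  # example generation parameter
}

# Hypothetical VLMOutput shape, mirroring the Returns table
# (generated text, usage info, generation config).
vlm_output = {
    "generated_text": "...",  # model-generated text
    "usage": {"prompt_tokens": 12, "completion_tokens": 9},
    "generation_config": {"max_new_tokens": 64},
}
```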
Source code in pixano_inference/impls/transformers/vlm.py
unload()
Free resources.
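`unload()` implementations for Transformers models typically drop references to the model and processor and release cached GPU memory. A minimal sketch, assuming dict-based state (the attribute handling here is an assumption, not taken from the source):

```python
import gc


def unload(state):
    # Drop references to the model and processor so Python can
    # garbage-collect them; the dict-based state is an assumption.
    state.pop("model", None)
    state.pop("processor", None)
    gc.collect()
    try:
        import torch

        # Release cached CUDA memory back to the driver, if torch with
        # CUDA support is installed.
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass
```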