pixano_inference.models.vlm
VLM (Vision-Language Model) base class and I/O types.
UsageInfo(**data)
Bases: BaseModel
Usage metadata for generation.
Attributes:
| Name | Type | Description |
|---|---|---|
prompt_tokens |
int
|
Number of tokens in the prompt. |
completion_tokens |
int
|
Number of tokens in the completion. |
total_tokens |
int
|
Total number of tokens. |
Source code in pydantic/main.py
VLMInput(**data)
Bases: BaseModel
Input for vision-language model generation.
Attributes:
| Name | Type | Description |
|---|---|---|
prompt |
str | list[dict[str, Any]]
|
Prompt for the generation. Can be a string or a list of dicts for chat templates. |
images |
list[str | Path] | None
|
Images for the generation. Can be None if images are passed in the prompt. |
max_new_tokens |
int
|
Maximum number of new tokens to generate. |
temperature |
float
|
Temperature for the generation. |
Source code in pydantic/main.py
VLMModel(config)
Bases: InferenceModel
Base class for vision-language models.
Example
@register_model("my-vlm")
class MyVLM(VLMModel):
def load_model(self):
self.model = load_weights(self.config.model_params["path"])
def predict(self, input: VLMInput) -> VLMOutput:
text = self.model.generate(input.prompt, input.images)
return VLMOutput(generated_text=text, usage=..., generation_config=...)
Source code in pixano_inference/models/base.py
predict(input)
abstractmethod
Run vision-language generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input
|
VLMInput
|
VLM input with prompt, images, and generation parameters. |
required |
Returns:
| Type | Description |
|---|---|
VLMOutput
|
VLM output with generated text, usage info, and generation config. |
Source code in pixano_inference/models/vlm.py
VLMOutput(**data)
Bases: BaseModel
Output for vision-language model generation.
Attributes:
| Name | Type | Description |
|---|---|---|
generated_text |
str
|
Generated text. |
usage |
UsageInfo
|
Usage metadata. |
generation_config |
dict[str, Any]
|
Configuration used for the generation. |