pixano_inference.models.transformers
Inference models for Transformers.
TransformerModel(name, path, processor, model)
Bases: BaseInferenceModel
Inference model for transformers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
Name of the model. |
required |
path
|
Path | str
|
Path to the model or its Hugging Face hub's identifier. |
required |
processor
|
'ProcessorMixin'
|
Processor for the model. |
required |
model
|
'PreTrainedModel'
|
Model for the inference. |
required |
Source code in pixano_inference/models/transformers.py
metadata
property
Return the metadata of the model.
delete()
image_mask_generation(image, image_embedding=None, points=None, labels=None, boxes=None, num_multimask_outputs=3, multimask_output=True, return_image_embedding=False, **kwargs)
Generate a mask from the image.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image
|
'Tensor' | Image
|
Image for the generation. |
required |
image_embedding
|
'Tensor' | None
|
Image embeddings for the generation. |
None
|
points
|
list[list[list[int]]] | None
|
Points for the mask generation. The first fimension is the number of prompts, the second the number of points per mask and the third the coordinates of the points. |
None
|
labels
|
list[list[int]] | None
|
Labels for the mask generation. The first fimension is the number of prompts, the second the number of labels per mask. |
None
|
boxes
|
list[list[int]] | None
|
Boxes for the mask generation. The first fimension is the number of prompts, the second the coordinates of the boxes. |
None
|
num_multimask_outputs
|
int
|
Number of masks to generate per prediction. |
3
|
multimask_output
|
bool
|
Whether to generate multiple masks per prediction. |
True
|
return_image_embedding
|
bool
|
Whether to return the image embedding. |
False
|
kwargs
|
Any
|
Additional keyword arguments. |
{}
|
Source code in pixano_inference/models/transformers.py
image_zero_shot_detection(image, classes, box_threshold, text_threshold, **kwargs)
Perform zero shot detection on an image.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image
|
'Tensor' | Image
|
The image. |
required |
classes
|
str
|
The list of classes to detect in the format 'class1. class2'. |
required |
box_threshold
|
float
|
The threshold for bounding boxes detection. |
required |
text_threshold
|
float
|
The threshold for the classes identification during zero shot learning phase. |
required |
kwargs
|
Any
|
Additional arguments. |
{}
|
Returns:
Type | Description |
---|---|
ImageZeroShotDetectionOutput
|
The output of image zero-shot detection task. |
Source code in pixano_inference/models/transformers.py
text_image_conditional_generation(prompt, images, generation_config=None, **kwargs)
Generate text from an image and a prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prompt
|
str | list[dict[str, Any]]
|
Prompt for the generation. |
required |
images
|
list['Tensor']
|
Images for the generation. |
required |
generation_config
|
'GenerationConfig' | None
|
Configuration for the generation as Hugging Face's GenerationConfig. |
None
|
kwargs
|
Any
|
Additional keyword arguments. |
{}
|