Getting started with Pixano-Inference

Installation

To install the library, run the following command:

pip install pixano-inference

For a development (editable) install that lets you modify and test the library, clone the repository and run the following commands:

cd pixano-inference
pip install -e .

To use the providers available in Pixano-Inference, make sure to install the relevant dependencies. For example, to use the vLLM provider, run:

pip install pixano-inference[vllm]
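Before loading a provider, it can help to confirm that its optional dependency is actually present in the current environment. The snippet below is a minimal sketch using only the standard library. The helper name provider_dependency_available is hypothetical, and it assumes that the [vllm] extra installs a package importable as vllm.

```python
from importlib.util import find_spec


def provider_dependency_available(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return find_spec(module_name) is not None


# Assumption: the vLLM extra provides a module importable as "vllm".
if not provider_dependency_available("vllm"):
    print("vLLM is not installed; run: pip install pixano-inference[vllm]")
```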

Usage

Pixano-Inference can be used as:

a Python library to instantiate models from providers and run inference
a server with the same features as the library that can run asynchronously

See the examples section, which covers several use cases.

Run in a Python script

Instantiate a model

To instantiate a model, you first need to instantiate its provider and then call its load_model method.

import torch

from pixano_inference.providers import VLLMProvider
from pixano_inference.tasks import MultimodalImageNLPTask

# Instantiate the provider, then load the model through it
vllm_provider = VLLMProvider()
llava_qwen = vllm_provider.load_model(
    name="llava-qwen",
    task=MultimodalImageNLPTask.CONDITIONAL_GENERATION,
    device=torch.device("cuda"),
    path="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
    config={
        "dtype": "bfloat16",
    },
)

Call the model for inference

For inference, call the method associated with the task you want to perform. For visual question answering (VQA), this is the text_image_conditional_generation method.

from pixano_inference.pydantic import TextImageConditionalGenerationOutput


prompt = [
    {
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/9/9e/Ours_brun_parcanimalierpyrenees_1.jpg"
                },
            },
            {"type": "text", "text": "What is displayed in this image ? Answer with high level of description like 1000 words. "},
        ],
        "role": "user",
    }
]

output: TextImageConditionalGenerationOutput = llava_qwen.text_image_conditional_generation(
    prompt=prompt, max_new_tokens=1000
)
print(output.generated_text)
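The prompt used above is a list of chat messages, each with a role and a content list that mixes image_url and text entries. When several prompts are needed, a small builder avoids repeating the nested structure. The following is a sketch only: build_image_prompt is a hypothetical helper, not part of Pixano-Inference, and it simply reproduces the dictionary layout shown in the example.

```python
from typing import Any


def build_image_prompt(image_url: str, question: str) -> list[dict[str, Any]]:
    """Build a single-turn user message with one image and one text entry."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]


prompt = build_image_prompt(
    "https://upload.wikimedia.org/wikipedia/commons/9/9e/Ours_brun_parcanimalierpyrenees_1.jpg",
    "What is displayed in this image?",
)
```

The resulting prompt can then be passed as the prompt argument in the same way as the hand-written list.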

Run the server

Launch the server

Running Pixano-Inference as a server requires a running Redis server. Its URL can be configured in a .env file at the root of the server. By default, it uses localhost on port 6379.
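If the server fails to start, a quick first check is whether anything is listening at the Redis address. The sketch below uses only the standard library and assumes the default host and port mentioned above. The helper redis_is_reachable is hypothetical, and it only verifies that a TCP connection is accepted, not that the listener is actually Redis.

```python
import socket


def redis_is_reachable(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


print("Redis reachable:", redis_is_reachable())
```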

Pixano-Inference can launch a server that exposes the API. To do so, run the following command:

pixano-inference --port 8000

The default port is 8000. You can change it by passing the --port argument.

Instantiate the client

The easiest way to interact with Pixano-Inference is through the Python client. However, you can also consult the Swagger documentation at http://pixano_inference_url/docs to use the REST API directly.

from pixano_inference.client import PixanoInferenceClient


client = PixanoInferenceClient.connect(url="http://localhost:8000")

Instantiate a model

To instantiate a model, provide the path of the model file and the task that the model will perform. For example, to run a Llava model from the vLLM provider:

from pixano_inference.pydantic import ModelConfig
from pixano_inference.tasks import MultimodalImageNLPTask


await client.instantiate_model(
    provider="vllm",
    config=ModelConfig(
        name="llava-qwen",
        task=MultimodalImageNLPTask.CONDITIONAL_GENERATION.value,
        path="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
        config={
            "dtype": "bfloat16",
        },
    ),
)

Run an inference

The client provides methods to run the different models on various tasks. For image-text conditional generation using Llava, the relevant method is:

from pixano_inference.pydantic import TextImageConditionalGenerationRequest, TextImageConditionalGenerationResponse


request = TextImageConditionalGenerationRequest(
    model="llava-qwen",
    prompt=[
        {
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/9/9e/Ours_brun_parcanimalierpyrenees_1.jpg"
                    },
                },
                {"type": "text", "text": "What is displayed in this image ? Answer concisely. "},
            ],
            "role": "user",
        }
    ],
    images=None,
    max_new_tokens=100,
)

response: TextImageConditionalGenerationResponse = await client.text_image_conditional_generation(request)
print(response.data.generated_text)
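The client calls above use await at the top level, which works in a notebook or an async REPL but not in a plain Python script. In a script, the calls need to live inside a coroutine that is started with asyncio.run. The sketch below shows that pattern only. FakeClient and its return value are hypothetical stand-ins so that the example runs without a server; in real use, the body of main would contain the PixanoInferenceClient calls shown earlier.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for the real client, used only to make the sketch runnable."""

    async def text_image_conditional_generation(self, request: dict) -> str:
        await asyncio.sleep(0)  # placeholder for the network round trip
        return f"generated text for model {request['model']}"


async def main() -> str:
    client = FakeClient()
    request = {"model": "llava-qwen", "max_new_tokens": 100}
    # All awaited client calls go inside this coroutine.
    return await client.text_image_conditional_generation(request)


# asyncio.run creates the event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
print(result)
```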