# Entity Linking
Link named entities between image regions and text spans. This guide uses the MEL (Multimodal Entity Linking) dataset format and covers importing data, exploring the image + text UI, and annotating with bounding boxes and text selection.
## What is entity linking?
Entity linking connects mentions of the same real-world entity across different modalities. For example, a sentence might mention “the Eiffel Tower” while the image shows the tower itself. Entity linking creates a shared entity that ties the text span to a bounding box (or mask) on the image.
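Conceptually, the result is one entity object referenced by annotations in each modality. A minimal sketch of that data structure (illustrative names only; Pixano's real schema classes appear later in this guide):

```python
from dataclasses import dataclass, field


@dataclass
class TextSpanRef:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class BBoxRef:
    coords: list[float]  # [x, y, width, height], normalized to [0, 1]


@dataclass
class LinkedEntity:
    name: str
    spans: list[TextSpanRef] = field(default_factory=list)  # text-side mentions
    boxes: list[BBoxRef] = field(default_factory=list)      # image-side regions


# "The Eiffel Tower ..." -> characters 4..16 name the same entity as the box.
tower = LinkedEntity(name="Eiffel Tower")
tower.spans.append(TextSpanRef(start=4, end=16))
tower.boxes.append(BBoxRef(coords=[0.3, 0.1, 0.4, 0.8]))
```

Both annotations point at the same `LinkedEntity`, which is what makes the link queryable in either direction.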
## Prerequisites

- Pixano installed (see Installation)
- A data directory initialized with `pixano init ./my_data`
## Import the dataset

### Prepare the source folder

An entity-linking dataset has an image and a text document per item. Organize your source folder like this:
```
mel_source/
  train/
    image_001.jpg
    image_002.jpg
    metadata.jsonl
```

Each line in `metadata.jsonl` describes one item with both an image view and a text view, plus entities that link regions across both:
```json
{
  "status": "validated",
  "views": {
    "image": "image_001.jpg",
    "text": "The Eiffel Tower is a wrought-iron lattice tower in Paris."
  },
  "entities": [
    {
      "name": "Eiffel Tower",
      "annotations": {
        "image": { "bbox": [0.3, 0.1, 0.4, 0.8] },
        "text": { "text_span": [4, 16] }
      }
    }
  ]
}
```

- `views.text` is a string (the full text content).
- `text_span` is a `[start, end]` character offset pair.
- Each entity can have annotations on both the image and text views.
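A quick way to sanity-check a `metadata.jsonl` line before importing is to load it and slice the text with the span offsets (a minimal sketch using only the standard library; field names follow the example above):

```python
import json

# One line from metadata.jsonl (reformatted here for readability).
line = """{"status": "validated",
 "views": {"image": "image_001.jpg",
           "text": "The Eiffel Tower is a wrought-iron lattice tower in Paris."},
 "entities": [{"name": "Eiffel Tower",
               "annotations": {"image": {"bbox": [0.3, 0.1, 0.4, 0.8]},
                               "text": {"text_span": [4, 16]}}}]}"""

item = json.loads(line)
text = item["views"]["text"]
for entity in item["entities"]:
    start, end = entity["annotations"]["text"]["text_span"]
    mention = text[start:end]  # "Eiffel Tower"
    # For a well-formed item, the sliced mention should match the entity name.
    assert mention == entity["name"], f"span {start}:{end} != {entity['name']!r}"
```

Off-by-one span offsets are the most common import mistake; the assertion catches them early.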
### Review the dataset info

The dataset info at `examples/mel/info.py`:
```python
from pixano.datasets import DatasetInfo
from pixano.datasets.workspaces import WorkspaceType
from pixano.schemas import BBox, CompressedRLE, Entity, Image, Record, Text, TextSpan


class MELEntity(Entity):
    name: str = ""


dataset_info = DatasetInfo(
    name="MEL Sample",
    description="Sample import for image-text entity linking datasets.",
    workspace=WorkspaceType.IMAGE_TEXT_ENTITY_LINKING,
    record=Record,
    entity=MELEntity,
    bbox=BBox,
    mask=CompressedRLE,
    text_span=TextSpan,
    views={"image": Image, "text": Text},
)
```

Key elements:
- `WorkspaceType.IMAGE_TEXT_ENTITY_LINKING` enables the image + text panel UI.
- `views` has two entries: `"image"` (type `Image`) and `"text"` (type `Text`).
- `TextSpan` is the annotation type that marks character ranges in the text.
- Entities can be linked to both `BBox` (on the image) and `TextSpan` (on the text).
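Before running the import, it can also help to check that each record is consistent with these types: bbox coordinates normalized to [0, 1] and text spans inside the text bounds. A minimal, hypothetical checker (`validate_item` is not part of the Pixano API):

```python
def validate_item(item: dict) -> list[str]:
    """Check one metadata.jsonl record for obvious annotation errors."""
    errors = []
    text = item["views"]["text"]
    for entity in item.get("entities", []):
        ann = entity.get("annotations", {})
        bbox = ann.get("image", {}).get("bbox")
        if bbox is not None and not all(0.0 <= v <= 1.0 for v in bbox):
            errors.append(f"{entity['name']}: bbox not normalized: {bbox}")
        span = ann.get("text", {}).get("text_span")
        if span is not None:
            start, end = span
            if not (0 <= start < end <= len(text)):
                errors.append(f"{entity['name']}: span {span} outside text bounds")
    return errors


item = {
    "views": {"text": "The Eiffel Tower"},
    "entities": [{"name": "Eiffel Tower",
                  "annotations": {"text": {"text_span": [4, 16]}}}],
}
assert validate_item(item) == []  # no errors for a well-formed record
```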
### Run the import

```shell
pixano data import ./my_data ./mel_source \
    --info examples/mel/info.py:dataset_info
```

## Launch the server
```shell
pixano server run ./my_data
```

## Explore in the UI
### Item page — entity linking mode
When you open an item from an entity-linking dataset, you see:
- Image in the center — with bounding boxes and masks drawn on it.
- Text panel on the left — displaying the full text with color-coded spans. Each text span is highlighted with the same color as its linked entity’s bounding box.
### Text panel interactions
- Click a text span to select the corresponding entity. The entity’s bounding box highlights on the image, and its object card opens in the right panel.
- Select text and click “Tag Selected Text” to create a new text span annotation. The creation panel opens so you can assign it to an existing entity or create a new one.
### Object panel
The Object panel on the right shows all entities. Expand an entity card to see:
- Features — entity attributes (name, etc.).
- Objects — sub-objects (bounding boxes, masks, text spans).
- Text spans — a table showing each text span’s features alongside the highlighted text.
### Link an entity across modalities

1. Select text in the text panel and click “Tag Selected Text”.
2. In the creation prompt, choose an existing entity (or create a new one).
3. Draw a bounding box on the image for the same entity.
4. Both annotations are now linked through the shared entity.
## Annotate

### Bounding boxes
Use the bounding box tool on the image to draw regions around the entity. When prompted, select the entity you want to link it to.
### Text spans
Select text in the text panel and click “Tag Selected Text”. Choose the same entity that has the bounding box annotation.
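If you prepare spans programmatically rather than in the UI, the character offsets can be computed with `str.find` (a sketch with a hypothetical helper; it assumes the mention occurs verbatim in the text):

```python
def span_of(text: str, mention: str) -> list[int]:
    """Return the [start, end) character offsets of the first occurrence."""
    start = text.find(mention)
    if start == -1:
        raise ValueError(f"mention {mention!r} not found in text")
    return [start, start + len(mention)]


text = "The Eiffel Tower is a wrought-iron lattice tower in Paris."
print(span_of(text, "Eiffel Tower"))  # [4, 16]
```

For texts where the same mention appears more than once, you would need to disambiguate which occurrence to tag; `str.find` only returns the first.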
### Masks
If your dataset includes `CompressedRLE` masks, you can also use the polygon tool or SAM to create segmentation masks linked to entities.
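The exact on-disk format of `CompressedRLE` is internal to Pixano, but the underlying idea — run-length counts alternating background/foreground runs in column-major order, as in the uncompressed COCO convention — can be sketched as follows (assumption: your masks follow that convention):

```python
def decode_rle(counts: list[int], height: int, width: int) -> list[list[int]]:
    """Expand RLE counts into a height x width binary mask.

    counts alternate runs of 0s and 1s (starting with 0s) in column-major
    order, following the uncompressed COCO-style convention.
    """
    flat: list[int] = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    assert len(flat) == height * width, "counts do not cover the mask"
    # Un-flatten column by column (column-major order).
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]


mask = decode_rle([1, 2, 1], height=2, width=2)
# mask == [[0, 1], [1, 0]]
```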
## Next steps
- Visual Question Answering — image + conversation Q&A
- Object Detection — image-only detection with pre-annotation
- API Reference — full Python API documentation