I. Image regions

For the VIST dataset, extracted image regions can be obtained from RoViST-VG.

For AESOP, VWP, and other custom datasets, image regions can be extracted using the Faster R-CNN model (with a ResNet-101 backbone) trained on Visual Genome data (code).

II. Mapping between image IDs and extracted image regions

To evaluate the sequence(s) of interest, the metric needs a mapping between the corresponding image IDs and the bounding boxes of the extracted image regions. For the three visual storytelling datasets, the mapping is available at the respective links:

VIST: mapping info file
AESOP (test set only): mapping info file
VWP: mapping info file

For new/custom datasets, a similar mapping file can be created from the information produced during the image-region extraction step.
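
As a rough illustration of building such a mapping for a custom dataset, the sketch below groups detected bounding boxes by image ID and serializes the result as JSON. The input format (pairs of image ID and `[x1, y1, x2, y2]` box), the function name `build_region_mapping`, and the JSON layout are all assumptions made for this example; the actual mapping files linked above may use a different structure, so check one of them before adapting this.

```python
import json
from collections import defaultdict


def build_region_mapping(detections):
    """Group bounding boxes by image ID.

    `detections` is an iterable of (image_id, [x1, y1, x2, y2]) pairs.
    This input format is hypothetical; adapt it to whatever the
    region-extraction step actually emits.
    """
    mapping = defaultdict(list)
    for image_id, box in detections:
        # Store IDs as strings and coordinates as floats so the result
        # serializes cleanly to JSON.
        mapping[str(image_id)].append([float(v) for v in box])
    return dict(mapping)


# Made-up detections for two images (illustrative values only).
detections = [
    ("img_001", (10, 20, 110, 220)),
    ("img_001", (50, 60, 150, 160)),
    ("img_002", (0, 0, 64, 64)),
]

region_mapping = build_region_mapping(detections)
mapping_json = json.dumps(region_mapping, indent=2)
```

The `mapping_json` string can then be written to a file and referenced from the configuration.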

III. Mapping between story/scene IDs and image IDs

To connect sequences to their images, the metric needs a mapping between story/scene IDs and the respective image IDs. For the VIST and VWP datasets, the mapping is available at the respective links:

VIST: story id to image ids
AESOP: not required, since every sequence consists of 3 images and all image IDs follow a defined namespace.
VWP: story id to image ids
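
For a custom dataset, the two situations above can be sketched as follows: either build an explicit story-to-images mapping from per-image records, or derive the image IDs from the scene ID when a fixed naming scheme exists. Everything here is an assumption for illustration: the record format, the function names, and especially the `"{scene}_{index}"` pattern, which is a placeholder and not the actual AESOP namespace.

```python
def build_story_mapping(records):
    """Build {story_id: [image_id, ...]} with images in sequence order.

    `records` is an iterable of (story_id, position, image_id) tuples,
    a hypothetical format for a dataset's annotation rows.
    """
    story_to_images = {}
    # Sort by story, then by position, so each list follows story order.
    for story_id, _position, image_id in sorted(records, key=lambda r: (r[0], r[1])):
        story_to_images.setdefault(str(story_id), []).append(str(image_id))
    return story_to_images


def derive_image_ids(scene_id, n_images=3, pattern="{scene}_{index}"):
    """Derive image IDs from a scene ID under a fixed naming scheme.

    The default pattern is a placeholder, not a real dataset convention.
    """
    return [pattern.format(scene=scene_id, index=i) for i in range(1, n_images + 1)]


# Made-up records, deliberately out of order.
records = [
    ("s1", 1, "img_b"),
    ("s1", 0, "img_a"),
    ("s2", 0, "img_c"),
]
story_mapping = build_story_mapping(records)
derived_ids = derive_image_ids("scene7")
```

An explicit mapping file is the safer choice when sequence lengths vary; deriving IDs only works when both the sequence length and the naming scheme are fixed.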

After obtaining the data needed for I, II, and III, make the necessary changes to the configuration file.
