I. Image regions

For the VIST dataset, extracted image regions can be obtained from RoViST-VG.

For AESOP, VWP, and other custom datasets, image regions can be extracted with the Faster R-CNN model (ResNet-101 backbone) trained on Visual Genome data (code).

II. Mapping between image IDs and extracted image regions

To evaluate the sequence(s) of interest, the metric needs a mapping from each image ID to the bounding boxes of its extracted image regions. For the three visual storytelling datasets, the mapping is available at the respective links:

For new or custom datasets, a similar mapping file can be created from the information produced during the image region extraction step.
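
As a rough illustration of building such a mapping for a custom dataset, the sketch below groups per-region detector output by image ID. The record layout (`image_id`, `bbox`), the `[x1, y1, x2, y2]` box convention, and the JSON output are assumptions made for this example, not the format the metric is confirmed to expect; check the provided mapping files for the actual schema.

```python
import json
from typing import Dict, List

# Hypothetical detector output: one record per detected region.
# The field names and the [x1, y1, x2, y2] box layout are assumptions.
detections = [
    {"image_id": "img_001", "bbox": [10.0, 20.0, 110.0, 220.0]},
    {"image_id": "img_001", "bbox": [50.0, 60.0, 150.0, 160.0]},
    {"image_id": "img_002", "bbox": [0.0, 0.0, 64.0, 64.0]},
]


def build_region_mapping(records: List[dict]) -> Dict[str, List[List[float]]]:
    """Group bounding boxes by image ID, keeping detection order."""
    mapping: Dict[str, List[List[float]]] = {}
    for record in records:
        mapping.setdefault(record["image_id"], []).append(list(record["bbox"]))
    return mapping


region_mapping = build_region_mapping(detections)

# Serialize to JSON text; write this string to a file of your choice.
serialized = json.dumps(region_mapping, indent=2)
```

Collecting the boxes while the extraction step runs avoids a second pass over the images later.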

III. Mapping between story/scene IDs and image IDs

To connect sequences to their images, the metric needs a mapping from each story/scene ID to its image IDs. For the VIST and VWP datasets, the mapping is available at the respective links:
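
For a dataset without a provided mapping, a comparable structure could be assembled from the dataset's own annotations. The sketch below assumes hypothetical `(story_id, position, image_id)` rows and produces an ordered list of image IDs per story; the row layout and the output shape are illustrative assumptions, not the confirmed format of the provided files.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation rows: (story_id, position_in_story, image_id).
rows: List[Tuple[str, int, str]] = [
    ("story_1", 1, "img_002"),
    ("story_1", 0, "img_001"),
    ("story_2", 0, "img_003"),
]


def build_story_mapping(entries: List[Tuple[str, int, str]]) -> Dict[str, List[str]]:
    """Map each story ID to its image IDs, ordered by position in the story."""
    grouped: Dict[str, List[Tuple[int, str]]] = {}
    for story_id, position, image_id in entries:
        grouped.setdefault(story_id, []).append((position, image_id))
    # Sort by position so the image order matches the story order.
    return {
        story_id: [image_id for _, image_id in sorted(items)]
        for story_id, items in grouped.items()
    }


story_mapping = build_story_mapping(rows)
```

Sorting by position matters because the rows may not arrive in story order, and the image sequence has to match the order of the story.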

After obtaining the data needed for I, II, and III, make the necessary changes to the configuration file.