In the context of PhotoBook, a reference chain is a sequence of "elements" (utterances or dialogue segments) within a given game that refer to the same target image. We have extracted two datasets of reference chains from the PhotoBook dialogues, one for each type of element.

The PhotoBook Dataset

The PhotoBook Dataset was collected using a dedicated conversation task called the PhotoBook Task. In this task, two participants are paired for an online multi-round image identification game. They are shown collections of images that resemble a page of a photo book. Each collection is a randomly ordered grid of six similar images depicting everyday scenes, extracted from the MS COCO Dataset. On each page of the photo book, some of the images appear in the displays of both participants (the common images). The other images are each shown to only one of the participants (the different images). Three of the images in each display are highlighted with a yellow bar under the picture. The participants are tasked with marking these highlighted target images as either common or different by chatting with their partner. A full game consists of five consecutive rounds, and some of the previously displayed images re-appear in later rounds, prompting participants to refer to them multiple times.

For a detailed account of the dataset and the data collection process, see our papers The PhotoBook Dataset: Building Common Ground through Visually Grounded Dialogue and How should we call it? - Introducing the PhotoBook Conversation Task and Dataset for Training Natural Referring Expression Generation in Artificial Dialogue Agents. Further details can be found on the dataset information page. You can also visually browse the dataset from this website!

Download the PhotoBook log files (zip-compressed, 8MB)

Download the PhotoBook image sets (zip-compressed, 60MB)

Utterance-based Reference Chains

In PhotoBook, participants can interact freely via chat, so the dialogues include different types of dialogue acts. In our paper Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts, we concentrate on one specific type of dialogue contribution: referring utterances. From each PhotoBook game between two specific participants, we automatically extract all the key messages that contain an image description referring to a given target image. The extraction procedure is described in the paper and the corresponding code is on our GitHub repository. The result of this procedure is a new dataset of 16,525 reference chains containing a total of 41,340 referring utterances. You can visually browse through the reference chains on this website.
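To make the notion of an utterance-based reference chain concrete, here is a minimal Python sketch of how referring utterances could be grouped into chains once they have been extracted. It is not the extraction code from the paper or the repository: the `ReferringUtterance` class, its field names, and the toy data are all hypothetical, and the released files may be organised differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferringUtterance:
    # All field names are hypothetical; the released files may differ.
    game_id: int
    round_nr: int
    message_nr: int
    target_image: str
    text: str


def build_chains(utterances):
    """Group referring utterances by (game, target image), in dialogue order."""
    chains = defaultdict(list)
    ordered = sorted(utterances, key=lambda u: (u.game_id, u.round_nr, u.message_nr))
    for utt in ordered:
        chains[(utt.game_id, utt.target_image)].append(utt)
    return dict(chains)


# Invented toy data, not taken from the dataset.
sample = [
    ReferringUtterance(1, 2, 4, "img_a", "the couch dog again"),
    ReferringUtterance(1, 1, 3, "img_a", "do you have the dog lying on a red couch?"),
    ReferringUtterance(1, 1, 7, "img_b", "two men cutting a cake"),
    ReferringUtterance(2, 1, 2, "img_a", "a dog on a sofa"),
]
chains = build_chains(sample)
for (game_id, image), chain in chains.items():
    print(game_id, image, [u.text for u in chain])
```

Keying each chain on the pair of game and target image reflects the definition above: the same image described in two different games yields two separate chains.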

Contact Mario Giulianelli if you have any questions.

Download the utterance-based reference chains (v2)
To reproduce the results presented in the Refer, Reuse, Reduce paper, please download version 1 of the dataset.

Segment-based Reference Chains

In our paper The PhotoBook Dataset: Building Common Ground through Visually Grounded Dialogue, we propose a simple procedure to automatically extract reference chains made up of dialogue segments. A dialogue segment is defined as a collection of consecutive utterances that, as a whole, discuss a given target image and include expressions referring to it. The documented code for creating the segment chain dataset can be downloaded from this GitHub repository (to run it, place the logs folder into the project's data folder). On this dataset information page you can find sample segment chains and relevant statistics of the segment chains dataset, as well as additional information about the segmentation heuristics and their evaluation and validation. You can also visually browse through the reference chains on this website.
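The definition of a dialogue segment can be illustrated with a small Python sketch. It assumes, purely for illustration, that every message already carries a label naming the target image it discusses (or `None` for messages that discuss no target). The actual segmentation heuristics are described in the paper and do not rely on such labels; the function name, the message format, and the toy dialogue below are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def segment_chains(messages):
    """Build segment chains from (target_or_None, text) pairs in dialogue order.

    Consecutive messages with the same target form one segment; the segments
    for a target, in order of appearance, form its chain.
    """
    chains = defaultdict(list)
    for target, run in groupby(messages, key=lambda m: m[0]):
        if target is None:
            continue  # skip stretches that discuss no target image
        chains[target].append([text for _, text in run])
    return dict(chains)


# Invented toy dialogue from a single game, not taken from the dataset.
messages = [
    (None, "hi"),
    ("img_a", "do you have the dog on a couch?"),
    ("img_a", "yes, marking it as common"),
    ("img_b", "what about two men with a cake?"),
    ("img_b", "no, I don't have that one"),
    (None, "ok, next round"),
    ("img_a", "couch dog again?"),
    ("img_a", "yep"),
]
seg_chains = segment_chains(messages)
for image, chain in seg_chains.items():
    print(image, len(chain), "segment(s)")
```

In this toy example the image `img_a` is discussed in two separate stretches of the dialogue, so its chain contains two segments, each made up of consecutive utterances.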

Download the segment-based reference chains