Abstract

Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and tracking. Unfortunately, there is still an enormous performance gap between artificial vision systems and human intelligence in terms of higher-level vision problems, especially ones involving reasoning. Earlier attempts at equipping machines with high-level reasoning have hovered around Visual Question Answering (VQA), one typical task associating vision and language understanding. In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation. Unlike previous work on measuring abstract reasoning using RPM, we establish a semantic link between vision and reasoning by providing a structure representation. This addition enables a new type of abstract reasoning by jointly operating on the structure representation. Machine reasoning ability using modern computer vision is evaluated on this newly proposed dataset. We also provide human performance as a reference. Finally, we show consistent improvement across all models by incorporating a simple neural module that combines visual understanding and structure reasoning.

Paper

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Chi Zhang*, Feng Gao*, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
(* indicates equal contribution.)
Paper / Supplementary / Blog

Team

Chi Zhang (1, 2)

Feng Gao (1, 2)

Baoxiong Jia (1)

Yixin Zhu (1, 2)

Song-Chun Zhu (1, 2)

1 UCLA Center for Vision, Cognition, Learning and Autonomy

2 International Center for AI and Robot Autonomy (CARA)

Dataset

Thanks for your interest in RAVEN. We are now finalizing the dataset format and preparing for the code release. Over the past few months, we have been adding more annotations to the dataset. If everything goes well, we will release the dataset and all the code for dataset generation and model training around mid-May. Please stay tuned!

The dataset is generated using an Attributed Stochastic Image Grammar (A-SIG).

In total, the dataset has 7 figure configurations.
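For readers unfamiliar with the idea, the sketch below shows what sampling from an attributed stochastic grammar can look like in general: a symbol is expanded top-down by picking one of its productions at random, and each leaf entity is then given attribute values. This is only a toy illustration and not our generation code. The grammar symbols (`Scene`, `Layout`, `Entity`), the production probabilities, and the attribute values are all hypothetical and chosen for brevity.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to weighted productions.
# Symbols and probabilities are illustrative only, not taken from RAVEN.
GRAMMAR = {
    "Scene": [(1.0, ["Layout"])],
    "Layout": [(0.5, ["Entity"]), (0.5, ["Entity", "Entity"])],
}

# Hypothetical attribute domains attached to every leaf entity.
ATTRIBUTES = {
    "type": ["triangle", "square", "circle"],
    "size": ["small", "medium", "large"],
    "color": ["light", "dark"],
}


def sample(symbol, rng):
    """Expand `symbol` top-down; symbols without productions become leaves."""
    if symbol not in GRAMMAR:
        # Leaf: draw one value per attribute, uniformly at random.
        attrs = {name: rng.choice(values) for name, values in ATTRIBUTES.items()}
        return {"node": symbol, "attrs": attrs}
    weights, bodies = zip(*GRAMMAR[symbol])
    # Pick one production according to its probability.
    body = rng.choices(bodies, weights=weights, k=1)[0]
    return {"node": symbol, "children": [sample(s, rng) for s in body]}


def leaves(tree):
    """Collect the leaf entities of a sampled parse tree, left to right."""
    if "children" not in tree:
        return [tree]
    return [leaf for child in tree["children"] for leaf in leaves(child)]


if __name__ == "__main__":
    tree = sample("Scene", random.Random(0))
    for entity in leaves(tree):
        print(entity["node"], entity["attrs"])
```

The sampled tree is the kind of structure annotation that can accompany a rendered image: the image shows the entities, while the tree records how they were composed.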


Code

Code for dataset generation and model training will be released around mid-May. Please check back later.

Bibtex

@inproceedings{zhang2019raven,
    author={Zhang, Chi and Gao, Feng and Jia, Baoxiong and Zhu, Yixin and Zhu, Song-Chun},
    title={RAVEN: A Dataset for Relational and Analogical Visual rEasoNing},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2019}
}