Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data. Humans, even young toddlers, can induce causal relationships surprisingly well in various settings despite the notorious difficulty of the task. However, in contrast to this commonplace trait of human cognition, there is no diagnostic benchmark for measuring causal induction in modern Artificial Intelligence (AI) systems. Therefore, in this work, we introduce the Abstract Causal REasoning (ACRE) dataset for systematic evaluation of current vision systems in causal induction. Motivated by the stream of research on causal discovery in Blicket experiments, we query a visual reasoning system with four types of questions, posed in either an independent scenario or an interventional scenario: direct, indirect, screening-off, and backward-blocking. These questions intentionally go beyond the simple strategy of inducing causal relationships by covariation. By analyzing visual reasoning architectures on this testbed, we notice that pure neural models perform at near chance level and tend toward an associative strategy, whereas neuro-symbolic combinations struggle with backward-blocking reasoning. These deficiencies call for future research on models with a more comprehensive capability of causal induction.
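To make the contrast with covariation concrete, below is a minimal sketch of how an idealized reasoner might answer a query from the context panels. It assumes a deterministic disjunctive rule (the machine lights up if and only if at least one object on it is a Blicket) and assumes each answer is one of "on", "off", or "undetermined". The helpers `deductive_answer` and `covariation_answer` are hypothetical and written only for illustration; they are not part of the ACRE code, and the covariation score is just one simple stand-in for an associative strategy.

```python
from typing import FrozenSet, List, Tuple

# One context panel: the objects placed on the machine and whether it lit up.
Trial = Tuple[FrozenSet[str], bool]


def deductive_answer(context: List[Trial], query: FrozenSet[str]) -> str:
    """Answer a query under an assumed deterministic disjunctive (OR) rule."""
    # Any object seen on an unlit machine is certainly not a Blicket.
    non_blickets = set().union(*(objs for objs, lit in context if not lit))
    if query <= non_blickets:
        return "off"
    # If treating every query object as a non-Blicket leaves some lit trial
    # with no possible Blicket, the query must contain a Blicket.
    assumed_off = non_blickets | query
    if any(lit and objs <= assumed_off for objs, lit in context):
        return "on"
    # Both outcomes remain consistent with the context.
    return "undetermined"


def covariation_answer(context: List[Trial], query: FrozenSet[str]) -> str:
    """Hypothetical associative baseline: activation rate when an object is present."""

    def rate(obj: str) -> float:
        outcomes = [lit for objs, lit in context if obj in objs]
        # Objects never seen in the context get an uninformative score.
        return sum(outcomes) / len(outcomes) if outcomes else 0.5

    score = max(rate(obj) for obj in query)
    if score > 0.5:
        return "on"
    if score < 0.5:
        return "off"
    return "undetermined"


# Backward blocking: A and B together activate the machine, then A alone does too.
backward_blocking = [(frozenset({"A", "B"}), True), (frozenset({"A"}), True)]
print(deductive_answer(backward_blocking, frozenset({"B"})))    # undetermined
print(covariation_answer(backward_blocking, frozenset({"B"})))  # on
```

In the backward-blocking context, A alone already explains both activations, so B's status cannot be deduced; the covariation score still predicts "on" because B was only ever observed on a lit machine.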


ACRE: Abstract Causal REasoning Beyond Covariation
Chi Zhang, Baoxiong Jia, Mark Edmonds, Song-Chun Zhu, Yixin Zhu
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021



UCLA Center for Vision, Cognition, Learning, and Autonomy


  • Each ACRE problem consists of 10 panels: 6 for context and 4 for query.
  • In queries, we ask a visual reasoning system to predict the state of the Blicket machine given the objects in the queries.
  • In addition to the IID split, we create a compositionality split and a systematicity split.

  • Compositionality (Comp): we assign different shape-material-color combinations to the training and test sets, similar to the CoGenT split of CLEVR.
  • Systematicity (Sys): we vary how often the Blicket machine is activated in the context panels: it lights up 3 times in the training set and 4 times in the test set.
  • Download the dataset here and check our GitHub for dataset organization.
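The problem layout and the two generalization splits described above can be sketched as simple data checks. The `Panel` and `Problem` classes, the (shape, material, color) tuple encoding, and the `sys_split_ok` and `comp_split_ok` helpers are all hypothetical; the actual file organization is documented in the GitHub repository. The Sys check further assumes that the activation count applies to the six context panels of each individual problem.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical object encoding: a (shape, material, color) triple, as in CLEVR.
Obj = Tuple[str, str, str]


@dataclass
class Panel:
    objects: FrozenSet[Obj]
    lit: Optional[bool]  # observed machine state; None for query panels


@dataclass
class Problem:
    context: List[Panel]
    queries: List[Panel]

    def is_well_formed(self) -> bool:
        # 10 panels in total: 6 context panels with observed states, 4 queries.
        return (
            len(self.context) == 6
            and len(self.queries) == 4
            and all(p.lit is not None for p in self.context)
            and all(p.lit is None for p in self.queries)
        )

    def num_activations(self) -> int:
        # Number of context panels in which the machine lights up.
        return sum(1 for p in self.context if p.lit)

    def combos(self) -> Set[Obj]:
        # All attribute combinations appearing anywhere in the problem.
        return {o for p in self.context + self.queries for o in p.objects}


def sys_split_ok(train: List[Problem], test: List[Problem]) -> bool:
    # Sys: 3 activations per training problem, 4 per test problem (assumed per problem).
    return all(p.num_activations() == 3 for p in train) and all(
        p.num_activations() == 4 for p in test
    )


def comp_split_ok(train: List[Problem], test: List[Problem]) -> bool:
    # Comp: training and test problems use disjoint attribute combinations.
    train_combos = set().union(*(p.combos() for p in train))
    test_combos = set().union(*(p.combos() for p in test))
    return train_combos.isdisjoint(test_combos)


red_cube = ("cube", "rubber", "red")
blue_ball = ("sphere", "metal", "blue")
ctx = [Panel(frozenset({red_cube}), True)] * 3 + [Panel(frozenset({blue_ball}), False)] * 3
qs = [Panel(frozenset({red_cube, blue_ball}), None)] * 4
prob = Problem(ctx, qs)
print(prob.is_well_formed(), prob.num_activations())  # True 3
```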

  • ACRE-Comp
  • ACRE-Sys
  • Code



     title={ACRE: Abstract Causal REasoning Beyond Covariation},
     author={Zhang, Chi and Jia, Baoxiong and Edmonds, Mark and Zhu, Song-Chun and Zhu, Yixin},
     booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},