Conclusion and Perspectives, References & Supplementary Materials: Our Journey With ARC

19 Apr 2024

This paper is available on arxiv under CC 4.0 license.

Authors:

(1) Sebastien Ferr ´e, Univ Rennes, CNRS, Inria, IRISA Campus de Beaulieu, 35042 Rennes, France and ferre@irisa.fr.

Table of Links

Abstract & Introduction

Abstraction and Reasoning Corpus (ARC)

Related Work

Object-centric Models for ARC Grids

MDL-based Model Learning

Evaluation

Conclusion and Perspectives, References & Supplementary Materials

7 Conclusion and Perspectives

We have presented a novel and general approach to efficiently learn skills at tasks that consist in generating structured outputs as a function of structured inputs. Following Chollet’s measure of intelligence, efficiently learning here means limited knowledge prior for the target scope of tasks, only a few examples per task, and low computational resources.

Our approach is based on descriptive task models that combine object-centric patterns and computations, and on the MDL principle for guiding the search for models. We have detailed an application to ARC tasks on colored grids, and sketched an application to FlashFill tasks on strings. We have shown promising results, especially in terms of efficiency, model complexity, and model naturalness.

Going further on ARC will require a substantial design effort as our current models cover so far a small subset of the knowledge priors that are required by ARC tasks (e.g., goal-directedness). For FlashFill tasks, the addition of a counterpart for loops and common functions is expected to suffice to match the state-of-the-art.

References

1. Acquaviva, S., Pu, Y., Kryven, M., Sechopoulos, T., Wong, C., Ecanow, G., Nye, M., Tessler, M., Tenenbaum, J.: Communicating natural programs to humans and machines. Advances in Neural Information Processing Systems 35, 3731–3743 (2022)

2. Ainooson, J., Sanyal, D., Michelson, J.P., Yang, Y., Kunda, M.: An approach for solving tasks on the abstract reasoning corpus. arXiv preprint arXiv:2302.09425 (2023)

3. Alford, S., Gandhi, A., Rangamani, A., Banburski, A., Wang, T., Dandekar, S., Chin, J., Poggio, T.A., Chin, S.P.: Neural-guided, bidirectional program search for abstraction and reasoning. CoRR abs/2110.11536 (2021), https://arxiv.org/abs/2110.11536

4. Chollet, F.: On the measure of intelligence. arXiv preprint arXiv:1911.01547 (2019)

5. Chollet, F.: A definition of intelligence for the real world. Journal of Artificial General Intelligence 11(2), 27–30 (2020)

6. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT. pp. 4171– 4186. Assoc. Computational Linguistics (2019). https://doi.org/10.18653/v1/n19-1423

7. Elias, P.: Universal codeword sets and representations of the integers. IEEE Trans. Information Theory 21(2), 194–203 (1975)

8. Ellis, K., et al.: Dreamcoder: Bootstrapping inductive program synthesis with wake-sleep library learning. In: ACM Int. Conf. Programming Language Design and Implementation. pp. 835–850 (2021)

9. Faas, M., Leeuwen, M.v.: Vouw: geometric pattern mining using the MDL principle. In: Int. Symp. Intelligent Data Analysis. pp. 158–170. Springer (2020)

10. Fischer, R., Jakobs, M., Mucke, S., Morik, K.: Solving Abstract Reasoning Tasks with Gram- ¨ matical Evolution. In: LWDA. pp. 6–10. CEUR-WS 2738 (2020)

11. Goertzel, B.: Artificial general intelligence: concept, state of the art, and future prospects. Journal of Artificial General Intelligence 5(1), 1 (2014)

12. Grunwald, P., Roos, T.: Minimum description length revisited. International journal of math- ¨ ematics for industry 11(01) (2019)

13. Gulwani, S.: Automating string processing in spreadsheets using input-output examples. In: Symp. Principles of Programming Languages. pp. 317–330. ACM (2011)

14. Johnson, A., Vong, W.K., Lake, B., Gureckis, T.: Fast and flexible: Human program induction in abstract reasoning tasks. arXiv preprint arXiv:2103.05823 (2021)

15. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25, 1097–1105 (2012)

16. Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015)

17. Lieberman, H.: Your Wish is My Command. The Morgan Kaufmann series in interactive technologies, Morgan Kaufmann / Elsevier (2001). https://doi.org/10.1016/b978-1-55860- 688-3.x5000-3

18. Menon, A., Tamuz, O., Gulwani, S., Lampson, B., Kalai, A.: A machine learning framework for programming by example. In: Int. Conf. Machine Learning. pp. 187–195. PMLR (2013)

19. Muggleton, S., Raedt, L.D.: Inductive logic programming: Theory and methods. Journal of Logic Programming 19,20, 629–679 (1994)

20. Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)

21. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

22. Vreeken, J., Van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Mining and Knowledge Discovery 23(1), 169–214 (2011)

23. Xu, Y., Khalil, E.B., Sanner, S.: Graphs, constraints, and search for the abstraction and reasoning corpus. arXiv preprint arXiv:2210.09880 (2022)

Supplementary Materials

We here describe the contents of the supplementary file accompanying the paper. A public repository of the source code is also available at https://github.com/sebferre/ ARC-MDL for ARC and at https://github.com/sebferre/ARC-MDL-strings for FlashFill. There are two main directories: one for ARC tasks and another for FlashFill tasks.

Task Sets

The public task sets of ARC are available online at https://github.com/fchollet/ARC. There are two task sets: training tasks and evaluation tasks, each containing 400 tasks.

The task set of FlashFill is made of the 14 examples in [13]. We provide them as JSON files in FlashFill/taskset/, using the same format as ARC tasks, except that strings and arrays of strings are used instead of colored grids. For convenience, we also provide the file FlashFill/taskset/all examples.json to allow for browsing all examples in one file.

Results

We provide the learning and prediction logs for each task set:

– ARC/training tasks.log

– ARC/evaluation tasks.log

– FlashFill/tasks.log

Each log file starts with the hyperparameter values, and ends with global statistical measures. For each task, it gives:

– the detailed DL (description length) of the initial model;

– the learning trace (including the pruning phase) as a sequence of refinements, and showing the decrease of the normalized DL;

– the learned models before and after pruning and their detailed DL;

– the best joint description for each training example, except for ARC evaluation tasks so as not to leak their contents to the AI developer (a recommendation made by F. Chollet);

– the prediction for each training and test example;

– and finally a few measures for the task.

The measures given for each task and at the end are the following:

– runtime-learning: learning time in seconds (including the pruning phase);

– bits-train-error: the remaining error commited on output training grids, in bits;

– acc-train-micro: the proportion of training output grids that are correctly predicted;

– acc-train-macro: 1 if all training output grids are correctly predicted, 0 otherwise;

– acc-train-mrr: Mean Reciprocal Rank (MRR) of correct predictions for training output grids, 1 if all first predictions are correct;

– acc-test-micro: the proportion of test output grids that are correctly predicted;

– acc-test-macro: 1 if all test output grids are correctly predicted, 0 otherwise;

– acc-test-mrr: Mean Reciprocal Rank (MRR) of correct predictions for test output grids, 1 if all first predictions are correct.

The reference measure in ARC is acc-test-macro. The micro measures provide a more fine-grained and more optimistic measure of success.

For convenience, we also provide in ARC/solved tasks a picture for each of the 96 training ARC tasks that are solved by our approach. We kindly invite the reader to browse them to get a quick idea of the diversity of the tasks that our approach can solve. The pictures are screenshots from the UI provided along with ARC tasks.

This paper is available on Arxiv under CC 4.0 license.