Perception and Analogy-Making in Visual Understanding

Students: Lanfranco Muzi, Mick Thomure

Grant support: J. S. McDonnell Foundation

Introduction

This research focuses on developing computer programs that can perceive patterns and make analogies, drawing inspiration from pattern perception in complex adaptive systems. This work explores some of the most fundamental questions in artificial intelligence. How does a person (or how might a computer program) explore the typically intractable number of possible ways of understanding what is going on, and of possible similarities to other scenes or situations? How does the perceiver continue to explore new possibilities when the stimuli are continually changing? More generally, how can we build a computer program that achieves the fluidity of perception evident in natural systems and avoids the brittleness that plagues present-day computers?

Mitchell and Hofstadter developed a computer program called "Copycat" that addresses these questions. Copycat perceives patterns and makes analogies in the domain of letter strings, solving problems such as "If 'abc' changes to 'abd', what is the analogous change to 'kkjjii'?" In Copycat, the perception of objects, relationships, and analogies is carried out by a swarm of simple, relatively autonomous agents (analogous to individual ants in a colony or cells in the immune system) acting with no central control. The global perception of a scene or situation emerges from many small, diverse, often redundant, and often fruitless explorations by these agents. This strategy allows many different ways of understanding a situation to be explored simultaneously, but at varying speeds and depths, which change continually as the agents gather information about what is promising. Over time, a coherent perception emerges and rival candidates fade, though they never disappear completely. This is similar to the way exploration proceeds in the immune system, in ant colonies, and, we claim, in human cognition.
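The exploration strategy described above, in which effort flows toward promising possibilities without ever abandoning the rest, can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions, not Copycat's code: the function `explore`, the hidden `promise` values, the `sharpness` exponent, the `floor` weight, and the running-average update are all hypothetical choices made for illustration.

```python
import random


def explore(promise, steps=1000, sharpness=3.0, floor=0.01, seed=0):
    """Spread many small probes over rival interpretations (toy model).

    promise maps each interpretation to a hidden success probability in
    [0, 1], a hypothetical stand-in for how well it fits the situation.
    Returns the number of probes spent on each interpretation and its
    final estimated strength.
    """
    rng = random.Random(seed)
    names = list(promise)
    strength = {name: 1.0 for name in names}  # every candidate starts equal
    effort = {name: 0 for name in names}
    for _ in range(steps):
        # Pick where the next agent works: stronger candidates are favored
        # (sharpened by the exponent), and the floor means a weak rival is
        # slowed down but never dropped entirely.
        weights = [max(strength[n] ** sharpness, floor) for n in names]
        chosen = rng.choices(names, weights=weights)[0]
        effort[chosen] += 1
        # One small, possibly fruitless probe of the chosen interpretation.
        success = rng.random() < promise[chosen]
        # Fold the outcome into a running estimate of how promising it is.
        strength[chosen] = 0.9 * strength[chosen] + 0.1 * float(success)
    return effort, strength


effort, strength = explore({"reading A": 0.9, "reading B": 0.1})
print(effort)
```

Most probes end up on the more promising reading, but the weaker one keeps receiving occasional attention, so it could recover if later probes started to favor it.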

Copycat was shown to give human-like answers to a large set of letter-string analogy problems, many of which required what people consider to be sophisticated pattern recognition and creativity. While Copycat's perceptual abilities were promising, they were limited to the domain of letter strings. It remains to be demonstrated that such a system will work well in more realistic situations requiring a much larger repertoire of concepts, or that these ideas will be useful for understanding information processing in natural complex systems.
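The letter-string problem quoted earlier ("abc" changes to "abd"; what does "kkjjii" change to?) shows why such problems call for perception rather than a fixed rule. The two descriptions sketched below both account for "abc" to "abd" equally well, yet they give different answers for "kkjjii". The function names are hypothetical, and the descriptions are hard-coded by hand; the point of Copycat is that the program itself must discover which description of the strings to use.

```python
from itertools import groupby


def successor(ch):
    """Next letter of the alphabet (no wrap-around past 'z')."""
    if ch == "z":
        raise ValueError("'z' has no successor")
    return chr(ord(ch) + 1)


def change_rightmost_letter(s):
    """Literal reading: replace the last letter by its successor."""
    return s[:-1] + successor(s[-1])


def change_rightmost_group(s):
    """Group reading: treat runs of equal letters as single objects,
    then replace the last run by a run of its successor letter."""
    groups = ["".join(run) for _, run in groupby(s)]
    last = groups[-1]
    groups[-1] = successor(last[0]) * len(last)
    return "".join(groups)


for rule in (change_rightmost_letter, change_rightmost_group):
    print(rule.__name__, rule("abc"), rule("kkjjii"))
# change_rightmost_letter abd kkjjij
# change_rightmost_group abd kkjjjj
```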

Our research on perception and analogy-making, which extends the Copycat project, is described below.


Bongard Problem Solver ("B-cat")


[Figure 1: panels (a) to (d)]
Figure 1: Four sample Bongard problems, from M. Bongard, Pattern Recognition, Spartan Books, 1970. Each problem consists of two sets of six boxes, where the six boxes on the left represent one concept and the six boxes on the right represent a contrasting concept. The answers to the problems above are (a) vertical versus horizontal; (b) triangle versus quadrilateral; (c) triangle versus circle; (d) three versus four.

We are developing a computer program called "B-cat", a successor to Copycat designed to interpret and make analogies between visual figures. It focuses on a set of visual pattern-discovery problems called "Bongard problems" (M. Bongard, Pattern Recognition, Spartan Books, 1970; D. Hofstadter, Gödel, Escher, Bach: an Eternal Golden Braid, Basic Books, 1979).

Figure 1 above shows four problems from Bongard's collection; its caption describes how each problem is laid out and gives the concept that distinguishes the two sides. These problems all use simple black-and-white line drawings, yet they contain, in idealized form, many important issues in visual understanding. We believe that B-cat will be able to solve such problems. For each problem, the input will be the raw pixels that make up the 12 boxes; the output will be a representation of the problem that allows the program to classify any new box presented to it as "left", "right", or "neither".
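To make the intended output concrete, the sketch below shows only what a solved problem must support: given one concept per side, label a new box "left", "right", or "neither". Everything here is a hypothetical simplification, not B-cat's design. Boxes are hand-made feature dictionaries rather than raw pixels, and the concepts come from a small fixed vocabulary (`CONCEPTS`) instead of being discovered; the names `separates`, `learn`, and `classify` are illustrative only.

```python
# Hypothetical concept vocabulary over hand-made box descriptions.
CONCEPTS = {
    "triangle": lambda box: box["sides"] == 3,
    "quadrilateral": lambda box: box["sides"] == 4,
    "large": lambda box: box["area"] > 50,
}


def separates(concept, yes, no):
    """True if the concept holds for every box in `yes` and for none in `no`."""
    pred = CONCEPTS[concept]
    return all(pred(b) for b in yes) and not any(pred(b) for b in no)


def learn(left, right):
    """Pick one concept per side that splits the left boxes from the right ones."""
    left_c = next((c for c in CONCEPTS if separates(c, left, right)), None)
    right_c = next((c for c in CONCEPTS if separates(c, right, left)), None)
    return left_c, right_c


def classify(box, left_c, right_c):
    """Label a new box "left", "right", or "neither"."""
    in_left = left_c is not None and CONCEPTS[left_c](box)
    in_right = right_c is not None and CONCEPTS[right_c](box)
    if in_left and not in_right:
        return "left"
    if in_right and not in_left:
        return "right"
    return "neither"


# Hand-made stand-in for problem (b): triangles left, quadrilaterals right.
left = [{"sides": 3, "area": a} for a in (10, 40, 70, 20, 90, 55)]
right = [{"sides": 4, "area": a} for a in (15, 80, 35, 60, 25, 95)]
left_c, right_c = learn(left, right)
print(left_c, right_c)  # triangle quadrilateral
for new_box in ({"sides": 3, "area": 30}, {"sides": 4, "area": 5}, {"sides": 5, "area": 60}):
    print(new_box, classify(new_box, left_c, right_c))
```

A pentagon satisfies neither learned concept, so it is labeled "neither"; the "large" concept is rejected because box size varies on both sides.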

Doing this will require extracting and integrating both lower-level visual information (edges, segmentation of objects, characterization of shapes) and higher-level "conceptual" information (object recognition, perceptual organization, and abstract analogy-making). Visual processing in B-cat will not be purely feed-forward from low to high levels. Instead, we claim that information discovered at higher levels must feed back to lower levels and influence what is subsequently discovered there.
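The kind of top-down feedback described above can be illustrated with a deliberately small toy, which is our own assumption and not B-cat's planned mechanism. A bottom-up stage keeps only edge candidates above a fixed threshold; a high-level hypothesis (here simply a predicted number of sides, such as 3 for "triangle") can send the low level back to re-examine weaker candidates. The names `detect_edges` and `perceive` and the thresholds 0.5 and 0.2 are hypothetical.

```python
def detect_edges(strengths, threshold):
    """Bottom-up stage: indices of edge candidates at or above the threshold."""
    return [i for i, s in enumerate(strengths) if s >= threshold]


def perceive(strengths, expected_sides=None, threshold=0.5, relaxed=0.2):
    """Detect edges, letting a high-level hypothesis feed back to the low level.

    expected_sides is the number of sides the current high-level hypothesis
    predicts. If bottom-up detection found too few edges, the strongest
    weaker candidates above the relaxed threshold are accepted until the
    prediction is met or the candidates run out.
    """
    edges = detect_edges(strengths, threshold)
    if expected_sides is not None and len(edges) < expected_sides:
        extra = [i for i in detect_edges(strengths, relaxed) if i not in edges]
        extra.sort(key=lambda i: strengths[i], reverse=True)
        edges += extra[: expected_sides - len(edges)]
    return sorted(edges)


faint_triangle = [0.9, 0.8, 0.3, 0.1]  # one side is only weakly visible
print(perceive(faint_triangle))                    # [0, 1]: feed-forward only
print(perceive(faint_triangle, expected_sides=3))  # [0, 1, 2]: with feedback
```

The feed-forward pass misses the faint side; once the "triangle" hypothesis is in play, the same low-level evidence is read differently. Even with a prediction of four sides, the 0.1 candidate stays below the relaxed threshold, so feedback biases detection without inventing edges.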

This work complements and extends recent work by Harry Foundalis on Phaeaco, an architecture that can solve some Bongard problems.

Selected Publications from this Project: