Perceiving and Recognizing Objects

Chapter 4 of the textbook discusses “perceptual committees” and while we hope that is a useful way to think about what’s going on, it is just a metaphor. There are formal, mathematical models of the way knowledge about regularities in the world constrains the interpretation of ambiguous sensory input. One of the most fruitful approaches is known as the Bayesian approach (Kersten et al., 2004; Yuille and Kersten, 2006). Thomas Bayes (1702–1761), an eighteenth-century Presbyterian minister, was the first to describe the relevant mathematics.

 

 

The image above is a version of textbook Figure 4.36. Here, we retell the story of that figure with a bit more formal detail. Our visual system looks at the stimulus and makes an observation (call it “O”). Suppose we wanted to make our best guess as to what situation in the real world produced that observation. Let’s consider three hypotheses, illustrated in the figure:

  • Hypothesis A says that we’re looking at two green rectangles. The light one has a narrow letter E on it, and the dark one has a narrow numeral 3. These just happen to precisely line up in order to produce a B like the one we observed.
  • Hypothesis B says we’re looking at a green square—half light and half dark green—with a C on it.
  • Hypothesis C says we’re looking at the same green square, but it really has a B on it.

Of course, an infinite number of other hypotheses could be entertained here (it’s really a chicken, it’s really dinner, and so on), but these three will do as representatives.

Which hypothesis represents the most likely explanation of the observation? We can represent the likelihood of hypothesis A as P(A|O), which we would read as the probability of hypothesis A, given observation O. We want to know which probability is the highest: P(A|O), P(B|O), or P(C|O).

Bayes’ theorem gives us a way of computing these probabilities. Specifically, the theorem says:

$P(A|O) = {P(O|A)\,\times\,P(A)\,\over P(O)}$

Let’s unpack this equation a bit. P(A) is the probability that hypothesis A ever happens. Which hypothesis would we think was correct if we had our eyes closed and never made observation O? We don’t know precisely, but given what we know about the world, we can assume that both P(B) and P(C) are likely bigger than P(A), since hypothesis A relies on a very precise and therefore unlikely alignment of a skinny E with a skinny 3. P(A) is what is known as the prior probability; the probability that a given hypothesis is correct before any observations are made.

The term P(O|A), asks, if a given hypothesis is correct, how likely we were to make the observation O. In the figure, hypotheses A and C are likely to lead to observations much like O. Hypothesis B is unlikely. (What are the chances that the letter looks like a B but it is really a C?) So, P(O|A) and P(O|C) are both larger than P(O|B).

The third term, P(O), asks, of all possible observations we could possibly have made, anywhere in the world, what’s the probability of observing O? This number can be difficult to calculate, but thankfully it’s the same for all three hypotheses, so we can ignore the effects of P(O).

What, then, is our best guess about the situation in the real world? Hypotheses B and C have better prior probabilities: P(B) and P(C). Hypotheses A and C have better likelihoods: P(A|O) and P(C|O). Overall, the Bayesian approach will tell us that hypothesis C is a better bet than either hypothesis A or hypothesis B (and that all those chicken and dinner hypotheses are vastly unlikely).

 

References

Kersten, D., Mamassian, P. and Yuille, A. (2004). Object perception as Bayesian inference. Annu Rev Psychol 55: 271–304.

Yuille, A. and Kersten, D. (2006). Vision as Bayesian inference: Analysis by synthesis? Trends Cogn Sci 10: 301–308.

Back to top