Modeling Appearance Change in Image Sequences

(with David Fleet, Yaser Yacoob)

Examples of appearance change:

As Gibson noted, the world is made up of surfaces that ``flow or undergo stretching, squeezing, bending, and breaking in ways of enormous mechanical complexity.'' These events result in a wide variety of changes in the ``appearance'' of objects in a scene. While motion and illumination changes are common scene events that result in appearance change, numerous other events in nature also cause changes in appearance. For example, the color of objects can change due to chemical processes (e.g., oxidation), objects can change state (e.g., evaporation, dissolving), or objects can undergo radical changes in structure (e.g., exploding, tearing, rupturing, boiling). In this paper we formulate a general framework for representing appearance changes such as these. In so doing we have three primary goals. First, we wish to ``explain'' appearance changes in an image sequence as resulting from a ``mixture'' of causes. Second, we wish to locate where particular types of appearance change are taking place in an image. Third, we want to provide a framework that generalizes previous work on motion estimation.

We propose four generative models to ``explain'' the classes of appearance change illustrated above. A change in ``form'' is modeled as the motion of pixels from one image to the next. An image at time t+1 can then be explained by warping the image at time t using this image motion.
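As a rough sketch (not the authors' implementation), predicting the image at time t+1 by warping the image at time t with a dense flow field can be done with backward bilinear sampling; the uniform flow field below is a made-up example for illustration:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(img_t, u, v):
    """Predict the image at time t+1 by backward-warping img_t with flow (u, v).

    Each pixel (y, x) of the prediction samples img_t at (y - v, x - u)
    using bilinear interpolation."""
    h, w = img_t.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([yy - v, xx - u])
    return map_coordinates(img_t, coords, order=1, mode="nearest")

# Example: a uniform flow of one pixel to the right moves the bright pixel
# from x = 1 to x = 2.
img = np.zeros((5, 5))
img[2, 1] = 1.0
u = np.ones((5, 5))
v = np.zeros((5, 5))
pred = warp_image(img, u, v)
```

In practice the flow field would itself be parameterized (e.g., by a low-order polynomial) and estimated from the image pair rather than given.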

Illumination variations may be global, occurring throughout the entire image due to changes in the illuminant, or local as the result of shadowing. Here we model illumination change as a smooth function that amplifies/attenuates image contrast. By comparison, specular reflections are typically local and can be modeled, in the simplest case, as a near saturation of image intensity.
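A minimal sketch of such a smooth multiplicative model, assuming (for illustration only) a first-order polynomial field in image coordinates:

```python
import numpy as np

def illumination_field(params, h, w):
    """Smooth multiplicative illumination field m(x, y) = m0 + m1*x + m2*y.

    A first-order field is the simplest smooth model that can amplify or
    attenuate image contrast differently across the image."""
    m0, m1, m2 = params
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return m0 + m1 * xx + m2 * yy

# Example: a field that attenuates brightness toward the right, as a soft
# shadow might.
m = illumination_field((1.0, -0.01, 0.0), 4, 8)
img = np.full((4, 8), 100.0)
shadowed = m * img
```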

The fourth class of events considered in this paper is iconic change. We use the word ``iconic'' to indicate changes that are ``pictorial.'' These are systematic changes in image appearance that are not readily explained by physical models of motion, illumination, or specularity. A simple example is the blinking of the eye shown above. Examples of physical phenomena that give rise to iconic change include occlusion, disocclusion, changes in surface materials, and motions of non-rigid objects. In this paper we consider iconic changes to be object specific and we ``learn'' models of the iconic structure for particular objects.
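One common way to learn object-specific iconic structure is a low-dimensional linear basis computed from example images (a PCA-style model); the following is an illustrative sketch under that assumption, with random data standing in for real training images:

```python
import numpy as np

def learn_iconic_basis(examples, k):
    """Learn a k-dimensional linear basis from example images.

    `examples` has shape (n_images, n_pixels); returns the mean image and
    the top-k principal directions (rows of vt). A new image is then
    approximated by projecting onto the basis and reconstructing."""
    mean = examples.mean(axis=0)
    _, _, vt = np.linalg.svd(examples - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruct(image, mean, basis):
    """Approximate an image as mean plus its projection onto the basis."""
    coeffs = basis @ (image - mean)
    return mean + basis.T @ coeffs

# Random stand-in data; real use would flatten aligned training images.
rng = np.random.default_rng(0)
examples = rng.normal(size=(10, 16))
mean, basis = learn_iconic_basis(examples, k=3)
approx = reconstruct(examples[0], mean, basis)
```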

These different types of appearance change commonly occur together with natural objects; for example, with articulated human motion or the textural motion of plants, flags, water, etc. We employ a probabilistic mixture model formulation to recover the various types of appearance change and to perform a soft assignment, or classification, of pixels to causes. We use the EM-algorithm to iteratively compute maximum likelihood estimates for the deformation and iconic model parameters as well as the posterior probabilities that pixels at time t are explained by each of the causes. These probabilities are the ``weights'' illustrated below and they provide a soft assignment of pixels to causes.
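The E-step of such a mixture can be sketched as follows; this is a simplified illustration assuming Gaussian likelihoods with equal priors, not the robust formulation used in the paper:

```python
import numpy as np

def em_weights(residuals, sigmas):
    """E-step: per-pixel ownership weights for each appearance model.

    `residuals[i]` holds the per-pixel error under model i. With Gaussian
    likelihoods of scale sigmas[i] and equal priors, the posterior weight
    of each cause is its normalized likelihood."""
    residuals = np.asarray(residuals, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float).reshape(-1, 1)
    lik = np.exp(-0.5 * (residuals / sigmas) ** 2) / sigmas
    return lik / lik.sum(axis=0)

# Pixel 0 fits model A, pixel 1 fits model B, pixel 2 is ambiguous.
r = [[0.1, 3.0, 1.0],   # residuals under model A
     [3.0, 0.1, 1.0]]   # residuals under model B
w = em_weights(r, sigmas=[1.0, 1.0])
```

In the M-step the model parameters would be re-estimated with each pixel weighted by these posteriors, and the two steps iterated to convergence.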

Modeling appearance change as a mixture of causes:

Illumination-Change Example

Figure 1: (click to animate results)

a. b. c.

d. e. f.

In this example, the appearance variation between frames (Figure 1a) includes both global motion and an illumination change caused by a shadow of a hand in frame t+1. The estimated motion field (Figure 1f) contains some expansion as the background surface moved towards the camera. Figure 1b shows the second image warped towards the first based on the motion field.

Figure 1d shows the weights corresponding to the illumination model, while Figure 1e shows the weights corresponding to the motion model. The motion weights are near 1 (white) where the appearance change is captured by motion alone. Where there is illumination change as well as motion, these weights are near 0 (black). The gray regions indicate weights near 0.5, where the appearance change is equally well described by the two models.

We can produce a ``stabilized'' image by using the weights to combine information from the motion and illumination models. This stabilized image is shown in Figure 1c. Note that the shadow has been ``removed'' and the image is visually similar to the image at time t.

Specularity Example

Figure 2: (click to animate results)

a. b. c.

d. e. f.

Consider the example in Figure 2, in which a stapler with a prominent specularity on its metal plate is moved. We model this situation using a mixture of motion and specularity models. This simplified model of specularities assumes that some regions of the image at time t can be modeled as a warp of the image at time t+1, while others are best modeled by a linear brightness function.

The estimated flow field is shown in Figure 2f, and the motion-stabilized image is shown in Figure 2b. The stabilized image, using both the motion and the estimated linear brightness models, is shown in Figure 2c. The ownership weights for the specularity and motion models are shown in Figures 2d and 2e respectively. Note how the motion-model weights in Figure 2e are near zero where the specularity changes significantly. The region of specularity in the lower right corner of the metal plate is similar in both frames and hence is ``shared'' by both models.

Iconic Change Example

Figure 3: (click to animate results)

a.

b. c. d.

e. f. g.

Figure 3 shows the method applied to a smiling sequence in which some of the appearance change between frames is due to motion while some is iconic (notice the appearance of teeth between frames). Animate the sequence to see the lips moving in Figure 3a.

The motion model does a good job of capturing the deformation around the mouth but cannot account for the appearance of teeth. The recovered flow field is shown in Figure 3g and one can see the expansion of the mouth.

The iconic model, on the other hand, does a reasonable job of recovering an approximate representation of the image at time t (Figure 3b). The iconic model, however, does not capture the brightness structure of the lips in detail. This behavior is typical: the iconic model is an approximation to the brightness structure, so if the appearance change can be described as a smooth deformation, the motion model will likely do a better job of explaining this structure.

The behavior of the mixture model can be seen in the weights (Figures 3e and 3f). The weights for the motion model (3f) are near zero in the region of the teeth, near one around the high-contrast border of the lips, and near 0.5 in the untextured skin region, which is also well modeled by the iconic approximation.

Figure 3c is the ``stabilized'' image using just the motion model, while Figure 3d uses both the motion and iconic models to stabilize the sequence. Note how the stabilized image resembles the original image in Figure 3a. Also notice that the iconic model fills in around the edges of the stabilized image where no information was available for warping the image.

Related Publications

M. J. Black, D. J. Fleet, and Y. Yacoob, Robustly estimating changes in image appearance, Computer Vision and Image Understanding, Special Issue on Robust Statistical Techniques in Image Understanding, 78(1), pp. 8-31, 2000. (pdf)

M. J. Black, D. J. Fleet, and Y. Yacoob, A framework for modeling appearance change in image sequences, Sixth International Conf. on Computer Vision, ICCV'98, Mumbai, India, Jan. 1998, pp. 660-667. (postscript)

M. J. Black, Y. Yacoob, and D. J. Fleet, Modeling appearance change in image sequences, 3rd International Workshop on Visual Form, Capri, Italy, May 1997.