High-Order Markov Random Fields for Low-Level Vision

Stefan Roth
Ph.D. Dissertation, Brown University, May 2007.

Abstract. Low-level vision is a fundamental area of computer vision concerned with analyzing digital images at the pixel level and computing other dense, pixel-based representations of scenes, such as depth and motion. Many algorithms and models in low-level vision rely on a representation of prior knowledge about images or other dense scene representations. In the case of images, this prior knowledge expresses our a priori belief in observing a particular image among all conceivable images. Such prior knowledge can be supplied in a variety of ways; a wide range of low-level vision techniques represent it using Markov random fields (MRFs). MRFs are a compact and efficient probabilistic representation and are particularly well suited to spatially arranged data, such as the pixels of an image. Markov random fields have a long history in low-level computer vision; their representational power, however, has often been limited by restricting them to very local spatial structures, such as pairwise interactions between neighboring pixels.
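To make "very local spatial structures" concrete, the following is a minimal sketch of a classical pairwise MRF image prior, p(x) proportional to exp(-E(x)), in which each clique contains only two horizontally or vertically adjacent pixels. The robust potential rho(d) = alpha * log(1 + d^2 / 2) (the negative log of an unnormalized Student-t density) and the function name `pairwise_mrf_energy` are assumptions chosen for illustration, not the specific models evaluated in this work.

```python
import math

def pairwise_mrf_energy(img, alpha=1.0):
    """Energy of a pairwise MRF prior over a 4-connected pixel grid.

    Assumed potential (illustrative only): rho(d) = alpha * log(1 + d^2 / 2),
    applied to the difference d between each pair of neighboring pixels.
    The prior is p(x) proportional to exp(-E(x)).
    """
    h, w = len(img), len(img[0])
    energy = 0.0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:  # clique with the right-hand neighbor
                d = img[r][c + 1] - img[r][c]
                energy += alpha * math.log1p(0.5 * d * d)
            if r + 1 < h:  # clique with the neighbor below
                d = img[r + 1][c] - img[r][c]
                energy += alpha * math.log1p(0.5 * d * d)
    return energy

# A constant image has zero energy (highest prior probability);
# any local intensity change increases the energy.
flat = [[0.0] * 4 for _ in range(4)]
print(pairwise_mrf_energy(flat))
```

Each potential sees only a single pixel difference, so a model of this form cannot capture structure that spans more than two neighboring pixels.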
This dissertation introduces a novel, expressive Markov random field model for representing prior knowledge in low-level vision, for example about images and image motion (optical flow). This high-order MRF model, called Fields of Experts (FoE), represents interactions over larger spatial neighborhoods than many previous MRF models. Both learning the parameters of large MRF models from training data and inferring the quantity of interest (e.g., the noise-free image) are known to be very challenging, algorithmically as well as computationally. This is even more so for models that represent complex spatial interactions and have many parameters, such as the FoE model. This dissertation describes machine learning techniques that enable approximate learning and inference with these models. The core thesis developed in this work is that these high-order Markov random fields are more powerful models for representing prior knowledge in low-level vision than previous MRF models, and that they lead to competitive algorithms for problems as varied as image denoising and the estimation of image motion.
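As a rough illustration of what "high-order" means here, the sketch below defines an energy over all overlapping m x m cliques x_(k) of an image. Each clique is scored by several linear filters J_i through a heavy-tailed expert, giving E(x) = sum_k sum_i alpha_i * log(1 + (J_i^T x_(k))^2 / 2). It then denoises by plain gradient descent on E(x) + ||x - y||^2 / (2 sigma^2). The Student-t expert form, the hand-picked 2 x 2 derivative filters, the weights, the step size, and all function names are assumptions of this sketch: the filters are not learned from data, and gradient descent stands in for whatever approximate inference procedure the dissertation actually uses.

```python
import math

def cliques(img, m):
    """Yield top-left corners (r, c) of all overlapping m x m cliques."""
    h, w = len(img), len(img[0])
    for r in range(h - m + 1):
        for c in range(w - m + 1):
            yield r, c

def response(img, filt, r, c):
    """Linear filter response J_i^T x_(k) for the clique at (r, c)."""
    m = len(filt)
    return sum(filt[a][b] * img[r + a][c + b] for a in range(m) for b in range(m))

def foe_energy(img, filters, alphas):
    """E(x) = sum_k sum_i alpha_i * log(1 + 0.5 * (J_i^T x_(k))^2)."""
    m = len(filters[0])
    e = 0.0
    for r, c in cliques(img, m):
        for filt, alpha in zip(filters, alphas):
            y = response(img, filt, r, c)
            e += alpha * math.log1p(0.5 * y * y)
    return e

def foe_gradient(img, filters, alphas):
    """dE/dx: each clique adds alpha_i * y / (1 + 0.5 y^2) * J_i to its pixels."""
    h, w = len(img), len(img[0])
    m = len(filters[0])
    g = [[0.0] * w for _ in range(h)]
    for r, c in cliques(img, m):
        for filt, alpha in zip(filters, alphas):
            y = response(img, filt, r, c)
            psi = alpha * y / (1.0 + 0.5 * y * y)  # derivative of the log-expert
            for a in range(m):
                for b in range(m):
                    g[r + a][c + b] += psi * filt[a][b]
    return g

def denoise(noisy, filters, alphas, sigma, step=0.01, iters=50):
    """Gradient descent on E(x) + ||x - noisy||^2 / (2 sigma^2)."""
    x = [row[:] for row in noisy]
    for _ in range(iters):
        g = foe_gradient(x, filters, alphas)
        for r in range(len(x)):
            for c in range(len(x[0])):
                data_grad = (x[r][c] - noisy[r][c]) / sigma ** 2
                x[r][c] -= step * (g[r][c] + data_grad)
    return x

# Hypothetical filters: horizontal and vertical differences inside a 2 x 2 clique.
filters = [[[1.0, -1.0], [0.0, 0.0]], [[1.0, 0.0], [-1.0, 0.0]]]
alphas = [1.0, 1.0]
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
smoothed = denoise(checker, filters, alphas, sigma=1.0)
```

In a real FoE the filters would be larger and learned from training images, which is what lets the prior capture structure beyond neighboring pixel pairs; the small step size here is simply chosen to keep the descent stable for these filter magnitudes.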
