Thesis Proposal
"Detailed Human Shape and Pose from Images"
Alexandru Balan
Tuesday, October 7, 2008 at 3:00 P.M.
Room 367 (3rd Floor CIT)
Automating the process of measuring human characteristics and human dynamics from images lies at the core of many applications in computer vision, computer graphics, robotics and biomechanics. In its most general form, this problem is severely under-constrained, so practical solutions depend on the application: they either rely on simplifying assumptions and prior knowledge or engineer the environment appropriately.
In this thesis we demonstrate that using a data-driven model of the human body supports the recovery of both human shape and articulated pose from images, and has many benefits over previous body models. Specifically, we represent the body using a recently proposed triangulated mesh model called SCAPE, which employs a low-dimensional, but detailed, parametric model of shape and pose-dependent deformations. We show that the parameters of the SCAPE model can be estimated directly from image data in a variety of imaging conditions and present a series of techniques enabled by this model.
We first consider the case of multiple calibrated and synchronized camera views and assume the subject wears tight-fitting clothing. We define a cost function between image silhouettes and a hypothesized mesh and formulate the problem as an optimization over the body shape and pose parameters. Second, we relax the tight-fitting clothing assumption and develop a robust method that accounts for the fact that observed silhouettes of clothed people do not provide tight bounds on the true 3D shape. We think of these silhouettes as providing only weak constraints, but collect many of them while observing the subject in many poses. These, together with strong constraints from regions detected as skin, can be combined with a prior expectation of typical shapes to infer the most likely shape model under the clothes. Third, we consider scenes with strong lighting and show that a point light source and the corresponding cast shadow of the body on the ground provide an additional view equivalent to a silhouette from an actual camera. This means we can effectively reduce the number of cameras needed for successful recovery of the body model by taking advantage of the lighting information in the scene. Results on a novel database of thousands of images of clothed and "naked" subjects, as well as sequences from the HumanEva dataset, suggest these methods may be accurate enough for biometric shape analysis in video.
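The claim that a cast shadow acts as an additional view can be illustrated with a small geometric sketch. It is not the implementation from the thesis. It assumes an idealized point light, a flat ground plane at z = 0, and a hypothetical helper named cast_shadow. Under those assumptions, the shadow of a body point is where the ray from the light through that point meets the ground, which is the same central projection a pinhole camera placed at the light would perform onto that plane.

```python
def cast_shadow(point, light):
    """Project a 3D point onto the ground plane z = 0 from a point light.

    Illustrative assumption: the ground is the plane z = 0 and the light
    sits strictly above the point being projected.
    """
    px, py, pz = point
    lx, ly, lz = light
    if lz <= pz:
        raise ValueError("light must be above the point to cast a ground shadow")
    # Ray: light + t * (point - light); solve for the t where z reaches 0.
    t = lz / (lz - pz)
    return (lx + t * (px - lx), ly + t * (py - ly))


# A point 1 unit above the ground, lit from 2 units up, lands twice as far out.
shadow = cast_shadow((1.0, 0.0, 1.0), (0.0, 0.0, 2.0))
```

Projecting every mesh vertex this way gives a shadow outline on the ground that can be compared with the observed shadow in the same way a camera silhouette is compared with the hypothesized mesh.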
Host: Michael Black