Assignment 4

    Task 1 Due Feb 27.

    Tasks 2&3 Due Mar 5.

Goal:

The previous assignment gave us the pose of the mean head in every frame. Now we will estimate the head shape by combining data from multiple frames. For now, we will still use only the clicked image points.

Data and support code

You need the code you wrote for Assignment 3 and the clicked points in the course directory.

Task 1 (25pts) 

Initialization. Take the manually clicked points for each frame. Extend your method from Assignment 3 to use multiple sets of clicked points: read in all lmark data for a given frame and optimize the head pose. This will be a bit painful, as you need to read in all the data, some frames are missing, and some frames may have more lmarks than others. Show several frames illustrating the initial head pose.
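One way to handle the bookkeeping is to group every landmark record by frame before fitting, so that missing frames never appear and each frame can hold any number of landmark sets of different sizes. This is a minimal sketch, assuming each record has already been parsed into a frame index, an array of model-vertex indices, and an M x 2 array of clicked pixel positions; the record layout and the helper names `group_by_frame` and `pose_error` are hypothetical, not part of the course code. The per-frame pose error is simply a sum over all sets in that frame, and `project` stands in for whatever projection function you wrote for Assignment 3.

```python
from collections import defaultdict

import numpy as np


def group_by_frame(records):
    """Group (frame, vertex_ids, points_2d) records by frame index.

    Frames with no clicked points never get a key, and a frame may
    collect any number of landmark sets, each of any size.
    """
    frames = defaultdict(list)
    for frame, vertex_ids, points_2d in records:
        frames[frame].append(
            (np.asarray(vertex_ids), np.asarray(points_2d, dtype=float))
        )
    return dict(frames)


def pose_error(project, pose, vertices, lmark_sets):
    """Sum of squared pixel errors over every landmark set in one frame.

    `project(pose, X)` is assumed to map M x 3 model points to M x 2 pixels.
    """
    total = 0.0
    for vertex_ids, points_2d in lmark_sets:
        predicted = project(pose, vertices[vertex_ids])
        total += np.sum((predicted - points_2d) ** 2)
    return total
```

Passing `lambda pose: pose_error(project, pose, vertices, grouped[frame])` to your optimizer then fits one frame's pose against all of its landmark sets at once.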

Given your initial pose estimate in each frame, determine which model points are visible in that frame. Lift the image texture at the visible points in each frame and compute the average face texture across all frames (i.e., for each vertex, divide the sum of its colors by the number of frames in which it is visible). Show your mean head.
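The averaging step can be sketched as two accumulators, a per-vertex color sum and a per-vertex visibility count, followed by one division. This minimal sketch assumes the visibility test and the color sampling at the projected vertex positions are done elsewhere; `average_texture` and its input layout are hypothetical names for illustration.

```python
import numpy as np


def average_texture(num_vertices, observations):
    """Average per-vertex color over all frames in which a vertex is visible.

    `observations` yields (visible_ids, colors) per frame, where colors is
    len(visible_ids) x 3 (e.g. RGB sampled at the projected vertex positions).
    Returns the mean colors and the visibility counts; vertices that are
    never visible keep a color of zero.
    """
    color_sum = np.zeros((num_vertices, 3))
    count = np.zeros(num_vertices, dtype=int)
    for visible_ids, colors in observations:
        visible_ids = np.asarray(visible_ids)
        # np.add.at accumulates correctly even if an index repeats in a frame
        np.add.at(color_sum, visible_ids, np.asarray(colors, dtype=float))
        np.add.at(count, visible_ids, 1)
    mean = np.zeros_like(color_sum)
    seen = count > 0
    mean[seen] = color_sum[seen] / count[seen][:, None]
    return mean, count
```

Returning the counts as well makes it easy to highlight vertices that were never seen, which otherwise show up as black.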

Task 2 (50pts)

Optimization. Given your initial head pose in every frame, keep this fixed and update the shape parameters by minimizing the following objective function (or one you define yourself) over the linear shape coefficients c={c_1, ..., c_K}:

E(c) = sum_{i=1}^N || P(R_i (mu + sum_{j=1}^K c_j b_j) + t_i) - u_i ||^2 + lambda E_s(c)

where N is the number of frames, K is the number of basis heads b_j (e.g., 3-6), mu is the mean head, R_i and t_i are the head rotation and translation in frame i, P(.) is the perspective projection operation, the model points are x = mu + sum_{j=1}^K c_j b_j, and u_i are the corresponding image points. Note that you'll actually have an additional sum over each of the input sets of lmarks for a given frame. I've also left out any indexing into the lmarks for clarity.
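The data term of E(c), including the extra sum over lmark sets, might look like the sketch below with the poses held fixed. It assumes a simple pinhole camera, P(X) = f * (X/Z, Y/Z) + pp, with focal length `f` and principal point `pp` in pixels; substitute the camera model you used in Assignment 3. It also assumes the mean head is stored as a V x 3 array and the basis heads as a K x V x 3 array. All function names here are hypothetical.

```python
import numpy as np


def project(X, f, pp):
    """Pinhole projection of M x 3 camera-frame points to M x 2 pixels (assumed camera model)."""
    return f * X[:, :2] / X[:, 2:3] + pp


def shape_from_coeffs(mu, basis, coeffs):
    """mu: V x 3 mean head, basis: K x V x 3 basis heads, coeffs: length-K vector."""
    return mu + np.tensordot(coeffs, basis, axes=1)


def data_term(coeffs, mu, basis, frames, f, pp):
    """Sum over frames i and their lmark sets of ||P(R_i x + t_i) - u||^2, pose fixed.

    frames: list of (R, t, lmark_sets), with lmark_sets a list of
    (vertex_ids, u) where u holds the clicked M x 2 pixel positions.
    """
    shape = shape_from_coeffs(mu, basis, coeffs)
    total = 0.0
    for R, t, lmark_sets in frames:
        for vertex_ids, u in lmark_sets:
            # rows are points, so x @ R.T applies R to each model point
            X_cam = shape[vertex_ids] @ R.T + t
            total += np.sum((project(X_cam, f, pp) - u) ** 2)
    return total
```

Because only `coeffs` varies, this function (plus the prior term) is what you would hand to fminsearch or an equivalent derivative-free optimizer.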

You should be able to use the same optimization technique (fminsearch) as in Assignment 3; I tried this on one frame and it worked.

E_s(c) is a prior on the coefficients (see the B&V paper). I don't know whether it is needed, so vary the scalar weight lambda and see whether it has much effect.
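A common choice, and the form used in Blanz and Vetter's morphable model, is a Gaussian prior E_s(c) = sum_j (c_j / sigma_j)^2, where sigma_j is the standard deviation of the training shapes along basis head j. This sketch assumes your basis heads come from PCA with known sigma_j; if they are not available, setting every sigma_j = 1 reduces it to a plain penalty on the size of c. The names are hypothetical.

```python
import numpy as np


def shape_prior(coeffs, sigmas):
    """Gaussian prior energy sum_j (c_j / sigma_j)^2 (assumes PCA standard deviations sigma_j)."""
    coeffs = np.asarray(coeffs, dtype=float)
    return float(np.sum((coeffs / np.asarray(sigmas, dtype=float)) ** 2))


def total_energy(data_energy, coeffs, sigmas, lam):
    """E(c) = data term + lambda * E_s(c)."""
    return data_energy + lam * shape_prior(coeffs, sigmas)
```

To see whether the prior matters, rerun the fit with a few values of `lam` spanning several orders of magnitude (including 0) and compare the recovered coefficients and pixel errors.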

Note that there may be gross errors in the clicked points. The objective function I defined above gives a quadratic penalty no matter how far off a point is, so a single bad click can dominate the fit. You should make this more robust. The easiest way is to set a threshold on the distance, e.g., capping the penalty for any point whose reprojection error exceeds it.
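One way to apply the threshold is a truncated quadratic: each point contributes min(||r||^2, tau^2), where r is its pixel residual and tau is a threshold in pixels that you choose. This is a minimal sketch; `truncated_sq_error` is a hypothetical name, and you would use it in place of the plain squared error inside your objective.

```python
import numpy as np


def truncated_sq_error(predicted, observed, tau):
    """Per-point squared pixel error capped at tau**2 (a truncated quadratic).

    Points farther than tau pixels contribute a constant tau**2, so a grossly
    wrong click cannot dominate the sum. Returns the total and an inlier mask.
    """
    diff = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    sq = np.sum(diff ** 2, axis=1)
    inliers = sq <= tau ** 2
    return float(np.sum(np.minimum(sq, tau ** 2))), inliers
```

Capping the penalty, rather than discarding far points outright, keeps the objective continuous in the point positions, which suits a derivative-free optimizer such as fminsearch. The inlier mask is handy for reporting which clicks were treated as outliers.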

Task 3 (25pts)

Given the new shape, hold it fixed and recompute the head pose in each frame. Report what happens to the squared pixel errors of the projected points. Repeat this, alternating between estimating pose and shape, until convergence.
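The alternation can be written as a small driver that takes the two sub-fits as callables and records the error after every half-step, which gives you the numbers to report. This is a sketch under assumptions: `fit_pose`, `fit_shape`, and `error` are hypothetical callables; in this assignment `fit_pose` would rerun the Assignment 3 pose fit in every frame, and `fit_shape` would run the Task 2 optimization. The stopping rule (relative improvement per round below `tol`) is one reasonable choice, not the only one.

```python
def alternate(fit_pose, fit_shape, error, pose, shape, tol=1e-6, max_iters=50):
    """Alternate pose and shape updates until the error stops decreasing.

    fit_pose(shape, pose) -> new pose with the shape held fixed
    fit_shape(pose, shape) -> new shape with the pose held fixed
    error(pose, shape) -> total squared pixel error
    Returns the final pose, shape, and the error after every half-step.
    """
    history = [error(pose, shape)]
    for _ in range(max_iters):
        pose = fit_pose(shape, pose)
        history.append(error(pose, shape))
        shape = fit_shape(pose, shape)
        history.append(error(pose, shape))
        # stop when a full pose+shape round improves the error by less than tol (relative)
        if history[-3] - history[-1] <= tol * max(history[-3], 1e-12):
            break
    return pose, shape, history
```

If each sub-fit actually lowers the objective, `history` should never increase; an increase usually means one of the optimizers stopped early or the robust threshold changed which points count as inliers.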

Show the recovered head from several viewpoints (front, profile, and 3/4 views).
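The three views differ only by a rotation of the head about its vertical axis before rendering. This sketch assumes the head model uses y as the vertical axis and faces the camera along z in the frontal view; if your model's axes differ, change the rotation axis accordingly. The yaw angles (0, 45, and 90 degrees) and the names `yaw_rotation` and `rotated_views` are illustrative assumptions; render the rotated vertices with whatever viewer you already use.

```python
import numpy as np


def yaw_rotation(degrees):
    """Rotation about the vertical (y) axis by the given angle in degrees."""
    a = np.deg2rad(degrees)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])


# Assumed yaw angles for the requested views, relative to a frontal head.
VIEWS = {"front": 0.0, "three_quarter": 45.0, "profile": 90.0}


def rotated_views(vertices):
    """Return the V x 3 head vertices, centered, then rotated into each named view."""
    centered = vertices - vertices.mean(axis=0)
    return {name: centered @ yaw_rotation(deg).T for name, deg in VIEWS.items()}
```

Centering first makes the head rotate about its own centroid, so it stays in frame for every view.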

Texture map the head by averaging the appearance from all the frames (as in Task 1).  Show your result.