
Camera and Structure Model

Assume that a rigid surface in a 3-D scene is moving relative to a stationary camera, and let points in the scene be defined with respect to the camera reference frame, with the z-axis aligned with the optical axis and the image plane lying in the xy-plane. As illustrated in Fig. 1, the origin is then at the point where the optical axis cuts the image plane. For this model, perspective projection of a 3-D point $\mathbf{P} = (X, Y, Z)^T$ onto an image point $\mathbf{x} = (x, y)^T$ is given by

\[ \mathbf{x} = \frac{1}{1 + \beta Z} \begin{pmatrix} X \\ Y \end{pmatrix} \tag{1} \]

where $\beta = 1/f$ is the inverse focal length. In contrast to the usual approach, in which the origin is at the centre of projection, this model decouples the representation of depth from that of the camera, ie the focal length, which enables independent estimation of both, as in [1]. The 2-D motion $\dot{\mathbf{x}} = (\dot{x}, \dot{y})^T$ induced by the motion $\dot{\mathbf{P}} = (\dot{X}, \dot{Y}, \dot{Z})^T$ of a 3-D point is then given by

\[ \dot{\mathbf{x}} = \frac{1}{1 + \beta Z} \left[ \begin{pmatrix} \dot{X} \\ \dot{Y} \end{pmatrix} - \beta \dot{Z} \, \mathbf{x} \right] \tag{2} \]
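
As a small numerical illustration of the projection and its time derivative, the sketch below assumes that eqn (1) has the form x = X/(1 + beta Z), y = Y/(1 + beta Z), with beta the inverse focal length, and that eqn (2) follows from differentiating it with respect to time. The function names `project` and `image_velocity` are hypothetical and are used here for illustration only.

```python
def project(X, Y, Z, beta):
    """Project a 3-D point onto the image plane (assumed form of eqn 1).

    beta is the inverse focal length; the origin lies on the image plane.
    """
    s = 1.0 + beta * Z
    return X / s, Y / s


def image_velocity(P, P_dot, beta):
    """2-D velocity induced by a 3-D point P moving with velocity P_dot.

    Obtained by differentiating the assumed projection with respect to time.
    """
    X, Y, Z = P
    dX, dY, dZ = P_dot
    x, y = project(X, Y, Z, beta)
    s = 1.0 + beta * Z
    return (dX - beta * x * dZ) / s, (dY - beta * y * dZ) / s


# Example values (arbitrary): a point 10 units beyond the image plane.
point = (2.0, 4.0, 10.0)
velocity = (1.0, -2.0, 3.0)
image_point = project(*point, beta=0.1)
image_motion = image_velocity(point, velocity, beta=0.1)
```

Under this assumed form, setting `beta = 0` leaves X and Y unscaled, so depth remains a well-defined quantity whatever the focal length.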

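Expressing the 3-D motion of a point $\mathbf{P} = (X, Y, Z)^T$ in terms of the instantaneous rectilinear and angular velocities, $\mathbf{T} = (T_x, T_y, T_z)^T$ and $\boldsymbol{\Omega} = (\Omega_x, \Omega_y, \Omega_z)^T$ respectively, ie

\[ \dot{\mathbf{P}} = \mathbf{T} + \boldsymbol{\Omega} \times \mathbf{P} \tag{3} \]

then gives an alternate form of the basic motion equations:

\begin{align}
\dot{x} &= \frac{T_x - \beta x T_z + \Omega_y Z}{1 + \beta Z} - \Omega_z y - \beta x (\Omega_x y - \Omega_y x) \tag{4} \\
\dot{y} &= \frac{T_y - \beta y T_z - \Omega_x Z}{1 + \beta Z} + \Omega_z x - \beta y (\Omega_x y - \Omega_y x) \tag{5}
\end{align}

where, because of the difference in origin, the depth $Z$ and the angular velocity are no longer decoupled (the depth multiplies $\Omega_x$ and $\Omega_y$ in the first term of each equation), as they are for the usual camera model [4].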
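
The coupling between depth and angular velocity can be checked numerically. The sketch below is written under stated assumptions: the projection is taken to be x = X/(1 + beta Z), y = Y/(1 + beta Z), and a scene point is taken to move with velocity T + Omega x P about the image-plane origin. The sign convention and the function names (`cross`, `motion_field`, `motion_field_direct`) are assumptions made for illustration. The closed form is compared with the direct route of back-projecting the image point, applying the rigid motion and differentiating the projection.

```python
def cross(a, b):
    """Vector product a x b of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def motion_field(x, y, Z, T, Omega, beta):
    """Image velocity at (x, y) for depth Z, in closed form (assumed eqns 4, 5)."""
    Tx, Ty, Tz = T
    Ox, Oy, Oz = Omega
    s = 1.0 + beta * Z
    r = Ox * y - Oy * x  # rotational part of dZ/dt, divided by s
    u = (Tx - beta * x * Tz + Oy * Z) / s - Oz * y - beta * x * r
    v = (Ty - beta * y * Tz - Ox * Z) / s + Oz * x - beta * y * r
    return u, v


def motion_field_direct(x, y, Z, T, Omega, beta):
    """Same velocity via back-projection and the rigid motion T + Omega x P."""
    s = 1.0 + beta * Z
    P = (x * s, y * s, Z)  # 3-D point that projects to (x, y)
    w = cross(Omega, P)
    dX, dY, dZ = T[0] + w[0], T[1] + w[1], T[2] + w[2]
    return (dX - beta * x * dZ) / s, (dY - beta * y * dZ) / s


# Example values (arbitrary); the two routes should agree to rounding error.
example = (0.3, -0.2, 5.0, (1.0, 0.5, 0.2), (0.01, 0.02, 0.03), 0.1)
closed_form = motion_field(*example)
direct = motion_field_direct(*example)
```

In this form the terms `Oy * Z / s` and `-Ox * Z / s` depend on depth, whereas a rotation about the optical axis alone (`Omega = (0, 0, Oz)`) gives a field that does not.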

Figure 1: Camera and surface model

The structure model is based on the assumption that the scene consists of smooth surfaces that can be modelled by piecewise planar approximations, defined by sets of local normals and depths. Denoting the unit normal of one such surface at a point $\mathbf{P}_0$ by $\mathbf{n} = (n_x, n_y, n_z)^T$, and combining the equation of the tangent plane, ie

\[ \mathbf{n}^T (\mathbf{P} - \mathbf{P}_0) = 0 \tag{6} \]

with the projection model in eqn (1), the variation in depth about the projected point $\mathbf{x}_0$ can be approximated by

\[ Z(\mathbf{x}) = \frac{d - n_x x - n_y y}{n_z + \beta (n_x x + n_y y)} \tag{7} \]

where $d = \mathbf{n}^T \mathbf{P}_0$ is the perpendicular distance from the origin to the tangent plane at the surface point, as shown in Fig. 1. Replacing $Z$ with this expression in equations (4) and (5) then gives a non-linear expression for the motion field about $\mathbf{x}_0$ in terms of the spatial coordinates, 3-D motion, surface normal and focal length. Denoting this expression by $\mathbf{v}_k(\mathbf{x})$, where $k$ indicates the dependence on the local planar structure, a six parameter affine approximation to the motion field can be obtained by linearising about the projected point $\mathbf{x}_0$, ie

\[ \mathbf{v}_k(\mathbf{x}) \approx \mathbf{v}_k(\mathbf{x}_0) + \mathbf{J}_k(\mathbf{x}_0) (\mathbf{x} - \mathbf{x}_0) \tag{8} \]

where $\mathbf{J}_k(\mathbf{x}_0)$ is the Jacobian of $\mathbf{v}_k(\mathbf{x})$ evaluated at $\mathbf{x}_0$. The six parameters are the two components of $\mathbf{v}_k(\mathbf{x}_0)$ and the four elements of the Jacobian. Thus, for a rigid surface moving with motion $(\mathbf{T}, \boldsymbol{\Omega})$ (or, alternatively, surfaces in a static scene viewed by a moving camera), eqn (8) defines an affine approximation to the motion field associated with local patches on the surface. Note that the affine parameters are non-linearly related to the 3-D structure and motion, hence the use of an extended Kalman filter (EKF) to estimate the structure and motion, which is considered in Section 4. How such affine estimates are obtained from an image sequence is considered next.
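
The linearisation step can be illustrated with a short numerical sketch. Everything in it is an assumption made for illustration: the projection is taken as x = X/(1 + beta Z), y = Y/(1 + beta Z), a scene point is taken to move with velocity T + Omega x P, the local plane is written as n . P = d, and the Jacobian is approximated by central differences instead of being derived analytically. The function names are hypothetical.

```python
def plane_depth(x, y, n, d, beta):
    """Depth Z at image point (x, y) of the plane n . P = d (assumed eqn 7)."""
    nx, ny, nz = n
    return (d - nx * x - ny * y) / (nz + beta * (nx * x + ny * y))


def motion_field(x, y, Z, T, Omega, beta):
    """Image velocity at (x, y) for depth Z (assumed form of eqns 4 and 5)."""
    Tx, Ty, Tz = T
    Ox, Oy, Oz = Omega
    s = 1.0 + beta * Z
    r = Ox * y - Oy * x
    u = (Tx - beta * x * Tz + Oy * Z) / s - Oz * y - beta * x * r
    v = (Ty - beta * y * Tz - Ox * Z) / s + Oz * x - beta * y * r
    return u, v


def planar_field(x, y, n, d, T, Omega, beta):
    """Motion field of a planar patch: depth comes from the plane equation."""
    return motion_field(x, y, plane_depth(x, y, n, d, beta), T, Omega, beta)


def affine_params(x0, y0, n, d, T, Omega, beta, h=1e-5):
    """Six affine parameters about (x0, y0): field value v0 and 2x2 Jacobian J.

    J is estimated numerically by central differences with step h.
    """
    v0 = planar_field(x0, y0, n, d, T, Omega, beta)
    fxp = planar_field(x0 + h, y0, n, d, T, Omega, beta)
    fxm = planar_field(x0 - h, y0, n, d, T, Omega, beta)
    fyp = planar_field(x0, y0 + h, n, d, T, Omega, beta)
    fym = planar_field(x0, y0 - h, n, d, T, Omega, beta)
    J = [[(fxp[0] - fxm[0]) / (2 * h), (fyp[0] - fym[0]) / (2 * h)],
         [(fxp[1] - fxm[1]) / (2 * h), (fyp[1] - fym[1]) / (2 * h)]]
    return v0, J


def affine_field(x, y, x0, y0, v0, J):
    """Evaluate the affine approximation v0 + J (x - x0) (form of eqn 8)."""
    dx, dy = x - x0, y - y0
    return (v0[0] + J[0][0] * dx + J[0][1] * dy,
            v0[1] + J[1][0] * dx + J[1][1] * dy)
```

A typical use would be to call `affine_params` once per patch centre and compare `affine_field` with `planar_field` at nearby image points to see how far from the centre the affine approximation remains acceptable.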



Andrew Calway
Mon Dec 4 11:27:23 GMT 2000