The primary goal of scene sensing is to enable 3-D scene and object models to be reconstructed directly from sequences of video images. The idea is a simple one: we want to reconstruct and model the 3-D world using only a commercially available video camcorder, continuously videotaping the scene while moving the camcorder. The extracted 3-D models can be used in graphical user interfaces. We call the technology of reproducing 3-D models by scanning them in with a camcorder "videocopying" (an allusion to "photocopying" in 2-D).
We are also working on producing photorealistic single object models from video sequences using a variety of computer vision and computer graphics tools.
The project comprises two parts: 3-D data acquisition through omnidirectional multibaseline stereo (or multiple panoramic images), and extraction of 3-D models from 3-D data, as shown below.

We extract 3-D data of the scene using omnidirectional multibaseline stereo, i.e., from multiple composited 360-degree panoramic views of the scene. Each panoramic view at a particular camera location is created by compositing a sequence of images; this image sequence is taken while rotating the camera about the vertical axis through the camera optical center. The panoramic views are taken at different locations. The 3-D data can then be extracted by matching and triangulating.
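The triangulation step can be illustrated with a minimal sketch in plan view. It assumes a full 360-degree cylindrical panorama whose columns map linearly to azimuth, and that a feature has already been matched between two panoramas taken at known camera positions. The function names and the 2-D simplification are illustrative assumptions, not the project's actual implementation.

```python
import math

def column_to_azimuth(col, width):
    """Map a panorama column index to an azimuth in radians.

    Assumes the panorama spans exactly 360 degrees across `width` columns.
    """
    return 2.0 * math.pi * col / width

def triangulate_plan(c0, az0, c1, az1, eps=1e-12):
    """Intersect two horizontal bearing rays in the plan (x, y).

    c0, c1   -- camera positions as (x, y)
    az0, az1 -- azimuths of the matched feature seen from each camera
    Returns the intersection point, or None if the rays are parallel.
    """
    d0 = (math.cos(az0), math.sin(az0))
    d1 = (math.cos(az1), math.sin(az1))
    # 2-D cross product of the ray directions; zero means parallel rays.
    denom = d0[0] * d1[1] - d0[1] * d1[0]
    if abs(denom) < eps:
        return None
    bx, by = c1[0] - c0[0], c1[1] - c0[1]
    # Solve c0 + t0*d0 = c1 + t1*d1 for t0.
    t0 = (bx * d1[1] - by * d1[0]) / denom
    return (c0[0] + t0 * d0[0], c0[1] + t0 * d0[1])
```

With more than two panoramas, each additional view contributes another ray, and a least-squares intersection would typically replace the exact two-ray solution shown here.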
The advantage of using panoramic views is that 3-D data corresponding to a wide field of view is acquired all at once. The alternative is to merge many disparate depth maps from different pairs of views to generate similar 3-D output.
As an example, a sequence of rotated camera views (at a given camera position) of the vision-based interaction lab at CRL looks like this:

...

The composited panorama of the lab looks like this:
The plan view of the 3-D data points extracted from stereo and the recovered 3-D mesh are shown on the left and right, respectively:


Six panoramic images, successively taken about 6 inches apart, were used to extract the 3-D model of the lab. The longest dimensions of the lab are 15 feet by 25 feet.
Once the 3-D data have been acquired, the next step is to extract a 3-D model for rendering.
The model is represented as a 3-D mesh which comprises faces, each of which is texture-mapped with parts of actual images recorded earlier for realistic rendering. The initial 3-D mesh created from raw 3-D data is processed to simplify it (by reducing the number of faces) for faster rendering. In addition, this helps to reject outliers and reduce the effects of noise. These goals are accomplished by fitting planar patches to the acquired data. The display interface can be controlled using both the mouse and keyboard. For the lab example, we show an example view of the resulting 3-D model which has been texture-mapped:

To view example 3-D models (only if you have a VRML browser):
Panoramic test data are available.
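The planar-patch fitting used to simplify the mesh and reject outliers can be sketched as a least-squares plane fit followed by a residual test. This is a simplified illustration: it assumes the patch can be written as z = a*x + b*y + c (so it does not handle planes parallel to the z axis), and the function names and tolerance are hypothetical rather than taken from the project.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to a list of (x, y, z) points."""
    n = float(len(points))
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    # Normal equations A * [a, b, c]^T = rhs.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point configuration")
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k with the right-hand side.
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = rhs[i]
        coeffs.append(det3(Ak) / d)
    return tuple(coeffs)

def inliers(points, plane, tol):
    """Keep the points whose vertical residual to the plane is within tol."""
    a, b, c = plane
    return [p for p in points if abs(a * p[0] + b * p[1] + c - p[2]) <= tol]
```

A robust pipeline would usually alternate fitting and inlier selection, and would fit planes in arbitrary orientation (for example via the smallest eigenvector of the point covariance) so that walls are handled as well as floors.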
In general, multiple views are required to create a complete 3-D model of an object or a multi-roomed indoor scene. We have also worked on the problem of merging multiple textured 3-D data sets (for example, from omnidirectional multibaseline stereo), where each data set corresponds to a different view of a scene. There are two steps to the merging process: registration and integration.
Registration is the process by which data sets are brought into alignment. To this end, we use a modified version of the Iterative Closest Point (ICP) algorithm; our version, which we call color ICP, considers not only 3-D information but color as well. This has been shown to improve performance.
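One way to picture how color can enter the closest-point search is to measure distance in a combined position-and-color space. The sketch below shows only the correspondence step of a single ICP iteration, under the assumption that each point is stored as (x, y, z, r, g, b) and that color differences are scaled by a hypothetical `color_weight`. The exact distance measure used in color ICP is not given here, so this is an illustrative guess rather than the authors' formulation.

```python
def combined_distance_sq(p, q, color_weight):
    """Squared distance between two (x, y, z, r, g, b) points.

    Geometric and color terms are summed, with the color term
    scaled by color_weight (an assumed tuning parameter).
    """
    geometric = sum((a - b) ** 2 for a, b in zip(p[:3], q[:3]))
    color = sum((a - b) ** 2 for a, b in zip(p[3:], q[3:]))
    return geometric + color_weight * color

def closest_points(source, target, color_weight=0.1):
    """For each source point, return the index of its closest target point."""
    matches = []
    for p in source:
        best = min(range(len(target)),
                   key=lambda j: combined_distance_sq(p, target[j], color_weight))
        matches.append(best)
    return matches
```

Setting `color_weight` to zero reduces the search to the purely geometric case, which makes it easy to compare the two behaviors on the same data. The remaining steps of an ICP iteration (estimating the rigid transform from the matches and applying it) are unchanged and omitted.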
Once the 3-D data sets have been registered, we then integrate them to produce a seamless, composite 3-D textured model. Our approach to integration uses a 3-D occupancy grid to represent likelihood of spatial occupancy through voting. The occupancy grid representation allows the incorporation of sensor modeling. The surface of the merged model is recovered by detecting ridges in the occupancy grid, and subsequently polygonized using the standard Marching Cubes algorithm. Texture merging is accomplished by trilinear interpolation of overlapping textures corresponding to the original contributing data sets.
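The voting idea behind the occupancy grid can be sketched in a few lines. This simplified version assumes every registered 3-D point casts one equal vote for the voxel that contains it; it omits the sensor model, and a plain vote threshold stands in for the ridge detection described above. Names such as `accumulate_votes` and `min_votes` are hypothetical.

```python
import math
from collections import Counter

def accumulate_votes(points, voxel_size):
    """Count how many (x, y, z) points fall into each voxel of a regular grid."""
    grid = Counter()
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        grid[key] += 1
    return grid

def occupied_voxels(grid, min_votes):
    """Return the voxel indices that received at least min_votes votes.

    A crude stand-in for ridge detection: it keeps well-supported voxels
    rather than locating local maxima of the occupancy likelihood.
    """
    return {key for key, votes in grid.items() if votes >= min_votes}
```

In a fuller treatment, each measurement would spread its vote over nearby voxels according to the sensor's uncertainty, and the resulting scalar field would then be passed to a surface extractor such as Marching Cubes.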
A result of merging multiple 3-D data sets is shown below. The six registered 3-D data sets are shown in (a), with each shaded colored sphere marking the center of the corresponding data set. A horizontal slice of the recovered surface probability distribution is shown in (b), and (c)-(e) show the resulting surface of the merged model from different viewpoints.

The figure below shows the full textured merged model. 
Sing Bing Kang 29 Jan 1997