Variational Recursive Joint Estimation of Dense Scene Structure and Camera Motion from Monocular High Speed Traffic Sequences (ICCV 2011)

Variational Recursive Joint Estimation of Dense Scene Structure and Camera Motion from Monocular Image Sequences

Principal investigators: Florian Becker, Frank Lenzen, Jörg H. Kappes, Christoph Schnörr

We investigate an approach to jointly estimating camera motion and dense scene structure, in terms of depth maps, from monocular image sequences in driver-assistance scenarios. For two consecutive frames of a sequence taken with a single fast-moving camera, the approach combines numerical estimation of egomotion on the Euclidean manifold of motion parameters with variational regularization of dense depth map estimation. Embedding this online joint estimator into a recursive framework achieves a pronounced spatio-temporal filtering effect and robustness. We report an evaluation on thousands of images taken from a car moving at speeds of up to 100 km/h. The results compare favorably with two alternative settings that require more input data: stereo-based scene reconstruction and camera motion estimation in batch mode using multiple frames.
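The coupling between depth and camera motion that a joint estimator exploits can be illustrated with a minimal two-frame pinhole model. The sketch below is not the authors' implementation: the function name `reproject`, the focal length, the principal point (taken here as the image center) and the example values are all assumptions chosen for illustration. It back-projects a pixel with a given depth, applies a rigid motion (a rotation about the vertical axis plus a translation), and projects the point into the second frame.

```python
import math

def reproject(x, y, depth, yaw, t, f=500.0, cx=328.0, cy=270.5):
    """Map pixel (x, y) with the given depth from frame 1 into frame 2.

    Assumed model: pinhole camera with focal length f and principal
    point (cx, cy); the motion X' = R X + t uses a rotation R by `yaw`
    (radians) about the camera's vertical axis and a translation t.
    """
    # Back-project the pixel to a 3D point in the first camera frame.
    X = (x - cx) / f * depth
    Y = (y - cy) / f * depth
    Z = depth
    # Rotate about the vertical (y) axis, then translate.
    c, s = math.cos(yaw), math.sin(yaw)
    X2 = c * X + s * Z + t[0]
    Y2 = Y + t[1]
    Z2 = -s * X + c * Z + t[2]
    # Project the transformed point into the second image.
    return (f * X2 / Z2 + cx, f * Y2 / Z2 + cy)

# Hypothetical example: the scene moves 1.5 units closer along the
# optical axis (camera driving forward), combined with a slight yaw.
p = reproject(400.0, 300.0, depth=20.0, yaw=0.01, t=(0.0, 0.0, -1.5))

# Scaling depth and translation by the same factor gives the same pixel.
q = reproject(400.0, 300.0, depth=40.0, yaw=0.01, t=(0.0, 0.0, -3.0))
```

In this model the displacement of a pixel depends on both its depth and the camera motion, which is why the two quantities are estimated together. The second call shows that doubling the depth and the translation leaves the projected position unchanged, so the absolute scale of a scene cannot be determined from a single moving camera.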

Input Image Sequences and HCI Benchmark Database

The image sequences used in Becker et al. (2011) are available for download here. They are part of a database that aims to provide a benchmark for computer vision algorithms in the context of automotive applications.

Details on the acquisition procedure of the image sequences are documented in Meister et al. (2012).

Supplemental Material

The supplemental material for the paper Becker et al. (2011) contains the results for the image sequences discussed there (the Avenue and Bend sequences), as well as results for additional sequences and different visualizations. The input data is available for download here.

All videos were encoded using ffmpeg/Linux with codec mpeg4, codecflag divx and output format avi, and thus contain compression artifacts. Uncompressed still images are provided at full spatial resolution (656 x 541 pixels) on the right. For these, the gray values of the input images were reduced to 8 bit depth and histogram equalization was applied to increase the contrast for better visualization. Depth maps are encoded using a non-linear color map. As the global scale of the scene cannot be reconstructed from a monocular sequence, the depth maps are unique only up to a scale factor, which is constant for each sequence.
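The preprocessing of the still images (reduction to 8 bit followed by histogram equalization) can be sketched as follows. This is a minimal illustration in plain Python, not the tool used to produce the images. The function names and the input bit depth of 12 are assumptions, since the original gray-value range is not stated here.

```python
def to_8bit(pixels, in_bits=12):
    """Reduce gray values with `in_bits` bits to 8 bits by dropping low bits.

    in_bits=12 is an assumed default; the source bit depth is not
    stated in the text.
    """
    shift = in_bits - 8
    return [[v >> shift for v in row] for row in pixels]

def equalize(pixels):
    """Histogram-equalize an 8-bit image given as a list of rows."""
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    # Cumulative distribution of the gray values.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return [row[:] for row in pixels]
    # Map each gray value so that the CDF becomes approximately linear.
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in pixels]

# Hypothetical low-contrast 12-bit image (2 rows, 3 columns).
raw = [[1600, 1616, 1632], [1648, 1664, 1680]]
img8 = to_8bit(raw)    # [[100, 101, 102], [103, 104, 105]]
out = equalize(img8)   # [[0, 51, 102], [153, 204, 255]]
```

The six nearly identical gray values of the example are spread over the full 8-bit range, which is the contrast increase that makes dark or hazy traffic scenes easier to inspect visually.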

Avenue sequence: This sequence shows a ride at about 95 km/h along a straight avenue lined with trees.
- movie: the original sequence with the computed depth map superimposed [AVI]
- movie: the original sequence and the computed depth map displayed separately [AVI]
- movie: comparison of our approach without temporal regularization (left depth map) and with temporal regularization (right depth map) [AVI]
- still images (full resolution): input frame [PNG], computed depth map [PNG]
Bend sequence: This sequence shows a ride along a bend at about 72 km/h.
- movie: the original sequence with the computed depth map superimposed [AVI]
- movie: the original sequence and the computed depth map displayed separately [AVI]
- still images (full resolution): input frame [PNG], computed depth map [PNG]
City sequence: This sequence shows a ride through a complex city-center scene at about 40 km/h.
- movie: the original sequence with the computed depth map superimposed [AVI]
- movie: the original sequence and the computed depth map displayed separately [AVI]
- still images (full resolution): input frame [PNG], computed depth map [PNG]
Parking sequence: This sequence shows a slow ride over a bumpy parking lot.
- movie: the original sequence with the computed depth map superimposed [AVI]
- movie: the original sequence and the computed depth map displayed separately [AVI]
- still images (full resolution): input frame [PNG], computed depth map [PNG]
Village sequence: This sequence shows a ride through a small town at about 50 km/h.
- movie: the original sequence with the computed depth map superimposed [AVI]
- movie: the original sequence and the computed depth map displayed separately [AVI]
- still images (full resolution): input frame [PNG], computed depth map [PNG]

Acknowledgements

The research presented here was conducted at the Heidelberg Collaboratory for Image Processing (HCI). HCI is supported by the DFG, Heidelberg University and industrial partners. The authors thank Dr. W. Niehsen, Robert Bosch GmbH.

Publications

Variational Recursive Joint Estimation of Dense Scene Structure and Camera Motion from Monocular High Speed Traffic Sequences
Florian Becker, Frank Lenzen, Jörg H. Kappes and Christoph Schnörr
In Proceedings of the 2011 International Conference on Computer Vision, pages 1692-1699, 2011. IEEE Computer Society.
[PDF (preprint)] [PDF] [supplemental material] [overview] [BIB (bibtex)]

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

References

An Outdoor Stereo Camera System for the Generation of Real-World Benchmark Datasets with Ground Truth
S. Meister, D. Kondermann and B. Jähne
In SPIE Optical Engineering, 51(2), 2012.
[BIB (bibtex)]