
Variational Recursive Joint Estimation of Dense Scene Structure and Camera Motion from Monocular Image Sequences

Principal investigators: Florian Becker, Frank Lenzen, Jörg H. Kappes, Christoph Schnörr

We present an approach to jointly estimating camera motion and dense structure of a static scene in terms of depth maps from monocular image sequences in driver-assistance scenarios. At each instant of time, only two consecutive frames are processed as input data of a joint estimator that fully exploits second-order information of the corresponding optimization problem and effectively copes with the non-convexity due to both the imaging geometry and the manifold of motion parameters. Additionally, carefully designed Gaussian approximations enable probabilistic inference based on locally varying confidence and globally varying sensitivity due to the epipolar geometry, with respect to the high-dimensional depth map estimation. Embedding the resulting joint estimator in an online recursive framework achieves a pronounced spatio-temporal filtering effect and robustness.
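To make the imaging geometry concrete, the following generic rigid-scene model illustrates how depth and camera motion are coupled; it is the standard formulation in calibrated camera coordinates and not necessarily the exact parameterization used in the paper. A pixel x of frame I_t with depth d(x) is back-projected, moved by the camera motion (R, h), and re-projected into frame I_{t+1}:

    x' = \pi( R \, d(x) \, \bar{x} + h ),  \qquad  \bar{x} = (x_1, x_2, 1)^T,  \qquad  \pi(X, Y, Z) = (X/Z, \; Y/Z)^T,

with the brightness constancy assumption I_{t+1}(x') \approx I_t(x) linking both unknowns to the image data. Near the epipole the displacement x' - x is almost insensitive to d(x), which is one source of the globally varying sensitivity mentioned above.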

One frame of the monocular Bend image sequence with the displacement field superimposed (arrows), and the subsequent frame.
Two consecutive frames (size: 656 x 541 pixels) of the Bend image sequence, recorded by a fast moving camera. Displacements of up to 35 pixels can be observed.

Gray-value frame with the reconstructed depth map (color-encoded) superimposed, and the reconstructed scene structure with the camera track.
Left: our approach jointly estimates the scene structure, represented by a dense depth map (visualized using a non-linear color map), and the camera motion in an online recursive framework. Right: reconstruction of the scene structure based on the depth map, seen from the camera's viewpoint (green symbol), together with the camera track (red line).

Estimated variance of the depth estimate.
Estimated precision of the depth map. Image regions without texture (e.g. sky) and near the epipole (close to the image center) do not allow scene information to be derived; their depth estimates are therefore inferred from the temporal and spatial smoothness priors. The approach, however, also provides the precision map shown here, which marks these pixels as unreliable (dark blue), in contrast to reliable ones (dark red).
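As a minimal sketch of how such a precision map could be used to mask out unreliable depth values for further processing (the array names and the threshold are illustrative assumptions, not part of the published method):

    import numpy as np

    def mask_unreliable(depth, precision, rel_threshold=0.05):
        """Return a copy of `depth` with unreliable pixels set to NaN.

        `depth` and `precision` are (H, W) arrays; pixels whose estimated
        precision falls below `rel_threshold` times the maximum precision
        (e.g. sky or the epipole region) are marked as unreliable."""
        reliable = precision >= rel_threshold * precision.max()
        masked = np.where(reliable, depth.astype(float), np.nan)
        return masked, reliable

A downstream consumer (e.g. obstacle detection) could then skip the NaN entries or, alternatively, weight each depth value by its estimated precision.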
Further information and results can be found in Becker et al. (2013).

Input Image Sequences and HCI Benchmark Database

Most image sequences used in Becker et al. (2013) and Becker et al. (2011) are available for download here. They are part of a database that aims to provide a benchmark for computer vision algorithms in the context of automotive applications. Details on the acquisition of the image sequences are documented in Meister et al. (2012).

Results and Supplemental Material

The proposed method was applied to six real and one synthetic image sequence. Most sequences contain only minor motion (pedestrians, trees), so the static-scene assumption approximately holds. In the Junction and enpeda sequences this assumption is violated by moving objects (cars), which leads to distortions in the depth map; these distortions are, however, spatially and temporally restricted to the occurrence of the non-static elements.

Visualization details: all videos were encoded with ffmpeg under Linux using the mpeg4 codec, the divx codec flag and the mpg output format, and therefore contain compression artifacts. The gray values of the input images were reduced to 8 bit depth, and a histogram equalization was applied to increase the contrast for better visualization. Depth maps are encoded using a non-linear color map. Since the global scale of the scene cannot be recovered from a monocular sequence, the depth maps are unique only up to a scale factor, which is approximately constant within each sequence.
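A minimal sketch of a depth-map visualization consistent with the description above (the median normalization, the logarithmic mapping and the color map are assumptions; the authors' exact non-linear color map is not specified here):

    import numpy as np
    import matplotlib.pyplot as plt

    def save_depth_visualization(depth, out_path="depth_vis.png"):
        """Color-encode a depth map for display.

        Monocular reconstructions are unique only up to a global scale factor,
        so the map is normalized by its median before a non-linear (logarithmic)
        mapping and a color map are applied."""
        d = depth / np.median(depth)   # remove the arbitrary global scale
        d = np.log1p(d)                # compress the dynamic range of distant structure
        plt.imsave(out_path, d, cmap="jet")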

  • Avenue sequence (input frame, left; estimated depth map, right): a ride at about 100 km/h along a straight avenue lined with trees.
  • Bend sequence (input frame, left; estimated depth map, right): a ride along a bend at about 70 km/h.
  • City sequence (input frame, left; estimated depth map, right): a ride through a complex city-center scene at about 40 km/h.
  • Parking sequence (input frame, left; estimated depth map, right): a slow and bumpy ride through a parking lot, which makes this a very challenging sequence.
  • Village sequence (input frame, left; estimated depth map, right): a ride through a small town at about 50 km/h.
  • Junction sequence (input frame, left; estimated depth map, right): a challenging 90 degree turn; moving objects (cars) significantly violate the assumption of a static scene.
    • movie: input sequence and computed depth map [MPG]
  • enpeda sequence (input frame, left; estimated depth map, right): a synthetic sequence containing moving objects (cars) that significantly violate the assumption of a static scene.

Acknowledgments

The research presented here was conducted at the Heidelberg Collaboratory for Image Processing (HCI). HCI is supported by the DFG, Heidelberg University and industrial partners. The authors thank Dr. W. Niehsen, Robert Bosch GmbH.

Publications

  • Variational Recursive Joint Estimation of Dense Scene Structure and Camera Motion from Monocular High Speed Traffic Sequences
    Florian Becker, Frank Lenzen, Jörg H. Kappes and Christoph Schnörr
    In International Journal of Computer Vision, 105:269-297, 2013. Springer.
    [PDF (preprint)] [PDF] [supplemental material] [overview] [BIB (bibtex)]
  • Variational Recursive Joint Estimation of Dense Scene Structure and Camera Motion from Monocular High Speed Traffic Sequences
    Florian Becker, Frank Lenzen, Jörg H. Kappes and Christoph Schnörr
    In Proceedings of the 2011 International Conference on Computer Vision, pages 1692-1699, 2011. IEEE Computer Society.
    [PDF (preprint)] [PDF] [supplemental material] [overview] [BIB (bibtex)]

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

References

  • An Outdoor Stereo Camera System for the Generation of Real-World Benchmark Datasets with Ground Truth
    S. Meister, D. Kondermann and B. Jähne
    In Optical Engineering, 51(2), 2012. SPIE.
    [BIB (bibtex)]





last update: 24.5.2017