Methods and Models for 3D Imaging in Low-Sensing Conditions
|Publisher||Tampere University of Technology|
|Status||Published - 29 November 2018|
|Name||Tampere University of Technology. Publication|
Starting with 3D scene sensing, one has to select the sensing modes, which generally range from passive (stereo or multi-camera) systems to active depth sensors, or optimized combinations of these. When dealing with multiple sensors, an important problem is knowing their exact relative positions in order to interpret projected corresponding points correctly; this problem is known as stereo calibration. While the problem has arguably been solved in its supervised form, more advanced solutions are needed for unsupervised cases, i.e. when cameras have to be calibrated seamlessly for the user based only on features of the target image. This is especially important when mechanical or other misalignments affect the sensing quality. A major problem with active sensors is the presence of measurement imperfections, i.e. noise caused by weak illuminating signals, low lighting, low material reflectivity, and other sensor- or scene-related factors. Thus, denoising and enhancing such data becomes of primary importance. Working with multi-modal sensors naturally requires fusing the multiple modes into effective 3D representations, which also have to be suitable for compression, storage, and subsequent rendering.
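To make the calibration requirement concrete: corresponding points in two views must satisfy the epipolar constraint, and residuals of this constraint are the kind of image-feature evidence a self-calibration procedure can exploit. The following is a minimal sketch, not code from the thesis; the canonical fundamental matrix of an ideally rectified pair is a textbook fact.

```python
import numpy as np

# For an ideally rectified stereo pair, the fundamental matrix takes the
# canonical form below (up to scale), and the epipolar constraint
# x2^T F x1 = 0 reduces to "matching points share the same image row".
F_RECTIFIED = np.array([[0.0, 0.0,  0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0,  0.0]])

def epipolar_residual(pt_left, pt_right, F=F_RECTIFIED):
    """Residual of the epipolar constraint for a pair of image points
    (given in pixel coordinates); consistently nonzero residuals over
    many matches indicate a miscalibrated or misaligned rig."""
    x1 = np.array([pt_left[0], pt_left[1], 1.0])   # homogeneous left point
    x2 = np.array([pt_right[0], pt_right[1], 1.0]) # homogeneous right point
    return float(x2 @ F @ x1)
```

For the rectified case the residual equals the row difference `y1 - y2`, which is why vertical drift between the two sensors is immediately visible in feature matches.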
This thesis presents novel solutions for all the links in the 3D imaging chain. Our approach could be described as ‘pushing to the limits’. We have considered cases where sensing is complicated by a number of factors, which we summarize as low-sensing conditions: low device power, miniaturization requirements for the sensor, low light, low reflectivity, and low transmission bandwidth. Finding solutions for such difficult cases would ensure that 3D imaging techniques work in any conditions. Our main object of interest is the depth modality, which provides information about a scene’s geometry and, when aligned with the color modality, can serve for depth image-based rendering (DIBR) of the desired virtual views. Depth can be estimated either by ‘passive stereo’ camera set-ups or by active sensors utilizing the time-of-flight (ToF) principle.
For passive stereo, we have analyzed the effect of the image processing pipeline (IPP) on the quality of estimated depth maps. We have built a model of a mobile IPP and quantified the influence of all the processing blocks on the quality of the subsequent depth estimation, implemented with a set of state-of-the-art techniques. We place specific emphasis on the influence of (even small) mechanical misalignments, which have to be tackled by on-the-fly recalibration. We have developed a novel recalibration technique tailored for mobile stereo, where sensors are supposed to be rigidly fixed but might not always be so. Our approach considers roughly calibrated cameras and aims at constraining the number of their degrees of freedom, which yields a robust solution and speeds up the algorithm.
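As a reminder of why rectification quality matters downstream, depth on a well-rectified pair follows from disparity by simple triangulation, so even small residual misalignments propagate directly into the estimated depth map. A minimal sketch with illustrative parameters (the focal length and baseline values are assumptions, not figures from the thesis):

```python
import numpy as np

# Hypothetical rectified-stereo parameters (illustrative values only).
FOCAL_PX = 700.0     # focal length in pixels
BASELINE_M = 0.065   # camera baseline in metres

def depth_from_disparity(disparity_px):
    """Triangulate depth Z = f * B / d for a rectified stereo pair;
    zero or negative disparity maps to infinity (no valid match)."""
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, FOCAL_PX * BASELINE_M / d, np.inf)

# Nearby objects produce large disparities, distant ones small:
depths = depth_from_disparity([35.0, 7.0])
```

Because `Z` is inversely proportional to `d`, a one-pixel disparity error caused by miscalibration is far more damaging for distant, small-disparity points, which is one motivation for on-the-fly recalibration.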
For the case of active depth sensing, we have concentrated on the use of miniaturized ToF sensors, whose illumination sources and power consumption are reduced so that they can be integrated into mobile devices. In the considered ToF devices, range is measured from the time it takes an emitted light signal to reach the scene and travel back to the sensing elements. The range accuracy of a typical ToF device is strongly correlated with the intensity of the received reflected signal: a weaker signal implies a less accurate measurement. In low-sensing operating mode, the captured data has to be post-processed in order to recover the measurement accuracy achieved in normal operating mode. We have thoroughly modelled two noise sources that are always present in the low-sensing case. First, we have modelled a spatially correlated noise component, namely fixed-pattern noise (FPN). Such noise is particularly pronounced in low-sensing conditions and has to be removed as a first step before any further processing. We have developed a method which effectively suppresses FPN by means of adaptive notch filtering. Furthermore, we have modelled the remaining noise in terms of probability distributions and validated the derived models with empirical measurements. Based on the new models, we have devised an effective denoising method which favors a complex-valued representation of the sensed signal and makes use of its naturally stabilized noise variance.
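The measurement principle and the complex-valued representation mentioned above can be sketched as follows. The four-bucket demodulation formula is the standard one for continuous-wave ToF cameras; the modulation frequency is an illustrative assumption, not a value from the thesis.

```python
import numpy as np

C = 299_792_458.0   # speed of light (m/s)
F_MOD = 20e6        # hypothetical modulation frequency (Hz), illustrative

def four_phase_tof(q0, q1, q2, q3):
    """Standard four-bucket CW-ToF demodulation: the four correlation
    samples (taken at 0, 90, 180, 270 degrees) form a complex value
    whose angle encodes the travel-time phase shift (hence distance)
    and whose magnitude encodes the received signal amplitude."""
    z = (q0 - q2) + 1j * (q1 - q3)        # complex-valued representation
    phase = np.mod(np.angle(z), 2 * np.pi)
    amplitude = np.abs(z) / 2             # proportional to signal strength
    distance = C * phase / (4 * np.pi * F_MOD)
    return distance, amplitude
```

The amplitude returned here is exactly the quantity that drops in low-sensing conditions, degrading the phase (and hence range) estimate, and the complex value `z` is the kind of representation on which denoising with stabilized noise variance can operate.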
Current ToF devices have certain technological limitations, such as low spatial resolution and a limited ability to capture color information. A solution is to combine two or more devices that capture color (view, V) and depth (Z) data and fuse their outputs into a 3D representation referred to as “view-plus-depth” (V+Z). We have investigated the case of multi-sensor data fusion and developed appropriate methods, also incorporating modules for virtual view rendering and dis-occlusion in-painting. Finally, we have analyzed the V+Z 3D data representation and developed a new method for its efficient asymmetric representation, which has competitive performance in compression and fusion tasks.
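To illustrate how a V+Z representation supports virtual view rendering, a naive forward-warping DIBR step can be sketched as below. This toy version is an assumption-laden simplification, not the thesis method: it omits z-buffering for occlusions and leaves dis-occlusion holes as zeros, which is precisely where in-painting would enter; all parameters are illustrative.

```python
import numpy as np

def dibr_horizontal_warp(color, depth, focal_px, baseline_m):
    """Naive forward warping of a V+Z frame to a horizontally shifted
    virtual camera: each pixel moves by disparity = f * b / Z.
    Pixels left unwritten (zeros) are dis-occlusion holes."""
    h, w = depth.shape
    virtual = np.zeros_like(color)
    disparity = np.round(focal_px * baseline_m / depth).astype(int)
    for y in range(h):
        for x in range(w):
            xv = x - disparity[y, x]      # shifted column in virtual view
            if 0 <= xv < w:
                virtual[y, xv] = color[y, x]
    return virtual
```

Even this toy warp shows why the depth modality must be aligned with color and why hole filling is an integral module of the rendering chain: near objects shift more than far ones, uncovering background regions the source view never saw.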
The thesis includes a list of the software modules developed during the course of the related research. These modules allow the developed methods and models to be used in a wide range of applications, including mobile 3D imaging, car and robot navigation, and realistic 3D visualization.