Technologies of Vision: semantic image labelling

3D from 2D

We register photos by correlating their content against a rendered 3D spherical panorama of the world generated at the photo's geo-location. In essence, we generate a complete 360° synthetic image of what an observer would see all around them at a given location, using a Digital Terrain Model and ray-tracing techniques.
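As a rough illustration of the rendering step, the sketch below ray-marches a height grid to recover the elevation angle of the terrain horizon at each azimuth around the observer. This is a hypothetical toy (a real renderer traces full depth panoramas, not just the skyline), and all names, grid sizes and step lengths here are our own assumptions, not the actual pipeline:

```python
import numpy as np

def horizon_panorama(dtm, cell_size, observer_rc, observer_h=1.7,
                     n_azimuths=360, max_dist=5000.0, step=30.0):
    """Ray-march a Digital Terrain Model (2D height array, metres) to find
    the elevation angle of the horizon at each azimuth around the observer.
    Returns one angle (radians) per azimuth: a coarse 360-degree skyline."""
    r0, c0 = observer_rc
    h0 = dtm[r0, c0] + observer_h          # eye height above the terrain
    rows, cols = dtm.shape
    angles = np.zeros(n_azimuths)
    for i in range(n_azimuths):
        az = 2 * np.pi * i / n_azimuths
        best = -np.pi / 2                   # start looking straight down
        d = step
        while d <= max_dist:
            # sample point along the ray, converted to grid coordinates
            r = r0 + (d * np.cos(az)) / cell_size
            c = c0 + (d * np.sin(az)) / cell_size
            ri, ci = int(round(r)), int(round(c))
            if not (0 <= ri < rows and 0 <= ci < cols):
                break                       # ray left the terrain model
            elev = np.arctan2(dtm[ri, ci] - h0, d)
            if elev > best:                 # keep the highest angle seen
                best = elev
            d += step
        angles[i] = best
    return angles
```

Sweeping the returned angles across azimuths traces the land/sky boundary the alignment stage later matches against the photo.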

The virtual view is 'unwrapped' into a 360° × 180° rectangular window, and the photo is then deformed into the same 'space' according to estimated camera parameters such as pan, tilt and lens distortion. Scaling information (i.e. zoom) is extracted from the focal-length metadata contained in the photo's EXIF data.
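The two ingredients of that deformation can be sketched as follows: the EXIF focal length fixes the field of view, and a pinhole-plus-rotation model maps each photo pixel to an (azimuth, elevation) position in the unwrapped panorama. This is a minimal sketch under simplifying assumptions (full-frame sensor default, no lens distortion, principal point at the image centre); the function names are ours:

```python
import math
import numpy as np

def horizontal_fov(focal_mm, sensor_mm=36.0):
    """Horizontal field of view (radians) from the EXIF focal length.
    Defaults to a full-frame 35 mm sensor; a real pipeline would look up
    the sensor size (or use the EXIF FocalLengthIn35mmFilm tag)."""
    return 2 * math.atan(sensor_mm / (2 * focal_mm))

def pixel_to_pano(u, v, w, h, fov, pan=0.0, tilt=0.0):
    """Map photo pixel (u, v) in a w-by-h image to (azimuth, elevation)
    in the 360° x 180° panorama, given the camera's pan and tilt."""
    f = (w / 2) / math.tan(fov / 2)             # focal length in pixels
    ray = np.array([u - w / 2, v - h / 2, f])   # x right, y down, z forward
    ray = ray / np.linalg.norm(ray)
    ct, st = math.cos(tilt), math.sin(tilt)     # tilt: rotation about x
    Rx = np.array([[1, 0, 0], [0, ct, -st], [0, st, ct]])
    cp, sp = math.cos(pan), math.sin(pan)       # pan: rotation about vertical
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    x, y, z = Ry @ Rx @ ray
    azimuth = math.atan2(x, z)                  # longitude in the panorama
    elevation = -math.asin(y)                   # latitude (y points down)
    return azimuth, elevation
```

Applying `pixel_to_pano` to every pixel warps the photo into the same equirectangular 'space' as the synthetic view, so the two can be compared directly.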

A correlation algorithm then attempts to find the best match between the synthetic image features and details extracted from the photo, such as land–sky junctions and perceived depth discontinuities (used, for example, to estimate the ridges of mountains that do not form the horizon).
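A heavily simplified stand-in for that matching step: treat both skylines as 1D horizon profiles (one elevation per azimuth sample) and slide the photo's profile around the full 360° synthetic one, keeping the offset with the smallest squared difference. The real algorithm correlates richer 2D features over several camera parameters at once; this toy recovers only the pan angle:

```python
import numpy as np

def best_pan_offset(synthetic, photo):
    """Find the circular shift (in samples) of the photo's horizon profile
    that best matches the full 360-degree synthetic profile, by exhaustive
    sum-of-squared-differences search."""
    n, m = len(synthetic), len(photo)
    ext = np.concatenate([synthetic, synthetic[:m]])   # wrap around 360°
    errs = [np.sum((ext[s:s + m] - photo) ** 2) for s in range(n)]
    return int(np.argmin(errs))
```

With one sample per degree, the returned index is directly the estimated pan of the camera relative to the panorama's zero azimuth.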

During the alignment process we generate a synthetic model of the world; a small portion of the 360° panorama is shown here. To generate the red/blue 3D image we created a new virtual camera for the blue channel, 20 m from the original camera's position, using the depth data produced by the Marmota alignment process. From a single aligned image we can view the pixels from any new perspective we wish by creating a virtual camera; the grey pixels mark information that is missing from the new viewpoint.
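The virtual-camera step can be sketched as a depth-driven forward warp: each pixel's disparity is proportional to focal length times baseline divided by its depth, and pixels the displaced camera cannot see are left at a grey fill value. This is a hypothetical single-channel sketch (the function name, baseline and focal values are our assumptions), not the Marmota implementation:

```python
import numpy as np

def warp_to_virtual_camera(image, depth, baseline=20.0, f=500.0):
    """Forward-warp a grayscale image into a virtual camera displaced by
    `baseline` metres along the x axis, using per-pixel depth (metres).
    A z-buffer resolves occlusions; unreachable pixels stay grey (128)."""
    h, w = image.shape
    out = np.full((h, w), 128, dtype=image.dtype)   # grey = missing data
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            d = f * baseline / depth[y, x]          # horizontal disparity
            nx = int(round(x - d))
            if 0 <= nx < w and depth[y, x] < zbuf[y, nx]:
                zbuf[y, nx] = depth[y, x]           # nearer surface wins
                out[y, nx] = image[y, x]
    return out
```

Compositing the original image into the red channel and the warped one into the blue channel would yield the red/blue anaglyph described above.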