object X

Object X, where X means: localization, detection, segmentation, classification, recognition, re-identification, but also reconstruction or tracking.

The object recognition task, generally speaking, involves locating and segmenting the entities appearing in digital images or in scenes, and finally recognizing them as objects that belong to a certain class or are similar to a given one. This task is a key step towards a semantic description of visual data, which is one of the main goals of Computer Vision.

Object detection and recognition make it possible to semantically index and retrieve images in an archive, and also to locate a specific reference point for self-calibration, self-localization and (autonomous) navigation. Object recognition is strongly related to visual pattern recognition, i.e. the recognition of given patterns in visual data. Both are building blocks in many computer vision applications; well-known examples include OCR, text detection, traffic sign recognition, pedestrian detection and face recognition. Applications in the industrial domain are related, for example, to quality control processes (counting multiple instances of an object in an image, or detecting and classifying defects) or to robotics (recognizing objects to be grabbed or avoided).

In this research activity we develop innovative algorithms using deep learning as well as traditional image analysis to localize, detect, segment, and recognize patterns, objects, or their parts.


We introduce a method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes.

related publications

  • A. Simonelli, S. Rota Bulò, L. Porzi, M. Lopez Antequera and P. Kontschieder. Disentangling Monocular 3D Object Detection: From Single to Multi-Class Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence

  • A. Simonelli, S. Rota Bulò, L. Porzi, P. Kontschieder and E. Ricci. Are We Missing Confidence in Pseudo-LiDAR Methods for Monocular 3D Object Detection? International Conference on Computer Vision - ICCV, pp. 3225-3233, October 2021

  • A. Simonelli, S. Rota Bulò, L. Porzi, E. Ricci and P. Kontschieder. Towards generalization across depth for monocular 3d object detection. European Conference on Computer Vision - ECCV, pp. 767-782, August 2020

  • A. Simonelli, S. Rota Bulò, L. Porzi, M. Lopez-Antequera and P. Kontschieder. Disentangling Monocular 3D Object Detection. International Conference on Computer Vision - ICCV, pp. 1991-1999, October 2019


Fine-grained recognition focuses on the challenging task of automatically identifying the subtle differences between similar categories. It is needed to build systems able to solve increasingly complex tasks: for example, a food recognition system that distinguishes different types of pasta, rather than one that can only separate objects into macro-categories.

We propose a simple method for fine-grained recognition that exploits a nearly cost-free attention-based focus operation to construct an ensemble of increasingly specialized Convolutional Neural Networks.

related publication

Example of the fine-grained method applied to bird species recognition. The image is included in the research publication.


This activity focuses on the development of algorithms for the detection of specific landmarks naturally embedded in the scene.

The image illustrates a smartphone capturing the scene. The detection of the LIFT sign is highlighted on the screen. Picture acquired at FBK by cmm
The picture shows a scene captured by a smartphone, specifically lift buttons. The detection of the LIFT BUTTONS and the FINGER is highlighted on the screen. Picture acquired at FBK by cmm

Our Visual Flat Target Detector, developed within the EU project VENTURI (2011-2014), aims at detecting and localizing, in the 3D world, specified landmarks within the visual range of a smartphone camera. A target can be single-part or multi-part, and its structure is encoded in a custom description file (XML format). The description file stores information about the number of sub-parts, their shape and size in the real world (represented by polylines) and the spatial relationships between them. The different parts are assumed to be co-planar. For each part, a specialized detection routine is specified inside the XML file. Using fast template matching methods and algorithms for detecting text in scenes, we have developed specialized routines for the detection of several landmarks, composed of a single part or of multiple parts.

The introduction of a multi-part target makes the detector particularly robust to partial occlusions, a useful feature especially when the target is not fully framed or when the user interacts with the target (e.g. lift buttons can be covered by the user’s hand).
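The idea of a multi-part target description can be illustrated with a minimal sketch. The XML schema below (element names, attribute names, detector labels) is entirely hypothetical and is not the actual VENTURI file format; it only assumes what the text states: each part has a real-world polyline and a named detection routine, and all parts share one planar coordinate frame, which implicitly encodes their spatial relationships. The acceptance rule based on a minimum fraction of detected parts is likewise an assumption, used here to show why a multi-part target tolerates partial occlusion.

```python
import xml.etree.ElementTree as ET

# Hypothetical description file: element and attribute names are illustrative only.
TARGET_XML = """
<target name="lift_buttons" unit="mm">
  <part id="up" detector="template_matching">
    <polyline points="0,0 30,0 30,30 0,30"/>
  </part>
  <part id="down" detector="template_matching">
    <polyline points="0,50 30,50 30,80 0,80"/>
  </part>
  <part id="label" detector="scene_text">
    <polyline points="0,-40 60,-40 60,-10 0,-10"/>
  </part>
</target>
"""

def parse_target(xml_text):
    """Return {part_id: {"detector": str, "polyline": [(x, y), ...]}}."""
    root = ET.fromstring(xml_text)
    parts = {}
    for part in root.findall("part"):
        points = part.find("polyline").get("points").split()
        polyline = [tuple(float(v) for v in p.split(",")) for p in points]
        parts[part.get("id")] = {
            "detector": part.get("detector"),
            "polyline": polyline,
        }
    return parts

def target_found(parts, detected_ids, min_fraction=0.5):
    """Accept the target when at least min_fraction of its parts were detected."""
    hits = sum(1 for part_id in parts if part_id in detected_ids)
    return hits / len(parts) >= min_fraction

parts = parse_target(TARGET_XML)
# The "down" button is hidden by the user's hand, yet the target is still accepted.
print(target_found(parts, {"up", "label"}))
```

In a real detector, the known planar layout of the detected parts would also be used to estimate the pose of the whole target; that step is omitted here.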

related publication

P. Chippendale, V. Tomaselli, V. D'Alto, G. Urlini, C.M. Modena, S. Messelodi, M. Strano, G. Alce, K. Hermodsson, M. Razafimahazo, T. Michel, G. Farinella, Personal Shopping Assistance and Navigator System for Visually Impaired People, ECCV Workshops, LNCS 8927/2014, pp. 375-390, 2014

traffic sign detection
traffic sign detection and recognition
other examples of pattern detection and recognition
sign detection and recognition from a moving vehicle


The detection and recognition of objects in images is a key research topic in the computer vision community. Within this area, face recognition and interpretation has attracted increasing attention owing to the possibility of unveiling human perception mechanisms, and for the development of practical biometric systems. This book and the accompanying website focus on template matching, a subset of object recognition techniques of wide applicability, which has proved to be particularly effective for face recognition applications. Using examples from face processing tasks throughout the book to illustrate more general object recognition approaches, the author:

  • examines the basics of digital image formation, highlighting points critical to the task of template matching;

  • presents basic and advanced template matching techniques, targeting grey-level images, shapes and point sets;

  • discusses recent pattern classification paradigms from a template matching perspective;

  • illustrates the development of a real face recognition system;

  • explores the use of advanced computer graphics techniques in the development of computer vision algorithms.

Template Matching Techniques in Computer Vision is primarily aimed at practitioners working on the development of systems for effective object recognition such as biometrics, robot navigation, multimedia retrieval and landmark detection. It is also of interest to graduate students undertaking studies in these areas.
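As a small illustration of basic grey-level template matching, the sketch below slides a template over an image and scores each position with zero-mean normalized cross-correlation. It is not taken from the book; the function names and the toy data are assumptions made for this example, and the exhaustive search is the simplest possible strategy rather than a fast one.

```python
from math import sqrt

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size grey-level grids."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    # A flat patch or template has no structure to correlate: score it as 0.
    return num / den if den > 0 else 0.0

def match_template(image, template):
    """Exhaustively slide the template over the image; return (best_score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            best = max(best, (ncc(patch, template), (r, c)))
    return best

image = [
    [10, 10, 10, 10, 10],
    [10, 10, 90, 20, 10],
    [10, 10, 20, 90, 10],
    [10, 10, 10, 10, 10],
]
template = [[9, 2], [2, 9]]  # same pattern as in the image, but much darker
score, position = match_template(image, template)
print(position, round(score, 3))
```

Because the score is normalized, the template is located even though its brightness and contrast differ from those of the image.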


R. Brunelli, Template Matching Techniques in Computer Vision: Theory and Practice, Wiley, ISBN:978-0-470-51706-2, p. 348, 2009


MEMORI is a memory-based system for the detection and recognition of objects in digital images, developed in 2005-2007.

It is a user-friendly tool for content-based image annotation and for browsing digital catalogs by visual keywords. The objects to be recognized in an input image are represented by a 2D multi-view model, and each view is described by a set of visual features including color, shape and edges. Recognition is carried out by segmenting the input image and merging adjacent segments into regions, which are described by the same features as the object views. The grouping process is guided by heuristic rules, whose parameters are estimated automatically, and it outputs one or more image regions that are visually similar to the object to be recognized. No user interaction is required, although the system also allows the user to visualize partial results or algorithmic details, or to select an image portion to which the object search should be restricted.
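The grouping step described above can be sketched as a greedy region-growing loop. This is only an illustration under strong assumptions and not the MEMORI algorithm: each segment is described by a color histogram alone (no shape or edge features), similarity is measured by histogram intersection, and the single heuristic rule is "merge a neighbouring segment only if the merged region becomes more similar to the object view". All names are hypothetical, and histograms are assumed to be non-empty.

```python
def intersection(h1, h2):
    """Histogram intersection after normalizing each histogram to sum 1."""
    s1, s2 = sum(h1), sum(h2)
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))

def grow_region(segments, adjacency, model_hist):
    """Greedily merge adjacent segments while similarity to the model improves.

    segments:  {segment_id: color histogram given as pixel counts per bin}
    adjacency: {segment_id: set of neighbouring segment ids}
    """
    seed = max(segments, key=lambda s: intersection(segments[s], model_hist))
    region, hist = {seed}, list(segments[seed])
    score = intersection(hist, model_hist)
    while True:
        frontier = set().union(*(adjacency[s] for s in region)) - region
        best = None
        for cand in frontier:
            merged = [a + b for a, b in zip(hist, segments[cand])]
            cand_score = intersection(merged, model_hist)
            if cand_score > score:
                best, score, best_hist = cand, cand_score, merged
        if best is None:  # no neighbour improves the match: stop growing
            return region, score
        region.add(best)
        hist = best_hist

# Histogram bins: (red, white, blue). The object view is half red, half white.
model = [50, 50, 0]
segments = {
    "red_part": [40, 0, 0],
    "white_part": [0, 40, 0],
    "background": [0, 0, 100],
}
adjacency = {
    "red_part": {"white_part", "background"},
    "white_part": {"red_part", "background"},
    "background": {"red_part", "white_part"},
}
region, score = grow_region(segments, adjacency, model)
print(sorted(region), round(score, 3))
```

In this toy example the two object segments are merged and the blue background is rejected, because adding it would lower the similarity to the object view.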

related publications

  • M. Lecca, S. Messelodi and C. Andreatta. An Object Recognition System for Automatic Image Annotation and Browsing of Object Catalogs, ACM Multimedia - ACMMM 2007, pp. 154-155, September 2007

  • M. Lecca and S. Messelodi. Rotation, Rescaling and Occlusion Invariant Object Retrieval, British Machine Vision Conference - BMVC, 2007

  • R. Bartolini, E. Giovannetti, S. Marchi, S. Montemagni, C. Andreatta, R. Brunelli, R. Stecher and P. Bouquet. Multimedia Information Extraction in Ontology-based Semantic Annotation of Product Catalogues, 3rd Italian Semantic Web Workshop - SWAP, 2006

  • M. Lecca and S. Messelodi. Recognition and Reconstruction of Partially Occluded Objects, Transactions on Engineering, Computing and Technology, Vol. 16, pp. 233-238, 2006

  • M. Lecca. Object Recognition in Color Images by the Self Configuring System MEMORI, International Journal of Signal Processing, Vol. 3, No. 3, pp. 176-185, 2006

  • R. Bartolini, E. Giovannetti, S. Marchi, S. Montemagni, C. Andreatta, R. Brunelli, R. Stecher and C. Niederée. Ontology Learning in Multimedia Information Extraction from Product Catalogues, Int. Conf. on Knowledge Engineering and Knowledge Management Managing Knowledge in a World of Networks - EKAW, 2006

  • M. Lecca. A Self Configuring System for Object Recognition in Color Images, International Conference on Computer Science - ICCS, pp. 35-40, 2006

  • C. Andreatta, M. Lecca and S. Messelodi. Memory-based Object Recognition in digital Images, International Fall Workshop - Vision, Modeling, and Visualization - VMV, 2005