object X
Object X, where X means: localization, detection, segmentation, classification, recognition, re-identification, but also reconstruction or tracking.
The object recognition task, generally speaking, involves locating and segmenting the entities that appear in digital images or scenes, and then recognizing them as objects that belong to a certain class or are similar to a given one. This task is a key step towards a semantic description of visual data, which is one of the main goals of Computer Vision.
Object detection and recognition make it possible to semantically index and retrieve images in an archive, and also to locate specific reference points for self-calibration, self-localization and (autonomous) navigation. Object recognition is strongly related to visual pattern recognition, i.e. the recognition of given patterns in visual data. Both are building blocks in many computer vision applications; well-known examples include OCR, text detection, traffic sign recognition, pedestrian detection and face recognition. Applications in the industrial domain relate, for example, to quality control processes (counting multiple object instances in an image, or detecting and classifying defects) and to robotics (recognizing objects to be grabbed or avoided).
In this research activity we develop innovative algorithms using deep learning models as well as traditional image analysis to localize, detect, segment, and recognize patterns, objects, or their parts.
UNSUPERVISED DOMAIN ADAPTATION IN OBJECT DETECTION
We propose a novel and effective four-step unsupervised domain adaptation approach that leverages self-supervision and trains on source and target data concurrently. Self-supervised learning is used to mitigate the lack of ground truth in the target domain.
related publications
M.L. Mekhalfi, D. Boscaini and F. Poiesi. Detect, Augment, Compose, and Adapt: Four Steps for Unsupervised Domain Adaptation in Object Detection. British Machine Vision Conference - BMVC, November 2023 [arXiv]
MONOCULAR 3D OBJECT DETECTION
We introduce a method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes.
related publications
A. Simonelli, S. Rota Bulò, L. Porzi, M. Lopez Antequera and P. Kontschieder. Disentangling Monocular 3D Object Detection: From Single to Multi-Class Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence
A. Simonelli, S. Rota Bulò, L. Porzi, P. Kontschieder and E. Ricci. Are We Missing Confidence in Pseudo-LiDAR Methods for Monocular 3D Object Detection? International Conference on Computer Vision - ICCV, pp. 3225-3233, October 2021
A. Simonelli, S. Rota Bulò, L. Porzi, E. Ricci and P. Kontschieder. Towards Generalization Across Depth for Monocular 3D Object Detection. European Conference on Computer Vision - ECCV, pp. 767-782, August 2020
A. Simonelli, S. Rota Bulò, L. Porzi, M. Lopez-Antequera and P. Kontschieder. Disentangling Monocular 3D Object Detection. International Conference on Computer Vision - ICCV, pp. 1991-1999, October 2019
FINE GRAINED RECOGNITION
Fine-grained recognition addresses the challenging task of automatically identifying subtle differences between similar categories. It matters for building systems that solve increasingly complex tasks: for example, distinguishing different types of pasta in a food recognition system that would otherwise only separate objects into macro-categories.
We propose a simple method for fine-grained recognition that exploits a nearly cost-free attention-based focus operation to construct an ensemble of increasingly specialized Convolutional Neural Networks.
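The focus operation itself is not described on this page. As a rough illustration of the general idea only, the sketch below assumes that a network exposes a 2D attention map, and that "focusing" means cropping the input to the bounding box of the cells whose attention is at least a fraction of the peak value, so that a more specialized network can look at that crop. The names focus_box, crop and rel_threshold are hypothetical and do not come from the publication.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def focus_box(attention: Sequence[Sequence[float]], rel_threshold: float = 0.5) -> Box:
    """Bounding box of all cells with attention >= rel_threshold * peak (assumed rule)."""
    peak = max(max(row) for row in attention)
    cutoff = rel_threshold * peak
    # Rows and columns that contain at least one sufficiently attended cell.
    rows = [r for r, row in enumerate(attention) if any(v >= cutoff for v in row)]
    cols = [c for c in range(len(attention[0]))
            if any(row[c] >= cutoff for row in attention)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image: Sequence[Sequence[float]], box: Box) -> List[List[float]]:
    """Cut the focused window out of a 2D image (or feature map)."""
    top, left, bottom, right = box
    return [list(row[left:right]) for row in image[top:bottom]]


# Toy 4x4 attention map with a bright 2x2 area in the centre.
attention = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.6, 0.0],
    [0.0, 0.7, 1.0, 0.1],
    [0.0, 0.0, 0.2, 0.0],
]
box = focus_box(attention)
focused = crop(attention, box)
```

In an ensemble of this kind, the crop would typically be resized and passed to the next, more specialized network; the attention map is already available from the forward pass, which is one plausible reading of "nearly cost-free".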
related publication
A. Simonelli, F. De Natale, S. Messelodi, S. Rota Bulò, Increasingly specialized ensemble of convolutional neural networks for fine-grained recognition, International Conference on Image Processing - ICIP, 2018
PATTERN AND LANDMARK DETECTION
This activity focuses on the development of algorithms for the detection of specific landmarks naturally embedded in the scene.
Our Visual Flat Target Detector, developed within the EU project VENTURI (2011-2014), detects specified landmarks within the visual range of a smartphone camera and localizes them in the 3D world. A target can be single-part or multi-part, and its structure is encoded in a custom description file (XML format). The description file stores the number of sub-parts, their shape and size in the real world (represented by polylines) and the spatial relationships between them. The different parts are assumed to be co-planar. For each part, a specialized detection routine is specified inside the XML file. Using fast template matching methods and scene-text detection algorithms, we have developed specialized routines for the detection of several landmarks composed of single or multiple parts.
Multi-part targets make the detector particularly robust to partial occlusions, which is useful when the target is not fully framed or when the user interacts with it (e.g. lift buttons can be covered by the user's hand).
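The actual schema of the description file is not given here. The sketch below therefore uses an invented XML layout (element and attribute names such as target, part, detector and polyline are assumptions) to show how such a file could carry the sub-parts, their real-world polylines in a shared plane, and the detection routine assigned to each part. The acceptance rule based on a minimum number of detected parts is likewise only an assumption about how multi-part targets can tolerate occlusion.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical description file: coordinates are millimetres in the target plane.
EXAMPLE_XML = """
<target name="lift_panel">
  <part id="up" detector="template">
    <polyline>0,0 30,0 30,30 0,30</polyline>
  </part>
  <part id="down" detector="template">
    <polyline>0,40 30,40 30,70 0,70</polyline>
  </part>
  <part id="label" detector="scene_text">
    <polyline>40,0 100,0 100,20 40,20</polyline>
  </part>
</target>
"""


@dataclass
class Part:
    part_id: str
    detector: str                        # name of the routine used to detect this part
    polyline: List[Tuple[float, ...]]    # outline in the common target plane


def parse_target(xml_text: str) -> List[Part]:
    """Read all sub-parts of a target from the (assumed) XML layout."""
    root = ET.fromstring(xml_text.strip())
    parts = []
    for node in root.findall("part"):
        points = [tuple(float(v) for v in pair.split(","))
                  for pair in node.findtext("polyline").split()]
        parts.append(Part(node.get("id"), node.get("detector"), points))
    return parts


def target_visible(parts: List[Part], detected_ids: Set[str], min_parts: int = 2) -> bool:
    """Accept the target if enough of its parts were found (assumed rule)."""
    known = {p.part_id for p in parts}
    return len(known & detected_ids) >= min_parts


parts = parse_target(EXAMPLE_XML)
# The "down" button is hidden by the user's hand, yet the target is still accepted.
found_despite_occlusion = target_visible(parts, {"up", "label"})
```

Because the parts are co-planar and their outlines are expressed in one coordinate frame, the parts that are detected can still constrain the pose of the whole target when others are hidden.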
related publication
P. Chippendale, V. Tomaselli, V. D'Alto, G. Urlini, C.M. Modena, S. Messelodi, M. Strano, G. Alce, K. Hermodsson, M. Razafimahazo, T. Michel, G. Farinella, Personal Shopping Assistance and Navigator System for Visually Impaired People, ECCV Workshops, LNCS 8927/2014, pp. 375-390, 2014
TEMPLATE MATCHING TECHNIQUES IN COMPUTER VISION
The detection and recognition of objects in images is a key research topic in the computer vision community. Within this area, face recognition and interpretation has attracted increasing attention owing to the possibility of unveiling human perception mechanisms, and for the development of practical biometric systems. This book and the accompanying website focus on template matching, a subset of object recognition techniques of wide applicability, which has proved to be particularly effective for face recognition applications. Using examples from face processing tasks throughout the book to illustrate more general object recognition approaches, the author: examines the basics of digital image formation, highlighting points critical to the task of template matching; presents basic and advanced template matching techniques, targeting grey-level images, shapes and point sets; discusses recent pattern classification paradigms from a template matching perspective; illustrates the development of a real face recognition system; explores the use of advanced computer graphics techniques in the development of computer vision algorithms. Template Matching Techniques in Computer Vision is primarily aimed at practitioners working on the development of systems for effective object recognition, such as biometrics, robot navigation, multimedia retrieval and landmark detection. It is also of interest to graduate students undertaking studies in these areas.
book
R. Brunelli. Template Matching Techniques in Computer Vision: Theory and Practice, Wiley, ISBN:978-0-470-51706-2, p. 348, 2009
MEMORI
MEMORI is a memory-based system for the detection and recognition of objects in digital images, developed in 2005-2007.
It is a user-friendly tool for content-based image annotation and for browsing digital catalogs by visual keywords. Each object to be recognized in an input image is represented by a 2D multi-view model, and each view is described by a set of visual features including color, shape and edges. Recognition is carried out by segmenting the input image and merging adjacent segments into regions, which are described by the same features as the object views. The grouping process is guided by heuristic rules whose parameters are estimated automatically, and it outputs one or more image regions that are visually similar to the object to be recognized. No user interaction is required, although the system also lets the user visualize partial results or algorithmic details, or select an image portion to restrict the object search.
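The heuristic rules used by MEMORI are not detailed on this page. The sketch below only illustrates the general idea of growing a region by merging adjacent segments, under strong simplifying assumptions: each segment is described by its area and mean color only (shape and edge features are ignored), the object view is summarized by a single model color, and a neighbour is merged only while the merged region gets closer to the model. All names (grow_region, merged_color, and so on) are hypothetical.

```python
from typing import Dict, Set, Tuple

Color = Tuple[float, float, float]
Segment = Tuple[float, Color]  # (area in pixels, mean RGB color)


def merged_color(segments: Dict[int, Segment], region: Set[int]) -> Color:
    """Area-weighted mean color of a set of segments."""
    total = sum(segments[s][0] for s in region)
    return tuple(sum(segments[s][0] * segments[s][1][i] for s in region) / total
                 for i in range(3))


def distance(a: Color, b: Color) -> float:
    """Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def grow_region(segments: Dict[int, Segment],
                adjacency: Dict[int, Set[int]],
                model_color: Color) -> Set[int]:
    """Greedily merge adjacent segments while similarity to the model improves."""
    # Start from the single segment that best matches the model.
    seed = min(segments, key=lambda s: distance(segments[s][1], model_color))
    region = {seed}
    best = distance(merged_color(segments, region), model_color)
    while True:
        neighbours = {n for s in region for n in adjacency[s]} - region
        candidates = [(distance(merged_color(segments, region | {n}), model_color), n)
                      for n in sorted(neighbours)]
        if not candidates:
            break
        d, n = min(candidates)
        if d >= best:  # no neighbour brings the region closer to the model
            break
        region.add(n)
        best = d
    return region


# Two reddish segments that together match the model, next to a large blue background.
segments = {
    0: (100.0, (215.0, 45.0, 45.0)),
    1: (100.0, (180.0, 60.0, 60.0)),
    2: (300.0, (30.0, 30.0, 200.0)),
}
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
region = grow_region(segments, adjacency, model_color=(200.0, 50.0, 50.0))
```

A fuller version would compare every feature of each stored view and could return several candidate regions, as the description above indicates.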
related publications
M. Lecca, S. Messelodi and C. Andreatta. An Object Recognition System for Automatic Image Annotation and Browsing of Object Catalogs, ACM Multimedia - ACMMM 2007, pp. 154-155, September 2007
M. Lecca and S. Messelodi. Rotation, Rescaling and Occlusion Invariant Object Retrieval, British Machine Vision Conference - BMVC, 2007
R. Bartolini, E. Giovannetti, S. Marchi, S. Montemagni, C. Andreatta, R. Brunelli, R. Stecher and P. Bouquet. Multimedia Information Extraction in Ontology-based Semantic Annotation of Product Catalogues, 3rd Italian Semantic Web Workshop - SWAP, 2006
M. Lecca and S. Messelodi. Recognition and Reconstruction of Partially Occluded Objects, Transactions on Engineering, Computing and Technology, Vol. 16, pp. 233-238, 2006
M. Lecca. Object Recognition in Color Images by the Self Configuring System MEMORI, International Journal of Signal Processing, Vol. 3, No. 3, pp. 176-185, 2006
R. Bartolini, E. Giovannetti, S. Marchi, S. Montemagni, C. Andreatta, R. Brunelli, R. Stecher and C. Niederée. Ontology Learning in Multimedia Information Extraction from Product Catalogues, Int. Conf. on Knowledge Engineering and Knowledge Management Managing Knowledge in a World of Networks - EKAW, 2006
M. Lecca. A Self Configuring System for Object Recognition in Color Images, International Conference on Computer Science - ICCS, pp. 35-40, 2006
C. Andreatta, M. Lecca and S. Messelodi. Memory-based Object Recognition in digital Images, International Fall Workshop - Vision, Modeling, and Visualization - VMV, 2005