shape
Many tasks, such as object recognition and action recognition, involve understanding the shapes of objects or human silhouettes. Reconstructing the shape of objects from their appearance in images has a long history: techniques that recover shape from texture, shading, stereo and motion have been proposed since the early 1970s. These algorithms work well in certain situations or for certain objects but perform poorly for others; they are sometimes unusable in real-time applications and often have to cope with uncalibrated acquisition devices. Research in this area has never stopped. Some methods focus on restricted classes of objects of interest, such as faces or human bodies. Recently, deep learning methods have been proposed to estimate the 3D shape of objects from a single flat image. Furthermore, the shape and pose of articulated objects are studied by means of geometric deep learning techniques that take multiple RGB images or 3D scans as input.
We study the problem of reconstructing a 3D object, possibly articulated or deformable, from one or more scans or images. For example, we study algorithms that generate correspondences between images depicting the same object or person, in order to provide a geometric reconstruction of that specific piece of the world. Data of the same object or scene, acquired from different viewpoints or at different times, need to be aligned for the 3D reconstruction. The processing chain includes the following steps, possibly end-to-end: (i) primitive detection, i.e. extraction of 2D/3D key points or salient regions, each to be described with a feature vector; (ii) matching of descriptors by similarity measures; (iii) estimation of the parameters of a transformation model used to align the entire data, which makes it possible to describe the shape. Applications range from extended reality and graphics to robotics and autonomous navigation.
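Step (ii) of the chain above can be illustrated with a minimal sketch. It assumes the descriptors are already available as rows of two NumPy arrays and uses the squared Euclidean distance with a mutual nearest-neighbour check as the similarity measure; the function name and this choice of measure are assumptions made for illustration, not the specific method used in the publications listed on this page.

```python
import numpy as np

def mutual_nearest_neighbours(desc_a, desc_b):
    """Return index pairs (i, j) where desc_a[i] and desc_b[j] are each other's nearest neighbour."""
    # Pairwise squared Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    nn_ab = np.argmin(dist, axis=1)  # best match in B for each descriptor in A
    nn_ba = np.argmin(dist, axis=0)  # best match in A for each descriptor in B
    idx_a = np.arange(len(desc_a))
    keep = nn_ba[nn_ab] == idx_a     # keep only matches that agree in both directions
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)
```

The mutual check discards one-sided matches, which is a common and inexpensive way to reduce outliers before the transformation model of step (iii) is estimated.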
We study novel shape descriptors, in particular descriptors for (deformable) shapes learned with deep networks. This topic is related to the indexing and retrieval of 3D objects, which has many and varied applications: hand gesture recognition, retrieval and classification of scanned objects, and classification of proteins, to name a few.
POINT CLOUD DESCRIPTORS AND REGISTRATION
3D point set registration is the problem of finding an optimal Euclidean transformation to align two partially overlapping 3D point sets such that they can be represented in a common reference frame. Our research focuses on the design of algorithms that process 3D point clouds that are captured in the real world.
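When point-to-point correspondences between the two sets are already known, the optimal Euclidean transformation in the least-squares sense has a closed-form solution. The sketch below uses the textbook SVD-based (Kabsch) solution; it assumes `src[i]` corresponds to `dst[i]`, and the function name is hypothetical. It illustrates the problem statement only and is not the registration method of the publications listed below.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that dst ≈ src @ R.T + t."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance between the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

With real scans the correspondences are noisy and only partially valid, so an estimator like this is typically wrapped in a robust scheme such as RANSAC.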
Point cloud registration approaches can be broadly categorised into correspondence-free and correspondence-based.
Correspondence-free registration approaches, such as OGMM, aim at minimizing the difference between the global features extracted from the two input point clouds.
Correspondence-based registration approaches rely on point-level correspondences between the two input point clouds, computed for example through 3D descriptors such as DIP or GeDi.
related publications
G. Mei, C. Saltori, E. Ricci, N. Sebe, Q. Wu, J. Zhang, F. Poiesi. Unsupervised Point Cloud Representation Learning by Clustering and Neural Rendering. International Journal of Computer Vision - IJCV, 2024 [doi]
X. Zheng, X. Huang, G. Mei, Y. Hou, Z. Lyu, B. Dai, W. Ouyang, Y. Gong. Point Cloud Pre-training with Diffusion Models. IEEE Conference on Computer Vision and Pattern Recognition - CVPR, June 2024 [pdf]
C. Saltori, F. Galasso, G. Fiameni, N. Sebe, F. Poiesi and E. Ricci. Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 [arXiv][doi]
D. Boscaini and F. Poiesi. PatchMixer: Rethinking network design to boost generalization for 3D point cloud understanding. Image and Vision Computing, 2023 [arXiv]
F. Poiesi and D. Boscaini. Learning general and distinctive 3D local deep descriptors for point cloud registration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):3979-3985, 2023 [doi][pdf]
G. Mei, F. Poiesi, C. Saltori, J. Zhang, E. Ricci, N. Sebe. Overlap-guided Gaussian Mixture Models for Point Cloud Registration, Winter Conference on Applications of Computer Vision - WACV, 2023
F. Poiesi and D. Boscaini. Learning general and distinctive 3D local deep descriptors for point cloud registration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022 [doi][pdf]
F. Poiesi, D. Boscaini. Distinctive 3D local deep descriptors, International Conference on Pattern Recognition - ICPR, pp. 5720-5727, 2021 [pdf]
M. Zanin, F. Remondino and M. Dalla Mura. High-performance computing in image registration, SPIE Remote Sensing, 2012
3D SEMANTIC SEGMENTATION
With the advent of mobile phones capable of capturing 3D information, there has been a tremendous increase in the availability of point cloud data. Point cloud processing and 3D shape understanding are very challenging tasks for which deep learning techniques have demonstrated great potential. These studies are intended for general semantic scene understanding purposes; a practical application is the detection and identification of various object parts for robotic manipulation tasks.
To allow artificially intelligent agents to interact with the real world, where the amount of annotated data may be limited, integrating new sources of knowledge becomes crucial to support autonomous learning. We consider several possible scenarios involving synthetic and real-world point clouds where supervised learning fails due to data scarcity and large domain gaps. We propose to enrich standard feature representations by leveraging self-supervision through a multi-task model that can solve a 3D puzzle while learning the main task of shape classification or part segmentation.
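To make the 3D puzzle idea concrete, the sketch below shows one possible way to build such a pretext task: the point cloud is normalised into the unit cube, split into k x k x k voxel cells, and the cells are randomly permuted. The per-point label is the index of the original cell, which a network would then have to predict. The function name, the voxel-based construction and the parameter `k` are assumptions made for illustration; the exact puzzle used in the publications below may differ.

```python
import numpy as np

def make_3d_puzzle(points, k=3, rng=None):
    """Shuffle the k*k*k voxel cells of a point cloud.

    Returns the shuffled points (in the unit cube) and, for each point,
    the id of the cell it originally belonged to (the self-supervised label).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Normalise into the unit cube, preserving the aspect ratio.
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    unit = (points - mins) / (extent + 1e-9)
    # Integer cell coordinates of every point, shape (N, 3).
    cell = np.minimum((unit * k).astype(int), k - 1)
    labels = cell[:, 0] * k * k + cell[:, 1] * k + cell[:, 2]
    # perm[c] is the new cell id assigned to original cell c.
    perm = rng.permutation(k ** 3)
    new_id = perm[labels]
    new_cell = np.stack([new_id // (k * k), (new_id // k) % k, new_id % k], axis=1)
    # Translate each point by the offset between its new and original cell.
    shuffled = unit + (new_cell - cell) / k
    return shuffled, labels
```

In a multi-task setting, a loss on these labels would be added to the loss of the main task, so that unlabeled target-domain shapes can also contribute to training.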
In augmented reality, the segmentation of an object into semantic parts can be exploited, for example, to detect the individual parts in order to virtually manipulate them and reconstruct the object in a different fashion, as done in the REPLICATE project.
The semantic segmentation of 3D shapes with a high density of vertices can be impractical due to large memory requirements. To make this problem computationally tractable, we propose neural-network-based approaches that produce 3D augmented views of the 3D shape, so that the whole segmentation is solved as a set of sub-segmentation problems.
related publications
L. Riz, C. Saltori, Y. Wang, E. Ricci, F. Poiesi. Novel class discovery meets foundation models for 3D semantic segmentation. International Journal of Computer Vision - IJCV, 2024
L. Riz, C. Saltori, E. Ricci and F. Poiesi. Novel class discovery for 3D point cloud semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition - CVPR, June 2023 [pdf, code]
A. Alliegro, D. Boscaini and T. Tommasi. Joint Supervised and Self-Supervised Learning for 3D Real-World Challenges, International Conference on Pattern Recognition - ICPR, 2020 [pdf]
A. Alliegro, D. Boscaini and T. Tommasi. Self-Supervision for 3D Real-World Challenges, European Conference on Computer Vision Workshops - ECCVW, 2020
D. Boscaini and F. Poiesi. 3D Shape Segmentation with Geometric Deep Learning, 20th International Conference on Image Analysis and Processing - ICIAP, 2019
MULTI VIEW DATA CAPTURE
We propose a system that captures nearly synchronous frame streams from multiple moving handheld mobiles and is suitable for the 3D reconstruction of dynamic objects. Each mobile executes Simultaneous Localisation and Mapping on-board to estimate its pose, and uses a wireless communication channel to send or receive synchronization triggers. The system can harvest frames and mobile poses in real time using a decentralized triggering strategy and a data-relay architecture that can be deployed either at the Edge or in the Cloud.
related publications
M. Bortolon, A. Del Bue and F. Poiesi. VM-NeRF: Tackling Sparsity in NeRF with View Morphing. International Conference on Image Analysis and Processing - ICIAP, September 2023
M. Bortolon and F. Poiesi. An open-source mobile-based system for synchronised multi-view capture and dynamic object reconstruction, Software Impacts, 9, 2021 [doi]
M. Bortolon, L. Bazzanella and F. Poiesi. Multi-view data capture for dynamic object reconstruction using handheld augmented reality mobiles, Journal of Real-Time Image Processing, vol. 18, pp. 345-355, 2021 [pdf]
M. Bortolon, P. Chippendale, S. Messelodi and F. Poiesi. Multi-view data capture using edge-synchronised mobiles, 15th International Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - VISAPP, pp. 730-740, 2020
KEY-POINTS IN HUMAN BODY
A fundamental task in computer vision, for example when analyzing humans, is the accurate estimation of 2D/3D key-points from images depicting human bodies. This is the first step in object reconstruction.
We study innovative methods for 2D/3D key-point localization, for example applied to pictures of humans under domain shift, i.e. when the training (source) and the test (target) images differ significantly in visual appearance. One of the proposed methods seamlessly combines three different components: feature alignment, adversarial training and self-supervision. Specifically, a deep architecture leverages domain-specific distribution alignment layers to perform target adaptation at the feature level. Furthermore, a loss is proposed that combines an adversarial term, which ensures aligned predictions in the output space, with a geometric consistency term, which guarantees coherent predictions between a target sample and its perturbed version.
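The geometric consistency term can be sketched as follows, under simplifying assumptions: the perturbation is a known pixel translation (with wrap-around), the predictor is any function that maps an image to an array of (x, y) key-points, and the penalty is the mean squared distance. All names below, including the toy predictor, are hypothetical and chosen for illustration; the perturbations and the exact loss used in the publications below may differ.

```python
import numpy as np

def translate(image, dx, dy):
    """Shift an (H, W) image by dx pixels in x and dy pixels in y, with wrap-around."""
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

def geometric_consistency_loss(predict, image, dx, dy):
    """Mean squared distance between the key-points predicted on the shifted
    image and the shifted key-points predicted on the original image."""
    kp = predict(image)                                  # (K, 2) array of (x, y)
    kp_perturbed = predict(translate(image, dx, dy))     # prediction on the perturbed sample
    kp_expected = kp + np.array([dx, dy], dtype=float)   # where the key-points should move to
    return float(np.mean(np.sum((kp_perturbed - kp_expected) ** 2, axis=1)))

def brightest_pixel(image):
    """Toy predictor: a single key-point at the brightest pixel."""
    y, x = np.unravel_index(np.argmax(image), image.shape)
    return np.array([[x, y]], dtype=float)
```

A predictor whose output moves with the image incurs no penalty, while one that ignores the perturbation is penalised. Since the term needs no annotations, it can be evaluated on unlabeled target images.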
related publications
L.O. Vasconcelos, M. Mancini, D. Boscaini, S. Rota Bulò, B. Caputo and E. Ricci. Shape Consistent 2D Keypoint Estimation under Domain Shift, International Conference on Pattern Recognition - ICPR, pp. 8037-8044, 2020
L.O. Vasconcelos, M. Mancini, D. Boscaini, B. Caputo and E. Ricci. Structured domain adaptation for 3d keypoint estimation, International Conference on 3D Vision - 3DV, pp. 57-66, 2019
SHAPE RECOGNITION
We propose novel approaches based on geometric deep learning techniques, for example for 3D hand shape recognition from RGB-D data. In this case the model, trained on synthetic data, retains its performance on real samples at test time.
related publication
J. Svoboda, P. Astolfi, D. Boscaini, J. Masci and M.M. Bronstein. Clustered Dynamic Graph CNN for Biometric 3D Hand Shape Recognition, IEEE International Joint Conference on Biometrics - IJCB, pp. 1-9, 2020