
Segmentation and Recognition

Main activities

We investigate how to deploy representation learning within the framework of decision forests, which are ensembles of binary decision trees that have become very popular in computer vision.
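To make the notion of a decision forest concrete, here is a minimal Python sketch of inference in an ensemble of binary decision trees. It assumes the trees are already trained (split selection is not shown) and uses hard, axis-aligned threshold splits. It averages the class distributions stored at the leaves the sample reaches. The `Node` layout and the function names are illustrative assumptions, not taken from the group's code, and the sketch does not show how representation learning is integrated.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    # Internal node: go left if x[feature] < threshold, else right.
    # Leaf node: only `distribution` (class probabilities) is set.
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    distribution: Optional[List[float]] = None


def tree_predict(node: Node, x: Sequence[float]) -> List[float]:
    # Route the sample down one binary tree until a leaf is reached.
    while node.distribution is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.distribution


def forest_predict(trees: List[Node], x: Sequence[float]) -> List[float]:
    # Average the leaf class distributions of all trees in the ensemble.
    dists = [tree_predict(t, x) for t in trees]
    n_classes = len(dists[0])
    return [sum(d[c] for d in dists) / len(dists) for c in range(n_classes)]


def forest_classify(trees: List[Node], x: Sequence[float]) -> int:
    # Predicted class = index of the largest averaged probability.
    probs = forest_predict(trees, x)
    return max(range(len(probs)), key=probs.__getitem__)
```

Averaging leaf distributions is one common way to combine the trees; majority voting over the per-tree decisions is another.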

This activity focuses on the development of algorithms for the detection of landmarks naturally embedded in the scene.

Recognizing and classifying all the entities appearing in an image is a fundamental goal of computer vision, and a key element in the semantic understanding of images, videos and other multimedia resources. This macro-activity is concerned with developing methods for extracting the visual information needed to understand image content.

We develop algorithms for detecting text embedded in scenes, segmenting it from the background and adjusting it so that it can be read more reliably by an OCR engine.
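As an illustration of the segmentation step only, the sketch below separates text from background with Otsu's global threshold, a standard textbook technique swapped in here for illustration and not necessarily what the group uses. It assumes an 8-bit grayscale image given as nested lists and, by default, dark text on a light background. The function names are hypothetical, and the OCR-oriented adjustment step is not shown.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the 8-bit threshold t that maximizes between-class variance.

    Pixels <= t form the "low" class, pixels > t the "high" class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0  # pixel count and intensity sum of the low class
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0 or w_high == 0:
            continue  # one class is empty: split is undefined
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Unnormalized between-class variance (same argmax as the normalized one).
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]], dark_text: bool = True) -> List[List[int]]:
    # Mark text pixels with 1 and background pixels with 0.
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if (p <= t) == dark_text else 0 for p in row] for row in image]
```

A single global threshold is the simplest choice; unevenly lit scene text often calls for local (adaptive) thresholding instead.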

Object recognition systems make it possible to index images automatically by their visual content, providing a high-level (semantic) description of the visual data.
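One simple way to turn recognition output into a searchable index is an inverted index from object labels to image identifiers. The sketch below assumes that the labels have already been produced by some object recognizer (not shown). The `SemanticIndex` class and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class SemanticIndex:
    """Hypothetical inverted index: object label -> ids of images containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, image_id: str, labels: Iterable[str]) -> None:
        # Labels are assumed to come from an object recognizer (not shown).
        for label in labels:
            self._postings[label].add(image_id)

    def query(self, *labels: str) -> List[str]:
        # Return the images that contain ALL requested labels, sorted for stable output.
        if not labels:
            return []
        sets = [self._postings.get(label, set()) for label in labels]
        return sorted(set.intersection(*sets))
```

For example, after indexing `"img1"` with `["car", "person"]` and `"img2"` with `["car"]`, the query `query("car", "person")` returns only `["img1"]`.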


  • S. Rota Bulò, B. Biggio, I. Pillai, M. Pelillo, F. Roli
    Randomized Prediction Games for Adversarial Machine Learning
    IEEE Transactions on Neural Networks and Learning Systems
  • L. Porzi, S. Rota Bulò, A. Penate-Sanchez, E. Ricci, F. Moreno-Noguer
    Learning Depth-aware Deep Representations for Robotic Perception
    IEEE Robotics and Automation Letters
  • P. Kontschieder, M. Fiterau, A. Criminisi, S. Rota Bulò
    Deep Neural Decision Forests
    International Conference on Computer Vision, Santiago, Chile, December 13-16, 2015


REPLICATE - cReative-asset harvEsting PipeLine to Inspire Collective-AuThoring and Experimentation - will enhance creativity through the integration of novel Mixed-Reality user experiences, enabling 3D/4D storyboarding in unconstrained environments and the ad-hoc expression of ideas by disassembling and reassembling objects in a co-creative workspace.

TRAVEL - Traffic Road Analysis by Visual Event Labelling - is a project on the automatic analysis of traffic sequences acquired by static or moving cameras, aimed at detecting, classifying and tracking vehicles on the road.
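The tracking part of such a pipeline is often built as tracking-by-detection. The sketch below shows a generic greedy association step based on intersection-over-union (IoU). It is not the TRAVEL method. It assumes axis-aligned boxes `(x1, y1, x2, y2)` from some vehicle detector (not shown) and a hypothetical `min_iou` threshold. Unmatched tracks are simply dropped, which a real tracker would normally avoid.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    # Intersection-over-union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box], detections: List[Box], next_id: int, min_iou: float = 0.3
) -> Tuple[Dict[int, Box], int]:
    """Greedily match detections to tracks by IoU; unmatched detections start new tracks."""
    pairs = sorted(
        ((iou(box, det), tid, j) for tid, box in tracks.items() for j, det in enumerate(detections)),
        reverse=True,
    )
    updated: Dict[int, Box] = {}
    used: Set[int] = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little
        if tid in updated or j in used:
            continue  # track or detection already matched
        updated[tid] = detections[j]
        used.add(j)
    for j, det in enumerate(detections):
        if j not in used:
            updated[next_id] = det
            next_id += 1
    return updated, next_id
```

Greedy matching is a cheap approximation; an optimal one-to-one assignment (for example, the Hungarian algorithm) is a common alternative.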

MY-E-DIRECTOR 2012 - Real-Time Context-Aware and Personalized Media Streaming Environments for Large Scale Broadcasting Applications - is an FP7-ICT project in which the user becomes the director of personalized, tailored sports broadcasts.

VIKEF - Virtual Information and Knowledge Environment Framework - aims to advance semantic-enabled support for Information, Content and Knowledge (ICK) production, acquisition, processing, annotation, sharing and use, by empowering information and knowledge environments for scientific and business communities.


MEMORI is a memory-based system for the detection and recognition of objects in digital images.
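The page does not describe how MEMORI works internally. "Memory-based" usually refers to instance-based learning, where labelled examples are stored and new inputs are recognized by comparing them to the stored ones. The sketch below shows that generic idea as a k-nearest-neighbour classifier over hypothetical feature vectors. `ExemplarMemory` and its methods are illustrative names, and the feature extraction step is assumed.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


class ExemplarMemory:
    """Hypothetical memory-based recognizer: stores labelled feature vectors,
    predicts by majority vote among the k nearest stored exemplars."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self._items: List[Tuple[Tuple[float, ...], str]] = []

    def store(self, features: Sequence[float], label: str) -> None:
        # Memorize one labelled exemplar; no training step beyond storage.
        self._items.append((tuple(features), label))

    def recognize(self, features: Sequence[float]) -> str:
        if not self._items:
            raise ValueError("memory is empty")
        query = tuple(features)
        # Rank stored exemplars by Euclidean distance to the query.
        ranked = sorted(self._items, key=lambda item: math.dist(item[0], query))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]
```

Because recognition scans the whole memory, lookup cost grows with the number of stored exemplars; spatial indexes such as k-d trees are a common way to speed it up.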

Tools and facilities

TeV has a long history of research in document image segmentation and camera-based text detection. This page provides links to academic research projects around the world related to Document Image Understanding and Text Extraction from generic images, arranged by topic.