Activity and Scene Analysis

Main activities

We conduct research on vision-based gesture recognition, action recognition and activity analysis, with particular emphasis on user adaptation and integration with other modalities such as acoustic scene analysis.

Non-invasive technologies for monitoring and understanding complex environments have applications in diverse domains such as security in industrial environments, outdoor and indoor surveillance, traffic analysis, assisted living, customer behaviour analysis and sports analysis. This macro-activity is concerned with the development of computer vision technologies for dynamic scene understanding in these application contexts.

Our activities focus on signal processing techniques for detecting user-customized gestures and for recognising repetitive motion patterns such as those that occur during workouts and sports activities.
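As an illustration of what recognising a repetitive motion pattern can involve, the sketch below estimates the repetition period of a one-dimensional motion signal using autocorrelation. It is a minimal example under stated assumptions, not the method used in our systems: the input is assumed to be a single tracked quantity sampled once per frame (for instance the vertical position of a wrist), and the function name `estimate_period` is hypothetical.

```python
import math


def estimate_period(signal):
    """Estimate the repetition period (in samples) of a 1-D motion signal.

    Illustrative assumption: `signal` holds one tracked value per frame.
    Returns None when no repetition can be found.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energy = sum(c * c for c in centered)
    if energy == 0:
        return None  # flat signal: nothing repeats

    # Normalized autocorrelation for lags 0 .. n // 2.
    acf = [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(n // 2 + 1)
    ]

    # Skip the initial lobe around lag 0, where the signal trivially
    # resembles itself, by advancing to the first non-positive value.
    lag = 1
    while lag < len(acf) and acf[lag] > 0:
        lag += 1
    if lag >= len(acf):
        return None

    # The strongest remaining peak marks the most likely period.
    best = max(range(lag, len(acf)), key=lambda k: acf[k])
    return best if acf[best] > 0 else None


if __name__ == "__main__":
    # Synthetic "exercise" signal that repeats every 20 frames.
    frames = [math.sin(2 * math.pi * i / 20) for i in range(200)]
    print(estimate_period(frames))
```

Dividing the recording length by the estimated period gives a rough repetition count. Real motion data is noisier than this synthetic sine wave, so smoothing or a minimum peak height would likely be needed in practice.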

We apply computer vision monitoring in combination with other sensing modalities (e.g. audio) to explore the relationship between proxemics, visual attention, social signals and personality traits during interaction.

We have developed and field-tested traffic analysis tools for monitoring road intersections and queues. We also apply statistical methods for analyzing complex visual scenes involving people and their activities.


TeV ranked 3rd in the EPIC-Kitchens 2019 Action Recognition Challenge at CVPR.


The project develops solutions for identity-preserving tracking in multi-camera environments and tools to facilitate their deployment. The main applications are real-time people monitoring and behaviour analytics in indoor spaces.

Using a multi-camera setup, TeV contributes to building sports analysis systems by developing video analytics that detect and track the movement of players and objects on a court in real time.

FITCITY - Promoting a physically active lifestyle through the use of a mobile application that integrates scientifically based fitness assessments and workout routines, gamification techniques, psychological theories and peer-pressure mechanisms.

TRAVEL - Traffic Road Analysis by Visual Event Labelling addresses the automatic analysis of traffic sequences from static or moving cameras, aiming at the detection, classification and tracking of vehicles on the road.

ACUBE - Ambient Aware Assistance develops technologies for monitoring complex environments that can be applied in areas such as assisted living homes to help personnel, as well as to support the independence and safety of users.

PUMALAB - Multimodal Monitoring and Behavior Analysis aims to advance the vision-based and audio-visual monitoring of people and their behavior, and to enable the inference of attention patterns as well as of physically observable social signals.

NETCARITY - A NETworked multisensor system for elderly people: health CARe, safety and securITY in home environment - is an EC FP6 project that proposes a new integrated paradigm for supporting the independence and engagement of elderly people living alone in their own homes.

CHIL - Computers in the Human Interaction Loop is an FP6 IST project that explores and creates environments in which computers serve humans who focus on interacting with other humans, rather than having to attend to and be preoccupied by the machines themselves.

DIPLODOC - DIstributed Processing of LOcal Data for On-line Car services aims to design and develop a system based on a distributed architecture where intelligent vehicles communicate with a remote traffic control center.

PROGETTO PILOTA - Vision-Based Traffic Parameters for Quantifying Accident Risk targets the development of new methods for the automatic collection of visual traffic parameters describing road users' behavior in order to build an accident risk map. The objective is to quantify how much accident risk can be mitigated by changing some physical characteristics of the road structure.


SCOCA - Traffic Analyzer is a video-based junction monitoring system that extracts and collects traffic data in real time by analysing video sequences acquired from monocular pole-mounted cameras.