Person Tracking in a Camera and Microphone Network

Integrated audio-visual monitoring can report in detail who is speaking, when, and towards whom or what. This is especially relevant for multi-modal interfaces that operate from a distance, and for analyzing social interactions in a closed space. Using spatial reasoning, each detected acoustic event is either associated with one or more speaking individuals, who are tracked persistently by the cameras, or ignored as background noise. Early fusion of audio-visual cues also yields a more precise estimate of the speaker's head orientation.
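
The spatial association step can be illustrated with a minimal sketch. It is not the project's actual method: it assumes that both the camera tracks and the localized acoustic event are available as 2D positions in a common room coordinate frame, and it uses a simple distance gate. The names `Track` and `associate_event` and the 0.5 m threshold are hypothetical choices made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    """A person tracked by the cameras (hypothetical structure)."""
    person_id: int
    x: float  # position in room coordinates, metres
    y: float


def associate_event(event_xy, tracks, max_dist=0.5):
    """Return the ids of all tracked persons within max_dist of the event.

    An empty list means no tracked person is close enough, so the
    acoustic event would be treated as background noise.
    The max_dist gate is an assumed value, not taken from the source.
    """
    ex, ey = event_xy
    return [
        t.person_id
        for t in tracks
        if math.hypot(t.x - ex, t.y - ey) <= max_dist
    ]


tracks = [Track(1, 1.0, 1.0), Track(2, 3.0, 2.0)]
print(associate_event((1.2, 1.1), tracks))  # event close to person 1
print(associate_event((5.0, 5.0), tracks))  # nobody nearby: background noise
```

Returning a list rather than a single id reflects the possibility that one event is associated with more than one speaking individual. A real system would likely also account for localization uncertainty and track history, which this sketch omits.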

In collaboration with the SHINE team of FBK, we combine real-time tracking with acoustic source localization techniques into an integrated solution for audio-visual monitoring in smart spaces.

