Cocktail Party

Social Signal Processing aims at analyzing and modeling social signals in human–human and human–machine interactions. Non-verbal behavior, including gaze, facial expressions and body language, plays a central role in human interaction and can be captured by means of computer vision techniques. In group behavior analysis, key variables include proximity and the focus of attention, i.e. the object or person a subject is attending to.

CocktailParty Dataset is a multi-view dataset designed for social behavior analysis.

The CocktailParty dataset contains about 30 minutes of video recordings of a cocktail party involving 6 subjects in a 30 sqm lab environment.

It was recorded in the TeV lab using four synchronized angled-view cameras (15 Hz, 512x384 px, JPEG format) installed in the corners of the room. Subjects' positions and horizontal head orientations were logged using a particle filter-based body tracker with head pose estimation. The dataset is challenging for video analysis due to frequent and persistent occlusions in a highly cluttered scene. Groups were annotated manually by an expert in one frame every 5 seconds, resulting in a total of 320 distinct frames for evaluation.
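
A minimal sketch of how the annotation rate relates to the video stream, assuming the annotated frames are spaced at a fixed 5-second step starting from the first video frame (an assumption for illustration, not a statement about the dataset tools): at 15 Hz this corresponds to one annotated frame every 75 video frames.

    // Illustrative only: maps an annotation index to a video frame number,
    // assuming annotations start at frame 0 and are spaced exactly 5 s apart.
    #include <cstdio>

    int main() {
        const double fps = 15.0;            // camera frame rate (from the description above)
        const double stepSeconds = 5.0;     // assumed spacing between annotated frames
        const int framesPerStep = static_cast<int>(fps * stepSeconds); // 75 video frames

        const int numAnnotated = 320;       // number of annotated frames reported above
        for (int k = 0; k < 3; ++k) {       // print the first few mappings
            std::printf("annotation %d -> video frame %d\n", k, k * framesPerStep);
        }
        std::printf("last annotation %d -> video frame %d\n",
                    numAnnotated - 1, (numAnnotated - 1) * framesPerStep);
        return 0;
    }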

[Figure: TeV Lab CocktailParty dataset, people tracking]

Disclaimer: CocktailParty dataset can be used for research or academic purposes only. The dataset has been published along with the paper referenced below.

download


note on usage

Although the dataset is multi-view, some groups working on single-view approaches have reported results using only the CAM1 data (e.g., as we did in our ICCV15 paper cited below).


camera calibration

  • Calibration files for CAM1, CAM2, CAM3 and CAM4, generated using a checkerboard pattern and OpenCV

  • C++ code to compute the image projection of a 3D point from the calibration files (a hedged example of this kind of projection is sketched after this list)

  • Bash script showing how to use the annotations
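
The sketch below is a rough illustration of what such a projection step can look like with OpenCV in C++; it is not the code distributed with the dataset. The calibration file name and the YAML keys ("camera_matrix", "distortion_coefficients", "rvec", "tvec"), as well as the world coordinate frame and units, are assumptions and may differ from the actual calibration files.

    // Illustrative sketch: load camera calibration from an OpenCV YAML file
    // and project a single 3D world point into the image plane.
    #include <opencv2/core.hpp>
    #include <opencv2/calib3d.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    int main(int argc, char** argv) {
        const std::string calibFile = (argc > 1) ? argv[1] : "cam1_calibration.yml"; // assumed name

        cv::FileStorage fs(calibFile, cv::FileStorage::READ);
        if (!fs.isOpened()) {
            std::cerr << "Cannot open calibration file: " << calibFile << std::endl;
            return 1;
        }

        cv::Mat K, distCoeffs, rvec, tvec;
        fs["camera_matrix"] >> K;                     // 3x3 intrinsic matrix (assumed key)
        fs["distortion_coefficients"] >> distCoeffs;  // lens distortion (assumed key)
        fs["rvec"] >> rvec;                           // rotation as Rodrigues vector (assumed key)
        fs["tvec"] >> tvec;                           // translation vector (assumed key)
        fs.release();

        // Example 3D point in the room/world coordinate frame (units assumed to be metres).
        std::vector<cv::Point3f> worldPoints = { cv::Point3f(1.0f, 2.0f, 0.0f) };
        std::vector<cv::Point2f> imagePoints;

        // Full pinhole projection including lens distortion.
        cv::projectPoints(worldPoints, rvec, tvec, K, distCoeffs, imagePoints);

        std::cout << "Projected pixel: " << imagePoints[0] << std::endl;
        return 0;
    }

Typically this compiles against the OpenCV core and calib3d modules (e.g. via pkg-config opencv4).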

related publications