Cocktail Party

Social Signal Processing aims to analyze and model social signals in human–human and human–machine interactions. Non-verbal behavior, including gaze, facial expression, and body language, plays a major role in human interaction and can be captured with computer vision techniques. In group behavior analysis, key variables include proximity and the focus of attention, i.e. the object or person one is attending to.

The CocktailParty dataset is a multi-view dataset designed for social behavior analysis.

The dataset contains about 30 minutes of video recordings of a cocktail party held in a 30 m² lab environment and involving six subjects.

It was recorded in the TeV lab using four synchronized, angled-view cameras (15 Hz, 512x384 px, JPEG format) installed in the corners of the room. Subjects' positions and horizontal head orientations were logged using a particle filter-based body tracker with head pose estimation. The dataset is challenging for video analysis due to frequent and persistent occlusions in a highly cluttered scene. Groups were manually annotated by an expert in one frame every 5 seconds, resulting in a total of 320 annotated frames for evaluation.

TeV lab CocktailParty dataset: people tracking.

Disclaimer: the CocktailParty dataset can be used for research or academic purposes only. The dataset was published along with the paper referenced below.


note on usage

Although the dataset is multi-view, some groups working on single-view approaches have used only CAM1 data for their results (e.g., as in our ICCV15 paper cited below).

camera calibration

  • Calibration files for CAM1, CAM2, CAM3, and CAM4, generated with a checkerboard pattern and OpenCV

  • C++ code to compute the image projection of a 3D point from the calibration files

  • A bash script demonstrating how to use the annotations

related publications