SALSA (Synergetic sociAL Scene Analysis) is a dataset of uninterrupted recordings of an indoor social event involving 18 subjects over 60 minutes. It serves as a rich and extensive repository for the behavioral analysis and social signal processing communities. In addition to the raw multi-modal data, SALSA contains position, pose and F-formation annotations over the entire event duration for evaluation purposes, as well as information on the participants’ personality traits.
Scenario and roles. SALSA was recorded in a regular indoor space. The captured social event involved 18 participants and had two parts of roughly equal duration. The first part was a poster presentation session in which graduate students presented four research studies, with a fifth person chairing the session. In the second part, all participants were free to interact over food and beverages during a cocktail party.
Sensors. The data were captured by a camera network and by wearable badges worn by the targets. The camera network comprised four synchronized static RGB cameras (1024×768 resolution) operating at 15 frames per second (fps). During the recordings, each participant wore a sociometric badge: a 9×6×0.5 cm box equipped with four sensors, namely a microphone, an infrared (IR) beam and detector, a Bluetooth detector and an accelerometer.
Annotations. Using a dedicated multi-view scene annotation tool, the position, head orientation and body orientation of each target were annotated every 45 frames (3 seconds). The annotated positions and head/body orientations were then used to deduce F-formations. Prior to data collection, all participants filled in the Big Five personality questionnaire. The questionnaire owes its name to the five traits it assumes to be constitutive of personality: Extraversion, Agreeableness, Conscientiousness, Emotional Stability and Creativity.
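The annotation interval follows from the camera frame rate: 45 frames at 15 fps is 3 seconds. The minimal sketch below illustrates that arithmetic. It assumes a recording of exactly 60 minutes with annotations starting at frame 0; the function names are hypothetical and are not part of any SALSA tooling, and the actual number of annotated instants in the released data may differ.

```python
FPS = 15                     # camera frame rate stated in the description
ANNOTATION_STEP_FRAMES = 45  # one annotation every 45 frames


def frame_to_seconds(frame: int, fps: int = FPS) -> float:
    """Convert a frame index to a timestamp in seconds."""
    return frame / fps


def annotated_frames(duration_s: int, fps: int = FPS,
                     step: int = ANNOTATION_STEP_FRAMES) -> list[int]:
    """Frame indices that would carry an annotation.

    Assumption: annotation starts at frame 0 and the recording
    lasts exactly duration_s seconds.
    """
    total_frames = duration_s * fps
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    frames = annotated_frames(60 * 60)  # nominal 60-minute event
    print(frame_to_seconds(ANNOTATION_STEP_FRAMES))  # interval in seconds
    print(len(frames))                               # annotated instants
```

Under these assumptions, a 60-minute recording yields 1200 annotated instants per target.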
SALSA Dataset is published with the paper