Person re-identification consists of matching observations of individuals across disjoint views in a network of surveillance cameras, in order to recognize an individual who has previously been observed in a different location. In many domains, camera settings and low resolution do not allow the use of face recognition. The most widely used cue is global appearance: this presupposes that the person does not change clothes, which is generally true, for example, for pedestrians walking in a monitored town. Nevertheless, the problem is non-trivial because the appearance of individuals varies greatly across scenes, owing to different acquisition devices, changes in viewpoint, illumination conditions, shadows, occlusions, and the pose and orientation of the person being searched for, as well as the presence of similar-looking individuals in the scene. The task is related to multi-camera single-person tracking and has its main application in video surveillance.
In contrast, visual person recognition or identification is based on hard biometric cues such as face, fingerprint, iris, and so on. Often the subject is cooperative, for example in security applications such as unlocking doors or smartphones. Given TeV's pioneering studies in this field, some past research activities are also briefly reported.
Re-identification methods can be roughly divided into single-shot and multiple-shot approaches. The former rely on a single occurrence of the individual to be searched for, while the latter integrate information over time, using multiple views of the subject obtained by tracking them in the video stream in which an operator (or an intelligent module of a surveillance platform) has flagged them as a suspect. In general, the features that describe the suspect can be biometric (gait, height) and/or appearance-based (clothes, pieces of clothing, case). The selection depends on the resolution of the images and the field of view of the cameras. In any case, the selected features are used to build a signature of the person. The same kind of features are then extracted from the frames of the streams captured by the surveillance cameras, possibly in restricted regions (e.g. only where people move), and compared with the signature of the suspect, thereby detecting possible locations of the suspect.
Re-identification can help to recognize and track impaired or elderly people as they move from one room to another in a monitored house. In multi-camera surveillance systems, this technology is used to detect the presence of suspicious persons in complex environments such as train stations, assisting security operators by providing re-identification hypotheses. Tracking with re-identification also provides useful information for improving the management of shopping spaces.
TeV contributed to the development of a video-surveillance platform with re-identification modules designed to assist security operators by providing re-identification hypotheses. We proposed BFiVe, a supervised algorithm for single-shot person re-identification. How it works:
The image of the person is split into a set of rectangular receptive fields (of different sizes, overlapping);
The following 5 color channels are considered: H, S, L, Cb, Cr, each handled separately;
Image pixels I(x,y) are described by vectors of low-level features: I, x, y, Gx, Gy, Gxx, Gyy for each channel;
For each receptive field, a Fisher vector description is built for each channel. The per-channel Fisher vectors are concatenated and compressed (PCA) to obtain a single descriptor of the receptive field;
The image is described by the set of local descriptors.
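The pixel-level description above can be sketched as follows. This is a minimal illustration, not the BFiVe implementation: the function names `pixel_features` and `receptive_field` are hypothetical, and central finite differences (via `np.gradient`) are assumed in place of whatever derivative filters the original method uses. The Fisher vector encoding and PCA steps are not covered.

```python
import numpy as np

def pixel_features(channel):
    """Return an (H*W, 7) array of per-pixel features for one color channel:
    I, x, y, Gx, Gy, Gxx, Gyy (in that column order)."""
    channel = np.asarray(channel, dtype=float)
    h, w = channel.shape
    ys, xs = np.mgrid[0:h, 0:w]            # pixel coordinates
    # Assumption: finite differences stand in for the gradient operators.
    gy, gx = np.gradient(channel)          # derivatives along rows (y) and columns (x)
    gxx = np.gradient(gx, axis=1)          # second derivative along x
    gyy = np.gradient(gy, axis=0)          # second derivative along y
    feats = [channel, xs, ys, gx, gy, gxx, gyy]
    return np.stack([f.ravel() for f in feats], axis=1)

def receptive_field(channel, top, left, height, width):
    """Crop one rectangular receptive field (hypothetical layout parameters)."""
    return np.asarray(channel)[top:top + height, left:left + width]

# Usage: describe the pixels of one receptive field of a synthetic channel.
channel = np.arange(48, dtype=float).reshape(6, 8)
patch = receptive_field(channel, top=1, left=2, height=4, width=4)
features = pixel_features(patch)           # shape (16, 7)
```

In the described pipeline, one such feature set per channel and per receptive field would be the input to the Fisher vector encoding.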
Weak scoring functions
For each receptive field we collect the set of descriptors computed on the images in the training set (containing pairs of images depicting the same person).
From this set a scoring function is learnt:
differences between descriptors are computed and split into two sets: S (differences between descriptors of the same person) and D (differences between descriptors of different persons);
the scoring function is an estimate of the log-likelihood ratio log [P(x|S) / P(x|D)], under the hypothesis that S and D each follow a multivariate Gaussian distribution.
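A weak scoring function of this kind can be sketched as below, assuming each of S and D is fitted with its own mean and full covariance. The names `fit_gaussian`, `gaussian_logpdf` and `llr_score` are hypothetical, and the small diagonal regularization `reg` is an assumption added for numerical stability, not something stated in the source.

```python
import numpy as np

def fit_gaussian(samples, reg=1e-6):
    """Estimate mean and covariance of a set of difference vectors (n, d)."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    # Assumption: a small ridge keeps the covariance invertible.
    cov = cov + reg * np.eye(samples.shape[1])
    return mean, cov

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian at point x."""
    diff = np.asarray(x, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(mean) * np.log(2.0 * np.pi) + logdet + maha)

def llr_score(x, gauss_s, gauss_d):
    """log P(x|S) - log P(x|D): positive values favor 'same person'."""
    return gaussian_logpdf(x, *gauss_s) - gaussian_logpdf(x, *gauss_d)

# Usage with synthetic data: same-person differences are small,
# different-person differences are spread out.
rng = np.random.default_rng(0)
same = rng.normal(0.0, 0.1, size=(500, 3))
diff = rng.normal(0.0, 1.0, size=(500, 3))
gs, gd = fit_gaussian(same), fit_gaussian(diff)
score_near = llr_score(np.zeros(3), gs, gd)      # small difference vector
score_far = llr_score(np.full(3, 2.0), gs, gd)   # large difference vector
```

With this synthetic data the small difference vector receives a positive score and the large one a negative score, which is the behavior expected of a log-likelihood ratio.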
A boosting procedure is applied iteratively to select the weak scoring function that produces the ranking with minimum error:
the error is given by the number of samples (image pairs) not correctly ranked, i.e. the pair depicting the same person is not at rank 1, or pairs depicting different individuals are ranked ahead of the correct one.
The final scoring function is a linear combination of the selected weak scoring functions, with weights that are also determined by the boosting procedure.
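The ranking error and the final linear combination can be illustrated with the sketch below. It follows one possible reading of the error definition (for each probe whose correct match is not at rank 1, count the correct pair plus every wrong pair ranked ahead of it). The boosting loop itself, including how the weights are computed, is not reproduced; `ranking_error` and `combine` are hypothetical names, and score matrices are assumed to have the correct match on the diagonal.

```python
import numpy as np

def ranking_error(scores):
    """Count incorrectly ranked pairs in a square score matrix where
    scores[i, j] compares probe i with gallery j and j == i is the true match."""
    scores = np.asarray(scores, dtype=float)
    true = np.diag(scores)
    # Number of wrong gallery entries scoring strictly higher than the true match.
    ahead = (scores > true[:, None]).sum(axis=1)
    # Assumption: a misranked probe contributes its true pair plus the pairs ahead.
    return int(np.sum(np.where(ahead > 0, ahead + 1, 0)))

def combine(weak_scores, weights):
    """Linear combination of T weak score matrices, shape (T, n, n)."""
    weak_scores = np.asarray(weak_scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, weak_scores, axes=1)

# Usage: two weak scorers on two probes; the weights are arbitrary here.
a = [[0.9, 0.1], [0.2, 0.8]]   # ranks both probes correctly
b = [[0.1, 0.9], [0.2, 0.8]]   # misranks probe 0
err_a, err_b = ranking_error(a), ranking_error(b)
final = combine([a, b], [1.0, 0.5])
err_final = ranking_error(final)
```

Selecting, at each round, the weak scorer with the smallest such error is the step that the boosting procedure repeats.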
Comparing person re-identification algorithms requires tests on commonly available databases. For supervised algorithms, the database is randomly split into two disjoint parts: the former is used for the training phase, the latter for testing. Since results are affected by how the database is split, it is common practice to repeat the whole process using different random partitions and to average the results to represent the performance of the algorithm. We provide all the random partitions of four standard re-identification databases (namely VIPeR, 3DPeS, PRID2011 and iLIDS119) that we used to train and test BFiVe. With our partitions available, improvements over BFiVe can be measured on exactly the same runs, removing the effect of random splitting.
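The evaluation protocol described above can be sketched as follows. This is an illustration only: the function names, the 50/50 split and the seed are assumptions, and the partitions actually distributed for BFiVe should be used when comparing against it.

```python
import numpy as np

def random_partitions(n_identities, n_runs, train_fraction=0.5, seed=0):
    """Generate reproducible train/test splits over identity indices."""
    rng = np.random.default_rng(seed)
    n_train = int(round(n_identities * train_fraction))
    splits = []
    for _ in range(n_runs):
        perm = rng.permutation(n_identities)
        # Train and test identities are disjoint by construction.
        splits.append((np.sort(perm[:n_train]), np.sort(perm[n_train:])))
    return splits

def mean_over_runs(evaluate, splits):
    """Average a user-supplied evaluate(train_ids, test_ids) over all splits."""
    return float(np.mean([evaluate(train, test) for train, test in splits]))

# Usage: three runs over ten identities with a placeholder metric.
splits = random_partitions(n_identities=10, n_runs=3, seed=1)
avg_test_size = mean_over_runs(lambda train, test: len(test), splits)
```

Fixing the seed, or better, storing the splits themselves, is what allows two algorithms to be compared on exactly the same runs.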
S. Messelodi and C.M. Modena. Boosting Fisher Vector based Scoring Functions for Person Re-Identification, Image and Vision Computing, Vol. 44, pp. 44-58, 2015
N. Conci, F.G.B. De Natale, S. Messelodi, C.M. Modena, M. Verza, and R. Fioravanti. An integrated framework for video surveillance in complex environments. IEEE International Smart Cities Conference - ISC2, 2016
Unusual biometric cues can be used in some contexts for identity verification. Machine learning is applied to create a person verification tool based on one such cue: a person's identity is verified by matching top-view snapshots of the fingers, supplementing purely geometrical finger-shape comparison with textural information. Low-dimensional feature vectors are used to train binary classifiers based on small Gaussian Basis Function networks.
R. Brunelli. Identity verification through finger matching: A comparison of Support Vector Machines and Gaussian Basis Functions classifiers. Pattern Recognition Letters 27(16):1905-1915, 2006
A pioneering integrated multi-sensory person recognition system (for identification and verification) using visual and acoustic cues was developed and patented in 1993 (US 5412738, EP0582989). The system is based on face and voice analysis and uses a multi-classifier for recognition. A technique for integrating multiple classifiers (two based on acoustic features and three based on visual ones) at a hybrid rank/measurement level is introduced using HyperBF networks. Two different methods for rejecting an unknown person are also introduced.
R. Brunelli and D. Falavigna. Person identification using multiple cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(10):955-966, 1995
R. Brunelli, D. Falavigna, T. Poggio, and L. Stringa. Automatic person recognition by acoustic and geometric features, Machine Vision and Applications, 8:317-325, 1995
R. Brunelli and T. Poggio. Face recognition: features versus templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(10):1042-1052, 1993