First IEEE Workshop on Face Processing in Video, held in conjunction with CVPR 2004
As we try to recognize a face in an image or a scene, we notice the following division and hierarchy of face processing tasks. First, we scan a scene to localize the areas where a face may be located, which defines the face segmentation task. Then we approach the area of interest and detect the presence of a face there (the face detection task). Then we follow the face (the tracking task) until it appears in a position convenient for recognition, which, in the case of faces, is an eye-to-eye position (the eye detection and face modeling tasks). Only then do we attempt to assert whether the face is familiar. If it is familiar, we recognize it (the recognition task); if it does not look familiar, we memorize it (the memorization task). These and other face processing tasks are summarized in Figure 2.
While we do not claim that this is the exact order in which humans recognize faces (facial expression and orientation, for example, can be retrieved without first localizing the face), this is the order used to organize the papers presented at the workshop.
Thirty papers were selected from 43 submissions for presentation at the workshop. The papers are now available from the IEEE Xplore digital library.
Since it can be difficult to evaluate the video-based approaches presented in the papers from the video snapshots shown alone, many authors have also submitted links to the actual video demos, which can be downloaded from the Internet for viewing. These links, as well as links to the related project websites, are made available at the workshop's website at http://www.visioninterface.net/fpiv04. A BibTeX file listing all workshop papers is also available at http://www.visioninterface.net/fpiv04/fpiv04.bib.
The logo designed for the workshop, which appears as an animated image at the workshop's website, was developed by the workshop's chair to illustrate some peculiarities of processing faces in video. In video, a face is often arbitrarily oriented and captured at low resolution and under poor lighting conditions. It can also be blurred by motion. At the same time, video allows one to capture facial motion, which makes it possible to localize and recognize a face from blinking, for example.

The canonical face representation, which is the base face representation used to memorize and recognize faces from video, is often eye-centered and uses only the central part of the face. Commonly it is also chosen to be of the lowest resolution at which the face is still recognizable. In particular, one of the most frequently used canonical face sizes is 24 x 24 pixels, which allows one to describe the natural symmetry of a human face using 16 equal blocks, with the eyes located at the intersections of the upper blocks and the mouth at the intersection of the lower blocks. Face recognition on black-and-white images is just as good as recognition on colour images, and many recognition techniques work on binary features extracted from the face. The logo also shows that the eyes are the most salient features of a human face, immediately capturing the observer's attention, while hair is not. It further shows that despite the low-resolution, binary representation of the face, it is still possible for humans to classify it as the face of a man or a woman, and to see that it is the face of the same person, even ... as the age difference between the two images is almost thirty years.
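As an illustration of the canonical 24 x 24, 16-block face layout described above, the following sketch computes the block grid and the nominal landmark positions at the block intersections. The exact coordinates and helper names are illustrative assumptions for this preface, not a published standard:

```python
# Sketch of the canonical 24 x 24 eye-centered face grid described above.
# Assumption: the face is split into a 4 x 4 grid of 6 x 6-pixel blocks,
# with the eyes at the intersections of the upper blocks and the mouth
# at the intersection of the lower blocks.

FACE_SIZE = 24             # canonical face resolution (pixels per side)
GRID = 4                   # 4 x 4 = 16 equal blocks
BLOCK = FACE_SIZE // GRID  # each block is 6 x 6 pixels

def block_corners():
    """Return the top-left (row, col) corner of each of the 16 blocks."""
    return [(r * BLOCK, c * BLOCK) for r in range(GRID) for c in range(GRID)]

def nominal_landmarks():
    """Nominal eye and mouth positions (row, col) at block intersections."""
    return {
        "left_eye": (BLOCK, BLOCK),                    # (6, 6)
        "right_eye": (BLOCK, FACE_SIZE - BLOCK),       # (6, 18)
        "mouth": (FACE_SIZE - BLOCK, FACE_SIZE // 2),  # (18, 12)
    }

print(len(block_corners()))   # 16 blocks in total
print(nominal_landmarks())
```

Note how the eye-centered layout puts both eyes on the same grid line one block in from each side, which is what makes the left-right symmetry of the face explicit in this representation.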
Finally, I would like to thank all authors of the submitted papers. With their participation, the First IEEE Workshop on Face Processing in Video has become a real success and an inspiration for future workshops in this new and exciting area of research.
Dmitry O. Gorodnichy, FPIV'04 Program Chair
Copyright © 2004