Facial features have been central to face recognition systems from the beginning, and they still play a major role in today’s approaches that try to imitate the human capacity for recognizing faces.
Early efforts [Marques PAPER 20] suggested geometric measurements of the eyes (the distance between the two eyes and the ratio between width and length), and discarding certain facial features that had been used in the past was found to improve performance [Marques PAPER 24]. Other important features include skin color [Marques PAPER 99, 33], which strongly influences face detection. Localization of specific features such as the mouth and eyes is used for normalization before the features are extracted. It is critical to determine which facial features are needed for accurate face recognition and which are irrelevant and merely add noise.
The introduction of mathematical tools such as eigenfaces produced a new approach to facial recognition [Marques PAPER 101, 56], which made it feasible to compute the similarity between the features of human faces.
3.0 Face Detection
Before face recognition can take place, some preliminary steps are required. The aim of face detection is to determine whether human faces appear in a given frame and, if so, where each face is located in the image. Its output is a set of patches, each holding a single face from the input image; face alignment is then carried out to compensate for the orientation and scale of these patches.
As technology evolves, some applications do not necessarily need the face detection phase. In addition, most databases contain images that have already been normalized. In other words, the frames are already standardized, so the detection phase can be skipped.
However, this does not apply to every case. When a face tracking and recognition system is deployed for video surveillance, for example, a face detection phase is mandatory. Face detection can arguably be regarded as a part of recognition algorithms. On the other hand, there are a few problems to be taken into consideration [Marques PAPER 117, 125], because the images might have been captured in uncontrolled environments, for example surveillance video, which gives rise to the following challenges:
Pose variation. Large pose variations arise from a moving subject or a poor camera angle. Under such variations the performance of face detection drops considerably.
Feature occlusion. Features are hard to extract when components such as glasses or a beard are present, or when the subject’s face is partly covered by other faces.
Facial expression. Facial gestures are also a major challenge, because the subject’s expression may vary drastically, making the face hard to detect.
Imaging conditions. If the image is not captured with settings appropriate for the scene, such as a good exposure, shutter speed and aperture, it may end up blurry, too dark or washed out, making the face hard to distinguish.
3.1 Structure of Face Detection
The challenges mentioned above are not the only ones in the face detection phase; there are plenty of sub-challenges. Some approaches detect and locate the faces at the same time and then proceed to the next phase, face tracking (see Figure 1.0). Other approaches first run a detection loop and, if a face is detected, try to locate the face of interest before executing face tracking (see Figure 1.0). Face location is a simplified form of face detection: its task is to find the location of a particular face in an image that shows only one face.
Figure 1.0 : Two different Face Detection Processes.
Such approaches generally share the same processing steps. First, the images are reduced in size, which shortens the response time and makes processing more efficient. Most algorithms then apply some pre-processing steps that adapt the input frames to the preconditions of the specific algorithm. Depending on the algorithm, the frames may be examined as they are, without further modification, while other algorithms extract different measurements of the face. In the next step the facial features are extracted, and the measurements are weighted, compared and assessed to determine whether there is a face in the frame and, if so, where it is located. Finally, some approaches include a learning procedure in which they add the new frames to their existing models.
Face detection can therefore be seen as a two-class problem: deciding whether or not there is a face in the frame. Extended further, it resembles a simplified face recognition problem, and face detection algorithms share many techniques with face recognition.
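The shared steps described above (size reduction, pre-processing, scoring, decision) can be sketched as follows. This is only an illustration under stated assumptions: the frame is a grayscale image stored as a list of rows, all function names are hypothetical, and `mean_brightness` is a placeholder score standing in for whatever face model a real detector would use.

```python
def downsample(frame, factor):
    """Reduce the frame by keeping every `factor`-th pixel in each direction."""
    return [row[::factor] for row in frame[::factor]]


def normalize(frame):
    """Pre-processing: stretch intensities linearly to the range [0, 1]."""
    lo = min(min(row) for row in frame)
    hi = max(max(row) for row in frame)
    span = (hi - lo) or 1  # avoid division by zero on a flat frame
    return [[(p - lo) / span for p in row] for row in frame]


def detect(frame, score_window, size, threshold):
    """Slide a size x size window over the frame and return the best
    (score, row, col) if it reaches the threshold, otherwise None."""
    best = None
    for r in range(len(frame) - size + 1):
        for c in range(len(frame[0]) - size + 1):
            window = [row[c:c + size] for row in frame[r:r + size]]
            score = score_window(window)
            if best is None or score > best[0]:
                best = (score, r, c)
    if best is not None and best[0] >= threshold:
        return best
    return None


def mean_brightness(window):
    """Placeholder score (an assumption); a real detector would use a face model."""
    return sum(sum(row) for row in window) / (len(window) * len(window[0]))
```

A call such as `detect(normalize(downsample(frame, 2)), mean_brightness, size=2, threshold=0.9)` answers both questions at once: whether a window passes the threshold, and where it is located.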
3.2 Face Detection Approaches
Classifying face detection approaches is difficult. There is no single criterion for grouping them, because they overlap and are combined with one another to some extent. Two classification criteria are presented below. The first distinguishes the approaches by scenario: depending on the scenario, different algorithms may be used, and some of them perform better in clear-cut scenarios. The second divides the detection approaches into four groups.
Criterion “Scenario dependent”
Controlled environment. The frames are undemanding: the lighting is controlled, the face is turned towards the camera, and simple face detection techniques are sufficient.
Colored frames. Skin color is a key factor, as it is used to find human faces among the other objects in the frame. This technique is not very robust: if the lighting conditions change, there may be overexposure, or a lot of noise in dark scenes. Although human skin color varies drastically, several studies indicate that most of this variation lies in intensity, so chrominance is a good feature in scenarios where the lighting is kept reasonably steady [Marques PAPER 117]. Building an effective and robust face detection approach based on skin color still involves many challenges, as it is not easy to establish a solid model of human skin color. Nevertheless, there are approaches that have built face detection algorithms in this domain [Marques PAPER 99].
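The remark that chrominance is largely independent of intensity can be illustrated with normalized chromaticity, r = R/(R+G+B) and g = G/(R+G+B). The sketch below is an assumption-laden example rather than a method taken from the cited papers: the function names are hypothetical and the threshold ranges are illustrative values only.

```python
def chromaticity(r, g, b):
    """Return normalized (r, g) chromaticity, which discards overall intensity."""
    total = r + g + b
    if total == 0:
        return (0.0, 0.0)  # black pixel: chromaticity is undefined
    return (r / total, g / total)


# Illustrative bounds only (assumptions, not values from the cited papers).
R_RANGE = (0.36, 0.60)
G_RANGE = (0.25, 0.37)


def is_skin(pixel):
    """Classify one (R, G, B) pixel by its chromaticity alone."""
    r, g = chromaticity(*pixel)
    return R_RANGE[0] <= r <= R_RANGE[1] and G_RANGE[0] <= g <= G_RANGE[1]


def skin_mask(frame):
    """frame is a list of rows of (R, G, B) tuples; returns rows of booleans."""
    return [[is_skin(p) for p in row] for row in frame]
```

A pixel and a uniformly darker copy of it, for example `(200, 140, 110)` and `(100, 70, 55)`, have the same chromaticity, so a uniform change in brightness does not change the decision. Changes in the color of the light source do, which is why the approach is fragile when lighting is not controlled.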
Frames in motion. This scenario usually arises in video, where the subjects are moving. The algorithm tries to localize the face in the regions where motion is present. Many researchers are working to achieve the best detection results and the highest performance at the same time. Another significant motion-based technique is eye blink detection, which has gained a lot of attention in the motion detection field and beyond [Marques PAPER 30, 53].
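One simple way to find where motion is present is frame differencing. The following sketch assumes grayscale frames stored as lists of rows; the function names and the default threshold are hypothetical choices for illustration, not part of any cited algorithm.

```python
def motion_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose gray level changed by more than `threshold`."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def motion_bounding_box(mask):
    """Return (top, left, bottom, right) of the moving region, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, moved in enumerate(row) if moved]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))
```

The bounding box only narrows down the search area; a face detector still has to confirm that the moving region actually contains a face.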
Criterion “Division of the detection approaches into four groups”
This classification was proposed by Yang, Kriegman and Ahuja [Marques PAPER 117], who subdivide the approaches into four groups. The groups may overlap or be mixed with one another; in other words, an approach could belong to more than one group. The classification is presented below:
Comprehensive-based (knowledge-based) approaches. Rule-based approaches that encode our comprehension of human faces.
Constant face features (feature-invariant) approaches. Algorithms that look for face features that stay constant regardless of pose.
Modeled combining (template matching) approaches. Algorithms that compare the features of the input frames from the sensor with those stored in the database to see whether the faces match.
Appearance-based approaches. The model is learnt from a set of training images, relying on techniques such as statistical analysis and machine learning.
These four categories are examined in further detail below:
Comprehensive-based approaches
These are rule-based approaches that try to capture our comprehension of human faces and translate it into simple rules. For example, a human face generally has two symmetric eyes, and the eye region is usually darker than the rest of the face. Facial features that can be used therefore include the distance from one eye to the other, and the difference in color intensity between the eye area and the cheek area. A major issue with these methods is finding the appropriate number of rules to keep the right balance between false positives and false negatives. If the rules are too general, there may be many false positives. Conversely, if the rules are too detailed, there is a higher chance of false negatives. A solution to overcome such problems is to construct a hierarchical comprehensive-based approach. On its own, however, this method is quite limited in finding many faces in complicated frames.
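The rule that the eye region is darker than the cheeks can be written as a small test on a candidate face patch. This is a minimal sketch under assumptions: the patch is a grayscale list of rows at least four rows high, the band positions (second and third quarter of the patch height) and the `min_contrast` margin are illustrative, and the function names are hypothetical.

```python
def band_mean(face, top, bottom):
    """Mean gray level of rows top..bottom-1 of a face patch."""
    rows = face[top:bottom]
    return sum(sum(row) for row in rows) / (len(rows) * len(rows[0]))


def passes_eye_cheek_rule(face, min_contrast=20):
    """Rule: the eye band must be darker than the cheek band by a margin.

    The band positions and `min_contrast` are illustrative assumptions.
    """
    h = len(face)
    eyes = band_mean(face, h // 4, h // 2)
    cheeks = band_mean(face, h // 2, 3 * h // 4)
    return cheeks - eyes >= min_contrast
```

The `min_contrast` parameter makes the trade-off visible: a small margin accepts many non-face patches (false positives), while a large margin rejects real faces with low contrast (false negatives).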
Constant face features approaches
Many researchers have shown great interest in finding constant face features for face detection. The idea is to overcome the limits of our intuitive comprehension of faces. An early algorithm of this kind was contributed by Han, Liao, Yu and Chen in 1997 [Marques]. The method is divided into several sub-steps. The first step is to determine the eye-analogue pixels, so that the unwanted pixels are removed from the frame. After a segmentation operation, each individual eye-analogue segment is treated as a candidate for one of the eyes. A set of rules is then applied to examine every possible pair of eyes.
As soon as the eyes are determined, the technique computes the face area as a rectangle governed by a set of functions, so that the potential face is detected, normalized, well oriented and at a fixed size.
Next, the face area is verified using a back-propagation neural network. Finally, a cost function is applied to make the final selection. The reported success rate reaches 94%, even in frames containing many human faces. These methods perform efficiently when the inputs are simple. But what about the case of a person wearing glasses?
To answer this question, we take into account other features that can handle such problems. For example, as mentioned in the sections above, there are approaches that detect faces as textures or simply by the color of the skin, so choosing the appropriate color model is important. Good color models are RGB and HSV, which are widely used in skin-based face detection. The authors of [Marques PAPER 112] chose the following parameters.
Parameter 1.0 and Parameter 2.0 have been used to detect the pixels in the frame that contain skin color. Nevertheless, these approaches on their own do not detect faces well, because skin color can change dramatically when the lighting on the subject changes. Such an algorithm is therefore best combined with other techniques such as structure and geometry, or even local symmetry.
Modeled combining approaches
These techniques attempt to define the subject’s face as a function. Finding a proper model for every subject’s face is not an easy task; different physiological characteristics can be defined independently. For instance, the subject’s face can be divided into components such as the eyes, mouth, nose and face contour. A face template can also be built from edges, but this is not very convenient, as such techniques are restricted to faces that are turned towards the camera and free of occlusion. On the other hand, a face can also be represented as an outline. Some models describe the relation between face regions in terms of levels of brightness and darkness. These reference patterns are compared with the input frames to estimate the similarities or dissimilarities between them, trying to avoid false positives and false negatives when detecting the subject’s faces. This approach is straightforward to apply, but on its own it is insufficient for face detection, because its performance is poor under variations in pose, shape and scale. Proposals exist to solve these issues, one of which is deformable models.
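Comparing a stored pattern with every position of an input frame can be sketched with the sum of squared differences (SSD) as the dissimilarity measure. This is one common choice, picked here for illustration; the function names are hypothetical and the frames are assumed to be grayscale lists of rows.

```python
def ssd(patch, template):
    """Sum of squared differences: 0 means a perfect match."""
    return sum(
        (p - t) ** 2
        for prow, trow in zip(patch, template)
        for p, t in zip(prow, trow)
    )


def match_template(frame, template):
    """Return (best_ssd, row, col) of the best-matching position."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ssd(patch, template)
            if best is None or score < best[0]:
                best = (score, r, c)
    return best
```

Because the template is compared pixel by pixel at a fixed size and orientation, a face that is rotated, scaled or differently lit yields a large SSD even at the correct position, which is the weakness noted above.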
Appearance-based approaches
The models used by these techniques are learnt from the instances collected in the frames. Commonly these techniques rely on fields such as machine learning and statistical analysis to find the relevant physiological characteristics of a subject’s face in the frames. Some approaches in this group are based on a probabilistic network: the feature vector of the face in the frame is treated as a random variable that indicates the probability of belonging to a face or not. Another method worth mentioning is the discriminant function, which is used to discriminate between face and non-face classes. Such approaches are also used in the feature extraction process of face recognition algorithms, and they are examined in further detail in the sections below. Some of the important tools or approaches are:
Eigenface-based. Sirovich and Kirby [Marques PAPER 101, 56] built an efficient approach for representing subjects’ faces using Principal Component Analysis (PCA). The aim of this method was straightforward: a subject’s face is represented in a coordinate system whose basis vectors are called eigenpictures. In the following years Turk and Pentland improved on this work, developing the eigenface-based technique for recognition algorithms [Marques PAPER 110].
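To make the idea of a face coordinate system concrete, the sketch below computes a single leading eigenpicture from flattened face vectors using power iteration. It is a toy illustration under assumptions, not the procedure of the cited authors: real systems keep many eigenfaces and use a proper eigensolver, and all function names here are hypothetical.

```python
import math


def mean_face(faces):
    """Pixel-wise mean of a list of flattened face vectors."""
    n = len(faces)
    return [sum(f[i] for f in faces) / n for i in range(len(faces[0]))]


def top_eigenface(faces, iterations=100):
    """Leading principal component of the faces, found by power iteration."""
    mu = mean_face(faces)
    centered = [[x - m for x, m in zip(f, mu)] for f in faces]
    v = [1.0] * len(mu)  # start vector; must not be orthogonal to the answer
    for _ in range(iterations):
        # Apply the (unnormalized) covariance: C v = sum_k (a_k . v) a_k
        w = [0.0] * len(v)
        for a in centered:
            dot = sum(ai * vi for ai, vi in zip(a, v))
            for i, ai in enumerate(a):
                w[i] += dot * ai
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return mu, v


def project(face, mu, eigenface):
    """Coordinate of a face along the eigenface axis."""
    return sum((x - m) * e for x, m, e in zip(face, mu, eigenface))
```

Each face is then described by its coordinates along the eigenfaces instead of by its raw pixels, and two faces can be compared by the distance between those coordinates.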
Distribution-based. The algorithms proposed by Sung [Marques PAPER 106] were designed mostly for pattern detection or object detection. The method works by collecting an adequate number of frames for the pattern class of interest, making sure that all the sources of frame variation that need to be handled are covered. Next, a feature space must be selected that represents the distribution of all allowable frame appearances of the pattern class. The system then matches all candidate frames against the distribution-based canonical face template. Finally, a trained classifier is used to identify instances of the desired pattern class against background frame patterns, based on distance measurements between the input pattern and the distribution-based pattern in the chosen feature space. The subspace that represents the facial patterns can be defined using PCA or Fisher’s Discriminant.
Neural Networks. Neural networks (NN) have achieved a good performance rate on a considerable number of pattern recognition problems, such as character recognition and object recognition. This approach can therefore also be used for face detection in various ways. Sung [Marques PAPER 93] states that earlier researchers used NNs to learn the face and non-face patterns, with the detection problem defined as a two-class problem. From that perspective, the real challenge was how to represent the “frames not including subject’s faces” class. Another use of neural networks is to classify patterns using distance measurements in order to find a discriminant function [Marques PAPER 106]. There have also been attempts to find an optimal boundary between face and non-face frames using a constrained generative model [Marques PAPER 91].
Support Vector Machines. SVMs are linear classifiers that maximize the margin between the decision hyperplane and the instances in the training set. A well-chosen hyperplane should therefore minimize the classification error on unseen test patterns. Osuna et al. were the first to use this classifier in face detection [Marques PAPER 87].
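The margin idea can be illustrated with a small soft-margin linear SVM trained by subgradient descent on the hinge loss. This is a generic textbook formulation given as a sketch, not the training procedure used by Osuna et al.; the function names, learning rate and regularization value are assumptions.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by subgradient descent.

    labels must be +1 (face) or -1 (non-face).
    """
    dim = len(samples[0])
    n = len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for i, xi in enumerate(x):
                    grad_w[i] -= y * xi / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

Only the samples with a margin below 1 contribute to the update, which reflects the fact that the hyperplane is determined by the training instances closest to it.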
Sparse Network of Winnows. SNoWs were first used for detection by Yang et al. [Marques PAPER 118]. They defined a sparse network of two target nodes, or in other words linear units: one representing face patterns and the other non-face patterns. The SNoW has an incrementally learnt feature space, and each newly labeled case serves as a positive example for one target and as a negative example for the other. The performance was promising and at the same time very efficient.
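The two-node arrangement can be sketched with the classic Winnow multiplicative update. This is a simplified illustration under assumptions (binary features given as sets of active indices, a threshold of half the feature count, promotion factor 2); the class and function names are hypothetical and do not reproduce the original SNoW implementation.

```python
class WinnowNode:
    """A single linear unit trained with the Winnow multiplicative update."""

    def __init__(self, n_features, alpha=2.0):
        self.weights = [1.0] * n_features
        self.threshold = n_features / 2
        self.alpha = alpha

    def activation(self, active):
        # `active` holds the indices of the features present in the sample
        return sum(self.weights[i] for i in active)

    def update(self, active, is_positive):
        predicted = self.activation(active) >= self.threshold
        if predicted == is_positive:
            return  # Winnow only changes weights on mistakes
        factor = self.alpha if is_positive else 1 / self.alpha
        for i in active:
            self.weights[i] *= factor


def train_snow(samples, n_features, epochs=10):
    """samples: list of (active_feature_indices, is_face) pairs."""
    face, nonface = WinnowNode(n_features), WinnowNode(n_features)
    for _ in range(epochs):
        for active, is_face in samples:
            # Each example is positive for one node and negative for the other.
            face.update(active, is_face)
            nonface.update(active, not is_face)
    return face, nonface


def classify(face, nonface, active):
    """True if the face node responds more strongly than the non-face node."""
    return face.activation(active) > nonface.activation(active)
```

Because only the weights of active features are touched, and only after a mistake, the update stays cheap even when the feature space is large and sparse.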
Naive Bayes Classifiers. This algorithm is based on object recognition as described by Schneiderman and Kanade [Marques PAPER 96]. The probability that a face is present in the frame is computed by counting how frequently a series of patterns occurs over the training frames. The Naive Bayes classifier captures the joint statistics of the local appearance and position of the face, as well as the statistics of local appearance and position in the visual world. The technique had a very good success rate on faces turned towards the camera.
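Counting pattern frequencies per position and combining them under the independence assumption can be sketched as a log-likelihood ratio. This is a simplified illustration, not Schneiderman and Kanade's detector: each window is assumed to be a tuple of already quantized pattern labels, Laplace smoothing is added to avoid zero probabilities, and the function names are hypothetical.

```python
import math
from collections import Counter


def train_pattern_counts(windows):
    """Count how often each (position, pattern) pair occurs in one class."""
    counts = Counter()
    for window in windows:
        for position, pattern in enumerate(window):
            counts[(position, pattern)] += 1
    return counts, len(windows)


def log_likelihood_ratio(window, face_model, nonface_model, n_patterns):
    """Sum of per-position log ratios, treating positions as independent
    (the naive Bayes assumption). Laplace smoothing avoids log(0)."""
    face_counts, n_face = face_model
    non_counts, n_non = nonface_model
    total = 0.0
    for position, pattern in enumerate(window):
        p_face = (face_counts[(position, pattern)] + 1) / (n_face + n_patterns)
        p_non = (non_counts[(position, pattern)] + 1) / (n_non + n_patterns)
        total += math.log(p_face / p_non)
    return total
```

A window is labeled as a face when the ratio exceeds a chosen threshold; a threshold of 0 corresponds to equal prior probabilities for the two classes.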
Hidden Markov Model. This statistical model has also been used for face detection problems. The challenge is to build a proper model so that the output probability can be trusted. In this model the facial features are interpreted as strips of pixels, and the probabilistic transitions between states correspond to the boundaries between these pixel strips. As with Naive Bayes classifiers, Hidden Markov Models are commonly used together with other approaches to build a detection algorithm.
Information-Theoretical Approach. Markov Random Fields (MRF) can be used to model the contextual constraints of a subject’s face pattern and its correlated features. The discrimination between the classes (whether a frame contains a face or not) is maximized using the Kullback–Leibler divergence, which is how the method comes to be applied in face detection.
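For reference, the Kullback–Leibler divergence between two discrete distributions p and q is D(p || q) = sum_i p_i log(p_i / q_i). The sketch below only computes this quantity; the function name is hypothetical, and how the divergence is embedded in an MRF-based detector is not shown.

```python
import math


def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

If p is the distribution of a feature over face frames and q its distribution over non-face frames, a larger divergence means the feature separates the two classes better. The measure is not symmetric, so D(p || q) and D(q || p) generally differ.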
Inductive Learning. This approach has been widely applied in face detection. Algorithms such as Quinlan’s C4.5 or Mitchell’s FIND-S have been used for this purpose [Marques PAPER 32].
3.3 Face tracking
Many of the face recognition algorithms have a video sequence as the input. The idea behind these systems is not only to detect particular faces but also to track them. Tracking is essentially a motion estimation problem. It can be performed with different methods, for example feature tracking, model-based tracking, head tracking and image-based tracking. These algorithms can be classified in a variety of ways [Marques PAPER 125]:
– Head tracking versus feature tracking: the head can be tracked as a whole, or particular physiological characteristics can be tracked individually.
– Two-dimensional versus three-dimensional tracking: a 2D algorithm outputs the position in the frame where the face is located, while a 3D algorithm carries out a 3D modeling of the subject’s face. The 3D algorithm outperforms the 2D one because it accommodates variations in orientation.
The basic idea of a face tracking process is to locate a given face in an image and then to calculate the differences between frames in order to update the location of the subject’s face in the frame. Many challenges arise during this process, for example partial occlusions, variations in illumination, computational speed and deformations of the face due to facial expression. A popular face tracking algorithm worth mentioning is the one described in [Marques PAPER 33]. In most cases the human face is represented by a state vector that includes the center position of the rectangle surrounding the face in the frame. Any new candidate faces that come into the frame are evaluated by a Kalman estimator. During the tracking process, if a face was already present in one of the previous frames, that face is used as a model or template by the Kalman estimator; meanwhile the region is searched by an SSD (sum of squared differences) algorithm. Once the evaluation is done, the measured color information is fed into the Kalman estimator, which refines the estimate of the face region, and the state vector of the tracked face is then updated. The results are impressive even when there is a considerable number of faces in the frame or when the color changes due to the lighting conditions.
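The predict-and-update cycle of a Kalman estimator can be sketched for a single coordinate of the face center with a constant-velocity model. This is a minimal illustration under assumptions, not the tracker from the cited work: the state holds only position and velocity (no size or color), the noise variances are arbitrary, and the class name is hypothetical.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate of the face center."""

    def __init__(self, position, dt=1.0, process_var=1e-3, meas_var=1.0):
        self.p, self.v = position, 0.0       # state: position and velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]    # state covariance
        self.dt, self.q, self.r = dt, process_var, meas_var

    def predict(self):
        """Advance the state by one frame and return the predicted position."""
        dt = self.dt
        (p00, p01), (p10, p11) = self.P
        self.p += dt * self.v
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.p

    def update(self, z):
        """Correct the prediction with a measured position z."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                     # innovation variance
        k0, k1 = p00 / s, p10 / s            # Kalman gain
        residual = z - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        # P = (I - K H) P with H = [1, 0]
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
```

In a tracker, `predict()` would give the position around which the template search is carried out in the next frame, and the location found by that search would be passed to `update()`.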