Revisiting face detection: Supercharging Viola-Jones with particle swarm optimization for enhanced performance

Abstract

In recent years, face detection has emerged as a prominent research field within Computer Vision (CV) and Deep Learning. Detecting faces in images and video sequences remains a challenging task due to various factors such as pose variation, varying illumination, occlusion, and scale differences. Despite the development of numerous face detection algorithms in deep learning, the Viola-Jones algorithm, with its simple yet effective approach, continues to be widely used in real-time camera applications. The conventional Viola-Jones algorithm employs AdaBoost for classifying faces in images and videos. The challenge lies in working with cluttered real-time facial images. AdaBoost needs to search through all possible thresholds for all samples to find the minimum training error when receiving features from Haar-like detectors. Therefore, this exhaustive search consumes significant time to discover the best threshold values and optimize feature selection to build an efficient classifier for face detection. In this paper, we propose enhancing the conventional Viola-Jones algorithm by incorporating Particle Swarm Optimization (PSO) to improve its predictive accuracy, particularly in complex face images. We leverage PSO in two key areas within the Viola-Jones framework. Firstly, PSO is employed to dynamically select optimal threshold values for feature selection, thereby improving computational efficiency. Secondly, we adapt the feature selection process using AdaBoost within the Viola-Jones algorithm, integrating PSO to identify the most discriminative features for constructing a robust classifier. Our approach significantly reduces the feature selection process time and search complexity compared to the traditional algorithm, particularly in challenging environments. 
We evaluated our proposed method on a comprehensive face detection benchmark dataset, achieving impressive results, including an average true positive rate of 98.73% and a 2.1% higher average prediction accuracy when compared against both the conventional Viola-Jones approach and contemporary state-of-the-art methods.

Keywords

AdaBoost; Computer Vision (CV); face detection algorithm; particle swarm optimization; Viola-Jones

1 Introduction

Face detection and tracking play pivotal roles in a multitude of computer vision applications, encompassing human-computer interaction (HCI), human-robot interaction (HRI), computer surveillance systems, biometrics, facial recognition, facial expression recognition (FER), and various authentication solutions [22]. Yet, it remains an intricate challenge within the realms of computer vision, image processing, and pattern recognition. Face detection involves the identification of faces in digital images or videos, encompassing tasks such as determining their precise locations, recognizing facial landmarks, and even discerning emotional expressions [15]. Furthermore, achieving high detection accuracy in complex backgrounds holds paramount importance in real-time scenarios, where factors such as pose variations, varying illumination, occlusions, and scale variations pose significant hurdles [28].

In recent years, researchers have introduced various techniques to address the challenges associated with face detection. These techniques leverage different types of prior knowledge about faces and can be broadly categorized into four distinct approaches: (i) Knowledge-Based Approach: This approach, as outlined in [16], relies on predefined rules based on human understanding of facial geometry. These rules dictate the relative distances and positions of facial features. By applying these rules, faces are detected and recognized. A subsequent verification process is often employed to eliminate incorrect detections. (ii) Template Matching Method: The template matching method, as described in [20], involves using a predefined face template or a parameterized face model to identify faces within input images. This technique entails analyzing the pixels within an image window using a predefined pattern to determine the presence of a human face. After initial detection, a verification step is typically applied to refine the results. The edge detection method is employed to detect specific facial features such as eyes, nose, and mouth within a face model. This method is utilized both for face detection and facial feature localization. (iii) Feature-Invariant Approach: The feature-invariant approach, as discussed in [9], focuses on extracting structural features of the face. Initially, these features are utilized for classifier algorithms that distinguish between faces and non-faces in images or videos. Such features may include skin tone, facial contours, and specific facial elements like eyes, nose, and mouth. (iv) Appearance-Based Approach: In the appearance-based approach, detailed in [29, 32], a collection of representative training face images is used to create a face model. This model encapsulates pixel intensities, effectively representing the human face. Machine learning techniques are often employed to identify relevant facial image characteristics.

The Viola-Jones algorithm, introduced by Paul Viola and Michael Jones in 2001 [21], yields significant results in real-time scenarios. It is a general-purpose tool that can also detect other objects when trained on the corresponding datasets. It comprises four key components: Haar-like features with thresholds, integral images, AdaBoost, and a cascade classifier. Haar-like features are used to extract a vast number of features for identifying faces in images. Each feature is defined over black and white rectangular regions, and its value is the difference between the pixel intensities of those regions. If the feature value falls below a threshold, the detection window is classified as positive (a face); otherwise, it is classified as negative (a non-face). Integral images expedite the Haar-like feature extraction process. AdaBoost, a machine-learning algorithm, selects Haar-like features and combines them into a strong classifier by iteratively selecting weak classifiers [7]. However, constructing a classifier with a low error rate often requires a significant number of rounds to identify optimal features. When a decision stump is used as the weak classifier, AdaBoost may require more time to identify optimal features. This often leads to a higher false-positive rate, particularly in dynamic environments [31]. Furthermore, AdaBoost must search among over 180,000 possible features, involving 2.16 × 10¹³ feature evaluation combinations. The cascade classifier efficiently dismisses non-face regions in images or video frames. In addition, the algorithm is sensitive to face rotation, potentially leading to missed detections if faces are not upright. Scale variations are another challenge, reducing accuracy for extremely small or large faces. The computational complexity during training is a further shortcoming, because the exhaustive search mechanism used in AdaBoost must evaluate a large number of features. Moreover, faces that are partially obscured or hidden by occlusions might not be accurately detected. These challenges stem from the selection of numerous features in AdaBoost and the time required to determine the optimal threshold values for identifying strong features while detecting faces in the search window. Several studies [5–7, 9] have suggested exploring more heterogeneous or homogeneous feature types to enhance detector performance. Nevertheless, adding feature types inevitably leads to a larger feature set and increased memory requirements. As the feature space grows, it becomes evident that the exhaustive search mechanism employed in the standard AdaBoost algorithm cannot manage the search process efficiently. Consequently, the training time increases, which is one of the primary factors discouraging many approaches from exploring alternative feature types.

On the other hand, deep learning approaches such as YOLO [40, 41], SSD [44], Fast-RCNN [47], and CNN-based face detection [43] offer significant performance in real-time environments. These algorithms are general object detection methods designed for real-time processing speed, but they may not be as specialized for face detection and could encounter challenges in scenarios with crowded faces. Similarly, Faster R-CNN achieves high accuracy in complex scenarios in images and video sequences but demands a substantial amount of training data and computational resources, which makes it less suitable for simple applications. Overall, these algorithms require a large amount of training data and significant computational resources [46], which hinders their implementation on small devices such as mobile cameras.

In this paper, we propose integrating Particle Swarm Optimization (PSO) into the AdaBoost framework, where it replaces the exhaustive search used in the original AdaBoost for feature selection and for finding the optimal threshold values of the decision stump. PSO has been used in a wide range of feature screening, optimization, and computer vision tasks and has given promising results so far [8, 11]. The proposed approach aims to shorten the training time, minimize the training error, and develop a robust classifier for face detection by selecting discriminative features. The essential contributions of this paper are as follows:

We have optimized the exhaustive feature selection process in AdaBoost with the PSO algorithm,

Threshold values of the AdaBoost selection process are optimized using PSO,

The proposed method has reduced the computational time during the feature selection process, and,

The performance of the proposed method has been compared with the conventional method using related metrics of the face detection algorithm.

The remainder of this paper is organized as follows. Section 2 presents related work on face detection using evolutionary and heuristic approaches, together with the associated challenges. Section 3 summarizes background information on the conventional methods. Section 4 discusses the proposed Viola-Jones algorithm utilizing PSO. Section 5 presents the experimental results and a comparison with other state-of-the-art algorithms. Finally, Section 6 concludes the paper with a summary of the work, findings, and results.

2 Related work

2.1 Selection criteria

This section presents existing work on face detection, focusing on efforts to reduce computational time and optimize feature selection using evolutionary and heuristic approaches such as PSO. To gather related work for this analysis, we conducted searches across several databases, including Scopus, Web of Science, IEEE, and Science Direct. We used specific keywords such as “Face Detection”, “PSO”, “Viola-Jones algorithm”, and “Object Detection and PSO” to refine the search. While many keywords were available, we limited our search to papers related to Viola-Jones and evolutionary algorithms. Fig. 1 illustrates the PRISMA (Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols) flow diagram, depicting the paper selection process and the exclusion criteria applied in this research work. Additionally, the papers [2, 27] provide insights into existing face detection methods and highlight challenges that still occur in the real-time face detection process.

Fig. 1

PRISMA flow diagram for paper selection for this study.

2.2 Face detection using machine learning

Perez and Vallejos [7] proposed using PSO to optimize template-based face detection on frontal faces. This approach yielded significant results, relying on face size and line integral values. Lu and Ming [35] introduced a composite feature-based face detection algorithm to enhance the detection rate of rigid objects on faces. They conducted an experiment using the FDDB benchmark dataset and achieved considerable results compared to conventional approaches. Huang et al. [34] improved the Viola-Jones algorithm by upgrading it with HoloLens and enhancing face detection using Haar-like rectangle features. This approach resulted in a 12% increase in average detection accuracy compared to existing face detection methods. Mohemmed et al. [5] proposed optimizing the AdaBoost feature selection process using the PSO evolutionary algorithm. This method selects the best features and optimizes the threshold values within the search space. Experiments were conducted with a “Wisconsin Breast Cancer” image dataset, achieving an average classification rate of 0.97% and a false-negative rate of 0.02%. Similarly, Zakaria and Suandi [38] combined a neural network with AdaBoost methods for face detection, improving detection performance and creating a robust AdaBoost classifier. However, the method was too complex for rapid face detection. An AdaBoost neural network, developed by Zakaria et al. [39], is hierarchical, with the skin module roughly identifying faces, the AdaBoost filtering non-face regions, and the neural network serving as the primary face recognition tool. Li et al. [19] created a Gaussian model for skin color distribution, identified regions with different shades, and detected skin color areas using a cascade classifier. Additionally, the work of Lee et al. [33] attempted to incorporate a weight adjustment factor into a normalized support vector machine (SVM) as the base learner for AdaBoost.

In addition, Zhang and Ye [14] modified AdaBoost by incorporating two features: they used PSO to determine the threshold values corresponding to the optimal solution for these two features, and they formed a strong classifier by combining weak classifiers based on these dual features. Zhang and Fan [12] employed Q-statistic correlation in training weak classifiers to reduce the commonality among them and eliminate similar rectangular features. Yang et al. [36] utilized a neural network and AdaBoost to develop an efficient pedestrian detection algorithm. Krishnan et al. [46] designed face detection for thermal and visible image registration using a saliency map strategy integrated with PSO techniques. The authors achieved an average improvement of 16.93% in similarity index score and 7.02% in image quality index score. Subsequently, Besnassi et al. [47] introduced a dispersed Haar filter and optimized it with PSO, differential evolution, and a genetic algorithm. The authors achieved significant results when using Haar-differential evolution for frontal face detection on various state-of-the-art face detection datasets. Babu et al. [50] presented a facial expression recognition system based on a deep belief network, with PSO used for feature extraction together with PCA. Taherkhania et al. [4] developed a CNN based on AdaBoost to reduce the computational processing time required for component prediction over large training datasets. These approaches either reduce training time or enhance detection rates, but none of them completely addresses the shortcomings of the Viola-Jones-based face detection algorithm. Among deep learning-based approaches, the paper [45] presented a modified version of U-NET for face detection and recognition on face images captured by AI cameras with filters, and attained reasonable detection and recognition accuracy on these images. Ranjan et al. [42] introduced a deep pyramid single-shot face detector for face verification and identification. However, that study does not focus on challenging face detection images.

2.3 Face detection challenges

Face detection is the first step of various face-related applications such as face recognition, facial emotion recognition, face tracking, and face analysis. Its task is to identify the location of the face in images or video frames. Two types of factors affect the effectiveness of a face detection algorithm. Intrinsic factors change the appearance of the face itself, for example through facial hair, sunglasses, age, and cosmetics. Extrinsic factors include illumination, pose variation, scale variation, and noise [30]. Face detection techniques therefore require an efficient method to detect faces under various challenging conditions [12]. Figure 2 shows examples of typical challenges associated with faces.

Fig. 2

Sample Face Detection challenge images [30].

The first challenge is locating and detecting a face in motion or in a complex environment, which is not an easy task. The second is illumination, which affects image visibility through the magnitude of the light intensity as well as through patterns of shading and shadows. The third is pose variation, in which the face appears at different rotations and orientations; it is one of the most serious problems for face identification. The Viola-Jones algorithm can handle head rotation of up to 40°, but detection becomes more challenging at larger angles, particularly when the training images were captured at one particular angle. The final challenge is occlusion, a blockage of part of the face. It is one of the hardest challenges in face detection because the whole face is not available in the input image.

In summary, several face detection algorithms exist in deep learning and machine learning, yet the Viola-Jones algorithm is still widely used in digital cameras and social networking applications. Many authors have attempted to modify the Viola-Jones algorithm using various techniques for general object detection purposes. However, there is still a gap in identifying its problems and exploring new approaches for improvement.

3 Methodology

This section briefly explains the fundamentals of the Viola-Jones and PSO algorithms and the motivation for integrating them to enhance face detection performance.

3.1 Viola-Jones algorithm

Viola and Jones [21] introduced the Viola-Jones algorithm for general object detection, but it was later trained using face images for face detection.

The Pseudocode for the AdaBoost algorithm is as follows:

Algorithm 1 Pseudocode for AdaBoost [21]
Given M examples (x₁, y₁), …, (x_i, y_i), …, (x_M, y_M), where y_i ∈ {0, 1}
Define initial weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for y_i = 0, 1 respectively, where m and l
represent the number of negative and positive examples.
for t = 1, …, T do
  (1) For each feature j, train a classifier h_j()
  (2) Evaluate the error of the classifier: $\epsilon_t = \sum_{i=1}^{M} w_{t,i} \, b_i$
  (3) Select the classifier h_t() with the minimum error $\epsilon_t$
  Update weights: $w_{t+1,i} = w_{t,i} \, \beta_t^{1 - b_i}$ (2)
  where b_i = 0 if h_t(x_i) = y_i, and b_i = 1 otherwise,
  with $\beta_t = \epsilon_t / (1 - \epsilon_t)$ (3)
end for
Output strong classifier:
$H(x) = \begin{cases} 1, & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \geq \theta \sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases}$
with $\alpha_t = \log(1/\beta_t)$

This system accelerates the detection process by promptly rejecting non-face regions. It employs four fundamental components: Haar-like features, integral images, AdaBoost, and a cascade classifier. To identify facial features, the system uses five types of Haar-like features, each computed by subtracting the pixel sums of black and white rectangular regions (refer to Fig. 3(a)). Approximately 180,000 features are generated by varying the height, width, and position of each feature in a 24 × 24 moving window, as depicted in Fig. 3(b). Integral images play a crucial role in rapidly computing these simple features, as defined by Equation (1). When constructing a robust classifier, it is important to account for the very large number of rectangle features associated with each sub-window [13, 35].

$II(x, y) = \sum_{x' \leq x, \, y' \leq y} I(x', y')$ (1)
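As an illustration of Equation (1), the following minimal sketch computes an integral image with NumPy and uses it to evaluate a two-rectangle Haar-like feature with a handful of array lookups. The function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the synthetic 24 × 24 window are assumptions made for this example and are not taken from the original implementation.

```python
import numpy as np

def integral_image(img):
    """II(x, y): sum of all pixels I(x', y') with x' <= x and y' <= y."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    """Pixel sum of a rectangle using at most four integral-image lookups."""
    bottom, right = top + height - 1, left + width - 1
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle feature: right half minus left half (width must be even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum

# Synthetic 24 x 24 detection window (illustrative data only).
window = np.arange(24 * 24, dtype=np.int64).reshape(24, 24)
ii = integral_image(window)
feature_value = two_rect_feature(ii, top=4, left=6, height=8, width=10)
```

Because every rectangle sum costs a constant number of lookups regardless of its size, the feature value can be evaluated at any scale or position at the same cost, which is what makes the large feature pool tractable.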

Fig. 3

Haar-like Features for feature extraction.

A very large pool of rectangle features is available for constructing an effective classifier, so the primary objective is to identify the relevant ones. The fundamental AdaBoost algorithm is presented in Algorithm 1. This algorithm iteratively trains weak classifiers over T rounds. During training, the algorithm adjusts the sample weights, increasing the relative weight of misclassified samples and decreasing that of correctly classified ones. As a result, correctly classified samples have less influence in the next iteration, while misclassified samples are given greater consideration. AdaBoost takes as input a training set S = (x₁, y₁), …, (x_i, y_i), …, (x_M, y_M) of size M. Each sample x_i is a vector in the domain space X, and y_i is a label in the label space Y. A weight is assigned to each sample and updated in each iteration of the training process (as described in Equation (2)). The update factor β_t is computed from the weighted error using Equation (3), as shown in Algorithm 1. The final strong classifier H(x) is a combination of the weak classifiers h₁, …, h_T, weighted by their coefficients α_t. This design makes the algorithm focus on challenging facial samples that are difficult to detect. This research work focuses on binary classification, in which Y = {0, 1}.
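The boosting loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the helper names, the small synthetic dataset, the per-round weight normalization, and the clipping of the error away from 0 and 1 (to keep β_t finite) are all assumptions introduced for the example. The strong-classifier threshold θ is set to 0.5 as an assumed default.

```python
import numpy as np

def stump_predict(X, j, theta, polarity):
    """Decision stump on feature j: predicts 1 on one side of theta, 0 on the other."""
    return (polarity * X[:, j] >= polarity * theta).astype(int)

def best_stump(X, y, w):
    """Exhaustive search over features, thresholds, and polarities (the costly step)."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature index, threshold, polarity)
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = np.sum(w * (stump_predict(X, j, theta, polarity) != y))
                if err < best[0]:
                    best = (err, j, theta, polarity)
    return best

def adaboost(X, y, T):
    m, l = np.sum(y == 0), np.sum(y == 1)
    w = np.where(y == 0, 1.0 / (2 * m), 1.0 / (2 * l))  # initial weights
    stumps, alphas = [], []
    for _ in range(T):
        w = w / w.sum()                                  # assumed normalization step
        err, j, theta, polarity = best_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)             # assumption: avoid beta = 0
        beta = err / (1 - err)                           # Equation (3)
        b = (stump_predict(X, j, theta, polarity) != y).astype(int)  # 0 if correct
        w = w * beta ** (1 - b)                          # Equation (2)
        stumps.append((j, theta, polarity))
        alphas.append(np.log(1 / beta))
    return stumps, np.array(alphas)

def strong_classify(X, stumps, alphas, theta_strong=0.5):
    votes = sum(a * stump_predict(X, j, th, p) for a, (j, th, p) in zip(alphas, stumps))
    return (votes >= theta_strong * alphas.sum()).astype(int)

# Tiny synthetic problem: the label depends only on feature 0 (illustrative data).
X = np.column_stack([np.linspace(0, 1, 20), np.cos(np.arange(20)), np.sin(np.arange(20))])
y = (X[:, 0] > 0.5).astype(int)
stumps, alphas = adaboost(X, y, T=3)
predictions = strong_classify(X, stumps, alphas)
```

The triple loop in `best_stump` is the exhaustive search whose cost grows with the number of features and candidate thresholds.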

Algorithm 2 Proposed Viola-Jones algorithm using PSO
Input: Original sample images
Output: Detected faces shown in bounding boxes
for s ← 1 to number of image scales do
  Load the sample image at scale s to create image_s
  Calculate the integral image of image_s
  Given N labeled samples (x₁, y₁), …, (x_i, y_i), …, (x_N, y_N), where y_i ∈ {0, 1}
  (0, 1 class labels)
  Initialize $w_{1,i} = \frac{1}{2m}, \frac{1}{2p}$ for y_i = 0, 1, where m and p represent
  the number of negative and positive samples, respectively.
  for t = 1, …, T do
    (1) Normalize the weights: $w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{N} w_{t,j}}$; for each feature j, train a classifier h_j()
    (2) if t ≤ T/2 then
          Optimize the weak classifiers $\{h_j()\}_{j=1}^{J}$ using the PSO algorithm:
          $\{h_t, \epsilon_t\} = PSO(\{h_j\}_{j=1}^{J}, \{x_n, y_n, w_n\}_{n=1}^{N})$
          Evaluate the fitness function for each particle
          Select the classifier h_t() corresponding to the particle at the global best position
        end if
    (3) Compute the classifier weight $\alpha_t = \log \frac{1 - \epsilon_t}{\epsilon_t}$
    (4) Update the weights: $w_{t+1,i} = w_{t,i} \, \beta_t^{1 - b_i}$, where b_i = 0 if h_t(x_i) = y_i, and b_i = 1 otherwise,
        with $\beta_t = \epsilon_t / (1 - \epsilon_t)$
  end for
  Output strong classifier:
  $H(x) = \begin{cases} 1, & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \geq \theta \sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases}$
  with $\alpha_t = \log(1/\beta_t)$
  if the sub-window passes all per-stage checks then
    Select this sub-window as a face
  end if
end for

Optimization function PSO() for the AdaBoost method
Input arguments: $\{h_j()\}_{j=1}^{J}, \{x_n, y_n, w_n\}_{n=1}^{N}$
Define c_s = c_g = 2, w_min = 0.2, w = w_max = 1.5
Define random parameters: r_s, r_g ∈ [0, 1]
Define state vectors: $X_i^{l} \in R^{D}$ and $V_i^{l} \in R^{D}$, initialized with random values.
for l = 1, …, L do
  for i = 1, …, I do
    (1) Fit a classifier $h(X_i^{l}; x)$ to the training examples using the AdaBoost weights w_n
    (2) Evaluate $\epsilon_i^{l} = \frac{\sum_{n=1}^{N} w_n \, |h(X_i^{l}; x_n) - y_n|}{\sum_{n=1}^{N} w_n}$ (8)
    (3) Update the particle:
        $V_i^{l+1} = w V_i^{l} + c_s r_s (Q_i^{s} - X_i^{l}) + c_g r_g (Q^{g} - X_i^{l})$
        $X_i^{l+1} = X_i^{l} + V_i^{l+1}$
    (4) Update the personal best point $Q_i^{s}$, if necessary
  end for
  (5) Update the global best point $Q^{g}$, if necessary
  (6) Update momentum: $w = w_{\max} - \frac{l}{L}(w_{\max} - w_{\min})$
end for
return $\{h_{Q^{g}}(), \epsilon_{h_{Q^{g}}}\}$

3.2 Particle swarm optimization

Particle swarm optimization (PSO) is a population-based, nature-inspired optimization method for determining optimal function parameters [5, 25]. PSO, a stochastic optimization technique inspired by the collective behavior of a swarm, was initially proposed by Kennedy and Eberhart in 1995. In the PSO algorithm, each candidate solution is referred to as a particle in the search space. Each particle has a cost value evaluated by a cost function, and a velocity that defines its direction of movement [26, 3].

The PSO process begins with a random population of particles drawn from the solution space, and their velocities are initialized randomly within the problem search area. The motion of every particle is guided by the most promising location found so far in the search space as well as by the best position found by the particle itself. Each particle approaches its ideal position by adjusting its velocity [37]. The velocity and position of each particle ‘i’ are updated at iteration ‘t’ according to the following equations:

$\begin{matrix} V_{i} (t + 1) = V_{i} (t) + c_{1} s_{1} (P_{i} (t) - Q_{i} (t)) \\ + c_{2} s_{2} (P_{i}^{g} (t) - Q_{i} (t)) \end{matrix}$ (4)

$Q_{i} (t + 1) = Q_{i} (t) + V_{i} (t + 1)$ (5)

Here, s₁ and s₂ represent random values within the range [0, 1], c₁ and c₂ represent the cognitive and social acceleration constants, Q_i and V_i denote the position and velocity of particle i, P_i is the best position found so far by particle i, and P_i^g is the best position found so far by the swarm. In each iteration, all particles are updated according to Equations (4) and (5). In most cases, however, the velocity can quickly attain extremely high values, especially for particles that are distant from the global optimum. PSO is commonly used with two topologies: the global neighborhood topology, which promotes information sharing among all particles, and the ring topology, which restricts knowledge transfer and helps prevent the population from converging prematurely to a local best solution [3].
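The update rules in Equations (4) and (5) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `pso_minimize`, the sphere cost function, the search bounds, and the velocity clamp `v_max` (a common remedy for the velocity growth mentioned above) are all choices made for this example and are not part of the original method. Random factors are drawn per dimension, which is also an assumption.

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=20, n_iters=100, c1=2.0, c2=2.0,
                 bounds=(-5.0, 5.0), v_max=1.0, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    Q = rng.uniform(lo, hi, size=(n_particles, dim))          # positions Q_i
    V = rng.uniform(-v_max, v_max, size=(n_particles, dim))   # velocities V_i
    P = Q.copy()                                              # personal bests P_i
    P_cost = np.array([cost(q) for q in Q])
    g = np.argmin(P_cost)
    Pg, Pg_cost = P[g].copy(), P_cost[g]                      # global best
    history = [Pg_cost]
    for _ in range(n_iters):
        s1 = rng.random((n_particles, dim))
        s2 = rng.random((n_particles, dim))
        V = V + c1 * s1 * (P - Q) + c2 * s2 * (Pg - Q)        # Equation (4)
        V = np.clip(V, -v_max, v_max)                         # assumed velocity clamp
        Q = Q + V                                             # Equation (5)
        costs = np.array([cost(q) for q in Q])
        improved = costs < P_cost
        P[improved] = Q[improved]
        P_cost[improved] = costs[improved]
        g = np.argmin(P_cost)
        if P_cost[g] < Pg_cost:
            Pg, Pg_cost = P[g].copy(), P_cost[g]
        history.append(Pg_cost)
    return Pg, Pg_cost, history

def sphere(q):
    """Illustrative cost function with its minimum at the origin."""
    return float(np.sum(q ** 2))

best_position, best_cost, history = pso_minimize(sphere, dim=2)
```

The recorded `history` of global-best costs never increases, since the global best is replaced only when a particle finds a strictly better position.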

4 Combining Viola-Jones with PSO

In this paper, we propose two approaches to optimize the Viola-Jones algorithm for enhancing prediction accuracy and minimizing training errors. First, we employ PSO to select the optimal threshold values in the decision stump for choosing the best features. Second, we optimize AdaBoost to select the optimal features, enhancing face detection accuracy and speed on cluttered real-time images.

The pseudocode of the proposed Viola-Jones algorithm using PSO is given in Algorithm 2.

4.1 Selecting threshold value using PSO

In the proposed method, the weak classifier is a decision tree with two leaves, commonly known as a decision stump. Instead of exhaustively searching all candidate thresholds to construct a weak classifier, we employ PSO to pinpoint the optimal decision stump threshold, which significantly improves the computational efficiency of the base algorithm. When decision stumps are employed as weak classifiers on complex datasets, the conventional algorithm must explore all possible thresholds to minimize the training error. Consequently, finding the best threshold values can be time-consuming. In such cases, an evolutionary search strategy such as PSO accelerates the training of the AdaBoost classifier. Furthermore, each boosting round uses PSO to learn a new weak classifier, and over numerous rounds it may uncover a set of weak classifiers that collectively form a strong classifier.

In the PSO algorithm, the cost function is used to evaluate each particle within the solution space. Each particle encodes a threshold value that splits the solution space into two classes: 1 (representing ‘face’) and 0 (representing ‘non-face’). In the first case, sample values greater than or equal to the threshold are classified as 1, while values below the threshold are classified as 0. In the second case, this classification is reversed. The training loss is computed for each case, and the weak classifier with the lowest error is selected. For instance, let S = ((x₁, y₁), …, (x_n, y_n)) represent the training set of a weak classifier, where the labels y_i ∈ {0, 1}. To evaluate each particle, a decision stump requires three parameters: the decision polarity (positive or negative), the feature index (j), and the optimized threshold value θ that splits the solution space. For an input example x, Equation (6) defines the positive stump, while Equation (7) defines the negative stump.

$h_{j, θ}^{+}(x) = \begin{cases} 1, & \text{if } x(j) \geq θ \\ 0, & \text{otherwise} \end{cases}$ (6)

$h_{j, θ}^{-}(x) = \begin{cases} 0, & \text{if } x(j) \geq θ \\ 1, & \text{otherwise} \end{cases}$ (7)

The computational cost of the enhanced Viola-Jones algorithm depends on two factors: the population size (S) and the number of iterations (T). Each step of the boosting procedure evaluates S × T candidate classifiers. PSO is employed to select the best thresholds for the Haar-like features in AdaBoost.
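To make the threshold search concrete, the sketch below uses PSO to find a decision stump threshold for a single feature and compares it against the exhaustive search. It is an illustration under stated assumptions, not the authors' code: the function names, the synthetic one-dimensional feature values, and the use of the momentum w from Algorithm 2 as a linearly decaying inertia weight on the velocity are all assumptions. The swarm size (20) and iteration count (100) follow the parameters reported as best in Section 5.2.

```python
import numpy as np

def stump_error(theta, x, y, w):
    """Weighted error and polarity of the better stump at threshold theta (Eqs. 6 and 7)."""
    pred_pos = (x >= theta).astype(int)                  # positive stump h+
    err_pos = np.sum(w * (pred_pos != y)) / np.sum(w)
    err_neg = 1.0 - err_pos                              # negative stump h- flips every output
    return (err_pos, +1) if err_pos <= err_neg else (err_neg, -1)

def pso_threshold(x, y, w, n_particles=20, n_iters=100, c_s=2.0, c_g=2.0,
                  w_max=1.5, w_min=0.2, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = x.min(), x.max()
    pos = rng.uniform(lo, hi, n_particles)               # each particle is one threshold
    vel = 0.1 * rng.uniform(-(hi - lo), hi - lo, n_particles)
    pbest = pos.copy()
    pbest_err = np.array([stump_error(t, x, y, w)[0] for t in pos])
    g = np.argmin(pbest_err)
    gbest, gbest_err = pbest[g], pbest_err[g]
    for l in range(1, n_iters + 1):
        inertia = w_max - (l / n_iters) * (w_max - w_min)  # assumed role of momentum w
        r_s, r_g = rng.random(n_particles), rng.random(n_particles)
        vel = inertia * vel + c_s * r_s * (pbest - pos) + c_g * r_g * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        err = np.array([stump_error(t, x, y, w)[0] for t in pos])
        better = err < pbest_err
        pbest[better], pbest_err[better] = pos[better], err[better]
        g = np.argmin(pbest_err)
        if pbest_err[g] < gbest_err:
            gbest, gbest_err = pbest[g], pbest_err[g]
    return gbest, gbest_err, stump_error(gbest, x, y, w)[1]

def exhaustive_threshold(x, y, w):
    """Baseline: try every distinct sample value as the threshold."""
    return min((stump_error(t, x, y, w)[0], t) for t in np.unique(x))

# Synthetic feature values for 50 non-face and 50 face samples (illustrative data).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])
y = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
w = np.full(100, 0.01)
pso_theta, pso_err, pso_polarity = pso_threshold(x, y, w)
ex_err, ex_theta = exhaustive_threshold(x, y, w)
```

The exhaustive baseline is optimal for a single feature, so the PSO error can only match or exceed it; the intended benefit of PSO lies in evaluating S × T candidates rather than every threshold of every feature.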

4.2 Selecting the best features in the AdaBoost algorithm using PSO

Enhancing the speed of the face detector without compromising classifier accuracy is a crucial objective. However, the exhaustive feature selection process in AdaBoost often leads to increased complexity. Furthermore, the limited learning capacity of the simple decision stump classifier reduces the efficiency of the conventional face detection approach. To address this, we incorporate PSO into AdaBoost to select the optimal features for face detection and to reduce the computational processing time. Accordingly, we propose two improvements to our face detector that reduce the computational burden of feature selection and increase the selection speed. Firstly, we employ PSO to select optimized threshold values, as discussed in the previous section. Secondly, we combine the PSO technique with the AdaBoost algorithm, enabling rapid exploration of the entire feature space and the selection of the most discriminative feature sets, thus expediting the training process and minimizing the training error. Algorithm 2 illustrates the proposed Viola-Jones algorithm using PSO.

In the AdaBoost classifier, an exhaustive search is conducted in every round to select relevant features and minimize classification errors. To address the high complexity of this exhaustive search, we use PSO within the AdaBoost algorithm. PSO is applied to explore candidate feature locations, sizes, orientations, and combinations, resulting in the selection of a discriminative feature set. The selected features are then incorporated into AdaBoost to construct an ensemble classifier. PSO searches more efficiently than exhaustive search techniques. In PSO, each particle explores not only its own region of the search space but also the regions found by other particles, so many particles collectively strive to identify the best possible positions. However, this collaborative behavior can reduce the diversity of the selected features. To counter this, we integrate a random feature selection approach. Specifically, we employ PSO to identify the most relevant features in the early boosting rounds. As the boosting phase progresses, the proposed approach switches to random feature selection to uncover additional discriminative features, thus expanding the pool of candidate features. This adjustment strikes a balance between efficient feature selection and the preservation of feature diversity, enabling us to discover a wider range of useful features during the boosting process.
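The switch from PSO-guided search to random feature selection can be sketched as a simple schedule over the boosting rounds, following the condition t ≤ T/2 in Algorithm 2. The sketch below is a simplification and rests on several assumptions: features are represented as columns of a precomputed matrix, a particle is a continuous position rounded to a feature index (a hypothetical encoding, whereas the method described above searches over feature locations, sizes, and orientations), and all function names and swarm settings are illustrative.

```python
import numpy as np

def feature_error(X, y, w, j):
    """Lowest weighted stump error on feature j over all thresholds and both polarities."""
    best = 1.0
    for theta in np.unique(X[:, j]):
        err = np.sum(w * ((X[:, j] >= theta).astype(int) != y)) / np.sum(w)
        best = min(best, err, 1.0 - err)
    return best

def pso_feature_search(X, y, w, rng, n_particles=8, n_iters=15):
    """PSO over a continuous position that is rounded to a feature index (assumed encoding)."""
    J = X.shape[1]
    to_index = lambda p: int(np.clip(np.rint(p), 0, J - 1))
    pos = rng.uniform(0, J - 1, n_particles)
    vel = np.zeros(n_particles)
    pbest = pos.copy()
    pbest_err = np.array([feature_error(X, y, w, to_index(p)) for p in pos])
    for l in range(n_iters):
        gbest = pbest[np.argmin(pbest_err)]
        inertia = 1.5 - (l / n_iters) * (1.5 - 0.2)
        vel = (inertia * vel + 2.0 * rng.random(n_particles) * (pbest - pos)
               + 2.0 * rng.random(n_particles) * (gbest - pos))
        pos = np.clip(pos + vel, 0, J - 1)
        err = np.array([feature_error(X, y, w, to_index(p)) for p in pos])
        better = err < pbest_err
        pbest[better], pbest_err[better] = pos[better], err[better]
    return to_index(pbest[np.argmin(pbest_err)])

def random_feature_search(X, y, w, rng, n_candidates=8):
    """Pick the best feature among a random subset to preserve diversity."""
    k = min(n_candidates, X.shape[1])
    candidates = rng.choice(X.shape[1], size=k, replace=False)
    return int(min(candidates, key=lambda j: feature_error(X, y, w, j)))

def select_feature(t, T, X, y, w, rng):
    """PSO in the first half of the boosting rounds, random selection afterwards."""
    if t <= T / 2:
        return pso_feature_search(X, y, w, rng)
    return random_feature_search(X, y, w, rng)

# Illustrative data: 30 samples, 12 precomputed feature values each.
rng = np.random.default_rng(0)
X = rng.random((30, 12))
y = (X[:, 4] > 0.5).astype(int)
w = np.full(30, 1.0 / 30)
T = 6
picks = [select_feature(t, T, X, y, w, rng) for t in range(1, T + 1)]
```

In a full training loop, the sample weights `w` would be updated after each round as in Algorithm 2, so later rounds would favor features that separate the samples misclassified so far.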

5 Experiments and discussion

This section analyzes the performance of the conventional Viola-Jones algorithm and compares it with the improved approach incorporating PSO optimization. In the conventional Viola-Jones algorithm, AdaBoost employs an exhaustive search to build a weak classifier, while in the proposed approach, AdaBoost utilizes an optimized search to select the best features and threshold values. Furthermore, the proposed approach significantly reduces the false positive rate.

5.1 Dataset description

Face images were collected from the Yearbook Dataset of frontal-facing American high-school seniors [23], while non-face images were obtained from the Stanford Background Dataset [24] and ImageNet [10]. These images are used for both training and testing purposes. These databases contain 4,999 different face images and 6,960 non-face images, all with a pixel resolution of 25 × 25. The positive and negative images are randomly divided into two folders. The training folder comprises 1,200 positive and 1,000 negative grayscale images (see Fig. 4). The test set consists of 750 positive and 658 negative images. The experiment was also validated on the Wider Face test benchmark dataset, which includes a variety of challenging real-world face detection images [30]. This dataset comprises 32,203 photographs with 393,703 labeled faces, covering a wide range of scales, poses, and occlusions.

Fig. 4

Training and testing sample images: (a) face images (positive images) [23]; (b) non-face images (negative images) [24].

5.2 Evaluation metrics

The face detection algorithm has two classes: faces and non-faces. The performance of the proposed method is evaluated using Equations (9) and (10). The True Positive Rate (TPR) is used to measure how well the model correctly predicts the positive class. The equation for TPR is given below:

$\mathrm{TPR} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Negative (FN)}}$ (9)

The False Positive Rate (FPR) measures the proportion of negative (non-face) samples that the model incorrectly predicts as positive. The equation for FPR is given below:

$\mathrm{FPR} = \frac{\text{False Positive (FP)}}{\text{False Positive (FP)} + \text{True Negative (TN)}}$ (10)
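Equations (9) and (10) can be computed directly from predicted and true labels. The following is a minimal sketch; the function names `confusion_counts` and `tpr_fpr` and the toy labels are hypothetical and are not taken from the experiments.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels (1 = face, 0 = non-face)."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def tpr_fpr(tp, fn, fp, tn):
    """Return (TPR, FPR) following Equations (9) and (10)."""
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0
    return tpr, fpr


# Hypothetical toy labels: four faces followed by four non-faces.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
tpr, fpr = tpr_fpr(tp, fn, fp, tn)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```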

To construct the weak classifier and select the best threshold value, we examined the number of particles and the maximum number of iterations using the ImageNet dataset. Table 1 shows the training error for various swarm sizes and iteration counts. The selected threshold value is then applied in the feature selection stage of AdaBoost to optimize the features and computation time. The PSO parameters were chosen according to Table 1 (20 particles and 100 iterations).

Table 1

The training error of the training dataset

S (No. of particles)	T (Iterations)	Training Error
5	50	0.9034
5	100	0.8912
10	100	0.8713
10	50	0.9613
20	100	0.8531
20	50	0.1034
30	100	0.8989
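To illustrate how a swarm of S particles run for T iterations can search for a stump threshold, the sketch below applies a standard one-dimensional PSO to a toy feature. It is an assumption-laden illustration rather than the procedure used for Table 1: the fitness function `weighted_error`, the function name `pso_threshold`, the inertia and acceleration coefficients (0.7, 1.5, 1.5), and the toy feature values are all hypothetical. Only the swarm size (20) and iteration count (100) come from the text.

```python
import random


def weighted_error(threshold, values, labels, weights):
    """Weighted error of a threshold stump, taking the better polarity."""
    err = sum(w for v, y, w in zip(values, labels, weights)
              if (1 if v >= threshold else 0) != y)
    return min(err, sum(weights) - err)


def pso_threshold(values, labels, weights, n_particles=20, n_iters=100,
                  inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a low-error threshold with a 1-D particle swarm."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    pbest_err = [weighted_error(p, values, labels, weights) for p in pos]
    g = min(range(n_particles), key=pbest_err.__getitem__)
    gbest, gbest_err = pbest[g], pbest_err[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + cognitive pull + social pull.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
            err = weighted_error(pos[i], values, labels, weights)
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i], err
    return gbest, gbest_err


# Hypothetical feature responses for four non-faces and four faces.
values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
weights = [1.0 / len(values)] * len(values)
threshold, error = pso_threshold(values, labels, weights)
print(f"threshold = {threshold:.3f}, weighted error = {error:.3f}")
```

Each call performs at most S × T fitness evaluations after initialization, independent of how many distinct candidate thresholds the feature has.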

5.3 Parameter setting and threshold selection

To analyze the reliability of the proposed approach, we report its accuracy and the time spent for each parameter setting. The proposed approach uses 200 particles and can run for up to 1,000 iterations to construct a weak classifier. However, it terminates early if no improvement is observed in the feature selection process within the global solution search space. Initially, the population is randomly defined, with the feature selection parameters (x, y, w, h) in the range [0, 250] and the feature type in the range [0, 4]. The social value parameters c₁ and c₂ both range from -100 to 100. Random values are independently sampled from the range [0, 1], and Q1 and Q2 are both set to 3.05. These experiments were conducted with 1,000 iterations, and the best, worst, and average results are reported in Table 2. The experiments were run on Google Colab with a K80 GPU.

Table 2
Number of features needed for detection

Approach	Best	Average	Worst
Viola-Jones	340	340	340
Proposed	120	134	150

5.4 Feature selection using PSO

The proposed approach is analyzed in terms of classifier accuracy and execution time. The experiments were repeated ten times for each algorithm, and the best, average, and worst outcomes are presented in Table 2. The analysis also reports the number of features required for the face detection process on complex face images. The performance of face detection is influenced by both the population size and the number of PSO iterations. The results demonstrate the effectiveness of PSO for feature selection in this problem when compared to the conventional Viola-Jones algorithm.

This experiment reveals that the proposed method uses far fewer features than the conventional algorithm. Table 2 shows the number of features selected for the strong classifier during the training process. Viola-Jones with PSO required only 120 features in the best case, 134 in the average case, and 150 in the worst case, whereas the conventional Viola-Jones algorithm required 340 features in every case. Therefore, the proposed method constructs its classifier using only 120 features in the best case, which is significantly fewer than the conventional Viola-Jones method.

Table 3 summarizes the overall comparison between the conventional Viola-Jones algorithm and the proposed method on the ImageNet test dataset. During testing, the proposed approach achieved an average classification accuracy of 98.73% and a False Positive Rate (FPR) of 1.27% on the face images. In contrast, the conventional Viola-Jones algorithm achieved an average accuracy of 96.63% and an FPR of 3.37%. These results indicate that the proposed method is not only more efficient than the conventional Viola-Jones algorithm but also more effective in classifying unseen data. Furthermore, the proposed method requires less time to process the test dataset than the conventional algorithm.

Table 3
Face detection performance

Approach	TPR (average)	FPR (average)	Time
Viola-Jones	96.63%	0.0337	52.5 s
Proposed	98.73%	0.0127	30.6 s
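The relative size of these differences follows from simple arithmetic on the reported values. The short sketch below only re-expresses the numbers in Tables 2 and 3; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, proposed):
    """Fractional reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline


# Values copied from Table 2 (average case) and Table 3.
features_vj, features_pso = 340, 134
tpr_vj, tpr_pso = 96.63, 98.73          # percent
fpr_vj, fpr_pso = 0.0337, 0.0127
time_vj, time_pso = 52.5, 30.6          # seconds

tpr_gain = tpr_pso - tpr_vj
feature_cut = relative_reduction(features_vj, features_pso)
fpr_cut = relative_reduction(fpr_vj, fpr_pso)
time_cut = relative_reduction(time_vj, time_pso)

print(f"TPR gain: {tpr_gain:.2f} percentage points")
print(f"Fewer features (average case): {feature_cut:.1%}")
print(f"Lower FPR: {fpr_cut:.1%}")
print(f"Shorter test time: {time_cut:.1%}")
```

On these figures, the proposed method uses roughly 61% fewer features on average and about 42% less test time, alongside a gain of 2.10 percentage points in TPR.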

Fig. 5

ROC curves of Conventional-VJ and VJ with PSO.

The performance of the proposed approach and the conventional Viola-Jones algorithm is depicted using the Receiver Operating Characteristic (ROC) curve (see Fig. 5). The proposed method is shown in orange, and the conventional Viola-Jones algorithm in green. According to the ROC curve, the proposed approach achieved 98.73% accuracy on the face and non-face image dataset, whereas the conventional algorithm achieved 96.63%. After successful testing, the proposed approach was converted into a face detection model and saved as an .XML file, allowing it to detect faces in various challenging contexts. A comparison of classic machine learning based face detection algorithms with the proposed technique is presented in Table 4. Compared to the conventional method, the Viola-Jones algorithm with PSO performed effectively on complex real-time face images involving scale variation, illumination, pose variation, and occlusion. Table 5 shows the detection results of the proposed and conventional approaches.

Table 4

Performance comparison of the proposed method with other approaches in face detection

Face detection approach	Accuracy
Viola–Jones, Geometric Distribution [17]	95%
Viola–Jones, Condensation Algorithm (CA), NN, SVM [6]	95%
Skin Color Algorithm, Circular Hough Transform [1]	80%
Kalman Filter, Principal Component Analysis (PCA), Local-Binary-Pattern (LBP), SVM [2]	95%
Proposed (Viola-Jones with PSO)	98.73%

Table 5

Performance comparison of proposed and conventional Viola-Jones algorithm on the Wider benchmark dataset [30]

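Challenges in Face Detection	Conventional Viola-Jones	Viola-Jones with PSO
Scale variation	(detection result image)	(detection result image)
Illumination	(detection result image)	(detection result image)
Pose variation	(detection result image)	(detection result image)
Occlusion	(detection result image)	(detection result image)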

The computational complexity of the optimized Viola-Jones algorithm is determined by two parameters: S, the number of particles, and T, the number of iterations. In contrast, the computational complexity of the AdaBoost algorithm is determined by the parameter N, the number of samples. The time complexity of the proposed algorithm at each stage of the boosting technique is $O(S \times T)$, whereas the time complexity of the base model is $O(N^2)$. The basic AdaBoost technique trains a weak classifier in polynomial time, while the time complexity of the improved PSO-based Viola-Jones algorithm scales linearly with S and T.
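As a rough worked example of what these orders imply, take the training set size from Section 5.1 (N = 1,200 + 1,000 = 2,200 images) and the PSO setting selected from Table 1 (S = 20, T = 100). Treating the stated complexities as simple operation counts, which is an assumption made here for illustration and ignores constant factors:

$N^2 = 2{,}200^2 = 4{,}840{,}000, \qquad S \times T = 20 \times 100 = 2{,}000, \qquad \frac{N^2}{S \times T} = 2{,}420$

Under this simplification, the per-stage search cost of the PSO-based variant is about three orders of magnitude smaller; these are order-of-growth counts, not measured running times.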

6 Conclusion

In this paper, we propose an efficient and enhanced face detection approach that uses PSO within Viola-Jones to improve prediction accuracy for complex real-time face images. This work aims to enhance optimal feature selection and global threshold determination in AdaBoost with Haar-like features using PSO. The use of PSO significantly reduces both the false positive rate and the computational time. The proposed approach constructs a more efficient weak classifier for face detection in complex face images. Instead of an exhaustive feature search, PSO optimizes the selection process, leading to better performance. The proposed method was validated on the Wider Face detection benchmark and demonstrated superior results compared to the conventional algorithm. It achieved an average true positive rate of 98.73% with only a 1.27% false positive rate. Additionally, the proposed approach significantly reduced face detection time on the test samples. Although the proposed approach outperformed the conventional algorithm in terms of true positive rate, it required a longer training time on the dataset. The results suggest that the proposed method is a promising solution for accurate and rapid face detection in various applications.

Acknowledgment

The authors convey sincere thanks to the ISO Certified (ISO/IEC 20000-1:2018) Centre for Machine Learning and Intelligence (CMLI) funded by the Department of Science and Technology (DST-CURIE), India for providing the facility to carry out this research study.

Conflict of interest

The authors declare no conflict of interest in this work.

Author’s contribution

P. Subashini developed the methodology and design of the manuscript. Diksha Shukla contributed to the text and content of the manuscript, including entire revisions and edits. M. Mohana conducted the experiments, compared the results, created the figures, and drafted the entire manuscript with contributions from the co-authors. All authors were involved in conducting the experiment and analyzing the results. They have reviewed and approved the content of the manuscript and are willing to be held accountable for the work.

Ethics approval and consent to participate

This study was approved by the Avinashilingam Human Ethics Committee, Coimbatore, India. The approval number is AUW/IHEC/CS-21-22/XMT-03.

Data availability

This study used the open-source Wider Face, Yearbook, and ImageNet datasets.

Fund availability

There is no external funding for this research study.

Declaration of AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used the Grammarly assistant tools included in Microsoft Word for grammar checking. After using these tools, the authors reviewed and edited the content as necessary and take full responsibility for the publication’s content.

References

1. Dasgupta, George, Happy S.L. and Routray A.A., Vision-based system for monitoring the loss of attention in automotive drivers, IEEE Transactions on Intelligent Transportation Systems 14(4) (2013), 1825–1838.

2. Kumar, Kaur and Kumar, Face detection techniques: A review, Artificial Intelligence Review 52 (2019), 927–948.

3. Sharma and Singh, Object detection in image using particle swarm optimization, International Journal of Engineering and Technology 2(6) (2010), 419–426.

4. Taherkhani, Cosma and McGinnity T.M., AdaBoost-CNN: An adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning, Neurocomputing 404 (2020), 351–366.

5. Mohemmed A.W., Zhang and Johnston, Particle swarm optimization based adaboost for face detection, In 2009 IEEE Congress on Evolutionary Computation (2009), 2494–2501. IEEE.

6. Fatima, Shahid A.R., Ziauddin, Safi A.A. and Ramzan, Driver fatigue detection using viola jones and principal component analysis, Applied Artificial Intelligence 34(6) (2020), 456–483.

7. Perez C.A. and Vallejos J.I., Face detection using PSO template selection, In 2006 IEEE International Conference on Systems, Man and Cybernetics 5 (2006), 4220–4224. IEEE.

8. Marini and Walczak, Particle swarm optimization (PSO). A tutorial, Chemometrics and Intelligent Laboratory Systems (2015), 153–165.

9. Hosni H.A., Mahmoud and Mengash H.A., A novel technique for automated concealed face detection in surveillance videos, Personal and Ubiquitous Computing 25 (2021), 129–140.

10. Deng, Dong, Socher, Li L.J., Li and Fei-Fei, Imagenet: A large-scale hierarchical image database, In 2009 IEEE conference on computer vision and pattern recognition (2009), 248–255. IEEE.

11. Kennedy and Eberhart, Particle swarm optimization, In Proceedings of ICNN’95-international conference on neural networks 4 (1995), 1942–1948.

12. Zhang J.C. and Fan, AdaBoost face detection algorithm based on correlation, Computer Engineering 37(8) (2010), 158–163.

13. Huang, Shang and Chen, Improved Viola-Jones face detection algorithm based on HoloLens, EURASIP Journal on Image and Video Processing 1 (2019), 1–11.

14. Zhang and Ye Q.W., Improved AdaBoost face detection algorithm based on dual features, Wireless Communication Technology 29(2) (2020), 23–27.

15. Kirana K.C., Wibawanto and Herwanto H.W., Facial emotion recognition based on Viola-Jones algorithm in the learning environment, In 2018 International seminar on application for technology of information and communication (2018), 406–410. IEEE.

16. Zhang and Lenders, Knowledge-based eye detection for human face recognition, In KES’2000. Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies. Proceedings (Cat. No. 00TH8516) 1 (2000), 117–120. IEEE.

17. Flores M.J., Armingol J.M. and de la Escalera, Real-time warning system for driver drowsiness detection using visual information, Journal of Intelligent & Robotic Systems 59 (2010), 103–125.

18. Mohammadpour, Ghorbanian and Mozaffari, AdaBoost performance improvement using PSO algorithm, In 2016 Eighth international conference on information and knowledge technology (IKT) (2016), 273–275. IEEE.

19. …, Wang, Li and Liu, Analysis of face detection based on skin color characteristic and AdaBoost algorithm, Journal of Physics: Conference Series 1601(5) (2020), 052019.

20. Bose and Bandyopadhyay, Human face and facial parts detection using template matching technique, International Journal of Engineering and Advanced Technology 9(4) (2020), 2249–8958.

21. Viola and Jones, Rapid object detection using a boosted cascade of simple features, In Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR (2001), 1. IEEE.

22. Belaroussi and Milgram, A comparative study on face detection and tracking algorithms, Expert Systems with Applications 39(8) (2012), 7158–7164.

23. Ginosar, Rakelly, Sachs, Yin and Efros A.A., A century of portraits: A visual historical record of American high school yearbooks, In Proceedings of the IEEE International Conference on Computer Vision Workshops (2015), 1–7.

24. Gould, Fulton and Koller, Decomposing a scene into geometric and semantically consistent regions, In Proceedings of International Conference on Computer Vision (ICCV) (2009).

25. … and Du, Improved adaboost face detection, In 2010 International Conference on Measuring Technology and Mechatronics Automation 2 (2010), 434–437. IEEE.

26. Cagnoni, Mordonini and Sartori, Particle swarm optimization for object detection and segmentation, In Workshops on Applications of Evolutionary Computation (2007), 241–250.

27. Minaee, Luo, Lin and Bowyer, Going deeper into face detection: A survey, arXiv preprint arXiv:2103.14983 (2021).

28. Singh and Prasad S.V.A.V., Techniques and challenges of face recognition: A critical review, Procedia Computer Science 143 (2018), 536–543.

29. Soleymani, Chaudhary, Dabouei, Dawson and Nasrabadi N.M., Differential morphed face detection using deep Siamese networks, In International Conference on Pattern Recognition, Cham: Springer International Publishing (2021), 560–572.

30. Yang, Luo, Loy C.C. and Tang, Wider face: a face detection benchmark, In Proceedings of the IEEE conference on computer vision and pattern recognition (2016), 5525–5533.

31. Ephraim, Himmelman and Siddiqi, Real-time Viola-Jones face detection in a web browser, In 2009 Canadian Conference on Computer and Robot Vision (2009), 321–328. IEEE.

32. Dietterich T.G., Ensemble methods in machine learning, In Multiple Classifier Systems: First International Workshop, MCS 2000 Cagliari, Proceedings 1 (2000), 1–15. Springer Berlin Heidelberg.

33. Lee, Jun C.H. and Lee J.S., Instance categorization by support vector machines to adjust weights in AdaBoost for imbalanced data classification, Information Sciences 381 (2017), 92–103.

34. Kong, Zhou, Wang, Zhang, Liu and Gao, A system of driving fatigue detection based on machine vision and its application on smart device, Journal of Sensors 2015(1) (2015), 1–11.

35. W.Y., Ming Y.A.N.G., Face detection based on viola-jones algorithm applying composite features, In 2019 International Conference on Robots & Intelligent System (ICRIS) (2019), 82–85. IEEE.

36. Yang, Liang, Zhou and Lu, A face detection method based on skin color model and improved AdaBoost algorithm, Traitement du Signal 37(6) (2020), 929–937.

37. …, Ai, Face detection in color images using AdaBoost algorithm based on skin color information, In First International Workshop on Knowledge Discovery and Data Mining (WKDD) (2008), 339–342. IEEE.

38. Zakaria and Suandi S.A., Face detection using combination of Neural Network and Adaboost, In TENCON 2011–2011 IEEE Region 10 Conference (2011), 335–338, Bali, Indonesia.

39. Zakaria, Suandi S.A. and Mohamad-Saleh, Hierarchical skin-AdaBoost-neural network (H-SKANN) for multi-face detection, Applied Soft Computing 68 (2018), 172–190.

40. …, Tan, Yao and Liu, YOLO5Face: Why reinventing a face detector, In European Conference on Computer Vision. Cham: Springer Nature Switzerland (2022), 228–244.

41. Chen, Huang, Peng, Zhou and Zhang, YOLO-face: A real-time face detector, The Visual Computer 37 (2021), 805–813.

42. Ranjan, Bansal, Zheng, Xu, Gleason, Lu, Nanduri, Chen J.C., et al., A fast and accurate system for face detection, identification, and verification, IEEE Transactions on Biometrics, Behavior, and Identity Science 1(2) (2019), 82–96.

43. Mamieva, Abdusalomov A.B., Mukhiddinov and Whangbo T.K., Improved face detection method via learning small faces on hard images based on a deep learning approach, Sensors 23(1) (2023), 502.

44. …, Hao and He, Single-stage face detection under extremely low-light conditions, In Proceedings of the IEEE/CVF International Conference on Computer Vision (2021), 3523–3532.

45. Hedman, Skepetzis, Hernandez-Diaz, Bigun and Alonso-Fernandez, On the effect of selfie beautification filters on face detection and recognition, Pattern Recognition Letters 163 (2022), 104–111.

46. Whang S.E. and Lee J.G., Data collection and quality challenges for deep learning, Proceedings of the VLDB Endowment 13(12) (2020), 3429–3432.

47. Sharma V.K. and Mir R.N., Saliency guided faster-RCNN (SGFr-RCNN) model for object detection and recognition, Journal of King Saud University-Computer and Information Sciences 34(5) (2022), 1687–1699.

48. Krishnan P.T., Balasubramanian, Jeyakumar, Mahadevan and Noel, Joseph Raj, Intensity matching through saliency maps for thermal and visible image registration for face detection applications, The Visual Computer 39(10) (2023), 4529–4542.

49. Besnassi, Neggaz and Benyettou, Face detection based on evolutionary Haar filter, Pattern Analysis and Applications 23 (2020), 309–330.

50. Babu, Kumar and Kannaiyaraju, Face recognition system using deep belief network and particle swarm optimization, Intelligent Automation & Soft Computing 33(1) (2022), 317–329.
