Abstract
Multiple exposure fusion (MEF) is attracting considerable attention in research on high dynamic range (HDR) imaging: by eliminating the need to generate an intermediate HDR image, MEF directly expands an image’s dynamic range and thus provides greater detail enhancement than traditional HDR techniques. However, in the fusion stage, the optimal weight with which each pixel of the input images contributes to the final synthesized image is challenging to determine and usually requires manual tuning of parameters. In addition, although many MEF algorithms have been proposed, most lack a self-regulation mechanism. To tackle these disadvantages, we apply fuzzy theory and present a novel MEF framework with a fuzzy feedback structure. In this work, over- and under-exposed images are generated from a single input image using local histogram stretching. This avoids the ghost artifacts that arise when multiple exposed images of a dynamic scene containing object motion are fused. In the fusion stage, fuzzy logic is used to determine pixel weights based on gradient and chrominance analysis, and a guided image filter is used to suppress noise and enhance edges in the weight maps. To ensure detail enhancement without excessive or insufficient sharpness, we developed a simple sharpness measure named the edge-map overlapping rate (EOR). With EOR and the feedback structure, users can adjust the output synthesized image to their preferred sharpness level, and the above weights are appropriately redesigned by automatically regulating the magnitude of the fuzzy input. In the experiments, the proposed method produced excellent image quality and outperformed other existing HDR/MEF methods.
Introduction
A common camera can capture only a limited range of intensities, for example, 8-bit intensity levels for each color channel. However, the human visual system (HVS) responds to real-world scenes, which contain a much greater dynamic range (about 20-bit intensity levels) than captured digital images. Therefore, HDR imaging is an essential technique in photography for overcoming the limited dynamic range of cameras [1].
Traditionally, the HDR scene is first obtained by merging several differently exposed low dynamic range (LDR) images; to visualize the HDR scene on the conventional 8-bit display devices, the HDR image is then converted into a final LDR image by using tone mapping algorithms [2]. Compared to traditional HDR, multiple exposure fusion (MEF) is a more recently developed technique that produces details and directly expands the dynamic range of the final synthesized image without the need to generate an intermediate HDR image.
Although MEF methods generate a more informative final synthesized LDR image than traditional HDR, the challenge lies in how to appropriately combine useful information from the input images and generate a final image that resembles, as closely as possible, the natural scene perceived by the human visual system. Most MEF methods can be classified into two categories: 1) multiple image-based MEF [3–11]; and 2) single image-based MEF [17–23].
For the methods belonging to the first category, the input LDR images are captured using the same imaging sensor at different exposures. Shen et al. [3] proposed an HDR image composition method that fuses multi-exposure images based on an extended Retinex model. Gu et al. [4] proposed an MEF algorithm based on a gradient field, which is derived from the structure tensor using multi-dimensional Riemannian geometry. To preserve the detail information, the gradient field is modified by applying average filtering twice in an iterative process until the gradient magnitude of each pixel is determined. Shen et al. [5] proposed an exposure fusion approach using a boosting Laplacian pyramid, which involves boosting the detail and base signals independently. Patel et al. [6] proposed an MEF method that improves the visual perception of HDR images. Weber’s law states that people cannot perceive variations in bright areas as much as in dark areas. Therefore, they proposed an exposure control mechanism that compensates for the limitations of the HVS by adaptively adjusting the exposure level. Li et al. [7] proposed a multiscale MEF algorithm that uses guided image filtering (GIF) to smooth the Gaussian pyramids of the weight map of each input LDR image. Details are preserved in the brightest and darkest regions of the final synthesized image. Other multiple image-based fusion methods have also been proposed in [8–11].
However, for the methods in the first category, fusing a sequence of differently exposed images is typically suitable for static scenes only. In dynamic scenes with camera movement or moving objects such as cars and people, multiple image-based MEF methods are susceptible to image artifacts such as ghosts and edge blurring.
To overcome ghost and blur problems, Li et al. [12] proposed a hybrid patching algorithm for MEF methods with moving objects. First, the motion regions of the differently exposed images are detected through consideration of spatiotemporal consistencies. To compensate for the inconsistent pixels in the motion regions, an optimization problem is formulated and then solved using both the intensity mapping method at the pixel level and the hole-filling method at the block level iteratively. However, this method is computationally expensive and the convergence speed depends heavily on the parameter setting. Li et al. [13] proposed a ghost removal algorithm, which includes a bidirectional normalization-based method for detecting the moving pixels of each LDR image. The correction of inconsistent pixels takes time because it involves a two-step approach using temporal and spatial correlation. Numerous deghosting methods that are designed for MEF have previously been proposed in [14–16]; yet most of them require high computation cost.
For the methods belonging to the second category, a single input LDR image is sufficient, and the images with other exposures are generated from the same image by post-processing. Typically, three exposed (normally-, under-, and over-exposed) LDR images are required. This approach therefore inherently removes the ghost artifacts that occur in multiple image-based MEF. Im et al. [17] proposed a ghost-free single image-based MEF method, in which the three LDR images are generated from the input image by local histogram stretching. An edge-preserving denoising method is proposed, and the weighting map is computed by a Gaussian-shaped function of the pixel value. Celebi et al. [18] proposed a fuzzy-based MEF method wherein a contrast limited adaptive histogram separation scheme is used to generate additional LDR images from a single input image. To account for pixel visibility, fuzzy logic is applied to determine the pixel weights when fusing the images. Raveendranath and Johnson [19] proposed a method for HDR image generation, in which differently exposed images are generated by varying the gamma value of the input image using a power law transformation. However, the selection of the most suitable gamma value was not discussed in the study.
Therefore, single image-based MEF methods are an important subset of HDR imaging. For mobile devices, small low-cost sensors are commonly used to reduce manufacturing costs. However, they have limited light-capturing capacity and require long exposure times, which makes them prone to ghost artifacts in dynamic scenes. Moreover, the computational power of mobile devices is limited and cannot handle sophisticated deghosting processes, which limits the application of mobile HDR imaging. Single image-based MEF methods thus seem to be a promising solution for mobile HDR imaging. Given the importance of this approach in dynamic-scene HDR and mobile HDR imaging, Gu et al. [20] proposed a single image-based HDR method using multiscale decomposition, where an input image is decomposed into one base layer (containing variations in intensity) and three detail layers (containing small-scale details). A local edge-preserving (LEP) filter is utilized to fuse these layers. Other single image-based HDR methods have also been proposed in [21–23].
Although single image-based MEF methods have the advantage of being ghost-free, most of them are one-step approaches that lack a self-regulating mechanism. That is, most methods determine the final pixel weights without checking the visual quality of the synthesized output image. In addition, for most existing MEF methods, the sharpness of the final synthesized image tends to vary according to the content of input images and is sometimes detail-preserved but excessively sharp, or insufficiently sharp (i.e. blurring in details).
Considering the above shortcomings of the existing methods, we present a single image-based MEF method. Its advantages are summarized as follows: We adopt the concept of using fuzzy logic to determine pixel weights from our baseline method [18] and further modify [18] by accounting for the color vividness (in Section 2.2) to improve the visual quality. In addition, the GIF technique is utilized in the fusion stage to enhance details and reduce noise (in Section 2.3). We propose an MEF framework that consists of a simple sharpness measure, EOR, and a feedback structure (in Section 2.4). This allows users to choose their preferred sharpness, and the above pixel weights are automatically and appropriately adjusted by the feedback structure. Therefore, compared to existing methods, this work demonstrates robust sharpness control that avoids excessively or insufficiently sharp results. The proposed MEF method is evaluated on various visual quality measures (in Section 3), and the results show that this work outperforms the state-of-the-art methods. By considering the difference between the desired EOR and the current EOR, we propose an adaptive adjustment scheme to fine-tune the pixel weights, which enables the fuzzy loop to converge within a reasonable number of loops.
This paper presents a fuzzy-based HDR method that can appropriately determine pixel weights. The rest of the paper is organized as follows. Section 2 presents the proposed single image-based MEF method, which belongs to the abovementioned second category. In Section 3, we provide the experimental results and compare them to the existing methods in Refs. [17, 20]. Finally, Section 4 concludes our work.
Proposed single image-based MEF method
This section presents the proposed MEF method with fuzzy feedback control for sharpness. It can be considered as an improved version of Celebi’s work [18]. As shown in Fig. 1, this method consists of four steps: 1) histogram stretching; 2) formulating a fuzzy rule base; 3) image fusion based on fuzzy weights and GIF; and 4) sharpness evaluation and self-regulation using the feedback structure. The following subsections describe each step.

Overall flowchart of the proposed fuzzy-based MEF method, where abbreviations NE, OE, and UE denote images with normal-, over-, and under-exposures, respectively.
Typically, single image-based MEF requires LDR images at three exposure levels: normal-, under-, and over-exposed images. In this paper, we adopt the local histogram stretching (LHS) method proposed in [17] to generate the three LDR images, detailed as follows. In LHS, the input image is transformed from the RGB (red, green, and blue) color space to the YUV color space (luminance Y and chrominance components U and V). The input image is assumed to be the normal-exposed image, but with a quite limited dynamic range, and the two other differently exposed LDR images are generated by stretching the Y channel of the input image.
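As a minimal sketch of the color-space step, the conversion below assumes the JFIF-style full-range form of YUV, in which all three channels lie in [0, 255] and neutral colors map to U = V = 128. The paper does not state which YUV variant is used, so both the coefficients and the function name `rgb_to_yuv_fullrange` are illustrative assumptions.

```python
import numpy as np

def rgb_to_yuv_fullrange(rgb):
    """Convert an HxWx3 RGB image (0..255) to full-range Y, U, V.

    Assumption: JFIF/BT.601 full-range coefficients; the paper does not
    specify the exact YUV variant.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chrominance channels are offset so that neutral colors sit at 128.
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.clip(np.stack([y, u, v], axis=-1), 0.0, 255.0)
```

Only the Y channel is stretched in the following steps; U and V are used later for the color analysis.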
LHS utilizes the weighted histogram separation (WHS) technique, which is based on the concept of data separation units. Let $H_l$ denote the histogram of the Y channel; WHS divides $H_l$ into two subsets by finding an adaptive threshold thr:
LHS is performed by using thr to divide $H_l$ into two subhistograms (H0 and H1), which are defined as
As shown in (2), the first subhistogram contains information on lower luminance levels (ranging from 0 to thr), and the second subhistogram contains information on higher luminance levels (ranging from thr + 1 to 255). Therefore, LHS generates two subset images by using linear stretching in H0 and H1, respectively:
In Celebi’s work [18], the same WHS technique and same linear stretching process were used. Furthermore, [18] applied an iterative-based RANdom Sample Consensus (RANSAC) method to determine the weighted factor in (1) and used the contrast limited adaptive histogram equalization (CLAHE) method after (3) to partially flatten individual subhistograms through the control of noise amplification. In [18], the quality of under- and over-exposed LDR images was supposedly improved by the addition of RANSAC and CLAHE. However, its fusion stage simply involves inputting the normalized luminance and smoothed local variance as fuzzy inputs and outputting the fuzzy weights.
In contrast, we argue that the main factor that dominates the visual quality of the final synthesized image in MEF is the fusion stage; more specifically, determining the appropriate weight level of each pixel when merging different LDR images. Therefore, we simply apply the LHS method of [17] to generate LDR images and improve the design of the fusion process.
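The stretching step of LHS described above can be sketched as follows. This is an illustration under stated assumptions, not the formulation of [17]: the threshold `thr` is passed in by the caller (its WHS formula is not reproduced here), pixels outside a sub-range are simply clipped, and the function name `stretch_subhistograms` is hypothetical.

```python
import numpy as np

def stretch_subhistograms(y, thr):
    """Linearly stretch the two luminance sub-ranges split at `thr`.

    Assumptions: 0 <= thr < 255 is given, and values outside a sub-range
    are clipped to [0, 255].
    """
    y = np.asarray(y, dtype=np.float64)
    # Low sub-range [0, thr] -> [0, 255]: brightens shadows (over-exposed look).
    over = np.clip(y / max(thr, 1) * 255.0, 0.0, 255.0)
    # High sub-range [thr + 1, 255] -> [0, 255]: darkens highlights
    # (under-exposed look).
    lo = thr + 1
    under = np.clip((y - lo) / max(255 - lo, 1) * 255.0, 0.0, 255.0)
    return over, under
```

Together with the unmodified input (treated as the normal exposure), the two returned arrays give the three luminance images that enter the fusion stage.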
It is well-known that fuzzy logic provides an intuitive means of modeling complex systems through linguistic variables. In [18], the two fuzzy inputs were apparently overemphasized in the luminance (Y) channel, and chrominance information in the UV channels was underemphasized. We observed that in some cases, the output synthesized image is excessively sharp and thus appears unnatural.
To cope with the above disadvantage, this paper proposes that both the gradient in the Y channel and the chrominance information in the UV channels should be considered simultaneously. This is based on our observation that, in general, well-exposed regions tend to contain stronger gradients and more vivid colors than under- and over-exposed regions. Two fuzzy inputs are thus proposed. The first input is the local gradient in the Y channel, which is defined as
The second input is the color vividness in the UV channels, which is defined as
During the transformation from RGB to YUV, the values of Y, U, and V all range from 0 to 255: the Y channel indicates the luminance level, and the chrominance components in the UV channels indicate the color difference signals. If both U and V approach 128, neutral colors such as black, gray, and white are shown, and these colors are usually found in under-exposed, shadow, and over-exposed regions. In such regions, viewers pay more attention to detail contours than to color difference. In other words, for regions with less vivid color, unless the local gradient is high, the weights of the region’s pixels should be low in the construction of the final synthesized image.
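The two fuzzy inputs can be illustrated with a short sketch. The exact definitions in (4) and (5) are not reproduced here, so the code makes two assumptions: the local gradient is approximated by a central-difference gradient magnitude, and color vividness is taken as the Euclidean distance of (U, V) from the neutral point (128, 128). Both outputs are normalized to [0, 1], and the function names are hypothetical.

```python
import numpy as np

def local_gradient(y):
    """Gradient magnitude of the Y channel, normalized to [0, 1].

    Assumption: central differences stand in for the paper's Eq. (4).
    """
    y = np.asarray(y, dtype=np.float64)
    gy, gx = np.gradient(y)  # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag

def color_vividness(u, v):
    """Distance of (U, V) from the neutral point (128, 128), in [0, 1].

    Assumption: Euclidean distance stands in for the paper's Eq. (5).
    """
    du = np.asarray(u, dtype=np.float64) - 128.0
    dv = np.asarray(v, dtype=np.float64) - 128.0
    # 128 * sqrt(2) is the largest possible distance, reached at U = V = 0.
    return np.hypot(du, dv) / (128.0 * np.sqrt(2.0))
```

Under this reading, a gray pixel (U = V = 128) has zero vividness, so its weight depends on the gradient input alone.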
Figure 2 specifies the proposed membership functions of the fuzzy inputs and the output weight, and Table 1 constructs the fuzzy rule base for determining the pixel weight in the proposed MEF method. As shown in Table 1, if both gradient and color vividness are high, the pixel is more likely to be well-exposed, and thus this pixel should make a large contribution to the final synthesized image. Combining (4) and (5) with fuzzy logic enables us to model the complicated MEF weighting process in an efficient and straightforward manner.

Membership functions of the proposed fuzzy rule. (a) Normalized input 1 (gradient). (b) Normalized input 2 (vividness). (c) Output weight.
Fuzzy rule base of the proposed MEF approach*
*L: low, M-L: medium-low, M: medium, M-H: medium-high, H: high.
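To make the rule-based weighting concrete, the sketch below runs a small fuzzy inference over the two normalized inputs. It is an illustration only: the triangular membership functions, the numeric centers of the five output levels, and the 3x3 rule table are all hypothetical stand-ins for Fig. 2 and Table 1, and a zero-order Sugeno (weighted-average) inference is used as a simplification rather than the fuzzy toolbox used in the experiments.

```python
import numpy as np

# Output levels named in the rule-table footnote; the numeric centers are
# hypothetical.
LEVELS = {"L": 0.0, "M-L": 0.25, "M": 0.5, "M-H": 0.75, "H": 1.0}

# Hypothetical rule table: rows = gradient (low, medium, high),
# columns = vividness (low, medium, high).
RULES = [["L", "M-L", "M"],
         ["M-L", "M", "M-H"],
         ["M", "M-H", "H"]]

def memberships(x):
    """Triangular low/medium/high memberships of x in [0, 1]."""
    x = np.clip(np.asarray(x, dtype=np.float64), 0.0, 1.0)
    low = np.clip(1.0 - 2.0 * x, 0.0, 1.0)
    med = np.clip(1.0 - np.abs(2.0 * x - 1.0), 0.0, 1.0)
    high = np.clip(2.0 * x - 1.0, 0.0, 1.0)
    return [low, med, high]

def fuzzy_weight(gradient, vividness):
    """Firing-strength-weighted mean of the rule outputs."""
    mg, mv = memberships(gradient), memberships(vividness)
    num, den = 0.0, 0.0
    for i in range(3):
        for j in range(3):
            strength = np.minimum(mg[i], mv[j])  # fuzzy AND of both inputs
            num = num + strength * LEVELS[RULES[i][j]]
            den = den + strength
    # den > 0 because the three memberships of each input sum to 1.
    return num / den
```

With this table, a pixel with high gradient and high vividness receives the largest weight, and a low-vividness pixel receives a low weight unless its gradient is high, in line with the reasoning above.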
Edge-preserving denoising is a major concern in the fusion stage because noise tends to be amplified when the Y channel is stretched; by contrast, simple averaging removes not only noise but also the essential edge information of the final image. This paper proposes using the GIF technique [24] to smooth the weight maps (i.e., the outputs of fuzzy logic) of all the LDR images. To save the computation cost, we adopt the fast GIF method [25], which is 10 times faster than traditional GIF but with almost no visible degradation. The filtered weight maps determine the final contribution level of each pixel in the three LDR images. Once the final pixel weights are determined, the final synthesized image is constructed by an arithmetic operation of the input LDR images:
It is an important observation of this paper that although edges are mainly perceived according to the Y channel, in order to increase control and consistency when fusing images, GIF is used to smooth the weight maps rather than the stretched histograms or the gradient maps in the Y channel, which is different from our baseline method [18]. This is mainly because in the final step, the fusion of three LDR images is based on a linear relationship among the corresponding weight maps, as shown in (6). However, because fuzzy logic is nonlinear, GIF should be performed after fuzzy logic to achieve more favorable visual quality.
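The order of operations described above (fuzzy weights first, then GIF on the weight maps, then a linear blend) can be sketched as follows. Several points are assumptions: the blend is written as a weight-normalized sum because (6) is not reproduced here, the plain guided filter of [24] is used instead of the fast variant of [25], each weight map serves as its own guidance image, and the radius `r`, the regularizer `eps`, and all function names are illustrative.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window with edge replication."""
    k = 2 * r + 1
    p = np.pad(np.asarray(img, dtype=np.float64), r, mode="edge")
    # Integral image with a leading row and column of zeros.
    s = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    s[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    total = s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]
    return total / (k * k)

def guided_filter(guide, src, r=2, eps=1e-3):
    """Guided image filter built from box means."""
    guide = np.asarray(guide, dtype=np.float64)
    src = np.asarray(src, dtype=np.float64)
    mean_i, mean_p = box_mean(guide, r), box_mean(src, r)
    var_i = box_mean(guide * guide, r) - mean_i * mean_i
    cov_ip = box_mean(guide * src, r) - mean_i * mean_p
    a = cov_ip / (var_i + eps)      # local linear coefficient
    b = mean_p - a * mean_i         # local offset
    return box_mean(a, r) * guide + box_mean(b, r)

def fuse(images, weights, r=2, eps=1e-3):
    """Blend images with self-guided, normalized weight maps (assumed form)."""
    smoothed = [np.clip(guided_filter(w, w, r, eps), 0.0, None) for w in weights]
    total = np.sum(smoothed, axis=0) + 1e-12  # avoid division by zero
    return sum(w * np.asarray(img, dtype=np.float64)
               for w, img in zip(smoothed, images)) / total
```

Because the filter is applied to the weight maps only, the blend itself stays linear in the input images.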
Edge enhancement is a common challenge in MEF methods because merging images often blurs edges, significantly degrading visual quality. Many MEF methods try to solve this problem by increasing the importance of edge pixels when determining the pixel weights. However, this tends to unnaturally alter the overall sharpness of the entire image.
By contrast, the proposed method can allow users to dynamically control the sharpness level and appropriately redesign the pixel weights by automatically regulating the magnitude of the fuzzy input through a feedback structure. To do this, we propose a simple sharpness measure for MEF, called edge-map overlapping rate (EOR), which is defined as
The EOR value ranges from 0 to 1 and measures how well image sharpness is preserved (compared to the input image), where $C_{input}$ and $C_{fused}$ denote the Canny-based edge maps of the input LDR image and the current fused image, respectively. In this paper, we employ the Canny edge detector to determine edge maps because it is efficient and commonly used in real-time image processing applications. The underlying concept of the proposed EOR is the preservation of the crucial edge pixels of the input LDR image.
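One plausible reading of an edge-map overlap measure is sketched below. The paper's exact EOR formula is not reproduced here, so the ratio used (input edge pixels that are also edge pixels in the fused image, divided by the number of input edge pixels) is an assumption, as is the function name. The edge maps are taken as precomputed boolean arrays, for example from any Canny implementation.

```python
import numpy as np

def edge_overlap_rate(c_input, c_fused):
    """Fraction of input-image edge pixels that are also edges in the fused image.

    Assumption: this ratio stands in for the paper's EOR definition. Both
    arguments are boolean edge maps of identical shape.
    """
    c_input = np.asarray(c_input, dtype=bool)
    c_fused = np.asarray(c_fused, dtype=bool)
    if c_input.shape != c_fused.shape:
        raise ValueError("edge maps must have the same shape")
    n_input = int(c_input.sum())
    if n_input == 0:
        return 0.0  # no reference edges to preserve
    return float(np.logical_and(c_input, c_fused).sum()) / n_input
```

Under this reading the value lies in [0, 1] and increases as more of the input image's edges reappear in the fused result.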
With the proposed EOR, users can control the sharpness level of the output image by manipulating their desired EOR. Once the fused image is obtained using (6), its overall sharpness is checked according to whether
When α is larger than A + τ, the current fused image is implied to be excessively sharp and requires less weight in the edge pixels (i.e., pixels with high gradient values). Conversely, when α is smaller than A - τ, insufficient sharpness is implied. Therefore, this work uses the difference between A and α as a reference to adjust the first fuzzy input (i.e., the local gradient) as shown in (4), which can be expressed by
As shown in (10), the proposed adjustment scheme is adaptive. For example, if α is much smaller than A - τ (i.e., implying insufficient sharpness and blurs in edges), the initial gradient
The adjustment using (9) is repeated until α is within the desired range shown in (8). In some cases, the HDR images obtained after the first fuzzy fusion already have highly favorable quality and suitable sharpness and thus require no further loop. Based on our experiments, most images converge within three loops, and two loops are required on average.
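The feedback structure can be summarized as a short control loop. The sketch below is a simplified illustration: `fuse_fn` and `eor_fn` are hypothetical callables standing for the fusion stage and the EOR measurement, and the update rule (a gradient gain changed in proportion to A - α) is an assumed stand-in for the paper's adjustment equation, which is not reproduced here.

```python
def regulate_sharpness(fuse_fn, eor_fn, target, tol=0.05, step=1.0, max_loops=5):
    """Repeat fusion until the measured EOR lies within target +/- tol.

    fuse_fn(gain) returns a fused image for a given gain on the gradient
    input; eor_fn(image) returns its EOR. Both are hypothetical callables.
    """
    gain = 1.0
    fused = fuse_fn(gain)
    alpha = eor_fn(fused)
    loops = 1
    while abs(target - alpha) > tol and loops < max_loops:
        # Too sharp (alpha > target) lowers the gain; too blurry raises it.
        gain = max(gain + step * (target - alpha), 0.0)
        fused = fuse_fn(gain)
        alpha = eor_fn(fused)
        loops += 1
    return fused, alpha, loops
```

Because the correction scales with the EOR difference, a result far from the target is adjusted strongly and a result near the target only slightly, which is the adaptive behavior described above.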
This section evaluates the performance of the proposed method in various aspects. There are eight test images used for verifying algorithms, and most of them are randomly selected from the online database HDR Photographic Survey. Figure 3 shows thumbnails of all test images. In addition to [18], we also compare the proposed method with a single image-based MEF method [17] and a single image-based HDR method [20]. All four methods were implemented using MATLAB R2013a on an Intel Core i5 machine with 3.3 GHz CPU and 4 GB RAM.

Test images. (Images 1 to 4) First row from left to right: Lounge, Mt. Rushmore, Exploratorium, and Sunset Point. (Images 5 to 7) Second row from left to right: Devil’s Golf Course, Las Vegas, and Niagara Falls. (Image 8) Right side: Chapel.
This subsection presents the effects of two important steps used in the proposed MEF framework: 1) GIF-based denoising and 2) self-regulation using the feedback structure. First, this paper proposes applying GIF to the fuzzy weight maps before the LDR images are fused, in order to enhance details and reduce noise. Figure 4 provides an example demonstrating the effect of using the GIF method.

Contribution of GIF: local edge-enhancing denoising (using the portions of Chapel). (a) and (c): Noisy HDR image without GIF. (b) and (d): HDR image after GIF has been performed in weight maps.
As shown in Fig. 4(a), without the application of GIF, the contour information near the window arch is blurred and the shapes of the bricks on the ceiling can barely be distinguished, whereas in Fig. 4(b), the aforementioned problems have been solved using GIF. In addition, as shown in Fig. 4(c), there is noise (spots with inconsistent colors) within the red rectangle, whereas in Fig. 4(d), GIF has successfully removed this noise and enhanced the local edge.
Because the filtered output of GIF is locally a linear transform of the guidance image (in this study, the weight map itself), local structures can be utilized effectively. The advantage of this (i.e., denoising while preserving important structural edge pixels) can be more clearly observed from the Canny-based edge map of the fused image, as shown in Fig. 5.

Contribution of GIF: denoising while preserving essential structural edge pixels (using the edge map of Chapel). (a) Without GIF. (b) With GIF.
Second, the effect of using feedback structure to self-regulate the overall sharpness is presented as follows. Fine-tuning the pixel weights used to fuse the three LDR images is highly challenging. When pixels with high gradient intensity are given weights that are too high to enhance the output details, the fused image might be excessively sharp and the noise might be amplified, degrading the visual quality. Conversely, when the structural edge pixels are not given enough weights, the fused image might be blurred in details, which degrades the visual quality, too. In addition, other factors such as color vividness and consistency should be taken into account.
To deal with this, this work allows users to dynamically set their preferred sharpness level, and the pixel weights are then adaptively adjusted by the proposed fuzzy rules and feedback structure. The EOR difference between A (desired sharpness level) and α (current sharpness level) is used as a reference, which avoids over-adjusting the pixel weights. Figure 6 shows an example of executing the feedback loop once and twice: although the overall sharpness differs between Fig. 6(b) and 6(c), both have satisfactory visual quality.

Contribution of feedback structure to the fine-tuning of overall sharpness. (a) Original image. (b) After the first fuzzy fusion (EOR = 0.445). (c) After the second fuzzy fusion (EOR = 0.462). Please refer to the barbed wire region in the bottom right to see the slight sharpness difference.
There are two advantages of using the proposed fuzzy feedback loop scheme. First, the proposed method can automatically self-regulate the pixel weights according to the user’s desired sharpness. Second, using the feedback loop allows us to tune the pixel weights more precisely by accounting for the EOR difference. Moreover, this method is computationally efficient because, as shown in (9), only the first fuzzy input is adjusted, meaning that the second fuzzy fusion takes less time than the first. Generally, by using fuzzy logic, our baseline method [18] already generates high-quality HDR images; some of its resulting images are of highly favorable quality and suitable sharpness and require no further loop. This work further improves [18] by utilizing the feedback loop to make the fuzzy fusion more self-adaptive and suited to the user’s preference.
As explained in Section 2.4, users are instructed to freely select their desired EOR A within the range between 0.4 and 0.52. Based on our experiments, when the tolerance value (τ) is set to 0.05, an average of two loops are needed for convergence.
To demonstrate the superiority of the proposed method, we compare it to three existing single image-based HDR methods: those of Im [17], Celebi [18] (our baseline method), and Gu [20]. Figure 7 shows the HDR images generated by different methods using the image Mt. Rushmore. For the result of [17] (Fig. 7b), because of the simple fusion process, the local edge is blurred: Please refer to the enlarged version of the fence (bottom middle) and brick regions (bottom right). It is because the fusion process in [17] is almost as simple as an arithmetic averaging of the three input LDR images. For the result of [18] (Fig. 7c), although the detailed edges are preserved using fuzzy logic to determine pixel weights, the dynamic range is not fully stretched and the fence region is therefore slightly dark. Moreover, [18] seems to over-enhance the sharpness of the clouds and some unnatural artifacts are visible in the center of the sky. For the result of [20] (Fig. 7d), the sky region is over-exposed and the detailed contours of the clouds are lost. By contrast, the proposed method demonstrates significant improvement on the aforementioned shortcomings and yields a visually pleasing HDR image, as shown in Fig. 7(e).

Figure 8 presents another visual comparison using the image Chapel. As in Fig. 7(b), the result of [17] (Fig. 8b) is insufficiently sharp compared with the others, so much so that the details of the ceiling (such as the brick texture) are nearly lost. By contrast, although details are preserved in the result of [18] (Fig. 8c), the image is excessively sharp in some regions, as indicated by the unnatural textures visible on the backs of the pews. This occurs because when the fuzzy-based pixel weights are determined in [18], pixels with high gradient or high intensity are given excessive weights, whereas color consistency is ignored. For [20] (Fig. 8d), the output HDR image is over-exposed to the extent that a color shift occurs (which can be seen in the color checker). The result of the proposed method (Fig. 8e) provides visibly higher visual quality. Compared to [20], the image produced is brilliant, edges are preserved, and the overall color tone of the input image is completely maintained. The overall sharpness of Fig. 8(e) is controlled to a visually pleasing degree; that is, the local edges are appropriately enhanced without the loss of detail (compared with [17]) and without over-enhancing noise or unnatural textures (compared with [18]).

For the comparisons involving objective image quality criteria, we adopt the same criteria as were employed in our baseline method [18]: Entropy and Cumulative Probability of Blur Detection (CPBD) [27]. The first criterion (Entropy) measures the uncertainty associated with the richness of color distribution. Higher Entropy value indicates that the image has richer details.
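For reference, the Entropy criterion can be computed from an image histogram with the standard Shannon formula. The sketch below assumes an 8-bit single-channel image; the paper does not state whether Entropy is evaluated on luminance or per color channel, and the function name is hypothetical.

```python
import numpy as np

def shannon_entropy(gray):
    """Shannon entropy (in bits) of an 8-bit image's intensity histogram.

    Assumption: a single 8-bit channel is evaluated.
    """
    gray = np.asarray(gray).astype(np.uint8)
    counts = np.bincount(gray.ravel(), minlength=256)
    p = counts[counts > 0] / gray.size  # empirical probabilities, zeros dropped
    return float(-np.sum(p * np.log2(p)))
```

The value is 0 for a constant image and reaches its maximum of 8 bits when all 256 levels occur equally often.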
The second criterion (CPBD) measures the sharpness of the image. More specifically, CPBD utilizes the concept of just noticeable blur (JNB) to predict the percentage of edges at which blur cannot be detected. A higher CPBD value indicates that details are perceptually better preserved, with sufficient clarity. Tables 2 and 3 compare the results for Entropy and CPBD, respectively. In both tables, the proposed method demonstrated better performance (higher Entropy and CPBD) than the other methods.
Comparisons of Entropy among methods
Comparisons of CPBD among methods
Table 4 shows the execution times for the generation of HDR images using different methods. The execution time for the image Chapel was about twice that of the other images because of its large size. Compared to our baseline method [18], this work requires less time: although the proposed method uses a feedback loop to adaptively adjust weights, it saves time by omitting the CLAHE step (used in [18] to filter the stretched histograms). For the proposed method (in Table 4), the loop number is set to two, the average number required for convergence as explained in Section 3.1. The execution times of [17, 20] were shorter than those of [18] and this work. This is because when implementing [18] and the proposed method, we used the Matlab fuzzy toolbox to determine fuzzy weights instead of the simplified code used in [18]. (According to Table 3 of [18], the average execution time of [18] is about twice that of [17].) In addition, it is important to note that, as discussed above, the proposed method achieves visibly higher visual quality than the others.
Comparisons of execution times among methods (sec)
This paper presents a novel MEF method for HDR imaging. This work is ghost-free because it requires only one input image. Considering various image contents, it is difficult to determine suitable pixel weights without any adjustment; if the output HDR image is slightly too sharp or not sharp enough, its quality will be degraded. Therefore, the proposed method allows users to set their preferred sharpness level, and the pixel weights are automatically and appropriately adjusted by the feedback structure. For the design of fuzzy rules, unlike our baseline method [18], we propose that in addition to the gradient in the Y channel, color vividness in the UV channels should be considered. In addition, GIF is utilized to enhance details and reduce noise. The adaptive adjustment scheme proposed in (10) enables the fuzzy loop to converge within a limited number of loops.
Although this work is single image-based MEF, it actually addresses the fundamental problem of how to appropriately determine and adjust pixel weights when fusing images so as to generate high-quality HDR images, and thus the proposed framework can also be used for multiple image-based MEF. The experimental results of several subjective and objective image quality evaluations demonstrated that the proposed method outperforms other single image-based HDR methods.
Despite these advantages, several directions remain for future work. First, software optimization is required. In our baseline [18] and this study, the implementation of fuzzy fusion is based on the Matlab fuzzy toolbox. As shown in Table 4, both methods seem to require much more time than [17]. However, as documented in [18], using a 2D Look-Up Table to simplify fuzzy fusion can greatly decrease the processing time, to approximately 1.6 ms for fusing a full HD image. Second, this work proposes EOR as a measure of sharpness because it is simple and compatible with the real-time Canny edge detector. However, we found that slight differences in overall sharpness sometimes cannot be distinguished using EOR. CPBD is an alternative sharpness measure, but its computation cost might be a concern in real-time processing. We plan to develop another efficient measure of sharpness and integrate it into our system in the future.
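The look-up-table idea mentioned above can be sketched in a few lines: the fuzzy weight is tabulated once over a quantized grid of the two inputs, and per-pixel inference is then replaced by array indexing. This is an illustration only; `weight_fn` is a hypothetical placeholder for whatever fuzzy inference is used, the 256-level quantization is an assumption, and the sketch is not the implementation documented in [18].

```python
import numpy as np

def build_weight_lut(weight_fn, levels=256):
    """Tabulate weight_fn over a levels x levels grid of inputs in [0, 1]."""
    axis = np.linspace(0.0, 1.0, levels)
    g, v = np.meshgrid(axis, axis, indexing="ij")
    return weight_fn(g, v)

def lookup_weights(lut, gradient, vividness):
    """Replace per-pixel inference with one table lookup per pixel."""
    n = lut.shape[0] - 1
    # Quantize both inputs to the nearest grid index.
    gi = np.rint(np.clip(gradient, 0.0, 1.0) * n).astype(np.intp)
    vi = np.rint(np.clip(vividness, 0.0, 1.0) * n).astype(np.intp)
    return lut[gi, vi]
```

The table is built once per rule base, so its cost is independent of image size; the price is a quantization error bounded by the grid spacing.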
