Abstract
Introduction
Traditional imaging devices can capture only a limited band of the spectral emission or radiation of the target scene, which may contain unpredictable objects or materials. Compound imaging and advanced signal processing are widely studied to enhance imaging capability and adjustability. Heterogeneous image fusion is a typical image-enhancing technique that combines at least two distinct source or sensor images into a single composite image. The fused image is more informative, although it may be degraded under low signal-to-noise ratio conditions. Hence, many applications benefit from the computational theory of multi-source image fusion, especially social security surveillance, network security monitoring, network video analysis, and military observation.1–3
Because objects emit heat radiation, infrared rays can be used to detect hidden heat sources in complex environments. To capture the heat radiation of specific targets, the infrared image sensor is designed to be sensitive only to wavelengths between 780 and 2000 nm, at the cost of losing the visible-light information. By contrast, visible images contain richer texture of the surrounding environment, which may also be important for effective target observation.4,5 In this article, we discuss the visible and infrared image fusion problem via contrast and directional features optimization (CDFO).
Image fusion can usually be divided into three levels: pixel level, feature level, and decision level. Traditional feature-level fusion methods include the weight-averaging method, the Bayesian estimation method, and the cluster analysis method. 6 These methods combine the features extracted from multi-source information on different sensors (including shape, edge, region, contour, texture, and corner). Feature extraction and comprehensive analysis can be processed well even across a number of different sensor images; however, hierarchical information fusion at different resolutions remains an open problem. Decision-level fusion methods 7 find feature correspondences via classification and identification of each image, after which a further fusion process is carried out by global optimization. Simple decision-level fusion cannot produce a good result with multi-resolution images because of the non-uniform resolution problem. In recent years, pixel level–based fusion methods8,9 have shown more advantages on cases with complex images. For example, Kannan and Perumal 10 proposed a pixel-level image fusion method based on the discrete wavelet transform (DWT), in which an image is decomposed into a sequence of images at different spatial resolutions; the fusion process can then be applied on different image scales. Li and Dong 11 and He et al. 12 presented a contrast pyramid (CP) fusion method: the original image is resolved into a series of decomposition layers with different resolutions and frequencies, and the fusion process is carried out separately on the components of each spatial frequency layer. To obtain better direction information, Jin et al.13,14 suggested fusing satellite infrared images with visible images by introducing directional filter banks. This method extracts the directional sub-graphs of the infrared image, so that the directional information is well preserved during the fusion process.
Signal processing in the human visual system is performed on different channels at different scales, 15 so we introduce the CP for pre-processing. To fully exploit direction information and the edge features of high-frequency information, directional filter banks are used to extract more directional features. First, the CP is used to decompose the image and adjust the contrast at different scales. The input multi-channel images are decomposed into different frequency bands, and the multi-resolution CP sequences form a pyramid-like structure; each scale of the CP corresponds to different spatial frequency characteristics. Second, we use directional filter banks to process the decomposed images on the basis of the CP. A filter bank provides an efficient analysis and synthesis of discrete signal structure, and a specific filter bank structure can yield a continuous multi-resolution basis. Hence, we can obtain abundant direction information and multi-scale information. During the fusion process, traditional methods determine the fusion coefficient weights according to human experience, which often fails to achieve a good fusion effect. Therefore, the whale optimization algorithm (WOA) is introduced to search for the optimal fusion coefficients adaptively. Compared with the majority of traditional methods, the proposed CDFO-based method provides an accurate image fusion result that captures the most significant features from the multiple sensor images. The overall technical route of image fusion based on CDFO is shown in Figure 1.

Overall technical route of CDFO image fusion.
Related work
Simple pixel-level fusion can be treated as a linear summation of pixels from the distinct source images, 16 as shown in equation (1)
where

Pixel-level image fusion mechanism.
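As an illustration of the linear pixel-level combination described above, the following minimal sketch fuses two same-size grayscale images by a weighted sum. The function name and the fixed equal weights are illustrative assumptions, not taken from the article, which later optimizes such coefficients with the WOA.

```python
import numpy as np

def pixel_fusion(img_a, img_b, w_a=0.5, w_b=0.5):
    """Weighted pixel-level fusion: F(x, y) = w_a * A(x, y) + w_b * B(x, y).

    img_a, img_b: same-size grayscale images.
    w_a, w_b: fusion weights (illustrative fixed values here).
    """
    return w_a * np.asarray(img_a, dtype=float) + w_b * np.asarray(img_b, dtype=float)

# Toy 2x2 "visible" and "infrared" patches
vis = np.array([[100.0, 120.0], [80.0, 90.0]])
ir = np.array([[200.0, 60.0], [40.0, 150.0]])
fused = pixel_fusion(vis, ir)  # equal weights -> per-pixel average
```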
CP decomposition
In order to obtain multi-resolution-level image information, the original visible and infrared image should be pre-processed. The principles of CP decomposition are as follows.
Establishing Gaussian decomposition
CP is based on Gaussian pyramid. First, we use
where 0 <
The above constraints ensure the low-pass property and the smoothness of the image after it is reduced and expanded, so that no seam effect occurs. Thus, we construct
Building the CP
In equation (2), we can calculate
The window function has low-pass filter properties, thus
where
Rebuilding the original image from the CP
Transforming equation (4), we can get a reconstruction equation of CP
According to equation (5), we can iteratively start from the top layer,
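To make the decomposition and reconstruction steps concrete, here is a minimal sketch of a ratio-of-low-pass (contrast) pyramid. The 2x2 block-average REDUCE and nearest-neighbor EXPAND operators are simplifying stand-ins for the window-function filtering defined above, so the exact layer values differ from the article's; the build/reconstruct structure is what matters.

```python
import numpy as np

def reduce_(img):
    """One REDUCE step: 2x2 block average then 2:1 downsampling
    (a stand-in for low-pass filtering with the window function)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def expand(img):
    """One EXPAND step: nearest-neighbor 1:2 upsampling."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_contrast_pyramid(img, levels):
    """Contrast layers C_l = G_l / EXPAND(G_{l+1}) - 1; top Gaussian layer kept as-is."""
    g = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        g.append(reduce_(g[-1]))
    c = [g[l] / expand(g[l + 1]) - 1.0 for l in range(levels)]
    c.append(g[-1])  # top of the pyramid
    return c

def reconstruct(c):
    """Invert the decomposition iteratively from the top layer:
    G_l = (C_l + 1) * EXPAND(G_{l+1})."""
    g = c[-1]
    for layer in reversed(c[:-1]):
        g = (layer + 1.0) * expand(g)
    return g

img = np.arange(16, dtype=float).reshape(4, 4) + 1.0  # positive values avoid division by zero
cp = build_contrast_pyramid(img, levels=2)
rec = reconstruct(cp)
```

With these matched REDUCE/EXPAND operators the reconstruction is exact, mirroring the iterative top-down recovery described by equation (5).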
Construction of directional features
Considering the multi-sensor image characteristics, we use directional filter banks19,20 to capture the directional features. Traditional directional filter banks lack multi-scale analysis capability, so in this article, we design a new directional filter bank suitable for multi-sensor visible and infrared images.
It consists of a two-channel diamond filter bank and a parallelogram filter bank that are cascaded through the tree structure and through the

Directional filter bank spectral segmentation: (a) spectrum division diagram and (b) decomposition part.
In Figure 3(b),
Figure 3(b) shows the three-stage cascaded decomposition of the directional filter bank (the reconstructed part is the inverse of the decomposed part). The wedge-shaped directional spectrum partitioning makes it suitable for extracting the edge contours with spatial localization in the image.
The WOA
Traditional fusion uses the maximum-value fusion rule or experience to determine the fusion coefficients, so the fusion effect can deteriorate greatly under unreasonable coefficient selection. In this article, we adopt a heuristic intelligent optimization algorithm to determine the optimal fusion coefficients.
Some brain regions of the whale are similar to brain structures of human beings. 21 Hunting is one of the most characteristic behaviors of the whale. By studying the hunting behaviors of whale groups in the natural environment, researchers have proposed a meta-heuristic optimization algorithm called the WOA. Mirjalili and Lewis 22 proposed this method by simulating the hierarchy of the whale population and their hunting behaviors.
The proposed CDFO method
Contrast and directional features
The traditional CP decomposition has good physical meaning but ignores direction information, which affects the fusion effect. The directional filter bank adds direction information on top of the CP decomposition; it not only maintains the physical meaning of contrast but also detects direction information well, providing more feature information for the fused image.
In directional filter banks, we define a sampling of multi-dimensional space
First, we give a proposition:
In this article, we use the decomposition of two-dimensional, two-channel ladder structure filter and quincunx sampling network. Two matrices are used to represent the quincunx sub-network
In directional filter banks, the following four basic unimodular matrices are used to achieve the invariance of rotation operation
Here, we use Smith’s decomposition, which can diagonalize any integer matrix
where
In the first two layers of directional filter banks, quincunx filter banks (QFBs) will be used on each layer.
We can simplify the total sampling matrix of the QFB with the quincunx matrix of Smith’s form. We can calculate the sampling matrix using the following equation
According to the proposition presented above, we can get the following equation
The above equation shows the sampling network of QFB, which uses the
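As a small numerical illustration of this sampling structure, the sketch below uses one common choice of quincunx matrix (which may differ from the article's convention) to verify that cascading two quincunx stages yields ordinary 2x downsampling in each dimension, and builds the quincunx keep-mask on a pixel grid.

```python
import numpy as np

# One common quincunx sampling matrix; |det Q| = 2, so each
# quincunx stage halves the number of retained samples.
Q = np.array([[1, 1],
              [1, -1]])

# Two cascaded quincunx stages: total sampling matrix Q @ Q = 2I,
# i.e. ordinary downsampling by 2 in each dimension.
total = Q @ Q

def quincunx_keep_mask(h, w):
    """Boolean mask of the quincunx lattice: keep pixels where x + y is even."""
    ys, xs = np.mgrid[0:h, 0:w]
    return (xs + ys) % 2 == 0

mask = quincunx_keep_mask(4, 4)  # checkerboard pattern, half the pixels kept
```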
Fusion coefficient optimization with the WOA
The traditional fusion coefficient is selected through human experience. In this article, the WOA is introduced to optimize the fusion coefficients adaptively, so that the image contrast information and direction information are maximally retained and the fusion effect is clearly improved.
The WOA builds a global search mechanism by simulating the hunting behavior of whales, in particular the bubble-net attacking mechanism of humpback whales. The details of the WOA are described as follows.
Encircling prey
Before hunting, humpback whales identify the location of prey and then encircle it. In the search space, the optimal position of the target prey is unknown at the beginning, so the WOA assumes that the current best solution is the target prey or close to the optimal position. Once the best search agent is ascertained, the other search agents update their own positions relative to it. Mirjalili and Lewis 22 described this behavior with the following mathematical model
where
here, the value of
The two-dimensional feedback behavior is shown in Figure 4. In Figure 4,

Predicted possible locations of position vectors in 2D space.
The bubble-net searching mechanism has two patterns: the first is the shrinking encircling pattern and the second is the spiral position update.
where

Shrinking encircling mechanism.
In the process of optimization, we assume that if
where

Mechanism of bubble-net searching with spiral pattern.
Search for prey (exploration phase)
Unlike the exploitation phase, in the exploration phase the search agent updates its location according to a randomly selected agent; thus, the random value of
where
On the whole, the WOA is a global optimization algorithm, because it has both exploration and exploitation capabilities. The pseudo code of the WOA is given as follows.
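A minimal sketch of the WOA as described above (encircling, shrinking encircling, spiral update, and random-agent exploration). The population size, iteration count, bounds, and spiral constant b are illustrative assumptions, not the article's settings, and the objective here is a toy sphere function rather than the fusion-coefficient objective.

```python
import numpy as np

def woa(objective, dim, n_agents=20, iters=50, lb=-10.0, ub=10.0, seed=0):
    """Minimal WOA sketch: minimize `objective` over a box-bounded search space.

    Returns (best_position, best_score). Parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_agents, dim))
    best = min(X, key=objective).copy()
    best_score = objective(best)
    b = 1.0  # spiral shape constant
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters  # a decreases linearly from 2 to 0
        for i in range(n_agents):
            r = rng.random(dim)
            A = 2.0 * a * r - a
            C = 2.0 * rng.random(dim)
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1.0):
                    # Exploitation: shrink the circle around the best agent
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:
                    # Exploration: move relative to a randomly chosen agent
                    rand = X[rng.integers(n_agents)]
                    D = np.abs(C * rand - X[i])
                    X[i] = rand - A * D
            else:
                # Spiral position update around the current best
                l = rng.uniform(-1.0, 1.0)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            score = objective(X[i])
            if score < best_score:
                best, best_score = X[i].copy(), score
    return best, best_score

pos, val = woa(lambda x: float(np.sum(x ** 2)), dim=2)
```

In the article's setting, the objective would score a candidate set of fusion coefficients by the quality of the resulting fused image instead of the toy sphere function used here.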
To generalize the proposed algorithm, our fusion can be summarized by the following steps:
Initialize parameters. We adopt window function,
Decompose CP on each original image separately and obtain the decomposition images
Use the directional filter banks built according to section “Fusion coefficient optimization with the WOA,” and decompose
Take the filtering images
In each final obtained image, add information
Conduct the low-pass filtering of decomposition results
Use the WOA proposed in this article to optimize the coefficients
Consider
Conduct contrast reconstruction and obtain the final fusion result
Computational complexity analysis
The computational complexity of the algorithm in this article consists of two parts: the computational complexity of the algorithm in section “Contrast and directional features” and that of the algorithm in section “Fusion coefficient optimization with the WOA.”
In section “Contrast and directional features,” CP decomposes the original images first before entering the directional filter bank. After down-sampling and filtering, the amount of data is reduced greatly, thereby improving the efficiency. Suppose the size of both images to be fused is
where
In the optimization of the fusion coefficient in section “Fusion coefficient optimization with the WOA,” if the maximum number of iterations is
As a whole, the computational complexity of the algorithm in this article is
Experiment and analysis
Numerical indexes
At present, there is no unified standard to evaluate the fusion effect for all kinds of images. In this article, we adopt evaluation metrics from the relevant references to evaluate the fusion results objectively, namely information entropy, average gradient, standard deviation, and spatial frequency. The information entropy (IE) of the image reflects the amount of information the image contains; the larger the value, the more information in the image. The average gradient (AG) reflects the ability of the image to express contrast in tiny details. The standard deviation (SD) measures the degree of dispersion of the pixel values relative to the pixel average, that is, the contrast information of the image; the larger the value, the greater the contrast. The spatial frequency (SF) effectively reflects the details of the image; the larger the value, the clearer the image. The edge strength (ES) describes the visually important edge intensity and direction information of the fusion result. The execution time (ET) reflects the efficiency of the different fusion methods.
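Several of these metrics can be sketched directly. The implementations below follow common textbook definitions (histogram entropy in bits, mean gradient magnitude, and row/column spatial frequency); we assume, but cannot confirm, that they match the exact formulas used in the article's tables.

```python
import numpy as np

def information_entropy(img, bins=256):
    """IE: Shannon entropy of the grey-level histogram, in bits."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def average_gradient(img):
    """AG: mean magnitude of horizontal/vertical intensity differences."""
    gx = np.diff(np.asarray(img, dtype=float), axis=1)[:-1, :]
    gy = np.diff(np.asarray(img, dtype=float), axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def spatial_frequency(img):
    """SF: sqrt(RF^2 + CF^2) from row- and column-wise differences.
    (SD is simply np.std of the image.)"""
    f = np.asarray(img, dtype=float)
    rf = np.sqrt(np.mean(np.diff(f, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(f, axis=0) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))

flat = np.full((8, 8), 128.0)  # constant image: no information, no detail
noisy = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)
```

A constant image scores zero on all three metrics, while a detailed image scores higher, matching the "larger is better" reading of the tables.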
Experimental analysis
In order to verify the advantages of the proposed algorithm, we compare it with several fusion algorithms: the fast filtering image fusion (FFIF) method, 24 the CP-based fusion method,11,12 the DWT-based fusion method, 10 and the principal component analysis (PCA)-based fusion method. 25 As shown in Figure 7, the first dataset is a section (256 × 256 pixels) of visible and infrared images without smoke shielding, acquired by a Sony Camcorder and a long-wave infrared (LWIR) sensor. The second group of visible and infrared images shows street signs and a man in a doorway. The third group of images (256 × 256 pixels) shows crossroads at night. The fourth group of images (256 × 256 pixels) shows three soldiers and a jeep, and the last group of images (256 × 256 pixels) shows two men in front of a house. The fusion results are shown in Figure 8.

Five pairs of visible and infrared images: (a) building images without smoke shielding, (b) man in the doorway, (c) crossroads at night, (d) soldiers with jeep, and (e) two men in front of a house.

Fusion results by different methods: (a) FFIF-based, (b) CP-based, (c) DWT-based, (d) PCA-based, and (e) CDFO-based.
In the experiments, fusion performance is evaluated by qualitative analysis and objective indicators (as shown in Tables 1–5). Qualitative analysis considers the visual effect and spectral fidelity, including spatial resolution, clarity, and fine details, which indicate the extent to which the original spectral signal or characteristics are preserved.
Numerical results of the first line of Figure 8.
IE: information entropy; AG: average gradient; SD: standard deviation; SF: spatial frequency; ES: edge strength; ET: execution time; IR: original infrared image; FFIF: fast filtering image fusion; CP: contrast pyramid; DWT: discrete wavelet transform; PCA: principal component analysis; CDFO: contrast and directional features optimization; VS: original visible image.
Bold values indicate the optimal value obtained among the different fusion methods for the corresponding evaluation index.
Numerical results of the second line of Figure 8.
Numerical results of the third line of Figure 8.
Numerical results of the fourth line of Figure 8.
Numerical results of the fifth line of Figure 8.
In terms of visual effects (Figure 8), the overall profile of the FFIF fusion result is very rough and the saturation is distorted, especially in the second group of experiments. The CP-based fusion results have higher contrast, but the edge contours are not clear enough. The DWT-based fusion results lack spectral characteristics and sufficient sharpness. The PCA-based fusion images have excessive brightness, rough edge contours, loss of a large amount of detail information, and severe saturation distortion. The proposed CDFO method achieves better fusion results than the four compared methods: rich structural features and clear edge contours better highlight the target area of interest.
According to the numerical results (Tables 1–5), the evaluation indexes of the proposed CDFO method are the highest in most cases, which indicates that its fused images have the highest sharpness and the most abundant details. However, in terms of the SD indicator, the SD values of the CP-based fusion results in the second, fourth, and fifth groups are higher; the reason is that the contrast of the CP-based fusion results is too large, and their visual effect is particularly poor. Also, the time cost of the CDFO method is almost always lower, which is very important in practical applications.
The convergence curves of the fusion coefficient optimization of the WOA on the five groups of images are shown in Figure 9. The second and third groups converge after 4 iterations, after which the fusion result no longer changes. The first and fifth groups reach a steady state after 6 iterations, and the fourth group converges when the number of iterations reaches 8. Therefore, the figure shows that the WOA-based coefficient optimization has a fast convergence speed and good convergence stability, which effectively saves optimization time for the fusion coefficients.

Convergence curves of CDFO method applied in five different sets of images.
In summary, the above five sets of experiments show that, in terms of both subjective visual quality and quantitative numerical indicators, and also taking the time cost into account, the proposed CDFO method has obvious advantages.
Conclusion
In this article, a CDFO image fusion algorithm is proposed to take advantage of both infrared and visible images and better reconstruct the target scene. The method preserves more structural features and color spectrum information. At the same time, considering high-frequency information at different scales, we construct directional filter banks to capture directional features. Finally, the WOA is applied to optimize the fusion coefficients. The proposed method has clear advantages in both qualitative and quantitative terms. Tests on real visible and infrared image datasets show that the proposed algorithm preserves more scene details and information, which is valuable for video monitoring and security surveillance in low-visibility environments.
