
Abstract

Compressive beamforming with a planar microphone array can effectively estimate the two-dimensional directions-of-arrival and quantify the strengths of acoustic sources. Because it overcomes the basis mismatch issue of the conventional grid-based method and outperforms the single-snapshot grid-free method, the multiple-snapshot grid-free method has become the current research focus. Its existing strategy, based on atomic norm minimization (ANM), limits the achievable resolution because it does not work well for sources with a small separation. This paper aims to remedy this drawback. After revealing its cause, we present an iterative reweighted ANM (IRANM) approach. Both simulations and experiments demonstrate that, compared with the ANM-based two-dimensional multiple-snapshot grid-free compressive beamforming, the IRANM-based one offers not only enhanced resolution but also stronger denoising ability and higher identification accuracy.

Keywords

Planar microphone array, compressive beamforming, two-dimensional, multiple snapshot, grid-free, iterative reweighted atomic norm minimization

Introduction

Compressive beamforming with a planar microphone array is a promising approach to estimate the directions-of-arrival (DOAs) and quantify the strengths of acoustic sources. Based on compressive sensing theory,^1–3 the approach recovers the source distribution from an underdetermined system of linear equations by imposing sparsity constraints. The system of linear equations relates the noisy wavefield measurements made with a microphone array to the source distribution. Owing to its strong anti-interference capability, low demand for the number of microphones, clear source imaging, and so on, it has attracted extensive attention recently.^4–9

Conventional compressive beamforming discretizes the DOA domain into a grid, that is, a finite set of observation directions, and assumes that all sources are located in these observation directions. The sparsity constraint is imposed by minimizing the $l_{1}$-norm of a source strength vector (for the single-snapshot case) or the $l_{2, 1}$-norm of a source strength matrix (for the multiple-snapshot case). When the DOAs of sources do not fall in these observation directions, the results become inaccurate. This problem is called basis mismatch.^4,10–12 Utilizing a finer grid^13,14 can only alleviate the basis mismatch problem to a certain extent, and increases computational complexity. More seriously, grid refinement increases the coherence of the measurement process, which can cause offsets in the estimates.⁴ To solve the basis mismatch problem fundamentally, the grid-free strategy was introduced.^15–17 In 2015, based on the minimization of the atomic norm of the source strength and the polynomial rooting method, Xenaki and Gerstoft¹⁸ developed a one-dimensional (1D) single-snapshot grid-free compressive beamforming (GFCB) for uniform linear microphone arrays (ULMAs). In 2018, Park, Choo, and Seong¹⁹ extended Xenaki and Gerstoft’s strategy to the 1D multiple-snapshot case. In 2017–2018, the authors^20–23 successively developed the two-dimensional (2D) single- and multiple-snapshot GFCB based on atomic norm minimization (ANM) for a uniform rectangular microphone array (URMA). They showed that, for sources that are active across multiple snapshots and uncorrelated or partially correlated, the multiple-snapshot GFCB outperforms the single-snapshot one in many aspects.²³

The ANM-based 2D multiple-snapshot GFCB consists of three steps.²³ First, the atomic norm of the microphone pressure induced by sources is minimized to obtain a positive semidefinite (PSD) and twofold Toeplitz matrix that can be understood as the array covariance matrix; at the same time, the measured pressure is denoised and the microphone pressure from sources is thus reconstructed. Then, the Toeplitz matrix is processed to estimate the number and the DOAs of sources through the matrix pencil and pairing (MaPP) method.^23,24 Finally, the source strengths are quantified based on the reconstructed microphone pressure and the estimated DOAs. When the sources are sufficiently separated, ANM amounts to the sparsity constraint: with a high probability, the obtained PSD and twofold Toeplitz matrix contains accurate DOA information and the microphone pressure from sources is accurately reconstructed. However, the method fails to identify sources that have a small separation. This drawback can be attributed to the fact that the atomic norm in the first step is only a convex relaxation of the source sparsity metric, and it limits the resolution of the ANM-based 2D multiple-snapshot GFCB. Finding an effective remedy is therefore important for completing the functionality and improving the performance of the 2D multiple-snapshot GFCB. In this paper, an iterative reweighted ANM (IRANM)-based 2D multiple-snapshot GFCB is proposed.

The remainder of this paper is organized as follows. Section 2 presents the theory of the ANM-based 2D multiple-snapshot GFCB, and demonstrates and explains its drawback with simulations. Section 3 develops the IRANM-based 2D multiple-snapshot GFCB by illuminating its theory and revealing its mechanism with a simulated illustrative example. Section 4 compares the two strategies both with simulations and experimentally. Section 5 concludes this paper.

ANM-based 2D multiple-snapshot GFCB

Theory

In the 2D GFCB, a URMA is utilized to measure the signal. Figure 1 shows the layout, in which the symbol “•” represents a microphone. $a = 0, 1, \dots, A - 1$ and $b = 0, 1, \dots, B - 1$ index the microphones in the $x$ and $y$ dimensions, respectively. The microphone spacing is $Δ x$ and $Δ y$ in the $x$ and $y$ dimensions, respectively. $(θ_{i}, ϕ_{i})$ indicates the actual DOA of the $i$-th source, with $θ_{i}$ and $ϕ_{i}$ being the elevation and azimuth angles, respectively. $s_{i, l}$ represents the complex strength of the $i$-th source under the $l$-th snapshot, and $\mathbf{s}_{i} = [s_{i, 1}, s_{i, 2}, \dots, s_{i, L}] \in C^{1 \times L}$ represents the complex strength vector of the $i$-th source under all $L$ snapshots. Here, $C$ represents the set of complex numbers. The theoretical pressure vector $p_{a, b} \in C^{1 \times L}$ at the $(a, b)$-th microphone over all snapshots can be described by $p_{a, b} = \sum_{i = 1}^{I} \mathbf{s}_{i} e^{j 2 π (t_{1 i} a + t_{2 i} b)}$ (1) where $I$ is the total number of sources, $j = \sqrt{- 1}$, $t_{1 i} \equiv \sin θ_{i} \cos ϕ_{i} Δ x / λ$, $t_{2 i} \equiv \sin θ_{i} \sin ϕ_{i} Δ y / λ$, and $λ$ is the wavelength. Construct $d (t_{1 i}, t_{2 i}) = {[1, e^{j 2 π t_{1 i}}, \dots, e^{j 2 π t_{1 i} (A - 1)}]}^{T} \otimes {[1, e^{j 2 π t_{2 i}}, \dots, e^{j 2 π t_{2 i} (B - 1)}]}^{T} \in C^{A B}$, $P = {[p_{0,0}^{T}, p_{0,1}^{T}, \dots, p_{0, B - 1}^{T}, p_{1,0}^{T}, p_{1,1}^{T}, \dots, p_{1, B - 1}^{T}, \dots, p_{A - 1,0}^{T}, p_{A - 1,1}^{T}, \dots, p_{A - 1, B - 1}^{T}]}^{T} \in C^{A B \times L}$, $s_{i} = {‖ \mathbf{s}_{i} ‖}_{2} \in R^{+}$, and $ψ_{i} = \mathbf{s}_{i} / s_{i} \in C^{1 \times L}$, where ${(\cdot)}^{T}$ represents the transpose operator, $\otimes$ represents the Kronecker product, ${‖ \cdot ‖}_{2}$ represents the $l_{2}$-norm, and $R^{+}$ represents the set of positive real numbers. Corresponding to equation (1), $P = \sum_{i = 1}^{I} d (t_{1 i}, t_{2 i}) \mathbf{s}_{i} = \sum_{i = 1}^{I} s_{i} d (t_{1 i}, t_{2 i}) ψ_{i}$ (2)
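As a minimal sketch of equations (1) and (2), the following Python code builds the steering vector $d (t_{1}, t_{2})$ and the pressure matrix $P$ with nested lists. The function names are illustrative only and are not taken from the paper; the row ordering $a B + b$ follows the stacking of $p_{a, b}$ given above.

```python
import cmath
import math


def steering_vector(t1, t2, A, B):
    """d(t1, t2): Kronecker product of the x- and y-dimension phase vectors."""
    x_vec = [cmath.exp(2j * math.pi * t1 * a) for a in range(A)]
    y_vec = [cmath.exp(2j * math.pi * t2 * b) for b in range(B)]
    # Kronecker ordering: entry (a, b) is stored at index a * B + b.
    return [xa * yb for xa in x_vec for yb in y_vec]


def pressure_matrix(sources, A, B):
    """P = sum_i d(t1_i, t2_i) s_i as an (A*B) x L nested list.

    sources: list of (t1, t2, strengths), strengths being the L complex
    snapshot strengths of one source.
    """
    L = len(sources[0][2])
    P = [[0j] * L for _ in range(A * B)]
    for t1, t2, strengths in sources:
        d = steering_vector(t1, t2, A, B)
        for m in range(A * B):
            for l in range(L):
                P[m][l] += d[m] * strengths[l]
    return P
```

For an 8 × 8 array, `steering_vector(t1, t2, 8, 8)` returns 64 entries, one per microphone.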

Figure 1.

Measurement layout.

The measured pressure $P^{★} \in C^{A B \times L}$ can be expressed as the sum of the theoretical pressure $P$ and the additive noise $N \in C^{A B \times L}$ , that is $P^{★} = P + N$ (3)

In subsequent simulations, the noise is assumed to be independent identically distributed complex Gaussian. Define the array signal-to-noise ratio (SNR) as $20 \log_{10} ({‖ P ‖}_{F} / {‖ N ‖}_{F})$ , where ${‖ \cdot ‖}_{F}$ represents the Frobenius norm.
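The SNR definition above can be illustrated with a short sketch that draws complex Gaussian noise and rescales it so that $20 \log_{10} ({‖ P ‖}_{F} / {‖ N ‖}_{F})$ matches a target value. Rescaling the noise to hit the SNR exactly is an assumption made here for illustration; the paper does not state how the noise level is set.

```python
import math
import random


def frobenius_norm(M):
    """Frobenius norm of a nested list of (complex) numbers."""
    return math.sqrt(sum(abs(x) ** 2 for row in M for x in row))


def add_noise(P, snr_db, rng=random):
    """Return (P_star, N) with 20*log10(||P||_F / ||N||_F) equal to snr_db."""
    # Draw i.i.d. complex Gaussian entries, then rescale (assumed procedure).
    N = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in row] for row in P]
    scale = frobenius_norm(P) / (frobenius_norm(N) * 10 ** (snr_db / 20))
    N = [[scale * x for x in row] for row in N]
    P_star = [[p + n for p, n in zip(p_row, n_row)] for p_row, n_row in zip(P, N)]
    return P_star, N
```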

The first postprocessing step of the 2D multiple-snapshot GFCB is to denoise $P^{★}$ and thus reconstruct $P$. This can be realized by imposing a sparsity constraint, that is, minimizing the number of sources. In the grid-free framework, $t_{1} \equiv \sin θ \cos ϕ Δ x / λ$, $t_{2} \equiv \sin θ \sin ϕ Δ y / λ$ and the elements of $ψ$ are continuous quantities related to $θ$ and $ϕ$, where ${- 90}^{\circ} \leq θ \leq 90^{\circ}$ and $0^{\circ} \leq ϕ < 360^{\circ}$. Collect all the values of $d (t_{1}, t_{2}) ψ$ into an atomic set, denoted by $\mathcal{A}$. The atomic $l_{0}$-norm of $P$ is defined as ${‖ P ‖}_{\mathcal{A}, 0} = \inf_{\begin{matrix} d (t_{1 i}, t_{2 i}) ψ_{i} \in \mathcal{A} \\ s_{i} \in R^{+} \end{matrix}} \{ I \mid P = \sum_{i = 1}^{I} s_{i} d (t_{1 i}, t_{2 i}) ψ_{i} \}$ (4) where $\inf \{ \cdot \}$ represents the infimum operator. ${‖ P ‖}_{\mathcal{A}, 0}$ is a direct metric of source sparsity. The reconstruction of $P$ can be transformed into $\hat{P} = \underset{P \in C^{A B \times L}}{\arg \min} {‖ P ‖}_{\mathcal{A}, 0} \quad \text{subject to} \quad {‖ P^{★} - P ‖}_{F} \leq ε$ (5) where $ε$ is the control parameter corresponding to noise, normally set to $ε = {‖ N ‖}_{F}$. Equation (5) can be further transformed into the following rank minimization problem $\{ \hat{u}, \hat{P}, \hat{E} \} = \underset{u \in C^{N_{u}}, P \in C^{A B \times L}, E \in C^{L \times L}}{\arg \min} \operatorname{rank} (T_{b} (u)) \quad \text{subject to} \quad \begin{bmatrix} T_{b} (u) & P \\ P^{H} & E \end{bmatrix} \geq 0, \; {‖ P^{★} - P ‖}_{F} \leq ε$ (6) where $u \in C^{N_{u}}$ with $N_{u} = (A - 1) (2 B - 1) + B$ and $E \in C^{L \times L}$ are auxiliary quantities, $\operatorname{rank} (\cdot)$ returns the matrix rank, $T_{b} (\cdot)$ represents the twofold Toeplitz operator,^20–25 ${(\cdot)}^{H}$ represents the conjugate transpose operator, and $\geq 0$ means that the matrix is PSD. ${‖ P ‖}_{\mathcal{A}, 0}$ exploits the sparsity to the greatest possible extent.

Nevertheless, ${‖ P ‖}_{\mathcal{A}, 0}$ is non-convex, and equation (6) is a nondeterministic polynomial-time (NP) hard problem. Therefore, a computationally feasible alternative is required. The atomic norm ${‖ P ‖}_{\mathcal{A}}$ is employed as a convex relaxation of ${‖ P ‖}_{\mathcal{A}, 0}$ in Ref. [23]. Its definition is ${‖ P ‖}_{\mathcal{A}} = \inf_{\begin{matrix} d (t_{1 i}, t_{2 i}) ψ_{i} \in \mathcal{A} \\ s_{i} \in R^{+} \end{matrix}} \{ \sum_{i} s_{i} \mid P = \sum_{i} s_{i} d (t_{1 i}, t_{2 i}) ψ_{i} \}$ (7)

Correspondingly, equation (5) becomes $\hat{P} = \underset{P \in C^{A B \times L}}{\arg \min} {‖ P ‖}_{\mathcal{A}} \quad \text{subject to} \quad {‖ P^{★} - P ‖}_{F} \leq ε$ (8)

Equation (8) can be transformed into the following trace minimization problem²³ $\{ \hat{u}, \hat{P}, \hat{E} \} = \underset{u \in C^{N_{u}}, P \in C^{A B \times L}, E \in C^{L \times L}}{\arg \min} \frac{1}{2 \sqrt{A B}} (\operatorname{tr} (T_{b} (u)) + \operatorname{tr} (E)) \quad \text{subject to} \quad \begin{bmatrix} T_{b} (u) & P \\ P^{H} & E \end{bmatrix} \geq 0, \; {‖ P^{★} - P ‖}_{F} \leq ε$ (9) where $\operatorname{tr} (\cdot)$ returns the matrix trace. Substituting the atomic norm for the atomic $l_{0}$-norm thus amounts to substituting trace minimization for rank minimization. Equation (9) is a disciplined convex optimization problem²⁶ and can be solved utilizing the SDPT3 solver in the MATLAB CVX toolbox.²⁷

The $T_{b} (\hat{u})$ obtained by equation (9) is a PSD and twofold Toeplitz matrix. Its Vandermonde decomposition,^{20,21,23–25} expressed by equation (10),²¹ contains the DOA information of sources $T_{b} (\hat{u}) = V Σ V^{H} = \sum_{i = 1}^{r} σ_{i} d (t_{1 i}, t_{2 i}) d {(t_{1 i}, t_{2 i})}^{H}$ (10) where $V = [d (t_{11}, t_{21}), d (t_{12}, t_{22}), \dots, d (t_{1 r}, t_{2 r})]$, $Σ = \operatorname{diag} ([σ_{1}, σ_{2}, \dots, σ_{r}])$, $\operatorname{diag} (\cdot)$ creates a square diagonal matrix whose diagonal elements are the entries of the vector in the parentheses, $σ_{i} \in R^{+}$ $(i = 1, 2, \dots, r)$, and $r$ is the rank of $T_{b} (\hat{u})$. $T_{b} (\hat{u})$ can thus be understood as the sum of the array covariance matrices (the covariance matrices of the microphone pressure) induced by each of the sources. The second postprocessing step of the 2D multiple-snapshot GFCB is to find the Vandermonde decomposition of $T_{b} (\hat{u})$ and thus estimate the number and DOAs of sources. This can be realized through the MaPP method, whose detailed procedure is given in Ref. [23]. It is worth mentioning that not all PSD and twofold Toeplitz matrices admit a Vandermonde decomposition. Refs. [24,25] have proved that the sufficient condition for $T_{b} (\hat{u})$ to admit a Vandermonde decomposition is $r \leq \min \{ A, B \}$, and the sufficient condition for $T_{b} (\hat{u})$ to admit a unique Vandermonde decomposition is $r \leq \min \{ A, B \}$. Based on the $\hat{P}$ reconstructed by ANM and the $V$ obtained by MaPP, the 2D multiple-snapshot GFCB quantifies the source strengths as $\hat{P}$ left-divided by $V$, that is, by solving $V S = \hat{P}$ for the strength matrix $S$ in the least-squares sense.

An array formed by randomly selecting a subset of the microphones of the standard URMA shown in Figure 1 is called a non-uniform rectangular microphone array (NURMA). Employing the measured pressures of the selected microphones, the 2D multiple-snapshot GFCB can still obtain $T_{b} (\hat{u})$, reconstruct the pressure at all microphones, and finally estimate the source DOAs and strengths. In this circumstance, $P$, $P^{★}$ and $N$ in equations (2) and (3) become $P_{Ω}$, $P_{Ω}^{★}$ and $N_{Ω}$, and $P^{★} - P$ in equations (5), (6), (8) and (9) becomes $P_{Ω}^{★} - P_{Ω}$, where $Ω$ is the index set of the chosen microphones, $| Ω |$ is the cardinality of $Ω$, $P_{Ω}^{★} \in C^{| Ω | \times L}$ is the matrix formed by the measured pressures of the selected microphones, $P_{Ω} \in C^{| Ω | \times L}$ is the matrix composed of the pressures induced by sources at the selected microphones, and $N_{Ω} \in C^{| Ω | \times L}$ is the matrix composed of the noise at the selected microphones.

Simulations of drawback

A simulation example with six sound sources is designed. The DOAs of the six sources are (70°, 300°), (30°, 200°), (40°, 90°), (50°, 90°), (60°, 190°), and (60°, 180°), in turn. The corresponding root mean square strengths $({‖ \mathbf{s}_{i} ‖}_{2} / \sqrt{L})$ are 90 dB, 94 dB, 94 dB, 97 dB, 97 dB and 100 dB (referring to $2 \times 10^{-5}$ Pa). Set the frequency to 2000 Hz, the microphone array to eight rows and eight columns (i.e., $A = B = 8$), the microphone spacing to 0.035 m $(Δ x = Δ y = 0.035$ m$)$, the SNR to 30 dB, and the number of snapshots to 10. Figure 2(a) presents the source DOA map reconstructed by the ANM-based 2D multiple-snapshot GFCB. It is severely distorted, which can be explained as follows. The minimum separation among these sources, defined as $Δ_{\min} = \min_{i, i^{'} \in \{ 1, 2, \dots, I \}, i \neq i^{'}} \max \{ | t_{1 i} - t_{1 i^{'}} |, | t_{2 i} - t_{2 i^{'}} | \}$, is only 0.025.
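The quoted value of $Δ_{\min}$ can be reproduced with a few lines of Python. This is a sketch under one assumption: the speed of sound is taken as 343 m/s, which is not stated in this section but is consistent with the 4900 Hz upper frequency limit quoted later for this array. The function names are illustrative.

```python
import math


def t_pair(theta_deg, phi_deg, freq_hz, dx=0.035, dy=0.035, c=343.0):
    """(t1, t2) for one DOA; c = 343 m/s is an assumed speed of sound."""
    wavelength = c / freq_hz
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(theta) * math.cos(phi) * dx / wavelength,
            math.sin(theta) * math.sin(phi) * dy / wavelength)


def min_separation(doas_deg, freq_hz):
    """Delta_min: smallest pairwise max(|t1 - t1'|, |t2 - t2'|)."""
    ts = [t_pair(theta, phi, freq_hz) for theta, phi in doas_deg]
    return min(max(abs(ts[i][0] - ts[j][0]), abs(ts[i][1] - ts[j][1]))
               for i in range(len(ts)) for j in range(i + 1, len(ts)))


doas = [(70, 300), (30, 200), (40, 90), (50, 90), (60, 190), (60, 180)]
delta_min = min_separation(doas, 2000.0)  # about 0.025
```

The minimum is attained by the pair (40°, 90°) and (50°, 90°), which differ only in $t_{2}$.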

In this case, the ANM in equation (8) does not amount to the atomic $l_{0}$-norm minimization in equation (5). Correspondingly, the trace minimization in equation (9) does not amount to the rank minimization in equation (6). As shown by the symbol “□” in Figure 2(e), the obtained $T_{b} (\hat{u})$ has many relatively large eigenvalues, which means that the rank of $T_{b} (\hat{u})$ is high. Consequently, $T_{b} (\hat{u})$ may not admit a Vandermonde decomposition, or its Vandermonde decomposition does not contain the correct DOA information. Next, the fourth source’s elevation angle is changed to 30° and the second source’s azimuth angle to 200°, and the simulation is repeated. Figure 2(b) presents the result. In this case, $Δ_{\min}$ increases to 0.054, the Vandermonde decomposition of $T_{b} (\hat{u})$ contains the correct DOA information, and therefore the reconstructed source DOAs coincide well with the true ones. When simulating Figure 2(a) and (b), we directly specify the number of sources as 6 to avoid the effect of an inaccurate estimate. In practical applications, the MaPP method estimates the number of sources as the number of eigenvalues of $T_{b} (\hat{u})$ within a preset dynamic range, that is, $| \{ λ (T_{b} (\hat{u})) \mid 10 \lg (λ_{\max} (T_{b} (\hat{u})) / λ (T_{b} (\hat{u}))) \leq R \} |$, where $λ (T_{b} (\hat{u}))$ and $λ_{\max} (T_{b} (\hat{u}))$ denote an eigenvalue and the largest eigenvalue of $T_{b} (\hat{u})$, $R$ is the preset dynamic range, and $| \cdot |$ denotes the cardinality of a set. The eigenvalues of $T_{b} (\hat{u})$ decrease slowly when sources have a small separation, as shown in Figure 2(e). As a result, it is very hard to choose an $R$ that estimates the number of sources robustly and reliably. As shown by Figure 2(c) and (d), whether the number of sources is underestimated or overestimated, the reconstruction is inaccurate.

To sum up, for sources with a small separation, the ANM-based 2D multiple-snapshot GFCB fails to work well because the $T_{b} (\hat{u})$ obtained by ANM does not contain the correct DOA information with a high probability, or because the number of sources cannot be estimated robustly and reliably. To obtain Figure 2, all 64 microphones are utilized. Figure 3 shows the results when only 30 randomly selected microphones are utilized; it exhibits the same phenomena as Figure 2, demonstrating the drawback of the ANM-based 2D multiple-snapshot GFCB again.
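The dynamic-range rule used by MaPP to estimate the number of sources can be sketched as follows. The eigenvalues in the usage note are made-up illustrative numbers, not values from the paper, and the function name is hypothetical.

```python
import math


def count_in_dynamic_range(eigenvalues, range_db):
    """Number of eigenvalues lam with 10*lg(lam_max / lam) <= range_db."""
    lam_max = max(eigenvalues)
    return sum(1 for lam in eigenvalues
               if lam > 0 and 10 * math.log10(lam_max / lam) <= range_db)
```

With the illustrative spectrum `[100, 50, 30, 5, 0.5, 1e-8]`, a 6 dB range counts three eigenvalues and a 20 dB range counts four, which shows how a slowly decaying spectrum makes the estimate depend on the choice of $R$.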

Figure 2.

Results of ANM-based 2D multiple-snapshot GFCB when a standard URMA with 64 microphones is utilized. (a), (b), (c), (d) Reconstructions (*) for sources (○). (e) Eigenvalues of $T_{b} (\hat{u})$. $Δ_{\min}$ is (a) 0.025 and (b), (c), (d) 0.054. The estimated number of sources is (a), (b) 6, (c) $| \{ λ (T_{b} (\hat{u})) \mid 10 \lg (λ_{\max} (T_{b} (\hat{u})) / λ (T_{b} (\hat{u}))) \leq 6 \text{ dB} \} |$ and (d) $| \{ λ (T_{b} (\hat{u})) \mid 10 \lg (λ_{\max} (T_{b} (\hat{u})) / λ (T_{b} (\hat{u}))) \leq 20 \text{ dB} \} |$. In the DOA maps (a)–(d), the true source strength output and the reconstructed one are scaled to dB with reference to their respective maximum, and the reconstructed maximum sound level (referring to $2 \times 10^{-5}$ Pa) is marked on the top. The same applies to subsequent DOA maps.

Figure 3.

Results of ANM-based 2D multiple-snapshot GFCB when a NURMA with 30 randomly selected microphones is utilized. (a) Layout of the NURMA. (b), (c), (d), (e) Reconstructions (*) for sources (○). (f) Eigenvalues of $T_{b} (\hat{u})$. $Δ_{\min}$ is (b) 0.025 and (c), (d), (e) 0.054. The estimated number of sources is (b), (c) 6, (d) $| \{ λ (T_{b} (\hat{u})) \mid 10 \lg (λ_{\max} (T_{b} (\hat{u})) / λ (T_{b} (\hat{u}))) \leq 6 \text{ dB} \} |$ and (e) $| \{ λ (T_{b} (\hat{u})) \mid 10 \lg (λ_{\max} (T_{b} (\hat{u})) / λ (T_{b} (\hat{u}))) \leq 20 \text{ dB} \} |$.

IRANM-based 2D multiple-snapshot GFCB

Theory

Build a new metric $M^{κ} (P) = \min_{u \in C^{N_{u}}, E \in C^{L \times L}} \frac{1}{2 \sqrt{A B}} (\ln | T_{b} (u) + κ I | + \operatorname{tr} (E)) \quad \text{subject to} \quad \begin{bmatrix} T_{b} (u) & P \\ P^{H} & E \end{bmatrix} \geq 0$ (11) where $κ > 0$ is a regularization parameter, $I \in R^{A B \times A B}$ is here an identity matrix, and $| \cdot |$ applied to a matrix denotes its determinant. The following properties hold under a certain condition:^11,21 (1) When $κ \to 0$ and ${‖ P ‖}_{\mathcal{A}, 0} < A B$, $M^{κ} (P) \sim \left( \frac{{‖ P ‖}_{\mathcal{A}, 0}}{2 \sqrt{A B}} - \frac{\sqrt{A B}}{2} \right) \ln κ^{- 1}$, that is, $M^{κ} (P)$ is asymptotically equal to this expression as $κ$ tends to 0. (2) When $κ \to + \infty$, $M^{κ} (P) - (\sqrt{A B} / 2) \ln κ \sim {‖ P ‖}_{\mathcal{A}} κ^{- 1 / 2}$, that is, $M^{κ} (P) - (\sqrt{A B} / 2) \ln κ$ is asymptotically equal to ${‖ P ‖}_{\mathcal{A}} κ^{- 1 / 2}$ as $κ$ tends to $+ \infty$. (3) Let ${\hat{u}}_{κ \to 0}$ denote the optimal variable as $κ$ tends to 0. Only ${‖ P ‖}_{\mathcal{A}, 0}$ eigenvalues of $T_{b} ({\hat{u}}_{κ \to 0})$ are relatively large; the smallest $A B - {‖ P ‖}_{\mathcal{A}, 0}$ eigenvalues are around 0. The first property shows that minimizing $M^{κ} (P)$ amounts to minimizing ${‖ P ‖}_{\mathcal{A}, 0}$ as $κ$ tends to 0. The second property shows that minimizing $M^{κ} (P)$ amounts to minimizing ${‖ P ‖}_{\mathcal{A}}$ as $κ$ tends to $+ \infty$. That is to say, $M^{κ} (P)$ acts as a bridge between ${‖ P ‖}_{\mathcal{A}, 0}$ and ${‖ P ‖}_{\mathcal{A}}$, and is capable of enhancing sparsity compared to ${‖ P ‖}_{\mathcal{A}}$. The third property shows that the sufficient condition for $T_{b} ({\hat{u}}_{κ \to 0})$ to admit a Vandermonde decomposition can be guaranteed, and that counting the eigenvalues of $T_{b} ({\hat{u}}_{κ \to 0})$ that are larger than a tiny value estimates the number of sources robustly and reliably. Therefore, replacing ${‖ P ‖}_{\mathcal{A}}$ with $M^{κ} (P)$ as the measure of source sparsity is expected to remedy the above-mentioned drawback.

The reconstruction problem of $P$ can thus be rewritten as $\hat{P} = \underset{P \in C^{A B \times L}}{\arg \min} M^{κ} (P) \quad \text{subject to} \quad {‖ P^{★} - P ‖}_{F} \leq ε$ (12)

Combining equations (11) and (12) gives $\{ \hat{u}, \hat{P}, \hat{E} \} = \underset{u \in C^{N_{u}}, P \in C^{A B \times L}, E \in C^{L \times L}}{\arg \min} \frac{1}{2 \sqrt{A B}} (\ln | T_{b} (u) + κ I | + \operatorname{tr} (E)) \quad \text{subject to} \quad \begin{bmatrix} T_{b} (u) & P \\ P^{H} & E \end{bmatrix} \geq 0, \; {‖ P^{★} - P ‖}_{F} \leq ε$ (13)

The cost function in equation (13) combines the concave function $\ln | T_{b} (u) + κ I |$ and the convex function $\operatorname{tr} (E)$, and the problem can be solved by the majorization–minimization algorithm.^21,28,29 This algorithm iteratively updates the optimal variable and the optimal value of the cost function. Let ${\hat{u}}^{k}$, $κ^{k}$ and $W^{k} \equiv {(T_{b} ({\hat{u}}^{k}) + κ^{k} I)}^{- 1} \in C^{A B \times A B}$ be the optimal variable, the regularization parameter and the weighting matrix determined by the $k$-th iteration, respectively. In the $(k + 1)$-th iteration, a surrogate function is first constructed to locally approximate the cost function. Its expression is $\frac{1}{2 \sqrt{A B}} (\ln | T_{b} ({\hat{u}}^{k}) + κ^{k} I | + \operatorname{tr} (W^{k} T_{b} (u - {\hat{u}}^{k})) + \operatorname{tr} (E)) = \frac{1}{2 \sqrt{A B}} (\operatorname{tr} (W^{k} T_{b} (u)) + \operatorname{tr} (E)) + c^{k}$ (14) where $c^{k}$ is a constant. Then, ignoring $c^{k}$, the minimization problem can be transformed into a disciplined convex optimization problem²⁶ as follows $\{ {\hat{u}}^{k + 1}, {\hat{P}}^{k + 1}, {\hat{E}}^{k + 1} \} = \underset{u \in C^{N_{u}}, P \in C^{A B \times L}, E \in C^{L \times L}}{\arg \min} \frac{1}{2 \sqrt{A B}} (\operatorname{tr} (W^{k} T_{b} (u)) + \operatorname{tr} (E)) \quad \text{subject to} \quad \begin{bmatrix} T_{b} (u) & P \\ P^{H} & E \end{bmatrix} \geq 0, \; {‖ P^{★} - P ‖}_{F} \leq ε$ (15)

Equation (15) can be solved utilizing the SDPT3 solver in the MATLAB CVX toolbox.²⁷ Initializing ${\hat{u}}^{0} = 0$ and $κ^{0} = 1$ gives $W^{0} = I$, so the first iteration solution to equation (15) is consistent with the ANM in equation (9). In order to enhance sparsity, let $κ$ gradually decrease during the iteration $κ^{k} = \begin{cases} \min \{ κ^{k - 1} / 2, \; λ_{\max} (T_{b} ({\hat{u}}^{k})) / 10 \}, & k = 1 \\ κ^{k - 1} / 2, & 2 \leq k \leq 10 \\ κ^{10}, & k > 10 \end{cases}$ (16) where $λ_{\max} (T_{b} ({\hat{u}}^{k}))$ denotes the largest eigenvalue of $T_{b} ({\hat{u}}^{k})$.
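The schedule in equation (16) can be sketched in a few lines. The function name and the argument `lam_max_first`, which stands for $λ_{\max} (T_{b} ({\hat{u}}^{1}))$, are illustrative; in an actual run this eigenvalue would come from the first solve of equation (15).

```python
def kappa_schedule(lam_max_first, n_iter, kappa0=1.0):
    """Return [kappa^0, kappa^1, ..., kappa^n_iter] following equation (16).

    lam_max_first is the largest eigenvalue of T_b(u_hat^1), the only
    eigenvalue the schedule depends on.
    """
    kappas = [kappa0]
    for k in range(1, n_iter + 1):
        if k == 1:
            kappas.append(min(kappas[0] / 2, lam_max_first / 10))
        elif k <= 10:
            kappas.append(kappas[k - 1] / 2)  # halve at every iteration
        else:
            kappas.append(kappas[10])  # frozen after the tenth iteration
    return kappas
```

For example, with `lam_max_first = 2.0` the sequence starts 1, 0.2, 0.1, 0.05 and stays at $0.2 / 2^{9}$ from the tenth iteration onward.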

Denoting by $w (t_{1 i}, t_{2 i}) \geq 0$ a weighting coefficient, we define the weighted atomic norm of $P$ as ${‖ P ‖}_{\mathcal{A}^{w}} = \inf_{\begin{matrix} d (t_{1 i}, t_{2 i}) ψ_{i} \in \mathcal{A} \\ s_{i} \in R^{+} \end{matrix}} \{ \sum_{i} \frac{s_{i}}{w (t_{1 i}, t_{2 i})} \mid P = \sum_{i} s_{i} d (t_{1 i}, t_{2 i}) ψ_{i} \}$ (17)

Obviously, when $w (t_{1 i}, t_{2 i}) = 1$, ${‖ P ‖}_{\mathcal{A}^{w}} = {‖ P ‖}_{\mathcal{A}}$. Let ${‖ P ‖}_{\mathcal{A}^{w^{k}}}$ and $w^{k} (t_{1}, t_{2})$ be the weighted atomic norm and the weighting coefficient of the $k$-th iteration. If $T_{b} ({\hat{u}}^{k + 1})$ admits a Vandermonde decomposition and $w^{k} (t_{1}, t_{2}) = \sqrt{\frac{A B}{d {(t_{1}, t_{2})}^{H} W^{k} d (t_{1}, t_{2})}}$ (18) then ${‖ P ‖}_{\mathcal{A}^{w^{k}}} = \min_{u \in C^{N_{u}}, E \in C^{L \times L}} \frac{1}{2 \sqrt{A B}} (\operatorname{tr} (W^{k} T_{b} (u)) + \operatorname{tr} (E)) \quad \text{subject to} \quad \begin{bmatrix} T_{b} (u) & P \\ P^{H} & E \end{bmatrix} \geq 0$ (19)

The proposition is proved in the Appendix. According to the proposition, equation (15) can be rewritten as $\begin{array}{l} {\hat{P}}^{k + 1} = \underset{P \in C^{A B \times L}}{\arg \min} \left( \min_{u \in C^{N_{u}}, E \in C^{L \times L}} \frac{1}{2 \sqrt{A B}} (\operatorname{tr} (W^{k} T_{b} (u)) + \operatorname{tr} (E)) \;\; \text{subject to} \;\; \begin{bmatrix} T_{b} (u) & P \\ P^{H} & E \end{bmatrix} \geq 0 \right) \;\; \text{subject to} \;\; {‖ P^{★} - P ‖}_{F} \leq ε \\ \quad = \underset{P \in C^{A B \times L}}{\arg \min} {‖ P ‖}_{\mathcal{A}^{w^{k}}} \;\; \text{subject to} \;\; {‖ P^{★} - P ‖}_{F} \leq ε \end{array}$ (20)

In summary, $P$ can be reconstructed by iteratively minimizing its weighted atomic norm and updating the weighting coefficient accordingly in each iteration. Hence, this proposed algorithm can be named IRANM. Like ANM, IRANM is also applicable to the NURMA. This is realized by changing $P^{★} - P$ in equations (12), (13), (15), and (20) to $P_{Ω}^{★} - P_{Ω}$ .
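The weighting coefficient in equation (18) can be evaluated directly once a weighting matrix is available. The sketch below uses nested lists for $W^{k}$ and an illustrative function name; it assumes $W^{k}$ is Hermitian positive definite, so that the quadratic form is real and positive.

```python
import cmath
import math


def weighting_coefficient(t1, t2, W, A, B):
    """w(t1, t2) = sqrt(A*B / (d^H W d)), with W an (A*B) x (A*B) nested list."""
    d = [cmath.exp(2j * math.pi * (t1 * a + t2 * b))
         for a in range(A) for b in range(B)]
    quad = sum(d[m].conjugate() * W[m][n] * d[n]
               for m in range(A * B) for n in range(A * B))
    # For Hermitian positive definite W the quadratic form is real.
    return math.sqrt(A * B / quad.real)
```

With $W^{0}$ equal to the identity matrix, $d^{H} W^{0} d = A B$ and the coefficient is 1 for every direction, which matches the statement that the first iteration reduces to plain ANM.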

A simulated illustrative example

We reconstruct the source DOA distribution assumed in Figure 2(a) by IRANM + MaPP. Here, MaPP estimates the number of sources as the number of eigenvalues of $T_{b} ({\hat{u}}^{k})$ that are greater than 10⁻⁶. Figure 4 presents the results when the standard URMA is utilized. Figure 4(a) shows that the eigenvalues of $T_{b} ({\hat{u}}^{1})$ decrease slowly, whereas $T_{b} ({\hat{u}}^{2})$ and $T_{b} ({\hat{u}}^{3})$ have only 6 relatively large eigenvalues and their other eigenvalues are as low as 10⁻⁸. That agrees with the third property of $M^{κ} (P)$ in Section 3.1. Figure 4(b) shows that the reconstructed source DOA distribution deviates seriously from the true one after one iteration. This can be attributed to two facts. On one hand, as demonstrated by Figure 2(a), the $T_{b} (\hat{u})$ obtained by ANM (the first iteration of IRANM) does not contain the correct DOA information. On the other hand, as shown by Figure 4(a), $T_{b} ({\hat{u}}^{1})$ has 55 eigenvalues greater than 10⁻⁶, which means that the estimated number of sources is far greater than the true one. Figure 4(c) and (d) show that the source DOA distribution has been accurately reconstructed after two iterations. This is because, as shown by Figure 4(a), both $T_{b} ({\hat{u}}^{2})$ and $T_{b} ({\hat{u}}^{3})$ are low-rank and admit a unique Vandermonde decomposition, and MaPP accurately estimates the number of sources. Figure 4(e)–(g) present the weighting coefficients used in the first three iterations. The weighting coefficients utilized in the present iteration are obtained from the $T_{b} (\hat{u})$ and $κ$ determined in the previous iteration, and give preference to the atoms around the sources identified in the previous iteration. Figure 5 presents the results when the NURMA shown in Figure 3(a) is utilized, which exhibits phenomena similar to those in Figure 4. These phenomena indicate that the IRANM-based 2D multiple-snapshot GFCB can remedy the drawback of the ANM-based one.

Figure 4.

Results of IRANM-based 2D multiple-snapshot GFCB when a standard URMA with 64 microphones is utilized. (a) Eigenvalues of $T_{b} ({\hat{u}}^{k})$. Reconstructions (*) for sources (○) after (b) one iteration, (c) two iterations and (d) three iterations. Weighting coefficients utilized in (e) the first iteration, (f) the second iteration and (g) the third iteration.

Figure 5.

Results of IRANM-based 2D multiple-snapshot GFCB when a NURMA with 30 randomly selected microphones is utilized. (a) Eigenvalues of $T_{b} ({\hat{u}}^{k})$ . Reconstructions (*) for sources (○) after (b) one iteration, (c) two iterations and (d) three iterations. Weighting coefficients utilized in (e) the first iteration, (f) the second iteration and (g) the third iteration.

Performance comparisons

Simulations

We compare the reconstruction results of the DOA distribution based on ANM + MaPP and IRANM + MaPP for the sources assumed in Figure 2(a). In the former, MaPP estimates the number of sources as the number of eigenvalues of $T_{b} (\hat{u})$ within a 20 dB dynamic range. In the latter, IRANM stops iterating when the relative change of $\hat{P}$ between two consecutive iterations (i.e., ${‖ {\hat{P}}^{k} - {\hat{P}}^{k - 1} ‖}_{F} / {‖ {\hat{P}}^{k - 1} ‖}_{F}$) is not greater than 10⁻³ or the number of iterations reaches the preset maximum value (set to 20), and MaPP estimates the number of sources as the number of eigenvalues of $T_{b} (\hat{u})$ that are greater than 10⁻⁶. Figure 6(a) and (b) present the results. The frequency is then changed from 2000 Hz to 3000 Hz, 4000 Hz and 4900 Hz. Correspondingly, the minimum separation among these sources $(Δ_{\min})$ changes from 0.025 to 0.038, 0.050 and 0.062. Figure 6(c)–(h) present the results. It is worth mentioning that 4900 Hz is the upper limit frequency of the current array according to $f_{\max} = 0.5 c / \max \{ Δ x, Δ y \}$ with $c$ being the speed of sound.^20–22 Affected by its failure to identify sources with a small separation, the ANM-based 2D multiple-snapshot GFCB (Figure 6(a), (c), (e) and (g)) reconstructs the source distribution accurately only at 4900 Hz. In contrast, the IRANM-based 2D multiple-snapshot GFCB (Figure 6(b), (d), (f) and (h)) reconstructs the source distribution accurately at all four frequencies, showing enhanced resolution. To obtain Figure 6, the URMA with 64 microphones is utilized. Figure 7 presents the results when using the NURMA with 30 randomly selected microphones, again exhibiting the advantage of the IRANM-based 2D multiple-snapshot GFCB.
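The stopping rule described above can be sketched as a small control loop. Here `step` is a hypothetical callback standing in for one solve of equation (15); it is not part of any library, and the SDP solve itself is outside the scope of this sketch.

```python
import math


def relative_change(P_new, P_old):
    """||P_new - P_old||_F / ||P_old||_F for nested lists of complex numbers."""
    num = math.sqrt(sum(abs(a - b) ** 2
                        for r_new, r_old in zip(P_new, P_old)
                        for a, b in zip(r_new, r_old)))
    den = math.sqrt(sum(abs(b) ** 2 for r_old in P_old for b in r_old))
    return num / den


def run_until_converged(step, P_init, tol=1e-3, max_iter=20):
    """Call step(P_prev, k) until the relative change is <= tol or max_iter is hit.

    Returns (P_hat, iterations_used). The check starts at k = 2 because it
    compares two consecutive iterates.
    """
    P_prev = P_init
    for k in range(1, max_iter + 1):
        P_next = step(P_prev, k)
        if k > 1 and relative_change(P_next, P_prev) <= tol:
            return P_next, k
        P_prev = P_next
    return P_prev, max_iter
```

The defaults `tol=1e-3` and `max_iter=20` mirror the values quoted in the text.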

Figure 6.

Reconstructions (*) of (a), (c), (e), (g) ANM- and (b), (d), (f), (h) IRANM-based 2D multiple-snapshot GFCB for sources (○) at (a), (b) 2000 Hz, (c), (d) 3000 Hz, (e), (f) 4000 Hz and (g), (h) 4900 Hz. The URMA with 64 microphones is utilized.

Figure 7.

Following Ref. [23], we use ${‖ \hat{P} - P ‖}_{F} / {‖ P ‖}_{F}$, ${‖ [\hat{θ}, \hat{ϕ}] - [θ, ϕ] ‖}_{F} / (2 I)$ and ${‖ {\hat{s}}_{rms} - s_{rms} ‖}_{2} / {‖ s_{rms} ‖}_{2}$ to measure the pressure reconstruction accuracy, the DOA estimation accuracy and the strength quantification accuracy, respectively. Here, $θ \in R^{I}$, $ϕ \in R^{I}$ and $s_{rms} \in R^{I}$ are the true elevation angle vector, the true azimuth angle vector and the true root mean square strength vector, and $\hat{θ} \in R^{I}$, $\hat{ϕ} \in R^{I}$ and ${\hat{s}}_{rms} \in R^{I}$ are the estimated ones. Figure 8 shows the curves of these errors vs. $Δ_{\min}$ for the ANM- and the IRANM-based 2D multiple-snapshot GFCB. For each $Δ_{\min}$, these errors are averaged over 50 Monte Carlo trials. In each trial, we randomly generate two sources with a separation of $Δ_{\min}$ and a strength difference of less than 10 dB. The SNR is set to 20 dB. The URMA with 64 microphones is utilized to obtain Figure 8(a), (c) and (e). The NURMA with 30 randomly selected microphones is utilized to obtain Figure 8(b), (d) and (f). For the reconstruction of $P$, Figure 8(a) and (b) show that the error of ANM under $Δ_{\min} < 0.8 / \sqrt{A B}$ is distinctly higher than that under $Δ_{\min} \geq 0.8 / \sqrt{A B}$, whereas the error of IRANM varies only slightly across all $Δ_{\min}$ and is distinctly lower than that of ANM. For DOA estimation, Figure 8(c) and (d) show that, on one hand, the IRANM-based 2D multiple-snapshot GFCB has very low errors across all $Δ_{\min}$; on the other hand, compared to it, the ANM-based 2D multiple-snapshot GFCB has distinctly higher errors under $Δ_{\min} < 0.5 / \sqrt{A B}$, slightly higher errors under $0.5 / \sqrt{A B} \leq Δ_{\min} < 0.8 / \sqrt{A B}$ and almost the same errors under $Δ_{\min} \geq 0.8 / \sqrt{A B}$.

For strength quantification, Figure 8(e) and (f) show that the error of the ANM-based 2D multiple-snapshot GFCB is always higher than that of the IRANM-based 2D multiple-snapshot GFCB, especially under small $Δ_{\min}$. These phenomena demonstrate that, compared to the ANM-based method, the proposed IRANM-based one performs better in denoising and spatial resolution. Therefore, even for sources with a small separation, the proposed method can accurately reconstruct the microphone pressure induced by sources, estimate the DOAs, and quantify the strengths.

Figure 8.

Error curves. (a), (b) ${‖ \hat{P} - P ‖}_{F} / {‖ P ‖}_{F}$, (c), (d) ${‖ [\hat{θ}, \hat{ϕ}] - [θ, ϕ] ‖}_{F} / 2 I$ and (e), (f) ${‖ {\hat{s}}_{rms} - s_{rms} ‖}_{2} / {‖ s_{rms} ‖}_{2}$ vs. $Δ_{\min}$. (a), (c), (e) The URMA with 64 microphones and (b), (d), (f) the NURMA with 30 randomly selected microphones are utilized.

Experiments

As shown in Figure 9, to verify the simulation conclusions, we measure three small loudspeakers in a semi-anechoic chamber using a URMA with 64 Brüel&Kjær Type 4958 microphones. The array parameters are $A = B = 8$ and $Δ x = Δ y = 0.035$ m. The three loudspeakers are located at Cartesian coordinates (2.24, 0, 5) m, (−1.24, 0, 5) m and (−2.24, 0, 5) m from left to right. Correspondingly, their DOAs are (24.13°, 0°), (13.93°, 180°), and (24.13°, 180°). In addition, the Cartesian coordinates of the three mirror sources caused by ground reflection are (2.24, −2.2, 5) m, (−1.24, −2.2, 5) m, and (−2.24, −2.2, 5) m. Correspondingly, their DOAs are (32.13°, 315.52°), (26.80°, 240.59°), and (32.13°, 224.48°). All microphone pressure signals are acquired synchronously by a Brüel&Kjær PULSE Type 3560D Data Acquisition System and then transferred to Brüel&Kjær BKConnect Software to obtain Fourier spectra. Finally, these Fourier spectra are processed by ANM + MaPP and IRANM + MaPP to reconstruct the source DOA distribution. As in Section 4.1, MaPP uses the same method to estimate the number of sources and IRANM uses the same stopping condition for its iterations.
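The quoted DOAs follow from the Cartesian coordinates by a spherical-coordinate conversion. The sketch below illustrates this under an assumed convention that is consistent with the numbers above but not restated in this section: the array sits at the origin, the elevation angle is measured from the +z axis, and the azimuth angle is measured from the +x axis toward the +y axis and wrapped into [0°, 360°). The function name `cartesian_to_doa` is hypothetical.

```python
import math


def cartesian_to_doa(x, y, z):
    """Return (elevation, azimuth) in degrees for a source at (x, y, z).

    Assumed convention: elevation from the +z axis, azimuth from the
    +x axis toward +y, wrapped into [0, 360).
    """
    elevation = math.degrees(math.atan2(math.hypot(x, y), z))
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    return elevation, azimuth


# Loudspeakers followed by their ground-reflection mirror sources (metres).
SOURCES = [
    (2.24, 0.0, 5.0),
    (-1.24, 0.0, 5.0),
    (-2.24, 0.0, 5.0),
    (2.24, -2.2, 5.0),
    (-1.24, -2.2, 5.0),
    (-2.24, -2.2, 5.0),
]

for position in SOURCES:
    elevation, azimuth = cartesian_to_doa(*position)
    print(f"{position}: ({elevation:.2f} deg, {azimuth:.2f} deg)")
```

Under this convention the computed angles agree with the DOAs listed in the text to two decimal places.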

Figure 9.

Experimental layout.

Figure 10 presents the results at 2000 Hz, 3000 Hz, 4000 Hz and 4900 Hz when all 64 microphones are utilized. At these four frequencies, the minimum separation among the sources is $0.032 (0.256 / \sqrt{A B})$, $0.048 (0.384 / \sqrt{A B})$, $0.065 (0.52 / \sqrt{A B})$, and $0.079 (0.63 / \sqrt{A B})$, respectively. As shown by Figure 10(a), (c), (e), and (g), ANM + MaPP reconstructs the source DOA distribution accurately only at 4000 Hz and 4900 Hz. In contrast, as shown by Figure 10(b), (d), (f), and (h), IRANM + MaPP reconstructs the source DOA distribution accurately at all four frequencies. Figure 11 presents the results when only 30 randomly selected microphones are utilized, which exhibit the same behavior as in Figure 10. These results demonstrate that the IRANM-based 2D multiple-snapshot GFCB offers higher resolution than the ANM-based one. The experimental conclusion is consistent with the simulation one, which supports its validity.
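The minimum separations quoted above can be reproduced approximately from the source coordinates. The sketch below rests on two assumptions that are not stated in this section: each source is mapped to the point $(\sin θ \cos ϕ \, Δ x / λ, \sin θ \sin ϕ \, Δ y / λ)$ and the separation between two sources is the larger of the two coordinate differences, and the speed of sound is 343 m/s. Wrap-around of the coordinates is ignored because all differences here are well below 0.5. All names are hypothetical.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value
DX = DY = 0.035         # microphone spacing in metres, from the text
A = B = 8               # microphones per row and column, from the text

# Loudspeakers and their mirror sources (metres), from the text.
POSITIONS = [
    (2.24, 0.0, 5.0), (-1.24, 0.0, 5.0), (-2.24, 0.0, 5.0),
    (2.24, -2.2, 5.0), (-1.24, -2.2, 5.0), (-2.24, -2.2, 5.0),
]


def normalized_coords(position, freq_hz):
    """Map a source to (sin(theta)cos(phi)dx/lambda, sin(theta)sin(phi)dy/lambda)."""
    x, y, z = position
    r = math.sqrt(x * x + y * y + z * z)
    wavelength = SPEED_OF_SOUND / freq_hz
    # x / r equals sin(theta) * cos(phi); y / r equals sin(theta) * sin(phi).
    return (x / r * DX / wavelength, y / r * DY / wavelength)


def min_separation(freq_hz):
    """Smallest pairwise separation (assumed: max of coordinate differences)."""
    points = [normalized_coords(p, freq_hz) for p in POSITIONS]
    return min(
        max(abs(p[0] - q[0]), abs(p[1] - q[1]))
        for p, q in itertools.combinations(points, 2)
    )


for freq in (2000, 3000, 4000, 4900):
    sep = min_separation(freq)
    print(f"{freq} Hz: {sep:.3f} = {sep * math.sqrt(A * B):.3f}/sqrt(AB)")
```

Under these assumptions the closest pair is formed by the two left-hand mirror sources, and the computed separations agree with the quoted values to within about 0.001. Multiplying by $\sqrt{A B} = 8$ gives the normalized form shown in parentheses in the text.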

Figure 10.

Reconstructions (*) of (a), (c), (e), (g) ANM and (b), (d), (f), (h) IRANM-based 2D multiple-snapshot GFCB for loudspeaker sources and their mirror sources (○) at (a), (b) 2000 Hz, (c), (d) 3000 Hz, (e), (f) 4000 Hz and (g), (h) 4900 Hz. The URMA with 64 microphones is utilized. The symbols ○ indicate the true DOAs only and carry no strength information; the same applies to Figure 11.

Figure 11.

Reconstructions (*) of (a), (c), (e), (g) ANM and (b), (d), (f), (h) IRANM-based 2D multiple-snapshot GFCB for loudspeaker sources and their mirror sources (○) at (a), (b) 2000 Hz, (c), (d) 3000 Hz, (e), (f) 4000 Hz and (g), (h) 4900 Hz. The NURMA with 30 randomly selected microphones is utilized.

Conclusions

The existing ANM-based 2D multiple-snapshot GFCB fails to work well for sources with a small separation. After analyzing the cause of this drawback, we develop an IRANM-based 2D multiple-snapshot GFCB to remedy it. Simulations and experiments lead to the following conclusions. First, the developed IRANM-based 2D multiple-snapshot GFCB can accurately estimate the DOAs and quantify the strengths of closely spaced sources, remedying the drawback of the ANM-based one and enhancing the resolution. Second, the IRANM-based 2D multiple-snapshot GFCB has stronger denoising ability and higher identification accuracy than the ANM-based one. Finally, these conclusions hold not only for a standard URMA, but also for a NURMA consisting of a small number of randomly selected microphones.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202003206), the Natural Science Foundation of Chongqing (Grant No. cstc2019jcyj-msxmX0399) and the National Natural Science Foundation of China (Grant No. 11874096).

ORCID iD

Zhigang Chu

References

1. Elad. Sparse and redundant representations: from theory to applications in signal and image processing. New York, NY: Springer, 2010.

2. Foucart, Rauhut. A mathematical introduction to compressive sensing. New York, NY: Springer, 2013.

3. Boche, Calderbank, Kutyniok, et al. Compressed sensing and its applications. New York, NY: Springer, 2015.

4. Xenaki, Gerstoft, Mosegaard. Compressive beamforming. J Acoust Soc Am 2014; 136(1): 260–271.

5. Gerstoft, Xenaki, Mecklenbräuker. Multiple and single snapshot compressive beamforming. J Acoust Soc Am 2015; 138(4): 2003–2014.

6. Ning, Wei, Qiu. Three-dimensional acoustic imaging with planar microphone arrays and compressive sensing. J Sound Vib 2016; 380: 112–128.

7. Mecklenbräuker, Gerstoft, Zöchmann. c-LASSO and its dual for sparse signal estimation from array data. Signal Process 2017; 130: 204–216.

8. Ning, Pan, Zhang. A highly efficient compressed sensing algorithm for acoustic imaging in low signal-to-noise ratio environments. Mech Syst Signal Process 2018; 112: 113–128.

9. Gerstoft, Mecklenbräuker, Seong. Introduction to special issue on compressive sensing in acoustics. J Acoust Soc Am 2018; 143(6): 3731–3736.

10. Chi, Scharf, Pezeshki. Sensitivity to basis mismatch in compressed sensing. IEEE Trans Signal Process 2011; 59(5): 2182–2195.

11. Yang, Xie. Enhancing sparsity and resolution via reweighted atomic norm minimization. IEEE Trans Signal Process 2016; 64(4): 995–1006.

12. Yang, Stoica. Sparse methods for direction-of-arrival estimation. arXiv preprint arXiv:1609.09596 [cs.IT], 2017, 1–65.

13. Malioutov, Çetin, Willsky. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans Signal Process 2005; 53(8): 3010–3022.

14. Duarte, Baraniuk. Spectral compressive sensing. Appl Comput Harmon Anal 2013; 35(1): 111–129.

15. Candès, Fernandez-Granda. Towards a mathematical theory of super-resolution. Comm Pure Appl Math 2014; 67(6): 906–956.

16. Candès, Fernandez-Granda. Super-resolution from noisy data. J Fourier Anal Appl 2013; 19(6): 1229–1254.

17. Tang, Bhaskar, Shah, et al. Compressed sensing off the grid. IEEE Trans Inf Theor 2013; 59(11): 7465–7490.

18. Xenaki, Gerstoft. Grid-free compressive beamforming. J Acoust Soc Am 2015; 137(4): 1923–1935.

19. Park, Choo, Seong. Multiple snapshot grid free compressive beamforming. J Acoust Soc Am 2018; 143(6): 3849–3859.

20. Yang, Chu, et al. Two-dimensional grid-free compressive beamforming. J Acoust Soc Am 2017; 142(2): 618–629.

21. Yang, Chu, Ping, et al. Resolution enhancement of two-dimensional grid-free compressive beamforming. J Acoust Soc Am 2018; 143(6): 3860–3872.

22. Yang, Chu, Ping. Alternating direction method of multipliers for weighted atomic norm minimization in two-dimensional grid-free compressive beamforming. J Acoust Soc Am 2018; 144(5): EL361–EL366.

23. Yang, Chu, Ping. Two-dimensional multiple-snapshot grid-free compressive beamforming. Mech Syst Signal Process 2019; 124: 524–540.

24. Yang, Xie, Stoica. Vandermonde decomposition of multilevel Toeplitz matrices with application to multidimensional super-resolution. IEEE Trans Inf Theor 2016; 62(6): 3685–3701.

25. Chi, Chen. Compressive two-dimensional harmonic retrieval via atomic norm minimization. IEEE Trans Signal Process 2015; 63(4): 1030–1042.

26. Boyd, Vandenberghe. Convex optimization. Cambridge, UK: Cambridge University Press, 2004.

27. Grant, Boyd. CVX: MATLAB software for disciplined convex programming, version 2.1, http://cvxr.com/cvx (2019, accessed 26 April 2019).

28. Hunter, Lange. A tutorial on MM algorithms. Am Stat 2004; 58(1): 30–37.

29. Sun, Babu, Palomar. Majorization-Minimization algorithms in signal processing, communications, and machine learning. IEEE Trans Signal Process 2017; 65(3): 794–816.

Iterative reweighted atomic norm minimization based two-dimensional multiple-snapshot grid-free compressive beamforming with planar microphone array
