Introduction
With the enormous development of the Internet of Things (IoT), machine-type communication (MTC) devices are expected to grow exponentially and to demand a significant aggregated transmission rate in the next decade. Meanwhile, their diverse applications in areas including autonomous driving, environmental monitoring, and wireless video surveillance generate varying requirements in terms of delay, reliability, bit rate, and energy consumption.1 Taking platooning, a representative application of autonomous driving, as an example: vehicles in one platoon drive on the same path and maintain a fixed small distance between each other. To ensure the safety of the vehicles, the autonomous control system demands a highly reliable, low-latency data transmission link; information exchange and processing should take no more than 100 ms.2 In a wireless video surveillance system, the main concerns are channel resources and power rather than delay. Video transmission generates a great amount of uplink data traffic that consumes substantial channel resources, and when wireless video sensors are battery-powered, minimizing the energy consumed by data transmission and encoding computation is essential for prolonging sensor lifetime.3
A typical MTC architecture consists of three domains: the device domain, the network domain, and the application domain. A cellular network plays the role of the network domain, and MTC devices exchange data with MTC servers or other MTC devices through the base station4 or managed device-to-device links. Current research focuses on access control rather than effective data transmission for massive MTC devices, because the amount of data from each device is considered small. However, some devices, such as sensors on autonomous cars and monitoring cameras in a smart city, require higher transmission rates. As these data-intensive aggregation applications emerge and the number of MTC devices grows, the aggregated data transmission rate becomes enormous; overall, traffic from MTC devices will occupy the majority of future data traffic.1 Serving a large number of diverse MTC devices and satisfying their significant aggregated data transmission will challenge future mobile networks.
To achieve a higher data transmission rate with limited bandwidth, one traditional method is to improve the spectrum efficiency. However, state-of-the-art modulation and coding techniques have pushed current communication systems close to the Shannon limit, and further gains in spectrum efficiency are increasingly hard to obtain. Another way to relieve the pressure on the communication system is to compress the data to be transmitted using source coding techniques. For a single data source such as text, audio, or video, many coding schemes like Huffman coding, JPEG, and H.264 have been investigated and are well applied in practical communication systems. By contrast, jointly compressing the data from more than one source is immature and not widely adopted, but it is a promising technique for fifth-generation (5G) wireless systems to support the data transmission of massive MTC devices. Distributed source coding (DSC) is a representative multi-source coding technique that has been widely investigated in wireless sensor networks (WSNs).5 It can be used to compress the output of correlated MTC devices that do not communicate with each other. However, deploying DSC introduces encoding and decoding delays, which should be considered in MTC uplink transmission.
DSC was first introduced by Slepian and Wolf,6 who proved that separate encoders with a joint decoder can achieve the same coding rate for lossless compression of two dependent sources as a joint encoder and decoder. They also presented the achievable rate region of lossless DSC, known as the Slepian–Wolf region. Their work was extended to information-theoretic bounds for lossy compression by Wyner and Ziv.7 Based on these theorems, many practical coding schemes that approach the theoretical bound have been proposed, most of them derived from channel codes such as turbo codes and low-density parity-check (LDPC) codes.8 Any arbitrary rate in the Slepian–Wolf rate region was later shown to be achievable based on non-uniform LDPC codes.9
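As a concrete, illustrative instance (not taken from the paper), consider two binary sources related through a binary symmetric channel with crossover probability q. The Slepian–Wolf region and its corner point can be computed directly from the entropies:

```python
import math

def h2(p):
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Illustrative joint source: Y = X xor E, with X ~ Bernoulli(0.5)
# and independent noise E ~ Bernoulli(q).
q = 0.1                      # correlation parameter (assumed for this sketch)
H_X = h2(0.5)                # = 1 bit
H_Y = h2(0.5)                # Y stays uniform, = 1 bit
H_Y_given_X = h2(q)          # conditional entropy H(Y|X)
H_XY = H_X + H_Y_given_X     # joint entropy H(X,Y)

# Slepian-Wolf region: R_X >= H(X|Y), R_Y >= H(Y|X), R_X + R_Y >= H(X,Y).
# One corner point encodes X at full rate and Y at its conditional rate.
corner = (H_X, H_Y_given_X)
print(f"H(X,Y) = {H_XY:.3f} bits; corner (R_X, R_Y) = ({corner[0]:.3f}, {corner[1]:.3f})")
```

The sum rate H(X,Y) is what a joint encoder would need; the point of the Slepian–Wolf theorem is that separate encoders reach the same total.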
The capability to eliminate the information redundancy of correlated sources without requiring them to communicate with each other earns DSC many applications in areas such as WSNs and video coding. In a densely deployed WSN, the observations of sensors are expected to be spatially correlated;10 for example, the measurements of adjacent temperature sensors are usually similar. Sensors use relay nodes to transmit data to the aggregator, and the relay nodes can compress the observations they aggregate with the help of DSC, thus reducing the energy consumption of communication. Exploiting the inherent correlation in sensor observations and deploying DSC can effectively increase the battery life of a WSN.11 In video coding, DSC is mainly used for inter-frame coding and inter-video coding. Inter-frame coding takes advantage of the correlation between video frames by encoding frames individually and decoding them jointly with DSC,8 while inter-video coding exploits the correlation of videos from cameras with overlapping fields of view.12
Although DSC is promising for WSNs, there is still a long way to go toward deploying DSC in a large-scale sensor network. In the past two decades, not only the benefits but also the drawbacks of applying DSC have been investigated. In terms of encoding and decoding schemes, most are based on channel coding. They have the advantage of achieving compression rates near the Slepian–Wolf bound and being resilient to channel errors, but these capacity-achieving channel codes require a data block as large as
The estimation of the correlation model in practical DSC systems is also a challenge. In practice, it is usually hard to obtain the correlation model, especially when the sensor network topology changes so fast that little time or information exists for training the joint probability mass or density function.5
Some works manage to estimate the correlation parameters at the decoder side, but they introduce other problems. For example, turbo codes were adopted for the compression of two correlated sources, with the decoder estimating the correlation through the iterative decoding process. The decoder manages to perform close to the Slepian–Wolf bound, but iterative decoding apparently brings higher computation complexity and delay.16
For the situation in which neither the encoder nor the decoder knows the correlation structure, a rate-adaptive scheme with feedback was proposed:17 the encoder first sends a short syndrome and the decoder attempts decoding; if decoding fails, the encoder is informed to extend the short syndrome with additional bits until decoding succeeds. This scheme was further improved in decoding complexity and delay by reducing feedback loops.18
However, the
Related works
Recognizing the merits and demerits of DSC, some researchers point out that the overall performance of a DSC-employed system should be considered. The balance between the benefit and the price of DSC has been investigated from different viewpoints. In 2004, the tradeoff between reliability and efficiency in DSC was considered by Marco and Neuhoff.19 Four schemes, including sequential Slepian–Wolf coding and clustered Slepian–Wolf coding, are analyzed. It is assumed that the transmission of packets has a failure probability; since the decoding of Slepian–Wolf coding depends on data transmitted from other nodes, a transmission failure at one node may cause the decoding failure of all nodes. The results show that splitting sensors into clusters increases system reliability at the price of decreased coding efficiency. Hong and colleagues consider sequential Slepian–Wolf coding in a random access network20 and analyze the throughput, delay, and energy efficiency of DSC in the proposed system. Their results show that increasing the throughput per node brings higher average delay and energy consumption.
The disadvantages of adopting DSC in a large cluster inspired the idea of splitting data sources into several small clusters. Many algorithms have been proposed to maximize the benefit of DSC by finding a better clustering metric in WSNs. In a study by Wang et al.,21 a distributed optimal compression clustering protocol (DOC2) is proposed to maximize the global compression gain. Sensors are divided into clusters based on their communication range and their contribution to the average entropy of messages inside one cluster. Inside one cluster, rate allocation is based on sequential Slepian–Wolf coding. The simulation results show that DOC2 achieves a promising overall compression ratio. Aside from clustering algorithms, optimization of the number of clusters is also discussed by many researchers. In addition to analyzing the number of clusters at a single network layer, some researchers optimize the number of clusters using a cross-layer approach that incorporates effects from different network layers.22
Our contributions
In this work, we consider a cellular network supporting massive MTC devices that upload their measurements to an MTC server, and we assume that these measurements are spatially correlated. This case is similar to a common scenario in WSNs where many wireless sensors transmit data to a sink. Both scenarios support many terminals and need to transmit as much information as possible with a limited resource: for a WSN, battery life; for a cellular network, bandwidth. Their uplink transmissions both fit a many-to-one communication model, and the feature of aggregating correlated data makes their uplinks a natural application of DSC. Nonetheless, differences exist. One big difference is that sensors in a WSN may transmit data with the assistance of relay nodes, whereas MTC devices in a cellular network mostly communicate directly with the base station, and this communication structure makes the clustering problem simpler. However, in certain scenarios some MTC devices require low latency, a challenge rarely considered in WSNs. To study the latency caused by DSC, we build a simple model of the average decoding delay in sequential Slepian–Wolf coding and try to balance the benefit of DSC against the induced decoding delay by dividing MTC devices into independent clusters. Part of the analyses and simulations was presented in our previous work.23
Our contribution is twofold. First, we propose a scheme that adopts clustered DSC in the uplink transmission of MTC to reduce the aggregated uplink traffic by compressing the data generated by massive MTC devices. The data sources are assumed to be spatially correlated and are formalized as a multivariate Gaussian distribution in this article. We analyze an LDPC-based DSC scheme and prove that the decoding delay grows linearly with the number of data sources under certain assumptions. To support massive delay-sensitive MTC devices, we propose dividing MTC devices into several discrete clusters to reduce the average delay.
Second, to balance the compression ratio and decoding delay of DSC, we adopt three clustering algorithms based on grid dividing, K-medoids, and the Weighted Pair Group Method with Arithmetic Mean (WPGMA). They are evaluated using compression ratio, decoding delay, and a third evaluation indicator that combines both. Through extensive simulations, we find that the cluster number can balance the compression ratio and decoding delay of DSC, and that an optimal cluster number exists that maximizes the evaluation indicator. Our simulations show that K-medoids performs better than WPGMA in average decoding delay but worse in compression ratio.
DSC
In this section, to facilitate understanding of the decoding delay and complexity of DSC, we briefly introduce the theory of DSC and then focus on the analysis of the delay in the decoding process.
Consider the case of two random sources
This easily generalizes to the
where

Figure 1. Achievable rate region of Slepian–Wolf coding.
Since
The Slepian–Wolf coding can be extended to Rate-Distortion theory with side information when we allow a certain degree of distortion of decoded data. We denote reconstructed
DISCUS inspired a lot of work with the idea of syndromes.25
In a syndrome-based DSC algorithm, the encoder observes
Using LDPC-based DSC for multiple data sources, we can approach the corner point of the region of achievable rates,27,28 as shown in equation (3)
where
Because the decoding needs to store the probability density function (PDF) of all the data sources, if we cannot build a model of the PDF with limited parameters,
where
If we simplify the sequential dependency in equation (3) with symmetric CCRE, the property of symmetric CCRE ensures that there is no loop in the dependency structure. The maximum storage required to decode each message is constrained by
The chain dependency prevents the decoder from decoding sources in parallel. The decoding time of each data source can be regarded as a constant, since we can adjust the number of iterations to make the decoding complexity of different data sources the same, so the decoding delay of any
That means the maximum decoding delay will increase linearly with the number of data sources in a cluster.
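The chain-decoding argument above can be sketched numerically. The per-source decode time t_dec and the assumption that independent clusters are decoded concurrently are ours, for illustration:

```python
import math

def sequential_decoding_delays(n_sources, t_dec=1.0):
    """Delay at which each source is decoded under chain (sequential)
    Slepian-Wolf decoding: source i must wait for sources 1..i-1,
    each taking a constant time t_dec."""
    return [i * t_dec for i in range(1, n_sources + 1)]

def max_delay(n_sources, n_clusters, t_dec=1.0):
    """If sources are split into equal, independent clusters that are
    decoded concurrently, the worst-case delay drops to the delay of
    the largest cluster."""
    return math.ceil(n_sources / n_clusters) * t_dec

delays = sequential_decoding_delays(100)
print(max(delays))            # worst-case delay grows linearly with cluster size
print(max_delay(100, 5))      # clustering cuts the worst case by the cluster count
```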
System models and problem formulation
In this section, we build a network model with spatially correlated sources and pose the problem of balancing the increased decoding delay against the reduced transmission rate by splitting MTC devices into independent clusters. We consider only the decoding delay because the decoding time and storage requirements both grow in the same pattern as the decoding delay, according to the analyses in the previous section.
Network model
In a typical data aggregation MTC application, MTC devices connect to base stations based on maximum signal-to-noise ratio. When we only consider adopting DSC among these MTC devices, each cell can be considered a discrete unit of the network, so we can use a single-base-station model to describe each cell. In this model, one cell covers a 1 km × 1 km area, and

Figure 2. Area separated into grids; inside one grid, sequential Slepian–Wolf coding is employed.
Data correlation model
In this article, we assume that the data sources of MTC devices are spatially correlated. Let
The spatially correlated data sources are modeled with a Gaussian random field. We denote the observation vector as
Without loss of generality, we assume that mean
Different covariance models can be adopted to
where
In a practical system, sampling and discretization of the data sources will cause a certain amount of distortion, and a practical Slepian–Wolf decoder cannot reconstruct the quantized sources without loss. According to rate-distortion theory,24 when we use mean-squared error as the distortion measure and the maximum allowed distortion of each data source is
where
where
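The correlation model of this subsection can be sketched as follows. The paper models observations as a zero-mean multivariate Gaussian whose covariance depends on inter-device distance; the specific covariance function is not preserved in this excerpt, so the exponential model below, with illustrative parameters `sigma2` and `theta`, is only one common choice:

```python
import math
import random

def exp_covariance(points, sigma2=1.0, theta=0.2):
    """Covariance matrix of a spatially correlated Gaussian field under
    an exponential model K(d) = sigma2 * exp(-d / theta), where d is the
    Euclidean distance between devices (theta is an assumed correlation
    range, not a value from the paper)."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(points[i], points[j])
            K[i][j] = sigma2 * math.exp(-d / theta)
    return K

random.seed(0)
# Unit-square stand-in for device positions in the 1 km x 1 km cell.
pts = [(random.random(), random.random()) for _ in range(4)]
K = exp_covariance(pts)
# Diagonal entries equal the marginal variance; off-diagonal entries
# decay with the distance between the two devices.
print(K[0][0], K[0][1])
```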
Delay model
As we analyzed in equation (5), the delay introduced by sequential decoding is linear in the cluster size. In this work, we measure the delay of different clusterings with the average of the maximum delay in each cluster
where
Problem formulation
To evaluate the efficiency of DSC under our clustering strategy, we define the system gain as the reduction in the bit rate required to transmit all the data sources under an allowed distortion
where
Since the observations from devices are correlated to an extent that depends on the distance between devices, the more users a cluster includes, the more side information we can utilize to increase the compression ratio. Figure 3 shows the influence of cluster size on the compression ratio. DSC is adopted for all nodes in one fixed area, and we can see that under different data correlation factors,

Figure 3. Compression ratio of a single cluster with different sizes.
Unlike the compression ratio, the average delay increases linearly with the average cluster size according to equation (11). In our network model, we divide all devices into clusters; the number of clusters affects the cluster size and eventually changes the compression ratio and average delay.
To find the best cluster dividing strategy, we define an evaluation indicator combining coding efficiency and average delay
here,
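Matching the "rate reduction/average delay" metric used in the simulations below, the evaluation indicator can be sketched as the following ratio; the numbers are illustrative only:

```python
def evaluation_indicator(rate_no_dsc, rate_dsc, avg_delay):
    """Ratio of transmission-rate reduction to average decoding delay;
    larger is better. A sketch of the combined metric, under the
    assumption that the indicator is this plain ratio."""
    return (rate_no_dsc - rate_dsc) / avg_delay

# Illustrative tradeoff: more clusters -> less compression gain, less delay.
print(evaluation_indicator(1000.0, 600.0, 20.0))  # few large clusters
print(evaluation_indicator(1000.0, 700.0, 10.0))  # many small clusters
```

With these toy numbers the second configuration wins, mirroring the paper's finding that an intermediate cluster number maximizes the indicator.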
User clustering algorithms
In this section, we discuss user clustering algorithms to obtain an optimized solution to the problem formulated above; they divide users into clusters according to spatial locations or the correlation intensity between users.
Grid dividing clustering
First, considering that the devices collect spatially correlated data, one intuitive idea is to divide the area into grids of the same size, with each grid forming a cluster. This idea is demonstrated in Figure 2: each side of the area is evenly divided into five segments, so the area is split into 25 grids. Inside a grid, the messages of MTC devices are sequentially encoded. When the devices are uniformly distributed in the area, each cluster will theoretically include almost the same number of devices.
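The grid dividing scheme can be sketched as follows (a minimal illustration; `g = 5` mirrors the 25-grid example above, and positions are in normalized units):

```python
def grid_clusters(devices, side=1.0, g=5):
    """Split a side x side area into g*g equal grids; devices falling
    in the same grid form one cluster."""
    clusters = {}
    for dev in devices:
        x, y = dev
        # Clamp so points exactly on the far border land in the last grid.
        gx = min(int(x / side * g), g - 1)
        gy = min(int(y / side * g), g - 1)
        clusters.setdefault((gx, gy), []).append(dev)
    return clusters

# Two devices near one corner, two near the opposite corner.
devices = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (1.0, 1.0)]
c = grid_clusters(devices, side=1.0, g=5)
print(len(c))   # the four devices fall into two grids, hence two clusters
```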
WPGMA and K-medoids
Aside from the simple grid dividing scheme, we use two clustering algorithms from machine learning: a hierarchical clustering algorithm called WPGMA and a non-hierarchical clustering algorithm named K-medoids. In both WPGMA and K-medoids, we define the distance between two devices as the achievable rate when they are jointly encoded, which reflects the correlation intensity, and considering devices
WPGMA is a bottom-up hierarchical clustering method that generates clusters by merging small clusters.30 In WPGMA, at each iteration, the two nearest clusters, for example
Each node is a separate cluster at the beginning, and we repeat the above operations until we reach the target number of clusters.
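A minimal sketch of WPGMA merging down to a target number of clusters, operating on a generic pairwise distance matrix (in the paper the "distance" would be the joint-encoding rate; the toy matrix here is illustrative):

```python
def wpgma(dist, k):
    """WPGMA bottom-up clustering down to k clusters.
    dist: symmetric matrix of pairwise distances.
    On a merge, the distance from the new cluster to any other cluster
    is the plain average of the distances to its two parts."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}            # cluster id -> members
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > k:
        (a, b) = min(d, key=d.get)                   # closest pair of clusters
        merged = clusters.pop(a) + clusters.pop(b)
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (dac + dbc) / 2    # WPGMA update rule
        d = {key: v for key, v in d.items()
             if a not in key and b not in key}
        d.update(new_d)
        clusters[next_id] = merged
        next_id += 1
    return list(clusters.values())

# Toy example: two tightly correlated pairs, far apart from each other.
D = [[0, 1, 9, 9],
     [1, 0, 9, 9],
     [9, 9, 0, 1],
     [9, 9, 1, 0]]
print(wpgma(D, 2))   # the two natural pairs are recovered
```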
K-medoids and K-means are similar non-hierarchical clustering algorithms. In this article, we cannot obtain device locations from the distance defined above, so we choose K-medoids, in which a member of the cluster is used as the cluster center and the clustering process involves only the distances between nodes. We adopt the classic and powerful partitioning around medoids (PAM) algorithm31 to find a clustering plan. First, we randomly select
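A minimal sketch of PAM-style K-medoids operating purely on pairwise distances; the greedy swap loop is the textbook PAM step (not quoted from the paper), and the toy matrix is illustrative:

```python
import random

def pam(dist, k, seed=0):
    """K-medoids via PAM: start from k random medoids, then greedily
    swap a medoid with a non-medoid whenever the swap lowers the total
    distance of all points to their nearest medoid."""
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)

    def cost(meds):
        return sum(min(dist[i][m] for m in meds) for i in range(n))

    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:mi] + [h] + medoids[mi + 1:]
                if cost(trial) < cost(medoids):
                    medoids = trial
                    improved = True
    # Assign each device to its nearest medoid.
    clusters = {m: [] for m in medoids}
    for i in range(n):
        clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
    return clusters

# Same toy distance matrix as before: two tight pairs far apart.
D = [[0, 1, 9, 9],
     [1, 0, 9, 9],
     [9, 9, 0, 1],
     [9, 9, 1, 0]]
clusters = pam(D, 2)
print(sorted(sorted(v) for v in clusters.values()))   # [[0, 1], [2, 3]]
```

Because only the distance matrix is used, this works with the joint-encoding-rate "distance" defined above, with no need for device coordinates.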
Simulation results and discussions
In this section, we simulate the effects of correlation degree and clustering schemes on the gain of Slepian–Wolf coding and the delay it brings. We first evaluate the grid dividing clustering algorithm under different settings and then compare it with other two clustering algorithms.
All the numerical results are produced from the average of 30 repetitions unless otherwise specified. The parameter settings are listed in Table 1. We consider different settings of
Table 1. Simulation parameters.
Performance of grid dividing clustering
First of all, we evaluate the performance of clustered Slepian–Wolf coding against the number of grids under different correlation degrees and numbers of users.
In Figure 4, the compression ratio

Figure 4. Compression ratio with different numbers of grids.
Then, we analyze the delay of grid dividing clustering. In Figure 5, the results are as expected: the average delay

Figure 5. Average delay with different numbers of grids.
Finally, we combine efficiency and delay to evaluate the benefits and losses of dividing devices into different numbers of clusters. In Figure 6, the ratio of transmission rate reduction to average delay

Figure 6. Rate reduction/average delay with different numbers of grids.
Comparison of clustering algorithms
We compare the three clustering algorithms in this subsection. Figure 7 shows their compression ratios; K-medoids and WPGMA obtain similar coding efficiency, with WPGMA performing a little better. In Figure 8, we show the maximum delay over all clusters for the different clustering algorithms, and WPGMA performs worst. Figure 9 shows that with WPGMA, devices are not distributed across clusters as evenly as with the other two algorithms. According to our analysis of delay, when the clustering is unbalanced, clusters that contain more members suffer a higher delay. In a practical system, if we have no prior information about the delay requirements of devices, unbalanced clustering algorithms perform poorly in terms of user equality. Although grid dividing performs best in average delay in our simulation results, it is worth mentioning that grid dividing is not practical in real-world systems, where users are rarely uniformly distributed; in that case, the grid dividing algorithm will generate unevenly divided clusters, which causes poor delay performance.

Figure 7. Compression ratio comparison between the three clustering algorithms.

Figure 8. Delay comparison between the three clustering algorithms.

Figure 9. Boxplot of delay of clusters generated by the three clustering algorithms.
Conclusion and future work
Leveraging DSC to reduce the redundancy of correlated sources can relieve the fast-growing demand for communication resources from massive MTC devices. In this article, we analyze the storage consumption and decoding delay of clustered DSC that adopts sequential Slepian–Wolf coding inside each cluster. We find that the decoding delay grows linearly with the cluster size under certain assumptions. The balance between the benefit of a reduced transmission rate and the disadvantage of increased delay is then studied with three clustering algorithms: grid dividing, WPGMA, and K-medoids. Our simulation results show that when data from MTC devices are spatially correlated, dividing MTC devices into clusters successfully reduces the delay while achieving a proper compression ratio. K-medoids and WPGMA both outperform grid dividing in terms of compression ratio, and WPGMA performs a little better than K-medoids. As for decoding delay, grid dividing divides users more evenly into clusters and has the most balanced delay among clusters; the performance of K-medoids is close to grid dividing, while WPGMA performs worst.
Our current work is based on a simple single-base-station model in which the correlation among users is stationary. Considering the heterogeneity of future wireless networks and the mobility of users, we will study the clustering problem in a more complicated network model, and better clustering algorithms that adapt to time-varying correlation structures will be investigated.
