Introduction
With the enormous development of the Internet of Things (IoT), machine-type communication (MTC) devices are expected to grow exponentially and to demand a significant aggregated transmission rate in the next decade. Meanwhile, their diverse applications in areas including autonomous driving, environmental monitoring, and wireless video surveillance generate varying requirements in terms of delay, reliability, bit rate, and energy consumption.1 Taking platooning, a representative application of autonomous driving, as an example: vehicles in one platoon drive on the same path and maintain a fixed small distance between each other. To ensure the safety of the vehicles, the autonomous control system demands a highly reliable, low-latency data transmission link; information exchange and processing should take no more than 100 ms.2 In a wireless video surveillance system, the main concerns are channel resources and power rather than delay. Video transmission generates a great amount of uplink data traffic that consumes substantial channel resources, and when wireless video sensors are battery-powered, minimizing the energy consumed by data transmission and encoding computation is essential for prolonging sensor lifetime.3
A typical MTC architecture consists of three domains: the device domain, the network domain, and the application domain. A cellular network plays the role of the network domain, and MTC devices exchange data with MTC servers or other MTC devices through the base station4 or managed device-to-device links. Current research focuses on access control rather than effective data transmission for massive MTC devices, because the amount of data from each device is considered small. However, some devices, such as sensors on autonomous cars and monitoring cameras in a smart city, require higher transmission rates. As these data-intensive aggregation applications emerge and the number of MTC devices grows, the aggregated data transmission rate becomes enormous; overall, traffic from MTC devices will occupy the majority of future data traffic.1 Serving a large number of diverse MTC devices and satisfying their significant aggregated data transmission will challenge future mobile networks.
To achieve a higher data transmission rate with limited bandwidth, one traditional method is to improve the spectrum efficiency. However, state-of-the-art modulation and coding techniques have pushed current communication systems close to the Shannon limit, and further gains in spectrum efficiency are increasingly hard to obtain. Another way to relieve the pressure on the communication system is to compress the data to be transmitted using source coding techniques. For a single data source such as text, audio, or video, many coding schemes like Huffman coding, JPEG, and H.264 have been investigated and are well applied in practical communication systems. By contrast, jointly compressing the data from more than one source is immature and not widely adopted, but it is a promising technique for fifth-generation (5G) wireless systems to support the data transmission of massive MTC devices. Distributed source coding (DSC) is a representative multi-source coding technique that has been widely investigated in wireless sensor networks (WSNs).5 It can be used to compress the output of correlated MTC devices that do not communicate with each other. However, deploying DSC introduces encoding and decoding delays, which should be considered in MTC uplink transmission.
DSC was first introduced by Slepian and Wolf,6 who proved that separate encoders with a joint decoder can achieve the same coding rate for lossless compression of two dependent sources as a joint encoder and decoder. They also presented the achievable rate region of lossless DSC, known as the Slepian–Wolf region. Their work was extended to information-theoretic bounds for lossy compression by Wyner and Ziv.7 Based on these theorems, many practical coding schemes that approach the theoretical bound have been proposed, most of them derived from channel codes such as turbo codes and low-density parity-check (LDPC) codes.8 Any arbitrary rate in the Slepian–Wolf rate region was later shown to be achievable based on non-uniform LDPC codes.9
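As a concrete, illustrative instance (not taken from the paper), consider two binary sources related through a binary symmetric channel with crossover probability q. The Slepian–Wolf region and its corner point can be computed directly from the entropies:

```python
import math

def h2(p):
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Illustrative joint source: Y = X xor E, with X ~ Bernoulli(0.5)
# and independent noise E ~ Bernoulli(q).
q = 0.1                      # correlation parameter (assumed for this sketch)
H_X = h2(0.5)                # = 1 bit
H_Y = h2(0.5)                # Y stays uniform, = 1 bit
H_Y_given_X = h2(q)          # conditional entropy H(Y|X)
H_XY = H_X + H_Y_given_X     # joint entropy H(X,Y)

# Slepian-Wolf region: R_X >= H(X|Y), R_Y >= H(Y|X), R_X + R_Y >= H(X,Y).
# One corner point encodes X at full rate and Y at its conditional rate.
corner = (H_X, H_Y_given_X)
print(f"H(X,Y) = {H_XY:.3f} bits; corner (R_X, R_Y) = ({corner[0]:.3f}, {corner[1]:.3f})")
```

The sum rate H(X,Y) is what a joint encoder would need; the point of the Slepian–Wolf theorem is that separate encoders reach the same total.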
The capability to eliminate the information redundancy of correlated sources without requiring them to communicate with each other earns DSC many applications in areas such as WSNs and video coding. In a densely deployed WSN, the observations of sensors are expected to be spatially correlated;10 for example, the measurements of adjacent temperature sensors are usually similar. Sensors use relay nodes to transmit data to the aggregator, and the relay nodes can compress the observations they aggregate with the help of DSC, thus reducing the energy consumption of communication. Exploiting the inherent correlation in sensor observations and deploying DSC can effectively increase the battery life of a WSN.11 In video coding, DSC is mainly used for inter-frame coding and inter-video coding. Inter-frame coding takes advantage of the correlation between video frames by encoding frames individually and decoding them jointly with DSC,8 while inter-video coding exploits the correlation of videos from cameras with overlapping fields of view.12
Although DSC is promising for WSNs, there is still a long way to go toward deploying DSC in a large-scale sensor network. In the past two decades, not only the benefits but also the drawbacks of applying DSC have been investigated. In terms of encoding and decoding schemes, most are based on channel coding. They have the advantage of achieving compression rates near the Slepian–Wolf bound and being resilient to channel errors, but these capacity-achieving channel codes require a data block as large as
The estimation of the correlation model in practical DSC systems is also a challenge. In practice, it is usually hard to obtain the correlation model, especially when the sensor network topology changes so fast that little time or information exists for training the joint probability mass or density function.5
Some works manage to estimate the correlation parameters at the decoder side, but they introduce other problems. For example, turbo codes were adopted for the compression of two correlated sources, with the decoder estimating the correlation through the iterative decoding process. The decoder manages to perform close to the Slepian–Wolf bound, but iterative decoding apparently brings higher computation complexity and delay.16
For the situation in which neither the encoder nor the decoder knows the correlation structure, a rate-adaptive scheme with feedback was proposed:17 the encoder first sends a short syndrome and the decoder attempts decoding; if decoding fails, the encoder is informed to extend the short syndrome with additional bits until decoding succeeds. This scheme was further improved in decoding complexity and delay by reducing feedback loops.18
However, the
Related works
Recognizing the merits and demerits of DSC, some researchers point out that the overall performance of a DSC-employed system should be considered. The balance between the benefit and the price of DSC has been investigated from different viewpoints. In 2004, the tradeoff between reliability and efficiency in DSC was considered by Marco and Neuhoff.19 Four schemes, including sequential Slepian–Wolf coding and clustered Slepian–Wolf coding, are analyzed. It is assumed that the transmission of packets has a failure probability; since the decoding of Slepian–Wolf coding depends on data transmitted from other nodes, a transmission failure at one node may cause the decoding failure of all nodes. The results show that splitting sensors into clusters increases system reliability at the price of decreased coding efficiency. Hong and colleagues consider sequential Slepian–Wolf coding in a random access network20 and analyze the throughput, delay, and energy efficiency of DSC in the proposed system. Their results show that increasing the throughput per node brings higher average delay and energy consumption.
The disadvantages of adopting DSC in a large cluster inspired the idea of splitting data sources into several small clusters. Many algorithms have been proposed to maximize the benefit of DSC by finding a better clustering metric in WSNs. In a study by Wang et al.,21 a distributed optimal compression clustering protocol (DOC2) is proposed to maximize the global compression gain. Sensors are divided into clusters based on their communication range and their contribution to the average entropy of messages inside one cluster. Inside one cluster, rate allocation is based on sequential Slepian–Wolf coding. The simulation results show that DOC2 achieves a promising overall compression ratio. Aside from clustering algorithms, optimization of the number of clusters is also discussed by many researchers. In addition to analyzing the number of clusters at a single network layer, some researchers optimize the number of clusters using a cross-layer approach that incorporates effects from different network layers.22
Our contributions
In this work, we consider a cellular network supporting massive MTC devices that upload their measurements to an MTC server, and we assume that these measurements are spatially correlated. This case is similar to a common scenario in WSNs where many wireless sensors transmit data to a sink. Both scenarios support many terminals and need to transmit as much information as possible with a limited resource: for a WSN, battery life; for a cellular network, bandwidth. Their uplink transmissions both fit a many-to-one communication model, and the feature of aggregating correlated data makes their uplinks a natural application of DSC. Nonetheless, differences exist. One big difference is that sensors in a WSN may transmit data with the assistance of relay nodes, whereas MTC devices in a cellular network mostly communicate directly with the base station, and this communication structure makes the clustering problem simpler. However, in certain scenarios some MTC devices require low latency, a challenge rarely considered in WSNs. To study the latency caused by DSC, we build a simple model of the average decoding delay in sequential Slepian–Wolf coding and try to balance the benefit of DSC against the induced decoding delay by dividing MTC devices into independent clusters. Part of the analyses and simulations was presented in our previous work.23
Our contribution is twofold. First, we propose a scheme that adopts clustered DSC in the uplink transmission of MTC to reduce the aggregated uplink traffic by compressing the data generated by massive MTC devices. The data sources are assumed to be spatially correlated and are formalized as a multivariate Gaussian distribution in this article. We analyze an LDPC-based DSC scheme and prove that the decoding delay grows linearly with the number of data sources under certain assumptions. To support massive delay-sensitive MTC devices, we propose dividing MTC devices into several discrete clusters to reduce the average delay.
Second, to balance the compression ratio and decoding delay of DSC, we adopt three clustering algorithms based on grid dividing, K-medoids, and the Weighted Pair Group Method with Arithmetic Mean (WPGMA). They are evaluated using compression ratio, decoding delay, and a third evaluation indicator that combines both. Through extensive simulations, we find that the cluster number can balance the compression ratio and decoding delay of DSC, and that an optimal cluster number exists that maximizes the evaluation indicator. Our simulations show that K-medoids performs better than WPGMA in average decoding delay but worse in compression ratio.
DSC
In this section, to facilitate understanding of the decoding delay and complexity of DSC, we briefly introduce the theory of DSC and then focus on the analysis of the delay in the decoding process.
Consider the case of two random sources
This easily generalizes to the
where

Figure 1. Achievable rate region of Slepian–Wolf coding.
Since
The Slepian–Wolf coding can be extended to Rate-Distortion theory with side information when we allow a certain degree of distortion of decoded data. We denote reconstructed
DISCUS inspired a lot of work with the idea of syndromes.25
In a syndrome-based DSC algorithm, the encoder observes
Using LDPC-based DSC for multiple data sources, we can approach the corner point of the region of achievable rates,27,28 as shown in equation (3)
where
Because the decoding needs to store the probability density function (PDF) of all the data sources, if we cannot build a model of the PDF with limited parameters,
where
If we simplify the sequential dependency in equation (3) with symmetric CCRE, the property of symmetric CCRE ensures that there is no loop in the dependency structure. The maximum storage required to decode each message is constrained by
The chain dependency prevents the decoder from decoding sources in parallel. The decoding time of each data source can be regarded as a constant, since we can adjust the number of iterations to make the decoding complexity of different data sources the same, so the decoding delay of any
That means the maximum decoding delay will increase linearly with the number of data sources in a cluster.
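The chain-decoding argument above can be sketched numerically. The per-source decode time t_dec and the assumption that independent clusters are decoded concurrently are ours, for illustration:

```python
import math

def sequential_decoding_delays(n_sources, t_dec=1.0):
    """Delay at which each source is decoded under chain (sequential)
    Slepian-Wolf decoding: source i must wait for sources 1..i-1,
    each taking a constant time t_dec."""
    return [i * t_dec for i in range(1, n_sources + 1)]

def max_delay(n_sources, n_clusters, t_dec=1.0):
    """If sources are split into equal, independent clusters that are
    decoded concurrently, the worst-case delay drops to the delay of
    the largest cluster."""
    return math.ceil(n_sources / n_clusters) * t_dec

delays = sequential_decoding_delays(100)
print(max(delays))            # worst-case delay grows linearly with cluster size
print(max_delay(100, 5))      # clustering cuts the worst case by the cluster count
```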
System models and problem formulation
In this section, we build a network model with spatially correlated sources and pose the problem of balancing the increased decoding delay against the reduced transmission rate by splitting MTC devices into independent clusters. We consider only the decoding delay because the decoding time and storage requirements both grow in the same pattern as the decoding delay, according to the analyses in the previous section.
Network model
In a typical data aggregation MTC application, MTC devices connect to base stations based on maximum signal-to-noise ratio. When we only consider adopting DSC among these MTC devices, each cell can be considered a discrete unit of the network, so we can use a single-base-station model to describe each cell. In this model, one cell covers a 1 km × 1 km area, and

Figure 2. Area separated into grids; inside one grid, sequential Slepian–Wolf coding is employed.
Data correlation model
In this article, we assume that the data sources of MTC devices are spatially correlated. Let
The spatially correlated data sources are modeled with a Gaussian random field. We denote the observation vector as
Without loss of generality, we assume that mean
Different covariance models can be adopted to
where
In a practical system, sampling and discretization of the data sources will cause a certain amount of distortion, and a practical Slepian–Wolf decoder cannot reconstruct the quantized sources without loss. According to rate-distortion theory,24 when we use mean-squared error as the distortion measure and the maximum allowed distortion of each data source is
where
where
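The correlation model of this subsection can be sketched as follows. The paper models observations as a zero-mean multivariate Gaussian whose covariance depends on inter-device distance; the specific covariance function is not preserved in this excerpt, so the exponential model below, with illustrative parameters `sigma2` and `theta`, is only one common choice:

```python
import math
import random

def exp_covariance(points, sigma2=1.0, theta=0.2):
    """Covariance matrix of a spatially correlated Gaussian field under
    an exponential model K(d) = sigma2 * exp(-d / theta), where d is the
    Euclidean distance between devices (theta is an assumed correlation
    range, not a value from the paper)."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(points[i], points[j])
            K[i][j] = sigma2 * math.exp(-d / theta)
    return K

random.seed(0)
# Unit-square stand-in for device positions in the 1 km x 1 km cell.
pts = [(random.random(), random.random()) for _ in range(4)]
K = exp_covariance(pts)
# Diagonal entries equal the marginal variance; off-diagonal entries
# decay with the distance between the two devices.
print(K[0][0], K[0][1])
```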
Delay model
As we analyzed in equation (5), the delay introduced by sequential decoding is linear in the cluster size. In this work, we measure the delay of different clusterings with the average of the maximum delay in each cluster
where
Problem formulation
To evaluate the efficiency of DSC under our clustering strategy, we define the system gain as the reduction in the bit rate required to transmit all the data sources under an allowed distortion
where
Since the observations from devices are correlated to an extent that depends on the distance between devices, the more users a cluster includes, the more side information we can utilize to increase the compression ratio. Figure 3 shows the influence of cluster size on the compression ratio. DSC is adopted for all nodes in one fixed area, and we can see that under different data correlation factors,

Figure 3. Compression ratio of a single cluster with different sizes.
Unlike the compression ratio, the average delay increases linearly with the average cluster size according to equation (11). In our network model, we divide all devices into clusters; the number of clusters affects the cluster size and eventually changes the compression ratio and average delay.
To find the best cluster dividing strategy, we define an evaluation indicator combining coding efficiency and average delay
here,
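Matching the "rate reduction/average delay" metric used in the simulations below, the evaluation indicator can be sketched as the following ratio; the numbers are illustrative only:

```python
def evaluation_indicator(rate_no_dsc, rate_dsc, avg_delay):
    """Ratio of transmission-rate reduction to average decoding delay;
    larger is better. A sketch of the combined metric, under the
    assumption that the indicator is this plain ratio."""
    return (rate_no_dsc - rate_dsc) / avg_delay

# Illustrative tradeoff: more clusters -> less compression gain, less delay.
print(evaluation_indicator(1000.0, 600.0, 20.0))  # few large clusters
print(evaluation_indicator(1000.0, 700.0, 10.0))  # many small clusters
```

With these toy numbers the second configuration wins, mirroring the paper's finding that an intermediate cluster number maximizes the indicator.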
User clustering algorithms
In this section, we discuss user clustering algorithms to obtain an optimized solution to the problem formulated above; they divide users into clusters according to spatial locations or the correlation intensity between users.
Grid dividing clustering
First, considering that the devices collect spatially correlated data, one intuitive idea is to divide the area into grids of the same size, with each grid forming a cluster. This idea is demonstrated in Figure 2: each side of the area is evenly divided into five segments, so the area is split into 25 grids. Inside a grid, the messages of MTC devices are sequentially encoded. When the devices are uniformly distributed in the area, each cluster will theoretically include almost the same number of devices.
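The grid dividing scheme can be sketched as follows (a minimal illustration; `g = 5` mirrors the 25-grid example above, and positions are in normalized units):

```python
def grid_clusters(devices, side=1.0, g=5):
    """Split a side x side area into g*g equal grids; devices falling
    in the same grid form one cluster."""
    clusters = {}
    for dev in devices:
        x, y = dev
        # Clamp so points exactly on the far border land in the last grid.
        gx = min(int(x / side * g), g - 1)
        gy = min(int(y / side * g), g - 1)
        clusters.setdefault((gx, gy), []).append(dev)
    return clusters

# Two devices near one corner, two near the opposite corner.
devices = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (1.0, 1.0)]
c = grid_clusters(devices, side=1.0, g=5)
print(len(c))   # the four devices fall into two grids, hence two clusters
```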
WPGMA and K-medoids
Aside from the simple grid dividing scheme, we use two clustering algorithms from machine learning: a hierarchical clustering algorithm called WPGMA and a non-hierarchical clustering algorithm named K-medoids. In both WPGMA and K-medoids, we define the distance between two devices as the achievable rate when they are jointly encoded, which reflects the correlation intensity, and considering devices
WPGMA is a bottom-up hierarchical clustering method that generates clusters by merging small clusters.30 In WPGMA, at each iteration, the two nearest clusters, for example
Each node is a separate cluster at the beginning, and we repeat the above operations until we reach the target number of clusters.
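A minimal sketch of WPGMA merging down to a target number of clusters, operating on a generic pairwise distance matrix (in the paper the "distance" would be the joint-encoding rate; the toy matrix here is illustrative):

```python
def wpgma(dist, k):
    """WPGMA bottom-up clustering down to k clusters.
    dist: symmetric matrix of pairwise distances.
    On a merge, the distance from the new cluster to any other cluster
    is the plain average of the distances to its two parts."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}            # cluster id -> members
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > k:
        (a, b) = min(d, key=d.get)                   # closest pair of clusters
        merged = clusters.pop(a) + clusters.pop(b)
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (dac + dbc) / 2    # WPGMA update rule
        d = {key: v for key, v in d.items()
             if a not in key and b not in key}
        d.update(new_d)
        clusters[next_id] = merged
        next_id += 1
    return list(clusters.values())

# Toy example: two tightly correlated pairs, far apart from each other.
D = [[0, 1, 9, 9],
     [1, 0, 9, 9],
     [9, 9, 0, 1],
     [9, 9, 1, 0]]
print(wpgma(D, 2))   # the two natural pairs are recovered
```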
K-medoids and K-means are similar non-hierarchical clustering algorithms. In this article, we cannot obtain device locations from the distance defined above, so we choose K-medoids, in which a member of the cluster is used as the cluster center and the clustering process involves only the distances between nodes. We adopt the classic and powerful partitioning around medoids (PAM) algorithm31 to find a clustering plan. First, we randomly select
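A minimal sketch of PAM-style K-medoids operating purely on pairwise distances; the greedy swap loop is the textbook PAM step (not quoted from the paper), and the toy matrix is illustrative:

```python
import random

def pam(dist, k, seed=0):
    """K-medoids via PAM: start from k random medoids, then greedily
    swap a medoid with a non-medoid whenever the swap lowers the total
    distance of all points to their nearest medoid."""
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)

    def cost(meds):
        return sum(min(dist[i][m] for m in meds) for i in range(n))

    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:mi] + [h] + medoids[mi + 1:]
                if cost(trial) < cost(medoids):
                    medoids = trial
                    improved = True
    # Assign each device to its nearest medoid.
    clusters = {m: [] for m in medoids}
    for i in range(n):
        clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
    return clusters

# Same toy distance matrix as before: two tight pairs far apart.
D = [[0, 1, 9, 9],
     [1, 0, 9, 9],
     [9, 9, 0, 1],
     [9, 9, 1, 0]]
clusters = pam(D, 2)
print(sorted(sorted(v) for v in clusters.values()))   # [[0, 1], [2, 3]]
```

Because only the distance matrix is used, this works with the joint-encoding-rate "distance" defined above, with no need for device coordinates.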
Simulation results and discussions
In this section, we simulate the effects of correlation degree and clustering schemes on the gain of Slepian–Wolf coding and the delay it brings. We first evaluate the grid dividing clustering algorithm under different settings and then compare it with other two clustering algorithms.
All the numerical results are produced from the average of 30 repetitions unless otherwise specified. The parameter settings are listed in Table 1. We consider different settings of
Table 1. Simulation parameters.
Performance of grid dividing clustering
First of all, we evaluate the performance of clustered Slepian–Wolf coding against the number of grids under different correlation degrees and numbers of users.
In Figure 4, the compression ratio

Figure 4. Compression ratio with different numbers of grids.
Then, we analyze the delay of grid dividing clustering. In Figure 5, the results are as expected: the average delay

Figure 5. Average delay with different numbers of grids.
Finally, we combine efficiency and delay to evaluate the benefits and losses of dividing devices into different numbers of clusters. In Figure 6, the ratio of transmission rate reduction to average delay

Figure 6. Rate reduction/average delay with different numbers of grids.
Comparison of clustering algorithms
We compare the three clustering algorithms in this subsection. Figure 7 shows their compression ratios; K-medoids and WPGMA obtain similar coding efficiency, with WPGMA performing a little better. In Figure 8, we show the maximum delay over all clusters for the different clustering algorithms, and WPGMA performs worst. Figure 9 shows that with WPGMA, devices are not distributed across clusters as evenly as with the other two algorithms. According to our analysis of delay, when the clustering is unbalanced, clusters that contain more members suffer a higher delay. In a practical system, if we have no prior information about the delay requirements of devices, unbalanced clustering algorithms perform poorly in terms of user equality. Although grid dividing performs best in average delay in our simulation results, it is worth mentioning that grid dividing is not practical in real-world systems, where users are rarely uniformly distributed; in that case, the grid dividing algorithm will generate unevenly divided clusters, which causes poor delay performance.

Figure 7. Compression ratio comparison between the three clustering algorithms.

Figure 8. Delay comparison between the three clustering algorithms.

Figure 9. Boxplot of delay of clusters generated by the three clustering algorithms.
Conclusion and future work
Leveraging DSC to reduce the redundancy of correlated sources can relieve the fast-growing demand for communication resources from massive MTC devices. In this article, we analyze the storage consumption and decoding delay of clustered DSC that adopts sequential Slepian–Wolf coding inside each cluster. We find that the decoding delay grows linearly with the cluster size under certain assumptions. The balance between the benefit of a reduced transmission rate and the disadvantage of increased delay is then studied with three clustering algorithms: grid dividing, WPGMA, and K-medoids. Our simulation results show that when data from MTC devices are spatially correlated, dividing MTC devices into clusters successfully reduces the delay while achieving a proper compression ratio. K-medoids and WPGMA both outperform grid dividing in terms of compression ratio, and WPGMA performs a little better than K-medoids. As for decoding delay, grid dividing divides users more evenly into clusters and has the most balanced delay among clusters; the performance of K-medoids is close to grid dividing, while WPGMA performs worst.
Our current work is based on a simple single-base-station model in which the correlation among users is stationary. Considering the heterogeneity of future wireless networks and the mobility of users, we will study the clustering problem in a more complicated network model, and better clustering algorithms that adapt to time-varying correlation structures will be investigated.
