Abstract
Keywords
Introduction
The electric Internet of things (EIoT) can provide significant support for the intelligence, digitalization, and transparency of the power grid by collecting operation parameters, including voltage, current, and active and reactive power, in a timely manner and transmitting them to the cloud platform for processing and analysis. 1 In EIoT, the communication devices produced by different manufacturers utilize multiple communication protocols for data transmission and information interaction. 2 Typical communication protocols in EIoT include message queuing telemetry transport (MQTT), data distribution service (DDS), constrained application protocol (CoAP), and hypertext transfer protocol (HTTP). DDS is commonly used for state monitoring in EIoT. 3 CoAP is particularly suitable for services like meter reading management and load forecasting. 4 HTTP is applicable to high-performance devices with large computing and storage resources in EIoT. 5 MQTT is suitable for lightweight data transmission of gateways owing to its high bandwidth utilization and simple implementation. 6 The gateway can adapt and convert different protocols to MQTT. Through MQTT-based information interaction between gateways, connectivity and interoperability among different devices can be achieved, which shields the differences among the various protocols.
A QoS guarantee is of vital importance in the process of data transmission between gateway and platform in EIoT.7,8 MQTT provides three quality of service (QoS) levels, that is, the at most once (QoS0) level, the at least once (QoS1) level, and the exactly once (QoS2) level, 9 which provide different QoS guarantees in terms of transmission delay and packet-loss ratio. Specifically, the transmission delay of QoS0 is relatively low but its packet-loss ratio is higher, while QoS1 and QoS2 achieve no packet loss at the expense of increased transmission delay. Moreover, QoS1 guarantees that the data packet is successfully transmitted at least once, and QoS2 ensures that the data packet is successfully transmitted exactly once by leveraging a more complicated retransmission mechanism. Therefore, it is necessary to dynamically and intelligently select MQTT QoS levels for data transmission between gateway and platform according to the time-varying network state and QoS requirements in EIoT. 10
However, dynamic MQTT QoS level selection still faces some challenges, which are summarized as follows. First, the QoS requirements of control services and acquisition services differ in terms of delay and reliability.11–13 However, the different metrics are contradictory; for example, adopting a retransmission mechanism ensures a lower packet-loss ratio but greatly increases the transmission delay. Therefore, it is a critical challenge to achieve a balanced trade-off among different QoS metrics. 14 Second, the current delay and packet-loss ratio models do not take into consideration the impact of the protocol-specific QoS guarantee mechanism on the physical-layer performance. Therefore, deriving accurate closed-form models of delay and packet-loss ratio that are adapted to MQTT-specific QoS levels is challenging. Last but not least, due to network resource limitations and prohibitive signaling overhead, the global state information (GSI), for example, channel gain, is uncertain.15–17 Therefore, it is necessary to intelligently optimize MQTT QoS level selection under incomplete information. 18
Some works have addressed MQTT QoS level selection problems in IoT. Sadeq et al. 19 proposed a QoS approach for IoT environments utilizing MQTT and designed a flow control mechanism to minimize the transmission delay. Niruntasukrat et al. 20 proposed an authorization mechanism for an MQTT-based IoT service platform to minimize delay and message overhead. However, these works have not considered the joint optimization of delay and packet-loss ratio. Lee et al. 21 proposed a push notification service network utilizing the MQTT protocol to minimize the packet loss and delay by selecting an appropriate QoS level according to different payloads. Nurwarsito et al. 22 proposed a communication architecture using the MQTT protocol for emergency vehicles, which aims to minimize the packet loss and average delay. However, the above-mentioned works have not considered uncertain GSI in practical EIoT application scenarios. Weerasinghe et al. 23 proposed an MQTT-based localization mechanism for wireless sensor networks by utilizing supervised learning. Ahmadon et al. 24 proposed a machine learning-based anomaly detection method for MQTT-based networks. However, these works need offline scene data and therefore cannot adapt to the complex environment in EIoT. 25
Reinforcement learning provides a powerful tool to deal with sequential decision problems under incomplete information.26–28 Among various reinforcement learning algorithms, upper confidence bound (UCB), originally developed for multi-armed bandit (MAB) problems, converges rapidly and offers a well-balanced trade-off between exploitation and exploration. Zhou et al. 29 proposed an energy-aware and data backlog-aware UCB-based channel selection algorithm, which can improve energy efficiency and throughput. However, delay and reliability are not taken into account. Endo et al. 30 proposed a distributed QoS-UCB channel selection algorithm considering channel rating quality, which can improve the reliability and reduce the delay while avoiding congestion. However, this work has not considered the complex communication environment in EIoT or MQTT-specific QoS level selection optimization.
Motivated by the aforementioned challenges, we propose a delay-reliability-aware protocol adaption and QoS guarantee method for EIoT based on reinforcement learning. First, considering the adaptation and conversion of heterogeneous protocols, we establish a communication architecture of EIoT based on MQTT. Second, we propose a delay-reliability-aware MQTT QoS level selection (DR-MQLS) algorithm based on UCB to minimize the weighted sum of packet-loss ratio and delay. Last but not least, simulations are carried out to validate the effectiveness of DR-MQLS. Compared with single and fixed QoS level selection strategies, DR-MQLS can effectively reduce the weighted sum of packet-loss ratio and delay and satisfy the differentiated QoS requirements in EIoT. We summarize the main contributions of this work as follows:
The remainder of this article is organized as follows. In section “System model and problem formulation,” we describe the system model and problem formulation in detail. The proposed DR-MQLS algorithm is introduced in section “Delay-reliability-aware MQTT QoS level selection in EIoT.” Section “Simulation results” provides simulation results. In section “Conclusion,” we summarize this article.
System model and problem formulation
The considered communication architecture of EIoT based on MQTT is shown in Figure 1, 31,32 which consists of an MQTT broker server, a cloud platform, multiple EIoT devices, and multiple gateways. The gateways with protocol adaption and conversion functions adopt the publish/subscribe pattern for information interaction with the cloud platform and can act as both publishers and subscribers. The broker server, which is deployed on the cloud platform, acts as an intermediary for data transmission between publishers and subscribers. The publisher notifies the broker server of the topics that it intends to publish. Then, the broker server keeps the topics and pushes them when subscribers ask for relevant topics. Multiple communication protocols are used for data transmission between EIoT devices and gateways, for example, HTTP, CoAP, and DDS. By parsing and repackaging protocol messages, the gateway achieves the conversion between these protocols and the MQTT protocol. An example is shown in Figure 1. The broker server pushes the subscribed topic and transmits the related data to the gateway based on the transmission mechanism specified by the MQTT QoS1 level. Then the gateway executes protocol adaption and conversion to repackage the protocol messages based on DDS, CoAP, and HTTP and transmits the data to the corresponding EIoT devices.

Communication architecture of EIoT based on MQTT.
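As an illustration of the publish/subscribe interaction and the gateway-side repackaging described above, the following is a minimal in-memory sketch. It involves no networking and no real MQTT library; the names `ToyBroker` and `to_mqtt` and the topic layout `eiot/<protocol>/<resource>` are hypothetical choices made for this example and are not taken from the architecture in Figure 1.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Callback = Callable[[str, bytes], None]


class ToyBroker:
    """In-memory stand-in for a broker server (hypothetical, no networking)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)
        self._latest: Dict[str, bytes] = {}  # last payload kept per topic

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a subscriber and push the kept payload, if any."""
        self._subscribers[topic].append(callback)
        if topic in self._latest:
            callback(topic, self._latest[topic])

    def publish(self, topic: str, payload: bytes) -> None:
        """Keep the payload and push it to all current subscribers."""
        self._latest[topic] = payload
        for callback in self._subscribers[topic]:
            callback(topic, payload)


def to_mqtt(protocol: str, resource: str, body: bytes) -> Tuple[str, bytes]:
    """Repackage a device message as an MQTT-style (topic, payload) pair.

    The topic naming scheme is an assumption made for illustration only.
    """
    topic = f"eiot/{protocol.lower()}/{resource.strip('/')}"
    return topic, body


if __name__ == "__main__":
    broker = ToyBroker()
    # The gateway acts as a publisher after converting a CoAP reading.
    broker.publish(*to_mqtt("CoAP", "/meter/1", b"230.1"))
    # A platform-side subscriber asks for the topic afterwards.
    broker.subscribe("eiot/coap/meter/1", lambda t, p: print(t, p))
```

In this sketch the broker keeps only the most recent payload per topic, which is one simple way to mimic a broker that stores topics and pushes them once a subscriber asks for them.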
We assume that there are
We assume that the channel state remains unchanged during the transmission of a small packet but varies across different small packets. 33
In particular, each retransmission is considered as a small packet transmission process for QoS1 and QoS2 which adopt retransmission mechanisms. The channel gain34,35 of the
where
Figure 2 shows MQTT data transmission processes of three QoS levels. The packet-loss ratio and delay models of the three QoS levels are elaborated in the following.

MQTT data transmission processes of three QoS levels.
QoS0 level
QoS0 provides best-effort delivery of the PUBLISH packet. After the gateway sends the PUBLISH packet to the broker server, the transmission process is completed immediately, regardless of whether the broker server receives the packet. Therefore, although the transmission delay of QoS0 is low, the packet-loss ratio is relatively high under poor channel states.
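The fire-and-forget behavior of QoS0 can be illustrated with a small Monte Carlo sketch. The per-packet loss probability `loss_prob` is a hypothetical stand-in for the channel-dependent loss model of this section and is simply held constant here, which is a simplification.

```python
import random


def send_qos0(loss_prob: float, rng: random.Random) -> bool:
    """Send one PUBLISH packet once; return True if it is delivered.

    There is no acknowledgment and no retransmission, so a lost packet
    stays lost.
    """
    return rng.random() >= loss_prob


def empirical_loss_ratio(loss_prob: float, n_packets: int, seed: int = 0) -> float:
    """Fraction of packets lost over n_packets independent QoS0 sends."""
    rng = random.Random(seed)
    lost = sum(not send_qos0(loss_prob, rng) for _ in range(n_packets))
    return lost / n_packets


if __name__ == "__main__":
    for p in (0.05, 0.3):
        print(p, empirical_loss_ratio(p, 10_000))
```

Because every packet is attempted exactly once, the empirical loss ratio tracks the assumed channel loss probability directly, which is why QoS0 degrades under poor channel states.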
Packet-loss ratio model
QoS0 level for data transmission has only one PUBLISH packet transmission process. Therefore, the packet-loss variable of the
where
Here,
Delay model
The transmission delay of the
where
The total delay of the
QoS1 level
QoS1 adopts a PUBACK packet to acknowledge the reception of the PUBLISH packet. If the PUBACK packet is not received by the gateway within a certain time, the PUBLISH packet is retransmitted. In this case, the PUBLISH packet is received at least once at the broker server. A data deduplication process is required to delete the duplicate packets at the expense of a certain data processing delay. 36 Therefore, the packet-loss ratio in the QoS1 level is zero, but the transmission delay and data deduplication delay are relatively high.
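The QoS1 retransmission loop and the origin of duplicate packets can be sketched as follows. The success probabilities `p_publish_ok` and `p_puback_ok` are hypothetical constants introduced for this example; in the model of this article the channel varies from one small packet to the next, so a constant probability is a simplification.

```python
import random
from typing import Tuple


def send_qos1(
    p_publish_ok: float,
    p_puback_ok: float,
    rng: random.Random,
    max_attempts: int = 1000,
) -> Tuple[int, int]:
    """Simulate one packet under QoS1.

    Returns (attempts, duplicates): the number of PUBLISH transmissions
    until a PUBACK reaches the gateway, and the number of extra copies
    that arrived at the broker and would need deduplication.
    """
    copies_at_broker = 0
    for attempt in range(1, max_attempts + 1):
        if rng.random() < p_publish_ok:      # PUBLISH reaches the broker
            copies_at_broker += 1
            if rng.random() < p_puback_ok:   # PUBACK reaches the gateway
                return attempt, copies_at_broker - 1
        # Otherwise the timer expires and the PUBLISH packet is resent.
    raise RuntimeError("no PUBACK received within max_attempts")


if __name__ == "__main__":
    rng = random.Random(0)
    print([send_qos1(0.8, 0.8, rng) for _ in range(5)])
```

A duplicate arises whenever a PUBLISH packet is delivered but its PUBACK is lost, which is the case the deduplication process has to handle.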
Packet-loss ratio model
Since QoS1 adopts retransmission to ensure successful data transmission, the packet-loss ratio of the
Delay model
There are two transmission processes in QoS1 level, that is, PUBLISH packet transmission and PUBACK packet feedback. When the above two processes are successful, the transmission process of a small packet is completed. Define
We define
Then, the transmission delay of the
where
In order to simplify the model, we assume that the data deduplication delay of different small packets is uniformly defined as
The total delay of the
QoS2 level
QoS2 ensures that messages are delivered exactly once through two interaction processes by means of PUBLISH, PUBREC, PUBREL, and PUBCOMP packets. In the first interaction process, after the gateway sends the PUBLISH packet to the broker server, if a PUBREC packet is not received within a certain time, the PUBLISH packet will be retransmitted until the PUBREC packet is successfully received. If a duplicate PUBLISH packet is received at the broker server, it will be deleted immediately. In the second interaction process, when receiving the PUBREC packet, the gateway responds to the broker server with a PUBREL packet and waits for the PUBCOMP feedback packet. Similarly, if the PUBCOMP packet is not received within a certain time, the PUBREL packet will be retransmitted until the PUBCOMP packet is successfully received. Therefore, the QoS2 level ensures that each packet is successfully received without duplication.
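The two interaction processes of QoS2 can be sketched in the same style. The single success probability `p_ok` applied to every packet type is a hypothetical simplification, and delivering the message on the first received PUBLISH is a modeling choice of this example rather than something specified in the text above.

```python
import random
from typing import Tuple


def send_qos2(
    p_ok: float,
    rng: random.Random,
    packet_id: int = 1,
    max_attempts: int = 1000,
) -> Tuple[int, int, int]:
    """Simulate one packet under QoS2.

    Returns (publish_attempts, pubrel_attempts, delivered), where
    delivered counts how many times the message was accepted by the broker.
    """
    stored_ids = set()  # broker-side record of packet identifiers in flight
    delivered = 0

    # First interaction: PUBLISH is resent until PUBREC is received.
    for publish_attempts in range(1, max_attempts + 1):
        if rng.random() < p_ok:              # PUBLISH reaches the broker
            if packet_id not in stored_ids:  # duplicates are discarded
                stored_ids.add(packet_id)
                delivered += 1
            if rng.random() < p_ok:          # PUBREC reaches the gateway
                break
    else:
        raise RuntimeError("no PUBREC received within max_attempts")

    # Second interaction: PUBREL is resent until PUBCOMP is received.
    for pubrel_attempts in range(1, max_attempts + 1):
        if rng.random() < p_ok:              # PUBREL reaches the broker
            stored_ids.discard(packet_id)    # identifier can be released
            if rng.random() < p_ok:          # PUBCOMP reaches the gateway
                break
    else:
        raise RuntimeError("no PUBCOMP received within max_attempts")

    return publish_attempts, pubrel_attempts, delivered


if __name__ == "__main__":
    rng = random.Random(0)
    print([send_qos2(0.7, rng) for _ in range(5)])
```

However many retransmissions occur, `delivered` stays at one in this sketch, at the cost of two retransmission loops instead of the single loop of QoS1.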
Packet-loss ratio model
Since QoS2 also adopts retransmission to ensure successful data transmission, the packet-loss ratio of the
Delay model
There are four processes, that is, PUBLISH packet transmission, PUBREC packet feedback, PUBREL packet transmission, and PUBCOMP packet feedback in QoS2 level. When the above processes are successful, the transmission of a small packet is completed. The PUBREL packet will be transmitted only after the PUBREC packet is successfully fed back.
In the first interaction process, we define
Therefore, the transmission delay of the
where
The transmission delay of the
where
Since there is no data deduplication process in QoS2 level, the total delay of the
Problem formulation
To solve the differentiated QoS guarantee problem in EIoT, the optimization objective is defined to minimize the weighted sum of packet-loss ratio and delay under the QoS level selection constraint. The optimization problem is formulated as
where
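To make the weighted-sum objective concrete, the sketch below evaluates a cost of the form weight times packet-loss ratio plus weight times delay for each QoS level and picks the smallest. The function names, the weights, and the per-level statistics are all hypothetical values chosen for illustration; they are not the parameters or results of this article.

```python
from typing import Dict, Tuple


def weighted_cost(
    loss_ratio: float, delay: float, weight_loss: float, weight_delay: float
) -> float:
    """Weighted sum of packet-loss ratio and delay (smaller is better)."""
    return weight_loss * loss_ratio + weight_delay * delay


def best_level(
    stats: Dict[str, Tuple[float, float]], weight_loss: float, weight_delay: float
) -> str:
    """Return the QoS level whose (loss_ratio, delay) pair has the lowest cost."""
    return min(
        stats,
        key=lambda q: weighted_cost(stats[q][0], stats[q][1], weight_loss, weight_delay),
    )


if __name__ == "__main__":
    # Hypothetical (loss_ratio, delay) pairs; delay is in arbitrary units.
    stats = {"QoS0": (0.2, 1.0), "QoS1": (0.0, 2.5), "QoS2": (0.0, 4.0)}
    print(best_level(stats, weight_loss=1.0, weight_delay=1.0))
    print(best_level(stats, weight_loss=10.0, weight_delay=1.0))
```

With these made-up numbers, a small weight on packet loss favors QoS0, while a larger one shifts the choice to QoS1, which mirrors the trade-off between delay-sensitive and reliability-sensitive services.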
Delay-reliability-aware MQTT QoS level selection in EIoT
Problem transformation
MAB is an efficient reinforcement learning tool to cope with the sequential decision problems under incomplete information. 38 It describes a sequence of exploration–exploitation decision-making processes.39,40 The MAB model is mainly composed of decision makers, arms, and rewards. 41 In each round, the decision maker selects an arm, and the selected arm will generate a reward. 42 The decision maker aims to maximize its reward by exploiting the empirically optimal arm or exploring non-optimal arms.
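The decision loop of an MAB can be sketched with the standard UCB1 index policy. This is the textbook UCB1 rule and not the DR-MQLS algorithm itself; the Bernoulli reward environment, the class name `UCB1`, and the `exploration` constant are assumptions made for this example, with the three arms loosely standing for the three QoS levels.

```python
import math
import random
from typing import List


class UCB1:
    """Textbook UCB1 for rewards in [0, 1]."""

    def __init__(self, n_arms: int, exploration: float = 2.0) -> None:
        self.counts = [0] * n_arms    # number of times each arm was selected
        self.means = [0.0] * n_arms   # empirical mean reward of each arm
        self.exploration = exploration

    def select(self) -> int:
        """Pick an untried arm first, then the arm with the largest index."""
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        t = sum(self.counts)
        scores = [
            mean + math.sqrt(self.exploration * math.log(t) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        """Incrementally update the empirical mean of the selected arm."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run(true_means: List[float], rounds: int, seed: int = 0) -> List[int]:
    """Play `rounds` rounds against Bernoulli arms; return selection counts."""
    rng = random.Random(seed)
    bandit = UCB1(len(true_means))
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts


if __name__ == "__main__":
    # Hypothetical mean rewards for arms standing for QoS0, QoS1, and QoS2.
    print(run([0.2, 0.8, 0.5], rounds=2000))
```

The confidence term shrinks as an arm is selected more often, so the policy gradually shifts from exploring all arms to exploiting the empirically best one. To minimize a cost such as a weighted sum of packet-loss ratio and delay, the reward can be taken as a decreasing function of that cost.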
In this paper, we transform
The proposed DR-MQLS algorithm
DR-MQLS estimates the reward based on historical observations and considers estimation uncertainty through the confidence bound based on UCB. 43 Therefore, the gateway estimates its preference 44 toward
Here,
Then, the gateway selects the QoS level with the maximum estimation value, which is denoted as
Therefore, DR-MQLS draws that
The implementation procedure of the proposed algorithm is summarized in Algorithm 1, which is divided into three phases, as follows:
Complexity analysis
The computational complexity of DR-MQLS is composed of three parts. The computational complexity of the first phase is
Simulation results
In this section, we validate the performance of DR-MQLS through simulations. The single and fixed QoS level selection strategies, that is, only selecting a specific QoS level for data transmission, for example, QoS0, QoS1, and QoS2, are used for comparison. We assume that there are a total of 800 large packets to be transmitted. The channel gain is randomly distributed within
Simulation parameters.
Figure 3 shows the weighted sum of packet-loss ratio and delay versus the number of large packet transmissions. The simulation results show that after 200 large packet transmissions, all the curves show a downward trend, and the curve of QoS0 decreases the fastest. The reason is that the packet-loss ratio of QoS0 decreases due to the channel gain improvement after 200 large packet transmissions, while QoS1 and QoS2 are less affected by the channel gain owing to the retransmission mechanism. DR-MQLS outperforms the single and fixed QoS level selection strategies of QoS0, QoS1, and QoS2 in weighted sum of packet-loss ratio and delay by

The weighted sum of packet-loss ratio and delay versus the number of large packet transmissions.
Table 2 shows the delay versus the number of large packet transmissions. The simulation results demonstrate that the delay of DR-MQLS is slightly higher than that of QoS0. The reason is that there is no retransmission mechanism or deduplication process in the QoS0 level. QoS0 performs best in terms of delay but sacrifices the packet-loss ratio, as shown in Figure 3. When
Average delay versus the number of large packet transmissions.
DR-MQLS: delay-reliability-aware MQTT QoS level selection.
Figure 4 shows the optimal QoS level selection probability versus the number of large packet transmissions. The optimal QoS level selection probability of DR-MQLS converges to 60.10% when the number of large packet transmissions reaches

The optimal QoS level selection probability versus the number of large packet transmissions.
Figure 5 shows the weighted sum of packet-loss ratio and delay versus

The weighted sum of packet-loss ratio and delay versus
Figure 6 shows the impact of

The impact of
Conclusion
In this paper, aiming at the QoS guarantee problem for EIoT based on MQTT protocol, we proposed a UCB-based delay-reliability-aware MQTT QoS level selection algorithm named DR-MQLS to minimize the weighted sum of packet-loss ratio and delay under incomplete information. Compared with the single and fixed QoS level selection strategies, that is, QoS0, QoS1, and QoS2, DR-MQLS can reduce the weighted sum of packet-loss ratio and delay by
