Sage Journals: Discover world-class research

Abstract

Real-time multimedia transmission from smart physical terminals usually meets the challenges of quality and security problems in cyber physical social systems, such as in the Internet of things systems, social network, and industrial Internet of things. The quality of real-time video transmission is often disturbed by the instability of cyber-physical-social systems, and the security of real-time video can be compromised if fake contents are embedded into the video during the transmission. According to the video coding information and the conditions of cyber-physical-social systems like mobile network, we design a sliding window scheme to solve the out-of-order and loss problems and thus improve the quality of real-time video. In the scheme, the key video frame can be retransmitted in a dynamic range of sliding window. The scheme also simplifies the process of packet sorting and can quickly complete the process of unpacking and playing of the video stream with high accuracy. To defend the attack of inserting fake video into real-time video, we also design a detection algorithm based on time-related token. The experimental results show that the proposed methods can guarantee the continuity of video stream, improve the quality of image in the mobile network and ensure the security during transmission effectively.

Keywords

Real-time transmission cyber-physical-social system Internet of things security improvement sliding window

Introduction

With the rapid development of network infrastructure and mobile devices, video information transmission on the Internet is becoming more and more important, especially in the Internet of things (IoT) environments. Streaming media technology achieves the function of displaying as the video has been downloaded,¹ which becomes the preferred technology for all kinds of network video applications and gains various kinds of customer recognition. Mobile intelligent terminal equipment, such as smart phones and tablet personal computer (PC), has the natural convenience in socializing, online video displaying, video interaction, and many other application scenarios and becomes the mainstream equipment carrier of network video.

However, the network stability of mobile device is poor and can easily be effected by many factors like swinging of signal, shielding of building, and disturbing on signal source. Correspondingly, the negative effects on the network transmission are high packet loss ratio and network delay, and the negative effects on the user experience are video lag and poor effect issued real-time playing especially when the frame rate and distinguishability are higher.

The main reason for the above problems is that when the Internet service uses the best-effort way, packets would be dropped actively if network congestion appears. It will bring lots of effect on the real-time streaming media transmission and could not ensure the quality of video display.² In order to enable real-time streaming media to adapt to the characteristics of the Internet and improve the quality of streaming media services, the researchers have put forward some solutions and optimization program. In the aspect of transmission rate control, Parikh and Kim³ proposed a scalable approach to mitigate different levels of packet loss by classifying packets in real networks and using different slow packet loss methods for different types of packet loss. Bansal and Jain⁴ proposed a method to improve the receiver quality by controlling transmitter transmission based on the feedback of real-time transport protocol (RTP) receiver packet loss rule. Perkins and Singh⁵ defined a set of minimized constraints “circuit breakers” to control the RTP transmitter, which can protect the network from excessive congestion and enhance the user’s multimedia experience. Yang and Meng⁶ proposed a video transmission method based on feedback information which can avoid transmission delay jitter, reduce packet loss rate, and prevent network congestion according to the network parameters and information obtained by the RTP/real-time transport control protocol (RTCP) collection. In the aspect of data packet loss processing, Shen et al.⁷ proposed a video transmission method based on the forward error correction (FEC) flag to control the occurrence of packet loss in the FEC coding network. Melliar-Smith et al.⁸ proposed a method which can recover the burst and random lost packets by odd-even checking the packet during real-time multimedia communication. Frnda et al.⁹ analyzed the packet loss and delay in different situations and proposed a quality-of-service (QoS) model for estimating the triple play service, which can improve the quality of video service. In the aspect of buffer mechanism research, Lin et al.¹⁰ proposed a method to recover the lost packets using the buffer mechanism of periodic synchronization frames. In the aspect of data source and receiver optimization, Singh et al.¹¹ proposed a multipath algorithm in real-time streaming, which used the RTP service algorithm across multiple paths at the sending end and used the corresponding dithering algorithm at the receiving end to improve the transmission quality.

Meanwhile, security problems exist during transmission process, such as video forgery, which is a technique for generating fake video by altering, combining, or creating new video contents.¹² To this problem, in 2015, Patel and Patel¹³ proposed methodologies that used exchangeable image file format (EXIF) image tag information to detect the forgery region frames of given input video. Bozkurt et al.¹⁴ constructed correlation image using binarized discrete cosine transform (DCT) features extracted from the frames, and then estimated the exact location of the forgery line on the correlation image to detect the forgery. Sitara and Mehtre¹⁵ proposed a frame-shuffling detection method that exploits abnormalities in the spatio-temporal domain and compressed domain, which can localize and differentiate the type of tampering present in the video. Mathai et al.¹⁶ proposed a video-forgery detection and localization method based on statistical moment features and normalized cross correlation factor. Yao et al.¹⁷ detected object-based forgery in the advanced video by deep learning.

Most of the above research is only for the improvement of a single link for the entire real-time video play process, such as the optimal transmission for encoded video, the improvement for the transmission process and the simple processing only on sender or receiver. Moreover, they solved the problems by considering only the improvement of quality or just the security problem. There are few research on considering and optimizing the real-time video streaming sending, transmission, decoding, and security as a whole. Especially in the heterogeneous, low bit rate, high packet loss rate, strong interference network environment, and wireless networks environments, real-time streaming media transmission technology needs further research and improvement.

In this article, we analyzed the whole real-time video display process and proposed a method based on RTP protocol to improve the quality of H.264 real-time video display. This method achieves the RTP packet re-order of video data and the retransmission of missing keyframe by combining the network condition, the video resolution, and the frame rate. This method can effectively improve the quality of real-time video display under poor network environment conditions and ensure real-time performance at the same time. Later, we discussed the security problem and proposed a detection algorithm for real-time video based on time-related token to solve the problem that the video may be tampered with.

Real-time video streaming displayprocess analysis

In the process of real-time video display, video coding and streaming media transmission are the two most important links. Currently, the most widely used video coding compression standard is H.264/MPEG-4 AVC (H.264). The streaming media transmission protocol usually selects the RTCP/RTP protocol to meet the higher real-time requirements.

Based on the traditional hybrid coding framework with the predictive-transform pattern, H.264 can improve the compression rate by using multi-reference frame prediction, motion vector of 1/4 pixel precision, integer transformation, and intra space estimation but has the drawback of being susceptible to the transmission errors.¹⁸ A bit of error may cause a serious degradation of the decoding quality or even make the video fail to decode. The delayed packets may be discarded by the decoder because of the time expired.

H.264 encoding is structurally divided into video coding layer (VCL) and network abstraction layer (NAL). The VCL carries video-encoded data, and NAL is responsible for packaging and transmitting data. An H.264 video file consists of a set of network abstraction layer units (NALUs), which contains the encoder processing and the packaged video data.¹⁹ According to the type of slice included in NALU, there are three main types of H.264 protocol frames: I frame, P frame, and B frame. I frame, with the complete decoding information, is independent and has high data volume which can generate a complete picture. P frame uses one-way inter-frame prediction coding which needs to refer to the previous I frame or P frame for coding and has smaller data volume. P frame may be affected by the error or missing of the reference frame which can cause failure to normally decode. B frame adopts bidirectional inter-frame prediction coding, which needs the front and rear frames to be the reference frame.²⁰ The instantaneous decoder refresh (IDR) frame is a special I frame because the following P frame and B frame will not use any frame before the IDR frame as a reference frame.

B frame is not an essential frame during transmission and display process. P frame is also not an essential frame, but is generally indispensable. I frame and IDR frame are indispensable. During the transmission process, no matter what type of data frame is lost, it will lead to the decoding process error, the manifestations are flower screen, frame skip, and so on.

RTP achieves the real-time end-to-end transmission services based on the user datagram protocol (UDP). When the data transfer volume is large, packet re-ordering and packet loss may be occurred. Especially if the video resolution and frame rate are high or the network condition is poor, NALU fragment is more likely to lose, which can lead to the incomplete video data frame.

Specifically, when using RTP to transmit the NALU of the video stream, the sender encapsulates each NALU into a series of RTP packets. The sequence number is S₀, S₁, S₂,…, S_n. The receiving end performs the corresponding processing according to the received RTP packet number. In the ideal case, the sender wraps the video stream into a series of RTP packets with the sequence numbers S₀, S₁, S₂,…, S_n, and the receiver receives the complete packets in the order of S₀, S₁, S₂,…, S_n. Assuming that the receiving packet is S_ex, the currently received packet is S_rep

$S_{rep} = S_{i} (i = 0, 1, \dots, n)$ (1)

At this point, the received package is just the package to be unpacked, then the packet can be parsed and displayed directly. However, in real cases, due to the complexity and dynamics of the network, the transmission paths of the packets are not the same, and the arrival times to the receiving end are not the same either. Usually, there are many situations in which some latter data packages may arrive before the previous packages. Therefore, the distort problem of packet is inevitable. And as that happens, the receiver receives the packet S_rep, and the packet to be unpacked S_ex are not equal. If the received packets are unsorted, the parsed video data may be mistake, and the video could not be played.

As a consequence, in the receiver end, there must be a packet re-ordering process. For the received packet, if S_rep > S_ex, it will be cached and wait for the subsequent packet to be received. Until S_rep is equal to S_ex, then the subsequent packet is searched in the cache from S_ex to parse the RTP packet, and compose the NALU to decode and display. If the sequence number of cached packet is not continuous, it means that the RTP packet is lost, the processing procedure will be skipped, and all the RTP packets of current NALU will be discarded. This kind of treatment may cause video lag and flower screen.

In addition, the size of NALU of high-definition video encoded by H.264 is generally large. Because of the limits of maximum transmission unit (MTU), RTP needs to adopt the FU-A or FU-B way to package the video data (FU-A and FU-B, which are two versions of the fragmentation unit, identified with the NALU type numbers 28 and 29, respectively).²¹ The FU-A or FU-B approach essentially divides an NALU into multiple RTP packets, which increases the likelihood of out-of-order or packet loss during the transmission process and aggravates the decline of display quality.

Compared with the desktop environment, mobile device terminal is weak in computing, storage, and network performance, the impact of video play quality is more obvious. Therefore, it needs some special treatment to ensure video display effect.

Video data reconstitution algorithm

In this section, an improved video packet reconstitution algorithm is proposed. The core idea of this algorithm is to provide a variable time range for the RTP packets which belong to the same NALU. All the out of order and packet loss problems of RTP packets within an NALU will be handled in this time range. If the NALU is a keyframe, the retransmitted lost RTP packet is supported within a variable time range.

When the player end is receiving RTP packets, a hash table will be used to cache the RTP packets. In the hash table, the RTP packet number and RTP packet form a key-value pair. When the player receives an RTP packet, its serial number and content will form a key-value pair and will be inserted into the hash table. The player end keeps taking associated RTP packet from the hash table to restore NALU and to decode it, as shown in Figure 1.

Figure 1.

Use a hash table cache RTP packet.

As can be seen from Figure 1, an NALU may be packaged into one or more RTP packets. So, at the receiving end, the number of RTP packets being processed and received are displayed.

During the video transmission, the timestamps of the RTP packets belonging to an NALU are the same, and the RTP packet sequence number is continuous. Therefore, when the RTP packet is received, if the timestamp of the packet is changed, it indicates that an RTP packet of a new NALU has been received. And previously received RTP packets can be composed of a complete NALU, which can be directly provided to the decoder for decoding and display.

Assuming that the number and timestamp of the currently received RTP packet are r and T_r, the number and timestamp of the currently decoded RTP packet are d and T_d. During the reception process when the RTP packet timestamp changes, a set P_NALU containing the to-be-decoded RTP packet P_d to the previous packet P_r_–1 from the currently received packet can be obtained. The elements in the set are ordered. It is specifically stated as follows

$P_{NALU} = {P_{d}, P_{d + 1}, \dots, P_{r - 1}}$ (2)

When the number of elements of the set P_NALU is equal to r–d, it indicates that the data packets in the set are continuous and the NALU is complete, which can be directly supplied to the decoder for decoding.

When the number of elements of the set P_NALU is not equal to the value of r–d, it means that not all the RTP packets of the NALU have been received. The NALU cannot be completely formed. When the packet timestamp changes, it means that the RTP packet comes from a new NALU packet. The received RTP has been used to sort the hash table, so that without considering the case of packet out of order, and when the packet timestamp changes, there may be two cases: one is that the RTP timestamps of two consecutive NALU have changed, and the other one is that one or more RTP data between two NALU may be lost, so the RTP timestamps of two discontinuous NALU have changed, as shown in Figure 2. As can be seen in the figure, the RTP packets with number D and number R belong to the ath NALU and the (n + a)th NALU. There may be $N (N \geq 0)$ NALU between the two NALU, so this kind of discontinuities may occur on one element P_n or multiple elements P_n, P_n₊₁,… (n = d, d + 1,…, r–1).

Figure 2.

RTP transmission H.264 NALU.

When the elements in the set P_NALU appear discontinuous, if the NALU is directly sent to the decoder, the image quality will have problems. In order to ensure the real-time feature and improve the image quality to a certain extent, we use a sliding window to deal with the incomplete processing NALU.

The sliding window is to achieve the purpose of order correction by waiting for the packet with small serial number but late received. The sliding window has two time boundaries: the left boundary W_l and the right border W_r. The main factors that affect the size of the sliding window include the average number of I frames packed into RTP packets, video parameters, and bandwidth.

W_l can be computed by the following function. The size of the sliding window is at least slightly larger than Num, so that all RTP packets of the I frame after the out-of-order can be reached as much as possible

$W_{l} = Num + \frac{Num \times MTU}{S} \times (fps \times Num)$ (3)

where Num is a fixed value which represents Num RTP packets, MTU represents the maximum transmission unit (1500 bytes), S represents the number of video bytes received per second by the display program, and fps represents the resolution of the video.

W_r can be computed by the following function

$W_{r} = λ \times f \times Num + W_{l}$ (4)

where f × Num means ideally the number of packet received by the player end for 1 s, and λ is the configuration parameter of the program and indicates the maximum delay time adjustment factor of the player end.

The relationship between the difference value of the number of unreceived packet and received packet w and the window boundary W_r and W_l is determined by the following process:

When 0 < w < w_l, it indicates that the time difference between the unreceived packet and the received packet is small, so the decoding process can continue to wait.

When w_l < w < w_r, the NALU is re-organized by all the RTP packets in the set and is ready to decode. For the RTP packet of non-keyframe, the simplest method is to discard all the data of the set and discard all the RTP packet of non-keyframe, until the RTP packet of keyframe is received then start the process. If the RTP packet in the set belong to some keyframe, it needs a retransmission treatment. For the retransmission packet, there must be a timeout mechanism, the mechanism of overtime can also correspond to the number of packets, which change the value of w_l and let w_l = w_l + Num, then going into next reception.

When w > w_r, the RTP packet in the entire set needs to be discarded and reprocessed from the current receive packet P_r.

The entire pseudo-code of the algorithm based on sliding window and keyframe is as follows:

Begin

W_l = Num

W_r = W_l ₊ $λ$ *f*Num

T_o = T_n = 0

Nd = Nr = 0

Wt = 0

while isStopRecv:

P = recvRTPFromNet()

T_n = P.timestamp

Nr = P.seq

if T_n == T_o:

InsertRTPPairToMap()

else:

T_o ₌ T_n

PNALU = GetRTPPacketFromMap(Nd,Nr-1)

N = GetNotReceivePacketSeq(PNALU)

if N != -1:

if N < W_l:

do nothing

continue

elseif W_l _< N < W_r:

if isIDRFrame(PNALU):

SendReSendReq(N)

W_l ₌ W_l ₊ Num

continue

delete PNALU

continue

else:

SendToDecodeThread(PNALU)

Nd = Nr

if (Sometime later):

$W_{l} = Num + Nu m^{2} \times fps \times MTU / S$

W_r = W_l ₊ $λ$ *f*Num

In the practical application, the video transmission process and decoding process of multimedia application mainly require developers to achieve it by themselves. Our method can be used in the stage including getting video data from network video recorder/digital video recorder (NVR/DVR), packet and transmission, and packet organizing and display. The proposed method does not need to rely on NVR/DVR manufacturers to modify or update, it is very easy to operate.

Video-forgery detection method

Due to the lack of security design, attackers can exploit the following methods to achieve the attack purposes in the existing video surveillance technology.

The attacker can implant a virus, Trojan horse or other malicious code in a monitor computer or other devices to make the monitoring terminal continue playing a non-real-time fake video.

The attacker can transmit non-real-time pseudo-monitoring video to the monitoring terminal by a certain attack method from a point between the camera and the monitoring terminal.

All these attack methods will cheat the monitoring staffs so that the real situation of the monitored location cannot be obtained.

Video security problem during transmission can be viewed as the problem of distinguishing the tampered videos from untampered original ones reliably. The diagram of video forgery is shown in Figure 3. Assume the frame insertion occurs, frame-m to frame-n is a forgery part, frame-m and frame-b are connections. No matter what the inter-frame forgery method is, the frames around the connection must be the same without any traces left in visual.²² That means frame-a and frame-m should be the same and so are the frame-n and frame-b.

Figure 3.

Video-forgery diagram.

In this case, we propose a video-forgery detection method based on time-related token to solve this problem. The main idea is to add a field between RTP packet header and video data to represent token. One token should be related to time, cannot be calculated from the previous token, and can be verified on another side. Moreover, the token values in the RTP packets of one frame (P/I frame) are same, and the token values of RTP packets in different frames are different. In real-time video transmission, both the sender and the receiver can calculate the token of each frame separately. By this way, when receiving the RTP packets, the receiver can verify the consistency of the token in the received packets with its own calculated token, and then detect the inserted forged video when the attack occurs.

How to generate token is the key point in this method. We propose an algorithm to meet the requirement, as shown in Figure 4. The generation of token needs to be associated with the start time of playing to ensure that it is associated with a video play process. For a play process of real-time video, the generation sequence of token should be same at the video sender and receiver. That is, the token generated by the sender must be consistent with the token generated by the receiver. To solve this problem, we should design the function for generating the random number sequences based on the seeds, and make the same seed produce the same random number sequence. In this way, we can provide same seed for sender and receiver during one transmission, so they can generate the same token in the corresponding number of times. In this method, we need to pay attention to the processing and verification of token when packet loss and I frame retransmission occur.

Figure 4.

The diagram of generating token sequence.

Experiment results and analysis

The camera used in our experiment is DS-2CD4232FWD-I (HIKVISION), three kinds of real-time video stream are used to be the sampled data. The encoding information is (A) 15 fps, 720 × 480; (B) 15 fps, 1280 × 720; (C) 15 fps, 1920 × 1080, respectively, choosing 8 min video stream of A, B, and C as the sender of the video input source.

The experiment is carried out in the internal network, the sender PC and the player-end mobile phone are in different internal network. The sender PC uses the wired connection, and the mobile phone uses the wireless connection. The network transmission between the two subnets in the intranet does not fluctuate for a certain period of time. All the experiments in the following are completed in a continuous period of time.

The main goal of the proposed algorithm is to improve the quality of real-time video pictures and ensure the security of real-time video in a network environment with poor transmission conditions, and there is no significant impact on the real-time performance of video. In order to show the advantages of the algorithm in this article, we mainly compare two normal transmission scenarios with our algorithm. The test scenarios are:

UDP-based standard RTP/RTCP transmission scenarios. This transmission is used by OpalVOIP, sipdroid, and IMSDroid. During the processing of RTP reception, the NALU that has lost RTP packets will be discarded and avoid the mistake caused by the wrong NALU received by the decoder. In this experiment, we use sipdroid as the contrast sample.

TCP-based NALU transmission. Here, the comparison object mainly refers to the TCP-based RTMP protocol. The softwares using the RTMP protocol are often some SDK provided by the camera manufacturers such as HCNetSDK of HIKVISION and some source softwares such as FFmpeg, ijkplayer. We use ijkplayer as the contrast sample.

Our improved RTP protocol method will be referred to as RTP+.

The experiment will analyze the algorithm and other methods from four aspects: video image quality, image delay, transmission efficiency, and transmission security.

Video image quality

In the same network environment, three types video stream (A, B, and C) are all displayed in the abovementioned three scenarios. The number of flower screen and discontinuous screen will be recorded, and for the comparison, the results are shown in Figure 5. As shown in the figure, the horizontal axis represents the transmission and processing mode, and the vertical axis represents the number of times the abnormal display of the video. The number of these abnormal displays is counted in the 8-min display time.

Figure 5.

Picture quality comparison.

According to the results in Figure 5, we can see that with the increase of the video resolution and the data size, the algorithm proposed in this article presents a dominant growth. When playing a 720p video, the screen in the scene 3 display screen is significantly less than the use of scene 1 transmission, but for the scene 2, there is no particular advantages. And when playing 1080p, the image quality advantage of our method is obvious. We have less video lag than scene 2 and less flower screen than scene 1.

The algorithm in this article discards the incomplete NALU of the non-I frame, and retransmits the incomplete I frame, which can reduce the number of flower screen and video lag times. The discarding of the non-I frame may cause the frame skip, but it will not affect the overall effect of display during the actual monitoring process.

Image delay

In order to quantify the value of the screen delay $Δ t$ , we should first eliminate the time difference between the sender and the receiver. $Δ t$ can be calculated based on the transmission time T_s of the first packet and the reception time T_r

$Δ t = | T_{s} - T_{r} |$ (5)

In the experiment, when recording the sending time and reception time in the player end of some I frame, using $Δ t$ to amend the difference between the two times, which is the quantizable retardation value of the image.

We use the video with 1080p, 15 fps H.264 format as the input source and compare the display effect between different methods. For every scene, we choose the sending time and the time providing to the decoder of 30 I frame. After using Δt amendment, the difference between the two times indicates the delay of different methods. The results are shown in Figure 6.

Figure 6.

Delay time comparison.

As can be seen from Figure 6, the delay of the scenario 2 is significantly greater than that of scene 1 and scene 3, and even some NALUs have multiplied. The processing latency of NALU in scene 1 is minimal. In scenario 3, the NALU processing delay is relatively stable, but slightly larger than the scene 1. This result can also be predicted by analysis, when the data transmission is heavy, packet loss will happen because of the congestion. The TCP protocol approach is to achieve the retransmission and to confirm, and the standard RTP/RTCP is based on the UDP protocol, which has no mechanism to guarantee the reliability of the data. The algorithm in this article retransmits the keyframe on the basis of dealing with out-of-order. This retransmission mechanism has time-limited efficacy, which does not guarantee that the lost data can certainly be retransmitted. Based on the analysis of these results, the proposed algorithm has obvious advantages when the video resolution and video quality are at a high level.

Transmission efficiency

We use the received data from the player end to compare the video transmission efficiency in scene 1, scene 2, and scene 3. The results are shown in Figure 7. The horizontal axis in the figure represents the time and the vertical axis represents the number of transmitted data. When the video source is 480p, the transmission rate of three scenes is similar, but when the video resolution increases, the algorithm in this article will show a certain advantage.

Figure 7.

Transmission efficiency comparison.

Scene 2 uses TCP to transmit, and due to the mechanism of TCP, the video transmission speed is slower than UDP. Scene 1 and scene 3 use UDP to transmit video, so the video transmission speeds are faster. The algorithm in this article has obvious advantages over scene 2.

For the video transmission in scene 1 and scene 3, the scene 3 needs to retransmit the keyframe if the keyframe is lost, so the average transfer rate is lower than the scene 1. Furthermore, both scene 1 and scene 3 contain control commands, of which scene 1 uses RTCP to provide the feedback information of the data transmission and scene 3 will inform the sender to retransmit the lost data if the keyframe is lost. RTCP protocol packets are sent intermittently, and the keyframe retransmission request is sent out only if the keyframe is lost. So, in the control information, the data size transmitted in scene 3 is less than scene 1.

Of course, the network environment has great effect on the transmission rate comparison. In a very poor network environment, if the number of RTP packet loss is frequent, then the possibility of keyframe loss will become larger, and the number of retransmissions will increase. If the retransmission request is implemented by using the UDP protocol, the retransmission request may be lost due to the poor network, which makes the keyframe that can be really retransmitted reduce, so the efficiency of our algorithm is almost similar to the scene 1.

Transmission security

In the experiment, the original video sequences and result of frame insertion are shown in Figure 8. In this case, frame number a and b are faked frames inserted in frame location between frame 6 and 7. We examine the forgery by our method. The result is that our method that can detect the token generated by the receiver is not consistent with the token in the package. Meanwhile, the calculation time of token is positively correlated to the number of frames and size of resolution in original video, as shown in Table 1.

Figure 8.

Original video sequence and result of frame insertion.

Table 1.

Calculation time of token with different frames and resolution.

Σ frames original video	Resolution	Time calculation (s)
152	480p	~0.98
	720p	~1.95
	1080p	~3.14
336	480p	~3.33
	720p	~5.41
	1080p	~7.86

Conclusion

The quality of real-time video transmission is often disturbed by the instability of cyber-physical-social system (CPSS), and the security of real-time video can be compromised if fake contents are embedded into the video during the transmission. According to the video coding information and the conditions of CPSS-like mobile network, we analyzed the receiving and display process of the real-time video stream, and discussed the reassembling process of the received data packet in the receiving end. Then we proposed an improved data packet reassembling algorithm aiming at the problem of instability of the mobile client network and its resulting video lag, video interrupt, or even shutdown. The algorithm used the hash table to cache the data packet, carried on the keyframe retransmission for the missing packet, then used a sliding window mechanism to treat the video transmission. According to the position of the difference between the unpacking serial number and the receiving packet serial number in the window, the anomaly of the video transmission can be handled. The algorithm is different with the traditional method which used timestamp and serial number to sort the packet, which can guarantee the accuracy and streamline the process of video packet sorting and quickly complete the sequential unpacking and display process. We also proposed a method based on time-related token to ensure security of real-time video transmission. On the process of receiving packages, the receiver examines whether the token in the package is consistent with its own calculated token and can detect the video forgery when the attack occurs.

The experimental results show that the proposed algorithm not only improves the video clarity in the case of excellent network environment but also improves the video display in the poor network environment such as mobile terminal equipment and the wireless mobile network, and the network sending and receiving signals are poor. The fluency and accuracy of our algorithm have obvious effect. When the bandwidth is poor, it will lead to the increased data transmission delay. The transmission time of each video frame is longer, and when the transmission time is greater than the video display waiting time, it will cause the screen to stop and waiting for data buffer. In such cases, the keyframe retransmissions in the algorithm may aggravate the network congestion, which will result in a delay increase. Therefore, how to adjust the retransmission strategy according to the bandwidth situation is also a focus of follow-up research. In the follow-up study, we will study the effects of bandwidth on the algorithm especially under low-bandwidth conditions, how to further improve the video picture quality, and how to further reduce the screen delay.

Footnotes

Handling Editor: Shancang Li

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research,authorship,and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research,authorship,and/or publication of this article: The research presented in this paper is supported in part by the National Natural Science Foundation (grant no. 61571360),the Shaanxi Science & Technology Co-ordination & Innovation Project (grant no. 2016KTZDGY05-09),and the Innovation Project of Shaanxi Provincial Department of Education (grant no. 17JF023).

References

Pan

Zou

Liu

et al . Buffer management for streaming media transmission in hierarchical data of opportunistic networks. Neurocomputing 2016; 193: 42–50.

Gorlatch

Humernbrum

Glinka

Improving QoS in real-time internet applications: from best-effort to software-defined networks. In: 2014 international conference on computing, networking and communications (ICNC), Honolulu, HI, 3–6 February 2014. New York: IEEE.

Parikh

Kim

Methods for mitigating IP network packet loss in realtime audio streaming applications. SMPTE Motion Imag J 2014; 123: 28–34.

Bansal

Jain

Providing feedback to media senders over real time transport protocol (RTP). US20140043994 Patent, 2014.

Perkins

Singh

. Multimedia congestion control: circuit breakers for unicast RTP sessions. IETF Internet Draft, 2013, https://tools.ietf.org/pdf/draft-ietf-avtcore-rtp-circuit-breakers-02.pdf

Yang

Meng

Design of control method for real-time stream media transmission in video surveillance system. J Zhejiang Univ Technol 2012; 40(4): 454–457.

Shen

Moore

Wang

et al . Lost real-time media packet recovery. US8819513 Patent, 2014.

Melliar-Smith

Moser

Koh

CC.

Forward error correction for burst and random packet loss for real-time multi-media communication. US8230316 Patent, 2012.

Frnda

Voznak

Sevcik

Impact of packet loss and delay variation on the quality of real-time video streaming. Telecommun Syst 2016; 62(2): 265–275.

10.

Lin

Xia

Hsu

et al . Use of frame caching to improve packet loss recovery. US20160366445 Patent, 2016.

11.

Singh

Ahsan

Ott

. MPRTP: multipath considerations for real-time media. In: ACM multimedia systems conference, Oslo, 26 February–1 March 2013, pp.190–201. New York: ACM.

12.

Shih

. Video forgery. In: International conference on network-based information systems, Oslo, 7–9 September 2011, pp.7–11. New York: IEEE.

13.

Patel

MM.

An improvement of forgery video detection technique using error level analysis. Int J Comput Appl 2015; 111(15): 26–28.

14.

Bozkurt

Ulutaş

A new video forgery detection approach based on forgery line. Turk J Electr Eng Co 2017; 25: 4558–4574.

15.

Sitara

Mehtre

. A comprehensive approach for exposing inter-frame video forgeries. In: IEEE 13th international colloquium on signal processing & its applications (CSPA), Batu Ferringhi, Malaysia, 10–12 March 2017, pp.73–78. New York: IEEE.

16.

Mathai

Rajan

Emmanuel

. Video forgery detection and localization using normalized cross-correlation of moment features. In: 2016 IEEE southwest symposium on image analysis and interpretation (SSIAI), Santa Fe, NM, 6–8 March 2016, pp.149–152. New York: IEEE.

17.

Yao

Shi

Weng

et al . Deep learning for detection of object-based forgery in advanced video. Symmetry 2017; 10(1): 3.

18.

Wang

. Two-dimensional reversible data hiding-based approach for intra-frame error concealment in H.264/AVC. Signal Process: Image 2016; 47: 369–379.

19.

Deng

Ding

et al . Efficient block-based transparent encryption for H.264/SVC bitstreams. Multimedia Syst 2014; 20(2): 165–178.

20.

Wang

Alvarez-Mesa

Ching Chi

et al . Parallel H.264/AVC motion compensation for GPUs using OpenCL. IEEE T Circ Syst Vid 2015; 25(3): 525–531.

21.

Wang

Y-K

Sanchez

Schierl

et al . RTP payload format for high efficiency video coding (HEVC), https://tools.ietf.org/html/rfc7798.2016

22.

Liang

Tian

et al . A novel video inter-frame forgery detection method based on histogram intersection. In: 2016 IEEE/CIC international conference on communications in China (ICCC), Chengdu, China, 27–29 July 2016, pp.1–6. New York: IEEE.

Improving the security and quality of real-time multimedia transmission in cyber-physical-social systems

Abstract

Keywords

Introduction

Real-time video streaming displayprocess analysis

Video data reconstitution algorithm

Video-forgery detection method

Experiment results and analysis

Video image quality

Image delay

Transmission efficiency

Transmission security

Conclusion

Footnotes

Declaration of conflicting interests

Funding

References