Abstract
Introduction
Recent years have witnessed a growing interest in research on the Internet of Things (IoT), 1 and the latest developments in radio frequency identification (RFID), smart sensors, communication technologies, and Internet protocols have made the concept of IoT feasible. 2 One of the recent advances in the computing world is cloud computing, which offers vast resources that can be flexibly used to fit changing demand. 3 The large amount of inexpensive virtual computing power available at the cloud center makes cloud computing an excellent platform for processing big data. However, cloud computing often leads to high latency when using services placed at distant locations. 4 To address this problem, edge computing 5–7 has recently received intensive attention in both academia and industry. Edge computing aims to place resources for storage and computation closer to the edge of the network. Due to its geographically distributed deployment and proximity to consumers, it can support applications and services with reduced latency and improved Quality of Service. 8 Unfortunately, edge computing units usually do not have sufficient storage and computing resources to handle the massive amount of IoT data. 9
To fully exploit the advantages of both cloud computing and edge computing, several works 10–12 introduce collaboration between edge computing and cloud computing. Inspired by this idea, we develop an intelligent visitor system based on cloud and edge collaborative computing.
In many enterprises and government departments, there are reception posts to register visitors. From traditional manual processing to computer system management, without exception, all largely rely on human participation. Such a management model is not only inefficient, but also error-prone and low in security. In the intelligent visitor system, the visitor information is obtained through sensors, and the cloud service of artificial intelligence (AI) is applied to integrate face recognition 13 and speech recognition 14 technologies into the system, which can greatly improve the efficiency of visitor registration and the accuracy of recorded data, making management more convenient. Recently, Wang et al. 15 designed a remote visitor system based on the near field communication technology to improve the level of automation of the visitor system. Luo et al. 16 developed an intelligent key management system based on face recognition technology. However, the above two works do not involve cloud and edge collaborative computing.
In this paper, we develop an intelligent visitor system based on edge computing hardware and an AI cloud. It uses voice guidance and face recognition to let visitors complete self-registration in an unmanned environment and ensures that the recorded data are accurate and up to date, while reducing the influence of human factors on the system and greatly improving work efficiency. With the framework of cloud-edge collaboration, the problem of insufficient AI computing capacity at the edge is remedied. Moreover, since the model is deployed in the cloud, it is easier to update and maintain. Because of the novel coronavirus, there is currently a demand for temperature detection of visitors. Therefore, an infrared (IR) temperature measurement module and an alarm mechanism are introduced into the visitor system to protect the health of the staff. To sum up, the system we developed differs from general visitor systems based on AI cloud services in two ways. First, our system employs the ID card reader at the edge to verify the visitor’s identity. Second, combined with the human body temperature sensor at the edge, our system can obtain the visitor’s temperature and record it in real time, which suits the current requirements of routine epidemic prevention and control. General visitor systems do not have these two advantages.
In summary, our contributions are listed as follows:
(1) We design an intelligent visitor system based on edge-computing hardware and an AI cloud.
(2) Our system can fully utilize the merits of both cloud computing and edge computing.
(3) We introduce an IR temperature measurement module and an alarm mechanism into the visitor system.
Related work
Lee et al. 17 proposed an IoT-based visitor detection system that uses an infrared sensor to detect the human body and two ultrasonic sensors to locate visitor servo motor under the position. Lin et al. 18 developed a local binary pattern (LBP)-based visitor access control system for senior citizens. To effectively control the flow of visitors, Gowtham et al. 19 presented a visitor gate pass management system. Gautam et al. 20 designed a smart system to handle visitors’ entry into houses or premises automatically. The system is able to keep track of the people visiting, their condition while entering the premises, their safety gear, etc. Jayanth et al. 21 proposed a real-time surveillance system using an IoT module and the Raspberry Pi. Recently, Koshe et al. 22 created an IoT system for contactless guest approval using Raspberry Pi and Arduino.
Cloud and edge collaborative computing technology
Edge computing is a new computing model associated with the rapid development of the IoT, AI, and big data. It deploys smart terminals with computing, storage, and other capabilities on the source side of the network, providing intelligent services at the edge to meet the industry’s needs for real-time operation, application intelligence, and security. Low energy consumption, intelligence, real-time capability, and reliability are the main directions of edge computing research.
Cloud computing depends on virtualization technology to effectively integrate and manage system resources, in order to provide users with efficient computing services, and reduce enterprises’ investment in basic resources such as equipment and networks.
In cloud-edge collaboration, the cloud is responsible for data storage, model training, algorithm update, and other tasks, while the edge is responsible for on-site data collection, storage, transmission, identification, and execution of the cloud results. According to the division of computing tasks, cloud-edge collaboration can be divided into three modes:
(1) Training/computing cloud-edge collaboration. The edge terminal collects and uploads data, downloads the model, and performs the related local computing tasks, while the cloud performs model design, training, and update.
(2) Cloud-oriented cloud-edge collaboration. Both model training and model calculation are placed in the cloud. The edge side transmits data to the cloud through the Application Programming Interface (API) provided by the cloud and directly obtains the calculation results of the model.
(3) Edge-oriented cloud-edge collaboration. The edge downloads the initial trained model from the cloud. During use, it not only calls the model for calculation, but also performs real-time training and update.
Modes 1 and 3 are suitable for scenarios with strong computing power at the edge and high energy consumption. They can work offline, but they are difficult to integrate. Mode 2 is suitable for scenarios with weak hardware capabilities and low energy consumption. It requires online operation, but integration is less difficult. With the development of 5G networks, 23 cloud-oriented cloud-edge collaboration technology will be promoted and applied in the future.
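The trade-offs among the three modes can be summarized as a small decision rule. The following Python sketch is only illustrative: the function name, its parameters, and the rule itself are assumptions made for this example and are not part of the system described in this paper.

```python
from enum import Enum


class Mode(Enum):
    TRAINING_COMPUTING = 1  # edge runs a downloaded model, cloud trains it
    CLOUD_ORIENTED = 2      # cloud trains and runs the model, edge calls an API
    EDGE_ORIENTED = 3       # edge runs the model and keeps training it


def choose_mode(edge_is_powerful: bool, must_work_offline: bool,
                edge_retrains: bool = False) -> Mode:
    """Pick a collaboration mode from coarse deployment constraints (assumed rule)."""
    if not edge_is_powerful:
        # Weak hardware can only use Mode 2, which requires online operation.
        if must_work_offline:
            raise ValueError("a weak edge device cannot work offline")
        return Mode.CLOUD_ORIENTED
    # Strong hardware: Modes 1 and 3 both support offline work.
    return Mode.EDGE_ORIENTED if edge_retrains else Mode.TRAINING_COMPUTING
```

Under this rule, a low-power terminal with a stable network connection, such as the one used in our system, falls into Mode 2.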
System structure
The structure of the cloud-edge collaborative intelligent visitor system is shown in Figure 1. This includes two parts: (1) cloud service layer and (2) edge computing layer. Through the applications of cloud face recognition and speech recognition technologies, the system allows the edge side to process visitor information in real-time. Combined with the information fed back by the edge sensors, the visitor’s autonomous registration function can be achieved in unmanned scenarios.

Figure 1. System architecture. Our proposed system can be logically divided into the cloud service layer and edge-computing layer. Specifically, the cloud service layer includes AI cloud service and visitor application cloud service. The edge-computing layer provides the visitor system with local computing capability. The core motherboard uses the RK3399 development board, runs the Android 8.0 system, and uses USB, UART, I2C, and other interfaces to connect external devices. AI: artificial intelligence; USB: universal serial bus; UART: universal asynchronous receiver–transmitter; I2C: inter-integrated circuit.
The edge-computing layer provides local computing power for the visitor system. The core motherboard uses the RK3399 development board, runs the Android 8.0 system, and connects external devices by universal serial bus (USB), universal asynchronous receiver–transmitter (UART), inter-integrated circuit (I2C) and other interfaces. The camera acquires the visitor’s face image. The microphone array acquires the visitor’s voice, the ID card reader reads the ID card of visitors, and the IR temperature sensor obtains the face temperature of visitors.
The cloud service layer is composed of AI cloud services and application services. The AI cloud provides image recognition and speech recognition services for the visitor system and recognizes the image and voice data sent from the edge. By calling the cloud API, the edge can quickly obtain AI services. The application services mainly provide data storage, query, and related services for the visitor system, enabling remote management of the visitor system and providing data access interfaces for edge computing.
System design
In the cloud-edge collaborative intelligent visitor system, the visitor can interact with the system through the intelligent terminal and the administrator uses the cloud application to manage the terminal data and visitor services, and obtain visitor information. Visitors and administrators use different platforms. Edge terminal applications are run on the Android platform. Visitors can manipulate the system through speech and touch screen. The management terminal manages the system operation by using a browser to access the cloud application. Figure 2 presents the system structure.

Figure 2. System diagram of application software.
The system is divided into four layers. The top layer is the front-end application layer, which provides users with human–machine interfaces: mainly an Android App running on the edge terminal devices and a browser.
The API layer provides a data access interface for the front-end interface, mainly including visitor registration, visitor sign-in, visit record query, Android message push, and related interfaces that integrate third-party services.
The platform layer includes Baidu AI cloud platform, application server Tomcat, and third-party cloud services, which provide an operating environment and technical support for the API.
The data storage layer contains a MySQL database and a face feature database that store visitor-related data. The feature database is stored on the Baidu AI cloud, so the application side manages the face data through the API provided by Baidu AI cloud.
Edge hardware design
The traditional visitor system mainly relies on the receptionist to enter the visitor information in the management software, while the intelligent visitor system employs intelligent edge-computing devices to help visitors complete the registration by themselves. The hardware device is built around the RK3399 motherboard, which integrates a six-core 64-bit ARM processor with a main frequency of up to 1.8 GHz, providing sufficient computing power for edge computing. With a wireless fidelity (Wi-Fi) module, 4 GB of memory, and 32 GB of eMMC storage, the board can run both Android and Linux.
The hardware integrated on the periphery of the motherboard includes a six-microphone array, a USB camera, an ID card reader, a Panasonic AMG8833 IR thermal array sensor, and a USB thermal printer. The six-microphone array is used for voice wake-up and voice recognition. In a noisy environment, it removes as much of the background sound as possible, leaving only the required sound waves. The ID card reader is used to read the visitor’s ID card and perform identity authentication, and the IR thermal array sensor is used to detect the visitor’s temperature. The USB thermal printer can output QR codes and text information so that the access control system can confirm the identity of visitors. The hardware composition is shown in Figure 3.

Figure 3. Hardware diagram of the edge device.
Edge application design
The edge terminal application software is implemented as an Android App. It consists of four main modules, that is, AD/wake-up, registration items, visitor face registration, and voice recognition guide, which are shown in Figure 4.

Figure 4. Android App module.
AD/wake-up
The visitor system usually needs to run without interruption. The application starts automatically when the system is turned on. When no one needs service, it plays a promotional image and waits to be woken up. The main interface can be entered in three ways: a voice wake-up word, face recognition, and tapping on the touch screen. The wake-up step is needed because it reduces unnecessary access to cloud services.
The six-microphone array provides a hardware wake-up function. When the designated wake-up word is detected, the array sends the sound intensity and position information through the hardware’s UART interface. The App enters the main interface by monitoring this interface data.
Face recognition wake-up adopts Android’s own FaceDetector component. It continuously acquires images from the camera and, when a face is detected in an image, enters the main interface.
Touch wake-up uses the touch event in the Android activity. By monitoring the touch event, the system can determine whether there is a finger touching the screen, and if so, enter the main interface.
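The role of the wake-up step as a gate in front of the cloud services can be sketched as follows. This is a minimal Python illustration rather than the Android implementation; the class, method, and event names are hypothetical.

```python
WAKE_EVENTS = {"voice", "face", "touch"}  # the three wake-up paths described above


class WakeGate:
    """Tracks whether the terminal is idle (promotional image) or active."""

    def __init__(self) -> None:
        self.active = False

    def on_event(self, event: str) -> bool:
        """Wake the terminal on any recognized trigger and return the new state."""
        if event in WAKE_EVENTS:
            self.active = True
        return self.active

    def sleep(self) -> None:
        """Return to the idle promotional screen, for example after a timeout."""
        self.active = False

    def may_call_cloud(self) -> bool:
        """Cloud requests are only allowed after the terminal has been woken."""
        return self.active
```

Because every cloud request is checked against the gate, an idle terminal generates no cloud traffic, which is the stated purpose of the wake-up mechanism.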
Visitor face registration
In order to ensure the authenticity of the visitor data records, the first time the visitor uses the system, the visitor needs to register in the system. The registration module saves the verified visitor’s face photo and identity information in the visitor database. After the registration is completed, visitors can complete the visit registration by face recognition, thereby improving work efficiency.
To verify that the visitor and the ID card match, the ID card reader reads the internal data of the ID card (RFID), from which the edge terminal obtains the visitor’s ID card number, name, address, and ID photo. The real identity of the visitor is judged mainly by comparing the face photo taken on the spot with the ID photo. The workflow is shown in Figure 5.

Figure 5. Service flow in the visitor registration.
The terminal application verifies that the visitor matches the ID card by using Baidu Cloud’s AI service. The service is called through a RESTful interface, and both requests and responses use JSON format data. The detailed process is as follows:
(1) The edge terminal first reads the electronic data of the ID card and parses out the ID card number, ID card photo, and related information of the visitor.
(2) The edge terminal uses the camera to obtain the face image of the visitor.
(3) The two images are optimized and compressed in size.
(4) The images are encoded in Base64, and a request data packet is generated according to the parameters required by the API of the AI cloud.
(5) The request is sent to the AI cloud service, and the cloud service checks the account information.
(6) Face recognition is performed on the images that meet the calling conditions, and the face recognition result is returned.
(7) The application parses the returned results to obtain the recognition rate and related information for judgment.
(8) After the comparison succeeds, the ID number is used as the key for the image obtained by the camera, and the face features are recorded in the Baidu AI cloud face database to complete the registration in the face database. Meanwhile, the cloud service interface is used to save the relevant information for this visitor registration.
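Steps (4) and (7) of the process above can be illustrated with a short Python sketch. The JSON field names (`image`, `image_type`, `source`, `error_code`, `result`, `score`) and the threshold of 80 are placeholders chosen for this example; they should not be read as the actual schema or settings of the Baidu AI cloud API.

```python
import base64
import json

MATCH_THRESHOLD = 80.0  # assumed cut-off; the real value is a deployment setting


def build_match_request(id_photo: bytes, live_photo: bytes) -> str:
    """Base64-encode both images and wrap them in a JSON request body."""
    images = [
        {
            "image": base64.b64encode(img).decode("ascii"),
            "image_type": "BASE64",
            "source": src,  # hypothetical tag telling the two photos apart
        }
        for img, src in ((id_photo, "id_card"), (live_photo, "camera"))
    ]
    return json.dumps(images)


def is_same_person(response_body: str, threshold: float = MATCH_THRESHOLD) -> bool:
    """Parse the JSON response and compare the similarity score with the threshold."""
    reply = json.loads(response_body)
    if reply.get("error_code", 0) != 0:
        return False  # treat any service error as a failed comparison
    return float(reply["result"]["score"]) >= threshold
```

Only when `is_same_person` returns true would the terminal proceed to step (8) and register the face under the visitor's ID number.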
Face sign-in
After visitors complete the registration, they can sign in through the face recognition service. When the visitor selects this module, the system uses the camera to capture the visitor’s face image and searches the selected face database for it. The process is depicted in Figure 6.

Figure 6. Service flow in the visitor sign-in.
The sign-in module is similar to the registration process. It calls the Baidu AI cloud face search service, encodes the image captured by the camera, and sends it to the corresponding interface of the cloud service. The cloud will return a list that matches the current face image, where the one with the highest similarity ranks in the first position. We only need to determine whether the first face data has reached the recognition rate we set.
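The decision rule for sign-in can be sketched in a few lines of Python. The field names `user_id` and `score` and the threshold of 80 are assumptions for illustration, and the candidate list is assumed to arrive sorted by descending similarity, as described above.

```python
from typing import Dict, List, Optional


def pick_visitor(candidates: List[Dict], threshold: float = 80.0) -> Optional[str]:
    """Return the ID of the best match, or None if it is below the threshold."""
    if not candidates:
        return None  # the search returned no faces at all
    best = candidates[0]  # first entry has the highest similarity
    return best["user_id"] if best["score"] >= threshold else None
```

For example, a reply whose first entry has a score of 93 signs that visitor in, whereas a reply whose best score is 41 is treated as an unregistered face.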
Speech recognition
In the Android application of the edge terminal, Baidu AI cloud’s speech recognition technology is applied for guest guidance throughout the visit. According to the screen prompts, visitors can operate the system by voice instead of clicking a button. Besides, the system also supports voice playback of data query results. The essence is the application of automatic speech recognition 24 and text-to-speech 25 technologies.
In the intelligent visitor system, the system interaction mainly adopts the form of short text questions. In order to control the operation of the program, the system submits the visitor’s voice to the cloud for recognition to obtain the corresponding text and parse the text keywords. Figure 7 shows the process of using speech recognition.

Figure 7. Service flow in the visitor’s speech recognition.
First, the user sets the relevant parameters (language, sampling rate, etc.) of speech recognition and playback and specifies the speech recognition listener. To avoid blocking the user interface, recognition runs in a background thread. The background thread sends a request to the cloud speech recognition interface to recognize the voice data stream, thus obtaining the text corresponding to the voice. According to the keywords in the text, the system either executes the corresponding instruction or uses voice to answer the visitor’s question.
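The combination of background recognition and keyword parsing can be sketched as follows. This Python example is a simplified stand-in for the Android implementation: the keyword table and action names are hypothetical, and `recognise` is a caller-supplied function representing the cloud speech recognition call.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# Hypothetical keyword -> action table; the real vocabulary is application-specific.
COMMANDS = {
    "register": "open_registration",
    "sign in": "open_sign_in",
    "record": "read_visit_records",
}

_executor = ThreadPoolExecutor(max_workers=1)  # keeps recognition off the UI thread


def parse_command(text: str) -> str:
    """Map recognized text to an action by looking for known keywords."""
    lowered = text.lower()
    for keyword, action in COMMANDS.items():
        if keyword in lowered:
            return action
    return "ask_to_repeat"  # no keyword found


def handle_utterance(audio: bytes, recognise: Callable[[bytes], str]) -> Future:
    """Recognize and parse in a background thread; the future yields an action name."""
    return _executor.submit(lambda: parse_command(recognise(audio)))
```

The caller can attach a callback to the returned future, so the user interface stays responsive while the cloud request is in flight.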
IR temperature measurement
Affected by the new coronavirus, many companies need to measure and record the body temperature when receiving visitors. If the temperature is measured manually, it is not only inefficient, but also increases the risk of exposure. To alleviate this problem, we introduce an IR temperature measurement module into the intelligent visitor system. When a visitor applies for a visit, the IR temperature sensor will automatically recognize the temperature of the visitor’s face and record the corresponding temperature. If the alarm condition is met, the terminal will sound the alarm and send a message to the administrator through the cloud platform.
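The alarm decision can be illustrated with a short Python sketch. The AMG8833 delivers an 8×8 grid of temperature readings; taking the hottest pixel as the face temperature and using 37.3 °C as the alarm threshold are assumptions made for this example, and any calibration between skin-surface and body temperature is not modelled.

```python
from typing import Dict, List

FEVER_THRESHOLD_C = 37.3  # assumed alarm threshold in degrees Celsius


def face_temperature(grid: List[List[float]]) -> float:
    """Use the hottest pixel of the thermal frame as the face temperature estimate."""
    return max(max(row) for row in grid)


def check_visitor(grid: List[List[float]],
                  threshold: float = FEVER_THRESHOLD_C) -> Dict[str, object]:
    """Return the measured temperature and whether the alarm condition is met."""
    temp = face_temperature(grid)
    return {"temperature_c": temp, "alarm": temp >= threshold}
```

The returned record can be stored with the visit, and a true `alarm` flag would trigger the local alarm and the message to the administrator.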
Panasonic’s IR thermal image sensor AMG8833, which uses an
Cloud application design
Visitor data and application-related data are stored in the cloud database, and the system management is realized through the cloud management application. This application not only provides a set of API interfaces for Android applications, but also provides functions for visitor data management. The main modules include department management, employee management, visit management, visitor management, visit record query, administrator account settings, etc., as shown in Figure 8. The application is implemented with the lightweight Java framework Spring Boot. 26
Functions of the modules are described as follows:
Department management maintains the relevant information of each department, and the data is used on the edge terminal through the API interface.
Employee management maintains the names and telephone numbers of all employees of the enterprise. Moreover, when the visitor signs in, the employee to be visited can be notified in advance by a text message.
Visitor information maintenance maintains the visitor identity information registered on the platform.
Visit record query retrieves the visit time and reason of all visitors uploaded by the terminal, the reception staff, and other relevant information.

Figure 8. Application service module.
System implementation and application
Cloud-edge collaboration aims to balance the utilization of computing resources between the edge and the cloud. Given the characteristics of our system, the computing capacity of the edge and of the cloud is fixed. The edge is responsible for the data interaction with the sensor hardware and uses the face detection algorithm of the Android SDK to decide whether a face is present. After a face is detected, the cloud performs the face search. The core computing of face recognition is concentrated in the cloud, and the reading and storage of access data are completed through the cloud. The edge performs only lightweight computing, which reduces the power consumption of the edge devices and thus improves the performance of the visitor system.
Face recognition in the cloud computing mode can be compared with face recognition in traditional embedded devices in four respects:
(1) Computing power comparison. Embedded devices usually employ OpenCV’s LBP or other algorithms to detect faces, and then use eigenfaces or Fisherfaces to complete face recognition. The edge must then perform face feature extraction and classification, which requires a large computing power. In cloud-edge collaboration mode, only face detection needs to be completed at the edge, and the classification process is completed by the cloud.
(2) Model management comparison. Traditional embedded devices manage their own models, that is, whenever faces are added or deleted, the model must be retrained to generate the corresponding signatures, and the models in the devices need to be updated at the same time. Under cloud-edge collaboration, the model is located in the cloud, and the device is not involved in model training but only calls the model through the interface.
(3) Accuracy comparison. The computing power of embedded devices is limited. In order to complete the identification, the models are usually simplified, so the recognition accuracy is limited. With cloud-edge collaboration, the model runs on a server with higher computing power, and the recognition accuracy can be greatly improved.
(4) Real-time comparison. If embedded devices are configured with more resources, they can meet the general real-time requirements. Relevant literature reports a recognition time of 900 ms on the ARM 11 platform, with the added advantage of offline identification. Cloud-edge collaboration has a certain impact on the real-time performance of the system, but under the condition of sufficient network resources, the queries per second (QPS) index is at least 2, and the performance is better than that of edge computing.
At present, the intelligent visitor system is in trial operation and serves the enterprise front desk. Judging from this operation, the overall system is highly stable, effectively solves the problem of receiving corporate visitors during the current stage of the novel coronavirus epidemic, and is well received by the enterprises involved. The operation of the edge terminal is shown in Figure 9.

Figure 9. The terminal system and app.
In the process of cloud-edge collaboration, both network and cloud processing capabilities will affect the operation of the system. To evaluate the processing performance of the system, batch simulation identification requests are used to check the overall processing delay and recognition accuracy of local and cloud interaction. In the test environment, the Baidu Cloud AI platform uses a free account, which provides only 2QPS (throughput) peak access performance. There are 315 images in the face database, and the test data contains 30 face images. Each time, four positive samples and one negative sample are randomly selected. The image size is
Table 1. Simulation results.
From Table 1, we make the following observations: (1) The recognition accuracy is high. Generally, the matching score of a registered person is more than 80, while the matching score of an unregistered one is far less than 80. (2) The recognition delay is short. The average cloud processing delay is 382 ms, and the processing delay of the first connection request is 1100 ms.
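Since the free account used in the test is limited to a peak of 2 QPS, a batch of simulated requests presumably has to be paced on the client side so that it stays within this limit. The following Python sketch shows one way to do this; it is an illustrative assumption about the test harness, not a description of the tool actually used, and the class name is hypothetical.

```python
import time
from typing import Callable


class QpsThrottle:
    """Spaces calls so that at most `qps` requests start per second."""

    def __init__(self, qps: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / qps  # 0.5 s between requests at 2 QPS
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request may be sent and return the time slept."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay:
            self.sleep(delay)
        # Schedule the following slot relative to when this request starts.
        self.next_allowed = max(now, self.next_allowed) + self.min_interval
        return delay
```

Calling `wait()` before each simulated request keeps the batch at or below 2 QPS, so the measured delay reflects the processing time of the cloud rather than rejections caused by exceeding the quota.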
Conclusions
In this work, we have applied cloud-edge computing to the visitor system. Through face recognition, voice recognition, and other related technologies, we have not only fully digitized the visitor data, solving the problems of previous visitor systems, but also greatly improved visitor registration efficiency and reduced labor costs. The use of our system shows that cloud-edge collaboration effectively solves the problem of the limited computing capacity of AI systems at the edge and promotes the application of AI in industry.
