Movable electronic device and control method thereof
By integrating a camera, actuator, and deep learning model into a portable electronic device for authentication and image feature analysis, the authentication problem of automatic following devices is solved, enabling convenient automatic tracking functionality.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- MSI COMPUTER (SHENZHEN) CO LTD
- Filing Date
- 2021-05-25
- Publication Date
- 2026-06-26
Figure CN115221485B_ABST
Abstract
Description
TECHNICAL FIELD
[0001] The present application relates to a mobile electronic device, and particularly relates to a mobile electronic device with an identity verification function.
BACKGROUND
[0002] Electronic devices with an automatic following function have been under development for many years; one example is the Gita smart luggage robot. This robot achieves automatic tracking by comparing the images captured by the camera on the robot with the images captured by a camera on a waistband worn by the user. The luggage robot is provided with a button that lets the user enable or disable the automatic following function. However, this seemingly convenient function hides a risk. Because no identity verification is performed, anyone can enable the following function, and anyone can disable it at any time while the device is following. In addition, the user must wear an additional waistband to use the automatic tracking function, which is inconvenient.
[0003] Therefore, a solution is needed that gives an electronic device with an automatic following function an identity verification function while remaining convenient to use.
SUMMARY
[0004] The present application provides a mobile electronic device and a control method thereof, which provide an identity verification function while remaining convenient to use.
[0005] The control method of the mobile electronic device of the present application comprises: enabling the mobile electronic device to enter a following mode, and performing an identity verification action on a user according to a verification command; and acquiring, by the mobile electronic device, a first image of the user according to the identity verification action, setting the user as a following target, and performing a following action according to the first image. Performing the following action includes: continuously acquiring, by the mobile electronic device, a plurality of second images including the user in time sequence, and obtaining, in time sequence, a plurality of pieces of feature vector information related to image features of the user according to image information of the plurality of second images and a deep learning model; and determining, by the mobile electronic device, the position of the user according to the plurality of pieces of feature vector information in order to follow the user.
[0006] The portable electronic device of the present invention includes a camera, an actuator, a deep learning model, and a processing circuit. The camera is used to perform a shooting action. The actuator is used to be driven to move the portable electronic device. The deep learning model is used to generate multiple feature vectors related to the image features of a person in the image based on image information. The processing circuit is used to: when entering follow mode, perform a user authentication action according to an authentication command; and control the camera to perform a shooting action according to the authentication action to acquire a first image of the user, and set the user as the follow target and perform a follow action based on the first image. The processing circuit is also used to: control the camera to continuously perform shooting actions to sequentially acquire multiple second images including the user, and sequentially obtain multiple feature vectors related to the image features of the user based on the image information of the multiple second images and the deep learning model; and determine the user's position based on the multiple feature vectors for following.
[0007] The authentication function of the portable electronic device of this invention prevents unregistered users from arbitrarily activating the automatic tracking function. Furthermore, the portable electronic device of this invention only needs to continuously capture user images to achieve automatic tracking. Therefore, the portable electronic device of this invention, while possessing user authentication functionality, also ensures ease of use.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] Figure 1 is a block diagram of a portable electronic device according to a first embodiment of the present invention.
[0009] Figure 2 is a flowchart of the control method for a portable electronic device according to the first embodiment of the present invention.
[0010] Figure 3 is a block diagram of a portable electronic device according to a second embodiment of the present invention.
[0011] Figure 4A is a flowchart of the control method for a portable electronic device according to the second embodiment of the present invention.
[0012] Figure 4B is a flowchart that continues Figure 4A and illustrates the remaining steps of the control method for the portable electronic device.
[0013] The reference numerals in the attached figures are explained as follows:
[0014] 100: Portable electronic device
[0015] 110: Processing circuit
[0016] 120: Deep learning model
[0017] 130: Camera
[0018] 140: Gesture recognition module
[0019] 150: Face recognition module
[0020] 160: Voiceprint recognition module
[0021] 170: Speaker
[0022] 180: Actuator
[0023] 190: Person recognition model
[0024] S210~S240, S401~S421: Steps
DETAILED DESCRIPTION
[0025] This invention proposes a portable electronic device with automatic following and authentication functions. Figure 1 is a block diagram of a portable electronic device according to a first embodiment of the present invention. Figure 2 is a flowchart of the control method for a portable electronic device according to the first embodiment of the present invention. Referring to Figure 1 and Figure 2 together, the portable electronic device 100 includes a processing circuit 110, a deep learning model 120, a camera 130, and an actuator 180. The camera 130 is used to perform a shooting action. The actuator 180 (e.g., a motor) is driven to move the portable electronic device 100. The deep learning model 120 is built in advance and generates, from the image information of an image, multiple pieces of feature vector information related to the image features of a person in that image.
[0026] The processing circuit 110 is coupled to the deep learning model 120, the camera 130, and the actuator 180 to control these components. The processing circuit 110 is activated to enter the follow mode. In the follow mode, the processing circuit 110 performs a user authentication action based on an authentication command (step S210). The processing circuit 110 also controls the camera 130 to perform a shooting action based on the authentication action to acquire a first image of the user. The processing circuit 110 sets the user as the follow target based on the first image and performs a follow action (step S220). When the follow action begins, the processing circuit 110 controls the camera 130 to continuously perform the shooting action to sequentially acquire multiple second images that include the user. The processing circuit 110 sequentially obtains multiple pieces of feature vector information related to the user's image features based on the image information of the multiple second images and the deep learning model 120 (step S230). The processing circuit 110 determines the user's location based on the multiple pieces of feature vector information in order to follow the user (step S240).
[0027] Figure 3 is a block diagram of a portable electronic device according to a second embodiment of the present invention. Figure 4A is a flowchart of the control method for a portable electronic device according to the second embodiment of the present invention. Referring to Figure 3, in the second embodiment the portable electronic device 100, in addition to the aforementioned processing circuit 110, deep learning model 120, camera 130, and actuator 180, further includes a gesture recognition module 140, a face recognition module 150, a voiceprint recognition module 160, a speaker 170, and a person recognition model 190. The gesture recognition module 140, face recognition module 150, voiceprint recognition module 160, and person recognition model 190 can all perform their specific functions via a cloud server. However, in another embodiment, the gesture recognition module 140, face recognition module 150, voiceprint recognition module 160, and person recognition model 190 can also perform their specific functions locally.
[0028] Referring to Figure 3 and Figure 4A together, the control method for the portable electronic device 100 begins in step S401. The portable electronic device 100 can be awakened from its sleep state by a first gesture to enter a follow mode (steps S402 and S403). Specifically, a gesture model needs to be established in the portable electronic device 100 beforehand. The gesture model can be established by training, for example, a Convolutional Neural Network (CNN) on multiple pieces of training data. The processing circuit 110 controls the gesture recognition module 140 and the camera 130, so that the gesture recognition module 140 uses the image information captured by the camera 130 to recognize the first gesture through the gesture model. In this embodiment, the first gesture is, for example, waving. If the first gesture is not detected, the portable electronic device 100 remains in the sleep state (step S404).
[0029] Steps S405-S408 primarily involve facial recognition. Generally, facial recognition includes steps such as face image acquisition, face localization, face recognition preprocessing, and identity verification. Regarding the technical details of identity verification, one or a series of face images containing an unidentified person are input, and compared with several known face images or corresponding codes in a face database to output a series of similarity scores. Based on these similarity scores, it can be determined whether the person in the image is a registered user.
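The comparison described above can be illustrated with a minimal sketch. It assumes that each face has already been reduced to a numeric code (a list of floats) and that cosine similarity is used as the similarity score; the source does not name a specific score, and `verify_face`, `face_database`, and `min_score` are hypothetical names chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    """Similarity score in [-1, 1]; an assumed metric, not specified by the source."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def verify_face(probe_code, face_database, min_score=0.8):
    """Compare one probe code against every registered code.

    Returns (user_id, scores); user_id is None when no registered
    user reaches min_score (a hypothetical threshold).
    """
    scores = {uid: cosine_similarity(probe_code, code)
              for uid, code in face_database.items()}
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_score else None), scores

# Usage with made-up 3-element codes
registered = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
user, all_scores = verify_face([0.9, 0.1, 0.0], registered)
```

Returning the full score table alongside the decision keeps the "series of similarity scores" visible to the caller, which mirrors the wording of the paragraph.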
[0030] In detail, upon recognizing the first gesture, the processing circuit 110 sends a control signal to activate the camera 130 and drive the actuator 180, thereby moving the portable electronic device 100 in the direction of the first gesture. When the portable electronic device 100 enters a shooting range, the processing circuit 110 controls the camera 130 to capture a first image containing the user's face (step S405). In one embodiment, the processing circuit 110 determines whether the device has entered the shooting range based on the aspect ratio and minimum height of the human figure in the captured image. The processing circuit 110 performs facial recognition on the first image through the face recognition module 150 to obtain the image features of the user's face in the first image (step S406), and performs a first-level authentication action accordingly (step S407).
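As a rough illustration of the shooting-range test, the sketch below checks a detected person's bounding box against a minimum height and a plausible aspect ratio. The function name, the choice to express height as a fraction of the frame, and every threshold value are assumptions; the source only states that aspect ratio and minimum height are used.

```python
def in_shooting_range(box_width, box_height, frame_height,
                      min_height_fraction=0.5, aspect_range=(0.25, 0.6)):
    """Return True when the person's box is tall enough and person-shaped.

    min_height_fraction and aspect_range are hypothetical values.
    """
    if box_width <= 0 or box_height <= 0 or frame_height <= 0:
        return False
    aspect = box_width / box_height                      # width-to-height ratio
    tall_enough = box_height / frame_height >= min_height_fraction
    plausible_shape = aspect_range[0] <= aspect <= aspect_range[1]
    return tall_enough and plausible_shape

# Usage: a 120x300 px figure in a 480 px tall frame
close_enough = in_shooting_range(120, 300, 480)
```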
[0031] In this embodiment, the face recognition algorithm may include recognition algorithms using neural networks. However, the invention is not limited thereto. In other embodiments, face recognition may also be performed using feature-based recognition algorithms, appearance-based recognition algorithms, template-based recognition algorithms, or recognition algorithms using support vector machines (SVM).
[0032] If the identification result indicates that the user is not a registered user, the follow mode ends (step S412). In detail, when the execution result of step S408 is verification failure, the processing circuit 110 may control the speaker 170 to emit an instruction voice to indicate to the user that verification has failed. In one embodiment, if the verification failure exceeds a certain time duration, the follow mode ends (step S412); otherwise, the process returns to step S405.
[0033] If the identification result indicates a registered user (step S408), the processing circuit 110 can perform a second layer of authentication (step S410) based on the first voice command issued by the user (step S409). In practice, before executing step S409, the processing circuit 110 can also control the speaker 170 to emit a prompt voice to prompt the user to issue the first voice command (e.g., "Start following"). The processing circuit 110 can perform voiceprint recognition based on the first voice command through the voiceprint recognition module 160, comparing the recognition result with multiple pre-established voiceprint information of multiple registered users. When it is confirmed that the voiceprint feature of the aforementioned first voice command matches one of the pre-established multiple voiceprint information, the processing circuit 110 determines that the sound source is indeed a registered user (step S411) and further executes step S413. Conversely, when the voiceprint feature of the aforementioned first voice command does not match any of the pre-established multiple voiceprint information, the processing circuit 110 determines that the sound source is not a registered user and ends the following mode (step S412). In detail, when the execution result of step S411 is verification failure, the processing circuit 110 can control the speaker 170 to emit an instruction voice to indicate to the user that the verification has failed. In one embodiment, if the verification failure lasts for a certain period of time (including the situation where the user does not issue the first voice command for a long time), the follow mode ends (step S412); otherwise, it returns to step S409.
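The two authentication layers and their time-limited retries can be sketched as follows. This is only an outline under stated assumptions: `face_attempt` and `voice_attempt` are hypothetical callables standing in for steps S405 to S408 and S409 to S411, `notify` stands in for the failure voice played through the speaker 170, and the 10-second limit is an arbitrary placeholder for "a certain period of time".

```python
import time

def authenticate_with_timeout(attempt, timeout_s, clock=time.monotonic,
                              on_failure=None):
    """Repeat attempt() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if attempt():
            return True
        if on_failure is not None:
            on_failure()          # e.g. play a "verification failed" voice
    return False

def enter_follow_procedure(face_attempt, voice_attempt, timeout_s=10.0,
                           clock=time.monotonic, notify=None):
    """Run first-layer (face) then second-layer (voiceprint) authentication."""
    if not authenticate_with_timeout(face_attempt, timeout_s, clock, notify):
        return "follow_mode_ended"        # corresponds to step S412
    if not authenticate_with_timeout(voice_attempt, timeout_s, clock, notify):
        return "follow_mode_ended"        # corresponds to step S412
    return "following"                    # corresponds to step S413
```

Passing the clock in as a parameter is a design choice made here so that the timeout behavior can be exercised without waiting in real time.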
[0034] Figure 4B is a flowchart that continues Figure 4A and illustrates the remaining steps of the control method for the portable electronic device. Referring to Figure 3 and Figure 4B together, when the processing circuit 110 confirms that the sound source is indeed a registered user, it executes the corresponding action (the following action) based on the first voice command (e.g., "Start following") and enters the following procedure (step S413). In step S414, the processing circuit 110 sequentially acquires multiple second images including the user, where the second images ideally include the user's full-body image (hereinafter referred to as a portrait). In step S415, the processing circuit 110 sequentially acquires multiple pieces of feature vector information related to the user's image features based on the image information of the multiple second images and the deep learning model 120. Furthermore, the processing circuit 110 continuously compares the multiple pieces of feature vector information to determine the user's position and follow the user (step S416).
[0035] In detail, the processing circuit 110 can locate the user's portrait in the first of the second images based on the position of the user's face in the first image. Then, the processing circuit 110 uses the image information of the user's portrait as input and, through the deep learning model 120, obtains feature vector information related to the image features of the user, and at the same time obtains at least one of user position information, user body proportion information, and color block information. Next, the processing circuit 110 controls the person recognition model 190 to identify the portrait information of all people in the second of the second images based on the image information of that image. The processing circuit 110 can filter the portrait information of all people in that image based on at least one of the previously obtained user position information, user body proportion information, and color block information, and keep at least one similar portrait as a candidate object. In this invention, the number of candidate objects can be three.
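A minimal sketch of this candidate filtering is given below. It assumes each detected person is summarized by a bounding-box center, a width-to-height proportion, and a mean clothing color; the `Person` structure, the three tolerance values, and the decision to apply all three criteria at once are illustrative assumptions (the source requires only at least one of them). The limit of three candidates follows the paragraph above.

```python
import math
from dataclasses import dataclass

@dataclass
class Person:
    center: tuple       # (x, y) of the bounding-box center, in pixels
    proportion: float   # bounding-box width / height
    color: tuple        # mean (R, G, B) of the clothing region

def filter_candidates(reference, detections, max_shift=80.0,
                      max_prop_diff=0.15, max_color_dist=60.0, limit=3):
    """Keep detections that resemble the previously tracked user.

    All tolerances are hypothetical; candidates are ordered by how
    little they moved relative to the previous position.
    """
    def shift(p):
        return math.dist(p.center, reference.center)

    kept = [p for p in detections
            if shift(p) <= max_shift
            and abs(p.proportion - reference.proportion) <= max_prop_diff
            and math.dist(p.color, reference.color) <= max_color_dist]
    kept.sort(key=shift)
    return kept[:limit]
```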
[0036] However, the present invention is not limited to determining candidate objects in the manner described above. In another embodiment, the processing circuit 110 can define a region of interest (ROI) in the second of the second images based on the previously obtained user position information. The processing circuit 110 then obtains at least one piece of portrait information within the ROI using the person recognition model 190. The processing circuit 110 can filter the portrait information within the ROI based on at least one of the previously obtained user body proportion information and color block information to determine the candidate objects.
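The ROI variant can be sketched in the same spirit. Here the ROI is the previous bounding box enlarged by a margin and clipped to the frame; the margin value and both function names are assumptions. In the source the person recognition model 190 is run inside the ROI, whereas this sketch, as a stand-in, simply keeps the already-detected boxes whose centers fall inside it.

```python
def region_of_interest(prev_box, frame_size, margin=0.5):
    """Expand the previous (x, y, w, h) box by a margin and clip to the frame.

    Returns (left, top, right, bottom). margin is a hypothetical value.
    """
    x, y, w, h = prev_box
    frame_w, frame_h = frame_size
    dx, dy = w * margin, h * margin
    left = max(0, x - dx)
    top = max(0, y - dy)
    right = min(frame_w, x + w + dx)
    bottom = min(frame_h, y + h + dy)
    return (left, top, right, bottom)

def boxes_in_roi(boxes, roi):
    """Keep the (x, y, w, h) boxes whose center lies inside the ROI."""
    left, top, right, bottom = roi
    inside = []
    for (x, y, w, h) in boxes:
        cx, cy = x + w / 2, y + h / 2
        if left <= cx <= right and top <= cy <= bottom:
            inside.append((x, y, w, h))
    return inside
```

Restricting the search to a region around the last known position is what lets the later proportion and color checks run on only a few portraits per frame.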
[0037] After determining the candidate objects, the processing circuit 110 takes the image information of at least one candidate object as input and obtains at least one corresponding piece of feature vector information through the deep learning model 120. When there are multiple candidate objects, the processing circuit 110 can use the previously obtained user feature vector information as a benchmark and compare it with the feature vector information of each current candidate object to obtain multiple pieces of feature vector difference information. Based on this difference information, the processing circuit 110 can determine the only candidate object whose difference is less than a threshold to be the following target. If multiple candidate objects have a difference less than the threshold, the search for the following target in the second of the second images is abandoned, and the search is instead performed using the subsequent third of the second images. When there is only one candidate object, the processing circuit 110 can obtain the corresponding feature vector difference information and directly use that candidate object as the following target. At the same time, the processing circuit 110 updates the stored user feature vector information with the currently obtained feature vector information.
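The selection rule in this paragraph can be written as a short sketch. Euclidean distance is assumed as the "feature vector difference", which the source does not specify, and `select_follow_target` is a hypothetical name. The case where several candidates exist but none falls below the threshold is not covered by the paragraph; the sketch treats it like the ambiguous case and skips the frame, which is an assumption.

```python
import math

def select_follow_target(reference_vec, candidate_vecs, threshold):
    """Pick the follow target among candidate feature vectors.

    Returns (index, updated_reference). index is None when the frame
    is skipped; the reference is then left unchanged.
    """
    if not candidate_vecs:
        return None, reference_vec
    if len(candidate_vecs) == 1:
        # A single candidate is used directly as the follow target.
        return 0, candidate_vecs[0]
    diffs = [math.dist(reference_vec, v) for v in candidate_vecs]
    below = [i for i, d in enumerate(diffs) if d < threshold]
    if len(below) == 1:
        # Exactly one candidate is close enough: follow it and
        # update the stored reference vector.
        return below[0], candidate_vecs[below[0]]
    # Ambiguous (or, by assumption, no match): wait for the next frame.
    return None, reference_vec

# Usage with made-up 2-element vectors
idx, ref = select_follow_target([0.0, 0.0], [[0.1, 0.0], [3.0, 4.0]], 0.5)
```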
[0038] Upon acquiring the third of the second images, the processing circuit 110 determines at least one candidate object in that image in the same manner, and determines the tracking target by comparing the feature vector difference information with the aforementioned threshold. Through continuous comparison, the processing circuit 110 can find the person whose current feature vector information is most similar to the previous feature vector information and use that person as the tracking target. Since the camera 130 of this invention can capture images at a rate of 30 frames per second, the position of the user (the tracking target) does not differ much between consecutive frames, and the user's body proportion information and color block information do not change significantly. In other words, finding the tracking target among candidate objects determined in either of the two ways described above is highly accurate.
[0039] In one embodiment, after confirming that the sound source is indeed a registered user, the processing circuit 110 can further control the speaker 170 to emit a prompt voice telling the user that a second voice command can be issued at any time to end the follow program. When the processing circuit 110 receives the second voice command (e.g., "End Follow") issued by the user (step S417), it performs voiceprint recognition based on the second voice command to perform third-level authentication (step S418). If the authentication fails (step S419), the person who issued the second voice command is not the user who enabled the follow program, and the processing circuit 110 continues to execute the follow program (step S420). In detail, if the result of step S419 is authentication failure, the processing circuit 110 can control the speaker 170 to emit an indication voice to indicate that the authentication has failed. If the authentication succeeds (step S419), the processing circuit 110 ends the follow program based on the second voice command (step S421). The execution details of the third-level authentication are similar to those of the aforementioned second-level authentication and are not repeated here.
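The stop logic of steps S417 to S421 amounts to a small piece of state, sketched below. `FollowSession` and `identify_speaker` are hypothetical: the latter stands in for the voiceprint recognition module 160 and is assumed to map an audio sample to a registered user ID or `None`. The sketch also assumes the command text has already been transcribed.

```python
class FollowSession:
    """Tracks whether the device is following and who may stop it."""

    def __init__(self, owner_id, identify_speaker):
        self.owner_id = owner_id                  # user who enabled following
        self.identify_speaker = identify_speaker  # hypothetical: audio -> user id or None
        self.following = True

    def on_voice_command(self, audio, text):
        """Handle a voice command; return True while still following."""
        if text != "End Follow":
            return self.following
        # Third-level authentication: only the user who enabled the
        # follow program may end it.
        if self.identify_speaker(audio) == self.owner_id:
            self.following = False   # corresponds to step S421
        # Otherwise keep following (corresponds to step S420).
        return self.following
```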
[0040] The deep learning model 120 can be built using deep learning algorithms, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Deep Belief Networks (DBN), and Recurrent Neural Networks (RNN). Taking a DNN as an example, the model consists of an input layer, multiple hidden layers, and an output layer. Each hidden layer contains multiple nodes, and nodes in different layers are interconnected. During model training, image information from multiple images can be used as input (which can be considered a "question"), and corresponding expected values (which can be considered the "answer" to whether the people in the images are the same person) can be assigned. This allows the weights and biases between nodes to be adjusted continuously over many question-and-answer rounds. Simply put, model training uses backpropagation. Initially, weights and biases are assigned randomly. By continuously modifying the weights and biases, the final result is brought closer to the true answer. As more data is fed in, the model's accuracy increases. When further accuracy improvement becomes limited, the weights and biases can be stored. At this point, the model has been trained.
[0041] It is worth noting that the deep learning model 120 of this invention does not directly use the aforementioned trained model. What this invention requires is not the calculation result of the trained model (i.e., whether it is the same person), but rather the feature vector information used to generate that calculation result (which represents the features of the person's image). In this invention, the last layer of the trained model is removed to form the deep learning model 120.
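Removing the last layer can be illustrated with a toy fully connected network in plain Python. Everything here is an assumption made for illustration: the `TinyNetwork` class, the layer sizes, the ReLU activation, and the random (untrained) weights. The point is only that evaluating the network without its final layer yields the intermediate feature vector rather than the same-person score.

```python
import random

def dense(inputs, weights, biases):
    """One fully connected layer: weights is a list of rows, one per output."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

class TinyNetwork:
    """Toy network with random weights, standing in for a trained model."""

    def __init__(self, layer_sizes, seed=0):
        rng = random.Random(seed)
        self.layers = []
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
            weights = [[rng.uniform(-1, 1) for _ in range(n_in)]
                       for _ in range(n_out)]
            self.layers.append((weights, [0.0] * n_out))

    def forward(self, inputs, drop_last=False):
        """Full output, or the feature vector when drop_last is True."""
        layers = self.layers[:-1] if drop_last else self.layers
        values = inputs
        for i, (weights, biases) in enumerate(layers):
            values = dense(values, weights, biases)
            if i < len(self.layers) - 1:   # hidden layers use ReLU; final layer is linear
                values = relu(values)
        return values

# Usage: 8 inputs, hidden layers of 16 and 4 nodes, 1 output score
net = TinyNetwork([8, 16, 4, 1])
score = net.forward([0.5] * 8)                          # 1 value
feature_vector = net.forward([0.5] * 8, drop_last=True)  # 4 values
```

In this toy network the feature vector has 4 elements; claims 7 and 17 of the source describe feature vector information with 256 or 512 elements.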
[0042] In terms of hardware implementation, the aforementioned processing circuit 110 can be a logic circuit implemented on an integrated circuit. The related functions of the processing circuit 110 can be implemented as hardware using hardware description languages (such as Verilog HDL or VHDL) or other suitable programming languages. The aforementioned deep learning model 120 can be implemented in various logic blocks, modules, and circuits within a Field Programmable Gate Array (FPGA) and/or other processing units.
[0043] The portable electronic device of the present invention can be a suitcase, a wheelchair, or other electronic device that needs to follow the user. In one use case, a suitcase with an automatic following function can follow behind a passenger. In another use case, a wheelchair with an automatic following function can follow a patient undergoing rehabilitation so that the patient can return to the wheelchair as soon as rehabilitation is completed.
[0044] The authentication function of the portable electronic device of this invention prevents unregistered users from arbitrarily activating the automatic tracking function of the portable electronic device. Furthermore, it also prevents unregistered users from arbitrarily stopping the portable electronic device while it is performing automatic tracking. In addition, the portable electronic device of this invention does not require additional devices to assist in identification (such as a belt worn by the user and a camera mounted on it), but only needs to continuously capture user images to achieve the automatic tracking function. Therefore, the portable electronic device of this invention, while having user authentication functionality, also takes into account ease of use.
Claims
1. A method for controlling a portable electronic device, comprising: in a follow mode, acquiring, by the portable electronic device, a first image of a user and performing facial recognition based on the first image to perform a first layer of identity verification on the user; in response to passing the first layer of identity verification, performing, by the portable electronic device, voiceprint recognition based on a first voice command from the user to perform a second layer of identity verification on the user; and in response to passing the second layer of identity verification, setting, by the portable electronic device, the user as a follow target and performing a follow action, including: sequentially acquiring, by the portable electronic device, multiple second images including the user, and sequentially obtaining multiple pieces of feature vector information related to the image features of the user based on the image information of the second images and a deep learning model; and determining, by the portable electronic device, the user's location based on the feature vector information in order to follow the user; wherein the step of setting the user as the follow target and performing the follow action in response to passing the second layer of identity verification further includes: receiving, by the portable electronic device, the first voice command instructing the follow action to begin, and performing voiceprint recognition based on the first voice command; and beginning, by the portable electronic device, the follow action when the recognition result matches preset voiceprint information of the user.
2. The control method for a portable electronic device as described in claim 1, further comprising: receiving, by the portable electronic device, a second voice command instructing that the follow action be stopped, and performing voiceprint recognition based on the second voice command; and stopping, by the portable electronic device, the follow action when the recognition result matches the preset voiceprint information of the user.
3. The control method for a portable electronic device as described in claim 1, wherein the step of determining the user's location based on the feature vector information further includes: obtaining, by the portable electronic device, multiple pieces of candidate feature vector information of multiple candidate objects based on the currently acquired second image; and comparing, by the portable electronic device, the differences between the currently acquired candidate feature vector information and the previously acquired feature vector information of the user, and selecting a candidate object whose difference is less than a threshold as the follow target.
4. The control method for a portable electronic device as described in claim 3, wherein the step of obtaining the candidate objects further includes: performing, by the portable electronic device, person recognition based on a person recognition model to obtain multiple pieces of person image information in the currently acquired second image; and determining, by the portable electronic device, a portion of the person image information as the candidate objects based on at least one of previously acquired location information, proportion information, and color block information of the follow target.
5. The control method for a portable electronic device as described in claim 3, wherein the step of obtaining the candidate objects further includes: determining, by the portable electronic device, a region of interest in the currently acquired second image based on previously acquired location information of the follow target; and identifying multiple pieces of person image information in the region of interest using a person recognition model, and determining the candidate objects based on the person image information.
6. The control method for a portable electronic device as described in claim 5, wherein the step of determining the candidate objects based on the person image information further includes: determining, by the portable electronic device, a portion of the person image information as the candidate objects based on at least one of previously obtained proportion information and color block information of the follow target.
7. The control method for a portable electronic device as described in claim 1, wherein the feature vector information includes 256 or 512 elements.
8. The control method for a portable electronic device as described in claim 1, further comprising: during the follow action, storing, by the portable electronic device, multiple pieces of angular feature vector information of the user at different angles, so as to identify the user based on the angular feature vector information when the follow target is lost.
9. The control method for a portable electronic device as described in claim 1, further comprising: performing, by the portable electronic device, gesture detection, so that the follow mode is entered when a first gesture is detected.
10. The control method for a portable electronic device as described in claim 1, wherein the step of acquiring the first image of the user by the portable electronic device further includes: moving the portable electronic device into a shooting range and taking a close-up picture of the user's face to obtain the first image of the user.
11. A portable electronic device, comprising: a camera, used to perform a shooting action; an actuator, driven to move the portable electronic device; a deep learning model, used to generate multiple pieces of feature vector information related to the image features of a person in an image based on image information; and a processing circuit, used to: in a follow mode, control the camera to perform the shooting action to acquire a first image of a user, and perform facial recognition based on the first image to perform a first layer of identity verification on the user; in response to passing the first layer of identity verification, perform voiceprint recognition based on a first voice command from the user to perform a second layer of identity verification on the user; and in response to passing the second layer of identity verification, set the user as a follow target and perform a follow action, wherein the processing circuit is further used to: control the camera to continuously perform the shooting action to sequentially acquire multiple second images including the user, and sequentially obtain multiple pieces of feature vector information related to the image features of the user based on the image information of the second images and the deep learning model; and determine the user's location based on the feature vector information in order to follow the user; wherein the processing circuit is further used to: receive the first voice command instructing the follow action to begin, and perform voiceprint recognition based on the first voice command, so that the follow action begins when the recognition result matches preset voiceprint information of the user.
12. The portable electronic device as claimed in claim 11, wherein the processing circuit is further used to: receive a second voice command instructing that the follow action be stopped, and perform voiceprint recognition based on the second voice command, so as to stop the follow action when the recognition result matches the preset voiceprint information of the user.
13. The portable electronic device as claimed in claim 11, wherein the processing circuit is further used to: obtain multiple pieces of candidate feature vector information of multiple candidate objects based on the currently acquired second image, and compare the differences between the currently acquired candidate feature vector information and the previously acquired feature vector information of the user, so as to select a candidate object whose difference is less than a threshold as the follow target.
14. The portable electronic device as claimed in claim 13, wherein the processing circuit is further used to: perform person recognition based on a person recognition model to obtain multiple pieces of person image information in the currently acquired second image; and determine a portion of the person image information as the candidate objects based on at least one of previously acquired location information, proportion information, and color block information of the follow target.
15. The portable electronic device as claimed in claim 13, wherein the processing circuit is further used to: determine a region of interest in the currently acquired second image based on previously acquired location information of the follow target; and identify multiple pieces of person image information in the region of interest, and determine the candidate objects based on the person image information.
16. The portable electronic device as claimed in claim 15, wherein the processing circuit is further used to: determine a portion of the person image information as the candidate objects based on at least one of previously obtained proportion information and color block information of the follow target.
17. The portable electronic device of claim 11, wherein the feature vector information comprises 256 or 512 elements.
18. The portable electronic device as claimed in claim 11, wherein the processing circuit is further used to: during the follow action, store multiple pieces of angular feature vector information of the user at different angles, so that the user can be identified based on the angular feature vector information when the follow target is lost.
19. The portable electronic device as claimed in claim 11, wherein the processing circuit is further used to: perform gesture detection, so that the follow mode is entered when a first gesture is detected.
20. The portable electronic device as claimed in claim 11, wherein the processing circuit is further used to: control the actuator to move the portable electronic device into a shooting range and take a close-up picture of the user's face to obtain the first image of the user.