Face recognition method and device, in-memory computing system

By integrating a lightweight face detection and recognition model on an in-memory computing chip, the high cost and high power consumption problems caused by traditional chip computing architecture are solved, realizing low-power real-time face recognition, which is suitable for low-power devices.

CN119068519B · Active · Publication Date: 2026-06-23 · BOE TECHNOLOGY GROUP CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BOE TECHNOLOGY GROUP CO LTD
Filing Date
2023-05-31
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Traditional chip computing architecture results in high cost and high power consumption for facial recognition technology, especially due to the excessive energy consumption caused by frequent transfers between the computing core and memory during algorithm operation.

Method used

A lightweight face detection and recognition model is integrated using an in-memory computing chip. Face detection and recognition are performed using an in-memory computing PIM chip, reducing data copying during the computing process. The model is optimized by combining knowledge distillation technology, thereby reducing power consumption.

Benefits of technology

It achieves low-power real-time face detection and recognition, suitable for low-power devices such as drones and wearable devices, and supports functions such as device wake-up and unlocking.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure provides a face recognition method and device, in-memory computing system, and relates to the technical field of computers, which can be used in the technical fields of computer vision, image processing, machine learning and the like. The method comprises: acquiring a to-be-processed image; inputting the to-be-processed image into an in-memory computing chip configured with a face detection model, performing face detection processing on the to-be-processed image by using a preset face detection model to obtain an initial face detection frame in the to-be-processed image; acquiring a target face image according to the initial face detection frame; performing face recognition processing on the target face image by using a face recognition model pre-configured in the in-memory computing chip to obtain a target face feature vector corresponding to the target face image; and comparing the target face feature vector with a feature vector corresponding to a preset face image to obtain a face recognition result. The present disclosure also provides an electronic device and a computer-readable storage medium.

Description

Technical Field

[0001] This disclosure relates to the field of computer technology, and in particular to a face recognition method and apparatus, an in-memory computing system, an electronic device, and a computer-readable storage medium.

Background Technology

[0002] Facial recognition is a biometric technology that identifies individuals based on their facial features. It involves using cameras or webcams to capture images or video streams containing faces, automatically detecting and tracking faces within the images, and then performing facial recognition. This process is also commonly referred to as image recognition or face identification. With the rise of applications such as target surveillance, target tracking, human-computer interaction, and people counting, facial recognition technology is receiving increasing attention.

[0003] With the development of artificial intelligence (AI) applications, more and more cloud algorithms (such as facial recognition algorithms) are gradually shifting to the edge. How to migrate high-computing-power and high-energy-consuming algorithms from the cloud to the edge with limited computing power is a technical challenge.

[0004] In related technologies, traditional chip computing architectures suffer from the memory wall problem: during algorithm operation, a significant amount of energy is consumed by frequent data transfers between the computing core and memory. Because facial recognition technology is usually implemented on such architectures, it typically suffers from high cost and high power consumption.

Summary of the Invention

[0005] This disclosure provides a face recognition method and apparatus, an in-memory computing system, an electronic device, and a computer-readable storage medium.

[0006] According to a first aspect of this disclosure, a face recognition method is provided, the face recognition method comprising:

[0007] Obtain the image to be processed;

[0008] The image to be processed is input into an in-memory computing chip configured with a face detection model. The preset face detection model is used to perform face detection processing on the image to be processed to obtain the initial face detection box in the image to be processed.

[0009] The target face image is obtained based on the initial face detection box;

[0010] In the in-memory computing chip, a face recognition model pre-configured in the in-memory computing chip is used to perform face recognition processing on the target face image to obtain the target face feature vector corresponding to the target face image;

[0011] The target face feature vector is compared with the feature vector corresponding to the preset face image to obtain the face recognition result.

[0012] In some embodiments, obtaining the target face image based on the initial face detection bounding box includes:

[0013] The initial face detection box is input into a programmable logic unit for non-maximum suppression processing to obtain the target face detection box;

[0014] The target face detection box is then input into the in-memory computing chip.

[0015] The target face detection box is then mapped onto the image to be processed in the in-memory computing chip to obtain the target face image corresponding to the target face detection box on the image to be processed.

[0016] In some embodiments, the face recognition method further includes, prior to performing non-maximum suppression processing:

[0017] In the programmable logic unit, based on the confidence level of multiple initial face detection boxes output by the face detection model, a preset number of initial face detection boxes are selected from the multiple initial face detection boxes in descending order of confidence level.

[0018] The step of inputting the initial face detection box into a programmable logic unit for non-maximum suppression processing to obtain the target face detection box includes:

[0019] A preset number of initial face detection boxes are input into the programmable logic unit for non-maximum suppression processing to obtain the target face detection box.

[0020] In some embodiments, acquiring the image to be processed includes:

[0021] Obtain the original image;

[0022] The original image is then subjected to image decoding and moving object detection.

[0023] In response to the detection of a moving object in the original image, the original image is input to the programmable logic unit to perform image preprocessing to obtain the image to be processed.

[0024] In some embodiments, the image preprocessing includes at least one of the following: image scaling, image cropping, image format conversion, and image enhancement.

[0025] In some embodiments, the moving object detection includes performing moving object detection on the original image using frame difference.

[0026] In some embodiments, comparing the target facial feature vector with a preset facial feature vector to obtain a facial recognition result includes:

[0027] The target facial feature vector is input into a programmable logic unit (PLU) so that the PLU can perform the following processing:

[0028] The target face feature vector is normalized to obtain the target face normalized feature vector;

[0029] The normalized feature vector of the target face is compared with the feature vector corresponding to a preset face image in the preset database to obtain the face feature matching degree.

[0030] When the facial feature matching degree is greater than a preset threshold, a facial recognition result indicating successful facial matching is obtained.
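The normalization and comparison steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the disclosure does not name the similarity metric, so cosine similarity is assumed here, and the function names, the dictionary-based database, and the default threshold of 0.5 are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def matching_degree(target, reference):
    """Cosine similarity of two vectors (assumed metric, not from the source)."""
    t, r = l2_normalize(target), l2_normalize(reference)
    return sum(a * b for a, b in zip(t, r))


def recognize(target, database, threshold=0.5):
    """Return (name, score) for the best match above threshold, else (None, score).

    `database` maps a hypothetical identity name to its preset feature vector.
    """
    best_name, best_score = None, -1.0
    for name, reference in database.items():
        score = matching_degree(target, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score > threshold:
        return best_name, best_score
    return None, best_score
```

Normalizing both vectors first means the dot product depends only on the angle between them, so a single fixed threshold can be applied regardless of the raw feature magnitudes.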

[0031] In some embodiments, the face detection model includes: a Backbone layer, a Neck layer, and a Head layer;

[0032] The Backbone layer is used to extract image features from the input image to be processed;

[0033] The Neck layer is used to reduce or adjust the image features from the Backbone layer;

[0034] The Head layer is used to generate the final network output by taking the features obtained after processing by the Neck layer as input, so as to obtain the initial face detection box.

[0035] In some embodiments, the face recognition model is trained using a knowledge distillation method.

[0036] According to a second aspect of this disclosure, a face recognition device is provided, the face recognition device comprising:

[0037] The acquisition module is configured to acquire the image to be processed;

[0038] The detection module is configured to input the image to be processed into an in-memory computing chip equipped with a face detection model, and use the face detection model to perform face detection processing on the image to be processed to obtain an initial face detection box in the image to be processed.

[0039] The processing module is configured to acquire a target face image based on the initial face detection box;

[0040] The recognition module is configured to perform face recognition processing on the target face image in the in-memory computing chip using a face recognition model pre-configured in the in-memory computing chip, and obtain the target face feature vector corresponding to the target face image.

[0041] The comparison module is configured to compare the target face feature vector with the feature vector corresponding to a preset face image to obtain a face recognition result.

[0042] According to a third aspect of this disclosure, this disclosure provides an in-memory computing system, the in-memory computing system comprising:

[0043] An in-memory computing chip integrates a face detection model, which is configured to use the face detection model to perform face detection processing on the image to be processed, obtain an initial face detection box in the image to be processed, and send the initial face detection box to the PL terminal;

[0044] The PL terminal is configured to acquire a target face image based on the initial face detection box and send the target face image to the in-memory computing chip;

[0045] The in-memory computing chip also integrates a face recognition model, which is configured to use the face recognition model to perform face recognition processing on the target face image, obtain the target face feature vector corresponding to the target face image, and send the target face feature vector to the PL terminal;

[0046] The PL terminal is further configured to compare the target face feature vector with the feature vector corresponding to a preset face image to obtain a face recognition result.

[0047] In some embodiments, the PL terminal is configured as follows:

[0048] The initial face detection bounding box is subjected to non-maximum suppression processing to obtain the target face detection bounding box;

[0049] The target face detection box is mapped onto the image to be processed to obtain the target face image corresponding to the target face detection box on the image to be processed.

[0050] In some embodiments, the in-memory computing system further includes a PS terminal, which is deployed on the FPGA device;

[0051] The PS terminal is configured to: acquire raw images from a camera device; perform image decoding and moving object detection on the raw images; and when a moving object is detected in the raw images, send the raw images to the PL terminal.

[0052] The PL terminal is also configured to, in response to the PS terminal detecting a moving object in the original image, perform image preprocessing on the original image to obtain the image to be processed, and send the image to be processed to the in-memory computing chip.

[0053] In some embodiments, the PL terminal is configured as follows:

[0054] The target face feature vector is normalized to obtain the target face normalized feature vector;

[0055] The normalized feature vector of the target face is compared with the feature vector corresponding to a preset face image in the preset database to obtain the face feature matching degree.

[0056] When the facial feature matching degree is greater than a preset threshold, a facial recognition result indicating successful facial matching is obtained.

[0057] According to a fourth aspect of this disclosure, an electronic device is provided that includes the aforementioned in-memory computing system.

[0058] According to a fifth aspect of this disclosure, an electronic device is provided, the electronic device comprising:

[0059] At least one processor; and

[0060] A memory communicatively connected to the at least one processor; wherein,

[0061] The memory stores one or more computer programs that can be executed by the at least one processor, and the one or more computer programs are executed by the at least one processor to enable the at least one processor to perform the above-described face recognition method.

[0062] According to a sixth aspect of this disclosure, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the above-described face recognition method.

Attached Figure Description

[0063] The accompanying drawings are provided to further illustrate the present disclosure and form part of the specification. They are used together with the embodiments of the present disclosure to explain the disclosure and do not constitute a limitation thereof. The above and other features and advantages will become more apparent to those skilled in the art from the detailed description of exemplary embodiments with reference to the accompanying drawings, in which:

[0064] Figure 1 is a flowchart illustrating a face recognition method provided in an embodiment of this disclosure;

[0065] Figure 2 is a schematic diagram of the model architecture of a face detection model in an embodiment of this disclosure;

[0066] Figure 3 is a flowchart illustrating a method for acquiring a target face image according to an embodiment of this disclosure;

[0067] Figure 4 is a flowchart illustrating a method for acquiring an image to be processed according to an embodiment of the present disclosure;

[0068] Figure 5 is a schematic diagram of the model architecture of a face recognition model in an embodiment of this disclosure;

[0069] Figure 6 is a flowchart illustrating one method of obtaining face recognition results in an embodiment of this disclosure;

[0070] Figure 7 is a schematic diagram of the structure of a face recognition device provided in an embodiment of the present disclosure;

[0071] Figure 8 is a schematic diagram of the structure of an in-memory computing system provided in an embodiment of the present disclosure;

[0072] Figure 9 is a schematic diagram illustrating an application scenario of an in-memory computing system provided in an embodiment of this disclosure;

[0073] Figure 10 is a block diagram of an electronic device provided in an embodiment of the present disclosure.

Detailed Implementation

[0074] To enable those skilled in the art to better understand the technical solutions of this disclosure, exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of this disclosure to aid understanding. These should be considered merely exemplary. Therefore, those skilled in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and conciseness, descriptions of well-known functions and structures are omitted in the following description.

[0075] Where there is no conflict, the various embodiments of this disclosure and the features thereof in the embodiments may be combined with each other.

[0076] As used herein, the term “and/or” includes any and all combinations of one or more related enumerated entries.

[0077] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. As used herein, the singular forms “a” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the terms “comprising” and/or “made of”, when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Words such as “connected” or “linked” are not limited to physical or mechanical connections but can include electrical connections, whether direct or indirect.

[0078] Unless otherwise specified, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and this disclosure, and will not be interpreted as having an idealized or overly formal meaning, unless expressly so defined herein.

[0079] Figure 1 is a flowchart illustrating a face recognition method provided in an embodiment of the present disclosure. As shown in Figure 1, the face recognition method includes steps S10 to S14.

[0080] Step S10: Obtain the image to be processed.

[0081] Step S11: Input the image to be processed into the in-memory computing chip configured with a face detection model, and use the preset face detection model to perform face detection processing on the image to be processed to obtain the initial face detection box in the image to be processed.

[0082] Step S12: Obtain the target face image based on the initial face detection bounding box.

[0083] Step S13: In the in-memory computing chip, the face recognition model pre-configured in the in-memory computing chip is used to perform face recognition processing on the target face image to obtain the target face feature vector corresponding to the target face image.

[0084] Step S14: Compare the target face feature vector with the feature vector corresponding to the preset face image to obtain the face recognition result.

[0085] According to the face recognition method provided in this disclosure, a lightweight face detection model and face recognition model are integrated into a low-power in-memory computing chip to perform face detection and recognition. This method can achieve real-time face detection and recognition with low power consumption and can be applied to low-power devices such as drones, e-ink tablets, and wearable devices to realize functions such as device wake-up and unlocking.

[0086] In this embodiment of the present disclosure, in step S11, the image to be processed is input into a pre-configured face detection model in the in-memory computing chip for face detection processing to obtain an initial face detection box in the image to be processed. The initial face detection box can be used to represent the position coordinate information of the face in the image predicted by the model.

[0087] In this embodiment of the disclosure, the input to the face detection model is an image to be processed. The pixel size of the image to be processed is a preset pixel size, and the image format is a preset image format. For example, the pixel size of the image to be processed is 320*240, and the image format is RGB. The image to be processed is input into the face detection model, and after the neural network inference calculation of the face detection model, the original result of face detection is output. The original result of face detection includes an initial face detection box.

[0088] Figure 2 is a schematic diagram of the model architecture of a face detection model according to an embodiment of this disclosure. To achieve a lightweight model design, the number of network channels and the number of network layers in the face detection model are set to small values. In some embodiments, as shown in Figure 2, the face detection model consists of three parts: a feature extraction layer, a feature processing layer, and a prediction output layer. The feature extraction layer is the Backbone layer, the feature processing layer is the Neck layer, and the prediction output layer is the Head layer.

[0089] The feature extraction layer, or Backbone layer, is the core of the model, used to extract image features from the input image for subsequent processing and analysis. The Backbone layer consists of an input layer (input), a convolutional layer (Conv2d), a batch normalization layer (BatchNorm), a non-linear activation function layer (ReLU6), four network blocks (Block), and three attention mechanism network blocks (Block_SE). The first convolutional layer (Conv2d) has 16 channels, and N = 1. The value of N represents the number of network blocks in the model, and this value can be adjusted based on the computing power of the in-memory computing chip.

[0090] As shown on the right side of Figure 2, the network module (Block) consists of three units composed of a convolutional layer (Conv2d), a batch normalization layer (BatchNorm), and a non-linear activation function layer (ReLU6). The attention mechanism network module (Block_SE) consists of one network module (Block) and one attention mechanism module (Squeeze-and-Excitation, SE). The feature map is short-circuited to the attention mechanism module after passing through the network module (Block). The attention mechanism module (SE) includes a pooling layer (AveragePooling) and two units composed of a convolutional layer (Conv2d), a batch normalization layer (BatchNorm), and a non-linear activation function layer (ReLU6).

[0091] The feature processing layer, or Neck layer, is an intermediate layer connecting the Backbone layer and the Head layer. The main function of the Neck layer is to reduce or adjust the dimensionality of image features from the Backbone layer, improving feature diversity and robustness to better adapt to task requirements. In the Neck layer, the feature map obtained from the seventh module of the seven modules in the Backbone layer is processed by convolution (Conv2d) to obtain the first feature map, Feature1. Feature1 is then upsampled and added to the feature map obtained from the fifth module of the seven modules in the Backbone layer, followed by convolution (Conv2d) to obtain the second feature map, Feature2. Feature2 is then upsampled and added to the feature map obtained from the third module of the seven modules in the Backbone layer, followed by convolution (Conv2d) to obtain Feature3. Feature1, Feature2, and Feature3 are then processed by their respective convolution (Conv2d) steps and input into the Head layer.
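The top-down fusion in the Neck layer can be illustrated with a small pure-Python sketch. It is not the disclosed implementation: feature maps are reduced to single-channel 2D lists, the Conv2d steps are replaced by an identity placeholder, and the upsampling factor of 2 with nearest-neighbor repetition is an assumption, since the disclosure does not state the strides. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D feature map (assumed factor)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise addition of two feature maps of identical shape."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv_placeholder(fmap):
    """Stand-in for a Conv2d step; an identity copy in this sketch."""
    return [list(row) for row in fmap]


def neck(module3_out, module5_out, module7_out):
    """Fuse Backbone outputs from the 3rd, 5th, and 7th modules, top-down."""
    feature1 = conv_placeholder(module7_out)
    feature2 = conv_placeholder(add_maps(upsample2x(feature1), module5_out))
    feature3 = conv_placeholder(add_maps(upsample2x(feature2), module3_out))
    return feature1, feature2, feature3
```

The order of operations mirrors the text: the deepest map is processed first, and each shallower map receives the upsampled result of the level above it before its own convolution.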

[0092] The prediction output layer, or Head layer, is the last layer of the model and serves as the detection head for face detection. It is typically a classifier or regressor that takes the features processed by the Neck layer as input and generates the final network output to obtain the initial face detection bounding boxes. In the Head layer, the first feature map (Feature1), the second feature map (Feature2), and the third feature map (Feature3) are convolved (Conv2d) in the Neck layer, then concatenated and fused (Concat) in the Head layer. After further convolution (Conv2d) and dimensionality transformations (permute, view), the softmax layer outputs three feature map vectors: the first feature map vector, the second feature map vector, and the third feature map vector.

[0093] The first feature map vector represents the initial face detection box and its corresponding position coordinates, the second feature map vector represents the initial face detection box and its corresponding facial key point coordinates, and the third feature map vector represents the initial face detection box and its corresponding confidence score. For example, the first, second, and third feature map vectors have sizes of 3160*4, 3160*10, and 3160*2, respectively. Here, 3160 is the number of initial face detection boxes; 4 is the coordinate dimension (x,y,w,h) of each initial face detection box; 10 covers the position coordinates (5*(x,y)) of the 5 facial key points corresponding to each initial face detection box, namely the two eyes, the nose, and the two corners of the mouth; and 2 is the confidence score dimension. In total, each box carries 4+10+2=16 dimensions of information.
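As a rough illustration of how the three output vectors might be unpacked on the host side, the following Python sketch regroups flat output buffers into per-box records. The flat row-major layout, the function names, and the choice of index 1 as the "face" entry of the two confidence values are assumptions made for the example and are not stated in the disclosure.

```python
def reshape_rows(flat, width):
    """Split a flat list into consecutive rows of `width` values."""
    if len(flat) % width != 0:
        raise ValueError("buffer length is not a multiple of the row width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]


def decode_outputs(boxes_flat, landmarks_flat, scores_flat, face_index=1):
    """Combine the three model outputs into one record per detection box.

    boxes_flat:     N*4 values, (x, y, w, h) per box
    landmarks_flat: N*10 values, five (x, y) key points per box
    scores_flat:    N*2 values; `face_index` selects the face score (assumed)
    """
    boxes = reshape_rows(boxes_flat, 4)
    landmarks = reshape_rows(landmarks_flat, 10)
    scores = reshape_rows(scores_flat, 2)
    if not (len(boxes) == len(landmarks) == len(scores)):
        raise ValueError("outputs describe different numbers of boxes")
    detections = []
    for box, lm, sc in zip(boxes, landmarks, scores):
        detections.append({
            "box": tuple(box),
            "landmarks": [(lm[k], lm[k + 1]) for k in range(0, 10, 2)],
            "score": sc[face_index],
        })
    return detections
```

With the sizes given in the example, the three buffers would hold 3160*4, 3160*10, and 3160*2 values and the function would return 3160 records.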

[0094] In this embodiment of the disclosure, the main computational resources for the neural network computation of the model are consumed in multiplication and addition operations. Processing-In-Memory (PIM) chips are suitable for neural network computation and can greatly reduce the process of repeatedly copying data during computation.

[0095] Figure 3 is a flowchart illustrating a method for acquiring a target face image according to an embodiment of the present disclosure. In some embodiments, as shown in Figure 3, obtaining the target face image based on the initial face detection box in step S12 may further include:

[0096] Step S31: Input the initial face detection box into the programmable logic unit for non-maximum suppression processing to obtain the target face detection box.

[0097] The initial face detection boxes obtained and selected by the above face detection are processed by non-maximum suppression (NMS) to obtain the target face detection boxes.

[0098] The non-maximum suppression (NMS) algorithm searches for local maxima and suppresses non-maximum elements.

[0099] During face detection inference, many initial face detection boxes (such as A, B, C, D, E, F, etc.) are generated. Many of these initial face detection boxes detect the same target face, but ultimately only one target face detection box is needed for each target face. Using non-maximum suppression (NMS), the initial face detection box with the highest score is selected first, for example initial face detection box C. Then, the IOU (Intersection over Union) value between initial face detection box C and each of the other initial face detection boxes is calculated. The IOU value is an evaluation metric for object detection. Any initial face detection box whose IOU value with box C exceeds a set threshold is suppressed by setting its score to 0. After this round, the initial face detection box with the highest score among the remaining boxes is selected, and the boxes whose IOU values with that box exceed the threshold are suppressed in turn. This repeats until only initial face detection boxes with almost no overlap are retained. In this way, the target face detection box corresponding to each target face is obtained.
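The greedy procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the programmable-logic implementation: boxes are assumed to be (x, y, w, h) with (x, y) as the top-left corner, the IOU threshold of 0.5 is an arbitrary example value, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two (x, y, w, h) boxes (top-left origin assumed)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    scores = list(scores)  # working copy; suppressed boxes get score 0
    keep = []
    while True:
        best = max(range(len(scores)), key=lambda i: scores[i], default=None)
        if best is None or scores[best] <= 0.0:
            break
        keep.append(best)
        for i in range(len(scores)):
            if i != best and scores[i] > 0.0:
                if iou(boxes[best], boxes[i]) > iou_threshold:
                    scores[i] = 0.0  # suppress the overlapping box
        scores[best] = 0.0  # mark the kept box as processed
    return keep
```

Setting the score to 0 mirrors the suppression method in the text; a consequence is that boxes whose original score is 0 or negative are never kept in this sketch.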

[0100] In some embodiments, after inputting the initial face detection box into the programmable logic unit and before performing non-maximum suppression processing, the face recognition method further includes:

[0101] In the programmable logic unit, based on the confidence scores of multiple initial face detection boxes output by the face detection model, a preset number of initial face detection boxes are selected from the multiple initial face detection boxes in descending order of confidence scores.

[0102] For example, the face detection model outputs 3160 initial face detection boxes, with a preset number of 100. After sorting the initial face detection boxes according to their confidence levels, the top 100 initial face detection boxes with the highest confidence levels are selected for the next step of processing.
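The pre-selection step can be expressed compactly in Python. The sketch below assumes the boxes and their confidence scores are held in two parallel lists; the function name and the default of 100 (taken from the example above) are illustrative only.

```python
def select_top_k(boxes, scores, k=100):
    """Keep the k boxes with the highest confidence, in descending score order."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [boxes[i] for i in order], [scores[i] for i in order]
```

Reducing 3160 candidates to 100 before non-maximum suppression bounds the number of pairwise IOU computations, which matters because the suppression loop compares each kept box against all remaining candidates.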

[0103] Furthermore, the initial face detection boxes are input into the programmable logic unit for non-maximum suppression processing to obtain the target face detection boxes, including: inputting a preset number of initial face detection boxes into the programmable logic unit for non-maximum suppression processing to obtain the target face detection boxes.

[0104] Step S32: Input the target face detection box into the in-memory computing chip.

[0105] Step S33: Map the target face detection box onto the image to be processed in the in-memory computing chip to obtain the target face image corresponding to the target face detection box on the image to be processed.

[0106] The detected target face detection box is mapped to the corresponding position coordinates on the image to be processed according to its position coordinates, and then cropped to obtain the target face image corresponding to the target face detection box. The cropped image is scaled to a predetermined pixel size, such as 112×112 pixels. Affine transformation is performed based on the coordinates of the facial key points (such as eyes, nose, and corners of the mouth) obtained by face detection to correct the target face image.

[0107] Figure 4 is a flowchart illustrating a method for acquiring an image to be processed according to an embodiment of the present disclosure. In some embodiments, as shown in Figure 4, acquiring the image to be processed in step S10 includes:

[0108] Step S41: Obtain the original image.

[0109] In some embodiments, the master control system controls the camera device to take pictures and receives the raw images captured by the camera device. In some embodiments, the system receives raw images captured and transmitted periodically by the camera device.

[0110] Step S42: Decode the original image and detect moving objects.

[0111] After acquiring the original image, the main control system decodes the original image and performs moving object detection on the decoded original image.

[0112] In some embodiments, the frame difference method is used to detect moving objects in the original image. The pixel values of corresponding pixels in two consecutive frames are compared to determine whether they are consistent. If the number of inconsistent pixels is greater than a predetermined threshold, a moving object is considered to be present in the original image.
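A minimal Python sketch of the frame difference check follows. It assumes single-channel (grayscale) frames stored as lists of rows. The `pixel_tolerance` parameter is an addition for illustration: the default of 0 reproduces the exact-consistency test in the text, while a larger value would ignore small sensor noise. The function name and the default count threshold are hypothetical.

```python
def has_motion(prev_frame, curr_frame, pixel_tolerance=0, count_threshold=50):
    """Report motion when enough corresponding pixels differ between frames."""
    if len(prev_frame) != len(curr_frame) or any(
        len(p) != len(c) for p, c in zip(prev_frame, curr_frame)
    ):
        raise ValueError("frames must have the same dimensions")
    changed = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            if abs(p - c) > pixel_tolerance:
                changed += 1  # this pixel is inconsistent between frames
    return changed > count_threshold
```

Because the check is a single pass over the pixels with no model inference, it is cheap enough to gate the more expensive detection stage.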

[0113] Step S43: In response to detecting a moving object in the original image, the original image is input to the programmable logic unit to perform image preprocessing to obtain the image to be processed.

[0114] Face detection models have certain input requirements. Therefore, after acquiring the original image, before inputting the image into the face detection model for face detection, image preprocessing is required to obtain an image to be processed that meets the model's input requirements.

[0115] In some embodiments, image preprocessing includes at least one of the following: image scaling, image format conversion, and image enhancement.

[0116] Image scaling refers to downsampling an image to a preset pixel size, such as 320×240 pixels. For example, the values of some pixels in the original image are taken directly as the pixel values of the scaled image, which improves processing efficiency. Image format conversion refers to converting the image to a preset format. For example, if the model requires the input image to be in RGB format, no conversion is needed when the original image is already in RGB format, but conversion is required when the original image is in YUV format. Image enhancement can involve histogram equalization to prevent the image from being too bright or too dark.
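The three preprocessing operations can be illustrated with the following sketch. All function names are hypothetical, images are assumed to be nested lists, and the YUV-to-RGB coefficients assume full-range BT.601 (JFIF-style) encoding, which the disclosure does not specify.

```python
def downsample(image, out_w, out_h):
    """Scale by directly picking source pixels (no interpolation)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def yuv_to_rgb(y, u, v):
    """Convert one pixel, assuming full-range BT.601 YUV (an assumption)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)

    def clamp(value):
        return max(0, min(255, int(round(value))))

    return clamp(r), clamp(g), clamp(b)


def equalize_histogram(gray):
    """Histogram equalization for a non-empty 8-bit grayscale image."""
    histogram = [0] * 256
    for row in gray:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    # Cumulative distribution of gray levels.
    cdf, running = [0] * 256, 0
    for level in range(256):
        running += histogram[level]
        cdf[level] = running
    cdf_min = next(count for count in cdf if count > 0)
    if total == cdf_min:  # Constant image: nothing to equalize.
        return [row[:] for row in gray]
    scale = total - cdf_min
    return [[round((cdf[v] - cdf_min) * 255 / scale) for v in row] for row in gray]
```

Under these assumptions, `downsample(image, 320, 240)` would produce the 320×240 input mentioned above.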

[0117] In this embodiment of the present disclosure, in step S13, the acquired target face image is input into a pre-configured face recognition model in the in-memory computing chip, so that the face recognition model is used in the in-memory computing chip to perform face recognition processing on the target face image, thereby obtaining the target face feature vector corresponding to the target face image. The dimension of the target face feature vector can be 256.

[0118] Figure 5 is a schematic diagram of the model architecture of a face recognition model according to an embodiment of the present disclosure. In some embodiments, as shown in Figure 5, the model is designed with a small structure because the computing power of the in-memory computing (PIM) chip is limited. The network input can be a 64×64 3-channel color image. To achieve a lightweight model design, the number of network channels and the number of network layers in the face recognition model are set to small values. In some embodiments, as shown in Figure 5, the face recognition model includes: a first network unit consisting of a convolutional layer (Conv2d), a batch normalization layer (BatchNorm), and a nonlinear activation function layer (ReLU6); four network blocks; three attention mechanism network blocks (Block_SE); a second network unit consisting of a convolutional layer (Conv2d), a batch normalization layer (BatchNorm), and a nonlinear activation function layer (ReLU6); a pooling layer (AveragePooling); and a fully connected layer consisting of a first linear layer (Linear), a batch normalization layer (BatchNorm), a nonlinear activation function layer (ReLU6), and a second linear layer (Linear). The first convolutional layer (Conv2d) has 16 channels, and N = 1. The value of N represents the number of network blocks in the model, and this value can be adjusted according to the computing power of the in-memory computing chip.

[0119] The network module (Block) consists of three units, each composed of a convolutional layer (Conv2d), a batch normalization layer (BatchNorm), and a nonlinear activation function layer (ReLU6). The attention mechanism network module (Block_SE) consists of one network module (Block) and one attention mechanism module (Squeeze-and-Excitation, SE). The feature map is passed through a shortcut connection between the network module (Block) and the attention mechanism module (SE). The attention mechanism module (SE) includes a pooling layer (AveragePooling) and two units, each composed of a convolutional layer (Conv2d), a batch normalization layer (BatchNorm), and a nonlinear activation function layer (ReLU6).

[0120] To improve the model's ability to represent facial features, knowledge distillation is used during training to enhance its inference and computational capabilities. The face recognition model to be trained is used as the student model. The number of network channels and the number of network layers of the student model are increased to obtain a larger model, which is then used as the teacher model. For example, the number of network channels in the first convolutional layer of the student model is set to 16 with N = 1, while the number of network channels in the first convolutional layer of the teacher model is set to 64 with N = 4.

[0121] The teacher model is a face recognition model pre-trained on a training dataset, while the student model is the face recognition model to be trained. Knowledge distillation refers to transferring the "dark knowledge" of the teacher model to the student model. Through knowledge distillation, the expressive power of the student model can be brought as close as possible to, or can even exceed, that of the teacher model. The low-complexity student model thereby reproduces the network prediction (output) of the high-complexity teacher model, which simplifies the network parameters while preserving network accuracy.
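The disclosure does not state which distillation loss is used. Purely as an illustration, one widely used formulation compares temperature-softened teacher and student outputs with a Kullback-Leibler divergence. The sketch below assumes this formulation; the function names, the `temperature` value, and the use of classification logits are all assumptions rather than the method of the disclosure.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for numerical stability."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )
    return kl * temperature ** 2
```

In such a setup, this term would typically be added to the ordinary training loss of the student model, so that the student learns from both the labels and the teacher's outputs.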

[0122] Figure 6 is a flowchart illustrating a method of obtaining face recognition results according to an embodiment of the present disclosure. In some embodiments, as shown in Figure 6, comparing the target face feature vector with the feature vector corresponding to the preset face image in step S14 to obtain the face recognition result includes: inputting the target face feature vector into the programmable logic unit so that the programmable logic unit performs the following processing:

[0123] Step S61: Normalize the target face feature vector to obtain the target face normalized feature vector.

[0124] Normalization refers to subtracting the mean from the target face feature vector and then dividing it by its variance.

[0125] Step S62: Compare the normalized feature vector of the target face with the feature vector corresponding to the preset face image in the preset database to obtain the face feature matching degree.

[0126] The feature vector corresponding to the preset face image is obtained by normalizing the face feature vector that the face recognition model extracts from the preset face image.

[0127] In some embodiments, the facial feature matching degree can be obtained by calculating the similarity between two feature vectors. The similarity between the two feature vectors can be calculated using a preset vector similarity algorithm, such as Euclidean distance similarity algorithm, cosine similarity algorithm, etc.

[0128] Step S63: When the facial feature matching degree is greater than the preset threshold, obtain the facial recognition result indicating successful facial matching.

[0129] When the facial feature matching degree is greater than the preset threshold, it means that, among the facial images represented by the preset facial feature vectors in the preset database, there is one that is similar to the target facial image. In other words, the target facial image is matched in the preset database, and thus a facial recognition result indicating successful facial matching is obtained.

[0130] When the facial feature matching degree is less than or equal to the preset threshold, it means that the facial images represented by the preset facial feature vectors in the preset database are not similar to the target facial image. In other words, the target facial image cannot be matched in the preset database, and thus a facial recognition result indicating unsuccessful facial matching is obtained.

[0131] In some embodiments, the normalized feature vector of the target face can be compared one by one with multiple preset face feature vectors in a preset database to obtain multiple face feature matching degrees. The maximum face feature matching degree is obtained from the multiple face feature matching degrees, and the maximum face feature matching degree is compared with a preset threshold to determine whether the target face image matches the face image in the preset database.
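Steps S61 to S63 and the one-by-one comparison above can be sketched as follows. The normalization follows the wording of paragraph [0124] (subtract the mean, divide by the variance); many implementations divide by the standard deviation instead, and the cosine similarity used here is unaffected by that positive scale factor. The function names, the dictionary-based database, and the threshold value 0.5 are hypothetical.

```python
import math


def normalize(vector):
    """Subtract the mean, then divide by the variance, as worded in [0124]."""
    mean = sum(vector) / len(vector)
    centered = [value - mean for value in vector]
    variance = sum(value * value for value in centered) / len(vector)
    if variance == 0:
        return centered
    return [value / variance for value in centered]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_face(target_vector, database, threshold=0.5):
    """Return (identity, score) for the best match, or (None, score) if no match.

    database: dict mapping identity -> already normalized feature vector.
    """
    target = normalize(target_vector)
    best_identity, best_score = None, -1.0
    for identity, stored_vector in database.items():
        score = cosine_similarity(target, stored_vector)
        if score > best_score:
            best_identity, best_score = identity, score
    # Only the maximum matching degree is compared with the threshold.
    if best_score > threshold:
        return best_identity, best_score
    return None, best_score
```

A preset database under these assumptions could be built as `{"alice": normalize(vector_a), "bob": normalize(vector_b)}`, where each vector is the output of the face recognition model for a labeled preset face image.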

[0132] In some embodiments, the establishment of the preset database involves first acquiring multiple preset face images and labeling their identity information, then using a face recognition model to extract their face feature vectors, and performing normalization processing to obtain the face normalization feature vector corresponding to each preset face image, which is then saved to the preset database.

[0133] It is understood that the various method embodiments mentioned above in this disclosure can be combined with each other to form combined embodiments without violating the principle and logic. Due to space limitations, this disclosure will not elaborate further. Those skilled in the art will understand that in the above methods of specific implementation, the specific execution order of each step should be determined by its function and possible internal logic.

[0134] Figure 7 is a schematic diagram of the structure of a face recognition device provided in an embodiment of this disclosure. As shown in Figure 7, the face recognition device 700 includes: an acquisition module 701, a detection module 702, a processing module 703, a recognition module 704, and a comparison module 705.

[0135] The system comprises: an acquisition module 701 configured to acquire an image to be processed; a detection module 702 configured to input the image to be processed into an in-memory computing chip equipped with a face detection model, and use the face detection model to perform face detection processing on the image to be processed to obtain an initial face detection box in the image to be processed; a processing module 703 configured to acquire a target face image based on the initial face detection box; a recognition module 704 configured to perform face recognition processing on the target face image in the in-memory computing chip using a face recognition model pre-configured in the in-memory computing chip to obtain a target face feature vector corresponding to the target face image; and a comparison module 705 configured to compare the target face feature vector with the feature vector corresponding to a preset face image to obtain a face recognition result.

[0136] The face recognition device 700 provided in this disclosure is used to implement the face recognition method provided in any of the above embodiments. For specific related descriptions, please refer to the descriptions in the face recognition methods of any of the above embodiments, which will not be repeated here.

[0137] Figure 8 is a schematic diagram of the structure of an in-memory computing system provided in an embodiment of the present disclosure. As shown in Figure 8, the in-memory computing system 800 includes: an in-memory computing chip 801 and a PL (Programmable Logic) terminal 802.

[0138] The in-memory computing chip 801 integrates a face detection model, which is configured to perform face detection processing on the image to be processed using the face detection model to obtain an initial face detection box in the image to be processed, and send the initial face detection box to the PL terminal 802; the PL terminal 802 is deployed on an FPGA (Field Programmable Gate Array) device, which is configured to acquire a target face image based on the initial face detection box and send the target face image to the in-memory computing chip 801; the in-memory computing chip 801 also integrates a face recognition model, which is further configured to perform face recognition processing on the target face image using the face recognition model to obtain a target face feature vector corresponding to the target face image, and send the target face feature vector to the PL terminal 802; the PL terminal 802 is further configured to compare the target face feature vector with the feature vector corresponding to a preset face image to obtain a face recognition result.

[0139] In some embodiments, PL terminal 802 is configured to: perform non-maximum suppression processing on the initial face detection box to obtain a target face detection box; and map the target face detection box onto the image to be processed to obtain the target face image corresponding to the target face detection box on the image to be processed.
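A minimal sketch of non-maximum suppression is given below for illustration. It assumes boxes in (x1, y1, x2, y2) format with one confidence score each. The function names, the IoU threshold of 0.4, and the optional `top_k` parameter (which keeps only the highest-confidence boxes before suppression, in descending order of confidence) are hypothetical choices for this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.4, top_k=None):
    """Return indices of the boxes kept, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]  # Optional pre-selection by confidence.
    kept = []
    for index in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[index], boxes[k]) <= iou_threshold for k in kept):
            kept.append(index)
    return kept
```

The boxes at the returned indices play the role of the target face detection boxes in this sketch.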

[0140] In some embodiments, as shown in Figure 8, the in-memory computing system 800 also includes a PS (Processing System) terminal 803, which is deployed on an FPGA device. The PS terminal 803 is configured to acquire an image to be processed, which includes: acquiring an original image from a camera device; performing image decoding and moving object detection on the original image; and sending the original image to the PL terminal 802 when a moving object is detected in the original image. The PL terminal 802 is also configured to perform image preprocessing on the original image in response to the PS terminal 803 detecting a moving object in the original image, obtain an image to be processed, and send the image to be processed to the in-memory computing chip 801.

[0141] In some embodiments, a Linux operating system can be deployed on the FPGA device as the main control system (PS side) to control and schedule hardware resources. Based on the operating system, some general-purpose algorithms with low performance requirements (such as motion detection algorithms) can be implemented.

[0142] In some embodiments, the PL terminal 802 is configured to: normalize the target face feature vector to obtain a normalized target face feature vector; compare the normalized target face feature vector with the feature vector corresponding to a preset face image in a preset database to obtain a face feature matching degree; and when the face feature matching degree is greater than a preset threshold, obtain a face recognition result in which the face is successfully matched.

[0143] Figure 9 is a schematic diagram illustrating an application scenario of an in-memory computing system provided in an embodiment of this disclosure. As shown in Figure 9, in some application scenarios, the in-memory computing system 900 can be applied to the terminal device 905. The main control system (PS terminal) 903 in the in-memory computing system 900 can be used as the device control end to interact with the camera device 904 and the terminal device 905. For example, it can control the camera device 904 to perform image acquisition and transmit the face detection results and face recognition results to the terminal device 905.

[0144] In some application scenarios, as shown in Figure 9, the camera device 904 transmits the captured raw image to the main control system 903. The main control system 903 performs image signal decoding and moving object detection to activate the programmable logic unit (PL terminal) 902. When the main control system 903 detects a moving object, it transmits the image to the programmable logic unit (PL terminal) 902 for image preprocessing. The programmable logic unit (PL terminal) 902 can maintain a normally-on, low-computing-power operating mode in the hardware. After activation, the programmable logic unit (PL terminal) 902 transmits the processed image to the in-memory computing chip 901, which integrates a face detection model and a face recognition model. The in-memory computing chip 901 performs face detection processing using the face detection model and then sends the feature map back to the programmable logic unit (PL terminal) 902 for feature map post-processing to obtain the target face detection box. The corresponding target face image is then cropped from the image to be processed and sent to the in-memory computing chip 901. The in-memory computing chip 901 uses the face recognition model to calculate the face feature vector of the target face image and sends it back to the programmable logic unit (PL terminal) 902 for face feature comparison to obtain the face recognition result. The programmable logic unit (PL terminal) 902 feeds back the face detection result and the face recognition result to the terminal device 905 through the main control system 903, so that the terminal device 905 can display them in real time or use them to implement other functions, such as device wake-up and unlocking.
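The hand-off between the main control system (PS terminal), the programmable logic unit (PL terminal), and the in-memory computing chip in this scenario can be summarized by the following sketch. Every stage is passed in as a callable, and all names are hypothetical placeholders for hardware functions that the disclosure describes only at block level.

```python
def run_pipeline(prev_frame, frame, *, detect_motion, preprocess, detect_faces,
                 postprocess, crop, extract_features, compare):
    """Run one frame through the stages; return None if no motion is detected."""
    # PS terminal: moving object detection gates all later stages.
    if not detect_motion(prev_frame, frame):
        return None
    # PL terminal: image preprocessing.
    image = preprocess(frame)
    # In-memory computing chip: face detection model.
    feature_map = detect_faces(image)
    # PL terminal: feature map post-processing to get target detection boxes.
    boxes = postprocess(feature_map)
    results = []
    for box in boxes:
        face = crop(image, box)            # PL terminal: crop the target face
        vector = extract_features(face)    # In-memory chip: recognition model
        results.append((box, compare(vector)))  # PL terminal: feature comparison
    return results
```

Because the motion check comes first, the later stages stay idle when the scene is static, which is consistent with the low-power goal described above.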

[0145] In some embodiments, the in-memory computing system 900 may be integrated into the terminal device 905, and further, the aforementioned camera device 904 may also be integrated into the terminal device 905.

[0146] In some application scenarios, the terminal device 905 can use the in-memory computing system 900 to perform face detection and face recognition for low-power electronic devices such as e-ink tablets, drones, and wearable devices, thereby enabling functions such as device wake-up display and unlocking.

[0147] In drone application scenarios, the terminal device 905 can also use the in-memory computing system 900 to perform face detection and face recognition, and realize functions such as assisting obstacle avoidance, avoiding pedestrians, and chasing suspects.

[0148] In wearable device application scenarios, such as smartwatches and smart glasses, the face recognition method provided in the embodiments of this disclosure can be used to achieve functions such as unlocking the watch and waking up the screen. When a single face is recognized and it is the owner's face, the screen will light up and a notification message will be displayed. When multiple faces are recognized, even if the owner's face is recognized, the screen can remain off and the notification message can be hidden.

[0149] In smart glasses applications, facial feature matching can be performed locally. However, the glasses have limited storage, and the amount of data in the facial feature database is also small. Therefore, the device can upload facial features to the cloud via the Internet and perform facial feature search and comparison in the large database in the cloud.

[0150] In some embodiments, the terminal device 905 may also be an electronic device such as a smartphone, computer, tablet, or television.

[0151] This disclosure also provides an electronic device that includes the in-memory computing system provided in the above embodiments. The in-memory computing system can be integrated into the electronic device to realize face detection and face recognition functions.

[0152] Figure 10 is a block diagram of another electronic device provided in an embodiment of this disclosure.

[0153] Referring to Figure 10, this disclosure provides an electronic device 1000, which includes: at least one processor 1001; at least one memory 1002; and one or more I/O interfaces 1003 connected between the processor 1001 and the memory 1002; wherein the memory 1002 stores one or more computer programs that can be executed by the at least one processor 1001, and the one or more computer programs are executed by the at least one processor 1001 to enable the at least one processor 1001 to perform the above-described face recognition method.

[0154] This disclosure also provides a computer-readable storage medium storing a computer program thereon, wherein the computer program, when executed by a processor, implements the aforementioned face recognition method. The computer-readable storage medium may be volatile or non-volatile.

[0155] This disclosure also provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code is run in the processor of an electronic device, the processor in the electronic device executes the above-described face recognition method.

[0156] Those skilled in the art will understand that all or some of the steps, systems, and apparatuses disclosed above, and their functional modules / units, can be implemented as software, firmware, hardware, or suitable combinations thereof. In hardware implementations, the division between functional modules / units mentioned above does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed collaboratively by several physical components. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software can be distributed on a computer-readable storage medium, which may include computer storage media (or non-transitory media) and communication media (or transient media).

[0157] As is known to those skilled in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable program instructions, data structures, program modules, or other data). Computer storage media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technologies, portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and is accessible to a computer. Furthermore, it is known to those skilled in the art that communication media typically contain computer-readable program instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

[0158] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.

[0159] Computer program instructions used to perform the operations of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, status setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing the status information of the computer-readable program instructions to implement various aspects of this disclosure.

[0160] The computer program product described herein can be implemented specifically through hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is specifically embodied in a computer storage medium; in another alternative embodiment, the computer program product is specifically embodied in a software product, such as a software development kit (SDK), etc.

[0161] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.

[0162] These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processor of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner; thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0163] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to perform the functions / actions specified in one or more boxes of a flowchart and / or block diagram.

[0164] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0165] Example embodiments have been disclosed herein, and while specific terminology has been used, it is for illustrative purposes only and should be construed as such, and is not intended to be limiting. In some instances, it will be apparent to those skilled in the art that features, characteristics, and / or elements described in connection with particular embodiments may be used alone, or in combination with features, characteristics, and / or elements described in connection with other embodiments, unless otherwise expressly indicated. Therefore, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of this disclosure as set forth by the appended claims.

Claims

1. A face recognition method, characterized in that the face recognition method includes: obtaining an image to be processed; inputting the image to be processed into an in-memory computing chip configured with a face detection model, and performing face detection processing on the image to be processed using the preset face detection model to obtain an initial face detection box in the image to be processed; obtaining a target face image based on the initial face detection box, including: inputting the initial face detection box into a programmable logic unit for non-maximum suppression processing to obtain a target face detection box; inputting the target face detection box into the in-memory computing chip; and mapping the target face detection box onto the image to be processed in the in-memory computing chip to obtain the target face image corresponding to the target face detection box on the image to be processed; performing, in the in-memory computing chip, face recognition processing on the target face image using a face recognition model pre-configured in the in-memory computing chip to obtain a target face feature vector corresponding to the target face image; and comparing the target face feature vector with a feature vector corresponding to a preset face image to obtain a face recognition result, including: inputting the target face feature vector into the programmable logic unit so that the programmable logic unit performs the following processing: normalizing the target face feature vector to obtain a target face normalized feature vector; comparing the target face normalized feature vector with the feature vector corresponding to the preset face image in a preset database to obtain a face feature matching degree; and when the face feature matching degree is greater than a preset threshold, obtaining a face recognition result indicating successful face matching.

2. The face recognition method according to claim 1, characterized in that, before the non-maximum suppression processing is performed, the face recognition method further includes: in the programmable logic unit, based on the confidence levels of multiple initial face detection boxes output by the face detection model, selecting a preset number of initial face detection boxes from the multiple initial face detection boxes in descending order of confidence level; and the step of inputting the initial face detection box into the programmable logic unit for non-maximum suppression processing to obtain the target face detection box includes: inputting the preset number of initial face detection boxes into the programmable logic unit for non-maximum suppression processing to obtain the target face detection box.

3. The face recognition method according to claim 1, characterized in that acquiring the image to be processed includes: obtaining an original image; performing image decoding and moving object detection on the original image; and in response to detecting a moving object in the original image, inputting the original image to the programmable logic unit to perform image preprocessing to obtain the image to be processed.

4. The face recognition method according to claim 3, characterized in that the image preprocessing includes at least one of the following: image scaling, image cropping, image format conversion, and image enhancement.

5. The face recognition method according to claim 3, characterized in that the moving object detection includes using the frame difference method to detect moving objects in the original image.

6. The face recognition method according to any one of claims 1-5, characterized in that the face detection model includes a Backbone layer, a Neck layer, and a Head layer; the Backbone layer is used to extract image features from the input image to be processed; the Neck layer is used to reduce or adjust the image features from the Backbone layer; and the Head layer is used to take the features processed by the Neck layer as input and generate the final network output to obtain the initial face detection box.

7. The face recognition method according to any one of claims 1-5, characterized in that the face recognition model is trained by a knowledge distillation method.

8. A face recognition device, characterized by comprising: an acquisition module configured to acquire an image to be processed; a detection module configured to input the image to be processed into an in-memory computing chip configured with a face detection model, and to perform face detection processing on the image to be processed using the face detection model to obtain an initial face detection box in the image to be processed; a processing module configured to acquire a target face image based on the initial face detection box, the processing module being configured to: input the initial face detection box into a programmable logic unit for non-maximum suppression processing to obtain a target face detection box; input the target face detection box into the in-memory computing chip; and map the target face detection box onto the image to be processed in the in-memory computing chip to obtain the target face image corresponding to the target face detection box on the image to be processed; a recognition module configured to perform face recognition processing on the target face image in the in-memory computing chip using a face recognition model pre-configured in the in-memory computing chip, to obtain a target face feature vector corresponding to the target face image; and a comparison module configured to compare the target face feature vector with a feature vector corresponding to a preset face image to obtain a face recognition result, the comparison module being configured to input the target face feature vector into the programmable logic unit so that the programmable logic unit performs the following processing: normalizing the target face feature vector to obtain a target face normalized feature vector; comparing the target face normalized feature vector with the feature vector corresponding to the preset face image in a preset database to obtain a face feature matching degree; and, when the face feature matching degree is greater than a preset threshold, obtaining a face recognition result indicating a successful face match.
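For illustration only, the normalization and comparison performed by the comparison module can be sketched as follows. The claims do not state which norm or which matching measure is used; L2 normalization, cosine similarity as the matching degree, the threshold of 0.6, and the names `l2_normalize` and `match_face` are all assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def match_face(feature: List[float],
               database: Dict[str, List[float]],
               threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Return the best-matching identity, or None if below the threshold."""
    query = l2_normalize(feature)
    best_name: Optional[str] = None
    best_score = -1.0
    for name, reference in database.items():
        # Dot product of unit vectors equals their cosine similarity.
        score = sum(q * r for q, r in zip(query, l2_normalize(reference)))
        if score > best_score:
            best_name, best_score = name, score
    # A match is declared only when the score exceeds the preset threshold.
    if best_score > threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical two-dimensional database, used only to exercise the sketch.
preset_database = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
name, score = match_face([3.0, 0.1], preset_database)
```

Because both vectors are normalized first, the matching degree in this sketch depends only on the direction of the feature vectors and not on their magnitude.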

9. An in-memory computing system, characterized by comprising: an in-memory computing chip integrating a face detection model and configured to perform face detection processing on an image to be processed using the face detection model, obtain an initial face detection box in the image to be processed, and send the initial face detection box to a PL terminal; the PL terminal, configured to acquire a target face image based on the initial face detection box and send the target face image to the in-memory computing chip, the PL terminal being configured to: perform non-maximum suppression processing on the initial face detection box to obtain a target face detection box; and map the target face detection box onto the image to be processed to obtain the target face image corresponding to the target face detection box on the image to be processed; wherein the in-memory computing chip further integrates a face recognition model and is configured to perform face recognition processing on the target face image using the face recognition model, obtain a target face feature vector corresponding to the target face image, and send the target face feature vector to the PL terminal; and the PL terminal is further configured to compare the target face feature vector with a feature vector corresponding to a preset face image to obtain a face recognition result, the PL terminal being configured to: normalize the target face feature vector to obtain a target face normalized feature vector; compare the target face normalized feature vector with the feature vector corresponding to the preset face image in a preset database to obtain a face feature matching degree; and, when the face feature matching degree is greater than a preset threshold, obtain a face recognition result indicating a successful face match.
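For illustration only, mapping a target face detection box onto the image to be processed to obtain the target face image can be sketched as a clamped crop. The claims do not specify the coordinate convention; the function name `crop_face`, pixel-coordinate boxes of the form (x1, y1, x2, y2), and the list-of-rows image format are assumptions.

```python
from typing import List, Tuple

# Hypothetical image format: rows of pixel values.
Image = List[List[int]]


def crop_face(image: Image, box: Tuple[float, float, float, float]) -> Image:
    """Crop the region covered by a detection box, clamped to the image."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Round to whole pixels and clamp so the box stays inside the image.
    x1 = max(0, min(width, int(round(box[0]))))
    y1 = max(0, min(height, int(round(box[1]))))
    x2 = max(0, min(width, int(round(box[2]))))
    y2 = max(0, min(height, int(round(box[3]))))
    if x2 <= x1 or y2 <= y1:
        raise ValueError("detection box does not overlap the image")
    return [row[x1:x2] for row in image[y1:y2]]


# A 4x4 test image whose pixel value encodes its position (row * 4 + column).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
face = crop_face(image, (1, 1, 3, 3))
```

Clamping is a design choice of this sketch: a box that extends past the image border yields a smaller crop instead of an error, while a box entirely outside the image is rejected.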

10. The in-memory computing system according to claim 9, characterized in that the in-memory computing system further comprises a PS terminal deployed on an FPGA device; the PS terminal is configured to: acquire an original image from a camera device; perform image decoding and moving object detection on the original image; and, when a moving object is detected in the original image, send the original image to the PL terminal; and the PL terminal is further configured to, in response to the PS terminal detecting a moving object in the original image, perform image preprocessing on the original image to obtain the image to be processed, and send the image to be processed to the in-memory computing chip.

11. An electronic device, characterized by comprising the in-memory computing system according to claim 9 or 10.

12. An electronic device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, and the one or more computer programs, when executed by the at least one processor, enable the at least one processor to perform the face recognition method according to any one of claims 1-7.

13. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the face recognition method according to any one of claims 1-7.