A face key point detection method, device and equipment based on a lightweight network

By constructing a lightweight network based on PConv convolutional modules and combining it with structural reparameterization techniques, the model structure was optimized, solving the problem of insufficient detection accuracy of lightweight networks on mobile devices, and achieving fast and accurate facial landmark detection.

CN117253272B · Active · Publication Date: 2026-06-23 · XIAMEN MEITUZHIJIA TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: XIAMEN MEITUZHIJIA TECH
Filing Date: 2023-09-07
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing lightweight networks cannot effectively balance model performance and detection accuracy on mobile devices, and cannot guarantee high detection accuracy while maintaining a small amount of computation and model size.

Method used

A lightweight network based on the PConv convolutional module is constructed. Combined with structural reparameterization technology, the model is trained and inferred through the CS-PConv Block and Rep-PConv modules to optimize the model structure and improve detection accuracy.

Benefits of technology

It achieves fast and accurate facial landmark detection on mobile devices, meeting the requirements for real-time performance and accuracy.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a face key point detection method, device, equipment and storage medium based on a lightweight network. The method comprises the following steps: constructing a training dataset of face key points; inputting the training dataset into a lightweight network built on a PConv convolution module for model training; training and optimizing the lightweight network with a mean squared error loss function, and performing a structural reparameterization operation on the Rep-PConv module in the lightweight network to obtain a face point detection model; and inputting a face image to be detected into the face point detection model for face key point detection to obtain a detection result. By building the lightweight network on the PConv lightweight convolution structure combined with structural reparameterization, the scheme improves detection accuracy while keeping the model structure lightweight.


Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a method, apparatus and device for facial landmark detection based on lightweight networks.

Background Technology

[0002] Facial landmark detection is a computer vision task that detects and locates specific locations on a person's face in an image or video. It typically involves identifying the geometric locations of facial features such as the eyes, nose, mouth, and chin. These landmarks provide rich information about facial expressions, poses, and identity, and are one of the fundamental technologies in applications such as face recognition and facial beautification.

[0003] Currently, facial landmark detection technology is mainly developed on deep neural network models and is primarily deployed on mobile devices. Because mobile resources are often limited, ensuring model performance is crucial for real-time application on mobile devices. However, high-precision deep neural network models typically require high computational complexity and significant storage. Therefore, improving detection accuracy while maintaining performance necessitates model lightweighting. Research on lightweight deep neural networks mainly falls into the following categories: first, network design methods, which reduce computational load by designing smaller, more efficient neural network structures or modules; second, strategies applied to an already designed network to reduce computational load or storage requirements, such as network pruning, model compression, and parameter quantization; and third, knowledge distillation, which transfers knowledge from a large pre-trained teacher network to a smaller student network, allowing the smaller network to learn the characteristics of the larger network and thus achieve better performance while maintaining a small computational load and model size. However, existing lightweight networks still have limited ability to balance model performance and detection accuracy on mobile devices: they fail to maintain high detection accuracy while keeping computational cost and model size low, because their limited channel information restricts the model's learning capacity.

Summary of the Invention

[0004] In view of this, the purpose of this invention is to provide a method, apparatus, and device for facial landmark detection based on lightweight networks, in order to solve the above-mentioned problems.

[0005] To achieve the above objectives, the present invention provides a facial landmark detection method based on lightweight networks, the method comprising:

[0006] Construct a training dataset for facial landmarks;

[0007] The training dataset is input into a lightweight network built based on the PConv convolutional module for model training;

[0008] The lightweight network is trained and optimized using the mean squared error loss function, and the Rep-PConv module in the lightweight network is structurally reparameterized to obtain a face point detection model.

[0009] The face image to be detected is input into the face point detection model to detect facial key points and obtain the detection results.

[0010] Preferably, the training dataset for constructing facial landmarks includes:

[0011] The facial landmarks in the acquired face images are marked, and the faces in the face images are aligned using the facial landmarks, cropped and scaled to a preset size to obtain a face dataset.

[0012] Data augmentation is applied to the face dataset to obtain the training dataset.

[0013] Preferably, the lightweight network structure includes a CS-PConv Block module, a downsampling layer, and a fully connected output layer; wherein,

[0014] The input feature map is processed by the CS-PConv Block module to obtain the first output;

[0015] The first output is downsampled using the downsampling layer to obtain the second output;

[0016] The second output is subjected to facial landmark location prediction through the fully connected output layer to obtain the facial landmark prediction result.

[0017] Preferably, the CS-PConv Block module includes a Channel Shuffle module, a Rep-PConv module, and a GConv 1*1 module; the CS-PConv Block module processes the input feature map to obtain a first output, including:

[0018] The input feature map is shuffled through the Channel Shuffle module to obtain the first feature map.

[0019] The first feature map is input into the Rep-PConv module to obtain the second feature map, wherein the Rep-PConv module is a PConv module based on structural reparameterization;

[0020] The second feature map is input into the Channel Shuffle module for channel restoration to obtain the third feature map;

[0021] The third feature map is input into the GConv 1*1 module for inter-channel information aggregation, and after adding an identity connection, the first output is obtained.

[0022] Preferably, the Rep-PConv module includes k parallel 3*3PConv and 1*1PConv convolutional branches and BN layers.

[0023] Preferably, the step of performing structural reparameterization on the Rep-PConv module in the lightweight network includes:

[0024] The reparameterized computation is performed according to a reparameterization formula, in which Reparam represents the reparameterized computation, (3*3PConv-BN) represents a 3*3PConv followed by a BN layer, (1*1PConv-BN) represents a 1*1PConv followed by a BN layer, and k represents the number of parallel branches.

[0025] To achieve the above objectives, the present invention also provides a facial landmark detection device based on a lightweight network, the device comprising:

[0026] A building unit, used to construct a training dataset for facial landmarks;

[0027] The training unit is used to input the training dataset into a lightweight network built based on the PConv convolutional module for model training.

[0028] The optimization unit is used to train and optimize the lightweight network using the mean squared error loss function, and to perform structural reparameterization operation on the Rep-PConv module in the lightweight network to obtain the face point detection model.

[0029] The detection unit is used to input the face image to be detected into the face point detection model to detect face key points and obtain the detection result.

[0030] To achieve the above objectives, the present invention also proposes an apparatus comprising a processor, a memory, and a computer program stored in the memory, the computer program being executed by the processor to implement the steps of a face key point detection method based on a lightweight network as described in the above embodiments.

[0031] To achieve the above objectives, the present invention also proposes a computer-readable storage medium storing a computer program that is executed by a processor to implement the steps of a face key point detection method based on a lightweight network as described in the above embodiments.

[0032] Beneficial effects:

[0033] The above solution utilizes a lightweight convolutional structure based on Partial Convolution (PConv) combined with structural reparameterization techniques to construct a lightweight network for face landmark detection, improving detection accuracy while maintaining a lightweight structure. In practical applications, it can achieve fast and accurate face landmark prediction, meeting the real-time and accuracy requirements of mobile deployments.

[0034] The above solution constructs a CS-PConv Block by introducing a channel shuffle structure on top of PConv, distributing the fixed channel information obtained by PConv convolutions into different channel groups. This allows the ordinary Conv convolutional structure that normally follows PConv to be replaced with GConv group convolutions, further reducing the computational cost of the model. Furthermore, by introducing structural reparameterization, a Rep-PConv reparameterized structure is constructed based on PConv convolutions. During training, Rep-PConv performs parallel multi-branch PConv learning and optimization; during inference, Rep-PConv is reparameterized so that it retains a single-branch structure. Combining these two aspects achieves the goal of improving model detection accuracy while maintaining a lightweight structure.

Description of the Drawings

[0035] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0036] Figure 1 is a flowchart illustrating a face landmark detection method based on a lightweight network, as provided in an embodiment of the present invention.

[0037] Figure 2 is a schematic diagram of the network structure of the CS-PConv Block module provided in an embodiment of the present invention.

[0038] Figure 3 (a) is a schematic diagram of the network structure of the Rep-PConv training process provided in an embodiment of the present invention.

[0039] Figure 3 (b) is a schematic diagram of the network structure for Rep-PConv inference provided in an embodiment of the present invention.

[0040] Figure 4 is a schematic diagram of a face landmark detection device based on a lightweight network, provided in an embodiment of the present invention.

[0041] The realization of the invention's objective, its functional characteristics, and advantages will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

[0042] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to represent selected embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0043] In the description of this invention, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined with "first" and "second" may explicitly or implicitly include one or more of that feature.

[0044] The present invention will be described in detail below with reference to the embodiments.

[0045] This invention proposes a face landmark detection method based on a lightweight network, primarily relying on network design methods to achieve model lightweighting. The lightweight convolutional layer Partial Convolution (PConv) exploits feature map redundancy by performing convolution operations on only a subset of the channels. To make full use of all channel information, a pointwise convolution (PWConv) is conventionally appended after PConv. However, for real-time models on mobile devices, the computational cost of PWConv is still considerable. Therefore, channel shuffle and GConv are further combined to construct the CS-PConv Block, reducing computational cost while maximizing the utilization of channel information. To further compensate for the lack of channel information and improve the model's learning ability, a reparameterized structure, Rep-PConv, is constructed using structural reparameterization techniques, enabling parallel multi-branch training during model training while maintaining a single-branch structure during inference.

[0046] Referring to Figure 1, a flowchart of a face landmark detection method based on a lightweight network according to an embodiment of the present invention is shown.

[0047] In this embodiment, the method includes:

[0048] S11, Construct a training dataset for facial landmarks.

[0049] Furthermore, in step S11, constructing the training dataset for facial key points includes:

[0050] The facial landmarks in the acquired face images are marked, and the faces in the face images are aligned using the facial landmarks, cropped and scaled to a preset size to obtain a face dataset.

[0051] Data augmentation is applied to the face dataset to obtain the training dataset.

[0052] In this embodiment, a facial landmark training dataset is constructed. Facial landmarks are annotated on the collected face data to create facial landmark labels, and faces are aligned using these landmarks. The faces are then cropped and scaled to a uniform size H×W, and the resulting face dataset is denoted as S. Augmentation perturbations are then applied to the face dataset S by randomly performing angle correction, overlay scaling, rotation, horizontal flipping, histogram equalization, and other data augmentation operations to obtain the training dataset.
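As one illustration of the augmentation step, the following minimal sketch applies a random horizontal flip to a cropped face image together with its landmark labels. The function name `random_hflip`, the `(n, 2)` pixel-coordinate layout of the landmarks, and the flip probability `p` are assumptions made for this example; the patent does not specify them.

```python
import numpy as np

def random_hflip(image, landmarks, rng, p=0.5):
    """Flip an H x W (x C) image left-right with probability p.

    landmarks: (n, 2) array of (x, y) pixel coordinates (assumed layout).
    """
    if rng.random() >= p:
        return image, landmarks
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    moved = landmarks.copy()
    moved[:, 0] = (width - 1) - moved[:, 0]  # mirror x; y is unchanged
    return flipped, moved

rng = np.random.default_rng(0)
img = np.arange(12, dtype=np.float32).reshape(3, 4)   # tiny 3 x 4 "image"
pts = np.array([[0.0, 1.0], [3.0, 2.0]])              # two landmarks
flipped_img, flipped_pts = random_hflip(img, pts, rng, p=1.0)  # p=1.0 forces the flip
```

In practice a horizontal flip usually also requires swapping the indices of left/right landmark pairs (for example, left and right eye corners) so that each label keeps its meaning. That mapping depends on the annotation scheme and is omitted here.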

[0053] S12, input the training dataset into the lightweight network built based on the PConv convolutional module for model training.

[0054] The lightweight network structure includes a CS-PConv Block module, a downsampling layer, and a fully connected output layer; wherein,

[0055] The input feature map is processed by the CS-PConv Block module to obtain the first output;

[0056] The first output is downsampled using the downsampling layer to obtain the second output;

[0057] The second output is subjected to facial landmark location prediction through the fully connected output layer to obtain the prediction output result.

[0058] Furthermore, the downsampling layer includes an average pooling layer with a stride of 2 and a convolutional layer with a kernel size of 1.

[0059] Furthermore, the CS-PConv Block module includes a Channel Shuffle module, a Rep-PConv module, and a GConv 1*1 module; the CS-PConv Block module processes the input feature map to obtain a first output, including:

[0060] The input feature map is shuffled through the Channel Shuffle module to obtain the first feature map.

[0061] The first feature map is input into the Rep-PConv module to obtain the second feature map, wherein the Rep-PConv module is a PConv module based on structural reparameterization;

[0062] The second feature map is input into the Channel Shuffle module for channel restoration to obtain the third feature map;

[0063] The third feature map is input into the GConv 1*1 module for inter-channel information aggregation, and after adding an identity connection, the first output is obtained.

[0064] Furthermore, the step of shuffling the input feature map through the Channel Shuffle module to obtain the first feature map includes:

[0065] The Channel Shuffle module transposes the input feature maps in different channel groups to arrange them in the same channel group, resulting in the first feature map after channel shuffling.

[0066] Furthermore, the Rep-PConv module includes k parallel 3*3PConv and 1*1PConv convolutional branches and BN layers.

[0067] S13, the lightweight network is trained and optimized using the mean squared error loss function, and the Rep-PConv module in the lightweight network is structurally reparameterized to obtain the face point detection model.

[0068] S14, input the face image to be detected into the face point detection model to perform face key point detection and obtain the detection result.

[0069] In this embodiment, the lightweight network is mainly composed of CS-PConv Blocks built on PConv convolutional modules. The entire lightweight network includes 5 CS-PConv Blocks, 4 downsampling layers, and a fully connected output layer (fc layer). Each downsampling layer consists of an average pooling layer with a stride of 2 and a convolutional layer with a kernel size of 1. The network structure is shown in Figure 2. Specifically, each CS-PConv Block mainly includes Channel Shuffle, Rep-PConv, and GConv 1*1. First, Channel Shuffle is used to shuffle the channels; next, Rep-PConv is applied; then Channel Shuffle is used again to restore the channels; and finally GConv 1*1 is used to aggregate information between channels, with an identity connection added to form the module output. Let the input feature map size of the CS-PConv Block be h×w×c1, the number of channel groups be g, and the number of channels in each channel group be n = c1 / g.
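To make the overall layout concrete, the sketch below traces feature map shapes through 5 CS-PConv Blocks, 4 downsampling layers, and the fc layer. Several details are assumptions for illustration only: that blocks and downsampling layers alternate, that the channel count stays constant, the 128×128×32 input size, the 68 landmarks, and an fc output of one (x, y) pair per landmark. None of these values is stated in the patent.

```python
def trace_shapes(h, w, c, num_blocks=5, num_landmarks=68):
    """Return (stage name, output shape) pairs for the described layout."""
    stages = []
    for i in range(num_blocks):
        stages.append((f"cs_pconv_block_{i + 1}", (h, w, c)))  # block keeps h x w x c
        if i < num_blocks - 1:                                  # 4 downsampling layers
            h, w = h // 2, w // 2                               # stride-2 pooling halves h and w
            stages.append((f"downsample_{i + 1}", (h, w, c)))
    stages.append(("fc", (2 * num_landmarks,)))                 # assumed (x, y) per landmark
    return stages

stages = trace_shapes(128, 128, 32)
for name, shape in stages:
    print(name, shape)
```

Under these assumptions the four downsampling layers reduce the spatial size by a factor of 16 before the fc layer.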

[0070] The channel shuffling operation primarily involves transposing the input feature maps from different channel groups into the same channel group. Specifically, the input feature map f is first expanded into h×w×g×n, and the dimensions g and n are transposed, transforming the dimensions of the input feature map f into h×w×n×g. Then, the dimensions are merged into h×w×c1, resulting in the channel-shuffled input feature map.
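The expand, transpose, and merge steps just described can be sketched with NumPy as follows. The function name `channel_shuffle` and the toy sizes are illustrative. Restoring the channels by shuffling again with n groups is one possible realization of the later "channel restoration" step, assumed here rather than taken from the patent.

```python
import numpy as np

def channel_shuffle(f, groups):
    """Shuffle channels of an (h, w, c) feature map across `groups` channel groups."""
    h, w, c = f.shape
    n = c // groups                       # channels per group
    f = f.reshape(h, w, groups, n)        # expand to h x w x g x n
    f = f.transpose(0, 1, 3, 2)           # swap g and n -> h x w x n x g
    return f.reshape(h, w, c)             # merge back to h x w x c

g, n = 2, 3
f = np.arange(g * n).reshape(1, 1, g * n)   # channel i holds the value i
shuffled = channel_shuffle(f, g)            # channel order becomes 0, 3, 1, 4, 2, 5
restored = channel_shuffle(shuffled, n)     # shuffling with n groups undoes it
```

After the shuffle, the first g channels (here 0 and 3) each come from a different original group, which is what lets a partial convolution over the first g channels see every group.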

[0071] Furthermore, the channel-shuffled input feature map f is input into the Rep-PConv module. Rep-PConv is a PConv module based on structural reparameterization, and the number of parallel modules in Rep-PConv is set to k. The network structure for the Rep-PConv training process is shown in Figure 3(a): during Rep-PConv training, for the k parallel 3*3PConv convolutions, parallel identity and 1*1PConv convolutional branches are constructed, and the branch outputs are summed after each passes through a BN normalization layer. Correspondingly, let the partial convolution dimension of PConv in Rep-PConv be r, which is equal to the number of channel groups g; that is, after channel shuffle, the first g distinct feature maps of the h×w×c1 feature map f are extracted and convolved. That is,

[0072] $\mathrm{PConv} = \mathrm{Conv}\left(f_{h \times w \times c1}[:g]\right)$
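A minimal NumPy sketch of this partial convolution is given below. It convolves only the first r channels with a 3×3 kernel and, following the usual PConv design, passes the remaining channels through unchanged. That pass-through, the zero padding, and the function name `pconv3x3` are assumptions, since the patent only states which channels are convolved.

```python
import numpy as np

def pconv3x3(f, weight, r):
    """Partial convolution: 3x3 'same' conv on the first r channels only.

    f: (h, w, c) feature map; weight: (3, 3, r, r) kernel.
    The remaining c - r channels are passed through unchanged (assumption).
    """
    h, w, _ = f.shape
    part = np.pad(f[:, :, :r], ((1, 1), (1, 1), (0, 0)))  # zero-pad spatial dims
    conv = np.zeros((h, w, r))
    for i in range(3):
        for j in range(3):
            # Each kernel tap is an (r, r) channel-mixing matrix.
            conv += part[i:i + h, j:j + w, :] @ weight[i, j]
    return np.concatenate([conv, f[:, :, r:]], axis=2)

rng = np.random.default_rng(0)
f = rng.standard_normal((4, 4, 8))
weight = rng.standard_normal((3, 3, 2, 2))
out = pconv3x3(f, weight, r=2)
```

With c1 = 8 and r = 2, the kernel holds 3·3·2·2 = 36 weights, compared with 3·3·8·8 = 576 for a full 3×3 convolution over all channels.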

[0073] Furthermore, the feature maps output from Rep-PConv are then subjected to channel shuffle to restore the channels, restoring the first g feature maps into g feature channel groups. These are then input into the first GConv 1*1 convolution, which has g channel groups, c1 input channels, and 2*c1 output channels. The second GConv 1*1 convolution also has g channel groups, 2*c1 input channels, and c1 output channels.
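The two grouped 1*1 convolutions and the identity connection can be sketched as below. The weight layout `(g, c_in // g, c_out // g)`, the function name `gconv1x1`, the toy sizes, and the absence of any activation between the two convolutions are assumptions made for illustration.

```python
import numpy as np

def gconv1x1(x, weight):
    """Grouped 1x1 convolution.

    x: (h, w, c_in); weight: (g, c_in // g, c_out // g), one matrix per group.
    """
    h, w, c_in = x.shape
    g, in_per_group, out_per_group = weight.shape
    assert c_in == g * in_per_group
    xg = x.reshape(h, w, g, in_per_group)
    yg = np.einsum("hwgi,gio->hwgo", xg, weight)   # each group mixes only its own channels
    return yg.reshape(h, w, g * out_per_group)

c1, g = 8, 2
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, c1))
w_expand = rng.standard_normal((g, c1 // g, 2 * c1 // g))   # c1 -> 2*c1
w_reduce = rng.standard_normal((g, 2 * c1 // g, c1 // g))   # 2*c1 -> c1
block_out = gconv1x1(gconv1x1(x, w_expand), w_reduce) + x   # identity connection

grouped_weights = w_expand.size + w_reduce.size             # 2 * (c1 * 2*c1) / g
dense_weights = 2 * (c1 * 2 * c1)                           # same layers without grouping
```

A grouped 1×1 convolution with g groups uses 1/g of the weights of an ungrouped one with the same input and output widths, which is the source of the computational saving over a plain pointwise convolution.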

[0074] The output feature map of the CS-PConv Block has a size of h×w×c1 and is input into the downsampling layer. Each downsampling layer consists of an average pooling layer with a stride of 2 and a convolutional layer with a kernel size of 1. The output feature map of the downsampling layer has a size of (h/2)×(w/2)×c1.
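The downsampling layer can be sketched as average pooling followed by a 1×1 convolution. The 2×2 pooling window is an assumption (the text only fixes the stride), as are the function name `downsample` and the requirement that h and w be even.

```python
import numpy as np

def downsample(x, weight):
    """Average pooling (stride 2) followed by a 1x1 convolution.

    x: (h, w, c) with even h and w; weight: (c, c_out).
    A 2x2 pooling window is assumed; the text only fixes the stride.
    """
    h, w, c = x.shape
    # Split each spatial axis into (blocks, 2) and average inside each 2x2 block.
    pooled = x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    return pooled @ weight            # 1x1 conv = per-pixel channel mixing

x = np.arange(16, dtype=np.float64).reshape(4, 4, 1)
y = downsample(x, np.eye(1))   # identity 1x1 kernel, so y is just the pooled map
```

For the 4×4 toy input the result has shape 2×2×1, matching the (h/2)×(w/2)×c1 output size stated above.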

[0075] The calculation for subsequent modules follows the same logic until the features reach the fully connected (fc) layer for facial landmark location prediction. The predicted facial landmark locations are then output, and the mean squared error loss is calculated between the prediction results and the annotated facial landmark labels. This loss is used to train and optimize the entire lightweight network until it converges. The formula for the mean squared error loss function is as follows:

[0076] $L_{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i' - y_i\right)^2$

[0077] Here, $y_i'$ denotes the face point prediction result output by the fc layer in Figure 2, $y_i$ denotes the annotation of the corresponding facial landmark, i is the index of the facial landmark, and n is the total number of facial landmarks.
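As a small worked example of the mean squared error loss, the sketch below averages the squared differences between predictions and labels. Treating every predicted value as one scalar element of the mean is an assumption about how the landmark coordinates are flattened, and the name `mse_loss` is illustrative.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error over the n predicted landmark values."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))

loss = mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # (0 + 0 + 4) / 3
```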

[0078] After training converges and the model structure and parameters are saved, the Rep-PConv modules of the lightweight network undergo structural reparameterization to obtain the final face landmark detection model. Specifically, based on the principle of structural reparameterization, the structural parameters of the Rep-PConv module can be equivalently transformed into another set of structural parameters for inference; that is, the parameters of the parallel branches are merged into the parameters of a single 3*3 PConv structure for inference. The network structure of Rep-PConv inference is shown in Figure 3(b). This increases the computational cost during model training while preserving the time efficiency of inference. The computational principle of reparameterization is as follows:

[0079]

[0080] Where Reparam represents reparameterized computation, (3*3PConv-BN) represents the structure after 3*3PConv and then connected to BN, (1*1PConv-BN) represents the structure after 1*1PConv and then connected to the BN layer, and k represents the number of parallel branches.
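The following sketch illustrates the general principle of structural reparameterization on the r channels that PConv convolves; it is not the patent's exact formula. It assumes k parallel 3*3 branches, one 1*1 branch, and one identity branch, each followed by its own BN layer, and uses inference-mode BN with fixed statistics. Each BN is folded into its kernel, the 1*1 and identity kernels are embedded in the centre of a 3*3 kernel, and all kernels and biases are summed. All names and sizes are illustrative.

```python
import numpy as np

EPS = 1e-5

def conv3x3(x, w, b=0.0):
    """'Same' 3x3 convolution; x: (h, w, c), w: (3, 3, c, c)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out + b

def bn(y, p):
    """Inference-mode batch norm with fixed statistics p = (gamma, beta, mean, var)."""
    gamma, beta, mean, var = p
    return (y - mean) / np.sqrt(var + EPS) * gamma + beta

def fold_bn(w, p):
    """Fold BN into the preceding 3x3 kernel; returns (kernel, bias)."""
    gamma, beta, mean, var = p
    scale = gamma / np.sqrt(var + EPS)
    return w * scale, beta - mean * scale     # scale acts on the output-channel axis

def as_3x3(w_1x1):
    """Embed a (c, c) 1x1 kernel at the centre of a zero 3x3 kernel."""
    w = np.zeros((3, 3) + w_1x1.shape)
    w[1, 1] = w_1x1
    return w

rng = np.random.default_rng(0)
r, k = 2, 3                                   # partial channels, parallel 3x3 branches

def rand_bn():
    return (rng.standard_normal(r), rng.standard_normal(r),
            rng.standard_normal(r), rng.uniform(0.5, 1.5, r))

x = rng.standard_normal((5, 5, r))            # the r channels that PConv convolves
w3 = [rng.standard_normal((3, 3, r, r)) for _ in range(k)]
w1 = rng.standard_normal((r, r))
bn3, bn1, bn_id = [rand_bn() for _ in range(k)], rand_bn(), rand_bn()

# Training-time structure: every branch has its own BN, outputs are summed.
y_train = sum(bn(conv3x3(x, w), p) for w, p in zip(w3, bn3))
y_train = y_train + bn(x @ w1, bn1) + bn(x, bn_id)

# Inference-time structure: fold each BN, then add kernels and biases.
branches = [fold_bn(w, p) for w, p in zip(w3, bn3)]
branches.append(fold_bn(as_3x3(w1), bn1))
branches.append(fold_bn(as_3x3(np.eye(r)), bn_id))
w_fused = sum(w for w, _ in branches)
b_fused = sum(b for _, b in branches)
y_infer = conv3x3(x, w_fused, b_fused)
```

Because convolution and inference-mode BN are both linear, the single fused 3*3 kernel reproduces the multi-branch output up to floating-point error, so inference cost does not grow with k. During actual training BN uses batch statistics, so the equivalence holds for the stored running statistics used at inference.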

[0081] This embodiment utilizes lightweight convolutional PConv and structural reparameterization techniques to design the CS-PConv Block, constructing a lightweight network for learning face landmark detection tasks. This improves model detection accuracy while maintaining a lightweight structure. On one hand, a reparameterization module, Rep-PConv, is built based on PConv to increase computational load during training while maintaining lightweight inference. On the other hand, channel shuffle operations are introduced before and after Rep-PConv to construct the CS-PConv Block. This shuffles and reassembles the input channels and restores the output channels, subsequently connecting GConv group convolutions with the corresponding number of channels. This compensates for the insufficient learning ability of local convolutions in PConv while maintaining its lightweight structure. In practical applications, this enables fast and accurate face landmark prediction, meeting the real-time and accuracy requirements of mobile deployments.

[0082] Referring to Figure 4, a structural schematic of a face key point detection device based on a lightweight network according to an embodiment of the present invention is shown.

[0083] In this embodiment, the device 40 includes:

[0084] Building unit 41 is used to build a training dataset of facial landmarks;

[0085] Training unit 42 is used to input the training dataset into a lightweight network built based on the PConv convolutional module for model training;

[0086] Optimization unit 43 is used to train and optimize the lightweight network using the mean square error loss function, and to perform structural reparameterization operation on the Rep-PConv module in the lightweight network to obtain a face point detection model.

[0087] The detection unit 44 is used to input the face image to be detected into the face point detection model to detect face key points and obtain the detection result.

[0088] Each unit module of the device 40 can execute the corresponding steps in the above method embodiment, so the details of each unit module will not be elaborated here. Please refer to the description of the corresponding steps above for details.

[0089] This invention also provides a device comprising the lightweight network-based facial landmark detection apparatus described above, wherein the apparatus can employ the structure of the embodiment shown in Figure 4 and, correspondingly, can execute the technical solution of the method embodiment shown in Figure 1. The implementation principles and technical effects are similar; for details, please refer to the relevant description in the above embodiments, which will not be repeated here.

[0090] The device includes: a mobile phone, digital camera, or tablet computer, or other device with a camera function; or a device with an image processing function; or a device with an image display function. The device may include components such as a memory, processor, input unit, display unit, and power supply.

[0091] The memory can be used to store software programs and modules. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory. The memory can mainly include a program storage area and a data storage area. The program storage area can store the operating system, application programs required for at least one function (such as image playback function), etc.; the data storage area can store data created according to the use of the device. In addition, the memory can include high-speed random access memory, and can also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory can also include a memory controller to provide access to the memory for the processor and input units.

[0092] The input unit can be used to receive input numerical, character, or image information, and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, in addition to a camera, the input unit of this embodiment may also include a touch-sensitive surface (e.g., a touch screen) and other input devices.

[0093] The display unit can be used to display information input by the user or information provided to the user, as well as various graphical user interfaces of the device. These graphical user interfaces can be composed of graphics, text, icons, video, and any combination thereof. The display unit may include a display panel, optionally configured as an LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other similar display panel. Furthermore, a touch-sensitive surface may cover the display panel. When the touch-sensitive surface detects a touch operation on or near it, it transmits the information to the processor to determine the type of touch event. Subsequently, the processor provides corresponding visual output on the display panel based on the type of touch event.

[0094] This invention also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the memory described in the above embodiments, or a standalone computer-readable storage medium not assembled into a device. The computer-readable storage medium stores at least one instruction, which is loaded and executed by a processor to implement the lightweight network-based facial landmark detection method shown in Figure 1. The computer-readable storage medium can be a read-only memory, a hard disk, an optical disk, or the like.

[0095] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the device embodiments, equipment embodiments, and storage medium embodiments, since they are basically similar to the method embodiments, the descriptions are relatively simple, and relevant parts can be referred to the descriptions in the method embodiments.

[0096] Furthermore, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0097] The foregoing description illustrates and describes preferred embodiments of the present invention. It should be understood that the present invention is not limited to the forms disclosed herein and should not be construed as excluding other embodiments. It can be used in various other combinations, modifications, and environments, and can be altered within the scope of the inventive concept by means of the foregoing teachings or techniques or knowledge in related fields. Any modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention should be within the protection scope of the appended claims.

Claims

1. A method for facial landmark detection based on lightweight networks, characterized in that the method includes: constructing a training dataset for facial landmarks, including: marking the facial landmarks in the acquired face images, aligning the faces in the face images using the facial landmarks, and cropping and scaling them to a preset size to obtain a face dataset; and applying data augmentation to the face dataset to obtain the training dataset; inputting the training dataset into a lightweight network constructed based on the PConv convolutional module for model training, wherein the network structure of the lightweight network includes a CS-PConv Block module, a downsampling layer, and a fully connected output layer, wherein the input feature map is processed by the CS-PConv Block module to obtain the first output; the first output is downsampled using the downsampling layer to obtain the second output; and the second output is subjected to facial landmark location prediction through the fully connected output layer to obtain the facial landmark prediction result; the CS-PConv Block module includes a Channel Shuffle module, a Rep-PConv module, and a GConv 1*1 module; processing the input feature map by the CS-PConv Block module to obtain the first output includes: shuffling the input feature map through the Channel Shuffle module to obtain the first feature map; inputting the first feature map into the Rep-PConv module to obtain the second feature map, wherein the Rep-PConv module is a PConv module based on structural reparameterization; inputting the second feature map into the Channel Shuffle module for channel restoration to obtain the third feature map; and inputting the third feature map into the GConv 1*1 module for inter-channel information aggregation and adding an identity connection to obtain the first output; training and optimizing the lightweight network using the mean squared error loss function, and performing structural reparameterization on the Rep-PConv module in the lightweight network to obtain a face point detection model; and inputting the face image to be detected into the face point detection model to detect facial key points and obtain the detection results.

2. The face landmark detection method based on lightweight networks according to claim 1, characterized in that the Rep-PConv module includes k parallel 3*3PConv and 1*1PConv convolutional branches and BN layers.

3. The face landmark detection method based on lightweight networks according to claim 1, characterized in that the step of performing structural reparameterization on the Rep-PConv module in the lightweight network includes: performing the reparameterized computation according to a reparameterization formula, where Reparam represents the reparameterized computation, (3*3PConv-BN) represents a 3*3PConv followed by a BN layer, (1*1PConv-BN) represents a 1*1PConv followed by a BN layer, and k represents the number of parallel branches.

4. A facial landmark detection device based on a lightweight network, characterized in that the device, which uses the face landmark detection method based on lightweight networks according to any one of claims 1-3, includes: a building unit, used to construct a training dataset for facial landmarks; a training unit, used to input the training dataset into a lightweight network built based on the PConv convolutional module for model training; an optimization unit, used to train and optimize the lightweight network using the mean squared error loss function, and to perform a structural reparameterization operation on the Rep-PConv module in the lightweight network to obtain the face point detection model; and a detection unit, used to input the face image to be detected into the face point detection model to detect face key points and obtain the detection result.

5. A facial landmark detection device based on a lightweight network, characterized in that the device includes a processor, a memory, and a computer program stored in the memory, the computer program being executed by the processor to implement the steps of a face landmark detection method based on a lightweight network as described in any one of claims 1 to 3.

6. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that is executed by a processor to implement the steps of a face landmark detection method based on a lightweight network as described in any one of claims 1 to 3.