Face recognition method and system based on infrared binocular camera
By performing multi-scale decomposition and texture enhancement on infrared binocular camera images, the problem of inaccurate recognition of low-texture areas in infrared binocular cameras for face recognition is solved, achieving high-precision face recognition and liveness detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGDONG QILI ELECTRONICS CO LTD
- Filing Date
- 2026-04-08
- Publication Date
- 2026-06-19
AI Technical Summary
During face recognition with existing infrared binocular cameras, facial surfaces such as the forehead and cheeks reflect light uniformly in the near-infrared band. These regions lack sufficient corner points, edges, or gradient changes, which degrades the quality of the generated depth map and reduces the accuracy of face recognition and liveness detection.
Image texture is enhanced through multi-scale decomposition and Fourier transform, and the texture richness of low-texture areas is identified and enhanced. Combined with image quality assessment and stereo matching, facial feature data is extracted for recognition.
It significantly improves the accuracy of face recognition and liveness detection, ensuring face recognition accuracy under different lighting conditions.
Figure CN122244926A_ABST
Description
Technical Field
[0001] This invention relates to the field of facial recognition technology, specifically a facial recognition method and system based on an infrared binocular camera.
Background Technology
[0002] With the continuous development of smart locks, facial recognition technology has been widely used in the field of smart locks. Traditional smart locks mainly rely on 2D images for facial recognition. However, 2D images cannot distinguish between flat photos and real people. Photo prints, mobile phone screens, and even videos can directly unlock traditional smart locks. Furthermore, traditional facial recognition smart locks are sensitive to changes in ambient light. The recognition rate of traditional smart locks drops significantly under backlight and low light conditions.
[0003] To address these issues, a technical solution has emerged in recent years: installing infrared binocular cameras on smart locks for facial recognition. Infrared imaging with a binocular camera not only avoids the effects of visible light but also distinguishes real people from flat photographs. However, current infrared binocular facial recognition technology still has shortcomings. During facial recognition, areas such as the forehead and cheeks reflect light uniformly in the near-infrared band and lack sufficient corner points, edges, or gradient variations. These areas degrade the quality of the generated depth map, which reduces the accuracy of both subsequent facial recognition and liveness detection.
Summary of the Invention
[0004] The purpose of this invention is to provide a face recognition method and system based on an infrared binocular camera to solve the problems raised in the prior art.
[0005] To achieve the above objectives, the present invention provides the following technical solution: a face recognition method based on an infrared binocular camera, the method comprising:
[0006] Step S1: Obtain the set of infrared binocular image pairs of the target person captured by the infrared binocular camera, evaluate the image quality of each infrared binocular image pair in the set, and determine the target binocular image pair;
[0007] Step S2: Perform multi-scale decomposition on the target binocular image pair, perform image texture recognition on the target infrared image in the target binocular image pair, determine the low texture region and high texture region in the target infrared image, analyze the similarity between the high texture region and the low texture region, enhance the texture richness of the hidden texture region, reconstruct and enhance the target infrared image pair to obtain the enhanced binocular image pair.
[0008] Step S3: Perform stereo matching on the enhanced binocular image pair, extract facial features from the enhanced binocular image pair, and obtain facial feature data;
[0009] Step S4: Obtain the face database from the platform and perform face recognition on the target person based on the face feature data.
[0010] Furthermore, step S2 includes:
[0011] Set the layer feature value K, and use the pre-constructed Laplacian pyramid to perform multi-scale decomposition on the target binocular image and the target infrared image to obtain the high-frequency detail layer and the low-frequency baseband. The total number of layers in the high-frequency detail layer is K.
[0012] Obtain the high-frequency detail map of the target infrared image at a given high-frequency detail layer, set the window size for that layer, and divide the high-frequency detail map into blocks with a sliding window to obtain the image window blocks of that layer. Apply a window function to each image window block and perform a two-dimensional Fourier transform on it to obtain its Fourier spectrum G_L;
[0013] Obtain the Fourier spectrum G_R of the other target infrared image of the target binocular image pair at the same high-frequency detail layer. Calculate the cross-power spectrum E between the Fourier spectrum G_L of the image window block and the Fourier spectrum G_R, and use it as the correlation coefficient γ of the image window block.
[0014] Set an indicator variable ζ and a correlation threshold γ´. When γ > γ´, ζ = 1; otherwise, ζ = 0. Calculate the enhancement mask M of the Fourier spectrum G_L of the image window block, and obtain the enhanced amplitude A_(new,L)(u,v) of the frequency point at index (u,v) in the image window block;
[0015] Enhance each frequency point in the image window block to obtain the enhanced amplitude spectrum A_(new,L) of the image window block in the target infrared image;
[0016] Perform a phase stretching transform on the Fourier spectrum G_(△,L) to obtain the Fourier spectrum G_(▽,L);
[0017] Perform an inverse two-dimensional Fourier transform on the Fourier spectrum G_(▽,L) to obtain the enhanced image window block U_L, and obtain the center point of the image window block U_L;
[0018] The preset face detector is used to obtain key points on the target infrared image, and the high-texture region and low-texture region in the target infrared image are defined. Texture enhancement is performed on each low-texture region to obtain the high-frequency detail map after texture enhancement.
[0019] The high-frequency detail map and low-frequency baseband of the target infrared image after texture enhancement in each high-frequency detail layer are obtained, and Laplacian pyramid reconstruction is performed to obtain the reconstructed and enhanced target infrared image. The other target infrared image of the target infrared image pair is also reconstructed and enhanced, and the reconstructed and enhanced target infrared image and the other target infrared image are combined to obtain the enhanced binocular image pair.
[0020] The above steps obtain the amplitude of different frequency points in the image window block after enhancement, so as to accurately locate biological features such as hidden pores and fine lines in the frequency domain without amplifying image noise. Finally, the enhanced spectral pattern of the high-texture area is injected into the low-texture area through coherence curve matching, which greatly improves the intensity of texture features in the image and significantly improves the accuracy of subsequent face recognition.
[0021] Furthermore, step S1 includes:
[0022] When a target person uses a face lock for face recognition, the infrared binocular camera in the face lock takes pictures of the target person, acquires each pair of infrared binocular images of the target person, and collects each pair of infrared binocular images to obtain a set of infrared binocular image pairs.
[0023] Infrared binocular image pairs are obtained from the set of infrared binocular image pairs, and the image width W and image height H of the infrared image in the infrared binocular image pair are obtained. A two-dimensional coordinate system is constructed for the infrared image, and the length of the pixel in the infrared image is used as the unit length in the two-dimensional coordinate system.
[0024] The image sharpness value of the infrared image is calculated using Laplace variance. The average image sharpness value of the infrared image in the infrared binocular image pair is obtained, and the binocular sharpness score V of the infrared binocular image pair is obtained.
[0025] Obtain all matching points between the left and right infrared images in the infrared binocular image pair. Obtain the fundamental matrix F of the infrared binocular camera. Calculate the epipolar constraint error e of each matching point in the infrared binocular image pair. Set the outlier error threshold e'. When e > e', the matching point is determined to be an outlier; otherwise, it is determined to be an inlier. Obtain the total number C of matching points determined to be outliers in the infrared binocular image pair and the total number C_sum of all matching points, and calculate the outlier ratio ρ = C / C_sum of the infrared binocular image pair;
[0026] Calculate the standard deviation σ of the inlier disparities in the infrared binocular image pair, and calculate the binocular consistency score B of the infrared binocular image pair;
[0027] Based on the binocular consistency score B and the binocular sharpness score V, calculate the image quality score Q of the infrared binocular image pair. Obtain the maximum value Q_max of the image quality scores of all infrared binocular image pairs in the set, and denote the infrared binocular image pair corresponding to Q_max as the target binocular image pair.
[0028] Furthermore, step S3 includes:
[0029] Obtain the enhanced binocular image pair. According to the positions of the two cameras in the infrared binocular camera, denote the enhanced infrared images in the pair as the left enhanced infrared image and the right enhanced infrared image. Using a preset transformation algorithm, calculate the matching cost between the left enhanced infrared image and the right enhanced infrared image. Perform cost aggregation on the left and right enhanced infrared images to obtain the disparity map of the enhanced binocular image pair. Perform a consistency check and median filtering on the disparity map. Obtain the baseline and focal length of the infrared binocular camera, and convert the disparity map into a depth map.
[0030] Face detection is performed on the enhanced binocular image pair using a face detection model, generating several candidate boxes and several key points. Non-maximum suppression is applied to the candidate boxes, and face boxes are obtained from the candidate boxes. The coordinates of the key points are obtained, and the faces in the face boxes are transformed to the standard pose based on the coordinates of the key points. The face boxes are then standardized.
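The non-maximum suppression step described above can be sketched as follows. This is a minimal greedy, IoU-based variant; the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 overlap threshold are assumptions chosen for illustration and are not specified in the source.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical candidate boxes: the first two cover the same face.
candidates = [(10, 10, 110, 110), (12, 14, 112, 108), (200, 50, 260, 120)]
confidences = [0.92, 0.85, 0.40]
kept = nms(candidates, confidences)
```

The returned indices identify the surviving face boxes; the lower-scoring duplicate of the first box is suppressed.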
[0031] Obtain face bounding boxes and depth maps, extract multi-dimensional features from the face bounding boxes and depth maps, fuse the extracted multi-dimensional features to generate face feature vectors, and aggregate the face feature vectors to obtain face feature data.
[0032] Furthermore, step S4 includes:
[0033] Obtain a preset face database from the platform. The face database contains each infrared binocular image pair of each registered user. Obtain the face feature vector of each infrared binocular image pair of a certain registered user and obtain the average value of the face feature vector of each infrared binocular image pair, which is denoted as the template face feature vector Z of a certain registered user.
[0034] Obtain the facial feature vector Z´ from the facial feature data of the target person, and calculate the facial similarity value δ between the target person and a certain registered user:
[0035] ,
[0036] Obtain the maximum value δ_max of the facial similarity values between the target person and each registered user on the platform, and record the registered user corresponding to δ_max as the target registered user. Set a facial similarity threshold δ´.
[0037] When δ_max is greater than δ´, the target person is determined to be the target registered user and face recognition succeeds. Otherwise, face recognition of the target person is determined to have failed, and the face lock issues a face recognition failure prompt.
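The matching logic of step S4 can be sketched as below. The similarity formula itself is not reproduced in this text, so cosine similarity is used here purely as an assumption; the function names, user identifiers, feature values, and the 0.8 threshold are likewise hypothetical.

```python
import math


def mean_vector(vectors):
    """Template vector Z: element-wise mean of a user's feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(z, z_prime):
    """Assumed similarity measure between template Z and probe Z'."""
    dot = sum(a * b for a, b in zip(z, z_prime))
    norm = math.sqrt(sum(a * a for a in z)) * math.sqrt(sum(b * b for b in z_prime))
    return dot / norm if norm > 0 else 0.0


def identify(probe, enrolled, threshold):
    """Return (user_id, delta_max); user_id is None when delta_max <= threshold."""
    templates = {uid: mean_vector(vectors) for uid, vectors in enrolled.items()}
    best_id, delta_max = max(
        ((uid, cosine_similarity(t, probe)) for uid, t in templates.items()),
        key=lambda item: item[1],
    )
    return (best_id if delta_max > threshold else None), delta_max


# Hypothetical database: two registered users, two feature vectors each.
enrolled = {
    "alice": [[1.0, 0.0, 0.2], [0.9, 0.1, 0.2]],
    "bob": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
user, score = identify([0.95, 0.05, 0.2], enrolled, threshold=0.8)
stranger, stranger_score = identify([0.0, 0.0, 1.0], enrolled, threshold=0.8)
```

A probe that matches no template above the threshold yields `None`, which corresponds to the failure prompt issued by the face lock.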
[0038] To better implement the above method, a face recognition system is also proposed, which includes an image quality assessment module, an image enhancement module, a face feature extraction module, and a face recognition module.
[0039] The image quality assessment module is used to acquire a set of infrared binocular image pairs captured by an infrared binocular camera on a target person, evaluate the image quality of the infrared binocular image pairs in the set, and determine the target binocular image pairs.
[0040] The image enhancement module is used to reconstruct and enhance the target binocular image pair to obtain an enhanced binocular image pair.
[0041] The face feature extraction module is used to extract face features from enhanced binocular image pairs to obtain face feature data;
[0042] The face recognition module is used to acquire a face database and perform face recognition on target individuals based on face feature data.
[0043] Furthermore, the image quality assessment module includes an image pair acquisition unit and an image quality assessment unit;
[0044] The image pair acquisition unit is used to capture images of the target person using the infrared binocular camera in the face lock, acquire each infrared binocular image pair of the target person and aggregate them to obtain an infrared binocular image pair set.
[0045] The image quality assessment unit is used to assess the image quality of each infrared binocular image pair in the infrared binocular image pair set and determine the target binocular image pair.
[0046] Furthermore, the image enhancement module includes a texture recognition unit and an image enhancement unit;
[0047] The texture recognition unit is used to perform multi-scale decomposition on the target binocular image pair, perform image texture recognition on the target infrared image in the target binocular image pair, and determine the low-texture region and high-texture region in the target infrared image.
[0048] The image enhancement unit is used to analyze the similarity between high-texture regions and low-texture regions, enhance the texture richness of hidden texture regions, reconstruct and enhance the target infrared image pair, and obtain an enhanced binocular image pair.
[0049] Furthermore, the face feature extraction module includes a data acquisition unit and a face feature extraction unit;
[0050] The data acquisition unit is used to perform stereo matching and face detection on the enhanced binocular image pairs to obtain face bounding boxes and depth maps;
[0051] The face feature extraction unit is used to extract multi-dimensional features from the face bounding box and the depth map, generate face feature vectors, and collect the face feature vectors to obtain face feature data.
[0052] Furthermore, the face recognition module includes a face recognition unit;
[0053] The face recognition unit is used to obtain a preset face database from the platform, obtain template face feature vectors of each registered user from the face database, and combine the face feature vectors in the face feature data to perform face recognition on the target person.
[0054] Compared with existing technologies, the beneficial effects of this invention are as follows: by evaluating image quality, the target binocular image pair with the best image quality is selected, which improves the accuracy of face recognition at the source. Furthermore, the latent periodic texture shared by the left and right views is mined and enhanced in the frequency domain. At the same time, the real pattern of high-texture areas is injected into low-texture areas, significantly improving the stereo matching accuracy of low-texture areas. Finally, by constructing face feature vectors, similarity matching is performed against the faces of registered users in the face database, thereby realizing face recognition of target individuals. This improves the accuracy of both face recognition and liveness detection.
Attached Figure Description
[0055] Figure 1 is a flowchart of a face recognition method based on an infrared binocular camera according to the present invention.
[0056] Figure 2 is a flowchart of the modules of a face recognition system according to the present invention.
Detailed Implementation
[0057] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0058] Example: As shown in Figures 1-2, the present invention provides a technical solution, a face recognition method based on an infrared binocular camera, the method comprising:
[0059] Step S1: Obtain the set of infrared binocular image pairs of the target person captured by the infrared binocular camera, evaluate the image quality of each infrared binocular image pair in the set, and determine the target binocular image pair;
[0060] Step S1 includes:
[0061] When a target person uses a face lock for face recognition, the infrared binocular camera in the face lock takes pictures of the target person, acquires each pair of infrared binocular images of the target person, and collects each pair of infrared binocular images to obtain a set of infrared binocular image pairs.
[0062] Infrared binocular image pairs are obtained from the set of infrared binocular image pairs, and the image width W and image height H of the infrared image in the infrared binocular image pair are obtained. A two-dimensional coordinate system is constructed for the infrared image, and the length of the pixel in the infrared image is used as the unit length in the two-dimensional coordinate system.
[0063] The image sharpness value of the infrared image is calculated using Laplace variance. The average image sharpness value of the infrared image in the infrared binocular image pair is obtained, and the binocular sharpness score V of the infrared binocular image pair is obtained.
[0064] For example, the image width W is the maximum value of the horizontal coordinate of the infrared image in the two-dimensional coordinate system, and the image height H is the maximum value of the vertical coordinate of the infrared image in the two-dimensional coordinate system.
[0065] Calculate the binocular sharpness score V of the infrared image:
[0066] ,
[0067] Where ∇²I(x,y) is the discrete Laplacian of the pixel value at coordinate (x,y) in the infrared image, ∇²I(x,y) = I(x-1,y) + I(x+1,y) + I(x,y-1) + I(x,y+1) - 4I(x,y), and I(x-1,y), I(x+1,y), I(x,y-1), I(x,y+1) and I(x,y) are the pixel values at coordinates (x-1,y), (x+1,y), (x,y-1), (x,y+1) and (x,y) in the infrared image, respectively.
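A minimal sketch of the Laplacian-variance sharpness measure follows. It uses the discrete Laplacian defined above and averages the two per-image values to form V, as the text describes. Restricting the computation to interior pixels, and the function names, are assumptions; the source formula is not reproduced here. Images are indexed as `img[y][x]`.

```python
def laplacian_variance(img):
    """Variance of the 4-neighbour discrete Laplacian over interior pixels."""
    h, w = len(img), len(img[0])
    responses = [
        img[y][x - 1] + img[y][x + 1] + img[y - 1][x] + img[y + 1][x] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def binocular_sharpness(left, right):
    """Score V: mean of the sharpness values of the two infrared images."""
    return 0.5 * (laplacian_variance(left) + laplacian_variance(right))


# A featureless image scores zero; a checkerboard scores high.
flat = [[10] * 5 for _ in range(5)]
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
v_flat = binocular_sharpness(flat, flat)
v_mixed = binocular_sharpness(flat, checker)
```

Blurred or uniform images produce small Laplacian responses, so a low score flags a pair that is unlikely to support reliable matching.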
[0068] Obtain all matching points between the left and right infrared images in the infrared binocular image pair. Obtain the fundamental matrix F of the infrared binocular camera. Calculate the epipolar constraint error e of each matching point in the infrared binocular image pair. Set the outlier error threshold e'. When e > e', the matching point is determined to be an outlier; otherwise, it is determined to be an inlier. Obtain the total number C of matching points determined to be outliers in the infrared binocular image pair and the total number C_sum of all matching points, and calculate the outlier ratio ρ = C / C_sum of the infrared binocular image pair;
[0069] Calculate the standard deviation σ of the inlier disparities in the infrared binocular image pair, and calculate the binocular consistency score B of the infrared binocular image pair;
[0070] For example, the specific process of matching points between the left and right infrared images in a binocular infrared image is as follows:
[0071] ORB features are used to extract matching points between the left and right infrared images in an infrared stereo image pair.
[0072] For example, the specific calculation process for the epipolar constraint error 'e' at a certain matching point is as follows:
[0073] Get a matching point (p_L, p_R), where p_L is the homogeneous coordinate of the matching point in the left infrared image and p_R is the homogeneous coordinate of the matching point in the right infrared image; p_L = (x_L, y_L, 1), p_R = (x_R, y_R, 1), where (x_L, y_L) are the coordinates of the pixel corresponding to the matching point in the left infrared image and (x_R, y_R) are the coordinates of the pixel corresponding to the matching point in the right infrared image;
[0074] Obtain the fundamental matrix F, which is a 3x3 matrix calibrated from the intrinsic parameters and relative pose (rotation, translation) of the infrared binocular camera. It describes the epipolar geometric constraints between the left and right infrared images.
[0075] Calculate the product of the fundamental matrix F and p_L to obtain the 3-dimensional column vector F·p_L = (a, b, c)^T, and calculate the epipolar constraint error e:
[0076] ,
[0077] For example, when a matching point is determined to be an inlier, the inlier disparity of that matching point is calculated as d = x_L - x_R;
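The epipolar check, the outlier ratio ρ, and the inlier disparity spread σ can be sketched together. The error formula is not reproduced in this text, so the sketch assumes the common point-to-epipolar-line distance e = |p_R^T·F·p_L| / sqrt(a² + b²) with (a, b, c)^T = F·p_L. The population standard deviation, the function names, and the sample fundamental matrix (that of an ideally rectified pair) are also assumptions.

```python
import math


def epipolar_error(F, p_left, p_right):
    """Distance from p_right to the epipolar line (a, b, c) = F @ p_left."""
    a, b, c = (sum(F[r][k] * p_left[k] for k in range(3)) for r in range(3))
    denom = math.hypot(a, b)
    if denom == 0.0:
        return math.inf  # degenerate line: treat the match as unusable
    return abs(a * p_right[0] + b * p_right[1] + c * p_right[2]) / denom


def consistency_stats(F, matches, e_threshold):
    """Return (rho, sigma): outlier ratio and std-dev of inlier disparities."""
    disparities = [
        p_left[0] - p_right[0]
        for p_left, p_right in matches
        if epipolar_error(F, p_left, p_right) <= e_threshold
    ]
    rho = (len(matches) - len(disparities)) / len(matches)
    if not disparities:
        return rho, 0.0
    mean_d = sum(disparities) / len(disparities)
    sigma = math.sqrt(sum((d - mean_d) ** 2 for d in disparities) / len(disparities))
    return rho, sigma


# Fundamental matrix of an ideally rectified pair: the error reduces to |y_L - y_R|.
F_rectified = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
matches = [
    ((100, 50, 1), (90, 50, 1)),
    ((120, 60, 1), (108, 60.5, 1)),
    ((80, 40, 1), (75, 47, 1)),  # 7 px off the epipolar line
]
rho, sigma = consistency_stats(F_rectified, matches, e_threshold=1.0)
```

In this toy input the third match lies far from its epipolar line and is counted as the single outlier.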
[0078] For example, the specific formula for calculating the binocular consistency score B of an infrared binocular image pair is as follows:
[0079] ,
[0080] Where d´ is a preset disparity feature value, which is usually set to 50 pixels;
[0081] Based on the binocular consistency score B and the binocular sharpness score V, calculate the image quality score Q of the infrared binocular image pair. Obtain the maximum value Q_max of the image quality scores of all infrared binocular image pairs in the set, and denote the infrared binocular image pair corresponding to Q_max as the target binocular image pair;
[0082] For example, the specific process for calculating the image quality score Q of an infrared binocular image pair is as follows:
[0083] The binocular consistency score B and the binocular sharpness score V are normalized so that both lie in the range 0 to 1.
[0084] Calculate the image quality score Q = η_B × B + η_V × V, where η_B is the preset image sharpness weight and η_V is the preset image consistency weight; η_B and η_V are both greater than zero and sum to 1.
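The weighted scoring and selection of the target pair can be illustrated as follows. The source does not state how B and V are normalized, so min-max scaling over the set is an assumption, as are the equal weights, the function names, and the sample scores.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all ones."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_target_pair(b_scores, v_scores, eta_b=0.5, eta_v=0.5):
    """Return (index of the pair with Q_max, list of Q scores)."""
    if not (eta_b > 0 and eta_v > 0 and abs(eta_b + eta_v - 1.0) < 1e-9):
        raise ValueError("weights must be positive and sum to 1")
    b_norm, v_norm = min_max(b_scores), min_max(v_scores)
    q = [eta_b * b + eta_v * v for b, v in zip(b_norm, v_norm)]
    best = max(range(len(q)), key=q.__getitem__)
    return best, q


# Hypothetical raw scores for three captured image pairs.
best_index, q_scores = select_target_pair([0.2, 0.8, 0.6], [120.0, 90.0, 150.0])
```

The pair at `best_index` plays the role of the target binocular image pair passed on to step S2.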
[0085] Step S2: Perform multi-scale decomposition on the target binocular image pair, perform image texture recognition on the target infrared image in the target binocular image pair, determine the low texture region and high texture region in the target infrared image, analyze the similarity between the high texture region and the low texture region, enhance the texture richness of the hidden texture region, reconstruct and enhance the target infrared image pair to obtain the enhanced binocular image pair.
[0086] Step S2 includes:
[0087] Set the layer feature value K, and use the pre-constructed Laplacian pyramid to perform multi-scale decomposition on the target binocular image and the target infrared image to obtain the high-frequency detail layer and the low-frequency baseband. The total number of layers in the high-frequency detail layer is K.
[0088] For example, the Laplacian pyramid is obtained by the Gaussian pyramid and the difference operation. Each high-frequency detail layer is the difference between adjacent Gaussian pyramid layers, which can decompose the image into a low-frequency baseband (the top Gaussian image) and multiple high-frequency detail layers.
[0089] For example, the total number of high-frequency detail layers, K, is usually set to 3;
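The decomposition into K detail layers plus a baseband, and its exact inversion, can be illustrated with a simplified pyramid. As a loudly stated simplification, Gaussian smoothing is replaced here by 2×2 block averaging and interpolation by pixel replication, so this is a stand-in for the Gaussian-based pyramid described above. Image dimensions are assumed to be divisible by 2^K, and the function names are hypothetical.

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 block (stand-in for blur + decimate)."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, len(img[0]), 2)]
        for y in range(0, len(img), 2)
    ]


def upsample(img):
    """Double resolution by pixel replication."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def decompose(img, k=3):
    """Return (detail_layers, baseband); detail_layers[0] is the finest layer."""
    details, current = [], [[float(v) for v in row] for row in img]
    for _ in range(k):
        smaller = downsample(current)
        predicted = upsample(smaller)
        details.append([[c - p for c, p in zip(cr, pr)]
                        for cr, pr in zip(current, predicted)])
        current = smaller
    return details, current


def reconstruct(details, baseband):
    """Invert decompose(): add each detail layer back, coarsest first."""
    current = baseband
    for detail in reversed(details):
        up = upsample(current)
        current = [[u + d for u, d in zip(ur, dr)] for ur, dr in zip(up, detail)]
    return current


image = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
details, base = decompose(image, k=3)
restored = reconstruct(details, base)
```

Because each detail layer stores exactly what the coarser level fails to predict, modifying a detail layer (as the texture enhancement does) and then reconstructing changes only that frequency band of the image.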
[0090] Obtain the high-frequency detail map of the target infrared image at a given high-frequency detail layer, set the window size for that layer, and divide the high-frequency detail map into blocks with a sliding window to obtain the image window blocks of that layer. Apply a window function to each image window block and perform a two-dimensional Fourier transform on it to obtain its Fourier spectrum G_L;
[0091] For example, windowing an image window block specifically involves:
[0092] Obtain the window data of the image window block and multiply it by a two-dimensional separable Hanning window to complete the windowing of the image window block;
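A separable two-dimensional Hanning window is the outer product of two one-dimensional windows. The sketch below uses the symmetric definition w(n) = 0.5 - 0.5·cos(2πn/(N-1)); the choice of the symmetric rather than the periodic form, and the function names, are assumptions.

```python
import math


def hann(n):
    """Symmetric 1-D Hanning window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(block):
    """Multiply an image window block by the separable 2-D Hanning window."""
    wy, wx = hann(len(block)), hann(len(block[0]))
    return [[block[y][x] * wy[y] * wx[x] for x in range(len(wx))]
            for y in range(len(wy))]


block = [[1.0] * 5 for _ in range(5)]
windowed = apply_window(block)
```

The window tapers the block toward zero at its borders, which reduces spectral leakage in the subsequent Fourier transform.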
[0093] Obtain the Fourier spectrum G_R of the other target infrared image of the target binocular image pair at the same high-frequency detail layer. Calculate the cross-power spectrum E between the Fourier spectrum G_L of the image window block and the Fourier spectrum G_R, and use it as the correlation coefficient γ of the image window block.
[0094] For example, another target infrared image and the target infrared image represent the left target infrared image and the right target infrared image in the target binocular image pair, respectively;
[0095] For example, the specific formula for calculating the cross power spectrum E is:
[0096] ,
[0097] Where ϵ is a preset small positive constant, usually taken as 10^-8, and the conjugated term is the complex conjugate of G_R;
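The formula for E is not reproduced in this text. A common form consistent with the symbols described (a small constant ϵ and the complex conjugate of G_R) is the normalized cross-power spectrum E = G_L·conj(G_R) / (|G_L·conj(G_R)| + ϵ); the sketch below assumes that form. The naive DFT, the `[u][v]` indexing, and the sample blocks are illustrative only.

```python
import cmath


def dft2(block):
    """Naive 2-D discrete Fourier transform; result is indexed [u][v]."""
    h, w = len(block), len(block[0])
    return [
        [sum(block[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
             for y in range(h) for x in range(w))
         for v in range(w)]
        for u in range(h)
    ]


def cross_power_spectrum(g_left, g_right, eps=1e-8):
    """Assumed form: E = G_L * conj(G_R) / (|G_L * conj(G_R)| + eps)."""
    result = []
    for row_l, row_r in zip(g_left, g_right):
        out_row = []
        for gl, gr in zip(row_l, row_r):
            prod = gl * gr.conjugate()
            out_row.append(prod / (abs(prod) + eps))
        result.append(out_row)
    return result


left = [[1, 2, 0, 1], [0, 3, 1, 2], [2, 1, 0, 0], [1, 0, 2, 3]]
shifted = [row[-1:] + row[:-1] for row in left]  # cyclic shift by one column

e_same = cross_power_spectrum(dft2(left), dft2(left))
e_shift = cross_power_spectrum(dft2(left), dft2(shifted))
```

With this normalization the magnitude of E never exceeds 1, and the constant ϵ keeps the division stable at frequency points where both spectra are close to zero.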
[0098] Set an indicator variable ζ and a correlation threshold γ´. When γ > γ´, ζ = 1; otherwise, ζ = 0. Calculate the enhancement mask M of the Fourier spectrum G_L of the image window block, and obtain the enhanced amplitude A_(new,L)(u,v) of the frequency point at index (u,v) in the image window block;
[0099] For example, the enhanced amplitude A_(new,L)(u,v) of the frequency point at index (u,v) in the image window block is obtained as follows: obtain the amplitude A_L(u,v) of the frequency point at index (u,v) in the Fourier spectrum G_L of the image window block; obtain the enhancement mask M(u,v) for that frequency point; then enhance the frequency point to obtain A_(new,L)(u,v) = A_L(u,v) × (1 + λ × M(u,v)), where λ is the preset mask enhancement coefficient;
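The indicator rule and the amplitude update A_(new,L)(u,v) = A_L(u,v) × (1 + λ × M(u,v)) translate directly into code. The mask is taken as a given input here because its formula is not reproduced in this text; the function names and the values of λ, γ´, and the sample arrays are hypothetical.

```python
def indicator(gamma, gamma_threshold):
    """zeta = 1 when the correlation exceeds the threshold, else 0."""
    return 1 if gamma > gamma_threshold else 0


def enhance_amplitude(amplitude, mask, lam):
    """Apply A_new(u, v) = A(u, v) * (1 + lam * M(u, v)) at every frequency point."""
    return [[a * (1.0 + lam * m) for a, m in zip(a_row, m_row)]
            for a_row, m_row in zip(amplitude, mask)]


amplitude = [[4.0, 2.0], [1.0, 0.5]]
mask = [[0.0, 0.5], [1.0, 0.0]]  # hypothetical mask values in [0, 1]
enhanced = enhance_amplitude(amplitude, mask, lam=0.8)
zeta = indicator(0.72, gamma_threshold=0.6)
```

Frequency points with a zero mask value keep their original amplitude, so only the selected frequencies are amplified.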
[0100] Enhance each frequency point in the image window block to obtain the enhanced amplitude spectrum A_(new,L) of the image window block in the target infrared image;
[0101] For example, the specific formula for calculating the enhancement mask M(u,v) is:
[0102] ,
[0103] Where ζ(u,v) is the indicator variable for the frequency point at index (u,v) in the image window block; max(A_L) is the maximum amplitude in the Fourier spectrum G_L;
[0104] Perform a phase stretching transform on the Fourier spectrum G_(△,L) to obtain the Fourier spectrum G_(▽,L);
[0105] For example, the specific process of performing a phase stretching transform on the Fourier spectrum G_(△,L) to obtain the Fourier spectrum G_(▽,L) is as follows:
[0106] The frequency point at index (u,v) in the image window block is converted to the radial frequency g in polar coordinates. A phase transformation function J(g) is established for the frequency point at index (u,v), and the enhanced Fourier spectrum G_(△,L) of the image window block in the target infrared image is obtained. The phase stretching transform is applied to G_(△,L)(u,v) at each frequency point with index (u,v), which transforms the Fourier spectrum G_(△,L) into the Fourier spectrum G_(▽,L);
[0107] For example, the radial frequency g is specifically:
[0108] ,
[0109] Where g_max is the high-frequency control value, obtained by normalizing half of the sampling frequency;
[0110] The phase transformation function J(g) is specifically as follows:
[0111] ,
[0112] Where β is the preset high-frequency fading steepness; α is the preset low-frequency boost steepness; g_min is the preset low-frequency control value; typical values of β, α, and g_min are 2, 0.5, and 0.01, respectively;
[0113] For example, the enhanced Fourier spectrum G_(△,L) is specifically:
[0114] ,
[0115] Where i is the imaginary unit; Φ_L is the phase spectrum of the Fourier spectrum G_L;
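The formula for the enhanced spectrum is not reproduced in this text. Given that it is described in terms of the imaginary unit i and the phase spectrum Φ_L, a natural reading is that the enhanced amplitude is recombined with the original phase, G_(△,L)(u,v) = A_(new,L)(u,v) · exp(i·Φ_L(u,v)). The sketch below assumes that reading; the function name and sample values are hypothetical.

```python
import cmath


def recombine(enhanced_amplitude, spectrum):
    """Build a spectrum from new amplitudes and the phase of the original one."""
    return [[cmath.rect(a, cmath.phase(g)) for a, g in zip(a_row, g_row)]
            for a_row, g_row in zip(enhanced_amplitude, spectrum)]


original = [[3 + 4j, -2j], [1 + 0j, -1 + 1j]]
doubled = [[2 * abs(g) for g in row] for row in original]  # hypothetical A_new
enhanced_spectrum = recombine(doubled, original)
```

Doubling every amplitude while keeping the phase yields exactly twice the original spectrum, which confirms that the phase is left untouched by this step.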
[0116] Perform an inverse two-dimensional Fourier transform on the Fourier spectrum G_(▽,L) to obtain the enhanced image window block U_L, and obtain the center point of the image window block U_L;
[0117] The preset face detector is used to obtain key points on the target infrared image, and the high-texture region and low-texture region in the target infrared image are defined. Texture enhancement is performed on each low-texture region to obtain the high-frequency detail map after texture enhancement.
[0118] For example, texture enhancement is performed on low-texture areas. The specific enhancement process is as follows:
[0119] Obtain image window blocks of each high-texture region and each low-texture region, obtain similar high-texture regions of low-texture regions, and replace the pixel value of the center pixel of the image window block of the low-texture region with the pixel value of the center pixel of the enhanced image window block in the similar high-texture region. Then write the replaced pixel value of the center pixel of the image window block in the low-texture region into the high-frequency detail map of a certain high-frequency detail layer of the target infrared image to enhance the texture of the low-texture region.
[0120] For example, a preset face detector is used to acquire key points on the target infrared image, and high-texture regions and low-texture regions in the target infrared image are defined. The specific process is as follows:
[0121] Using a face detector (such as MTCNN), 68 key points are obtained on the target infrared image, defining high-texture regions (such as eyes, eyebrows, and mouth) and low-texture regions (such as forehead and cheeks).
[0122] For example, the specific process for obtaining similar high-texture regions from low-texture regions is as follows:
[0123] Obtain the correlation coefficient at each index in the low-texture region to form the correlation coefficient curve of the low-texture region, and obtain the correlation coefficient curve of each high-texture region. Calculate the Euclidean distance between the correlation coefficient curve of each high-texture region and that of the low-texture region. The high-texture region with the minimum Euclidean distance is judged to have the spectral features most similar to those of the low-texture region and is recorded as the similar high-texture region.
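Selecting the similar high-texture region is a nearest-neighbour search under the Euclidean distance. In the sketch below the region names and curve values are hypothetical, and all curves are assumed to be sampled at the same indices.

```python
import math


def most_similar_region(low_curve, high_curves):
    """Return (name, distance) of the high-texture curve closest to low_curve."""
    distances = {name: math.dist(low_curve, curve)
                 for name, curve in high_curves.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


forehead_curve = [0.20, 0.30, 0.25]
high_texture_curves = {
    "left_eye": [0.80, 0.90, 0.85],
    "mouth": [0.30, 0.35, 0.30],
    "eyebrow": [0.60, 0.50, 0.70],
}
region, distance = most_similar_region(forehead_curve, high_texture_curves)
```

The enhanced window blocks of the returned region then supply the center-pixel values written into the low-texture region.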
[0124] The high-frequency detail map and low-frequency baseband of the target infrared image after texture enhancement in each high-frequency detail layer are obtained, and Laplacian pyramid reconstruction is performed to obtain the reconstructed and enhanced target infrared image. The other target infrared image in the target infrared image pair is also reconstructed and enhanced, and the reconstructed and enhanced target infrared image and the other target infrared image are combined to obtain the enhanced binocular image pair.
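The decomposition into K high-frequency detail layers plus a low-frequency baseband, and the reconstruction from them, can be sketched as below. This is a simplified sketch under stated assumptions: 2x2 block averaging and pixel repetition stand in for the Gaussian filtering normally used in a Laplacian pyramid, the image dimensions are assumed divisible by 2^K, and the function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging each 2x2 block (stand-in for blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by pixel repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def decompose(img, K):
    """Return (list of K high-frequency detail maps, low-frequency baseband)."""
    details, current = [], img.astype(np.float64)
    for _ in range(K):
        smaller = downsample(current)
        details.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    return details, current

def reconstruct(details, baseband):
    """Invert decompose(): add each detail map back, coarsest layer first."""
    current = baseband
    for detail in reversed(details):
        current = upsample(current) + detail
    return current

rng = np.random.default_rng(0)
image = rng.random((16, 16))          # hypothetical stand-in for a target infrared image
details, baseband = decompose(image, K=3)
restored = reconstruct(details, baseband)
```

In the method described above, the detail maps would be replaced by their texture-enhanced versions before `reconstruct` is called; left unmodified, they return the original image.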
[0125] Step S3: Perform stereo matching on the enhanced binocular image pair, extract facial features from the enhanced binocular image pair, and obtain facial feature data;
[0126] Step S3 includes:
[0127] Obtain the enhanced binocular image pair. According to the positional distribution of the infrared binocular camera, the enhanced infrared images in the enhanced binocular image pair are denoted as the left enhanced infrared image and the right enhanced infrared image, respectively. Using a preset transformation algorithm, calculate the matching cost between the left enhanced infrared image and the right enhanced infrared image. Perform cost aggregation on the left enhanced infrared image and the right enhanced infrared image to obtain the disparity map of the enhanced binocular image pair. Perform a consistency check and median filtering optimization on the disparity map. Obtain the baseline and focal length of the infrared binocular camera, and convert the disparity map into a depth map.
[0128] For example, the preset transformation algorithm is the Census transformation algorithm;
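As a minimal sketch of a Census-based matching cost, the following Python code encodes each pixel by comparing it with its neighbours and measures the cost as the Hamming distance between codes. The 3x3 window, the edge padding, the bit ordering, and the function names are assumptions made for illustration; the embodiment does not specify them.

```python
import numpy as np

def census_transform(img, radius=1):
    """Encode each pixel as a bit string: 1 where a neighbour is darker than the centre."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    codes = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[radius + dy: radius + dy + h,
                               radius + dx: radius + dx + w]
            codes = (codes << np.uint64(1)) | (neighbour < img).astype(np.uint64)
    return codes

def hamming_cost(code_a, code_b):
    """Per-pixel Hamming distance between two census code maps."""
    diff = np.bitwise_xor(code_a, code_b)
    cost = np.zeros(diff.shape, dtype=np.uint8)
    while diff.any():
        cost += (diff & np.uint64(1)).astype(np.uint8)  # count the lowest set bit
        diff = diff >> np.uint64(1)
    return cost

# Hypothetical 3x3 patches standing in for the left and right enhanced infrared images.
left = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]], dtype=np.uint8)
right = left.copy()
right[0, 0] = 100
codes_left = census_transform(left)
codes_right = census_transform(right)
cost = hamming_cost(codes_left, codes_right)
```

In a full matcher, this cost would be evaluated for every candidate disparity by shifting one code map horizontally before the comparison.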
[0129] For example, cost aggregation is performed on the left and right enhanced infrared images to obtain the disparity map of the enhanced binocular image pair, specifically:
[0130] Through the multi-path dynamic programming steps of SGM, cost aggregation is performed to obtain the disparity map of the enhanced binocular image pair;
[0131] For example, the specific process for performing a consistency check on a disparity map is as follows:
[0132] Calculate two disparity maps: one from the left enhanced infrared image to the right enhanced infrared image, and one from the right enhanced infrared image to the left enhanced infrared image;
[0133] For a pixel in the left enhanced infrared image, find its corresponding point in the right enhanced infrared image based on its disparity, and check whether the disparity value of the corresponding point is consistent with that of the pixel in the left enhanced infrared image.
[0134] If the difference between the two exceeds a preset threshold (usually 1 or 2 pixels), the matching point is considered unreliable and will be marked and removed.
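The consistency check above can be sketched as follows. The sketch assumes the common convention that a left-image pixel at column x with disparity d corresponds to column x - d in the right image, and it marks unreliable pixels with -1; both choices, and the function name, are assumptions for illustration.

```python
import numpy as np

def left_right_check(disp_left, disp_right, threshold=1.0, invalid=-1.0):
    """Invalidate left disparities that disagree with the right disparity map."""
    h, w = disp_left.shape
    checked = disp_left.astype(np.float64).copy()
    for y in range(h):
        for x in range(w):
            d = disp_left[y, x]
            xr = int(round(x - d))  # corresponding column in the right image
            if xr < 0 or xr >= w or abs(d - disp_right[y, xr]) > threshold:
                checked[y, x] = invalid  # unreliable match: mark for removal
    return checked

# Hypothetical single-row disparity maps.
disp_left = np.array([[2.0, 2.0, 2.0, 2.0]])
disp_right = np.array([[2.0, 5.0, 2.0, 2.0]])
checked = left_right_check(disp_left, disp_right)
```

In this example only the third pixel survives: the first two map outside the right image, and the fourth disagrees with the right disparity by 3 pixels.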
[0135] For example, median filtering is a classic nonlinear filtering technique. After removing mismatches, it fills in the holes in the disparity map and removes noise. It can effectively smooth the image while better preserving edge information, thereby improving the overall quality of the disparity map.
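A minimal sketch of the median filtering and of the conversion from disparity to depth is given below. The conversion uses the standard pinhole stereo relation Z = f·B/d, with the focal length f in pixels and the baseline B in metres. The 3x3 window and all numeric values are hypothetical.

```python
import numpy as np

def median_filter3(disp):
    """3x3 median filter with edge padding; fills isolated holes and removes spikes."""
    h, w = disp.shape
    padded = np.pad(disp, 1, mode="edge")
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0)

def disparity_to_depth(disp, focal_px, baseline_m):
    """Convert disparity (pixels) to depth (metres); non-positive disparities stay 0."""
    depth = np.zeros_like(disp, dtype=np.float64)
    valid = disp > 0
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth

disp = np.full((5, 5), 20.0)
disp[2, 2] = 0.0                       # a hole left by the consistency check
filtered = median_filter3(disp)
depth = disparity_to_depth(filtered, focal_px=800.0, baseline_m=0.05)
```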
[0136] Face detection is performed on the enhanced binocular image pair using a face detection model, generating several candidate boxes and several key points. Non-maximum suppression is applied to the candidate boxes, and face boxes are obtained from the candidate boxes. The coordinates of the key points are obtained, and the faces in the face boxes are transformed to the standard pose based on the coordinates of the key points. The face boxes are then standardized.
[0137] For example, the existing face detection model is MTCNN;
[0138] For example, a face detection model is used to detect faces in enhanced binocular image pairs, generating candidate boxes and several key points. Non-maximum suppression is then applied to the candidate boxes to optimize them and obtain the coordinates of the key points. This is specifically achieved through three cascaded sub-networks of MTCNN, as follows:
[0139] P-Net: Used to quickly generate candidate boxes and the approximate locations of 5 key points (eyes, nose tip, corners of mouth);
[0140] R-Net: Performs non-maximum suppression on candidate boxes, refines bounding boxes and key points;
[0141] O-Net: Outputs the final face bounding box and precise coordinates of 5 key points;
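The non-maximum suppression applied to the candidate boxes can be illustrated with the greedy sketch below. It is not the MTCNN implementation; the box format (x1, y1, x2, y2), the IoU threshold of 0.5, and the example boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Hypothetical candidate boxes: the first two overlap heavily.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```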
[0142] For example, the standardization of face frames specifically involves:
[0143] The aligned face area is cropped to a fixed size and histogram equalization is performed to eliminate the effects of lighting.
[0144] Obtain face bounding boxes and depth maps, extract multi-dimensional features from the face bounding boxes and depth maps, fuse the extracted multi-dimensional features to generate face feature vectors, and aggregate the face feature vectors to obtain face feature data.
[0145] For example, the process involves acquiring a face bounding box and a depth map, extracting multi-dimensional features from the face bounding box and depth map, and then fusing the extracted multi-dimensional features to generate a face feature vector. The specific steps are as follows:
[0146] LBP features are extracted from the standardized face bounding box, and the LBP feature vector is output. The LBP generates a binary pattern to describe the local texture by comparing the gray values of the center pixel with those of the neighboring pixels.
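As an illustrative sketch, a basic 8-neighbour LBP and its normalized histogram can be computed as follows. The neighbour ordering, the "greater than or equal" comparison, and the use of a 256-bin histogram as the LBP feature vector are assumptions; the embodiment does not fix these details.

```python
import numpy as np

def lbp_image(gray):
    """8-neighbour LBP codes for the interior pixels of a grayscale image."""
    g = gray.astype(np.int32)
    h, w = g.shape
    centre = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy: h - 1 + dy, 1 + dx: w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.int32) << bit
    return codes

def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature vector."""
    hist = np.bincount(lbp_image(gray).ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

peak = np.array([[5, 5, 5], [5, 9, 5], [5, 5, 5]], dtype=np.uint8)  # bright centre
flat = np.full((3, 3), 5, dtype=np.uint8)                            # uniform patch
lbp_vector = lbp_histogram(flat)
```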
[0147] Calculate the gradient magnitude and direction of each pixel in the face bounding box, and divide the face bounding box into cell units based on the gradient magnitude and direction of each pixel. Calculate the gradient direction histogram in each cell, normalize it by block, and finally concatenate the features of all blocks to obtain the HOG feature vector.
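The HOG computation can be sketched in simplified form as below. This sketch assumes 8x8-pixel cells, 9 unsigned orientation bins, 2x2-cell blocks with L2 normalization, and central-difference gradients, and it omits the interpolation between neighbouring bins used by full HOG implementations. None of these parameters is specified in the embodiment.

```python
import numpy as np

def hog_features(gray, cell=8, bins=9, block=2, eps=1e-6):
    """Simplified HOG: per-cell orientation histograms, block-normalized and concatenated."""
    g = gray.astype(np.float64)
    gx, gy = np.zeros_like(g), np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]           # horizontal central difference
    gy[1:-1, :] = g[2:, :] - g[:-2, :]           # vertical central difference
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned gradient direction
    bin_idx = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    ny, nx = g.shape[0] // cell, g.shape[1] // cell
    hist = np.zeros((ny, nx, bins))
    for cy in range(ny):
        for cx in range(nx):
            sl = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=magnitude[sl].ravel(),
                                       minlength=bins)

    blocks = []
    for by in range(ny - block + 1):
        for bx in range(nx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))  # L2 block normalization
    return np.concatenate(blocks)

# Hypothetical 16x16 patch with a single vertical edge.
patch = np.zeros((16, 16))
patch[:, 8:] = 1.0
hog_vector = hog_features(patch)
```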
[0148] Depth offset features based on key points are extracted from the depth map, the depth difference between each key point and the tip of the nose key point is calculated, and the features are sorted and aggregated in a preset order to obtain a depth geometric feature vector.
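The depth offset features can be illustrated with the short sketch below. The key point names, their (x, y) coordinates, the preset order, and the depth values are hypothetical.

```python
def depth_offset_features(depth_map, keypoints,
                          order=("left_eye", "right_eye", "nose", "mouth_left", "mouth_right"),
                          nose="nose"):
    """Depth of each key point minus the depth of the nose tip, in a fixed order."""
    nose_x, nose_y = keypoints[nose]
    nose_depth = depth_map[nose_y][nose_x]
    return [depth_map[y][x] - nose_depth
            for x, y in (keypoints[name] for name in order)]

# Hypothetical 3x3 depth map in metres; the nose tip is closest to the camera.
depth_map = [[0.50, 0.50, 0.50],
             [0.52, 0.45, 0.52],
             [0.49, 0.49, 0.49]]
keypoints = {"left_eye": (0, 0), "right_eye": (2, 0), "nose": (1, 1),
             "mouth_left": (0, 2), "mouth_right": (2, 2)}
offsets = depth_offset_features(depth_map, keypoints)
```

A flat photograph or screen would yield offsets close to zero for all key points, which is why such a feature is useful for liveness detection.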
[0149] The LBP feature vector, the HOG feature vector, and the depth geometric feature vector are concatenated and fused to obtain the original face feature vector.
[0150] Principal component analysis is used to reduce the dimensionality of the original face feature vector to obtain the face feature vector.
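The concatenation and dimensionality reduction can be sketched as follows. The embodiment does not state how the principal components are obtained; this sketch assumes they are fitted on a set of previously collected original face feature vectors. The vector lengths, the random stand-in data, and the function names are hypothetical.

```python
import numpy as np

def fuse(lbp_vec, hog_vec, depth_vec):
    """Concatenate the three feature vectors into one original face feature vector."""
    return np.concatenate([lbp_vec, hog_vec, depth_vec])

def fit_pca(samples, n_components):
    """Return the sample mean and the top principal directions (rows) via SVD."""
    mean = samples.mean(axis=0)
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(raw_vec, mean, components):
    """Reduce an original face feature vector to n_components dimensions."""
    return components @ (raw_vec - mean)

rng = np.random.default_rng(0)
# Random stand-ins for 20 fused vectors (6 LBP + 4 HOG + 5 depth values each).
training = np.stack([fuse(rng.random(6), rng.random(4), rng.random(5))
                     for _ in range(20)])
mean, components = fit_pca(training, n_components=3)
feature = project(training[0], mean, components)
```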
[0151] Step S4: Obtain the face database from the platform and perform face recognition on the target person based on the face feature data;
[0152] Step S4 includes:
[0153] Obtain a preset face database from the platform. The face database contains each infrared binocular image pair of each registered user. Obtain the face feature vector of each infrared binocular image pair of a certain registered user and obtain the average value of the face feature vector of each infrared binocular image pair, which is denoted as the template face feature vector Z of a certain registered user.
[0154] Obtain the facial feature vector Z´ from the facial feature data of the target person, and calculate the facial similarity value δ between the target person and a certain registered user:
[0155] ,
[0156] Obtain the maximum value δ_max of the facial similarity values between the target person and each registered user on the platform. Obtain the registered user corresponding to the maximum value δ_max and record this user as the target registered user. Set a facial similarity threshold δ´.
[0157] When δ_max is greater than δ´, the target person is determined to be the target registered user, and face recognition of the target person succeeds. Otherwise, face recognition of the target person is determined to have failed, and the face lock issues a face recognition failure prompt.
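The template construction and threshold decision can be sketched as below. The formula for the similarity value δ is not reproduced in this text, so the sketch uses cosine similarity purely as an assumption; the user names, vectors, and threshold are likewise hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise average of a user's face feature vectors (the template Z)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_similarity(a, b):
    """Assumed similarity measure; the source formula for delta is not available."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recognise(probe, templates, threshold):
    """Return (matched user or None, maximum similarity) for a probe vector."""
    best_user, best_sim = None, -math.inf
    for user, template in templates.items():
        sim = cosine_similarity(probe, template)
        if sim > best_sim:
            best_user, best_sim = user, sim
    return (best_user if best_sim > threshold else None), best_sim

templates = {
    "alice": mean_vector([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
    "bob": mean_vector([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]),
}
user, sim = recognise([0.9, 0.1, 0.0], templates, threshold=0.8)
```

A probe that resembles no registered user, such as `[0.0, 0.0, 1.0]` here, falls below the threshold and is rejected, which corresponds to the failure prompt described above.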
[0158] To better implement the above method, a face recognition system is also proposed, which includes an image quality assessment module, an image enhancement module, a face feature extraction module, and a face recognition module.
[0159] The image quality assessment module is used to acquire a set of infrared binocular image pairs captured by an infrared binocular camera on a target person, evaluate the image quality of the infrared binocular image pairs in the set, and determine the target binocular image pairs.
[0160] The image enhancement module is used to reconstruct and enhance the target binocular image pair to obtain an enhanced binocular image pair.
[0161] The face feature extraction module is used to extract face features from enhanced binocular image pairs to obtain face feature data;
[0162] The face recognition module is used to acquire a face database and perform face recognition on target individuals based on face feature data.
[0163] The image quality assessment module includes an image pair acquisition unit and an image quality assessment unit.
[0164] The image pair acquisition unit is used to capture images of the target person using the infrared binocular camera in the face lock, acquire each infrared binocular image pair of the target person and aggregate them to obtain an infrared binocular image pair set.
[0165] The image quality assessment unit is used to assess the image quality of each infrared binocular image pair in the infrared binocular image pair set and determine the target binocular image pair.
[0166] The image enhancement module includes a texture recognition unit and an image enhancement unit.
[0167] The texture recognition unit is used to perform multi-scale decomposition on the target binocular image pair, perform image texture recognition on the target infrared image in the target binocular image pair, and determine the low-texture regions and high-texture regions in the target infrared image.
[0168] The image enhancement unit is used to analyze the similarity between high-texture regions and low-texture regions, enhance the texture richness of the low-texture regions, reconstruct and enhance the target binocular image pair, and obtain an enhanced binocular image pair.
[0169] The facial feature extraction module includes a data acquisition unit and a facial feature extraction unit.
[0170] The data acquisition unit is used to perform stereo matching and face detection on the enhanced binocular image pair to obtain face bounding boxes and depth maps;
[0171] The face feature extraction unit is used to extract multi-dimensional features from the face bounding box and the depth map, generate face feature vectors, and aggregate the face feature vectors to obtain face feature data.
[0172] The face recognition module includes a face recognition unit;
[0173] The face recognition unit is used to obtain a preset face database from the platform, obtain template face feature vectors of each registered user from the face database, and combine the face feature vectors in the face feature data to perform face recognition on the target person.
[0174] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered in all respects as exemplary and non-limiting, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the present invention. No reference numerals in the claims should be construed as limiting the scope of the claims.
Claims
1. A face recognition method based on an infrared binocular camera, characterized in that, The method includes: Step S1: Obtain the set of infrared binocular image pairs captured by the infrared binocular camera of the target person, evaluate the image quality of the infrared binocular image pairs in the set, and determine the target binocular image pair; Step S2: Perform multi-scale decomposition on the target binocular image pair, perform image texture recognition on the target infrared image in the target binocular image pair, determine the low-texture regions and high-texture regions in the target infrared image, analyze the similarity between the high-texture regions and the low-texture regions, enhance the texture richness of the low-texture regions, and reconstruct and enhance the target binocular image pair to obtain the enhanced binocular image pair; Step S3: Perform stereo matching on the enhanced binocular image pair, extract facial features from the enhanced binocular image pair, and obtain facial feature data; Step S4: Obtain the face database from the platform and perform face recognition on the target person based on the face feature data.
2. The face recognition method based on an infrared binocular camera according to claim 1, characterized in that, Step S2 includes: Set the layer feature value K, and use the pre-constructed Laplacian pyramid to perform multi-scale decomposition on the target infrared image in the target binocular image pair to obtain the high-frequency detail layers and the low-frequency baseband, the total number of high-frequency detail layers being K; Obtain the high-frequency detail map of the target infrared image at a certain high-frequency detail layer, set the window size of that layer, and divide the high-frequency detail map into blocks using a sliding window to obtain the image window blocks of the high-frequency detail map at that layer; Apply windowing to each image window block and perform a two-dimensional Fourier transform on it to obtain its Fourier spectrum G_L; Obtain the Fourier spectrum G_R of the other target infrared image in the target binocular image pair at that high-frequency detail layer; Calculate the cross-power spectrum E between the Fourier spectrum G_L and the Fourier spectrum G_R of the image window block, and obtain the correlation coefficient γ of the image window block; Set an indicator variable ζ and a correlation threshold γ´, where ζ = 1 when γ > γ´ and ζ = 0 otherwise; Calculate the enhancement mask M of the Fourier spectrum G_L of the image window block, and obtain the enhanced amplitude A_(new,L)(u,v) of the frequency point at index (u,v) in the image window block; Enhance each frequency point within the image window block to obtain the enhanced amplitude spectrum A_(new,L) of the image window block in the target infrared image; Perform a phase stretching transform on the Fourier spectrum G_(△,L) to obtain the Fourier spectrum G_(▽,L); Perform an inverse two-dimensional Fourier transform on the Fourier spectrum G_(▽,L) to obtain the enhanced image window block U_L, and obtain the center point of the image window block U_L; Use the preset face detector to obtain key points on the target infrared image, and define the high-texture regions and low-texture regions in the target infrared image; Perform texture enhancement on each low-texture region to obtain the texture-enhanced high-frequency detail map; Obtain the texture-enhanced high-frequency detail map of the target infrared image at each high-frequency detail layer and the low-frequency baseband, and perform Laplacian pyramid reconstruction to obtain the reconstructed and enhanced target infrared image; Reconstruct and enhance the other target infrared image in the target binocular image pair in the same way, and combine the two reconstructed and enhanced target infrared images to obtain the enhanced binocular image pair.
3. The face recognition method based on an infrared binocular camera according to claim 1, characterized in that, Step S1 includes: When a target person uses a face lock for face recognition, the infrared binocular camera in the face lock captures images of the target person, acquires each infrared binocular image pair of the target person, and aggregates the infrared binocular image pairs to obtain a set of infrared binocular image pairs; Obtain an infrared binocular image pair from the set, and obtain the image width W and image height H of the infrared images in the infrared binocular image pair; Construct a two-dimensional coordinate system for the infrared image, using the length of a pixel in the infrared image as the unit length of the coordinate system; Calculate the image sharpness value of the infrared image using the variance of the Laplacian; Obtain the average image sharpness value of the infrared images in the infrared binocular image pair, and obtain the binocular sharpness score V of the infrared binocular image pair; Obtain all matching points between the left and right infrared images in the infrared binocular image pair; Obtain the fundamental matrix F of the infrared binocular camera; Calculate the epipolar constraint error e of a matching point in the infrared binocular image pair; Set the outlier error threshold e', and determine the matching point to be an outlier when e > e' and an inlier otherwise; Obtain the total number C of matching points determined to be outliers in the infrared binocular image pair and the total number C_sum of all matching points, and calculate the outlier ratio ρ = C / C_sum of the infrared binocular image pair; Calculate the standard deviation σ of the inlier disparities in the infrared binocular image pair, and calculate the binocular consistency score B of the infrared binocular image pair; Based on the binocular consistency score B and the binocular sharpness score V, calculate the image quality score Q of the infrared binocular image pair, and obtain the maximum value Q_max of the image quality scores of the infrared binocular image pairs in the set; The infrared binocular image pair corresponding to the maximum value Q_max is obtained from the set of infrared binocular image pairs and denoted as the target binocular image pair.
4. The face recognition method based on an infrared binocular camera according to claim 1, characterized in that, Step S3 includes: Obtain the enhanced binocular image pair; According to the positional distribution of the infrared binocular camera, denote the enhanced infrared images in the enhanced binocular image pair as the left enhanced infrared image and the right enhanced infrared image, respectively; Using a preset transformation algorithm, calculate the matching cost between the left enhanced infrared image and the right enhanced infrared image; Perform cost aggregation on the left enhanced infrared image and the right enhanced infrared image to obtain the disparity map of the enhanced binocular image pair; Perform a consistency check and median filtering optimization on the disparity map; Obtain the baseline and focal length of the infrared binocular camera, and convert the disparity map into a depth map; Perform face detection on the enhanced binocular image pair using a face detection model to generate several candidate boxes and several key points; Apply non-maximum suppression to the candidate boxes and obtain face boxes from the candidate boxes; Obtain the coordinates of the key points, and transform the faces in the face boxes to the standard pose based on the coordinates of the key points; Standardize the face boxes; Obtain the face bounding boxes and depth maps, extract multi-dimensional features from the face bounding boxes and depth maps, fuse the extracted multi-dimensional features to generate face feature vectors, and aggregate the face feature vectors to obtain face feature data.
5. The face recognition method based on an infrared binocular camera according to claim 1, characterized in that, Step S4 includes: Obtain a preset face database from the platform, the face database containing each infrared binocular image pair of each registered user; Obtain the face feature vector of each infrared binocular image pair of a certain registered user, and obtain the average value of these face feature vectors, which is denoted as the template face feature vector Z of that registered user; Obtain the facial feature vector Z´ from the facial feature data of the target person, and calculate the facial similarity value δ between the target person and that registered user: , Obtain the maximum value δ_max of the facial similarity values between the target person and each registered user on the platform; Obtain the registered user corresponding to the maximum value δ_max and record this user as the target registered user; Set a facial similarity threshold δ´; When δ_max is greater than δ´, the target person is determined to be the target registered user, and face recognition of the target person succeeds; Otherwise, face recognition of the target person is determined to have failed, and the face lock issues a face recognition failure prompt.
6. A face recognition system, used to execute the face recognition method based on an infrared binocular camera as described in any one of claims 1-5, characterized in that, The system includes an image quality assessment module, an image enhancement module, a facial feature extraction module, and a facial recognition module; The image quality assessment module is used to acquire a set of infrared binocular image pairs captured by the infrared binocular camera on the target person, evaluate the image quality of the infrared binocular image pairs in the set, and determine the target binocular image pairs. The image enhancement module is used to reconstruct and enhance the target binocular image pair to obtain an enhanced binocular image pair; The face feature extraction module is used to extract face features from the enhanced binocular image pair to obtain face feature data; The face recognition module is used to acquire a face database and perform face recognition on the target person based on the face feature data.
7. A face recognition system according to claim 6, characterized in that, The image quality assessment module includes an image pair acquisition unit and an image quality assessment unit; The image pair acquisition unit is used to capture images of the target person using the infrared binocular camera in the face lock, acquire each infrared binocular image pair of the target person, and aggregate them to obtain an infrared binocular image pair set; The image quality assessment unit is used to assess the image quality of each infrared binocular image pair in the infrared binocular image pair set and determine the target binocular image pair.
8. A face recognition system according to claim 6, characterized in that, The image enhancement module includes a texture recognition unit and an image enhancement unit; The texture recognition unit is used to perform multi-scale decomposition on the target binocular image pair, perform image texture recognition on the target infrared image in the target binocular image pair, and determine the low-texture regions and high-texture regions in the target infrared image; The image enhancement unit is used to analyze the similarity between high-texture regions and low-texture regions, enhance the texture richness of the low-texture regions, reconstruct and enhance the target binocular image pair, and obtain an enhanced binocular image pair.
9. A face recognition system according to claim 6, characterized in that, The facial feature extraction module includes a data acquisition unit and a facial feature extraction unit; The data acquisition unit is used to perform stereo matching and face detection on the enhanced binocular image pair to obtain face bounding boxes and depth maps; The face feature extraction unit is used to extract multi-dimensional features from the face bounding box and the depth map, generate face feature vectors, and aggregate the face feature vectors to obtain face feature data.
10. A face recognition system according to claim 6, characterized in that, The face recognition module includes a face recognition unit; The face recognition unit is used to obtain a preset face database from the platform, obtain template face feature vectors of each registered user from the face database, and combine the face feature vectors in the face feature data to perform face recognition on the target person.