Model training methods, image processing methods, computing devices, and non-transient computer-readable media
By using a convolutional neural network training method, utilizing downsampling of the encoding network and upsampling of the decoding network, combined with asymmetric convolutional kernels and skip connections, the problem of image overlap or discreteness in optical under-display fingerprint acquisition is solved, achieving efficient and clear fingerprint image generation and recognition.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: BOE TECHNOLOGY GROUP CO LTD
- Filing Date: 2021-10-28
- Publication Date: 2026-06-30
AI Technical Summary
Existing optical under-display fingerprint acquisition technology struggles to effectively capture clear fingerprint images, and existing methods suffer from long acquisition times or overlapping or discrete images due to light source arrangement, making effective stitching impossible.
By acquiring a set of blurred and clear image samples of the same fingerprint, a convolutional neural network is trained. The encoding network is used for downsampling and the decoding network for upsampling. Combined with asymmetric convolutional kernels and skip connections, image feature extraction is optimized. The network parameters are adjusted using a preset loss function, and finally, a clear image is generated.
It improves the accuracy and efficiency of fingerprint image acquisition, reduces acquisition time, and enhances the accuracy of fingerprint recognition.
Figure CN116368500B_ABST
Description
Technical Field
[0001] This disclosure relates to the field of computer technology, and in particular to a model training method, an image processing method, a computing processing device, and a non-transient computer-readable medium.
Background Technology
[0002] Optical under-display fingerprint sensors illuminate the finger using a point light source under the screen. The light is reflected by the finger and received by an optical sensor beneath the screen. Because the intensity of light reflected from the fingerprint valleys and ridges differs, a fingerprint image can be generated. Optical under-display fingerprint systems have high production value due to their large fingerprint collection area and low hardware cost.
Summary of the Invention
[0003] This disclosure provides a model training method, including:
[0004] Obtain a sample set, wherein the samples in the sample set include blurred and clear images of the same fingerprint;
[0005] The blurred image is input into a convolutional neural network (CNN). The encoding network in the CNN downsamples and extracts features from the blurred image, outputting multiple feature maps. The decoding network in the CNN upsamples and extracts features from the feature maps, outputting a predicted image corresponding to the blurred image. The encoding network includes multiple encoding layers, and the decoding network includes multiple decoding layers. The feature map obtained from the F-th encoding layer and the feature map obtained from the G-th decoding layer are fused and used as the input to the (G+1)-th decoding layer. The feature map obtained from the F-th encoding layer and the feature map obtained from the G-th decoding layer have the same resolution. F and G are both positive integers.
[0006] Based on the predicted image, the clear image, and a preset loss function, the loss value of the convolutional neural network is calculated, and the parameters of the convolutional neural network are adjusted with the goal of minimizing the loss value.
[0007] The convolutional neural network with its parameters adjusted is determined to be the image processing model.
[0008] In one optional implementation, each of the encoding layers includes a first convolutional block and / or a downsampling block, and each of the decoding layers includes a second convolutional block and / or an upsampling block;
[0009] Wherein, at least one of the first convolutional block, the downsampling block, the second convolutional block, and the upsampling block includes at least one set of asymmetric convolutional kernels.
[0010] In one optional implementation, the encoding network includes N encoding modules, each encoding module includes M encoding layers, where M and N are both positive integers. The step of the encoding network in the convolutional neural network downsampling and extracting features from the blurred image to output multiple feature maps includes:
[0011] The first encoding level of the first encoding module in the N encoding modules performs feature extraction on the blurred image;
[0012] The i-th encoding level of the first encoding module performs downsampling and feature extraction on the feature map obtained by the (i-1)-th encoding level of the first encoding module in sequence; wherein i is greater than or equal to 2 and less than or equal to M;
[0013] The first encoding level of the j-th encoding module in the N encoding modules performs feature extraction on the feature map obtained by processing the first encoding level of the (j-1)-th encoding module; wherein, j is greater than or equal to 2 and less than or equal to N;
[0014] The i-th encoding level of the j-th encoding module downsamples the feature map obtained by the (i-1)-th encoding level of the j-th encoding module, fuses the downsampled feature map with the feature map obtained by the i-th encoding level of the (j-1)-th encoding module, and extracts features from the fused result.
[0015] The plurality of feature maps include feature maps obtained from each coding level of the Nth coding module among the N coding modules.
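The data flow of paragraphs [0010] to [0015] can be sketched as a small dependency table. This is only an illustration of which feature maps feed each encoding level; the function names `encoder_sources` and `encoder_outputs` are hypothetical and do not appear in the disclosure.

```python
def encoder_sources(j, i):
    """Inputs of encoding level i in encoding module j (both 1-based).

    Each entry is (operation applied before fusion, source node).
    """
    if j < 1 or i < 1:
        raise ValueError("j and i are 1-based")
    if j == 1 and i == 1:
        # [0011]: the first level of the first module reads the blurred image.
        return [("none", "blurred image")]
    if j == 1:
        # [0012]: downsample the previous level of the same module.
        return [("downsample", (1, i - 1))]
    if i == 1:
        # [0013]: read the first level of the previous module.
        return [("none", (j - 1, 1))]
    # [0014]: downsample the previous level of this module and fuse it with
    # the same level of the previous module.
    return [("downsample", (j, i - 1)), ("none", (j - 1, i))]


def encoder_outputs(n_modules, m_levels):
    """[0015]: every level of the N-th module is passed on to the decoder."""
    return [(n_modules, i) for i in range(1, m_levels + 1)]


if __name__ == "__main__":
    N, M = 3, 4  # example sizes, chosen arbitrarily
    for j in range(1, N + 1):
        for i in range(1, M + 1):
            print((j, i), "<-", encoder_sources(j, i))
    print("to decoder:", encoder_outputs(N, M))
```

Feature extraction is applied at every node after its inputs are gathered, so it is not listed as a separate operation in the table.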
[0016] In one optional implementation, the decoding network includes the M decoding layers, and the step of the decoding network in the convolutional neural network upsampling and extracting features from the feature map to output a predicted image corresponding to the blurred image includes:
[0017] The first decoding level in the M decoding levels extracts features from the feature map obtained by the M-th encoding level of the N-th encoding module, and upsamples the extracted feature map.
[0018] The feature map obtained from the (u-1)th decoding level of the M decoding levels is fused with the feature map obtained from the (M-u+1)th encoding level of the Nth encoding module to obtain a first fused feature map; wherein, u is greater than or equal to 2 and less than or equal to M-1;
[0019] The first fused feature map is input into the u-th decoding level among the M decoding levels, and the u-th decoding level sequentially performs feature extraction and upsampling on the first fused feature map;
[0020] The feature map obtained from the (M-1)th decoding level of the M decoding levels is fused with the feature map obtained from the first encoding level of the Nth encoding module to obtain a second fused feature map.
[0021] The second fused feature map is input into the Mth decoding layer of the M decoding layers, and the Mth decoding layer performs feature extraction on the second fused feature map to obtain the predicted image.
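The pairing between decoding levels and encoding levels in paragraphs [0017] to [0021] can be checked with a short sketch. It assumes M is at least 2, that encoding level i of a module has been downsampled i-1 times, and that each upsampling step undoes exactly one downsampling step; the disclosure does not state the sampling factor. The names `decoder_plan` and `check_resolutions` are hypothetical.

```python
def decoder_plan(m_levels):
    """Return (decoding level, encoding level of module N it reads, upsamples?)."""
    if m_levels < 2:
        raise ValueError("this sketch assumes M >= 2")
    plan = [(1, m_levels, True)]              # [0017]
    for u in range(2, m_levels):              # [0018]-[0019], 2 <= u <= M-1
        plan.append((u, m_levels - u + 1, True))
    plan.append((m_levels, 1, False))         # [0020]-[0021]
    return plan


def check_resolutions(m_levels):
    """Verify that fused maps share a resolution; return the final halving count."""
    halvings = m_levels - 1                   # resolution of encoding level M
    for u, enc_level, upsample in decoder_plan(m_levels):
        if u > 1:
            # The map from decoding level u-1 must match encoding level enc_level.
            assert halvings == enc_level - 1
        if upsample:
            halvings -= 1
    return halvings


if __name__ == "__main__":
    print(decoder_plan(4))
    print(check_resolutions(4))  # 0 means the input resolution is restored
```

Under these assumptions the indices (M-u+1) in the text are exactly those for which the two fused feature maps have equal resolution.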
[0022] In one optional implementation, the step of fusing the downsampled feature map with the feature map processed by the i-th coding level of the (j-1)-th coding module, and extracting features from the fused result, includes:
[0023] The feature map obtained by downsampling is concatenated with the feature map obtained by the i-th encoding level of the (j-1)-th encoding module in the channel dimension, and feature extraction is performed on the concatenation result.
[0024] The step of fusing the feature map obtained from the (u-1)th decoding level of the M decoding levels with the feature map obtained from the (M-u+1)th encoding level of the Nth encoding module to obtain the first fused feature map includes:
[0025] The feature map obtained from the (u-1)th decoding layer in the M decoding layers is concatenated with the feature map obtained from the (M-u+1)th encoding layer in the Nth encoding module along the channel dimension to obtain the first fused feature map;
[0026] The step of fusing the feature map obtained from the (M-1)th decoding level of the M decoding levels with the feature map obtained from the first encoding level of the Nth encoding module to obtain a second fused feature map includes:
[0027] The feature map obtained from the (M-1)th decoding level of the M decoding levels is concatenated with the feature map obtained from the first encoding level of the Nth encoding module along the channel dimension to obtain the second fused feature map.
[0028] In one optional implementation, both the first convolutional block and the second convolutional block include a first convolutional layer and a second convolutional layer, wherein the first convolutional layer includes the asymmetric convolutional kernel and the second convolutional layer includes a 1×1 convolutional kernel.
[0029] The downsampling block includes a max-pooling layer and a min-pooling layer, and both the max-pooling layer and the min-pooling layer include the asymmetric convolution kernel;
[0030] The asymmetric convolution kernel includes a 1×k convolution kernel and a k×1 convolution kernel, where k is greater than or equal to 2.
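As a minimal illustration of why a 1×k and k×1 kernel pair is attractive, the NumPy sketch below assumes the two kernels are applied one after the other (the text does not say whether they are used in sequence or in parallel). In that case the pair is equivalent to a single k×k kernel formed by their outer product, while storing 2k weights instead of k². The helper `correlate2d_valid` is written here for the example and is not part of the disclosure.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Plain 2-D cross-correlation without padding ("valid" mode)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * image[dy:dy + oh, dx:dx + ow]
    return out


k = 3
rng = np.random.default_rng(0)
image = rng.random((8, 8))
row_kernel = rng.random((1, k))   # 1 x k kernel
col_kernel = rng.random((k, 1))   # k x 1 kernel

# Apply the asymmetric pair in sequence.
two_step = correlate2d_valid(correlate2d_valid(image, row_kernel), col_kernel)
# Apply the equivalent k x k kernel (outer product of the pair) once.
one_step = correlate2d_valid(image, col_kernel @ row_kernel)

assert np.allclose(two_step, one_step)
params_asymmetric = row_kernel.size + col_kernel.size   # 2k weights
params_square = k * k                                   # k^2 weights
```

Only k×k kernels that factor into such an outer product can be represented this way, which is the trade-off for the smaller parameter count.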
[0031] In one alternative implementation, the convolution kernels in both the encoding and decoding layers are symmetric convolution kernels.
[0032] In one optional implementation, the encoding network includes P encoding layers, and the step of the encoding network in the convolutional neural network downsampling and extracting features from the blurred image to output multiple feature maps includes:
[0033] The first coding level in the P coding levels sequentially performs feature extraction and downsampling on the blurred image;
[0034] The q-th coding level among the P coding levels sequentially performs feature extraction and downsampling on the feature map obtained by the (q-1)-th coding level;
[0035] Wherein, q is greater than or equal to 2 and less than or equal to P, and the plurality of feature maps include feature maps obtained by processing the P coding levels.
[0036] In one optional implementation, the decoding network includes the P decoding layers, and the step of the decoding network in the convolutional neural network upsampling and extracting features from the feature map to output a predicted image corresponding to the blurred image includes:
[0037] Feature extraction is performed on the feature map obtained from the Pth coding level among the P coding levels to obtain a computed feature map;
[0038] The computed feature map is fused with the feature map obtained from the Pth encoding level to obtain the third fused feature map;
[0039] The third fused feature map is input into the first decoding layer among the P decoding layers, and the first decoding layer sequentially upsamples and extracts features from the third fused feature map;
[0040] The feature map obtained from the (r-1)th decoding level among the P decoding levels is fused with the feature map obtained from the (P-r+1)th encoding level among the P encoding levels to obtain the fourth fused feature map;
[0041] The fourth fused feature map is input to the r-th decoding level among the P decoding levels, and the r-th decoding level performs upsampling and feature extraction on the fourth fused feature map in sequence;
[0042] Wherein, r is greater than or equal to 2 and less than or equal to P, and the predicted image is the feature map obtained by processing the Pth decoding level among the P decoding levels.
[0043] In one optional implementation, the step of fusing the computed feature map with the feature map obtained from the Pth encoding level to obtain a third fused feature map includes:
[0044] The computed feature map is concatenated with the feature map obtained from the Pth encoding level along the channel dimension to obtain the third fused feature map;
[0045] The step of fusing the feature map obtained from the (r-1)th decoding level among the P decoding levels with the feature map obtained from the (P-r+1)th encoding level among the P encoding levels to obtain the fourth fused feature map includes:
[0046] The feature map obtained from the (r-1)th decoding level of the P decoding levels is concatenated with the feature map obtained from the (P-r+1)th encoding level of the P encoding levels along the channel dimension to obtain the fourth fused feature map.
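The shape bookkeeping of the P-level variant in paragraphs [0032] to [0046] can be traced with a NumPy sketch. Everything below is a stand-in chosen for illustration: `extract` is an identity placeholder for a convolutional block, `downsample` is 2×2 max pooling, `upsample` is nearest-neighbour repetition, and arrays use a (channels, height, width) layout. None of these choices is stated in the text.

```python
import numpy as np


def extract(x):
    """Placeholder for feature extraction; a real block would be convolutional."""
    return x


def downsample(x):
    """2x2 max pooling on a (C, H, W) array (assumed factor of 2)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def forward(blurred, p_levels):
    skips = []
    x = blurred
    for _ in range(p_levels):                 # [0033]-[0034]
        x = downsample(extract(x))
        skips.append(x)                       # skips[q-1] is encoding level q
    x = extract(skips[-1])                    # [0037] computed feature map
    x = np.concatenate([x, skips[p_levels - 1]], axis=0)   # [0038], [0044]
    x = extract(upsample(x))                  # [0039] decoding level 1
    for r in range(2, p_levels + 1):
        # [0040], [0046]: fuse with encoding level P-r+1 (list index P-r).
        x = np.concatenate([x, skips[p_levels - r]], axis=0)
        x = extract(upsample(x))              # [0041] decoding level r
    return x


out = forward(np.zeros((3, 32, 32)), p_levels=3)
print(out.shape)
```

Because the placeholder never reduces the channel count, the channels accumulate with each concatenation; a trained network would map them back to the channel count of the predicted image inside its convolutional blocks.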
[0047] In one optional implementation, the step of calculating the loss value of the convolutional neural network based on the predicted image, the clear image, and a preset loss function includes:
[0048] The loss value is calculated using the following formula:
[0049]
[0050]
[0051]
[0052] where $L(Y, \hat{Y})$ is the loss value, $Y$ is the predicted image, $\hat{Y}$ is the clear image, $W$ is the width of the predicted image, $H$ is the height of the predicted image, $C$ is the number of channels of the predicted image, $E(Y)$ is the edge map of the predicted image, $E(\hat{Y})$ is the edge map of the clear image, $\lambda$ is greater than or equal to 0 and less than or equal to 1, $x$ is a positive integer greater than or equal to 1 and less than or equal to $W$, $y$ is a positive integer greater than or equal to 1 and less than or equal to $H$, and $z$ is a positive integer greater than or equal to 1 and less than or equal to $C$.
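One loss that is consistent with the quantities listed in [0052] is sketched below. It is an assumption, not the formula of the disclosure: the pixel term is taken to be the mean absolute error over all W×H×C entries, the edge term is the mean absolute error between the two edge maps, λ weights the edge term, and a Sobel gradient magnitude stands in for the edge operator E(·). Images are assumed to be (H, W, C) floating-point arrays.

```python
import numpy as np


def sobel_edges(img):
    """Sobel gradient-magnitude edge map of an (H, W, C) image, same shape."""
    p = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.sqrt(gx ** 2 + gy ** 2)


def loss(pred, clear, lam=0.5):
    """Assumed form: pixel-wise mean absolute error plus lam times edge-map error."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    pixel_term = np.mean(np.abs(pred - clear))
    edge_term = np.mean(np.abs(sobel_edges(pred) - sobel_edges(clear)))
    return pixel_term + lam * edge_term


rng = np.random.default_rng(0)
clear = rng.random((16, 16, 1))
print(loss(clear, clear))          # identical images give zero loss
print(loss(clear + 0.1, clear))    # a constant offset changes only the pixel term
```

An edge term of this kind penalises blurred ridge boundaries even when the average intensity error is small, which is the usual motivation for adding it to a pixel-wise loss.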
[0053] In one optional implementation, the step of obtaining the sample set includes:
[0054] Obtain the original image of the same fingerprint;
[0055] The original image is preprocessed to obtain the blurred image; wherein the preprocessing includes at least one of the following: image segmentation, size cropping, flipping, brightness enhancement, noise reduction, and normalization.
[0056] In one optional implementation, the step of preprocessing the original image to obtain the blurred image includes:
[0057] The original image is segmented to obtain a first image, a second image, and a third image, wherein the first image, the second image, and the third image respectively contain information about different regions of the original image;
[0058] The first image, the second image, and the third image are each normalized. The blurred image includes the normalized first image, the second image, and the third image.
[0059] In one optional implementation, the original image includes a first pixel value of a first pixel, and the step of performing image segmentation on the original image to obtain a first image, a second image, and a third image includes:
[0060] If the first pixel is located outside the preset area, and the first pixel value is greater than or equal to the first threshold and less than or equal to the second threshold, then the pixel value of the first pixel in the first image is determined to be the first pixel value.
[0061] If the first pixel is located outside the preset area, and the first pixel value is less than the first threshold or greater than the second threshold, then the pixel value of the first pixel in the first image is determined to be 0.
[0062] If the first pixel is located outside the preset area, and the first pixel value is greater than or equal to the third threshold and less than or equal to the fourth threshold, then the pixel value of the first pixel in the second image is determined to be the first pixel value.
[0063] If the first pixel is located outside the preset area, and the first pixel value is less than the third threshold or greater than the fourth threshold, then the pixel value of the first pixel in the second image is determined to be 0.
[0064] If the first pixel is located within a preset area, then the pixel value of the first pixel in the third image is determined to be the first pixel value;
[0065] The third threshold is greater than the second threshold.
[0066] In one optional implementation, the step of segmenting the original image to obtain a first image, a second image, and a third image includes:
[0067] Edge detection is performed on the original image, and the original image is segmented into the first image, the second image, and the third image based on the position and length of the detected edges.
[0068] In one optional implementation, the step of normalizing the first image, the second image, and the third image respectively includes:
[0069] Determine the maximum and minimum values among all pixel values contained in the image to be processed, wherein the image to be processed is any one of the first image, the second image, and the third image, and the image to be processed includes the second pixel value of the second pixel;
[0070] Based on the maximum value, minimum value, and the second pixel value, the pixel value of the second pixel in the normalized image to be processed is determined.
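Paragraphs [0069] and [0070] state only that the normalized value depends on the maximum, the minimum, and the pixel value itself. The sketch below assumes the common min-max form, (value - min) / (max - min); the function name `min_max_normalize` and the handling of a constant image are choices made for this example.

```python
import numpy as np


def min_max_normalize(image):
    """Map pixel values linearly so the minimum becomes 0 and the maximum 1."""
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # Constant image: no contrast to stretch (behaviour assumed here).
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


sample = np.array([[2000, 6000], [10000, 4000]], dtype=np.uint16)
normalized = min_max_normalize(sample)
print(normalized)
```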
[0071] This disclosure provides an image processing method, including:
[0072] Obtain a blurred fingerprint image;
[0073] The blurred fingerprint image is input into the image processing model trained by any of the model training methods described in the present invention to obtain a clear fingerprint image corresponding to the blurred fingerprint image.
[0074] In one optional implementation, when the blurred image is the result of preprocessing the original image, the step of obtaining the blurred fingerprint image includes:
[0075] Obtain the raw fingerprint image;
[0076] The original fingerprint image is preprocessed to obtain the blurred fingerprint image; wherein the preprocessing includes at least one of the following: image segmentation, size cropping, flipping, brightness enhancement, noise reduction, and normalization.
[0077] This disclosure provides a computing processing device, including:
[0078] Memory containing computer-readable code;
[0079] One or more processors; when the computer-readable code is executed by the one or more processors, the computing processing device performs any one of the methods described above.
[0080] This disclosure provides a non-transient computer-readable medium storing computer-readable code that, when executed on a computing processing device, causes the computing processing device to perform the method according to any one of the methods.
[0081] The above description is merely an overview of the technical solution disclosed herein. In order to better understand the technical means of this disclosure and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of this disclosure more apparent and understandable, specific embodiments of this disclosure are described below.
Description of the Drawings
[0082] To more clearly illustrate the technical solutions in the embodiments or related technologies of this disclosure, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. It should be noted that the scale in the drawings is for illustration only and does not represent the actual scale.
[0083] Figure 1 is a schematic diagram of optical under-display fingerprint image acquisition.
[0084] Figure 2 is a schematic diagram of a multi-point light source imaging scheme.
[0085] Figure 3 is a schematic flowchart of a model training method.
[0086] Figure 4 schematically shows a set of original and clear images.
[0087] Figure 5 is a schematic diagram of the process of acquiring clear images.
[0088] Figure 6 is a schematic diagram of a blurred image.
[0089] Figure 7 is a schematic diagram of the first, second, and third images.
[0090] Figure 8 is a schematic diagram of the structure of the first type of convolutional neural network.
[0091] Figure 9 is a schematic diagram of the structure of the first convolutional block.
[0092] Figure 10 is a schematic diagram of the structure of the downsampling block.
[0093] Figure 11 is a schematic diagram of the structure of the second type of convolutional neural network.
[0094] Figure 12 is a schematic flowchart of an image processing method.
[0095] Figure 13 is a schematic block diagram of a model training device.
[0096] Figure 14 is a schematic block diagram of an image processing device.
[0097] Figure 15 is a schematic block diagram of a computing processing device for performing the method according to this disclosure.
[0098] Figure 16 schematically shows a storage unit for holding or carrying program code that implements the method according to this disclosure.
Detailed Implementation
[0099] To make the objectives, technical solutions, and advantages of the embodiments of this disclosure clearer, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this disclosure, and not all embodiments. Based on the embodiments of this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.
[0100] As shown in Figure 1, optical under-display fingerprint technology uses an under-display point light source to illuminate the finger. The light is reflected by the finger and received by a photosensitive element under the screen. Since there is a difference in the intensity of light reflected from the fingerprint valleys and fingerprint ridges, a fingerprint image can be generated.
[0101] In related technologies, multiple point light sources below the screen are typically illuminated simultaneously to obtain a fingerprint image with a larger area and higher intensity. However, due to the limitations of the light emission and imaging principles of point light sources, no arrangement of the point light sources yields an ideal fingerprint image. As shown in Figure 2, when multiple point light sources are sparsely arranged, the fingerprint images corresponding to the individual point light sources are too discrete to be stitched together into a complete fingerprint image. To obtain a complete fingerprint image, the point light sources need to be densely arranged, which causes the fingerprint images corresponding to the individual point light sources to overlap with each other.
[0102] In related technologies, to obtain a clear fingerprint image, one can sequentially illuminate each point light source, acquire the fingerprint image corresponding to each single point light source, and then perform processing such as cropping and alignment stitching on the fingerprint images corresponding to multiple single point light sources to obtain a complete and clear fingerprint image. However, this method requires acquiring fingerprint images corresponding to each single point light source, which takes a long time and is not very feasible.
[0103] To solve the above problems, Figure 3 schematically shows a flowchart of a model training method. As shown in Figure 3, the method may include the following steps.
[0104] Step S31: Obtain a sample set, which includes both blurred and clear images of the same fingerprint.
[0105] The execution entity of this embodiment can be a computer device equipped with a model training device that executes the model training method provided in this embodiment. The computer device can be, for example, a smartphone, tablet computer, or personal computer, and this embodiment is not limited in this respect.
[0106] The execution entity in this embodiment can acquire the sample set in various ways. For example, the execution entity can acquire samples stored on another server (e.g., a database server) used for data storage via a wired or wireless connection. As another example, the execution entity can acquire samples collected by an under-display fingerprint acquisition device and store these samples locally to generate a sample set.
[0107] In practical implementation, multiple point light sources can be lit simultaneously on the under-display fingerprint collection device, and the fingerprints of different fingers of the participants can be collected multiple times. The original image is then generated by the imaging module inside the device, as shown in the left image of Figure 4. The original image can be, for example, a 16-bit PNG image.
[0108] The blurred image can be the original image directly generated by the under-display fingerprint collection device, or an image obtained by preprocessing the original image; this disclosure is not limited in this respect.
[0109] The left and right images in Figure 4 show the original and clear images of the same fingerprint, respectively. In this embodiment, both the original and clear images can be single-channel grayscale images. In practical applications, the original and clear images can also be multi-channel color images, such as RGB images.
[0110] Figure 5 schematically shows the process of obtaining a clear image. As shown in Figure 5, after multiple point light sources are lit simultaneously and the original image of a fingerprint (for example, of a finger) is acquired, the finger of the participant is kept still on the screen. Then each point light source is lit in turn to acquire the fingerprint image corresponding to that single point light source. By cropping and stitching the fingerprint images corresponding to the individual point light sources, a clear image of the fingerprint is finally obtained. Figure 5 shows four point light sources: point light source 1, point light source 2, point light source 3, and point light source 4. Lighting the four point light sources in sequence yields four fingerprint images, one per point light source; cropping and stitching these four images yields the clear image.
[0111] In one optional implementation, step S31 may specifically include: firstly acquiring the original image of the same fingerprint; then preprocessing the original image to obtain a blurred image; wherein the preprocessing includes at least one of the following: image segmentation, size cropping, flipping, brightness enhancement, noise processing, and normalization processing.
[0112] In this implementation, the blurred image is the image obtained by preprocessing the original image.
[0113] As shown in Figure 6, the original image obtained by simultaneously illuminating multiple point light sources contains not only fingerprint information generated by the photosensitive element receiving the light reflected from the fingerprint (region a in Figure 6), but also noise information introduced by ambient light (region b in Figure 6) and light information near the point light sources (region c in Figure 6). Region a contains the main fingerprint information, region b contains a large amount of ambient light noise and a small amount of weak fingerprint information, and region c contains a strong light source signal and a small amount of fingerprint information. Before training the convolutional neural network, the original image can be segmented to obtain the first image corresponding to region a (as shown in a of Figure 7), the second image corresponding to region b (as shown in b of Figure 7), and the third image corresponding to region c (as shown in c of Figure 7).
[0114] The first, second, and third images each contain image information from different regions of the original image. By dividing the original image into these three images according to regions, primary and secondary data can be separated, reducing the impact of ambient light and point light sources on the fingerprint image.
[0115] Furthermore, the inventors discovered that when pixel values range from 0 to 65535, most pixels in region a, which contains the main fingerprint information, have values below 10000. That is, the pixel values in region a lie mainly in the low range, while the pixel values in region b, and especially region c, lie in a higher range. Therefore, to obtain more fingerprint information from region a and prevent the loss of the main fingerprint information, the first, second, and third images obtained from image segmentation can each be normalized separately. In Figure 7, a shows the first image after normalization, b shows the second image after normalization, and c shows the third image after normalization.
[0116] In this embodiment, the blurred image includes the normalized first image, second image, and third image. Specifically, the blurred image can be a three-channel image obtained by concatenating the normalized first, second, and third images along the channel dimension. Concatenating images of different regions along the channel dimension allows more effective fingerprint information to be extracted across multiple dimensions, improving the accuracy of subsequent fingerprint recognition.
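The effect of normalizing each region separately and then stacking the results along the channel dimension can be illustrated with synthetic data. The value ranges below merely mimic the description (region a below 10000, regions b and c higher); the channel-last layout, the min-max form of the normalization, and all variable names are assumptions of this sketch.

```python
import numpy as np


def normalize(img):
    """Min-max normalization of one region image (assumed form)."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)


rng = np.random.default_rng(0)
first = rng.integers(1000, 10000, size=(4, 4))     # stands in for region a
second = rng.integers(20000, 40000, size=(4, 4))   # stands in for region b
third = rng.integers(50000, 65536, size=(4, 4))    # stands in for region c

# Three-channel blurred image: one normalized region per channel.
blurred = np.stack([normalize(first), normalize(second), normalize(third)], axis=-1)

# Spread of region a when the full 16-bit range is used as the scale instead.
scaled = first / 65535.0
global_spread = scaled.max() - scaled.min()
print(blurred.shape, global_spread)
```

With a single global scale, region a occupies less than 15% of the unit interval in this example, whereas per-region normalization stretches it over the whole interval.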
[0117] In specific implementations, image segmentation of the original image can be performed using threshold segmentation or edge detection methods, etc., and this disclosure does not limit the specific methods used.
[0118] The original image includes the first pixel value of the first pixel. In the first implementation, the step of performing image segmentation on the original image to obtain the first image, the second image, and the third image may include:
[0119] If the first pixel is outside the preset area, and the first pixel value is greater than or equal to the first threshold and less than or equal to the second threshold, then the pixel value of the first pixel in the first image is determined to be the first pixel value; if the first pixel is outside the preset area, and the first pixel value is less than the first threshold or greater than the second threshold, then the pixel value of the first pixel in the first image is determined to be 0.
[0120] If the first pixel is outside the preset area, and the first pixel value is greater than or equal to the third threshold and less than or equal to the fourth threshold, then the pixel value of the first pixel in the second image is determined to be the first pixel value; if the first pixel is outside the preset area, and the first pixel value is less than the third threshold or greater than the fourth threshold, then the pixel value of the first pixel in the second image is determined to be 0.
[0121] The third threshold is greater than the second threshold. That is, the pixel values of region b are generally higher than the pixel values of region a.
[0122] If the first pixel is within the preset area, then the pixel value of the first pixel in the third image is determined to be the first pixel value.
[0123] Specifically, the first image corresponds to region a, which can be segmented from the original image using the following formula.
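[0124] $$I_a(x,y)=\begin{cases} I(x,y), & \mathrm{min}_a \le I(x,y) \le \mathrm{max}_a \\ 0, & \text{otherwise} \end{cases}$$
[0125] where $I_a(x,y)$ represents the pixel value at coordinates (x, y) in the first image, $I(x,y)$ represents the pixel value at coordinates (x, y) in the original image, $\mathrm{min}_a$ is the first threshold, and $\mathrm{max}_a$ is the second threshold.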
[0126] The specific values of the first threshold and the second threshold can be determined by manually selecting a relatively smooth area in region a, statistically calculating the pixel values of the original image in that area, and determining the minimum and maximum values of region a. The first threshold can be the average of the minimum values of region a of multiple original images, and the second threshold can be the average of the maximum values of region a of multiple original images.
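The threshold estimation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `estimate_thresholds` and the patch format (a pair of slices marking the manually selected smooth area) are assumptions introduced for the example.

```python
import numpy as np

def estimate_thresholds(images, patch):
    """Average the per-image minimum and maximum inside a hand-picked patch.

    images: list of 2-D arrays (original images).
    patch:  (row_slice, col_slice) marking the manually selected smooth
            area inside the region of interest (hypothetical format).
    """
    rows, cols = patch
    mins = [float(img[rows, cols].min()) for img in images]
    maxs = [float(img[rows, cols].max()) for img in images]
    return sum(mins) / len(mins), sum(maxs) / len(maxs)

# Two toy 4x4 "original images"; the patch is their top-left 2x2 corner.
img1 = np.array([[10, 12, 90, 90], [11, 14, 90, 90], [0, 0, 0, 0], [0, 0, 0, 0]])
img2 = np.array([[20, 22, 90, 90], [21, 26, 90, 90], [0, 0, 0, 0], [0, 0, 0, 0]])
min_a, max_a = estimate_thresholds([img1, img2], (slice(0, 2), slice(0, 2)))
```

Here the per-image minima are 10 and 20 and the per-image maxima are 14 and 26, so the first threshold is their mean 15 and the second threshold is 20.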
[0127] It should be noted that the above formula can be used to segment region a from the original image, similar to image matting. In the first image, the pixel values of regions b and c are both 0.
[0128] Specifically, the second image corresponds to region b, which can be segmented from the original image using the following formula.
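[0129] $$I_b(x,y)=\begin{cases} I(x,y), & \mathrm{min}_b \le I(x,y) \le \mathrm{max}_b \\ 0, & \text{otherwise} \end{cases}$$
[0130] where $I_b(x,y)$ represents the pixel value at coordinates (x, y) in the second image, $I(x,y)$ represents the pixel value at coordinates (x, y) in the original image, $\mathrm{min}_b$ is the third threshold, and $\mathrm{max}_b$ is the fourth threshold.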
[0131] The specific values of the third and fourth thresholds can be determined by manually selecting a relatively smooth area in region b, statistically calculating the pixel values in that area of the original image, and determining the minimum and maximum values of region b. The third threshold can be the average of the minimum values of region b of multiple original images, and the fourth threshold can be the average of the maximum values of region b of multiple original images.
[0132] It should be noted that the above formula can be used to segment region b from the original image, similar to image matting. In the second image, the pixel values of regions a and c are both 0.
[0133] The segmentation of region c in the third image can be performed based on the position of the point light source in the fingerprint image. With the point light source fixed, the coordinates of the preset region are also fixed. The preset region can be determined by directly measuring the coordinates and radius of the point light source, thus achieving the segmentation of region c. In the third image, the pixel values of regions a and b are both 0.
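The three-way segmentation of the first implementation can be sketched as follows. This is a minimal illustration under stated assumptions: the preset area is modelled as a disc given by a centre and a radius, and the function name `segment` and all numeric values are hypothetical.

```python
import numpy as np

def segment(original, min_a, max_a, min_b, max_b, center, radius):
    """Split one original image into the first, second, and third images."""
    h, w = original.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = center
    # Preset area: a disc around the point light source (assumed shape).
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    in_a = ~inside & (original >= min_a) & (original <= max_a)
    in_b = ~inside & (original >= min_b) & (original <= max_b)
    first = np.where(in_a, original, 0)     # region a, zero elsewhere
    second = np.where(in_b, original, 0)    # region b, zero elsewhere
    third = np.where(inside, original, 0)   # region c, zero elsewhere
    return first, second, third

original = np.array([
    [200,  60,  60, 120],
    [ 60, 250,  60, 120],
    [ 60,  60,  60, 120],
])
first, second, third = segment(original, 50, 80, 100, 150, center=(0, 0), radius=1)
```

In this toy input the pixel with value 250 lies outside the preset area and outside both threshold ranges, so it is 0 in all three images.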
[0134] In the second implementation, the step of performing image segmentation on the original image to obtain a first image, a second image, and a third image may include: performing edge detection on the original image, and segmenting the original image into a first image, a second image, and a third image based on the position and length of the detected edges.
[0135] In practical implementation, the Laplacian edge detection algorithm can be used to perform edge detection on the original image, filter the length and position of the detected edges, and use the finally extracted edges as the boundaries of each region for segmentation.
[0136] In bright-state acquisition environments, the Laplacian edge detection algorithm can detect the boundary between region a and region b, the boundary between region a and region c, and may also detect boundaries caused by noise, as well as boundaries of the effectively identifiable region. Furthermore, boundaries caused by noise can be filtered out based on boundary length, and boundaries of the effectively identifiable region can be filtered out based on boundary position. Since edge detection is relatively fast, using edge detection for image segmentation can improve segmentation efficiency.
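A minimal sketch of the edge detection step is given below, assuming the common 4-neighbour discrete Laplacian and a hypothetical response threshold. The subsequent filtering of edges by length and position is not shown.

```python
import numpy as np

def laplacian_edges(image, threshold):
    """Boolean edge map from the 4-neighbour Laplacian (border pixels stay False)."""
    img = image.astype(float)
    lap = np.zeros_like(img)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1]
                       + img[1:-1, :-2] + img[1:-1, 2:]
                       - 4.0 * img[1:-1, 1:-1])
    return np.abs(lap) > threshold

# A vertical step between a dark half and a bright half.
step = np.zeros((5, 6))
step[:, 3:] = 100.0
edges = laplacian_edges(step, threshold=50.0)
```

The response is non-zero only in the two columns adjacent to the step, which is where a region boundary would be placed.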
[0137] Assume that the image to be processed is any one of the first image, the second image, and the third image, and that it includes the second pixel value of a second pixel. In a specific implementation, the step of normalizing the first image, the second image, and the third image respectively may include: first, determining the maximum and minimum values among all pixel values contained in the image to be processed; then, based on the maximum value, the minimum value, and the second pixel value, determining the pixel value of the second pixel in the normalized image to be processed.
[0138] Specifically, the maximum and minimum values of all pixel values in the image to be processed can be calculated. Let the maximum value be `max` and the minimum value be `min`. If the second pixel value of the second pixel in the image to be processed is `I`, the normalized pixel value of the second pixel is `I_norm = (I - min) / (max - min)`, which normalizes the pixel values in the image to be processed to the range of 0 to 1.
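The normalization can be sketched as follows. The guard for a constant image (where max equals min) is an addition made for this example and is not part of the disclosure.

```python
import numpy as np

def min_max_normalize(image):
    """Map pixel values to [0, 1] with (I - min) / (max - min)."""
    img = image.astype(float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: avoid division by zero (added assumption).
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

normalized = min_max_normalize(np.array([[50, 100], [150, 250]]))
```

Each of the first, second, and third images would be passed through this function separately, so that each uses its own minimum and maximum.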
[0139] Step S32: Input the blurred image into the convolutional neural network. The encoding network in the convolutional neural network downsamples and extracts features from the blurred image, outputting multiple feature maps. The decoding network in the convolutional neural network upsamples and extracts features from the feature maps, outputting a predicted image corresponding to the blurred image.
[0140] The encoding network includes multiple encoding levels, and the decoding network includes multiple decoding levels. The feature map obtained by the F-th encoding level in the encoding network and the feature map obtained by the G-th decoding level in the decoding network are fused together and used as the input of the (G+1)-th decoding level in the decoding network.
[0141] The feature map obtained from the F-th encoding level has the same resolution as the feature map obtained from the G-th decoding level, where F and G are both positive integers.
[0142] A convolutional neural network (CNN) is a neural network structure that takes, for example, images as input and output, and replaces scalar weights with filters (convolutional kernels). The convolution process can be viewed as convolving an input image or a convolutional feature plane with a trainable filter and outputting a convolutional feature plane, also called a feature map. A convolutional layer is a layer of neurons in a CNN that performs convolution on its input signal. In a convolutional layer, a neuron is connected only to some of the neurons in the adjacent layers. A convolutional layer can apply several convolutional kernels to the input image to extract multiple types of features, with each kernel extracting one type of feature. Convolutional kernels are typically initialized as matrices of small random values, and they acquire appropriate weights through learning during the training of the CNN. Multiple convolutional kernels can be used in the same convolutional layer to extract different image information.
[0143] By fusing the feature map obtained from the Fth encoding level of the encoding network with the feature map obtained from the Gth decoding level of the decoding network, and then inputting the fused feature map into the (G+1)th decoding level of the decoding network, a skip connection can be achieved between the encoding and decoding networks. This skip connection increases the image detail retention of the decoding network. It allows image details and information lost during downsampling in the encoding network to be passed to the decoding network, enabling the decoding network to generate a more accurate image during upsampling to restore spatial resolution. This improves the accuracy of extracting a sharp image from a blurry image.
[0144] Downsampling operations may include max pooling, average pooling, stochastic pooling, undersampling (e.g., selecting a fixed number of pixels), demultiplexed output (e.g., splitting an input image into multiple smaller images), etc.; this disclosure is not limited thereto.
[0145] Upsampling operations may include maximum merging, strided transposed convolution, interpolation, etc.; this disclosure is not limited thereto.
[0146] In the encoding network, the spatial dimensions of the feature map can be gradually reduced by downsampling multiple times, which expands the receptive field. This allows the encoding network to better extract local and global features at different scales. Moreover, downsampling compresses the extracted feature map, thereby saving computation and memory and improving processing speed.
[0147] In the decoding network, the spatial resolution of multiple feature maps output by the encoding network can be restored to be consistent with that of the blurred image through multiple upsampling.
[0148] Step S33: Based on the predicted image, the clear image, and the preset loss function, calculate the loss value of the convolutional neural network and adjust the parameters of the convolutional neural network with the goal of minimizing the loss value.
[0149] The loss function is an important equation used to measure the difference between the predicted image and the sharp image. For example, a higher output value (loss) of the loss function indicates a greater difference.
[0150] In one alternative implementation, the loss value of the convolutional neural network can be calculated using the following formula:
[0151] $$L(Y,\hat{Y}) = L_{L1}(Y,\hat{Y}) + \lambda \, L_{sobel}(Y,\hat{Y})$$
[0152] $$L_{L1}(Y,\hat{Y}) = \frac{1}{W H C}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{z=1}^{C}\left|Y(x,y,z)-\hat{Y}(x,y,z)\right|$$
[0153] $$L_{sobel}(Y,\hat{Y}) = \frac{1}{W H C}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{z=1}^{C}\left|E(Y)(x,y,z)-E(\hat{Y})(x,y,z)\right|$$
[0154] where $L(Y,\hat{Y})$ is the loss value, $Y$ is the predicted image, $\hat{Y}$ is the clear image, W is the width of the predicted image, H is the height of the predicted image, C is the number of channels in the predicted image, $E(Y)$ is the edge map of the predicted image, $E(\hat{Y})$ is the edge map of the clear image, λ is greater than or equal to 0 and less than or equal to 1, x is a positive integer greater than or equal to 1 and less than or equal to W, y is a positive integer greater than or equal to 1 and less than or equal to H, and z is a positive integer greater than or equal to 1 and less than or equal to C.
[0155] Here, $E(Y)$ can be the edge map of the predicted image obtained with the Sobel edge extraction algorithm, and $E(\hat{Y})$ can be the edge map of the clear image obtained with the Sobel edge extraction algorithm.
[0156] Because $L_{L1}$ can guide the network to recover the low-frequency information of the clear image, and $L_{sobel}$ helps to enhance the edge information of the original image, this implementation uses the weighted sum of $L_{L1}$ and $L_{sobel}$ as the loss function, which can improve the image extraction effect.
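A minimal sketch of such a weighted loss is given below for a single-channel image. It assumes that both terms are mean absolute differences, that the edge map is the Sobel gradient magnitude, and that the image border is replicated before filtering; these choices, the function names, and the value of `lam` are assumptions made for illustration.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edges(img):
    """Sobel gradient magnitude of a 2-D image (border replicated)."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += SOBEL_X[dy, dx] * window
            gy += SOBEL_Y[dy, dx] * window
    return np.hypot(gx, gy)

def combined_loss(pred, target, lam=0.5):
    """Pixel-wise term plus lam times the edge-map term."""
    pixel_term = np.mean(np.abs(pred - target))
    edge_term = np.mean(np.abs(sobel_edges(pred) - sobel_edges(target)))
    return pixel_term + lam * edge_term

target = np.zeros((4, 4))
pred = target.copy()
pred[:, 2:] = 1.0          # a spurious vertical edge in the prediction
loss = combined_loss(pred, target, lam=0.5)
```

With `lam` set to 0 the loss reduces to the pixel-wise term alone; a larger `lam` penalizes the spurious edge more heavily.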
[0157] In a practical implementation, the AdamW optimizer can be used to optimize the parameters of the convolutional neural network based on the loss value. The initial learning rate can be set to $10^{-4}$, and the batch size of the training data can be set to 48.
[0158] In practical implementation, the termination of training can be determined by whether the convolutional neural network (CNN) has converged. Convergence can be determined in any of the following ways: checking if the number of times the CNN parameters have been updated has reached an iteration threshold; or checking if the CNN's loss value is lower than a loss threshold. The iteration threshold can be a pre-set number of iterations; for example, if the number of times the CNN parameters have been updated exceeds the iteration threshold, training terminates. The loss threshold can also be pre-set; for example, if the loss value calculated by the loss function is lower than the loss threshold, training terminates.
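The two termination criteria can be sketched as follows. The loop body is a stand-in for one real optimizer step, and the threshold values are hypothetical.

```python
def should_stop(num_updates, loss_value, max_updates, loss_threshold):
    """Return True once either convergence criterion is met."""
    return num_updates >= max_updates or loss_value < loss_threshold

# Simulated training: the loss halves after every parameter update.
loss, updates = 1.0, 0
while not should_stop(updates, loss, max_updates=100, loss_threshold=0.01):
    loss *= 0.5    # stand-in for one parameter update
    updates += 1
```

In this simulation the loss threshold is reached first, after 7 updates; with a slower loss decay the iteration threshold would end training instead.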
[0159] Step S34: Determine the convolutional neural network with completed parameter adjustments as the image processing model.
[0160] In this embodiment, in response to determining that the training of the convolutional neural network is complete, the trained convolutional neural network can be identified as an image processing model. This image processing model can be used to extract clear fingerprint images from blurry fingerprint images.
[0161] The model training method provided in this embodiment, by training a convolutional neural network, can obtain a model that can be used to extract clear fingerprint images. The convolutional neural network provided in this embodiment includes an encoder network and a decoder network with skip connections. The skip connections between the encoder and decoder networks can increase the preservation of image details by the decoder network, thereby improving the accuracy of extracting clear images from blurry images and improving image processing performance.
[0162] In this embodiment, the specific structure of the convolutional neural network can be set according to actual needs.
[0163] In one alternative implementation, each encoding level may include a first convolutional block and / or a downsampling block. The first convolutional block is used to extract features from the input feature map. The downsampling block is used to downsample the input feature map.
[0164] Each decoding level may include a second convolutional block and / or an upsampling block. The second convolutional block is used to extract features from the input feature map. The upsampling block is used to upsample the input feature map.
[0165] The first convolutional block, the downsampling block, the second convolutional block, and the upsampling block each include at least one set of asymmetric convolutional kernels.
[0166] Asymmetric convolution kernels can include, for example, 1×k convolution kernels and k×1 convolution kernels. The value of k is greater than or equal to 2, and the value of k can be set according to requirements, for example, it can be 5.
[0167] In this implementation, the computational load can be significantly reduced by using asymmetric convolution kernels for feature extraction or sampling, thereby improving processing speed. By using asymmetric convolution kernels for both horizontal and vertical convolution, the horizontal and vertical gradients in the image can be learned, which helps in extracting changes in information within the fingerprint image.
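The reduction in computation can be illustrated by counting weights. The sketch below compares one k×k kernel with a pair consisting of one 1×k kernel and one k×1 kernel, for a single input/output channel pair and ignoring bias terms; the function name is hypothetical.

```python
def weights_per_channel_pair(k):
    """Weights needed for one input/output channel pair, bias ignored."""
    square = k * k        # one k x k kernel
    asymmetric = k + k    # one 1 x k kernel plus one k x 1 kernel
    return square, asymmetric

square, asymmetric = weights_per_channel_pair(5)
saving = 1 - asymmetric / square
```

For k = 5 the pair of asymmetric kernels uses 10 weights instead of 25, a 60% reduction, and the number of multiplications per output pixel falls in the same proportion.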
[0168] As shown in Figure 8, the encoding network can include N encoding modules, such as CM-1, CM-2, ..., CM-N shown in Figure 8. N can be a positive integer, or N can be greater than or equal to 2 and less than or equal to 20. For example, N can take the values 8, 10, 12, 15, etc. This disclosure does not limit the specific value of N.
[0169] Each encoding module may include M encoding levels. M can be a positive integer, or M can be greater than or equal to 2 and less than or equal to 8. For example, the value of M shown in Figure 8 is 3, meaning that each encoding module includes 3 encoding levels: the first encoding level a1, the second encoding level a2, and the third encoding level a3. This disclosure does not limit the specific value of M.
[0170] Specifically, the first coding level a1 of any coding module may include one or more first convolutional blocks. The i-th coding level of any coding module may include one or more first convolutional blocks and a downsampling block. Where i is greater than or equal to 2 and less than or equal to M.
[0171] The decoding network can include M decoding levels, meaning the number of decoding levels in the decoding network is the same as the number of encoding levels in each encoding module. For example, the decoding network shown in Figure 8 includes three decoding levels: the first decoding level b1, the second decoding level b2, and the third decoding level b3.
[0172] In the decoding network, each of the first to the (M-1)th decoding levels may include one or more second convolutional blocks and an upsampling block. The Mth decoding level may include one or more second convolutional blocks.
[0173] Each encoding module shown in Figure 8 includes two downsampling blocks, each of which downsamples the input feature map by a factor of 2. The decoding network includes two upsampling blocks, each of which upsamples the input feature map by a factor of 2. This ensures that the image output by the convolutional neural network has the same resolution as the image input to the convolutional neural network.
[0174] In this implementation, the step of the encoding network in the convolutional neural network downsampling and extracting features from the blurred image to output multiple feature maps may include:
[0175] In the N coding modules, the first coding level a1 of the first coding module CM-1 performs feature extraction on the blurred image;
[0176] The i-th coding level of the first coding module CM-1 performs downsampling and feature extraction on the feature map obtained from the (i-1)-th coding level of the first coding module in sequence;
[0177] In N coding modules, the first coding level of the j-th coding module extracts features from the feature map obtained by processing the first coding level of the (j-1)-th coding module; where j is greater than or equal to 2 and less than or equal to N.
[0178] The i-th coding level of the j-th coding module downsamples the feature map obtained by the (i-1)-th coding level of the j-th coding module, and fuses the downsampled feature map with the feature map obtained by the i-th coding level of the (j-1)-th coding module, and extracts features from the fused result.
[0179] The step of fusing the downsampled feature map with the feature map processed by the i-th coding level of the (j-1)-th coding module and extracting features from the fused result may include: concatenating the downsampled feature map with the feature map processed by the i-th coding level of the (j-1)-th coding module along the channel dimension and extracting features from the concatenated result.
[0180] The blurred image can be obtained by stitching together the first, second, and third images along the channel dimension. In a specific implementation, the matrix size of the blurred image can be B×3×H×W, where B is the number of original images in a training batch, H is the height of the original images, and W is the width of the original images. The output sharp image is a matrix of size B×1×H×W.
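Assembling the network input can be sketched as follows, assuming the three segmented images of a batch are held as arrays of shape B×H×W. The sizes and random contents are placeholders.

```python
import numpy as np

B, H, W = 2, 4, 6
rng = np.random.default_rng(0)
first = rng.random((B, H, W))    # region a images
second = rng.random((B, H, W))   # region b images
third = rng.random((B, H, W))    # region c images

# Stack along a new channel axis to obtain a B x 3 x H x W input.
blurred = np.stack([first, second, third], axis=1)
```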
[0181] Within the first encoding module CM-1, the first convolutional block in the first encoding level a1 can extract features from the blurred image to obtain a first feature map; the downsampling block in the second encoding level a2 performs a first downsampling on the first feature map, and the first convolutional block in the second encoding level a2 extracts features from the feature map obtained from the first downsampling to obtain a second feature map; the downsampling block in the third encoding level a3 performs a second downsampling on the second feature map, and the first convolutional block in the third encoding level a3 extracts features from the feature map obtained from the second downsampling to obtain a third feature map.
[0182] Within the second encoding module CM-2, the first convolutional block in the first encoding layer a1 extracts features from the first feature map output by the first encoding module CM-1; the downsampling block in the second encoding layer a2 performs a first downsampling on the feature map output by the first encoding layer a1, and the first convolutional block in the second encoding layer a2 fuses the feature map obtained from the first downsampling with the second feature map output by the first encoding module CM-1, and extracts features from the fused result; the downsampling block in the third encoding layer a3 performs a second downsampling on the feature map output by the second encoding layer a2, and the first convolutional block in the third encoding layer a3 fuses the feature map obtained from the second downsampling with the third feature map output by the first encoding module CM-1, and extracts features from the fused result.
[0183] Assume that within the (N-1)th coding module CM-N-1, the feature map output by the first coding level a1 is the fourth feature map, the feature map output by the second coding level a2 is the fifth feature map, and the feature map output by the third coding level a3 is the sixth feature map.
[0184] Within the Nth encoding module CM-N, the first convolutional block in the first encoding layer a1 extracts features from the fourth feature map output by encoding module CM-N-1 to obtain the seventh feature map; the downsampling block in the second encoding layer a2 performs a first downsampling on the feature map output by the first encoding layer a1, and the first convolutional block in the second encoding layer a2 fuses the feature map obtained from the first downsampling with the fifth feature map output by encoding module CM-N-1, and extracts features from the fused result to obtain the eighth feature map; the downsampling block in the third encoding layer a3 performs a second downsampling on the feature map output by the second encoding layer a2, and the first convolutional block in the third encoding layer a3 fuses the feature map obtained from the second downsampling with the sixth feature map output by encoding module CM-N-1, and extracts features from the fused result to obtain the ninth feature map.
[0185] The multiple feature maps output by the encoding network include feature maps obtained from the processing of each encoding level of the Nth encoding module among the N encoding modules.
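The data flow through the N encoding modules can be sketched as follows. The sketch only reproduces the wiring between levels and modules: `extract` and `downsample` are simple stand-ins (channel averaging and 2×2 average pooling) for the first convolutional block and the downsampling block, and all names are hypothetical.

```python
import numpy as np

def extract(x):
    """Stand-in for a first convolutional block: collapse to one channel."""
    return x.mean(axis=1, keepdims=True)

def downsample(x):
    """Stand-in for a downsampling block: 2x2 average pooling."""
    b, c, h, w = x.shape
    return x.reshape(b, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))

def encode(blurred, n_modules, m_levels):
    prev = None  # outputs of the previous module, one entry per level
    for j in range(n_modules):
        outs = []
        for i in range(m_levels):
            if i == 0:
                # First level: the blurred image, or level 1 of module j-1.
                x = blurred if j == 0 else prev[0]
            else:
                x = downsample(outs[i - 1])
                if j > 0:
                    # Fuse with the same level of module j-1 along channels.
                    x = np.concatenate([x, prev[i]], axis=1)
            outs.append(extract(x))
        prev = outs
    return prev  # feature maps of every level of the N-th module

features = encode(np.ones((2, 3, 8, 8)), n_modules=4, m_levels=3)
```

For an 8×8 input and M = 3, the returned feature maps have spatial sizes 8×8, 4×4, and 2×2, one per encoding level of the last module.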
[0186] Accordingly, the steps in which the decoding network in a convolutional neural network upsamples and extracts features from the feature map to output a predicted image corresponding to the blurred image may include:
[0187] The first decoding level in the M decoding levels extracts features from the feature map obtained by the M-th encoding level of the N-th encoding module, and then upsamples the extracted feature map.
[0188] The decoding network fuses the feature map obtained from the (u-1)th decoding level in the M decoding levels with the feature map obtained from the (M-u+1)th encoding level in the Nth encoding module to obtain the first fused feature map; where u is greater than or equal to 2 and less than or equal to M-1; the value of M can be greater than or equal to 3;
[0189] The decoding network inputs the first fused feature map into the u-th decoding layer among M decoding layers. The u-th decoding layer sequentially performs feature extraction and upsampling on the first fused feature map.
[0190] The decoding network fuses the feature map obtained from the (M-1)th decoding level among the M decoding levels with the feature map obtained from the first encoding level of the Nth encoding module to obtain the second fused feature map.
[0191] The decoding network inputs the second fused feature map into the Mth decoding layer of the M decoding layers. The Mth decoding layer extracts features from the second fused feature map to obtain the predicted image.
[0192] In a specific implementation, the step of fusing the feature map obtained from the (u-1)th decoding layer in the M decoding layers with the feature map obtained from the (M-u+1)th encoding layer in the Nth encoding module to obtain the first fused feature map may include: concatenating the feature map obtained from the (u-1)th decoding layer in the M decoding layers with the feature map obtained from the (M-u+1)th encoding layer in the Nth encoding module along the channel dimension to obtain the first fused feature map.
[0193] The step of fusing the feature map obtained from the (M-1)th decoding layer in the M decoding layers with the feature map obtained from the first encoding layer of the Nth encoding module to obtain the second fused feature map may include: concatenating the feature map obtained from the (M-1)th decoding layer in the M decoding layers with the feature map obtained from the first encoding layer of the Nth encoding module in the channel dimension to obtain the second fused feature map.
[0194] As mentioned earlier, within the Nth encoding module CM-N, the first encoding level a1 outputs the seventh feature map; the second encoding level a2 outputs the eighth feature map; and the third encoding level a3 outputs the ninth feature map.
[0195] As shown in Figure 8, within the decoding network, the second convolutional block in the first decoding level b1 extracts features from the ninth feature map, and the upsampling block in the first decoding level b1 performs a first upsampling on the feature extraction result. The decoding network performs a first fusion of the feature map output by the first decoding level b1 with the eighth feature map, and inputs the feature map obtained from the first fusion into the second decoding level b2. The second convolutional block in the second decoding level b2 extracts features from the feature map obtained from the first fusion, and the upsampling block in the second decoding level b2 performs a second upsampling on the feature extraction result. The decoding network performs a second fusion of the feature map output by the second decoding level b2 with the seventh feature map, and inputs the feature map obtained from the second fusion into the third decoding level b3. The second convolutional block in the third decoding level b3 extracts features from the feature map obtained from the second fusion and outputs the predicted image.
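The decoding data flow with skip connections can be sketched as follows. As a simplification, `extract` (channel averaging) and `upsample` (2× nearest-neighbour repetition) stand in for the second convolutional block and the upsampling block; the sketch assumes M is at least 2, and all names are hypothetical.

```python
import numpy as np

def extract(x):
    """Stand-in for a second convolutional block: collapse to one channel."""
    return x.mean(axis=1, keepdims=True)

def upsample(x):
    """Stand-in for an upsampling block: 2x nearest-neighbour repetition."""
    return x.repeat(2, axis=2).repeat(2, axis=3)

def decode(enc_feats):
    """enc_feats[i] is the output of encoding level i+1 of the N-th module."""
    m = len(enc_feats)
    x = upsample(extract(enc_feats[m - 1]))      # first decoding level
    for u in range(2, m):                        # decoding levels 2 .. M-1
        # Skip connection from encoding level M-u+1 (index M-u).
        fused = np.concatenate([x, enc_feats[m - u]], axis=1)
        x = upsample(extract(fused))
    fused = np.concatenate([x, enc_feats[0]], axis=1)
    return extract(fused)                        # M-th level: no upsampling

enc_feats = [np.ones((2, 1, 8, 8)), np.ones((2, 1, 4, 4)), np.ones((2, 1, 2, 2))]
predicted = decode(enc_feats)
```

Each concatenation succeeds only because the two feature maps being fused have the same spatial resolution, which is the condition the skip connections rely on.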
[0196] Referring to Figure 9, a schematic diagram of the structure of a first convolutional block is shown. As shown in Figure 9, the first convolutional block may include a first convolutional layer and a second convolutional layer. The first convolutional layer may include an asymmetric convolutional kernel, and the second convolutional layer may include a 1×1 convolutional kernel.
[0197] In the first convolutional block, a concatenation layer (such as the `cat` layer shown in Figure 9) can be used to fuse the feature maps produced by the pair of asymmetric convolutional kernels in the first convolutional layer. The second convolutional layer then compresses the channel count of the fused result to reduce computation. Next, the `InstanceNorm` layer normalizes the output of the second convolutional layer using the `InstanceNorm` method. Finally, the `PRelu` layer processes its input feature map using the `PRelu` activation function, and the result is the output of the first convolutional block.
[0198] The second convolutional block can have the same structure as the first convolutional block, or it can be different.
[0199] Referring to Figure 10, a schematic diagram of the structure of a downsampling block is shown. As shown in Figure 10, the downsampling block can include a max-pooling layer and a min-pooling layer, each containing a set of asymmetric convolutional kernels. That is, the downsampling block includes a spatially separable max-pooling layer and a spatially separable min-pooling layer, and the kernel size k of both layers can be set to 5. The asymmetric convolutional kernels contained in the max-pooling layer and the min-pooling layer can be the same or different.
[0200] In the downsampling block, a concatenation layer (such as the cat layer shown in Figure 10) can be used to fuse the feature map output by the max-pooling layer with the feature map output by the min-pooling layer; the fused result is the output of the downsampling block.
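A minimal sketch of a spatially separable max/min pooling block with k = 5 is shown below. The stride of 2, the padding of k // 2 on each side, and the order of the two one-dimensional passes are assumptions chosen so that the width and height are halved; the function names are hypothetical.

```python
import numpy as np

def pool_1d(x, k, stride, axis, reduce_fn, pad_value):
    """Pool along one axis with window k, padded so output = input / stride."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (k // 2, k // 2)
    padded = np.pad(x, pad, constant_values=pad_value)
    windows = np.lib.stride_tricks.sliding_window_view(padded, k, axis=axis)
    pooled = reduce_fn(windows, axis=-1)
    index = [slice(None)] * x.ndim
    index[axis] = slice(0, None, stride)
    return pooled[tuple(index)]

def downsample_block(x, k=5, stride=2):
    """1 x k then k x 1 max pooling and min pooling, fused along channels."""
    x = x.astype(float)
    mx = pool_1d(x, k, stride, 3, np.max, -np.inf)    # 1 x k pass (width)
    mx = pool_1d(mx, k, stride, 2, np.max, -np.inf)   # k x 1 pass (height)
    mn = pool_1d(x, k, stride, 3, np.min, np.inf)
    mn = pool_1d(mn, k, stride, 2, np.min, np.inf)
    return np.concatenate([mx, mn], axis=1)           # the "cat" fusion

x = np.arange(64.0).reshape(1, 1, 8, 8)
out = downsample_block(x)
```

The output has twice as many channels as the input (max and min responses side by side) and half its width and height.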
[0201] The upsampling block is used to perform an upsampling operation, which may include PixelShuffle, maximum merging, strided transposed convolution, interpolation (e.g., bicubic interpolation), etc. However, this disclosure is not limited thereto.
[0202] As shown in Figure 8, the convolutional neural network has a cross-grid structure, which can enhance the fusion between deep and shallow features, make full use of the limited fingerprint information in the original image, and improve the accuracy of extracting a clear image from the original image.
[0203] In this implementation, the convolutional neural network uses spatially separable convolution to perform most convolution operations. By using spatially separable convolution for feature extraction or sampling, the computational load can be significantly reduced, thereby improving processing speed and facilitating real-time processing of input images. Furthermore, spatially separable convolution can learn the lateral and vertical gradients in blurred images, which helps extract changes in fingerprint information and improves the accuracy of extracting clear images from blurred images.
[0204] In another alternative implementation, the convolution kernels in both the encoding and decoding layers are symmetric convolution kernels.
[0205] In this implementation, the encoding network includes P encoding levels. As shown in Figure 11, the encoding network includes three encoding levels: the first encoding level, the second encoding level, and the third encoding level.
[0206] In Figure 11, the dashed box to the left of the second encoding level shows the specific structure of the second encoding level, which may include: an InstanceNorm layer, a PRelu layer, a third convolutional layer, an InstanceNorm layer, a PRelu layer, and a downsampling layer.
[0207] The InstanceNorm layer uses the InstanceNorm method to normalize the input feature map.
[0208] The PRelu layer processes the input feature map using the PRelu activation function.
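The two operations can be sketched as follows for a B×C×H×W feature map. The sketch omits any learnable scale and shift in the normalization, and it uses a fixed negative slope of 0.25, whereas in a PRelu layer the slope is a learned parameter; both simplifications are assumptions.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) plane to zero mean and unit variance."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def prelu(x, slope=0.25):
    """Identity for positive inputs, slope * x for the rest."""
    return np.where(x > 0, x, slope * x)

features = np.random.default_rng(0).normal(3.0, 2.0, size=(2, 4, 8, 8))
activated = prelu(instance_norm(features))
```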
[0209] The third convolutional layer can include a 5×5 convolutional kernel, used to extract features from the input feature map.
[0210] The downsampling layer can include a convolutional layer with a 4×4 kernel and a stride of 2. Therefore, the width and height of the feature map output by the second encoding layer are reduced by a factor of 2 compared to the input feature map.
[0211] The specific structures of the first, second, and third coding levels can be the same.
[0212] In this implementation, the decoding network can include P decoding levels, meaning the number of decoding levels is the same as the number of encoding levels. For example, the decoding network shown in Figure 11 includes three decoding levels: a first decoding level, a second decoding level, and a third decoding level.
[0213] In Figure 11, the dashed box to the right of the second decoding level shows the specific structure of the second decoding level, which may include: an InstanceNorm layer, a PRelu layer, an upsampling layer, an InstanceNorm layer, a PRelu layer, and a fourth convolutional layer.
[0214] The InstanceNorm layer normalizes the input feature map using the InstanceNorm method. The PRelu layer processes the input feature map using the PRelu activation function.
[0215] The upsampling layer can include a convolutional layer with a 4×4 transposed convolutional kernel. The stride of this convolutional layer can be 2, so the width and height of the feature map output by the second decoding layer are each twice that of the input feature map.
[0216] The fourth convolutional layer may include a 5×5 convolutional kernel, which is used to extract features from the input feature map.
[0217] The specific structures of the first, second, and third decoding layers can be the same.
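The halving and doubling of width and height by the 4×4, stride-2 layers can be checked with the standard output-size formulas for convolution and transposed convolution (no dilation, no output padding). The padding of 1 used below is an assumption; the text does not state the padding.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a strided convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def transposed_conv_out(size, kernel, stride, padding):
    """Output length of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel

h = 64
down = conv_out(h, kernel=4, stride=2, padding=1)
up = transposed_conv_out(down, kernel=4, stride=2, padding=1)
```

With these settings a height of 64 becomes 32 after the downsampling layer and returns to 64 after the upsampling layer.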
[0218] In this implementation, the step of the encoding network in the convolutional neural network downsampling and extracting features from the blurred image to output multiple feature maps may include:
[0219] The first coding level in the P coding levels sequentially performs feature extraction and downsampling on the blurred image;
[0220] In the P coding levels, the q-th coding level sequentially extracts and downsamples the feature maps obtained from the (q-1)-th coding level.
[0221] Where q is greater than or equal to 2 and less than or equal to P, the multiple feature maps output by the encoding network include feature maps obtained from P encoding layers.
[0222] In the specific implementation, the first coding layer sequentially extracts and downsamples features from the blurred image to obtain the tenth feature map; the second coding layer sequentially extracts and downsamples features from the tenth feature map to obtain the eleventh feature map; and the third coding layer sequentially extracts and downsamples features from the eleventh feature map to obtain the twelfth feature map.
[0223] The matrix size corresponding to the blurred image is B×3×H×W, where B is the number of original images in a training batch, H is the height of the original image, and W is the width of the original image. The matrix size corresponding to the tenth feature map is B×64×(H/2)×(W/2), the matrix size corresponding to the eleventh feature map is B×128×(H/4)×(W/4), and the matrix size corresponding to the twelfth feature map is B×256×(H/8)×(W/8).
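The feature-map sizes listed above can be reproduced with a short shape calculation. The input size of 256×256 is a placeholder, and the batch size of 48 is the example value mentioned for training.

```python
def encoder_shapes(b, h, w, channels=(64, 128, 256)):
    """Shapes of the feature maps after each encoding level."""
    shapes = []
    for level, c in enumerate(channels, start=1):
        factor = 2 ** level   # each level halves width and height
        shapes.append((b, c, h // factor, w // factor))
    return shapes

shapes = encoder_shapes(48, 256, 256)
```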
[0224] The decoding network may also include a third convolutional block, which includes InstanceNorm layers, PReLU layers, and convolutional layers with 5×5 kernels. The width and height of the feature matrix output by the third convolutional block are the same as those of its input.
[0225] In this implementation, the step in which the decoding network in the convolutional neural network performs upsampling and feature extraction on the feature maps and outputs a predicted image corresponding to the blurred image may include:
[0226] The third convolutional block performs feature extraction on the feature map obtained from the Pth encoding level among the P encoding levels to obtain a computed feature map;
[0227] The decoding network fuses the computed feature map with the feature map obtained from the Pth encoding level to obtain the third fused feature map;
[0228] The decoding network inputs the third fused feature map into the first decoding layer of P decoding layers. The first decoding layer performs upsampling and feature extraction on the third fused feature map in sequence.
[0229] The decoding network fuses the feature map obtained from the (r-1)th decoding layer among the P decoding layers with the feature map obtained from the (P-r+1)th encoding layer among the P encoding layers to obtain the fourth fused feature map;
[0230] The decoding network inputs the fourth fused feature map into the r-th decoding layer of P decoding layers. The r-th decoding layer performs upsampling and feature extraction on the fourth fused feature map in sequence.
[0231] Where r is greater than or equal to 2 and less than or equal to P, the predicted image is the feature map obtained by processing the Pth decoding level out of P decoding levels.
[0232] The step of fusing the computed feature map with the feature map obtained from the Pth encoding level to obtain the third fused feature map may include: concatenating the computed feature map with the feature map obtained from the Pth encoding level along the channel dimension to obtain the third fused feature map.
[0233] The step of fusing the feature map obtained from the (r-1)th decoding level in the P decoding levels with the feature map obtained from the (P-r+1)th encoding level in the P encoding levels to obtain the fourth fused feature map may include: concatenating the feature map obtained from the (r-1)th decoding level in the P decoding levels with the feature map obtained from the (P-r+1)th encoding level in the P encoding levels along the channel dimension to obtain the fourth fused feature map.
[0234] In a specific implementation, with reference to Figure 8, the third convolutional block extracts features from the twelfth feature map to obtain a computed feature map. The decoding network then fuses the computed feature map with the twelfth feature map to obtain a third fused feature map. This third fused feature map serves as the input to the first decoding layer. The first decoding layer then upsamples and extracts features from the third fused feature map to obtain a thirteenth feature map. The decoding network fuses the thirteenth and eleventh feature maps to obtain a fourteenth feature map, which is then input into the second decoding layer. The second decoding layer then upsamples and extracts features from the fourteenth feature map to obtain a fifteenth feature map. The decoding network then fuses the fifteenth and tenth feature maps to obtain a sixteenth feature map, which is then input into the third decoding layer. The third decoding layer then upsamples and extracts features from the sixteenth feature map to obtain the predicted image.
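The skip-connection pairing described above (the r-th decoding layer receives a channel-wise concatenation with the output of the (P-r+1)-th encoding level) can be traced at the level of feature-map sizes. The sketch below is illustrative only: the function name `decoder_trace` is hypothetical, the output channel counts of the decoding layers (128, 64, 3) are assumptions not given in the text, and the third convolutional block is assumed to keep the channel count of the twelfth feature map.

```python
def decoder_trace(enc_shapes, dec_channels):
    """Trace (C, H, W) sizes through P decoding layers with skip connections.

    enc_shapes:   (C, H, W) of encoding levels 1..P.
    dec_channels: ASSUMED output channels of decoding layers 1..P.
    Returns tuples (r, encoder_level, fused_shape, output_shape).
    """
    p = len(enc_shapes)
    # Computed feature map from the third convolutional block: H and W are
    # unchanged (per the text); an unchanged channel count is an assumption.
    current = enc_shapes[-1]
    trace = []
    for r in range(1, p + 1):
        skip = enc_shapes[p - r]  # encoding level P - r + 1 (1-based)
        if current[1:] != skip[1:]:
            raise ValueError("fused feature maps must have the same height and width")
        # Concatenation along the channel dimension adds the channel counts.
        fused = (current[0] + skip[0], skip[1], skip[2])
        # Each decoding layer upsamples by 2 and then extracts features.
        current = (dec_channels[r - 1], fused[1] * 2, fused[2] * 2)
        trace.append((r, p - r + 1, fused, current))
    return trace


# Tenth, eleventh and twelfth feature maps for an assumed 256x256 input.
enc = [(64, 128, 128), (128, 64, 64), (256, 32, 32)]
for r, level, fused, out in decoder_trace(enc, dec_channels=(128, 64, 3)):
    print(f"decoding layer {r}: skip from encoding level {level}, "
          f"fused {fused} -> output {out}")
```

Under these assumptions the last decoding layer returns a map with the height and width of the input image, consistent with the predicted image being the output of the Pth decoding layer.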
[0235] Figure 12 schematically illustrates a flowchart of an image processing method. As shown in Figure 12, the method may include the following steps.
[0236] Step S1201: Obtain a blurred fingerprint image.
[0237] When the blurred image used in the model training process is the result of preprocessing the original image, the steps for obtaining the blurred fingerprint image include: obtaining the original fingerprint image; preprocessing the original fingerprint image to obtain the blurred fingerprint image; wherein the preprocessing includes at least one of the following: image segmentation, size cropping, flipping, brightness enhancement, noise processing, and normalization processing.
[0238] The process of obtaining the original fingerprint image is the same as the process of obtaining the original image, and the process of preprocessing the original fingerprint image is the same as the process of preprocessing the original image, so it will not be described again here.
[0239] The execution subject of this embodiment can be a computer device that has an image processing apparatus, and the image processing method provided in this embodiment is executed by that image processing apparatus. The computer device can be, for example, a smartphone, a tablet computer, or a personal computer; this embodiment places no limitation on this.
[0240] The execution subject in this embodiment can acquire a blurred fingerprint image in various ways. For example, the execution subject can acquire the original fingerprint image collected by a multi-point light source under-display fingerprint acquisition device, and then preprocess the acquired original fingerprint image to obtain a blurred fingerprint image.
[0241] Step S1202: Input the blurred fingerprint image into the image processing model trained by the model training method provided in any embodiment to obtain a clear fingerprint image corresponding to the blurred fingerprint image.
[0242] The image processing model can be pre-trained or trained during the image processing process; this embodiment does not limit this.
[0243] The image processing method provided in this embodiment obtains a high-quality, clear fingerprint image by inputting a blurred fingerprint image into the image processing model, which extracts and enhances the fingerprint ridges and valleys. The resulting clear fingerprint image can be directly applied to fingerprint recognition. Compared with related techniques that turn on point light sources in sequence to acquire clear fingerprint images, this embodiment can improve the efficiency of acquiring clear fingerprint images.
[0244] Figure 13 shows a schematic diagram of a model training device. Referring to Figure 13, the device may include:
[0245] The acquisition module 1301 is configured to acquire a sample set, wherein the samples in the sample set include blurred and clear images of the same fingerprint;
[0246] The prediction module 1302 is configured to input the blurred image into a convolutional neural network. The encoding network in the convolutional neural network downsamples and extracts features from the blurred image, outputting multiple feature maps. The decoding network in the convolutional neural network upsamples and extracts features from the feature maps, outputting a predicted image corresponding to the blurred image. The encoding network includes multiple encoding layers, and the decoding network includes multiple decoding layers. The feature map obtained from the Fth encoding layer and the feature map obtained from the Gth decoding layer are fused and used as the input to the (G+1)th decoding layer. The feature map obtained from the Fth encoding layer and the feature map obtained from the Gth decoding layer have the same resolution. F and G are both positive integers.
[0247] The training module 1303 is configured to calculate the loss value of the convolutional neural network based on the predicted image, the clear image, and a preset loss function, and to adjust the parameters of the convolutional neural network with the goal of minimizing the loss value.
[0248] The determination module 1304 is configured to determine the completed parameter-tuned convolutional neural network as an image processing model.
[0249] Regarding the apparatus in the above embodiments, the specific ways in which each module performs operations have been described in detail in the embodiments of the model training method. The modules may be implemented, for example, using software, hardware, or firmware, and this will not be elaborated here.
[0250] Figure 14 schematically shows a block diagram of an image processing apparatus. Referring to Figure 14, the apparatus may include:
[0251] The acquisition module 1401 is configured to acquire a blurred fingerprint image;
[0252] Extraction module 1402 is configured to input the blurred fingerprint image into an image processing model trained by the model training method provided in any embodiment, to obtain a clear fingerprint image corresponding to the blurred fingerprint image.
[0253] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the image processing method. The modules may be implemented, for example, using software, hardware, or firmware, and this will not be elaborated here.
[0254] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0255] The various component embodiments of this disclosure can be implemented in hardware, or as software modules running on one or more processors, or a combination thereof. Those skilled in the art will understand that microprocessors or digital signal processors (DSPs) can be used in practice to implement some or all of the functions of some or all of the components in the computing processing device according to embodiments of this disclosure. This disclosure can also be implemented as a device or apparatus program (e.g., a computer program and computer program product) for performing some or all of the methods described herein. Such an implementation of this disclosure can be stored on a computer-readable medium or can be in the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
[0256] For example, Figure 15 shows a computing processing apparatus that can implement the methods according to this disclosure. The computing processing apparatus conventionally includes a processor 1010 and a computer program product or non-transitory computer-readable medium in the form of memory 1020. Memory 1020 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, hard disk, or ROM. Memory 1020 has storage space 1030 for program code 1031 for performing any of the method steps described above. For example, storage space 1030 for program code may include various program codes 1031 respectively for implementing the various steps in the methods described above. These program codes can be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, CDs, memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to Figure 16. The storage unit may have storage segments, storage space, and the like arranged similarly to memory 1020 in the computing processing apparatus of Figure 15. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit includes computer-readable code 1031', that is, code that can be read by a processor such as processor 1010, which, when run by the computing processing device, causes the computing processing device to perform the various steps in the methods described above.
[0257] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.
[0258] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0259] The foregoing has provided a detailed description of the model training method, image processing method, computing processing device, and non-transient computer-readable medium provided by this disclosure. Specific examples have been used to illustrate the principles and implementation methods of this disclosure. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of this disclosure. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this disclosure. Therefore, the content of this specification should not be construed as a limitation of this disclosure.
[0260] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0261] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the following claims.
[0262] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.
[0263] The terms "an embodiment," "embodiment," or "one or more embodiments" as used herein mean that a particular feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of this disclosure. Furthermore, note that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
[0264] Numerous specific details are set forth in the specification provided herein. However, it will be understood that embodiments of this disclosure may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
[0265] In the claims, any reference signs placed between parentheses should not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in the claims. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. This disclosure can be implemented by means of hardware comprising a plurality of different elements and by means of a suitably programmed computer. In a unit claim enumerating a plurality of means, several of these means may be embodied by the same item of hardware. The use of the words first, second, and third, etc., does not indicate any order. These words may be interpreted as names.
[0266] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this disclosure, and are not intended to limit them. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this disclosure.
Claims
1. A model training method, comprising: obtaining a sample set, wherein the samples in the sample set include a blurred image and a clear image of the same fingerprint; inputting the blurred image into a convolutional neural network, wherein the encoding network in the convolutional neural network downsamples and extracts features from the blurred image and outputs multiple feature maps, and the decoding network in the convolutional neural network upsamples and extracts features from the feature maps and outputs a predicted image corresponding to the blurred image; wherein the encoding network includes multiple encoding layers and the decoding network includes multiple decoding layers, each encoding layer includes a first convolutional block and/or a downsampling block, and each decoding layer includes a second convolutional block and/or an upsampling block; at least one of the first convolutional block, the downsampling block, the second convolutional block, and the upsampling block includes at least one set of asymmetric convolutional kernels; the feature map obtained from the F-th encoding layer in the encoding network is fused with the feature map obtained from the G-th decoding layer in the decoding network, and the fused result is used as the input to the (G+1)-th decoding layer in the decoding network; the feature map obtained from the F-th encoding layer and the feature map obtained from the G-th decoding layer have the same resolution, where F and G are both positive integers; calculating the loss value of the convolutional neural network based on the predicted image, the clear image, and a preset loss function, and adjusting the parameters of the convolutional neural network with the goal of minimizing the loss value; and determining the convolutional neural network with adjusted parameters as the image processing model; wherein the encoding network includes N encoding modules, and each encoding module includes M encoding levels, where M and N are both positive integers.
The steps of the encoding network in the convolutional neural network downsampling and extracting features from the blurred image to output multiple feature maps include: The first encoding level of the first encoding module in the N encoding modules performs feature extraction on the blurred image; The i-th encoding level of the first encoding module performs downsampling and feature extraction on the feature map obtained by the (i-1)-th encoding level of the first encoding module in sequence; wherein i is greater than or equal to 2 and less than or equal to M; The first encoding level of the j-th encoding module in the N encoding modules performs feature extraction on the feature map obtained by processing the first encoding level of the (j-1)-th encoding module; wherein, j is greater than or equal to 2 and less than or equal to N; The i-th encoding level of the j-th encoding module downsamples the feature map obtained by the (i-1)-th encoding level of the j-th encoding module, fuses the downsampled feature map with the feature map obtained by the i-th encoding level of the (j-1)-th encoding module, and extracts features from the fused result. The plurality of feature maps include feature maps obtained by processing each coding level of the Nth coding module among the N coding modules; The decoding network includes the M decoding layers. The steps of the decoding network in the convolutional neural network upsampling and extracting features from the feature map, and outputting a predicted image corresponding to the blurred image, include: The first decoding level in the M decoding levels extracts features from the feature map obtained by the M-th encoding level of the N-th encoding module, and upsamples the extracted feature map. 
The feature map obtained from the (u-1)th decoding level of the M decoding levels is fused with the feature map obtained from the (M-u+1)th encoding level of the Nth encoding module to obtain a first fused feature map; wherein, u is greater than or equal to 2 and less than or equal to M-1; The first fused feature map is input into the u-th decoding level among the M decoding levels, and the u-th decoding level sequentially performs feature extraction and upsampling on the first fused feature map; The feature map obtained from the (M-1)th decoding level of the M decoding levels is fused with the feature map obtained from the first encoding level of the Nth encoding module to obtain a second fused feature map. The second fused feature map is input into the Mth decoding layer of the M decoding layers, and the Mth decoding layer performs feature extraction on the second fused feature map to obtain the predicted image.
2. The model training method according to claim 1, wherein, The step of fusing the downsampled feature map with the feature map processed by the i-th coding level of the (j-1)-th coding module, and extracting features from the fused result, includes: The feature map obtained by downsampling is concatenated with the feature map obtained by the i-th encoding level of the j-1-th encoding module in the channel dimension, and the concatenation result is used for feature extraction. The step of fusing the feature map obtained from the (u-1)th decoding level of the M decoding levels with the feature map obtained from the (M-u+1)th encoding level of the Nth encoding module to obtain the first fused feature map includes: The feature map obtained from the (u-1)th decoding layer in the M decoding layers is concatenated with the feature map obtained from the (M-u+1)th encoding layer in the Nth encoding module along the channel dimension to obtain the first fused feature map; The step of fusing the feature map obtained from the (M-1)th decoding level of the M decoding levels with the feature map obtained from the first encoding level of the Nth encoding module to obtain a second fused feature map includes: The feature map obtained from the (M-1)th decoding level of the M decoding levels is concatenated with the feature map obtained from the first encoding level of the Nth encoding module along the channel dimension to obtain the second fused feature map.
3. The model training method according to claim 1, wherein, Both the first convolutional block and the second convolutional block include a first convolutional layer and a second convolutional layer. The first convolutional layer includes the asymmetric convolutional kernel, and the second convolutional layer includes a 1×1 convolutional kernel. The downsampling block includes a max-pooling layer and a min-pooling layer, and both the max-pooling layer and the min-pooling layer include the asymmetric convolution kernel; The asymmetric convolution kernel includes a 1×k convolution kernel and a k×1 convolution kernel, where k is greater than or equal to 2.
4. The model training method according to claim 1, wherein, The convolution kernels in both the encoding and decoding layers are symmetric convolution kernels.
5. The model training method according to claim 4, wherein, The encoding network includes P encoding layers. The steps of the encoding network in the convolutional neural network downsampling and extracting features from the blurred image to output multiple feature maps include: The first coding level in the P coding levels sequentially performs feature extraction and downsampling on the blurred image; The q-th coding level in the P coding levels sequentially performs feature extraction and downsampling on the feature map obtained by the (q-1)-th coding level; Wherein, q is greater than or equal to 2 and less than or equal to P, and the plurality of feature maps include feature maps obtained by processing the P coding levels.
6. The model training method according to claim 5, wherein, The decoding network includes the P decoding layers. The steps of the decoding network in the convolutional neural network upsampling and extracting features from the feature map, and outputting a predicted image corresponding to the blurred image, include: Feature extraction is performed on the feature map obtained from the Pth coding level among the P coding levels to obtain a computed feature map; The calculated feature map is fused with the feature map obtained from the processing of the Pth encoding level to obtain the third fused feature map; The third fused feature map is input into the first decoding layer among the P decoding layers, and the first decoding layer sequentially upsamples and extracts features from the third fused feature map; The feature map obtained from the (r-1)th decoding level among the P decoding levels is fused with the feature map obtained from the (P-r+1)th encoding level among the P encoding levels to obtain the fourth fused feature map; The fourth fused feature map is input to the r-th decoding level among the P decoding levels, and the r-th decoding level performs upsampling and feature extraction on the fourth fused feature map in sequence; Wherein, r is greater than or equal to 2 and less than or equal to P, and the predicted image is the feature map obtained by processing the Pth decoding level among the P decoding levels.
7. The model training method according to claim 6, wherein, The step of fusing the calculated feature map with the feature map obtained from the Pth encoding level to obtain the third fused feature map includes: The calculated feature map is concatenated with the feature map obtained from the Pth encoding level along the channel dimension to obtain the third fused feature map; The step of fusing the feature map obtained from the (r-1)th decoding level among the P decoding levels with the feature map obtained from the (P-r+1)th encoding level among the P encoding levels to obtain the fourth fused feature map includes: The feature map obtained from the (r-1)th decoding level of the P decoding levels is concatenated with the feature map obtained from the (P-r+1)th encoding level of the P encoding levels along the channel dimension to obtain the fourth fused feature map.
8. The model training method according to any one of claims 1 to 7, wherein, The step of calculating the loss value of the convolutional neural network based on the predicted image, the clear image, and a preset loss function includes: The loss value is calculated using the following formula: [formula not reproduced in this text], wherein the symbols for the loss value and for the clear image are not reproduced in this text, Y is the predicted image, W is the width of the predicted image, H is the height of the predicted image, C is the number of channels of the predicted image, E(Y) is the edge map of the predicted image, E(·) applied to the clear image is the edge map of the clear image, λ is greater than or equal to 0 and less than or equal to 1, x is a positive integer greater than or equal to 1 and less than or equal to W, y is a positive integer greater than or equal to 1 and less than or equal to H, and z is a positive integer greater than or equal to 1 and less than or equal to C.
9. The model training method according to any one of claims 1 to 7, wherein, The step of obtaining the sample set includes: Obtain the original image of the same fingerprint; The original image is preprocessed to obtain the blurred image; wherein the preprocessing includes at least one of the following: image segmentation, size cropping, flipping, brightness enhancement, noise reduction, and normalization.
10. The model training method according to claim 9, wherein, The step of preprocessing the original image to obtain the blurred image includes: The original image is segmented to obtain a first image, a second image, and a third image, wherein the first image, the second image, and the third image respectively contain information about different regions of the original image; The first image, the second image, and the third image are each normalized. The blurred image includes the normalized first image, the second image, and the third image.
11. The model training method according to claim 10, wherein, The original image includes a first pixel having a first pixel value. The step of performing image segmentation on the original image to obtain the first image, the second image, and the third image includes: If the first pixel is located outside the preset area, and the first pixel value is greater than or equal to the first threshold and less than or equal to the second threshold, then the pixel value of the first pixel in the first image is determined to be the first pixel value. If the first pixel is located outside the preset area, and the first pixel value is less than the first threshold or greater than the second threshold, then the pixel value of the first pixel in the first image is determined to be 0. If the first pixel is located outside the preset area, and the first pixel value is greater than or equal to the third threshold and less than or equal to the fourth threshold, then the pixel value of the first pixel in the second image is determined to be the first pixel value. If the first pixel is located outside the preset area, and the first pixel value is less than the third threshold or greater than the fourth threshold, then the pixel value of the first pixel in the second image is determined to be 0. If the first pixel is located within the preset area, then the pixel value of the first pixel in the third image is determined to be the first pixel value; The third threshold is greater than the second threshold.
12. The model training method according to claim 10, wherein, The step of segmenting the original image to obtain a first image, a second image, and a third image includes: Edge detection is performed on the original image, and the original image is segmented into the first image, the second image, and the third image based on the position and length of the detected edges.
13. The model training method according to claim 10, wherein, The step of normalizing the first image, the second image, and the third image respectively includes: Determine the maximum and minimum values among all pixel values contained in the image to be processed, wherein the image to be processed is any one of the first image, the second image, and the third image, and the image to be processed includes the second pixel value of the second pixel; Based on the maximum value, minimum value, and the second pixel value, the pixel value of the second pixel in the normalized image to be processed is determined.
14. An image processing method, comprising: obtaining a blurred fingerprint image; and inputting the blurred fingerprint image into the image processing model trained by the model training method according to any one of claims 1 to 13 to obtain a clear fingerprint image corresponding to the blurred fingerprint image.
15. The image processing method according to claim 14, wherein, When the blurred image is the result of preprocessing the original image, the step of obtaining the blurred fingerprint image includes: Obtain the raw fingerprint image; The original fingerprint image is preprocessed to obtain the blurred fingerprint image; wherein the preprocessing includes at least one of the following: image segmentation, size cropping, flipping, brightness enhancement, noise reduction, and normalization.
16. A computing processing device, comprising: a memory storing computer-readable code; and one or more processors, wherein when the computer-readable code is executed by the one or more processors, the computing processing device performs the method according to any one of claims 1 to 15.
17. A non-transient computer-readable medium storing computer-readable code that, when executed on a computing processing device, causes the computing processing device to perform the method according to any one of claims 1 to 15.