Image processing method and device based on wavelet pooling network

By designing flexible updaters and predictors in wavelet pooling networks and combining convolutional neural networks and Transformers, the problems of information loss and insufficient global feature dependency in wavelet pooling methods are solved, achieving more refined image processing results.

CN117576419B · Active · Publication Date: 2026-06-19 · XIDIAN UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: XIDIAN UNIV
Filing Date: 2023-11-13
Publication Date: 2026-06-19

Application Information

IPC: G06V10/52; G06V10/42; G06V10/44; G06V10/80; G06V10/82; G06N3/0455; G06N3/0464

AI Tagging

Application Domain

Character and pattern recognition; Biological models

Abstract

This invention discloses an image processing method and apparatus based on wavelet pooling networks, relating to the field of computer vision technology. The method includes: acquiring an image to be processed; inputting the image to be processed into a preset convolutional neural network, processing it through a wavelet pooling module of the preset convolutional neural network, considering the influence of local and global features of the image to be processed, and outputting the processed image; wherein the wavelet pooling module includes a predictor and an updater, which are identical, making wavelet pooling linearly independent at the feature layer. The wavelet enhancement scheme provided by this invention fully considers global and local influences when generating detail and summary information, and can generate more refined wavelet high- and low-frequency information.

Description

Technical Field

[0001] This invention belongs to the field of computer vision technology, specifically relating to an image processing method and apparatus based on wavelet pooling networks.

Background Technology

[0002] Pooling plays a crucial role in convolutional neural networks (CNNs). It allows a network to capture features at different scales and simplifies parameter computation, which makes deeper networks feasible. Existing pooling schemes include max pooling and average pooling, both of which often lose important image information. Among the many efforts to address this information loss, wavelet pooling stands out as an important method. By pooling in the wavelet domain, it alleviates the loss of high-frequency details while preserving important geometric information. Lifting wavelets have been used to construct pooling schemes, increasing their adaptability. However, recent lifting wavelet constructions only aim to enhance learnability within convolutional neural networks, meaning they only consider the influence between local regions during the lifting process and lack consideration of global feature dependencies.

[0003] Among related technologies, the patent "Image Classification Method and System Based on Red-Black Morphological Wavelet Pooling Network" (CN111461259A) proposed by He Chu et al. of Wuhan University mainly integrates red-black morphological wavelets with max pooling to construct a lifting scheme pooling layer for extracting features of the image to be classified and performing downsampling. The red-black morphological wavelet branch includes vertical and horizontal lifting processes as well as diagonal lifting processes, which can effectively improve the recognition ability of the neural network and significantly improve accuracy in the classification of benchmark images. However, its main core relies on the structural features and multi-scale decomposition of red-black wavelets based on the lifting scheme to extract image texture information, thereby improving robustness while maintaining accuracy. The specific lifting scheme algorithm is fixed and lacks flexibility. Furthermore, the fixed algorithm only considers the influence of local features.

[0004] In their dissertation, Yu Hongtao et al. from Southwestern University of Finance and Economics proposed "Research on Pooling Technology of Convolutional Neural Networks Based on Wavelet Transform," which mainly uses wavelet transform to implement the pooling process of convolutional neural networks. Specifically, they used a first-order two-dimensional discrete wavelet transform to downsample the feature image, and used the approximate information obtained by low-pass filtering of the original image as the output result after pooling. They compared the experimental results with a method that uses a second-order wavelet transform followed by an inverse wavelet transform to achieve wavelet pooling. Although their experiments verified that wavelet transform has superior pooling performance compared to other pooling methods in pooling layers with larger pooling kernels, the discrete wavelet transform is computationally complex, depends to some extent on the pooling kernel size, and its algorithm is relatively fixed, lacking learnability and global dependency.
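For illustration, the idea in [0004] of using the low-pass approximation of a one-level 2D discrete wavelet transform as the pooled output can be sketched as follows. The source does not name the wavelet; the orthonormal Haar wavelet is assumed here, and the function names `haar_ll_pool` and `average_pool` are hypothetical. Under that assumption the approximation band is simply twice the 2x2 average, which is why the discarded detail bands, not the approximation band, carry the extra information.

```python
def _check_even(x):
    """Raise if the 2D list does not have even height and width."""
    if len(x) % 2 or len(x[0]) % 2:
        raise ValueError("height and width must be even")


def haar_ll_pool(x):
    """Approximation (LL) band of a one-level orthonormal 2D Haar transform.

    Assumption: Haar wavelet; the cited work does not specify the filter.
    """
    _check_even(x)
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 2.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def average_pool(x):
    """Plain 2x2 average pooling with stride 2, for comparison."""
    _check_even(x)
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


feature_map = [[1, 2], [3, 4]]
ll = haar_ll_pool(feature_map)
avg = average_pool(feature_map)
```

Both functions halve the height and width; only the scaling differs.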

[0005] Therefore, there is an urgent need to propose an image processing method using wavelet pooling networks that fully considers both global and local effects.

Summary of the Invention

[0006] To address the aforementioned problems in the existing technology, this invention provides an image processing method and apparatus based on wavelet pooling networks. The technical problem to be solved by this invention is achieved through the following technical solution:

[0007] In a first aspect, the present invention provides an image processing method based on a wavelet pooling network, comprising:

[0008] Obtain the image to be processed;

[0009] The image to be processed is input into a preset convolutional neural network. After being processed by the wavelet pooling module of the preset convolutional neural network, the local and global feature effects of the image to be processed are taken into account, and the processed image is output. The wavelet pooling module includes a predictor and an updater. The predictor and updater are the same, so that the wavelet pooling has linear independence in the feature layer.

[0010] In a second aspect, the present invention also provides an image processing apparatus based on a wavelet pooling network, comprising:

[0011] The image acquisition module is used to acquire the image to be processed.

[0012] The image processing module is used to input the image to be processed into a preset convolutional neural network. After processing by the wavelet pooling module of the preset convolutional neural network, the local and global feature effects of the image to be processed are considered, and the processed image is output. The wavelet pooling module includes a predictor and an updater. The predictor and updater are the same, so that the wavelet pooling has linear independence in the feature layer.

[0013] The beneficial effects of this invention are:

[0014] This invention provides an image processing method and apparatus based on wavelet pooling networks, which mainly solves the problem of information loss in existing pooling methods. The implementation scheme is based on the structure of the lifting wavelet scheme itself, flexibly designing the attributes of its updater and predictor, and combining convolutional neural networks and Transformers for design, so that the lifting wavelet scheme fully considers global and local effects when generating detailed and summary information, thereby generating more refined wavelet high and low frequency information.

[0015] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0016] Figure 1 is a flowchart of an image processing method based on a wavelet pooling network provided in an embodiment of the present invention;

[0017] Figure 2 is a schematic diagram of lifting wavelet pooling provided in an embodiment of the present invention;

[0018] Figure 3 is a schematic diagram of a parallel updater or predictor provided in an embodiment of the present invention;

[0019] Figure 4 is a schematic diagram of a serial updater or predictor provided in an embodiment of the present invention;

[0020] Figure 5 is a schematic diagram of a preset convolutional neural network for image classification provided in an embodiment of the present invention;

[0021] Figure 6 is a schematic diagram of a preset convolutional neural network for image segmentation provided in an embodiment of the present invention;

[0022] Figure 7 is a schematic diagram of the simulation experiment output provided in an embodiment of the present invention.

Detailed Implementation

[0023] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.

[0024] Please refer to Figure 1, which is a flowchart of an image processing method based on a wavelet pooling network provided by an embodiment of the present invention. The image processing method based on a wavelet pooling network provided by the present invention includes:

[0025] Obtain the image to be processed;

[0026] The image to be processed is input into a preset convolutional neural network. After being processed by the wavelet pooling module of the preset convolutional neural network, the local and global feature effects of the image to be processed are taken into account, and the processed image is output. The wavelet pooling module includes a predictor and an updater. The predictor and updater are the same, so that the wavelet pooling has linear independence in the feature layer.

[0027] Specifically, this embodiment provides an image processing method based on wavelet pooling networks, which mainly solves the information loss problem in existing pooling methods. The implementation scheme is based on the structure of the lifting wavelet scheme itself, flexibly designing the attributes of its updater and predictor, and combining convolutional neural networks and Transformers for design, so that the lifting wavelet scheme fully considers global and local effects when generating detailed and summary information, thereby generating more refined wavelet high and low frequency information.

[0028] In an optional embodiment of the present invention, please refer to Figure 2, which is a schematic diagram of the lifting wavelet pooling provided in an embodiment of the present invention. The wavelet pooling module includes a first processing module, a second processing module, a third processing module, and a fusion module.

[0029] The first processing module includes a horizontal partitioning module, a first predictor, and a first updater. The horizontal partitioning module divides the input image horizontally to obtain non-overlapping partitioning components x1 and x2, where [x1, x2] = HSplit(x). The first predictor processes x1 and x2 to obtain a high-frequency component H, where H = x2 - LG-Predictor(x1). The first updater processes x1 and x2 to obtain a low-frequency component L, where L = x1 + LG-Updater(x2).

[0030] The second processing module includes a first vertical partitioning module, a second predictor, and a second updater. The first vertical partitioning module divides the input high-frequency component vertically to obtain components HL0 and HH0, where [HL0, HH0] = VSplit(H). The second predictor processes HL0 and HH0 to obtain a first high-frequency feature HH, where HH = HH0 - LG-Predictor(HL0). The second updater processes HL0 and HH0 to obtain a second high-frequency feature HL, where HL = HL0 + LG-Updater(HH0).

[0031] The third processing module includes a second vertical partitioning module, a third predictor, and a third updater. The second vertical partitioning module divides the input low-frequency component vertically to obtain components LL0 and LH0, where [LL0, LH0] = VSplit(L). The third predictor processes LH0 and LL0 to obtain a third high-frequency feature LH, where LH = LH0 - LG-Predictor(LL0). The third updater processes LH0 and LL0 to obtain a first low-frequency feature LL, where LL = LL0 + LG-Updater(LH0).

[0032] The fusion module fuses the first high-frequency feature, the second high-frequency feature, the third high-frequency feature, and the first low-frequency feature to obtain the fused feature.
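The three processing modules and the fusion step in [0029] to [0032] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: HSplit and VSplit are assumed to take alternating columns and rows, a single identity function stands in for every LG-Predictor and LG-Updater (the source uses learned modules, one pair per processing module), and fusion is assumed to stack the four sub-bands as channels. All function names are hypothetical.

```python
def hsplit(x):
    """Assumed HSplit: even-indexed columns -> x1, odd-indexed columns -> x2."""
    return [row[0::2] for row in x], [row[1::2] for row in x]


def vsplit(x):
    """Assumed VSplit: even-indexed rows first, odd-indexed rows second."""
    return x[0::2], x[1::2]


def add(a, b):
    """Element-wise sum of two equally sized 2D lists."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two equally sized 2D lists."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(a):
    """Placeholder for the learned LG-Predictor / LG-Updater (assumption)."""
    return a


def lifting_pool(x, predictor=identity, updater=identity):
    """Follow the equations of [0029]-[0031] and return the four sub-bands."""
    x1, x2 = hsplit(x)
    high = sub(x2, predictor(x1))      # H  = x2  - LG-Predictor(x1)
    low = add(x1, updater(x2))         # L  = x1  + LG-Updater(x2)

    hl0, hh0 = vsplit(high)
    hh = sub(hh0, predictor(hl0))      # HH = HH0 - LG-Predictor(HL0)
    hl = add(hl0, updater(hh0))        # HL = HL0 + LG-Updater(HH0)

    ll0, lh0 = vsplit(low)
    lh = sub(lh0, predictor(ll0))      # LH = LH0 - LG-Predictor(LL0)
    ll = add(ll0, updater(lh0))        # LL = LL0 + LG-Updater(LH0)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


def fuse(bands):
    """Assumed fusion: stack the sub-bands as channels in a fixed order."""
    return [bands[name] for name in ("LL", "LH", "HL", "HH")]


bands = lifting_pool([[1, 2], [3, 4]])
fused = fuse(bands)
```

With the identity placeholders the scheme reduces to an unnormalized Haar-like decomposition, so each sub-band has half the height and half the width of the input.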

[0033] Referring again to Figure 2, the novel lifting wavelet pooling proposed in this embodiment includes three processing modules, each equipped with an updater and a predictor. Through the processing of these three modules, global and local effects are fully considered when generating detail and summary information, thereby generating more refined wavelet high- and low-frequency information.

[0034] In an optional embodiment of the present invention, please refer to Figure 3, which is a schematic diagram of a parallel updater or predictor provided in an embodiment of the present invention. Both the predictor and the updater are parallel structures. The predictor or updater includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a first activation function, and a second activation function.

[0035] The first convolutional layer processes the input image and inputs the result into the first activation function. After processing by the first activation function, the image is input into the second convolutional layer, processed by the second convolutional layer, and then input into the second activation function. The third, fourth, and fifth convolutional layers process the input image. The softmax function is used to process the image processed by the third and fourth convolutional layers to obtain the first processing result. The softmax function is used to process the image processed by the fifth convolutional layer and the first processing result to obtain the second processing result. The softmax function is used to process the image processed by the first activation function and the second processing result to obtain the third processing result. The third processing result is added to the input image to obtain the final processing result.

[0036] In an optional embodiment of the present invention, the final processing result F_LGp output by the predictor or updater in the parallel structure is expressed as:

[0037]

[0038] Where x represents the input image, softmax(·) represents the softmax function, q represents the query value of the image after the third convolutional layer, k represents the key value of the image after the fourth convolutional layer, v represents the value of the image after the fifth convolutional layer, k^T represents the transpose of k, the expression also contains a scale coefficient, and g(x) represents the image after processing by the second activation function.

[0039] Referring again to Figure 3, in this embodiment the updater or predictor has a parallel structure, and the wavelet feature decomposition is achieved by combining a convolutional neural network with a self-attention mechanism, so that the influence of local and global features is considered simultaneously during the wavelet pooling process.

[0040] In an optional embodiment of the present invention, please refer to Figure 4, which is a schematic diagram of a serial updater or predictor provided in an embodiment of the present invention. Both the predictor and the updater are serial structures. The predictor or updater includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a first activation function, and a second activation function. The first convolutional layer processes the input image and inputs the processing result into the first activation function. After processing by the first activation function, the image is input into the second convolutional layer. After processing by the second convolutional layer, the image is input into the second activation function and processed by the second activation function. The third, fourth, and fifth convolutional layers each process the image output by the second activation function. The softmax function is used to process the image processed by the third convolutional layer and the image processed by the fourth convolutional layer to obtain a first processing result. The softmax function is used to process the image processed by the fifth convolutional layer and the first processing result to obtain a second processing result. The second processing result is added to the input image to obtain the final processing result.

[0041] In an optional embodiment of the present invention, the final processing result F_LGp output by the predictor or updater in the serial structure is expressed as:

[0042]

[0043] g(x) = δ(w2(δ(w1x)));

[0044] q'=w3g(x); k'=w4g(x); v'=w5g(x);

[0045] Where x represents the input image, softmax(·) represents the softmax function, q' represents the query value of the image after the third convolutional layer, k' represents the key value of the image after the fourth convolutional layer, v' represents the value of the image after the fifth convolutional layer, k'^T represents the transpose of k', the expression also contains a scaling factor, g(x) represents the image after processing by the second activation function, δ(·) represents the ReLU activation function operation, and w_i represents a convolution operation, where i is the convolution index: w1 represents the convolution operation of the first convolutional layer, w2 that of the second convolutional layer, w3 that of the third convolutional layer, w4 that of the fourth convolutional layer, and w5 that of the fifth convolutional layer.

[0046] Referring again to Figure 4, in this embodiment the updater and predictor have a serial structure, and the wavelet feature decomposition is achieved by combining a convolutional neural network with a self-attention mechanism, so that the influence of local and global features is considered simultaneously during the wavelet pooling process.
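The serial structure described in [0040] and the relations g(x) = δ(w2(δ(w1x))), q' = w3g(x), k' = w4g(x), v' = w5g(x) can be illustrated with the sketch below. Several points are assumptions rather than statements from the source: the image is flattened to N tokens with C channels, each convolution w_i is treated as a 1x1 convolution (a C x C matrix), the attention takes the common form softmax(q'k'^T / scale)v', and the scale is taken as the square root of C. The name `serial_lg_block` is hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def relu(m):
    """Element-wise ReLU, the activation delta(.) named in the source."""
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def serial_lg_block(x, w1, w2, w3, w4, w5):
    """Local convolutions first, then global self-attention, then residual.

    x is N tokens by C channels; each w_i is a C x C matrix (assumed 1x1 conv).
    """
    g = relu(matmul(relu(matmul(x, w1)), w2))            # g(x), local branch
    q, k, v = matmul(g, w3), matmul(g, w4), matmul(g, w5)
    scale = math.sqrt(len(x[0]))                         # assumed scaling factor
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    attended = matmul(softmax_rows(scores), v)           # global branch
    return [[a + b for a, b in zip(rx, ra)]              # residual addition
            for rx, ra in zip(x, attended)]


eye = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
unchanged = serial_lg_block(eye, eye, eye, eye, eye, zeros)
mixed = serial_lg_block(eye, eye, eye, eye, eye, eye)
```

With a zero value projection the block reduces to the identity through the residual path, which is a convenient sanity check for any implementation of this structure.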

[0047] In this embodiment, the updater and predictor are set to the same structure, so that the proposed wavelet pooling is linearly independent at the feature level, which improves the convergence speed of the network.

[0048] In an optional embodiment of the present invention, please refer to Figure 5, which is a schematic diagram of a preset convolutional neural network for image classification provided in an embodiment of the present invention. The preset convolutional neural network for image classification includes a first convolutional module, a second convolutional module, a first wavelet pooling module, a third convolutional module, a second wavelet pooling module, a fourth convolutional module, and a Softmax function.

[0049] The image to be processed is processed by the first convolution module to increase the number of channels, then by the second convolution module to increase the number of channels again, then by the first pooling module to decrease the length and width, then by the third convolution module to increase the number of channels, then by the second pooling module to decrease the length and width, then by the fourth convolution module to increase the number of channels, and finally by the Softmax function to output the image category.

[0050] In this embodiment, the wavelet pooling proposed above is used instead of the residual network pooling scheme to obtain a high-precision scene parsing network.

[0051] It should be noted that the embodiment illustrated in Figure 5 only schematically shows the number of convolutional layers in each convolutional module and does not represent the actual number. The input image size is (3, 32, 32). After processing by Conv1_1 and Conv1_2 in the first convolutional module, the image size is (64, 32, 32). After processing by Conv2_1, Conv2_2, and Conv2_3 in the second convolutional module, the image size is (128, 32, 32). After processing by the first pooling module, the image size is (128, 16, 16). After processing by Conv3_1, Conv3_2, and Conv3_3 in the third convolutional module, the image size is (256, 16, 16). After processing by the second pooling module, the image size is (256, 8, 8). After processing by Conv4_1, Conv4_2, and Conv4_3 in the fourth convolutional module, the image size is (512, 8, 8). After processing by the Softmax function, the image category is output.
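The size progression listed in [0051] can be checked with a small shape trace. The sketch below only tracks (channels, height, width); it assumes each convolutional module changes the channel count only and each wavelet pooling module halves height and width, as the listed sizes suggest. The stage list and the function name `trace_shapes` are hypothetical.

```python
def trace_shapes(input_shape, stages):
    """Return the (channels, height, width) tuple after every stage."""
    channels, height, width = input_shape
    shapes = [input_shape]
    for kind, arg in stages:
        if kind == "conv":        # convolutional module: sets the channel count
            channels = arg
        elif kind == "pool":      # wavelet pooling module: divides H and W
            height, width = height // arg, width // arg
        else:
            raise ValueError(f"unknown stage kind: {kind}")
        shapes.append((channels, height, width))
    return shapes


CLASSIFICATION_STAGES = [
    ("conv", 64),    # first convolutional module
    ("conv", 128),   # second convolutional module
    ("pool", 2),     # first wavelet pooling module
    ("conv", 256),   # third convolutional module
    ("pool", 2),     # second wavelet pooling module
    ("conv", 512),   # fourth convolutional module
]

shapes = trace_shapes((3, 32, 32), CLASSIFICATION_STAGES)
```

The last entry of `shapes` is the feature size that reaches the Softmax classifier.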

[0052] In an optional embodiment of the present invention, please refer to Figure 6, which is a schematic diagram of a preset convolutional neural network for image segmentation provided in an embodiment of the present invention. The network includes a CNN network, a convolutional layer, a first wavelet pooling module, a second wavelet pooling module, and a multilayer perceptron;

[0053] The image to be processed is processed by a CNN network to increase the number of channels and decrease the length and width. After processing by a convolutional layer, the number of channels is reduced. After being upsampled by the first pooling module and then by the second pooling module, the image is processed by a multilayer perceptron and the resulting image segmentation is output.

[0054] In this embodiment, the wavelet pooling proposed above is used instead of the residual network pooling scheme to obtain a high-precision scene parsing network.

[0055] It should be noted that the embodiment illustrated in Figure 6 only schematically shows each module in the CNN network and does not represent the actual number. The input image size is (3, 480, 480). After processing by the CNN network, the image size is (2048, 60, 60). After processing by the Conv_layer convolutional layer, the image size is (341, 60, 60). After processing by the first pooling module, the image size is (341, 60, 60). After processing by the second pooling module, the image size is (341, 60, 60). After processing by the multilayer perceptron, the image segmentation result is output.

[0056] In an optional embodiment of the present invention, the effectiveness of the image processing method proposed in the above embodiments is verified by simulation experiments. Please refer to Figure 7, which is a schematic diagram of the simulation experiment output provided in an embodiment of the present invention. The image classification experiment was conducted on the CIFAR10 and CIFAR100 datasets, and the image segmentation experiment was conducted on the ADE20K dataset. In the figure, GroundTruth represents the true label, Conv(Stride=2) represents a convolution operation with a stride of 2, and AvePooling represents average pooling. The experiments verify the superior performance of the proposed algorithm in the image parsing task.

[0057] Based on the same inventive concept, this invention also provides an image processing apparatus based on a wavelet pooling network, applicable to the image processing method based on a wavelet pooling network provided in the above embodiments of this invention. Please refer to the above description, which will not be repeated here; the image processing apparatus includes:

[0058] The image acquisition module is used to acquire the image to be processed;

[0059] The image processing module is used to input the image to be processed into a preset convolutional neural network. After processing by the wavelet pooling module of the preset convolutional neural network, the local and global feature effects of the image to be processed are considered, and the processed image is output. The wavelet pooling module includes a predictor and an updater. The predictor and updater are the same, so that the wavelet pooling has linear independence in the feature layer.

[0060] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations are intended to cover non-exclusive inclusion, such that an article or device comprising a list of elements includes not only those elements but also other elements not expressly listed. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the article or device comprising said element. Terms such as "connected" or "linked" are not limited to physical or mechanical connections but can include electrical connections, whether direct or indirect. The orientations or positional relationships indicated by terms such as "upper," "lower," "left," and "right" are based on the orientations or positional relationships shown in the accompanying drawings and are used only for the convenience of describing the invention and for simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the invention.

[0061] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features or characteristics described may be combined in any suitable manner in one or more embodiments or examples. In addition, those skilled in the art can combine and integrate the different embodiments or examples described in this specification.

[0062] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.

Claims

1. A method for image processing based on a wavelet pooling network, characterized in that it comprises: obtaining the image to be processed; inputting the image to be processed into a preset convolutional neural network (CNN) and processing it through the wavelet pooling module of the preset CNN, taking into account the local and global feature effects of the image to be processed, and outputting the processed image; wherein the wavelet pooling module includes a predictor and an updater, which are identical, ensuring linear independence of wavelet pooling at the feature layer; the preset CNN is an image classification network or an image segmentation network; and both the predictor and the updater are parallel or serial structures;
when it is a parallel structure, the predictor or the updater includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a first activation function, and a second activation function; the first convolutional layer processes the input image and inputs the processing result into the first activation function; after processing by the first activation function, the image is input into the second convolutional layer, processed by the second convolutional layer, and then input into the second activation function; the third convolutional layer processes the input image, the fourth convolutional layer processes the input image, and the fifth convolutional layer processes the input image; the Softmax function is used to process the image processed by the third convolutional layer and the image processed by the fourth convolutional layer to obtain a first processing result; the Softmax function is used to process the image processed by the fifth convolutional layer and the first processing result to obtain a second processing result; the Softmax function is used to process the image processed by the first activation function and the second processing result to obtain a third processing result; and the third processing result is added to the input image to obtain the final processing result;
when it is a serial structure, the predictor or the updater includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a first activation function, and a second activation function; the first convolutional layer processes the input image and inputs the processing result into the first activation function; after processing by the first activation function, the image is input into the second convolutional layer, processed by the second convolutional layer, and then input into the second activation function; the third convolutional layer processes the image processed by the second activation function, the fourth convolutional layer processes the image processed by the second activation function, and the fifth convolutional layer processes the image processed by the second activation function; the Softmax function is used to process the image processed by the third convolutional layer and the image processed by the fourth convolutional layer to obtain a first processing result; the Softmax function is used to process the image processed by the fifth convolutional layer and the first processing result to obtain a second processing result; and the second processing result is added to the input image to obtain the final processing result.

2. The image processing method based on wavelet pooling network according to claim 1, characterized in that the wavelet pooling module includes a first processing module, a second processing module, a third processing module, and a fusion module; the first processing module includes a horizontal partitioning module, a first predictor, and a first updater, wherein the horizontal partitioning module divides the input image horizontally to obtain non-overlapping partitioning components x1 and x2, the first predictor processes x1 and x2 to obtain a high-frequency component, and the first updater processes x1 and x2 to obtain a low-frequency component; the second processing module includes a first vertical partitioning module, a second predictor, and a second updater, wherein the first vertical partitioning module divides the input high-frequency component vertically to obtain components HL0 and HH0, the second predictor processes HL0 and HH0 to obtain a first high-frequency feature, and the second updater processes HL0 and HH0 to obtain a second high-frequency feature; the third processing module includes a second vertical partitioning module, a third predictor, and a third updater, wherein the second vertical partitioning module divides the input low-frequency component vertically to obtain components LH0 and LL0, the third predictor processes LH0 and LL0 to obtain a third high-frequency feature, and the third updater processes LH0 and LL0 to obtain a first low-frequency feature; and the fusion module fuses the first high-frequency feature, the second high-frequency feature, the third high-frequency feature, and the first low-frequency feature to obtain the fused feature.

3. The image processing method based on wavelet pooling network according to claim 1, characterized in that the final processing result output by the predictor or updater in the parallel structure is given by an expression whose terms are: the input image; the Softmax function; the query value of the image after processing by the third convolutional layer; the key value of the image after processing by the fourth convolutional layer; the value of the image after processing by the fifth convolutional layer; the transpose operation; the scaling factor; and the image after processing by the second activation function.

4. The image processing method based on wavelet pooling network according to claim 1, characterized in that the final processing result output by the predictor or updater in the serial structure is given by a set of expressions whose terms are: the input image; the Softmax function; the query value of the image after processing by the third convolutional layer; the key value of the image after processing by the fourth convolutional layer; the value of the image after processing by the fifth convolutional layer; the transpose operation; the scaling factor; the image after processing by the second activation function; the ReLU activation function operation; the convolution operation; and the index of the convolutional layer.
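One way to write the serial-structure computation that is consistent with the terms listed in this claim and with the layer order given in claim 1 is shown below. This is a hedged reconstruction, not the claim's own formula: the symbols X (input image), F (output of the second activation function), Q, K, V (query, key, value), d (with the scaling factor assumed to take the usual form of a square root), and Y (final processing result) are all assumed notation.

```latex
\begin{aligned}
F &= \mathrm{ReLU}\bigl(\mathrm{Conv}_2\bigl(\mathrm{ReLU}(\mathrm{Conv}_1(X))\bigr)\bigr),\\
Q &= \mathrm{Conv}_3(F),\qquad K = \mathrm{Conv}_4(F),\qquad V = \mathrm{Conv}_5(F),\\
Y &= X + \mathrm{Softmax}\!\left(\frac{Q K^{\mathsf{T}}}{\sqrt{d}}\right) V .
\end{aligned}
```

Here the subscript on Conv is the convolutional layer index, and the final line adds the attention output to the input image as the residual connection.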

5. The image processing method based on wavelet pooling network according to claim 1, characterized in that the preset convolutional neural network is an image classification network, including a first convolutional module, a second convolutional module, a first wavelet pooling module, a third convolutional module, a second wavelet pooling module, a fourth convolutional module, and a Softmax function. The image to be processed is processed by the first convolutional module to increase the number of channels, then by the second convolutional module to increase the number of channels again, then by the first wavelet pooling module to decrease the length and width, then by the third convolutional module to increase the number of channels, then by the second wavelet pooling module to decrease the length and width, then by the fourth convolutional module to increase the number of channels, and finally by the Softmax function to output the image category.
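The effect of this stage order on tensor shape can be traced with a small sketch. The channel counts, the factor of two in each wavelet pooling step, and the assumption that pooling leaves the channel count unchanged are all hypothetical; the claim states only that convolution modules increase the number of channels and wavelet pooling modules decrease length and width. The sketch stops before the final Softmax, whose classification head the claim does not detail.

```python
def conv_module(shape, out_channels):
    """Convolution module: changes the channel count, keeps height and width."""
    _, h, w = shape
    return (out_channels, h, w)

def wavelet_pool_module(shape):
    """Wavelet pooling module: halves height and width (assumed factor of 2)."""
    c, h, w = shape
    return (c, h // 2, w // 2)

def trace_classifier(shape, channels=(16, 32, 64, 128)):
    """Return the (stage name, shape) pairs for the claimed stage order."""
    c1, c2, c3, c4 = channels  # hypothetical channel counts
    steps = [
        ("conv1", lambda s: conv_module(s, c1)),
        ("conv2", lambda s: conv_module(s, c2)),
        ("wavelet_pool1", wavelet_pool_module),
        ("conv3", lambda s: conv_module(s, c3)),
        ("wavelet_pool2", wavelet_pool_module),
        ("conv4", lambda s: conv_module(s, c4)),
    ]
    trace = [("input", shape)]
    for name, fn in steps:
        shape = fn(shape)
        trace.append((name, shape))
    return trace

for name, shape in trace_classifier((3, 32, 32)):
    print(f"{name:14s} {shape}")
```

Under these assumptions a 3×32×32 input ends as 128×8×8 after the fourth convolutional module.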

6. The image processing method based on wavelet pooling network according to claim 1, characterized in that the preset convolutional neural network is an image segmentation network, including a CNN network, a convolutional layer, a first wavelet pooling module, a second wavelet pooling module, and a multilayer perceptron. The image to be processed is processed by the CNN network to increase the number of channels and decrease the length and width, and is then processed by the convolutional layer to reduce the number of channels. It is then upsampled by the first wavelet pooling module and by the second wavelet pooling module in turn, after which it is processed by the multilayer perceptron and the image segmentation result is output.

7. An image processing device based on a wavelet pooling network, characterized in that it comprises: an image acquisition module, used to acquire the image to be processed; and an image processing module, used to input the image to be processed into a preset convolutional neural network, process it through the wavelet pooling module of the preset convolutional neural network, taking into account the influence of the local and global features of the image to be processed, and output the processed image; wherein the wavelet pooling module includes a predictor and an updater, and the predictor and the updater are the same, so that the wavelet pooling has linear independence at the feature layer; the preset convolutional neural network is an image classification network or an image segmentation network; and the predictor and the updater are both parallel structures or both serial structures. The image processing module is further configured such that, in the parallel structure, the predictor or the updater includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a first activation function, and a second activation function; the first convolutional layer processes the input image and inputs the result into the first activation function, the output of the first activation function is input into the second convolutional layer, and the output of the second convolutional layer is input into the second activation function; the third, fourth, and fifth convolutional layers each process the input image; the Softmax function is used to process the image processed by the third convolutional layer and the image processed by the fourth convolutional layer to obtain a first processing result; the Softmax function is used to process the image processed by the fifth convolutional layer and the first processing result to obtain a second processing result; the Softmax function is used to process the image processed by the first activation function and the second processing result to obtain a third processing result; and the third processing result is added to the input image to obtain the final processing result.
The image processing module is further configured such that, in the serial structure, the predictor or the updater includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a first activation function, and a second activation function; the first convolutional layer processes the input image and inputs the result into the first activation function, the output of the first activation function is input into the second convolutional layer, and the output of the second convolutional layer is input into the second activation function; the third, fourth, and fifth convolutional layers each process the image output by the second activation function; the Softmax function is used to process the image processed by the third convolutional layer and the image processed by the fourth convolutional layer to obtain a first processing result; the Softmax function is used to process the image processed by the fifth convolutional layer and the first processing result to obtain a second processing result; and the second processing result is added to the input image to obtain the final processing result.