Pipe fitting image classification method based on light network model
By using pointwise convolution and feature map augmentation techniques in lightweight convolutional neural networks, the pipe fitting image classification model is optimized, solving the problems of high computational cost and poor adaptability of traditional models, and achieving efficient and low-resource image classification results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI UNIV
- Filing Date
- 2023-04-18
- Publication Date
- 2026-06-23
AI Technical Summary
Traditional convolutional neural network models suffer from high computational cost, slow processing speed, and poor adaptability to different types of images in pipe fitting image classification, resulting in high computer performance requirements and low classification accuracy.
A lightweight convolutional neural network is employed to expand the feature map through pointwise convolution and enhance the flow of feature maps between convolutional layers. This results in a classification model suitable for pipe fitting image datasets, which includes pointwise convolution, batch normalization, and ReLU6 activation function. The network architecture is optimized by combining block structure and inverse bottleneck structure.
With relatively low computational and parameter requirements, it achieves high accuracy in pipe fitting image classification, saving computational resources and maintaining high classification accuracy under low hardware platform requirements, outperforming traditional models and MobileNet.
Description
Technical Field
[0001] This invention belongs to the fields of image classification and deep learning, and specifically relates to a method for classifying pipe images based on lightweight neural networks.
Background Technology
[0002] As deep neural network technology matures in image classification, traditional neural network models such as VGG16, ResNet, Inception, Xception, and DenseNet have achieved good accuracy. However, traditional convolutional neural network classification models are large in size and slow in computation, which greatly affects their practical use. In order to better apply deep learning to real life, reduce running time, and save operating resources, exploring efficient network architectures has become a trend.
[0003] Unlike traditional large-scale deep learning networks, today's compact neural network models, such as MobileNet, ShuffleNet, and GhostNet, can achieve results comparable to or better than traditional networks while consuming very few resources. However, because most models are trained and validated on the ImageNet dataset, they exhibit a strong bias towards it. Traditional classification methods directly apply these models to pipe image datasets with limited size, scale, and number of categories, resulting in poor model-dataset compatibility. Furthermore, directly training existing neural network models to classify pipe images incurs high computational costs, demands high computer performance, and yields low classification accuracy.
Summary of the Invention
[0004] To overcome the problems of high computational cost and poor adaptability to different types of images in existing models, this invention proposes an image classification method based on lightweight convolutional neural networks. By expanding the feature map through pointwise convolution and enhancing the flow of feature maps between convolutional layers, a small-sized, high-accuracy, and fast classification model can be trained on different image datasets to achieve image classification.
[0005] To solve the above problems, the technical solution of the present invention is as follows:
[0006] An image classification method based on lightweight neural networks, characterized by the following steps:
[0007] S1. Obtain pipe fitting image data, which is divided into training set and test set;
[0008] S2. Perform normalization preprocessing and data augmentation on the training set data; perform normalization on the test set data;
[0009] S3. Construct a lightweight convolutional neural network and train it using the training set data to obtain a trained neural network image classification model.
[0010] S4. Input the test set data into the neural network image classification model and output the probability of the pipe fitting image data corresponding to all categories, where the category with the highest probability is the category to which the pipe fitting image data belongs.
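As an illustration of step S4, the following minimal sketch converts raw class scores into probabilities with Softmax and selects the most probable category. The class names and scores are hypothetical and are not taken from the invention; only the Softmax-then-maximum rule comes from the text above.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_category(logits, class_names):
    """Return the most probable class name and the full probability list."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs

# Hypothetical class names and scores, for illustration only.
label, probs = predict_category([1.2, 3.4, 0.3], ["elbow", "tee", "flange"])
print(label, [round(p, 3) for p in probs])
```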
[0011] Furthermore, step S3, which trains the lightweight convolutional neural network using the training set data, includes the following steps:
[0012] S3.1 Define the number of channels c of the output feature map of the block structure, the depth n, and the stride s;
[0013] S3.2 Obtain the number of channels c0 of the input feature map and define the input features as historical features;
[0014] S3.3 Process the input feature map using pointwise convolution, with the number of output channels being twice the number of input channels, and concatenate the generated feature map with the input feature map;
[0015] S3.4 Batch normalize the feature maps obtained in step S3.3 and filter them using the ReLU6 activation function;
[0016] S3.5 Determine if the required depth n for the block structure is 1. If yes, input the feature map from the previous step into a depthwise separable convolution and output it. If not, proceed to step S3.6.
[0017] S3.6 Iterate through the following steps n-1 times, with the current iteration number being i:
[0018] S3.6.1 Update the historical features to the concatenation of the feature map obtained in the previous step and the historical features themselves.
[0019] S3.6.2 Expand the feature map obtained in the previous step with pointwise convolution, with the number of output channels being (c0+c0*0.125*(i+1))*2. Concatenate the generated feature map with the historical feature map.
[0020] S3.6.3 Batch normalize the feature maps obtained in the previous step and filter them using the ReLU6 activation function;
[0021] S3.6.4 If this is the last iteration, input the feature map obtained in the previous step into a depthwise separable convolution. The depthwise separable convolution outputs c channels with a stride of s. Batch normalize the generated feature map and output it after passing it through the ReLU6 activation function. If this is not the last iteration, input the feature map obtained in the previous step into a depthwise separable convolution. The depthwise separable convolution outputs (c0 + c0 * 0.125 * (i + 1)) channels with a stride of 1.
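The channel arithmetic of steps S3.3 to S3.6.4 can be traced with a short sketch. It lists only the output width of each convolution (the concatenations then add the historical channels on top) and rests on assumptions the text does not settle: the iteration counter i starts at `first_index` (0 here), fractional channel counts are truncated, and a block of depth 1 outputs c channels with stride s. The function name `block_channel_plan` is hypothetical.

```python
def block_channel_plan(c0, c, n, s, first_index=0):
    """List (step, output_channels, stride) for the convolutions in one block.

    Assumptions not stated in the source: i starts at first_index,
    fractional channel counts are truncated, and a depth-1 block
    outputs c channels with stride s.
    """
    plan = [("S3.3 pointwise", 2 * c0, 1)]
    if n == 1:
        plan.append(("S3.5 depthwise separable", c, s))
        return plan
    for k in range(n - 1):
        i = first_index + k
        width = c0 + c0 * 0.125 * (i + 1)
        plan.append((f"S3.6.2 pointwise, i={i}", int(width * 2), 1))
        if k == n - 2:  # last iteration: produce the block output
            plan.append((f"S3.6.4 depthwise separable, i={i}", c, s))
        else:  # intermediate iteration: keep stride 1
            plan.append((f"S3.6.4 depthwise separable, i={i}", int(width), 1))
    return plan

for step in block_channel_plan(c0=32, c=64, n=3, s=1):
    print(step)
```

With c0 = 32, each iteration widens the expansion by c0/8 = 4 channels before doubling, which is the one-eighth growth the method relies on.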
[0022] Preferably, the lightweight convolutional neural network includes a 3×3 standard convolutional layer, multiple block structures, a global average pooling layer, a fully connected layer, and a Softmax activation function. The block structure includes multiple inverse bottleneck structures, and the connection method of different bottleneck structures is as follows: the input is connected to each fusion layer and 1×1 convolution in all bottleneck structures; the first 1×1 convolution in all bottleneck structures is connected to each subsequent fusion layer and 1×1 convolution; the last 1×1 convolution in all bottleneck structures is connected to each subsequent fusion layer and 1×1 convolution; and each bottleneck structure is cascaded. Activation functions and batch normalization are added between model layers. Except for the Softmax activation function used to output the classification result at the end, all other activation functions in the model use the ReLU6 activation function.
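To indicate why the 1×1 (pointwise) and depthwise separable convolutions named above are inexpensive, the sketch below compares weight counts using the standard formulas for these layer types. The 3×3 depthwise kernel size and the 64-channel example are assumptions made for illustration; bias terms and batch normalization parameters are ignored.

```python
def standard_conv_weights(c_in, c_out, k=3):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def pointwise_conv_weights(c_in, c_out):
    """Weights in a 1 x 1 (pointwise) convolution."""
    return c_in * c_out

def depthwise_separable_weights(c_in, c_out, k=3):
    """Weights in a k x k depthwise convolution followed by a pointwise one."""
    return k * k * c_in + pointwise_conv_weights(c_in, c_out)

# Illustrative channel counts, not taken from the invention.
c_in, c_out = 64, 64
std = standard_conv_weights(c_in, c_out)
sep = depthwise_separable_weights(c_in, c_out)
print(std, sep, round(std / sep, 2))
```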
[0023] Compared with the prior art, the beneficial effects of the present invention are:
[0024] 1. This invention uses a large number of pointwise convolutions to expand the dimensionality of the feature map, thereby expanding the features of the pipe fitting image and enabling it to maintain high accuracy with a small amount of training data.
[0025] 2. This invention reuses historical feature maps by expanding them into the current feature maps, which relieves much of the computational burden on the pointwise convolutions and achieves higher efficiency than traditional methods.
[0026] 3. In the process of reusing historical features, this invention increases the output of each successive bottleneck layer by one-eighth, so that more feature maps are reused from nearby layers than from distant layers, and also addresses the problem that current features have low dependence on early features.
[0027] 4. This invention has a simple structure, is easy to train, and has low hardware platform requirements. While requiring tens of times less computation and tens of times fewer parameters, it still achieves a final classification accuracy far higher than that of traditional classification models. Even compared with MobileNet, the most commonly used high-efficiency classification model, it reduces the computational cost several-fold while achieving almost the same accuracy.
Attached Figure Description
[0028] Figure 1 is a flowchart of the image classification method based on lightweight neural networks of the present invention.
[0029] Figure 2 is a schematic diagram of the lightweight convolutional neural network structure in this invention.
[0030] Figure 3 is a flowchart illustrating a block structure with a depth of 2 in an embodiment of the image classification method based on a lightweight neural network according to the present invention.
Detailed Implementation
[0031] The technical solution of the present invention will be described in detail below with reference to the accompanying drawings and embodiments, but this should not be construed as limiting the scope of protection of the present invention.
[0032] Please refer to Figure 1 and Figure 2. Figure 1 is a flowchart of the image classification method based on lightweight neural networks of the present invention, and Figure 2 is a schematic diagram of the lightweight convolutional neural network structure in this invention. As shown in the figures, an image classification method based on a lightweight neural network includes the following steps:
[0033] 1. Feed the image into a 3×3 convolutional layer with 32*S output channels and a stride of 2. The block structure used in the following steps is a type of layer structure; its specific calculation steps are given under step 2.
[0034] 2. Input the feature map obtained in the previous step into the block structure, with an output channel count of 16*S and a depth of 2.
[0035] a. Obtain the number of channels c0 in the input feature map and define the input features as historical features.
[0036] b. Process the input feature map using pointwise convolution, with the number of output channels being twice the number of input channels, and concatenate the generated feature map with the input feature map.
[0037] c. Perform batch normalization on the feature maps from the previous step, and filter them using the ReLU6 activation function.
[0038] d. Determine if the required depth n of the block is 1. If it is, input the feature map from the previous step into a depthwise separable convolution and output it. If not, continue with the following steps.
[0039] e. Repeat the following steps n-1 times, with the current iteration number counted as i:
[0040] (1) Update the historical features to the concatenation of the feature map obtained in the previous step and the historical features themselves.
[0041]-[0042] (2) Expand the feature map obtained in the previous step using pointwise convolution, with the number of output channels being (c0 + c0 * 0.125 * (i+1))*2, and concatenate the generated feature map with the historical feature map.
[0043] (3) Perform batch normalization on the feature maps obtained in the previous step, and filter them using the ReLU6 activation function.
[0044] (4) If it is the last iteration, input the feature map obtained in the previous step into the depthwise separable convolution. The depthwise separable convolution outputs c channels with a stride of s. Batch normalize the generated feature map and output it after passing it through the ReLU6 activation function.
[0045] If it is not the last iteration, the feature map obtained in the previous step is input into the depthwise separable convolution. The number of output channels of the depthwise separable convolution is (c0+c0*0.125*(i+1)), and the stride is 1.
[0046] 3. Input the feature map obtained in the previous step into the block structure described in step 2, with an output channel number of 32*S, a depth of 3, and a stride of 1.
[0047] 4. Input the feature map obtained in the previous step into the block structure described in step 2, with an output channel count of 64*S, a depth of 4, and a stride of 1.
[0048] 5. Input the feature map obtained in the previous step into the block structure described in step 2, with an output channel count of 96*S, a depth of 5, and a stride of 2.
[0049] 6. Input the feature map obtained in the previous step into the block structure described in step 2, with an output channel count of 64*S, a depth of 4, and a stride of 1.
[0050] 7. Perform global average pooling on the feature map obtained in the previous step, pass it through a fully connected layer, and then use the Softmax activation function to calculate the probability of different categories.
[0051] 8. Input the image to be identified into the classification model, calculate the probability of different categories, and find the category with the highest probability to be the category to which the image belongs.
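The layer schedule of steps 1 to 6 can be written down as a small configuration table. In this sketch, S is treated as a channel width multiplier and scaled channel counts are truncated to integers; both are assumptions, since the text does not define S. The stride of the first block is left as `None` because step 2 does not state it. The names `STEM`, `BLOCKS`, and `scaled_blocks` are hypothetical.

```python
# Stem convolution from step 1: 3x3, 32*S output channels, stride 2.
STEM = {"channels": 32, "stride": 2}

# Block structures from steps 2 to 6 (base channels, depth, stride).
BLOCKS = [
    {"channels": 16, "depth": 2, "stride": None},  # stride not stated in step 2
    {"channels": 32, "depth": 3, "stride": 1},
    {"channels": 64, "depth": 4, "stride": 1},
    {"channels": 96, "depth": 5, "stride": 2},
    {"channels": 64, "depth": 4, "stride": 1},
]

def scaled_blocks(S):
    """Return the block schedule with channel counts multiplied by S.

    Assumption: S is a width multiplier and results are truncated.
    """
    return [dict(b, channels=int(b["channels"] * S)) for b in BLOCKS]

for block in scaled_blocks(0.5):
    print(block)
```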
[0052] Example:
[0053] 1) Collect a large number of RGB images of the classification target samples, create a dataset, and divide the dataset into training set and test set in a 7:3 ratio.
[0054] 2) Normalize the images in the training set. Since the pixel values of RGB images are between 0 and 255, the normalization formula is: f(x) = x ÷ 255. Then, perform data augmentation processing such as random rotation, random scaling and random translation on the normalized data.
[0055] 3) Perform the above normalization process only on the images in the test set;
[0056] 4) Build a lightweight network model and train it using the processed training set;
[0057] The aforementioned lightweight neural network includes a 3×3 standard convolution, 6 block structures, a global average pooling layer, a fully connected layer, and a Softmax activation function. The block structure includes multiple inverse bottleneck structures. The connection method of different bottleneck structures is as follows: the input is connected to each fusion layer and 1×1 convolution in all bottleneck structures; the first 1×1 convolution in all bottleneck structures is connected to each subsequent fusion layer and 1×1 convolution; the last 1×1 convolution in all bottleneck structures is connected to each subsequent fusion layer and 1×1 convolution; and each bottleneck structure is connected in series.
[0058] Activation functions and batch normalization are added between each layer of the model. Except for the Softmax activation function used to output the classification result, all other activation functions in the model use the ReLU6 activation function.
[0059] 5) After training is complete, save the neural network classification model.
[0060] 6) Input the test set images into the trained model, and the model will output the probability of the image belonging to all categories. The category with the highest probability is the category to which the image belongs.
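Steps 1) to 3) of the example can be sketched in plain Python as follows. The 7:3 split and the formula f(x) = x ÷ 255 come from the text; the shuffling seed, the file names, and the nested-list image layout are assumptions for illustration, and the random rotation, scaling, and translation of step 2) are omitted.

```python
import random

def split_dataset(samples, train_ratio=0.7, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed: an illustrative choice
    cut = int(round(len(items) * train_ratio))
    return items[:cut], items[cut:]

def normalize(image):
    """Apply f(x) = x / 255 to every channel value of a nested-list image."""
    return [[[v / 255 for v in pixel] for pixel in row] for row in image]

samples = [f"img_{i}.png" for i in range(10)]  # hypothetical file names
train, test = split_dataset(samples)
tiny = [[[0, 128, 255]]]  # one row containing one RGB pixel
print(len(train), len(test), normalize(tiny))
```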
[0061] It is evident that, compared to traditional large-scale classification models, this invention achieves a significantly higher classification accuracy while requiring tens of times less computation and tens of times fewer parameters. Even compared with MobileNet, the most commonly used high-efficiency classification model, it reduces the computational cost several-fold while maintaining almost the same accuracy.
Claims
1. A method for classifying pipe fitting images based on a lightweight neural network, characterized in that it includes the following steps:
S1. Obtain pipe fitting image data, which is divided into a training set and a test set;
S2. Perform normalization preprocessing and data augmentation on the training set data; normalize the test set data;
S3. Construct a lightweight convolutional neural network and train it using the training set data to obtain a trained neural network image classification model;
S4. Input the test set data into the neural network image classification model and output the probability of the pipe fitting image data corresponding to all categories, where the category with the highest probability is the category to which the pipe fitting image data belongs;
Step S3, which trains the lightweight convolutional neural network using the training set data, includes the following steps:
S3.1 Define the number of channels c of the output feature map of the block structure, the depth n, and the stride s;
S3.2 Obtain the number of channels c0 of the input feature map and define the input features as historical features;
S3.3 Process the input feature map using pointwise convolution, with the number of output channels being twice the number of input channels, and concatenate the generated feature map with the input feature map;
S3.4 Perform batch normalization on the feature maps obtained in step S3.3 and filter them using the ReLU6 activation function;
S3.5 Determine if the required depth n for the block structure is 1. If yes, input the feature map from the previous step into a depthwise separable convolution and output it. If not, proceed to step S3.6;
S3.6 Repeat the following steps n-1 times, with the current iteration number counted as i:
S3.6.1 Update the historical features to the concatenation of the feature map obtained in the previous step and the historical features themselves;
S3.6.2 Expand the feature map obtained in the previous step with pointwise convolution, with the number of output channels being (c0 + c0*0.125*(i+1))*2. Concatenate the generated feature map with the historical feature map;
S3.6.3 Batch normalize the feature maps obtained in the previous step and filter them using the ReLU6 activation function;
S3.6.4 If this is the last iteration, input the feature map obtained in the previous step into a depthwise separable convolution. The depthwise separable convolution outputs c channels with a stride of s. Batch normalize the generated feature map and output it after passing it through the ReLU6 activation function. If this is not the last iteration, input the feature map obtained in the previous step into a depthwise separable convolution. The depthwise separable convolution outputs (c0 + c0 * 0.125 * (i + 1)) channels with a stride of 1.
2. The pipe fitting image classification method based on a lightweight neural network according to claim 1, characterized in that the lightweight convolutional neural network includes a 3×3 standard convolutional layer, multiple block structures, a global average pooling layer, a fully connected layer, and a Softmax activation function. Each block structure includes multiple inverse bottleneck structures. The connection method between different bottleneck structures is as follows: the input is connected to each fusion layer and 1×1 convolution in all bottleneck structures; the first 1×1 convolution in all bottleneck structures is connected to each subsequent fusion layer and 1×1 convolution; the last 1×1 convolution in all bottleneck structures is connected to each subsequent fusion layer and 1×1 convolution; and each bottleneck structure is cascaded. Activation functions and batch normalization are added between model layers. Except for the Softmax activation function used to output the classification result, all other activation functions in the model use the ReLU6 activation function.