Oil drilling and production equipment defect detection system based on deep learning

By combining matrix conversion, feature extraction, and model dimensionality reduction modules with lightweighting techniques, the system addresses false alarms and missed detections in oil drilling and production equipment inspection under lighting interference and noise, and enables real-time defect detection at edge computing nodes.

CN122289179A · Pending · Publication Date: 2026-06-26 · SHENZHEN KAIRUI YIDE TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SHENZHEN KAIRUI YIDE TECH CO LTD
Filing Date: 2026-03-25
Publication Date: 2026-06-26

Abstract

This invention relates to the field of defect detection technology, specifically a deep learning-based defect detection system for oil drilling and production equipment. The system performs the following: acquiring visible light images of the drilling and production equipment using a camera, converting them into a pixel matrix, and eliminating illumination interference to generate a standard image matrix; extracting features from the standard image matrix and calculating weight ratios to generate a defect feature tensor; calculating the activation contribution of convolutional kernels, filtering redundant branches, and performing network pruning to generate a lightweight mapping graph; and inputting the lightweight mapping graph into a node classifier to calculate confidence and generate instructions. In this invention, a multi-scale feature fusion network is constructed to dynamically adjust feature weights, enhancing the perception of micro-cracks and suppressing background noise interference to achieve precise target localization. Pruning techniques compress the model and remove redundant parameters, reducing computational consumption and allowing direct deployment at on-site edge nodes. This avoids the bandwidth consumption caused by cloud transmission and ensures real-time output of detection results even with limited computing power.

Description

Technical Field

[0001] This invention relates to the field of defect detection technology, and in particular to a deep learning-based defect detection system for oil drilling and production equipment.

Background Technology

[0002] The field of defect detection technology involves systematic engineering that utilizes computer vision and pattern recognition technologies to automatically monitor the physical condition of industrial production equipment surfaces to ensure operational safety and quality control. Among these, traditional oil drilling and production equipment defect detection systems refer to hardware monitoring devices that employ a single threshold segmentation image algorithm combined with manual visual inspection to assess the crack and wear conditions of key mechanical components at the drilling site.

[0003] Existing technologies are susceptible to interference from surface oil and sudden changes in lighting when processing acquired images, which makes it difficult to separate the target from background noise during feature extraction. Single-scale convolution struggles to capture multi-dimensional morphological changes, and the fixed-weight receptive field limits the accurate localization of abnormal regions. This restricts the generalization ability of the model under complex working conditions, making it prone to false alarms and false negatives. In addition, the large network architecture consumes excessive computing resources and is difficult to deploy on terminal devices with limited computing power, resulting in excessively high data transmission latency that cannot meet the needs of operational safety monitoring.

Summary of the Invention

[0004] The purpose of this invention is to address the shortcomings of existing technologies by proposing a deep learning-based defect detection system for oil drilling and production equipment.

[0005] To achieve the above objectives, the present invention adopts the following technical solution: a deep learning-based defect detection system for oil drilling and production equipment includes:

[0006] The matrix conversion module is configured to acquire visible light images of drilling equipment, convert the visible light images to construct a pixel matrix, extract the local brightness features of the pixel matrix, eliminate illumination interference, and generate a standard image matrix.

[0007] The feature extraction module is configured to extract the texture features and semantic features of the standard image matrix using a network model, calculate the channel weight ratio of the texture features and semantic features, and concatenate them to construct a defect feature tensor.

[0008] The model dimensionality reduction module is configured to calculate the activation contribution of the convolutional kernels of the detection network, filter out redundant branches whose activation contribution is lower than a threshold, prune redundant branches, and compress the defect feature tensor to generate a lightweight mapping graph.

[0009] The classification and determination module is configured to input the lightweight mapping graph into an edge classifier to calculate the anomaly confidence score, compare the anomaly confidence score with the safe interval, and generate an alarm command if the anomaly confidence score exceeds the safe interval.

[0010] As a further aspect of the present invention, the specific function of the matrix conversion module is as follows:

[0011] The image acquisition submodule acquires a visible light image, maps the visible light image to a grayscale space using a preset color space conversion matrix, and performs scale normalization processing on the grayscale image using a bilinear interpolation algorithm to extract the grayscale coordinates of each pixel in the image to construct a pixel matrix.

[0012] The brightness extraction submodule analyzes the spatial distribution pattern of each pixel in the pixel matrix, uses a local Gaussian smoothing filter to remove high-frequency noise from the pixel matrix, calculates the gray-level gradient change rate between adjacent pixels, and selects a set of key pixels with significant features based on the gray-level gradient change rate to extract the local brightness features.

[0013] The illumination cancellation submodule compares the local brightness features with a preset global ambient illumination distribution benchmark, uses a homomorphic filtering algorithm to separate the low-frequency illumination reflection component and the high-frequency object detail component in the pixel matrix, suppresses the amplitude of the low-frequency illumination reflection component and enhances the high-frequency object detail component, and eliminates illumination interference to generate the standard image matrix.

[0014] As a further aspect of the present invention, the specific function of the feature extraction module is as follows:

[0015] The feature extraction submodule obtains the standard image matrix and inputs it into a pre-trained multi-scale convolutional network model. Through multiple convolutional kernels, it extracts spatial texture details and deep abstract semantic structures under different receptive fields and outputs the corresponding initial dimension texture features and semantic features respectively.

[0016] The weight calculation submodule analyzes the energy distribution of texture features and semantic features in different channel dimensions, calculates the global response value of each feature channel by combining global average pooling operation, performs probability mapping on the global response values of all channels using a normalized exponential function, and calculates the channel weight ratio.

[0017] The tensor splicing submodule optimizes the expression weights of texture features and semantic features in different channel dimensions. It uses the channel weight ratio to perform element-wise dynamic weighted multiplication calculations on each channel corresponding to texture features and semantic features, thereby enhancing the expressive power of the channel containing key defect information and weakening the background noise channel. It then performs cascading operations along the preset channel dimensions to splice and construct the defect feature tensor.
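
As a non-limiting illustration, the weighting and splicing described in [0016] and [0017] may be sketched as follows. The function names are hypothetical, feature maps are represented as nested lists of shape channels x rows x columns, and the choice to normalize the texture group and the semantic group separately is an assumption, since the text does not state whether the weights are computed jointly.

```python
import math


def global_average(channel):
    """Global average pooling: mean of all values in one 2D channel."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def softmax(values):
    """Normalized exponential function, shifted by the maximum for stability."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(feature):
    """Scale every channel element-wise by its softmax channel weight."""
    weights = softmax([global_average(ch) for ch in feature])
    return [[[w * v for v in row] for row in ch]
            for w, ch in zip(weights, feature)]


def defect_tensor(texture, semantic):
    """Concatenate the reweighted groups along the channel dimension.

    ASSUMPTION: each group is normalized on its own before concatenation.
    """
    return reweight(texture) + reweight(semantic)
```

Channels with a larger global response receive a larger share of the weight, which is the mechanism the text relies on to emphasize defect-bearing channels over background channels.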

[0018] As a further aspect of the present invention, the specific function of the model dimensionality reduction module is as follows:

[0019] The contribution calculation submodule obtains the response output tensor of the detection network convolution kernel to each feature channel during the forward propagation process, calculates the non-zero activation ratio of the output tensor under all validation sample sets and the absolute value of the scaling factor in the batch normalization layer, and comprehensively calculates the activation contribution.

[0020] The branch filtering submodule compares the activation contribution with a preset structure retention threshold, uses a binary search algorithm to locate the network layer node with the least performance impact in the sorted contribution sequence, and identifies and marks redundant branches that are below the threshold and do not affect the global semantic integrity.

[0021] The network pruning submodule filters network connections and parameter weights corresponding to redundant branches, uses knowledge distillation technology to transfer the network's defect identification capability to the pruned simplified network structure, and compresses the defect feature tensor based on the simplified network structure to generate the lightweight mapping graph.
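
As a non-limiting illustration, the scoring and filtering steps of [0019] and [0020] may be sketched as follows. The text states only that the non-zero activation ratio and the absolute batch normalization scaling factor are combined "comprehensively"; taking their product is an assumption made here for illustration, and all function names are hypothetical. Knowledge distillation ([0021]) is not covered by this sketch.

```python
from bisect import bisect_left


def nonzero_ratio(activations):
    """Fraction of non-zero responses of one channel over the validation set."""
    return sum(1 for a in activations if a != 0) / len(activations)


def contribution(activations, bn_scale):
    """Activation contribution of one channel.

    ASSUMPTION: the two quantities are multiplied; the source does not
    give the exact combination rule.
    """
    return nonzero_ratio(activations) * abs(bn_scale)


def redundant_channels(scores, threshold):
    """Indices of channels whose score lies strictly below the threshold.

    The scores are sorted once, then a binary search locates the cut point
    in the sorted sequence.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranked = [scores[i] for i in order]
    cut = bisect_left(ranked, threshold)  # first position with score >= threshold
    return sorted(order[:cut])
```

For example, with scores `[0.5, 0.05, 0.3, 0.01]` and a retention threshold of `0.1`, channels 1 and 3 are marked as redundant.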

[0022] As a further aspect of the present invention, the specific function of the classification and determination module is as follows:

[0023] The confidence inference submodule obtains the lightweight mapping graph and inputs it into the edge classifier pre-deployed in the terminal device. It maps the multi-dimensional feature space to a one-dimensional category probability space through a fully connected layer and calculates the anomaly confidence by combining the Softmax activation function for probability normalization.

[0024] The interval comparison submodule obtains the confidence distribution statistics under normal conditions from the system's historical operation database, determines the safe interval by setting the upper and lower limits of reasonable fluctuations based on the Gaussian distribution model, and compares each anomaly confidence value with the safe interval to obtain the out-of-bounds deviation value.

[0025] The alarm triggering submodule checks the out-of-bounds deviation value. If the value is greater than zero, the anomaly confidence is determined to exceed the safe interval, indicating a risk of structural defect or surface damage in the current drilling and production equipment. The submodule then calls the underlying hardware control interface of the system, encapsulates the out-of-bounds alarm information into a standard communication protocol data packet, and generates the alarm command.
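
As a non-limiting illustration, the interval comparison and alarm decision of [0024] and [0025] may be sketched as follows. The function names are hypothetical, and the tolerance multiplier `k = 3.0` is an assumed value, since the text says only that the limits are set from a Gaussian distribution model.

```python
from statistics import mean, pstdev


def safe_interval(history, k=3.0):
    """Lower and upper limits as mean -/+ k standard deviations.

    ASSUMPTION: k = 3.0 is illustrative; the source does not fix a value.
    """
    mu, sigma = mean(history), pstdev(history)
    return mu - k * sigma, mu + k * sigma


def out_of_bounds_deviation(confidence, interval):
    """Distance by which the confidence leaves the interval, or 0.0 if inside."""
    lower, upper = interval
    return max(0.0, confidence - upper, lower - confidence)


def should_alarm(confidence, history, k=3.0):
    """An alarm command is generated when the deviation is greater than zero."""
    return out_of_bounds_deviation(confidence, safe_interval(history, k)) > 0
```

With a fault-free history clustered around 0.10, a new confidence of 0.90 falls far outside the interval and triggers the alarm, while 0.10 does not.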

[0026] As a further aspect of the present invention, the process of eliminating illumination interference to generate the standard image matrix specifically includes:

[0027] The local brightness features and the initial pixel matrix are obtained, and a two-dimensional logarithmic domain function that can map the attenuation law of ambient light intensity is constructed. The pixel matrix is transformed from linear space to logarithmic space to separate the incident light component and the reflected light component.

[0028] The spectral distribution information contained in the local brightness features is analyzed, an adaptive Gaussian high-pass filter transfer matrix is constructed, the Euclidean distance from each frequency-domain node to the center frequency is calculated and compared with the cutoff frequency, and the low-frequency incident light component representing the slow change of illumination is suppressed by the Gaussian high-pass filter transfer matrix while the high-frequency reflected light component is retained.

[0029] The image data after frequency domain processing is restored to the original spatial domain using an exponential transform function (the inverse of the logarithmic mapping). The histogram equalization algorithm is then used to stretch the contrast distribution range of the image, and all enhanced pixels are integrated to generate the standard image matrix.

[0030] As a further aspect of the present invention, the process of calculating the channel weight ratio specifically includes:

[0031] Obtain a 3D feature map array corresponding to texture features and semantic features. Perform global max pooling operation along the spatial dimension to extract the most significant feature response peak within each channel. At the same time, perform global average pooling operation to extract the average feature distribution state within each channel.

[0032] The most significant feature response peak and average feature distribution state are analyzed and then input into a fully connected shared network composed of multilayer perceptrons for nonlinear dimensionality reduction and dimensionality increase. The two sets of output vectors after network mapping are superimposed to generate a comprehensive feature descriptor of the channel dimension.

[0033] A Sigmoid activation function is applied to each element of the comprehensive feature descriptor to perform range compression, nonlinearly mapping the values of all dimensions to the continuous interval between zero and one, and the channel weight ratio is calculated from the resulting normalized numerical sequence.
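
As a non-limiting illustration, the three steps of [0031] to [0033] may be sketched as follows: global max pooling and global average pooling per channel, a shared two-layer perceptron that first reduces and then restores the channel dimension, superposition of the two outputs, and Sigmoid compression. The function names, the ReLU nonlinearity in the hidden layer, and the weight matrices `w1` and `w2` are assumptions introduced for the example.

```python
import math


def sigmoid(z):
    """Map a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(vec, w1, w2):
    """Two-layer perceptron: reduce with w1 (ReLU), then expand with w2.

    ASSUMPTION: ReLU is used between the layers; the source only says
    the mapping is nonlinear.
    """
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_weights(feature_map, w1, w2):
    """Channel weight ratio from pooled peak and mean descriptors."""
    flat = [[v for row in ch for v in row] for ch in feature_map]
    peaks = [max(ch) for ch in flat]            # global max pooling
    means = [sum(ch) / len(ch) for ch in flat]  # global average pooling
    combined = [a + b for a, b in zip(shared_mlp(peaks, w1, w2),
                                      shared_mlp(means, w1, w2))]
    return [sigmoid(c) for c in combined]
```

For a feature map with C channels and a hidden width of H, `w1` has H rows of C values and `w2` has C rows of H values, so the output contains one weight per channel.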

[0034] As a further aspect of the present invention, the process of comprehensively calculating the activation contribution specifically includes:

[0035] Obtain the set of batch normalized scaling factors corresponding to the convolutional kernels of the detection network and the channel activation sparsity matrix during the forward propagation of the network. Based on the preset channel importance evaluation model, use the batch normalized scaling factors to perform logarithmic smoothing calculation on the corresponding dimensions in the channel activation sparsity matrix. Combine the absolute value term of the Taylor expansion to approximate the gradient change rate of the network loss function with respect to the channel parameter, and establish a contribution evaluation model that can characterize the redundancy of the convolutional kernel.

[0036] Calculate the feature map entropy value of the target channel and, based on the contribution evaluation model, apply the formula:

[0037] [formula not reproduced in the source text];

[0038] to calculate the activation contribution;

[0039] where the terms of the formula denote, in order: the activation contribution of a given channel; the preset scaling factor importance penalty coefficient; the absolute value of the batch normalization scaling factor corresponding to that channel; the preset gradient response importance adjustment coefficient; the rate of change of the gradient of the network loss function with respect to the parameters of that channel; and the information entropy value of the feature map output by that channel.

[0040] As a further aspect of the present invention, the process of calculating the anomaly confidence level specifically includes:

[0041] Obtain the deep multidimensional abstract feature vector contained in the lightweight mapping graph, flatten the deep multidimensional abstract feature vector into a one-dimensional continuous feature sequence, configure a self-attention mechanism module that can capture long-distance dependencies in the sequence to calculate the correlation weight matrix between each element in the feature sequence.

[0042] Based on the correlation weight matrix, the weighted summation of elements in the feature sequence is calculated to obtain the aggregated global representation vector. The global representation vector is then input into a nonlinear fully connected network containing a dropout layer for feature space transformation and mapping, and the raw logit scores of each detection category are output.

[0043] The numerical items representing defect and anomaly categories are selected from the raw logit scores. An exponential function converts each of these items into a positive number, which is then divided by the sum of the exponential values of all categories. The anomaly confidence level is calculated from the resulting probability distribution values.
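
As a non-limiting illustration, the normalization in [0043] may be sketched as follows. The function names and the convention of passing the anomaly category positions as a list of indices are assumptions; subtracting the maximum score before exponentiation is a standard numerical safeguard and does not change the result.

```python
import math


def softmax(scores):
    """Exponentiate each score and divide by the sum over all categories."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def anomaly_confidence(scores, anomaly_indices):
    """Total probability mass assigned to the defect and anomaly categories."""
    probs = softmax(scores)
    return sum(probs[i] for i in anomaly_indices)
```

With two categories and equal scores, for example, the anomaly category receives a confidence of 0.5.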

[0044] As a further aspect of the present invention, the process of determining the safe zone specifically includes:

[0045] The system obtains the baseline confidence sampling sequence and the dynamic variable of the equipment operating environment temperature recorded in the system's historical operation database during the fault-free period. Based on the preset boundary adaptive adjustment model, the discrete variance of the baseline confidence sampling sequence is linearly compensated using the dynamic variable of the equipment operating environment temperature. Combined with the confidence mean drift within the time sliding window, a dynamic safety threshold model that can dynamically track the system health baseline is constructed.

[0046] Calculate the temperature compensation coefficient for the current sampling period based on the dynamic safety threshold model and apply the formula:

[0047] [formula not reproduced in the source text];

[0048] to determine the safe interval;

[0049] where the terms of the formula denote, in order: the upper and lower bounds of the dynamically adjusted safe interval; the statistical average of historical normal confidence levels within a specified time sliding window; the preset confidence level standard deviation tolerance factor; the standard deviation of historical normal confidence levels within the specified time sliding window; the weighting coefficient representing the influence of the equipment's operating environment temperature on the confidence benchmark; and the absolute deviation between the current operating ambient temperature of the equipment and the standard reference temperature.

[0050] Compared with the prior art, the advantages and positive effects of the present invention are as follows:

[0051] In this invention, a multi-scale feature fusion network architecture is constructed to extract multi-dimensional visual information. An attention mechanism is introduced to dynamically adjust the channel feature weights, enhancing the perception of micro-cracks and defects obscured by oil, suppressing background noise interference, and achieving accurate positioning of target areas under complex working conditions. Knowledge distillation combined with network pruning is used to compress the model, removing redundant parameters and computational branches and reducing overall computational resource consumption, so that the lightweight model can be deployed directly on the edge computing nodes at the drilling site. This avoids bandwidth occupation and latency caused by transmitting image data to the cloud, ensuring that detection results can still be output in real time under limited computing power, meeting the real-time monitoring needs of drilling and production equipment.

Attached Figure Description

[0052] Figure 1 is a flowchart of the system modules of the present invention;

[0053] Figure 2 is a flowchart of the execution process of the matrix conversion module of the present invention;

[0054] Figure 3 is a flowchart of the feature extraction module of the present invention;

[0055] Figure 4 is a flowchart of the execution process of the model dimensionality reduction module of the present invention;

[0056] Figure 5 is a flowchart of the classification and determination module of the present invention.

Detailed Implementation

[0057] To make the objectives, technical solutions, and advantages of this invention clearer, the software-based technical solution is described in detail below with reference to system architecture diagrams and embodiments. It should be understood that the specific embodiments described herein are only for explaining the technical solutions of this invention and do not constitute a limitation on the scope of protection.

[0058] In the description of this invention, the system architecture relationships or data processing flows indicated by terms such as "layer," "module," "interface," "data flow," "client," and "server" are all defined based on the architecture diagram or flowchart corresponding to the embodiments. This manner of description is used only to clearly illustrate the logical relationships between the elements in the technical solution, and not to limit the physical deployment form. The term "multiple" means two or more technical units, including but not limited to data nodes, processing threads, service instances, functional components, and other scalable elements. The specific number is determined according to the actual business scenario.

[0059] Referring to Figure 1 and Figure 2, this invention provides a technical solution: a deep learning-based defect detection system for oil drilling and production equipment, comprising:

[0060] The matrix conversion module is configured to acquire visible light images of drilling equipment, convert the visible light images to construct a pixel matrix, extract the local brightness features of the pixel matrix, eliminate illumination interference, and generate a standard image matrix.

[0061] The specific functions of the matrix conversion module are as follows:

[0062] The image acquisition submodule acquires a visible light image, maps the visible light image to a grayscale space using a preset color space conversion matrix, and performs scale normalization processing on the grayscale image using a bilinear interpolation algorithm to extract the grayscale coordinates of each pixel in the image to construct a pixel matrix.

[0063] The brightness extraction submodule analyzes the spatial distribution pattern of each pixel in the pixel matrix, uses a local Gaussian smoothing filter to remove high-frequency noise from the pixel matrix, calculates the gray-level gradient change rate between adjacent pixels, and selects a set of key pixels with significant features based on the gray-level gradient change rate to extract local brightness features.

[0064] The illumination cancellation submodule compares local brightness features with a preset global ambient illumination distribution benchmark, uses a homomorphic filtering algorithm to separate low-frequency illumination reflection components and high-frequency object detail components in the pixel matrix, suppresses the amplitude of low-frequency illumination reflection components and enhances high-frequency object detail components, eliminates illumination interference and generates a standard image matrix.

[0065] The process of eliminating illumination interference to generate a standard image matrix specifically includes:

[0066] Local brightness features and an initial pixel matrix are obtained. A two-dimensional logarithmic domain function that can map the attenuation law of ambient light intensity is constructed. The pixel matrix is transformed from linear space to logarithmic space to separate the incident light component and the reflected light component.

[0067] The spectral distribution information contained in the local brightness features is analyzed, an adaptive Gaussian high-pass filter transfer matrix is constructed, the Euclidean distance from each frequency-domain node to the center frequency is calculated and compared with the cutoff frequency, and the low-frequency incident light component that represents the slow change of illumination is suppressed through the Gaussian high-pass filter transfer matrix while the high-frequency reflected light component is retained.

[0068] The image data after frequency domain processing is restored to the original spatial domain using an exponential transform function (the inverse of the logarithmic mapping). The histogram equalization algorithm is then used to stretch the contrast distribution range of the image, and all enhanced pixels are integrated to generate a standard image matrix.

[0069] The system acquires visible light images of the equipment surface from a CMOS industrial camera positioned 500 mm directly above the top drive unit of the No. 3 drilling and production platform in an oilfield. The camera is configured in global exposure (global shutter) mode with a resolution of 4096 x 2160 pixels, a color depth of 8 bits, and a frame rate locked at 60 frames per second. The system retrieves the visible light image data packet from the memory buffer and, for a specific pixel with an x-coordinate of 1024 and a y-coordinate of 2048, reads a red channel quantization value of 215, a green channel quantization value of 188, and a blue channel quantization value of 142. The preset color space conversion matrix is invoked and a linear weighted calculation is performed on these channel values: the red channel value (215) is multiplied by 0.299, the green channel value (188) by 0.587, and the blue channel value (142) by 0.114, giving weighted values of 64.285, 110.356, and 16.188 respectively. These three weighted values are summed to obtain 190.829. The decimal part is truncated using the floor rule, yielding a grayscale value of 190 for the pixel. This weighted summation and floor operation is then performed on all pixels of the 4096 x 2160 image to construct an initial grayscale matrix with dimensions 4096 x 2160 x 1.
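
As a non-limiting illustration, the weighted summation and floor operation of this example may be sketched as follows. The function names are hypothetical; the coefficients are those quoted in the example, which coincide with the ITU-R BT.601 luma weights.

```python
import math

# Channel weights quoted in the worked example (ITU-R BT.601 luma weights).
R_WEIGHT, G_WEIGHT, B_WEIGHT = 0.299, 0.587, 0.114


def to_gray(r, g, b):
    """Weighted sum of the three channel values, truncated with floor."""
    return math.floor(R_WEIGHT * r + G_WEIGHT * g + B_WEIGHT * b)


def to_gray_matrix(rgb_rows):
    """Apply to_gray to every (r, g, b) tuple of a row-major image."""
    return [[to_gray(r, g, b) for (r, g, b) in row] for row in rgb_rows]


print(to_gray(215, 188, 142))  # 190
```

Applying `to_gray_matrix` to the full frame yields the single-channel grayscale matrix used in the subsequent steps.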

[0070] The initial grayscale matrix is scaled using a bilinear interpolation algorithm. The spatial resolution of the target matrix is set to 1024 x 1024 pixels. For the target pixel at coordinates (256, 256) in the target matrix, its inverse mapped floating-point coordinates in the initial grayscale matrix are calculated as (1024.25, 540.75) based on the scaling factor. The four adjacent integer coordinate pixels surrounding this floating-point coordinate are located at (1024, 540), (1025, 540), (1024, 541), and (1025, 541), and their grayscale values are read as 185, 188, 182, and 186 respectively. The horizontal offset of the floating-point coordinate from the left column is 0.25, so the left and right columns receive weights of 0.75 and 0.25; the vertical offset from the upper row is 0.75, so the upper and lower rows receive weights of 0.25 and 0.75. Each grayscale value is multiplied by the product of its horizontal and vertical weights and the results are summed: 185 multiplied by 0.1875, plus 188 multiplied by 0.0625, plus 182 multiplied by 0.5625, plus 186 multiplied by 0.1875. This yields an interpolated grayscale value of 183.6875, which is rounded to the nearest integer, 184. Following this logic, all pixels in the 1024 x 1024 matrix are resampled, and the grayscale coordinates of each pixel are extracted to construct the final normalized pixel matrix.
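
As a non-limiting illustration, the interpolation at the floating-point coordinate (1024.25, 540.75) may be sketched as follows. The function name and the dictionary keyed by (x, y) are assumptions made for brevity; only the four neighbors used in the example are stored.

```python
import math


def bilinear_sample(pixels, x, y):
    """Bilinear interpolation at (x, y) from the four surrounding pixels.

    `pixels` maps integer (x, y) coordinates to grayscale values.
    """
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional offsets from the top-left neighbor
    top = (1 - fx) * pixels[(x0, y0)] + fx * pixels[(x0 + 1, y0)]
    bottom = (1 - fx) * pixels[(x0, y0 + 1)] + fx * pixels[(x0 + 1, y0 + 1)]
    return (1 - fy) * top + fy * bottom


neighbors = {(1024, 540): 185, (1025, 540): 188,
             (1024, 541): 182, (1025, 541): 186}
value = bilinear_sample(neighbors, 1024.25, 540.75)
print(value, round(value))  # 183.6875 184
```

Interpolating first along each row and then between the rows is equivalent to the four-term weighted sum in the example, with weights 0.1875, 0.0625, 0.5625, and 0.1875; rounding the result to the nearest integer gives 184.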

[0071] The spatial distribution of the normalized pixel matrix is analyzed, and a 5x5 local Gaussian smoothing filter with a standard deviation of 1.2 is slid across the 1024x1024 normalized pixel matrix with a stride of 1. When processing the local receptive field centered at (512, 512), a local grayscale matrix consisting of 25 pixels is extracted, multiplied element-wise with the 25 static weight coefficients of the Gaussian filter, and summed. The result replaces the original grayscale value at the center coordinate (512, 512) to filter out high-frequency noise. Subsequently, a 3x3 Sobel operator is applied, and the smoothed matrix is convolved along both the horizontal and vertical directions to calculate the grayscale gradient rate of change between adjacent pixels. At coordinates (300, 400), the horizontal gradient value is 45 and the vertical gradient value is 60. Taking the square root of the sum of the squares of these two values yields a combined grayscale gradient rate of change of 75. Based on the global gradient distribution histogram, the gray-level gradient change rate screening threshold is set to 50. Key pixel points with gradient values greater than 50 are selected, and the gray-level values of all pixels in the point set are combined and extracted as local brightness features.
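
As a non-limiting illustration, the smoothing kernel, the Sobel responses, and the threshold test of this example may be sketched as follows. The function names are hypothetical, and the kernel is normalized so that its 25 coefficients sum to one, which is a common convention assumed here rather than stated in the text.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gaussian_kernel(size=5, sigma=1.2):
    """size x size Gaussian weights, normalized to sum to one."""
    half = size // 2
    raw = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            for dx in range(-half, half + 1)]
           for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def weighted_sum(patch, kernel):
    """Element-wise product of a patch and a kernel, summed."""
    return sum(p * k for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))


def gradient_magnitude(gx, gy):
    """Square root of the sum of the squared directional gradients."""
    return math.hypot(gx, gy)


def is_key_pixel(gx, gy, threshold=50.0):
    """A pixel is kept when its gradient magnitude exceeds the threshold."""
    return gradient_magnitude(gx, gy) > threshold


print(gradient_magnitude(45, 60))  # 75.0
```

`weighted_sum` serves both for the 5x5 smoothing step and for the 3x3 Sobel responses; the pixel at (300, 400) in the example passes the threshold because 75 is greater than 50.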

[0072] Local brightness features and an initial pixel matrix of size 1024x1024 are obtained, and a two-dimensional logarithmic domain function that can map the attenuation law of ambient light intensity is constructed. For the pixel at coordinates (100, 100) in the pixel matrix, the gray value is 150. A constant of 1 is added to this gray value to avoid taking the logarithm of zero, giving 151, whose natural logarithm is 5.017. The logarithmic transformation is performed on all pixels in the matrix, converting the original model, in which the pixel value is the product of incident and reflected light, into a sum of the incident and reflected light components. A fast Fourier transform then converts the logarithmic matrix from the spatial domain to the two-dimensional spectral domain containing low-frequency and high-frequency distribution information. An adaptive Gaussian high-pass filter transfer matrix is constructed, with the frequency domain center coordinates set to (512, 512) and the cutoff frequency set to 30 Hz. For the frequency domain node at coordinates (532, 527), the Euclidean distance to the center frequency is calculated: the lateral deviation is 20 and the longitudinal deviation is 15, and taking the square root of the sum of the squares yields a Euclidean distance of 25. According to the Gaussian high-pass filter transfer matrix rule, since the Euclidean distance of 25 is below the cutoff frequency of 30 Hz, this node is determined to represent a low-frequency incident light component with slowly changing illumination, and its amplitude is suppressed by multiplying it by an attenuation factor of 0.2. Nodes with a Euclidean distance greater than 30 Hz are determined to represent high-frequency reflected light components carrying object details, and are preserved and enhanced by multiplying them by an enhancement factor of 1.5.
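
As a non-limiting illustration, the per-pixel logarithmic mapping and the per-node gain decision of this example may be sketched as follows. All function names are hypothetical. `step_gain` follows the two-level rule stated in the example, with nodes exactly at the cutoff treated as high-frequency by assumption. `gaussian_emphasis_gain` is a smooth alternative in the commonly used homomorphic filter form and is likewise an assumption, not a formula taken from the text.

```python
import math


def log_transform(gray):
    """ln(1 + gray); adding 1 avoids the logarithm of zero."""
    return math.log1p(gray)


def inverse_log_transform(value):
    """exp(value) - 1, which undoes log_transform."""
    return math.expm1(value)


def distance_from_center(u, v, center=(512, 512)):
    """Euclidean distance of a frequency-domain node from the center."""
    return math.hypot(u - center[0], v - center[1])


def step_gain(distance, cutoff=30.0, low_gain=0.2, high_gain=1.5):
    """Two-level rule of the example: attenuate below the cutoff, else enhance."""
    return low_gain if distance < cutoff else high_gain


def gaussian_emphasis_gain(distance, cutoff=30.0, low_gain=0.2, high_gain=1.5):
    """ASSUMED smooth variant: rises from low_gain at the center toward high_gain."""
    rise = 1.0 - math.exp(-(distance * distance) / (2.0 * cutoff * cutoff))
    return (high_gain - low_gain) * rise + low_gain


d = distance_from_center(532, 527)
print(round(log_transform(150), 3), d, step_gain(d))  # 5.017 25.0 0.2
```

In a full pipeline, each coefficient of the Fourier-transformed logarithmic matrix would be multiplied by the gain for its distance before the inverse transform is applied.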

[0073] Image data after frequency domain processing is extracted and restored to the logarithmic space domain using an inverse fast Fourier transform. Then, an exponential transform function (the inverse of the logarithmic mapping) is applied to restore each logarithmic value to the original linear space domain. A histogram equalization algorithm is applied to the restored matrix: the pixel distribution probability within the grayscale range of 0 to 255 is computed and the cumulative distribution function is calculated. Assuming the original grayscale values are concentrated in a narrow range of 50 to 120, cumulative probability mapping stretches grayscale value 120 to 235 and grayscale value 50 to 15, expanding the image's contrast distribution range. All 1024x1024 pixels that have undergone frequency domain separation, high-frequency enhancement, and contrast stretching are integrated to generate a standard image matrix free from illumination interference.
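
As a non-limiting illustration, the cumulative distribution mapping may be sketched as follows. The function name is hypothetical, and the sketch uses the textbook equalization formula, which maps the lowest occupied gray level to 0 and the highest to 255; the endpoints therefore differ from the illustrative values of 15 and 235 given in the example, which depend on the actual pixel distribution.

```python
def equalize(pixels, levels=256):
    """Histogram equalization of a flat list of integer gray values."""
    if not pixels:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:  # cumulative distribution over gray levels
        running += count
        cdf.append(running)
    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # a single gray level: nothing to stretch
        return list(pixels)
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


narrow = list(range(50, 121))  # values concentrated in 50..120
stretched = equalize(narrow)
print(min(stretched), max(stretched))  # 0 255
```

The mapping is monotonic, so the ordering of gray levels is preserved while the occupied range is spread over the full 0 to 255 scale.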

[0074] Table 1. Correlation Analysis between Gray-level Gradient Change Rate and Illumination Intensity

[0075]

[0076] Table 1 shows the processing data under different strong-light interference conditions. The experimental data show that when the ambient light intensity reaches 65,000 lux and the logarithmic-domain high-pass cutoff frequency is set to 35 Hz, the contrast improvement rate of the filtered image reaches 88%, which is 53% higher than that of histogram processing without homomorphic filtering for illumination removal. At the same time, the detail preservation rate of key pixels remains stable at 95%, demonstrating the effectiveness of the illumination removal mechanism under strong-light interference.

[0077] The bilinear interpolation algorithm mentioned above refers to an image scaling method that calculates the gray value of a new pixel at the target location by interpolating linearly in both the horizontal and vertical directions, using a distance-weighted average of the gray values of the four surrounding pixels.
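The definition above can be illustrated with a short sketch. The function name and the list-of-rows image layout are assumptions made for the example, and coordinates are assumed to be non-negative.

```python
def bilinear_sample(image, x, y):
    """Sample a gray value at fractional column x and row y.

    image is a list of rows; the four neighbors are blended by distance.
    """
    height, width = len(image), len(image[0])
    x0 = min(int(x), width - 1)
    y0 = min(int(y), height - 1)
    x1 = min(x0 + 1, width - 1)    # clamp at the right border
    y1 = min(y0 + 1, height - 1)   # clamp at the bottom border
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

patch = [[0, 100],
         [100, 200]]
center_value = bilinear_sample(patch, 0.5, 0.5)
print(center_value)
```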

[0078] Referring to Figure 1 and Figure 3, the feature extraction module is configured to extract texture and semantic features from the standard image matrix using a network model, calculate the channel weight ratio of the texture and semantic features, and concatenate them to construct a defect feature tensor.

[0079] The specific functions of the feature extraction module are as follows:

[0080] The feature extraction submodule obtains a standard image matrix and inputs it into a pre-trained multi-scale convolutional network model. Through multiple convolutional kernels, it extracts spatial texture details and deep abstract semantic structures under different receptive fields, and outputs the corresponding texture features and semantic features of the initial dimension respectively.

[0081] The weight calculation submodule analyzes the energy distribution of texture features and semantic features across different channel dimensions, calculates the global response value of each feature channel using global average pooling, and performs probability mapping on the global response values of all channels using a normalized exponential function to calculate the channel weight ratio.

[0082] The tensor splicing submodule optimizes the expression weights of texture features and semantic features in different channel dimensions. It uses the channel weight ratio to perform element-wise dynamic weighted multiplication calculations on each channel corresponding to texture features and semantic features, enhances the expressive power of the channel where key defect information is located and weakens the background noise channel. It splices and constructs the defect feature tensor by performing cascading operations along the preset channel dimensions.

[0083] The process of calculating the channel weight ratio specifically includes:

[0084] Obtain a 3D feature map array corresponding to texture features and semantic features. Perform global max pooling operation along the spatial dimension to extract the most significant feature response peak within each channel. At the same time, perform global average pooling operation to extract the average feature distribution state within each channel.

[0085] The most significant feature response peak and average feature distribution state are analyzed and then input into a fully connected shared network composed of multilayer perceptrons for nonlinear dimensionality reduction and dimensionality increase. The two sets of output vectors after network mapping are superimposed to generate a comprehensive feature descriptor of the channel dimension.

[0086] The Sigmoid activation function is applied to each element of the comprehensive feature descriptor to perform range compression, nonlinearly mapping the values of all dimensions to the continuous interval between zero and one, and the channel weight ratio is calculated from the normalized numerical sequence after mapping.

[0087] Obtain a standard image matrix with a spatial dimension of 1024 x 1024 and a depth of 1 channel, output by the pre-processor module. Load the standard image matrix into the input tensor buffer of the improved ResNet-50 multi-scale convolutional network model. Activate the pre-trained network model parameters. The standard image matrix first enters the first-layer convolutional kernel processing array, which contains 64 7 x 7 spatial receptive field convolutional kernels with a stride parameter set to 2 and an edge padding parameter set to 3. Perform matrix sliding multiplication and addition operations through the above network configuration, outputting an initial feature map with a spatial dimension decayed to 512 x 512 and a channel dimension expanded to 64. Subsequently, drive the feature map through subsequent residual structure stacked modules. In the second residual feature extraction stage, the network model outputs an intermediate tensor with a dimension of 256 x 256 x 256. Extract this intermediate tensor as the initial dimension texture feature representing the local physical morphology. The data flow continues to drill down along the network hierarchy. After passing through the 5th residual feature extraction stage, the output space dimension of the network model is compressed to a 32x32 high-order abstract tensor with a channel depth of 2048. This tensor is extracted as the initial dimensional semantic feature containing global environmental information.
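The spatial sizes quoted above follow from the usual convolution output-size formula. The sketch below checks the first layer; the helper name is hypothetical. The later sizes (256 and 32) match total strides of 4 and 32, as in a standard ResNet-50, which is assumed here to carry over to the improved model.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

first_layer = conv_output_size(1024, kernel=7, stride=2, padding=3)
texture_size = 1024 // 4     # assumed total stride of 4 at the second stage
semantic_size = 1024 // 32   # assumed total stride of 32 at the fifth stage
print(first_layer, texture_size, semantic_size)
```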

[0088] Semantic features with dimensions of 32x32x2048 were extracted as the analysis benchmark, and their energy distribution across different channel dimensions was calculated. The semantic features were divided into 2048 independent 32x32 two-dimensional matrix planes. For the 128th channel plane, a global average pooling operation was performed, summing the response values of the 1024 pixels within this plane. Assuming the total is 870.4, dividing by the total number of pixels (1024) gives a global average response value of 0.85 for this channel. Simultaneously, a global max pooling operation was performed, iterating through the 1024 pixel response values within this plane. The highest value was identified by numerical comparison and taken as the most significant feature response peak, 2.45. This dual-track pooling operation was then performed on each of the 2048 channels: the average feature distribution state of each channel was extracted to construct an average feature column vector of length 2048, and the most significant feature response peak of each channel was extracted to construct a maximum feature column vector of length 2048.
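The dual-track pooling of one channel plane can be sketched as below. The function names and the tiny 2x2 plane are illustrative assumptions; a real plane would be 32x32.

```python
def global_avg_pool(plane):
    """Mean response of one channel plane (list of rows)."""
    values = [v for row in plane for v in row]
    return sum(values) / len(values)

def global_max_pool(plane):
    """Peak response of one channel plane."""
    return max(v for row in plane for v in row)

toy_plane = [[0.5, 1.0],
             [1.5, 2.0]]
avg_response = global_avg_pool(toy_plane)
max_response = global_max_pool(toy_plane)

# Figure from the worked example: a total of 870.4 over 32 x 32 = 1024 pixels.
example_average = 870.4 / 1024
print(avg_response, max_response, example_average)
```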

[0089] The two column vectors are processed using a fully connected shared network composed of multilayer perceptrons. The fully connected shared network consists of one input layer, one hidden layer, and one output layer. The average feature column vector of length 2048 is fed into the input layer, and matrix multiplication is performed using a 2048x128 dimensionality-reducing weight matrix. A ReLU activation function sets all negative response values to zero, outputting a low-dimensional condensed vector of length 128 and achieving nonlinear dimensionality reduction. The low-dimensional condensed vector is then multiplied by a 128x2048 dimensionality-increasing weight matrix, outputting a reconstructed average vector of length 2048. Using exactly the same shared weight matrices, the same dimensionality reduction and dimensionality increase are performed in parallel on the maximum feature column vector, outputting a reconstructed maximum vector. The 128th index value of the reconstructed average vector (1.12) and the 128th index value of the reconstructed maximum vector (1.46) are extracted and superimposed, giving a comprehensive feature response result of 2.58. The superposition is completed by traversing all 2048 index positions, generating a channel-dimension comprehensive feature descriptor that represents the importance of each channel.
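A minimal sketch of the shared network follows, using toy dimensions (4 to 2 to 4) in place of 2048 to 128 to 2048. All names and weight values are hypothetical; the point shown is that the same two weight matrices process both pooled vectors before the outputs are summed.

```python
def relu(vector):
    """Zero out negative responses."""
    return [max(0.0, x) for x in vector]

def matvec(vector, weights):
    """Multiply a row vector by a weight matrix (rows = input dimension)."""
    return [sum(vector[i] * weights[i][j] for i in range(len(vector)))
            for j in range(len(weights[0]))]

def shared_mlp(vector, w_down, w_up):
    """Reduce, apply ReLU, then expand back to the channel dimension."""
    return matvec(relu(matvec(vector, w_down)), w_up)

def channel_descriptor(avg_vec, max_vec, w_down, w_up):
    """Sum the two branch outputs produced with the same shared weights."""
    avg_out = shared_mlp(avg_vec, w_down, w_up)
    max_out = shared_mlp(max_vec, w_down, w_up)
    return [a + m for a, m in zip(avg_out, max_out)]

w_down = [[1, 0], [0, 1], [1, 0], [0, 1]]   # toy 4x2 reducing matrix
w_up = [[1, 0, 1, 0], [0, 1, 0, 1]]         # toy 2x4 expanding matrix
descriptor = channel_descriptor([1, 2, 3, 4], [2, 2, 2, 2], w_down, w_up)
print(descriptor)
```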

[0090] A sigmoid activation function is introduced to perform range compression and probability mapping on the channel-dimension comprehensive feature descriptor. The comprehensive feature response result of 2.58 obtained from the superposition calculation for channel 128 is substituted into the sigmoid formula: 1 divided by the sum of 1 and the natural constant e raised to the power of -2.58. This yields a normalized value of approximately 0.929 after mapping, meaning the channel weight ratio of channel 128 is 0.929. For the 32x32x2048 three-dimensional feature map array corresponding to the semantic features, element-wise dynamic weighted multiplication is performed. The original 32x32 matrix of channel 128 is extracted, and each element is multiplied by the dynamic weight of 0.929 to preserve the activation of the channel containing the key defect information; conversely, if the weight of a channel after mapping is 0.05, multiplying by this weight weakens the background noise represented by that channel.
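The mapping and re-weighting for a single channel can be sketched as follows. The function names and the toy plane are assumptions. Evaluated in floating point, the sigmoid of 2.58 is about 0.9296, which the text reports as 0.929.

```python
import math

def sigmoid(x):
    """Compress a descriptor value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def reweight(plane, weight):
    """Multiply every element of a channel plane by its channel weight."""
    return [[value * weight for value in row] for row in plane]

weight_128 = sigmoid(2.58)              # descriptor value of channel 128
weighted_plane = reweight([[1.0, 2.0], [3.0, 4.0]], weight_128)
print(weight_128)
```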

[0091] After the dynamic weighted multiplication of the semantic features, the same weight mapping and weighted multiplication process is applied to the texture features with a size of 256x256x256. The weighted semantic features are then upsampled with the nearest neighbor algorithm, enlarging their spatial resolution from 32x32 to 256x256 to align with the spatial dimensions of the texture features. The concatenation axis is set to the channel dimension, and the upsampled semantic features (256x256x2048) and the weighted texture features (256x256x256) are concatenated along it to construct the final defect feature tensor with a spatial size of 256x256 and a channel depth of 2304.
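The alignment and concatenation step can be sketched with plain lists. The function names are hypothetical, and a tensor is represented here as a list of channel planes for illustration only.

```python
def upsample_nearest(plane, factor):
    """Enlarge a 2D plane by repeating each element factor times per axis."""
    rows, cols = len(plane), len(plane[0])
    return [[plane[r // factor][c // factor] for c in range(cols * factor)]
            for r in range(rows * factor)]

def concat_channels(first, second):
    """Concatenate two lists of channel planes along the channel axis."""
    return first + second

upsampled = upsample_nearest([[1, 2], [3, 4]], 2)
scale_factor = 256 // 32       # 32x32 semantic planes to 256x256
total_channels = 2048 + 256    # semantic channels plus texture channels
print(upsampled, scale_factor, total_channels)
```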

[0092] Table 2. Quantization Distribution of Channel-Dimensional Comprehensive Feature Descriptors

[0093]

[0094] Table 2 shows the specific data of some channel feature responses and weight mappings. Experimental data shows that when the feature channels correspond to metal microcracks and normal background materials respectively, after the fully connected shared network superposition calculation and mapping processing, the weight of channel 128, which contains key defect feature information such as microcracks, reaches 0.929, while the weight of channel 512, which is pure background, is compressed to 0.224. Through element-wise dynamic weighted multiplication calculation, the feature response amplitude of key defect information is improved. Compared with the baseline model without weight ratio calculation, the feature recall rate of defect edges is improved by 16.5%, verifying the effectiveness of dynamic weighted multiplication calculation in weakening background noise channels.

[0095] The aforementioned residual structure stacked module refers to a network unit composed of multiple convolutional layers containing skip connections. Skip connections allow input data to be directly added to the output features across one or more layers, thereby alleviating the feature degradation and gradient vanishing problems in deep neural networks during backpropagation.

[0096] Referring to Figure 1 and Figure 4, the model dimensionality reduction module is configured to calculate the activation contribution of the convolutional kernels of the detection network, filter redundant branches with activation contributions below a threshold, prune the redundant branches, and compress the defect feature tensor to generate a lightweight mapping graph.

[0097] The specific functions of the model dimensionality reduction module are as follows:

[0098] The contribution calculation submodule obtains the response output tensor of the detection network convolutional kernel to each feature channel during the forward propagation process, calculates the non-zero activation ratio of the output tensor under all validation sample sets and the absolute value of the scaling factor in the batch normalization layer, and comprehensively calculates the activation contribution.

[0099] The branch filtering submodule compares the activation contribution with the preset structure retention threshold, uses a binary search algorithm to locate the network layer node with the least performance impact in the sorted contribution sequence, and identifies and marks redundant branches that are below the threshold and do not affect the global semantic integrity.

[0100] The network pruning submodule filters the network connections and parameter weights corresponding to redundant branches, uses knowledge distillation technology to transfer the network's defect identification capability to the pruned simplified network structure, and compresses the defect feature tensor based on the simplified network structure to generate a lightweight mapping graph.

[0101] The process of comprehensively calculating the activation contribution includes the following:

[0102] The set of batch normalized scaling factors corresponding to the convolutional kernels of the detection network and the channel activation sparsity matrix during the forward propagation of the network are obtained. Based on the preset channel importance evaluation model, the corresponding dimensions in the channel activation sparsity matrix are logarithmically smoothed using the batch normalized scaling factors. The absolute value term of the Taylor expansion is combined to approximate the gradient change rate of the network loss function with respect to the channel parameter, and a contribution evaluation model that can characterize the redundancy of the convolutional kernel is established.

[0103] Calculate the feature map entropy value of the target channel using the contribution evaluation model, and through the formula:

[0104] $$C_i = \alpha \ln\left(1 + \left|\gamma_i\right|\right) + \beta \left|\frac{\partial L}{\partial \theta_i}\right| H_i$$;

[0105] calculate the activation contribution;

[0106] where $C_i$ represents the activation contribution of the $i$-th channel; $\alpha$ represents the preset scaling factor importance penalty coefficient; $\left|\gamma_i\right|$ represents the absolute value of the batch normalization scaling factor corresponding to the $i$-th channel; $\beta$ represents the preset gradient response importance adjustment coefficient; $\left|\partial L / \partial \theta_i\right|$ represents the rate of change of the gradient of the network loss function with respect to the parameters of the $i$-th channel; and $H_i$ represents the information entropy value of the feature map output by the $i$-th channel.

[0107] The forward propagation logs and feature map data of the detection network were obtained when processing 5000 verification samples of drilling and production equipment with oil stains and different wear levels. A defect feature tensor of 256x256x2304 dimensions was extracted and imported into the data buffer of the model dimensionality reduction module. The configuration of the 85th convolutional kernel layer, which performs feature transformation on this tensor, was read from the detection network, and the running status parameters of the batch normalization operation layer corresponding to this layer were extracted. The 2304 channels were traversed, and the batch normalization scaling factor value corresponding to channel 512 was extracted. This scaling factor is used to adjust the variance of the feature distribution of the corresponding channel; its original floating-point value was found to be -0.155. An absolute value operation was performed on this original value, and the absolute value of the batch normalization scaling factor corresponding to channel 512 was calculated to be 0.155.

[0108] The channel activation sparsity matrix is extracted during the forward propagation of the detection network. This matrix records the proportion of non-zero elements in each channel under the validation set input. To evaluate the impact of the feature output on the final classification result, the gradient rate of change of the network loss function with respect to the parameters of each channel is calculated based on the error backpropagation principle. The ground-truth label data containing crack annotations and the predicted probability distribution currently output by the network are fed in, the cross-entropy loss between them is calculated, and the absolute value of the Taylor expansion term is used as an approximation of the loss change. For channel 512, the gradient rate of change of the network loss function with respect to the channel parameters is 0.045.

[0109] The feature map output by channel 512, with a size of 256 x 256, contains a total of 65,536 numerical elements. The range of these elements is divided into 256 discrete intervals with a fixed step size. The number of elements falling into each interval is counted and divided by 65,536 to obtain the probability distribution value of each interval. The probability distribution value of the 8th interval is 0.02. Its information contribution is the negative of 0.02 multiplied by the natural logarithm of 0.02, approximately 0.078. The contributions of all 256 intervals are summed to obtain the information entropy value of the feature map output by channel 512, which represents its feature richness and equals 4.65.
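The entropy calculation can be sketched as below, assuming the usual Shannon form in which an interval with probability p contributes -p·ln(p), so an interval with probability 0.02 contributes about 0.078. The function name and the equal-width binning over the observed value range are assumptions.

```python
import math

def histogram_entropy(values, bins=256):
    """Shannon entropy (in nats) of values binned into equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0                      # a constant map carries no information
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # top edge goes in last bin
        counts[index] += 1
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts if c)

uniform_entropy = histogram_entropy([0, 1, 2, 3], bins=4)
single_term = -0.02 * math.log(0.02)    # contribution of one interval, p = 0.02
print(uniform_entropy, single_term)
```

With 256 intervals the entropy is bounded above by ln(256), about 5.545, so the value of 4.65 quoted for channel 512 is within the possible range.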

[0110] The preset channel importance assessment model parameter library is retrieved, and the scaling factor importance penalty coefficient is set to 0.35, while the gradient response importance adjustment coefficient is set to 0.65. Based on the parameters extracted and calculated above, the core calculation of activation contribution is performed using the following mathematical formula:

[0111] $$C_i = \alpha \ln\left(1 + \left|\gamma_i\right|\right) + \beta \left|\frac{\partial L}{\partial \theta_i}\right| H_i$$;

[0112] where $C_i$ represents the activation contribution of the $i$-th channel; $\alpha$ represents the preset scaling factor importance penalty coefficient; $\left|\gamma_i\right|$ represents the absolute value of the batch normalization scaling factor corresponding to the $i$-th channel; $\beta$ represents the preset gradient response importance adjustment coefficient; $\left|\partial L / \partial \theta_i\right|$ represents the rate of change of the gradient of the network loss function with respect to the parameters of the $i$-th channel; and $H_i$ represents the information entropy value of the feature map output by the $i$-th channel.

[0113] Substituting the absolute value of the batch normalized scaling factor of channel 512, 0.155, into the front term of the formula, we calculate 1 plus 0.155 equals 1.155. The natural logarithm of 1.155 is extracted as 0.144. Multiplying this logarithm by the scaling factor importance penalty coefficient of 0.35 yields the first part of the structural importance score, 0.0504. The obtained parameter gradient change rate of 0.045 is multiplied by the information entropy value of the output feature map, 4.65, resulting in 0.20925. Multiplying this product by the gradient response importance adjustment coefficient of 0.65 yields the second part of the information importance score, 0.13601. Adding the first and second part scores, i.e., 0.0504 plus 0.13601, gives the final result of 0.18641. This numerical result indicates that the activation contribution of channel 512 is 0.18641, representing the weight of this channel in maintaining the global semantic integrity of the network.
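The substitution above can be reproduced with a short sketch. The function and parameter names are hypothetical; the formula is the one implied by the worked arithmetic (a logarithmic scaling-factor term plus a gradient-times-entropy term).

```python
import math

def activation_contribution(bn_scale, grad_rate, entropy,
                            alpha=0.35, beta=0.65):
    """Combine the scaling-factor term and the gradient-entropy term."""
    structural = alpha * math.log(1 + abs(bn_scale))
    informational = beta * abs(grad_rate) * entropy
    return structural + informational

# Channel 512 values from the worked example.
contribution_512 = activation_contribution(-0.155, 0.045, 4.65)
print(round(contribution_512, 4))
```

Evaluated without intermediate rounding the result is about 0.18645; the figure of 0.18641 in the text comes from rounding the first term to 0.0504 before adding.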

[0114] All 2304 channels of the defect feature tensor are traversed, and the activation contribution of each is calculated according to the above processing flow, generating a continuous floating-point sequence of length 2304. The quicksort algorithm is used to sort this contribution sequence in ascending order. With a target model compression rate of 45%, the number of network channels to be retained is 2304 multiplied by 0.55, rounded down to 1267 channels, so 1037 channels must be pruned. A binary search algorithm is used to locate the 1037th element in the sorted sequence, and the corresponding activation contribution value of 0.215 is established as the preset structure retention threshold.
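The channel budget and threshold selection can be sketched as follows. The function name is hypothetical, Python's built-in sort stands in for the quicksort named in the text, and direct indexing stands in for the binary search; the contribution values are synthetic.

```python
import math

def retention_plan(contributions, compression_rate):
    """Return (channels kept, channels pruned, retention threshold)."""
    total = len(contributions)
    # Round down; the small epsilon guards against float error in the product.
    keep = math.floor(total * (1 - compression_rate) + 1e-9)
    prune = total - keep
    ordered = sorted(contributions)
    # The contribution of the last pruned element serves as the threshold.
    threshold = ordered[prune - 1] if prune > 0 else None
    return keep, prune, threshold

synthetic = [i * 0.001 for i in range(2304)]   # placeholder contributions
keep, prune, threshold = retention_plan(synthetic, 0.45)
print(keep, prune, threshold)
```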

[0115] Based on a structure retention threshold of 0.215, the network layer nodes are traversed sequentially. When an activation contribution of 0.18641 is identified for a branch, it is determined to be below the threshold of 0.215 and not within the global semantic core retention list, thus marking it as a redundant branch with the least performance impact. Network pruning is performed, physically cutting off and removing the network connections and parameter weight matrices corresponding to redundant branches, and reconstructing the feature transmission channels. Knowledge distillation technology is deployed, designating the unpruned network as the teacher network and the simplified network structure with the number of channels reduced to 1267 after pruning as the student network. The soft-label probability distribution data carrying the temperature smoothing coefficient in the output layer of the teacher network is extracted and used together with the output results of the student network to calculate the value of the difference function. This value is used to update the weight parameters of the retained channels in the student network, transferring the defect identification capability to the pruned network. By remapping the initial 256x256x2304 defect feature tensor using a simplified network structure after training, redundant feature planes are removed, resulting in a lightweight mapping map with dimensions of 16x16x512. Test data shows that with a compression rate of 45%, the computational cost decreased by 12 GFLOPs, the inference latency was reduced from 35 milliseconds to 18 milliseconds, and the average recognition accuracy fluctuated by only 0.3%, achieving the model dimensionality reduction target.
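The text does not name the difference function used between teacher and student outputs. As one common choice, the sketch below assumes a temperature-scaled Kullback-Leibler divergence; the function names, the temperature of 4.0, and the logits are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-smoothed probability distribution over classes."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                       # subtract max for stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence from teacher soft labels to student predictions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl

matched = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
mismatched = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
print(matched, mismatched)
```

A loss of zero means the student reproduces the teacher's soft labels exactly; in training this term would drive the weight updates of the retained channels.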

[0116] The aforementioned knowledge distillation technique refers to a model compression technique in which the soft-label probability distribution learned by the complex, parameter-rich teacher network is used as a supervision signal to guide the weight updates of the simplified student network with fewer parameters, thereby transferring the feature extraction capability of the teacher network to the student network.

[0117] Referring to Figure 1 and Figure 5, the classification and determination module is configured to input the lightweight mapping map into the edge classifier to calculate the anomaly confidence score, compare the anomaly confidence score with the safe interval, and generate an alarm command if the anomaly confidence score exceeds the safe interval.

[0118] The specific functions of the classification and determination module are as follows:

[0119] The confidence inference submodule acquires a lightweight mapping map and inputs it into the edge classifier pre-deployed on the terminal device. It maps the multi-dimensional feature space to a one-dimensional class probability space through a fully connected layer and calculates the anomaly confidence by combining the Softmax activation function for probability normalization.

[0120] The interval comparison submodule obtains the confidence distribution statistics under normal conditions from the system's historical operation database, determines the safe interval by setting the upper and lower limits of reasonable fluctuations based on the Gaussian distribution model, and compares the abnormal confidence with the safe interval one by one to obtain the out-of-bounds deviation value.

[0121] The alarm triggering submodule determines that if the obtained out-of-bounds deviation value is greater than zero, it indicates that there is a structural defect or surface damage risk in the current drilling and production equipment. It calls the underlying hardware control interface of the system to encapsulate the out-of-bounds alarm information into a standard communication protocol data packet, and generates an alarm command if the abnormal confidence level exceeds the safe range.

[0122] The process of calculating the confidence level of anomalies specifically includes:

[0123] Obtain the deep multidimensional abstract feature vector contained in the lightweight mapping graph, flatten the deep multidimensional abstract feature vector into a one-dimensional continuous feature sequence, configure a self-attention mechanism module that can capture long-distance dependencies in the sequence to calculate the correlation weight matrix between each element in the feature sequence.

[0124] The weighted summation of elements in the feature sequence is calculated and the aggregated global representation vector is obtained based on the correlation weight matrix. The global representation vector is then input into a nonlinear fully connected network containing a dropout layer for feature space transformation and mapping, and the original logistic regression scores of each detection category are output.

[0125] The numerical items representing defect and anomaly categories are selected from the original logistic regression scores. An exponential function converts the numerical item of each anomaly category into a positive number, which is then divided by the sum of the exponential conversion values of all categories. The anomaly confidence level is calculated from the resulting probability distribution values.

[0126] The process of determining a safe zone specifically includes:

[0127] The system obtains the baseline confidence sampling sequence and the dynamic variable of the equipment operating environment temperature recorded in the system's historical operation database during the fault-free period. Based on the preset boundary adaptive adjustment model, the discrete variance of the baseline confidence sampling sequence is linearly compensated using the dynamic variable of the equipment operating environment temperature. Combined with the confidence mean drift within the time sliding window, a dynamic safety threshold model that can dynamically track the system health baseline is constructed.

[0128] The temperature compensation coefficient for the current sampling period is calculated based on the dynamic safety threshold model, and through the formula:

[0129] $$S = \mu \pm \left(k\,\sigma + \omega\,\Delta T\right)$$;

[0130] the safe zone is determined;

[0131] where $S$ represents the upper and lower bounds of the dynamically adjusted safety range; $\mu$ represents the statistical average of historical normal confidence levels within the specified time sliding window; $k$ represents the preset confidence level standard deviation tolerance factor; $\sigma$ represents the standard deviation of historical normal confidence levels within the specified time sliding window; $\omega$ represents the weighting coefficient for the influence of the equipment's operating environment temperature on the confidence benchmark; and $\Delta T$ represents the absolute deviation between the current operating ambient temperature of the equipment and the standard reference temperature.

[0132] Obtain the lightweight mapping map with dimensions of 16 x 16 x 512 output by the model dimensionality reduction module. Call the feature flattening logic to flatten the deep multidimensional abstract feature vector in 3D space along continuous memory addresses, multiplying 16 by 16 by 512 to obtain 131072 and generating a one-dimensional continuous feature sequence of length 131072. Input this feature sequence into the self-attention mechanism module, multiplying it by three independently initialized weight matrices to generate a query vector array, a key vector array, and a value vector array. Extract the first element vector from the query vector array and perform a dot product operation with the transpose of the key vector array. Divide the dot product result by the square root of the key vector dimension (8) for numerical scaling, and then use the Softmax function to calculate the relevance weight matrix. This matrix contains the relevance weight values between the elements of the feature sequence. Multiply the relevance weight matrix by the value vector array to calculate the weighted sum of the elements in the feature sequence. Extract the results of the operation and superimpose residual connections to obtain the aggregated global representation vector.
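The scaled dot-product step can be sketched on toy data. The function names and the tiny vectors are assumptions, and the projection matrices and the residual connection described above are omitted for brevity.

```python
import math

def softmax(scores):
    """Normalize a list of scores into weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over small lists of vectors."""
    scale = math.sqrt(len(keys[0]))          # square root of key dimension
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

flattened_length = 16 * 16 * 512
# Identical keys give equal weights, so the output is the mean of the values.
result = attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]],
                   [[0.0, 2.0], [4.0, 6.0]])
print(flattened_length, result)
```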

[0133] The global representation vector is input into a nonlinear fully connected network. This network contains three cascaded linear layers, with a dropout layer (dropout probability 0.3) connected between layers 1 and 2 to prevent overfitting. The fully connected layers map the multidimensional feature space to a one-dimensional class probability space of length 4, outputting raw logistic regression scores for four preset detection categories (normal structure, surface wear, deep cracks, and deformation anomalies). The extracted raw logistic regression score array is [0.2, 3.5, -1.2, 0.8]. An exponential function is applied to each score in the array: 0.2 becomes 1.22, 3.5 becomes 33.11, -1.2 becomes 0.30, and 0.8 becomes 2.22. The four transformed values are summed: 1.22 + 33.11 + 0.30 + 2.22, giving a total of 36.85. Dividing the exponential value of each category by the sum of 36.85 yields normalized probabilities of 0.033, 0.898, 0.008, and 0.060. The numerical items representing the defect anomaly categories (wear, cracks, and deformation) are selected, and the probability values of these three anomaly categories (0.898, 0.008, and 0.060) are summed to give a total anomaly confidence score of 0.966.
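The normalization and summation can be checked with a short sketch. The variable names are hypothetical. Computed without intermediate truncation, the anomaly confidence is about 0.967, which agrees with the 0.966 in the text to within rounding.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Order: normal structure, surface wear, deep cracks, deformation anomalies.
raw_scores = [0.2, 3.5, -1.2, 0.8]
probabilities = softmax(raw_scores)
anomaly_confidence = sum(probabilities[1:])   # every category except normal
print([round(p, 3) for p in probabilities], round(anomaly_confidence, 3))
```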

[0134] Establish a physical connection to the historical operation database and read the equipment's normal operating status characteristics recorded during the past 720-hour fault-free period. Extract the baseline confidence level sampling sequence and calculate the statistical mean of all normal confidence level samples within this specified time sliding window as 0.045, with a statistical standard deviation of 0.012. Simultaneously read the current ambient temperature measured by the temperature sensor deployed on-site, which is 65 degrees Celsius. Read the preset standard reference temperature as 25 degrees Celsius. Calculate the absolute deviation of the dynamic variable of ambient temperature as 65 minus 25, which equals 40.

[0135] The preset confidence level standard deviation tolerance factor parameter within the boundary adaptive adjustment model is retrieved, with a fixed value of 4; the influence weighting coefficient of the equipment operating environment temperature on the confidence level benchmark is extracted, with a calibration value of 0.0015. Based on the aforementioned extracted historical normal confidence level statistical average, standard deviation, and environmental temperature dynamic variable parameters, the safety range is determined through the following mathematical formula:

[0136] $$S = \mu \pm \left(k\,\sigma + \omega\,\Delta T\right)$$;

[0137] where $S$ represents the upper and lower bounds of the dynamically adjusted safety range; $\mu$ represents the statistical average of historical normal confidence levels within the specified time sliding window; $k$ represents the preset confidence level standard deviation tolerance factor; $\sigma$ represents the standard deviation of historical normal confidence levels within the specified time sliding window; $\omega$ represents the weighting coefficient for the influence of the equipment operating environment temperature on the confidence benchmark; and $\Delta T$ represents the absolute deviation between the current operating ambient temperature of the equipment and the standard reference temperature.

[0138] Multiplying the obtained confidence level standard deviation tolerance factor of 4 by the standard deviation of 0.012 yields an inherent fluctuation tolerance of 0.048. Multiplying the influence weighting coefficient of 0.0015 by the absolute temperature deviation of 40 yields a temperature compensation coefficient of 0.06 for the current sampling period. Summing the inherent fluctuation tolerance of 0.048 and the temperature compensation coefficient of 0.06 yields a total dynamic offset of 0.108. Adding and subtracting the total dynamic offset of 0.108 from the statistical average of 0.045 yields upper and lower bounds of the dynamically adjusted safety range of -0.063 and 0.153, respectively. Since the confidence level cannot be less than zero, a truncation operation is performed to clear the lower limit of the safety range to zero, ultimately determining the safety range to be 0 to 0.153. This result indicates that the maximum permissible normal feature activation threshold for the device at this ambient temperature is 0.153.
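The interval calculation above can be sketched as follows. The function and parameter names are hypothetical; only the lower bound is truncated at zero, as described in the text.

```python
def safety_interval(mean, std, tolerance, temp_weight, temp_deviation):
    """Return (lower, upper) bounds of the dynamically adjusted safe range."""
    offset = tolerance * std + temp_weight * abs(temp_deviation)
    lower = max(0.0, mean - offset)   # confidence cannot be negative
    upper = mean + offset
    return lower, upper

# Worked example: 65 degrees Celsius against a 25 degree reference.
lower, upper = safety_interval(0.045, 0.012, 4, 0.0015, 65 - 25)
# Same statistics at the reference temperature (no compensation).
baseline_upper = safety_interval(0.045, 0.012, 4, 0.0015, 0)[1]
print(lower, round(upper, 3), round(baseline_upper, 3))
```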

[0139] The anomaly confidence level of 0.966 for the current period, calculated in the previous steps, is extracted and compared with the dynamically adjusted safety range of 0 to 0.153. The anomaly confidence level of 0.966 exceeds the upper limit of 0.153, and subtracting 0.153 from 0.966 yields an out-of-bounds deviation value of 0.813. Since the out-of-bounds deviation value of 0.813 is greater than zero, the classification and determination module determines that the current drilling and production equipment has a structural defect or surface damage risk. The underlying RS485 serial communication interface is called to encapsulate the out-of-bounds deviation value, the anomaly confidence level, and the occurrence timestamp into a data packet conforming to the Modbus RTU standard communication protocol. The frame begins with the device address 01, and the function code is set to 05 (Write Single Coil). An alarm instruction code containing the above parameters is generated and sent to the programmable logic controller, which triggers the warning light and drives the buzzer to sound an alarm.
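
By way of example only, the out-of-bounds check and the coil-write frame can be sketched in Python as follows. The coil address `0x0000` and the function names are hypothetical, since the embodiment does not state which coil drives the warning light. A function code 05 request carries only a coil address and an ON/OFF value, so the sketch builds that frame alone; how the deviation value, confidence level, and timestamp are packed is not specified in the embodiment and is not shown.

```python
import struct


def out_of_bounds_deviation(confidence: float, upper: float) -> float:
    """Amount by which the confidence exceeds the upper limit (0 if inside)."""
    return max(0.0, confidence - upper)


def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by Modbus RTU (initial value 0xFFFF, polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x0001:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_write_single_coil(slave_addr: int, coil_addr: int, on: bool) -> bytes:
    """Build a Modbus RTU request frame for function code 05."""
    value = 0xFF00 if on else 0x0000  # 0xFF00 = ON, 0x0000 = OFF
    body = struct.pack(">BBHH", slave_addr, 0x05, coil_addr, value)
    # The CRC is appended low byte first.
    return body + struct.pack("<H", crc16_modbus(body))


deviation = out_of_bounds_deviation(0.966, 0.153)
if deviation > 0:
    # Hypothetical coil address 0x0000 for the alarm output.
    frame = build_write_single_coil(slave_addr=0x01, coil_addr=0x0000, on=True)
    print(frame.hex(" "))  # prints: 01 05 00 00 ff 00 8c 3a
```

Transmission over the RS485 port is omitted; in practice the bytes would be written to the serial interface by whatever driver the edge node uses.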

[0140] Table 3 Compensation Table for Equipment Operating Ambient Temperature and Dynamic Safety Threshold

[0141]

[0142] Table 3 shows the compensation data of the dynamic safety threshold model under different operating temperature conditions. The experimental data indicate that when the operating temperature of the drilling platform reaches 65 degrees Celsius, introducing the absolute deviation value of 40 into the linear compensation calculation raises the upper limit of the safety range from 0.093 at the 25 degrees Celsius baseline to 0.153. Where temperature-induced thermal distortion of the optical properties of metal components interferes with detection, this temperature compensation brings 12 heat-induced instances of abnormally high confidence (ranging from 0.11 to 0.14) within the safety range. Compared with the comparison method using a fixed threshold of 0.093, the dynamic safety threshold model reduces the false out-of-bounds alarm rate under environmental interference by 95%, demonstrating the effectiveness of using the dynamic variable of the equipment operating ambient temperature for linear compensation of the discrete variance.
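
The effect described in this paragraph can be illustrated with a short Python sketch. The twelve readings below are hypothetical values spread evenly over the stated range of 0.11 to 0.14 and are not the experimental data of Table 3; the name `count_alarms` is likewise an assumption. The sketch shows only why such readings raise alarms under the fixed threshold and not under the compensated one; it does not reproduce the reported 95% figure.

```python
def count_alarms(confidences, upper_limit):
    """Count readings whose out-of-bounds deviation is greater than zero."""
    return sum(1 for c in confidences if c - upper_limit > 0)


# Hypothetical heat-induced readings between 0.11 and 0.14 (illustrative only).
heat_induced = [round(0.11 + 0.03 * i / 11, 4) for i in range(12)]

FIXED_UPPER = 0.093    # upper limit at the 25 degrees Celsius baseline
DYNAMIC_UPPER = 0.153  # upper limit compensated for 65 degrees Celsius

print(count_alarms(heat_induced, FIXED_UPPER))    # prints: 12
print(count_alarms(heat_induced, DYNAMIC_UPPER))  # prints: 0

# A genuine defect reading such as 0.966 still crosses the compensated limit.
print(count_alarms([0.966], DYNAMIC_UPPER))       # prints: 1
```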

[0143] The Modbus RTU standard communication protocol mentioned above is a serial communication protocol used between industrial electronic controllers. It transmits data in binary form, uses a master-slave node addressing architecture and a cyclic redundancy check mechanism, and is used here to implement data exchange between edge devices and programmable logic controllers.

[0144] The above embodiments illustrate preferred embodiments of the present invention. Any equivalent adjustments to the technical solution based on software engineering methods are within the scope of protection, including but not limited to: implementing algorithm logic using different programming languages, refactoring functional modules into services, adjusting data interaction protocols, and optimizing resource scheduling strategies. Any implementation scheme derived from reasonable modifications to the data processing flow, service call chain, or system architecture layer without departing from the core technology of the present invention should be considered within the protection scope defined by the technical solution of the present invention.

Claims

1. A deep learning-based defect detection system for oil drilling and production equipment, characterized in that the system comprises: a matrix conversion module configured to acquire visible light images of the drilling and production equipment, convert the visible light images to construct a pixel matrix, extract local brightness features of the pixel matrix, eliminate illumination interference, and generate a standard image matrix; a feature extraction module configured to extract texture features and semantic features of the standard image matrix using a network model, calculate a channel weight ratio of the texture features and semantic features, and concatenate them to construct a defect feature tensor; a model dimensionality reduction module configured to calculate an activation contribution of the convolution kernels of the detection network, filter out redundant branches whose activation contribution is lower than a threshold, prune the redundant branches, and compress the defect feature tensor to generate a lightweight mapping map; and a classification and determination module configured to input the lightweight mapping map into an edge classifier to calculate an anomaly confidence, compare the anomaly confidence with a safe interval, and generate an alarm command if the anomaly confidence exceeds the safe interval.

2. The deep learning-based defect detection system for oil drilling and production equipment according to claim 1, characterized in that the specific functions of the matrix conversion module are as follows: an image acquisition submodule acquires the visible light image, maps the visible light image to a grayscale space using a preset color space conversion matrix, performs scale normalization on the grayscale image using a bilinear interpolation algorithm, and extracts the grayscale coordinates of each pixel in the image to construct the pixel matrix; a brightness extraction submodule analyzes the spatial distribution pattern of the pixels in the pixel matrix, uses a local Gaussian smoothing filter to remove high-frequency noise from the pixel matrix, calculates the gray-level gradient change rate between adjacent pixels, and selects a set of key pixels with significant features based on the gray-level gradient change rate to extract the local brightness features; and an illumination cancellation submodule compares the local brightness features with a preset global ambient illumination distribution benchmark, uses a homomorphic filtering algorithm to separate the low-frequency illumination reflection component and the high-frequency object detail component in the pixel matrix, suppresses the amplitude of the low-frequency illumination reflection component, enhances the high-frequency object detail component, and eliminates illumination interference to generate the standard image matrix.

3. The deep learning-based defect detection system for oil drilling and production equipment according to claim 1, characterized in that the specific functions of the feature extraction module are as follows: a feature extraction submodule obtains the standard image matrix, inputs it into a pre-trained multi-scale convolutional network model, extracts spatial texture details and deep abstract semantic structures under different receptive fields through multiple convolution kernels, and outputs the corresponding initial-dimension texture features and semantic features respectively; a weight calculation submodule analyzes the energy distribution of the texture features and semantic features in different channel dimensions, calculates the global response value of each feature channel using a global average pooling operation, performs probability mapping on the global response values of all channels using a normalized exponential function, and calculates the channel weight ratio; and a tensor splicing submodule optimizes the expression weights of the texture features and semantic features in different channel dimensions, uses the channel weight ratio to perform element-wise dynamic weighted multiplication on each corresponding channel of the texture features and semantic features, thereby enhancing the expressive power of channels containing key defect information and weakening background noise channels, and then performs a concatenation operation along the preset channel dimension to construct the defect feature tensor.

4. The deep learning-based defect detection system for oil drilling and production equipment according to claim 1, characterized in that the specific functions of the model dimensionality reduction module are as follows: a contribution calculation submodule obtains the response output tensor of the convolution kernels of the detection network for each feature channel during forward propagation, calculates the non-zero activation ratio of the output tensor over the entire validation sample set and the absolute value of the scaling factor in the batch normalization layer, and comprehensively calculates the activation contribution; a branch filtering submodule compares the activation contribution with a preset structure retention threshold, uses a binary search algorithm to locate the network layer node with the least performance impact in the sorted contribution sequence, and identifies and marks redundant branches that are below the threshold and do not affect global semantic integrity; and a network pruning submodule filters out the network connections and parameter weights corresponding to the redundant branches, uses knowledge distillation to transfer the defect identification capability of the network to the pruned simplified network structure, and compresses the defect feature tensor based on the simplified network structure to generate the lightweight mapping map.

5. The deep learning-based defect detection system for oil drilling and production equipment according to claim 1, characterized in that the specific functions of the classification and determination module are as follows: a confidence inference submodule obtains the lightweight mapping map, inputs it into the edge classifier pre-deployed in the terminal device, maps the multi-dimensional feature space to a one-dimensional category probability space through a fully connected layer, and calculates the anomaly confidence using the Softmax activation function for probability normalization; an interval comparison submodule obtains the confidence distribution statistics under normal conditions from the historical operation database of the system, determines the safe interval by setting the upper and lower limits of reasonable fluctuation based on a Gaussian distribution model, and compares the anomaly confidence with the safe interval to obtain an out-of-bounds deviation value; and an alarm triggering submodule determines, if the obtained out-of-bounds deviation value is greater than zero, that the current drilling and production equipment has a structural defect or surface damage risk, calls the underlying hardware control interface of the system to encapsulate the out-of-bounds alarm information into a standard communication protocol data packet, and, having determined that the anomaly confidence exceeds the safe interval, generates the alarm command.

6. The deep learning-based defect detection system for oil drilling and production equipment according to claim 2, characterized in that the process of eliminating illumination interference to generate the standard image matrix specifically includes: obtaining the local brightness features and the initial pixel matrix, constructing a two-dimensional logarithmic domain function capable of mapping the attenuation law of ambient light intensity, and transforming the pixel matrix from linear space to logarithmic space to separate the incident light component and the reflected light component; analyzing the spectral distribution information contained in the local brightness features, constructing an adaptive Gaussian high-pass filter transfer matrix, calculating the Euclidean distance between the center frequency and the cutoff frequency of the pixel, and suppressing, by means of the Gaussian high-pass filter transfer matrix, the low-frequency incident light component representing slow changes in illumination while retaining the high-frequency reflected light component; and restoring the image data after frequency domain processing to the original spatial domain using the inverse exponential transform function, while using a histogram equalization algorithm to stretch the contrast distribution range of the image, and integrating all enhanced pixels to generate the standard image matrix.

7. The deep learning-based defect detection system for oil drilling and production equipment according to claim 3, characterized in that the process of calculating the channel weight ratio specifically includes: obtaining the three-dimensional feature map arrays corresponding to the texture features and semantic features; performing a global max pooling operation along the spatial dimensions to extract the most significant feature response peak within each channel and, at the same time, performing a global average pooling operation to extract the average feature distribution state within each channel; inputting the most significant feature response peaks and the average feature distribution states into a shared fully connected network composed of a multilayer perceptron for nonlinear dimensionality reduction followed by dimensionality increase, and superimposing the two sets of output vectors produced by the network to generate a comprehensive feature descriptor of the channel dimension; and applying a Sigmoid activation function to the elements of the comprehensive feature descriptor for range compression, nonlinearly mapping the values of all dimensions to the continuous interval between zero and one, and calculating the channel weight ratio based on the normalized numerical sequence obtained after mapping.

8. The deep learning-based defect detection system for oil drilling and production equipment according to claim 4, characterized in that the process of comprehensively calculating the activation contribution specifically includes: obtaining the set of batch normalization scaling factors corresponding to the convolution kernels of the detection network and the channel activation sparsity matrix during forward propagation of the network; based on a preset channel importance evaluation model, using the batch normalization scaling factors to perform logarithmic smoothing on the corresponding dimensions of the channel activation sparsity matrix; combining the absolute value term of the Taylor expansion to approximate the gradient change rate of the network loss function with respect to the channel parameters, and establishing a contribution evaluation model capable of characterizing the redundancy of the convolution kernels; and calculating the feature map entropy value of the target channel and, based on the contribution evaluation model, calculating the activation contribution using the formula: ; where the terms of the formula represent, respectively: the activation contribution of the i-th channel; the preset scaling factor importance penalty coefficient; the absolute value of the batch normalization scaling factor corresponding to the i-th channel; the preset gradient response importance adjustment coefficient; the gradient change rate of the network loss function with respect to the parameters of the i-th channel; and the information entropy value of the feature map output by the i-th channel.

9. The deep learning-based defect detection system for oil drilling and production equipment according to claim 5, characterized in that the process of calculating the anomaly confidence specifically includes: obtaining the deep multidimensional abstract feature vector contained in the lightweight mapping map, flattening it into a one-dimensional continuous feature sequence, and configuring a self-attention mechanism module capable of capturing long-distance dependencies in the sequence to calculate the correlation weight matrix between the elements of the feature sequence; calculating the weighted sum of the elements of the feature sequence based on the correlation weight matrix to obtain an aggregated global representation vector; inputting the global representation vector into a nonlinear fully connected network containing a dropout layer for feature space transformation and mapping, and outputting the raw logit scores of each detection category; and selecting the numerical items representing the defect and anomaly categories in the raw logit scores, using an exponential function to convert the numerical items of the anomaly categories into positive numbers, dividing them by the sum of the exponentially converted values of all categories, and calculating the anomaly confidence based on the resulting probability distribution values.

10. The deep learning-based defect detection system for oil drilling and production equipment according to claim 5, characterized in that the process of determining the safe interval specifically includes: obtaining the baseline confidence sampling sequence recorded during fault-free periods in the historical operation database of the system, together with the dynamic variable of the equipment operating ambient temperature; based on a preset boundary adaptive adjustment model, linearly compensating the discrete variance of the baseline confidence sampling sequence using the dynamic variable of the equipment operating ambient temperature and, in combination with the confidence mean drift within the time sliding window, constructing a dynamic safety threshold model capable of dynamically tracking the system health baseline; and calculating the temperature compensation coefficient for the current sampling period based on the dynamic safety threshold model, and determining the safe interval using the formula: $[S_{\mathrm{low}},\ S_{\mathrm{up}}] = \mu \pm \left(k \cdot \sigma + \lambda \cdot \Delta T\right)$; where $S_{\mathrm{low}}$ and $S_{\mathrm{up}}$ represent the lower and upper bounds of the dynamically adjusted safe interval; $\mu$ represents the statistical average of historical normal confidence levels within a specified time sliding window; $k$ represents the preset confidence level standard deviation tolerance factor; $\sigma$ represents the standard deviation of historical normal confidence levels within a specified time sliding window; $\lambda$ represents the weighting coefficient for the influence of the equipment operating ambient temperature on the confidence benchmark; and $\Delta T$ represents the absolute deviation between the current operating ambient temperature of the equipment and the standard reference temperature.