An air conditioner indoor unit cleanliness degree detection method based on deep learning
Using the differentiable physical rendering and geometric consistency verification mechanism of an improved BiSeNet model, the method constructs a differential depth map and concatenates feature tensors. This addresses the loss of high-frequency information on complex curved surface structures in the cleanliness detection of air conditioner indoor units, and achieves high-precision dust accumulation area segmentation and cleanliness detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGDONG JIAFU DIGITAL TECHNOLOGY CO LTD
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing 3D reconstruction and segmentation algorithms neglect the physical occlusion relationships hidden in complex curved surface structures and the microscopic optical characteristics of dust accumulation layers in the cleanliness detection of air conditioner indoor units. This leads to the loss of high-frequency geometric information and the aliasing of texture features, making it difficult to effectively distinguish between small dust accumulations and background structures, thus reducing the reliability and accuracy of detection.
An improved BiSeNet model is adopted. Through a differentiable physical rendering and geometric consistency verification mechanism, a differential depth map is constructed and concatenated into a dual-channel feature tensor. High-resolution and low-resolution feature maps are extracted by the spatial detail encoding layer and the context semantic encoding layer, respectively. Combining the geometric confidence mask with attention feature fusion, an effective dust accumulation area mask is generated.
It achieves high-precision segmentation of dust accumulation areas on complex curved surfaces, improving the accuracy and reliability of cleanliness detection of air conditioner indoor units, and overcoming the limitations of low three-dimensional data utilization and blurred edges in traditional methods.
Figure CN122243980A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer vision technology, and in particular to a method for detecting the cleanliness of an air conditioner indoor unit based on deep learning.
Background Technology
[0002] With the development of industrial testing equipment towards higher precision and real-time capabilities, 3D point cloud acquisition and reconstruction technology based on structured light has been widely applied in the field of cleanliness detection for air conditioner indoor units. Existing 3D reconstruction and segmentation algorithms, such as the classic point cloud registration combined with deep learning image segmentation, while utilizing phase calculation and triangulation principles to obtain the surface morphology of the target, primarily rely on the Euclidean distance transformation of the original point cloud coordinates and the statistical feature extraction of convolutional neural networks. This method, based solely on geometric coordinates and image pixel statistics, ignores the implicit physical occlusion relationships within the complex curved surface structure of the air conditioner indoor unit and the unique microscopic optical properties (such as scattering rate and density) of the dust accumulation layer. This leads to the loss of high-frequency geometric information and the aliasing of texture features during surface model construction, thus limiting the effective differentiation between small dust accumulations and background structures. Furthermore, classic cleanliness detection methods often struggle to fully utilize the depth prior geometric constraints in the reconstructed data to optimize the network feature extraction process, resulting in numerous false positives and false negatives when dealing with surface reflections, shadow interference, and low-texture dust accumulation areas, reducing the reliability and accuracy of air conditioner indoor unit cleanliness detection.
[0003] Therefore, how to provide a deep learning-based method for detecting the cleanliness of air conditioner indoor units is a problem that urgently needs to be solved by those skilled in the art.
Summary of the Invention
[0004] This invention proposes a deep learning-based method for detecting the cleanliness of an air conditioner indoor unit. Using the differentiable physical rendering and geometric consistency verification mechanism of an improved BiSeNet model, it constructs a differential depth map from the decoupled standard depth map and concatenates it with the diffuse texture map to generate a dual-channel feature tensor. The spatial detail encoding layer of the improved BiSeNet model calculates the depth gradient field and aggregates features along the depth contour lines to obtain a high-resolution spatial feature map. The context semantic encoding layer performs viewpoint transformation and feature resampling to obtain a low-resolution semantic feature map with disparity invariance. The low-resolution semantic feature map is input into the geometric consistency verification layer, which calculates a physically predicted depth map and a geometric confidence mask based on the differentiable physical rendering mapping. The geometric confidence mask performs gated modulation in the attention feature fusion layer, which outputs a fused and enhanced feature map from which an effective dust accumulation area mask is generated. By establishing a closed-loop feedback path in the improved BiSeNet model, from physical geometry guidance to multi-view feature extraction and rendering residual verification, this mechanism overcomes the insensitivity of traditional image segmentation methods to geometric structural features. It ensures that the effective dust accumulation area mask accurately preserves high-frequency texture details and conforms to the physical distribution law of dust accumulation. This improves the accuracy of dust accumulation area segmentation on complex curved surfaces while optimizing cleanliness detection based on the dust accumulation density coefficient. The invention overcomes the limitations of traditional air conditioner indoor unit detection methods, such as low 3D data utilization, blurred edges, and neglect of physical constraints, providing a high-precision solution for the cleanliness detection of air conditioner indoor units.
[0005] A method for detecting the cleanliness of an air conditioner indoor unit based on deep learning, according to an embodiment of the present invention, specifically includes:
[0006] S1. Project a sinusoidal phase-shift coding pattern under clean conditions, acquire multiple frames of phase-shift images and calculate the phase, and reconstruct a three-dimensional digital reference model by combining calibration parameters.
[0007] S2. Project the coded pattern according to the preset cycle while the air conditioner is running, collect real-time images and decode and reconstruct the real-time three-dimensional point cloud data of the target component surface;
[0008] S3. Project the real-time 3D point cloud data onto the depth image space, construct a robust objective function and calculate the initial transformation matrix to obtain coarse registration point cloud data;
[0009] S4. Based on the coarsely registered point cloud data, a dynamic weight map is constructed. Outliers are dynamically removed by using the point-to-plane ICP algorithm combined with the cosine value of the angle between the normal vectors. The coordinates are then corrected to obtain the registered real-time 3D point cloud data.
[0010] S5. Extract the texture grayscale of the registered real-time 3D point cloud data, and construct a joint implicit field by fitting the mapping from spatial coordinates to the signed distance field and illumination reflectivity, using the geometric topology of the 3D digital reference model as a constraint.
[0011] S6. Using the ray marching algorithm, starting from the initial spatial position of the registered real-time 3D point cloud data, sample and track the joint implicit field to decouple and generate a standard depth map and a diffuse texture map.
[0012] S7. Input the standard depth map and diffuse texture map into the improved BiSeNet model; project the 3D digital reference model to calculate the differential depth map and concatenate it with the diffuse texture map to construct a dual-channel feature tensor; generate a geometric confidence mask from the geometric residual based on the differentiable physical rendering mapping; and use the geometric confidence mask to perform gated attention fusion on the dual-scale features and output the effective dust accumulation area mask.
[0013] S8. Based on the effective dust accumulation area mask, perform three-dimensional integration of the differential depth map to calculate the total dust accumulation volume, establish the mapping between diffuse texture and differential depth to solve for the dust accumulation density coefficient, and match the cleanliness threshold to output the level result and trigger control.
[0014] Optionally, S1 specifically includes:
[0015] S11. Solve for the wrapped phase and calculate the absolute phase using the multi-frequency heterodyne principle. Combining the calibration parameters and the system calibration matrix, calculate the three-dimensional spatial coordinates of each pixel according to the phase-depth mapping formula, and convert the coordinates to the world coordinate system to generate the initial reference point cloud data.
[0016] S12. Perform statistical filtering on the initial reference point cloud data, calculate the average distance from the point to the neighboring points and remove outlier noise points whose distance exceeds the preset threshold, and generate effective point cloud data.
[0017] S13. Perform triangulation mesh reconstruction on the effective point cloud data, connect spatial points according to the consistency of spatial point normal vectors and the distance criterion of neighboring points, construct topological connection relationship and generate a three-dimensional mesh model.
[0018] S14. Project the 3D mesh model onto the image plane, determine the visibility of the patches and map the texture color information using the ray casting method, and generate a 3D digital reference model with texture mapping.
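As a non-limiting illustration, the statistical filtering of S12 can be sketched as follows. The function name `statistical_outlier_filter`, the neighbour count `k`, and the threshold rule (global mean plus `std_ratio` times the standard deviation of the mean neighbour distances) are assumptions made for this sketch; the text above only requires removing points whose distance exceeds a preset threshold.

```python
import numpy as np

def statistical_outlier_filter(points, k=8, std_ratio=3.0):
    """Drop points whose mean distance to their k nearest neighbours is unusually large."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances (O(N^2) memory; acceptable for a small illustration).
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero distance of a point to itself.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    mean_knn = knn.mean(axis=1)
    # Assumed threshold rule: global mean + std_ratio * global standard deviation.
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    keep = mean_knn <= threshold
    return pts[keep], keep
```

For real point clouds a spatial index (for example a k-d tree) would replace the dense distance matrix; the dense form is used here only to keep the sketch self-contained.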
[0019] Optionally, S2 specifically includes:
[0020] S21. Solve for the wrapped phase of the real-time image using the sinusoidal phase-shift algorithm, calculate the absolute phase from the hierarchical relationship between the high-frequency and low-frequency phases, and map the absolute phase to three-dimensional spatial coordinates according to the system calibration parameters to generate initial real-time three-dimensional point cloud data.
[0021] S22. Perform statistical filtering on the initial real-time 3D point cloud data, calculate the average distance from each point in the point cloud to the set of neighboring points and set a distance threshold, remove discrete noise points whose distance exceeds the threshold, and generate filtered point cloud data.
[0022] S23. Downsample the filtered point cloud data, divide the point cloud space using a voxel grid, take the centroid of the point cloud falling in the same grid as the replacement point, reduce the amount of point cloud data while maintaining geometric features, and generate simplified point cloud data.
[0023] S24. Extract grayscale values from the pixel coordinate indices of each point in the simplified point cloud data, assign the grayscale values as attribute information to the corresponding three-dimensional spatial points, and generate real-time three-dimensional point cloud data containing texture information.
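The voxel-grid downsampling of S23 can be sketched as below. This is a minimal illustration, assuming an axis-aligned grid anchored at the minimum corner of the point cloud; the function name `voxel_downsample` and the parameter `voxel_size` are hypothetical.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Replace all points falling in the same voxel by their centroid."""
    pts = np.asarray(points, dtype=float)
    # Integer voxel index of every point, relative to the minimum corner.
    idx = np.floor((pts - pts.min(axis=0)) / voxel_size).astype(np.int64)
    # Group points that share a voxel index.
    _, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    # Accumulate coordinates per voxel, then divide by the point count.
    sums = np.zeros((counts.size, pts.shape[1]))
    np.add.at(sums, inverse, pts)
    return sums / counts[:, None]
```

Taking the centroid rather than the voxel centre keeps the replacement point on or near the measured surface, which is consistent with the stated goal of reducing data volume while maintaining geometric features.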
[0024] Optionally, S3 specifically includes:
[0025] S31. Project the real-time 3D point cloud data and the 3D digital reference model onto the depth image space to generate a real-time depth map and a reference depth map; calculate the difference between the depth value at each pixel coordinate in the real-time depth map and the depth value at the corresponding pixel coordinate in the reference depth map to generate a depth residual map; substitute the depth residual value at each pixel coordinate into the Gaussian kernel function, that is, calculate the negative of the ratio of the squared depth residual value to the variance parameter and apply the exponential function to this ratio to generate a weight value; multiply the weight value by the square of the corresponding depth residual value to generate a weighted error value, and sum the weighted error values over all pixel coordinates to obtain the robust objective function;
[0026] S32. Perform nonlinear optimization on the constructed robust objective function, iteratively update the pose parameters using the Levenberg-Marquardt algorithm until the objective function converges, and calculate the initial transformation matrix.
[0027] S33. Use the calculated initial transformation matrix to perform coordinate transformation on the real-time 3D point cloud data to generate coarsely registered point cloud data.
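The robust objective function of S31 can be written compactly. With depth residual r at each pixel and variance parameter sigma2, the weight is exp(-r^2 / sigma2) and the objective is the sum of weight times r^2 over all pixels. The sketch below evaluates it for two depth maps of equal size; the names `robust_depth_objective` and `sigma2` are assumptions for illustration, and the Levenberg-Marquardt pose update of S32 is not shown.

```python
import numpy as np

def robust_depth_objective(live_depth, ref_depth, sigma2):
    """Gaussian-weighted sum of squared depth residuals, as described in S31."""
    residual = np.asarray(live_depth, dtype=float) - np.asarray(ref_depth, dtype=float)
    # Gaussian kernel weight: large residuals receive weights close to zero.
    weight = np.exp(-(residual ** 2) / sigma2)
    # Weighted error per pixel, summed over the whole depth residual map.
    objective = float(np.sum(weight * residual ** 2))
    return objective, weight
```

Because the weight decays with the squared residual, pixels with gross depth mismatch (for example occlusions) contribute little to the objective, which is what makes the coarse registration robust.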
[0028] Optionally, S4 specifically includes:
[0029] S41. Perform a neighborhood search on each point in the coarsely registered point cloud data, fit a local micro-tangent plane and calculate the normal vector to generate a set of normal vectors.
[0030] S42. Project the coarse registration point cloud data onto the surface of the three-dimensional digital reference model, calculate the reference normal vector at the projection point, and generate a set of reference normal vectors.
[0031] S43. Traverse each point in the coarsely registered point cloud data, calculate the cosine of the angle between the current point's normal vector and the corresponding reference normal vector, mark points with a cosine of the angle less than a preset consistency threshold as outliers and remove them, and generate a set of valid matching points.
[0032] S44. Apply the point-to-plane ICP algorithm to the effective matching point set, construct the perpendicular distance error function between the point and the corresponding tangent plane, and iteratively solve the correction transformation matrix by minimizing the error function;
[0033] S45. Use the correction transformation matrix to perform coordinate transformation on the coarsely registered point cloud data to generate registered real-time 3D point cloud data.
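Steps S43 and S44 can be illustrated with two small helpers: one rejects points whose normal disagrees with the reference normal, the other evaluates the point-to-plane error that the ICP iteration minimizes. This is a sketch under stated assumptions: correspondences are taken as already established, the threshold value 0.9 is arbitrary, and the function names are hypothetical. The solver that updates the correction transformation matrix is omitted.

```python
import numpy as np

def normal_consistency_mask(normals, ref_normals, cos_threshold=0.9):
    """True where the angle cosine between paired normals meets the threshold."""
    n = np.asarray(normals, dtype=float)
    m = np.asarray(ref_normals, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    cos = np.sum(n * m, axis=1)
    # Points below the threshold are treated as outliers and dropped.
    return cos >= cos_threshold

def point_to_plane_error(src, dst, dst_normals):
    """Sum of squared perpendicular distances from src points to dst tangent planes."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    nrm = np.asarray(dst_normals, dtype=float)
    signed = np.sum((src - dst) * nrm, axis=1)
    return float(np.sum(signed ** 2))
```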
[0034] Optionally, S5 specifically includes:
[0035] S51. Extract the texture gray value corresponding to each point in the registered real-time 3D point cloud data, and normalize the texture gray value to obtain the reflectance value, and generate a reflectance dataset.
[0036] S52. Using the bounding box of the three-dimensional digital reference model as the spatial boundary, the interior of the spatial boundary is discretized into a voxel mesh to generate a set of sampling points.
[0037] S53. For each sampling point in the sampling point set, query the grid cell index of the 3D digital reference model at the sampling point to obtain the coordinates of the four vertices of the grid cell and the corresponding unit normal vector; calculate the vector difference between the sampling point and each vertex, and perform a dot product operation between the vector difference and the unit normal vector of the corresponding vertex to obtain the distance values of the four vertices; perform bilinear interpolation on the distance values of the four vertices to obtain the sampling point distance value; determine the sign of the sampling point distance value according to the spatial position relationship of the sampling point relative to the tangent plane of the grid cell, and generate a signed distance function value; calculate the centroid coordinates relative to the grid cell using the spatial coordinates of the sampling point, and perform a linear weighted sum of the signed distance function value and the centroid coordinates to obtain the geometric implicit field value;
[0038] S54. Using the spatial coordinates of the sampling points as the input vector, compute a linear weighted sum of the input vector with the parameter matrix to be optimized. Substitute the result into the GELU activation function, which multiplies the input variable by the value of the standard normal cumulative distribution function at that input, to obtain the nonlinear response value, and generate feature vectors layer by layer through this feature mapping. Decompose the feature vectors to output the predicted signed distance value and the predicted reflectance value respectively. Calculate the first difference between the predicted signed distance value and the geometric implicit field value, and square it to generate the geometric loss term; calculate the second difference between the predicted reflectance value and the actual reflectance value in the reflectance dataset, and square it to generate the photometric loss term. Set a geometric constraint weight coefficient and multiply the geometric loss term by it to obtain the weighted geometric error; set a photometric constraint weight coefficient and multiply the photometric loss term by it to obtain the weighted photometric error. Sum the weighted geometric error and the weighted photometric error to obtain the joint loss function value. Calculate the partial derivative of the joint loss function value with respect to the parameter matrix to be optimized to construct the parameter update gradient, and use this gradient to iteratively correct the parameter matrix until the joint loss function value is less than a preset convergence threshold, generating the joint implicit field.
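The two scalar building blocks of S54, the GELU activation and the joint loss, can be sketched per sample as follows. The weight names `w_geo` and `w_photo` and their default values are assumptions; the network itself and the gradient update are not shown.

```python
import math

def gelu(x):
    """GELU: the input multiplied by the standard normal CDF evaluated at the input."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * cdf

def joint_loss(pred_sdf, target_sdf, pred_refl, target_refl, w_geo=1.0, w_photo=1.0):
    """Weighted sum of the squared geometric and photometric differences."""
    geometric_term = (pred_sdf - target_sdf) ** 2      # first difference, squared
    photometric_term = (pred_refl - target_refl) ** 2  # second difference, squared
    return w_geo * geometric_term + w_photo * photometric_term
```

In training, this per-sample value would typically be averaged over all sampling points before the gradient with respect to the parameter matrix is taken.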
[0039] Optionally, S6 specifically includes:
[0040] S61. Define the imaging plane parameters of the virtual camera, and back-project each pixel on the imaging plane into three-dimensional space to generate multiple virtual rays starting from the optical center of the camera.
[0041] S62. Traverse each virtual ray, calculate the spatial distance between the current virtual ray and all data points in the registered real-time 3D point cloud data, and select the target point cloud data point corresponding to the minimum spatial distance; calculate the perpendicular vector from the target point cloud data point to the virtual ray, and use the spatial coordinates of the target point cloud data point minus the perpendicular vector to calculate the projection coordinates of the target point cloud data point on the virtual ray, and set the projection coordinates as the initial sampling starting point; use the initial sampling starting point coordinates plus the product of the preset step size and the direction vector of the virtual ray to calculate the coordinates of the current step point; input the coordinates of the current step point into the joint implicit field, and solve for the signed distance field value and the illumination reflectivity value corresponding to the coordinates of the current step point;
[0042] S63. Determine whether the signed distance field value at the current step point is less than the preset distance threshold. If so, determine that the current step point is located on the implicit surface of the three-dimensional digital reference model, stop tracking the virtual ray, record the current ray travel length as the depth value, and extract the illumination reflectivity value at the current step point as the diffuse texture value.
[0043] S64. If the signed distance field value at the current step point is greater than or equal to the preset distance threshold, dynamically adjust the next step length according to that value and continue tracking along the ray direction until the ray travel length exceeds the preset maximum threshold or a surface hit is determined.
[0044] S65. Arrange the depth values of all virtual rays according to pixel coordinates to generate a standard depth map, and arrange the diffuse texture values of all virtual rays according to pixel coordinates to generate a diffuse texture map.
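The per-ray loop of S62 to S64 resembles sphere tracing over a signed distance field. The sketch below assumes that the dynamically adjusted step equals the current distance value, which is one common choice and is not stated in the text. The analytic sphere `sphere_field` with constant reflectivity 0.7 is a hypothetical stand-in for the learned joint implicit field, and all names and default values are illustrative.

```python
import math

def sphere_field(p, center=(0.0, 0.0, 5.0), radius=1.0):
    """Stand-in joint implicit field: returns (signed distance, reflectivity)."""
    return math.dist(p, center) - radius, 0.7

def march_ray(origin, direction, field, start_t=0.0, eps=1e-4, max_t=50.0, max_steps=256):
    """Track one ray; return (depth, reflectivity) on a hit, or None on a miss."""
    t = start_t
    for _ in range(max_steps):
        point = tuple(o + t * d for o, d in zip(origin, direction))
        distance, reflectivity = field(point)
        if distance < eps:
            # Surface hit: the ray travel length is the depth value (S63).
            return t, reflectivity
        # Assumed rule: advance by the current distance value (S64).
        t += distance
        if t > max_t:
            break
    return None
```

Running `march_ray` once per pixel and arranging the returned depth and reflectivity values by pixel coordinates yields the two maps described in S65; `start_t` corresponds to the initial sampling starting point derived from the nearest point cloud data point.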
[0045] Optionally, the improved BiSeNet model includes a differential feature construction layer, an input preprocessing layer, a spatial detail encoding layer, a context semantic encoding layer, a geometric consistency verification layer, an attention feature fusion layer, and a dust accumulation area segmentation output layer.
[0046] The differential feature construction layer is used to obtain the intrinsic and extrinsic parameters of the virtual camera, project the three-dimensional digital reference model onto the imaging plane to generate a projection depth map, and calculate the differential depth map between the projection depth map and the standard depth map; the normalized differential depth map is concatenated with the diffuse texture map to generate a dual-channel feature tensor.
[0047] The input preprocessing layer is used to perform batch normalization and zero-padding on the dual-channel feature tensor and output a standard input feature tensor.
[0048] The spatial detail encoding layer is used to calculate the pixel-level depth gradient field based on the differential depth map, obtain the geometric edge orientation and normal vector distribution of the dust accumulation surface; using the orthogonal direction of the depth gradient vector as the sampling guide trajectory, the pixels in the standard input feature tensor are offset and addressed, and feature aggregation and cascaded downsampling calculations are performed along the depth contour line direction to output a high-resolution spatial feature map that retains high-frequency texture details and is sensitive to geometric edges.
[0049] The context semantic encoding layer is used to reconstruct the three-dimensional spatial coordinates and surface normal vectors of pixels by back-projecting the standard depth map, calculate the projection mapping of the virtual camera for views deflected from the optical axis by preset angles, perform view transformation of pixel coordinates and feature resampling on the standard input feature tensor based on this projection mapping, calculate the consistency of feature responses across the different virtual views, extract deep semantic features with disparity invariance, and output a low-resolution semantic feature map containing dust accumulation thickness prediction information.
[0050] The geometric consistency verification layer is used to receive low-resolution semantic feature maps and construct a differentiable physical rendering mapping based on a 3D digital reference model. This includes obtaining the virtual camera intrinsic and extrinsic parameter matrices, calculating the rotation and translation transformation matrix from the camera coordinate system to the world coordinate system; extracting the vertex coordinates of the 3D digital reference model, multiplying the vertex coordinates by the rotation and translation transformation matrix and normalizing them to generate a differential geometric projection transformation matrix; performing a back-projection transformation on the low-resolution semantic feature map, mapping it to the virtual 3D space to obtain pixel-level dust accumulation thickness prediction values, superimposing the dust accumulation thickness prediction values along the reference surface normal vector direction onto the differential geometric projection transformation matrix, calculating the physical location distribution of the predicted dust accumulation layer on the imaging plane, and generating a physical prediction depth map; calculating the pixel-level residual between the physical prediction depth map and the depth data corresponding to the standard input feature tensor, performing a negative exponential transformation on the pixel-level residual, and generating a geometric confidence mask.
[0051] The attention feature fusion layer uses the geometric confidence mask to perform gated modulation on the low-resolution semantic feature map, filtering out feature responses that do not conform to the geometric mechanism of dust accumulation, to obtain a calibrated semantic feature map; it then upsamples the calibrated semantic feature map and adds it pixel by pixel to the high-resolution spatial feature map to output a fused enhanced feature map.
[0052] The dust accumulation area segmentation output layer is used to perform pixel-by-pixel classification on the fused enhanced feature map and generate a binary effective dust accumulation area mask based on a preset classification threshold.
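The confidence mask and gated fusion described for the geometric consistency verification layer and the attention feature fusion layer can be sketched on single-channel arrays. Several choices here are assumptions not fixed by the text: the absolute residual is used inside the negative exponential, `scale` is a hypothetical tuning parameter, the mask is taken at the low resolution, and upsampling is nearest-neighbour by an integer factor.

```python
import numpy as np

def geometric_confidence_mask(predicted_depth, input_depth, scale=1.0):
    """Negative exponential of the pixel-level depth residual; values lie in (0, 1]."""
    residual = np.abs(np.asarray(predicted_depth, float) - np.asarray(input_depth, float))
    return np.exp(-residual / scale)

def gated_fusion(semantic_lowres, confidence_lowres, spatial_highres):
    """Gate the semantic map, upsample it, and add it to the spatial map."""
    calibrated = semantic_lowres * confidence_lowres  # gated modulation
    fy = spatial_highres.shape[0] // calibrated.shape[0]
    fx = spatial_highres.shape[1] // calibrated.shape[1]
    # Nearest-neighbour upsampling by integer factors (assumed for simplicity).
    upsampled = np.repeat(np.repeat(calibrated, fy, axis=0), fx, axis=1)
    return upsampled + spatial_highres  # pixel-by-pixel addition
```

Where the physically predicted depth agrees with the input depth the mask is close to 1 and the semantic response passes unchanged; large residuals drive the mask toward 0 and suppress that response before fusion.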
[0053] Optionally, S8 specifically includes:
[0054] S81. Extract the region of interest based on the effective dust accumulation area mask, calibrate the physical pixel size of the differential depth map within the region of interest, and perform three-dimensional integration to obtain the total dust accumulation volume;
[0055] S82. Based on the effective dust accumulation area mask, locate the corresponding area in the diffuse texture map, convert it from RGB color space to grayscale space, and calculate the average grayscale value of all pixels in the area as the texture grayscale feature.
[0056] S83. Based on the effective dust accumulation area mask, locate the depth values of the corresponding area in the differential depth map, apply min-max normalization to map the values to the [0,1] interval, and calculate the normalized depth mean of all pixels in the area.
[0057] S84. Based on the historical dust accumulation sample set, with the texture grayscale feature as the independent variable and the normalized depth mean as the dependent variable, construct a grayscale-depth nonlinear mapping function by fitting a Gaussian process regression.
[0058] S85. Input the texture grayscale feature into the grayscale-depth nonlinear mapping function to calculate the theoretically predicted normalized depth value, and calculate the ratio of the theoretically predicted normalized depth value to the actual normalized depth mean as the dust accumulation density coefficient.
[0059] S86. Calculate the product of the total dust accumulation volume and the dust accumulation density coefficient to obtain the dust accumulation mass parameter. Compare this parameter with the preset cleanliness threshold range, output the cleanliness level result, and generate the corresponding control command.
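The arithmetic of S81, S85 and S86 can be illustrated as follows. This sketch treats the three-dimensional integration as a discrete sum of differential depth times the calibrated pixel area, and takes the Gaussian process prediction of S84 as a given number rather than fitting a model. The threshold values, the integer level encoding, and all function names are hypothetical.

```python
import numpy as np

def dust_volume(diff_depth, mask, pixel_area):
    """Discrete volume integral: sum of masked depth differences times pixel area."""
    depth = np.asarray(diff_depth, dtype=float)
    region = np.asarray(mask, dtype=bool)
    return float(np.sum(depth[region]) * pixel_area)

def density_coefficient(predicted_norm_depth, actual_norm_depth_mean):
    """Ratio of the predicted normalized depth to the measured normalized depth mean."""
    return predicted_norm_depth / actual_norm_depth_mean

def cleanliness_level(mass_parameter, thresholds=(1.0, 5.0)):
    """Level index: the number of thresholds the mass parameter has reached."""
    return sum(mass_parameter >= t for t in thresholds)
```

For example, a volume of 1.6 and a coefficient of 1.2 give a mass parameter of 1.92, which falls between the two illustrative thresholds and maps to level 1.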
[0060] The beneficial effects of this invention are:
[0061] (1) This invention achieves deep decoupling and continuous representation of the surface geometry and texture information of the target component by constructing a joint implicit field together with a ray marching sampling and tracking mechanism. Using the geometric topological constraints of the 3D digital reference model, the mapping from spatial coordinates to the signed distance field and illumination reflectivity is jointly fitted, and a joint implicit field containing high-frequency details is generated by minimizing the geometric loss term and the photometric loss term. The ray marching algorithm, with the initial sampling starting point calculated from the registered real-time 3D point cloud data, performs dynamic step tracking and generates a decoupled standard depth map and diffuse texture map. This mechanism transforms sparse point clouds into a continuous implicit function space, filtering out high-frequency noise and discretization errors in the original measurement data and providing high-quality input data with both complete geometric structure and realistic texture details.
[0062] (2) This invention establishes a physically guided feature extraction and geometric closed-loop verification system by employing an improved BiSeNet model and differentiable physical rendering mapping. The spatial detail encoding layer calculates the depth gradient field based on the differential depth map, performs feature aggregation and cascaded downsampling along the depth contour lines, and obtains high-resolution spatial features sensitive to geometric edges. The context semantic encoding layer reconstructs 3D coordinates by back-projecting the standard depth map, performs viewpoint transformation and feature resampling, and extracts deep semantic features with disparity invariance. The geometric consistency verification layer constructs a differentiable physical rendering mapping, calculates the pixel-level residual between the physically predicted depth map and the depth data corresponding to the standard input feature tensor, generates a geometric confidence mask using a negative exponential transformation, and performs attention feature fusion. Through depth gradient-guided sampling, multi-view consistency constraints, and physical rendering residual verification, this system achieves multi-dimensional feature enhancement from feature space to physical space, ensuring that the output effective dust accumulation area mask captures the physical boundary of the dust accumulation with high accuracy.
Attached Figure Description
[0063] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:
[0064] Figure 1 is an overall flowchart of the deep learning-based method for detecting the cleanliness of an air conditioner indoor unit proposed in this invention.
[0065] Figure 2 is a flowchart illustrating the working principle of the improved BiSeNet model in the deep learning-based method for detecting the cleanliness of an air conditioner indoor unit proposed in this invention.
Detailed Implementation
[0066] The invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.
[0067] Referring to Figure 1 and Figure 2, a deep learning-based method for detecting the cleanliness of an air conditioner indoor unit specifically includes:
[0068] S1. Project a sinusoidal phase-shift coding pattern under clean conditions, acquire multiple frames of phase-shift images and calculate the phase, and reconstruct a three-dimensional digital reference model by combining calibration parameters.
[0069] S2. Project the coded pattern according to the preset cycle while the air conditioner is running, collect real-time images and decode and reconstruct the real-time three-dimensional point cloud data of the target component surface;
[0070] S3. Project the real-time 3D point cloud data onto the depth image space, construct a robust objective function and calculate the initial transformation matrix to obtain coarse registration point cloud data;
[0071] S4. Based on the coarsely registered point cloud data, a dynamic weight map is constructed. Outliers are dynamically removed by using the point-to-plane ICP algorithm combined with the cosine value of the angle between the normal vectors. The coordinates are then corrected to obtain the registered real-time 3D point cloud data.
[0072] S5. Extract the texture grayscale of the registered real-time 3D point cloud data, and construct a joint implicit field by fitting the mapping relationship between spatial coordinates and directed distance field and illumination reflectivity, using the geometric topology of the 3D digital benchmark model as constraints.
[0073] S6. Using the RayMarching algorithm and based on the initial spatial position of the registered real-time 3D point cloud data, the joint implicit field is sampled and tracked to decouple and generate a standard depth map and a diffuse texture map.
[0074] S7. Input the standard depth map and diffuse texture map into the improved BiSeNet model. Project the 3D digital reference model to calculate the differential depth map, and concatenate it with the diffuse texture map to construct a dual-channel feature tensor. Based on differentiable physical rendering, generate a geometric confidence mask from the geometric residual, use the mask to perform gated attention fusion on the dual-scale features, and output the effective dust accumulation area mask.
[0075] S8. Based on the effective dust accumulation area mask, integrate the differential depth map over the masked area to calculate the total dust accumulation volume, establish the mapping between diffuse texture and differential depth to solve for the dust accumulation density coefficient, and match the result against the cleanliness thresholds to output the cleanliness level and trigger control.
[0076] In this embodiment, S1 specifically includes:
[0077] S11. Project three sets of sinusoidal fringe patterns and acquire images. Calculate the wrapped phase using a multi-step phase-shifting algorithm, and calculate the absolute phase using the multi-frequency heterodyne principle. Read the calibrated intrinsic and extrinsic parameter matrices, calculate the spatial distance from the triangulation relationship between phase and depth, back-project to camera coordinates using the intrinsic parameters, and transform to the world coordinate system using the extrinsic parameters to generate initial reference point cloud data.
[0078] S12. Traverse the initial reference point cloud data to calculate the average Euclidean distance of each point to its 50 nearest neighboring points. Compare this average value with the standard deviation of the average distance of the overall point cloud. Remove points whose average distance exceeds 3 times the standard deviation. Repeat the calculation and removal process for the remaining point set until convergence is achieved, generating valid point cloud data.
[0079] S13. Construct a k-dimensional tree index to query the three nearest neighboring points of each point, calculate the volume of the tetrahedron formed by the point and its neighbors, select the group of points with the smallest volume as candidate vertices and connect them to generate triangular facets, calculate the angle between the facet normal vector and the point estimated normal vector, delete facets with an angle greater than 30 degrees, and construct a three-dimensional mesh model.
[0080] S14. Set the parameters of the virtual camera optical center and imaging plane, calculate the intersection of the projection ray of the mesh model vertex and the imaging plane to obtain the pixel coordinates, calculate the cosine value of the angle between the normal vector of the face and the line of sight, select the face with the cosine value greater than 0 as the visible face, read the pixel color value of the original image and map it to the vertex of the visible face to generate a three-dimensional digital reference model.
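The wrapped-phase calculation in S11 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the common intensity model I_k = A + B·cos(φ + 2πk/N) with N equally spaced phase shifts, which the text does not state explicitly, and the function name `wrapped_phase` is hypothetical.

```python
import numpy as np

def wrapped_phase(frames):
    """Wrapped phase from N equally spaced phase-shifted fringe images.

    frames: array of shape (N, H, W); frame k is assumed to follow
    I_k = A + B * cos(phi + 2*pi*k/N). Returns phi wrapped to [-pi, pi].
    """
    frames = np.asarray(frames, dtype=np.float64)
    n = frames.shape[0]
    if n < 3:
        raise ValueError("at least three phase-shifted frames are required")
    shifts = 2.0 * np.pi * np.arange(n) / n
    # Correlate the image stack with sin and cos of the known shifts.
    s = np.tensordot(np.sin(shifts), frames, axes=(0, 0))
    c = np.tensordot(np.cos(shifts), frames, axes=(0, 0))
    # The constant term A cancels; the ratio isolates tan(phi).
    return np.arctan2(-s, c)
```

The result is only known modulo 2π, which is why the step goes on to recover the absolute phase from several fringe frequencies.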
[0081] In this embodiment, S2 specifically includes:
[0082] S21. Project a sinusoidal fringe pattern and capture images. Extract the pixel intensity (grayscale) values. Calculate the wrapped phase using the 4-step phase-shift method. Solve for the absolute phase from the phase difference between the high and low frequencies. Calculate the physical depth by combining the calibration parameters with the triangulation principle. Use the depth and the camera intrinsic parameters to back-calculate the three-dimensional coordinates and generate initial real-time three-dimensional point cloud data.
[0083] S22. Construct a kd-tree spatial index to query the 50 nearest neighbor points of each point and calculate the average Euclidean distance. Calculate the average distance and standard deviation of the overall point cloud. Remove points whose average distance is greater than the overall average distance plus 3 times the standard deviation. Repeat the query and removal operations until convergence, and generate filtered point cloud data.
[0084] S23. Set a 2 mm cube voxel grid to divide the space, calculate the grid index of each point, filter the grids containing point clouds, calculate the arithmetic mean of the X, Y, and Z axis coordinates of all points in the grid as the centroid coordinates, and use the centroid coordinates to replace the original point cloud in the grid to generate simplified point cloud data.
[0085] S24. Determine the pixel coordinates corresponding to each point based on the camera imaging model, extract the gray values of the red, green and blue channels of the pixel to form a color attribute vector, and attach the color attribute vector to the three-dimensional space point to generate real-time three-dimensional point cloud data.
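The filtering in S22 and the voxel simplification in S23 can be sketched as below. This is an illustrative sketch under stated assumptions: a brute-force distance matrix stands in for the kd-tree index (suitable only for small clouds), coordinates are taken to be in millimetres, and the function names and the `max_rounds` safeguard are hypothetical.

```python
import numpy as np

def remove_statistical_outliers(points, k=50, n_sigma=3.0, max_rounds=20):
    """Drop points whose mean k-nearest-neighbour distance exceeds
    the global mean plus n_sigma standard deviations; repeat until stable."""
    pts = np.asarray(points, dtype=np.float64)
    for _ in range(max_rounds):
        if len(pts) <= k:
            break
        diff = pts[:, None, :] - pts[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        # After sorting, column 0 is the zero distance of a point to itself.
        knn = np.sort(dist, axis=1)[:, 1:k + 1]
        mean_d = knn.mean(axis=1)
        keep = mean_d <= mean_d.mean() + n_sigma * mean_d.std()
        if keep.all():
            break
        pts = pts[keep]
    return pts

def voxel_centroids(points, voxel=2.0):
    """Replace all points inside each occupied voxel by their centroid."""
    pts = np.asarray(points, dtype=np.float64)
    idx = np.floor(pts / voxel).astype(np.int64)
    _, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n = int(inverse.max()) + 1
    sums = np.zeros((n, 3))
    np.add.at(sums, inverse, pts)  # accumulate coordinates per voxel
    counts = np.bincount(inverse, minlength=n)
    return sums / counts[:, None]
```

In practice a spatial index would replace the O(n²) distance matrix; the thresholds (50 neighbours, 3 standard deviations, 2 mm voxels) are the ones given in the steps above.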
[0086] In this embodiment, S3 specifically includes:
[0087] S31. Based on the camera intrinsic parameters, project the real-time 3D point cloud data and the 3D digital reference model to generate depth maps. Calculate the difference between the real-time depth value and the reference depth value to obtain the depth residual. Set the variance parameter of the Gaussian kernel function to 0.5. Divide the square of the depth residual by twice the variance parameter, negate the ratio, and raise the natural constant e to that power to obtain the weight value. Multiply each weight value by the square of the corresponding depth residual and sum the products to construct the robust objective function.
[0088] S32. Initialize the pose vector. Using the Levenberg-Marquardt algorithm, calculate the Jacobian matrix of the robust objective function with respect to the pose vector, set the damping factor to 0.01 to regulate the update step, and iteratively update the pose vector. Convergence is declared when the absolute difference between the robust objective function values of two adjacent iterations is less than 0.0001. Construct the initial transformation matrix from the final pose vector.
[0089] S33. Convert the real-time 3D point cloud data into homogeneous coordinates, perform matrix multiplication with the initial transformation matrix, calculate the transformed 3D coordinates, and generate coarse registration point cloud data.
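The weighting and damped update of S31 and S32 can be illustrated with a deliberately reduced example. The sketch below assumes a hypothetical one-parameter "pose" (a single depth offset t) in place of the full pose vector, and freezes the weights within each iteration; both are simplifications for illustration, not the method as claimed. The variance 0.5, damping factor 0.01 and convergence threshold 0.0001 are taken from the steps above.

```python
import numpy as np

def gaussian_weights(residual, variance=0.5):
    """w = exp(-r^2 / (2 * variance)), as described in S31."""
    return np.exp(-(residual ** 2) / (2.0 * variance))

def robust_objective(residual, variance=0.5):
    """Sum over pixels of weight * squared depth residual."""
    return float(np.sum(gaussian_weights(residual, variance) * residual ** 2))

def estimate_depth_offset(depth_live, depth_ref, damping=0.01, tol=1e-4, max_iter=50):
    """Hypothetical 1-DOF stand-in for the pose vector: one depth offset t.

    Residual r_i(t) = depth_live_i + t - depth_ref_i, so dr_i/dt = 1.
    """
    live = np.asarray(depth_live, dtype=np.float64)
    ref = np.asarray(depth_ref, dtype=np.float64)
    t = 0.0
    prev = robust_objective(live + t - ref)
    for _ in range(max_iter):
        r = live + t - ref
        w = gaussian_weights(r)  # weights held fixed for this iteration
        # Damped normal equation with J = 1: (sum(w) + damping) * delta = -sum(w * r)
        t += -np.sum(w * r) / (np.sum(w) + damping)
        cur = robust_objective(live + t - ref)
        if abs(prev - cur) < tol:
            break
        prev = cur
    return float(t)
```

With variance 0.5 the weight reduces to exp(-r²), so a pixel with a residual of 5 contributes a weight of about 1e-11 and has almost no influence on the estimated offset.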
[0090] The robust objective function solution process proposed in this step is similar to that of the traditional ICP algorithm in that both are based on the pose estimation theory of point cloud registration, that is, the rigid body transformation matrix is solved by minimizing the distance error between corresponding points between the source point cloud and the target point cloud, and both use nonlinear optimization algorithms to iteratively update the pose vector.
[0091] The difference lies in that this invention breaks away from the limitations of traditional methods that merely use the least squares method to equally weight the residuals of all corresponding points. Building upon the traditional model's direct calculation of the sum of squared depth residuals, this step adds a robust weight construction step. It uses a Gaussian kernel function to map the depth residual values to weight values, rather than assigning the same weight to all points. In the objective function construction step, the calculated weight values are used to weight and modulate the squared depth residual values, rather than directly accumulating the squared errors. Finally, in the optimization solution step, the Jacobian matrix is calculated based on the weighted robust objective function, and the pose is updated using damped least squares, rather than based on ordinary least squares errors.
[0092] The beneficial effects of this improvement are that, by using robust weight calculation based on Gaussian kernel function and weighted error summation, the contribution of different regions to pose estimation can be dynamically adjusted according to the magnitude of the depth residual. This breaks the limitation of traditional methods in that the registration accuracy is reduced due to interference from outliers and dust accumulation areas during the registration process, and achieves precise focusing from global uniform error to robust local error. This design significantly enhances the tolerance of the registration algorithm to unstructured changes in the scene (such as dust accumulation), and can more accurately suppress false geometric matching caused by dust accumulation. The robust optimization based on weighted residual effectively improves the accuracy of the initial pose estimation, providing a reliable initial value for subsequent high-precision fine registration, and enhancing the robustness of the system in complex air conditioner indoor unit cleanliness detection scenarios.
[0093] In this embodiment, S4 specifically includes:
[0094] S41. Construct a tree index structure to spatially partition the coarsely registered point cloud data. For each point, query the 15 nearest neighboring points. Fit the local micro-tangent plane where the neighboring points are located using the least squares method. Calculate the unit normal vector of the local micro-tangent plane and adjust its direction to point towards the camera optical center. Generate a set of normal vectors containing the normal vector information of all points.
[0095] S42. Using the point-to-plane distance formula, each point in the coarse registration point cloud data is orthogonally projected onto the mesh surface of the three-dimensional digital reference model to determine the spatial coordinates of the projection points. The unit normal vector attribute stored at the projection point position of the three-dimensional digital reference model is extracted to generate a reference normal vector set containing reference normal vector information of all projection points.
[0096] S43. Traverse each point in the coarse registration point cloud data, read the normal vector of the point and the corresponding reference normal vector, calculate the dot product of the two vectors as the cosine value of the included angle, determine the spatial coordinates of points with an included angle cosine value less than 0.9 as having inconsistent normal vector directions, mark such points as outliers and remove them from the dataset, and retain the remaining point set to generate a valid matching point set.
[0097] S44. Calculate the vertical distance from each point in the effective matching point set to the corresponding tangent plane of the 3D digital reference model. Square all vertical distance values and sum them up to construct the point-to-plane distance error function. Use the singular value decomposition algorithm to iteratively optimize and solve the error function. Stop the iteration and calculate the correction transformation matrix when the decrease in the error function value is less than 0.001.
[0098] S45. Using the correction transformation matrix, perform matrix multiplication transformation on each three-dimensional spatial coordinate in the coarse registration point cloud data, calculate the transformed precise coordinates, reassemble all precise coordinate points, and generate the registered real-time three-dimensional point cloud data.
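The normal-consistency filter of S43 and the error function of S44 can be sketched as follows. The sketch assumes all normals are unit vectors and that the reference projection points and normals have already been obtained as in S42; the function names are hypothetical, and the iterative solution of the correction transformation is not shown.

```python
import numpy as np

def filter_by_normal_consistency(points, normals, ref_normals, min_cos=0.9):
    """Keep points whose normal agrees with the reference normal.

    For unit vectors the row-wise dot product is the cosine of the included angle;
    points with a cosine below min_cos are treated as outliers.
    """
    cos = np.einsum("ij,ij->i", normals, ref_normals)
    keep = cos >= min_cos
    return points[keep], keep

def point_to_plane_error(points, plane_points, plane_normals):
    """Sum of squared perpendicular distances to the reference tangent planes."""
    d = np.einsum("ij,ij->i", points - plane_points, plane_normals)
    return float(np.sum(d ** 2))
```

A cosine of 0.9 corresponds to an included angle of roughly 26 degrees, so points whose local surface orientation deviates further than that from the reference model are excluded before the fine registration.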
[0099] In this embodiment, S5 specifically includes:
[0100] S51. Traverse the registered real-time 3D point cloud data, read the texture gray value in the range of 0 to 255, divide the texture gray value by 255 for normalization calculation, and assign the result as the reflectance value to the corresponding 3D spatial point to generate a reflectance dataset.
[0101] S52. Read the maximum and minimum extents of the three-dimensional digital reference model in length, width and height, construct an axis-aligned bounding box, set the voxel grid side length to 5 mm, divide the internal space of the bounding box into 256 by 256 by 256 cubic cells, calculate the center point coordinates of each cell, and generate a sampling point set.
[0102] S53. Locate the sampling point in the grid cell index of the three-dimensional digital reference model, obtain the coordinates and normal vectors of the four vertices, calculate the vector difference from the sampling point to each vertex and the dot product of the normal vector to obtain the vertex distance value, calculate the bilinear interpolation weight using the relative position and perform weighted summation, determine the sign according to the position of the sampling point relative to the tangent plane, generate a signed distance function value, and calculate the geometric implicit field value by combining the barycentric coordinates with linear weighting.
[0103] S54. Construct an 8-layer fully connected multilayer perceptron network. Input the coordinate vector of the sampling points, and generate a high-dimensional feature vector through linear weighting and the GELU activation function. Output the predicted signed distance value and the predicted reflectance value. Calculate the squared difference between the predicted signed distance value and the geometric implicit field value, and the squared difference between the predicted reflectance value and the actual reflectance value. Multiply them by the weight coefficients 1.0 and 0.1 respectively, and add them to obtain the joint loss function value. Calculate the partial derivatives to obtain the parameter update gradient and update the parameter matrices being optimized. Stop training when the joint loss function value is less than 0.00001, and generate the joint implicit field.
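The reflectance normalization of S51 and the joint loss of S54 can be written compactly. In the sketch below, averaging the squared differences over the sampling points is an assumption (the text only says the terms are accumulated), and the function names are hypothetical; the network itself is omitted.

```python
import numpy as np

def normalize_reflectance(gray):
    """Map texture gray values in 0..255 to reflectance values in 0..1 (S51)."""
    return np.asarray(gray, dtype=np.float64) / 255.0

def joint_loss(sdf_pred, sdf_target, refl_pred, refl_target, w_geo=1.0, w_photo=0.1):
    """Weighted sum of the geometric term and the photometric term (S54)."""
    geo = np.mean((np.asarray(sdf_pred) - np.asarray(sdf_target)) ** 2)
    photo = np.mean((np.asarray(refl_pred) - np.asarray(refl_target)) ** 2)
    return float(w_geo * geo + w_photo * photo)
```

With weights of 1.0 and 0.1, a reflectance error has to be about three times larger than a geometric error (in the units of each term) before it contributes equally to the loss, which keeps geometry dominant during training.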
[0104] The joint implicit field construction process proposed in this step is similar to the traditional implicit neural representation technology in that it is based on the continuous function field fitting theory. That is, it uses a multilayer perceptron network to map low-dimensional spatial coordinates to a high-dimensional feature space, learns the implicit representation of spatial location through differentiable functions, and uses a gradient-based backpropagation algorithm to optimize network parameters.
[0105] The difference lies in that this invention breaks away from the limitations of traditional methods that rely solely on single geometric constraints for surface reconstruction or focus only on texture rendering. Building upon the traditional model's prediction of only a signed distance function, this step adds a texture reflectivity mapping step, using normalized texture grayscale values as reflectivity values to jointly construct the dataset, rather than solely using geometric coordinates. In the constraint loss construction step, geometric loss terms and photometric loss terms are calculated separately using the implicit geometric field values generated by benchmark model mesh interpolation and the reflectivity values, and weighted coefficients are set for joint weighted optimization, rather than simply minimizing geometric errors. Finally, in the implicit field output step, a joint implicit field that simultaneously incorporates spatial geometry and texture reflectivity attributes is generated, rather than a single geometric surface field.
[0106] The beneficial effects of this improvement are that by introducing texture reflectivity constraints and a joint loss optimization mechanism, the optical reflection characteristics of the object surface can be integrated into the geometric implicit representation. This breaks through the limitations of traditional methods that lead to loss of detail or incomplete representation when texture information is lacking, and achieves a precise mapping from single geometric reconstruction to geometric-texture joint representation. This design significantly enhances the model's ability to perceive the surface material and micro-deformation of the target part, and can more accurately recover the texture details and undulation changes of the dust accumulation area in the implicit space. The parameterized training based on dual weight constraints effectively improves the representation completeness of the implicit field and enhances the robustness and reconstruction accuracy of the system under complex lighting conditions and low-quality texture input.
[0107] In this embodiment, S6 specifically includes:
[0108] S61. Set the virtual camera resolution to 1920 x 1080 pixels, define the optical center position and focal length parameters, use the intrinsic parameter matrix to inversely project the pixel coordinates into a unit direction vector, and connect the optical center and the direction vector to generate a virtual ray.
[0109] S62. For each virtual ray, calculate the minimum spatial distance to the registered real-time 3D point cloud data, determine the target point, and take its projection onto the ray as the initial sampling point. Set the initial step size to 1 mm, calculate the coordinates of the current step point, and input them into the joint implicit field to solve for the signed distance field value and the illumination reflectivity value.
[0110] S63. Determine whether the signed distance field value of the current point is less than 0.1 mm. If so, stop tracking, record the cumulative ray travel length as the depth value, and extract the illumination reflectivity value as the diffuse texture value.
[0111] S64. If the signed distance field value of the current point is greater than or equal to 0.1 mm, set the next step length to the signed distance field value of the current point multiplied by the safety factor of 0.8, and continue tracking until the ray travel length exceeds 10,000 mm or a surface hit is determined.
[0112] S65. Arrange the depth values of all virtual rays according to pixel coordinates to generate a standard depth map, and arrange the diffuse texture values according to pixel coordinates to generate a diffuse texture map.
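The per-ray tracking loop of S62 to S64 can be sketched as below. The sketch assumes a unit-length direction vector and a callable `field(p)` returning a (signed distance, reflectance) pair; in the test scene an analytic sphere is used as a hypothetical stand-in for the trained joint implicit field.

```python
import math

def sphere_trace(origin, direction, field, start=0.0, first_step=1.0,
                 hit_eps=0.1, safety=0.8, max_range=10_000.0):
    """March along one ray; return (depth_mm, reflectance), or None on a miss.

    start is the travel length of the initial sampling point on the ray.
    """
    t = start + first_step
    while t <= max_range:
        p = tuple(o + t * d for o, d in zip(origin, direction))
        sdf, reflectance = field(p)
        if sdf < hit_eps:
            # Close enough to the surface: travel length is the depth value.
            return t, reflectance
        # Advance by a fraction of the distance to the nearest surface.
        t += safety * sdf
    return None

def demo_field(p, center=(0.0, 0.0, 100.0), radius=10.0):
    """Hypothetical field: a sphere with constant reflectance 0.5."""
    return math.dist(p, center) - radius, 0.5
```

Because each step is at most 0.8 times the distance to the nearest surface, the march cannot overshoot the surface, at the cost of a few extra iterations compared with a safety factor of 1.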
[0113] In this embodiment, the improved BiSeNet model includes a differential feature construction layer, an input preprocessing layer, a spatial detail encoding layer, a contextual semantic encoding layer, a geometric consistency verification layer, an attention feature fusion layer, and a dust accumulation area segmentation output layer:
[0114] The differential feature construction layer uses the camera intrinsic and extrinsic parameters matrix to transform the mesh vertices to the camera coordinate system, calculates the projected depth map, normalizes the difference between the projected depth map and the standard depth map to generate a differential depth map, and concatenates the differential depth map with the diffuse texture map to generate a feature tensor.
[0115] The input preprocessing layer performs batch normalization on the feature tensor and pads it with a two-pixel border of zero-valued pixels, outputting the standard input feature tensor;
[0116] The spatial detail encoding layer calculates the horizontal and vertical gradient vectors of the differential depth map, rotates the gradient vectors by 90 degrees to obtain the tangent direction, performs offset addressing and feature interpolation aggregation on the standard input feature tensor along the tangent direction, and outputs a high-resolution spatial feature map by downsampling through a convolutional kernel with a stride of 2.
[0117] The context semantic encoding layer calculates the three-dimensional spatial coordinates of the pixel and inversely infers the surface normal vector. It sets the viewpoint transformation parameter of 5 degrees camera tilt and calculates the projection mapping relationship under different viewpoints. It performs bilinear interpolation resampling on the standard input feature tensor, calculates the absolute value of the difference as a disparity consistency measure, and outputs a low-resolution semantic feature map.
[0118] The geometric consistency verification layer calculates the rotation and translation transformation matrix and the differential geometric projection transformation matrix, back-projects the low-resolution semantic feature map into the virtual 3D space to obtain the predicted dust thickness, and superimposes the predicted thickness along the normal vector direction to calculate the physical prediction depth map. It then calculates the pixel-level residual between the physical prediction depth map and the depth data corresponding to the standard input feature tensor, squares the residual, divides it by 2, negates the result, and raises the natural constant e to that power to obtain the geometric confidence mask.
[0119] The attention feature fusion layer uses a geometric confidence mask to perform pixel-wise multiplication with the low-resolution semantic feature map, filters out feature responses with a confidence level below 0.3, enlarges the calibrated feature map by 2 times, and adds it pixel-wise to the high-resolution spatial feature map to output a fused enhanced feature map.
[0120] The dust accumulation area segmentation output layer uses a 1x1 convolutional kernel and a sigmoid activation function to classify the fused enhanced feature map pixel by pixel, outputting probability values. Probability values greater than 0.5 are marked as 1, and those less than or equal to 0.5 are marked as 0, generating the effective dust accumulation area mask.
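The last three layers (confidence mask, gated fusion, and thresholded output) can be illustrated numerically. The sketch below makes several assumptions that the text leaves open: feature maps are laid out as (channels, height, width), the 2x enlargement uses nearest-neighbour repetition, and the 1x1 convolution is a per-pixel weighted sum over channels with hypothetical `weight` and `bias` parameters. The encoders that produce the feature maps are not modelled.

```python
import numpy as np

def geometric_confidence(residual):
    """Confidence exp(-r^2 / 2) from the pixel-level depth residual."""
    return np.exp(-(np.asarray(residual, dtype=np.float64) ** 2) / 2.0)

def fuse(high_res, low_res, confidence, min_conf=0.3):
    """Gate low-res features by confidence, enlarge 2x, add to high-res features."""
    gate = np.where(confidence >= min_conf, confidence, 0.0)  # drop responses below 0.3
    gated = low_res * gate                                    # (C, h, w) * (h, w)
    up = gated.repeat(2, axis=-2).repeat(2, axis=-1)          # nearest-neighbour 2x
    return high_res + up

def segment(fused, weight, bias=0.0, threshold=0.5):
    """1x1 convolution over channels, sigmoid, then binarize at 0.5."""
    logits = np.tensordot(weight, fused, axes=(0, 0)) + bias
    prob = 1.0 / (1.0 + np.exp(-logits))
    return (prob > threshold).astype(np.uint8)
```

A residual of 2 gives a confidence of about 0.135, below the 0.3 cut-off, so the semantic response at that location is suppressed and only the spatial branch contributes to the output there.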
[0121] The improved BiSeNet model proposed in this step is similar to the traditional BiSeNet model in that it is based on feature pyramid and multi-scale fusion theory. That is, it extracts high-dimensional features of the input image through convolution operation, constructs feature layers of different resolutions by using downsampling and upsampling operations, and optimizes the feature distribution by using nonlinear activation function and batch normalization.
[0122] The difference lies in that this invention breaks away from the limitations of traditional methods that rely solely on image texture grayscale differences or direct comparison of depth data for segmentation. Building upon the traditional model's two-dimensional convolutional feature extraction, this step adds a spatial detail encoding step. It calculates the gradient vector of the differential depth map and rotates it 90 degrees to obtain the contour tangent direction, performing offset addressing and feature interpolation aggregation along the tangent direction, rather than a simple rectangular convolutional window scan. In the contextual semantic encoding step, the projection mapping relationship is calculated using the viewpoint transformation parameters of a virtual camera with a 5-degree tilt, and bilinear interpolation resampling is performed to extract disparity consistency metrics, rather than semantic analysis from a single viewpoint. Finally, in the geometric consistency verification step, the residual is calculated between the depth map predicted from the back-projected dust thickness and the physical depth data, and then substituted into an exponential function with the natural constant e as the base to generate a geometric confidence mask, rather than a single pixel feature classification confidence score.
[0123] The beneficial effects of this improvement are that, through contour-tangent-guided feature aggregation and multi-view parallax consistency measurement, the microscopic geometric structure of the air conditioner surface and the spatial distribution characteristics of the dust accumulation layer are integrated into the feature extraction process. This breaks through the limitations of traditional methods that rely solely on texture features under complex lighting or low contrast, resulting in blurred edges. It achieves accurate mapping from two-dimensional image appearance features to three-dimensional spatial geometric features. This design significantly enhances the model's ability to perceive subtle geometric changes between the dust accumulation area and the background material, enabling more accurate capture of the continuity and thickness changes of the dust accumulation layer along the surface tangent direction. The geometric confidence verification based on physical depth residuals effectively eliminates false texture noise interference, improves the geometric consistency of the segmentation results, and enhances the robustness and accuracy of the system in identifying dust accumulation boundaries in the task of air conditioner indoor unit cleanliness detection.
[0124] In this embodiment, S8 specifically includes:
[0125] S81. Extract the corresponding area of the diffuse texture map using the effective dust accumulation area mask, set the virtual camera focal length to 16 mm and the pixel size to 3.45 micrometers, calculate the actual physical size represented by the pixel, calibrate the physical size of the differential depth map, multiply the depth difference by the actual physical area to obtain the pixel volume unit, and sum them up to generate the total dust accumulation volume.
[0126] S82. Use the effective dust accumulation area mask to locate the corresponding area of the diffuse texture map, extract the red, green and blue color channel values of each pixel, calculate the pixel gray value according to the brightness formula, accumulate and divide by the total number of pixels to obtain the texture grayscale feature.
[0127] S83. Use the effective dust accumulation area mask to locate the depth difference of the corresponding area of the differential depth map, find the maximum and minimum values, normalize the depth difference to the range of 0 to 1, and calculate the normalized depth mean of all pixels.
[0128] S84. Construct a sample set containing 500 historical dust accumulation samples. Using the texture grayscale feature as the independent variable and the normalized depth mean as the dependent variable, fit a Gaussian process regression model: calculate the covariance matrix with the squared exponential kernel function, optimize the hyperparameters through maximum likelihood estimation, and construct the grayscale-depth nonlinear mapping function.
[0129] S85. Input the texture grayscale feature into the grayscale-depth nonlinear mapping function to obtain the theoretically predicted normalized depth value, and calculate the ratio of the actual normalized depth mean to the theoretically predicted normalized depth value to obtain the dust accumulation density coefficient.
[0130] S86. Set the standard density parameter of the dust material to 1500 kg per cubic meter. Calculate the product of the total dust accumulation volume and the standard density parameter, and multiply it by the dust accumulation density coefficient to obtain the corrected dust accumulation mass parameter. Compare the dust accumulation mass parameter with the preset cleanliness threshold ranges. If it is less than 0.05 g, the cleanliness level is judged as excellent. If it is between 0.05 g and 0.2 g, the cleanliness level is judged as good. If it is greater than 0.2 g, the cleanliness level is judged as poor. Based on the level result, control the air conditioner to start the automatic cleaning function and generate a prompt control command.
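The volume integration of S81 and the grading of S86 can be worked through numerically. The sketch below rests on several assumptions: the per-pixel footprint follows the pinhole relation (working distance times pixel size divided by focal length), the working distance `distance_mm` is a hypothetical input not given in the text, the thickness map is already in millimetres, and the corrected mass is taken as volume times standard density times density coefficient.

```python
import numpy as np

# 1500 kg/m^3 expressed in g/mm^3 (1 kg = 1e3 g, 1 m^3 = 1e9 mm^3).
DENSITY_G_PER_MM3 = 1500.0 * 1e3 / 1e9

def pixel_footprint_mm(distance_mm, focal_mm=16.0, pixel_mm=3.45e-3):
    """Side length on the object covered by one pixel (pinhole model)."""
    return distance_mm * pixel_mm / focal_mm

def dust_volume_mm3(thickness_mm, mask, distance_mm):
    """Sum of thickness * pixel area over the masked dust accumulation area."""
    side = pixel_footprint_mm(distance_mm)
    return float(np.sum(np.asarray(thickness_mm) * np.asarray(mask)) * side ** 2)

def dust_mass_g(volume_mm3, density_coefficient=1.0):
    """Corrected mass: volume * standard density * density coefficient."""
    return volume_mm3 * DENSITY_G_PER_MM3 * density_coefficient

def grade(mass_g):
    """Map the mass to the cleanliness levels of S86."""
    if mass_g < 0.05:
        return "excellent"
    if mass_g <= 0.2:
        return "good"
    return "poor"
```

For example, at an assumed working distance of 160 mm each pixel covers 0.0345 mm, so a uniform 0.5 mm layer over 100 by 100 masked pixels gives about 5.95 mm³, or roughly 0.009 g at the standard density, which falls in the excellent range.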
[0131] Example 1: To verify the feasibility of this invention in the monitoring of smart home appliance cleaning, the method of this invention was applied to the automatic cleaning system of a high-end smart air conditioner indoor unit of a well-known home appliance manufacturer (hereinafter referred to as "Company A"). In traditional air conditioner cleanliness detection systems, a single visual sensor is typically used to estimate the dust accumulation area based on color thresholds or simple edge detection algorithms. These methods not only struggle to accurately distinguish dust accumulation areas from background shadows under complex lighting and dark fin backgrounds, but also fail to accurately obtain the thickness and density of the dust accumulation layer, easily leading to misjudgments of dust accumulation mass or false triggering of cleaning commands. To solve the above problems, Company A decided to adopt the deep learning-based air conditioner indoor unit cleanliness detection method proposed in this invention.
[0132] During implementation, Company A first used a miniature 3D laser scanner and a high-resolution RGB camera deployed on the air conditioner's air guide plate and inner wall to acquire real-time point cloud data and texture image streams of the evaporator surface. After preprocessing operations such as spatiotemporal registration, denoising and filling, and texture normalization, a joint implicit field including reflectivity attributes was constructed. Simultaneously, Company A's technical experts performed precise labeling of the dust accumulation areas and thickness measurements on the collected multi-source data, serving as a benchmark for model training and quality evaluation.
[0133] Company A improved the BiSeNet model by constructing a spatial detail encoding layer and a contextual semantic encoding layer. It used the gradient vectors of the differential depth map to calculate the tangent direction of contour lines, and performed offset addressing and feature interpolation aggregation along the tangent direction to generate a high-resolution spatial feature map. Simultaneously, it used a 5-degree virtual camera tilt perspective transformation parameter to calculate the projection mapping relationship, extract the disparity consistency metric, and output a low-resolution semantic feature map. Subsequently, it used an attention feature fusion layer combined with a geometric confidence mask based on physical depth residuals to filter out interfering features with a geometric confidence level below 0.3. The calibrated semantic feature map and spatial feature map were then added pixel-by-pixel to output a fused and enhanced feature map.
[0134] In the core segmentation and evaluation stage, this invention uses the dust accumulation area segmentation output layer, with a 1x1 convolution kernel and a sigmoid activation function, to perform pixel-by-pixel classification, generating a binary effective dust accumulation area mask and thus achieving precise localization of the dust accumulation area. Subsequently, the virtual camera parameters are used to calibrate the physical pixel size of the differential depth map, and the total dust accumulation volume is generated by summing over the pixels. A grayscale-depth nonlinear mapping function is constructed by combining the texture grayscale features with a Gaussian process regression model to calculate the dust accumulation density coefficient, thereby obtaining the corrected dust accumulation mass parameter. Finally, the system compares the dust accumulation mass parameter with the preset cleanliness thresholds (0.05 grams, 0.2 grams) to determine the cleanliness level as excellent, good, or poor, and triggers the automatic cleaning function or a prompt control command accordingly, achieving a closed loop from perception to cleaning decision-making.
[0135] During implementation, Company A's technical team found that, compared to traditional timed cleaning and single-vision detection methods, the method of this invention significantly improves the accuracy of air conditioner dust accumulation identification and the efficiency of cleaning resource utilization. Traditional methods cannot quantify the volume and density of dust accumulation and have poor identification effects on thin layers of dust or dark oil stains, easily leading to "missed scans" or "over-cleaning." In contrast, the method of this invention achieves accurate perception of dust accumulation morphology and quantitative assessment of dust accumulation mass through contour-tangent-guided feature extraction, multi-view parallax consistency verification, and dust accumulation density coefficient correction.
[0136] To further verify the actual performance of the method of the present invention, Company A conducted a detailed comparative test between the method of the present invention and the traditional method. The specific performance data is shown in Table 1:
[0137] Table 1. Performance Comparison of Dust Accumulation Identification and Cleaning Control Methods for Air Conditioner Indoor Units of Company A
[0138]
Metric | Traditional method | Method of the present invention | Change
Dust accumulation area segmentation accuracy (%) | 84.2 | 97.5 | +13.3%
Dust accumulation mass estimation error (%) | 25.6 | 3.8 | -85.2%
False detection rate under complex backgrounds (%) | 15.8 | 1.2 | -92.4%
Average depth measurement error (mm) | 0.8 | 0.05 | -93.8%
Thin-layer dust accumulation detection rate (%) | 72.0 | 96.0 | +24.0%
Invalid cleaning triggers (times / month) | 12 | 1 | -91.7%
Time for a single detection process (seconds) | 2.5 | 0.8 | -68.0%
Cleanliness compliance rate for dense dust accumulation (%) | 88.0 | 99.2 | +11.2%
User cleaning satisfaction (%) | 89.0 | 98.5 | +9.5%
Maintenance and consumable costs (RMB / year) | 300 | 180 | -40.0%
[0139] As shown in Table 1, the performance of the air conditioner indoor unit dust accumulation identification and cleaning control system improved across all measured indicators after applying the method of this invention. The accuracy of dust accumulation area segmentation increased from 84.2% with the traditional method to 97.5%, and the dust accumulation mass estimation error decreased from 25.6% to 3.8%, improving the accuracy of cleanliness assessment and providing a reliable basis for intelligent cleaning. The false detection rate under complex backgrounds decreased from 15.8% to 1.2%, effectively avoiding false cleaning triggered by shadow interference. The average depth measurement error decreased from 0.8 mm to 0.05 mm, enhancing the ability to perceive thin, sub-millimeter dust layers. In addition, the cleanliness compliance rate for dense dust accumulation increased from 88.0% to 99.2%, and maintenance and consumable costs decreased from 300 yuan / year to 180 yuan / year, reducing operation and maintenance costs. User cleaning satisfaction also improved, from 89.0% to 98.5%.
[0140] With the method of this invention, Company A has achieved accurate perception of the dust accumulation status of the air conditioner indoor unit and on-demand cleaning control, addressing the "blind cleaning" and "incomplete cleaning" problems of traditional air conditioners. This helps protect users' respiratory health, improves the intelligence and energy efficiency of air conditioner cleaning, improves the user experience, strengthens the environmental adaptability and robustness of the smart home appliance system, and provides technical support for building a healthy smart home ecosystem.
[0141] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A method for detecting the cleanliness of an air conditioner indoor unit based on deep learning, characterized in that it includes the following steps:
S1. Project a sinusoidal phase-shift coding pattern under clean conditions, acquire multiple frames of phase-shift images and calculate the phase, and reconstruct a three-dimensional digital reference model by combining calibration parameters;
S2. Project the coded pattern at a preset period while the air conditioner is running, collect real-time images, and decode and reconstruct the real-time three-dimensional point cloud data of the target component surface;
S3. Project the real-time 3D point cloud data onto the depth image space, construct a robust objective function and calculate the initial transformation matrix to obtain coarsely registered point cloud data;
S4. Based on the coarsely registered point cloud data, construct a dynamic weight map, dynamically remove outliers by using the point-to-plane ICP algorithm combined with the cosine of the angle between the normal vectors, and correct the coordinates to obtain the registered real-time 3D point cloud data;
S5. Extract the texture grayscale of the registered real-time 3D point cloud data, and construct a joint implicit field by fitting the mapping from spatial coordinates to the signed distance field and the illumination reflectivity, using the geometric topology of the three-dimensional digital reference model as a constraint;
S6. Using the ray marching algorithm and starting from the initial spatial position of the registered real-time 3D point cloud data, sample and track the joint implicit field to decouple and generate a standard depth map and a diffuse texture map;
S7. Input the standard depth map and the diffuse texture map into the improved BiSeNet model; project the three-dimensional digital reference model to calculate the differential depth map and stitch it with the texture map to construct a dual-channel feature tensor; generate a geometric confidence mask from the geometric residual based on the differentiable physical rendering mapping; and introduce the geometric confidence mask to perform gated attention fusion on the dual-scale features and output the effective dust accumulation area mask;
S8. Based on the effective dust accumulation area mask, perform three-dimensional integration of the differential depth map to calculate the total dust accumulation volume, establish the mapping relationship between diffuse texture and differential depth to solve the dust accumulation density coefficient, and match the cleanliness threshold to output the grade result and trigger control.
2. The method for detecting the cleanliness of an air conditioner indoor unit based on deep learning according to claim 1, characterized in that S1 specifically includes:
S11. Solve the wrapped phase and calculate the absolute phase using the multi-frequency heterodyne principle; combining the calibration parameters and the system calibration matrix, calculate the three-dimensional spatial coordinates of the pixel points according to the phase-depth mapping formula, and convert the coordinates to the world coordinate system to generate the initial reference point cloud data;
S12. Perform statistical filtering on the initial reference point cloud data, calculate the average distance from each point to its neighboring points and remove outlier noise points whose distance exceeds the preset threshold, and generate effective point cloud data;
S13. Perform triangulated mesh reconstruction on the effective point cloud data, connect spatial points according to the consistency of their normal vectors and the neighboring-point distance criterion, construct the topological connection relationship and generate a three-dimensional mesh model;
S14. Project the three-dimensional mesh model onto the image plane, determine the visibility of the facets and map the texture color information using the ray casting method, and generate a three-dimensional digital reference model with texture mapping.
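The statistical filtering of step S12 can be illustrated with a short sketch. It assumes a brute-force neighbor search, a neighborhood size `k`, and an absolute distance threshold `max_mean_dist`; these names and values are hypothetical and are not specified in the claim.

```python
import math

def statistical_filter(points, k=3, max_mean_dist=1.0):
    """Keep points whose mean distance to their k nearest neighbors
    does not exceed max_mean_dist (brute-force O(n^2) search)."""
    kept = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        if nearest and sum(nearest) / len(nearest) <= max_mean_dist:
            kept.append(p)
    return kept

# Four clustered points plus one isolated outlier (illustrative data).
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
         (0.1, 0.1, 0.0), (5.0, 5.0, 5.0)]
filtered = statistical_filter(cloud, k=3, max_mean_dist=0.5)
```

For real point clouds a spatial index such as a k-d tree would replace the quadratic search; the filtering criterion itself stays the same.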
3. The method for detecting the cleanliness of an air conditioner indoor unit based on deep learning according to claim 1, characterized in that S2 specifically includes:
S21. Solve the wrapped phase of the real-time image using the sinusoidal phase shift algorithm, calculate the absolute phase by combining the hierarchical relationship between the high-frequency and low-frequency phases, and map the absolute phase to three-dimensional spatial coordinates according to the system calibration parameters to generate initial real-time three-dimensional point cloud data;
S22. Perform statistical filtering on the initial real-time 3D point cloud data, calculate the average distance from each point in the point cloud to its set of neighboring points and set a distance threshold, remove discrete noise points whose distance exceeds the threshold, and generate filtered point cloud data;
S23. Downsample the filtered point cloud data, divide the point cloud space using a voxel grid, take the centroid of the points falling in the same voxel as the replacement point, reduce the amount of point cloud data while maintaining geometric features, and generate simplified point cloud data;
S24. Extract grayscale values at the pixel coordinate indices of each point in the simplified point cloud data, assign the grayscale values as attribute information to the corresponding three-dimensional spatial points, and generate real-time three-dimensional point cloud data containing texture information.
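The voxel-grid downsampling of step S23 can be sketched as below. The function name and the voxel size are illustrative assumptions; the claim only specifies that points in the same voxel are replaced by their centroid.

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel by their centroid."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    centroids = []
    for key in sorted(buckets):
        pts = buckets[key]
        # Average each coordinate axis over the points in this voxel.
        centroids.append(tuple(sum(axis) / len(pts) for axis in zip(*pts)))
    return centroids

cloud = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.0, 0.0)]
simplified = voxel_downsample(cloud, voxel_size=1.0)
```

The first two points share a voxel and collapse to one centroid, while the third point is alone in its voxel and is kept unchanged.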
4. The method for detecting the cleanliness of an air conditioner indoor unit based on deep learning according to claim 1, characterized in that S3 specifically includes:
S31. Project the real-time 3D point cloud data and the three-dimensional digital reference model onto the depth image space to generate a real-time depth map and a reference depth map; calculate the difference between the depth value at each pixel coordinate in the real-time depth map and the depth value at the corresponding pixel coordinate in the reference depth map to generate a depth residual map; substitute the depth residual value at each pixel coordinate in the depth residual map into the Gaussian kernel function, calculate the negative of the ratio of the squared depth residual value to the variance parameter, and exponentiate this ratio to generate a weight value; multiply the weight value by the square of the corresponding depth residual value to generate a weighted error value, and sum the weighted error values at all pixel coordinates to obtain the robust objective function;
S32. Perform nonlinear optimization on the constructed robust objective function, iteratively update the pose parameters using the Levenberg-Marquardt algorithm until the objective function converges, and calculate the initial transformation matrix;
S33. Use the calculated initial transformation matrix to perform coordinate transformation on the real-time 3D point cloud data to generate coarsely registered point cloud data.
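The robust objective function of step S31 can be evaluated as in the sketch below. It follows the wording of the claim literally, with weight exp(-r^2 / variance) and no additional factor of 2 in the exponent, which is an assumption about the intended kernel. The Levenberg-Marquardt pose update of S32 is not shown.

```python
import math

def robust_objective(live_depth, ref_depth, variance):
    """Sum over pixels of w(r) * r^2 with w(r) = exp(-r^2 / variance)."""
    total = 0.0
    for live_row, ref_row in zip(live_depth, ref_depth):
        for d_live, d_ref in zip(live_row, ref_row):
            r = d_live - d_ref                   # depth residual
            w = math.exp(-(r * r) / variance)    # Gaussian-kernel weight
            total += w * r * r                   # weighted squared error
    return total

# Illustrative 2x2 depth maps: one small residual and one gross outlier.
ref = [[1.0, 1.0], [1.0, 1.0]]
live = [[1.0, 1.1], [1.0, 3.0]]
cost = robust_objective(live, ref, variance=0.25)
```

Because the weight decays rapidly with the residual, the gross outlier contributes far less to the cost than its squared error alone would suggest, which is what makes the objective robust.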
5. The method for detecting the cleanliness of an air conditioner indoor unit based on deep learning according to claim 1, characterized in that S4 specifically includes:
S41. Perform a neighborhood search on each point in the coarsely registered point cloud data, fit a local micro-tangent plane and calculate the normal vector to generate a set of normal vectors;
S42. Project the coarsely registered point cloud data onto the surface of the three-dimensional digital reference model, calculate the reference normal vector at each projection point, and generate a set of reference normal vectors;
S43. Traverse each point in the coarsely registered point cloud data, calculate the cosine of the angle between the current point's normal vector and the corresponding reference normal vector, mark points whose cosine is less than a preset consistency threshold as outliers and remove them, and generate a valid matching point set;
S44. Apply the point-to-plane ICP algorithm to the valid matching point set, construct the perpendicular distance error function between each point and the corresponding tangent plane, and iteratively solve the correction transformation matrix by minimizing the error function;
S45. Use the correction transformation matrix to perform coordinate transformation on the coarsely registered point cloud data to generate registered real-time 3D point cloud data.
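Steps S43 and S44 can be illustrated with the sketch below, which covers only the normal-consistency rejection and the evaluation of the point-to-plane error. The iterative solve for the correction transformation matrix is omitted, correspondences are assumed to be given, all normals are assumed to be unit length, and the `cos_min` value is hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def filter_by_normal(points, normals, ref_points, ref_normals, cos_min=0.9):
    """Keep correspondences whose unit normals agree (cosine >= cos_min)."""
    return [
        (p, q, n_ref)
        for p, n, q, n_ref in zip(points, normals, ref_points, ref_normals)
        if dot(n, n_ref) >= cos_min
    ]

def point_to_plane_error(matches):
    """Sum of squared perpendicular distances from each point p to the
    tangent plane through q with unit normal n_ref."""
    total = 0.0
    for p, q, n_ref in matches:
        diff = tuple(a - b for a, b in zip(p, q))
        total += dot(diff, n_ref) ** 2
    return total

# Two illustrative correspondences; the second has an inconsistent normal.
points = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.2)]
normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
ref_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref_normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]

matches = filter_by_normal(points, normals, ref_points, ref_normals)
error = point_to_plane_error(matches)
```

An ICP iteration would minimize this error over a rigid transform, re-establish correspondences, and repeat until convergence.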
6. The method for detecting the cleanliness of an air conditioner indoor unit based on deep learning according to claim 1, characterized in that S5 specifically includes:
S51. Extract the texture gray value corresponding to each point in the registered real-time 3D point cloud data, normalize the texture gray value to obtain the reflectance value, and generate a reflectance dataset;
S52. Using the bounding box of the three-dimensional digital reference model as the spatial boundary, discretize the interior of the spatial boundary into a voxel grid to generate a set of sampling points;
S53. For each sampling point in the sampling point set, query the grid cell index of the three-dimensional digital reference model at the sampling point to obtain the coordinates of the four vertices of the grid cell and the corresponding unit normal vectors; calculate the vector difference between the sampling point and each vertex, and take the dot product of the vector difference and the unit normal vector of the corresponding vertex to obtain the distance values of the four vertices; perform bilinear interpolation on the distance values of the four vertices to obtain the sampling point distance value; determine the sign of the sampling point distance value according to the position of the sampling point relative to the tangent plane of the grid cell, and generate a signed distance function value; calculate the centroid coordinates relative to the grid cell using the spatial coordinates of the sampling point, and perform a linear weighted sum of the signed distance function value and the centroid coordinates to obtain the geometric implicit field value;
S54. Using the spatial coordinates of the sampling points as the input vector, perform a linear weighted summation of the input vector and the parameter matrix to be optimized; substitute the result into the GELU activation function, which multiplies the input variable by the value of the standard normal cumulative distribution function to obtain the nonlinear response value, perform the feature mapping operation, and generate feature vectors layer by layer; decompose the feature vectors to output the predicted signed distance value and the predicted reflectance value respectively; calculate the first difference between the predicted signed distance value and the geometric implicit field value, and square the first difference to generate the geometric loss term; calculate the second difference between the predicted reflectance value and the actual reflectance value in the reflectance dataset, and square the second difference to generate the photometric loss term; set a geometric constraint weight coefficient and multiply the geometric loss term by it to obtain the weighted geometric error; set a photometric constraint weight coefficient and multiply the photometric loss term by it to obtain the weighted photometric error; sum the weighted geometric error and the weighted photometric error to obtain the joint loss function value; calculate the partial derivative of the joint loss function value with respect to the parameter matrix to be optimized to construct the parameter update gradient; and use this gradient to iteratively correct the parameter matrix to be optimized until the joint loss function value is less than a preset convergence threshold, generating the joint implicit field.
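The GELU activation and the joint loss described in step S54 can be written down directly for a single sampling point. The sketch below evaluates the loss only; the network itself and the gradient update are not shown, and the weight coefficient values are hypothetical.

```python
import math

def gelu(x: float) -> float:
    """GELU(x) = x * Phi(x), where Phi is the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def joint_loss(sdf_pred, sdf_ref, refl_pred, refl_obs, w_geo=1.0, w_photo=1.0):
    """Weighted sum of the geometric and photometric squared errors."""
    geometric = (sdf_pred - sdf_ref) ** 2        # first difference, squared
    photometric = (refl_pred - refl_obs) ** 2    # second difference, squared
    return w_geo * geometric + w_photo * photometric

# Illustrative values for one sampling point.
loss = joint_loss(sdf_pred=1.0, sdf_ref=0.5, refl_pred=0.2, refl_obs=0.4,
                  w_geo=2.0, w_photo=1.0)
```

In training, this per-point loss would be accumulated over all sampling points before the partial derivatives with respect to the parameter matrix are taken.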
7. The method for detecting the cleanliness of an air conditioner indoor unit based on deep learning according to claim 1, characterized in that S6 specifically includes:
S61. Define the imaging plane parameters of the virtual camera, and back-project each pixel on the imaging plane into three-dimensional space to generate multiple virtual rays starting from the optical center of the camera;
S62. Traverse each virtual ray, calculate the spatial distance between the current virtual ray and all data points in the registered real-time 3D point cloud data, and select the target point cloud data point corresponding to the minimum spatial distance; calculate the perpendicular vector from the target point cloud data point to the virtual ray, subtract the perpendicular vector from the spatial coordinates of the target point cloud data point to obtain the projection coordinates of the target point cloud data point on the virtual ray, and set the projection coordinates as the initial sampling starting point; add the product of the preset step size and the direction vector of the virtual ray to the initial sampling starting point coordinates to obtain the coordinates of the current step point; input the coordinates of the current step point into the joint implicit field, and solve for the signed distance field value and the illumination reflectivity value at the current step point;
S63. Determine whether the signed distance field value of the current point is less than the preset distance threshold; if so, determine that the current point is located on the implicit surface of the three-dimensional digital reference model, stop tracking the virtual ray and record the current ray travel length as the depth value, and extract the illumination reflectivity value of the current point as the diffuse texture value;
S64. If the signed distance field value of the current point is greater than or equal to the preset distance threshold, dynamically adjust the next step length according to the signed distance field value of the current point, and continue tracking along the ray direction until the ray travel length exceeds the preset maximum threshold or the surface is determined to be hit;
S65. Arrange the depth values of all virtual rays according to pixel coordinates to generate a standard depth map, and arrange the diffuse texture values of all virtual rays according to pixel coordinates to generate a diffuse texture map.
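The tracking loop of steps S62 to S64 can be sketched for a single ray as follows. This is an illustration under simplifying assumptions: the ray direction is unit length, the starting offset `t_start` is passed in rather than derived from the nearest point cloud point, the step length is set equal to the current signed distance value, and an analytic unit sphere stands in for the joint implicit field. All function and parameter names are hypothetical.

```python
import math

def sphere_trace(origin, direction, field, t_start=0.0,
                 hit_eps=1e-4, max_dist=20.0, max_steps=256):
    """March along a unit-length ray until the signed distance drops below hit_eps.

    field(p) must return (signed_distance, reflectance) at point p.
    Returns (depth, reflectance) on a hit, or None if the ray escapes.
    """
    t = t_start
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist, reflectance = field(p)
        if dist < hit_eps:
            return t, reflectance      # travel length is recorded as depth
        t += dist                      # step length adapts to the distance value
        if t > max_dist:
            break                      # exceeded the maximum travel length
    return None

def unit_sphere_field(p):
    """Hypothetical stand-in for the joint implicit field: a unit sphere
    at the origin with constant reflectance 0.5."""
    return math.sqrt(sum(c * c for c in p)) - 1.0, 0.5

hit = sphere_trace((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), unit_sphere_field)
miss = sphere_trace((0.0, 5.0, -3.0), (0.0, 0.0, 1.0), unit_sphere_field)
```

Running one such trace per pixel and arranging the returned depth and reflectance values by pixel coordinate yields the two maps described in S65.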
8. The method for detecting the cleanliness of an air conditioner indoor unit based on deep learning according to claim 1, characterized in that the improved BiSeNet model includes a differential feature construction layer, an input preprocessing layer, a spatial detail encoding layer, a contextual semantic encoding layer, a geometric consistency verification layer, an attention feature fusion layer, and a dust accumulation area segmentation output layer.
The differential feature construction layer is used to obtain the intrinsic and extrinsic parameters of the virtual camera, project the three-dimensional digital reference model onto the imaging plane to generate a projection depth map, and calculate the differential depth map between the projection depth map and the standard depth map; the normalized differential depth map is stitched with the diffuse texture map to generate a dual-channel feature tensor.
The input preprocessing layer is used to perform batch normalization and zero-padding on the dual-channel feature tensor and output a standard input feature tensor.
The spatial detail encoding layer is used to calculate the pixel-level depth gradient field based on the differential depth map and obtain the geometric edge orientation and normal vector distribution of the dust accumulation surface; using the direction orthogonal to the depth gradient vector as the sampling guide trajectory, the pixels in the standard input feature tensor are addressed with offsets, and feature aggregation and cascaded downsampling are performed along the depth contour direction to output a high-resolution spatial feature map that retains high-frequency texture details and is sensitive to geometric edges.
The contextual semantic encoding layer is used to reconstruct the three-dimensional spatial coordinates and surface normal vectors of pixels by back-projecting from the standard depth map, calculate the projection mapping relationship of the virtual camera for views whose optical axis is deflected by preset angles, perform view transformation of pixel coordinates and feature resampling on the standard input feature tensor based on the projection mapping relationship, calculate the feature response consistency across the different virtual views, extract deep semantic features with disparity invariance, and output a low-resolution semantic feature map containing dust accumulation thickness prediction information.
The geometric consistency verification layer is used to receive the low-resolution semantic feature map and construct a differentiable physical rendering mapping based on the three-dimensional digital reference model. This includes obtaining the intrinsic and extrinsic parameters of the virtual camera, calculating the rotation and translation transformation matrix from the camera coordinate system to the world coordinate system, extracting the vertex coordinates of the three-dimensional digital reference model, and multiplying the vertex coordinates by the rotation and translation transformation matrix and normalizing them to generate a differential geometric projection transformation matrix. A back-projection transformation is performed on the low-resolution semantic feature map to map it to a virtual 3D space and obtain pixel-level predicted dust accumulation thickness values. The predicted dust accumulation thickness values are superimposed on the differential geometric projection transformation matrix along the normal vector direction of the reference surface, and the physical location distribution of the predicted dust accumulation layer on the imaging plane is calculated to generate a physical predicted depth map. The pixel-level residual between the physical predicted depth map and the depth data corresponding to the standard input feature tensor is calculated, and the pixel-level residual is subjected to a negative exponential transformation to generate a geometric confidence mask.
The attention feature fusion layer uses the geometric confidence mask to perform gated modulation on the low-resolution semantic feature map, filters out feature responses that do not conform to the geometric dust accumulation mechanism to obtain a calibrated semantic feature map, and then upsamples the calibrated semantic feature map and adds it pixel by pixel to the high-resolution spatial feature map to output a fused enhanced feature map.
The dust accumulation area segmentation output layer is used to perform pixel-by-pixel classification on the fused enhanced feature map and generate a binary effective dust accumulation area mask based on a preset classification threshold.
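The geometric confidence mask and the gated fusion in this claim can be illustrated on single-channel maps. The sketch below makes several assumptions that the claim leaves open: the negative exponential is applied to the absolute residual divided by a hypothetical `scale`, upsampling is nearest-neighbor by an integer factor, and feature maps are plain nested lists.

```python
import math

def confidence_mask(pred_depth, obs_depth, scale=1.0):
    """exp(-|residual| / scale): 1 where prediction and observation agree,
    approaching 0 as they diverge."""
    return [[math.exp(-abs(p - o) / scale) for p, o in zip(prow, orow)]
            for prow, orow in zip(pred_depth, obs_depth)]

def upsample_nearest(grid, factor):
    """Nearest-neighbor upsampling of a 2D grid by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def gated_fusion(semantic_lo, mask_lo, spatial_hi, factor):
    """Gate the low-resolution semantic map by the mask, upsample it,
    and add it pixel by pixel to the high-resolution spatial map."""
    gated = [[s * m for s, m in zip(srow, mrow)]
             for srow, mrow in zip(semantic_lo, mask_lo)]
    up = upsample_nearest(gated, factor)
    return [[u + h for u, h in zip(urow, hrow)]
            for urow, hrow in zip(up, spatial_hi)]

# Illustrative 1x2 low-resolution maps and a 2x4 high-resolution map.
semantic_lo = [[1.0, 1.0]]
mask_lo = confidence_mask([[0.5, 0.5]], [[0.5, 3.5]])
spatial_hi = [[0.0] * 4 for _ in range(2)]
fused = gated_fusion(semantic_lo, mask_lo, spatial_hi, factor=2)
```

The second low-resolution pixel has a large depth residual, so its semantic response is strongly attenuated before fusion, while the first passes through unchanged.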
9. The method for detecting the cleanliness of an air conditioner indoor unit based on deep learning according to claim 1, characterized in that S8 specifically includes:
S81. Extract the region of interest based on the effective dust accumulation area mask, calibrate the physical pixel size of the differential depth map within the region of interest and perform three-dimensional integration to generate the total dust accumulation volume;
S82. Based on the effective dust accumulation area mask, locate the corresponding area in the diffuse texture map, convert the corresponding area from RGB color space to grayscale space, and calculate the average grayscale value of all pixels in the area as the texture grayscale feature;
S83. Based on the effective dust accumulation area mask, locate the depth values of the corresponding area in the differential depth map, perform min-max normalization to map the values to the [0,1] interval, and calculate the normalized depth mean of all pixels in the area;
S84. Based on the historical dust accumulation sample set, with the texture grayscale feature as the independent variable and the normalized depth mean as the dependent variable, construct a grayscale-depth nonlinear mapping function by fitting a Gaussian process regression;
S85. Input the texture grayscale feature into the grayscale-depth nonlinear mapping function to calculate the theoretically predicted normalized depth value, and calculate the ratio of the theoretically predicted normalized depth value to the actual normalized depth mean as the dust accumulation density coefficient;
S86. Calculate the product of the total dust accumulation volume and the dust accumulation density coefficient to obtain the dust accumulation mass parameter; compare the dust accumulation mass parameter with the preset cleanliness threshold range, output the cleanliness grade result and generate the corresponding control command.
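The arithmetic of steps S81, S85 and S86 can be sketched as below. Several points are assumptions made for illustration: the Gaussian process regression of S84 is not implemented, and its predicted normalized depth is passed in as a plain number; the product of volume and density coefficient is treated as already expressed in the unit of the thresholds; the 0.05 and 0.2 thresholds are the example values given in the description; and the handling of values exactly on a threshold is a choice of this sketch.

```python
def dust_volume(diff_depth, mask, pixel_area):
    """Integrate the differential depth over masked pixels
    (depth times calibrated pixel area, summed)."""
    return sum(d * pixel_area
               for drow, mrow in zip(diff_depth, mask)
               for d, m in zip(drow, mrow) if m)

def density_coefficient(predicted_depth_norm, actual_depth_norm):
    """Ratio of the predicted to the measured normalized depth mean."""
    return predicted_depth_norm / actual_depth_norm

def grade(mass, thresholds=(0.05, 0.2)):
    """Map the mass parameter to a cleanliness grade."""
    low, high = thresholds
    if mass < low:
        return "excellent"
    if mass <= high:
        return "good"
    return "poor"

# Illustrative 2x2 differential depth map and mask.
diff_depth = [[0.02, 0.03], [0.0, 0.05]]
mask = [[1, 1], [0, 1]]
volume = dust_volume(diff_depth, mask, pixel_area=1.0)
coeff = density_coefficient(predicted_depth_norm=0.6, actual_depth_norm=0.5)
mass = volume * coeff
level = grade(mass)
```

A fitted regression model would supply `predicted_depth_norm` from the texture grayscale feature; everything downstream of that value is plain arithmetic.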