Blind path recognition method and system based on deep neural network
By combining deep neural networks with perspective transformation and frequency domain decomposition, the problem of blind path recognition failure under complex working conditions was solved, and accurate blind path recognition was achieved in environments such as snow and mud.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- QINGDAO YAHE SCI & TECH DEV
- Filing Date
- 2026-04-29
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies fail to recognize tactile paving under complex conditions due to the breakage of spatial gradients at the edges, making it difficult to achieve accurate recognition in environments such as snow and mud.
By constructing a tactile paving recognition method based on deep neural networks, perspective distortion is eliminated by using a perspective transformation matrix, local phase consistency features are extracted by frequency domain decomposition, adaptive feature fusion is performed by combining information entropy evaluation, and the enhanced feature tensor is input into the neural network for recognition.
To maintain recognition accuracy in complex environments, deterministic recognition of tactile paving targets and environmental adaptability are achieved through frequency domain phase consistency and information entropy adjustment, reducing sensitivity to illumination and occlusion.
Figure CN122244775A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of visual detection for assistive guidance within information processing technology. It relates to a method and system for tactile paving recognition based on deep neural networks, intended for guide terminals that help visually impaired people perceive the road surface under complex occlusion conditions. By exploiting phase structure consistency, it addresses the recognition failure caused by broken spatial gradients, thereby improving the accuracy of tactile paving detection.
Background Technology
[0002] Currently, using computer vision for tactile paving identification is a research hotspot in the field of guide devices for the visually impaired. Deep neural networks extract pixel features from environmental images, and semantic segmentation is achieved by fitting the color and texture of the tactile paving. The feature extraction mechanism of a convolutional neural network relies on the continuity of gray-level changes in pixel space: it captures the spatial gradient magnitude of target edges through convolutions with different receptive fields, generating a high-dimensional tensor that represents the geometric structure of the object. Representative existing technical solutions in this field include:
1. Chinese invention patent with authorization publication number CN108665050A discloses a method for identifying tactile paving based on deep convolutional neural networks. Its technical solution mainly uses an end-to-end semantic segmentation network to classify and extract tactile paving pixels. The drawback of this solution is that its feature extraction depends heavily on the continuity of the pixel spatial gradient. In real urban travel environments, snow cover, rain reflection, mud obstruction, or strong backlight physically interrupts the texture and edge features of the tactile paving in the pixel matrix. The resulting loss of continuity in the spatial pixel matrix causes semantic holes in the original feature matrix and creates a risk of missed detection.
[0003] 2. Chinese invention patent CN105096327B discloses a method for tactile paving location based on computer binocular vision and homography matrix. The technical solution is to detect SURF feature points in the images of left and right cameras, match and solve the plane equation of the tactile paving and the spatial equation of the center line. The premise of this solution is that the environment has stable lighting conditions and the texture features of the tactile paving are clear, so that the feature operator can obtain accurate matching points. Its drawback is that: in real urban travel environments, there is snow cover, mud obstruction or strong backlight interference, and the physical texture and edge features of the tactile paving produce physical dimension breaks in the pixel matrix, resulting in distortion of the continuity of the spatial pixel matrix, which makes the feature detection mechanism unable to capture gradient changes, resulting in recognition gaps or mismatches.
[0004] 3. Chinese invention patent with authorization publication number CN112417754A discloses a tactile paving detection system that combines multi-sensor fusion and temporal analysis. Its technical solution introduces an ultrasonic ranging sensor in conjunction with a monocular camera and uses multi-frame temporal interpolation to compensate for the shortcomings of single-frame recognition. The drawbacks of this solution are: on the one hand, the temporal compensation is constrained by the dynamic blur of the mobile device and fails when the device shakes rapidly; on the other hand, the additional hardware is difficult to popularize in guide terminals due to cost and size limitations, and the spatiotemporal coordination calibration between sensors is complex and difficult to cope with the problem of physical truncation of spatial features under extremely complex working conditions.
[0005] In summary, the core algorithms of the prior art depend on the continuity of local gradient magnitude, which inherently conflicts with the physical truncation of spatial features under complex working conditions, so they are limited in extreme environments. Therefore, the technical problem to be solved by this invention is how to use existing visual acquisition units and the physical geometric periodicity of tactile paving to construct a feature response mechanism with structural penetration capability, so as to achieve deterministic identification of tactile paving targets under complex occlusion conditions.
Summary of the Invention
[0006] The purpose of this invention is to overcome the shortcomings of the prior art and solve the technical problem that the spatial gradient of the tactile paving edge breaks due to complex working conditions such as snow and mud, which leads to recognition failure. This invention provides a tactile paving recognition method based on deep neural networks.
[0007] To achieve the above-mentioned objectives, this invention provides a method for identifying tactile paving based on deep neural networks, the main process of which includes the following steps:
Step S1, Acquisition of raw image data and pitch angle parameters: Acquire raw image data containing the blind road surface and acquire the pitch angle parameters of the imaging sensor when acquiring the raw image data;
Step S2, Perspective transformation processing and top view generation: Construct a perspective transformation matrix based on the pitch angle parameters, and use the perspective transformation matrix to perform spatial geometric mapping processing on the original image data to generate a top view pixel matrix that eliminates perspective distortion;
Step S3, Frequency domain decomposition and phase consistency matrix construction: The pixel matrix of the top view is decomposed in the frequency domain using a preset logarithmic Gabor filter bank, and the local phase consistency feature values of each pixel are extracted to construct a local phase consistency matrix to characterize the continuity of the tactile paving stripes;
Step S4, Calculation of pixel grayscale variance and spatial information entropy: Calculate the pixel grayscale variance and spatial information entropy of the top view pixel matrix within a preset local area;
Step S5, Weight adaptive fusion and feature tensor enhancement: Determine the mixed weight parameters of the local phase consistency matrix and pixel gray-level variance based on the spatial information entropy, and use the mixed weight parameters to perform linear weighted fusion of the local phase consistency matrix and pixel gray-level variance to generate an enhanced feature tensor. When the spatial information entropy increases to a preset entropy threshold, the weight ratio coefficient of the local phase consistency matrix in the fusion process is increased;
Step S6, Neural network feature extraction and recognition result output: Input the enhanced feature tensor into the spatial feature extraction channel of the deep neural network, use the deep neural network to extract tactile paving features, and output the tactile paving recognition result.
[0008] In step S3 of this invention, a preset log-Gabor filter bank is used to perform frequency domain decomposition on the top-view pixel matrix, extract local phase consistency feature values, and construct a local phase consistency matrix. This includes: Step S31, constructing a multi-scale, multi-directional Log-Gabor filter bank using preset center frequencies and directional distribution parameters; Step S32, performing a discrete Fourier transform on the top-view pixel matrix and mapping it to the frequency domain feature space; Step S33, using the log-Gabor filter bank to perform spectral sampling in the frequency domain feature space and extracting response components at different scales; Step S34, calculating the energy broadening distribution and local phase distribution of each pixel point based on the complex amplitude and phase information of each response component; Step S35, determining the local phase consistency feature values based on the proportional relationship between the energy broadening distribution and the local phase distribution, and constructing a local phase consistency matrix based on the local phase consistency feature values.
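As a rough illustration of step S31, the sketch below builds a multi-scale, multi-directional log-Gabor filter bank directly in the frequency domain with NumPy. The function name and all default parameter values (number of scales and orientations, minimum wavelength, bandwidth ratios) are assumptions chosen for illustration and are not specified in the text.

```python
import numpy as np

def log_gabor_bank(shape, n_scales=4, n_orients=6, min_wavelength=3.0,
                   mult=2.1, sigma_on_f=0.55, d_theta_on_sigma=1.2):
    """Return filters of shape (n_orients, n_scales, rows, cols) in FFT layout.

    All default values are illustrative assumptions, not values from the text.
    """
    rows, cols = shape
    # Normalised frequency grid with the DC term at index [0, 0].
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0  # avoid log(0); the DC gain is zeroed below
    theta = np.arctan2(-fy, fx)
    theta_sigma = np.pi / n_orients / d_theta_on_sigma

    bank = np.empty((n_orients, n_scales, rows, cols))
    for o in range(n_orients):
        angle = o * np.pi / n_orients
        # Angular distance to the filter orientation, wrapped to [0, pi].
        d_theta = np.abs(np.arctan2(np.sin(theta - angle), np.cos(theta - angle)))
        spread = np.exp(-d_theta**2 / (2 * theta_sigma**2))
        for s in range(n_scales):
            f0 = 1.0 / (min_wavelength * mult**s)  # centre frequency of scale s
            radial = np.exp(-np.log(radius / f0)**2 / (2 * np.log(sigma_on_f)**2))
            radial[0, 0] = 0.0  # a log-Gabor filter has no DC component
            bank[o, s] = radial * spread
    return bank
```

Under these assumptions, steps S32 and S33 would correspond to `np.fft.ifft2(np.fft.fft2(image) * bank[o, s])`, which yields a complex response whose magnitude and angle give the amplitude and local phase at that scale and orientation.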
[0009] In step S2 of this invention, the original image data is processed by spatial geometric mapping using a perspective transformation matrix to generate a top-view pixel matrix that eliminates perspective distortion. This includes: Step S21: Calculating the rotational transformation constraint relationship between the imaging optical axis and the tactile paving plane using pitch angle parameters; Step S22: Determining the ground sampling distance of each pixel row in the original image data relative to the imaging sensor; Step S23: Determining the resampling step size based on the ground sampling distance, and using the resampling step size to perform homography reconstruction on the original image data; Step S24: Outputting a geometrically scale-aligned top-view pixel matrix, wherein tactile paving targets of equal physical width in the top-view pixel matrix have a consistent pixel projection width.
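Step S22 can be illustrated with simple pinhole geometry. The sketch below assumes a flat ground plane, a camera at a known height, and a pitch angle measured downward from the horizontal; the function name and parameters (`height_m`, `f_px`, `cy`) are hypothetical and are not taken from the text.

```python
import math

def row_ground_distance(v, pitch_rad, height_m, f_px, cy):
    """Horizontal distance (m) from the point below the camera to where the
    viewing ray through image row v meets a flat ground plane."""
    # Angle of this row's viewing ray below the horizontal.
    angle = pitch_rad + math.atan((v - cy) / f_px)
    if angle <= 0:
        return math.inf  # ray at or above the horizon never reaches the ground
    return height_m / math.tan(angle)
```

Rows near the bottom of the image map to short ground distances and rows near the horizon to long ones, which is why a row-dependent resampling step is needed before equal physical widths project to equal pixel widths.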
[0010] In step S5 of this invention, the mixed weight parameters of the local phase consistency matrix and the pixel grayscale variance are determined based on the spatial information entropy, and an enhanced feature tensor is generated. This includes: Step S51: Comparing the spatial information entropy with a preset noise interference threshold; Step S52: When the spatial information entropy exceeds the noise interference threshold, determining that there is physical occlusion in the current local area, and reducing the weight coefficient of the pixel grayscale variance; Step S53: Compensating the weight coefficient of the local phase consistency matrix to the weight share of the reduced pixel grayscale variance, so as to use the phase coherence feature in the local phase consistency matrix to complete the texture of the disturbed area in the top view pixel matrix and generate an enhanced feature tensor.
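Steps S51 to S53 can be sketched as a per-pixel linear blend whose two weights always sum to one, so that any weight removed from the grayscale variance is transferred to the phase consistency term. The threshold and the two weight levels below are hypothetical values for illustration, as is the normalisation of the variance map.

```python
import numpy as np

def fuse_features(pc, variance, entropy,
                  noise_threshold=5.0, w_low=0.3, w_high=0.8):
    """Blend phase consistency and normalised variance maps of equal shape.

    noise_threshold, w_low and w_high are illustrative assumptions.
    """
    # Scale the variance into [0, 1] so both inputs are comparable.
    var_norm = variance / (variance.max() + 1e-12)
    # Where entropy exceeds the threshold, shift weight towards phase consistency.
    w_pc = np.where(entropy > noise_threshold, w_high, w_low)
    return w_pc * pc + (1.0 - w_pc) * var_norm
```

A hard switch between two weight levels is the simplest reading of the steps; a smooth function of entropy would also satisfy the description.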
[0011] In step S4 of this invention, the spatial information entropy of the top-view pixel matrix within a preset local region is calculated using the following formula: $H = -\sum_{i=0}^{L-1} p_i \log p_i$, where $H$ is the spatial information entropy, $L$ is the total number of gray levels, and $p_i$ is the probability of gray value $i$ appearing in the preset local area.
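A minimal sketch of this entropy calculation for one local region follows, assuming an 8-bit grayscale patch and a base-2 logarithm (the logarithm base is an assumption; the text does not state it).

```python
import numpy as np

def local_entropy(patch, levels=256):
    """Shannon entropy (bits) of the gray-level histogram of an integer patch."""
    hist = np.bincount(patch.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]  # empty bins contribute nothing to the sum
    return float(-(p * np.log2(p)).sum())
```

A uniform patch gives an entropy of zero, while a patch whose pixels are spread evenly over many gray levels gives a high value, which is the property the later fusion step relies on.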
[0012] The step S6 of this invention, which involves inputting the enhanced feature tensor into the spatial feature extraction channel of a deep neural network, includes: Step S61: Extracting a high-dimensional semantic feature map of the top-view pixel matrix using the convolutional layer of the deep neural network; Step S62: Mapping the enhanced feature tensor to an attention weight mask and superimposing the attention weight mask onto the pixel coordinates of the high-dimensional semantic feature map; Step S63: Using the enhanced feature tensor to suppress background noise in areas where the spatial information entropy exceeds a preset entropy threshold.
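Step S62 can be illustrated as follows. The text does not say how the enhanced feature tensor is mapped to an attention mask, so this sketch assumes a sigmoid squashing and assumes the feature map and the enhanced tensor share the same spatial size; both choices are illustrative only.

```python
import numpy as np

def apply_attention(feature_map, enhanced):
    """Weight a (C, H, W) feature map by a mask derived from an (H, W) tensor."""
    mask = 1.0 / (1.0 + np.exp(-enhanced))  # sigmoid maps values into (0, 1)
    return feature_map * mask[None, :, :]   # broadcast the mask over channels
```

Pixels with a weak enhanced response receive a mask value near zero, which attenuates the corresponding activations in every channel.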
[0013] The deep neural network described in this invention comprises multiple depthwise separable convolutional layers and global average pooling layers, wherein the kernel size of the depthwise separable convolutional layers is set to 3×3. This limits the time overhead of a single forward inference operation on a mobile processor.
[0014] The training process of the deep neural network described in this invention includes: Step S81: Obtaining a tactile paving labeled dataset containing conditions such as mud cover, snow cover, and strong light flooding; Step S82: Inputting the labeled dataset into the initial network model and using the local phase consistency matrix corresponding to the labeled dataset for feature-guided training; Step S83: Updating the internal weights of the deep neural network through the backpropagation algorithm until the tactile paving recognition accuracy under occlusion conditions reaches a preset target value. The tactile paving recognition result includes the coordinate parameters of the tactile paving centerline and the tactile paving category attributes; the tactile paving category attributes include the attributes of strip-shaped guiding tactile paving and the attributes of dotted tactile paving.
[0015] The present invention further includes: step S91: real-time statistical analysis of the geometric center position offset of the recognition result between adjacent time frames; step S92: when the geometric center position offset continuously exceeds the preset safety deviation threshold, a status warning signal is sent to the external guide device.
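The warning logic of steps S91 and S92 can be sketched as a run-length check on the frame-to-frame offset. The text says only that the offset "continuously" exceeds the threshold, so the number of consecutive frames required (`min_consecutive`) is a hypothetical parameter.

```python
def should_warn(offsets, threshold, min_consecutive=3):
    """Return True once the offset exceeds the threshold on
    min_consecutive frames in a row."""
    run = 0
    for off in offsets:
        # Extend the run while the threshold is exceeded, otherwise reset it.
        run = run + 1 if abs(off) > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

Requiring several consecutive frames avoids raising a warning on a single noisy detection.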
[0016] The present invention also provides a tactile paving identification system based on a deep neural network, comprising: The signal acquisition unit is used to acquire raw image data including the blind road surface and to acquire the imaging sensor pitch angle parameters when acquiring the raw image data; The geometric transformation unit is used to construct a perspective transformation matrix based on the pitch angle parameter, and to perform spatial geometric mapping processing on the original image data using the perspective transformation matrix to generate a top view pixel matrix that eliminates perspective distortion. The phase analysis unit is used to perform frequency domain decomposition on the top view pixel matrix using a preset logarithmic Gabor filter bank, extract the local phase consistency feature values of each pixel, and construct a local phase consistency matrix to characterize the continuity of tactile paving stripes. The quality assessment unit is used to calculate the pixel grayscale variance and spatial information entropy of the top-view pixel matrix within a preset local area. The feature enhancement unit is used to determine the mixed weight parameters of the local phase consistency matrix and the pixel gray-level variance based on the spatial information entropy, and to use the mixed weight parameters to perform linear weighted fusion of the local phase consistency matrix and the pixel gray-level variance to generate an enhanced feature tensor. When the spatial information entropy increases to a preset entropy threshold, the feature enhancement unit increases the weight ratio coefficient of the local phase consistency matrix in the fusion process. The reasoning and recognition unit is used to input the enhanced feature tensor into the spatial feature extraction channel of the deep neural network, use the deep neural network to extract tactile paving features, and output the tactile paving recognition results.
[0017] Compared with the prior art, the present invention has at least the following beneficial effects: Firstly, in the blind path recognition of deep neural networks, the logical transformation of image feature extraction from pixel gradient dependence to structural phase constraint is realized. This invention utilizes the two-dimensional geometric periodic physical properties of blind paths and constructs a feature response mechanism with illumination invariance and structural penetration capability by extracting local phase consistency features in the image frequency domain. Since the phase information can still maintain a high degree of continuity when the image amplitude fluctuates drastically or the texture edge is physically broken due to mud and snow, this mechanism makes the recognition process no longer solely dependent on the easily disturbed spatial gradient amplitude. This ensures that the system can still achieve mathematical reconstruction of the target structure through the continuity of the main frequency energy in the frequency domain under complex lighting environments or physical occlusion conditions.
[0018] Secondly, it achieves global consistency alignment between 3D perspective distortion and spatial frequency distribution. By constructing a perspective projection transformation matrix using the elevation angle parameters of the imaging device when acquiring the current environmental image, the perspective image is converted into an orthogonal projection matrix. This processing step eliminates the nonlinear frequency drift caused by the perspective relationship at the initial stage of data flow, so that the tactile paving structure of the same physical period presents a constant spatial frequency reference in the entire image matrix. This deep coupling of geometric prior and frequency domain analysis provides a stable feature space for subsequent narrowband filtering and phase extraction, avoiding feature response failure caused by changes in viewing angle.
[0019] Third, an adaptive feature compensation and dynamic weight arbitration loop based on information entropy evaluation is constructed. By calculating the spatial information entropy of local areas of the image, the degree of occlusion or noise interference in the area is evaluated in real time, and the fusion weight of frequency domain features and spatial features is dynamically adjusted accordingly. When the information entropy of a local area increases sharply due to strong light flooding or foreign object occlusion, the system autonomously enhances the guiding role of phase consistency weight and uses structural priors to complete the broken spatial semantics. In low-noise areas, the edge sharpness of spatial features is preserved. This closed-loop modulation method based on data quality feedback solves the inherent contradiction between detail preservation and robustness in feature representation.
Attached Figure Description
[0020] Figure 1 is a schematic diagram of the processing flow of the tactile paving identification algorithm based on frequency-space feature fusion involved in this invention; Figure 2 is a schematic diagram of the end-to-end logical state transition and risk warning of the guide system for the visually impaired involved in this invention.
Detailed Implementation
[0021] The technical solution of the present invention will be clearly and completely described below with reference to the embodiments and accompanying drawings.
[0022] Example 1: This embodiment relates to a method for identifying tactile paving based on deep neural networks, the main process of which includes the following steps:
Step S1, Acquisition of raw image data and pitch angle parameters: Acquire raw image data containing the blind road surface and acquire the pitch angle parameters of the imaging sensor when acquiring the raw image data;
Step S2, Perspective transformation processing and top view generation: Construct a perspective transformation matrix based on the pitch angle parameters, and use the perspective transformation matrix to perform spatial geometric mapping processing on the original image data to generate a top view pixel matrix that eliminates perspective distortion;
Step S3, Frequency domain decomposition and phase consistency matrix construction: The pixel matrix of the top view is decomposed in the frequency domain using a preset log-Gabor filter bank, and the local phase consistency feature values of each pixel are extracted to construct a local phase consistency matrix to characterize the continuity of the tactile paving stripes;
Step S4, Calculation of pixel grayscale variance and spatial information entropy: Calculate the pixel grayscale variance and spatial information entropy of the top view pixel matrix within a preset local area;
Step S5, Weight adaptive fusion and feature tensor enhancement: Determine the mixed weight parameters of the local phase consistency matrix and pixel gray-level variance based on the spatial information entropy, and use the mixed weight parameters to perform linear weighted fusion of the local phase consistency matrix and pixel gray-level variance to generate an enhanced feature tensor. When the spatial information entropy increases to a preset entropy threshold, the weight ratio coefficient of the local phase consistency matrix in the fusion process is increased;
Step S6, Neural network feature extraction and recognition result output: Input the enhanced feature tensor into the spatial feature extraction channel of the deep neural network, use the deep neural network to extract tactile paving features, and output the tactile paving recognition result.
[0023] In step S3 of this embodiment, a preset log-Gabor filter bank is used to perform frequency domain decomposition on the top-view pixel matrix, extract local phase consistency feature values, and construct a local phase consistency matrix. This includes: step S31, constructing a multi-scale, multi-directional log-Gabor filter bank using preset center frequency and directional distribution parameters; step S32, performing a discrete Fourier transform on the top-view pixel matrix and mapping it to the frequency domain feature space; step S33, using the log-Gabor filter bank to perform spectral sampling in the frequency domain feature space and extracting response components at different scales; step S34, calculating the energy broadening distribution and local phase distribution of each pixel point based on the complex amplitude and phase information of each response component; and step S35, determining the local phase consistency feature values based on the proportional relationship between the energy broadening distribution and the local phase distribution, and constructing a local phase consistency matrix based on the local phase consistency feature values.
[0024] In step S2 of this embodiment, the original image data is spatially geometrically mapped using a perspective transformation matrix to generate a top-view pixel matrix that eliminates perspective distortion. This includes: Step S21: Calculating the rotational transformation constraint relationship between the imaging optical axis and the tactile paving plane using pitch angle parameters; Step S22: Determining the ground sampling distance of each pixel row in the original image data relative to the imaging sensor; Step S23: Determining the resampling step size based on the ground sampling distance, and using the resampling step size to perform homography reconstruction on the original image data; Step S24: Outputting a geometrically scale-aligned top-view pixel matrix, wherein tactile paving targets of equal physical width in the top-view pixel matrix have a consistent pixel projection width.
[0025] In this embodiment, step S5 determines the mixed weight parameters of the local phase consistency matrix and pixel grayscale variance based on spatial information entropy and generates an enhanced feature tensor, including: Step S51: Compare the spatial information entropy with a preset noise interference threshold; Step S52: When the spatial information entropy exceeds the noise interference threshold, determine that there is physical occlusion in the current local area and reduce the weight coefficient of the pixel grayscale variance; Step S53: Compensate the weight coefficient of the local phase consistency matrix to the weight share of the reduced pixel grayscale variance, so as to use the phase coherence feature in the local phase consistency matrix to complete the texture of the disturbed area in the top view pixel matrix and generate an enhanced feature tensor.
[0026] In step S4 of this embodiment, the spatial information entropy of the top-view pixel matrix within a preset local region is calculated using the following formula: $H = -\sum_{i=0}^{L-1} p_i \log p_i$, where $H$ is the spatial information entropy, $L$ is the total number of gray levels, and $p_i$ is the probability of gray value $i$ appearing in the preset local area.
[0027] In this embodiment, step S6, which involves inputting the enhanced feature tensor into the spatial feature extraction channel of the deep neural network, includes: step S61: extracting a high-dimensional semantic feature map of the top-view pixel matrix using the convolutional layer of the deep neural network; step S62: mapping the enhanced feature tensor to an attention weight mask and superimposing the attention weight mask onto the pixel coordinates of the high-dimensional semantic feature map; and step S63: using the enhanced feature tensor to suppress background noise in areas where the spatial information entropy exceeds a preset entropy threshold.
[0028] The deep neural network described in this embodiment includes multiple layers of depthwise separable convolutional layers and global average pooling layers. The kernel size of the depthwise separable convolutional layers is set to 3×3 to limit the time overhead of a single forward inference during operation on a mobile processor.
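The inference-cost argument can be made concrete by counting weights. The sketch below compares a standard 3×3 convolution with a 3×3 depthwise separable one; the channel counts (32 in, 64 out) are hypothetical and chosen only for the arithmetic.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 32, 64)                  # 18432
separable = depthwise_separable_params(3, 32, 64)  # 2336
```

For these channel counts the separable layer uses roughly one eighth of the weights, and the multiply-accumulate count per output pixel shrinks by the same ratio.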
[0029] The training process of the deep neural network described in this embodiment includes: Step S81: Obtaining a tactile paving labeled dataset containing conditions such as mud cover, snow cover, and strong light flooding; Step S82: Inputting the labeled dataset into the initial network model and using the local phase consistency matrix corresponding to the labeled dataset for feature-guided training; Step S83: Updating the internal weights of the deep neural network through the backpropagation algorithm until the tactile paving recognition accuracy under occlusion conditions reaches a preset target value. The tactile paving recognition result includes the coordinate parameters of the tactile paving centerline and the tactile paving category attributes; the tactile paving category attributes include the attributes of strip-shaped guiding tactile paving and the attributes of dotted tactile paving.
[0030] This embodiment also includes: step S91: real-time statistical analysis of the geometric center position offset of the recognition result between adjacent time frames; step S92: when the geometric center position offset continuously exceeds the preset safety deviation threshold, a status warning signal is sent to the external guide device.
[0031] This embodiment relates to a blind path recognition system based on a deep neural network, the main structure of which includes: The signal acquisition unit is used to acquire raw image data including the blind road surface and to acquire the imaging sensor pitch angle parameters when acquiring the raw image data; The geometric transformation unit is used to construct a perspective transformation matrix based on the pitch angle parameter, and to perform spatial geometric mapping processing on the original image data using the perspective transformation matrix to generate a top view pixel matrix that eliminates perspective distortion. The phase analysis unit is used to perform frequency domain decomposition on the top view pixel matrix using a preset log-Gabor filter bank, extract the local phase consistency feature values of each pixel, and construct a local phase consistency matrix to characterize the continuity of tactile paving stripes. The quality assessment unit is used to calculate the pixel grayscale variance and spatial information entropy of the top-view pixel matrix within a preset local area. The feature enhancement unit is used to determine the mixed weight parameters of the local phase consistency matrix and the pixel gray-level variance based on the spatial information entropy, and to use the mixed weight parameters to perform linear weighted fusion of the local phase consistency matrix and the pixel gray-level variance to generate an enhanced feature tensor. When the spatial information entropy increases to a preset entropy threshold, the feature enhancement unit increases the weight ratio coefficient of the local phase consistency matrix in the fusion process. The reasoning and recognition unit is used to input the enhanced feature tensor into the spatial feature extraction channel of the deep neural network, use the deep neural network to extract tactile paving features, and output the tactile paving recognition results.
[0032] Example 2: In this embodiment, in a specific urban navigation scenario for the visually impaired, a navigation terminal equipped with a low-power edge computing module operates on a muddy road surface after snowmelt. The road surface contains irregular mud patches, puddles, and unevenly melted snow. This environmental noise physically truncates the tactile paving texture and makes edge gradients vanish in the image spatial domain, so recognition logic that relies on local pixel gradient changes risks feature extraction failure. The signal acquisition unit acquires the original image data containing the road surface and simultaneously records the pitch angle parameters of the imaging sensor at the time of acquisition. The navigation terminal integrates a microelectromechanical (MEMS) inertial measurement unit that simultaneously acquires triaxial acceleration and triaxial angular velocity, and uses a Kalman filter algorithm to fuse these measurement sequences and output a real-time spatial attitude deviation value as the pitch angle parameter of the imaging sensor. The geometric transformation unit constructs a perspective transformation matrix based on this parameter and uses it to perform spatial geometric mapping on the original image data, generating a top-view pixel matrix that eliminates perspective distortion at different distances. This step ensures that tactile paving targets of equal physical width at different distances in the image have a consistent pixel projection width, and the texture frequency, which varies nonlinearly with the viewing angle, is normalized to a constant spatial frequency reference.
When constructing this perspective transformation matrix and calculating the ground sampling distance, the system retrieves the vertical installation height parameters of the imaging sensor and the intrinsic optical focal length of the lens, which are pre-programmed into the non-volatile memory. By substituting the real-time acquired pitch angle parameters into the coordinate system rotation component and using the preset vertical installation height parameters as the translation component reference, the system, together with the camera intrinsic parameter matrix, calculates the absolute homography parameter matrix containing real spatial scale information, thereby constructing a deterministic perspective image model that extends from the pixel image plane to the physical ground.
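The construction described above can be sketched as a ground-plane homography built from the pitch angle, the installation height, and the intrinsic matrix. The axis conventions (world: X right, Y forward, Z up; camera: x right, y down, z along the optical axis), the zero roll and yaw, and all function and parameter names are assumptions made for this illustration.

```python
import numpy as np

def ground_to_image_homography(pitch_rad, height_m, f_px, cx, cy):
    """3x3 homography mapping ground points (X, Y, 1) in metres to pixels.

    Assumes a pinhole camera pitched down by pitch_rad with no roll or yaw.
    """
    s, c = np.sin(pitch_rad), np.cos(pitch_rad)
    K = np.array([[f_px, 0.0, cx],
                  [0.0, f_px, cy],
                  [0.0, 0.0, 1.0]])
    # Rows are the camera x, y, z axes expressed in world coordinates.
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, -s, -c],
                  [0.0, c, -s]])
    # Translation for a camera centre at (0, 0, height_m).
    t = -R @ np.array([0.0, 0.0, height_m])
    # For points on the plane Z = 0 only the first two rotation columns matter.
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def project(H, X, Y):
    """Apply homography H to ground point (X, Y) and return pixel (u, v)."""
    u, v, w = H @ np.array([X, Y, 1.0])
    return u / w, v / w
```

Sampling the source image at `project(H, X, Y)` over a regular grid of ground coordinates would give a top view in which equal physical widths occupy equal pixel widths, because the grid spacing is defined in metres rather than in source pixels.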
[0033] The phase analysis unit uses a preset Log-Gabor filter bank to perform frequency domain decomposition on the top-view pixel matrix. By constructing a multi-scale, multi-directional filter bank and performing spectral sampling in the frequency domain feature space after a discrete Fourier transform, it extracts the energy broadening distribution and local phase distribution of each pixel and determines local phase consistency eigenvalues to construct a local phase consistency matrix. The extraction process is based on the local energy model of image features, under which the strength of an image edge structure at a point is determined by the degree of phase coincidence of the Fourier frequency components at that point. For any coordinate point $(x, y)$ in the top-view pixel matrix, the local phase consistency eigenvalue $PC(x, y)$ is calculated as follows:

$$PC(x, y) = \frac{\sum_{n} W(x, y) \left\lfloor A_n(x, y) \cos\left(\phi_n(x, y) - \bar{\phi}(x, y)\right) - T \right\rfloor}{\sum_{n} A_n(x, y) + \varepsilon}$$

where $(x, y)$ represents the coordinates of a point in the spatial coordinate system; $PC(x, y)$ is the dimensionless local phase consistency eigenvalue, whose value range is constrained by the underlying mathematical mapping to between 0 and 1; $W(x, y)$ represents the frequency weighting factor based on bandwidth broadening; $A_n(x, y)$ and $\phi_n(x, y)$ represent the complex amplitude and local phase of the scale-$n$ component after decomposition by the Log-Gabor filter bank; $\bar{\phi}(x, y)$ represents the weighted average phase within the corresponding local neighborhood; $T$ is the background noise energy threshold used to suppress false texture responses, estimated from the median amplitude response at the minimum scale in the frequency domain; $\varepsilon$ is a minimal constant that prevents overflow in the denominator calculation; and the operator $\lfloor \cdot \rfloor$ retains the original value when the enclosed algebraic result is positive and outputs zero when it is negative.
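To make the phase consistency idea concrete, the following is a one-dimensional sketch only, not the patented two-dimensional, multi-directional implementation. It assumes a weighting factor of 1, a cosine phase-deviation term, three hypothetical wavelengths, and a log-Gabor bandwidth ratio of 0.55; all function and parameter names are illustrative.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (adequate for short rows)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def log_gabor_gains(n, f0, sigma_ratio=0.55):
    """One-sided log-Gabor transfer function centred on frequency f0."""
    gains = [0.0] * n
    for k in range(1, n // 2):  # positive frequencies only, DC excluded
        f = k / n
        gains[k] = math.exp(-(math.log(f / f0) ** 2)
                            / (2.0 * math.log(sigma_ratio) ** 2))
    return gains

def filter_response(spectrum, gains):
    """Complex response per sample: real = even part, imaginary = odd part."""
    n = len(spectrum)
    return [sum(spectrum[k] * gains[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(1, n // 2)) / n
            for t in range(n)]

def phase_consistency(signal, wavelengths=(4, 8, 16), T=0.0, eps=1e-12):
    """Per-sample phase consistency in [0, 1] (assumed weighting W = 1)."""
    spectrum = dft(signal)
    n = len(signal)
    responses = [filter_response(spectrum, log_gabor_gains(n, 1.0 / w))
                 for w in wavelengths]
    pc = []
    for t in range(n):
        comps = [r[t] for r in responses]
        mean_phase = cmath.phase(sum(comps))  # amplitude-weighted mean phase
        num = 0.0
        for c in comps:
            term = abs(c) * math.cos(cmath.phase(c) - mean_phase) - T
            num += max(term, 0.0)  # floor operator: negative results become 0
        pc.append(num / (sum(abs(c) for c in comps) + eps))
    return pc
```

With `T = 0`, scaling the input amplitude leaves the result essentially unchanged, which illustrates why the measure stays stable when occlusion weakens the gradient magnitude.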
When estimating the background noise energy threshold $T$, the system constructs a statistical histogram of the full-image amplitude response at the minimum frequency domain scale and extracts the median of its absolute values as the median amplitude response. Based on the statistical prior that high-frequency noise in natural images follows a Rayleigh distribution, the system divides this median amplitude response by an empirical constant (approximately 0.8) to obtain an estimate of the standard deviation of the floor noise, and then multiplies this estimate by a preset confidence coefficient (usually 3). Through this algebraic transformation, the frequency domain amplitude data is linearly mapped to an absolute truncation energy boundary that can be used to determine the authenticity of features. Since phase information maintains structural coherence even when the image amplitude fluctuates due to mud occlusion, this step provides a frequency-domain-prior basis for structurally completing physically truncated textures. The quality assessment unit simultaneously calculates the pixel grayscale variance and spatial information entropy of the top-view pixel matrix within a preset local region. The spatial information entropy $H$ is calculated as follows:

$$H = -\sum_{i=0}^{L-1} p_i \log_2 p_i$$

where $H$ is the spatial information entropy, $L$ is the total number of gray levels, and $p_i$ is the probability that grayscale value $i$ occurs within the preset local region. The feature enhancement unit compares this spatial information entropy $H$ with the preset noise interference threshold. When a local area is occluded by mud and dirt, its grayscale distribution becomes chaotic and its spatial information entropy rises; when the noise interference threshold is exceeded, the area is determined to be physically occluded, and the weight coefficients of the local phase consistency matrix and the pixel gray-level variance are adjusted.
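The two quantities described above can be sketched in a few lines. The constants 0.8 and 3 come from the text; the function names and the representation of a local region as a nested list of integer gray values are assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import median

def noise_threshold(min_scale_amplitudes, rayleigh_const=0.8, confidence=3.0):
    """Threshold T = confidence * (median |amplitude| / empirical constant)."""
    sigma_estimate = median(abs(a) for a in min_scale_amplitudes) / rayleigh_const
    return confidence * sigma_estimate

def local_entropy(patch):
    """Shannon entropy (bits) of the gray-level histogram of one local region."""
    counts = Counter(v for row in patch for v in row)
    total = sum(counts.values())
    # Gray levels that never occur contribute nothing, so only observed
    # levels are summed.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A uniform patch yields zero entropy, while a patch split evenly between two gray values yields exactly one bit.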
To prevent features with different value ranges from swamping one another when linearly combined, before fusion the feature enhancement unit extracts the global maximum and minimum values of the pixel gray-level variance in the top-view pixel matrix. A range normalization operator linearly compresses the pixel gray-level variance to the interval between 0 and 1 and outputs a normalized variance matrix. The weight coefficients are dynamically allocated according to the spatial information entropy H. The normalized variance matrix and the local phase consistency matrix are combined by element-wise weighting along the pixel coordinate system, increasing the proportion of the local phase consistency matrix in the fusion process. The enhanced phase structure prior compensates for the missing spatial features and generates an enhanced feature tensor with structural penetration capability.
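A minimal sketch of the normalization and entropy-adaptive fusion follows. It assumes a simple two-level weight rule (0.5 below the threshold, 0.85 above it, borrowing the two weight values quoted in Example 3); the patent may allocate weights differently, and all function names here are hypothetical.

```python
def min_max_normalize(matrix):
    """Linearly compress a 2-D list of values to the interval [0, 1]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / span for v in row] for row in matrix]

def phase_weight(entropy, threshold, w_base=0.5, w_occluded=0.85):
    """Assumed rule: raise the phase weight once entropy exceeds the threshold."""
    return w_occluded if entropy > threshold else w_base

def fuse(pc, variance, entropy, threshold):
    """Pixel-wise weighted sum of phase consistency and normalized variance."""
    var_n = min_max_normalize(variance)
    fused = []
    for r in range(len(pc)):
        row = []
        for c in range(len(pc[r])):
            w = phase_weight(entropy[r][c], threshold)
            row.append(w * pc[r][c] + (1.0 - w) * var_n[r][c])
        fused.append(row)
    return fused
```

Keeping the two weights complementary (summing to 1) means the fused value stays in the same 0 to 1 range as its inputs.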
[0034] The inference and recognition unit inputs the enhanced feature tensor into the spatial feature extraction channel of the deep neural network, which extracts high-dimensional semantic feature maps through a depthwise separable convolutional layer with a kernel size of 3×3. The enhanced feature tensor is mapped to an attention weight mask and superimposed on the feature map to suppress background noise in high-entropy regions and enhance the tactile paving feature response. The output is a recognition result containing the coordinates of the tactile paving centerline and the tactile paving category attribute. Even in areas where mud cover breaks the spatial gradient, the system reconstructs the geometric attributes of the tactile paving target using frequency domain phase consistency constraints, achieving deterministic recognition in occluded environments. The system also performs real-time statistical analysis between adjacent time frames: it calculates the geometric center offset and sends a status warning signal to the external guide device when the offset exceeds the safety deviation threshold. To optimize the weights of the deep neural network using labeled samples, the system acquires a dataset of tactile paving under mud and snow cover. To ensure that subsequent feature matching is valid at the physical scale, after acquiring the initial low-resolution high-dimensional semantic feature map output by the deep neural network, a transposed convolution operator performs cascaded spatial upsampling, restoring the two-dimensional spatial array size to the resolution of the local phase consistency matrix. Simultaneously, a multilayer perceptron is used to perform channel-dimensional feature mapping on the local phase consistency matrix.
The channel depth is expanded to match the upsampled high-dimensional semantic feature map, thus constructing a common tensor space that is equivalent in both spatial resolution and semantic depth. During the model parameter iteration phase, the inference and recognition unit extracts the local phase consistency matrix of each training sample. This matrix is used as a structure guiding term, and the structural correlation between the high-dimensional semantic feature map output by the deep neural network and this matrix in the same-dimensional tensor space is calculated through a feature matching operator. The feature matching operator extracts the channel feature vectors of the high-dimensional semantic feature map and the local phase consistency matrix at each spatial pixel and calculates the normalized cosine similarity between the two vectors pixel by pixel to generate a structural correlation scalar map. When generating the total loss function, the system calculates the difference between the structural correlation scalar map and its global mean as a physical structure penalty term. This penalty term is then linearly weighted and summed with the cross-entropy loss term of the original network model for the tactile paving category to form the composite, physically constrained training objective. The system incorporates the deviation corresponding to the structural correlation into the total loss function and corrects the convolution weights of the neural network through the backpropagation algorithm, so that the response region of the high-dimensional semantic feature map focuses on continuous stripe positions with high phase consistency. When the total loss function reaches the convergence criterion, the model parameters for the occlusion condition are frozen.
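The feature matching operator and the penalty term can be sketched as follows. This is an illustration under stated assumptions: the "difference from the global mean" is taken here as a mean absolute deviation, the penalty weight of 0.1 is arbitrary, and the tensors are plain nested lists in height × width × channel order. None of these choices is confirmed by the patent text.

```python
import math

def cosine_similarity(u, v, eps=1e-12):
    """Normalized cosine similarity between two channel vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def structural_correlation_map(semantic, phase):
    """Per-pixel similarity between two H x W x C feature tensors."""
    return [[cosine_similarity(s, p) for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(semantic, phase)]

def structure_penalty(corr_map):
    """Assumed form: mean absolute deviation of the map from its global mean."""
    flat = [v for row in corr_map for v in row]
    mean = sum(flat) / len(flat)
    return sum(abs(v - mean) for v in flat) / len(flat)

def total_loss(cross_entropy, corr_map, penalty_weight=0.1):
    """Cross-entropy plus the linearly weighted structure penalty."""
    return cross_entropy + penalty_weight * structure_penalty(corr_map)
```

In a real training loop these operations would be expressed in a differentiable framework so that backpropagation can reach the convolution weights; the pure-Python form above only shows the arithmetic.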
[0035] Example 3: This embodiment uses a guide terminal equipped with a 4.0 TOPS floating-point processor and a 1280×720 pixel resolution CMOS imaging sensor as the experimental platform. The elevation angle parameter of the imaging sensor is set in the range of 45 degrees to 60 degrees to simulate a handheld posture. Gaussian white noise with a signal-to-noise ratio of 20 dB is injected into the input raw image data to generate reference data composed of the original top-view pixel matrix. The center frequency $f_0$ of the Log-Gabor filter bank is set from the average projection spacing $\lambda$ of the tactile paving stripes in the top-view pixel matrix according to the mapping rule $f_0 = 1 / \lambda$, where $f_0$ is the center frequency in pixels$^{-1}$ and $\lambda$ is the average projection spacing in pixels. With an average projection spacing of 15 pixels, the center frequency is set to approximately 0.067 pixels$^{-1}$. The phase analysis unit calculates the local phase consistency feature value. In a local area where 40% of the area is obscured by mud, the spatial pixel gradient amplitude decreases by 82.5% while the local phase consistency feature value remains above 0.72, because the phase information maintains structural coherence even when the image amplitude fluctuates due to occlusion. The feature enhancement unit adjusts the hybrid weight parameters based on the spatial information entropy: when the spatial information entropy of the occluded area increases from 3.2 bits to 5.8 bits, the mixing weight of the local phase consistency matrix is increased from 0.5 to 0.85, and the output enhanced feature tensor achieves a tactile paving recognition accuracy of 96.8% after deep neural network inference.
[0036] Comparative experiments were designed with different occlusion levels and parameter boundaries. At 0% occlusion, the recognition accuracies of the experimental group using the method of this invention and the comparison group using only spatial convolution feature extraction were 99.2% and 98.5%, respectively. When the occlusion rate increased to 20%, the accuracy of the experimental group was 98.4%, while that of the comparison group decreased to 89.2%. At 40% occlusion, the recognition accuracy of the comparison group was 65.4%, while the experimental group maintained 96.5% through phase structure compensation. At 60% occlusion, the recognition accuracy of the experimental group was 92.1%, while that of the comparison group was below 40.0%. A control group with the weight adaptive adjustment mechanism removed had an accuracy of 81.2% at 40% occlusion. For the parameter boundaries, the noise interference threshold for the spatial information entropy is set between 4.5 and 6.5. When the threshold is below 4.0, suppression of the tactile paving edges reduces recognition accuracy by more than 15.0%. When the threshold is above 7.5, the false detection rate increases from 1.2% to 12.8%. The defined threshold range is therefore the working window that balances signal-to-noise ratio and feature sensitivity. For occlusion rates below 75%, the recognition accuracy of the present invention shows a linear decreasing trend as the occluded area increases. Once the occlusion rate exceeds the inflection point of 80%, the recognition accuracy declines rapidly, dropping below 60.0% at 90% occlusion.
During the process of the signal-to-noise ratio decreasing from 30dB to 15dB, the growth rate of recognition error is controlled within 5.0%, which confirms the suppressive effect of the frequency domain phase feature and spatial gray-scale variance weighted fusion mechanism on environmental interference factors, achieving the engineering goal of transforming the original image data into deterministic guide decision information.
[0037] Example 4: In this embodiment, during the system initialization phase, the geometric transformation unit combines the elevation angle of the imaging sensor with the optical focal length parameter to determine the projection width of the tactile paving stripes in the top-view pixel matrix. A horizontal sampling sequence of length [missing value] is selected to extract pixel grayscale distribution characteristics and calculate the peak offset of the autocorrelation function, thereby determining the dominant stripe frequency corresponding to the current imaging depth. Based on this dominant stripe frequency, the phase analysis unit sets the center frequency of the Log-Gabor filter bank and constructs frequency domain response functions with four scales and six directions by adjusting the scaling factor, ensuring that the spectral energy region of the tactile paving stripes is covered. Before the recognition task begins, the quality assessment unit selects the background road surface area and collects 50 consecutive frames of image data to calculate the background spatial information entropy and the variance of the entropy values within this time interval. The sum of the background spatial information entropy and twice the variance offset is set as the noise interference threshold. During inference, the feature enhancement unit extracts the spatial information entropy of local regions and compares it with the noise interference threshold to generate a single-channel occlusion confidence map, in which the value of each pixel lies between 0 and 1.
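The autocorrelation step can be illustrated with a short sketch that estimates the dominant stripe period from one horizontal row of gray values and converts it to a center frequency. The peak-picking rule (skip the zero-lag lobe, then take the largest value up to half the row length) and the function names are assumptions for illustration.

```python
def dominant_period(row):
    """Lag (in pixels) of the strongest autocorrelation peak after lag 0."""
    n = len(row)
    mean = sum(row) / n
    centered = [v - mean for v in row]
    acf = [sum(centered[i] * centered[i + lag] for i in range(n - lag))
           for lag in range(n // 2 + 1)]
    # Skip the main lobe around lag 0: start searching at the first lag
    # where the autocorrelation turns negative.
    start = next((lag for lag, v in enumerate(acf) if v < 0), None)
    if start is None:
        raise ValueError("no periodic structure found in this row")
    return max(range(start, len(acf)), key=lambda lag: acf[lag])

def center_frequency(row):
    """Center frequency in cycles per pixel, the reciprocal of the period."""
    return 1.0 / dominant_period(row)
```

For a synthetic stripe pattern that repeats every 15 pixels, the estimated period is 15 and the resulting center frequency is 1/15, matching the value of about 0.067 quoted in Example 3.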
[0038] The feature enhancement unit processes the occlusion confidence map using a 1×1 convolutional layer, expanding the number of feature channels to match the dimension of the convolutional layer output tensor, and concatenates it with the previously generated enhanced feature tensor along the feature channel dimension. A second 1×1 convolutional layer then performs cross-channel parameter recombination and channel dimensionality reduction to output a single-channel spatial response probability matrix. An attention weight mask is generated by processing this matrix with a sigmoid activation function. The inference and recognition unit performs element-wise multiplication of the semantic feature map and the attention weight mask to suppress background noise in areas with high spatial information entropy and large occlusions. To account for the scale differences that arise as the two data paths pass through successive network layers, the system calls a bilinear interpolation algorithm to upsample the spatial resolution of the single-channel enhanced feature tensor, aligning the size of its two-dimensional pixel array with the semantic feature map. By injecting the structural features provided by the local phase consistency matrix through the residual connection path, the system reconstructs the response of the tactile paving texture at positions where the pixel gradient is broken and outputs the recognition result containing the tactile paving type and geometric parameters.
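The masking and residual injection described above reduce to a small amount of arithmetic per pixel. The sketch below assumes the spatial response matrix and the structural features already share the feature map's resolution, and that the residual term is simply added to every channel; both are simplifications, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def apply_attention(features, response, structure):
    """Mask a C x H x W feature map and add a residual structure term.

    features:  C x H x W semantic feature map
    response:  H x W spatial response matrix (pre-activation)
    structure: H x W structural features, e.g. phase consistency values
    """
    mask = [[sigmoid(v) for v in row] for row in response]
    return [[[channel[r][c] * mask[r][c] + structure[r][c]
              for c in range(len(mask[r]))]
             for r in range(len(mask))]
            for channel in features]
```

The residual term is what keeps a stripe response alive at pixels where the mask has suppressed the gradient-based features.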
[0039] Example 5: In this embodiment, within a simulated environment containing various tactile paving specifications, texture samples with a physical spacing of ... are obtained to establish a spatial frequency mapping reference. With the imaging sensor at a fixed installation height, the pitch angle is adjusted, and the system acquires multiple sets of calibration images and extracts the dominant frequency of each set after spatial geometric mapping. Based on this, the linear compensation coefficients between the center frequency of the Log-Gabor filter bank and the pitch angle are determined and written into the non-volatile memory of the guide terminal, so that in subsequent recognition the phase analysis unit can dynamically generate the corresponding filter frequency domain response mask based on the tilt feedback value of the imaging sensor.
[0040] When the imaging lens focal length is switched or the sensor installation height changes, the guide terminal captures an image of two ground markers separated by a known physical distance before entering the recognition mode. The geometric transformation unit calculates the unit pixel length of the ground markers in the top-view pixel matrix and compares the result with the factory-preset scale to determine a scale factor. This scale factor is used to correct the component parameters of the perspective transformation matrix, and the spatial information entropy of the top-view pixel matrix is calculated to verify the integrity of the calibrated texture structure. When the energy peak of the feature response tensor is within the preset convergence range, the system locks the current perspective mapping relationship and completes the on-site initialization debugging.
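One way to picture the on-site scale correction is sketched below. It assumes the perspective transformation matrix maps image coordinates to top-view pixel coordinates, so that a uniform rescale of the top view corresponds to left-multiplying by diag(s, s, 1); the function names and the millimetre-per-pixel convention are illustrative assumptions rather than details from the patent.

```python
import math

def scale_factor(marker_a_px, marker_b_px, known_distance_mm, preset_mm_per_px):
    """Ratio of the measured top-view scale to the factory-preset scale."""
    pixel_distance = math.dist(marker_a_px, marker_b_px)
    measured_mm_per_px = known_distance_mm / pixel_distance
    return measured_mm_per_px / preset_mm_per_px

def rescale_homography(h, s):
    """Left-multiply a 3 x 3 homography by diag(s, s, 1).

    This scales the top-view x and y outputs by s, so that one pixel again
    corresponds to the preset physical length.
    """
    return [[s * v for v in h[0]],
            [s * v for v in h[1]],
            list(h[2])]
```

For example, if two markers 1000 mm apart appear 100 pixels apart while the preset scale is 8 mm per pixel, the factor is 1.25 and the top view is enlarged accordingly.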
[0041] Example 6: In this embodiment, under a standardized factory calibration environment, the system selects sample images with known physical materials and standard texture frequencies to establish an entropy mapping benchmark, and uses the quality assessment unit to collect the spatial information entropy corresponding to different mud and dirt occlusion rates. The slope $k$ and intercept $b$ of the linear change of the weighting coefficient are determined by least-squares fitting of this data against the associated local phase consistency eigenvalues, and the following linear correspondence model is constructed:

$$w = k \cdot H + b$$

where $w$ is the fusion weight of the local phase consistency matrix, $H$ is the calculated spatial information entropy, $k$ is the calibrated weight slope, and $b$ is the calibrated weight bias. The calibration procedure is executed iteratively at preset sensor pitch angles of 45°, 50°, and 60°, generating multiple sets of adaptive mapping tables stored in non-volatile memory. This allows the feature enhancement unit to select weights during operation based on real-time angle feedback values and texture frequency parameters. When the guide terminal is deployed in complex road environments with dynamic lighting changes, the feature enhancement unit obtains the original feature tensor output by the inference and recognition unit at the depthwise separable convolutional layer. It then uses a 1×1 convolution operator to expand the number of channels in the occlusion confidence map from 1 to the same dimension as the original feature tensor. The system uses the sigmoid-activated occlusion confidence map as an attention mask and multiplies it element-wise with the original feature tensor at each spatial coordinate to suppress local high-entropy noise responses caused by snow melting. The system also uses residual connection paths to add the structural features extracted by the phase analysis unit to the multiplied tensor.
When the quality assessment unit detects that the instantaneous jump in the background spatial information entropy exceeds a preset offset threshold of 20%, the system initiates an online recalibration process. It updates the noise interference threshold by collecting the statistical distribution of preset edge regions in the current frame image and outputs an environmentally adaptable recognition result.
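The linear correspondence model of Example 6 and the 20% recalibration trigger can be sketched together. The ordinary least-squares fit below is a standard closed form; clamping the weight to the interval from 0 to 1 and measuring the jump relative to the previous entropy value are assumptions, as are all function names.

```python
def fit_weight_line(entropies, weights):
    """Ordinary least-squares fit of weight = k * entropy + b."""
    n = len(entropies)
    mean_h = sum(entropies) / n
    mean_w = sum(weights) / n
    sxx = sum((h - mean_h) ** 2 for h in entropies)
    sxy = sum((h - mean_h) * (w - mean_w) for h, w in zip(entropies, weights))
    k = sxy / sxx
    b = mean_w - k * mean_h
    return k, b

def fusion_weight(entropy, k, b):
    """Evaluate the linear model, clamped to [0, 1] (clamping is assumed)."""
    return min(1.0, max(0.0, k * entropy + b))

def needs_recalibration(previous_entropy, current_entropy, limit=0.20):
    """True when the relative jump in background entropy exceeds the limit."""
    return abs(current_entropy - previous_entropy) / previous_entropy > limit
```

One such `(k, b)` pair would be fitted per calibrated pitch angle and looked up at run time from the stored mapping tables.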
[0042] After the inference and recognition unit extracts the position coordinates of the tactile paving centerline, the system collects the position data $(x_t, y_t)$ of the current time frame and the position data $(x_{t-1}, y_{t-1})$ of the adjacent previous frame in the time domain. The geometric center position offset $D$ is calculated using the following formula:

$$D = \sqrt{(x_t - x_{t-1})^2 + (y_t - y_{t-1})^2}$$

where $D$ is the geometric center position offset, $(x_t, y_t)$ is the current coordinate position, and $(x_{t-1}, y_{t-1})$ is the coordinate position at the previous moment. The quality assessment unit monitors the offset $D$. When the offset exceeds the preset 15-pixel safety deviation threshold for three consecutive frames, the system determines that the current walking path has deviated from the tactile paving guide trajectory and outputs a pulse trigger signal to the guide terminal. The 15-pixel safety deviation threshold is determined as follows: the system takes the ground sampling distance of the top-view pixel matrix that was solved in the perspective distortion correction stage and divides the lateral physical limit span of the tactile paving (that is, the one-sided distance from the outer edge of the standard stripe area to the center line) by the physical sampling size corresponding to a single pixel. The resulting quotient is rounded down to obtain the dimensionless pixel parameter of 15. This parameter establishes, at the physical engineering level, the safe tolerance limit within which a walking stride does not leave the boundary of the tactile paving texture.
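The threshold derivation and the three-frame rule can be sketched as follows. The Euclidean offset, the function names, and the example values of 150 mm lateral span and 10 mm per pixel are assumptions chosen only so that the quotient equals the 15 pixels stated in the text.

```python
import math

def pixel_threshold(lateral_span_mm, gsd_mm_per_px):
    """Lateral limit span divided by ground sampling distance, rounded down."""
    return math.floor(lateral_span_mm / gsd_mm_per_px)

def deviation_alerts(centers, threshold_px=15, consecutive=3):
    """Flag each frame transition where the offset has exceeded the
    threshold for `consecutive` transitions in a row.

    centers: list of (x, y) centerline positions, one per frame.
    """
    alerts = []
    run = 0
    for previous, current in zip(centers, centers[1:]):
        offset = math.dist(previous, current)  # Euclidean frame-to-frame offset
        run = run + 1 if offset > threshold_px else 0
        alerts.append(run >= consecutive)
    return alerts
```

Requiring several consecutive exceedances filters out single-frame jitter in the detected centerline before a warning pulse is issued.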
Claims
1. A method for identifying tactile paving based on deep neural networks, characterized in that the main process includes the following steps: Step S1: Obtain the original image data containing the tactile paving surface and obtain the imaging sensor pitch angle parameters when acquiring the original image data; Step S2: Construct a perspective transformation matrix based on the pitch angle parameters, and use the perspective transformation matrix to perform spatial geometric mapping processing on the original image data to generate a top-view pixel matrix that eliminates perspective distortion; Step S3: Use a preset Log-Gabor filter bank to perform frequency domain decomposition on the top-view pixel matrix, extract the local phase consistency feature values of each pixel, and construct a local phase consistency matrix to characterize the continuity of the tactile paving stripes; Step S4: Calculate the pixel grayscale variance and spatial information entropy of the top-view pixel matrix within a preset local area; Step S5: Determine the hybrid weight parameters of the local phase consistency matrix and the pixel gray-level variance based on the spatial information entropy, and use the hybrid weight parameters to perform linear weighted fusion of the local phase consistency matrix and the pixel gray-level variance to generate an enhanced feature tensor, wherein when the spatial information entropy increases to a preset entropy threshold, the weight ratio coefficient of the local phase consistency matrix in the fusion process is increased; Step S6: Input the enhanced feature tensor into the spatial feature extraction channel of the deep neural network, use the deep neural network to extract tactile paving features, and output the tactile paving recognition result.
2. The method for identifying tactile paving based on a deep neural network according to claim 1, characterized in that, Step S3 involves using a preset Log-Gabor filter bank to perform frequency domain decomposition on the top-view pixel matrix, extracting local phase consistency feature values, and constructing a local phase consistency matrix. This includes: Step S31, constructing a multi-scale, multi-directional Log-Gabor filter bank using preset center frequencies and directional distribution parameters; Step S32, performing a discrete Fourier transform on the top-view pixel matrix and mapping it to the frequency domain feature space; Step S33, using the Log-Gabor filter bank to perform spectral sampling in the frequency domain feature space and extracting response components at different scales; Step S34, calculating the energy broadening distribution and local phase distribution of each pixel based on the complex amplitude and phase information of each response component; and Step S35, determining the local phase consistency feature values based on the proportional relationship between the energy broadening distribution and the local phase distribution, and constructing a local phase consistency matrix based on these feature values.
3. The method for identifying tactile paving based on a deep neural network according to claim 1, characterized in that Step S2, using a perspective transformation matrix to perform spatial geometric mapping processing on the original image data to generate a top-view pixel matrix that eliminates perspective distortion, includes: Step S21: Calculating the rotational transformation constraint relationship between the imaging optical axis and the tactile paving plane using the pitch angle parameter; Step S22: Determining the ground sampling distance of each pixel row in the original image data relative to the imaging sensor; Step S23: Determining the resampling step size based on the ground sampling distance, and using the resampling step size to perform homography reconstruction on the original image data; Step S24: Outputting a geometrically scale-aligned top-view pixel matrix, wherein tactile paving targets of equal physical width in the top-view pixel matrix have a consistent pixel projection width.
4. The method for identifying tactile paving based on a deep neural network according to claim 1, characterized in that Step S5, determining the hybrid weight parameters of the local phase consistency matrix and the pixel grayscale variance based on the spatial information entropy and generating an enhanced feature tensor, includes: Step S51: Compare the spatial information entropy with a preset noise interference threshold; Step S52: When the spatial information entropy exceeds the noise interference threshold, determine that there is physical occlusion in the current local area, and reduce the weight coefficient of the pixel grayscale variance; Step S53: Transfer the weight share removed from the pixel grayscale variance to the weight coefficient of the local phase consistency matrix, so as to use the phase coherence feature in the local phase consistency matrix to complete the texture of the disturbed area in the top-view pixel matrix and generate an enhanced feature tensor.
5. The method for identifying tactile paving based on a deep neural network according to claim 1, characterized in that, in step S4, the spatial information entropy of the top-view pixel matrix within a preset local region is calculated using the following formula:

$$H = -\sum_{i=0}^{L-1} p_i \log_2 p_i$$

where $H$ is the spatial information entropy, $L$ is the total number of gray levels, and $p_i$ is the probability that grayscale value $i$ appears in the preset local area.
6. The method for identifying tactile paving based on a deep neural network according to claim 1, characterized in that, Step S6 involves inputting the enhanced feature tensor into the spatial feature extraction channel of the deep neural network, including: Step S61: using the convolutional layer of the deep neural network to extract a high-dimensional semantic feature map of the top-view pixel matrix; Step S62: mapping the enhanced feature tensor to an attention weight mask and superimposing the attention weight mask onto the pixel coordinates of the high-dimensional semantic feature map; Step S63: using the enhanced feature tensor to suppress background noise in areas where the spatial information entropy exceeds a preset entropy threshold.
7. The method for identifying tactile paving based on a deep neural network according to claim 1, characterized in that, The deep neural network contains multiple layers of depthwise separable convolutional layers and global average pooling layers. The kernel size of the depthwise separable convolutional layers is set to 3×3 to limit the time cost of a single forward inference during operation on a mobile processor.
8. The method for identifying tactile paving based on a deep neural network according to claim 1, characterized in that, The training process of the deep neural network includes: Step S81: Obtaining a labeled dataset of tactile paving under conditions of mud cover, snow cover, and strong light flooding; Step S82: Inputting the labeled dataset into the initial network model and using the local phase consistency matrix corresponding to the labeled dataset for feature-guided training; Step S83: Updating the internal weights of the deep neural network through the backpropagation algorithm until the tactile paving recognition accuracy under occlusion conditions reaches the preset target value. The tactile paving recognition results include the coordinate parameters of the center line position of the tactile paving and the tactile paving category attributes; the tactile paving category attributes include the attributes of strip-shaped guiding tactile paving and the attributes of dotted tactile paving.
9. The method for identifying tactile paving based on a deep neural network according to claim 8, characterized in that, It also includes: step S91: real-time statistical analysis of the geometric center position offset of the recognition result between adjacent time frames; step S92: when the geometric center position offset continuously exceeds the preset safety deviation threshold, a status warning signal is sent to the external guide device.
10. A tactile paving identification system based on a deep neural network, used to implement the tactile paving identification method based on a deep neural network as described in claim 1, characterized in that it includes: The signal acquisition unit is used to acquire raw image data including the tactile paving surface and to acquire the imaging sensor pitch angle parameters when acquiring the raw image data; The geometric transformation unit is used to construct a perspective transformation matrix based on the pitch angle parameter, and to perform spatial geometric mapping processing on the original image data using the perspective transformation matrix to generate a top-view pixel matrix that eliminates perspective distortion; The phase analysis unit is used to perform frequency domain decomposition on the top-view pixel matrix using a preset Log-Gabor filter bank, extract the local phase consistency feature values of each pixel, and construct a local phase consistency matrix to characterize the continuity of tactile paving stripes; The quality assessment unit is used to calculate the pixel grayscale variance and spatial information entropy of the top-view pixel matrix within a preset local area; The feature enhancement unit is used to determine the hybrid weight parameters of the local phase consistency matrix and the pixel gray-level variance based on the spatial information entropy, and to use the hybrid weight parameters to perform linear weighted fusion of the local phase consistency matrix and the pixel gray-level variance to generate an enhanced feature tensor, wherein when the spatial information entropy increases to a preset entropy threshold, the feature enhancement unit increases the weight ratio coefficient of the local phase consistency matrix in the fusion process;
The reasoning and recognition unit is used to input the enhanced feature tensor into the spatial feature extraction channel of the deep neural network, use the deep neural network to extract tactile paving features, and output the tactile paving recognition results.
Citation Information
Patent Citations
A Blind Track Localization Method Based on Computer Binocular Vision and Homography Matrix
CN105096327B
Automatic counting method of metallurgical rod material
CN108665050A
Crowd evacuation simulation method based on scene semantic information under complex indoor structure
CN112417754A