A road sign inspection system and method
The road sign inspection system, which features autonomous planning and closed-loop optimization, solves the problems of passive data collection and fuzzy scene recognition in existing technologies. It achieves efficient and stable sign recognition and system optimization, making it suitable for road sign inspection in intelligent transportation systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHENZHEN URBAN TRANSPORT PLANNING CENT CO LTD
- Filing Date
- 2026-05-14
- Publication Date
- 2026-06-16
Figure CN122223973A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of intelligent transportation technology, and specifically relates to a road sign inspection system and inspection method.
Background Technology
[0002] With the development of digital maintenance of road infrastructure and autonomous driving technology, the collection of information on traffic facilities such as road signs, directional signs, speed limit signs, and prohibition signs has gradually shifted from manual photography and data entry to vehicle-mounted perception, automatic recognition, and platform-based management. Road inspection scenarios are characterized by long routes, dispersed objects, significant variations in lighting, numerous obstructions, and constantly changing vehicle movement. Especially in urban expressways, main roads, and complex intersections, relying solely on passive data collection along fixed routes or single-shot vehicle-mounted recognition makes it difficult to simultaneously ensure data collection quality, real-time processing, bandwidth costs, and task closure capabilities.
[0003] Existing road sign inspection and recognition solutions generally suffer from the following problems: first, they only achieve passive processing of "recognizing signs after seeing them," and cannot perform route planning, priority scheduling of key areas, or optimization of collection strategies around the inspection task; second, they lack an active re-collection mechanism for low-quality images such as blurry, backlit, off-angle, and partially occluded images, resulting in unstable recognition results in adverse scenarios; third, an architecture that computes only in the cloud or only on the vehicle is not conducive to balancing real-time performance, accuracy, and cost; and fourth, they lack mechanisms for historical memory, deduplication of repeated signs, and inspection strategy feedback, making it difficult to form a closed loop for continuous optimization.
Summary of the Invention
[0004] This invention aims to provide a road inspection system that can be continuously optimized, and to that end proposes a road sign inspection system and inspection method.
[0005] To achieve the above objectives, the present invention provides the following technical solution:
[0006] A road sign inspection system includes a task interface module, an autonomous planning module, an edge perception and detection module, an embodied interaction closed-loop module, a cloud-based cognitive processing module, a result output module, a long-term memory and deduplication module, and an autonomous iteration module.
[0007] The task interface module, autonomous planning module, edge perception and detection module, embodied interaction closed-loop module, cloud cognitive processing module, and long-term memory and deduplication module are connected in sequence.
[0008] The long-term memory and deduplication module is connected to both the result output module and the autonomous iteration module, and the autonomous iteration module is connected to the autonomous planning module.
[0009] The task interface module inputs the inspection task package into the autonomous planning module;
[0010] The autonomous planning module outputs route and stop point parameters to the edge perception and detection module based on the inspection task package;
[0011] The edge perception and detection module drives the unmanned vehicle to conduct inspections according to the inspection route and stop point parameters. The detected sign candidate results are sent to the embodied interaction closed loop module to perform quality assessment and secondary data collection, and obtain sign data that meets the quality standards. After preprocessing, the sign data that meets the quality standards is formed into an encapsulated data package and sent to the cloud cognitive processing module.
[0012] The cloud-based cognitive processing module sends the cloud-based recognition results to the long-term memory and deduplication module.
[0013] The long-term memory and deduplication module processes the cloud recognition results and historical inspection results, performs deduplication verification and version merging, updates the historical database, and then sends it to the result output module.
[0014] The autonomous iteration module trains the model based on the updated historical database to obtain updated model parameters, quality thresholds, and route priorities, and then feeds them back into the autonomous planning module.
[0015] Furthermore, the modules of the road sign inspection system communicate with each other using standardized data packets.
[0016] Furthermore, the unmanned vehicle is equipped with a camera, lidar, and a GNSS / IMU integrated navigation unit.
[0017] An inspection method for a road sign inspection system includes the following steps:
[0018] S1. The task interface module sends the inspection task package to the autonomous planning module, and the autonomous planning module generates the inspection route and stopping points.
[0019] S2. The edge perception and detection module drives the unmanned vehicle to perform inspections according to the inspection route and stop point parameters, collect images, point clouds and pose information, and then establish a synchronization constraint for the sampling time difference. The obtained images that meet the synchronization constraint are used to perform edge lightweight sign detection to obtain sign candidate images.
[0020] S3. The embodied interaction closed-loop module performs quality assessment on the candidate sign images obtained in step S2. For candidate sign images that are below the quality threshold, the vehicle and camera are driven to perform secondary acquisition to obtain candidate sign images that meet the quality standards. After processing, the encapsulated data packet is uploaded to the cloud cognitive processing module.
[0021] S4. The cloud-based cognitive processing module parses the signage cutout area in the encapsulated data packet obtained in step S3, and sequentially performs text line detection, Chinese and English semantic parsing, line-level matching, and structured field generation to obtain the cloud recognition result;
[0022] S5. After receiving the cloud recognition result, the long-term memory and deduplication module associates the cloud recognition result with the historical inspection result to obtain the latest recognition result, completes the deduplication verification and version merging, and then updates the historical database.
[0023] S6. The autonomous iteration module updates the model parameters, quality thresholds, and route priorities based on the updated historical database, and then feeds them back into the autonomous planning module to carry out a new round of inspection tasks.
[0024] Furthermore, the specific implementation method of step S1 includes the following steps:
[0025] S1.1. The autonomous planning module receives a task package from the task interface module. The task package includes the set of road sections to be covered, the inspection cycle, the set of key areas, the allowed operation time window, and safety rules.
[0026] S1.2. The autonomous planning module abstracts the road network to be inspected into a directed graph. For each candidate road segment, it calculates the segment utility value and uses it as the weighting criterion for path search and stop selection to obtain the optimal inspection route. The formula for calculating the road segment utility value is as follows:
[0027] $J_p = w_c \cdot C_p + w_h \cdot H_p - w_t \cdot T_p - w_r \cdot R_p$
[0028] Among them, $J_p$ represents the overall inspection utility value of the p-th candidate road segment; $C_p$ indicates the coverage gain of this road segment for uncovered signs; $H_p$ indicates the priority of high-value signs based on historical memory statistics; $T_p$ represents the time cost of entering this road segment; $R_p$ indicates the risk cost associated with construction, congestion, temporary traffic control, or safety restrictions; $w_c$, $w_h$, $w_t$, $w_r$ are the weighting coefficients for the coverage, history, time, and risk terms, respectively.
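The utility formula in [0027] can be illustrated with a minimal Python sketch. The class name, segment names, term values, and weights below are hypothetical and chosen only for illustration; the source does not specify how $C_p$, $H_p$, $T_p$, or $R_p$ are measured or scaled.

```python
from dataclasses import dataclass


@dataclass
class SegmentTerms:
    coverage_gain: float     # C_p: coverage gain for uncovered signs
    history_priority: float  # H_p: priority from historical memory statistics
    time_cost: float         # T_p: time cost of entering the segment
    risk_cost: float         # R_p: construction / congestion / control risk


def segment_utility(terms, w_c, w_h, w_t, w_r):
    """J_p = w_c*C_p + w_h*H_p - w_t*T_p - w_r*R_p."""
    return (w_c * terms.coverage_gain + w_h * terms.history_priority
            - w_t * terms.time_cost - w_r * terms.risk_cost)


# Illustrative values only (assumed to be normalized to [0, 1]).
segments = {
    "seg_a": SegmentTerms(0.8, 0.5, 0.3, 0.1),
    "seg_b": SegmentTerms(0.4, 0.9, 0.2, 0.6),
}
weights = dict(w_c=0.4, w_h=0.3, w_t=0.2, w_r=0.1)

utilities = {name: segment_utility(t, **weights) for name, t in segments.items()}
best = max(utilities, key=utilities.get)  # segment with the highest utility
```

In a full planner these values would presumably serve as edge or node weights in the path search; that step is not sketched here because the source does not name a search algorithm.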
[0029] S1.3. Based on the obtained optimal inspection route, the autonomous planning module generates a stop sequence, target speed limit, camera default parameters, and priority table for key signs.
[0030] Furthermore, the specific implementation method of step S2 includes the following steps:
[0031] S2.1. The edge perception and detection module drives the camera, LiDAR, and positioning and attitude module to sample synchronously according to the execution strategy issued by the autonomous planning module, and encapsulates the image, point cloud, and pose information within the same sampling period into a data packet $D_{sense}$;
[0032] S2.2. Construct a synchronization constraint on the sampling time difference. For the k-th sampling, calculate the difference $\Delta T_k$ between the maximum and minimum timestamps; the sampling is determined to be valid only if $\Delta T_k$ is not greater than the synchronization threshold $\tau_{sync}$.
[0033] S2.3. The YOLO11n single-stage object detection network, which runs in real time on the vehicle side, performs edge lightweight sign detection on the validly sampled images and outputs the sign region bounding box $B_n$, the confidence level $P_n$, and the sign type $Type_n$. For detections whose confidence level $P_n$ is greater than or equal to the edge retention threshold $\tau_{det}$, the rectangular mask corresponding to the detection box is processed to obtain the candidate sign image.
[0034] Furthermore, in step S3, the quality of the candidate sign images acquired in the second acquisition is evaluated again until the image quality score is greater than or equal to the quality threshold or the number of acquisitions reaches the upper limit.
[0035] The candidate sign images that meet the quality standards are processed as follows: for the n-th candidate sign, the bounding box $B_n$ is expanded outward by the lateral expansion amount $D_x$ and the longitudinal expansion amount $D_y$ to form the sign cutout region of the n-th candidate, which serves as the processed candidate sign image;
[0036] The processed candidate sign image, the GPS coordinates of the acquisition location $GPS_{pos}$, the vehicle pose information $Pose_{car}$, the collection timestamp $T_{cap}$, the sign type $Type_{sign}$, and the image quality score $Q_{img}$ are encapsulated together into the encapsulated data packet $P_{cloud}$.
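The bounding-box expansion and packet encapsulation described above could be sketched as follows. This is only an illustration under stated assumptions: the expansion is applied per side and clipped to the image bounds, the packet is serialized as JSON, and it carries the cutout box rather than pixel data. None of these choices, nor the field names and layouts, are specified in the source.

```python
import json
from dataclasses import asdict, dataclass


def expand_box(box, d_x, d_y, img_w, img_h):
    """Expand (x1, y1, x2, y2) by d_x / d_y pixels per side, clipped to the image.

    Per-side expansion and clipping are assumptions, not stated in the source.
    """
    x1, y1, x2, y2 = box
    return (max(0, x1 - d_x), max(0, y1 - d_y),
            min(img_w, x2 + d_x), min(img_h, y2 + d_y))


@dataclass
class CloudPacket:
    """Hypothetical layout for P_cloud; field names mirror the listed items."""
    crop_box: tuple   # expanded cutout region, stands in for the image itself
    gps_pos: tuple    # (latitude, longitude)
    pose_car: tuple   # (x, y, heading_deg); layout is an assumption
    t_cap: float      # collection timestamp
    type_sign: str    # sign type
    q_img: float      # image quality score


def encode_packet(packet):
    # JSON is an assumed wire format; tuples are serialized as lists.
    return json.dumps(asdict(packet), sort_keys=True)


crop = expand_box((10, 20, 110, 80), d_x=16, d_y=8, img_w=120, img_h=200)
packet = CloudPacket(crop, (22.54, 114.05), (105.0, 38.5, 90.0),
                     1700000000.0, "speed_limit", 0.82)
payload = encode_packet(packet)
```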
[0037] Furthermore, the cloud recognition result $JSON_{out}$ obtained in step S4 includes the sign type, sign area coordinates, Chinese text, English text, text line coordinates, collection location, collection time, quality score, and version number.
[0038] Furthermore, the specific implementation method of step S5 includes the following steps:
[0039] S5.1. The long-term memory and deduplication module receives the cloud recognition result $JSON_{out}$ generated in step S4 and combines it with the corresponding sign cutout image, acquisition location, acquisition time, and vehicle pose metadata to construct the current inspection record $R_{cur}$;
[0040] S5.2. Generate a unique identifier $U_n$ for each sign. The expression is:
[0041] $U_n = \mathrm{Hash}(F_{vis,n} \,\|\, F_{geo,n} \,\|\, F_{dir,n} \,\|\, F_{type,n})$
[0042] Among them, $U_n$ represents the unique identifier of the n-th sign candidate; Hash represents the hash function; $F_{vis,n}$ represents the encoding of visual features extracted from the sign area image; $F_{geo,n}$ represents the location code obtained by discretizing the geographic coordinates; $F_{dir,n}$ indicates the encoding of the vehicle's direction of travel; $F_{type,n}$ indicates the sign type code; the symbol $\|$ indicates that the fields are concatenated in a predetermined order.
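A minimal sketch of the identifier in [0041] follows. The source names neither the hash function nor the discretization scheme, so SHA-256, the 1e-4 degree grid cell, the eight heading sectors, and the string form of the visual feature code are all assumptions made for illustration.

```python
import hashlib


def geo_code(lat, lon, cell_deg=1e-4):
    """F_geo: discretize coordinates onto a grid (cell size is an assumption)."""
    return f"{round(lat / cell_deg)}:{round(lon / cell_deg)}"


def direction_code(heading_deg, sectors=8):
    """F_dir: quantize heading into equal sectors centered on 0 degrees."""
    width = 360 / sectors
    return str(int(((heading_deg % 360) + width / 2) // width) % sectors)


def unique_id(f_vis, f_geo, f_dir, f_type):
    """U_n = Hash(F_vis || F_geo || F_dir || F_type), fields in fixed order."""
    payload = "||".join([f_vis, f_geo, f_dir, f_type]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()  # SHA-256 is an assumed choice


# Hypothetical field values for one sign candidate.
uid = unique_id("vis:3fa9", geo_code(22.54321, 114.05432),
                direction_code(350.0), "speed_limit")
```

Because the inputs are discretized, two observations of the same sign near a grid or sector boundary can still hash to different identifiers, which is consistent with the source following this step with a similarity check in S5.3.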
[0043] S5.3. Calculate the similarity $D_{n,h}$ between the current inspection record obtained in step S5.1 and the h-th historical record. The expression is:
[0044] $D_{n,h} = w_v \cdot S_{vis,n,h} + w_g \cdot S_{geo,n,h} + w_d \cdot S_{dir,n,h}$
[0045] Among them, $D_{n,h}$ represents the similarity between the current n-th sign candidate and the h-th historical record; $S_{vis,n,h}$ indicates the visual feature similarity; $S_{geo,n,h}$ indicates the geographical similarity; $S_{dir,n,h}$ indicates the similarity of the collection direction; $w_v$, $w_g$, $w_d$ are the weighting coefficients for the visual, geographical, and directional terms, respectively.
[0046] When $D_{n,h}$ is greater than or equal to the deduplication threshold $\tau_{dup}$, the current inspection record and the h-th historical record are identified as the same sign, and the historical entry is updated by version merging; otherwise, a new entry is created and written to the historical database. When merging versions, the latest high-quality image, the latest recognition result, and the complete time series are retained.
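The threshold decision and version merge in [0044] to [0046] might look like the sketch below. The record layout, the choice to merge into the single best-scoring historical entry, and the numeric weights and threshold are assumptions; the source only fixes the weighted sum and the comparison against $\tau_{dup}$.

```python
def similarity(s_vis, s_geo, s_dir, w_v, w_g, w_d):
    """D_{n,h} = w_v*S_vis + w_g*S_geo + w_d*S_dir."""
    return w_v * s_vis + w_g * s_geo + w_d * s_dir


def merge_or_insert(history, record, component_sims, weights, tau_dup):
    """Merge `record` into its best-matching entry, or append a new entry.

    component_sims[h] holds (S_vis, S_geo, S_dir) against history[h].
    Returns the index of the entry that now holds the record.
    """
    scores = [similarity(*s, **weights) for s in component_sims]
    best_h = max(range(len(scores)), key=scores.__getitem__, default=None)
    if best_h is not None and scores[best_h] >= tau_dup:
        entry = history[best_h]
        entry["timeline"].append(record["t_cap"])   # keep the full time series
        entry["latest_result"] = record["result"]   # keep the latest recognition
        if record["q_img"] >= entry["best_q_img"]:  # keep the latest good image
            entry["best_q_img"] = record["q_img"]
            entry["best_image"] = record["image"]
        return best_h
    history.append({
        "timeline": [record["t_cap"]],
        "latest_result": record["result"],
        "best_q_img": record["q_img"],
        "best_image": record["image"],
    })
    return len(history) - 1


# Hypothetical weights, threshold, and records.
weights = dict(w_v=0.5, w_g=0.3, w_d=0.2)
history = []
first = {"t_cap": 1.0, "result": "60", "q_img": 0.7, "image": "a.jpg"}
again = {"t_cap": 2.0, "result": "60", "q_img": 0.9, "image": "b.jpg"}
other = {"t_cap": 3.0, "result": "STOP", "q_img": 0.8, "image": "c.jpg"}

idx_first = merge_or_insert(history, first, [], weights, tau_dup=0.8)
idx_again = merge_or_insert(history, again, [(0.9, 0.95, 1.0)], weights, tau_dup=0.8)
idx_other = merge_or_insert(history, other, [(0.2, 0.1, 0.5)], weights, tau_dup=0.8)
```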
[0047] S5.4. After completing the deduplication verification and version merging, the long-term memory and deduplication module will send the final inspection results to the result output module.
[0048] Furthermore, in step S6, the autonomous iteration module updates the model parameters using, as training samples, the high-value samples, low-quality re-collected samples, samples manually corrected on the platform, and failed recognition samples from the historical database.
[0049] The beneficial effects of this invention are:
[0050] The road sign inspection system described in this invention solves the problems of existing road inspection schemes, which can only passively collect data and cannot autonomously generate routes, prioritize key sections, or execute closed loops around task objectives. It also addresses the issue of decreased recognition accuracy due to a lack of active re-collection control when dealing with blurry, backlit, angled, or partially occluded sign images. Furthermore, it addresses the problems of bandwidth waste caused by uploading all images or excessive computing costs due to end-to-end vehicle processing in existing schemes. It also solves the problem of existing schemes lacking long-term memory, duplicate sign deduplication, and historical result reuse mechanisms, thus failing to achieve continuously improving system capabilities with increasing inspection frequency. Finally, it addresses the problem that while existing schemes include detection or recognition models, they lack a multimodal agent technology chain for road physical world inspection scenarios, making it difficult to achieve a closed loop of "task-driven—perception execution—cognitive processing—memory iteration."
[0051] The road sign inspection system described in this invention, through a pre-planning mechanism of "task interface - route planning - stop point generation", transforms road inspection from passive data collection into a closed-loop process of proactive execution around task objectives, making it suitable for special inspections, key road section inspections, and periodic repetitive inspections.
[0052] The road sign inspection system described in this invention, through an embodied interaction closed loop of "quality assessment - control parameter generation - secondary acquisition", can proactively improve the acquisition quality in blurry, backlit, off-angle, or occluded scenarios, thereby improving the readability of sign text and the stability of subsequent structured parsing at the level of the technical mechanism.
[0053] The road sign inspection system described in this invention reduces the transmitted data from a full road image to a sign area image and its necessary metadata by detecting, capturing, and uploading the data at the edge first. This reduces bandwidth consumption and shortens cloud processing time.
[0054] The road sign inspection system described in this invention, through unique identifier generation, duplicate similarity verification, and version merging mechanisms, can avoid duplicate reporting of the same sign in multiple inspections, while retaining time-series evolution information, making it suitable for long-term maintenance scenarios.
[0055] The road sign inspection system described in this invention achieves continuous improvement of system capabilities, rather than a one-time static deployment, by feeding historical missed detections, re-collections, and manual correction results back into route priorities, quality thresholds, and model parameters.
Attached Figure Description
[0056] Figure 1 is a schematic diagram of the structure of a road sign inspection system according to the present invention;
[0057] Figure 2 is a flowchart of a road sign inspection method according to the present invention.
Detailed Implementation
[0058] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only for explaining the invention and are not intended to limit the invention; that is, the described specific embodiments are merely a part of the embodiments of the invention, and not all of them. The components of the specific embodiments of the invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations, and the invention may also have other embodiments.
[0059] Therefore, the following detailed description of specific embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected specific embodiments of the invention. All other specific embodiments obtained by those skilled in the art based on these specific embodiments without inventive effort are within the scope of protection of this invention.
[0060] To further understand the invention's content, features, and effects, the following specific embodiments are described in detail with reference to Figures 1 and 2:
[0061] Embodiment 1:
[0062] A road sign inspection system includes a task interface module, an autonomous planning module, an edge perception and detection module, an embodied interaction closed-loop module, a cloud-based cognitive processing module, a result output module, a long-term memory and deduplication module, and an autonomous iteration module.
[0063] The task interface module, autonomous planning module, edge perception and detection module, embodied interaction closed-loop module, cloud cognitive processing module, and long-term memory and deduplication module are connected in sequence.
[0064] The long-term memory and deduplication module is connected to both the result output module and the autonomous iteration module, and the autonomous iteration module is connected to the autonomous planning module.
[0065] The task interface module inputs the inspection task package into the autonomous planning module;
[0066] The autonomous planning module outputs route and stop point parameters to the edge perception and detection module based on the inspection task package;
[0067] The edge perception and detection module drives the unmanned vehicle to conduct inspections according to the inspection route and stop point parameters. The detected sign candidate results are sent to the embodied interaction closed loop module to perform quality assessment and secondary data collection, and obtain sign data that meets the quality standards. After preprocessing, the sign data that meets the quality standards is formed into an encapsulated data package and sent to the cloud cognitive processing module.
[0068] The cloud-based cognitive processing module sends the cloud-based recognition results to the long-term memory and deduplication module.
[0069] The long-term memory and deduplication module processes the cloud recognition results and historical inspection results, performs deduplication verification and version merging, updates the historical database, and then sends it to the result output module.
[0070] The autonomous iteration module trains the model based on the updated historical database to obtain updated model parameters, quality thresholds, and route priorities, and then feeds them back into the autonomous planning module.
[0071] Furthermore, the modules of the road sign inspection system communicate with each other using standardized data packets.
[0072] Furthermore, the unmanned vehicle is equipped with a camera, lidar, and a GNSS / IMU integrated navigation unit.
[0073] Furthermore, the task interface module inputs the inspection task package $T_{task}$ into the autonomous planning module; the autonomous planning module outputs route and stop point parameters to the edge perception and detection module; after generating candidate sign results, the edge perception and detection module sends them to the embodied interaction closed-loop module for quality assessment and secondary data collection; the sign data that meets the quality standards is preprocessed to form a cloud data package $P_{cloud}$, which is sent to the cloud cognitive processing module; the cloud cognitive processing module generates a structured result $JSON_{out}$ and sends it to the long-term memory and deduplication module. The long-term memory and deduplication module combines the corresponding sign cutout image and acquisition metadata to complete unique ID generation, deduplication verification, version merging, and historical data storage. The final inspection result after deduplication and merging is then sent to the result output module for reporting on the inspection platform. Subsequently, the long-term memory and deduplication module outputs training samples and historical statistics to the autonomous iteration module. The autonomous iteration module feeds the updated model parameters, quality thresholds, and route priorities back to the autonomous planning module.
[0074] Embodiment 2:
[0075] An inspection method for a road sign inspection system according to Embodiment 1 includes the following steps:
[0076] S1. The task interface module sends the inspection task package to the autonomous planning module, and the autonomous planning module generates the inspection route and stopping points.
[0077] Furthermore, the specific implementation method of step S1 includes the following steps:
[0078] S1.1. The autonomous planning module receives a task package from the task interface module. The task package includes the set of road sections to be covered, the inspection cycle, the set of key areas, the allowed operation time window, and safety rules.
[0079] Furthermore, the set of road segments to be covered is SegList, the inspection cycle is Cycle, the set of key areas is PrioritySet, the allowed operation time window is TimeWindow, and the safety rule is SafeRule;
[0080] S1.2. The autonomous planning module abstracts the road network to be inspected into a directed graph. For each candidate road segment, it calculates the segment utility value and uses it as the weighting criterion for path search and stop selection to obtain the optimal inspection route. The formula for calculating the road segment utility value is as follows:
[0081] $J_p = w_c \cdot C_p + w_h \cdot H_p - w_t \cdot T_p - w_r \cdot R_p$
[0082] Among them, $J_p$ represents the overall inspection utility value of the p-th candidate road segment; $C_p$ indicates the coverage gain of this road segment for uncovered signs; $H_p$ indicates the priority of high-value signs based on historical memory statistics; $T_p$ represents the time cost of entering this road segment; $R_p$ indicates the risk cost associated with construction, congestion, temporary traffic control, or safety restrictions; $w_c$, $w_h$, $w_t$, $w_r$ are the weighting coefficients for the coverage, history, time, and risk terms, respectively.
[0083] Furthermore, there is a directed graph Graph=(V,E), where V represents candidate road segment nodes or candidate stop points, and E represents drivable edges between nodes;
[0084] S1.3. Based on the obtained optimal inspection route, the autonomous planning module generates a stop sequence, target speed limit, camera default parameters, and priority table for key signs.
[0085] Furthermore, after obtaining the optimal inspection route, the autonomous planning module generates an edge execution strategy based on the optimal path results, task package constraints, and historical statistical information. Specifically, it selects stops from the candidate stop point set along the optimal inspection route that correspond to the observable locations of the signs, and arranges them according to the order of vehicle travel to obtain a stop point sequence; it determines the corresponding target speed limit based on the road grade, speed limit, permitted operation time window, and safety rules of the road segment where each stop point is located; it presets default camera parameters based on the best imaging parameters of similar signs in the corresponding time period and road scenario from historical inspections, and these default camera parameters include at least focal length, exposure time, and gain; and it generates a priority table for key signs based on the PrioritySet of key areas in the task package, historical missed detection frequency, re-sampling frequency, and distribution of high-value signs. All of the above together form the edge execution strategy. The output no longer uses the broad term "instruction" but is explicitly defined as a set of executable parameters.
[0086] S2. The edge perception and detection module drives the unmanned vehicle to perform inspections according to the inspection route and stop point parameters, collect images, point clouds and pose information, and then establish a synchronization constraint for the sampling time difference. The obtained images that meet the synchronization constraint are used to perform edge lightweight sign detection to obtain sign candidate images.
[0087] Furthermore, the specific implementation method of step S2 includes the following steps:
[0088] S2.1. The edge perception and detection module drives the camera, LiDAR, and positioning and attitude module to sample synchronously according to the execution strategy issued by the autonomous planning module, and encapsulates the image, point cloud, and pose information within the same sampling period into a data packet $D_{sense}$;
[0089] Furthermore, the GNSS / IMU integrated navigation unit outputs the vehicle's position, heading angle, attitude angle, and corresponding timestamp, and the image, point cloud, and pose information within the same sampling period are encapsulated into the data packet $D_{sense}$. Specifically, $D_{sense}$ does not mean forcibly concatenating the image, point cloud, and pose into the same input tensor; rather, it is a synchronous sampling record indexed by a uniform sampling number k: $D_{sense,k} = \{FrameID_k, I_{rgb,k}, P_{lidar,k}, Pose_{car,k}, T_{cam,k}, T_{lidar,k}, T_{nav,k}\}$. Among them, $I_{rgb,k}$ is the image frame of the k-th sampling, $P_{lidar,k}$ is the LiDAR point cloud aligned to the nearest-neighbor timestamp of this image frame, and $Pose_{car,k}$ is the vehicle pose output by the GNSS / IMU integrated navigation unit; the spatial correspondence between the camera and the LiDAR is determined by the pre-calibrated extrinsic parameter matrix.
[0090] S2.2. Construct a synchronization constraint on the sampling time difference. For the k-th sampling, calculate the difference $\Delta T_k$ between the maximum and minimum timestamps; the sampling is determined to be valid only if $\Delta T_k$ is not greater than the synchronization threshold $\tau_{sync}$.
[0091] Furthermore, the expression for $\Delta T_k$ is:
[0092] $\Delta T_k = \max(T_{cam,k}, T_{lidar,k}, T_{nav,k}) - \min(T_{cam,k}, T_{lidar,k}, T_{nav,k})$
[0093] In the formula, $\Delta T_k$ represents the multi-source time difference of the k-th sampling; $T_{cam,k}$, $T_{lidar,k}$, $T_{nav,k}$ represent the timestamps of the camera, LiDAR, and GNSS / IMU integrated navigation unit at the k-th sampling, respectively; $\tau_{sync}$ is the threshold for determining spatiotemporal synchronization.
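The synchronization check in [0092] reduces to a few lines of Python. The millisecond units, the example timestamps, and the 20 ms threshold are hypothetical; the source does not give a value for $\tau_{sync}$.

```python
def time_spread(t_cam, t_lidar, t_nav):
    """Delta T_k: spread between the latest and earliest sensor timestamps."""
    stamps = (t_cam, t_lidar, t_nav)
    return max(stamps) - min(stamps)


def is_valid_sample(t_cam, t_lidar, t_nav, tau_sync):
    """A sample is valid only if Delta T_k does not exceed tau_sync."""
    return time_spread(t_cam, t_lidar, t_nav) <= tau_sync


# Integer millisecond timestamps (assumed unit) and an assumed threshold.
TAU_SYNC_MS = 20
in_sync = is_valid_sample(1000, 1012, 1005, TAU_SYNC_MS)
out_of_sync = is_valid_sample(1000, 1030, 1005, TAU_SYNC_MS)
```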
[0094] S2.3. The YOLO11n single-stage object detection network, which runs in real time on the vehicle side, performs edge lightweight sign detection on the validly sampled images and outputs the sign region bounding box $B_n$, the confidence level $P_n$, and the sign type $Type_n$. For detections whose confidence level $P_n$ is greater than or equal to the edge retention threshold $\tau_{det}$, the rectangular mask corresponding to the detection box is processed to obtain the candidate sign image.
[0095] Furthermore, the edge lightweight sign detection model is denoted as $D_{det}$; specifically, it is a YOLO11n single-stage object detection network that runs in real time on the vehicle side. The input of $D_{det}$ is the RGB image $I_{rgb}$, scaled to 640×640 and normalized. The output is the set of detection boxes $B_{det} = \{b_n\}$, where $b_n = (x_{1n}, y_{1n}, x_{2n}, y_{2n}, s_n, t_n)$; $x_{1n}$, $y_{1n}$, $x_{2n}$, $y_{2n}$ represent the bounding box coordinates, $s_n$ represents the confidence level, and $t_n$ represents the sign type label.
[0096] Furthermore, the edge lightweight sign detection model $D_{det}$ takes only $I_{rgb,k}$ as input to perform two-dimensional sign detection; $P_{lidar,k}$ and $Pose_{car,k}$ do not enter $D_{det}$ directly, but are retained as spatiotemporal metadata associated with the detection results, for subsequent position lookup, historical association, and metadata encapsulation in the cloud data packet $P_{cloud}$.
[0097] To ensure that the region used for subsequent quality assessment is well defined, for each detection box $b_n$ satisfying $s_n \ge \tau_{det}$, construct a rectangular mask $M_n(u,v)$: when pixel $(u,v)$ falls within the area covered by detection box $b_n$, $M_n(u,v) = 1$; otherwise $M_n(u,v) = 0$. The total candidate mask for the current frame is denoted as $M(u,v) = \max_n M_n(u,v)$. Edge detection is not the end point, but the starting point of the quality closed loop. Its output bounding box, category, and confidence directly determine the subsequent image quality scoring region, camera adjustment direction, and whether re-acquisition is needed.
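The mask construction in [0097] can be sketched in plain Python without any detector dependency. The detections below are hand-written tuples in the $b_n$ layout rather than YOLO11n output, and the half-open pixel convention (x1 ≤ u < x2, y1 ≤ v < y2) and the threshold value are assumptions.

```python
def box_mask(box, width, height):
    """M_n(u, v): 1 inside the box, 0 elsewhere (half-open pixel ranges)."""
    x1, y1, x2, y2 = box
    return [[1 if x1 <= u < x2 and y1 <= v < y2 else 0 for u in range(width)]
            for v in range(height)]


def candidate_mask(detections, tau_det, width, height):
    """M(u, v) = max_n M_n(u, v) over detections with s_n >= tau_det."""
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2, s_n, _t_n in detections:
        if s_n < tau_det:
            continue  # below the edge retention threshold
        single = box_mask((x1, y1, x2, y2), width, height)
        mask = [[max(a, b) for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(mask, single)]
    return mask


# Hypothetical detections: (x1, y1, x2, y2, s_n, t_n) on a 5x4 pixel frame.
detections = [
    (0, 0, 2, 2, 0.9, "speed_limit"),
    (3, 1, 5, 3, 0.2, "stop"),        # discarded: confidence below tau_det
    (1, 1, 3, 3, 0.7, "direction"),
]
mask = candidate_mask(detections, tau_det=0.5, width=5, height=4)
```

A production implementation would more likely use array operations, but nested lists keep the per-pixel maximum explicit.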
[0098] S3. The embodied interaction closed-loop module performs quality assessment on the candidate sign images obtained in step S2. For candidate sign images that are below the quality threshold, the vehicle and camera are driven to perform secondary acquisition to obtain candidate sign images that meet the quality standards. After processing, the encapsulated data packet is uploaded to the cloud cognitive processing module.
[0099] Furthermore, in step S3, the quality of the candidate sign images acquired in the second acquisition is evaluated again until the image quality score is greater than or equal to the quality threshold or the number of acquisitions reaches the upper limit.
[0100] Furthermore, for each sign candidate n, an image quality score $Q_n$ is calculated at the edge. When $Q_n$ is below the quality threshold $\tau_q$, the control parameter package $C_{emb}$ is generated, and the vehicle and camera complete one supplementary data acquisition. To keep the direction of the formulas consistent, all components below are defined as positive indicators for which a higher score indicates better quality, expressed as follows:
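[0101] $Q_n = w_s \cdot S_{sharp,n} + w_b \cdot S_{bright,n} + w_o \cdot S_{occ,n} + w_v \cdot S_{view,n}$,
[0102] $S_{sharp,n} = \mathrm{clip}\big((Var_n - L_{min}) / (L_{max} - L_{min} + \varepsilon_1),\ 0,\ 1\big)$,
[0103] $S_{bright,n} = \max\big(0,\ 1 - |\mu_n - \mu_0| / (\Delta\mu + \varepsilon_2)\big)$; $S_{occ,n} = \max\big(0,\ 1 - A_{occ,n} / (A_{box,n} + \varepsilon_3)\big)$,
[0104] $S_{view,n} = \max\big(0,\ 1 - |\Delta c_n| / (\delta_c + \varepsilon_4)\big)$,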
[0105] Here, $Q_n$ represents the image quality score of the nth sign candidate; $S_{sharp,n}$, $S_{bright,n}$, $S_{occ,n}$, and $S_{view,n}$ represent the sharpness score, brightness score, occlusion visibility score, and viewing angle score, respectively; $Var_n$ represents the Laplacian variance of the candidate region; $L_{min}$ and $L_{max}$ represent the lower and upper bounds of sharpness normalization, respectively; $\mu_n$ represents the average brightness of the candidate region; $\mu_0$ represents the center value of the target brightness; $\Delta\mu$ represents the allowable brightness deviation; $A_{occ,n}$ indicates the area within the candidate region that is occluded by the foreground; $A_{box,n}$ indicates the area of the candidate region; $\Delta c_n$ represents the normalized deviation of the sign center relative to the center of the target field of view; $\delta_c$ indicates the allowable center deviation; $w_s$, $w_b$, $w_o$, and $w_v$ are the four weight coefficients, with $w_s+w_b+w_o+w_v=1$; $\varepsilon_1$ to $\varepsilon_4$ are positive constants that prevent the denominator from being zero.
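As an illustration of the scoring formulas in [0101] to [0104], the following sketch evaluates $Q_n$ from already-measured region statistics. The function name, the default weights, and the single shared `eps` standing in for $\varepsilon_1$ to $\varepsilon_4$ are assumptions chosen for the example, not values given in the specification.

```python
def clip(x, lo, hi):
    """Truncate x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def quality_score(var_lap, mu, a_occ, a_box, dc, *,
                  l_min, l_max, mu0, d_mu, delta_c,
                  weights=(0.4, 0.2, 0.2, 0.2), eps=1e-6):
    """Weighted image quality score Q_n; every component lies in [0, 1]."""
    w_s, w_b, w_o, w_v = weights  # hypothetical weights, must sum to 1
    s_sharp = clip((var_lap - l_min) / (l_max - l_min + eps), 0.0, 1.0)
    s_bright = max(0.0, 1.0 - abs(mu - mu0) / (d_mu + eps))
    s_occ = max(0.0, 1.0 - a_occ / (a_box + eps))
    s_view = max(0.0, 1.0 - abs(dc) / (delta_c + eps))
    return w_s * s_sharp + w_b * s_bright + w_o * s_occ + w_v * s_view
```

A sharp, well-exposed, unoccluded, centered candidate scores close to 1, while a blurred, fully occluded, off-center candidate scores close to 0, so a single threshold $\tau_q$ can gate re-acquisition.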
[0106] Furthermore, let the quality deficit be denoted as $\Delta q_n = \max(0, \tau_q - Q_n)$. When $\Delta q_n > 0$, the embodied interaction closed-loop module generates the control parameter package $C_{emb}$. The control parameter package includes the target vehicle speed $V_n$, lateral offset $Y_n$, heading adjustment $\Psi_n$, camera focal length $F_n$, exposure time $E_n$, and gain $G_n$. The expressions are:
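[0107] $V_n = \mathrm{clip}(V_{cur,n} - k_v \cdot \Delta q_n,\ V_{min},\ V_{max})$,
[0108] $Y_n = \mathrm{clip}(Y_{cur,n} + k_y \cdot \Delta c_n,\ Y_{min},\ Y_{max})$,
[0109] $\Psi_n = \mathrm{clip}(\Psi_{cur,n} + k_\psi \cdot \Delta\psi_n,\ \Psi_{min},\ \Psi_{max})$,
[0110] $E_n = \mathrm{clip}(E_{cur,n} + k_e \cdot (\mu_0 - \mu_n),\ E_{min},\ E_{max})$,
[0111] $F_n = \mathrm{clip}(F_{cur,n} + k_f \cdot (A_0 - A_{box,n}) / (A_0 + \varepsilon_5),\ F_{min},\ F_{max})$,
[0112] $G_n = \mathrm{clip}(G_{cur,n} + k_g \cdot \max(0, \mu_{low} - \mu_n),\ G_{min},\ G_{max})$,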
[0113] Here, $\Delta q_n$ indicates the quality deficit; $V_{cur,n}$, $Y_{cur,n}$, $\Psi_{cur,n}$, $E_{cur,n}$, $F_{cur,n}$, and $G_{cur,n}$ represent the current vehicle speed, current lateral position, current heading, current exposure time, current focal length, and current gain, respectively; $k_v$, $k_y$, $k_\psi$, $k_e$, $k_f$, and $k_g$ represent the adjustment coefficients for the corresponding control quantities; $\Delta\psi_n$ indicates the angle between the line of sight to the sign center and the vehicle's current heading; $A_0$ represents the target readable area; $\mu_{low}$ indicates the minimum permissible brightness; $V_{min}$ through $G_{max}$ represent the safety boundaries of each control quantity. $\mathrm{clip}(x,a,b)$ denotes a truncation function: when $x < a$, it takes $a$; when $a \le x \le b$, it takes $x$; when $x > b$, it takes $b$. Therefore, the computed control quantities $V$, $Y$, $\Psi$, $E$, $F$, and $G$ are all restricted to lie between their respective allowable minimum and maximum boundaries, which prevents the control parameters from exceeding the safety range.
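The control laws in [0107] to [0112] can be sketched as a single function that returns the package $C_{emb}$ as a dictionary. The dictionary keys, the argument layout, and the numeric gains used in any call are hypothetical; only the six update formulas and the clipping follow the text.

```python
def clip(x, lo, hi):
    """Truncation function from [0113]."""
    return max(lo, min(hi, x))


def control_package(cur, meas, k, lim, tau_q, mu0, mu_low, a0, eps5=1e-6):
    """Build C_emb for one candidate, or return None if quality suffices.

    cur:  current V, Y, Psi, E, F, G
    meas: measured Q, dc (center deviation), dpsi, mu, a_box
    k:    adjustment coefficients; lim: (min, max) bounds per quantity
    """
    dq = max(0.0, tau_q - meas["Q"])  # quality deficit
    if dq == 0.0:
        return None
    return {
        "V": clip(cur["V"] - k["v"] * dq, *lim["V"]),
        "Y": clip(cur["Y"] + k["y"] * meas["dc"], *lim["Y"]),
        "Psi": clip(cur["Psi"] + k["psi"] * meas["dpsi"], *lim["Psi"]),
        "E": clip(cur["E"] + k["e"] * (mu0 - meas["mu"]), *lim["E"]),
        "F": clip(cur["F"] + k["f"] * (a0 - meas["a_box"]) / (a0 + eps5),
                  *lim["F"]),
        "G": clip(cur["G"] + k["g"] * max(0.0, mu_low - meas["mu"]),
                  *lim["G"]),
    }
```

Note that only the speed term depends on the deficit magnitude; the other five quantities react to their own measured errors, and every output is clipped so that a large error can never push a command outside its safety bounds.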
[0114] Furthermore, after a control adjustment is performed, the image is re-acquired and $Q_n$ is recalculated; when $Q_n \ge \tau_q$ or the number of re-acquisitions reaches the upper limit $N_{retry}$, the quality closed loop for this sign candidate is terminated. If the standard is still not met when the upper limit is reached, the candidate is recorded as a low-quality sample and entered into the historical database for subsequent threshold and model iteration.
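The termination rule of the quality closed loop can be sketched as a bounded retry loop. The callables `acquire`, `score`, and `adjust` are hypothetical stand-ins for image capture, the $Q_n$ computation, and the application of $C_{emb}$; the status strings are likewise illustrative.

```python
def quality_closed_loop(acquire, score, adjust, tau_q, n_retry):
    """Re-acquire until Q_n >= tau_q or n_retry re-acquisitions are used.

    Returns (image, q, retries, status), where status is "qualified" or
    "low_quality" (the latter is what would be logged to the history DB).
    """
    image = acquire()
    q = score(image)
    retries = 0
    while q < tau_q and retries < n_retry:
        adjust(q)          # apply one control adjustment
        image = acquire()  # supplementary acquisition
        q = score(image)
        retries += 1
    status = "qualified" if q >= tau_q else "low_quality"
    return image, q, retries, status
```

Bounding the loop by $N_{retry}$ keeps the vehicle from stalling at a sign that cannot be imaged well, while the low-quality record preserves the failure for later threshold and model iteration.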
[0115] Sign candidate images of qualified quality are processed as follows: for the nth sign candidate, the bounding box $B_n$ is expanded outward horizontally by the expansion amount $D_x$ and vertically by the expansion amount $D_y$ to form the sign cropping region $Crop_n$, which serves as the processed sign candidate image;
[0116] Furthermore, to reduce bandwidth occupancy, the edge side does not upload the full-scale original image but only the local region corresponding to the sign candidate. For the nth sign candidate, the bounding box $B_n$ is expanded outward horizontally by $D_x$ and vertically by $D_y$ to form the sign cropping region $Crop_n$. The expression is:
[0117] $Crop_n = \big(\max(0, x_{1n} - D_x),\ \max(0, y_{1n} - D_y),\ \min(W_{img} - 1, x_{2n} + D_x),\ \min(H_{img} - 1, y_{2n} + D_y)\big)$
[0118] Here, $Crop_n$ represents the cropping region of the nth sign candidate; $x_{1n}$, $y_{1n}$, $x_{2n}$, $y_{2n}$ represent the upper-left and lower-right coordinates of the original bounding box; $D_x$ and $D_y$ are the horizontal and vertical expansion amounts, respectively; $W_{img}$ and $H_{img}$ represent the width and height of the current image, respectively.
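The cropping expression in [0117] translates directly into code. The sketch below assumes integer pixel coordinates and a tuple layout of (left, top, right, bottom); the function name is illustrative.

```python
def crop_region(box, dx, dy, w_img, h_img):
    """Expand a box by (dx, dy) and clamp it to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - dx),
            max(0, y1 - dy),
            min(w_img - 1, x2 + dx),
            min(h_img - 1, y2 + dy))
```

For a 640×480 image, a box of (5, 10, 630, 400) expanded by 16 pixels horizontally and 8 pixels vertically is clamped on the left and right edges and yields (0, 2, 639, 408).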
[0119] The processed sign candidate image is encapsulated, together with fields including the GPS coordinates of the acquisition location $GPS_{pos}$, the vehicle pose information $Pose_{car}$, the collection timestamp $T_{cap}$, the sign type $Type_{sign}$, and the image quality score $Q_{img}$, into an encapsulated data packet $P_{cloud}$.
[0120] Furthermore, data is uploaded in real time when the network is normal; when the network is interrupted, data is written to the edge-side buffer queue and re-uploaded in chronological order after the link is restored. Because one goal of this invention is to reduce edge-cloud transmission bandwidth, the full point cloud $P_{lidar,k}$ in $D_{sense}$ is usually not uploaded in its entirety with $P_{cloud}$; instead, synchronization and location lookup with the image frame are completed at the edge. If necessary, distance or orientation summaries can be extracted from the local point cloud corresponding to the sign candidate and appended to $P_{cloud}$ as an extended field.
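The buffering behavior described above can be sketched as a small store-and-forward uploader. The class name `EdgeUploader`, the `send` callback returning a success flag, and the `t_cap` dictionary key are assumptions for illustration; the specification only requires chronological re-upload after the link is restored.

```python
class EdgeUploader:
    """Upload packets immediately when possible; otherwise buffer them."""

    def __init__(self, send):
        self._send = send    # callable(packet) -> bool, True on success
        self._buffer = []    # packets waiting for the link to recover

    def submit(self, packet):
        # New packets join the buffer so that older ones are sent first.
        self._buffer.append(packet)
        self.flush()

    def flush(self):
        # Re-upload in chronological order of capture timestamp.
        self._buffer.sort(key=lambda p: p["t_cap"])
        while self._buffer:
            if not self._send(self._buffer[0]):
                break        # link still down; keep the rest buffered
            self._buffer.pop(0)

    @property
    def pending(self):
        return len(self._buffer)
```

Routing every packet through the buffer, even when the link is up, is a design choice that guarantees ordering: a fresh packet can never overtake older packets that are still waiting.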
[0121] S4. The cloud-based cognitive processing module parses the sign cropping region in the encapsulated data packet obtained in step S3, and sequentially performs text line detection, Chinese and English semantic parsing, line-level matching, and structured field generation to obtain the cloud recognition result;
[0122] Furthermore, the cloud recognition result $JSON_{out}$ obtained in step S4 includes sign type, sign area coordinates, Chinese text, English text, text line coordinates, collection location, collection time, quality score, and version number.
[0123] Furthermore, text line detection is performed by the text line detection model $L_{det}$, which uses a lightweight text detection network of the DBNet type. The input of $L_{det}$ is the cropped sign image, and the output is a set of text lines $L=\{l_u\}$, where each text line box $l_u$ includes at least the line box coordinates $Bbox_{line,u}$, the direction angle $\theta_{line,u}$, and the line-level confidence $s_{line,u}$.
[0124] Multimodal semantic parsing is performed by the model $M_{sem}$. The input of $M_{sem}$ is the cropped sign image, the text line set $L$, and the preset field template FieldSet; the output is the Chinese text $Text_{cn,u}$, the English text $Text_{en,v}$, the line-level semantic type $Type_u$, and the field affiliation. Constraining the output format of $M_{sem}$ with a fixed field template allows cloud results to be generated directly as the structured result $JSON_{out}$.
[0125] To address the issue that Chinese and English place names may be distributed across different lines of text, this invention introduces an inter-line matching score $M_{u,v}$, which is used to automatically match Chinese and English lines during the structured output stage. The expression is:
[0126] $M_{u,v} = w_y \cdot Y_{u,v} + w_x \cdot X_{u,v} + w_l \cdot L_{u,v}$,
[0127] $Y_{u,v} = \max\big(0,\ 1 - |y_u - y_v| / (\max(h_u, h_v) + \varepsilon_6)\big)$; $X_{u,v} = \max\big(0,\ 1 - |x_{u,R} - x_{v,L}| / (W_{crop} + \varepsilon_7)\big)$,
[0128] $L_{u,v}=1$ when the u-th text line is Chinese, the v-th text line is English, and the field types of the two are compatible; otherwise $L_{u,v}=0$. Here, $M_{u,v}$ represents the matching score between the u-th Chinese text line and the v-th English text line; $Y_{u,v}$ indicates vertical alignment; $X_{u,v}$ indicates horizontal proximity; $L_{u,v}$ indicates the complementarity of language and field type; $y_u$ and $y_v$ represent the center ordinates of the two lines; $h_u$ and $h_v$ represent the heights of the two lines; $x_{u,R}$ represents the x-coordinate of the right boundary of the Chinese line; $x_{v,L}$ represents the x-coordinate of the left boundary of the English line; $W_{crop}$ indicates the width of the sign cropping region; $w_y$, $w_x$, and $w_l$ are the three weight coefficients, with $w_y+w_x+w_l=1$.
[0129] When $M_{u,v}$ is greater than or equal to the pairing threshold $\tau_{pair}$, the system binds the corresponding Chinese and English lines to the same structured entry; otherwise, the Chinese or English line is output as an unpaired field, and missing items are filled with null values.
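The matching score and threshold rule can be sketched as follows. The dictionary keys, the default weights, the use of equal `type` strings as the compatibility test, and the greedy best-match assignment are all assumptions; the specification defines the score and the threshold but not the assignment strategy.

```python
def match_score(cn, en, w_crop, weights=(0.5, 0.3, 0.2), eps=1e-6):
    """Inter-line matching score M_uv for one Chinese and one English line."""
    w_y, w_x, w_l = weights  # hypothetical weights, must sum to 1
    y_term = max(0.0, 1.0 - abs(cn["yc"] - en["yc"])
                 / (max(cn["h"], en["h"]) + eps))
    x_term = max(0.0, 1.0 - abs(cn["x_right"] - en["x_left"])
                 / (w_crop + eps))
    l_term = 1.0 if cn["type"] == en["type"] else 0.0  # assumed compatibility
    return w_y * y_term + w_x * x_term + w_l * l_term


def pair_lines(cn_lines, en_lines, w_crop, tau_pair):
    """Greedily bind each Chinese line to its best English line."""
    used, entries = set(), []
    for cn in cn_lines:
        best, best_score = None, tau_pair
        for j, en in enumerate(en_lines):
            if j in used:
                continue
            s = match_score(cn, en, w_crop)
            if s >= best_score:
                best, best_score = j, s
        if best is not None:
            used.add(best)
        en_text = en_lines[best]["text"] if best is not None else None
        entries.append({"cn": cn["text"], "en": en_text})
    # English lines left unpaired are emitted with a null Chinese field.
    entries += [{"cn": None, "en": en["text"]}
                for j, en in enumerate(en_lines) if j not in used]
    return entries
```

Unpaired lines still appear in the output with `None` in the missing field, which mirrors the null-filling behavior described in [0129].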
[0130] The structured output $JSON_{out}$ includes at least: sign type, sign area coordinates, Chinese text, English text, text line coordinates, collection location, collection time, quality score, and version number. This structured result is first sent to the long-term memory and deduplication module as structured input for unique ID generation, deduplication verification, and version merging. After the long-term memory and deduplication module completes the historical association, the final inspection result after deduplication and merging is sent to the result output module for reporting on the inspection platform.
[0131] S5. After receiving the cloud recognition result, the long-term memory and deduplication module associates the cloud recognition result with the historical inspection result to obtain the latest recognition result, completes the deduplication verification and version merging, and then updates the historical database.
[0132] Furthermore, the specific implementation method of step S5 includes the following steps:
[0133] S5.1. The long-term memory and deduplication module receives the cloud recognition result $JSON_{out}$ generated in step S4 and, by combining the cropped sign image in the cloud recognition result with the acquisition location, acquisition time, and vehicle pose metadata, constructs the current inspection record $R_{cur}$;
[0134] S5.2. Generate a unique identifier $U_n$ for each sign. The expression is:
[0135] $U_n = \mathrm{Hash}(F_{vis,n} \,\|\, F_{geo,n} \,\|\, F_{dir,n} \,\|\, F_{type,n})$
[0136] Here, $U_n$ represents the unique identifier of the nth sign candidate; Hash represents the hash function; $F_{vis,n}$ represents the encoding of visual features extracted from the sign region image; $F_{geo,n}$ represents the location code obtained by discretizing geographic coordinates; $F_{dir,n}$ indicates the encoding of the vehicle's direction of travel; $F_{type,n}$ indicates the sign type code; the symbol $\|$ indicates that fields are concatenated in a predetermined order.
[0137] Furthermore, the visual feature encoding $F_{vis,n}$ is obtained by generating a normalized feature vector from the cropped sign image using a visual encoding network; the position encoding $F_{geo,n}$ is obtained from geographic coordinates via discrete grid encoding or GeoHash encoding; the direction encoding $F_{dir,n}$ is obtained by discretizing the vehicle heading angle into preset direction intervals; the type encoding $F_{type,n}$ is obtained by mapping the sign category label to a preset type dictionary. The above codes are concatenated in a fixed field order and then hashed to generate a unique identifier, which ensures index consistency for the same sign during repeated inspections.
[0138] To account for encoding differences caused by appearance variations when the same sign is photographed multiple times, this invention does not use strict equality of $U_n$ as the sole criterion for deduplication; instead, it further calculates the duplication similarity between the current record and the historical record h.
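The identifier construction can be sketched with the standard library. SHA-256 as the hash, sign-bit quantization of the feature vector, a fixed-size latitude/longitude grid, and eight heading sectors are all assumptions chosen for the example; the specification leaves the hash function and the discretization granularity open.

```python
import hashlib
import math


def f_vis(vec):
    """Hypothetical quantization: one sign bit per feature component."""
    return "".join("1" if x >= 0 else "0" for x in vec)


def f_geo(lat, lon, cell_deg=1e-4):
    """Discrete grid code from geographic coordinates."""
    return f"{math.floor(lat / cell_deg)}:{math.floor(lon / cell_deg)}"


def f_dir(heading_deg, sectors=8):
    """Discretize the heading angle into equal sectors."""
    return str(int((heading_deg % 360.0) // (360.0 / sectors)))


def unique_id(vec, lat, lon, heading_deg, sign_type, type_dict):
    """U_n = Hash(F_vis || F_geo || F_dir || F_type), fixed field order."""
    fields = [f_vis(vec), f_geo(lat, lon), f_dir(heading_deg),
              str(type_dict[sign_type])]
    return hashlib.sha256("||".join(fields).encode("utf-8")).hexdigest()
```

Two observations that fall in the same grid cell and heading sector with similarly signed features hash to the same identifier, but an observation near a cell or sector boundary may not, which is one reason the similarity check is still needed alongside the hash.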
[0139] S5.3. Calculate the similarity $D_{n,h}$ between the current inspection record obtained in step S5.1 and the h-th historical record. The expression is:
[0140] $D_{n,h} = w_v \cdot S_{vis,n,h} + w_g \cdot S_{geo,n,h} + w_d \cdot S_{dir,n,h}$
[0141] Here, $D_{n,h}$ represents the similarity between the current nth sign candidate and the h-th historical record; $S_{vis,n,h}$ indicates visual feature similarity; $S_{geo,n,h}$ indicates geographical similarity; $S_{dir,n,h}$ indicates the similarity of the collection direction; $w_v$, $w_g$, and $w_d$ are the weighting coefficients for the visual term, the geographical term, and the directional term, respectively.
[0142] Furthermore, each duplication similarity component can be calculated in the following form: $S_{vis,n,h} = \cos(F_{vis,n}, F_{vis,h})$; $S_{geo,n,h} = \exp(-\|G_n - G_h\|_2 / \sigma_g)$; $S_{dir,n,h} = 1 - \min(|\theta_n - \theta_h|,\ 360 - |\theta_n - \theta_h|) / 180$;
[0143] Here, $\cos(\cdot,\cdot)$ represents the cosine similarity; $G_n$ and $G_h$ represent the geographic coordinate vectors of the current record and the historical record, respectively; $\sigma_g$ represents the geographical distance attenuation coefficient; $\theta_n$ and $\theta_h$ represent the acquisition direction angles of the current record and the historical record, respectively; $\|\cdot\|_2$ represents the Euclidean norm. Judging similarity jointly on visual, geographical, and directional terms avoids misidentifying the same sign as different objects because of variations in shooting angle and lighting, which can happen when relying solely on strict hash equality.
[0144] When $D_{n,h}$ is greater than or equal to the deduplication threshold $\tau_{dup}$, the current inspection record and the h-th historical record are identified as the same sign, and the historical entry is updated by merging versions; otherwise, a new entry is created and written to the historical database. When merging versions, the latest high-quality image, the latest recognition result, and the complete time series are retained.
[0145] S5.4. After completing the deduplication verification and version merging, the long-term memory and deduplication module will send the final inspection results to the result output module.
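The similarity test and merge decision of step S5.3 can be sketched as below. The record layout, the default weights, planar coordinates in metres for $G_n$, and the rule of comparing only against the best-matching historical entry are assumptions; the component formulas follow [0142].

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dup_similarity(cur, hist, weights=(0.5, 0.3, 0.2), sigma_g=10.0):
    """D_nh from visual, geographic, and directional similarity."""
    w_v, w_g, w_d = weights  # hypothetical weights
    s_vis = cosine(cur["f_vis"], hist["f_vis"])
    s_geo = math.exp(-math.dist(cur["xy"], hist["xy"]) / sigma_g)
    diff = abs(cur["theta"] - hist["theta"])  # angles assumed in [0, 360)
    s_dir = 1.0 - min(diff, 360.0 - diff) / 180.0
    return w_v * s_vis + w_g * s_geo + w_d * s_dir


def merge_or_create(cur, history, tau_dup=0.8):
    """Merge cur into its best match if D_nh >= tau_dup, else add an entry."""
    best = max(history, key=lambda h: dup_similarity(cur, h["latest"]),
               default=None)
    if best is not None and dup_similarity(cur, best["latest"]) >= tau_dup:
        best["versions"].append(cur)  # keep the complete time series
        best["latest"] = cur
        return "merged"
    history.append({"latest": cur, "versions": [cur]})
    return "created"
```

The wrap-around in the direction term matters in practice: headings of 350° and 10° differ by 20°, not 340°, so two passes in nearly the same direction are still treated as similar.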
[0146] S6. The autonomous iteration module updates the model parameters, quality thresholds, and route priorities based on the updated historical database, and then feeds them back into the autonomous planning module to carry out a new round of inspection tasks.
[0147] Furthermore, in step S6, the autonomous iteration module updates the model parameters using high-value samples, low-quality resampled samples, platform-manually corrected samples, and failed identification samples from the historical database as training samples.
[0148] Furthermore, the autonomous iteration module takes high-value samples, low-quality re-acquired samples, samples manually corrected on the platform, and failed identification samples from the historical database as input to jointly update the edge-side model parameters, cloud parsing template parameters, quality judgment parameters, and route priority parameters. Among these, the edge-side model parameters include at least the network weight parameters of the edge-lightweight signage detection model $D_{det}$ and the edge retention threshold $\tau_{det}$; the cloud parsing template parameters include at least the field template FieldSet, the field type mapping rules, and the Chinese/English line pairing threshold $\tau_{pair}$; the quality judgment parameters include at least the image quality threshold $\tau_q$; and the route priority parameters include the priority $P_{p,new}$ of each candidate road segment.
[0149] For each candidate road segment p, the system updates its priority $P_{p,new}$ for the next period based on the segment's false detections, missed detections, re-acquisitions, and manual corrections in the historical period. The expression is:
[0150] $P_{p,new} = \rho \cdot P_{p,old} + (1-\rho) \cdot (w_m \cdot M_{miss,p} + w_q \cdot M_{reacq,p} + w_f \cdot M_{fix,p})$
[0151] Here, $P_{p,new}$ and $P_{p,old}$ represent the new and old priorities of the p-th road segment, respectively; $\rho$ is the smoothing coefficient; $M_{miss,p}$ indicates the historical missed detection intensity of this road segment; $M_{reacq,p}$ indicates the historical re-acquisition frequency of this road segment; $M_{fix,p}$ indicates the historical frequency of manual corrections for this road segment; $w_m$, $w_q$, and $w_f$ are the weighting coefficients for the missed detection term, the re-acquisition term, and the correction term, respectively, with $w_m+w_q+w_f=1$.
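The priority update in [0150] is an exponentially smoothed blend of the old priority and a weighted problem indicator. The sketch below uses hypothetical default values for $\rho$ and the weights and assumes the three indicators are already normalized to [0, 1].

```python
def update_priority(p_old, m_miss, m_reacq, m_fix,
                    rho=0.7, weights=(0.5, 0.3, 0.2)):
    """P_new = rho * P_old + (1 - rho) * weighted problem intensity."""
    w_m, w_q, w_f = weights  # hypothetical weights, must sum to 1
    signal = w_m * m_miss + w_q * m_reacq + w_f * m_fix
    return rho * p_old + (1.0 - rho) * signal
```

With an old priority of 0.4 and indicators of 0.8, 0.5, and 0.2, the weighted signal is 0.59 and the new priority is 0.7 × 0.4 + 0.3 × 0.59 = 0.457. A larger $\rho$ makes priorities change more slowly between inspection cycles.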
[0152] The updated network weight parameters of the edge-lightweight signage detection model, the edge retention threshold $\tau_{det}$, and the image quality threshold $\tau_q$ are fed back to the edge side; the updated field template FieldSet, field type mapping rules, and Chinese/English line pairing threshold $\tau_{pair}$ are fed back to the cloud-based cognitive processing module; and the updated road segment priority $P_{p,new}$ is fed back to the autonomous planning module, so that a higher sampling density can be configured for key road sections in the next inspection cycle and a more proactive embodied re-acquisition strategy can be adopted in high-risk lighting environments.
[0153] The following set of feasible parameter examples is provided to illustrate the engineering implementation of the present invention, without limiting the scope of protection of the present invention:
[0154] (1) The edge detection threshold $\tau_{det}$ can be set to 0.50 to 0.70; the multi-source synchronization threshold $\tau_{sync}$ can be set to 50 ms to 100 ms;
[0155] (2) The image quality threshold $\tau_q$ can be set to 0.70 to 0.85; the upper limit on re-acquisitions $N_{retry}$ can be set to 2 to 4;
[0156] (3) The vehicle speed upper limit $V_{max}$ can be set to 20 km/h to 60 km/h depending on the road grade; the lateral offset boundaries $Y_{min}$ to $Y_{max}$ are determined according to lane markings and safety rules;
[0157] (4) Images uploaded from the edge use a lossless or near-lossless compression format after region cropping; the upload packet for a single sign region can be kept in the range of tens of KB;
[0158] (5) The geolocation code used when generating the unique identifier can be a road-level discrete grid, and the direction code can be a discrete interval of the vehicle heading;
[0159] (6) The text line pairing threshold $\tau_{pair}$ can be set to 0.60 to 0.80; the deduplication threshold $\tau_{dup}$ can be set to 0.75 to 0.90; the path smoothing coefficient $\rho$ can be set to 0.60 to 0.85; the weights $w_v$, $w_g$, and $w_d$ for the visual, geographic, and directional terms can be configured according to road grade and inspection cycle, and in scenarios with frequent repeat inspections it is preferable to increase the values of $w_g$ and $w_d$.
[0160] It should be noted that relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0161] Although this application has been described above with reference to specific embodiments, various modifications can be made and components can be replaced with equivalents without departing from the scope of this application. In particular, as long as there is no structural conflict, the features in the specific embodiments disclosed in this application can be combined with each other in any way. The lack of an exhaustive description of these combinations in this specification is merely for the sake of brevity and resource conservation. Therefore, this application is not limited to the specific embodiments disclosed herein, but includes all technical solutions falling within the scope of the claims.
Claims
1. A road sign inspection system, characterized in that it includes a task interface module, an autonomous planning module, an edge perception and detection module, an embodied interaction closed-loop module, a cloud-based cognitive processing module, a result output module, a long-term memory and deduplication module, and an autonomous iteration module.
The task interface module, autonomous planning module, edge perception and detection module, embodied interaction closed-loop module, cloud-based cognitive processing module, and long-term memory and deduplication module are connected in sequence. The long-term memory and deduplication module is connected to the result output module and to the autonomous iteration module, respectively, and the autonomous iteration module is connected to the autonomous planning module.
The task interface module inputs the inspection task package into the autonomous planning module;
The autonomous planning module outputs route and stop point parameters to the edge perception and detection module based on the inspection task package;
The edge perception and detection module drives the unmanned vehicle to conduct inspections according to the inspection route and stop point parameters. The detected sign candidate results are sent to the embodied interaction closed-loop module, which performs quality assessment and secondary data collection to obtain sign data that meets the quality standards. After preprocessing, the sign data that meets the quality standards is formed into an encapsulated data packet and sent to the cloud-based cognitive processing module.
The cloud-based cognitive processing module sends the cloud recognition results to the long-term memory and deduplication module.
The long-term memory and deduplication module processes the cloud recognition results and historical inspection results, performs deduplication verification and version merging, updates the historical database, and then sends the results to the result output module.
The autonomous iteration module trains the model based on the updated historical database to obtain updated model parameters, quality thresholds, and route priorities, and then feeds them back into the autonomous planning module.
2. The road sign inspection system according to claim 1, characterized in that the modules of the road sign inspection system communicate with each other using standardized data packets.
3. The road sign inspection system according to claim 2, characterized in that the unmanned vehicle is equipped with a camera, a lidar, and a GNSS/IMU integrated navigation unit.
4. A method for inspecting with the road sign inspection system according to any one of claims 1-3, characterized in that it includes the following steps:
S1. The task interface module sends the inspection task package to the autonomous planning module, and the autonomous planning module generates the inspection route and stop points.
S2. The edge perception and detection module drives the unmanned vehicle to perform inspections according to the inspection route and stop point parameters, collects images, point clouds, and pose information, and then establishes a synchronization constraint on the sampling time difference. The images that meet the synchronization constraint are used to perform edge-lightweight sign detection to obtain sign candidate images.
S3. The embodied interaction closed-loop module performs quality assessment on the candidate sign images obtained in step S2. For candidate sign images that are below the quality threshold, the vehicle and camera are driven to perform secondary acquisition to obtain candidate sign images that meet the quality standards. After processing, the encapsulated data packet is uploaded to the cloud-based cognitive processing module.
S4. The cloud-based cognitive processing module parses the sign cropping region in the encapsulated data packet obtained in step S3, and sequentially performs text line detection, Chinese and English semantic parsing, line-level matching, and structured field generation to obtain the cloud recognition result;
S5. After receiving the cloud recognition result, the long-term memory and deduplication module associates the cloud recognition result with the historical inspection result to obtain the latest recognition result, and updates the historical database after completing the deduplication verification and version merging.
S6. The autonomous iteration module updates the model parameters, quality thresholds, and route priorities based on the updated historical database, and then feeds them back into the autonomous planning module to carry out a new round of inspection tasks.
5. The inspection method for a road sign inspection system according to claim 4, characterized in that the specific implementation method of step S1 includes the following steps:
S1.1. The autonomous planning module receives a task package from the task interface module. The task package includes the set of road sections to be covered, the inspection cycle, the set of key areas, the allowed operation time window, and safety rules.
S1.2. The autonomous planning module abstracts the road network to be inspected into a directed graph. For each candidate road segment, it calculates the segment utility value and uses the segment utility value as the weighting criterion for path search and stop selection to obtain the optimal inspection route. The formula for calculating the road segment utility value is: $J_p = w_c \cdot C_p + w_h \cdot H_p - w_t \cdot T_p - w_r \cdot R_p$; here, $J_p$ represents the overall inspection utility value of the p-th candidate road segment; $C_p$ indicates the coverage gain of this road segment for uncovered signs; $H_p$ indicates the priority of high-value signage based on historical memory statistics; $T_p$ represents the time cost of entering this road segment; $R_p$ indicates the risk cost associated with construction, congestion, temporary traffic control, or safety restrictions; $w_c$, $w_h$, $w_t$, and $w_r$ are the weighting coefficients for the coverage term, the historical term, the time term, and the risk term, respectively.
S1.3. Based on the obtained optimal inspection route, the autonomous planning module generates a stop sequence, target speed limit, camera default parameters, and priority table for key signs.
6. The inspection method for a road sign inspection system according to claim 5, characterized in that the specific implementation method of step S2 includes the following steps:
S2.1. The edge perception and detection module drives the camera, LiDAR, and positioning and attitude module to sample synchronously according to the execution strategy issued by the autonomous planning module, and encapsulates the image, point cloud, and pose information within the same sampling period into a data packet $D_{sense}$;
S2.2. Construct the synchronization constraint on the sampling time difference: for the k-th sampling, calculate the difference $\Delta T_k$ between the maximum and minimum timestamps; the sampling is determined to be valid only if $\Delta T_k$ is not greater than the synchronization threshold $\tau_{sync}$.
S2.3. A YOLO11n single-stage object detection network executed in real time on the vehicle side is adopted to perform edge-lightweight sign detection on the validly sampled images and to output the sign region bounding box $B_n$, the confidence level $P_n$, and the sign type $Type_n$. For detections whose confidence level $P_n$ is greater than or equal to the edge retention threshold $\tau_{det}$, the rectangular mask corresponding to the detection box is processed to obtain the candidate image of the sign.
7. The inspection method for a road sign inspection system according to claim 6, characterized in that, in step S3, the quality of the candidate sign images acquired in the secondary acquisition is evaluated again until the image quality score is greater than or equal to the quality threshold or the number of acquisitions reaches the upper limit. Candidate sign images that meet the quality standards are processed as follows: for the nth sign candidate, the bounding box $B_n$ is expanded outward horizontally by the expansion amount $D_x$ and vertically by the expansion amount $D_y$ to form the sign cropping region $Crop_n$, which serves as the processed sign candidate image; the processed sign candidate image is encapsulated, together with fields including the GPS coordinates of the acquisition location $GPS_{pos}$, the vehicle pose information $Pose_{car}$, the collection timestamp $T_{cap}$, the sign type $Type_{sign}$, and the image quality score $Q_{img}$, into an encapsulated data packet $P_{cloud}$.
8. The inspection method for a road sign inspection system according to claim 7, characterized in that the cloud recognition result $JSON_{out}$ obtained in step S4 includes sign type, sign area coordinates, Chinese text, English text, text line coordinates, collection location, collection time, quality score, and version number.
9. The inspection method for a road sign inspection system according to claim 8, characterized in that the specific implementation method of step S5 includes the following steps:
S5.1. The long-term memory and deduplication module receives the cloud recognition result $JSON_{out}$ generated in step S4 and, by combining the cropped sign image in the cloud recognition result with the acquisition location, acquisition time, and vehicle pose metadata, constructs the current inspection record $R_{cur}$;
S5.2. Generate a unique identifier $U_n$ for each sign. The expression is: $U_n = \mathrm{Hash}(F_{vis,n} \,\|\, F_{geo,n} \,\|\, F_{dir,n} \,\|\, F_{type,n})$; here, $U_n$ represents the unique identifier of the nth sign candidate; Hash represents the hash function; $F_{vis,n}$ represents the encoding of visual features extracted from the sign region image; $F_{geo,n}$ represents the location code obtained by discretizing geographic coordinates; $F_{dir,n}$ indicates the encoding of the vehicle's direction of travel; $F_{type,n}$ indicates the sign type code; the symbol $\|$ indicates that fields are concatenated in a predetermined order.
S5.3. Calculate the similarity $D_{n,h}$ between the current inspection record obtained in step S5.1 and the h-th historical record. The expression is: $D_{n,h} = w_v \cdot S_{vis,n,h} + w_g \cdot S_{geo,n,h} + w_d \cdot S_{dir,n,h}$; here, $D_{n,h}$ represents the similarity between the current nth sign candidate and the h-th historical record; $S_{vis,n,h}$ indicates visual feature similarity; $S_{geo,n,h}$ indicates geographical similarity; $S_{dir,n,h}$ indicates the similarity of the collection direction; $w_v$, $w_g$, and $w_d$ are the weighting coefficients for the visual term, the geographical term, and the directional term, respectively. When $D_{n,h}$ is greater than or equal to the deduplication threshold $\tau_{dup}$, the current inspection record and the h-th historical record are identified as the same sign, and the historical entry is updated by merging versions; otherwise, a new entry is created and written to the historical database. When merging versions, the latest high-quality image, the latest recognition result, and the complete time series are retained.
S5.4. After completing the deduplication verification and version merging, the long-term memory and deduplication module sends the final inspection results to the result output module.
10. The inspection method for a road sign inspection system according to claim 9, characterized in that, in step S6, the autonomous iteration module updates the model parameters using high-value samples, low-quality re-acquired samples, samples manually corrected on the platform, and failed identification samples from the historical database as training samples.