Road anomaly event identification method, system, terminal and medium

By using multi-view image processing and feature mapping technology, feature images and text prompts for road anomalies are generated, which solves the problem of low recognition accuracy in existing technologies and improves the recognition accuracy and information richness of road anomalies.

CN117789139B · Active · Publication Date: 2026-06-19 · SHANGHAI SANSI ELECTRONICS ENG +4

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANGHAI SANSI ELECTRONICS ENG
Filing Date: 2023-12-29
Publication Date: 2026-06-19

Application Information

Patent Timeline

29 Dec 2023: Application
19 Jun 2026: Publication (CN117789139B)

IPC: G06V20/54; G06V10/40; G06V10/774; G06V10/82; G06F40/166; G06V10/30; G06V10/84

AI Tagging

Application Domain

Character and pattern recognition; Natural language data processing


AI Technical Summary

Technical Problem

Existing technologies based on single-frame or limited-frame images have low accuracy in identifying road anomalies, leading to inaccurate judgments of traffic violations, foreign object intrusions, and sudden dangerous events, which may cause greater danger.

Method used

By acquiring multi-view images, extracting the abrupt change features of manifest variables and mapping them to probability distributions of latent variables, low-dimensional data information is generated. After denoising, a pre-trained road anomaly event recognition model generates feature images, and a text generation model provides matching text prompts.

Benefits of technology

It improves the accuracy of identifying abnormal road events, provides richer information to support road management personnel in timely handling, and reduces the risk of misjudgment.

✦ Generated by Eureka AI based on patent content.

Figure CN117789139B_ABST (abstract drawing)

Abstract

This application provides a method, system, terminal, and medium for identifying road anomalies. It constructs a road anomaly identification model and, based on this model, generates new road anomaly feature images reflecting key attribute information of the road anomalies from multi-view images of the road anomalies that have undergone dimensionality reduction and noise reduction processing. Furthermore, based on a constructed road anomaly text generation model, it generates text prompts describing the road anomalies from the feature images. This invention, based on generative artificial intelligence technology, achieves panoramic recognition of road anomalies, effectively improving the accuracy of road anomaly identification. It also provides richer information about road anomalies, offering effective evidence and support for road management personnel in handling such incidents, and helping them to take faster action.


Description

Technical Field

[0001] This application relates to the field of image recognition technology, and in particular to a method, system, terminal, and medium for identifying abnormal road events.

Background Technology

[0002] Traffic violations such as vehicles crossing lane lines, occupying lanes, illegal parking, and driving against traffic; foreign objects such as debris, spilled objects, and accident residue appearing on the road surface; and sudden dangerous events such as traffic accidents, large gatherings of people, and natural disasters are collectively referred to as abnormal road events.

[0003] On existing highways or high-grade roads, abnormal road events such as traffic violations, foreign object intrusions, and sudden dangerous events frequently occur. Currently, artificial intelligence technology is mainly used to identify event images. For example, a neural network model pre-trained based on abnormal road images and normal road images is used to predict whether the road image to be identified is abnormal.

[0004] However, because roadside and inter-road cameras have limited shooting angles, are subject to obstruction, and operate under limited ambient light, the images used to identify abnormal road events often contain incomplete or missing information. As a result, road abnormal event identification based on only a single frame or a limited number of frames has a very low accuracy rate.

[0005] If the accuracy of identifying abnormal road events is low, and the boundaries of traffic violations, the size and properties of foreign objects, and the nature and scope of dangerous events are not clearly determined, misjudgments are highly likely, preventing timely and appropriate measures from being taken. The event may then develop in an unpredictable direction and potentially cause greater danger. Therefore, it is essential to effectively improve the accuracy of identifying abnormal road events such as traffic violations, foreign object intrusions, and sudden dangerous situations, to avoid incalculable losses to the transportation system.

Summary of the Invention

[0006] In view of the shortcomings of the prior art described above, the purpose of this application is to provide a road anomaly event recognition method, system, terminal and medium to solve the problem of low accuracy in road anomaly event recognition based on single-frame images or a limited number of image frames in the prior art.

[0007] To achieve the above and other related objectives, a first aspect of this application provides a method for identifying road anomaly events. The method includes: identifying whether a road anomaly event has occurred based on acquired multi-view images of the road, to obtain multi-view images of the road anomaly event, wherein the multi-view images of the road include road images taken from different directions by cameras positioned at different locations; extracting abrupt change features of manifest variables from the multi-view images of the road anomaly event, mapping the abrupt change features to probability distributions of latent variables, generating low-dimensional data information describing key content features of the multi-view images, and performing noise reduction processing on the low-dimensional data information; generating a road anomaly event feature image based on a pre-trained road anomaly event recognition model and the denoised low-dimensional data information, wherein the road anomaly event feature image includes key attribute information of the road anomaly event, including one or more of the following: the type of road anomaly event, the location of occurrence, and violation details; generating, based on a pre-trained road anomaly event text generation model and the road anomaly event feature image, matching text prompts describing the road anomaly event; and uploading the road anomaly event feature image and the matching text prompts to a network platform to remind road management personnel to take timely measures.

[0008] In some embodiments of the first aspect of this application, the training method of the road anomaly event recognition model includes: extracting abrupt change features of manifest variables from multi-view images of road anomalies, mapping the abrupt change features to probability distributions of latent variables, and generating low-dimensional data information to describe key content features of the multi-view images; denoising the low-dimensional data information to obtain sample data; training a contrastive learning model using the obtained sample data to construct a primary road anomaly event recognition model for generating feature images of road anomalies; and adjusting and updating the parameters of the primary road anomaly event recognition model based on a defined contrastive loss function to obtain a finally converged road anomaly event recognition model.

[0009] In some embodiments of the first aspect of this application, extracting abrupt change features of manifest variables from multi-view images of road anomalies and mapping the abrupt change features to probability distributions of latent variables to generate low-dimensional data information describing key content features of the multi-view images includes: extracting key features from multi-view images of road anomalies; wherein the key features are used to describe key content features of the multi-view images, including color, texture, shape, structure, and edge features; based on the extracted key features, organizing the multi-view images into clusters with similar features, and separating noise data in the multi-view images into separate clusters; analyzing similar images within each cluster to find patterns of abrupt changes or significant variations, generating abrupt change feature data of manifest variables in the multi-view images; wherein the patterns include local changes in the image, color distribution changes, and texture differences; mapping the abrupt change feature data to points in a latent representation space, while preserving the probability distribution information of the mapping, to generate probability distributions of latent variables in the multi-view images; and sampling points in the latent representation space to generate low-dimensional data information.

[0010] In some embodiments of the first aspect of this application, the denoising method for the low-dimensional data information includes: extracting key features from the low-dimensional data information and mapping the extracted key features back to the original latent representation space to obtain denoised low-dimensional data information; wherein the key features include color, texture, shape, structure and edge features.

[0011] In some embodiments of the first aspect of this application, adjusting and updating the parameters of the primary road anomaly recognition model based on a defined contrastive loss function to obtain a finally converged road anomaly recognition model includes: randomly applying augmentation transformations to the sample data and taking the two correlated views generated from the same sample as a positive sample pair; without explicitly sampling negatives, treating the other augmented samples, equal in number to the positive sample pairs, as negative sample pairs; defining a normalized cross-entropy loss function over the positive and negative sample pairs; and continuously adjusting and updating the parameters of the primary road anomaly recognition model based on the similarity scores of the positive and negative sample pairs to obtain a converged road anomaly recognition model.
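The contrastive fine-tuning in [0011] appears to correspond to the NT-Xent (normalized temperature-scaled cross-entropy) loss popularized by SimCLR. The sketch below assumes that formulation: the two augmented views of each sample form a positive pair, and every other sample in the batch acts as an implicit negative. The function name `nt_xent_loss`, the temperature of 0.5 and the NumPy implementation are illustrative assumptions, not details taken from this application.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent contrastive loss (hypothetical helper).

    z1[i] and z2[i] are embeddings of two augmented views of the same sample
    (a positive pair). Negatives are not sampled explicitly: every other
    embedding in the 2N-sized batch serves as a negative for row i.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                 # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # unit length -> dot product = cosine
    sim = (z @ z.T) / temperature                        # (2N, 2N) similarity scores
    np.fill_diagonal(sim, -np.inf)                       # exclude self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # partner of row i is i +/- N
    # numerically stable row-wise log-softmax
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # cross-entropy with the positive partner as the target "class"
    return float(-log_prob[np.arange(2 * n), pos].mean())


rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
aligned = nt_xent_loss(anchors, anchors + 0.01 * rng.normal(size=anchors.shape))
unrelated = nt_xent_loss(anchors, rng.normal(size=(8, 16)))
print(aligned, unrelated)  # aligned views should give the smaller loss
```

In training, this loss would be computed on the model's embeddings of each augmented batch and minimized by gradient descent; the toy call only shows that well-aligned views yield a lower loss than unrelated ones.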

[0012] In some embodiments of the first aspect of this application, generating a road anomaly event feature image based on a pre-trained road anomaly event recognition model and denoised low-dimensional data information includes: introducing an attention mechanism into the intermediate layer of the neural network in the pre-trained road anomaly event recognition model; calculating an attention weight for each position of each feature map generated by the intermediate layer through a fully connected layer; multiplying each feature map of the intermediate layer with the obtained corresponding attention weight to obtain a weighted feature map; and extracting a representation vector from the weighted feature map to generate a road anomaly event feature image.
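The attention step in [0012] can be illustrated with a minimal NumPy sketch. It assumes a single fully connected scoring layer shared by all positions, a softmax over spatial positions, and sum pooling of the weighted maps to form the representation vector; the application does not fix any of these choices, and `attention_weighted_features` is a hypothetical name.

```python
import numpy as np


def attention_weighted_features(fmap: np.ndarray, fc_weight: np.ndarray, fc_bias: float = 0.0):
    """Spatial attention over intermediate feature maps (illustrative sketch).

    fmap:      (C, H, W) feature maps from an intermediate layer.
    fc_weight: (C,) weights of a fully connected layer that scores each position.
    Returns the attention map (H, W), the weighted feature maps (C, H, W)
    and a C-dimensional representation vector.
    """
    c, h, w = fmap.shape
    per_position = fmap.reshape(c, h * w).T              # (H*W, C): one vector per position
    scores = per_position @ fc_weight + fc_bias          # FC layer -> one score per position
    scores = scores - scores.max()                       # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()         # softmax across positions
    attn_map = attn.reshape(h, w)
    weighted = fmap * attn_map[None, :, :]               # same weight applied to every channel
    representation = weighted.sum(axis=(1, 2))           # attention-pooled (C,) vector
    return attn_map, weighted, representation


rng = np.random.default_rng(0)
fmap = rng.random((32, 7, 7))
attn_map, weighted, rep = attention_weighted_features(fmap, rng.normal(size=32))
print(attn_map.sum(), rep.shape)  # weights sum to 1; rep is a 32-dimensional vector
```

With all-zero scoring weights the attention becomes uniform and the representation reduces to plain average pooling, which is a convenient sanity check.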

[0013] In some embodiments of the first aspect of this application, the training method of the road anomaly event text generation model includes: training a cross-attention model using the road anomaly event feature image as sample data to obtain a primary road anomaly event text generation model, and generating a matching text vector based on the sample data; adjusting and updating the parameters of the primary road anomaly event text generation model based on a defined contrastive loss function to obtain a final converged road anomaly event text generation model.
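[0013] names a cross-attention model without giving its structure. As a sketch under stated assumptions, its core operation is shown below as single-head scaled dot-product cross-attention in which text tokens act as queries and regions of the road anomaly event feature image act as keys and values. The dimensions, projection matrices and function names are hypothetical, and training with the contrastive loss is not shown.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)              # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(text_tokens, image_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: text tokens (queries) attend to image tokens.

    text_tokens:  (T, d_text) embeddings of the text being generated
    image_tokens: (P, d_img)  region features of the feature image
    w_q: (d_text, d), w_k: (d_img, d), w_v: (d_img, d) projection matrices
    """
    q = text_tokens @ w_q
    k = image_tokens @ w_k
    v = image_tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (T, P) attention over image regions
    return weights @ v, weights                          # (T, d) image-conditioned text vectors


rng = np.random.default_rng(0)
text = rng.normal(size=(5, 24))     # 5 text tokens
image = rng.normal(size=(16, 32))   # 16 image regions
d = 8
out, w = cross_attention(text, image, rng.normal(size=(24, d)),
                         rng.normal(size=(32, d)), rng.normal(size=(32, d)))
print(out.shape, w.shape)  # (5, 8) (5, 16)
```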

[0014] To achieve the above and other related objectives, a second aspect of this application provides a road anomaly event recognition system, comprising: a multi-view image acquisition module, used to identify whether a road anomaly event has occurred based on the acquired multi-view images of the road, to obtain multi-view images of the road anomaly event, wherein the multi-view images of the road include road images taken from different directions by cameras set at different locations; an anomaly event data representation module, used to extract abrupt change features of manifest variables from the multi-view images of the road anomaly event, map the abrupt change features to probability distributions of latent variables, generate low-dimensional data information describing key content features of the multi-view images, and perform noise reduction processing on the low-dimensional data information; an event attribute panoramic recognition module, used to generate a road anomaly event feature image based on a pre-trained road anomaly event recognition model and the denoised low-dimensional data information, wherein the road anomaly event feature image includes key attribute information of the road anomaly event, including one or more of the following: the type of road anomaly event, the location of occurrence, and violation details; an event matching text generation module, used to generate matching text prompts describing the road anomaly event based on a pre-trained road anomaly event text generation model and the road anomaly event feature image; and an image and text information uploading module, used to upload the road anomaly event feature image and the matching text prompts to a network platform to remind road management personnel to take timely measures.

[0015] To achieve the above and other related objectives, a third aspect of this application provides an electronic terminal, comprising: a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory to enable the terminal to perform any of the road anomaly event identification methods.

[0016] To achieve the above and other related objectives, a fourth aspect of this application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements any of the road anomaly event identification methods.

[0017] As described above, this application provides a method, system, terminal, and medium for identifying road anomalies. It constructs a road anomaly identification model and, based on this model, generates new road anomaly feature images reflecting key attribute information of the road anomalies from multi-view images of the road anomalies that have undergone dimensionality reduction and noise reduction processing. Furthermore, based on a constructed road anomaly text generation model, it generates text prompts describing the road anomalies from the feature images. This invention, based on generative artificial intelligence technology, achieves panoramic recognition of road anomalies, effectively improving the accuracy of road anomaly identification. It also provides richer information about road anomalies, offering effective evidence and support for road management personnel in handling such incidents, and helping them to take faster action.

Attached Figure Description

[0018] Figure 1 is a flowchart of a road anomaly event identification method according to an embodiment of this application.

[0019] Figure 2 is a flowchart of generating low-dimensional data information describing key content features of multi-view images according to an embodiment of this application.

[0020] Figure 3 is a flowchart of training a road anomaly event recognition model according to an embodiment of this application.

[0021] Figure 4 is a flowchart of generating a road anomaly event feature image based on a road anomaly event recognition model according to an embodiment of this application.

[0022] Figure 5 is a flowchart of training a road anomaly event text generation model according to an embodiment of this application.

[0023] Figure 6 is a schematic diagram of a road anomaly event recognition system according to an embodiment of this application.

[0024] Figure 7 is a structural schematic diagram of a road anomaly event recognition terminal according to an embodiment of this application.

Detailed Implementation

[0025] The following specific examples illustrate the implementation of this application. Those skilled in the art can easily understand other advantages and effects of this application from the content disclosed in this specification. This application can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of this application. It should be noted that, unless otherwise specified, the following embodiments and features in the embodiments can be combined with each other.

[0026] It should be noted that in the following description, reference is made to the accompanying drawings, which illustrate several embodiments of this application. It should be understood that other embodiments may also be used, and changes in mechanical composition, structure, electrical system, and operation may be made without departing from the spirit and scope of this application. The following detailed description should not be considered limiting, and the scope of the embodiments of this application is defined only by the claims of the published patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. Spatially related terms, such as “upper,” “lower,” “left,” “right,” “below,” “lower part,” “above,” “upper part,” etc., may be used herein to illustrate the relationship between one element or feature shown in the figures and another element or feature.

[0027] In this application, unless otherwise expressly specified and limited, the terms "installation," "connection," "linking," "fixing," and "holding" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection between two components. Those skilled in the art can understand the specific meaning of the above terms in this application according to the specific circumstances.

[0028] Furthermore, as used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms “comprising” and “including” indicate the presence of the stated feature, operation, element, component, item, kind, and / or group, but do not preclude the presence, occurrence, or addition of one or more other features, operations, elements, components, items, kinds, and / or groups. The terms “or” and “and / or” as used herein are interpreted as inclusive, or mean any one or any combination thereof. Thus, “A, B, or C” or “A, B, and / or C” means “any one of: A; B; C; A and B; A and C; B and C; A, B, and C.” Exceptions to this definition arise only when combinations of elements, functions, or operations are inherently mutually exclusive in some manner.

[0029] To address the problems described in the background section, this invention provides a method, system, terminal, and medium for identifying road anomaly events, aiming to solve the problem of low accuracy in road anomaly event identification based on single-frame or a limited number of frames in the prior art. Meanwhile, to make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions in the embodiments of this invention are further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only for explaining the invention and are not intended to limit the invention.

[0030] This invention provides a method, system, terminal, and medium for identifying abnormal road events. Regarding the implementation of the method, this invention will describe exemplary implementation scenarios for generating feature images of abnormal road events and matching text prompts.

[0031] Figure 1 is a flowchart of a road anomaly event identification method according to an embodiment of the present invention. The road anomaly event identification method in this embodiment mainly includes the following steps:

[0032] Step S1: Based on the collected multi-view images of the road, identify whether road anomalies have occurred to obtain multi-view images of road anomalies.

[0033] The multi-view road images include road images taken from different angles by cameras positioned at different locations. Existing intelligent transportation management systems typically have a large number of cameras installed. These cameras cover different sections of the road and, by shooting from different positions and angles (for example, front, rear, overhead, and side views), provide rich road image information from multiple perspectives.

[0034] Specifically, the collected multi-view images of the road must include panoramic images and local detail images obtained from one or more shooting directions, such as front, rear, top-down, and side views. The images must also show details of the road, the surrounding area, and the vehicles, such as scratches, dents, and damaged parts.

[0035] Taking a rear-end collision as an example, based on the collected multi-view images of the road, an abnormal road event is identified, and multi-view images of the rear-end collision are obtained. These multi-view images include at least one front panoramic image of the vehicle's front end, one rear panoramic image of the vehicle's rear end, and one detailed image of the collision site. Specifically, the front panoramic image, captured from the front of the rear-ended vehicle, shows the extent of damage to the rear of the rear-ended vehicle, including collision marks, scratches, or deformation features; the rear panoramic image, captured from the rear of the rear-ending vehicle, shows the front of the rear-ending vehicle and the contact between it and the rear of the rear-ended vehicle, reflecting the relative position and angle of collision between the two vehicles; and the detailed image of the collision site, captured from the side of the rear-ended vehicle, typically shows one side of the rear-ended vehicle and the other side of the potential rear-ending vehicle, allowing the observer to clearly see the vehicle's details.

[0036] Taking a wrong-way driving accident as an example, the obtained multi-view images can include at least: road sign images, detailed images of the two-vehicle collision, and images showing the driving direction and position of the vehicles. A road sign image can be obtained by a camera shooting the sign from the front, so that the sign is in the center of the image and additional information, such as the type of road sign and the direction it indicates, is clearly visible. Likewise, capturing parking spaces from a frontal angle clearly shows where the vehicles are parked and their relationship to the parking spaces, which helps determine whether there is wrong-way driving or illegal parking. The detailed images of the two-vehicle collision can be captured from above the point of impact, showing the relative positions of the vehicles, the angle of impact, and the extent of damage. Additionally, side-view images capture the driving direction and position of the vehicles and highlight their parking status, to identify any illegally parked vehicles or vehicles parked against the direction of traffic. Rear-facing cameras can capture any vehicles involved in the wrong-way driving accident, and the obtained images can clearly show their license plate numbers.

[0037] The purpose of this design in this embodiment of the invention is to: acquire images of road anomalies from different perspectives using multiple cameras, which not only provides clearer images and solves the problem of incomplete image information quality caused by limited shooting angle, obstruction, or limited ambient light in a single image, but also provides more image content and richer information about road anomalies based on multi-view images. This allows the generated road anomaly feature image to include key attribute information of the road anomaly, providing effective basis and evidence for handling road anomalies.

[0038] Step S2: Extract the abrupt change features of the manifest variables from the multi-view images of road anomalies, map the abrupt change features to the probability distribution of the latent variables, generate low-dimensional data information describing the key content features of the multi-view images, and perform noise reduction processing on the low-dimensional data information.

[0039] The purpose of this design in this embodiment is to replace the abrupt changes in the manifest variables of road anomalies, such as traffic violations, foreign object intrusions, and sudden dangerous events, with low-dimensional data information in the latent representation space, thereby reducing the computational load of the subsequent panoramic recognition of road anomalies and of generating road anomaly feature images.

[0040] In one embodiment, as shown in Figure 2, step S2 includes:

[0041] Step S21: Extract key features from multi-view images of road anomalies; wherein the key features are used to describe the key content features of the multi-view images, including color, texture, shape, structure and edge features.

[0042] In one specific embodiment, key features are extracted from the multi-view images of road anomaly events using the Histogram of Oriented Gradients (HOG). The extraction process includes:

[0043] (1) Normalize the input multi-view image in color space using Gamma correction. A power-law transformation is applied to the grayscale values of the multi-view image to correct brightness deviations and adjust its contrast, reducing the impact of local shadows and lighting changes and suppressing interference from noise. The Gamma correction formula is as follows:

[0044] $$f(I) = I^{\gamma} \tag{1}$$

[0045] Where I is the input image; γ is the correction coefficient, typically 0.5; and f(I) is the value of the input image after color space standardization.

[0046] (2) Calculate the gradient information of each pixel in the standardized multi-view image to capture the contour information of the image and further weaken the interference of illumination; wherein, the gradient information includes gradient magnitude and gradient direction; the formula for calculating the gradient information of the pixel (x, y) of the image is:

[0047] $$G_x(x,y) = H(x+1,y) - H(x-1,y) \tag{2}$$

[0048] $$G_y(x,y) = H(x,y+1) - H(x,y-1) \tag{3}$$

[0049] where $G_x(x,y)$, $G_y(x,y)$, and $H(x,y)$ denote, respectively, the horizontal gradient, the vertical gradient, and the pixel value at pixel $(x,y)$ of the input image.

[0050] (3) Divide the image into cell units of equal size and compute the gradient histogram of each cell unit, that is, count the gradients falling into each direction range within the cell unit, to form the features of that cell unit. For example, each cell unit can be 3×3 pixels. The size of each cell unit can be set as required, and this invention does not limit it.

[0051] (4) Group several adjacent cell units into a block, with adjacent blocks overlapping one another. The features of all cell units in each block are concatenated to obtain the histogram of oriented gradients feature of the block.

[0052] (5) Standardize the directional gradient histogram features of each block to ensure that the numerical ranges of different features are consistent. The standardization formula is as follows:

[0053] $$X' = \frac{x - \mathrm{mean}}{\sigma} \tag{4}$$

[0054] where $x$ is the histogram of oriented gradients (HOG) feature of each block; $\mathrm{mean}$ is the mean of that block's HOG feature; $\sigma$ is its standard deviation; and $X'$ is the standardized HOG feature of the block.
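Steps (1) to (5) can be combined into one NumPy function. This is a sketch under stated assumptions: gamma = 0.5 and 3×3-pixel cells follow the text, while 9 orientation bins over 0° to 180°, magnitude-weighted votes, 2×2-cell blocks with a one-cell stride, and the small `eps` guard are common HOG defaults chosen for illustration. `hog_features` is a hypothetical name; scikit-image's `skimage.feature.hog` offers a production implementation, although its default block normalization (L2-Hys) differs from the z-score standardization described here.

```python
import numpy as np


def hog_features(image: np.ndarray, gamma: float = 0.5, cell: int = 3,
                 bins: int = 9, block: int = 2, eps: float = 1e-6) -> np.ndarray:
    """HOG descriptor of a grayscale image with values in [0, 1] (illustrative)."""
    # (1) Gamma correction, formula (1): f(I) = I ** gamma
    img = np.power(image.astype(float), gamma)

    # (2) Central-difference gradients, formulas (2) and (3); arrays are indexed [y, x]
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # H(x+1, y) - H(x-1, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]        # H(x, y+1) - H(x, y-1)
    magnitude = np.hypot(gx, gy)
    direction = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned direction in [0, 180)

    # (3) Orientation histogram of each cell, votes weighted by gradient magnitude
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    bin_idx = np.minimum((direction / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            hist[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=magnitude[rows, cols].ravel(),
                                       minlength=bins)

    # (4) Overlapping blocks of block x block cells (stride: one cell), concatenated;
    # (5) each block vector standardized as X' = (x - mean) / sigma
    features = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            x = hist[by:by + block, bx:bx + block].ravel()
            features.append((x - x.mean()) / (x.std() + eps))
    return np.concatenate(features)


rng = np.random.default_rng(0)
descriptor = hog_features(rng.random((12, 12)))
print(descriptor.shape)  # (324,): 3 x 3 blocks, each 2 x 2 cells x 9 bins
```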

[0055] Step S22: Based on the extracted key features, organize the multi-view images into clusters with similar features, and separate the noisy data in the multi-view images into separate clusters.

[0056] Clustering analysis is a machine learning and data analysis technique that divides the samples or data points in a dataset into clusters with similar characteristics or attributes in order to discover inherent structures or patterns in the data. Specifically, clustering groups and summarizes data by assigning similar data points to the same cluster and dissimilar data points to different clusters. Because multi-view images contain a large amount of data, organizing this data into clusters with similar characteristics and separating noisy data into separate clusters effectively reduces data complexity and limits the impact of noisy data on the subsequent extraction of abrupt change features.

[0057] In one specific embodiment, the multi-view images are analyzed using K-means clustering, whose objective function is to minimize the squared error between the samples and the cluster centroids. The specific process includes:

[0058] (1) Use a method such as the Elbow Method or the Silhouette Score to determine the optimal number of clusters K. Taking the Elbow Method as an example, K-means models are trained repeatedly with different values of K, and the optimal K is chosen based on the distortion of the resulting objective function.

[0059] The objective function minimizes the squared error between the samples and the centroids; the distortion is the sum of the squared distances between the centroid of each cluster and the sample points within that cluster. A lower distortion indicates a more compact cluster structure, and a higher distortion indicates a looser one. The distortion decreases as the number of clusters K increases. However, for data with a certain degree of separability, the distortion drops sharply up to a certain critical point and then decreases only slowly. This critical point can be regarded as the point of better clustering performance, which yields the optimal number of clusters K.

[0060] In a preferred embodiment, the number of clusters K is selected as 4.

[0061] (2) Select K data points as the initial cluster center of each cluster, calculate the distance between each data point in the multi-view image and each cluster center, and assign each data point to the nearest cluster.

[0062] (3) Use the average feature value of all data points in each cluster as the new center, and iteratively execute step (2) until convergence. In this way, after completing K-means clustering, each cluster represents a set of similar images.
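The three steps above, together with the Elbow Method of step (1), can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the multi-view image data has already been converted into feature vectors (the rows of `X`), seeds the centers with k-means++ (the text only says that K data points are selected), and picks the elbow as the K whose step from the previous K reduces the distortion by the largest factor, which is only one of several possible heuristics. The helper names `kmeans` and `choose_k` are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, n_init=5, seed=0):
    """K-means on the rows of X (n, d); returns (labels, centers, distortion)."""
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        # Step (2): choose K data points as initial centers. k-means++ seeding is an
        # assumption; it favours points far from the centers already chosen.
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers, dtype=float)
        for _step in range(n_iter):
            # Assign every point to its nearest center (squared Euclidean distance).
            dists = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Step (3): move each center to the mean of the points assigned to it.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Distortion: sum of squared distances from each point to its center.
        distortion = float(((X - centers[labels]) ** 2).sum())
        if best is None or distortion < best[2]:
            best = (labels, centers, distortion)
    return best

def choose_k(X, k_values):
    """Elbow heuristic (assumed): the K whose step from the previous K
    divides the distortion by the largest factor."""
    d = [kmeans(X, k)[2] for k in k_values]
    ratios = [d[i - 1] / d[i] for i in range(1, len(d))]
    return k_values[1 + int(np.argmax(ratios))], d
```

On synthetic data with four well-separated blobs, `choose_k(X, [1, 2, 3, 4, 5, 6])` should return 4. Real image features rarely produce such a sharp elbow, which is why the Silhouette Score is mentioned as an alternative.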

[0063] Step S23: Analyze the similar images within each cluster to find patterns of abrupt or significant changes, and generate the mutation (abrupt-change) feature data of the manifest variables in the multi-view images.

[0064] The patterns include local changes in the image, changes in color distribution, and texture differences. The manifest (explicit) variables are features or attributes that can be directly observed or measured from the image, including brightness, contrast, texture, shape, and size. In this embodiment, the key features of the multi-view images are obtained by extracting these manifest variables.
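The text does not say how an abrupt change in a manifest variable is detected. As one illustrative possibility (an assumption, not the described method), the sketch below measures two manifest variables, brightness and contrast, for each grayscale image in a cluster. It then flags images whose step from the previous image is far larger than the typical step, using a median absolute deviation (MAD) threshold. `manifest_variables`, `abrupt_changes`, and `threshold` are hypothetical names.

```python
import numpy as np

def manifest_variables(img):
    """Directly measurable attributes of a grayscale image (illustrative subset):
    brightness (mean intensity) and contrast (intensity standard deviation)."""
    img = np.asarray(img, dtype=float)
    return np.array([img.mean(), img.std()])

def abrupt_changes(images, threshold=3.0):
    """Return indices i where image i differs abruptly from image i-1.

    Assumed rule: a step is abrupt when, for any manifest variable, it exceeds the
    median step by more than `threshold` median absolute deviations (MAD).
    """
    feats = np.stack([manifest_variables(im) for im in images])  # (n, 2)
    steps = np.abs(np.diff(feats, axis=0))                       # (n-1, 2)
    med = np.median(steps, axis=0)
    mad = np.median(np.abs(steps - med), axis=0) + 1e-12         # avoid division by zero
    score = ((steps - med) / mad).max(axis=1)
    return np.flatnonzero(score > threshold) + 1
```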

[0065] Step S24: Map the mutation feature data to points in the latent representation space and preserve the probability distribution information of the mapping to generate the probability distribution of latent variables in the multi-view image.

[0066] In one specific embodiment, an encoder in a variational autoencoder is used to map the abrupt change features of explicit variables in the multi-view image to the probability distribution of latent variables.

[0067] A Variational Autoencoder (VAE) is a generative model consisting of an encoder and a decoder. It combines the ideas of autoencoders and probabilistic graphical models to learn latent representations of data and can generate new data. This invention uses the encoder to map the input mutation feature data, as sample data, into a latent representation space. By fitting the mean and variance of the sample data, a Gaussian distribution with that mean and variance is obtained, and the latent variables are sampled from it. This Gaussian distribution describes the features of the sample data.

[0068] Specifically, the Variational Autoencoder (VAE) treats each of the K clusters constructed in the above steps as a sample $X_k$ and matches each sample $X_k$ with its own Gaussian distribution. Two neural networks are constructed to fit the mean and variance of the corresponding sample. The VAE drives each of these Gaussian distributions toward the standard Gaussian distribution $N(0,1)$ so as to capture the feature data in each sample $X_k$. The fitting formulas for the mean and variance of sample $X_k$ are:

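[0069] $\mu_k = f_1(X_k)$ (5)

[0070] $\log \sigma_k^2 = f_2(X_k)$ (6)

[0071] where $\mu_k$ is the mean of sample $X_k$; $\sigma_k^2$ is the variance of sample $X_k$; and $f_1$ and $f_2$ are the two fitting neural networks.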
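The fitting of a mean and variance per cluster sample can be illustrated with a short NumPy sketch. Here $f_1$ and $f_2$ are replaced by single linear maps (an assumption; the text specifies neural networks), and the second map predicts $\log \sigma^2$ rather than $\sigma^2$. Predicting the log-variance is a common choice because it keeps the variance positive without an extra activation. In a standard VAE, each Gaussian is pulled toward $N(0,1)$ by a KL-divergence penalty, shown here in closed form. The dimensions (K = 4 clusters, 16 input features, 8 latent dimensions) and all weights are hypothetical.

```python
import numpy as np

def encode(X, W_mu, W_logvar):
    """Map cluster samples X (K, d) to Gaussian parameters.
    f1 and f2 are stand-in linear maps; a real VAE would use neural networks."""
    mu = X @ W_mu            # mu_k = f1(X_k)
    log_var = X @ W_logvar   # log-variance, unconstrained, so no activation is needed
    return mu, log_var

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) per sample, summed over latent dimensions.
    Minimising this term pulls each sample's Gaussian toward N(0, 1)."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))                    # K = 4 cluster feature vectors (hypothetical)
W_mu = rng.normal(scale=0.1, size=(16, 8))      # hypothetical weights, latent dimension 8
W_logvar = rng.normal(scale=0.1, size=(16, 8))
mu, log_var = encode(X, W_mu, W_logvar)
kl = kl_to_standard_normal(mu, log_var)         # one value per cluster, shape (4,)
```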

[0072] Step S25: Sample points in the latent representation space to generate low-dimensional data information.

[0073] In one specific embodiment, the present invention trains the variational autoencoder (VAE) by gradient descent using the reparameterization trick, so that the VAE generates a random sample representing a point in the latent representation space from the probability distribution of the latent variables of the multi-view images. Instead of sampling directly from that distribution, the sample is expressed as a deterministic function of the mean, the variance, and an auxiliary noise variable, so gradients can be backpropagated through the sampling step. This makes training more stable, and the generated low-dimensional data contains more comprehensive and accurate key content features describing the multi-view images.
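A minimal sketch of the reparameterization trick in NumPy: instead of drawing z directly from $N(\mu, \sigma^2)$, it draws ε from $N(0, I)$ and computes $z = \mu + \sigma \cdot \varepsilon$, so z depends smoothly on μ and $\log \sigma^2$. The finite-difference lines only illustrate that a gradient with respect to μ and $\log \sigma^2$ exists; in practice an automatic-differentiation framework would compute it. The values of `mu`, `log_var`, and the step `h` are illustrative.

```python
import numpy as np

def reparameterize(mu, log_var, eps):
    """z = mu + sigma * eps, eps ~ N(0, I). All randomness is in eps, so z is a
    differentiable function of mu and log_var (the reparameterization trick)."""
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.array([0.3, -1.2])
log_var = np.array([0.0, 0.5])
eps = rng.standard_normal(2)
z = reparameterize(mu, log_var, eps)

# Finite-difference check that gradients flow through the sample.
h = 1e-6
dz_dmu = (reparameterize(mu + h, log_var, eps) - z) / h          # expected: 1
dz_dlogvar = (reparameterize(mu, log_var + h, eps) - z) / h      # expected: 0.5 * sigma * eps
```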

[0074] Step S26: Denoise the low-dimensional data information.

[0075] In one embodiment, step S26 includes: extracting key features from the low-dimensional data information and mapping the extracted key features back to the original latent representation space to obtain denoised low-dimensional data information. The key features are consistent with those in step S21, including color, texture, shape, structure, and edge features.

[0076] In one specific embodiment, the U-Net network in the Stable Diffusion model is used to denoise the low-dimensional data information in the latent representation space.

[0077] First, the encoder of the U-Net network extracts key features from the low-dimensional data. Specifically, the low-dimensional data is processed into a 256×256 image that serves as the initial input to the U-Net encoder. The initial data is then encoded with dimensionality reduction using a 3×3 convolution kernel and a 2×2 pooling layer with a stride of 2; that is, the initial data is first convolved, passed through the ReLU activation function, and then pooled. This finally yields the feature map of the key features.
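One encoder step (3×3 convolution, ReLU, then 2×2 pooling with stride 2) is sketched below for a single-channel 256×256 input. It assumes zero padding of 1, so that the convolution preserves the spatial size, and max pooling, since the text does not name the pooling type. A real U-Net uses many learned multi-channel kernels per layer. The function names are hypothetical.

```python
import numpy as np

def conv3x3_same(x, k):
    """2-D cross-correlation of a single-channel map with a 3x3 kernel.
    Zero padding of 1 (an assumption) keeps H x W unchanged."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros((H, W))
    for di in range(3):
        for dj in range(3):
            out += k[di, dj] * xp[di:di + H, dj:dj + W]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2x2(x):
    """2x2 max pooling with stride 2 (max pooling is an assumption)."""
    H, W = x.shape
    x = x[:H - H % 2, :W - W % 2]
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def encoder_step(x, k):
    """Convolve, apply ReLU, then pool: 256x256 -> 128x128."""
    return maxpool2x2(relu(conv3x3_same(x, k)))
```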

[0078] Then, the decoder of the U-Net network maps the key features back to the original latent representation space, thereby recovering the denoised image data. Specifically, after receiving the feature map of the key features, the decoder performs a deconvolution (transposed convolution) on it with a 3×3 kernel, doubling the size of the feature map. The upsampled feature map is then merged with the encoder feature map of the same size (a skip connection), so that the decoder can output denoised low-dimensional data information of the same size as the original low-dimensional data information.
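The decoder step can be sketched in the same way. A transposed convolution with a 3×3 kernel and stride 2 doubles the height and width, and the result is stacked with the same-size encoder map as a skip connection. The one-pixel crop that makes the output exactly 2H×2W, the channel-stacking form of the merge, and the function names are assumptions for illustration.

```python
import numpy as np

def conv_transpose3x3_stride2(x, k):
    """Transposed convolution, 3x3 kernel, stride 2. Each input pixel scatters a
    scaled copy of the kernel into the output; cropping one pixel at the top/left
    (an assumed padding choice) leaves exactly 2H x 2W."""
    H, W = x.shape
    full = np.zeros((2 * H + 1, 2 * W + 1))
    for i in range(H):
        for j in range(W):
            full[2 * i:2 * i + 3, 2 * j:2 * j + 3] += x[i, j] * k
    return full[1:2 * H + 1, 1:2 * W + 1]

def decoder_step(x, skip, k):
    """Upsample x, then merge it with the same-size encoder map (skip connection)."""
    up = conv_transpose3x3_stride2(x, k)
    if up.shape != skip.shape:
        raise ValueError("skip connection must match the upsampled size")
    return np.stack([up, skip], axis=0)  # channels-first: (2, 2H, 2W)
```

With a kernel that is 1 at the center and 0 elsewhere, the output simply places each input value at every second position and fills the gaps with zeros. A learned kernel instead fills those positions by interpolation.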

[0079] By using the U-Net network to extract key features from the low-dimensional data to complete the denoising task, it is possible not only to effectively process noise and remove it from the data, but also to better preserve low-level details and high-level semantic information, ensuring that the denoised low-dimensional data can still retain the key content features of the original multi-view image.

[0080] Step S3: Based on the pre-trained road anomaly event recognition model, generate road anomaly event feature images according to the denoised low-dimensional data information.

[0081] The road anomaly event feature image includes key attribute information of the road anomaly event and can be used directly as the basis and evidence for handling the road anomaly. The key attribute information includes one or more of the following: the type of road anomaly event, its location, and the nature of the violation. Taking the aforementioned rear-end collision as an example, its road anomaly event feature image should include at least the relative position of the rear-ending vehicle and the rear-ended vehicle, the contact details between the front of the rear-ending vehicle and the rear of the rear-ended vehicle, the collision angle between the two vehicles, the extent of damage to the rear of the rear-ended vehicle (including collision marks, scratches, dents, and damaged parts), and the license plate numbers of both vehicles. Similarly, the road anomaly event feature image for a wrong-way driving accident should include at least the type of road sign, the direction indicated by the road sign, the vehicle's parking status and location, the relationship between the vehicle and the parking space, the vehicle's driving direction, and the vehicle's license plate number.

[0082] In this embodiment, a road anomaly event recognition model is first constructed, and the denoised low-dimensional data information is then input into it. Having learned from a large-scale dataset, the model generates new road anomaly event feature images, thereby achieving panoramic recognition of road anomalies and effectively improving their recognition accuracy.

[0083] Step S31: Construct a road anomaly event identification model.

[0084] In one embodiment, as shown in Figure 3, step S31 includes:

[0085] Step S311: Extract the mutation features of the manifest variables from the multi-view images of road anomalies, and map the mutation features to the probability distribution of the latent variables to generate low-dimensional data information to describe the key content features of the multi-view images.

[0086] Specifically, the data processing objective of step S311 is the same as that of step S2, and its specific implementation process is detailed in the embodiment of step S2, which will not be repeated here.

[0087] Step S312: Denoise the low-dimensional data information to use it as sample data.

[0088] Accordingly, the specific implementation process of step S312 can be found in the embodiment of step S26, and will not be repeated here.

[0089] Step S313: Use the obtained sample data to train a contrastive learning model and construct a primary road anomaly event recognition model for generating feature images of road anomalies.

[0090] In deep learning, contrastive learning models are an important unsupervised learning method that can learn meaningful feature representations from unlabeled data.

[0091] In one specific embodiment, the SimCLR framework (A Simple Framework for Contrastive Learning of Visual Representations) is adopted. Using the contrastive learning model, the primary road anomaly event recognition model is trained by maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs. This effectively maps the input sample data into the latent representation space and learns the feature values in the sample data so as to better fit the training data, ultimately generating road anomaly event feature images that reflect the key attributes of road anomalies.

[0092] In this invention, the SimCLR framework includes a random data augmentation module, a base neural network encoder, a small neural network projection head, and a defined contrastive loss function. Specifically, the random data augmentation module randomly generates augmented data samples from the original sample data; the base encoder extracts representation vectors from the augmented data samples to generate the road anomaly event feature image; the projection head maps the original sample data and the augmented sample data into the space in which the contrastive loss is applied; and the contrastive loss function measures the similarity and difference between positive and negative sample pairs to evaluate the accuracy of the learned feature representation vectors.

[0093] Specifically, the neural network base encoder uses a ResNet residual network to obtain representation vectors, and the small neural network projection head uses a multi-layer perceptron (MLP) to obtain the latent variables of the sample data and the augmented sample data. The specific calculation formula is as follows:

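[0094] $h_i = f(\tilde{x}_i) = \mathrm{ResNet}(\tilde{x}_i)$ (7)

[0095] $z_i = g(h_i) = W^{(2)}\,\sigma\big(W^{(1)} h_i\big)$ (8)

[0096] where $\tilde{x}_i$ is the input sample; $h_i$ is the output of the average pooling layer of the ResNet residual network; $W^{(1)}$ and $W^{(2)}$ are the trainable weights; and σ is the ReLU nonlinear function.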
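Formula (8) amounts to a two-layer MLP, sketched below in NumPy. The dimensions are assumptions (a 512-dimensional pooled ResNet output, as in ResNet-18, projected to 128 dimensions), and the weights are random placeholders rather than trained values.

```python
import numpy as np

def projection_head(h, W1, W2):
    """Formula (8): z = g(h) = W2 @ relu(W1 @ h), a two-layer MLP."""
    return W2 @ np.maximum(W1 @ h, 0.0)

rng = np.random.default_rng(0)
h = rng.normal(size=512)                       # h_i: pooled ResNet output (512-d assumed)
W1 = rng.normal(scale=0.05, size=(512, 512))   # hypothetical hidden-layer weights W(1)
W2 = rng.normal(scale=0.05, size=(128, 512))   # hypothetical output weights W(2), 128-d z_i
z = projection_head(h, W1, W2)
```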

[0097] Step S314: Based on the defined contrastive loss function, adjust and update the parameters of the primary road anomaly event recognition model to obtain the final converged road anomaly event recognition model.

[0098] In one specific embodiment, step S314 includes:

[0099] (1) Randomly apply augmentation transformations to the sample data, and take two correlated views generated from the same sample as a positive sample pair. Negative sample pairs are not extracted explicitly; instead, the remaining augmented samples in the same mini-batch serve as negatives.

[0100] Specifically, this invention applies three augmentation methods to the sample data in sequence: random cropping followed by resizing back to the original size, random color distortion, and random Gaussian blur. Each augmented sample and its original sample are used as a positive sample pair, while negative samples are drawn implicitly from the other original and augmented samples. For example, a mini-batch of N samples is first extracted from the original samples, and N augmented samples are derived from it, giving 2N samples. Each original sample and its corresponding augmented sample in the mini-batch form a positive pair, and the other original samples and their augmented samples in the mini-batch are treated as negative samples.
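The pairing rule of this example can be expressed as index arithmetic. Assume the 2N samples are laid out as the N originals followed by their N augmented versions (an assumed ordering). Each index then has exactly one positive partner, and the remaining 2(N-1) indices act as implicit negatives. The augmentations themselves are not implemented here; `positive_partner` and `negative_mask` are hypothetical helpers.

```python
import numpy as np

def positive_partner(n):
    """Batch layout assumed: indices 0..N-1 are originals, N..2N-1 their augmentations.
    Returns, for each of the 2N indices, the index of its positive partner."""
    idx = np.arange(2 * n)
    return (idx + n) % (2 * n)

def negative_mask(n):
    """(2N, 2N) boolean mask, True where (i, k) is an implicit negative pair."""
    mask = ~np.eye(2 * n, dtype=bool)                     # a sample is never its own negative
    mask[np.arange(2 * n), positive_partner(n)] = False   # nor is its positive partner
    return mask
```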

[0101] (2) Define the normalized cross-entropy loss function for positive and negative sample pairs. Based on the similarity scores of the positive and negative sample pairs, continuously adjust and update the parameters of the primary road anomaly event recognition model to obtain a converged road anomaly event recognition model.

[0102] In one specific embodiment, the defined contrastive loss function is the normalized temperature-scaled cross-entropy (NT-Xent) loss for positive and negative sample pairs, calculated as follows:

[0103] $\ell_{i,j} = -\log \dfrac{\exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\big(\mathrm{sim}(z_i, z_k)/\tau\big)}$ (9)

[0104] where $\mathbb{1}_{[k \neq i]} \in \{0, 1\}$ is an indicator function equal to 1 if and only if $k \neq i$; τ is the temperature parameter; and $\mathrm{sim}(z_i, z_j) = z_i^{\top} z_j / (\lVert z_i \rVert \lVert z_j \rVert)$ is the similarity score of the two samples, that is, their normalized dot product. By tuning the temperature parameter, the similarity scores of positive and negative sample pairs can be controlled. When the similarity score of positive sample pairs approaches 1 and that of negative sample pairs approaches 0, the performance of the contrastive learning network can be judged to have converged to a satisfactory level.
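The normalized cross-entropy loss described in [0102] to [0104], averaged over all 2N anchors as in SimCLR, can be sketched in NumPy as follows. The sketch assumes that rows i and i + N of `z` form a positive pair (an assumed ordering) and uses a numerically stable log-sum-exp. The default τ = 0.5 is illustrative only.

```python
import numpy as np

def nt_xent(z, tau=0.5):
    """Normalized temperature-scaled cross-entropy, averaged over all 2N anchors.
    z: (2N, d) projections; rows i and i + N are assumed to be a positive pair."""
    n2 = z.shape[0]
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors, so dot = cosine
    sim = zn @ zn.T / tau
    np.fill_diagonal(sim, -np.inf)                      # drop k == i from the denominator
    m = sim.max(axis=1, keepdims=True)
    log_denom = m[:, 0] + np.log(np.exp(sim - m).sum(axis=1))  # stable log-sum-exp
    partner = (np.arange(n2) + n2 // 2) % n2
    log_num = sim[np.arange(n2), partner]
    return float(np.mean(log_denom - log_num))
```

For two orthogonal samples whose views coincide exactly, each anchor's loss is $\log(1 + 2e^{-2}) \approx 0.24$ at τ = 0.5. Pairing each view with the wrong partner raises the loss to about 2.24, which is the behavior the training in [0105] exploits.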

[0105] Specifically, this invention uses the Stochastic Gradient Descent (SGD) algorithm to update the parameters of the primary road anomaly event recognition model, namely the weights and biases of each neuron, under training hyperparameters such as the learning rate and batch size, thereby minimizing the contrastive loss of the model. Forward propagation, loss calculation, backpropagation, and parameter optimization are repeated until the performance of the primary model converges to a satisfactory level or the number of training iterations reaches a threshold. At this point, the final road anomaly event recognition model fits the sample data well and performs better when it is subsequently used to generate road anomaly event feature images. It should be noted that the similarity thresholds for positive and negative sample pairs and the training iteration threshold used to judge the performance of the primary model can be set as required; their specific values are not limited in this invention.
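The training loop described above (forward pass, loss, backpropagation, parameter update, and the two stopping conditions) is sketched below with mini-batch SGD. To keep the sketch self-contained, a least-squares toy problem with a hand-written gradient stands in for the contrastive loss. The values of `lr`, `batch_size`, `loss_tol`, and `max_epochs` are arbitrary assumptions, and the function name is hypothetical.

```python
import numpy as np

def sgd_fit(X, y, lr=0.05, batch_size=8, loss_tol=1e-4, max_epochs=500, seed=0):
    """Mini-batch SGD for least squares, a stand-in for the contrastive objective.
    Stops when the full-data loss drops below loss_tol or after max_epochs."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])                         # trainable parameters
    loss = float(np.mean((X @ w - y) ** 2))
    for epoch in range(1, max_epochs + 1):
        order = rng.permutation(len(X))              # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            b = order[start:start + batch_size]
            residual = X[b] @ w - y[b]               # forward pass and error
            grad = 2.0 * X[b].T @ residual / len(b)  # gradient of the batch MSE
            w -= lr * grad                           # parameter update
        loss = float(np.mean((X @ w - y) ** 2))
        if loss < loss_tol:                          # convergence threshold reached
            return w, epoch, loss
    return w, max_epochs, loss                       # iteration threshold reached
```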

[0106] Step S32: Based on the pre-trained road anomaly event recognition model, generate a road anomaly event feature image according to the denoised low-dimensional data information.

[0107] In one embodiment, as shown in Figure 4, step S32 includes:

[0108] Step S321: Introduce an attention mechanism into the intermediate layer of the neural network in the pre-trained road anomaly event recognition model.

[0109] Since the intermediate layers of a neural network contain higher-level features, introducing an attention mechanism into the intermediate layers of a neural network helps to identify image regions associated with road anomalies, thereby making the generated road anomaly feature images more accurate.

[0110] Step S322: For each location of each feature map generated by the intermediate layer, calculate an attention weight through a fully connected layer.

[0111] The attention weights represent the importance of the image region for identifying road anomalies and can be generated using fully connected layers of a neural network. Each fully connected layer accepts each feature map generated by the intermediate layers as input and outputs a corresponding weight map of the same size as each feature map, thus obtaining the attention weights for each location in each feature map.

[0112] Step S323: Multiply each feature map of the intermediate layer by the corresponding attention weight obtained to obtain a weighted feature map.
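Steps S322 and S323 can be sketched in NumPy as below. The text does not fix the form of the fully connected layer or its output activation. This sketch therefore assumes one fully connected layer shared across channels, which maps each flattened H×W feature map to H×W logits, followed by a sigmoid that keeps the weights in (0, 1). The names `attention_weights`, `weighted_feature_maps`, `fc_w`, and `fc_b` are hypothetical.

```python
import numpy as np

def attention_weights(fmaps, fc_w, fc_b):
    """Step S322: one weight per location of every intermediate feature map.
    fmaps: (C, H, W); fc_w: (H*W, H*W); fc_b: (H*W,). The fully connected layer is
    shared across channels and followed by a sigmoid (both assumptions)."""
    C, H, W = fmaps.shape
    logits = fmaps.reshape(C, H * W) @ fc_w.T + fc_b
    return (1.0 / (1.0 + np.exp(-logits))).reshape(C, H, W)

def weighted_feature_maps(fmaps, fc_w, fc_b):
    """Step S323: multiply each feature map element-wise by its weight map."""
    return fmaps * attention_weights(fmaps, fc_w, fc_b)
```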

[0113] Step S324: Extract the representation vector from the weighted feature map to generate a road anomaly event feature image.

[0114] Since the low-dimensional data information is a data representation of the multi-view image with extracted key features in the latent representation space, the feature map generated by the neural network in the road anomaly event recognition model contains rich key features for describing the multi-view image. Furthermore, by multiplying with attention weights, the importance of each key feature is graded. Therefore, the road anomaly event feature image can include key attribute information of the road anomaly event and can be directly used as evidence for handling road anomaly events, thereby realizing panoramic recognition of road anomaly events and meeting the requirement of high recognition accuracy.

[0115] Step S4: Based on the pre-trained road anomaly event text generation model, generate matching text prompts describing the road anomaly event according to the feature image of the road anomaly event.

[0116] The text prompt information is a text description of the road anomaly, which is used to help road management personnel to better understand the attributes of the road anomaly event, including the type of road anomaly event, the location of the occurrence, the violation, the local details of the vehicles involved, and the license plate number, and to help road management personnel to speed up the response and implementation of road anomaly events.

[0117] In one embodiment, as shown in Figure 5, the process of training the road anomaly event text generation model in step S4 includes:

[0118] Step S41: Use the road anomaly event feature image as sample data to train a cross-attention model to obtain a primary road anomaly event text generation model, and generate a matching text vector based on the sample data.

[0119] In one specific embodiment, a primary road anomaly event text generation model is constructed using Google's Imagen model. By introducing self-attention and cross-attention mechanisms, text and images are combined, enabling the primary road anomaly event text generation model to understand both text and images simultaneously. Furthermore, a loss function is defined to measure the matching degree between the generated text vector and the feature image of the road anomaly event.

[0120] Specifically, step S41, which generates a matching text vector based on the sample data, includes:

[0121] (1) Create labels for the input road anomaly event feature images and match preset text templates based on the labels.

[0122] (2) Text embedding is performed on the feature image of the road abnormal event and the matched text template to generate a text vector.

[0123] This invention fuses the feature images of the road anomaly events and the matched text templates based on a cross-attention layer. By establishing connections between data from different modalities, it captures the correlation between data and, by understanding the semantic relationships and contextual information in the text, generates text vectors with more semantic information.

[0124] Specifically, this invention projects the deep features of the road anomaly event feature image onto the query matrix, projects the text template onto the key matrix and value matrix, and calculates the attention weights of the image feature vector and text feature vector to obtain the text vector. The specific calculation formula is as follows:

[0125] $Q = \ell_Q(\phi(I))$ (10)

[0126] $K = \ell_K(\psi(P))$ (11)

[0127] $V = \ell_V(\psi(P))$ (12)

[0128] $M = \mathrm{Softmax}\left(\dfrac{Q K^{\top}}{\sqrt{d}}\right)$ (13)

[0129] $\hat{\phi}(I) = M V$ (14)

[0130] where $\phi(I)$ denotes the deep features of the road anomaly event feature image; $\psi(P)$ is the text template; $\ell_Q$, $\ell_K$, and $\ell_V$ are the learned projection transformations; Q, K, and V are the query matrix, key matrix, and value matrix, respectively; M is the attention weight; d is the projection dimension of the key matrix and query matrix; and $\hat{\phi}(I)$ is the output text vector.
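The query, key, and value projections and the attention weighting described in [0124] to [0130] follow standard scaled dot-product cross-attention. In the NumPy sketch below, plain matrices stand in for the learned projections, image feature rows act as queries, and text-template embedding rows supply the keys and values. All shapes and names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_feats, text_emb, l_q, l_k, l_v):
    """Queries from image features, keys and values from the text template.
    img_feats: (n_img, d_img); text_emb: (n_txt, d_txt);
    l_q: (d_img, d); l_k: (d_txt, d); l_v: (d_txt, d_v) stand in for learned projections."""
    Q = img_feats @ l_q
    K = text_emb @ l_k
    V = text_emb @ l_v
    d = Q.shape[-1]                               # projection dimension of Q and K
    M = softmax(Q @ K.T / np.sqrt(d), axis=-1)    # attention weights, each row sums to 1
    return M @ V, M                               # attended output and the weights
```

With a single text token, every attention row is exactly 1, so each output row equals that token's value vector. This edge case is a quick sanity check on the weighting.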

[0131] (3) The text vector is encoded to generate text prompt information.

[0132] In one specific embodiment, a text encoder is used to encode the text vector into text features to generate corresponding text prompt information. Preferably, a category prompt can be input into the text encoder, so that the text encoder can synchronously embed the category prompt when generating the text prompt information, providing additional information for the text prompt information of the road anomaly event feature image, thereby making the final generated text prompt information have a higher matching degree with the road anomaly event feature image and enhancing the performance of the primary road anomaly event text generation model.

[0133] Step S42: Based on the defined contrastive loss function, adjust and update the parameters of the primary road anomaly event text generation model to obtain the final converged road anomaly event text generation model.

[0134] In one specific embodiment, the cosine similarity between the road anomaly event feature image and the text vector is used as the contrastive loss function. The specific calculation formula is as follows:

[0135] $\cos\theta = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$ (15)

[0136] Here, A and B represent two vectors: the road anomaly event feature image and the text vector, respectively. The smaller the angle between the two vectors, the closer the cosine similarity value is to 1, indicating that the two vectors are closer and that the generated text vector is more relevant to the road anomaly event feature image.

[0137] Therefore, the matching degree between the text vector and the road anomaly event feature image can be evaluated by calculating the cosine similarity between them. Based on the evaluation results, the parameters of the primary road anomaly event text generation model are adjusted and updated to improve the quality and matching degree of the finally generated text prompts describing road anomalies.
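The cosine similarity used in [0134] to [0137] can be computed as below. The sketch flattens both inputs and assumes that the feature image and the text vector have already been embedded into vectors of the same length, which the text implies but does not spell out. `cosine_similarity` is a hypothetical helper name.

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = A.B / (|A| |B|); both inputs are flattened to 1-D vectors
    of equal length (an assumption about how the two embeddings are compared)."""
    a = np.ravel(np.asarray(a, dtype=float))
    b = np.ravel(np.asarray(b, dtype=float))
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A value near 1 means the text vector points in nearly the same direction as the feature-image embedding. A value near 0 means the two are unrelated, and a value near -1 means they are opposed.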

[0138] Step S5: Upload the road anomaly event feature image and the matching text prompt information to the network platform to remind staff to take appropriate measures in a timely manner.

[0139] Figure 6 is a schematic structural diagram of a road anomaly event recognition system according to an embodiment of the present invention. In this embodiment, the road anomaly event recognition system 600 includes:

[0140] The multi-view image acquisition module 601 is used to identify whether a road anomaly event has occurred based on the acquired multi-view images of the road to obtain a multi-view image of the road anomaly event; wherein, the multi-view images of the road include road images from different directions captured by cameras set at different locations.

[0141] The abnormal event data representation module 602 is connected to the multi-view image acquisition module 601. It is used to extract the mutation features of the manifest variables from the multi-view images of road abnormal events, map the mutation features to the probability distribution of the latent variables, generate low-dimensional data information to describe the key content features of the multi-view images, and perform noise reduction processing on the low-dimensional data information.

[0142] The event attribute panoramic recognition module 603, connected to the abnormal event data representation module 602, is used to generate a road abnormal event feature image from the denoised low-dimensional data information based on a pre-trained road abnormal event recognition model. The road abnormal event feature image includes key attribute information of the road abnormal event, and the key attribute information includes one or more of the following: the type of road abnormal event, the location of occurrence, and the violation situation.

[0143] The event matching text generation module 604 is connected to the event attribute panoramic recognition module 603 and is used to generate matching text prompt information describing the road abnormal event based on the road abnormal event feature image, according to the pre-trained road abnormal event text generation model.

[0144] The image and text information uploading module 605 is connected to the event matching text generation module 604 and is used to upload the feature image of the road abnormal event and the matching text prompt information to the network platform to remind road management personnel to take appropriate measures in a timely manner.

[0145] It should be noted that the road anomaly event recognition system provided in the above embodiments, when generating road anomaly event feature images and matching text prompts, is only illustrated by the division of the above-described program modules. In practical applications, the above processing can be assigned to different program modules as needed, that is, the internal structure of the system can be divided into different program modules to complete all or part of the processing described above. Furthermore, the road anomaly event recognition system and the road anomaly event recognition method embodiments provided in the above embodiments belong to the same concept, and their specific implementation process is detailed in the method embodiments, which will not be repeated here.

[0146] The road anomaly event identification method provided in this embodiment of the invention can be implemented on the terminal side or the server side. For the hardware structure of the road anomaly event identification terminal, refer to Figure 7, which is a schematic diagram of an optional hardware structure of a road anomaly event identification terminal 700 provided in an embodiment of the present invention. The terminal 700 can be a mobile phone, computer device, tablet device, personal digital processing device, factory back-end processing device, etc. The terminal 700 includes at least one processor 701, a memory 702, at least one network interface 704, and a user interface 706. The components of the device are coupled together through a bus system 705, which implements the connection and communication among them. In addition to a data bus, the bus system 705 also includes a power bus, a control bus, and a status signal bus. However, for clarity of description, the various buses are all labeled as the bus system 705 in Figure 7.

[0147] The user interface 706 may include a monitor, keyboard, mouse, trackball, clicker, button, touchpad, or touch screen.

[0148] It is understood that the memory 702 can be volatile memory or non-volatile memory, or both. The non-volatile memory can be read-only memory (ROM) or programmable read-only memory (PROM); the volatile memory can be random access memory (RAM), used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM) and synchronous static random access memory (SSRAM). The memories described in the embodiments of this invention are intended to include, but are not limited to, these and any other suitable types of memory.

[0149] In this embodiment of the invention, the memory 702 is used to store various types of data to support the operation of the terminal 700. Examples of this data include: any executable program for operation on the terminal 700, such as operating system 7021 and application program 7022; operating system 7021 includes various system programs, such as framework layer, core library layer, driver layer, etc., for implementing various basic services and handling hardware-based tasks. Application program 7022 may include various applications, such as media player, browser, etc., for implementing various application services. The road anomaly event identification method provided in this embodiment of the invention can be included in application program 7022.

[0150] The methods disclosed in the above embodiments of the present invention can be applied to or implemented by the processor 701. The processor 701 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 701 or by instructions in the form of software. The processor 701 may be a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The processor 701 can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. A general-purpose processor 701 may be a microprocessor or any conventional processor. The steps of the method provided in the embodiments of the present invention can be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium, which is located in the memory; the processor reads the information in the memory and completes the steps of the aforementioned method in combination with its hardware.

[0151] In an exemplary embodiment, the terminal 700 may be implemented by one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), or complex programmable logic devices (CPLDs) to execute the aforementioned method.

[0152] Those skilled in the art will understand that all or part of the steps of the above-described method embodiments can be implemented using computer program-related hardware. The aforementioned computer program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above-described method embodiments; and the aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.

[0153] In the embodiments provided in this application, the computer-readable and writable storage medium may include read-only memory, random access memory, EEPROM, CD-ROM or other optical disc storage devices, disk storage devices or other magnetic storage devices, flash memory, USB flash drives, portable hard drives, or any other medium capable of storing the desired program code in the form of instructions or data structures and accessible by a computer. Additionally, any connection may appropriately be termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium. However, it should be understood that computer-readable and writable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are directed to non-transient, tangible storage media. The disks and optical discs used in this application include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks typically reproduce data magnetically, while optical discs reproduce data optically with lasers.

[0154] In summary, this application provides a method, system, terminal, and medium for identifying road anomalies. It constructs a road anomaly identification model and, based on this model, generates new road anomaly feature images reflecting key attribute information of the road anomalies from multi-view images of the road anomalies that have undergone dimensionality reduction and noise reduction processing. Furthermore, based on a constructed road anomaly text generation model, it generates text prompts describing the road anomalies from the feature images. This invention, based on generative artificial intelligence technology, achieves panoramic recognition of road anomalies, effectively improving the accuracy of road anomaly identification. It also provides richer information about road anomalies, offering effective evidence and support for road management personnel to handle such incidents and helping them to take faster action. Therefore, this application effectively overcomes the shortcomings of existing technologies and has high industrial application value.

[0155] The above embodiments are merely illustrative of the principles and effects of this application and are not intended to limit this application. Any person skilled in the art can modify or alter the above embodiments without departing from the spirit and scope of this application. Therefore, all equivalent modifications or alterations made by those skilled in the art without departing from the spirit and technical concept disclosed in this application should still be covered by the claims of this application.

Claims

1. A road abnormal event recognition method, characterized by comprising: identifying, based on collected multi-view images of a road, whether a road anomaly event has occurred, and obtaining multi-view images of the road anomaly event, wherein the multi-view images of the road include road images of different directions captured by cameras arranged at different locations; extracting abrupt change features of manifest variables from the multi-view images of the road anomaly event, mapping the abrupt change features to a probability distribution of latent variables to generate low-dimensional data information describing the key content features of the multi-view images, and denoising the low-dimensional data information; generating, based on a pre-trained road anomaly event recognition model, a road anomaly event feature image from the denoised low-dimensional data information, wherein the road anomaly event feature image includes key attribute information of the road anomaly event, and the key attribute information includes one or more of the following: the type of the road anomaly event, the location of occurrence, and the violation situation; generating, based on a pre-trained road anomaly event text generation model, matching text prompt information describing the road anomaly event according to the road anomaly event feature image; and uploading the road anomaly event feature image and the matching text prompt information to a network platform to remind road management personnel to take appropriate measures in a timely manner.

2. The road abnormal event recognition method according to claim 1, characterized in that the training method for the road anomaly event recognition model comprises: extracting abrupt change features of manifest variables from multi-view images of road anomaly events, mapping the abrupt change features to a probability distribution of latent variables to generate low-dimensional data information describing the key content features of the multi-view images, and denoising the low-dimensional data information for use as sample data; training a contrastive learning model with the obtained sample data to construct a primary road anomaly event recognition model for generating road anomaly event feature images; and adjusting and updating the parameters of the primary road anomaly event recognition model based on a defined contrastive loss function until it converges, to obtain the final road anomaly event recognition model.
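The "adjust and update parameters until convergence" step of claim 2 can be sketched as a generic training loop. This is a minimal sketch, not the applicant's implementation: the claim names neither an optimizer nor a convergence test, so plain gradient descent with a central finite-difference gradient and a loss-change threshold `tol` are assumptions, and `train_until_converged` and `numerical_grad` are hypothetical names. In a real system the contrastive loss of claim 5 would be passed in as `loss`, and gradients would come from automatic differentiation.

```python
from typing import Callable, List, Tuple

Params = List[float]


def numerical_grad(loss: Callable[[Params], float], params: Params, eps: float = 1e-6) -> Params:
    """Central finite-difference gradient; a stand-in for automatic differentiation."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grad.append((loss(up) - loss(down)) / (2 * eps))
    return grad


def train_until_converged(
    loss: Callable[[Params], float],
    params: Params,
    lr: float = 0.1,
    tol: float = 1e-10,
    max_iters: int = 10_000,
) -> Tuple[Params, int]:
    """Gradient descent that stops once the loss improvement falls below `tol`."""
    prev = loss(params)
    for it in range(1, max_iters + 1):
        g = numerical_grad(loss, params)
        params = [p - lr * gi for p, gi in zip(params, g)]  # parameter update
        cur = loss(params)
        if abs(prev - cur) < tol:  # convergence criterion (assumed)
            return params, it
        prev = cur
    return params, max_iters


if __name__ == "__main__":
    # Toy quadratic loss standing in for the contrastive loss; minimum at (3, -1).
    toy_loss = lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2
    print(train_until_converged(toy_loss, [0.0, 0.0]))
```

Stopping on a small change in the loss is only one possible convergence test; a validation metric or a fixed epoch budget would fit the claim's wording equally well.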

3. The road abnormal event recognition method according to claim 1 or 2, characterized in that extracting abrupt change features of manifest variables from the multi-view images of the road anomaly event and mapping the abrupt change features to a probability distribution of latent variables to generate low-dimensional data information describing the key content features of the multi-view images comprises: extracting key features from the multi-view images of the road anomaly event, wherein the key features describe the key content features of the multi-view images and include color, texture, shape, structure, and edge features; organizing, based on the extracted key features, the multi-view images into clusters with similar features, and separating noisy data in the multi-view images into separate clusters; analyzing the similar images within each cluster to find patterns of abrupt or significant change, and generating abrupt change feature data of the manifest variables in the multi-view images, wherein the patterns include local changes in the image, changes in color distribution, and texture differences; mapping the abrupt change feature data to points in a latent representation space while preserving the probability distribution information of the mapping, to generate the probability distribution of the latent variables of the multi-view images; and sampling points in the latent representation space to generate the low-dimensional data information.
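The steps of claim 3 can be illustrated with a small pure-Python sketch. The claim does not name the clustering method or the encoder, so the following are all assumptions: k-means for the clustering step, the deviation of each view from its cluster centre as the "abrupt change" feature, and a linear Gaussian (VAE-style) encoder with reparameterized sampling for the mapping to latent variables. The function names, the toy feature vectors and the encoder weights are hypothetical; a real system would extract the key features from images and learn the encoder weights.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def abrupt_change_features(points: List[Vector], centers: List[Vector], labels: List[int]) -> List[Vector]:
    """Deviation of each view's features from its cluster centre (local change signal)."""
    return [[x - c for x, c in zip(p, centers[lab])] for p, lab in zip(points, labels)]


def encode(x: Vector, w_mu: List[Vector], w_logvar: List[Vector]) -> Tuple[Vector, Vector]:
    """Linear Gaussian encoder: mean and log-variance of the latent distribution q(z | x)."""
    mu = [sum(w * xi for w, xi in zip(row, x)) for row in w_mu]
    logvar = [sum(w * xi for w, xi in zip(row, x)) for row in w_logvar]
    return mu, logvar


def sample_latent(mu: Vector, logvar: Vector, rng: random.Random) -> Vector:
    """Reparameterisation: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


if __name__ == "__main__":
    # Hypothetical 4-D key-feature vectors (e.g. colour, texture, edge statistics) for six views.
    views = [[0.1, 0.2, 0.1, 0.0], [0.2, 0.1, 0.0, 0.1], [0.1, 0.1, 0.1, 0.1],
             [5.0, 4.8, 5.2, 5.1], [5.1, 5.0, 4.9, 5.3], [4.9, 5.2, 5.0, 4.8]]
    centers, labels = kmeans(views, k=2)
    deltas = abrupt_change_features(views, centers, labels)
    # Hypothetical fixed encoder weights projecting the 4-D deltas into a 2-D latent space.
    w_mu = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    w_logvar = [[-1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, -1.0]]
    rng = random.Random(0)
    latents = [sample_latent(*encode(d, w_mu, w_logvar), rng) for d in deltas]
    print(labels, latents)
```

Sampling through the reparameterisation keeps the mapping's probability information (the mean and variance) explicit, which is one way to read the claim's requirement to preserve the distribution while producing low-dimensional samples.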

4. The road abnormal event recognition method according to claim 1 or 2, characterized in that denoising the low-dimensional data information comprises: extracting key features from the low-dimensional data information, and mapping the extracted key features back to the original latent representation space to obtain the denoised low-dimensional data information, wherein the key features include color, texture, shape, structure, and edge features.
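Claim 4's denoising step (extract key features, then map them back to the latent space) resembles projecting data onto a low-rank subspace and reconstructing it. The sketch below swaps in principal component analysis, computed by power iteration, for the unspecified feature extractor; that substitution is an assumption rather than the claimed method, and `denoise` and its helpers are hypothetical names. Directions that carry little variance are discarded, so small noise perpendicular to the dominant structure is removed.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(xs: List[Vector]) -> Vector:
    return [sum(col) / len(xs) for col in zip(*xs)]


def covariance(xs: List[Vector], mu: Vector) -> List[Vector]:
    d = len(mu)
    return [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / len(xs) for j in range(d)]
            for i in range(d)]


def top_components(cov: List[Vector], k: int, iters: int = 200) -> List[Vector]:
    """Leading eigenvectors of `cov` via power iteration; assumes k <= rank(cov)."""
    d = len(cov)
    comps: List[Vector] = []
    for _comp in range(k):
        v = [1.0] + [0.5] * (d - 1)  # fixed start, assumed not orthogonal to the target eigenvector
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            for u in comps:  # keep orthogonal to the components already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:
                break
            v = [a / norm for a in w]
        comps.append(v)
    return comps


def denoise(xs: List[Vector], k: int = 1) -> List[Vector]:
    """Project each latent vector onto the top-k components, then map it back to the original space."""
    mu = mean_vector(xs)
    comps = top_components(covariance(xs, mu), k)
    out = []
    for x in xs:
        centred = [a - m for a, m in zip(x, mu)]
        coeffs = [sum(a * b for a, b in zip(centred, u)) for u in comps]  # extract key features
        out.append([m + sum(c * u[i] for c, u in zip(coeffs, comps))    # map back
                    for i, m in enumerate(mu)])
    return out


if __name__ == "__main__":
    # Toy 2-D latent points scattered slightly off the line y = x.
    noisy = [[-2.9, -3.1], [-1.1, -0.9], [0.9, 1.1], [3.1, 2.9]]
    print(denoise(noisy, k=1))
```

The number of retained components `k` controls the trade-off between noise removal and loss of genuine detail; the claim does not specify it.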

5. The road abnormal event recognition method according to claim 2, characterized in that adjusting and updating the parameters of the primary road anomaly event recognition model based on the defined contrastive loss function to obtain the final converged road anomaly event recognition model comprises: randomly augmenting the sample data, taking two correlated views generated from the same sample as a positive sample pair, and taking an equal number of augmented sample pairs, without explicit sampling, as negative sample pairs; defining a normalized cross-entropy loss function for the positive and negative sample pairs; and continuously adjusting and updating the parameters of the primary road anomaly event recognition model based on the similarity scores of the positive and negative sample pairs, to obtain a converged road anomaly event recognition model.
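A common concrete form of the normalized cross-entropy loss in claim 5 is the normalized temperature-scaled cross-entropy (NT-Xent) loss used in SimCLR-style contrastive learning. The sketch below assumes that formulation. Views `2k` and `2k + 1` are the two augmentations of sample `k` and form the positive pair, and every other view in the batch serves as a negative, so no negatives are sampled explicitly. Whether this matches the claim's exact negative-pair construction is an assumption, as is the temperature of 0.5; `nt_xent` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used as the similarity score of a sample pair."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nt_xent(z: List[Sequence[float]], temperature: float = 0.5) -> float:
    """NT-Xent loss over 2N embeddings; z[2k] and z[2k + 1] are two augmented views of sample k.

    For each anchor i, its partner view is the positive and the remaining 2N - 2 views
    act as negatives, so negatives are never sampled explicitly.
    """
    n = len(z)
    if n % 2 != 0:
        raise ValueError("expected an even number of views (positive pairs)")
    total = 0.0
    for i in range(n):
        pos = i + 1 if i % 2 == 0 else i - 1  # index of the positive partner
        logits = {k: cosine(z[i], z[k]) / temperature for k in range(n) if k != i}
        m = max(logits.values())  # log-sum-exp for numerical stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_denom - logits[pos]  # -log softmax probability of the positive
    return total / n


if __name__ == "__main__":
    aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]     # positives agree
    misaligned = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # positives disagree
    print(nt_xent(aligned), nt_xent(misaligned))
```

Minimizing this loss pulls the two views of each sample together and pushes the other views in the batch apart, which is the behaviour the claim describes in terms of positive and negative similarity scores.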

6. The road abnormal event recognition method according to claim 1, characterized in that generating, based on the pre-trained road anomaly event recognition model, the road anomaly event feature image from the denoised low-dimensional data information comprises: introducing an attention mechanism into an intermediate layer of the neural network of the pre-trained road anomaly event recognition model; calculating, through a fully connected layer, an attention weight for each location of each feature map generated by the intermediate layer; multiplying each feature map of the intermediate layer by the corresponding attention weights to obtain a weighted feature map; and extracting a representation vector from the weighted feature map to generate the road anomaly event feature image.
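The attention step of claim 6 can be sketched as follows. The sketch assumes a single-unit fully connected layer that scores each spatial location from its channel vector, a softmax over all locations, and attention-weighted pooling as the way the representation vector is extracted. None of these choices is fixed by the claim; the function name and the toy feature maps are hypothetical, and the decoder that turns the representation vector into a feature image is not shown.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def spatial_attention(
    feature_maps: List[Grid], fc_weights: List[float], fc_bias: float = 0.0
) -> Tuple[Grid, List[Grid], List[float]]:
    """Attention-weight the C feature maps of an intermediate layer.

    feature_maps: C channels, each an H x W grid.
    fc_weights:   weights of a one-unit fully connected layer applied to the
                  C-dimensional feature vector at each location.
    Returns the H x W attention map, the weighted feature maps and a C-dim representation vector.
    """
    c, h, w = len(feature_maps), len(feature_maps[0]), len(feature_maps[0][0])
    # Fully connected layer: one score per spatial location.
    scores = [[sum(fc_weights[k] * feature_maps[k][y][x] for k in range(c)) + fc_bias
               for x in range(w)] for y in range(h)]
    # Softmax over all locations (an assumption; a per-location sigmoid is another option).
    m = max(max(row) for row in scores)
    exps = [[math.exp(s - m) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    attn = [[e / total for e in row] for row in exps]
    # Multiply every feature map by the attention weights.
    weighted = [[[feature_maps[k][y][x] * attn[y][x] for x in range(w)] for y in range(h)]
                for k in range(c)]
    # Representation vector: attention-pooled value of each channel.
    rep = [sum(sum(row) for row in weighted[k]) for k in range(c)]
    return attn, weighted, rep


if __name__ == "__main__":
    maps = [[[1.0, 0.0], [0.0, 0.0]],   # channel 0: strong response at the top-left location
            [[0.5, 0.2], [0.3, 0.1]]]   # channel 1
    attn, weighted, rep = spatial_attention(maps, fc_weights=[10.0, 0.0])
    print(attn, rep)
```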

7. The road abnormal event recognition method according to claim 1, characterized in that the training method for the road anomaly event text generation model comprises: training a cross-attention model with road anomaly event feature images as sample data to obtain a primary road anomaly event text generation model, and generating matching text prompt information from the sample data; and adjusting and updating the parameters of the primary road anomaly event text generation model based on a defined contrastive loss function until it converges, to obtain the final road anomaly event text generation model.
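Claim 7 trains a cross-attention model. Its core operation, in which text-side queries attend to image-feature keys and values, can be sketched as standard scaled dot-product cross-attention. The projection matrices `wq`, `wk` and `wv` stand in for learned weights and are set to identity in the example; the text decoder, tokenizer and contrastive training loop are not shown, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cross_attention(text_states: Matrix, image_feats: Matrix,
                    wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Scaled dot-product cross-attention: text tokens (queries) attend to image features (keys/values)."""
    q = matmul(text_states, wq)   # queries from the text side
    k = matmul(image_feats, wk)   # keys from the feature image
    v = matmul(image_feats, wv)   # values from the feature image
    d = len(k[0])
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    image = [[10.0, 0.0], [0.0, 10.0]]   # two hypothetical image-region features
    text = [[10.0, 0.0]]                 # one hypothetical text-token state
    print(cross_attention(text, image, eye, eye, eye))
```

Each text token thus receives a mixture of image features weighted by relevance, which is what lets the generated prompt refer to the attributes encoded in the feature image.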

8. A road abnormal event recognition system, characterized by comprising: a multi-view image acquisition module, configured to identify, based on acquired multi-view images of a road, whether a road anomaly event has occurred and to obtain multi-view images of the road anomaly event, wherein the multi-view images of the road include road images of different directions captured by cameras arranged at different locations; an abnormal event data representation module, configured to extract abrupt change features of manifest variables from the multi-view images of the road anomaly event, map the abrupt change features to a probability distribution of latent variables to generate low-dimensional data information describing the key content features of the multi-view images, and denoise the low-dimensional data information; an event attribute panoramic recognition module, configured to generate a road anomaly event feature image from the denoised low-dimensional data information based on a pre-trained road anomaly event recognition model, wherein the road anomaly event feature image includes key attribute information of the road anomaly event, and the key attribute information includes one or more of the following: the type of the road anomaly event, the location of occurrence, and the violation situation; an event matching text generation module, configured to generate, based on a pre-trained road anomaly event text generation model, matching text prompt information describing the road anomaly event according to the road anomaly event feature image; and an image and text information uploading module, configured to upload the road anomaly event feature image and the matching text prompt information to a network platform to remind road management personnel to take appropriate measures in a timely manner.
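The module structure of the claim 8 system can be sketched as a small pipeline object into which each module is injected as a callable. This shows only the data flow between the five modules; the class and field names are hypothetical, and the lambdas in the usage example are placeholders for the models described in claims 2 to 7.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple


@dataclass
class RoadAnomalySystem:
    """Wires the five modules of the system claim together; each module is an injected callable."""
    detect: Callable[[Sequence[Any]], bool]     # multi-view image acquisition module
    represent: Callable[[Sequence[Any]], Any]   # abnormal event data representation module (encode + denoise)
    recognize: Callable[[Any], Any]             # event attribute panoramic recognition module
    describe: Callable[[Any], str]              # event matching text generation module
    upload: Callable[[Any, str], None]          # image and text information uploading module

    def process(self, views: Sequence[Any]) -> Optional[Tuple[Any, str]]:
        """Run one set of multi-view road images through the pipeline."""
        if not self.detect(views):
            return None  # no road anomaly event: nothing is generated or uploaded
        latent = self.represent(views)
        feature_image = self.recognize(latent)
        prompt = self.describe(feature_image)
        self.upload(feature_image, prompt)
        return feature_image, prompt


if __name__ == "__main__":
    uploaded = []
    system = RoadAnomalySystem(
        detect=lambda views: "debris" in views,                        # placeholder trigger
        represent=lambda views: [len(v) for v in views],               # placeholder encoder
        recognize=lambda latent: {"type": "foreign object", "latent": latent},
        describe=lambda img: "Road anomaly: " + img["type"],
        upload=lambda img, text: uploaded.append((img, text)),         # placeholder platform upload
    )
    print(system.process(["camera-1 clear", "debris"]), uploaded)
```

Injecting the modules keeps each one replaceable, so a trained recognition model or text generation model can be swapped in without changing the pipeline.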

9. An electronic terminal, characterized by comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to cause the terminal to perform the road anomaly event identification method according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the road anomaly event identification method according to any one of claims 1 to 7.