Highway accident prediction method and system based on gradient boosting tree model

The invention applies a gradient boosting tree model to accident prediction in the field of traffic safety. It addresses the inability of existing methods to quantify the risk state, and it improves detection performance, reduces the false alarm rate, and copes with imbalanced training data.

Pending Publication Date: 2021-08-20
招商新智科技有限公司
3 Cites 1 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0004] None of the above traffic accident prediction methods can quantify the current risk status o...

Method used

A line search is used to estimate the value of each leaf node region so that the loss function is minimized;
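For reference, the leaf-value step is usually written as below in the standard gradient boosting formulation. This is a sketch of the textbook form, not a formula taken from the patent text. The symbols are assumptions: L is the loss function, F_{m-1} the model after m-1 trees, R_{jm} the j-th leaf region of the m-th tree, J_m its number of leaves, and \nu the learning rate.

```latex
\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\bigl(y_i,\; F_{m-1}(x_i) + \gamma\bigr),
\qquad
F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm}\, \mathbf{1}\bigl[x \in R_{jm}\bigr]
```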
[0120] By using the gradient boosting tree model, the present invention can overcome the imbalance of the training data and improve detection performance, in particular by reducing the false alarm rate.
[0122] Additionally, for simplicity of illustration and discussion, and so as not to obscure one or more embodiments of the present specification, integrated circuit (IC) chips and Known power/ground connections for other components. Furthermore, devices may be shown in block diagram form in order to avoid obscuring one or more embodiments of the description, and this also takes into account the fa...

Abstract

One or more embodiments of the invention provide a highway accident prediction method and system based on a gradient boosting tree model. The method comprises: cleaning a collected traffic accident data set; preprocessing the cleaned data; extracting features from the preprocessed data to obtain the key variables for predicting accident occurrence; constructing and training a dynamic accident prediction model based on a gradient boosting tree model; and inputting online data into the trained model to obtain a predicted accident value. According to the embodiments, the number of accidents on a road section can be predicted from detailed real accident records, which addresses the inability of existing models to make fine-grained traffic accident predictions. The method can also overcome imbalance in the training data and improves detection performance, in particular by reducing the false alarm rate; compared with traditional methods, the model predicts traffic accidents more accurately.

Application Domain

Technology Topic

Data cleansing, Gradient boosting

Example Embodiment

[0080] As an embodiment, the data preprocessing also includes estimating the full-day cross-sectional traffic flow, including:
[0081] Obtaining, from the traffic flow distribution curve of the most recent day, the proportion of the hourly cross-section traffic flow in the full-day cross-section traffic flow;
[0082] Estimating the full-day cross-sectional traffic flow;
[0083] Forming a sparse matrix of road segment hourly accident data.
[0084] Embodiments of the present specification also provide a system for predicting highway accidents based on a gradient boosting tree model, including:
[0085] The data collection module is used to collect historical accident data to form a traffic accident data set;
[0086] The data cleaning module is used to clean the traffic accident dataset;
[0087] The data preprocessing module is used to perform data preprocessing on the traffic accident data set after data cleaning;
[0088] The feature extraction module is used to perform feature extraction on the traffic accident data set after data preprocessing to obtain key variables for predicting the occurrence of accidents;
[0089] The model building module is used to combine the gradient boosting tree model to build a dynamic accident prediction model;
[0090] The model training module is used to input the traffic accident data set into the dynamic accident prediction model, use the key variables to train the dynamic accident prediction model, and obtain the trained dynamic accident prediction model;
[0091] The prediction module is used to input the online data into the trained dynamic accident prediction model, and obtain the output predicted accident value.
[0092] As an embodiment, the historical accident data includes, but is not limited to, the time of the accident, the road section where the accident occurred, the direction of the lane where the accident occurred, the weather of the accident, and the damage caused by the accident;
[0093] The data preprocessing module includes a data aggregation module, a one-hot encoding module and a date judgment module;
[0094] The data aggregation module is used to aggregate, according to the accident time and the road section where the accident occurred, the number of accidents and the losses caused by accidents for each road section and each time period since the first traffic accident, and to form a traffic accident case aggregation database;
[0095] The one-hot encoding module is used to separate the accident weather data of the traffic accident case aggregation database into discrete multi-dimensional vectors;
[0096] The date judgment module is used to correlate the accident time with the actual calendar, extract the attributes of the date when the accident occurred, and add each attribute to the traffic accident case aggregation database as an independent feature.
[0097] The following example illustrates the prediction process of the expressway accident prediction method based on the gradient boosting tree model.
[0098] 1. Establish an accident database
[0099] A traffic accident case database is constructed from the Excel accident statistics table of the Wuhuang Expressway. Its fields include: accident time, accident direction, accident type, accident weather, accident loss, and accident cause.
[0100] Cross-section traffic flow: For data recorded before the traffic statistics equipment was installed, the monthly cross-section flow from the toll data of the Wuhuang Expressway is divided by the number of days in the month to obtain the estimated cross-section traffic flow for the day. For data recorded after the equipment was installed, the cross-section traffic flow of the day is obtained directly.
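The monthly-to-daily estimate described above can be sketched as follows. This is a minimal illustration, assuming the monthly figure is a single vehicle count; the function name and the sample value are hypothetical and do not come from the patent.

```python
import calendar


def estimate_daily_flow(monthly_flow: float, year: int, month: int) -> float:
    """Spread a monthly cross-section count evenly over the days of that month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_flow / days_in_month


# Hypothetical monthly count, for illustration only.
daily_flow = estimate_daily_flow(240_000, 2019, 2)  # February 2019 has 28 days
print(round(daily_flow, 1))
```

Using `calendar.monthrange` keeps leap years and month lengths correct without a hand-written table.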
[0101] Format the data: Because the data was recorded manually, the same direction appears under different expressions, such as Xihuang, east to west, and upward; these are replaced with 0 and 1.
[0102] Eliminate data outliers: Records that are erroneous, unclear, or unknown are replaced with null values.
[0103] Fill in null values according to the distribution of the non-null data: Compute the probability distribution of the non-null values of each field, for example 60% up and 40% down, and fill the null values in the same 6:4 ratio.
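One way to fill missing entries in proportion to the observed values is to sample from the empirical distribution of the non-null entries. The sketch below assumes that random sampling is an acceptable reading of the 6:4 rule; the function name, the 0/1 coding of the direction column, and the sample data are hypothetical.

```python
import random
from collections import Counter


def fill_by_distribution(values, rng):
    """Replace None entries by sampling from the empirical distribution
    of the non-null entries; non-null entries are left unchanged."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # no non-null data to learn a distribution from
    counts = Counter(observed)
    categories = list(counts)
    weights = [counts[c] for c in categories]
    return [
        v if v is not None else rng.choices(categories, weights=weights)[0]
        for v in values
    ]


# Hypothetical direction column: 1 = up, 0 = down, None = missing.
# The non-null entries are 60% up and 40% down.
direction = [1, 1, 1, 0, 0, None, 1, None, 1, 0, 1, 0, None]
filled = fill_by_distribution(direction, random.Random(42))
print(filled)
```

Passing a seeded `random.Random` instance keeps the fill reproducible across runs.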
[0104] 2. Data preprocessing: remove redundant variables, normalize the data, and use the groupby function to aggregate the data by hour and road section.
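The aggregation step can be sketched with the pandas `groupby` method. The text only names a "groupby function", so the use of pandas is an assumption, and the column names and sample records below are hypothetical.

```python
import pandas as pd

# Hypothetical accident records; column names are illustrative only.
records = pd.DataFrame({
    "accident_time": pd.to_datetime([
        "2019-05-01 16:05", "2019-05-01 16:40",
        "2019-05-01 17:10", "2019-05-02 08:30",
    ]),
    "section": ["K971", "K971", "K971", "K980"],
    "loss": [1200.0, 300.0, 0.0, 5000.0],
})

# Split the timestamp into a calendar date and an hour of day.
records["date"] = records["accident_time"].dt.date
records["hour"] = records["accident_time"].dt.hour

# One row per (section, date, hour) with the accident count and summed loss.
hourly = (
    records.groupby(["section", "date", "hour"], as_index=False)
    .agg(accident_count=("loss", "size"), total_loss=("loss", "sum"))
)
print(hourly)
```

Only hours with at least one accident appear in the result, which is consistent with the sparse hourly matrix described later in the example.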
[0105] One-hot encoding of weather and similar variables: For example, weather in the original data set takes five values, sunny, cloudy, fog, rain, and snow, which are represented by 0, 1, 2, 3, and 4, respectively. This representation causes a problem during regression: the average of a sunny day (0) and a foggy day (2) would be a cloudy day (1), which is not in line with common sense. The weather variable therefore needs to be split into five separate indicator variables, sunny, cloudy, foggy, rainy, and snowy, each taking the value 0 or 1.
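The encoding described above can be sketched in a few lines. The category labels and the `weather_` prefix follow the matrix dimensions listed in the example; the function name is hypothetical.

```python
WEATHER_CATEGORIES = ["clear", "cloudy", "fog", "rain", "snow"]


def one_hot_weather(weather: str) -> dict:
    """Return one 0/1 indicator per weather category; exactly one is set to 1."""
    if weather not in WEATHER_CATEGORIES:
        raise ValueError(f"unknown weather value: {weather!r}")
    return {f"weather_{c}": int(c == weather) for c in WEATHER_CATEGORIES}


print(one_hot_weather("fog"))
```

With indicator columns, no ordering between weather types is implied, so a regression model cannot interpolate between unrelated categories.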
[0106] Estimated full-day cross-section traffic flow: Obtain the ratio of the hourly cross-section traffic flow to the full-day cross-section traffic flow through the traffic flow distribution curve of the most recent day, and divide it by the estimated full-day cross-section traffic flow.
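The relationship between hourly and full-day flow can be sketched as below. The text does not spell out the direction of the calculation, so both directions are shown under the assumption that the hourly share is taken from a reference day's 24-hour profile. All function names and the profile values are hypothetical.

```python
def hourly_shares(reference_day_flows):
    """Share of the full-day flow that falls in each hour of the reference day."""
    total = sum(reference_day_flows)
    if total <= 0:
        raise ValueError("reference day must have positive total flow")
    return [flow / total for flow in reference_day_flows]


def full_day_from_hour(hour_flow, hour, shares):
    """Scale one observed hourly flow up to a full-day estimate."""
    return hour_flow / shares[hour]


def hour_from_full_day(full_day_flow, hour, shares):
    """Apportion a full-day flow to a single hour."""
    return full_day_flow * shares[hour]


# Hypothetical 24-hour profile of the most recent day (vehicles per hour).
profile = [100] * 6 + [440] * 10 + [500] + [440] * 5 + [150] * 2
shares = hourly_shares(profile)

print(full_day_from_hour(500, 16, shares))
print(hour_from_full_day(8000, 16, shares))
```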
[0107] Finally, a sparse matrix of road segment hourly accident data is formed. The dimensions of the matrix include:
[0108] Accident location kilometers (such as K971), hour (such as 16:00), whether it is a weekend (0 or 1), whether it is a holiday (0 or 1), weather_clear, weather_cloudy, weather_fog, weather_rain, weather_snow, hourly cross-section traffic flow (such as 500), full-day cross-section traffic flow (such as 8000).
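The calendar-related dimensions (hour, weekend flag, holiday flag) can be derived from the accident timestamp roughly as follows. The holiday set is a hypothetical placeholder, since the text does not say which calendar is used, and the function name is illustrative.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real system would load an official one.
HOLIDAYS = {date(2019, 10, 1), date(2019, 10, 2)}


def date_features(timestamp: datetime) -> dict:
    """Derive the hour, weekend flag, and holiday flag from a timestamp."""
    return {
        "hour": timestamp.hour,
        "is_weekend": int(timestamp.weekday() >= 5),  # Saturday = 5, Sunday = 6
        "is_holiday": int(timestamp.date() in HOLIDAYS),
    }


print(date_features(datetime(2019, 10, 1, 16, 0)))
print(date_features(datetime(2019, 10, 5, 9, 30)))
```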
[0109] 3. Gradient boosting tree model training: Input the historical accident data into the gradient boosting tree model, and use 75% of the data as the training set and 25% of the data as the test set to train the gradient boosted tree model.
[0110] The main training parameters are as follows:
[0111] n_estimators: maximum number of weak learners, set to 0.1;
[0112] learning_rate: learning rate, set to 0.1;
[0113] subsample: subsampling ratio, set to 0.8;
[0114] max_features: maximum number of features, set to None;
[0115] max_depth: maximum tree depth, set to 3.
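The parameter names above match scikit-learn's gradient boosting estimators, so a training run might look like the sketch below. Several points are assumptions: the use of scikit-learn itself, the choice of `GradientBoostingRegressor` (the predicted value is an accident count; `GradientBoostingClassifier` accepts the same parameters), the synthetic data, and `random_state`. In scikit-learn, `n_estimators` must be a positive integer, so the library default of 100 is used here as a placeholder rather than the value listed above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the hourly accident matrix (not real data).
rng = np.random.default_rng(0)
X = rng.random((400, 5))
y = rng.poisson(lam=0.2 + 2.0 * X[:, 0])

# 75% training set, 25% test set, as described in the example.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GradientBoostingRegressor(
    n_estimators=100,   # ASSUMPTION: library default, must be a positive integer
    learning_rate=0.1,
    subsample=0.8,
    max_features=None,
    max_depth=3,
    random_state=0,     # ASSUMPTION: fixed only for reproducibility
)
model.fit(X_train, y_train)
predicted = model.predict(X_test)
print(predicted[:5])
```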
[0116] After testing, under the current accident dataset, the accuracy of the gradient boosted tree model is 65.28%, which is higher than the 48.60% of the SVM model.
[0117] 4. Model operation: Input the actual hourly cross-section traffic flow and the date into the trained gradient boosting tree model, and output the predicted number of accidents for each road section. The results are then clustered with the k-means algorithm, with the number of clusters set to 4, so that they are divided into four warning levels.
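The clustering of predictions into four warning levels can be sketched with a minimal one-dimensional k-means (Lloyd's algorithm). This is an illustration only: the quantile-based initialization, the convention that level 1 is the lowest risk, the function names, and the sample predictions are all assumptions. A library implementation of k-means could be used instead.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Minimal 1-D k-means (Lloyd's algorithm) with quantile-based initial centers."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * n
    for _ in range(max_iterations):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, label in zip(values, labels) if label == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def warning_levels(predicted_counts, k=4):
    """Map each prediction to a level from 1 (lowest cluster) to k (highest)."""
    centers, labels = kmeans_1d(predicted_counts, k)
    order = sorted(range(k), key=lambda j: centers[j])
    level_of_cluster = {cluster: rank + 1 for rank, cluster in enumerate(order)}
    return [level_of_cluster[label] for label in labels]


# Hypothetical predicted accident counts per road section.
predictions = [0.1, 0.2, 0.15, 1.0, 1.1, 2.9, 3.0, 6.0, 6.2]
levels = warning_levels(predictions)
print(levels)
```

Ranking the clusters by their centers is what turns arbitrary cluster labels into ordered warning levels.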
[0118] The present invention performs accident prediction based on a fast gradient boosting tree model, and can predict accident road sections and accident counts at the granularity of hours and road sections.
[0119] When generating each decision tree, the gradient boosting tree model of the present invention adopts a leaf-wise growth strategy, which, for each added leaf node, can reduce more error than a level-wise growth strategy. To prevent overfitting, the fast gradient boosting tree model limits the depth of each decision tree, so the final model is composed of fewer decision trees and leaf nodes. This gives the fast gradient boosting tree model good time efficiency in the decision-making phase of the matching process.
[0120] The invention can overcome the imbalance of training data by adopting the gradient boosting tree model, and has the effect of improving the detection performance, especially reducing the false alarm rate.