Unbalanced data processing method for traffic accident analysis

A data processing technique for traffic accident analysis, applied in electrical digital data processing, special data processing applications, reasoning methods, etc. It addresses the problem of biased analysis results and reduces misclassification.

Pending Publication Date: 2022-05-10
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a method for processing unbalanced data for traffic accident analysis. It overcomes the bias toward majority-class samples that arises when the original unbalanced traffic accident data set is used for reasoning and prediction.


Examples

Embodiment 1

[0049] As shown in Figure 1, an unbalanced data processing method for traffic accident analysis includes the following steps:

[0050] S1: Obtain traffic accident data and set multiple accident attributes, which include n influencing factors and 1 decision variable, where the decision variable is the severity of the accident;

[0051] S2: Combine data according to the accident attributes to obtain a new data table, perform data cleaning on the obtained data table, and obtain the original data table of traffic accidents;

[0052] S3: Grade the severity of accidents in the original data table, and discretize the influencing factors to obtain an optimized data table;

[0053] S4: In the optimized data table, divide the records into positive samples and negative samples according to the grade distribution of accident severity, and resample the positive samples to correct the imbalance, obtaining a balanced data table;

[0054] S5: Input the obtained balanced data table and optimized data t...
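Steps S3 and S4 can be illustrated with a minimal sketch. The source does not specify the discretization bins or the resampling algorithm, so everything below is an assumption made for illustration: the time-of-day bins, the field names `hour`, `period` and `SEV`, the labels `minor` and `fatal`, and the use of plain random oversampling of the positive (minority) class.

```python
import random
from collections import Counter


def discretize_hour(hour):
    """Map an hour of day (0-23) to a coarse period label (hypothetical bins)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def oversample_positive(rows, label_key, positive_label, seed=0):
    """Randomly duplicate positive rows until both classes have equal size.

    Random oversampling is a stand-in; the patent text only says that the
    positive samples are resampled.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == positive_label]
    negatives = [r for r in rows if r[label_key] != positive_label]
    if not positives or len(positives) >= len(negatives):
        return list(rows)
    shortfall = len(negatives) - len(positives)
    extra = [rng.choice(positives) for _ in range(shortfall)]
    return negatives + positives + extra


# Toy records (invented for illustration): 8 negative and 2 positive cases.
rows = [{"hour": h, "SEV": "minor"} for h in (8, 9, 13, 15, 17, 19, 21, 23)]
rows += [{"hour": 2, "SEV": "fatal"}, {"hour": 22, "SEV": "fatal"}]

# S3: discretize the influencing factor to get the "optimized" table.
optimized = [{"period": discretize_hour(r["hour"]), "SEV": r["SEV"]} for r in rows]

# S4: resample the positive class to get the "balanced" table.
balanced = oversample_positive(optimized, "SEV", "fatal")
counts = Counter(r["SEV"] for r in balanced)
print(counts)
```

Oversampling only duplicates existing positive records, so the balanced table contains no new information; it changes the class proportions that a downstream classifier sees.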

Embodiment 2

[0096] The data source used in this example is traffic accident data from Guangdong Province for 2017 to 2018, with a total of 24,816 records. Each record includes relevant information at the time of the collision: basic information, vehicle information, party information and road information.

[0097] S1: First set 15 accident attributes, including 14 influencing factors and 1 decision variable;

[0098] S2: Perform data combination and data cleaning to generate an original data table with 15 columns and 24,816 rows. Sample cases from the original data table are shown in Table 1:

[0099] Table 1

[0100]
BEL  AIR  OVE  ROA  CROs  TYP  INT  WEA  Tim                 VIS  VIG  CON  DRV  COL  SEV
1    1    2    1    2     12   21   1    2012/2/3 15:50:00   3    1    1    1    21   loss and minor injury
1    3    2    1    1     12   21   1    2014/1/19 10:50:00  4    1    1    1    21   loss and minor injury
2    1    2    1    1     ...

Embodiment 3

[0114] This embodiment is similar to Embodiment 2. The difference is that it also analyzes the grade distribution characteristics of accident severity. Descriptive statistics are computed for the decision variable and the influencing factors: the mean, standard deviation, maximum, minimum, and posterior distribution characteristics, where the posterior distribution characteristics comprise the kurtosis and skewness statistics and their standard errors. The descriptive statistics of each accident attribute are shown in Table 5.

[0115] Table 5 Descriptive statistics of accident attributes

[0116]

[0117] In this embodiment, the average value of the grade case frequency of the decision variable (accident severity) is 1.22, the standard deviation is 0.589, the maximum value is 3, and the minimum value is 1. Through the above descriptive statistics, the grade distribution characteristics of traffic accide...
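The descriptive statistics named in this embodiment can be sketched as follows. The source does not state which skewness and kurtosis estimators were used, so this sketch assumes the bias-adjusted sample skewness and excess kurtosis, with their standard errors, as commonly reported by statistics packages. The function name `describe` and the toy grade values are hypothetical and are not the patent's data.

```python
import math
import statistics


def describe(values):
    """Return mean, sample std, min, max, skewness, kurtosis and standard errors."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    if sd == 0:
        raise ValueError("all values are identical; shape statistics undefined")
    z = [(v - mean) / sd for v in values]

    # Bias-adjusted sample skewness (assumed convention).
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    # Bias-adjusted sample excess kurtosis (assumed convention).
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    # Standard errors depend only on the sample size.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

    return {
        "mean": mean, "std": sd, "min": min(values), "max": max(values),
        "skewness": skew, "se_skewness": se_skew,
        "kurtosis": kurt, "se_kurtosis": se_kurt,
    }


# Toy severity grades (invented): mostly grade 1, a few of grades 2 and 3.
grades = [1, 1, 1, 1, 1, 1, 1, 2, 2, 3]
stats = describe(grades)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

A clearly positive skewness on the severity grade is what an unbalanced data set of this kind would be expected to show, since most cases fall in the lowest grade.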


Abstract

The invention discloses an unbalanced data processing method for traffic accident analysis. The method comprises the following steps: first, traffic accident data are obtained, combined, and cleaned to produce an original data table; the accident severity in the original data table is then graded and the influencing factors are discretized to obtain an optimized data table; next, within the optimized data table, positive and negative samples are divided according to the grade distribution of accident severity, and the positive samples are resampled to obtain a balanced data table; finally, the balanced data table and the optimized data table are input into a Bayesian classifier, and the final class of each accident case is determined by the maximum posterior probability. The method effectively addresses the bias toward majority-class samples that arises when reasoning and predictive classification are performed on the original unbalanced traffic accident data set, while keeping the recognition capability for minority-class samples at an acceptable level.
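The maximum posterior probability decision can be illustrated with a small sketch. The source says only "Bayesian classifier" and does not specify its structure, so this example substitutes a categorical naive Bayes model with Laplace smoothing. The class name `CategoricalNB`, the smoothing parameter `alpha`, and the toy records are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes over discrete features (a stand-in for the Bayesian classifier)."""

    def fit(self, X, y, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        # value_counts[(class, feature index)][value] -> number of occurrences
        self.value_counts = defaultdict(Counter)
        # levels[j] -> set of distinct values seen for feature j
        self.levels = [set() for _ in range(len(X[0]))]
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                self.value_counts[(label, j)][value] += 1
                self.levels[j].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalized log posterior log P(c) + sum_j log P(x_j | c) per class."""
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n)
            for j, value in enumerate(row):
                num = self.value_counts[(c, j)][value] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.levels[j])
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, row):
        """Return the class with the maximum posterior probability."""
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Toy balanced table (invented): features are (period, weather).
X = [("night", "rain"), ("night", "rain"), ("night", "clear"),
     ("day", "clear"), ("day", "clear"), ("day", "rain")]
y = ["fatal", "fatal", "fatal", "minor", "minor", "minor"]

model = CategoricalNB().fit(X, y)
print(model.predict(("night", "rain")))
print(model.predict(("day", "clear")))
```

Because the class prior P(c) is estimated from the training table, training on a balanced table rather than the original unbalanced one directly changes the prior term, which is one way resampling can shift predictions away from the majority class.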

Description

Technical field

[0001] The invention relates to the technical field of traffic safety analysis and prediction, in particular to an unbalanced data processing method for traffic accident analysis.

Background

[0002] Road traffic accidents occur frequently and have long been a focus of public concern. Although accidents occur at random, serious accidents involving severe injury or death have a far greater negative impact than ordinary property-damage accidents, and place a heavier burden on society and individuals. Analyzing the severity of road traffic accidents is therefore important for uncovering the objective patterns behind accidents and improving traffic safety.

[0003] However, traffic accident data sets are usually unbalanced: property-damage and minor-injury accidents account for most instances and are treated as negative samples, while fatal accidents account for few instances and are treated as positive samples. In...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/22, G06F16/242, G06F16/28, G06K9/62, G06N5/04
CPC: G06F16/215, G06F16/2282, G06F16/2433, G06F16/285, G06N5/04, G06F18/24155
Inventor: 李军, 王琪, 贾碧岑
Owner: SUN YAT SEN UNIV