Method for applying text mining to road traffic accident data processing
A technology in the fields of road traffic and text mining, applied in electronic digital data processing, text database querying, special data processing applications, etc., with the effect of accurate analysis and efficient processing.
Examples
Embodiment 1
[0026] In this embodiment, the data are first processed and the model is then constructed. The present invention uses Python as the main language: the open-source library jieba performs Chinese word segmentation on the samples, the word2vec model then vectorizes the data set in three dimensions, and finally a convolutional neural network (CNN) is used to build a TextCNN network that implements the model.
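The final step, TextCNN, can be illustrated with a minimal, dependency-free sketch of its core forward pass: filters of different widths slide over consecutive word vectors, each activation passes through ReLU, max-over-time pooling keeps one value per filter, and a dense layer with softmax scores the classes. This is a sketch under stated assumptions, not the patent's implementation: the function names, the toy embeddings, and all weights are hypothetical, and a real model would learn its weights during training, typically in a deep learning framework rather than pure Python.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one row per word position, one column per embedding dim


def conv1d_max_pool(embeddings: Matrix, kernel: Matrix, bias: float) -> float:
    """Slide one filter spanning len(kernel) consecutive words over the
    sentence, apply ReLU, and keep the largest activation (max-over-time pooling)."""
    width = len(kernel)
    if width > len(embeddings):
        raise ValueError("sentence is shorter than the filter width")
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Sum of the element-wise product between the window and the kernel.
        score = sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        ) + bias
        activations.append(max(0.0, score))  # ReLU
    return max(activations)


def textcnn_features(embeddings: Matrix, filters: List[Tuple[Matrix, float]]) -> List[float]:
    """One pooled value per filter; filters may have different widths."""
    return [conv1d_max_pool(embeddings, kernel, bias) for kernel, bias in filters]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """Fully connected layer followed by softmax: one probability per class."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy input: 4 words, 2-dimensional embeddings (hypothetical values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),               # width 2 (bigram filter)
    ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -3.0),  # width 3 (trigram filter)
]
features = textcnn_features(sentence, filters)
print(features)  # → [2.0, 1.0]
print(classify(features, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

Using several filter widths lets the network respond to word n-grams of different lengths (here bigrams and trigrams), and max pooling makes the length of the feature vector independent of the sentence length.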
[0027] 1.1 Chinese word segmentation
[0028] As shown in Figure 1, Chinese word segmentation is the process of recombining a sequence of Chinese characters into a sequence of words according to the specifications for extracting particular words. A key step is removing stop words: in information retrieval and natural language processing, characters and words that carry no real meaning in the text data and are unnecessary for the task are filtered out, which saves storage space and improves search efficiency.
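Stop-word removal reduces to filtering a token list against a stop-word set. The sketch below assumes the text has already been segmented into tokens; the stop-word set and the function name are hypothetical, since the list used in the patent is not given here.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word sample: common function words and punctuation.
STOP_WORDS: Set[str] = {"的", "了", "在", "是", "和", "，", "。"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop empty tokens and stop words, keeping the remaining order."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Segmented sample: vehicle / at / intersection / occur / (aspect particle) / rear-end / .
tokens = ["车辆", "在", "路口", "发生", "了", "追尾", "。"]
print(remove_stop_words(tokens))  # → ['车辆', '路口', '发生', '追尾']
```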
[0...
Embodiment 2
[0046] A method for applying text mining to road traffic accident data processing, which performs Chinese word segmentation on road traffic accident data samples, vectorizes the sample data set in three dimensions through a word embedding model, then uses a convolutional neural network (CNN) to build TextCNN, a network for large-scale text classification, as the model, and outputs key traffic information.
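The three-dimensional vectorization can be read as producing a tensor of shape (number of samples, sequence length, embedding dimension), which is the input shape a TextCNN expects. The sketch below assumes a precomputed word-to-vector table (in practice this would come from a trained word2vec model; the toy 3-dimensional vectors here are hypothetical), a fixed maximum sequence length, and zero vectors for padding and out-of-vocabulary words. These choices, and the function name `vectorize_corpus`, are illustrative assumptions rather than details taken from the patent.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def vectorize_corpus(
    docs: Sequence[Sequence[str]],
    embeddings: Dict[str, Vector],
    max_len: int,
    dim: int,
) -> List[List[Vector]]:
    """Return a nested list of shape (len(docs), max_len, dim).

    Long documents are truncated to max_len tokens; short ones are padded.
    Padding positions and out-of-vocabulary tokens map to zero vectors."""
    zero = [0.0] * dim
    tensor = []
    for tokens in docs:
        rows = [list(embeddings.get(tok, zero)) for tok in tokens[:max_len]]
        rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
        tensor.append(rows)
    return tensor


# Hypothetical 3-dimensional vectors standing in for trained word2vec output.
toy_vectors = {"车辆": [0.1, 0.2, 0.3], "追尾": [0.4, 0.5, 0.6]}
docs = [["车辆", "追尾"], ["追尾", "车辆", "路口", "事故"]]
tensor = vectorize_corpus(docs, toy_vectors, max_len=3, dim=3)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # → 2 3 3
```

Fixing the sequence length is what makes every sample the same size, so the whole data set can be stacked into one three-dimensional array for batch training.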
[0047] Furthermore, the Chinese word segmentation of the road traffic accident data samples includes: on the basis of the general-purpose corpus built into the open-source library jieba, importing a traffic safety corpus as a custom lexicon according to the characteristics of the scenario; performing word segmentation on the samples; and then removing stop words and deleting text that is irrelevant to the judgment, which strengthens the ability to correct ambiguities.
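The effect of importing a domain lexicon can be shown with a small stand-in segmenter. jieba itself builds a prefix-dictionary DAG and picks the most probable path (with an HMM for unseen words), and it loads custom dictionaries through `jieba.load_userdict`; to stay dependency-free, the sketch below swaps in simple forward maximum matching instead. The function name, the lexicons, and the example sentence ("车辆发生追尾事故", roughly "the vehicle had a rear-end collision accident") are hypothetical.

```python
from typing import List, Set


def fmm_segment(text: str, lexicon: Set[str], max_word_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon
    word that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicons: a general one and a small traffic-safety add-on.
general_lexicon = {"车辆", "发生", "事故"}  # vehicle, occur, accident
traffic_lexicon = {"追尾"}                  # rear-end collision

sentence = "车辆发生追尾事故"
print(fmm_segment(sentence, general_lexicon))
# → ['车辆', '发生', '追', '尾', '事故']
print(fmm_segment(sentence, general_lexicon | traffic_lexicon))
# → ['车辆', '发生', '追尾', '事故']
```

Without the traffic term, "追尾" (rear-end collision) is split into two single characters that no longer carry its meaning; adding it to the lexicon keeps it as one token, which is the kind of ambiguity correction described in [0047].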
[0048] Furthermore, the three-dimensional vectorization of the sample data set through the word embedding model further includes: accor...