Case text classification method, system and storage medium based on Naive Bayes

A text classification technique based on the Naive Bayes algorithm, applied to case text classification methods, systems and storage media. It addresses the large difference in the number of training texts between categories (uneven category distribution) and achieves a better classification effect.

Inactive Publication Date: 2019-02-01
东莞数汇大数据有限公司

AI Technical Summary

Problems solved by technology

However, public security case texts have an unbalanced category distribution, that is, a large difference in the number of texts contained in each category of the training ...




Embodiment Construction

[0050] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0051] Referring to Figure 1, a case text classification method based on Naive Bayes includes the following steps:

[0052] S101. Construct a classifier based on the naive Bayesian algorithm.

[0053] S102. Obtain training samples to train the classifier, and calculate the prior probability of each category and the prior probability of each feature word belonging to each category. The training samples can be processed training samples or unprocessed training samples. If unprocessed training samples are used, the training samples need to be preprocessed in step S104.
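As a minimal sketch of the training in step S102, the following Python estimates the prior probability of each category and the probability of each feature word within each category from labeled, already tokenized samples. The function name `train_naive_bayes`, the Laplace smoothing parameter `alpha`, and the toy samples are assumptions made for illustration; the patent text shown here does not specify them.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, alpha=1.0):
    """Estimate category priors and per-category feature-word probabilities.

    samples: list of (tokens, category) pairs, tokens being a list of words.
    alpha:   Laplace smoothing constant (an assumption, not from the source).
    """
    class_counts = Counter()            # number of texts per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocab = set()
    for tokens, category in samples:
        class_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)

    total_texts = sum(class_counts.values())
    priors = {c: n / total_texts for c, n in class_counts.items()}

    cond_probs = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        cond_probs[c] = {w: (word_counts[c][w] + alpha) / denom for w in vocab}
    return priors, cond_probs, vocab

# Toy, deliberately unbalanced training set (hypothetical data).
samples = [
    (["stolen", "wallet"], "theft"),
    (["stolen", "phone"], "theft"),
    (["stolen", "bicycle"], "theft"),
    (["fake", "transfer"], "fraud"),
]
priors, cond_probs, vocab = train_naive_bayes(samples)
print(priors)
```

With three "theft" texts and one "fraud" text, the estimated priors reflect the imbalance between categories, which is the quantity the method feeds into the classifier.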

[0054] S103. Obtain the text to be classified. The text to be classified is the original text and has not been processed, so it needs to be processed in step S104.

[0055] S104. Perform preprocessing on the text to be classified to obtain a text vector of the text to be classi...
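The paragraph describing step S104 is cut off in this copy, so the exact preprocessing is not known. As a rough illustration only, the sketch below turns a raw text into a count vector over a fixed list of feature words. Whitespace tokenization, the `STOP_WORDS` set, and the `feature_words` list are all hypothetical; real Chinese case texts would need a word segmenter in place of `split()`.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "was", "in"}

def preprocess(text, feature_words):
    """Tokenize, drop stop words, and count occurrences of each feature word."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    counts = Counter(tokens)
    # One vector component per feature word, in the order given.
    return [counts[w] for w in feature_words]

feature_words = ["stolen", "wallet", "transfer"]
vector = preprocess("The wallet was stolen in the market", feature_words)
print(vector)  # [1, 1, 0]
```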


Abstract

The invention discloses a case text classification method based on Naive Bayes, a system and a storage medium. The method comprises the following steps: constructing a classifier based on the Naive Bayes algorithm; obtaining training samples to train the classifier; obtaining the text to be classified; preprocessing the text to be classified to obtain its text vector; inputting the text vector into the classifier, and calculating the posterior probability of the text belonging to each category according to the prior probability of each category and the prior probability of each feature word belonging to each category; and outputting the category with the highest posterior probability as the classification result. The invention fully considers the difference in sample quantity between categories, and takes the prior probability of each category and the prior probability of each feature word belonging to each category as calculation factors of the classifier, so that it achieves a better classification effect on case texts. The invention can be widely applied in the field of data mining.
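The posterior calculation and the final selection of the highest-scoring category can be sketched as follows. This is an illustrative outline under stated assumptions, not the patent's implementation: the probability tables are hard-coded toy values, scores are accumulated in log space to avoid underflow, and words missing from the tables are skipped.

```python
import math

def classify(tokens, priors, cond_probs):
    """Return the category with the highest (unnormalized) log posterior."""
    scores = {}
    for category, prior in priors.items():
        score = math.log(prior)  # contribution of the category prior
        for t in tokens:
            p = cond_probs[category].get(t)
            if p is not None:    # assumption: ignore unseen words
                score += math.log(p)
        scores[category] = score
    best = max(scores, key=scores.get)
    return best, scores

# Toy probability tables (hypothetical values).
priors = {"theft": 0.75, "fraud": 0.25}
cond_probs = {
    "theft": {"stolen": 0.5, "wallet": 0.3, "transfer": 0.2},
    "fraud": {"stolen": 0.1, "wallet": 0.1, "transfer": 0.8},
}

label, scores = classify(["transfer", "transfer"], priors, cond_probs)
print(label)  # fraud
```

Here the smaller prior of "fraud" is outweighed by the word evidence: 0.25 × 0.8 × 0.8 = 0.16 against 0.75 × 0.2 × 0.2 = 0.03 for "theft".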

Description

Technical field

[0001] The invention relates to the field of data mining, in particular to a Naive Bayes-based case text classification method, system and storage medium.

Background

[0002] Text classification is a supervised classification method. It uses a text dataset with labeled categories to train a classifier, and then uses the trained classifier to classify texts whose categories are unlabeled. Commonly used classification algorithms include the Naive Bayes method, the K-nearest neighbor method, the support vector machine method, etc. Among them, Naive Bayes classification is currently recognized as a simple and effective method, and it shows satisfactory performance in text classification. However, public security case texts have an unbalanced category distribution, that is, a large difference in the number of texts contained in each category of the training ...


Application Information

IPC(8): G06F16/35, G06K9/62
CPC: G06F18/24155
Inventor: 屈丽平, 朱凌峰, 胡裕丰
Owner: 东莞数汇大数据有限公司