Case text classification method, system and storage medium based on naive Bayes

A technology in the fields of text classification and Bayesian algorithms, applied to a naive Bayes-based case text classification method, system and storage medium. It addresses the problem that the number of training texts differs greatly between categories (uneven category distribution) and achieves a good classification effect.
CN109299255A · Inactive · Publication Date: 2019-02-01 · 东莞数汇大数据有限公司

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
东莞数汇大数据有限公司
Publication Date
2019-02-01
Estimated Expiration
Not applicable · inactive patent


Abstract

The invention discloses a naive Bayes-based case text classification method, system and storage medium. The method comprises the following steps: constructing a classifier based on the naive Bayes algorithm; obtaining training samples to train the classifier; obtaining the text to be classified; preprocessing the text to be classified to obtain its text vector; inputting the text vector into the classifier and calculating the posterior probability that the text belongs to each category from the prior probability of each category and the prior probability of each feature word belonging to each category; and outputting the category with the highest posterior probability as the classification result. The invention takes into account the difference in sample counts between categories by using the prior probability of each category and the prior probability of each feature word belonging to each category as the calculation factors of the classifier, which yields a better classification effect on case texts. The invention can be widely applied in the field of data mining.
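The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the class name NaiveBayesSketch and the toy case texts are all assumptions made for illustration. Real case texts in Chinese would need word segmentation during preprocessing, and the patent excerpt does not say how the per-category word probabilities are estimated.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the
    # preprocessing step that turns a text into feature words.
    return text.lower().split()


class NaiveBayesSketch:
    """Hypothetical multinomial naive Bayes classifier (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_posteriors(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, n_docs in self.class_counts.items():
            # Category prior: share of training texts in this category.
            score = math.log(n_docs / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for t in tokens:
                # Per-category word probability with add-one smoothing
                # (an assumption; the source does not specify smoothing).
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_posteriors(text)
        # Output the category with the highest (log) posterior.
        return max(scores, key=scores.get)


# Toy, deliberately imbalanced training set (made-up examples).
train_texts = [
    "wallet stolen on bus",
    "phone stolen at station",
    "bicycle stolen from yard",
    "online loan fraud transfer",
]
train_labels = ["theft", "theft", "theft", "fraud"]

model = NaiveBayesSketch().fit(train_texts, train_labels)
print(model.predict("phone stolen on bus"))
print(model.predict("loan fraud transfer"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied. The category prior term is what lets the score reflect unequal category sizes in the training set.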

Description

Technical field

[0001] The invention relates to the field of data mining, in particular to a naive Bayes-based case text classification method, system and storage medium.

Background art

[0002] Text classification is a supervised classification method: a text dataset with labeled categories is used to train a classifier, and the trained classifier is then used to classify texts whose categories are unlabeled. Commonly used classification algorithms include the naive Bayes method, the K-nearest neighbor method and the support vector machine method. Among them, naive Bayes classification is widely recognized as a simple and effective method and shows satisfactory performance in text classification. However, public security case texts have an unbalanced category distribution, that is, the number of texts contained in each category of the training ...

Claims
