Case text classification method, system and storage medium based on naive bayes

A text classification technique built on the Bayesian algorithm, applied to Naive Bayes-based case text classification methods, systems and storage media. It addresses the large differences in the number of training-set texts per category (uneven category distribution) and achieves a better classification effect.

Inactive Publication Date: 2019-02-01

东莞数汇大数据有限公司

Cites: 2; Cited by: 21


Problems solved by technology

Public security case texts have an unbalanced category distribution: the number of texts in each category of the training set differs widely. The present invention therefore proposes an improved Naive Bayes method for classifying case texts.

Embodiment Construction

[0050] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0051] Referring to Figure 1, a case text classification method based on Naive Bayes includes the following steps:

[0052] S101. Construct a classifier based on the naive Bayesian algorithm.

[0053] S102. Obtain training samples to train the classifier, and calculate the prior probability of each category and the prior probability of each feature word belonging to each category. The training samples can be processed training samples or unprocessed training samples. If unprocessed training samples are used, the training samples need to be preprocessed in step S104.

[0054] S103. Obtain the text to be classified. The text to be classified is the original text and has not been processed, so it needs to be processed in step S104.

[0055] S104. Perform preprocessing on the text to be classified to obtain a text vector of the text to be classi...
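
The quantities estimated in step S102 can be illustrated with the textbook multinomial Naive Bayes estimates. This is only a sketch: the Laplace (add-one) smoothing and the symbols N (number of training texts), N_c (number of training texts in category c), count(w, c) (occurrences of feature word w in category c) and V (the vocabulary) are assumptions introduced for illustration. The improved formulas of the invention are not reproduced in the text available here.

```latex
\hat{P}(c) = \frac{N_c}{N},
\qquad
\hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\sum_{w' \in V} \mathrm{count}(w', c) + |V|}
```

With unbalanced categories, N_c varies widely between categories, so the category term \hat{P}(c) can differ by a large factor from one category to another.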

Abstract

The invention discloses a Naive Bayes-based case text classification method, system and storage medium. The method comprises the following steps: constructing a classifier based on the Naive Bayes algorithm; obtaining training samples to train the classifier; obtaining the text to be classified; preprocessing the text to be classified to obtain its text vector; inputting the text vector into the classifier and calculating the posterior probability that the text belongs to each category according to the prior probability of each category and the prior probability of each feature word belonging to each category; and outputting the category with the highest posterior probability as the classification result. The invention fully considers the differences in sample quantity between categories and uses the prior probability of each category and the prior probability of each feature word belonging to each category as calculation factors of the classifier, so that it achieves a better classification effect on case texts. The invention can be widely applied in the field of data mining.
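
The pipeline summarized in the abstract (train, preprocess, score each category, output the highest posterior) can be sketched roughly as follows. This is a minimal illustration of standard multinomial Naive Bayes with add-one smoothing, not the improved method claimed by the patent. The class `NaiveBayesTextClassifier`, the `preprocess` helper (a plain lowercase whitespace split) and the toy English training texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def preprocess(text):
    # Hypothetical stand-in for the preprocessing step:
    # lowercase the text and split it on whitespace.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Textbook multinomial Naive Bayes (illustrative sketch only)."""

    def fit(self, docs, labels):
        # Count texts per category and word occurrences per category.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_posteriors(self, tokens):
        # Unnormalized log posterior per category:
        # log P(c) + sum over words of log P(w | c), with add-one smoothing.
        vocab_size = len(self.vocab)
        scores = {}
        for category, n_c in self.class_counts.items():
            score = math.log(n_c / self.total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in tokens:
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[category] = score
        return scores

    def predict(self, tokens):
        # Output the category with the highest posterior.
        scores = self.log_posteriors(tokens)
        return max(scores, key=scores.get)


# Toy, deliberately unbalanced training set (made up for illustration).
train = [
    ("wallet stolen on bus", "theft"),
    ("phone stolen at market", "theft"),
    ("bicycle stolen from yard", "theft"),
    ("fake bank call asked for transfer", "fraud"),
]
docs = [preprocess(text) for text, _ in train]
labels = [label for _, label in train]
classifier = NaiveBayesTextClassifier().fit(docs, labels)

theft_prediction = classifier.predict(preprocess("phone stolen on train"))
fraud_prediction = classifier.predict(preprocess("bank transfer call"))
```

Scores are accumulated in log space so that the product of many small word probabilities does not underflow.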

Description

Technical field

[0001] The invention relates to the field of data mining, in particular to a Naive Bayes-based case text classification method, system and storage medium.

Background

[0002] Text classification is a supervised classification task. A text dataset with labeled categories is used to train a classifier, and the trained classifier is then used to classify texts whose categories are unlabeled. Commonly used classification algorithms include the Naive Bayes method, the K-nearest neighbor method and the support vector machine method. Among them, Naive Bayes is widely recognized as a simple and effective classification method, and it performs satisfactorily in the field of text classification. However, public security case texts have an unbalanced category distribution, that is, a large difference in the number of texts contained in each category of the training ...

Application Information


IPC(8): G06F16/35, G06K9/62

CPC: G06F18/24155

Inventor: 屈丽平, 朱凌峰, 胡裕丰

Owner: 东莞数汇大数据有限公司
