
A Text Classification Algorithm Based on Mixed Multinomial Distribution

A text classification technology based on multinomial distributions, applied in the field of text classification algorithms based on mixed multinomial distributions. It addresses a known weakness of low-complexity Naive Bayes classifiers: their conditional independence assumption is difficult to satisfy and highly restrictive. The aim is to reduce this difficulty and make the model's predictions more accurate.

Active Publication Date: 2021-05-14
GUANGDONG KINGPOINT DATA SCI & TECH CO LTD
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by the technology

[0003] The traditional text classification algorithm based on Naive Bayes assumes that, given a text category, the attributes of each text feature vector are independent and identically distributed. Because this assumption is simple, the Naive Bayes text classification algorithm has low computational complexity, and in some cases it still achieves good classification results. In practice, however, this conditional independence assumption rarely holds and is highly restrictive, so it needs to be relaxed to some extent.
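To make the conditional independence assumption concrete, here is a minimal multinomial Naive Bayes sketch. It is a standard textbook formulation, not code from the patent. Each text is a list of tokens. The function names, the Laplace smoothing parameter `alpha`, and the toy data are illustrative assumptions. Because word occurrences are treated as independent given the class, a single word distribution per class is enough, and scoring a text reduces to summing per-word log probabilities.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors P(C_j) and per-class word probabilities P(w | C_j).

    One multinomial per class: every word occurrence is assumed
    conditionally independent of the others given the class.
    """
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    cond = {}
    for c in class_counts:
        # Laplace (add-alpha) smoothing so unseen words keep nonzero probability.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        cond[c] = {w: (word_counts[c][w] + alpha) / total for w in vocab}
    return priors, cond

def predict_nb(doc, priors, cond):
    """Return argmax_j log P(C_j) + sum_w log P(w | C_j); out-of-vocabulary words are skipped."""
    def score(c):
        return math.log(priors[c]) + sum(
            math.log(cond[c][w]) for w in doc if w in cond[c])
    return max(priors, key=score)

# Hypothetical toy data.
docs = [["cheap", "pills", "buy"], ["buy", "cheap", "cheap"],
        ["meeting", "agenda"], ["agenda", "notes", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]
priors, cond = train_multinomial_nb(docs, labels)
print(predict_nb(["cheap", "buy"], priors, cond))  # → spam
```

The product of per-word probabilities is exactly where the independence assumption enters. The mixed multinomial approach described in this document is meant to relax it.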



Examples


Embodiment Construction

[0048] The above and other technical features and advantages of the present invention will be described in more detail below in conjunction with the accompanying drawings.

[0049] See Figure 1, which is a functional block diagram of the text classification algorithm based on a mixed multinomial distribution according to the present invention.

[0050] As shown in Figure 1, the text classification algorithm based on a mixed multinomial distribution includes the following steps:

[0051] S1: Input the training set, whose text category set is $C=\{C_1, C_2, \dots, C_S\}$ and whose text attribute feature set is $x=\{x_1, x_2, \dots, x_d\}$;

[0052] S2: Calculate and save the probability distribution of every text category $C_j$, $j = 1, 2, \dots, S$;

[0053] S3: Initialize the probability parameters $\theta$ and the weights $\pi_k$ of the mixed multinomial distribution, as well as the number of components $K$;

[0054] S4: Using the current parameter values $\theta$ and $\pi_k$, calculate the expectation of the log-likelihood function ...
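Steps S3 and S4 can be sketched as follows. The sketch assumes, beyond what the excerpt states, that each text is represented as a vector of counts over the $d$ attribute features, and that each of the $K$ components is a multinomial with feature-probability vector $\theta_k$ and weight $\pi_k$. The function `e_step` and the data layout are hypothetical. Given the current $\theta$ and $\pi_k$, it computes the posterior of the hidden component variable and the expected complete-data log-likelihood, which is the quantity S4 refers to.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def e_step(X, theta, pi):
    """E-step for a mixture of K multinomials (hypothetical helper).

    X:     list of count vectors, one per text, each of length d
    theta: K lists of feature probabilities, each summing to 1
    pi:    K mixing weights summing to 1
    Returns gamma[i][k] = p(z_i = k | x_i) and the expected complete-data
    log-likelihood Q. Multinomial coefficients are omitted because they
    do not depend on theta or pi.
    """
    gamma, Q = [], 0.0
    for x in X:
        # log pi_k + sum_w x_w log theta_kw for each component k
        logp = [math.log(pi[k]) + sum(c * math.log(theta[k][w])
                                      for w, c in enumerate(x) if c)
                for k in range(len(pi))]
        norm = log_sum_exp(logp)
        g = [math.exp(lp - norm) for lp in logp]  # posterior over components
        gamma.append(g)
        Q += sum(gk * lp for gk, lp in zip(g, logp))
    return gamma, Q

# Hypothetical parameters: K = 2 components over d = 3 features.
X = [[3, 0, 1], [0, 2, 2]]
theta = [[0.7, 0.1, 0.2], [0.1, 0.5, 0.4]]
pi = [0.5, 0.5]
gamma, Q = e_step(X, theta, pi)
print(gamma, Q)
```

Working in log space with `log_sum_exp` avoids underflow when texts are long and the raw multinomial likelihoods become very small.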



Abstract

The present invention provides a text classification algorithm based on a mixed multinomial distribution, comprising the following steps. S1: input the training set texts. S2: calculate and save the probability distribution of all text categories C. S3: initialize the parameter values $\theta$ and $\pi_k$ of the mixed multinomial distribution and the number of components $K$. S4: using the current parameter values $\theta$ and $\pi_k$, calculate the expectation of the complete-data log-likelihood function with respect to the posterior probability distribution of the hidden variables. S5: use the EM algorithm to train the parameter values $\theta$ and $\pi_k$ of the mixed multinomial distribution. S6: for different numbers of components $K$, plot the model's prediction error on the test set and on the training set, and select the value of $K$ with the smallest prediction error. S7: output the result. The beneficial effect of the present invention is that it combines the Naive Bayes algorithm with a mixed multinomial distribution and uses the EM algorithm to estimate the parameters of the mixture model, thereby improving the model's classification accuracy.
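Steps S2 to S6 can be sketched end to end as below. The excerpt does not say exactly how the mixture is attached to the Naive Bayes classifier, so this sketch makes two assumptions. First, it fits one $K$-component multinomial mixture per text category and combines it with the category probability $P(C_j)$ from S2 through Bayes' rule. Second, it takes "smallest prediction error" in S6 to mean the lowest test-set error. Instead of drawing error graphs, `select_k` returns the training and test error for each candidate $K$. All function names, the smoothing constants, the random initialization, and the toy data are hypothetical.

```python
import math
import random

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def loglik(x, theta_k):
    """sum_w x_w log theta_kw. The multinomial coefficient is omitted: for a
    given text it is identical under every class and component, so it cancels."""
    return sum(c * math.log(t) for c, t in zip(x, theta_k) if c)

def fit_mixture(X, K, d, iters=50, alpha=0.01, seed=0):
    """S5: EM for a mixture of K multinomials over d count features."""
    rng = random.Random(seed)
    gamma = []  # random soft assignments break symmetry between components
    for _ in X:
        r = [rng.random() + 1e-3 for _ in range(K)]
        s = sum(r)
        gamma.append([v / s for v in r])
    n = len(X)
    for _ in range(iters):
        # M-step: pi_k = mean responsibility (lightly smoothed so log stays
        # finite); theta_k is proportional to weighted counts plus alpha.
        pi = [(sum(g[k] for g in gamma) + 1e-6) / (n + K * 1e-6) for k in range(K)]
        theta = []
        for k in range(K):
            counts = [alpha + sum(g[k] * x[w] for g, x in zip(gamma, X)) for w in range(d)]
            total = sum(counts)
            theta.append([c / total for c in counts])
        # E-step: posterior of the hidden component under the new parameters.
        gamma = []
        for x in X:
            logp = [math.log(pi[k]) + loglik(x, theta[k]) for k in range(K)]
            norm = log_sum_exp(logp)
            gamma.append([math.exp(lp - norm) for lp in logp])
    return pi, theta

def fit_classifier(X, y, K, d):
    """One mixture per category, stored with the category prior P(C_j) (S2)."""
    model = {}
    for c in sorted(set(y)):
        Xc = [x for x, label in zip(X, y) if label == c]
        model[c] = (len(Xc) / len(y), *fit_mixture(Xc, K, d))
    return model

def predict(model, x):
    """Bayes rule: argmax_j log P(C_j) + log sum_k pi_jk * Mult(x | theta_jk)."""
    def score(c):
        prior, pi, theta = model[c]
        return math.log(prior) + log_sum_exp(
            [math.log(p) + loglik(x, t) for p, t in zip(pi, theta)])
    return max(model, key=score)

def error_rate(model, X, y):
    return sum(predict(model, x) != label for x, label in zip(X, y)) / len(y)

def select_k(train, test, d, candidates=(1, 2, 3)):
    """S6: record (train error, test error) per K; keep the lowest test error."""
    errors = {}
    for K in candidates:
        model = fit_classifier(train[0], train[1], K, d)
        errors[K] = (error_rate(model, *train), error_rate(model, *test))
    best = min(errors, key=lambda k: errors[k][1])
    return best, errors

# Hypothetical toy count vectors over d = 4 features.
train = ([[3, 1, 0, 0], [1, 3, 0, 0], [4, 0, 0, 0],
          [0, 0, 3, 1], [0, 0, 1, 3], [0, 0, 0, 4]],
         ["a", "a", "a", "b", "b", "b"])
test = ([[2, 2, 0, 0], [0, 0, 2, 2]], ["a", "b"])
best_k, errors = select_k(train, test, d=4)
print(best_k, errors)
```

Under these assumptions, a per-category mixture lets each category's feature distribution have several modes, such as sub-topics. This loosens the single-multinomial assumption of plain Naive Bayes while keeping per-feature scoring cheap.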

Description

Technical field

[0001] The invention relates to text classification algorithms, and in particular to a text classification algorithm based on a mixed multinomial distribution.

Background art

[0002] With the emergence of large amounts of online text and the rise of machine learning, large-scale text classification and retrieval have attracted researchers' interest. Numerous results show that methods based on statistical learning achieve high text classification accuracy and can be applied to any field of learning, which has made them the mainstream approach to text classification.

[0003] The traditional text classification algorithm based on Naive Bayes assumes that, given a text category, the attributes of each text feature vector are independent and identically distributed. Because this assumption is simple, the Naive Bayes text classification algorithm has low computational complexity, and in some cases it still achieves good classification results. However, ...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/35; G06K9/62
CPC: G06F16/35; G06F18/24155
Inventors: 许飞月, 陶波, 陈乐焱
Owner: GUANGDONG KINGPOINT DATA SCI & TECH CO LTD