Text classification algorithm based on hybrid multinomial distribution

A text classification technique based on a mixed multinomial distribution. It addresses the problem that the conditional independence assumption of naive Bayes classifiers rarely holds in practice and is highly limiting, with the aim of more accurate model prediction.

Active Publication Date: 2018-07-10
GUANGDONG KINGPOINT DATA SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] The traditional naive Bayes text classification algorithm assumes that, given a text category, the attributes of each text feature vector are independent and identically distributed. Because this assumption is simple, the algorithm has low computational complexity, and in some cases it still achieves good classification results. In practical tasks, however, the conditional independence assumption rarely holds, which limits the algorithm considerably. The attribute conditional independence assumption therefore needs to be relaxed to some extent.
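
For reference, the naive Bayes decision rule discussed in this paragraph can be written as below. This is the standard textbook form, not a formula quoted from the patent; the hat notation and the use of $P$ and $p$ are assumptions made here, while $C_j$, $x_i$, $S$ and $d$ follow the definitions in steps S1 and S2.

```latex
\hat{C} \;=\; \arg\max_{j \in \{1,\dots,S\}} \; P(C_j) \prod_{i=1}^{d} p(x_i \mid C_j)
```

The product over attributes is where the independence assumption enters. Relaxing the assumption amounts to replacing that product with a richer class-conditional model $p(x \mid C_j)$.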

Embodiment Construction

[0048] The above and other technical features and advantages of the present invention will be described in more detail below in conjunction with the accompanying drawings.

[0049] See Figure 1, which is a functional block diagram of the text classification algorithm based on mixed multinomial distribution in the present invention.

[0050] As shown in Figure 1, the text classification algorithm based on mixed multinomial distribution includes the following steps:

[0051] S1: Input the training set, where the set of text categories is $C = \{C_1, C_2, \ldots, C_S\}$ and the set of text attribute features is $x = \{x_1, x_2, \ldots, x_d\}$;

[0052] S2: Calculate and save the probability distribution of each text category $C_j$, $j = 1, 2, \ldots, S$;

[0053] S3: Initialize the probability parameters $\theta$ and weights $\pi_k$ of the mixed multinomial distribution, and the number of components $K$;

[0054] S4: Use the current parameter values $\theta$, $\pi_k$ to calculate the expectation of the log-likelihood function of ...
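
Steps S1 to S3 could be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the toy data, the function names `class_priors` and `init_mixture`, the use of word-count vectors as features, and the random initialisation scheme are all assumptions made for the example.

```python
import random
from collections import Counter


def class_priors(labels):
    """S2: estimate P(C_j) as the relative frequency of each category."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def init_mixture(num_components, num_features, seed=0):
    """S3: initial weights pi_k (uniform) and probability vectors theta_k (random).

    The initialisation scheme is an assumption; the source does not specify one.
    """
    rng = random.Random(seed)
    pi = [1.0 / num_components] * num_components
    theta = []
    for _ in range(num_components):
        raw = [rng.random() + 1e-3 for _ in range(num_features)]
        total = sum(raw)
        theta.append([v / total for v in raw])  # each row sums to 1
    return pi, theta


# S1: a hypothetical training set of word-count vectors over d = 3 features
X = [[3, 0, 1], [2, 1, 0], [0, 4, 1], [0, 3, 2]]
y = ["sport", "sport", "tech", "tech"]

priors = class_priors(y)
pi, theta = init_mixture(num_components=2, num_features=len(X[0]))
print(priors)
print(pi)
```

Keeping each row of `theta` strictly positive avoids taking the logarithm of zero once the log-likelihood is evaluated in later steps.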

Abstract

The invention provides a text classification algorithm based on hybrid multinomial distribution. The algorithm comprises the following steps: S1, inputting a training set of texts; S2, calculating and storing the probability distribution of all text categories C; S3, initializing the parameter values $\theta$ and $\pi_k$ and the number of components K of the hybrid multinomial distribution; S4, using the current parameter values $\theta$ and $\pi_k$ to calculate the expectation of the log-likelihood function of the complete data with respect to the posterior probability distribution of the hidden variable; S5, using the EM algorithm to train the parameter values $\theta$ and $\pi_k$ of the hybrid multinomial distribution; S6, for different numbers of components K, plotting the model's prediction-error curves on the test set and the training set, and selecting the K value with the minimum prediction error; S7, outputting the result. The beneficial effect of the algorithm is that naive Bayes is combined with the hybrid multinomial distribution and the EM algorithm is used to estimate the parameters of the hybrid model, which improves the classification precision of the model.
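
The steps in the abstract can be illustrated with a small, self-contained sketch. It rests on several assumptions that the source does not confirm: each category's class-conditional distribution is modelled as its own K-component multinomial mixture, features are word counts, the M-step uses additive smoothing `alpha`, and K is chosen by the lowest test-set error without plotting. All function names and the toy data are hypothetical.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def component_log_joint(x, pi, theta):
    """log(pi_k) + sum_i x_i * log(theta_k[i]) for each component k.

    The multinomial coefficient is omitted because it does not depend on k.
    """
    return [
        math.log(pi[k]) + sum(c * math.log(theta[k][i]) for i, c in enumerate(x) if c)
        for k in range(len(pi))
    ]


def fit_mixture(X, K, iters=50, alpha=0.01, seed=0):
    """S3 to S5: fit a K-component multinomial mixture to count vectors with EM."""
    rng = random.Random(seed)
    d = len(X[0])
    pi = [1.0 / K] * K
    theta = []
    for _ in range(K):
        raw = [rng.random() + 0.1 for _ in range(d)]
        theta.append([v / sum(raw) for v in raw])
    for _ in range(iters):
        # E-step (S4): posterior of the hidden component for every document
        resp = []
        for x in X:
            lj = component_log_joint(x, pi, theta)
            norm = log_sum_exp(lj)
            resp.append([math.exp(v - norm) for v in lj])
        # M-step (S5): re-estimate pi_k and theta_k; alpha is smoothing (assumption)
        for k in range(K):
            nk = sum(r[k] for r in resp)
            pi[k] = (nk + alpha) / (len(X) + K * alpha)
            counts = [alpha + sum(r[k] * x[i] for r, x in zip(resp, X)) for i in range(d)]
            total = sum(counts)
            theta[k] = [c / total for c in counts]
    return pi, theta


def train(X, y, K):
    """Fit one mixture per category and store it with the category prior (S2)."""
    model = {}
    for c in sorted(set(y)):
        Xc = [x for x, label in zip(X, y) if label == c]
        pi, theta = fit_mixture(Xc, K)
        model[c] = (len(Xc) / len(X), pi, theta)
    return model


def predict(model, x):
    """Pick the category maximising log P(C_j) + log p(x | C_j)."""
    def score(c):
        prior, pi, theta = model[c]
        return math.log(prior) + log_sum_exp(component_log_joint(x, pi, theta))
    return max(model, key=score)


def error_rate(model, X, y):
    return sum(predict(model, x) != t for x, t in zip(X, y)) / len(X)


def select_K(train_set, test_set, candidates=(1, 2, 3)):
    """S6: record (training error, test error) per K; keep the K with lowest test error."""
    errors = {}
    for K in candidates:
        model = train(*train_set, K)
        errors[K] = (error_rate(model, *train_set), error_rate(model, *test_set))
    best = min(candidates, key=lambda K: errors[K][1])
    return best, errors


# Hypothetical toy data: each category has two distinct "topics"
X_train = [[5, 0, 0, 1], [4, 1, 0, 0], [0, 0, 5, 1], [0, 1, 4, 0],
           [0, 5, 0, 1], [1, 4, 0, 0], [0, 0, 1, 5], [0, 1, 0, 4]]
y_train = ["a", "a", "a", "a", "b", "b", "b", "b"]
X_test = [[3, 0, 0, 0], [0, 0, 3, 1], [0, 3, 0, 0], [1, 0, 0, 4]]
y_test = ["a", "a", "b", "b"]

best_K, errors = select_K((X_train, y_train), (X_test, y_test))
print(best_K, errors)  # S7: output the result
```

With K = 1 the sketch reduces to an ordinary multinomial naive Bayes classifier with smoothing, which makes it a convenient baseline when comparing error rates across K.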

Description

Technical field

[0001] The invention relates to a text classification algorithm, in particular to a text classification algorithm based on mixed multinomial distribution.

Background technique

[0002] With the emergence of a large number of online texts and the rise of machine learning, large-scale text classification and retrieval have aroused the interest of researchers. A large number of results show that the method based on statistical learning has high text classification accuracy and can be applied to any field of learning, making it the mainstream method of text classification.

[0003] The traditional text classification algorithm based on Naive Bayes assumes that under a given text category, each text feature vector attribute is independent and identically distributed. Due to the simple assumption, the text classification algorithm based on Naive Bayes has a small computational complexity. In some cases, better classification results can also be achieved. However, ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/35, G06F18/24155
Inventor: 许飞月, 陶波, 陈乐焱
Owner: GUANGDONG KINGPOINT DATA SCI & TECH CO LTD