Text classification method and system based on part-of-speech classification

Text classification technology, applied in semantic analysis, character and pattern recognition, instruments, etc., that addresses the high feature dimensionality, low classification accuracy, and poor generalization ability of existing classifiers, with the effect of reducing dimensionality and improving accuracy.

Active Publication Date: 2018-11-06
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] In view of the above defects or improvement needs of the prior art, the present invention provides a text classification method and system based on part-of-speech classification. It aims to solve the technical problems of existing methods: the high dimensionality of the feature words needed during model training, low classification accuracy, and poor generalization ability of the classifier.

Examples

Embodiment Construction

[0051] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not constitute a conflict with each other.

[0052] As shown in Figure 1, the text classification method based on part-of-speech classification of the present invention comprises:

[0053] 1. The construction process of the text classifier, which specifically includes the following steps:

[0054] (1) Obtain a training text set and a test text set from the network, and preprocess the training text set and the test text set, so as to ob...
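
The source text is truncated at this step and does not spell out how the word sets are formed. As a minimal sketch, assuming the word sets are obtained by grouping each text's tokens by part of speech, preprocessing could look like the following. The toy lexicon, the stopword list, and the function name `preprocess` are hypothetical stand-ins for a real tagger and are not taken from the patent.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_POS_LEXICON = {
    "market": "noun", "stocks": "noun", "team": "noun", "match": "noun",
    "rose": "verb", "won": "verb", "fell": "verb",
    "strong": "adj", "sharp": "adj",
}
# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and"}


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and group tokens by POS tag."""
    tokens = re.findall(r"[a-z]+", text.lower())
    word_sets = defaultdict(list)
    for tok in tokens:
        if tok in STOPWORDS:
            continue
        # Words missing from the toy lexicon fall into an "other" set.
        tag = TOY_POS_LEXICON.get(tok, "other")
        word_sets[tag].append(tok)
    return dict(word_sets)


sets = preprocess("The team won the match in a strong market")
print(sets)
```

Each text then yields several word sets, one per part of speech, which is one plausible reading of "a plurality of word sets of each text".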

Abstract

The invention discloses a text classification method based on part-of-speech classification. The method comprises: acquiring a training text set and a test text set from a network; preprocessing both sets to obtain a plurality of word sets for each text; training the text topic generation model LDA with the word sets of each text as input, to obtain a text-word set-topic mixed probability distribution model of each text under different topic numbers; using the SVM-train function to train classifiers on the plurality of text-word set-topic mixed probability distribution models, to obtain a plurality of trained classifiers; and using the text-word set-topic mixed probability distribution models as inputs of the trained classifiers to carry out SVM class prediction. The method addresses the technical problems of existing methods: the high dimensionality of the feature words needed during model training, low classification accuracy, and poor generalization capability of the classifiers.
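
The pipeline in the abstract (LDA topic distributions under several topic numbers, one SVM per topic number, then class prediction) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: scikit-learn's `LatentDirichletAllocation` and `SVC` are swapped in for the LDA model and the SVM-train function, the word sets are collapsed into a single bag of words per text, and the toy texts, labels, and topic numbers (2 and 3) are hypothetical.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Hypothetical toy corpus; the patent obtains its texts from the network.
train_texts = [
    "stocks market rose bank profit",
    "bank market fell stocks loss",
    "team won match goal league",
    "league match team lost goal",
]
train_labels = ["finance", "finance", "sports", "sports"]
test_texts = ["market stocks profit", "team goal match"]


def topic_features(n_topics):
    """Fit LDA with n_topics topics; return per-text topic distributions."""
    vectorizer = CountVectorizer()
    counts_train = vectorizer.fit_transform(train_texts)
    counts_test = vectorizer.transform(test_texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta_train = lda.fit_transform(counts_train)  # shape: (n_texts, n_topics)
    theta_test = lda.transform(counts_test)
    return theta_train, theta_test


classifiers = {}
predictions = {}
for n_topics in (2, 3):  # assumed topic numbers, for illustration only
    theta_train, theta_test = topic_features(n_topics)
    # One SVM per topic number, trained on the low-dimensional topic vectors.
    clf = SVC(kernel="linear").fit(theta_train, train_labels)
    classifiers[n_topics] = clf
    predictions[n_topics] = clf.predict(theta_test)

for n_topics, labels in predictions.items():
    print(n_topics, list(labels))
```

The classifier sees only a vector of length `n_topics` per text rather than one feature per vocabulary word, which is where the dimensionality reduction claimed in the abstract comes from.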

Description

Technical field

[0001] The invention belongs to the technical field of computer deep learning, and more specifically relates to a text classification method and system based on part-of-speech classification.

Background art

[0002] With the wide application of various social software and self-media software, the data generated by Internet platforms every day is increasing rapidly. These data mainly include pictures, voice, and text, among which text is the main one. Classifying these massive data by manual screening and extraction consumes a lot of time and energy, and the classification results are not satisfactory. Text classification technology came into being precisely to improve the efficiency and accuracy of text classification.

[0003] The existing text classification methods mainly use the text topic generation model (Latent Dirichlet Allocation, LDA for short), which can be used to ide...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06K9/62
CPC: G06F40/289, G06F40/30, G06F18/2411
Inventor: 周可, 李兴, 曾江峰
Owner: HUAZHONG UNIV OF SCI & TECH