Method, system and storage medium for classifying text set

Text-collection classification technology, applied in special-purpose data processing applications, instruments, and electrical digital data processing. It addresses poor classification performance and the increased computational complexity and parameter counts of existing methods, with the effects of improving classification efficiency, reducing computational complexity, and reducing dimensionality.

Inactive Publication Date: 2018-11-20
HEFEI UNIV OF TECH
Cites: 2; Cited by: 9

AI Technical Summary

Problems solved by technology

[0003] In practice, however, the above algorithms often classify poorly as text content and classification requirements change.
For example, the Naive Bayesian classification method is limited by its assumption that the attributes affecting classification are mutually independent, which is unrealistic; the decision tree method performs poorly on data with continuous features, such as text; the k-nearest neighbor algorithm is widely used in text classification. The ordinary three-layer neural network algorithm achieves only average classification performance; SVM is a classifier with high accuracy in text classification for...




Embodiment Construction

[0052] Specific embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit the present invention.

[0053] Figure 1 is a flowchart of a method for classifying a text collection according to an embodiment of the present invention. As shown in Figure 1, the method may include the following steps:

[0054] In step S01, the text set to be classified is read and preprocessed. In this embodiment, preprocessing may include, for example, desensitizing the read text set (masking sensitive information), removing stop words from the text set, and performing word segmentation on the text set according to a preset custom dictionary. In one example of the present invention, the text set is as follows:

[0055] Text 1: "I am a student of Hefei University of Technology, which is a 211 universi...
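Step S01 names three preprocessing operations without fixing how each is implemented. The Python sketch below shows one way they could fit together, under stated assumptions: the masking rule (runs of four or more digits become a `<NUM>` placeholder), the use of forward maximum matching as the dictionary-based segmentation technique, and the helper names `desensitize`, `segment` and `preprocess` are all hypothetical and not taken from the patent. Stop words are removed after segmentation here, since they are identified at the word level.

```python
import re

# Hypothetical masking rule (not from the patent): any run of 4+ digits
# (phone numbers, ID numbers, ...) is treated as sensitive and replaced
# by a single placeholder token.
MASK = "<NUM>"
_SENSITIVE = re.compile(r"\d{4,}")


def desensitize(text: str) -> str:
    """Mask sensitive digit runs with the MASK placeholder."""
    return _SENSITIVE.sub(MASK, text)


def segment(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching against a custom dictionary.

    At each position, take the longest dictionary word starting there;
    if none matches, fall back to a single character.
    """
    vocab = set(dictionary) | {MASK}  # keep the placeholder as one token
    max_len = max(len(w) for w in vocab)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(text: str, dictionary: set[str], stop_words: set[str]) -> list[str]:
    """Desensitize, segment, then drop stop words and whitespace tokens."""
    tokens = segment(desensitize(text), dictionary)
    return [t for t in tokens if t not in stop_words and not t.isspace()]
```

For instance, with the custom dictionary `{"合肥工业大学", "学生"}` and stop words `{"是", "的"}`, the string `"我是合肥工业大学的学生12345678"` becomes `["我", "合肥工业大学", "学生", "<NUM>"]`. Forward maximum matching is chosen only because it is the simplest dictionary-driven segmenter; a production system would more likely load the custom dictionary into an existing Chinese segmentation tool.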


Abstract

The invention provides a method, system and storage medium for classifying a text set, and belongs to the technical field of text classification algorithms. The method comprises the following steps: reading the text set to be classified and preprocessing it; determining the perplexity of the text set; taking as the topic number of the text set the number at which the perplexity reaches its minimum; generating a topic vector of the text set with a BTM (biterm topic model) according to that topic number; generating a feature vector from the text set with the Doc2vec model; combining the topic vector and the feature vector into a feature space vector of the text set; and inputting the feature space vector into an SVM classifier as its original input space vector to perform the classification. The disclosed method, system and storage medium can improve the efficiency of text classification.
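The distinctive steps of this pipeline (choosing the topic number by minimum perplexity, inferring a BTM topic vector, and combining it with a Doc2vec vector) can be sketched in dependency-free Python. This is a sketch under stated assumptions, not the patent's implementation: BTM training is omitted and its parameters `theta` and `phi` are assumed to be given, perplexity is assumed to be measured over biterms, the document topic vector follows the inference rule p(z|d) = Σ_b p(z|b)·p(b|d) of the original BTM formulation, and "combining" is assumed to mean concatenation. All function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy representation (not from the patent):
#   theta[z]  -> corpus-level proportion of topic z
#   phi[z][w] -> probability of word w under topic z
# Both are assumed to come from an already-trained BTM.


def biterms(tokens):
    """Unordered word pairs co-occurring in one short text (BTM's modeling unit)."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_prob(b, theta, phi):
    """p(b) = sum_z theta_z * phi_z[wi] * phi_z[wj]."""
    wi, wj = b
    return sum(t * phi[z].get(wi, 0.0) * phi[z].get(wj, 0.0)
               for z, t in enumerate(theta))


def biterm_perplexity(docs, theta, phi, floor=1e-12):
    """exp(-average log p(b)) over all biterms in the corpus (assumed convention)."""
    logs = [math.log(max(biterm_prob(b, theta, phi), floor))
            for tokens in docs for b in biterms(tokens)]
    if not logs:
        raise ValueError("corpus has no biterms")
    return math.exp(-sum(logs) / len(logs))


def select_topic_number(candidates, perplexity_of):
    """Return the candidate K whose trained model has minimum perplexity."""
    return min(candidates, key=perplexity_of)


def doc_topic_vector(tokens, theta, phi):
    """BTM document inference: p(z|d) = sum_b p(z|b) * p(b|d)."""
    k = len(theta)
    bs = biterms(tokens)
    vec = [0.0] * k
    for wi, wj in bs:
        joint = [theta[z] * phi[z].get(wi, 0.0) * phi[z].get(wj, 0.0) for z in range(k)]
        total = sum(joint)
        if total > 0:
            # Each occurrence weighs 1/len(bs), i.e. p(b|d) is b's relative frequency in d
            for z in range(k):
                vec[z] += joint[z] / total / len(bs)
    s = sum(vec)
    # Assumption: renormalize over explained biterms; uniform if none is explained
    return [v / s for v in vec] if s > 0 else [1.0 / k] * k


def feature_space_vector(topic_vec, doc2vec_vec):
    """Assumed combination rule: concatenate the topic and Doc2vec vectors."""
    return list(topic_vec) + list(doc2vec_vec)
```

In use, one would train a BTM for each candidate K, call `select_topic_number` with a function returning that model's `biterm_perplexity`, and pass the concatenated vectors, with their class labels, to an SVM such as scikit-learn's `sklearn.svm.SVC` via `fit` and `predict`. The Doc2vec vector could come, for example, from gensim's `Doc2Vec.infer_vector`; neither library is used here so that the sketch stays self-contained.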

Description

Technical field

[0001] The present invention relates to the technical field of text classification algorithms, and in particular to a method, system and storage medium for classifying text collections.

Background art

[0002] Text classification algorithms are among the most commonly used algorithms in computer programs. Currently, there are two main types. The first is classification based on machine learning, such as the Naive Bayesian classification method based on probability statistics, the decision tree method based on information entropy, the k-nearest neighbor algorithm, neural network classification algorithms and the SVM (Support Vector Machine) classification algorithm; the second is classification based on deep learning, such as the CNN (Convolutional Neural Network) algorithm and the RNN (Recurrent Neural Network) classification algorith...


Application Information

IPC (8): G06F17/30; G06K9/62
CPC: G06F18/2411
Inventor: 余本功陈杨楠杨颖曹雨蒙岳美许庆堂张培行张宏梅范招娣
Owner: HEFEI UNIV OF TECH