Unbalanced text classification method and system combining SVM and semi-supervised clustering

A semi-supervised clustering and text classification technology, applied in text database clustering/classification, unstructured text data retrieval, and character and pattern recognition. It addresses the poor performance of existing classifiers and algorithms on unbalanced text, with the aim of classifying unbalanced text more accurately.

Active Publication Date: 2019-10-08
JIANGSU UNIV

AI Technical Summary

Problems solved by technology

[0008] In view of the problems in the prior art, the present invention proposes an unbalanced text classification method and system combining SVM and semi-supervised clustering, which can improve t



Examples


Embodiment Construction

[0051] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0052] As shown in Figures 1 and 2, the unbalanced text classification method of the present invention, which combines SVM and semi-supervised clustering, proceeds as follows:

[0053] S1. First, select the keywords in the text to be tested and remove the stop words, which contribute little to text classification. Compute a weight for each keyword from its frequency, so that the text to be tested is vectorized. Second, normalize the vectors using deviation standardization. Finally, output the data set in the vector format used by libsvm, as a standard format for subsequent proce...
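The preprocessing in S1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the stop-word list, the whitespace tokenizer, the helper names, and the reading of "deviation standardization" as per-feature min-max scaling are all assumptions made for the example. Only the libsvm line layout (`label index:value ...`, with 1-based ascending indices and zero entries omitted) follows the real libsvm convention.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a full list.
STOP_WORDS = {"the", "a", "of", "is", "and", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stop words and non-alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOP_WORDS]


def build_vocab(docs):
    """Map each keyword to a 1-based feature index (libsvm indices start at 1)."""
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i + 1 for i, w in enumerate(words)}


def tf_vectors(docs, vocab):
    """Weight each keyword by its relative frequency within the document."""
    vectors = []
    for d in docs:
        counts = Counter(tokenize(d))
        total = sum(counts.values()) or 1
        vectors.append({vocab[w]: c / total for w, c in counts.items()})
    return vectors


def min_max_normalize(vectors, n_features):
    """Assumed reading of 'deviation standardization': scale each feature to [0, 1]."""
    scaled = [dict() for _ in vectors]
    for j in range(1, n_features + 1):
        column = [v.get(j, 0.0) for v in vectors]
        lo, hi = min(column), max(column)
        if hi == lo:
            continue  # constant feature carries no information; leave it at zero
        for row, x in zip(scaled, column):
            value = (x - lo) / (hi - lo)
            if value > 0:
                row[j] = value  # keep the representation sparse
    return scaled


def to_libsvm(labels, vectors):
    """Serialize sparse vectors as 'label index:value ...' lines."""
    lines = []
    for y, v in zip(labels, vectors):
        feats = " ".join(f"{j}:{v[j]:.4f}" for j in sorted(v))
        lines.append(f"{y} {feats}".rstrip())
    return lines


docs = ["the price of stock rises", "stock market falls", "football match tonight"]
vocab = build_vocab(docs)
vectors = min_max_normalize(tf_vectors(docs, vocab), len(vocab))
for line in to_libsvm([1, 1, -1], vectors):
    print(line)
```

For this toy corpus the first document is written out as `1 5:1.0000 6:1.0000 7:1.0000`, since every surviving keyword reaches the maximum of its feature column.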


Abstract

The invention discloses an unbalanced text classification method and system combining SVM and semi-supervised clustering. The method comprises the following steps: preprocessing the text to be processed to obtain text data in vector format, which serves as the data set; training an SVM classifier on the training set to obtain a classification model, and using that model to predict the test set, yielding a category and a confidence for each test sample; clustering the data set with a semi-supervised clustering algorithm, which likewise yields a category and a confidence for each test sample; and fusing the categories and confidences produced by the SVM classifier and the semi-supervised clustering algorithm to obtain the final output. The method combines different types of approaches from the field of unbalanced text classification so that their strengths complement one another. By using vectorization and normalization, it overcomes the inaccurate classification results that arise when high-dimensional sparse text data are processed with too few labeled texts. The method effectively addresses the problem of text class imbalance.
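The clustering and fusion steps described in the abstract can be illustrated with a small sketch. Everything specific here is an assumption: the source does not state which semi-supervised clustering algorithm or fusion rule is used, so the example substitutes a seeded k-means (centroids initialized from the labeled samples), derives a confidence from relative centroid distances, and fuses by keeping the more confident prediction when the two sources disagree. The SVM outputs are supplied as hypothetical (category, confidence) pairs rather than computed.

```python
import math


def _mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def seeded_kmeans(labeled, unlabeled, n_iter=10):
    """Assumed stand-in for the semi-supervised clustering step.

    labeled: list of (vector, label) pairs used as seeds.
    unlabeled: list of vectors to classify.
    Returns one (label, confidence) pair per unlabeled vector.
    """
    labels = sorted({y for _, y in labeled})
    centroids = {y: _mean([x for x, t in labeled if t == y]) for y in labels}

    def nearest(x):
        return min(labels, key=lambda y: math.dist(x, centroids[y]))

    for _ in range(n_iter):
        assign = [nearest(x) for x in unlabeled]
        for y in labels:
            # Labeled seeds always stay in their own cluster.
            members = [x for x, t in labeled if t == y]
            members += [x for x, a in zip(unlabeled, assign) if a == y]
            centroids[y] = _mean(members)

    results = []
    for x in unlabeled:
        a = nearest(x)
        dists = {y: math.dist(x, centroids[y]) for y in labels}
        total = sum(dists.values())
        # Illustrative confidence: closer to the chosen centroid means higher.
        conf = 1.0 - dists[a] / total if total > 0 else 1.0
        results.append((a, conf))
    return results


def fuse(svm_preds, cluster_preds):
    """Assumed fusion rule: on disagreement, keep the more confident prediction."""
    fused = []
    for (ys, cs), (yc, cc) in zip(svm_preds, cluster_preds):
        if ys == yc:
            fused.append((ys, max(cs, cc)))
        else:
            fused.append((ys, cs) if cs >= cc else (yc, cc))
    return fused


labeled = [([0.0, 0.0], 0), ([1.0, 1.0], 1)]
unlabeled = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]
svm_preds = [(0, 0.9), (0, 0.55), (1, 0.6)]  # hypothetical SVM outputs
cluster_preds = seeded_kmeans(labeled, unlabeled)
print(fuse(svm_preds, cluster_preds))
```

In this toy run the clustering step disagrees with the weakly confident SVM predictions on the second and third samples and overrides them, which is the kind of complementarity the abstract describes. A weighted vote over both confidences would be an equally plausible rule.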

Description

Technical field

[0001] The invention belongs to the field of natural language processing, specifically unbalanced text classification, and relates to an unbalanced text classification method and system combining SVM and semi-supervised clustering.

Background art

[0002] Text classification is a classic problem in the field of natural language processing, and it is widely used in information filtering, mail classification, query intent prediction, text topic tracking and other fields. Traditional text classification methods are mainly designed for balanced text classification problems, and they work well on small-scale balanced problems with uniform and dense data distributions. However, they still have many limitations. In practical applications especially, category imbalance, too few labeled texts, and high-dimensional sparse samples increase the complexity of text classification, res...


Application Information

IPC(8): G06F16/35, G06K9/62
CPC: G06F16/35, G06F18/23213, G06F18/2411, G06F18/254, G06F18/214
Inventors: 姜震, 熊相真, 杜阳, 冯路捷, 孙祥瑜
Owner: JIANGSU UNIV