
Text classification method based on expanded label samples and system

A text classification technology based on expanded label samples, applying pattern recognition and machine learning to data mining and text classification processing. It addresses the problem of low classification accuracy: it alleviates the shortage of labeled samples and improves classification performance and accuracy.

Status: Inactive
Publication Date: 2018-04-20
NANJING UNIV OF POSTS & TELECOMM
Cites: 4. Cited by: 17.

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is as follows. In view of the problems and deficiencies of the prior art described above, the invention aims to classify text data sets and to reduce the influence of "misleading" labeled samples by expanding the set of labeled samples, thereby addressing the low classification accuracy of existing techniques on text whose labeled samples are few and inaccurate.



Embodiment Construction

[0038] The technical scheme of the present invention is described in further detail below with reference to the accompanying drawings:

[0039] Those skilled in the art will understand that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in commonly used dictionaries are to be read with a meaning consistent with their meaning in the context of the prior art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.

[0040] The present invention proposes a new semi-supervised text classification method for classifying text datasets. The text dataset contains multiple web pages, and the task is to divide these web pages into two categories, useful and useless, which serve as the positive and negative classes....


Abstract

The invention provides a text classification method based on expanded label samples. First, a real sample data set containing labeled and unlabeled text samples is collected. High-confidence samples are then identified with the KFCM clustering method to obtain the expanded label samples. Next, a squared loss function is used to formulate a unified classification objective function over the labeled, unlabeled, and expanded sample data; the regularization parameters, kernel function, and other settings are specified; and a text classification function is obtained by learning. Finally, the text data to be classified are input, and the text classification function assigns each text its class. The invention also provides a text classification system. Compared with other classical classification algorithms and related algorithms, the method achieves significantly lower error rates on the test sets. The method addresses the low classification precision of the prior art on text with few and inaccurate labeled samples, and attains the best mutual information with the class variables.
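
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything beyond the abstract's wording is an assumption: KFCM is read as kernel fuzzy c-means with a Gaussian kernel and prototypes kept in input space; clusters are initialized from the labeled class means; a membership threshold of 0.9 selects the high-confidence samples; and the classifier is a weighted kernel regularized least-squares fit on labeled plus expanded samples, which omits whatever term the patent's objective uses for the remaining unlabeled data. All function names, parameters, and the synthetic two-dimensional data are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kfcm(X, init_centers, sigma=2.0, m=2.0, n_iter=50):
    """Kernel fuzzy c-means (assumed variant: prototypes stay in input space).
    Returns memberships of shape (n_samples, n_clusters) and the centers."""
    V = init_centers.astype(float).copy()
    for _ in range(n_iter):
        K = gaussian_kernel(X, V, sigma)
        d = np.maximum(1.0 - K, 1e-12)        # kernel-induced squared distance
        U = d ** (-1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)     # memberships sum to 1 per sample
        W = (U ** m) * K
        V = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, V

def expand_labels(X_lab, y_lab, X_unl, threshold=0.9, sigma=2.0):
    """Pseudo-label unlabeled samples whose top membership reaches threshold.
    Cluster 0 is tied to class +1 and cluster 1 to class -1 through the
    initialization from labeled class means (an assumption of this sketch)."""
    centers = np.vstack([X_lab[y_lab == 1].mean(0), X_lab[y_lab == -1].mean(0)])
    U, _ = kfcm(np.vstack([X_lab, X_unl]), centers, sigma=sigma)
    U_unl = U[len(X_lab):]
    keep = U_unl.max(axis=1) >= threshold
    pseudo = np.where(U_unl[:, 0] >= U_unl[:, 1], 1, -1)
    return X_unl[keep], pseudo[keep]

def fit_rls(X, y, weights, lam=0.1, sigma=2.0):
    """Weighted squared-loss kernel classifier.
    Minimizes sum_i w_i (f(x_i) - y_i)^2 + lam * ||f||^2 in the kernel's RKHS,
    whose solution satisfies (W K + lam I) alpha = W y."""
    K = gaussian_kernel(X, X, sigma)
    W = np.diag(weights)
    alpha = np.linalg.solve(W @ K + lam * np.eye(len(X)), W @ y.astype(float))
    return lambda Z: np.where(gaussian_kernel(Z, X, sigma) @ alpha >= 0, 1, -1)

# Synthetic stand-in for text feature vectors: two well-separated blobs.
rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, scale=0.7, size=(100, 2))
neg = rng.normal(loc=-2.0, scale=0.7, size=(100, 2))
X_lab = np.vstack([pos[:3], neg[:3]])
y_lab = np.array([1, 1, 1, -1, -1, -1])
X_unl = np.vstack([pos[3:60], neg[3:60]])
X_test = np.vstack([pos[60:], neg[60:]])
y_test = np.array([1] * 40 + [-1] * 40)

X_exp, y_exp = expand_labels(X_lab, y_lab, X_unl)
X_train = np.vstack([X_lab, X_exp])
y_train = np.concatenate([y_lab, y_exp])
# Expanded samples get a lower weight than true labels (hypothetical choice).
weights = np.concatenate([np.ones(len(y_lab)), 0.5 * np.ones(len(y_exp))])

predict = fit_rls(X_train, y_train, weights)
accuracy = float((predict(X_test) == y_test).mean())
print(f"expanded samples: {len(y_exp)}, test accuracy: {accuracy:.3f}")
```

In practice the rows of `X_lab`, `X_unl`, and `X_test` would be text feature vectors (for example TF-IDF), and the threshold, kernel width, and regularization strength would need tuning on held-out data.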

Description

Technical field

[0001] The invention belongs to the field of text classification processing, and in particular relates to the application of pattern recognition and machine learning in data mining.

Background art

[0002] The text classification problem does not differ in essence from other classification problems. The method amounts to matching the data to be classified against certain characteristics. A complete match is not possible, so the best matching result must be selected according to some evaluation criterion to complete the classification. The selection and training of classifiers, and the evaluation and feedback of classification results, are therefore very important. Text classification is a fundamental task in machine learning.

[0003] Text classification can be divided into two broad categories, supervised and semi-supervised. Supervised classification means that all text samples have label...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/21
CPC: G06F16/35, G06F40/117
Inventor: 沈雅婷, 汪云云
Owner: NANJING UNIV OF POSTS & TELECOMM