
A Text Classification Method with Missing Negative Examples

A text classification method and technology, applied to text database clustering/classification, unstructured text data retrieval, instruments, and similar areas. It addresses problems such as the lack of statistical theory support, the absence of negative example data, and poor accuracy, in order to improve classification accuracy, produce an accurate classifier, and achieve a good classification effect.

Active Publication Date: 2022-02-22
Owner: 南京稷图数据科技有限公司

AI Technical Summary

Problems solved by technology

Research in this area includes the LSI (Latent Semantic Indexing) model, sometimes called LSA (Latent Semantic Analysis), which can compress text information to a large extent through SVD (Singular Value Decomposition). However, such methods also have defects: the resulting model cannot be interpreted probabilistically and lacks support from statistical theory.
[0007] (2) In the absence of negative example data, it is difficult to select training data for a classification model. For example, methods such as PCA, decision trees, Bayesian frameworks, and S-EM have been used by researchers to select negative examples, but these are not classifiers with strong generalization ability, so the final results fall somewhat short (a minimal illustration of this selection step appears after paragraph [0009] below).
[0008] (3) When there are many candidate categories, using the trained classification model to score every category takes a great deal of time, which seriously limits use in a production environment.
[0009] Through the above analysis: for text classification that lacks negative example data, classification is difficult, accuracy is poor, the classification effect is weak, and efficiency is low.
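As an illustration of the negative-example selection problem described in (2) above, the following Python sketch shows one common PU-learning heuristic rather than the patent's own ROC-SVM procedure: treat the unlabeled pool as provisional negatives, fit a simple classifier, and keep only the unlabeled documents that score lowest as "reliable negatives". The helper name, the Naive Bayes choice, and the keep_ratio threshold are assumptions made for this sketch.

```python
# Minimal PU-learning sketch: pick "reliable negatives" from an unlabeled pool.
# Hypothetical helper for illustration; NOT the patent's ROC-SVM procedure.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def select_reliable_negatives(positive_docs, unlabeled_docs, keep_ratio=0.3):
    """Return the unlabeled documents least likely to be positive."""
    texts = positive_docs + unlabeled_docs
    y = np.array([1] * len(positive_docs) + [0] * len(unlabeled_docs))

    # TF-IDF features over the whole corpus (positives + unlabeled).
    X = TfidfVectorizer().fit_transform(texts)

    # Step 1: train with unlabeled docs provisionally treated as negatives.
    clf = MultinomialNB().fit(X, y)

    # Step 2: score the unlabeled pool; keep the lowest-scoring fraction
    # as "reliable negatives" for training a final classifier later.
    p_pos = clf.predict_proba(X[len(positive_docs):])[:, 1]
    n_keep = max(1, int(keep_ratio * len(unlabeled_docs)))
    idx = np.argsort(p_pos)[:n_keep]
    return [unlabeled_docs[i] for i in idx]

# Example usage with toy data:
pos = ["stock prices rise on earnings", "shares rally after profit report"]
unl = ["football team wins final", "quarterly revenue beats forecast",
       "new smartphone released today"]
print(select_reliable_negatives(pos, unl, keep_ratio=0.5))
```

The point of the heuristic is only to bootstrap a negative set; a classifier with stronger generalization can then be retrained on the positives plus these reliable negatives.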



Examples


Embodiment Construction

[0043] The present invention will be further described below in conjunction with embodiments and the accompanying drawings.

[0044] The present invention provides a text classification method for data lacking negative examples. The method is used for text classification; it should be noted that it is not limited to text classification in a single field and can be applied in various fields. With reference to Figure 1, the specific steps of the method are:

[0045] S1: Determine the classification text and classification categories.

[0046] Determine the data text to be classified and customize the text classification categories; the customized text classification categories serve as the positive example categories.

[0047] When performing text classification, it is first necessary to determine which texts are to be classified. The user can customize the classification categories according to need and decide which categories the data texts should be divided into; these given categories are...



Abstract

The invention discloses a text classification method for data lacking negative examples, belonging to the technical fields of machine learning and text classification. The method first determines the data text to be classified and customizes the text classification categories. It then trains a TF-IDF model and an LSI model on the acquired corpus. Based on the trained TF-IDF and LSI models, it builds feature vectors for the text and constructs a combined text feature vector using an ensemble method. A ROC-SVM combination algorithm is then used to train the Basic classifier, which can be combined with the k-means clustering method, while a label classifier is trained at the same time. Finally, the text to be classified is initially classified by the Basic classifier, screened by Elasticsearch to determine candidate categories, and then accurately assigned by the label classifier to one or several of the custom categories. The method can effectively classify text data lacking negative examples, with high accuracy, good effect, and high efficiency.
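To make the feature-construction step described in the abstract more concrete, here is a minimal scikit-learn sketch that concatenates TF-IDF weights with LSI topic coordinates (LSI approximated by truncated SVD) and trains a linear SVM on the combined vector. This is a hedged approximation, not the patent's ROC-SVM / k-means / Elasticsearch pipeline; the toy corpus, component count, and classifier choice are assumptions.

```python
# Sketch: combined TF-IDF + LSI feature vector feeding a linear classifier.
# An approximation of the pipeline in the abstract, not the patent's exact
# ROC-SVM / k-means procedure; all parameters are assumed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD        # LSI via truncated SVD
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import Normalizer
from sklearn.svm import LinearSVC

corpus = ["stock prices rise on strong earnings",
          "shares fall after weak profit report",
          "football team wins the championship final",
          "injured striker misses the next match"]
labels = ["finance", "finance", "sports", "sports"]

tfidf = TfidfVectorizer()
lsi = Pipeline([                      # LSI topics built on top of TF-IDF
    ("tfidf", TfidfVectorizer()),
    ("svd", TruncatedSVD(n_components=2, random_state=0)),
    ("norm", Normalizer(copy=False)),
])

# Ensemble-style combined feature vector: raw TF-IDF weights concatenated
# with the low-dimensional LSI topic coordinates.
features = FeatureUnion([("tfidf", tfidf), ("lsi", lsi)])

model = Pipeline([("features", features), ("svm", LinearSVC())])
model.fit(corpus, labels)
print(model.predict(["shares rise after earnings report"]))
```

FeatureUnion keeps the sparse TF-IDF block and the dense LSI block side by side, which mirrors the idea of a "combined text feature vector based on the ensemble method" described above.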

Description

Technical field

[0001] The invention belongs to the technical fields of machine learning and text classification, and in particular relates to a text classification method for data lacking negative examples.

Background technique

[0002] With the development of the Internet, the number of Internet texts has increased dramatically, and the demand for text classification has grown accordingly. Faced with massive volumes of data text, manual classification is clearly infeasible, but the rise of machine learning methods offers a way to meet this demand. A large number of researchers have therefore proposed methods in this field; for example, machine learning methods such as naive Bayes, decision trees, k-nearest neighbors, and support vector machines have been successfully applied to text classification with good results. However, because data texts in different fields are intricate and the mechanisms of many methods are diffe...


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F16/35G06F40/216G06F40/242G06K9/62
CPCG06F16/35G06F16/355G06F18/23213G06F18/2411G06F18/22G06F18/241G06F18/214
Inventor 吴刚王楠
Owner 南京稷图数据科技有限公司