Text classification method based on CNN-SVM-KNN combined model

A text classification technology based on a combined model, applicable to text database clustering/classification, unstructured text data retrieval, and special data processing applications. It addresses the problem of low text classification accuracy.

Inactive Publication Date: 2019-11-05

HARBIN INST OF TECH

2 Cites, 8 Cited by

Problems solved by technology

[0006] The purpose of the present invention is to solve the problem of low text classification accuracy in existing methods by proposing a text classification method based on a CNN-SVM-KNN combined model.

Examples

Specific Embodiment 1

[0037] Specific Embodiment 1: This embodiment is described with reference to Figure 1. The text classification method based on the CNN-SVM-KNN combined model proceeds as follows:

[0038] Text classification can generally be divided into the following stages: text preprocessing, feature selection, training and testing, and metric evaluation. First, the training set is used to build a classifier model. The model is then applied to the test set for classification. Finally, the predicted category labels are compared with the true labels, and the quality of the classifier is judged through evaluation metrics.
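The final stage, comparing predicted labels with true labels, can be sketched as follows. The source does not name the evaluation metrics it uses, so accuracy and macro-averaged precision, recall, and F1 are assumptions here, and the function name `evaluate` is hypothetical.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Compare predicted labels with true labels.

    Returns accuracy plus macro-averaged precision, recall and F1.
    The choice of metrics is an assumption; the source only says
    the classifier is judged "through indicators".
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    truth = ["sports", "sports", "tech", "tech"]
    predicted = ["sports", "tech", "tech", "tech"]
    print(evaluate(truth, predicted))
```

Macro averaging weights every class equally, which may or may not match the averaging used in the patent's experiments.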

[0039] Step 1: Text preprocessing;

[0040] Step 2: Perform feature extraction on the text preprocessed in Step 1 to obtain the feature-extracted text;

[0041] Step 3: Establish a CNN model based on Step 2;

[0042] Step 4: Establish a CNN-SVM model;

[0043] Step 5: Establish a CNN-KNN model;

[0044] Step 6: Manually set the...

Specific Embodiment 2

[0054] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in how the text is preprocessed in Step 1. The specific process is:

[0055] Text is usually composed of words and sentences, which computers cannot recognize directly. The text must therefore be preprocessed to remove useless information and convert it into a form the computer can process. Because Chinese and English require different preprocessing methods, they are handled separately.

[0056] Words in English text are separated by spaces, so word segmentation can be performed by splitting on spaces, as shown in Figure 9.

[0057] The English text preprocessing process is:

[0058] (1) Convert uppercase letters to lowercase;

[0059] (2) Remove stop words such as "a", "an", and "the", which carry no practical meaning;

[0060] (3) Lemmatization (morphological restoration); all English...
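The English preprocessing steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the stop-word set contains only the three examples given in the text, the punctuation stripping is an added assumption, and `LEMMAS` is a hypothetical toy lookup table standing in for a real lemmatizer, since the description of step (3) is truncated in the source.

```python
# Stop words named in the source; a practical list would be much longer.
STOP_WORDS = {"a", "an", "the"}

# Hypothetical toy lemma table standing in for a real lemmatizer.
LEMMAS = {"cats": "cat", "running": "run", "was": "be"}

# Punctuation stripped from token edges (an assumption, not in the source).
PUNCTUATION = ".,;:!?\"'()"


def preprocess_english(text):
    """Lowercase, split on spaces, drop stop words, then lemmatize."""
    tokens = text.lower().split()                      # step (1) plus segmentation
    tokens = [t.strip(PUNCTUATION) for t in tokens]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]  # step (2)
    return [LEMMAS.get(t, t) for t in tokens]          # step (3), toy version


if __name__ == "__main__":
    print(preprocess_english("The cats chased a Mouse."))
```

Words missing from the toy lemma table pass through unchanged, which is why "chased" would not be reduced to "chase" in this sketch.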

Specific Embodiment 3

[0066] Specific Embodiment 3: This embodiment differs from Specific Embodiment 1 or 2 in how Step 2 performs feature extraction on the text preprocessed in Step 1 to obtain the feature-extracted text. The process is:

[0067] Feature selection chooses b (b < B) features out of B features and discards the remaining B - b features.

[0068] The new features are therefore only a subset of the original features. The discarded features are considered unimportant and unable to represent the theme of the article. After preprocessing, the feature matrix is usually very large and high-dimensional, which leads to problems such as excessive computation, long training time, and low classification accuracy. Feature selection eliminates the unimportant noise and retains the features that highlight the theme of the article, thereby achieving the purpose ...
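As an illustration of keeping b of B features, the sketch below ranks terms by document frequency and retains the top b. The ranking criterion is an assumption: the source text is truncated before it names the selection method, so document frequency is used here only as a simple stand-in. The function names `select_features` and `project` are hypothetical.

```python
from collections import Counter


def select_features(docs, b):
    """Return the b terms with the highest document frequency.

    docs is a list of token lists. Document frequency is an assumed
    criterion; the source does not say how features are ranked.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))          # count each term once per document
    if not 0 < b < len(df):
        raise ValueError("b must satisfy 0 < b < B")
    # Highest document frequency first; ties broken alphabetically.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    return ranked[:b]


def project(tokens, vocabulary):
    """Represent a document as counts over the selected features only."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


if __name__ == "__main__":
    corpus = [
        ["cnn", "text", "model"],
        ["svm", "text", "model"],
        ["knn", "text"],
    ]
    vocab = select_features(corpus, b=2)
    print(vocab)
    print(project(["text", "text", "cnn"], vocab))
```

Here B is 5 and b is 2, so the three terms that each appear in a single document are discarded and every document shrinks to a two-dimensional count vector.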

Abstract

The invention discloses a text classification method based on a CNN-SVM-KNN combined model, and relates to text classification methods based on combined models. The objective of the invention is to solve the problem of low text classification accuracy in existing methods. The method specifically comprises the steps of 1, text preprocessing; 2, performing feature extraction on the text preprocessed in step 1 to obtain the feature-extracted text; 3, establishing a CNN model based on step 2; 4, establishing a CNN-SVM model; 5, establishing a CNN-KNN model; 6, setting a distinguishing threshold d; 7, calculating the distance: calculating the distance tmp from the sample point to be classified to the optimal classification surface of the CNN-SVM classifier; 8, comparing distances: when tmp is greater than d, selecting the CNN-SVM classifier; otherwise, selecting the CNN-KNN classifier; and 9, repeatedly executing the steps 6 to 9, and searching for the value of d that optimizes the evaluation index. The method is applied to the field of text classification.
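Steps 6 to 9 describe a switching rule between two classifiers, which the sketch below illustrates under several simplifying assumptions that are not in the source: the feature vectors are taken as given (standing in for CNN outputs), the SVM is a binary linear one with already-trained weights `w` and `bias` and labels +1 and -1, KNN uses Euclidean distance with majority voting, and accuracy on a validation set is the evaluation index. All function names are hypothetical.

```python
import math
from collections import Counter


def svm_signed_distance(x, w, bias):
    """Signed distance from x to the hyperplane w.x + bias = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + bias) / norm


def knn_predict(x, train_x, train_y, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def combined_predict(x, w, bias, train_x, train_y, d, k=3):
    """Use the SVM when the sample is farther than d from the surface,
    otherwise fall back to KNN (steps 7 and 8)."""
    signed = svm_signed_distance(x, w, bias)
    tmp = abs(signed)
    if tmp > d:
        return 1 if signed > 0 else -1
    return knn_predict(x, train_x, train_y, k)


def search_threshold(candidates, val_x, val_y, w, bias, train_x, train_y, k=3):
    """Try each candidate d and keep the one with the best accuracy (step 9).
    Accuracy as the evaluation index is an assumption."""
    best_d, best_acc = None, -1.0
    for d in candidates:
        hits = sum(
            combined_predict(x, w, bias, train_x, train_y, d, k) == y
            for x, y in zip(val_x, val_y)
        )
        acc = hits / len(val_y)
        if acc > best_acc:
            best_d, best_acc = d, acc
    return best_d, best_acc


if __name__ == "__main__":
    w, bias = [1.0, 0.0], 0.0
    train_x = [[0.1, 0.0], [0.2, 0.0], [0.3, 0.0], [2.0, 0.0], [-2.0, 0.0]]
    train_y = [-1, -1, -1, 1, -1]
    val_x, val_y = [[0.15, 0.0], [3.0, 0.0]], [-1, 1]
    print(search_threshold([0.0, 0.5], val_x, val_y, w, bias, train_x, train_y))
```

The intuition is that samples close to the separating surface are the ones the SVM is least reliable on, so a local method such as KNN decides them instead. A kernel or multi-class SVM would need a different distance computation than the linear formula used here.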

Description

Technical field

[0001] The invention relates to a text classification method based on a combined model. The invention is used in the field of text classification.

Background

[0002] With the rapid development of network technology, information on the Internet emerges in an endless stream. Relying on manual classification of this massive volume of information is impractical: it consumes a great deal of time and resources, and differences between individuals make it difficult to achieve consistent classification results. Since the 1990s, automatic classification based on statistics and machine learning has therefore been a focus of attention and the main technology applied in practice. However, as text resources keep expanding, it has become increasingly difficult to meet people's actual needs, which poses a severe test for text classification technology....

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F16/35

CPC: G06F16/353

Inventor: 郑文斌凤雷刘冰付平孙媛媛石金龙叶俊涛王天城魏明晨徐明珠吴瑞东

Owner: HARBIN INST OF TECH
