KNN text classifying method for optimizing training sample set

A KNN text classification method with an optimized training sample set, applicable to text database clustering/classification, unstructured text data retrieval, and special data processing applications. It addresses the low efficiency and low accuracy of traditional KNN text classification.

Active Publication Date: 2014-09-24
UNIV OF ELECTRONIC SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0016] The present invention provides a KNN text classification method that optimizes the training sample set, addressing the low efficiency and low accuracy of the traditional KNN text classification method, and introduces the mutual information v


Examples


Embodiment Construction

[0084] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0085] Referring to Figure 1 and Figure 2, the text classification method based on the optimized-sample-set KNN algorithm proceeds as follows. First, the training-set text is preprocessed. The preprocessed text is then represented in a vector space model, features are extracted from that representation, and the text classification model is computed. The text dataset to be classified goes through the same preprocessing, text representation, and feature extraction steps; the model is then applied to it to obtain the classification result.
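The pipeline in [0085] can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the regex tokenizer, the smoothed TF-IDF weighting, cosine similarity, the similarity-weighted vote, the toy corpus, and all function names. The sample-set optimization and the feature extraction step are omitted because their details are not given in this excerpt.

```python
import math
import re
from collections import Counter

def preprocess(text):
    """Lowercase and split into word tokens (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())

def build_idf(token_lists):
    """Smoothed inverse document frequency for each term in the training set."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def vectorize(tokens, idf):
    """Sparse TF-IDF vector (vector space model); unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: tf * idf[term] for term, tf in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(query, train_vectors, train_labels, k=3):
    """Similarity-weighted vote among the k most similar training vectors."""
    ranked = sorted(
        ((cosine(query, vec), label) for vec, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for sim, label in ranked[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Hypothetical toy training set with two categories.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the match",
    "the bank raised interest rates",
    "stock markets fell on inflation fears",
    "the central bank cut rates to support markets",
]
train_labels = ["sports", "sports", "sports", "finance", "finance", "finance"]

# Training side: preprocess, represent, keep the vectors as the KNN "model".
train_tokens = [preprocess(t) for t in train_texts]
idf = build_idf(train_tokens)
train_vectors = [vectorize(tokens, idf) for tokens in train_tokens]

def classify(text, k=3):
    """Apply the same preprocessing and representation, then run KNN."""
    return knn_classify(vectorize(preprocess(text), idf), train_vectors, train_labels, k)
```

Calling `classify("the goal decided the match")` on this toy corpus returns `"sports"`, and `classify("the bank changed interest rates")` returns `"finance"`. Because plain KNN compares each query against every stored training vector, classification cost grows with the size of the training set, which is the cost a pruned sample set is meant to reduce.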

[0086] The specific steps of the KNN text classification method that optimizes the training sample set are as follows:

[0087] (1) The total number of predefined text categories is n, where n is the number of categories of the known-category samples, that is, the number of categories of...


Abstract

The invention discloses a KNN text classification method that optimizes the training sample set. It belongs to the fields of text mining, natural language processing, and the like, and addresses the low efficiency and low accuracy of the traditional KNN text classification method. The method comprises the following steps: preprocessing the training text data and the text data to be classified; building a text representation of both preprocessed datasets; extracting features from both represented datasets using a genetic algorithm; training on the extracted features of the training text data with a KNN algorithm that uses the optimized sample set, thereby constructing a text classifier; and applying the text classifier to the feature-extracted text data to be classified to obtain its classification result. The method is well suited to text information mining systems.
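The abstract names a genetic algorithm for the feature step but this excerpt gives no details, so the following is only a generic sketch of genetic-algorithm feature selection, not the patent's procedure. The bit-mask encoding, the leave-one-out 1-NN fitness with a size penalty, truncation selection, one-point crossover, bit-flip mutation, the toy data, and every name and parameter value are assumptions chosen for illustration.

```python
import random

def fitness(mask, samples, labels):
    """Leave-one-out 1-NN accuracy on the selected features, minus a size penalty."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    correct = 0
    for i, x in enumerate(samples):
        best_dist, best_label = None, None
        for j, y in enumerate(samples):
            if i == j:
                continue
            dist = sum((x[f] - y[f]) ** 2 for f in selected)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(samples) - 0.01 * len(selected)

def select_features(samples, labels, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Evolve a 0/1 mask over features; requires at least two features."""
    rng = random.Random(seed)
    n_features = len(samples[0])

    def score(mask):
        return fitness(mask, samples, labels)

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]      # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation applied independently to each gene.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)

# Hypothetical data: feature 0 separates the classes, the rest are noise.
samples = [
    [0.0, 0.9, 0.1, 0.5],
    [0.1, 0.2, 0.8, 0.4],
    [0.2, 0.6, 0.3, 0.9],
    [1.0, 0.8, 0.2, 0.6],
    [1.1, 0.1, 0.7, 0.3],
    [1.2, 0.5, 0.4, 0.8],
]
labels = ["a", "a", "a", "b", "b", "b"]

best_mask = select_features(samples, labels)
```

The size penalty in `fitness` is one simple way to prefer smaller feature subsets when accuracy is equal; in a text setting the features would be vocabulary terms and the mask would be far longer, so the population size and generation count would need tuning.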

Description

Technical field

[0001] A KNN text classification method that optimizes the training sample set. It classifies text using a K-nearest-neighbor algorithm whose training set has been pruned and optimized, and belongs to the fields of text mining, natural language processing, and the like.

Background technique

[0002] The continuous growth of information on the Internet has made querying and retrieving information increasingly inconvenient, while the demand for fast, simple access to information grows daily. Text classification technology was proposed in response to this problem. It can organize massive amounts of information in an orderly manner and help users discover useful, latent knowledge in large volumes of hidden and unknown text information.

[0003] Text classification technology attracted attention as soon as it emerged and has become a research hotspot. Text cl...


Application Information

IPC(8): G06F17/30, G06N3/02
CPC: G06F16/35
Inventor: 屈鸿谌语绍领解修蕊黄利伟
Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA