A KNN text classification method with an optimized training sample set

A technique for optimizing training sample sets in text classification. It applies to text database clustering and classification, unstructured text data retrieval, special-purpose data processing applications, and similar areas, and addresses the low efficiency and low accuracy of traditional KNN text classification.

Active Publication Date: 2017-02-15
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0016] The present invention provides a KNN text classification method that optimizes the training sample set, addressing the low efficiency and low accuracy of the traditional KNN text classification method. It also introduces mutual information values into the genetic algorithm, combining the advantages of the two feature extraction methods so that the feature extraction results are more reliable and the overall text classification can be better applied in text information mining systems.
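As a point of reference for the mutual information value mentioned above, the following is a minimal sketch of one common definition used in text feature selection: the pointwise mutual information between a term t and a category c, log(P(t, c) / (P(t) P(c))), estimated from document counts. The function name, the toy documents, and the choice of this particular estimator are assumptions for illustration; the excerpt does not state which formula the invention uses.

```python
import math

def mutual_information(docs, labels, term, category):
    """Estimate log(P(t, c) / (P(t) * P(c))) from document counts.

    docs is a list of token sets, labels the matching category names.
    Hypothetical helper for illustration, not the patent's own formula.
    """
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)            # docs containing the term
    n_c = sum(1 for y in labels if y == category)      # docs in the category
    n_tc = sum(1 for d, y in zip(docs, labels)
               if y == category and term in d)         # docs with both
    if n_tc == 0 or n_t == 0 or n_c == 0:
        return float("-inf")  # term and category never co-occur
    return math.log((n_tc * n) / (n_t * n_c))

# Toy corpus (made up for this example).
docs = [{"ball", "goal"}, {"ball", "team"}, {"stock", "market"}, {"stock", "ball"}]
labels = ["sport", "sport", "finance", "finance"]

for term in ("goal", "ball", "stock"):
    print(term, mutual_information(docs, labels, term, "sport"))
```

A term that appears only in one category ("goal") scores higher for that category than a term spread across categories ("ball"). This estimator is known to favor rare terms, which is one reason it is often combined with another selection method.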



Examples


Embodiment Construction

[0083] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0084] Referring to Figure 1 and Figure 2, a text classification method based on the optimized-sample-set KNN algorithm proceeds as follows. First, the training set text is preprocessed; the preprocessed text is then represented in a vector space model, features are extracted from that representation, and the text classification model is computed. The text dataset to be classified goes through the same text preprocessing, text representation, and feature extraction steps, after which the model is applied to it to obtain the classification result.
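The pipeline in [0084] can be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the patented method: the stop-word list, TF-IDF weighting, cosine similarity, majority voting, and all function names are choices made for this example, and both the feature extraction step and the training-set optimization are omitted because the excerpt does not describe them.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}  # illustrative only

def preprocess(text):
    """Lowercase, tokenize on letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def fit_idf(token_lists):
    """Smoothed inverse document frequency per term, learned from the training set."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def vectorize(tokens, idf):
    """Sparse TF-IDF vector (vector space model); unseen terms are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: f * idf[t] for t, f in tf.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(text, train_vecs, train_labels, idf, k=3):
    """Label a text by majority vote among its k most similar training vectors."""
    q = vectorize(preprocess(text), idf)
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(q, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training set (made up for this example).
train_texts = [
    "the team scored a late goal", "the striker and the goalkeeper",
    "stock market prices fell", "the bank raised interest rates",
    "a goal in the final match", "market investors sold stock",
]
train_labels = ["sport", "sport", "finance", "finance", "sport", "finance"]

train_tokens = [preprocess(t) for t in train_texts]
idf = fit_idf(train_tokens)
train_vecs = [vectorize(tokens, idf) for tokens in train_tokens]

print(knn_classify("investors watched the stock market", train_vecs, train_labels, idf))
print(knn_classify("a late goal for the team", train_vecs, train_labels, idf))
```

The same `preprocess` and `vectorize` functions are applied to both the training texts and the texts to be classified, which mirrors the requirement that both datasets pass through identical preprocessing and representation steps before the model is applied.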

[0085] The specific steps of the KNN text classification method that optimizes the training sample set are as follows:

[0086] (1) The total number of predefined text categories is n, and n represents the number of categories of known category samples, that is, the number of categories o...


Abstract

The invention discloses a KNN text classification method that optimizes the training sample set. It belongs to the fields of text mining, natural language processing, and the like, and addresses the low efficiency and low accuracy of the traditional KNN text classification method. The method performs text preprocessing on the training text data and the text data to be classified; builds a text representation of the preprocessed training text data and text data to be classified; uses a genetic algorithm for feature extraction; trains on the extracted features of the training text data using the KNN algorithm with an optimized sample set to construct a text classifier; and applies the text classifier to the feature-extracted text data to be classified to obtain its classification result. The invention can be better applied to text information mining systems.
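The abstract names a genetic algorithm for feature extraction, but this excerpt does not say how chromosomes or fitness are defined. The following is therefore only a generic sketch of genetic feature selection, under several assumptions: each chromosome is a bit mask over candidate terms, fitness is the sum of hypothetical per-term relevance scores with a penalty for exceeding a feature budget, and the operators are tournament selection, one-point crossover, bit-flip mutation, and elitism. The function name, parameters, and example scores are all invented for illustration.

```python
import random

def genetic_feature_selection(scores, max_features, pop_size=20, generations=40,
                              mutation_rate=0.05, seed=0):
    """Search for a bit mask over terms that maximizes an assumed fitness.

    scores: hypothetical non-negative relevance score per term (at least 2 terms).
    Returns the best mask and the best fitness seen at the start of each generation.
    """
    rng = random.Random(seed)
    n = len(scores)

    def fitness(mask):
        chosen = [s for s, bit in zip(scores, mask) if bit]
        # Assumed penalty: each feature over budget costs the largest score.
        penalty = max(0, len(chosen) - max_features) * max(scores)
        return sum(chosen) - penalty

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: carry the best mask over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, n - 1)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), best_history

# Hypothetical relevance scores for eight candidate terms.
term_scores = [0.9, 0.1, 0.4, 0.8, 0.05, 0.6, 0.3, 0.7]
best_mask, history = genetic_feature_selection(term_scores, max_features=4)
print(best_mask, history[0], history[-1])
```

Because the best mask is copied unchanged into each new generation, the recorded best fitness never decreases. In a real system the fitness would more plausibly be tied to classification quality on held-out data, which is costlier to evaluate than the simple score sum assumed here.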

Description

Technical field

[0001] A KNN text classification method that optimizes the training sample set. It classifies text using a K-nearest-neighbor algorithm that prunes and optimizes the training set, and belongs to the fields of text mining, natural language processing, and the like.

Background technique

[0002] The continuous growth of information on the Internet has made querying and retrieving information increasingly inconvenient, while people's demand for fast and simple access to information grows day by day. Text classification technology was proposed in response. It can organize massive amounts of information in an orderly way and help users discover useful, latent knowledge within large volumes of hidden and unknown text.

[0003] Text classification technology attracted attention as soon as it emerged and has become a research hotspot. Text cl...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06N3/02
CPC: G06F16/35
Inventor: 屈鸿谌语绍领解修蕊黄利伟
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA