Correction-based K nearest neighbor text classification method

The method applies text classification and K-nearest-neighbor techniques to the classification and retrieval of electronic resource information. It addresses the manpower and material cost of manual classification, its susceptibility to human error, and the reduced performance of conventional classifiers, and it aims to produce accurate classification results.

Active Publication Date: 2011-04-27
INFORMATION & COMM BRANCH OF STATE GRID JIANGSU ELECTRIC POWER

AI Technical Summary

Problems solved by technology

From the 1960s to the end of the 1980s, knowledge engineering was the dominant and most effective approach to building content-based text classification systems. Classifiers were mainly built by hand, which was labor-intensive and prone to human error.
The conventional K-nearest-neighbor decision rule has an obvious problem: when the sample distribution density is uneven, it uses only the rank order of the K nearest samples and ignores the differences in their distances. As a result, the decision made from the K nearest neighbors tends to favor large categories, which reduces the classification performance of the classifier.
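
The bias described above can be illustrated with a minimal sketch of plain majority-vote K nearest neighbors. This is not the correction-based algorithm of the invention, whose details are not given in this excerpt. The function name `knn_majority_vote` and the one-dimensional toy data are assumptions made only for illustration.

```python
from collections import Counter

def knn_majority_vote(train, query, k):
    """Classify `query` by plain majority vote over the k nearest samples.

    `train` is a list of (value, label) pairs on a 1-D axis. Only the rank
    order of the neighbors is used; how far away they are is ignored.
    """
    neighbors = sorted(train, key=lambda s: abs(s[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: a small class close to the query and a larger,
# denser class much farther away.
train = [(1.0, "small"), (1.1, "small")] + [(3.0 + 0.1 * i, "large") for i in range(6)]
query = 1.2

print(knn_majority_vote(train, query, k=2))  # only the two nearby "small" samples vote
print(knn_majority_vote(train, query, k=7))  # five distant "large" samples outvote them
```

With k=7 the two close neighbors are outvoted by five samples that are roughly ten times farther away, which is the large-category bias the passage describes.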


Embodiment Construction

[0014] In the correction-based K-nearest-neighbor text classification method of the present invention, each document in the training text set is first segmented into words, stop words are removed, and the text is represented as feature items (terms). The dimensionality of the text vector is then reduced by selecting document features that are as few as possible and closely related to the topic of the document. Finally, the correction-based K-nearest-neighbor text classification algorithm is used to construct a classifier, which performs the classification and yields the classification result.

[0015] 1) Text preprocessing: collect the texts and preprocess them. This includes handling garbled text and non-text content, segmenting words, removing stop words, and deleting irrelevant text. Because text preprocessing is not the focus of this invention, it is not described in detail.
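
Since the source leaves preprocessing unspecified, the following is only a minimal sketch of the tokenization and stop-word removal steps. It assumes English text and a regular-expression tokenizer as a stand-in for a real word segmenter; the `STOP_WORDS` set and the `preprocess` function are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "to"}

def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The price of electricity is rising."))
```

For Chinese text, the regular expression would be replaced by a dedicated word segmentation tool; the stop-word filter would stay the same in structure.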

[0016] 2) Text feature selection: select document features that are as few and as accurate as possible and that are cl...
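
The selection criterion is cut off in this excerpt, so the sketch below does not reproduce the invention's method. It shows one common stand-in, chi-square scoring of terms against categories, to illustrate how a small set of topic-related features might be kept. The functions `chi_square` and `select_features` and the toy documents are assumptions.

```python
def chi_square(docs, labels, term, category):
    """Chi-square statistic between a term and a category over tokenized docs."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1          # in category, contains term
        elif has_term:
            b += 1          # other category, contains term
        elif in_cat:
            c += 1          # in category, lacks term
        else:
            d += 1          # other category, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_features(docs, labels, top_n):
    """Keep the top_n terms ranked by their best chi-square score over all categories."""
    vocab = {t for tokens in docs for t in tokens}
    cats = set(labels)
    score = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:top_n]

# Hypothetical toy corpus: "report" appears everywhere and carries no topic signal.
docs = [["grid", "power", "report"], ["power", "outage", "report"],
        ["invoice", "payment", "report"], ["payment", "tax", "report"]]
labels = ["ops", "ops", "finance", "finance"]

print(select_features(docs, labels, top_n=2))
```

Terms that occur in every document score zero and fall to the bottom of the ranking, while terms concentrated in one category rise to the top.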

Abstract

The invention discloses a correction-based K-nearest-neighbor text classification method comprising the following steps: preprocessing the texts, that is, segmenting each document in a training text set into words, removing stop words, and representing the texts as feature items (terms); selecting text features, that is, reducing the dimensionality of the text vectors by selecting document features that are as few as possible and closely related to the topic of the document; and finally, constructing a classifier with a correction-based K-nearest-neighbor text classification algorithm and using it to obtain the classification result. The method produces accurate classification results.

Description

Technical field

[0001] The invention belongs to the field of electronic resource information classification and retrieval and relates to a method for classifying and managing unstructured text, in particular to a correction-based K-nearest-neighbor text classification method.

Background

[0002] In recent years, with the rapid development of information technology, especially the spread of the Internet and the large-scale use of databases, the volume of electronic resource information on the Internet has grown dramatically. Given this explosion and diversification of information, how to organize and manage such massive amounts of information effectively, and how to obtain quickly and accurately the information one needs and is genuinely interested in, has become a major problem. As a key technology for organizing and processing large amounts of electronic resource information, text classification technology helps information retrieval and analysis, and facil...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 曹杰, 伍之昂, 王有权, 方仓健
Owner: INFORMATION & COMM BRANCH OF STATE GRID JIANGSU ELECTRIC POWER