Method for automatically classifying documents based on K nearest neighbor algorithm under power cloud environment

Automatic classification and nearest-neighbor technology, applied in computing, electrical digital data processing, instruments, etc. It addresses problems such as reduced classification performance and achieves the effect of improved execution efficiency.

Inactive Publication Date: 2011-08-10
JIANGSU ELECTRIC POWER CO
Cites: 1 | Cited by: 13

AI Technical Summary

Problems solved by technology

Although there are improved algorithms, most of them are



Embodiment Construction

[0014] The present invention provides an automatic document classification method based on the K-nearest neighbor algorithm in a power cloud environment. The method improves on the MapReduce programming framework of cloud computing: the Map function computes document similarity, and the Reduce function selects the K samples with the highest similarity, tallies the weight of each category to which those nearest neighbors belong, and outputs the category with the largest weight. The specific content includes:
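The Map/Reduce split described in [0014] can be illustrated with a minimal, single-process sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: cosine similarity as the similarity measure, the similarity value itself as each neighbor's weight, sparse term-weight dictionaries as document vectors, and the function names `map_similarity`, `reduce_knn` and `classify`. In a real MapReduce job the framework, not `classify`, would group the Map output by test document.

```python
import heapq
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_similarity(test_id, test_vec, training_split):
    """Map step: emit (test_id, (similarity, label)) for each training sample."""
    for label, train_vec in training_split:
        yield test_id, (cosine(test_vec, train_vec), label)


def reduce_knn(test_id, scored, k):
    """Reduce step: keep the K most similar samples, sum the weight per
    category, and return the category with the largest total weight."""
    nearest = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    weights = defaultdict(float)
    for similarity, label in nearest:
        weights[label] += similarity  # assumption: weight = similarity
    if not weights:
        return test_id, None
    return test_id, max(weights.items(), key=lambda kv: kv[1])[0]


def classify(test_id, test_vec, splits, k=3):
    """Local stand-in for the shuffle phase: gather Map output, then reduce."""
    scored = []
    for split in splits:
        scored.extend(value for _, value in map_similarity(test_id, test_vec, split))
    return reduce_knn(test_id, scored, k)[1]


# Hypothetical training data, partitioned into two splits.
splits = [
    [("outage", {"transformer": 1.0, "fault": 1.0}),
     ("billing", {"invoice": 1.0, "tariff": 1.0})],
    [("outage", {"fault": 1.0, "relay": 1.0}),
     ("billing", {"tariff": 1.0, "meter": 1.0})],
]
predicted = classify("doc-1", {"fault": 1.0, "transformer": 1.0}, splits, k=3)
```

Because each Map call only needs the test vector and one split of the training set, the similarity computation can run on separate nodes; only the compact `(similarity, label)` pairs have to reach the Reduce step.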

[0015] The metadata in the power system information base is used to construct a feature word dictionary, a stop-word set and a concept set specific to the power industry. The training set documents are then structured and a model is established: useless and general stop words are removed according to the stop-word set; each document is segmented according to the feature word dictionary; the same concept with different expressio...
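The stop-word removal and dictionary-based segmentation steps in [0015] might look roughly like the sketch below. The segmentation strategy (forward maximum matching), the function names `segment` and `to_term_counts`, and the sample dictionary entries are all assumptions for illustration; the patent text available here does not specify them. For simplicity the sketch filters stop words after segmenting rather than before, and it does not cover the concept-set step, which is cut off in the source.

```python
from collections import Counter


def segment(text, dictionary):
    """Forward maximum matching (assumed strategy): at each position take the
    longest dictionary entry that matches, falling back to a single character."""
    max_len = max(map(len, dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def to_term_counts(text, dictionary, stopwords):
    """Segment a document, drop stop words and whitespace, count the terms."""
    return Counter(
        token for token in segment(text, dictionary)
        if token not in stopwords and not token.isspace()
    )


# Hypothetical dictionary and stop-word set.
feature_words = {"变压器", "故障"}   # "transformer", "fault"
stopwords = {"的"}                   # a common function word
counts = to_term_counts("变压器的故障变压器", feature_words, stopwords)
```

The resulting term counts are one possible input for building the document vectors that a similarity computation would compare.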


Abstract

The invention discloses a method for automatically classifying documents based on the K-nearest neighbor algorithm in a power cloud environment. The method improves on the MapReduce programming framework of cloud computing: the Map function computes document similarity, and the Reduce function selects the K samples with the highest similarity, tallies the weights of the categories to which those nearest neighbors belong, and outputs the category with the largest weight, thereby classifying the documents automatically. The method can complete the classification of a large number of documents quickly, so the execution time of the classification task is greatly shortened and classification efficiency is improved; the method is also robust.

Description

Technical Field

[0001] The invention belongs to the field of cloud computing and data mining and relates to a document classification method for an electric power company, in particular to an automatic document classification method based on the K-nearest neighbor algorithm in an electric power cloud environment.

Background Technique

[0002] Automatic document classification uses natural language processing, data mining and artificial intelligence techniques so that a program, after a certain amount of training, can identify and classify documents automatically. It has important applications in large-scale data processing.

[0003] The traditional K-nearest neighbor algorithm has been widely used in automatic document classification because it is simple and effective. However, because of its high computational complexity and poor scalability, when the number of power company documents increases sharply, if ...


Application Information

IPC(8): G06F17/30
Inventor: 赵俊峰, 王磊, 祁建
Owner JIANGSU ELECTRIC POWER CO