Document classification method and system

A document classification method and system, applied in the computer field, that addresses the low system performance of conventional approaches, improving classification efficiency and system performance while avoiding the limitations of conventional system frameworks.

Inactive Publication Date: 2014-12-24
INSPUR BEIJING ELECTRONICS INFORMATION IND
4 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a document classification method a

Example Embodiment

[0035] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

[0036] It should be noted that, provided there is no conflict, the embodiments of the present invention and the various features in the embodiments can be combined with each other, and all such combinations fall within the protection scope of the present invention. In addition, although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in a different order from the one shown here.

[0037] In the embodiment of the present in...

Abstract

The invention discloses a document classification method and system applied to a Hadoop cluster comprising a Map program and a Reduce program. The method comprises the following steps: the Map program parses a training document and a document to be classified, determines feature attributes according to the parsing result, and divides the feature attributes; the Map program generates a classifier according to the feature attributes of the training document and the classification result of the training document; and the Reduce program uses the classifier to classify the document to be classified, obtaining its classification result. The method and system make full use of the distributed nature of the Hadoop cluster and avoid the limitations of a conventional system framework. Because processing is concurrent and fast, massive numbers of documents can be classified rapidly, which saves classification time and improves both document classification efficiency and system performance.
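The division of work between the Map and Reduce programs described in the abstract can be illustrated with a minimal single-process sketch. It makes several assumptions that are not in the source: the classifier is a multinomial naive Bayes model with Laplace smoothing (the abstract does not name a classifier), feature (characteristic) attributes are plain lowercase word tokens, and the hypothetical functions `extract_features`, `map_phase`, and `reduce_phase` stand in for the Map and Reduce programs without using any Hadoop API.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Parse a document into lowercase word tokens (assumed feature attributes)."""
    return re.findall(r"[a-z]+", text.lower())


def map_phase(training_docs, pending_docs):
    """Map step: parse both document sets and build a classifier from the training set."""
    word_counts = defaultdict(Counter)  # label -> word -> occurrence count
    doc_counts = Counter()              # label -> number of training documents
    for text, label in training_docs:
        word_counts[label].update(extract_features(text))
        doc_counts[label] += 1
    classifier = {"word_counts": word_counts, "doc_counts": doc_counts}
    # Emit (doc_id, features) pairs for the documents still to be classified.
    parsed = [(doc_id, extract_features(text)) for doc_id, text in pending_docs]
    return classifier, parsed


def reduce_phase(classifier, parsed):
    """Reduce step: score each parsed document against every label, keep the best."""
    word_counts = classifier["word_counts"]
    doc_counts = classifier["doc_counts"]
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    results = {}
    for doc_id, features in parsed:
        best_label, best_score = None, -math.inf
        for label in doc_counts:
            total_words = sum(word_counts[label].values())
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(doc_counts[label] / total_docs)
            for w in features:
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        results[doc_id] = best_label
    return results


# Toy data, invented purely for illustration.
training = [
    ("the match ended with a late goal", "sports"),
    ("the team won the league title", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
]
pending = [("d1", "a late goal won the match"), ("d2", "the bank cut interest rates")]

classifier, parsed = map_phase(training, pending)
print(reduce_phase(classifier, parsed))
```

On this toy data the sketch assigns `d1` to `sports` and `d2` to `finance`. In an actual Hadoop job the parsing and training would be spread across many Map tasks and the classification across Reduce tasks; the sketch only mirrors the order of the steps.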

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a document classification method and system.

Background

[0002] With the increasing popularity of network technology, the amount of data on the network has grown dramatically, and the types of applications have become very diverse. Data mining technology makes full use of existing information resources to find hidden knowledge in large amounts of data, and is a promising direction of development. Data mining involves fields such as machine learning, pattern recognition, statistics, intelligent databases, data visualization, and high-performance computing. Its purpose is to discover hidden, novel, and interesting relationships and patterns in large amounts of data. Document classification is an important branch of data mining.

[0003] In the prior art, traditional system frameworks are usually used for document classification, which will lead to long ...

Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 宗栋瑞, 郭美思, 吴楠
Owner: INSPUR BEIJING ELECTRONICS INFORMATION IND