Method and device for document classification and for generating a support vector machine model

A support vector machine and document classification technology, applied in fields such as computer components, character and pattern recognition, and special-purpose data processing applications. It addresses problems such as inaccurate, unsatisfactory document classification results and low efficiency, and achieves high classification accuracy.

Active Publication Date: 2013-05-15
新浪技术(中国)有限公司

AI Technical Summary

Problems solved by technology

[0042] The inventors of the present invention have found that prior-art automatic document classification methods can classify documents whose categories sit at a single level, but they are not suitable for documents with multi-level categories: the classification results are imprecise and unsatisfactory. As a result, documents with multi-level categories, such as news documents, are still classified manually, which imposes a heavy workload on staff and is inefficient.



Embodiment Construction

[0074] To make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and preferred embodiments. Note, however, that many of the details listed in the specification are intended only to give readers a thorough understanding of one or more aspects of the present invention, and these aspects can be implemented even without these specific details.

[0075] As used herein, terms such as "module" and "system" are intended to include computer-related entities such as, but not limited to, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a module may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. For example, both an applicatio...


Abstract

The invention discloses a method and a device for document classification and for generating a support vector machine model. The method determines the category of a document to be classified from the feature vector of that document and from a support vector machine model generated from a training set that has undergone category flattening. Category flattening of the training set comprises: sorting the preset categories of each training sample in the training set by category level, from top to bottom; starting from the higher-level categories, judging for each category whether one of its sub-categories is also among the sample's categories; and, if such a sub-category exists, deleting the higher-level category from the categories of the training sample. Because category flattening is performed according to the hierarchical relationship between categories, the resulting support vector machine model is suitable for classification with multi-level categories, and the classification is more accurate.
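The category flattening step described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `PARENT` mapping, the category names, and the function names are hypothetical, the hierarchy is assumed to be a tree stored as a child-to-parent dictionary, and "sub-category" is assumed to mean any descendant rather than only a direct child.

```python
from typing import Dict, List, Optional

# Hypothetical category tree: each category maps to its parent (None for a root).
PARENT: Dict[str, Optional[str]] = {
    "news": None,
    "sports": "news",
    "finance": "news",
    "football": "sports",
}


def level(category: str, parent: Dict[str, Optional[str]]) -> int:
    """Depth of a category in the tree; a root category has level 0."""
    depth = 0
    current = parent.get(category)
    while current is not None:
        depth += 1
        current = parent.get(current)
    return depth


def is_descendant(category: str, ancestor: str,
                  parent: Dict[str, Optional[str]]) -> bool:
    """True if `ancestor` lies on the path from `category` up to the root."""
    current = parent.get(category)
    while current is not None:
        if current == ancestor:
            return True
        current = parent.get(current)
    return False


def flatten_categories(categories: List[str],
                       parent: Dict[str, Optional[str]]) -> List[str]:
    """Drop every category that has a sub-category among the sample's labels."""
    # Sort from the top level down (ties broken by name for a stable order).
    ordered = sorted(set(categories), key=lambda c: (level(c, parent), c))
    kept = []
    for cat in ordered:
        # Assumption: any descendant counts as a sub-category.
        has_sub = any(is_descendant(other, cat, parent) for other in ordered)
        if not has_sub:
            kept.append(cat)
    return kept


sample_labels = ["news", "sports", "football", "finance"]
flattened = flatten_categories(sample_labels, PARENT)
print(flattened)
```

In this sketch, a sample labelled with both a category and one of its descendants keeps only the deeper label, so each remaining label names the most specific category in its branch. The flattened labels would then be used as the training targets for the support vector machine model.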

Description

Technical field

[0001] The invention relates to computer processing technology, in particular to methods and devices for document classification and support vector machine model generation.

Background art

[0002] In recent years, with the rapid development of the Internet, the document resources on the Web have grown explosively, and this document information is large in volume and complicated in content. Compared with structured information in databases, unstructured or semi-structured web document information is richer and more complicated. To make full use of these document resources and enable users to quickly and effectively find the information they need and extract potentially valuable information, it is necessary to classify these documents.

[0003] At present, the method for automatically classifying documents usually adopts a method based on a support vector machine model; the method includes: a training phase and a classificati...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
Inventor: 戴明洋
Owner: 新浪技术(中国)有限公司