Automatic file classification system

An automatic classification and text classification technology, applied in special data processing applications, instruments, electrical digital data processing and similar fields, that addresses the low accuracy of existing systems while maintaining efficiency, accuracy, and correctness.

Active Publication Date: 2011-01-05
INST OF SCI & TECHN INFORMATION OF CHINA
Cites: 4; Cited by: 22

AI Technical Summary

Problems solved by technology

[0005] The present invention addresses the low accuracy of existing automatic text classification systems. Building on the existing decision-level automatic text classification fusion model, it proposes an automatic file classification system based on multiple media (image, audio, video and text information) that can obtain classification results with higher accuracy.

Examples

Embodiment Construction

[0048] According to the above technical solutions, the present invention will be described in detail below in conjunction with the examples.

[0049] The system of the present invention is built on a Java development platform and an Oracle database. The automatic file classification system comprises: an input module, an information extraction module, a text preprocessing module, an image preprocessing module, an audio preprocessing module, a video preprocessing module, a text classification module (KNN algorithm), an image classification module (SVM algorithm), an audio classification module (GMM algorithm), a video classification module (SVM algorithm), a fusion module (D-S evidence theory algorithm), and an output module (display and printer).
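The fusion module is described as using D-S (Dempster-Shafer) evidence theory. As a rough illustration only, the sketch below applies Dempster's rule of combination to two mass functions, such as the outputs of a text classifier and an image classifier. The class name `DempsterShaferFusion`, the helper `setOf`, the category names and the mass values are all hypothetical; the source does not describe how the per-modality classifier outputs are turned into mass functions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Dempster's rule of combination; not the patent's code.
final class DempsterShaferFusion {

    /** Convenience helper to build a hypothesis set from category names. */
    static Set<String> setOf(String... categories) {
        return new HashSet<>(Arrays.asList(categories));
    }

    /**
     * Combines two mass functions. Each map assigns a mass to a non-empty
     * set of categories, and the masses in each map are assumed to sum to 1.
     */
    static Map<Set<String>, Double> combine(Map<Set<String>, Double> m1,
                                            Map<Set<String>, Double> m2) {
        Map<Set<String>, Double> combined = new HashMap<>();
        double conflict = 0.0; // mass falling on empty intersections
        for (Map.Entry<Set<String>, Double> a : m1.entrySet()) {
            for (Map.Entry<Set<String>, Double> b : m2.entrySet()) {
                Set<String> intersection = new HashSet<>(a.getKey());
                intersection.retainAll(b.getKey());
                double product = a.getValue() * b.getValue();
                if (intersection.isEmpty()) {
                    conflict += product;
                } else {
                    combined.merge(intersection, product, Double::sum);
                }
            }
        }
        if (conflict >= 1.0 - 1e-12) {
            throw new IllegalArgumentException("sources are in total conflict");
        }
        double norm = 1.0 - conflict;
        // Renormalise so the combined masses sum to 1 again.
        combined.replaceAll((hypothesis, mass) -> mass / norm);
        return combined;
    }

    public static void main(String[] args) {
        Set<String> sports = setOf("sports");
        Set<String> finance = setOf("finance");
        Set<String> undecided = setOf("sports", "finance"); // whole frame

        Map<Set<String>, Double> text = new HashMap<>();
        text.put(sports, 0.6);
        text.put(finance, 0.1);
        text.put(undecided, 0.3);

        Map<Set<String>, Double> image = new HashMap<>();
        image.put(sports, 0.5);
        image.put(finance, 0.2);
        image.put(undecided, 0.3);

        System.out.println(combine(text, image));
    }
}
```

With these illustrative masses the conflict is 0.17, so the fused masses are about 0.759 for sports, 0.133 for finance and 0.108 left undecided. Because Dempster's rule is associative, results from further modalities (video, audio) could be folded in by calling `combine` again on the fused result.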

[0050] This system is used to classify a corpus of 21,000 items, of which 6,000 are text training corpus, 5,000 are image training corpus, 3,000 are video training corpus, 3,000 are audio training corpus, and 4,000 are t...
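The text classification module is described as adopting the KNN algorithm, trained on a labelled corpus. The following is a minimal sketch of k-nearest-neighbour text classification, assuming bag-of-words term counts, cosine similarity and a simple majority vote. These choices, the class name `KnnTextSketch` and the toy training sentences are assumptions for illustration; the source does not specify the features, distance measure or value of k that the system uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical KNN text classifier sketch; not the patent's implementation.
final class KnnTextSketch {

    /** A labelled training document stored as term counts. */
    static final class Example {
        final Map<String, Integer> terms;
        final String label;

        Example(String text, String label) {
            this.terms = termCounts(text);
            this.label = label;
        }
    }

    /** Bag-of-words counts over lower-cased, whitespace-separated tokens. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity between two term-count vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Majority vote among the k training examples most similar to the query. */
    static String classify(List<Example> training, String query, int k) {
        Map<String, Integer> q = termCounts(query);
        List<Example> sorted = new ArrayList<>(training);
        // Sort by descending similarity to the query.
        sorted.sort(Comparator.comparingDouble((Example ex) -> -cosine(ex.terms, q)));
        Map<String, Integer> votes = new HashMap<>();
        for (Example ex : sorted.subList(0, Math.min(k, sorted.size()))) {
            votes.merge(ex.label, 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("empty training set"));
    }

    public static void main(String[] args) {
        List<Example> training = Arrays.asList(
                new Example("goal match team score", "sports"),
                new Example("team wins match", "sports"),
                new Example("stock market shares", "finance"),
                new Example("bank shares profit", "finance"));
        System.out.println(classify(training, "the team scored in the match", 3));
    }
}
```

A production system would more likely weight terms (for example with TF-IDF) and index the training vectors rather than sorting the full corpus for every query; the sketch keeps only the core nearest-neighbour vote.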

Abstract

The invention relates to an automatic file classification system in the field of data mining. The system comprises an input module, an information extraction module, a text preprocessing module, an image preprocessing module, a video preprocessing module, an audio preprocessing module, a text classification module, an image classification module, a video classification module, an audio classification module, a fusion module and an output module. The information extraction module extracts the text, image, video and audio information in a file. Each kind of information is preprocessed by its corresponding preprocessing module and then classified by its corresponding classification module. The fusion module comprehensively processes the classification results to obtain a final classification result. A text classification result with higher accuracy can be obtained by this method.
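The data flow in the abstract (extract, preprocess per modality, classify per modality, fuse) can be sketched as follows. This is a structural illustration under stated assumptions: `FilePipelineSketch`, `Modality`, `Branch` and `classifyFile` are hypothetical names, extracted content is represented as plain strings, and the fusion step is a simple sum of per-category scores standing in for the D-S evidence theory fusion that the source names.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the module layout; not the patent's implementation.
final class FilePipelineSketch {

    enum Modality { TEXT, IMAGE, VIDEO, AUDIO }

    /** One per-modality branch: a preprocessing step followed by a classifier. */
    static final class Branch {
        final Function<String, String> preprocess;
        final Function<String, Map<String, Double>> classify;

        Branch(Function<String, String> preprocess,
               Function<String, Map<String, Double>> classify) {
            this.preprocess = preprocess;
            this.classify = classify;
        }
    }

    /**
     * Runs each extracted part through its branch and fuses the scores.
     * Fusion here is a placeholder (summing scores per category), NOT the
     * D-S evidence theory fusion described in the source.
     */
    static String classifyFile(Map<Modality, String> extracted,
                               Map<Modality, Branch> branches) {
        Map<String, Double> fused = new HashMap<>();
        int used = 0;
        for (Map.Entry<Modality, String> part : extracted.entrySet()) {
            Branch branch = branches.get(part.getKey());
            if (branch == null) {
                continue; // no branch registered for this modality
            }
            String cleaned = branch.preprocess.apply(part.getValue());
            Map<String, Double> scores = branch.classify.apply(cleaned);
            scores.forEach((category, score) -> fused.merge(category, score, Double::sum));
            used++;
        }
        if (used == 0) {
            throw new IllegalArgumentException("no classifiable content in file");
        }
        // The category with the largest fused score is the final result.
        return fused.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        Map<Modality, Branch> branches = new EnumMap<>(Modality.class);
        // Stub classifiers that return fixed scores, for illustration only.
        branches.put(Modality.TEXT,
                new Branch(String::trim, s -> Collections.singletonMap("sports", 0.9)));
        branches.put(Modality.IMAGE,
                new Branch(s -> s, s -> Collections.singletonMap("finance", 0.4)));

        Map<Modality, String> extracted = new EnumMap<>(Modality.class);
        extracted.put(Modality.TEXT, "  match report  ");
        extracted.put(Modality.IMAGE, "image-bytes-placeholder");

        System.out.println(classifyFile(extracted, branches));
    }
}
```

Keeping each modality behind the same `Branch` shape means a file that lacks, say, audio content simply contributes no audio scores, and the fusion step works with whatever modalities are present.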

Description

Technical field

[0001] The invention relates to an automatic file classification system, which belongs to the field of data mining and is suitable for automatic resource classification, network content supervision, spam filtering, digital libraries and the like.

Background technique

[0002] Automatic file classification is an active research topic in the field of data mining. Its purpose is to train a classification function, or classifier, that maps the files to be classified to the given categories. The goal is to find faster and more accurate ways of managing textual information.

[0003] At present, a large body of research focuses on text file classification. For example, Zhang Xiaodan et al. disclosed a decision-level automatic text classification fusion method in the document "A Decision-Level Text Automatic Classification Fusion Method" (national patent, patent application number: 2009100878443), whose classificatio...

Application Information

IPC(8): G06F17/30
Inventor: 张晓丹, 乔晓东, 朱礼军, 梁冰
Owner: INST OF SCI & TECHN INFORMATION OF CHINA