An automatic archive classification method based on an extreme learning machine

An extreme learning machine applied to automatic classification, in the fields of neural learning methods, text database clustering/classification, and semantic analysis. It addresses the insufficient efficiency of archive text classification while keeping the archive dictionary low-dimensional, with the effects of improving classification accuracy and simplifying network training.

Pending Publication Date: 2019-04-05
福建南威软件有限公司

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to solve the problem of insufficient efficiency and stability of existing archive text classification by providing an automatic archive classification method based on an extreme learning machine. The method can accurately understand the archive content in the text and build an efficient, stable, lower-dimensional archive dictionary while ensuring higher classification accuracy.



Examples


Embodiment 1

[0052] Example 1: see Figure 1. The automatic archive classification method based on the extreme learning machine of the present invention mainly includes two stages: a model learning stage and a model running stage. Each stage contains four modules: a preprocessing module, a text feature extraction module, a low-level and middle-level feature fusion module, and an archive classification module based on the extreme learning machine. The text feature extraction module contains two sub-modules: the low-level feature extraction module and the middle-level feature autonomous learning module. The steps are as follows:

[0053] (1) Training sample preprocessing: normalize the text training sample set used for model learning to remove information irrelevant to the task;

[0054] (2) Low-level feature extraction of text training samples: the samples processed by the preprocessing module are sent to the low-level feature extraction module to extract the low-level features of the text. I...

Embodiment 2

[0064] Example 2: see Figure 1 and Figure 2. This embodiment further describes the pooling technique used in step (3) and step (6) of the automatic archive classification method based on the extreme learning machine. The process is as follows:

[0065] (1) Assume that the archive file contains x words and that t words are left after the low-level feature extraction. The text is then expressed as a sequence T of t word vectors, one per remaining word, where each word vector has k-dimensional features;

[0066] (2) Divide the word vectors in the text T into N parts to form N word vector groups, each containing t / N word vectors;

[0067] (3) For each word vector group, accumulate (sum) all word vectors in the group, so that each group forms a feature vector v(z) whose dimension is also k;

[0068] (4) Concatenate the feature vectors of N word vector groups to get the feature vector of the entire document, as show...
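The pooling steps above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `pool_document` is hypothetical, word vectors are plain Python lists, and because the source does not say how a remainder is handled when t is not divisible by N, the sketch simply rejects that case.

```python
def pool_document(word_vectors, n_groups):
    """Sum-pool t word vectors (each k-dimensional) into one N*k document vector."""
    t = len(word_vectors)
    if t == 0 or n_groups <= 0:
        raise ValueError("need at least one word vector and one group")
    if t % n_groups != 0:
        # Assumption: the source only describes t / N vectors per group,
        # so uneven splits are rejected in this sketch.
        raise ValueError("t must be divisible by N in this sketch")
    size = t // n_groups
    k = len(word_vectors[0])
    document_vector = []
    for z in range(n_groups):
        group = word_vectors[z * size:(z + 1) * size]
        # v(z): element-wise sum of the group's word vectors, still k-dimensional.
        v_z = [sum(vec[d] for vec in group) for d in range(k)]
        # Concatenate the group vectors in order.
        document_vector.extend(v_z)
    return document_vector


words = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # t = 4, k = 2
print(pool_document(words, 2))  # [4.0, 6.0, 12.0, 14.0]
```

With N groups of k-dimensional sums, the document vector has a fixed length of N * k regardless of how many words the document contains.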

Embodiment 3

[0071] Example 3: see Figure 1 and Figure 3. This embodiment further details step (10) of the automatic archive classification method based on the extreme learning machine, in which the sample files to be determined are classified. The details are as follows:

[0072] The algorithm consists of the following steps:

[0073] (1) Extract the low-level features, middle-level features, and fusion features of the text samples separately;

[0074] (2) Send the three types of features to the corresponding trained archive classification models: the model based on low-level features, the model based on middle-level features, and the model based on fusion features;

[0075] (3) Add the output result vectors of the three classification models (each dimension of the vector corresponds to one of the file categories, and the value of each dimension represents the probability that the text sample belongs to the file...
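A minimal sketch of the score fusion in step (3) follows. The source text is truncated after the summation, so picking the category with the largest summed score is an assumption, as are the function name `fuse_predictions`, the category labels, and the score values.

```python
def fuse_predictions(low_scores, mid_scores, fused_scores, categories):
    """Add three per-category score vectors and return the top category."""
    lengths = {len(low_scores), len(mid_scores), len(fused_scores), len(categories)}
    if len(lengths) != 1:
        raise ValueError("all score vectors must have one entry per category")
    # Element-wise sum of the three classifier outputs.
    summed = [a + b + c for a, b, c in zip(low_scores, mid_scores, fused_scores)]
    # Assumption: the final decision is the category with the largest summed score.
    best = max(range(len(summed)), key=summed.__getitem__)
    return categories[best], summed


categories = ["personnel", "finance", "contracts"]  # hypothetical labels
label, scores = fuse_predictions(
    [0.2, 0.5, 0.3],  # model on low-level features
    [0.1, 0.6, 0.3],  # model on middle-level features
    [0.3, 0.3, 0.4],  # model on fusion features
    categories,
)
print(label)  # finance
```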


Abstract

The invention relates to an automatic archive classification method based on an extreme learning machine. The method comprises a learning stage and a running stage. The first stage requires a preprocessing module, whose main function is to standardize the data and remove information irrelevant to the task. The preprocessing module first unifies the text content into UTF-8 encoding, then filters illegal characters by regular expression matching, then performs word segmentation and part-of-speech tagging with the ICTCLAS Chinese lexical analysis system, and finally uses the Baidu stop word table to filter words that appear often in the text but are not significant to text analysis. The method can accurately understand the archive content in the text and construct an efficient, stable, low-dimensional archive dictionary while guaranteeing high classification precision.
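The preprocessing chain described in the abstract can be sketched as below. This is an illustration under several assumptions: the definition of "illegal characters" is a guess (the source does not specify one), whitespace splitting stands in for ICTCLAS and performs no part-of-speech tagging, and the one-word stop set stands in for the Baidu stop word table. The names `preprocess` and `ILLEGAL_CHARS` are hypothetical.

```python
import re

# Assumption: "illegal characters" means anything other than CJK ideographs,
# ASCII letters, digits, and whitespace. The source does not define them.
ILLEGAL_CHARS = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s]")


def preprocess(raw_bytes, source_encoding, stop_words, segment=str.split):
    # 1. Unify the encoding: decode from the source encoding into Unicode
    #    text, which can then be stored as UTF-8.
    text = raw_bytes.decode(source_encoding, errors="replace")
    # 2. Filter illegal characters by regular expression matching.
    text = ILLEGAL_CHARS.sub(" ", text)
    # 3. Segment into words. Whitespace splitting is only a placeholder for
    #    ICTCLAS, and no part-of-speech tagging is done here.
    tokens = segment(text)
    # 4. Drop stop words.
    return [tok for tok in tokens if tok not in stop_words]


sample = "档案 的 分类 方法 #@!".encode("gbk")
print(preprocess(sample, "gbk", {"的"}))  # ['档案', '分类', '方法']
```

Passing the segmenter as a parameter keeps the sketch independent of any particular lexical analysis tool; a real segmenter could be supplied through `segment`.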

Description

Technical field

[0001] The invention belongs to the technical field of text classification, and particularly relates to an automatic archive classification method based on an extreme learning machine.

Background technique

[0002] Faced with massive volumes of electronic archive information, the current management mode relies on professionals with rich archival work experience to perform manual classification and classification supervision in the archive management system. However, with the explosive growth in the number of electronic archives, manual classification consumes more and more manpower and has greatly exceeded the workload that archivists can handle. In addition, different archival professionals may classify the same archive material differently and unpredictably, which may cause inconsistencies in the classification of some archives in the long run. Therefore, classification and management of electronic archives through computer text ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/35, G06N3/04, G06N3/08, G06K9/62
CPC: G06N3/08, G06F40/289, G06F40/30, G06N3/044, G06F18/24
Inventor: 曾伟波, 张建辉, 林培煜, 潘淑英, 陈泰隆
Owner: 福建南威软件有限公司