Document classifying method based on network measure index

A document classification technology based on network metrics, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as difficult classification, the inability to extract feature value sets, and ineffective classification.

Inactive Publication Date: 2014-08-06
INFORMATION RES INST OF SHANDONG ACAD OF SCI
5 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

However, because these classification algorithms depend on topic, they cannot establish an effective high-level classification model for documents that carry no topic restriction before the domain has been determined.
This problem arises when distinguishing literary works from scientific and technological documents: it is impossible to effectively determine whether a document is a scientific and technological document, a nove...


Embodiment Construction

[0028] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0029] Figure 1 shows the principle of the document classification method based on network metrics of the present invention, which comprises a sample training stage and a document classification stage. The underlying principle is that the wording characteristics of scientific literature, novels, and prose give their feature networks different network metrics. In the sample training stage, samples of known type are used to obtain the characteristic metric ranges of each document type; in the document classification stage, the metrics of the feature network of the document to be classified are computed, and the type of the document is determined by the value range in which those metrics fall. For the entire classification method, the sample training stage ...
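The two-stage idea in paragraph [0029] can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the patent's stated procedure: the function names `train_intervals` and `classify` are hypothetical, each interval is taken to be the observed minimum and maximum of a metric over the training samples of one type, and the numeric values in the usage note are invented purely for illustration.

```python
def train_intervals(samples):
    """Derive a value range per metric for each document type.

    samples: list of (doc_type, metrics) pairs, where metrics maps a
    metric name to its value for one training document.
    Assumption: the range is simply the observed (min, max).
    """
    intervals = {}
    for doc_type, metrics in samples:
        per_type = intervals.setdefault(doc_type, {})
        for name, value in metrics.items():
            low, high = per_type.get(name, (value, value))
            per_type[name] = (min(low, value), max(high, value))
    return intervals


def classify(metrics, intervals):
    """Return every document type whose ranges contain all of the metrics."""
    return [
        doc_type
        for doc_type, ranges in intervals.items()
        if all(low <= metrics[name] <= high
               for name, (low, high) in ranges.items())
    ]


# Invented example values, for illustration only.
training = [
    ("scientific", {"avg_degree": 4.0, "clustering": 0.30}),
    ("scientific", {"avg_degree": 5.0, "clustering": 0.40}),
    ("novel", {"avg_degree": 8.0, "clustering": 0.60}),
    ("novel", {"avg_degree": 9.0, "clustering": 0.70}),
]
ranges = train_intervals(training)
print(classify({"avg_degree": 4.5, "clustering": 0.35}, ranges))
```

Returning a list rather than a single label makes overlapping or empty matches visible; how the patent resolves such cases is not stated in the text available here.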


Abstract

The invention relates to a document classification method based on network metrics. The method comprises a sample training phase and a document classification phase. The sample training phase comprises twelve steps: (1) collecting samples, (2) segmenting the text, (3) analyzing parts of speech, (4) removing function words and names, (5) counting word frequencies, (6) establishing the feature set Vd, (7) establishing the vertices of the feature network, (8) establishing the edges of the feature network, (9) calculating the average degree, (10) calculating the clustering coefficient, (11) calculating the characteristic path length, and (12) obtaining the network metric intervals. The document classification phase comprises two steps: (1) processing the document to be classified and (2) determining its class. The method classifies accurately and efficiently, solves the problem that existing classification methods cannot distinguish scientific and technical literature, novels, and prose, and lays a methodological and theoretical foundation for automatically distinguishing these three document types.
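The three metrics named in the abstract (average degree, clustering coefficient, characteristic path length) can be sketched in plain Python. This is only an illustration under stated assumptions, not the patent's procedure: the abstract does not say how vertices and edges are chosen, so the sketch assumes the feature set is the `top_k` most frequent words and that two feature words are linked when they co-occur in a sentence. All function names and the `top_k` parameter are hypothetical, and the input is assumed to be already segmented and filtered.

```python
from collections import Counter, deque
from itertools import combinations


def build_feature_network(sentences, top_k=50):
    """Build an undirected network as {word: set(neighbours)}.

    Assumptions (not from the source): features are the top_k most
    frequent words; an edge joins two features sharing a sentence.
    """
    freq = Counter(w for s in sentences for w in s)
    features = {w for w, _ in freq.most_common(top_k)}
    adj = {w: set() for w in features}
    for s in sentences:
        for a, b in combinations(sorted(set(s) & features), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_degree(adj):
    """Mean number of neighbours per vertex."""
    return sum(len(n) for n in adj.values()) / len(adj) if adj else 0.0


def clustering_coefficient(adj):
    """Mean local clustering coefficient; degree < 2 counts as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def characteristic_path_length(adj):
    """Mean shortest-path length over connected vertex pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


net = build_feature_network([["a", "b", "c"], ["c", "d"]])
print(average_degree(net), clustering_coefficient(net),
      characteristic_path_length(net))
```

Unreachable pairs are skipped when averaging path lengths, which is one common convention for disconnected networks; the source does not say which convention the method uses.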

Description

Technical field

[0001] The present invention relates to a document classification method based on network metrics and, more specifically, to a method that distinguishes document types by the differing metrics of the feature networks determined by the wording characteristics of different documents.

Background

[0002] With the development and progress of Internet technology, the document resources available on the network are constantly growing. They include literary works such as novels and essays, which enrich people's spiritual life, and scientific and technological documents, which provide knowledge and lay the foundation for scientific research. These crystallizations of science and technology are the precious wealth of human civilization. However, with the advent of the era of big data, the exponential growth of massive resources poses challenges for the effective organization and managemen...


Application Information

IPC(8): G06F17/30
CPC G06F16/353
Inventor 魏墨济, 杨子江, 朱世伟, 于俊凤, 李晨, 蔡斌雷, 王蕾, 冯海洲, 王彦
Owner INFORMATION RES INST OF SHANDONG ACAD OF SCI