Method for automatically classifying academic documents

A technology for the automatic classification of documents, applied in the fields of instruments, computing, and electrical digital data processing.

Active Publication Date: 2010-09-01
山西同方知网数字出版技术有限公司

AI Technical Summary

Problems solved by technology

[0013] 1. The acquisition of classification numbers relies mainly on manual review by editors, which consumes considerable manpower and material resources and is inefficient; a large number of people...


Examples


Embodiment Construction

[0037] In order to make the purpose, technical solutions and advantages of the present invention clearer, the implementation of the present invention will be further described in detail below in conjunction with the accompanying drawings:

[0038] This embodiment provides a method for the automatic classification of academic documents. The implementation process is shown in Figure 3, and the method includes the following steps:

[0039] Step 10: Enter new thesis resources.

[0040] Step 20: An automatic classifier classifies all documents.

[0041] Step 30: Judge whether the automatic classification result is a high-accuracy result.

[0042] If it is not a high-accuracy result, go to step 40; otherwise go to step 50.

[0043] Step 40: Classify the document manually.

[0044] Step 50: The document enters network inspection directly.

[0045] If the classification number is found to be correct on inspection, go to step 60; otherwise go to step 70.

[0046] Step 60 submi...
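The routing in steps 20 to 50 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `Routed` and `route_document`, the `classifier` callable returning a class number with a confidence value, and the threshold of 0.8 are all assumptions introduced here. Steps 60 and 70 are left out because their text is not available above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Routed:
    """Outcome of routing one document (hypothetical structure)."""
    doc_id: str
    class_number: str
    route: str  # "inspection" (step 50) or "manual" (step 40)


def route_document(
    doc_id: str,
    text: str,
    classifier: Callable[[str], Tuple[str, float]],
    threshold: float = 0.8,  # assumed cut-off for a "high-accuracy" result
) -> Routed:
    # Step 20: the automatic classifier proposes a class number
    # together with a confidence value.
    class_number, confidence = classifier(text)

    # Step 30: decide whether the result counts as high accuracy.
    if confidence >= threshold:
        # Step 50: high-accuracy results go straight to inspection.
        return Routed(doc_id, class_number, "inspection")

    # Step 40: everything else is handed to manual classification.
    return Routed(doc_id, class_number, "manual")
```

Any function that maps a text to a `(class_number, confidence)` pair can be passed as `classifier`, so the routing logic stays independent of how the classifier is built.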


Abstract

The invention discloses a method for automatically classifying academic documents, which comprises the following steps: inputting training documents into a database, wherein the training documents comprise document classification numbers; selecting unitary characteristic words and binary characteristic words, and generating binary word pairs for the training documents; reading the training documents in the database, and respectively calculating the probability relations between the unitary and binary characteristic words and the document classification numbers, thereby forming a unitary classification dictionary and a binary classification dictionary; reading a document to be labelled, calculating the Chinese library classification number corresponding to the document according to the weight of the unitary and binary classification dictionaries, and the unitary and binary characteristic words in the document to be labelled, and automatically labelling; and dividing the unitary and binary classification result into a high-accuracy result set and a low-accuracy result set according to the degree of confidence, and outputting the classification result.
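The pipeline described in the abstract can be illustrated with a small Python sketch. Several details are assumptions, since the abstract does not give formulas: every word is treated as a unitary characteristic word (no feature selection), binary word pairs are unordered pairs of distinct words in a document, the probability relation is estimated as the relative frequency of a class number among training documents containing the feature, scores are combined as a weighted sum, and confidence is the winning share of the total score. The weights `w1`, `w2` and the threshold are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def unary_features(words):
    """Unitary characteristic words: here simply the distinct words."""
    return set(words)


def binary_features(words):
    """Binary word pairs: unordered pairs of distinct words in the document."""
    return {tuple(sorted(pair)) for pair in combinations(set(words), 2)}


def build_dictionary(training_docs, featurize):
    """Map each feature to {class_number: P(class_number | feature)}.

    training_docs is a list of (words, class_number) pairs.
    """
    counts = defaultdict(Counter)
    for words, class_number in training_docs:
        for feature in featurize(words):
            counts[feature][class_number] += 1
    return {
        feature: {c: n / sum(per_class.values()) for c, n in per_class.items()}
        for feature, per_class in counts.items()
    }


def classify(words, unary_dict, binary_dict, w1=1.0, w2=2.0):
    """Return (class_number, confidence) from a weighted sum of both dictionaries."""
    scores = Counter()
    sources = (
        (unary_dict, unary_features(words), w1),
        (binary_dict, binary_features(words), w2),
    )
    for dictionary, features, weight in sources:
        for feature in features:
            for class_number, p in dictionary.get(feature, {}).items():
                scores[class_number] += weight * p
    if not scores:
        return None, 0.0  # no known feature: nothing to label
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best, best_score / sum(scores.values())


def split_by_confidence(results, threshold=0.8):
    """Split (doc_id, class_number, confidence) tuples into high and low sets."""
    high = [r for r in results if r[2] >= threshold]
    low = [r for r in results if r[2] < threshold]
    return high, low
```

A usage example would train both dictionaries on the same labelled documents, for instance `build_dictionary(train, unary_features)` and `build_dictionary(train, binary_features)`, then pass them to `classify` for each document to be labelled. Giving word pairs a larger weight than single words is one plausible design choice, since a pair is usually more specific to a class than either word alone.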

Description

Technical field

[0001] The invention relates to a method for classifying academic documents, and in particular to a method for automatically classifying academic documents.

Background technology

[0002] With the development of information technology, Internet data and resources have grown to a massive scale. To manage and use this distributed mass of information effectively, content-based information retrieval and data mining have become areas of growing interest. Text categorization (TC) is an important foundation for information retrieval and text mining; its main task is to determine the category of a text from its content, given a predefined set of category labels.

[0003] Text classification has a wide range of applications in natural language processing and understanding, information organization and management, content filtering, and other fields. The text classification method...


Application Information

IPC(8): G06F17/30
Inventor: 张振海, 罗霄
Owner: 山西同方知网数字出版技术有限公司