Method for classifying documents in mass document library

A document classification technology for document libraries, applied in the computer field. It addresses the problem that document classification is time-consuming and complex, and it improves efficiency by reducing the number of matching operations and simplifying the matching process.

Active Publication Date: 2013-04-17

IOL WUHAN INFORMATION TECH CO LTD

Cites: 2; Cited by: 27

AI Technical Summary
Problems solved by technology

[0007] The present invention aims to provide a method for classifying documents in a massive document library, to solve the problem that classifying documents in a reference library by means of term matching is complicated and time-consuming.

Embodiment Construction

[0017] Hereinafter, the present invention is described in detail with reference to the accompanying drawings and the embodiments. Referring to Figure 1, the steps of the embodiment include:

[0018] S11: Determine every keyword of all documents in the document library, and the correspondence between each keyword and each document to which it belongs;

[0019] S12: Match the keywords one by one against the term database, and assign the industry category attribute of the term matched by each keyword to that keyword in each document it belongs to;

[0020] S13: According to the correspondence, determine the industry category attribute that occurs most often among the keywords of each document;

[0021] S14: Use that most frequent industry category attribute as the classification of each document.
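
Steps S11 to S14 can be sketched in Python as follows. This is a minimal illustration and not the patent's implementation: the function name `classify_documents`, the dictionary-based term base, and the choice to count each distinct keyword once per document are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def classify_documents(doc_keywords, term_base):
    """Classify documents by their most frequent industry category.

    doc_keywords: dict mapping a document id to its list of keywords
                  (hypothetical input format).
    term_base:    dict mapping a term to its industry category
                  (a stand-in for the term database).
    """
    # S11: record which documents each keyword belongs to.
    keyword_to_docs = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            keyword_to_docs[keyword].add(doc_id)

    # S12: look each distinct keyword up once in the term base and
    # propagate the matched category to every document containing it.
    votes = defaultdict(Counter)
    for keyword, doc_ids in keyword_to_docs.items():
        category = term_base.get(keyword)
        if category is None:
            continue  # keyword is not a known term
        for doc_id in doc_ids:
            votes[doc_id][category] += 1

    # S13 and S14: the most frequent category becomes the classification.
    return {
        doc_id: counter.most_common(1)[0][0]
        for doc_id, counter in votes.items()
    }
```

Because the lookup runs once per distinct keyword rather than once per document, a keyword shared by many documents is matched only a single time. Documents with no matched keyword are simply absent from the result in this sketch.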

[0022] The present invention adopts a reverse matching idea to perform term search on documents in a reference library, that is, use all words in the reference libra...

Abstract

The invention provides a method for classifying documents in a mass document library. The method includes: determining the keywords of each document in the document library and the correspondence between each keyword and the document it belongs to; matching the keywords one by one against a term base, and assigning the industry category attribute of the term matched by each keyword to that keyword in the corresponding document; determining, according to the correspondence, the industry category attribute that occurs most often in each document; and using that most frequent industry category attribute as the category of the corresponding document. Documents in a reference library are thus subjected to term retrieval by backward matching. The term base is a set with a character sequence index structure, so string matching by dichotomy (binary search) in the term base needs at most 1 + log2 n matching calculations. The number of matches is therefore greatly reduced, the matching process is simplified, and the efficiency of document classification is improved.
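
The 1 + log2 n bound can be illustrated with a binary search over a sorted list of terms. This is a sketch under assumptions: n is taken to be the number of terms in the term base, a sorted Python list stands in for the character sequence index structure, and `find_term` is a hypothetical helper that also counts how many terms it compares against.

```python
import math


def find_term(sorted_terms, keyword):
    """Binary search for keyword in a sorted list of terms.

    Returns (index, probes): index is -1 when the keyword is absent,
    probes is the number of terms the keyword was compared against.
    """
    lo, hi = 0, len(sorted_terms) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        term = sorted_terms[mid]
        if term == keyword:
            return mid, probes
        if term < keyword:
            lo = mid + 1  # keyword sorts after the probed term
        else:
            hi = mid - 1  # keyword sorts before the probed term
    return -1, probes


def max_probes(n):
    """Worst-case number of probes for a term base of n >= 1 terms."""
    return 1 + math.floor(math.log2(n))
```

For a term base of 8 terms, `max_probes(8)` is 4, so no lookup, successful or not, probes more than 4 terms. A linear scan of the same term base could need up to 8 comparisons.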

Description

Technical field

[0001] The present invention relates to the field of computers, and in particular to a method for classifying documents in a massive document library.

Background technique

[0002] The translation reference library (hereinafter referred to as the reference library) is a document library holding a large number of auxiliary translation resources. Classifying it by industry, discipline, and field with a general similarity retrieval method requires text similarity matching calculations whose time and space costs are unbearable for the system.

[0003] By using a large term corpus to count the terms that occur in the documents of the reference library, the documents can be assigned attributes such as industry, discipline, and field, and the string pattern matching this requires costs far less computation than text similarity matching.

[0004] A large term corpus is a large...

Application Information

Patent Type & Authority: Applications (China)

IPC: IPC(8): G06F17/30

Inventor: 江潮

Owner: IOL WUHAN INFORMATION TECH CO LTD
