Storage method and system for semiautomatic extraction and structuralization of document information

A technology for semi-automatic extraction and structured storage of document information, applied in the field of information retrieval. It addresses the automatic positioning of extraction items and the structuring of specific text and tables, and it achieves convenient horizontal expansion, reasonable information organization, and efficient querying.

Active Publication Date: 2019-04-16
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] In view of the defects of the prior art, the purpose of the present invention is to provide a method and system for semi-automatic extraction and structured storage of document information, aiming to solve the problems of automatically positioning extraction items and of structuring specific text and tables.



Detailed description of embodiments

[0046] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0047] Figure 1 is a structural diagram of the whole system. As Figure 1 shows, the system includes a project management module 100, an automatic extraction algorithm module 200, an extraction item collection module 300, a user management module 400, a WebUI module 500, and a storage module 600. The WebUI module 500 supports visualization and interaction for three modules: the project management module 100, the extraction item collection module 300, and the user management module 400. The automatic extraction algorithm module 200 is the core algorithm, responsible...
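The module layout in paragraph [0047] can be written down as a small table in code. The following Python sketch is illustrative only: the `Module` class and its `web_facing` flag are hypothetical names introduced here, and only the module names, reference numerals, and the list of modules served by the WebUI come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    ref: int          # reference numeral used in Figure 1
    name: str         # module name as given in paragraph [0047]
    web_facing: bool  # hypothetical flag: True if the WebUI module 500 fronts it


# The six modules listed in paragraph [0047].
MODULES = [
    Module(100, "project management", True),
    Module(200, "automatic extraction algorithm", False),
    Module(300, "extraction item collection", True),
    Module(400, "user management", True),
    Module(500, "WebUI", False),
    Module(600, "storage", False),
]


def web_facing_modules(modules):
    """Return the reference numerals of the modules the WebUI visualizes."""
    return [m.ref for m in modules if m.web_facing]
```

Calling `web_facing_modules(MODULES)` yields the three modules that the text says the WebUI module 500 supports; the algorithm and storage modules sit behind them.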


Abstract

The invention discloses a storage method and system for semi-automatic extraction and structuring of document information, which mainly realizes semi-automatic classified extraction and structuring of document information. A Web interface provided by the WebUI module supports the project management module, the extraction item collection module, and the user management module. The project management module imports the PDF documents to be analyzed; the extraction item collection module manually calibrates and saves the information in the list of items to be extracted; the user management module manages user rights; and the storage module saves document meta-information and extraction item information. The core of the system is the automatic extraction algorithm module, which scans an uploaded document, automatically generates page predictions for the extraction items, and structures the information obtained by the extraction item collection module. The system thereby realizes semi-automatic extraction and structuring of document information, organizes the stored information more reasonably, and improves the speed and efficiency of the system.
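The semi-automatic flow in the abstract (automatic page prediction followed by manual calibration, with results kept alongside document meta-information) might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the record types, the function names, and especially the keyword-count scoring, which stands in for the patent's automatic extraction algorithm and is not taken from the source. Pages are assumed to be already available as plain text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExtractionItem:
    name: str                             # e.g. "transaction price"
    keywords: List[str]                   # hypothetical cue words for page prediction
    predicted_page: Optional[int] = None  # filled in by the automatic step
    confirmed_page: Optional[int] = None  # filled in by manual calibration
    value: Optional[str] = None           # structured value saved after calibration


@dataclass
class DocumentRecord:
    doc_id: str
    title: str                            # document meta-information
    pages: List[str]                      # assumed: text of each PDF page
    items: Dict[str, ExtractionItem] = field(default_factory=dict)


def predict_page(pages, keywords):
    """Return the 0-based index of the page with the most keyword hits, or None."""
    best_page, best_score = None, 0
    for index, text in enumerate(pages):
        score = sum(text.count(word) for word in keywords)
        if score > best_score:
            best_page, best_score = index, score
    return best_page


def auto_extract(record, item):
    """Automatic step: attach a page prediction and register the item."""
    item.predicted_page = predict_page(record.pages, item.keywords)
    record.items[item.name] = item
    return item


def calibrate(record, item_name, page, value):
    """Manual step: a user confirms the page and the extracted value."""
    item = record.items[item_name]
    item.confirmed_page = page
    item.value = value
    return item
```

In this sketch a user would call `auto_extract` to get a suggested page, inspect it in the Web interface, and then call `calibrate` to save the confirmed page and value. Keeping the predicted and confirmed pages as separate fields preserves the distinction between the automatic and manual halves of the workflow.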

Description

Technical field

[0001] The invention belongs to the technical field of information retrieval, and more specifically relates to a method and system for semi-automatic extraction and structured storage of document information.

Background

[0002] With the rapid development of the information age, demand for document information retrieval keeps growing, and in many industries the retrieval system has become an indicator of core competitiveness, most prominently in the financial field.

[0003] Taking securities companies as an example, making effective use of the large volume of publicly disclosed asset purchase reports is very important for improving competitiveness. Most listed companies disclose their information online as PDF documents that individuals or organizations can download and use. Usually, the knowledge base in the financial field adopts the following two schemes: th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06Q10/10, G06F16/951, G06F16/22, H04L29/08
CPC: G06Q10/103, H04L67/02, H04L67/06
Inventor: 李瑞轩, 熊梦婷, 李玉华, 辜希武, 刘洋, 张纯鹏, 李相臣, 苑雨萌
Owner: HUAZHONG UNIV OF SCI & TECH