
A method and a device for acquiring document information

A document information technology, applied in the field of obtaining document information. It addresses the problems that finding key information is time-consuming and labor-intensive, that extraction quality cannot be guaranteed, and that poor extraction affects later steps; it achieves a wide range of application.

Active Publication Date: 2019-04-26
DATAGRAND TECH INC
3 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

Some documents are short, with concentrated key information and a simple format and content, so the key information is relatively easy to find.
For long documents with varied formats and content, finding key information is time-consuming and laborious.
For example, a bond prospectus usually runs to hundreds of pages with a complex document structure, and its key information is often scattered across different parts of the document, which makes that information slow and laborious to find.
[0004] Some existing methods can extract information to a certain degree, but most of them rely on traditional techniques such as keyword search, text matching, and regular expressions. As a result, extraction quality is poor, which directly affects subsequent processing steps.



Examples


Detailed Description of Embodiments

[0028] To help those skilled in the art better understand the solution of the present application, the technical solution in the embodiments of the application is described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the application, not all of them. All other embodiments that persons of ordinary skill in the art obtain from the embodiments in this application without creative effort fall within the scope of protection of this application.

[0029] Furthermore, the terms "installed", "set", "provided", "connected", and "configured to" are to be interpreted broadly. For example, a connection may be a fixed connection, a detachable connection, or an integral structure; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermed...



Abstract

The invention relates to a document information extraction method and device based on sequence labeling and a learning model. The method comprises the following steps: training at least one sequence labeling algorithm model to obtain at least one offline sequence labeling algorithm model; determining the accuracy of the labeling information in each offline sequence labeling algorithm model; converting a to-be-processed document into a text document; obtaining structural format property information from the to-be-processed document; and inputting the text document and the structural format property information into the offline sequence labeling algorithm model to obtain labeling information corresponding to the document information in the document. The method extracts the key information of a document by using sequence labeling. With multi-model fusion, each kind of key information in the document can be extracted by the model that performs best for it. In addition, business rule reasoning and calculation are applied to the literal extraction results, which widens the range of application.
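The multi-model fusion idea in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: it assumes that each model emits BIO-style tags (`B-FIELD`, `I-FIELD`, `O`), that accuracy has been measured per field on held-out data, and that fusion means keeping each field only from the model with the highest accuracy for that field. The function names, the model names `crf` and `lstm`, and the field names are all hypothetical, and the structural format property information is omitted for brevity.

```python
from collections import defaultdict


def decode_bio(tokens, tags):
    """Turn BIO tags (assumed tagging scheme) into a list of (field, text) spans."""
    spans, field, buf = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if field is not None:
                spans.append((field, " ".join(buf)))
            field, buf = tag[2:], [tok]
        elif tag.startswith("I-") and field == tag[2:]:
            # Continuation of the currently open span.
            buf.append(tok)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open span.
            if field is not None:
                spans.append((field, " ".join(buf)))
            field, buf = None, []
    if field is not None:
        spans.append((field, " ".join(buf)))
    return spans


def pick_best_model(accuracy):
    """accuracy: {model: {field: score}} -> {field: model with the highest score}."""
    best = {}
    for model, per_field in accuracy.items():
        for field, score in per_field.items():
            if field not in best or score > accuracy[best[field]][field]:
                best[field] = model
    return best


def fuse(predictions, best):
    """Keep each field's spans only from the model selected for that field.

    predictions: {model: [(field, text), ...]}
    """
    fused = defaultdict(list)
    for model, spans in predictions.items():
        for field, text in spans:
            if best.get(field) == model:
                fused[field].append(text)
    return dict(fused)
```

In use, `decode_bio` would run once per model on the converted text document, `pick_best_model` would run once offline after the accuracy of each model has been determined, and `fuse` would merge the per-model spans into one result per document. Any business rule reasoning or calculation would then operate on the fused dictionary.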

Description

Technical field

[0001] The present application relates to the field of data processing, and in particular to a method and device for obtaining document information.

Background

[0002] Natural Language Processing (NLP) is, put simply, a technology that enables computers to understand human language. It has many application directions, including text classification, text clustering, abstract extraction, sentiment analysis, and text review. To a certain extent, machines can assist or even replace humans in some text-related work.

[0003] In daily work, writing, reviewing, and revising documents are common tasks, such as the custom writing and revision of contract documents, the preparation and review of bidding documents, the extraction of insurance clauses, and information extraction and analysis of securities announcements. At present, NLP technology is still lacking in text writing work, and the effect of many...


Application Information

IPC(8): G06K9/20, G06K9/62
CPC: G06V10/225, G06F18/214
Inventor 高翔王江安怡李瀚清曾彦能赵业辉杨慧宇陈运文纪达麒
Owner DATAGRAND TECH INC