
Table information extraction method, device, storage medium and electronic equipment

Table information extraction technology, applied in the field of information processing, addresses the labor and time costs of manual extraction and the reduced accuracy of extraction results for tables with complex information, and achieves the effect of improving the efficiency and accuracy of information extraction.

Active Publication Date: 2022-04-08
杭州恒生聚源信息技术有限公司 +1
12 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

At present, table information is mostly extracted manually, which requires a lot of manpower and time.
Although machine learning methods are also used to extract table information, the accuracy of the extraction results is not high for tables with more complex information.
For example, tables in the financial field have complex headers and diverse announcement types and annotation types. Each distinct extraction requirement needs its own labeled data, so the labeling workload is large, which reduces the accuracy of the final table information extraction results.



Examples


Embodiment Construction

[0074] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

[0075] The terms "first" and "second" in the description and claims of the present invention and the above drawings are used to distinguish different objects, rather than to describe a specific order. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or apparatus comprising a series of steps or units is not limited to the listed steps or unit...



Abstract

The invention discloses a table information extraction method, device, storage medium and electronic equipment. The method preprocesses the merged data in a target table to obtain a two-dimensional matrix; inputs the row and/or column data of the two-dimensional matrix into a header detection model to obtain a header detection result and determine the table style of the target table; generates a cell text sequence and a header text matrix according to the processing mode corresponding to the table style; matches the target field text against each text in the header text sequence to obtain a matching result and the cell object corresponding to the target field text; and, based on the value sequence of the cell object corresponding to each target field text, establishes index information for the row or column corresponding to the matching result, so as to extract the information of the target cell and obtain the extraction result. Through the header detection model and text matching against the header text sequence, the invention is better suited to extracting information from complex tables and improves the efficiency and accuracy of that extraction.
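The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the cell tuple layout `(row, col, rowspan, colspan, text)`, every function name, the "no numeric cell" rule that stands in for the header detection model, and the `difflib` similarity used for text matching are all assumptions made for illustration. The sketch also handles only one table style, with headers in the top rows.

```python
from difflib import SequenceMatcher


def expand_merged(cells, n_rows, n_cols):
    """Copy each merged cell's text into every position it covers,
    producing a plain two-dimensional matrix (hypothetical cell layout)."""
    matrix = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for row, col, rowspan, colspan, text in cells:
        for r in range(row, row + rowspan):
            for c in range(col, col + colspan):
                matrix[r][c] = text
    return matrix


def looks_numeric(text):
    """True if the text parses as a number after removing ',' and '%'."""
    try:
        float(text.replace(",", "").replace("%", ""))
        return True
    except ValueError:
        return False


def count_header_rows(matrix):
    """Stand-in for the header detection model (an assumption): leading
    rows that contain no numeric cell are treated as header rows."""
    n_header = 0
    for row in matrix:
        if any(looks_numeric(cell) for cell in row):
            break
        n_header += 1
    return n_header


def header_texts(matrix, n_header):
    """Join the stacked header cells of each column into one string,
    skipping repeats that come from vertically merged cells."""
    headers = []
    for c in range(len(matrix[0])):
        parts = []
        for r in range(n_header):
            text = matrix[r][c]
            if text and (not parts or parts[-1] != text):
                parts.append(text)
        headers.append(" ".join(parts))
    return headers


def best_match(target, headers):
    """Index of the header most similar to the target field text."""
    scores = [SequenceMatcher(None, target.lower(), h.lower()).ratio()
              for h in headers]
    return max(range(len(headers)), key=scores.__getitem__)


def extract(matrix, n_header, target):
    """Return the body values of the column whose header best matches."""
    col = best_match(target, header_texts(matrix, n_header))
    return [row[col] for row in matrix[n_header:]]


# A small table with a two-level header; "Item" spans two rows and
# "Revenue" spans two columns.
cells = [
    (0, 0, 2, 1, "Item"), (0, 1, 1, 2, "Revenue"),
    (1, 1, 1, 1, "FY2020"), (1, 2, 1, 1, "FY2021"),
    (2, 0, 1, 1, "Product A"), (2, 1, 1, 1, "100"), (2, 2, 1, 1, "120"),
    (3, 0, 1, 1, "Product B"), (3, 1, 1, 1, "80"), (3, 2, 1, 1, "95"),
]
matrix = expand_merged(cells, n_rows=4, n_cols=3)
n_header = count_header_rows(matrix)
print(extract(matrix, n_header, "Revenue FY2021"))
```

Copying merged text into every covered position is what lets each column carry a complete header path such as "Revenue FY2021", so a target field can be matched against a single string per column. A trained header detection model would replace `count_header_rows`, and a table with headers in the leftmost columns would need the same logic applied along the other axis.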

Description

Technical field

[0001] The present invention relates to the technical field of information processing, and in particular to a method, device, storage medium and electronic device for extracting table information.

Background art

[0002] As data volumes grow, data extraction technology is an effective means of mining target information. Among the various forms of information, tables are an important form of data representation in documents and are usually used to organize the basic information, statistical data and so on of the objects described. At present, most table information extraction is done manually, which requires a lot of manpower and time.

[0003] Although machine learning methods are also used to extract table information, for tables with more complex information the accuracy of the extraction results is not high. For example, tables in the financial field have complex headers, diverse announcement types and annotat...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/31, G06F16/25
CPC: G06F16/313, G06F16/316, G06F16/254
Inventor: 孙勇, 丁雪纯, 于业达, 顾文斌, 罗丰
Owner: 杭州恒生聚源信息技术有限公司