Method for processing key table information of image type PDF financial data

A technology for extracting tabular information from financial data. It is applicable to electronic digital data processing, natural language data processing, word processing, and related fields, and it addresses the low efficiency and error-proneness of existing data entry methods.

Status: Inactive
Publication date: 2020-04-17
海南港澳资讯产业股份有限公司

AI Technical Summary

Problems solved by technology

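However, due to the closed nature of PDF files, the commonly used data processing method can only enter the content into the database by reading it off the image, which is inefficient and error-prone.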



Examples

Specific embodiment

[0025] Specific implementation: keywords are identified in three groups. Terms such as balance sheet, profit distribution statement, cash flow statement, and equity change statement are identified as table-title keywords; terms such as assets, notes, period-end balance, and period-begin balance are used as table-header keywords; and terms such as monetary funds and trading financial assets are identified as financial-indicator keywords. When these keyword matching conditions are met, the PDF pages containing the financial keywords are extracted, which completes a preliminary screening of the pages that need to be processed. This preliminary screening reduces the workload of subsequent image recognition.
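
The screening step in [0025] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword lists contain just the examples quoted in the paragraph, the page text is assumed to be available already as plain strings, and a page is kept when any keyword from any group matches (the patent text does not spell out the exact matching condition). The names `classify_keywords` and `screen_pages` are hypothetical.

```python
# Keyword groups copied from the examples in paragraph [0025].
# A real vocabulary would be larger; these lists are illustrative only.
TITLE_KEYWORDS = (
    "balance sheet",
    "profit distribution statement",
    "cash flow statement",
    "equity change statement",
)
HEADER_KEYWORDS = ("assets", "notes", "period-end balance", "period-begin balance")
INDICATOR_KEYWORDS = ("monetary funds", "trading financial assets")


def classify_keywords(page_text):
    """Return the keywords of each group found in one page's text."""
    text = page_text.lower()
    return {
        "title": [k for k in TITLE_KEYWORDS if k in text],
        "header": [k for k in HEADER_KEYWORDS if k in text],
        "indicator": [k for k in INDICATOR_KEYWORDS if k in text],
    }


def screen_pages(pages):
    """Return the 0-based indices of pages that match at least one keyword.

    Assumption: a match in any group is enough to keep the page.
    """
    return [
        index
        for index, text in enumerate(pages)
        if any(classify_keywords(text).values())
    ]


if __name__ == "__main__":
    sample_pages = [
        "Consolidated Balance Sheet\nAssets  Notes  Period-end balance",
        "Chairman's letter to shareholders",
        "Monetary funds 1,000",
    ]
    print(screen_pages(sample_pages))
```

Only the pages returned by `screen_pages` would then be passed on to image recognition, which is where the workload reduction described above comes from.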

[0026] The present invention analyzes the features of the image tables. Financial data tables are typically long and span multiple pages, so tables must be merged across pages. The specific implementation method: use the deep...
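
The patent's own merging method is cut off in this excerpt, so the following Python sketch does not reproduce it. It shows a simple rule-based stand-in for cross-page merging: a table fragment is treated as a continuation when it sits on the next page, carries no title of its own, and has the same column count as the table before it. The fragment layout (`page`, `title`, `header`, `rows`) and the function names are assumptions made for illustration.

```python
def is_continuation(table, fragment):
    """Heuristic: untitled fragment on the next page with matching column count."""
    return (
        fragment["title"] is None
        and fragment["page"] == table["last_page"] + 1
        and all(len(row) == len(table["header"]) for row in fragment["rows"])
    )


def merge_cross_page(fragments):
    """Merge per-page table fragments (ordered by page) into whole tables."""
    merged = []
    for fragment in fragments:
        if merged and is_continuation(merged[-1], fragment):
            table = merged[-1]
            rows = fragment["rows"]
            # Drop a header row repeated at the top of the continuation page.
            if rows and rows[0] == table["header"]:
                rows = rows[1:]
            table["rows"].extend(list(row) for row in rows)
            table["last_page"] = fragment["page"]
        else:
            merged.append(
                {
                    "title": fragment["title"],
                    "header": list(fragment["header"]),
                    "rows": [list(row) for row in fragment["rows"]],
                    "first_page": fragment["page"],
                    "last_page": fragment["page"],
                }
            )
    return merged
```

A fragment that fails any of the three checks starts a new table, so unrelated tables on consecutive pages stay separate as long as each carries its own title.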

Abstract

The invention relates to the field of document processing, in particular to a method for processing key table information of image-type PDF financial data. A PDF financial data table recognition system is deployed on a cloud server. A user uploads the PDF document to be processed, and the system extracts key financial data through an OCR (optical character recognition) interface, a table layout analysis algorithm, and a deep learning algorithm. The system serializes the extraction results, outputs them as structured JSON data, calls a database interface, and stores all table information from the PDF. A financial table recognition performance evaluation system quantitatively evaluates the extracted data, and a real-time adjustment system tunes the financial data table image processing parameters according to the quantitative results, so that recognition efficiency is optimized. With this method, image-type financial document data can be parsed and extracted accurately and quickly, which adds data source channels for financial data storage.
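
As a rough illustration of the serialization and evaluation steps named in the abstract, the Python sketch below turns one extracted table into JSON and scores it against a hand-checked reference. The JSON layout, the function names, and the choice of cell-level precision and recall are all assumptions; the abstract does not specify the schema or the evaluation metric.

```python
import json


def table_to_json(title, header, rows):
    """Serialize one table as JSON, with each row keyed by the header cells."""
    records = [dict(zip(header, row)) for row in rows]
    payload = {"title": title, "header": header, "rows": records}
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese labels) readable.
    return json.dumps(payload, ensure_ascii=False, indent=2)


def cell_scores(extracted, reference):
    """Cell-level precision and recall, comparing cells by (row, column, value).

    Assumed metric for illustration; not taken from the patent.
    """
    ext = {(r, c, v) for r, row in enumerate(extracted) for c, v in enumerate(row)}
    ref = {(r, c, v) for r, row in enumerate(reference) for c, v in enumerate(row)}
    correct = len(ext & ref)
    precision = correct / len(ext) if ext else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    header = ["Item", "Period-end balance"]
    rows = [["Monetary funds", "100"], ["Trading financial assets", "50"]]
    print(table_to_json("Balance sheet", header, rows))
```

Scores of this kind are the sort of quantitative result an adjustment loop could use when tuning image processing parameters, though how the patented system does so is not described in this excerpt.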

Description

Technical field

[0001] The invention relates to the fields of data processing and image processing, in particular to a method for processing image-type financial data table information.

Background

[0002] PDF stands for Portable Document Format, a commonly used electronic file format. It is highly versatile and compatible across operating systems, and it ensures that data is not modified or altered by encoding differences during file transfer. PDF is therefore used as a mainstream format for exchanging file information. PDF files contain a large amount of data and, especially in the field of financial data processing, carry a large number of key data tables. However, due to the closed nature of PDF files, the commonly used data processing method can only enter the content into the database by reading it off the image, which is inefficient and error-prone. Therefore, ...


Application Information

IPC(8): G06F40/18, G06F40/174, G06K9/00
CPC: G06V30/43, G06V30/413
Inventors: 计璐, 杨胜
Owner: 海南港澳资讯产业股份有限公司