Document parsing method, system and device applied in big-data analysis technology

A document parsing and processing technology applied in the field of big data analysis. It addresses the problems that existing document parsing solutions restrict the acquisition channels of data sources and cannot be applied to some document formats, which reduces the compatibility and comprehensiveness of document parsing applications. It achieves high accuracy and improved application compatibility.

Active Publication Date: 2018-05-25
广东广业开元科技有限公司

AI Technical Summary

Problems solved by technology

In practice, financial data documents are saved not only in PDF format but also in other formats, such as Word and Excel. Existing document parsing solutions cannot be applied to documents in formats other than PDF, which limits the acquisition channels of data sources for document parsing and reduces the application compatibility and comprehensiveness of document parsing.


Examples

Embodiment 1

[0030] As shown in Figure 1, this embodiment provides a document parsing and processing method applied in big data analysis. The method includes the following steps:

[0031] Construct regular expression rules for financial indicators;

[0032] Obtain the start feature indicator and the end feature indicator of a financial statement;

[0033] Use the regular expression rules of the financial indicators, the start feature indicator, and the end feature indicator to locate the financial statement in documents of different formats;

[0034] After locating the data in the financial statement, record the financial data together with the name and time of the indicator corresponding to that data;

[0035] Perform unit conversion on the numerical data, then record the converted data.
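The locating and recording steps can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that a format-specific reader (not shown) has already turned the PDF, Word, or Excel document into plain text lines. The marker patterns, the indicator names, the date format, and the sample lines are all hypothetical.

```python
import re

# Hypothetical start and end feature indicators for a balance sheet.
START_MARKER = re.compile(r"balance\s+sheet", re.IGNORECASE)
END_MARKER = re.compile(r"total\s+liabilities\s+and\s+owner's\s+equity", re.IGNORECASE)

# Hypothetical reporting-date format and amount pattern.
PERIOD = re.compile(r"\d{4}-\d{2}-\d{2}")
AMOUNT = r"(-?\d[\d,]*(?:\.\d+)?)"

# Hypothetical regular expression rules, one per financial indicator.
INDICATOR_RULES = {
    "total_assets": re.compile(r"^\s*total\s+assets\s+" + AMOUNT, re.IGNORECASE),
    "total_liabilities": re.compile(r"^\s*total\s+liabilities\s+" + AMOUNT, re.IGNORECASE),
}


def locate_statement(lines):
    """Return the lines from the first start marker through the next end marker."""
    start = next((i for i, line in enumerate(lines) if START_MARKER.search(line)), None)
    if start is None:
        return []
    for end in range(start + 1, len(lines)):
        if END_MARKER.search(lines[end]):
            return lines[start:end + 1]
    return lines[start:]  # no end marker found: keep the rest of the document


def extract_records(section):
    """Record each matched value with its indicator name and reporting date."""
    period_match = next((m for m in map(PERIOD.search, section) if m), None)
    period = period_match.group(0) if period_match else None
    records = []
    for line in section:
        for name, rule in INDICATOR_RULES.items():
            match = rule.search(line)
            if match:
                value = float(match.group(1).replace(",", ""))
                records.append({"indicator": name, "period": period, "value": value})
    return records


sample = [
    "Annual report",
    "Balance Sheet 2017-12-31",
    "Total assets 1,500.00",
    "Total liabilities 600.00",
    "Total liabilities and owner's equity 1,500.00",
    "Income Statement",
]
records = extract_records(locate_statement(sample))
```

Because the rules run on plain text lines, the same locating logic can be reused for any source format once its text has been extracted, which is the compatibility benefit the method aims for.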

[0036] As a further preferred implementation of this embodiment, the step of constructing regular expression rules for financial indicators specifically includes:

[0037] Obtaining a standa...

Embodiment 2

[0063] As shown in Figure 2, this embodiment provides a document parsing and processing system applied in big data analysis. The system includes:

[0064] A construction unit, configured to construct regular expression rules for financial indicators;

[0065] An acquisition unit, configured to acquire the start feature indicator and the end feature indicator of a financial statement;

[0066] A first positioning unit, configured to use the regular expression rules of the financial indicators, the start feature indicator, and the end feature indicator to locate the financial statement in documents of different formats;

[0067] A second positioning unit, configured to record the financial data together with the corresponding indicator name and time after locating the data in the financial statement;

[0068] A conversion unit, configured to perform unit conversion on numerical data and record the converted data.
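As an illustration of what the conversion unit might do, the following Python sketch normalizes amounts reported in different units to a single base unit. The source does not list the units it handles, so the unit names, the multiplier table, and the function name `to_base_unit` are assumptions made for this example only.

```python
from decimal import Decimal

# Hypothetical unit multipliers relative to an assumed base unit of one yuan.
UNIT_FACTORS = {
    "yuan": Decimal("1"),
    "thousand yuan": Decimal("1000"),
    "ten thousand yuan": Decimal("10000"),
    "million yuan": Decimal("1000000"),
}


def to_base_unit(raw_value, unit):
    """Convert a numeric string reported in `unit` to the assumed base unit."""
    factor = UNIT_FACTORS[unit.strip().lower()]
    # Strip thousands separators before parsing; Decimal avoids float rounding.
    return Decimal(raw_value.replace(",", "")) * factor


converted = to_base_unit("1,234.5", "Ten thousand yuan")
```

`Decimal` is used here instead of `float` so that monetary values are not affected by binary rounding; this is a design choice of the sketch, not something the source specifies.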

[0069] Further as a preferred implement...

Embodiment 3

[0095] This embodiment provides a document parsing and processing device applied in big data analysis, the device comprising:

[0096] at least one processor;

[0097] at least one memory for storing at least one program;

[0098] When the at least one program is executed by the at least one processor, the at least one processor is caused to implement the steps of the document parsing and processing method applied in big data analysis as described in Embodiment 1 above.

Abstract

The invention discloses a document parsing method, system and device. The method includes: using regular expression rules of financial indicators, start feature indicators, and end feature indicators to locate financial statements in documents of different formats; after the data in the financial statements is located, recording the financial data and the corresponding indicator names and times; and performing unit conversion on numerical data, then recording the converted data. The system includes a construction unit, an acquisition unit, a first locating unit, a second locating unit, and a conversion unit. The device includes a memory and a processor. When a program is executed by the processor, the processor is caused to implement the document parsing method. With the method, system and device, the financial data in documents of different formats can be parsed quickly and accurately, and the application compatibility, comprehensiveness, accuracy, and processing efficiency of the parsing solution are improved. The document parsing method, system and device of the invention can be widely applied in the field of big data parsing technology.

Description

Technical field

[0001] The present invention relates to big data analysis technology, in particular to a document analysis processing method, system and device applied in big data analysis.

Background technique

[0002] Explanation of technical terms:

[0003] Regular expression: Use a single string to describe and match a series of strings that match a certain syntax rule.

[0004] Balance Sheet: The main accounting statement showing the financial status (that is, the status of assets, liabilities and owner's equity) of an enterprise on a certain date (usually at the end of each accounting period).

[0005] Income statement: A report that reflects the operating results of a company during a certain accounting period.

[0006] Cash flow statement: A statement that reflects the inflow and outflow of cash and cash equivalents of an enterprise during a certain accounting period.

[0007] In the field of corporate financial big data analysis, the acquisition of many financial...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06Q40/00
CPC: G06F16/313, G06F16/90344, G06Q40/125
Inventor: 陈贤耿, 纪晓阳, 伍紫莹
Owner: 广东广业开元科技有限公司