Method, apparatus, device, and medium for processing semi-structured financial document data
By performing structured identification and summarization of financial sample data, a metadata dictionary is generated, which solves the problem of low utilization efficiency of semi-structured financial data, realizes standardized storage and processing of data, and improves data utilization and computer processing efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- PING AN TECH (SHENZHEN) CO LTD
- Filing Date
- 2023-05-31
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies make inefficient use of semi-structured financial data: they lack a unified storage format and processing method, which makes analysis difficult.
The method performs structured identification on financial sample data, generates a metadata dictionary, handles missing data, constructs a data model and a data processing logic library, standardizes the data, and converts semi-structured data into structured data that is stored in a database.
This improves the utilization rate and computer processing efficiency of semi-structured financial data, ensures the accuracy and integrity of the data, and simplifies data processing.
Figure CN116680443B_ABST
Description
Technical Field
[0001] This invention relates to the field of financial document processing, and in particular to a method, apparatus, equipment, and medium for processing semi-structured financial documents.
Background Technology
[0002] In the era of big data, numerous application systems based on cloud computing, such as the Internet of Things, mobile internet, and smart terminals, are constantly generating massive amounts of financial data, including single-transaction data. Single-transaction data generally falls into three categories: structured data, semi-structured data, and unstructured data. The scale of single-transaction data typically exceeds the capacity of traditional databases, and existing data processing methods often fail to meet the demands of practical applications in terms of both processing efficiency and results. For structured data, relational query analysis based on SQL statements is often the most efficient method. However, in reality, semi-structured data is often tens or even hundreds of times larger than structured data, yet it lacks a unified storage format and processing methods based on relational algebra, leading to difficulties in analysis and significantly hindering the efficiency of semi-structured data utilization. In summary, existing technologies suffer from the problem of low utilization efficiency of semi-structured financial data, such as single-transaction data.
Summary of the Invention
[0003] This invention provides a method, apparatus, device, and medium for processing semi-structured financial documents, with the main purpose of solving the problem of low utilization efficiency of semi-structured financial data in financial data such as single transactions.
[0004] To achieve the above objectives, the present invention provides a method for processing semi-structured data in financial documents, comprising:
[0005] The acquired financial sample document data is subjected to structured identification to obtain semi-structured financial document data;
[0006] The semi-structured financial document data is summarized and processed to obtain a metadata dictionary;
[0007] The missing data is processed in the metadata dictionary to obtain metadata, and a data model and data processing logic library are constructed based on the metadata.
[0008] The semi-structured financial document data is standardized using the data model to obtain structured financial document data, and the structured financial document data is stored in a preset structured financial document database.
[0009] Standardized data logic is extracted from the data processing logic library, and the semi-structured financial document data is benchmarked and filled according to the data logic to obtain structured financial document data. The structured financial document data is then stored in the structured financial document database.
[0010] The financial sample document data is updated using the structured financial document data in the structured financial document database to obtain the target data file.
[0011] Optionally, the step of performing structured identification on the acquired financial sample document data to obtain semi-structured financial document data includes:
[0012] The financial sample data is parsed to obtain financial data and the field attributes and attribute values of the financial data. The field type of the field attribute is determined by the attribute value.
[0013] The test requirements are obtained based on the financial sample data, and the financial data format is determined based on the test requirements.
[0014] Determine whether the financial data is semi-structured financial document data based on the field type and the financial data format.
[0015] Optionally, the process of summarizing the semi-structured financial document data to obtain a metadata dictionary includes:
[0016] The semi-structured financial document data is classified according to data format to obtain multiple data formats;
[0017] The different values of the data format are uniformly encoded to obtain multiple encoded values;
[0018] The data format is used as a row vector, the encoded value is used as a column vector, a semi-structured financial document data table is generated based on the row vector and the column vector, and the semi-structured financial document data table is used as a metadata dictionary.
[0019] Optionally, the step of performing missing data processing on the metadata dictionary to obtain metadata includes:
[0020] Obtain data with flag bits from the metadata dictionary, and determine whether the data field corresponding to the data is missing based on the flag bits;
[0021] When the data field is missing, the degree of data loss is determined based on the flag bit;
[0022] Using a preset average sampling method, the missing data is filled in according to the degree of missing data to obtain filled data, which is then used as metadata.
[0023] Optionally, the step of constructing a data model and data processing logic library based on the metadata includes:
[0024] The metadata is categorized into various type tags, and a relational data table is constructed based on the type tags.
[0025] Obtain the data structure of the metadata, and generate a data model based on the data structure and relational data tables;
[0026] A data processing logic library is created based on the metadata using preset SQL statements.
[0027] Optionally, the standardization process of the semi-structured financial document data using the data model to obtain structured financial document data includes:
[0028] The metadata of the relational data table in the data model is numerically matched with the semi-structured financial document data to obtain the extreme values of the semi-structured financial document data. The extreme values in the semi-structured financial document data are then removed to obtain standard semi-structured financial document data.
[0029] The standard semi-structured financial document data is verified using a preset verification algorithm to obtain data attributes and data types;
[0030] Determine whether the semi-structured financial document data is usable based on the data attributes described above;
[0031] When the semi-structured financial document data is available, the semi-structured financial document data is matched with the metadata in the data model according to the data type to obtain structured financial document data.
[0032] Optionally, the step of benchmarking and filling in the semi-structured financial document data according to the data logic to obtain structured financial document data includes:
[0033] The metadata in the data logic and the semi-structured financial document data are input into a preset keyword extraction model for word segmentation to obtain target keywords and required keywords.
[0034] Determine whether there are any missing keywords in the required keywords based on the target keywords;
[0035] When there are missing keywords, the missing keywords are filled in to obtain complete keywords;
[0036] The complete keywords are aggregated to obtain structured financial document data.
[0037] To address the aforementioned problems, the present invention also provides a financial document processing device for semi-structured data, the device comprising:
[0038] The semi-structured financial document data recognition module is used to perform structured recognition on the acquired financial sample document data to obtain semi-structured financial document data.
[0039] The metadata dictionary generation module is used to summarize and process the semi-structured financial document data to obtain a metadata dictionary;
[0040] The metadata processing module is used to handle missing data in the metadata dictionary to obtain metadata, and to construct a data model and a data processing logic library based on the metadata.
[0041] The structured financial document data generation module is used to standardize the semi-structured financial document data using the data model to obtain structured financial document data, and store the structured financial document data in a preset structured financial document database.
[0042] The structured financial document data storage module is used to extract standardized data logic from the data processing logic library, perform benchmarking and value filling on the semi-structured financial document data according to the data logic to obtain structured financial document data, and store the structured financial document data in the structured financial document database.
[0043] The financial sample document data update module is used to update the financial sample document data using the structured financial document data in the structured financial document database to obtain the target data file.
[0044] To address the above problems, the present invention also provides an electronic device, the electronic device comprising:
[0045] At least one processor; and,
[0046] A memory communicatively connected to the at least one processor; wherein,
[0047] The memory stores a computer program that can be executed by the at least one processor, which enables the at least one processor to perform the financial document processing method for semi-structured data described above.
[0048] To address the aforementioned problems, the present invention also provides a computer-readable storage medium storing at least one computer program, which is executed by a processor in an electronic device to implement the aforementioned method for processing semi-structured data in financial documents.
[0049] This invention performs structured identification on financial sample document data, which yields more accurate semi-structured financial document data. Summarizing this semi-structured financial document data into a metadata dictionary presents the data clearly and thereby improves computer processing efficiency. Handling the missing data in the metadata dictionary makes the resulting data model and data processing logic library more complete. The semi-structured financial document data is then processed using the data model and data logic to convert it into structured financial document data, simplifying the data and improving its utilization rate. Therefore, the financial document processing method, apparatus, equipment, and medium proposed in this invention can solve the problem of low utilization efficiency of semi-structured financial data in financial data such as single-transaction data.
Attached Figure Description
[0050] Figure 1 is a flowchart illustrating a method for processing semi-structured data financial documents according to an embodiment of the present invention.
[0051] Figure 2 is a schematic diagram illustrating the process of performing structured identification on acquired financial sample document data to obtain semi-structured financial document data, according to an embodiment of the present invention.
[0052] Figure 3 is a flowchart illustrating the process of summarizing and processing semi-structured financial document data to obtain a metadata dictionary, as provided in an embodiment of the present invention.
[0053] Figure 4 is a functional block diagram of a financial document processing device for semi-structured data provided in an embodiment of the present invention.
[0054] Figure 5 is a schematic diagram of the structure of an electronic device that implements the financial document processing method for semi-structured data according to an embodiment of the present invention.
[0055] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0056] It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
[0057] This application provides a method for processing semi-structured financial documents. The execution entity of this method includes, but is not limited to, at least one of the following electronic devices that can be configured to execute the method provided in this application: a server, a terminal, etc. In other words, the method for processing semi-structured financial documents can be executed by software or hardware installed on a terminal device or a server device, and the software can be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster. The server can be an independent server or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
[0058] Figure 1 is a flowchart illustrating a method for processing semi-structured financial documents according to an embodiment of the present invention. In this embodiment, the method for processing semi-structured financial documents includes:
[0059] S1. Perform structured identification on the acquired financial sample document data to obtain semi-structured financial document data.
[0060] In this embodiment of the invention, the financial sample data may be materials containing relevant financial documents and data in the financial field, such as electronic transaction data, wherein the electronic transaction data contained therein needs to be structured.
[0061] As shown in Figure 2, in this embodiment of the invention, the step of performing structured recognition on the acquired financial sample document data to obtain semi-structured financial document data includes:
[0062] S21. Parse the financial sample data to obtain financial data and the field attributes and attribute values of the financial data, and determine the field type of the field attribute through the attribute values;
[0063] S22. Obtain test requirements based on the financial sample document data, and determine the financial data format based on the test requirements;
[0064] S23. Determine whether the financial data is semi-structured financial document data based on the field type and the financial data format.
[0065] In this embodiment of the invention, the test requirement refers to an item or event that can be verified by one or more test cases, such as a function, a feature, or a semi-structured element; the financial data format of the semi-structured financial document data includes formats such as JSON (JavaScript Object Notation, a data interchange format), XML (Extensible Markup Language), log files, and email; the field type includes types such as string, number, and enumeration.
[0066] In this embodiment of the invention, the field type and financial data format can be used to determine whether the data is semi-structured financial document data. When the electronic transaction data in the financial sample document data is semi-structured financial document data, the electronic transaction data needs to be extracted and filtered according to multiple financial sample document data to obtain various types of semi-structured financial document data, thereby improving the utilization rate and accuracy of the data.
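Steps S21 to S23 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `FieldType`, `detect_format`, `infer_field_type`, and `identify` are hypothetical, only the JSON and XML formats named above are detected, and a value is treated as an enumeration only when it appears in a caller-supplied set of allowed values.

```python
import json
import xml.etree.ElementTree as ET
from enum import Enum
from typing import Optional


class FieldType(Enum):
    STRING = "string"
    NUMBER = "number"
    ENUMERATION = "enumeration"


def detect_format(raw: str) -> Optional[str]:
    """Return 'json' or 'xml' when raw parses as that format, else None."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, (dict, list)):
            return "json"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    try:
        ET.fromstring(raw)
        return "xml"
    except ET.ParseError:
        return None


def infer_field_type(value, enum_values=frozenset()) -> FieldType:
    """Determine the field type from the attribute value (S21)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return FieldType.NUMBER
    if isinstance(value, str) and value in enum_values:
        return FieldType.ENUMERATION
    return FieldType.STRING


def identify(raw: str, enum_values=frozenset()):
    """Return (format, field types); format is None when raw is not semi-structured."""
    fmt = detect_format(raw)
    if fmt != "json":
        return fmt, {}
    record = json.loads(raw)
    if not isinstance(record, dict):
        return fmt, {}
    return fmt, {key: infer_field_type(val, enum_values) for key, val in record.items()}
```

A record is regarded as semi-structured in this sketch when `identify` returns a format other than `None`; the field types it returns can then feed the later summarization step.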
[0067] S2. The semi-structured financial document data is processed to obtain a metadata dictionary.
[0068] As shown in Figure 3, in this embodiment of the invention, the step of summarizing and processing the semi-structured financial document data to obtain a metadata dictionary includes:
[0069] S31. The semi-structured financial document data is classified according to data format to obtain multiple data formats;
[0070] S32. The different values of the data format are uniformly encoded to obtain multiple encoded values;
[0071] S33. Using the data format as a row vector and the encoded value as a column vector, a semi-structured financial document data table is generated based on the row vector and the column vector, and the semi-structured financial document data table is used as a metadata dictionary.
[0072] In this embodiment of the invention, the metadata dictionary is a collection of information describing semi-structured financial document data, and is a collection of definitions for all data elements used in the system.
[0073] In this embodiment of the invention, classifying semi-structured data according to data format can clearly list the data that needs to be used for electronic transactions, thereby improving the processing efficiency of the computer; uniformly encoding different values can make the encoded values more orderly and clear.
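One possible reading of steps S31 to S33 is sketched below in Python. The text does not specify the encoding scheme, so this sketch assumes that each distinct value within a data format receives a sequential integer code; the function name `build_metadata_dictionary` and the input shape (pairs of data format and value) are hypothetical.

```python
from collections import defaultdict


def build_metadata_dictionary(records):
    """Build a table with one row per data format and one code per distinct value.

    records: iterable of (data_format, value) pairs.
    """
    values_by_format = defaultdict(list)
    for data_format, value in records:
        # S31: classify by data format, keeping first-seen order of distinct values
        if value not in values_by_format[data_format]:
            values_by_format[data_format].append(value)
    # S32 and S33: encode each distinct value and arrange the codes per format
    return {
        fmt: {value: code for code, value in enumerate(values)}
        for fmt, values in values_by_format.items()
    }
```

For example, the pairs `("json", "amount")`, `("xml", "amount")`, `("json", "currency")` produce one row for `json` with two codes and one row for `xml` with one code.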
[0074] S3. Perform missing data processing on the metadata dictionary to obtain metadata, and construct a data model and data processing logic library based on the metadata.
[0075] In this embodiment of the invention, the step of performing missing data processing on the metadata dictionary to obtain metadata includes:
[0076] Obtain data with flag bits from the metadata dictionary, and determine whether the data field corresponding to the data is missing based on the flag bits;
[0077] When the data field is missing, the degree of data loss is determined based on the flag bit;
[0078] Using a preset average sampling method, the missing data is filled in according to the degree of missing data to obtain filled data, which is then used as metadata.
[0079] In this embodiment of the invention, the flag bits use 0 and 1 to represent whether the corresponding data field in the metadata dictionary is missing. For example, when the second flag bit of the data is 0, it means that the second field of the data is missing, and when the second flag bit of the data is 1, it means that the second field of the data is complete.
[0080] In this embodiment of the invention, the metadata, also known as intermediary data or relay data, is data describing structured financial document data. Artificial intelligence technology can be used to help manage and integrate it. For example, metadata collection of structured financial document data can be achieved through methods such as semantic models, classification and clustering algorithms, and automated data catalogs based on tag systems.
[0081] In this embodiment of the invention, the preset average sampling method refers to collecting the average of relevant measurement data, adjusting the number of randomly selected data to match the collected data with the current data, filling in the missing parts, and making the data complete.
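The flag-bit check and mean-based filling can be sketched as follows. This is a simplified illustration, assuming numeric fields, that the degree of missing data is the fraction of flag bits equal to 0, and that a missing value is filled with the mean of the values observed for the same field in other records; the random sampling adjustment described above is not modeled. The function names are hypothetical.

```python
from statistics import mean


def missing_degree(flags):
    """Fraction of fields whose flag bit is 0 (missing); 1 means complete."""
    return flags.count(0) / len(flags) if flags else 0.0


def fill_missing(rows, flag_rows):
    """Fill missing numeric fields with the per-field mean of observed values.

    rows: list of equal-length lists (None where a value is missing).
    flag_rows: parallel lists of 0/1 flag bits.
    """
    n_fields = len(rows[0])
    column_means = []
    for j in range(n_fields):
        observed = [row[j] for row, flags in zip(rows, flag_rows) if flags[j] == 1]
        # A field that is missing everywhere has no mean and stays None
        column_means.append(mean(observed) if observed else None)
    return [
        [value if flag == 1 else column_means[j]
         for j, (value, flag) in enumerate(zip(row, flags))]
        for row, flags in zip(rows, flag_rows)
    ]
```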
[0082] In this embodiment of the invention, the step of constructing a data model and data processing logic library based on the metadata includes:
[0083] The metadata is categorized into various type tags, and a relational data table is constructed based on the type tags.
[0084] Obtain the data structure of the metadata, and generate a data model based on the data structure and relational data tables;
[0085] A data processing logic library is created based on the metadata using preset SQL statements.
[0086] In this embodiment of the invention, the metadata can be categorized into types such as computational metadata, storage metadata, quality metadata, model metadata, and management metadata; the data model is constructed based on the metadata and describes the relational data tables; the preset SQL statements are written in SQL (Structured Query Language), a language for operating on databases; the relational data tables can be created from the type tags using preset SQL statements, for example the CREATE TABLE statement; the data processing logic library can be created using the CREATE DATABASE statement.
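As a hedged illustration of building relational tables from type tags, the following Python sketch uses the standard `sqlite3` module. The table layout (`field_name`, `field_type`) and the function name `build_logic_library` are assumptions made for this example. SQLite has no CREATE DATABASE statement (connecting creates the database), so that statement would apply to a server database instead.

```python
import sqlite3

# Type tags taken from the categories listed in the text
TYPE_TAGS = ["computational", "storage", "quality", "model", "management"]


def build_logic_library(conn, type_tags=TYPE_TAGS):
    """Create one relational table per metadata type tag."""
    for tag in type_tags:
        # Table names come from a fixed list here, never from user input
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {tag}_metadata ("
            "field_name TEXT PRIMARY KEY, field_type TEXT NOT NULL)"
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
build_logic_library(conn)
conn.execute("INSERT INTO storage_metadata VALUES (?, ?)", ("amount", "number"))
```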
[0087] In this embodiment of the invention, the degree of data loss is used to measure the data loss situation in electronic transactions, which can more quickly and accurately determine the data loss situation, point the way for the next step of data processing, and improve the adaptability of the data processing method; the use of the average value sampling method to fill in the missing data improves the reliability and efficiency of data filling and reduces the complexity of data processing.
[0088] S4. Standardize the semi-structured financial document data using the data model to obtain structured financial document data, and store the structured financial document data in a preset structured financial document database.
[0089] In this embodiment of the invention, the standardization process of the semi-structured financial document data using the data model to obtain structured financial document data includes:
[0090] The metadata of the relational data table in the data model is numerically matched with the semi-structured financial document data to obtain the extreme values of the semi-structured financial document data. The extreme values in the semi-structured financial document data are then removed to obtain standard semi-structured financial document data.
[0091] The standard semi-structured financial document data is verified using a preset verification algorithm to obtain data attributes and data types;
[0092] Determine whether the semi-structured financial document data is usable based on the data attributes described above;
[0093] When the semi-structured financial document data is available, the semi-structured financial document data is matched with the metadata in the data model according to the data type to obtain structured financial document data.
[0094] In this embodiment of the invention, the numerical matching refers to matching the numerical values of the metadata in the relational data table with the numerical values of the semi-structured financial document data to determine the upper and lower limits of the semi-structured financial document data, and then removing data that exceeds or falls below the limits.
[0095] In this embodiment of the invention, the preset verification algorithm refers to obtaining the attributes and types of the remaining semi-structured financial document data through two verification methods: attribute and data type. Common verification algorithms include parity check, checksum, CRC (cyclic redundancy check), etc. The data types include null, string, number, and date, etc.
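Of the check algorithms named above, CRC is available directly in the Python standard library. The sketch below shows a CRC-32 integrity check with `zlib.crc32`; the text does not say which algorithm is preset, so this choice and the helper names are assumptions.

```python
import zlib


def crc32_of(payload: bytes) -> int:
    """Return the CRC-32 of payload as an unsigned 32-bit integer."""
    return zlib.crc32(payload) & 0xFFFFFFFF


def verify(payload: bytes, expected_crc: int) -> bool:
    """Check that payload still matches the CRC recorded when it was stored."""
    return crc32_of(payload) == expected_crc
```

A record would typically be stored together with its CRC and re-checked with `verify` before further processing; a mismatch indicates that the payload was altered or corrupted.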
[0096] In this embodiment of the invention, the metadata in the data model corresponding to the semi-structured financial document data is found according to the data type. The data structure information of the semi-structured financial document data is matched with the data structure information of the metadata. The successfully matched metadata is used as the structured financial document data.
[0097] In this embodiment of the invention, a data model is used to standardize semi-structured financial document data. The resulting standardized data needs to be stored in a preset structured financial document database. This database contains multiple structured financial document data sets. When semi-structured data needs to be converted into structured financial document data, the structured financial document data can be obtained directly from the database, improving computer processing efficiency. In addition to being stored in the database, the standardized transaction data can also be stored in a preset transaction pool corresponding to a business node. Subsequently, standardized transaction data can be obtained from the transaction pool for data processing.
[0098] In this embodiment of the invention, extreme value removal processing is performed on the semi-structured financial document data to eliminate data interference items and improve the accuracy of data conclusions; obtaining the data attributes and types of the remaining semi-structured financial document data makes the obtained structured financial document data more complete.
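The standardization in step S4 can be sketched as three small functions: removing records outside the limits, determining data types, and matching against the model. This is a sketch under assumptions: the limits are supplied as a hypothetical `bounds` mapping, bounded fields are numeric, the data model is reduced to a mapping from field name to expected type, and all names are illustrative.

```python
from datetime import date


def remove_extremes(records, bounds):
    """Drop records whose bounded fields fall outside (lower, upper) limits."""
    kept = []
    for record in records:
        in_range = all(
            lower <= record[field] <= upper
            for field, (lower, upper) in bounds.items()
            if field in record  # bounded fields are assumed to be numeric
        )
        if in_range:
            kept.append(record)
    return kept


def data_type(value):
    """Classify a value as null, date, number, or string."""
    if value is None:
        return "null"
    if isinstance(value, date):
        return "date"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "number"
    return "string"


def to_structured(record, model):
    """Return the record restricted to model fields, or None on a type mismatch."""
    for field, expected in model.items():
        if data_type(record.get(field)) != expected:
            return None
    return {field: record[field] for field in model}
```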
[0099] S5. Extract standardized data logic from the data processing logic library, perform benchmarking and value filling on the semi-structured financial document data according to the data logic to obtain structured financial document data, and store the structured financial document data in the structured financial document database.
[0100] In this embodiment of the invention, the step of benchmarking and filling in the semi-structured financial document data according to the data logic to obtain structured financial document data includes:
[0101] The metadata in the data logic and the semi-structured financial document data are input into a preset keyword extraction model for word segmentation to obtain target keywords and required keywords.
[0102] Determine whether there are any missing keywords in the required keywords based on the target keywords;
[0103] When there are missing keywords, the missing keywords are filled in to obtain complete keywords;
[0104] The complete keywords are aggregated to obtain structured financial document data.
[0105] In this embodiment of the invention, the preset keyword extraction model includes a word segmentation network capable of performing word segmentation operations; the metadata in the data logic describes the structured financial document data, and the keywords extracted from the metadata determine whether there are missing keywords in the keywords extracted from the semi-structured financial document data. When keywords are missing, they need to be filled in to make the semi-structured financial document data complete and obtain the structured financial document data.
[0106] In this embodiment of the invention, the complete keywords can be aggregated using a preset aggregation function, such as the count() function. The structured financial document data obtained after processing the semi-structured financial document data according to the data logic also needs to be stored in the structured financial document database, or in the transaction pool of the business node corresponding to the structured financial document data, for easy reuse.
[0107] In this embodiment of the invention, the semi-structured financial document data is segmented according to the keyword extraction model, which can improve the computer processing efficiency; furthermore, the required keywords and target keywords are matched and filled to obtain structured financial document data, making the obtained structured financial document data more complete.
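The benchmarking and filling in step S5 can be illustrated with the following Python sketch. A regular expression stands in for the word segmentation network, the document is assumed to be an already parsed mapping from keyword to value, and missing keywords are filled with a placeholder because the text does not state which value is used. The names `extract_keywords` and `benchmark_and_fill` are hypothetical.

```python
import re


def extract_keywords(text):
    """Split text into lowercase keywords, removing duplicates in order."""
    return list(dict.fromkeys(re.findall(r"[a-z_]+", text.lower())))


def benchmark_and_fill(metadata_text, document, placeholder=None):
    """Compare document keywords with the metadata keywords and fill the gaps.

    Returns the completed document and the list of keywords that were missing.
    """
    target = extract_keywords(metadata_text)  # keywords the metadata expects
    missing = [keyword for keyword in target if keyword not in document]
    complete = {**document, **{keyword: placeholder for keyword in missing}}
    return complete, missing


complete, missing = benchmark_and_fill(
    "payer, payee, amount, currency", {"payer": "Alice", "amount": 100}
)
keyword_count = len(complete)  # a count-style aggregation over the complete keywords
```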
[0108] S6. Update the financial sample document data using the structured financial document data in the structured financial document database to obtain the target data file.
[0109] In this embodiment of the invention, the semi-structured financial document data in the financial sample document data is replaced with the corresponding structured financial document data using the structured financial document data in the structured financial document database, thereby updating the financial sample document data and obtaining the target data file, thus improving the utilization rate of the semi-structured financial document data in the financial sample document data.
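The replacement described in step S6 amounts to a lookup against the structured database. A minimal sketch, assuming each sample document carries a hypothetical `doc_id` key and the database is represented as a mapping from that identifier to the structured record:

```python
def update_sample(sample_documents, structured_db):
    """Replace each document that has a structured counterpart; keep the rest."""
    return [structured_db.get(doc["doc_id"], doc) for doc in sample_documents]
```

Documents without a structured counterpart are left unchanged, so the target data file always has the same number of entries as the sample.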
[0110] In this embodiment of the invention, the structured financial document data in the structured financial document database can be applied to various financial data related to the financial industry, such as electronic transaction data. The structured financial document data is used to perform structuring processing on the semi-structured financial document data in the electronic transaction data, thereby improving the utilization rate of the semi-structured financial document data.
[0111] This invention performs structured identification on financial sample document data, which yields more accurate semi-structured financial document data. Summarizing this semi-structured financial document data into a metadata dictionary presents the data clearly and thereby improves computer processing efficiency. Handling the missing data in the metadata dictionary makes the resulting data model and data processing logic library more complete. The semi-structured financial document data is then processed using the data model and data logic to convert it into structured financial document data, simplifying the data and improving its utilization rate. Therefore, the financial document processing method for semi-structured data proposed in this invention can solve the problem of low utilization efficiency of semi-structured financial data in electronic transactions and other financial data.
[0112] Figure 4 is a functional block diagram of a financial document processing device for semi-structured data provided in an embodiment of the present invention.
[0113] The semi-structured data financial document processing device 100 of this invention can be installed in an electronic device. Depending on the functions implemented, the semi-structured data financial document processing device 100 may include a semi-structured financial document data recognition module 101, a metadata dictionary generation module 102, a metadata processing module 103, a structured financial document data generation module 104, a structured financial document data storage module 105, and a financial sample document data update module 106. The module described in this invention can also be called a unit, referring to a series of computer program segments that can be executed by the processor of an electronic device and can perform a fixed function, stored in the memory of the electronic device.
[0114] In this embodiment, the functions of each module / unit are as follows:
[0115] The semi-structured financial document data recognition module 101 is used to perform structured recognition on the acquired financial sample document data to obtain semi-structured financial document data.
[0116] The metadata dictionary generation module 102 is used to summarize and process the semi-structured financial document data to obtain a metadata dictionary;
[0117] The metadata processing module 103 is used to perform missing data processing on the metadata dictionary to obtain metadata, and to construct a data model and a data processing logic library based on the metadata.
[0118] The structured financial document data generation module 104 is used to standardize the semi-structured financial document data using the data model to obtain structured financial document data, and store the structured financial document data in a preset structured financial document database.
[0119] The structured financial document data storage module 105 is used to extract standardized data logic from the data processing logic library, perform benchmarking and value filling on the semi-structured financial document data according to the data logic to obtain structured financial document data, and store the structured financial document data in the structured financial document database.
[0120] The financial sample document data update module 106 is used to update the financial sample document data using the structured financial document data in the structured financial document database to obtain the target data file.
[0121] In detail, each module in the semi-structured data financial document processing device 100 described in this embodiment of the invention uses the same technical means as the semi-structured data financial document processing method described in the accompanying drawings, and can produce the same technical effect, which will not be repeated here.
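As a rough illustration of how modules 101 to 106 could be chained, the following Python sketch wires six callables in the order of paragraphs [0115] to [0120]. The class name, the field names, the in-memory list standing in for the structured financial document database, and the stub callables in the usage lines are all assumptions made for illustration; they are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Records = List[Record]


@dataclass
class DocumentProcessingDevice:
    """Hypothetical wiring of modules 101-106; every field is a plain callable."""
    recognize: Callable[[Records], Records]                # module 101
    build_dictionary: Callable[[Records], dict]            # module 102
    build_model: Callable[[dict], Tuple[dict, dict]]       # module 103
    standardize: Callable[[Records, dict], Records]        # module 104
    fill_values: Callable[[Records, dict], Records]        # module 105
    update_samples: Callable[[Records, Records], Records]  # module 106

    def run(self, samples: Records) -> Records:
        semi = self.recognize(samples)
        dictionary = self.build_dictionary(semi)
        model, logic = self.build_model(dictionary)
        database: Records = []  # stand-in for the structured document database
        database.extend(self.standardize(semi, model))
        database.extend(self.fill_values(semi, logic))
        return self.update_samples(samples, database)


# Usage with trivial stub callables (assumed behavior, for demonstration only).
device = DocumentProcessingDevice(
    recognize=lambda s: [r for r in s if isinstance(r, dict)],
    build_dictionary=lambda semi: {"fields": sorted({k for r in semi for k in r})},
    build_model=lambda d: ({"columns": d["fields"]}, {}),
    standardize=lambda semi, m: [{c: r.get(c) for c in m["columns"]} for r in semi],
    fill_values=lambda semi, logic: [],
    update_samples=lambda samples, db: db,
)
result = device.run([{"amount": 10}, {"amount": 5, "currency": "CNY"}])
```

Keeping each module as an injected callable mirrors the statement that the module division is only a logical functional division: any stage can be replaced without touching the others.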
[0122] Figure 5 is a structural schematic of an electronic device for implementing a semi-structured data financial document processing method according to an embodiment of the present invention.
[0123] The electronic device 1 may include a processor 10, a memory 11, a communication bus 12, and a communication interface 13. It may also include a computer program stored in the memory 11 and capable of running on the processor 10, such as a financial document processing program for semi-structured data.
[0124] In some embodiments, the processor 10 may be composed of integrated circuits, such as a single packaged integrated circuit or multiple integrated circuits with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control unit of the electronic device, connecting various components of the entire electronic device through various interfaces and lines. It executes programs or modules stored in the memory 11 (e.g., financial document processing programs for semi-structured data) and calls data stored in the memory 11 to perform various functions of the electronic device and process data.
[0125] The memory 11 includes at least one type of readable storage medium, including flash memory, portable hard drive, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 11 can be an internal storage unit of an electronic device, such as a portable hard drive. In other embodiments, the memory 11 can be an external storage device of the electronic device, such as a plug-in portable hard drive, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, etc. Furthermore, the memory 11 can include both internal and external storage units of the electronic device. The memory 11 can be used not only to store application software and various types of data installed on the electronic device, such as the code of a financial document processing program for semi-structured data, but also to temporarily store data that has been output or will be output.
[0126] The communication bus 12 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This bus can be divided into an address bus, a data bus, a control bus, etc. The bus is configured to enable communication between the memory 11 and at least one processor 10, etc.
[0127] The communication interface 13 is used for communication between the aforementioned electronic device and other devices, including a network interface and a user interface. Optionally, the network interface may include a wired interface and / or a wireless interface (such as a Wi-Fi interface, Bluetooth interface, etc.), typically used to establish communication connections between the electronic device and other electronic devices. The user interface may be a display, an input unit (such as a keyboard), or, optionally, a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an OLED (Organic Light-Emitting Diode) touchscreen, etc. The display may also be appropriately referred to as a screen or display unit, used to display information processed in the electronic device and to display a visual user interface.
[0128] Figure 5 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in Figure 5 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
[0129] For example, although not shown, the electronic device may also include a power supply (such as a battery) to power the various components. Preferably, the power supply can be logically connected to the at least one processor 10 through a power management device, thereby enabling functions such as charging management, discharging management, and power consumption management. The power supply may also include one or more DC or AC power supplies, recharging devices, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components. The electronic device may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described in detail here.
[0130] It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited by this structure.
[0131] The financial document processing program for semi-structured data stored in the memory 11 of the electronic device 1 is a combination of multiple instructions. When run in the processor 10, it can achieve the following:
[0132] The acquired financial sample document data is subjected to structured identification to obtain semi-structured financial document data;
[0133] The semi-structured financial document data is summarized and processed to obtain a metadata dictionary;
[0134] The missing data is processed in the metadata dictionary to obtain metadata, and a data model and data processing logic library are constructed based on the metadata.
[0135] The semi-structured financial document data is standardized using the data model to obtain structured financial document data, and the structured financial document data is stored in a preset structured financial document database.
[0136] Standardized data logic is extracted from the data processing logic library, and the semi-structured financial document data is benchmarked and filled according to the data logic to obtain structured financial document data. The structured financial document data is then stored in the structured financial document database.
[0137] The financial sample document data is updated using the structured financial document data in the structured financial document database to obtain the target data file.
[0138] For the specific way in which the processor 10 implements the above instructions, refer to the description of the relevant steps in the embodiments corresponding to the accompanying drawings; it is not repeated here.
[0139] Furthermore, if the modules / units integrated in the electronic device 1 are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. The computer-readable storage medium can be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
[0140] The present invention also provides a computer-readable storage medium storing a computer program, which, when executed by a processor of an electronic device, can perform the following:
[0141] The acquired financial sample document data is subjected to structured identification to obtain semi-structured financial document data;
[0142] The semi-structured financial document data is summarized and processed to obtain a metadata dictionary;
[0143] The missing data is processed in the metadata dictionary to obtain metadata, and a data model and data processing logic library are constructed based on the metadata.
[0144] The semi-structured financial document data is standardized using the data model to obtain structured financial document data, and the structured financial document data is stored in a preset structured financial document database.
[0145] Standardized data logic is extracted from the data processing logic library, and the semi-structured financial document data is benchmarked and filled according to the data logic to obtain structured financial document data. The structured financial document data is then stored in the structured financial document database.
[0146] The financial sample document data is updated using the structured financial document data in the structured financial document database to obtain the target data file.
[0147] In the several embodiments provided by this invention, it should be understood that the disclosed devices, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and other division methods may be used in actual implementation.
[0148] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0149] Furthermore, the functional modules in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or in the form of hardware plus software functional modules.
[0150] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the present invention can be implemented in other specific forms without departing from the spirit or essential characteristics of the present invention.
[0151] Therefore, the embodiments should be considered exemplary and non-limiting in all respects, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be embraced within the invention. No reference numeral in the claims should be construed as limiting the scope of the claims.
[0152] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
[0153] Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. The terms "first," "second," etc., are used to indicate names and do not indicate any specific order.
[0154] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims
1. A method for processing semi-structured financial documents, characterized in that the method includes:
The acquired financial sample document data is subjected to structured identification to obtain semi-structured financial document data;
The semi-structured financial document data is classified according to data format to obtain multiple data formats;
Different values of the data formats are uniformly encoded to obtain multiple encoded values;
The data formats are used as row vectors and the encoded values are used as column vectors, a semi-structured financial document data table is generated based on the row vectors and column vectors, and the semi-structured financial document data table is used as a metadata dictionary;
The missing data is processed in the metadata dictionary to obtain metadata, and a data model and data processing logic library are constructed based on the metadata;
The semi-structured financial document data is standardized using the data model to obtain structured financial document data, and the structured financial document data is stored in a preset structured financial document database;
Standardized data logic is extracted from the data processing logic library, the semi-structured financial document data is benchmarked and filled according to the data logic to obtain structured financial document data, and the structured financial document data is stored in the structured financial document database;
The financial sample document data is updated using the structured financial document data in the structured financial document database to obtain the target data file.
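The metadata dictionary step of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the Python type name of each value stands in for its "data format", distinct values are encoded as consecutive integers in order of first appearance, and the function name `build_metadata_dictionary` is hypothetical. The claim does not fix any of these choices.

```python
from collections import defaultdict


def build_metadata_dictionary(records):
    """Return {data_format: {value: code}}.

    Each outer key is one row (a data format); each inner entry is one
    column (a distinct value of that format and its integer code).
    Values are assumed to be hashable scalars.
    """
    values_by_format = defaultdict(list)
    for record in records:
        for value in record.values():
            fmt = type(value).__name__  # assumption: type name = data format
            if value not in values_by_format[fmt]:
                values_by_format[fmt].append(value)
    return {
        fmt: {value: code for code, value in enumerate(values)}
        for fmt, values in sorted(values_by_format.items())
    }


records = [
    {"currency": "CNY", "amount": 100.0},
    {"currency": "USD", "amount": 100.0},
    {"currency": "CNY", "amount": 250.5},
]
table = build_metadata_dictionary(records)
# table == {"float": {100.0: 0, 250.5: 1}, "str": {"CNY": 0, "USD": 1}}
```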
2. The financial document processing method for semi-structured data as described in claim 1, characterized in that the process of performing structured identification on the acquired financial sample document data to obtain semi-structured financial document data includes:
The financial sample document data is parsed to obtain financial data and the field attributes and attribute values of the financial data;
The field type of the field attribute is determined by the attribute value;
The test requirements are obtained based on the financial sample document data, and the financial data format is determined based on the test requirements;
Whether the financial data is semi-structured financial document data is determined based on the field type and the financial data format.
3. The financial document processing method for semi-structured data as described in claim 1, characterized in that the step of performing missing data processing on the metadata dictionary to obtain metadata includes:
Obtain data with flag bits from the metadata dictionary, and determine whether the data field corresponding to the data is missing based on the flag bits;
When the data field is missing, the degree of data loss is determined based on the flag bit;
Using a preset average sampling method, the missing data is filled in according to the degree of missing data to obtain filled data, which is then used as metadata.
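One way to picture the missing data processing of claim 3 is the sketch below. It assumes that a flag bit of 1 marks a missing entry, that the "degree of data loss" is the fraction of flagged entries, and that the "preset average sampling method" is plain mean imputation applied only when that fraction does not exceed a threshold. The function name and the `max_missing_ratio` parameter are hypothetical; the claim does not specify them.

```python
from statistics import mean


def fill_missing(values, flags, max_missing_ratio=0.5):
    """Fill flagged entries with the mean of the observed entries.

    values: numeric list with any placeholder (e.g. None) where missing.
    flags:  list of 0/1 flag bits; 1 marks a missing entry (assumption).
    Returns (filled_values, missing_ratio).
    """
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")
    if not flags:
        return [], 0.0
    missing_ratio = sum(flags) / len(flags)  # degree of data loss
    observed = [v for v, f in zip(values, flags) if not f]
    # Leave the data untouched when nothing is missing, too much is
    # missing, or there is nothing to average over.
    if missing_ratio == 0 or missing_ratio > max_missing_ratio or not observed:
        return list(values), missing_ratio
    fill = mean(observed)
    return [fill if f else v for v, f in zip(values, flags)], missing_ratio


filled, ratio = fill_missing([10.0, None, 30.0, None], [0, 1, 0, 1])
# filled == [10.0, 20.0, 30.0, 20.0], ratio == 0.5
```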
4. The financial document processing method for semi-structured data as described in claim 1, characterized in that the step of constructing a data model and data processing logic library based on the metadata includes:
The metadata is categorized into various type tags, and a relational data table is constructed based on the type tags;
Obtain the data structure of the metadata, and generate a data model based on the data structure and relational data tables;
A data processing logic library is created based on the metadata using preset SQL statements.
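The construction in claim 4 can be illustrated with Python's standard `sqlite3` module. In this sketch the metadata is assumed to be a mapping from field name to type tag, the relational data table is a single SQLite table, and the "data processing logic library" is a small dictionary of parameterized SQL statements. The table name, the type-tag mapping, and the function name are assumptions for illustration only.

```python
import sqlite3

# Assumed mapping from metadata type tags to SQLite column types.
SQL_TYPES = {"int": "INTEGER", "float": "REAL", "str": "TEXT"}


def build_model_and_logic(metadata, table="financial_document"):
    """metadata: {field_name: type_tag}. Returns (ddl, logic_library).

    Field and table names are interpolated into SQL, so they are assumed
    to come from a trusted metadata dictionary, never from user input.
    """
    columns = ", ".join(
        f'"{name}" {SQL_TYPES[tag]}' for name, tag in metadata.items()
    )
    placeholders = ", ".join("?" for _ in metadata)
    ddl = f'CREATE TABLE "{table}" ({columns})'
    logic_library = {
        "insert": f'INSERT INTO "{table}" VALUES ({placeholders})',
        "count": f'SELECT COUNT(*) FROM "{table}"',
    }
    return ddl, logic_library


metadata = {"doc_id": "int", "currency": "str", "amount": "float"}
ddl, logic = build_model_and_logic(metadata)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)                                  # data model
conn.execute(logic["insert"], (1, "CNY", 100.0))   # row values stay parameterized
row_count = conn.execute(logic["count"]).fetchone()[0]
```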
5. The financial document processing method for semi-structured data as described in claim 1, characterized in that the standardization process of the semi-structured financial document data using the data model to obtain structured financial document data includes:
The metadata of the relational data table in the data model is numerically matched with the semi-structured financial document data to obtain the extreme values of the semi-structured financial document data;
The extreme values in the semi-structured financial document data are removed to obtain standard semi-structured financial document data;
The standard semi-structured financial document data is verified using a preset verification algorithm to obtain data attributes and data types;
Whether the semi-structured financial document data is usable is determined based on the data attributes;
When the semi-structured financial document data is usable, the semi-structured financial document data is matched with the metadata in the data model according to the data type to obtain structured financial document data.
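A simplified reading of the standardization in claim 5 is sketched below. It assumes that the data model records, for each numeric field, an expected Python type and an allowed range; a value outside the range is treated as an extreme value, and the "preset verification algorithm" is reduced to a type check. For safety the sketch checks the type before comparing against the range, which differs from the order of the claim. All names are hypothetical.

```python
def standardize(records, model):
    """Keep records whose modeled fields pass type and range checks.

    model: {field: (expected_type, low, high)}, numeric fields only.
    Fields not listed in the model are dropped from the output rows.
    """
    structured = []
    for record in records:
        row = {}
        for field, (expected_type, low, high) in model.items():
            value = record.get(field)
            # Verification: reject wrong types (bool is excluded because
            # it is a subclass of int in Python).
            if isinstance(value, bool) or not isinstance(value, expected_type):
                break
            # Extreme value removal: reject values outside the modeled range.
            if not low <= value <= high:
                break
            row[field] = value
        else:
            # Reached only when no check failed, i.e. the record is usable.
            structured.append(row)
    return structured


model = {"amount": (float, 0.0, 1_000_000.0), "quantity": (int, 1, 10_000)}
records = [
    {"amount": 120.5, "quantity": 3, "note": "ok"},
    {"amount": 9e9, "quantity": 1},    # extreme amount, removed
    {"amount": "n/a", "quantity": 2},  # fails verification, removed
]
clean = standardize(records, model)
# clean == [{"amount": 120.5, "quantity": 3}]
```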
6. The financial document processing method for semi-structured data as described in any one of claims 1 to 5, characterized in that the step of benchmarking and filling in the semi-structured financial document data according to the data logic to obtain structured financial document data includes:
The metadata in the data logic and the semi-structured financial document data are input into a preset keyword extraction model for word segmentation to obtain target keywords and required keywords;
Determine whether there are any missing keywords in the required keywords based on the target keywords;
When there are missing keywords, the missing keywords are filled in to obtain complete keywords;
The complete keywords are aggregated to obtain structured financial document data.
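The benchmarking and value filling of claim 6 can be pictured with the sketch below. It does not use a keyword extraction model: a regular expression over `key=value` pairs is swapped in as a stand-in for word segmentation. The required keywords and their default fill values are assumed to come from the metadata in the data logic, and the function name and sample text are hypothetical.

```python
import re

# Stand-in for the keyword extraction model: matches "key=value" pairs
# separated by semicolons (an assumed document layout).
PAIR_PATTERN = re.compile(r"(\w+)\s*=\s*([^;]+)")


def fill_keywords(text, required_defaults):
    """Return (complete_record, missing_keywords).

    text: semi-structured document text such as "payer=Alice; amount=100".
    required_defaults: {required_keyword: default_value_used_when_missing}.
    """
    target = {key: value.strip() for key, value in PAIR_PATTERN.findall(text)}
    missing = [key for key in required_defaults if key not in target]
    # Fill missing keywords with defaults, then aggregate into one record.
    complete = {
        key: target.get(key, default)
        for key, default in required_defaults.items()
    }
    return complete, missing


required = {"payer": "UNKNOWN", "amount": "0", "currency": "CNY"}
record, missing = fill_keywords("payer=Alice; amount=100", required)
# record == {"payer": "Alice", "amount": "100", "currency": "CNY"}
# missing == ["currency"]
```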
7. A financial document processing apparatus for semi-structured data, used to implement the financial document processing method for semi-structured data as described in any one of claims 1 to 6, characterized in that the device includes:
The semi-structured financial document data recognition module is used to perform structured recognition on the acquired financial sample document data to obtain semi-structured financial document data;
The metadata dictionary generation module is used to summarize and process the semi-structured financial document data to obtain a metadata dictionary;
The metadata processing module is used to handle missing data in the metadata dictionary to obtain metadata, and to construct a data model and a data processing logic library based on the metadata;
The structured financial document data generation module is used to standardize the semi-structured financial document data using the data model to obtain structured financial document data, and store the structured financial document data in a preset structured financial document database;
The structured financial document data storage module is used to extract standardized data logic from the data processing logic library, perform benchmarking and value filling on the semi-structured financial document data according to the data logic to obtain structured financial document data, and store the structured financial document data in the structured financial document database;
The financial sample document data update module is used to update the financial sample document data using the structured financial document data in the structured financial document database to obtain the target data file.
8. An electronic device, characterized in that the electronic device includes:
At least one processor; and
A memory communicatively connected to the at least one processor;
wherein the memory stores a computer program that can be executed by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the financial document processing method for semi-structured data as described in any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the financial document processing method for semi-structured data as described in any one of claims 1 to 6.