Semi-structured data file processing system, method, apparatus, and storage medium
The semi-structured data file processing system addresses the limitations of RPA by using a Python script to convert semi-structured data into a format that RPA can handle, enhancing its adaptability and flexibility in processing hybrid-format data.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- CHINA THREE GORGES INT CORP
- Filing Date
- 2024-06-12
- Publication Date
- 2026-06-15
AI Technical Summary
Current RPA solutions are limited to processing the simplest table format and struggle to effectively decipher and extract data from hybrid-format semi-structured tables.
A semi-structured data file processing system that integrates an RPA process automation module and a data processing module, utilizing a pre-configured Python script to process semi-structured data files, determining target execution parameters, and adjusting data formats to generate target data files that can be continuously processed by the RPA module.
Enables RPA to process semi-structured data files by converting them into a format that can be continuously processed, providing flexibility and adaptability in handling diverse data formats.
Abstract
Description
【Technical Field】 【0001】 This application relates to the technical field of data processing, and specifically to semi-structured data file processing systems, methods, apparatuses, and storage media. 【Background Art】 【0002】 RPA technology (Robotic process automation) is an application technology that realizes the automation of business processes. Based on business logic and predefined rule control, by configuring software or robots, it performs scraping and analysis of application programs and operation data, response triggering, and communication with other digital systems. RPA aims to improve work efficiency, reduce costs, and reduce human errors. 【0003】 The development of RPA technology is driven by various factors. First, enterprises are facing a large number of repetitive and low-value-added tasks that take up employees' time and resources. Second, due to the trend of digital transformation, enterprises are promoting automation solutions to improve efficiency and competitiveness. In addition, with the progress of artificial intelligence and machine learning, machines can better simulate and execute human work tasks. 【0004】 Currently, there are multiple RPA solutions that enterprises can choose and use. These solutions provide a visualized workflow designer, data scraping and processing tools, automated script creation and execution functions, and the ability to integrate with other systems. By simulating human operations, tasks such as filling in forms, copying and pasting data, and sending emails in application programs are executed. 【0005】 However, while RPA technology offers significant advantages in automating business processes, current RPA solutions only support the simplest table format and are unable to effectively decipher and extract data from hybrid-format semi-structured tables. 
[Summary of the Invention] [Problems that the invention aims to solve] 【0006】 In view of this, the present invention provides a semi-structured data file processing system, method, apparatus, and storage medium to solve the problem that current RPA only supports the simplest table format and cannot properly decode and extract data from hybrid-format semi-structured tables. [Means for solving the problem] 【0007】 In a first aspect, the present invention provides a semi-structured data file processing system that includes an RPA process automation module and a data processing module, wherein the RPA process automation module is used to acquire a semi-structured data file to be processed and to send the semi-structured data file to be processed to the data processing module, and the data processing module is used to process the semi-structured data file to be processed using a pre-configured Python script and to obtain a target data file. 【0008】 In the semi-structured data file processing system according to the present invention, the pre-configured Python script in the data processing module processes semi-structured data files that the RPA process automation module cannot process on its own, thereby obtaining a target data file that the RPA process automation module can go on to process. This gives the RPA process automation module flexibility and adaptability when handling semi-structured data files.
【0009】 In one optional embodiment, the RPA process automation module includes a first login submodule and an acquisition submodule. The first login submodule is used to obtain first login account information and first login password information, log in to a pre-configured web page based on the first login account information and first login password information, and send a first login success command to the acquisition submodule. The acquisition submodule is used to acquire, based on the first login success command, a semi-structured data file to be processed on the pre-configured web page according to pre-configured business requirements. 【0010】 In one optional embodiment, the first login submodule includes a first acquisition unit and a scraping unit, wherein the first acquisition unit is used to acquire the first login account information and first login password information and to transmit them to the scraping unit, and the scraping unit is used to scrape the pre-configured web page based on its web page tags to determine a target input box, and to input the first login account information and first login password information into the target input box to log in. 【0011】 In one optional embodiment, the acquisition submodule includes an access unit and a first decision unit, wherein the access unit, upon receiving the first login success command, is used to access the target interface and send an access success command to the first decision unit, and the first decision unit, upon receiving the access success command, is used to acquire the semi-structured data file to be processed at the target interface according to the pre-configured business requirements.
【0012】 In one optional embodiment, the data processing module includes a call submodule, a decision submodule, and a data processing submodule, wherein the call submodule is used to call a pre-configured Python script from a pre-configured script library and send the pre-configured Python script to the data processing submodule, the decision submodule is used to determine target execution parameters based on the semi-structured data file to be processed and send the target execution parameters to the data processing submodule, and the data processing submodule is used to process the semi-structured data file to be processed using the pre-configured Python script based on the target execution parameters and obtain a target data file. 【0013】 This invention determines the target execution parameters of the pre-configured Python script based on the semi-structured data file to be processed. The data processing module then executes the called pre-configured Python script based on those target execution parameters, so that semi-structured data files that the RPA process automation module cannot process are converted into target data files that it can go on to process, thereby providing flexibility and adaptability when the RPA process automation module handles semi-structured data files.
【0014】 In one optional embodiment, the decision submodule includes a second decision unit, a configuration unit, and a third decision unit, wherein the second decision unit is used to determine a file path and target account information based on the semi-structured data file to be processed, and to send the target account information to the configuration unit and the file path to the third decision unit; the configuration unit is used to set a data type and global calculation variables based on the target account information and to send the data type and global calculation variables to the third decision unit; and the third decision unit is used to determine the target execution parameters based on the file path, data type, and global calculation variables. 【0015】 In one optional embodiment, the data processing submodule includes a first processing unit, a generation unit, and a fourth decision unit, wherein the first processing unit is used to process the semi-structured data file to be processed using the pre-configured Python script based on the target execution parameters and a pre-configured first condition, obtain a first dataset, and send the first dataset to the generation unit and the fourth decision unit; the generation unit is used to generate a first data file based on the first dataset and send the first data file to the fourth decision unit; and the fourth decision unit is used to determine a second dataset based on the first dataset and determine the target data file based on the second dataset and the first data file.
【0016】 In one optional embodiment, the first processing unit includes an acquisition subunit, an adjustment subunit, a first processing subunit, and a decision subunit. The acquisition subunit processes the semi-structured data file to be processed using the pre-configured Python script based on the target execution parameters, obtains a first data subset that satisfies a pre-configured first condition and a second data subset that does not satisfy the pre-configured first condition, and sends the first data subset to the adjustment subunit and the second data subset to the first processing subunit. The adjustment subunit adjusts the data format of each data item in the first data subset that does not satisfy a pre-configured data format, obtains a first target data subset, and sends the first target data subset to the decision subunit. The first processing subunit processes the second data subset using pre-configured placeholders, obtains a second target data subset, and sends the second target data subset to the decision subunit. The decision subunit determines the first dataset based on the first target data subset and the second target data subset. 【0017】 In one optional embodiment, the fourth decision unit includes a second processing subunit, a generation subunit, and a third processing subunit, wherein the second processing subunit is used to obtain a second dataset based on the second target data subset using a pre-configured processing method and to send the second dataset to the generation subunit, the generation subunit is used to generate a second data file based on the second dataset and the first data file and to send the second data file to the third processing subunit, and the third processing subunit is used to process the second data file according to pre-configured requirements and to obtain the target data file.
【0018】 In one optional embodiment, the semi-structured data file processing system is connected to an SAP system, and the RPA process automation module is further used to receive the target data file sent from the data processing module and upload the target data file to the SAP system. 【0019】 In one optional embodiment, the RPA process automation module further includes a second login submodule and an upload submodule, the second login submodule being used to obtain second login account information and second login password information, log in to a pre-configured SAP webpage based on the second login account information and second login password information, and send a second login success command to the upload submodule, the upload submodule being used to upload the target data file to the SAP system based on the second login success command. 【0020】 In one optional embodiment, the upload submodule includes a second acquisition unit and an upload unit, the second acquisition unit being used to acquire a target transaction code and file configuration parameters and to send the target transaction code and file configuration parameters to the upload unit, and the upload unit being used to upload the target data file to the SAP system based on the target transaction code and file configuration parameters.
【0021】 In a second aspect, the present application provides a semi-structured data file processing method applicable to the semi-structured data file processing system described in the first aspect or any one of its corresponding embodiments. The method includes the steps of: obtaining a semi-structured data file to be processed; calling a pre-configured Python script from a pre-configured script library; determining target execution parameters based on the semi-structured data file to be processed; and processing the semi-structured data file to be processed using the pre-configured Python script based on the target execution parameters to obtain a target data file. 【0022】 The semi-structured data file processing method according to the present application processes a semi-structured data file using the semi-structured data file processing system described in the first aspect of the present application or any one of its corresponding embodiments. Semi-structured data files that the RPA process automation module cannot process are thereby converted into target data files that it can go on to process, which provides flexibility and adaptability when the RPA process automation module handles semi-structured data files.
【0023】 In a third aspect, the present application provides a semi-structured data file processing device for performing the semi-structured data file processing method according to the second aspect. The device includes: an acquisition module for acquiring a semi-structured data file to be processed and calling a pre-configured Python script from a pre-configured script library; a determination module for determining target execution parameters based on the semi-structured data file to be processed; and a processing module for processing the semi-structured data file to be processed using the pre-configured Python script based on the target execution parameters and obtaining a target data file. 【0024】 In a fourth aspect, the present invention provides a computer-readable storage medium that stores computer instructions for causing a computer to execute the semi-structured data file processing method according to the second aspect described above. 【0025】 In a fifth aspect, the present application provides a computer device including a memory and a processor communicatively connected to each other, wherein computer instructions are stored in the memory, and the processor executes the computer instructions to execute the semi-structured data file processing method according to the second aspect. 【0026】 To explain the specific embodiments of the present application or the technical solutions of the prior art more clearly, the following briefly describes the drawings needed for the description of the specific embodiments or the prior art. Obviously, the drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative labor.
【Brief Description of the Drawings】 【0027】 [Figure 1] It is a structural block diagram of a semi-structured data file processing system according to an embodiment of the present application. [Figure 2] It is a flowchart of a semi-structured data file processing method according to an embodiment of the present application. [Figure 3] It is a structural block diagram of a semi-structured data file processing apparatus according to an embodiment of the present application. [Figure 4] It is a structural schematic diagram of the hardware of a computer device in an embodiment of the present application. 【Modes for Carrying Out the Invention】 【0028】 To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the following clearly and completely describes the drawings of the embodiments of the present application and the technical solutions of the embodiments of the present application. Obviously, the described embodiments are some embodiments of the present application, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative labor belong to the protection scope of the present application. 【0029】 While RPA technology offers significant advantages in automating business processes, current RPA solutions only support the simplest table format and are unable to effectively decipher and extract data from hybrid-format semi-structured tables. 【0030】 For example, in the internet banking systems of multiple different banks, it is necessary to download bank transaction records (in an Excel format set by the bank) according to a specific date, import the data into a predefined Excel template, and upload it to the SAP system. When performing this task, RPA needs to use technologies such as browser page element recognition, automatic mouse clicking, keyboard input, Excel manipulation, and OCR recognition on a computer. 
An RPA robot pre-saves the pages to be operated on, the operations to be performed on those pages, and the interaction data, and then repeatedly executes the pre-configured operational process. 【0031】 When record files are downloaded from banks, each bank has its own Excel format version, and this format version may not match the Excel template supported by the SAP financial system. Generally, information can be extracted using RPA-specific Excel information extraction functions, the data format can be reorganized, and the data can then be entered into a new Excel file according to the template. However, current RPA only supports the simplest table format (i.e., a table with the first row as a header and the rows from the second row onwards as data), and if the downloaded Excel file is in a semi-structured data format, current RPA cannot properly decode and extract data from such hybrid-format semi-structured data files directly. 【0032】 In this embodiment, a semi-structured data file processing system is provided, and Figure 1 is a structural block diagram of the semi-structured data file processing system according to an embodiment of the present application. As shown in Figure 1, the semi-structured data file processing system 1 includes an RPA process automation module 11 and a data processing module 12. 【0033】 Specifically, the semi-structured data file processing system 1 is connected to the SAP system 2. The SAP system is an enterprise resource management software system. 【0034】 It is to be understood that the system may also include other devices and equipment. 【0035】 Optionally, the RPA process automation module 11 includes a first login submodule 111, an acquisition submodule 112, a second login submodule 113, and an upload submodule 114.
【0036】 The first login submodule 111 includes a first acquisition unit 1111 and a scraping unit 1112; the acquisition submodule 112 includes an access unit 1121 and a first decision unit 1122; and the upload submodule 114 includes a second acquisition unit 1141 and an upload unit 1142. 【0037】 Optionally, the data processing module 12 includes a call submodule 121, a decision submodule 122, and a data processing submodule 123. 【0038】 The decision submodule 122 includes a second decision unit 1221, a configuration unit 1222, and a third decision unit 1223, while the data processing submodule 123 includes a first processing unit 1231, a generation unit 1232, and a fourth decision unit 1233. 【0039】 Furthermore, the first processing unit 1231 includes an acquisition subunit 12311, an adjustment subunit 12312, a first processing subunit 12313, and a decision subunit 12314, while the fourth decision unit 1233 includes a second processing subunit 12331, a generation subunit 12332, and a third processing subunit 12333. 【0040】 Next, the functions of each component of the above system are explained. 【0041】 Optionally, the RPA process automation module 11 is used to acquire the semi-structured data file to be processed and to send the semi-structured data file to be processed to the data processing module 12. 【0042】 First, the first login submodule 111 obtains the first login account information and the first login password information. Optionally, based on the obtained first login account information and first login password information, it logs in to a pre-configured web page. After successful login, it sends a first login success command to the acquisition submodule 112. 【0043】 Specifically, the first acquisition unit 1111 acquires the first login account information and the first login password information and sends them to the scraping unit 1112.
【0044】 Optionally, the scraping unit 1112 scrapes the username and password tags and the login button from the pre-configured web page according to the web page tags of the pre-configured web page, identifies the target input box, which is the input box where account information needs to be entered, and enters the received first login account information and first login password information into the target input box to complete the login. 【0045】 Next, after receiving the first login success command, the acquisition submodule 112, under the control of the first login success command, acquires the corresponding semi-structured data file to be processed on the pre-configured web page according to the pre-configured business requirements. 【0046】 Specifically, under the control of the first login success command, the access unit 1121 accesses the corresponding target interface and sends an access success command to the first decision unit 1122. 【0047】 Optionally, after receiving the access success command, the first decision unit 1122 acquires the corresponding semi-structured data file to be processed at the target interface according to the pre-configured business requirements. 【0048】 For example, if the target interface is a banking transaction interface, the corresponding file is downloaded from that banking transaction interface according to the selected pre-configured start and end dates. The file is downloaded in .csv format and stored in a pre-configured execution path. 【0049】 Optionally, the data processing module 12 is used to process the semi-structured data file to be processed using a pre-configured Python script and obtain the target data file. 【0050】 First, the call submodule 121 calls the corresponding pre-configured Python script from a pre-configured script library integrated within it, and then sends the pre-configured Python script to the data processing submodule 123.
【0051】 Next, in the decision submodule 122, the target execution parameters of the pre-configured Python script are determined according to the received semi-structured data file to be processed. 【0052】 Specifically, the second decision unit 1221 determines the file path and target account information corresponding to the semi-structured data file to be processed. The file path is the corresponding file storage path when the first decision unit 1122 acquires the corresponding semi-structured data file to be processed at the target interface according to the pre-set business requirements, and the target account information is the account related to the semi-structured data file to be processed. For example, if the semi-structured data file to be processed is a transaction record file of a banking system, the target account information is the bank account related to that file. 【0053】 In the configuration unit 1222, the corresponding data type and global calculation variable are set according to the target account information. For example, the data type corresponding to the transaction record file of a banking system may be a currency type, and the corresponding global calculation variable may be a global variable for calculating the opening balance of each day. In this embodiment, this is not specifically limited and can be determined according to the semi-structured data file to be processed. 【0054】 The data type and global computation variables can be selected and sent to the third decision unit 1223, where the third decision unit 1223 can determine the target execution parameters corresponding to a pre-configured Python script based on the received file path, data type, and global computation variables. 
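As a rough illustration of how a file path, a data type, and global calculation variables might be combined into one set of execution parameters, the following Python sketch can be considered. The names `ExecutionParams`, `determine_params`, and `ACCOUNT_CURRENCY`, as well as the account numbers and currencies, are hypothetical and do not come from the application; this is a sketch under those assumptions, not the script used in the embodiment.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical account-to-currency correspondence list; the values are made up.
ACCOUNT_CURRENCY = {"00012345678": "USD", "00087654321": "EUR"}


@dataclass
class ExecutionParams:
    file_path: Path   # where the downloaded file was stored
    data_type: str    # here: the currency type set for the account
    global_vars: dict = field(default_factory=dict)  # state shared across rows


def determine_params(file_path: str, account: str) -> ExecutionParams:
    """Combine file path, data type and global variables into one parameter set."""
    if account not in ACCOUNT_CURRENCY:
        raise KeyError(f"no currency configured for account {account}")
    return ExecutionParams(
        file_path=Path(file_path),
        data_type=ACCOUNT_CURRENCY[account],
        # Assumed global variable: running value used for the daily opening balance.
        global_vars={"opening_balance": None},
    )
```

Raising an error for an unknown account is one possible design choice; it makes a missing entry in the correspondence list visible before any rows are processed.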
【0055】 For example, the target execution parameters of the pre-configured Python script corresponding to a bank system transaction record file may be the file path of the downloaded transaction record file and the content of the account-currency type correspondence list corresponding to the transaction record file. 【0056】 Finally, based on the target execution parameters, the data processing submodule 123 executes the pre-configured Python script, which processes the semi-structured data file to be processed into a target data file that the RPA process automation module 11 can go on to process. 【0057】 Specifically, based on the target execution parameters, the acquisition subunit 12311 of the first processing unit 1231 processes the semi-structured data file to be processed using the pre-configured Python script, thereby obtaining a first data subset that satisfies a pre-configured first condition and a second data subset that does not satisfy the pre-configured first condition. The pre-configured first condition is used to determine whether the corresponding data can be directly obtained from the semi-structured data file to be processed. 【0058】 Optionally, the first data subset is sent to the adjustment subunit 12312, and the second data subset is sent to the first processing subunit 12313. 【0059】 Optionally, the adjustment subunit 12312 adjusts the data format of each data item in the first data subset that does not meet the pre-configured data format, obtains the adjusted first target data subset, and transmits the first target data subset to the decision subunit 12314. 【0060】 Since the second data subset cannot be directly obtained from the semi-structured data file to be processed, the first processing subunit 12313 processes the second data subset using pre-configured placeholders to obtain a second target data subset, i.e., the second target data subset consists of pre-configured placeholders.
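The division into directly obtainable data and placeholder data described above can be sketched as follows. The column names in `DIRECT_FIELDS` and `COMPUTED_FIELDS` and the function `build_first_dataset` are assumptions made for illustration only; the "first condition" is modeled here simply as membership in a fixed list of directly readable fields.

```python
import math

PLACEHOLDER = float("nan")  # the bank example in this description uses "NaN"

# Hypothetical column names: fields that can be read directly from a source
# row (first data subset) and fields that must be computed later (second subset).
DIRECT_FIELDS = ("date", "amount", "summary")
COMPUTED_FIELDS = ("opening_balance", "closing_balance")


def build_first_dataset(rows):
    """Copy directly available fields and fill computed fields with placeholders."""
    dataset = []
    for row in rows:
        record = {name: row[name] for name in DIRECT_FIELDS}
        record.update({name: PLACEHOLDER for name in COMPUTED_FIELDS})
        dataset.append(record)
    return dataset
```

A later step can then locate the placeholder cells with `math.isnan` and overwrite them with calculated values.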
【0061】 Optionally, the decision subunit 12314 obtains the corresponding first dataset based on the received first target data subset and second target data subset, and transmits the first dataset to the generation unit 1232. 【0062】 Optionally, the generation unit 1232 generates the corresponding first data file according to the first dataset, i.e., the first data file does not contain the data values for the data items in the second target data subset, and transmits the first data file to the fourth decision unit 1233. 【0063】 Optionally, after receiving the first data file, the second processing subunit 12331 of the fourth decision unit 1233 obtains the data value of each data item in the second target data subset by a pre-configured processing method such as calculation, and forms the corresponding second dataset. 【0064】 Optionally, the second processing subunit 12331 sends the second dataset to the generation subunit 12332, which then writes the received second dataset into the previously generated first data file, generates the corresponding second data file, and sends the second data file to the third processing subunit 12333. 【0065】 Optionally, the third processing subunit 12333, after receiving the second data file, processes the second data file according to pre-configured requirements to obtain the final target data file that the RPA process automation module 11 can go on to process. The pre-configured requirements may be determined according to the pre-configured business requirements.
For example, if the semi-structured data file to be processed is a transaction record file of a banking system, then after the second data file corresponding to the transaction record file is obtained by the above processing, the row data in the second data file can be divided by date to obtain target data files, each of which is the bank record file for the corresponding date (if no transaction occurs on a given date, no file is generated for that date; for example, since January has 31 days, up to 31 new independent .csv files are generated for January, each file containing only the transaction information for that day). 【0066】 Optionally, the RPA process automation module 11 is used to receive the target data file sent from the data processing module 12 and to upload the received target data file to the SAP system connected to the semi-structured data file processing system 1. 【0067】 First, the second login submodule 113 obtains the second login account information and the second login password information, and then logs in to the pre-configured SAP webpage corresponding to the SAP system according to the obtained second login account information and second login password information to access the SAP system. 【0068】 Next, after successful login, the second login submodule 113 sends a second login success command to the upload submodule 114. 【0069】 Finally, after receiving the second login success command, the upload submodule 114 uploads the target data file to the corresponding SAP system.
【0070】 Specifically, after receiving the second login success command, the second acquisition unit 1141 acquires the corresponding target transaction code (for example, for a transaction record file of a banking system, the target transaction code is the one corresponding to the electronic bank statement) and the file configuration parameters, and sends the target transaction code and file configuration parameters to the upload unit 1142. 【0071】 Optionally, the upload unit 1142 uploads the target data file to the corresponding SAP system according to the received target transaction code and file configuration parameters. 【0072】 The semi-structured data file processing system according to this embodiment determines the target execution parameters of the pre-configured Python script based on the semi-structured data file to be processed, and the data processing module executes the called pre-configured Python script based on those target execution parameters. In this way, semi-structured data files that the RPA process automation module cannot process are converted into target data files that the RPA process automation module can go on to process, which provides flexibility and adaptability when the RPA process automation module handles semi-structured data files. 【0073】 As an example, the processing of a semi-structured bank transaction record file by the semi-structured data file processing system according to this embodiment is explained below. 【0074】 1. Download bank record balances (executed by RPA process automation module 11) 【0075】 1. Open the internet banking website and enter the account and password to log in.
【0076】 Based on the web page tags, the system scrapes the username and password tags and the login button from the page, identifies the input boxes where account information needs to be entered, automatically fills in the pre-configured account and password information, and simulates a single mouse click to log in. 【0077】 2. Download account transaction information 【0078】 The system continuously simulates mouse movements and single-click operations, accessing the transaction interface from the account, selecting the pre-configured start and end dates (in this scenario, both the start and end dates are the current day), then clicking the download button, downloading the file in .csv format, and storing it in a pre-configured execution path. 【0079】 2. Analysis of downloaded content (performed by data processing module 12) 【0080】 Specifically, the RPA process automation module 11 calls a pre-written Python script that processes the contents of the downloaded bank records .csv file row by row, performs the numerical calculations, and then generates a new .csv file with the processed data as the upload material that the RPA process automation module 11 will submit to the SAP system 2 in the next step. 【0081】 The execution parameters for the Python script are: 1. the file path of the downloaded bank record file (embedded in the code); 2. the content of the account-currency type correspondence list (embedded in the code). 【0082】 Specifically, embedding the start parameters in the code streamlines the execution step. Before each execution, it is necessary to check that these two parameters are set correctly and to update them promptly if they are not.
【0083】 The processing performed by the data processing module 12, that is, the execution of the Python script, is as follows.
【0084】 First, the script reads the bank account associated with the file and sets the corresponding currency type (because both values come from the same table, they are the same in every row), and at the same time sets the global variables used to calculate the daily opening balance.
【0085】 Next, the first loop begins. It mainly does the following: 1. It enters the data of the first data subset, which can be extracted directly (including "Transaction Amount," "Account Date," "Transaction Time," "Summary," etc.). 2. It temporarily fills the daily opening and closing balances with the pre-set placeholder "NaN" (these two values must be calculated and cannot be extracted directly, so they cannot be entered in the first loop). While extracting the directly enterable data, the script also adjusts the format of some of it. For example: the date format is changed from "01 / 06 / 2023" to "20230601"; the dollar sign before the amount is removed; numbers stored as strings are converted to float (for calculation); leading zeros are added if the bank account number has fewer than 11 digits; and amounts are normalized to two decimal places, by padding when there are fewer than two and rounding when there are more.
【0086】 Next, the second loop begins. It enters the opening and closing balances for each date, splits the rows by date, and generates a separate bank record file for each date. If no transactions occurred on a date, no file is generated for it. For example, since January has 31 days, up to 31 new, independent .csv files are generated for January, each containing only that day's transaction information.
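The two loops described in paragraphs 【0085】 and 【0086】 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script of the embodiment: the column names (`Account`, `Date`, `Amount`, `Summary`), the reading of "01 / 06 / 2023" as day/month/year, the removal of thousands separators, the `opening_balance` argument, the rule that one day's closing balance becomes the next day's opening balance, and the output file name pattern are all hypothetical.

```python
import csv
from collections import defaultdict
from datetime import datetime
from pathlib import Path

PLACEHOLDER = "NaN"  # the pre-set placeholder named in the description


def normalize_row(raw, currency):
    """First loop: copy directly extractable fields and fix their formats."""
    date = datetime.strptime(raw["Date"].replace(" ", ""), "%d/%m/%Y")
    # Assumption: amounts look like "$1,234.5" or "-$20.5".
    amount = float(raw["Amount"].replace("$", "").replace(",", "").strip())
    return {
        "Account": raw["Account"].strip().zfill(11),  # pad to 11 digits
        "Currency": currency,
        "Date": date.strftime("%Y%m%d"),  # "01 / 06 / 2023" -> "20230601"
        "Amount": round(amount, 2),
        "Summary": raw["Summary"].strip(),
        "Opening": PLACEHOLDER,  # computed in the second loop
        "Closing": PLACEHOLDER,
    }


def fill_balances(rows, opening_balance):
    """Second loop: replace the placeholders and group the rows by date."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["Date"]].append(row)
    balance = round(float(opening_balance), 2)
    for date in sorted(by_date):  # YYYYMMDD strings sort chronologically
        closing = round(balance + sum(r["Amount"] for r in by_date[date]), 2)
        for row in by_date[date]:
            row["Opening"], row["Closing"] = balance, closing
        balance = closing  # assumed: closing balance opens the next day
    return dict(by_date)


def write_daily_files(by_date, out_dir):
    """Write one .csv per date that has transactions; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for date in sorted(by_date):
        rows = by_date[date]
        path = out_dir / f"bank_records_{date}.csv"  # hypothetical name
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            for row in rows:
                # Emit every numeric field with exactly two decimal places.
                writer.writerow({
                    key: f"{value:.2f}" if isinstance(value, float) else value
                    for key, value in row.items()
                })
        paths.append(path)
    return paths
```

Dates without transactions never enter `by_date`, so no file is written for them, which matches the behavior described for the second loop. The sketch keeps amounts as float because the description says so; a production script handling money might prefer a decimal type.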
【0087】 Finally, to signal that the run of the Python script has completed, the words "Execution complete" are displayed on the execution terminal screen.
【0088】 3. Upload to the SAP system (executed by RPA process automation module 11)
【0089】 3.1 Log in to SAP
【0090】 Specifically, as in sub-step 1 of stage 1, the module simulates keyboard and mouse operations, enters the pre-configured account information on the SAP home page, and clicks the button to complete the login.
【0091】 3.2 Complete the upload using the file upload function of the SAP web page.
【0092】 The module continuously simulates mouse movements and single clicks: it finds transaction code FF.5 to import an electronic bank statement, sets the file upload configuration parameters, selects the generated Excel file from the local machine, and finally clicks the execute button. This operation is repeated once for each Excel file generated in the previous step.
【0093】 This embodiment provides a method for processing semi-structured data files, which can be applied to the semi-structured data file processing system 1 described above. The steps shown in the flowchart may be executed in a computer system, for example as a set of computer-executable instructions. Although the flowchart shows a logical order, the steps shown or described may be executed in a different order than shown here.
【0094】 Figure 2 is a flowchart of a semi-structured data file processing method according to an embodiment of the present invention. As shown in Figure 2, the process includes the following steps S201 to S203.
【0095】 Step S201: Obtain the semi-structured data file to be processed and call a pre-configured Python script from a pre-configured script library.
【0096】 For the specific implementation process, refer to the functional descriptions of the RPA process automation module 11 and the calling submodule 121 of the semi-structured data file processing system 1 described above; a redundant explanation is omitted here.
【0097】 Step S202: Determine the target execution parameters based on the semi-structured data file to be processed.
【0098】 For details of the implementation process, refer to the functional description of the decision submodule 122 of the semi-structured data file processing system 1 described above; a redundant explanation is omitted here.
【0099】 Step S203: Based on the target execution parameters, process the semi-structured data file to be processed using the pre-configured Python script to obtain the target data file.
【0100】 For details of the implementation process, refer to the functional description of the data processing submodule 123 of the semi-structured data file processing system 1 described above; a redundant explanation is omitted here.
【0101】 The semi-structured data file processing method according to this embodiment processes semi-structured data files using the semi-structured data file processing system according to the above embodiment of the present application. Semi-structured data files that the RPA process automation module cannot process are thereby converted into target data files that it can continue to process, which gives the RPA process automation module flexibility and adaptability when handling semi-structured data files.
【0102】 This embodiment further provides a semi-structured data file processing device for realizing the above embodiment and preferred embodiments; explanations of what has already been described are omitted.
As used below, the term "module" refers to a combination of software and/or hardware that realizes a predetermined function. The devices described in the following embodiments are preferably realized in software, but realization in hardware, or in a combination of software and hardware, is also possible and contemplated.
【0103】 This embodiment provides a semi-structured data file processing device, which includes an acquisition module 301, a determination module 302, and a processing module 303, as shown in Figure 3.
【0104】 The acquisition module 301 is used to acquire the semi-structured data file to be processed and to call a pre-configured Python script from a pre-configured script library.
【0105】 The determination module 302 is used to determine the target execution parameters based on the semi-structured data file to be processed.
【0106】 The processing module 303 is used to process the semi-structured data file to be processed using the pre-configured Python script based on the target execution parameters, and to obtain the target data file.
【0107】 Further functional descriptions of each of the above modules and units are the same as those in the corresponding embodiments described above; redundant explanations are omitted here.
【0108】 The semi-structured data file processing device in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (application-specific integrated circuit), a processor and memory that execute one or more software or firmware programs, and/or other devices that can provide the above functions.
【0109】 An embodiment of the present invention further provides a computer device that includes the semi-structured data file processing device shown in Figure 3 above.
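To make the division of work among the acquisition module 301, the module 302 that determines the execution parameters, and the processing module 303 more concrete, the following Python sketch wires three small functions together in the order of steps S201 to S203. It is a toy illustration only: the script library contents, the `sum_amounts` script, the `amount_column` parameter, and the rule for finding the amount column are hypothetical and are not taken from the embodiment.

```python
import csv
import io


def sum_amounts(rows, params):
    """Hypothetical library script: total the column named in params."""
    column = params["amount_column"]
    return [{"total": sum(float(row[column]) for row in rows)}]


# Hypothetical pre-configured script library: script name -> callable.
SCRIPT_LIBRARY = {"sum_amounts": sum_amounts}


def acquire(csv_text, script_name):
    """Module 301 (S201): read the file and look up the script to call."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return rows, SCRIPT_LIBRARY[script_name]


def determine(rows):
    """Module 302 (S202): derive execution parameters from the file itself."""
    for column in (rows[0] if rows else {}):
        if column.strip().lower() == "amount":
            return {"amount_column": column}
    raise ValueError("no amount column found")


def process(rows, script, params):
    """Module 303 (S203): run the script with the parameters."""
    return script(rows, params)
```

A caller would chain the three functions: `rows, script = acquire(text, "sum_amounts")`, then `params = determine(rows)`, then `process(rows, script, params)`. Keeping parameter determination separate from script execution mirrors the split between steps S202 and S203.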
【0110】 Figure 4 is a schematic diagram of the structure of a computer device according to an optional embodiment of the present invention. As shown in Figure 4, the computer device includes one or more processors 10, memory 20, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components communicate with one another via different buses and may be mounted on a common motherboard or mounted otherwise as needed. The processor can process instructions executed within the computer device, including instructions stored in or on memory for displaying graphical information of a GUI on an external input/output device (e.g., a display device coupled to the interface). In some optional embodiments, multiple processors and/or multiple buses may be used together with multiple memories as needed. Similarly, multiple computer devices may be connected, each providing a portion of the required operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). Figure 4 uses one processor 10 as an example.
【0111】 The processor 10 may be a central processing unit, a network processor, or a combination thereof. The processor 10 may optionally include a hardware chip. The hardware chip may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.
【0112】 The memory 20 stores instructions that can be executed by at least one processor 10, thereby enabling the at least one processor 10 to implement the method shown in the above embodiment.
【0113】 Memory 20 may include a program storage area, which stores an operating system and the application programs required for at least one function, and a data storage area, which stores data created through use of the computer device. Memory 20 may also include high-speed random-access memory and may further include non-transitory memory such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some optional embodiments, memory 20 optionally includes memory located remotely from the processor 10, and such remote memory may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the Internet, a corporate intranet, a local area network, a mobile communication network, and combinations thereof.
【0114】 Memory 20 may include volatile memory such as random-access memory, or non-volatile memory such as flash memory, a hard disk, or a solid-state drive, and memory 20 may include a combination of the above types of memory.
【0115】 The computer device further includes a communication interface 30 through which the computer device communicates with other devices or a communication network.
【0116】 Embodiments of the present application further provide a computer-readable storage medium. The methods according to the embodiments of the present application may be implemented in hardware or firmware, as software or computer code recorded on a storage medium, or as computer code that is originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network, and stored on a local storage medium. The methods described herein can thereby be carried out by software stored on a storage medium, using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disk, read-only memory, random-access memory, flash memory, a hard disk, or a solid-state drive, and optionally the storage medium may include a combination of the above types of memory. As will be understood, the computer, processor, microprocessor controller, or programmable hardware includes a storage component capable of storing or receiving software or computer code, and when the software or computer code is accessed and executed by the computer, processor, or hardware, the methods described in the embodiments are implemented.
【0117】 While embodiments of the present application have been described with reference to the drawings, those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present application, and all such changes and modifications fall within the scope defined by the attached claims.
Claims
[Claim 1] A semi-structured data file processing system comprising an RPA process automation module and a data processing module, characterized in that: the RPA process automation module is used to acquire a semi-structured data file to be processed and to transmit the semi-structured data file to the data processing module; the data processing module is used to process the semi-structured data file to be processed using a pre-configured Python script and to obtain a target data file; the data processing module includes a calling submodule, a decision submodule, and a data processing submodule; the calling submodule is used to call the pre-configured Python script from a pre-configured script library and to send the pre-configured Python script to the data processing submodule; the decision submodule is used to determine target execution parameters based on the semi-structured data file to be processed and to transmit the target execution parameters to the data processing submodule; the data processing submodule is used to process the semi-structured data file to be processed using the pre-configured Python script based on the target execution parameters, and to obtain the target data file; the data processing submodule includes a first processing unit, a generation unit, and a fourth decision unit; the first processing unit is used to process the semi-structured data file to be processed using the pre-configured Python script based on the target execution parameters and a pre-configured first condition, to obtain a first dataset, and to transmit the first dataset to the generation unit and the fourth decision unit; the generation unit is used to generate a first data file based on the first dataset and to transmit the first data file to the fourth decision unit; and the fourth decision unit is used to determine a second dataset based on the first dataset, and to determine the target data file based on the second dataset and the first data file.
[Claim 2] The system according to claim 1, characterized in that: the RPA process automation module includes a first login submodule and an acquisition submodule; the first login submodule is used to obtain first login account information and first login password information, to log in to a pre-configured web page based on the first login account information and the first login password information, and to send a first login success command to the acquisition submodule; and the acquisition submodule is used to acquire, based on the first login success command, the semi-structured data file to be processed on the pre-configured web page in accordance with pre-configured business requirements.
[Claim 3] The system according to claim 2, characterized in that: the first login submodule includes a first acquisition unit and a scraping unit; the first acquisition unit is used to acquire the first login account information and the first login password information, and to transmit them to the scraping unit; and the scraping unit is used to scrape the pre-configured web page based on the web page tags of the pre-configured web page to determine a target input box, and to input the first login account information and the first login password information into the target input box to log in.
[Claim 4] The system according to claim 2, characterized in that: the acquisition submodule includes an access unit and a first decision unit; the access unit is used, upon receiving the first login success command, to access a target interface and to transmit an access success command to the first decision unit; and the first decision unit is used, upon receiving the access success command, to acquire the semi-structured data file to be processed at the target interface in accordance with the pre-configured business requirements.
[Claim 5] The system according to claim 1, characterized in that: the decision submodule includes a second decision unit, a setting unit, and a third decision unit; the second decision unit is used to determine a file path and target account information based on the semi-structured data file to be processed, to transmit the target account information to the setting unit, and to transmit the file path to the third decision unit; the setting unit is used to set a data type and global calculation variables based on the target account information, and to transmit the data type and the global calculation variables to the third decision unit; and the third decision unit is used to determine the target execution parameters based on the file path, the data type, and the global calculation variables.
[Claim 6] The system according to claim 1, characterized in that: the first processing unit includes an acquisition subunit, an adjustment subunit, a first processing subunit, and a decision subunit; the acquisition subunit is used to process the semi-structured data file to be processed using the pre-configured Python script based on the target execution parameters, to obtain a first data subset that satisfies the pre-configured first condition and a second data subset that does not satisfy the pre-configured first condition, to send the first data subset to the adjustment subunit, and to send the second data subset to the first processing subunit; the adjustment subunit is used to adjust the data format of each data item in the first data subset that does not satisfy a predetermined data format, to obtain a first target data subset, and to transmit the first target data subset to the decision subunit; the first processing subunit is used to process the second data subset using pre-configured placeholders, to obtain a second target data subset, and to transmit the second target data subset to the decision subunit; and the decision subunit is used to determine the first dataset based on the first target data subset and the second target data subset.
[Claim 7] The system according to claim 6, characterized in that: the fourth decision unit includes a second processing subunit, a generation subunit, and a third processing subunit; the second processing subunit is used to obtain the second dataset based on the second target data subset using a pre-configured processing method, and to transmit the second dataset to the generation subunit; the generation subunit is used to generate a second data file based on the second dataset and the first data file, and to transmit the second data file to the third processing subunit; and the third processing subunit is used to process the second data file according to pre-set requirements and to obtain the target data file.
[Claim 8] The system according to claim 1, characterized in that: the semi-structured data file processing system is connected to an SAP system; and the RPA process automation module is further used to receive the target data file transmitted from the data processing module and to upload the target data file to the SAP system.
[Claim 9] The system according to claim 8, characterized in that: the RPA process automation module further includes a second login submodule and an upload submodule; the second login submodule is used to obtain second login account information and second login password information, to log in to a pre-configured SAP web page based on the second login account information and the second login password information, and to send a second login success command to the upload submodule; and the upload submodule is used to upload the target data file to the SAP system based on the second login success command.
[Claim 10] The system according to claim 9, characterized in that: the upload submodule includes a second acquisition unit and an upload unit; the second acquisition unit is used to acquire a target transaction code and file configuration parameters, and to transmit the target transaction code and the file configuration parameters to the upload unit; and the upload unit is used to upload the target data file to the SAP system based on the target transaction code and the file configuration parameters.