Task processing methods, apparatus, computer equipment and storage media

By constructing metadata information for partitioned data tables and managing segmented locks, the problem of concurrent write conflicts in data lake files was resolved, achieving stable writing of data lake files and reducing the impact of data latency and inconsistency.

CN117331956B · Active · Publication Date: 2026-06-30 · CHINA PING AN PROPERTY INSURANCE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA PING AN PROPERTY INSURANCE CO LTD
Filing Date: 2023-10-20
Publication Date: 2026-06-30

Abstract

This application belongs to the fields of big data and fintech, and relates to a task processing method, including: receiving concurrent write tasks; determining the first partition bucket number matching the concurrent write task from the metadata information of the partitioned data table; obtaining the first segment lock state value corresponding to the first partition bucket number based on the metadata information; if all first segment lock state values are unlocked, setting the first segment lock state in the metadata information to locked, and executing the concurrent write task until completion, then restoring the first segment lock state value to unlocked; if the first segment lock state value includes a locked state, then executing the concurrent write task only after detecting that it meets the task retry conditions. This application also provides a task processing device, computer equipment, and storage medium. Furthermore, the partitioned data table of this application can be stored in a blockchain. This application effectively avoids the problem of concurrent write conflicts in the data lake file of the partitioned data table.

Description

Technical Field

[0001] This application relates to the fields of big data technology and fintech, and in particular to task processing methods, apparatus, computer equipment and storage media.

Background Technology

[0002] With the continuous development of the big data field, data lake technology has gradually emerged, leading to its increasingly widespread application in fintech companies such as insurance companies and banks. Currently, the component technologies of data lakes are also constantly iterating, evolving from Apache Hive to today's Apache Hudi, Apache Iceberg, and others. User demands for data are also constantly increasing, with the timeliness of business data gradually improving from the original T+1 requirement to real-time or near real-time. This presents a greater challenge to data lakes storing foundational data, namely, the near-real-time requirement for synchronizing source data.

[0003] As data lakes evolve towards real-time data processing, concurrent write conflicts in data lake files have become a common problem in the industry. The background is as follows: Data lakes use real-time technology to synchronize data from a data table. The real-time synchronization process continuously writes the latest data to this table using short-job tasks. These continuously generated short-job tasks, characterized by their short duration and high frequency, request and occupy write locks on the data table, preventing other tasks from acquiring write locks. This prevents multiple tasks from concurrently writing to the same data table, resulting in concurrent write conflicts in the data lake files. These concurrent write conflicts can cause data delays or inconsistencies, negatively impacting production.

Summary of the Invention

[0004] The purpose of this application is to provide a task processing method, apparatus, computer equipment, and storage medium to solve the existing problem of concurrent write conflicts in data lake files, which can cause data delays or inconsistencies and thus adversely affect production.

[0005] To address the aforementioned technical problems, this application provides a task processing method, employing the following technical solution:

[0006] Determine whether a concurrent write task corresponding to the data lake file write scenario has been received;

[0007] If so, obtain the metadata information from the preset partition data table; wherein, corresponding metadata information has been pre-established for all data lake files in the partition data table;

[0008] Determine, from the metadata information, the first partition bucket number of the specified data lake file that matches the concurrent write task;

[0009] Based on the metadata information, obtain the first segment lock status value corresponding to the first partition bucket number;

[0010] If the first segment lock status values corresponding to the first partition bucket number are all unlocked, set the first segment lock status in the metadata information to locked, execute the concurrent write task until it is completed, and then restore the first segment lock status value to unlocked;

[0011] If the first segment lock status values corresponding to the first partition bucket number include a locked state, restrict the execution of the concurrent write task;

[0012] After detecting that the concurrent write task meets the preset task retry conditions, the concurrent write task is executed until the concurrent write task is completed, and the first segment lock state value is set to the unlocked state.
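The flow of steps [0006] to [0012] can be sketched as follows. This is only a minimal illustration, not the implementation of this application: the class `SegmentLockTable`, the function `run_write_task`, and the in-memory dictionary standing in for the persisted metadata information are all hypothetical names introduced for the example. The internal guard protects only the short check-and-set on the lock states, not the data write itself.

```python
import threading
from typing import Callable, Dict, Iterable

UNLOCKED = "unlocked"
LOCKED = "locked"


class SegmentLockTable:
    """In-memory stand-in (assumption) for the segment lock mapping table."""

    def __init__(self, bucket_numbers: Iterable[str]) -> None:
        self._status: Dict[str, str] = {b: UNLOCKED for b in bucket_numbers}
        # Guards the brief check-and-set on the lock states only.
        self._guard = threading.Lock()

    def try_lock_all(self, bucket_numbers: Iterable[str]) -> bool:
        """Lock every requested bucket only if all of them are unlocked."""
        wanted = list(bucket_numbers)
        with self._guard:
            if any(self._status[b] == LOCKED for b in wanted):
                return False
            for b in wanted:
                self._status[b] = LOCKED
            return True

    def unlock_all(self, bucket_numbers: Iterable[str]) -> None:
        """Restore the given buckets to the unlocked state."""
        with self._guard:
            for b in bucket_numbers:
                self._status[b] = UNLOCKED

    def status(self, bucket_number: str) -> str:
        with self._guard:
            return self._status[bucket_number]


def run_write_task(table: SegmentLockTable,
                   bucket_numbers: Iterable[str],
                   write: Callable[[], None]) -> bool:
    """Return True if the write ran, False if it was restricted."""
    wanted = list(bucket_numbers)
    if not table.try_lock_all(wanted):
        return False  # restricted; the caller retries later
    try:
        write()
    finally:
        table.unlock_all(wanted)  # restore the unlocked state
    return True
```

A task that touches several partition bucket numbers either locks all of them or none, so a restricted task never leaves a segment half-locked.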

[0013] Furthermore, the step of determining the first partition bucket number of the specified data lake file matching the concurrent write task from the metadata information specifically includes:

[0014] The concurrent write task is parsed to obtain the corresponding parsed information;

[0015] Extract the file partition description information from the parsed information in the concurrent write task;

[0016] Filter out the specified partition bucket number that matches the file partition description information from the metadata information;

[0017] Use the specified partition bucket number as the first partition bucket number of the specified data lake file.

[0018] Furthermore, prior to the step of obtaining metadata information from the preset partition data table, the method further includes:

[0019] Construct partitioned data tables for all the data lake files;

[0020] The file storage paths of all the data lake files are initialized in the partition data table;

[0021] Based on the preset segmented lock mapping mechanism and the file storage path, the partition data table is initialized to construct corresponding metadata information for all the data lake files in the partition data table.

[0022] Furthermore, the step of initializing the partition data table based on the preset segmented lock mapping mechanism and the file storage path to construct corresponding metadata information for all data lake files in the partition data table specifically includes:

[0023] Based on the file storage path, the second partition bucket number of each data lake file in the partition data table is determined;

[0024] Construct data initialization write tasks corresponding to all the aforementioned data lake files;

[0025] Execute the data initialization write task to add the second partition bucket number corresponding to all the data lake files to the preset segment lock mapping table;

[0026] In the segmented lock mapping table, the second segment lock status value corresponding to each second partition bucket number is set to the unlocked state to obtain the modified segmented lock mapping table;

[0027] The modified segmented lock mapping table is stored in a preset initial metadata file to obtain the metadata information.
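The initialization in [0023] to [0027] might look like the sketch below. It assumes a hypothetical storage layout `<table root>/<partition path>/<bucket number>.<extension>`, a JSON metadata file, and the example paths shown; none of these are specified by this application, and the function names are introduced only for illustration.

```python
import json
import os
import tempfile
from pathlib import PurePosixPath
from typing import Dict, Iterable

UNLOCKED = "unlocked"


def partition_bucket_number(file_path: str) -> str:
    """Derive '<partition path>-<bucket number>' from a storage path.

    Assumes the hypothetical layout '<table root>/<partition>/<bucket>.<ext>'.
    """
    path = PurePosixPath(file_path)
    return f"{path.parent.name}-{path.stem}"


def build_segment_lock_table(file_paths: Iterable[str]) -> Dict[str, str]:
    """Map every partition bucket number to the unlocked state."""
    return {partition_bucket_number(p): UNLOCKED for p in file_paths}


def store_metadata(lock_table: Dict[str, str], metadata_file: str) -> None:
    """Persist the mapping table in the initial metadata file (JSON assumed)."""
    with open(metadata_file, "w", encoding="utf-8") as handle:
        json.dump({"segment_locks": lock_table}, handle, indent=2, sort_keys=True)


# Hypothetical file storage paths for three data lake files.
paths = [
    "/lake/policy_table/2023-01-01/a.parquet",
    "/lake/policy_table/2023-01-01/b.parquet",
    "/lake/policy_table/1990-01-01/a.parquet",
]
lock_table = build_segment_lock_table(paths)
metadata_path = os.path.join(tempfile.mkdtemp(), "metadata.json")
store_metadata(lock_table, metadata_path)
```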

[0028] Furthermore, before the step of executing the concurrent write task after detecting that the concurrent write task meets the preset task retry conditions, until the concurrent write task is completed, and setting the first segment lock state value to the unlocked state, the method further includes:

[0029] In real time, obtain from the metadata information the third segment lock status value that matches the first partition bucket number of the specified data lake file;

[0030] Determine whether all the third segment lock status values are in an unlocked state;

[0031] If yes, the concurrent write task is determined to meet the task retry conditions; otherwise, the concurrent write task is determined to not meet the task retry conditions.
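The retry check in [0029] to [0031] can be illustrated with a simple polling loop. This is a sketch under assumptions: the polling interval, the maximum number of polls, and the callable `read_lock_table` that returns a fresh snapshot of the segment lock mapping table are hypothetical and are not prescribed by this application.

```python
import time
from typing import Callable, Dict, Iterable


def meets_retry_condition(lock_table: Dict[str, str],
                          bucket_numbers: Iterable[str]) -> bool:
    """True when every matching segment lock status value is unlocked."""
    return all(lock_table[b] == "unlocked" for b in bucket_numbers)


def wait_for_retry(read_lock_table: Callable[[], Dict[str, str]],
                   bucket_numbers: Iterable[str],
                   poll_seconds: float = 0.01,
                   max_polls: int = 100) -> bool:
    """Poll fresh lock states until the retry condition holds or polls run out."""
    wanted = list(bucket_numbers)
    for _ in range(max_polls):
        if meets_retry_condition(read_lock_table(), wanted):
            return True
        time.sleep(poll_seconds)
    return False
```

Passing this check does not by itself acquire anything; the retried task still has to set the segment lock state to locked before it writes.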

[0032] Furthermore, the step of obtaining metadata information from the preset partition data table specifically includes:

[0033] Invoke the preset task verification rules;

[0034] The concurrent write task is validated based on the aforementioned task validation rules;

[0035] If the concurrent write task passes the verification, then the step of obtaining metadata information from the preset partition data table is executed.

[0036] Furthermore, after the step of setting the first segment lock status in the metadata information to locked, executing the concurrent write task until it is completed, and then restoring the first segment lock status value to unlocked if all the first segment lock status values corresponding to the first partition bucket number are unlocked, the method further includes:

[0037] Invoke the preset lock-free algorithm;

[0038] Obtain the current version information of the partition data table;

[0039] The partition data table is updated based on the lock-free algorithm and the current version information.
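This application does not name the lock-free algorithm used in [0037] to [0039]. One common choice is an optimistic compare-and-set on a version number, sketched below purely as an assumption. Python has no native compare-and-set primitive, so the hypothetical `VersionedTable` class emulates the atomic step with a small internal mutex; the caller side shows the optimistic read, attempt, and retry loop.

```python
import threading


class VersionedTable:
    """Holds the current version number of a partitioned data table."""

    def __init__(self) -> None:
        self._version = 0
        # Python has no native CAS; this mutex only emulates its atomicity.
        self._atomic = threading.Lock()

    def current_version(self) -> int:
        return self._version

    def compare_and_set(self, expected: int, new: int) -> bool:
        """Install `new` only if the version still equals `expected`."""
        with self._atomic:
            if self._version != expected:
                return False
            self._version = new
            return True


def commit_new_version(table: VersionedTable) -> int:
    """Optimistic loop: re-read the version and retry until the swap succeeds."""
    while True:
        seen = table.current_version()
        if table.compare_and_set(seen, seen + 1):
            return seen + 1
```

A writer whose view of the version is stale simply fails the swap and retries with the fresh value, so no writer holds the version record for the duration of its data write.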

[0040] To address the aforementioned technical problems, this application also provides a task processing device, which employs the following technical solution:

[0041] The first judgment module is used to determine whether a concurrent write task corresponding to the data lake file write scenario has been received;

[0042] The first acquisition module is used to acquire metadata information from a preset partitioned data table if the condition is met; wherein, corresponding metadata information has been pre-established for all data lake files in the partitioned data table.

[0043] The determination module is used to determine the first partition bucket number of the specified data lake file that matches the concurrent write task from the metadata information;

[0044] The second acquisition module is used to acquire the first segment lock status value corresponding to the first partition bucket number based on the metadata information;

[0045] The first processing module is configured to, if the first segment lock status values corresponding to the first partition bucket number are all unlocked, set the first segment lock status in the metadata information to locked, execute the concurrent write task until it is completed, and then restore the first segment lock status value to unlocked.

[0046] The second processing module is used to restrict the execution of the concurrent write task if the first segment lock status values corresponding to the first partition bucket number include a locked status.

[0047] The third processing module is used to execute the concurrent write task after detecting that the concurrent write task meets the preset task retry conditions, until the concurrent write task is completed, and set the first segment lock state value to the unlocked state.

[0048] To address the aforementioned technical problems, this application also provides a computer device, which includes a memory and a processor. The memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the steps of the task processing method described above.

[0049] To address the aforementioned technical problems, this application also provides a computer-readable storage medium storing computer-readable instructions. When these computer-readable instructions are executed by a processor, they implement the steps of the task processing method described above.

[0050] Compared with the prior art, the embodiments of this application have the following main advantages:

[0051] First, determine whether a concurrent write task corresponding to the data lake file write scenario has been received. If so, obtain metadata information from a preset partition data table. Then, determine the first partition bucket number of the specified data lake file matching the concurrent write task from the metadata information. Subsequently, obtain the first segment lock status value corresponding to the first partition bucket number based on the metadata information. If all first segment lock status values corresponding to the first partition bucket number are unlocked, set the first segment lock status in the metadata information to locked and execute the concurrent write task until it is completed, then restore the first segment lock status value to unlocked. If the first segment lock status value corresponding to the partition bucket number includes locked, restrict the execution of the concurrent write task. Finally, after detecting that the concurrent write task meets the preset task retry conditions, execute the concurrent write task until it is completed, and set the first segment lock status value to unlocked. This application optimizes the processing of concurrent write tasks corresponding to data lake file write scenarios by using metadata information from a pre-built partitioned data table. It determines the first partition bucket number of a specified data lake file matching the concurrent write task from the metadata information, and obtains the first segment lock state value corresponding to the first partition bucket number from the metadata information. Then, it performs content analysis on the first segment lock state value to accurately respond to the concurrent write task based on the analysis results. 
The analysis of the segment lock state in the metadata information enables fast and accurate processing of concurrent write tasks corresponding to data lake files in the partitioned data table, effectively avoiding concurrent write conflicts in the data lake files in the partitioned data table. This significantly reduces the probability of conflicts in data lake files during concurrent write scenarios and greatly reduces the adverse effects on production caused by data latency or inconsistency due to concurrent write conflicts in data lake files.

Attached Figure Description

[0052] To more clearly illustrate the solutions in this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0053] Figure 1 is an exemplary system architecture diagram to which this application can be applied;

[0054] Figure 2 is a flowchart of an embodiment of the task processing method according to this application;

[0055] Figure 3 is a schematic diagram of the structure of one embodiment of the task processing apparatus according to this application;

[0056] Figure 4 is a schematic diagram of the structure of one embodiment of the computer device according to this application.

Detailed Implementation

[0057] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.

[0058] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0059] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

[0060] As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, etc.

[0061] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social media platform software, etc.

[0062] Terminal devices 101, 102, and 103 can be various electronic devices with displays and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptops, and desktop computers, etc.

[0063] Server 105 can be a server that provides various services, such as a backend server that supports the pages displayed on terminal devices 101, 102, and 103.

[0064] It should be noted that the task processing method provided in the embodiments of this application is generally executed by a server / terminal device, and correspondingly, the task processing device is generally set in the server / terminal device.

[0065] It should be understood that the number of terminal devices, networks, and servers shown in Figure 1 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.

[0066] Continuing to refer to Figure 2, a flowchart illustrating an embodiment of the task processing method according to this application is shown. The order of steps in the flowchart can be changed, and some steps can be omitted, depending on different requirements. The task processing method provided in this embodiment can be applied to any scenario requiring concurrent writing of data lake files, and thus to products in these scenarios, such as concurrent writing of data lake files in the financial insurance field. The task processing method includes the following steps:

[0067] Step S201: Determine whether a concurrent write task corresponding to the data lake file write scenario has been received.

[0068] In this embodiment, the electronic device on which the task processing method runs (e.g., the server / terminal device shown in Figure 1) can acquire concurrent write tasks via wired or wireless connections. It should be noted that the aforementioned wireless connection methods include, but are not limited to, 3G / 4G / 5G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra-wideband) connections, and other currently known or future-developed wireless connection methods. A data lake is a storage system that includes different file formats and lake table formats at its underlying layer, capable of storing large amounts of unstructured and semi-structured raw data. Data consumers can access this data for data analysis, including BI, reporting, and machine learning model training. With a data lake, data becomes increasingly usable. Data lake technology can be specifically applied to the storage systems of fintech companies, such as insurance companies and banks. In the application field of fintech, data lake files can be used to store financial data, such as business data, purchase data, payment data, transaction data, financial product data, etc. The aforementioned data lake file write scenario refers to a business scenario in which the received concurrent write tasks involve concurrent write processing of data lake files stored in partitioned data tables. There may be one or more concurrent write tasks. For example, the concurrent write tasks may include: a "concurrency-1" write task, whose content is to append real-time data to the data lake file corresponding to the latest partition path 2023-01-01; a "concurrency-2" write task, whose content is to apply a data correction to the data lake file corresponding to the historical partition path 1990-01-01; or a task that overwrites historical partition data by rewriting all data files under the partition paths from "1990-01-01" to "2000-01-01".

[0069] Step S202: If yes, obtain the metadata information from the preset partition data table.

[0070] In this embodiment, corresponding metadata information is pre-established for all data lake files within the partitioned data table. This metadata information, also known as metadata, stores information about the segment lock mapping table; specifically, it stores the segment lock information for all data lake files contained in the entire partitioned data table, and this metadata information is persistently stored. All pending concurrent write tasks related to the data lake files in the partitioned data table must first request the metadata information of the partitioned data table to obtain the corresponding segment lock information. The process of all concurrent write tasks requesting metadata information is quick and lightweight, and none of them maintain a long-term read / write state on the metadata information. Therefore, the metadata information itself does not experience concurrent write conflicts.

[0071] Step S203: Determine the first partition bucket number of the specified data lake file that matches the concurrent write task from the metadata information.

[0072] In this embodiment, the specific implementation process of determining the first partition bucket number of the specified data lake file matching the concurrent write task from the metadata information will be further described in detail in subsequent specific embodiments, and will not be elaborated on here. The specified data lake file matching the concurrent write task can refer to all data lake files in the partition data table corresponding to the concurrent write task that need to be modified (including insertion, update, deletion, etc.). Furthermore, the partition bucket number is composed of the partition path + bucket number. Multiple specified data lake files can exist corresponding to the concurrent write task, and multiple first partition bucket numbers can also exist.

[0073] Step S204: Obtain the first segment lock status value corresponding to the first partition bucket number based on the metadata information.

[0074] In this embodiment, the segment lock mapping table stored in the metadata information can be queried based on the first partition bucket number to retrieve the first segment lock status value corresponding to that first partition bucket number. If there are multiple first partition bucket numbers, there are also multiple corresponding first segment lock status values.

[0075] Step S205: If the first segment lock status values corresponding to the first partition bucket number are all unlocked, then the first segment lock status in the metadata information is set to locked, and the concurrent write task is executed until the concurrent write task is completed, and then the first segment lock status value is restored to unlocked.

[0076] In this embodiment, if the first segment lock status values corresponding to the first partition bucket numbers are all in an unlocked state, it indicates that there are currently no other concurrent write tasks related to the specified data lake file being executed. Therefore, the aforementioned concurrent write task can be executed directly. Specifically, the first segment lock status in the metadata information is set to a locked state, and the concurrent write task is executed until the concurrent write task is completed. Then, the first segment lock status value is restored to an unlocked state. Furthermore, after completing the execution of the concurrent write task, a lock-free algorithm can be further used to update the version information of the partition data table to complete the processing record of this concurrent write task.

[0077] Step S206: If the first segment lock status values corresponding to the first partition bucket number include a locked state, then restrict the execution of the concurrent write task.

[0078] In this embodiment, if the first segment lock status values corresponding to the first partition bucket number are not all in an unlocked state, that is, the first segment lock status values include a locked state, it indicates that other concurrent write tasks related to the specified data lake file are currently being executed. In this case, the execution of the current concurrent write task is restricted, because executing it immediately could cause task anomalies that make those other in-progress write tasks fail. This effectively avoids the problem of concurrent write conflicts in the data lake file of the partition data table.

[0079] Step S207: After detecting that the concurrent write task meets the preset task retry conditions, the concurrent write task is executed until the concurrent write task is completed, and the first segment lock state value is set to the unlocked state.

[0080] In this embodiment, the specific implementation process of detecting that the concurrent write task meets the preset task retry conditions will be further described in detail in subsequent specific embodiments of this application, and will not be elaborated on here. In addition, the above-mentioned execution processing of the concurrent write task refers to first setting the first segment lock state in the metadata information to a locked state, and then executing the processing of the concurrent write task.

[0081] This application first determines whether a concurrent write task corresponding to a data lake file write scenario has been received; if so, it obtains metadata information from a preset partition data table; then, it determines the first partition bucket number of the specified data lake file matching the concurrent write task from the metadata information; subsequently, it obtains the first segment lock status value corresponding to the first partition bucket number based on the metadata information; if all the first segment lock status values corresponding to the first partition bucket number are unlocked, the first segment lock status in the metadata information is set to locked, and the concurrent write task is executed until the concurrent write task is completed, and then the first segment lock status value is restored to unlocked; if the first segment lock status value corresponding to the partition bucket number includes a locked status, the execution of the concurrent write task is restricted; finally, after detecting that the concurrent write task meets the preset task retry conditions, the concurrent write task is executed until the concurrent write task is completed, and the first segment lock status value is set to unlocked. This application optimizes the processing of concurrent write tasks corresponding to data lake file write scenarios by using metadata information from a pre-built partitioned data table. It determines the first partition bucket number of a specified data lake file matching the concurrent write task from the metadata information, and obtains the first segment lock state value corresponding to the first partition bucket number from the metadata information. Then, it performs content analysis on the first segment lock state value to accurately respond to the concurrent write task based on the analysis results. 
The analysis of the segment lock state in the metadata information enables fast and accurate processing of concurrent write tasks corresponding to data lake files in the partitioned data table, effectively avoiding concurrent write conflicts in the data lake files in the partitioned data table. This significantly reduces the probability of conflicts in data lake files during concurrent write scenarios and greatly reduces the adverse effects on production caused by data latency or inconsistency due to concurrent write conflicts in data lake files.

[0082] In some alternative implementations, step S203 includes the following steps:

[0083] The concurrent write task is parsed to obtain the corresponding parsed information.

[0084] In this embodiment, the parsing information of the concurrent write task can be obtained by parsing the concurrent write task. The parsing information may include the task details information and file partition description information of the concurrent write task.

[0085] The file partition description information in the concurrent write task is extracted from the parsed information.

[0086] In this embodiment, the file partition description information in the concurrent write task can be obtained by extracting file partition description information from the parsed information. For example, if the content of the concurrent write task includes appending real-time data to the latest partition path: partition path = 2023-01-01 corresponding to the data lake file, then after parsing the information of the concurrent write task, the task details information of the concurrent write task can be obtained, including: appending real-time data, and the file partition description information of the concurrent write task includes: partition path = 2023-01-01, that is, the file partition description information is the partition path in the concurrent write task.

[0087] Filter out the specified partition bucket number that matches the file partition description information from the metadata information.

[0088] In this embodiment, the file partition description information can be used to query the metadata information to find a specified partition path that is identical to the file partition description information, and then the partition bucket number matching the specified partition path can be extracted to obtain the specified partition bucket number. The partition bucket number is composed of the partition path + bucket number. For example, the partition bucket number matching the specified partition path includes: 2023-01-01-a, where 2023-01-01 is the partition path and 'a' is the bucket number.

[0089] Use the specified partition bucket number as the first partition bucket number of the specified data lake file.

[0090] This application parses the concurrent write task to obtain corresponding parsed information; then extracts the file partition description information from the parsed information; subsequently, it filters out the specified partition bucket number that matches the file partition description information from the metadata information; and finally, it uses the specified partition bucket number as the first partition bucket number of the specified data lake file. This application achieves rapid and accurate determination of the first partition bucket number of the specified data lake file matching the concurrent write task based on metadata information, improving the efficiency of obtaining the first partition bucket number and ensuring its accuracy.
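The matching described in [0083] to [0089] can be sketched as follows, using the example partition bucket number 2023-01-01-a from [0088]. The dictionary-based task format, the field names `operation` and `partition_path`, and the assumption that a bucket number contains no hyphen are hypothetical choices made for this illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParsedTask:
    details: str         # task details information, e.g. "append real-time data"
    partition_path: str  # file partition description information


def parse_task(task: Dict[str, str]) -> ParsedTask:
    """Parse a write task given as a dict (a hypothetical task format)."""
    return ParsedTask(details=task["operation"],
                      partition_path=task["partition_path"])


def match_bucket_numbers(lock_table: Dict[str, str],
                         partition_path: str) -> List[str]:
    """Return the partition bucket numbers whose partition path is identical."""
    matches = []
    for bucket_number in lock_table:
        # A partition bucket number is '<partition path>-<bucket number>';
        # splitting on the last hyphen assumes the bucket number has none.
        path, _, _bucket = bucket_number.rpartition("-")
        if path == partition_path:
            matches.append(bucket_number)
    return sorted(matches)
```

Because one partition path can hold several buckets, a single task may yield multiple first partition bucket numbers, each with its own segment lock status value.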

[0091] In some optional implementations of this embodiment, before step S202, the electronic device may further perform the following steps:

[0092] Construct partitioned data tables for all the data lake files.

[0093] In this embodiment, a new partitioned data table can be created to store all the data lake files.

[0094] The file storage paths of all the data lake files are initialized in the partition data table.

[0095] In this embodiment, the file storage paths of all data lake files are initialized in the partition data table so that all data lake files are stored in the partition data table according to a unique partition bucket number (the partition bucket number is composed of partition path + bucket number), and the partition bucket number of each data lake file is used to uniquely identify the corresponding data lake file.

[0096] Based on the preset segmented lock mapping mechanism and the file storage path, the partition data table is initialized to construct corresponding metadata information for all the data lake files in the partition data table.

[0097] In this embodiment, the specific implementation process of the above-mentioned data initialization processing of the partition data table based on the preset segmented lock mapping mechanism and the file storage path to construct corresponding metadata information for all the data lake files in the partition data table will be further described in detail in subsequent specific embodiments of this application, and will not be elaborated on here.

[0098] This application constructs a partitioned data table for all the data lake files, initializes the file storage paths of all the data lake files in the partitioned data table, and then performs data initialization on the partitioned data table based on a preset segmented lock mapping mechanism and the file storage paths, thereby constructing corresponding metadata information for all the data lake files in the partitioned data table. This enables rapid construction of the metadata information and improves its construction efficiency. It also facilitates the subsequent rapid and accurate processing of concurrent write tasks for the data lake files in the partitioned data table based on this metadata information, and effectively avoids concurrent write conflicts in those files.

[0099] In some optional implementations, the data initialization process of the partition data table based on the preset segmented lock mapping mechanism and the file storage path to construct corresponding metadata information for all data lake files in the partition data table includes the following steps:

[0100] Based on the file storage path, the second partition bucket number of each data lake file in the partition data table is determined.

[0101] In this embodiment, the second partition and bucket number of each data lake file in the partition data table can be created by performing corresponding numbering processing on the file storage path of each data lake file in the partition data table.

[0102] Construct data initialization write tasks corresponding to all the data lake files.

[0103] In this embodiment, the data initialization write task is a task that adds the second partition bucket numbers corresponding to all the data lake files to a preset segment lock mapping table, sets the second segment lock status value corresponding to each second partition bucket number to the unlocked state in that table, and then constructs the metadata information for all data lake files in the partition data table from the resulting modified segment lock mapping table.

[0104] Execute the data initialization write task to add the second partition bucket numbers corresponding to all the data lake files to the preset segment lock mapping table.

[0105] In this embodiment, the segmented lock mapping table is a key-value mapping table constructed based on the segmented lock mapping mechanism. The key is the partition bucket number of the data lake file, which uniquely identifies one data lake file under the partitioned data table. The value stores the segmented lock status value of the data lake file, which is either the unlocked state or the locked state.

[0106] In the segmented lock mapping table, the second segment lock status value corresponding to each second partition bucket number is set to the unlocked state, resulting in the modified segmented lock mapping table.

[0107] In this embodiment, during the data initialization process of the partitioned data table, the segment lock status values of all data lake files in the segment lock mapping table are pre-set to an unlocked state to ensure that all data lake files are in a normal state where data can be written normally.

[0108] The modified segmented lock mapping table is stored in a preset initial metadata file to obtain the metadata information.

[0109] In this embodiment, the initial metadata file is a file template constructed to store the segmented lock mapping table. The final metadata information is the information storing the segmented lock mapping table, that is, the metadata information stores the segmented lock information of all data lake files contained in the entire partitioned data table, and this metadata information will be persistently stored. After the metadata information is constructed, the first version information of the partitioned data table, i.e., version 01, will also be generated.

[0110] This application determines the second partition bucket number of each data lake file in the partition data table based on the file storage path; constructs a data initialization write task corresponding to all the data lake files; executes the data initialization write task to add the second partition bucket numbers corresponding to all the data lake files to a preset segment lock mapping table; sets the second segment lock status value corresponding to each second partition bucket number to the unlocked state in the segment lock mapping table to obtain a modified segment lock mapping table; and finally stores the modified segment lock mapping table in a preset initial metadata file to obtain the metadata information. Based on the modified segment lock mapping table, the metadata information corresponding to all the data lake files in the partition data table can be constructed quickly, improving the efficiency and intelligence of metadata information construction.
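The initialization steps above can be illustrated with a short sketch. Several details are assumptions made only for illustration: the storage path layout `table/<partition path>/<bucket>.parquet`, the JSON format of the metadata file, the field names `version` and `segment_locks`, and the use of the integer 1 for what the description calls version 01.

```python
import json
import os
import tempfile
from typing import Dict, List

UNLOCKED = "unlocked"


def partition_bucket_number(storage_path: str) -> str:
    """Derive "<partition path>-<bucket number>" from a file storage path.

    Assumes a hypothetical layout such as "table/2023-01-01/a.parquet".
    """
    partition_path, file_name = storage_path.split("/")[-2:]
    bucket = os.path.splitext(file_name)[0]
    return f"{partition_path}-{bucket}"


def build_metadata(storage_paths: List[str]) -> Dict[str, object]:
    """Build the segment lock mapping table with every entry unlocked."""
    lock_table = {partition_bucket_number(p): UNLOCKED for p in storage_paths}
    # The first version of the partitioned data table is generated
    # together with the metadata information.
    return {"version": 1, "segment_locks": lock_table}


def persist_metadata(metadata: Dict[str, object], path: str) -> None:
    """Store the metadata information persistently (JSON is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2, sort_keys=True)
```

Starting every entry in the unlocked state matches paragraph [0107]: right after initialization, every data lake file accepts writes.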

[0111] In some alternative implementations, prior to step S207, the electronic device may also perform the following steps:

[0112] The third segment lock status value, which matches the first partition bucket number of the specified data lake file, is obtained in real time from the metadata information.

[0113] In this embodiment, the third segment lock state value may include an unlocked state or a locked state.

[0114] Determine whether all the third segment lock status values are in the unlocked state.

[0115] If yes, the concurrent write task is determined to meet the task retry conditions; otherwise, the concurrent write task is determined to not meet the task retry conditions.

[0116] In this embodiment, if all the third segment lock state values are unlocked, it indicates that there are currently no other concurrent write tasks corresponding to the specified data lake file, and thus the concurrent write task is determined to meet the task retry condition. Conversely, if not all the third segment lock state values are unlocked (i.e., including locked states), it indicates that there are currently other concurrent write tasks corresponding to the specified data lake file, and thus the concurrent write task is determined to not meet the task retry condition.

[0117] This application obtains the third segment lock status value matching the first partition and bucket number of the specified data lake file from the metadata information in real time; then determines whether all the third segment lock status values are in an unlocked state; if so, it determines that the concurrent write task meets the task retry condition; otherwise, it determines that the concurrent write task does not meet the task retry condition. This application improves the accuracy of detecting the task retry condition for concurrent write tasks by obtaining the third segment lock status value matching the first partition and bucket number of the specified data lake file from the metadata information in real time and performing content analysis on the third segment lock status value.
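A minimal sketch of this retry check is given below. The description only states that the lock status values are read in real time and that the retry condition holds when all of them are unlocked; the bounded polling loop, the parameter names `max_attempts` and `delay_seconds`, and the `read_locks` callback are assumptions added for illustration.

```python
import time
from typing import Callable, Dict, Iterable


def meets_retry_condition(segment_locks: Dict[str, str],
                          bucket_numbers: Iterable[str]) -> bool:
    """True only if every matching segment lock status value is unlocked."""
    return all(segment_locks[number] == "unlocked" for number in bucket_numbers)


def wait_for_retry(read_locks: Callable[[], Dict[str, str]],
                   bucket_numbers: Iterable[str],
                   max_attempts: int = 5,
                   delay_seconds: float = 0.0) -> bool:
    """Re-read the lock table until the retry condition holds or attempts run out."""
    numbers = list(bucket_numbers)
    for _ in range(max_attempts):
        # read_locks() stands in for fetching the current metadata information.
        if meets_retry_condition(read_locks(), numbers):
            return True
        time.sleep(delay_seconds)
    return False
```

Bounding the number of attempts is a design choice of this sketch, so that a task blocked by a lock that is never released eventually gives up instead of polling forever.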

[0118] In some optional implementations of this embodiment, step S202 includes the following steps:

[0119] Invoke the preset task verification rules.

[0120] In this embodiment, the above-mentioned task verification rule is a rule pre-built according to the content of the actual compliance task processing specification. The task verification rule is used to verify whether the write task to be executed conforms to the rules of the compliance task processing specification.

[0121] The concurrent write task is validated based on the task validation rules.

[0122] In this embodiment, the concurrent write task is verified using the task verification rules. If the concurrent write task is found to comply with the above-mentioned compliant task processing specifications, the concurrent write task is determined to have passed the verification; otherwise, the concurrent write task is determined to have failed the verification.

[0123] If the concurrent write task passes the verification, then the step of obtaining metadata information from the preset partition data table is executed.

[0124] In this embodiment, if the concurrent write task passes the verification, it is determined to be a compliant write task, and the subsequent step of obtaining metadata information from the preset partition data table will be executed. Conversely, if the concurrent write task fails the verification, it is determined to be a non-compliant write task, and the subsequent step of obtaining metadata information from the preset partition data table will not be executed. Instead, the execution of the concurrent write task will be restricted to avoid data lake file corruption caused by processing non-compliant concurrent write tasks, thus ensuring the standardization of concurrent write task processing.

[0125] This application invokes preset task verification rules and verifies the concurrent write task based on these rules; if the concurrent write task passes the verification, it proceeds to the step of obtaining metadata information from the preset partition data table. Because the metadata information is obtained only after the concurrent write task has passed verification, this effectively prevents data lake files from being corrupted by the processing of non-compliant concurrent write tasks and ensures the standardization of concurrent write task processing.
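The verification gate can be sketched as below. The description does not list the actual task verification rules, so the two rules shown here (a non-empty partition path and an operation drawn from an allowed set) are purely hypothetical placeholders, as are the task field names and the `load_metadata` callback.

```python
from typing import Callable, Dict, List, Optional

Task = Dict[str, str]
Rule = Callable[[Task], bool]

# Hypothetical rule inputs; the real compliance specification is not given.
ALLOWED_OPERATIONS = {"append", "overwrite"}


def has_partition_path(task: Task) -> bool:
    return bool(task.get("partition_path"))


def has_allowed_operation(task: Task) -> bool:
    return task.get("operation") in ALLOWED_OPERATIONS


TASK_VERIFICATION_RULES: List[Rule] = [has_partition_path, has_allowed_operation]


def passes_verification(task: Task,
                        rules: List[Rule] = TASK_VERIFICATION_RULES) -> bool:
    """A task is compliant only if it satisfies every rule."""
    return all(rule(task) for rule in rules)


def get_metadata_if_compliant(task: Task,
                              load_metadata: Callable[[], Dict[str, str]]
                              ) -> Optional[Dict[str, str]]:
    """Obtain metadata only for compliant tasks; otherwise restrict execution."""
    if not passes_verification(task):
        return None  # non-compliant: the metadata step is never reached
    return load_metadata()
```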

[0126] In some optional implementations of this embodiment, after step S205, the electronic device may further perform the following steps:

[0127] Invoke the preset lock-free algorithm.

[0128] In this embodiment, the lock-free algorithm described above can specifically adopt the CAS (Compare And Swap) algorithm.

[0129] Obtain the current version information of the partition data table.

[0130] In this embodiment, the current version information of the partition data table can be obtained by querying the version information of the partition data table.

[0131] The partition data table is updated based on the lock-free algorithm and the current version information.

[0132] In this embodiment, the current version information of the partition data table can be iteratively incremented by one using the lock-free algorithm to generate corresponding target version information, and the current version information of the partition data table can be replaced using the target version information to complete the version update process of the partition data table.

[0133] This application calls a preset lock-free algorithm; then obtains the current version information of the partition data table; subsequently, it updates the partition data table based on the lock-free algorithm and the current version information. After completing the processing of concurrent write tasks, this application intelligently uses a lock-free algorithm to update the current version information of the partition data table, thereby achieving automatic updates of the partition data table's version information and improving the intelligence of the update process.
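The version update can be illustrated with a compare-and-swap retry loop. Python exposes no user-level atomic CAS instruction, so this sketch emulates the primitive with a small internal mutex; the class `VersionCell` and the function `bump_version` are hypothetical names, and only the read, increment, and conditional swap pattern is taken from the description.

```python
import threading


class VersionCell:
    """Holds the table version and offers an emulated compare-and-swap."""

    def __init__(self, initial: int = 1) -> None:
        self._value = initial
        # Stands in for the atomicity a hardware CAS would provide.
        self._guard = threading.Lock()

    def get(self) -> int:
        with self._guard:
            return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Set the value to new only if it still equals expected."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def bump_version(cell: VersionCell) -> int:
    """Increment the version by one, retrying if another writer got in first."""
    while True:
        current = cell.get()
        target = current + 1
        if cell.compare_and_swap(current, target):
            return target
```

If two writers read the same current version, only one swap succeeds; the other observes the mismatch, re-reads, and retries, so no increment is lost.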

[0134] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0135] It should be emphasized that, to further ensure the privacy and security of the aforementioned partitioned data table, the partitioned data table can also be stored in a node of a blockchain.

[0136] The blockchain referred to in this application is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. Essentially, a blockchain is a decentralized database, a chain of data blocks linked together using cryptographic methods. Each data block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.

[0137] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

[0138] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.

[0139] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware with computer-readable instructions. These computer-readable instructions can be stored in a computer-readable storage medium. When executed, the program can include the processes of the embodiments of the above methods. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, optical disk, or read-only memory (ROM), or random access memory (RAM).

[0140] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.

[0141] With further reference to Figure 3, as an implementation of the method shown in Figure 2 above, this application provides an embodiment of a task processing device. This device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be specifically applied to various electronic devices.

[0142] As shown in Figure 3, the task processing device 300 described in this embodiment includes: a first judgment module 301, a first acquisition module 302, a determination module 303, a second acquisition module 304, a first processing module 305, a second processing module 306, and a third processing module 307. Wherein:

[0143] The first judgment module 301 is used to determine whether a concurrent write task corresponding to the data lake file write scenario has been received;

[0144] The first acquisition module 302 is used to acquire metadata information from a preset partition data table if the condition is met; wherein, corresponding metadata information is pre-established for all data lake files in the partition data table.

[0145] The determination module 303 is used to determine the first partition bucket number of the specified data lake file that matches the concurrent write task from the metadata information;

[0146] The second acquisition module 304 is used to acquire the first segment lock status value corresponding to the first partition bucket number based on the metadata information;

[0147] The first processing module 305 is configured to, if the first segment lock status values corresponding to the first partition bucket number are all unlocked, set the first segment lock status in the metadata information to the locked state, execute the concurrent write task, and then restore the first segment lock status value to the unlocked state after the concurrent write task is completed.

[0148] The second processing module 306 is used to restrict the execution of the concurrent write task if the first segment lock status value corresponding to the first partition bucket number includes a locked state.

[0149] The third processing module 307 is used to perform execution processing on the concurrent write task after detecting that the concurrent write task meets the preset task retry conditions, until the concurrent write task is completed, and set the first segment lock state value to the unlocked state.

[0150] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the task processing method in the aforementioned embodiments, and will not be repeated here.
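The behavior of modules 304 to 306 can be sketched in one small class. This is an in-process illustration under stated assumptions: the class `SegmentLockTable`, the function `process_write_task`, and the internal mutex that makes the check-and-set step atomic are not taken from the description, which stores the lock states in persistent metadata information.

```python
import threading
from typing import Callable, Dict, List

UNLOCKED, LOCKED = "unlocked", "locked"


class SegmentLockTable:
    """Key-value table: partition bucket number -> segment lock status."""

    def __init__(self, locks: Dict[str, str]) -> None:
        self._locks = dict(locks)
        # Assumption: a mutex makes "check all, then lock all" one atomic step.
        self._guard = threading.Lock()

    def try_lock_all(self, bucket_numbers: List[str]) -> bool:
        """Lock every entry, but only if all of them are currently unlocked."""
        with self._guard:
            if any(self._locks[b] == LOCKED for b in bucket_numbers):
                return False
            for b in bucket_numbers:
                self._locks[b] = LOCKED
            return True

    def unlock_all(self, bucket_numbers: List[str]) -> None:
        with self._guard:
            for b in bucket_numbers:
                self._locks[b] = UNLOCKED

    def state(self, bucket_number: str) -> str:
        with self._guard:
            return self._locks[bucket_number]


def process_write_task(table: SegmentLockTable,
                       bucket_numbers: List[str],
                       write: Callable[[], None]) -> bool:
    """Run the write under segment locks; return False if it was restricted."""
    if not table.try_lock_all(bucket_numbers):
        return False  # restricted: caller waits for the retry condition
    try:
        write()
    finally:
        # Restore the unlocked state even if the write raises.
        table.unlock_all(bucket_numbers)
    return True
```

Releasing the locks in a `finally` clause is a choice of this sketch, so that a failed write does not leave a data lake file permanently locked.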

[0151] In some optional implementations of this embodiment, the determining module 303 includes:

[0152] The parsing submodule is used to parse the information of the concurrent write task to obtain the corresponding parsing information;

[0153] An extraction submodule is used to extract file partition description information from the parsed information in the concurrent write task;

[0154] The filtering submodule is used to filter out specified partition and bucket numbers that match the file partition description information from the metadata information.

[0155] The first determining submodule is used to use the specified partition and bucket number as the first partition and bucket number of the specified data lake file.

[0156] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the task processing method in the aforementioned embodiments, and will not be repeated here.

[0157] In some optional implementations of this embodiment, the task processing device further includes:

[0158] The build module is used to construct partitioned data tables for all the data lake files;

[0159] An initialization module is used to initialize the file storage paths of all the data lake files in the partition data table;

[0160] The fourth processing module is used to perform data initialization processing on the partition data table based on the preset segmented lock mapping mechanism and the file storage path, so as to construct corresponding metadata information for all the data lake files in the partition data table.

[0161] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the task processing method in the aforementioned embodiments, and will not be repeated here.

[0162] In some optional implementations of this embodiment, the fourth processing module includes:

[0163] The second determining submodule is used to determine the second partition bucket number of each data lake file in the partition data table based on the file storage path;

[0164] The construction submodule is used to construct the data initialization write task corresponding to all the data lake files;

[0165] The execution submodule is used to execute the data initialization and writing task, and add the second partition and bucket numbers corresponding to all the data lake files to the preset segment lock mapping table;

[0166] The setting submodule is used to set the second segment lock status value corresponding to each second partition bucket number in the segment lock mapping table to the unlocked state, so as to obtain the modified segment lock mapping table.

[0167] A generation submodule is used to store the modified segmented lock mapping table into a preset initial metadata file to obtain the metadata information.

[0168] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the task processing method in the aforementioned embodiments, and will not be repeated here.

[0169] In some optional implementations of this embodiment, the task processing device further includes:

[0170] The third acquisition module is used to acquire in real time the third segment lock status value that matches the first partition bucket number of the specified data lake file in the metadata information;

[0171] The second judgment module is used to determine whether the third segment lock status values are all in an unlocked state.

[0172] The determination module is used to determine that the concurrent write task meets the task retry condition if so, and otherwise to determine that the concurrent write task does not meet the task retry condition.

[0173] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the task processing method in the aforementioned embodiments, and will not be repeated here.

[0174] In some optional implementations of this embodiment, the first acquisition module 302 includes:

[0175] The calling submodule is used to invoke the preset task verification rules;

[0176] The verification submodule is used to verify the concurrent write task based on the task verification rules;

[0177] The execution submodule is used to execute the step of obtaining metadata information from the preset partition data table if the concurrent write task passes the verification.

[0178] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the task processing method in the aforementioned embodiments, and will not be repeated here.

[0179] In some optional implementations of this embodiment, the task processing device further includes:

[0180] The calling module is used to invoke the preset lock-free algorithm;

[0181] The fourth acquisition module is used to acquire the current version information of the partition data table;

[0182] The update module is used to perform version update processing on the partition data table based on the lock-free algorithm and the current version information.

[0183] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the task processing method in the aforementioned embodiments, and will not be repeated here.

[0184] To address the aforementioned technical problems, embodiments of this application also provide a computer device. Please refer to Figure 4, which is a basic structural block diagram of the computer device in this embodiment.

[0185] The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are interconnected via a system bus. It should be noted that only the computer device 4 with components 41-43 is shown in the figure; however, it should be understood that it is not required to implement all the shown components, and more or fewer components can be implemented alternatively. Those skilled in the art will understand that the computer device described here is a device capable of automatically performing numerical calculations and / or information processing according to pre-set or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.

[0186] The computer device can be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device can interact with the user via a keyboard, mouse, remote control, touchpad, or voice control.

[0187] The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as the hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit and its external storage device of the computer device 4. In this embodiment, the memory 41 is typically used to store the operating system and various application software installed on the computer device 4, such as computer-readable instructions for task processing methods. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or will be output.

[0188] In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is used to execute computer-readable instructions stored in the memory 41 or to process data, for example, to execute computer-readable instructions of the task processing method.

[0189] The network interface 43 may include a wireless network interface or a wired network interface, which is typically used to establish communication connections between the computer device 4 and other electronic devices.

[0190] Compared with the prior art, the embodiments of this application have the following main advantages:

[0191] In this embodiment, the processing of concurrent write tasks corresponding to data lake file write scenarios is optimized by using metadata information from a pre-built partitioned data table. The first partition bucket number of the specified data lake file matching the concurrent write task is determined from the metadata information, and the first segment lock state value corresponding to the first partition bucket number is obtained from the metadata information. Then, content analysis is performed on the first segment lock state value to accurately respond to the concurrent write task based on the analysis results. The analysis of the segment lock state in the metadata information enables fast and accurate processing of concurrent write tasks corresponding to data lake files in the partitioned data table, effectively avoiding concurrent write conflicts in the data lake files in the partitioned data table. This significantly reduces the probability of conflicts in data lake files during concurrent write scenarios and greatly reduces the adverse effects on production caused by data latency or inconsistency due to concurrent write conflicts in data lake files.

[0192] This application also provides another embodiment, namely, providing a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the task processing method described above.

[0193] Compared with the prior art, the embodiments of this application have the following main advantages:

[0194] In this embodiment, the processing of concurrent write tasks corresponding to data lake file write scenarios is optimized by using metadata information from a pre-built partitioned data table. The first partition bucket number of the specified data lake file matching the concurrent write task is determined from the metadata information, and the first segment lock state value corresponding to the first partition bucket number is obtained from the metadata information. Then, content analysis is performed on the first segment lock state value to accurately respond to the concurrent write task based on the analysis results. The analysis of the segment lock state in the metadata information enables fast and accurate processing of concurrent write tasks corresponding to data lake files in the partitioned data table, effectively avoiding concurrent write conflicts in the data lake files in the partitioned data table. This significantly reduces the probability of conflicts in data lake files during concurrent write scenarios and greatly reduces the adverse effects on production caused by data latency or inconsistency due to concurrent write conflicts in data lake files.

[0195] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0196] Obviously, the embodiments described above are only some embodiments of this application, not all embodiments. The accompanying drawings show preferred embodiments of this application, but do not limit the patent scope of this application. This application can be implemented in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure of this application can be understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent substitutions for some of the technical features. Any equivalent structure made using the content of this application's specification and drawings, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims

1. A task processing method, characterized in that it comprises the following steps:
determining whether a concurrent write task corresponding to a data lake file write scenario has been received;
if so, obtaining metadata information from a preset partitioned data table, wherein corresponding metadata information has been pre-established for all data lake files in the partitioned data table;
determining, from the metadata information, the first partition bucket number of the specified data lake file that matches the concurrent write task;
obtaining, based on the metadata information, the first segment lock state value corresponding to the first partition bucket number;
if the first segment lock state values corresponding to the first partition bucket number are all unlocked, setting the first segment lock state in the metadata information to locked, executing the concurrent write task until the concurrent write task is completed, and then restoring the first segment lock state value to unlocked;
if the first segment lock state values corresponding to the first partition bucket number include a locked state, restricting execution of the concurrent write task;
after detecting that the concurrent write task meets the preset task retry conditions, executing the concurrent write task until the concurrent write task is completed, and setting the first segment lock state value to unlocked;
wherein, prior to the step of obtaining metadata information from the preset partitioned data table, the method further comprises:
constructing a partitioned data table for all the data lake files;
initializing the file storage paths of all the data lake files in the partitioned data table;
performing data initialization on the partitioned data table based on a preset segment lock mapping mechanism and the file storage paths, so as to construct corresponding metadata information for all the data lake files in the partitioned data table;
wherein the step of performing data initialization on the partitioned data table based on the preset segment lock mapping mechanism and the file storage paths, so as to construct corresponding metadata information for all the data lake files in the partitioned data table, specifically comprises:
determining, based on the file storage paths, the second partition bucket number of each data lake file in the partitioned data table;
constructing a data initialization write task corresponding to all the data lake files;
executing the data initialization write task to add the second partition bucket numbers corresponding to all the data lake files to a preset segment lock mapping table;
setting, in the segment lock mapping table, the second segment lock state value corresponding to each second partition bucket number to unlocked, to obtain a modified segment lock mapping table;
storing the modified segment lock mapping table in a preset initial metadata file to obtain the metadata information;
wherein the data initialization write task is an initialization task that adds the second partition bucket numbers corresponding to all the data lake files to the preset segment lock mapping table and sets the second segment lock state value corresponding to each second partition bucket number to unlocked in the segment lock mapping table, so that the metadata information of all data lake files in the partitioned data table is constructed from the modified segment lock mapping table.
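
The initialization steps recited in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the claim does not say how a partition bucket number is derived from a file storage path, so the `.../<partition>/bucket-<digits>.<ext>` naming convention, the JSON layout of the metadata file, and the function names `partition_bucket_number`, `build_segment_lock_map`, and `write_initial_metadata` are all hypothetical.

```python
import json
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable

UNLOCKED = "unlocked"

# Assumed naming convention: .../<partition>/bucket-<digits>.<ext>
_BUCKET_RE = re.compile(r"bucket-(\d+)")


def partition_bucket_number(storage_path: str) -> str:
    """Derive a '<partition>/<bucket>' key from a file storage path."""
    path = PurePosixPath(storage_path)
    match = _BUCKET_RE.search(path.name)
    if match is None:
        raise ValueError(f"no bucket id found in {storage_path!r}")
    return f"{path.parent.name}/{match.group(1)}"


def build_segment_lock_map(storage_paths: Iterable[str]) -> Dict[str, str]:
    """Add every partition bucket number to the map, initially unlocked."""
    return {partition_bucket_number(p): UNLOCKED for p in storage_paths}


def write_initial_metadata(storage_paths: Iterable[str],
                           metadata_file: str) -> Dict[str, str]:
    """Store the segment lock map in the initial metadata file (JSON here)."""
    lock_map = build_segment_lock_map(storage_paths)
    with open(metadata_file, "w", encoding="utf-8") as fh:
        json.dump({"segment_locks": lock_map}, fh, indent=2, sort_keys=True)
    return lock_map
```

In this sketch, two files that share a partition and bucket id map to the same key, so they would share one segment lock.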

2. The task processing method according to claim 1, characterized in that the step of determining, from the metadata information, the first partition bucket number of the specified data lake file that matches the concurrent write task specifically comprises:
parsing the concurrent write task to obtain corresponding parsed information;
extracting file partition description information from the parsed information;
filtering out, from the metadata information, the specified partition bucket number that matches the file partition description information;
using the specified partition bucket number as the first partition bucket number of the specified data lake file.
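
The matching step of claim 2 might look like the sketch below. The claim does not define the format of the file partition description information, so this example assumes a task dictionary with a `partition` field and an optional `buckets` field, and lock-map keys of the form `<partition>/<bucket>`. These names, and the function `match_bucket_numbers`, are hypothetical.

```python
from typing import Dict, List, Mapping, Optional, Sequence


def extract_partition_description(task: Mapping[str, object]) -> Dict[str, object]:
    """Pull the (assumed) partition description fields out of a parsed task."""
    buckets: Optional[Sequence[str]] = task.get("buckets")  # type: ignore[assignment]
    return {"partition": str(task["partition"]), "buckets": buckets}


def match_bucket_numbers(task: Mapping[str, object],
                         lock_map: Mapping[str, str]) -> List[str]:
    """Return the lock-map keys that match the task's partition description."""
    description = extract_partition_description(task)
    wanted = description["buckets"]
    matches: List[str] = []
    for key in sorted(lock_map):
        partition, _, bucket = key.rpartition("/")
        if partition != description["partition"]:
            continue
        # No explicit bucket list means the task touches the whole partition.
        if wanted is None or bucket in wanted:
            matches.append(key)
    return matches
```

Treating a missing `buckets` field as "all buckets of the partition" is an assumption of this sketch, chosen so that a partition-wide write checks every segment lock it could touch.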

3. The task processing method according to claim 1, characterized in that, before the step of executing the concurrent write task until the concurrent write task is completed after detecting that the concurrent write task meets the preset task retry conditions, and setting the first segment lock state value to unlocked, the method further comprises:
obtaining, in real time, the third segment lock state values in the metadata information that match the first partition bucket number of the specified data lake file;
determining whether all the third segment lock state values are unlocked;
if so, determining that the concurrent write task meets the task retry conditions; otherwise, determining that the concurrent write task does not meet the task retry conditions.
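
The retry check of claim 3 can be sketched as a polling loop. The claim only says the lock state values are obtained "in real time"; the fixed polling interval, the bounded number of polls, and the names `meets_retry_condition` and `wait_for_retry` are assumptions made for this illustration.

```python
import time
from typing import Callable, Iterable, Mapping


def meets_retry_condition(lock_states: Mapping[str, str],
                          buckets: Iterable[str]) -> bool:
    """True only if every segment lock for the task's buckets is unlocked."""
    return all(lock_states[b] == "unlocked" for b in buckets)


def wait_for_retry(read_lock_states: Callable[[], Mapping[str, str]],
                   buckets: Iterable[str],
                   poll_interval: float = 0.05,
                   max_polls: int = 100) -> bool:
    """Re-read the metadata until the retry condition holds or polls run out."""
    wanted = list(buckets)
    for _ in range(max_polls):
        # Each poll fetches a fresh snapshot of the segment lock states.
        if meets_retry_condition(read_lock_states(), wanted):
            return True
        time.sleep(poll_interval)
    return False
```

A `True` result only means the segments looked unlocked at the last poll. The retried task would still need to set the lock atomically before writing, since another task may acquire it in between.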

4. The task processing method according to claim 1, characterized in that the step of obtaining metadata information from the preset partitioned data table specifically comprises:
invoking preset task verification rules;
verifying the concurrent write task based on the task verification rules;
if the concurrent write task passes the verification, executing the step of obtaining metadata information from the preset partitioned data table.

5. The task processing method according to claim 1, characterized in that, after the step of, if the first segment lock state values corresponding to the first partition bucket number are all unlocked, setting the first segment lock state in the metadata information to locked, executing the concurrent write task until the concurrent write task is completed, and then restoring the first segment lock state value to unlocked, the method further comprises:
invoking a preset lock-free algorithm;
obtaining the current version information of the partitioned data table;
updating the partitioned data table based on the lock-free algorithm and the current version information.
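
Claim 5 does not name the lock-free algorithm. One common reading is optimistic, version-checked updating (compare-and-set on a version number), which the following sketch illustrates as an assumption rather than as the claimed algorithm. `TableSnapshot`, `VersionedTable`, and `commit_file` are hypothetical names, and the small mutex inside `compare_and_set` only emulates the atomic conditional update that a real metadata store would be expected to provide.

```python
import threading
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableSnapshot:
    version: int
    files: Tuple[str, ...]


class VersionedTable:
    """Hypothetical table state that is only replaced via compare-and-set."""

    def __init__(self) -> None:
        self._snapshot = TableSnapshot(version=0, files=())
        self._atomic = threading.Lock()  # emulates an atomic conditional put

    def read(self) -> TableSnapshot:
        return self._snapshot

    def compare_and_set(self, expected_version: int,
                        new_files: Tuple[str, ...]) -> bool:
        """Install a new snapshot only if the version is still the expected one."""
        with self._atomic:
            if self._snapshot.version != expected_version:
                return False  # someone else committed first
            self._snapshot = TableSnapshot(expected_version + 1, new_files)
            return True


def commit_file(table: VersionedTable, file_name: str,
                max_attempts: int = 10) -> int:
    """Append a file using read, modify, compare-and-set; retry on conflict."""
    for _ in range(max_attempts):
        current = table.read()
        if table.compare_and_set(current.version, current.files + (file_name,)):
            return current.version + 1
    raise RuntimeError("too many conflicting commits")
```

Writers in this scheme never block one another while preparing an update. A writer that loses the race re-reads the current version and tries again.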

6. A task processing device, characterized in that it comprises:
a first judgment module, used to determine whether a concurrent write task corresponding to a data lake file write scenario has been received;
a first acquisition module, used to acquire metadata information from a preset partitioned data table if so, wherein corresponding metadata information has been pre-established for all data lake files in the partitioned data table;
a determination module, used to determine, from the metadata information, the first partition bucket number of the specified data lake file that matches the concurrent write task;
a second acquisition module, used to acquire, based on the metadata information, the first segment lock state value corresponding to the first partition bucket number;
a first processing module, used to, if the first segment lock state values corresponding to the first partition bucket number are all unlocked, set the first segment lock state in the metadata information to locked, execute the concurrent write task until the concurrent write task is completed, and then restore the first segment lock state value to unlocked;
a second processing module, used to restrict execution of the concurrent write task if the first segment lock state values corresponding to the first partition bucket number include a locked state;
a third processing module, used to execute the concurrent write task until the concurrent write task is completed after detecting that the concurrent write task meets the preset task retry conditions, and to set the first segment lock state value to unlocked;
wherein the task processing device further comprises:
a build module, used to construct a partitioned data table for all the data lake files;
an initialization module, used to initialize the file storage paths of all the data lake files in the partitioned data table;
a fourth processing module, used to perform data initialization on the partitioned data table based on a preset segment lock mapping mechanism and the file storage paths, so as to construct corresponding metadata information for all the data lake files in the partitioned data table;
wherein the fourth processing module comprises:
a second determining submodule, used to determine, based on the file storage paths, the second partition bucket number of each data lake file in the partitioned data table;
a construction submodule, used to construct a data initialization write task corresponding to all the data lake files;
an execution submodule, used to execute the data initialization write task to add the second partition bucket numbers corresponding to all the data lake files to a preset segment lock mapping table;
a setting submodule, used to set, in the segment lock mapping table, the second segment lock state value corresponding to each second partition bucket number to unlocked, to obtain a modified segment lock mapping table;
a generation submodule, used to store the modified segment lock mapping table in a preset initial metadata file to obtain the metadata information;
wherein the data initialization write task is an initialization task that adds the second partition bucket numbers corresponding to all the data lake files to the preset segment lock mapping table and sets the second segment lock state value corresponding to each second partition bucket number to unlocked in the segment lock mapping table, so that the metadata information of all data lake files in the partitioned data table is constructed from the modified segment lock mapping table.

7. A computer device comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the steps of the task processing method as described in any one of claims 1 to 5.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the task processing method as described in any one of claims 1 to 5.