Data processing method, device, medium and equipment based on lake warehouse data platform

By storing data uniformly on a lake warehouse data platform, building a financial business knowledge base, and using large models to generate data query statements, the method addresses the problem of data silos in securities firms, enabling fast and accurate data analysis and processing and improving the efficiency and compliance of data services.

CN122240743A · Pending · Publication Date: 2026-06-19 · CSC FINANCIAL CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CSC FINANCIAL CO LTD
Filing Date
2026-03-05
Publication Date
2026-06-19


Abstract

This application discloses a data processing method, apparatus, medium, and device based on a lake warehouse data platform. The method includes: storing multiple items of unstructured data and multiple items of structured data on a pre-deployed lake warehouse data platform, and establishing an association between each item of unstructured data and its corresponding structured data; constructing a business knowledge base containing financial business terminology and financial regulatory rules; for each of several data analysis intents, using a large model to generate a corresponding data query statement based on the business knowledge base; in response to a target user's selection of the target data query statement corresponding to a target data analysis intent, obtaining target unstructured data and target structured data from the lake warehouse data platform based on the target data query statement; and performing data analysis and processing on the target unstructured data and the target structured data to obtain data processing results.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a data processing method, apparatus, medium and equipment based on a lake warehouse data platform.

Background Technology

[0002] As the securities industry accelerates its digital transformation, brokerage businesses are generating massive amounts of multi-source, heterogeneous data, encompassing both structured and unstructured data such as customer transaction records, real-time market data, customer service records, compliance audit logs, investment research reports, emails, audio recordings, and chat logs. This data forms the core foundation for customer profiling, intelligent marketing, real-time risk control, compliance monitoring, and large-scale model training.

[0003] However, traditional data architectures are increasingly showing systemic bottlenecks when dealing with such complex data ecosystems. Currently, most securities firms still use a "siloed" IT system architecture, with customer relationship management (CRM), core trading systems, risk control platforms, and compliance reporting systems operating independently. This results in inconsistent data standards and fragmented data storage, leading to a severe "data silo" phenomenon. When business personnel need to analyze customer behavior across systems, they often have to coordinate multiple IT teams to write complex structured query language (SQL) scripts, which is time-consuming and consequently results in low data analysis and processing efficiency.

[0004] Therefore, there is an urgent need for a data processing method based on a lake warehouse data platform to solve the problem that existing technologies cannot process data quickly and accurately.

Summary of the Invention

[0005] In view of this, the present invention provides a data processing method, apparatus, medium and equipment based on the Lake Warehouse Data Platform, the main purpose of which is to solve the problem of the inability to process data quickly and accurately at present.

[0006] To address the aforementioned problems, this application provides a data processing method based on a lake warehouse data platform, comprising:
based on a pre-deployed lake warehouse data platform, storing a number of unstructured data and a number of structured data, and establishing the association between each unstructured data and the corresponding structured data;
building a business knowledge base that includes financial business terminology and financial regulatory rules;
for several data analysis intents, using a pre-trained large model to generate, based on the business knowledge base, the data query statement corresponding to each data analysis intent;
in response to a target user's selection of the target data query statement corresponding to a target data analysis intent, obtaining a number of target unstructured data and a number of target structured data from the lake warehouse data platform based on the target data query statement;
based on the target data analysis intent, performing data analysis and processing on each of the target unstructured data and each of the target structured data to obtain data processing results.

[0007] Optionally, storing several unstructured data and several structured data based on the pre-deployed lake warehouse data platform, and establishing the association between each unstructured data and the corresponding structured data, specifically includes:
based on the pre-deployed lake warehouse data platform, storing the unstructured data in a distributed file system, and storing the metadata of each unstructured data in a pre-created directory table;
creating a regular table associated with the directory table, and storing the structured data corresponding to each unstructured data in the regular table, to establish the association between each unstructured data and the corresponding structured data.

[0008] Optionally, before establishing the association between each unstructured data and its corresponding structured data, the method further includes:
performing data analysis on each unstructured data using the pre-trained large model to obtain keyword tags for each unstructured data, thereby obtaining metadata for each unstructured data;
based on the keyword tags of each unstructured data, determining, from the several structured data, the structured data associated with each unstructured data.

[0009] Optionally, obtaining several target unstructured data and several target structured data from the lake warehouse data platform based on the target data query statement specifically includes:
based on the target key fields in the target data query statement, obtaining several target structured data from the regular table;
based on the target key fields in the target data query statement, obtaining the target metadata containing the target key fields from the directory table, and obtaining the target Uniform Resource Locator from the target metadata;
based on the target Uniform Resource Locator, obtaining several target unstructured data from the distributed file system.

[0010] Optionally, the method further includes: using a text component to perform structured transformation on each unstructured data to obtain transformed structured data.
In this case, obtaining several target unstructured data and several target structured data from the lake warehouse data platform based on the target data query statement specifically includes:
based on the target key fields in the target data query statement, using a target retrieval engine to search the transformed structured data, to obtain the transformed target structured data containing the target key fields;
determining the corresponding target unstructured data based on the transformed target structured data.

[0011] Optionally, before storing a number of unstructured data and a number of structured data based on a pre-deployed lake warehouse data platform, the method further includes:
establishing communication connections between the lake warehouse data platform and relational and non-relational databases in advance;
obtaining a number of structured data from the relational database;
obtaining a number of unstructured data from the non-relational database.

[0012] Optionally, the metadata may also include any one or more of the following: storage time of unstructured data, file size of unstructured data, storage location, Uniform Resource Locator (URL), and checksum.
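As an illustration of the metadata fields listed above, the following Python sketch builds one directory-table record for an unstructured file. The class name `DirectoryEntry`, the helper `build_entry`, the field names, and the `hdfs://example/...` URL are all hypothetical and are not taken from the application; MD5 is used only because a later embodiment mentions an MD5 checksum.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DirectoryEntry:
    """Hypothetical directory-table row describing one unstructured file."""
    file_id: str
    url: str            # location of the file in the distributed file system
    size_bytes: int     # file size
    checksum_md5: str   # checksum used to verify file integrity
    stored_at: str      # storage time, ISO 8601
    tags: list[str] = field(default_factory=list)


def build_entry(file_id: str, url: str, content: bytes, tags: list[str]) -> DirectoryEntry:
    """Derive size and checksum from the file bytes; the caller supplies the rest."""
    return DirectoryEntry(
        file_id=file_id,
        url=url,
        size_bytes=len(content),
        checksum_md5=hashlib.md5(content).hexdigest(),
        stored_at=datetime.now(timezone.utc).isoformat(),
        tags=list(tags),
    )


entry = build_entry(
    "doc-001",
    "hdfs://example/contracts/doc-001.pdf",  # illustrative URL only
    b"sample contract bytes",
    ["Name A"],
)
print(entry.size_bytes, entry.checksum_md5)
```

Recomputing the checksum when a file is read back and comparing it with `checksum_md5` is one way such a record could be used to detect corruption.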

[0013] To address the aforementioned problems, this application provides a data processing apparatus based on a lake warehouse data platform, comprising:
a storage module, used to store a number of unstructured data and a number of structured data based on a pre-deployed lake warehouse data platform, and to establish the association between each unstructured data and the corresponding structured data;
a building module, used to construct a business knowledge base that includes financial business terminology and financial regulatory rules;
a generation module, used to generate, for several data analysis intents, the data query statement corresponding to each data analysis intent based on the business knowledge base, using a pre-trained large model;
a query module, used to respond to a target user's selection of the target data query statement corresponding to a target data analysis intent, and to obtain a number of target unstructured data and a number of target structured data from the lake warehouse data platform based on the target data query statement;
a processing module, used to perform data analysis and processing on the target unstructured data and the target structured data in accordance with the target data analysis intent, to obtain data processing results.

[0014] To address the aforementioned problems, this application provides a storage medium storing a computer program that, when executed by a processor, implements the steps of the data processing method based on the Lake Warehouse data platform described above.

[0015] To address the aforementioned problems, this application provides an electronic device, comprising at least a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program in the memory, implements the steps of any of the aforementioned data processing methods based on the Lake Warehouse data platform.

[0016] The data processing method based on the lake warehouse data platform in this application stores all unstructured and structured data uniformly on the lake warehouse data platform. This allows comprehensive and rapid data retrieval within the platform and solves the problem of low retrieval efficiency caused by data being distributed across different systems / platforms, thus improving data retrieval efficiency and consequently data analysis and processing efficiency. Furthermore, by creating a business knowledge base and using a large model together with the business knowledge base to generate data query statements, the generated statements become more reasonable, more accurate, and better suited to financial business scenarios. This ensures accurate retrieval of relevant data from the lake warehouse data platform, thereby improving the accuracy of data analysis and processing.

[0017] The above description is merely an overview of the technical solution of the present invention. In order to better understand the technical means of the present invention and to implement it in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are described below.

Description of the Drawings

[0018] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:
Figure 1 is a flowchart of a data processing method based on a lake warehouse data platform according to an embodiment of this application;
Figure 2 is a flowchart of the data access process according to an embodiment of this application;
Figure 3 is a structural block diagram of a data processing device based on a lake warehouse data platform according to another embodiment of this application;
Figure 4 is a structural block diagram of an electronic device according to another embodiment of this application.

Detailed Implementation

[0019] Various embodiments and features of this application are described herein with reference to the accompanying drawings.

[0020] It should be understood that various modifications can be made to the embodiments described herein. Therefore, the above description should not be considered as limiting, but merely as an example of embodiments. Other modifications within the scope and spirit of this application will be apparent to those skilled in the art.

[0021] The accompanying drawings, which are included in and form part of this specification, illustrate embodiments of the present application and, together with the general description of the present application given above and the detailed description of the embodiments given below, serve to explain the principles of the present application.

[0022] These and other features of this application will become apparent from the following description of preferred forms of embodiments given as non-limiting examples, with reference to the accompanying drawings.

[0023] It should also be understood that although this application has been described with reference to some specific examples, those skilled in the art can certainly implement many other equivalent forms of this application.

[0024] The above and other aspects, features and advantages of this application will become more apparent when taken in conjunction with the accompanying drawings and in view of the following detailed description.

[0025] Specific embodiments of this application are described hereinafter with reference to the accompanying drawings; however, it should be understood that the disclosed embodiments are merely examples of this application, which can be implemented in various ways. Well-known and / or repeated functions and structures are not described in detail, to avoid obscuring the application with unnecessary or redundant detail. Therefore, the specific structural and functional details disclosed herein are not intended to be limiting, but merely to serve as a representative basis for teaching those skilled in the art to employ this application in virtually any suitably detailed structure.

[0026] This specification may use the phrases “in one embodiment,” “in another embodiment,” “in yet another embodiment,” or “in other embodiments,” all of which may refer to one or more of the same or different embodiments according to this application.

[0027] This application provides a data processing method based on a lake warehouse data platform. As shown in Figure 1, it includes the following steps:
Step S101: Based on the pre-deployed lake warehouse data platform, store a number of unstructured data and a number of structured data, and establish the association between each unstructured data and the corresponding structured data. In a specific implementation, communication connections between the lake warehouse data platform and various relational and non-relational databases can be established in advance; structured data is then obtained from the relational databases and unstructured data from the non-relational databases. The structured data can include transaction tables, user information tables, account information tables, risk control indicator tables, etc. The unstructured data can include user credit materials / documents, investment contracts, scanned copies of user ID cards, user photos, etc.

[0028] Step S102: Construct a business knowledge base that includes financial business terminology and financial regulatory rules. In this step, business terminology from the securities industry and regulatory provisions can be added to the system as knowledge, either item by item or by batch import in CSV format, thereby building the business knowledge base. In addition, based on specific securities business scenarios, the analytical dimensions, dimension members, and indicators in the business data model, as well as the financial business terms and financial regulatory rules in the business knowledge, can be vectorized and stored in a vector library that serves as an external knowledge base for the large model.
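The batch import and vectorization described in this step can be sketched in Python as follows. This is a toy illustration under stated assumptions: the CSV columns (`term`, `definition`), the two sample entries, and the function names are invented for the example, and a simple bag-of-words count vector with cosine similarity stands in for whatever embedding model and vector library an actual deployment would use.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical CSV export of business terms; columns and rows are illustrative.
KB_CSV = """term,definition
margin call,Demand for additional collateral when account equity falls below the maintenance requirement
suitability rule,Products recommended to a customer must match the customer's risk tolerance rating
"""


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def load_kb(csv_text: str) -> list[tuple[str, str, Counter]]:
    """Batch-import terms from CSV and vectorize each entry."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["term"], row["definition"], embed(row["term"] + " " + row["definition"]))
        for row in reader
    ]


def retrieve(kb: list[tuple[str, str, Counter]], question: str, k: int = 1) -> list[str]:
    """Return the k terms whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(kb, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [term for term, _, _ in ranked[:k]]


kb = load_kb(KB_CSV)
print(retrieve(kb, "show accounts with margin call last month"))
```

The retrieved entries would then be supplied to the large model as context when it generates a query statement.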

[0029] Step S103: For several data analysis intents, the pre-trained large model is used to generate the data query statement corresponding to each data analysis intent based on the business knowledge base. In a specific implementation, this step can call the large model interface to interpret the data analysis intent and to clarify the dimensions, indicators, filtering conditions and grouping methods of the analysis, laying the groundwork for generating the SQL required for the data analysis.
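To make the relationship between a clarified intent and the resulting SQL concrete, the sketch below renders a structured intent (dimensions, indicators, filters, grouping) into a query string. It is only an assumed intermediate representation: the application does not specify one, the `AnalysisIntent` class and `render_sql` function are hypothetical, and the large model call that would fill in the intent is replaced here by a hand-written instance.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisIntent:
    """Hypothetical structured form of a data analysis intent."""
    table: str
    dimensions: list[str]                              # columns to group by
    indicators: list[str]                              # aggregate expressions
    filters: list[str] = field(default_factory=list)   # WHERE conditions


def render_sql(intent: AnalysisIntent) -> str:
    """Render the intent as SQL. A real system should validate identifiers
    and bind filter values as parameters instead of interpolating strings."""
    select = ", ".join(intent.dimensions + intent.indicators)
    sql = f"SELECT {select} FROM {intent.table}"
    if intent.filters:
        sql += " WHERE " + " AND ".join(intent.filters)
    if intent.dimensions:
        sql += " GROUP BY " + ", ".join(intent.dimensions)
    return sql


# In the described method a large model would produce this structure;
# here it is written by hand for illustration.
intent = AnalysisIntent(
    table="trades",
    dimensions=["branch"],
    indicators=["SUM(amount) AS total_amount"],
    filters=["trade_date >= '2026-01-01'"],
)
print(render_sql(intent))
```

Separating intent extraction from SQL rendering in this way would also make it straightforward to present several candidate statements for the user to choose from.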

[0030] Step S104: In response to the target user's selection of the target data query statement corresponding to the target data analysis intent, obtain a number of target unstructured data and a number of target structured data from the lake warehouse data platform based on the target data query statement. In this step, the selection operation can specifically be a drag operation or a click on a virtual button. In this way, users can directly select the target data query statement from among the generated data query statements, laying the foundation for subsequent rapid data retrieval / query based on the target data query statement.

[0031] Step S105: Based on the target data analysis intent, perform data analysis and processing on each of the target unstructured data and each of the target structured data to obtain data processing results.

[0032] In this step, after the target unstructured data and the target structured data are obtained, data analysis and processing can be performed quickly on them. In a specific implementation, the pre-trained large model can be used to analyze the target unstructured data and the target structured data.

[0033] In this embodiment, all unstructured and structured data are stored uniformly on the lake warehouse data platform, so that comprehensive and rapid data retrieval can subsequently be performed on the platform. This solves the problem of low retrieval efficiency caused by data being distributed across different systems / platforms, improving data retrieval efficiency and thus data analysis and processing efficiency. In addition, by creating a business knowledge base and using a large model together with the business knowledge base to generate data query statements, the generated statements are made more reasonable, more accurate, and better suited to financial business scenarios. This ensures accurate retrieval of relevant data from the lake warehouse data platform and improves the accuracy of data analysis and processing.

[0034] Another embodiment of this application provides a data processing method based on a lake warehouse data platform, which specifically includes the following steps:
Step 1: Establish communication connections between the lake warehouse data platform and relational and non-relational databases in advance; obtain a number of structured data from the relational database; obtain a number of unstructured data from the non-relational database. Specifically, connection information for the corresponding data source can be obtained based on the business scenario, and the data can be accessed through the lake warehouse data platform via data connections. Relational databases can include, for example, MySQL, Oracle, and SQL Server; non-relational databases can include, for example, MongoDB and Cassandra.

[0035] Step 2: Based on the pre-trained large model, perform data analysis on each unstructured data to obtain keyword tags for each unstructured data, thereby obtaining metadata for each unstructured data; based on the keyword tags of each unstructured data, determine the structured data associated with each unstructured data from several structured data. In this step, keyword tags can be information such as name, ID number, financial product name, transaction serial number, etc. That is, for unstructured data, a large model can be used to identify and obtain keyword tags from the unstructured data, such as name A. Then, based on name A, structured data containing name A is determined from several structured data sets, thereby enabling the association between structured data containing name A and unstructured data containing name A.
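The tag-based association in this step can be illustrated with a short Python sketch. It assumes the keyword tags have already been extracted (the large model call is not shown), and it uses exact matching between a tag and any field value of a structured record; the function name `associate` and the sample field names are hypothetical.

```python
def associate(tags_by_file: dict[str, set[str]], records: list[dict]) -> dict[str, list[int]]:
    """Link each unstructured file to the structured records that contain
    at least one of the file's keyword tags as a field value."""
    links: dict[str, list[int]] = {}
    for file_id, tags in tags_by_file.items():
        links[file_id] = [
            rec["record_id"]
            for rec in records
            # Compare tags against every field value except the record id.
            if tags & {str(v) for k, v in rec.items() if k != "record_id"}
        ]
    return links


# Tags as a large model might return them (illustrative values only).
tags_by_file = {"doc-001": {"Name A", "TXN-9"}, "doc-002": {"Name B"}}
records = [
    {"record_id": 1, "customer_name": "Name A", "txn_no": "TXN-1"},
    {"record_id": 2, "customer_name": "Name C", "txn_no": "TXN-9"},
    {"record_id": 3, "customer_name": "Name D", "txn_no": "TXN-3"},
]
links = associate(tags_by_file, records)
print(links)
```

Exact matching is the simplest possible rule; a production system would likely need normalization of names and identifiers before comparison.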

[0036] Step 3: Based on the lake warehouse data platform, store the unstructured data in a distributed file system and store the metadata of each unstructured data in a pre-created directory table; create a regular table associated with the directory table and store the structured data associated with each unstructured data in the regular table, to establish the association between each unstructured data and the corresponding structured data. This step uses the lake warehouse data platform to integrate multi-source data. Specifically, data from different sources, including transaction records, customer profiles, and regulatory documents, is integrated through a unified lake warehouse architecture, ensuring unified storage of structured and unstructured data. HDFS can be used as the core to provide the underlying storage for unstructured data. Tabular metadata management records the mapping between business terms and fields, facilitating subsequent data querying and analysis. Regular tables store structured data, while directory tables store the metadata of unstructured data. The metadata includes unique identifiers, storage locations, last modified time, MD5 checksums, and tags. Tags can store the results of AI models' analysis of unstructured data, which makes it easier to extract structured information from unstructured data and thus to build knowledge bases or generate model training samples for rapid retrieval and use.
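A minimal sketch of the directory table / regular table pairing is given below, using Python's built-in `sqlite3` as a stand-in for the platform's SQL engine. The table names, column names, and sample values are assumptions made for illustration; the application does not define a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Directory table: one row of metadata per unstructured file.
CREATE TABLE directory_table (
    file_id       TEXT PRIMARY KEY,
    url           TEXT NOT NULL,   -- storage location in the file system
    last_modified TEXT,
    md5           TEXT,
    tags          TEXT             -- keyword tags from model analysis
);
-- Regular table: structured data, linked to the directory table by file_id.
CREATE TABLE regular_table (
    record_id     INTEGER PRIMARY KEY,
    customer_name TEXT,
    account_no    TEXT,
    file_id       TEXT REFERENCES directory_table(file_id)
);
""")

conn.execute(
    "INSERT INTO directory_table VALUES (?, ?, ?, ?, ?)",
    ("doc-001", "hdfs://example/contracts/doc-001.pdf",
     "2026-01-01T00:00:00Z", "d41d8cd98f00b204e9800998ecf8427e", "Name A"),
)
conn.execute(
    "INSERT INTO regular_table (customer_name, account_no, file_id) VALUES (?, ?, ?)",
    ("Name A", "ACC-1", "doc-001"),
)

# One join returns the structured fields together with the file location.
rows = conn.execute("""
    SELECT r.customer_name, r.account_no, d.url
    FROM regular_table AS r
    JOIN directory_table AS d ON d.file_id = r.file_id
""").fetchall()
print(rows)
```

Keeping the link as a key column in the regular table is only one possible design; a separate mapping table would allow many-to-many associations.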

[0037] Step 4: Construct a business knowledge base that includes financial business terminology and financial regulatory rules; In this step, securities terminology and regulatory rules can be incorporated into the knowledge base construction, and vectorized using Embedding technology to complete the construction of the knowledge graph / knowledge base.

[0038] Step 5: For several data analysis intents, use the pre-trained large model to generate the data query statement corresponding to each data analysis intent based on the business knowledge base.
Step 6: Respond to the target user's selection of the target data query statement corresponding to the target data analysis intent, and obtain a number of target unstructured data and a number of target structured data from the lake warehouse data platform based on the target data query statement. In a specific implementation of this step, several target structured data can be obtained from the regular table based on the target key fields in the target data query statement; target metadata containing the target key fields can be obtained from the directory table based on the target key fields, and the target Uniform Resource Locators can be obtained from the target metadata; and several target unstructured data can be obtained from the distributed file system based on the target Uniform Resource Locators.

[0039] Specifically, an interaction layer can be created to allow users or applications to access and manipulate all types of data through standard database interfaces, providing efficient support for both structured and unstructured data.

[0040] Step 7: Based on the target data analysis intent, perform data analysis and processing on each of the target unstructured data and each of the target structured data to obtain data processing results.

[0041] In this embodiment, as shown in Figure 2, for structured data access, users or applications can query regular tables using SQL statements to obtain the structured data related to unstructured data. For unstructured data access, users or applications can query the directory table in the lake warehouse data platform using SQL statements to obtain the metadata and file URL of the unstructured data files. Then, based on the file URL obtained from the directory table, they can directly access the unstructured data files in the HDFS distributed file system.
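The two access paths can be sketched end to end as follows. This is a simplified illustration, not the platform's actual interface: `sqlite3` stands in for the SQL engine, a local temporary directory stands in for HDFS, the URL is a plain file path, and the function `fetch_by_key` and all table and column names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

# Stand-in for the distributed file system: a local temporary directory.
workdir = Path(tempfile.mkdtemp())
file_path = workdir / "doc-001.txt"
file_path.write_bytes(b"contract text")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regular_table (record_id INTEGER PRIMARY KEY, customer_name TEXT)")
conn.execute("CREATE TABLE directory_table (file_id TEXT PRIMARY KEY, url TEXT, tags TEXT)")
conn.execute("INSERT INTO regular_table VALUES (1, 'Name A')")
conn.execute(
    "INSERT INTO directory_table VALUES (?, ?, ?)",
    ("doc-001", str(file_path), "Name A"),
)


def fetch_by_key(conn: sqlite3.Connection, key: str):
    """Return (structured rows, unstructured file contents) for one key field."""
    # Path 1: structured data comes straight from the regular table.
    structured = conn.execute(
        "SELECT record_id, customer_name FROM regular_table WHERE customer_name = ?",
        (key,),
    ).fetchall()
    # Path 2: the directory table yields file URLs ...
    url_rows = conn.execute(
        "SELECT url FROM directory_table WHERE tags LIKE ?", (f"%{key}%",)
    ).fetchall()
    # ... which are then used to read the files from the file system.
    unstructured = [Path(url).read_bytes() for (url,) in url_rows]
    return structured, unstructured


structured, unstructured = fetch_by_key(conn, "Name A")
print(structured, unstructured)
```

With a real HDFS deployment, the final read would go through an HDFS client rather than `Path.read_bytes`.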

[0042] Furthermore, when performing data queries, the text component can be used to perform efficient full-text retrieval of unstructured data using SQL. The text component is integrated with the Elasticsearch (ES) engine to improve retrieval efficiency. Specifically, the text component performs structured transformation on the unstructured data to obtain transformed structured data. In this case, obtaining several target unstructured data and several target structured data from the lake warehouse data platform based on the target data query statement specifically includes: based on the target key fields in the target data query statement, using the target retrieval engine Elasticsearch (an ES index) to search the transformed structured data and obtain the transformed target structured data containing the target key fields; and determining the corresponding target unstructured data based on the transformed target structured data.
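The idea behind full-text retrieval over transformed text can be shown with a small in-memory inverted index. This is explicitly a stand-in: it does not use Elasticsearch or its API, and the document IDs, sample texts, and function names are invented for the example. It only illustrates how key fields in a query can be mapped back to the source documents.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase token to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents that contain every token of the query."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Text extracted from unstructured files (illustrative content).
docs = {
    "doc-001": "Investment contract signed by Name A",
    "doc-002": "Risk disclosure for Name B",
}
index = build_index(docs)
print(search(index, "name a"))
```

The returned document IDs play the role of the link from the transformed structured data back to the original unstructured files.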

[0043] In this embodiment, after obtaining several target unstructured data through querying, the unstructured data files in the distributed file system HDFS can be exported to external storage through the Copy to operation. At the same time, related structured data or metadata in the platform can be exported as needed.

[0044] In addition, the metadata of unstructured data stored externally can be stored in the directory table through the Copy from operation to achieve data import.

[0045] The method in this embodiment can improve data service efficiency: business personnel (such as investment advisors, risk control specialists, etc.) can directly initiate complex data queries through natural language without relying on the IT team. The system automatically converts them into accurate and executable SQL statements, reducing the response time from several hours in the traditional mode to minutes, which greatly improves decision-making efficiency.

[0046] Strengthen compliance control capabilities: All automatically generated SQL queries are required to embed the latest regulatory rules and compliance conditions, eliminating at the source the compliance risks caused by human error; at the same time, relying on the metadata version management and SQL execution logs of the lake warehouse data platform, the business terms, regulatory clauses and data sources used for each query are fully recorded, achieving full-process auditing and meeting regulatory inspection requirements.

[0047] Unlocking the value of multi-source data assets: The lake warehouse data platform deeply integrates structured and unstructured data sources such as CRM systems, transaction systems, public opinion monitoring, and customer behavior logs, breaking down system silos. Leveraging large model technology, it intelligently parses unstructured content such as research reports, customer call recordings, and regulatory documents, automatically generating structured tags and business insights and significantly enriching the dimensions of analysis.

[0048] Drive the transformation of "data finding people" intelligent services: By combining real-time data streams with personalized user profiles, the system can proactively push customized analysis results and business suggestions, realizing a paradigm shift from "people finding data" to "data finding people", and comprehensively improving user experience and business response speed.

[0049] In summary, this invention helps securities firms build an intelligent data service system that integrates compliance, efficiency, and user experience, providing solid support for the high-quality and sustainable digital transformation of brokerage business.

[0050] Another embodiment of this application provides a data processing device based on a lake warehouse data platform. As shown in Figure 3, it includes:
a storage module 11, used to store a number of unstructured data and a number of structured data based on a pre-deployed lake warehouse data platform, and to establish the association between each unstructured data and the corresponding structured data;
a building module 12, used to build a business knowledge base that includes financial business terminology and financial regulatory rules;
a generation module 13, used to generate, for several data analysis intents, the data query statement corresponding to each data analysis intent based on the business knowledge base, using a pre-trained large model;
a query module 14, used to respond to the target user's selection of the target data query statement corresponding to the target data analysis intent, and to obtain a number of target unstructured data and a number of target structured data from the lake warehouse data platform based on the target data query statement;
a processing module 15, used to perform data analysis and processing on each of the target unstructured data and each of the target structured data in accordance with the target data analysis intent, and to obtain data processing results.

[0051] In this embodiment, the storage module is specifically used to: store unstructured data in a distributed file system based on a pre-deployed lake warehouse data platform, and store the metadata of each unstructured data in a pre-created directory table; create a regular table associated with the directory table, and store the structured data corresponding to each unstructured data in the regular table, so as to establish the association between each unstructured data and the corresponding structured data.

[0052] In this embodiment, the data processing device based on the Lake Warehouse data platform further includes a determination module. The determination module is used to: perform data analysis on each unstructured data based on a pre-trained large model to obtain keyword tags for each unstructured data, thereby obtaining metadata for each unstructured data; and determine the structured data associated with each unstructured data from several structured data based on the keyword tags of each unstructured data.

[0053] In this embodiment, the query module is specifically used to: obtain several target structured data from the regular table based on the target key fields in the target data query statement; obtain target metadata containing the target key fields from the directory table based on the target key fields in the target data query statement, and obtain the target Uniform Resource Locator from the target metadata; and obtain several target unstructured data from the distributed file system based on the target Uniform Resource Locator.

[0054] In this embodiment, the data processing device based on the Lake Warehouse data platform further includes a conversion module, which is used to: perform structured conversion processing on each unstructured data using a text component to obtain the converted structured data. The query module is specifically used to retrieve the transformed structured data based on the target key fields in the target data query statement, using a target retrieval engine, to obtain the transformed target structured data containing the target key fields; and to determine the corresponding target unstructured data based on the transformed target structured data.

[0055] In this embodiment, the data processing device based on the lake warehouse data platform further includes an acquisition module, which is specifically used to: establish in advance communication connections between the lake warehouse data platform and a relational database and a non-relational database; acquire several structured data from the relational database; and acquire several unstructured data from the non-relational database.

[0056] In this embodiment, the metadata also includes any one or more of the following: storage time of unstructured data, file size of unstructured data, storage location, Uniform Resource Locator (URL), and checksum.
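The metadata fields listed above could be carried in a record such as the following. This is a sketch only: the class `FileMetadata`, the functions `build_metadata` and `verify_content`, and the choice of SHA-256 for the checksum are assumptions, since the application does not name a checksum algorithm or a record layout.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileMetadata:
    url: str                 # Uniform Resource Locator of the stored file
    storage_location: str    # e.g. a directory path in the file system
    file_size: int           # size in bytes
    stored_at: datetime      # storage time
    checksum: str            # SHA-256 hex digest (assumed algorithm)
    keyword_tags: tuple      # keyword tags of the unstructured data


def build_metadata(url, storage_location, content, tags, stored_at=None):
    """Compute size and checksum from the content and assemble a metadata record."""
    return FileMetadata(
        url=url,
        storage_location=storage_location,
        file_size=len(content),
        stored_at=stored_at or datetime.now(timezone.utc),
        checksum=hashlib.sha256(content).hexdigest(),
        keyword_tags=tuple(tags),
    )


def verify_content(meta, content):
    """Check retrieved bytes against the recorded size and checksum."""
    return (
        len(content) == meta.file_size
        and hashlib.sha256(content).hexdigest() == meta.checksum
    )
```

Recording a checksum in this way would allow a file fetched by its URL to be verified against its metadata before analysis.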

[0057] The apparatus in this embodiment stores all unstructured and structured data uniformly on the lake warehouse data platform. This allows comprehensive and rapid data retrieval from the lake warehouse data platform, solving the problem of low retrieval efficiency caused by data being distributed across different systems and platforms. This improves data retrieval efficiency and, consequently, data analysis and processing efficiency. Furthermore, by creating a business knowledge base and using a large model in conjunction with the business knowledge base to generate data query statements, the generated query statements are more reasonable, more accurate, and better suited to financial business scenarios. This ensures accurate retrieval of relevant data from the lake warehouse data platform and improves the accuracy of data analysis and processing.

[0058] Another embodiment of this application provides a storage medium storing a computer program, which, when executed by a processor, implements the following method steps:
Step 1: Store several unstructured data and several structured data based on the pre-deployed lake warehouse data platform, and establish the association between each unstructured data and the corresponding structured data;
Step 2: Construct a business knowledge base that includes financial business terminology and financial regulatory rules;
Step 3: For several data analysis intents, use the pre-trained large model to generate data query statements corresponding to each data analysis intent based on the business knowledge base;
Step 4: In response to the target user's selection operation on the target data query statement corresponding to the target data analysis intent, obtain several target unstructured data and several target structured data from the lake warehouse data platform based on the target data query statement;
Step 5: Based on the target data analysis intent, perform data analysis and processing on each of the target unstructured data and each of the target structured data to obtain data processing results.

[0059] The specific implementation process of the above method steps can be found in any of the above embodiments of the data processing method based on the Lake Warehouse Data Platform, and will not be repeated here.

[0060] The storage medium in this application, by uniformly storing all unstructured and structured data on the lake warehouse data platform, enables comprehensive and rapid data retrieval within the platform. This solves the problem of low retrieval efficiency caused by data being distributed across different systems and platforms, improving data retrieval efficiency and consequently data analysis and processing efficiency. Furthermore, by creating a business knowledge base and using a large model in conjunction with this knowledge base to generate data query statements, the generated query statements are more reasonable, more accurate, and better suited to financial business scenarios. This ensures accurate retrieval of relevant data from the lake warehouse data platform, further improving the accuracy of data analysis and processing.

[0061] Another embodiment of this application provides an electronic device. As shown in Figure 4, it includes at least a memory 1 and a processor 2. The memory 1 stores a computer program, and the processor 2 performs the following method steps when executing the computer program in the memory:
Step 1: Store several unstructured data and several structured data based on the pre-deployed lake warehouse data platform, and establish the association between each unstructured data and the corresponding structured data;
Step 2: Construct a business knowledge base that includes financial business terminology and financial regulatory rules;
Step 3: For several data analysis intents, use the pre-trained large model to generate data query statements corresponding to each data analysis intent based on the business knowledge base;
Step 4: In response to the target user's selection operation on the target data query statement corresponding to the target data analysis intent, obtain several target unstructured data and several target structured data from the lake warehouse data platform based on the target data query statement;
Step 5: Based on the target data analysis intent, perform data analysis and processing on each of the target unstructured data and each of the target structured data to obtain data processing results.

[0062] The specific implementation process of the above method steps can be found in any of the above embodiments of the data processing method based on the Lake Warehouse Data Platform, and will not be repeated here.

[0063] The electronic device described in this application, by uniformly storing all unstructured and structured data on the lake warehouse data platform, enables comprehensive and rapid data retrieval within the lake warehouse data platform. This solves the problem of low retrieval efficiency caused by data being distributed across different systems and platforms, improving data retrieval efficiency and consequently data analysis and processing efficiency. Furthermore, by creating a business knowledge base and using a large model in conjunction with this knowledge base to generate data query statements, the generated query statements are more reasonable, more accurate, and better suited to financial business scenarios. This ensures accurate retrieval of relevant data from the lake warehouse data platform, further improving the accuracy of data analysis and processing.

[0064] The above embodiments are merely exemplary embodiments of this application and are not intended to limit this application. Those skilled in the art can make various modifications or equivalent substitutions to this application within its spirit and scope, and such modifications or equivalent substitutions should also be considered to fall within the scope of protection of this application.

Claims

1. A data processing method based on a lake warehouse data platform, characterized in that it includes:
storing several unstructured data and several structured data based on a pre-deployed lake warehouse data platform, and establishing the association between each unstructured data and the corresponding structured data;
constructing a business knowledge base that includes financial business terminology and financial regulatory rules;
for several data analysis intents, using a pre-trained large model to generate, based on the business knowledge base, data query statements corresponding to each data analysis intent;
in response to a target user's selection operation on the target data query statement corresponding to a target data analysis intent, obtaining several target unstructured data and several target structured data from the lake warehouse data platform based on the target data query statement;
based on the target data analysis intent, performing data analysis and processing on each of the target unstructured data and each of the target structured data to obtain data processing results.

2. The method as described in claim 1, characterized in that the storing of several unstructured data and several structured data based on a pre-deployed lake warehouse data platform, and the establishing of the association between each unstructured data and the corresponding structured data, specifically includes:
storing unstructured data in a distributed file system based on the pre-deployed lake warehouse data platform, and storing the metadata of each unstructured data in a pre-created directory table;
creating a regular table associated with the directory table, and storing the structured data corresponding to each unstructured data in the regular table, so as to establish the association between each unstructured data and the corresponding structured data.

3. The method as described in claim 2, characterized in that, before establishing the association between each unstructured data and the corresponding structured data, the method further includes:
performing data analysis on each unstructured data based on the pre-trained large model to obtain keyword tags for each unstructured data, thereby obtaining metadata for each unstructured data;
determining, based on the keyword tags of each unstructured data, the structured data associated with each unstructured data from the several structured data.

4. The method as described in claim 3, characterized in that the obtaining of several target unstructured data and several target structured data from the lake warehouse data platform based on the target data query statement specifically includes:
obtaining several target structured data from the regular table based on the target key fields in the target data query statement;
obtaining target metadata containing the target key fields from the directory table based on the target key fields in the target data query statement, and obtaining the target Uniform Resource Locator from the target metadata;
obtaining several target unstructured data from the distributed file system based on the target Uniform Resource Locator.

5. The method as described in claim 1, characterized in that the method further includes:
performing structured conversion processing on each unstructured data using a text component to obtain converted structured data;
and the obtaining of several target unstructured data and several target structured data from the lake warehouse data platform based on the target data query statement specifically includes:
retrieving the converted structured data using a target retrieval engine, based on the target key fields in the target data query statement, to obtain converted target structured data containing the target key fields;
determining the corresponding target unstructured data based on the converted target structured data.

6. The method as described in claim 1, characterized in that, before storing several unstructured data and several structured data based on a pre-deployed lake warehouse data platform, the method further includes:
establishing in advance communication connections between the lake warehouse data platform and a relational database and a non-relational database;
obtaining several structured data from the relational database;
obtaining several unstructured data from the non-relational database.

7. The method as described in claim 3, characterized in that the metadata also includes any one or more of the following: storage time of unstructured data, file size of unstructured data, storage location, Uniform Resource Locator (URL), and checksum.

8. A data processing device based on a lake warehouse data platform, characterized in that it includes:
a storage module, used to store several unstructured data and several structured data based on a pre-deployed lake warehouse data platform, and to establish the association between each unstructured data and the corresponding structured data;
a construction module, used to construct a business knowledge base that includes financial business terminology and financial regulatory rules;
a generation module, used to generate, for several data analysis intents, data query statements corresponding to each data analysis intent based on the business knowledge base, using a pre-trained large model;
a query module, used to respond to a target user's selection operation on the target data query statement corresponding to a target data analysis intent, and to obtain several target unstructured data and several target structured data from the lake warehouse data platform based on the target data query statement;
a processing module, used to perform data analysis and processing on each of the target unstructured data and each of the target structured data, based on the target data analysis intent, to obtain data processing results.

9. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the steps of the data processing method based on the lake warehouse data platform as described in any one of claims 1-7.

10. An electronic device, characterized in that it includes at least a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program in the memory, implements the steps of the data processing method based on the lake warehouse data platform as described in any one of claims 1-7.