Data warehouse detection method, device, equipment and storage medium

By determining the similarity between the first and second execution plans in the data warehouse and generating a detection report to adjust the data hierarchy, the problem of low accuracy in data duplication processing is solved, and more efficient data management is achieved.

CN115329013B · Active · Publication Date: 2026-06-16 · CHINA PING AN PROPERTY INSURANCE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA PING AN PROPERTY INSURANCE CO LTD
Filing Date
2022-08-12
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing technologies have low accuracy in identifying duplicate data processing and cannot effectively prevent different departments within an enterprise from repeatedly accessing data.

Method used

By determining the first execution plan corresponding to the processing request of the terminal device, the second data and its execution plan in the data warehouse are obtained. Based on the preset execution plan processing rules, the similarity between the first and second execution plans is judged, and a detection report is generated to indicate the adjustment of the data level or the processing request.

Benefits of technology

The method improves the accuracy of identifying duplicate data processing, reduces redundant data storage and processing, and optimizes data warehouse management.

Generated by Eureka AI based on patent content.

Abstract

The application relates to the technical field of big data, and provides a detection method and device of a data warehouse, equipment and a storage medium, wherein the method comprises the following steps: in response to a processing request of a terminal device for first data, determining a first execution plan corresponding to the first data based on the processing request; acquiring second data stored in the data warehouse and a second execution plan corresponding to the second data; determining the similarity between the first execution plan and the second execution plan based on a preset execution plan processing rule; generating a detection report of the data warehouse based on the first execution plan and the second execution plan with the similarity greater than or equal to a first similarity threshold value, outputting the detection report, and the detection report is used for instructing relevant personnel to adjust the data level of the data warehouse or adjust the processing request. The application aims to improve the accuracy of judging data repeated processing and reduce the repeated processing of data.

Description

Technical Field

[0001] This application relates to the field of big data technology, and in particular to a method, apparatus, device, and storage medium for detecting data warehouses.

Background Technology

[0002] With the continuous development of big data technology, enterprises often use big data platforms such as data warehouses for unified data management. While unified data management provides convenience for enterprises, it also leads to the problem of duplicate data access between different departments within the enterprise.

[0003] Currently, to prevent duplicate data access, big data platforms store data processing result tables to determine if duplicate data processing has occurred. However, because these tables only include the data categories determined after processing, they do not comprehensively consider the entire data processing process. Different processing methods may result in the same or partially the same data categories, leading to misjudgments when determining if duplicate data is present based on the result table. Therefore, existing technologies suffer from low accuracy in identifying duplicate data processing.

Summary of the Invention

[0004] The main objective of this application is to provide a method, apparatus, equipment, and storage medium for detecting data warehouses, aiming to improve the accuracy of identifying data duplication and reduce data duplication.

[0005] Firstly, this application provides a method for detecting a data warehouse, the method comprising the following steps:

[0006] In response to a processing request from a terminal device for first data, a first execution plan corresponding to the first data is determined based on the processing request;

[0007] Obtain the second data stored in the data warehouse and the second execution plan corresponding to the second data;

[0008] Based on preset execution plan processing rules, the degree of similarity between the first execution plan and the second execution plan is determined;

[0009] Based on the first execution plan and the second execution plan whose similarity is greater than or equal to the first similarity threshold, a detection report on the data warehouse is generated and output. The detection report is used to instruct relevant personnel to adjust the data hierarchy of the data warehouse or to adjust the processing request.

[0010] Secondly, this application also provides a data warehouse detection device, the detection device comprising:

[0011] The first acquisition module is configured to respond to a processing request for first data from a terminal device and determine a first execution plan corresponding to the first data based on the processing request.

[0012] The second acquisition module is used to acquire the second data stored in the data warehouse and the second execution plan corresponding to the second data;

[0013] The first determining module is used to determine the degree of similarity between the first execution plan and the second execution plan based on preset execution plan processing rules;

[0014] The first output module is used to generate a detection report on the data warehouse based on a first execution plan and a second execution plan whose similarity is greater than or equal to a first similarity threshold, and output the detection report. The detection report is used to instruct relevant personnel to adjust the data hierarchy of the data warehouse or to adjust the processing request.

[0015] Thirdly, this application also provides a computer device, which includes a memory and a processor;

[0016] The memory is used to store computer programs;

[0017] The processor is configured to execute the computer program and, in executing the computer program, implement the data warehouse detection method as described above.

[0018] Fourthly, this application also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the data warehouse detection method described above.

[0019] This application provides a data warehouse detection method, apparatus, device, and storage medium. The method includes: responding to a terminal device's request to process first data; determining a first execution plan corresponding to the first data based on the processing request; acquiring second data stored in the data warehouse and a second execution plan corresponding to the second data; determining the similarity between the first execution plan and the second execution plan based on preset execution plan processing rules; generating a detection report for the data warehouse based on the first execution plan and the second execution plan whose similarity is greater than or equal to a first similarity threshold; and outputting the detection report, which is used to instruct relevant personnel to adjust the data hierarchy of the data warehouse or adjust the processing request to improve the accuracy of judging data duplication and reduce data duplication.

Attached Figure Description

[0020] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 is a flowchart illustrating a data warehouse detection method provided in an embodiment of this application;

[0022] Figure 2 is an application scenario diagram of a data warehouse detection method according to an embodiment of this application;

[0023] Figure 3 is a schematic block diagram of a data warehouse detection device provided in an embodiment of this application;

[0024] Figure 4 is a schematic block diagram of the structure of a computer device provided in an embodiment of this application.

Detailed Implementation

[0025] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0026] The flowchart shown in the attached diagram is for illustrative purposes only and does not necessarily include all content and operations / steps, nor does it necessarily have to be performed in the order described. For example, some operations / steps can be broken down, combined, or partially merged, so the actual execution order may change depending on the actual situation.

[0027] This application provides a method, apparatus, device, and storage medium for detecting a data warehouse. The data warehouse detection method can be applied to terminal devices, such as mobile phones, tablets, laptops, and desktop computers. It can also be applied to servers, which can be standalone servers or cloud servers providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms.

[0028] The following detailed description of some embodiments of this application is provided in conjunction with the accompanying drawings. Unless otherwise specified, the following embodiments and features can be combined with each other.

[0029] Please refer to Figure 1, which is a flowchart illustrating a data warehouse detection method provided in an embodiment of this application. It should be noted that the data warehouse detection method provided in this embodiment can be used on terminal devices and can also be used on servers. For example, as shown in Figure 2, the server can obtain a processing request for the first data from the terminal device, obtain a data warehouse detection report according to the data warehouse detection method, and send the detection report to the terminal device so that the terminal device can instruct relevant personnel to adjust the data hierarchy of the data warehouse or adjust the processing request according to the detection report. Figure 2 illustrates only one application scenario of the data warehouse detection method. It should be understood that the data warehouse detection method is not limited to the scenario shown in Figure 2.

[0030] In practice, terminal devices include, but are not limited to, any of the following: mobile phones, tablets, laptops, and desktop computers; servers can be standalone servers, server clusters, or cloud servers that provide cloud computing services.

[0031] As shown in Figure 1, the detection method for this data warehouse includes steps S101 to S104.

[0032] Step S101: In response to the terminal device's request to process the first data, determine the first execution plan corresponding to the first data based on the processing request.

[0033] For example, a terminal device's request to process the first data may include at least one of retrieving the first data from a data warehouse or storing the first data in a data warehouse. In response to such a request, an association between the name of the processing request and the first execution plan can also be determined.

[0034] In some embodiments, obtaining first data from a data warehouse includes processing existing raw data in the data warehouse according to a first execution plan to obtain the first data and then obtaining it. In some embodiments, storing the first data in a data warehouse includes processing the raw data according to the first execution plan to obtain the first data and then storing it in the data warehouse. The raw data may be data stored in the data warehouse, or it may be data other than data in the data warehouse; there is no limitation on this.

[0035] Step S102: Obtain the second data stored in the data warehouse and the second execution plan corresponding to the second data.

[0036] For example, the second data may be acquired and / or stored in a data warehouse in response to a processing request from a terminal device. For the second data, a second execution plan corresponding to the second data may be determined based on the processing request from the terminal device. Optionally, the data warehouse stores a data file corresponding to the second data, wherein the data file is associated with the processing request from the terminal device.

[0037] In some embodiments, the second data stored in the data warehouse and the data file corresponding to the second data are obtained; the existence of a preset first keyword in the data file is detected; when the first keyword is detected in the data file, the first keyword is replaced with a preset second keyword; and the second execution plan corresponding to the second data is queried in the data file based on the second keyword.

[0038] For example, the data file could be an HQL (Hive Query Language) file. The system checks whether a preset first keyword exists in the HQL file. The preset first keyword can include `insert into` or `insert overwrite`. If `insert into` is detected in the HQL file, it is replaced with the preset second keyword `explain insert into`; if `insert overwrite` is detected in the HQL file, it is replaced with the preset second keyword `explain insert overwrite`. This is not a limitation: the data file, the preset first keyword, and the preset second keyword can be set in advance or set by relevant personnel.

[0039] In some embodiments, after replacing the first keyword in the HQL file with a preset second keyword, the second execution plan corresponding to the second data can be queried in the HQL file based on the second keyword. For example, in the HQL file, the HQL statement corresponding to the second data can be output based on the second keyword, and the HQL statement corresponding to the second data can be input into the Hive platform to obtain the second execution plan corresponding to the second data.
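The keyword replacement described in [0037] to [0039] can be sketched as follows. This is a minimal illustration only: the function name `to_explain_statements` is hypothetical, statements are assumed to be separated by semicolons (a simplification that ignores semicolons inside string literals), and submitting the resulting statements to the Hive platform is left out.

```python
import re

# First keywords named in the text: "insert into" and "insert overwrite".
INSERT_KEYWORD = re.compile(r"\binsert\s+(?:into|overwrite)\b", re.IGNORECASE)


def to_explain_statements(hql_text):
    """Return one EXPLAIN statement for each INSERT statement in an HQL file.

    Hypothetical helper: each first keyword is replaced by the second
    keyword ("explain " + first keyword), as in the example in the text.
    """
    explained = []
    for statement in hql_text.split(";"):
        statement = statement.strip()
        if not INSERT_KEYWORD.search(statement):
            continue  # no first keyword, so no execution plan is queried
        explained.append(
            INSERT_KEYWORD.sub(lambda m: "explain " + m.group(0), statement, count=1)
        )
    return explained
```

Each returned string could then be submitted to Hive to obtain the second execution plan for the corresponding statement.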

[0040] In some embodiments, the processing request name corresponding to the second data can be determined based on the HQL file, for example, the name of the terminal device's processing request for the second data. Optionally, an association can be established among at least two of the following: the processing request name corresponding to the second data, the HQL statement, and the second execution plan. For example, a table can be created to represent the relationships between the processing request name corresponding to the second data, the HQL statement, and the second execution plan, as shown in Table 1.

[0041]

[0042] Table 1

[0043] Step S103: Based on the preset execution plan processing rules, determine the degree of similarity between the first execution plan and the second execution plan.

[0044] For example, based on preset execution plan processing rules, the similarity between the first execution plan and the second execution plan is determined. The similarity judgment between the first data and the second data stored in the data warehouse is refined to the execution plan granularity. At the execution plan granularity, it is determined whether there is second data that is the same as the first data. This is equivalent to the terminal device's processing request for the second data being the same as the terminal device's processing request for the first data. Based on the similarity between the first execution plan and the second execution plan, it is determined whether it is necessary to respond to the terminal device's processing request for the first data and execute the first execution plan corresponding to the first data, thereby improving the accuracy of judging data duplication and reducing data duplication.

[0045] For example, based on preset execution plan processing rules, the first execution plan and the second execution plan are processed to obtain the first execution plan fragments corresponding to each stage in the first execution plan and the second execution plan fragments corresponding to each stage in the second execution plan; the similarity between the first execution plan and the second execution plan is then determined according to the similarity between the first execution plan fragments and the second execution plan fragments.

[0046] For example, based on preset execution plan processing rules, the first execution plan is further subdivided into multiple first execution plan segments, where each first execution plan segment corresponds to a stage of the first execution plan. Similarly, the second execution plan is further subdivided into multiple second execution plan segments, where each second execution plan segment corresponds to a stage of the second execution plan. By refining the similarity judgment between the first data and the second data stored in the data warehouse to the granularity of the execution plan segment, i.e., the stage granularity, it is determined from the granularity of the execution plan segment whether there is second data that is at least partially the same as the first data. This is equivalent to the terminal device's processing request for the second data being at least partially the same as the terminal device's processing request for the first data. Based on the execution plan segments corresponding to each stage, the accuracy of judging data duplication processing is improved, and the duplication of data processing is reduced.

[0047] For example, by comparing each first execution plan segment with each second execution plan segment, a second execution plan segment with a high degree of similarity to any first execution plan segment can be identified. That is, the data processing logic corresponding to the first execution plan segment and the data processing logic corresponding to the second execution plan segment are highly similar. In other words, the first execution plan segment and the second execution plan segment are highly likely to be the same data being processed repeatedly. This allows for better judgment of data duplication at the stage granularity, thereby improving the accuracy of judging data duplication.

[0048] In some embodiments, the presence of a preset third keyword in the first execution plan or the second execution plan is detected; when the third keyword is detected in the first execution plan, the first execution plan is divided according to every two adjacent third keywords to obtain the first execution plan segments corresponding to each stage in the first execution plan; when the third keyword is detected in the second execution plan, the second execution plan is divided according to every two adjacent third keywords to obtain the second execution plan segments corresponding to each stage in the second execution plan.

[0049] For example, referring to Table 1, the system checks whether a pre-defined third keyword, such as Stage: Stage-, exists in the second execution plan. When a third keyword is detected in the second execution plan, it divides the second execution plan into segments corresponding to each stage based on every two adjacent third keywords. Similarly, the first execution plan is processed to obtain the first execution plan segments corresponding to each stage. Thus, based on the data processing logic corresponding to the stage granularity, the similarity between the first and second execution plans is judged by the similarity between each first and second execution plan segment, improving the accuracy of data duplication judgment and reducing data duplication. Of course, this is not limited to this; the pre-defined third keyword can be pre-set or set by relevant personnel, without restriction.
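The stage-level splitting described in [0048] and [0049] can be sketched as follows. The function name `split_into_stage_fragments` is hypothetical, and the sketch assumes that the text after the last keyword, up to the end of the plan, forms the final segment.

```python
STAGE_KEYWORD = "Stage: Stage-"  # third keyword given as an example in the text


def split_into_stage_fragments(plan_text, keyword=STAGE_KEYWORD):
    """Split an execution plan into one fragment per stage.

    Each fragment runs from one occurrence of the keyword up to the next
    occurrence; the last fragment runs to the end of the plan (assumption).
    """
    positions = []
    start = plan_text.find(keyword)
    while start != -1:
        positions.append(start)
        start = plan_text.find(keyword, start + len(keyword))

    fragments = []
    for i, pos in enumerate(positions):
        end = positions[i + 1] if i + 1 < len(positions) else len(plan_text)
        fragments.append(plan_text[pos:end].strip())
    return fragments
```

A plan that contains no third keyword yields an empty list, so callers can tell that no stage-level comparison is possible for it.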

[0050] In some embodiments, based on the association among at least two of the processing request name, HQL statement, and second execution plan corresponding to the second data, together with the relationship between the second execution plan and its fragments, an association is established among at least two of the processing request name, HQL statement, second execution plan, and second execution plan fragments corresponding to the second data. For example, a table can be created to represent the relationships among the processing request name, HQL statement, and second execution plan fragment corresponding to the second data, as shown in Table 2.

[0051]

[0052] Table 2

[0053] Optionally, the association between at least two of the processing request name, HQL statement, second execution plan, and second execution plan fragment corresponding to the second data can be stored. Then, in response to the terminal device's processing request for the first data, after determining the first execution plan corresponding to the first data based on the processing request, the second execution plan and the second execution plan fragment corresponding to each stage of the second execution plan can be obtained according to the stored association between at least two of the processing request name, HQL statement, second execution plan, and second execution plan fragment corresponding to the second data, so as to improve the efficiency of judging data duplication processing.
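One way to hold the stored association from [0053] is a small in-memory index keyed by processing request name. This is a sketch under assumptions: the names `PlanRecord` and `PlanIndex` are hypothetical, and the text does not say how or where the association is actually stored.

```python
from dataclasses import dataclass, field


@dataclass
class PlanRecord:
    """Association for one piece of second data (hypothetical structure)."""
    request_name: str      # processing request name
    hql_statement: str     # HQL statement that produces the data
    execution_plan: str    # full second execution plan text
    fragments: list = field(default_factory=list)  # one fragment per stage


class PlanIndex:
    """In-memory store of PlanRecord objects, keyed by request name."""

    def __init__(self):
        self._records = {}

    def put(self, record):
        self._records[record.request_name] = record

    def get(self, request_name):
        return self._records.get(request_name)

    def all_fragments(self):
        """Yield (request_name, stage_index, fragment) for every stored plan."""
        for record in self._records.values():
            for stage_index, fragment in enumerate(record.fragments):
                yield record.request_name, stage_index, fragment
```

With the fragments stored ahead of time, a new request only needs its own plan computed before the comparison, which is the efficiency gain the paragraph describes.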

[0054] In some embodiments, determining the similarity between the first execution plan and the second execution plan based on the similarity between the first execution plan fragments and the second execution plan fragments includes: selecting each first execution plan fragment in turn and matching it with each second execution plan fragment; determining the similarity between that first execution plan fragment and each second execution plan fragment based on the matching results; and determining the similarity between the first execution plan and the second execution plan based on the first execution plan fragments and second execution plan fragments whose similarity is greater than or equal to a second similarity threshold.

[0055] For example, by sequentially comparing any first execution plan fragment with each second execution plan fragment, the similarity between the corresponding first execution plan fragment and each second execution plan fragment is determined. If the similarity between a first execution plan fragment and a second execution plan fragment is greater than or equal to a second similarity threshold, then the first execution plan fragment and the second execution plan fragment are highly likely to be duplicate processing of the same data. Therefore, the first execution plan fragment and the second execution plan fragment with a similarity greater than or equal to the second similarity threshold have a strong correlation with the similarity between the first execution plan and the second execution plan. If the similarity between a first execution plan fragment and a second execution plan fragment is less than the second similarity threshold, then the first execution plan fragment and the second execution plan fragment are highly likely to be processing different data. Therefore, the first execution plan fragment and the second execution plan fragment with a similarity less than the second similarity threshold have a weak correlation with the similarity between the first execution plan and the second execution plan.

[0056] In some embodiments, the similarity between the first execution plan and the second execution plan is determined based on the first execution plan fragments and second execution plan fragments whose similarity is greater than or equal to the second similarity threshold. This is equivalent to removing second execution plan fragments whose similarity to the first execution plan fragment is less than the second similarity threshold. For example, a data warehouse typically stores a large amount of second data, resulting in a large number of second execution plan fragments. If second execution plan fragments whose similarity to the first execution plan fragment is less than the second similarity threshold were not removed, then as their number grew, the proportion of second execution plan fragments whose similarity is greater than or equal to the second similarity threshold would shrink accordingly. The computed similarity between the first and second execution plans would shrink with it, making that similarity hard to determine, which works against improving the accuracy of detecting duplicate data processing and reducing duplicate processing. Therefore, second execution plan fragments are eliminated based on the similarity between each first execution plan fragment and each second execution plan fragment, which reduces the adverse effect of the number of second execution plan fragments on the similarity between the first and second execution plans.
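The fragment matching in [0054] to [0056] can be sketched as follows. The text does not specify how fragment similarity or plan-level similarity is computed, so both are assumptions here: fragment similarity uses the ratio from Python's `difflib.SequenceMatcher`, and the plan-level score is the share of first fragments that have at least one second fragment at or above the second similarity threshold. The default threshold of 0.8 is also a placeholder.

```python
from difflib import SequenceMatcher


def fragment_similarity(first_fragment, second_fragment):
    """Text similarity in [0, 1]; difflib is an assumed stand-in measure."""
    matcher = SequenceMatcher(None, first_fragment, second_fragment, autojunk=False)
    return matcher.ratio()


def plan_similarity(first_fragments, second_fragments, second_threshold=0.8):
    """Compare every first fragment with every second fragment.

    Pairs below second_threshold are discarded, so unrelated second
    fragments cannot dilute the result. Returns (score, matched), where
    matched holds (first_index, second_index, similarity) for the best
    surviving pair of each first fragment.
    """
    if not first_fragments:
        return 0.0, []

    matched = []
    for i, first in enumerate(first_fragments):
        kept = []
        for j, second in enumerate(second_fragments):
            score = fragment_similarity(first, second)
            if score >= second_threshold:
                kept.append((score, j))
        if kept:
            best_score, best_j = max(kept)
            matched.append((i, best_j, best_score))

    # Assumed plan-level score: fraction of first fragments with a match.
    return len(matched) / len(first_fragments), matched
```

Because the score depends only on pairs that pass the threshold, adding more dissimilar second fragments leaves it unchanged, which is the effect [0056] argues for.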

[0057] Step S104: Generate a detection report on the data warehouse based on the first execution plan and the second execution plan whose similarity is greater than or equal to the first similarity threshold, and output the detection report. The detection report is used to instruct relevant personnel to adjust the data hierarchy of the data warehouse or adjust the processing request.

[0058] In some embodiments, if the similarity between a first execution plan and a second execution plan is greater than or equal to a first similarity threshold, then at least some segments of the first execution plan and the second execution plan have a high degree of similarity, which is equivalent to a high probability that some data is processed repeatedly in the first execution plan and the second execution plan. Therefore, a detection report on the data warehouse is generated based on the first execution plan and the second execution plan whose similarity is greater than or equal to the first similarity threshold, and the detection report is output to the terminal device. For example, the detection report may show that the terminal device's processing request for the first data indicates that at least some of the first data stored in the data warehouse is the same as the second data stored in the data warehouse, or that the terminal device's processing request for the first data indicates that at least some of the first data obtained by processing the data in the data warehouse is the same as the second data stored in the data warehouse. Then, relevant personnel can adjust the data hierarchy of the data warehouse through the detection report displayed on the terminal device to reduce the repeated storage of at least some of the same data in the data warehouse, or adjust the processing request for the first data to reduce the repeated processing of some of the same data, thereby reducing the repeated processing of data.
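The filtering step in S104 can be sketched as follows. The names `PlanMatch` and `build_detection_report`, the report layout, and the default first similarity threshold of 0.6 are all hypothetical; the text only requires that plans at or above the first similarity threshold feed the report.

```python
from dataclasses import dataclass


@dataclass
class PlanMatch:
    """Similarity of one stored second execution plan to the first plan."""
    request_name: str   # processing request name of the second data
    similarity: float   # plan-level similarity in [0, 1]


def build_detection_report(first_request, candidates, first_threshold=0.6):
    """Keep candidates at or above the first threshold, most similar first."""
    hits = sorted(
        (c for c in candidates if c.similarity >= first_threshold),
        key=lambda c: c.similarity,
        reverse=True,
    )
    return {
        "request": first_request,
        "duplicates": [
            {"request_name": c.request_name, "similarity": c.similarity}
            for c in hits
        ],
    }
```

An empty `duplicates` list means no stored plan reached the threshold, in which case there is nothing to flag for the request.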

[0059] For example, the detection report can be stored in a blockchain. The blockchain referred to in this application is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks linked together using cryptographic methods. Each data block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and generate the next block. A blockchain can include a blockchain underlying platform, a platform product service layer, and an application service layer.

[0060] Understandably, when a server needs to obtain a detection report, it can broadcast to the blockchain to determine the storage node or storage address of the detection report, and then obtain the detection report from the storage node or storage address.

[0061] For example, in response to adjustments to the data hierarchy of the data warehouse or adjustments to the processing request, the association between at least two of the processing request name, HQL statement, second execution plan, and second execution plan fragment corresponding to the second data can be updated. Thus, after responding to the terminal device's processing request for the first data and determining the first execution plan corresponding to the first data based on the processing request, the second execution plan and the second execution plan fragment corresponding to each stage of the second execution plan can be obtained according to the updated association between at least two of the processing request name, HQL statement, second execution plan, and second execution plan fragment corresponding to the second data, thereby improving the efficiency of determining duplicate data processing.

[0062] Optionally, the detection report includes processing recommendations. For example, generating a detection report on the data warehouse based on a first execution plan and a second execution plan whose similarity is greater than or equal to a first similarity threshold, and outputting the detection report, includes: when the similarity between the first execution plan and the second execution plan is equal to a third similarity threshold, outputting a processing recommendation indicating that the data warehouse already stores the first data required by the processing request.

[0063] In some embodiments, when the similarity between the first execution plan and the second execution plan is equal to a third similarity threshold, for example, the third similarity threshold is 100%, it is determined that the first execution plan and the second execution plan are the same. In order to avoid repeated processing of the same data, a processing suggestion can be output. The processing suggestion is used to indicate that the first data of the processing request requirement has been stored in the data warehouse.

[0064] For example, when the terminal device's processing request for the first data instructs retrieval of the first data from the data warehouse, the second data corresponding to the second execution plan whose similarity to the first execution plan is equal to the third similarity threshold can be sent to the terminal device, thereby reducing redundant processing of data and improving the timeliness of data retrieval.

[0065] For example, when the terminal device's processing request for the first data instructs storage of the first data in the data warehouse, a processing suggestion can be sent to the terminal device to indicate that the same data does not need to be stored repeatedly.
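The handling of the third similarity threshold in [0062] to [0065] can be sketched as a small decision function. The function name, the request kinds `"retrieve"` and `"store"`, and the returned labels are hypothetical; the 100% threshold follows the example in the text.

```python
import math


def processing_suggestion(similarity, request_kind, third_threshold=1.0):
    """Return a suggestion label when the plans are identical, else None.

    request_kind is "retrieve" (get first data from the warehouse) or
    "store" (write first data into the warehouse); both labels are
    illustrative placeholders.
    """
    if not math.isclose(similarity, third_threshold):
        return None  # plans differ, so no identical-data suggestion applies
    if request_kind == "retrieve":
        # The stored second data can be returned instead of recomputing it.
        return "return_stored_second_data"
    if request_kind == "store":
        # The same data already exists and need not be stored again.
        return "skip_duplicate_store"
    raise ValueError(f"unknown request kind: {request_kind!r}")
```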

[0066] Optionally, the detection report includes a table. For example, generating a detection report on the data warehouse based on a first execution plan and a second execution plan whose similarity is greater than or equal to a first similarity threshold, and outputting the detection report, includes: processing the first execution plan and the second execution plan whose similarity is greater than or equal to the first similarity threshold based on preset table creation rules to obtain a table.

[0067] In some embodiments, based on the association among the processing request name, the first execution plan, and the first execution plan fragments corresponding to the first data, and the association among the processing request name, the second execution plan, and the second execution plan fragments corresponding to the second data, the processing request names corresponding to the first execution plan and the second execution plan, the first execution plan fragment corresponding to each stage of the first execution plan, and the second execution plan fragment corresponding to each stage of the second execution plan can be obtained. A table can then be created based on preset table creation rules. For example, the table can be used to instruct relevant personnel to adjust or optimize the processing requests corresponding to the first data and the second data, so as to improve the accuracy of identifying duplicate data processing and reduce duplicate processing.
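As an illustration of the table described in [0066] and [0067], the sketch below renders matched plan pairs as a plain-text table. The application does not spell out the preset table creation rules, so the column choice, the sort order, and the `PlanMatch` and `build_report_table` names are assumptions.

```python
from typing import List, NamedTuple


class PlanMatch(NamedTuple):
    first_request: str
    second_request: str
    similarity: float  # in the range 0.0 to 1.0


def build_report_table(matches: List[PlanMatch], first_threshold: float) -> str:
    """Render matches at or above the first similarity threshold as a text table."""
    kept = sorted(
        (m for m in matches if m.similarity >= first_threshold),
        key=lambda m: m.similarity,
        reverse=True,
    )
    header = ("first request", "second request", "similarity")
    rows = [(m.first_request, m.second_request, f"{m.similarity:.0%}") for m in kept]
    all_rows = [header, *rows]
    # Pad each column to the width of its longest cell.
    widths = [max(len(row[i]) for row in all_rows) for i in range(3)]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in all_rows]
    return "\n".join(lines)


table = build_report_table(
    [PlanMatch("req_a", "req_b", 0.92), PlanMatch("req_a", "req_c", 0.40)],
    first_threshold=0.8,
)
```

Pairs below the first threshold are dropped, so the table lists only the requests that relevant personnel may want to merge or adjust.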

[0068] The data warehouse detection method provided in the above embodiments responds to a terminal device's processing request for first data and determines a first execution plan corresponding to the first data based on the processing request; acquires second data stored in the data warehouse and a second execution plan corresponding to the second data; determines the similarity between the first execution plan and the second execution plan based on preset execution plan processing rules; and generates a data warehouse detection report based on the first execution plan and the second execution plan whose similarity is greater than or equal to a first similarity threshold, and outputs the detection report. The detection report is used to instruct relevant personnel to adjust the data hierarchy of the data warehouse or adjust the processing request, thereby improving the accuracy of identifying duplicate data processing and reducing duplicate processing.

[0069] Please refer to Figure 3, which is a schematic block diagram of a data warehouse detection device provided in an embodiment of this application. The data warehouse detection device can be configured in a server or terminal device to execute the aforementioned data warehouse detection method.

[0070] As shown in Figure 3, the detection device for the data warehouse includes: a first acquisition module 110, a second acquisition module 120, a determination module 130, and an output module 140.

[0071] The first acquisition module 110 is used to respond to a processing request for first data from a terminal device and determine a first execution plan corresponding to the first data based on the processing request.

[0072] The second acquisition module 120 is used to acquire the second data stored in the data warehouse and the second execution plan corresponding to the second data.

[0073] The determination module 130 is used to determine the degree of similarity between the first execution plan and the second execution plan based on preset execution plan processing rules.

[0074] The output module 140 is used to generate a detection report on the data warehouse based on a first execution plan and a second execution plan whose similarity is greater than or equal to a first similarity threshold, and output the detection report. The detection report is used to instruct relevant personnel to adjust the data hierarchy of the data warehouse or to adjust the processing request.
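The way the four modules of Figure 3 cooperate can be sketched as one class whose fields stand in for modules 110 to 130 and whose `detect` method plays the role of output module 140. Everything concrete here is an assumption made for illustration: the callables, the 0.8 threshold, the toy "plans", and the use of Python's `difflib.SequenceMatcher` in place of the preset execution plan processing rules.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


@dataclass
class DetectionDevice:
    """Wires the four modules together; every callable is a stand-in."""
    plan_for_request: Callable[[str], str]      # first acquisition module 110
    stored_plans: Callable[[], Dict[str, str]]  # second acquisition module 120
    similarity: Callable[[str, str], float]     # determination module 130
    first_threshold: float = 0.8                # assumed value

    def detect(self, request: str) -> List[Tuple[str, float]]:
        first_plan = self.plan_for_request(request)
        report = []
        for name, second_plan in self.stored_plans().items():
            score = self.similarity(first_plan, second_plan)
            if score >= self.first_threshold:   # output module 140
                report.append((name, score))
        return report


device = DetectionDevice(
    plan_for_request=lambda hql: "PLAN:" + hql.lower(),
    stored_plans=lambda: {
        "job_a": "PLAN:select id from t",
        "job_b": "PLAN:insert into x values (1)",
    },
    similarity=lambda a, b: SequenceMatcher(None, a, b).ratio(),
)
report = device.detect("SELECT id FROM t")
```

Injecting the modules as callables keeps the sketch independent of any particular warehouse engine; a real device would obtain plans from the warehouse itself.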

[0075] For example, the second acquisition module 120 includes an acquisition submodule, a first detection submodule, a replacement submodule, and a query submodule.

[0076] The acquisition submodule is used to acquire the second data stored in the data warehouse and the data file corresponding to the second data.

[0077] The first detection submodule is used to detect whether a preset first keyword exists in the data file.

[0078] The replacement submodule is used to replace the first keyword with a preset second keyword when the first keyword is detected in the data file.

[0079] The query submodule is used to query the second execution plan corresponding to the second data based on the second keyword in the data file.
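The detect, replace, and query sequence performed by these submodules might look like the following sketch. The keyword values (`@RUN`, `EXPLAIN`), the `run_query` callable, and the fake engine are hypothetical; this section does not state which keywords are preset or how the query is issued.

```python
import re
from typing import Callable, Optional


def query_second_plan(
    data_file_text: str,
    first_keyword: str,
    second_keyword: str,
    run_query: Callable[[str], str],
) -> Optional[str]:
    """Swap the first keyword for the second one, then query the plan.

    Returns None when the first keyword is absent from the data file.
    """
    if not first_keyword:
        raise ValueError("first_keyword must be non-empty")
    pattern = re.compile(re.escape(first_keyword), flags=re.IGNORECASE)
    if not pattern.search(data_file_text):
        return None
    # A function replacement avoids backslash interpretation in second_keyword.
    rewritten = pattern.sub(lambda _match: second_keyword, data_file_text)
    return run_query(rewritten)


def fake_engine(statement: str) -> str:
    # Stand-in for the warehouse; it only echoes the statement it receives.
    return "PLAN FOR <" + statement + ">"


plan = query_second_plan("@RUN SELECT id FROM s", "@RUN", "EXPLAIN", fake_engine)
```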

[0080] For example, the determination module 130 includes a processing submodule and a first determination submodule.

[0081] The processing submodule is used to process the first execution plan and the second execution plan based on preset execution plan processing rules to obtain the first execution plan fragments corresponding to each stage in the first execution plan and the second execution plan fragments corresponding to each stage in the second execution plan.

[0082] The first determining submodule is used to determine the similarity between the first execution plan and the second execution plan based on the similarity between the first execution plan fragment and the second execution plan fragment.

[0083] For example, the processing submodule includes a second detection submodule, a first segmentation submodule, and a second segmentation submodule.

[0084] The second detection submodule is used to detect whether a preset third keyword exists in the first execution plan or the second execution plan.

[0085] The first segmentation submodule is used to segment the first execution plan according to every two adjacent third keywords when the third keyword is detected in the first execution plan, so as to obtain the first execution plan segment corresponding to each stage in the first execution plan.

[0086] The second segmentation submodule is used to segment the second execution plan according to every two adjacent third keywords when the third keyword is detected in the second execution plan, so as to obtain the second execution plan segments corresponding to each stage in the second execution plan.
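A minimal sketch of the keyword-based segmentation follows. It treats each fragment as the text from one occurrence of the third keyword up to the next occurrence, and lets the final fragment run to the end of the plan. The handling of the final fragment, the `split_plan` name, and the sample keyword `Stage:` with its toy plan text are assumptions.

```python
from typing import List


def split_plan(plan_text: str, third_keyword: str) -> List[str]:
    """Cut a plan into fragments, each running from one keyword occurrence
    up to (not including) the next; the last one runs to the end of the plan."""
    if not third_keyword:
        raise ValueError("third_keyword must be non-empty")
    starts = []
    pos = plan_text.find(third_keyword)
    while pos != -1:
        starts.append(pos)
        pos = plan_text.find(third_keyword, pos + len(third_keyword))
    if not starts:
        return []  # keyword absent: nothing to segment
    ends = starts[1:] + [len(plan_text)]
    return [plan_text[s:e].strip() for s, e in zip(starts, ends)]


plan_text = "Stage: Stage-1\n  Map Reduce\nStage: Stage-0\n  Fetch Operator\n"
fragments = split_plan(plan_text, "Stage:")
```

The same function serves both the first and the second segmentation submodule, since the two differ only in which plan they receive.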

[0087] For example, the first determining submodule includes a matching submodule, a second determining submodule, and a third determining submodule.

[0088] The matching submodule is used to sequentially select any first execution plan segment and match it with each of the second execution plan segments.

[0089] The second determining submodule is used to determine the degree of similarity between the first execution plan fragment and each of the second execution plan fragments based on the matching situation between the first execution plan fragment and each of the second execution plan fragments.

[0090] The third determination submodule is used to determine the similarity between the first execution plan and the second execution plan based on the first execution plan fragment and the second execution plan fragment having a similarity greater than or equal to a second similarity threshold.
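The three submodules above can be illustrated with one function that matches every first fragment against every second fragment and then aggregates. The matching metric (`difflib.SequenceMatcher.ratio`) and the aggregation rule (share of first fragments whose best match reaches the second similarity threshold) are assumptions; this section does not fix either.

```python
from difflib import SequenceMatcher
from typing import List


def plan_similarity(first: List[str], second: List[str], second_threshold: float) -> float:
    """Share of first-plan fragments with a close enough match in the second plan."""
    if not first:
        return 0.0
    matched = 0
    for frag in first:
        # Matching submodule: compare this fragment with every second fragment.
        best = max(
            (SequenceMatcher(None, frag, other).ratio() for other in second),
            default=0.0,
        )
        # Keep only fragment pairs at or above the second similarity threshold.
        if best >= second_threshold:
            matched += 1
    return matched / len(first)


first_fragments = ["scan t", "group by id"]
second_fragments = ["scan t", "sort by id"]
score = plan_similarity(first_fragments, second_fragments, second_threshold=0.9)
```

With these toy fragments only "scan t" finds a match above 0.9, so half of the first plan's fragments count as duplicated.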

[0091] For example, output module 140 includes a suggestion submodule.

[0092] The suggestion submodule is used to output a processing suggestion when the similarity between the first execution plan and the second execution plan is equal to a third similarity threshold. The processing suggestion indicates that the first data required by the processing request is already stored in the data warehouse.

[0093] For example, output module 140 includes a table creation submodule.

[0094] The table creation submodule is used to process the first execution plan and the second execution plan, whose similarity is greater than or equal to the first similarity threshold, based on preset table creation rules, to obtain a table.

[0095] It should be noted that, as those skilled in the art will understand, for convenience and brevity, the specific working processes of the above-described apparatus and its modules and units can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

[0096] The method of this application can be used in a wide variety of general-purpose or special-purpose computer system environments or configurations. Examples include: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0097] For example, the above-described methods and apparatus can be implemented as a computer program that can run on a computer device.

[0098] Please refer to Figure 4, which is a schematic block diagram illustrating the structure of a computer device according to an embodiment of this application. The computer device can be a server or a terminal device.

[0099] As shown in Figure 4, the computer device includes a processor, a memory, and a network interface connected via a system bus, wherein the memory may include a storage medium and internal memory.

[0100] The storage medium can store the operating system and computer programs. These computer programs include program instructions that, when executed, cause the processor to perform any data warehouse detection method.

[0101] The processor provides computing and control capabilities, supporting the operation of the entire computer device.

[0102] Internal memory provides an environment for the execution of computer programs stored in the storage medium. When these computer programs are executed by the processor, the processor can perform any data warehouse detection method.

[0103] The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in Figure 4 is merely a block diagram of the portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, combine certain components, or have different component arrangements.

[0104] It should be understood that the processor can be a Central Processing Unit (CPU), but it can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. Among these, a general-purpose processor can be a microprocessor or any conventional processor.

[0105] In one embodiment, the processor is configured to run a computer program stored in memory to perform the following steps:

[0106] In response to a processing request from a terminal device for first data, a first execution plan corresponding to the first data is determined based on the processing request;

[0107] Obtain the second data stored in the data warehouse and the second execution plan corresponding to the second data;

[0108] Based on preset execution plan processing rules, the degree of similarity between the first execution plan and the second execution plan is determined;

[0109] Based on the first execution plan and the second execution plan whose similarity is greater than or equal to the first similarity threshold, a detection report on the data warehouse is generated and output. The detection report is used to instruct relevant personnel to adjust the data hierarchy of the data warehouse or to adjust the processing request.

[0110] In one embodiment, when the processor implements the process of obtaining the second data stored in the data warehouse and the second execution plan corresponding to the second data, it is configured to:

[0111] Obtain the second data stored in the data warehouse and the data file corresponding to the second data;

[0112] Detect whether a preset first keyword exists in the data file;

[0113] When the first keyword is detected in the data file, the first keyword is replaced with a preset second keyword;

[0114] In the data file, the second execution plan corresponding to the second data is queried based on the second keyword.

[0115] In one embodiment, when the processor determines the similarity between the first execution plan and the second execution plan based on the preset execution plan processing rules, it is configured to:

[0116] Based on preset execution plan processing rules, the first execution plan and the second execution plan are processed to obtain the first execution plan fragments corresponding to each stage in the first execution plan and the second execution plan fragments corresponding to each stage in the second execution plan.

[0117] The similarity between the first execution plan and the second execution plan is determined based on the similarity between the first execution plan fragment and the second execution plan fragment.

[0118] In one embodiment, when the processor processes the first execution plan and the second execution plan based on the preset execution plan processing rules to obtain the first execution plan fragment corresponding to each stage of the first execution plan and the second execution plan fragment corresponding to each stage of the second execution plan, it is configured to:

[0119] Detect whether a preset third keyword exists in the first execution plan or the second execution plan;

[0120] When the third keyword is detected in the first execution plan, the first execution plan is divided according to every two adjacent third keywords to obtain the first execution plan segments corresponding to each stage in the first execution plan.

[0121] When the third keyword is detected in the second execution plan, the second execution plan is divided according to every two adjacent third keywords to obtain the second execution plan segments corresponding to each stage in the second execution plan.

[0122] In one embodiment, when determining the similarity between the first execution plan and the second execution plan based on the similarity between the first execution plan fragment and the second execution plan fragment, the processor is configured to:

[0123] Each first execution plan segment is selected sequentially and matched with each of the second execution plan segments;

[0124] Based on the matching between the first execution plan fragment and each of the second execution plan fragments, determine the degree of similarity between the first execution plan fragment and each of the second execution plan fragments;

[0125] The similarity between the first execution plan and the second execution plan is determined based on the first execution plan fragment and the second execution plan fragment, which have a similarity greater than or equal to the second similarity threshold.

[0126] In one embodiment, when generating and outputting the detection report on the data warehouse based on the first execution plan and the second execution plan whose similarity is greater than or equal to the first similarity threshold, the processor is configured to:

[0127] When the similarity between the first execution plan and the second execution plan is equal to a third similarity threshold, output a processing suggestion, which indicates that the first data required by the processing request is already stored in the data warehouse.

[0128] In one embodiment, when generating and outputting the detection report on the data warehouse based on the first execution plan and the second execution plan whose similarity is greater than or equal to the first similarity threshold, the processor is configured to:

[0129] Based on preset table creation rules, the first execution plan and the second execution plan, whose similarity is greater than or equal to the first similarity threshold, are processed to obtain a table.

[0130] It should be noted that, as those skilled in the art will understand, for convenience and brevity, the specific working process of the data warehouse detection described above can be found in the corresponding processes of the foregoing data warehouse detection method embodiments and is not repeated here.

[0131] This application also provides a computer-readable storage medium storing a computer program that includes program instructions. For the method implemented when the program instructions are executed, reference may be made to the embodiments of the data warehouse detection method of this application.

[0132] The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as the hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, SmartMedia Card (SMC), Secure Digital (SD) card, or Flash Card equipped on the computer device.

[0133] It should be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the scope of the application. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0134] It should also be understood that the term "and / or" as used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, herein, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.

[0135] The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments. The above descriptions are merely specific implementations of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A method for detecting a data warehouse, characterized in that the method comprises: in response to a processing request from a terminal device for first data, determining a first execution plan corresponding to the first data based on the processing request, wherein the processing request includes storing the first data in the data warehouse; storing the first data in the data warehouse includes processing original data according to the first execution plan to obtain the first data and storing the first data in the data warehouse; and the original data includes data other than the data in the data warehouse; obtaining second data stored in the data warehouse and a second execution plan corresponding to the second data; determining the similarity between the first execution plan and the second execution plan based on preset execution plan processing rules; and generating a detection report on the data warehouse based on the first execution plan and the second execution plan whose similarity is greater than or equal to a first similarity threshold, and outputting the detection report, wherein the detection report is used to instruct relevant personnel to adjust the data hierarchy of the data warehouse or to adjust the processing request.

2. The detection method according to claim 1, characterized in that, The step of obtaining the second data stored in the data warehouse and the second execution plan corresponding to the second data includes: Obtain the second data stored in the data warehouse and the data file corresponding to the second data; Detect whether a preset first keyword exists in the data file; When the first keyword is detected in the data file, the first keyword is replaced with a preset second keyword; In the data file, the second execution plan corresponding to the second data is queried based on the second keyword.

3. The detection method according to claim 1, characterized in that determining the similarity between the first execution plan and the second execution plan based on preset execution plan processing rules includes: processing the first execution plan and the second execution plan based on the preset execution plan processing rules to obtain the first execution plan fragments corresponding to each stage in the first execution plan and the second execution plan fragments corresponding to each stage in the second execution plan; and determining the similarity between the first execution plan and the second execution plan based on the similarity between the first execution plan fragment and the second execution plan fragment.

4. The detection method according to claim 3, characterized in that processing the first execution plan and the second execution plan based on preset execution plan processing rules to obtain first execution plan segments corresponding to each stage of the first execution plan and second execution plan segments corresponding to each stage of the second execution plan includes: detecting whether a preset third keyword exists in the first execution plan or the second execution plan; when the third keyword is detected in the first execution plan, dividing the first execution plan according to every two adjacent third keywords to obtain the first execution plan segments corresponding to each stage in the first execution plan; and when the third keyword is detected in the second execution plan, dividing the second execution plan according to every two adjacent third keywords to obtain the second execution plan segments corresponding to each stage in the second execution plan.

5. The detection method according to claim 3, characterized in that determining the similarity between the first execution plan and the second execution plan based on the similarity between the first execution plan fragment and the second execution plan fragment includes: selecting each first execution plan segment in turn and matching it with each of the second execution plan segments; determining, based on the matching between the first execution plan fragment and each of the second execution plan fragments, the similarity between the first execution plan fragment and each of the second execution plan fragments; and determining the similarity between the first execution plan and the second execution plan based on the first execution plan fragment and the second execution plan fragment whose similarity is greater than or equal to a second similarity threshold.

6. The detection method according to claim 1, characterized in that the detection report includes a processing suggestion; and generating a detection report on the data warehouse based on the first execution plan and the second execution plan whose similarity is greater than or equal to a first similarity threshold, and outputting the detection report, includes: when the similarity between the first execution plan and the second execution plan is equal to a third similarity threshold, outputting a processing suggestion, which indicates that the first data required by the processing request is already stored in the data warehouse.

7. The detection method according to claim 1, characterized in that the detection report includes a table; and generating a detection report on the data warehouse based on the first execution plan and the second execution plan whose similarity is greater than or equal to a first similarity threshold, and outputting the detection report, includes: processing, based on preset table creation rules, the first execution plan and the second execution plan whose similarity is greater than or equal to the first similarity threshold to obtain a table.

8. A detection device for a data warehouse, characterized in that, The detection device includes: A first acquisition module is configured to respond to a processing request for first data from a terminal device, and determine a first execution plan corresponding to the first data based on the processing request; the processing request includes storing the first data into the data warehouse; storing the first data into the data warehouse includes processing the original data according to the first execution plan to obtain the first data and storing the first data into the data warehouse; the original data includes data other than the data in the data warehouse; The second acquisition module is used to acquire the second data stored in the data warehouse and the second execution plan corresponding to the second data; The first determining module is used to determine the degree of similarity between the first execution plan and the second execution plan based on preset execution plan processing rules; The first output module is used to generate a detection report on the data warehouse based on a first execution plan and a second execution plan whose similarity is greater than or equal to a first similarity threshold, and output the detection report. The detection report is used to instruct relevant personnel to adjust the data hierarchy of the data warehouse or to adjust the processing request.

9. A computer device, characterized in that, The computer device includes a memory and a processor; The memory is used to store computer programs; The processor is configured to execute the computer program and, in executing the computer program, implement the data warehouse detection method as described in any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the data warehouse detection method as described in any one of claims 1 to 7.