Data retrieval method, apparatus and computing device

WO2026130217A1 · PCT designated stage · Publication Date: 2026-06-25 · HUAWEI CLOUD COMPUTING TECHNOLOGIES CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HUAWEI CLOUD COMPUTING TECHNOLOGIES CO LTD
Filing Date
2025-12-11
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

In existing technologies, when the retrieval module cannot directly find data related to the user's question in an external knowledge base, it tends to return approximate but incorrect results, or no results at all, which reduces retrieval accuracy.

Method used

Multiple data items to be retrieved are identified, the relevant candidate source tables are retrieved, a target wide table is generated from them, and the data values are extracted from that wide table. These values are then passed to a large model to generate answers, thereby improving retrieval accuracy.

Benefits of technology

It enables the retrieval of data-related content that cannot be directly searched from external knowledge bases, thereby improving retrieval accuracy and user trust.

✦ Generated by Eureka AI based on patent content.

Abstract

Provided in the present application are a data retrieval method, an apparatus and a computing device. The method comprises: on the basis of a question input by a user, determining a plurality of pieces of first data to be retrieved; retrieving from a data storage medium a plurality of candidate source tables related to said plurality of pieces of first data, each candidate source table comprising part of target data among said plurality of pieces of first data; on the basis of coverage of said plurality of pieces of first data by each candidate source table and an association relationship between the plurality of candidate source tables, determining a target source table combination from the plurality of candidate source tables; and, on the basis of the target source table combination, generating a target wide table, and retrieving values of said plurality of pieces of first data from the target wide table. The solution can help users find data-related content that cannot be directly searched from external knowledge bases, thereby improving the accuracy of data retrieval.

Description

Data retrieval method, apparatus and computing device

[0001] This application claims priority to Chinese Patent Application No. 202411884254.4, filed on December 19, 2024, with the China National Intellectual Property Administration, entitled “Method, Apparatus and Computing Device for Data Retrieval”, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of cloud computing, and more specifically, to a method, apparatus, and computing device for data retrieval.

Background Technology

[0003] Retrieval augmented generation (RAG) is an artificial intelligence technique that combines information retrieval technology with language generation models. A RAG system includes a retrieval module, which is primarily responsible for searching for documents or paragraphs most relevant to the user's input question from external knowledge bases (including structured and unstructured knowledge bases).

[0004] In existing technologies, when a user's question involves data-related retrieval, if the retrieval module cannot directly search for the data involved in the user's question from an external knowledge base (including structured and unstructured knowledge bases), it may choose to return an approximate but incorrect retrieval result, or not return any retrieval results, thereby reducing the accuracy of the retrieval.

[0005] Therefore, improving the accuracy of retrieval has become a pressing technical problem that needs to be solved.

Summary of the Invention

[0006] This application provides a data retrieval method, apparatus, and computing device that can help users find data-related content that cannot be directly searched from external knowledge bases (including structured and unstructured knowledge bases), thereby improving the accuracy of retrieval.

[0007] In a first aspect, a data retrieval method is provided, comprising: determining a plurality of first data to be retrieved based on a question input by a user; retrieving, from a data storage medium, a plurality of candidate source tables related to the plurality of first data, wherein each candidate source table includes part of the target data among the plurality of first data; determining a target source table combination from the plurality of candidate source tables based on the coverage of the plurality of first data by each candidate source table and the association relationships between the plurality of candidate source tables, the target source table combination including at least two source tables from the plurality of candidate source tables; generating a target wide table based on the target source table combination, the target wide table including the plurality of first data; and retrieving the values of the plurality of first data from the target wide table.

[0008] The above technical solution can merge data tables from multiple data sources, enabling further mining of data knowledge from the raw data. This can help users find data-related content that cannot be directly searched from external knowledge bases, thereby improving the accuracy of retrieval.
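The flow of the first aspect can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: tables are plain lists of dict rows, the "association relationship" is modeled simply as a shared join column, each table holds at most one row per join key, and the smallest combination of at least two joinable tables that covers every requested field is found by exhaustive search. The names `pick_combination`, `build_wide_table`, and the sample tables are hypothetical.

```python
from itertools import combinations

def columns(rows):
    """Column names of a table stored as a list of dict rows."""
    return set(rows[0]) if rows else set()

def pick_combination(wanted, tables, join_key):
    """Return the smallest group of >= 2 tables that covers `wanted`.

    Assumption: two tables are 'associated' when both carry `join_key`.
    """
    joinable = sorted(n for n, rows in tables.items() if join_key in columns(rows))
    for size in range(2, len(joinable) + 1):
        for combo in combinations(joinable, size):
            covered = set()
            for name in combo:
                covered |= columns(tables[name]) & wanted
            if covered == wanted:
                return combo
    return None

def build_wide_table(combo, tables, join_key):
    """Inner-join the chosen tables on `join_key` into one wide table."""
    wide = {row[join_key]: dict(row) for row in tables[combo[0]]}
    for name in combo[1:]:
        merged = {}
        for row in tables[name]:
            key = row[join_key]
            if key in wide:
                merged[key] = {**wide[key], **row}
        wide = merged
    return list(wide.values())

# Hypothetical candidate source tables; only two of them are joinable.
tables = {
    "contracts": [{"project_id": "P1", "contract_amount": 500}],
    "deliveries": [{"project_id": "P1", "accepted_items": 12}],
    "staff": [{"employee_id": "E7", "name": "Li"}],
}
wanted = {"contract_amount", "accepted_items"}

combo = pick_combination(wanted, tables, "project_id")
wide_table = build_wide_table(combo, tables, "project_id")
# Retrieve only the requested first data from the wide table.
values = [{field: row[field] for field in sorted(wanted)} for row in wide_table]
```

Exhaustive search keeps the sketch short; with many candidate tables a greedy or graph-based selection would likely be needed instead.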

[0009] In conjunction with the first aspect, in some implementations of the first aspect, the question includes multiple first data to be retrieved, and the method further includes: outputting the retrieval result corresponding to the question to the user, the retrieval result including the values of the multiple first data.

[0010] In the above technical solution, if the user's input question includes multiple first data to be retrieved, then the values of the multiple first data obtained from the retrieval can be directly output to the user as the retrieval result.

[0011] In conjunction with the first aspect, in some implementations of the first aspect, the question includes second data to be retrieved, and the multiple first data to be retrieved are determined according to the calculation formula of the second data, wherein the values of the multiple first data are used to calculate the value of the second data.

[0012] In the above technical solution, an indicator calculation or conversion formula for the second data to be retrieved in the question can be used to obtain the multiple first data from which the value of the second data is calculated. This can further help the user find content related to the data to be retrieved in the question that cannot be found in external knowledge bases, thereby improving the accuracy of retrieval.

[0013] In conjunction with the first aspect, in some implementations of the first aspect, after obtaining the values of the plurality of first data, the method further includes: calculating the value of the second data based on the values of the plurality of first data and the calculation formula of the second data; and outputting the search results corresponding to the question to the user, wherein the search results include the value of the second data.

[0014] In the above technical solution, the value of the second data to be retrieved in the question is calculated by using the values of multiple first data obtained through retrieval and the calculation formula of the second data. This helps users find content related to the data to be retrieved in the question that cannot be found from external knowledge bases, thereby improving the accuracy of retrieval.
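As a rough illustration of calculating the second data from retrieved first data, the sketch below represents a calculation formula as a dict holding its input names and a callable. This representation, the helper `compute_second_data`, and the "completion rate" indicator are assumptions made for the example; the application does not prescribe them.

```python
def compute_second_data(formula, first_values):
    """Evaluate a derived indicator from retrieved first-data values."""
    missing = [name for name in formula["inputs"] if name not in first_values]
    if missing:
        raise KeyError(f"missing first data: {missing}")
    args = {name: first_values[name] for name in formula["inputs"]}
    return formula["compute"](**args)

# Hypothetical indicator: completion rate = accepted items / contracted items.
completion_rate = {
    "inputs": ["accepted_items", "contracted_items"],
    "compute": lambda accepted_items, contracted_items: accepted_items / contracted_items,
}

# Values that would come from the target wide table.
retrieved = {"accepted_items": 12, "contracted_items": 16}
result = compute_second_data(completion_rate, retrieved)
```

Listing the inputs explicitly lets the same formula object drive both steps: determining which first data to retrieve, and computing the second data once their values are known.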

[0015] In conjunction with the first aspect, in some implementations of the first aspect, the method further includes: inputting the question and the corresponding search results into a large model, using the large model to generate the answer to the question, wherein the input information of the large model includes the question and the corresponding search results, and the output information of the large model includes the answer to the question; and outputting the answer to the question to the user.

[0016] In the above technical solution, the answer to the user's input question can be generated using a large model based on the search results corresponding to the question, thereby improving the accuracy of the answer output by the large model in the intelligent question answering system.
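One way to picture this step is a function that combines the question and its search results into a single model input. The prompt wording, the function names, and the `echo_model` stand-in are hypothetical; no particular model or API is implied by the application.

```python
def build_prompt(question, search_results):
    """Combine the question and its retrieval results into one model input."""
    lines = [
        "Answer the question using only the retrieved data below.",
        "",
        "Retrieved data:",
    ]
    for name, value in search_results.items():
        lines.append(f"- {name}: {value}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)

def answer_question(question, search_results, large_model):
    """`large_model` is any callable mapping a prompt string to an answer string."""
    return large_model(build_prompt(question, search_results))

# Stand-in for a real model so the sketch runs offline.
def echo_model(prompt):
    return f"(model output for a prompt of {len(prompt.splitlines())} lines)"

answer = answer_question(
    "Can the payment process be initiated?",
    {"completion_rate": 0.75},
    echo_model,
)
```

Passing the model as a callable keeps the retrieval side independent of whichever large model is actually deployed.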

[0017] In conjunction with the first aspect, in some implementations of the first aspect, the method further includes: outputting to the user at least one of the following: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

[0018] In the above technical solution, at least one of the following can be displayed to the user: multiple first data to be retrieved, second data to be retrieved, multiple candidate source tables, and target wide table, thereby increasing the user's confidence in the above search results.

[0019] In conjunction with the first aspect, in some implementations of the first aspect, the method further includes: receiving confirmation information from the user, the confirmation information being used to indicate that the user confirms at least one of the following: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

[0020] In the above technical solution, confirmation can also be received from the user for at least one of the multiple first data to be retrieved, the second data to be retrieved, the multiple candidate source tables, and the target wide table, and the retrieval results are generated based on the information confirmed by the user, thereby further increasing the user's confidence in the retrieval results.

[0021] In conjunction with the first aspect, in some implementations of the first aspect, the data storage medium includes any one or more combinations of the following: data lake, data warehouse, database, file system, etc.

[0022] In a second aspect, a data retrieval apparatus is provided, comprising: a determining unit, a retrieving unit, and a generating unit. The determining unit is configured to determine a plurality of first data to be retrieved based on a question input by a user; the retrieving unit is configured to retrieve, from a data storage medium, a plurality of candidate source tables related to the plurality of first data, wherein each candidate source table includes a portion of the plurality of first data; the determining unit is further configured to determine a target source table combination from the plurality of candidate source tables based on the coverage of the plurality of first data by each candidate source table and the association relationships between the plurality of candidate source tables, the target source table combination including at least two source tables from the plurality of candidate source tables; the generating unit is configured to generate a target wide table based on the target source table combination; and the retrieving unit is further configured to retrieve the values of the plurality of first data from the target wide table.

[0023] In conjunction with the second aspect, in some implementations of the second aspect, the question includes a plurality of first data to be retrieved, and the apparatus further includes: an output unit for outputting the retrieval result corresponding to the question to the user, the retrieval result including the values of the plurality of first data.

[0024] In conjunction with the second aspect, in some implementations of the second aspect, the question includes second data to be retrieved, and the determining unit is specifically used to: determine a plurality of first data to be retrieved according to the calculation formula of the second data, wherein the values of the plurality of first data are used to calculate the value of the second data.

[0025] In conjunction with the second aspect, in some implementations of the second aspect, the apparatus further includes: a calculation unit, configured to calculate the value of the second data based on the values of the plurality of first data and the calculation formula of the second data after obtaining the values of the plurality of first data; and an output unit, configured to output the search results corresponding to the question to the user, the search results including the value of the second data.

[0026] In conjunction with the second aspect, in some implementations of the second aspect, the generation unit is further configured to input the question and the corresponding search results into a large model, and use the large model to generate the answer to the question, wherein the input information of the large model includes the question and the corresponding search results, and the output information of the large model includes the answer to the question; the output unit is further configured to output the answer to the question to the user.

[0027] In conjunction with the second aspect, in some implementations of the second aspect, the output unit is also used to output at least one of the following to the user: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

[0028] In conjunction with the second aspect, in some implementations of the second aspect, the apparatus further includes: an acquisition unit for receiving confirmation information from the user, the confirmation information indicating that the user confirms at least one of the following: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

[0029] In conjunction with the second aspect, in some implementations of the second aspect, the data storage medium includes any one or more combinations of the following: data lake, data warehouse, database, file system, etc.

[0030] It should be understood that for the beneficial effects of the second aspect and its various implementations, please refer to the first aspect and its various implementations; they will not be repeated here.

[0031] In a third aspect, a computing device is provided, including a processor and a memory, and optionally an input/output interface. The processor controls the input/output interface to send and receive information, the memory stores a computer program, and the processor retrieves and runs the computer program from the memory, causing the computing device to execute the method of the first aspect or any possible implementation thereof.

[0032] Optionally, the processor can be a general-purpose processor, which can be implemented in hardware or software. When implemented in hardware, the processor can be a logic circuit, integrated circuit, etc.; when implemented in software, the processor can be a general-purpose processor that reads software code stored in memory. This memory can be integrated into the processor or located outside the processor and exist independently.

[0033] In a fourth aspect, a computing device cluster is provided, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, such that the computing device cluster performs the method of the first aspect or any possible implementation thereof.

[0034] In a fifth aspect, a chip is provided that acquires and executes instructions to implement the methods described in the first aspect and any implementation thereof.

[0035] Optionally, as one implementation, the chip includes a processor and a data interface, through which the processor reads instructions stored in the memory and executes the methods in the first aspect and any implementation thereof.

[0036] Optionally, as one implementation, the chip may further include a memory storing instructions, and the processor is used to execute the instructions stored in the memory. When the instructions are executed, the processor is used to perform the method in the first aspect and any implementation thereof.

[0037] In a sixth aspect, a computer program product containing instructions is provided, which, when executed by a computing device, cause the computing device to perform the methods described in the first aspect and any implementation thereof.

[0038] In a seventh aspect, a computer program product containing instructions is provided, which, when run by a cluster of computing devices, cause the cluster of computing devices to perform the methods described in the first aspect and any implementation thereof.

[0039] In an eighth aspect, a computer-readable storage medium is provided, including computer program instructions that, when executed by a computing device, perform the method as described in the first aspect and any implementation thereof.

[0040] As examples, these computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard drive.

[0041] Optionally, as one implementation, the aforementioned storage medium can specifically be a non-volatile storage medium.

[0042] A ninth aspect provides a computer-readable storage medium including computer program instructions that, when executed by a cluster of computing devices, perform the method as described in the first aspect and any implementation thereof.

[0043] As examples, these computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard drive.

[0044] Optionally, as one implementation, the aforementioned storage medium can specifically be a non-volatile storage medium.

Description of the Drawings

[0045] Figure 1 is a schematic block diagram of a cloud scenario applicable to an embodiment of this application.

[0046] Figure 2 is a schematic flowchart of a data retrieval method provided in an embodiment of this application.

[0047] Figure 3 is a schematic block diagram of the output answer of an intelligent question-answering system provided in an embodiment of this application.

[0048] Figure 4 is a schematic block diagram of a data retrieval device 400 provided in an embodiment of this application.

[0049] Figure 5 is a schematic deployment diagram of a data retrieval apparatus according to an embodiment of this application.

[0050] Figure 6 is a schematic diagram of the architecture of a computing device 1500 provided in an embodiment of this application.

[0051] Figure 7 is a schematic diagram of the architecture of a computing device cluster provided in an embodiment of this application.

[0052] Figure 8 is a schematic diagram of the connection between computing devices 1500A and 1500B via a network provided in an embodiment of this application.

Detailed Implementation

[0053] The technical solutions in this application will now be described with reference to the accompanying drawings.

[0054] This application will present various aspects, embodiments, or features relating to systems comprising multiple devices, components, modules, etc. It should be understood and appreciated that individual systems may include additional devices, components, modules, etc., and/or may not include all devices, components, modules, etc. discussed in conjunction with the accompanying drawings. Furthermore, combinations of these approaches are also possible.

[0055] Furthermore, in the embodiments of this application, the words "exemplary," "for example," etc., are used to indicate that they are examples, illustrations, or descriptions. Any embodiment or design scheme described as "exemplary" in this application should not be construed as being more preferred or advantageous than other embodiments or design schemes. Specifically, the use of the term "exemplary" is intended to present the concept in a concrete manner.

[0056] In the embodiments of this application, "corresponding" and "corresponding" can sometimes be used interchangeably. It should be noted that when the distinction is not emphasized, their intended meanings are consistent.

[0057] The business scenarios described in the embodiments of this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided in the embodiments of this application. As those skilled in the art will know, with the evolution of network architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.

[0058] References to "one embodiment" or "some embodiments" as described in this specification mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless otherwise specifically emphasized.

[0059] In this application, "at least one" means one or more, and "more than one" means two or more. "And/or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and/or B can mean: A alone, A and B simultaneously, and B alone, where A and B can be singular or plural. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a, b, c, ab, ac, bc, or abc, where a, b, and c can be single or multiple.

[0060] Retrieval augmented generation (RAG) is an artificial intelligence technique that combines information retrieval technology with language generation models. By integrating traditional retrieval-based question-answering systems with natural language generation techniques, this technology allows the model to utilize up-to-date information from external knowledge bases when generating answers. This overcomes some limitations of traditional generative models, such as outdated knowledge and susceptibility to misinterpretations. The principle is that when the model needs to generate text or answer a question, it first retrieves relevant information from a large collection of documents and then uses this retrieved information to guide text generation, thereby improving the quality and accuracy of predictions.

[0061] The RAG system mainly consists of two core modules: a retrieval module and a generation module. The retrieval module is primarily responsible for searching for the most relevant documents or paragraphs from external knowledge bases (including structured and unstructured knowledge bases) based on the user's input question, and then passing the retrieved documents or paragraphs to the generation module. The generation module is responsible for combining the retrieved documents or paragraphs with the user's question, and using a pre-trained language model (such as bidirectional and auto-regressive transformers (BART)) to generate the final natural language answer.

[0062] RAG technology has a wide range of applications; some possible scenarios are listed below.

[0063] 1. Question-and-answer system: RAG technology can be used to build a powerful question-and-answer system that can answer various questions raised by users.

[0064] 2. Document generation and automatic summarization: RAG technology can be used to automatically generate article paragraphs, documents, or automatic summaries, filling in text based on retrieved knowledge, making the generated content more informative.

[0065] 3. Intelligent Assistants and Virtual Agents: RAG technology can be used to build intelligent assistants or virtual agents that can answer user questions, provide information, and perform tasks by combining chat logs.

[0066] 4. Information retrieval: RAG technology can improve information retrieval systems, making them more accurate and insightful.

[0067] 5. Knowledge Graph Population: RAG technology can be used to populate entity relationships in knowledge graphs by identifying and adding new knowledge points through document retrieval.

[0068] In existing technologies, when a user's question involves data-related retrieval, if the retrieval module cannot directly search for the data involved in the user's question from an external knowledge base (including structured and unstructured knowledge bases), it may choose to return an approximate but incorrect retrieval result, or not return any retrieval results, thereby reducing the accuracy of the retrieval.

[0069] In view of this, embodiments of this application provide a data retrieval method that can help users find data-related content that cannot be directly searched from external knowledge bases (including structured and unstructured knowledge bases), thereby improving the accuracy of retrieval.

[0070] In one possible implementation, the method provided in this application embodiment can be applied to a cloud service scenario, where the method is executed by a cloud management platform within the cloud service scenario. For ease of description, the cloud service scenario will be described in detail below with reference to Figure 1.

[0071] Figure 1 is a schematic block diagram of a cloud scenario applicable to an embodiment of this application. As shown in Figure 1, the cloud scenario may include: a cloud management platform 110, the Internet 120, and a client 130.

[0072] As shown in Figure 1, the cloud management platform 110 is used to manage the infrastructure that provides multiple cloud services. The infrastructure includes multiple cloud data centers, each cloud data center includes multiple servers, and each server includes cloud service resources to provide corresponding cloud services to tenants.

[0073] The cloud management platform 110 can be located in a cloud data center and provides access interfaces (such as user interfaces or application program interfaces, APIs). Tenants can use client 130 to remotely access the cloud management platform 110, register a cloud account and password, and log in. After successful authentication of the cloud account and password, the tenant can further select and purchase virtual machines of specific specifications (processor, memory, disk) on the cloud management platform 110. After successful purchase, the cloud management platform 110 provides the remote login account and password for the purchased virtual machine, allowing client 130 to remotely log in and install and run the tenant's applications. Therefore, tenants can create, manage, log in to, and operate virtual machines in the cloud data center through the cloud management platform 110. Virtual machines can also be referred to as Elastic Compute Service (ECS) or Elastic Instances (different cloud service providers may use different names).

[0074] It should be understood that cloud service tenants can be individuals, businesses, schools, hospitals, government agencies, etc.

[0075] The cloud management platform 110 includes, but is not limited to, a user console, compute management services, network management services, storage management services, authentication services, and image management services. The user console provides an interface or API for interaction with tenants. The compute management services manage servers running virtual machines and containers, as well as bare metal servers. The network management services manage network services (such as gateways and firewalls). The storage management services manage storage services (such as data bucket services). The authentication services manage tenant account passwords. The image management services manage virtual machine images. Tenants can log in to the cloud management platform 110 via client 130 and the internet 120 to manage their rented cloud services.

[0076] The following is a detailed description of a data retrieval method provided by an embodiment of this application, with reference to Figure 2. It should be understood that the examples in Figure 2 are merely to help those skilled in the art understand the embodiments of this application, and are not intended to limit the embodiments to the specific values or scenarios illustrated in Figure 2. Those skilled in the art can obviously make various equivalent modifications or variations based on the examples given below in Figure 2, and such modifications and variations also fall within the scope of the embodiments of this application.

[0077] Figure 2 is a schematic flowchart of a data retrieval method provided in an embodiment of this application. As shown in Figure 2, the method may include steps 210-260, which will be described in detail below.

[0078] For example, the method shown in Figure 2 can be executed by the RAG system, specifically by the retrieval module within the RAG system.

[0079] Step 210: Obtain the user's input question.

[0080] In this embodiment of the application, the question input by the user can be obtained, for example by receiving the user's input.

[0081] For example, a user might input the question: "Check whether the XXX project meets the requirements of the 'XXX Project Technical Service Contract' and whether the payment process can be initiated."

[0082] Step 220: Determine multiple first data to be retrieved based on the user's input question.

[0083] In this embodiment of the application, after obtaining the question input by the user, multiple first data to be retrieved can be determined based on the question input by the user.

[0084] The aforementioned multiple first data to be retrieved can be structured data or unstructured data, and this application embodiment does not specifically limit them.

[0085] It should be understood that the aforementioned multiple first data are fields of tables in the database, that is, the multiple first data are stored directly in database tables.

[0086] For example, if the user's input question includes multiple first data points to be retrieved, those multiple first data points can be directly used as the data to be retrieved.

[0087] As another example, the user's input question may include second data to be retrieved that is not stored in any table in the database, meaning that it is not a field of a database table. In this case, multiple first data to be retrieved can be determined based on the indicator calculation formula of the second data, and the values of these multiple first data are used to calculate the value of the second data.

[0088] It should be understood that the above-mentioned indicators refer to quantitative standards used to measure and evaluate the performance, progress, or results of a specific field, project, or system. Typically, these indicators are presented in the form of calculation formulas, descriptions, etc.

[0089] For example, the above indicators are typically represented in languages such as JSON, SQL, or Python, and the required data is obtained by parsing them. For instance, if the indicator is represented in JSON, the keys in the JSON can be parsed to obtain the required data. Similarly, if the indicator is represented in SQL, the syntax tree of the SQL statement can be parsed to obtain the required data. Likewise, if the indicator is represented in Python, the Python function can be parsed to obtain its input and output parameters, thereby obtaining the required data.
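As a minimal sketch of the parsing described above, the following Python reads the keys of a JSON indicator definition and the parameter names of a Python indicator function. The indicator definitions and the helper names `fields_from_json` and `fields_from_python` are hypothetical and are not taken from this application; the SQL case would need a SQL parser and is not shown.

```python
import inspect
import json


def fields_from_json(indicator_json):
    """Return the top-level keys of a JSON indicator definition."""
    return list(json.loads(indicator_json).keys())


def fields_from_python(indicator_fn):
    """Return the parameter names of a Python indicator function."""
    return list(inspect.signature(indicator_fn).parameters)


# Hypothetical indicator definitions, used only for illustration.
indicator_json = '{"endtime": "departure time", "starttime": "arrival time"}'


def total_on_duty_time(starttime, endtime):
    return endtime - starttime


json_fields = fields_from_json(indicator_json)
python_fields = fields_from_python(total_on_duty_time)
```

In both cases the extracted names are the data that would have to be retrieved in order to evaluate the indicator.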

[0090] Optionally, in some embodiments, in order to increase the user's confidence in the search results, the embodiments of this application may also display multiple first data to be searched to the user and prompt the user to confirm whether to use the multiple first data as search data for retrieval.

[0091] Optionally, in some embodiments, multiple first data confirmed by the user may also be received, and the multiple first data confirmed by the user may be used as data to be retrieved.

[0092] For example, the following describes in detail the specific implementation of determining multiple first data to be retrieved based on the user's input question: "Check whether the XXX project meets the requirements of the 'XXX Project Technical Service Contract' and whether the payment process can be initiated."

[0093] For example, the retrieval module in the RAG system can retrieve the relevant content specifically referred to in the aforementioned "XXX Project Technical Service Contract" from an external knowledge base.

[0094] For example, the retrieval module can match the keywords in the above question with files in an external knowledge base, and extract the text in the external knowledge base that matches the keywords in the above question, thereby obtaining the relevant content specifically referred to in the above "XXX Project Technical Service Contract".

[0095] It should be understood that the aforementioned external knowledge base may include structured knowledge bases and unstructured knowledge bases. Structured knowledge bases include databases that store structured data, while unstructured knowledge bases include, but are not limited to, web page knowledge bases and terminology databases.

[0096] For example, the specific content referred to in the "XXX Project Technical Service Contract" retrieved by the above retrieval module is as follows.

[0097] 1. Party B shall send at least X technical personnel to participate in the project, of whom M shall be senior technical personnel, S shall be mid-level technical personnel, and N shall be junior technical personnel.

[0098] 2. The total on-duty time of Party B shall not be less than P hours;

[0099] 3. Meeting conditions 1 and 2 will initiate the payment process.

[0100] Specifically, in this embodiment of the application, multiple first data to be retrieved are determined based on the relevant content specifically referred to in the above-mentioned retrieved "XXX Project Technical Service Contract".

[0101] For example, the data to be retrieved in the above question includes: "the number of technical personnel actually dispatched by Party B during the execution of the XXX project", "the number of senior technical personnel actually dispatched", "the number of intermediate technical personnel actually dispatched", "the number of junior technical personnel actually dispatched", and "the total actual on-duty time of Party B during the execution of the XXX project".

[0102] For example, since the data to be retrieved, such as "the number of technical personnel actually dispatched by Party B during the execution of the XXX project", "the number of senior technical personnel actually dispatched", "the number of intermediate technical personnel actually dispatched", and "the number of junior technical personnel actually dispatched", belong to the fields of the table in the database, the above-mentioned multiple first data to be retrieved include: "the number of technical personnel actually dispatched by Party B during the execution of the XXX project", "the number of senior technical personnel actually dispatched", "the number of intermediate technical personnel actually dispatched", and "the number of junior technical personnel actually dispatched".

[0103] For example, since "the total actual on-duty time of Party B during the execution of the XXX project" in the data to be retrieved does not belong to the fields of the table in the database, the multiple first data related to it can be determined according to the calculation formula of "the total actual on-duty time of Party B during the execution of the XXX project".

[0104] An example formula for calculating total on-duty time is shown below:

[0105] Total on-duty time = Sum(endtime - starttime) where date_time > project start time

[0106] Where endtime represents the employee's departure time; starttime represents the employee's arrival time; date_time represents the statistical time, which must be greater than the project's start time.
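The formula above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout (a list of dictionaries with `date_time`, `starttime` and `endtime` keys) and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def total_on_duty_time(records, project_start):
    """Sum (endtime - starttime) over records with date_time after project_start."""
    total = timedelta()
    for rec in records:
        if rec["date_time"] > project_start:
            total += rec["endtime"] - rec["starttime"]
    return total


# Hypothetical sign-in records; the key names follow the formula above.
records = [
    {"date_time": datetime(2025, 1, 2),
     "starttime": datetime(2025, 1, 2, 9), "endtime": datetime(2025, 1, 2, 17)},
    {"date_time": datetime(2025, 1, 3),
     "starttime": datetime(2025, 1, 3, 9), "endtime": datetime(2025, 1, 3, 18)},
    # Dated before the project start, so it is excluded from the sum.
    {"date_time": datetime(2024, 12, 30),
     "starttime": datetime(2024, 12, 30, 9), "endtime": datetime(2024, 12, 30, 17)},
]
project_start = datetime(2025, 1, 1)

total_hours = total_on_duty_time(records, project_start).total_seconds() / 3600
```

The resulting number of hours could then be compared with the contractual threshold P.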

[0107] It should be understood that the calculation formula for the above-mentioned "total on-duty time of personnel" indicator can be retrieved from historically accumulated indicator systems, or from data development systems, or generated based on natural language combined with large models and knowledge bases. This application embodiment does not specifically limit this.

[0108] In this embodiment of the application, the multiple first data to be retrieved, determined according to the above calculation formula for "total on-duty time of personnel", include: "the departure time and arrival time of Party B's personnel during the execution of the XXX project".

[0109] Step 230: Retrieve multiple candidate source tables related to the multiple first data from the data storage medium.

[0110] In this embodiment of the application, after determining the multiple first data to be retrieved in the user-input question, multiple candidate source tables related to the multiple first data can be retrieved.

[0111] For example, based on the multiple first data to be retrieved, a search can be performed in the data storage medium to find multiple candidate source tables associated with the multiple first data to be retrieved.

[0112] For example, using each of the multiple sets of first data as a search criterion, at least one candidate source table capable of containing each set of first data is retrieved from the data storage medium. That is, each candidate source table may include a portion of the multiple sets of first data.

[0113] In some embodiments, if no candidate source table related to certain first data is found during the current retrieval process, multiple iterations can be performed for the retrieval.

[0114] In some embodiments, if no candidate source table related to certain first data is found after multiple iterations, the user may be prompted to supply the values of that part of the first data.

[0115] Specifically, the aforementioned data storage media may include, but are not limited to: data lakes, data warehouses, databases, file systems, etc.

[0116] Optionally, in some embodiments, a search can also be performed in the data storage medium based on the multiple first data to be retrieved and context information to find multiple candidate source tables associated with the multiple first data to be retrieved.

[0117] It should be understood that this contextual information can be contextual information related to the question input by the user. This contextual information could be, for example, information such as the business scenario related to the question input by the user, so that multiple candidate source tables associated with the multiple first data to be retrieved can be quickly found from the data storage medium based on this contextual information.

[0118] For example, the process of obtaining multiple candidate source tables will be described in detail below with specific examples.

[0119] For example, taking the first piece of data to be retrieved as "the number of technical personnel actually dispatched by Party B during the execution of Project XXX" as an example, and using "the number of technical personnel actually dispatched by Party B during the execution of Project XXX" as the search condition, at least one candidate source table that can cover "the number of technical personnel actually dispatched by Party B during the execution of Project XXX" can be retrieved from the data storage medium. For example, the at least one candidate source table includes Table A and Table B below.

[0120] Table A above is a personnel information table, which includes the following fields: [Project Name, Project ID, Employee Name, Employee ID, Company Name, Level, etc.].

[0121] Table B above is the project information table, which includes the following fields: [Project Name, Project ID, Winning Company, Winning Bid ID].

[0122] For example, taking the first piece of data to be retrieved as "the number of senior / intermediate / junior technical personnel actually dispatched by Party B during the execution of Project XXX", and using "the number of senior / intermediate / junior technical personnel actually dispatched by Party B during the execution of Project XXX" as the search condition, at least one candidate source table that can cover "the number of senior / intermediate / junior technical personnel actually dispatched by Party B during the execution of Project XXX" can be retrieved from the data storage medium. For instance, this at least one candidate source table includes Table A mentioned above.

[0123] For example, taking the first data to be retrieved as "the departure time and arrival time of Party B's personnel during the execution of the XXX project" and using it as the search condition, at least one candidate source table that covers this first data can be retrieved from the data storage medium. For instance, this at least one candidate source table includes Table C below.

[0124] Table C above is the employee sign-in table, which includes the following fields: [Employee Name, Employee ID, Arrival Time, Departure Time].
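The lookup of candidate source tables can be sketched as a field match against a table catalog. In the sketch below, the field lists follow Tables A, B and C above, while the mapping from each piece of first data to the fields it needs (`REQUIRED_FIELDS`) and the helper `find_candidates` are assumptions made purely for illustration; a real retrieval would likely rely on semantic matching rather than exact field names.

```python
# Field lists taken from Tables A, B and C above.
CATALOG = {
    "A": {"Project Name", "Project ID", "Employee Name", "Employee ID",
          "Company Name", "Level"},
    "B": {"Project Name", "Project ID", "Winning Company", "Winning Bid ID"},
    "C": {"Employee Name", "Employee ID", "Arrival Time", "Departure Time"},
}

# Hypothetical mapping from each piece of first data to the fields it needs.
REQUIRED_FIELDS = {
    "number of technical personnel dispatched":
        {"Project ID", "Company Name", "Winning Company"},
    "number of personnel per level": {"Level"},
    "arrival and departure time": {"Arrival Time", "Departure Time"},
}


def find_candidates(required, catalog):
    """Return the tables that contain at least one of the required fields."""
    return sorted(name for name, fields in catalog.items() if fields & required)


candidates = {
    item: find_candidates(fields, CATALOG)
    for item, fields in REQUIRED_FIELDS.items()
}
```

Under these assumptions, each candidate source table covers only part of the first data, which is why a combination of tables has to be selected in the next step.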

[0125] Optionally, in some embodiments, in order to increase the user's confidence in the search results, the embodiments of this application may also display multiple candidate source tables to the user and prompt the user to confirm whether to use the multiple candidate source tables as candidate source tables for determining the target source table combination.

[0126] Optionally, in some embodiments, multiple candidate source tables confirmed by the user may also be received, and the multiple candidate source tables confirmed by the user may be used to determine the target source table combination.

[0127] Step 240: Sort multiple candidate source tables by coverage to determine the target source table combination.

[0128] In this embodiment of the application, after obtaining multiple candidate source tables associated with multiple first data to be retrieved, the multiple candidate source tables can be sorted by coverage, the combination of source tables with the highest coverage can be selected, and the target wide table can be generated based on the combination of source tables with the highest coverage.

[0129] For example, a target source table combination can be determined from multiple candidate source tables based on the coverage of each candidate source table to multiple first data and the association between the multiple candidate source tables. The target source table combination may include at least two source tables from the multiple candidate source tables.

[0130] For example, based on the coverage of the multiple candidate source tables for the multiple first data to be retrieved, the candidate source table with the highest coverage can be used as the main table. Then, based on the relationship between the candidate source tables, the combination of candidate source tables with the highest coverage can be determined. This combination of candidate source tables with the highest coverage is also the target source table combination mentioned above.

[0131] In this embodiment of the application, the association between multiple candidate source tables can be determined based on the primary key (PK) and foreign key (FK) in the candidate source table.

[0132] It should be understood that a primary key (PK) is one or more fields in a table whose values uniquely identify each row in the table. A table can have only one primary key. A foreign key (FK) is a field (or combination of fields) in a table that references the primary key of another table. Foreign keys ensure that data referenced in one table actually exists in another table, thereby maintaining referential integrity of data.

[0133] In some embodiments, if no primary or foreign key is marked in the candidate source table, primary and foreign key mining can be performed to find columns in the candidate source table that can be used as primary keys and columns that are related to the primary keys of other candidate source tables as foreign keys.

[0134] For example, taking multiple candidate source tables including the aforementioned tables A, B, and C, the primary and foreign keys mined are as follows:

[0135] Table A: Personnel Information Table [Project Name, Project ID (PK), Employee Name, Employee ID (FK), Company Name, Level, etc.].

[0136] Table B: Project Information Table [Project Name, Project ID (PK), Winning Company, Winning Bid ID].

[0137] Table C: Employee Sign-in Sheet [Employee Name, Employee ID (PK), Arrival Time, Departure Time].
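One simple way to mine primary and foreign keys, offered here as a sketch rather than as the method of this application, is to treat columns with unique, non-null values as primary key candidates and columns whose values all appear in another table's key column as foreign key candidates. The sample rows and function names below are hypothetical.

```python
def candidate_primary_keys(rows):
    """Columns whose values are non-null and unique across all rows."""
    if not rows:
        return []
    keys = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        if None not in values and len(set(values)) == len(values):
            keys.append(col)
    return keys


def candidate_foreign_keys(child_rows, parent_rows, parent_pk):
    """Child columns whose values all appear in the parent's key column."""
    parent_values = {row[parent_pk] for row in parent_rows}
    return [
        col for col in child_rows[0]
        if {row[col] for row in child_rows} <= parent_values
    ]


# Hypothetical sample rows, used only for illustration.
projects = [
    {"project_id": "P1", "project_name": "Alpha"},
    {"project_id": "P2", "project_name": "Beta"},
]
personnel = [
    {"employee_id": "E1", "project_id": "P1", "level": "senior"},
    {"employee_id": "E2", "project_id": "P1", "level": "junior"},
    {"employee_id": "E3", "project_id": "P2", "level": "senior"},
]

personnel_pks = candidate_primary_keys(personnel)
personnel_fks = candidate_foreign_keys(personnel, projects, "project_id")
```

On small samples several columns may look unique by coincidence, so the output should be read as candidates to be confirmed, not as definitive keys.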

[0138] In this embodiment, tables A, B, and C are combined, and the combinations are compared by coverage. For example, the resulting combinations include: a combination of tables A and C, a combination of tables A and B, and a combination of tables B and C. Specifically, tables A and C have a 100% coverage rate for the multiple first data to be retrieved, tables A and B have a 60% coverage rate, and tables B and C have a 60% coverage rate. Therefore, the combination of tables A and C can be used as the aforementioned target source table combination.
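The selection by coverage and association can be sketched as follows. The sets of first data covered by each table (`COVERS`) are assumptions made for illustration, so the coverage values computed for the non-selected pairs are not meant to reproduce the percentages quoted above; the join keys follow the mined keys of Tables A, B and C, and restricting the choice to joinable pairs is likewise an assumption.

```python
from itertools import combinations

# The first data to be retrieved (short hypothetical labels).
FIRST_DATA = {
    "headcount", "senior_count", "intermediate_count", "junior_count",
    "arrival_departure_times",
}

# Hypothetical coverage of each candidate source table.
COVERS = {
    "A": {"headcount", "senior_count", "intermediate_count", "junior_count"},
    "B": {"headcount"},
    "C": {"arrival_departure_times"},
}

# Pairs of tables that share a key and can therefore be associated.
JOIN_KEYS = {
    frozenset({"A", "B"}): "Project ID",
    frozenset({"A", "C"}): "Employee ID",
}


def coverage(tables):
    """Fraction of the first data covered by the given tables together."""
    covered = set().union(*(COVERS[t] for t in tables))
    return len(covered & FIRST_DATA) / len(FIRST_DATA)


def select_target_combination():
    """Pick the joinable pair of tables with the highest coverage."""
    joinable = [
        pair for pair in combinations(sorted(COVERS), 2)
        if frozenset(pair) in JOIN_KEYS
    ]
    return max(joinable, key=coverage)


target = select_target_combination()
```

Considering only pairs keeps the sketch short; larger combinations could be enumerated in the same way.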

[0139] Step 250: Generate a target wide table based on the target source table combination.

[0140] In this embodiment of the application, a target wide table can be generated based on the combination of target source tables. This target wide table includes the aforementioned multiple first data to be retrieved. For example, at least two source tables included in the target source table combination can be merged into one table, which is the aforementioned target wide table.
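Merging the source tables of the target combination into a wide table can be sketched as an inner join on the shared key. The sample rows and the helper `build_wide_table` are hypothetical; in practice the merge would more likely be expressed as a SQL join in the data storage medium.

```python
def build_wide_table(left_rows, right_rows, key):
    """Inner-join two lists of row dictionaries on a shared key column."""
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[key], []).append(row)
    wide = []
    for left in left_rows:
        for right in right_by_key.get(left[key], []):
            wide.append({**left, **right})
    return wide


# Hypothetical rows from Table A (personnel) and Table C (sign-in).
table_a = [
    {"Employee ID": "E1", "Level": "senior"},
    {"Employee ID": "E2", "Level": "junior"},
    {"Employee ID": "E3", "Level": "senior"},
]
table_c = [
    {"Employee ID": "E1", "Arrival Time": "09:00", "Departure Time": "17:00"},
    {"Employee ID": "E2", "Arrival Time": "09:30", "Departure Time": "18:00"},
]

wide_table = build_wide_table(table_a, table_c, "Employee ID")
```

An inner join drops employees without a sign-in record (E3 here); an outer join would be the alternative if such rows must be kept.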

[0141] Optionally, in some embodiments, in order to increase the user's confidence in the search results, the generated target wide table may also be displayed to the user, and the user may be prompted to confirm the generated target wide table.

[0142] Step 260: Retrieve the values of multiple first data from the target wide table.

[0143] In this embodiment of the application, after obtaining the target wide table, since the fields of the target wide table can cover multiple first data to be retrieved, the values of multiple first data can be retrieved from the target wide table.
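As an illustration of reading values from the wide table, the sketch below counts the dispatched personnel in total and per level. The rows and column names are hypothetical, and employees are de-duplicated on the assumption that one employee may appear in several sign-in rows.

```python
from collections import Counter

# Hypothetical wide table rows; E1 appears twice (two sign-in records).
wide_table = [
    {"Employee ID": "E1", "Level": "senior", "Arrival Time": "09:00"},
    {"Employee ID": "E1", "Level": "senior", "Arrival Time": "09:10"},
    {"Employee ID": "E2", "Level": "junior", "Arrival Time": "09:30"},
    {"Employee ID": "E3", "Level": "senior", "Arrival Time": "08:50"},
]


def level_by_employee(rows):
    """Map each distinct employee to a level, ignoring repeated rows."""
    return {row["Employee ID"]: row["Level"] for row in rows}


levels = level_by_employee(wide_table)
headcount = len(levels)
level_counts = Counter(levels.values())
```

These values correspond to the first data such as the number of technical personnel actually dispatched and the number at each level.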

[0144] In this embodiment, knowledge fusion can be performed on data from multiple data sources to further mine data knowledge from the original data. This can help users find data-related content that cannot be directly searched from external knowledge bases, thereby improving the accuracy of retrieval.

[0145] For example, if the user's input question contains multiple first data to be retrieved, the values of the multiple first data obtained from the above retrieval can be directly output to the user as the retrieval result.

[0146] As another example, if the user's input question contains second data, then after the values of the multiple first data are obtained, the value of the second data can be calculated based on the values of the multiple first data and the calculation formula of the second data, and the calculated value of the second data can be output to the user as the retrieval result.

[0147] This application does not specifically limit the application scenarios of the search results corresponding to the questions generated above. Some possible application scenarios are listed below.

[0148] In one possible implementation, the search results described above can be used in an intelligent question-answering system. For example, as shown in Figure 3, the retrieval module can pass the user-input question and the corresponding search results to the generation module. The generation module then inputs the search results as a prompt, along with the user-input question, into a pre-trained large model, and uses the pre-trained large model to generate the final natural language answer, which is used to answer the user's question. The generation module can also output the generated natural language answer to the user.

[0149] The aforementioned large model may also be referred to as a large language model (LLM), and this is not specifically limited in the embodiments of this application.

[0150] It should be understood that a prompt refers to the input text or instruction provided to a large model to instruct or guide it to produce specific outputs. A prompt can be a question, a description, a task instruction, or even a portion of the dialogue history. By designing and optimizing prompts, large models can be guided to generate expected responses or complete specific tasks.
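Assembling a prompt from the question and the search results can be sketched as plain string construction. The prompt wording, the result names and the helper `build_prompt` are assumptions for illustration; the call to the large model itself is deliberately omitted because this application does not specify a particular model interface.

```python
def build_prompt(question, results):
    """Combine the retrieved values and the user's question into one prompt."""
    parts = [
        "Answer the question using only the retrieved data.",
        "Retrieved data:",
    ]
    parts += [f"- {name}: {value}" for name, value in results.items()]
    parts.append(f"Question: {question}")
    return "\n".join(parts)


# Hypothetical search results for the contract example.
results = {
    "technical personnel dispatched": 3,
    "senior technical personnel dispatched": 2,
    "total on-duty time (hours)": 17.0,
}
question = "Can the payment process for the XXX project be initiated?"

prompt = build_prompt(question, results)
```

The resulting text would be passed to the large model as its input, and the model's output would be returned as the natural language answer.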

[0151] In another possible implementation, the results of the above retrieval can also be used for source table discovery in the data extraction, transformation, and loading (ETL) process.

[0152] In another possible implementation, the results of the above retrieval can also be applied to business intelligence (BI) scenarios to help users generate target metrics / transformation results.

[0153] The methods provided by the embodiments of this application have been described in detail above with reference to Figures 1 to 3. The apparatus embodiments of this application are described in detail below with reference to Figures 4 to 8. It should be understood that the descriptions of the method embodiments correspond to the descriptions of the apparatus embodiments; therefore, for any parts not described in detail, refer to the preceding method embodiments.

[0154] Figure 4 is a schematic block diagram of a data retrieval device 400 provided in an embodiment of this application. The device 400 can be implemented by software, hardware, or a combination of both. The device 400 provided in this embodiment can implement the method flow shown in Figure 2 of this embodiment. The device 400 includes: a determining unit 410, a retrieving unit 420, and a generating unit 430. Specifically, the determining unit 410 is used to determine multiple first data to be retrieved based on a user-input question; the retrieving unit 420 is used to retrieve multiple candidate source tables related to the multiple first data from a data storage medium, wherein each candidate source table includes a portion of the multiple first data; the determining unit 410 is further used to determine a target source table combination from the multiple candidate source tables based on the coverage of each candidate source table to the multiple first data and the association relationship between the multiple candidate source tables, wherein the target source table combination includes at least two source tables from the multiple candidate source tables; the generating unit 430 is used to generate a target wide table based on the target source table combination; and the retrieving unit 420 is further used to retrieve the values of the multiple first data from the target wide table.

[0155] Optionally, the question includes multiple first data to be retrieved, and the device 400 further includes an output module for outputting the search results corresponding to the question to the user, wherein the search results include the values of the multiple first data.

[0156] Optionally, the question includes second data to be retrieved, and the determining unit 410 is specifically used to: determine a plurality of first data to be retrieved according to the calculation formula of the second data, wherein the values of the plurality of first data are used to calculate the value of the second data.

[0157] Optionally, the device 400 further includes: a calculation unit, configured to calculate the value of the second data based on the values of the plurality of first data and the calculation formula of the second data after obtaining the values of the plurality of first data; and an output unit, configured to output the search results corresponding to the question to the user, the search results including the value of the second data.

[0158] Optionally, the generation unit 430 is further configured to input the question and the corresponding search results into the large model, and use the large model to generate the answer to the question, wherein the input information of the large model includes the question and the corresponding search results, and the output information of the large model includes the answer to the question; the output unit is further configured to output the answer to the question to the user.

[0159] Optionally, the output unit is also configured to output at least one of the following to the user: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

[0160] Optionally, the device 400 further includes: an acquisition unit for receiving confirmation information from the user, the confirmation information indicating that the user confirms at least one of the following: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

[0161] Optionally, the data storage medium may include any one or more combinations of the following: data lake, data warehouse, database, file system, etc.

[0162] The device 400 here can be embodied in the form of a functional module. The term "unit" here can be implemented in software and / or hardware, without specific limitations.

[0163] For example, a "unit" can be a software program, a hardware circuit, or a combination of both that implements the above functions. The implementation of a unit is described below using the determining unit 410 as an example. Similarly, for the implementation of other units, such as the retrieval unit 420, the generation unit 430, the output module, and the calculation unit, refer to the implementation of the determining unit 410.

[0164] As an example of a software functional unit, determining unit 410 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Further, the aforementioned computing instance may be one or more. For example, determining unit 410 may include code running on multiple hosts / virtual machines / containers. It should be noted that the multiple hosts / virtual machines / containers used to run the code may be distributed in the same region or in different regions. Further, the multiple hosts / virtual machines / containers used to run the code may be distributed in the same availability zone (AZ) or in different AZs, each AZ including one or more geographically proximate data centers. Typically, a region may include multiple AZs.

[0165] Similarly, multiple hosts / virtual machines / containers used to run this code can be distributed within the same Virtual Private Cloud (VPC) or across multiple VPCs. Typically, a VPC is set up within a region. Communication between two VPCs within the same region, as well as between VPCs in different regions, requires a communication gateway to be set up within each VPC to enable interconnection between VPCs.

[0166] As an example of a hardware functional unit, the determining unit 410 may include at least one computing device, such as a server. Alternatively, the determining unit 410 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be implemented using a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

[0167] The multiple computing devices included in the determining unit 410 can be distributed in the same region or in different regions. Similarly, the multiple computing devices included in the determining unit 410 can be distributed in the same Availability Zone (AZ) or in different AZs. Likewise, the multiple computing devices included in the determining unit 410 can be distributed in the same Virtual Private Cloud (VPC) or in multiple VPCs. These multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

[0168] Therefore, the units of the various examples described in the embodiments of this application can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0169] It should be noted that the device provided in the above embodiments is only illustrated by the division of the functional units described above when executing the above methods. In actual applications, the functions can be assigned to different functional units as needed, that is, the internal structure of the device can be divided into different functional units to complete all or part of the functions described above. For example, the determining unit 410 can be used to execute any step in the above methods, the retrieving unit 420 can be used to execute any step in the above methods, the generating unit 430 can be used to execute any step in the above methods, the output module can be used to execute any step in the above methods, and the calculation unit can be used to execute any step in the above methods. The steps implemented by the determining unit 410, the retrieving unit 420, the generating unit 430, the output module, and the calculation unit can be specified as needed. By implementing different steps in the above methods through the determining unit 410, the retrieving unit 420, the generating unit 430, the output module, and the calculation unit, all the functions of the above device can be realized.

[0170] Furthermore, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and their specific implementation process can be found in the method embodiments above, which will not be repeated here.

[0171] Figure 5 is a schematic deployment diagram of a data retrieval apparatus according to an embodiment of this application. As shown in Figure 5, the data retrieval apparatus can be abstracted into a cloud service by a cloud service provider on a cloud management platform and provided to users. After a user purchases the cloud service on the cloud management platform, the cloud environment uses the cloud service to provide the user with a cloud service for data retrieval.

[0173] For example, a tenant logs into the cloud management platform via a pre-registered account and password on the public cloud access page. After successfully logging in, the tenant selects and purchases a data retrieval cloud service on the cloud management platform. If the tenant purchases the data retrieval cloud service, the tenant can then utilize that cloud service to perform computations that provide data retrieval functionality.

[0174] For example, a cloud management platform is primarily used to manage the infrastructure for running cloud services for data retrieval. This infrastructure may include multiple data centers located in different regions, each containing multiple servers. Data centers can provide basic resources for the data retrieval cloud services, such as computing resources and storage resources. Therefore, when tenants purchase and use cloud services for data retrieval, they primarily pay for the resources they use.

[0175] As shown in Figure 5, taking a tenant's purchase of a data retrieval cloud service as an example, the user can upload the user's question to the cloud environment through an application programming interface (API) or a web interface provided by the cloud management platform. The data retrieval cloud service then calls the data retrieval device to obtain the data retrieval results, and the obtained data retrieval results are returned to the user's terminal device through the cloud environment.

[0176] The aforementioned terminal devices can be mobile phones, laptops, tablets, handheld computers, wireless terminals in smart cities, wireless devices in smart homes, etc.

[0177] In some embodiments, users can also upload configuration data via the configuration interface or API on the public cloud access page provided by the cloud management platform. This configuration data may include, but is not limited to, at least one of the following: the number of iterations, the external knowledge base used, etc.

[0178] In this embodiment of the application, when the above-mentioned data retrieval device is a software device, the device can be deployed on a computing device in any environment, or on a computing device cluster consisting of multiple computing devices in any environment.

[0179] The method provided in this application can be executed by a computing device, which can also be referred to as a computer system. It includes a hardware layer, an operating system layer running on top of the hardware layer, and an application layer running on the operating system layer. The hardware layer includes hardware such as processing units, memory, and memory control units; the functions and structure of this hardware will be described in detail later. The operating system can be any one or more computer operating systems that implement business processing through processes, such as Linux, Unix, Android, iOS, or Windows. The application layer includes applications such as browsers, address books, word processing software, and instant messaging software. Optionally, the computer system can be a handheld device such as a smartphone, or a terminal device such as a personal computer; this application does not particularly limit this, as long as the method provided in this application can be used. The executing entity of the method provided in this application can be a computing device, or a functional module within the computing device capable of calling and executing programs.

[0180] The following describes in detail a computing device provided in an embodiment of this application, with reference to Figure 6.

[0181] Figure 6 is a schematic diagram of the architecture of a computing device 1500 provided in an embodiment of this application. The computing device 1500 may be a server, a computer, or other device with computing capabilities. The computing device 1500 shown in Figure 6 includes at least one processor 1510 and a memory 1520.

[0182] It should be understood that this application does not limit the number of processors and memories in the computing device 1500.

[0183] The processor 1510 executes instructions in the memory 1520, causing the computing device 1500 to implement the method provided in this application. Alternatively, the processor 1510 executes instructions in the memory 1520, causing the computing device 1500 to implement the various functional modules provided in this application, thereby implementing the method provided in this application.

[0184] Optionally, the computing device 1500 also includes a communication interface 1530. The communication interface 1530 uses a transceiver module, such as, but not limited to, a network interface card or a transceiver, to enable communication between the computing device 1500 and other devices or communication networks.

[0185] Optionally, the computing device 1500 also includes a system bus 1540, wherein the processor 1510, memory 1520, and communication interface 1530 are respectively connected to the system bus 1540. The processor 1510 can access the memory 1520 through the system bus 1540; for example, the processor 1510 can read and write data or execute code in the memory 1520 through the system bus 1540. The system bus 1540 may be, for example, a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus. The system bus 1540 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in Figure 6, but this does not mean that there is only one bus or one type of bus.

[0186] In one possible implementation, the processor 1510 primarily functions to interpret the instructions (or code) of a computer program and process data within the computer software. The instructions of the computer program and the data within the computer software can be stored in memory 1520 or cache 1516.

[0187] Optionally, processor 1510 may be an integrated circuit chip with signal processing capabilities. By way of example and not limitation, processor 1510 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or the like. For example, processor 1510 may be a central processing unit (CPU).

[0188] Optionally, each processor 1510 includes at least one processing unit 1512 and a memory control unit 1514.

[0189] Optionally, the processing unit 1512, also known as the core, is the most important component of the processor. The processing unit 1512 is manufactured from single-crystal silicon using a specific production process. All computation, instruction reception, instruction storage, and data processing are performed by the core. Each processing unit independently executes program instructions, and multiple processing units running in parallel accelerate program execution. Each type of processing unit has a fixed logical structure; for example, a processing unit includes logical units such as a Level 1 cache, a Level 2 cache, an execution unit, an instruction-level unit, and a bus interface.

[0190] In one implementation example, the memory control unit 1514 controls the data interaction between the memory 1520 and the processing unit 1512. Specifically, the memory control unit 1514 receives memory access requests from the processing unit 1512 and controls access to memory based on the memory access requests. By way of example and not limitation, the memory control unit is a device such as a memory management unit (MMU).

[0191] In one implementation example, each memory control unit 1514 addresses the memory 1520 via the system bus. An arbiter (not shown in Figure 6) is configured on the system bus to handle and coordinate competing accesses from multiple processing units 1512.

[0192] In one implementation example, the processing unit 1512 and the memory control unit 1514 are connected via internal chip connection lines, such as address lines, thereby enabling communication between the processing unit 1512 and the memory control unit 1514.

[0193] Optionally, each processor 1510 also includes a cache 1516, which is a buffer used for data exchange. When the processing unit 1512 needs to read data, it first looks for the required data in the cache. If the data is found, it is used directly; otherwise, the processing unit looks for the data in memory. Because the cache operates much faster than memory, it helps the processing unit 1512 run faster.
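The lookup order described above (cache first, then memory) can be modeled in software as a toy sketch. The class name `ReadThroughCache`, the `capacity` parameter, and the first-in-first-out eviction rule are assumptions made for illustration and do not describe the hardware of cache 1516.

```python
class ReadThroughCache:
    """Toy model of the lookup order: cache first, then memory."""

    def __init__(self, memory: dict, capacity: int = 4) -> None:
        self.memory = memory      # stands in for memory 1520
        self.capacity = capacity  # assumed cache size, in entries
        self.lines = {}           # stands in for cache 1516
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        # Hit: the value is served from the cache without touching memory.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # Miss: fall back to memory, then keep a copy in the cache.
        self.misses += 1
        value = self.memory[address]
        if len(self.lines) >= self.capacity:
            # Evict the oldest inserted entry (FIFO); this policy is an
            # assumption, real hardware uses other replacement policies.
            oldest = next(iter(self.lines))
            del self.lines[oldest]
        self.lines[address] = value
        return value
```

Reading the same address twice records one miss followed by one hit, which is the behavior the paragraph relies on for its speed-up.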

[0194] The memory 1520 provides runtime space for processes in the computing device 1500. For example, the memory 1520 stores the computer program (specifically, the program code) used to generate the process. After the computer program is run by the processor to generate a process, the processor allocates corresponding storage space for the process in the memory 1520. The aforementioned storage space further includes a text segment, an initialized data segment, an uninitialized data segment, a stack segment, a heap segment, and so on. The memory 1520 stores data generated during the process's execution, such as intermediate data or process data, in the storage space allocated to that process.

[0195] Optionally, the memory, also known as RAM, is used to temporarily store the data processed by the processor 1510, as well as data exchanged with external storage devices such as hard disks. As long as the computer is running, the processor 1510 will load the data that needs to be processed into RAM for processing, and after the processing is completed, the processing unit 1512 will send the result out.

[0196] By way of example and not limitation, memory 1520 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM). It should be noted that the memory 1520 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.

[0197] The structure of the computing device 1500 listed above is merely illustrative and not limiting. The computing device 1500 in this application may include various hardware components found in existing computer systems. For example, the computing device 1500 may also include storage other than memory 1520, such as disk storage. Those skilled in the art should understand that the computing device 1500 may also include other devices necessary for normal operation. Furthermore, depending on specific needs, those skilled in the art should understand that the computing device 1500 may also include hardware devices for implementing other additional functions. In addition, those skilled in the art should understand that the computing device 1500 may include only the devices necessary for implementing the embodiments of this application, and not necessarily all the devices shown in Figure 6.

[0198] This application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server. In some embodiments, the computing device may also be a desktop computer, a laptop computer, or a smartphone, or other terminal device.

[0199] As shown in Figure 7, the computing device cluster includes at least one computing device 1500. The memory 1520 of one or more computing devices 1500 in the computing device cluster may store the same instructions for performing the above-described methods.

[0200] In some possible implementations, the memory 1520 of one or more computing devices 1500 in the computing device cluster may also each store a portion of the instructions for executing the above-described methods. In other words, a combination of one or more computing devices 1500 can jointly execute the instructions of the above-described methods.

[0201] It should be noted that the memory 1520 in different computing devices 1500 within the computing device cluster can store different instructions, each used to execute a portion of the functions of the aforementioned device. That is, the instructions stored in the memory 1520 of different computing devices 1500 can implement the functions of one or more modules within the aforementioned device.

[0202] In some possible implementations, one or more computing devices in a computing device cluster can be connected via a network. This network can be a wide area network (WAN) or a local area network (LAN), etc. Figure 8 illustrates one possible implementation. As shown in Figure 8, two computing devices, 1500A and 1500B, are connected via a network. Specifically, they are connected to the network through communication interfaces in each computing device.

[0203] It should be understood that the functions of computing device 1500A shown in Figure 8 can also be performed by multiple computing devices 1500. Similarly, the functions of computing device 1500B can also be performed by multiple computing devices 1500.

[0204] In this embodiment, a computer program product containing instructions is also provided. The computer program product may be a software or program product containing instructions capable of running on a computing device or stored on any usable medium. When run on a computing device, it causes the computing device to perform the methods provided above, or causes the computing device to perform the functions of the apparatus provided above.

[0205] In this embodiment, a computer-readable storage medium is also provided. This computer-readable storage medium can be any available medium that a computing device can access, or a data storage device, such as a data center, containing one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive). The computer-readable storage medium includes instructions that, when executed on a computing device, cause the computing device to perform the method described above.

[0206] It should be understood that in the various embodiments of this application, the sequence numbers of the above-mentioned processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of this application.

[0207] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0208] Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems, devices, and units described above correspond to the processes in the foregoing method embodiments, to which reference may be made, and are not repeated here.

[0209] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and in actual implementation there may be other division methods; multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

[0210] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0211] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0212] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0213] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A data retrieval method, characterized in that the method comprises: determining, based on a question input by a user, a plurality of first data to be retrieved; retrieving, from a data storage medium, a plurality of candidate source tables related to the plurality of first data, wherein each candidate source table includes a portion of the plurality of first data; determining a target source table combination from the plurality of candidate source tables based on the coverage of the plurality of first data by each candidate source table and the association relationship between the plurality of candidate source tables, wherein the target source table combination includes at least two source tables from the plurality of candidate source tables; generating a target wide table based on the target source table combination; and retrieving the values of the plurality of first data from the target wide table.

2. The method according to claim 1, characterized in that the question includes the plurality of first data to be retrieved, and the method further comprises: outputting, to the user, the search results corresponding to the question, wherein the search results include the values of the plurality of first data.

3. The method according to claim 1, characterized in that the question includes second data to be retrieved, and the determining, based on a question input by a user, a plurality of first data to be retrieved comprises: determining the plurality of first data to be retrieved according to the calculation formula of the second data, wherein the values of the plurality of first data are used to calculate the value of the second data.

4. The method according to claim 3, characterized in that, after the values of the plurality of first data are obtained, the method further comprises: calculating the value of the second data based on the values of the plurality of first data and the calculation formula of the second data; and outputting, to the user, the search results corresponding to the question, wherein the search results include the value of the second data.

5. The method according to any one of claims 2 to 4, characterized in that the method further comprises: inputting the question and the search results corresponding to the question into a large model, and using the large model to generate an answer to the question, wherein the input information of the large model includes the question and the search results corresponding to the question, and the output information of the large model includes the answer to the question; and outputting the answer to the question to the user.

6. The method according to any one of claims 1 to 5, characterized in that the method further comprises: outputting at least one of the following to the user: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

7. The method according to claim 6, characterized in that the method further comprises: receiving confirmation information from the user, the confirmation information being used to indicate that the user confirms at least one of the following: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

8. The method according to any one of claims 1 to 7, characterized in that the data storage medium includes any one or a combination of the following: a data lake, a data warehouse, a database, and a file system.

9. A data retrieval apparatus, characterized in that the apparatus comprises: a determining unit, configured to determine, based on a question input by a user, a plurality of first data to be retrieved; a retrieval unit, configured to retrieve, from a data storage medium, a plurality of candidate source tables related to the plurality of first data, wherein each candidate source table includes a portion of the plurality of first data; the determining unit being further configured to determine a target source table combination from the plurality of candidate source tables based on the coverage of the plurality of first data by each candidate source table and the association relationship between the plurality of candidate source tables, wherein the target source table combination includes at least two source tables from the plurality of candidate source tables; and a generation unit, configured to generate a target wide table based on the target source table combination; the retrieval unit being further configured to retrieve the values of the plurality of first data from the target wide table.

10. The apparatus according to claim 9, characterized in that the question includes the plurality of first data to be retrieved, and the apparatus further comprises: an output unit, configured to output, to the user, the search results corresponding to the question, wherein the search results include the values of the plurality of first data.

11. The apparatus according to claim 9, characterized in that the question includes second data to be retrieved, and the determining unit is specifically configured to: determine the plurality of first data to be retrieved according to the calculation formula of the second data, wherein the values of the plurality of first data are used to calculate the value of the second data.

12. The apparatus according to claim 11, characterized in that the apparatus further comprises: a calculation unit, configured to, after the values of the plurality of first data are obtained, calculate the value of the second data based on the values of the plurality of first data and the calculation formula of the second data; and an output unit, configured to output, to the user, the search results corresponding to the question, wherein the search results include the value of the second data.

13. The apparatus according to any one of claims 10 to 12, characterized in that the generation unit is further configured to input the question and the search results corresponding to the question into a large model, and use the large model to generate an answer to the question, wherein the input information of the large model includes the question and the search results corresponding to the question, and the output information of the large model includes the answer to the question; and the output unit is further configured to output the answer to the question to the user.

14. The apparatus according to any one of claims 9 to 13, characterized in that the output unit is further configured to output at least one of the following to the user: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

15. The apparatus according to claim 14, characterized in that the apparatus further comprises: an acquisition unit, configured to receive confirmation information from the user, the confirmation information being used to indicate that the user confirms at least one of the following: the plurality of first data to be retrieved, the second data to be retrieved, the plurality of candidate source tables, and the target wide table.

16. The apparatus according to any one of claims 9 to 15, characterized in that the data storage medium includes any one or a combination of the following: a data lake, a data warehouse, a database, and a file system.

17. A computing device cluster, characterized in that it comprises at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, to cause the computing device cluster to perform the method according to any one of claims 1 to 8.

18. A computer program product containing instructions, characterized in that, when the instructions are executed by a computing device cluster, the computing device cluster is caused to perform the method according to any one of claims 1 to 8.

19. A computer-readable storage medium, characterized in that it includes computer program instructions which, when executed by a computing device cluster, cause the computing device cluster to perform the method according to any one of claims 1 to 8.
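For illustration only, the flow recited in claims 1, 3 and 4 can be sketched in a few lines of Python. Everything below is an assumption made for the example rather than the claimed implementation: tables are modeled as lists of row dictionaries, the association relationship is modeled as a hypothetical `join_keys` mapping from a pair of table names to a shared key column, the combination is chosen by brute force as the smallest connected set of at least two candidate tables that covers every requested column, and the "second data" formula (profit = revenue - cost) is invented. The claims do not specify how coverage and association are weighed.

```python
from itertools import combinations


def find_candidates(tables, wanted):
    """Keep tables holding at least one wanted column (tables assumed non-empty)."""
    return {name: rows for name, rows in tables.items() if wanted & set(rows[0])}


def is_connected(names, join_keys):
    """Check that the tables form one connected group under the join relations."""
    names = list(names)
    seen, frontier = {names[0]}, [names[0]]
    while frontier:
        current = frontier.pop()
        for other in names:
            if other not in seen and frozenset((current, other)) in join_keys:
                seen.add(other)
                frontier.append(other)
    return len(seen) == len(names)


def choose_combination(candidates, wanted, join_keys):
    """Smallest connected combination (two or more tables) covering all wanted columns."""
    for size in range(2, len(candidates) + 1):
        for combo in combinations(sorted(candidates), size):
            covered = set().union(*(set(candidates[n][0]) for n in combo))
            if wanted <= covered and is_connected(combo, join_keys):
                return combo
    return None


def build_wide_table(tables, combo, join_keys):
    """Inner-join the chosen tables one by one on their declared key columns."""
    joined, wide = [combo[0]], [dict(row) for row in tables[combo[0]]]
    remaining = list(combo[1:])
    while remaining:
        # Pick a table that joins to one already merged into the wide table.
        nxt, partner = next((t, j) for t in remaining for j in joined
                            if frozenset((t, j)) in join_keys)
        key = join_keys[frozenset((nxt, partner))]
        wide = [{**left, **right} for left in wide for right in tables[nxt]
                if left[key] == right[key]]
        joined.append(nxt)
        remaining.remove(nxt)
    return wide


# Hypothetical source tables and join relation.
tables = {
    "orders": [{"order_id": 1, "revenue": 120.0}, {"order_id": 2, "revenue": 80.0}],
    "costs": [{"order_id": 1, "cost": 70.0}, {"order_id": 2, "cost": 50.0}],
    "regions": [{"region_id": 9, "region": "north"}],
}
join_keys = {frozenset(("orders", "costs")): "order_id"}
wanted = {"revenue", "cost"}  # the "first data" to be retrieved

candidates = find_candidates(tables, wanted)
combo = choose_combination(candidates, wanted, join_keys)
wide = build_wide_table(tables, combo, join_keys)
values = [{col: row[col] for col in sorted(wanted)} for row in wide]
# "Second data" computed from the first data with the assumed formula.
profit = sum(v["revenue"] - v["cost"] for v in values)
```

In this toy data neither `orders` nor `costs` alone covers both requested columns, so the sketch selects the pair, joins it on `order_id` into a two-row wide table, and derives the profit from the retrieved values. The exhaustive search is exponential in the number of candidate tables and is used here only for clarity.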