Data query method and electronic device

By employing different query strategies based on the type of data query task and utilizing knowledge bases and knowledge graphs for data querying, the problems of uneven document segmentation and insufficient generalization ability for complex tasks are solved, thereby improving the accuracy and efficiency of data querying.

CN122240658A · Pending · Publication Date: 2026-06-19 · ZHEJIANG DAHUA TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG DAHUA TECH CO LTD
Filing Date
2026-02-03
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Current data retrieval technologies, especially document knowledge retrieval methods, suffer from uneven document segmentation, incomplete knowledge fragmentation information, and insufficient generalization ability for complex tasks, which affect the accuracy of data queries.

Method used

By identifying the task type of the data query task, different query strategies are adopted, utilizing knowledge bases and knowledge graphs for data querying. For comparison analysis tasks, the task is broken down into multiple sub-tasks and matched in the knowledge base; for knowledge reasoning tasks, keyword information is obtained and analyzed in the knowledge graph.

Benefits of technology

It improves the accuracy and efficiency of data retrieval, reduces retrieval interference, and enhances the quality of document knowledge feature extraction.



Abstract

This invention discloses a data query method and electronic device, comprising: acquiring and determining the task type of a data query task; in response to the task type being a first task type, acquiring multiple subtasks in the data query task, wherein the first task type is a comparison analysis type; determining the target data of the data query task using the subtasks and a knowledge base, wherein the knowledge base is constructed by semantic clustering fragments, summary knowledge fragments, and related question knowledge fragments of text blocks in preset text materials; or in response to the task type being a second task type, acquiring keyword information in the data query task, wherein the second task type is a knowledge reasoning type; and determining the target data of the data query task using a knowledge graph, a knowledge base, and keyword information, wherein the knowledge graph is constructed by triple data of text blocks in preset text materials. In other words, this application can effectively reduce retrieval interference, improve the quality and efficiency of document knowledge feature extraction, and enhance the accuracy of data queries.

Description

Technical Field

[0001] This application relates to the technical field of data retrieval, and in particular to a data query method and electronic device.

Background Technology

[0002] Current data retrieval technologies, especially task-oriented retrieval methods for document knowledge, such as retrieval based on the question posed, enable further learning for data retrieval and significantly improve retrieval performance.

[0003] However, current technical solutions, while establishing knowledge bases, suffer from several problems. For instance, during knowledge base construction, the varying knowledge structures and complex content descriptions of different documents, such as textual descriptions of tabular data, special symbols, and mathematical formulas, often result in uneven document segmentation and incomplete information in the segmented knowledge fragments. Furthermore, for retrieval of more complex tasks, the current methods have limited generalization capabilities due to the diverse types of complex tasks, failing to accurately obtain the data required for data retrieval and thus affecting the accuracy of data queries.

Summary of the Invention

[0004] The main technical problem addressed by this application is to provide a data query method and electronic device that can effectively determine the task type of a data query task and then select a query strategy based on that task type, thereby improving the accuracy of data queries.

[0005] To address the aforementioned technical problems, this application provides a data query method comprising: acquiring a data query task and determining the task type of the data query task; in response to the task type of the data query task being a first task type, acquiring multiple subtasks in the data query task, wherein the first task type is a comparison analysis type; using the subtasks and a knowledge base to determine the target data of the data query task, wherein the knowledge base is constructed by semantic clustering fragments, summary knowledge fragments, and related question knowledge fragments of text blocks in a preset text material; or in response to the task type of the data query task being a second task type, acquiring keyword information in the data query task, wherein the second task type is a knowledge reasoning type; and using a knowledge graph, the knowledge base, and the keyword information to determine the target data of the data query task, wherein the knowledge graph is constructed by triple data of text blocks in the preset text material.

[0006] In some embodiments, in response to the data query task being a first task type, obtaining multiple subtasks in the data query task includes: in response to the data query task being a first task type, obtaining multiple main keywords in the data query task; using the main keywords to split the data query task into multiple statements, and using one of the statements as a subtask, wherein each statement contains at least one of the main keywords.

[0007] In some embodiments, determining the target data for the data query task using the subtasks and the knowledge base includes: obtaining semantic clustering segments, summary knowledge segments, and related question knowledge segments of text blocks in preset text materials, and constructing the knowledge base; using the subtasks to perform matching in the knowledge base, performing knowledge reasoning on the successfully matched data, and using the knowledge reasoning results as the target data for the data query task.

[0008] In some embodiments, obtaining semantic clustering segments, summary knowledge segments, and related question knowledge segments of text blocks in a preset text material, and constructing the knowledge base, includes: segmenting the preset text material into text blocks to obtain multiple text blocks; splitting each text block into sentences and clustering adjacent sentences based on semantic similarity to obtain semantic clustering segments; in response to a text block length exceeding a segment length threshold, extracting knowledge features from the text block to obtain corresponding summary knowledge segments, and generating related question knowledge from the text block to obtain corresponding related question knowledge segments; and constructing the knowledge base using the semantic clustering segments, the summary knowledge segments, and the related question knowledge segments.

[0009] In some embodiments, the step of splitting each text block into sentences and clustering adjacent sentences based on semantic similarity to obtain semantic clustering fragments includes: splitting each text block into sentences to obtain multiple sentences, obtaining the semantic similarity between each current sentence and adjacent sentences, and obtaining the sentence length after merging the current sentence and adjacent sentences; in response to the semantic similarity being greater than a preset clustering threshold, merging the current sentence and adjacent sentences into a semantic clustering fragment when the sentence length is less than a fragment length threshold, or when the sentence length is greater than the fragment length threshold, taking the current sentence greater than the minimum fragment length threshold as a semantic clustering fragment, and merging the current sentence and adjacent sentences less than the minimum fragment length threshold into a semantic clustering fragment; or in response to the semantic similarity being less than the preset clustering threshold, taking the current sentence as a semantic clustering fragment when the sentence length is greater than the fragment length threshold.

[0010] In some embodiments, the step of extracting knowledge features from the text block to obtain corresponding summary knowledge fragments and generating related question knowledge from the text block to obtain corresponding related question knowledge fragments in response to the text block length being greater than a fragment length threshold includes: obtaining the text block length of the text block; obtaining a summary text of the text block using a knowledge feature extraction method in response to the text block length being greater than a fragment length threshold; obtaining the semantic length between the summary text and the text block, wherein the semantic length is determined using semantic similarity; using the summary text as the summary knowledge fragment in response to the semantic length between the summary text and the text block being within a threshold range, or using the content of the text block as the summary knowledge fragment in response to the semantic length between the summary text and the text block being outside a threshold range; obtaining related question knowledge of the text block using a knowledge feature extraction method, and determining the related question knowledge fragments using the related question knowledge.

[0011] In some embodiments, obtaining the semantic length between the summary text and the text block includes: obtaining a first sentence length of the summary text and a second sentence length of the text block, and obtaining a first semantic similarity between the summary text and the text block; determining a first value using the first sentence length, the second sentence length, and the first semantic similarity, determining a second value using the first sentence length and the second sentence length, and determining the semantic length using the first value and the second value; obtaining relevant question knowledge of the text block using a knowledge feature extraction method, and determining the relevant question knowledge fragments using the relevant question knowledge, including: obtaining a first text length of the text block, and obtaining a first number of semantic clustering fragments in the text block and a second text length of the semantic clustering fragments; determining a third value using a first constant coefficient and the first number, and determining a fourth value using the first text length and the second text length, and then determining the text semantic length of the text block for reverse generation of relevant questions using the third value and the fourth value; determining the number of relevant question knowledge generated by the text block using the text semantic length, the first text length of the text block, the fragment length threshold, and the second constant coefficient, and then determining the relevant question knowledge fragments.

[0012] In some embodiments, determining the target data for the data query task using the knowledge graph, the knowledge base, and the keyword information includes: obtaining document information, subtopic information, and topic keyword information of text blocks in the preset text material, constructing triplet data, and constructing the knowledge graph using the triplet data; obtaining relevant target triplets from the knowledge graph using the keyword information; matching the target triplets in the knowledge base, performing knowledge reasoning on the successfully matched data, and using the knowledge reasoning result as the target data for the data query task.

[0013] In some embodiments, obtaining document information, subtopic information, and topic keyword information of text blocks in the preset text material, constructing triplet data, and using the triplet data to construct the knowledge graph includes: obtaining the document title, chapter, text block number, subtopic, subtopic semantic text, text keywords, and core keyword text of the text block; constructing document information triplets using the document title, the chapter, and the text block number; constructing text block subtopic triplets using the text block number, the subtopic, and the subtopic semantic text; constructing keyword triplets using the subtopic semantic text, the text keywords, and the core keyword text; determining the triplet data using the document information triplets, the text block subtopic triplets, and the keyword triplets; and constructing the document topic knowledge graph using the triplet data.

[0014] To solve the above-mentioned technical problems, another technical solution adopted in this application is to provide an electronic device, the electronic device including a memory and a processor coupled to the memory, the memory storing at least one computer program, which, when loaded and executed by the processor, is used to implement the method as described above.

[0015] Unlike current technologies, the data query method provided in this application includes: acquiring a data query task and determining the task type of the data query task; in response to the task type of the data query task being a first task type, acquiring multiple subtasks in the data query task, wherein the first task type is a comparison analysis type; using the subtasks and a knowledge base to determine the target data of the data query task, wherein the knowledge base is constructed by semantic clustering fragments, summary knowledge fragments, and related question knowledge fragments of text blocks in a preset text material; or in response to the task type of the data query task being a second task type, acquiring keyword information in the data query task, wherein the second task type is a knowledge reasoning type; using a knowledge graph, the knowledge base, and the keyword information to determine the target data of the data query task, wherein the knowledge graph is constructed by triple data of text blocks in a preset text material. In this application, by determining the task type of the data query task, and then conducting strategy analysis based on the task type, the data query task is broken down into multiple sub-tasks when the task type is a comparative analysis type. The target data is then determined from the knowledge base based on the sub-tasks. If the task type is a knowledge reasoning type, keyword information in the query task is obtained. Then, the target data is determined from the knowledge base using the keyword information and knowledge graph. This effectively reduces retrieval interference, improves the quality and efficiency of document knowledge feature extraction, and enhances the accuracy of data query.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. Wherein:

Figure 1 is a flowchart illustrating an embodiment of the data query method in this application;
Figure 2 is a flowchart illustrating an embodiment of the overall principle in this application;
Figure 3 is a flowchart illustrating one embodiment of the query principle in this application;
Figure 4 is a schematic diagram of the structure of an embodiment of the electronic device in this application;
Figure 5 is a schematic diagram of an embodiment of a computer-readable storage medium in this application.

Detailed Implementation

[0017] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be particularly noted that the following embodiments are for illustrative purposes only and do not limit the scope of the invention. Similarly, the following embodiments are only some, not all, embodiments of the present invention, and all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0018] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0019] Current data query methods, especially knowledge-based data retrieval methods, suffer from several problems. For example, generating question-level knowledge from sentence-level knowledge using large language models incurs high computational costs, latency, and redundancy in question knowledge data. In addition, document segmentation and granularity selection rely on heuristic rules and lack adaptability; referential relation extraction depends on high-precision NLP tools, leading to severe error propagation; native reference relations rely on explicit annotation, which makes effective coverage difficult to ensure; and the failure to consider the discrepancy between user queries and document knowledge representation results in low semantic matching. Furthermore, over-reliance on pre-trained planning models limits generalization ability; topic integration and sentence filtering depend on high-quality text fragments; and information extraction models are trained independently, making end-to-end optimization difficult. All of these factors negatively impact the accuracy of data queries.

[0020] Therefore, a data query method, electronic device, and storage medium are provided that can effectively determine the task type of the data query task in the data query process, and then determine different query strategies based on the task type, thereby improving the accuracy of data query.

[0021] Please refer to Figure 1, which is a flowchart illustrating an embodiment of the data query method in this application. It should be noted that, provided substantially the same result is obtained, the method of this application is not limited to the sequence of steps shown in Figure 1.

[0022] As shown in Figure 1, the data query method of this application may include the following operations.

[0023] S10. Obtain the data query task and determine the task type of the data query task.

[0024] Here, a data query task refers to the task corresponding to the instruction to query data, such as voice data, text data, image data, tabular data, or a collection of multiple types of data. Task type refers to the category of the data query task; for example, tasks requiring comparative analysis to determine the data are defined as comparative analysis types, and tasks requiring knowledge reasoning to determine the data are defined as knowledge reasoning types.

[0025] Specifically, the data can be input manually or by machine. The input data is then interpreted to determine the data query task. After obtaining the data query task, a pre-defined task classification model is used to determine the task type of the data query task.

[0026] S20. In response to the data query task being of the first task type, obtain multiple sub-tasks in the data query task, wherein the first task type is the comparison analysis type.

[0027] Here, subtasks are the tasks obtained after the data query task is broken down. The comparison analysis type means that the data query task involves multiple entities or knowledge from multiple documents and requires comparing and analyzing multiple knowledge points, so the task needs to be broken down into multiple subtasks.

[0028] Specifically, after determining that the data query task is the first task type, the data query task is split into multiple sub-tasks.

[0029] S30. Using the subtasks and a knowledge base, determine the target data for the data query task. The knowledge base is constructed from semantic clustering fragments, summary knowledge fragments, and related question knowledge fragments of text blocks in the preset text material.

[0030] Here, the knowledge base refers to a database that contains all types of knowledge, and the target data refers to the data retrieved using this solution for a data query task.

[0031] Specifically, after obtaining multiple subtasks of the data query task, the subtasks are queried in the knowledge base to determine the query data for each subtask, and then all the query data are integrated to obtain the target data of the data query task.

[0032] S40. Alternatively, in response to the task type of the data query task being the second task type, obtain keyword information in the data query task, wherein the second task type is the knowledge reasoning type.

[0033] Here, keyword information refers to the keywords obtained in the data query task, which can be one or more; knowledge reasoning type refers to the data query task that requires analysis and reasoning of the question content based on a large model or contextual information in order to obtain the core content of the data query task.

[0034] Specifically, after determining that the data query task is the second task type, keyword analysis and acquisition are performed on the data query task to obtain the corresponding keyword information.

[0035] S50. Using knowledge graphs, knowledge bases, and keyword information, determine the target data for the data query task. The knowledge graph is constructed from the triple data of text blocks in the preset text materials.

[0036] Here, the knowledge graph refers to a relational graph of triple data constructed from multiple different knowledge relationships. The triple data can consist of one type of triple or multiple types of triples.

[0037] Specifically, after obtaining the keyword information for the data query task, the keyword information is used to perform comparison, analysis, and matching operations in the knowledge graph and knowledge base to determine the target data for the data query task.
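The keyword lookup in the knowledge graph can be pictured with a minimal sketch. It assumes the three triple kinds named in paragraph [0013] (document information, text block subtopic, and keyword triples) and represents the graph as a plain list of 3-tuples. The field names (`title`, `block_id`, and so on), the function names, and the substring match are illustrative assumptions, not details taken from the application.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_triples(block: Dict[str, str]) -> List[Triple]:
    """Build the three triple kinds of [0013] for one text block.

    The dictionary keys are hypothetical names chosen for this sketch.
    """
    return [
        # document information triple
        (block["title"], block["chapter"], block["block_id"]),
        # text block subtopic triple
        (block["block_id"], block["subtopic"], block["subtopic_text"]),
        # keyword triple
        (block["subtopic_text"], block["keywords"], block["core_keywords"]),
    ]


def find_target_triples(graph: List[Triple], keywords: List[str]) -> List[Triple]:
    """Return triples in which any element contains any query keyword.

    Case-insensitive substring matching is an assumption of this sketch.
    """
    lowered = [k.lower() for k in keywords]
    return [
        triple
        for triple in graph
        if any(k in element.lower() for k in lowered for element in triple)
    ]
```

The returned target triples would then be matched against the knowledge base, as described in the paragraph above.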

[0038] It will be understood that task types can include the complex first and second task types as well as a simple third task type. A task of the third task type can be queried directly in the knowledge base to obtain the corresponding target data.

[0039] In this embodiment, by determining the task type of the data query task, and then performing strategy analysis based on the task type, the data query task is divided into multiple sub-tasks when the task type is the first type of comparative analysis. Then, the target data is determined from the knowledge base based on the sub-tasks. If the task type is the second type of knowledge reasoning, the keyword information in the query task is obtained. Then, the target data is determined from the knowledge base using the keyword information and knowledge graph. This effectively reduces retrieval interference, improves the quality and efficiency of document knowledge feature extraction, and enhances the accuracy of data query.
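As an illustration of the strategy selection in steps S10 to S50, the following minimal sketch routes a query to a list of steps according to its task type. The `classify_task` function is a hypothetical keyword heuristic standing in for the pre-defined task classification model, and the step descriptions are labels only; none of these names come from the application.

```python
from enum import Enum
from typing import Dict, List


class TaskType(Enum):
    COMPARISON_ANALYSIS = "first"   # first task type
    KNOWLEDGE_REASONING = "second"  # second task type
    GENERAL = "third"               # simple tasks answered directly from the knowledge base


def classify_task(query: str) -> TaskType:
    """Hypothetical stand-in for the task classification model (S10)."""
    lowered = query.lower()
    if " and " in lowered or "compare" in lowered:
        return TaskType.COMPARISON_ANALYSIS
    if lowered.startswith("why") or "reason" in lowered:
        return TaskType.KNOWLEDGE_REASONING
    return TaskType.GENERAL


def plan_query(query: str) -> List[str]:
    """Return the ordered steps that would run for this query."""
    strategies: Dict[TaskType, List[str]] = {
        TaskType.COMPARISON_ANALYSIS: [
            "split into subtasks",               # S20
            "match subtasks in knowledge base",  # S30
        ],
        TaskType.KNOWLEDGE_REASONING: [
            "extract keywords",                    # S40
            "look up triples in knowledge graph",  # S50
            "match triples in knowledge base",     # S50
        ],
        TaskType.GENERAL: ["match query in knowledge base"],
    }
    return strategies[classify_task(query)]
```

Keeping the classifier separate from the strategy table means a fine-tuned model could replace the heuristic without touching the routing.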

[0040] In some embodiments, obtaining a data query task and determining the task type of the data query task may also include the following.

[0041] Data query tasks can be categorized into general tasks and complex tasks using preset classification methods, such as manually setting classification methods; setting example datasets for classification; and fine-tuning the classification model using example datasets and instruction text to obtain a task classification model. The data query tasks can then be classified based on this model. General tasks refer to tasks that can directly retrieve target data from the knowledge base, such as the data query task of the third task type. Complex tasks refer to tasks that require classification before determining the target data, such as the first and second task types.

[0042] Further, please refer to Figure 2, which is a flowchart illustrating an embodiment of the overall principle in this application.

[0043] As shown in Figure 2, the overall flow is divided into two parts. One part is the construction of the knowledge base and the knowledge graph. First, text materials are acquired, and then the text materials are segmented into document blocks to obtain multiple text blocks. Document knowledge extraction is performed on the text blocks to obtain semantic clustering fragments, summary knowledge fragments, and related question knowledge fragments. These fragments are then used to construct the knowledge base. Next, triplet data of the text blocks is acquired, and the triplet data is used to construct the knowledge graph. Then, data querying is performed to determine the task type. If it is a complex task, it is divided into comparative analysis tasks and knowledge reasoning tasks through task planning.

[0044] The other part involves executing data query tasks. When the data query task is a comparative analysis task, it is broken down into multiple subtasks. These subtasks are then used to perform matching in the knowledge base. The top-n matching data are used for knowledge inference, and the inference results are used as the target data for the data query task. When the data query task is a knowledge inference task, keywords are extracted according to task specifications. Based on the keyword information, the corresponding target triples are obtained from the knowledge graph. The target triples are then matched in the knowledge base. The top-n matching data are used for knowledge inference, and the inference results are used as the target data for the data query task. If the data query task is determined to be a general task, matching is performed directly in the knowledge base. The top-n matching data are used for knowledge inference, and the inference results are used as the target data for the data query task.
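The top-n matching step shared by all three branches can be sketched as follows. The application does not specify the matching score, so this sketch assumes a token-level Jaccard overlap as a simple stand-in for whatever semantic matching is actually used; `top_n_matches` and its parameters are hypothetical names.

```python
import heapq
import re
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap, an assumed stand-in for semantic matching."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def top_n_matches(query: str, fragments: List[str], n: int = 3) -> List[Tuple[float, str]]:
    """Return up to n knowledge-base fragments with the highest non-zero score."""
    scored = [(jaccard(query, fragment), fragment) for fragment in fragments]
    matched = [pair for pair in scored if pair[0] > 0.0]
    return heapq.nlargest(n, matched, key=lambda pair: pair[0])
```

The fragments returned here would be passed on to the knowledge inference step, which this sketch does not model.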

[0045] In some embodiments, in response to the data query task being of a first task type, obtaining multiple subtasks in the data query task may include the following.

[0046] In response to the task type of the data query task being the first task type, multiple main keywords in the data query task are obtained.

[0047] The data query task is split into multiple statements using the main keywords, and each statement is treated as a subtask. Each statement contains at least one main keyword.

[0048] Here, the main keywords refer to words or phrases that involve multiple similar entities or knowledge from multiple documents.

[0049] Specifically, after determining that the task type of the data query task is the first task type, the main keywords of the data query task are extracted to obtain multiple main keywords in the data query task; then, the text of the data query task is split into sentences based on the main keywords to obtain multiple sentences, each of which contains at least one main keyword, and each sentence is taken as a sub-task, thereby obtaining multiple sub-tasks of the data query task.

[0050] For example, a data query task might be: How many times did Sun Wukong and the Black Bear Spirit meet in *Journey to the West*? This can be broken down into two sub-tasks: First, where did Sun Wukong appear in *Journey to the West*? Second, where did the Black Bear Spirit appear in *Journey to the West*? The main keywords are *Journey to the West*, Sun Wukong, and the Black Bear Spirit.
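A simple clause-splitting reading of paragraphs [0047] and [0049] is sketched below: the query is cut at clause delimiters, and only clauses containing at least one main keyword are kept as subtasks. The delimiter set and the function name are assumptions of this sketch. Rephrasing a query into new sub-questions, as in the example above, would presumably need a language model and is not attempted here.

```python
import re
from typing import List


def split_into_subtasks(query: str, main_keywords: List[str]) -> List[str]:
    """Split a query into clauses and keep those that mention a main keyword.

    Splitting on ';', ',', '?' and the word 'and' is an assumption made for
    illustration only.
    """
    clauses = [clause.strip() for clause in re.split(r"[;,?]|\band\b", query)]
    return [
        clause
        for clause in clauses
        if clause and any(keyword.lower() in clause.lower() for keyword in main_keywords)
    ]


subtasks = split_into_subtasks(
    "What is the range of Model A, and what is the range of Model B?",
    ["Model A", "Model B"],
)
# subtasks == ["What is the range of Model A", "what is the range of Model B"]
```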

[0051] In some embodiments, using subtasks and a knowledge base to determine the target data for a data query task may include the following.

[0052] Obtain semantic clustering segments, summary knowledge segments, and related question knowledge segments from text blocks in the preset text materials, and construct a knowledge base.

[0053] Subtasks are used to perform matching in the knowledge base. Successful matching data is used for knowledge reasoning, and the results of knowledge reasoning are used as the target data for data query tasks.

[0054] Here, the preset text material refers to various types of knowledge text material, which can include existing knowledge from various fields. The preset text material can be divided into multiple text blocks, and a text block can be a paragraph. A semantic clustering segment is a segment obtained by clustering and merging sentences based on semantic similarity. A summary knowledge segment is a segment containing the knowledge summary generated from the content of a text block. A related question knowledge segment is a segment containing the collection of related question knowledge for a text block.

[0055] Specifically, the preset text materials are divided into multiple text blocks. Semantic similarity processing is then performed on the statements within each text block to obtain semantic clustering segments. Furthermore, knowledge summarization is performed on each text block to obtain summary knowledge segments containing knowledge summaries. Additionally, related question knowledge is acquired from each text block to obtain related question knowledge segments. A knowledge base is then constructed using semantic clustering segments, summary knowledge segments, and related question knowledge segments. Multiple subtasks of the data query task are matched against the knowledge base to obtain multiple successfully matched data. Knowledge reasoning is then performed on these multiple data to obtain the final knowledge reasoning result, which serves as the target data for the data query task.

[0056] Furthermore, if the chapter structure contains multiple paragraphs and the paragraph length is less than the text block threshold, the multiple paragraphs can be merged until the length of the merged paragraph is greater than the text block threshold, thereby obtaining the corresponding text block.
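The paragraph-merging rule can be sketched in a few lines. Lengths are assumed to be character counts, paragraphs are joined with a newline, and any trailing paragraphs that never reach the threshold are emitted as a final, shorter block; that last choice is an assumption, since the text does not say how leftovers are handled.

```python
from typing import List


def merge_short_paragraphs(paragraphs: List[str], block_threshold: int) -> List[str]:
    """Merge consecutive paragraphs until each block is longer than the threshold."""
    blocks: List[str] = []
    buffer = ""
    for paragraph in paragraphs:
        buffer = f"{buffer}\n{paragraph}" if buffer else paragraph
        if len(buffer) > block_threshold:
            blocks.append(buffer)
            buffer = ""
    if buffer:
        # Leftover paragraphs that never exceeded the threshold (assumed behavior).
        blocks.append(buffer)
    return blocks
```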

[0057] In some embodiments, obtaining semantic clustering segments, summary knowledge segments, and related question knowledge segments of text blocks in preset text materials, and constructing the knowledge base, may include the following:

[0058] The preset text material is segmented into text blocks to obtain multiple text blocks.

[0059] Each text block is split into sentences, and adjacent sentences are clustered based on semantic similarity to obtain semantic clustering segments.

[0060] In response to a text block length exceeding a segment length threshold, knowledge features are extracted from the text block to obtain corresponding summary knowledge segments, and related question knowledge is generated from the text block to obtain corresponding related question knowledge segments.

[0061] A knowledge base is constructed using the semantic clustering segments, the summary knowledge segments, and the related question knowledge segments.

[0062] Here, a text block refers to a segment obtained by dividing text content, such as dividing it into paragraphs, with each paragraph being a text block; a sentence refers to the content divided by punctuation marks, such as "\n\n", ".", "!", "?", ";", "space", etc., which are used as splitting symbols to obtain multiple corresponding sentences; the text block length refers to the number of characters in the text block; the segment length threshold is set according to the requirements, and can be 50-200 characters, such as 80, 100, 150, 180 characters, etc.

[0063] Specifically, after the preset text material is acquired, it is divided into paragraphs according to its chapter structure, yielding multiple text blocks. Each text block is then split at the splitting symbols, with the content between two splitting symbols forming a sentence. The semantic similarity between the current sentence and its adjacent sentences is obtained, and adjacent sentences are clustered according to that similarity to obtain semantic clustering segments. The length of each text block is also obtained. If the length of a text block exceeds the set segment length threshold, the text block is input into a large model to extract the corresponding knowledge features and generate summary text, yielding the corresponding summary knowledge segment. Finally, related question knowledge is generated from the text block, yielding the corresponding related question knowledge segments.
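The sentence-splitting step can be sketched as follows. This is only an illustration: the function name `split_sentences` and the default delimiter list are assumptions, and the space character, which the application also lists as a possible splitting symbol, is left out of the default here so that the example yields sentences rather than single words.

```python
import re

# Splitting symbols taken from the description; the space character is omitted by default (assumption).
DEFAULT_DELIMITERS = ["\n\n", ".", "!", "?", ";"]


def split_sentences(text_block, delimiters=DEFAULT_DELIMITERS):
    """Split a text block at the given delimiters and drop empty pieces."""
    pattern = "|".join(re.escape(d) for d in delimiters)
    parts = re.split(pattern, text_block)
    return [p.strip() for p in parts if p.strip()]


sentences = split_sentences(
    "The platform fuses images. It supports retrieval; it links archives!\n\nDone?"
)
```

Passing a different `delimiters` list, for example one that includes `" "`, changes the granularity of the resulting sentences.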

[0064] Furthermore, each text block is split into sentences, and adjacent sentences are clustered based on semantic similarity to obtain semantic clustering segments, which may include the following:

[0065] Each text block is split into multiple sentences, and the semantic similarity between each current sentence and its adjacent sentences is obtained. The length of the combined current sentence and its adjacent sentences is also obtained.

[0066] In response to the semantic similarity being greater than a preset clustering threshold: when the merged sentence length is less than the segment length threshold, the current sentence and the adjacent sentence are merged into a semantic clustering segment; when the merged sentence length is greater than the segment length threshold, a current sentence longer than the minimum segment length threshold is used on its own as a semantic clustering segment, whereas a current sentence shorter than the minimum segment length threshold is merged with the adjacent sentence into a semantic clustering segment.

[0067] Alternatively, in response to the semantic similarity being less than the preset clustering threshold, if the sentence length is greater than the segment length threshold, the current sentence is treated as a semantic clustering segment.

[0068] The semantic similarity is calculated as the cosine similarity between the vectors of two adjacent sentences. The steps are as follows: first, identify and preprocess the adjacent sentences; second, generate vector representations of the two sentences; and third, calculate the cosine similarity of the two vectors, that is, their dot product divided by the product of their norms. The sentence length refers to the number of characters contained in the sentence.
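The three steps above can be sketched in a few lines. The application does not name the model that produces sentence vectors, so the `embed` function below is a toy bag-of-words stand-in and purely an assumption; only the cosine formula itself (dot product divided by the product of the vector norms) is standard.

```python
import math
from collections import Counter


def embed(sentence):
    # Toy bag-of-words vector; a stand-in for a real sentence embedding model (assumption).
    return Counter(sentence.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse vectors stored as term -> weight mappings."""
    dot = sum(vec_a[k] * vec_b[k] for k in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


sim_same = cosine_similarity(embed("video data platform"), embed("video data platform"))
sim_none = cosine_similarity(embed("video data"), embed("parking lot"))
```

Identical sentences score approximately 1 and sentences with no shared terms score 0; a real embedding model would give graded values in between for paraphrases.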

[0069] Specifically, the text block content is split into multiple sentences at the splitting symbols, and the semantic similarity is obtained by calculating the vector cosine similarity between adjacent sentences. The similarity scores of the adjacent sentence pairs are then sorted in descending order. Adjacent sentences with high semantic similarity are merged to obtain the current segment, and the sentence length of the merged current segment is obtained. The semantic similarity at the sorted position corresponding to the size of the current segments obtained after splitting is then selected as the preset clustering threshold.

[0070] After the semantic similarity between the current sentence and its adjacent sentence is obtained, if the semantic similarity is greater than the preset clustering threshold and the merged sentence length is less than the segment length threshold, the current sentence and the adjacent sentence are merged to obtain a semantic clustering segment. If the merged sentence length is greater than the segment length threshold, it is further determined whether the length of the current sentence is greater than the minimum segment length threshold. If it is greater, the current sentence is added to the segment list as a semantic clustering segment; if it is less, the merged sentence is added to the segment list as a semantic clustering segment.

[0071] It should be understood that the segment length threshold can be obtained by dividing the total length of the text block by the number of segments and can therefore be a dynamic value; the minimum segment length threshold is an empirical value set according to the actual situation, such as 5, 10, or 20 characters.

[0072] In addition, when the semantic similarity between the current sentence and its adjacent sentence is less than the preset clustering threshold, if the length of the corresponding sentence is greater than the minimum segment length threshold, the current sentence is added to the segment list as a semantic clustering segment; if it is less than the minimum segment length threshold, the current sentence is skipped and clustering and merging proceed with the next sentence.
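The merging rules described above can be read as a single greedy pass over the sentences of a text block. The sketch below is one possible interpretation, not the applicant's implementation: the function names, the word-overlap similarity used in place of vector cosine similarity, and the choice to carry a too-short "skipped" sentence forward into the next sentence (rather than discard it) are all assumptions.

```python
def word_overlap(a, b):
    # Stand-in similarity (Jaccard overlap of words); the application uses vector cosine similarity.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def cluster_sentences(sentences, similarity, cluster_threshold, segment_len, min_segment_len):
    """Greedy pass that groups adjacent sentences into semantic clustering segments."""
    segments = []
    current = None
    for sentence in sentences:
        if current is None:
            current = sentence
            continue
        merged = f"{current} {sentence}"
        if similarity(current, sentence) > cluster_threshold:
            if len(merged) < segment_len:
                current = merged              # similar and still short: keep growing
            elif len(current) > min_segment_len:
                segments.append(current)      # merged text too long: emit current alone
                current = sentence
            else:
                segments.append(merged)       # current too short to stand alone: emit merged
                current = None
        elif len(current) > min_segment_len:
            segments.append(current)          # dissimilar neighbour: close the segment
            current = sentence
        else:
            current = merged                  # assumption: carry a too-short sentence forward
    if current is not None:
        segments.append(current)
    return segments


segments = cluster_sentences(
    [
        "the platform fuses image data",
        "the platform fuses video data",
        "parking lot managers get clues",
    ],
    similarity=word_overlap,
    cluster_threshold=0.5,
    segment_len=80,
    min_segment_len=10,
)
```

With these illustrative thresholds the first two sentences are merged into one segment and the third, which shares no words with them, forms its own segment.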

[0073] It should be understood that if the length of a text block is less than or equal to the segment length threshold, the text block is retained.

[0074] Furthermore, in response to the text block length exceeding the segment length threshold, performing knowledge feature extraction on the text block to obtain the corresponding summary knowledge segment, and generating related question knowledge from the text block to obtain the corresponding related question knowledge segments, may include the following: obtaining the length of the text block.

[0075] In response to a text block length exceeding a segment length threshold, a summary text of the text block is obtained using knowledge feature extraction.

[0076] Obtain the semantic length of the summary text and text blocks, where the semantic length is determined using semantic similarity.

[0077] If the semantic length of the summary text and the text block is within the threshold range, the summary text is used as the summary knowledge segment; if the semantic length of the summary text and the text block is outside the threshold range, the content of the text block is used as the summary knowledge segment.

[0078] Related question knowledge of the text block is obtained using knowledge feature extraction, and the related question knowledge segments are determined from this related question knowledge.

[0079] Here, the text block length refers to the length of the content contained in the text block, such as the number of characters. Knowledge feature extraction can use a knowledge feature extraction model. Summary text refers to a summary of the content of the text block. Related question knowledge refers to questions such as what the object is, what attributes it has, how to process it, what it can be used for, what problems it may encounter, and how to solve them.

[0080] Specifically, the length of the text block is obtained first and compared with the segment length threshold. If the text block length is less than the segment length threshold, the text block is retained as is. If the text block length exceeds the threshold, knowledge feature extraction is performed on the text block to obtain a summary text. The semantic length between the summary text and the text block is then obtained; this semantic length is determined through semantic similarity. If the semantic length is within the threshold range, the summary text is stored in the summary feature attribute of the text block as its final summary text. If it is outside the threshold range, the original content of the text block replaces the summary text, that is, the original content is used as the summary text, to avoid content redundancy.

[0081] It should be understood that knowledge feature extraction can employ a knowledge feature extraction model: the text block and its corresponding title are input into a large model together with examples and instructions to generate a summary description of the content, yielding the summary text. The threshold range refers to the range of semantic cosine similarity between the summary text and the text block and may be 0.7-0.9. When the semantic length exceeds the maximum value, extraction and compression of the text block content have been ineffective, which essentially means that the summary and the text block describe the content identically.
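The decision between keeping the generated summary and falling back to the original text block can be sketched as below. The semantic length is taken as a precomputed input, because this sketch makes no assumption about how it is derived; the function name, the inclusive treatment of the range boundaries, and the use of `None` for a block that is too short to be summarized are illustrative assumptions. The default range of 0.7-0.9 is the example given in the description.

```python
from typing import Optional


def select_summary_segment(
    block_text: str,
    summary_text: str,
    semantic_length: float,
    segment_len_threshold: int,
    low: float = 0.7,
    high: float = 0.9,
) -> Optional[str]:
    """Return the text to store as the summary knowledge segment, or None for short blocks."""
    if len(block_text) <= segment_len_threshold:
        return None  # short block: retained as is, no summary segment is produced
    if low <= semantic_length <= high:
        return summary_text  # summary is close enough to the block yet still compresses it
    return block_text  # outside the range: fall back to the original block content


long_block = "x" * 120
kept_summary = select_summary_segment(long_block, "short summary", 0.8, 100)
fallback = select_summary_segment(long_block, "short summary", 0.95, 100)
```

Here a semantic length of 0.8 keeps the summary, while 0.95 falls outside the range and the original block content is used instead.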

[0082] Furthermore, obtaining the semantic length of the summary text and text blocks can include the following.

[0083] Obtain the first sentence length of the summary text and the second sentence length of the text block, as well as the first semantic similarity between the summary text and the text block.

[0084] A first numerical value is determined using the length of the first statement, the length of the second statement, and the first semantic similarity. A second numerical value is determined using the length of the first statement and the length of the second statement. The semantic length is determined using the first numerical value and the second numerical value.

[0085] Here, the first sentence length refers to the sentence length of the summary text, the second sentence length refers to the sentence length of the text block, and the first semantic similarity refers to the semantic similarity between the content contained in the summary text and the text block.

[0086] Specifically, the first sentence length of the summary text and the second sentence length of the text block are obtained respectively, and the vector cosine similarity between the summary text and the text block is calculated to obtain the corresponding first semantic similarity. The semantic length is then calculated as follows: [formula not reproduced in the source text]

[0087] where the symbols of the formula denote, in order: the summary text; the text block; the first sentence length; the second sentence length; the first semantic similarity; the first value; and the second value.

[0088] Furthermore, obtaining related question knowledge from the text block using knowledge feature extraction, and determining the related question knowledge segments from that knowledge, may include the following: obtaining the first text length of the text block, and obtaining the first number of semantic clustering segments in the text block and the second text length of the semantic clustering segments.

[0089] A third value is determined using a first constant coefficient and the first number, and a fourth value is determined using the first text length and the second text length. The text semantic length of the text block for the reverse generation of related questions is then determined using the third value and the fourth value.

[0090] By using the semantic length of the text, the first text length of the text block, the segment length threshold, and the second constant coefficient, the amount of relevant question knowledge generated by the text block is determined, and then the relevant question knowledge segments are determined.

[0091] Here, the first text length refers to the text length of the text block, for example in characters, and can be the same as the length of the aforementioned text block; the second text length refers to the sentence length of the current semantic clustering segment; and the text semantic length refers to the semantic length of the text block used for the reverse generation of related questions.

[0092] Specifically, after obtaining the first text length of the text block, it is also necessary to obtain the first number of semantic clustering segments in the text block, and the second text length of the current semantic clustering segment; set the first constant coefficient k, which can be set according to the actual situation; then use the quotient between the first constant coefficient and the first number to determine the third value, and use the first text length and the second text length to determine the fourth value.

[0093] Then, the text semantic length is calculated as follows: [formula not reproduced in the source text]

[0094] where the symbols of the formula denote, in order: the first number; the first text length; the second text length; the first constant coefficient; the third value; and the fourth value.

[0095] The number of related question knowledge items is calculated as follows: [formula not reproduced in the source text]

[0096] where the symbols of the formula denote, in order: the segment length threshold; the number of related question knowledge items; the first text length; and the second constant coefficient.

[0097] Understandably, generating relevant question knowledge from text blocks is an effective way to bridge the semantic gap between query statements and knowledge blocks. However, since the semantic length of text blocks varies, a fixed amount of question knowledge extracted from a text block can lead to incomplete coverage or knowledge redundancy in the generated relevant question knowledge. Therefore, the aforementioned processing is performed.

[0098] In some embodiments, the target data for a data query task can be determined by utilizing knowledge graphs, knowledge bases, and keyword information, and may include the following:

[0099] Obtain document information, subtopic information, and topic keyword information from text blocks in preset text materials, construct triple data, and use the triple data to construct a knowledge graph.

[0100] Relevant target triples can be obtained from knowledge graphs using keyword information.

[0101] The target triples are used to match data in the knowledge base. The successfully matched data is used for knowledge reasoning, and the reasoning results are used as the target data for the data query task.

[0102] Here, document information refers to the contextual position of a text block within a document, such as the document title, chapter, and text block number, and is used to retrieve the relevant text content from the document knowledge base. Subtopic information refers to phrases in the text block, such as core scene theme words, that represent the theme of the text block's description; it comprises, for example, the text block number, subtopic, and subtopic semantic text. Topic keyword information refers to phrases in the text block, such as core high-frequency words, scene theme words, time keywords, table title words, character keywords, and event keywords, that represent the theme of the text block's description; it comprises, for example, the subtopic semantic text, text keywords, and core keyword text.

[0103] Specifically, document titles, chapters, and text block numbers are extracted from the preset text materials to form document information; the text block number, subtopic, and subtopic semantic text of each text block are extracted to form subtopic information; and the subtopic semantic text, text keywords, and core keyword text of each text block are extracted to form topic keyword information. Corresponding triple data is constructed for each, and a knowledge graph is built from the constructed triple data. That is, subtopic text and core keywords are extracted from the text blocks using a large model, and a document topic knowledge graph is constructed from the document information of the text blocks and the extracted topic keywords.

[0104] Next, after obtaining the keyword information for the data query task, the corresponding associated triples are searched from the knowledge graph using the keyword information to serve as target triples. After determining the target triples, they are matched in the knowledge base to obtain multiple successfully matched data. These multiple data are then used for knowledge reasoning to obtain the final knowledge reasoning result, which serves as the target data for the data query task.
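The two-stage lookup described above (keywords to target triples, then target triples to knowledge base entries) can be sketched as follows. This is a simplified illustration under several assumptions: triples are plain `(head, relation, tail)` tuples whose tail is a text block number, keyword matching is a case-insensitive substring test rather than semantic matching, the knowledge base is a dictionary keyed by text block number, and the "Alarm Configuration" entry and both knowledge base texts are hypothetical sample data.

```python
def find_target_triples(triples, keywords):
    """Return triples whose head contains any of the query keywords (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [t for t in triples if any(k in t[0].lower() for k in lowered)]


def match_knowledge_base(target_triples, knowledge_base):
    """Collect knowledge base entries for the text blocks referenced by the target triples."""
    block_ids = []
    for _, _, tail in target_triples:
        if tail in knowledge_base and tail not in block_ids:
            block_ids.append(tail)  # keep first-seen order, avoid duplicates
    return [knowledge_base[b] for b in block_ids]


# Hypothetical sample data for illustration only.
triples = [
    ("Intelligent Data Retrieval", "scene theme keyword", "02"),
    ("Parking Lot Management", "scene theme keyword", "02"),
    ("Alarm Configuration", "scene theme keyword", "07"),
]
knowledge_base = {
    "02": "The platform fuses static images with video trajectories.",
    "07": "Alarms are configured per channel.",
}

targets = find_target_triples(triples, ["parking lot"])
matched = match_knowledge_base(targets, knowledge_base)
```

The matched entries would then be passed on for knowledge reasoning; that step depends on the large model and is not sketched here.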

[0105] In some embodiments, obtaining document information, subtopic information, and topic keyword information of text blocks in preset text materials, constructing triple data, and using the triple data to construct a document topic knowledge graph may include the following:

[0106] Retrieve the document title, chapter, text block number, subtopic, subtopic semantic text, text keywords, and core keyword text of a text block.

[0107] Document information triples are constructed using document title, chapter, and text block number; text block subtopic triples are constructed using text block number, subtopic, and subtopic semantic text; and keyword triples are constructed using subtopic semantic text, text keywords, and core keyword text.

[0108] By using document information triples, text block subtopic triples, and keyword triples, triple data is determined, and a document topic knowledge graph is constructed using the triple data.

[0109] Here, the document title refers to the title of the text; the chapter refers to a chapter contained in the text; the text block number refers to the sequence number of a text block divided from the text; the subtopic refers to the title of the chapter; the subtopic semantic text refers to text content with clear and complete semantic information (including literal meaning, implied intent, logical relationships, entity associations, etc.); text keywords refer to phrases describing the content of the subtopic; and core keyword text refers to high-frequency words in the text.

[0110] For example, the text block data of a document is acquired as follows. Text block document name: "Video Information Data Mining Platform Management Operation Manual"; text block chapter: 1.2 System Function Overview; text block number: 02; text block content: "The video information data mining platform achieves the fusion of static image information and video dynamic trajectory through technologies such as image fusion and global retrieval, enabling intelligent dynamic data retrieval. Through the application of the functions of 'fusion retrieval' and 'holographic archives,' it can realize dynamic linkage of data mining business scenarios such as human-vehicle association, facial recognition-person association, personnel-to-archive, and vehicle-to-archive. Through the association and jump between information, it presents parking lot managers with more clues related to emergencies." Subtopic information is then extracted from the text block using a large model: the document name, text chapter name, text block, and instruction prompt words are input into the large model, and the extraction result for the text block is obtained: {"Subtopic core words": "Video information data mining platform"}.

[0111] Core keywords for the theme are extracted in the same way: the document name, text chapter name, text block, and instruction prompt words are input into the large model, and the extraction result for the text block is obtained: {"Core keywords for the theme": "Video Information Data Mining Platform", "Scene keywords": ["Intelligent Data Retrieval", "Platform Function Overview", "Parking Lot Management"]}.

[0112] Based on the document information of the text block and the large model extraction results, triple data is constructed and stored in the document topic knowledge graph. Triple data: {{1.2 System Function Overview, document to which the chapter belongs, "Video Information Data Mining Platform Management Operation Manual"}, {02, chapter to which the text block belongs, 1.2 System Function Overview}, {"Video Information Data Mining Platform", subtopic, 02}, {"Intelligent Data Retrieval", scene theme keyword, 02}, {"Platform Function Overview", scene theme keyword, 02}, {"Parking Lot Management", scene theme keyword, 02}}.
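The triple construction in this example can be expressed as a short sketch. The `Triple` type, the function name `build_triples`, and the lowercase relation labels are assumptions chosen for illustration; the values are those of the example above.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def build_triples(doc_title, chapter, block_no, subtopic, scene_keywords) -> List[Triple]:
    """Build document, text block, subtopic, and keyword triples for one text block."""
    triples = [
        Triple(chapter, "document to which the chapter belongs", doc_title),
        Triple(block_no, "chapter to which the text block belongs", chapter),
        Triple(subtopic, "subtopic", block_no),
    ]
    # One triple per scene theme keyword, each pointing back to the text block number.
    triples += [Triple(kw, "scene theme keyword", block_no) for kw in scene_keywords]
    return triples


example_triples = build_triples(
    doc_title="Video Information Data Mining Platform Management Operation Manual",
    chapter="1.2 System Function Overview",
    block_no="02",
    subtopic="Video Information Data Mining Platform",
    scene_keywords=[
        "Intelligent Data Retrieval",
        "Platform Function Overview",
        "Parking Lot Management",
    ],
)
```

Because every keyword triple carries the text block number as its tail, a keyword hit in the graph leads directly to the text block whose content is stored in the knowledge base.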

[0113] Next, for question retrieval based on the knowledge graph and the knowledge base, refer to Figure 3, which is a flowchart illustrating one embodiment of the query principle in this application.

[0114] As shown in Figure 3, the process first determines whether the data query task is a complex task. If so, it is classified as either a comparison analysis task (the first task type) or a knowledge reasoning task (the second task type). For a comparison analysis task, the task is broken down into multiple sub-tasks through task planning. These sub-tasks are queried in the knowledge base to obtain the top-n matching data, knowledge reasoning is performed on the results, and the reasoning result is output as the answer, which is the target data for the data query task. For a knowledge reasoning task, keywords are extracted through task specification. Based on these keywords, the corresponding target triples are retrieved from the knowledge graph and then matched against the knowledge base. The top-n matching data are used for knowledge reasoning, and the reasoning result is used as the target data for the data query task.

[0115] If the data query task is determined to be a general task, then a direct match is performed in the knowledge base. The top-n matching data are used for knowledge reasoning, and the result of the knowledge reasoning is used as the target data for the data query task.
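The three branches of this flow can be summarized in a small dispatch sketch. It only plans which stores are consulted and with which inputs; task classification, sub-task planning, and keyword extraction are assumed to be performed elsewhere (for example by a large model) and are passed in as the hypothetical callables `split_subtasks` and `extract_keywords`. The names `TaskType` and `plan_query` are likewise illustrative.

```python
from enum import Enum


class TaskType(Enum):
    COMPARISON_ANALYSIS = 1  # first task type
    KNOWLEDGE_REASONING = 2  # second task type
    GENERAL = 3              # not a complex task


def plan_query(task_type, query, split_subtasks, extract_keywords):
    """Choose the retrieval route and the inputs to match, based on the task type."""
    if task_type is TaskType.COMPARISON_ANALYSIS:
        # Sub-tasks are matched directly in the knowledge base.
        return {"route": ["knowledge_base"], "inputs": split_subtasks(query)}
    if task_type is TaskType.KNOWLEDGE_REASONING:
        # Keywords select target triples in the graph, which are then matched in the knowledge base.
        return {"route": ["knowledge_graph", "knowledge_base"], "inputs": extract_keywords(query)}
    # General task: the query itself is matched directly in the knowledge base.
    return {"route": ["knowledge_base"], "inputs": [query]}


plan = plan_query(
    TaskType.KNOWLEDGE_REASONING,
    "Which functions link vehicles to archives?",
    split_subtasks=lambda q: [q],
    extract_keywords=lambda q: ["vehicles", "archives"],
)
```

In every branch the top-n matches would subsequently be passed to knowledge reasoning to produce the target data.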

[0116] This application also provides an electronic device.

[0117] Refer to Figure 4, which is a schematic diagram of an embodiment of the electronic device in this application. The electronic device can perform the steps of the above method.

[0118] The electronic device 200 includes a memory 220, a processor 210 coupled to the memory, and at least one computer program stored in the memory 220 and executable on the processor 210. When the processor 210 loads and executes the at least one computer program, it implements the steps of the data query method described above. For related details, please refer to the detailed description in the above method; further elaboration will not be repeated here.

[0119] This application also includes a computer-readable storage medium.

[0120] Refer to Figure 5, which is a schematic diagram of an embodiment of a computer-readable storage medium in this application.

[0121] The computer-readable storage medium 300 stores at least one program 310, which, when loaded and executed by a processor, is used to implement the steps of the data query method described above. For related details, please refer to the detailed description in the above method; it will not be repeated here.

[0122] The above solution determines the task type of the data query task and then performs strategy analysis based on the task type. For example, when the task type is the first type of comparative analysis, the data query task is broken down into multiple sub-tasks, and the target data is then determined from the knowledge base based on the sub-tasks. If the task type is the second type of knowledge reasoning, the keyword information in the query task is obtained, and then the target data is determined from the knowledge base using the keyword information and knowledge graph. This effectively reduces retrieval interference, improves the quality and efficiency of document knowledge feature extraction, and enhances the accuracy of data query.

[0123] In the several embodiments provided by this invention, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or other forms.

[0124] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment, depending on actual needs.

[0125] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0126] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0127] The above description is merely an embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.

Claims

1. A data query method, characterized in that it comprises: obtaining a data query task and determining the task type of the data query task; in response to the task type of the data query task being a first task type, obtaining multiple sub-tasks in the data query task, wherein the first task type is a comparison analysis type; determining the target data of the data query task using the sub-tasks and a knowledge base, wherein the knowledge base is constructed from semantic clustering fragments, summary knowledge fragments, and related question knowledge fragments of text blocks in preset text materials; or, in response to the task type of the data query task being a second task type, obtaining keyword information in the data query task, wherein the second task type is a knowledge reasoning type; and determining the target data of the data query task using a knowledge graph, the knowledge base, and the keyword information, wherein the knowledge graph is constructed from triple data of the text blocks in the preset text materials.

2. The method according to claim 1, characterized in that the step of obtaining multiple sub-tasks in the data query task in response to the task type of the data query task being the first task type comprises: in response to the data query task being of the first task type, obtaining multiple main keywords of the data query task; and splitting the data query task into multiple statements using the main keywords, each statement serving as a sub-task, wherein each statement contains at least one of the main keywords.

3. The method according to claim 2, characterized in that, The step of determining the target data for the data query task using the sub-tasks and the knowledge base includes: Obtain semantic clustering segments, summary knowledge segments, and related question knowledge segments from text blocks in preset text materials, and construct the knowledge base accordingly; The subtasks are used to perform matching in the knowledge base, and the successfully matched data is used for knowledge reasoning. The results of the knowledge reasoning are then used as the target data for the data query task.

4. The method according to claim 3, characterized in that, The step of obtaining semantic clustering segments, summary knowledge segments, and related question knowledge segments from text blocks in the preset text material, and constructing the knowledge base, includes: The preset text material is segmented into text blocks to obtain multiple text blocks; Each text block is split into sentences, and adjacent sentences are clustered based on semantic similarity to obtain semantic clustering segments; In response to the text block length being greater than the segment length threshold, knowledge features are extracted from the text block to obtain the corresponding summary knowledge segment, and related question knowledge is generated from the text block to obtain the corresponding related question knowledge segment; The knowledge base is constructed using the semantic clustering fragments, the summary knowledge fragments, and the related question knowledge fragments.

5. The method according to claim 4, characterized in that, The step of splitting each text block into sentences and clustering adjacent sentences based on semantic similarity to obtain semantic clustering segments includes: Each text block is split into multiple sentences, and the semantic similarity between each current sentence and its adjacent sentences is obtained. The length of the sentence after merging the current sentence and its adjacent sentences is also obtained. In response to the semantic similarity being greater than a preset clustering threshold, when the statement length is less than a segment length threshold, the current statement and the adjacent statements are merged into a semantic cluster segment; or when the statement length is greater than the segment length threshold, the current statement with a length greater than the minimum segment length threshold is used as a semantic cluster segment, and the current statement with a length less than the minimum segment length threshold and the adjacent statements are merged into a semantic cluster segment. Alternatively, in response to the semantic similarity being less than the preset clustering threshold, when the statement length is greater than the segment length threshold, the current statement is treated as a semantic clustering segment.

6. The method according to claim 4, characterized in that, The step of extracting knowledge features from the text block to obtain corresponding summary knowledge fragments in response to the text block's length being greater than a fragment length threshold, and generating related question knowledge from the text block to obtain corresponding related question knowledge fragments, includes: Get the length of the text block; In response to the text block length being greater than the segment length threshold, a summary text of the text block is obtained using a knowledge feature extraction method; Obtain the semantic length between the summary text and the text block, wherein the semantic length is determined using semantic similarity; If the semantic length between the summary text and the text block is within a threshold range, the summary text is used as the summary knowledge segment; or if the semantic length between the summary text and the text block is outside the threshold range, the content of the text block is used as the summary knowledge segment. The relevant question knowledge of the text block is obtained by using knowledge feature extraction, and the relevant question knowledge is used to determine the relevant question knowledge fragments.

7. The method according to claim 6, characterized in that, The step of obtaining the semantic length between the summary text and the text block includes: Obtain the first sentence length of the summary text and the second sentence length of the text block, and obtain the first semantic similarity between the summary text and the text block; A first value is determined using the first statement length, the second statement length, and the first semantic similarity; a second value is determined using the first statement length and the second statement length; and the semantic length is determined using the first value and the second value. The step of obtaining relevant question knowledge of the text block using knowledge feature extraction methods, and determining relevant question knowledge segments based on the relevant question knowledge, includes: Obtain the first text length of the text block, and obtain the first number of semantic clustering segments in the text block and the second text length of the semantic clustering segments; A third value is determined using a first constant coefficient and a first quantity, and a fourth value is determined using a first text length and a second text length. Then, the text semantic length of the reverse generation related question of the text block is determined using the third value and the fourth value. The number of related question knowledge generated by the text block is determined by using the semantic length of the text, the first text length of the text block, the segment length threshold, and the second constant coefficient, and then the related question knowledge segment is determined.

8. The method according to claim 1, characterized in that the step of determining the target data of the data query task using the knowledge graph, the knowledge base, and the keyword information includes: obtaining document information, subtopic information, and topic keyword information of text blocks in the preset text materials, constructing triple data, and constructing the knowledge graph using the triple data; obtaining relevant target triples from the knowledge graph using the keyword information; and matching data in the knowledge base using the target triples, performing knowledge reasoning on the successfully matched data, and using the reasoning result as the target data of the data query task.
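
The retrieval flow in claim 8 might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation. The claim does not say how triples are selected or how they are matched against the knowledge base, so case-insensitive substring matching is assumed here. The function names, the representation of the knowledge graph as a list of three-element tuples, and the `reason` callable (a placeholder for the knowledge reasoning step) are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def find_target_triples(graph: Iterable[Triple], keywords: Iterable[str]) -> List[Triple]:
    """Select triples in which any element contains any keyword.

    Assumption: case-insensitive substring matching stands in for the
    unspecified graph lookup.
    """
    lowered = [k.lower() for k in keywords]
    return [
        triple
        for triple in graph
        if any(k in element.lower() for k in lowered for element in triple)
    ]


def match_knowledge_base(knowledge_base: Iterable[str], target_triples: Iterable[Triple]) -> List[str]:
    """Keep knowledge base fragments that mention any element of a target triple."""
    elements = {element.lower() for triple in target_triples for element in triple}
    return [
        fragment
        for fragment in knowledge_base
        if any(element in fragment.lower() for element in elements)
    ]


def query_by_knowledge_graph(
    graph: Iterable[Triple],
    knowledge_base: Iterable[str],
    keywords: Iterable[str],
    reason: Callable[[List[str]], str],
) -> str:
    """Graph lookup, then knowledge base matching, then reasoning over the matches."""
    targets = find_target_triples(graph, keywords)
    matched = match_knowledge_base(knowledge_base, targets)
    # `reason` is a hypothetical stand-in for the knowledge reasoning step.
    return reason(matched)
```

With made-up sample data, a keyword that hits one triple pulls in only the knowledge base fragments tied to that triple, which is the interference-reducing effect the application describes.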

9. The method according to claim 8, characterized in that the step of obtaining document information, subtopic information, and topic keyword information of text blocks in the preset text materials, constructing triple data, and constructing the knowledge graph using the triple data includes: obtaining the document title, chapter, text block number, subtopic, subtopic semantic text, text keywords, and core keyword text of the text block; constructing a document information triple using the document title, the chapter, and the text block number; constructing a text block subtopic triple using the text block number, the subtopic, and the subtopic semantic text; constructing a keyword triple using the subtopic semantic text, the text keywords, and the core keyword text; and determining the triple data using the document information triple, the text block subtopic triple, and the keyword triple, and constructing the knowledge graph, as a document topic knowledge graph, using the triple data.
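
The three triples in claim 9 can be written out as a short Python sketch. The `TextBlockInfo` class and `build_triples` function are hypothetical names, and representing each field as a plain string is an assumption. Only the composition and element order of each triple are taken from the claim.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class TextBlockInfo:
    """Fields the claim obtains for one text block (all strings by assumption)."""
    document_title: str
    chapter: str
    block_number: str
    subtopic: str
    subtopic_semantic_text: str
    text_keywords: str
    core_keyword_text: str


def build_triples(info: TextBlockInfo) -> List[Triple]:
    """Build the three triples named in the claim for one text block."""
    document_info_triple = (info.document_title, info.chapter, info.block_number)
    subtopic_triple = (info.block_number, info.subtopic, info.subtopic_semantic_text)
    keyword_triple = (info.subtopic_semantic_text, info.text_keywords, info.core_keyword_text)
    return [document_info_triple, subtopic_triple, keyword_triple]
```

The text block number appears in both the first and second triples, and the subtopic semantic text in both the second and third, so the three triples chain from document down to keywords.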

10. An electronic device, characterized in that the electronic device includes a memory and a processor coupled to the memory, the memory storing at least one computer program which, when loaded and executed by the processor, implements the method according to any one of claims 1-9.