A large model retrieval enhancement method and system based on a fusion search engine
The method identifies the domain of a query, selects a retrieval method suited to that domain, and integrates and ranks the results from multiple retrieval methods. This addresses the poor retrieval accuracy of existing technologies and achieves more efficient and accurate information retrieval.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA ELECTRONICS CYBERSPACE RESEARCH INSTITUTE CO LTD
- Filing Date
- 2024-12-25
- Publication Date
- 2026-06-26
AI Technical Summary
Existing information retrieval methods have difficulty selecting an appropriate retrieval method based on the search content, resulting in poor retrieval accuracy.
Keywords and entities are extracted from the query request to determine the query domain, and an appropriate retrieval method, such as a knowledge base, knowledge graph, or search engine, is selected for retrieval. The results from multiple retrieval methods are then integrated, ranked using a scoring model, and displayed.
It improves the accuracy and comprehensiveness of information retrieval, ensures the authority and completeness of information, reduces information duplication and conflict, and enhances the system's response speed and efficiency.
Description
Technical Field
[0001] This invention relates to the field of data retrieval technology, and in particular to a method and system for enhancing large-model retrieval based on a fusion search engine.
Background Technology
[0002] Information retrieval (IR) is an important branch of computer science that focuses on how to efficiently extract relevant information from large amounts of unstructured and structured data. This field encompasses various data types, including text, images, audio, and video, and aims to improve the convenience and accuracy of information retrieval for users through technological means.
[0003] Key technologies in information retrieval include indexing, query processing, and document ranking. Indexing is used to build an index structure for data to quickly locate relevant information; query processing involves parsing and optimizing user-input queries; and document ranking sorts the search results based on the degree of match between the query and the documents. With the rapid development of the internet, information retrieval technology has been widely applied in search engines, digital libraries, e-commerce, and other fields. It not only helps users quickly find the information they need from massive amounts of online data but also promotes the development of related fields such as knowledge discovery and data mining.
[0004] Existing information retrieval methods include knowledge graph and vector knowledge base retrieval. However, existing information retrieval schemes have difficulty selecting the appropriate retrieval method based on the content being retrieved, resulting in poor retrieval accuracy.
Summary of the Invention
[0005] In view of this, embodiments of the present invention provide a method for enhancing large model retrieval based on a fusion search engine, in order to eliminate or improve one or more defects existing in the prior art.
[0006] One aspect of the present invention provides a method for enhancing large model retrieval based on a fusion search engine, the method comprising the following steps:
[0007] Obtain the query request, and extract keywords and entities based on the query content in the query request;
[0008] The query domain is determined based on the keywords and entities of the query content, and at least one corresponding retrieval method is determined based on the query domain;
[0009] The query content is retrieved using the corresponding retrieval method to obtain the retrieval results;
[0010] If multiple retrieval methods correspond to the query content, the retrieval results are merged and the merged retrieval results are displayed.
[0011] With this approach, the solution first extracts keywords from the query request, since keywords are usually representative of the query content, and uses them to determine the query domain. Because different query domains suit different retrieval methods, matching the retrieval method to the query domain helps ensure retrieval quality. The solution also merges the results of multiple retrieval methods to improve how the results are presented.
[0012] In some embodiments of the present invention, in the steps of obtaining a query request and extracting keywords and entities based on the query content in the query request, the user query input module receives the query request and determines whether the query request is a text request or a voice request. If the query request is a voice request, the voice data of the voice request is converted into text data, and the text data is used as the query content.
[0013] In some embodiments of the present invention, the step of obtaining a query request and extracting keywords and entities based on the query content in the query request further includes:
[0014] The query content is segmented into words using the query analysis module.
[0015] Keywords and entities are then identified in the segmented text.
[0016] In some embodiments of the present invention, in the step of determining the query domain based on the keywords and entities of the query content, an input vector is constructed based on the keywords and entities, and the input vector is input into a pre-trained recognition model, wherein the recognition model outputs the corresponding query domain.
[0017] In some embodiments of the present invention, in the step of determining at least one corresponding retrieval method based on the query domain, the query domain includes a stable domain, a slowly changing domain, a rapidly changing domain, and a real-time domain, and the retrieval method includes a knowledge base, a knowledge graph, and a search engine.
[0018] In some embodiments of the present invention, in the step of retrieving the query content using the corresponding retrieval method:
[0019] If the query domain of the query content is a stable domain or a slowly changing domain, the knowledge base or knowledge graph is used for retrieval.
[0020] If the query content pertains to a rapidly changing or real-time domain, a search engine will be used for retrieval.
[0021] In some embodiments of the present invention, in the step of merging the search results and displaying the merged search results if there are multiple search methods corresponding to the query content, the result data corresponding to the multiple search results are deduplicated, the result data after deduplication is sorted by relevance, and the results with higher relevance are displayed first, so as to obtain the merged search results.
[0022] In some embodiments of the present invention, in the step of ranking the deduplicated result data by relevance, a preset scoring model scores each result data item on multiple scoring items. A weighted calculation over these scores yields the ranking weight value of each result data item, and the result data items are ranked by that value.
[0023] In some embodiments of the present invention, in the step of displaying the fused search results, structured data and unstructured data are marked by a pre-set result display module, the content of the structured data is displayed, and the links of the unstructured data are displayed.
[0024] A second aspect of the present invention also provides a large model retrieval enhancement system based on a fusion search engine. The system includes a computer device, the computer device including a processor and a memory, the memory storing computer instructions, and the processor executing the computer instructions stored in the memory. When the computer instructions are executed by the processor, the system implements the steps of the method described above.
[0025] A third aspect of the present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the aforementioned large model retrieval enhancement method based on a fusion search engine.
[0026] Additional advantages, objects, and features of the invention will be set forth in part in the description which follows, and will also become apparent in part to those skilled in the art upon studying the text, or may be learned by practice of the invention. The objects and other advantages of the invention will become apparent from the description and the accompanying drawings.
[0027] Those skilled in the art will understand that the objectives and advantages achievable with the present invention are not limited to those specifically described above, and that the above and other objectives achievable with the present invention will become clearer from the following detailed description.
Attached Figure Description
[0028] The accompanying drawings, which are provided to further illustrate the invention and form part of this application, are not intended to limit the scope of the invention.
[0029] Figure 1 is a schematic diagram of one implementation of the large model retrieval enhancement method based on a fusion search engine according to the present invention;
[0030] Figure 2 is a schematic diagram of another implementation of the large model retrieval enhancement method based on a fusion search engine according to the present invention;
[0031] Figure 3 is a schematic diagram of the processing architecture of the large model retrieval enhancement method based on a fusion search engine according to the present invention.
Detailed Implementation
[0032] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the embodiments and accompanying drawings. Here, the illustrative embodiments and descriptions of this invention are used to explain the invention, but are not intended to limit the invention.
[0033] It should also be noted that, in order to avoid obscuring the invention with unnecessary details, only the structures and / or processing steps closely related to the solution according to the invention are shown in the accompanying drawings, while other details that are not closely related to the invention are omitted.
[0034] In the specific implementation process, since railway passenger stations are scattered in various places, if we want to achieve a balanced optimization of energy consumption under the goals of "safety", "comfort" or "energy saving", we must inevitably consider the influence of region, season and climate. On the other hand, due to the huge daily passenger flow, the massive passenger flow and the wide variety of equipment and facilities also have a huge impact on station energy consumption. Therefore, it is also extremely important to optimize station energy consumption from the perspectives of region, time and passenger flow.
[0035] As shown in Figures 1 and 3, this invention proposes a method for enhancing large-model retrieval based on a fusion search engine. The steps of this method include:
[0036] Step S100: Obtain a query request, and extract keywords and entities based on the query content in the query request;
[0037] Step S200: Determine the query domain based on the keywords and entities of the query content, and determine at least one corresponding retrieval method based on the query domain;
[0038] Step S300: The query content is retrieved using the corresponding retrieval method to obtain the retrieval results;
[0039] Step S400: If there are multiple retrieval methods corresponding to the query content, the retrieval results are merged and the merged retrieval results are displayed.
[0040] With this approach, the solution first extracts keywords from the query request, since keywords are usually representative of the query content, and uses them to determine the query domain. Because different query domains suit different retrieval methods, matching the retrieval method to the query domain helps ensure retrieval quality. The solution also merges the results of multiple retrieval methods to improve how the results are presented.
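The four steps S100 to S400 can be illustrated with a minimal sketch. Everything named here is hypothetical: the function names, the trigger words, and the stub retrievers are assumptions chosen for illustration, and the toy rule in `choose_methods` merely stands in for the domain recognition described later.

```python
from typing import Callable, Dict, List

def extract_terms(query: str) -> List[str]:
    """S100: naive whitespace split standing in for keyword/entity extraction."""
    return [t.lower().strip("?.,!") for t in query.split() if len(t) > 3]

def choose_methods(terms: List[str]) -> List[str]:
    """S200: toy rule standing in for domain recognition and method selection."""
    if any(t in {"today", "latest", "price"} for t in terms):
        return ["search_engine", "knowledge_base"]
    return ["knowledge_base"]

# Stub retrievers (assumed); real ones would query a knowledge base or the web.
RETRIEVERS: Dict[str, Callable[[str], List[str]]] = {
    "knowledge_base": lambda q: [f"kb:{q}"],
    "search_engine": lambda q: [f"web:{q}"],
}

def answer(query: str) -> List[str]:
    terms = extract_terms(query)
    methods = choose_methods(terms)
    # S300: run every selected retrieval method
    results = [r for m in methods for r in RETRIEVERS[m](query)]
    if len(methods) > 1:
        # S400: merge only when several methods were used
        results = sorted(set(results))
    return results
```

In this sketch the merge in S400 is reduced to removing exact duplicates and sorting; the scoring-based fusion described in later paragraphs would replace it.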
[0041] In some embodiments of the present invention, in the steps of obtaining a query request and extracting keywords and entities based on the query content in the query request, the user query input module receives the query request and determines whether the query request is a text request or a voice request. If the query request is a voice request, the voice data of the voice request is converted into text data, and the text data is used as the query content.
[0042] In practice, users submit query requests through the input interface. This module is responsible for receiving user input and passing the query request to subsequent modules for processing. Users typically input questions or keywords through text input boxes or other forms of input (such as speech recognition to text).
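One possible shape for the input module is sketched below. The `QueryRequest` type and the injected `transcribe` callable are assumptions; the text does not name a speech recognition component, so the sketch leaves it as a parameter.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class QueryRequest:
    kind: str                    # "text" or "voice"
    payload: Union[str, bytes]   # text string, or raw audio bytes for voice

def to_query_content(req: QueryRequest, transcribe: Callable[[bytes], str]) -> str:
    """Return the query content as text, transcribing voice requests first."""
    if req.kind == "text":
        return str(req.payload).strip()
    if req.kind == "voice":
        # The speech-to-text engine is supplied by the caller (assumed).
        return transcribe(req.payload).strip()
    raise ValueError(f"unsupported request kind: {req.kind}")
```

Passing the transcriber in as an argument keeps the input module independent of any particular speech recognition service.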
[0043] In some embodiments of the present invention, the step of obtaining a query request and extracting keywords and entities based on the query content in the query request further includes:
[0044] The query content is segmented into words using the query analysis module.
[0045] Keywords and entities are then identified in the segmented text.
[0046] In practice, identifying keywords and entities in segmented text can be achieved by using deep neural networks for training and prediction, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This method typically offers higher accuracy and generalization ability.
[0047] In practice, the query analysis module preprocesses and analyzes user query requests, leveraging the natural language processing capabilities of the large-scale model to deeply analyze user query intent and identify keywords and entities within the query. The main function of this module is to transform user natural language queries into structured semantic forms. The large-scale model, through its powerful contextual understanding and language modeling capabilities, improves the accuracy of entity recognition and part-of-speech tagging, and can handle queries with complex syntactic structures and semantic ambiguity. In this way, the system can more effectively decompose queries and generate high-precision semantic elements that can be processed by subsequent modules.
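The segmentation and identification steps can be sketched as follows. This is a deliberately simple stand-in: the text describes neural or large-model analysis, whereas the sketch uses a regular expression, a stopword list, and a small entity dictionary, all of which are assumptions. The tokenizer also assumes space-delimited text; Chinese input would need a dedicated word segmenter.

```python
import re
from typing import Dict, List, Tuple

STOPWORDS = {"the", "a", "an", "of", "in", "is", "what", "was", "who", "to", "for"}
# Hypothetical entity dictionary; a trained recognition model would replace this lookup.
GAZETTEER: Dict[str, str] = {"beijing": "LOCATION", "newton": "PERSON"}

def segment(text: str) -> List[str]:
    """Split query text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def analyze(text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (keywords, entities) found in the segmented text."""
    tokens = segment(text)
    entities = [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]
    keywords = [t for t in tokens if t not in STOPWORDS and t not in GAZETTEER]
    return keywords, entities
```

For the query "What is the weather in Beijing?", this sketch yields the keyword `weather` and the entity `beijing` tagged as a location.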
[0048] In some embodiments of the present invention, in the step of determining the query domain based on the keywords and entities of the query content, an input vector is constructed based on the keywords and entities, and the input vector is input into a pre-trained recognition model, wherein the recognition model outputs the corresponding query domain.
[0049] In the step of constructing an input vector based on the keywords and entities, the keywords and entities are encoded to obtain the input vector, and the recognition model can be a convolutional neural network model.
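A minimal sketch of encoding and domain recognition follows. The text states that the recognition model can be a convolutional neural network; here a hand-weighted linear scorer is swapped in so that the example stays self-contained. The vocabulary, weights, and domain labels are all assumed values, not trained parameters.

```python
from typing import List

DOMAINS = ["stable", "slow_changing", "rapidly_changing", "real_time"]
VOCAB = ["theorem", "history", "law", "regulation", "trend", "market", "weather", "stock"]

# Hypothetical hand-set weights: one row per domain, one column per vocabulary term.
WEIGHTS: List[List[float]] = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
]

def encode(terms: List[str]) -> List[float]:
    """Bag-of-words encoding of keywords and entities over VOCAB."""
    present = set(terms)
    return [1.0 if w in present else 0.0 for w in VOCAB]

def recognize_domain(terms: List[str]) -> str:
    """Score each domain and return the best one (ties fall to the first domain)."""
    x = encode(terms)
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]
    return DOMAINS[scores.index(max(scores))]
```

A trained model would replace `WEIGHTS` and would likely use learned embeddings rather than a fixed vocabulary; the input and output shapes stay the same.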
[0050] In some embodiments of the present invention, in the step of determining at least one corresponding retrieval method based on the query domain, the query domain includes a stable domain, a slowly changing domain, a rapidly changing domain, and a real-time domain, and the retrieval method includes a knowledge base, a knowledge graph, and a search engine.
[0051] In practice, the module retrieves relevant web pages or other unstructured data from the internet using a search engine. Based on the query keywords and entities, it extracts relevant unstructured data from multiple web page sources for subsequent analysis. This data can be in various formats, such as text, images, or audio.
[0052] The system searches a local vector knowledge base for structured data (such as encyclopedia entries and database entries) relevant to the query. The user's query is transformed into a vector representation, and the system uses techniques such as word embedding or sentence embedding to map the natural language query to a vector space. The most relevant answer is found by calculating the cosine similarity or other similarity measures between the user's query vector and the vectors stored in the knowledge base. This module matches the user's query with high-quality, structured information in the knowledge base to identify the most relevant entries and facts. This data is then combined with unstructured data to provide the user with a comprehensive answer.
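The cosine similarity lookup described above can be sketched as follows. The vectors and entry names are illustrative; producing the embeddings themselves is assumed to happen elsewhere.

```python
import math
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k knowledge base entries most similar to the query vector."""
    scored = [(key, cosine(query_vec, vec)) for key, vec in store.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]
```

This scans every stored vector, which is acceptable for a small store; a large knowledge base would normally use an approximate nearest-neighbor index instead.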
[0053] A knowledge graph is a structured database that uses nodes to represent entities and edges to represent relationships between entities. By retrieving and analyzing the knowledge graph, the system finds entities relevant to user queries and the relationships between them, and performs semantic reasoning based on the entities and relationships within the knowledge graph. The core function of this module is to perform deep semantic analysis and reasoning on queries based on entity relationships within the knowledge graph. For example, when a user queries a historical event, the system not only displays information about the event itself but also infers other background knowledge related to the event. This module supports multi-step reasoning, solving the problems of cross-domain and complex queries.
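Multi-step retrieval over a knowledge graph can be approximated, in its simplest form, by a bounded breadth-first traversal. The sketch below assumes an adjacency-list graph and a hop limit; it covers only the traversal part and does not attempt the semantic reasoning described in the text.

```python
from collections import deque
from typing import Dict, List, Tuple

# Graph shape (assumed): entity -> list of (relation, neighbor) pairs.
Graph = Dict[str, List[Tuple[str, str]]]

def related(graph: Graph, start: str, max_hops: int = 2) -> Dict[str, int]:
    """Breadth-first search returning each reachable entity and its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen
```

The hop distance gives a rough signal for how closely a background entity relates to the queried one, which a ranking step could use.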
[0054] In some embodiments of the present invention, in the step of retrieving the query content using the corresponding retrieval method:
[0055] In practical implementation, the retrieval method selection module chooses the appropriate retrieval method and determines how to integrate retrieval results from multiple sources, which is a crucial step in ensuring system efficiency and accuracy. When processing a user query, it analyzes the domain to which the query belongs to determine which retrieval method to use.
[0056] Stable domains: Information in stable domains is relatively fixed and changes very little, such as mathematical theorems, historical events, and classical physical laws. Knowledge in these domains typically remains unchanged over long periods, making them suitable for querying through pre-built knowledge bases or knowledge graphs, which can quickly provide accurate and authoritative answers.
[0057] Slow-changing domains: Information in slow-changing domains changes slowly, typically taking years or longer to be updated, such as legal provisions, basic technologies, and scientific research findings. Systems can rely on regularly updated knowledge bases and knowledge graphs, but sometimes also need to use search engines to obtain the latest research progress or newly revised regulations.
[0058] Rapidly Changing Domains: Information in rapidly changing domains is frequently updated, such as technological developments, market trends, and emerging technologies. Due to the rapid pace of change, systems often need to rely on search engines to obtain the latest publicly available information, while also combining this with vector knowledge bases to provide background information in order to fully understand user queries.
[0059] Real-time domain: Information in the real-time domain changes rapidly within a short period of time, such as stock quotes, news events, and weather forecasts. To ensure the real-time nature of the results, the system mainly relies on search engines or external APIs to obtain the latest data in real time. This type of domain typically requires immediate feedback of the latest dynamic information.
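The four domain descriptions above can be summarized as a routing table. The sketch below is one reading of those paragraphs: the split into primary and supplementary methods and the label strings are assumptions, and the external APIs mentioned for the real-time domain are omitted for brevity.

```python
from typing import Dict, List

# Routing table (assumed layout) derived from the four domain descriptions.
ROUTES: Dict[str, Dict[str, List[str]]] = {
    "stable": {"primary": ["knowledge_base", "knowledge_graph"], "supplement": []},
    "slow_changing": {"primary": ["knowledge_base", "knowledge_graph"], "supplement": ["search_engine"]},
    "rapidly_changing": {"primary": ["search_engine"], "supplement": ["knowledge_base"]},
    "real_time": {"primary": ["search_engine"], "supplement": []},
}

def select_methods(domain: str, include_supplement: bool = False) -> List[str]:
    """Return the retrieval methods for a domain, optionally with supplementary ones."""
    route = ROUTES[domain]
    methods = list(route["primary"])
    if include_supplement:
        methods += route["supplement"]
    return methods
```

When `include_supplement` is true for a slowly or rapidly changing domain, more than one retrieval method is selected, which is the case in which the fusion step applies.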
[0060] If the query domain of the query content is a stable domain or a slowly changing domain, the knowledge base or knowledge graph is used for retrieval.
[0061] In practical implementation, a knowledge base is a database that organizes and stores structured information, typically including high-quality data manually compiled by domain experts. It is characterized by structured content, clearly defined rules, and high authority. A retrieval-augmented generation (RAG) system can use the high-quality data provided by the knowledge base to generate authoritative answers, improving the accuracy of the generated information and user trust. Common knowledge bases include Wikipedia, Wikidata, and various specialized domain databases. The core advantages of a knowledge base are high accuracy and credibility, but its scope is relatively limited and its maintenance costs are high.
[0062] A knowledge graph is a semantic network that represents and stores information through nodes (entities) and edges (relationships). It can not only store massive amounts of information but also model and analyze the semantic relationships between these entities. When generating complex, multi-domain answers, a RAG system typically requires cross-domain and multi-step reasoning, and knowledge graphs provide the semantic network foundation for this. While knowledge graphs can effectively represent complex relationships and contexts, building and maintaining a large-scale knowledge graph requires significant resources, and its capacity for dynamic information updates is limited.
[0063] If the query content pertains to a rapidly changing or real-time domain, a search engine will be used for retrieval.
[0064] In practice, a search engine is a tool for quickly retrieving information from massive amounts of online data. One of the core features of retrieval-augmented generation (RAG) is the ability to generate answers based on retrieved documents. As a supplementary data source, the search engine therefore greatly enhances the RAG system's ability to obtain the latest information, especially in scenarios where timeliness is critical.
[0065] Existing search engines perform well when handling simple queries, but struggle with complex, multi-layered problems (such as queries requiring multi-step reasoning or cross-domain knowledge). While knowledge bases and knowledge graphs can provide some accurate answers, they lack sufficient contextual understanding to effectively handle complex query requests. This invention leverages the powerful contextual understanding and language modeling capabilities of large-scale models to more effectively decompose queries based on user queries and identify keywords and entities within them, thereby improving the processing capability for complex queries.
[0066] In practice, knowledge base and knowledge graph queries involve significant computational loads, especially for complex queries or those requiring cross-domain reasoning, which can significantly slow down response times. Furthermore, while search engines can provide instant search results, those results may lack accuracy and depth. This invention improves the overall system response speed by optimizing indexing mechanisms, parallel processing, and caching mechanisms, enabling faster results in complex queries and reducing user waiting time.
[0067] The proposed solution integrates search engines, vector knowledge bases, and knowledge graphs to enhance the accuracy, comprehensiveness, and intelligence of information retrieval. Through this integration, retrieval-augmented generation (RAG) technology can more efficiently retrieve, organize, and generate content based on dynamic data and structured knowledge.
[0068] As shown in Figure 2, in some embodiments of the present invention, if there are multiple retrieval methods corresponding to the query content, the step of merging the retrieval results and displaying the merged retrieval results includes:
[0069] Step S410: Deduplication is performed on the result data corresponding to multiple search results;
[0070] Step S420: Sort the deduplicated result data by relevance, and display the data with higher relevance first to obtain the fused search results.
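Steps S410 and S420 can be sketched as follows. The result fields (`text`, `relevance`, `source`) and the normalization rule used to detect duplicates are assumptions; the text does not specify how duplicates are recognized.

```python
from typing import Dict, List

def normalize(text: str) -> str:
    """Collapse case and whitespace so near-identical snippets compare equal."""
    return " ".join(text.lower().split())

def fuse(result_lists: List[List[Dict]]) -> List[Dict]:
    """Deduplicate results from several retrieval methods, then sort by relevance."""
    best: Dict[str, Dict] = {}
    for results in result_lists:
        for item in results:
            key = normalize(item["text"])
            # S410: keep one copy per key, preferring the higher relevance
            if key not in best or item["relevance"] > best[key]["relevance"]:
                best[key] = item
    # S420: higher relevance first
    return sorted(best.values(), key=lambda r: r["relevance"], reverse=True)
```

Exact-match deduplication after normalization misses paraphrased duplicates; the semantic deduplication mentioned later in the description would need a similarity comparison instead.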
[0071] In some embodiments of the present invention, in the step of ranking the deduplicated result data by relevance, a preset scoring model scores each result data item on multiple scoring items. A weighted calculation over these scores yields the ranking weight value of each result data item, and the result data items are ranked by that value.
[0072] In a specific implementation, each result data item is encoded into a scoring vector, and the scoring vector is input into the preset scoring model, which outputs a score for each of the scoring items. The scores are normalized, and each normalized value is weighted by the preset weight value of its item to obtain the ranking weight value of the result data item. The result data items are then ranked by their ranking weight values.
[0073] Specifically, the scoring items may include real-time performance (timeliness), channel importance, effectiveness, geographical location, and so on.
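The normalize-then-weight calculation can be sketched as below. The per-item scores are assumed to have already been produced by the scoring model. The weight values are hypothetical, and min-max scaling is an assumed choice, since the text says only that the scores are normalized.

```python
from typing import Dict, List

# Hypothetical preset weights for the scoring items named in the text.
ITEM_WEIGHTS = {"real_time": 0.4, "channel_importance": 0.3, "effectiveness": 0.2, "location": 0.1}

def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def ranking_weights(item_scores: List[Dict[str, float]]) -> List[float]:
    """Weighted sum of normalized item scores, one value per result."""
    totals = [0.0] * len(item_scores)
    for item, weight in ITEM_WEIGHTS.items():
        normalized = min_max([s[item] for s in item_scores])
        for i, n in enumerate(normalized):
            totals[i] += weight * n
    return totals

def rank(results: List[str], item_scores: List[Dict[str, float]]) -> List[str]:
    """Order results by descending ranking weight value."""
    weights = ranking_weights(item_scores)
    order = sorted(range(len(results)), key=lambda i: weights[i], reverse=True)
    return [results[i] for i in order]
```

Normalizing each item across the candidate results keeps an item with a large raw scale from dominating the weighted sum.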
[0074] Using the above scheme, this scheme calculates the importance of a result based on the scores of multiple items and preset weight values. It can comprehensively assess the importance of the result data from multiple items and display it to users based on its importance. This can effectively combine multiple search channels and ensure the accuracy of users' searches.
[0075] In practice, the retrieval result fusion module is responsible for integrating results from vector knowledge bases, knowledge graphs, and search engines, ensuring that the final output is most useful to the user. First, duplicate results from each source are removed to avoid displaying repetitive content. Results are then ranked based on factors such as source, relevance score, and credibility. Results from knowledge bases and knowledge graphs are typically more accurate and are prioritized, while search engine results are used as supplementary information. The system can also display different types of results (such as concise answers, detailed background information, and external links) in a tiered manner, allowing users to select content for in-depth reading as needed. Through intelligent scoring and weighted fusion of results from different sources using a large model, the system ensures that users receive the optimally integrated content.
[0076] Existing methods for query result fusion primarily rely on single-level or parallel information integration, such as simply combining vector retrieval and knowledge graph results. This approach can lead to conflicting or repetitive results, failing to fully leverage the strengths of different information sources. This invention proposes a multi-level query result fusion mechanism that enables deeper semantic analysis and deduplication of retrieval results from search engines, vector knowledge bases, and knowledge graphs. Through semantic understanding and entity recognition, the system can process similar and repetitive information, ensuring that the final generated answer is optimized and filtered.
[0077] On their own, search engines cannot guarantee the authority and accuracy of all information when crawling web pages, potentially resulting in low-quality or misleading content. While knowledge bases and knowledge graphs provide accurate information, their coverage is limited, especially when dealing with the latest or more specialized fields of knowledge, where they may not provide comprehensive information. This invention integrates the massive amounts of unstructured data from search engines with the high-quality structured data from knowledge bases, ensuring both accuracy and comprehensiveness. By introducing a credibility scoring mechanism, the system prioritizes displaying authoritative information while supplementing areas the knowledge bases do not cover with other data provided by the search engine.
[0078] In some embodiments of the present invention, in the step of displaying the fused search results, structured data and unstructured data are marked by a pre-set result display module, the content of the structured data is displayed, and the links of the unstructured data are displayed.
[0079] In practice, the results display module presents the processed and filtered results to the user. This module is responsible for presenting the aggregated query results in a user-friendly manner. The results display can include structured information cards (such as profiles of people and summaries of events) and unstructured web page links, allowing users to gain further insights.
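The display rule, content for structured data and links for unstructured data, can be sketched as follows. The field names and the `[card]` and `[link]` markers are assumptions chosen for illustration.

```python
from typing import Dict, List

def render(results: List[Dict]) -> List[str]:
    """Mark each result as structured or unstructured and format it for display."""
    lines = []
    for r in results:
        if r["structured"]:
            # Structured data: show the content itself as an information card
            lines.append(f"[card] {r['title']}: {r['content']}")
        else:
            # Unstructured data: show only a link to the source page
            lines.append(f"[link] {r['title']} <{r['url']}>")
    return lines
```

The input order is preserved, so the ranking produced by the fusion step carries through to the display.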
[0080] By employing the above approach, this invention achieves deeper fusion of structured and unstructured data, enhancing the completeness and relevance of information retrieval through a unified knowledge representation and query processing framework. By tightly integrating structured information from knowledge graphs and knowledge bases with unstructured web page information processed by search engines, users can obtain comprehensive, multi-layered answers within a single query.
[0081] In summary, the core idea of this invention is to achieve more efficient and accurate intelligent information retrieval by integrating search engines, vector knowledge bases, and knowledge graphs. By selecting appropriate retrieval methods and integrating retrieval results from multiple sources, the system ensures efficiency and accuracy, providing authoritative, relevant, and comprehensive answers.
[0082] The beneficial effects of this solution include:
[0083] 1. Deeper fusion of structured and unstructured data
[0084] This invention introduces a unified knowledge representation and processing framework that deeply integrates structured information from knowledge bases and knowledge graphs with unstructured web page data crawled by search engines. This integration goes beyond simple consolidation; it also ensures complementary data sources through deduplication, relevance ranking, and credibility scoring, achieving multi-level information integration. This multi-level data fusion provides richer query results, satisfying users' needs for comprehensiveness and relevance.
[0085] 2. Enhanced query comprehension and reasoning abilities
[0086] This invention preprocesses and analyzes user query requests, leveraging the natural language processing capabilities of a large-scale model to deeply analyze the user's query intent. The large-scale model, through its powerful contextual understanding and language modeling capabilities, improves the accuracy of entity recognition and part-of-speech tagging, and can handle queries with complex syntactic structures and semantic ambiguity. In this way, the system can more effectively decompose queries and generate high-precision semantic elements that can be processed by subsequent modules.
[0087] 3. Balancing the accuracy and comprehensiveness of information
[0088] This invention integrates massive amounts of unstructured data from search engines with high-quality structured data from knowledge bases and knowledge graphs, ensuring both the accuracy and comprehensiveness of information. By introducing a credibility scoring mechanism, the system strikes a balance between accuracy and comprehensiveness. Users not only receive information with broad coverage but also reliable and accurate answers when querying. For uncertain or evolving areas, the system can supplement with real-time data crawled by the search engine, ensuring that the information users obtain is both extensive and reliable.
[0089] 4. Improve search efficiency and response speed
[0090] This invention improves the retrieval speed and efficiency of the system by optimizing the indexing mechanism, parallel processing, and caching mechanism, enabling rapid return of results in complex queries and reducing user waiting time. The system establishes an efficient indexing structure for data in the vector knowledge base and knowledge graph, reducing the scanning scope during retrieval through segmented indexing, hierarchical indexing, or topic-based indexing. Parallel processing of different retrieval methods avoids the latency caused by sequential processing, and the parallel architecture reduces waiting time and improves the overall system response speed. For common queries, a caching mechanism reduces the time spent on redundant calculations. If the same or similar queries occur repeatedly within a short period, the system can directly return the cached results without performing a full retrieval again.
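The parallel processing and caching described above can be sketched with the Python standard library. The stub retrievers and the call counter are assumptions for illustration. The cache here matches only identical queries and never expires, so handling of similar queries and freshness for real-time domains are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List, Tuple

# Stub retrievers (assumed); real ones would perform network or index lookups.
def kb_search(q: str) -> List[str]:
    return [f"kb:{q}"]

def web_search(q: str) -> List[str]:
    return [f"web:{q}"]

RETRIEVERS = {"knowledge_base": kb_search, "search_engine": web_search}
CALLS = {"count": 0}  # counts full retrievals, to make cache hits observable

@lru_cache(maxsize=256)
def retrieve_all(query: str, methods: Tuple[str, ...]) -> Tuple[str, ...]:
    """Run the selected retrieval methods in parallel; repeat calls hit the cache."""
    CALLS["count"] += 1
    with ThreadPoolExecutor(max_workers=len(methods)) as pool:
        futures = [pool.submit(RETRIEVERS[m], query) for m in methods]
        # Collect in submission order so the output is deterministic
        merged = [item for f in futures for item in f.result()]
    return tuple(merged)
```

Arguments and return values are tuples because `lru_cache` requires hashable arguments and returning an immutable value prevents callers from altering the cached result.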
[0091] 5. Deep fusion of structured and unstructured data
[0092] This invention proposes a method for fusing structured data (from knowledge bases and knowledge graphs) and unstructured data (such as web pages and text). Through a unified knowledge representation and query processing framework, it improves the completeness and relevance of information retrieval. This fusion method achieves deep integration of structured and unstructured data, providing more comprehensive and multi-layered query results.
[0093] 6. Enhanced query comprehension and reasoning abilities
[0094] This invention enhances the system's understanding and reasoning capabilities for complex queries by combining knowledge graphs and knowledge bases, enabling it to handle multi-step reasoning and cross-domain knowledge query requests. In particular, the system can perform multi-step reasoning based on the entity relationships in the knowledge graph.
[0095] 7. Balancing authority and comprehensiveness
[0096] By integrating unstructured data from search engines with high-quality structured data from knowledge bases, this invention solves the problem of the imbalance between the accuracy and authority of information. Furthermore, through intelligent scoring and weighted fusion of results from different sources using a large model, it ensures that users obtain the optimal integrated content.
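The scoring and weighted fusion mentioned above can be sketched as a weighted sum over per-item scores. The following Python example is illustrative only. The criterion names (`relevance`, `credibility`, `freshness`), the weight values, and the `Result` type are assumptions, and the scores are taken as given inputs rather than produced by a large model.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Result:
    text: str
    source: str               # e.g. "knowledge_base" or "search_engine"
    scores: Dict[str, float]  # criterion name -> score in [0, 1]

# Hypothetical criteria and weights; the source does not specify them.
WEIGHTS: Dict[str, float] = {"relevance": 0.5, "credibility": 0.3, "freshness": 0.2}

def ranking_weight(result: Result, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the per-criterion scores; missing criteria count as 0."""
    return sum(w * result.scores.get(name, 0.0) for name, w in weights.items())

def rank(results: List[Result], weights: Dict[str, float] = WEIGHTS) -> List[Result]:
    """Sort results so that the highest ranking weight comes first."""
    return sorted(results, key=lambda r: ranking_weight(r, weights), reverse=True)
```

Under these assumed weights, a highly relevant web result can outrank a more credible but less relevant knowledge base entry; changing the weights shifts that balance.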
[0097] 8. Multi-level query result fusion mechanism
[0098] This invention proposes a multi-level query result fusion mechanism that can perform semantic analysis and information deduplication on retrieval results from different information sources, avoiding information conflicts and duplication. The multi-level query result fusion mechanism particularly focuses on result optimization and deduplication based on semantic understanding and entity recognition.
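As a rough illustration of the deduplication step, the Python sketch below drops results that are near-duplicates of an already kept result. It substitutes a simple token-set Jaccard similarity for the semantic analysis and entity recognition described above, so it is a simplified stand-in and not the mechanism of the invention. The `threshold` value of 0.8 is an assumption.

```python
import re
from typing import List, Set

def tokens(text: str) -> Set[str]:
    """Lowercased word tokens; a crude stand-in for semantic features."""
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def deduplicate(results: List[str], threshold: float = 0.8) -> List[str]:
    """Keep each result only if it is not too similar to one already kept."""
    kept: List[str] = []
    kept_tokens: List[Set[str]] = []
    for text in results:
        current = tokens(text)
        if all(jaccard(current, seen) < threshold for seen in kept_tokens):
            kept.append(text)
            kept_tokens.append(current)
    return kept
```

The first occurrence of each near-duplicate group is kept, so ordering the input by source priority (for example, knowledge base results before web results) would determine which copy survives.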
[0099] 9. Optimized response speed and processing efficiency
[0100] By optimizing the index structure in the vector knowledge base and knowledge graph, and by implementing parallel processing of multiple retrieval mechanisms and query caching mechanisms, the response speed and processing efficiency of complex queries have been improved.
[0101] This invention also provides a large model retrieval enhancement system based on a fusion search engine. The system includes a computer device, which includes a processor and a memory. The memory stores computer instructions, and the processor executes the computer instructions stored in the memory. When the computer instructions are executed by the processor, the system implements the steps of the method described above.
[0102] This invention also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the aforementioned large-model retrieval enhancement method based on a fusion search engine. The computer-readable storage medium can be a tangible storage medium, such as random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, floppy disks, hard disks, removable storage disks, CD-ROMs, or any other form of storage medium known in the art.
[0103] Those skilled in the art will understand that the exemplary components, systems, and methods described in conjunction with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether they are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this invention. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this invention are programs or code segments used to perform the desired tasks. The programs or code segments can be stored in a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried in a carrier wave.
[0104] It should be clarified that the present invention is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of the present invention is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of the present invention.
[0105] In this invention, features described and/or illustrated for one embodiment may be used in the same or similar manner in one or more other embodiments, and/or combined with or in place of features of other embodiments.
[0106] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, various modifications and variations of the embodiments of the present invention are possible. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A large model retrieval enhancement method based on a fusion search engine, characterized in that the method comprises the following steps: obtaining a query request, and extracting keywords and entities from the query content in the query request; determining a query domain based on the keywords and entities of the query content, and determining at least one corresponding retrieval method based on the query domain; retrieving the query content using the corresponding retrieval method to obtain retrieval results; and, if multiple retrieval methods correspond to the query content, fusing the retrieval results and displaying the fused retrieval results.
2. The large model retrieval enhancement method based on a fusion search engine according to claim 1, characterized in that, in the step of obtaining a query request and extracting keywords and entities from the query content in the query request, a user query input module receives the query request and determines whether the query request is a text request or a voice request; if the query request is a voice request, the voice data of the voice request is converted into text data, and the text data is used as the query content.
3. The large model retrieval enhancement method based on a fusion search engine according to claim 1, characterized in that the step of obtaining a query request and extracting keywords and entities from the query content in the query request further comprises: segmenting the query content into words using a query analysis module, and identifying the keywords and entities in the segmented text.
4. The large model retrieval enhancement method based on a fusion search engine according to claim 1, characterized in that, in the step of determining the query domain based on the keywords and entities of the query content, an input vector is constructed based on the keywords and entities and is input into a pre-trained recognition model, which outputs the corresponding query domain.
5. The large model retrieval enhancement method based on a fusion search engine according to claim 4, characterized in that, in the step of determining at least one corresponding retrieval method based on the query domain, the query domain includes a stable domain, a slowly changing domain, a rapidly changing domain, and a real-time domain, and the retrieval method includes a knowledge base, a knowledge graph, and a search engine.
6. The large model retrieval enhancement method based on a fusion search engine according to claim 5, characterized in that, in the step of retrieving the query content using the corresponding retrieval method: if the query domain of the query content is a stable domain or a slowly changing domain, a knowledge base or a knowledge graph is used for retrieval; and if the query domain of the query content is a rapidly changing domain or a real-time domain, a search engine is used for retrieval.
7. The large model retrieval enhancement method based on a fusion search engine according to any one of claims 1-6, characterized in that, in the step of fusing the retrieval results and displaying the fused retrieval results if multiple retrieval methods correspond to the query content, the result data corresponding to the multiple retrieval results are deduplicated, the deduplicated result data are sorted by relevance, and results with higher relevance are displayed first, thereby obtaining the fused retrieval results.
8. The large model retrieval enhancement method based on a fusion search engine according to claim 7, characterized in that, in the step of sorting the deduplicated result data by relevance, a preset scoring model is used to score each result data item on multiple scoring items; a weighted calculation is performed on the scores of the multiple scoring items to obtain the ranking weight value corresponding to each result data item; and the result data items are ranked based on their ranking weight values.
9. The large model retrieval enhancement method based on a fusion search engine according to claim 1, characterized in that, in the step of displaying the fused retrieval results, structured data and unstructured data are marked by a preset result display module; the content of the structured data is displayed, and links to the unstructured data are displayed.
10. A large model retrieval enhancement system based on a fusion search engine, characterized in that the system comprises a computer device comprising a processor and a memory, the memory storing computer instructions and the processor being configured to execute the computer instructions stored in the memory, wherein the system implements the steps of the method according to any one of claims 1 to 9 when the computer instructions are executed by the processor.
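For illustration only, and without limiting the claims, the selection of a retrieval method by query domain recited in claims 5 and 6 can be sketched in Python as follows. The enum and function names are hypothetical. Returning both the knowledge base and the knowledge graph for stable and slowly changing domains is an assumption, since claim 6 requires only one of the two.

```python
from enum import Enum
from typing import List

class Domain(Enum):
    STABLE = "stable"
    SLOWLY_CHANGING = "slowly_changing"
    RAPIDLY_CHANGING = "rapidly_changing"
    REAL_TIME = "real_time"

class Method(Enum):
    KNOWLEDGE_BASE = "knowledge_base"
    KNOWLEDGE_GRAPH = "knowledge_graph"
    SEARCH_ENGINE = "search_engine"

def select_methods(domain: Domain) -> List[Method]:
    """Map a query domain to the retrieval method(s) used for it."""
    if domain in (Domain.STABLE, Domain.SLOWLY_CHANGING):
        # Assumption: query both structured sources and fuse the results later.
        return [Method.KNOWLEDGE_BASE, Method.KNOWLEDGE_GRAPH]
    # Rapidly changing and real-time domains rely on the search engine.
    return [Method.SEARCH_ENGINE]
```

In this sketch the query domain is assumed to have already been output by the pre-trained recognition model of claim 4.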