Text retrieval method and device, computer device and storage medium

By extracting and reconstructing semantic features from the text to be retrieved, the problem of low retrieval accuracy caused by incomplete user input is solved, achieving higher retrieval accuracy and more complete semantic expression.

CN117251528B · Active · Publication Date: 2026-06-16 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date
2023-09-22
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing text retrieval methods suffer from low accuracy due to incomplete user input of the text to be retrieved.

Method used

By extracting and reconstructing semantic features from the text to be retrieved, target semantic features are obtained. The completeness of the target semantic features is greater than that of the initial semantic features. Based on the target semantic features, the matching target text is determined from multiple candidate texts.

Benefits of technology

It improves the accuracy of text retrieval, fully explores users' retrieval needs, and more completely expresses the semantic information of the text to be retrieved.

✦ Generated by Eureka AI based on patent content.

Figure CN117251528B_ABST

Abstract

The application relates to the technical field of computers and provides a text retrieval method and device, computer equipment, a storage medium and a computer program product; it can also be used in the field of financial technology or other related fields. The method comprises the following steps: acquiring a text to be retrieved; performing semantic feature extraction processing on the text to be retrieved to obtain initial semantic features of the text to be retrieved; performing reconstruction processing on the initial semantic features to obtain target semantic features of the text to be retrieved, the completeness of the semantics represented by the target semantic features being greater than the completeness of the semantics represented by the initial semantic features; and determining, based on the target semantic features, a target text matching the text to be retrieved from a plurality of candidate texts. The method can improve the accuracy of text retrieval.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a text retrieval method, apparatus, computer equipment, storage medium, and computer program product.

Background Technology

[0002] Text retrieval technology is widely used in information retrieval, search engines, and other scenarios. Current text retrieval usually relies on the word-frequency features of the text entered by the user and retrieves, from the candidate texts, the target text that matches the text to be retrieved.

[0003] However, the text entered by users is often incomplete and cannot accurately describe their search needs, which easily leads to low text retrieval accuracy.

Summary of the Invention

[0004] Therefore, it is necessary to provide a text retrieval method, apparatus, computer equipment, computer-readable storage medium, and computer program product that can improve retrieval accuracy to address the aforementioned technical problem of low retrieval accuracy.

[0005] Firstly, this application provides a text retrieval method, including:

[0006] The text to be retrieved is acquired;

[0007] Semantic feature extraction is performed on the text to be retrieved to obtain the initial semantic features of the text to be retrieved;

[0008] The initial semantic features are reconstructed to obtain the target semantic features of the text to be retrieved, the semantic completeness represented by the target semantic features being greater than that represented by the initial semantic features;

[0009] Based on the target semantic features, the target text that matches the text to be retrieved is determined from multiple candidate texts.

[0010] In one embodiment, the process of reconstructing the initial semantic features to obtain the target semantic features of the text to be retrieved includes:

[0011] Based on the feature reconstruction model, the initial semantic features are reconstructed to obtain reconstructed semantic features that satisfy the complete semantic conditions;

[0012] Based on the feature space of the initial semantic features, the feature space of the reconstructed semantic features is constrained to obtain the reconstructed semantic features with constrained feature space.

[0013] The reconstructed semantic features constrained by the feature space are used as the target semantic features of the text to be retrieved.

[0014] In one embodiment, reconstructing the initial semantic features based on the feature reconstruction model to obtain reconstructed semantic features that satisfy the complete semantic condition includes:

[0015] Based on each mapping layer in the feature reconstruction model, the initial semantic features are mapped to obtain the mapped semantic features corresponding to each mapping layer;

[0016] Each mapped semantic feature is reconstructed separately to obtain multiple candidate reconstructed semantic features;

[0017] From the multiple candidate reconstructed semantic features, the reconstructed semantic features that satisfy the complete semantic conditions are selected.

[0018] In one embodiment, the step of selecting the reconstructed semantic features that satisfy the complete semantic condition from the plurality of candidate reconstructed semantic features includes:

[0019] The multiple candidate reconstructed semantic features are normalized respectively to obtain the normalized value of each candidate reconstructed semantic feature;

[0020] From the multiple candidate reconstructed semantic features, candidate reconstructed semantic features with normalized values greater than a preset normalization threshold are selected as the reconstructed semantic features that satisfy the complete semantic condition.

[0021] In one embodiment, the step of performing semantic feature extraction processing on the text to be retrieved to obtain the initial semantic features of the text to be retrieved includes:

[0022] The text to be retrieved is input into a pre-trained feature filter to obtain multiple sets of candidate semantic features of the text to be retrieved.

[0023] Determine the similarity between each group of candidate semantic features and the sample semantic features of the training samples of the feature filter; the sample semantic features of the training samples satisfy the complete semantic condition;

[0024] From the multiple sets of candidate semantic features, candidate semantic features whose similarity satisfies the preset similarity conditions are selected as the initial semantic features of the text to be retrieved.

[0025] In one embodiment, the multiple sets of candidate semantic features of the text to be retrieved are obtained in the following way:

[0026] Obtain multiple sentences from the text to be retrieved;

[0027] Extracting, from each sentence, the word segments that meet the preset semantic condition;

[0028] Obtaining, based on the word segments of each sentence that meet the preset semantic condition, the candidate semantic features corresponding to each sentence, and combining the candidate semantic features of the sentences to form multiple sets of candidate semantic features of the text to be retrieved.
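Steps [0025] to [0028] leave both the "preset semantic condition" and the combination rule unspecified, so the following is only a minimal sketch under stated assumptions. It splits the text on English and Chinese sentence-ending punctuation, uses a hypothetical stopword filter as a stand-in for the preset semantic condition, and reads "combined to form multiple sets" as taking combinations of up to `max_size` sentence features. `STOPWORDS`, `sentence_feature`, and `candidate_feature_sets` are illustrative names, not part of the patent; real Chinese text would also need a proper word segmenter.

```python
import itertools
import re

# Hypothetical stand-in for the "preset semantic condition": drop stopwords.
# The patent does not define this condition.
STOPWORDS = {"the", "a", "an", "and", "in", "then", "to", "by", "was", "of"}


def split_sentences(text: str) -> list[str]:
    """Split on English and Chinese sentence-ending punctuation."""
    parts = re.split(r"[.!?。！？]+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_feature(sentence: str) -> tuple[str, ...]:
    """Keep the word segments that pass the (hypothetical) semantic condition."""
    tokens = re.findall(r"\w+", sentence.lower())
    return tuple(t for t in tokens if t not in STOPWORDS)


def candidate_feature_sets(text: str, max_size: int = 2) -> list[tuple[tuple[str, ...], ...]]:
    """Combine per-sentence features into multiple candidate sets (one reading of [0028])."""
    features = [f for f in map(sentence_feature, split_sentences(text)) if f]
    sets: list[tuple[tuple[str, ...], ...]] = []
    for size in range(1, min(max_size, len(features)) + 1):
        sets.extend(itertools.combinations(features, size))
    return sets
```

For "Xiaoming went to the park. We rented a boat." this yields two single-sentence sets plus one set combining both sentences.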

[0029] In one embodiment, determining the target text matching the text to be retrieved from multiple candidate texts based on the target semantic features includes:

[0030] Based on the target semantic features, the matching degree between the text to be retrieved and each candidate text is determined;

[0031] From the multiple candidate texts, candidate texts whose matching degree meets the preset matching degree conditions are selected as target texts that match the text to be retrieved.

[0032] Secondly, this application also provides a text retrieval device, comprising:

[0033] The text acquisition module is used to acquire the text to be retrieved.

[0034] The feature extraction module is used to perform semantic feature extraction processing on the text to be retrieved to obtain the initial semantic features of the text to be retrieved;

[0035] The feature reconstruction module is used to reconstruct the initial semantic features to obtain the target semantic features of the text to be retrieved; the semantic completeness represented by the target semantic features is greater than the semantic completeness represented by the initial semantic features.

[0036] The text retrieval module is used to determine the target text that matches the text to be retrieved from multiple candidate texts based on the target semantic features.

[0037] Thirdly, this application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:

[0038] The text to be retrieved is acquired;

[0039] Semantic feature extraction is performed on the text to be retrieved to obtain the initial semantic features of the text to be retrieved;

[0040] The initial semantic features are reconstructed to obtain the target semantic features of the text to be retrieved, the semantic completeness represented by the target semantic features being greater than that represented by the initial semantic features;

[0041] Based on the target semantic features, the target text that matches the text to be retrieved is determined from multiple candidate texts.

[0042] Fourthly, this application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the following steps:

[0043] The text to be retrieved is acquired;

[0044] Semantic feature extraction is performed on the text to be retrieved to obtain the initial semantic features of the text to be retrieved;

[0045] The initial semantic features are reconstructed to obtain the target semantic features of the text to be retrieved, the semantic completeness represented by the target semantic features being greater than that represented by the initial semantic features;

[0046] Based on the target semantic features, the target text that matches the text to be retrieved is determined from multiple candidate texts.

[0047] Fifthly, this application also provides a computer program product, including a computer program that, when executed by a processor, performs the following steps:

[0048] The text to be retrieved is acquired;

[0049] Semantic feature extraction is performed on the text to be retrieved to obtain the initial semantic features of the text to be retrieved;

[0050] The initial semantic features are reconstructed to obtain the target semantic features of the text to be retrieved, the semantic completeness represented by the target semantic features being greater than that represented by the initial semantic features;

[0051] Based on the target semantic features, the target text that matches the text to be retrieved is determined from multiple candidate texts.

[0052] The aforementioned text retrieval method, apparatus, computer equipment, storage medium, and computer program product first acquire the text to be retrieved; then perform semantic feature extraction on the text to be retrieved to obtain initial semantic features; next, reconstruct the initial semantic features to obtain target semantic features of the text to be retrieved, the semantic completeness represented by the target semantic features being greater than that represented by the initial semantic features; and finally, based on the target semantic features, determine the target text matching the text to be retrieved from multiple candidate texts. In this way, reconstructing the initial semantic features of the text to be retrieved yields target semantic features that represent its semantics more completely, enriching the feature expression of the text beyond the initial semantic features and fully exploring the user's retrieval needs; searching the candidate texts with target semantic features that fully describe those needs then improves the accuracy of text retrieval.

Attached Figure Description

[0053] To more clearly illustrate the technical solutions in the embodiments or related technologies of this application, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0054] Figure 1 is a flowchart of a text retrieval method in one embodiment;

[0055] Figure 2 is a flowchart of the steps of reconstructing the initial semantic features to obtain the target semantic features of the text to be retrieved in one embodiment;

[0056] Figure 3 is a flowchart of the steps of reconstructing the initial semantic features to obtain reconstructed semantic features that satisfy the complete semantic condition in one embodiment;

[0057] Figure 4 is a flowchart of a text retrieval method in another embodiment;

[0058] Figure 5 is a structural block diagram of a text retrieval device in one embodiment;

[0059] Figure 6 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0060] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0061] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data shall comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0062] It should also be noted that the text retrieval methods, systems, devices, computer equipment, storage media, and computer program products provided in this application can be applied to the financial technology field, such as retrieving information in the business operation process of business personnel or the development operation process of developers; they can also be used in other related fields, such as in the field of computer technology, to optimize the retrieval accuracy of search engines.

[0063] In one exemplary embodiment, as shown in Figure 1, a text retrieval method is provided. This embodiment illustrates the method as applied to a server. It is understood that the method can also be applied to a terminal, or to a system that includes a server and a terminal, in which case it is implemented through the interaction between the server and the terminal. The server can be a standalone server or a server cluster composed of multiple servers. The terminal can be, but is not limited to, a personal computer, laptop, smartphone, or tablet.

[0064] In this embodiment, the method includes the following steps:

[0065] Step S102: Obtain the text to be retrieved.

[0066] The text to be retrieved refers to the text entered by the user, such as one or more keywords, one or more sentences, or one or more paragraphs.

[0067] Specifically, the server receives the text entered by the user through the text input box on the search page and uses the entered text as the text to be retrieved.

[0068] For example, when a user needs to perform a search, they enter the keywords, sentences, or paragraphs they want to search for in the text input box on the search page and trigger the search operation after completing the input; when the server detects the search operation triggered by the user, it uses the keywords, sentences, or paragraphs in the text input box as the text to be retrieved.

[0069] Step S104: Perform semantic feature extraction processing on the text to be retrieved to obtain the initial semantic features of the text to be retrieved.

[0070] Here, the initial semantic features characterize at least part of the semantics of the text to be retrieved and can therefore reflect part of the user's retrieval needs. In the early stage of retrieval, users often search with incomplete information: what they know is insufficient, and their retrieval needs may even be unclear. The text to be retrieved is therefore often incomplete, and the results retrieved directly from the user's text are often not what the user expects.

[0071] Specifically, the server inputs the text to be retrieved into a pre-trained feature filter, and obtains one or more initial semantic features of the text to be retrieved through the feature filter.

[0072] For example, taking the text to be retrieved as "The weather was sunny, and in the afternoon, Xiaoming and Xiaoli went boating in the park. We planned to rent a small boat by the lake and then row in the center of the lake," the server, through a feature filter, can obtain the initial semantic features of the text to be retrieved as "Xiaoming and Xiaoli rowing a boat," "we rented a boat," "we went to the center of the lake," "Xiaoming and Xiaoli went to the park," etc.

[0073] The pre-trained feature filter is obtained by training a multilayer perceptron with an embedded feature selection method. Its training samples are constructed according to the complete semantic condition, which refers to the shortest-complete-semantics principle: each training sample must contain four elements, namely the necessary adverbials (adverbials of time and place), a subject, an action predicate, and an object.
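The complete semantic condition in [0073] is essentially a four-element check on each training sample. The sketch below assumes that samples already carry sentence-element labels; the `LabeledSample` fields and the helper names are hypothetical, and a real pipeline would obtain these labels from annotation or a syntactic parser rather than by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledSample:
    """A training sample with hypothetical sentence-element labels."""
    text: str
    adverbial: Optional[str]  # necessary adverbial of time and/or place
    subject: Optional[str]
    predicate: Optional[str]  # action predicate
    obj: Optional[str]        # object


def meets_complete_semantic_condition(sample: LabeledSample) -> bool:
    """True only if all four required elements are present and non-empty."""
    return all((sample.adverbial, sample.subject, sample.predicate, sample.obj))


def build_training_set(samples: list[LabeledSample]) -> list[LabeledSample]:
    """Keep only the samples that satisfy the complete semantic condition."""
    return [s for s in samples if meets_complete_semantic_condition(s)]


# Illustrative samples: the first has all four elements, the second lacks two.
complete = LabeledSample(
    "In the afternoon, Xiaoming rowed a boat in the park",
    "in the afternoon, in the park", "Xiaoming", "rowed", "a boat",
)
partial = LabeledSample("Xiaoming rowed", None, "Xiaoming", "rowed", None)
```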

[0074] Step S106: Reconstruct the initial semantic features to obtain the target semantic features of the text to be retrieved.

[0075] Here, the semantic completeness represented by the target semantic features is greater than that represented by the initial semantic features: compared with the initial semantic features, the target semantic features reflect the user's retrieval needs more completely, that is, they carry more content information than the initial semantic features.

[0076] The reconstruction process refers to mapping the initial semantic features to multiple semantic feature subspaces to obtain the semantic features of the text to be retrieved under different semantic feature subspaces.

[0077] Specifically, the server reconstructs the initial semantic features using a GRU (Gated Recurrent Unit) to obtain one or more reconstructed semantic features, and then obtains the target semantic features of the text to be retrieved from the initial semantic features and the reconstructed semantic features.
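The patent names a GRU but does not give its configuration. As a rough illustration of what one GRU step computes, here is a textbook fully gated GRU cell in pure Python. It assumes the m initial semantic feature vectors are fed in as a sequence and each hidden state is taken as one reconstructed feature; the random, untrained weights, the omitted biases, and the helper names are all illustrative assumptions.

```python
import math
import random


def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    """Multiply matrix W by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


class GRUCell:
    """Minimal fully gated GRU cell with random (untrained) weights, no biases."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input (W) and recurrent (U) weights for update gate z, reset gate r, candidate.
        self.Wz, self.Uz = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wr, self.Ur = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wh, self.Uh = mat(hidden_size, input_size), mat(hidden_size, hidden_size)

    def step(self, x: list[float], h: list[float]) -> list[float]:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # One common convention: h_t = (1 - z) * h_{t-1} + z * candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def reconstruct(initial_features: list[list[float]], cell: GRUCell) -> list[list[float]]:
    """Run the GRU over the initial feature vectors; return one hidden state per step."""
    h = [0.0] * cell.hidden_size
    outputs = []
    for x in initial_features:
        h = cell.step(x, h)
        outputs.append(h)
    return outputs
```

Because each new state is a convex combination of the previous state and a `tanh` output, every reconstructed value stays in (-1, 1). Gate conventions differ between references and frameworks, so treat the update rule as one common choice.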

[0078] Step S108: Based on the target semantic features, determine the target text that matches the text to be retrieved from multiple candidate texts.

[0079] Specifically, the server determines the matching degree between the text to be retrieved and each candidate text based on the target semantic features of the text to be retrieved and the semantic features of each candidate text; then, based on the matching degree, it identifies the target text that matches the text to be retrieved from multiple candidate texts, and displays the target text as the search result to the user through the search page.

[0080] The semantic features of candidate texts can be obtained through feature filtering or by labeling the candidate texts.

[0081] For example, given ten candidate texts, the server computes the matching degree between the text to be retrieved and each candidate text, selects the three candidate texts with the highest matching degree as target texts, and displays these three target texts on the search page for the user to view.
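The patent does not say how the matching degree in [0079] to [0081] is computed. A minimal sketch, assuming the target semantic features and each candidate text's semantic features are already fixed-length vectors and using cosine similarity as the matching degree, ranks the candidates and keeps the top k (k = 3 in the example). `top_k_matches` and the candidate dictionary are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_matches(query: list[float], candidates: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the names of the k candidates with the highest matching degree."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]
```

With the query `[1.0, 0.0]` and candidates `a=[1, 0]`, `b=[0, 1]`, `c=[1, 1]`, `d=[-1, 0]`, the ranking is `a`, `c`, `b`, so `d` is not returned.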

[0082] In the aforementioned text retrieval method, the server first obtains the text to be retrieved; then, it performs semantic feature extraction on the text to obtain initial semantic features; next, it reconstructs the initial semantic features to obtain target semantic features; the semantic completeness represented by the target semantic features is greater than that represented by the initial semantic features; finally, based on the target semantic features, the target text matching the text to be retrieved is determined from multiple candidate texts. In this way, by reconstructing the initial semantic features of the text to be retrieved, the server can obtain target semantic features that more completely represent the semantics of the text to be retrieved, thus enriching the feature expression of the text to be retrieved based on the initial semantic features, fully exploring the user's retrieval needs; and by searching the candidate texts based on target semantic features that fully describe the retrieval needs, the accuracy of text retrieval can be improved.

[0083] As shown in Figure 2, in an exemplary embodiment, step S106 above, which reconstructs the initial semantic features to obtain the target semantic features of the text to be retrieved, specifically includes the following steps:

[0084] Step S202: Based on the feature reconstruction model, the initial semantic features are reconstructed to obtain reconstructed semantic features that satisfy the complete semantic conditions.

[0085] Step S204: Based on the feature space of the initial semantic features, constrain the feature space of the reconstructed semantic features to obtain the reconstructed semantic features after feature space constraint.

[0086] Step S206: The reconstructed semantic features constrained by the feature space are used as the target semantic features of the text to be retrieved.

[0087] The feature reconstruction model is a GRU (Gated Recurrent Unit) model.

[0088] Because the text entered by the user may be incomplete, the initial semantic features may not satisfy the complete semantic condition. The semantic information of the initial semantic features therefore needs to be optimized through reconstruction processing to obtain reconstructed semantic features that satisfy the complete semantic condition. The reconstructed semantic features carry more content information than the initial semantic features.

[0089] Specifically, based on the feature reconstruction model, the server maps the initial semantic features into different semantic feature subspaces to enrich and optimize their semantic information, obtaining reconstructed semantic features of the text to be retrieved that satisfy the complete semantic condition in the different subspaces. Because the reconstructed semantic features and the initial semantic features reside in different semantic feature subspaces, the server also uses the subspace of the initial semantic features to constrain the subspace of the reconstructed semantic features, so that both lie in the same semantic feature space and the reconstructed semantic features remain within the semantic feature space of the text to be retrieved. The server can combine the initial semantic features and the reconstructed semantic features by pairwise (element-wise) addition to constrain the feature space of the reconstructed semantic features. The server then uses the feature-space-constrained reconstructed semantic features as the target semantic features of the text to be retrieved.
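The feature-space constraint in [0089] combines the initial and reconstructed semantic features by pairwise addition, which in effect works like a residual connection. The sketch below assumes both features have already been brought to the same shape (a list of equal-length vectors), since element-wise addition is only defined in that case; how the patent aligns the shapes is not stated. `constrain_feature_space` is a hypothetical name.

```python
def constrain_feature_space(
    initial: list[list[float]], reconstructed: list[list[float]]
) -> list[list[float]]:
    """Element-wise sum of initial and reconstructed features (shapes must match)."""
    if len(initial) != len(reconstructed) or any(
        len(a) != len(b) for a, b in zip(initial, reconstructed)
    ):
        raise ValueError("initial and reconstructed features must have the same shape")
    # Adding the initial features anchors the result in their feature space.
    return [[a + b for a, b in zip(row_i, row_r)] for row_i, row_r in zip(initial, reconstructed)]
```

For example, `[[1.0, 2.0], [3.0, 4.0]]` plus `[[0.5, 0.5], [-1.0, 0.0]]` gives `[[1.5, 2.5], [2.0, 4.0]]`, which would then serve as the target semantic features.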

[0090] In this embodiment, the server can enrich the semantic features of the text to be retrieved by reconstructing the initial semantic features through the feature reconstruction model, thereby better mining the user's retrieval needs, more completely expressing the semantic information of the text to be retrieved, and thus achieving more accurate retrieval and improving the retrieval accuracy of text retrieval.

[0091] As shown in Figure 3, in an exemplary embodiment, step S202 above, which reconstructs the initial semantic features based on the feature reconstruction model to obtain reconstructed semantic features that satisfy the complete semantic condition, specifically includes the following steps:

[0092] Step S302: Based on each mapping layer in the feature reconstruction model, the initial semantic features are mapped to obtain the mapped semantic features corresponding to each mapping layer.

[0093] Step S304: Reconstruct each mapped semantic feature separately to obtain multiple candidate reconstructed semantic features.

[0094] Step S306: Select the reconstructed semantic features that meet the complete semantic conditions from multiple candidate reconstructed semantic features.

[0095] Each mapping layer corresponds to a semantic feature subspace.

[0096] Specifically, the server feeds the initial semantic features into each mapping layer of the feature reconstruction model, which maps them into the semantic feature subspace corresponding to that layer, thus obtaining the mapped semantic features under each semantic feature subspace, i.e., the mapped semantic features corresponding to each mapping layer. The server then performs channel merging on the mapped semantic features and reconstructs the channel-merged mapped semantic features to obtain multiple candidate reconstructed semantic features. Because the candidate reconstructed semantic features do not necessarily satisfy the complete semantic condition, the reconstructed semantic features that satisfy it must then be selected from the multiple candidates.

[0097] For example, suppose the initial semantic features are $X \in \mathbb{R}^{m \times v}$, where $\mathbb{R}^{m \times v}$ denotes the semantic feature subspace in which the initial semantic features are located, m is the number of initial semantic features, and v is the feature length of the initial semantic features.

[0098] Through the mapping layers, the server obtains the mapped semantic features $X' \in \mathbb{R}^{m \times n \times l}$, where $\mathbb{R}^{m \times n \times l}$ denotes the semantic feature subspace in which the mapped semantic features are located, n is the number of mapped semantic features (i.e., the number of mapping layers), and l is the feature length of the mapped semantic features.

[0099] The server then merges the first two channels of the mapped semantic features, that is, the m channel and the n channel, to obtain the channel-merged mapped semantic features $X' \in \mathbb{R}^{(m \times n) \times p}$, where p is the feature length of the target reconstructed semantic features to be obtained.

[0100] Next, the server performs reconstruction processing on the channel-merged mapped semantic features $X' \in \mathbb{R}^{(m \times n) \times p}$ and restores the m and n channels, obtaining multiple candidate reconstructed semantic features $X'' \in \mathbb{R}^{m \times n \times p}$, where $\mathbb{R}^{m \times n \times p}$ denotes the semantic feature subspace in which the candidate reconstructed semantic features are located.

[0101] Finally, from the multiple candidate reconstructed semantic features $X'' \in \mathbb{R}^{m \times n \times p}$, the server selects the reconstructed semantic features that satisfy the complete semantic condition.
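The shape bookkeeping in [0097] to [0101] (map, merge channels, reconstruct, restore channels) can be traced with a small pure-Python sketch. Every weight here is a hypothetical random linear map: one v-to-l map per mapping layer, plus a single l-to-p map standing in for the reconstruction step, which the patent performs with a GRU. The source lists the channel-merged tensor with feature length p; this sketch assumes that the change from l to p happens in the reconstruction step, which matches the text when l = p.

```python
import random


def linear(W: list[list[float]], x: list[float]) -> list[float]:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def map_merge_reconstruct(X: list[list[float]], n: int, l: int, p: int, seed: int = 0):
    """X: m x v initial features -> m x n x p candidate reconstructed features."""
    rng = random.Random(seed)
    m, v = len(X), len(X[0])
    # Hypothetical: one random v -> l linear map per mapping layer (subspace).
    mapping_layers = [random_matrix(l, v, rng) for _ in range(n)]
    # Hypothetical stand-in for the reconstruction step (a GRU in the patent).
    recon = random_matrix(p, l, rng)
    # 1. Map: mapped[i][j] = mapping_layers[j] @ X[i]          -> m x n x l
    mapped = [[linear(W, x) for W in mapping_layers] for x in X]
    # 2. Merge the m and n channels; row index is i * n + j     -> (m*n) x l
    merged = [vec for row in mapped for vec in row]
    # 3. Reconstruct each merged row                            -> (m*n) x p
    rebuilt = [linear(recon, vec) for vec in merged]
    # 4. Restore the m and n channels                           -> m x n x p
    return [rebuilt[i * n:(i + 1) * n] for i in range(m)]
```

Calling `map_merge_reconstruct(X, n=4, l=5, p=6)` on a 2 x 3 input returns a nested list of shape 2 x 4 x 6.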

[0102] In this embodiment, the server uses a feature reconstruction model to perform mapping, channel merging, reconstruction, and channel recovery processing on the initial semantic features. This allows the server to obtain candidate reconstructed semantic features of the text to be retrieved under different semantic feature subspaces. By filtering the candidate reconstructed semantic features based on complete semantic conditions, the server obtains reconstructed semantic features that meet the complete semantic conditions. This enriches the semantic features of the text to be retrieved, better uncovers the user's retrieval needs, and more completely expresses the semantic information of the text to be retrieved, thereby achieving more accurate retrieval and improving the retrieval accuracy of text retrieval.

[0103] In an exemplary embodiment, step S306 above, which selects reconstructed semantic features that satisfy the complete semantic condition from multiple candidate reconstructed semantic features, specifically includes the following: normalizing the multiple candidate reconstructed semantic features to obtain a normalized value for each candidate reconstructed semantic feature; and selecting, from the multiple candidate reconstructed semantic features, those whose normalized values are greater than a preset normalization threshold as the reconstructed semantic features that satisfy the complete semantic condition.

[0104] Specifically, the server normalizes the multiple candidate reconstructed semantic features along the mapping-layer dimension using a normalization function such as the softmax function, obtaining a normalized value (e.g., a softmax value) for each candidate reconstructed semantic feature. The server then takes the candidate reconstructed semantic features whose normalized values are greater than the preset normalization threshold as the reconstructed semantic features that satisfy the complete semantic condition.

[0105] For example, for the candidate reconstructed semantic features $X'' \in \mathbb{R}^{m \times n \times p}$, the server computes softmax values along the second channel, i.e., the n channel. For candidate reconstructed semantic features whose softmax values are below the normalization threshold, the corresponding n-channel values are set to 0. The first two channels, the m channel and the n channel, are then transposed to obtain the reconstructed semantic features $X'' \in \mathbb{R}^{n \times m \times p}$ that satisfy the complete semantic condition.
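The softmax filter in [0103] to [0105] needs one scalar score per candidate before normalizing over the n channel, and the patent does not say how that scalar is formed. This sketch assumes the mean of each candidate's p values as a hypothetical score. It keeps candidates whose softmax value is strictly greater than the threshold (per [0103]), zeroes the rest, and then transposes the m and n channels.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def filter_and_transpose(X2: list[list[list[float]]], threshold: float):
    """X2: m x n x p candidates -> n x m x p, with low-scoring candidates zeroed."""
    m, n, p = len(X2), len(X2[0]), len(X2[0][0])
    kept = []
    for row in X2:  # row holds the n candidates derived from one initial feature
        # Hypothetical scalar score per candidate: the mean of its p values.
        probs = softmax([sum(vec) / p for vec in row])
        kept.append([vec if prob > threshold else [0.0] * p for vec, prob in zip(row, probs)])
    # Transpose the m and n channels: result[j][i] = kept[i][j].
    return [[kept[i][j] for i in range(m)] for j in range(n)]
```

With one initial feature whose three candidates score 1, 0 and 3, the softmax values are roughly 0.114, 0.042 and 0.844, so a threshold of 0.1 zeroes only the middle candidate.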

[0106] In this embodiment, the server determines the normalized value of the candidate reconstructed semantic features, which can filter out the reconstructed semantic features that meet the complete semantic conditions from multiple candidate reconstructed semantic features. This enriches the semantic features of the text to be retrieved, better uncovers the user's retrieval needs, and more completely expresses the semantic information of the text to be retrieved, thereby achieving more accurate retrieval and improving the retrieval accuracy of text retrieval.

[0107] In an exemplary embodiment, step S104 above, which involves extracting semantic features from the text to be retrieved to obtain initial semantic features of the text to be retrieved, specifically includes the following: inputting the text to be retrieved into a pre-trained feature filter to obtain multiple sets of candidate semantic features of the text to be retrieved; determining the similarity between each set of candidate semantic features and the sample semantic features of the training samples of the feature filter; and selecting candidate semantic features from the multiple sets of candidate semantic features whose similarity satisfies the preset similarity condition as the initial semantic features of the text to be retrieved.

[0108] Here, the sample semantic features of the training samples satisfy the complete semantic condition and are obtained by labeling the training samples.

[0109] Specifically, the server first inputs the text to be retrieved into a pre-trained feature filter to obtain multiple sets of candidate semantic features for the text to be retrieved; then, the server calculates the similarity between each set of candidate semantic features and the sample semantic features of each training sample; then, the server selects one or more sets of candidate semantic features with the highest similarity from the multiple sets of candidate semantic features as candidate semantic features that meet the preset similarity conditions, thereby obtaining the initial semantic features of the text to be retrieved.

[0110] For example, the specific process by which the server calculates the similarity between each group of candidate semantic features and the semantic features of each training sample is as follows: The server converts each group of candidate semantic features and each sample semantic feature into vector form, and calculates the cosine similarity between each group of candidate semantic features and each sample semantic feature using the vectors, which is used as the similarity between the candidate semantic features and the sample semantic features; then, the server selects the candidate semantic feature with the highest similarity as the initial semantic feature, or the server sorts each candidate semantic feature in descending order of similarity and selects a preset number of candidate semantic features that are ranked first as the initial semantic features.
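A minimal sketch of the similarity-based selection in [0109] and [0110] follows. It assumes each candidate set and each sample semantic feature has already been converted into a vector of equal length. The helper names are hypothetical. Scoring a candidate by its best cosine similarity over all training samples is also an assumption, since the text says only that a similarity to each sample is computed.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_initial_features(
    candidates: List[List[float]],
    samples: List[List[float]],
    top_k: int = 1,
) -> List[int]:
    """Return indices of the top_k candidate feature vectors.

    Assumption: a candidate's similarity is its highest cosine score against
    any training-sample feature vector.
    """
    scores = [max(cosine(c, s) for s in samples) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
samples = [[1.0, 0.1]]
print(select_initial_features(candidates, samples))           # [0]
print(select_initial_features(candidates, samples, top_k=2))  # [0, 2]
```

`top_k=1` corresponds to the preferred case in [0111], where a single set of initial semantic features is kept.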

[0111] Preferably, the number of initial semantic features is one.

[0112] In this embodiment, the server can select the initial semantic features that most accurately express the semantic information of the text to be retrieved from multiple sets of candidate semantic features of the text to be retrieved by the similarity between the candidate semantic features and the sample semantic features.

[0113] In an exemplary embodiment, multiple sets of candidate semantic features of the text to be retrieved are obtained by: acquiring multiple sentences of the text to be retrieved; extracting word segments that satisfy preset semantic conditions from each sentence; obtaining candidate semantic features corresponding to each sentence based on the word segments that satisfy preset semantic conditions; and forming multiple sets of candidate semantic features of the text to be retrieved by combining the candidate semantic features corresponding to each sentence.

[0114] The preset semantic condition requires that a word segment be a necessary adverbial (an adverbial of time or an adverbial of place), a subject, an action predicate, or an object. This condition is used to extract keywords from the clauses.

[0115] Specifically, the server divides the text to be retrieved into multiple clauses based on its punctuation marks and extracts from each clause every word that is a necessary adverbial (an adverbial of time or place), a subject, an action predicate, or an object. Then, based on the words extracted from each clause, the server obtains the candidate semantic features corresponding to that clause. Finally, the server combines the candidate semantic features of the clauses into multiple sets of candidate semantic features for the text to be retrieved.
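The clause splitting and keyword filtering in [0114] and [0115] might look like the sketch below. It assumes the tokens have already been tagged with syntactic roles by an external parser. The role labels (`ADV_TIME`, `SUBJ`, and so on) and the choice to output one keyword list per clause are hypothetical and serve only as illustration.

```python
from typing import List, Tuple

# Hypothetical role labels, assumed to come from an external syntactic parser.
KEPT_ROLES = {"ADV_TIME", "ADV_PLACE", "SUBJ", "PRED", "OBJ"}
# Clause-ending punctuation, both ASCII and full-width forms.
CLAUSE_PUNCT = {",", ".", ";", "?", "!", "，", "。", "；", "？", "！"}

Token = Tuple[str, str]  # (word, role)


def split_clauses(tokens: List[Token]) -> List[List[Token]]:
    """Split a tagged token stream into clauses at punctuation marks."""
    clauses: List[List[Token]] = []
    current: List[Token] = []
    for word, role in tokens:
        if word in CLAUSE_PUNCT:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append((word, role))
    if current:
        clauses.append(current)
    return clauses


def candidate_feature_sets(tokens: List[Token]) -> List[List[str]]:
    """One candidate keyword list per clause, keeping only the preset roles."""
    return [
        [word for word, role in clause if role in KEPT_ROLES]
        for clause in split_clauses(tokens)
    ]


tokens = [("yesterday", "ADV_TIME"), ("I", "SUBJ"), ("um", "PARTICLE"),
          ("opened", "PRED"), ("an", "DET"), ("account", "OBJ"), (",", "PUNCT"),
          ("how", "ADV"), ("to", "PART"), ("close", "PRED"), ("it", "OBJ"), ("?", "PUNCT")]
print(candidate_feature_sets(tokens))
# [['yesterday', 'I', 'opened', 'account'], ['close', 'it']]
```

Filler words such as the particle "um" are dropped, which mirrors the motivation given in [0116].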

[0116] It is understandable that the text entered by the user may contain a lot of non-critical information, such as modal particles. Such information does not help text retrieval and reduces its accuracy. Therefore, the server needs to extract the words that represent the text to be retrieved, such as necessary adverbials (adverbials of time and place), subjects, action predicates, and objects.

[0117] In this embodiment, by splitting the text to be retrieved into multiple clauses, the server narrows the scope of subsequent word-segment extraction and strengthens the relevance among the word segments extracted from the same clause. In addition, by extracting the word segments that satisfy the preset semantic condition, the server extracts keywords carrying key information from each clause, thereby obtaining multiple sets of candidate semantic features of the text to be retrieved.

[0118] In an exemplary embodiment, step S108, which determines the target text that matches the text to be retrieved from multiple candidate texts based on the target semantic features, specifically includes the following: determining the matching degree between the text to be retrieved and each candidate text according to the target semantic features; and selecting candidate texts from multiple candidate texts whose matching degree meets the preset matching degree conditions as the target text that matches the text to be retrieved.

[0119] Specifically, the server calculates the matching degree between the target semantic features and the semantic features of each candidate text, and uses it as the matching degree between the text to be retrieved and that candidate text. The server then selects, from the candidate texts, one or more candidate texts with the highest matching degree as the candidate texts that satisfy the preset matching degree condition, thereby obtaining the target text that matches the text to be retrieved.

[0120] For example, the specific process by which the server calculates the matching degree between the target semantic features and the semantic features of each candidate text is as follows: The server converts the target semantic features and the semantic features of each candidate text into vector form, and calculates the cosine similarity between the target semantic features and the semantic features of each candidate text using the vectors, which is taken as the matching degree between the target semantic features and the semantic features of each candidate text; then, the server selects the candidate text with the highest matching degree as the target text, or the server sorts the candidate texts in descending order of matching degree and selects a preset number of candidate texts at the top of the sort as the target text.
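The ranking by matching degree in [0119] and [0120] can be sketched as a top-k cosine search. The sketch assumes that the target semantic features and each candidate text's semantic features are vectors of the same length. The names `retrieve`, `corpus`, and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(target: Sequence[float], corpus: Dict[str, Sequence[float]], top_k: int = 3) -> List[str]:
    """Return the ids of the top_k candidate texts by descending matching degree."""
    ranked = sorted(corpus, key=lambda doc_id: cosine(target, corpus[doc_id]), reverse=True)
    return ranked[:top_k]


corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(retrieve([1.0, 0.0], corpus, top_k=2))  # ['a', 'c']
```

For a large candidate pool, a linear scan like this would typically be replaced by an approximate nearest-neighbour index. The scoring rule stays the same.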

[0121] Preferably, there are multiple target texts.

[0122] In this embodiment, using the matching degree between the target semantic features and the semantic features of each candidate text, the server can select, from a large number of candidate texts, the target text that best matches the text to be retrieved, that is, the target text that best matches the user's retrieval needs, thereby satisfying those needs.

[0123] In one exemplary embodiment, as shown in Figure 4, another text retrieval method is provided. Taking the application of this method to a server as an example, the method includes the following steps:

[0124] Step S401: Obtain the text to be retrieved.

[0125] Step S402: Input the text to be retrieved into the pre-trained feature filter to obtain multiple sets of candidate semantic features of the text to be retrieved.

[0126] Step S403: Determine the similarity between each group of candidate semantic features and the sample semantic features of the training samples of the feature filter.

[0127] Step S404: From multiple sets of candidate semantic features, select the candidate semantic features whose similarity satisfies the preset similarity conditions, and use them as the initial semantic features of the text to be retrieved.

[0128] Step S405: Based on each mapping layer in the feature reconstruction model, the initial semantic features are mapped to obtain the mapped semantic features corresponding to each mapping layer.

[0129] Step S406: Reconstruct each mapped semantic feature separately to obtain multiple candidate reconstructed semantic features.

[0130] Step S407: Normalize the multiple candidate reconstructed semantic features to obtain the normalized value of each candidate reconstructed semantic feature.

[0131] Step S408: From the multiple candidate reconstructed semantic features, select the candidate reconstructed semantic features whose normalized values are greater than the preset normalization threshold, and use them as the reconstructed semantic features that satisfy the complete semantic condition.

[0132] Step S409: Based on the feature space of the initial semantic features, constrain the feature space of the reconstructed semantic features to obtain the reconstructed semantic features after feature space constraint.

[0133] Step S410: Use the reconstructed semantic features constrained by the feature space as the target semantic features of the text to be retrieved.

[0134] Step S411: Determine the matching degree between the text to be retrieved and each candidate text based on the target semantic features.

[0135] Step S412: Select candidate texts from multiple candidate texts whose matching degree meets the preset matching degree conditions, and use them as target texts to match the text to be retrieved.
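Steps S405 to S408 leave the mapping, reconstruction, and scoring operations unspecified. The sketch below therefore fills them with clearly labeled placeholders. Each mapping layer is assumed to be a weight matrix, `tanh` stands in for the reconstruction step, and a caller-supplied `score` function assigns each candidate the scalar that is then softmax-normalized across candidates. None of these choices comes from the source, and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product for nested-list matrices."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct_candidates(
    initial: Vector,
    mapping_layers: List[Matrix],
    score: Callable[[Vector], float],
    threshold: float,
) -> List[Vector]:
    # S405: map the initial feature through every (assumed linear) mapping layer.
    mapped = [matvec(w, initial) for w in mapping_layers]
    # S406: "reconstruct" each mapped feature; tanh is a placeholder choice.
    candidates = [[math.tanh(v) for v in m] for m in mapped]
    # S407: softmax-normalize one scalar score per candidate across all candidates.
    weights = softmax([score(c) for c in candidates])
    # S408: keep candidates whose normalized value exceeds the threshold.
    return [c for c, w in zip(candidates, weights) if w > threshold]


layers = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
kept = reconstruct_candidates([1.0, 0.0], layers, score=sum, threshold=0.5)
print(len(kept))  # 1
```

In a trained model, the mapping layers, the reconstruction operator, and the scoring function would all be learned. The sketch only fixes the order of operations described in S405 to S408.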

[0136] In this embodiment, first, based on the similarity between the candidate semantic features and the sample semantic features, the server can select, from multiple sets of candidate semantic features, the initial semantic features that most accurately express the semantic information of the text to be retrieved. Second, by reconstructing the initial semantic features with the feature reconstruction model, the server enriches the semantic features of the text to be retrieved, thereby better uncovering the user's retrieval needs and expressing the semantic information of the text to be retrieved more completely. Third, based on the matching degree between the target semantic features and the semantic features of each candidate text, the server can select, from a large number of candidate texts, the target text that best matches the text to be retrieved, that is, the one that best matches the user's retrieval needs. Compared with text retrieval technology based on word frequency features, the method based on the above process better uncovers the user's retrieval needs, expresses the semantic information of the text to be retrieved more completely, and improves retrieval accuracy.

[0137] To more clearly illustrate the text retrieval method provided in the embodiments of this application, a specific embodiment is described below. However, it should be understood that the embodiments of this application are not limited thereto. In an exemplary embodiment, this application also provides a text retrieval method based on feature filtering and feature reconstruction, specifically including the following steps:

[0138] Step 1: Train the feature filter.

[0139] The server creates training samples for the feature filter based on the principle of shortest complete semantics, and trains the feature filter using the training samples to obtain the trained feature filter.

[0140] Step 2: Feature filtering.

[0141] The server inputs the text to be retrieved by the user into the trained feature filter, and the feature filter selects a set of semantic features of the text to be retrieved as the initial semantic features.

[0142] Step 3: Feature Reconstruction.

[0143] The server reconstructs the initial semantic features of the text to be retrieved based on the feature reconstruction model, obtains multiple sets of reconstructed semantic features of the text to be retrieved, and uses the feature space of the initial semantic features to constrain the feature space of the reconstructed semantic features, thereby obtaining the target semantic features of the text to be retrieved.
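The source does not state how the feature space of the initial semantic features constrains the reconstructed features. One possible reading, offered here only as an assumption, is to project each reconstructed vector onto the linear span of the initial semantic feature vectors so that the result stays within that space. The following pure-Python sketch implements this projection with Gram-Schmidt orthonormalization. The function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: List[Sequence[float]], eps: float = 1e-12) -> List[Vector]:
    """Modified Gram-Schmidt; linearly dependent vectors are skipped."""
    basis: List[Vector] = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = dot(w, b)  # component of w along an existing basis vector
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def constrain_to_span(reconstructed: Sequence[float], initial_vectors: List[Sequence[float]]) -> Vector:
    """Orthogonal projection of `reconstructed` onto span(initial_vectors).

    Assumption: this projection is one interpretation of the feature-space
    constraint; the source does not specify the operation.
    """
    out = [0.0] * len(reconstructed)
    for b in orthonormal_basis(initial_vectors):
        c = dot(reconstructed, b)
        out = [oi + c * bi for oi, bi in zip(out, b)]
    return out


print(constrain_to_span([2.0, 3.0, 4.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
# [2.0, 3.0, 0.0]
```

Any component of the reconstructed feature that points outside the initial feature space (the third coordinate above) is removed. This matches the stated goal that the reconstructed features remain within the semantic feature space of the query.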

[0144] Step 4: Text retrieval.

[0145] Based on the target semantic features of the text to be retrieved, the server searches among multiple candidate texts, filters out multiple target texts that match the text to be retrieved, and displays the multiple target texts on the search page.

[0146] It is understandable that text retrieval technology is widely used in information retrieval, search engines, and other scenarios. Current text retrieval methods, such as TF-IDF (term frequency–inverse document frequency, a weighting technique commonly used in information retrieval and data mining) and DSSM (Deep Structured Semantic Model, a semantic model based on deep networks), rely only on the term frequency or the semantic features of the text entered by the user. They do not account for the fact that user input is often incomplete and may not fully describe the user's retrieval needs. In this embodiment, key information (the initial semantic features) is obtained from the text to be retrieved through feature selection, and a reconstruction strategy is then used to obtain the retrieval information the user may need (the target semantic features), thereby improving retrieval accuracy.

[0147] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0148] Based on the same inventive concept, this application also provides a text retrieval device for implementing the text retrieval method described above. The solution provided by this device is similar to the solution described in the above method; therefore, the specific limitations in one or more text retrieval device embodiments provided below can be found in the limitations of the text retrieval method described above, and will not be repeated here.

[0149] In one exemplary embodiment, as shown in Figure 5, a text retrieval device is provided, including: a text acquisition module 502, a feature extraction module 504, a feature reconstruction module 506, and a text retrieval module 508, wherein:

[0150] The text acquisition module 502 is used to acquire the text to be retrieved.

[0151] The feature extraction module 504 is used to perform semantic feature extraction processing on the text to be retrieved, so as to obtain the initial semantic features of the text to be retrieved.

[0152] The feature reconstruction module 506 is used to reconstruct the initial semantic features to obtain the target semantic features of the text to be retrieved; the semantic completeness represented by the target semantic features is greater than that represented by the initial semantic features.

[0153] The text retrieval module 508 is used to determine the target text that matches the text to be retrieved from multiple candidate texts based on the target semantic features.
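The four modules of the device in Figure 5 can be pictured as a pipeline of pluggable components. The dataclass below is a hypothetical sketch of that wiring only. Each callable is a placeholder for whatever concrete implementation a module uses, and the toy lambdas in the usage example do no real retrieval.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class TextRetrievalDevice:
    """Hypothetical wiring of modules 502, 504, 506 and 508."""

    acquire_text: Callable[[], str]                   # text acquisition module 502
    extract_features: Callable[[str], Vector]         # feature extraction module 504
    reconstruct_features: Callable[[Vector], Vector]  # feature reconstruction module 506
    retrieve_texts: Callable[[Vector], List[str]]     # text retrieval module 508

    def run(self) -> List[str]:
        query = self.acquire_text()
        initial = self.extract_features(query)
        target = self.reconstruct_features(initial)
        return self.retrieve_texts(target)


# Toy stand-ins that only demonstrate the data flow between modules.
device = TextRetrievalDevice(
    acquire_text=lambda: "open account",
    extract_features=lambda text: [float(len(text))],
    reconstruct_features=lambda v: [2.0 * v[0]],
    retrieve_texts=lambda v: [f"doc-{int(v[0])}"],
)
print(device.run())  # ['doc-24']
```

Keeping the modules as injected callables reflects [0160], which allows each module to be realized in software, hardware, or a combination of the two.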

[0154] In an exemplary embodiment, the feature reconstruction module 506 is further configured to reconstruct the initial semantic features based on the feature reconstruction model to obtain reconstructed semantic features that satisfy the complete semantic conditions; constrain the feature space of the reconstructed semantic features based on the feature space of the initial semantic features to obtain reconstructed semantic features constrained by the feature space; and use the reconstructed semantic features constrained by the feature space as the target semantic features of the text to be retrieved.

[0155] In an exemplary embodiment, the feature reconstruction module 506 is further configured to perform mapping processing on the initial semantic features based on each mapping layer in the feature reconstruction model to obtain the mapped semantic features corresponding to each mapping layer; perform reconstruction processing on each mapped semantic feature to obtain multiple candidate reconstructed semantic features; and select the reconstructed semantic features that satisfy the complete semantic conditions from the multiple candidate reconstructed semantic features.

[0156] In an exemplary embodiment, the feature reconstruction module 506 is further configured to perform normalization processing on the multiple candidate reconstructed semantic features respectively to obtain a normalized value for each candidate reconstructed semantic feature; and to select, from the multiple candidate reconstructed semantic features, candidate reconstructed semantic features whose corresponding normalized values are greater than a preset normalization threshold as reconstructed semantic features that satisfy the complete semantic condition.

[0157] In an exemplary embodiment, the feature extraction module 504 is further configured to input the text to be retrieved into a pre-trained feature filter to obtain multiple sets of candidate semantic features of the text to be retrieved; determine the similarity between each set of candidate semantic features and the sample semantic features of the training samples of the feature filter, where the sample semantic features of the training samples satisfy the complete semantic condition; and select, from the multiple sets of candidate semantic features, candidate semantic features whose similarity satisfies the preset similarity condition as the initial semantic features of the text to be retrieved.

[0158] In an exemplary embodiment, the feature extraction module 504 is further configured to acquire multiple sentences of the text to be retrieved; extract word segments that satisfy preset semantic conditions from each sentence; obtain candidate semantic features corresponding to each sentence based on the word segments that satisfy preset semantic conditions corresponding to each sentence; and combine the candidate semantic features corresponding to each sentence into multiple sets of candidate semantic features of the text to be retrieved.

[0159] In an exemplary embodiment, the text retrieval module 508 is further configured to determine the matching degree between the text to be retrieved and each candidate text based on the target semantic features; and to select candidate texts from multiple candidate texts whose matching degree meets the preset matching degree conditions as target texts that match the text to be retrieved.

[0160] Each module in the aforementioned text retrieval device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.

[0161] In one exemplary embodiment, a computer device is provided, which may be a server; its internal structure may be as shown in Figure 6. This computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the I/O interface are connected via a system bus, and the communication interface is connected to the system bus via the I/O interface. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides the environment in which the operating system and computer programs stored in the non-volatile storage medium run. The database stores multiple candidate texts and the training samples for the feature filter. The I/O interface is used to exchange information between the processor and external devices. The communication interface is used to communicate with external terminals via a network connection. When executed by the processor, the computer program implements a text retrieval method.

[0162] Those skilled in the art will understand that the structure shown in Figure 6 is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. Specific computer devices may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.

[0163] In one exemplary embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above-described method embodiments.

[0164] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.

[0165] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0166] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0167] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0168] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A text retrieval method, characterized in that the method comprises: acquiring a text to be retrieved; performing semantic feature extraction on the text to be retrieved to obtain initial semantic features of the text to be retrieved, which comprises: inputting the text to be retrieved into a pre-trained feature filter to obtain multiple sets of candidate semantic features of the text to be retrieved; determining the similarity between each set of candidate semantic features and the sample semantic features of the training samples of the feature filter, the sample semantic features of the training samples satisfying the complete semantic condition; and selecting, from the multiple sets of candidate semantic features, candidate semantic features whose similarity satisfies a preset similarity condition as the initial semantic features of the text to be retrieved; reconstructing the initial semantic features to obtain target semantic features of the text to be retrieved, wherein the semantic completeness represented by the target semantic features is greater than that represented by the initial semantic features, and the feature space of the reconstructed semantic features is constrained based on the feature space of the initial semantic features so that the reconstructed semantic features remain within the semantic feature space of the text to be retrieved; and determining, based on the target semantic features, a target text that matches the text to be retrieved from multiple candidate texts.

2. The method according to claim 1, characterized in that reconstructing the initial semantic features to obtain the target semantic features of the text to be retrieved comprises: reconstructing the initial semantic features based on a feature reconstruction model to obtain reconstructed semantic features that satisfy the complete semantic condition; constraining the feature space of the reconstructed semantic features based on the feature space of the initial semantic features to obtain feature-space-constrained reconstructed semantic features; and using the feature-space-constrained reconstructed semantic features as the target semantic features of the text to be retrieved.

3. The method according to claim 2, characterized in that reconstructing the initial semantic features based on the feature reconstruction model to obtain reconstructed semantic features that satisfy the complete semantic condition comprises: mapping the initial semantic features through each mapping layer in the feature reconstruction model to obtain the mapped semantic features corresponding to each mapping layer; reconstructing each mapped semantic feature separately to obtain multiple candidate reconstructed semantic features; and selecting, from the multiple candidate reconstructed semantic features, the reconstructed semantic features that satisfy the complete semantic condition.

4. The method according to claim 3, characterized in that selecting the reconstructed semantic features that satisfy the complete semantic condition from the multiple candidate reconstructed semantic features comprises: normalizing each of the multiple candidate reconstructed semantic features to obtain a normalized value for each candidate reconstructed semantic feature; and selecting, from the multiple candidate reconstructed semantic features, the candidate reconstructed semantic features whose normalized values are greater than a preset normalization threshold as the reconstructed semantic features that satisfy the complete semantic condition.

5. The method according to claim 1, characterized in that the multiple sets of candidate semantic features of the text to be retrieved are obtained by: acquiring multiple clauses of the text to be retrieved; extracting, from each clause, the word segments that satisfy a preset semantic condition; obtaining the candidate semantic features corresponding to each clause based on the word segments of that clause that satisfy the preset semantic condition; and combining the candidate semantic features corresponding to the clauses into the multiple sets of candidate semantic features of the text to be retrieved.

6. The method according to any one of claims 1 to 5, characterized in that determining, based on the target semantic features, the target text that matches the text to be retrieved from multiple candidate texts comprises: determining the matching degree between the text to be retrieved and each candidate text based on the target semantic features; and selecting, from the multiple candidate texts, candidate texts whose matching degree satisfies a preset matching degree condition as target texts that match the text to be retrieved.

7. A text retrieval device, characterized in that the device comprises: a text acquisition module configured to acquire a text to be retrieved; a feature extraction module configured to perform semantic feature extraction on the text to be retrieved to obtain initial semantic features of the text to be retrieved, the feature extraction module being further configured to: input the text to be retrieved into a pre-trained feature filter to obtain multiple sets of candidate semantic features of the text to be retrieved; determine the similarity between each set of candidate semantic features and the sample semantic features of the training samples of the feature filter, the sample semantic features of the training samples satisfying the complete semantic condition; and select, from the multiple sets of candidate semantic features, candidate semantic features whose similarity satisfies a preset similarity condition as the initial semantic features of the text to be retrieved; a feature reconstruction module configured to reconstruct the initial semantic features to obtain target semantic features of the text to be retrieved, wherein the semantic completeness represented by the target semantic features is greater than that represented by the initial semantic features, and the feature space of the reconstructed semantic features is constrained based on the feature space of the initial semantic features so that the reconstructed semantic features remain within the semantic feature space of the text to be retrieved; and a text retrieval module configured to determine, based on the target semantic features, a target text that matches the text to be retrieved from multiple candidate texts.

8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 6.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 6.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 6.