Target text retrieval method and apparatus

By training the model using a pre-acquired dataset of positive and negative samples and employing an unsupervised learning method to determine the feature vector, the problem of high manual annotation costs in Chinese text retrieval and matching is solved, thus achieving efficient text retrieval.

CN115730037B · Active · Publication Date: 2026-06-26 · CHINA TELECOM CORP LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA TELECOM CORP LTD
Filing Date: 2022-11-14
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing Chinese text retrieval and matching methods require a large amount of manually labeled training data, resulting in high model training costs and a lack of effective training and fast retrieval mechanisms, which affects retrieval accuracy.

Method used

The pre-acquired positive and negative sample datasets are used to train a pre-set model. The feature vectors of the retrieved text and the text to be retrieved are obtained through unsupervised learning. The inner product distance is used to determine the target text, reducing the need for manual annotation.

Benefits of technology

This enables model training without manual annotation, reducing training costs and improving the accuracy and efficiency of retrieval.

Abstract

The application discloses a target text retrieval method and apparatus. The method comprises the following steps: acquiring a retrieved text set and a text to be retrieved, and inputting the retrieved text set and the text to be retrieved into a preset model for analysis to obtain a first feature vector set corresponding to the retrieved text set and a second feature vector corresponding to the text to be retrieved, wherein the preset model is trained using a pre-acquired positive and negative sample dataset, and the text to be retrieved is associated with the texts in the retrieved text set; determining, from the first feature vector set, the first feature vector with the closest inner product distance to the second feature vector; and determining the text corresponding to that first feature vector as the retrieved target text. The application solves the technical problem of high model training cost caused by the need for a large amount of manually labeled training data.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence, and more specifically, to a method and apparatus for retrieving target text.

Background Technology

[0002] With the rapid development of artificial intelligence technology, especially the ever-evolving field of natural language processing, text retrieval and matching has become a core task in natural language processing. Text matching is indispensable in dialogue systems, recommendation systems, and search engines. Therefore, the quality of Chinese text retrieval and matching methods will significantly impact the healthy development of related businesses.

[0003] In related technologies, the application of deep learning methods to Chinese text retrieval and matching most commonly involves supervised training (most notably the popular dual-tower model in recent years). This method requires a large amount of labeled data for training, which often necessitates extensive manual annotation, incurring significant costs for companies. For example, corpus-based text retrieval and matching methods require data collection for training corpora, further increasing labor costs. Furthermore, this approach lacks a more effective training mechanism for the deep learning training phase of representation vectors and offers no improvement methods for the crucial deep learning optimizer. Simultaneously, it fails to propose an effective and fast retrieval mechanism for the matching inference phase, still relying on traditional iterative comparison methods, which consume excessive resources and time. Another approach depends on traditional rule-based encoding matching methods, lacking the ability to learn text order, resulting in low retrieval accuracy.

[0004] There is currently no effective solution to the above problems.

Summary of the Invention

[0005] This application provides a target text retrieval method and apparatus to at least solve the technical problem of high model training costs caused by the need for a large amount of manually labeled data for training data.

[0006] According to one aspect of the embodiments of this application, a target text retrieval method is provided, comprising: acquiring a retrieved text set and a text to be retrieved; inputting the retrieved text set and the text to be retrieved into a preset model for analysis to obtain a first feature vector set corresponding to the retrieved text set and a second feature vector corresponding to the text to be retrieved, wherein the preset model is trained using a pre-acquired positive and negative sample dataset, and the text to be retrieved is associated with the texts in the retrieved text set; determining, from the first feature vector set, the first feature vector that has the closest inner product distance to the second feature vector; and determining the text corresponding to that first feature vector as the retrieved target text.

[0007] Optionally, the model parameters of the preset model are determined by the following methods: inputting the positive and negative sample datasets into the preset model and training the preset model according to a preset objective function to determine a first gradient; adding perturbation values to the samples in the positive and negative sample datasets to determine a second gradient after adding the perturbation values; determining the sum of the first gradient and the second gradient as the target gradient; and determining the model parameters of the preset model based on the target gradient.

[0008] Optionally, the perturbation value is determined by: determining the norm of the first gradient; and determining the perturbation value based on a preset perturbation parameter, the first gradient, and the norm of the first gradient.

[0009] Optionally, determining the model parameters of the preset model based on the target gradient includes: inputting the target gradient into a first sub-optimizer and a second sub-optimizer in the preset optimizer; determining the model parameters of the preset model based on the output results of the first sub-optimizer and the second sub-optimizer, wherein the training rate of the first sub-optimizer is higher than that of the second sub-optimizer.

[0010] Optionally, determining the model parameters of the preset model based on the output results of the first sub-optimizer and the second sub-optimizer includes: determining the difference between the first model parameters corresponding to the output results of the first sub-optimizer and the second model parameters corresponding to the output results of the second sub-optimizer; determining the product of the difference and the preset parameters as the initial value of the model parameters for the next iteration of the first sub-optimizer; and inputting the positive and negative sample datasets and the target gradient into the first sub-optimizer to determine the model parameters of the preset model.

[0011] Optionally, the positive and negative sample dataset is determined by: inputting the same text from the obtained text training set twice into a preset encoder to obtain a positive sample pair; inputting two different texts from the text training set into the preset encoder to obtain a negative sample pair; and combining multiple positive sample pairs and multiple negative sample pairs to obtain the positive and negative sample dataset.

[0012] Optionally, after determining, from the first feature vector set, the first feature vector that has the closest inner product distance to the second feature vector, the method further includes: determining that the first feature vector is invalid if the inner product distance corresponding to the first feature vector is greater than a preset threshold.

[0013] According to another aspect of the embodiments of this application, a target text retrieval device is also provided, comprising: an acquisition module, configured to acquire a retrieved text set and a text to be retrieved, and input the retrieved text set and the text to be retrieved into a preset model to obtain a first feature vector set corresponding to the retrieved text set and a second feature vector corresponding to the text to be retrieved, wherein the preset model is trained using a pre-acquired positive and negative sample dataset; a selection module, configured to determine, from the first feature vector set, the first feature vector whose inner product distance to the second feature vector is the closest; and a determination module, configured to determine the text corresponding to that first feature vector as the retrieved target text.

[0014] According to another aspect of the embodiments of this application, a non-volatile storage medium is also provided, the non-volatile storage medium including a stored computer program, wherein the device where the non-volatile storage medium is located executes the above-described target text retrieval method by running the computer program.

[0015] According to another aspect of the embodiments of this application, a computer device is also provided, including a memory and a processor, the processor being used to run a program, wherein the program executes the above-described target text retrieval method when it runs.

[0016] In this embodiment, a set of retrieved texts and the text to be retrieved are obtained, and the retrieved texts and the text to be retrieved are input into a preset model for analysis to obtain a first feature vector set corresponding to the retrieved text set and a second feature vector corresponding to the text to be retrieved. The preset model is trained using a pre-acquired positive and negative sample dataset, and the text to be retrieved is associated with the texts in the retrieved text set. The first feature vector with the closest inner product distance to the second feature vector is determined from the first feature vector set. The text corresponding to the first feature vector is determined as the retrieved target text. By using the pre-acquired positive and negative sample dataset to train the preset model, the purpose of unsupervised learning is achieved, thereby realizing the technical effect of not requiring manual annotation. This solves the technical problem of high model training cost caused by the need for a large amount of manually labeled training data.

Attached Figure Description

[0017] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments of this application and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:

[0018] Figure 1 is a hardware structure block diagram of a computer terminal (or mobile device) for a target text retrieval method according to an embodiment of this application;

[0019] Figure 2 is a flowchart illustrating a target text retrieval method according to this application;

[0020] Figure 3 is a schematic diagram of a target text retrieval device in related technologies.

Detailed Implementation

[0021] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.

[0022] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0023] The methods and embodiments provided in this application can be executed on mobile terminals, computer terminals, cloud servers, or similar computing devices. Figure 1 shows a hardware block diagram of a computer terminal (or mobile device) for implementing a target text retrieval method. As shown in Figure 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n in the figure; a processor 102 may include, but is not limited to, a microprocessor MCU or a programmable logic device FPGA, etc.), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. Those skilled in the art will understand that the structure shown in Figure 1 is for illustrative purposes only and does not limit the structure of the aforementioned electronic device. For example, the computer terminal 10 may also include more or fewer components than those shown in Figure 1, or have a configuration different from that shown in Figure 1.

[0024] It should be noted that the aforementioned one or more processors 102 and/or other data processing circuits are generally referred to herein as "data processing circuits". These data processing circuits may be embodied, in whole or in part, in software, hardware, firmware, or any other combination thereof. Furthermore, the data processing circuits may be a single, independent processing module, or may be integrated, in whole or in part, into any other element within the computer terminal 10 (or mobile device). As involved in the embodiments of this application, the data processing circuits serve as a processor control mechanism (e.g., selection of a variable resistor termination path connected to an interface).

[0025] The memory 104 can be used to store software programs and modules of application software, such as the program instructions / data storage device corresponding to the target text retrieval method in this embodiment. The processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, thereby realizing the target text retrieval method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory remotely located relative to the processor 102, and these remote memories can be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0026] The transmission module 106 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission module 106 includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission module 106 may be a Radio Frequency (RF) module, used for wireless communication with the Internet.

[0027] The display can be, for example, a touchscreen liquid crystal display (LCD) that allows the user to interact with the user interface of the computer terminal 10 (or mobile device).

[0028] According to an embodiment of this application, a method embodiment for a target text retrieval method is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in a different order than that shown here.

[0029] Figure 2 is a flowchart of a target text retrieval method according to embodiments of this application. As shown in Figure 2, the method includes the following steps:

[0030] Step S202: Obtain the retrieved text set and the text to be retrieved, and input the retrieved text set and the text to be retrieved into a preset model for analysis to obtain a first feature vector set corresponding to the retrieved text set and a second feature vector corresponding to the text to be retrieved. The preset model is trained using a pre-acquired positive and negative sample dataset, and the text to be retrieved is associated with the texts in the retrieved text set;

[0031] Step S204: Determine the first feature vector from the first feature vector set that has the closest inner product distance to the second feature vector;

[0032] Step S206: Determine the text corresponding to the first feature vector as the retrieved target text.

[0033] Through the above steps, the method provided in this application achieves the goal of unsupervised learning by training a preset model using a pre-acquired positive and negative sample dataset, thereby achieving the technical effect of not requiring manual labeling and solving the technical problem of high model training cost caused by the need for a large amount of manually labeled training data.

[0034] It should be noted that the above target text retrieval method is applicable to Chinese retrieval. After obtaining the text training dataset, data cleaning is required. For example, regular expressions can be used to clean the text data, removing English words, special characters, and other messy characters to improve the accuracy of model training.
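The cleaning step described above could be sketched as follows. This is only an illustration: the function names `clean_text` and `clean_corpus` are hypothetical, and treating "Chinese text" as the CJK Unified Ideographs range U+4E00 to U+9FFF plus digits is an assumption, not something the patent specifies.

```python
import re

# Assumption: keep only CJK Unified Ideographs (U+4E00..U+9FFF) and digits.
# Everything else (English words, punctuation, stray symbols) is removed.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fff0-9]+")


def clean_text(text: str) -> str:
    """Drop English words, special characters, and other non-Chinese debris."""
    return _NON_CHINESE.sub("", text)


def clean_corpus(texts):
    """Clean every text and discard entries that end up empty."""
    cleaned = (clean_text(t) for t in texts)
    return [t for t in cleaned if t]
```

A real pipeline would likely tune the character ranges, for example to keep Chinese punctuation if the encoder benefits from it.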

[0035] The association between the text to be retrieved and the text in the retrieved text set can be that the semantics of the text to be retrieved are associated with the text in the retrieved text set, or that the text to be retrieved is contained in the retrieved text set, for example: the text to be retrieved is a supplier, and the retrieved text set contains the names of various supplier companies, etc.

[0036] It should be further explained that before step S204, the feature vectors in the first feature vector set can be normalized and then an inner product index can be established, for example, by using Faiss (a vector retrieval tool) to establish the inner product index.
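As a rough illustration of normalizing the vectors and querying an inner product index, the sketch below uses a brute-force index written in plain Python. It is a stand-in for a vector retrieval tool such as Faiss (which offers an exact inner-product index, `IndexFlatIP`), not the patent's implementation; the class name `FlatInnerProductIndex` and its methods are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


class FlatInnerProductIndex:
    """Hypothetical brute-force stand-in for an inner-product vector index."""

    def __init__(self):
        self._vectors = []

    def add(self, vectors):
        # Normalize on insertion, as described before step S204.
        self._vectors.extend(l2_normalize(v) for v in vectors)

    def search(self, query, k=1):
        """Return the k best (score, position) pairs, highest score first."""
        q = l2_normalize(query)
        scored = [(inner_product(q, v), i) for i, v in enumerate(self._vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

The position returned by `search` maps back to the text at the same position in the retrieved text set. A brute-force scan is linear in the size of the set; a dedicated index is what makes the lookup fast at scale.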

[0037] In some embodiments of this application, the model parameters of a preset model can be determined by the following steps: inputting the positive and negative sample datasets into the preset model and training the preset model according to a preset objective function to determine a first gradient; adding perturbation values to the samples in the positive and negative sample datasets to determine a second gradient after adding the perturbation values; determining the sum of the first gradient and the second gradient as the target gradient; and determining the model parameters of the preset model based on the target gradient.

[0038] It should be noted that the positive and negative sample datasets can be determined through contrastive learning. The same text in the obtained text training set is input twice into the preset encoder to obtain a positive sample pair; different texts in the text training set, such as texts different from the first input text, are input into the preset encoder to obtain a negative sample pair; multiple positive and negative sample pairs are combined to obtain the positive and negative sample dataset.

[0039] For example, suppose the text training set contains two texts, "Today's weather is sunny" and "Go out to play tomorrow". The text "Today's weather is sunny" is input into the encoder a first time and then a second time, and the two results are determined as a positive sample pair. The result of the first input of "Today's weather is sunny" and the result of inputting "Go out to play tomorrow" into the encoder are determined as a negative sample pair.

[0040] In practical applications, positive samples can be constructed in contrastive learning by applying random dropout in the fully connected layers and in the attention mechanism, so that the results of the same text being input into the encoder twice are determined as a positive sample pair, and the results of other texts being input into the encoder are determined as negative sample pairs.
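The pair construction can be sketched with a toy encoder. Everything here is hypothetical: `toy_encode` merely hashes characters into a small vector and applies random dropout, standing in for the preset encoder, and `build_pairs` shows only how two passes over the same text yield a positive pair while passes over different texts yield negative pairs.

```python
import random


def toy_encode(text, dim=8, dropout=0.1, rng=random):
    """Hypothetical encoder: hashed character counts followed by random dropout."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    # Each component is zeroed with probability `dropout`, so two passes over
    # the same text generally produce slightly different vectors.
    keep = 1.0 - dropout
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def build_pairs(texts, rng=random):
    """Return (positive_pairs, negative_pairs) of encoded vectors."""
    positives, negatives = [], []
    for i, text in enumerate(texts):
        first = toy_encode(text, rng=rng)
        # Same text encoded twice: positive pair.
        positives.append((first, toy_encode(text, rng=rng)))
        # Different texts: negative pairs.
        for j, other in enumerate(texts):
            if i != j:
                negatives.append((first, toy_encode(other, rng=rng)))
    return positives, negatives
```

No labels are needed at any point, which is what allows the training set to be built without manual annotation.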

[0041] It should be noted that the text training set can use publicly available datasets, such as lcqmc (a Chinese text similarity dataset based on Baidu Knows published by Harbin Institute of Technology) and ocnli (a native Chinese natural language matching inference dataset).

[0042] The model parameters of the preset model can be determined in the following way: input the positive and negative sample datasets into the preset model and train the preset model according to the preset objective function to determine the first gradient; add perturbation values to the samples of the positive and negative sample datasets to determine the second gradient after adding the perturbation values; determine the target gradient by summing the first gradient and the second gradient; and determine the model parameters of the preset model according to the target gradient.

[0043] In practical applications, the target loss function is set as follows during the training process for each batch of sample pairs:

[0044]

[0045] In the formula, N represents the size of a batch; $(que_i, pos_j)$ represents a sample pair, where i = j denotes a positive sample pair and i ≠ j denotes a negative sample pair; $P(pos_i \mid que_i)$ represents the probability that the second sample appears given that the first sample appears in a positive sample pair; and $S(que_i, pos_i)$ represents the output of the neural network.
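The formula of paragraph [0044] is not reproduced in this copy. Purely as an assumption, one loss that is consistent with the symbols defined above is the in-batch softmax contrastive loss, in which the probability of the positive is a softmax over the scores of one query against every candidate in the batch, and the loss is the mean negative log-probability of the positives. The sketch below computes that quantity; the function name and the `scores` layout are hypothetical, and this should not be read as the patent's own equation.

```python
import math


def in_batch_softmax_loss(scores):
    """Mean of -log P(pos_i | que_i) under a softmax over each row.

    Assumed layout: scores[i][j] = S(que_i, pos_j), so diagonal entries are
    the positive pairs and off-diagonal entries are the negative pairs.
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        # Log-sum-exp with the row maximum subtracted for numerical stability.
        m = max(row)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_denominator - row[i]
    return total / n
```

Under this assumed form, the loss falls as the positive score of each row grows relative to that row's negative scores.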

[0046] Based on the loss function value calculated from the forward propagation results, adversarial training is introduced after the neural network obtains the first gradient through backpropagation. Specifically, a perturbation value is determined based on the first gradient and added during the training of the pre-defined model. The loss function value after adding the perturbation value is calculated, and the second gradient is determined through backpropagation. The second gradient is then added to the first gradient to obtain the target gradient. By adding perturbation during the training process, the accuracy of model training is further improved.

[0047] In one alternative approach, the perturbation value is determined by: determining the norm of the first gradient; and determining the perturbation value based on a preset perturbation parameter, the first gradient, and the norm of the first gradient.

[0048] Specifically, the perturbation value can be calculated using the following formula:

[0049] $r = \epsilon \cdot g / \lVert g \rVert_2$

[0050] In the formula, $\epsilon$ represents the preset perturbation parameter, $g$ represents the first gradient, and $\lVert g \rVert_2$ represents the L2 norm of the first gradient.
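A minimal sketch of the perturbation formula and of summing the first and second gradients follows. It assumes the perturbation is added to an embedding vector and uses a toy quadratic loss with an analytic gradient; `target_gradient`, `toy_grad`, and the choice of what gets perturbed are illustrative assumptions rather than details taken from the patent.

```python
import math


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def perturbation(grad, epsilon):
    """r = epsilon * g / ||g||_2, or a zero vector when the gradient vanishes."""
    norm = l2_norm(grad)
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def target_gradient(grad_fn, embedding, epsilon):
    """Sum of the gradient at the clean input and at the perturbed input."""
    first = grad_fn(embedding)                       # first gradient
    r = perturbation(first, epsilon)                 # perturbation value
    perturbed = [e + d for e, d in zip(embedding, r)]
    second = grad_fn(perturbed)                      # second gradient
    return [a + b for a, b in zip(first, second)]


def toy_grad(e, target=(0.0, 0.0)):
    """Gradient of the toy loss 0.5 * ||e - target||^2 with respect to e."""
    return [x - t for x, t in zip(e, target)]
```

In a real model the two gradients would come from two backpropagation passes, with the perturbation removed again before the optimizer step.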

[0051] In some embodiments of this application, the target gradient is input into a first sub-optimizer and a second sub-optimizer in a preset optimizer, respectively; the model parameters of the preset model are determined based on the output results of the first sub-optimizer and the second sub-optimizer, wherein the training rate of the first sub-optimizer is higher than that of the second sub-optimizer.

[0052] In practical applications, you can first set the target loss function for training, then add perturbation values for adversarial training, and finally use two sub-optimizers to obtain the feature vector of the input sample.

[0053] It should be noted that the model parameters of the preset model can be determined in the following way: the difference between the first model parameters corresponding to the output of the first sub-optimizer and the second model parameters corresponding to the output of the second sub-optimizer is determined; the product of the difference and the preset parameters is determined as the initial value of the model parameters for the next iteration of the first sub-optimizer; and the positive and negative sample datasets and the target gradient are input into the first sub-optimizer to determine the model parameters of the preset model.

[0054] Specifically, when searching for the optimal values of the model parameters of the preset model based on the target gradient, the second sub-optimizer retains an extra copy of the parameter weights, while the first sub-optimizer performs k batches of iterative training. The second sub-optimizer then calculates the difference between its weight copy at the current training rate and the parameter weights produced by the first sub-optimizer, multiplies this difference by the preset parameter, and updates the parameters of the first sub-optimizer for the next round of iterative training, so as to train toward a globally optimal model.
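The two sub-optimizers can be sketched along the following lines. This reading, in which the retained weight copy moves a fraction of the way toward the weights produced by k fast steps and then seeds the next round, resembles the Lookahead optimizer and is an interpretation, not a confirmed description of the patented procedure. The names `lookahead_train`, `alpha` (the preset parameter), and the plain gradient-descent inner step are assumptions.

```python
def sgd_step(params, grad, lr):
    """One plain gradient-descent update (stand-in for the first sub-optimizer)."""
    return [p - lr * g for p, g in zip(params, grad)]


def lookahead_train(grad_fn, params, lr=0.1, alpha=0.5, k=5, outer_steps=20):
    """Hypothetical two-optimizer loop: k fast steps, then one slow update."""
    slow = list(params)              # weight copy kept by the second sub-optimizer
    for _ in range(outer_steps):
        fast = list(slow)            # first sub-optimizer starts from the copy
        for _ in range(k):           # k batches of iterative training
            fast = sgd_step(fast, grad_fn(fast), lr)
        # Scale the difference between fast weights and the copy by alpha and
        # apply it to the copy; the result seeds the next round of fast steps.
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow
```

For instance, `lookahead_train(lambda w: [2 * (x - 3.0) for x in w], [0.0])` minimizes the toy loss (w - 3)^2 and moves the single weight toward 3.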

[0055] In step S204, if the inner product distance corresponding to the first feature vector is greater than a preset threshold, the first feature vector is determined to be invalid.

[0056] If the first feature vector is invalid, the first feature vector that is closest to the second feature vector is re-determined from the first feature vector set.

[0057] Understandably, since the formulas for the inner product and the cosine similarity of two vectors are similar (the two coincide when both vectors are normalized to unit length), cosine distance can be used instead of inner product distance in practical applications.
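The relationship between the two measures can be checked numerically. The short sketch below, with hypothetical helper names, shows that the inner product of two unit-length vectors equals the cosine similarity of the original vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Inner product divided by the product of the two L2 norms."""
    return dot(a, b) / (norm(a) * norm(b))


def unit(a):
    """Scale a non-zero vector to unit L2 length."""
    n = norm(a)
    return [x / n for x in a]


a, b = [3.0, 4.0], [1.0, 2.0]
# After normalization the plain inner product already is the cosine similarity.
difference = abs(dot(unit(a), unit(b)) - cosine_similarity(a, b))
```

This is why normalizing the feature vectors before building an inner product index yields the same ranking as a cosine-based search.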

[0058] In practical applications, a pre-trained model can be loaded, and the texts in the retrieved text set and the text to be retrieved can be input into the pre-trained model to obtain the first feature vector set and the second feature vector. The inner product index corresponding to the first feature vector set is established, and a search query is performed to obtain the text corresponding to the nearest distance value, that is, the target text that matches the text to be retrieved.

[0059] According to another aspect of the embodiments of this application, a target text retrieval device is also provided. As shown in Figure 3, it includes: an acquisition module 30, used to acquire a set of retrieved texts and a text to be retrieved, and input the set of retrieved texts and the text to be retrieved into a preset model to obtain a first feature vector set corresponding to the set of retrieved texts and a second feature vector corresponding to the text to be retrieved, wherein the preset model is trained using a pre-acquired positive and negative sample dataset; a selection module 32, used to determine, from the first feature vector set, the first feature vector whose inner product distance to the second feature vector is the closest; and a determination module 34, used to determine the text corresponding to that first feature vector as the retrieved target text.

[0060] The acquisition module 30 includes: a training submodule, a dataset submodule, and a judgment submodule;

[0061] The training submodule includes: a parameter determination unit, which is used to input the positive and negative sample dataset into the preset model and train the preset model according to the preset objective function to determine the first gradient; add perturbation values to the samples of the positive and negative sample dataset to determine the second gradient after adding the perturbation values; determine the target gradient by summing the first gradient and the second gradient; and determine the model parameters of the preset model according to the target gradient.

[0062] The parameter determination unit includes: a perturbation value determination subunit and a parameter determination subunit; the perturbation value determination subunit is used to determine the norm of the first gradient; and to determine the perturbation value based on the preset perturbation parameter, the first gradient and the norm of the first gradient.

[0063] A parameter determination subunit is used to input the target gradient into the first and second sub-optimizers of a preset optimizer, respectively; and to determine the model parameters of the preset model based on the outputs of the first and second sub-optimizers, wherein the training rate of the first sub-optimizer is higher than that of the second sub-optimizer.

[0064] The parameter determination subunit is further configured to determine the difference between the first model parameter corresponding to the output result of the first sub-optimizer and the second model parameter corresponding to the output result of the second sub-optimizer; the product of the difference and the preset parameter is determined as the initial value of the model parameter for the next iteration of the first sub-optimizer; and the positive and negative sample datasets and the target gradient are input into the first sub-optimizer to determine the model parameters of the preset model.

[0065] The dataset submodule is used to input the same text from the obtained text training set twice into a preset encoder to obtain positive sample pairs; input different texts from the text training set into the preset encoder to obtain negative sample pairs; and combine multiple positive sample pairs and negative sample pairs to obtain the positive and negative sample dataset.

[0066] The judgment submodule is used to determine that the first feature vector is invalid if the inner product distance corresponding to the first feature vector is greater than a preset threshold.

[0067] According to another aspect of the embodiments of this application, a non-volatile storage medium is also provided, the non-volatile storage medium including a stored computer program, wherein the device where the non-volatile storage medium is located executes the above-described target text retrieval method by running the computer program.

[0068] According to another aspect of the embodiments of this application, a computer device is also provided, including a memory and a processor, the processor being used to run a program, wherein the program executes the above-described target text retrieval method when it runs.

[0069] The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0070] In the above embodiments of this application, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0071] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units can be a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual couplings, direct couplings, or communication connections may be through some interfaces; indirect couplings or communication connections between units or modules may be electrical or other forms.

[0072] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0073] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0074] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard drive, magnetic disk, or optical disk.

[0075] The above are merely preferred embodiments of this application. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of this application, and these improvements and modifications should also be considered within the scope of protection of this application.

Claims

1. A method for retrieving target text, characterized in that it comprises:
obtaining a set of texts to be retrieved and a text to be retrieved, and inputting the set of texts to be retrieved and the text to be retrieved into a preset model for analysis to obtain a first feature vector set corresponding to the set of texts to be retrieved and a second feature vector corresponding to the text to be retrieved, wherein the preset model is trained using a pre-acquired positive and negative sample dataset, and the text to be retrieved is associated with the texts in the set of texts to be retrieved;
determining, from the first feature vector set, the first feature vector whose inner product distance to the second feature vector is closest;
determining the text corresponding to the first feature vector as the retrieved target text;
wherein the model parameters of the preset model are determined as follows:
inputting the positive and negative sample dataset into the preset model, and training the preset model according to a preset objective function to determine a first gradient;
adding a perturbation value to the samples in the positive and negative sample dataset to determine a second gradient after the perturbation value is added;
determining the sum of the first gradient and the second gradient as a target gradient;
determining the model parameters of the preset model based on the target gradient;
wherein determining the model parameters of the preset model based on the target gradient comprises:
inputting the target gradient into a first sub-optimizer and a second sub-optimizer of a preset optimizer, respectively;
determining the model parameters of the preset model based on the output results of the first sub-optimizer and the second sub-optimizer, wherein the training rate of the first sub-optimizer is higher than that of the second sub-optimizer;
wherein determining the model parameters of the preset model based on the output results of the first sub-optimizer and the second sub-optimizer comprises:
determining the difference between the first model parameters corresponding to the output of the first sub-optimizer and the second model parameters corresponding to the output of the second sub-optimizer;
using the product of the difference and the preset parameter as the initial value of the model parameters for the next iteration of the first sub-optimizer;
inputting the positive and negative sample dataset and the target gradient into the first sub-optimizer to determine the model parameters of the preset model.

2. The method according to claim 1, characterized in that the perturbation value is determined as follows: determining the norm of the first gradient; and determining the perturbation value based on a preset perturbation parameter, the first gradient, and the norm of the first gradient.

3. The method according to claim 1, characterized in that the positive and negative sample dataset is determined as follows: inputting the same text from the obtained text training set twice into a preset encoder to obtain positive sample pairs; inputting different texts from the text training set into the preset encoder to obtain negative sample pairs; and combining multiple positive sample pairs and negative sample pairs to obtain the positive and negative sample dataset.

4. The method according to claim 1, characterized in that, after determining, from the first feature vector set, the first feature vector whose inner product distance to the second feature vector is closest, the method further comprises: if the inner product distance corresponding to the first feature vector is greater than a preset threshold, determining that the first feature vector is invalid.

5. A target text retrieval device, characterized in that it comprises:
an acquisition module, used to acquire a set of texts to be retrieved and a text to be retrieved, and to input the set of texts to be retrieved and the text to be retrieved into a preset model to obtain a first feature vector set corresponding to the set of texts to be retrieved and a second feature vector corresponding to the text to be retrieved, wherein the preset model is trained using a pre-acquired positive and negative sample dataset;
a selection module, used to determine, from the first feature vector set, the first feature vector whose inner product distance to the second feature vector is closest;
a determination module, used to determine the text corresponding to the first feature vector as the retrieved target text;
wherein the model parameters of the preset model are determined as follows:
inputting the positive and negative sample dataset into the preset model, and training the preset model according to a preset objective function to determine a first gradient;
adding a perturbation value to the samples in the positive and negative sample dataset to determine a second gradient after the perturbation value is added;
determining the sum of the first gradient and the second gradient as a target gradient;
determining the model parameters of the preset model based on the target gradient;
wherein determining the model parameters of the preset model based on the target gradient comprises:
inputting the target gradient into a first sub-optimizer and a second sub-optimizer of a preset optimizer, respectively;
determining the model parameters of the preset model based on the output results of the first sub-optimizer and the second sub-optimizer, wherein the training rate of the first sub-optimizer is higher than that of the second sub-optimizer;
wherein determining the model parameters of the preset model based on the output results of the first sub-optimizer and the second sub-optimizer comprises:
determining the difference between the first model parameters corresponding to the output of the first sub-optimizer and the second model parameters corresponding to the output of the second sub-optimizer;
using the product of the difference and the preset parameter as the initial value of the model parameters for the next iteration of the first sub-optimizer;
inputting the positive and negative sample dataset and the target gradient into the first sub-optimizer to determine the model parameters of the preset model.

6. A non-volatile storage medium, characterized in that the non-volatile storage medium includes a stored computer program, wherein the device containing the non-volatile storage medium executes the target text retrieval method of any one of claims 1 to 4 by running the computer program.

7. A computer device, characterized in that it includes a memory and a processor, the processor being used to run a program, wherein the program, when running, executes the target text retrieval method according to any one of claims 1 to 4.

Citation Information

Patent Citations

CN111738408A
CN114860874A
