
Long text retrieval method and system based on Gaussian kernel function

A long text retrieval technique based on Gaussian kernel functions, applied in the field of information retrieval, that improves retrieval accuracy and strengthens the association between semantic similarity and user click relevance.

Pending Publication Date: 2022-04-12
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to solve two technical problems faced by long text retrieval: how to obtain paragraph-level relevance labels without additional annotated data, and how to bridge semantic similarity and user click relevance. To this end, it proposes a long text retrieval method and system based on a Gaussian kernel function.



Embodiment Construction

[0014] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0015] A long text retrieval method based on a Gaussian kernel function, as shown in Figure 1, includes the following steps:

[0016] Step 1: Segment the long text.

[0017] Specifically, a length N is specified as the maximum length of each paragraph after the long text is segmented. Within each window of length N, punctuation marks are preferred as cut-off points, so that each segmented paragraph remains semantically complete.
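The segmentation in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `segment_long_text`, the punctuation set, and the choice to cut at the last punctuation mark inside each window (falling back to a hard cut at N characters when none is found) are all assumptions made for the example.

```python
# Hypothetical punctuation set covering common Chinese and English marks.
PUNCTUATION = set("。！？；，.!?;,")


def segment_long_text(text, max_len, punctuation=PUNCTUATION):
    """Split text into passages of at most max_len characters.

    Inside each window of max_len characters, the cut is placed just after
    the last punctuation mark; if the window has none, it is cut at max_len.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    passages = []
    start = 0
    while start < len(text):
        end = min(start + max_len, len(text))
        if end < len(text):
            # Scan backwards for the last punctuation mark in the window.
            for i in range(end - 1, start, -1):
                if text[i] in punctuation:
                    end = i + 1
                    break
        passage = text[start:end].strip()
        if passage:
            passages.append(passage)
        start = end
    return passages


if __name__ == "__main__":
    sample = "Alpha beta. Gamma delta, epsilon zeta! Eta"
    for passage in segment_long_text(sample, max_len=15):
        print(repr(passage))
```

Cutting at the last punctuation mark keeps each passage as long as possible within the limit N, which reduces the number of passages that must be scored in Step 2.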

[0018] Step 2: Using a pre-trained language model, score user queries and candidate passages.

[0019] Specifically, the user's retrieval content and a candidate paragraph are concatenated and encoded by the pre-trained language model, and the output sentence vector [CLS] is used as the text feature interaction vector. A multi-layer perceptron (MLP) is then used to judge the semantic similarity between the user query and the candidate paragraph as a pseudo...
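The scoring head in Step 2 might look like the sketch below. It assumes the [CLS] vector has already been produced by some pre-trained encoder, which is deliberately left out. The BERT-style input layout in `build_model_input`, the single hidden layer with ReLU, and the sigmoid output are illustrative assumptions; the source does not specify the MLP architecture, and all names here are hypothetical.

```python
import math


def build_model_input(query, passage):
    """Concatenate query and passage in a BERT-style layout (assumed)."""
    return f"[CLS] {query} [SEP] {passage} [SEP]"


def mlp_similarity(cls_vector, hidden_weights, hidden_bias, out_weights, out_bias):
    """Map a [CLS] interaction vector to a similarity score in (0, 1).

    One hidden layer with ReLU followed by a sigmoid output; both choices
    are assumptions made for this sketch.
    """
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, cls_vector)) + b)
        for row, b in zip(hidden_weights, hidden_bias)
    ]
    logit = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    print(build_model_input("gaussian kernel", "A passage of the long text."))
    # Toy 2-dimensional [CLS] vector and hand-picked weights, for illustration only.
    score = mlp_similarity(
        [1.0, -1.0],
        hidden_weights=[[1.0, 0.0], [0.0, 1.0]],
        hidden_bias=[0.0, 0.0],
        out_weights=[1.0, 1.0],
        out_bias=0.0,
    )
    print(score)
```

In practice the weights would be learned, and the resulting score for each passage would serve as the paragraph-level signal consumed by the later steps.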


Abstract

The invention relates to a long text retrieval method and system based on a Gaussian kernel function, and belongs to the technical field of information retrieval. The method uses the semantic modeling capability of a pre-trained language model to calculate the semantic similarity between each paragraph of the long text and the user's retrieval content. This semantic similarity serves as a pseudo label for user click relevance, which effectively alleviates the lack of paragraph-level annotation data. Different Gaussian kernel functions then map the pseudo labels into relevance scores of different dimensions. A linear layer aggregates the scores of all paragraphs of the long text and outputs the overall relevance score of the long text for the user's retrieval content. In this way, paragraphs at different levels of semantic similarity can each contribute to the user click relevance judgment, the association between semantic similarity and user click relevance is strengthened, and the accuracy of the long text retrieval model is improved.
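The kernel mapping and aggregation described in the abstract can be illustrated with a short sketch. It assumes the standard Gaussian (RBF) form exp(-(s - mu)^2 / (2 sigma^2)) with one kernel per mean, sums each kernel's responses over all paragraphs before the linear layer, and uses hand-picked means, width, and weights. The pooling choice, the parameter values, and every function name are assumptions for illustration, not details confirmed by the source.

```python
import math


def gaussian_kernel_features(score, means, sigma):
    """Map one paragraph similarity score to one feature per Gaussian kernel."""
    return [math.exp(-((score - mu) ** 2) / (2.0 * sigma ** 2)) for mu in means]


def document_relevance(paragraph_scores, means, sigma, weights, bias=0.0):
    """Aggregate per-paragraph kernel features into one document score.

    Kernel responses are summed over paragraphs (assumed pooling), then a
    linear layer (weights, bias) produces the overall relevance score.
    """
    pooled = [0.0] * len(means)
    for score in paragraph_scores:
        for k, value in enumerate(gaussian_kernel_features(score, means, sigma)):
            pooled[k] += value
    return sum(w * p for w, p in zip(weights, pooled)) + bias


if __name__ == "__main__":
    means = [0.1, 0.5, 0.9]        # hypothetical kernel centres: low, medium, high similarity
    weights = [-0.5, 0.2, 1.0]     # hypothetical linear-layer weights
    scores = [0.92, 0.48, 0.05]    # pseudo labels for three paragraphs
    print(document_relevance(scores, means, sigma=0.1, weights=weights))
```

Because each kernel responds mainly to scores near its own mean, the linear layer can weight low, medium, and high similarity paragraphs differently, which is how paragraphs at different similarity levels each contribute to the final score.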

Description

Technical Field

[0001] The invention relates to a long text retrieval method and system, in particular to a Gaussian kernel function-based long text retrieval method and system, and belongs to the technical field of information retrieval.

Background Technique

[0002] Long text retrieval is a basic task in the field of information retrieval. Its characteristics are: the average length of documents to be retrieved is relatively long, and a single document may contain multiple topics. It is difficult for traditional retrieval models to locate topics related to user click intentions in long texts.

[0003] In recent years, pre-trained language models have performed well in the field of information retrieval. Its powerful contextual semantic modeling capability enables the retrieval model to better calculate the semantic similarity between the user's retrieval content and candidate documents, thereby improving the accuracy of the model in judging whether the two are related or ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/332, G06F16/38, G06F40/30
Inventors: 史树敏, 朱乐, 黄河燕
Owner: BEIJING INSTITUTE OF TECHNOLOGY