Text vector retrieval method combined with external knowledge

A text vector and external knowledge technology, applied in unstructured text data retrieval, text database querying, digital data information retrieval, etc. It addresses the semantic drift of the model's representation space caused by neglecting language-related knowledge, enhancing the semantic information of the question to build a better model and improve retrieval quality.

Active Publication Date: 2021-04-20
BEIJING INSTITUTE OF TECHNOLOGY
Cites: 5; Cited by: 2

AI Technical Summary

Problems solved by technology

[0009] The purpose of the present invention is to remedy the technical defect of semantic drift in the model's representation space, which arises because the text vector space models in existing document retrieval systems are built from plain text alone and ignore language-related knowledge. To this end, the invention proposes a text vector retrieval method combined with external knowledge.



Examples


Embodiment 1

[0109] Figure 1 is a flowchart of the method of the present invention and its embodiments.

[0110] As can be seen from Figure 1, the present invention comprises the following steps:

[0111] Step A: Obtain external knowledge of the question;

[0112] Specifically, a pre-trained dependency syntax model is used to obtain the part-of-speech tag and syntactic tag of each word in the given question;

[0113] In this embodiment, Step A corresponds to Steps 1 and 2 of the summary of the invention;

[0114] Step B: Extract the condition sub-intervals corresponding to the question;

[0115] This step corresponds to Step 3 of the summary of the invention;

[0116] Here, noun words are words whose syntactic tag marks a subject or predicate structure, or words in a linking structure whose linked word acts as a subject or predicate structure; verb words are words whose part-of-speech tag is a verb tag, or the linking structure of...

Embodiment 2

[0130] This embodiment uses a concrete example to describe in detail the operation steps of the text vector retrieval method combined with external knowledge of the present invention.

[0131] The processing flow of the text vector retrieval method combined with external knowledge is shown in Figure 1. As can be seen from Figure 1, the method comprises the following steps:

[0132] Step A: Obtain external knowledge of the question. Specifically, in this embodiment, for the question "who sang the original version of true colors?", the pre-trained dependency syntax model is used to obtain the part-of-speech and syntactic tag of each word in the question, as shown in Figure 2.

[0133] Step B: Extract the condition sub-intervals corresponding to the question; the question is split into several sub-conditions using the part-of-speech and syntactic structure information, specif...
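Steps A and B can be illustrated on the example question. The following is a minimal sketch, assuming Universal Dependencies-style tags, a hand-written parse standing in for the output of the pre-trained dependency syntax model, and a hypothetical splitting rule in which each word joins the sub-condition of its nearest noun or verb ancestor. The patent's own rule ([0116]) is only partly preserved here and may split the question differently; the names `Token`, `QUESTION` and `split_sub_conditions` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of the question plus the tags a dependency parser returns (Step A)."""

    index: int  # 0-based position in the question
    text: str
    pos: str  # part-of-speech tag (UD-style label, assumed)
    dep: str  # dependency relation to the head word
    head: int  # index of the head word; the root points to itself


# Hypothetical parse of the example question; the tags produced by the
# patent's pre-trained dependency syntax model (Figure 2) may differ.
QUESTION = [
    Token(0, "who", "PRON", "nsubj", 1),
    Token(1, "sang", "VERB", "ROOT", 1),
    Token(2, "the", "DET", "det", 4),
    Token(3, "original", "ADJ", "amod", 4),
    Token(4, "version", "NOUN", "obj", 1),
    Token(5, "of", "ADP", "case", 7),
    Token(6, "true", "ADJ", "amod", 7),
    Token(7, "colors", "NOUN", "nmod", 4),
    Token(8, "?", "PUNCT", "punct", 1),
]

# Assumption: sub-conditions are anchored on noun and verb words.
CONTENT_POS = {"NOUN", "PROPN", "VERB"}


def split_sub_conditions(tokens: list[Token]) -> list[str]:
    """Hypothetical Step B: each word joins the group of its nearest noun/verb ancestor."""

    def anchor(tok: Token) -> int:
        # Climb the head chain until a content word (or the root) is reached.
        while tok.pos not in CONTENT_POS and tok.head != tok.index:
            tok = tokens[tok.head]
        return tok.index

    groups: dict[int, list[Token]] = {}
    for tok in tokens:
        if tok.pos == "PUNCT":
            continue  # punctuation carries no condition
        groups.setdefault(anchor(tok), []).append(tok)
    # Tokens are visited left to right, so each group is already in word order.
    ordered = sorted(groups.values(), key=lambda group: group[0].index)
    return [" ".join(t.text for t in group) for group in ordered]


print(split_sub_conditions(QUESTION))
# ['who sang', 'the original version', 'of true colors']
```

Anchoring on the nearest noun or verb keeps modifiers such as "original" and "true" with the word they describe, so each sub-condition stays a self-contained constraint on the answer.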



Abstract

The invention relates to a text vector retrieval method combined with external knowledge and belongs to the technical field of open-domain document retrieval applications. The method introduces the syntactic structure and part-of-speech information of the language into the framework of a text vector space model to obtain the sub-condition structure of a question, expressing the question as several sub-conditions. Documents recalled by the BM25 algorithm are used to calculate the importance of each sub-condition of the question, which provides additional training labels for the final representation of the question. The method optimizes the existing representation approach with the extracted sub-conditions and the corresponding weight scores introduced by the BM25 algorithm, thereby improving the retrieval performance of the text vector method. Because it builds on a pre-trained deep learning model, the method obtains better representation and generalization capability, models the semantic information of questions and documents more effectively, and improves the quality of open-domain retrieval.
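The abstract does not spell out how BM25 scores become importance degrees. The following is a minimal sketch, assuming Okapi BM25 with k1 = 1.5, b = 0.75 and the non-negative (Lucene-style) IDF, a toy stop-word list, and a hypothetical rule: recall the top-k documents for the whole question, sum each sub-condition's BM25 score over those documents, and normalize the sums into a distribution that could serve as a soft training label. The names `BM25` and `sub_condition_weights` and the three-document collection are illustrative, not taken from the patent.

```python
import math
from collections import Counter

# Toy stop-word list (assumption); real systems use a fuller list or none.
STOPWORDS = {"a", "the", "of", "who", "is", "was", "by", "in"}


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase, drop '?', split on whitespace, remove stop words."""
    return [t for t in text.lower().replace("?", " ").split() if t not in STOPWORDS]


class BM25:
    """Okapi BM25 over a small in-memory collection, with non-negative IDF."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.docs = [tokenize(d) for d in docs]
        self.tf = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.k1, self.b = k1, b
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query."""
        tf, dl = self.tf[i], len(self.docs[i])
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for t in tokenize(query):
            f = tf[t]  # Counter returns 0 for absent terms
            if f:
                total += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query: str, k: int) -> list[int]:
        """Indices of the k highest-scoring documents (the recalled set)."""
        ranked = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return ranked[:k]


def sub_condition_weights(question, sub_conditions, docs, k=2):
    """Hypothetical rule: BM25 mass of each sub-condition on the recalled docs, normalized."""
    bm25 = BM25(docs)
    recalled = bm25.top_k(question, k)
    raw = [sum(bm25.score(sc, i) for i in recalled) for sc in sub_conditions]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)  # no lexical evidence: fall back to uniform
    return [r / total for r in raw]


# Toy collection for illustration only.
DOCS = [
    "true colors was originally recorded by cyndi lauper in 1986",
    "phil collins later released a cover version of true colors",
    "the weather today is sunny",
]
weights = sub_condition_weights(
    "who sang the original version of true colors?",
    ["who sang", "the original version", "of true colors"],
    DOCS,
)
print([round(w, 2) for w in weights])  # [0.0, 0.34, 0.66]
```

In this toy run "who sang" gets zero weight because none of its words occur in the recalled documents, which illustrates both the appeal and the limit of lexical weights as labels: they reward sub-conditions with surface overlap, not necessarily the ones that matter most semantically.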

Description

Technical field

[0001] The invention relates to a text vector retrieval method combined with external knowledge, in particular to a text vector retrieval method with enhanced fusion of condition information, which splits a question text into several sub-conditions using part-of-speech and syntactic tag information and uses the lexical matching information provided by an existing algorithm to measure the importance of each sub-condition in the question. The invention belongs to the technical field of open-domain document retrieval applications.

Background

[0002] In recent years, Open Domain Question Answering (OPQA) has gained widespread attention in the field of natural language processing.

[0003] An open-domain question answering system is a two-stage pipeline. For an open-domain question, the open-domain document retrieval system first recalls documents relevant to the current question, and then the open-domain machine readi...
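In a text vector (dense) retriever, the recall step of this pipeline amounts to ranking documents by the similarity of their vectors to the question vector. The following is a minimal sketch, assuming inner-product similarity, precomputed vectors, and an exhaustive scan rather than the approximate nearest-neighbour index a large collection would typically need. The toy 3-dimensional vectors and the names `dot` and `retrieve` are made up, and the reader stage is not shown.

```python
import heapq


def dot(u: list[float], v: list[float]) -> float:
    """Inner product used as the relevance score between two text vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def retrieve(
    question_vec: list[float], doc_vecs: dict[str, list[float]], k: int
) -> list[tuple[str, float]]:
    """Score every document exhaustively and return the k best (doc_id, score) pairs."""
    scored = ((dot(question_vec, vec), doc_id) for doc_id, vec in doc_vecs.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


# Toy vectors (made up); a real system would produce them with a pre-trained
# encoder and pass the recalled documents on to the machine reading stage.
DOC_VECS = {
    "d0": [0.9, 0.1, 0.0],
    "d1": [0.2, 0.8, 0.1],
    "d2": [0.0, 0.1, 0.9],
}
question = [1.0, 0.5, 0.0]
print([doc_id for doc_id, _ in retrieve(question, DOC_VECS, k=2)])  # ['d0', 'd1']
```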


Application Information

IPC (8): G06F16/33; G06F16/332; G06F16/35; G06F40/211; G06F40/30; G06F40/289
Inventors: 史树敏 (Shi Shumin), 刘宏玉 (Liu Hongyu), 黄河燕 (Huang Heyan)
Owner: BEIJING INSTITUTE OF TECHNOLOGY