Long text retrieval model based on contrastive learning

A long text retrieval model technology, applied in unstructured text data retrieval, neural learning methods, text database querying and related areas. It addresses the problem that retrieving over long texts consumes excessive time and space resources, and it improves retrieval accuracy and efficiency.

Pending Publication Date: 2022-03-18
SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI +1

AI Technical Summary

Problems solved by technology

[0003] The primary difficulty in similar case retrieval is that the text is too long. In a typical retrieval scenario the query is a short sentence, whereas the query text in similar case retrieval often runs to thousands of words. Traditional retrieval models usually perform literal matching based on keywords. They do not restrict text length, but they place high demands on word segmentation accuracy and cannot capture semantic correlations between words. General deep learning models, by contrast, restrict the input length, and their computation time and space costs grow as the input length increases. In addition, long text retrieval faces a bottleneck in retrieval efficiency. Traditional retrieval models spend a great deal of time on computation when the query text is very long and the database holds a large number of candidate cases. Interaction-based deep learning models generate a large number of interaction computations in this scenario and consume prohibitive time and space resources, so they are not applicable either.

Examples

Embodiment Construction

[0023] The following will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only part of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the scope of protection of this application.

[0024] Similar case retrieval is a specific retrieval requirement in the legal field. It aims to retrieve similar cases from the database and return sorted results based on long texts provided by users, such as complaints and judgment documents. A good similar case retrieval system can provide users with valuable legal references such as case judgment information, so implementing a long text retrieval model for the legal field has important re...

Abstract

The invention provides a long text retrieval model based on contrastive learning. The model comprises a legal-domain pre-training module, a contrastive learning module, and a retrieval module. The legal-domain pre-training module constructs a basic long text encoder and performs domain pre-training on it using a corpus of legal documents. The contrastive learning module constructs training data from an annotated case data set and uses that data to train the text vectors of the long text encoder; the training data comprises query statements together with positive and negative samples for each query. The retrieval module uses the trained long text encoder to retrieve the cases that correspond to a long text query. In this way, the model addresses the difficulty deep models have in processing long text: the document encoder is adapted to the characteristics of similar case retrieval through domain pre-training and contrastive learning, which improves retrieval accuracy and efficiency.
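The contrastive learning and retrieval steps described in the abstract can be illustrated with a small sketch. The source does not state the loss function, similarity measure, or temperature, so the InfoNCE-style loss, cosine similarity, and the value 0.05 below are assumptions chosen for illustration. The function names `info_nce_loss` and `retrieve` are hypothetical, and the short lists of floats stand in for vectors that the long text encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss for one query (assumed, not taken from the patent).

    The loss is small when the query is closer to its positive sample
    than to any of its negative samples.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability of picking the positive sample.
    return log_sum - logits[0]


def retrieve(query_vec, case_vecs, top_k=3):
    """Rank pre-encoded cases by similarity to the query vector."""
    scored = [(case_id, cosine(query_vec, vec)) for case_id, vec in case_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for encoder outputs (hypothetical data).
query = [1.0, 0.0]
similar_case = [0.9, 0.1]
unrelated_cases = [[0.0, 1.0], [-1.0, 0.0]]

loss = info_nce_loss(query, similar_case, unrelated_cases)
ranking = retrieve(
    query,
    {"case_a": similar_case, "case_b": unrelated_cases[0], "case_c": unrelated_cases[1]},
    top_k=2,
)
```

Because each candidate case is encoded once and compared with the query through a single vector similarity, ranking needs no pairwise interaction computation between the query text and every case text, which is the efficiency concern raised in paragraph [0003].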

Description

Technical field

[0001] The present application relates to the technical field of text retrieval, and in particular to a long text retrieval model based on contrastive learning.

Background technique

[0002] The user enters a long text containing the basic facts of a case, such as a complaint, an appeal, or a judgment document, and the retrieval system returns from the database the judgment documents that are similar to that text in terms of basic facts, points of dispute, and legal issues, sorted by relevance. Legal professionals such as practicing lawyers, corporate counsel, and judicial personnel, as well as ordinary people, all have a great demand for legal search, and its value is self-evident.

[0003] The primary difficulty in similar case retrieval is that the text is too long. In a typical retrieval scenario the query is a short sentence, whereas the query text in similar case retrieval often runs to thousands of words. Traditional retrieval models ofte...

Application Information

IPC(8): G06F16/33, G06F16/338, G06F16/35, G06N3/08, G06N3/04
CPC: G06F16/3332, G06F16/3347, G06F16/338, G06F16/353, G06N3/084, G06N3/045, Y02D10/00
Inventor: 钟泽艺, 杨敏, 贺倩明
Owner SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI