Text retrieval method, device, storage medium and server

A text similarity technology, applied in unstructured text data retrieval, text database indexing, patent retrieval, etc. It addresses the problems of low accuracy and low retrieval efficiency by narrowing the comparison range while still guaranteeing accuracy.

Active Publication Date: 2022-07-26
GUANGDONG UCAP INTERNET INFORMATION TECH +1

AI Technical Summary

Problems solved by technology

[0005] The embodiments of the present application provide a text retrieval method, device, storage medium and server, which are used to solve the problems of low retrieval efficiency and low accuracy when searching based on sentence vectors.



Embodiment Construction

[0078] To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the embodiments are described in further detail below with reference to the accompanying drawings.

[0079] Before the text retrieval method provided in this embodiment is applied, all patent texts must be preprocessed and the preprocessing results loaded into the retrieval model; the retrieval model is then used to perform text retrieval. The preprocessing consists of three parts: generating the dictionaries, generating the cosine distances between entries, and generating the bags of words. The implementation of each of these three parts is described below.
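The three preprocessing parts can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the first-seen ordering of entry ids, and the assumption that each entry already has a numeric vector (`vectors`) from which cosine distances are computed are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def build_dictionary(tokenized_texts):
    """Map each distinct entry (term) to an integer id, in first-seen order."""
    dictionary = {}
    for tokens in tokenized_texts:
        for tok in tokens:
            if tok not in dictionary:
                dictionary[tok] = len(dictionary)
    return dictionary


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def entry_distances(vectors):
    """Pairwise cosine distances, keyed by alphabetically ordered term pairs."""
    terms = sorted(vectors)
    return {
        (a, b): cosine_distance(vectors[a], vectors[b])
        for i, a in enumerate(terms)
        for b in terms[i + 1:]
    }


def bag_of_words(tokens, dictionary):
    """Count occurrences of each known entry id in one text."""
    return dict(Counter(dictionary[t] for t in tokens if t in dictionary))


# Toy data; the tokens and vectors below are made up for the example.
texts = [["text", "retrieval", "index"], ["text", "similarity"]]
vectors = {
    "retrieval": [1.0, 0.0],
    "index": [1.0, 0.0],
    "similarity": [0.0, 1.0],
}

dictionary = build_dictionary(texts)
bags = [bag_of_words(tokens, dictionary) for tokens in texts]
distances = entry_distances(vectors)
```

Precomputing the pairwise distances once, as sketched here, means retrieval can look them up instead of recomputing them per query; whether the patent stores them this way is not stated in the excerpt.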

[0080] Please refer to Figure 1, which shows a schematic flowchart of the method for generating the dictionaries. The method specifically includes:

[0081] Step 101, for each preprocessed patent ...


Abstract

The present application discloses a text retrieval method, device, storage medium and server, which belong to the technical field of data retrieval. The method includes: acquiring a first bag-of-words combination and first patent information of a first patent text to be retrieved; acquiring a second bag-of-words combination and second patent information of each second patent text in the patent database; according to the first bag-of-words combination, the second bag-of-words combinations and the IPC weights of the entries, screening n second patent texts similar to the first patent text to obtain a rough selection set; according to the first bag-of-words combination, the second bag-of-words combinations in the rough selection set, the cosine distances between entries and the IPC weights, screening m second patent texts similar to the first patent text from the rough selection set to obtain a fine selection set; and according to the matching degree between the first patent information and the second patent information, adjusting the order of the second patent texts in the fine selection set to obtain the search result. The present application can improve retrieval efficiency and accuracy.
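The rough selection, fine selection and reordering stages described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the scoring formulas, so the weighted exact overlap in `coarse_score`, the closest-entry soft match in `fine_score`, and the field-count `match_degree` are hypothetical stand-ins, as are all names and the toy data.

```python
def coarse_score(query_bag, doc_bag, ipc_weight):
    """Assumed rough score: IPC-weighted overlap of exactly matching entries."""
    return sum(
        ipc_weight.get(term, 1.0) * min(count, doc_bag[term])
        for term, count in query_bag.items()
        if term in doc_bag
    )


def fine_score(query_bag, doc_bag, ipc_weight, distance):
    """Assumed fine score: each query entry is matched to its closest doc entry."""
    total = 0.0
    for q_term in query_bag:
        best = max((1.0 - distance(q_term, d) for d in doc_bag), default=0.0)
        total += ipc_weight.get(q_term, 1.0) * best
    return total


def match_degree(query_info, doc_info):
    """Assumed matching degree: number of metadata fields with equal values."""
    return sum(1 for key, value in query_info.items() if doc_info.get(key) == value)


def retrieve(query, docs, ipc_weight, distance, n, m):
    """Rough selection (top n), fine selection (top m), then metadata reordering."""
    rough = sorted(
        docs,
        key=lambda d: coarse_score(query["bag"], docs[d]["bag"], ipc_weight),
        reverse=True,
    )[:n]
    fine = sorted(
        rough,
        key=lambda d: fine_score(query["bag"], docs[d]["bag"], ipc_weight, distance),
        reverse=True,
    )[:m]
    # sorted() is stable, so ties keep their fine-selection order.
    return sorted(
        fine,
        key=lambda d: match_degree(query["info"], docs[d]["info"]),
        reverse=True,
    )


# Toy data; every value below is made up for the example.
pair_distance = {("retrieval", "search"): 0.2}


def distance(a, b):
    """Look up a precomputed cosine distance; unknown pairs count as unrelated."""
    if a == b:
        return 0.0
    return pair_distance.get((a, b), pair_distance.get((b, a), 1.0))


ipc_weight = {"retrieval": 2.0}
query = {
    "bag": {"text": 1, "retrieval": 1},
    "info": {"ipc": "G06F16", "applicant": "X"},
}
docs = {
    "A": {"bag": {"text": 2, "retrieval": 1}, "info": {"ipc": "G06F16", "applicant": "Y"}},
    "B": {"bag": {"text": 1, "search": 1}, "info": {"ipc": "G06F16", "applicant": "X"}},
    "C": {"bag": {"yarn": 3}, "info": {"ipc": "D04B", "applicant": "Z"}},
    "D": {"bag": {"text": 1, "index": 1}, "info": {"ipc": "H04L", "applicant": "Z"}},
}

result = retrieve(query, docs, ipc_weight, distance, n=3, m=2)
```

The point of the two-stage design is that the cheap exact-overlap score runs over the whole database, while the costlier distance-based score runs only over the n candidates that survive, which is how the comparison range is reduced.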

Description

Technical field

[0001] The embodiments of the present application relate to the technical field of data retrieval, and in particular, to a text retrieval method, device, storage medium, and server.

Background

[0002] A patent is a special document protected by law. As the country gradually attaches more importance to the protection of patent intellectual property rights, more and more patent applications need to be examined efficiently. This requires examiners to spend a great deal of energy and time searching for similar reference documents against which to judge the inventive step of a patent application.

[0003] After the examiner inputs the first patent text to be retrieved into the patent search engine, the search engine can convert each sentence in the first patent text into a sentence vector, calculate the similarity between all sentence vectors in the first patent text and all sentence vectors in each second patent text pre-stored in the patent database, sort each s...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/383, G06F16/33, G06F16/338, G06F16/31, G06F16/36
CPC: G06F16/383, G06F16/334, G06F16/3344, G06F16/338, G06F16/313, G06F16/374, G06F2216/11
Inventor: 汪敏严妍裴非赵达张路
Owner: GUANGDONG UCAP INTERNET INFORMATION TECH