An intelligent retrieval method, device, electronic device and storage medium for computing patent document similarity based on word frequency and semantics

A technology for semantic computing on patent documents, applied in the field of intelligent retrieval, electronic equipment and storage media. It addresses the strong subjectivity of review opinions, the low accuracy of results, and the reliance on a single checking method, with the effects of narrowing the scope of review, saving manpower and time, and improving accuracy.

Active Publication Date: 2021-05-28
北京知呱呱科技服务有限公司
Cites: 8; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] However, in the existing technology, patent plagiarism-check algorithms are each used in isolation and the accuracy of the results is low. Examiners must spend a great deal of time and energy during patent examination, work efficiency is low, and examination opinions are highly subjective.



Examples


Embodiment 1

[0032] Referring to Figure 1, this embodiment provides an intelligent retrieval method for calculating the similarity of patent documents based on word frequency and semantics. The examples given serve only to explain the present invention, not to limit its scope. The method specifically includes the following steps:

[0033] S101. For all patent data in the question bank, extract the text information related to the content of the test questions, organize it into structured data, and form a word segmentation result;

[0034] S102. Perform bag-of-words statistics and word-vector conversion on the word segmentation results of all the above patent data, and obtain the weight value of each word as preloaded data for model prediction;

[0035] S103. Load all of the above bag-of-words, word-vector, and vocabulary data, perform a full matching query according to the test question publication number, compare the similarity predicted by...
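The bag-of-words statistics and per-word weights of S102 can be illustrated with a minimal sketch. The source does not say which weighting scheme is used, so smoothed IDF weighting is an assumption here, as are the whitespace tokenizer (a stand-in for a real word segmenter), the function names, and the document IDs `P1` to `P3`.

```python
import math
from collections import Counter

def segment(text):
    # Stand-in for a real word segmenter (assumption): lowercase whitespace split.
    return text.lower().split()

def build_weights(docs):
    """Return per-document bag-of-words counts and a smoothed IDF weight per word."""
    bags = {doc_id: Counter(segment(text)) for doc_id, text in docs.items()}
    n_docs = len(bags)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for bag in bags.values() for word in bag)
    idf = {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}
    return bags, idf

def cosine(bag_a, bag_b, idf):
    """Cosine similarity of two bags after scaling each count by its IDF weight."""
    va = {w: c * idf.get(w, 1.0) for w, c in bag_a.items()}
    vb = {w: c * idf.get(w, 1.0) for w, c in bag_b.items()}
    dot = sum(v * vb.get(w, 0.0) for w, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical miniature "question bank".
docs = {
    "P1": "text similarity retrieval using word frequency",
    "P2": "document similarity retrieval using word vectors",
    "P3": "knitted fabric yarn covering machine",
}
bags, idf = build_weights(docs)
sim_12 = cosine(bags["P1"], bags["P2"], idf)
sim_13 = cosine(bags["P1"], bags["P3"], idf)
```

In this toy data `P1` and `P2` share several words while `P1` and `P3` share none, so `sim_12` is positive and `sim_13` is zero. The precomputed `bags` and `idf` play the role of the preloaded data mentioned in S102.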

Embodiment 2

[0074] Referring to Figure 2, this embodiment provides an intelligent data retrieval method based on a single server. The examples given serve only to explain the present invention, not to limit its scope. The method specifically includes the following steps:

[0075] S201. Extract patent information and content from the XML files of the question bank and store them; the extracted content is preliminarily cleaned and sorted in the patent database, and then exported to a CSV file with specified fields;

[0076] S202. After segmenting the full content, removing stop words, and screening high-frequency words, construct a vector model;

[0077] S203. Load the vector model data, and fuse the multiple results of the literal-based bag-of-words algorithm and the semantics-based semantic algorithm to predict the top-ranked patents.

[0078] S203 further includes:

[0079] S2031. Perform word segmentat...
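The fusion in S203 can be sketched as follows. The source does not state the fusion formula, so the weighted linear combination, the `alpha` parameter, the function names, and the example scores are all assumptions made for illustration.

```python
def fuse_scores(literal, semantic, alpha=0.5):
    """Blend literal (bag-of-words) and semantic scores per document.

    alpha is a hypothetical mixing weight: 1.0 uses only the literal score,
    0.0 uses only the semantic score. A document missing from one score
    table is treated as scoring 0.0 there.
    """
    doc_ids = set(literal) | set(semantic)
    return {
        d: alpha * literal.get(d, 0.0) + (1.0 - alpha) * semantic.get(d, 0.0)
        for d in doc_ids
    }

def top_k(scores, k):
    """Return the k highest-scoring (doc_id, score) pairs; ties break by doc_id."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Hypothetical similarity scores of three candidates against one query patent.
literal_scores = {"P1": 0.9, "P2": 0.2, "P3": 0.5}
semantic_scores = {"P1": 0.3, "P2": 0.8, "P3": 0.6}

fused = fuse_scores(literal_scores, semantic_scores, alpha=0.5)
ranked = top_k(fused, 2)
```

With equal weighting, `P1` and `P3` rank ahead of `P2`; shifting `alpha` toward 0.0 lets the semantic score dominate and moves `P2` to the top. In practice the weight would need tuning against labeled examples.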

Embodiment 3

[0086] Referring to Figure 3, this embodiment provides an intelligent retrieval device 210 for calculating the similarity of patent documents based on word frequency and semantics. The examples given serve only to explain the present invention, not to limit its scope. The device specifically includes the following components:

[0087] Data processing module 211: used to extract all patent text content from the question bank according to fields and importance, and to obtain data in a standard format for modeling;

[0088] Intelligent calculation module 212: used to perform various calculations on the extracted standard data to obtain model data reflecting each word's frequency, semantics and weight in the text;

[0089] Model building module 213: used to model and calculate the model data, combine and optimize the calculation results, and build an intelligent retrieval model in combination with business requirements;

[0090] Model prediction module 214: for encapsulati...



Abstract

The present invention provides an intelligent retrieval method, device, electronic equipment and storage medium for calculating the similarity of patent documents based on word frequency and semantics. Bag-of-words statistics and word-vector calculations are performed on all documents in the patent database to obtain the corresponding bag-of-words data and word distance data. A model is built; given the content or test question number as input, the title, abstract, claims, and description of the pending patent are obtained from the question bank data and combined in multiple ways. Rough selection and fine selection are performed according to the bag-of-words algorithm and the semantic algorithm respectively, text similarity analysis is performed on the selected data, and the analysis results are fused and sorted to obtain a comprehensive similarity. After repeated checking and screening, a set of suspicious answers for the patent to be checked is given. The invention improves retrieval speed by adopting two rounds of screening: the first round of rough selection aims to quickly narrow the scope of comparison, and the second round of fine selection aims to improve accuracy. It can effectively save manpower and time, and helps patent examiners narrow the scope of related patents to examine and improve examination efficiency.
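The two rounds of screening described in the abstract can be sketched as a rough selection on word overlap followed by a fine selection on word vectors. This is only an illustration under stated assumptions: Jaccard overlap stands in for the bag-of-words round, averaged word vectors with cosine similarity stand in for the semantic round, and the toy two-dimensional vectors, function names, and document IDs are hypothetical.

```python
import math

def jaccard(tokens_a, tokens_b):
    """Word-overlap ratio used here as the cheap rough-selection score."""
    sa, sb = set(tokens_a), set(tokens_b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def mean_vector(tokens, vectors):
    """Average the vectors of known tokens; None if no token has a vector."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    if u is None or v is None:
        return 0.0
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def two_round_search(query, corpus, vectors, coarse_k=2, fine_k=1):
    # Round 1 (rough selection): keep the coarse_k documents with most word overlap.
    shortlist = sorted(corpus, key=lambda d: (-jaccard(query, corpus[d]), d))[:coarse_k]
    # Round 2 (fine selection): re-rank only the shortlist by vector similarity.
    q_vec = mean_vector(query, vectors)
    scored = [(d, cosine(q_vec, mean_vector(corpus[d], vectors))) for d in shortlist]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))[:fine_k]

# Hypothetical toy word vectors and pre-segmented documents.
vectors = {
    "retrieval": [1.0, 0.0], "search": [0.9, 0.1], "patent": [0.5, 0.5],
    "fabric": [0.0, 1.0], "yarn": [0.1, 0.9],
}
corpus = {
    "P1": ["patent", "search"],
    "P2": ["patent", "fabric"],
    "P3": ["yarn", "fabric"],
}
result = two_round_search(["patent", "retrieval"], corpus, vectors)
```

The rough round discards `P3`, which shares no words with the query, so the more expensive vector comparison runs on two documents only. The fine round then prefers `P1`, because "search" lies close to "retrieval" in the toy vector space even though the two words differ literally.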

Description

Technical field

[0001] The invention belongs to the technical field of data plagiarism checking, and in particular relates to an intelligent retrieval method, device, electronic equipment and storage medium for calculating the similarity of patent documents based on word frequency and semantics.

Background technique

[0002] A patent is a special document protected by law and a means by which the government protects inventions and creations. After the national patent management department accepts a patent application, it must conduct an effective review of the patent, and plagiarism checking is undoubtedly one of the important steps in that review. The plagiarism-check algorithms commonly used in existing plagiarism-check systems are the bag-of-words algorithm and the semantic algorithm.

[0003] The bag-of-words algorithm refers to the similarity calculation based on the bag-of-words results of the word segmentation statistics of the text conten...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/216, G06F40/30, G06F40/242, G06F40/289, G06K9/62, G06F16/33
CPC: G06F40/216, G06F40/30, G06F40/242, G06F40/289, G06F16/3344, G06F16/3346, G06F18/22
Inventor: 汪敏严妍肖国泉裴非肖克彭祖剑邵罗树赵达石鑫
Owner: 北京知呱呱科技服务有限公司