Intelligent retrieval method and device for calculating patent literature similarity based on word frequency and semantics, electronic equipment and storage medium thereof

A technology for semantic computing and patent documents, applied in the fields of intelligent retrieval, electronic equipment, and storage media. It addresses the strong subjectivity of review opinions, the low accuracy of results, and the reliance on a single method, thereby narrowing the scope of examination, saving manpower and time, and improving accuracy.

Active Publication Date: 2021-01-22
北京知呱呱科技服务有限公司

AI Technical Summary

Problems solved by technology

[0005] However, in the existing technology, patent plagiarism checking relies on a single algorithm, and the accuracy of the results is not high. Examiners need



Examples


Embodiment 1

[0032] Referring to Figure 1, this embodiment provides an intelligent retrieval method that calculates patent document similarity based on word frequency and semantics. The examples given are only used to explain the present invention and are not intended to limit its scope. The method specifically includes the following steps:

[0033] S101. For all the patent data in the question bank, extract the text information related to the content of the test question, organize it into structured data, and form a word segmentation result;

[0034] S102. Carry out bag-of-words statistics and word vector conversion on the word segmentation results of all the above patent data, and obtain the weight value of each word as preloaded data for model prediction;

[0035] S103. Load all the bag-of-words, word vector, and vocabulary data mentioned above, perform a full matching query according to the test question publication number, compare the similarity predicted b...
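
The bag-of-words statistics and per-word weights of S101 and S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy corpus, the whitespace-based `segment` function (real Chinese text would need a proper word segmenter), and the smoothed inverse-document-frequency weighting are all assumptions, since the text does not name a weighting scheme. Word vector conversion is left out.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the question bank:
# publication number -> extracted text.
patents = {
    "P1": "text similarity retrieval based on word frequency",
    "P2": "semantic retrieval of patent text with word vectors",
    "P3": "image sensor with pixel array",
}


def segment(text):
    """Stand-in word segmentation: lowercase whitespace split (an assumption)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return {doc_id: Counter of term counts} for every document."""
    return {doc_id: Counter(segment(text)) for doc_id, text in docs.items()}


def idf_weights(bags):
    """Smoothed inverse document frequency per word (an assumed weighting)."""
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags.values():
        doc_freq.update(bag.keys())  # count each word once per document
    return {
        word: math.log((1 + n_docs) / (1 + df)) + 1.0
        for word, df in doc_freq.items()
    }


bags = bag_of_words(patents)
weights = idf_weights(bags)
```

Under this weighting, a word that appears in many documents (such as `retrieval` here) receives a lower weight than one confined to a single document (such as `image`), so `bags` and `weights` could serve as the preloaded data that the prediction step reads.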

Embodiment 2

[0074] Referring to Figure 2, this embodiment provides an intelligent data retrieval method based on a single server. The examples given are only used to explain the present invention and are not intended to limit its scope. The method specifically includes the following steps:

[0075] S201. Extract patent information and content from the XML file of the question bank and store it; after preliminary cleaning and sorting in the patent database, the extracted content is written to a CSV file with the specified fields;

[0076] S202. After segmenting the full content, removing stop words, and screening high-frequency words, construct a vector model;

[0077] S203. Load the vector model data, and fuse multiple sets of results from the literal (bag-of-words) algorithm and the semantic algorithm to predict the top-ranked patents.

[0078] Among them, S203 further includes:

[0079] S2031. Segment the co...
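
The extraction and preprocessing of S201 and S202 can be sketched as follows, using only the Python standard library. The XML layout (`patent`, `pubno`, `title`, `abstract`), the field list, and the stop-word set are hypothetical, because the text does not give the question bank schema. The text also does not say whether high-frequency words are kept or discarded, so the sketch only counts word frequencies and leaves that decision open.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XML layout; the real question-bank schema is not given.
SAMPLE_XML = """<patents>
  <patent pubno="P1">
    <title>Text retrieval method</title>
    <abstract>A method for the retrieval of text</abstract>
  </patent>
  <patent pubno="P2">
    <title>Image sensor</title>
    <abstract>A sensor for the capture of images</abstract>
  </patent>
</patents>"""

FIELDS = ["pubno", "title", "abstract"]
STOP_WORDS = {"a", "the", "of", "for"}  # illustrative stop-word list


def xml_to_rows(xml_text):
    """Extract the chosen fields from each <patent> element, trimming whitespace."""
    root = ET.fromstring(xml_text)
    return [
        {
            "pubno": node.get("pubno", ""),
            "title": (node.findtext("title") or "").strip(),
            "abstract": (node.findtext("abstract") or "").strip(),
        }
        for node in root.iter("patent")
    ]


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def tokens(row):
    """Segment title plus abstract and drop stop words (whitespace split is a stand-in)."""
    text = f"{row['title']} {row['abstract']}".lower()
    return [w for w in text.split() if w not in STOP_WORDS]


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
word_freq = Counter(w for row in rows for w in tokens(row))
```

`word_freq.most_common()` then lists the candidate high-frequency words from which a vector model could be built.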

Embodiment 3

[0086] Referring to Figure 3, this embodiment provides an intelligent retrieval device 210 for calculating the similarity of patent documents based on word frequency and semantics. The examples given are only for explaining the present invention and are not intended to limit its scope.

[0087] The device specifically includes the following components:

[0088] Data processing module 211: used to extract all patent text content from the question bank according to fields and importance, and obtain data in the standard format for modeling;

[0089] Intelligent calculation module 212: used to carry out various calculations on the extracted standard data, and obtain model data reflecting its frequency, semantics, and weight in the text;

[0090] Model building module 213: used to model and calculate the model data, combine and optimize the calculation results, and construct an intelligent retrieval model in combination with business requirements;

[0091] Model predicti...


Abstract

The invention provides an intelligent retrieval method and device for calculating patent literature similarity based on word frequency and semantics, together with electronic equipment and a storage medium. Bag-of-words statistics and word vector calculation are performed on all documents in a patent database to obtain the corresponding bag-of-words data and word distance data. The method comprises the following steps: establishing a model; inputting content or examination question numbers; acquiring the titles, abstracts, claims, and specifications of the patents to be examined from the question bank data and combining them in various ways; performing rough selection and fine selection according to a bag-of-words algorithm and a semantic algorithm, respectively; performing text similarity analysis on the selected data; and performing fusion sorting on the analysis results to obtain a comprehensive similarity. Through duplicate checking and screening, a set of suspicious answers for the patent to be checked is given. The method increases retrieval speed by using two rounds of screening: the first round of rough selection aims to narrow the comparison range quickly, and the second round of fine selection aims to improve accuracy. Manpower and time are effectively saved, patent reviewers are helped to narrow the range of related patents to review, and review efficiency is improved.
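
The two rounds of screening and the fusion sorting described above can be sketched as follows. Everything here is illustrative: the documents, the made-up 4-dimensional word vectors, the use of cosine similarity for both rounds, and the weighted-sum fusion with `alpha` are assumptions, since the abstract does not give the actual formulas.

```python
import math
from collections import Counter

# Toy data; identifiers and texts are made up for illustration.
DOCS = {
    "P1": "car engine control method",
    "P2": "automobile motor control device",
    "P3": "image sensor pixel array",
    "P4": "engine cooling pipe",
}

# Made-up word vectors: synonyms share a vector, the last axis means "other".
VEC = {
    "car": (1, 0, 0, 0), "automobile": (1, 0, 0, 0),
    "engine": (0, 1, 0, 0), "motor": (0, 1, 0, 0),
    "control": (0, 0, 1, 0),
    "method": (0, 0, 0, 1), "device": (0, 0, 0, 1),
    "cooling": (0, 0, 0, 1), "pipe": (0, 0, 0, 1),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bow_similarity(a, b):
    """Literal similarity: cosine between term-count vectors."""
    ca, cb = Counter(a.split()), Counter(b.split())
    vocab = sorted(set(ca) | set(cb))
    return cosine([ca[w] for w in vocab], [cb[w] for w in vocab])


def doc_vector(text):
    """Average the vectors of known words; zero vector if none are known."""
    vecs = [VEC[w] for w in text.split() if w in VEC]
    if not vecs:
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))


def semantic_similarity(a, b):
    """Semantic similarity: cosine between averaged word vectors."""
    return cosine(doc_vector(a), doc_vector(b))


def retrieve(query, docs, rough_k=3, alpha=0.5):
    """Round 1 keeps the top rough_k by bag-of-words; round 2 fuses both scores."""
    rough = sorted(
        docs, key=lambda d: bow_similarity(query, docs[d]), reverse=True
    )[:rough_k]
    fused = {
        d: alpha * bow_similarity(query, docs[d])
        + (1 - alpha) * semantic_similarity(query, docs[d])
        for d in rough
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


ranked = retrieve("car engine control", DOCS)
```

With this toy data, rough selection discards `P3`, which shares no words with the query. After fusion, `P2` ranks above `P4` even though `P4` has the higher bag-of-words score, because the synonyms in `P2` match the query semantically.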

Description

Technical field

[0001] The invention belongs to the technical field of data plagiarism checking, and in particular relates to an intelligent retrieval method, device, electronic equipment, and storage medium for calculating the similarity of patent documents based on word frequency and semantics.

Background

[0002] A patent is a special document protected by law and a means for the government to protect inventions and creations. After the national patent management department accepts a patent application, it needs to review the patent effectively, and plagiarism checking is undoubtedly one of the important links in that review. The plagiarism check algorithm commonly used in existing plagiarism check systems is the bag-of-words algorithm or the semantic algorithm.

[0003] The bag-of-words algorithm refers to the similarity calculation based on the bag-of-words results of the word segmentation statistics of the text conten...


Application Information

IPC(8): G06F40/216, G06F40/30, G06F40/242, G06F40/289, G06K9/62, G06F16/33
CPC: G06F40/216, G06F40/30, G06F40/242, G06F40/289, G06F16/3344, G06F16/3346, G06F18/22
Inventor: 汪敏严妍肖国泉裴非肖克彭祖剑邵罗树赵达石鑫
Owner: 北京知呱呱科技服务有限公司