Method for calculating similarity of patent texts

A patent text similarity calculation method, applied in the field of computer text information processing, that addresses the low accuracy and recall of existing methods, achieving high accuracy and recall while saving time.

Inactive Publication Date: 2018-09-14
BEIJING INFORMATION SCI & TECH UNIV +1
3 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

Existing patent text similarity calculation methods have low accuracy and recall.

Method used

Patent data is extracted from the two patent texts and preprocessed; word weights are calculated by combining part-of-speech weights and word position weights with the TF-IDF algorithm; the two texts are represented in a vector space model; and their similarity is compared against a set threshold.

Examples


Example Embodiment

[0026] In order to make the objectives, technical solutions and advantages of the present invention clearer, the following further describes the present invention with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

[0027] As shown in Figure 1, a patent text similarity calculation method includes the following steps:

[0028] Step 1) Extract the patent data in the two patent texts to be compared, and preprocess the patent data;

[0029] Step 2) Combine the part of speech weight and word position weight with the TF-IDF algorithm to calculate the word weight;

[0030] Step 3) Represent the two patent t...
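Step 2 can be illustrated with a minimal Python sketch. The source does not give the weighting formula or any concrete weight values, so everything specific here is an assumption for illustration: the multiplicative combination of the three factors, the smoothed IDF, the `POS_WEIGHT` and `POSITION_WEIGHT` tables, and the helper names `idf` and `word_weights`.

```python
import math
from collections import Counter

# Hypothetical weight tables; the source gives no concrete values.
POS_WEIGHT = {"noun": 1.0, "verb": 0.8, "other": 0.5}
POSITION_WEIGHT = {"title": 1.5, "abstract": 1.2, "claims": 1.3, "description": 1.0}


def idf(term, corpus):
    """Smoothed inverse document frequency over a list of token lists."""
    doc_freq = sum(1 for doc in corpus if term in doc)
    return math.log((1 + len(corpus)) / (1 + doc_freq)) + 1.0


def word_weights(tagged_doc, corpus):
    """Weight each term of one patent.

    tagged_doc is a list of (term, part_of_speech, section) triples.
    Assumed formula: TF * IDF * part-of-speech weight * position weight.
    """
    terms = [term for term, _, _ in tagged_doc]
    term_freq = Counter(terms)
    weights = {}
    for term, pos, section in tagged_doc:
        factor = POS_WEIGHT.get(pos, POS_WEIGHT["other"]) * POSITION_WEIGHT.get(section, 1.0)
        score = (term_freq[term] / len(terms)) * idf(term, corpus) * factor
        # When a term occurs in several sections, keep its strongest score.
        weights[term] = max(weights.get(term, 0.0), score)
    return weights


# Toy input: one tagged patent and a two-document corpus.
doc = [("cup", "noun", "title"), ("hold", "verb", "claims"), ("cup", "noun", "description")]
corpus = [["cup", "hold"], ["bicycle", "lock"]]
weights = word_weights(doc, corpus)
```

In this toy input, "cup" ends up with a larger weight than "hold" because it is more frequent, is a noun, and appears in the title. A real system would tune the weight tables on labelled patent pairs.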

Abstract

The invention relates to a method for calculating the similarity of patent texts. The method comprises the following steps: patent data is extracted from the two patent texts and preprocessed; word weights are calculated by combining part-of-speech weights and word position weights with the TF-IDF algorithm; the two patent texts are represented in a vector space model, yielding two distributed word vectors; and the text similarity is calculated. When the similarity is larger than a set threshold, the two patents are considered similar; otherwise, they are considered not similar. The method takes both patent structural characteristics and inter-word semantic relations into account, and incorporates the special structures of patent texts, such as IPC classification numbers, abstracts and claims, into the similarity calculation. Compared with common text similarity calculation methods, it is more targeted, achieves high accuracy and a high recall rate, and meets the requirements of practical application.
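The final comparison step described in the abstract can be sketched as follows. The abstract does not name the similarity measure or the threshold value, so cosine similarity, the default threshold of 0.5, and the function names `cosine_similarity` and `is_similar` are all assumptions. The sketch also uses sparse term-weight dictionaries in place of the distributed word vectors mentioned in the abstract.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    shared_terms = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty or all-zero vector is similar to nothing.
    return dot / (norm_a * norm_b)


def is_similar(vec_a, vec_b, threshold=0.5):
    """Two patents count as similar only if the score exceeds the threshold.

    The threshold value is a placeholder, not taken from the source.
    """
    return cosine_similarity(vec_a, vec_b) > threshold


patent_a = {"cup": 1.4, "hold": 0.5}
patent_b = {"cup": 1.1, "water": 0.7}
similar = is_similar(patent_a, patent_b)
```

The strict `>` comparison mirrors the abstract's wording that similarity must be larger than the set threshold.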

Description

Technical field

[0001] The invention belongs to the technical field of computer text information processing, and in particular relates to a method for calculating patent text similarity.

Background art

[0002] Patent documents have a relatively fixed organizational structure, which mainly includes IPC classification codes, titles, abstracts, descriptions, and claims. The IPC classification code is an internationally used classification code, and the category of a patent can be determined from it. The claims state the content for which the invention or utility model patent seeks protection, and are the core of the patent application. In order to maintain novelty and avoid patent minefields, patent documents generally use unique or uncommon words or phrases to express common semantics, such as "a container for holding water" to express "water cup"; as another example, the concept of "shared bicycles" is replaced by "bicy...


Application Information

IPC(8): G06F17/27
CPC: G06F40/284, G06F40/30
Inventor: 吕学强, 董志安
Owner: BEIJING INFORMATION SCI & TECH UNIV