Chinese patent text similarity calculation method

A text similarity calculation method, applied in the field of Chinese patent text similarity calculation, that addresses the problems of low accuracy and recall of calculation results, failure to accurately reflect patent text similarity, and loss of semantic information.

Inactive Publication Date: 2018-09-18
BEIJING INFORMATION SCI & TECH UNIV +1
3 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

Existing methods for calculating the similarity of Chinese patent texts lose semantic information, and the calculation of the similarity of Chinese texts in the prior art...


Embodiment Construction

[0042] In order to make the purpose, technical solutions and advantages of the present invention clearer, the present invention will be further described below in conjunction with the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0043] Word similarity is a measure of the semantic similarity between words. Words are presented in the form of concepts in the domain ontology, and the similarity calculation of words can be transformed into the similarity calculation of concepts in the ontology. Using the existing domain ontology, in order to avoid the problem that words not included in the domain ontology cannot be cal...
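The idea of reducing word similarity to concept similarity in an ontology can be illustrated with a small sketch. The source text is truncated before it gives the actual formula, so everything here is an assumption: the toy `PARENT` hierarchy is hypothetical, the measure is the well-known Wu-Palmer depth-based similarity (swapped in for illustration, not taken from the patent), and the exact-match fallback for words outside the ontology is only a placeholder.

```python
# Hypothetical toy ontology: each concept maps to its parent (None = root).
PARENT = {
    "device": None,
    "engine": "device",
    "sensor": "device",
    "diesel_engine": "engine",
    "petrol_engine": "engine",
}


def path_to_root(concept):
    """Return the chain of concepts from `concept` up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def concept_similarity(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)).

    Depth counts nodes on the path to the root, so the root has depth 1.
    Words missing from the ontology fall back to exact string match
    (an assumption; the patent's own fallback is not shown in the source).
    """
    if a not in PARENT or b not in PARENT:
        return 1.0 if a == b else 0.0
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_a = set(path_a)
    # First ancestor of b that is also an ancestor of a: the lowest common one.
    lcs = next(c for c in path_b if c in ancestors_a)
    depth_lcs = len(path_to_root(lcs))
    return 2 * depth_lcs / (len(path_a) + len(path_b))


sibling_sim = concept_similarity("diesel_engine", "petrol_engine")
distant_sim = concept_similarity("diesel_engine", "sensor")
```

Sibling concepts share a deeper common ancestor than distant ones, so `sibling_sim` comes out larger than `distant_sim`; any depth- or path-based measure would show the same ordering.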

Abstract

The invention relates to a method for calculating the similarity of Chinese patent texts. The method comprises: performing word segmentation on the texts; calculating TF-IDF values for the segmentation results and extracting the words with relatively high TF-IDF values as keywords; locating the sentences that contain the keywords as key sentences, and taking the maximum keyword weight in each key sentence as the weight of that key sentence, thereby obtaining a keyword set for each text; and calculating the weights of the key sentences of the comparison texts, selecting the key sentences of the to-be-compared text and of the comparison texts in sequence, and calculating the similarity of the texts from the sentence similarity of the key sentences. Semantic relationships in the patent texts are analyzed using existing patent domain ontologies, and patent text similarity is calculated using a vector space model together with the domain ontologies. The accuracy and recall of the calculation results are relatively high, the similarity between patents can be described more accurately, the patent examination speed can be increased, and the needs of practical application can be well met.
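The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the texts are assumed to be already segmented into tokens, the smoothed IDF formula and the `top_k` cutoff are arbitrary choices, plain bag-of-words cosine stands in for the ontology-based sentence similarity, and the weighted best-match aggregation in `text_similarity` is a guess at how sentence scores might be combined. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of documents; each document is a list of token lists (sentences).
    Returns one {token: tf-idf weight} dict per document."""
    bags = [Counter(tok for sent in doc for tok in sent) for doc in docs]
    doc_freq = Counter(tok for bag in bags for tok in bag)
    n = len(docs)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        weights.append({
            tok: (count / total) * (math.log((1 + n) / (1 + doc_freq[tok])) + 1)
            for tok, count in bag.items()
        })
    return weights


def key_sentences(doc, weights, top_k=3):
    """Keep sentences containing a top-k keyword; sentence weight = max keyword weight."""
    keywords = set(sorted(weights, key=weights.get, reverse=True)[:top_k])
    result = []
    for sent in doc:
        hits = [weights[tok] for tok in sent if tok in keywords]
        if hits:
            result.append((sent, max(hits)))
    return result


def cosine(a, b):
    """Bag-of-words cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
        math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def text_similarity(keys_a, keys_b):
    """Weighted average of each key sentence's best match in the other text."""
    if not keys_a or not keys_b:
        return 0.0
    total_weight = sum(w for _, w in keys_a)
    score = sum(w * max(cosine(s, t) for t, _ in keys_b) for s, w in keys_a)
    return score / total_weight


# Toy, pre-segmented example texts (hypothetical data).
doc_a = [["ontology", "concept", "similarity"], ["patent", "text", "similarity"]]
doc_b = [["patent", "text", "similarity"], ["yarn", "machine"]]
weights_a, weights_b = tfidf([doc_a, doc_b])
keys_a = key_sentences(doc_a, weights_a)
keys_b = key_sentences(doc_b, weights_b)
sim = text_similarity(keys_a, keys_b)
```

For real Chinese text, the token lists would come from a word segmenter, and `cosine` would be replaced by a sentence similarity that also consults the domain ontology.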

Description

Technical field

[0001] The invention belongs to the technical field of text information processing, and in particular relates to a method for calculating the similarity of Chinese patent texts.

Background technique

[0002] In today's Internet era, patents, as a carrier for recording human achievements, contain a large number of scientific and technological achievements and innovative technologies. The rapid development of science and technology has led to a sharp increase in the number of patent applications each year. Traditional search methods match search terms and generally use the number of matched search terms as the relevance of a patent, without taking into account the semantic information contained in the patent itself. The essence of patent examination is to examine related patents with high patent similarity, and the most important step is calculating the similarity of patent texts. Text similarity, the general calculation method is to use...

Application Information

IPC(8): G06F17/27, G06K9/62
CPC: G06F40/289, G06F40/30, G06F18/22, G06F18/214
Inventor: 吕学强, 董志安
Owner: BEIJING INFORMATION SCI & TECH UNIV