Chinese patent text similarity calculation method

A text similarity calculation technology, applied in the field of text processing, that addresses problems such as the limited scale of ontologies, the failure to consider the semantic information of text, and the failure to consider positional relationships between sentences, and achieves the best accuracy rate.

Status: Inactive; Publication Date: 2019-08-16
BEIJING INFORMATION SCI & TECH UNIV
Cites: 0; Cited by: 16

AI Technical Summary

Problems solved by technology

[0004] In the study of text similarity, traditional string-based methods consider only the literal matching or co-occurrence of strings and ignore the semantic information contained in the text; ontology-based methods are limited by the scale of the manually constructed ontology, so similarity cannot be calculated for words that are not in the ontology; while the corpus-based method trains the word...
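To make the limitation of literal string matching concrete, here is a minimal sketch of one representative string-based measure: Jaccard similarity over character bigrams. The patent does not name a specific string-based method; this choice, and the function names `char_ngrams` and `jaccard_similarity`, are illustrative assumptions.

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Return the set of character n-grams of text (the whole text if shorter than n)."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """Literal string-based similarity: |A & B| / |A | B| over character n-grams."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    # "计算机" and "电脑" both mean "computer", yet they share no characters,
    # so a purely literal measure assigns them zero similarity.
    print(jaccard_similarity("计算机", "电脑"))  # 0.0
    # Strings that overlap literally score high regardless of meaning.
    print(jaccard_similarity("文本相似度", "文本相似度计算"))
```

The synonym pair scoring 0.0 is exactly the gap that semantic (ontology- or corpus-based) word similarity is meant to close.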



Examples

Embodiment Construction

[0046] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described below with reference to specific embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

[0047] The Chinese patent text similarity calculation method is a text similarity calculation method based on the fusion of the SAO (Subject-Action-Object) structure and the vector space model. It first extracts SAO triple structures from the patent text and then incorporates a domain ontology to improve the similarity calculation method. Then, using the word similarity calculation method, a calculation method of the similarity between SAOs is p...
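The step of building SAO similarity from word similarity can be sketched as a weighted sum of slot-wise word similarities. This is a minimal sketch under stated assumptions: the excerpt does not give the patent's actual formula or weights, so the weights `(0.3, 0.4, 0.3)` are hypothetical, and `TOY_ONTOLOGY` / `toy_word_similarity` are a hypothetical stand-in for the ontology-enhanced word similarity the text describes.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class SAO:
    """A Subject-Action-Object triple extracted from a sentence."""
    subject: str
    action: str
    obj: str


# Hypothetical stand-in for an ontology-enhanced word similarity table.
TOY_ONTOLOGY: Dict[FrozenSet[str], float] = {
    frozenset(("computer", "PC")): 0.9,
    frozenset(("extract", "obtain")): 0.8,
}


def toy_word_similarity(a: str, b: str) -> float:
    """Hypothetical word similarity: 1.0 if identical, else a symmetric table lookup."""
    if a == b:
        return 1.0
    return TOY_ONTOLOGY.get(frozenset((a, b)), 0.0)


def sao_similarity(
    x: SAO,
    y: SAO,
    word_sim: Callable[[str, str], float] = toy_word_similarity,
    weights: Tuple[float, float, float] = (0.3, 0.4, 0.3),  # assumed, not from the patent
) -> float:
    """Weighted sum of subject, action and object word similarities."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    w_s, w_a, w_o = weights
    return (w_s * word_sim(x.subject, y.subject)
            + w_a * word_sim(x.action, y.action)
            + w_o * word_sim(x.obj, y.obj))


if __name__ == "__main__":
    a = SAO("system", "extract", "triple")
    b = SAO("system", "obtain", "triple")
    print(round(sao_similarity(a, b), 2))  # 0.92
```

Passing `word_sim` as a parameter keeps the SAO-level formula independent of how word similarity is computed, so an ontology- or corpus-based measure can be swapped in without touching it.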

Abstract

The invention relates to a Chinese patent text similarity calculation method that calculates text similarity by fusing an SAO structure with a vector space model. The method comprises the following steps: extracting SAO triples from a patent text; calculating the similarity of the words in the SAO triples; calculating the similarity between SAOs; calculating the SAO-based patent text similarity; and fusing the vector space model method with the SAO-structure-based method. Because the method calculates patent text similarity by fusing the SAO structure with the vector space model, it exploits the advantages of both while overcoming their respective shortcomings, obtains excellent precision, recall and F-value, and can well meet the requirements of practical applications.
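The last two steps of the abstract (SAO-based text similarity, then fusion with the vector space model) can be sketched as follows. This excerpt does not state the patent's aggregation rule or fusion weight, so the sketch assumes a bidirectional average of best-match SAO similarities, raw term-frequency cosine for the VSM, and a linear fusion with a hypothetical weight `lam`. `slot_match_similarity` is a placeholder for a word-similarity-based SAO measure.

```python
import math
from collections import Counter
from typing import Callable, Sequence, Tuple

SAO = Tuple[str, str, str]  # (subject, action, object)
SaoSim = Callable[[SAO, SAO], float]


def slot_match_similarity(x: SAO, y: SAO) -> float:
    """Placeholder SAO similarity: fraction of identical slots."""
    return sum(a == b for a, b in zip(x, y)) / 3


def sao_text_similarity(saos_a: Sequence[SAO], saos_b: Sequence[SAO],
                        sao_sim: SaoSim = slot_match_similarity) -> float:
    """Assumed aggregation: average best-match SAO similarity in both directions."""
    if not saos_a or not saos_b:
        return 0.0
    a_to_b = sum(max(sao_sim(x, y) for y in saos_b) for x in saos_a) / len(saos_a)
    b_to_a = sum(max(sao_sim(y, x) for x in saos_a) for y in saos_b) / len(saos_b)
    return (a_to_b + b_to_a) / 2


def vsm_cosine(tokens_a: Sequence[str], tokens_b: Sequence[str]) -> float:
    """Cosine similarity of raw term-frequency vectors (a basic vector space model)."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_sq = sum(v * v for v in tf_a.values()) * sum(v * v for v in tf_b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def fused_similarity(tokens_a: Sequence[str], tokens_b: Sequence[str],
                     saos_a: Sequence[SAO], saos_b: Sequence[SAO],
                     lam: float = 0.5,  # hypothetical fusion weight
                     sao_sim: SaoSim = slot_match_similarity) -> float:
    """Linear fusion: lam * VSM similarity + (1 - lam) * SAO-based similarity."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return (lam * vsm_cosine(tokens_a, tokens_b)
            + (1 - lam) * sao_text_similarity(saos_a, saos_b, sao_sim))


if __name__ == "__main__":
    doc_a = ["method", "extract", "triple", "patent"]
    doc_b = ["method", "extract", "triple", "text"]
    saos_a = [("method", "extract", "triple")]
    saos_b = [("method", "extract", "tuple")]
    print(round(fused_similarity(doc_a, doc_b, saos_a, saos_b), 3))  # 0.708
```

Here the VSM term contributes 0.75 (3 shared terms out of 4 in each document) and the SAO term contributes 2/3 (two of three slots match). A linear combination is only one possible fusion; the weight would normally be tuned against precision, recall and F-value on labeled data.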

Description

Technical Field
[0001] The invention belongs to the technical field of text processing, and in particular relates to a method for calculating the similarity of Chinese patent texts.

Background
[0002] Patent documents are the carriers of technology: more than 90% of the world's technologies are preserved in the form of patent documents. With the explosive growth of knowledge, patent plagiarism is also increasing. To protect their legitimate intellectual property rights, individuals, enterprises, or patent holders may bring patent lawsuits, file patent invalidation requests, and seek patent infringement judgments. A very important task in all of these is searching the patent database for related or similar patents. However, faced with a massive patent literature library, the traditional method is simply to enter keywords in a search box to search for related patents. Although there are many results obtained by this metho...

Application Information

IPC (8): G06F17/22; G06K9/62
CPC: G06F40/194; G06F18/22
Inventors: 游新冬, 吕学强, 张乐, 董志安
Owner: BEIJING INFORMATION SCI & TECH UNIV