Patent classification method based on similarity measurement

A similarity measurement and patent classification technology, applied in text database clustering/classification, instruments, electronic digital data processing, etc., that addresses the low accuracy of patent classification and reduces classification errors.

Pending Publication Date: 2020-11-13
HARBIN ENG UNIV

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to solve the low accuracy of patent classification under existing patent classification methods by proposing a patent classification method based on similarity measurement.

Examples

Specific Embodiment 1

[0025] Embodiment 1: This embodiment is described with reference to Figure 1. The patent classification method based on similarity measurement described in this embodiment is implemented through the following steps:

[0026] Step 1. For the text of the patent specification abstracts, calculate the similarity between abstracts by combining the CHI statistic with cosine similarity, which addresses the problem that some features have a high CHI value but carry no classification information;

[0027] Step 2. Based on the IPC classification numbers of the patents, combined with the abstract similarity calculated in Step 1, calculate the mixed similarity of the patents;

[0028] Step 3. According to the mixed similarity calculated in Step 2, classify the patents using the KNN classification method.
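
Step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `knn_classify`, the default value of `k`, and plain majority voting are all hypothetical choices. The source only states that KNN is applied to the mixed similarity.

```python
from collections import Counter


def knn_classify(similarities, labels, k=3):
    """Classify one query patent by majority vote among its k most similar patents.

    similarities: mixed similarity between the query patent and each labeled
                  patent (higher means more similar).
    labels:       class label of each labeled patent, in the same order.
    k:            number of neighbours that vote (assumed value, not from the source).
    """
    if len(similarities) != len(labels):
        raise ValueError("similarities and labels must have the same length")
    if not labels:
        raise ValueError("at least one labeled patent is required")

    # Rank the labeled patents from most to least similar to the query.
    ranked = sorted(range(len(labels)), key=lambda idx: similarities[idx], reverse=True)

    # Count the labels of the k nearest neighbours and return the most common one.
    votes = Counter(labels[idx] for idx in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because KNN here ranks by similarity rather than distance, the neighbours are the patents with the largest scores, so no conversion to a distance metric is needed.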

Specific Embodiment 2

[0029] Embodiment 2: This embodiment differs from Embodiment 1 in that, in Step 1, the similarity of the specification abstracts is calculated from their text by combining the CHI statistic with cosine similarity, specifically as follows:

[0030] Step 11. Denote a set of patents with similar technical subjects as set P, P = {p_1, p_2, ..., p_n}, where n is the number of patents contained in set P;

[0031] Step 12. Extract the IPC classification number and the specification abstract of each patent in set P. The set of specification abstracts is A = {a_1, a_2, ..., a_n}, where a_1 is the abstract of patent p_1, a_2 is the abstract of patent p_2, and a_n is the abstract of patent p_n;

[0032] Step 13. Calculate, respectively, for the specification abstract of the i-th patent p_i in set P and the j...
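
The formula of Step 13 is truncated in this text, so the following is only a hedged sketch of the two ingredients named in Step 1: the standard chi-square (CHI) statistic for a term and a class, and the cosine similarity of two term vectors. Using the CHI value as a per-term weight before taking the cosine is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def chi_square(a, b, c, d):
    """Standard CHI statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def weighted_vector(term_counts, chi_weights):
    """Scale raw term counts by a per-term CHI weight (assumed combination)."""
    return {term: count * chi_weights.get(term, 0.0) for term, count in term_counts.items()}


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A term that appears equally often inside and outside a class gets a CHI value of 0 and therefore drops out of the weighted vectors.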

Specific Embodiment 3

[0043] Embodiment 3: This embodiment differs from Embodiment 2 in that the specific process of Step 2 is as follows:

[0044] Step 21. Calculate the similarity of the IPC classification numbers:

[0045] The similarity of IPC classification numbers is the ratio of the number of IPC levels shared by the two patents to the total number of IPC levels of the sample. Let the IPC classification number of the i-th patent be IPC_i and that of the j-th patent be IPC_j; the similarity S_IPC(p_i, p_j) between IPC_i and IPC_j is then calculated as in formula (4):

[0046]
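
Formula (4) is not reproduced in this text. The verbal definition in [0045] can be sketched as below, assuming that an IPC number has five levels (section, class, subclass, main group, subgroup) and that levels are compared from the top down, stopping at the first mismatch. Both points are assumptions, as are the names `ipc_levels` and `ipc_similarity`.

```python
import re

# Matches IPC numbers such as "G06F16/35": section, class, subclass, main group, subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_levels(code):
    """Split an IPC number into its five hierarchical levels."""
    match = IPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised IPC number: {code!r}")
    return match.groups()


def ipc_similarity(code_i, code_j):
    """Ratio of shared leading IPC levels to the total number of levels."""
    levels_i = ipc_levels(code_i)
    levels_j = ipc_levels(code_j)
    shared = 0
    for level_i, level_j in zip(levels_i, levels_j):
        if level_i != level_j:
            break  # deeper levels are only meaningful under a common parent
        shared += 1
    return shared / len(levels_i)
```

For example, G06F16/35 and G06F40/205 share the section, class, and subclass levels, so under these assumptions three of five levels match.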

[0047] Step 22. Calculate the mixed similarity of the patents:

[0048] The mixed similarity of the patents is obtained from the similarity of the specification abstracts and the similarity of the IPC classification numbers, as shown in formula (5):

[0049] S_w(p_i, p_j) = α × S_IPC(p_i, p_j) + (1 − α) × S(...
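
The weighted combination in formula (5) can be sketched as below. The second term is truncated in this text; following [0048], it is assumed here to be the abstract similarity of the same patent pair. The function name and the default value of `alpha` are hypothetical, since the source gives no value for α.

```python
def hybrid_similarity(s_ipc, s_abstract, alpha=0.5):
    """Mixed similarity: alpha weights the IPC term, (1 - alpha) the abstract term.

    s_ipc:      similarity of the two patents' IPC classification numbers
    s_abstract: similarity of the two patents' specification abstracts (assumed
                to be the truncated second term of formula (5))
    alpha:      weight in [0, 1]; 0.5 is an illustrative default only
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * s_ipc + (1.0 - alpha) * s_abstract
```

Setting `alpha` to 1 reduces the score to the IPC similarity alone, and setting it to 0 reduces it to the abstract similarity alone.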

Abstract

The invention discloses a patent classification method based on similarity measurement and belongs to the technical field of text classification. It addresses the low accuracy of existing patent classification methods. Considering the characteristics of patent specification abstracts, the invention provides a patent classification method based on mixed similarity that combines the CHI statistic with cosine similarity and with the similarity of IPC classification numbers. For the claims, it provides a patent classification method based on claim similarity: the similarity of the claims is calculated from the extracted SAO-x multi-dimensional structure, and the patents are classified with a KNN classification algorithm based on the similarity result. Compared with existing patent classification methods, the accuracy of automatic patent classification reaches 70% or above, and classification errors arising from the subjectivity of manual classification are reduced. The method can be applied in the technical field of text classification.

Description

Technical field

[0001] The invention belongs to the technical field of text classification, and in particular relates to a patent classification method based on similarity measurement.

Background

[0002] In the wave of global economic development, science and technology have become the primary productive force and a key factor in promoting the development of modern productive forces. Innovation in science and technology has driven the development of enterprises and governments. As knowledge carriers that cover technologies in many fields, patents are used by enterprises and governments to measure innovation capability. The quantity and quality of patent data therefore reflect the level of technological and economic development of each country. How to obtain effective innovative technology information from these patent texts and provide technological and innovative support for ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06K9/62, G06F40/205
CPC: G06F16/35, G06F40/205, G06F18/22
Inventors: 周连科, 王红滨, 王念滨, 张毅, 仝彤, 刘鹏, 席泽盛, 崔琎
Owner: HARBIN ENG UNIV