Patent classification method based on similarity measurement
A patent classification technology based on similarity measurement, applicable to text database clustering/classification, instruments, electronic digital data processing, and related fields. It addresses the low accuracy of patent classification and reduces classification errors.
Examples
Specific Embodiment 1
[0025] Specific Embodiment 1: This embodiment is described with reference to Figure 1. The patent classification method based on similarity measurement described in this embodiment is implemented through the following steps:
[0026] Step 1. For the text of each patent's specification abstract, calculate the similarity between specification abstracts by combining the CHI (chi-square) statistic with cosine similarity. This addresses the problem that some features have a high CHI value yet carry no classification information;
[0027] Step 2. Using the patents' IPC classification numbers together with the abstract similarity calculated in Step 1, calculate the hybrid similarity of the patents;
[0028] Step 3. Using the hybrid similarity calculated in Step 2, classify the patents with the KNN classification method.
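Step 3 names only "the KNN classification method". A minimal sketch of similarity-weighted KNN over an arbitrary pairwise similarity (such as the hybrid similarity of Step 2) might look as follows. The function name `knn_classify`, the default `k=5`, and the choice to weight each neighbour's vote by its similarity are assumptions for illustration, not details given in the method.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def knn_classify(
    query: T,
    training: Sequence[tuple[T, Hashable]],
    similarity: Callable[[T, T], float],
    k: int = 5,  # assumption: the method does not specify k
) -> Hashable:
    """Label `query` by a similarity-weighted vote of its k most similar training items."""
    if k < 1 or not training:
        raise ValueError("need k >= 1 and a non-empty training set")
    # Score every labelled training patent against the query.
    scored = [(similarity(query, item), label) for item, label in training]
    # Keep the k highest-scoring neighbours.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Assumption: each neighbour votes with a weight equal to its similarity.
    votes: dict[Hashable, float] = defaultdict(float)
    for score, label in scored[:k]:
        votes[label] += score
    # Return the label with the largest total vote.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical toy similarity on one-dimensional "patents".
    def toy_similarity(a: float, b: float) -> float:
        return 1.0 / (1.0 + abs(a - b))

    train = [(0.0, "A"), (0.2, "A"), (5.0, "B"), (5.3, "B")]
    print(knn_classify(0.1, train, toy_similarity, k=3))
```

With these toy points a query at 0.1 is assigned "A", the label of its two close neighbours. Weighting votes by similarity lets closer neighbours count for more than distant ones; a plain majority vote would also be consistent with the wording of Step 3.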
Specific Embodiment 2
[0029] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in that, in Step 1, the similarity between specification abstracts is calculated by combining the CHI statistic with cosine similarity as follows:
[0030] Step 11. Denote a set of patents with similar technical subjects as $P=\{p_1,p_2,\ldots,p_n\}$, where $n$ is the number of patents in $P$;
[0031] Step 12. Extract the IPC classification number and the specification abstract of each patent in $P$. The set of specification abstracts is $A=\{a_1,a_2,\ldots,a_n\}$, where $a_1$ is the specification abstract of patent $p_1$, $a_2$ is that of patent $p_2$, and so on up to $a_n$ for patent $p_n$;
[0032] Step 13. For the $i$-th patent $p_i$ in $P$, calculate the similarity between its specification abstract and the j...
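The formulas of Step 13 are truncated in this excerpt, so the following is only a generic sketch of CHI-weighted cosine similarity, not the patent's own formula. It assumes that each abstract is already tokenised, that the categories used for the CHI statistic are the patents' IPC labels, and that each term's weight is its largest chi-square score over the categories multiplied by its term frequency. The names `chi_square_weights` and `weighted_cosine` are hypothetical, and the patent's refinement for features with a high CHI value but no classification information is not reproduced.

```python
from __future__ import annotations

import math
from collections import Counter


def chi_square_weights(docs: list[list[str]], labels: list[str]) -> dict[str, float]:
    """Weight each term by its largest chi-square score over all categories.

    Assumption: categories are the patents' IPC labels; this is the textbook
    CHI statistic, not the patent's own weighting formula.
    """
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    vocabulary = set().union(*doc_sets)
    categories = set(labels)
    weights: dict[str, float] = {}
    for term in vocabulary:
        best = 0.0
        for cat in categories:
            # 2x2 contingency counts: term present/absent x document in/out of category.
            a = sum(1 for s, lab in zip(doc_sets, labels) if term in s and lab == cat)
            b = sum(1 for s, lab in zip(doc_sets, labels) if term in s and lab != cat)
            c = sum(1 for s, lab in zip(doc_sets, labels) if term not in s and lab == cat)
            d = n - a - b - c
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        weights[term] = best
    return weights


def weighted_cosine(doc1: list[str], doc2: list[str], weights: dict[str, float]) -> float:
    """Cosine similarity of term-frequency vectors scaled by CHI weights."""
    v1 = {t: f * weights.get(t, 0.0) for t, f in Counter(doc1).items()}
    v2 = {t: f * weights.get(t, 0.0) for t, f in Counter(doc2).items()}
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an abstract with no weighted terms has no defined direction
    return dot / (norm1 * norm2)
```

With two battery abstracts labelled H01M and two engine abstracts labelled F02B, the shared term "battery" receives the largest weight, the two battery abstracts score 0.9, and abstracts with no weighted term in common score 0.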
Specific Embodiment 3
[0043] Specific Embodiment 3: This embodiment differs from Specific Embodiment 2 in that Step 2 is carried out as follows:
[0044] Step 21. Calculate the similarity of the IPC classification codes:
[0045] The IPC classification code similarity is the ratio of the number of IPC levels that the two patents share to the total number of IPC levels in the sample. Let the IPC classification code of the $i$-th patent be $IPC_i$ and that of the $j$-th patent be $IPC_j$; the similarity $S_{IPC}(p_i,p_j)$ between $IPC_i$ and $IPC_j$ is calculated by Equation (4):
[0046]
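Equation (4) itself is missing from this excerpt. Reading the prose definition literally, one possible sketch counts how many levels of the IPC hierarchy two codes share and divides by the number of levels. It assumes the five standard IPC levels (section, class, subclass, main group, subgroup), codes written like `H01M 10/0525`, and that sharing a level requires sharing every level above it. The names `ipc_levels` and `ipc_similarity` are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: codes follow the "H01M 10/0525" layout (space optional).
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_levels(code: str) -> tuple[str, ...]:
    """Split an IPC code into its five hierarchical levels, most general first."""
    match = IPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised IPC code: {code!r}")
    section, cls, subclass, group, subgroup = match.groups()
    return (
        section,                                    # section, e.g. "H"
        section + cls,                              # class, e.g. "H01"
        section + cls + subclass,                   # subclass, e.g. "H01M"
        f"{section}{cls}{subclass} {group}",        # main group, e.g. "H01M 10"
        f"{section}{cls}{subclass} {group}/{subgroup}",  # subgroup
    )


def ipc_similarity(ipc_i: str, ipc_j: str) -> float:
    """Shared hierarchy levels divided by the total number of levels (assumed 5)."""
    levels_i, levels_j = ipc_levels(ipc_i), ipc_levels(ipc_j)
    shared = 0
    for level_i, level_j in zip(levels_i, levels_j):
        if level_i != level_j:
            break  # once a level differs, every lower level differs too
        shared += 1
    return shared / len(levels_i)
```

For example, `H01M 10/0525` and `H01M 4/13` share section, class, and subclass, giving 3/5 = 0.6 under these assumptions.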
[0047] Step 22. Calculate the hybrid similarity of the patents:
[0048] The hybrid similarity of two patents is computed from their specification abstract similarity and their IPC classification code similarity, as shown in Equation (5):
[0049] $S_w(p_i,p_j)=\alpha\times S_{IPC}(p_i,p_j)+(1-\alpha)\times S(\ldots$
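Equation (5) is cut off in this excerpt inside its second term. Assuming, as [0048] describes, that the second term is the specification abstract similarity from Step 1, a minimal sketch of the weighted combination follows. The function names and the range check on $\alpha$ are illustrative assumptions, and the method's value for $\alpha$ is not given here.

```python
from __future__ import annotations


def hybrid_similarity(s_ipc: float, s_abstract: float, alpha: float) -> float:
    """S_w = alpha * S_IPC + (1 - alpha) * S_abstract.

    Assumption: the truncated second term of Equation (5) is the abstract
    similarity from Step 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * s_ipc + (1.0 - alpha) * s_abstract


def hybrid_matrix(
    ipc_sim: list[list[float]], abstract_sim: list[list[float]], alpha: float
) -> list[list[float]]:
    """Apply hybrid_similarity element-wise to two n x n similarity matrices."""
    if len(ipc_sim) != len(abstract_sim):
        raise ValueError("matrices must have the same size")
    return [
        [hybrid_similarity(s, t, alpha) for s, t in zip(row_ipc, row_abs)]
        for row_ipc, row_abs in zip(ipc_sim, abstract_sim)
    ]
```

With $\alpha = 0.5$, an IPC similarity of 1.0 and an abstract similarity of 0.5 combine to 0.75. Setting $\alpha = 1$ or $\alpha = 0$ reduces $S_w$ to the IPC similarity or the abstract similarity alone.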