Automatic industry classification method and system

An automatic industry classification technology, applied in the field of document analysis, that can solve problems such as missing in-depth information and loss of word-order information.

Active Publication Date: 2020-05-08
BEIJING BENYING TECH CO LTD
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The disadvantage of the existing method is that the natural language processing it uses loses word-order information, does not use the hierarchical vectors generated from the abstract, claims and specification, and therefore misses the in-depth information contained in the patent text.



Examples


Embodiment 1

[0084] As shown in Figures 1 and 2, step 1000 is executed: the industry tree generation module 200 defines the target industry tree, and the scope of patents to be classified is determined manually as required.

[0085] Step 1100 is executed to determine the target patent scope using the confirmation module 210. The industry tree is defined as needed: $I = \{i_1, \dots, i_j, \dots, i_n\}$, where $i_j \in I$ is a first-level industry, $j$ is the number of the first-level industry, $1 \le j \le n$, and $N$ is the number of all leaf nodes under $I$. Any non-leaf node of $I$ is written $i_{jkl\dots} = \{i_{jkl\dots 1}, \dots, i_{jkl\dots t}\}$, and every node other than a leaf node has degree $\ge 2$. Here $k$ is the second-level industry number, $l$ is the third-level industry number, and $t$ is the second-to-last-level industry number.

[0086] Step 1200 is executed: the tag generation module 220 generates tags on the target industry tree. The number $p$ of patents that can be tagged is determined according to resource constraints, with $p \ge N$, and tag at leas...

Embodiment 2

[0115] A method for automatic industry classification, comprising the following steps:

[0116] 1. Define the target industry tree. The industry tree is defined as needed: $I = \{i_1, \dots, i_n\}$, where $i_j \in I$ is a primary industry, which can be further divided into secondary industries $i_j = \{i_{j1}, \dots, i_{jm}\}$, and so on, so that any non-leaf node of $I$ is $i_{jkl\dots} = \{i_{jkl\dots 1}, \dots, i_{jkl\dots t}\}$. Following the general practice of industry division, every node other than a leaf node has degree $\ge 2$. Let $N$ be the number of all leaf nodes under $I$.
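The tree definition above can be illustrated with a small data structure. The following is a minimal Python sketch, not the patent's implementation: the class name `IndustryNode`, the helper functions `count_leaves` and `validate_degrees`, and the example industry names are all hypothetical. It only shows the two properties stated in the text: every non-leaf node has degree of at least 2, and N is the number of leaf nodes under I.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndustryNode:
    """One node of the industry tree; a node with no children is a leaf."""
    name: str
    children: list[IndustryNode] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def count_leaves(node: IndustryNode) -> int:
    """Return the number of leaf nodes under `node` (N when called on the root)."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


def validate_degrees(node: IndustryNode) -> None:
    """Raise ValueError if any non-leaf node has fewer than 2 children."""
    if node.is_leaf():
        return
    if len(node.children) < 2:
        raise ValueError(f"non-leaf node {node.name!r} has degree < 2")
    for child in node.children:
        validate_degrees(child)


# Hypothetical two-level tree; the node names are illustrative only.
tree = IndustryNode("I", [
    IndustryNode("i_1", [IndustryNode("i_11"), IndustryNode("i_12")]),
    IndustryNode("i_2", [
        IndustryNode("i_21"),
        IndustryNode("i_22"),
        IndustryNode("i_23"),
    ]),
])
validate_degrees(tree)
N = count_leaves(tree)
```

In this sketch the leaf count N is computed from the tree rather than stored, so it stays correct if industries are added or removed.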


Abstract

The invention provides an automatic industry classification method and system. The method comprises the following steps: determining a target patent range; defining a target industry tree; generating tags on the target industry tree; performing rough classification of the target patents using the tags; and performing fine classification of the target patents according to the rough classification result. The method uses transductive (direct-push) learning to fully mine the information in a small amount of annotation; it uses IPC information, which enriches the information dimensions and reduces the amount of calculation; and it uses hierarchical vectors generated from the abstracts, claims and specifications, which preserves word-order information and mines the patent texts more deeply.
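To make the idea of transductive learning from a small tagged set concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the method claimed in the patent: it uses a simple nearest-neighbour self-labelling loop over cosine similarity, the function names `cosine` and `transductive_labels` are hypothetical, the two-dimensional vectors stand in for whatever document vectors the system actually produces, and the rough/fine split and IPC information are not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def transductive_labels(labeled, unlabeled):
    """Label every unlabeled vector by repeated nearest-neighbour propagation.

    labeled:   dict id -> (vector, label), the small manually tagged set
    unlabeled: dict id -> vector, the patents still to be classified
    Each round labels the single most confident unlabeled item and adds it
    to the labeled pool, so later decisions can use it as evidence.
    """
    if not labeled:
        raise ValueError("at least one labeled example is required")
    pool = dict(labeled)
    remaining = dict(unlabeled)
    result = {}
    while remaining:
        best = None  # (similarity, unlabeled id, label of nearest pool item)
        for uid, vec in remaining.items():
            for pool_vec, label in pool.values():
                sim = cosine(vec, pool_vec)
                if best is None or sim > best[0]:
                    best = (sim, uid, label)
        _, uid, label = best
        result[uid] = label
        pool[uid] = (remaining.pop(uid), label)
    return result


# Hypothetical toy data: two tagged patents and three untagged ones.
labeled = {"p1": ([1.0, 0.0], "A"), "p2": ([0.0, 1.0], "B")}
unlabeled = {"p3": [0.9, 0.1], "p4": [0.6, 0.4], "p5": [0.1, 0.9]}
labels = transductive_labels(labeled, unlabeled)
```

Because the unlabeled patents are known in advance, each newly labeled item becomes evidence for the rest. That is the sense in which a small amount of annotation can be spread across the whole target set.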

Description

Technical field

[0001] The invention relates to the technical field of document analysis, in particular to an automatic industry classification method and system.

Background technique

[0002] The rapid development of science and technology has brought a surge of patent texts and the continuous emergence of new industries. In order to analyze technological development in the context of an industry, it is necessary to tag patents with industry labels. Manual labeling is accurate, but it is slow and expensive. Therefore, there is a need for an automatic classification method that requires only a small amount of annotation, is computationally efficient, and mines the annotation information more fully.

[0003] Existing methods either require a large amount of manual labeling, or do not use manual labeling at all, so that the correspondence with the target industry cannot be established directly. Existing methods generally use patent texts for natura...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06K9/62
CPC: G06F18/23, G06F18/24323, G06V10/7625, G06F18/23213, G06F40/289, G06F40/30, G06Q50/184, G06F2216/11, G06F16/353, G06Q10/0637
Inventor: 李卫宁
Owner: BEIJING BENYING TECH CO LTD