Enterprise industry classification method

A classification and clustering technique, applied in the fields of instruments, character and pattern recognition, and computer components, that addresses the problem of low classification accuracy.

Active Publication Date: 2018-04-20
广州探迹科技有限公司

AI Technical Summary

Problems solved by technology

A single classifier model relies too heavily on the coverage of the sample descriptions, so it is less accurate when classifying a new sample whose description has never appeared before.


Embodiment Construction

[0045] The accompanying drawings are for illustrative purposes only and should not be construed as limiting this patent. To better illustrate this embodiment, certain components in the drawings may be omitted, enlarged or reduced, and they do not represent the size of the actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The present invention is described in further detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0046] The main innovation of the enterprise industry classification method of the present invention is to use word vectors and a semi-supervised graph splitting clustering method to extract the main business keywords of each enterprise, eliminate garbage words, and construct a keyword library; use the extracted keywords as feat...
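The keyword-library step can be illustrated with a minimal sketch. The source does not spell out the graph splitting algorithm, so everything below is an assumption made for illustration: words are linked when the cosine similarity of their word vectors reaches a threshold, weaker links are cut so the graph splits into connected components, and a component is kept only if the hand-labelled seed words inside it agree on one industry. The function names, the toy two-dimensional vectors, and the threshold value are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_into_clusters(vectors, threshold):
    """Link words whose similarity reaches the threshold, then return
    the connected components of the resulting graph (assumed reading
    of "graph splitting": weak edges are cut, components remain)."""
    words = list(vectors)
    neighbours = defaultdict(set)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if cosine(vectors[w1], vectors[w2]) >= threshold:
                neighbours[w1].add(w2)
                neighbours[w2].add(w1)
    clusters, seen = [], set()
    for word in words:
        if word in seen:
            continue
        stack, component = [word], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        clusters.append(component)
    return clusters


def build_keyword_library(vectors, seeds, threshold=0.9):
    """Semi-supervised step (assumption): a cluster inherits the label
    of its seed words; clusters with no seed, or with conflicting
    seeds, are treated as garbage words and dropped."""
    library = {}
    for cluster in split_into_clusters(vectors, threshold):
        labels = {seeds[w] for w in cluster if w in seeds}
        if len(labels) == 1:
            label = labels.pop()
            for w in cluster:
                library[w] = label
    return library


# Toy word vectors and seed labels, invented for this example.
toy_vectors = {
    "software": (1.0, 0.1),
    "programming": (0.9, 0.2),
    "restaurant": (0.1, 1.0),
    "catering": (0.2, 0.9),
    "limited": (0.7, 0.7),  # generic word, close to neither group
}
toy_seeds = {"software": "IT", "restaurant": "Catering"}
toy_library = build_keyword_library(toy_vectors, toy_seeds)
```

With these toy inputs, "programming" and "catering" pick up the labels of their seeded neighbours, while "limited" falls into an unlabelled component of its own and is left out of the library.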


Abstract

The invention discloses an enterprise industry classification method. The method extracts the main business keywords of enterprises using a graph splitting clustering algorithm based on semi-supervised learning, takes the extracted keywords as features, and trains a cascade classifier based on gradient boosting decision trees to classify enterprises by industry, which removes the tedium of manual classification. The method comprises the following steps: 1) extracting the main business keywords of enterprises using word vectors and a semi-supervised graph splitting clustering algorithm, eliminating garbage words, and constructing a keyword library; and 2) using the extracted keywords as features to train a cascade classifier, in which each level of classifier classifies the enterprises and any enterprises left unclassified are passed to the next level of classifier. With this method, keywords can be constructed, updated and classified automatically, the problem of classifying millions of enterprises by industry is addressed, and the need for manual labelling is effectively reduced.
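The cascade in step 2 can be sketched as follows. This is an illustration of the control flow only, under stated assumptions: the abstract does not say how an enterprise is judged "unclassified" at a level, so a confidence threshold is assumed here, and the keyword-vote level is a hypothetical stand-in for a trained gradient boosting decision tree. The names `cascade_classify` and `make_keyword_vote_level` are invented for this example.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A level maps an enterprise's keywords to (industry, confidence in [0, 1]).
Level = Callable[[Sequence[str]], Tuple[str, float]]


def cascade_classify(
    keywords: Sequence[str],
    levels: List[Level],
    threshold: float = 0.5,
) -> Optional[str]:
    """Try each level in order; accept the first prediction whose
    confidence reaches the threshold (assumed acceptance rule).
    Returns None if every level declines."""
    for level in levels:
        label, confidence = level(keywords)
        if confidence >= threshold:
            return label
    return None


def make_keyword_vote_level(library: Dict[str, str]) -> Level:
    """Hypothetical stand-in for a trained model: each keyword found in
    the library votes for its industry, and the confidence is the
    winning share of all keywords supplied."""
    def level(keywords: Sequence[str]) -> Tuple[str, float]:
        votes: Dict[str, int] = {}
        for kw in keywords:
            if kw in library:
                votes[library[kw]] = votes.get(library[kw], 0) + 1
        if not votes:
            return "", 0.0
        label = max(votes, key=lambda name: votes[name])
        return label, votes[label] / len(keywords)
    return level


# Level 1 knows few keywords; level 2 has a broader library.
level_one = make_keyword_vote_level({"software": "IT", "restaurant": "Catering"})
level_two = make_keyword_vote_level({
    "software": "IT", "programming": "IT",
    "restaurant": "Catering", "catering": "Catering",
})
cascade = [level_one, level_two]

first = cascade_classify(["software", "development"], cascade)
second = cascade_classify(["catering", "delivery"], cascade)
third = cascade_classify(["steel"], cascade)
```

Here the first enterprise is settled at level one, the second is declined by level one and settled at level two, and the third remains unclassified after both levels. In a real system each level would be replaced by a trained model that exposes a class probability.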

Description

Technical field

[0001] The present invention relates to the field of data classification methods, and more specifically to an enterprise industry classification method that extracts industry keywords and, for cases where an enterprise's business scope overlaps with multiple industry descriptions, fuses semi-supervised graph splitting clustering with cascaded gradient boosting decision trees.

Background

[0002] The industry classification standard issued by the National Bureau of Statistics of the People's Republic of China in 2013 defines 20 first-level industries and 96 second-level industries. The industry label of an enterprise is an important field. There are tens of millions of enterprises across the country, and many new enterprises are incubated every day, so how to classify enterprises by industry quickly is an important problem. In the previous industry classification norms, the industry to which an enterpr...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23213, G06F18/241, G06F18/214
Inventor: 陈开冉, 吴璐璐
Owner: 广州探迹科技有限公司