
A method and device for automatically obtaining enterprise multi-level classification training data

The technology concerns training data and multi-level classification, and applies to natural language data processing, text database querying, electronic digital data processing and related fields. It addresses the low efficiency of manual labeling, which reduces the efficiency and accuracy of enterprise industry classification and fails to meet practical needs, and it achieves the effect of improving accuracy.

Active Publication Date: 2021-04-13
北京创新智源科技有限公司
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, manual data labeling is cumbersome and inefficient. With large volumes of data, the workload is heavy and throughput is low, which directly reduces the efficiency and accuracy of enterprise industry classification and falls far short of practical needs.



Examples


Embodiment 1

[0055] As shown in Figure 1, an embodiment of the present invention provides a method for automatically obtaining enterprise multi-level classification training data, including:

[0056] S101. Obtain industry information, product name information and enterprise description text;

[0057] S102. Generate an industry hierarchy system according to the industry information;

[0058] S103. Cluster the product name information and associate the clusters with the industry hierarchy system to obtain an industry multi-level keyword list;

[0059] S104. According to the keywords in the enterprise description text that match the industry multi-level keyword list, label the enterprise with the corresponding industry classification to obtain industry labels at each level;

[0060] S105. Form training data from the enterprise description text and the enterprise's industry labels at each level.
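
Steps S102 to S105 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the sample industries, the keywords, and every function name are hypothetical; the clustering in S103 is replaced by a hand-written keyword table; and the rule of choosing the label path with the most keyword hits is an assumption, since the text only says that matching keywords determine the label.

```python
from collections import defaultdict

# Hypothetical S101 input: (level-1 industry, level-2 industry) records.
INDUSTRY_INFO = [
    ("Manufacturing", "Textiles"),
    ("Manufacturing", "Automotive"),
    ("Information Technology", "Software"),
]

# Assumption: stands in for the clustered product names of S103,
# already grouped under their level-2 industry.
LEVEL2_KEYWORDS = {
    "Textiles": ["yarn", "fabric"],
    "Automotive": ["engine", "brake"],
    "Software": ["database", "compiler"],
}


def build_hierarchy(industry_info):
    """S102: map each level-1 industry to its level-2 children."""
    hierarchy = defaultdict(list)
    for parent, child in industry_info:
        hierarchy[parent].append(child)
    return dict(hierarchy)


def build_keyword_list(hierarchy, level2_keywords):
    """S103 (association only): map keyword -> (level-1, level-2) path."""
    keyword_paths = {}
    for parent, children in hierarchy.items():
        for child in children:
            for keyword in level2_keywords.get(child, []):
                keyword_paths[keyword] = (parent, child)
    return keyword_paths


def label_enterprise(description, keyword_paths):
    """S104: return the label path with the most keyword hits, or None.

    Uses plain substring matching, so a keyword also matches inside
    longer words; a real system would match on segmented tokens.
    """
    text = description.lower()
    hits = defaultdict(int)
    for keyword, path in keyword_paths.items():
        if keyword in text:
            hits[path] += 1
    if not hits:
        return None
    return max(hits, key=hits.get)


def build_training_data(descriptions, keyword_paths):
    """S105: pair each labelled description with its per-level labels."""
    data = []
    for description in descriptions:
        path = label_enterprise(description, keyword_paths)
        if path is not None:  # unlabelled enterprises are skipped
            data.append(
                {"text": description, "level1": path[0], "level2": path[1]}
            )
    return data
```

Enterprises whose description matches no keyword are left out of the training set in this sketch; whether the method drops or otherwise handles them is not stated in the visible text.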

[0061] Optionally, in step S101, the industry information, product name informa...

Embodiment 2

[0105] An embodiment of the present invention provides a method for multi-level classification of an enterprise, including:

[0106] Train a classification algorithm with the training data obtained by the method described in Embodiment 1 to obtain an enterprise classification model;

[0107] Input the enterprise description text into the enterprise classification model to obtain the multi-level industry classification of the enterprise.

[0108] Specifically, after obtaining the training data using the method described in Embodiment 1, select the BiLSTM classification algorithm, use the training data to train the BiLSTM classification algorithm, and obtain a reliable enterprise classification model.

[0109] The enterprise description text includes the enterprise's publicly available products, business, business scope, patent data, and so on. Optionally, the enterprise description text is preprocessed, including feature selection, word segmentation, stop word removal, length fillin...
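
The preprocessing named above can be illustrated with a short Python sketch. It rests on several assumptions: the truncated last item is read as padding to a fixed length; a regular-expression tokenizer stands in for word segmentation (Chinese text would need a dedicated segmenter); feature selection is omitted; and the stop word list, the pad token, and the function name `preprocess` are hypothetical.

```python
import re

# Hypothetical stop word list and padding token.
STOP_WORDS = {"the", "and", "of", "a", "in"}
PAD_TOKEN = "<pad>"


def preprocess(text, max_len=8):
    """Tokenize, drop stop words, then truncate or pad to max_len tokens."""
    # Stand-in for word segmentation: lowercase alphanumeric runs.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [token for token in tokens if token not in STOP_WORDS]
    tokens = tokens[:max_len]
    return tokens + [PAD_TOKEN] * (max_len - len(tokens))
```

Fixing the sequence length this way lets descriptions of different lengths be batched together before they are passed to a sequence model such as the BiLSTM mentioned in [0108].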

Embodiment 3

[0113] As shown in Figure 2, the present invention also includes a functional module architecture that corresponds fully to the method flow described above. That is, an embodiment of the present invention also provides a device for automatically obtaining enterprise multi-level classification training data, including:

[0114] An information acquisition module 201, configured to acquire industry information, product name information and enterprise description text;

[0115] An industry hierarchy generation module 202, configured to generate an industry hierarchy system according to the industry information;

[0116] A keyword list acquisition module 203, configured to cluster the product name information and associate the clusters with the industry hierarchy system to obtain an industry multi-level keyword list;

[0117] An industry label acquisition module 204, configured to label the enterprise with the corresponding industry classification according to the keywords matching the mult...


Abstract

The invention discloses a method and a device for automatically obtaining enterprise multi-level classification training data. The method includes: obtaining industry information, product name information and enterprise description text; generating an industry hierarchy system according to the industry information; clustering the product name information and associating it with the industry hierarchy system to obtain an industry multi-level keyword list; labeling the enterprise with the corresponding industry classification according to the keywords in the enterprise description text that match the industry multi-level keyword list, thereby obtaining industry labels at each level; and forming training data from the enterprise description text and the enterprise's industry labels at each level. This solution labels enterprises accurately at multiple levels from their public information and obtains training data automatically, which removes the need for cumbersome and inefficient manual labeling. It also helps to solve the multi-level classification problem for tens of millions of enterprises and improves the accuracy of enterprise multi-level classification.

Description

Technical field

[0001] The invention relates to the technical field of data classification, in particular to a method and a device for automatically obtaining enterprise multi-level classification training data.

Background

[0002] The industry label of an enterprise is an important field. There are tens of millions of enterprises, and new ones are being established at a very fast rate every day. Classifying enterprises by industry is therefore a very important task.

[0003] At present, the usual method for industry classification of enterprises is to label data manually and then build a model from the manually labeled data with machine learning algorithms. The process generally includes text labeling, text representation, classifier selection and training, and evaluation and feedback of the classification results. Commonly used enterprise classification algorithms include k-nearest neighbor, decision tree, multi-layer perceptron, naive Bayes, logistic re...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F16/35, G06F40/247
CPC: G06F16/3344, G06F16/35, G06F40/247
Inventors: 孙会峰, 邢婷, 李健诚, 易航, 魏小敏
Owner: 北京创新智源科技有限公司