
Method and device for automatically acquiring multi-level classification training data of enterprise

A technology for automatically acquiring training data, applied in the field of data classification. It addresses the low efficiency of manual data labeling, which cannot meet the needs of practical applications and degrades the efficiency and accuracy of enterprise industry classification, and it improves classification accuracy.

Active Publication Date: 2021-01-29
北京创新智源科技有限公司
Cites 4 documents; cited by 4 documents.

AI Technical Summary

Problems solved by technology

[0004] However, labeling data manually is cumbersome and inefficient. With large volumes of data in particular, the workload is heavy and efficiency is low, which directly affects the efficiency and accuracy of enterprise industry classification and falls far short of the needs of practical applications.



Examples


Embodiment 1

[0055] As shown in Figure 1, an embodiment of the present invention provides a method for automatically obtaining enterprise multi-level classification training data, including:

[0056] S101. Obtain industry information, product name information and enterprise description text;

[0057] S102. Generate an industry hierarchy system according to the industry information;

[0058] S103. Cluster the product name information and associate it with the industry hierarchy system to obtain an industry multi-level keyword list;

[0059] S104. Label the enterprise with the corresponding industry classification according to the keywords in the enterprise description text that match the industry multi-level keyword list, obtaining industry labels at each level;

[0060] S105. Form training data from the enterprise description text and the enterprise's industry labels at each level.

[0061] Optionally, in step S101, the industry information, product name informa...
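Steps S104 and S105 can be illustrated with a minimal sketch. It rests on assumptions the excerpt does not confirm: the multi-level keyword list is taken to be a dict mapping each keyword to its industry path (level 1 down to the deepest level), matching is plain substring counting (which also works on unsegmented Chinese text), and when keywords from different industries match, the path with the most keyword hits wins. That conflict rule, the names `label_enterprise` and `build_training_data`, and any sample industries are hypothetical, not the patented implementation.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword list format: keyword -> (level-1, level-2, ...) industry path.
KeywordList = Dict[str, Tuple[str, ...]]


def label_enterprise(description: str, keywords: KeywordList) -> Optional[Tuple[str, ...]]:
    """Return the industry path whose keywords occur most often in the description.

    Returns None when no keyword matches. Ties fall back to the order in which
    paths were first counted (Counter.most_common is stable for equal counts).
    """
    votes: Counter = Counter()
    for keyword, path in keywords.items():
        hits = description.count(keyword)  # substring occurrences, no segmentation needed
        if hits:
            votes[path] += hits
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def build_training_data(descriptions: Dict[str, str], keywords: KeywordList) -> List[dict]:
    """Pair each enterprise description with one label per industry level (step S105)."""
    records = []
    for enterprise, text in descriptions.items():
        path = label_enterprise(text, keywords)
        if path is None:
            continue  # in this sketch, unmatched enterprises yield no training record
        labels = {f"level_{i + 1}": name for i, name in enumerate(path)}
        records.append({"enterprise": enterprise, "text": text, "labels": labels})
    return records
```

Dropping enterprises that match no keyword is a design choice of this sketch: it keeps unlabeled text out of the training set at the cost of coverage.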

Embodiment 2

[0105] An embodiment of the present invention provides a method for multi-level classification of an enterprise, including:

[0106] Train a classification algorithm with the training data obtained by the method described in Embodiment 1 to obtain an enterprise classification model;

[0107] Input the enterprise description text into the enterprise classification model to obtain the enterprise's multi-level industry classification.

[0108] Specifically, after the training data are obtained using the method described in Embodiment 1, the BiLSTM classification algorithm is selected and trained on the training data to obtain a reliable enterprise classification model.
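To show what a BiLSTM computes, here is a dependency-free sketch of a bidirectional LSTM encoder's forward pass in plain Python. It is illustrative only: the weights are random rather than trained, token embeddings are assumed to arrive as lists of floats, and the classification head and training loop (which in practice would use a deep learning framework) are omitted. `LSTMCell`, `run_lstm` and `bilstm_encode` are hypothetical names, not taken from the source.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One LSTM cell with small random weights (untrained, for illustration only)."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        width = input_dim + hidden_dim  # each gate reads the concatenation [x, h_prev]
        # One weight matrix and one bias vector per gate:
        # input (i), forget (f), candidate (g) and output (o).
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                         for _ in range(hidden_dim)] for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifgo"}

    def _affine(self, gate: str, z: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x: List[float], h: List[float], c: List[float]):
        z = list(x) + list(h)
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # new cell state
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]          # new hidden state
        return h_new, c_new


def run_lstm(cell: LSTMCell, seq: List[List[float]]) -> List[float]:
    """Feed the sequence through the cell and return the final hidden state."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    for x in seq:
        h, c = cell.step(x, h, c)
    return h


def bilstm_encode(seq: List[List[float]], fwd: LSTMCell, bwd: LSTMCell) -> List[float]:
    """Concatenate the last forward state with the last backward state."""
    return run_lstm(fwd, seq) + run_lstm(bwd, seq[::-1])
```

The encoded vector has `2 * hidden_dim` entries: one half summarizes the text read left to right, the other half summarizes it read right to left. A trained classifier would map this vector to industry labels.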

[0109] The enterprise description text includes publicly disclosed information such as the enterprise's products, business, business scope and patent data. Optionally, the enterprise description text is preprocessed, including feature selection, word segmentation, stop word removal, length fillin...
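Part of this preprocessing can be sketched in plain Python. The sketch assumes the text has already been split into words (Chinese text would first need a word segmenter, not shown), interprets "length filling" as padding or truncating every document to a fixed length, and leaves out feature selection. The stop-word set, `max_len`, the `<pad>`/`<unk>` tokens and the function names are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set

PAD, UNK = "<pad>", "<unk>"


def preprocess(tokens: Iterable[str], stop_words: Set[str], max_len: int) -> List[str]:
    """Remove stop words, then truncate or pad the token list to exactly max_len tokens."""
    kept = [t for t in tokens if t and t not in stop_words]
    kept = kept[:max_len]
    return kept + [PAD] * (max_len - len(kept))


def build_vocab(docs: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every token, reserving 0 for padding and 1 for unknown words."""
    vocab = {PAD: 0, UNK: 1}
    for doc in docs:
        for tok in doc:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def to_ids(doc: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids; tokens missing from the vocabulary become the UNK id."""
    return [vocab.get(t, vocab[UNK]) for t in doc]
```

Fixed-length id sequences like these are what a sequence model such as a BiLSTM typically consumes in batches.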

Embodiment 3

[0113] As shown in Figure 2, the present invention also provides a functional module architecture that fully corresponds to the foregoing method flow; that is, an embodiment of the present invention also provides a device for automatically obtaining multi-level classification training data of an enterprise, including:

[0114] An information acquisition module 201, configured to acquire industry information, product name information and enterprise description text;

[0115] An industry hierarchy generation module 202, configured to generate an industry hierarchy system according to the industry information;

[0116] A keyword list acquisition module 203, configured to cluster the product name information and associate it with the industry hierarchy system to obtain an industry multi-level keyword list;

[0117] The industry label acquisition module 204 is used to mark the corresponding industry classification for the enterprise according to the keywords matching the mult...
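Modules 202 and 203 can be illustrated with a small sketch. It assumes, hypothetically, that the industry information arrives as `(code, name, parent_code)` rows forming a tree, and that clustering has already grouped product names under leaf industry codes; the source does not specify either format. `build_hierarchy`, `build_keyword_list` and the sample codes are illustrative inventions.

```python
from typing import Dict, List, Optional, Tuple

# (code, name, parent_code); parent_code is None for top-level industries.
Row = Tuple[str, str, Optional[str]]


def build_hierarchy(rows: List[Row]) -> Dict[str, Tuple[str, ...]]:
    """Map every industry code to its name path from the top level down (module 202).

    Assumes every parent_code also appears as a code and the links contain no cycles.
    """
    name = {code: n for code, n, _ in rows}
    parent = {code: p for code, _, p in rows}
    paths = {}
    for code in name:
        path, node = [], code
        while node is not None:  # walk up to the root, collecting names
            path.append(name[node])
            node = parent[node]
        paths[code] = tuple(reversed(path))
    return paths


def build_keyword_list(clusters: Dict[str, List[str]],
                       paths: Dict[str, Tuple[str, ...]]) -> Dict[str, Tuple[str, ...]]:
    """Attach every product name in a cluster to that cluster's industry path (module 203).

    `clusters` maps a (hypothetical) leaf industry code to the product names grouped under it.
    """
    keywords = {}
    for code, product_names in clusters.items():
        for product in product_names:
            keywords[product] = paths[code]
    return keywords
```

For example, rows `("C", "Manufacturing", None)`, `("C38", "Electrical machinery", "C")` and `("C384", "Batteries", "C38")` give `C384` the path `("Manufacturing", "Electrical machinery", "Batteries")`, and every product name clustered under `C384` inherits that path as its multi-level keyword entry.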


Abstract

The invention discloses a method and device for automatically acquiring multi-level classification training data of an enterprise. The method comprises the steps of: obtaining industry information, product name information and an enterprise description text; generating an industry hierarchy system according to the industry information; clustering the product name information and associating it with the industry hierarchy system to obtain an industry multi-level keyword list; labeling the enterprise with the corresponding industry classification according to the keywords in the enterprise description text that match the industry multi-level keyword list, to obtain industry labels at all levels; and forming training data from the enterprise description text and the enterprise's industry labels at each level. With these technical schemes, an enterprise can be accurately labeled with a multi-level classification based on the information it discloses, so training data are obtained automatically and the tedium and low efficiency of manual data labeling are avoided; moreover, the multi-level classification problem for tens of millions of enterprises is solved, and the accuracy of enterprise multi-level classification is improved.
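The abstract does not say which clustering algorithm groups the product names. As one possible stand-in, not the patent's stated method, the sketch below uses greedy leader clustering over character-bigram Jaccard similarity, which works on unsegmented Chinese as well as on English. The 0.5 threshold and the function names are assumptions.

```python
from typing import List, Set, Tuple


def bigrams(text: str) -> Set[str]:
    """Character bigrams of a name; a one-character name yields itself."""
    if len(text) < 2:
        return {text}
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(names: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Single-pass leader clustering.

    Each name joins the first cluster whose leader (first member) is at least
    `threshold` similar to it; otherwise it starts a new cluster and becomes its leader.
    """
    clusters: List[Tuple[Set[str], List[str]]] = []  # (leader bigrams, members)
    for name in names:
        grams = bigrams(name)
        for leader, members in clusters:
            if jaccard(grams, leader) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((grams, [name]))
    return [members for _, members in clusters]
```

Leader clustering is fast and single-pass but order-dependent; each resulting group of product names could then be associated with a node of the industry hierarchy.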

Description

Technical field

[0001] The invention relates to the technical field of data classification, in particular to a method and a device for automatically obtaining enterprise multi-level classification training data.

Background art

[0002] The industry label is an important attribute of an enterprise. There are tens of millions of enterprises, and new ones are being founded at a very fast pace every day, so classifying enterprises by industry is a very important task.

[0003] At present, the usual approach to enterprise industry classification is to label data manually and then build a model with machine learning algorithms on the manually labeled data. The process generally includes text labeling, text representation, classifier selection and training, and evaluation of and feedback on the classification results. Commonly used enterprise classification algorithms include k-nearest neighbor, decision tree, multi-layer perceptron, naive Bayesian, logistic re...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F16/33; G06F16/35; G06F40/247
CPC: G06F16/3344; G06F16/35; G06F40/247
Inventors: 孙会峰, 邢婷, 李健诚, 易航, 魏小敏
Owner: 北京创新智源科技有限公司