Data classification method, device and apparatus and storage medium

A data classification technology, applied in the field of data processing, that addresses the large amount of computing and time resources consumed by classification schemes based on multiple classification models.

Pending Publication Date: 2021-03-19
科大讯飞(北京)有限公司 +2

AI Technical Summary

Problems solved by technology

[0005] In view of this, the present application provides a data classification method, apparatus, device and storage medium that address a problem in the prior art: classification schemes based on multiple classification models consume a large amount of computing and time resources.



Examples


Example 1

[0087] Referring to Figure 1, which shows a schematic flowchart of the data classification method provided by an embodiment of the present application, the method may include:

[0088] Step S101: Obtain data to be classified.

[0089] The data to be classified may be, but is not limited to, text, image, audio or video data.

[0090] Step S102: Input the data to be classified into the pre-established student classification model, and obtain the classification result output by the student classification model.

[0091] The classification result output by the student classification model includes numerical values that represent the probability that the data to be classified belongs to each set category.

[0092] For example, if the categories are y1, y2 and y3, the classification result output by the student classification model includes l1, l2 and l3, where l1 represents the possibility that the data to be classi...
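As an illustration of steps S101 and S102, the following minimal Python sketch turns the values l1, l2 and l3 output by a student model into a category. The softmax normalization, the choice of the highest-scoring category, and the names `softmax` and `pick_category` are assumptions made for illustration; the visible text only states that the values represent per-category probabilities.

```python
import math

def softmax(values):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def pick_category(values, categories):
    """Return the category with the highest probability (assumed decision rule)."""
    probs = softmax(values)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs

# Hypothetical student-model outputs l1, l2, l3 for categories y1, y2, y3.
categories = ["y1", "y2", "y3"]
label, probs = pick_category([2.0, 0.5, -1.0], categories)
print(label, [round(p, 3) for p in probs])
```

Only one model is evaluated per input here, which is the resource saving the application aims for compared with running several classification models at inference time.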

Example 2

[0099] As the above embodiment shows, the category of the data to be classified is determined by the student classification model, which is trained using the training data in the training set. This embodiment describes the training process of the student classification model.

[0100] Referring to Figure 2, which shows a schematic flowchart of the training process of the student classification model, the process may include:

[0101] Step S201: Obtain multiple pieces of training data from the constructed training set to form a training subset.

[0102] The amount of training data in the training subset can be set according to actual conditions.

[0103] Step S202: Input each piece of training data in the training subset into multiple teacher classification models, and obtain classification results predicted by the multiple teacher classification models for each piece of training data in the training subset.

[0104] Assuming tha...
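The idea behind steps S201 and S202, together with the training target stated in the abstract (the student's prediction should approach a fusion of the teachers' predictions), can be sketched as follows. The fusion rule is not visible in this excerpt, so simple averaging is assumed here, and cross-entropy is assumed as the measure of how far the student is from the fused target. All function names and numbers are hypothetical.

```python
import math

def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_teacher_predictions(teacher_probs):
    """Average per-category probabilities across teachers (assumed fusion rule)."""
    n = len(teacher_probs)
    return [sum(column) / n for column in zip(*teacher_probs)]

def distillation_loss(student_logits, fused_target):
    """Cross-entropy between the fused teacher target and the student prediction."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(fused_target, student_probs))

# Hypothetical predictions of three teacher models for one training item.
teacher_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
]
fused = fuse_teacher_predictions(teacher_probs)

# A student that agrees with the teachers gets a lower loss than one that does not.
loss_close = distillation_loss([2.0, 0.5, 0.0], fused)
loss_far = distillation_loss([0.0, 0.5, 2.0], fused)
print(fused, loss_close, loss_far)
```

In an actual training loop this loss would be averaged over the training subset and minimized with respect to the student's parameters; that optimization step is omitted here.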

Example 3

[0122] As the above embodiment shows, the student classification model is trained using the training data in the constructed training set. This embodiment describes the process of constructing the training set.

[0123] The training set can be constructed in several ways. In one possible implementation, the process may include:

[0124] Obtain a first data set and a second data set, where each piece of data in the first data set is labeled with a category and each piece of data in the second data set is unlabeled; mix the data in the first data set with the data in the second data set, and form the training set from the mixed data.

[0125] Considering that there may be some unlabeled data of poor quality in the second data set, in order to prevent these unlabeled data of poor quality from affecting the training of the student classification model, this embodiment provides anot...
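The first construction approach in [0124], mixing a labeled data set with an unlabeled one, can be sketched as below. Representing unlabeled items with a `None` label, shuffling the mixture, and the function name `build_training_set` are assumptions for illustration. The quality filtering of unlabeled data mentioned in [0125] is not shown because its details are not visible in this excerpt.

```python
import random

def build_training_set(labeled, unlabeled, seed=0):
    """Mix labeled (item, category) pairs with unlabeled items.

    Unlabeled items are stored with a None label (assumed representation).
    """
    mixed = [(item, category) for item, category in labeled]
    mixed += [(item, None) for item in unlabeled]
    random.Random(seed).shuffle(mixed)  # seeded so the example is reproducible
    return mixed

# Hypothetical example data.
labeled = [("news A", "military"), ("news B", "technology")]
unlabeled = ["news C", "news D", "news E"]

training_set = build_training_set(labeled, unlabeled)
num_unlabeled = sum(1 for _, category in training_set if category is None)
print(len(training_set), num_unlabeled)
```

Because the training target comes from the teacher models' predictions rather than from annotated categories, unlabeled items can take part in training alongside labeled ones.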


Abstract

The invention provides a data classification method, apparatus, device and storage medium. The method comprises: obtaining data to be classified; inputting the data to be classified into a pre-established first classification model to obtain a classification result, where the first classification model is trained on training data in a training set, and its training target is to make the classification result it predicts for the training data approach a fusion of the classification results predicted for the same training data by a plurality of pre-established second classification models; and determining the category to which the data to be classified belongs according to the classification result predicted by the first classification model. With this method, the data to be classified can be accurately classified by a single first classification model, so data classification consumes less computing and time resources.

Description

Technical field

[0001] The present application relates to the technical field of data processing, and in particular to a data classification method, apparatus, device and storage medium.

Background

[0002] Data classification refers to determining, from a set of categories, the category to which the data to be classified belongs. For example, if the data to be classified is news data D and the set categories are "military", "people's livelihood", "science and technology", ..., data classification means determining which of these categories news data D belongs to.

[0003] At present, most data classification schemes are based on classification models. Among these, schemes based on multiple classification models are the most widely used; that is, multiple classification models are pre-trained, and the data to be classified Th...


Application Information

IPC(8): G06F16/906, G06K9/62
CPC: G06F16/906, G06F18/254, G06F18/214
Inventor: 杨子清, 崔一鸣, 王士进, 胡国平, 刘挺
Owner: 科大讯飞(北京)有限公司