Data classification identifier determination method, device, electronic equipment and storage medium

A data classification identifier determination technology, applied in the computer field, that addresses the inability to label all data, the high time cost of labeling, and the large data volume, and improves the convenience and efficiency of labeling.

Pending Publication Date: 2021-11-30
联仁健康医疗大数据科技股份有限公司

AI Technical Summary

Problems solved by technology

[0004] However, when data is classified and labeled by professional doctors, their limited time and energy make it impossible to classify all of the data effectively. Furthermore, the existing data volume is large, and labeling it manually incurs a high time cost and low efficiency.

Examples

Embodiment 1

[0027] Figure 1 is a schematic flow chart of a method for determining a data classification identifier provided by Embodiment 1 of the present invention. This embodiment is applicable to classifying hundreds of millions of data records. The method can be executed by a data classification identifier determination device, which may be implemented in software and/or hardware; the hardware may be an electronic device such as a mobile terminal or a PC.

[0028] Before the technical solution is introduced, the application scenario is described by example. In medical scenarios, hundreds of millions of data records can be generated every day. Some of these records belong to the same category and some to different categories. Optionally, medical data may include data corresponding to different disease types, and different disease types correspond to different disease levels. Exemplarily, the di...

Embodiment 2

[0062] As an optional embodiment of the above embodiment, Figure 2 is a schematic flowchart of a method for determining a data classification identifier provided in Embodiment 2 of the present invention. Referring to Figure 2, unlabeled sample data is obtained, that is, the data to be classified and labeled. Using existing data clustering methods, this data is clustered into m clusters. According to a preset total amount of data, stratified sampling is performed on each cluster to obtain at least one piece of data to be classified and labeled from it. Specifically, the N unlabeled samples are clustered: because this is big data, they are stored and processed on the Hadoop cloud computing platform, and Spark-based clustering algorithms such as K-means and hierarchical clustering divide the unlabeled samples into m clusters, where the number of samples in cluster i is Ni, i=1,2,...,m. Stratified sampling is then performed on each cluster Ni to draw p% of its samples, take the c...
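The stratified sampling step described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes the cluster assignments have already been computed (for example by K-means), it runs on a single machine rather than on Hadoop or Spark, and the function name `stratified_sample`, the rounding rule and the minimum of one sample per cluster are assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def stratified_sample(cluster_ids, p, seed=0):
    """Draw about p% of the sample indices from every cluster.

    cluster_ids[i] is the cluster that sample i was assigned to.
    Rounding up and drawing at least one sample per cluster are
    assumptions of this sketch.
    """
    rng = random.Random(seed)

    # Group sample indices by the cluster they belong to.
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)

    # Sample each cluster independently, without replacement.
    sampled = {}
    for cluster, indices in members.items():
        n_draw = max(1, math.ceil(len(indices) * p / 100))
        sampled[cluster] = sorted(rng.sample(indices, n_draw))
    return sampled


# Hypothetical cluster assignments: N = 10 samples in m = 3 clusters.
cluster_ids = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
picked = stratified_sample(cluster_ids, p=40)
```

Sampling each cluster separately keeps small clusters represented, which a single random draw over all N samples would not guarantee.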

Embodiment 3

[0066] Figure 3 is a schematic structural diagram of a device for determining a data classification identifier provided by Embodiment 3 of the present invention. The device includes a data determination module 310, a category label determination module 320 and a category identifier determination module 330.

[0067] The data determination module 310 is used to obtain at least one piece of data to be classified and labeled from each cluster to be classified and labeled. The category label determination module 320 is used to input each piece of that data into a pre-trained data classification model and obtain the category label corresponding to it, where the data classification model is trained on training data and the category labels corresponding to the training data. The category identifier determination module 330 is used to, according to the cluster identifiers of the data to be classified and labeled corresponding to each category label...
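The interplay of the three modules can be illustrated with a short sketch. The visible text does not state how module 330 turns the sampled labels into a cluster identifier, so the majority vote used here is an assumption of this example; `determine_cluster_identifiers` and `toy_model` are likewise hypothetical names, and `toy_model` merely stands in for the pre-trained data classification model.

```python
from collections import Counter, defaultdict


def determine_cluster_identifiers(sampled, classify):
    """Assign each cluster the most common label among its sampled items.

    sampled:  list of (cluster_id, item) pairs drawn from the clusters.
    classify: callable standing in for the pre-trained classification model.
    Majority voting is an assumption of this sketch, not taken from the source.
    """
    votes = defaultdict(Counter)
    for cluster_id, item in sampled:
        # Count one vote for the label the model predicts for this item.
        votes[cluster_id][classify(item)] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}


# Hypothetical stand-in model: thresholds a systolic blood pressure reading.
def toy_model(systolic):
    return "high" if systolic >= 140 else "normal"


# Hypothetical sampled data: (cluster identifier, reading) pairs.
sampled = [(0, 118), (0, 125), (0, 150), (1, 160), (1, 155)]
cluster_labels = determine_cluster_identifiers(sampled, toy_model)
```

Under this assumption, only the sampled items pass through the model, and every other item in a cluster inherits the identifier chosen for that cluster.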

Abstract

Embodiments of the invention provide a data classification identifier determination method, a device, electronic equipment and a storage medium. The method comprises: obtaining at least one piece of data to be classified and labeled from each cluster to be classified and labeled; inputting each piece of that data into a pre-trained data classification model to obtain the category label corresponding to it, where the data classification model is trained on training data and the category labels corresponding to the training data; and determining the category identifier of each cluster according to the cluster identifiers of the data to be classified and labeled corresponding to each category label. This technical scheme addresses the problem in the prior art that, when the data volume is large, all of the data must be classified and labeled one item at a time, which means a large labeling workload, low efficiency and high cost. It achieves the technical effect of labeling all of the data accurately, automatically, conveniently and efficiently.

Description

Technical field

[0001] The embodiments of the present invention relate to the field of computer technology, and in particular to a method, device, electronic device, and storage medium for determining a data classification identifier.

Background

[0002] With the rapid development of information and communication technology, the amount of data in the world has exploded. In the face of massive and complex data, effective data analysis and in-depth data mining play a vital role.

[0003] In specific applications, data classification and labeling plays a vital role in data mining. For example, a large amount of health check-up data exists. Based on this health data, it can be determined whether the examined user has diseases such as high blood pressure or high blood sugar, and the risk level of the disease can also be labeled.

[0004] However, when data is classified and labeled by professional doctors, the energy and time of professional doctors are limi...

Application Information

IPC(8): G06K9/62, G16H50/20, G16H50/30
CPC: G16H50/20, G16H50/30, G06F18/23, G06F18/24
Inventor: 刘伟业
Owner: 联仁健康医疗大数据科技股份有限公司