Large-scale unbalanced diabetes electronic medical record parallel classification neighborhood evidence Spark method

An electronic medical record technology for diabetes, applied in the field of intelligent processing of medical information. It addresses the large data volume, the excessive number of attributes in experimental test data, and the class imbalance encountered in parallel classification of diabetes electronic medical records, and it improves classification efficiency and accuracy, giving it practical application value.

Active Publication Date: 2021-06-22
NANTONG UNIVERSITY

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a neighborhood evidence Spark method for parallel classification of large-scale unbalanced diabetes electronic medical records. It addresses the following problem: the existing effective way to judge the state of diabetic lesions is through pathological characteristic experiments on the etiology and pathogenesis of diabetes, which produce test data with too many attributes and a large data volume, increasing the workload of doctors in judging the pathological condition of diabetic patients.

Examples

Embodiment 1

[0062] Referring to Figure 1 to Figure 3, the present invention provides a neighborhood evidence Spark method for parallel classification of large-scale unbalanced diabetes electronic medical records, comprising the following steps:

[0063] Step 1. On the master node (Master), read the large-scale unbalanced diabetes electronic medical record dataset through the Hadoop Distributed File System (HDFS) and divide it into the training dataset $S_{TR}$ and the test dataset $S_{TE}$ at a ratio of 4:1. Send the training dataset $S_{TR}$ to the m child nodes, and convert the data into a four-tuple decision information system S, defined as follows:

[0064] In S, $U=\{x_1, x_2, \ldots, x_M\}$ represents the set of patient objects in the diabetes electronic medical record dataset, and M represents the number of patients in the diabetes electronic medical records; $C=\{a_1, a_2, \ldots, a_n\}$ represents the non-empty finite set of pathol...
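
The 4:1 split and the distribution of the training data to m child nodes in Step 1 can be illustrated with a minimal single-machine sketch. It is not the patented Spark implementation: the function name `split_and_partition`, the fixed random seed, the round-robin assignment to partitions, and the toy records are all assumptions made for illustration, and plain Python lists stand in for the HDFS-backed data.

```python
import random

def split_and_partition(records, m, train_ratio=0.8, seed=0):
    """Shuffle the records, split them 4:1 into training and test sets,
    and deal the training set round-robin into m partitions.

    Hypothetical helper: each partition stands in for the data that one
    child node would receive.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]
    # Partition i receives training records i, i + m, i + 2m, ...
    partitions = [train[i::m] for i in range(m)]
    return train, test, partitions

# Toy data (assumption): 100 records, 10% of them labelled "positive".
records = [(i, "positive" if i % 10 == 0 else "negative") for i in range(100)]
train, test, partitions = split_and_partition(records, m=4)
print(len(train), len(test), [len(p) for p in partitions])
```

With 100 records and m = 4, this gives 80 training records, 20 test records, and four partitions of 20 records each.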

Abstract

The invention provides a neighborhood evidence Spark method for parallel classification of large-scale unbalanced diabetes electronic medical records. The method comprises the following steps: reading the diabetes data on the master node and dividing it into a training set and a test set at a ratio of 4:1; performing Spark parallel undersampling on the diabetes training set on the child nodes to obtain multiple new training subsets; obtaining pathological feature reduction subsets on the child nodes through a Spark parallel pathological feature reducer, and updating the pathological feature sets of the training subsets and test subsets on each child node; obtaining the predicted class label sets of the test subsets on the child nodes through a neighborhood evidence Spark parallel classifier; and obtaining the final predicted class labels on the master node through a voting mechanism. The method removes redundant attributes from large-scale data, improves computational efficiency, makes full use of the support information among samples, and improves the efficiency and precision of diabetes data classification.
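
Two of the steps named in the abstract, undersampling the unbalanced training set into several subsets and combining per-node predictions by voting, can be sketched in plain Python. The source does not specify how the patent's undersampling or voting mechanism works, so this sketch assumes random undersampling of every class down to the minority-class size and a simple majority vote. The names `undersample` and `majority_vote` and the toy data are hypothetical.

```python
import random
from collections import Counter

def undersample(samples, n_subsets, seed=0):
    """Build n_subsets class-balanced subsets from (features, label) pairs.

    Assumption: each subset draws, without replacement, as many samples
    from every class as the smallest class contains.
    """
    rng = random.Random(seed)
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    minority_size = min(len(group) for group in by_label.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for group in by_label.values():
            subset.extend(rng.sample(group, minority_size))
        subsets.append(subset)
    return subsets

def majority_vote(predictions_per_node):
    """Combine aligned per-node label lists into one label per test sample.

    Ties go to the label that appears first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_node)]

# Toy unbalanced training set (assumption): 90 negative, 10 positive.
samples = [((i,), 0) for i in range(90)] + [((i,), 1) for i in range(90, 100)]
subsets = undersample(samples, n_subsets=3)
print([Counter(label for _, label in s) for s in subsets])

# Hypothetical predictions from three child nodes for three test samples.
final_labels = majority_vote([[1, 0, 0], [1, 0, 1], [0, 0, 1]])
print(final_labels)  # [1, 0, 1]
```

Each of the three subsets contains 10 samples of each class. In a Spark setting each subset would be processed on a child node and only the predicted labels returned to the master node for voting.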

Description

Technical field

[0001] The invention relates to the technical field of intelligent processing of medical information, in particular to a neighborhood evidence Spark method for parallel classification of large-scale unbalanced diabetes electronic medical records.

Background technique

[0002] Diabetes mellitus (DM) refers to a series of metabolic disorder syndromes involving sugar, protein, fat, water and electrolytes, caused by declining islet function and insulin resistance that result from various pathogenic factors such as genetic factors, endocrine dysfunction, and dietary imbalance. At the same time, diabetes causes many types of complications, and doctors cannot effectively and accurately determine whether a patient has diabetes by relying on the patient's physical signs alone.

[0003] At present, the effective way to judge the condition of diabetes is through the pathological characteristic experiment of the etiology and pathogenesis of diabetes, but the experiment n...

Application Information

IPC(8): G16H10/60, G16H15/00, G06F16/182, G06K9/62
CPC: G16H10/60, G16H15/00, G06F16/182, G06F18/24, G06F18/214
Inventor: 丁卫平, 李铭, 孙颖, 秦廷帧, 鞠恒荣, 黄嘉爽, 高自强, 潘壬远
Owner NANTONG UNIVERSITY