
Missing-data processing method and system based on maximizing set partition information

A missing-data technique based on set partitioning, applied to digital data processing, special data processing applications, patient-specific data, and similar areas. It addresses the problems of masking the patterns in the original data and requiring a large amount of computation, with the effect of improving efficiency, reducing computation, and avoiding data errors.

Active Publication Date: 2022-05-20
SICHUAN ACADEMY OF MEDICAL SCI SICHUAN PROVINCIAL PEOPLES HOSPITAL

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to provide a missing-data processing method and system based on maximizing set partition information, overcoming two problems of the prior art: the large amount of computation and the masking of patterns in the original data.



Examples


Embodiment 1

[0046] A missing-data processing method based on maximizing the amount of set partition information: obtain patient data containing N patient samples, each with F features, where the obtained data contain missing values, and save the F feature values of the N patients in the form of a matrix S,

[0047] Transform the matrix S to obtain the matrix T, using the following mapping: if $S_{i,j}$ holds acquired data, define $T_{i,j} = C$, where C is a constant; if $S_{i,j}$ holds no acquired data, define $T_{i,j} = \frac{a_i}{F} \times C$, where $a_i$ is the number of non-missing values in the i-th sample. Calculate the sum of each column of the matrix T to get $Sum_1, Sum_2, \dots, Sum_F$,

[0048] where i=1,...,N,

[0049] j=1,...,F,

[0050] And i, j, N and F are all positive integers,

[0051] According to the sum of each column of the matrix T from small to large, the feature data under the colum...
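The mapping from S to T and the ascending ordering of the column sums can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes missing entries are encoded as NaN and uses C = 100, and the names `transform` and `column_sums_ascending` and the small matrix `S` are hypothetical. The step that follows the sort is cut off in this excerpt, so the sketch stops at the ordering.

```python
import numpy as np

def transform(S, C=100.0):
    """Map S (NaN = missing, an assumed encoding) to T.

    Observed entries become C; missing entries in row i become a_i / F * C,
    where a_i is the number of observed entries in that row.
    """
    S = np.asarray(S, dtype=float)
    observed = ~np.isnan(S)
    F = S.shape[1]
    a = observed.sum(axis=1)            # a_i for each sample
    fill = (a / F * C)[:, None]         # per-row replacement value
    return np.where(observed, C, fill)

def column_sums_ascending(T):
    """Return the column sums Sum_1..Sum_F and the column order, smallest first."""
    sums = T.sum(axis=0)
    order = np.argsort(sums, kind="stable")
    return sums, order

# Hypothetical 3 x 4 example (not taken from the patent's tables).
S = [[1.0, np.nan, 3.0, np.nan],
     [4.0, 5.0,    6.0, np.nan],
     [7.0, 8.0,    9.0, 10.0]]
T = transform(S)
sums, order = column_sums_ascending(T)
print(sums, order)
```

Columns with more missing values, and with missing values in sparsely observed samples, receive smaller sums and therefore appear first in the ordering.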

Embodiment 2

[0060] For a data set with missing values, where the number of samples is N and the number of features is F, different ways of deleting missing data yield complete-data subsets containing different amounts of data, and the subset containing the largest amount of data can be selected as the optimal subset for subsequent data analysis. Taking Table 1 as an example, V represents an observed value and a blank represents a missing value.

[0061]

[0062] Table 1 Raw data

[0063] For example, deleting the 4th column and then deleting the rows that still contain missing data gives a subset containing samples 2, 5, 6, 8 and features 1, 2, 3; similarly, deleting the 3rd column gives a subset containing samples 2, 8 and features 1, 2, 4; deleting the 3rd and 4th columns gives a subset containing samples 2, 3, 5, 6, 8 and features 1, 2... But as the number of features increases, the number of deletion schemes also increases. In this exampl...
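The exhaustive search described above, in which every choice of deleted columns is tried and incomplete rows are then dropped, can be sketched as below. It is only an illustration of the baseline that becomes expensive as features grow: the name `best_complete_subset` is hypothetical, the mask is a made-up toy example (not Table 1), and "amount of data" is assumed to mean rows times columns of the complete subset.

```python
from itertools import combinations
import numpy as np

def best_complete_subset(observed):
    """Brute-force search over feature subsets.

    observed: boolean N x F mask (True = value present).
    Returns (size, kept_rows, kept_cols) for the largest complete subset,
    where size = number of kept rows * number of kept columns (assumed metric).
    """
    observed = np.asarray(observed, dtype=bool)
    n_features = observed.shape[1]
    best = (0, (), ())
    for k in range(1, n_features + 1):
        for cols in combinations(range(n_features), k):
            # Keep only rows that are fully observed on the chosen columns.
            rows = np.flatnonzero(observed[:, list(cols)].all(axis=1))
            size = len(rows) * k
            if size > best[0]:
                best = (size, tuple(rows.tolist()), cols)
    return best

# Hypothetical mask: 5 samples x 4 features, 1 = observed, 0 = missing.
mask = [[1, 1, 1, 0],
        [1, 1, 1, 1],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 1, 1]]
best = best_complete_subset(mask)
print(best)  # (9, (0, 1, 3), (0, 1, 2)) with 0-based indices
```

With F features there are 2^F - 1 non-empty feature subsets to try, which is why this enumeration grows quickly with the number of features.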

Embodiment 3

[0076] Take the extreme data of Table 6 as an example to analyze the method of the present invention; the values of the 6th feature in this data set are missing,

[0077]

[0078] Table 6 Raw data of Embodiment 3

[0079] The data set has 10 samples and 6 features, and the pattern of missing values differs from column to column. Let the number of features observed in sample n be $a_n$. Replace every V in the data set with 100, and replace the missing values of each sample with $m_n = \frac{a_n}{F} \times 100$; for sample 1, $m_1 = 100/3$, and so on. After the observed values and missing values in the data set have been replaced accordingly, sum the values of each column to obtain Table 7,

[0080]

[0081] Table 7 Intermediate data of Embodiment 3 after conversion

[0082] Here, for convenience of calculation, $m_n$ is rounded. The resulting sums reflect how much data each sample retains on a given feature. The lar...
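As a worked instance of the replacement rule for sample 1: with F = 6 features and the stated value 100/3, the formula implies $a_1 = 2$ observed features. That count is inferred from the formula, not read from Table 6, which is not reproduced here.

$$
m_1 = \frac{a_1}{F} \times 100 = \frac{2}{6} \times 100 = \frac{100}{3} \approx 33
$$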


Abstract

The present invention relates to the technical field of medical data processing, and in particular to a missing-data processing method and system based on maximizing set partition information. The system includes a data acquisition unit, a data processing unit, a feature deletion unit, and an optimal subset output unit. By evaluating the amount of information, the invention quickly finds the optimal subset of a data set with missing values, greatly reducing the amount of computation and improving the efficiency of data processing in medical data analysis. The method offers a new approach to missing data in the medical field and avoids the large amount of computation and the masking of true data patterns caused by traditional deletion and filling (imputation) methods.

Description

Technical field

[0001] The present invention relates to the technical field of medical data processing, in particular to a missing-data processing method and system based on maximizing set partition information.

Background technique

[0002] Missing data is usually unavoidable in real-world research: not only outcome variables but also covariates may be missing. Data can be missing for many reasons, such as:
1. The patient refused to answer specific questions, for example by not reporting sensitive information such as income;
2. The patient was lost to follow-up, for example through migration, death, or withdrawal from the study;
3. Certain examinations were not arranged for some patients, for example cholesterol examinations;
4. Investigator error or equipment failure, for example an investigator forgetting to record data or a sphygmomanometer malfunctioning.

[0003] The la...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/215, G16H10/60
CPC: G06F16/215, G16H10/60
Inventor 吴行伟童荣生常欢吴竞鲜温亚林
Owner SICHUAN ACADEMY OF MEDICAL SCI SICHUAN PROVINCIAL PEOPLES HOSPITAL