Data missing processing method and system based on maximizing set partition information

A set-partition-based technique for handling missing data, applied in digital data processing, special data processing applications, patient-specific data, and similar fields. It addresses the large amount of calculation and the masking of patterns in the original data found in existing methods, with the effect of improving efficiency, reducing the amount of calculation, and avoiding data errors.

Active Publication Date: 2022-05-20

SICHUAN ACADEMY OF MEDICAL SCI SICHUAN PROVINCIAL PEOPLES HOSPITAL

Problems solved by technology

[0007] The purpose of the present invention is to provide a missing-data processing method and system based on maximizing the amount of set partition information, overcoming two problems of the prior art: the large amount of calculation and the concealment of patterns in the original data.

Examples

Embodiment 1

[0046] A missing-data processing method based on maximizing the amount of set partition information: obtain patient data containing N patient samples, each with F features, where the obtained data contain missing values, and save the F feature values of the N patients in the form of a matrix S,

[0047] Transform the matrix S into a matrix T using the following mapping: if $S_{i,j}$ contains acquired data, define $T_{i,j} = C$, where C is a constant; if $S_{i,j}$ contains no acquired data, define $T_{i,j} = a_i / F \times C$, where $a_i$ is the number of non-missing values in the i-th sample. Then calculate the sum of each column of the matrix T to get $Sum_1, Sum_2, \ldots, Sum_F$,

[0048] where $i = 1, \ldots, N$,

[0049] $j = 1, \ldots, F$,

[0050] and i, j, N and F are all positive integers,

[0051] According to the sum of each column of the matrix T from small to large, the feature data under the colum...
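The mapping in [0047] and the column ordering that [0051] begins to describe can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: encoding missing entries as NaN, the function names `transform` and `column_order`, the default C = 100 (borrowed from Embodiment 3), and the small demo matrix are all assumptions made for the example.

```python
import numpy as np

def transform(S, C=100.0):
    """Map S to T: observed cells become C, missing cells become a_i / F * C."""
    S = np.asarray(S, dtype=float)
    observed = ~np.isnan(S)            # assumption: missing values are stored as NaN
    F = S.shape[1]                     # number of features
    a = observed.sum(axis=1)           # a_i: non-missing count in each sample
    fill = a / F * C                   # per-sample value for missing cells
    return np.where(observed, C, fill[:, None])

def column_order(T):
    """Return the column sums Sum_1..Sum_F and the column indices sorted ascending."""
    sums = T.sum(axis=0)
    order = np.argsort(sums, kind="stable")
    return sums, order

# Hypothetical 3-sample, 3-feature data set (not taken from the patent tables).
S_demo = np.array([
    [1.0, np.nan, 3.0],
    [4.0, 5.0, 6.0],
    [np.nan, np.nan, 9.0],
])
T_demo = transform(S_demo)
sums_demo, order_demo = column_order(T_demo)
print(sums_demo, order_demo)
```

In this made-up example the second feature has the smallest column sum, so it comes first in the ascending order.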

Embodiment 2

[0060] For a data set with missing values, where the number of samples is N and the number of features is F, different ways of deleting missing data yield complete-data subsets containing different amounts of data, and the subset containing the largest amount of data can be selected as the optimal subset for subsequent data analysis. Taking Table 1 as an example, V represents an observed value and a blank represents a missing value.

[0061]

[0062] Table 1 Raw data

[0063] For example, deleting the fourth column and then deleting the rows that still have missing data gives a subset containing samples 2, 5, 6, and 8 with features 1, 2, and 3; similarly, deleting the third column gives a subset containing samples 2 and 8 with features 1, 2, and 4; deleting the 3rd and 4th columns gives a subset containing samples 2, 3, 5, 6, and 8 with features 1, 2... But as the number of features increases, the number of deletion methods also increases. In this exampl...
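The exhaustive search described above, where every choice of deleted columns is tried and incomplete rows are then dropped, can be sketched as follows to show why its cost grows with the number of features. This is an illustrative baseline only: the function name `best_complete_subset`, the boolean mask, and the definition of "amount of data" as retained samples times retained features are assumptions, and the demo mask is not Table 1.

```python
from itertools import combinations
import numpy as np

def best_complete_subset(observed):
    """Try every non-empty set of kept columns; keep only rows complete in them."""
    observed = np.asarray(observed, dtype=bool)
    n_cols = observed.shape[1]
    best = (0, (), ())                 # (amount of data, kept rows, kept columns)
    checked = 0
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            checked += 1
            rows = np.flatnonzero(observed[:, list(cols)].all(axis=1))
            amount = len(rows) * k     # assumption: amount = samples x features
            if amount > best[0]:
                best = (amount, tuple(int(r) for r in rows), cols)
    return best, checked

# Hypothetical mask: True = observed (V), False = missing (blank).
mask_demo = [
    [True, True, False],
    [True, True, True],
    [True, False, True],
    [True, True, False],
]
best_demo, checked_demo = best_complete_subset(mask_demo)
print(best_demo, checked_demo)
```

With F features there are 2^F - 1 non-empty column selections to check, which is the growth in deletion methods that the paragraph points to.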

Embodiment 3

[0076] The method of the present invention is analyzed using the extreme data of Table 6 as an example, in which the data of the 6th feature are missing,

[0077]

[0078] Table 6 Raw data of Embodiment 3

[0079] The data set has 10 samples and 6 features, and the pattern of missing values differs from column to column. Let $a_n$ be the number of features observed in sample n. Replace every V in the data set with 100, and replace the missing data of each sample with $m_n = a_n / F \times 100$; for sample 1, $m_n = 100 / 3$, and so on. After the observed values and missing data in the data set are replaced in this way, the values of each column are summed, which gives Table 7,

[0080]

[0081] Table 7 Intermediate data of Embodiment 3 after conversion

[0082] Here, for convenience of calculation, $m_n$ is rounded. The obtained sum can reflect the data retention of each sample on the feature. The lar...
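As a small worked check of the replacement value $m_n = a_n / F \times 100$ with F = 6, the following sketch lists the rounded fill value for every possible count of observed features. The variable names are hypothetical, and the use of Python's built-in `round` is an assumption, since the text does not state which rounding rule is applied.

```python
F = 6      # number of features in Embodiment 3
C = 100    # value that replaces every observed entry V

# Rounded fill value m_n for each possible number a_n of observed features.
fill_values = {a: round(a / F * C) for a in range(F + 1)}
print(fill_values)
```

A sample with two observed features, as the text implies for sample 1 where $m_n = 100 / 3$, gets the rounded fill value 33.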

Abstract

The present invention relates to the technical field of medical data processing, and in particular to a missing-data processing method and system based on maximizing the amount of set partition information. The system includes a data acquisition unit, a data processing unit, a feature deletion unit, and an optimal subset output unit. By judging the amount of information, the invention provides a method for quickly finding the optimal subset of a data set with missing values, which greatly reduces the amount of calculation and improves the efficiency of data processing in medical data analysis. The method offers a new approach to missing data in the medical field and avoids the large amount of calculation and the concealment of real data patterns caused by the traditional deletion and filling methods.

Description

Technical field

[0001] The present invention relates to the technical field of medical data processing, in particular to a missing-data processing method and system based on maximizing the amount of set partition information.

Background

[0002] Missing data is usually unavoidable in real-world research: not only the outcome variables but also the covariates may be missing. There are many possible reasons for missing data, such as:
1. The patient refuses to answer specific questions, for example by not reporting sensitive information such as income;
2. The patient is lost to follow-up, for example through migration, death, or withdrawal from the study;
3. Certain examinations are arranged only for some patients, for example cholesterol examinations are not arranged for some patients;
4. Investigator or mechanical failure, for example the investigator forgets to record data for subjective reasons, or a sphygmomanometer fails.

[0003] The la...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/215, G16H10/60

CPC: G06F16/215, G16H10/60

Inventor: 吴行伟, 童荣生, 常欢, 吴竞鲜, 温亚林

Owner: SICHUAN ACADEMY OF MEDICAL SCI SICHUAN PROVINCIAL PEOPLES HOSPITAL
