
Rapid comparison method for clinical test data

A data-comparison technology for clinical trial data, applied in database indexing, database updating, and structured data retrieval. It addresses shortcomings of earlier methods, which do not exploit the characteristics of structured or semi-structured data, do not account for changes to the unique identification column, and have no way of handling added or deleted columns. The method has good application prospects and commercial value.

Active Publication Date: 2022-07-26
广东杰纳医药科技有限公司 +1
Cites: 7; cited by: 0

AI Technical Summary

Problems solved by technology

Compared with row-by-row comparison, that prior-art method significantly reduces algorithmic complexity, but it has two main problems:
1. It does not consider the case where the unique identification column changes while the comparison columns do not change, or change only slightly. In that case the row has not been deleted; the change is a modification of an attribute or attribute value.
2. It considers only rows deleted or added between the reference file and the target file, and attributes modified within existing columns; it does not consider columns being added or deleted. In other words, it only handles the case where the two files have the same attributes and has no way to handle attribute columns that are added or deleted.
[0007] Another prior-art method and device for comparing file differences converts the file contents before and after a change into a first matrix and a second matrix, converts these into a first feature column and a second feature column, turns the feature columns into hash tables, and then compares them. That patent mainly targets differences in unstructured data and does not exploit the characteristics of structured or semi-structured data, so it is not applicable to the scenario of this application.


Examples


Embodiment 1

[0077] This embodiment provides a rapid comparison method for clinical trial data, as shown in Figure 1. The clinical trial data is stored in sas7bdat or Excel format in units of rows, each row containing multiple columns. The original storage file serves as the reference file and the modified storage file serves as the target file. The comparison method includes the following steps:

[0078] S1: Select certain columns of the clinical trial data as feature columns to form a feature column configuration file; the combined values of the feature columns are unique in the database;
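Step S1 depends on the chosen feature columns jointly identifying every row. The source does not give the format of the feature column configuration file, so the sketch below assumes a hypothetical JSON layout with a `feature_columns` key and uses hypothetical CDISC-style column names (`SUBJID`, `VISIT`, `AETERM`). It checks the uniqueness condition by counting feature-column value combinations.

```python
import json
from collections import Counter


def load_feature_columns(config_text: str) -> list[str]:
    """Read feature-column names from a hypothetical JSON configuration."""
    return json.loads(config_text)["feature_columns"]


def duplicated_keys(rows: list[dict], feature_cols: list[str]) -> list[tuple]:
    """Return feature-column value combinations that occur in more than one row.

    An empty result means the feature columns jointly identify every row,
    which is the uniqueness condition step S1 relies on.
    """
    cols = sorted(feature_cols)  # fixed order, independent of the config order
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical configuration and rows, for illustration only.
config = '{"feature_columns": ["VISIT", "SUBJID"]}'
rows = [
    {"SUBJID": "001", "VISIT": 1, "AETERM": "Headache"},
    {"SUBJID": "001", "VISIT": 2, "AETERM": "Nausea"},
    {"SUBJID": "002", "VISIT": 1, "AETERM": "Rash"},
]
feature_cols = load_feature_columns(config)
print(duplicated_keys(rows, feature_cols))  # → []
```

A non-empty result would mean the configuration needs another feature column before key-based comparison can be trusted.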

[0079] In this embodiment the combined value of the feature columns is unique, but that does not mean it can never be modified. A typical example: in clinical trials the subject ID is usually taken as a feature column, but clinical operators sometimes misread a subject ID and incorrectly record data from one subject to anothe...
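Paragraph [0079] is truncated here, and the abstract only says that a "key hash cluster" lets the corresponding row be found after its key hash changes, without describing how the cluster is built. The sketch below is a hypothetical stand-in, not the patented structure: rows left unmatched by key (apparently deleted from the reference, apparently added to the target) are re-paired by a SHA-1 hash of their non-feature columns, so a corrected subject ID is reported as a key modification instead of a delete plus an add. The helper names `content_hash` and `pair_changed_keys` are illustrative.

```python
import hashlib


def content_hash(row: dict, exclude: set[str]) -> str:
    """SHA-1 over all columns except `exclude`, in column-name order (hypothetical helper)."""
    parts = [f"{c}={row[c]!r}" for c in sorted(row) if c not in exclude]
    return hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()


def pair_changed_keys(deleted, added, feature_cols):
    """Pair unmatched reference rows with unmatched target rows whose
    non-feature columns are identical, i.e. only the key was edited."""
    exclude = set(feature_cols)
    by_content: dict[str, list[dict]] = {}
    for row in deleted:
        by_content.setdefault(content_hash(row, exclude), []).append(row)

    pairs, still_added = [], []
    for row in added:
        bucket = by_content.get(content_hash(row, exclude))
        if bucket:
            pairs.append((bucket.pop(), row))  # key modification, not delete + add
        else:
            still_added.append(row)
    still_deleted = [r for rest in by_content.values() for r in rest]
    return pairs, still_deleted, still_added


# Subject "001" was misrecorded; the target file corrects it to "010".
deleted = [{"SUBJID": "001", "VISIT": 1, "AETERM": "Headache"}]
added = [
    {"SUBJID": "010", "VISIT": 1, "AETERM": "Headache"},
    {"SUBJID": "003", "VISIT": 1, "AETERM": "Rash"},
]
pairs, still_deleted, still_added = pair_changed_keys(deleted, added, ["SUBJID", "VISIT"])
```

This stand-in only catches rows whose other columns are completely unchanged; the prior-art critique also mentions keys that change together with a small part of the row, which would need a looser similarity match.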

Embodiment 2

[0091] On the basis of Embodiment 1, this embodiment continues to disclose the following content:

[0092] In step S3, a hash value is computed over the values in the common feature columns for each row of the reference file and of the target file, specifically:

[0093] First, sort the feature columns by column name;

[0094] For each row, extract the value of each feature column in that order and compute its hash value according to the value's type. Values that can be widened to a 64-bit integer or a double-precision floating-point number, and strings shorter than 8 bytes, are padded to 8 bytes and converted to a hexadecimal string, which is used directly as the hash value. Values of any other type are converted into an 8-byte hash value with a hash algorithm (for example SHA-1, although any other hash algorithm may be used).
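The per-value rule in [0093] and [0094] can be sketched as follows. Two choices here are assumptions, not stated in the source: big-endian byte order for the packed integer and double, and forming the key hash by concatenating the per-value hex strings in column-name order. `value_hash` and `key_hash` are hypothetical names.

```python
import hashlib
import struct


def value_hash(value) -> str:
    """8-byte hash of one cell value, returned as 16 hex digits."""
    if isinstance(value, bool):
        value = int(value)  # bool subclasses int; widen it like any integer
    if isinstance(value, int) and -2**63 <= value < 2**63:
        raw = struct.pack(">q", value)  # widen to a signed 64-bit integer
    elif isinstance(value, float):
        raw = struct.pack(">d", value)  # IEEE 754 double precision
    elif isinstance(value, str) and len(value.encode("utf-8")) < 8:
        raw = value.encode("utf-8").ljust(8, b"\x00")  # pad short string to 8 bytes
    else:
        # Any other value: SHA-1 (any hash would do) truncated to 8 bytes.
        raw = hashlib.sha1(repr(value).encode("utf-8")).digest()[:8]
    return raw.hex()


def key_hash(row: dict, feature_cols: list[str]) -> str:
    """Concatenate per-value hashes of the feature columns, sorted by column name."""
    return "".join(value_hash(row[c]) for c in sorted(feature_cols))


print(value_hash(1))     # → 0000000000000001
print(value_hash(1.0))   # → 3ff0000000000000
print(value_hash("AB"))  # → 4142000000000000
```

Values that fit in 8 bytes are stored verbatim, so they cannot collide with one another within a type; across types, zero padding makes `""` and `0` produce the same hash, which a one-byte type tag would avoid if that matters.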

[0095] In step...

Embodiment 3

[0109] Building on the basic algorithm flow of Embodiments 1 and 2, this embodiment also provides an optimized processing method for cases where a new column appears in a table, an entire column is deleted, or an entire column is changed.

[0110] When a whole column changes in the compared table (a new column added, an entire column deleted, or an entire column modified), the step S6 procedure "if the key hashes match but the row hashes do not, compare the two rows column by column, find the inconsistent data points and output them" is executed for every row comparison. In that case the complexity of the algorithm degrades from O(R) to O(R*C), where R is the number of table rows and C is the number of table columns. Whole-row and whole-column changes are very common when data sets change, so to keep the algorithm efficient in this case, this embodiment proposes an improved algorithm...
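The basic step S6 flow quoted in [0110] can be sketched as below, under stated assumptions: a tuple of feature-column values stands in for the key hash, the row hash is a SHA-1 over the columns common to both files, and added or deleted columns are reported once from the headers instead of being rediscovered in every row. The improved algorithm of Embodiment 3 is truncated in the source and is not reproduced; `row_hash` and `compare` are hypothetical names.

```python
import hashlib


def row_hash(row: dict, cols: list[str]) -> str:
    """SHA-1 over the row restricted to `cols` (assumed: columns common to both files)."""
    text = "\x1f".join(f"{c}={row[c]!r}" for c in cols)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def compare(ref_rows: list[dict], tgt_rows: list[dict], feature_cols: list[str]) -> dict:
    ref_cols = set(ref_rows[0]) if ref_rows else set()
    tgt_cols = set(tgt_rows[0]) if tgt_rows else set()
    common = sorted(ref_cols & tgt_cols)
    keys = sorted(feature_cols)

    def key_of(row: dict) -> tuple:
        # Stand-in for the key hash: the feature-column values in name order.
        return tuple(row[c] for c in keys)

    report = {
        "added_columns": sorted(tgt_cols - ref_cols),  # reported once, not per row
        "deleted_columns": sorted(ref_cols - tgt_cols),
        "added_rows": [],
        "changed_cells": [],
    }
    # Key hash table over the reference file: key -> (row, row hash). O(R) to build.
    ref_table = {key_of(r): (r, row_hash(r, common)) for r in ref_rows}
    for tgt in tgt_rows:  # row order in either file does not matter
        hit = ref_table.pop(key_of(tgt), None)
        if hit is None:
            report["added_rows"].append(tgt)
            continue
        ref, ref_h = hit
        if ref_h != row_hash(tgt, common):
            # Only rows whose hashes differ pay the O(C) column-by-column cost.
            for c in common:
                if ref[c] != tgt[c]:
                    report["changed_cells"].append((key_of(tgt), c, ref[c], tgt[c]))
    report["deleted_rows"] = [r for r, _ in ref_table.values()]
    return report


ref = [
    {"SUBJID": "001", "VISIT": 1, "AETERM": "Headache"},
    {"SUBJID": "002", "VISIT": 1, "AETERM": "Nausea"},
]
tgt = [  # new AESEV column, one edited cell, one new row, one removed row
    {"SUBJID": "002", "VISIT": 1, "AETERM": "Vomiting", "AESEV": "MILD"},
    {"SUBJID": "003", "VISIT": 1, "AETERM": "Rash", "AESEV": "MODERATE"},
]
report = compare(ref, tgt, ["SUBJID", "VISIT"])
```

Hashing only the common columns keeps an added or deleted column from forcing a column-by-column pass on every row. An entire column whose values are all modified still makes every row hash differ, which is exactly the O(R*C) case [0110] describes; the sketch also assumes the feature columns exist in both files.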


Abstract

The invention discloses a rapid comparison method for clinical trial data. First, by generating a key hash table and a row hash table, it completes a one-time file comparison with time complexity close to O(R), where R is the number of rows in the target file, and it copes with rows being exported from the database in arbitrary order. Second, by establishing a key hash cluster, it ensures that the corresponding row can still be found quickly when its key hash changes; this prevents a row from being marked as deleted and re-added merely because its key hash changed, and lets data managers track changes to every data point as accurately as possible. Third, through an improved row hash comparison method, the comparison still completes with time complexity close to O(R) when whole-row changes occur in the data set, instead of degrading into value-by-value comparison. The method is designed for the database change characteristics of CROs and has very good application prospects and commercial value in CRO business scenarios.

Description

Technical field
[0001] The invention relates to the field of data comparison and, more particularly, to a method for rapid comparison of clinical trial data.
Background art
[0002] In pharmaceutical CRO (contract research organization) enterprises, the various experimental data generated in clinical trials must be analyzed and processed, and the resulting data is generally stored in sas7bdat or Excel format. sas7bdat/Excel files are typical structured data with the following characteristics: 1) data is organized in rows; 2) rows are independent of one another but related: each row represents an individual record, such as a subject's information, a medication record or an adverse reaction, yet certain columns of some rows depend on shared data and change together when that data changes; 3) some columns are fixed and always present, while others may be added or deleted as the trial progresses. As the trial progresses, a sas7bdat/Excel data file may...


Application Information

IPC(8): G06F16/22; G06F16/23; G06F16/81
CPC: G06F16/2255; G06F16/2365; G06F16/81
Inventors: 康灿平, 邓亮
Owner: 广东杰纳医药科技有限公司