Rapid comparison method for clinical test data
A clinical trial data technology, applied in database indexing, database updating, structured data retrieval, and related fields. It addresses the problems that existing comparison approaches do not exploit structured or semi-structured data and do not take the unique identification column into account, and it offers good application prospects and commercial value.
Examples
Embodiment 1
[0077] This embodiment provides a rapid comparison method for clinical trial data. As shown in Figure 1, the clinical trial data is stored in sas7bdat or Excel format and is organized in rows, each row containing multiple columns. The original storage file serves as the benchmark (reference) file, and the modified storage file serves as the target file. The comparison method includes the following steps:
[0078] S1: Select certain columns of the clinical trial data as feature columns to form a feature column configuration file; the combined values of the feature columns are unique in the database;
[0079] In this embodiment, the values of the feature columns are unique, but this does not mean that they cannot be modified. A typical example: in clinical trials the subject ID is usually taken as a feature column, yet there are cases where clinical operators misread the subject ID and incorrectly record data from one subject to anothe...
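The uniqueness requirement in S1 can be checked before any comparison is run. The following is a minimal sketch, not the patent's implementation: it assumes rows are already loaded as Python dictionaries, and the column names `SUBJID` and `VISIT` as well as the helper `duplicate_keys` are hypothetical.

```python
from collections import Counter

def duplicate_keys(rows, feature_columns):
    """Return the feature-column value tuples that occur in more than one row."""
    ordered = sorted(feature_columns)  # fixed column order, by column name
    counts = Counter(tuple(row[c] for c in ordered) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical sample data; real data would come from a sas7bdat or Excel file.
rows = [
    {"SUBJID": "001", "VISIT": 1, "WEIGHT": 70.5},
    {"SUBJID": "001", "VISIT": 2, "WEIGHT": 71.0},
    {"SUBJID": "002", "VISIT": 1, "WEIGHT": 64.2},
]

# SUBJID alone is not unique here, but SUBJID combined with VISIT is.
print(duplicate_keys(rows, ["SUBJID"]))
print(duplicate_keys(rows, ["SUBJID", "VISIT"]))
```

An empty result means the chosen columns are usable as feature columns for this file.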
Embodiment 2
[0091] On the basis of Embodiment 1, this embodiment continues to disclose the following content:
[0092] In step S3, a hash value is computed for the values of each row in the common feature columns of the reference file and of the target file, specifically:
[0093] First, sort the feature columns by column name;
[0094] For each row of data, extract the value of each feature column in that order and calculate its hash value according to the type of the value. For data types that can be promoted to a 64-bit integer or a double-precision floating-point value, and for strings shorter than 8 bytes, pad the value to 8 bytes, convert it to a hexadecimal string, and use that string as the hash value; for other data types, use a hash algorithm (such as SHA-1, although any other hash algorithm is acceptable) to convert the value to a hash value 8 bytes long.
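The type-dependent hashing described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's code: the byte order, the zero-byte padding, taking the first 8 bytes of the SHA-1 digest, and joining the per-column hashes by concatenation are all assumptions, and the names `value_hash` and `key_hash` are hypothetical.

```python
import hashlib
import struct

def value_hash(value):
    """Return a 16-character hex string (8 bytes) for one cell value."""
    if isinstance(value, bool):
        value = int(value)  # treat booleans as small integers
    if isinstance(value, int) and -2**63 <= value < 2**63:
        raw = struct.pack(">q", value)  # 64-bit signed integer, big-endian (assumed)
    elif isinstance(value, float):
        raw = struct.pack(">d", value)  # double-precision float, big-endian (assumed)
    elif isinstance(value, str) and len(value.encode("utf-8")) < 8:
        raw = value.encode("utf-8").ljust(8, b"\x00")  # pad short strings to 8 bytes
    else:
        # Any other value: hash its text form and keep the first 8 bytes (assumed).
        raw = hashlib.sha1(str(value).encode("utf-8")).digest()[:8]
    return raw.hex()

def key_hash(row, feature_columns):
    """Concatenate per-column hashes in column-name order (assumed combination rule)."""
    return "".join(value_hash(row[c]) for c in sorted(feature_columns))

print(value_hash(1))     # 0000000000000001
print(value_hash("ab"))  # 6162000000000000
print(key_hash({"B": 1, "A": "ab"}, ["B", "A"]))
```

Small values skip the hash function entirely, so the common case of integer or short-string identifiers costs only a fixed-width encoding.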
[0095] In step...
Embodiment 3
[0109] Building on the basic algorithm flow of Embodiment 1 and Embodiment 2, this embodiment also provides an optimized processing method for cases where a table gains a new column, loses an entire column, or has an entire column changed.
[0110] When an entire column of the compared table changes (a column is added, deleted, or modified), the step S6 process "if the key hashes are consistent but the row hashes do not match, compare the values of the two rows column by column, find the inconsistent data points, and output them" is executed for every row comparison. In this case the complexity of the algorithm degenerates from O(R) to O(R*C), where R is the number of table rows and C is the number of table columns. Entire row / column changes are very common in data set changes. To keep the algorithm efficient in this case, this embodiment proposes an improved algorithm...
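The degeneration can be seen in a sketch of the baseline S6 behaviour quoted above. This is not the improved algorithm, and it makes simplifying assumptions: rows are Python dictionaries, the key tuple is used directly in place of the key hash, and the names `row_hash` and `diff_rows` are hypothetical.

```python
import hashlib

def row_hash(row):
    """Hash all cells of a row in column-name order."""
    text = "|".join(f"{c}={row[c]!r}" for c in sorted(row))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def diff_rows(reference, target, key_columns):
    """Return (key, column, old, new) for cells that differ between matched rows."""
    keys = sorted(key_columns)
    target_by_key = {tuple(r[c] for c in keys): r for r in target}
    diffs = []
    for ref_row in reference:
        key = tuple(ref_row[c] for c in keys)
        tgt_row = target_by_key.get(key)
        if tgt_row is None:
            continue  # rows missing from the target are outside this sketch
        if row_hash(ref_row) == row_hash(tgt_row):
            continue  # matching row hashes: the column-by-column scan is skipped
        # Row hashes differ: scan every column of this row pair.
        for col in sorted(set(ref_row) | set(tgt_row)):
            old, new = ref_row.get(col), tgt_row.get(col)
            if old != new:
                diffs.append((key, col, old, new))
    return diffs

reference = [{"ID": 1, "A": 1, "B": 2}, {"ID": 2, "A": 3, "B": 4}]
target = [{"ID": 1, "A": 1, "B": 2}, {"ID": 2, "A": 3, "B": 5}]
print(diff_rows(reference, target, ["ID"]))  # [((2,), 'B', 4, 5)]
```

When only one cell changes, a single row pair reaches the column scan. If the target gains a whole new column, every row hash differs and the scan runs for all R rows, which is the O(R*C) case the paragraph describes.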


