Automated labeling method and system for clinical trial center laboratory records

The automated labeling method enables efficient, accurate, and standardized management of laboratory records in clinical trial centers, solving the problems of inefficiency and errors caused by traditional manual recording.

CN122240731A · Pending · Publication Date: 2026-06-19 · GUANGZHOU JINYILI PHARM TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: GUANGZHOU JINYILI PHARM TECH CO LTD
Filing Date: 2026-03-18
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

The record management in clinical trial center laboratories is inefficient, prone to human error, and difficult to standardize and trace.

Method used

An automated labeling method is adopted, which realizes the automated labeling of experimental records through information recognition, type judgment and label generation modules.

Benefits of technology

The method improves the efficiency and standardization of record management, reduces human error, and ensures the accuracy and traceability of records.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of label management, specifically to an automated labeling method and system for laboratory records in clinical trial centers. The method involves collecting experimental records, performing information recognition on them to obtain the record information they contain, matching the record information against preset standards to determine the record type to which each experimental record belongs, retrieving the label naming template corresponding to that record type, and generating record labels for the experimental records based on the label naming template.

Description

Technical Field

[0001] This invention relates to the field of label management, specifically to an automated labeling method and system for clinical trial center laboratory records.

Background Technology

[0002] A clinical trial center laboratory is a laboratory designated by the sponsor or research organization in a multi-center clinical trial to centrally process and analyze trial-related biological samples (such as blood, urine, and tissue) or to perform specific tests, so as to ensure the consistency, accuracy, and traceability of test results. Its test results have a significant impact on the evaluation of efficacy and safety in clinical trials. Quality assurance of the center laboratory's records is therefore a key focus of inspection by all parties. The company's clinical trial business needs to process 500,000 records (including original records and public records) annually. Traditional manual recording and archiving leads to low work efficiency and to errors during processing.

Summary of the Invention

[0003] The technical problem to be solved by the present invention is to provide an automated labeling method and system for clinical trial center laboratory records, which can solve the problems in the prior art.

[0004] This invention is achieved through the following technical solution. In a first aspect, the invention provides an automated labeling method for clinical trial center laboratory records, comprising: collecting experimental records and performing information recognition on them to obtain the record information they contain; matching the record information against preset standards to determine the record type to which each experimental record belongs; and retrieving the tag naming template corresponding to that record type and generating a record tag for the experimental record based on the tag naming template.

[0005] In a second aspect, the invention provides an automated labeling system for clinical trial center laboratory records, used to implement the automated labeling method of the first aspect, comprising: a record recognition module, used to collect experimental records and perform information recognition on them to obtain the record information they contain; a type determination module, used to match the record information against preset standards to determine the record type to which each experimental record belongs; and a tag generation module, used to retrieve the tag naming template corresponding to that record type and generate a record tag for the experimental record based on the tag naming template.

[0006] In summary, the beneficial effects of this invention are as follows. Record information is acquired through information recognition, so that key content can be extracted efficiently and accurately from a large number of experimental records. The record type is determined by matching against preset standards, which classifies records precisely. Record tags are generated from tag naming templates, which makes record naming standardized and uniform. Overall, the invention improves record management efficiency, makes it easier to retrieve and locate the required records, reduces the errors and time cost of manual operations, and makes laboratory record management in clinical trial centers more rigorous and standardized.

Description of the Drawings

[0007] For ease of explanation, the present invention will be described in detail below with reference to specific embodiments and accompanying drawings.

[0008] Figure 1 is a schematic diagram of the steps of the automated labeling method for clinical trial center laboratory records according to the present invention; Figure 2 is a schematic diagram of the structure of the automated labeling system for clinical trial center laboratory records according to the present invention.

Detailed Implementation

[0009] All features disclosed in this specification, or all steps in all disclosed methods or processes, may be combined in any way, except for mutually exclusive features and / or steps.

[0010] The present invention is described in detail below with reference to Figures 1-2.

[0011] As shown in Figure 1, the present invention provides an automated labeling method for clinical trial center laboratory records, comprising: S1: collect experimental records and perform information recognition on them to obtain the record information they contain; S2: match the record information against preset standards to determine the record type to which each experimental record belongs; S3: retrieve the tag naming template corresponding to that record type, and generate a record tag for the experimental record based on the tag naming template.

[0012] Experimental records are collected from various data sources in the clinical trial center laboratory (such as experimental equipment, scanned copies of paper documents, spreadsheets, etc.). Based on the document format of the experimental records, the appropriate information correction channel is selected to correct the experimental records and obtain an optimized version of the experimental records. For scanned paper documents, noise reduction and skew correction are required; for spreadsheets, formatting errors need to be checked and corrected.

[0013] OCR is then applied to the optimized version of the experimental records to obtain the record information. During OCR recognition, one algorithm is selected from several pre-deployed OCR recognition algorithms as the main recognition algorithm, and one or more of the others are selected as auxiliary recognition algorithms. The optimized version of the experimental records is recognized by superimposing the results of the main recognition algorithm and the auxiliary recognition algorithms.

[0014] Collecting experimental records is the foundation of the entire process. Only by collecting complete experimental records can subsequent processing and analysis be meaningful. Experimental records may have problems such as non-standard format and poor image quality. Information correction can eliminate these interfering factors and improve the accuracy of subsequent OCR recognition. Using multiple OCR recognition algorithms for superimposed recognition can give full play to the advantages of different algorithms, reduce recognition errors, and improve the reliability of recognition.

[0015] Based on the keyword standards preset for each record type, the record information is matched against the keywords to obtain the keyword matching degree of the experimental record relative to each keyword standard. The numerical range in which the keyword matching degree falls is then determined according to preset matching threshold standards, and three cases are handled: when the keyword matching degree is in the highest numerical range, the record type to which the experimental record belongs is confirmed based on the keyword matching degree; when it is in the secondary numerical range, a pending confirmation mark is generated for the experimental record, and records carrying this mark are temporarily stored so that the record type recognition model can identify them at a specified time; when it is in the low numerical range, the experimental record is transferred to the manual review channel.

[0016] Different types of experimental records require different processing methods and storage locations. Accurately identifying the record type helps in classifying and managing experimental records and improves management efficiency. Keyword matching and numerical range analysis allow the record type to be determined quickly. Records with a high matching degree have their type confirmed directly, which avoids unnecessary processing. Records with a lower matching degree are passed to the record type recognition model or to manual review, which improves the accuracy of the judgment while making efficient use of resources.

[0017] Based on the tag naming template, several naming element standards are deployed for the experimental records. The record information is then analyzed according to each naming element standard to obtain the naming element data for that standard. The naming elements are divided into content type elements and format type elements. For content type elements, keyword identification and standard matching are performed on the record information according to the corresponding naming element standards, yielding naming element data that reflects the core content of the experimental record. For format type elements, information format analysis and standard matching are performed according to the corresponding naming element standards, yielding naming element data that reflects the overall information format of the experimental record.

[0018] The data of each naming element is combined and processed according to the tag naming template to obtain the record tags of the experimental records. The experimental records are then renamed according to the record tags. Using the tag naming template can ensure the consistency and standardization of the naming of experimental records, which facilitates subsequent querying, retrieval and management. By performing element analysis on the record information, extracting the core content and format information and combining them into record tags, the main content and characteristics of the experimental records can be accurately reflected, improving the readability and usability of the information. Renaming the experimental records according to the record tags can make the record names more intuitive and accurate, and facilitate classification, storage and management in the data archive.

[0019] In one embodiment of the present invention, the step of performing information recognition on the experimental record to obtain the record information includes: S11: based on the document format of the experimental record, select the corresponding information correction channel to correct the experimental record and obtain an optimized version of it; S12: apply an OCR algorithm to the optimized version of the experimental record to obtain the record information.

[0020] First, identify the document format of the experimental records. Common formats include scanned copies of paper documents (such as PDF scans, JPEG images, etc.) and electronic documents (such as Word, Excel, etc.). The document format can be determined by the file extension, header information, etc. For different document formats, select the appropriate information correction channel. If it is a scanned copy of a paper document, there may be problems such as image blurring, tilting, and noise interference. For image blurring, image enhancement algorithms such as histogram equalization and sharpening filtering can be used to improve image clarity. For tilting, image rotation algorithms can be used for correction. For noise interference, median filtering, Gaussian filtering, and other methods can be used to remove noise.
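As a rough illustration of the format check described above, the following sketch guesses a document's format from its leading bytes, falls back to the file extension, and then picks a correction channel. The function names, the channel labels, and the decision to treat every PDF as a scan are assumptions made for illustration; none of them come from the patent text.

```python
from pathlib import Path

# Well-known file signatures (magic bytes) for a few common formats.
MAGIC = [
    (b"%PDF", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "ooxml"),  # ZIP container used by .docx and .xlsx
]

# Hypothetical mapping from detected format to a correction channel.
CHANNEL = {
    "pdf": "image_correction",   # assumption: PDFs are scanned pages
    "jpeg": "image_correction",
    "png": "image_correction",
    "docx": "document_formatting",
    "xlsx": "spreadsheet_validation",
}


def detect_format(name: str, head: bytes) -> str:
    """Guess the format from header bytes, falling back to the extension."""
    ext = Path(name).suffix.lower().lstrip(".")
    for signature, fmt in MAGIC:
        if head.startswith(signature):
            # A ZIP signature alone cannot tell .docx from .xlsx,
            # so the extension decides in that case.
            return ext if fmt == "ooxml" and ext in ("docx", "xlsx") else fmt
    return ext or "unknown"


def select_channel(name: str, head: bytes) -> str:
    """Pick a correction channel; unknown formats go to manual inspection."""
    return CHANNEL.get(detect_format(name, head), "manual_inspection")
```

In use, `head` would be the first few bytes read from the file, for example `open(path, "rb").read(8)`.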

[0021] For electronic documents, there may be issues such as non-standard formatting and missing data. For example, Word documents may have inconsistent fonts and disordered paragraph formatting. The program can automatically adjust the font, paragraph spacing, and other formatting. For Excel spreadsheets, there may be missing data or formatting errors. The missing data or formatting can be supplemented or corrected through data validation and filling rules. The experimental records are then corrected according to the selected correction channel to obtain an optimized version of the experimental records.

[0022] The source documents of experimental records may have various problems, such as poor image quality in scanned paper documents or non-standard formatting in electronic documents. These problems can seriously reduce the accuracy of OCR recognition. Information correction eliminates these interfering factors, making the images of experimental records clearer and their format more standardized, and thereby improves OCR accuracy. Standardized document formats and clear images also help the subsequent processing and analysis of the record information. If a document has disordered formatting or blurry images, the recognized information will be inaccurate or incomplete, which affects subsequent data processing and decision-making.

[0023] In one embodiment of the present invention, during the OCR recognition process, one OCR recognition algorithm is selected as the main recognition algorithm from a pre-deployed set of OCR recognition algorithms, and one or more OCR recognition algorithms are selected as auxiliary recognition algorithms. The optimized version of the experimental record is then overlaid and recognized using the main recognition algorithm and the auxiliary recognition algorithms.

[0024] Choose one of several pre-deployed OCR recognition algorithms as the main recognition algorithm, and select one or more as auxiliary recognition algorithms. Common OCR recognition algorithms include template matching-based algorithms, feature extraction-based algorithms, and deep learning algorithms (such as convolutional neural networks).

[0025] The optimized version of the experimental records is recognized by superimposing a main recognition algorithm and one or more auxiliary recognition algorithms. First, the main recognition algorithm performs a preliminary recognition. The preliminary results are then compared and fused with the results of the auxiliary recognition algorithms. For example, if the main recognition algorithm is inaccurate on a certain part of the text but an auxiliary recognition algorithm is accurate, the auxiliary result is adopted for that part. After superimposed recognition, the recognized text is sorted and extracted to obtain the record information of the experimental records.
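The fusion step can be sketched as follows. The patent does not say how an inaccurate result is detected, so this sketch assumes that every OCR engine returns a confidence score per text region and that the regions from different engines are already aligned by index. The `Token` type, the `fuse` function, and the `min_conf` threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, float]  # (recognized text, confidence in [0, 1])


def fuse(primary: Sequence[Token],
         auxiliaries: Sequence[Sequence[Token]],
         min_conf: float = 0.8) -> List[str]:
    """Keep the primary result unless it is weak and an auxiliary is stronger."""
    fused = []
    for i, (text, conf) in enumerate(primary):
        best_text, best_conf = text, conf
        if conf < min_conf:
            # Only consult auxiliary engines for low-confidence regions.
            for aux in auxiliaries:
                if i < len(aux) and aux[i][1] > best_conf:
                    best_text, best_conf = aux[i]
        fused.append(best_text)
    return fused
```

A region that the main algorithm reads with high confidence is never overridden, which keeps the main algorithm's output as the default and limits the auxiliary algorithms to a corrective role.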

[0026] Clinical trial centers typically have a large number of experimental records. Manual data entry is not only inefficient but also prone to errors. OCR technology can automate information acquisition, greatly improving the efficiency and accuracy of data entry. Different OCR algorithms have their own advantages and disadvantages. A single algorithm may not perform well in certain situations. By combining multiple algorithms, the advantages of different algorithms can be fully utilized, improving the reliability and accuracy of recognition and reducing the occurrence of recognition errors.

[0027] In one embodiment of the present invention, the step of matching the record information against preset standards to determine the record type to which the experimental record belongs includes: S21: perform keyword matching on the record information according to the keyword standards preset for each record type to obtain the keyword matching degree of the experimental record relative to each keyword standard; S22: determine the numerical range of the keyword matching degree according to the preset matching threshold standards, and when the keyword matching degree is in the highest numerical range, determine the record type to which the experimental record belongs based on the keyword matching degree; S23: when the keyword matching degree is in the secondary numerical range, generate a pending confirmation mark for the experimental record and temporarily store the marked record so that it can be identified by the record type recognition model at a specified time; S24: when the keyword matching degree is in the low numerical range, transfer the experimental record to the manual review channel.

[0028] Pre-define corresponding keyword standards for each record type. These keywords are words that can represent the core characteristics of that record type. For the "clinical trial report" type, keywords include "clinical trial," "efficacy assessment," and "safety analysis." For the "experimental sample record" type, keywords include "sample number," "sample source," and "collection time." Compare the record information of the experimental records with the keyword standards of each record type. String matching algorithms, such as regular expression matching and edit distance calculation, can be used to count the number or frequency of keywords appearing in the record information. Calculate the keyword matching degree of the experimental record relative to each keyword standard based on the matching results. The matching degree can be expressed by the proportion of the number of keywords appearing to the total number of keyword standards, the frequency of keyword appearance, etc.
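A minimal sketch of one of the matching-degree definitions mentioned above, namely the proportion of a standard's keywords that appear in the record information. The keyword lists repeat the examples from the description; the function names and the case-insensitive substring matching are assumptions.

```python
import re
from typing import Dict, List

# Example keyword standards taken from the description; a real deployment
# would define its own lists.
KEYWORD_STANDARDS: Dict[str, List[str]] = {
    "clinical trial report": [
        "clinical trial", "efficacy assessment", "safety analysis"],
    "experimental sample record": [
        "sample number", "sample source", "collection time"],
}


def matching_degree(text: str, keywords: List[str]) -> float:
    """Fraction of the standard's keywords that occur in the text."""
    if not keywords:
        return 0.0
    hits = sum(
        1 for kw in keywords
        if re.search(re.escape(kw), text, flags=re.IGNORECASE)
    )
    return hits / len(keywords)


def score_all(text: str) -> Dict[str, float]:
    """Matching degree of the text against every record type."""
    return {rtype: matching_degree(text, kws)
            for rtype, kws in KEYWORD_STANDARDS.items()}
```

Frequency-based scoring or edit-distance matching, both also mentioned in the description, would replace the body of `matching_degree` without changing its interface.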

[0029] Keyword matching allows for rapid initial screening and classification of experimental records. Keywords are key characteristics reflecting record types; by matching keywords, the record type can be quickly located from a large amount of information, improving classification efficiency. Calculating keyword matching degree quantifies the matching relationship between records and each type, facilitating further analysis and judgment based on the matching degree.

[0030] Matching threshold standards are preset to divide the keyword matching degree into three ranges: the highest numerical range, the secondary numerical range, and the low numerical range. For example, a matching degree greater than 80% falls in the highest numerical range, 50% to 80% falls in the secondary numerical range, and less than 50% falls in the low numerical range. The calculated keyword matching degree is compared with the preset matching threshold standards to determine its numerical range.

[0031] When the keyword matching degree is in the highest numerical range, the record type of the experimental record is confirmed directly based on the keyword matching degree. For example, if an experimental record has a keyword matching degree of 85% with the "clinical trial report" type, the record is confirmed to belong to the "clinical trial report" type. When the keyword matching degree is in the secondary numerical range, a pending confirmation mark is generated for the experimental record, and the marked record is temporarily stored. At a specified time, the record type recognition model identifies each of the temporarily stored experimental records. The record type recognition model can be a classification model based on machine learning or deep learning, such as a support vector machine or a neural network. It can take more feature information into account to determine the record type accurately. When the keyword matching degree is in the low numerical range, the experimental record is transferred to the manual review channel, where professionals review the record in detail and determine its record type based on its specific content and background information.
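The three-way handling can be expressed as a small routing function. The 80% and 50% thresholds are the example values from the description; the `Route` enum and the treatment of the exact boundary values (80% counts as secondary, 50% counts as secondary) are assumptions, since the text does not state which range the boundaries belong to.

```python
from enum import Enum


class Route(Enum):
    CONFIRM = "confirm type directly"
    PENDING = "store with pending mark for model recognition"
    MANUAL = "send to manual review"


def route(degree: float, high: float = 0.80, low: float = 0.50) -> Route:
    """Map a keyword matching degree in [0, 1] to a handling route."""
    if degree > high:
        return Route.CONFIRM
    if degree >= low:
        return Route.PENDING
    return Route.MANUAL
```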

[0032] Different matching thresholds correspond to different processing methods, which can improve classification accuracy while ensuring classification efficiency. Records with high matching scores are directly confirmed in terms of type, reducing unnecessary processing steps. Records with medium matching scores are identified using more complex models, leveraging the powerful classification capabilities of the models to improve the accuracy of the judgment. Records with low matching scores are transferred to manual review, relying on the experience and knowledge of professionals to ensure the correctness of the classification. Using different processing methods according to different matching scores can reasonably allocate computing and human resources. Most records with high matching scores can be processed automatically, reducing manual intervention. For records that are difficult to classify accurately through keyword matching, more resources are invested in processing to avoid wasting resources.

[0033] In one embodiment of the present invention, the step of generating record tags for the experimental record based on the tag naming template includes: S31: Deploy several naming element standards for the experimental records based on the label naming template, and perform corresponding element analysis on the record information according to each of the naming element standards to obtain the naming element data of each of the naming element standards; S32: Combine and process the naming element data according to the tag naming template to obtain the record tag of the experimental record, and rename the experimental record according to the record tag.

[0034] Based on the label naming template, the naming element standards used to generate record labels are clearly defined. For example, for experimental records related to clinical trials, the naming element standards include "Trial Name," "Sample Number," "Test Date," and "Test Item." Each naming element standard has its specific rules and requirements. For example, the "Test Date" must be extracted in the format "YYYY-MM-DD." For each naming element standard, the record information of the experimental record is analyzed in detail. For "Trial Name," it is necessary to find text in specific locations or containing specific keywords in the record information, such as the content after the identifier "[Trial Name]." For "Sample Number," it is necessary to identify strings that conform to the number format, such as combinations of numbers starting with a specific letter. For "Test Date," tools such as regular expressions are used to extract text that conforms to the date format from the record information and perform format conversion to meet the standard requirements. Through these analyses, the naming element data corresponding to each naming element standard is obtained.
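The element analysis can be sketched with regular expressions. The patterns below are illustrative assumptions: the identifier "[Trial Name]" follows the example in the text, while the sample number format (the letter S followed by at least three digits) and the accepted date separators are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns; a real naming element standard would define its own.
PATTERNS = {
    "trial_name": re.compile(r"\[Trial Name\]\s*(.+)"),
    "sample_number": re.compile(r"\b(S\d{3,})\b"),
    "test_date": re.compile(r"\b(\d{4})[-/.](\d{1,2})[-/.](\d{1,2})\b"),
}


def extract_elements(text: str) -> Dict[str, Optional[str]]:
    """Extract naming element data; a missing element is returned as None."""
    out: Dict[str, Optional[str]] = {}

    m = PATTERNS["trial_name"].search(text)
    out["trial_name"] = m.group(1).strip() if m else None

    m = PATTERNS["sample_number"].search(text)
    out["sample_number"] = m.group(1) if m else None

    # Convert the matched date to YYYY-MM-DD. This only pads the digits;
    # it does not check that the date exists in the calendar.
    m = PATTERNS["test_date"].search(text)
    out["test_date"] = (
        f"{int(m.group(1)):04d}-{int(m.group(2)):02d}-{int(m.group(3)):02d}"
        if m else None
    )
    return out
```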

[0035] By defining clear naming element standards, experimental record labels can have a unified specification and format. Different experimental records come from a wide range of sources and have diverse formats. With standardized naming elements, it can be ensured that the generated labels are consistent in content and form, which facilitates subsequent management and retrieval. Element analysis of the record information can extract key content from a large amount of record information as naming element data. This data accurately reflects the core information of the experimental record, enabling the record label to clearly convey the main characteristics of the experimental record and allowing users to quickly understand the general content of the record.

[0036] The naming element data are combined in the order and format specified by the label naming template. For example, if the label naming template specifies the record label format as "Trial Name - Sample Number - Test Date - Test Item", the corresponding naming element data obtained earlier are concatenated in this format. If the data for a naming element is missing, it is handled as the template requires, for example by substituting a specific symbol (such as "-"). After the record label is generated, the original file name of the experimental record is replaced with the new record label. For example, if the original file name is "record1.doc" and the generated record label is "Drug A Clinical Trial - S001 - 2024-01-01 - Blood Routine", the file is renamed to "Drug A Clinical Trial - S001 - 2024-01-01 - Blood Routine.doc".
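A minimal sketch of the combination and renaming step. The element keys, the default separator, and the placeholder are assumptions chosen to mirror the example in the text. The sketch only computes the new path; it does not touch the file system.

```python
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Hypothetical template: the order of naming elements in the label.
TEMPLATE: Sequence[str] = (
    "trial_name", "sample_number", "test_date", "test_item")


def build_label(elements: Mapping[str, Optional[str]],
                template: Sequence[str] = TEMPLATE,
                sep: str = " - ",
                placeholder: str = "-") -> str:
    """Join naming element data in template order, filling gaps."""
    return sep.join(elements.get(key) or placeholder for key in template)


def renamed_path(original: str, label: str) -> Path:
    """Return the new path, keeping the directory and the extension."""
    p = Path(original)
    return p.with_name(label + p.suffix)
```

The actual rename could then be performed with `Path.rename`. Note that a "-" placeholder is hard to tell apart from a " - " separator, so a deployment might prefer a more distinctive placeholder.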

[0037] By combining various naming elements into record tags according to the tag naming template, the tags have a clear structure and meaning. Compared with the original file name, the new record tags can more intuitively display the key information of the experimental record, greatly improving the file's recognizability and facilitating the quick location and identification of the required experimental record among many files. Using record tags to rename experimental records helps to establish an orderly data archive. In terms of data retrieval, classification storage, and version control, files with standardized tags are easier to manage and operate, improving the efficiency and accuracy of data management.

[0038] In one embodiment of the present invention, each naming element standard is used to analyze one naming element of the experimental record and to obtain the naming element data for that element. The naming elements are divided into content type elements and format type elements. For content type elements, keyword identification and standard matching are performed on the record content according to the corresponding naming element standards, yielding naming element data that reflects the core content of the experimental record. For format type elements, information format analysis and standard matching are performed according to the corresponding naming element standards, yielding naming element data that reflects the overall information format of the experimental record.

[0039] Clearly define the naming elements used to describe the core content of experimental records. For example, for clinical trial records, content type elements include "trial name", "drug name", "subject number", "experimental phase", etc. Develop corresponding keyword standards and matching rules for each content type element. For example, the keyword standard for "trial name" is specific name words that appear in the record, and the matching rules can be exact matching or fuzzy matching.

[0040] Words that meet the keyword standards of the content type elements are searched for in the experimental records. String matching algorithms, such as regular expression matching, can be used to find content related to keywords such as "trial name" and "drug name" in the record text. In some complex cases, semantic analysis is required to identify keywords accurately from their context. For example, when the record states "[drug name] was used for treatment in this experiment", the "drug name" can be extracted accurately from the context.

[0041] The identified keywords are compared with pre-set standards. If it is an exact match standard, the keyword is checked to see if it is completely consistent with the standard vocabulary. If it is a fuzzy match standard, the similarity between the keyword and the standard vocabulary is judged to see if it reaches a certain threshold. For keywords that match successfully, they are used as the naming element data of the content type element. For example, if the identified "drug name" matches a name in the standard drug name list, then that name is used as the naming element data of the content type element "drug name".
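The exact and fuzzy matching rules can be sketched with the standard library's `difflib.SequenceMatcher`. The similarity measure and the 0.85 threshold are assumptions; the patent only states that the similarity must reach a certain threshold.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def match_standard(candidate: str,
                   vocabulary: Iterable[str],
                   fuzzy: bool = False,
                   threshold: float = 0.85) -> Optional[str]:
    """Return the standard term matched by the candidate, or None."""
    norm = candidate.strip().lower()
    best, best_ratio = None, 0.0
    for term in vocabulary:
        if norm == term.lower():
            return term  # exact match (ignoring case) always wins
        if fuzzy:
            ratio = SequenceMatcher(None, norm, term.lower()).ratio()
            if ratio > best_ratio:
                best, best_ratio = term, ratio
    # In exact mode nothing else qualifies; in fuzzy mode the best
    # candidate must reach the similarity threshold.
    return best if fuzzy and best_ratio >= threshold else None
```

The function returns the term from the standard vocabulary rather than the text found in the record, so differently spelled mentions are unified into one standardized value.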

[0042] By identifying and matching keywords for content type elements, information that accurately reflects the core content of experimental records can be extracted. This information is crucial for quickly understanding the main content of experimental records, classifying data, and retrieving information. For example, in a large number of clinical trial records, records related to specific trials and drugs can be quickly located using content type element data such as "trial name" and "drug name." Establishing content type element standards and matching them can ensure the consistency and standardization of the extracted naming element data. Different experimental records may differ in their descriptions, but standard matching can unify them into a standardized data format, facilitating subsequent data processing and analysis.

[0043] Define naming elements to describe the overall information format of experimental records, such as "file format," "date format," and "data encoding format." Establish corresponding format standards and matching rules for each format type element. For example, the standard for "date format" is "YYYY-MM-DD," and the matching rule checks whether the date in the record conforms to this format. Perform format analysis on the experimental record information. For "file format," determine it by checking the file extension or header information; for "date format," use regular expressions or date parsing functions to identify the date format in the record; for data encoding format, determine it by viewing the file's metadata or using encoding detection tools.

[0044] The information format obtained from the analysis is compared with the preset format standard. If the date format in the record matches the "YYYY-MM-DD" standard format, that format is used as the naming element data of the "date format" format type element. Records that do not conform to the standard format are either converted to it or marked as anomalies for subsequent processing.
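The check-then-convert-or-flag logic for dates can be sketched with `datetime.strptime`. The list of alternative input formats is an assumption; the patent names only the target format "YYYY-MM-DD".

```python
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical list of formats the parser will try, standard format first.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d")


def normalize_date(raw: str) -> Tuple[Optional[str], bool]:
    """Return (date as YYYY-MM-DD, already_conformant).

    (None, False) marks an anomaly that needs subsequent processing.
    """
    raw = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        iso = parsed.strftime("%Y-%m-%d")
        # strptime accepts unpadded months and days, so compare the
        # re-formatted value with the input to test strict conformance.
        return iso, iso == raw
    return None, False
```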

[0045] The format of experimental records is crucial for data storage, transmission, and processing. By analyzing and matching format type elements with standards, it can be ensured that the recorded information format conforms to a unified standard, improving data compatibility and processability. For example, if all records use the same date format, "YYYY-MM-DD", date comparison and statistical analysis will be much easier. Understanding the overall information format of experimental records helps in effective data management. For instance, based on the "file format," appropriate software tools can be selected to open and process the records; based on the "data encoding format," problems such as garbled characters during data transmission and storage can be avoided.

[0046] In one embodiment of the present invention, the method further includes: transferring the experimental record to a designated location in a data archive according to the record tag, recording the transfer behavior of the experimental record to obtain an archived record list, tracing the relationship between experimental records in the same batch based on the archived record list to construct an association network of experimental records in the same batch, and confirming the naming uniqueness of experimental records in the same batch through the association network to correct experimental records with duplicate names.

[0047] Based on the content of the record tags, rules are established for the storage location in the data archive. For example, if the record tag contains information such as "trial name," "sample number," and "test date," the storage path can be determined according to the hierarchical structure "trial name/test date/sample number." Based on the record tags and the above rules, the target storage path for the experimental records in the data archive is generated. For example, if the record tag is "Drug A Clinical Trial-S001-2024-01-01-Blood Routine," the generated target path would be "Drug A Clinical Trial/2024-01-01/S001." The experimental records are then moved from their original storage location to this specified location in the data archive. File system operation functions or related file management tools can be used to complete the file transfer operation.
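A minimal Python sketch of the path rule and the transfer step follows. It assumes the naming element data is still available as a dictionary when the record is archived, which avoids splitting the tag on hyphens that also occur inside the date; the function names and dictionary keys are hypothetical.

```python
import shutil
from pathlib import Path


def build_target_dir(archive_root, elements: dict) -> Path:
    """Apply the 'trial name / test date / sample number' hierarchy."""
    return (
        Path(archive_root)
        / elements["trial name"]
        / elements["test date"]
        / elements["sample number"]
    )


def archive_record(source, archive_root, elements: dict, record_tag: str) -> Path:
    """Move a record into the archive and return its new location."""
    source = Path(source)
    target_dir = build_target_dir(archive_root, elements)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original extension; the record tag becomes the file name.
    target = target_dir / (record_tag + source.suffix)
    shutil.move(str(source), str(target))
    return target
```

For the example tag above, a record archived under the root `archive` would land in `archive/Drug A Clinical Trial/2024-01-01/S001`.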

[0048] Transferring experimental records to a designated location in the data archive facilitates centralized management of all records, improves data security and maintainability, and avoids the management difficulties caused by scattered data storage. Storing records according to rules derived from the record tags also makes the data archive structure clearer, so that the required experimental records can be quickly retrieved and queried based on the content of their record tags.

[0049] The format of the archived record list should be determined, typically including the original filename of the experimental record, record tag, storage location before transfer, storage location after transfer, transfer time, and other information. After the experimental record transfer is completed, the relevant transfer information is recorded in the archived record list according to the designed format. The archived record list can be stored as a text file, CSV file, or database table, etc. The archived record list records the transfer history of experimental records, providing a basis for data traceability. When it is necessary to find the original location, transfer time, or other information of a record, it can be quickly obtained through the archived record list. Data auditing is required during the data management process, and the archived record list can serve as an important audit basis to ensure the compliance and traceability of the data transfer process.
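One way to keep the archived record list as a CSV file is sketched below in Python. The column names and the two helper functions are assumptions chosen to mirror the fields listed above; a database table would serve equally well.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Columns mirror the fields named in the text; the identifiers are assumed.
ARCHIVE_LIST_FIELDS = [
    "original_filename",
    "record_tag",
    "source_path",
    "target_path",
    "transfer_time",
]


def append_transfer(list_path, original_filename, record_tag, source_path, target_path):
    """Append one transfer entry, writing the header for a new list."""
    list_path = Path(list_path)
    is_new = not list_path.exists()
    row = {
        "original_filename": original_filename,
        "record_tag": record_tag,
        "source_path": str(source_path),
        "target_path": str(target_path),
        "transfer_time": datetime.now(timezone.utc).isoformat(),
    }
    with list_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=ARCHIVE_LIST_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return row


def find_transfers(list_path, record_tag):
    """Trace a record: return every transfer entry stored for its tag."""
    with Path(list_path).open(newline="", encoding="utf-8") as handle:
        return [row for row in csv.DictReader(handle) if row["record_tag"] == record_tag]
```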

[0050] Define clear rules for identifying experimental records from the same batch. For example, determine whether records belong to the same batch based on information such as "trial name" and "test date" in the record tag: if the "trial name" and "test date" in their record tags are the same, the records are considered to belong to the same batch. Filter out the experimental records of the same batch from the archived record list and analyze the relationships between them. For example, records with different sample numbers represent test results of different samples for the same trial on the same date, indicating a sample association. Records with different test items may represent different aspects of the same sample, indicating a test item association. Based on the analyzed associations, construct an association network for the experimental records in the same batch. This association network can be represented using a graph data structure, where nodes represent experimental records and edges represent the associations between records.
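The batch rule and the graph representation can be sketched in Python as an adjacency dictionary. The record field names, the `id` key used to identify nodes, and the "duplicate candidate" label for records that share both sample number and test item are assumptions added for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_batch_networks(records):
    """Group records into batches and link the records inside each batch.

    records: list of dicts with the assumed keys 'id', 'trial name',
    'test date', 'sample number' and 'test item'.
    Returns {batch_key: {record_id: {neighbor_id: relation}}}.
    """
    batches = defaultdict(list)
    for record in records:
        # Same trial name and same test date means same batch.
        batches[(record["trial name"], record["test date"])].append(record)

    networks = {}
    for batch_key, members in batches.items():
        graph = {member["id"]: {} for member in members}
        for a, b in combinations(members, 2):
            if a["sample number"] != b["sample number"]:
                relation = "sample association"
            elif a["test item"] != b["test item"]:
                relation = "test item association"
            else:
                # Same sample and same test item: likely the same name.
                relation = "duplicate candidate"
            graph[a["id"]][b["id"]] = relation
            graph[b["id"]][a["id"]] = relation
        networks[batch_key] = graph
    return networks
```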

[0052] Within the association network, duplicate record tags within the same batch of experimental records can be found by comparing the string content of the record tags. Experimental records with duplicate names are corrected according to defined rules. For example, a serial number can be appended to the record tag, such as "Drug A Clinical Trial-S001-2024-01-01-Blood Routine_1" and "Drug A Clinical Trial-S001-2024-01-01-Blood Routine_2". At the same time, the record tag and storage location information of the corresponding records in the archived record list are updated.
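The serial-number correction can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that a corrected tag does not collide with a tag that already ends in the same suffix; a real system would need to check for that.

```python
from collections import Counter


def correct_duplicate_tags(tags):
    """Append _1, _2, ... to every tag that occurs more than once in a batch."""
    totals = Counter(tags)   # how often each tag occurs in the batch
    seen = Counter()         # running serial number per duplicated tag
    corrected = []
    for tag in tags:
        if totals[tag] > 1:
            seen[tag] += 1
            corrected.append(f"{tag}_{seen[tag]}")
        else:
            corrected.append(tag)
    return corrected
```

The returned list keeps the input order, so each corrected tag can be written back to the matching entry of the archived record list.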

[0053] Ensuring the uniqueness of the naming of experimental records within the same batch avoids confusion during data management and use. Duplicate names can lead to data retrieval errors or inaccurate analysis results. Correcting duplicate names ensures the consistency and standardization of record tags in the data archive, improving data quality and usability.

[0054] In one embodiment of the present invention, the association networks of each batch are interconnected to perform overall association analysis on all experimental records in the data archive, so as to generate corresponding association retrieval pointers for each experimental record in the data archive. The association retrieval pointers are used to retrieve other experimental records associated with a specified experimental record.

[0055] Extract the association network information of each batch of experimental records previously constructed from the data archive. These association networks are stored in a graph data structure, containing nodes (experimental records) and edges (relationships between records). Integrate the association network data of each batch to form a dataset containing all the association information of experimental records, ensuring that the node and edge information in the association network of each batch is merged accurately and without error, and avoiding information loss or conflict.

[0056] The association networks for each batch are constructed separately. Extracting and integrating them together is the basis for conducting overall association analysis. Only by centralizing all the association information can we fully understand the association between experimental records in the data archive. By accurately extracting and integrating the association network data of each batch, we can ensure that no association information of any experimental record is missed, thus guaranteeing the completeness and accuracy of subsequent analysis.

[0057] Determine the connection rules between the various batch association networks. For example, establish associations between different batches of records based on common elements in the record tags (such as "trial name" and "drug name"). If different batches of records contain the same "trial name" in their tags, then there is a potential association between these batches of records. According to the connection rules, establish association edges between batches in the integrated association network. For example, for records from different batches but belonging to the same trial, add association edges between their corresponding nodes to represent cross-batch associations. In the process of establishing cross-batch associations, association conflicts may occur, such as contradictory associations between records from different batches. In this case, it is necessary to handle the conflict according to the preset conflict resolution strategy, such as prioritizing the use of the more recent record association information or making a choice based on the credibility of the association.
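A minimal Python sketch of the integration and cross-batch connection steps is given below. It assumes each batch network is keyed by a `(trial name, test date)` tuple and stored as an adjacency dictionary, and it uses the simplest possible conflict strategy (an existing edge is never overwritten); the source leaves the actual conflict resolution strategy open.

```python
def connect_batches(networks):
    """Merge per-batch networks and add cross-batch edges.

    networks: {(trial name, test date): {record_id: {neighbor_id: relation}}}
    Returns one merged graph {record_id: {neighbor_id: relation}}.
    """
    merged = {}
    batch_of = {}
    # Integration step: copy every node and edge into one graph.
    for batch_key, graph in networks.items():
        for node, neighbors in graph.items():
            merged.setdefault(node, {}).update(neighbors)
            batch_of[node] = batch_key

    # Connection rule (assumed): records from different batches that share
    # the same trial name receive a cross-batch association edge.
    nodes = list(merged)
    for index, a in enumerate(nodes):
        for b in nodes[index + 1:]:
            same_trial = batch_of[a][0] == batch_of[b][0]
            if batch_of[a] != batch_of[b] and same_trial:
                # setdefault keeps any edge that already exists.
                merged[a].setdefault(b, "cross-batch association")
                merged[b].setdefault(a, "cross-batch association")
    return merged
```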

[0058] There are potential associations between experimental records from different batches. By connecting the batch networks, these cross-batch associations can be discovered, giving a more comprehensive understanding of the entire experimental process. For example, the same trial may have records from different stages in different batches. By establishing cross-batch associations, these records can be linked together to form a complete chain of experimental records. By connecting the association networks of each batch, a unified association system can be built in which all experimental records in the data archive belong to one organic association network, which facilitates overall management and analysis.

[0059] Graph analysis algorithms are used to analyze the integrated and connected network of records to uncover the overall relationships between experimental records. Commonly used graph algorithms include shortest path algorithms, connected component analysis algorithms, and centrality analysis algorithms. These algorithms can discover indirect relationships between records, important core records, and connectivity between record groups. The analyzed relationships are then evaluated to determine their strength and reliability. The strength of the relationships can be assessed based on indicators such as the number of edges and the length of the paths; the reliability of the relationships can be assessed based on the source and accuracy of the association information.
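Two of the analyses named above, connected component analysis and shortest path length, can be sketched with breadth-first search in Python. The adjacency-dictionary graph layout and the function names are assumptions; a graph library could be used instead.

```python
from collections import deque


def connected_components(graph):
    """Return the groups of records that are linked directly or indirectly."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def shortest_path_length(graph, source, target):
    """Number of edges on the shortest path, or None if unreachable."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return None


def degree(graph):
    """Edge count per record, a simple indicator of core records."""
    return {node: len(neighbors) for node, neighbors in graph.items()}
```

A shorter path or a higher edge count can then be read as a stronger association, in line with the indicators mentioned in the paragraph.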

[0060] Analyzing the integrated network of connections using graph algorithms allows for in-depth exploration of complex relationships between experimental records, uncovering hidden information and patterns. This information is crucial for researchers to understand experimental results, identify problems, and conduct subsequent research. Evaluating these connections ensures the reliability of the analysis results. Only by accurately assessing the strength and credibility of these connections can informed decisions be made in subsequent retrieval and use.

[0061] Based on the results of the overall association analysis, rules for generating association retrieval pointers are formulated. An association retrieval pointer can be a data structure containing the identifier of the associated record and information about the association. For example, a list can be used to represent the identifiers of other records associated with a specified record together with the association type (such as sample association, test item association, etc.). Each node (experimental record) in the integrated association network is traversed, and a corresponding association retrieval pointer is generated according to the pointer generation rules. The generated association retrieval pointers are then stored together with the corresponding experimental records for subsequent retrieval.
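The pointer structure described above, a list of associated record identifiers with their association types, might be generated as in the following Python sketch. The graph layout and both function names are assumptions for illustration.

```python
def generate_retrieval_pointers(graph):
    """Build one pointer list per record from the integrated network.

    graph: {record_id: {neighbor_id: relation}}
    Returns {record_id: [(neighbor_id, relation), ...]} sorted by identifier.
    """
    return {node: sorted(neighbors.items()) for node, neighbors in graph.items()}


def retrieve_associated(pointers, record_id, relation=None):
    """Follow a record's pointer; optionally filter by association type."""
    return [
        neighbor_id
        for neighbor_id, neighbor_relation in pointers.get(record_id, [])
        if relation is None or neighbor_relation == relation
    ]
```

Retrieval then reads a single stored list instead of searching the whole network.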

[0062] The association retrieval pointer facilitates the quick retrieval of other records associated with a specified experimental record. When it is necessary to find related records, there is no need to perform a complex search of the entire association network. The relevant record can be quickly located by simply following the association retrieval pointer, which greatly improves retrieval efficiency. The association retrieval pointer helps to conduct more in-depth data analysis. Researchers can obtain relevant records based on the association retrieval pointer and conduct comparative analysis, trend analysis, etc., so as to better understand the inherent laws of experimental data.

[0063] As shown in Figure 2, the present invention provides an automated labeling system for clinical trial center laboratory records, used to implement the automated labeling method for clinical trial center laboratory records as described in any embodiment of the first aspect, comprising: a record recognition module, used to collect experimental records and perform information recognition on the experimental records to obtain the record information fed back by the experimental records; a type determination module, used to match the record information according to preset standards in order to determine the record type to which the experimental record belongs; and a tag generation module, used to schedule the tag naming template corresponding to the record type and to generate record tags for the experimental record based on the tag naming template.

[0064] In this embodiment, the specific implementation of each module in the above system embodiment is described in the above method embodiment, and will not be repeated here.

[0065] The above description is merely a specific embodiment of the invention, but the scope of protection of the invention is not limited thereto. Any changes or substitutions conceived without creative effort should be included within the scope of protection of the invention.

Claims

1. A method for automated labeling of clinical trial center laboratory records, characterized in that the method comprises: collecting experimental records and performing information recognition on the experimental records to obtain the record information fed back by the experimental records; matching the record information according to preset standards to determine the record type to which the experimental record belongs; and scheduling the tag naming template corresponding to the record type, and generating record tags for the experimental record based on the tag naming template.

2. The automated labeling method for clinical trial center laboratory records as described in claim 1, characterized in that the step of performing information recognition on the experimental records to obtain the record information fed back by the experimental records comprises: based on the document format of the experimental records, selecting the appropriate information correction channel to correct the experimental records and obtain an optimized version of the experimental records; and performing OCR recognition on the optimized version of the experimental records using an OCR algorithm to obtain the record information fed back by the experimental records.

3. The automated labeling method for clinical trial center laboratory records as described in claim 2, characterized in that, during the OCR recognition process, one OCR recognition algorithm is selected as the main recognition algorithm from a pre-deployed set of OCR recognition algorithms, and one or more OCR recognition algorithms are selected as auxiliary recognition algorithms; and the optimized version of the experimental record is recognized in an overlaid manner using the main recognition algorithm and the auxiliary recognition algorithms.

4. The automated labeling method for clinical trial center laboratory records as described in claim 1, characterized in that the step of matching the record information according to preset standards to determine the record type to which the experimental record belongs comprises: performing keyword matching on the record information according to the preset keyword standards for each record type to obtain the keyword matching degree of the experimental record relative to the keyword standards; and analyzing the keyword matching degree according to the preset matching threshold standard, wherein, when the keyword matching degree is in the highest numerical range, the record type to which the experimental record belongs is determined according to the keyword matching degree; when the keyword matching degree is in the secondary numerical range, a confirmation mark is generated for the experimental record, and the experimental record with the confirmation mark is temporarily stored so that the temporarily stored experimental records can be identified by the record type recognition model at a specified time; and when the keyword matching degree is in the low numerical range, the experimental record is transferred to the manual review channel.

5. The automated labeling method for clinical trial center laboratory records as described in claim 1, characterized in that the step of generating record tags for the experimental record based on the tag naming template comprises: based on the tag naming template, deploying several naming element standards for the experimental record, and analyzing the record information according to each naming element standard to obtain the naming element data of each naming element standard; and combining the naming element data according to the tag naming template to obtain the record tag of the experimental record, and renaming the experimental record according to the record tag.

6. The automated labeling method for clinical trial center laboratory records as described in claim 5, characterized in that the naming element standard is used to analyze the naming elements of the experimental records to obtain naming element data that conforms to the naming elements of the experimental records; the naming elements are divided into content type elements and format type elements; according to the naming element standard of the content type elements, keyword identification and standard matching of the record content are performed on the record information to obtain naming element data for the core feedback content of the experimental records; and according to the naming element standard of the format type elements, information format analysis and standard matching are performed on the record information to obtain naming element data for the overall information format of the experimental records.

7. The automated labeling method for clinical trial center laboratory records as described in claim 1, characterized in that the method further comprises: transferring the experimental records to a designated location in the data archive based on the record tags; recording the transfer behavior of the experimental records to obtain an archived record list; tracing, based on the archived record list, the relationships between experimental records in the same batch to construct an association network of the experimental records in the same batch; and confirming, through the association network, the naming uniqueness of the experimental records in the same batch so as to correct experimental records with duplicate names.

8. The automated labeling method for clinical trial center laboratory records as described in claim 7, characterized in that the association networks of each batch are interconnected to perform an overall association analysis of all experimental records in the data archive, thereby generating corresponding association retrieval pointers for each experimental record in the data archive, wherein the association retrieval pointers are used to retrieve other experimental records associated with a specified experimental record.

9. An automated labeling system for clinical trial center laboratory records, characterized in that the system is used to implement the automated labeling method for clinical trial center laboratory records as described in any one of claims 1-8, and comprises: a record recognition module, used to collect experimental records and perform information recognition on the experimental records to obtain the record information fed back by the experimental records; a type determination module, used to match the record information according to preset standards in order to determine the record type to which the experimental record belongs; and a tag generation module, used to schedule the tag naming template corresponding to the record type and to generate record tags for the experimental record based on the tag naming template.