A medical data security synchronization system and method

The medical data security synchronization system automatically identifies and adapts to the communication protocols of different medical information systems and supports non-programming configuration of data cleaning rules, solving the problem of data silos and achieving efficient data synchronization and privacy protection.

CN122245666A · Pending · Publication Date: 2026-06-19 · 联通数智医疗科技有限公司

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: 联通数智医疗科技有限公司
Filing Date: 2026-03-13
Publication Date: 2026-06-19

Abstract

This application discloses a medical data secure synchronization system and method for synchronizing data between different medical information systems. The system includes a central processing module, a dynamic protocol adaptation module, a visual data cleaning module, and a privacy computing module. The dynamic protocol adaptation module automatically identifies and adapts to different medical information systems, while also supporting user-configured protocol extensions, improving data adaptation flexibility in cross-system data interaction within medical institutions. The visual data cleaning module allows for non-programming configuration of data cleaning rules, eliminating the need for professional code writing or complex configuration files, thus reducing development costs. The privacy computing module implements multi-dimensional privacy protection strategies and key management, enhancing the system's security performance.

Description

Technical Field

[0001] This invention relates to the field of medical informatics and data security technology, specifically to a medical data security synchronization system and method.

Background Technology

[0002] With the development of healthcare informatization, medical institutions have deployed various information systems, such as hospital information systems and clinical information systems, to support business functions such as diagnosis, treatment, and cost management. However, because these systems were built at different times by different vendors, their data formats and communication protocols differ significantly, resulting in serious data silos. Currently, cross-system data interaction in medical institutions relies mainly on customized interface development: a dedicated interface is developed for each specific data interaction need. Such approaches suffer from poor adaptability and flexibility, complex data cleaning configuration, and insufficient privacy protection.

Summary of the Invention

[0003] The main objective of this application is to provide a medical data secure synchronization system and method, aiming to at least solve one of the above-mentioned technical problems.

[0004] A first aspect of this application provides a medical data secure synchronization system for synchronizing data between different medical information systems, characterized in that the system comprises: a central processing module, configured to be responsible for task scheduling, coordination between functional modules, and management of system resources; a dynamic protocol adaptation module, communicatively connected to the external information system that accesses the system, used to identify the communication protocol of the external information system through automatic detection or received configuration and, based on the identified communication protocol, to call the corresponding protocol template to parse and convert the transmitted data; and a visual data cleaning module, communicatively connected to the dynamic protocol adaptation module, used to clean the data after protocol adaptation, wherein the visual data cleaning module provides a graphical interface and supports configuring data cleaning rules in a non-programming manner.

[0005] In some embodiments of this application, the dynamic protocol adaptation module includes: a protocol detection unit, used to automatically identify the protocol type; an extensible protocol template library, used to store various standard communication protocol templates and to support the addition of custom protocol templates; a protocol parsing unit, used to call the corresponding template to parse the data based on the identified protocol type; and a protocol adaptation unit, used to convert data into a protocol format compatible with the target system.

[0006] In some embodiments of this application, the visual data cleaning module includes: a cleaning component library, which contains multiple predefined data cleaning components corresponding to different cleaning functions; a visual data cleaning component configuration unit, which provides a graphical operation interface for configuring data cleaning rules by assembling the data cleaning components; a natural language parsing unit, which integrates a semantic understanding model to parse user-input natural language commands into executable data cleaning rules; and a cleaning execution unit, used to perform data cleaning operations according to the configured rules.

[0007] The visual data cleaning module further includes a cleaning result verification unit, used to verify the correctness of the cleaned data.

[0008] In some embodiments of this application, the natural language parsing unit uses a dedicated model trained on a medical corpus. The model can parse the semantic association between medical-specific fields and cleaning operations, and supports error correction and intent guidance for ambiguous instructions.

[0009] In some embodiments of this application, the system further includes a privacy computing module, communicatively connected to the visual data cleaning module, for identifying privacy information in the cleaned data and applying a preset privacy protection strategy, wherein the privacy computing module includes: a privacy data identification unit, used to automatically identify privacy information in data based on preset rules; a policy configuration unit, used to configure various privacy protection policies; a privacy processing unit, used to process the identified privacy data according to a configured policy; and a key management unit, used for the secure management of the keys used during privacy processing.
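As a hedged illustration of how the identification, policy configuration, and processing units described above could fit together, the following Python sketch identifies private fields with regular-expression rules and then applies either masking or keyed pseudonymization. The rule patterns, the policy names (`mask`, `pseudonymize`), and the key passed in by the caller are assumptions made for illustration; the application does not specify them, and real key management would not keep keys in application code.

```python
import hashlib
import hmac
import re

# Hypothetical identification rules: field name -> pattern marking a value as private.
PRIVACY_RULES = {
    "phone": re.compile(r"^\d{11}$"),
    "id_number": re.compile(r"^\d{17}[\dXx]$"),
}

# Hypothetical policy configuration: field name -> protection strategy.
POLICIES = {"phone": "mask", "id_number": "pseudonymize"}


def identify_private_fields(record):
    """Return the names of fields whose values match a preset privacy rule."""
    return [
        name for name, pattern in PRIVACY_RULES.items()
        if name in record and pattern.match(str(record[name]))
    ]


def mask(value, keep_head=3, keep_tail=4):
    """Replace the middle of a value with asterisks (assumes a long enough value)."""
    hidden = max(len(value) - keep_head - keep_tail, 0)
    return value[:keep_head] + "*" * hidden + value[len(value) - keep_tail:]


def pseudonymize(value, key):
    """Keyed, non-reversible token: HMAC-SHA256 truncated to 16 hex characters."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def protect(record, key):
    """Apply the configured policy to every identified private field."""
    protected = dict(record)
    for name in identify_private_fields(record):
        value = str(record[name])
        if POLICIES.get(name) == "mask":
            protected[name] = mask(value)
        elif POLICIES.get(name) == "pseudonymize":
            protected[name] = pseudonymize(value, key)
    return protected
```

Because the pseudonym is derived with a key, the same input yields the same token under one key, which keeps records joinable after protection while the original value stays hidden from anyone without the key.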

[0010] In some embodiments of this application, the system further includes a full-process tracking and log management module for monitoring the data synchronization process. The full-process tracking and log management module includes: a process tracking unit, configured to collect status information at key nodes to form a full-process tracking link; a log collection unit, configured to generate detailed log data based on the information collected by the process tracking unit; a log storage unit, configured to store the collected log data; a global monitoring unit, which provides a visual dashboard displaying the core operational metrics of system tasks; and an intelligent analysis unit, which integrates machine learning and / or non-machine learning models to analyze log data, locate the root cause of anomalies, and generate remediation suggestions in natural language.

[0011] In some embodiments of this application, the system further includes a service publishing and concurrency control module, which is used to publish the processed data to the outside world in the form of a standardized service interface, and to perform concurrent traffic control on requests to access the service interface.

[0012] Another aspect of this application provides a method for secure synchronization of medical data, applied to the aforementioned system, the method comprising: a system access and protocol adaptation step, in which the communication protocol of the external information system is identified and data acquisition is completed through the dynamic protocol adaptation module; a data cleaning step, in which the data is cleaned by the visual data cleaning module based on rules configured in a non-programming manner; a privacy computation step, in which privacy information in the data is identified and privacy protection policies are applied through the privacy computing module; and a data synchronization and service publishing step, in which the processed data is synchronized to the target system and / or published externally in the form of a standardized service interface through the service publishing and concurrency control module.

[0013] In some embodiments of this application, the system access and protocol adaptation step includes: first attempting to identify the protocol type of the external information system by automatic detection; and, if automatic detection fails, receiving protocol parameters manually entered by the user to complete the adaptation configuration.

[0014] In some embodiments of this application, configuring rules in a non-programming manner includes: parsing user-input natural language instructions into executable data cleaning rules through a natural language parsing unit, and providing a visual preview of the rules for user confirmation or adjustment before execution.
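The flow described above (instruction, parsed rule, preview, execution) can be sketched in Python as follows. This is only a minimal illustration: a keyword matcher stands in for the semantic understanding model, and the operation catalog, the rule dictionary layout, and all function names are hypothetical. Half-width conversion is approximated with Unicode NFKC normalization, which also folds other compatibility characters.

```python
import re
import unicodedata

# Hypothetical operation catalog: keyword in the instruction -> cleaning function.
OPERATIONS = {
    "half-width": lambda s: unicodedata.normalize("NFKC", s),
    "trim": str.strip,
    "uppercase": str.upper,
}


def parse_instruction(text):
    """Turn an instruction into a rule dict, or return None if it is ambiguous.

    A keyword matcher stands in for the semantic understanding model here.
    """
    field = re.search(r"the (.+?) field", text)
    operation = next((op for op in OPERATIONS if op in text), None)
    if field is None or operation is None:
        return None
    return {"field": field.group(1).replace(" ", "_"), "operation": operation}


def preview(rule, sample):
    """Show before/after values so the user can confirm or adjust the rule."""
    before = sample[rule["field"]]
    return {"before": before, "after": OPERATIONS[rule["operation"]](before)}


def apply_rule(rule, records):
    """Execute a confirmed rule over all records, leaving other fields untouched."""
    fn = OPERATIONS[rule["operation"]]
    return [{**r, rule["field"]: fn(r[rule["field"]])} for r in records]
```

Returning `None` for an instruction that cannot be resolved gives the caller a point at which to ask the user for clarification, in the spirit of the intent guidance mentioned for ambiguous instructions.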

[0015] The embodiments of this application include at least the following beneficial effects: by providing a medical data security synchronization system and method, data synchronization can be performed between different medical information systems. The dynamic protocol adaptation module automatically identifies and adapts to different medical information systems, while supporting user-configured protocols for protocol extension. This improves the flexibility of data adaptation in cross-system data interaction within medical institutions. The visual data cleaning module enables the configuration of data cleaning rules in a non-programming manner, eliminating the need for professionals to write code or complex configuration files, thus reducing development costs. The privacy computing module implements multi-dimensional privacy protection strategies and key management, thereby improving the system's security performance.

[0016] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions of the embodiments of this application, the relevant drawings of the embodiments of this application are described below. It should be understood that the drawings described below are only for the convenience of clearly describing some embodiments of the technical solutions of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is an overall architecture diagram of the medical data security synchronization system provided in the embodiments of this application;
Figure 2 is a layered design architecture diagram of the medical data security synchronization system provided in the embodiments of this application;
Figure 3 is a schematic diagram of the structure of the dynamic protocol adaptation module provided in the embodiments of this application;
Figure 4 is a schematic diagram of the structure of the visual data cleaning module provided in the embodiments of this application;
Figure 5 is an overall architecture diagram of a medical data security synchronization system with a privacy computing module provided in an embodiment of this application;
Figure 6 is a schematic diagram of the privacy computing module provided in an embodiment of this application;
Figure 7 is a schematic diagram of the structure of the end-to-end tracking and log management module provided in the embodiments of this application;
Figure 8 is a schematic diagram of the structure of the service publishing and concurrency control module provided in the embodiments of this application;
Figure 9 is an exemplary illustration of a user using the medical data security synchronization system provided in the embodiments of this application;
Figure 10 illustrates the method for secure synchronization of medical data provided in the embodiments of this application.

Detailed Implementation

[0019] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit the scope of this application. In the following description, when referring to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with those of this application; they are merely examples of circuits, systems, apparatus, and methods consistent with some aspects of the embodiments of this application as detailed in the appended claims.

[0020] It is understood that the terms "first," "second," etc., used in this application may be used to describe various technical features, but unless otherwise specified, these technical features are not limited by these terms. These terms are only used to distinguish one technical feature from another and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. For example, without departing from the scope of the embodiments of this application, a first element may also be referred to as a second element, and similarly, a second element may also be referred to as a first element.

[0021] Unless otherwise defined, the technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.

[0022] The terms "at least one", "multiple", "each", "any", etc., as used in this application, are to be understood as follows: "at least one" includes one, two, or more; "multiple" includes two or more; "each" refers to every one of the corresponding multiple items; and "any" refers to any one of the multiple items.

[0023] It should be understood that the terms "center," "longitudinal," "lateral," etc., indicate the orientation or positional relationship based on the accompanying drawings, and are used only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the elements referred to must have a specific orientation, or be constructed and operated in a specific orientation. The term "and / or" includes any and all combinations of one or more of the related listed items. Those skilled in the art can understand the specific meaning of the above terms in this application according to the specific circumstances.

[0024] The following describes in detail, with reference to the accompanying drawings, a medical data security synchronization system and method according to an embodiment of this application.

[0025] Before providing a detailed description of the embodiments of this application, some of the nouns and terms involved in the embodiments of this application will be explained first. The nouns and terms involved in the embodiments of this application are subject to the following interpretations.

[0026] NLP (Natural Language Processing): NLP technology is used in natural language parsing units, which can understand natural language instructions input by users (such as "unify the patient name field to half-width characters") and convert them into data cleaning rules that can be executed by computers.

[0027] HIS (Hospital Information System): This is the most basic and core operation and management system of a hospital, which mainly manages the hospital's administration, finance and logistics.

[0028] CIS (Clinical Information System): A clinical diagnosis and treatment system that primarily provides doctors and nurses with clinical support such as electronic medical records, medical orders, and nursing records.

[0029] LIS (Laboratory Information System): The working system of the laboratory department, which is dedicated to handling the laboratory's business, such as receiving test requests, managing samples, receiving test results, reviewing and issuing reports.

[0030] PACS (Picture Archiving and Communication System): The working system of the radiology department, specifically used for the storage, transmission, management and display of medical images (such as CT, MRI, X-ray).

[0031] Transformer architecture: A deep learning model architecture based on the self-attention mechanism. It replaces the recurrence and convolution used by earlier sequence models with attention, which allows the positions of a sequence to be processed in parallel.

[0032] Mixture of Experts (MoE) architecture: a deep learning technique that combines multiple "expert" subnetworks into a sparsely activated model, in which only a few experts are activated for each input sample, significantly reducing computational cost while maintaining model capacity.

[0033] Bi-LSTM model: A neural network that extends the standard Long Short-Term Memory (LSTM) network into a bidirectional structure.

[0034] BERT model: A pre-trained language model based on the Transformer encoder architecture. By training on large-scale corpora for masked language modeling and next-sentence prediction tasks, it can generate deeply context-sensitive word vectors.

[0035] CBR (Case-Based Reasoning): A non-machine learning model that simulates how humans solve problems. Its core idea is that when encountering a new problem, the system will look for similar problems (cases) that have been solved in the past from its memory (case library), and then reuse or modify the solutions of those old cases to solve the new problem.

[0036] With the development of healthcare informatization, medical institutions have deployed various information systems, such as hospital information systems and clinical information systems, to undertake business functions such as diagnosis and treatment and cost management. However, due to differences in the construction cycle and vendors of these systems, there are significant differences in data formats and communication protocols, resulting in serious data silos. Currently, cross-system data interaction in medical institutions mainly relies on customized interface development. Dedicated interfaces are developed separately for specific data interaction needs to achieve data transmission. However, related technologies suffer from poor adaptability and flexibility, complex data cleaning and configuration, and insufficient privacy protection.

[0037] A first aspect of this application provides a medical data secure synchronization system for synchronizing data between different medical information systems. The system includes: The central processing module is configured to be responsible for task scheduling, coordination between functional modules, and management of system resources; The dynamic protocol adaptation module communicates with external information systems connected to the medical data security synchronization system. It is used to automatically detect or receive configurations to identify the communication protocols used by the external information systems. Based on the identified communication protocols, it calls the corresponding protocol template to parse and convert the transmitted data. The visual data cleaning module communicates with the dynamic protocol adaptation module and is used to clean the data after protocol adaptation. The visual data cleaning module provides a graphical interface and supports configuring data cleaning rules in a non-programming manner.

[0038] See Figure 1. This application provides a medical data security synchronization system that includes a central processing module, a dynamic protocol adaptation module, and a visual data cleaning module. The central processing module, as the system's control hub, communicates with each module and is responsible for instruction parsing, task scheduling, resource allocation, and coordinating data flow between modules.

[0039] This application provides a medical data security synchronization system that enables data synchronization between different medical information systems. A dynamic protocol adaptation module automatically identifies and adapts to different medical information systems, while also supporting user-configured protocol extensions. This enhances the flexibility of data adaptation in cross-system data interaction within medical institutions. A visual data cleaning module allows for non-programming configuration of data cleaning rules, eliminating the need for professional coding or complex configuration files, thus reducing development costs. A privacy computing module implements multi-dimensional privacy protection strategies and key management, improving the system's security performance.

[0040] See Figure 2. The medical data security synchronization system adopts a layered architecture consisting, from top to bottom, of an interaction layer, a core service layer, a protocol adaptation layer, a data processing layer, a privacy computing layer, a storage layer, and an interface layer. The layers work together to achieve dynamic adaptation, secure synchronization, privacy protection, and service publishing of medical data.

[0041] Specifically, the interaction layer provides users with a visual operation interface, including a protocol configuration interface, a data cleaning rule configuration interface, a synchronization task management interface, a log query interface, and a service publishing interface, so that users can complete the various configurations through simple operations such as drag-and-drop and checkbox selection. The core service layer, as the control core of the medical data security synchronization system, coordinates the work of the other layers, including synchronization task scheduling, concurrency control, service publishing management, and permission management. The protocol adaptation layer implements dynamic adaptation of various communication protocols and supports automatic identification of, and adaptation to, the protocol types of external information systems without customized development. The data processing layer is responsible for data cleaning, transformation, and format standardization; it integrates an NLP model and can complete data processing automatically based on rules generated by visual configuration or natural language commands. The privacy computing layer performs desensitization, anonymization, and encryption on private medical data to keep the data private and secure during synchronization and use. The storage layer stores configuration information (protocol configuration, cleaning rules, etc.), synchronization task data, log data, privacy computing keys, and so on. The interface layer includes system access interfaces and service publishing interfaces: external information systems access this system through the system access interface, and external application service requesters obtain the data services provided by this system through the service publishing interface. External information systems include existing hospital information systems such as HIS, CIS, LIS, and PACS, as well as newly added innovative application systems. External application service requesters include data analysis applications and research applications within medical institutions, as well as applications from authorized external partners.

[0042] The central processing module is located in the core service layer; the dynamic protocol adaptation module is located in the interaction layer and the protocol adaptation layer; and the visual data cleaning module is located in the interaction layer and the data processing layer.

[0043] In some embodiments of this application, the central processing module may adopt a microservice-based distributed architecture. The central processing module acts as the system's workflow engine, responsible for task scheduling and state management. The dynamic protocol adaptation module and the visual data cleaning module are deployed as independent microservices, communicating with the workflow engine through message queues or remote procedure call interfaces to collaboratively complete data synchronization tasks. Upon receiving a synchronization task, the central processing module sequentially calls the protocol adaptation microservice, the data cleaning microservice, and the privacy computing microservice. Data is transmitted via inter-service API calls (such as RESTful API, gRPC) or message queues (such as RabbitMQ, Kafka).
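The sequential orchestration described above can be sketched as follows. This is a simplified, in-process illustration: plain Python callables stand in for the protocol adaptation, data cleaning, and privacy computing microservices, and the class names, state labels, and stage names are assumptions rather than anything defined in this application. In a deployed system each call would go over RESTful API, gRPC, or a message queue.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Stage = Callable[[Any], Any]


@dataclass
class SyncTask:
    """A synchronization task with its payload and bookkeeping state."""
    payload: Any
    state: str = "PENDING"
    history: List[str] = field(default_factory=list)


class WorkflowEngine:
    """Calls each registered stage in order and records the task state."""

    def __init__(self):
        self._stages: List[Tuple[str, Stage]] = []

    def register(self, name, stage):
        self._stages.append((name, stage))

    def run(self, task):
        task.state = "RUNNING"
        for name, stage in self._stages:
            try:
                # Each stage receives the previous stage's output.
                task.payload = stage(task.payload)
                task.history.append(name)
            except Exception as exc:
                # Stop at the first failing stage and keep the reason.
                task.state = f"FAILED at {name}: {exc}"
                return task
        task.state = "DONE"
        return task
```

A usage example would register three stages in the order named in the text and call `run` once per synchronization task; `history` then shows how far a failed task progressed.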

[0044] In some embodiments of this application, the central processing module may also adopt a monolithic centralized control architecture. The dynamic protocol adaptation module and the visual data cleaning module, as functional sub-modules within the core process, are uniformly scheduled by the central processing module, and data is sequentially transferred between the sub-modules in the form of memory data streams.

[0045] In some embodiments of this application, the central processing module may also adopt an event-driven architecture. The central processing module is decoupled from each functional module through an event bus. Each state change during the data synchronization process is published to the event bus in the form of an event, which is then consumed and processed asynchronously by the corresponding functional modules, thereby achieving efficient and loosely coupled pipelined operation.
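A minimal sketch of this event-driven variant is shown below, assuming an in-memory bus in place of a real message broker. The event names (`data.received`, `data.parsed`, `data.cleaned`) and the handler logic are hypothetical. The `drain` method delivers events synchronously for simplicity; a production bus would let consumers process events asynchronously.

```python
from collections import defaultdict, deque


class EventBus:
    """In-memory stand-in for a message broker: publish queues, drain delivers."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, data):
        self._queue.append((event_type, data))

    def drain(self):
        """Deliver queued events in order; handlers may publish follow-up events."""
        delivered = []
        while self._queue:
            event_type, data = self._queue.popleft()
            delivered.append(event_type)
            for handler in self._handlers[event_type]:
                handler(data)
        return delivered


def wire_pipeline(bus, sink):
    """Connect hypothetical parse and clean handlers; results end up in sink."""
    bus.subscribe("data.received", lambda d: bus.publish("data.parsed", d.split("|")))
    bus.subscribe("data.parsed", lambda f: bus.publish("data.cleaned", [x.strip() for x in f]))
    bus.subscribe("data.cleaned", sink.append)
```

The modules never call each other directly: each one only knows the event it consumes and the event it emits, which is the loose coupling the paragraph describes.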

[0046] In some embodiments of this application, frameworks such as Spring Cloud and Apache ServiceComb can be used to build the coordinator and microservices. In some embodiments of this application, RESTful APIs (synchronous), gRPC (high-performance synchronous), or asynchronous message queues (such as RabbitMQ, Apache Kafka, or RocketMQ) can be used for communication between modules. In some embodiments of this application, the system modules can be deployed on physical servers, virtual machines, or Docker containers and managed by container orchestration tools such as Kubernetes to achieve high availability and elastic scaling. Those skilled in the art will understand that the above are merely illustrative examples.

[0047] In this embodiment, the dynamic protocol adaptation module includes: a protocol detection unit for automatically identifying protocol types; an extensible protocol template library for storing various standard communication protocol templates and supporting the addition of custom protocol templates; a protocol parsing unit for calling the corresponding template to parse data according to the identified protocol type; and a protocol adaptation unit for converting data into a protocol format compatible with the target system.

[0048] See Figure 3, which is a schematic diagram of the dynamic protocol adaptation module provided in this embodiment. When an external information system accesses the system, the protocol detection unit can automatically identify the communication protocol types and versions supported by the external system by sending probe data packets, parsing system response header information, and reading system protocol declarations. For protocols that cannot be identified automatically, users can manually enter protocol parameters (such as transmission method, data format, and port number) through the protocol configuration interface of the interaction layer to complete the identification. The dynamic protocol adaptation module includes an extensible protocol template library that pre-stores standard templates for various mainstream protocols. Each template contains core parameters such as the protocol's communication rules, data format specifications, and parsing methods. Users can also define custom protocol templates through the protocol configuration interface of the interaction layer and store them in the template library, which extends the system to custom protocols. Based on the protocol type identified by the protocol detection unit, the protocol parsing unit calls the corresponding protocol template from the protocol template library to parse the data transmitted by the external information system and extract the valid data. Data in non-standard formats is automatically marked and passed to the data processing layer for cleaning. The protocol adaptation unit calls the protocol template corresponding to the protocol types supported by the target system and converts the processed data into a format compatible with the target system, achieving cross-protocol data transmission.
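The "automatic detection first, manual configuration as fallback" behavior can be illustrated with the short Python sketch below. It inspects a sample payload passively instead of sending probe packets, and the signature checks, protocol labels, and the shape of the manual parameter dictionary are all assumptions chosen for illustration.

```python
import json


def detect_protocol(sample):
    """Guess the protocol of a sample payload; return None if unrecognized."""
    text = sample.strip()
    if text.startswith("MSH|"):
        # HL7 v2 messages begin with the MSH segment.
        return "HL7v2"
    if text.startswith("<"):
        return "XML"
    try:
        doc = json.loads(text)
    except ValueError:
        return None
    if isinstance(doc, dict) and "resourceType" in doc:
        # FHIR resources serialized as JSON carry a resourceType member.
        return "FHIR-JSON"
    return "JSON"


def identify(sample, manual_params=None):
    """Automatic detection first; fall back to user-entered parameters."""
    detected = detect_protocol(sample)
    if detected is not None:
        return {"protocol": detected, "source": "auto"}
    if manual_params and "protocol" in manual_params:
        return {**manual_params, "source": "manual"}
    raise ValueError("protocol not recognized and no manual configuration given")
```

Recording the `source` of the identification lets later stages, or an audit log, distinguish detected protocols from manually configured ones.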

[0049] In some embodiments of this application, the protocol detection unit can actively or passively determine the application layer protocol used by the external access system. For example, it can infer the protocol by actively sending specific data packets and analyzing response characteristics; or it can identify the protocol by listening to or parsing existing communication traffic, without actively interfering with the target system. Exemplarily, the extensible protocol template library can use a standardized schema language to define protocol templates, making them machine-readable and parseable. Optionally, the extensible protocol template library uses JSON Schema or Protocol Buffers' .proto file format to describe each protocol. For example, a template for an HL7 v2.5 protocol defines the required fields, field order, delimiters, and data types for segments such as MSH and PID. Each template in the library is stored as a file and includes version management. Templates can be stored in a database, providing CRUD (Create, Read, Update, Delete) interfaces. For example, protocol templates can be stored in a relational database (such as MySQL) or a document database (such as MongoDB). The system provides a management interface that allows administrators to add new custom protocol templates (such as those defined internally by the hospital) and register them in the library via graphical forms or by directly editing JSON description files.
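To make the idea of a versioned, extensible template library more concrete, here is a small Python sketch with an in-memory store offering create, read, update, and delete operations. The template layout shown for HL7 v2.5 is a deliberately simplified, hypothetical structure (only a few fields of the MSH and PID segments), not a complete or authoritative description of the standard, and a real deployment would back the store with a database as the text suggests.

```python
import copy

# Hypothetical, heavily simplified template for a few HL7 v2.5 fields.
HL7_V25_TEMPLATE = {
    "field_separator": "|",
    "segment_separator": "\r",
    "segments": {
        "MSH": {"required": True, "fields": ["encoding_characters", "sending_application"]},
        "PID": {"required": True, "fields": ["set_id", "patient_id", "patient_identifier_list"]},
    },
}


class TemplateLibrary:
    """In-memory template store keyed by (name, version) with basic CRUD."""

    def __init__(self):
        self._templates = {}

    def create(self, name, version, template):
        key = (name, version)
        if key in self._templates:
            raise KeyError(f"{name} {version} is already registered")
        self._templates[key] = copy.deepcopy(template)

    def read(self, name, version):
        # Return a copy so callers cannot modify the stored template.
        return copy.deepcopy(self._templates[(name, version)])

    def update(self, name, version, template):
        if (name, version) not in self._templates:
            raise KeyError(f"{name} {version} is not registered")
        self._templates[(name, version)] = copy.deepcopy(template)

    def delete(self, name, version):
        del self._templates[(name, version)]

    def versions(self, name):
        """List the registered versions of one protocol, sorted as strings."""
        return sorted(v for n, v in self._templates if n == name)
```

Keying on both name and version means a custom, hospital-defined template can be registered next to the standard ones without replacing them.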

[0050] In some embodiments of this application, the protocol parsing unit can automatically generate corresponding parser code based on a protocol template (e.g., syntax rules). For example, the protocol parsing unit integrates a parser generator (e.g., ANTLR). After loading a protocol template (e.g., JSON Schema), it dynamically generates the corresponding lexical analyzer and syntax analyzer. For example, for HL7 messages, the generated parser can recognize the field separator | and parse the raw message PID|1||12345... into an in-memory object tree containing attributes such as segment ID and field sequence. Optionally, the protocol parsing unit can use mature third-party parsing libraries for common protocols, or use a script engine to implement dynamic parsing. For example, for standard protocols, the unit directly calls high-performance native parsing libraries, such as using libhl7 to parse HL7 messages or using the Jackson library to parse JSON. For highly customized protocols, it calls a built-in script engine (e.g., the Nashorn JavaScript engine) to execute a parsing script dynamically generated from the template, achieving flexible parsing.
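As a rough illustration of template-driven parsing, the Python sketch below splits a delimited message on the `|` delimiter and builds a simple in-memory tree keyed by segment ID. It is a hand-written stand-in for a generated parser, not ANTLR output, and the template layout and field names are hypothetical. It also ignores details a real HL7 parser must handle, such as the special field numbering of the MSH segment and component or repetition separators.

```python
# Hypothetical template: delimiter plus ordered field names per segment.
TEMPLATE = {
    "field_separator": "|",
    "segments": {"PID": ["set_id", "patient_id", "patient_identifier_list"]},
}


def parse_message(raw, template):
    """Split a delimited message into {segment_id: [{field_name: value}, ...]}."""
    field_sep = template["field_separator"]
    tree = {}
    # Accept either carriage returns or newlines between segments.
    for line in raw.replace("\n", "\r").split("\r"):
        if not line:
            continue
        parts = line.split(field_sep)
        segment_id, values = parts[0], parts[1:]
        names = template["segments"].get(segment_id)
        if names is None:
            # Unknown segment: fall back to positional names.
            names = [f"field_{i + 1}" for i in range(len(values))]
        # Named fields missing from the message become empty strings;
        # values beyond the named fields are ignored in this sketch.
        fields = {name: values[i] if i < len(values) else ""
                  for i, name in enumerate(names)}
        tree.setdefault(segment_id, []).append(fields)
    return tree
```

Storing a list per segment ID allows repeated segments in one message to be kept in order.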

[0051] In some embodiments of this application, the protocol adaptation unit may use a template engine to populate data into the message template of the target protocol. Optionally, the protocol adaptation unit may employ a template engine (such as FreeMarker or Thymeleaf). For example, when data needs to be converted into FHIR format JSON, the engine binds its internal data object to a JSON template of an FHIR Patient resource to generate a JSON string conforming to the FHIR standard.
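The template-filling idea can be sketched with Python's standard `string.Template` standing in for FreeMarker or Thymeleaf. The template below covers only a few elements of a FHIR Patient resource, and the placeholder names and the function `render_patient` are assumptions made for illustration.

```python
import json
from string import Template

# Simplified Patient template; a real template would cover more elements.
PATIENT_TEMPLATE = Template("""{
  "resourceType": "Patient",
  "id": $patient_id,
  "name": [{"family": $family, "given": [$given]}],
  "gender": $gender,
  "birthDate": $birth_date
}""")


def render_patient(record: dict) -> str:
    """Bind an internal record to the template and return a JSON string."""
    # json.dumps quotes and escapes each value so the output stays valid JSON.
    values = {key: json.dumps(value) for key, value in record.items()}
    rendered = PATIENT_TEMPLATE.substitute(values)
    json.loads(rendered)  # raises ValueError if the result is not valid JSON
    return rendered
```

Escaping each value before substitution matters here: a name containing a quotation mark would otherwise break the generated message.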

[0052] In some embodiments of this application, the dynamic protocol adaptation module works collaboratively through a pipeline of protocol detection, template management, parsing, and adaptation. The protocol detection unit identifies protocols by combining actively sent probe packets (HTTP GET / HL7 handshake messages) with passive deep packet inspection (DPI); the identification results are used to query the corresponding template from an extensible protocol template library (which uses JSON Schema to describe the protocol structure); the protocol parsing unit uses a parser generated by ANTLR to parse the raw message into a structured object; finally, the protocol adaptation unit uses the FreeMarker template engine or Protocol Buffers serializer to convert the object into a protocol format compatible with the target system, completing the cross-protocol data conversion.

[0053] In some embodiments of this application, the dynamic protocol adaptation module can adopt a deep learning-based protocol automatic identification and adaptation scheme. By training a deep learning model to learn the characteristics of protocol data packets, it can achieve the identification and parsing of unknown protocols without the need for predefined templates.

[0054] It is understood that the above is merely an illustrative example. This application automatically identifies mainstream protocols through a dynamic protocol adaptation module, while also supporting custom template extensions. This solves the problem in existing technologies where adding new protocols requires the development of new plugins, resulting in long development cycles and high costs. It shortens the access time for new systems and significantly reduces access costs.

[0055] In this embodiment, the visual data cleaning module includes: a cleaning component library, which predefines multiple data cleaning components corresponding to different cleaning functions; a visual cleaning component configuration unit, which provides a graphical data cleaning component operation interface for configuring data cleaning rules by assembling the data cleaning components; a natural language parsing unit, which integrates a semantic understanding model for parsing user-input natural language instructions into executable data cleaning rules; and a cleaning execution unit, which performs data cleaning operations according to the configured rules.

[0056] See Figure 4. Figure 4 is a schematic diagram of the data cleaning module. The cleaning component library predefines data cleaning components corresponding to different cleaning functions. These components cover multiple technical dimensions of data cleaning, including but not limited to: a missing value handling component, which supports direct deletion of missing records or intelligent filling based on statistical values (mean, median, mode) and algorithmic prediction (K-nearest neighbor); a format standardization component, which supports unified formatting and conversion of dates, times, values, character encodings, etc.; an outlier handling component, which supports the identification and correction of abnormal data based on business rules (such as regular expressions, logical judgments) or statistical methods (such as the interquartile range, IQR); a data deduplication component, which supports precise deduplication and fuzzy deduplication based on fuzzy matching algorithms; and a data transformation and derivation component, which supports field splitting, merging, and generating derived fields based on existing fields. Those skilled in the art will understand that this application does not limit the specific execution algorithm for data cleaning.
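Two of the components listed above, median filling and IQR-based outlier identification, can be sketched with the Python standard library. The function names and the conventional multiplier of 1.5 are assumptions for this example, since the application does not fix a specific algorithm.

```python
import statistics


def fill_missing_with_median(values: list) -> list:
    """Replace None entries with the median of the observed values."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def iqr_outliers(values: list, k: float = 1.5) -> list:
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

In a component library, each function would be wrapped as a configurable component whose parameters (fill statistic, multiplier) are set in the graphical interface rather than in code.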

[0057] In some embodiments of this application, users can flexibly combine the aforementioned components through the visual cleaning component configuration unit in the interactive layer's visual configuration interface by dragging, checking, or selecting from dropdowns to form a cleaning configuration that meets the needs of complex medical data cleaning. The visual data cleaning module may also include a cleaning rule storage unit, which stores configuration rules and field information in association, supports querying, reuse, and batch modification, reduces redundant configuration, and thus enables the visual data cleaning module to support the saving, modification, and reuse of cleaning configurations.

[0058] In some embodiments of this application, the natural language parsing unit integrates a semantic understanding model to parse natural language instructions input by the user in the interaction layer into executable data cleaning rules. For example, when a user inputs a natural language instruction (such as "fill in missing values") in the visual configuration interface of the interaction layer, the natural language parsing unit will complete the corresponding missing value filling rule configuration based on the natural language instruction input by the user.

[0059] In this embodiment, the natural language parsing unit is a dedicated model trained on medical domain corpora. It can parse the semantic association between medical-specific fields and cleaning operations, and supports error correction and intent guidance for ambiguous commands. Those skilled in the art will understand that the dedicated model in the natural language parsing unit can be a pre-trained NLP model based on the Transformer architecture, a pre-trained NLP model based on the Mixture of Experts (MoE) architecture, or another pre-trained deep learning model. By training the model on medical domain corpora as input, with the corresponding cleaning configuration rules as output, the model gains the ability to parse the semantic association between medical-specific fields and cleaning operations. Through adaptation and adjustment of the model and training parameters during training, the model acquires the ability to correct ambiguous commands and guide intent.

[0060] In one embodiment of this application, the natural language parsing unit uses a BERT pre-trained model, combined with a hybrid architecture of medical domain fine-tuning and rule engine supplementation to achieve medical-specific semantic understanding. The specific customization process is as follows: 1. Corpus Construction: Collect real-world instructions for medical data cleaning scenarios (including field-level configuration, format conversion, encryption and decryption requirements, etc.), and label medical-specific fields (such as "patient ID number" and "test results"), cleaning operations (such as "de-identification" and "AES encryption"), logical relationships (such as "AND" and "OR"), etc., to form a labeled corpus for the medical data cleaning field; 2. Model fine-tuning: Based on the BERT-base model, fine-tuning is performed using an annotated corpus to optimize the model's semantic representation of medical terms and learn the association mapping between fields and cleaning operations; 3. Rule Engine Integration: Construct a syntax rule library for the medical cleaning domain (such as "[field] + [operation] + [parameter]" sentence rules) to verify and correct the model output results, thereby improving the accuracy of parsing.

[0061] The core functions of this model include: compound instruction parsing, fuzzy instruction error correction, and rule transformation. By employing dependency parsing and entity relation extraction algorithms, it performs syntactic structure analysis on compound instructions (such as "convert patient's birthday format to YYYY-MM-DD and check for emptiness, and encrypt ID number using AES"), identifying core entities such as "patient's birthday" and "ID number," extracting operations and logical relationships such as "format conversion - check for emptiness" and "encryption," and generating independent sub-rules. Based on the edit distance algorithm, it calculates the similarity between the fuzzy instruction and standard instructions in the corpus, matching the most likely correct intent, and generating multiple candidate requirements to guide user confirmation (e.g., when inputting "process patient name," candidate requirements include "desensitization / encryption / check for emptiness"). The parsed entities and operations are mapped to a standardized instruction format (JSON format) recognizable by the system, containing information such as field ID, operation type, parameter configuration, and execution order, for use by the cleansing execution unit.
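The fuzzy-instruction matching and rule transformation described above can be sketched as follows. The edit distance is the standard Levenshtein distance; the list of standard instructions, the operation codes, and the JSON key names are hypothetical, chosen only to mirror the field ID, operation type, parameter configuration, and execution order mentioned in the text.

```python
import json


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical standard instructions mapped to operation codes.
STANDARD_INSTRUCTIONS = {
    "fill missing values": "fill_missing",
    "encrypt with AES": "aes_encrypt",
    "check for emptiness": "not_empty",
}


def candidate_intents(text: str, top_n: int = 2) -> list:
    """Rank standard instructions by similarity to the user's input."""
    ranked = sorted(
        STANDARD_INSTRUCTIONS,
        key=lambda s: edit_distance(text.lower(), s.lower()),
    )
    return ranked[:top_n]


def to_rule(field_id: str, instruction: str, order: int, params=None) -> str:
    """Serialize one parsed sub-rule into a JSON instruction."""
    return json.dumps({
        "field_id": field_id,
        "operation": STANDARD_INSTRUCTIONS[instruction],
        "params": params or {},
        "order": order,
    })
```

Returning several ranked candidates rather than a single match is what lets the interface ask the user to confirm the intended operation.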

[0062] When a user configures a specific cleaning function through the visual cleaning component configuration unit, or when a user inputs natural language commands in the interaction layer and completes the configuration through the natural language parsing unit, the cleaning execution unit automatically executes the cleaning process according to the configuration rules.

[0063] The following are exemplary descriptions. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of this application.

[0064] For example, a hospital needs to synchronize laboratory data from its LIS system to a research platform, but the "Test Time" format in the source data is inconsistent, and the "Item Name" field uses abbreviations and full names interchangeably. Administrators can configure cleaning rules through either visual configuration or a natural language-driven method. After the cleaning rules are confirmed, the cleaning execution unit performs cleaning operations on the incoming data and outputs standardized data. In the visual configuration method, users can operate the interface provided by the visual cleaning component configuration unit, dragging and dropping "Date Formatting" and "Dictionary Mapping" components from the cleaning component library onto the canvas and connecting them. Then, for the "Date Formatting" component, the target format is set to YYYY-MM-DD HH:mm:ss, and for the "Dictionary Mapping" component, a CSV mapping table file is uploaded that maps abbreviations (such as WBC) to full names (such as white blood cell count). In the natural language-driven approach, users can directly type in the input box: "Unify the test time to a standard format and convert abbreviations such as WBC to their full Chinese names." The natural language parsing unit integrates a model (such as the BERT model) trained on medical corpora (such as medical dictionaries and electronic medical records), which can accurately understand that "test time" and "WBC" are specific fields, and "unify to a standard format" and "convert to their full Chinese names" are cleaning operations. The system then automatically generates the above-mentioned visual process for administrator confirmation. The natural language parsing unit has intent guidance capabilities. For example, for the ambiguous instruction "process the outliers," it will prompt: "Please specify the outlier judgment rules (such as: outside the reference range) and processing methods (such as: mark as 'outlier' or leave empty)."
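The two cleaning steps in this scenario, date formatting and dictionary mapping, might look like the sketch below once the configured rules are executed. The list of accepted input formats, the record keys `test_time` and `item_name`, and the function names are assumptions; in the described system the mapping table would come from the uploaded CSV file rather than a literal dictionary.

```python
from datetime import datetime

# Assumed source formats; a deployment would list the formats actually observed.
INPUT_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y%m%d%H%M%S"]
TARGET_FORMAT = "%Y-%m-%d %H:%M:%S"

# Abbreviation-to-full-name mapping (the WBC entry follows the example in the text).
ITEM_NAMES = {"WBC": "white blood cell count"}


def normalize_time(raw: str):
    """Return the time in the target format, or None if no known format matches."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue
    return None


def clean_record(record: dict) -> dict:
    """Apply date formatting, then dictionary mapping, to one record."""
    cleaned = dict(record)
    cleaned["test_time"] = normalize_time(record["test_time"])
    cleaned["item_name"] = ITEM_NAMES.get(record["item_name"], record["item_name"])
    return cleaned
```

Returning `None` for an unrecognized time, instead of guessing, leaves the decision about such records to a downstream verification step.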

[0065] This application achieves accurate conversion of natural language into rules through a medical-specific model, enabling non-technical personnel to quickly complete complex configurations with zero coding barriers. It solves the problem that existing technologies require professional coding for cleaning configurations, making them inaccessible to non-technical personnel, thus improving development efficiency while ensuring configuration accuracy.

[0066] Optionally, the visual data cleaning module also includes a cleaning result verification unit, which performs precise field-level verification on the cleaned data to check whether each field conforms to the preset cleaning rules (such as format specifications, no empty values, encryption compliance, correct merging / splitting, etc.). If the verification fails, the specific abnormal field and the corresponding cleaning rule are automatically located, the original data before cleaning is returned, and the error type and location are marked, and feedback is sent to the interaction layer for users to view and adjust the cleaning rules.

[0067] Optionally, the visual data cleaning module also includes a cleaning rule storage unit, which stores the user-configured cleaning rules (including rules generated by visual configuration and rules converted from natural language) along with the corresponding field information and execution logic in the storage layer, supporting the querying, reuse and batch modification of rules, and reducing repetitive configuration work.

[0068] Optionally, the system also includes a privacy computing module, which communicates with the visual data cleaning module. This module identifies privacy information in the cleaned data and applies preset privacy protection policies. The privacy computing module includes: a privacy data identification unit for automatically identifying privacy information in the data based on preset rules; a policy configuration unit for configuring various privacy protection policies; a privacy processing unit for processing the identified privacy data according to the configured policies; and a key management unit for securely managing the keys used in the privacy processing. The privacy computing module is located in both the interaction layer and the privacy computing layer.

[0069] See Figure 5 and Figure 6. Figure 5 is an overall architecture diagram of a medical data security synchronization system with a privacy computing module provided in the embodiments of this application, including a central processing module, a dynamic protocol adaptation module, a visual data cleaning module, and a privacy computing module; Figure 6 is a schematic diagram of the privacy computing module provided in an embodiment of this application. The privacy computing module provides in-depth privacy protection and includes a privacy data identification unit, a policy configuration unit, a privacy processing unit, and a key management unit.

[0070] In some embodiments of this application, the privacy data identification unit automatically identifies privacy information in the data through a preset privacy data feature library (such as 18-digit ID card number features, medical record number encoding rules, sensitive field keywords, etc.), and supports users to manually mark privacy data fields that are not automatically identified.
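A feature-library lookup of this kind can be sketched with a regular expression for 18-character ID numbers (17 digits followed by a digit or X) and a keyword list for field names. The keyword list and the function name are hypothetical, and the sketch omits the check-digit validation and the manual marking that a production unit would add.

```python
import re

# 17 digits plus a final digit or X, not embedded in a longer digit run.
ID_NUMBER_PATTERN = re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")

# Hypothetical sensitive field-name keywords.
SENSITIVE_FIELD_KEYWORDS = ("id_number", "medical_record", "phone")


def find_private_fields(record: dict) -> set:
    """Return the names of fields flagged by keyword or by value pattern."""
    flagged = set()
    for name, value in record.items():
        by_name = any(k in name.lower() for k in SENSITIVE_FIELD_KEYWORDS)
        by_value = ID_NUMBER_PATTERN.search(str(value)) is not None
        if by_name or by_value:
            flagged.add(name)
    return flagged
```

Matching on values as well as field names catches ID numbers that appear inside free-text remarks.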

[0071] In some embodiments of this application, the policy configuration unit can provide a visual interface, allowing users to configure different protection policies for different types of privacy data, including data masking (such as hiding the middle 8 digits of an ID number and the last character of a name), data anonymization (such as removing personal identification information and replacing it with a random identifier), homomorphic encryption (for scenarios where the data must be computed on directly, computation is performed while the data remains encrypted), and differential privacy (adding noise to protect individual information in the dataset), etc.
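The digit-hiding example above can be sketched as two small functions. Which eight digits count as the "middle" ones is not specified in the text; this sketch assumes the first 6 and last 4 characters of an 18-character ID number are kept, and the function names are illustrative.

```python
def mask_id_number(id_number: str, keep_head: int = 6, keep_tail: int = 4) -> str:
    """Hide the characters between the kept head and tail with asterisks."""
    hidden = len(id_number) - keep_head - keep_tail
    if hidden <= 0:
        # Too short to keep both ends; hide everything rather than leak it.
        return "*" * len(id_number)
    return id_number[:keep_head] + "*" * hidden + id_number[-keep_tail:]


def mask_name(name: str) -> str:
    """Hide the last character of a name."""
    return name[:-1] + "*" if name else name
```

With these defaults, an 18-character ID number has exactly eight characters replaced, and a three-character name keeps its first two characters.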

[0072] In some embodiments of this application, the privacy processing unit can process the identified privacy data according to the configured privacy protection policy; it can also encrypt the entire processing process to ensure that the privacy data is not leaked; and it can reversibly restore the processed data (requiring an authorized key) to meet the legitimate original data access requirements.

[0073] In some embodiments of this application, the key management unit may use an asymmetric encryption algorithm (such as RSA) to manage the keys in the privacy processing process, including key generation, storage, distribution, updating and destruction; only users with the corresponding permissions can obtain the keys, ensuring the security of privacy data.

[0074] Optionally, the privacy computing module also includes a privacy protection log unit, which records information about the entire process of privacy data identification, processing, and access, including processing time, processing method, operator, access permissions, etc., to ensure that the use of privacy data is traceable.

[0075] For example, in a scenario where data is distributed to collaborating research institutions: the privacy data identification unit automatically identifies the ID number field in structured data and the disease diagnosis description in unstructured text using regular expressions (such as matching ID card numbers) and a built-in database of medically sensitive information keywords (such as "diabetes" and "malignant tumor"). The policy configuration unit's management interface allows administrators to configure policies for data with different levels of sensitivity: "strong anonymization" is applied to ID card numbers (e.g., retaining the first 3 and last 4 digits), and "k-anonymization" is applied to diagnosis descriptions. The privacy processing unit calls the corresponding algorithm to perform operations according to the policy. The key management unit can securely manage the keys required for anonymization and encryption based on RSA asymmetric encryption algorithms or hardware security modules (HSMs) and record a full lifecycle log.

[0076] Those skilled in the art will understand that the automatic identification of privacy information, privacy protection strategies, and key management in data based on preset rules is not limited to the above implementation methods, and may also include other implementation methods. For example, in addition to the rule-and-pattern matching method described above, the automatic identification of privacy information in data based on preset rules can also employ machine learning / deep learning-based methods, or a combination of rule-and-pattern matching and machine learning / deep learning-based methods.

[0077] For example, when the data is free text (such as clinical notes or diagnostic descriptions), rule-based methods are insufficient to cover all situations. In such cases, machine learning / deep learning-based methods, such as named entity recognition (NER) and classification models, can be used to automatically identify privacy information in the data. In NER, sequence labeling models (such as Bi-LSTM-CRF and BERT-CRF) can be used to automatically identify entities such as names of people, locations, institutions, times, diseases, and body parts in the text. In this case, the privacy data identification unit can integrate a lightweight NER model based on BERT or Transformer architectures, fine-tuned using massive amounts of medical record data. This model can accurately identify medical entities such as "headache" (symptom) and "brain CT" (examination item) from the text "The patient complains of severe headache and is advised to have a brain CT scan." Classification models can categorize text fragments or entire documents according to whether they contain privacy information, for example, determining whether a piece of text is a general medical description or privacy content involving the patient's personal circumstances. For text segments that cannot be precisely delimited by entity recognition, the privacy data identification unit can use a text classification model to determine their privacy level (such as "public", "internal", "highly sensitive"). In some embodiments of this application, the privacy computing module can also be implemented using privacy protection technology based on federated learning, adopting a "data not leaving the domain, model co-training" mode, processing data locally on the external system, and only transmitting model parameters.

[0078] It is understood that the above is merely an exemplary description of the embodiments of this application, and this application does not limit it.

[0079] This application constructs a multi-dimensional privacy computing system that supports strategies such as de-identification and homomorphic encryption. It solves the problems that existing technologies rely on a single privacy protection method and are prone to data leakage, and realizes fine-grained key management and full-process traceability of privacy data, thereby strengthening the privacy and security protection of the system.

[0080] Optionally, the system also includes a full-process tracking and log management module for monitoring the data synchronization process. This module includes: a process tracking unit for collecting status information at key nodes to form a full-process tracking link; a log storage unit for storing system operation logs; a global monitoring unit for providing a visualization panel to display the core operation indicators of system tasks; and an intelligent analysis unit that integrates machine learning models and / or non-machine learning models to analyze log data to locate the root cause of anomalies and generate natural language descriptions of repair suggestions.

[0081] See Figure 7. Figure 7 is a schematic diagram of the structure of the end-to-end tracking and log management module provided in this application embodiment. The end-to-end tracking and log management module includes a process tracking unit, a log storage unit, a global monitoring unit, and an intelligent analysis unit.

[0082] In some embodiments of this application, the process tracking unit sets up embedded points at various key nodes of data synchronization (such as protocol adaptation, data cleaning, privacy processing, data transmission, and data reception) to collect information such as the execution status (such as success, failure, and in progress), execution time, and amount of data processed at each node in real time, forming a full-process tracking link.

[0083] In some embodiments of this application, the log collection unit generates detailed log data based on the information collected by the process tracking unit, including system logs (such as module startup, exception reporting), business logs (such as synchronization task creation, execution results), security logs (such as permission verification, privacy data processing), etc.; the log data is stored in a standardized format and includes fields such as timestamp, module name, operation type, data identifier, operator, and detailed description.

[0084] In some embodiments of this application, the log storage unit stores the collected log data to the log database of the storage layer, adopts a partitioned storage strategy (such as partitioning by time) to improve log storage and query efficiency; supports backup and archiving of log data to meet data retention period requirements.

[0085] In some embodiments of this application, the global monitoring unit provides a visual global monitoring panel, adopting a hierarchical layout design of "overview + details" to intuitively display the overall status of all data synchronization tasks in the current system. The top of the panel is a core indicator overview area, presenting in real-time core indicators such as the total number of tasks, the number of tasks currently executing, the number of tasks waiting to be executed, the number of successfully executed tasks, the number of problematic tasks, and the task failure rate in the form of digital cards. Each indicator card is distinguished by a different color (e.g., green for successful tasks, red for problematic tasks, and yellow for waiting tasks), and mouse hover is supported to view indicator descriptions and calculation logic. The middle section is a task status distribution area, intuitively displaying the proportion of tasks in different statuses through a pie chart. Clicking on each section of the pie chart allows for quick filtering and viewing of the task list for the corresponding status. The bottom section is a task details list area, sorted by task execution status priority. The list (with problematic tasks pinned to the top) includes fields such as task ID, task name, associated source and target systems, synchronization mode, start time, execution progress, current stage, and operation buttons. Tasks with problems are highlighted with a red border and marked with an exception indicator after the task name. It supports quick filtering and viewing by multiple dimensions such as task type, execution status, associated systems, and time range. Users can click the task details button to jump to the single task's full process tracking page. 
It also supports automatic refresh of monitoring panel data (configurable refresh frequency: 10 seconds / 30 seconds / 1 minute) and manual refresh, allowing users to have a global grasp of the task execution status.

[0086] In one embodiment of this application, the intelligent analysis unit employs a random forest classification model combined with a case-based reasoning (CBR) architecture. Its specific implementation includes: ① Model Training: Collect historical synchronization task exception logs (including 10 types of exceptions such as protocol adaptation failure and cleaning rule errors) as the training set, extract exception features (such as error codes, field names, and step identifiers) from the logs, and train a random forest model for exception type classification with an accuracy of ≥95%; ② Problem localization: Extract features from the entire process log of the problem task, input them into the model to obtain the anomaly type, and combine log timestamps and process identifiers to accurately locate the abnormal process and its root cause; ③ Problem description transformation: Based on a pre-set "technical terminology-natural language" mapping dictionary, the technical results output by the model are transformed into easy-to-understand descriptions; ④ Repair suggestion generation: Based on the CBR architecture, it retrieves solutions for similar historical anomaly cases, optimizes and generates repair suggestions based on the current anomaly scenario; it supports user feedback on repair effects and continuously updates the case library to optimize the accuracy of suggestions.
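The case-retrieval step of the CBR architecture (step ④) can be sketched as a nearest-case lookup over feature sets. The Jaccard similarity measure, the threshold, the feature encoding, and the sample cases are all assumptions made for illustration; the application does not specify how similarity is computed, and the random forest classifier of step ① is not shown.

```python
# Hypothetical case library: each case pairs exception features with a fix.
CASE_LIBRARY = [
    {
        "features": {"stage:protocol_adaptation", "error:timeout"},
        "suggestion": "Check network connectivity to the source system.",
    },
    {
        "features": {"stage:data_cleaning", "error:format", "field:test_time"},
        "suggestion": "Review the date format rule for the affected field.",
    },
]


def jaccard(a: set, b: set) -> float:
    """Share of features the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_suggestion(features: set, cases=CASE_LIBRARY, min_similarity: float = 0.3):
    """Return (suggestion, score) for the most similar case, or (None, score)."""
    best = max(cases, key=lambda case: jaccard(features, case["features"]))
    score = jaccard(features, best["features"])
    if score < min_similarity:
        return None, score
    return best["suggestion"], score
```

User feedback on a suggestion would be handled by appending a new case to the library, which is how the case base improves over time.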

[0087] For example, in a scenario where a task execution fails: the process tracking unit embeds tracking points at key nodes such as protocol adaptation and data cleaning, generating a unique tracking ID for each synchronization task and collecting status and execution time in real time. When the log storage unit (e.g., using Elasticsearch) records that a task times out during the privacy computation phase, the global monitoring unit's visualization dashboard immediately issues an alert and pins the highlighted task to the top. The intelligent analysis unit (e.g., integrating an Isolation Forest algorithm) automatically analyzes the entire log of the task, pinpointing the root cause as "the privacy computation module experiencing a memory overflow while processing an extremely long pathological text," and generates a remediation suggestion: "It is recommended to add length validation or chunked processing logic to the text fields."

[0088] Those skilled in the art will understand that, in addition to the exemplary descriptions above, the process tracking unit, log storage unit, global monitoring unit, and intelligent analysis unit can be implemented using other technologies. For example, the process tracking unit can be implemented based on the OpenTelemetry specification. Whenever a data synchronization task is triggered, this unit generates a globally unique TraceId. When the task reaches key nodes such as protocol adaptation, data cleaning, and privacy computation, a corresponding Span is created, and its start time, end time, tags (e.g., external.system=HIS), and logs (e.g., error=connection timeout) are recorded. All these Spans are associated through the TraceId, forming a complete end-to-end tracing chain. The process tracking unit automatically collects node status information by embedding tracing code (instrumentation points) at the key function entry and exit points of core components such as the dynamic protocol adaptation module and data cleaning module. For HTTP / RPC calls, this unit automatically injects and extracts tracing context information (such as TraceId and SpanId) to ensure that cross-process and cross-service call chains are not interrupted.
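The TraceId and Span relationship can be illustrated with a small standard-library sketch. It mimics the concepts only and does not use the OpenTelemetry SDK; the class name `Trace`, the record layout, and the tag names are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager


class Trace:
    """Collects span records that share one trace ID for a synchronization task."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex  # 32 hex characters
        self.spans = []

    @contextmanager
    def span(self, name: str, **tags):
        record = {
            "trace_id": self.trace_id,
            "span_id": uuid.uuid4().hex[:16],
            "name": name,
            "tags": tags,
            "start": time.time(),
            "end": None,
            "error": None,
        }
        self.spans.append(record)
        try:
            yield record
        except Exception as exc:
            record["error"] = str(exc)  # keep the failure on the span
            raise
        finally:
            record["end"] = time.time()
```

A task would wrap each key node in `with trace.span("data_cleaning", external_system="HIS"):`, so that a failure is recorded on the span where it occurred and still propagates to the caller.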

[0089] For example, log collection and transmission can be implemented using Fluentd, Filebeat, Logstash, etc. For instance, a Filebeat agent can be run on each system deployment node. This agent monitors changes to application log files, collects new logs in real time, and forwards them to the log storage unit. The log storage unit uses an Elasticsearch cluster as its core storage engine. All received log data, including business logs, performance metrics, and trace data, is indexed, supporting millisecond-level retrieval by TraceId, time range, log level, module name, and other dimensions. To further ensure system stability, an Apache Kafka message queue can be deployed between the log collection agent and Elasticsearch as a high-throughput data buffer that absorbs traffic peaks and evens out the load (peak shaving and valley filling).

[0090] For example, the global monitoring unit integrates Grafana or Kibana as its visualization engine. It provides hierarchical monitoring dashboards: the overview page dynamically displays key global metrics in the form of colored number cards, such as "Total tasks today: 150", "Failed tasks: 2", and "System health: 98%"; the central pie chart shows the task status distribution (success, failure, in progress); and the lower details list displays detailed information for each task in real time, highlights abnormal tasks, and supports drill-down to view full-link tracing details.

[0091] In some embodiments of this application, the intelligent analysis unit can integrate machine learning models such as Isolation Forest, Local Outlier Factor, or time-series-based prediction models (e.g., Prophet, LSTM), or non-machine learning models such as rule-based models (e.g., expert systems), case-based reasoning models, or hybrid models combining machine learning and non-machine learning models, such as the aforementioned random forest classification model combined with case-based reasoning (CBR) architecture. For example, the intelligent analysis unit integrates an Isolation Forest unsupervised learning model to monitor metrics such as task execution time and CPU / memory consumption in real time. When the execution time of a data synchronization task deviates significantly from the historical normal pattern, the model will identify it as abnormal and automatically issue an early warning, without relying on manually setting fixed thresholds.

[0092] In some embodiments of this application, the end-to-end tracking and log management module can be implemented using a distributed tracking, log management, and intelligent analysis technology stack. The process tracking unit implements end-to-end tracking based on the OpenTelemetry specification; the log storage unit uses an Elasticsearch cluster and collects logs through a Filebeat agent to achieve the storage and retrieval of massive amounts of data; the global monitoring unit integrates the Grafana visualization platform to provide real-time, multi-dimensional system monitoring dashboards and alarms; and the intelligent analysis unit integrates machine learning models such as Isolation Forest to achieve automatic anomaly detection, root cause localization, and intelligent repair suggestion generation, thereby upgrading the operation and maintenance mode from "passive firefighting" to "proactive early warning and intelligent diagnosis."

[0093] This application addresses the sparse logging and inefficient troubleshooting of existing technologies through full-process data tracking, a visual monitoring dashboard, and AI-based analysis. It enables rapid problem location and accurate repair, reduces troubleshooting time, lowers the difficulty of operation and maintenance, and provides a reliable basis for security auditing.

[0094] Optionally, the end-to-end tracking and log management module also includes a log query unit. The log query unit provides a visual log query interface that allows users to run exact and fuzzy queries based on various conditions (such as time range, module name, data identifier, and operation type); it also supports the export and printing of log data, facilitating troubleshooting and auditing.

[0095] Optionally, the end-to-end tracking and log management module also includes a log analysis unit. The log analysis unit automatically analyzes log data, identifies anomalies during the data synchronization process (such as synchronization failure, processing delay, and permission abnormalities), and promptly sends alarm notifications to users (such as pop-ups, emails, and SMS messages); it also supports the generation of log statistical reports to intuitively display the overall status of data synchronization (such as the number of daily synchronization tasks, success rate, and execution efficiency of each module).

[0096] Optionally, the system also includes a service publishing and concurrency control module, used to publish the processed data externally in the form of standardized service interfaces and to perform concurrent traffic control on requests to access those interfaces. Figure 8 is a schematic diagram of the service publishing and concurrency control module provided in this application embodiment. The service publishing and concurrency control module includes a service configuration unit, a service publishing unit, a concurrency control unit, a service monitoring unit, and a permission verification unit. The service configuration unit provides a visual service configuration interface, allowing users to select the data resources to be published (such as basic patient information and test results) and to set service names, service types (such as RESTful API and WebService), access permissions (such as public or authorized access), and data return formats (such as JSON and XML). The service publishing unit automatically generates standardized service interfaces based on the user-configured parameters and registers the service information with the service registry. It supports dynamic publishing, updating, and decommissioning of services without requiring a system restart, and it automatically generates service documentation for easy integration by external application developers. The concurrency control unit uses a token bucket rate limiting algorithm, presetting the number of requests that can be processed per second (the token generation rate) based on the system's processing capacity. When an external application requests a service, it must first obtain a token; only requests holding a token are processed. For requests exceeding the system's processing capacity, the system automatically returns a queuing prompt or rejects the request to avoid system congestion. The concurrency control unit also supports dynamic adjustment of rate limiting parameters based on service type and request source. The service monitoring unit monitors the operational status of the published services in real time, including metrics such as request volume, response time, success rate, and error codes. When service anomalies occur (such as excessively high response latency or a low success rate), it automatically sends alarm notifications, and it supports the generation of service operation reports that provide data support for system optimization. The permission verification unit authenticates external applications accessing the published services and controls their permissions. Only applications that have passed authentication and hold the corresponding access permissions can call a service. It supports multiple authentication methods (such as API keys and OAuth 2.0) to ensure the security of service access. In some embodiments of this application, the concurrency control unit can instead be implemented using queue-based concurrency control: requests are processed on a first-in, first-out basis through a request queue, and if the queue is full, new requests are rejected.
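The queue-based alternative mentioned at the end of the paragraph above can be sketched as follows. This is a minimal illustration built on Python's standard `queue.Queue`; the class name `BoundedRequestQueue` and its methods are hypothetical and do not come from this application.

```python
import queue


class BoundedRequestQueue:
    """First-in, first-out request queue that rejects new requests when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            # queue.Queue treats maxsize <= 0 as unbounded, which would defeat the limit.
            raise ValueError("capacity must be positive")
        self._queue = queue.Queue(maxsize=capacity)

    def submit(self, request):
        """Return True if the request was queued, False if it was rejected."""
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            return False

    def next_request(self):
        """Return the oldest queued request, or None if the queue is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


# Usage: a queue with room for two pending requests.
pending = BoundedRequestQueue(capacity=2)
accepted = [pending.submit(name) for name in ("req-a", "req-b", "req-c")]
```

Rejecting immediately when the queue is full keeps latency bounded for the requests that were accepted, at the cost of asking callers to retry later.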

[0097] For example, suppose a hospital wants to make its anonymized data available to its internal research platform. The service publishing unit automatically generates standard API interfaces and related documentation based on the administrator's configuration. The permission verification unit requires callers to provide API keys for authentication. The concurrency control unit uses a token bucket algorithm to limit the request rate to, for example, 100 or 120 requests per second. Requests exceeding this limit are queued or rejected to ensure the stability of the backend system.
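A token bucket limiter of the kind used in this example can be sketched as below. This is a minimal single-threaded illustration, assuming a rate of 100 tokens per second; the class name, the `allow` method, and the injectable `clock` parameter are hypothetical and are included only to make the behavior easy to demonstrate.

```python
import time


class TokenBucket:
    """Token bucket: tokens refill at `rate` per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # token generation rate (requests per second)
        self.capacity = capacity    # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Consume one token if available; return whether the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill according to elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# Usage with a fake clock so the outcome does not depend on wall time.
fake_now = [0.0]
bucket = TokenBucket(rate=100, capacity=100, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(120)]   # 120 requests arrive at once
fake_now[0] = 0.5                              # half a second passes
later = [bucket.allow() for _ in range(120)]
```

In this run the first 100 requests of the burst obtain a token and the remaining 20 do not; a caller would then queue or reject them as the text describes. Changing `rate` at runtime is one simple way to realize dynamic adjustment per service type or request source.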

[0098] Through the service publishing and concurrency control module, this application addresses the inability of existing technologies to publish data services externally and their poor reusability. The token bucket rate limiting algorithm, with dynamically adjustable parameters, keeps the system stable in high-concurrency scenarios, and permission verification together with concurrency control ensures service security and stability, thereby enhancing the reuse value of the data.

[0099] As shown in Figure 9, the following is an exemplary description of how a user uses the medical data security synchronization system provided in the embodiments of this application.

[0100] S1 System Access Configuration: Users enter external system information, configure protocol adaptation (for example, automatic detection supplemented by manual input), and configure the synchronization range through a visual interface.

[0101] S2 Data Cleaning Rule Configuration: Users configure rules by dragging and dropping components or by entering natural language commands. The system then parses the commands using NLP, converts them into standardized rules, and stores them.

[0102] S3 Privacy Protection Policy Configuration: Users can configure privacy data identification rules and protection policies, and set key access permissions.

[0103] S4 Synchronization Task Creation and Scheduling: Users create tasks and set parameters such as synchronization mode and frequency, and the system automatically schedules and executes them.

[0104] S5 Dynamic Protocol Adaptation and Data Acquisition: The system calls the protocol template to establish communication, parses the data, and completes the acquisition.

[0105] S6 Data Cleaning Processing: The system performs cleaning according to the rules and then performs field-level validation. Any abnormalities are marked and reported.

[0106] S7 Privacy Computing Processing: The system automatically identifies privacy data and executes protection policies, while simultaneously recording processing logs and managing keys.

[0107] S8 Data Transmission and Adaptation: The system converts the processed data into a format compatible with the target system and transmits it through an encrypted channel.

[0108] S9 Full-Process Tracking and Log Recording: The system tracks the status of each stage in real time, collects and stores logs, and issues alarms when anomalies occur.

[0109] S10 Dynamic Service Deployment (Optional): Users configure service parameters, the system publishes standardized interfaces, and rate limiting and security monitoring are implemented simultaneously.

[0110] S11 Synchronization Result Feedback and Verification: The target system returns a receipt confirmation. If successful, the task ends; if it fails, a retry is triggered. After multiple failures, an alarm is recorded.
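The feedback loop in step S11 can be sketched as follows. This is a minimal illustration: the `send` callable, the limit of three attempts, and the exponential backoff between retries are assumptions made for the example, since the text above only states that a retry is triggered and that an alarm is recorded after multiple failures.

```python
import time


def sync_with_retry(send, payload, max_attempts=3, base_delay=1.0,
                    sleep=time.sleep, alarm=print):
    """Send payload until the target acknowledges it or attempts run out.

    `send` is a hypothetical callable that returns True when the target
    system confirms receipt and False (or raises ConnectionError) otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send(payload):
                return True  # receipt confirmed: the task ends
        except ConnectionError:
            pass  # treat a transport error like a missing receipt
        if attempt < max_attempts:
            # Assumed policy: wait 1x, 2x, 4x ... base_delay between retries.
            sleep(base_delay * 2 ** (attempt - 1))
    # All attempts failed: record an alarm and report failure.
    alarm(f"sync failed after {max_attempts} attempts: {payload!r}")
    return False
```

Passing `sleep` and `alarm` as parameters keeps the function easy to exercise in tests and lets a deployment route alarms to its own logging or notification channel.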

[0111] This application provides a medical data security synchronization system and method, which enables data synchronization between different medical information systems, improves data adaptability and flexibility, reduces development costs, and enhances system security.

[0112] Another aspect of this application provides a method for secure synchronization of medical data, applied to the aforementioned system. The method includes: a system access and protocol adaptation step: identifying the communication protocol of an external information system and completing data acquisition through the dynamic protocol adaptation module; a data cleaning step: cleaning the data through the visual data cleaning module according to rules configured in a non-programming manner; a privacy computing step: identifying privacy information in the data and applying privacy protection policies through the privacy computing module; and a data synchronization and service publishing step: synchronizing the processed data to the target system, and / or publishing the processed data externally in the form of a standardized service interface through the service publishing and concurrency control module.

[0113] In this embodiment of the application, the system access and protocol adaptation step includes: first, attempting to identify the protocol type of the external information system by automatic detection; and, if automatic detection fails, receiving protocol parameters manually entered by the user to complete the adaptation configuration.
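The "automatic detection first, manual parameters as fallback" flow described above can be sketched as follows. This is a hedged illustration only: the detection heuristics (an HL7 v2 message beginning with the `MSH|` segment, a JSON payload, an XML payload), the function names, and the shape of the manual configuration are assumptions and are not specified in this passage.

```python
import json


def detect_protocol(sample):
    """Guess the payload format from a text sample; return None if unknown."""
    text = sample.lstrip()
    if text.startswith("MSH|"):
        return "HL7v2"  # HL7 v2 messages begin with the MSH header segment
    if text.startswith("<"):
        return "XML"
    if text.startswith(("{", "[")):
        try:
            json.loads(text)
            return "JSON"
        except ValueError:
            return None
    return None


def resolve_protocol(sample, manual_config=None):
    """Use automatic detection; fall back to user-supplied parameters."""
    detected = detect_protocol(sample)
    if detected is not None:
        return {"protocol": detected, "source": "auto"}
    if manual_config is None:
        raise ValueError("automatic detection failed; manual parameters required")
    return {**manual_config, "source": "manual"}
```

For example, `resolve_protocol("MSH|^~\\&|HIS|HOSP")` resolves automatically, while an unrecognized sample is resolved only when the caller supplies parameters such as `{"protocol": "custom-tcp", "port": 9000}`.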

[0114] In this embodiment of the application, configuring rules in a non-programming manner includes: parsing the natural language instructions input by the user into executable data cleaning rules through a natural language parsing unit, and providing a visual preview of the rules before execution for the user to confirm or adjust.

[0115] Referring to Figure 10, the medical data secure synchronization method based on the above system provided in this application includes the following steps. System access and protocol adaptation step. When an external system connects to the aforementioned medical data security synchronization system, the central processing module initiates a synchronization task. It first detects the protocol of the external system automatically using the protocol detection unit of the dynamic protocol adaptation module. If automatic detection fails, it receives protocol parameters manually configured by the administrator through the management interface. Once the protocol has been adapted, it establishes a connection and collects data.

[0116] Data cleaning steps. Data flows into the visual data cleaning module. Administrators configure rules in a non-programming manner, such as through natural language commands or drag-and-drop components. After parsing the natural language commands, the system provides a visual preview of the rules for confirmation, and then the cleaning execution unit performs the cleaning.
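One way to picture rules that are configured rather than programmed is as plain data that the system previews and then executes. The sketch below assumes a hypothetical rule format (`field`, `op`, and optional parameters) and three illustrative operations; it does not attempt the natural language parsing step, and none of these names come from this application.

```python
# Hypothetical cleaning components, keyed by the operation name used in a rule.
OPERATIONS = {
    "trim": lambda value, rule: value.strip(),
    "map": lambda value, rule: rule["mapping"].get(value, value),
    "default": lambda value, rule: rule["value"] if value in (None, "") else value,
}


def preview_rules(rules):
    """Return a human-readable summary of the rules for user confirmation."""
    return [
        f"{i}. apply '{rule['op']}' to field '{rule['field']}'"
        for i, rule in enumerate(rules, 1)
    ]


def apply_rules(record, rules):
    """Apply the confirmed rules in order and return a cleaned copy of the record."""
    cleaned = dict(record)
    for rule in rules:
        field = rule["field"]
        if field in cleaned:
            cleaned[field] = OPERATIONS[rule["op"]](cleaned[field], rule)
    return cleaned


# Rules as they might be stored after drag-and-drop or command parsing.
rules = [
    {"field": "name", "op": "trim"},
    {"field": "gender", "op": "map", "mapping": {"M": "male", "F": "female"}},
    {"field": "dept", "op": "default", "value": "unknown"},
]
record = {"name": "  Zhang San ", "gender": "M", "dept": ""}
preview = preview_rules(rules)
cleaned = apply_rules(record, rules)
```

Keeping rules as data is what makes a preview possible: the same list that is shown to the administrator is the one the execution step runs.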

[0117] Privacy computing step. The cleaned data enters the privacy computing module, which identifies privacy information and applies the configured protection policies.
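Rule-based identification followed by masking can be sketched as below. The two patterns (an 11-digit mobile number beginning with 1 and an 18-character ID number) and the "keep the first three and last four characters" masking policy are assumptions chosen for the example; this passage does not specify the identification rules or the protection policies.

```python
import re

# Hypothetical identification rules: label -> compiled pattern.
PRIVACY_PATTERNS = {
    "id_number": re.compile(r"\b\d{17}[\dXx]\b"),
    "mobile": re.compile(r"\b1\d{10}\b"),
}


def find_private(text):
    """Return (label, matched value) pairs for every identified item."""
    found = []
    for label, pattern in PRIVACY_PATTERNS.items():
        found.extend((label, m.group(0)) for m in pattern.finditer(text))
    return found


def _mask(match):
    """Keep the first 3 and last 4 characters; replace the rest with asterisks."""
    value = match.group(0)
    return value[:3] + "*" * (len(value) - 7) + value[-4:]


def mask_text(text):
    """Apply the masking policy to every identified item in the text."""
    for pattern in PRIVACY_PATTERNS.values():
        text = pattern.sub(_mask, text)
    return text


note = "Contact 13812345678, ID 123456789012345678."
items = find_private(note)
masked = mask_text(note)
```

Masking is only one possible policy; encryption or generalization could be substituted for `_mask` without changing the identification step.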

[0118] Data synchronization and service deployment steps. The processed data is synchronized to the target system by the central processing module.

[0119] Furthermore, the medical data security synchronization method in this application embodiment can publish data as a standardized API service for authorized applications to call through the service publishing and concurrency control module.

[0120] The medical data secure synchronization method provided in this application also has the beneficial effects and advantages of a medical data secure synchronization system, which will not be elaborated upon here.

[0121] The above are merely exemplary descriptions of the embodiments of this application. It should be noted that the embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

[0122] In the foregoing description of this specification, references to terms such as "one embodiment," "another embodiment," or "some embodiments," etc., indicate that a specific feature, structure, material, or characteristic described in connection with an embodiment is included in at least one embodiment of this application. In this specification, illustrative expressions of the above terms do not necessarily refer to the same embodiment. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments.

[0123] Although embodiments of this application have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of this application, the scope of which is defined by the claims and their equivalents.

[0124] The above is merely an exemplary description of the embodiments of this application. This application is not limited to the embodiments described. Those skilled in the art can make equivalent modifications or substitutions without departing from the spirit of this application. All such equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims

1. A medical data secure synchronization system for synchronizing data between different medical information systems, characterized in that the system includes: a central processing module, configured to be responsible for task scheduling, coordination between functional modules, and management of system resources; a dynamic protocol adaptation module, communicatively connected to an external information system connected to the system, used to identify the communication protocol used by the external information system through automatic detection or received configuration and, based on the identified communication protocol, to call the corresponding protocol template to parse and convert the transmitted data; and a visual data cleaning module, communicatively connected to the dynamic protocol adaptation module and used to clean the data after protocol adaptation, wherein the visual data cleaning module provides a graphical interface and supports configuring data cleaning rules in a non-programming manner.

2. The system according to claim 1, characterized in that the dynamic protocol adaptation module includes: a protocol detection unit, used to automatically identify the protocol type; an extensible protocol template library, used to store various standard communication protocol templates and to support the addition of custom protocol templates; a protocol parsing unit, used to call the corresponding template to parse the data based on the identified protocol type; and a protocol adaptation unit, used to convert data into a protocol format compatible with the target system.

3. The system according to claim 1, characterized in that the visual data cleaning module includes: a cleaning component library, containing multiple predefined data cleaning components corresponding to different cleaning functions; a visual data cleaning component configuration unit, providing a graphical operation interface for configuring data cleaning rules by assembling the data cleaning components; a natural language parsing unit, integrating a semantic understanding model to parse user-input natural language commands into executable data cleaning rules; a cleaning execution unit, used to perform data cleaning operations according to the configured rules; and a cleaning result verification unit, used to verify the correctness of the cleaned data.

4. The system according to claim 3, characterized in that the natural language parsing unit is a dedicated model trained on a medical corpus, capable of parsing the semantic association between medical-specific fields and cleaning operations, and supporting error correction and intent guidance for ambiguous commands.

5. The system according to claim 1, characterized in that the system also includes a privacy computing module, communicatively connected to the visual data cleaning module and used to identify privacy information in the cleaned data and apply a preset privacy protection policy, the privacy computing module including: a privacy data identification unit, used to automatically identify privacy information in data based on preset rules; a policy configuration unit, used to configure various privacy protection policies; a privacy processing unit, used to process the identified privacy data according to the configured policy; and a key management unit, used for the secure management of keys used in the privacy processing process.

6. The system according to any one of claims 1 to 5, characterized in that the system also includes a full-process tracking and log management module for monitoring the data synchronization process, the full-process tracking and log management module including: a process tracking unit, configured to collect status information at key nodes to form a full-process tracking link; a log collection unit, configured to generate detailed log data based on the information collected by the process tracking unit; a log storage unit, configured to store the collected log data; a global monitoring unit, providing a visual dashboard to display the core operational metrics of system tasks; and an intelligent analysis unit, integrating machine learning and / or non-machine learning models to analyze log data in order to pinpoint the root cause of anomalies and generate remediation suggestions in natural language.

7. The system according to any one of claims 1 to 5, characterized in that the system also includes a service publishing and concurrency control module, which publishes the processed data externally in the form of a standardized service interface and performs concurrent traffic control on requests to access the service interface.

8. A method for securely synchronizing medical data, applied to the system according to any one of claims 1 to 7, characterized in that the method includes: a system access and protocol adaptation step: identifying the communication protocol of the external information system and completing data acquisition through the dynamic protocol adaptation module; a data cleaning step: cleaning the data through the visual data cleaning module based on rules configured in a non-programming manner; a privacy computing step: identifying privacy information in the data and applying privacy protection policies through the privacy computing module; and a data synchronization and service publishing step: synchronizing the processed data to the target system, and / or publishing the processed data externally in the form of a standardized service interface through the service publishing and concurrency control module.

9. The method according to claim 8, characterized in that the system access and protocol adaptation step includes: first, attempting to identify the protocol type of the external information system by automatic detection; and, if automatic detection fails, receiving protocol parameters manually entered by the user to complete the adaptation configuration.

10. The method according to claim 8 or 9, characterized in that configuring rules in a non-programming manner includes: using a natural language parsing unit to parse user-input natural language commands into executable data cleaning rules, and providing a visual preview of the rules for user confirmation or adjustment before execution.