Blockchain and AI-based medical industry data asset integration system and method

By integrating blockchain and AI, the system addresses the problem of isolated medical data, enabling data cleansing, dynamic access control, and cross-domain collaborative modeling. It also optimizes logistics routes, improves data utilization efficiency and security, and unlocks the value of data assets.

CN120951383B · Active · Publication Date: 2026-06-23 · HUNAN PHARMACEUTICAL INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HUNAN PHARMACEUTICAL INFORMATION TECHNOLOGY CO LTD
Filing Date: 2025-08-06
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Medical data is siloed among hospitals, pharmaceutical companies, and distribution companies, and no unified integration mechanism exists. As a result, data is difficult to use across sectors, which hinders the optimal allocation of medical resources and the release of data asset value.

Method used

The invention provides a blockchain and AI-based medical industry data asset integration system comprising a data acquisition layer, a blockchain rights confirmation layer, a federated learning analysis layer, a data asset trading platform, and a supply chain traceability optimization module. Together these provide data cleaning, dynamic access control, cross-domain collaborative modeling, and logistics route optimization.

Benefits of technology

Break down medical data silos, achieve secure integration and efficient utilization of cross-domain data, unlock the value of data assets, and improve data utilization efficiency and security.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application relates to the technical field of medical data integration, and in particular to a blockchain and AI-based data asset integration system and method for the entire medical industry. The system comprises a data acquisition layer, a blockchain rights confirmation layer, a federated learning analysis layer, a data asset trading platform, and a supply chain traceability optimization module. The data acquisition layer comprises edge computing nodes with a built-in data cleaning engine and desensitization algorithm, used for local cleaning and desensitization of hospital HIS system data, drug RFID data, pharmaceutical R&D experimental data, and medical imaging data. The blockchain rights confirmation layer constructs a medical data asset ledger based on a consortium chain architecture; the ledger comprises data hash records, contributor identity identifiers, and usage authorization logs, and integrates a smart contract module for dynamic permission management. The present application can break down medical data silos and achieve secure integration and efficient utilization of cross-domain data.

Description

Technical Field

[0001] This invention relates to the field of medical data integration technology, and in particular to a system and method for integrating medical data assets across the entire industry based on blockchain and AI.

Background Technology

[0002] In the entire healthcare industry chain, data is a core resource, and its circulation and integration directly impact the efficiency of healthcare services, the progress of pharmaceutical R&D, and the stability of the supply chain. However, under current technology, healthcare data has long been in a state of "siloed" existence: clinical diagnosis and treatment data from hospitals, R&D experimental data from pharmaceutical companies, and inventory and logistics data from medical distribution companies are stored in their respective independent systems, with significant differences in data formats, standards, and management rules, lacking a unified integration mechanism. This siloing makes it difficult to utilize data across different sectors: pharmaceutical companies cannot obtain timely feedback on clinical efficacy to optimize R&D directions, hospitals cannot predict drug supply based on distribution data, and distribution companies cannot dynamically adjust inventory according to clinical needs. At the same time, data silos prevent cross-entity joint analysis, leaving massive amounts of valuable data idle due to the lack of linkage, which not only restricts the optimal allocation of healthcare resources but also hinders the release of the value of healthcare data assets.

[0003] Based on the above problems, there is an urgent need for a technical solution that can break down medical data silos and achieve secure integration and efficient utilization of cross-domain data.

Summary of the Invention

[0004] The purpose of this invention is to address the shortcomings of existing technologies by proposing a data asset integration system for the entire medical industry based on blockchain and AI. The system includes a data acquisition layer, a blockchain rights confirmation layer, a federated learning analysis layer, a data asset trading platform, and a supply chain traceability optimization module.

[0005] The data acquisition layer includes edge computing nodes, which have built-in data cleaning engines and desensitization algorithms for localized cleaning and desensitization of hospital HIS system data, drug RFID data, pharmaceutical R&D experimental data and medical imaging data.

[0006] The blockchain ownership confirmation layer constructs a medical data asset ledger based on a consortium blockchain architecture. The asset ledger includes data hash records, contributor identity identifiers, and usage authorization logs, and integrates a smart contract module for dynamic permission management.

[0007] The federated learning analysis layer is configured with a distributed AI training framework, which is used to train drug demand prediction models and adverse reaction monitoring models in conjunction with nodes in medical distribution, pharmaceutical manufacturing and health care scenarios, without transmitting the original data across domains.

[0008] The data asset trading platform is built on a tokenization mechanism and integrates a multi-party secure computing module so that, when data usage rights are traded, permissions are transferred while the original data remains invisible.

[0009] The supply chain traceability optimization module includes an RFID data parsing unit and an AI logistics scheduling engine, which are used to optimize logistics routes by combining on-chain drug batch records and real-time inventory data.

[0010] Preferably, the blockchain rights confirmation layer includes a digital fingerprint generator, a zero-knowledge proof verification unit, a smart contract executor, and a dual-chain communication interface;

[0011] After data collection is completed, the digital fingerprint generator applies the SHA-256 hash algorithm to the cleaned medical data and generates a unique digital fingerprint containing a data content digest, a collection timestamp, and the hardware identifier of the collection terminal.

[0012] The zero-knowledge proof verification unit has a built-in verification algorithm, which is used to verify that the data has not been tampered with and that its source is authentic by comparing the digital fingerprint generated by the data being called with the on-chain evidence fingerprint during the data calling phase.

[0013] The smart contract executor stores a data usage agreement, which presets the data call permission range, the single call revenue sharing ratio, and the cumulative call revenue sharing threshold. When the data call behavior meets the preset conditions of the agreement, a revenue sharing voucher containing the caller's identifier, call time, and revenue sharing amount is automatically generated.

[0014] The dual-chain communication interface is used to realize cross-chain data synchronization between the evidence storage chain and the transaction chain. The evidence storage chain is a consortium chain used to persistently store digital fingerprints, contributor public keys, and zero-knowledge proof verification records. The transaction chain is a permissioned private chain used to store smart contracts, revenue sharing vouchers, and permission change records.
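
The fingerprint generation and call-time comparison described in [0011] and [0012] can be illustrated with a minimal Python sketch. It assumes the fingerprint is a SHA-256 hash over the content digest, timestamp, and terminal identifier serialized as JSON; the field names and the functions make_fingerprint and matches_on_chain are hypothetical. The check shown is a plain hash comparison, not a zero-knowledge proof, because the text does not specify the proof protocol.

```python
import hashlib
import hmac
import json


def make_fingerprint(cleaned_data: bytes, collected_at: str, terminal_id: str) -> dict:
    """Build a fingerprint record from content digest, timestamp and terminal ID."""
    content_digest = hashlib.sha256(cleaned_data).hexdigest()
    # Serialize the three fields deterministically before hashing them together.
    payload = json.dumps(
        {"digest": content_digest, "collected_at": collected_at, "terminal_id": terminal_id},
        sort_keys=True,
    ).encode("utf-8")
    return {
        "content_digest": content_digest,
        "collected_at": collected_at,
        "terminal_id": terminal_id,
        "fingerprint": hashlib.sha256(payload).hexdigest(),
    }


def matches_on_chain(called_data: bytes, on_chain: dict) -> bool:
    """Recompute the fingerprint for the data being called and compare it."""
    recomputed = make_fingerprint(called_data, on_chain["collected_at"], on_chain["terminal_id"])
    return hmac.compare_digest(recomputed["fingerprint"], on_chain["fingerprint"])


record = make_fingerprint(b"patient_id=***;drug=A;qty=2", "2025-08-06T09:00:00Z", "EDGE-NODE-01")
print(matches_on_chain(b"patient_id=***;drug=A;qty=2", record))  # untampered data
print(matches_on_chain(b"patient_id=***;drug=A;qty=9", record))  # modified data
```

Binding the timestamp and terminal identifier into the hash means that changing any one of the three fields changes the fingerprint, which is what lets the comparison double as a source check.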

[0015] More preferably, the federated learning analysis layer includes a cross-domain model training engine, a heterogeneous data alignment unit, and a differential privacy protection module;

[0016] The cross-domain model training engine includes a parameter server and a node training client. The parameter server is used to receive and aggregate the model parameter gradients uploaded by each node. The node training client is deployed at pharmaceutical company and hospital terminals and is used to perform model training based on local data.

[0017] The heterogeneous data alignment unit incorporates a pre-trained feature extraction model, which includes a medical image feature extraction sub-model and a pathological text parsing sub-model. The medical image feature extraction sub-model uses a convolutional neural network structure to extract lesion feature vectors from unstructured medical images. The pathological text parsing sub-model uses a BERT pre-trained model to convert structured pathology reports into text feature vectors. A feature mapping algorithm then maps the two types of vectors to a unified 512-dimensional feature space. The differential privacy protection module is equipped with a noise generator that adds Laplacian noise to the model parameter gradient before it is uploaded. The noise intensity is dynamically adjusted according to data sensitivity: the noise coefficient is set to 0.3 for clinical case data and 0.5 for pharmaceutical R&D experimental data.
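
The noise step of the differential privacy protection module might look like the following sketch. It assumes the stated noise coefficients (0.3 and 0.5) act as the scale parameter of a zero-mean Laplace distribution, which the text does not state explicitly; the names NOISE_COEFFICIENT, laplace_sample, and add_noise are hypothetical. Only the Python standard library is used.

```python
import random

# Noise coefficients taken from the text; treating them as the Laplace scale
# parameter is an assumption of this sketch.
NOISE_COEFFICIENT = {"clinical_case": 0.3, "rd_experiment": 0.5}


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_noise(gradient: list[float], data_type: str, rng: random.Random) -> list[float]:
    """Return a noised copy of the gradient; the input list is left unchanged."""
    scale = NOISE_COEFFICIENT[data_type]
    return [g + laplace_sample(scale, rng) for g in gradient]


rng = random.Random(42)
gradient = [0.12, -0.40, 0.05]
noised = add_noise(gradient, "clinical_case", rng)
print(noised)
```

A node would call add_noise on its local gradient immediately before upload, so the parameter server only ever sees the perturbed values.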

[0018] More preferably, the data asset trading platform includes a token issuance module, a transaction matching engine, a revenue sharing and settlement engine, and a transaction record storage module;

[0019] The Token issuance module is used to generate corresponding Tokens according to the data access permission level. The Token contains an encrypted fragment of the data access key, the validity period of the permission, and the analyzable dimension information. The Token for clinical trials contains 3 analysis dimensions and has a validity period of 30 days, while the Token for basic statistics contains 1 analysis dimension and has a validity period of 15 days.

[0020] The transaction matching engine is used to match the token purchase request of the data demander with the token sale information of the data provider, and automatically generate transaction orders based on the preset price range and permission requirements of both parties.

[0021] The revenue sharing and settlement engine is connected to the smart contract executor of the blockchain rights confirmation layer through an API interface. It is used to receive revenue sharing vouchers and convert revenue sharing amount into cashable assets. The revenue sharing amount is calculated based on data contribution, number of calls and weight of analysis dimensions.

[0022] The transaction record storage module is used to upload transaction orders, token transfer records and settlement results to the transaction chain to form a complete transaction traceability chain, ensuring that the transaction process is traceable and tamper-proof.
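
As an illustration of the Token issuance rules in [0019], the sketch below models a token as an immutable record whose analysis dimensions and validity period follow from its permission level. The two levels and their values (3 dimensions for 30 days, 1 dimension for 15 days) come from the text; the class DataToken, the function issue_token, and the placeholder key fragment are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Permission levels from the text: (analysis dimensions, validity in days).
TOKEN_LEVELS = {"clinical_trial": (3, 30), "basic_statistics": (1, 15)}


@dataclass(frozen=True)
class DataToken:
    level: str
    key_fragment: bytes        # encrypted fragment of the data access key
    analysis_dimensions: int
    expires_at: datetime


def issue_token(level: str, key_fragment: bytes, issued_at: datetime) -> DataToken:
    """Create a token whose dimensions and validity follow the permission level."""
    dimensions, valid_days = TOKEN_LEVELS[level]
    return DataToken(level, key_fragment, dimensions, issued_at + timedelta(days=valid_days))


issued = datetime(2025, 8, 6, tzinfo=timezone.utc)
token = issue_token("clinical_trial", b"<encrypted-fragment>", issued)
print(token.analysis_dimensions, token.expires_at.date())
```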

[0023] More preferably, the blockchain rights confirmation layer calculates the contribution coefficient of each data source using a multi-source data contribution coefficient formula, the expression of which is:

[0024] [Formula not reproduced in the source text.]

[0025] where the terms are, in order (the symbols themselves are not reproduced in the source text): the contribution coefficient of a given class of data source; the number of valid samples from that class of data source; the total number of valid samples from all data sources participating in the joint analysis; the quality coefficient of that class of data; the timeliness coefficient of that class of data, calculated from the time interval between data generation and the present and the maximum number of valid days for that class of data; the domain urgency coefficient of that class of data; the domain coefficient of that class of data; and the weighting coefficients, together with a condition on them and their values, none of which are reproduced.

[0026] More preferably, the heterogeneous data alignment unit calculates feature alignment weights using a feature alignment weight formula. The weights are used to correct feature offsets when heterogeneous data is mapped to the unified feature space.

[0027] [Formula not reproduced in the source text.]

[0028] where the terms are, in order (the symbols themselves are not reproduced in the source text): the alignment weight of a given class of data source on a given feature dimension; the contribution coefficient of that class of data source; the mean of that class of data source on that feature dimension; the overall mean of all data sources participating in joint training on that feature dimension, obtained by the parameter server through weighted aggregation of the values uploaded by each node; the overall standard deviation of all data sources on that feature dimension; the offset correction factor; and the importance coefficient of that feature dimension.

[0029] More preferably, the supply chain traceability optimization module calculates a logistics dynamic adjustment coefficient using the following formula:

[0030] [Formula not reproduced in the source text.]

[0031] where the terms are, in order (the symbols themselves are not reproduced in the source text): the logistics adjustment coefficient from a given warehouse to a given demand area, where values in one range indicate that goods should be transferred from the warehouse to the area with priority and values in the other range indicate that the transfer should be delayed (the thresholds are not reproduced); the warehouse's current actual inventory quantity; the feature alignment weight of the warehouse's inventory data; the warehouse's inventory fluctuation over a preceding number of days; the time decay coefficient; the demand area's current demand for medicines; the feature alignment weight of the demand area's demand data; the straight-line distance from the warehouse to the demand area; the distance influence coefficient; and the correction factor for sudden events.

[0032] Furthermore, the data asset trading platform further includes a token verification unit and a permission revocation engine;

[0033] The Token verification unit has a built-in Token parsing algorithm, which is used to verify whether the recipient has the corresponding data usage rights after the Token is transferred. The verification content includes the validity period of the Token, the range of accessible data, and the remaining quota of call counts.

[0034] The permission revocation engine works in conjunction with the smart contract executor of the blockchain rights confirmation layer. It automatically revokes the recipient's data access permissions when the token expires, the number of calls is exhausted, or the data provider issues a revocation instruction, and it uploads the revocation record to the transaction chain. After a data usage rights transaction is completed, the token verification unit sends a transaction completion notification to the data provider's terminal. The notification includes the transaction amount, the recipient's identifier, and the time at which the permission takes effect. At the same time, the permission revocation engine starts a timer so that the permission is revoked promptly once the token's validity period expires, thus avoiding permission abuse.
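
The checks performed by the Token verification unit and the three revocation triggers can be sketched as follows. This is an in-memory illustration only: the class Grant and the functions revocation_reason and authorize_call are hypothetical, and the on-chain recording of revocations is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Grant:
    recipient: str
    data_scope: frozenset      # data sets the token may access
    expires_at: datetime
    calls_left: int
    revoked: bool = False      # set when the provider issues a revocation


def revocation_reason(grant: Grant, now: datetime) -> Optional[str]:
    """Return why the grant must be revoked, or None if it is still valid."""
    if grant.revoked:
        return "provider_revoked"
    if now >= grant.expires_at:
        return "expired"
    if grant.calls_left <= 0:
        return "calls_exhausted"
    return None


def authorize_call(grant: Grant, dataset: str, now: datetime) -> bool:
    """Check validity period, data scope and remaining quota, then consume one call."""
    if revocation_reason(grant, now) is not None or dataset not in grant.data_scope:
        return False
    grant.calls_left -= 1
    return True


utc = timezone.utc
grant = Grant("pharma-co-7", frozenset({"adverse_reactions"}), datetime(2025, 9, 5, tzinfo=utc), calls_left=2)
now = datetime(2025, 8, 10, tzinfo=utc)
results = [authorize_call(grant, "adverse_reactions", now) for _ in range(3)]
print(results, revocation_reason(grant, now))
```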

[0035] A method for integrating healthcare industry-wide data assets based on blockchain and AI, applied to the healthcare industry-wide data asset integration system based on blockchain and AI as described in any of the above, characterized in that it includes:

[0036] S1: The edge computing nodes of the data acquisition layer receive medical data sent by the hospital HIS system, pharmaceutical R&D system and medical distribution platform. They remove duplicate and invalid data through the built-in cleaning engine, and then replace sensitive information through the desensitization algorithm to obtain standardized data for subsequent processing.

[0037] S2: The digital fingerprint generator of the blockchain ownership confirmation layer generates digital fingerprints for standardized data, uploads the digital fingerprints and contributor public keys to the evidence storage chain, and at the same time the smart contract executor generates a smart contract containing data usage rules and deploys it to the transaction chain;

[0038] S3: The cross-domain model training engine of the federated learning analysis layer receives local model training requests from each node. The heterogeneous data alignment unit performs feature alignment on heterogeneous data according to the feature alignment weight formula. The differential privacy protection module adds dynamic noise and then performs joint training. Only the model parameter gradient is transmitted during the training process.

[0039] S4: The data asset trading platform matches data supply and demand based on the tokenization mechanism, realizes anonymous trading of data usage rights through the multi-party secure computing module, and calculates the revenue sharing amount from the contribution coefficient that the blockchain rights confirmation layer obtains with the multi-source data contribution coefficient formula.

[0040] S5: The RFID data parsing unit of the supply chain traceability optimization module reads the RFID information of the medicine and generates the medicine circulation trajectory by combining it with the batch records on the chain. The AI logistics scheduling engine generates a logistics path adjustment plan based on the logistics dynamic adjustment coefficient formula.
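
Step S1 (removing duplicate and invalid records, then replacing sensitive information) can be illustrated with a short sketch. The field names, the validity rule, and the use of a salted SHA-256 pseudonym as the desensitization method are all assumptions; the text does not specify which desensitization algorithm the edge node uses.

```python
import hashlib

SENSITIVE_FIELDS = ("patient_name", "id_number")   # assumed field names
REQUIRED_FIELDS = ("record_id", "drug_code")       # assumed validity rule


def pseudonymize(value: str, salt: str) -> str:
    """Replace a sensitive value with a salted, truncated SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def clean_and_desensitize(records: list[dict], salt: str) -> list[dict]:
    """Drop invalid and duplicate records, then mask sensitive fields."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(not rec.get(f) for f in REQUIRED_FIELDS):
            continue                      # invalid: a required field is missing or empty
        if rec["record_id"] in seen:
            continue                      # duplicate of an earlier record
        seen.add(rec["record_id"])
        masked = dict(rec)                # copy so the raw record is not modified
        for f in SENSITIVE_FIELDS:
            if f in masked:
                masked[f] = pseudonymize(str(masked[f]), salt)
        cleaned.append(masked)
    return cleaned


raw = [
    {"record_id": "r1", "drug_code": "D01", "patient_name": "Zhang San"},
    {"record_id": "r1", "drug_code": "D01", "patient_name": "Zhang San"},
    {"record_id": "r2", "drug_code": "", "patient_name": "Li Si"},
]
standardized = clean_and_desensitize(raw, salt="edge-node-salt")
print(standardized)
```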

[0041] In a further preferred embodiment, S3 also includes a model accuracy optimization sub-step: During joint training, after the parameter server of the cross-domain model training engine completes every 10 rounds of parameter aggregation, it calculates the model's accuracy on the validation set of each node; when the validation set accuracy of a node is lower than a preset threshold, the parameter server sends a feature importance prompt to that node, which includes the three feature dimensions that have the greatest impact on model accuracy and the corresponding alignment weights of the features. The node adjusts the feature weights of its local training data according to the prompts, increases the training ratio of high-importance features, re-executes model training after adjustment, and uploads the parameter gradients; the parameter server receives the adjusted gradients and aggregates them again, repeating the above process until the validation set accuracy of all nodes is not lower than the preset threshold.
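
The control logic of this sub-step (check every 10 aggregation rounds, find nodes below the threshold, send them the three most influential feature dimensions with their alignment weights) can be sketched as below. How feature importance is measured is not specified in the text, so the sketch takes importance scores as given; the function names and sample values are hypothetical.

```python
def feature_hint(
    importance: dict[str, float],
    alignment_weight: dict[str, float],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Pick the k most important feature dimensions with their alignment weights."""
    top = sorted(importance, key=importance.get, reverse=True)[:k]
    return [(name, alignment_weight[name]) for name in top]


def nodes_needing_hint(round_no: int, accuracy: dict[str, float], threshold: float) -> list[str]:
    """After every 10th aggregation round, list nodes below the accuracy threshold."""
    if round_no == 0 or round_no % 10 != 0:
        return []
    return sorted(node for node, acc in accuracy.items() if acc < threshold)


importance = {"dose": 0.40, "age": 0.10, "lesion_size": 0.30, "batch": 0.05, "season": 0.15}
weights = {"dose": 1.2, "age": 0.9, "lesion_size": 1.1, "batch": 1.0, "season": 0.8}
accuracy = {"hospital_a": 0.91, "pharma_b": 0.78}

for node in nodes_needing_hint(10, accuracy, threshold=0.85):
    print(node, feature_hint(importance, weights))
```

The parameter server would repeat this check after each block of 10 rounds until nodes_needing_hint returns an empty list.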

[0042] Technical effects:

[0043] This invention breaks down medical data silos through collaboration between localized data cleaning and desensitization at the data acquisition layer, dynamic permission management at the blockchain rights confirmation layer, cross-domain collaborative modeling at the federated learning analysis layer, tokenized transactions on the trading platform, and supply chain traceability optimization. It achieves secure data integration and efficient utilization across medical distribution, pharmaceutical manufacturing, and healthcare scenarios, enabling cross-domain collaboration without leaking original information and unlocking the value of data assets.

Description of the Drawings

[0044] Figure 1 is a block diagram of the blockchain and AI-based medical industry data asset integration system proposed in this application;

[0045] Figure 2 is a flowchart of the blockchain and AI-based method for integrating data assets across the entire healthcare industry proposed in this application.

Detailed Implementation

[0046] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0047] Traditional medical data integration systems suffer from the following technical problems: medical data is scattered across different entities such as hospitals, pharmaceutical companies, and distribution companies, forming data silos with inconsistent standards, making cross-domain sharing difficult; due to significant structural differences, cross-domain data has low feature alignment accuracy, affecting the effectiveness of joint analysis; inventory forecasting based on static data in the supply chain is prone to errors, and these errors can propagate along the logistics chain, leading to resource waste. Furthermore, unclear data ownership and insufficient privacy protection further restrict the asset utilization of medical data.

[0048] Based on this, and referring to Figure 1, this embodiment provides a blockchain and AI-based data asset integration system for the entire medical industry, including a data acquisition layer, a blockchain rights confirmation layer, a federated learning analysis layer, a data asset trading platform, and a supply chain traceability optimization module. The data acquisition layer includes edge computing nodes with built-in data cleaning engines and desensitization algorithms for localized cleaning and desensitization of hospital HIS system data, drug RFID data, pharmaceutical R&D experimental data, and medical imaging data. The blockchain rights confirmation layer constructs a medical data asset ledger based on a consortium blockchain architecture. This ledger includes data hash records, contributor identity identifiers, and usage authorization logs, and integrates a smart contract module for dynamic permission management. The federated learning analysis layer is configured with a distributed AI training framework that, without transmitting original data across domains, trains drug demand prediction and adverse reaction monitoring models jointly with nodes in medical distribution, pharmaceutical manufacturing, and healthcare scenarios. The data asset trading platform is built on a tokenization mechanism and integrates a multi-party secure computing module so that permissions are transferred without the original data being visible when data usage rights are traded. The supply chain traceability optimization module includes an RFID data parsing unit and an AI logistics scheduling engine to optimize logistics routes by combining on-chain drug batch records and real-time inventory data. Through dynamic rights confirmation in the blockchain rights confirmation layer, cross-domain collaboration in the federated learning analysis layer, and intelligent scheduling in the supply chain traceability optimization module, the system addresses medical data silos, low feature alignment accuracy for cross-domain data, and the propagation of supply chain demand forecast errors.

[0049] This solution integrates the entire process of medical data acquisition, ownership confirmation, analysis, trading, and supply chain optimization through multi-module collaboration. Localized processing at the data acquisition layer ensures data standardization and initial privacy protection; the blockchain ownership confirmation layer enables clear and traceable data ownership; federated learning enables cross-domain collaborative modeling without disclosing original data; the trading platform guarantees the secure transfer of data assets; and the supply chain module optimizes scheduling based on real-time data. The overall system breaks down data silos, improves the efficiency of cross-domain data utilization, and ensures data security through blockchain and privacy computing, providing a feasible path for the assetization of medical data.

[0050] Traditional ownership confirmation mechanisms have the following technical problems: with centralized data ownership confirmation, ownership records are easily tampered with, and the division of ownership relies on manual review, which creates a risk of unclear ownership; the authenticity of data is difficult to verify quickly when the data is called, so tampered data can easily end up being used; and revenue sharing relies on manual calculation, the execution of revenue sharing rules is not transparent, and settlement lags behind data calls, which reduces data providers' willingness to participate.

[0051] Based on this, the blockchain rights confirmation layer includes a digital fingerprint generator, a zero-knowledge proof verification unit, a smart contract executor, and a dual-chain communication interface. The digital fingerprint generator uses the SHA-256 hash algorithm to calculate a unique digital fingerprint containing a data content summary, a collection timestamp, and the hardware identifier of the collection terminal when data collection is complete. The zero-knowledge proof verification unit has a built-in verification algorithm used during the data retrieval phase to verify that the data has not been tampered with and that its source is authentic by comparing the digital fingerprint generated from the retrieved data with the on-chain evidence fingerprint. The smart contract executor stores the data usage protocol. The protocol presets the data call permission range, the single call revenue sharing ratio, and the cumulative call revenue sharing threshold. When the data call behavior meets the preset conditions of the protocol, a revenue sharing voucher containing the caller's identifier, call time, and revenue sharing amount is automatically generated. The dual-chain communication interface is used to realize cross-chain data synchronization between the evidence storage chain and the transaction chain. The evidence storage chain is a consortium chain structure used to persistently store digital fingerprints, contributor public keys, and zero-knowledge proof verification records. The transaction chain is a private chain structure with permissions used to store smart contracts, revenue sharing vouchers, and permission change records. The dual-chain synchronization delay does not exceed 100ms to ensure the consistency and immutability of the rights confirmation records.

[0052] This solution achieves dynamic ownership confirmation and efficient revenue sharing through multiple technical means. A digital fingerprint generator generates unique identifiers based on multi-dimensional information, ensuring data traceability; a zero-knowledge proof verification unit verifies authenticity without exposing the original data, balancing efficiency and security; a smart contract executor automatically executes revenue sharing rules, avoiding delays and unfairness caused by human intervention; the dual-chain structure has a clear division of labor: the evidence storage chain ensures the immutability of core records, the transaction chain ensures flexible execution of transactions and revenue sharing, and the synchronization of the two chains ensures data consistency. This overall mechanism solves the problems of unclear ownership, inefficient verification, and opaque revenue sharing in traditional ownership confirmation methods, increasing the trust and willingness of data providers to participate.
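
The automatic voucher generation performed by the smart contract executor can be illustrated off-chain with plain Python. The agreement fields mirror the three presets named in the text (permission range, single-call revenue sharing ratio, cumulative revenue sharing threshold), but their exact semantics are assumptions: here the permission range is a caller allow-list and payouts stop once the cumulative threshold is reached. The names UsageAgreement and make_voucher are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageAgreement:
    allowed_callers: frozenset   # permission range (assumed to be a caller allow-list)
    share_per_call: float        # single-call revenue sharing ratio
    cumulative_cap: float        # cumulative revenue sharing threshold


def make_voucher(
    agreement: UsageAgreement,
    caller: str,
    call_time: str,
    call_fee: float,
    paid_so_far: float,
) -> Optional[dict]:
    """Return a revenue sharing voucher if the call satisfies the agreement."""
    if caller not in agreement.allowed_callers:
        return None
    # Assumed semantics: payouts stop once the cumulative threshold is reached.
    amount = min(call_fee * agreement.share_per_call, agreement.cumulative_cap - paid_so_far)
    if amount <= 0:
        return None
    return {"caller": caller, "call_time": call_time, "amount": round(amount, 2)}


agreement = UsageAgreement(frozenset({"hospital-01"}), share_per_call=0.2, cumulative_cap=100.0)
voucher = make_voucher(agreement, "hospital-01", "2025-08-06T10:00:00Z", call_fee=50.0, paid_so_far=0.0)
print(voucher)
```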

[0053] Traditional federated learning faces the following technical challenges in medical data applications: cross-domain medical data, such as molecular structure data from pharmaceutical companies and efficacy data from hospitals, differs significantly in type, so structured and unstructured data are difficult to train on together directly; heterogeneous data, such as medical images and pathology reports, is prone to feature shifts when mapped to a unified feature space, which reduces the accuracy of the joint model; and during joint training, if the original data is not properly protected, there is a risk of privacy leakage, especially since medical data contains a large amount of sensitive information and has higher privacy protection requirements.

[0054] Based on this, the federated learning analysis layer includes a cross-domain model training engine, a heterogeneous data alignment unit, and a differential privacy protection module. The cross-domain model training engine includes a parameter server and node training clients. The parameter server receives and aggregates the model parameter gradients uploaded by each node, while the node training clients are deployed on pharmaceutical company and hospital terminals to perform model training based on local data. The heterogeneous data alignment unit has a built-in pre-trained feature extraction model, which includes a medical image feature extraction sub-model and a pathological text parsing sub-model. The medical image feature extraction sub-model uses a convolutional neural network structure to extract lesion feature vectors from unstructured medical images. The pathological text parsing sub-model uses a BERT pre-trained model to convert structured pathology reports into text feature vectors and maps the two types of vectors to a unified feature space of dimension 512 through a feature mapping algorithm. The differential privacy protection module is configured with a noise generator to add Laplacian noise before uploading the model parameter gradients. The noise intensity is dynamically adjusted according to data sensitivity. The noise coefficient for clinical case data is set to 0.3, and the noise coefficient for pharmaceutical R&D experimental data is set to 0.5, so as to protect privacy while ensuring model training accuracy.

[0055] This solution optimizes the pain points of cross-domain medical data collaborative analysis: the cross-domain model training engine achieves distributed training through a parameter server and client architecture, avoiding cross-domain transmission of raw data; the heterogeneous data alignment unit uses a dedicated feature extraction sub-model to convert different types of data into feature vectors of a unified dimension, solving the problem of difficulty in integrating heterogeneous data; the differential privacy protection module dynamically adjusts noise intensity according to data sensitivity, appropriately reducing noise to ensure model accuracy in highly sensitive scenarios such as clinical data, and enhancing protection in scenarios such as R&D data, achieving a balance between privacy and accuracy. The overall framework improves the feasibility and security of cross-domain medical data joint modeling, providing technical support for collaboration between pharmaceutical R&D and clinical feedback.
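
The parameter server's role of receiving and aggregating node gradients can be sketched as a weighted average. The per-node weights are an assumption of this sketch; they could, for example, be derived from each node's contribution coefficient, but the text here does not give the aggregation rule. The functions aggregate and apply_update and the learning rate are hypothetical.

```python
def aggregate(gradients: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Combine node gradients into one vector using normalized per-node weights."""
    total = sum(weights[node] for node in gradients)
    size = len(next(iter(gradients.values())))
    merged = [0.0] * size
    for node, grad in gradients.items():
        share = weights[node] / total
        for i, g in enumerate(grad):
            merged[i] += share * g
    return merged


def apply_update(params: list[float], merged: list[float], lr: float) -> list[float]:
    """Take one gradient descent step on the shared model parameters."""
    return [p - lr * g for p, g in zip(params, merged)]


# Only gradients leave the nodes; the local training data never does.
gradients = {"hospital": [1.0, 2.0], "pharma": [3.0, 4.0]}
weights = {"hospital": 3.0, "pharma": 1.0}
merged = aggregate(gradients, weights)
params = apply_update([0.5, 0.5], merged, lr=0.1)
print(merged, params)
```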

[0056] Traditional medical data transactions suffer from the following technical problems: data is a special asset whose ownership is difficult to separate from its usage rights, so purchased data can easily be used or resold without limit, harming the interests of data providers; if raw data is directly exposed during a transaction, there is a risk of privacy leakage, and because medical data contains sensitive patient information the consequences of such leakage are serious; and the execution of revenue sharing rules is not transparent, making it difficult for data providers to confirm the number of data calls and the corresponding revenue, which reduces trust in the transaction.

[0057] Based on this, the data asset trading platform includes a token issuance module, a transaction matching engine, a revenue sharing and settlement engine, and a transaction record storage module. The token issuance module generates corresponding tokens based on the data's access permission level. Each token contains an encrypted fragment of the data access key, the validity period of the permission, and information on analyzable dimensions. Tokens used for clinical trials contain three analytical dimensions and have a validity period of 30 days, while tokens used for basic statistics contain one analytical dimension and have a validity period of 15 days. The transaction matching engine matches token purchase requests from data requesters with token sale information from data providers, automatically generating transaction orders based on the preset price range and permission requirements of both parties. The revenue sharing and settlement engine connects to the smart contract executor of the blockchain's rights confirmation layer via an API interface, receiving revenue sharing vouchers and converting revenue sharing amounts into withdrawable assets. The revenue sharing amount is calculated based on data contribution, call count, and analytical dimension weights. The transaction record storage module uploads transaction orders, token transfer records, and revenue sharing results to the transaction chain, forming a complete transaction traceability chain to ensure the transaction process is traceable and tamper-proof.

[0058] This solution addresses the core issues of data asset trading through a tokenization mechanism: Tokens clearly define usage permissions, duration, and dimensions, enabling controlled transfer of usage rights and preventing data misuse; the transaction matching engine automatically matches supply and demand based on preset conditions, improving transaction efficiency; revenue sharing and settlement are linked with blockchain smart contracts, ensuring transparent execution of revenue sharing rules, and data providers can verify their earnings through on-chain records; transaction records are stored on the blockchain throughout the entire process, achieving full traceability and reducing transaction disputes. Furthermore, the token contains only encrypted access key fragments, preventing the exposure of raw data and protecting data privacy.
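The token structure and matching rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the `TOKEN_PRESETS` table, and the "cheapest compatible offer" matching rule are assumptions. Only the two permission levels, their dimension counts, and their validity periods come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# (analysis dimensions, validity in days) per permission level, as stated in the text.
TOKEN_PRESETS = {
    "clinical_trial": (3, 30),
    "basic_statistics": (1, 15),
}

@dataclass(frozen=True)
class Token:
    key_fragment: bytes   # encrypted fragment of the data access key (opaque here)
    level: str
    dimensions: int
    expires_at: datetime

def issue_token(level: str, key_fragment: bytes, now: datetime) -> Token:
    """Create a token whose dimensions and expiry follow the permission level."""
    dimensions, days = TOKEN_PRESETS[level]
    return Token(key_fragment, level, dimensions, now + timedelta(days=days))

@dataclass(frozen=True)
class Offer:          # hypothetical: a provider's sale listing
    token: Token
    min_price: float

@dataclass(frozen=True)
class Request:        # hypothetical: a requester's purchase request
    level: str
    max_price: float

def match(request: Request, offers: list[Offer]) -> Optional[Offer]:
    """Assumed rule: pick the cheapest offer that meets the level and price ceiling."""
    candidates = [o for o in offers
                  if o.token.level == request.level and o.min_price <= request.max_price]
    return min(candidates, key=lambda o: o.min_price, default=None)
```

The token carries only the key fragment and permission metadata, never the underlying records, which mirrors the privacy argument in the paragraph above.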

[0059] The platform provides a safe, transparent, and controllable environment for medical data asset transactions, promoting the rational flow and value release of medical data.

[0060] Traditional data contribution calculation has the following technical problems: it only quantifies contribution based on data volume, ignoring data quality, which leads to low-quality data still receiving unreasonable benefits and reduces the enthusiasm of high-quality data providers; it does not consider the timeliness of data, such as the significant difference in analytical value between real-time inventory data and historical inventory data during the epidemic, and uniformly calculating contribution will lead to unfair revenue sharing; data from different fields have different importance to medical analysis, but there is a lack of clear weighting, making it difficult to reflect the impact of field differences on contribution.

[0061] Based on this, the blockchain rights confirmation layer calculates the multi-source data contribution coefficient using the following formula, which serves as the basis for calculating the revenue sharing ratio and the federated learning weight:

[0062] $$C_i = \alpha \cdot \frac{N_i}{N_{total}} + \beta \cdot Q_i + \gamma \cdot T_i \cdot U_i + \delta \cdot D_i$$;

[0063] where $C_i$ is the contribution coefficient of the i-th type of data source, ranging from 0 to 1; $N_i$ is the effective number of samples from the i-th type of data source, specifically the amount of data that can be used for model training after cleaning and desensitization; $N_{total}$ is the total number of valid samples from all data sources participating in the joint analysis; $Q_i$ is the quality coefficient of the i-th type of data, ranging from 0 to 1, calculated by combining data completeness and annotation accuracy with weights of 60% and 40%, respectively, where data completeness is the completion rate of required fields and annotation accuracy is the consistency rate of expert review; $T_i$ is the timeliness coefficient of the i-th type of data, ranging from 0 to 1, calculated from $\Delta t_i$, the time interval from data generation to the present, and $T_{max,i}$, the maximum number of valid days for this type of data; $U_i$ is the domain urgency coefficient of the i-th type of data, ranging from 0 to 1.5 and updated by an event-aware contract deployed on the blockchain, taking one value for circulation data during public health events and another during normal periods; $D_i$ is the domain coefficient of the i-th type of data, ranging from 0 to 1, preset by a medical expert committee and stored on the blockchain for verification, with separate values for clinical efficacy data, drug distribution data, and basic research and development data; $\alpha$, $\beta$, $\gamma$, and $\delta$ are the weighting coefficients, with $\alpha + \beta + \gamma + \delta = 1$, where $\alpha = 0.2$, $\beta = 0.3$, $\gamma = 0.3$, and $\delta = 0.2$.

[0064] This formula is used to quantify the contribution of the i-th type of data source to the joint analysis and revenue sharing of medical data. It is the core basis for data value assessment, revenue sharing ratio determination and federated learning weight allocation.

[0065] The formula as a whole achieves a comprehensive evaluation of the contribution of data through weighted calculation of four dimensions. The four dimensions correspond to data volume, data quality, data timeliness and domain urgency, and data domain importance. Each dimension is balanced by weight coefficients α = 0.2, β = 0.3, γ = 0.3, and δ = 0.2, and the sum of the weight coefficients is 1 to ensure the scientific nature of the evaluation logic.

[0066] The first dimension is the proportion of valid samples, given by the ratio of the number of valid samples $N_i$ from the i-th type of data source to the total number of valid samples $N_{total}$ from all data sources participating in the joint analysis. Valid samples are data that have been cleaned and desensitized and can be used for model training. This part is weighted by α to reflect the fundamental supporting role of data volume in joint analysis: data volume is credited for its basic value without becoming the sole measure of contribution.

[0067] The second dimension is the data quality coefficient, weighted by β. It integrates data completeness and annotation accuracy, with weights of 60% and 40%, respectively. This part highlights the value of high-quality data, guides data providers to focus on data completeness and accuracy, and addresses the problem that traditional assessments neglect data quality and thereby let low-quality data earn unreasonable gains.

[0068] The third dimension is the product of timeliness and domain urgency, weighted by γ. It is obtained by multiplying the timeliness coefficient $T_i$ by the domain urgency coefficient $U_i$. The timeliness coefficient is calculated from the time interval from data generation to the present and the maximum number of valid days for this type of data: the newer the data, the higher the timeliness coefficient. The domain urgency coefficient is dynamically adjusted according to the actual scenario; for example, circulation data has higher urgency during public health emergencies. This part addresses the unfair revenue sharing caused by traditional assessments failing to distinguish data timeliness and scenario urgency.

[0069] The fourth dimension is the domain coefficient, weighted by δ. Its values are preset by domain experts and stored on the blockchain. For example, clinical efficacy data has a higher coefficient than basic research data, reflecting the inherent value differences of data from different fields in medical analysis and avoiding the homogenization of the value of different types of data.

[0070] In summary, this formula comprehensively quantifies data contribution through multi-dimensional weighted fusion, providing a clear basis for the formulation of revenue sharing rules and the allocation of federated learning weights, making data value assessment more aligned with the actual application scenarios of medical data.

[0071] This formula quantifies data contribution from multiple dimensions. Data volume accounts for only 20% of the weight, avoiding the pursuit of quantity over quality; data quality accounts for 30%, objectively assessed through completeness and annotation accuracy to guide the provision of high-quality data; timeliness and urgency are combined to ensure that critical data in urgent scenarios receives reasonable recognition; and domain coefficients are preset by experts to reflect the inherent value differences of different data types. Each dimension is reasonably allocated its influence through weight coefficients, and the calculation results objectively reflect the actual contribution of data to joint analysis. The revenue sharing and weight allocation based on these coefficients are fairer, incentivizing data providers to improve data quality and provide critical data in a timely manner, while aligning with the medical field's perception of the value of different types of data, providing a reasonable basis for the distribution of benefits in data collaboration.
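As a worked illustration of the four-dimension weighted sum described above, the following sketch computes a contribution coefficient from already-known inputs. The function and parameter names are assumptions, and the timeliness coefficient is taken as an input because its own formula is not reproduced here. No clipping to the stated 0 to 1 range is applied.

```python
def quality_coefficient(completeness: float, annotation_accuracy: float) -> float:
    """Quality = 60% required-field completion rate + 40% expert-review consistency rate."""
    return 0.6 * completeness + 0.4 * annotation_accuracy

def contribution_coefficient(n_valid: int, n_total: int, quality: float,
                             timeliness: float, urgency: float, domain: float,
                             alpha: float = 0.2, beta: float = 0.3,
                             gamma: float = 0.3, delta: float = 0.2) -> float:
    """Weighted sum of sample share, quality, timeliness x urgency, and domain coefficient."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (alpha * n_valid / n_total
            + beta * quality
            + gamma * timeliness * urgency
            + delta * domain)

# Illustrative inputs only (not from the source).
q = quality_coefficient(completeness=0.9, annotation_accuracy=0.8)   # 0.86
c = contribution_coefficient(n_valid=2000, n_total=10000, quality=q,
                             timeliness=0.5, urgency=1.0, domain=0.8)
print(round(c, 3))
```

With these inputs the four terms are 0.04, 0.258, 0.15, and 0.16, so a source holding only 20% of the samples still reaches about 0.608 on the strength of its quality and domain.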

[0072] Traditional heterogeneous data alignment techniques suffer from the following technical problems: When heterogeneous data in the medical field is mapped to a unified feature space, feature shifts can easily occur due to differences in data distribution, resulting in the fused features failing to accurately reflect the essence of the data; fixed feature alignment weights cannot be dynamically adjusted based on data contribution and distribution differences, and when features from some data sources deviate from the overall distribution, fusion is still performed with fixed weights, which lowers the accuracy of the joint model; and the importance of features is not differentiated, with features that have a significant impact on the model and features with a minor impact receiving the same weight, failing to highlight the role of key features.

[0073] Based on this, the heterogeneous data alignment unit calculates the feature alignment weight using the following formula to correct the feature offset when heterogeneous data is mapped to a unified feature space:

[0074] $$W_{i,k} = C_i \cdot \frac{1}{1 + \theta \cdot \frac{|\mu_{i,k} - \mu_k|}{\sigma_k}} \cdot I_k$$

[0075] where $W_{i,k}$ is the alignment weight of the i-th type of data source on the k-th feature dimension, with values ranging from 0 to 1.2; $C_i$ is the contribution coefficient of the i-th type of data source; $\mu_{i,k}$ is the mean of the i-th type of data source on the k-th feature dimension, calculated locally by the data source and then encrypted and uploaded to the parameter server of the federated learning analysis layer; $\mu_k$ is the overall mean of all data sources participating in joint training on the k-th feature dimension, obtained by the parameter server through weighted aggregation of the $\mu_{i,k}$ uploaded by each node; $\sigma_k$ is the overall standard deviation of all data sources on the k-th feature dimension, taken from historical statistical data stored on the blockchain and updated every 24 hours; $\theta$ is the offset correction coefficient, ranging from 0.1 to 0.5: when the accuracy of the federated learning model on the validation set drops by more than 5%, $\theta$ automatically increases by 0.1, and when the accuracy returns to its initial value, $\theta$ is restored to 0.1; $I_k$ is the importance coefficient of the k-th feature dimension, ranging from 0.5 to 1.2 and labeled by domain experts based on the degree of influence of the feature on the model output.

[0076] This formula is used to correct the feature offset of the i-th type of data source in the k-th feature dimension, making heterogeneous medical data, such as medical images and pathology report data, more accurate when mapped to a unified feature space. It provides reliable input for federated learning joint training and solves the problem of decreased model accuracy caused by feature offset in traditional heterogeneous data fusion.

[0077] The formula as a whole calculates the alignment weights through the product of three core components: the data source contribution coefficient, the feature offset correction term, and the feature importance coefficient. These three components work together to achieve dynamic weight adjustment, ensuring that high-value, low-offset, and high-importance features receive reasonable weights.

[0078] The first part is the contribution coefficient of the data source. This refers to the result calculated using the multi-source data contribution coefficient formula. This part correlates data contribution with feature alignment weights, giving higher weights to data sources with higher contributions on the corresponding feature dimensions. This ensures that valuable data features play a greater role in the fusion process and avoids interference from features of low-contribution data with the overall fusion effect.

[0079] The second part is the feature offset correction term, which takes the reciprocal form $1 / (1 + \theta \cdot |\mu_{i,k} - \mu_k| / \sigma_k)$. Here $\mu_{i,k}$ is the mean of the i-th type of data source on the k-th feature dimension and $\mu_k$ is the overall mean of all data sources on this feature dimension; the absolute value of their difference reflects how far the feature of this data source deviates from the overall feature. $\sigma_k$ is the overall standard deviation of all data sources on this feature dimension and is used to standardize the degree of deviation. θ is the offset correction coefficient, which automatically increases when the accuracy of the federated learning model on the validation set drops by more than 5%, strengthening the correction, and returns to its normal value when the accuracy recovers. The logic is that the larger the deviation, the larger the denominator and the smaller the weight, which reduces the interference of deviating data with feature fusion and solves the problem that traditional fixed weights cannot cope with differences in data distribution.

[0080] The third part is the feature importance coefficient, which is labeled by domain experts according to the degree of influence of each feature on the model output. This part highlights the role of key features, ensures that features with a greater impact on model accuracy obtain higher weights in the fusion, and avoids treating all features homogeneously.

[0081] In summary, by correlating data contribution, dynamically correcting feature offsets, and distinguishing feature importance, this formula makes the mapping of heterogeneous data into the unified feature space conform more closely to the essence of the data. The setting logic and adjustment rules of the related parameters (such as θ) are clear, and the calculation process is reproducible.

[0082] This solution corrects feature offsets through dynamic weight calculation: the weight is based on data contribution, ensuring that high-value data obtains a reasonable weight; the degree of feature offset is quantified by the ratio of the mean difference to the standard deviation, with a larger offset resulting in a lower weight to reduce the interference of abnormal data; the offset correction coefficient is dynamically adjusted according to the model accuracy, increasing the correction strength when the accuracy decreases; and the feature importance coefficient highlights the role of key features. Because the weight calculation is tied to data quality, distribution differences, and feature importance, heterogeneous data can be mapped more accurately to the unified space. The fused features are more in line with the essence of the data and provide reliable input for joint training, enabling the federated learning model to maintain stable analysis accuracy in heterogeneous data scenarios. The approach is especially suitable for cross-domain data collaboration between pharmaceutical research and development and clinical feedback.
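The three-part product and the θ adjustment rule can be sketched as below. This is a hedged illustration: the function names are invented, and the "drops by more than 5%" condition is interpreted as an absolute drop of more than 0.05 from a baseline accuracy, which is an assumption the text does not settle.

```python
def alignment_weight(contribution: float, source_mean: float, overall_mean: float,
                     overall_std: float, importance: float, theta: float = 0.1) -> float:
    """contribution x 1/(1 + theta*|source_mean - overall_mean|/overall_std) x importance."""
    if overall_std <= 0:
        raise ValueError("overall_std must be positive")
    correction = 1.0 / (1.0 + theta * abs(source_mean - overall_mean) / overall_std)
    return contribution * correction * importance

def update_theta(theta: float, baseline_acc: float, current_acc: float) -> float:
    """Raise theta by 0.1 (capped at 0.5) on a >0.05 accuracy drop; reset to 0.1 on recovery."""
    if current_acc >= baseline_acc:
        return 0.1
    if baseline_acc - current_acc > 0.05:      # assumed: absolute drop, not relative
        return min(0.5, round(theta + 0.1, 10))
    return theta

# A source whose feature mean sits one standard deviation from the overall mean:
print(alignment_weight(0.6, 3.0, 1.0, 2.0, 1.0, theta=0.5))
```

A source with no deviation keeps its full `contribution * importance` weight, while increasing θ penalizes the same deviation more heavily.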

[0083] Traditional medical supply chain logistics scheduling has the following technical problems: Regional inventory demand forecasting relies on historical data, without considering real-time inventory fluctuations and data reliability, resulting in a large initial forecasting deviation; Logistics path adjustment only considers distance factors, without taking into account inventory data reliability and the impact of emergencies, which is likely to lead to overstocking or understocking of goods; The reliability of inventory and demand data is not incorporated into the scheduling decision. If low-quality data is used for scheduling, it will amplify the forecasting deviation and trigger a chain reaction in the supply chain, such as a shortage of goods in Area A leading to overstocking of replenishments in Area B.

[0084] Based on this, the supply chain traceability optimization module calculates the logistics dynamic adjustment coefficient through the following formula to optimize the logistics scheduling from the warehouse to the demand area:

[0085] $$R_{m,n} = \frac{S_m \cdot W_m + \sum_{t=1}^{3} \lambda_t \cdot \Delta S_{m,t}}{D_n \cdot W_n + \tau \cdot d_{m,n}} \cdot \varepsilon$$;

[0086] where $R_{m,n}$ is the logistics adjustment coefficient from warehouse m to demand area n: $R_{m,n} > 1$ indicates preferentially shipping goods from warehouse m to area n, and $R_{m,n} < 1$ indicates delaying the shipment; $S_m$ is the current actual inventory quantity of warehouse m; $W_m$ is the feature alignment weight of the inventory data of warehouse m, reflecting the reliability of the inventory data; $\Delta S_{m,t}$ is the inventory fluctuation of warehouse m on the t-th day in the past; $\lambda_t$ is a time decay coefficient, with recent fluctuations weighted more heavily; $D_n$ is the current demand for medicines in demand area n; $W_n$ is the feature alignment weight of the demand data of demand area n; $d_{m,n}$ is the straight-line distance from warehouse m to demand area n; $\tau$ is the distance influence coefficient, which is higher for long-distance transfers; $\varepsilon$ is the contingency event correction factor, updated by real-time on-chain event evidence storage. This solution optimizes logistics scheduling from multiple dimensions: both inventory and demand data are multiplied by the reliability weights $W_m$ and $W_n$ to reduce the impact of low-quality data; inventory fluctuations over the past 3 days are incorporated to capture demand trends; the distance influence coefficient $\tau$ treats transfers of different distances differently to balance transportation costs; and the contingency event factor $\varepsilon$ enables rapid response to abnormal situations. The adjustment coefficient $R_{m,n}$ comprehensively reflects the impact of inventory adequacy, data reliability, fluctuation trends, distance, and unforeseen events, making logistics scheduling more aligned with actual needs.

[0087] This formula is used to calculate the logistics adjustment coefficient from warehouse m to demand area n. Whether the coefficient is greater than or less than 1 determines whether the transfer of goods is prioritized, thereby optimizing the logistics path of the medical supply chain. It solves the over- or under-transfer of goods that arises when traditional scheduling relies on static data and ignores data reliability and unforeseen events.

[0088] The formula is obtained by multiplying the ratio of warehouse supply capacity indicators to demand and transportation cost indicators by a contingency adjustment factor, which comprehensively reflects the impact of inventory adequacy, data reliability, fluctuation trends, transportation costs, and contingency events.

[0089] The numerator is used to assess the supply capacity of warehouse m and includes two sub-items. The first is the product of the current inventory and the inventory data weight, where the current inventory is the actual inventory quantity in warehouse m and the weight is the feature alignment weight of the inventory data, which reflects the reliability of the inventory data and is calculated by the feature alignment weight formula. This sub-item ensures that high-reliability inventory data accounts for a higher proportion in the evaluation and reduces the interference of low-quality inventory data. The second is the inventory fluctuation of warehouse m over the past 3 days, weighted by the time decay coefficient so that recent fluctuations count more.

[0090] The denominator is used to assess the necessity and cost of restocking and includes two sub-items. The first is the product of the current demand and the demand data weight, where the current demand is the drug demand in demand area n and the weight is the feature alignment weight of the demand data, which ensures that demand assessment is based on reliable data and avoids invalid shipments caused by false demand. The second is the product of the transportation distance and the distance influence coefficient τ, where the distance is the straight-line distance from the warehouse to the demand area and τ is larger when the cost of transferring goods over long distances is higher. This sub-item balances transportation costs and avoids blindly choosing to transfer goods from distant warehouses.

[0091] Finally, the formula is multiplied by the contingency correction factor, which is updated through real-time on-chain event evidence storage. For example, if an epidemic occurs in the demand area, the factor increases and priority is given to dispatching goods, so that the scheduling can respond quickly to abnormal scenarios.

[0092] When the adjustment coefficient is greater than 1, warehouse m has a relatively stronger supply capacity, or the transfer is more necessary or less costly, and the warehouse should be prioritized for transferring goods; when the coefficient is less than 1, the priority for dispatching goods is low and the dispatch can be delayed. This formula integrates indicators from multiple dimensions, making logistics scheduling more closely aligned with actual supply and demand and with changing scenarios, with clearly defined parameters and calculation logic.

[0093] Path adjustments based on this coefficient can reduce the transmission of initial forecast bias, avoid inventory backlogs or shortages, and ensure the rational allocation of medical supplies between regions, especially enabling rapid response during public health emergencies.
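A minimal sketch of the adjustment coefficient follows, built from the sub-items the text describes: reliability-weighted inventory and recent fluctuations as supply, reliability-weighted demand plus a distance cost as the denominator, and a contingency factor as a final multiplier. How the fluctuation term enters the numerator (added here) and the per-day decay weights are assumptions; all names and numbers are illustrative.

```python
from typing import Sequence

def logistics_adjustment(inventory: float, inventory_weight: float,
                         fluctuations: Sequence[float], decay_weights: Sequence[float],
                         demand: float, demand_weight: float,
                         distance: float, tau: float,
                         event_factor: float = 1.0) -> float:
    """Ratio of warehouse supply capacity to demand-plus-transport cost, scaled by events."""
    if len(fluctuations) != len(decay_weights):
        raise ValueError("one decay weight is needed per fluctuation value")
    # Assumed form: decay-weighted fluctuations are added to the weighted inventory.
    supply = inventory * inventory_weight + sum(
        w * f for w, f in zip(decay_weights, fluctuations))
    cost = demand * demand_weight + tau * distance
    if cost <= 0:
        raise ValueError("demand and distance terms must sum to a positive value")
    return supply / cost * event_factor

# Illustrative values: most recent day first, with the largest decay weight.
r = logistics_adjustment(inventory=1000, inventory_weight=0.9,
                         fluctuations=[50, -20, 10], decay_weights=[0.5, 0.3, 0.2],
                         demand=600, demand_weight=0.8, distance=40, tau=0.5)
print("prioritize" if r > 1 else "delay", round(r, 3))
```

Here supply is 921 and the denominator is 500, so the coefficient is about 1.842 and the transfer would be prioritized; raising `event_factor` for an outbreak in the demand area would raise it further.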

[0094] Traditional medical data token transactions have the following technical problems: after the token is transferred, there is no effective permission verification mechanism, making it impossible to confirm whether the recipient meets the preset usage conditions, which easily leads to permission abuse; after the token expires or the number of calls is exhausted, the data access permissions are not revoked in time, allowing the recipient to continue accessing the data illegally and harming the interests of the data provider; and permission revocation records are not linked to the blockchain, so revocation lacks traceability and it is difficult to provide evidence when permission disputes occur.

[0095] Based on this, the data asset trading platform also includes a token verification unit and a permission revocation engine. The token verification unit has a built-in token parsing algorithm to verify whether the recipient has the corresponding data usage rights after the token is transferred. The verification covers the token's validity period, the range of accessible data, and the remaining number of calls. The permission revocation engine is linked with the smart contract executor of the blockchain rights confirmation layer. It automatically revokes the recipient's data access rights when the token expires, the call count is exhausted, or the data provider triggers a permission revocation instruction, and uploads the permission revocation record to the transaction chain. After the data usage rights transaction is completed, the token verification unit sends a transaction completion notification to the data provider's terminal. The notification includes the transaction amount, the recipient's identifier, and the time at which the permission takes effect. At the same time, the permission revocation engine starts a timer to ensure that the permissions are revoked promptly once the token's validity period ends.

[0096] The solution employs a dual mechanism to ensure controllable token permissions: the token verification unit verifies the legality of permissions immediately after a transaction, clearly defining the recipient's usage boundaries and preventing overuse; the permission revocation engine automatically revokes permissions based on preset conditions or manual commands, with the revocation action linked to the smart contract to ensure timely execution; and the revocation record is uploaded to the transaction chain, enabling full traceability of permission changes.

[0097] The notification mechanism allows data providers to monitor transaction status in real time, while the timing function ensures timely revocation of permissions. This overall mechanism addresses the issue of easily issued but difficult-to-manage token permissions, ensuring that data usage rights remain controllable after the transaction. This guarantees the recipient's legitimate use of the data, prevents abuse of permissions, and enhances the data provider's trust in the transaction.
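The verification checks and revocation triggers described above can be sketched as follows. This is an assumption-laden illustration: `Grant`, `verify`, and `revoke_if_due` are hypothetical names, and a plain Python list stands in for the upload of revocation records to the transaction chain.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

@dataclass
class Grant:
    recipient: str
    allowed_fields: frozenset   # range of accessible data
    expires_at: datetime        # end of the token's validity period
    calls_left: int             # remaining call count
    revoked: bool = False

def verify(grant: Grant, requested_fields: Iterable[str], now: datetime) -> bool:
    """Check validity period, accessible data range, and remaining calls."""
    return (not grant.revoked
            and now < grant.expires_at
            and grant.calls_left > 0
            and set(requested_fields) <= grant.allowed_fields)

def revoke_if_due(grant: Grant, now: datetime, chain_log: list,
                  provider_requested: bool = False) -> bool:
    """Revoke on expiry, exhausted calls, or provider instruction; record the reason."""
    if grant.revoked:
        return False
    if now >= grant.expires_at:
        reason = "expired"
    elif grant.calls_left <= 0:
        reason = "calls_exhausted"
    elif provider_requested:
        reason = "provider_instruction"
    else:
        return False
    grant.revoked = True
    # Stand-in for uploading the revocation record to the transaction chain.
    chain_log.append({"recipient": grant.recipient, "reason": reason,
                      "at": now.isoformat()})
    return True
```

Keeping the reason in each record is what would let a later dispute distinguish an automatic expiry from a provider-initiated revocation.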

[0098] Traditional medical data integration methods suffer from the following technical problems: data collection, ownership confirmation, analysis, transaction, and supply chain optimization operate independently, lacking a collaborative mechanism, and data is prone to format incompatibility and information loss when flowing between stages; data integration does not adhere to privacy protection logic, and there is a risk of raw data leakage in some stages; cross-domain data collaborative analysis lacks standardized process design, making it difficult for data from pharmaceutical companies, hospitals, and other entities to participate in joint modeling efficiently; data asset transactions are disconnected from revenue sharing and settlement, and revenue sharing results cannot be fed back to data providers in a timely manner.

[0099] Based on this, referring to Figure 2, this embodiment provides a method for integrating medical industry data assets based on blockchain and AI, including the following steps:

[0100] S1: The edge computing nodes of the data acquisition layer receive medical data sent by the hospital HIS system, pharmaceutical R&D system and medical distribution platform. The built-in cleaning engine removes duplicate and invalid data, and the desensitization algorithm replaces sensitive information such as patient ID number and medical record number to obtain standardized data that can be used for subsequent processing.

[0101] S2: The digital fingerprint generator of the blockchain ownership confirmation layer generates digital fingerprints for standardized data, uploads the fingerprints and contributor public keys to the evidence storage chain, and at the same time the smart contract executor generates a smart contract containing data usage rules and deploys it to the transaction chain;

[0102] S3: The cross-domain model training engine of the federated learning analysis layer receives local model training requests from each node, the heterogeneous data alignment unit performs feature alignment on heterogeneous data according to the alignment weights, and joint training is performed after dynamic noise is added by the differential privacy protection module;

[0103] S4: The data asset trading platform matches data supply and demand based on a tokenization mechanism and achieves anonymous trading of data usage rights through a multi-party secure computation module, and the revenue sharing and settlement engine calculates the revenue sharing amount based on the contribution coefficients;

[0104] S5: The RFID data parsing unit of the supply chain traceability optimization module reads the RFID information of the medicine and generates the medicine circulation trajectory by combining it with the batch records on the chain. The AI logistics scheduling engine generates a logistics route adjustment plan according to the logistics dynamic adjustment coefficients.

[0105] This solution achieves end-to-end collaboration through standardized steps: step one, cleaning and desensitization, lays a standardized data foundation for subsequent processing; step two, blockchain-based rights confirmation, clarifies data ownership and provides a legal basis for data transfer; step three, federated learning, enables cross-domain modeling under the premise of privacy protection, so that data is usable but not visible.

[0106] Step four, linking token transactions with revenue sharing, ensures the fair allocation of data asset value; step five, supply chain optimization, achieves precise scheduling based on integrated data. Each step is interconnected, with data flowing seamlessly between stages, and privacy protection is maintained throughout the entire process.
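The fingerprinting in step two can be illustrated with a short sketch. It assumes SHA-256, which claim 2 names, and combines the three ingredients listed there (content summary, collection timestamp, collection terminal hardware identifier); the canonical JSON serialization, the `|` separator, and the function names are assumptions.

```python
import hashlib
import json

def digital_fingerprint(record: dict, collected_at: str, device_id: str) -> str:
    """SHA-256 over a content digest, collection timestamp, and terminal identifier."""
    # Canonical JSON (sorted keys, fixed separators) so equal records hash equally.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    content_digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    payload = f"{content_digest}|{collected_at}|{device_id}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def matches_on_chain(record: dict, collected_at: str, device_id: str,
                     on_chain_fingerprint: str) -> bool:
    """Recompute the fingerprint at call time and compare it with the stored one."""
    return digital_fingerprint(record, collected_at, device_id) == on_chain_fingerprint

# Illustrative desensitized record (field names are hypothetical).
fp = digital_fingerprint({"dx": "J18.9", "age_band": "40-49"},
                         "2025-08-06T09:00:00Z", "edge-node-01")
print(len(fp))
```

Only the fingerprint and the contributor's public key would go to the evidence storage chain, so any later change to the record, timestamp, or terminal identifier shows up as a mismatch.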

[0107] This method covers the entire lifecycle of medical data from generation to application, breaking down barriers between stages and achieving secure data integration and efficient utilization. It provides a feasible operational process for collaboration across the entire medical industry chain.

Traditional federated learning model training suffers from the following technical problems: model accuracy at each node is not monitored in real time during joint training; nodes whose model accuracy is low because of abnormal local data distribution still participate in parameter aggregation, further lowering overall model accuracy; there is no targeted accuracy optimization mechanism, so when a node's model accuracy declines the direction of adjustment is unclear and optimization is blind and inefficient; and fluctuations in model accuracy are not promptly fed back to the nodes, which therefore cannot adjust their local training strategies based on overall accuracy, so accuracy problems persist.

Therefore, step three also includes a model accuracy optimization sub-step: during joint training, after every 10 rounds of parameter aggregation, the parameter server of the cross-domain model training engine calculates the model's accuracy on the validation set of each node; when the validation set accuracy of a node falls below a preset threshold, the parameter server sends a feature importance prompt to that node, containing the three feature dimensions with the greatest impact on model accuracy and their corresponding alignment weights. The node adjusts the feature weights of its local training data according to the prompt, increasing the training proportion of highly important features. After adjustment, it re-executes model training and uploads the parameter gradients. The parameter server receives the adjusted gradients and aggregates them again. The above process is repeated until the validation set accuracy of all nodes is not lower than the preset threshold.

[0108] This solution improves model accuracy and stability through closed-loop optimization: It periodically calculates the validation set accuracy of each node to promptly identify nodes with low accuracy, preventing them from dragging down the overall model; feature importance indicators clarify the optimization direction, pointing out the features with the greatest impact on accuracy and their corresponding weights, making node adjustments more targeted and avoiding blind optimization; nodes increase the training ratio of high-importance features based on these indicators, strengthening the impact of key features on the model; the optimization process is iterative until all nodes achieve the required accuracy. The optimization logic is deeply correlated with feature alignment weights, and adjustment measures are formulated based on the actual impact of data features, effectively addressing accuracy fluctuations caused by differences in data distribution. The optimized joint model maintains stable analytical accuracy at each node, providing reliable model support for scenarios such as pharmaceutical research and development and efficacy prediction.
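The server-side part of this closed loop can be sketched as below: check accuracy every 10 aggregation rounds and, for each node under the threshold, build a prompt with the three most important feature dimensions and that node's alignment weights. The function names and the dictionary layouts are assumptions, and ranking features by an importance score is one plausible reading of "greatest impact on model accuracy".

```python
def is_check_round(round_index: int, interval: int = 10) -> bool:
    """True after every `interval` rounds of parameter aggregation."""
    return round_index > 0 and round_index % interval == 0

def feature_prompts(node_accuracy: dict, threshold: float,
                    importance: dict, alignment_weights: dict,
                    top_n: int = 3) -> dict:
    """Map each under-threshold node to its top-N (feature, alignment weight) pairs."""
    top = sorted(importance, key=importance.get, reverse=True)[:top_n]
    return {node: [(f, alignment_weights[node][f]) for f in top]
            for node, acc in node_accuracy.items() if acc < threshold}

# Illustrative values only.
importance = {"f1": 0.6, "f2": 1.2, "f3": 0.9, "f4": 1.0}
weights = {"hospital": {"f1": 0.3, "f2": 0.7, "f3": 0.5, "f4": 0.6},
           "pharma":   {"f1": 0.2, "f2": 0.4, "f3": 0.8, "f4": 0.1}}
accuracy = {"hospital": 0.91, "pharma": 0.78}

if is_check_round(10):
    print(feature_prompts(accuracy, 0.85, importance, weights))
```

The loop would end once `feature_prompts` returns an empty dictionary, which corresponds to every node meeting the preset threshold.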

[0109] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments that can be applied to other fields. However, any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.

Claims

1. A medical industry-wide data asset integration system based on blockchain and AI, characterized in that it includes a data acquisition layer, a blockchain rights confirmation layer, a federated learning analysis layer, a data asset trading platform, and a supply chain traceability optimization module. The data acquisition layer includes edge computing nodes, which have built-in data cleaning engines and desensitization algorithms for local cleaning and desensitization of hospital HIS system data, drug RFID data, pharmaceutical R&D experimental data, and medical imaging data. The blockchain rights confirmation layer constructs a medical data asset ledger based on a consortium blockchain architecture. The asset ledger includes data hash records, contributor identity identifiers, and usage authorization logs, and integrates a smart contract module for dynamic permission management. The federated learning analysis layer is configured with a distributed AI training framework, which is used to train drug demand prediction models and adverse reaction monitoring models jointly with nodes in medical distribution, pharmaceutical manufacturing, and health care scenarios, without transmitting the original data across domains. The data asset trading platform is built on a tokenization mechanism and integrates a secure computing module so that, during data usage rights trading, permissions are transferred while the original data remains invisible. The supply chain traceability optimization module includes an RFID data parsing unit and an AI logistics scheduling engine, which are used to optimize logistics routes by combining on-chain drug batch records with real-time inventory data. The data asset trading platform includes a token issuance module, a transaction matching engine, a revenue sharing and settlement engine, and a transaction record storage module.
The token issuance module is used to generate tokens according to the data access permission level. Each token contains an encrypted fragment of the data access key, the validity period of the permission, and the analyzable dimension information. A token for clinical trials contains 3 analysis dimensions and has a validity period of 30 days, while a token for basic statistics contains 1 analysis dimension and has a validity period of 15 days. The transaction matching engine is used to match the token purchase request of the data demander with the token sale information of the data provider, and to automatically generate transaction orders based on the preset price range and permission requirements of both parties. The revenue sharing and settlement engine is connected through an API to the smart contract executor of the blockchain rights confirmation layer. It is used to receive revenue sharing vouchers and convert the revenue sharing amount into cashable assets. The revenue sharing amount is calculated based on the data contribution, the number of calls, and the weight of the analysis dimensions. The transaction record storage module is used to upload transaction orders, token transfer records, and settlement results to the transaction chain to form a complete transaction traceability chain, ensuring that the transaction process is traceable and tamper-proof.
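
The token parameters recited in claim 1 can be illustrated with a minimal sketch. The 3-dimension/30-day and 1-dimension/15-day values come from the claim; the level names, the `AccessToken` type, and the `issue_token` function are hypothetical names introduced only for illustration, and the key fragment is treated as an opaque byte string rather than actually encrypted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Dimensions and validity periods as stated in claim 1.
# The level names are hypothetical labels chosen for this sketch.
PERMISSION_LEVELS = {
    "clinical_trial": {"dimensions": 3, "validity_days": 30},
    "basic_statistics": {"dimensions": 1, "validity_days": 15},
}


@dataclass(frozen=True)
class AccessToken:
    level: str
    key_fragment: bytes          # stands in for the encrypted key fragment
    analyzable_dimensions: int
    issued_at: datetime
    expires_at: datetime


def issue_token(level: str, key_fragment: bytes, now: datetime) -> AccessToken:
    """Build a token whose dimensions and expiry follow the permission level."""
    params = PERMISSION_LEVELS[level]
    return AccessToken(
        level=level,
        key_fragment=key_fragment,
        analyzable_dimensions=params["dimensions"],
        issued_at=now,
        expires_at=now + timedelta(days=params["validity_days"]),
    )
```

For example, `issue_token("clinical_trial", fragment, datetime(2025, 1, 1))` yields a token with three analyzable dimensions that expires on 31 January 2025.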

2. The medical industry-wide data asset integration system based on blockchain and AI as described in claim 1, characterized in that the blockchain rights confirmation layer includes a digital fingerprint generator, a zero-knowledge proof verification unit, a smart contract executor, and a dual-chain communication interface. After data collection is completed, the digital fingerprint generator applies the SHA-256 hash algorithm to the cleaned medical data, generating a unique digital fingerprint containing a data content summary, a collection timestamp, and a collection terminal hardware identifier. The zero-knowledge proof verification unit has a built-in verification algorithm which, during the data calling phase, compares the digital fingerprint generated from the data being called with the fingerprint stored as evidence on the chain, in order to verify that the data has not been tampered with and that its source is authentic. The smart contract executor stores a data usage agreement, which presets the data call permission range, the single-call revenue sharing ratio, and the cumulative call revenue sharing threshold. When a data call meets the preset conditions of the agreement, a revenue sharing voucher containing the caller's identifier, the call time, and the revenue sharing amount is automatically generated. The dual-chain communication interface is used to realize cross-chain data synchronization between the evidence storage chain and the transaction chain. The evidence storage chain is a consortium chain used to persistently store digital fingerprints, contributor public keys, and zero-knowledge proof verification records. The transaction chain is a permissioned private chain used to store smart contracts, ledger vouchers, and permission change records.
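
The fingerprint generation and comparison described in claim 2 might look roughly like the sketch below. The claim specifies only SHA-256 and the three fingerprint components; the way they are combined here (hashing a `|`-joined string) and the function names are assumptions. The check is a plain digest comparison and does not implement a zero-knowledge proof.

```python
import hashlib
import hmac


def make_fingerprint(cleaned_data: bytes, collected_at: str, hardware_id: str) -> str:
    """Hash the content summary together with the timestamp and hardware ID.

    The composition scheme is an assumption made for this sketch.
    """
    content_summary = hashlib.sha256(cleaned_data).hexdigest()
    payload = "|".join([content_summary, collected_at, hardware_id])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_fingerprint(cleaned_data: bytes, collected_at: str,
                       hardware_id: str, on_chain_fingerprint: str) -> bool:
    """Recompute the fingerprint of the called data and compare it with the
    fingerprint stored on the evidence chain (constant-time comparison)."""
    recomputed = make_fingerprint(cleaned_data, collected_at, hardware_id)
    return hmac.compare_digest(recomputed, on_chain_fingerprint)
```

Any change to the data, the timestamp, or the hardware identifier produces a different digest, so the comparison fails for tampered or mis-attributed records.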

3. The medical industry-wide data asset integration system based on blockchain and AI as described in claim 1, characterized in that the federated learning analysis layer includes a cross-domain model training engine, a heterogeneous data alignment unit, and a differential privacy protection module. The cross-domain model training engine includes a parameter server and a node training client. The parameter server is used to receive and aggregate the model parameter gradients uploaded by each node. The node training client is deployed at pharmaceutical company and hospital terminals and is used to perform model training based on local data. The heterogeneous data alignment unit has a built-in pre-trained feature extraction model, which includes a medical image feature extraction sub-model and a pathological text parsing sub-model. The medical image feature extraction sub-model adopts a convolutional neural network structure to extract lesion feature vectors from unstructured medical images. The pathological text parsing sub-model adopts a BERT pre-trained model to convert structured pathology reports into text feature vectors. The two types of vectors are mapped to a unified feature space of dimension 512 through a feature mapping algorithm. The differential privacy protection module is equipped with a noise generator that adds Laplace noise to the model parameter gradient before it is uploaded. The noise intensity is dynamically adjusted according to the data sensitivity. The noise coefficient for clinical case data is set to 0.3, and the noise coefficient for pharmaceutical R&D experimental data is set to 0.5.
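
As a rough illustration of the noise generator in claim 3, the sketch below adds Laplace noise to a gradient before upload. It assumes that the claimed noise coefficient acts as the scale parameter of a zero-mean Laplace distribution, which the claim does not state, and it reads the R&D coefficient as 0.5. The dictionary keys and function name are hypothetical.

```python
import random

# Noise coefficients per data type; treating them as the Laplace scale
# parameter is an assumption of this sketch.
NOISE_COEFFICIENTS = {
    "clinical_case": 0.3,
    "rd_experiment": 0.5,
}


def add_laplace_noise(gradient, data_type, rng):
    """Return a copy of `gradient` with independent Laplace noise per entry.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    scale = NOISE_COEFFICIENTS[data_type]
    rate = 1.0 / scale
    return [g + rng.expovariate(rate) - rng.expovariate(rate) for g in gradient]
```

A node would call `add_laplace_noise(local_gradient, "clinical_case", random.Random())` and upload the result in place of the raw gradient.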

4. The medical industry-wide data asset integration system based on blockchain and AI as described in claim 1, characterized in that the blockchain rights confirmation layer calculates the contribution coefficient of each data source using the multi-source data contribution coefficient formula [formula not reproduced in this text], in which the symbols denote: the contribution coefficient of the i-th class of data source; the number of valid samples from the i-th class of data source; the total number of valid samples from all data sources participating in the joint analysis; the quality coefficient of the i-th class of data; the timeliness coefficient of the i-th class of data, calculated by a formula [not reproduced in this text] from the time interval between data generation and the present and the maximum number of valid days for that class of data; the domain urgency coefficient of the i-th class of data; the domain coefficient of the i-th class of data; and the weighting coefficients, whose constraint and values are likewise not reproduced in this text.

5. The medical industry-wide data asset integration system based on blockchain and AI as described in claim 3, characterized in that the heterogeneous data alignment unit calculates feature alignment weights using a feature alignment weight formula [formula not reproduced in this text]; the weights are used to correct feature offsets when heterogeneous data is mapped to the unified feature space. The symbols in the formula denote: the alignment weight of the i-th class of data source on the j-th feature dimension; the contribution coefficient of the i-th class of data source; the mean of the i-th class of data source on the j-th feature dimension; the overall mean on the j-th feature dimension across all data sources participating in joint training, obtained by the parameter server through weighted aggregation of the data uploaded by each node; the overall standard deviation of all data sources on the j-th feature dimension; the offset correction factor; and the importance coefficient of the j-th feature dimension.

6. The medical industry-wide data asset integration system based on blockchain and AI as described in claim 1, characterized in that the supply chain traceability optimization module calculates a logistics adjustment coefficient using the logistics dynamic adjustment coefficient formula [formula not reproduced in this text]. The symbols in the formula denote: the logistics adjustment coefficient from a given warehouse to a given demand area, where one value range indicates that goods are preferentially transferred from the warehouse to the area and the other indicates that the transfer is deferred [the ranges are not reproduced in this text]; the current actual inventory quantity of the warehouse; the feature alignment weight of the warehouse's inventory data; the daily inventory fluctuation of the warehouse over a preceding period [its length in days is not reproduced in this text]; the time decay coefficient; the current drug demand of the demand area; the feature alignment weight of the demand area's demand data; the straight-line distance from the warehouse to the demand area; the distance influence coefficient; and the correction factor for sudden events.

7. The medical industry-wide data asset integration system based on blockchain and AI as described in claim 1, characterized in that the data asset trading platform also includes a token verification unit and a permission revocation engine. The token verification unit has a built-in token parsing algorithm, which is used to verify, after a token is transferred, whether the recipient holds the corresponding data usage rights. The verification covers the validity period of the token, the range of accessible data, and the remaining call quota. The permission revocation engine works in conjunction with the smart contract executor of the blockchain rights confirmation layer to automatically revoke the recipient's data access permissions when the token expires, the call quota is exhausted, or the data provider issues a permission revocation instruction, and uploads the permission revocation record to the transaction chain. After a data usage rights transaction is completed, the token verification unit sends a transaction completion notification to the data provider's terminal. The notification includes the transaction amount, the recipient's identifier, and the time at which the permission takes effect. At the same time, the permission revocation engine starts a timer to ensure that the permission is revoked promptly once the token's validity period expires, thus preventing permission abuse.
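
The verification and revocation conditions of claim 7 can be sketched as follows. The three checks (validity period, accessible data range, remaining call quota) and the three revocation triggers come from the claim; the `TokenState` type, the function names, and the use of a plain list in place of the transaction chain are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TokenState:
    expires_at: datetime
    accessible_datasets: frozenset
    remaining_calls: int
    revoked: bool = False


def verify_access(token: TokenState, dataset_id: str, now: datetime) -> bool:
    """Check validity period, accessible data range, and remaining quota."""
    return (not token.revoked
            and now < token.expires_at
            and dataset_id in token.accessible_datasets
            and token.remaining_calls > 0)


def should_revoke(token: TokenState, now: datetime,
                  provider_revoked: bool = False) -> bool:
    """True when any revocation trigger named in the claim applies."""
    return (provider_revoked
            or now >= token.expires_at
            or token.remaining_calls <= 0)


def consume_call(token: TokenState, dataset_id: str, now: datetime,
                 revocation_log: list) -> bool:
    """Spend one call if permitted; revoke and log once the quota is used up.

    `revocation_log` stands in for the on-chain revocation record.
    """
    if not verify_access(token, dataset_id, now):
        return False
    token.remaining_calls -= 1
    if should_revoke(token, now):
        token.revoked = True
        revocation_log.append({"dataset": dataset_id, "revoked_at": now})
    return True
```

In this sketch revocation is evaluated on each call; the timer-driven revocation described in the claim would additionally call `should_revoke` when the validity period elapses.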

8. A method for integrating medical industry data assets based on blockchain and AI, applied to the medical industry-wide data asset integration system based on blockchain and AI as described in any one of claims 1-7, characterized in that it includes:
S1: The edge computing nodes of the data acquisition layer receive medical data sent by the hospital HIS system, the pharmaceutical R&D system, and the medical distribution platform. They remove duplicate and invalid data through the built-in cleaning engine, and then replace sensitive information through the desensitization algorithm to obtain standardized data for subsequent processing.
S2: The digital fingerprint generator of the blockchain rights confirmation layer generates digital fingerprints for the standardized data and uploads the digital fingerprints and contributor public keys to the evidence storage chain; at the same time, the smart contract executor generates a smart contract containing the data usage rules and deploys it to the transaction chain.
S3: The cross-domain model training engine of the federated learning analysis layer receives local model training requests from each node. The heterogeneous data alignment unit performs feature alignment on heterogeneous data according to the feature alignment weight formula. The differential privacy protection module adds dynamic noise, after which joint training is performed. Only the model parameter gradients are transmitted during the training process.
S4: The data asset trading platform matches data supply and demand based on the tokenization mechanism, realizes anonymous trading of data usage rights through a multi-party secure computing module, and calculates the revenue sharing amount using the multi-source data contribution coefficient formula of the blockchain rights confirmation layer.
S5: The RFID data parsing unit of the supply chain traceability optimization module reads the RFID information of the drugs and generates the drug circulation trajectory by combining it with the on-chain batch records. The AI logistics scheduling engine generates a logistics route adjustment plan based on the logistics dynamic adjustment coefficient formula.

9. The method for integrating medical industry data assets based on blockchain and AI according to claim 8, characterized in that S3 also includes a model accuracy optimization sub-step: during joint training, after every 10 rounds of parameter aggregation, the parameter server of the cross-domain model training engine calculates the model's accuracy on the validation set of each node; when the validation set accuracy of a node falls below a preset threshold, the parameter server sends a feature importance prompt to that node, which includes the three feature dimensions that have the greatest impact on model accuracy and their corresponding alignment weights. The node adjusts the feature weights of its local training data according to the prompt, increases the training proportion of high-importance features, re-executes model training after the adjustment, and uploads the parameter gradients; the parameter server receives the adjusted gradients and aggregates them again, repeating the above process until the validation set accuracy of every node is no lower than the preset threshold.
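
The server-side selection logic of claim 9 can be sketched in a few lines. The 10-round check interval and the three-feature prompt come from the claim; how feature importance is measured is not specified there, so the `importances` mapping is taken as a given input, and all function and parameter names are hypothetical.

```python
def nodes_needing_prompt(round_index, accuracies, threshold, check_every=10):
    """After every `check_every` aggregation rounds, list the nodes whose
    validation accuracy is below the preset threshold."""
    if round_index == 0 or round_index % check_every != 0:
        return []
    return [node for node, acc in accuracies.items() if acc < threshold]


def feature_prompt(importances, alignment_weights, top_k=3):
    """Pair the `top_k` most important feature dimensions with their
    alignment weights, most important first."""
    ranked = sorted(importances, key=importances.get, reverse=True)[:top_k]
    return [(name, alignment_weights[name]) for name in ranked]
```

The parameter server would send the result of `feature_prompt` to each node returned by `nodes_needing_prompt`, then aggregate the re-trained gradients and repeat the check at the next interval.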