A method for constructing a medical research and development test data association path based on data processing

By calculating the time boundary damping factor and phase alignment coefficient, a cross-stage interaction matrix is constructed, and the maximum physical absolute amplitude is extracted. This solves the problem of inconsistent frequency and scale in cross-stage data association in pharmaceutical R&D, and achieves accurate data matching and reliable association.

CN122245822A · Pending · Publication Date: 2026-06-19 · SHANGHAI XINGYUANHUI HEALTH TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANGHAI XINGYUANHUI HEALTH TECH CO LTD
Filing Date
2026-03-30
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In current pharmaceutical R&D workflows, associating data across stages suffers from context drift because sampling frequency and scale are inconsistent between stages. Irregular local sampling distorts the boundaries of each time series, which makes it difficult to match the physical absolute amplitude characteristics of biochemical data and introduces judgment bias and noise.

Method used

The method calculates a time boundary damping factor to eliminate distortion, generates a phase alignment coefficient, constructs a cross-stage interaction matrix, extracts the maximum physical absolute amplitude, and calculates a baseline threshold to determine the data connectivity status.

Benefits of technology

It achieves objective and accurate matching of cross-stage data, eliminates noise interference caused by irregular sampling, adaptively quantifies feature changes at different stages, establishes reliable correlation paths, and is suitable for the merged analysis of high-risk medical data.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of R&D data processing technology and discloses a method for constructing association paths for pharmaceutical R&D trial data based on data processing. This method collects monitoring indicators from adjacent trial stages to construct multidimensional data and timestamp sequences; calculates the adjacent intervals of the time series and obtains the boundary damping factor based on the time density gradient; uses this factor to attenuate the original data to remove boundary distortion; generates a phase alignment coefficient based on the corrected sequence internal variability; calculates the normalized dot product and combines it with the alignment coefficient to construct a cross-stage interaction matrix; extracts the matrix mean as topological tension, which is then converted into standard path association degree; finally, extracts the maximum physical absolute amplitude of the data to calculate the dynamic baseline threshold, and outputs the final association connectivity state through comparison. This invention effectively overcomes cross-stage data scale misalignment and boundary distortion, achieving objective quantitative construction of association paths.

Description

Technical Field

[0001] This invention relates to the field of R&D data processing technology, specifically a method for constructing association paths for pharmaceutical R&D trial data based on data processing.

Background Technology

[0002] The pharmaceutical research and development process typically involves multiple progressive experimental phases, such as the transition from preclinical animal experiments to Phase I clinical trials. Across these stages, common key physiological or biochemical monitoring indicators must be tracked continuously. To evaluate the safety and efficacy of drugs comprehensively, experimental data must be connected across stages to build a continuous data association path. In existing pharmaceutical R&D data processing solutions, the correlation of cross-stage data usually relies on manual comparison based on experience or on basic statistical trend fitting. However, different trial stages often have very different testing environments, which leads directly to large inconsistencies in the scale and sampling frequency of the data. When the correlation of cross-stage data is assessed, this misalignment of frequency and scale causes context drift, which can lead to large judgment biases in conventional comparison and alignment schemes. Furthermore, in actual medical trial scenarios, the initial enrollment and final exit periods of each stage are often unstable, which makes sampling times highly irregular. Irregular sampling produces strong boundary distortions at both ends of the time series. Existing data alignment schemes cannot adaptively identify and dynamically eliminate such local distortions, which both degrades the phase alignment accuracy of the global sequence and introduces distortion noise into subsequent variability calculations. Meanwhile, pharmaceutical and biochemical data often carry both positive and negative physical meanings. When quantifying cross-stage correlation, existing technologies cannot objectively and accurately separate the physical absolute amplitude characteristics of each data dimension, so the system cannot set a dynamic baseline based on the macroscopic physical differences in data distribution. As a result, the system cannot accurately determine the true connectivity state of high-risk core data during cross-stage transitions. There is therefore an urgent need for a data processing-based method for constructing association paths for pharmaceutical R&D trial data that addresses the context drift caused by inconsistent data frequency and scale across stages, overcomes the boundary time-series distortion caused by irregular local sampling, and extracts dynamic baseline thresholds from real physical absolute amplitudes, so that pharmaceutical R&D data can be matched objectively and accurately and reliable association paths can be constructed.

Summary of the Invention

[0003] This invention provides a method for constructing data association paths in pharmaceutical R&D trials based on data processing, which helps to solve the problems mentioned in the background art.

[0004] This invention provides the following technical solution: a method for constructing association paths for pharmaceutical R&D trial data based on data processing, comprising: collecting common monitoring index data from adjacent experimental phases, extracting sampling data in chronological order, and constructing multidimensional data sequences and timestamp sequences; for the constructed timestamp sequence, calculating the time interval between every two adjacent sampling points within each experimental phase; calculating the time boundary damping factor of each sampling node as the ratio of the absolute value of the difference between adjacent time intervals to their sum; numerically attenuating the original multidimensional data sequence using the time boundary damping factor to obtain a corrected multidimensional data sequence with distortion removed; taking the sum of squares of the Euclidean norms of the difference vectors between adjacent nodes in the corrected multidimensional data sequence as the internal variability, and generating the phase alignment coefficient by combining the variability of the two stages; calculating the normalized dot product between the corrected multidimensional data of the two stages, multiplying the result by the phase alignment coefficient, and mapping to generate a cross-stage interaction matrix; taking the mean of all elements in the cross-stage interaction matrix as the topological tension, and obtaining the standard path correlation degree by a nonlinear transformation of the tension; and extracting the maximum physical absolute amplitude of each type of corrected data, calculating the average extreme value ratio to obtain the baseline threshold, comparing it with the path correlation degree, and outputting the connectivity status.

[0005] Optionally, the step of collecting common monitoring index data from adjacent experimental phases, extracting sampling data in chronological order, and constructing a multidimensional data sequence and timestamp sequence includes: connecting to a pharmaceutical R&D database and selecting two adjacent trial phases, namely a preceding phase (the pre-stage) and a following phase (the post-stage); selecting the key monitoring indicators common to both phases; exporting the sampling data of the pre-stage and of the post-stage in chronological order of sampling time; constructing a multidimensional data sequence and a corresponding timestamp sequence for the pre-stage, wherein each single sampling vector of the pre-stage is composed of the values of the key monitoring indicators in that stage; and constructing a multidimensional data sequence and a corresponding timestamp sequence for the post-stage, wherein each single sampling vector of the post-stage is composed of the values of the key monitoring indicators in that stage.
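The sequence construction described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the record layout (a list of dictionaries with a `time` key), the indicator names `alt` and `creatinine`, and the function name `build_sequences` are all assumptions made for the example.

```python
from datetime import datetime

def build_sequences(records, indicators):
    """Sort one stage's records by sampling time and return (timestamps, vectors).

    Each vector holds the values of the shared indicators for one sample.
    """
    ordered = sorted(records, key=lambda r: r["time"])
    timestamps = [r["time"] for r in ordered]
    vectors = [[float(r[name]) for name in indicators] for r in ordered]
    return timestamps, vectors

# Hypothetical pre-stage records, deliberately out of chronological order.
pre_records = [
    {"time": datetime(2025, 1, 3), "alt": 41.0, "creatinine": 0.9},
    {"time": datetime(2025, 1, 1), "alt": 35.0, "creatinine": 0.8},
]
shared_indicators = ["alt", "creatinine"]  # hypothetical indicator names

pre_times, pre_vectors = build_sequences(pre_records, shared_indicators)
```

The same function would be called once more on the post-stage records, using the same indicator list so that both stages share one vector layout.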

[0006] Optionally, for the constructed timestamp sequence, calculating the time interval between every two adjacent sampling points within each experimental phase includes: For the pre-stage, the time difference between the current sampling point and the previous sampling point arranged in chronological order within the stage is calculated as the time interval of the pre-stage. For the subsequent stage, the time difference between the current sampling point and the previous sampling point arranged in chronological order within the stage is calculated as the time interval of the subsequent stage. The current sampling point is calculated starting from the second sampling point within the corresponding stage.
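As an illustration of the interval calculation, the sketch below assumes timestamps have already been converted to plain numbers (hours since the start of the stage); the function name and the sample values are hypothetical.

```python
def time_intervals(timestamps):
    """Return the gap between each sampling point and the one before it.

    The first sampling point has no predecessor, so the result has
    len(timestamps) - 1 entries, starting with the second point's interval.
    """
    return [timestamps[i] - timestamps[i - 1] for i in range(1, len(timestamps))]

# Hypothetical sampling times in hours; the jump from 2.0 to 6.0 is irregular.
hours = [0.0, 1.0, 2.0, 6.0, 7.0]
intervals = time_intervals(hours)
```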

[0007] Optionally, calculating the time boundary damping factor of each sampling node as the ratio of the absolute value of the difference between adjacent time intervals to their sum includes: for the pre-stage, obtaining the time interval corresponding to the current sampling point and the time interval corresponding to the next sampling point; calculating the absolute value of the difference between the two time intervals and the sum of the two time intervals, and dividing the absolute value by the sum to obtain the time boundary damping factor of the current sampling point in the pre-stage; for the post-stage, obtaining the time interval corresponding to the current sampling point and the time interval corresponding to the next sampling point; calculating the absolute value of the difference between the two time intervals and the sum of the two time intervals, and dividing the absolute value by the sum to obtain the time boundary damping factor of the current sampling point in the post-stage; and setting the time boundary damping factor of the first and last sampling points of each stage to zero.
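A minimal sketch of the damping factor calculation for one stage is given below. It assumes numeric timestamps, and it adds a guard that returns zero when two consecutive intervals sum to zero (duplicate timestamps); that guard is an assumption of this example and is not stated in the text.

```python
def damping_factors(timestamps):
    """Return one time boundary damping factor per sampling point.

    For an interior point, the factor is |next_gap - own_gap| / (next_gap + own_gap),
    where own_gap is the interval ending at the point and next_gap is the interval
    ending at the following point. The first and last points are fixed at zero.
    """
    n = len(timestamps)
    gaps = [timestamps[i] - timestamps[i - 1] for i in range(1, n)]
    factors = [0.0] * n
    for i in range(1, n - 1):
        own_gap = gaps[i - 1]   # interval of the current sampling point
        next_gap = gaps[i]      # interval of the next sampling point
        total = own_gap + next_gap
        # Assumed guard: treat a zero total (duplicate timestamps) as no damping.
        factors[i] = abs(next_gap - own_gap) / total if total > 0 else 0.0
    return factors

hours = [0.0, 1.0, 2.0, 6.0, 7.0]   # hypothetical sampling times
factors = damping_factors(hours)
```

With evenly spaced samples every factor is zero; the two points on either side of the four-hour gap receive a factor of 3/5.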

[0008] Optionally, the step of using the time boundary damping factor to numerically attenuate the original multidimensional data sequence to obtain a distortion-removed corrected multidimensional data sequence includes: subtracting the time boundary damping factor of the current sampling point in the pre-stage from the value one to obtain the pre-stage attenuation coefficient; multiplying the pre-stage attenuation coefficient by the corresponding original single sampling vector of the pre-stage to obtain the corrected sampling vector of the pre-stage; collecting all corrected sampling vectors of the pre-stage to form the corrected multidimensional data sequence of the pre-stage; subtracting the time boundary damping factor of the current sampling point in the post-stage from the value one to obtain the post-stage attenuation coefficient; multiplying the post-stage attenuation coefficient by the corresponding original single sampling vector of the post-stage to obtain the corrected sampling vector of the post-stage; and collecting all corrected sampling vectors of the post-stage to form the corrected multidimensional data sequence of the post-stage.
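The attenuation step can be sketched as below, assuming the attenuation coefficient is one minus the damping factor, so that a damping factor of zero leaves a sample unchanged. The function name and the sample data are hypothetical.

```python
def attenuate(vectors, factors):
    """Scale each sampling vector by (1 - damping factor) of its sampling point."""
    if len(vectors) != len(factors):
        raise ValueError("one damping factor is required per sampling vector")
    return [[(1.0 - g) * x for x in vec] for vec, g in zip(vectors, factors)]

# Hypothetical two-indicator samples; the second point sits at an irregular gap.
raw = [[10.0, -2.0], [20.0, 4.0], [12.0, 1.0]]
corrected = attenuate(raw, [0.0, 0.5, 0.0])
```

Because the damping factor lies between zero and one, the coefficient never changes the sign of a component; it can only shrink its magnitude.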

[0009] Optionally, the step of calculating the sum of squares of the Euclidean norms of the difference vectors between adjacent nodes in the corrected multidimensional data sequence as the internal variability, and combining the variability of the two stages to calculate the phase alignment coefficient, includes: Calculate the global internal variability of the data sequences for the two experimental phases respectively; For the corrected multidimensional data sequence in the pre-stage, calculate the difference vector between two adjacent corrected sampling vectors, and calculate the square of the Euclidean norm of the difference vector; The global internal variability of the preceding stage is obtained by summing the squares of all the Euclidean norms calculated in the preceding stage. For the corrected multidimensional data sequence in the later stage, calculate the difference vector between two adjacent corrected sampling vectors, and calculate the square of the Euclidean norm of the difference vector. The global internal variability of the post-stage is obtained by summing the squares of all the Euclidean norms calculated in the post-stage. Multiply the global internal variability of the preceding stage by the global internal variability of the following stage, and multiply the product by a value of two to obtain the first value. The second value is obtained by adding the square of the global internal variability of the preceding stage to the square of the global internal variability of the following stage, and then adding a preset minimum positive constant. Divide the first value by the second value to obtain the cross-stage phase alignment coefficient.
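The two quantities in this step can be sketched as follows. The default value of `eps` stands in for the preset minimum positive constant and is an assumed value; the function names and sample data are also hypothetical.

```python
def internal_variability(vectors):
    """Sum of squared Euclidean norms of differences between adjacent vectors."""
    total = 0.0
    for prev, curr in zip(vectors, vectors[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total

def phase_alignment(var_pre, var_post, eps=1e-12):
    """Return 2 * Va * Vb / (Va^2 + Vb^2 + eps); eps is an assumed small constant."""
    return 2.0 * var_pre * var_post / (var_pre ** 2 + var_post ** 2 + eps)

pre = [[0.0, 0.0], [3.0, 4.0]]                 # one step of squared length 25
post = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0]]    # steps of squared length 1 and 4
v_pre = internal_variability(pre)
v_post = internal_variability(post)
phi = phase_alignment(v_pre, v_post)
```

The coefficient is symmetric in its two arguments and cannot exceed one, since 2ab is never larger than a² + b²; it approaches one when the two stages have equal variability.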

[0010] Optionally, the step of calculating the normalized dot product between the corrected multidimensional data of the two stages and multiplying the result by the phase alignment coefficient to map and generate a cross-stage interaction matrix includes: selecting one corrected sampling vector from the pre-stage and one corrected sampling vector from the post-stage, calculating their dot product, and taking the absolute value; multiplying the absolute value by the cross-stage phase alignment coefficient and then by the value two to obtain the first intermediate value; calculating the square of the Euclidean norm of each of the two selected corrected sampling vectors, adding the two squares together, and adding a preset minimum positive constant to obtain the second intermediate value; dividing the first intermediate value by the second intermediate value to obtain one element of the cross-stage interaction matrix; and traversing all combinations of corrected sampling vectors from the pre-stage and the post-stage to calculate every element, thereby generating the cross-stage interaction matrix.
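A minimal sketch of the matrix construction follows. The value of `eps` is an assumed stand-in for the preset minimum positive constant, and the function name and sample vectors are hypothetical.

```python
def interaction_matrix(pre, post, phi, eps=1e-12):
    """Build the cross-stage matrix with one row per pre-stage sample.

    Element (i, j) is 2 * phi * |u . v| / (|u|^2 + |v|^2 + eps), where u is the
    i-th corrected pre-stage vector and v is the j-th corrected post-stage vector.
    """
    matrix = []
    for u in pre:
        row = []
        for v in post:
            dot = abs(sum(a * b for a, b in zip(u, v)))
            denom = sum(a * a for a in u) + sum(b * b for b in v) + eps
            row.append(2.0 * phi * dot / denom)
        matrix.append(row)
    return matrix

pre = [[1.0, 0.0], [0.0, 2.0]]
post = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
m = interaction_matrix(pre, post, phi=1.0)
```

Taking the absolute value of the dot product means that the opposite-signed vector `[-1.0, 0.0]` scores the same as `[1.0, 0.0]`, and orthogonal vectors score zero.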

[0011] Optionally, the step of calculating the mean of all elements in the cross-stage interaction matrix as the topological tension, and performing a nonlinear transformation on this tension to obtain the standard path correlation degree, includes: adding all the element values in the cross-stage interaction matrix and dividing the sum by the total number of elements in the matrix to obtain the stage transition data topological tension, where the total number of elements is the product of the total number of samples in the pre-stage and the total number of samples in the post-stage; calculating the square of the stage transition data topological tension and multiplying it by two to obtain the third intermediate value; adding one to the square of the stage transition data topological tension to obtain the fourth intermediate value; and dividing the third intermediate value by the fourth intermediate value to obtain the standard path correlation degree.
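The aggregation and the nonlinear transformation can be sketched as follows; the function names and the sample matrix are hypothetical.

```python
def topological_tension(matrix):
    """Mean of all elements of the cross-stage interaction matrix."""
    values = [value for row in matrix for value in row]
    return sum(values) / len(values)

def path_correlation(tension):
    """Map the tension T to 2 * T^2 / (1 + T^2)."""
    squared = tension ** 2
    return 2.0 * squared / (1.0 + squared)

sample_matrix = [[1.0, 0.5], [0.5, 0.0]]   # hypothetical 2 x 2 matrix
tension = topological_tension(sample_matrix)
correlation = path_correlation(tension)
```

For a tension between zero and one the transformation is increasing, maps zero to zero and one to one, and pulls intermediate values downward (a tension of 0.5 becomes 0.4).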

[0012] Optionally, the step of extracting the maximum physical absolute amplitude of each type of corrected data, calculating the average extreme value ratio to obtain a baseline threshold, comparing it with the path correlation degree, and outputting the connectivity status includes: obtaining the component values of each dimension in the corrected multidimensional data sequences of the pre-stage and the post-stage respectively, and taking the absolute value of every component value; for each key monitoring indicator, extracting the maximum of all absolute values in the pre-stage and the maximum of all absolute values in the post-stage; comparing the two maxima for that indicator and dividing the smaller by the sum of the larger and a preset minimum positive constant to obtain the extreme value ratio of the indicator; summing the extreme value ratios of all key monitoring indicators and dividing the sum by the total number of key monitoring indicators to obtain the dynamic extreme value baseline threshold; multiplying the dynamic extreme value baseline threshold by the preset baseline penalty sensitivity coefficient to obtain the penalty threshold; subtracting the penalty threshold from the standard path correlation degree to obtain the connectivity status value of the association path; and determining whether the connectivity status value is greater than zero: if it is, the association path can be established; otherwise, the association path cannot be established.
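The threshold and the final decision can be sketched as below. The sensitivity values 0.6 and 1.0, the correlation value 0.4, and the default `eps` are illustrative assumptions; the text does not give concrete values for the preset constants.

```python
def baseline_threshold(pre, post, eps=1e-12):
    """Average, over indicators, of min(peak) / (max(peak) + eps).

    A peak is the largest absolute value of one indicator within one stage.
    """
    dims = len(pre[0])
    ratios = []
    for k in range(dims):
        peak_pre = max(abs(vec[k]) for vec in pre)
        peak_post = max(abs(vec[k]) for vec in post)
        small, large = sorted((peak_pre, peak_post))
        ratios.append(small / (large + eps))
    return sum(ratios) / dims

def connectivity(correlation, threshold, sensitivity):
    """Return (status value, established?) for correlation - sensitivity * threshold."""
    status = correlation - sensitivity * threshold
    return status, status > 0

pre = [[2.0, -8.0], [-4.0, 1.0]]     # peaks per indicator: 4 and 8
post = [[1.0, 4.0], [-2.0, -2.0]]    # peaks per indicator: 2 and 4
threshold = baseline_threshold(pre, post)

lenient = connectivity(0.4, threshold, sensitivity=0.6)
strict = connectivity(0.4, threshold, sensitivity=1.0)
```

With these numbers the same correlation degree passes under the smaller sensitivity coefficient and fails under the larger one, which is the adjustment behaviour the step describes.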

[0013] The present invention has the following beneficial effects:

1. This technical solution eliminates distortion in the original multidimensional data sequence by calculating the time boundary damping factor, then extracts the internal variability to generate a phase alignment coefficient, and calculates the standard path correlation degree from the topological tension of the cross-stage interaction matrix. Finally, it determines the connectivity status of data in adjacent experimental stages by extracting the maximum physical absolute amplitude to construct a dynamic extreme value baseline threshold. In adjacent experimental stages of pharmaceutical R&D (such as the early translation from the preclinical stage to Phase I clinical trials), data often exhibits bias due to irregular sampling or scale misalignment. This solution objectively quantifies the characteristic changes between stages and mitigates the boundary noise and distortion caused by individual patient differences or irregular local sampling times. In high-risk cross-stage transition scenarios for core pharmaceutical data, the system adaptively quantifies the symmetric similarity of two irregularly sampled sequences and isolates the true physical amplitude extremes that represent the macroscopic difference between the stage baselines. Relying on a baseline threshold derived purely from the data distribution, together with the penalty sensitivity coefficient, the system balances tolerance for drift in normal trial stages against strict prevention of false alarms, and thereby objectively determines whether cross-stage R&D data can form a reliable association path and be included in the management system for subsequent merged analysis.

2. Collecting common monitoring index data from adjacent experimental stages and extracting the sampling data in chronological order to construct multidimensional data sequences and timestamp sequences structurally integrates scattered, independent pharmaceutical R&D monitoring data. It establishes the basic structure for joining data from different R&D stages and provides a unified high-dimensional time-series comparison space for the subsequent assessment of cross-stage data evolution.

3. Calculating the time interval between every two adjacent sampling points within each experimental stage quantifies the dynamic changes in sampling frequency during clinical trials and pharmaceutical R&D. In actual trial scenarios, the initial enrollment and final exit periods of each stage are often unstable, which makes sampling times highly irregular, and direct data alignment would introduce severe misalignment errors. The absolute time span between adjacent sampling points reflects abrupt changes in sampling frequency. This operation makes explicit the density gradient information hidden in the original timestamps and provides a direct measure for the subsequent adaptive identification of local time-series distortion, thereby avoiding global phase alignment deviations caused by irregular sampling times.

4. The time boundary damping factor of each sampling node is the ratio of the absolute value of the difference between adjacent time intervals to their sum. This uses the relative rate of change of the time intervals to quantify abrupt changes in sampling frequency: the more uneven the intervals, the larger the damping factor. Because it is a ratio, the calculation adapts automatically to the different time scales and sampling frequencies of different stages; day-level preclinical observation and hour-level clinical monitoring are normalized into the same evaluation system. Taking the absolute value ensures that the damping factor is non-negative and physically meaningful, which prevents data attenuation from turning into amplification. This provides a reliable, adaptive quantitative basis for capturing time boundary distortion noise caused by individual patient differences or irregular local sampling times.

5. Using the time boundary damping factor to numerically attenuate the original multidimensional data sequence yields a corrected multidimensional data sequence with distortion removed. This step acts directly on the data at the distorted points of the time series and attenuates them proportionally: the more drastic the change in sampling frequency at a node, the stronger the attenuation, so that distortion noise introduced by boundary instability is removed dynamically. The series, originally full of irregular fluctuations, becomes smoother and better reflects the trend of drug action. Distortion noise is kept out of the subsequent variability calculation and interaction matrix construction, which preserves the essential characteristics of the core biochemical indicators and improves the robustness of the algorithm and the purity of feature extraction under extremely irregular sampling.

6. Taking the sum of squares of the Euclidean norms of the difference vectors between adjacent nodes in the corrected multidimensional data sequence as the internal variability accounts for the overall evolution of pharmaceutical and biochemical data in multidimensional space. The difference vectors between adjacent nodes capture the rate and direction of change of the key monitoring indicators over one sampling span, and the sum of squared norms aggregates these changes into a macroscopic measure of the intensity of internal fluctuation over the entire experimental stage. After time noise has been excluded, this measure reflects the inherent physiological changes of the data at that stage, and it lays the foundation for unifying the variability measures of different stages and for quantifying the symmetric similarity of irregularly sampled sequences. Combining the internal variability of the pre-stage and the post-stage generates the cross-stage phase alignment coefficient. This coefficient uses the product and the sum of squares of the two internal variabilities to construct a dimensionless symmetric evaluation criterion whose value is confined to a fixed interval, which unifies the very different data scales and variability dimensions of different trial stages. When cross-stage data sources are joined, the coefficient quantifies the symmetric similarity of the two sequences in the evolution of their intrinsic features, so that clinical sequences that could not be compared directly because of frequency and scale misalignment obtain a comparable basis in a common space. This improves the consistency with which context drift is quantified and the accuracy of cross-stage data connection.

7. Calculating the normalized dot product between the corrected multidimensional data of the two stages and multiplying the result by the phase alignment coefficient generates the cross-stage interaction matrix. This construction connects the experimental data of two stages that differ in time span and scale. The normalized dot product captures the similarity in direction and structure between individual multidimensional sampling vectors of the two stages and reduces the influence of absolute numerical fluctuations caused by individual metabolic differences. Multiplying by the phase alignment coefficient incorporates the symmetric similarity weight of the global sequences into the comparison of local nodes, so that each element of the matrix reflects the strength of association between one pre-stage sample and one post-stage sample, providing a complete high-dimensional relationship map for the subsequent extraction of global stage transition features.

8. Taking the mean of all elements of the cross-stage interaction matrix as the topological tension and applying a nonlinear transformation to obtain the standard path correlation degree reduces and aggregates the global association information in the interaction matrix. The mean smooths the outlier interference caused by locally abnormal sampling points and summarizes, at a macroscopic level, how closely the data of the two stages cohere. The nonlinear transformation, combined with a normalization gain, counteracts the collapse of the value range of the raw topological tension under extreme matching conditions and maps the correlation score onto the standard interval. The score is driven entirely by the inherent structural relationships of the medical data, which makes the quantification of the cross-stage matching state more objective and its boundaries clearer.

9. Extracting the maximum physical absolute amplitude of each type of corrected data and calculating the average extreme value ratio yields the dynamic extreme value baseline threshold. The absolute value is applied to every component of the corrected data, so that indicators reflect their maximum physical fluctuation range whether they show positive gain or negative decay. This avoids losing the meaning of the benchmark through algebraic sign reversal in data that has both positive and negative physical meanings. The average extreme value ratio computed from these amplitudes yields a threshold that represents the macroscopic difference between the baselines of the two stages purely from the data distribution, improving the adaptive judgment capability of the system for monitoring data from different drug targets. The dynamic extreme value baseline threshold is then compared with the standard path correlation degree, weighted by the preset baseline penalty sensitivity coefficient, and the final connectivity status of the association path is output. This judgment mechanism balances tolerance for normal biological fluctuations against strict prevention of false positives. The baseline penalty sensitivity coefficient lets the system be adjusted for experimental data of different risk levels. The larger the value, the stricter the requirement for data consistency, so that excessive drift caused by extreme individual differences between patients is rejected; this suits the strict compliance merging of high-risk core medical data. The smaller the value, the higher the fault tolerance, which suits early, exploratory cross-species translational data association. This judgment and comparison logic makes the output connectivity status reliable for medical applications and feasible to implement.

Attached Figure Description

[0014] Figure 1 is a schematic diagram of the basic process of the present invention.

Detailed Implementation

[0015] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0016] Example 1: referring to Figure 1, a method for constructing association paths for pharmaceutical R&D trial data based on data processing, comprising: collecting common monitoring index data from adjacent experimental phases, extracting sampling data in chronological order, and constructing multidimensional data sequences and timestamp sequences; for the constructed timestamp sequence, calculating the time interval between every two adjacent sampling points within each experimental phase; calculating the time boundary damping factor of each sampling node as the ratio of the absolute value of the difference between adjacent time intervals to their sum; numerically attenuating the original multidimensional data sequence using the time boundary damping factor to obtain a corrected multidimensional data sequence with distortion removed; taking the sum of squares of the Euclidean norms of the difference vectors between adjacent nodes in the corrected multidimensional data sequence as the internal variability, and generating the phase alignment coefficient by combining the variability of the two stages; calculating the normalized dot product between the corrected multidimensional data of the two stages, multiplying the result by the phase alignment coefficient, and mapping to generate a cross-stage interaction matrix; taking the mean of all elements in the cross-stage interaction matrix as the topological tension, and obtaining the standard path correlation degree by a nonlinear transformation of the tension; and extracting the maximum physical absolute amplitude of each type of corrected data, calculating the average extreme value ratio to obtain the baseline threshold, comparing it with the path correlation degree, and outputting the connectivity status.
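For reference, the steps of this embodiment can be written compactly in formula form. The symbols are notation introduced here for illustration and do not appear in the source: $t_i$ and $x_i$ are the timestamp and sampling vector of point $i$ in a stage with $N$ points, superscripts $a$ and $b$ mark the pre-stage and post-stage, $K$ is the number of key monitoring indicators, $\varepsilon$ is the preset minimum positive constant, and $\lambda$ is the baseline penalty sensitivity coefficient. The attenuation coefficient is read as one minus the damping factor, which is an interpretation of the text.

```latex
\begin{align*}
\Delta t_i &= t_i - t_{i-1}, \qquad i = 2, \dots, N \\
\gamma_i &= \frac{\lvert \Delta t_{i+1} - \Delta t_i \rvert}{\Delta t_{i+1} + \Delta t_i}
  \quad (1 < i < N), \qquad \gamma_1 = \gamma_N = 0 \\
\tilde{x}_i &= (1 - \gamma_i)\, x_i \\
V &= \sum_{i=1}^{N-1} \lVert \tilde{x}_{i+1} - \tilde{x}_i \rVert^2 \\
\varphi &= \frac{2\, V^a V^b}{(V^a)^2 + (V^b)^2 + \varepsilon} \\
M_{ij} &= \frac{2\, \varphi\, \lvert \tilde{x}^a_i \cdot \tilde{x}^b_j \rvert}
  {\lVert \tilde{x}^a_i \rVert^2 + \lVert \tilde{x}^b_j \rVert^2 + \varepsilon} \\
T &= \frac{1}{N^a N^b} \sum_{i=1}^{N^a} \sum_{j=1}^{N^b} M_{ij}, \qquad
  R = \frac{2 T^2}{1 + T^2} \\
A^s_k &= \max_i \lvert \tilde{x}^s_{i,k} \rvert \quad (s \in \{a, b\}), \qquad
  r_k = \frac{\min(A^a_k, A^b_k)}{\max(A^a_k, A^b_k) + \varepsilon} \\
B &= \frac{1}{K} \sum_{k=1}^{K} r_k, \qquad S = R - \lambda B
\end{align*}
```

The association path is judged establishable when $S > 0$. Since $2\lvert u \cdot v \rvert \le \lVert u \rVert^2 + \lVert v \rVert^2$ and $2pq \le p^2 + q^2$, both $\varphi$ and every $M_{ij}$ lie between zero and one, so $T$ and $R$ do as well.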

[0017] The step of collecting common monitoring indicator data from adjacent trial stages, extracting the sampling data in chronological order, and constructing a multidimensional data sequence and a timestamp sequence includes: connecting to a pharmaceutical R&D database and selecting two adjacent trial stages, namely a pre-stage and a post-stage; selecting the key monitoring indicators common to both stages; exporting the sampling data of the pre-stage and the sampling data of the post-stage in chronological order of sampling time; constructing a multidimensional data sequence and a corresponding timestamp sequence for the pre-stage, wherein each single sampling vector of the pre-stage combines the data of all key monitoring indicators in that stage; and constructing a multidimensional data sequence and a corresponding timestamp sequence for the post-stage, wherein each single sampling vector of the post-stage combines the data of all key monitoring indicators in that stage.
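The sequence construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (a timestamp paired with a dictionary of indicator values) and the function names `build_sequences` and `common_indicators` are assumptions introduced here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (timestamp, {indicator_name: value}).
Record = Tuple[float, Dict[str, float]]


def common_indicators(pre: Sequence[Record], post: Sequence[Record]) -> List[str]:
    """Indicators present in every record of both stages, in sorted order."""
    shared = None
    for _, values in list(pre) + list(post):
        keys = set(values)
        shared = keys if shared is None else shared & keys
    return sorted(shared or [])


def build_sequences(
    records: Sequence[Record], indicators: Sequence[str]
) -> Tuple[List[List[float]], List[float]]:
    """Return (data_sequence, timestamp_sequence) for one trial stage.

    Records are sorted by timestamp; each sampling vector lists the
    indicator values in the fixed order given by `indicators`.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    timestamps = [t for t, _ in ordered]
    vectors = [[values[name] for name in indicators] for _, values in ordered]
    return vectors, timestamps
```

Fixing the indicator order once and reusing it for both stages keeps the k-th component of every sampling vector tied to the same monitoring indicator, which the later per-indicator steps rely on.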

[0018] The step of calculating, for the constructed timestamp sequences, the time interval between every two adjacent sampling points within each trial stage includes: for the pre-stage, calculating the time difference between the current sampling point and the preceding sampling point, in chronological order within the stage, as the time interval of the pre-stage; for the post-stage, calculating the time difference between the current sampling point and the preceding sampling point, in chronological order within the stage, as the time interval of the post-stage; wherein the current sampling point starts from the second sampling point of the corresponding stage.

[0019] The step of calculating the time boundary damping factor of each sampling node as the ratio of the absolute value of the difference between adjacent time intervals to their sum includes: for the pre-stage, obtaining the time interval corresponding to the current sampling point and the time interval corresponding to the next sampling point; calculating the absolute value of the difference between the two time intervals and the sum of the two time intervals, and dividing the absolute value by the sum to obtain the time boundary damping factor of the current sampling point in the pre-stage; for the post-stage, obtaining the time interval corresponding to the current sampling point and the time interval corresponding to the next sampling point; calculating the absolute value of the difference between the two time intervals and the sum of the two time intervals, and dividing the absolute value by the sum to obtain the time boundary damping factor of the current sampling point in the post-stage; and setting the time boundary damping factor of the first and last sampling points of each stage to zero.
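The interval and damping-factor steps of the two preceding paragraphs can be sketched together. The function names are illustrative, and the sketch assumes strictly increasing timestamps so that the sum of two adjacent intervals is never zero.

```python
from typing import List, Sequence


def time_intervals(timestamps: Sequence[float]) -> List[float]:
    """intervals[k] is the gap between sampling point k + 1 and point k (0-based)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def damping_factors(timestamps: Sequence[float]) -> List[float]:
    """Time boundary damping factor per sampling point; endpoints stay at 0."""
    n = len(timestamps)
    gaps = time_intervals(timestamps)
    factors = [0.0] * n
    for i in range(1, n - 1):
        current_gap = gaps[i - 1]  # interval ending at point i
        next_gap = gaps[i]         # interval ending at point i + 1
        # Assumption: timestamps strictly increase, so the sum is positive.
        factors[i] = abs(next_gap - current_gap) / (next_gap + current_gap)
    return factors
```

Because the factor is a ratio of intervals, multiplying every timestamp by a constant (for example converting days to hours) leaves it unchanged: `damping_factors([0, 1, 4, 5])` and `damping_factors([0, 24, 96, 120])` give the same result.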

[0020] The step of numerically attenuating the original multidimensional data sequence with the time boundary damping factor to obtain a corrected multidimensional data sequence with distortion removed includes: subtracting the time boundary damping factor of the current sampling point in the pre-stage from the value one to obtain the pre-stage attenuation coefficient; multiplying the pre-stage attenuation coefficient by the corresponding original single sampling vector of the pre-stage to obtain the corrected sampling vector of the pre-stage; collecting all corrected sampling vectors of the pre-stage to form the corrected multidimensional data sequence of the pre-stage; subtracting the time boundary damping factor of the current sampling point in the post-stage from the value one to obtain the post-stage attenuation coefficient; multiplying the post-stage attenuation coefficient by the corresponding original single sampling vector of the post-stage to obtain the corrected sampling vector of the post-stage; and collecting all corrected sampling vectors of the post-stage to form the corrected multidimensional data sequence of the post-stage.
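A minimal sketch of the attenuation step follows. It assumes the attenuation coefficient is one minus the damping factor, which is the reading consistent with endpoints (damping factor zero) being left unchanged; the function name `attenuate` is illustrative.

```python
from typing import List, Sequence


def attenuate(
    vectors: Sequence[Sequence[float]], factors: Sequence[float]
) -> List[List[float]]:
    """Scale each sampling vector by (1 - damping factor) of its sampling point."""
    if len(vectors) != len(factors):
        raise ValueError("one damping factor is required per sampling vector")
    corrected = []
    for vector, factor in zip(vectors, factors):
        coefficient = 1.0 - factor  # assumed form of the attenuation coefficient
        corrected.append([coefficient * component for component in vector])
    return corrected
```

With a damping factor between 0 and 1 the coefficient also stays between 0 and 1, so a sample can only be shrunk, never amplified.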

[0021] The step of calculating the sum of squares of the Euclidean norms of the difference vectors between adjacent nodes in the corrected multidimensional data sequence as the internal variability, and combining the variability of the two stages to generate the phase alignment coefficient, includes: Calculate the global internal variability of the data sequences for the two experimental phases respectively; For the corrected multidimensional data sequence in the pre-stage, calculate the difference vector between two adjacent corrected sampling vectors, and calculate the square of the Euclidean norm of the difference vector; The global internal variability of the preceding stage is obtained by summing the squares of all the Euclidean norms calculated in the preceding stage. For the corrected multidimensional data sequence in the later stage, calculate the difference vector between two adjacent corrected sampling vectors, and calculate the square of the Euclidean norm of the difference vector. The global internal variability of the post-stage is obtained by summing the squares of all the Euclidean norms calculated in the post-stage. Multiply the global internal variability of the preceding stage by the global internal variability of the following stage, and multiply the product by a value of two to obtain the first value. The second value is obtained by adding the square of the global internal variability of the preceding stage to the square of the global internal variability of the following stage, and then adding a preset minimum positive constant. Divide the first value by the second value to obtain the cross-stage phase alignment coefficient.
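The variability and alignment calculations can be sketched as below. The value of `EPSILON` is purely illustrative, since the source does not state the constant, and the function names are assumptions.

```python
from typing import Sequence

EPSILON = 1e-9  # illustrative only; the source does not give the constant's value


def internal_variability(vectors: Sequence[Sequence[float]]) -> float:
    """Sum of squared Euclidean norms of differences between adjacent vectors."""
    total = 0.0
    for previous, current in zip(vectors, vectors[1:]):
        total += sum((c - p) ** 2 for p, c in zip(previous, current))
    return total


def phase_alignment(v_pre: float, v_post: float, eps: float = EPSILON) -> float:
    """Twice the product of the variabilities over the sum of their squares plus eps."""
    return 2.0 * v_pre * v_post / (v_pre ** 2 + v_post ** 2 + eps)
```

The coefficient is symmetric in its two arguments and approaches one only when the two stages have nearly equal variability; the small constant also makes the degenerate case of two flat sequences return zero instead of dividing by zero.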

[0022] The step of calculating the normalized dot product between the corrected multidimensional data of the two stages and multiplying the result by the phase alignment coefficient to generate a cross-stage interaction matrix includes: selecting one corrected sampling vector from the pre-stage and one corrected sampling vector from the post-stage, calculating their dot product, and taking its absolute value; multiplying the absolute value by the cross-stage phase alignment coefficient and then by the value two to obtain the first intermediate value; calculating the square of the Euclidean norm of each of the two selected corrected sampling vectors, adding the two squares together, and adding a preset small positive constant to obtain the second intermediate value; dividing the first intermediate value by the second intermediate value to obtain one element of the cross-stage interaction matrix; and traversing all combinations of corrected sampling vectors of the pre-stage and the post-stage to calculate all elements, thereby generating the cross-stage interaction matrix.
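A sketch of the matrix construction, with rows indexed by pre-stage samplings and columns by post-stage samplings. The row and column convention, the function name and the value of `EPSILON` are assumptions made for illustration.

```python
from typing import List, Sequence

EPSILON = 1e-9  # illustrative only; the source does not give the constant's value


def interaction_matrix(
    pre: Sequence[Sequence[float]],
    post: Sequence[Sequence[float]],
    alignment: float,
    eps: float = EPSILON,
) -> List[List[float]]:
    """Element (i, j) = 2 * alignment * |u . v| / (|u|^2 + |v|^2 + eps)."""
    matrix = []
    for u in pre:
        norm_u_sq = sum(a * a for a in u)
        row = []
        for v in post:
            dot = abs(sum(a * b for a, b in zip(u, v)))
            norm_v_sq = sum(b * b for b in v)
            row.append(2.0 * alignment * dot / (norm_u_sq + norm_v_sq + eps))
        matrix.append(row)
    return matrix
```

Orthogonal vectors give an element of zero, while parallel vectors of equal length give an element close to the alignment coefficient itself.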

[0023] The step of taking the mean of all elements of the cross-stage interaction matrix as the topological tension and applying a nonlinear transformation to the tension to obtain the standard path correlation degree includes: adding all element values of the cross-stage interaction matrix and dividing the sum by the total number of elements of the matrix to obtain the stage transition data topological tension, wherein the total number of elements is the product of the total number of samplings in the pre-stage and the total number of samplings in the post-stage; calculating the square of the stage transition data topological tension and multiplying it by two to obtain the third intermediate value; adding one to the square of the stage transition data topological tension to obtain the fourth intermediate value; and dividing the third intermediate value by the fourth intermediate value to obtain the standard path correlation degree.
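The aggregation and nonlinear mapping can be sketched in a few lines; the function names are illustrative.

```python
from typing import Sequence


def topological_tension(matrix: Sequence[Sequence[float]]) -> float:
    """Mean of all elements of the cross-stage interaction matrix."""
    count = sum(len(row) for row in matrix)
    if count == 0:
        raise ValueError("the interaction matrix must not be empty")
    return sum(sum(row) for row in matrix) / count


def path_correlation(tension: float) -> float:
    """Nonlinear mapping 2 * t^2 / (1 + t^2) of the tension t."""
    return 2.0 * tension ** 2 / (1.0 + tension ** 2)
```

The mapping sends a tension of 0 to 0 and a tension of 1 to 1, and it compresses small tensions: a tension of 0.5 maps to 0.4.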

[0024] The step of extracting the maximum physical absolute amplitude of each indicator in the corrected data, calculating the mean extreme value ratio to obtain the baseline threshold, comparing it with the path correlation degree, and outputting the connectivity status includes: obtaining the component values of each dimension in the corrected multidimensional data sequences of the pre-stage and the post-stage, and taking the absolute value of every component value; for each key monitoring indicator, extracting the maximum of all absolute values in the pre-stage and the maximum of all absolute values in the post-stage; comparing the two maxima for that indicator and dividing the smaller one by the sum of the larger one and the preset small positive constant to obtain the extreme value ratio of the indicator; summing the extreme value ratios of all key monitoring indicators and dividing the sum by the total number of key monitoring indicators to obtain the dynamic extreme value baseline threshold; multiplying the dynamic extreme value baseline threshold by the preset baseline penalty sensitivity coefficient to obtain the penalty threshold; subtracting the penalty threshold from the standard path correlation degree to obtain the connectivity status value of the association path; and determining whether the connectivity status value is greater than zero: if it is, the association path is determined to be establishable; otherwise, the association path is determined not to be establishable.
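The threshold and decision steps can be sketched as follows. The function names and the value of `EPSILON` are assumptions; the sensitivity coefficient is deliberately left as a required argument because this paragraph does not fix its value.

```python
from typing import Sequence

EPSILON = 1e-9  # illustrative only; the source does not give the constant's value


def baseline_threshold(
    pre: Sequence[Sequence[float]],
    post: Sequence[Sequence[float]],
    eps: float = EPSILON,
) -> float:
    """Mean over indicators of (smaller peak) / (larger peak + eps)."""
    n_indicators = len(pre[0])
    ratios = []
    for k in range(n_indicators):
        peak_pre = max(abs(vector[k]) for vector in pre)
        peak_post = max(abs(vector[k]) for vector in post)
        smaller, larger = sorted((peak_pre, peak_post))
        ratios.append(smaller / (larger + eps))
    return sum(ratios) / n_indicators


def connectivity_status(correlation: float, threshold: float, sensitivity: float) -> float:
    """Standard path correlation degree minus the penalty threshold."""
    return correlation - sensitivity * threshold


def path_can_be_established(status: float) -> bool:
    """The association path is established only for a strictly positive status."""
    return status > 0.0
```

Taking absolute values before the maximum means an indicator that swings to -4 counts as a peak of 4, so the sign of the measurement does not affect the ratio.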
This technical solution removes distortion from the original multidimensional data sequence by calculating the time boundary damping factor, extracts the internal variability to generate the phase alignment coefficient, and calculates the standard path correlation degree from the topological tension of the cross-stage interaction matrix. Finally, it determines the connectivity status of the data of adjacent trial stages by extracting the maximum physical absolute amplitude to construct the dynamic extreme value baseline threshold. In adjacent trial stages of pharmaceutical R&D (for example, the early translation from the preclinical stage to Phase I clinical trials), data often deviate because of irregular sampling or scale misalignment. The solution objectively quantifies the feature changes between stages and suppresses the boundary noise and distortion caused by individual patient differences or locally irregular sampling times. For high-risk cross-stage transitions of core pharmaceutical data, it adaptively quantifies the symmetric similarity of two irregularly sampled sequences and isolates the true physical amplitude extremes that represent the macroscopic gap between the stage baselines. Because the baseline threshold is determined purely by the data distribution and is adjusted by the penalty sensitivity coefficient, the system both tolerates drift in normal trial stages and strictly prevents false alarms, so that it can objectively determine whether cross-stage R&D data can form a reliable association path and be admitted to the management system for subsequent merged analysis.
Example 2. A method for constructing association paths for pharmaceutical R&D trial data based on data processing, further including the following.

The step of collecting common monitoring indicator data from adjacent trial stages, extracting the sampling data in chronological order, and constructing a multidimensional data sequence and a timestamp sequence includes: connecting to the pharmaceutical R&D database and selecting the two adjacent trial stages for which it must be determined whether an association path can be constructed, denoted pre-stage $A$ and post-stage $B$; selecting, in the two stages, $K$ common key monitoring indicators, such as a specific protein concentration, blood pressure, or target binding rate; exporting, in chronological order of sampling time, the $N_A$ samplings of stage $A$ and the $N_B$ samplings of stage $B$; constructing for stage $A$ the multidimensional data sequence $X_A = \{\mathbf{x}_A(i)\}_{i=1}^{N_A}$ and the corresponding timestamp sequence $T_A = \{t_A(i)\}_{i=1}^{N_A}$, where the single sampling vector is

$$\mathbf{x}_A(i) = \left[x_{A,1}(i),\, x_{A,2}(i),\, \ldots,\, x_{A,K}(i)\right]^{\mathsf T};$$

similarly, constructing for stage $B$ the multidimensional data sequence $X_B = \{\mathbf{x}_B(j)\}_{j=1}^{N_B}$ and the corresponding timestamp sequence $T_B = \{t_B(j)\}_{j=1}^{N_B}$, where the single sampling vector is

$$\mathbf{x}_B(j) = \left[x_{B,1}(j),\, x_{B,2}(j),\, \ldots,\, x_{B,K}(j)\right]^{\mathsf T}.$$

Collecting the common monitoring indicator data of adjacent trial stages and extracting the sampling data in chronological order yields a multidimensional data sequence and a timestamp sequence for each stage. This structurally integrates scattered, independent pharmaceutical R&D monitoring data, establishes the basic structure on which data from different R&D stages can be joined, and provides a unified high-dimensional time series space for the subsequent assessment of cross-stage data evolution.
The step of calculating, for the constructed timestamp sequences, the time interval between every two adjacent sampling points within each trial stage includes: calculating, for stage $A$ and stage $B$, the time interval between every two adjacent sampling time points, with the time indices restricted to $2 \le i \le N_A$ and $2 \le j \le N_B$:

[0025] $$\Delta t_A(i) = t_A(i) - t_A(i-1), \qquad \Delta t_B(j) = t_B(j) - t_B(j-1)$$

where: $\Delta t_A(i)$ is the time interval between the $i$-th sampling and the $(i-1)$-th sampling of stage $A$; $t_A(i)$ is the acquired timestamp of the $i$-th sampling of stage $A$; $\Delta t_B(j)$ is the corresponding time interval of stage $B$; $t_B(j)$ is the acquired timestamp of the $j$-th sampling of stage $B$; and $N_A$ and $N_B$ are the numbers of samplings of stage $A$ and stage $B$, respectively.

Calculating the time interval between every two adjacent sampling points within each trial stage quantifies how the sampling frequency changes over the course of the trial. In real pharmaceutical trials, the initial enrollment period and the final exit period of each stage are often unstable, so sampling times are highly irregular and direct data alignment would introduce serious misalignment errors. The absolute time span between adjacent sampling points reflects abrupt changes in sampling frequency. This step makes explicit the density gradient information hidden in the original timestamps and provides the most direct measure for the subsequent adaptive identification of local time series distortion, thereby avoiding global phase alignment deviations caused by irregular sampling times.

The step of calculating the time boundary damping factor of each sampling node as the ratio of the absolute value of the difference between adjacent time intervals to their sum includes: after the time intervals are obtained, calculating for stage $A$ and stage $B$ the time boundary damping factor of each sampling node, which measures the degree of distortion of the local time series, with the node indices restricted to $2 \le i \le N_A - 1$ and $2 \le j \le N_B - 1$:

[0026] $$\lambda_A(i) = \frac{\left|\Delta t_A(i+1) - \Delta t_A(i)\right|}{\Delta t_A(i+1) + \Delta t_A(i)}, \qquad \lambda_B(j) = \frac{\left|\Delta t_B(j+1) - \Delta t_B(j)\right|}{\Delta t_B(j+1) + \Delta t_B(j)}$$

where: $\lambda_A(i)$ is the boundary damping factor of the $i$-th data point of stage $A$, whose value lies between 0 and 1; for the endpoints $i = 1$ and $i = N_A$ the damping factor is always 0; $\lambda_B(j)$ is the boundary damping factor of stage $B$, likewise set to 0 at the endpoints $j = 1$ and $j = N_B$; $\Delta t_A(i)$ and $\Delta t_B(j)$ are the known, previously calculated time intervals; $\Delta t_A(i+1)$ and $\Delta t_B(j+1)$ are the time intervals adjacent to $\Delta t_A(i)$ and $\Delta t_B(j)$; and $N_A$ and $N_B$ are the numbers of samplings of stage $A$ and stage $B$, respectively.

Because the factor is the ratio of the absolute difference of adjacent time intervals to their sum, it quantifies abrupt changes in sampling frequency through the relative rate of change of the intervals: the more uneven the intervals, the larger the damping factor. A ratio of this form adapts automatically to the very different time scales and sampling frequencies of different stages, so that preclinical observation at the daily level and clinical monitoring at the hourly level are normalized to the same scale. Taking the absolute value guarantees that the damping factor is non-negative and keeps its physical meaning, which prevents the unphysical case in which attenuation would amplify the data. This provides a reliable, adaptive quantitative basis for capturing the time boundary distortion noise caused by individual patient differences or locally irregular sampling times.

The step of numerically attenuating the original multidimensional data sequence with the time boundary damping factor to obtain a corrected multidimensional data sequence with distortion removed includes: applying the damping factor to the original data vectors at all nodes, i.e. $1 \le i \le N_A$ and $1 \le j \le N_B$, to perform the correction:

[0027] $$\tilde{\mathbf{x}}_A(i) = \bigl(1 - \lambda_A(i)\bigr)\,\mathbf{x}_A(i), \qquad \tilde{\mathbf{x}}_B(j) = \bigl(1 - \lambda_B(j)\bigr)\,\mathbf{x}_B(j)$$

where: $\tilde{\mathbf{x}}_A(i)$ and $\tilde{\mathbf{x}}_B(j)$ are the corrected $i$-th and $j$-th multidimensional sampling vectors of stage $A$ and stage $B$, respectively; $\mathbf{x}_A(i)$ and $\mathbf{x}_B(j)$ are the original, directly measured multidimensional vectors; $\lambda_A(i)$ and $\lambda_B(j)$ are the known, previously calculated time boundary damping factors; and $N_A$ and $N_B$ are the numbers of samplings of stage $A$ and stage $B$, respectively.

Attenuating the original multidimensional data sequence with the calculated time boundary damping factor yields a corrected multidimensional data sequence with distortion removed. The step acts directly on the data at the distorted points of the time series and attenuates them in proportion to the distortion: the more sharply the sampling frequency changes at a node, the stronger the attenuation, which suppresses the distortion noise introduced by boundary instability. The time series, originally affected by irregular fluctuations, is thereby brought closer to a data basis that reflects the trend of drug action. Distortion noise is kept out of the subsequent variability calculation and interaction matrix construction, the essential characteristics of the core biochemical indicators are preserved as far as possible, and the robustness of the algorithm under extremely irregular sampling is improved.

The step of taking the sum of the squared Euclidean norms of the difference vectors between adjacent nodes in the corrected multidimensional data sequence as the internal variability, and combining the variability of the two stages to generate the phase alignment coefficient, includes: after the data correction is complete, calculating the global internal variability of the data sequence of each of the two trial stages:

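[0028] $$V_A = \sum_{i=2}^{N_A} \bigl\|\tilde{\mathbf{x}}_A(i) - \tilde{\mathbf{x}}_A(i-1)\bigr\|_2^2, \qquad V_B = \sum_{j=2}^{N_B} \bigl\|\tilde{\mathbf{x}}_B(j) - \tilde{\mathbf{x}}_B(j-1)\bigr\|_2^2$$

where: $V_A$ and $V_B$ are the global internal variability of stage $A$ and stage $B$, respectively; $\|\cdot\|_2$ denotes the Euclidean norm of a vector; $\tilde{\mathbf{x}}_A(i)$ and $\tilde{\mathbf{x}}_B(j)$ are the obtained corrected data vectors; and $N_A$ and $N_B$ are the numbers of samplings of stage $A$ and stage $B$, respectively. The phase alignment coefficient is then calculated from the global internal variability: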

[0029] $$\rho = \frac{2\,V_A V_B}{V_A^2 + V_B^2 + \varepsilon}$$

where: $\varepsilon$ is a directly set small positive constant that prevents division by zero, and its value is [value missing]; $\rho$ is the cross-stage phase alignment coefficient, whose value lies between 0 and 1 and which quantifies the symmetric similarity of two irregularly sampled sequences in terms of feature changes.

Taking the sum of the squared Euclidean norms of the difference vectors between adjacent nodes in the corrected multidimensional data sequence as the internal variability accounts for the overall evolution of the biochemical data in multidimensional space. The difference vectors between adjacent nodes capture the rate and direction of change of the key monitoring indicators over one sampling span, and summing their squared Euclidean norms aggregates this multidimensional rate of change into a single scalar that reflects the intensity of internal fluctuation over the whole trial stage. With the time noise removed, this quantity reflects the inherent physiological change of the data at that stage and provides the basis for unifying the variability measures of different stages and for quantifying the symmetric similarity of irregularly sampled sequences.

As the basis of the inter-stage mapping, the cross-stage phase alignment coefficient is generated by combining the internal variability of the pre-stage and the post-stage. The coefficient uses the product and the sum of squares of the two internal variabilities to form a dimensionless, symmetric evaluation criterion whose value is confined to a fixed interval, which unifies the very different data scales and variability magnitudes of different trial stages.
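The claim that the alignment coefficient is confined to a fixed interval can be checked directly. The short derivation below is an illustration added here, using assumed symbols $V_A, V_B \ge 0$ for the two internal variabilities, $\varepsilon > 0$ for the small constant and $\rho$ for the coefficient; the numerical cases neglect $\varepsilon$.

```latex
% Assumed notation: V_A, V_B >= 0, \varepsilon > 0, \rho = alignment coefficient.
\[
(V_A - V_B)^2 \ge 0
\;\Longrightarrow\;
2 V_A V_B \le V_A^2 + V_B^2 < V_A^2 + V_B^2 + \varepsilon
\;\Longrightarrow\;
0 \le \rho = \frac{2 V_A V_B}{V_A^2 + V_B^2 + \varepsilon} < 1 .
\]
% Worked values, neglecting \varepsilon:
\[
V_A = V_B \gg \sqrt{\varepsilon} \;\Rightarrow\; \rho \approx 1, \qquad
(V_A, V_B) = (1, 3) \;\Rightarrow\; \rho \approx \tfrac{6}{10} = 0.6, \qquad
(V_A, V_B) = (1, 10) \;\Rightarrow\; \rho \approx \tfrac{20}{101} \approx 0.198 .
\]
```

The coefficient therefore depends only on the ratio of the two variabilities, not on their common scale, and falls off as the stages become more dissimilar.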
When cross-stage data sources are joined, this alignment coefficient quantifies the symmetric similarity of two irregularly sampled sequences in the evolution of their intrinsic features. Clinical sequences that could not be compared directly because of severe misalignment in frequency and scale thereby obtain a comparable basis in a common space, which improves the objective consistency with which the system quantifies context drift and the accuracy of cross-stage data joining. The step of calculating the normalized dot product between the corrected multidimensional data of the two stages and multiplying the result by the phase alignment coefficient to generate a cross-stage interaction matrix includes: calculating each element of the cross-stage interaction matrix item by item:

[0030] $$M_{ij} = \frac{2\,\rho\,\bigl|\tilde{\mathbf{x}}_A(i) \cdot \tilde{\mathbf{x}}_B(j)\bigr|}{\bigl\|\tilde{\mathbf{x}}_A(i)\bigr\|_2^2 + \bigl\|\tilde{\mathbf{x}}_B(j)\bigr\|_2^2 + \varepsilon}$$

where: $M_{ij}$ is the element in row $i$ and column $j$ of the cross-stage interaction matrix $M$; $\rho$ is the obtained cross-stage phase alignment coefficient; $\bigl|\tilde{\mathbf{x}}_A(i) \cdot \tilde{\mathbf{x}}_B(j)\bigr|$ is the absolute value of the dot product between the $i$-th corrected vector of stage $A$ and the $j$-th corrected vector of stage $B$; $\|\tilde{\mathbf{x}}_A(i)\|_2^2$ and $\|\tilde{\mathbf{x}}_B(j)\|_2^2$ are the squared Euclidean norms of the two corrected vectors; and $\varepsilon$ is a directly set small positive constant that prevents division by zero, whose value is [value missing].

Calculating the normalized dot product between the corrected multidimensional data of the two stages and multiplying the result by the phase alignment coefficient generates the cross-stage interaction matrix. This construction establishes a high-dimensional gravitational connection between data of two trial stages that differ in time span and scale. The normalized dot product measures how similar single multidimensional sampling vectors of different stages are in direction and structure, reducing the influence of absolute numerical fluctuations caused by individual metabolic differences. Multiplying by the phase alignment coefficient brings the symmetric similarity weight of the whole sequence into the comparison of individual nodes, so that each element of the cross-stage interaction matrix reflects the strength of the connection between one sampling of the pre-stage and one sampling of the post-stage, providing a complete high-dimensional relationship map for the subsequent extraction of global stage transition features.
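A short bound shows how large a single matrix element can become. It is an illustration added here, with assumed symbols: $\mathbf{u}$ and $\mathbf{v}$ stand for one corrected vector from each stage, $\rho \ge 0$ for the alignment coefficient and $\varepsilon > 0$ for the small constant.

```latex
% Assumed notation: u, v are corrected vectors, \rho >= 0, \varepsilon > 0.
\[
|\mathbf{u} \cdot \mathbf{v}|
\le \|\mathbf{u}\|_2 \, \|\mathbf{v}\|_2
\le \tfrac{1}{2}\bigl(\|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2\bigr)
\;\Longrightarrow\;
0 \le \frac{2\,\rho\,|\mathbf{u} \cdot \mathbf{v}|}
           {\|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 + \varepsilon}
\le \rho .
\]
% Worked values, neglecting \varepsilon:
\[
\mathbf{u} = (1, 0),\ \mathbf{v} = (2, 0) \;\Rightarrow\; \tfrac{2 \cdot 2}{1 + 4}\,\rho = 0.8\,\rho,
\qquad
\mathbf{u} = (1, 0),\ \mathbf{v} = (0, 1) \;\Rightarrow\; 0 .
\]
```

The first inequality is Cauchy-Schwarz and the second is the arithmetic-geometric mean inequality, so an element approaches $\rho$ only when the two vectors are parallel and of equal length.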
The step of taking the mean of all elements of the cross-stage interaction matrix as the topological tension and applying a nonlinear transformation to the tension to obtain the standard path correlation degree includes: calculating the global stage transition data topological tension:

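[0031] $$\tau = \frac{1}{N_A N_B} \sum_{i=1}^{N_A} \sum_{j=1}^{N_B} M_{ij}$$

where: $\tau$ is the stage transition data topological tension; $N_A$ and $N_B$ are the total numbers of samplings of stage $A$ and stage $B$, respectively; and $M_{ij}$ are the corresponding elements of the calculated cross-stage interaction matrix. The stage transition data topological tension is then transformed into the standard path correlation degree: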

[0032] $$R = \frac{2\,\tau^2}{1 + \tau^2}$$

where: $R$ is the standard path correlation degree after the nonlinear mapping, whose value lies between 0 and 1; the larger the value, the stronger the data cohesion between the two trial stages.

Taking the mean of all elements of the cross-stage interaction matrix as the topological tension and transforming it nonlinearly into the standard path correlation degree reduces the global association information of the high-dimensional interaction matrix to a single aggregate value. Using the mean smooths the outlier interference caused by locally abnormal sampling points and captures, at the macroscopic level, the overall data cohesion between the two trial stages. The nonlinear transformation, together with its normalization gain of two, prevents the value range of the raw topological tension from collapsing under extreme matching conditions and maps the association score onto the standard interval. The score is driven entirely by the inherent structural relationship of the medical data, which makes the quantified cross-stage matching state more objective and its boundary clearer.

The step of extracting the maximum physical absolute amplitude of each indicator in the corrected data, calculating the mean extreme value ratio to obtain the baseline threshold, comparing it with the path correlation degree, and outputting the connectivity status includes: performing the dynamic extreme value baseline threshold extraction:

[0033] $$\theta = \frac{1}{K} \sum_{k=1}^{K} \frac{\min\bigl(m_{A,k},\, m_{B,k}\bigr)}{\max\bigl(m_{A,k},\, m_{B,k}\bigr) + \varepsilon}, \qquad m_{A,k} = \max_{1 \le i \le N_A} \bigl|\tilde{x}_{A,k}(i)\bigr|, \quad m_{B,k} = \max_{1 \le j \le N_B} \bigl|\tilde{x}_{B,k}(j)\bigr|$$

where: $\theta$ is the dynamic extreme value baseline threshold; $K$ is the total number of initially determined key monitoring indicators; $\tilde{x}_{A,k}(i)$ is the component value of the $k$-th dimension of the calculated corrected vector $\tilde{\mathbf{x}}_A(i)$; $\tilde{x}_{B,k}(j)$ is the component value of the $k$-th dimension of the corrected vector $\tilde{\mathbf{x}}_B(j)$; $\varepsilon$ is a directly set small positive constant that prevents division by zero, whose value is [value missing]; and $N_A$ and $N_B$ are the numbers of samplings of stage $A$ and stage $B$, respectively.

The dynamic extreme value baseline threshold is calculated on the clean data obtained after boundary distortion has been removed. First, the absolute value of each component is taken to obtain the true physical amplitude. Then the maximum amplitude of each indicator within its stage is extracted, the extreme value ratio is calculated for each indicator, and the mean of the ratios characterizes the macroscopic baseline gap between the data of the two stages. The final path connectivity state is then calculated:

[0034] $$S = R - \gamma\,\theta$$

where: $S$ is the final output connectivity status value of the association path; $R$ is the calculated standard path correlation degree; $\theta$ is the calculated dynamic extreme value baseline threshold; and $\gamma$ is the preset baseline penalty sensitivity coefficient, whose value range is [value range missing].

In this scheme, the preferred setting of $\gamma$ is [value missing], chosen to balance tolerance for abnormal fluctuations against prevention of false alarms. This value keeps the blocking effect of the baseline threshold within 80% of the effective constraint. The larger the value, for example as it approaches 1.0, the stricter the system's requirement for data consistency and the more likely it is to reject drift that arises in normal trial stages from individual patient differences; this suits strict compliance association of high-risk core data in late-stage clinical trials. The smaller the value, for example 0.5, the higher the fault tolerance and the greater the data drift that is allowed; this suits the more exploratory association of cross-species or early translational data from the preclinical stage to the first clinical stage. When $S > 0$, stage $A$ and stage $B$ are judged to match successfully, and a reliable association path for the pharmaceutical R&D data is established, allowing inclusion in the management system for merged analysis.

Extracting the maximum physical absolute amplitude of each indicator in the corrected data and calculating the mean extreme value ratio yields the dynamic extreme value baseline threshold. The extreme value extraction applies the absolute value to every component of the corrected data, so that pharmaceutical and biochemical indicators reflect their maximum physical fluctuation range whether they show positive gain or negative decay. This avoids losing the meaning of the baseline through sign reversal in data whose positive and negative values are both physically significant.
Furthermore, the mean extreme value ratio calculated from this physical amplitude extracts, purely from the data distribution itself, the threshold that represents the macroscopic gap between the baselines of the two stages, which improves the system's adaptive judgment for monitoring data of different drug targets. The dynamic extreme value baseline threshold is combined with the preset baseline penalty sensitivity coefficient and compared with the standard path correlation degree to output the final connectivity status of the association path. This judgment mechanism balances tolerance for abnormal fluctuations in normal biological processes against strict prevention of false positives. The baseline penalty sensitivity coefficient gives the system the flexibility to handle trial data of different risk levels. The larger the value, the stricter the system's requirement for data consistency, so that excessive drift caused by extreme individual differences among patients is rejected; this suits strict compliance merging of high-risk core medical data. The smaller the value, the higher the fault tolerance, which suits early, strongly exploratory association of cross-species translational data. This judgment and comparison logic makes the final connectivity status reliable for medical applications and feasible for engineering implementation.
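The effect of the sensitivity coefficient can be seen by running the whole chain on a toy input. The sketch below is an assumption-laden illustration rather than the patented implementation: the attenuation coefficient is read as one minus the damping factor, `EPSILON` is an arbitrary small value, the function names are invented here, and the sample data are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
EPSILON = 1e-9  # illustrative only; the source does not state the constant


def damping_factors(timestamps: Sequence[float]) -> List[float]:
    """Damping factor per sampling point; the first and last points get 0."""
    factors = [0.0] * len(timestamps)
    for i in range(1, len(timestamps) - 1):
        gap = timestamps[i] - timestamps[i - 1]
        next_gap = timestamps[i + 1] - timestamps[i]
        factors[i] = abs(next_gap - gap) / (next_gap + gap)
    return factors


def corrected(vectors: Sequence[Vector], timestamps: Sequence[float]) -> List[List[float]]:
    """Attenuate each vector by (1 - damping factor), an assumed reading."""
    factors = damping_factors(timestamps)
    return [[(1.0 - f) * x for x in vec] for vec, f in zip(vectors, factors)]


def variability(vectors: Sequence[Vector]) -> float:
    """Sum of squared Euclidean norms of adjacent difference vectors."""
    return sum(
        sum((c - p) ** 2 for p, c in zip(prev, curr))
        for prev, curr in zip(vectors, vectors[1:])
    )


def connectivity_status(pre, pre_ts, post, post_ts, sensitivity, eps=EPSILON) -> float:
    """Run all steps and return correlation degree minus penalty threshold."""
    a, b = corrected(pre, pre_ts), corrected(post, post_ts)
    v_a, v_b = variability(a), variability(b)
    alignment = 2.0 * v_a * v_b / (v_a ** 2 + v_b ** 2 + eps)

    total = 0.0
    for u in a:
        for v in b:
            dot = abs(sum(x * y for x, y in zip(u, v)))
            norms = sum(x * x for x in u) + sum(y * y for y in v)
            total += 2.0 * alignment * dot / (norms + eps)
    tension = total / (len(a) * len(b))
    correlation = 2.0 * tension ** 2 / (1.0 + tension ** 2)

    ratios = []
    for k in range(len(a[0])):
        peak_a = max(abs(u[k]) for u in a)
        peak_b = max(abs(v[k]) for v in b)
        ratios.append(min(peak_a, peak_b) / (max(peak_a, peak_b) + eps))
    threshold = sum(ratios) / len(ratios)
    return correlation - sensitivity * threshold


# Toy stages with one indicator and evenly spaced samplings (hypothetical data).
stage = [[1.0], [2.0], [3.0]]
times = [0.0, 1.0, 2.0]
loose = connectivity_status(stage, times, stage, times, sensitivity=0.5)
strict = connectivity_status(stage, times, stage, times, sensitivity=1.0)
print(loose > 0.0, strict > 0.0)
```

For this toy input, in which both stages carry identical data, the status is positive at a sensitivity of 0.5 and negative at 1.0. The reason is that identical stages give a baseline threshold close to one, while the mean of the interaction matrix stays below one because samplings of different magnitude within the stages are also compared with each other.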

[0035] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0036] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the technical principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A method for constructing association paths for pharmaceutical R&D trial data based on data processing, characterized in that it comprises: collecting common monitoring indicator data from adjacent trial stages, extracting the sampling data in chronological order, and constructing multidimensional data sequences and timestamp sequences; for the constructed timestamp sequences, calculating the time interval between every two adjacent sampling points within each trial stage; calculating the time boundary damping factor of each sampling node as the ratio of the absolute value of the difference between adjacent time intervals to their sum; numerically attenuating the original multidimensional data sequence with the time boundary damping factor to obtain a corrected multidimensional data sequence with distortion removed; taking the sum of the squared Euclidean norms of the difference vectors between adjacent nodes in the corrected multidimensional data sequence as the internal variability, and combining the variability of the two stages to generate the phase alignment coefficient; calculating the normalized dot product between the corrected multidimensional data of the two stages and multiplying the result by the phase alignment coefficient to generate a cross-stage interaction matrix; taking the mean of all elements of the cross-stage interaction matrix as the topological tension, and applying a nonlinear transformation to the tension to obtain the standard path correlation degree; and extracting the maximum physical absolute amplitude of each indicator in the corrected data, calculating the mean extreme value ratio to obtain the baseline threshold, comparing it with the path correlation degree, and outputting the connectivity status.

2. The method for constructing association paths for pharmaceutical R&D trial data based on data processing according to claim 1, characterized in that collecting common monitoring indicator data from adjacent trial stages, extracting sampling data in chronological order, and constructing multidimensional data sequences and timestamp sequences comprises: connecting to a pharmaceutical R&D database and selecting two adjacent trial stages, namely a preceding stage and a subsequent stage; selecting the key monitoring indicators common to both stages; exporting the sampling data of the preceding stage and the sampling data of the subsequent stage in chronological order of sampling time; constructing a multidimensional data sequence and a corresponding timestamp sequence for the preceding stage, wherein each single sampling vector of the preceding stage is formed by combining the data of each key monitoring indicator in that stage; and constructing a multidimensional data sequence and a corresponding timestamp sequence for the subsequent stage, wherein each single sampling vector of the subsequent stage is formed by combining the data of each key monitoring indicator in that stage.

3. The method for constructing association paths for pharmaceutical R&D trial data based on data processing according to claim 1, characterized in that, for the constructed timestamp sequences, calculating the time interval between every two adjacent sampling points within each trial stage comprises: for the preceding stage, calculating the time difference between the current sampling point and the previous sampling point, in chronological order within the stage, as the time interval of the preceding stage; and for the subsequent stage, calculating the time difference between the current sampling point and the previous sampling point, in chronological order within the stage, as the time interval of the subsequent stage; wherein the current sampling point starts from the second sampling point within the corresponding stage.

4. The method for constructing association paths for pharmaceutical R&D trial data based on data processing according to claim 1, characterized in that calculating the time boundary damping factor of each sampling node as the ratio of the absolute value of the difference between adjacent time intervals to their sum comprises: for the preceding stage, obtaining the time interval corresponding to the current sampling point and the time interval corresponding to the next sampling point after the current sampling point; calculating the absolute value of the difference between the two time intervals, calculating the sum of the two time intervals, and dividing the absolute value by the sum to obtain the time boundary damping factor corresponding to the current sampling point in the preceding stage; for the subsequent stage, obtaining the time interval corresponding to the current sampling point and the time interval corresponding to the next sampling point after the current sampling point; calculating the absolute value of the difference between the two time intervals, calculating the sum of the two time intervals, and dividing the absolute value by the sum to obtain the time boundary damping factor corresponding to the current sampling point in the subsequent stage; and setting the time boundary damping factor of the first and last sampling points of each stage to zero.
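As an illustration only, the calculation in claim 4 can be sketched in Python as follows. The function names and the sample timestamps are hypothetical, and the sketch assumes timestamps are plain numbers that strictly increase within a stage.

```python
from typing import List


def time_intervals(timestamps: List[float]) -> List[float]:
    """Interval attached to each sampling point from the second one onward.

    intervals[k] belongs to the point at index k + 1 and equals
    timestamps[k + 1] - timestamps[k].
    """
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def damping_factors(timestamps: List[float]) -> List[float]:
    """Time boundary damping factor for every sampling point of one stage."""
    n = len(timestamps)
    intervals = time_intervals(timestamps)
    factors = [0.0] * n  # first and last points stay at zero, as in the claim
    for i in range(1, n - 1):
        current = intervals[i - 1]  # t[i] - t[i-1], interval of the current point
        following = intervals[i]    # t[i+1] - t[i], interval of the next point
        total = current + following
        # Assumption: strictly increasing timestamps give total > 0; this
        # guard only protects against duplicate timestamps.
        factors[i] = abs(following - current) / total if total > 0 else 0.0
    return factors


# Hypothetical sampling times in hours; the gap widens after the third point.
print(damping_factors([0.0, 1.0, 2.0, 5.0, 6.0]))  # [0.0, 0.0, 0.5, 0.5, 0.0]
```

Evenly spaced points receive a factor of zero, and the factor approaches one as the two neighbouring intervals become more unequal.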

5. The method for constructing association paths for pharmaceutical R&D trial data based on data processing according to claim 1, characterized in that numerically attenuating the original multidimensional data sequences using the time boundary damping factor to obtain corrected multidimensional data sequences with distortion removed comprises: subtracting the time boundary damping factor of the current sampling point in the preceding stage from the value one to obtain the attenuation coefficient of the preceding stage; multiplying the attenuation coefficient of the preceding stage by the corresponding original single sampling vector of the preceding stage to obtain the corrected sampling vector of the preceding stage; collecting all corrected sampling vectors of the preceding stage to form the corrected multidimensional data sequence of the preceding stage; subtracting the time boundary damping factor of the current sampling point in the subsequent stage from the value one to obtain the attenuation coefficient of the subsequent stage; multiplying the attenuation coefficient of the subsequent stage by the corresponding original single sampling vector of the subsequent stage to obtain the corrected sampling vector of the subsequent stage; and collecting all corrected sampling vectors of the subsequent stage to form the corrected multidimensional data sequence of the subsequent stage.
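A minimal Python sketch of the attenuation step in claim 5 is given below for illustration. It assumes the attenuation coefficient is one minus the damping factor of the sampling point. The function name and the sample values are hypothetical.

```python
from typing import List

Vector = List[float]


def attenuate(sequence: List[Vector], damping: List[float]) -> List[Vector]:
    """Scale each sampling vector by (1 - damping factor of that point)."""
    if len(sequence) != len(damping):
        raise ValueError("one damping factor is required per sampling vector")
    corrected = []
    for vector, factor in zip(sequence, damping):
        coefficient = 1.0 - factor  # assumed reading of the attenuation coefficient
        corrected.append([coefficient * value for value in vector])
    return corrected


# Hypothetical stage with three samples of two indicators each.
raw = [[10.0, 2.0], [12.0, 4.0], [8.0, 6.0]]
factors = [0.0, 0.5, 0.0]
print(attenuate(raw, factors))  # [[10.0, 2.0], [6.0, 2.0], [8.0, 6.0]]
```

Under this reading, points with a zero damping factor, including the first and last points of a stage, pass through unchanged.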

6. The method for constructing association paths for pharmaceutical R&D trial data based on data processing according to claim 1, characterized in that taking the sum of the squared Euclidean norms of the difference vectors between adjacent nodes in each corrected multidimensional data sequence as the internal variability, and generating the phase alignment coefficient by combining the internal variability of the two stages, comprises: calculating the global internal variability of the data sequence of each of the two trial stages; for the corrected multidimensional data sequence of the preceding stage, calculating the difference vector between every two adjacent corrected sampling vectors and calculating the square of the Euclidean norm of the difference vector; summing all the squared Euclidean norms calculated in the preceding stage to obtain the global internal variability of the preceding stage; for the corrected multidimensional data sequence of the subsequent stage, calculating the difference vector between every two adjacent corrected sampling vectors and calculating the square of the Euclidean norm of the difference vector; summing all the squared Euclidean norms calculated in the subsequent stage to obtain the global internal variability of the subsequent stage; multiplying the global internal variability of the preceding stage by the global internal variability of the subsequent stage, and multiplying the product by the value two to obtain a first value; adding the square of the global internal variability of the preceding stage to the square of the global internal variability of the subsequent stage, and then adding a preset minimum positive constant, to obtain a second value; and dividing the first value by the second value to obtain the cross-stage phase alignment coefficient.
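The computation in claim 6 can be sketched as follows, for illustration only. The function names are hypothetical, and the default value of `eps` is an assumption, since the claim only requires a preset minimum positive constant.

```python
from typing import List

Vector = List[float]


def internal_variability(sequence: List[Vector]) -> float:
    """Sum of squared Euclidean norms of differences between adjacent vectors."""
    total = 0.0
    for previous, current in zip(sequence, sequence[1:]):
        total += sum((b - a) ** 2 for a, b in zip(previous, current))
    return total


def phase_alignment_coefficient(pre: List[Vector], post: List[Vector],
                                eps: float = 1e-9) -> float:
    """2 * V_pre * V_post / (V_pre^2 + V_post^2 + eps)."""
    v_pre = internal_variability(pre)
    v_post = internal_variability(post)
    first_value = 2.0 * v_pre * v_post
    second_value = v_pre ** 2 + v_post ** 2 + eps  # eps value is an assumption
    return first_value / second_value


# Hypothetical corrected sequences: variability 25 in one stage, 100 in the other.
pre_stage = [[0.0, 0.0], [3.0, 4.0]]
post_stage = [[0.0, 0.0], [6.0, 8.0]]
print(phase_alignment_coefficient(pre_stage, post_stage))  # about 0.4706
```

The coefficient is close to one when the two stages have similar internal variability and falls toward zero as the variabilities diverge.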

7. The method for constructing association paths for pharmaceutical R&D trial data based on data processing according to claim 1, characterized in that calculating the normalized dot product between the corrected multidimensional data of the two stages, multiplying the result by the phase alignment coefficient, and mapping the results to generate a cross-stage interaction matrix comprises: selecting one corrected sampling vector from the preceding stage and one corrected sampling vector from the subsequent stage, calculating their dot product, and taking its absolute value; multiplying the absolute value by the cross-stage phase alignment coefficient and then by the value two to obtain a first intermediate value; calculating the square of the Euclidean norm of each of the two selected corrected sampling vectors, adding the two squared Euclidean norms together, and adding a preset minimum positive constant to obtain a second intermediate value; dividing the first intermediate value by the second intermediate value to obtain one element value of the cross-stage interaction matrix; and traversing all combinations of corrected sampling vectors of the preceding stage and the subsequent stage to calculate the values of all elements, thereby generating the cross-stage interaction matrix.
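For illustration, the element-wise construction in claim 7 might be sketched as below. The function name, the row and column layout (rows for the preceding stage, columns for the subsequent stage), and the default `eps` are assumptions made for this sketch.

```python
from typing import List

Vector = List[float]


def interaction_matrix(pre: List[Vector], post: List[Vector],
                       alignment: float, eps: float = 1e-9) -> List[List[float]]:
    """Element (i, j) = 2 * alignment * |x_i . y_j| / (|x_i|^2 + |y_j|^2 + eps)."""
    matrix = []
    for x in pre:
        squared_norm_x = sum(value * value for value in x)
        row = []
        for y in post:
            squared_norm_y = sum(value * value for value in y)
            abs_dot = abs(sum(a * b for a, b in zip(x, y)))
            first_intermediate = 2.0 * alignment * abs_dot
            second_intermediate = squared_norm_x + squared_norm_y + eps
            row.append(first_intermediate / second_intermediate)
        matrix.append(row)
    return matrix


# Hypothetical corrected vectors; the alignment coefficient is fixed at 1.0 here.
pre_stage = [[1.0, 0.0], [0.0, 2.0]]
post_stage = [[1.0, 0.0]]
print(interaction_matrix(pre_stage, post_stage, alignment=1.0))
```

Because 2|x·y| never exceeds the sum of the squared norms, each element lies between zero and the phase alignment coefficient. Identical vectors give a value near the coefficient and orthogonal vectors give zero.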

8. The method for constructing association paths for pharmaceutical R&D trial data based on data processing according to claim 1, characterized in that taking the mean of all elements of the cross-stage interaction matrix as the topological tension, and obtaining the standard path association degree by a nonlinear transformation of the topological tension, comprises: adding all the element values of the cross-stage interaction matrix and dividing the sum by the total number of elements of the cross-stage interaction matrix to obtain the stage transition data topological tension, wherein the total number of elements is the product of the total number of samples in the preceding stage and the total number of samples in the subsequent stage; calculating the square of the stage transition data topological tension and multiplying it by two to obtain a third intermediate value; adding one to the square of the stage transition data topological tension to obtain a fourth intermediate value; and dividing the third intermediate value by the fourth intermediate value to obtain the standard path association degree.
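The two steps of claim 8 can be sketched in Python as follows, for illustration only. The function names and the sample matrix are hypothetical.

```python
from typing import List


def topological_tension(matrix: List[List[float]]) -> float:
    """Mean of all elements of the cross-stage interaction matrix."""
    count = sum(len(row) for row in matrix)
    if count == 0:
        raise ValueError("the interaction matrix has no elements")
    return sum(sum(row) for row in matrix) / count


def path_association_degree(tension: float) -> float:
    """Nonlinear transformation 2 * T^2 / (1 + T^2) of the tension T."""
    squared = tension * tension
    third_intermediate = 2.0 * squared
    fourth_intermediate = 1.0 + squared
    return third_intermediate / fourth_intermediate


# Hypothetical 2 x 2 interaction matrix whose mean is 0.5.
example = [[0.5, 1.0], [0.0, 0.5]]
tension = topological_tension(example)
print(tension, path_association_degree(tension))  # 0.5 0.4
```

The transformation maps a tension of zero to zero and a tension of one to one, and it compresses small tensions more strongly than large ones.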

9. The method for constructing association paths for pharmaceutical R&D trial data based on data processing according to claim 1, characterized in that extracting the maximum physical absolute amplitude of each type of corrected data, calculating the average extreme value ratio to obtain the baseline threshold, comparing the baseline threshold with the standard path association degree, and outputting the connectivity status comprises: obtaining the component values of each dimension in the corrected multidimensional data sequences of the preceding stage and the subsequent stage respectively, and taking the absolute value of every component value; for the same key monitoring indicator, extracting the maximum of all absolute values in the preceding stage and the maximum of all absolute values in the subsequent stage; comparing the maximum absolute values of the two stages for that indicator, and dividing the smaller of the two by the sum of the larger of the two and a preset minimum positive constant to obtain the extreme value ratio corresponding to that indicator; summing the extreme value ratios of all key monitoring indicators and dividing the sum by the total number of key monitoring indicators to obtain the dynamic extreme value baseline threshold; multiplying the dynamic extreme value baseline threshold by a preset baseline penalty sensitivity coefficient to obtain the penalty threshold; subtracting the penalty threshold from the standard path association degree to obtain the connectivity status value of the association path; and determining whether the connectivity status value of the association path is greater than zero, wherein if it is greater than zero the association path is determined to be establishable, and otherwise the association path is determined not to be establishable.
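As an illustrative sketch of claim 9, the threshold and the connectivity decision might be computed as below. The function names are hypothetical. The default `eps` and the sensitivity value 0.6 used in the example are assumptions, since the claim only calls them preset values.

```python
from typing import List, Tuple

Vector = List[float]


def max_abs_per_indicator(sequence: List[Vector]) -> List[float]:
    """Largest absolute value of each indicator (column) within one stage."""
    return [max(abs(value) for value in column) for column in zip(*sequence)]


def baseline_threshold(pre: List[Vector], post: List[Vector],
                       eps: float = 1e-9) -> float:
    """Mean over indicators of min(max_pre, max_post) / (max(...) + eps)."""
    ratios = [min(a, b) / (max(a, b) + eps)
              for a, b in zip(max_abs_per_indicator(pre), max_abs_per_indicator(post))]
    if not ratios:
        raise ValueError("at least one key monitoring indicator is required")
    return sum(ratios) / len(ratios)


def connectivity_status(association_degree: float, pre: List[Vector],
                        post: List[Vector], sensitivity: float,
                        eps: float = 1e-9) -> Tuple[float, bool]:
    """Return (status value, whether the association path can be established)."""
    penalty = sensitivity * baseline_threshold(pre, post, eps)
    status = association_degree - penalty
    return status, status > 0


# Hypothetical corrected data: the per-indicator extreme value ratios are about 0.5.
pre_stage = [[1.0, -4.0], [2.0, 3.0]]
post_stage = [[-4.0, 2.0], [1.0, 1.0]]
print(connectivity_status(0.4, pre_stage, post_stage, sensitivity=0.6))
```

With these sample values the penalty threshold is about 0.3, so an association degree of 0.4 yields a positive status value and the path is treated as establishable.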