RNA-drug association prediction method based on quantum adaptive allocation and cross-modal fusion
By employing multi-scale feature extraction, dynamic weight fusion, and adaptive quantum-enhanced cross-modal adversarial training, the method addresses the problems of insufficient feature representation and inadequate robustness in RNA-drug association prediction, achieving high-precision, highly robust prediction results and supporting RNA-targeted drug development and drug repositioning.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA UNIV OF PETROLEUM (EAST CHINA)
- Filing Date
- 2026-03-12
- Publication Date
- 2026-06-12
Figure CN122201507A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the interdisciplinary field of bioinformatics, artificial intelligence and drug development, and specifically relates to an RNA-drug association prediction method. It is applicable to scenarios such as RNA-targeted drug development, drug repositioning, and disease treatment target screening. It can efficiently and accurately predict the association between various types of RNA, such as mRNA, miRNA, lncRNA, and circRNA, and small-molecule drugs, providing technical support for biomedical research and development.
Background Technology
[0002] RNA molecules (including coding and non-coding RNA) serve as the core carriers of gene expression regulation, and their dysfunction is closely related to the occurrence and development of major diseases such as cancer and neurodegenerative diseases. Meanwhile, structural and functional elements of RNA (such as G-quadruplexes, riboswitches, and stem-loop motifs) can serve as specific targets for small-molecule drugs, breaking through the limitations of traditional protein-targeted therapy and becoming a research hotspot in precision medicine. However, validating RNA-drug associations through wet-lab experiments is complex and takes 1-3 years, a single association validation costs over one million US dollars, and the success rate is below 10%, which severely hinders the development of RNA-targeted drugs.
[0003] Computational prediction methods offer an effective way to overcome the aforementioned bottlenecks, but existing RNA-drug association prediction technologies still have three major drawbacks:
[0004] Insufficient multimodal feature representation: Existing methods (such as ImageDDI and MVPFDPC) mostly employ single-scale feature extraction strategies, failing to uncover key information such as fragment-level motifs of sequences, hierarchical semantics of text, and multi-resolution structures of images. For example, traditional sequence features only focus on global similarity, ignoring functional fragment differences of 20-100bp, resulting in insufficient accuracy in representing RNA structural elements and drug binding sites.
[0005] The quantum augmentation module suffers from poor resource adaptability: existing quantum augmentation models (such as QKDTI and HQ-DTI) use a fixed number of qubits, which cannot adapt to feature data of varying complexity. Allocating too many qubits to simple features (such as short RNA sequences and single-structure drugs) leads to resource redundancy and a decrease in computational efficiency of more than 30%; while insufficient allocation to complex features (such as long lncRNAs and polycyclic drugs) results in inadequate characterization capabilities and a 12%-15% reduction in the accuracy of high-order interaction capture.
[0006] Cross-modal fusion lacks robustness: Traditional fusion strategies (such as static weights and simple averaging) do not consider the reliability differences of modal data and lack mechanisms to combat interference. When noise exists in a modality (such as low-quality molecular images or incomplete text descriptions), the signal-to-noise ratio of the fused features decreases significantly, leading to a decline in the generalization ability of the prediction model on real complex datasets.
[0007] To address the aforementioned technical bottlenecks, this invention proposes an RNA-drug association prediction method based on quantum adaptive allocation and cross-modal fusion. Through the synergistic design of four core modules (multi-scale feature extraction, dynamic weight fusion, adaptive quantum enhancement, and cross-modal adversarial training), it achieves high-precision and robust prediction of RNA-drug associations, providing key technical support for accelerating the development of RNA-targeted drugs and drug repositioning.
Summary of the Invention
[0008] This invention aims to address the technical problems of incomplete multimodal feature representation, low quantum resource utilization efficiency, poor robustness of cross-modal fusion, and insufficient generalization ability for small samples in existing RNA-drug association prediction methods. It provides a prediction method that can fully mine multi-scale biological information, dynamically adapt to quantum computing resources, and has strong anti-interference ability, thereby improving the accuracy and practicality of RNA-drug association prediction.
[0009] The technical solution is as follows:
[0010] The RNA-drug association prediction method based on quantum adaptive allocation and cross-modal fusion specifically includes the following steps:
[0011] S1. Construct an optimized RNA-drug association dataset and perform multi-scale cross-modal feature extraction: Collect RNA-drug association data from authoritative databases, perform deduplication and completion, standardize and preprocess the data, and then use the multi-scale feature extraction module to perform hierarchical representation of RNA / drug sequence, text, and image data to obtain multi-scale modal features.
[0012] S2. Using the modal reliability assessment module, calculate the data quality score and task relevance score for each modality to obtain the comprehensive reliability assessment result.
[0013] S3. Based on the reliability assessment results, the multi-scale modal features are adaptively fused using a dynamic weighted cross-modal attention encoder to generate an initial unified representation.
[0014] S4. Construct a quantum-enhanced interaction module with adaptive qubit allocation, dynamically adjust the quantum circuit scale according to the complexity of the initial unified representation, and realize cross-domain information interaction and feature enhancement.
[0015] S5. Optimize the robustness and domain adaptability of feature representation through cross-modal adversarial training, combine the RNA-drug association probability output by the association prediction model, and then improve the prediction accuracy through model training optimization.
[0016] Optionally, in step S1, the specific process of constructing the RNA-drug dataset and extracting multi-scale modal features is as follows:
[0017] Optionally, in step S11, the data acquisition and preprocessing process is as follows:
[0018] • Data acquisition and deduplication: Data was acquired from the NoncoRNA, ncDR, and RNAactDrug databases, and duplicate records were removed through sequence alignment and SMILES matching;
[0019] • Data completion: Missing RNA sequences were completed using BLAST homology alignment (homology ≥ 85%), missing drug SMILES were completed using structural similarity (Tanimoto coefficient ≥ 0.9), and missing text / images were completed using Qwen-Max generation and RDKit reconstruction, respectively.
[0020] • Multi-scale standardization: The sequence is divided into segments of length 20, 50, and 100 (sequences that are too short are padded, and sequences that are too long are split by sliding segmentation); the text is uniformly formatted as UTF-8 and redundant information is removed; the image is standardized to a 224×224 RGB image and the pixel values are normalized to [0,1].
[0021] • Dataset Construction: Two benchmark datasets were constructed. Dataset 1 contains 308 RNAs, 62 drugs, and 1833 known associations. Dataset 2 contains 700 RNAs, 90 drugs, and 4092 validated associations. RNA-drug associations are represented by a binary matrix.
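The sequence part of the multi-scale standardization step can be sketched as follows. This is only an illustration under stated assumptions: the function names `segment_sequence` and `multi_scale_segments` are hypothetical, fragments are taken as non-overlapping windows, and `N` is an assumed padding symbol that the source does not specify.

```python
def segment_sequence(seq, length, pad_char="N"):
    """Split seq into non-overlapping fragments of a fixed length.

    The final fragment is right-padded with pad_char (an assumed symbol).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    fragments = [seq[i:i + length] for i in range(0, len(seq), length)]
    if not fragments:
        fragments = [""]
    fragments[-1] = fragments[-1].ljust(length, pad_char)
    return fragments


def multi_scale_segments(seq, scales=(20, 50, 100)):
    """Return a dict mapping each scale (20, 50, 100) to its fragment list."""
    return {k: segment_sequence(seq, k) for k in scales}


if __name__ == "__main__":
    rna = "ACGU" * 30  # a 120-nt toy sequence
    for scale, frags in multi_scale_segments(rna).items():
        print(scale, len(frags))
```

Every fragment at a given scale has the same length, which is what a fixed-size encoder needs downstream.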
[0022] Optionally, in step S12, the multi-scale cross-modal feature extraction process is as follows:
[0023] Optionally, in step S121, the multi-scale segment feature extraction process of the sequence is as follows:
[0024] • Multi-scale fragment segmentation: RNA sequences are segmented into non-overlapping fragments based on lengths of 20 (short scale), 50 (medium scale), and 100 (long scale); drug topological fingerprints are segmented into feature fragments based on feature dimensions of 32 (low scale), 64 (medium scale), and 128 (high scale).
[0025] • Intra-scale encoding: RNA sequence fragments are encoded using a 2-layer bidirectional LSTM (hidden layer dimension 256, dropout rate 0.3) to capture the temporal dependencies of nucleotides within the fragment; drug topological fingerprint fragments are encoded using a 3-layer CNN to extract local structural features;
[0026] • Multi-scale fusion: scale attention weights are calculated from the fragment information entropy. The information entropy is calculated as $H_k = -\sum_i p_i \log p_i$ (where $p_i$ is the probability of occurrence of the $i$-th feature element in the fragment), and the features are then fused as $F_{seq} = \sum_{k} \alpha_k \cdot f_k(S_k)$ (where $S_k$ is the $k$-th scale segment, $f_k$ is the corresponding encoding network, and $\alpha_k$ is the scale attention weight obtained by normalizing $H_k$).
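The entropy-based scale weighting described above might look like the sketch below. Several details are assumptions rather than the patent's stated choices: entropy is measured in bits, the normalization is a softmax (the source only says the entropy is normalized), and the per-scale encoder outputs are plain lists of equal length standing in for the LSTM or CNN features.

```python
import math
from collections import Counter


def shannon_entropy(fragment):
    """Shannon entropy (in bits) of the symbols in a fragment."""
    total = len(fragment)
    if total == 0:
        return 0.0
    counts = Counter(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def scale_weights(entropies):
    """Softmax-normalize per-scale entropies into weights (assumed scheme)."""
    m = max(entropies)
    exps = [math.exp(h - m) for h in entropies]
    z = sum(exps)
    return [e / z for e in exps]


def fuse(encoded, weights):
    """Weighted sum of equally sized per-scale feature vectors."""
    dim = len(encoded[0])
    return [sum(w * vec[i] for w, vec in zip(weights, encoded))
            for i in range(dim)]


if __name__ == "__main__":
    fragments = ["ACGUACGUAC", "AAAAAAAAAC", "ACACACACAC"]
    weights = scale_weights([shannon_entropy(f) for f in fragments])
    print(weights)
```

Under this scheme, scales whose fragments carry more information (higher entropy) contribute more to the fused feature.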
[0027] Optionally, in step S122, the hierarchical semantic feature extraction process of the text is as follows:
[0028] • Semantic hierarchy division: The RNA / drug function description text generated by Qwen-Max is divided into “basic attribute layer”, “functional mechanism layer” and “clinical association layer” according to semantic hierarchy.
[0029] • Hierarchical encoding: The basic attribute layer uses a 1-layer Transformer, the functional mechanism layer uses a 2-layer Transformer, and the clinical association layer uses a 3-layer Transformer (the deeper the layer, the higher the model complexity, in order to match the semantic depth);
[0030] • Semantic fusion: the hierarchical semantic features are fused through a gating mechanism whose inputs are the semantic features of the three levels and whose parameters are a gating weight matrix and a bias term, with the sigmoid function as the gate activation;
[0031] Optionally, in step S123, the multi-resolution feature extraction process of the image is as follows:
[0032] • Multi-resolution image generation: The drug molecule image is downsampled using bilinear interpolation to generate three versions: low resolution (56×56), medium resolution (112×112), and high resolution (224×224).
[0033] • Resolution feature encoding: ResNet-50 is used as the backbone network (pre-trained weights are fine-tuned based on ImageNet and ImageMol). Features are extracted for each resolution image. Low resolution images focus on global structural features, medium resolution images focus on local-global fusion features, and high resolution images focus on local detail features.
[0034] • Cross-resolution feature fusion: a feature pyramid network is introduced. High-resolution features are reduced to 256 dimensions through a 1×1 convolution, low-resolution features are upsampled to 256 dimensions, and both are concatenated with the medium-resolution features (also reduced to 256 dimensions): $F_{img} = \mathrm{Concat}(\mathrm{Up}(F_{low}), F_{mid}, \mathrm{Down}(F_{high}))$, where $\mathrm{Up}$ is the upsampling operation, $\mathrm{Down}$ is the downsampling operation, and $\mathrm{Concat}$ is feature concatenation. The final output dimension is 768.
[0035] Optionally, in step S2, the specific process of designing the adaptive modal weight allocation module is as follows:
[0036] Optionally, in step S21, a data quality score is calculated. The specific process is as follows:
[0037] • Sequence modality: calculated from sequence integrity (a missing base / feature rate of ≤ 5% gives the maximum score of 1.0, and the score decreases by 0.2 for every additional 5%) and sequence conservation (consistency calculated through multiple sequence alignment), with weights of 0.5 and 0.5 respectively;
[0038] • Text Modality: The confidence level and information completeness of the text generated by LLM are weighted and calculated, with weights of 0.5 and 0.5 respectively.
[0039] • Image modality: Calculated based on a weighted average of image sharpness and structural discernibility, with weights of 0.5 and 0.5 respectively;
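The sequence-integrity rule and the equal-weight combination can be written down directly. The sketch below assumes that every started 5% block of missing data beyond the first 5% costs 0.2 and that the score is floored at 0; the source does not say how partial blocks are treated, and the function names are hypothetical.

```python
import math


def integrity_score(missing_fraction):
    """Score 1.0 up to 5% missing, then minus 0.2 per additional 5% block."""
    if not 0.0 <= missing_fraction <= 1.0:
        raise ValueError("missing_fraction must be in [0, 1]")
    if missing_fraction <= 0.05:
        return 1.0
    # Small epsilon guards against floating-point noise at block boundaries.
    steps = math.ceil((missing_fraction - 0.05) / 0.05 - 1e-9)
    return max(0.0, round(1.0 - 0.2 * steps, 10))


def quality_score(component_a, component_b, weight_a=0.5):
    """Weighted average of two quality components (weights 0.5 / 0.5)."""
    return weight_a * component_a + (1.0 - weight_a) * component_b


if __name__ == "__main__":
    # 8% missing bases and a conservation score of 0.9 (toy values)
    print(quality_score(integrity_score(0.08), 0.9))
```

The same `quality_score` helper covers the text and image modalities, since all three use 0.5 / 0.5 weights over two components.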
[0040] Optionally, in step S22, the task relevance score is calculated. The specific process is as follows: relevance is quantified through the mutual information between the modal features and the RNA-drug association labels. The mutual information is calculated as
[0041] $I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$
[0042] where $X$ denotes the modal features and $Y$ denotes the association labels; the mutual information values are then normalized to the [0,1] interval and used as the task relevance score.
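A minimal sketch of the task relevance score, assuming the modal features have already been discretized (for example by binning) and that normalization divides by the label entropy. The source states only that the value is mapped to [0,1], so the normalization scheme and the function names here are assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) of two equally long discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log(p(x,y) / (p(x) p(y))) rewritten with raw counts
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def entropy(values):
    """Shannon entropy (in nats) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def task_relevance(xs, ys):
    """Mutual information divided by H(Y), clipped to [0, 1] (assumed)."""
    h = entropy(ys)
    if h == 0:
        return 0.0
    return max(0.0, min(1.0, mutual_information(xs, ys) / h))


if __name__ == "__main__":
    binned_feature = [0, 0, 1, 1, 2, 2]
    labels = [0, 0, 1, 1, 1, 0]
    print(task_relevance(binned_feature, labels))
```

A feature that determines the label exactly scores 1, and a feature independent of the label scores 0.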
[0043] Optionally, in step S23, the overall reliability score is calculated. The specific process is as follows: the data quality score and the task relevance score are combined by weighting with a balance coefficient, which ensures a balanced consideration of data quality and task relevance.
[0044] Optionally, in step S3, the specific process of adaptively fusing multi-scale modal features is as follows:
[0045] Optionally, in step S31, the specific process of intra-modal feature enhancement is as follows: the multi-scale features of each modality are encoded using a modality-specific graph attention network (GAT) containing two attention layers (8 attention heads, hidden dimension 256, dropout rate 0.3): $\hat{F}_m^k = \mathrm{GAT}_m(F_m^k)$, where $F_m^k$ is the $k$-th scale feature of the $m$-th modality and $\mathrm{GAT}_m$ is the modality-specific GAT network, which enhances key features within the modality through its self-attention mechanism.
[0046] Optionally, in step S32, the dynamic cross-modal attention calculation process is as follows:
[0047] • Initial weight calculation: based on the reliability scores $R_m$, the initial attention weights are calculated using the softmax function: $\alpha_m = \exp(R_m) / \sum_{j} \exp(R_j)$.
[0048] • Weight optimization: the cosine similarity between the mean feature of each modality and the mean of all modal features is calculated, and the final attention weight is then obtained through similarity-weighted optimization of the initial weight. This ensures that the fusion weights consider both modal reliability and feature consistency.
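The two-stage weighting above can be sketched as follows. The softmax stage follows the text; the refinement stage is an assumption, because the exact optimization formula is not recoverable from the source. Here each initial weight is multiplied by the (non-negative) cosine similarity between that modality's mean feature and the all-modality mean, and the result is renormalized. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def dynamic_weights(reliability, modal_means):
    """Reliability softmax refined by feature consistency (assumed rule)."""
    init = softmax(reliability)
    dim = len(modal_means[0])
    global_mean = [sum(v[i] for v in modal_means) / len(modal_means)
                   for i in range(dim)]
    sims = [max(0.0, cosine(v, global_mean)) for v in modal_means]
    raw = [a * s for a, s in zip(init, sims)]
    total = sum(raw)
    # Fall back to the reliability-only weights if every similarity is zero.
    return [r / total for r in raw] if total else init


if __name__ == "__main__":
    # sequence, text, image modalities with toy reliability scores
    print(dynamic_weights([0.9, 0.6, 0.3], [[1.0, 0.2], [0.8, 0.4], [0.1, 1.0]]))
```

With this rule, a modality that is reliable but inconsistent with the others is still down-weighted, which matches the stated goal of considering both reliability and feature consistency.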
[0049] Optionally, in step S33, the unified representation generation process is as follows: the modal features are fused using the final attention weights to generate initial unified representations of RNA and drugs with a dimension of 256, which are normalized using LayerNorm (mean 0, variance 1) to improve the training stability of subsequent modules.
[0050] Optionally, in step S4, the specific process of quantum-enhanced cross-domain information interaction is as follows:
[0051] Optionally, in step S41, the feature complexity evaluation process is as follows: the feature complexity is calculated as $C = \beta \cdot D + (1-\beta) \cdot S$, where $D$ is the normalized value of the feature dimension (the feature dimension mapped to the [0,1] interval), $S$ is the feature sparsity (proportion of non-zero elements), and the balance coefficient $\beta$ is chosen to emphasize the impact of the feature dimension on complexity;
[0052] Optionally, in step S42, the specific process of adaptive qubit allocation is as follows:
[0053] • Allocation rules: for low complexity (such as simple-structure drugs and short RNA sequences), 3 qubits are allocated; for medium complexity (such as medium-structure drugs or medium-length RNA), 5 qubits are allocated; for high complexity (such as drugs with complex structures or long RNA sequences), 7 qubits are allocated. The three levels are delimited by thresholds on the feature complexity.
[0054] • Quantum circuit adaptation: The depth of the quantum circuit is adaptively adjusted according to the number of qubits. 3 qubits correspond to 2 layers of circuit (each layer contains single-qubit rotation + ring entanglement), 5 qubits correspond to 3 layers of circuit, and 7 qubits correspond to 4 layers of circuit, avoiding resource redundancy or insufficient representation.
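The complexity evaluation and the qubit/depth lookup could be sketched as below. The balance coefficient 0.6 and the 3/5/7 qubit and 2/3/4 layer pairs come from the text. The numeric thresholds are not recoverable from the source, so they are left as required arguments, and the values 0.3 and 0.7 in the demo are placeholders only. Normalizing the dimension by an assumed maximum `max_dim` is likewise an assumption.

```python
def feature_complexity(features, max_dim, beta=0.6):
    """C = beta * D + (1 - beta) * S.

    D: feature dimension divided by max_dim (assumed normalization).
    S: proportion of non-zero elements, as the text defines sparsity.
    """
    if not features or max_dim <= 0:
        raise ValueError("features must be non-empty and max_dim positive")
    d_norm = min(1.0, len(features) / max_dim)
    sparsity = sum(1 for x in features if x != 0) / len(features)
    return beta * d_norm + (1.0 - beta) * sparsity


def allocate_qubits(complexity, low_threshold, high_threshold):
    """Return (number of qubits, circuit layers) for a complexity value.

    Whether each boundary is inclusive is an assumption.
    """
    if complexity < low_threshold:
        return 3, 2
    if complexity < high_threshold:
        return 5, 3
    return 7, 4


if __name__ == "__main__":
    c = feature_complexity([0.2, 0.0, 1.3, 0.0], max_dim=8)
    # 0.3 and 0.7 are placeholder thresholds, not values from the source
    print(c, allocate_qubits(c, low_threshold=0.3, high_threshold=0.7))
```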
[0055] Optionally, in step S43, the specific process of quantum enhancement coding is as follows:
[0056] • Classical-quantum projection: the initial unified representation is normalized with LayerNorm and then mapped to quantum rotation angles through a sigmoid activation function (the sigmoid function is used to ensure the angle range is [0, ... ]);
[0057] • Quantum circuit processing: the quantum circuit contains single-qubit rotation layers (three rotation gates, RX, RY, and RZ, connected in series, with rotation angles given by the mapped angles above) and ring entanglement layers (adjacent qubits are entangled through CNOT gates, forming a ring topology);
[0058] • Quantum-classical projection: the expectation value of the Pauli-Z operator is measured on each qubit to obtain the quantum measurement result $q$, which is then mapped back to the classical feature space through a two-layer fully connected network (hidden dimension 128, ReLU activation function): $Z' = W_2 \cdot \mathrm{ReLU}(W_1 \cdot q)$, where $W_1$ is a $128 \times n_q$ weight matrix ($n_q$ being the number of qubits) and $W_2$ is a 256×128 weight matrix;
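To make the circuit structure concrete, the following is a small statevector simulation in plain Python of the described layout: per-qubit RX, RY, RZ rotations followed by a ring of CNOT gates, with Pauli-Z expectation values read out at the end. It is a sketch, not the patent's implementation. How the angle vector is distributed over gates is an assumption (one `(rx, ry, rz)` triple per qubit per layer), and exact expectation values are computed instead of sampling 1024 shots.

```python
import cmath
import math


def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def apply_single(state, gate, q):
    """Apply a 2x2 gate to qubit q (qubit 0 is the lowest bit of the index)."""
    new = list(state)
    bit = 1 << q
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[i | bit] = gate[1][0] * a + gate[1][1] * b
    return new


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is set."""
    new = list(state)
    for i in range(len(state)):
        if i & (1 << control) and not i & (1 << target):
            j = i | (1 << target)
            new[i], new[j] = state[j], state[i]
    return new


def expect_z(state, q):
    """Expectation value of Pauli-Z on qubit q."""
    return sum(abs(a) ** 2 * (-1 if i & (1 << q) else 1)
               for i, a in enumerate(state))


def run_circuit(n_qubits, angles):
    """angles[layer][qubit] = (rx, ry, rz) angle triple; returns <Z> per qubit."""
    state = [0j] * (1 << n_qubits)
    state[0] = 1 + 0j  # start in |00...0>
    for layer in angles:
        for q, (ax, ay, az) in enumerate(layer):
            for gate in (rx(ax), ry(ay), rz(az)):
                state = apply_single(state, gate, q)
        for q in range(n_qubits):  # ring entanglement: q -> q+1 (mod n)
            state = apply_cnot(state, q, (q + 1) % n_qubits)
    return [expect_z(state, q) for q in range(n_qubits)]


if __name__ == "__main__":
    layers = [[(0.3, 0.1, 0.2), (0.5, 0.4, 0.0), (1.0, 0.7, 0.9)]] * 2
    print(run_circuit(3, layers))  # 3 qubits, 2 layers
```

The returned list has one value in [-1, 1] per qubit, which is the vector that the two-layer fully connected network would map back to the classical feature space.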
[0059] Optionally, in step S44, the specific process of cross-domain information transfer is as follows: an interaction graph is constructed based on known RNA-drug associations, and cross-domain information exchange is achieved through symmetric neighborhood aggregation, applied on the RNA side (aggregating over the associated drugs) and, symmetrically, on the drug side (aggregating over the associated RNAs).
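Symmetric neighborhood aggregation over the RNA-drug interaction graph can be illustrated as follows. The aggregation formulas themselves are not recoverable from the source, so this sketch assumes plain mean aggregation over neighbors and lets isolated nodes keep their own features, a simplification of the isolated-node handling mentioned elsewhere in the text. The function names are hypothetical.

```python
def aggregate(own, other, neighbors):
    """Replace each node's features by the mean of its neighbors' features.

    Nodes without neighbors keep their own features (assumed simplification).
    """
    out = []
    for i, vec in enumerate(own):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            out.append(list(vec))
            continue
        out.append([sum(other[j][d] for j in nbrs) / len(nbrs)
                    for d in range(len(vec))])
    return out


def cross_domain_exchange(rna, drug, associations):
    """associations: iterable of (rna_index, drug_index) known pairs."""
    rna_nbrs, drug_nbrs = {}, {}
    for r, d in associations:
        rna_nbrs.setdefault(r, []).append(d)
        drug_nbrs.setdefault(d, []).append(r)
    # The same rule is applied in both directions, hence "symmetric".
    return aggregate(rna, drug, rna_nbrs), aggregate(drug, rna, drug_nbrs)


if __name__ == "__main__":
    rna_feats = [[1.0, 0.0], [0.0, 1.0]]
    drug_feats = [[2.0, 2.0], [4.0, 4.0], [9.0, 9.0]]
    print(cross_domain_exchange(rna_feats, drug_feats, [(0, 0), (0, 1)]))
```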
[0060] Optionally, in step S5, the specific process of cross-modal adversarial training optimization and model training is as follows:
[0061] Optionally, in step S51, the specific process of cross-modal adversarial training is as follows:
[0062] • Architecture construction: the generator integrates the multi-scale feature extraction, dynamic weight fusion, and quantum enhancement modules, and outputs the optimized unified representations of RNA and drugs. The discriminator part includes a modal discriminator and an association discriminator;
[0063] • Discriminator training: the modal discriminator takes a single-modality feature as input and outputs the modality class probability (sequence, text, image), with cross-entropy as the loss function. The association discriminator takes the RNA and drug representations as input and outputs the probability of the association label (0 or 1), also with cross-entropy as the loss function;
[0064] • Calculation of adversarial losses: the total adversarial loss combines the losses of the two discriminators. The generator is trained against this loss so that modal differences in the features are obfuscated while the ability to predict associations is enhanced.
[0065] Optionally, in step S52, the specific process for calculating the association probability is as follows: the RNA and drug representations optimized through adversarial training are matrix-multiplied, and the association probability is output using the sigmoid function: $P_{rd} = \sigma(z_r^{\top} z_d)$, where $z_r$ and $z_d$ are the optimized representations of the $r$-th RNA and the $d$-th drug, and $P_{rd}$ is the association probability between them, with a value in [0,1].
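The prediction step is a sigmoid applied to the product of the RNA and drug representation matrices. A minimal sketch with nested lists in place of tensors (the function names are hypothetical):

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def association_probabilities(rna_repr, drug_repr):
    """P[r][d] = sigmoid(<z_r, z_d>) for every RNA-drug pair."""
    return [[sigmoid(sum(a * b for a, b in zip(z_r, z_d)))
             for z_d in drug_repr]
            for z_r in rna_repr]


if __name__ == "__main__":
    rna = [[0.5, -0.2], [1.0, 1.0]]
    drug = [[1.0, 2.0], [-1.0, -1.0]]
    for row in association_probabilities(rna, drug):
        print([round(p, 3) for p in row])
```

The output has one row per RNA and one column per drug, the same shape as the binary association matrix used as the training target.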
[0066] Optionally, in step S11, BLAST is run with an E-value ≤ 1e-5 and an alignment length ≥ 50bp; in step S121, the CNN uses 3×3 / 5×5 / 7×7 convolutional kernels; in step S122, the Transformers use 4 / 8 / 12 attention heads and hidden dimensions of 128 / 256 / 512.
[0067] Optionally, in step S43, the quantum measurement is performed 1024 times; in step S44, isolated nodes are filled with their own feature mean; in step S51, the discriminator uses a 3-layer fully connected network with the activation function LeakyReLU (negative slope 0.01); and the model training uses stratified sampling 5-fold cross-validation.
[0068] Beneficial effects
[0069] Compared with the prior art, the present invention has the following beneficial effects:
[0070] 1. Comprehensive and accurate multi-scale feature representation: Through differentiated extraction strategies such as sequence fragmentation, text hierarchization, and multi-resolution image extraction, key biological information such as fragment-level motifs, hierarchical semantics, and multi-scale topology of RNA / drugs are deeply mined, solving the problem of insufficient representation of structured functional elements by traditional single-scale extraction, and significantly improving the completeness and accuracy of feature representation.
[0071] 2. Efficient and adaptable utilization of quantum resources: The first adaptive qubit allocation mechanism dynamically adjusts the size and depth of the quantum circuit according to the feature complexity, enhances the representation ability in complex feature scenarios, and reduces resource redundancy in simple feature scenarios, taking into account both quantum-enhanced expressive power and computational efficiency, and breaking through the resource adaptation limitations of fixed qubits.
[0072] 3. Strong robustness of cross-modal fusion: The dynamic weight fusion strategy combined with modal reliability assessment can adaptively adjust the contribution ratio of each modality; coupled with the cross-modal adversarial training module, it effectively resists the interference of modal noise and data distribution differences, solves the problem of insufficient robustness of traditional static fusion, and improves the generalization ability of the model in complex real-world scenarios.
[0073] 4. Outstanding predictive practical value: It outperforms existing mainstream methods in core metrics such as AUC and AUPR on benchmark datasets, and performs well in small sample scenarios; it can be directly applied to practical scenarios such as RNA-targeted drug development and drug repositioning, providing efficient technical support for the biomedical field.
[0074] In summary, this invention achieves high accuracy, robustness, and practicality in RNA-drug association prediction through multi-module collaborative innovation, providing a key technical solution for overcoming the bottleneck in RNA-targeted therapy research and development, and has broad prospects for industrial application.
Attached Figure Description
[0075] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0076] Figure 1 shows the overall framework of the RNA-drug association prediction method based on quantum adaptive allocation and cross-modal fusion.
Detailed Implementation
[0077] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0078] This application discloses an RNA-drug association prediction method based on quantum adaptive allocation and cross-modal fusion.
[0079] Example
[0080] As shown in Figure 1, an RNA-drug association prediction method based on quantum adaptive allocation and cross-modal fusion is presented.
[0081] 1. Related work
[0082] 1.1 Multimodal Molecular Correlation Prediction Method
[0083] Existing multimodal molecular association prediction methods mainly improve performance by integrating features from sequences, text, and images, but they have significant limitations: ImageDDI only achieves a simple fusion of local motifs and global image features, without considering multi-scale information; MVPFDPC's multi-view fusion lacks dynamic weight adjustment and cannot adapt to changes in the reliability of modal data. Some methods attempt to introduce attention mechanisms, but without combining them with modality evaluation metrics, resulting in limited fusion accuracy. This invention addresses the shortcomings of existing methods, namely, the lack of hierarchical information and static fusion defects, through multi-scale feature extraction and dynamic weight fusion.
[0084] 1.2 Cross-modal adversarial training techniques
[0085] Cross-modal adversarial training, which enhances the domain adaptability of features through GAN structures, has been applied in fields such as image recognition and natural language processing, but it has not yet been effectively applied in biomolecular association prediction. Existing methods lack mechanisms to combat modal noise and distributional differences, resulting in insufficient feature robustness. This invention introduces cross-modal adversarial training into RNA-drug association prediction, strengthening the consistency and anti-interference ability of features and providing a new path to improve prediction generalization.
[0086] 2. Research Methods
[0087] 2.1 Overall Model Architecture
[0088] The overall framework of the model is shown in Figure 1. It includes three core modules: a multimodal input and multi-scale feature encoding module, a dynamic fusion and adaptive quantum enhancement module, and a cross-modal adversarial training and prediction module. The specific process is as follows:
[0089] 1. Perform multi-scale feature extraction on RNA and drug sequence, text, and image data to obtain hierarchical raw features;
[0090] 2. The quality and correlation scores of each modal data are quantified through the modal reliability assessment module;
[0091] 3. The dynamic weighted cross-modal attention encoder combines reliability scores to adaptively fuse multi-scale features and generate an initial unified representation;
[0092] 4. The adaptive qubit allocation module dynamically adjusts the number of qubits based on the complexity of the initial representation, and realizes high-order feature transformation and cross-domain interaction through quantum circuits;
[0093] 5. The cross-modal adversarial training module optimizes the robustness and domain adaptability of representations through adversarial game between the discriminator and the generator;
[0094] 6. Predict the probability of RNA-drug association based on optimized characterization.
[0095] 2.2 Dataset Construction
[0096] The dataset was constructed using RNA-drug association data from three authoritative databases: NoncoRNA, ncDR, and RNAactDrug, and after the following optimized preprocessing steps:
[0097] (1) Data deduplication and completion: Duplicate association records were removed, and missing RNA / drug attribute data (such as sequences and SMILES strings) were completed by homologous sequence alignment and chemical structure inference;
[0098] (2) Multi-scale data standardization: Sequence data is uniformly truncated into fixed-length segments (filling if insufficient, segmenting if too long), text data is uniformly formatted and redundant semantics are removed, and image data is standardized to a 3-channel fixed resolution (224×224).
[0099] (3) Dataset partitioning: Two benchmark datasets were constructed. Dataset 1 contains 308 RNAs, 62 drugs and 1833 known associations, and Dataset 2 contains 700 RNAs, 90 drugs and 4092 validated associations. Both datasets were evaluated using 5-fold cross-validation.
[0100] RNA-drug associations are represented by a binary matrix $A$, where $A_{mn} = 1$ indicates that the $m$-th RNA is associated with the $n$-th drug and $A_{mn} = 0$ indicates no association (including unknown associations).
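Building the binary association matrix from a list of known pairs is straightforward. The sketch below uses zero-based indices and hypothetical function names; the `density` helper is an addition for illustration.

```python
def build_association_matrix(n_rna, n_drug, pairs):
    """Binary matrix A with A[m][n] = 1 for every known (m, n) association."""
    matrix = [[0] * n_drug for _ in range(n_rna)]
    for m, n in pairs:
        matrix[m][n] = 1
    return matrix


def density(matrix):
    """Fraction of entries equal to 1."""
    total = sum(len(row) for row in matrix)
    return sum(sum(row) for row in matrix) / total if total else 0.0


if __name__ == "__main__":
    a = build_association_matrix(3, 2, [(0, 1), (2, 0)])
    print(a, density(a))
```

For the dataset sizes stated above, the positive entries make up about 9.6% of Dataset 1 (1833 of 308 × 62 = 19096 pairs) and about 6.5% of Dataset 2 (4092 of 700 × 90 = 63000 pairs), so the zeros, which include unknown associations, dominate both matrices.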
[0101] 2.3 Multi-scale cross-modal feature extraction module
[0102] For sequence, text, and image modalities, multi-scale extraction mechanisms are designed to mine biological information at different levels:
[0103] 2.3.1 Multi-scale segment feature extraction of sequences
[0104] (1) Multi-scale segmentation: For RNA sequences, segments are divided into short, medium and long scales according to lengths of 20, 50 and 100; for drug sequences (topological fingerprints), feature segments are divided into low, medium and high scales according to feature dimensions of 32, 64 and 128.
[0105] (2) In-scale feature encoding: Bidirectional LSTM is used to encode RNA sequence fragments to capture the nucleotide dependencies within the fragments; CNN is used to convolve drug topological fingerprint fragments to extract local structural features;
[0106] (3) Multi-scale feature fusion: Features at each scale are weighted and fused using scale attention weights (calculated from the fragment information entropy), as shown in the following formula:
[0107] $F_{seq} = \sum_{k} \alpha_k \cdot f_k(S_k)$ (1)
[0108] where $S_k$ is the sequence segment at the $k$-th scale, $f_k$ is the encoding network for the corresponding scale, and $\alpha_k$ is the scale attention weight, obtained by normalizing the fragment information entropy $H_k$.
[0109] 2.3.2 Extraction of Hierarchical Semantic Features from Text
[0110] (1) Text hierarchical division: The RNA / drug functional description text generated by LLM is divided into "basic attribute layer" (such as genomic location, chemical structure), "functional mechanism layer" (such as regulatory pathway, target of action) and "clinical association layer" (such as disease association, treatment indication);
[0111] (2) Hierarchical semantic encoding: The hierarchical Transformer model is used to encode the text at each level. The basic attribute layer uses a 1-layer Transformer, the functional mechanism layer uses a 2-layer Transformer, and the clinical association layer uses a 3-layer Transformer. The deeper the level, the higher the model complexity.
[0112] (3) Semantic fusion and optimization: The hierarchical semantic features are fused through a gating mechanism, as shown in the following formula:
[0113] (2)
[0114] where the inputs are the semantic features of the three levels, and the gate is defined by a gating weight matrix, a bias term, and the sigmoid activation function.
[0115] 2.3.3 Multi-resolution feature extraction of images
[0116] (1) Multi-resolution image generation: The drug molecule image is downsampled to generate three resolution versions: low (56×56), medium (112×112), and high (224×224);
[0117] (2) Resolution feature encoding: ResNet-50 is used as the backbone network to extract features for images of different resolutions. Low-resolution images focus on global structure, while high-resolution images focus on local details.
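[0118] (3) Cross-resolution feature fusion: A Feature Pyramid Network (FPN) is introduced to upsample the low-resolution features and fuse them with the features of the other resolutions, capturing multi-scale spatial topology information, as shown in the following formula:
[0119] $F_{img} = \mathrm{Concat}(\mathrm{Up}(F_{low}), F_{mid}, \mathrm{Down}(F_{high}))$ (3)
[0120] where $\mathrm{Up}$ is the upsampling operation, $\mathrm{Down}$ is the downsampling operation, and $\mathrm{Concat}$ is feature concatenation.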
[0121] 2.4 Modal Reliability Assessment Module
[0122] To quantify the quality and correlation of data from each modality, a modal reliability assessment index is designed to provide a basis for dynamic weight fusion.
[0123] (1) Data quality score: for the sequence modality, it is calculated from sequence integrity and conservation; for the text modality, from the confidence and information completeness of the LLM-generated text; for the image modality, from image sharpness and structural discernibility.
[0124] (2) Task relevance score: the contribution of a modality to the prediction task is measured by calculating the mutual information between the modal features and the RNA-drug association labels.
[0125] (3) Overall reliability score: the final reliability score is obtained by weighted fusion of the two scores above, with a balance coefficient of 0.4 (optimized through experiments).
[0126] 2.5 Dynamic Weighted Cross-Modal Attention Encoder
[0127] By combining modal reliability scores, adaptive fusion of multi-scale features is achieved. The core steps are as follows:
[0128] (1) Intramodal feature enhancement: For the multi-scale features of each modality, the modality-specific structure and semantic features are preserved by encoding through a dedicated graph attention network (GAT), as shown in the following formula:
[0129] (4)
[0130] where the input is the k-th scale feature of the m-th modality, and the encoder is a GAT network dedicated to that modality.
[0131] (2) Dynamic cross-modal attention calculation: The initial attention weights are calculated from the modal reliability scores and then refined by an interactive attention mechanism, as shown in the following formula:
[0132] (5)
[0133] where the formula uses the mean of the modal features and the cosine similarity between each modal feature and that mean.
[0134] (3) Unified representation generation: The initial unified representations of RNA and drugs are obtained by fusing the features of the individual modalities with the dynamic attention weights, as shown in the following formula:
[0135] (6)
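Steps (2) and (3) can be illustrated, without limitation, by the sketch below. It assumes that the initial weights are a softmax over the reliability scores, that each weight is then scaled by the cosine similarity between the modal feature and the mean of all modal features, and that the result is renormalized to sum to 1. Clamping negative similarities to zero and the renormalization are assumptions, and the GAT encoding of step (1) is omitted. The function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def fuse_modalities(features, reliability):
    """Return (attention weights, fused vector) for one RNA or one drug."""
    dim = len(features[0])
    mean = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    init = softmax(reliability)  # reliability-based initial weights
    # Assumption: negative similarities are clamped so that weights stay non-negative.
    raw = [w * max(cosine(f, mean), 0.0) for w, f in zip(init, features)]
    total = sum(raw)
    weights = [r / total for r in raw] if total > 0 else init
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, fused


# Three modalities (sequence, text, image) with toy 2-dimensional features.
weights, fused = fuse_modalities([[1.0, 0.0]] * 3, [0.9, 0.5, 0.1])
```

In this toy call the three features are identical, so the similarity term is the same for every modality and the ordering of the weights follows the reliability scores alone.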
[0136] 2.6 Quantum Enhancement Interaction Module with Adaptive Qubit Allocation
[0137] 2.6.1 Feature Complexity Evaluation
[0138] The feature complexity of the initial unified representations of RNA and drugs is calculated as the basis for qubit allocation:
[0139] (7)
[0140] where the two terms are the normalized value of the feature dimension and the feature sparsity (the proportion of non-zero elements), combined through a balance coefficient (optimized to 0.6 through experiments).
[0141] 2.6.2 Adaptive Quantum Bit Allocation Rule
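[0142] Based on the feature complexity, the number of qubits is adjusted dynamically. The allocation rules are as follows:
[0143] · when the feature complexity is in the low range (low complexity): 3 qubits are allocated, which satisfies the basic representation requirements;
[0144] · when the feature complexity is in the medium range (medium complexity): 5 qubits are allocated, balancing representation capability against resource consumption;
[0145] · when the feature complexity is in the high range (high complexity): 7 qubits are allocated, enhancing the capture of higher-order interactions.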
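The complexity evaluation of 2.6.1 and the allocation rules of 2.6.2 can be sketched as follows, purely as an example. The following are assumptions: the complexity is a convex combination in which the balance coefficient 0.6 multiplies the normalized dimension, the dimension is normalized by a hypothetical `max_dim`, and the two thresholds are supplied by the caller because their values are not reproduced here. The threshold values 0.3 and 0.7 in the usage lines are illustrative placeholders only, as is the choice of strict inequalities at the boundaries.

```python
# Number of qubits -> circuit depth in layers (3 -> 2, 5 -> 3, 7 -> 4).
DEPTH_BY_QUBITS = {3: 2, 5: 3, 7: 4}


def feature_complexity(vec, max_dim, alpha=0.6):
    """Assumed form: alpha * normalized dimension + (1 - alpha) * sparsity."""
    dim_norm = min(len(vec) / max_dim, 1.0)               # mapped to [0, 1]
    sparsity = sum(1 for x in vec if x != 0) / len(vec)   # proportion of non-zero elements
    return alpha * dim_norm + (1.0 - alpha) * sparsity


def allocate_qubits(complexity, low_threshold, high_threshold):
    """Return (number of qubits, circuit depth); thresholds are caller-supplied."""
    if complexity < low_threshold:
        n_qubits = 3
    elif complexity < high_threshold:
        n_qubits = 5
    else:
        n_qubits = 7
    return n_qubits, DEPTH_BY_QUBITS[n_qubits]


c = feature_complexity([1.0, 0.0, 0.0, 0.0], max_dim=8)
# Illustrative thresholds only; not taken from the original text.
n_qubits, depth = allocate_qubits(c, low_threshold=0.3, high_threshold=0.7)
```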
[0146] 2.6.3 Quantum-enhanced cross-domain interaction
[0147] (1) Classical-quantum projection: The initial unified representation is normalized and mapped to a quantum rotation angle, as shown in the following formula:
[0148] (8)
[0149] where the activation function is the sigmoid function.
[0150] (2) Adaptive quantum circuit encoding: A quantum circuit is constructed according to the assigned number of qubits, comprising single-qubit rotation layers (RX, RY, RZ) and ring-entanglement layers (CNOT); the circuit depth is adjusted adaptively with the number of qubits (3 qubits → 2 layers, 5 qubits → 3 layers, 7 qubits → 4 layers).
[0151] (3) Quantum-classical projection: The expectation value of the Pauli-Z operator is measured at the output of the quantum circuit and mapped back to the classical feature space through a fully connected network to obtain the quantum-enhanced representation:
[0152] (9)
[0153] where the input of the fully connected network is the quantum measurement result.
[0154] (4) Cross-domain information transfer: Based on known RNA-drug associations, cross-domain information exchange is achieved through symmetric neighborhood aggregation, as shown in the following formula:
[0155] (10)
[0156] (11)
[0157] where the two mapping networks are two-layer MLPs that achieve feature dimension matching.
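The classical-quantum projection, circuit encoding and Pauli-Z measurement of steps (1) to (3) can be illustrated by the small statevector simulation below. It is a sketch under stated assumptions and not the implementation of the invention: the upper bound of the angle range is passed as a hypothetical `max_angle` parameter, exact expectation values are computed instead of sampling a finite number of measurement shots, the ring entanglement is applied as CNOT gates from each qubit to its next neighbour in index order, and the fully connected mapping back to the classical space and the cross-domain aggregation of step (4) are omitted.

```python
import cmath
import math


def to_angle(x, max_angle):
    """Classical-quantum projection: squash x into (0, max_angle) with a sigmoid."""
    return max_angle / (1.0 + math.exp(-x))


def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return ((c, -1j * s), (-1j * s, c))


def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return ((c, -s), (s, c))


def rz(t):
    return ((cmath.exp(-1j * t / 2), 0), (0, cmath.exp(1j * t / 2)))


def apply_gate(state, gate, q):
    """Apply a 2x2 gate to qubit q (qubit 0 is the least significant bit)."""
    out, bit = list(state), 1 << q
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            out[i] = gate[0][0] * a + gate[0][1] * b
            out[i | bit] = gate[1][0] * a + gate[1][1] * b
    return out


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    out = list(state)
    for i in range(len(state)):
        if i & (1 << control):
            out[i] = state[i ^ (1 << target)]
    return out


def run_circuit(n_qubits, layers):
    """layers[l][q] = (rx, ry, rz) angles; returns the Pauli-Z expectation per qubit."""
    state = [0j] * (1 << n_qubits)
    state[0] = 1 + 0j
    for layer in layers:
        for q, (ax, ay, az) in enumerate(layer):
            for gate in (rx(ax), ry(ay), rz(az)):  # RX, RY, RZ in series
                state = apply_gate(state, gate, q)
        for q in range(n_qubits):  # ring entanglement layer
            state = apply_cnot(state, q, (q + 1) % n_qubits)
    return [
        sum(abs(a) ** 2 * (-1 if i & (1 << q) else 1) for i, a in enumerate(state))
        for q in range(n_qubits)
    ]


# One layer on 3 qubits: only qubit 0 is rotated by RX(pi).
z_values = run_circuit(3, [[(math.pi, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]])
```

A circuit of this size holds at most 2^7 = 128 amplitudes for the largest allocation of 7 qubits, so a plain list is sufficient for illustration.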
[0158] 2.7 Cross-modal adversarial training module
[0159] To improve the robustness and domain adaptability of the representation, cross-modal adversarial training is introduced to construct a "generator-discriminator" architecture:
[0160] (1) Generator: the multi-scale feature extraction, dynamic weight fusion, and quantum enhancement modules described above, which output the unified representations of RNA and drugs.
[0161] (2) Discriminator: A dual-discriminator structure is designed, comprising a modal discriminator and an association discriminator.
[0162] Modal discriminator: The input is a single-modality feature, and the goal is to identify which modality the feature comes from. The generator is trained adversarially to blur the modality differences in the features and improve domain adaptability.
[0163] Association discriminator: The input is the pair of RNA and drug representations, and the goal is to distinguish association labels 1 and 0. Through adversarial training, the generator improves the association-prediction ability of the representations.
[0164] (3) Adversarial loss function:
[0165] (12)
[0166] where the two terms are the cross-entropy loss of the modal discriminator and the cross-entropy loss of the association discriminator.
[0167] (4) Joint optimization: The total loss function is the weighted sum of the prediction loss and the adversarial loss:
[0168] where the prediction loss is the cross-entropy loss for RNA-drug association prediction, and the adversarial loss weight is optimized to 0.1 through experiments.
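The loss terms of steps (3) and (4) can be sketched as below, as an example only. The text states that the total loss is a weighted sum of the prediction loss and the adversarial loss with weight 0.1; how the two discriminator losses are combined into the adversarial loss is assumed here to be a plain sum, and the sign convention used when updating the generator is not modelled. The function names and the small constant `EPS` are hypothetical.

```python
import math

EPS = 1e-12  # numerical guard against log(0); an implementation detail of this sketch


def binary_cross_entropy(probs, labels):
    """Mean cross-entropy for association labels in {0, 1}."""
    return -sum(
        y * math.log(p + EPS) + (1 - y) * math.log(1 - p + EPS)
        for p, y in zip(probs, labels)
    ) / len(labels)


def categorical_cross_entropy(prob_rows, classes):
    """Mean cross-entropy over modality classes (sequence, text, image)."""
    return -sum(math.log(row[c] + EPS) for row, c in zip(prob_rows, classes)) / len(classes)


def adversarial_loss(modal_probs, modal_classes, assoc_probs, assoc_labels):
    """Assumed combination: sum of the two discriminator cross-entropy losses."""
    return (categorical_cross_entropy(modal_probs, modal_classes)
            + binary_cross_entropy(assoc_probs, assoc_labels))


def total_loss(pred_loss, adv_loss, adv_weight=0.1):
    """Weighted sum of the prediction loss and the adversarial loss."""
    return pred_loss + adv_weight * adv_loss
```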
[0169] 2.8 Association Prediction and Model Optimization
[0170] (13)
[0171] (1) Association probability calculation: The optimized RNA and drug representations are multiplied as matrices, and the association probability is output through the sigmoid function:
[0172] (14)
[0173] (2) Model optimization: The AdamW optimizer is used. The learning rate is initially set to 1e-4 and decayed by a factor of 0.9 every 10 epochs. The training batch size is set to 32, and the number of training iterations is set to 350 (dataset 1) / 150 (dataset 2). An early stopping strategy (training stops if the validation-set AUC does not improve for 10 consecutive epochs) is used to avoid overfitting.
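The two steps of Section 2.8 can be illustrated by the following minimal sketch. It shows the sigmoid of the RNA-drug inner products, a step decay of the learning rate, and an early-stopping counter. The AdamW update itself is not reproduced, the decay is read as multiplication by 0.9 every 10 epochs, and the function and class names are hypothetical.

```python
import math


def association_probability(rna_vecs, drug_vecs):
    """sigmoid(R @ D^T): entry [r][d] is the predicted association probability."""
    return [
        [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(r, d)))) for d in drug_vecs]
        for r in rna_vecs
    ]


def learning_rate(epoch, base_lr=1e-4, gamma=0.9, step=10):
    """Step decay: multiply the learning rate by gamma every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


class EarlyStopping:
    """Stop when the validation AUC has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, val_auc):
        if val_auc > self.best:
            self.best, self.bad_epochs = val_auc, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `learning_rate(epoch)` would be evaluated at the start of each epoch and `should_stop(val_auc)` after each validation pass.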
[0174] 2.9 Experimental Verification
[0175] (1) Implementation details: To comprehensively and fairly evaluate the performance of the method of this invention and mitigate the impact of random data splitting, we conducted a 5-fold cross-validation (CV) experiment on the dataset, uniformly dividing the samples into 5 subsets, selecting one subset in turn as the test set, and using the rest for training. Based on this, we adopted a set of comprehensive performance metrics, including accuracy, precision, recall, F1 score, area under the receiver operating characteristic curve (AUC), and area under the precision-recall curve (AUPR), to provide supplementary evaluation of prediction performance.
[0176] (2) Overall performance comparison: The test results of the core classification metrics (AUC, AUPR, F1-Score, Recall, Precision, MCC, Accuracy) of the method of this invention and the comparison methods on two benchmark datasets are shown in the table below. All metrics are the mean ± standard deviation after 5-fold cross-validation.
[0177]
[0178] As can be seen from the above results, the method of the present invention significantly outperforms all the comparative methods on multiple metrics on both datasets, indicating that the present invention effectively improves the accuracy of RNA-drug association prediction through the synergistic design of multi-scale feature extraction, dynamic weight fusion and adaptive quantum enhancement.
[0179] (3) Ablation experiment: To verify the effectiveness of each core module of the present invention, an ablation experiment was designed. The multi-scale feature extraction (MSFE), dynamic weight fusion (DWF), adaptive quantum enhancement (AQE), and cross-modal adversarial training (CMAT) modules were removed in sequence. The AUC and AUPR of the model were tested on datasets 1 and 2. The results are shown in the table below.
[0180]
[0181] Ablation experiments show that all four core modules of this invention significantly contribute to model performance. Multi-scale feature extraction (MSFE) is fundamental; its removal significantly reduces performance because it addresses the shortcomings of traditional single-scale extraction in representing the structural functional elements of RNA / drugs. Adaptive quantum enhancement (AQE) and dynamic weight fusion (DWF) are key to performance improvement, enhancing the capture of higher-order interactions and the accuracy of cross-modal fusion, respectively. Cross-modal adversarial training (CMAT) further strengthens the model's robustness and generalization ability. The synergistic effect of these modules achieves an overall improvement in model performance, validating the rationality and innovation of this invention.
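The 5-fold cross-validation protocol of item (1) above, together with the AUC metric, can be sketched as follows for illustration. The split shown is stratified by label, the AUC is computed by direct pairwise comparison rather than by a library routine, and the function names and the fixed random seed are hypothetical.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) with each class spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1] * 10 + [0] * 40  # toy label vector
splits = list(stratified_k_fold(labels))
```

Each of the five test folds in this toy example contains 2 positive and 8 negative samples, and every sample appears in exactly one test fold.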
[0182] Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not invented by the invention. The specification and examples are to be considered exemplary only, and the true scope and spirit of the invention are indicated by the appended claims.
[0183] It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.
Claims
1. A method for predicting RNA-drug associations based on quantum adaptive allocation and cross-modal fusion, characterized in that it comprises the following steps:
S1. Construct an optimized RNA-drug association dataset and perform multi-scale cross-modal feature extraction: collect RNA-drug association data from authoritative databases, perform deduplication and completion, standardize and preprocess the data, and then use the multi-scale feature extraction module to build hierarchical representations of the RNA / drug sequence, text, and image data, obtaining multi-scale modal features;
S2. Using the modal reliability assessment module, calculate the data quality score and the task relevance score of each modality to obtain the overall reliability assessment result;
S3. Based on the reliability assessment result, adaptively fuse the multi-scale modal features using a dynamic weighted cross-modal attention encoder to generate an initial unified representation;
S4. Construct a quantum-enhanced interaction module with adaptive qubit allocation, dynamically adjust the scale of the quantum circuit according to the complexity of the initial unified representation, and realize cross-domain information interaction and feature enhancement;
S5. Optimize the robustness and domain adaptability of the feature representation through cross-modal adversarial training, output the RNA-drug association probability in combination with the association prediction model, and improve the prediction accuracy through model training optimization.
2. The method according to claim 1, characterized in that the specific process of step S1 is as follows:
S11. Data acquisition and preprocessing:
• Data acquisition and deduplication: data are acquired from the NoncoRNA, ncDR, and RNAactDrug databases, and duplicate records are removed through sequence alignment and SMILES matching;
• Data completion: missing RNA sequences are completed using BLAST homology alignment (homology ≥ 85%), missing drug SMILES are completed using structural similarity (Tanimoto coefficient ≥ 0.9), and missing text / images are completed using Qwen-Max generation and RDKit reconstruction, respectively;
• Multi-scale standardization: sequences are divided into segments of length 20, 50, and 100 (shorter sequences are padded, and longer sequences are split by sliding segmentation); text is uniformly encoded as UTF-8 and redundant information is removed; images are standardized to 224×224 RGB images with pixel values normalized to [0,1];
• Dataset construction: two benchmark datasets are constructed; dataset 1 contains 308 RNAs, 62 drugs, and 1833 known associations; dataset 2 contains 700 RNAs, 90 drugs, and 4092 validated associations; RNA-drug associations are represented by a binary matrix;
S12. Multi-scale cross-modal feature extraction:
S121. Multi-scale segment feature extraction of sequences:
• Multi-scale fragment segmentation: RNA sequences are segmented into non-overlapping fragments of length 20 (short scale), 50 (medium scale), and 100 (long scale); drug topological fingerprints are segmented into feature fragments of dimension 32 (low scale), 64 (medium scale), and 128 (high scale);
• Intra-scale encoding: RNA sequence fragments are encoded using a 2-layer bidirectional LSTM (hidden dimension 256, dropout rate 0.3) to capture the sequential dependencies of nucleotides within the fragment; drug topological fingerprint fragments are encoded using a 3-layer CNN to extract local structural features;
• Multi-scale fusion: scale attention weights are calculated from the information entropy of the fragments, the information entropy being calculated from the probability of occurrence of the feature elements in the fragment; the scale features are then fused with these weights, each k-th scale segment being encoded by its corresponding encoding network;
S122. Extraction of hierarchical semantic features of text:
• Semantic hierarchy division: the RNA / drug function description text generated by Qwen-Max is divided into a "basic attribute layer", a "functional mechanism layer", and a "clinical relevance layer" according to semantic hierarchy;
• Hierarchical encoding: the basic attribute layer uses a 1-layer Transformer, the functional mechanism layer uses a 2-layer Transformer, and the clinical relevance layer uses a 3-layer Transformer (the deeper the semantic layer, the higher the model complexity, so as to match the semantic depth);
• Semantic fusion: the hierarchical semantic features are fused through a gating mechanism, in which the semantic features of the three levels are combined using a gated weight matrix, a bias term, and the sigmoid activation function;
S123. Multi-resolution feature extraction of images:
• Multi-resolution image generation: the drug molecule image is downsampled using bilinear interpolation to generate three versions: low resolution (56×56), medium resolution (112×112), and high resolution (224×224);
• Resolution feature encoding: ResNet-50 is used as the backbone network (pre-trained weights based on ImageNet and ImageMol are fine-tuned) to extract features from the image at each resolution; low-resolution images focus on global structural features, medium-resolution images focus on local-global fusion features, and high-resolution images focus on local detail features;
• Cross-resolution feature fusion: a feature pyramid network is introduced; high-resolution features are reduced to 256 dimensions through 1×1 convolution, low-resolution features are upsampled to 256 dimensions, and these are concatenated and fused with the medium-resolution features (reduced to 256 dimensions) using the upsampling operation, the downsampling operation, and feature concatenation, the final output dimension being 768.
3. The method according to claim 1, characterized in that the specific process of designing the adaptive modal weight allocation module in step S2 is as follows:
S21. Data quality score calculation:
• Sequence modality: calculated as a weighted combination of sequence integrity (a missing base / feature ratio ≤ 5% gives the full score of 1.0, and the score decreases by 0.2 for every additional 5%) and sequence conservation (consistency calculated through multiple sequence alignment), with weights of 0.5 and 0.5 respectively;
• Text modality: calculated as a weighted combination of the confidence and the information completeness of the LLM-generated text, with weights of 0.5 and 0.5 respectively;
• Image modality: calculated as a weighted average of image sharpness and structural discernibility, with weights of 0.5 and 0.5 respectively;
S22. Task relevance score calculation: the score is quantified by the mutual information between the modal features and the RNA-drug association labels; the mutual information is calculated between the modal features and the association labels, and its value is then normalized to the [0,1] interval as the task relevance score;
S23. Overall reliability score calculation: the overall reliability score is calculated by weighted fusion of the data quality score and the task relevance score, where the balance coefficient ensures a balanced consideration of data quality and task relevance.
4. The method according to claim 1, characterized in that the specific process of step S3 is as follows:
S31. Intramodal feature enhancement: a modality-specific graph attention network (GAT) is used to encode the multi-scale features of each modality; the GAT network consists of two attention layers (8 attention heads, hidden dimension 256, dropout rate 0.3); the input is the k-th scale feature of the m-th modality, and the modality-specific GAT network enhances key features within the modality through a self-attention mechanism;
S32. Dynamic cross-modal attention calculation:
• Initial weight calculation: the initial attention weights are calculated from the reliability scores using the softmax function;
• Weight optimization: the cosine similarity between each modal feature and the mean of all modal features is calculated, and the final attention weights are obtained through similarity-weighted optimization, which ensures that the fusion weights consider both modal reliability and feature consistency;
S33. Unified representation generation: the modal features are fused with the final attention weights to generate the initial unified representations of RNA and drugs, with a dimension of 256; the representations are normalized using LayerNorm (mean 0, variance 1) to improve the training stability of subsequent modules.
5. The method according to claim 1, characterized in that the specific process of step S4 is as follows:
S41. Feature complexity evaluation: the feature complexity is calculated from the normalized value of the feature dimension (the feature dimension mapped to the [0,1] interval) and the feature sparsity (the proportion of non-zero elements), where the balance coefficient highlights the impact of the feature dimension on complexity;
S42. Adaptive qubit allocation:
• Allocation rules: for low complexity (such as drugs with simple structures and short RNA sequences), 3 qubits are allocated; for medium complexity (such as drugs with medium structures or medium-length RNA), 5 qubits are allocated; for high complexity (such as drugs with complex structures or long RNA sequences), 7 qubits are allocated;
• Quantum circuit adaptation: the depth of the quantum circuit is adjusted adaptively according to the number of qubits; 3 qubits correspond to a 2-layer circuit (each layer contains single-qubit rotation + ring entanglement), 5 qubits correspond to a 3-layer circuit, and 7 qubits correspond to a 4-layer circuit, avoiding resource redundancy or insufficient representation;
S43. Quantum enhancement encoding:
• Classical-quantum projection: LayerNorm normalization is performed on the initial unified representation, and the result is then mapped to a quantum rotation angle through a sigmoid activation function (the sigmoid function is used to ensure that the angle range is [0, ... ]);
• Quantum circuit processing: the quantum circuit contains single-qubit rotation layers (RX, RY, and RZ rotation gates connected in series, with rotation angles determined by the quantum rotation angle) and ring entanglement layers (adjacent qubits are entangled through CNOT gates, forming a ring topology);
• Quantum-classical projection: the expectation value of the Pauli-Z operator is measured on each qubit to obtain the quantum measurement result, which is then mapped back to the classical feature space through a two-layer fully connected network (hidden dimension 128, ReLU activation function), where the first weight matrix has 128 rows and the second is a 256×128-dimensional weight matrix;
S44. Cross-domain information transfer: an interaction graph is constructed based on known RNA-drug associations, and cross-domain information exchange is achieved through symmetric neighborhood aggregation, with one formula for the RNA side and one formula for the drug side.
6. The method according to claim 1, characterized in that the specific process of step S5 is as follows:
S51. Cross-modal adversarial training:
• Architecture construction: the generator integrates the multi-scale feature extraction, dynamic weight fusion, and quantum enhancement modules and outputs the optimized unified representations of RNA and drugs; the discriminator includes a modal discriminator and an association discriminator;
• Discriminator training: the modal discriminator takes a single-modality feature as input and outputs the modality class probability (sequence, text, image), with a cross-entropy loss function; the association discriminator takes the RNA and drug representations as input and outputs the probability of the association label (0 or 1), with a cross-entropy loss function;
• Calculation of adversarial loss: the total adversarial loss is formed from the losses of the two discriminators; through the adversarial objective that it minimizes, the generator blurs the modality differences in the features and enhances the ability to predict associations;
S52. Association probability calculation: the RNA and drug representations optimized through adversarial training are multiplied as matrices, and the association probability is output using the sigmoid function, giving the association probability between the r-th RNA and the d-th drug, with a value in the range [0,1].
7. The method according to claim 2, characterized in that in step S11, the BLAST tool uses an E value ≤ 1e-5 and an alignment length ≥ 50 bp; in step S121, the CNN uses 3×3 / 5×5 / 7×7 convolutional kernels; in step S122, the Transformer has 4 / 8 / 12 attention heads and hidden dimensions of 128 / 256 / 512.
8. The method according to claim 5, characterized in that in step S43, the number of quantum measurements is 1024; in step S44, isolated nodes are filled with their own feature mean; in step S51, the discriminator uses a 3-layer fully connected network with the LeakyReLU activation function (negative slope 0.01); and the model training uses 5-fold cross-validation with stratified sampling.