A method for predicting retinopathy of prematurity

By constructing a multimodal deep learning model and combining it with multimodal data from newborns within 72 hours after birth, the problem of early and accurate diagnosis of neonatal retinopathy of prematurity was solved, enabling early, non-invasive, multi-dimensional prediction and improving prediction accuracy and clinical applicability.

CN122245727A · Pending · Publication Date: 2026-06-19 · NORTHWEST WOMEN & CHILDREN HOSPITAL


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: NORTHWEST WOMEN & CHILDREN HOSPITAL
Filing Date: 2026-04-23
Publication Date: 2026-06-19

Application Information

Patent Timeline

23 Apr 2026: Application
19 Jun 2026: Publication (CN122245727A)

IPC: G16H50/20; G16H50/30; G16H50/70; G06F18/241; G06F18/2433; G06F18/25

AI Tagging

Application Domain

Medical data mining; Health-index calculation


AI Technical Summary

Technical Problem

Current approaches to diagnosing neonatal retinopathy suffer from delayed detection, procedural risk, and low accuracy. In particular, fundus examinations depend on specialist resources, which makes early and accurate prediction of retinopathy difficult.

Method used

Multimodal data, comprising basic clinical data, fundus imaging data, and serum biomarker data, are collected from newborns within 72 hours of birth. A multimodal fusion deep learning model is constructed and used to predict and output the ROP risk level and its confidence level.

Benefits of technology

It achieves early, non-invasive, multi-dimensional, and accurate prediction, lowers the resource threshold, is suitable for promotion in primary hospitals, improves prediction accuracy and sensitivity, and outputs key influencing factors to assist clinical decision-making.



Abstract

This invention discloses a method for predicting neonatal retinopathy of prematurity (ROP), comprising the following steps: S1, collecting multimodal data of newborns within 72 hours after birth; S2, preprocessing and extracting features from the multimodal data; S3, constructing a multimodal fusion deep learning model containing three feature extraction branches, a cross-modal attention fusion layer, and a risk classification layer; S4, training the multimodal fusion deep learning model and optimizing it through data augmentation and a weighted cross-entropy loss function to obtain the optimal prediction model; S5, inputting the preprocessed data of the newborn to be predicted into the optimal prediction model, and outputting the ROP risk level, confidence level, and key influencing factors. This invention employs the above-mentioned method for predicting neonatal retinopathy of prematurity, which is characterized by early detection and non-invasiveness, enabling accurate multi-dimensional prediction, and possesses strong clinical applicability, thus helping to reduce the resource threshold for neonatal retinopathy screening.


Description

Technical Field

[0001] This invention relates to the field of medical data processing technology, and in particular to a method for predicting neonatal retinopathy.

Background Technology

[0002] Retinopathy of prematurity (ROP) is a blinding disease caused by abnormal retinal vascular development in premature infants. Its pathogenesis is closely related to factors such as hypoxia and an imbalance of angiogenic factors. Currently, clinical diagnosis relies on fundus examination (such as indirect ophthalmoscopy and fundus photography), which presents the following key issues: (1) Delayed diagnosis: the first examination is usually performed at 31-32 weeks of corrected gestational age, by which time some infants have already entered the disease progression stage and missed the best intervention window; (2) Invasive risks: the examination requires pupil dilation and contact with the eye, which may cause complications such as eye infection and increased intraocular pressure in premature infants; (3) Reliance on professional resources: interpretation of fundus images requires experienced ophthalmologists, and primary hospitals are prone to missed diagnoses due to insufficient resources (the missed diagnosis rate is about 20%); (4) Single prediction dimension: existing models are mostly based on basic indicators such as "gestational age + birth weight", do not integrate retinal structural features or molecular markers, and have an accuracy below 70%.

[0003] Therefore, there is an urgent need for an early and multi-dimensional ROP prediction method to address the pain points of "lag, high risk, and low accuracy" in existing technologies.

Summary of the Invention

[0004] The purpose of this invention is to provide a method for predicting neonatal retinopathy of prematurity (ROP). By integrating multimodal non-invasive data and constructing a deep learning fusion model, it can achieve early and accurate prediction of ROP and provide decision support for clinical intervention.

[0005] To achieve the above objectives, the present invention provides a method for predicting neonatal retinopathy, comprising the following steps:
S1. Collect multimodal data of newborns within 72 hours after birth. The multimodal data includes clinical basic data, fundus imaging data, and serum biomarker data.
S2. Preprocess and extract features from the multimodal data, including handling missing values and feature encoding of clinical data, enhancement of fundus images and extraction of structural features, calibration of serum biomarker concentrations and time series processing, and standardization of all data to the [0, 1] interval.
S3. Construct a multimodal fusion deep learning model that includes three feature extraction branches, a cross-modal attention fusion layer, and a risk classification layer.
S4. Train the multimodal fusion deep learning model using multi-center labeled data, and optimize it through data augmentation and a weighted cross-entropy loss function to obtain the optimal prediction model.
S5. Input the preprocessed data of the newborn to be predicted into the optimal prediction model, and output the ROP risk level, confidence level, and key influencing factors.

[0006] Preferably, in S1, the basic clinical data are extracted through the electronic medical record system and include: demographic characteristics: corrected gestational age, birth weight, sex; clinical indicators: Apgar score, duration of oxygen therapy, frequency of fluctuations in arterial blood oxygen saturation, and whether there is concurrent acute respiratory distress syndrome. Fundus imaging data are collected using a non-contact, non-mydriatic fundus screening instrument to capture images of the posterior pole retina of both eyes, avoiding contact with the eyes during acquisition. Serum biomarker data are obtained from heel blood samples, and the concentrations of the following indicators are measured by enzyme-linked immunosorbent assay (ELISA): angiogenic factors: vascular endothelial growth factor (VEGF), insulin-like growth factor-1 (IGF-1); inflammatory factors: interleukin-6 (IL-6), tumor necrosis factor-α (TNF-α).

[0007] Preferably, in S2, missing values in the clinical data are handled by stratified interpolation: the data are grouped by corrected gestational age and birth weight, and the mean within each group is used to fill in the missing values. Feature encoding converts categorical variables into one-hot codes, where 1 = condition present and 0 = condition absent, and continuous variables retain their original values. Fundus image enhancement employs adaptive histogram equalization to enhance the contrast between blood vessels and background, and Gaussian filtering to remove noise. Structural feature extraction uses the U-Net segmentation model to extract the retinal vascular region, from which the following are calculated: vascular branch density, coefficient of variation of vessel diameter, and uniformity of gray values in the macular region. Serum biomarker concentration calibration corrects the original absorbance values based on the standard curve of the test kit to obtain the actual concentration; during time series processing, if there are no fewer than three tests, time series data are constructed, and missing time points are filled in by linear interpolation.

[0008] Preferably, the formula for mapping clinical, imaging, and serum data to the [0, 1] interval is as follows: $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$; where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature in the training set.

[0009] Preferably, in S3, the three feature extraction branches are as follows: Clinical data feature extraction network: A fully connected neural network is used, with the following structure: input layer, containing 8-dimensional clinical features → hidden layer 1, containing 64 neurons → hidden layer 2, containing 128 neurons → output layer, containing 128-dimensional clinical feature vectors. Fundus image feature extraction network: Based on a lightweight convolutional neural network, the structure is as follows: Input layer, containing 3-channel fundus images, → 16 depthwise separable convolutional layers → global average pooling layer → output layer, containing 256-dimensional image feature vectors; Serum biomarker feature extraction network: employs gated recurrent units, with the following structure: Input layer, containing 4-dimensional biomarker time series data → GRU layer, containing 64 hidden units → fully connected layer, containing 128 neurons → output layer, containing 128-dimensional serum feature vectors.

[0010] Preferably, the cross-modal attention fusion layer calculates the attention weights of the three branch feature vectors, specifically as follows: the clinical ($F_1$), image ($F_2$), and serum ($F_3$) feature vectors are dimensionally aligned; the importance weight $\alpha_i$ of each modality is then calculated (the weight formula is not reproduced in this text), where $\alpha_i$ is the weight coefficient, $F_i$ and $F_j$ are the feature vectors of the $i$-th and $j$-th modalities, the core feature vector $\bar{F} = (F_1 + F_2 + F_3)/3$ is the mean of the three modality feature vectors, and $S$ is the similarity function; the fused feature vector is generated as $F_{\text{fused}} = \alpha_1 F_1 + \alpha_2 F_2 + \alpha_3 F_3$.

[0011] Preferably, the risk classification layer adopts a two-layer fully connected network with the following structure: fused features → hidden layer containing 64 neurons → output layer containing 3 neurons; The output results correspond to the ROP risk level: Category 0: No risk; Category 1: Low risk; Category 2: High risk.

[0012] Preferably, in S4, the dataset is first constructed by collecting samples from hospital NICUs and dividing them into training, validation, and test sets in a ratio of 7:2:1. Data augmentation includes: Image enhancement: Randomly rotate, adjust brightness, and horizontally flip fundus images to expand the image sample size; Class balancing: Oversampling + weighted sampling is used. High-risk samples are oversampled to a 1:1 ratio with low-risk samples. During training, samples are sampled according to class weights.

[0013] Preferably, the parameter settings in the optimization process of the weighted cross-entropy loss function are as follows: Optimizer: Adam, initial learning rate 0.001, decaying by 10% every 5 epochs; Loss function: weighted cross-entropy loss, with the formula $L = -\sum_{i}\sum_{c} w_c \, y_{i,c} \log \hat{y}_{i,c}$; where $L$ is the weighted cross-entropy loss of the model over a batch of samples, $w_c$ is the weight of class $c$, $y_{i,c}$ is the true label, and $\hat{y}_{i,c}$ is the predicted probability; Training termination condition: the validation set accuracy shows no improvement for 5 consecutive epochs, at which point the optimal model parameters are saved. Model validation: five-fold cross-validation is used to ensure the stability of the model on data from different centers.

[0014] Preferably, in S5, the output ROP risk level includes "no risk / low risk / high risk", corresponding to ROP stage; the confidence level output is the probability value of the corresponding risk level; key influencing factors are traced back through the attention mechanism, and the top 3 features that contribute to the prediction are output to help doctors understand the prediction logic.

[0015] Therefore, the beneficial effects of the above-mentioned method for predicting neonatal retinopathy are as follows: (1) The method of the present invention has the characteristics of early detection and non-invasiveness. It can be predicted within 72 hours of the newborn's birth, which is 2-4 weeks earlier than the first clinical examination. It uses non-mydriatic imaging and heel blood to avoid the risk of invasive operation.

[0016] (2) The method of the present invention can achieve multi-dimensional accurate prediction, integrating three layers of data: clinical, imaging and molecular. The accuracy, sensitivity and specificity are significantly better than traditional models.

[0017] (3) The method of the present invention has strong clinical applicability, outputs key influencing factors, and solves the "black box" problem of deep learning; the model's single-case prediction time is <1 second, and it can be integrated into the NICU electronic medical record system for real-time use.

[0018] (4) The method of the present invention can reduce the resource threshold and can complete the screening without the need for professional ophthalmologists, making it suitable for promotion in primary hospitals.

[0019] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0020] Figure 1 is a flowchart illustrating an embodiment of a neonatal retinopathy prediction method according to the present invention.

Detailed Implementation

[0021] The technical solution of the present invention will be further described below with reference to the accompanying drawings and embodiments.

[0022] Unless otherwise defined, the technical or scientific terms used in this invention shall have the ordinary meaning understood by one of ordinary skill in the art to which this invention pertains. The terms "first," "second," and similar terms used in this invention do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are used only to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

[0023] Example 1: As shown in Figure 1, the present invention provides a method for predicting neonatal retinopathy, comprising the following steps: S1. Collect multimodal data of newborns within 72 hours after birth. The multimodal data includes clinical basic data, fundus imaging data, and serum biomarker data.

[0024] Clinical baseline data: extracted through the electronic medical record system, including: Demographic characteristics: corrected gestational age (accurate to the day), birth weight (accurate to the gram), sex; Clinical indicators: Apgar score (1 minute / 5 minutes), duration of oxygen therapy (cumulative hours), frequency of arterial oxygen saturation fluctuations (number of fluctuations >5% within 24 hours), and whether there is concurrent acute respiratory distress syndrome (RDS).

[0025] Fundus imaging data: A non-contact, non-mydriatic fundus screening instrument (Canon CR-2) was used to acquire images of the posterior pole retina of both eyes (resolution 1920×1080 pixels), avoiding contact with the eyes during acquisition (acquisition time <5 seconds).

[0026] Serum biomarker data: Heel blood was collected (blood volume < 0.5 mL), and the concentrations of the following indicators were measured by enzyme-linked immunosorbent assay (ELISA): angiogenic factors: vascular endothelial growth factor (VEGF), insulin-like growth factor-1 (IGF-1); inflammatory factors: interleukin-6 (IL-6), tumor necrosis factor-α (TNF-α).

[0027] S2. Preprocess and extract features from the multimodal data, and standardize all data to the [0, 1] interval.

[0028] Clinical data preprocessing: Missing value handling: Stratified interpolation was used to group the data by corrected gestational age (one layer per 2 weeks) and birth weight (one layer per 500 g), and the mean within each group was used to fill in the missing values; Feature encoding: Categorical variables are converted into one-hot encodings (1 = condition present, 0 = condition absent), and continuous variables retain their original values.
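The stratified imputation in [0028] can be sketched as follows. This is a minimal illustration, not the applicant's code: the field names (`ga_weeks`, `bw_grams`, `spo2_fluct`), the alignment of the layers at multiples of 2 weeks and 500 g, and the fallback to the overall mean for an empty layer are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def stratum(ga_weeks, bw_grams):
    """Return the (gestational-age, birth-weight) layer for one infant."""
    # 2-week gestational-age layers and 500 g birth-weight layers, as in [0028].
    # Assumption: layers start at multiples of 2 weeks and 500 g.
    return (int(ga_weeks // 2), int(bw_grams // 500))


def impute_by_stratum(records, field):
    """Fill missing `field` values (None) with the mean of the same layer."""
    observed = defaultdict(list)
    for r in records:
        if r[field] is not None:
            observed[stratum(r["ga_weeks"], r["bw_grams"])].append(r[field])
    all_values = [v for values in observed.values() for v in values]

    filled = []
    for r in records:
        r = dict(r)  # copy so the caller's records are left untouched
        if r[field] is None:
            group = observed.get(stratum(r["ga_weeks"], r["bw_grams"]))
            # Assumption: fall back to the overall mean if the layer is empty.
            r[field] = mean(group) if group else mean(all_values)
        filled.append(r)
    return filled


# Hypothetical records; the third infant lacks a fluctuation count.
records = [
    {"ga_weeks": 28.4, "bw_grams": 1100, "spo2_fluct": 6},
    {"ga_weeks": 29.0, "bw_grams": 1250, "spo2_fluct": 10},
    {"ga_weeks": 28.9, "bw_grams": 1400, "spo2_fluct": None},
    {"ga_weeks": 31.0, "bw_grams": 1600, "spo2_fluct": 2},
]
filled = impute_by_stratum(records, "spo2_fluct")
```

Here the missing value falls in the same layer as the first two infants, so it is filled with the mean of their two values.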

[0029] Fundus image feature extraction: Fundus image enhancement: Adaptive histogram equalization is used to enhance the contrast between blood vessels and the background, and Gaussian filtering is used to remove noise; Structural feature extraction: Retinal vascular regions were extracted based on the U-Net segmentation model, and the following were calculated: vascular branch density (number of vascular branch points per unit area, branches/mm²), coefficient of variation of blood vessel diameter (standard deviation of diameter / mean), and uniformity of gray value in the macular region (standard deviation of gray value of pixels in the macular region).
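The three structural features defined in [0029] reduce to simple statistics once the vessel segmentation is available. The sketch below assumes the branch-point count, imaged area, vessel diameters, and macular gray values have already been measured upstream. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def branch_density(n_branch_points, area_mm2):
    """Vascular branch points per unit area (branches/mm^2)."""
    return n_branch_points / area_mm2


def diameter_cv(diameters):
    """Coefficient of variation of vessel diameter: std / mean."""
    # Assumption: population standard deviation.
    return pstdev(diameters) / mean(diameters)


def macular_uniformity(gray_values):
    """Standard deviation of gray values in the macular region."""
    return pstdev(gray_values)


# Hypothetical measurements for one eye.
density = branch_density(42, 12.0)
cv = diameter_cv([8.0, 10.0, 12.0])
uniformity = macular_uniformity([100, 100, 100])
```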

[0030] Serum biomarker pretreatment: Serum biomarker concentration calibration: Correct the original absorbance value according to the standard curve of the test kit to obtain the actual concentration (pg / mL). Time series processing: If there are at least three tests (24h, 48h, and 72h after birth), construct time series data (time step 24h), and use linear interpolation to fill in missing time points.
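The time-series step in [0030] can be illustrated with a small interpolation routine over the 24 h, 48 h, and 72 h measurements. This is a sketch under stated assumptions: missing values are represented as `None`, the time points are equally spaced, and a gap at either end of the series is filled with the nearest measured value (the text only mentions linear interpolation, so this end-point rule is a choice made here).

```python
def fill_missing(series):
    """Linearly interpolate None entries in an equally spaced series."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("no measurements available")

    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]  # assumption: hold the nearest value
        elif right is None:
            out[i] = series[left]   # assumption: hold the nearest value
        else:
            t = (i - left) / (right - left)
            out[i] = series[left] + t * (series[right] - series[left])
    return out


# Hypothetical VEGF concentrations (pg/mL) at 24 h, 48 h, 72 h.
vegf = fill_missing([120.0, None, 200.0])
```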

[0031] Data standardization: The formula for mapping the clinical, imaging, and serum data to the [0, 1] interval is as follows: $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature in the training set.
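The key point of [0031] is that the minimum and maximum come from the training set and are then reused for new cases. A minimal sketch follows. Clipping values that fall outside the training range and mapping a constant feature to 0 are assumptions added here so that the output always stays within [0, 1]; the function names are hypothetical.

```python
def fit_min_max(train_values):
    """Return (x_min, x_max) computed on the training set only."""
    return min(train_values), max(train_values)


def min_max_scale(x, x_min, x_max):
    """Map x to [0, 1] using training-set statistics."""
    if x_max == x_min:
        return 0.0  # assumption: a constant feature maps to 0
    scaled = (x - x_min) / (x_max - x_min)
    # Assumption: clip new values that fall outside the training range.
    return min(1.0, max(0.0, scaled))


# Hypothetical training values for one feature (e.g. VEGF in pg/mL).
x_min, x_max = fit_min_max([20.0, 185.0, 350.0])
mid_value = min_max_scale(185.0, x_min, x_max)
out_of_range = min_max_scale(400.0, x_min, x_max)
```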

[0032] S3. Construct a multimodal fusion deep learning model that includes three feature extraction branches, a cross-modal attention fusion layer, and a risk classification layer.

[0033] The three feature extraction branches are as follows: Clinical data feature extraction network: A fully connected neural network (FCN) is used, with the following structure: input layer (8-dimensional clinical features) → hidden layer 1 (64 neurons, ReLU activation) → hidden layer 2 (128 neurons, ReLU activation) → output layer (128-dimensional clinical feature vector). Its function is to capture the linear and nonlinear correlation between gestational age, oxygen therapy history, and ROP.

[0034] Fundus image feature extraction network: based on a lightweight convolutional neural network (MobileNetV3-small), with the following structure: Input layer (3-channel fundus image, 224×224 pixels) → 16 depthwise separable convolutional layers (including attention module) → global average pooling layer → output layer (256-dimensional image feature vector). Its function is to extract early morphological features such as abnormal vascular branching and structural changes in the macular region.

[0035] Serum biomarker feature extraction network: A gated recurrent unit (GRU) is used, with the following structure: Input layer (4-dimensional biomarker time series data, time step 3) → GRU layer (64 hidden units, returned sequence) → fully connected layer (128 neurons, ReLU activation) → output layer (128-dimensional serum feature vector). Its function is to model the dynamic trends of indicators such as VEGF and IL-6 (e.g., a sudden increase / decrease in VEGF within 72 hours).

[0036] The cross-modal attention fusion layer calculates the attention weights for the three branch feature vectors, specifically as follows: the clinical ($F_1$), image ($F_2$), and serum ($F_3$) feature vectors are dimensionally aligned (all 128-dimensional; the image features are compressed using a fully connected layer). The importance weight $\alpha_i$ of each modality is then calculated (the weight formula is not reproduced in this text), where $\alpha_i$ is the weight coefficient, $F_i$ and $F_j$ are the feature vectors of the $i$-th and $j$-th modalities, the core feature vector $\bar{F} = (F_1 + F_2 + F_3)/3$ is the mean of the three modality feature vectors, and $S$ is the similarity function. The fused feature vector is generated as $F_{\text{fused}} = \alpha_1 F_1 + \alpha_2 F_2 + \alpha_3 F_3$.
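One way to realize the fusion in [0036] is sketched below. The text gives the mean feature vector, a similarity function S, and the weighted sum, but the weight formula itself is not available, so two choices here are assumptions rather than the patented formula: cosine similarity for S, and a softmax over the similarities to obtain weights that sum to 1. The toy vectors are 2-dimensional for readability; the embodiment uses 128 dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity (assumed choice for the similarity function S)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fuse(features):
    """Weight each modality by its similarity to the mean vector, then sum."""
    n, dim = len(features), len(features[0])
    mean_vec = [sum(f[d] for f in features) / n for d in range(dim)]

    scores = [cosine(f, mean_vec) for f in features]
    # Assumption: softmax normalization of the similarity scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, fused


# Toy clinical, image, and serum vectors (already dimension-aligned).
weights, fused = fuse([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

In this toy case the third vector is least similar to the mean, so it receives the smallest weight.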

[0037] The risk classification layer uses a two-layer fully connected network with the following structure: fused features (128 dimensions) → hidden layer (64 neurons, ReLU activation) → output layer (3 neurons, softmax activation). The output results correspond to the ROP risk level: Category 0: No risk (ROP stage < 1); Category 1: Low risk (ROP stages 1-2); Category 2: High risk (ROP stage ≥ 3, requiring clinical intervention).
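The stage-to-category mapping in [0037] determines the training labels. The sketch below applies it to the case counts reported for this embodiment (1,050 negative, 120 stage 1, 180 stage 2, 150 stage 3 and above). Encoding negative cases as stage 0 and "stage 3 and above" as 3 is an assumption made for illustration.

```python
from collections import Counter


def stage_to_class(stage):
    """Map an ROP stage to the risk category used as the training label."""
    if stage < 1:
        return 0  # no risk
    if stage <= 2:
        return 1  # low risk (stages 1-2)
    return 2      # high risk (stage >= 3)


# Assumption: negatives are coded as stage 0, "stage 3 and above" as 3.
stages = [0] * 1050 + [1] * 120 + [2] * 180 + [3] * 150
class_counts = Counter(stage_to_class(s) for s in stages)
```

The resulting 1050 : 300 : 150 split shows the class imbalance that the oversampling and class weights in S4 are meant to address.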

[0038] S4. A multimodal fusion deep learning model is trained using multi-center labeled data, and the optimal prediction model is obtained through data augmentation and weighted cross-entropy loss function optimization.

[0039] First, we constructed a dataset: we collected samples from the NICU of a hospital and divided them into a training set, a validation set, and a test set in a ratio of 7:2:1 (sample inclusion criteria: corrected gestational age <32 weeks or birth weight <1500g, and no congenital eye malformations).
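The 7:2:1 division in [0039] can be sketched as a shuffled index split. The text does not say whether the split is stratified by class or by center, so a plain random split with a fixed seed is assumed here; the function name and seed are hypothetical.

```python
import random


def split_indices(n, ratios=(7, 2, 1), seed=0):
    """Shuffle indices 0..n-1 and split them into train/val/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    # The test set takes whatever remains after train and validation.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# With the 1,500 cases of this embodiment.
train_idx, val_idx, test_idx = split_indices(1500)
```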

[0040] Data augmentation: Image enhancement: Randomly rotate (±15°), adjust brightness (±20%), and horizontally flip the fundus image to expand the image sample size; Class balancing: Oversampling + weighted sampling is used to oversample high-risk samples (class 2) to a ratio of 1:1 with low-risk samples (class 1). During training, sampling is performed according to class weights (class 0: class 1: class 2 = 1:2:3).
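The class-balancing step in [0040] can be illustrated by random oversampling with replacement until the high-risk class matches the low-risk class. This is a sketch, not the applicant's procedure: drawing duplicates uniformly at random and the fixed seed are assumptions, and the separate weighted sampling at 1:2:3 is not shown.

```python
import random
from collections import Counter


def oversample_to_match(samples, labels, minority=2, reference=1, seed=0):
    """Duplicate random minority samples until they match the reference class."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_reference = sum(1 for y in labels if y == reference)
    n_extra = max(0, n_reference - len(minority_idx))
    # Draw with replacement; nothing is added if the minority class is empty.
    extra = [rng.choice(minority_idx) for _ in range(n_extra)] if minority_idx else []
    idx = list(range(len(labels))) + extra
    return [samples[i] for i in idx], [labels[i] for i in idx]


# Toy training set: 7 no-risk, 4 low-risk, 2 high-risk cases.
labels = [0] * 7 + [1] * 4 + [2] * 2
samples = [f"case_{i}" for i in range(len(labels))]
new_samples, new_labels = oversample_to_match(samples, labels)
balanced = Counter(new_labels)
```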

[0041] The specific parameter settings during the optimization process of the weighted cross-entropy loss function are as follows: Optimizer: Adam, initial learning rate 0.001, decaying by 10% every 5 epochs; Loss function: weighted cross-entropy loss, with the formula $L = -\sum_{i}\sum_{c} w_c \, y_{i,c} \log \hat{y}_{i,c}$, where $L$ is the weighted cross-entropy loss of the model over a batch of samples, $w_c$ is the class weight (Class 0 = 1, Class 1 = 2, Class 2 = 3), $y_{i,c}$ is the true label, and $\hat{y}_{i,c}$ is the predicted probability; Training termination condition: the validation set accuracy shows no improvement for 5 consecutive epochs, at which point the optimal model parameters are saved.
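The training settings in [0041] can be sketched in plain Python to make the arithmetic concrete. Several details are assumptions: the loss is summed (not averaged) over the batch, "decaying by 10%" is read as multiplying the learning rate by 0.9 at every fifth epoch with epochs counted from 0, and "no improvement" means the last 5 epochs never exceed the earlier best. The function names are hypothetical.

```python
import math

CLASS_WEIGHTS = (1.0, 2.0, 3.0)  # Class 0, Class 1, Class 2, as in [0041]


def weighted_cross_entropy(y_true, y_prob, weights=CLASS_WEIGHTS, eps=1e-12):
    """Weighted cross-entropy over a batch of one-hot labels."""
    loss = 0.0
    for labels, probs in zip(y_true, y_prob):
        for w, y, p in zip(weights, labels, probs):
            loss -= w * y * math.log(max(p, eps))  # eps guards against log(0)
    return loss


def learning_rate(epoch, initial=0.001, decay=0.9, every=5):
    """Step decay: multiply by 0.9 once every 5 epochs (epochs start at 0)."""
    return initial * decay ** (epoch // every)


def should_stop(val_accuracies, patience=5):
    """True when the last `patience` epochs did not beat the earlier best."""
    if len(val_accuracies) <= patience:
        return False
    best_before = max(val_accuracies[:-patience])
    return max(val_accuracies[-patience:]) <= best_before


# One high-risk sample predicted with probability 0.8.
loss = weighted_cross_entropy([[0, 0, 1]], [[0.1, 0.1, 0.8]])
```

Because the high-risk class carries weight 3, the same prediction error costs three times as much as it would for a no-risk sample.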

[0042] Model validation: Five-fold cross-validation was used to ensure the stability of the model on different center data.

[0043] S5. Input the preprocessed data of the newborns to be predicted into the optimal prediction model, and output the ROP risk level, confidence level and key influencing factors.

[0044] The output ROP risk level includes "no risk / low risk / high risk", corresponding to the ROP stage; the confidence level output is the probability value of the corresponding risk level (e.g., "high risk, confidence level 0.92"); key influencing factors are traced back through the attention mechanism, and the top 3 features that contribute to the prediction are output to help doctors understand the prediction logic.
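The output described in [0044] can be assembled from the softmax probabilities and a set of per-feature contribution scores. The text says the key factors are traced back through the attention mechanism but does not specify how the scores are computed, so the `contributions` dictionary below is a hypothetical input with made-up values.

```python
RISK_LEVELS = ("no risk", "low risk", "high risk")


def summarize_prediction(probs, contributions, top_k=3):
    """Return the risk level, its probability, and the top contributing features."""
    best = max(range(len(probs)), key=probs.__getitem__)
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return {
        "risk_level": RISK_LEVELS[best],
        "confidence": probs[best],
        "key_factors": ranked[:top_k],
    }


# Hypothetical model outputs for one newborn.
result = summarize_prediction(
    probs=[0.03, 0.05, 0.92],
    contributions={
        "VEGF trend": 0.31,
        "corrected gestational age": 0.27,
        "vascular branch density": 0.18,
        "IL-6": 0.09,
    },
)
```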

[0045] Based on the above method, this embodiment selected 1,500 cases of data from the NICU of 5 hospitals in China from 2021 to 2023, including 450 cases diagnosed with ROP (120 cases of stage 1, 180 cases of stage 2, and 150 cases of stage 3 and above), and 1,050 cases negative.

[0046] The clinical data completeness rate was 97.8%, with the missing value mainly being the blood oxygen fluctuation frequency (missing rate 3.2%), which was supplemented by interpolation with the same gestational age group; The pass rate for fundus images was 96.3%, and unqualified images (blurred / eyelid occlusion) were supplemented by re-acquiring; The pass rate for serum biomarker detection was 99.2%, with concentration ranges of VEGF (20-350 pg / mL), IGF-1 (10-180 ng / mL), IL-6 (5-200 pg / mL), and TNF-α (8-150 pg / mL).

[0047] Preprocessed data is stored in CSV format (clinical, serum) and PNG format (images), and automated processing is achieved using Python (Pandas, OpenCV library).

[0048] Model training and validation: The model is implemented based on the TensorFlow 2.10 framework, and the hardware environment is an NVIDIA A100 graphics card (40GB of video memory).

[0049] Training process: batch size=32, training for 50 epochs, the best accuracy on the validation set is 92.8% (32nd epoch).

[0050] The test-set results, compared with those of the traditional clinical model, are shown in Table 1.

Table 1. Comparison of data from this embodiment with traditional clinical models and single imaging models.

[0051] The results show that the method in this embodiment is significantly better than the traditional model and the single image model in terms of accuracy, sensitivity, specificity, and high-risk identification rate.

[0052] Therefore, the above-mentioned method for predicting neonatal retinopathy has the characteristics of being early and non-invasive, can achieve accurate prediction in multiple dimensions, and has strong clinical applicability, which helps to reduce the resource threshold for neonatal retinopathy screening.

[0053] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and these modifications or equivalent substitutions cannot cause the modified technical solutions to deviate from the spirit and scope of the technical solutions of the present invention.

Claims

1. A method for predicting neonatal retinopathy, characterized in that it includes the following steps:
S1. Collect multimodal data of newborns within 72 hours after birth. The multimodal data includes clinical basic data, fundus imaging data, and serum biomarker data.
S2. Perform preprocessing and feature extraction on the multimodal data, including handling missing values and feature encoding of clinical data, enhancement of fundus images and extraction of structural features, calibration of serum biomarker concentrations and time series processing, and standardize all data to the [0, 1] interval.
S3. Construct a multimodal fusion deep learning model that includes three feature extraction branches, a cross-modal attention fusion layer, and a risk classification layer.
S4. Train the multimodal fusion deep learning model using multi-center labeled data, and optimize it through data augmentation and a weighted cross-entropy loss function to obtain the optimal prediction model.
S5. Input the preprocessed data of the newborn to be predicted into the optimal prediction model, and output the ROP risk level, confidence level, and key influencing factors.

2. The method for predicting neonatal retinopathy according to claim 1, characterized in that: In S1, the basic clinical data are extracted through the electronic medical record system and include: demographic characteristics: corrected gestational age, birth weight, sex; clinical indicators: Apgar score, duration of oxygen therapy, frequency of fluctuations in arterial blood oxygen saturation, and whether there is concurrent acute respiratory distress syndrome. Fundus imaging data are collected using a non-contact, non-mydriatic fundus screening instrument to capture images of the posterior pole retina of both eyes, avoiding contact with the eyes during acquisition. Serum biomarker data are obtained from heel blood samples, and the concentrations of the following indicators are measured by enzyme-linked immunosorbent assay (ELISA): angiogenic factors: vascular endothelial growth factor, insulin-like growth factor-1; inflammatory factors: interleukin-6, tumor necrosis factor-α.

3. The method for predicting neonatal retinopathy according to claim 1, characterized in that: In S2, missing values in clinical data are handled by stratified interpolation, grouping by corrected gestational age and birth weight, and using the mean within each group to fill in the missing values; feature encoding converts categorical variables into one-hot codes, with 1 = condition present and 0 = condition absent, while continuous variables retain their original values. Fundus image enhancement employs adaptive histogram equalization to enhance the contrast between blood vessels and the background, and Gaussian filtering to remove noise. Structural feature extraction uses the U-Net segmentation model to extract the retinal vascular region and calculates: vascular branch density, coefficient of variation of vessel diameter, and uniformity of gray value in the macular region. Serum biomarker concentration calibration corrects the original absorbance values based on the standard curve of the test kit to obtain the actual concentration; during time series processing, if there are no fewer than three tests, time series data are constructed, and missing time points are filled in by linear interpolation.

4. The method for predicting neonatal retinopathy according to claim 3, characterized in that: The formula for mapping clinical, imaging, and serum data to the [0, 1] interval is as follows: $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$; where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature in the training set.

5. The method for predicting neonatal retinopathy according to claim 1, characterized in that: In S3, the three feature extraction branches are as follows: Clinical data feature extraction network: A fully connected neural network is used, with the following structure: input layer, containing 8-dimensional clinical features → hidden layer 1, containing 64 neurons → hidden layer 2, containing 128 neurons → output layer, containing 128-dimensional clinical feature vectors. Fundus image feature extraction network: Based on a lightweight convolutional neural network, the structure is as follows: Input layer, containing 3-channel fundus images, → 16 depthwise separable convolutional layers → global average pooling layer → output layer, containing 256-dimensional image feature vectors; Serum biomarker feature extraction network: Employing gated recurrent units, the structure is as follows: Input layer, containing 4-dimensional biomarker time series data → GRU layer, containing 64 hidden units → fully connected layer, containing 128 neurons → output layer, containing 128-dimensional serum feature vectors.

6. The method for predicting neonatal retinopathy according to claim 5, characterized in that: The cross-modal attention fusion layer calculates the attention weights for the three branch feature vectors, specifically as follows: the clinical ($F_1$), image ($F_2$), and serum ($F_3$) feature vectors are dimensionally aligned; the importance weight $\alpha_i$ of each modality is then calculated (the weight formula is not reproduced in this text), where $\alpha_i$ is the weight coefficient, $F_i$ and $F_j$ are the feature vectors of the $i$-th and $j$-th modalities, the core feature vector $\bar{F} = (F_1 + F_2 + F_3)/3$ is the mean of the three modality feature vectors, and $S$ is the similarity function; the fused feature vector is generated as $F_{\text{fused}} = \alpha_1 F_1 + \alpha_2 F_2 + \alpha_3 F_3$.

7. The method for predicting neonatal retinopathy according to claim 6, characterized in that: The risk classification layer uses a two-layer fully connected network with the following structure: fused features → hidden layer containing 64 neurons → output layer containing 3 neurons; The output results correspond to the ROP risk level: Category 0: No risk; Category 1: Low risk; Category 2: High risk.

8. The method for predicting neonatal retinopathy according to claim 1, characterized in that: In S4, the dataset is first constructed by collecting samples from hospital NICUs and dividing them into training, validation, and test sets in a ratio of 7:2:1. Data augmentation includes: Image enhancement: Randomly rotate, adjust brightness, and horizontally flip fundus images to expand the image sample size; Class balancing: Oversampling + weighted sampling is used. High-risk samples are oversampled to a 1:1 ratio with low-risk samples. During training, samples are sampled according to class weights.

9. A method for predicting neonatal retinopathy according to claim 8, characterized in that: The specific parameter settings during the optimization process of the weighted cross-entropy loss function are as follows: Optimizer: Adam, initial learning rate 0.001, decaying by 10% every 5 epochs; Loss function: weighted cross-entropy loss, with the formula $L = -\sum_{i}\sum_{c} w_c \, y_{i,c} \log \hat{y}_{i,c}$; where $L$ is the weighted cross-entropy loss of the model over a batch of samples, $w_c$ is the weight of class $c$, $y_{i,c}$ is the true label, and $\hat{y}_{i,c}$ is the predicted probability; Training termination condition: the validation set accuracy shows no improvement for 5 consecutive epochs, at which point the optimal model parameters are saved. Model validation: five-fold cross-validation is used to ensure the stability of the model on data from different centers.

10. A method for predicting neonatal retinopathy according to claim 1, characterized in that: In S5, the output ROP risk level includes "no risk / low risk / high risk", corresponding to the ROP stage; the confidence level output is the probability value of the corresponding risk level; Key influencing factors are traced back through an attention mechanism to output the top 3 features that contribute to the prediction, helping doctors understand the prediction logic.