Alzheimer's disease image analysis system and analysis method based on reinforcement learning

By integrating multimodal longitudinal data through a reinforcement learning system, the limitations of existing methods for diagnosing and analyzing the prognosis of Alzheimer's disease have been overcome. This has enabled early, dynamic, and accurate diagnosis and personalized prognostic assessment of Alzheimer's disease, improving diagnostic accuracy and predictive precision.

CN122245696A · Pending · Publication Date: 2026-06-19 · ZHUNENG TECHNOLOGY (JIAXING) CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: ZHUNENG TECHNOLOGY (JIAXING) CO LTD
Filing Date: 2026-01-21
Publication Date: 2026-06-19

Application Information


IPC: G16H50/20; G16H50/30; G16H50/70; G16H30/40; G16H15/00; G06T7/00; G06V20/64; G06V10/62; G06V10/82; G06N3/0442; G06N3/045; G06N3/0464; G06N3/084; G06N3/092; G06N3/096

AI Tagging

Application Domain

Medical data mining; Image analysis


AI Technical Summary

Technical Problem

Current methods for diagnosing and predicting the prognosis of Alzheimer's disease mainly rely on single static imaging data, failing to effectively integrate dynamic information about the disease's evolution over time, as well as multimodal clinical biomarkers and cognitive score data, thus limiting diagnostic accuracy and prognostic precision.

Method used

We employ a reinforcement learning-based image analysis system for Alzheimer's disease. By deeply fusing multimodal longitudinal data and leveraging the decision optimization capabilities of reinforcement learning, we integrate multi-sequence magnetic resonance imaging, positron emission tomography, cognitive function scale scores, and cerebrospinal fluid biomarker data to capture the dynamic patterns and multidimensional characteristics of the disease's evolution over time, enabling early, dynamic, and accurate diagnosis and personalized prognostic assessment.

Benefits of technology

It improves the comprehensiveness and accuracy of Alzheimer's disease diagnosis, enabling earlier identification of individuals in the preclinical stage or mild cognitive impairment stage, providing high-precision risk prediction and diagnostic support, and maintaining the advanced performance and robustness of the system through continuous learning and knowledge updates.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of medical image processing technology, specifically disclosing an image analysis system and method for Alzheimer's disease based on reinforcement learning. This system achieves early, dynamic, and accurate diagnosis and personalized prognostic assessment of Alzheimer's disease by deeply fusing multimodal longitudinal data and utilizing the decision optimization capabilities of reinforcement learning, thereby improving the accuracy of diagnosis and prognostic prediction. The system innovatively integrates multimodal data such as multi-sequence magnetic resonance imaging, positron emission tomography (PET), cognitive function scale scores, and cerebrospinal fluid biomarkers, and particularly emphasizes longitudinal time-series analysis of these data, fundamentally overcoming the limitations of existing methods that rely solely on single static image analysis. By capturing the dynamic patterns and multidimensional characteristics of disease evolution over time, it improves the comprehensiveness and accuracy of Alzheimer's disease diagnosis and lays a solid foundation for accurate prognostic assessment.


Description

Technical Field

[0001] This invention belongs to the field of medical image processing, specifically relating to an image analysis system and method for Alzheimer's disease based on reinforcement learning.

Background Technology

[0002] Neurodegenerative diseases, particularly Alzheimer's disease (AD), have become a major global health challenge. Early diagnosis, disease progression assessment, and prognostic prediction are crucial for clinical intervention and patient management. Medical imaging technologies, such as magnetic resonance imaging (MRI) and positron emission tomography (PET), play a central role in detecting AD-related structural and functional changes in the brain, providing important evidence for non-invasive disease assessment. However, due to the complexity of the pathophysiological mechanisms of AD and its heterogeneity among individuals, relying solely on traditional image interpretation is insufficient for achieving high-precision, early, and comprehensive disease analysis.

[0003] Among these, medical image analysis methods based on machine learning and deep learning offer new avenues for overcoming the limitations of traditional interpretation. These methods aim to assist or even automate disease detection, classification, and prognosis by learning patterns from large amounts of image data. However, the diagnosis and progression assessment of Alzheimer's disease is a complex process involving multimodal information and time-dependent characteristics. Analyzing only static images makes it difficult to capture the dynamic evolution of the disease from mild cognitive impairment (MCI) to AD. Furthermore, existing analytical models still face challenges in integrating heterogeneous clinical data, such as cognitive scores, genetic information, and cerebrospinal fluid biomarkers.

[0004] Current technologies applied to image analysis for Alzheimer's disease generally suffer from several shortcomings. First, existing methods are mostly based on single MRI or PET scans, neglecting crucial information over time and failing to effectively capture the dynamic progression of the disease, which is essential for early diagnosis and treatment evaluation. Second, rich clinical data such as patient cognitive scores and cerebrospinal fluid biomarkers are often not effectively integrated into image analysis models, preventing the models from fully utilizing multi-dimensional information for more comprehensive and accurate judgments. These deficiencies result in significant limitations in the accuracy, robustness, and generalization ability of existing systems when facing early diagnosis, disease progression prediction, and personalized treatment planning for AD.

Summary of the Invention

[0005] The purpose of this invention is to address the problem that existing methods for diagnosing and predicting the prognosis of Alzheimer's disease mainly rely on single static image data, failing to effectively integrate dynamic information about the disease's evolution over time, as well as multimodal clinical biomarkers and cognitive score data, which limits the accuracy of diagnosis and the precision of prognosis prediction. This invention provides an Alzheimer's disease image analysis system and analysis method based on reinforcement learning. By deeply fusing multimodal longitudinal data and utilizing the decision optimization capabilities of reinforcement learning, it enables early, dynamic, and accurate diagnosis and personalized prognostic assessment of Alzheimer's disease.

[0006] This invention provides an image analysis system for Alzheimer's disease based on reinforcement learning, comprising: a data acquisition and preprocessing unit, a multimodal feature extraction unit, a temporal state representation unit, a reinforcement learning decision-making unit, a decision output and evaluation unit, and a continuous learning and knowledge updating unit.

[0007] The data acquisition and preprocessing unit is used to acquire patients' multi-sequence magnetic resonance imaging data, positron emission tomography data, cognitive function scale scores, and cerebrospinal fluid biomarker data, and to perform preprocessing operations such as image registration, standardization, noise reduction, and normalization on the acquired data.

[0008] The multimodal feature extraction unit is connected to the data acquisition and preprocessing unit and is used to perform deep learning feature extraction on the preprocessed magnetic resonance imaging data and positron emission tomography data, and to perform statistical feature extraction on the cognitive function scale score data and cerebrospinal fluid biomarker data.

[0009] The temporal state representation unit is connected to the multimodal feature extraction unit and is used to integrate the extracted multimodal features into a temporal state vector containing the patient's current disease state and historical evolution trajectory.

[0010] The reinforcement learning decision unit, connected to the temporal state representation unit, includes a policy network and a value network. The policy network outputs diagnostic or prognostic decision actions based on the temporal state vector. The value network evaluates the potential value of the temporal state vector. The reinforcement learning decision unit receives reward signals by interacting with a preset environment and updates the parameters of the policy network and the value network based on the reward signals, thereby optimizing the decision strategy.

[0011] The decision output and evaluation unit is connected to the reinforcement learning decision unit and is used to generate a diagnostic report, a disease progression risk prediction report, or personalized treatment suggestions for Alzheimer's disease based on the optimized decision strategy output by the reinforcement learning decision unit. At the same time, it generates a reward signal and feeds it back to the reinforcement learning decision unit based on the actual clinical effect of the diagnostic report, risk prediction report, or treatment suggestions.

[0012] The continuous learning and knowledge update unit is connected to the decision output and evaluation unit. It is used to continuously receive new patient follow-up data and clinical treatment results, and use the new data to retrain and optimize the model of the reinforcement learning decision unit, so as to realize the dynamic update and knowledge accumulation of the system model.

[0013] This invention also provides a reinforcement learning-based image analysis method for Alzheimer's disease, comprising the following steps: Step 1: Data Acquisition and Preprocessing; Collect the patient's multi-sequence magnetic resonance imaging data, positron emission tomography data, cognitive function scale score data, and cerebrospinal fluid biomarker data, and perform preprocessing such as image registration, standardization, noise reduction, and normalization on the collected data.

[0014] Step 2: Multimodal feature extraction; Deep learning feature extraction is performed on the preprocessed magnetic resonance imaging data and positron emission tomography data, and statistical feature extraction is performed on the cognitive function scale score data and cerebrospinal fluid biomarker data.

[0015] Step 3: Temporal state representation; The extracted multimodal features are integrated into a temporal state vector containing the patient's current disease state and historical evolution trajectory.

[0016] Step 4: Reinforcement learning decision-making; A reinforcement learning model is used that includes a policy network and a value network, wherein the policy network outputs diagnostic or prognostic decision actions based on the temporal state vector, and the value network evaluates the potential value of the temporal state vector; the model interacts with a preset environment, receives reward signals, and updates the parameters of the policy network and the value network based on the reward signals, thereby optimizing the decision-making strategy.

[0017] Step 5: Decision Output and Evaluation; Based on the optimized decision strategy output by the reinforcement learning decision model, generate a diagnosis report, a disease progression risk prediction report, or personalized treatment recommendations for Alzheimer's disease. At the same time, based on the actual clinical effects of the diagnosis report, risk prediction report, or treatment recommendations, generate a reward signal to feed back to the reinforcement learning model.

[0018] Step 6: Continuous learning and knowledge updating; continuously receive new patient follow-up data and clinical treatment results, and use the new data to retrain and optimize the reinforcement learning model to achieve dynamic model updates and knowledge accumulation.

[0019] In a preferred embodiment of the present invention, the magnetic resonance imaging data acquired by the data acquisition and preprocessing unit includes T1-weighted images, T2-weighted images, FLAIR sequence images, and diffusion tensor imaging data. The positron emission tomography (PET) data includes 18F-FDG PET data and amyloid PET data. The cognitive function scale score data includes the Mini-Mental State Examination (MMSE) score and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) score. The cerebrospinal fluid biomarker data includes the concentrations of amyloid-β 42 (Aβ42), total Tau protein (T-Tau), and phosphorylated Tau protein (P-Tau).

[0020] Furthermore, the preprocessing operations performed by the data acquisition and preprocessing unit include: performing skull stripping on the magnetic resonance imaging data and positron emission tomography data; using an image registration algorithm based on affine transformation and a nonlinear deformation field to register all image data to a unified standard brain space; and applying intensity normalization and Gaussian smoothing filtering to the registered image data for denoising.

[0021] In a preferred embodiment of the present invention, the multimodal feature extraction unit extracts features in the following manner: Deep features are extracted from the magnetic resonance imaging (MRI) data and positron emission tomography (PET) data using a 3D convolutional neural network (CNN) model. This 3D CNN model comprises multiple convolutional layers, activation layers, pooling layers, and fully connected layers, aiming to learn disease-related three-dimensional spatial patterns from the images. The deep features include, but are not limited to, latent representations of quantitative indicators such as the volume, cortical thickness, gray matter density, and white matter integrity of brain regions such as the hippocampus, entorhinal cortex, and amygdala.

[0022] The raw scores, the rate of change during the follow-up period, and the Z-scores relative to age-matched healthy individuals are extracted from the cognitive function scale scores.

[0023] Statistical features such as absolute concentration values, ratios, and trends of change during the follow-up period are extracted from the cerebrospinal fluid biomarker data.
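The statistical features named in [0022] and [0023] can be illustrated with a minimal sketch. The function names, the reference mean and standard deviation, and the sample values below are hypothetical and do not come from the patent; the rate of change is computed here as a least-squares slope, which is one plausible reading of "rate of change during the follow-up period".

```python
from typing import Sequence


def rate_of_change(times_years: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values over time (units per year)."""
    n = len(values)
    if n < 2 or n != len(times_years):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times_years) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_years, values))
    den = sum((t - mean_t) ** 2 for t in times_years)
    if den == 0:
        raise ValueError("all time points are identical")
    return num / den


def z_score(value: float, ref_mean: float, ref_std: float) -> float:
    """Z-score relative to an age-matched reference group (assumed to be known)."""
    if ref_std <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (value - ref_mean) / ref_std


def biomarker_ratio(numerator: float, denominator: float) -> float:
    """Ratio of two concentrations, for example P-Tau / Abeta42."""
    if denominator == 0:
        raise ValueError("denominator concentration is zero")
    return numerator / denominator


# Hypothetical follow-up: MMSE at baseline, year 1 and year 2.
mmse_slope = rate_of_change([0.0, 1.0, 2.0], [28.0, 26.0, 24.0])
mmse_z = z_score(24.0, ref_mean=28.0, ref_std=2.0)
ptau_abeta_ratio = biomarker_ratio(60.0, 600.0)
```

The same `rate_of_change` helper could serve for the biomarker "trends of change", since both are series of timestamped scalar values.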

[0024] In a preferred embodiment of the present invention, the temporal state representation unit employs a recurrent neural network model, such as a long short-term memory (LSTM) network or a gated recurrent unit (GRU), to sequentially encode the features output by the multimodal feature extraction unit, thereby generating a temporal state vector capable of capturing time dependence and the disease progression trajectory. At each time step, the temporal state vector aggregates the multimodal feature information of the current moment with the feature evolution information of historical moments.
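How a recurrent model folds successive visits into one state vector can be sketched with a small gated recurrent unit written in plain Python. This is an illustration under stated assumptions only: the weights are random placeholders rather than trained values, bias terms are omitted for brevity, and the class name `GRUCell`, the helper `encode_visits`, and the three-element visit vectors are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell; weights are random placeholders, biases omitted."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_z, self.u_z = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.w_r, self.u_r = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.w_h, self.u_h = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)

    def step(self, x: Vector, h: Vector) -> Vector:
        # Update gate z and reset gate r decide how much history is kept.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.w_z, x), matvec(self.u_z, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.w_r, x), matvec(self.u_r, h))]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [
            math.tanh(a + b)
            for a, b in zip(matvec(self.w_h, x), matvec(self.u_h, gated_h))
        ]
        # New state blends the previous state with the candidate state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode_visits(cell: GRUCell, visits: List[Vector]) -> Vector:
    """Run the cell over visits in chronological order; return the final state."""
    h = [0.0] * cell.hidden_dim
    for features in visits:
        h = cell.step(features, h)
    return h


# Three hypothetical visits, each already reduced to a 3-element feature vector.
visits = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4], [0.6, 0.4, 0.5]]
state = encode_visits(GRUCell(input_dim=3, hidden_dim=4), visits)
```

The final `state` plays the role of the temporal state vector: it depends on the latest visit and, through the recurrence, on all earlier ones.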

[0025] In a preferred embodiment of the present invention, the reinforcement learning decision unit employs a deep Q-network (DQN), an advantage actor-critic algorithm, or a proximal policy optimization (PPO) algorithm as its core reinforcement learning algorithm.

[0026] The policy network is composed of a multi-layer feedforward neural network. The input is the temporal state vector, and the output is a discrete probability distribution of decision actions. The decision actions include "diagnosing Alzheimer's disease", "diagnosing mild cognitive impairment", "diagnosing normal cognition", "predicting the risk level of disease progression to Alzheimer's disease", and "recommending specific drug interventions".

[0027] The value network consists of a multi-layer feedforward neural network, with the temporal state vector as input and the expected cumulative reward of the current state as output.
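The input and output shapes of the two networks can be shown with a minimal sketch, assuming each is a two-layer feedforward network. The weights are random placeholders, the `MLP` class and the action labels in `ACTIONS` are hypothetical identifiers chosen to mirror the five decision actions listed above, and the four-element state is invented for the example.

```python
import math
import random
from typing import List

# Hypothetical labels for the five decision actions named in the text.
ACTIONS = [
    "diagnose_alzheimers_disease",
    "diagnose_mild_cognitive_impairment",
    "diagnose_normal_cognition",
    "predict_progression_risk_level",
    "recommend_drug_intervention",
]


class MLP:
    """Two-layer feedforward network with random placeholder weights."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(out_dim)]
        self.b2 = [0.0] * out_dim

    def forward(self, x: List[float]) -> List[float]:
        # Hidden layer with ReLU, followed by a linear output layer.
        hidden = [
            max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        return [sum(w * v for w, v in zip(row, hidden)) + b for row, b in zip(self.w2, self.b2)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax producing a discrete probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


state = [0.2, -0.1, 0.4, 0.0]  # stand-in for a temporal state vector
policy_net = MLP(in_dim=4, hidden_dim=8, out_dim=len(ACTIONS), seed=1)
value_net = MLP(in_dim=4, hidden_dim=8, out_dim=1, seed=2)

action_probs = softmax(policy_net.forward(state))  # one probability per action
state_value = value_net.forward(state)[0]          # scalar value estimate
best_action = ACTIONS[action_probs.index(max(action_probs))]
```

With untrained weights the outputs carry no clinical meaning; the sketch only shows that the policy head yields a distribution over actions while the value head yields a single scalar.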

[0028] The preset environment consists of desensitized clinical data from historical patients, simulating the natural evolution of the patient's disease and the possible clinical outcomes of different decision-making actions.

[0029] The reward signal is designed as follows: a positive reward is given when the diagnostic or prognostic result output by the policy network is consistent with the patient's actual clinical follow-up results; a negative reward is given when the results are inconsistent. The reward signal may also include additional positive incentives for early diagnosis and effective intervention. The reinforcement learning decision unit updates the weight parameters of the policy network and the value network by maximizing the expected cumulative reward.
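One possible shape for such a reward, together with the discounted cumulative reward that the agent would try to maximize, is sketched below. The magnitudes (+1, -1), the per-year early-diagnosis bonus, and the discount factor `gamma` are assumptions made for illustration; the text does not specify numeric values.

```python
from typing import List


def diagnostic_reward(
    predicted: str,
    confirmed: str,
    years_early: float = 0.0,
    early_bonus_per_year: float = 0.2,
) -> float:
    """Positive reward for agreement with follow-up, negative otherwise.

    years_early is how long before clinical confirmation the correct call
    was made; it earns an extra bonus (hypothetical weighting).
    """
    if predicted != confirmed:
        return -1.0
    return 1.0 + early_bonus_per_year * max(0.0, years_early)


def discounted_return(rewards: List[float], gamma: float = 0.99) -> float:
    """Cumulative reward with discount factor gamma, accumulated from the end."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: two correct calls (one made two years early), one miss.
episode = [
    diagnostic_reward("MCI", "MCI", years_early=2.0),
    diagnostic_reward("AD", "AD"),
    diagnostic_reward("normal", "MCI"),
]
episode_return = discounted_return(episode, gamma=0.9)
```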

[0030] In a preferred embodiment of the present invention, the diagnostic report generated by the decision output and evaluation unit includes the diagnostic category, diagnostic confidence level, and key imaging and biomarker evidence. The disease progression risk prediction report includes the probability of progression to Alzheimer's disease within the next 1, 3, or 5 years, risk level classification, and key predictive factors. The personalized treatment recommendations are based on the patient's genotype, disease stage, and biomarker characteristics. The reward signal is generated specifically by comparing the diagnostic report or risk prediction report with the patient's subsequent diagnosis or actual disease progression path.

[0031] In a preferred embodiment of the present invention, the continuous learning and knowledge update unit employs a strategy combining online learning and offline retraining. Online learning updates model parameters in small batches to adapt to new patient data streams. Offline retraining periodically utilizes larger-scale accumulated historical data for comprehensive model optimization and validation, ensuring stable improvement in model performance.
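The combination of small-batch online updates and periodic offline retraining can be pictured as a simple scheduling rule. In the sketch below the class `ContinualUpdater`, its parameters, and the sample format are hypothetical, and the recorded event strings stand in for real training calls, which are outside the scope of this illustration.

```python
from typing import List, Tuple

# (state vector, observed reward); a hypothetical sample format.
Sample = Tuple[List[float], float]


class ContinualUpdater:
    """Triggers an online update per mini-batch and a periodic offline retrain."""

    def __init__(self, batch_size: int = 4, retrain_every: int = 12) -> None:
        self.batch_size = batch_size
        self.retrain_every = retrain_every
        self.pending: List[Sample] = []   # waiting for the next online update
        self.history: List[Sample] = []   # everything seen, for offline retraining
        self.events: List[str] = []       # placeholder for real training calls

    def add(self, sample: Sample) -> None:
        self.pending.append(sample)
        self.history.append(sample)
        if len(self.pending) >= self.batch_size:
            self.events.append(f"online_update({len(self.pending)})")
            self.pending.clear()
        if len(self.history) % self.retrain_every == 0:
            self.events.append(f"offline_retrain({len(self.history)})")


updater = ContinualUpdater(batch_size=4, retrain_every=12)
for i in range(12):
    updater.add(([float(i)], 1.0))
```

After twelve samples, `updater.events` holds three online updates followed by one offline retrain over the full history.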

[0032] In a preferred embodiment of the present invention, the system is deployed on a distributed computing cluster or cloud computing platform, utilizing parallel computing capabilities to process large-scale multimodal data and accelerate the training process of reinforcement learning models. The system provides an application programming interface (API) or a graphical user interface (GUI) to enable clinicians or researchers to perform interactive queries and analyses.

[0033] Compared with the prior art, the advantages and positive effects of the present invention are as follows: 1. This invention innovatively integrates multimodal data from multiple sequences of magnetic resonance imaging (MRI), positron emission tomography (PET), cognitive function scale scores, and cerebrospinal fluid biomarkers. It particularly emphasizes longitudinal time-series analysis of this data, fundamentally overcoming the limitations of existing methods that rely solely on single static image analysis. By capturing the dynamic patterns and multidimensional characteristics of disease evolution over time, it improves the comprehensiveness and accuracy of Alzheimer's disease diagnosis and lays a solid foundation for precise prognostic assessment.

[0034] 2. This invention introduces a reinforcement learning paradigm, enabling the system to transcend the static classification limitations of traditional supervised learning. It actively interacts with the simulated disease environment and learns optimal diagnostic and prognostic decision-making strategies from the feedback. This dynamic learning mechanism allows the system to adaptively optimize its decision-making strategies based on the patient's individual disease trajectory and response patterns to interventions, thereby providing more personalized and forward-looking disease management solutions.

[0035] 3. By deeply fusing multimodal longitudinal data and leveraging the decision-making optimization capabilities of reinforcement learning, this invention can identify individuals in the preclinical stage or with mild cognitive impairment earlier, providing high-precision risk prediction and diagnostic support during the critical early intervention window. This has extremely important clinical value for delaying disease progression and improving patients' quality of life.

[0036] 4. This invention incorporates a continuous learning and knowledge update unit, enabling the system to continuously absorb new clinical follow-up data and treatment outcomes as reward signals, thereby continuously optimizing the decision-making strategy of the reinforcement learning model. This built-in adaptive and self-improving capability ensures that the system maintains its advanced and robust diagnostic and prognostic performance when faced with new medical discoveries, evolving diagnostic criteria, or updated treatment protocols.

[0037] 5. The diagnostic results output by this invention through the policy network include not only disease categories but also confidence levels and key imaging and biomarker evidence; prognostic predictions are refined to the risk probability and risk level within specific time windows, and personalized treatment recommendations are provided. This detailed, multi-layered output model enhances the transparency and reliability of clinical decision-making, assisting physicians in making more informed and evidence-based medical decisions.

Attached Figure Description

[0038] Figure 1 is a schematic diagram of the overall technical architecture of the Alzheimer's disease image analysis system based on reinforcement learning proposed in this invention; Figure 2 is a schematic diagram of the core principle framework of reinforcement learning decision-making in this invention; Figure 3 is a logical flowchart of the multimodal longitudinal data processing and temporal state representation in this invention; Figure 4 is a schematic diagram of the interaction relationship and data flow between decision output, evaluation, and continuous learning in this invention; Figure 5 is a schematic diagram comparing the core principles of this invention with existing technologies in early disease diagnosis and personalized prognostic assessment.

Detailed Implementation

[0039] This invention provides an image analysis system for Alzheimer's disease based on reinforcement learning; see Figure 1. The system aims to address the limitations of existing methods for diagnosing and predicting the prognosis of Alzheimer's disease, which primarily rely on single, static imaging data and fail to effectively integrate dynamic information about the disease's evolution over time, as well as multimodal clinical biomarkers and cognitive score data, resulting in limited diagnostic accuracy and prognostic precision. By deeply fusing multimodal longitudinal data and leveraging the decision-making optimization capabilities of reinforcement learning, the system achieves early, dynamic, and accurate diagnosis and personalized prognostic assessment of Alzheimer's disease. Deployed on a distributed computing cluster or cloud computing platform, the system utilizes parallel computing capabilities to process large-scale multimodal data and accelerate the training process of reinforcement learning models. It also provides an application programming interface (API) or a graphical user interface (GUI) for interactive querying and analysis by clinicians and researchers.

[0040] This system mainly includes a data acquisition and preprocessing unit 101, a multimodal feature extraction unit 102, a temporal state representation unit 103, a reinforcement learning decision-making unit 104, a decision output and evaluation unit 105, and a continuous learning and knowledge update unit 106. These units are logically connected and data flows through precisely defined data interfaces and communication protocols, forming a closed-loop learning and decision optimization system.

[0041] The data acquisition and preprocessing unit 101 is the first component of this system. Its core function is to collect patients' multimodal clinical data and perform a series of standardized processes to ensure the accuracy and consistency of subsequent analyses. This unit interfaces with medical information systems, picture archiving and communication systems (PACS), and laboratory information management systems (LIMS) through an integrated application programming interface, thereby achieving automated data acquisition.

[0042] The data acquisition and preprocessing unit 101 can be further divided into multiple sub-modules, including a data acquisition module, an image preprocessing module, a physiological and cognitive data processing module, and a unified data storage module.

[0043] The data acquisition module is responsible for collecting raw patient data from multiple sources. These data types include, but are not limited to: Multi-sequence magnetic resonance imaging (MRI) data: This includes T1-weighted images, T2-weighted images, FLAIR sequences, and diffusion tensor imaging data. These data are typically acquired from picture archiving and communication systems (PACS) in DICOM format and record information on brain structure, pathological changes, and the integrity of white matter fiber tracts.

[0044] Positron emission tomography (PET) data: primarily including 18F-FDG PET data and amyloid PET data. 18F-FDG PET reflects brain glucose metabolism, while amyloid PET directly reflects the deposition of amyloid plaques in the brain. These data are also acquired in DICOM format.

[0045] Cognitive function scale scores: obtained through standardized cognitive assessment scales, such as the Mini-Mental State Examination (MMSE) score and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) score. These data are typically obtained in structured tabular form from electronic health record systems or specialized cognitive assessment databases.

[0046] Cerebrospinal fluid biomarker data: including the concentrations of amyloid-β 42 (Aβ42), total Tau protein (T-Tau), and phosphorylated Tau protein (P-Tau). These data are obtained from laboratory test reports, typically as numerical values from a laboratory information management system.

[0047] The image preprocessing module performs several key preprocessing operations on the acquired magnetic resonance imaging (MRI) and positron emission tomography (PET) data. These operations aim to eliminate scanning artifacts, correct for motion, and spatially normalize the images, ensuring the comparability of image data across different time points and patients. The specific processing steps include: Skull stripping: Automated image processing algorithms are used to identify and remove non-brain tissue structures such as the skull and scalp, preserving only the brain parenchyma. This process utilizes morphological operations and threshold segmentation methods to ensure that subsequent analysis focuses on the brain tissue.

[0048] Image registration: An image registration algorithm based on affine transformation and a nonlinear deformation field is employed. All image data, including MRI and positron emission tomography (PET) data from different sequences, are registered to a unified standard brain space, such as the standard brain template of the Montreal Neurological Institute (MNI). Affine transformation is used to correct for large-scale translation, rotation, scaling, and shearing, while the nonlinear deformation field further corrects for fine anatomical differences, ensuring consistent correspondence at the pixel or voxel level. This step is achieved iteratively by optimizing image similarity metrics, such as mutual information or correlation coefficients.

[0049] Intensity normalization: The registered image data undergoes intensity normalization. This operation aims to eliminate differences in image intensity caused by different scanning devices, scanning parameters, or time points, making the image signal intensity comparable. Common methods include linear scaling to a specific range, such as 0 to 255 or 0 to 1, or normalization techniques based on tissue histograms.

[0050] Gaussian smoothing filtering for denoising: Gaussian smoothing filtering is applied to denoise the image data. This operation reduces image noise by convolving with a Gaussian kernel function while preserving important anatomical information. The standard deviation of the filter is adjusted according to the image resolution and noise level to balance denoising effectiveness and detail preservation.
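Three of the operations in the steps above lend themselves to a compact numerical sketch: applying an affine matrix to a voxel coordinate, linear intensity scaling to the range 0 to 1, and Gaussian smoothing. The sketch works on plain Python lists and a one-dimensional intensity profile to stay self-contained; all function names and sample values are hypothetical, the kernel radius of three standard deviations is an assumption, and skull stripping, the nonlinear deformation field, and the similarity-metric optimization are not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix4 = List[List[float]]


def scale_then_translate(scale: float, shift: Sequence[float]) -> Matrix4:
    """4x4 homogeneous affine matrix: isotropic scaling followed by translation."""
    tx, ty, tz = shift
    return [
        [scale, 0.0, 0.0, tx],
        [0.0, scale, 0.0, ty],
        [0.0, 0.0, scale, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_affine(matrix: Matrix4, point: Sequence[float]) -> Tuple[float, float, float]:
    """Map a 3D coordinate through a 4x4 homogeneous affine matrix."""
    vec = (point[0], point[1], point[2], 1.0)
    out = [sum(matrix[r][c] * vec[c] for c in range(4)) for r in range(3)]
    return (out[0], out[1], out[2])


def min_max_normalize(values: Sequence[float], low: float = 0.0, high: float = 1.0) -> List[float]:
    """Linearly rescale intensities to the range [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]
    return [low + (v - v_min) / (v_max - v_min) * (high - low) for v in values]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(signal: Sequence[float], sigma: float) -> List[float]:
    """Convolve with a Gaussian kernel, replicating edge values at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


mapped = apply_affine(scale_then_translate(2.0, (10.0, 0.0, 0.0)), (1.0, 1.0, 1.0))
normalized = min_max_normalize([10.0, 20.0, 30.0])
smoothed = smooth_1d([0.0, 0.0, 10.0, 0.0, 0.0], sigma=1.0)
```

Because a 3D Gaussian is separable, the same one-dimensional pass could be applied along each image axis in turn; a larger `sigma` removes more noise at the cost of fine detail, which is the trade-off the text describes.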

[0051] The physiological and cognitive data processing module is responsible for cleaning, handling missing values, and detecting outliers in cognitive function scale scores and cerebrospinal fluid biomarker data. For missing data, interpolation, multiple imputation, or machine learning-based models are used to make reasonable estimates. Outliers are identified and reviewed or corrected using statistical methods.
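Two of the simpler options mentioned here, interpolation of missing values and statistical outlier flagging, can be sketched as follows. The sketch assumes equally spaced visits (interpolation is done over list positions) and uses a Z-score threshold for outliers; the function names, the threshold, and the sample series are hypothetical, and multiple imputation or model-based estimation are not shown.

```python
import statistics
from typing import List, Optional, Sequence


def interpolate_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between known neighbours.

    Leading and trailing gaps take the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            frac = (i - left) / (right - left)
            filled.append(values[left] + frac * (values[right] - values[left]))
    return filled


def flag_outliers(values: Sequence[float], threshold: float = 3.0) -> List[bool]:
    """Mark values whose absolute Z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False for _ in values]
    return [abs(v - mean) / std > threshold for v in values]


# Hypothetical MMSE series with one missed visit.
mmse_filled = interpolate_missing([28.0, None, 24.0])
# Hypothetical biomarker series with one implausible entry.
outlier_flags = flag_outliers([1.0] * 9 + [100.0], threshold=2.0)
```

Flagged values would then be reviewed or corrected rather than silently dropped, in line with the text.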

[0052] The unified data storage module is responsible for storing all preprocessed multimodal data in a unified internal data structure format within a high-performance distributed file system or dedicated medical database. The data structure is designed to support longitudinal data querying and management, ensuring that each patient's data includes a timestamp for subsequent time-series analysis. All stored data is also anonymized or de-identified to protect patient privacy. This module is further responsible for maintaining data integrity, consistency, and security, employing data verification mechanisms and access control policies.
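A timestamped, de-identified record of the kind described could look like the sketch below. The `VisitRecord` fields, the salted-hash pseudonym, and the sample identifiers are assumptions made for illustration; the text does not specify a schema or a de-identification method, and a salted hash alone is not presented here as sufficient for real clinical data.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class VisitRecord:
    """One patient visit: a pseudonym, a timestamp, and multimodal payloads."""

    pseudonym: str
    visit_date: date
    cognitive_scores: Dict[str, float] = field(default_factory=dict)
    csf_biomarkers: Dict[str, float] = field(default_factory=dict)
    image_paths: Dict[str, str] = field(default_factory=dict)


def pseudonymize(patient_id: str, salt: str) -> str:
    """Replace the real identifier with a salted SHA-256 digest prefix."""
    return hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()[:16]


def longitudinal_series(records: Iterable[VisitRecord], pseudonym: str) -> List[VisitRecord]:
    """Return one patient's visits in chronological order."""
    return sorted(
        (r for r in records if r.pseudonym == pseudonym),
        key=lambda r: r.visit_date,
    )


pid = pseudonymize("patient-001", salt="demo-salt")
records = [
    VisitRecord(pid, date(2025, 6, 1), cognitive_scores={"MMSE": 26.0}),
    VisitRecord(pid, date(2024, 6, 1), cognitive_scores={"MMSE": 28.0}),
]
series = longitudinal_series(records, pid)
```

Sorting by `visit_date` is what makes the stored visits directly usable as input to the time-series analysis that follows.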

[0053] The multimodal feature extraction unit 102 is logically connected to the data acquisition and preprocessing unit 101. Its function is to extract quantitative features closely related to the occurrence, development, and prognosis of Alzheimer's disease from the preprocessed raw data. This unit transforms complex raw data into feature vectors with higher information density and analytical value, providing refined input for subsequent temporal state representation and reinforcement learning decision-making.

[0054] The multimodal feature extraction unit 102 includes an image deep feature extraction module, a cognitive statistical feature extraction module, and a biomarker statistical feature extraction module.

[0055] The image deep feature extraction module is specifically designed for processing magnetic resonance imaging (MRI) and positron emission tomography (PET) data. This module utilizes a 3D convolutional neural network (CNN) model for deep feature extraction, aiming to automatically learn and capture three-dimensional spatial patterns in image data. The 3D CNN model comprises multiple convolutional layers, activation layers, pooling layers, and fully connected layers.

[0056] Convolutional layers: Multiple 3D convolutional kernels slide across 3D image data, extracting local features through convolution operations. Each convolutional layer learns different levels of feature representation, from low-level edges and textures to high-level anatomical features. The size, stride, and number of convolutional kernels are configurable hyperparameters.

[0057] Activation layer: After each convolutional layer, a non-linear activation function, such as ReLU (Rectified Linear Unit), is applied to increase the non-linear expressiveness of the model, enabling it to learn more complex mapping relationships.

[0058] Pooling layers: After some convolutional layers, 3D pooling layers, such as max pooling or average pooling, are inserted to reduce the spatial dimensionality of the feature maps, decrease the number of parameters, and improve the translation invariance of features. This helps reduce the risk of overfitting the model.

[0059] Fully connected layers: The network ends with one or more fully connected layers that integrate the local features extracted by the preceding convolutional and pooling layers into a global feature representation. These fully connected layers learn the complex interactions between features and ultimately output a highly abstract latent representation of disease-related quantitative indicators.
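The four layer types described above can be sketched as a single forward pass in NumPy. This is a toy illustration only: the volume size, the four 3x3x3 filters, the random weights, and the 16-dimensional output are all assumptions, and a real system would use a trained network in a deep learning framework.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Single-channel 3D cross-correlation with stride 1 and no padding."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[z, y, x] = np.sum(volume[z:z + kd, y:y + kh, x:x + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit applied element-wise."""
    return np.maximum(x, 0.0)

def max_pool3d(x, size=2):
    """Non-overlapping max pooling; voxels that do not fill a window are dropped."""
    d, h, w = (s // size for s in x.shape)
    x = x[:d * size, :h * size, :w * size]
    return x.reshape(d, size, h, size, w, size).max(axis=(1, 3, 5))

rng = np.random.default_rng(0)
volume = rng.normal(size=(10, 10, 10))    # stand-in for a preprocessed image patch
kernels = rng.normal(size=(4, 3, 3, 3))   # four hypothetical 3x3x3 filters

# Convolution -> activation -> pooling, once per filter.
maps = np.stack([max_pool3d(relu(conv3d_valid(volume, k))) for k in kernels])

# Fully connected layer over the flattened feature maps.
flat = maps.reshape(-1)
W_fc = rng.normal(scale=0.01, size=(16, flat.size))
features = relu(W_fc @ flat)
```

Each filter produces an 8x8x8 map that pooling halves to 4x4x4, so the fully connected layer sees 256 inputs in this configuration.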

[0060] This deep learning architecture enables the system to learn latent representations of quantitative indicators such as volume, cortical thickness, gray matter density, and white matter integrity of specific brain regions like the hippocampus, entorhinal cortex, and amygdala from images. These indicators are automatically extracted from raw image data in an end-to-end manner, eliminating the need for tedious manual region segmentation and feature engineering, significantly improving efficiency and objectivity. The extracted deep features are output as high-dimensional vectors.

[0061] The cognitive statistical feature extraction module is responsible for extracting statistical features from the cognitive function scale scoring data. Since cognitive scores are inherently structured data, this module primarily focuses on their quantitative attributes and changes over time. The extracted statistical features include: Raw scores: The raw scores of the MMSE and ADAS-Cog scales are used directly.

[0062] Rate of change during follow-up: This is obtained by calculating the ratio of the difference between two or more consecutive cognitive scores to the time interval. For example, for a patient's MMSE score sequence, the difference in MMSE scores between every two adjacent time points is calculated and divided by the corresponding time interval to reflect the rate of cognitive decline or stabilization.

[0063] Z-score relative to age-matched healthy population: The Z-score is calculated by comparing the patient's cognitive score with the mean and standard deviation of a healthy population matched for age, education level, and gender. The Z-score expresses how many standard deviations the patient's score lies from the mean of the healthy population, providing a more objective assessment of the patient's cognitive level. These statistical features are combined into a medium-dimensional feature vector.
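The rate of change and the Z-score described above reduce to two short formulas. The sketch below assumes visit times in months and uses made-up normative values (mean 28.0, standard deviation 2.0) purely for illustration; they are not reference data.

```python
def rates_of_change(times, scores):
    """Score difference between adjacent visits divided by the time interval."""
    pairs = list(zip(times, scores))
    return [(s1 - s0) / (t1 - t0) for (t0, s0), (t1, s1) in zip(pairs, pairs[1:])]

def z_score(score, norm_mean, norm_sd):
    """Number of standard deviations between the score and the matched normative mean."""
    return (score - norm_mean) / norm_sd

visit_months = [0, 6, 12]
mmse = [28, 26, 23]

rates = rates_of_change(visit_months, mmse)  # points per month between adjacent visits
z = z_score(mmse[-1], norm_mean=28.0, norm_sd=2.0)  # hypothetical normative values
```

A negative rate indicates decline, and a Z-score of -2.5 in this example means the latest score is two and a half standard deviations below the assumed healthy mean.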

[0064] The biomarker statistical feature extraction module processes cerebrospinal fluid biomarker data. This module also focuses on quantitative attributes and dynamic changes. The extracted statistical features include: Absolute concentration values: Laboratory-detected concentration values of Aβ42, T-Tau, and P-Tau are used directly.

[0065] Ratios: Calculate the ratio between biomarkers, such as Aβ42 / T-Tau, P-Tau / T-Tau, etc. These ratios are often considered to have higher diagnostic or prognostic value than a single biomarker.

[0066] Trends during follow-up: Statistical methods such as linear regression or exponential fitting are used to analyze the concentration trends of biomarkers at multiple follow-up time points to capture the impact of disease progression or treatment interventions. These statistical characteristics are also combined into a mid-dimensional feature vector.
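The ratio and trend features above can be sketched with plain arithmetic and an ordinary least-squares slope. The concentration values below are placeholders chosen for easy arithmetic, not clinical reference values, and the dictionary keys are hypothetical names.

```python
def biomarker_ratios(abeta42, t_tau, p_tau):
    """Ratios named in the text: Abeta42 / T-Tau and P-Tau / T-Tau."""
    return {
        "abeta42_over_ttau": abeta42 / t_tau,
        "ptau_over_ttau": p_tau / t_tau,
    }

def linear_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

ratios = biomarker_ratios(abeta42=600.0, t_tau=400.0, p_tau=60.0)

# Hypothetical Abeta42 concentrations at three yearly follow-ups.
years = [0, 1, 2]
abeta42_series = [700.0, 650.0, 600.0]
slope = linear_slope(years, abeta42_series)  # change in concentration per year
```

An exponential fit, also mentioned in the text, could be approximated the same way by fitting the slope to the logarithm of the concentrations.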

[0067] All feature vectors extracted from image, cognitive, and biomarker data are standardized within their respective submodules, such as through Z-score standardization or min-max standardization, to ensure comparability between different types of features and prevent any single feature from dominating subsequent model training due to an excessively large numerical range. These standardized multimodal features are then passed to the temporal state representation unit 103.
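The two standardization options named above can be written as follows. This is a minimal sketch that assumes each feature has non-zero spread across the samples; a constant feature would need separate handling.

```python
import statistics

def z_standardize(xs):
    """Subtract the mean and divide by the population standard deviation."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

def min_max_scale(xs):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values only
z_values = z_standardize(feature)
scaled = min_max_scale(feature)
```

In practice the mean, standard deviation, minimum, and maximum would be estimated on training data and reused unchanged for new patients.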

[0068] The temporal state representation unit 103 is connected to the multimodal feature extraction unit 102. Its core task is to integrate the various features output by the multimodal feature extraction unit 102 into a time-dependent temporal state vector that can capture the patient's current disease state and historical evolution trajectory. Please refer to Figure 3 of the accompanying drawings. This unit is a key link connecting static features and dynamic decision-making.

[0069] The temporal state representation unit 103 uses a recurrent neural network model for serialization and encoding. Specifically, the model can be a Long Short-Term Memory (LSTM) network or a gated recurrent unit (GRU), both of which excel at processing sequential data and capturing long-distance temporal dependencies.

[0070] Long Short-Term Memory (LSTM) Network: LSTM is a special type of recurrent neural network. Its internal structure is designed with a special gating mechanism to solve the gradient vanishing or exploding problems of traditional RNNs, thus effectively learning long sequence dependencies. A typical LSTM unit contains: Forget Gate: Determines which information to discard from the cell state of the previous time step. It outputs a value between 0 and 1 to each cell state through a sigmoid activation function, where 0 represents complete forgetting and 1 represents complete retention.

[0071] Input gate: Determines which information from the current input is used to update the cell state. It consists of a Sigmoid layer that decides which values to update, and a Tanh layer that generates candidate values.

[0072] Cell state: Stores long-term memory. It is updated through the coordinated action of the forget gate and the input gate.

[0073] Output gate: Determines which information from the cell state is output as the hidden state for the current time step. The cell state is passed through a Tanh layer and then multiplied element-wise by the output of a Sigmoid layer, which selects the components to expose.

[0074] At each time step, the LSTM unit receives the multimodal feature vector of the current time step and the hidden state and cell state of the previous time step as input, and outputs the new hidden state and cell state of the current time step. The hidden state is the final temporal state vector generated by this unit.
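The gate interactions described above correspond to the standard LSTM update equations. The sketch below implements one step in NumPy and runs it over three visits; the dimensions, the stacked weight layout, and the random weights are assumptions for illustration, not the system's trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step.

    W: (4H, D), U: (4H, H), b: (4H,), stacked as [forget, input, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: what to keep from c_prev
    i = sigmoid(z[H:2 * H])        # input gate: which candidate values to write
    g = np.tanh(z[2 * H:3 * H])    # candidate values
    o = sigmoid(z[3 * H:4 * H])    # output gate: what to expose as hidden state
    c = f * c_prev + i * g         # updated cell state (long-term memory)
    h = o * np.tanh(c)             # updated hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 6, 4                        # hypothetical feature and hidden sizes
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
visits = rng.normal(size=(3, D))   # stand-in multimodal features for three visits
for x in visits:
    h, c = lstm_step(x, h, c, W, U, b)
```

After the loop, `h` plays the role of the temporal state vector for the most recent visit.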

[0075] Gated Recurrent Unit (GRU): A GRU is a simplified variant of LSTM with fewer parameters that performs comparably to LSTM on many tasks. A typical GRU unit contains: Update gate: Controls how much of the hidden state from the previous time step is carried into the current hidden state, and how much new information from the current time step is written into it. It determines the proportion of old memory retained and the proportion of new memory introduced.

[0076] Reset gate: Determines how to combine the hidden state from the previous time step with the current input. It controls whether the model should ignore past information.

[0077] The GRU unit achieves selective memorization and forgetting of information through the coordinated action of the update gate and the reset gate, generating the hidden state at the current time step, i.e., the temporal state vector.
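For comparison with the LSTM, one GRU step can be sketched as below. Bias terms are omitted for brevity, and the blending convention used here (the update gate weights the new candidate) is one of two equivalent conventions found in the literature; both choices are assumptions of this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step without bias terms."""
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate hidden state
    return (1.0 - z) * h_prev + z * h_tilde         # blend old memory and new candidate

rng = np.random.default_rng(1)
D, H = 6, 4                                         # hypothetical sizes
Wz, Wr, Wh = (rng.normal(scale=0.1, size=(H, D)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(scale=0.1, size=(H, H)) for _ in range(3))

h = np.zeros(H)
for x in rng.normal(size=(3, D)):                   # three stand-in visits
    h = gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh)
```

Unlike the LSTM, there is no separate cell state: the hidden state alone carries the history.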

[0078] Regardless of whether LSTM or GRU is used, the input to the temporal state representation unit 103 is a sequence of multimodal features arranged in chronological order. For each patient, the system collects their imaging, cognitive, and biomarker features at multiple time points (e.g., initial diagnosis, 6-month follow-up, 1-year follow-up, etc.). These multimodal feature vectors extracted at each time point are concatenated to form a feature sequence.

[0079] At each time step, the temporal state representation unit 103 aggregates the multimodal feature information of the current moment and the feature evolution information of historical moments. This means that the final generated temporal state vector not only reflects the patient's static physiological state at the current time point, but more importantly, it encodes the dynamic progression trajectory of the disease from the past to the present. This vector is a high-dimensional continuous numerical representation, and its dimension depends on the configuration of the recurrent neural network model, typically ranging from tens to hundreds. This vector serves as the core input of the reinforcement learning decision unit 104, enabling the decision to fully consider the dynamics of the disease.

[0080] The reinforcement learning decision-making unit 104 is the core intelligent component of this system. Connected to the temporal state representation unit 103, it is responsible for learning and outputting the optimal diagnostic or prognostic decision based on the patient's temporal state vector. Please refer to Figure 2 of the accompanying drawings. This unit learns from reward signals by interacting with a preset environment, thereby optimizing its decision-making strategy. This unit employs reinforcement learning algorithms, such as Deep Q-Network (DQN), the Advantage Actor-Critic (A2C) algorithm, or the Proximal Policy Optimization (PPO) algorithm, as its core learning mechanism.

[0081] This unit mainly consists of a policy network, a value network, and an interaction mechanism with the preset environment.

[0082] Policy Networks: The policy network is the "decision-maker" in the reinforcement learning decision-making unit 104. It consists of a multi-layer feedforward neural network, and its input is the temporal state vector output by the temporal state representation unit 103. This vector encodes the patient's overall disease state at the current moment and its historical evolution trajectory.

[0083] The output of the policy network is a discrete probability distribution of decision actions. This means that for each possible decision action, the network outputs a probability value representing the propensity to take that action in the current state. The set of decision actions is predefined and designed to cover key aspects of Alzheimer's disease diagnosis and management. These decision actions include: "Diagnosed with Alzheimer's disease (AD)": This means that the system has determined that the patient has Alzheimer's disease based on the current information.

[0084] "Diagnosed as Mild Cognitive Impairment (MCI)": This indicates that the system has determined that the patient is in the stage of mild cognitive impairment.

[0085] "Diagnosed as normal cognition": This indicates that the system judges the patient's cognitive function to be normal.

[0086] "Predicting the risk level of disease progression to Alzheimer's disease": The system does not directly provide a diagnosis, but rather assesses the patient's risk level of progressing to AD within a specific time window in the future, such as low, medium, or high risk.

[0087] "Recommend specific drug intervention": Based on the patient's disease stage, biomarker characteristics, etc., recommend one or more specific drug treatment options.

[0088] "Further genetic testing is recommended": When existing information is insufficient or specific risk factors exist, genetic testing is recommended to assist in diagnostic or treatment decisions.

[0089] The policy network uses the Softmax function to transform the output of the final fully connected layer into a probability distribution, ensuring that the sum of the probabilities of all actions is 1. The training objective of the network is to maximize the expected cumulative reward obtained in the preset environment.
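The Softmax step can be illustrated over the six decision actions listed above. The action identifiers and the logit values are hypothetical; in the described system the logits would come from the final fully connected layer of the policy network.

```python
import numpy as np

# Hypothetical identifiers for the six decision actions listed in the text.
ACTIONS = [
    "diagnose_AD",
    "diagnose_MCI",
    "diagnose_normal",
    "predict_progression_risk",
    "recommend_drug_intervention",
    "recommend_genetic_testing",
]

def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1, 0.5, -1.0, -0.5])  # made-up network outputs
probs = softmax(logits)
greedy_action = ACTIONS[int(np.argmax(probs))]

# During training an action is usually sampled rather than chosen greedily.
rng = np.random.default_rng(0)
sampled_action = ACTIONS[rng.choice(len(ACTIONS), p=probs)]
```

Sampling keeps some exploration during training, while the greedy choice is the natural option when producing a report.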

[0090] Value Network: The value network is the "evaluator" in the reinforcement learning decision unit 104. It is also composed of a multi-layer feedforward neural network, and its input is the same as that of the policy network, which is the temporal state vector output by the temporal state representation unit 103.

[0091] The output of a value network is a scalar value representing the expected cumulative reward given the current temporal state vector. This value can be understood as the "goodness" or "badness" of the current state. Value networks evaluate the potential value of a state by learning a state-value function or a state-action-value function.

[0092] For example, in Deep Q-Network (DQN), the value network outputs the Q-value for each possible action, representing the expected future reward obtained by taking that action in the current state and following the optimal policy thereafter. In the Advantage Actor-Critic (A2C) algorithm, the value network directly estimates the state value V(s) as the baseline for the policy network's gradient updates. The training objective of the value network is to minimize the difference between its predicted value and the actual cumulative reward, typically using the mean squared error loss function.

[0093] Preset environment: The preset environment is a simulated learning field for the reinforcement learning decision unit 104. It is constructed from a large amount of desensitized clinical data from historical patients, which details the natural evolution of the patients' diseases, the implementation of different interventions, and the subsequent clinical outcomes.

[0094] The preset environment is implemented through a simulator, which can: Receive the decision-making actions output by the policy network of the reinforcement learning agent.

[0095] Based on decision-making actions and the current patient status, the simulator simulates changes in the patient's disease state. For example, if the system recommends a certain drug intervention, the simulator will simulate changes in the patient's cognitive function, biomarkers, and other indicators over a period of time, based on historical data of similar patients' responses to the drug.

[0096] A reward signal is generated and fed back to the reinforcement learning decision unit 104. The logic for generating the reward signal strictly follows the correlation between disease evolution and clinical outcomes.

[0097] Reward signal design and parameter updates: Reward signals are the core driving force of reinforcement learning, guiding the optimization of parameters in both the policy and value networks. The system provides a positive reward when the diagnostic or prognostic outcome output by the policy network matches the patient's actual clinical follow-up results. For example, if the system diagnoses AD, and the patient is subsequently diagnosed with AD during clinical follow-up, a substantial positive reward is given.

[0098] When results are inconsistent, a negative reward is given. For example, if the system predicts low risk, but the patient rapidly progresses to Alzheimer's disease (AD) within a short period, a significant negative reward is given.

[0099] Reward signals can also include additional positive incentives for early diagnosis and effective intervention. For example, if the system can accurately predict the risk of disease progression and recommend effective interventions in the early stages of the disease, such as the period of mild cognitive impairment, and these interventions actually delay disease progression, then an additional positive reward is given. This encourages the system to learn and tend to make accurate and beneficial decisions at earlier stages.

[0100] The reinforcement learning decision unit 104 updates the weight parameters of the policy network and the value network by maximizing the expected cumulative reward.

[0101] In Deep Q-Network (DQN), the system stores experience tuples of state, action, reward, and next state in an experience replay buffer, and updates the network in batches by randomly sampling from these tuples. The target Q-value is calculated using the Bellman equation, and the parameters of the value network are then optimized through gradient descent to reduce the mean squared error between the predicted Q-value and the target Q-value.
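The replay mechanism and the Bellman target can be sketched as follows. The class name, the `done` flag, the buffer capacity, and the discount factor are assumptions for illustration; the gradient step on the network itself is omitted.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest tuples are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniform random mini-batch, which breaks temporal correlation."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def td_target(reward, next_q_values, done, gamma=0.99):
    """Bellman target: r + gamma * max_a' Q(s', a'), or r alone at episode end."""
    return reward if done else reward + gamma * max(next_q_values)

def squared_error(predicted_q, target_q):
    """Per-sample loss whose batch mean is the mean squared error."""
    return (predicted_q - target_q) ** 2

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(2)
target = td_target(reward=10.0, next_q_values=[1.0, 3.0], done=False, gamma=0.9)
```

With a capacity of 3, only the three most recent transitions remain after five pushes.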

[0102] In the Advantage Actor-Critic (A2C) algorithm, the policy network is updated using the policy gradient method, while the value network estimates state value through temporal difference learning. The gradient update of the policy network uses the advantage function derived from the value network's estimates to guide the policy towards actions with higher value.
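A one-step version of the A2C quantities can be written out numerically. This sketch assumes a one-step temporal difference target and scalar inputs; the function name and all numeric values are illustrative, and a real implementation would compute these over batches with automatic differentiation.

```python
import math

def a2c_terms(log_prob_action, value_s, reward, value_next, done, gamma=0.99):
    """One-step advantage and the two per-sample loss terms.

    The advantage is treated as a constant when differentiating the policy loss.
    """
    target = reward + (0.0 if done else gamma * value_next)  # TD target for V(s)
    advantage = target - value_s                             # how much better than expected
    policy_loss = -log_prob_action * advantage               # policy gradient surrogate
    value_loss = advantage ** 2                              # squared TD error
    return advantage, policy_loss, value_loss

advantage, policy_loss, value_loss = a2c_terms(
    log_prob_action=math.log(0.5),  # the chosen action had probability 0.5
    value_s=2.0,
    reward=5.0,
    value_next=1.0,
    done=False,
    gamma=0.9,
)
```

A positive advantage, as here, pushes the policy to make the chosen action more likely in that state.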

[0103] Parameter updates typically employ optimizers, such as Adam or RMSprop, which adjust millions of weights and biases in the network via backpropagation to progressively improve the accuracy and effectiveness of decisions. This training process is iterative; through continuous interaction with the pre-defined environment, the reinforcement learning decision unit 104 continuously learns and refines its diagnostic and prognostic strategies.

[0104] The decision output and evaluation unit 105 is connected to the reinforcement learning decision unit 104. Its main function is to transform the optimized decision-making strategy output by the reinforcement learning decision unit 104 into a diagnostic report, disease progression risk prediction report, or personalized treatment suggestion that clinicians and researchers can understand and act upon. Please refer to Figure 4 of the accompanying drawings. This unit is also responsible for evaluating the effectiveness of these decisions in actual clinical applications and generating reward signals to feed back to the reinforcement learning decision unit 104, forming an important closed loop.

[0105] The decision output and evaluation unit 105 can be further subdivided into a report generation module, a treatment suggestion generation module, a clinical effect evaluation module, and a reward signal feedback module.

[0106] The report generation module generates a structured and easy-to-interpret report based on the decision actions output by the reinforcement learning decision unit 104 and their related confidence information.

[0107] Diagnostic Report: This report includes a clear diagnostic category, such as "Alzheimer's disease," "mild cognitive impairment," or "normal cognition." It also provides diagnostic confidence, typically expressed as a probability value or percentage, indicating the system's level of confidence in the diagnosis. More importantly, the report lists key imaging and biomarker evidence, such as the degree of atrophy in specific brain regions, the standardized uptake ratio (SUVr) of amyloid PET, and cerebrospinal fluid Aβ42 levels. This evidence is crucial for supporting the diagnostic decision, increasing the transparency and reliability of the diagnosis. The report format supports multiple output standards, such as PDF or XML, facilitating integration into electronic medical record systems.

[0108] Disease Progression Risk Prediction Report: This report details the probability that a patient will develop Alzheimer's disease within a specific time window, such as the next 1, 3, or 5 years. This probability is typically expressed as a percentage, and the risk is categorized into different risk levels, such as low, intermediate, and high. The report also identifies and lists key predictive factors that may be the most significant imaging, cognitive, or biomarker features influencing disease progression, providing targets for clinical intervention.

[0109] The treatment recommendation generation module generates personalized treatment suggestions based on the actions that the reinforcement learning decision-making unit 104 might output under specific circumstances, such as "recommending specific drug intervention" or "recommending further genetic testing," combined with the patient's specific clinical data. These suggestions are highly customized, based on the patient's genotype information, such as APOE ε4 allele status, disease stage, such as early, middle, and late stages, and biomarker characteristics, such as cerebrospinal fluid amyloid and Tau protein levels. For example, for an APOE ε4 carrier and an MCI patient with positive amyloid PET, the system may recommend specific drug intervention targeting amyloid and recommend regular follow-up examinations. The suggestions may also include recommendations for lifestyle interventions, cognitive training, or participation in clinical trials.

[0110] The clinical outcome assessment module is responsible for continuously tracking and collecting patients' actual clinical outcomes after receiving diagnoses, risk predictions, or treatment recommendations. This includes subsequent clinical diagnosis information, disease progression pathways, changes in cognitive function scale scores, biomarker follow-up results, and adherence to treatment plans and feedback on efficacy. This module obtains this real-world data through data exchange with electronic medical record systems and follow-up databases.

[0111] The reward signal feedback module is key to closed-loop learning. Based on the actual clinical results collected by the clinical outcome evaluation module, it generates precise reward signals and feeds them back to the reinforcement learning decision unit 104.

[0112] Specifically, the reward signal is generated by comparing the decision output with the patient's subsequent diagnosis or actual disease progression path.

[0113] If the diagnostic report output by decision unit 104 is completely consistent with the patient's subsequent gold standard diagnosis result, such as both being diagnosed as AD, then a strong positive reward is generated.

[0114] If the disease progression risk prediction report accurately predicts the patient's actual progression, such as predicting a high risk, and the patient does indeed progress to Alzheimer's disease (AD) within one year, a significant positive reward is generated.

[0115] A positive reward is generated if the personalized treatment recommendation is adopted and the patient's clinical condition improves significantly or disease progression is slowed.

[0116] Conversely, if the decision does not match the actual outcome, such as misdiagnosis, incorrect risk prediction, or ineffective treatment, a negative reward is generated.

[0117] The quantification of reward signals is typically based on a pre-defined reward function that maps differences or consistency in clinical outcomes to a numerical value. For example, a correct diagnosis awards 10 points, while a wrong diagnosis deducts 20 points; accurately predicting high risk and patient progression awards 5 points, while predicting low risk but patient progression deducts 15 points. The accuracy and timeliness of this reward signal directly affect the model optimization performance of the reinforcement learning decision unit 104.
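The example point values above translate directly into a small reward function. The function names and label strings are hypothetical, and the zero reward for combinations the text does not mention is an assumption of this sketch.

```python
def diagnosis_reward(predicted, confirmed):
    """+10 for a correct diagnosis, -20 for a wrong one (values from the example)."""
    return 10 if predicted == confirmed else -20

def risk_reward(predicted_risk, progressed):
    """+5 for high risk followed by progression, -15 for low risk followed by progression."""
    if predicted_risk == "high" and progressed:
        return 5
    if predicted_risk == "low" and progressed:
        return -15
    return 0  # assumption: the text gives no value for the remaining combinations

r_correct = diagnosis_reward("AD", "AD")
r_wrong = diagnosis_reward("normal", "MCI")
r_high = risk_reward("high", progressed=True)
r_missed = risk_reward("low", progressed=True)
```

The asymmetry in these example values penalizes errors more heavily than it rewards correct outputs.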

[0118] The continuous learning and knowledge updating unit 106 is connected to the decision output and evaluation unit 105, with the goal of ensuring that the system can adapt to the continuous evolution of medical knowledge, changes in clinical practice, and the influx of new patient data. Please refer to Figure 4 of the accompanying drawings. This unit continuously receives new patient follow-up data and clinical treatment results, and uses this new data to retrain and optimize the model of the reinforcement learning decision unit 104, thereby achieving dynamic updating of the system model and knowledge accumulation.

[0119] The continuous learning and knowledge update unit 106 includes an online learning module, an offline retraining module, a data management and version control module, and a model evaluation and deployment module.

[0120] The online learning module employs incremental learning, updating model parameters in small batches to adapt to new patient data streams. When new patient follow-up data or new clinical intervention results are processed by the data acquisition and preprocessing unit 101, the multimodal feature extraction unit 102, and the temporal state representation unit 103, forming new temporal state vectors and corresponding reward signals, the online learning module fine-tunes the policy network and value network in the reinforcement learning decision-making unit 104 with a small learning rate.

[0121] This online learning mechanism enables the system to respond in real time to new clinical evidence, capturing the latest disease evolution patterns and treatment effects without waiting for large-scale offline training. Online learning typically employs mini-batch optimization methods based on stochastic gradient descent to ensure the model can quickly adapt to environmental changes while avoiding overfitting to short-term fluctuations.

[0122] The offline retraining module periodically uses the larger body of accumulated historical data for comprehensive model optimization and validation. This is a more thorough model update process than online learning. The offline retraining module is activated when the accumulated new data reaches a preset threshold or after a set time period, such as monthly or quarterly. It integrates all historical data with newly collected data, reconstructs the training set, and performs a complete retraining of the reinforcement learning decision unit 104 model.

[0123] Offline retraining is typically performed on high-performance computing resources, employing more refined hyperparameter tuning, more complex model architecture exploration, and more comprehensive cross-validation strategies to ensure stable performance improvement and generalization ability. The retrained model undergoes rigorous offline validation, including evaluation of metrics such as diagnostic accuracy, prognostic prediction accuracy, sensitivity, and specificity on independent test sets.

[0124] The data management and version control module manages all datasets used for learning and training, and performs version control on the models. All newly collected data, preprocessed features, and corresponding reward signals are securely stored and timestamped. This module maintains a complete historical version of the model training dataset for backtracking analysis or retraining when necessary. Simultaneously, each new model version generated during offline retraining is numbered and its training parameters, performance metrics, and deployment time are recorded. This ensures the system's traceability and stability.

[0125] The model evaluation and deployment module comes into play after offline retraining. Before deployment, the new model version must undergo rigorous performance evaluation to ensure it is superior to, or at least not inferior to, the currently running model in terms of clinical metrics. Evaluation metrics include diagnostic accuracy, area under the curve (AUC) for prognostic prediction, F1 score, and clinical usability metrics. Once the new model passes all validation criteria, it is deployed to the production environment, replacing the old model and becoming available for service. During deployment, the system performs a seamless switchover to ensure service continuity. The deployed model's performance continues to be monitored online to detect any performance degradation or bias, which triggers further online learning or offline retraining.
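The "superior to or at least not inferior to" rule can be expressed as a simple gate over the evaluation metrics. The metric keys, the tolerance parameter, and the sample numbers are assumptions made for this sketch.

```python
def should_deploy(candidate, current, metrics=("accuracy", "auc", "f1"), tolerance=0.0):
    """True only if the candidate is within `tolerance` of, or better than, the
    current model on every listed metric (higher is assumed to be better)."""
    return all(candidate[m] >= current[m] - tolerance for m in metrics)

current_model = {"accuracy": 0.86, "auc": 0.90, "f1": 0.84}    # made-up figures
candidate_ok = {"accuracy": 0.88, "auc": 0.91, "f1": 0.84}
candidate_bad = {"accuracy": 0.90, "auc": 0.87, "f1": 0.85}

deploy_ok = should_deploy(candidate_ok, current_model)
deploy_bad = should_deploy(candidate_bad, current_model)
```

The second candidate is rejected despite higher accuracy because its AUC falls below that of the running model.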

[0126] By employing a strategy that combines online learning with offline retraining, this continuous learning and knowledge update unit 106 ensures that the system can continuously learn and improve itself, constantly enhancing the accuracy and reliability of its diagnosis and prognostic assessment, and always remaining at the forefront of Alzheimer's disease research and clinical practice. This built-in adaptive and self-improving capability enables the system to quickly adapt and optimize its decision-making strategies in the face of new medical discoveries, evolving diagnostic criteria, or updated treatment plans.

[0127] The workflow of this system can be summarized as follows: First, the data acquisition and preprocessing unit 101 continuously acquires and processes multimodal clinical data; then, the multimodal feature extraction unit 102 extracts high-dimensional features from the data; subsequently, the temporal state representation unit 103 transforms these features into temporal state vectors to capture the dynamic evolution of the disease; next, the reinforcement learning decision-making unit 104 utilizes its policy network and value network, combined with interactions with the preset environment, to learn the optimal diagnostic or prognostic decision-making strategy from reward signals; finally, the decision output and evaluation unit 105 transforms the decisions into understandable clinical reports and recommendations, and generates reward signals based on actual clinical results, feeding them back to the reinforcement learning decision-making unit 104, forming a closed loop that promotes continuous model optimization. The continuous learning and knowledge update unit 106 ensures that the entire system can continuously improve itself as new data accumulates and medical knowledge evolves.

[0128] In the data acquisition and preprocessing unit 101, the data acquisition module, in addition to performing routine data acquisition, also incorporates a data integrity verification mechanism. Whenever a new dataset is received, the system performs checksum comparison, metadata verification, and data range checks to ensure the accuracy, integrity, and consistency of the data. For example, for DICOM images, it checks whether the patient ID, scan date, sequence type, and other information in the header file conform to the expected format and match the patient's master index information. For cognitive scores, it checks whether the values are within a reasonable range to avoid data entry errors. Any non-compliant data is flagged and triggers an exception handling process, such as automatically requesting data retransmission or notifying the data administrator for manual intervention. Furthermore, all acquired raw data undergoes preliminary anonymization before entering the preprocessing process, separating patient identity information from clinical data and generating a unique anonymous ID to comply with medical data privacy protection regulations.
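The verification steps above can be sketched as a function that returns a list of problems for each incoming record. The record fields, the use of SHA-256, and the function names are assumptions of this example; the 0 to 30 range is the standard MMSE scoring range.

```python
import hashlib

def sha256_hex(payload: bytes) -> str:
    """Hex digest used here as the checksum (an assumed choice of algorithm)."""
    return hashlib.sha256(payload).hexdigest()

def validate_record(record: dict, payload: bytes) -> list:
    """Return human-readable problems; an empty list means the record passed."""
    problems = []
    if sha256_hex(payload) != record.get("sha256"):
        problems.append("checksum mismatch")
    if not record.get("patient_id"):
        problems.append("missing patient_id")
    mmse = record.get("mmse")
    if mmse is not None and not (0 <= mmse <= 30):
        problems.append("MMSE out of range")
    return problems

payload = b"example image bytes"
good = {"patient_id": "P001", "sha256": sha256_hex(payload), "mmse": 27}
bad = {"patient_id": "", "sha256": "0" * 64, "mmse": 42}

good_problems = validate_record(good, payload)
bad_problems = validate_record(bad, payload)
```

A non-empty problem list would trigger the exception handling process, such as a retransmission request or a notice to the data administrator.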

[0129] The image preprocessing module employs a multi-stage iterative optimization strategy during image registration. First, rigid body registration based on mutual information is performed to correct translation and rotation. Next, affine registration based on normalized mutual information is performed to further correct scaling and shearing. Finally, a nonlinear registration algorithm based on elastic deformation fields is applied to correct local anatomical differences by minimizing deformation energy and maximizing image similarity. Each step in the entire registration process generates a transformation matrix or deformation field, and these intermediate results are stored for subsequent verification or inverse transformation. To address severe image artifacts caused by significant patient movement, the system incorporates motion correction algorithms, such as data-driven motion estimation and compensation techniques, or automatically identifies and excludes poor-quality image sequences in extreme cases, preventing the introduction of low-quality data into subsequent analysis and ensuring the reliability of feature extraction.

[0130] In the multimodal feature extraction unit 102, the 3D convolutional neural network model used in the image depth feature extraction module is pre-trained using a large-scale public dataset and the private dataset accumulated by this system. The pre-trained model is fine-tuned to adapt to the image features of Alzheimer's disease when performing feature extraction for specific tasks within this system. The convolutional kernel parameters, bias terms, and fully connected layer weights are initialized and then iteratively updated using the backpropagation algorithm and the Adam optimizer. During training, batch normalization is used to accelerate convergence, and dropout is used to prevent overfitting. The output deep feature dimension is configurable, typically 128, 256, or 512. These continuous numerical features represent complex pathological patterns in brain regions, such as slight asymmetric atrophy of the hippocampus or subtle thinning of the entorhinal cortex. Each feature vector is L2-normalized, that is, scaled to unit L2 norm, before being transmitted to the temporal state representation unit 103, ensuring that feature vectors have similar scales across different patients and time points and preventing the numerical magnitude of a particular feature vector from dominating model decisions.
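L2 normalization of a feature vector is a one-line operation. The small epsilon guard against an all-zero vector is an assumption added for numerical safety.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm; eps avoids division by zero."""
    return v / max(np.linalg.norm(v), eps)

vec = np.array([3.0, 4.0])      # toy two-dimensional feature vector
unit = l2_normalize(vec)        # the norm of [3, 4] is 5
```

After this step every feature vector has length 1, so only its direction carries information.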

[0131] When processing multimodal feature sequences, the temporal state representation unit 103 uses a recurrent neural network model (LSTM or GRU) that weights the input features at each time step. This weighting can be adjusted dynamically according to the importance of each modality's features at different time points. For example, in the early stages of the disease, structural imaging features may be more critical, while in the later stages, the trend of cognitive score changes may be more indicative. The hidden layer dimension, the number of layers, and whether a bidirectional recurrent neural network (Bi-RNN) structure is used are all configurable hyperparameters. The bidirectional structure allows the model to consider both earlier and later observations within the recorded sequence when constructing the representation at a given time step, thereby capturing more comprehensive temporal dependencies. To improve robustness, a sequence noise injection technique is introduced during training to enhance the model's adaptability to incomplete or noisy temporal data. The generated temporal state vector has a fixed, high dimensionality; it integrates all of the patient's historical observation data and can accurately represent the patient's "memory" and "current state" of disease progression.
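The source does not specify how sequence noise injection is performed. One plausible reading, sketched below under stated assumptions, is to add Gaussian noise to each visit's feature vector and to occasionally zero out an entire visit to mimic missing follow-up data. The function name `inject_sequence_noise` and the parameters `noise_std` and `drop_prob` are hypothetical.

```python
import random


def inject_sequence_noise(sequence, noise_std=0.05, drop_prob=0.1, seed=None):
    """Return a noisy copy of a visit sequence (a list of feature vectors).

    Assumed scheme: each time step is either zeroed out with probability
    `drop_prob` (simulating a missed visit) or perturbed element-wise with
    zero-mean Gaussian noise of standard deviation `noise_std`.
    The input sequence is left unmodified.
    """
    rng = random.Random(seed)
    noisy = []
    for step in sequence:
        if rng.random() < drop_prob:
            noisy.append([0.0] * len(step))
        else:
            noisy.append([x + rng.gauss(0.0, noise_std) for x in step])
    return noisy


# Three visits with two features each (made-up values).
visits = [[0.2, 0.8], [0.3, 0.7], [0.5, 0.4]]
augmented = inject_sequence_noise(visits, seed=42)
```

Applying this only during training, with a fresh random draw per epoch, would expose the recurrent model to many perturbed versions of the same patient trajectory.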

[0132] The preset environment simulator of the reinforcement learning decision-making unit 104 is a complex state transition model. It not only incorporates de-identified clinical data from historical patients but also integrates medical knowledge graphs and expert experience rules. When the policy network outputs a decision action, such as "recommend a specific drug intervention," the simulator predicts the patient's state transition at the next time step based on the drug's average efficacy and side effect rate in historical patients, as well as individual patient characteristics such as genotype and age. This state transition includes changes in cognitive scores, fluctuations in biomarker levels, and evolution of imaging features. The simulator realizes state transitions through probabilistic models, for example by using a Markov decision process (MDP) to simulate the natural progression of the disease and the effects of interventions. The reward signal generation mechanism also accounts for delayed rewards. For example, the value of an early diagnosis or intervention may not become apparent for months or years; therefore, the reward function is designed with a discount factor that converts future rewards into their present value, encouraging the system to make decisions that are effective in the long term.
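The role of the discount factor can be illustrated with a short sketch. It computes the standard discounted return G = r_1 + γ·r_2 + γ²·r_3 + ... for a sequence of rewards, together with a per-step reward that follows the consistency and early-diagnosis rules stated in the claims. The reward magnitudes (+1, -1, +0.5), the default γ, and the function names are illustrative assumptions.

```python
def step_reward(predicted, actual, early=False,
                match=1.0, mismatch=-1.0, early_bonus=0.5):
    """Reward for one decision: positive if it agrees with clinical
    follow-up, negative otherwise, plus a bonus for a correct early call.
    All magnitudes are assumed values for illustration."""
    correct = predicted == actual
    reward = match if correct else mismatch
    if correct and early:
        reward += early_bonus
    return reward


def discounted_return(rewards, gamma=0.95):
    """Sum of rewards weighted by gamma**k, where k is the delay in steps."""
    total = 0.0
    # Iterate backwards so each step applies one extra factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A reward that only materializes two steps later is still credited now,
# but at a reduced weight of gamma**2.
delayed = discounted_return([0.0, 0.0, 1.0], gamma=0.9)
```

A γ close to 1 makes the agent value long-term outcomes almost as much as immediate ones, which matches the stated goal of rewarding decisions whose benefit appears months or years later.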

[0133] The report generation module of the decision output and evaluation unit 105 automatically generates an interactive image region visualization map when generating diagnostic reports, in addition to providing the diagnostic category and confidence level. For example, for diagnostic evidence of structural atrophy, the system highlights the most significantly atrophied brain region on the patient's MRI images and marks the quantified percentage of volume reduction. This allows clinicians to intuitively see the basis for the diagnosis, improving the clinical usability of the report. For personalized treatment recommendations, the system details the dosage, duration of treatment, and potential side effects of the recommended medications, and provides links to references based on evidence-based medicine. All generated reports include a timestamp, system version information, and the model confidence interval at the time of decision-making to ensure the transparency and traceability of the reports.
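The quantified percentage of volume reduction mentioned above can be sketched as follows. The source does not state what the reduction is measured against; this sketch assumes a comparison with the patient's own baseline scan. The function names, region names, and volumes are made-up illustrative values.

```python
def percent_volume_reduction(baseline_mm3, current_mm3):
    """Percentage volume loss relative to a baseline measurement."""
    if baseline_mm3 <= 0:
        raise ValueError("baseline volume must be positive")
    return (baseline_mm3 - current_mm3) / baseline_mm3 * 100.0


def most_atrophied_region(baseline, current):
    """Return (region, percent reduction) for the region with the largest
    relative volume loss. Both arguments map region name -> volume in mm^3."""
    reductions = {
        region: percent_volume_reduction(baseline[region], current[region])
        for region in baseline
    }
    region = max(reductions, key=reductions.get)
    return region, reductions[region]


# Hypothetical volumes, for illustration only.
baseline = {"hippocampus": 4000.0, "entorhinal_cortex": 2000.0, "amygdala": 1500.0}
follow_up = {"hippocampus": 3600.0, "entorhinal_cortex": 1900.0, "amygdala": 1470.0}
region, reduction = most_atrophied_region(baseline, follow_up)
```

The selected region and its percentage are the two values a report would highlight on the image and print next to it.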

[0134] The offline retraining module of the continuous learning and knowledge update unit 106 employs cross-validation strategies, such as K-fold cross-validation, during each major training cycle to evaluate the model's generalization ability on different data subsets. Model hyperparameters, such as learning rate, batch size, number of network layers, and number of neurons, are tuned using automated methods such as grid search or Bayesian optimization to find the best-performing model configuration. Before model deployment, rigorous A/B testing or shadow deployment is performed: the new and old models run in parallel for a period of time, and their performance on real-world data is compared to ensure that the new model meets or surpasses the old model on all key metrics. The system also features a model rollback mechanism; if the new model exhibits abnormalities or performance degradation after deployment, the system can quickly roll back to the previous stable version, ensuring the continuity and safety of clinical services.
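The K-fold cross-validation mentioned above can be sketched without any library as a function that partitions sample indices into K validation folds. The function name `k_fold_indices` is hypothetical. The sketch also assumes each index stands for one patient rather than one scan, so that visits from the same patient never appear in both the training and validation sets.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k (train, validation) pairs.

    Fold sizes differ by at most one; every index appears in exactly one
    validation fold. No shuffling is applied in this sketch.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


# Ten patients split into three folds.
folds = k_fold_indices(10, 3)
```

Training once per fold and averaging the validation metric gives the generalization estimate that the hyperparameter search and the pre-deployment comparison can then use.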

[0135] Through these meticulous engineering details and refined data flow management, this reinforcement learning-based Alzheimer's image analysis system can provide unprecedented diagnostic accuracy and prognostic prediction capabilities, and possesses strong adaptive and self-improvement capabilities, providing solid technical support for precision medicine for Alzheimer's disease.

[0136] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0137] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. An Alzheimer's disease image analysis system based on reinforcement learning, characterized by comprising: a data acquisition and preprocessing unit, used to acquire multimodal clinical data from patients and to preprocess the acquired data to ensure the accuracy and consistency of subsequent analysis, the preprocessing operations including image registration, standardization, denoising, and normalization; a multimodal feature extraction unit, connected to the data acquisition and preprocessing unit, used to perform deep learning feature extraction and statistical feature extraction on the preprocessed data; a temporal state representation unit, connected to the multimodal feature extraction unit, used to integrate the extracted multimodal features into a temporal state vector containing the patient's current disease state and historical evolution trajectory; a reinforcement learning decision unit, connected to the temporal state representation unit and including a policy network and a value network, wherein the policy network is used to output diagnostic or prognostic decision actions based on the temporal state vector, the value network is used to evaluate the potential value of the temporal state vector, and the reinforcement learning decision unit interacts with a preset environment, receives reward signals, and updates the parameters of the policy network and the value network based on the reward signals to optimize the decision strategy; a decision output and evaluation unit, connected to the reinforcement learning decision unit, used to generate a diagnostic report, a disease progression risk prediction report, or personalized treatment suggestions for Alzheimer's disease based on the optimized decision strategy output by the reinforcement learning decision unit, and to generate a reward signal to be fed back to the reinforcement learning decision unit based on the actual clinical effect of the diagnostic report, risk prediction report, or treatment suggestions; and a continuous learning and knowledge update unit, connected to the decision output and evaluation unit, used to continuously receive new patient follow-up data and clinical treatment results, and to use the new data to retrain and optimize the model of the reinforcement learning decision unit, so as to realize the dynamic update and knowledge accumulation of the system model.

2. The Alzheimer's disease image analysis system based on reinforcement learning according to claim 1, characterized in that the multimodal clinical data collected by the data acquisition and preprocessing unit includes: multi-sequence magnetic resonance imaging data; positron emission tomography (PET) data; cognitive function scale score data; and cerebrospinal fluid biomarker data.

3. The Alzheimer's disease image analysis system based on reinforcement learning according to claim 2, characterized in that the multi-sequence magnetic resonance imaging data includes T1-weighted images, T2-weighted images, FLAIR sequence images, and diffusion tensor imaging data; the positron emission tomography (PET) data includes 18F-fluorodeoxyglucose (FDG) PET data and amyloid PET data; the cognitive function scale score data includes the Mini-Mental State Examination (MMSE) score and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) score; and the cerebrospinal fluid biomarker data includes the concentration values of amyloid-β 42 (Aβ42), total tau protein, and phosphorylated tau protein.

4. The Alzheimer's disease image analysis system based on reinforcement learning according to claim 1, characterized in that the preprocessing operations performed by the data acquisition and preprocessing unit include: performing skull stripping on the magnetic resonance imaging data and the positron emission tomography (PET) data; using an image registration algorithm based on affine transformation and a nonlinear deformation field to register all image data to a unified standard brain space; performing intensity normalization on the registered image data; and performing Gaussian smoothing filtering to denoise the registered image data.

5. The Alzheimer's disease image analysis system based on reinforcement learning according to claim 1, characterized in that the multimodal feature extraction unit extracts deep features from the preprocessed magnetic resonance imaging data and positron emission tomography data using a 3D convolutional neural network model; the 3D convolutional neural network model includes multiple convolutional layers, activation layers, pooling layers, and fully connected layers to learn disease-related three-dimensional spatial patterns from the images; and the deep features are latent representations of the volume, cortical thickness, gray matter density, and white matter integrity of the hippocampus, entorhinal cortex, and amygdala.

6. The Alzheimer's disease image analysis system based on reinforcement learning according to claim 1, characterized in that the temporal state representation unit employs a recurrent neural network model to encode, as a sequence, the features output by the multimodal feature extraction unit, thereby generating a temporal state vector capable of capturing temporal dependencies and the disease progression trajectory; the recurrent neural network model is a long short-term memory network or a gated recurrent unit.

7. The Alzheimer's disease image analysis system based on reinforcement learning according to claim 1, characterized in that the reinforcement learning decision unit uses a deep Q-network, an advantage actor-critic algorithm, or a proximal policy optimization algorithm as its core reinforcement learning algorithm; the decision actions output by the policy network include diagnosing Alzheimer's disease, diagnosing mild cognitive impairment, diagnosing normal cognition, predicting the risk level of disease progression to Alzheimer's disease, and recommending a specific drug intervention.

8. The Alzheimer's disease image analysis system based on reinforcement learning according to claim 7, characterized in that the preset environment consists of de-identified clinical data from historical patients and is used to simulate the natural evolution of the patient's disease and the possible clinical outcomes of different decision actions; the design of the reward signal includes: giving a positive reward when the diagnostic or prognostic result output by the policy network is consistent with the patient's actual clinical follow-up results, and giving a negative reward when the results are inconsistent; the reward signal also includes additional positive rewards for early diagnosis and effective intervention.

9. The Alzheimer's disease image analysis system based on reinforcement learning according to claim 1, characterized in that the diagnostic report generated by the decision output and evaluation unit includes the diagnostic category, diagnostic confidence level, and key imaging and biomarker evidence; the disease progression risk prediction report includes the probability of progression to Alzheimer's disease within the next 1, 3, or 5 years, risk level classification, and key predictive factors; the personalized treatment recommendations are based on the patient's genotype, disease stage, and biomarker characteristics.

10. A reinforcement learning-based image analysis method for Alzheimer's disease, characterized in that it includes the following steps: Step 1: data acquisition and preprocessing; collect the patient's multi-sequence magnetic resonance imaging data, positron emission tomography data, cognitive function scale score data, and cerebrospinal fluid biomarker data, and perform image registration, standardization, denoising, and normalization preprocessing on the collected data; Step 2: multimodal feature extraction; perform deep learning feature extraction on the preprocessed magnetic resonance imaging data and positron emission tomography data, and perform statistical feature extraction on the cognitive function scale score data and cerebrospinal fluid biomarker data; Step 3: temporal state representation; integrate the extracted multimodal features into a temporal state vector that includes the patient's current disease state and historical evolution trajectory; Step 4: reinforcement learning decision-making; use a reinforcement learning model that includes a policy network and a value network, wherein the policy network outputs diagnostic or prognostic decision actions based on the temporal state vector, and the value network evaluates the potential value of the temporal state vector; optimize the decision-making strategy by interacting with a preset environment, receiving reward signals, and updating the parameters of the policy network and the value network based on the reward signals; Step 5: decision output and evaluation; based on the optimized decision strategy output by the reinforcement learning model, generate a diagnostic report, a disease progression risk prediction report, or personalized treatment recommendations for Alzheimer's disease; at the same time, based on the actual clinical effects of the diagnostic report, risk prediction report, or treatment recommendations, generate a reward signal to feed back to the reinforcement learning model; Step 6: continuous learning and knowledge updating; continuously receive new patient follow-up data and clinical treatment results, and use the new data to retrain and optimize the reinforcement learning model in order to achieve dynamic model updates and knowledge accumulation.