Training method of image intermediate frame generation model and generation method of image intermediate frame
An intermediate frame generation model is trained on high frame rate DSA image sequences using a neural network, and the trained model is used to generate intermediate frame images. This addresses the insufficient diagnostic accuracy of traditional DSA technology at low frame rates by producing high frame rate image sequences that are easier for doctors to interpret.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- UNION STRONG (BEIJING) TECH CO LTD
- Filing Date
- 2026-03-20
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional DSA technology struggles to capture hemodynamic changes at low frame rates, resulting in insufficient diagnostic accuracy.
An image intermediate frame generation model is trained: high frame rate DSA training image sequences are preprocessed and used to train a neural network model that generates intermediate frame images. The model combines optical flow estimation, feature fusion, and a decoder, and its parameters are adjusted until the loss value meets a set condition.
It enables frame interpolation in low frame rate DSA image sequences to generate high frame rate images, making it easier for doctors to diagnose diseases such as vascular stenosis and coronary heart disease.
Figure CN122244589A_ABST
Description
Technical Field
[0001] This application generally relates to the field of medical image processing technology. More specifically, this application relates to a training method for an image intermediate frame generation model and a method for generating image intermediate frames.
Background Technology
[0002] Digital subtraction angiography (DSA) is currently one of the gold standard techniques for the clinical diagnosis of vascular diseases such as intracranial vascular stenosis and coronary artery disease. Its basic principle is to digitally subtract images taken before and after contrast agent injection to eliminate images of soft tissues such as bones and muscles, ultimately obtaining a pure vascular image. In the diagnosis and interventional treatment of intracranial vascular stenosis and coronary artery disease, DSA imaging can provide crucial information on vascular morphology and hemodynamics.
[0003] However, traditional DSA technology has significant limitations in clinical applications. Specifically, in order to control the dosage of contrast agent, DSA equipment used to capture DSA images typically uses a low frame rate (e.g., 6-15 frames / second) for image acquisition. This may lead to the loss of motion information in lesion areas with high blood flow velocity, making it difficult to capture subtle hemodynamic changes and accurately display the dynamic process of blood flow through the diseased blood vessels, thus affecting the doctor's judgment of the degree of lesion.
[0004] In view of this, there is an urgent need to provide a training method for an image intermediate frame generation model, so as to train a model that can interpolate frames in images acquired at a lower frame rate and convert low frame rate images into high frame rate images, thereby facilitating doctors' diagnosis.
Summary of the Invention
[0005] In order to at least solve one or more of the technical problems mentioned above, this application proposes a training method for an image intermediate frame generation model and a method for generating image intermediate frames in several aspects.
[0006] In a first aspect, this application provides a training method for an image intermediate frame generation model, comprising: acquiring multiple DSA training image sequences, wherein the acquisition frame rate of the DSA training image sequences is greater than or equal to a set frame rate; preprocessing each image in each DSA training image sequence to obtain a DSA optimized image sequence; for each DSA optimized image sequence, inputting the image of the nth frame and the image of the (n+2)th frame in the DSA optimized image sequence into a neural network model to obtain an intermediate frame image; calculating a target loss value based on the intermediate frame image and the image of the (n+1)th frame, and adjusting the model parameters of the neural network model based on the target loss value, and returning to execute the step of inputting the image of the nth frame and the image of the (n+2)th frame in the DSA optimized image sequence into the neural network model to obtain an intermediate frame image, until a first set condition is met to obtain a trained image intermediate frame generation model.
[0007] In some embodiments, each image in each DSA training image sequence undergoes at least one of the following preprocessing steps to obtain a DSA optimized image sequence: filtering each image in each DSA training image sequence; extracting a region of interest in each image in each DSA training image sequence; obtaining a reference image for each DSA training image sequence, and performing inter-frame registration processing on each image in the DSA training image sequence based on the reference image.
[0008] In some embodiments, the neural network model includes: an optical flow estimation module, an optical flow encoder, an image encoder, a feature fusion module, and a decoder; the step of inputting the image of the nth frame and the image of the (n+2)th frame in the DSA optimized image sequence into the neural network model to obtain an intermediate frame image includes: inputting the image of the nth frame and the image of the (n+2)th frame into the optical flow estimation module to obtain a target optical flow map, the target optical flow map being used to characterize the pixel motion trajectory from the image of the nth frame to the image of the (n+2)th frame; inputting the target optical flow map into the optical flow encoder for feature extraction to obtain optical flow features; inputting the image of the nth frame and the image of the (n+2)th frame into the image encoder for feature extraction to obtain feature information of the image of the nth frame and feature information of the image of the (n+2)th frame; inputting the optical flow features, the feature information of the image of the nth frame, and the feature information of the image of the (n+2)th frame together into the feature fusion module for feature fusion to obtain fused features; and inputting the fused features and the optical flow features into the decoder, so that the decoder generates the intermediate frame image based on the fused features with the optical flow features as spatial constraints.
[0009] In some embodiments, the optical flow estimation module includes: a convolutional encoder, a warp layer, a cost volume layer, an optical flow decoder, and a context network module; the step of inputting the image of the nth frame and the image of the (n+2)th frame into the optical flow estimation module to obtain a target optical flow map includes: inputting the image of the nth frame and the image of the (n+2)th frame into the convolutional encoder for downsampling processing to obtain multiple feature map pairs; each feature map pair includes a first feature map and a second feature map in one-to-one correspondence, wherein the first feature map is a feature map of the image of the nth frame and the second feature map is a feature map of the image of the (n+2)th frame; the first feature map and the second feature map in each feature map pair have the same resolution, and different feature map pairs have different resolutions; for the first feature map pair among the feature map pairs, inputting the second feature map in the first feature map pair into the warp layer for alignment processing to obtain a second feature map aligned with the first feature map in the first feature map pair; inputting the aligned second feature map and the first feature map into the cost volume layer, in which a similarity map between the aligned second feature map and the first feature map is calculated; inputting the initial optical flow map, the similarity map, the first feature map, and the aligned second feature map into the optical flow decoder to obtain an optical flow residual map; updating the initial optical flow map using the optical flow residual map, and inputting the updated initial optical flow map into the context network module for optimization to obtain an optimized initial optical flow map; using the optimized initial optical flow map as the new initial optical flow map, and returning to the step of inputting the second feature map in the first feature map pair into the warp layer for alignment processing, until a second set condition is met, to obtain a candidate optical flow map; and upsampling the candidate optical flow map, and using the optical flow map obtained by the upsampling as the new initial optical flow map to process the next feature map pair among the feature map pairs, until all feature map pairs have been processed, to obtain the target optical flow map.
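The text does not state how the cost volume layer measures similarity. One common choice in PWC-style optical flow networks is a correlation between the first feature map and displaced copies of the aligned second feature map. The following is a minimal sketch under that assumption; the function name `cost_volume`, the search range `max_disp`, and the channel-mean normalization are illustrative and not taken from the application.

```python
import numpy as np

def cost_volume(feat1, feat2_aligned, max_disp=1):
    """Correlation-style similarity between feat1 and shifted copies of feat2_aligned.

    feat1, feat2_aligned: arrays of shape (C, H, W).
    Returns an array of shape ((2 * max_disp + 1) ** 2, H, W), one channel
    per candidate displacement (dy, dx) in the search window.
    """
    _, h, w = feat1.shape
    # Zero-pad the spatial dimensions so every displacement stays in bounds.
    padded = np.pad(feat2_aligned, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    channels = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = padded[:, dy:dy + h, dx:dx + w]
            # Mean over the feature channels of the element-wise product.
            channels.append(np.mean(feat1 * shifted, axis=0))
    return np.stack(channels, axis=0)

feat = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
volume = cost_volume(feat, feat, max_disp=1)
```

With `max_disp=1` the window holds nine displacements, and the channel at index 4 corresponds to zero displacement.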
[0010] In some embodiments, calculating the target loss value based on the intermediate frame image and the (n+1)th frame image includes: calculating the pixel difference based on the intermediate frame image and the (n+1)th frame image to obtain a pixel loss value; inputting the intermediate frame image and the (n+1)th frame image into a pre-trained perceptual feature extraction network for feature extraction to obtain a first perceptual feature map and a second perceptual feature map; calculating the perceptual feature difference based on the first perceptual feature map and the second perceptual feature map to obtain a perceptual loss value; and obtaining the target loss value based on the pixel loss value and the perceptual loss value.
[0011] In some embodiments, the pixel loss value is calculated using the following formula: $L_{pixel} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{I}_i - I_i\right|$; where $L_{pixel}$ represents the pixel loss value; $\hat{I}_i$ represents the value of the i-th pixel in the intermediate frame image; $I_i$ represents the value of the i-th pixel in the image of the (n+1)th frame; $|\cdot|$ represents taking the absolute value; and $N$ represents the number of pixels in the intermediate frame image and in the image of the (n+1)th frame; and / or, the perceptual loss value is calculated using the following formula: $L_{perc} = \frac{1}{M}\sum_{i,j,k}\left|\hat{F}_{i,j,k} - F_{i,j,k}\right|$; where $L_{perc}$ represents the perceptual loss value; $\hat{F}_{i,j,k}$ represents the value at position (i, j, k) of the first perceptual feature map; $F_{i,j,k}$ represents the value at position (i, j, k) of the second perceptual feature map; and $M$ represents the number of elements in the perceptual feature map.
[0012] In a second aspect, this application provides a method for generating intermediate frames of an image, comprising: acquiring a DSA image sequence of a frame to be interpolated, wherein the acquisition frame rate of the DSA image sequence of the frame to be interpolated is less than a set frame rate; preprocessing the DSA image sequence of the frame to be interpolated to obtain a target DSA image sequence of the frame to be interpolated; inputting each adjacent image pair in the target DSA image sequence of the frame to be interpolated into a trained intermediate frame generation model to obtain intermediate frame images of the adjacent image pairs; wherein the intermediate frame generation model is trained based on the training method of the intermediate frame generation model described in the first aspect or any of the embodiments of the first aspect.
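As a rough illustration of the generation method, inserting one generated frame between every adjacent pair turns a sequence of N frames into 2N - 1 frames. The sketch below assumes the trained model is available as a callable `generate_intermediate(prev, next)`; that name, and the simple averaging stand-in used in the example, are hypothetical and only serve to show the interleaving.

```python
def interpolate_sequence(frames, generate_intermediate):
    """Insert one generated frame between every adjacent pair of frames."""
    if not frames:
        return []
    output = [frames[0]]
    for prev_frame, next_frame in zip(frames, frames[1:]):
        # The generated frame sits between the two original frames.
        output.append(generate_intermediate(prev_frame, next_frame))
        output.append(next_frame)
    return output

def average_stand_in(a, b):
    """Placeholder for the trained model: a plain average of two values."""
    return (a + b) / 2

dense = interpolate_sequence([0.0, 2.0, 4.0], average_stand_in)
```

Here scalar values stand in for images; with real data each element would be a preprocessed DSA frame.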
[0013] In some embodiments, after obtaining the intermediate frame image of the adjacent image pair, the method further includes: performing edge detection on the intermediate frame image; and using an edge enhancement algorithm to enhance the detected edges to obtain an optimized intermediate frame image.
[0014] In a third aspect, this application provides an electronic device, comprising: a processor configured to execute program instructions; and a memory configured to store the program instructions, which, when loaded and executed by the processor, cause the processor to perform a training method for an intermediate image frame generation model according to the first aspect or any embodiments thereof, or a method for generating intermediate images according to the second aspect or any embodiments thereof.
[0015] In a fourth aspect, this application provides a computer-readable storage medium storing program instructions that, when loaded and executed by a processor, cause the processor to perform a training method for an intermediate image frame generation model according to the first aspect or any embodiments thereof, or a method for generating intermediate images according to the second aspect or any embodiments thereof.
[0016] Using the training method and generation method of the image intermediate frame generation model provided above, this embodiment acquires multiple high-frame-rate DSA training image sequences, then preprocesses each image in each DSA training image sequence to obtain an optimized DSA image sequence. The nth and (n+2)th frames of each optimized DSA image sequence are input into a neural network model, enabling the model to autonomously learn complex prior knowledge of vascular movement from a large amount of high-frame-rate data, obtaining intermediate frame images. Then, a target loss value is calculated based on the intermediate frame image and the (n+1)th frame image, and the model parameters of the neural network model are adjusted based on the target loss value. The above steps are repeated to train the model until the target loss value meets the set loss value condition, resulting in a trained image intermediate frame generation model. The trained image intermediate frame generation model performs frame interpolation processing on low-frame-rate DSA image sequences, enabling frame interpolation in images acquired at a lower acquisition frame rate, converting low-frame-rate images into high-frame-rate images, facilitating diagnosis by doctors.
Attached Figure Description
[0017] The above and other objects, features, and advantages of exemplary embodiments of this application will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of this application are illustrated by way of example and not limitation, and the same or corresponding reference numerals denote the same or corresponding parts, wherein:
[0018] Figure 1 shows an exemplary flowchart of a training method for an image intermediate frame generation model according to some embodiments of this application;
Figure 2 shows an exemplary structure of a neural network model according to some embodiments of this application;
Figure 3 shows an exemplary structure of an optical flow estimation module according to some embodiments of this application;
Figure 4 shows an exemplary flowchart of a method for generating intermediate frames of an image according to some embodiments of this application;
Figure 5 shows an exemplary flowchart of intermediate frame image generation according to an embodiment of this application;
Figure 6 shows an exemplary structural block diagram of a training apparatus for an image intermediate frame generation model according to some embodiments of this application;
Figure 7 shows an exemplary structural block diagram of an apparatus for generating intermediate frames of images according to some embodiments of this application;
Figure 8 shows an exemplary structural block diagram of an electronic device according to some embodiments of this application.
Detailed Implementation
[0019] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0020] It should be understood that the terms "comprising" and "including" used in the specification and claims of this application indicate the presence of the described features, integers, steps, operations, elements and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and / or groups thereof.
[0021] It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this specification and claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used in this specification and claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes such combinations.
[0022] As used in this specification and claims, the term "if" may be interpreted, depending on the context, as "when," "once," "in response to determination," or "in response to detection." Similarly, the phrase "if determined" or "if [described condition or event] is detected" may be interpreted, depending on the context, as "once determined," "in response to determination," "once [described condition or event] is detected," or "in response to detection of [described condition or event]."
[0023] The specific embodiments of this application will now be described in detail with reference to the accompanying drawings.
Exemplary application scenarios
[0024] For the clinical diagnosis of vascular diseases such as intracranial vascular stenosis and coronary heart disease, DSA technology is often used. However, traditional DSA technology has significant limitations in clinical application. Specifically, in order to control the dosage of contrast agent, DSA equipment used to capture DSA images usually uses a low acquisition frame rate (e.g., 6-15 frames / second). This may lead to the loss of motion information in lesion areas with fast blood flow, making it difficult to capture subtle hemodynamic changes and accurately display the dynamic process of blood flow through the diseased blood vessel, thus affecting the doctor's judgment of the degree of lesion.
[0025] In view of this, there is an urgent need to provide a training method for an image intermediate frame generation model and a method for generating image intermediate frames, so as to train an image intermediate frame generation model that can interpolate frames in images acquired at a lower frame rate and convert low frame rate images into high frame rate images, which is convenient for doctors to diagnose.
[0026] Figure 1 shows an exemplary flowchart of a training method 100 for an image intermediate frame generation model according to some embodiments of this application. It is understood that the training method 100 can be executed by any suitable device with data processing capabilities, including but not limited to terminal devices, processors, and servers.
[0027] As shown in Figure 1, the training method 100 for the above-mentioned image intermediate frame generation model includes:
Step S110: acquiring multiple DSA training image sequences, wherein the acquisition frame rate of the DSA training image sequences is greater than or equal to a set frame rate;
Step S120: preprocessing each image in each DSA training image sequence to obtain a DSA optimized image sequence;
Step S130: for each DSA optimized image sequence, inputting the image of the nth frame and the image of the (n+2)th frame in the DSA optimized image sequence into a neural network model to obtain an intermediate frame image;
Step S140: calculating a target loss value based on the intermediate frame image and the image of the (n+1)th frame, adjusting the model parameters of the neural network model based on the target loss value, and returning to the step of inputting the image of the nth frame and the image of the (n+2)th frame in the DSA optimized image sequence into the neural network model to obtain an intermediate frame image, until the first set condition is met, to obtain a trained image intermediate frame generation model.
[0028] For example, the DSA training image sequence in step S110 above refers to a collection of consecutive frames of clinically acquired DSA images, which may contain core feature information such as vascular morphology and blood flow motion, and is the raw data for model training. For example, the above-mentioned DSA training image sequence is a DSA image sequence of a patient with intracranial vascular stenosis. Specifically, the DSA training image sequence in this embodiment is a 2D-DSA image sequence.
[0029] In this embodiment, the aforementioned DSA training image sequences can be obtained through methods such as clinical DSA equipment acquisition and retrieval from medical image databases; this embodiment does not impose specific limitations on this method. The number of the aforementioned DSA training image sequences can be multiple, for example, 100, which can cover medical records with different locations of vascular stenosis, different degrees of lesions, and different blood flow velocities, ensuring the diversity of training data.
[0030] It should be noted that all 100 DSA training image sequences mentioned above can be used for model training. Alternatively, these 100 DSA training image sequences can be divided into training, validation, and test sets in a ratio of, for example, 7:2:1. The model can be trained on the training set, and the training effect can be checked on the validation set during training. Hyperparameters (e.g., learning rate, batch size, number of iterations, etc.) can be adjusted based on the validation results to avoid overfitting. After the model training is completed, the model performance can be evaluated on the test set.
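The 7:2:1 division can be sketched as follows. This is a minimal illustration, assuming each sequence is identified by an integer ID and that the split is done per sequence rather than per frame, so that frames from one acquisition do not end up in more than one subset. The function name `split_sequences` and the fixed seed are illustrative choices.

```python
import random

def split_sequences(sequence_ids, ratios=(7, 2, 1), seed=0):
    """Shuffle sequence IDs and split them into train/validation/test subsets."""
    ids = list(sequence_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # any remainder goes to the test subset
    return train, val, test

train_ids, val_ids, test_ids = split_sequences(range(100))
```

For 100 sequences this yields 70 training, 20 validation, and 10 test sequences.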
[0031] For example, the acquisition frame rate in step S110 above refers to the number of image frames acquired per second. The higher the frame rate, the more detailed the capture of vascular movement and blood flow changes. In this embodiment, the acquisition frame rate of the DSA training image sequence is greater than or equal to a set frame rate. The set frame rate here is a critical value that distinguishes between high and low frame rate DSA images. Its specific value can be determined according to actual needs and application scenarios, for example, it can be set to 15 frames / second. Furthermore, the acquisition time of each DSA training image sequence can be set to be greater than or equal to a set time (e.g., 5 seconds).
[0032] In summary, step S110 above requires obtaining multiple DSA training image sequences with high frame rates as training data for the image intermediate frame generation model.
[0033] For example, the DSA optimized image sequence in step S120 above refers to the DSA training image sequence after preprocessing to eliminate noise, pose deviation and irrelevant background interference. It retains the core features of blood vessels and improves the efficiency and accuracy of model training.
[0034] Specifically, preprocessing each image in each DSA training image sequence can include at least one of the following: filtering, region of interest extraction, and inter-frame registration. Examples of each preprocessing method are described below.
Filtering: For each DSA training image sequence, each image in the sequence is filtered. Specifically, image filters such as a median filter or a Gaussian filter can be applied to each image in the DSA training image sequence to suppress random noise, making the blood vessel edges smoother and the background cleaner.
[0035] Of course, in the embodiments of this application, a pre-trained denoising model (e.g., the lightweight U-Net framework, which can be pre-trained on other DSA datasets) can also be used to denoise each image in the DSA training image sequence in order to preserve the blood vessel edges to the maximum extent while suppressing noise.
[0036] It should be noted that the embodiments of this application can simultaneously use image filters and pre-trained denoising models to preprocess each image in the DSA training image sequence.
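As an illustration of the filtering option, the sketch below applies a 3x3 median filter to a single grayscale image with NumPy. The window size, the edge-replication border handling, and the function name `median_filter_3x3` are assumptions made for the example; the application does not fix any of them.

```python
import numpy as np

def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D image; borders use edge replication."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    # Collect the nine shifted views of the padded image, one per window offset.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    # The per-pixel median over the nine views is the filtered value.
    return np.median(windows, axis=0).astype(image.dtype)

noisy = np.full((5, 5), 10, dtype=np.uint8)
noisy[2, 2] = 255  # a single impulse-noise pixel
clean = median_filter_3x3(noisy)
```

An isolated bright pixel is replaced by the median of its neighborhood, which is why median filtering is often chosen for impulse-like noise.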
[0037] Region of Interest (ROI) Extraction: For each DSA training image sequence, a region of interest (ROI) extraction operation is performed on each image in that sequence. Specifically, DSA training images typically contain some background tissue (e.g., bone, soft tissue subtraction residues). Therefore, conventional vessel segmentation algorithms or manual extraction methods can be used to extract the ROI from each image in the DSA training image sequence, allowing subsequent processing to focus on the ROI and reducing computational load.
[0038] Inter-frame registration: For each DSA training image sequence, a reference image for that DSA training image sequence is acquired, and inter-frame registration processing is performed on each image in the DSA training image sequence based on the reference image. In this embodiment, when the DSA device acquires a patient's DSA image sequence, the patient may move slightly (for example, in emergency care such as cerebral hemorrhage or cerebral infarction, it is often difficult to keep the patient still), causing an overall shift in the position of the same blood vessel in different image frames and affecting the display of blood vessels in the image. Therefore, an inter-frame registration method can be used to reduce pose deviations caused by patient movement. In specific implementation, a reference image can be acquired first. This reference image can be a frame selected from the DSA training image sequence that has high imaging quality and a stable pose. Then, each image in the DSA training image sequence is registered to the reference image. Specifically, stable features such as vascular bifurcation points can be manually extracted from the reference image and from the image to be registered. Then, rigid or affine inter-frame registration can be performed using conventional feature point matching algorithms such as scale-invariant feature transform (SIFT) or oriented FAST and rotated BRIEF (ORB), to eliminate global displacement caused by slight patient movements.
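Once matched feature point pairs are available, an affine transform that maps the image to be registered onto the reference image can be estimated by least squares. The sketch below assumes the matching step (for example with SIFT or ORB) has already produced corresponding points; the function name `estimate_affine` and the sample coordinates are hypothetical.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix A such that dst is approximately A @ [x, y, 1]."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Homogeneous design matrix of shape (K, 3): one row [x, y, 1] per point.
    design = np.hstack([src, np.ones((len(src), 1))])
    solution, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape (3, 2)
    return solution.T  # shape (2, 3)

# Matched points: the moving frame is shifted by (+3, -2) relative to the reference.
ref_pts = [(10.0, 10.0), (50.0, 12.0), (30.0, 40.0), (45.0, 45.0)]
moving_pts = [(x + 3.0, y - 2.0) for x, y in ref_pts]
affine = estimate_affine(moving_pts, ref_pts)
```

At least three non-collinear point pairs are needed for the affine case. In practice a robust estimator that rejects mismatched points would usually be preferred over plain least squares.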
[0039] For example, the neural network model in step S130 above refers to a network model used for DSA image frame interpolation, which uses the U-Net model framework as its basic framework and also incorporates an IRR-PWC optical flow network structure. The neural network model in this embodiment may include an optical flow estimation module, an optical flow encoder, an image encoder, a feature fusion module, and a decoder (see Figure 2).
[0040] In this embodiment, the image of the nth frame, the image of the (n+1)th frame, and the image of the (n+2)th frame are three consecutive frames on the time axis. For example, when n=1, the input is the image of the 1st frame and the image of the 3rd frame, and the expected output is the image of the 2nd frame.
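The way training samples are formed from three consecutive frames can be sketched as follows. The helper `make_triplets` is a hypothetical name, and strings stand in for preprocessed frames.

```python
def make_triplets(sequence):
    """Build (frame n, frame n+2, frame n+1) samples: two inputs and the target."""
    return [
        (sequence[i], sequence[i + 2], sequence[i + 1])
        for i in range(len(sequence) - 2)
    ]

frames = ["frame1", "frame2", "frame3", "frame4", "frame5"]
triplets = make_triplets(frames)
```

For the first sample the inputs are frame 1 and frame 3 and the expected output is frame 2, matching the n=1 example in the text. A sequence of K frames yields K - 2 samples.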
[0041] In step S130, during the training of the neural network model, the nth frame and the (n+2)th frame of the DSA-optimized image sequence are input into the neural network model. The neural network model performs complex calculations to predict the intermediate frame image between the nth frame and the (n+2)th frame. The specific steps for inputting the nth frame and the (n+2)th frame of the DSA-optimized image sequence into the neural network model to obtain the intermediate frame image are described in the following embodiments and will not be repeated here.
[0042] For example, the target loss value in step S140 above can be used to measure the difference between the generated intermediate frame image (i.e., the generated (n+1)th frame) and the real (n+1)th frame image. The smaller the loss value, the closer the generated intermediate frame image is to the real (n+1)th frame image.
[0043] Specifically, the target loss value can be calculated using at least one loss function, such as a pixel-level loss function, a perceptual loss function, or a weighted average of the pixel-level loss function and the perceptual loss function. This application does not specifically limit the calculation method of the target loss value, but can determine it according to the actual situation.
[0044] In one specific embodiment of this application, the target loss value can be calculated by a weighted average of a pixel-level loss function and a perceptual loss function. Specifically, the pixel loss value is obtained by calculating the pixel difference between the intermediate frame image and the (n+1)th frame image; the intermediate frame image and the (n+1)th frame image are respectively input into a pre-trained perceptual feature extraction network for feature extraction to obtain a first perceptual feature map and a second perceptual feature map; the perceptual feature difference is calculated based on the first perceptual feature map and the second perceptual feature map to obtain a perceptual loss value; and the target loss value is obtained based on the pixel loss value and the perceptual loss value.
[0045] For example, the pixel difference calculation mentioned above refers to directly calculating the difference in pixel grayscale values between the intermediate frame image and the actual (n+1)th frame image. Specifically, the pixel difference calculation can be an L1 loss function. Based on this, the pixel loss value is the L1 loss value, which can reflect the degree of difference between the two frames at the original pixel level; the smaller the value, the higher the pixel consistency.
[0046] Specifically, the pixel loss value mentioned above can be calculated using the following formula: $L_{pixel} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{I}_i - I_i\right|$
[0047] where $L_{pixel}$ represents the pixel loss value; $\hat{I}_i$ represents the value of the i-th pixel in the intermediate frame image; $I_i$ represents the value of the i-th pixel in the image of the (n+1)th frame; $|\cdot|$ represents the absolute value; and $N$ represents the number of pixels in the intermediate frame image and in the image of the (n+1)th frame.
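The L1 pixel loss described above, taken as the mean absolute difference over all pixels, can be sketched with NumPy as follows. The function name `pixel_l1_loss` and the small example arrays are illustrative only.

```python
import numpy as np

def pixel_l1_loss(generated, real):
    """Mean absolute difference between the generated frame and the real frame."""
    generated = np.asarray(generated, dtype=np.float64)
    real = np.asarray(real, dtype=np.float64)
    return float(np.mean(np.abs(generated - real)))

generated_frame = np.array([[0.0, 0.5], [1.0, 0.25]])
real_frame = np.array([[0.0, 1.0], [0.5, 0.25]])
loss = pixel_l1_loss(generated_frame, real_frame)
```

In this example two of the four pixels differ by 0.5, so the loss is 0.25.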
[0048] The aforementioned perceptual feature extraction network refers to a pre-trained neural network used to extract high-level visual semantic features of images. In this embodiment, the perceptual feature extraction network can be a VGG16 network.
[0049] In this embodiment, the intermediate frame image and the (n+1)th frame image are respectively input into a trained perceptual feature extraction network for feature extraction, resulting in a first perceptual feature map and a second perceptual feature map. Here, the first perceptual feature map is a high-level semantic feature map obtained by the perceptual feature extraction network after extracting the intermediate frame image; the second perceptual feature map is a high-level semantic feature map obtained by the perceptual feature extraction network after extracting the real (n+1)th frame image.
[0050] For example, the aforementioned perceptual feature difference refers to the feature difference between the first perceptual feature map and the second perceptual feature map, which is a quantitative measure of the consistency of two images at the visual semantic level. Specifically, the perceptual feature difference calculation can also use the L1 loss function. Based on this, the perceptual loss value is also an L1 loss value, which can reflect the degree of difference between two images at the visual semantic level; the smaller the value, the higher the semantic consistency.
[0051] Specifically, the aforementioned perceptual loss value can be calculated using the following formula:
$$L_{perc} = \frac{1}{M}\sum_{i,j,k}\left|F^{(1)}_{i,j,k} - F^{(2)}_{i,j,k}\right|$$
[0052] where $L_{perc}$ denotes the perceptual loss value; $F^{(1)}_{i,j,k}$ denotes the value at position (i, j, k) of the first perceptual feature map; $F^{(2)}_{i,j,k}$ denotes the value at position (i, j, k) of the second perceptual feature map; and M denotes the number of pixels in the perceptual feature map.
[0053] In this embodiment of the application, after the pixel loss value and the perceptual loss value are calculated, a weighted average of the two can be computed to obtain the target loss value.
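As a minimal sketch of the weighted combination, the following NumPy code computes both L1 terms and their weighted average. The weights `w_pixel` and `w_perc` are hypothetical (the application gives no values), and `toy_features` is a stand-in horizontal-gradient extractor used only so the example runs without a pre-trained VGG16 network.

```python
from typing import Callable

import numpy as np


def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference between two arrays of equal shape."""
    return float(np.mean(np.abs(a - b)))


def target_loss(pred: np.ndarray, target: np.ndarray,
                features: Callable[[np.ndarray], np.ndarray],
                w_pixel: float = 0.5, w_perc: float = 0.5) -> float:
    """Weighted average of the pixel L1 loss and the perceptual L1 loss."""
    pixel = l1(pred, target)
    perceptual = l1(features(pred), features(target))
    return (w_pixel * pixel + w_perc * perceptual) / (w_pixel + w_perc)


def toy_features(img: np.ndarray) -> np.ndarray:
    """Stand-in for the perceptual network (assumption): horizontal gradient."""
    return np.diff(img, axis=1)


pred = np.array([[0.0, 1.0], [2.0, 4.0]])
target = np.zeros((2, 2))
print(target_loss(pred, target, toy_features))  # 1.625
```

In a real training setup the `features` callable would wrap the frozen perceptual feature extraction network, and the weights would be tuned on validation data.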
[0054] In this embodiment of the application, after the target loss value is calculated, the model parameters of the neural network model (e.g., the weights of the convolution kernel, the bias value, the learning rate, etc.) can be adjusted based on the target loss value through the backpropagation algorithm. Then, the steps S130 and S140 are repeated to continuously adjust the model parameters until the first set condition is met, and a trained image intermediate frame generation model is obtained.
[0055] In this embodiment, when training the image intermediate frame generation model, the initial learning rate can be set to 1e-4. During training, the Adam optimizer can be used to continuously adjust the model parameters of the neural network model based on the target loss value, and the learning rate can be adjusted using a cosine annealing strategy, ultimately obtaining a well-trained image intermediate frame generation model.
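The cosine annealing strategy mentioned above can be sketched in plain Python as follows. The initial rate of 1e-4 comes from the text; the minimum rate `lr_min`, the total step count, and the function name are assumptions, and this is one common form of the schedule rather than necessarily the one used by the applicant.

```python
import math


def cosine_annealed_lr(step: int, total_steps: int,
                       lr_init: float = 1e-4, lr_min: float = 0.0) -> float:
    """Learning rate decayed from lr_init to lr_min along a half cosine."""
    if total_steps <= 0 or not 0 <= step <= total_steps:
        raise ValueError("require total_steps > 0 and 0 <= step <= total_steps")
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_init - lr_min) * cos_term


# Learning rate at the start, middle, and end of a hypothetical 100-step run.
for s in (0, 50, 100):
    print(s, cosine_annealed_lr(s, total_steps=100))
```

The rate starts at the initial value, falls slowly at first, fastest around the midpoint, and flattens out as it approaches the minimum.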
[0056] The first set condition mentioned above can take many forms. For example, the target loss value converges, the target loss value is less than or equal to a set value (e.g., 0.8), or the number of iterations reaches a set number (e.g., 100). The embodiments of this application do not specifically limit this.
[0057] This application embodiment acquires multiple high-frame-rate DSA training image sequences, then preprocesses each image in each DSA training image sequence to obtain an optimized DSA image sequence. The nth and (n+2)th frames of each optimized DSA image sequence are input into a neural network model, enabling the model to autonomously learn complex prior knowledge of vascular motion from a large amount of high-frame-rate data, obtaining intermediate frame images. Then, a target loss value is calculated based on the intermediate frame images and the (n+1)th frame image, and the model parameters of the neural network model are adjusted based on the target loss value. The above steps are repeated to train the model until the target loss value meets the set loss value condition, resulting in a trained intermediate frame generation model. The trained intermediate frame generation model performs frame interpolation on low-frame-rate DSA image sequences, enabling frame interpolation in images acquired at lower frame rates, converting low-frame-rate images into high-frame-rate images, facilitating diagnosis by doctors.
[0058] The training process of the neural network model is described below with reference to the structural diagram of the neural network model shown in Figure 2. As an optional embodiment of this application, step S130 above, which involves inputting the image of the nth frame and the image of the (n+2)th frame in the DSA optimized image sequence into a neural network model to obtain an intermediate frame image, includes: inputting the image of the nth frame and the image of the (n+2)th frame into an optical flow estimation module to obtain a target optical flow map, which is used to characterize the pixel motion trajectory from the image of the nth frame to the image of the (n+2)th frame; inputting the target optical flow map into an optical flow encoder for feature extraction to obtain optical flow features; inputting the image of the nth frame and the image of the (n+2)th frame into an image encoder for feature extraction to obtain feature information of the image of the nth frame and feature information of the image of the (n+2)th frame; inputting the optical flow features, the feature information of the image of the nth frame, and the feature information of the image of the (n+2)th frame together into a feature fusion module for feature fusion to obtain fused features; and inputting the fused features and the optical flow features into a decoder so that the decoder generates an intermediate frame image based on the fused features, with the optical flow features as a spatial constraint.
[0059] For example, embodiments of this application can employ the IRR-PWC optical flow algorithm to analyze the motion trajectory of blood vessels between consecutive image frames. This algorithm can capture the overall displacement of large blood vessels at a coarse scale and capture the subtle movements of small blood vessels at a fine scale. In specific implementation, optical flow estimation is performed at the top layer of the pyramid (i.e., the lowest resolution), and then the coarse optical flow result from the previous layer is upsampled and passed to the next layer (i.e., the higher resolution layer). Simultaneously, a more refined feature map is used to perform residual correction on the coarse result from the previous layer, thereby capturing the subtle movements and edge details of small blood vessels.
[0060] Specifically, as shown in Figure 2, the images of frame n and frame (n+2) are input into the optical flow estimation module (i.e., the IRR-PWC optical flow network structure) for IRR-PWC optical flow analysis to obtain the target optical flow map. This target optical flow map is an image that characterizes the pixel motion trajectory from frame n to frame (n+2), and can include the motion direction and distance of each pixel.
[0061] As shown in Figure 2, after the optical flow estimation module outputs the target optical flow map, the target optical flow map is input into the optical flow encoder for feature extraction to obtain optical flow features. Here, the optical flow encoder refers to the neural network module that extracts features from the optical flow map; a convolutional neural network (CNN) can be used. These optical flow features can characterize the high-dimensional features of blood vessel motion.
[0062] As shown in Figure 2, while the images of the nth frame and the (n+2)th frame are input into the optical flow estimation module, they can also be input into the image encoder for feature extraction to obtain the feature information of the nth frame and of the (n+2)th frame. The image encoder is used to extract features from the images. In this embodiment, the image encoder can be constructed using a U-Net convolutional neural network (which may include downsampling and skip connections), capable of extracting spatial features such as blood vessel morphology, grayscale distribution, and tissue texture from the images, and then outputting multi-scale feature maps (i.e., the feature information of the nth frame and the (n+2)th frame). The number of channels in the feature maps can be, for example, 64/128/256, and the spatial size gradually decreases with downsampling.
[0063] Then, the extracted optical flow features, the feature information of the nth frame, and the feature information of the (n+2)th frame are input together into the feature fusion module for feature fusion. That is, they are directly concatenated along the channel dimension, which combines the motion features and the spatial features to obtain the fused features.
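Fusion along the channel dimension can be illustrated with a short NumPy sketch. The (channels, height, width) layout, the 32-channel optical flow feature, and the 64×64 spatial size are assumptions chosen for illustration; only the 64-channel image features echo the example channel counts in the text.

```python
import numpy as np


def fuse_features(flow_feat: np.ndarray, feat_n: np.ndarray,
                  feat_n2: np.ndarray) -> np.ndarray:
    """Concatenate (C, H, W) feature maps along the channel axis."""
    spatial_sizes = {f.shape[1:] for f in (flow_feat, feat_n, feat_n2)}
    if len(spatial_sizes) != 1:
        raise ValueError("all feature maps must share the same H and W")
    return np.concatenate([flow_feat, feat_n, feat_n2], axis=0)


flow_feat = np.zeros((32, 64, 64))  # optical flow features (hypothetical size)
feat_n = np.ones((64, 64, 64))      # features of the nth frame
feat_n2 = np.ones((64, 64, 64))     # features of the (n+2)th frame
fused = fuse_features(flow_feat, feat_n, feat_n2)
print(fused.shape)  # (160, 64, 64)
```

Concatenation keeps every input channel intact, so the decoder that follows can learn how to weight motion and appearance information.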
[0064] Finally, the fused features and the optical flow features are input into the decoder, so that the decoder generates the intermediate frame image (i.e., T+1 in Figure 2) based on the fused features, using the optical flow features as spatial constraints. Here, the decoder refers to the neural network module that generates high-resolution intermediate frame images based on the fused features. In this embodiment, a series of transposed convolutional structures can be used to gradually restore the fused high-dimensional features into a pixel-level intermediate frame image with the same size as the input image. During decoding, the decoder also uses the optical flow features as spatial constraints to guide the position and motion trend of each pixel in the intermediate frame image, ensuring that the generated intermediate frame image conforms to the blood vessel motion pattern between the two frames and avoiding pixel position distortion.
[0065] This application embodiment integrates an IRR-PWC optical flow network structure into the U-Net network architecture, which can accurately extract the pixel-level motion trajectory of two frames of images, providing a motion basis that conforms to hemodynamics for the generation of intermediate frame images and avoiding vascular motion distortion in the generated intermediate frames. At the same time, through the design of dual encoders, optical flow features and spatial features of the images are extracted separately, which can comprehensively capture the core information of DSA images. Compared with single feature extraction, the richness and accuracy of features are improved. Furthermore, the decoder generates intermediate frames with optical flow features as spatial constraints, which can effectively constrain the spatial position of pixels, ensuring that the vascular motion of the generated intermediate frames is continuous and consistent with that of the preceding and following frames, thus improving the smoothness of frame interpolation.
[0066] The calculation process of the optical flow estimation module is described below with reference to the structural example diagram of the optical flow estimation module shown in Figure 3. As a specific embodiment of this application, as shown in Figure 3, the optical flow estimation module described above may include: a convolutional encoder, a warp layer, a cost volume layer, an optical flow decoder, and a context network module.
[0067] The convolutional encoder refers to the module in the optical flow estimation module used to downsample and extract features from the input image. In this embodiment, the convolutional encoder can be a CNN with shared weights, which includes convolutional layers and downsampling layers, and can construct a multi-scale feature pyramid for the two input frames.
[0068] The warp layer refers to the functional layer in the optical flow estimation module used to perform pixel-level coordinate transformation on the feature map. It can align two feature maps in spatial position based on the current optical flow estimation result.
[0069] The cost volume layer is a functional layer used to calculate the pixel correlation between two feature maps. It can generate a three-dimensional cost volume (i.e., a similarity map) and encode pixel-level matching similarity information.
[0070] The optical flow decoder refers to the module used to correct and optimize the optical flow estimation results. It can output an optical flow residual map based on information such as similarity map and feature map, and update the current optical flow map. In the embodiments of this application, it can also be implemented using a CNN with shared weights.
[0071] The context network module is used to optimize and smooth the updated optical flow map, which can improve the accuracy of the optical flow map and eliminate local noise and discontinuities in it.
[0072] Specifically, inputting the image of frame n and the image of frame (n+2) into the optical flow estimation module to obtain the target optical flow map includes: inputting the image of frame n and the image of frame (n+2) into the convolutional encoder for downsampling processing to obtain multiple feature map pairs; each feature map pair includes a one-to-one corresponding first feature map and second feature map; wherein the first feature map is the feature map of the image of frame n, and the second feature map is the feature map of the image of frame (n+2); the first feature map and the second feature map in each feature map pair have the same resolution, and the resolutions of different feature map pairs are different; for the first feature map pair among the feature map pairs, the second feature map in the first feature map pair is input into the warp layer for alignment processing to obtain a second feature map aligned with the first feature map in the first feature map pair; the aligned second feature map and the first feature map are input into the cost volume layer, and a similarity map between the aligned second feature map and the first feature map is calculated. The initial optical flow map, the similarity map, the first feature map, and the aligned second feature map are input into the optical flow decoder to obtain an optical flow residual map. The initial optical flow map is updated using the optical flow residual map, and the updated initial optical flow map is input into the context network module for optimization to obtain an optimized initial optical flow map. The optimized initial optical flow map is used as the new initial optical flow map, and the process returns to the step of inputting the second feature map in the first feature map pair into the warp layer for alignment processing until the second set condition is met, to obtain a candidate optical flow map. The candidate optical flow map is upsampled, and the optical flow map obtained by the upsampling is used as the new initial optical flow map to process the next feature map pair among the feature map pairs, until all feature map pairs are processed, to obtain the target optical flow map.
[0073] For example, the aforementioned feature map pair refers to a set of one-to-one corresponding feature maps obtained by the convolutional encoder after performing multi-scale downsampling on the nth frame and the (n+2)th frame. Each feature map pair includes a one-to-one corresponding first feature map and a second feature map, wherein the first feature map is obtained by downsampling the nth frame and the second feature map is obtained by downsampling the (n+2)th frame.
[0074] In each feature map pair mentioned above, the first and second feature maps have the same resolution, but the resolutions of each feature map pair differ, and the resolutions of the feature map pairs are arranged in descending order. For example, the feature map in the first feature map pair has a resolution of 512×512, the feature map in the second feature map pair has a resolution of 256×256, the feature map in the third feature map pair has a resolution of 128×128, and so on. These multi-scale resolution feature map pairs allow the optical flow estimation module to capture the overall motion of large blood vessels at a coarse scale and the subtle motion of small blood vessels at a fine scale.
[0075] Specifically, as shown in Figure 3, the images of the nth frame and the (n+2)th frame are input into the convolutional encoder for downsampling processing, thereby constructing a multi-scale feature pyramid for each of the two images (i.e., pyramid 1 and pyramid 2 in Figure 3) to obtain multiple feature map pairs; each feature map pair is then input into the warp layer in turn.
[0076] In this embodiment of the application, the feature map pair at the top of the pyramid (i.e., the first feature map pair) is processed first, and at this time the initial optical flow map is initialized to all zeros.
[0077] Specifically, for the first feature map pair in the feature map pair (i.e., the feature map pair at the top of the pyramid), the second feature map in the first feature map pair is input into the warp layer for alignment processing, resulting in a second feature map aligned with the first feature map in the first feature map pair. That is, the warp layer performs pixel coordinate transformation (i.e., warping operation) on the second feature map based on the initial optical flow map, so that its spatial position is consistent with the first feature map, thus obtaining the aligned second feature map.
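The warping operation can be sketched as a backward warp in NumPy. This is a simplified illustration under stated assumptions: nearest-neighbor sampling is used for brevity (practical warp layers typically use bilinear sampling), the flow layout `(2, H, W)` with `flow[0]` as horizontal and `flow[1]` as vertical displacement is a convention chosen here, and `warp_nearest` is a hypothetical name.

```python
import numpy as np


def warp_nearest(feat2: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp feat2 (C, H, W) toward the first frame.

    Each output pixel (y, x) is read from feat2 at (y + dy, x + dx), where
    (dx, dy) = flow[:, y, x] is the displacement in pixels. Coordinates
    falling outside the image are clamped to the border.
    """
    _, h, w = feat2.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, h - 1)
    return feat2[:, src_y, src_x]


feat1 = np.arange(16, dtype=np.float64).reshape(1, 4, 4)
feat2 = np.roll(feat1, shift=1, axis=2)  # content moved one pixel to the right
flow = np.zeros((2, 4, 4))
flow[0] = 1.0                            # every pixel moved by dx = +1
aligned = warp_nearest(feat2, flow)
# Away from the clamped right border, the warped map matches feat1.
print(np.array_equal(aligned[:, :, :3], feat1[:, :, :3]))  # True
```

With an all-zero initial flow, as at the top pyramid level, the warp returns the second feature map unchanged.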
[0078] Then, the aligned second feature map and the first feature map are input together into the cost volume layer, which calculates a three-dimensional cost volume (i.e., a similarity map between the aligned second feature map and the first feature map). Here, the cost volume layer calculates the pixel correlation between the aligned second feature map and the first feature map to obtain the similarity map.
[0079] As shown in Figure 3, the initial optical flow map, the similarity map, the first feature map, and the aligned second feature map described above are input together into the optical flow decoder to obtain the optical flow residual map, which is a correction to the initial optical flow map.
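A cost volume of this kind can be sketched in NumPy as a local correlation over a small search window. The search radius `max_disp`, the channel-mean normalization, and the zero padding are assumptions borrowed from common practice in PWC-style networks; the application does not state these details.

```python
import numpy as np


def cost_volume(feat1: np.ndarray, feat2_aligned: np.ndarray,
                max_disp: int = 1) -> np.ndarray:
    """Correlate feat1 with shifted copies of feat2_aligned.

    Inputs have shape (C, H, W). The output has shape (D*D, H, W) with
    D = 2 * max_disp + 1; channel k holds the channel-averaged dot product
    for displacement (dy - max_disp, dx - max_disp), where k = dy * D + dx.
    """
    _, h, w = feat1.shape
    d = 2 * max_disp + 1
    pad = ((0, 0), (max_disp, max_disp), (max_disp, max_disp))
    padded = np.pad(feat2_aligned, pad)  # zero padding at the borders
    volume = np.empty((d * d, h, w), dtype=np.float64)
    k = 0
    for dy in range(d):
        for dx in range(d):
            shifted = padded[:, dy:dy + h, dx:dx + w]
            volume[k] = np.mean(feat1 * shifted, axis=0)
            k += 1
    return volume


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 5, 5))
vol = cost_volume(feat, feat, max_disp=1)
print(vol.shape)  # (9, 5, 5)
```

The central channel (zero displacement) of `vol` equals the per-pixel mean squared feature value when both inputs are identical, which is the self-similarity of each pixel.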
[0080] Next, the initial optical flow map is updated using the optical flow residual map, that is, the optical flow residual map is superimposed on the initial optical flow map to complete the update of the optical flow map. The updated initial optical flow map is then input into the context network module for optimization (i.e., detailed optimization and smoothing of the updated optical flow) to obtain the optimized initial optical flow map.
[0081] The optimized initial optical flow map is used as the new initial optical flow map. The process then returns to the step of inputting the second feature map from the first feature map pair into the warp layer for alignment. That is, the steps of alignment, similarity calculation, and optical flow map update are repeated within the same pyramid layer for multiple iterations to gradually refine the motion estimation results at that level until a second set condition is met. At this point, the iteration for the current pyramid layer is complete and a candidate optical flow map is obtained. The second set condition can take many forms, such as reaching a set number of iterations (e.g., 10). This application embodiment does not specifically limit this.
[0082] After the iteration for the current pyramid layer is completed, the candidate optical flow map is upsampled, and the resulting optical flow map is used as the new initial optical flow map. The next feature map pair among the feature map pairs is then processed in the same way as the first feature map pair, which will not be repeated here. This process continues until all feature map pairs have been processed, resulting in the target optical flow map.
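The coarse-to-fine control flow can be sketched as follows. This shows only the loop structure: the `refine` callable is a hypothetical stand-in for the warp layer, cost volume layer, optical flow decoder, and context network combined. The sketch also assumes that the resolution doubles between consecutive levels, that the pairs are ordered from the pyramid top (coarsest) downward, and that flow values are multiplied by the upsampling factor, which is common practice but not stated in the text.

```python
from typing import Callable, List, Tuple

import numpy as np

FeaturePair = Tuple[np.ndarray, np.ndarray]


def upsample_flow(flow: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbor upsampling of a (2, H, W) flow map.

    Displacements are scaled by the same factor so that they stay
    expressed in pixels of the finer level (assumption).
    """
    up = np.repeat(np.repeat(flow, factor, axis=1), factor, axis=2)
    return up * factor


def coarse_to_fine(pairs: List[FeaturePair],
                   refine: Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray],
                   iters_per_level: int = 10) -> np.ndarray:
    """Iteratively refine the flow at each level, from coarsest to finest."""
    _, h, w = pairs[0][0].shape
    flow = np.zeros((2, h, w))  # initial optical flow map: all zeros
    for level, (f1, f2) in enumerate(pairs):
        if level > 0:
            flow = upsample_flow(flow)          # candidate flow -> next level
        for _ in range(iters_per_level):        # second set condition
            flow = flow + refine(flow, f1, f2)  # add the residual map
    return flow


# Dummy pyramid with resolutions 2x2, 4x4, 8x8 and a constant residual.
pairs = [(np.zeros((1, s, s)), np.zeros((1, s, s))) for s in (2, 4, 8)]
result = coarse_to_fine(pairs, lambda fl, a, b: np.full_like(fl, 0.1))
print(result.shape)  # (2, 8, 8)
```

Reusing the same `refine` callable at every level and iteration mirrors the shared-weight design described for the optical flow decoder.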
[0083] Figure 4 shows an exemplary flowchart of an image intermediate frame generation method 400 according to some embodiments of this application. It is understood that the image intermediate frame generation method 400 can be executed by any suitable device with data processing capabilities, including but not limited to terminal devices, processors, and servers.
[0084] It should be noted that the device executing the above training method 100 for the image intermediate frame generation model can be the same device as, or a different device from, the device executing the image intermediate frame generation method 400. This application embodiment does not specifically limit this.
[0085] As shown in Figure 4, the image intermediate frame generation method 400 includes: step S410: acquiring a DSA image sequence to be interpolated, wherein the acquisition frame rate of the DSA image sequence to be interpolated is less than a set frame rate; step S420: preprocessing the DSA image sequence to be interpolated to obtain a target DSA image sequence to be interpolated; step S430: inputting each adjacent image pair in the target DSA image sequence to be interpolated into a trained image intermediate frame generation model to obtain intermediate frame images of the adjacent image pairs; wherein the image intermediate frame generation model is trained based on the training method of the image intermediate frame generation model in the above embodiment.
[0086] For example, the DSA image sequence to be interpolated in step S410 above refers to a clinically acquired low frame rate (i.e., the acquisition frame rate is less than the set frame rate) DSA image sequence. For example, it could be a diagnostic DSA image sequence for patients with intracranial vascular stenosis.
[0087] In this embodiment, for the set frame rate, refer to the description of the above embodiment; it can be, for example, 15 frames/second. Accordingly, the acquisition frame rate of the DSA image sequence to be interpolated in this embodiment can be, for example, 6 frames/second, 10 frames/second, or another clinically common frame rate.
[0088] In this embodiment of the application, the DSA image sequence to be interpolated can be obtained by means of DSA equipment acquisition, medical imaging workstation retrieval, etc., and this embodiment of the application does not make specific limitations on this.
[0089] For example, after obtaining the DSA image sequence to be interpolated, each image in the DSA image sequence to be interpolated is preprocessed to obtain the target DSA image sequence to be interpolated. The specific preprocessing process can be found in the description of the above embodiment, and will not be repeated here.
[0090] In this embodiment, each adjacent image pair (i.e., the image of the m-th frame and the image of the (m+1)-th frame) in the target DSA image sequence to be interpolated is input into a pre-trained image intermediate frame generation model. The image intermediate frame generation model performs a series of operations and outputs the intermediate frame image of the adjacent image pair. The aforementioned image intermediate frame generation model can be trained using the training method 100 of the image intermediate frame generation model shown in the above embodiment.
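The way generated frames are interleaved with the acquired frames can be sketched in plain Python. The `model` argument is a stand-in: in the toy usage below it is a simple average of the two neighbors, which is an assumption for demonstration and not the trained image intermediate frame generation model.

```python
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")


def interpolate_sequence(frames: List[Frame],
                         model: Callable[[Frame, Frame], Frame]) -> List[Frame]:
    """Insert one generated frame between each adjacent pair (m, m+1)."""
    if not frames:
        return []
    output = [frames[0]]
    for prev_frame, next_frame in zip(frames, frames[1:]):
        output.append(model(prev_frame, next_frame))  # intermediate frame
        output.append(next_frame)
    return output


# Toy "frames" are scalars; the stand-in model averages the two neighbors.
low_rate = [0.0, 2.0, 4.0]
high_rate = interpolate_sequence(low_rate, lambda a, b: (a + b) / 2)
print(high_rate)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

A sequence of M frames becomes 2M − 1 frames, which roughly doubles the effective frame rate over the same acquisition duration.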
[0091] As an optional embodiment of this application, after obtaining the intermediate frame images of adjacent image pairs, the intermediate frame generation method 400 further includes: performing edge detection on the intermediate frame images; and using an edge enhancement algorithm to enhance the detected edges to obtain an optimized intermediate frame image.
[0092] For example, the edge detection described above refers to the operation of extracting the contours of blood vessel edges from intermediate frame images using a specific edge detection algorithm. Such edge detection algorithms can be, for example, conventional Canny operator edge detection, Sobel operator edge detection, etc.
[0093] In this embodiment, after the blood vessel edge contours are detected, an edge enhancement algorithm is used to sharpen the detected edges and improve their contrast, yielding an optimized intermediate frame image with sharper and clearer blood vessel edges while avoiding the introduction of artifacts. The edge enhancement algorithm can be implemented in many ways, such as deep learning-based edge enhancement algorithms, unsharp mask-based edge enhancement algorithms, gradient-based adaptive edge enhancement algorithms, etc. This embodiment does not specifically limit the edge enhancement algorithm used.
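Of the options listed, unsharp masking is the simplest to illustrate. The NumPy sketch below is a minimal version under stated assumptions: it uses a 3×3 box blur instead of the more usual Gaussian blur, assumes grayscale values in [0, 1], and omits the explicit edge detection step, so it sharpens wherever local contrast exists. The `amount` parameter and function names are hypothetical.

```python
import numpy as np


def box_blur3(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge replication at the borders."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    acc = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def unsharp_mask(img: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Add back the high-frequency detail (image minus blur) to sharpen edges."""
    img = img.astype(np.float64)
    detail = img - box_blur3(img)
    return np.clip(img + amount * detail, 0.0, 1.0)


# Toy image with a vertical step edge between gray levels 0.2 and 0.6.
step = np.full((4, 6), 0.2)
step[:, 3:] = 0.6
sharp = unsharp_mask(step)
print(sharp[0, 3] - sharp[0, 2] > step[0, 3] - step[0, 2])  # True
```

Flat regions are left unchanged because the blurred image equals the original there, while the contrast across the step edge increases.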
[0094] This application embodiment enhances the edges of intermediate frame images using an edge enhancement algorithm, which can better highlight blood vessel boundaries and improve image clarity.
[0095] The image frame interpolation process is described below with reference to the exemplary flowchart shown in Figure 5. As shown in Figure 5, after the low frame rate DSA image sequence to be interpolated is acquired, preprocessing operations such as image denoising, region of interest extraction, and inter-frame registration are performed on it to obtain the target DSA image sequence to be interpolated. Then, the target DSA image sequence to be interpolated is input into the image intermediate frame generation model. Motion estimation and feature extraction are performed by the IRR-PWC optical flow network structure in the model to obtain optical flow features, and spatial features are extracted by the image encoder in the model. The feature fusion module fuses the spatial features and the optical flow features, and the decoder then decodes the fused features to obtain the intermediate frame image. An edge enhancement algorithm is used to post-process the intermediate frame image to obtain the optimized intermediate frame image, thus obtaining the final high frame rate DSA sequence.
[0096] Figure 6 shows an exemplary structural block diagram of a training apparatus for an image intermediate frame generation model according to some embodiments of this application.
[0097] As shown in Figure 6, the training device 600 for the image intermediate frame generation model includes: a DSA training image sequence acquisition module 610, used to acquire multiple DSA training image sequences, wherein the acquisition frame rate of the DSA training image sequences is greater than or equal to a set frame rate; a preprocessing module 620, used to preprocess each image in each DSA training image sequence to obtain a DSA optimized image sequence; a first intermediate frame image generation module 630, used to input the image of the nth frame and the image of the (n+2)th frame in each DSA optimized image sequence into a neural network model to obtain an intermediate frame image; and an image intermediate frame generation model generation module 640, used to calculate a target loss value based on the intermediate frame image and the image of the (n+1)th frame, adjust the model parameters of the neural network model based on the target loss value, and return to the step of inputting the image of the nth frame and the image of the (n+2)th frame in each DSA optimized image sequence into the neural network model to obtain an intermediate frame image, until the first set condition is met and a trained image intermediate frame generation model is obtained.
[0098] As an optional embodiment of this application, the preprocessing module is specifically used to perform at least one of the following to obtain a DSA optimized image sequence: for each DSA training image sequence, filtering is performed on each image in the DSA training image sequence; for each DSA training image sequence, region of interest extraction is performed on each image in the DSA training image sequence; for each DSA training image sequence, a reference image of the DSA training image sequence is obtained, and inter-frame registration is performed on each image in the DSA training image sequence based on the reference image.
[0099] As an optional embodiment of this application, the above-mentioned neural network model includes: an optical flow estimation module, an optical flow encoder, an image encoder, a feature fusion module, and a decoder; the first intermediate frame image generation module 630 is specifically used to: input the image of the nth frame and the image of the (n+2)th frame into the optical flow estimation module to obtain a target optical flow map, the target optical flow map being used to characterize the pixel motion trajectory from the image of the nth frame to the image of the (n+2)th frame; input the target optical flow map into the optical flow encoder for feature extraction to obtain optical flow features; and input the image of the nth frame and the image of the (n+2)th frame into the image encoder for feature extraction to obtain feature information of the image of the nth frame and feature information of the image of the (n+2)th frame; input the optical flow features, the feature information of the image of the nth frame, and the feature information of the image of the (n+2)th frame together into the feature fusion module for feature fusion to obtain fused features; and input the fused features and the optical flow features into the decoder so that the decoder generates intermediate frame images based on the fused features, with the optical flow features as spatial constraints.
[0100] As an optional embodiment of this application, the optical flow estimation module includes: a convolutional encoder, a warp layer, a cost volume layer, an optical flow decoder, and a context network module; the above-mentioned inputting the image of the nth frame and the image of the (n+2)th frame into the optical flow estimation module to obtain the target optical flow map includes: inputting the image of the nth frame and the image of the (n+2)th frame into the convolutional encoder for downsampling processing to obtain multiple feature map pairs; each feature map pair includes a one-to-one corresponding first feature map and second feature map; wherein the first feature map is the feature map of the image of the nth frame, and the second feature map is the feature map of the image of the (n+2)th frame; the first feature map and the second feature map in each feature map pair have the same resolution, and the resolutions of different feature map pairs are different; for the first feature map pair among the feature map pairs, the second feature map in the first feature map pair is input into the warp layer for alignment processing to obtain a second feature map aligned with the first feature map in the first feature map pair; the aligned second feature map and the first feature map are input into the cost volume layer, and a similarity map between the aligned second feature map and the first feature map is calculated. The initial optical flow map, the similarity map, the first feature map, and the aligned second feature map are input into the optical flow decoder to obtain an optical flow residual map. The initial optical flow map is updated using the optical flow residual map, and the updated initial optical flow map is input into the context network module for optimization to obtain an optimized initial optical flow map.
The optimized initial optical flow map is used as the new initial optical flow map, and the process returns to the step of inputting the second feature map in the first feature map pair into the warp layer for alignment processing until the second set condition is met to obtain a candidate optical flow map. The candidate optical flow map is upsampled, and the optical flow map obtained by the upsampling process is used as the new initial optical flow map to process the next feature map pair in the feature map pair until all feature map pairs are processed to obtain the target optical flow map.
[0101] As an optional embodiment of this application, the image intermediate frame generation model generation module 640 is specifically used for: calculating pixel differences based on the intermediate frame image and the image of the (n+1)th frame to obtain a pixel loss value; inputting the intermediate frame image and the image of the (n+1)th frame into a pre-trained perceptual feature extraction network for feature extraction to obtain a first perceptual feature map and a second perceptual feature map; calculating perceptual feature differences based on the first perceptual feature map and the second perceptual feature map to obtain a perceptual loss value; and obtaining the target loss value based on the pixel loss value and the perceptual loss value.
[0102] As an optional embodiment of this application, the pixel loss value is calculated using the following formula:
$$L_{pixel} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{I}_i - I_i\right|$$
[0103] where $L_{pixel}$ denotes the pixel loss value; $\hat{I}_i$ denotes the value of the i-th pixel in the intermediate frame image; $I_i$ denotes the value of the i-th pixel in the image of the (n+1)th frame; $|\cdot|$ denotes the absolute value; and $N$ denotes the number of pixels in the intermediate frame image and in the image of the (n+1)th frame; and/or, the perceptual loss value is calculated using the following formula: [formula not preserved in the source text]
[0104] where the terms of that formula denote, respectively: the perceptual loss value; the pixel value at index (i, j, k) of the first perceptual feature map; the pixel value at index (i, j, k) of the second perceptual feature map; and the number of pixels in the perceptual feature map.
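The loss computation of [0101] to [0104] can be illustrated with the following minimal sketch. The pixel loss follows the definitions in [0103] (absolute differences averaged over N pixels). Everything else is an assumption made for illustration: the perceptual loss is taken as a mean squared feature difference, the target loss is taken as a weighted sum with hypothetical weights `w_pixel` and `w_perc`, and `extract` is a hypothetical callable standing in for the pre-trained perceptual feature extraction network, which the text does not name. Images and feature maps are flattened to lists of floats.

```python
from typing import Callable, List, Sequence


def pixel_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference over the N pixels of the two images."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("images must be non-empty and of equal size")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def perceptual_loss(
    pred: Sequence[float],
    target: Sequence[float],
    extract: Callable[[Sequence[float]], List[float]],
) -> float:
    """Difference between perceptual features of the two images.

    ASSUMPTION: mean squared difference; the exact form is not given here.
    """
    feat_pred = extract(pred)      # first perceptual feature map
    feat_target = extract(target)  # second perceptual feature map
    if len(feat_pred) != len(feat_target) or len(feat_pred) == 0:
        raise ValueError("feature maps must be non-empty and of equal size")
    total = sum((a - b) ** 2 for a, b in zip(feat_pred, feat_target))
    return total / len(feat_pred)


def target_loss(
    pred: Sequence[float],
    target: Sequence[float],
    extract: Callable[[Sequence[float]], List[float]],
    w_pixel: float = 1.0,  # hypothetical weight
    w_perc: float = 1.0,   # hypothetical weight
) -> float:
    """ASSUMPTION: the target loss is a weighted sum of the two terms."""
    return (w_pixel * pixel_loss(pred, target)
            + w_perc * perceptual_loss(pred, target, extract))
```

Here `pred` corresponds to the generated intermediate frame image and `target` to the image of the (n+1)th frame, which serves as the ground truth during training.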
[0105] For detailed implementation methods and beneficial effects, please refer to the description of the above method embodiments, which will not be repeated here.
[0106] Figure 7 shows an exemplary structural block diagram of an image intermediate frame generation apparatus according to some embodiments of this application.
[0107] As shown in Figure 7, the image intermediate frame generation apparatus 700 includes: a DSA image sequence acquisition module 710 for acquiring a DSA image sequence to be interpolated, wherein the acquisition frame rate of the DSA image sequence to be interpolated is less than a set frame rate; a target DSA image sequence acquisition module 720 for preprocessing the DSA image sequence to be interpolated to obtain a target DSA image sequence to be interpolated; and a second intermediate frame image generation module 730 for inputting each adjacent image pair in the target DSA image sequence to be interpolated into a trained image intermediate frame generation model to obtain an intermediate frame image of each adjacent image pair; wherein the image intermediate frame generation model is trained based on the training method of the image intermediate frame generation model in the above embodiment.
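The use of module 730 on a low frame rate sequence can be sketched as follows. This is an illustrative sketch only: `model` is a hypothetical callable standing in for the trained image intermediate frame generation model, and interleaving each generated frame between its two source frames is an assumption about how the higher frame rate sequence is assembled.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def interpolate_sequence(
    frames: Sequence[Frame],
    model: Callable[[Frame, Frame], Frame],
) -> List[Frame]:
    """Insert one generated frame between every pair of adjacent frames.

    A sequence of K frames becomes a sequence of 2K - 1 frames.
    """
    if not frames:
        return []
    output: List[Frame] = [frames[0]]
    for previous, following in zip(frames, frames[1:]):
        # The model sees one adjacent image pair and returns the frame between.
        output.append(model(previous, following))
        output.append(following)
    return output
```

With a toy `model` that averages two numbers, `interpolate_sequence([0.0, 2.0, 4.0], lambda a, b: (a + b) / 2)` returns a five-element list in which each inserted value lies between its neighbours; a real model would operate on preprocessed DSA images instead.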
[0108] As an optional embodiment of this application, the image intermediate frame generation apparatus 700 further includes: an edge detection module for performing edge detection on the intermediate frame images after the intermediate frame images of adjacent image pairs are obtained; and an optimization module for performing edge enhancement processing on the detected edges using an edge enhancement algorithm to obtain an optimized intermediate frame image.
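The edge detection and edge enhancement steps of [0108] can be illustrated with the sketch below. The text does not specify which algorithms are used, so both choices here are assumptions: edges are detected with a Sobel gradient magnitude and a hypothetical `threshold`, and enhancement is an unsharp-style boost with a hypothetical `gain` that pushes each edge pixel away from its 3×3 local mean. Pixel values are assumed to lie in [0, 1], and border pixels are left unchanged.

```python
from typing import List

Image = List[List[float]]  # grayscale image, values assumed in [0, 1]


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude from 3x3 Sobel kernels (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def enhance_edges(img: Image, threshold: float = 1.0, gain: float = 0.5) -> Image:
    """Sharpen only the pixels whose gradient magnitude exceeds `threshold`."""
    h, w = len(img), len(img[0])
    mag = sobel_magnitude(img)
    out = [row[:] for row in img]  # copy so the input image is not modified
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if mag[y][x] > threshold:
                local_mean = sum(
                    img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                ) / 9.0
                boosted = img[y][x] + gain * (img[y][x] - local_mean)
                out[y][x] = min(1.0, max(0.0, boosted))  # clamp to [0, 1]
    return out
```

Restricting the boost to detected edges leaves flat regions untouched, so background noise is not amplified; vessel boundaries in the generated frame are the intended target of the sharpening.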
[0109] For detailed implementation methods and beneficial effects, please refer to the description of the above method embodiments, which will not be repeated here.
[0110] Correspondingly, embodiments of this application also provide a hardware structure diagram of the apparatuses shown in Figure 6 and Figure 7. As shown in Figure 8, the electronic device 800 may be a device for implementing the training method 100 of the above-described image intermediate frame generation model or the image intermediate frame generation method 400. The electronic device 800 includes a processor 810 and a memory 820. The memory 820 is configured to store program instructions; the processor 810 is configured to load and execute the program instructions stored in the memory 820 to implement the embodiments of the image intermediate frame generation model training method 100 or the image intermediate frame generation method 400 described above.
[0111] As one embodiment, memory 820 can be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as program instructions, data, etc. For example, memory 820 can be volatile memory, non-volatile memory, or similar storage media. Specifically, memory 820 can be RAM (Random Access Memory), flash memory, storage drives (such as hard disk drives), solid-state drives, any type of storage disk (such as optical discs, DVDs, etc.), or similar storage media, or combinations thereof.
[0112] This concludes the description of the electronic device shown in Figure 8.
[0113] While numerous embodiments of this application have been shown and described herein, it will be apparent to those skilled in the art that such embodiments are provided by way of example only. Many modifications, alterations, and alternatives will arise for those skilled in the art without departing from the spirit and intent of this application. It should be understood that various alternatives to the embodiments of this application described herein may be employed in the practice of this application. The appended claims are intended to define the scope of protection of this application and therefore cover equivalents or alternatives within the scope of these claims.
Claims
1. A training method for an image intermediate frame generation model, characterized in that it comprises: acquiring multiple DSA training image sequences, wherein the acquisition frame rate of the DSA training image sequences is greater than or equal to a set frame rate; preprocessing each image in each DSA training image sequence to obtain a DSA optimized image sequence; for each DSA optimized image sequence, inputting the image of the nth frame and the image of the (n+2)th frame in the DSA optimized image sequence into a neural network model to obtain an intermediate frame image; calculating a target loss value based on the intermediate frame image and the image of the (n+1)th frame; and adjusting the model parameters of the neural network model based on the target loss value, and returning to the step of, for each DSA optimized image sequence, inputting the image of the nth frame and the image of the (n+2)th frame in the DSA optimized image sequence into the neural network model to obtain the intermediate frame image, until a first set condition is met, to obtain a trained image intermediate frame generation model.
2. The method according to claim 1, characterized in that preprocessing each image in each DSA training image sequence to obtain the DSA optimized image sequence comprises at least one of the following: for each DSA training image sequence, filtering each image in the DSA training image sequence; for each DSA training image sequence, performing a region of interest extraction operation on each image in the DSA training image sequence; for each DSA training image sequence, obtaining a reference image for that DSA training image sequence, and performing inter-frame registration on each image in the DSA training image sequence based on the reference image.
3. The method according to claim 1, characterized in that the neural network model includes: an optical flow estimation module, an optical flow encoder, an image encoder, a feature fusion module, and a decoder; the step of inputting the image of the nth frame and the image of the (n+2)th frame in the DSA optimized image sequence into the neural network model to obtain the intermediate frame image includes: inputting the image of the nth frame and the image of the (n+2)th frame into the optical flow estimation module to obtain a target optical flow map, which characterizes the pixel motion trajectory from the image of the nth frame to the image of the (n+2)th frame; inputting the target optical flow map into the optical flow encoder for feature extraction to obtain optical flow features; inputting the image of the nth frame and the image of the (n+2)th frame into the image encoder for feature extraction to obtain feature information of the image of the nth frame and feature information of the image of the (n+2)th frame; inputting the optical flow features, the feature information of the image of the nth frame, and the feature information of the image of the (n+2)th frame together into the feature fusion module for feature fusion to obtain fused features; and inputting the fused features and the optical flow features into the decoder, so that the decoder generates the intermediate frame image based on the fused features, with the optical flow features as a spatial constraint.
4. The method according to claim 3, characterized in that the optical flow estimation module includes: a convolutional encoder, a warp layer, a cost volume layer, an optical flow decoder, and a context network module; the step of inputting the image of the nth frame and the image of the (n+2)th frame into the optical flow estimation module to obtain the target optical flow map includes: inputting the image of the nth frame and the image of the (n+2)th frame respectively into the convolutional encoder for downsampling to obtain multiple feature map pairs; each feature map pair includes a first feature map and a second feature map in one-to-one correspondence, wherein the first feature map is a feature map of the image of the nth frame and the second feature map is a feature map of the image of the (n+2)th frame; the first feature map and the second feature map within a pair have the same resolution, and different feature map pairs have different resolutions; for the first feature map pair among the feature map pairs, inputting the second feature map of that pair into the warp layer for alignment to obtain a second feature map aligned with the first feature map of that pair; inputting the aligned second feature map and the first feature map into the cost volume layer to calculate a similarity map between the aligned second feature map and the first feature map; inputting the initial optical flow map, the similarity map, the first feature map, and the aligned second feature map into the optical flow decoder to obtain an optical flow residual map; updating the initial optical flow map using the optical flow residual map, and inputting the updated initial optical flow map into the context network module for optimization to obtain an optimized initial optical flow map; taking the optimized initial optical flow map as the new initial optical flow map and returning to the step of inputting the second feature map of the first feature map pair into the warp layer for alignment, until a second set condition is met, to obtain a candidate optical flow map; and upsampling the candidate optical flow map, taking the upsampled optical flow map as the new initial optical flow map, and processing the next feature map pair among the feature map pairs, until all feature map pairs have been processed, to obtain the target optical flow map.
5. The method according to claim 1, characterized in that calculating the target loss value based on the intermediate frame image and the image of the (n+1)th frame includes: calculating pixel differences based on the intermediate frame image and the image of the (n+1)th frame to obtain a pixel loss value; inputting the intermediate frame image and the image of the (n+1)th frame respectively into a pre-trained perceptual feature extraction network for feature extraction to obtain a first perceptual feature map and a second perceptual feature map; calculating perceptual feature differences based on the first perceptual feature map and the second perceptual feature map to obtain a perceptual loss value; and obtaining the target loss value based on the pixel loss value and the perceptual loss value.
6. The method according to claim 5, characterized in that the pixel loss value is calculated using the following formula:
$$L_{pixel} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{I}_i - I_i\right|$$
where $L_{pixel}$ denotes the pixel loss value; $\hat{I}_i$ denotes the value of the i-th pixel in the intermediate frame image; $I_i$ denotes the value of the i-th pixel in the image of the (n+1)th frame; $|\cdot|$ denotes the absolute value; and $N$ denotes the number of pixels in the intermediate frame image and in the image of the (n+1)th frame; and/or, the perceptual loss value is calculated using the following formula: [formula not preserved in the source text] where the terms of that formula denote, respectively: the perceptual loss value; the pixel value at index (i, j, k) of the first perceptual feature map; the pixel value at index (i, j, k) of the second perceptual feature map; and the number of pixels in the perceptual feature map.
7. A method for generating image intermediate frames, characterized in that it comprises: acquiring a DSA image sequence to be interpolated, wherein the acquisition frame rate of the DSA image sequence to be interpolated is less than a set frame rate; preprocessing the DSA image sequence to be interpolated to obtain a target DSA image sequence to be interpolated; and inputting each adjacent image pair in the target DSA image sequence to be interpolated into a trained image intermediate frame generation model to obtain an intermediate frame image of the adjacent image pair; wherein the image intermediate frame generation model is trained based on the training method of the image intermediate frame generation model according to any one of claims 1-6.
8. The method according to claim 7, characterized in that, after obtaining the intermediate frame images of the adjacent image pairs, the method further includes: performing edge detection on the intermediate frame image; and performing edge enhancement processing on the detected edges using an edge enhancement algorithm to obtain an optimized intermediate frame image.
9. An electronic device, characterized in that it comprises: a processor configured to execute program instructions; and a memory configured to store the program instructions, which, when loaded and executed by the processor, cause the processor to perform the training method for the image intermediate frame generation model according to any one of claims 1-6 or the image intermediate frame generation method according to any one of claims 7-8.
10. A computer-readable storage medium storing program instructions, characterized in that, when the program instructions are loaded and executed by a processor, the processor performs the training method of the image intermediate frame generation model according to any one of claims 1-6 or the image intermediate frame generation method according to any one of claims 7-8.