Method and apparatus for super-resolution enhancement of echocardiogram videos, device, medium
By introducing high-resolution reference images and frame sequence information, the problem of insufficient echocardiographic resolution is solved, and the texture information and diagnostic accuracy of echocardiography are improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTHERN UNIVERSITY OF SCIENCE AND TECHNOLOGY
- Filing Date
- 2023-02-23
- Publication Date
- 2026-06-30
AI Technical Summary
Existing echocardiograms have insufficient resolution, making it difficult to observe cardiac lesions and affecting diagnostic quality. Traditional super-resolution techniques are not effective in improving the resolution of echocardiograms.
By introducing high-resolution reference echocardiograms and using the texture information of the reference images to add constraints to the super-resolution enhancement task, and combining the information of the preceding and following frame sequences with block alignment methods, feature extraction and fusion are optimized to improve the performance of super-resolution enhancement.
It improves the resolution of echocardiogram videos, enhances image texture information, and improves the accuracy of medical diagnosis.
Figure CN116468601B_ABST
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a method, apparatus, device, and medium for super-resolution enhancement of echocardiogram videos.
Background Technology
[0002] Ultrasound imaging is characterized by real-time imaging and high contrast. Echocardiography is one of the most widely used diagnostic tools in medical imaging. This imaging method can visualize the size and shape of the heart in real time.
[0003] Examination of cardiac function requires careful observation of the movement and deformation of the myocardium and heart valves (such as the mitral valve apparatus, including the mitral valve, leaflets, subvalvular chordae tendineae, and papillary muscles) to diagnose conditions such as mitral regurgitation, mitral and tricuspid valve prolapse, and left ventricular apical thrombi. However, due to the instability of ultrasound, most ultrasound images contain speckle noise and blurred boundaries, resulting in insufficient image resolution. This makes it difficult to observe some cardiac lesions, affecting the quality of diagnosis.
Summary of the Invention
[0004] The main objective of this application is to propose a method, apparatus, device, and medium for super-resolution enhancement of echocardiogram videos, thereby improving the resolution of echocardiogram videos.
[0005] To achieve the above objectives, a first aspect of this application proposes a super-resolution enhancement method for echocardiographic videos, the method comprising:
[0006] Acquire echocardiographic video, wherein the echocardiographic video includes frame images;
[0007] Acquire a reference echocardiogram; wherein the resolution of the reference echocardiogram is greater than the resolution of the frame image;
[0008] Each frame image is input into a preset first feature extraction model for feature extraction to obtain the image features to be enhanced for each frame image;
[0009] The reference echocardiogram is input into a preset second feature extraction model for feature extraction to obtain the reference image features;
[0010] The image features to be enhanced and the reference image features of each frame image are input into a preset feature dynamic aggregation model for feature aggregation to obtain the initial frame features of each frame image.
[0011] The initial frame features of the current frame image, the initial frame features of the previous frame image, and the initial frame features of the two previous frames are input into a preset feature transfer model for frame-level feature fusion to obtain the fused frame features of the current frame image.
[0012] The fused frame features are input into a preset feature reconstruction model for resolution enhancement to obtain a reconstructed echocardiogram; wherein the resolution of the reconstructed echocardiogram is greater than the resolution of the frame image;
[0013] The target reconstructed echocardiogram video is obtained by replacing the current frame image with the reconstructed echocardiogram image.
[0014] In some embodiments, the image features to be enhanced include initial features of initial pixels, and the reference image features include original reference features of reference pixels. The step of inputting the image features to be enhanced and the reference image features of each frame image into a preset feature dynamic aggregation model for feature aggregation to obtain the initial frame features of each frame image includes:
[0015] The selected reference pixel set is obtained by searching the reference pixels based on the initial pixels.
[0016] The original reference features of the selected reference pixel set are aggregated to obtain aggregated reference features;
[0017] The initial features and the aggregated reference features are fused together to obtain the aggregated features of the initial pixel.
[0018] The aggregated features of the initial pixels are merged to obtain the initial frame features of the frame image.
[0019] In some embodiments, a selected set of reference pixels is obtained by searching for the reference pixels based on the initial pixel, including:
[0020] Based on the position of the initial pixel in the frame image, the position of each reference pixel in the reference echocardiogram is matched to obtain the selected reference pixel.
[0021] A preset image region is defined on the reference echocardiogram based on the selected reference pixels;
[0022] The reference pixels located within the preset image area are merged to obtain the selected reference pixel set.
[0023] In some embodiments, the first feature extraction model includes a first upsampling layer, a first convolutional layer, a first normalization layer, and a first linear rectified function. Each frame of the echocardiogram is input into the preset first feature extraction model for feature extraction to obtain the image features to be enhanced for each frame, including:
[0024] The frame image is upsampled using the first upsampling layer to obtain the frame image to be processed.
[0025] The first image feature is obtained by performing convolution processing on the frame image to be processed through the first convolutional layer.
[0026] The first image features are normalized using the first normalization layer to obtain the second image features;
[0027] The second image feature is activated by the first linear rectification function to obtain the image feature to be enhanced in the frame image.
[0028] In some embodiments, the feature transfer model includes an optical flow alignment network and a multi-frame self-attention block. The step of inputting the initial frame features of the current frame, the initial frame features of the previous frame, and the initial frame features of the two previous frames into a preset feature transfer model for feature frame-level fusion to obtain the fused frame features of the current frame includes:
[0029] The optical flow alignment network takes the initial frame features of any two frame images and predicts an optical flow map describing the difference between the two frame images, and the initial frame features are segmented into patch blocks.
[0030] Within each patch block, the average value of the optical flow map is calculated, and all pixels within the patch block are shifted by this average value to align them, yielding the aligned image features of the current frame, the aligned image features of the previous frame, and the aligned image features of the two previous frames.
[0031] The aligned image features of the current frame, the aligned image features of the previous frame, and the aligned image features of the two previous frames are input into the multi-frame self-attention block for inter-frame feature processing to obtain the fused frame features of the current frame.
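The patch-level alignment described above can be sketched as follows. This is a minimal illustration, not the implementation of this application: the function name `patch_align`, the patch size of 8, the (x, y) layout of the flow map, the rounding of the mean flow to whole pixels, and the clamping at image borders are all assumptions.

```python
import numpy as np

def patch_align(features, flow, patch=8):
    """Shift each patch of `features` by the mean optical flow inside that patch.

    features: (H, W, C) array of initial frame features.
    flow:     (H, W, 2) array; flow[..., 0] is the x (column) displacement and
              flow[..., 1] is the y (row) displacement (assumed layout).
    patch:    side length of the square patch blocks (assumed value).
    """
    h, w, _ = features.shape
    aligned = np.zeros_like(features)
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            bottom, right = min(top + patch, h), min(left + patch, w)
            # One displacement per patch: the average of the flow map in the patch.
            dx, dy = flow[top:bottom, left:right].reshape(-1, 2).mean(axis=0)
            dx, dy = int(round(dx)), int(round(dy))
            # Read source pixels at the shifted positions, clamped to the image.
            rows = np.clip(np.arange(top, bottom) + dy, 0, h - 1)
            cols = np.clip(np.arange(left, right) + dx, 0, w - 1)
            aligned[top:bottom, left:right] = features[np.ix_(rows, cols)]
    return aligned
```

Using one displacement per patch keeps the pixels of a patch rigidly together, which is the property the averaging step is meant to provide; a per-pixel warp would instead move each pixel independently.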
[0032] In some embodiments, the feature reconstruction model includes a second convolutional layer, a second normalization layer, and a second linear rectified function layer. The step of inputting the fused frame features into a preset feature reconstruction model for resolution enhancement to obtain a reconstructed echocardiogram image includes:
[0033] The fused frame features are processed by convolution in the second convolutional layer to obtain the first intermediate echocardiogram features.
[0034] The first intermediate echocardiogram features are normalized using the second normalization layer to obtain the second intermediate echocardiogram features.
[0035] The reconstructed echocardiogram is obtained by activating the features of the second intermediate echocardiogram through the second linear rectified function layer.
[0036] In some embodiments, the first feature extraction model, the second feature extraction model, the feature dynamic aggregation model, the feature transfer model, and the feature reconstruction model are jointly trained in advance through the following process:
[0037] Acquire sample echocardiogram images;
[0038] The resolution of the sample echocardiogram images is reduced to obtain the echocardiogram images to be enhanced.
[0039] Features are extracted from the echocardiogram image to be enhanced using the first feature extraction model to obtain the sample image features to be enhanced.
[0040] Features are extracted from the sample echocardiogram image using the second feature extraction model to obtain the sample reference image features.
[0041] The feature dynamic aggregation model is used to aggregate the sample image features to be enhanced and the sample reference image features to obtain the frame features to be enhanced of the echocardiogram image to be enhanced.
[0042] The feature transfer model is used to perform frame-level feature fusion on the frame features to be enhanced of at least two frames to obtain the target-enhanced image features of the echocardiogram image to be enhanced.
[0043] The target-enhanced image features are enhanced in resolution using the feature reconstruction model to obtain a target-enhanced echocardiogram.
[0044] A mean absolute error loss function is constructed based on the sample echocardiogram and the target enhanced echocardiogram to obtain first loss data. A similarity learning loss function is constructed based on the sample echocardiogram and the target enhanced echocardiogram to obtain second loss data. The first loss data and the second loss data are merged to obtain target loss data.
[0045] The parameters of the first feature extraction model, the second feature extraction model, the feature dynamic aggregation model, the feature transfer model, and the feature reconstruction model are adjusted based on the target loss data.
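The data preparation and loss construction in the training process above can be sketched as follows. The application does not specify how the resolution is reduced, what form the similarity learning loss takes, or how the two losses are merged, so everything here is an assumption for illustration: block averaging for the resolution reduction, one minus cosine similarity as a stand-in similarity term, a weighted sum for the merge, and all function names.

```python
import numpy as np

def downsample(image, factor=4):
    """Reduce resolution by averaging non-overlapping factor x factor blocks (assumed method)."""
    h, w = image.shape
    assert h % factor == 0 and w % factor == 0
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def mae_loss(target, prediction):
    """First loss data: mean absolute error between sample and enhanced image."""
    return float(np.mean(np.abs(target - prediction)))

def similarity_loss(target, prediction, eps=1e-8):
    """Second loss data (placeholder): 1 - cosine similarity of the flattened images."""
    t, p = target.ravel(), prediction.ravel()
    cos = float(t @ p) / (np.linalg.norm(t) * np.linalg.norm(p) + eps)
    return 1.0 - cos

def target_loss(target, prediction, weight=1.0):
    """Target loss data: weighted sum of the two terms (the weighting is an assumption)."""
    return mae_loss(target, prediction) + weight * similarity_loss(target, prediction)
```

With a 448*448 sample image and `factor=4`, `downsample` yields a 112*112 image to be enhanced, matching the resolutions used as an example elsewhere in this description.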
[0046] To achieve the above objectives, a second aspect of this application provides a super-resolution enhancement device for echocardiographic videos, the device comprising:
[0047] The video acquisition module is used to acquire echocardiogram videos, which include frame images;
[0048] A reference image acquisition module is used to acquire a reference echocardiogram; wherein the resolution of the reference echocardiogram is greater than the resolution of the frame image;
[0049] An image feature extraction module for the images to be enhanced is used to input each frame image into a preset feature extraction model for feature extraction to obtain the image features to be enhanced for each frame image;
[0050] The reference image feature extraction module is used to input the reference echocardiogram into the feature extraction model to extract features and obtain reference image features.
[0051] The feature aggregation module inputs the image features to be enhanced and the reference image features of each frame image into a preset feature dynamic aggregation model to perform feature aggregation and obtain the initial frame features of each frame image.
[0052] The feature frame-level fusion module is used to input the initial frame features of the current frame image, the initial frame features of the previous frame image, and the initial frame features of the two previous frames into a preset feature transfer model for frame-level feature fusion to obtain the fused frame features of the current frame image.
[0053] The feature reconstruction module is used to input the fused frame features into a preset feature reconstruction model for resolution enhancement to obtain a reconstructed echocardiogram image; wherein the resolution of the reconstructed echocardiogram image is greater than the resolution of the frame image;
[0054] An image merging module is used to replace the current frame image with the reconstructed echocardiogram image to obtain the target reconstructed echocardiogram video.
[0055] To achieve the above objectives, a third aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the method described in the first aspect.
[0056] To achieve the above objectives, a fourth aspect of the present application provides a storage medium, which is a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in the first aspect.
[0057] This application proposes a method, apparatus, device, and medium for super-resolution enhancement of echocardiogram videos. The method introduces a high-resolution reference echocardiogram image and uses the texture information of the reference echocardiogram image to add constraints to the super-resolution enhancement task, thereby reducing the uncertainty of the super-resolution task, making the optimization objective clearer, and comprehensively improving the super-resolution enhancement performance.
Attached Figure Description
[0058] Figure 1 is a schematic diagram of the overall framework of the super-resolution enhancement method for echocardiogram video provided in the embodiments of this application;
[0059] Figure 2 is an optional flowchart of the super-resolution enhancement method for echocardiogram video provided in the embodiments of this application;
[0060] Figures 3A-3C are visual image examples provided in the embodiments of this application;
[0061] Figure 4 is a schematic diagram of the feature extraction model provided in the embodiments of this application;
[0062] Figure 5 is a schematic diagram of the feature transfer model provided in the embodiments of this application;
[0063] Figure 6 is a schematic diagram of the structure of a multi-frame self-attention block provided in an embodiment of this application;
[0064] Figure 7 is an optional flowchart of a super-resolution enhancement method for echocardiogram video provided in another embodiment of this application;
[0065] Figure 8 is a schematic diagram of the structure of the super-resolution enhancement device for echocardiogram video provided in an embodiment of this application;
[0066] Figure 9 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0067] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0068] It should be noted that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart. The terms "first," "second," etc., in the specification, claims, and the aforementioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
[0069] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.
[0070] Artificial intelligence (AI) is a new branch of computer science that studies, develops, and applies theories, methods, technologies, and systems to simulate, extend, and expand human intelligence. It aims to understand the essence of intelligence and produce intelligent machines that can react in a way similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. AI can simulate the information processes of human consciousness and thought. Furthermore, AI utilizes digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceiving the environment, acquiring knowledge, and using that knowledge to achieve optimal results.
[0071] Super-resolution technology reconstructs a corresponding high-resolution image from an observed low-resolution image. Traditional super-resolution image enhancement methods can be divided into three categories: filter-based methods, which scan the image with multiple filters and introduce prior knowledge based on spatial uniformity to recover unknown high-resolution image information; interpolation-based methods, which use multi-surface fitting to exploit spatial structure information and reconstruct the image by fusing this information through maximum a posteriori (MAP) estimation; and deep learning-based methods, which design data-driven network models, perform end-to-end training on large datasets, and require only a single forward propagation over the low-resolution image during inference.
[0072] However, super-resolution video enhancement is inherently an ill-posed problem. A set of low-resolution echocardiographic video data can correspond to multiple sets of high-resolution data with different texture information. This uncertainty in the optimization objective poses a challenge to accurate and flexible super-resolution enhancement. Using filtering and interpolation-based methods can easily result in overly smoothed generated images, further reducing the contrast of ultrasound images and hindering subsequent medical diagnosis. Existing deep learning methods also struggle to fully utilize the inter-frame temporal information and highly stable prior knowledge of texture in echocardiography, and are difficult to generalize to processing echocardiographic data with various textures, resulting in poor reconstruction performance. Therefore, traditional super-resolution techniques are difficult to apply to the super-resolution enhancement of echocardiography.
[0073] To address the aforementioned problems, this application provides a super-resolution enhancement method for echocardiogram videos, the overall framework of which is shown in Figure 1. By introducing a high-resolution reference image and using its texture information to impose constraints on the super-resolution enhancement task, the method reduces the uncertainty of the task, makes the optimization objective clearer, and comprehensively improves super-resolution enhancement performance. In addition, this embodiment further improves performance by introducing information from the preceding and following frame sequences together with block alignment.
[0074] The super-resolution enhancement method for echocardiogram videos in this application can be executed by a server alone, by a terminal alone, or by both the terminal and the server. Furthermore, the super-resolution enhancement method for echocardiogram videos provided in this application can also be software running on a server. The server can be configured as a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The software can be an application implementing the super-resolution enhancement method for echocardiogram videos, but is not limited to the above forms.
[0075] The super-resolution enhancement method for echocardiographic videos according to embodiments of this application can be used in numerous general-purpose or special-purpose computer system environments or configurations. Examples include: server computers, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, distributed computing environments including any of the above systems or devices, etc. This application can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In a distributed computing environment, program modules can reside in local and remote computer storage media, including storage devices.
[0076] This application provides a method, apparatus, electronic device, and computer-readable storage medium for super-resolution enhancement of echocardiogram videos. The specific embodiments are described below. First, the method for super-resolution enhancement of echocardiogram videos in this application is described.
[0077] Referring to Figure 1 and Figure 2, a super-resolution enhancement method for echocardiographic videos according to an embodiment of this application includes:
[0078] Step 101: Acquire echocardiographic video, which includes frame images;
[0079] Step 102: Obtain a reference echocardiogram; wherein the resolution of the reference echocardiogram is greater than the resolution of the frame image;
[0080] Step 103: Input each frame of image into the preset first feature extraction model for feature extraction to obtain the image features to be enhanced for each frame of image;
[0081] Step 104: Input the reference echocardiogram image into the preset second feature extraction model for feature extraction to obtain the reference image features;
[0082] Step 105: Input the image features to be enhanced and the reference image features of each frame into a preset feature dynamic aggregation model to perform feature aggregation and obtain the initial frame features of each frame.
[0083] Step 106: Input the initial frame features of the current frame image, the initial frame features of the previous frame image, and the initial frame features of the two previous frames into the preset feature transfer model for frame-level feature fusion to obtain the fused frame features of the current frame image.
[0084] Step 107: Input the fused frame features into a preset feature reconstruction model for resolution enhancement to obtain a reconstructed echocardiogram; wherein the resolution of the reconstructed echocardiogram is greater than that of the frame image.
[0085] Step 108: Replace the current frame image with the reconstructed echocardiogram image to obtain the target reconstructed echocardiogram video.
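The data flow of steps 101-108 can be sketched as a short driver function. This is only an outline under stated assumptions: the five models are passed in as hypothetical callables, "the two previous frames" is read here as the frame two positions before the current one, and the earliest frame is reused at the start of the video where earlier frames do not exist. None of these choices is confirmed by this application.

```python
def enhance_video(frames, reference, extract_frame, extract_ref,
                  aggregate, transfer, reconstruct):
    """Run steps 101-108 with the five models supplied as callables (all hypothetical)."""
    ref_feat = extract_ref(reference)                     # step 104
    initial = [aggregate(extract_frame(f), ref_feat)      # steps 103 and 105
               for f in frames]
    output = []
    for t in range(len(frames)):
        # Boundary handling is an assumption: reuse the earliest frame
        # when fewer than two earlier frames exist.
        prev1 = initial[max(t - 1, 0)]
        prev2 = initial[max(t - 2, 0)]
        fused = transfer(initial[t], prev1, prev2)        # step 106
        output.append(reconstruct(fused))                 # steps 107 and 108
    return output
```

Because the reference features are computed once and reused for every frame, this layout matches the statement that one reference echocardiogram image may correspond to different frame images of the same video.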
[0086] Please refer to Figure 3A, Figure 3B, and Figure 3C. Figure 3A is a frame image provided in the embodiments of this application, Figure 3B is a reference echocardiogram provided in the embodiments of this application, and Figure 3C is a reconstructed echocardiogram provided in the embodiments of this application. As can be seen from Figure 3A and Figure 3C, the resolution is substantially enhanced after the resolution enhancement of this embodiment: the high-resolution texture information of Figure 3B is transferred onto the low-resolution image of Figure 3A, yielding the high-resolution image of Figure 3C.
[0087] Steps 101-108 are described in detail below.
[0088] In step 101, an echocardiographic video is acquired. This echocardiographic video includes multiple frames.
[0089] A frame image is an echocardiogram. An echocardiogram is formed by applying the principle of ultrasonic ranging: pulsed ultrasound waves pass through the chest wall and soft tissue to measure the periodic activity of the underlying structures, such as the heart walls, ventricles, and valves. The activity of each structure is displayed on the monitor as a curve against time, and these curves, recorded by a recorder, constitute the echocardiogram.
[0090] There are several ways to acquire echocardiogram images. For example, various ultrasound image acquisition devices (such as echocardiographs) can be used to acquire images of the human heart, resulting in echocardiogram videos. Alternatively, echocardiogram videos can be obtained from local or external databases, or by searching for echocardiogram videos on the internet.
[0091] In step 102, a reference echocardiogram is acquired. The resolution of the reference echocardiogram is greater than that of the frame image.
[0092] A reference echocardiogram is similar to a frame image; it is also an echocardiogram image. However, unlike a frame image, the resolution of a reference echocardiogram is greater than that of a frame image. In one example, at the same size, the resolution of the high-resolution reference echocardiogram image is 448*448 pixels, while the resolution of the low-resolution image is 112*112 pixels. In the embodiments of this application, one reference echocardiogram image may correspond to different frame images in an echocardiogram video.
[0093] There are several ways to obtain reference echocardiogram images. If an ultrasound acquisition device is used to acquire a frame image of a certain part to obtain an echocardiogram video, a higher resolution ultrasound acquisition device can be used to scan similar parts to obtain the required reference echocardiogram image.
[0094] In step 103, each frame image is input into a preset first feature extraction model for feature extraction to obtain the image features to be enhanced for each frame image. Here, the first feature extraction model is used to extract features from the frame images, thereby obtaining the image features to be enhanced for the frame images.
[0095] Referring to Figure 4, the first feature extraction model includes an upsampling layer and at least one convolutional block (e.g., five convolutional blocks), each convolutional block including a convolutional layer, a normalization layer, and a linear rectified function.
[0096] The upsampling layer uses bilinear interpolation to upsample the image. Each convolutional layer is followed by a normalization layer and a rectified linear unit (ReLU). The input to each convolutional block is the features at the previous scale, and the output is the features at the next scale. Compared with the features at the previous scale, the features at the next scale are half the size in the spatial dimension and twice the size in the channel dimension.
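The scale progression through the convolutional blocks can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `pyramid_shapes` and the starting channel count of 16 are assumptions not stated in this application, and "half the size in the spatial dimension" is read as halving both height and width.

```python
def pyramid_shapes(height, width, channels, num_blocks=5):
    """Return the (height, width, channels) of the features after each block."""
    shapes = []
    for _ in range(num_blocks):
        # Each block halves the spatial size and doubles the channel count.
        height, width, channels = height // 2, width // 2, channels * 2
        shapes.append((height, width, channels))
    return shapes

# Starting from a 448*448 map with an assumed 16 channels and five blocks.
shapes = pyramid_shapes(448, 448, 16)
```

Under these assumptions the five blocks produce maps of 224, 112, 56, 28, and 14 pixels per side, with 32, 64, 128, 256, and 512 channels respectively.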
[0097] After upsampling, the frame image has the same pixel dimensions as the reference echocardiogram image. In one example, for echocardiograms of the same scene, the reference echocardiogram image has a resolution of 448*448 pixels, while the frame image has a resolution of 112*112 pixels. After the reference echocardiogram image and the frame image are input into the feature extraction module, they first pass through an upsampling layer that aligns their size with that of the high-resolution image to be reconstructed (448*448 pixels in this embodiment), and are then passed to a convolutional layer for processing.
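The size alignment from 112*112 to 448*448 pixels by bilinear interpolation can be sketched as follows. The function name `bilinear_upsample` and the half-pixel-center coordinate convention are assumptions; deep learning frameworks offer more than one convention, and this application does not say which is used.

```python
import numpy as np

def bilinear_upsample(image, factor=4):
    """Upsample a 2-D image by an integer `factor` using bilinear interpolation."""
    h, w = image.shape
    out_h, out_w = h * factor, w * factor
    # Map output pixel centers back to input coordinates (half-pixel convention),
    # clamping so that border pixels replicate the nearest input value.
    ys = np.clip((np.arange(out_h) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) / factor - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :]   # horizontal interpolation weights
    top = image[np.ix_(y0, x0)] * (1 - wx) + image[np.ix_(y0, x1)] * wx
    bottom = image[np.ix_(y1, x0)] * (1 - wx) + image[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy
```

Calling `bilinear_upsample` on a 112*112 array with `factor=4` returns a 448*448 array, the size of the reference echocardiogram image in the example above.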
[0098] The Rectified Linear Activation Function (ReLU) is an activation function. To balance computational simplicity and model flexibility, the model employs a combination of linear operations on the processing nodes and a non-linear transformation of the activation function. ReLU is a piecewise linear function; if the input is positive, it outputs directly; otherwise, it outputs zero. Its advantages include making the model easier to train and typically achieving better performance.
[0099] Normalization layers normalize the features before they are passed to the activation function. Generally, model inputs are normalized so that they conform to a normal distribution with mean u and variance h, which accelerates convergence. However, after convolution in a convolutional layer, the result may no longer follow this normal distribution. Feeding this result into the activation function can push some convolution outputs into the saturation region of the activation function, leading to vanishing gradients during training. Normalization makes the convolution result conform to a normal distribution again, preventing vanishing gradients when the result is fed into the activation function.
[0100] In one embodiment, step 103 specifically includes:
[0101] The frame image is upsampled by the first upsampling layer to obtain the frame image to be processed;
[0102] The first image features are obtained by performing convolution processing on the image of the frame to be processed through the first convolutional layer.
[0103] The first image features are normalized by the first normalization layer to obtain the second image features;
[0104] The second image features are activated by the first linear rectification function to obtain the image features to be enhanced in the frame image.
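The four sub-steps of step 103 can be chained in a small single-channel sketch. It is a simplified illustration rather than the model of this application: nearest-neighbor repetition stands in for the bilinear upsampling layer to keep the example short, the normalization is computed over the whole feature map, the kernel is supplied by the caller instead of being learned, and all function names are hypothetical.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Single-channel 2-D convolution (cross-correlation form), zero padded, odd kernel size."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def normalize(features, eps=1e-5):
    """Shift and scale features to zero mean and (approximately) unit variance."""
    return (features - features.mean()) / np.sqrt(features.var() + eps)

def extract_features(frame, kernel, factor=4):
    # Upsampling: nearest-neighbor repetition used here as a stand-in for bilinear.
    upsampled = np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)
    first = conv2d_same(upsampled, kernel)   # first image features
    second = normalize(first)                # second image features
    return np.maximum(second, 0.0)           # ReLU -> image features to be enhanced
```

The ordering matters: normalizing before the activation centers the convolution output around zero, so roughly half of the values pass through the ReLU instead of the output being dominated by one sign.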
[0105] In step 104, the reference echocardiogram image is input into a preset second feature extraction model for feature extraction to obtain reference image features. This second feature extraction model is used to extract features from the reference echocardiogram image to obtain the reference image features.
[0106] Similar to the first feature extraction model, the second feature extraction model also includes an upsampling layer and at least one convolutional block, each convolutional block including a convolutional layer, a normalization layer, and a linear rectified function.
[0107] In one embodiment, step 104 specifically includes:
[0108] The reference echocardiogram image is upsampled using a second upsampling layer to obtain the reference image to be processed.
[0109] The third convolutional layer performs convolution processing on the reference image to be processed, thereby obtaining the third image features;
[0110] The third image features are normalized using the third normalization layer to obtain the fourth image features;
[0111] The fourth image feature is activated by the third linear rectification function to obtain the reference image feature.
[0112] In step 105, the image features to be enhanced and the reference image features of each frame are input into a preset feature dynamic aggregation model for feature aggregation to obtain the initial frame features of each frame.
[0113] Specifically, after the features of the frame image and of the reference echocardiogram image are obtained, the feature dynamic aggregation model is used to fuse in the texture contained in the reference image features. The feature dynamic aggregation model can specifically employ deformable convolutional networks.
[0114] In one embodiment, the image features to be enhanced include the initial features of the initial pixels, and the reference image features include the original reference features of the reference pixels. Step 105 specifically includes:
[0115] The selected set of reference pixels is obtained by searching for reference pixels based on the initial pixel points.
[0116] The original reference features of the selected reference pixel set are aggregated to obtain aggregated reference features;
[0117] The initial features and aggregated reference features are fused together to obtain the aggregated features of the initial pixels.
[0118] The aggregated features of the initial pixels are merged to obtain the initial frame features of the frame image.
[0119] Furthermore, in another embodiment, the step of searching for reference pixels based on the initial pixel points to obtain a selected set of reference pixels specifically includes:
[0120] Based on the position of the initial pixel in the frame image, the position of each reference pixel in the reference echocardiogram is matched to obtain the selected reference pixel.
[0121] A preset image region is defined on the reference echocardiogram based on the selected reference pixels;
[0122] The reference pixels located within the preset image area are merged to obtain the selected reference pixel set.
[0123] Specifically, the initial pixel position can be (x1, y1), and the reference pixel position can be (x2, y2). A distance d is calculated from the positions of the initial pixel and each reference pixel, and the reference pixel with the smallest distance d is taken as the selected reference pixel. The preset image region can be a 3x3 grid centered on the selected reference pixel. In one example, for each pixel p in the input low-resolution image, the corresponding point p' in the reference echocardiogram image is first obtained. Then, the feature dynamic aggregation model aggregates the texture feature information around p'. Let $p_0$ represent the spatial difference between positions p and p', i.e., $p_0 = p' - p$. The offsets $p_k$ enumerate the pixels around pixel p, i.e., the coordinates $(p + p_k)$ form the 3x3 grid centered on pixel p. Let the original reference feature be x; the aggregated reference feature y at position p is then calculated using a modified deformable convolutional network as follows:
[0124] $$y(p) = \sum_{k=1}^{9} \omega_k \cdot x(p + p_0 + p_k + \Delta p_k) \cdot \Delta m_k$$ Formula (1)
[0125] where $p_k \in \{(-1,1), (-1,0), (-1,-1), (0,1), (0,0), (0,-1), (1,1), (1,0), (1,-1)\}$,
[0126] $\omega_k$ denotes the weights of the convolution kernel, $\Delta p_k$ denotes the learnable offset, and $\Delta m_k$ denotes a learnable modulation scalar.
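The aggregation described above (a weighted sum over the nine positions around the matched reference point, each shifted by a learnable offset and scaled by a modulation scalar) can be illustrated with a small sketch. It assumes integer offsets and treats samples outside the feature map as zero; a real deformable convolution would interpolate at fractional offsets. The function name and the uniform weights are hypothetical.

```python
# Offsets p_k enumerating the 3x3 neighbourhood, as (row, col) pairs.
P_K = [(-1, 1), (-1, 0), (-1, -1), (0, 1), (0, 0), (0, -1), (1, 1), (1, 0), (1, -1)]

def aggregate_reference(x, p, p0, weights, offsets, modulation):
    """Sum of weights[k] * x(p + p0 + p_k + offsets[k]) * modulation[k] over k.

    x          : 2-D reference feature map (list of lists)
    p          : (row, col) of the pixel in the low-resolution frame
    p0         : (row, col) displacement from p to its match p' in the reference
    offsets    : nine integer (row, col) offsets (assumed integer for simplicity)
    modulation : nine modulation scalars
    """
    h, w = len(x), len(x[0])
    y = 0.0
    for k, pk in enumerate(P_K):
        r = p[0] + p0[0] + pk[0] + offsets[k][0]
        c = p[1] + p0[1] + pk[1] + offsets[k][1]
        if 0 <= r < h and 0 <= c < w:  # samples outside the map contribute zero
            y += weights[k] * x[r][c] * modulation[k]
    return y

# 4x4 reference map with value 4*row + col.
ref = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
y = aggregate_reference(ref, p=(1, 1), p0=(1, 1),
                        weights=[1 / 9] * 9,
                        offsets=[(0, 0)] * 9,
                        modulation=[1.0] * 9)
```

With zero offsets, unit modulation, and uniform weights, the result is simply the mean of the 3x3 neighbourhood around p' = (2, 2), which makes the role of each term easy to check by hand.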
[0127] In step 106, the initial frame features of the current frame image, the initial frame features of the previous frame image, and the initial frame features of the previous two frames image are input into a preset feature transmission model for feature frame-level fusion to obtain the fused frame features of the current frame image.
[0128] After the aggregated reference image features y are obtained, each frame of the echocardiogram video, together with its corresponding aggregated reference image features y, serves as the initial frame features and is input into the feature transfer model. As shown in Figure 5, the feature transfer model consists of multiple feature transmission blocks.
[0129] The feature transmission block includes an optical flow alignment network and a multi-frame self-attention block. Step 106 specifically includes:
[0130] The optical flow alignment network takes the initial frame features of any two frame images and predicts an optical flow map of the difference between the two frame images, and the initial frame features are segmented into patch blocks; wherein the two sets of initial frame features are: the initial frame features of the current frame image and the initial frame features of the previous frame image, or the initial frame features of the previous frame image and the initial frame features of the two previous frames image, or the initial frame features of the current frame image and the initial frame features of the two previous frames image.
[0131] Within each patch block, the average value of the optical flow map is calculated, and all pixel values within the patch block are shifted and aligned using the average value to obtain the aligned image features of the current frame, the aligned image features of the previous frame, and the aligned image features of the two previous frames.
[0132] The aligned image features of the current frame, the previous frame, and the two previous frames are input into a multi-frame self-attention block for inter-frame feature processing to obtain the fused image features of the current frame.
[0133] Specifically, the input to each feature transfer block is the initial frame feature $X_t$ of the current frame image, the initial frame feature $X_{t-1}$ of the previous frame image, and the initial frame feature $X_{t-2}$ of the two previous frames image. The optical flow maps between frames are predicted pairwise using a pre-trained optical flow alignment network (SpyNet). The aggregated reference image features are then divided into 7x7 patches. Within each patch, the average optical flow is calculated, and all pixels within the patch are aligned using this average value. The aligned features are then fed into several Multi-Frame Self-Attention Blocks (MFSABs) for inter-frame feature extraction. The MFSAB network structure is shown in Figure 6. Finally, the features resulting from inter-frame information fusion, that is, the fused image features of the current frame, are obtained.
[0134] Referring to Figure 6, the multi-frame self-attention block includes, connected in sequence, a normalization layer, a multi-frame self-attention layer, a residual connection layer, a normalization layer, a multilayer perceptron, and a residual connection layer. The step of inputting the aligned image features of the current frame, the aligned image features of the previous frame, and the aligned image features of the two previous frames into the multi-frame self-attention block for inter-frame feature processing to obtain the fused image features of the current frame specifically includes:
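The patch-level alignment can be sketched as below: the flow vectors inside each patch are averaged, and every pixel of the patch is shifted by that one average vector. This is a simplified illustration that assumes backward warping with the mean flow rounded to whole pixels and zero fill outside the map; the flow itself would come from the optical flow network and is supplied here by hand. The function names are hypothetical.

```python
def patch_mean_flow(flow, patch=7):
    """Replace each flow vector (dy, dx) by the mean flow of its patch."""
    h, w = len(flow), len(flow[0])
    out = [[None] * w for _ in range(h)]
    for r0 in range(0, h, patch):
        for c0 in range(0, w, patch):
            cells = [(r, c)
                     for r in range(r0, min(r0 + patch, h))
                     for c in range(c0, min(c0 + patch, w))]
            mean_dy = sum(flow[r][c][0] for r, c in cells) / len(cells)
            mean_dx = sum(flow[r][c][1] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = (mean_dy, mean_dx)
    return out

def align(features, flow, patch=7):
    """Shift all pixels of a patch by the rounded patch-mean flow.

    Assumed convention: the output at (r, c) is read from the input at
    (r + dy, c + dx); positions that fall outside the map are set to zero.
    """
    h, w = len(features), len(features[0])
    mean = patch_mean_flow(flow, patch)
    aligned = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sr = r + round(mean[r][c][0])
            sc = c + round(mean[r][c][1])
            if 0 <= sr < h and 0 <= sc < w:
                aligned[r][c] = features[sr][sc]
    return aligned

features = [[1.0, 2.0], [3.0, 4.0]]
flow = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]  # every pixel points one column right
aligned = align(features, flow, patch=2)
```

Using one shift per patch keeps the pixels of a patch together, so local texture inside the patch is not distorted by per-pixel flow errors.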
[0135] The alignment image features of the current frame are normalized by a normalization layer to obtain the first current frame level features. The alignment image features of the previous frame are normalized by a normalization layer to obtain the first previous frame level features. The alignment image features of the two previous frames are normalized by a normalization layer to obtain the first two previous frame level features.
[0136] The first current frame level features and the first previous frame level features are processed by the multi-frame self-attention layer to obtain the second current frame level features; the aligned image features of the current frame and the second current frame level features are merged by a first residual connection layer to obtain the third current frame level features; the third current frame level features are normalized by a normalization layer to obtain the fourth current frame level features; and the fourth current frame level features are processed by the multilayer perceptron to obtain the fifth current frame level features.
[0137] The first previous frame level feature and the first two previous frame level features are processed by a multi-frame self-attention layer to obtain the second previous frame level feature; the aligned image features of the previous frame image and the second previous frame level feature are merged by a first residual connection layer to obtain the third previous frame level feature; the third previous frame level feature is normalized by a normalization layer to obtain the fourth previous frame level feature; and the fourth previous frame level feature is processed by a multilayer perceptron to obtain the fifth previous frame level feature.
[0138] The first two previous frame level features and the first current frame level features are processed by the multi-frame self-attention layer to obtain the second two previous frame level features; the aligned image features of the two previous frames image and the second two previous frame level features are merged by a first residual connection layer to obtain the third two previous frame level features; the third two previous frame level features are normalized by a normalization layer to obtain the fourth two previous frame level features; and the fourth two previous frame level features are processed by the multilayer perceptron to obtain the fifth two previous frame level features.
[0139] The fifth current frame level features, the fifth previous frame level features, and the fifth two previous frame level features are merged to obtain the fused image features of the current frame.
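The three parallel branches of the multi-frame self-attention block can be sketched as follows. This is a toy illustration: tokens are short lists of floats, the attention has no learned projections, the multilayer perceptron is replaced by a single ReLU, and the final merge is an average. All of these are simplifying assumptions, and the function names are hypothetical. Only the order of operations and the pairing of frames follow the description.

```python
import math

def layer_norm(v, eps=1e-5):
    """Normalize one token to zero mean and unit variance."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

def attention(queries, keys_values):
    """Scaled dot-product attention; keys and values are the same tokens."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        out.append([sum(e / total * kv[i] for e, kv in zip(exps, keys_values))
                    for i in range(d)])
    return out

def mlp(v):
    """Placeholder for the multilayer perceptron (no learned weights)."""
    return [max(0.0, a) for a in v]

def branch(aligned_a, aligned_b):
    """One branch: normalize, attend from frame a to frame b, add the
    residual, normalize again, then apply the perceptron."""
    first_a = [layer_norm(t) for t in aligned_a]
    first_b = [layer_norm(t) for t in aligned_b]
    second = attention(first_a, first_b)
    third = [[x + s for x, s in zip(t, u)] for t, u in zip(aligned_a, second)]
    fourth = [layer_norm(t) for t in third]
    return [mlp(t) for t in fourth]

def fuse(cur, prev, prev2):
    """Pairings follow the text: current/previous, previous/two-previous,
    two-previous/current. Merging by averaging is an assumption."""
    fifth_cur = branch(cur, prev)
    fifth_prev = branch(prev, prev2)
    fifth_prev2 = branch(prev2, cur)
    return [[(a + b + c) / 3 for a, b, c in zip(x, y, z)]
            for x, y, z in zip(fifth_cur, fifth_prev, fifth_prev2)]

cur = [[1.0, 2.0], [3.0, 1.0]]
prev = [[2.0, 1.0], [0.0, 4.0]]
prev2 = [[1.0, 1.5], [2.0, 0.5]]
fused = fuse(cur, prev, prev2)
```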
[0140] In step 107, the fused frame features are input into a preset feature reconstruction model for resolution enhancement to obtain a reconstructed echocardiogram; wherein the resolution of the reconstructed echocardiogram is greater than that of the frame image.
[0141] After the fused frame features are obtained, a high-resolution echocardiogram is reconstructed using the feature reconstruction model, yielding the reconstructed echocardiogram image. The feature reconstruction model contains at least one convolutional block (e.g., five convolutional blocks), each consisting of a convolutional layer, a normalization layer, and a rectified linear unit (ReLU) function. Each convolutional block takes features from the previous scale as input and outputs features at the next scale. Compared with the previous scale, the features at the next scale are twice as large in the spatial dimension but half the size in the channel dimension.
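As a worked example of the scale progression, the helper below lists the feature shape after each convolutional block. It assumes that "twice as large in the spatial dimension" means both height and width double; the starting shape (64 channels, 28x28) and the function name are hypothetical.

```python
def reconstruction_shapes(channels, height, width, num_blocks=5):
    """Return the (channels, height, width) shape before the first block
    and after each block, doubling the spatial size and halving channels."""
    shapes = [(channels, height, width)]
    for _ in range(num_blocks):
        channels, height, width = channels // 2, height * 2, width * 2
        shapes.append((channels, height, width))
    return shapes

shapes = reconstruction_shapes(64, 28, 28, num_blocks=3)
# shapes == [(64, 28, 28), (32, 56, 56), (16, 112, 112), (8, 224, 224)]
```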
[0142] In one embodiment, the feature reconstruction model includes a second convolutional layer, a second normalization layer, and a second linear rectified function layer. Step 107 specifically includes:
[0143] The fused image features are processed by convolution in the second convolutional layer to obtain the first intermediate echocardiogram features.
[0144] The features of the first intermediate echocardiogram are normalized by the second normalization layer to obtain the features of the second intermediate echocardiogram.
[0145] The reconstructed echocardiogram is obtained by activating the features of the second intermediate echocardiogram through the second linear rectified function layer.
[0146] In step 108, the current frame image is replaced with the reconstructed echocardiogram image to obtain the target reconstructed echocardiogram video.
[0147] Specifically, each time a reconstructed echocardiogram corresponding to a current frame is obtained, the current frame is replaced with the reconstructed echocardiogram to obtain the target reconstructed echocardiogram video, until all frames have been replaced. Alternatively, after obtaining the reconstructed echocardiograms corresponding to all frames, the target echocardiogram video is obtained by merging the reconstructed echocardiograms.
[0148] As shown in Figure 7, in one embodiment, the first feature extraction model, the second feature extraction model, the feature dynamic aggregation model, the feature transfer model, and the feature reconstruction model are jointly trained in advance through the following process:
[0149] Step 201: Obtain a sample set, which includes sample echocardiogram images;
[0150] Step 202: Reduce the resolution of the sample echocardiogram image to obtain the echocardiogram image to be enhanced.
[0151] Step 203: Obtain the features of the echocardiogram to be enhanced through the first feature extraction model to obtain the features of the sample image to be enhanced;
[0152] Step 204: Obtain the features of the sample echocardiogram image through the second feature extraction model to obtain the sample reference image features;
[0153] Step 205: The features of the sample image to be enhanced and the features of the sample reference image are aggregated by the feature dynamic aggregation model to obtain the features of the frame to be enhanced of the echocardiogram image to be enhanced.
[0154] Step 206: Perform feature frame-level fusion of the features of at least two frames to be enhanced using the feature transfer model to obtain the target enhancement image features of the echocardiogram to be enhanced.
[0155] Step 207: The target enhancement image features are enhanced in resolution using a feature reconstruction model to obtain a target enhancement echocardiogram.
[0156] Step 208: Construct a mean absolute error loss function based on the sample echocardiogram and the target enhanced echocardiogram to obtain the first loss data; construct a similarity learning loss function based on the sample echocardiogram and the target enhanced echocardiogram to obtain the second loss data; merge the first loss data and the second loss data to obtain the target loss data.
[0157] Step 209: Adjust the parameters of the first feature extraction model, the second feature extraction model, the feature dynamic aggregation model, the feature transfer model, and the feature reconstruction model based on the target loss data.
[0158] The sample set in step 201 includes multiple samples, each containing a sample echocardiogram image. Each sample echocardiogram image is also a single echocardiogram image. There are various ways to acquire sample echocardiogram images. For example, they can be obtained by using various ultrasound image acquisition devices (echocardiography machines, etc.) to acquire images of the human heart. Alternatively, they can be obtained from a local or external database, or by searching for sample echocardiogram images on the network, and so on.
[0159] In step 202, the resolution of the sample echocardiogram image is reduced to obtain the echocardiogram image to be enhanced, whose resolution is lower than that of the sample echocardiogram image. Resolution reduction can be achieved by, for example, reducing the size of the sample echocardiogram image.
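One way to build such a training pair is sketched below. The source only says the image size is reduced, so block averaging by an integer factor is an assumption, and the function name is hypothetical.

```python
def downsample(img, factor=2):
    """Reduce resolution by averaging each factor x factor block (assumed method)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[r * factor + dr][c * factor + dc]
                 for dr in range(factor) for dc in range(factor)) / factor ** 2
             for c in range(w)]
            for r in range(h)]

# The sample image acts as ground truth (and as the reference);
# its downsampled copy is the image to be enhanced.
sample = [[1, 1, 2, 2],
          [1, 1, 2, 2],
          [3, 3, 4, 4],
          [3, 3, 4, 4]]
low_res = downsample(sample, factor=2)
```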
[0160] Steps 203 to 207 are similar to steps 103 to 107. After passing through the feature extraction model, the feature dynamic aggregation model, the feature transfer model, and the feature reconstruction model in sequence, the target enhanced echocardiogram image is obtained.
[0161] Step 208 involves two loss functions: the mean absolute error loss function and the learned perceptual image patch similarity (LPIPS) loss function.
[0162] The first loss data $L_{rec}$ is the mean absolute error loss between the target enhanced echocardiogram, denoted $I_{SR}$, and the sample echocardiogram, denoted $I_{HR}$, which serves as the ground truth:
[0163] $$L_{rec} = \| I_{SR} - I_{HR} \|_1$$ Formula (2)
[0164] The second loss data $L_{LPIPS}$ is the learned perceptual image patch similarity loss:
[0165]
[0166] The image features used here are extracted by a VGG network pre-trained on ImageNet: the network output $I_{SR}$ and the high-resolution echocardiogram $I_{HR}$ are each input into the pre-trained VGG network model to obtain their image features. Here, $\phi_l(I)$ represents the l-th layer features of image I extracted by the VGG network, $H_l$ represents the height of $\phi_l(I)$, and $W_l$ represents the width of $\phi_l(I)$.
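As a sketch of what the perceptual term could look like, one form consistent with the symbols defined here is the commonly used LPIPS-style distance without learned per-channel weights. This form is an assumption and is not confirmed by the source; the spatial indices h and w are introduced only for illustration.

```latex
L_{LPIPS} = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w}
  \left\| \phi_l(I_{SR})_{h,w} - \phi_l(I_{HR})_{h,w} \right\|_2^2
```

Under this form, each layer's squared feature difference is averaged over its spatial positions before the layers are summed, so layers with larger feature maps do not dominate the loss.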
[0167] The target loss data is:
[0168] $$L = L_{rec} + \lambda_{LPIPS} L_{LPIPS}$$ Formula (4)
[0169] where $\lambda_{LPIPS}$ is a weight term, typically set to 1. The model is trained with a batch size of 2 using the Adam optimizer, with an initial learning rate of $1 \times 10^{-4}$, optimizer hyperparameters β1 = 0.9 and β2 = 0.999, and a weight decay of $1 \times 10^{-4}$. Furthermore, to avoid gradient explosion, gradient values during backpropagation are clipped to the interval [-0.1, 0.1]. During training, 224×224 pixel blocks are randomly cropped, and a two-stage training strategy is used.
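The combination of the two losses and the gradient clipping can be illustrated with a short sketch. The perceptual term below is a stand-in that compares precomputed per-layer feature maps with a spatially averaged squared difference; in practice these maps would come from the pre-trained VGG network, which is not reproduced here. The function names and the sample values are hypothetical.

```python
def l1_loss(sr, hr):
    """Mean absolute error between two images of equal size."""
    n = len(sr) * len(sr[0])
    return sum(abs(a - b) for rs, rh in zip(sr, hr) for a, b in zip(rs, rh)) / n

def perceptual_loss(feats_sr, feats_hr):
    """Stand-in perceptual term: for each layer, the squared feature
    difference averaged over the layer's H x W positions, summed over layers."""
    total = 0.0
    for fs, fh in zip(feats_sr, feats_hr):
        h, w = len(fs), len(fs[0])
        total += sum((a - b) ** 2
                     for rs, rh in zip(fs, fh)
                     for a, b in zip(rs, rh)) / (h * w)
    return total

def total_loss(sr, hr, feats_sr, feats_hr, lam=1.0):
    """L = L_rec + lam * L_LPIPS, with lam defaulting to 1."""
    return l1_loss(sr, hr) + lam * perceptual_loss(feats_sr, feats_hr)

def clip_gradients(grads, limit=0.1):
    """Clip each gradient value to the interval [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]

sr = [[0.0, 1.0], [1.0, 0.0]]
hr = [[0.0, 0.0], [1.0, 1.0]]
feats_sr = [[[1.0, 2.0]]]  # one layer with a 1x2 feature map (placeholder values)
feats_hr = [[[1.0, 0.0]]]
loss = total_loss(sr, hr, feats_sr, feats_hr, lam=1.0)
clipped = clip_gradients([0.5, -0.3, 0.05])
```

Here the reconstruction term is 0.5 and the stand-in perceptual term is 2.0, so the total is 2.5. The clipping leaves small gradients untouched and caps the rest at ±0.1.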
[0170] Referring to Figure 8, this application also provides a super-resolution enhancement device for echocardiogram videos, which can implement the above-mentioned super-resolution enhancement method for echocardiogram videos. Figure 8 is a block diagram of the module structure of the super-resolution enhancement device for echocardiogram video provided in this application embodiment. The device includes: a video acquisition module 301, a reference image acquisition module 302, a feature extraction module for the image to be enhanced 303, a reference image feature extraction module 304, a feature aggregation module 305, a feature frame-level fusion module 306, a feature reconstruction module 307, and an image merging module 308. The video acquisition module 301 is used to acquire an echocardiogram video, which includes frame images. The reference image acquisition module 302 is used to acquire a reference echocardiogram image, wherein the resolution of the reference echocardiogram image is greater than the resolution of the frame images. The feature extraction module for the image to be enhanced 303 inputs each frame image into a preset first feature extraction model for feature extraction to obtain the image features to be enhanced for each frame image. The reference image feature extraction module 304 inputs the reference echocardiogram image into a preset second feature extraction model for feature extraction to obtain the reference image features. The feature aggregation module 305 inputs the image features to be enhanced and the reference image features of each frame image into a preset feature dynamic aggregation model for feature aggregation to obtain the initial frame features of each frame image. The feature frame-level fusion module 306 inputs the initial frame features of the current frame image, the initial frame features of the previous frame image, and the initial frame features of the two previous frames image into a preset feature transmission model for feature frame-level fusion to obtain the fused frame features of the current frame image. The feature reconstruction module 307 inputs the fused frame features into a preset feature reconstruction model for resolution enhancement to obtain a reconstructed echocardiogram image, wherein the resolution of the reconstructed echocardiogram image is greater than the resolution of the frame image. The image merging module 308 replaces the current frame image with the reconstructed echocardiogram image to obtain the target reconstructed echocardiogram video.
[0171] It should be noted that the specific implementation of the super-resolution enhancement device for echocardiogram video is basically the same as the specific embodiment of the super-resolution enhancement method for echocardiogram video described above, and will not be repeated here.
[0172] This application also provides an electronic device, which includes: a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for communication between the processor and the memory. When the program is executed by the processor, it implements the above-described super-resolution enhancement method for echocardiogram video. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.
[0173] Referring to Figure 9, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes:
[0174] The processor 401 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application.
[0175] The memory 402 can be implemented as a read-only memory (ROM), static storage device, dynamic storage device, or random access memory (RAM). The memory 402 can store the operating system and other applications. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 402 and called by the processor 401 to execute the super-resolution enhancement method for echocardiogram video according to the embodiments of this application.
[0176] Input / output interface 403 is used to implement information input and output;
[0177] The communication interface 404 is used to enable communication and interaction between this device and other devices. Communication can be achieved through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
[0178] Bus 405 transmits information between various components of the device (e.g., processor 401, memory 402, input / output interface 403, and communication interface 404);
[0179] The processor 401, memory 402, input / output interface 403 and communication interface 404 are connected to each other within the device via bus 405.
[0180] This application embodiment also provides a storage medium, which is a computer-readable storage medium. The storage medium stores one or more programs, which can be executed by one or more processors to implement the above-described method for super-resolution enhancement of echocardiogram video.
[0181] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0182] The super-resolution enhancement method, apparatus, electronic device, and computer-readable storage medium for echocardiogram videos provided in this application introduce a high-resolution reference image and use the texture information of the reference image to add constraints to the super-resolution enhancement task, thereby reducing the uncertainty of the super-resolution task, making the optimization target clearer, and comprehensively improving the super-resolution enhancement performance.
[0183] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.
[0184] It will be understood by those skilled in the art that the technical solutions shown in Figure 2 and Figure 7 do not constitute a limitation on the embodiments of this application, which may include more or fewer steps than shown, combine certain steps, or use different steps.
[0185] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.
[0186] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0187] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And / or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
[0188] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0189] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0190] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause an electronic device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0191] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.
Claims
1. A method for super-resolution enhancement of echocardiographic videos, characterized in that, The method includes: Acquire echocardiographic video, wherein the echocardiographic video includes frame images; Acquire a reference echocardiogram; wherein the resolution of the reference echocardiogram is greater than the resolution of the frame image; Each frame image is input into a preset first feature extraction model for feature extraction to obtain the image features to be enhanced for each frame image, wherein the image features to be enhanced include the initial features of the initial pixels; The reference echocardiogram is input into a preset second feature extraction model for feature extraction to obtain reference image features, which include the original reference features of reference pixels. The image features to be enhanced and the reference image features of each frame image are input into a preset feature dynamic aggregation model for feature aggregation to obtain the initial frame features of each frame image. This includes: searching for reference pixels based on the initial pixels to obtain a selected set of reference pixels; performing feature aggregation on the original reference features of the selected set of reference pixels to obtain aggregated reference features; fusing the initial features and the aggregated reference features to obtain the aggregated features of the initial pixels; and merging the aggregated features of the initial pixels to obtain the initial frame features of the frame image. The initial frame features of the current frame image, the initial frame features of the previous frame image, and the initial frame features of the previous two frames image are input into a preset feature transmission model for feature frame-level fusion to obtain the fused frame features of the current frame image. 
The fused frame features are input into a preset feature reconstruction model for resolution enhancement to obtain a reconstructed echocardiogram; wherein the resolution of the reconstructed echocardiogram is greater than the resolution of the frame image; The target reconstructed echocardiogram video is obtained by replacing the current frame image with the reconstructed echocardiogram image.
2. The method according to claim 1, characterized in that, Based on the initial pixel point, the reference pixel point is searched to obtain a selected reference pixel point set, including: Based on the position of the initial pixel in the frame image, the position of each reference pixel in the reference echocardiogram is matched to obtain the selected reference pixel. A preset image region is defined on the reference echocardiogram based on the selected reference pixels; The reference pixels located within the preset image area are merged to obtain the selected reference pixel set.
3. The method according to claim 1, characterized in that, The first feature extraction model includes a first upsampling layer, a first convolutional layer, a first normalization layer, and a first linear rectified function. Each frame of the echocardiogram is input into the preset first feature extraction model for feature extraction to obtain the image features to be enhanced for each frame, including: The frame image is upsampled using the first upsampling layer to obtain the frame image to be processed. The first image feature is obtained by performing convolution processing on the frame image to be processed through the first convolutional layer. The first image features are normalized using the first normalization layer to obtain the second image features; The second image feature is activated by the first linear rectification function to obtain the image feature to be enhanced in the frame image.
4. The method according to claim 1, characterized in that, The feature transfer model includes an optical flow alignment network and multi-frame self-attention blocks. The initial frame features of the current frame, the initial frame features of the previous frame, and the initial frame features of the two previous frames are input into a preset feature transfer model for feature frame-level fusion to obtain the fused frame features of the current frame, including: The optical flow alignment network is used to predict the initial frame features of any two frame images to obtain an optical flow map of the difference between the two frame images, and the initial frame features are segmented into patch blocks. Within each patch block, the average value of the optical flow map is calculated, and all pixel values within the patch block are shifted and aligned using the average value to obtain the aligned image features of the current frame, the aligned image features of the previous frame, and the aligned image features of the two previous frames. The alignment image features of the current frame, the alignment image features of the previous frame, and the alignment image features of the two previous frames are input into the multi-frame self-attention block for inter-frame feature processing to obtain the fused image features of the current frame.
5. The method according to claim 1, characterized in that the feature reconstruction model includes a second convolutional layer, a second normalization layer, and a second rectified linear unit (ReLU) layer; inputting the fused frame features into the preset feature reconstruction model for resolution enhancement to obtain the reconstructed echocardiogram image includes: performing convolution on the fused frame features through the second convolutional layer to obtain first intermediate echocardiogram features; normalizing the first intermediate echocardiogram features through the second normalization layer to obtain second intermediate echocardiogram features; and activating the second intermediate echocardiogram features through the second ReLU layer to obtain the reconstructed echocardiogram image.
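As a rough illustration of the reconstruction path in claim 5, the sketch below maps multi-channel fused frame features to a single-channel image. The 1x1 kernel, the single output channel, the per-map standardization, and the random weights are all assumptions; the claim only names the three layers and their order.

```python
import numpy as np

def reconstruct(fused, weight, bias=0.0, eps=1e-5):
    """Convolution, normalization, then ReLU applied to fused features of shape (C, H, W).

    weight has shape (O, C): a 1x1 convolution mixing C input channels into O outputs.
    """
    # Second convolutional layer: first intermediate echocardiogram features.
    first = np.einsum("oc,chw->ohw", weight, fused) + bias
    # Second normalization layer: second intermediate echocardiogram features.
    mean = first.mean(axis=(1, 2), keepdims=True)
    std = first.std(axis=(1, 2), keepdims=True)
    second = (first - mean) / (std + eps)
    # Second ReLU layer: reconstructed image.
    return np.maximum(second, 0.0)

rng = np.random.default_rng(0)
fused_features = rng.random((4, 16, 16))  # stand-in for fused frame features
weight = rng.standard_normal((1, 4))      # hypothetical, untrained weights
reconstructed = reconstruct(fused_features, weight)
print(reconstructed.shape)
```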
6. The method according to any one of claims 1 to 5, characterized in that the first feature extraction model, the second feature extraction model, the dynamic feature aggregation model, the feature transfer model, and the feature reconstruction model are jointly trained in advance through the following process: acquiring sample echocardiogram images; reducing the resolution of the sample echocardiogram images to obtain echocardiogram images to be enhanced; performing feature extraction on the echocardiogram images to be enhanced using the first feature extraction model to obtain sample image features to be enhanced; performing feature extraction on the sample echocardiogram images using the second feature extraction model to obtain sample reference image features; aggregating the sample image features to be enhanced and the sample reference image features using the dynamic feature aggregation model to obtain the frame features to be enhanced of the echocardiogram images to be enhanced; performing frame-level feature fusion on the frame features to be enhanced of at least two frames using the feature transfer model to obtain target enhanced image features of the echocardiogram images to be enhanced; performing resolution enhancement on the target enhanced image features using the feature reconstruction model to obtain a target enhanced echocardiogram image; constructing a mean absolute error loss function based on the sample echocardiogram image and the target enhanced echocardiogram image to obtain first loss data; constructing a similarity learning loss function based on the sample echocardiogram image and the target enhanced echocardiogram image to obtain second loss data; merging the first loss data and the second loss data to obtain target loss data; and adjusting the parameters of the first feature extraction model, the second feature extraction model, the dynamic feature aggregation model, the feature transfer model, and the feature reconstruction model based on the target loss data.
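The data preparation and loss construction in claim 6 can be sketched as below. The claim does not say how resolution is reduced, which similarity learning loss is used, or how the two losses are merged, so average pooling, a cosine-similarity loss, and a weighted sum with hypothetical weights `w_mae` and `w_sim` are assumptions. Nearest-neighbour upsampling stands in for the five jointly trained models.

```python
import numpy as np

def degrade(image, factor=2):
    """Reduce resolution by average pooling with an integer factor (assumed degradation)."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor  # crop so the size divides evenly
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def mae_loss(target, output):
    """First loss data: mean absolute error."""
    return float(np.abs(target - output).mean())

def similarity_loss(target, output, eps=1e-8):
    """Second loss data: one minus cosine similarity (a stand-in for the similarity learning loss)."""
    a, b = target.ravel(), output.ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return float(1.0 - cos)

def target_loss(target, output, w_mae=1.0, w_sim=1.0):
    """Target loss data: weighted sum of the two losses (assumed merge rule)."""
    return w_mae * mae_loss(target, output) + w_sim * similarity_loss(target, output)

sample = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a sample echocardiogram image
low_res = degrade(sample)                          # echocardiogram image to be enhanced
# Placeholder for the model pipeline: plain nearest-neighbour upsampling.
enhanced = np.repeat(np.repeat(low_res, 2, axis=0), 2, axis=1)
loss = target_loss(sample, enhanced)
print(low_res, loss)
```

In actual training, the gradient of the target loss with respect to the parameters of all five models would drive the parameter adjustment; that step depends on an automatic differentiation framework and is omitted here.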
7. A super-resolution enhancement device for echocardiogram videos, characterized in that the device includes: a video acquisition module, used to acquire an echocardiogram video, the echocardiogram video including frame images; a reference image acquisition module, used to acquire a reference echocardiogram, wherein the resolution of the reference echocardiogram is greater than the resolution of the frame images; a to-be-enhanced image feature extraction module, used to input each frame image into a preset feature extraction model for feature extraction to obtain the image features to be enhanced of each frame image, wherein the image features to be enhanced include the initial features of initial pixels; a reference image feature extraction module, used to input the reference echocardiogram into the feature extraction model for feature extraction to obtain reference image features, wherein the reference image features include the original reference features of reference pixels; a feature aggregation module, used to input the image features to be enhanced of each frame image and the reference image features into a preset dynamic feature aggregation model for feature aggregation to obtain the initial frame features of each frame image, which includes: searching the reference pixels based on the initial pixels to obtain a selected set of reference pixels; performing feature aggregation on the original reference features of the selected set of reference pixels to obtain aggregated reference features; fusing the initial features and the aggregated reference features to obtain the aggregated features of the initial pixels; and merging the aggregated features of the initial pixels to obtain the initial frame features of the frame image; a frame-level feature fusion module, used to input the initial frame features of the current frame image, the initial frame features of the previous frame image, and the initial frame features of the previous two frame images into a preset feature transfer model for frame-level feature fusion to obtain the fused frame features of the current frame image; a feature reconstruction module, used to input the fused frame features into a preset feature reconstruction model for resolution enhancement to obtain a reconstructed echocardiogram image, wherein the resolution of the reconstructed echocardiogram image is greater than the resolution of the frame image; and an image merging module, used to replace the current frame image with the reconstructed echocardiogram image to obtain the target reconstructed echocardiogram video.
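The search, aggregate, fuse, and merge steps performed by the feature aggregation module can be sketched as follows. The claim does not state how reference pixels are searched or how features are fused, so a top-k cosine-similarity search, softmax-weighted aggregation, and simple averaging for the fusion are assumptions, and the function names are hypothetical.

```python
import numpy as np

def aggregate_reference(initial, reference, k=3):
    """initial: (N, C) initial features of initial pixels; reference: (M, C) original reference features."""
    q = initial / (np.linalg.norm(initial, axis=1, keepdims=True) + 1e-8)
    r = reference / (np.linalg.norm(reference, axis=1, keepdims=True) + 1e-8)
    sim = q @ r.T                                        # (N, M) cosine similarities
    # Search: the k most similar reference pixels form the selected set.
    selected = np.argsort(-sim, axis=1)[:, :k]           # (N, k)
    sel_sim = np.take_along_axis(sim, selected, axis=1)
    # Aggregate: softmax-weighted mean of the selected reference features.
    weights = np.exp(sel_sim - sel_sim.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    aggregated = (weights[:, :, None] * reference[selected]).sum(axis=1)  # (N, C)
    # Fuse: average of initial and aggregated reference features (assumed fusion rule).
    return 0.5 * (initial + aggregated), selected

def initial_frame_features(frame_feat, ref_feat, k=3):
    """Merge the per-pixel aggregated features back into a (C, H, W) feature map."""
    c, h, w = frame_feat.shape
    fused, _ = aggregate_reference(frame_feat.reshape(c, -1).T, ref_feat.reshape(c, -1).T, k)
    return fused.T.reshape(c, h, w)

rng = np.random.default_rng(0)
frame_feat = rng.random((4, 8, 8))   # stand-in for image features to be enhanced
ref_feat = rng.random((4, 16, 16))   # stand-in for reference image features
frame_features = initial_frame_features(frame_feat, ref_feat)
print(frame_features.shape)
```

The exhaustive similarity matrix costs O(N·M) memory; a practical system would likely restrict the search to a local window or use an approximate nearest-neighbour search.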
8. An electronic device, characterized in that the electronic device includes a memory and a processor, the memory stores a computer program, and the processor executes the computer program to implement the method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1 to 6 is implemented.