A keyframe extraction method based on ultrasound video and related equipment
By combining convolutional neural networks and recurrent neural networks, the automatic extraction and detection of key frames in ultrasound videos is achieved, which solves the error problem caused by the reliance on the experience of medical personnel in existing technologies and improves the accuracy of diagnosis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN UNIV
- Filing Date
- 2023-04-21
- Publication Date
- 2026-06-26
Figure CN116486304B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of keyframe extraction, and more particularly to a method and related equipment for keyframe extraction based on ultrasound video.
Background Technology
[0002] In clinical practice, ultrasound physicians move the probe to display scan results in real time on the machine to identify keyframes of lesions. These frames are then saved and used to make diagnoses based on the information they contain. Because certain diagnostic attributes only appear on certain cross-sections of the lesion, physicians may overlook keyframe selection, leading to misinterpretations of lesion attributes and affecting diagnostic results. To alleviate clinical pressure and reduce reliance on ultrasound physician experience, many researchers have developed AI-assisted clinical diagnosis systems using artificial intelligence (AI). These efforts can improve the detection rate of breast lesions, help shorten image reading time, and demonstrate promising application prospects. However, currently, AI technology is mainly applied to the "post-processing" level of ultrasound images. This method involves detecting and analyzing lesions in two-dimensional images selected by medical staff from clinical videos; therefore, the accuracy of the algorithm largely depends on the image quality of the images selected by the medical staff.
[0003] Therefore, existing technologies still need improvement and development.
Summary of the Invention
[0004] The technical problem to be solved by the present invention is to provide a keyframe extraction method and related equipment based on ultrasound video, which addresses the above-mentioned deficiencies of the prior art. The aim is to solve the problem that the keyframe extraction of ultrasound video in the prior art mainly relies on the experience of medical personnel, which is subject to human error and can easily affect the diagnostic results.
[0005] The technical solution adopted by this invention to solve the problem is as follows:
[0006] In a first aspect, embodiments of the present invention provide a keyframe extraction method based on ultrasound video, wherein the method includes:
[0007] The original ultrasound video is acquired, and the original ultrasound video is preprocessed to obtain the target ultrasound video. The preprocessing includes one or more of the following: video cropping, channel transformation, and size scaling.
[0008] The target ultrasound video is input into a pre-trained keyframe extraction network;
[0009] The prediction data is obtained by the keyframe extraction network based on the target ultrasound video output, wherein the prediction data includes several keyframes and nodule information corresponding to each keyframe.
[0010] In one implementation, the keyframe extraction network includes:
[0011] A convolutional neural network is used to extract spatial information from the target ultrasound video.
[0012] A recurrent neural network is used to determine temporal information based on the spatial information, and to determine a number of keyframes and nodule information corresponding to each keyframe based on the temporal information, wherein the nodule information includes nodule position, nodule size, confidence level and nodule attribute information.
[0013] In one implementation, the network parameter update method of the keyframe extraction network includes:
[0014] Obtain the training ultrasound video and the key frame label information corresponding to the training ultrasound video, and input the training ultrasound video and the key frame label information into the key frame extraction network;
[0015] Obtain the training prediction data of the keyframe extraction network based on the training ultrasound video output;
[0016] The prediction error of the training prediction data is determined based on the keyframe label information, and the network parameters of the keyframe extraction network are updated based on the prediction error.
[0017] In one implementation, the network parameter update method of the keyframe extraction network includes:
[0018] The first reference information corresponding to the training ultrasound video is obtained through the nodule detection module, wherein the first reference information includes the nodule position, nodule size and confidence level corresponding to each key frame in the training ultrasound video.
[0019] The second reference information corresponding to the training ultrasound video is obtained through the attribute classification module, wherein the second reference information includes nodule attribute information corresponding to each key frame in the training ultrasound video.
[0020] The prediction error of the training prediction data is determined based on the first reference information and the second reference information, and the network parameters of the keyframe extraction network are updated based on the prediction error.
[0021] In one embodiment, the nodule detection module is further configured to:
[0022] The confidence level corresponding to each video frame in the target ultrasound video is obtained, and an original confidence level curve is generated based on the confidence level of each video frame.
[0023] The original confidence curve is smoothed by a window function of a preset size to obtain an initial confidence curve;
[0024] The initial confidence curve is smoothed using the window function to obtain the target confidence curve;
[0025] The video frames are filtered and selected based on the target confidence curve.
[0026] In one implementation, the attribute classification module works as follows:
[0027] For each input video frame, extract the image features corresponding to that video frame;
[0028] Based on the image features, the nodule attribute information corresponding to the video frame is determined, wherein the nodule attribute information includes several ultrasound attributes of the nodule and the probability of the nodule being benign or malignant.
[0029] In one embodiment, the keyframe extraction network is a reinforcement learning network, and the network parameter update method of the keyframe extraction network includes:
[0030] The training ultrasound video is input into the reinforcement learning network, wherein the initial extraction probability of each video frame in the training ultrasound video is equal.
[0031] The extraction probability of each video frame is randomly changed, and a key frame extraction action is determined based on the changed extraction probability. The training prediction data corresponding to the training ultrasound video is determined based on the key frame extraction action.
[0032] The reinforcement signal of the reinforcement learning network is determined based on the training prediction data, wherein the reinforcement signal is used to reflect the quality of the keyframe extraction action;
[0033] Determine whether the reinforcement signal has reached the target value; if not, update the network parameters of the reinforcement learning network based on the reinforcement signal.
[0034] Continue executing the step of randomly changing the extraction probability of each of the video frames until the reinforcement signal reaches the target value.
[0035] In a second aspect, embodiments of the present invention also provide a keyframe extraction system based on ultrasound video, wherein the system includes:
[0036] A preprocessing module is used to acquire the original ultrasound video, preprocess the original ultrasound video to obtain the target ultrasound video, wherein the preprocessing includes one or more of the following: video cropping, channel transformation, and size scaling.
[0037] The input module is used to input the target ultrasound video into a pre-trained keyframe extraction network;
[0038] The output module is used to acquire the prediction data output by the keyframe extraction network based on the target ultrasound video, wherein the prediction data includes several keyframes and nodule information corresponding to each keyframe.
[0039] In a third aspect, embodiments of the present invention also provide a terminal, wherein the terminal includes a memory and one or more processors; the memory stores one or more programs; the programs include instructions for executing the keyframe extraction method based on ultrasound video as described above; and the processor is used to execute the programs.
[0040] In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium storing a plurality of instructions, wherein the instructions are adapted to be loaded and executed by a processor to implement the steps of any of the above-described keyframe extraction methods based on ultrasound video.
[0041] The beneficial effects of this invention are as follows: This invention utilizes a keyframe extraction network to assist medical personnel in identifying keyframes in ultrasound videos, effectively reducing human error and improving the accuracy of diagnostic results. Furthermore, the keyframe extraction network can detect and classify keyframes while outputting them, providing doctors with more auxiliary reference information. This solves the problem that current keyframe extraction in ultrasound videos mainly relies on the experience of medical personnel, which is prone to human error and can easily affect diagnostic results.
Attached Figure Description
[0042] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0043] Figure 1 is a flowchart illustrating the keyframe extraction method based on ultrasound video provided in an embodiment of the present invention.
[0044] Figure 2 is a network framework diagram of the keyframe extraction network provided in an embodiment of the present invention.
[0045] Figure 3 is a schematic diagram of the modules of the keyframe extraction system based on ultrasound video provided in an embodiment of the present invention.
[0046] Figure 4 is a schematic diagram of the terminal provided in an embodiment of the present invention.
Detailed Implementation
[0047] This invention discloses a method and related equipment for keyframe extraction from ultrasound video. To make the objectives, technical solutions, and effects of this invention clearer and more explicit, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only for explaining the invention and are not intended to limit the invention.
[0048] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms. It should be further understood that the term “comprising” as used in this specification means the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or there may be intermediate elements. Furthermore, “connected” or “coupled” as used herein can include wireless connections or wireless coupling. The term “and/or” as used herein includes all or any units and all combinations of one or more associated listed items.
[0049] It will be understood by those skilled in the art that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. It should also be understood that terms such as those defined in general dictionaries should be understood to have the same meaning as in the context of the prior art, and should not be interpreted in an idealized or overly formal sense unless specifically defined as herein.
[0050] To address the aforementioned shortcomings of existing technologies, this invention provides a keyframe extraction method based on ultrasound video, the method comprising:
[0051] Step S100: Obtain the original ultrasound video, preprocess the original ultrasound video to obtain the target ultrasound video, wherein the preprocessing includes one or more of the following: video cropping, channel transformation and size scaling.
[0052] Step S200: Input the target ultrasound video into a pre-trained keyframe extraction network;
[0053] Step S300: Obtain the prediction data output by the keyframe extraction network based on the target ultrasound video, wherein the prediction data includes several keyframes and nodule information corresponding to each keyframe.
[0054] Specifically, the original ultrasound video in this embodiment can be a breast ultrasound video. To improve the robustness and accuracy of the data, the original ultrasound video first needs to be preprocessed uniformly. For example, the original ultrasound video can be cropped, channel transformed, and scaled to ensure that the data size is uniform in subsequent processing. To reduce the degree of human intervention in the keyframe extraction process, this embodiment pre-trains a keyframe extraction network. This network has learned the complex mapping relationship between input and output through massive data training. Therefore, the preprocessed target ultrasound video is input into the keyframe extraction network, which can determine which video frames in the target ultrasound video are keyframes and output these keyframes and the relevant information of nodules within each keyframe. This embodiment uses a keyframe extraction network to assist medical personnel in judging keyframes in ultrasound videos, which can effectively reduce human error and improve the accuracy of diagnostic results. In addition, the keyframe extraction network can also detect and classify keyframes while outputting them, that is, output the nodule information of the keyframes, which can better assist doctors in diagnosis.
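The prediction data described above (several keyframes, each with its nodule information) can be pictured as a small data structure. The following is only an illustrative sketch: the class names `NoduleInfo` and `KeyFramePrediction`, the field names, the pixel units, and the helper `summarize` are assumptions introduced here and do not appear in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NoduleInfo:
    """Nodule information attached to one keyframe (field names are illustrative)."""
    position: Tuple[int, int]        # assumed: (x, y) of the bounding-box top-left corner
    size: Tuple[int, int]            # assumed: (width, height) of the bounding box
    confidence: float                # detection confidence in [0, 1]
    attributes: Dict[str, float] = field(default_factory=dict)  # attribute name -> probability
    malignancy_probability: float = 0.0


@dataclass
class KeyFramePrediction:
    frame_index: int                 # index of the keyframe within the target video
    nodule: NoduleInfo


def summarize(predictions: List[KeyFramePrediction]) -> List[int]:
    """Return keyframe indices sorted by descending detection confidence."""
    ranked = sorted(predictions, key=lambda p: p.nodule.confidence, reverse=True)
    return [p.frame_index for p in ranked]


# Made-up values, only to show the shape of the data.
example = [
    KeyFramePrediction(12, NoduleInfo((140, 90), (60, 45), 0.81)),
    KeyFramePrediction(37, NoduleInfo((150, 95), (64, 50), 0.93, {"calcification": 0.7}, 0.62)),
]
print(summarize(example))  # [37, 12]
```

Keeping the nodule information next to the frame index mirrors the statement that each output keyframe carries its own detection and classification result.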
[0055] In one implementation, the preprocessing of the original ultrasound video specifically includes: for each video frame in the original ultrasound video, cropping the background area of that video frame image; converting the video frame image into a single channel; and finally, scaling the size of the video frame image to 600*400 based on the statistical results of the original image size. This preprocessing ensures that the size and length of the input ultrasound video remain consistent, thus better enabling the automatic extraction of keyframes from ultrasound videos of varying lengths.
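The three preprocessing operations can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: "600*400" is read as width 600 by height 400, the background is taken to be near-black border pixels below a hypothetical brightness threshold, the channel transformation is a plain average, and scaling uses nearest-neighbour sampling. The source does not specify any of these details, and all function names are illustrative.

```python
import numpy as np

TARGET_W, TARGET_H = 600, 400  # assumed reading of "600*400" as width x height


def crop_background(frame: np.ndarray, threshold: int = 10) -> np.ndarray:
    """Crop away border rows/columns whose pixels are all near-black (assumed background)."""
    gray = frame if frame.ndim == 2 else frame.mean(axis=2)
    mask = gray > threshold
    if not mask.any():
        return frame  # nothing brighter than the threshold; leave the frame unchanged
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return frame[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def to_single_channel(frame: np.ndarray) -> np.ndarray:
    """Average the colour channels to obtain one grayscale channel."""
    return frame if frame.ndim == 2 else frame.mean(axis=2)


def resize_nearest(frame: np.ndarray, width: int, height: int) -> np.ndarray:
    """Nearest-neighbour resize using integer index arithmetic."""
    src_h, src_w = frame.shape[:2]
    row_idx = np.arange(height) * src_h // height
    col_idx = np.arange(width) * src_w // width
    return frame[row_idx][:, col_idx]


def preprocess_video(video: np.ndarray) -> np.ndarray:
    """video: (frames, H, W, 3) uint8 -> (frames, TARGET_H, TARGET_W) float32 in [0, 1]."""
    out = [
        resize_nearest(to_single_channel(crop_background(f)), TARGET_W, TARGET_H)
        for f in video
    ]
    return np.stack(out).astype(np.float32) / 255.0


# Synthetic clip: 5 RGB frames of 480x640 with a bright region on a black background.
clip = np.zeros((5, 480, 640, 3), dtype=np.uint8)
clip[:, 100:400, 150:550] = 128
target = preprocess_video(clip)
print(target.shape)  # (5, 400, 600)
```

Every frame leaves the function with the same spatial size, which is the property the paragraph above relies on. The number of frames is left unchanged by this sketch.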
[0056] In one implementation, the entire dataset is divided into training, validation, and test sets at the user level in a 7:1:2 ratio.
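Splitting at the user level means that all videos from one user land in the same subset, so no user appears in both training and test data. A minimal sketch, assuming each video carries a user identifier (the identifiers, the seed, and the function name `split_by_user` are hypothetical):

```python
import random
from typing import Dict, List


def split_by_user(user_ids: List[str], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle the unique user IDs and split them 7:1:2 into train/val/test."""
    unique = sorted(set(user_ids))
    random.Random(seed).shuffle(unique)
    n = len(unique)
    n_train = n * 7 // 10
    n_val = n // 10
    return {
        "train": unique[:n_train],
        "val": unique[n_train:n_train + n_val],
        "test": unique[n_train + n_val:],
    }


# Hypothetical owners of 55 videos; each of the 20 users contributes several videos.
video_owner = [f"user_{i % 20:02d}" for i in range(55)]
splits = split_by_user(video_owner)
print({name: len(ids) for name, ids in splits.items()})  # {'train': 14, 'val': 2, 'test': 4}
```

A video is then assigned to whichever subset contains its owner.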
[0057] In one implementation, the keyframe extraction network includes:
[0058] A convolutional neural network is used to extract spatial information from the target ultrasound video.
[0059] A recurrent neural network is used to determine temporal information based on the spatial information, and to determine a number of keyframes and nodule information corresponding to each keyframe based on the temporal information, wherein the nodule information includes nodule position, nodule size, confidence level and nodule attribute information.
[0060] Specifically, this embodiment uses convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to form the basic framework of the keyframe extraction network. The CNNs and RNNs are cascaded, meaning the output data of the former is the input data of the latter. This embodiment pre-constructs a keyframe extraction network with many hidden layers. The target ultrasound video is input into the constructed keyframe extraction network. The CNN extracts spatial information, which is then input into the RNN to extract temporal information. Based on the temporal information, the network determines which frames are keyframes. While predicting keyframes, the keyframe extraction network also predicts nodule information within the keyframes, such as nodule location, size, confidence level, and attribute information, thereby enhancing the network's utilization of keyframe information. In another implementation, the recurrent neural network can be replaced by a recursive neural network.
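The cascade of a spatial stage followed by a temporal stage can be illustrated with a toy forward pass in NumPy. This is a sketch of the data flow only, not the patented network: the single layer of 3x3 filters, the plain Elman recurrence, the layer sizes, and the two output heads are all assumptions, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv_features(frame: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Spatial stage: valid 3x3 filtering, ReLU, then global average pooling.

    frame: (H, W); kernels: (C, 3, 3) -> feature vector of length C.
    """
    windows = np.lib.stride_tricks.sliding_window_view(frame, (3, 3))  # (H-2, W-2, 3, 3)
    maps = np.einsum("hwij,cij->chw", windows, kernels)
    return np.maximum(maps, 0.0).mean(axis=(1, 2))


def rnn_over_frames(features: np.ndarray, w_in: np.ndarray, w_rec: np.ndarray) -> np.ndarray:
    """Temporal stage: a plain Elman recurrence over the per-frame features."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x in features:
        hidden = np.tanh(w_in @ x + w_rec @ hidden)
        states.append(hidden)
    return np.stack(states)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


# Randomly initialised parameters; the sizes are arbitrary illustrative choices.
C, HID = 4, 8
kernels = rng.normal(size=(C, 3, 3))
w_in = rng.normal(size=(HID, C)) * 0.5
w_rec = rng.normal(size=(HID, HID)) * 0.1
w_key = rng.normal(size=HID)          # head: keyframe probability per frame
w_box = rng.normal(size=(4, HID))     # head: nodule box (x, y, w, h), unnormalised

video = rng.random((6, 40, 60))       # 6 small single-channel frames
spatial = np.stack([conv_features(f, kernels) for f in video])   # (6, C)
temporal = rnn_over_frames(spatial, w_in, w_rec)                 # (6, HID)
key_prob = sigmoid(temporal @ w_key)                             # (6,)
boxes = temporal @ w_box.T                                       # (6, 4)
print(key_prob.shape, boxes.shape)  # (6,) (6, 4)
```

Because the hidden state at each frame depends on all earlier frames, the keyframe probability for a frame reflects its temporal context as well as its own appearance.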
[0061] In one implementation, the network parameter update method of the keyframe extraction network includes:
[0062] Obtain the training ultrasound video and the key frame label information corresponding to the training ultrasound video, and input the training ultrasound video and the key frame label information into the key frame extraction network;
[0063] Obtain the training prediction data of the keyframe extraction network based on the training ultrasound video output;
[0064] The prediction error of the training prediction data is determined based on the keyframe label information, and the network parameters of the keyframe extraction network are updated based on the prediction error.
[0065] Specifically, the training ultrasound video can be an ultrasound video of a user obtained from a clinical setting. An experienced sonographer then annotates the video with keyframes, marking the nodule's bounding box and attribute information on each keyframe. The nodule bounding box reflects the nodule's location and size, while the attribute information reflects the nodule's corresponding attribute and benign / malignant classification, thus obtaining the user's keyframe label information. During network training, the training ultrasound video and the corresponding keyframe label information are simultaneously input into the keyframe extraction network. Supervised by the keyframe label information, the keyframe extraction network calculates the prediction error of its output training prediction data. Then, based on this prediction error, backpropagation of the network gradient is performed, thereby achieving automatic update of the network parameters and converging the difference between the predicted and actual results.
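The supervised update loop (predict, measure the prediction error against the keyframe labels, propagate the gradient, update the parameters) can be shown on a deliberately tiny model. In this sketch a linear scorer stands in for the full network, binary cross-entropy stands in for the prediction error, and the features and labels are synthetic; none of these choices come from the source.

```python
import numpy as np

rng = np.random.default_rng(1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy averaged over frames, used here as the prediction error."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(prob) + (1 - label) * np.log(1 - prob)))


# Stand-in data: 40 frames with 5-dimensional features and 0/1 keyframe labels.
features = rng.normal(size=(40, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])   # hypothetical rule that generates the labels
labels = (features @ true_w > 0).astype(float)

weights = np.zeros(5)
learning_rate = 0.5
initial_error = bce(sigmoid(features @ weights), labels)
for _ in range(200):
    prob = sigmoid(features @ weights)
    gradient = features.T @ (prob - labels) / len(labels)   # gradient of the BCE w.r.t. weights
    weights -= learning_rate * gradient                      # parameter update
final_error = bce(sigmoid(features @ weights), labels)
print(initial_error, final_error)
```

The error starts at ln 2 (all predictions equal to 0.5) and falls as the updates proceed, which is the convergence behaviour the paragraph describes.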
[0066] In one implementation, the network parameter update method of the keyframe extraction network includes:
[0067] The first reference information corresponding to the training ultrasound video is obtained through the nodule detection module, wherein the first reference information includes the nodule position, nodule size and confidence level corresponding to each key frame in the training ultrasound video.
[0068] The second reference information corresponding to the training ultrasound video is obtained through the attribute classification module, wherein the second reference information includes nodule attribute information corresponding to each key frame in the training ultrasound video.
[0069] The prediction error of the training prediction data is determined based on the first reference information and the second reference information, and the network parameters of the keyframe extraction network are updated based on the prediction error.
[0070] Specifically, this embodiment also pre-constructs a nodule detection module and an attribute classification module to coordinate the learning between the main task (keyframe extraction) and the auxiliary tasks (nodule detection and attribute classification). The nodule detection module can accurately detect the nodule location, nodule size, and confidence level in each keyframe. The attribute classification module can accurately detect the nodule attribute information in each keyframe. This embodiment uses the output data of the nodule detection module and the attribute classification module as reference information for auxiliary training to evaluate the accuracy of the nodule location, nodule size, confidence level, and nodule attribute information output by the keyframe extraction network, thereby obtaining the network's prediction error, and then automatically updating the network parameters based on this prediction error.
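One way to turn the two sets of reference information into a single prediction error is a weighted sum of per-task errors. The sketch below assumes boxes normalised to [0, 1] and stored as (x, y, w, h), an L1 error for the box, and binary cross-entropy for the confidence level and the attribute probabilities. The weights `w_box`, `w_conf`, `w_attr` and the dictionary layout are illustrative; the source does not state how the errors are combined.

```python
import numpy as np


def box_error(pred_box, ref_box):
    """Mean absolute difference between predicted and reference (x, y, w, h)."""
    return float(np.mean(np.abs(np.asarray(pred_box) - np.asarray(ref_box))))


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy; the target may be a hard label or a soft value in [0, 1]."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))


def prediction_error(pred, first_ref, second_ref, w_box=1.0, w_conf=1.0, w_attr=1.0):
    """Weighted sum of the errors against both sets of reference information."""
    return (
        w_box * box_error(pred["box"], first_ref["box"])
        + w_conf * bce(pred["confidence"], first_ref["confidence"])
        + w_attr * bce(pred["attributes"], second_ref["attributes"])
    )


# First reference information: from the nodule detection module (made-up values).
first_ref = {"box": [0.30, 0.40, 0.20, 0.15], "confidence": 0.9}
# Second reference information: 8 attribute labels plus benign/malignant (made-up values).
second_ref = {"attributes": [1, 0, 0, 1, 0, 1, 0, 0, 1]}

close = {"box": [0.31, 0.41, 0.20, 0.15], "confidence": 0.85,
         "attributes": [0.9, 0.1, 0.2, 0.8, 0.1, 0.7, 0.2, 0.1, 0.8]}
far = {"box": [0.60, 0.10, 0.40, 0.30], "confidence": 0.30,
       "attributes": [0.5] * 9}

print(prediction_error(close, first_ref, second_ref))
print(prediction_error(far, first_ref, second_ref))
```

A prediction that agrees with both modules yields a smaller error than one that disagrees, so minimising this quantity pulls the main network toward the auxiliary modules' outputs.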
[0071] In one implementation, the attribute classification module works as follows:
[0072] For each input video frame, extract the image features corresponding to that video frame;
[0073] Based on the image features, the nodule attribute information corresponding to the video frame is determined, wherein the nodule attribute information includes several ultrasound attributes of the nodule and the probability of the nodule being benign or malignant.
[0074] Specifically, the attribute classification module in this embodiment uses advanced technologies such as convolutional neural networks. It takes a single frame of ultrasound image as input and extracts image features from the two-dimensional ultrasound image, thereby directly and accurately predicting the attributes of the nodule (e.g., 8 ultrasound attributes) and its benign or malignant outcome.
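The output side of such a module can be sketched as two heads on top of the extracted image features: one producing a probability per ultrasound attribute, and one producing a malignancy probability. Treating the attributes as independent binary outputs is an assumption made here; the feature dimension, the weights, and the function name `classify_attributes` are placeholders, and the feature vector stands in for the output of a convolutional backbone.

```python
import numpy as np

rng = np.random.default_rng(2)
NUM_ATTRIBUTES = 8   # the source gives "8 ultrasound attributes" as an example


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def classify_attributes(image_features, w_attr, w_malignant):
    """Map one frame's feature vector to attribute probabilities and a malignancy probability."""
    attribute_probs = sigmoid(w_attr @ image_features)        # one probability per attribute
    malignancy_prob = float(sigmoid(w_malignant @ image_features))
    return attribute_probs, malignancy_prob


feature_dim = 16
w_attr = rng.normal(size=(NUM_ATTRIBUTES, feature_dim))   # untrained placeholder weights
w_malignant = rng.normal(size=feature_dim)
frame_features = rng.normal(size=feature_dim)              # stand-in for CNN image features

attrs, malignant = classify_attributes(frame_features, w_attr, w_malignant)
print(attrs.shape, 0.0 <= malignant <= 1.0)  # (8,) True
```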
[0075] In one implementation, the nodule detection module is further configured to:
[0076] The confidence level corresponding to each video frame in the target ultrasound video is obtained, and an original confidence level curve is generated based on the confidence level of each video frame.
[0077] The original confidence curve is smoothed by a window function of a preset size to obtain an initial confidence curve;
[0078] The initial confidence curve is smoothed using the window function to obtain the target confidence curve;
[0079] The video frames are filtered and selected based on the target confidence curve.
[0080] It is understandable that ultrasound videos contain a large amount of redundant information, nodules may only exist in a small portion of the video, and the length of different ultrasound videos varies (e.g., from 50 to 400 frames). Therefore, this embodiment uses a nodule detection module to eliminate irrelevant information, thereby reducing subsequent computational overhead and facilitating the processing of ultrasound videos of varying lengths. Specifically, this embodiment first uses the nodule detection module to detect the location, size, and confidence level of nodules in each video frame of the ultrasound video. Then, based on the confidence level of each frame, the confidence curve is smoothed using a window function of fixed size. Since a small number of keyframes are filtered out after the first smoothing, the confidence curve needs to be smoothed again to ensure that all keyframes are included in the extracted video. Based on the finally generated target confidence curve, video frames with low nodule confidence can be identified. For example, if the confidence level of a video frame is lower than the target confidence curve, it indicates that the confidence level of that video frame is low. The nodule detection module uses two smoothing filters to filter out video frames with low confidence levels of nodules in ultrasound video frames, and has good robustness to confidence curves with large fluctuations, thus effectively avoiding interference from irrelevant information.
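The two-pass smoothing can be sketched with a moving average. Several details are assumptions: the window function is taken to be rectangular with an odd size, the edges are padded by repetition, and a frame is retained when the twice-smoothed target curve is at least a fixed threshold. The source does not state the exact selection rule, so the threshold is purely illustrative.

```python
import numpy as np


def smooth(curve: np.ndarray, window: int) -> np.ndarray:
    """Moving average with a rectangular window (odd size); edges padded by repetition."""
    pad = window // 2
    padded = np.pad(curve, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


def select_frames(confidence: np.ndarray, window: int = 5, threshold: float = 0.3) -> np.ndarray:
    """Smooth twice with the same window, then keep frames whose target value is high enough."""
    initial_curve = smooth(confidence, window)        # first smoothing pass
    target_curve = smooth(initial_curve, window)      # second smoothing pass
    return np.where(target_curve >= threshold)[0]


# Synthetic confidence curve: a nodule visible in frames 10-19, with one dropout at frame 14.
confidence = np.full(30, 0.05)
confidence[10:20] = 0.9
confidence[14] = 0.1
selected = select_frames(confidence)
print(selected)
```

In this example frame 14 is retained even though its raw confidence is low, because its neighbours are confident. The second pass also widens the retained span slightly beyond the nodule's first and last frames, which matches the stated aim of not losing keyframes near the boundary.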
[0081] In another implementation, the keyframe extraction network includes a nodule detection module and an attribute classification module. First, the nodule detection module filters and selects each video frame in the target ultrasound video, then obtains the nodule location, size, and confidence level for each video frame. Next, the attribute classification module obtains the nodule attribute information for each video frame, including several ultrasound attributes and the probability of benignity/malignancy. Then, when selecting keyframes, the keyframe extraction network combines the nodule location, size, and confidence level output by the nodule detection module with the several ultrasound attributes and the probability of benignity/malignancy output by the attribute classification module, to comprehensively determine which video frames are keyframes. Finally, the keyframe extraction network outputs several keyframes of the target ultrasound video and the nodule location, size, confidence level, several ultrasound attributes, and the probability of benignity/malignancy for each keyframe.
[0082] In another implementation, the keyframe extraction network is a reinforcement learning network, and the network parameter update method of the keyframe extraction network includes:
[0083] The training ultrasound video is input into the reinforcement learning network, wherein the initial extraction probability of each video frame in the training ultrasound video is equal.
[0084] The extraction probability of each video frame is randomly changed, and a key frame extraction action is determined based on the changed extraction probability. The training prediction data corresponding to the training ultrasound video is determined based on the key frame extraction action.
[0085] The reinforcement signal of the reinforcement learning network is determined based on the training prediction data, wherein the reinforcement signal is used to reflect the quality of the keyframe extraction action;
[0086] Determine whether the reinforcement signal has reached the target value; if not, update the network parameters of the reinforcement learning network based on the reinforcement signal.
[0087] Continue executing the step of randomly changing the extraction probability of each of the video frames until the reinforcement signal reaches the target value.
[0088] This embodiment can also train the keyframe extraction network with a reinforcement learning algorithm. Combining deep neural networks with reinforcement learning gives the network strong exploration and learning capabilities, enabling more targeted keyframe selection. Specifically, the extraction probabilities of all video frames are first initialized to be equal. The extraction probability of each video frame is then randomly changed to obtain a keyframe extraction action. The keyframe extraction action is used to produce the keyframe extraction results, nodule detection results, and nodule classification results of the keyframe extraction network, i.e., the prediction data of the keyframe extraction network. The quality of the prediction data is then evaluated to obtain the reinforcement signal of the keyframe extraction network, and it is determined whether the reinforcement signal has reached the preset target value. If so, the prediction accuracy of the keyframe extraction network has reached the target and training stops; if not, the prediction accuracy has not reached the target and training continues: the network parameters are first optimized based on the reinforcement signal, and the extraction probability of each video frame is then randomly changed again (each random change being drawn from a different probability distribution). This operation is repeated until the reinforcement signal is maximized or reaches the target value. In one implementation, since the reinforcement signal reflects the quality of the keyframe extraction action, it can be used as a guide to change the probability of each video frame being extracted in the next round, so that the probability of keyframes being extracted gradually increases and the probability of non-keyframes being extracted gradually decreases.
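The training loop in this embodiment can be illustrated with a small policy-gradient sketch. The source does not name a specific algorithm, so a REINFORCE-style update is swapped in here as a standard stand-in. Other assumptions: the random change of extraction probabilities is realised by sampling a random extraction action per frame, the quality of an action is measured as per-frame agreement with hypothetical reference keyframes, the reinforcement signal is a running average of that quality, and one logit per frame stands in for the network parameters.

```python
import numpy as np

rng = np.random.default_rng(3)

NUM_FRAMES = 12
is_key = np.zeros(NUM_FRAMES, dtype=bool)
is_key[[4, 5, 6]] = True            # hypothetical reference keyframes for one training video

logits = np.zeros(NUM_FRAMES)       # equal initial extraction probability (0.5) for every frame
initial_probs = 1.0 / (1.0 + np.exp(-logits))

TARGET = 0.95                       # assumed target value for the reinforcement signal
LEARNING_RATE = 1.0
MAX_STEPS = 5000
signal = 0.5                        # running signal; 0.5 is the expected quality at the start

for step in range(MAX_STEPS):
    probs = 1.0 / (1.0 + np.exp(-logits))
    action = rng.random(NUM_FRAMES) < probs          # random keyframe extraction action
    reward = float(np.mean(action == is_key))        # assumed quality: per-frame agreement
    baseline = signal
    signal = 0.9 * signal + 0.1 * reward             # update the reinforcement signal
    if signal >= TARGET:                             # target reached: stop training
        break
    # REINFORCE update: the gradient of the action's log-probability is (action - probs).
    logits += LEARNING_RATE * (reward - baseline) * (action.astype(float) - probs)

final_probs = 1.0 / (1.0 + np.exp(-logits))
print(step + 1, round(signal, 3))
print(final_probs[is_key].mean(), final_probs[~is_key].mean())
```

Actions that score above the running signal make the sampled choices more likely in later rounds, so the extraction probability of keyframes drifts upward and that of non-keyframes drifts downward, as described at the end of the paragraph above.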
[0089] The advantages of this invention are:
[0090] 1. This invention can provide reference and guidance for ultrasound physicians' ultrasound video diagnostic process, and assist doctors in extracting key frames and diagnosing lesions.
[0091] 2. The ultrasound video preprocessing method in this invention can enhance the applicability of this invention to different ultrasound data styles.
[0092] 3. Compared with traditional keyframe extraction networks, this invention combines the advantages of current artificial intelligence and deep learning, and can make full use of the importance information of different tasks to extract keyframes.
[0093] 4. Most other existing technologies perform tasks such as predicting benign or malignant diseases and detecting nodules on static images, usually requiring manual selection of keyframes in the video. This invention, however, can automatically extract keyframes, assisting less experienced ultrasound physicians in completing ultrasound video diagnoses.
[0094] Based on the above embodiments, the present invention also provides a keyframe extraction system based on ultrasound video. As shown in Figure 3, the system includes:
[0095] The preprocessing module 01 is used to acquire the original ultrasound video, preprocess the original ultrasound video to obtain the target ultrasound video, wherein the preprocessing includes one or more of the following: video cropping, channel transformation and size scaling.
[0096] Input module 02 is used to input the target ultrasound video into a pre-trained keyframe extraction network;
[0097] Output module 03 is used to acquire the prediction data output by the keyframe extraction network based on the target ultrasound video, wherein the prediction data includes several keyframes and nodule information corresponding to each keyframe.
[0098] Based on the above embodiments, the present invention also provides a terminal, whose functional block diagram may be as shown in Figure 4. The terminal includes a processor, memory, network interface, and display screen connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides the environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface is used to communicate with external terminals via a network connection. When the computer program is executed by the processor, it implements a keyframe extraction method based on ultrasound video. The display screen can be an LCD screen or an e-ink screen.
[0099] Those skilled in the art will understand that the schematic diagram shown in Figure 4 is merely a partial structural diagram related to the present invention and does not constitute a limitation on the terminal to which the present invention is applied. A specific terminal may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0100] In one implementation, the terminal's memory stores one or more programs, and these programs are configured to be executed by one or more processors, and the programs contain instructions for performing a keyframe extraction method based on ultrasound video.
[0101] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided by this invention can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
[0102] In summary, this invention discloses a keyframe extraction method and related equipment based on ultrasound video. The method includes: acquiring an original ultrasound video; preprocessing the original ultrasound video to obtain a target ultrasound video; inputting the target ultrasound video into a keyframe extraction network; and acquiring prediction data output by the keyframe extraction network based on the target ultrasound video. The prediction data includes several keyframes and nodule information corresponding to each keyframe. This invention assists medical personnel in judging keyframes in ultrasound videos through a keyframe extraction network, effectively reducing human error and improving the accuracy of diagnostic results. Furthermore, the keyframe extraction network can detect and classify keyframes while outputting them, providing doctors with more auxiliary reference information. This solves the problem that current keyframe extraction from ultrasound videos mainly relies on the experience of medical personnel, which is prone to human error and can easily affect diagnostic results.
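The preprocessing step summarized above can be illustrated with a minimal sketch. The three operations (video cropping, channel transformation, size scaling) are those named in claim 1; everything else is an assumption made for illustration only: frames are represented as nested lists of RGB tuples, the channel transformation is taken to be RGB to grayscale with approximate BT.601 luma weights, scaling uses nearest-neighbor sampling, and the function names `crop`, `to_gray`, `resize_nearest`, and `preprocess` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each in 0..255
RGBFrame = List[List[Pixel]]      # rows of RGB pixels
GrayFrame = List[List[int]]       # rows of single-channel intensities


def crop(frame: RGBFrame, top: int, left: int, height: int, width: int) -> RGBFrame:
    """Video cropping: keep only a rectangular region of the frame."""
    return [row[left:left + width] for row in frame[top:top + height]]


def to_gray(frame: RGBFrame) -> GrayFrame:
    """Channel transformation (assumed here to be RGB -> grayscale).

    Uses integer approximations of the BT.601 luma weights.
    """
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in frame]


def resize_nearest(frame: GrayFrame, out_h: int, out_w: int) -> GrayFrame:
    """Size scaling by nearest-neighbor sampling (an assumed choice)."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def preprocess(video: List[RGBFrame],
               box: Tuple[int, int, int, int],
               size: Tuple[int, int]) -> List[GrayFrame]:
    """Turn an original video into a target video: crop, convert, scale."""
    top, left, height, width = box
    out_h, out_w = size
    return [resize_nearest(to_gray(crop(f, top, left, height, width)), out_h, out_w)
            for f in video]
```

In practice an implementation would more likely operate on array types from an image library; the nested-list form is used here only to keep the sketch dependency-free.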
[0103] It should be understood that the application of the present invention is not limited to the examples above. Those skilled in the art can make improvements or modifications based on the above description, and all such improvements and modifications should fall within the protection scope of the appended claims.
Claims
1. A keyframe extraction method based on ultrasound video, characterized in that the method includes: acquiring an original ultrasound video, and preprocessing the original ultrasound video to obtain a target ultrasound video, wherein the preprocessing includes one or more of the following: video cropping, channel transformation, and size scaling; inputting the target ultrasound video into a pre-trained keyframe extraction network, wherein the keyframe extraction network includes: a convolutional neural network for extracting spatial information from the target ultrasound video; and a recurrent neural network for determining temporal information based on the spatial information, and determining several keyframes and nodule information corresponding to each keyframe based on the temporal information, wherein the nodule information includes nodule location, nodule size, confidence level, and nodule attribute information; and acquiring the prediction data output by the keyframe extraction network based on the target ultrasound video, wherein the prediction data includes several keyframes and nodule information corresponding to each keyframe; wherein the keyframe extraction network is a reinforcement learning network.
The network parameter update method of the keyframe extraction network includes: inputting a training ultrasound video into the reinforcement learning network, wherein the initial extraction probabilities of each video frame in the training ultrasound video are equal; randomly changing the extraction probabilities of each video frame, determining a keyframe extraction action based on the changed extraction probabilities, and determining training prediction data corresponding to the training ultrasound video based on the keyframe extraction action; determining a reinforcement signal of the reinforcement learning network based on the training prediction data, wherein the reinforcement signal is used to reflect the quality of the keyframe extraction action; determining whether the reinforcement signal has reached a target value, and if not, updating the network parameters of the reinforcement learning network based on the reinforcement signal; continuing to execute the step of randomly changing the extraction probabilities of each video frame until the reinforcement signal reaches the target value.
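The training loop recited in claim 1 can be sketched as follows. This is a toy illustration under stated assumptions, not the patented network: a per-frame probability table stands in for the network parameters, the extraction action is assumed to be "take the k most probable frames", the reinforcement signal is assumed to be the fraction of extracted frames that belong to a hypothetical reference set, and the update rule (keep a random change only if the signal does not get worse) is a simple random-search substitute, since the claim does not specify how parameters are updated. All function and parameter names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def select_frames(probs: Sequence[float], k: int) -> List[int]:
    """Extraction action (assumed): indices of the k most probable frames."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(order[:k])


def reinforcement_signal(selected: Sequence[int], reference: Set[int]) -> float:
    """Toy signal: fraction of extracted frames found in the reference set."""
    return sum(1 for i in selected if i in reference) / len(selected)


def train(num_frames: int, reference: Set[int], k: int, target: float = 1.0,
          noise: float = 0.2, max_steps: int = 1000,
          seed: int = 0) -> Tuple[List[float], float, int]:
    """Repeat: perturb probabilities, act, score, update, until target is met."""
    rng = random.Random(seed)
    probs = [0.5] * num_frames  # equal initial extraction probabilities
    signal = reinforcement_signal(select_frames(probs, k), reference)
    steps = 0
    while signal < target and steps < max_steps:
        # Randomly change the extraction probability of every frame.
        trial = [min(1.0, max(0.0, p + rng.uniform(-noise, noise))) for p in probs]
        # Determine the extraction action and its reinforcement signal.
        trial_signal = reinforcement_signal(select_frames(trial, k), reference)
        # Update the parameters only if the signal did not get worse.
        if trial_signal >= signal:
            probs, signal = trial, trial_signal
        steps += 1
    return probs, signal, steps
```

The `max_steps` cap is an added safeguard so the sketch always terminates; the claim itself only states that the loop continues until the reinforcement signal reaches the target value.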
2. The keyframe extraction method based on ultrasound video according to claim 1, characterized in that the network parameter update method for the keyframe extraction network includes: obtaining the training ultrasound video and the keyframe label information corresponding to the training ultrasound video, and inputting the training ultrasound video and the keyframe label information into the keyframe extraction network; obtaining the training prediction data output by the keyframe extraction network based on the training ultrasound video; and determining the prediction error of the training prediction data based on the keyframe label information, and updating the network parameters of the keyframe extraction network based on the prediction error.
3. The keyframe extraction method based on ultrasound video according to claim 2, characterized in that the network parameter update method for the keyframe extraction network includes: obtaining, through a nodule detection module, first reference information corresponding to the training ultrasound video, wherein the first reference information includes the nodule location, nodule size, and confidence level corresponding to each keyframe in the training ultrasound video; obtaining, through an attribute classification module, second reference information corresponding to the training ultrasound video, wherein the second reference information includes nodule attribute information corresponding to each keyframe in the training ultrasound video; and determining the prediction error of the training prediction data based on the first reference information and the second reference information, and updating the network parameters of the keyframe extraction network based on the prediction error.
4. The keyframe extraction method based on ultrasound video according to claim 3, characterized in that the nodule detection module is also used for: obtaining the confidence level corresponding to each video frame in the target ultrasound video, and generating an original confidence curve based on the confidence level of each video frame; smoothing the original confidence curve with a window function of a preset size to obtain an initial confidence curve; smoothing the initial confidence curve with the window function to obtain a target confidence curve; and filtering and selecting the video frames based on the target confidence curve.
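The two-pass smoothing in claim 4 can be illustrated with a short sketch. The claim does not name the window function or the selection rule, so the following are assumptions for illustration: the window is a centered rectangular (moving-average) window that shrinks at the ends of the curve, and frames are kept when the target confidence is at or above a fixed threshold. The names `smooth` and `filter_frames` and the default values are hypothetical.

```python
from typing import List, Sequence


def smooth(curve: Sequence[float], window: int) -> List[float]:
    """Moving average with a centered window; edges use the samples available."""
    half = window // 2
    out = []
    for i in range(len(curve)):
        lo, hi = max(0, i - half), min(len(curve), i + half + 1)
        out.append(sum(curve[lo:hi]) / (hi - lo))
    return out


def filter_frames(confidences: Sequence[float], window: int = 3,
                  threshold: float = 0.5) -> List[int]:
    """Smooth the per-frame confidence curve twice, then keep confident frames."""
    initial = smooth(confidences, window)  # original curve -> initial curve
    target = smooth(initial, window)       # initial curve -> target curve
    return [i for i, c in enumerate(target) if c >= threshold]


# One isolated high-confidence frame (index 2) and one sustained run (5..7).
confidences = [0.0, 0.0, 0.9, 0.0, 0.0, 0.8, 0.9, 0.8, 0.0]
print(filter_frames(confidences))  # prints [5, 6, 7]
```

In this example the isolated spike at frame 2 is suppressed by the smoothing, while the sustained run of confident detections survives, which is one plausible motivation for smoothing the curve before selecting frames.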
5. The keyframe extraction method based on ultrasound video according to claim 3, characterized in that the attribute classification module works as follows: for each input video frame, extracting the image features corresponding to that video frame; and determining, based on the image features, the nodule attribute information corresponding to the video frame, wherein the nodule attribute information includes several ultrasound attributes of the nodule and the probability that the nodule is benign or malignant.
6. A keyframe extraction system based on ultrasound video, characterized in that the system includes: a preprocessing module, used to acquire an original ultrasound video and preprocess the original ultrasound video to obtain a target ultrasound video, wherein the preprocessing includes one or more of the following: video cropping, channel transformation, and size scaling; an input module, used to input the target ultrasound video into a pre-trained keyframe extraction network, wherein the keyframe extraction network includes: a convolutional neural network, used to extract spatial information from the target ultrasound video; and a recurrent neural network, used to determine temporal information based on the spatial information, and to determine several keyframes and nodule information corresponding to each keyframe based on the temporal information, wherein the nodule information includes nodule location, nodule size, confidence level, and nodule attribute information; and an output module, used to acquire the prediction data output by the keyframe extraction network based on the target ultrasound video, wherein the prediction data includes several keyframes and nodule information corresponding to each keyframe; wherein the keyframe extraction network is a reinforcement learning network.
The network parameter update method of the keyframe extraction network includes: inputting a training ultrasound video into the reinforcement learning network, wherein the initial extraction probabilities of each video frame in the training ultrasound video are equal; randomly changing the extraction probabilities of each video frame, determining a keyframe extraction action based on the changed extraction probabilities, and determining training prediction data corresponding to the training ultrasound video based on the keyframe extraction action; determining a reinforcement signal of the reinforcement learning network based on the training prediction data, wherein the reinforcement signal is used to reflect the quality of the keyframe extraction action; determining whether the reinforcement signal has reached a target value, and if not, updating the network parameters of the reinforcement learning network based on the reinforcement signal; continuing to execute the step of randomly changing the extraction probabilities of each video frame until the reinforcement signal reaches the target value.
7. A terminal, characterized in that the terminal includes a memory and one or more processors; the memory stores one or more programs; the programs contain instructions for executing the keyframe extraction method based on ultrasound video as described in any one of claims 1-5; and the processor is used to execute the programs.
8. A computer-readable storage medium storing a plurality of instructions, characterized in that the instructions are suitable for being loaded and executed by a processor to implement the steps of the keyframe extraction method based on ultrasound video as described in any one of claims 1-5.