Method and system for female pelvic floor muscle ultrasound assessment based on TD3 reinforcement learning
By employing TD3 reinforcement learning, the ultrasound assessment of female pelvic floor muscles was automated and standardized, solving the problem of reliance on operator experience in pelvic floor muscle rehabilitation treatment. This improved assessment efficiency and the reproducibility of results, and achieved a closed-loop linkage between imaging data and assessment analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- QIANSHAN NORTH COMMUNITY HEALTH SERVICE CENTER XIANGZHOU DISTRICT ZHUHAI CITY
- Filing Date
- 2026-04-28
- Publication Date
- 2026-06-26
Figure CN122272072A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of analysis technology for female pelvic floor muscle rehabilitation treatment, specifically to a method and system for ultrasound assessment of female pelvic floor muscles based on TD3 reinforcement learning.
Background Technology
[0002] Against the backdrop of demographic changes and increased health awareness among residents, the demand for women's full-life-cycle health management is rapidly growing. Among these, pelvic floor dysfunction, including pelvic floor insufficiency, stress urinary incontinence, and pelvic organ prolapse, is prevalent among postpartum, menopausal, and elderly women, severely impacting their quality of life and placing a continuous burden on social medical resources. The pelvic floor muscles are deep anatomical structures of the pelvic wall, characterized by being "invisible, difficult to palpate, and difficult to quantify." Relying solely on subjective feelings or experience often fails to accurately reflect their true functional state. Therefore, objective assessment and rehabilitation intervention focusing on pelvic floor function are gradually becoming important directions in clinical rehabilitation and device development.
[0003] Currently, hospitals and related institutions commonly employ pelvic floor rehabilitation techniques such as electrical stimulation, biofeedback, and magnetic stimulation. These techniques utilize the acquisition of electrophysiological signals from within or on the body surface to guide training and achieve a closed loop of treatment and training. However, practical applications still suffer from the following shortcomings: frequent back-and-forth visits are necessary due to the privacy of the procedure; the correctness of training movements is difficult to verify in real time; and efficacy evaluation relies heavily on indirect signals and the operator's experience. These issues limit the stability, repeatability, and long-term management capabilities of the assessment results.
[0004] Color Doppler ultrasound, with its advantages of real-time imaging, dynamic observation, and visualization of soft tissue structure and movement, can provide more intuitive and objective evidence of pelvic floor structure and function, and is gradually being used in pelvic floor assessment and rehabilitation guidance. However, traditional color Doppler ultrasound assessment still relies heavily on operator experience, and the assessment indicators are discrete and lack standardization, making it difficult to form consistent quantitative comparisons among different individuals, at different stages, and among different operators. It is also difficult to transform continuous data into adaptive and optimized training or intervention strategies.
[0005] Therefore, existing technologies suffer from several problems: female pelvic floor muscle rehabilitation treatment analysis relies heavily on operator experience, standard sections are difficult to obtain, key indicator measurement is inefficient and has poor repeatability, and imaging data is difficult to link with assessment and analysis in a closed loop.
Summary of the Invention
[0006] To overcome the shortcomings of the prior art, one of the objectives of this invention is to provide an ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning. This method achieves closed-loop adaptive adjustment of ultrasound scanning through TD3 reinforcement learning, automatically acquires standard sections and completes measurement and assessment, reduces dependence on operator experience, and improves assessment efficiency and the repeatability of results.
[0007] The second objective of this invention is to provide a female pelvic floor muscle ultrasound assessment system based on TD3 reinforcement learning. This system achieves closed-loop adaptive adjustment of ultrasound scanning through TD3 reinforcement learning, automatically acquires standard sections and completes measurement and assessment, reduces reliance on operator experience, and improves assessment efficiency and the repeatability of results.
[0008] To achieve the first objective of this invention, the following solution is adopted: a TD3 reinforcement learning-based ultrasound assessment method for female pelvic floor muscles, including the following steps:
Step S1: The color ultrasound host controls the dedicated ultrasound probe to acquire the female pelvic floor muscle ultrasound image sequence at the current moment, together with the corresponding probe pose parameters and imaging parameters, and constructs the current state vector;
Step S2: The current state vector is input into the pre-trained TD3 reinforcement learning model, which outputs the current action instruction in the continuous action space;
Step S3: According to the current action instruction, the probe pose parameters and imaging parameters are adjusted, and the ultrasound image sequence and corresponding probe pose parameters and imaging parameters at the next moment are acquired to construct the next state vector;
Step S4: The immediate reward is calculated based on the current state vector, the current action instruction, and the next state vector, and the state transition quadruple is stored in the experience replay pool. The state transition quadruple includes the current state vector, the current action instruction, the immediate reward, and the next state vector;
Step S5: Steps S1 to S4 are executed iteratively until the preset termination condition is met, yielding keyframe images that meet the evaluation requirements and the corresponding probe pose parameters and imaging parameters;
Step S6: Based on the keyframe images, key anatomical structures and functional indicators are measured automatically, and a diagnostic assessment report and individualized intervention plan are generated.
[0009] Further, in step S1, the constructed current state vector includes: the section score, quality score, and confidence of key anatomical points obtained from the ultrasound image sequence through the section discrimination and quality assessment network; the probe pose parameters; the imaging parameters; and a stage marker used to distinguish between the resting state and the Valsalva exertion state.
[0010] Further, in step S2, the current action instruction in the continuous action space output by the TD3 reinforcement learning model includes probe control actions and imaging parameter adjustment actions. The probe control actions include the translational displacement, rotation angle, and contact pressure adjustment of the probe in the perineal region. The imaging parameter adjustment actions include gain fine-tuning, imaging depth fine-tuning, and focus position adjustment.
[0011] Furthermore, in step S4, the immediate reward is calculated from the improvement in the section score, the quality score, and the confidence of the key anatomical points, and simultaneously incorporates a time penalty for the current step and a magnitude penalty for the current action instruction.
[0012] Furthermore, in step S5, the preset termination conditions include: the current section score exceeds a preset first threshold, the quality score exceeds a preset second threshold, and the confidence level of the key anatomical point exceeds a preset third threshold.
[0013] Furthermore, the training process of the TD3 reinforcement learning model includes: constructing an offline dataset that includes the expert scanning process, wherein each record in the offline dataset contains a state transition quadruple; randomly sampling state transition quadruples from the offline dataset; inputting the next state vector of each sampled quadruple into the target policy network to generate the target action; inputting the next state vector and the target action into two target value networks, and using the smaller of the two output values together with the immediate reward to construct the temporal difference target; updating the parameters of the two value networks using the temporal difference target as the supervision signal; and, after updating the value networks several times, updating the parameters of the policy network once and synchronizing the parameters of the target policy network and the target value networks using a soft update.
[0014] Furthermore, during the update phase, the target policy network of the TD3 reinforcement learning model generates the target action from the next state vector, and smoothing noise of limited amplitude is added to the target action to suppress Q-value overestimation and policy oscillation.
[0015] Furthermore, in step S5, when the preset termination condition is met, a freeze action is triggered and the current frame image is captured. The current frame image is used as a key frame image, and the key frame image, along with its corresponding probe pose parameters and imaging parameters, are stored in the inspection cache.
[0016] Furthermore, in step S6, the generated diagnostic assessment report and individualized intervention plan include the final probe movement trajectory and parameter change record recorded during this scanning process, which are used for comparative analysis during follow-up examinations.
[0017] To achieve the second objective of this invention, the following solution is adopted: a female pelvic floor muscle ultrasound assessment system based on TD3 reinforcement learning, used to perform the female pelvic floor muscle ultrasound assessment method based on TD3 reinforcement learning described under the first objective of this invention, comprising:
A color ultrasound imaging module, which includes a color ultrasound host and a dedicated ultrasound probe assembly connected to the color ultrasound host; the color ultrasound imaging module is used to acquire, in real time, ultrasound image sequences of the female pelvic floor muscles and the corresponding probe pose parameters and imaging parameters;
A data acquisition and processing module, which is connected to the color ultrasound host and is used to acquire the ultrasound image sequence, probe pose parameters, and imaging parameters, and to perform preprocessing and feature extraction to construct the current state vector;
A TD3-based reinforcement learning evaluation module, which is connected to the color ultrasound host and the data acquisition and processing module respectively and has a built-in pre-trained TD3 reinforcement learning model; the model receives the current state vector and outputs action instructions in the continuous action space, while performing experience replay and model parameter updates according to the immediate reward;
The color ultrasound host adjusts the probe pose parameters and imaging parameters according to the received action instructions, forming closed-loop control until the termination condition is met and the evaluation results are output.
[0018] Compared with the prior art, the beneficial effects of the present invention are as follows: 1. This invention reduces reliance on operator experience and improves the success rate of standard section acquisition. Using a pre-trained TD3 reinforcement learning model, this invention automatically outputs probe pose and imaging parameter adjustment instructions in the continuous action space based on the current state vector constructed in real time. This provides intelligent guidance during the ultrasound scanning process, avoids difficulties in acquiring standard sections caused by differences in operator technique, and enables operators of varying experience levels to stably acquire ultrasound sections that meet diagnostic requirements.
[0019] 2. This invention improves the efficiency and repeatability of key indicator measurement. Through closed-loop iterative control, this invention automatically captures keyframe images when preset termination conditions are met, and automatically completes the measurement of key anatomical structures and functional indicators based on these images. This eliminates the need for manual frame-by-frame selection and annotation, significantly shortening the time required for a single evaluation. It also eliminates subjective errors and omissions caused by manual measurement, ensuring consistency and repeatability of results across multiple measurements and between different operators.
[0020] 3. This invention achieves a closed-loop linkage between image data and evaluation analysis. It deeply integrates the "state-action-reward" mechanism of reinforcement learning with the ultrasound image acquisition process. At each moment, the ultrasound image sequence, probe pose, and imaging parameters all participate in decision-making as states. Instant reward feedback drives the model to continuously optimize the next action, making the image acquisition process itself part of the evaluation analysis. This breaks the traditional fragmented "acquisition first, analysis later" model in color Doppler ultrasound evaluation, forming a closed loop of acquisition-evaluation-feedback, providing continuous data support for the dynamic adjustment of subsequent individualized intervention plans.
[0021] 4. This invention meets the need for rapid and objective clinical assessment. The diagnostic assessment report and individualized intervention plan automatically generated by this invention cover assessment results under multiple states, including resting and exertion states. It realizes a one-click process from data collection to report output, which greatly improves the efficiency of clinical work, while ensuring the objectivity and quantitative comparability of assessment results, facilitating patient follow-up and comparison and tracking of rehabilitation effects.
Attached Figure Description
[0022] Figure 1 is a flowchart of the ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning in an embodiment of the present invention;
Figure 2 is a structural block diagram of the female pelvic floor muscle ultrasound assessment system based on TD3 reinforcement learning in an embodiment of the present invention;
Figure 3 is a schematic diagram of the female pelvic floor muscle ultrasound assessment system based on TD3 reinforcement learning in an embodiment of the present invention;
Figure 4 is a schematic diagram of the color ultrasound diagnostic algorithm strategy for female pelvic floor muscle rehabilitation in an embodiment of the present invention;
Figure 5 is a schematic diagram of the TD3 algorithm in an embodiment of the present invention.
Detailed Implementation
[0023] The present invention will now be further described in conjunction with the accompanying drawings and specific embodiments. It should be noted that, without conflict, the various embodiments or technical features described below can be arbitrarily combined to form new embodiments.
[0024] Example 1: This invention provides a female pelvic floor muscle ultrasound assessment method based on TD3 reinforcement learning, which can achieve standard section guidance and automatic measurement, quickly obtain assessment results under multiple states such as resting and exertion, and generate assessment reports and rehabilitation plans with one click, meeting the clinical requirements for rapid, objective, and repeatable assessment.
[0025] As shown in Figure 1, the ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning according to an embodiment of the present invention includes the following steps: Step S1: The color ultrasound host controls the dedicated ultrasound probe to acquire the female pelvic floor muscle ultrasound image sequence at the current moment, together with the corresponding probe pose parameters and imaging parameters, and constructs the current state vector.
[0026] Step S2: Input the current state vector into the pre-trained TD3 reinforcement learning model, and the TD3 reinforcement learning model outputs the current action command in the continuous action space.
[0027] Step S3: According to the current action command, adjust the probe pose parameters and imaging parameters, and acquire the ultrasound image sequence and corresponding probe pose parameters and imaging parameters at the next moment to construct the next state vector.
[0028] Step S4: Calculate the immediate reward based on the current state vector, the current action instruction, and the next state vector, and store the state transition quadruple into the experience replay pool. The state transition quadruple includes the current state vector, the current action instruction, the immediate reward, and the next state vector.
[0029] Step S5: Iteratively execute steps S1 to S4 until the preset termination condition is met, and obtain keyframe images that meet the evaluation requirements and the corresponding probe pose parameters and imaging parameters.
[0030] Step S6: Based on the keyframe images, automatically measure key anatomical structures and functional indicators, and generate a diagnostic assessment report and individualized intervention plan.
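The closed loop of steps S1 to S5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: `ToyScanner`, `policy`, the one-dimensional probe offset, the scalar section score, and the threshold value are all hypothetical stand-ins for the ultrasound host, the pre-trained TD3 actor, and the real state vector.

```python
class ToyScanner:
    """Hypothetical stand-in for the ultrasound host: a 1-D probe offset (mm)."""

    def __init__(self, offset_mm=5.0):
        self.offset_mm = offset_mm

    def observe(self):
        # Toy state: (section score in [0, 1], probe offset); score rises as offset shrinks.
        score = max(0.0, 1.0 - abs(self.offset_mm) / 10.0)
        return (score, self.offset_mm)

    def apply(self, action_mm):
        self.offset_mm += action_mm


def policy(state):
    # Placeholder for the pre-trained TD3 actor: step toward zero offset, at most 1 mm.
    _, offset = state
    return max(-1.0, min(1.0, -offset))


def run_episode(scanner, threshold=0.95, max_steps=50):
    replay_pool = []
    state = scanner.observe()                  # S1: build the current state
    for _ in range(max_steps):
        if state[0] >= threshold:              # S5: termination condition
            break
        action = policy(state)                 # S2: action from the policy
        scanner.apply(action)                  # S3: adjust the probe, re-acquire
        next_state = scanner.observe()
        reward = next_state[0] - state[0]      # S4: toy reward (score improvement only)
        replay_pool.append((state, action, reward, next_state))
        state = next_state
    return state, replay_pool
```

In this toy setting the loop stops as soon as the score reaches the threshold, and every stored record is a state transition quadruple of the kind described in step S4.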
[0031] Further, in step S1, the constructed current state vector includes: the section score, quality score, and confidence of key anatomical points obtained from the ultrasound image sequence through the section discrimination and quality assessment network; the probe pose parameters; the imaging parameters; and a stage marker used to distinguish between the resting state and the Valsalva exertion state.
[0032] Further, in step S2, the current action instruction in the continuous action space output by the TD3 reinforcement learning model includes probe control actions and imaging parameter adjustment actions. The probe control actions include the translational displacement, rotation angle, and contact pressure adjustment of the probe in the perineal region. The imaging parameter adjustment actions include gain fine-tuning, imaging depth fine-tuning, and focus position adjustment.
[0033] Furthermore, in step S4, the immediate reward is calculated from the improvement in the section score, the quality score, and the confidence of the key anatomical points, and simultaneously incorporates a time penalty for the current step and a magnitude penalty for the current action instruction.
[0034] Furthermore, in step S5, the preset termination conditions include: the current section score exceeds a preset first threshold, the quality score exceeds a preset second threshold, and the confidence level of the key anatomical point exceeds a preset third threshold.
[0035] Furthermore, the training process of the TD3 reinforcement learning model includes: constructing an offline dataset that includes the expert scanning process, wherein each record in the offline dataset contains a state transition quadruple; randomly sampling state transition quadruples from the offline dataset; inputting the next state vector of each sampled quadruple into the target policy network to generate the target action; inputting the next state vector and the target action into two target value networks, and using the smaller of the two output values together with the immediate reward to construct the temporal difference target; updating the parameters of the two value networks using the temporal difference target as the supervision signal; and, after updating the value networks several times, updating the parameters of the policy network once and synchronizing the parameters of the target policy network and the target value networks using a soft update.
[0036] Furthermore, during the update phase, the target policy network of the TD3 reinforcement learning model generates the target action from the next state vector, and smoothing noise of limited amplitude is added to the target action to suppress Q-value overestimation and policy oscillation.
[0037] Furthermore, in step S5, when the preset termination condition is met, a freeze action is triggered and the current frame image is captured. The current frame image is used as a key frame image, and the key frame image, along with its corresponding probe pose parameters and imaging parameters, are stored in the inspection cache.
[0038] Furthermore, in step S6, the generated diagnostic assessment report and individualized intervention plan include the final probe movement trajectory and parameter change record recorded during this scanning process, which are used for comparative analysis during follow-up examinations.
[0039] Example 2: As shown in Figure 2, this embodiment of the invention also provides a female pelvic floor muscle ultrasound assessment system based on TD3 reinforcement learning, used to perform the female pelvic floor muscle ultrasound assessment method based on TD3 reinforcement learning described in Embodiment 1, including: a color ultrasound imaging module, which includes a color ultrasound host and a dedicated ultrasound probe assembly connected to the color ultrasound host; the color ultrasound imaging module is used to acquire, in real time, ultrasound image sequences of the female pelvic floor muscles and the corresponding probe pose parameters and imaging parameters.
[0040] The data acquisition and processing module is connected to the color ultrasound host and is used to acquire the ultrasound image sequence, probe pose parameters and imaging parameters, and perform preprocessing and feature extraction to construct the current state vector.
[0041] The TD3-based reinforcement learning evaluation module is connected to the color ultrasound host and the data acquisition and processing module respectively. It has a built-in pre-trained TD3 reinforcement learning model, which receives the current state vector and outputs action instructions in the continuous action space, while performing experience replay and model parameter updates according to the immediate reward.
[0042] The color ultrasound host adjusts the probe pose parameters and imaging parameters according to the received action commands to form a closed-loop control until the termination conditions are met and the evaluation results are output.
[0043] The following is a more detailed description of the female pelvic floor muscle ultrasound assessment system based on TD3 reinforcement learning, according to an embodiment of the present invention.
[0044] As shown in Figure 3, the color ultrasound host issues control commands according to a predetermined control strategy, guiding the probe to perform the ultrasound scanning task and acquire ultrasound signals from the target area. The acquired signals are processed and converted by the data acquisition system into standardized data, including image feature information and probe pose parameters (such as translation position, rotation angle, and contact pressure). The processed data is input to the data acquisition and analysis system, which analyzes the ultrasound image features and pose data, extracts effective information, and provides a basis for subsequent optimization decisions. The information processed by the analysis system is then input to the TD3 reinforcement learning algorithm module, which uses a reinforcement learning strategy to compute the optimal probe control strategy from the input data. The resulting control commands are output through a feedback mechanism to adjust probe pose, pressure, and other parameters, so that ultrasound image quality and acquisition accuracy improve continuously. Finally, the color ultrasound host executes the optimized control commands to complete the next probe action or imaging parameter adjustment, thereby achieving precise acquisition and optimization of ultrasound images.
[0045] As shown in Figure 4, an offline dataset D is established. Offline dataset D includes the ultrasound image sequences recorded during expert scanning, the corresponding probe pose and contact pressure information, and the imaging parameter settings. It also synchronously saves the action taken at each step, whether the section is standard, the key anatomical point annotations, and the measurement results. During training, state transition quadruples are randomly sampled from offline dataset D in batches of size Batch_size and used to update the TD3 model parameters.
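Batch sampling of state transition quadruples can be illustrated with a small replay pool. This is a sketch only: the class name `ReplayPool`, the capacity, and the fixed seed are assumptions for illustration, and real records would hold state vectors derived from expert scans rather than toy numbers.

```python
import random
from collections import deque


class ReplayPool:
    """Hypothetical container for (state, action, reward, next_state) quadruples."""

    def __init__(self, capacity=10000, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest records are dropped when full
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement within one batch.
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return list(states), list(actions), list(rewards), list(next_states)
```

Returning the four components as parallel lists keeps each batch ready for a vectorized critic update.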
[0046] During an actual examination, the system acquires the image sequence $I_t$ in real time from the color Doppler ultrasound host and simultaneously reads the probe pose parameters $P_t$ and the imaging parameters $G_t$. The pose parameters $P_t$ include the translational position of the probe in the perineal region $(x_t, y_t)$, the rotation angles $(\theta_t, \eta_t)$, and the contact pressure $p_t$. The imaging parameters $G_t$ include gain, depth, focus, and the color Doppler pulse repetition frequency (PRF).
[0047] The image sequence $I_t$ is input into the section discrimination, quality assessment, and key anatomical point localization network to obtain the section score $q_t$, the quality score $m_t$, and the confidence $k_t$ of the key anatomical points (or the key structure segmentation results). The system then constructs the current state: $S_t = [q_t, m_t, k_t, P_t, G_t, \text{stage marker}]$. The stage marker is used to distinguish between the resting state and the Valsalva exertion state.
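One way to flatten these quantities into a single state vector is sketched below. The field names, units, and the 0/1 encoding of the stage marker are assumptions made for illustration; the source only lists which quantities the state contains.

```python
from dataclasses import dataclass


@dataclass
class ProbePose:
    x_mm: float          # translational position in the perineal region
    y_mm: float
    theta_deg: float     # rotation angles
    eta_deg: float
    pressure_n: float    # contact pressure


@dataclass
class ImagingParams:
    gain_db: float
    depth_cm: float
    focus_level: float
    prf_hz: float        # color Doppler pulse repetition frequency


def build_state(q, m, k, pose, imaging, valsalva):
    """Return the state as a flat list: scores, pose, imaging parameters, stage marker."""
    stage = 1.0 if valsalva else 0.0   # assumed encoding: 0 = resting, 1 = Valsalva
    return [
        q, m, k,
        pose.x_mm, pose.y_mm, pose.theta_deg, pose.eta_deg, pose.pressure_n,
        imaging.gain_db, imaging.depth_cm, imaging.focus_level, imaging.prf_hz,
        stage,
    ]
```

In practice the components would usually be normalized to comparable ranges before being passed to the policy network, which this sketch omits.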
[0048] In this embodiment, the TD3 model outputs the continuous action $A_t$, which consists of two parts: probe manipulation and imaging parameter adjustment.
The probe manipulation actions are: translating the probe slightly left-right or up-down in the perineal area (e.g., 0.5–2 mm per step); rotating the probe about its own axis (e.g., 0.5–3° per step); and adjusting the contact pressure by gently pressing or releasing the probe (e.g., ±0.1–0.3 N per step, or by a relative pressure level).
The imaging parameter adjustment actions are: gain fine-tuning (e.g., ±1–3 dB per step); imaging depth fine-tuning (e.g., ±0.5–1 cm per step); and focus position adjustment (e.g., moving the focus up or down by 1–2 increments).
[0049] The system can optionally set motion safety constraints to limit the amplitude of motion, ensuring that changes in probe pose and pressure do not exceed preset thresholds, and that imaging parameters are within the allowable range of the equipment.
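A per-step safety constraint of this kind can be sketched as a simple clipping function. The dictionary keys are hypothetical names, and the limits are taken from the upper ends of the example step ranges given in this embodiment; an actual device would use its own preset thresholds.

```python
# Assumed per-step limits, taken from the upper ends of the example ranges above.
STEP_LIMITS = {
    "dx_mm": 2.0,
    "dy_mm": 2.0,
    "dtheta_deg": 3.0,
    "dpressure_n": 0.3,
    "dgain_db": 3.0,
    "ddepth_cm": 1.0,
    "dfocus_steps": 2.0,
}


def clip_action(action, limits=STEP_LIMITS):
    """Clamp every action component to the symmetric range [-limit, +limit]."""
    clipped = {}
    for name, value in action.items():
        bound = limits[name]
        clipped[name] = max(-bound, min(bound, value))
    return clipped
```

Clipping bounds only the step size; keeping absolute pose, pressure, and imaging values within the allowable range of the equipment would need a second check against the current state.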
[0050] When the termination condition is not met (i.e., the section score $q_t$ has not reached its threshold or the confidence $k_t$ of the key anatomical points is insufficient), the system executes action $A_t$: the probe control and image adjustment module prompts the operator to move the probe in the indicated direction (or drives a robotic arm to perform fine adjustments) and simultaneously adjusts the color Doppler ultrasound imaging parameters. The system then acquires the image sequence $I_{t+1}$ at the next moment and constructs the new state $S_{t+1}$, repeating this loop.
[0051] For example, when the system detects insufficient confidence at the pubic symphysis key point and a tilted section, the TD3 output action might be: Δy = +1.0 mm (shift toward the head), Δθ = -2° (counterclockwise return to center), and a slight pressure increase to improve coupling, output together with a gain adjustment of +2 dB to improve structural boundary contrast. After execution, the system acquires $I_{t+1}$ and re-evaluates.
[0052] The system calculates the reward $R_t$ based on the section score, quality score, key point confidence, time taken, and range of motion. The reward includes a penalty for the time or number of steps consumed by the current step, and a penalty on the action magnitude that discourages frequent, large adjustments. The reward $R_t$ and the state transition $(S_t, A_t, R_t, S_{t+1})$ are written into the experience replay pool D for training or online updates.
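A reward with this structure can be sketched as a weighted sum of score improvements minus the two penalties. The weights, the penalty coefficients, and the use of the Euclidean norm for action magnitude are assumptions chosen for illustration; the source does not specify their values or the exact functional form.

```python
import math


def immediate_reward(prev, curr, action,
                     w_q=1.0, w_m=1.0, w_k=1.0,
                     step_penalty=0.01, action_penalty=0.05):
    """Reward = weighted improvement in (q, m, k) - step penalty - action-magnitude penalty.

    `prev` and `curr` are dicts with keys "q", "m", "k"; `action` is a list of floats.
    All weights and coefficients are hypothetical.
    """
    dq = curr["q"] - prev["q"]
    dm = curr["m"] - prev["m"]
    dk = curr["k"] - prev["k"]
    magnitude = math.sqrt(sum(a * a for a in action))
    return w_q * dq + w_m * dm + w_k * dk - step_penalty - action_penalty * magnitude
```

With these example coefficients, a step that changes nothing still costs the step penalty, which pushes the policy toward reaching a standard section in fewer steps.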
[0053] When the termination conditions are met, the system terminates the current episode and outputs the measurement indicators and evaluation conclusions. The termination conditions are: the section score $q_t$ exceeds the preset first threshold; the quality score $m_t$ exceeds the preset second threshold; and the confidence $k_t$ of the key anatomical points exceeds the preset third threshold. Furthermore, the system can require keyframes that meet these thresholds in both the resting and Valsalva states.
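The termination test can be expressed as a small predicate. The threshold values and the `keyframes` dictionary layout below are hypothetical; the source only states that three preset thresholds exist and that both stages may be required.

```python
def meets_thresholds(q, m, k, q_min=0.9, m_min=0.8, k_min=0.85):
    """True when all three scores exceed their (assumed) preset thresholds."""
    return q > q_min and m > m_min and k > k_min


def scan_complete(keyframes, require_both_stages=True):
    """`keyframes` maps a stage name to the (q, m, k) scores of its captured frame."""
    stages = ("rest", "valsalva") if require_both_stages else ("rest",)
    return all(
        stage in keyframes and meets_thresholds(*keyframes[stage])
        for stage in stages
    )
```

For example, a scan with a qualifying resting keyframe but no Valsalva keyframe is not complete when both stages are required.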
[0054] At the termination point, the system triggers a "freeze / capture frame" action, capturing the current frame and storing it, together with its corresponding probe pose $P_t$ and imaging parameters $G_t$, in the inspection cache. The measurement module then automatically measures key structures (such as indicators related to the hiatus and the anterior pelvic compartment), generating measurement results and evaluation conclusions. At the same time, the system can output the final probe movement trajectory and parameter change record of this scan, which is convenient for review and follow-up comparison.
[0055] As shown in Figure 5, the TD3 algorithm is used to implement policy learning in a continuous action space. The system includes: a policy network (Actor) and its target network, two value networks (Critic1 / Critic2) and their target networks, an experience replay pool D, and a parameter soft update module. During interaction, the system observes the environment state $S_t$ at time $t$, inputs it into the policy network (Actor) to obtain the continuous action $A_t$, and after executing the action receives the reward $R_t$ and the next state $S_{t+1}$. The transition quadruple $(S_t, A_t, R_t, S_{t+1})$ is stored in the experience replay pool D. During training, a batch of samples is randomly drawn from D to update the network parameters.
[0056] During the update phase, the target policy network $\pi_{\phi'}$ first generates the target action from the next state $S_{t+1}$, and smoothing noise of limited amplitude is added to the target action to suppress Q-value overestimation and policy oscillation:
$$\tilde{A}_{t+1} = \pi_{\phi'}(S_{t+1}) + \epsilon$$
where $\epsilon$ is noise clipped to a limited amplitude. Then $(S_{t+1}, \tilde{A}_{t+1})$ is input into the two target value networks $Q_{\theta'_1}$ and $Q_{\theta'_2}$ to obtain two target Q values, and the smaller of the two is taken as the target Q value. The temporal difference target is thus constructed:
$$y_t = R_t + \gamma \min_{i=1,2} Q_{\theta'_i}(S_{t+1}, \tilde{A}_{t+1})$$
where $\gamma$ is the discount factor. Then, using $y_t$ as the supervision signal, the loss function is calculated for both value networks:
$$L(\theta_i) = \mathbb{E}\left[\left(Q_{\theta_i}(S_t, A_t) - y_t\right)^2\right], \quad i = 1, 2$$
Gradient descent on this loss is applied to Critic1 / Critic2 so that the action value function becomes a more accurate estimate.
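The target computation described here (target policy smoothing followed by the minimum over two target critics) can be sketched in plain Python. The networks are passed in as ordinary callables, and the default values for the discount factor, noise scale, noise clip, and action bound are common choices assumed for illustration, not values stated in the source.

```python
import random


def clip(value, low, high):
    return max(low, min(high, value))


def td3_target(reward, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Temporal difference target with target policy smoothing and clipped double-Q."""
    action = target_actor(next_state)
    # Target policy smoothing: clipped Gaussian noise on every action dimension.
    smoothed = [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
             -max_action, max_action)
        for a in action
    ]
    # Clipped double-Q: take the smaller of the two target critic estimates.
    q_min = min(target_q1(next_state, smoothed), target_q2(next_state, smoothed))
    return reward + gamma * q_min


def critic_loss(q_values, targets):
    """Mean squared error between critic outputs and temporal difference targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(targets)
```

The same target is used to compute the loss of both critics, so an overestimate in one critic cannot inflate the target on its own.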
[0057] To improve training stability, TD3 employs a "delayed update" strategy: the policy network (Actor) is updated once only after the value networks have been updated several times. The update objective of the policy network is to maximize the Q value evaluated by the value network, i.e., the Actor is updated by gradient ascent so that its output action in the current state receives a higher value estimate. After the policy network update is complete, a soft update is used to synchronize the target network parameters:
$$\theta' \leftarrow \tau \theta + (1 - \tau) \theta'$$
where $\theta$ denotes the online network parameters, $\theta'$ denotes the target network parameters, and $\tau$ is the soft update coefficient. The above process of interactive sampling, experience replay, taking the minimum of the two value networks, noise smoothing of the target action, delayed update, and soft update is executed cyclically until the preset training termination condition is met, thereby obtaining a converged policy network used in the assessment process that outputs measurement indicators and evaluation conclusions.
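The delayed update schedule and the soft update rule can be sketched as follows. Parameters are represented as flat lists of floats, and the delay of 2 and coefficient of 0.005 are commonly used defaults assumed here for illustration; the source does not state specific values.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Return new target parameters: tau * online + (1 - tau) * target."""
    return [
        tau * w + (1.0 - tau) * w_t
        for w, w_t in zip(online_params, target_params)
    ]


def actor_update_steps(total_steps, policy_delay=2):
    """Critic updates happen every step; return the steps at which the actor
    and the target networks are also updated (every `policy_delay` steps)."""
    return [step for step in range(1, total_steps + 1) if step % policy_delay == 0]
```

A small soft update coefficient makes the target networks trail the online networks slowly, which keeps the temporal difference targets stable between updates.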
[0058] The female pelvic floor muscle ultrasound assessment system based on TD3 reinforcement learning in this invention is based on color ultrasound imaging data. By introducing a TD3 reinforcement learning strategy over a multidimensional continuous action space, it learns and models ultrasound image features and related dynamic parameters to achieve intelligent diagnosis and comprehensive assessment of the functional status of female pelvic floor muscles, thereby improving the objectivity and consistency of the assessment results.
[0059] The female pelvic floor muscle ultrasound assessment system based on TD3 reinforcement learning in this invention acquires color Doppler ultrasound image sequences under resting and exertion states through a probe, performs preprocessing and feature extraction, judges the imaging quality and standard sections, uses reinforcement learning strategies to achieve section guidance and keyframe selection, completes the automatic measurement of key anatomical structures and functional indicators, and performs fusion analysis on the assessment results, finally automatically generating a diagnostic assessment report and individualized training or intervention suggestions.
[0060] The TD3 reinforcement learning-based ultrasound assessment system for female pelvic floor muscles in this invention can improve the accuracy and consistency of assessments. During color Doppler ultrasound assessment of female pelvic floor muscles, the system first achieves standard section pre-positioning through instructional guidance and image quality judgment. Then, a multi-dimensional continuous reinforcement learning strategy optimizes the scanning process, automatically selecting keyframes and completing stable capture of resting and exertion states, thereby avoiding section deviations and poor repeatability caused by differences in operator experience. Simultaneously, the system automatically measures and fuses key anatomical structures and functional indicators, reducing manual measurement errors and omissions, effectively improving assessment efficiency and result comparability, facilitating follow-up comparisons, and generating individualized training or intervention plans.
[0061] The above is a detailed description of the preferred embodiments of the present invention. However, the present invention is not limited to the embodiments described. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention. All such equivalent modifications or substitutions are included within the scope defined by the claims of this application.
Claims
1. A female pelvic floor muscle ultrasound assessment method based on TD3 reinforcement learning, characterized in that it includes the following steps:
Step S1: The color ultrasound host controls the dedicated ultrasound probe to acquire the female pelvic floor muscle ultrasound image sequence at the current moment and the corresponding probe pose parameters and imaging parameters, and constructs the current state vector;
Step S2: The current state vector is input into the pre-trained TD3 reinforcement learning model, and the TD3 reinforcement learning model outputs the current action instruction in the continuous action space;
Step S3: According to the current action instruction, the probe pose parameters and imaging parameters are adjusted, and the ultrasound image sequence and corresponding probe pose parameters and imaging parameters at the next moment are acquired to construct the next state vector;
Step S4: The immediate reward is calculated based on the current state vector, the current action instruction, and the next state vector, and the state transition quadruple is stored in the experience replay pool, wherein the state transition quadruple includes the current state vector, the current action instruction, the immediate reward, and the next state vector;
Step S5: Steps S1 to S4 are executed iteratively until the preset termination condition is met, and keyframe images that meet the evaluation requirements and the corresponding probe pose parameters and imaging parameters are obtained;
Step S6: Based on the keyframe images, key anatomical structures and functional indicators are automatically measured, and a diagnostic assessment report and individualized intervention plan are generated.
2. The ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning according to claim 1, characterized in that, in step S1, the constructed current state vector includes: the section score, quality score, and confidence of key anatomical points obtained by passing the ultrasound image sequence through the section discrimination and quality assessment network; the probe pose parameters; the imaging parameters; and the stage marker used to distinguish between the resting state and the Valsalva exertion state.
3. The ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning according to claim 1, characterized in that, in step S2, the current action instruction output by the TD3 reinforcement learning model in the continuous action space includes: probe control actions, which include the translational displacement, rotation angle, and contact pressure adjustment of the probe in the perineal region; and imaging parameter adjustment actions, which include gain fine-tuning, imaging depth fine-tuning, and focus position adjustment.
4. The ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning according to claim 1, characterized in that, in step S4, the immediate reward is calculated based on the improvement in the current section score, quality score, and confidence of the key anatomical points, and additionally incorporates a time penalty for the current step and a magnitude penalty on the current action instruction.
5. The ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning according to claim 1, characterized in that, in step S5, the preset termination condition includes: the current section score exceeds a preset first threshold, the quality score exceeds a preset second threshold, and the confidence of the key anatomical points exceeds a preset third threshold.
6. The ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning according to claim 1, characterized in that the training process of the TD3 reinforcement learning model includes:
constructing an offline dataset that includes the expert scanning process, wherein each record in the offline dataset contains a state transition quadruple;
randomly sampling state transition quadruples from the offline dataset;
inputting the next state vector in the sampled state transition quadruple into the target policy network to generate the target action, inputting the target action and the immediate reward into two sets of target value networks, and using the smaller value output by the two sets of target value networks to construct the temporal difference target;
updating the parameters of the two sets of value networks using the temporal difference target as a supervision signal;
after updating the value networks several times, updating the parameters of the policy network once, and synchronizing the parameters of the target policy network and the target value networks using a soft update method.
7. The ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning according to claim 6, characterized in that, during the update phase of the TD3 reinforcement learning model, the target policy network generates a target action based on the next state vector, and smoothing noise of limited amplitude is added to the target action to suppress Q-value overestimation and policy oscillation.
8. The ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning according to claim 1, characterized in that, in step S5, when the preset termination condition is met, a freeze action is triggered and the current frame image is captured; the current frame image is used as a keyframe image, and the keyframe image, together with its corresponding probe pose parameters and imaging parameters, is stored in the inspection cache.
9. The ultrasound assessment method for female pelvic floor muscles based on TD3 reinforcement learning according to claim 1, characterized in that, in step S6, the generated diagnostic assessment report and individualized intervention plan include the final probe movement trajectory and the parameter change record from this scan, which are used for comparative analysis during follow-up examinations.
10. A female pelvic floor muscle ultrasound assessment system based on TD3 reinforcement learning, used to perform the female pelvic floor muscle ultrasound assessment method based on TD3 reinforcement learning as described in any one of claims 1-9, characterized in that it includes:
a color ultrasound imaging module, which includes a color ultrasound host and a dedicated ultrasound probe assembly, wherein the dedicated ultrasound probe assembly is connected to the color ultrasound host, and the color ultrasound imaging module is used to acquire ultrasound image sequences of the female pelvic floor muscles in real time together with the corresponding probe pose parameters and imaging parameters;
a data acquisition and processing module, which is connected to the color ultrasound host and is used to acquire the ultrasound image sequence, probe pose parameters, and imaging parameters, and to perform preprocessing and feature extraction to construct the current state vector;
a TD3-based reinforcement learning evaluation module, which is connected to the color ultrasound host and the data acquisition and processing module respectively, and has a built-in pre-trained TD3 reinforcement learning model that is used to receive the current state vector and output action instructions in the continuous action space, and at the same time to perform experience replay and parameter updates of the model according to the immediate reward;
wherein the color ultrasound host adjusts the probe pose parameters and imaging parameters according to the received action instructions to form closed-loop control until the termination condition is met and the evaluation results are output.