A video-based parkinsonian rest tremor assessment method and device, electronic equipment and medium
By combining K-means clustering, noise reduction with smoothing depth models, and a deep classification model, the invention addresses the low accuracy of existing technologies in assessing the level of resting tremor in Parkinson's disease. It provides a more accurate tremor assessment and a more objective basis for judgment, and it can be deployed on multi-platform systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2023-04-07
- Publication Date
- 2026-06-19
AI Technical Summary
Existing end-to-end deep learning models have low accuracy in assessing the level of resting tremor in Parkinson's disease, and video human pose estimation algorithms are subject to noise interference, resulting in inaccurate assessment results.
A video-based method for assessing resting tremor in Parkinson's disease is adopted. The K-means clustering algorithm groups the difference scores of the skeleton points. Noise reduction is then performed with separate smoothing depth models for small and large tremors. Tremor motion information is calculated from the denoised skeleton points and input into a deep classification model to provide a more accurate assessment of tremor level.
It improves the accuracy of assessing resting tremor in Parkinson's disease, provides more objective criteria for judgment, is applicable to mobile phones and web-server systems, has scalability and portability, and improves assessment efficiency.
Figure CN116386868B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of artificial intelligence and video image processing technology, and specifically to a video-based method, device, electronic device, and medium for assessing resting tremor in Parkinson's disease.
Background Technology
[0002] Parkinson's disease is the second most common neurodegenerative disease after Alzheimer's disease, affecting nearly 10 million people worldwide. Its main symptoms include tremor, rigidity, bradykinesia, and postural instability. The diagnosis of Parkinson's disease depends heavily on the subjective experience of physicians: a meta-analysis showed that the error rate for non-expert physicians in diagnosing Parkinson's disease was approximately 26.2%, while the error rate for expert physicians ranged from 16.1% (initial diagnosis) to 20.4% (follow-up diagnosis). To reduce this reliance on subjective experience, some neurological and radiological detection devices and wearable sensor devices have been used for the auxiliary diagnosis of Parkinson's disease. However, the former are costly and can physically harm patients, while the latter are inefficient to put on and make diagnostic accuracy highly dependent on sensor quality. These drawbacks have prevented the widespread adoption of both auxiliary diagnostic technologies.
[0003] With the development of information technology, traditional image and video processing technology and deep learning technology in the field of artificial intelligence have been widely used in the auxiliary diagnosis of Parkinson's disease, making the diagnostic process more convenient and intelligent for doctors. However, the following problems exist:
[0004] 1. Feeding patient images and video into an end-to-end deep learning model that directly outputs a Parkinson's disease grade gives low accuracy and poor interpretability, because such models have difficulty capturing the course of the patient's movement and posture. Specifically, Mandy Lu et al.[1] extracted a skeleton point sequence from the video, calculated the Euclidean distance between adjacent skeleton points and the movement speed of each skeleton point, and input these distances and speeds into a deep classification model to output the grade of Parkinson's disease. Mark Endo et al.[2] input the skeleton point sequence into a Transformer deep model, which learns the connection relationships between the skeleton points and outputs the grade of Parkinson's disease. Although the grades output by these two methods have some reference value, their low accuracy may mislead the judgment of Parkinson's disease. Moreover, a single grade offers little assessment information that could reduce reliance on subjective experience, so the physician's assessment of human posture still rests on the patient's original images and video.
[0005] 2. When deep learning is used to estimate human pose in videos, the loss functions of existing algorithms mainly optimize the L1 / L2 (absolute distance / Euclidean distance) loss between the predicted and ground truth positions of the skeleton points. Specifically, the loss functions adopted by Cao et al.[3] and Sun et al.[4] are mainly designed around the L2 Euclidean distance between the skeleton point positions in the predicted heatmap and those in the ground truth heatmap. However, L1 and L2 are both undirected, so existing algorithms show obvious jitter when processing challenging video frames. When such an algorithm is used to estimate a patient's tremor, this jitter is equivalent to adding algorithmic noise to the original tremor: the extracted pose no longer matches the real pose, the calculated motion and pose information becomes less accurate, and both the fidelity of the input to the subsequent grading deep model and the accuracy of the resulting grade suffer, which misleads the assessment of human pose.
[0006] The references mentioned above are as follows:
[0007]-[0008] [1] Lu, M., Poston, K., Pfefferbaum, A., Sullivan, EV, Fei-Fei, L., Pohl, KM, Niebles, JC, Adeli, E.: Vision-based estimation of MDS-UPDRS gait scores for assessing Parkinson's disease motor severity. In: Medical Image Computing and Computer Assisted Intervention: 2020. 637–647.
[0009] [2] Endo, M., Poston, KL, Sullivan, EV, Fei-Fei, L., Pohl, KM, Adeli, E.: GaitForeMer: Self-supervised pre-training of transformers via human motion forecasting for few-shot gait impairment severity estimation. In: Medical Image Computing and Computer Assisted Intervention: 2022. 130–139.
[0010] [3] Cao, Z., Simon, T., Wei, SE, Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition: 2017. 7291–7299.
Summary of the Invention
[0011] The purpose of this invention is to overcome the shortcomings of the prior art and provide a video-based method, device, electronic device and medium for assessing resting tremor in Parkinson's disease, so as to solve the problems of low accuracy of end-to-end deep learning models in assessing tremor level and strong interference of noise in video human posture estimation algorithms on tremor estimation.
[0012] To achieve the above objectives, the present invention is implemented using the following technical solution:
[0013] In a first aspect, the present invention provides a video-based method for assessing resting tremor in Parkinson's disease, the method comprising:
[0014] Based on the acquired patient video to be evaluated, the pixel neighborhood blocks of different skeleton points on each frame of the video are determined, and the difference score between the pixel neighborhood blocks corresponding to each skeleton point in each two adjacent frames is calculated.
[0015] Perform K-means clustering with three clusters on the difference scores of the skeleton points, and determine whether the cluster with the highest difference scores contains a root skeleton point.
[0016] If it does not, the skeleton points of that cluster are input into the large tremor smoothing depth model for noise reduction, and the skeleton points of the other two clusters are input into the small tremor smoothing depth model for noise reduction. If it does, the skeleton points of all three clusters are input into the small tremor smoothing depth model for noise reduction. In both cases the overall skeleton point coordinate sequence after noise reduction is obtained.
[0017] Based on the denoised overall skeleton point coordinate sequence, the tremor motion information of each skeleton point is calculated and input into a preset deep classification model to obtain the assessment result of the resting tremor level of Parkinson's disease.
[0018] In conjunction with the first aspect, preferably, determining the pixel neighborhood blocks of different skeleton points on each frame of the acquired patient evaluation video includes the following steps:
[0019] The acquired patient video to be evaluated is input into a preset human pose estimation model to obtain a two-dimensional skeleton point coordinate sequence of all frames of the video.
[0020] Using the coordinates of each two-dimensional skeleton point as the center, a square pixel neighborhood block of fixed size is determined on each corresponding frame image to obtain the pixel neighborhood block of different skeleton points on each frame image.
[0021] The human pose estimation model includes a Faster R-CNN deep region convolutional network and an HRNet high-resolution network. The Faster R-CNN deep region convolutional network is trained using bounding box data of people labeled in the MSCOCO dataset and is used to predict the rectangular bounding box of a person in an image. The HRNet high-resolution network is trained using skeleton point data of people labeled in the MSCOCO dataset and is used to predict the position of skeleton points of the person in the rectangular bounding box.
[0022] In conjunction with the first aspect, preferably, the formula for calculating the difference score is:
[0023] $$S_i = \frac{1}{T-1}\sum_{t=1}^{T-1}\frac{N\left(B_{t,i}^{(t+1)} - B_{t,i}^{(t)}\right)}{l^{2}}$$
[0024] In the formula: $S_i$ denotes the difference score of the $i$-th skeleton point; $T$ denotes the total number of frames in the video; $N(\cdot)$ is a function that counts the non-zero pixels in the difference of two pixel neighborhood blocks; $B_{t,i}^{(k)}$ denotes the pixel neighborhood block determined on the $k$-th frame image, centered at the coordinates of the $i$-th skeleton point in the $t$-th frame image; and $l$ denotes the side length of the determined pixel neighborhood block.
[0025] In conjunction with the first aspect, preferably, the large tremor smoothing depth model includes a three-branch fully connected layer for learning the coordinates, velocity, and acceleration of the skeleton points respectively; each of the three-branch fully connected layers of the large tremor smoothing depth model is provided with four fitting layer modules;
[0026] The large tremor smoothing depth model is trained using the 2D motion coordinate data captured with infrared reflective markers in the Human3.6M public dataset as ground truth data, and the noisy 2D motion coordinate data that the HRNet network detects in the human motion videos of the same dataset as input data.
[0027] In conjunction with the first aspect, preferably, the small tremor smoothing depth model includes a three-branch fully connected layer for learning the coordinates, velocity, and acceleration of the skeleton points respectively; each of the fully connected layers of the small tremor smoothing depth model is provided with four temporal convolutional modules;
[0028] The small tremor smoothing depth model is trained on a dataset with small motion amplitude, constructed by frame extension of the 2D coordinate data.
[0029] In conjunction with the first aspect, preferably, the calculation of the tremor motion information of each skeletal point includes calculating the motion amplitude, motion acceleration, and motion frequency of each skeletal point;
[0030] The calculation of the motion amplitude of each skeleton point includes: calculating the Euclidean distance between the coordinates of the skeleton point in each frame other than the first frame and the coordinates of the same skeleton point in the first frame. The formula for calculating the motion amplitude of each skeleton point is:
[0031] $$A_t^i = \left\lVert p_t^i - p_1^i \right\rVert_2$$
[0032] In the formula, $A_t^i$ denotes the motion amplitude of the $i$-th skeleton point in the $t$-th frame image; $p_t^i$ denotes the position of the $i$-th skeleton point in the $t$-th frame image; and $p_1^i$ denotes the position of the $i$-th skeleton point in the first frame image;
[0033] The calculation of the motion acceleration of each skeleton point includes: obtaining the motion velocity of the skeleton point as the difference between its motion amplitudes in each two adjacent frames, and then obtaining the motion acceleration as the difference between the motion velocities of two adjacent frames. The calculation formulas are as follows:
[0034] $$v_t^i = A_{t+1}^i - A_t^i$$
[0035] $$a_t^i = v_{t+1}^i - v_t^i$$
[0036] In the formula: $v_t^i$ and $v_{t+1}^i$ denote the motion velocity of the $i$-th skeleton point in the $t$-th and $(t+1)$-th frame images; $A_t^i$ and $A_{t+1}^i$ denote the motion amplitude of the $i$-th skeleton point in the $t$-th and $(t+1)$-th frame images; and $a_t^i$ denotes the motion acceleration of the $i$-th skeleton point in the $t$-th frame image;
[0037] The calculation of the motion frequency of each skeleton point includes: calculating the reciprocal of the peak interval of the motion amplitude of each skeleton point, which is the motion frequency of that skeleton point. The calculation formula is as follows:
[0038] $$f_i = \frac{1}{t_{k+1}^i - t_k^i}$$
[0039] In the formula: $f_i$ denotes the motion frequency of the $i$-th skeleton point; $t_k^i$ and $t_{k+1}^i$ denote the times of the $k$-th and $(k+1)$-th peaks of the motion amplitude of the $i$-th skeleton point, respectively.
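As an illustration of the frequency calculation, the following minimal Python sketch takes the reciprocal of the interval between successive amplitude peaks. The function names, the use of strict local maxima as the peak criterion, and the `fps` parameter used to convert frame indices to seconds are assumptions for illustration and are not specified in the text.

```python
from typing import List


def peak_indices(amplitude: List[float]) -> List[int]:
    """Frame indices of strict local maxima of the amplitude sequence (assumed peak rule)."""
    return [t for t in range(1, len(amplitude) - 1)
            if amplitude[t - 1] < amplitude[t] > amplitude[t + 1]]


def motion_frequencies(amplitude: List[float], fps: float) -> List[float]:
    """Reciprocal of the time interval between each pair of successive peaks, in Hz."""
    peaks = peak_indices(amplitude)
    # A peak at frame index t occurs at time t / fps, so the interval between
    # peaks a and b is (b - a) / fps and its reciprocal is fps / (b - a).
    return [fps / (b - a) for a, b in zip(peaks, peaks[1:])]


if __name__ == "__main__":
    amplitude = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    print(motion_frequencies(amplitude, fps=30.0))
```

Each returned value corresponds to one pair of successive peaks; how these values would be aggregated into a single frequency per skeleton point is left open here.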
[0040] In conjunction with the first aspect, preferably, the deep classification model includes three feature extraction branches with identical structures, a fully connected layer module, and a Softmax grading module; each feature extraction branch includes a fully connected layer module at its head for feature extraction, four temporal convolutional modules for fitting the tremor motion features of the skeleton points, and a fully connected layer module for integrating the outputs of the temporal convolutional modules into grading features.
[0041] In a second aspect, the present invention provides a video-based assessment device for resting tremor in Parkinson's disease, the device comprising:
[0042] The acquisition module is used to determine the pixel neighborhood blocks of different skeleton points on each frame of the acquired patient video to be evaluated, and to calculate the difference score between the pixel neighborhood blocks corresponding to each skeleton point in each two adjacent frames.
[0043] The clustering judgment module is used to perform K-means clustering with three clusters on the difference scores of the skeleton points, and to determine whether the cluster with the highest difference scores contains a root skeleton point.
[0044] The noise reduction module is used to input the skeleton points of that cluster into the large tremor smoothing depth model for noise reduction, and the skeleton points of the other two clusters into the small tremor smoothing depth model for noise reduction, if the clustering judgment module determines that the cluster does not contain a root skeleton point. If the clustering judgment module determines that the cluster contains a root skeleton point, the skeleton points of all three clusters are input into the small tremor smoothing depth model for noise reduction. In both cases the overall skeleton point coordinate sequence after noise reduction is obtained.
[0045] The grading assessment module is used to calculate the tremor motion information of each skeleton point based on the denoised overall skeleton point coordinate sequence, and to input the tremor motion information into a preset deep classification model to obtain the assessment result of the resting tremor grade of Parkinson's disease.
[0046] In a third aspect, the present invention provides an electronic device, including a processor and a storage medium;
[0047] The storage medium is used to store instructions;
[0048] The processor is configured to operate according to the instructions to perform the steps of the video-based assessment method for resting tremor in Parkinson's disease as described in any of the first aspects.
[0049] In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the video-based method for assessing resting tremor in Parkinson's disease as described in any of the first aspects.
[0050] Compared with the prior art, the beneficial effects achieved by the present invention are as follows:
[0051] This invention uses the K-means clustering algorithm to group the difference scores between the pixel neighborhood blocks of each skeleton point in adjacent frames into three clusters, effectively distinguishing skeleton points with different tremor magnitudes. It then applies the small tremor smoothing depth model and the large tremor smoothing depth model to denoise the different clusters of skeleton points and obtain a denoised overall skeleton point coordinate sequence. This effectively reduces the jitter of existing human pose estimation algorithms while preserving the patient's resting tremor as far as possible, and avoids the removal of resting tremor that occurs with some filtering algorithms, thereby improving assessment accuracy. Furthermore, the resting tremor motion information that this invention calculates from the two-dimensional skeleton point coordinate sequence extracted from the patient's video gives doctors a more objective view of the patient's tremor. Compared with existing technologies that provide only grade information, the combination of tremor motion information and tremor grade information offers doctors more objective judgment criteria. Moreover, this invention can be readily deployed on mobile phones and on web-server front-end and back-end systems, making it scalable and portable and improving the efficiency of assessing resting tremor in Parkinson's disease.
Description of the Drawings
[0052] Figure 1 is a schematic diagram of the video-based method for assessing resting tremor in Parkinson's disease provided in an embodiment of the present invention;
[0053] Figure 2 is a schematic flowchart of a specific implementation of the video-based method for assessing resting tremor in Parkinson's disease provided in an embodiment of the present invention;
[0054] Figure 3 is a structural block diagram of the noise reduction processing of the large tremor smoothing depth model provided in an embodiment of the present invention;
[0055] Figure 4 is a structural block diagram of the noise reduction processing of the small tremor smoothing depth model provided in an embodiment of the present invention;
[0056] Figure 5 is a structural block diagram of the deep classification model processing resting tremor motion information provided in an embodiment of the present invention;
[0057] Figure 6 is a structural block diagram of the video-based device for assessing resting tremor in Parkinson's disease provided in an embodiment of the present invention.
Detailed Implementation
[0058] The technical solution of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments and specific features in the embodiments are detailed descriptions of the technical solution of the present application, rather than limitations thereof. In the absence of conflict, the embodiments and technical features in the embodiments can be combined with each other.
[0059] In this article, the term "and / or" is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this article generally indicates that the preceding and following related objects have an "or" relationship.
[0060] Example 1:
[0061] As shown in Figure 1, this embodiment provides a video-based method for assessing resting tremor in Parkinson's disease, which specifically includes the following steps:
[0062] S1: Based on the acquired patient video to be evaluated, determine the pixel neighborhood blocks of different skeleton points on each frame of the video, and calculate the difference score between the pixel neighborhood blocks corresponding to each skeleton point in each two adjacent frames.
[0063] S2: Perform K-means clustering with three clusters on the difference scores of the skeleton points, and determine whether the cluster with the highest difference scores contains a root skeleton point.
[0064] S3: If it does not, the skeleton points of that cluster are input into the large tremor smoothing depth model for noise reduction, and the skeleton points of the other two clusters are input into the small tremor smoothing depth model for noise reduction; if it does, the skeleton points of all three clusters are input into the small tremor smoothing depth model for noise reduction. In both cases the overall skeleton point coordinate sequence after noise reduction is obtained.
[0065] S4: Based on the denoised overall skeleton point coordinate sequence, calculate the tremor motion information of each skeleton point, and input the tremor motion information into a preset deep classification model to obtain the assessment result of the resting tremor level of Parkinson's disease.
[0066] As an embodiment of the present invention, step S1, determining the pixel neighborhood blocks of different skeleton points on each frame of the acquired patient evaluation video, includes the following steps:
[0067] Step A: Input the acquired patient video to be evaluated into the preset human pose estimation model to obtain the two-dimensional skeleton point coordinate sequence of all frames of the video;
[0068] Specifically, the human pose estimation model in this embodiment adopts a top-down deep model strategy, which gives relatively stable and accurate estimates. It includes a Faster R-CNN deep region convolutional network and an HRNet high-resolution network. The Faster R-CNN network, trained on the MSCOCO public object detection dataset, first estimates a rectangular box containing the patient's entire torso. Within that box, the HRNet network, trained on the MSCOCO public human skeleton point dataset, then estimates the coordinates of the patient's two-dimensional skeleton points. Combining the two-dimensional skeleton points estimated from all video frames in order yields a noisy sequence of the patient's two-dimensional skeleton point coordinates. A bottom-up human pose estimation strategy estimates limbs first, so patient shadows or limb-like objects in the image can cause misestimation; the top-down strategy avoids this.
[0069] Step B: Using the coordinates of each two-dimensional skeleton point as the center, determine a square pixel neighborhood block of fixed size on each corresponding frame image to obtain the pixel neighborhood block of different skeleton points on each frame image.
[0070] Specifically, the width and height of the square pixel neighborhood block are set to 1 / 6 of the average difference between the maximum and minimum ordinate values of the patient's two-dimensional skeleton point coordinate sequence obtained in step A, so that the block size adapts to the resolution of different videos. Centered on each two-dimensional skeleton point of the current frame, a square pixel neighborhood block with this width and height is determined on the grayscale image of the current frame, and a block of the same size is determined at the same position on the grayscale image of the next frame. The difference between the two blocks is calculated, and the proportion of non-zero pixels in the resulting difference block to the total number of pixels in the block is used as the difference score of the current two-dimensional skeleton point in the current frame. The formula for calculating the difference score is:
[0071] $$S_i = \frac{1}{T-1}\sum_{t=1}^{T-1}\frac{N\left(B_{t,i}^{(t+1)} - B_{t,i}^{(t)}\right)}{l^{2}}$$
[0072] In the formula: $S_i$ denotes the difference score of the $i$-th skeleton point; $T$ denotes the total number of frames in the video; $N(\cdot)$ is a function that counts the non-zero pixels in the difference of two pixel neighborhood blocks; $B_{t,i}^{(k)}$ denotes the pixel neighborhood block determined on the $k$-th frame image, centered at the coordinates of the $i$-th skeleton point in the $t$-th frame image; and $l$ denotes the side length of the determined pixel neighborhood block.
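The block-size rule and the difference score of step B can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the zero padding at image borders, and the averaging of per-frame ratios over all adjacent frame pairs are illustrative choices, and frames are represented as nested lists of grayscale intensities.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]        # grayscale image as rows of pixel intensities
Point = Tuple[float, float]    # (x, y) coordinate of one skeleton point


def block_side(points_per_frame: Sequence[Sequence[Point]]) -> int:
    """Side length: 1/6 of the mean per-frame (max y - min y) of the skeleton."""
    extents = [max(y for _, y in pts) - min(y for _, y in pts)
               for pts in points_per_frame]
    return max(1, round(sum(extents) / len(extents) / 6))


def crop(frame: Frame, center: Point, side: int) -> Frame:
    """Cut a side x side block centred on `center`; pixels outside the image read as 0."""
    h, w = len(frame), len(frame[0])
    cx, cy = int(round(center[0])), int(round(center[1]))
    half = side // 2
    rows = range(cy - half, cy - half + side)
    cols = range(cx - half, cx - half + side)
    return [[frame[r][c] if 0 <= r < h and 0 <= c < w else 0 for c in cols]
            for r in rows]


def difference_score(frames: Sequence[Frame], track: Sequence[Point], side: int) -> float:
    """Mean fraction of changed pixels between blocks of adjacent frames."""
    ratios = []
    for t in range(len(frames) - 1):
        current = crop(frames[t], track[t], side)
        following = crop(frames[t + 1], track[t], side)  # same position, next frame
        changed = sum(1 for row_a, row_b in zip(current, following)
                      for pa, pb in zip(row_a, row_b) if pa != pb)
        ratios.append(changed / (side * side))
    return sum(ratios) / len(ratios)
```

Here `track` holds the coordinates of a single skeleton point over all frames, so one call yields one score per skeleton point, which is the quantity that the subsequent clustering step consumes.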
[0073] Furthermore, step S2 in this embodiment performs K-means clustering with three clusters on the difference scores of the skeleton points, separating them into three groups by motion intensity: stronger tremor, weaker tremor, and relatively stable. Using three clusters avoids a misclassification that occurs with two clusters, where skeleton points whose motion is stronger than stable but weaker than strong tremor are assigned to the stronger tremor group. After clustering, if the cluster with the highest difference scores contains a root skeleton point (waist, hip, or chest), all three clusters of skeleton points are input into the small tremor smoothing depth model for noise reduction. A root skeleton point is a junction of the two-dimensional skeleton points of the patient's limbs, that is, a point from which several limbs extend. When a patient with Parkinson's disease performs tremor-related test movements, the root skeleton points tend to stay stable, sway, or move slightly without tremor, and their motion intensity is much lower than that of the main tremor sites of Parkinson's disease. Therefore, when the cluster with the highest difference scores contains a root skeleton point, the motion of all skeleton points is stable or slight.
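The clustering and routing decision of steps S2 and S3 can be sketched as follows. This is a minimal sketch assuming a plain one-dimensional Lloyd's K-means with a deterministic initialisation; the function names, the skeleton point names, and the `"large"` / `"small"` labels are hypothetical and only stand for the two smoothing depth models.

```python
from typing import Dict, List, Set


def kmeans_1d(values: List[float], k: int = 3, iters: int = 100) -> List[int]:
    """Lloyd's algorithm on scalars; labels are ranked so that k - 1 is the highest cluster."""
    ordered = sorted(values)
    # Deterministic initialisation (an assumption): spread centres over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    order = sorted(range(k), key=lambda c: centres[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]


def route_points(scores: Dict[str, float], roots: Set[str]) -> Dict[str, str]:
    """Choose the 'large' or 'small' tremor smoothing model for every skeleton point."""
    names = list(scores)
    labels = kmeans_1d([scores[n] for n in names], k=3)
    top = {n for n, lab in zip(names, labels) if lab == 2}  # highest-score cluster
    if top & roots:
        # A root point is in the most active cluster, so overall motion is slight.
        return {n: "small" for n in names}
    return {n: ("large" if n in top else "small") for n in names}


if __name__ == "__main__":
    scores = {"hip": 0.01, "chest": 0.02, "elbow": 0.2,
              "knee": 0.22, "wrist": 0.8, "hand": 0.85}
    print(route_points(scores, roots={"hip", "chest"}))
```

In the example, the wrist and hand fall into the highest cluster and no root point does, so only those two points would be routed to the large tremor model.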
[0074] As an embodiment of the present invention, the structure of the large tremor smoothing depth model in step S3 is shown in Figure 3. The inputs are: a noisy 2D skeleton point coordinate sequence of the video, output by the human pose estimation model; a skeleton point velocity sequence calculated from the 2D skeleton point coordinates of the preceding and following frames; and a skeleton point acceleration sequence calculated from the 2D skeleton point coordinates of three consecutive frames. The large tremor smoothing depth model feeds the three inputs into three feature extraction branches with the same structure, each including, in sequence: a fully connected layer module for feature extraction at the beginning; four fitting layer modules for fitting the skeleton point pose features; and a fully connected layer module for integrating the outputs of the fitting layer modules into a noise-removed coordinate sequence, velocity sequence, and acceleration sequence. Finally, the outputs of the different branches are concatenated and input into a fully connected layer, which adjusts the output into the denoised overall skeleton point coordinate sequence.
[0075] The aforementioned large tremor smoothing depth model is trained for 50 epochs, using the 2D motion coordinate data captured with infrared reflective markers in the Human3.6M public dataset as ground truth and the noisy 2D motion coordinate data that HRNet detects in the human motion videos of the same dataset as input data. Because of graphics card memory limits and because this model targets violent skeleton point movement, this embodiment feeds 4 frames of motion data into the depth model as one batch during training.
[0076] As an embodiment of the present invention, the structure of the small tremor smoothing depth model in step S3 is shown in Figure 4. The inputs are: a noisy 2D skeleton point coordinate sequence of the video, output by the human pose estimation model; a skeleton point velocity sequence calculated from the 2D skeleton point coordinates of the preceding and following frames; and a skeleton point acceleration sequence calculated from the 2D skeleton point coordinates of three consecutive frames. The small tremor smoothing depth model feeds the three inputs into three feature extraction branches with the same structure, each including, in sequence: a fully connected layer module for feature extraction at the beginning; four temporal convolutional modules for fitting the skeleton point pose features; and a fully connected layer module that integrates the outputs of the temporal convolutional modules into a noise-removed coordinate sequence, velocity sequence, and acceleration sequence. Finally, the outputs of the different branches are concatenated and input into a fully connected layer, which adjusts the output into the denoised overall skeleton point coordinate sequence.
[0077] It should be noted that, for fitting the pose features of skeleton points, compared with the fitting module in the large tremor smoothing depth model, the temporal convolution module has a stronger ability to capture the temporal features of input coordinate position, velocity, and acceleration. It has a stronger fitting power for skeleton points that tend to be stable, swaying, and slightly moving, and is more suitable as the backbone structure in the branch used in the small tremor smoothing depth model.
[0078] The aforementioned small tremor smoothing depth model is trained on a dataset with small motion amplitude, constructed by frame extension of the 2D coordinate data. Specifically, each frame of the 2D motion coordinate data captured with infrared reflective markers in the Human3.6M public dataset is expanded to twenty frames; the total number of video frames is kept unchanged and the excess data is discarded, which forms the ground truth data. The noisy 2D motion coordinate data that HRNet detects in the human motion videos of the Human3.6M public dataset is processed in the same way to form the input data. The small tremor smoothing depth model is likewise trained for 50 epochs. Because of graphics card memory limits and because this model targets skeleton points that are stable, swaying, or moving slightly, every 64 frames of motion data are fed into the depth model as one batch during training.
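One possible reading of the frame extension step is sketched below: each original frame is stretched over twenty output frames, and the result is truncated to the original length. The text does not say whether the added frames are copies or interpolated poses, so the `interpolate` flag is an assumption that covers both readings; the function name and types are likewise hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Pose = List[Point]             # one frame: a list of (x, y) skeleton points


def extend_frames(seq: List[Pose], factor: int = 20, interpolate: bool = True) -> List[Pose]:
    """Stretch each frame over `factor` frames, then keep only the first len(seq) frames."""
    n = len(seq)
    out: List[Pose] = []
    for t in range(n):
        nxt = seq[min(t + 1, n - 1)]   # the last frame is paired with itself
        for s in range(factor):
            # Linear blend towards the next frame, or a plain copy when not interpolating.
            w = s / factor if interpolate else 0.0
            out.append([(x0 + w * (x1 - x0), y0 + w * (y1 - y0))
                        for (x0, y0), (x1, y1) in zip(seq[t], nxt)])
            if len(out) == n:          # excess data beyond the original length is discarded
                return out
    return out


if __name__ == "__main__":
    walk = [[(0.0, 0.0)], [(20.0, 0.0)], [(40.0, 0.0)]]
    print(extend_frames(walk))
```

With a factor of twenty, a point that moved 20 pixels per frame in the original sequence moves about 1 pixel per frame in the extended one, which is the small-amplitude behaviour this dataset is meant to provide.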
[0079] Furthermore, the loss functions of the two smoothing depth models, which handle skeleton points with different degrees of motion intensity, each consist of two terms: the coordinate difference between the two-dimensional skeleton points of the training data and those of the ground truth data, and the acceleration difference of the two-dimensional skeleton points, where the accelerations are calculated in the same way as the skeleton point acceleration sequence that is input into the model.
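A two-term loss of this kind can be sketched for a single skeleton point track as follows. The text does not state the norm or the relative weighting of the two terms, so the mean absolute error and the `accel_weight` parameter are assumptions; acceleration is taken here as the second difference over three consecutive frames.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def second_difference(track: List[Point]) -> List[Point]:
    """Acceleration of one skeleton point from each triple of consecutive frames."""
    return [(track[t + 1][0] - 2 * track[t][0] + track[t - 1][0],
             track[t + 1][1] - 2 * track[t][1] + track[t - 1][1])
            for t in range(1, len(track) - 1)]


def mean_abs(a: List[Point], b: List[Point]) -> float:
    """Mean absolute difference between two point sequences (assumed L1 form)."""
    total = sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b))
    return total / max(1, len(a))


def smoothing_loss(pred: List[Point], truth: List[Point], accel_weight: float = 1.0) -> float:
    """Coordinate term plus weighted acceleration term."""
    position_term = mean_abs(pred, truth)
    accel_term = mean_abs(second_difference(pred), second_difference(truth))
    return position_term + accel_weight * accel_term


if __name__ == "__main__":
    truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    jittery = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 0.0)]
    print(smoothing_loss(jittery, truth))
```

The acceleration term penalises a single-frame jump much more heavily than the coordinate term does, which illustrates why such a term discourages frame-to-frame jitter in the denoised output.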
[0080] In one embodiment of the present invention, in step S4, after the patient's denoised two-dimensional skeleton point sequence is obtained, the Euclidean distance between the skeleton point coordinates in each frame other than the first frame and the skeleton point coordinates in the first frame is calculated to obtain the motion amplitude of the patient's skeleton points; the motion amplitudes of every three consecutive frames are differenced to obtain two frame-to-frame motion velocities, and the two motion velocities are differenced to obtain the motion acceleration of the skeleton points; the reciprocal of the peak interval of the motion amplitude is calculated to obtain the motion frequency of the skeleton points. The specific calculation process is as follows:
[0081] a: The calculation of the motion amplitude of each skeleton point includes: calculating the Euclidean distance between the coordinates of the skeleton point in each frame other than the first frame and its coordinates in the first frame. The formula for calculating the motion amplitude of each skeleton point is:
[0082] $$A_t^i = \left\| P_t^i - P_1^i \right\|_2$$
[0083] In the formula, $A_t^i$ denotes the motion amplitude of the $i$-th skeleton point in the $t$-th frame image, $P_t^i$ denotes the position of the $i$-th skeleton point in the $t$-th frame image, and $P_1^i$ denotes the position of the $i$-th skeleton point in the first frame image;
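Step a can be sketched in a few lines. This is an illustrative helper, not the patented implementation; the name `motion_amplitude` and the list-of-tuples input format are assumptions.

```python
import math


def motion_amplitude(track):
    """Euclidean distance of every frame after the first to the first frame.
    `track` is a list of (x, y) coordinates of one skeleton point."""
    x1, y1 = track[0]
    return [math.hypot(x - x1, y - y1) for x, y in track[1:]]


amplitude = motion_amplitude([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
```

The result has one entry fewer than the input, because the first frame serves only as the reference position.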
[0084] b: The calculation of the motion acceleration of each skeleton point includes: obtaining the motion velocity of the skeleton point by subtracting its motion amplitudes in every two adjacent frames, and then obtaining the motion acceleration of the skeleton point by subtracting the motion velocities of two adjacent frames. The calculation formulas are as follows:
[0085] $$v_t^i = A_{t+1}^i - A_t^i$$
[0086] $$a_t^i = v_{t+1}^i - v_t^i$$
[0087] In the formula: $v_t^i$ and $v_{t+1}^i$ denote the motion velocity of the $i$-th skeleton point in the $t$-th and $(t+1)$-th frame images; $A_t^i$ and $A_{t+1}^i$ denote the motion amplitude of the $i$-th skeleton point in the $t$-th and $(t+1)$-th frame images; $a_t^i$ denotes the motion acceleration of the $i$-th skeleton point in the $t$-th frame image;
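Step b amounts to two successive first differences of the amplitude sequence. The sketch below assumes forward differences in units of "per frame"; the function names are hypothetical.

```python
def first_difference(values):
    """Difference between each pair of adjacent entries."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def velocity_and_acceleration(amplitude):
    """Velocity from adjacent amplitudes, acceleration from adjacent velocities."""
    velocity = first_difference(amplitude)
    acceleration = first_difference(velocity)
    return velocity, acceleration


velocity, acceleration = velocity_and_acceleration([0.0, 1.0, 3.0, 6.0])
```

Each acceleration value therefore depends on three consecutive amplitude values.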
[0088] c: The calculation of the motion frequency of each skeleton point includes: calculating the reciprocal of the peak interval of the motion amplitude of each skeleton point, which is the motion frequency of that skeleton point. The calculation formula is as follows:
[0089] $$f^i = \frac{1}{t_{k+1}^i - t_k^i}$$
[0090] In the formula: $f^i$ denotes the motion frequency of the $i$-th skeleton point, and $t_k^i$ and $t_{k+1}^i$ denote the times corresponding to the $k$-th and $(k+1)$-th peaks of the motion amplitude of the $i$-th skeleton point.
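Step c can be sketched as below. The text does not say how peaks are detected, so the simple local-maximum rule is an assumption, as are the `fps` parameter (video frame rate) and the function names.

```python
def find_peaks(values):
    """Indices of simple local maxima (assumed peak rule, not from the source)."""
    return [t for t in range(1, len(values) - 1)
            if values[t - 1] < values[t] >= values[t + 1]]


def motion_frequencies(amplitude, fps):
    """One frequency in Hz per pair of consecutive peaks: the reciprocal of the
    peak interval, where the interval in seconds is (p2 - p1) / fps."""
    peaks = find_peaks(amplitude)
    return [fps / (p2 - p1) for p1, p2 in zip(peaks, peaks[1:])]


# Toy amplitude that peaks every second frame, sampled at 30 frames per second.
frequencies = motion_frequencies([0, 1, 0, 1, 0, 1, 0], fps=30)
```

A real amplitude signal would likely need smoothing or a minimum peak height before this rule gives stable intervals.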
[0091] As an embodiment of the present invention, the structure of the deep classification model in step S4 is shown in Figure 5. The deep classification model includes three structurally identical feature extraction branches, a fully connected layer module, and a Softmax grading module. Its inputs are the calculated motion amplitude, acceleration, and frequency of the skeleton points, each of which is fed into one of the three feature extraction branches. Each feature extraction branch includes a fully connected layer module for extracting head features, four temporal convolutional modules for fitting the tremor motion features of the skeleton points, and a fully connected layer module for integrating the outputs of the temporal convolutional modules into grade features. The outputs of the branches are then concatenated and fed into the fully connected layer module and the Softmax grading module, which perform a five-class classification of the Parkinson's disease resting tremor grade to obtain the grade information. Finally, combining the skeleton point tremor motion information with the tremor grade information allows a more objective, reasonable, and accurate assessment of the patient's Parkinson's disease resting tremor status.
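The final stage of the classifier (concatenation, fully connected layer, Softmax grading) can be illustrated in plain Python. The branch networks themselves are not reproduced; `branch_features`, `weights`, and `bias` are hypothetical placeholders for the branch outputs and for trained parameters, and numbering the five classes 0 to 4 is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grade_from_branches(branch_features, weights, bias):
    """Concatenate the three branch outputs, apply one fully connected layer
    with five output units, and return (predicted class index, probabilities)."""
    features = [value for branch in branch_features for value in branch]
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    probabilities = softmax(logits)
    return probabilities.index(max(probabilities)), probabilities


# Toy example: one feature per branch, weights chosen so that class 2 wins.
toy_weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [10.0, 0.0, 0.0],
               [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
grade, probs = grade_from_branches([[1.0], [0.0], [0.0]], toy_weights, [0.0] * 5)
```

In a trained model the weights would come from optimisation rather than being set by hand as in this toy example.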
[0092] Embodiment 2:
[0093] As shown in Figure 6, this embodiment of the invention provides a video-based assessment device for resting tremor in Parkinson's disease, which can be used to implement the method described in Embodiment 1. The device includes:
[0094] The acquisition module is used to determine the pixel neighborhood blocks of different skeleton points on each frame of the acquired patient video to be evaluated, and to calculate the difference score between the pixel neighborhood blocks corresponding to each skeleton point in every two adjacent frames.
[0095] The clustering judgment module is used to perform K-means clustering with three types on the difference scores of the skeleton points, and to determine whether the type to which the skeleton point with the highest difference score belongs contains a root skeleton point.
[0096] The noise reduction module is used, if the clustering judgment module determines that this type does not contain a root skeleton point, to input the skeleton points of this type into the large tremor smoothing depth model for noise reduction and the skeleton points of the other two types into the small tremor smoothing depth model for noise reduction; and, if the clustering judgment module determines that this type contains a root skeleton point, to input the skeleton points of all three types into the small tremor smoothing depth model for noise reduction, so as to obtain the overall skeleton point coordinate sequence after noise reduction.
[0097] The grading assessment module is used to calculate the tremor motion information of each skeleton point based on the denoised overall skeleton point coordinate sequence, and to input the tremor motion information into a preset deep classification model to obtain the assessment result of the resting tremor grade of Parkinson's disease.
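The difference score used by the acquisition module can be sketched on plain nested lists. In this illustration a grayscale frame is a list of pixel rows, the block is centered on the skeleton point of the current frame and reused at the same position in the next frame, and the block is assumed to lie fully inside the image (no border handling). The function names are hypothetical.

```python
def neighborhood(frame, cx, cy, side):
    """Square block of `side` x `side` pixels centered near (cx, cy)."""
    half = side // 2
    rows = frame[cy - half: cy - half + side]
    return [row[cx - half: cx - half + side] for row in rows]


def difference_score(frame_a, frame_b, cx, cy, side):
    """Share of pixels in the block that differ between two adjacent frames,
    i.e. non-zero pixels of the difference block over the block's pixel count."""
    block_a = neighborhood(frame_a, cx, cy, side)
    block_b = neighborhood(frame_b, cx, cy, side)
    nonzero = sum(1 for row_a, row_b in zip(block_a, block_b)
                  for a, b in zip(row_a, row_b) if a != b)
    return nonzero / (side * side)


# Two 5x5 frames that differ in a single pixel at the block center.
frame_1 = [[0] * 5 for _ in range(5)]
frame_2 = [row[:] for row in frame_1]
frame_2[2][2] = 255
score = difference_score(frame_1, frame_2, cx=2, cy=2, side=3)
```

A score near 1 means almost every pixel around the skeleton point changed between the two frames; a score near 0 means the region was static.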
[0098] The video-based Parkinson's disease resting tremor assessment device provided in this embodiment of the invention is based on the same technical concept as the video-based Parkinson's disease resting tremor assessment method provided in Embodiment 1, and can produce the beneficial effects described in Embodiment 1. For the contents not described in detail in this embodiment, please refer to Embodiment 1.
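The clustering judgment and the routing performed by the noise reduction module can be sketched as follows. This is a minimal one-dimensional K-means with a deterministic initialisation, which is an assumption; the skeleton point names and the function names are hypothetical, while the root skeleton points (waist, hip, chest) follow the claims.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Plain 1D K-means; returns one cluster label per value."""
    ordered = sorted(values)
    # Assumed initialisation: minimum, middle, and maximum of the sorted scores.
    centers = [ordered[(len(ordered) - 1) * j // (k - 1)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels


def route_skeleton_points(scores, root_points=("waist", "hip", "chest")):
    """Decide which skeleton points go to the large and small tremor models."""
    names = list(scores)
    labels = kmeans_1d([scores[n] for n in names])
    top_name = max(names, key=lambda n: scores[n])
    top_label = labels[names.index(top_name)]
    top_cluster = {n for n, lab in zip(names, labels) if lab == top_label}
    if top_cluster & set(root_points):
        # The whole body moved: treat every point as small tremor.
        return {"large": set(), "small": set(names)}
    return {"large": top_cluster, "small": set(names) - top_cluster}


routing = route_skeleton_points({"wrist": 0.9, "hand": 0.85, "elbow": 0.4,
                                 "knee": 0.35, "hip": 0.05, "chest": 0.04})
```

When a root skeleton point falls in the highest-scoring cluster, the large scores are attributed to whole-body movement rather than to limb tremor, so no point is sent to the large tremor model.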
[0099] Embodiment 3:
[0100] This invention provides an electronic device, including a processor and a storage medium;
[0101] The storage medium is used to store instructions;
[0102] The processor is used to operate according to the instructions to execute the steps of any one of the methods in Embodiment 1.
Embodiment 4:
[0103] This invention provides a computer-readable storage medium storing a computer program thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of any of the methods in Embodiment 1.
[0104] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0105] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0106] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0107] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0108] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the technical principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A video-based method for assessing resting tremor in Parkinson's disease, characterized in that the method includes: based on the acquired patient video to be evaluated, determining the pixel neighborhood blocks of different skeleton points on each frame of the video, and calculating the difference score between the pixel neighborhood blocks corresponding to each skeleton point in every two adjacent frames, wherein a square pixel neighborhood block of the same size is determined at the same position in the grayscale image of the frame following the current frame, and the proportion of the number of non-zero pixels in the difference block, formed by taking the difference between the pixel neighborhood blocks of the two frames, to the total number of pixels in the pixel neighborhood block is used as the difference score of the current two-dimensional skeleton point in the current frame; performing K-means clustering with three types on the difference scores of the skeleton points, and determining whether the type to which the skeleton point with the highest difference score belongs contains a root skeleton point, wherein the root skeleton points are the waist, hip, and chest skeleton points; if it does not, inputting the skeleton points of this type into the large tremor smoothing depth model for noise reduction and the skeleton points of the other two types into the small tremor smoothing depth model for noise reduction, and if it does, inputting the skeleton points of all three types into the small tremor smoothing depth model for noise reduction, so as to obtain the overall skeleton point coordinate sequence after noise reduction; and, based on the denoised overall skeleton point coordinate sequence, calculating the tremor motion information of each skeleton point and inputting the tremor motion information into a preset deep classification model to obtain the assessment result of the resting tremor grade of Parkinson's disease.
2. The video-based method for assessing resting tremor in Parkinson's disease according to claim 1, characterized in that determining the pixel neighborhood blocks of different skeleton points on each frame of the acquired patient video to be evaluated includes the following steps: inputting the acquired patient video to be evaluated into a preset human pose estimation model to obtain a two-dimensional skeleton point coordinate sequence for all frames of the video; and, using the coordinates of each two-dimensional skeleton point as the center, determining a square pixel neighborhood block of fixed size on each corresponding frame image to obtain the pixel neighborhood blocks of different skeleton points on each frame image. The human pose estimation model includes a Faster R-CNN deep region convolutional network and an HRNet high-resolution network. The Faster R-CNN deep region convolutional network is trained using the bounding box data of people labeled in the MSCOCO dataset and is used to predict the rectangular bounding box of a person in an image. The HRNet high-resolution network is trained using the skeleton point data of people labeled in the MSCOCO dataset and is used to predict the positions of the skeleton points of the person in the rectangular bounding box.
3. The video-based method for assessing resting tremor in Parkinson's disease according to claim 1, characterized in that the formula for calculating the difference score is: $$S_i = \frac{1}{T-1} \sum_{t=1}^{T-1} \frac{N\left(B_{t+1}^{i,t} - B_{t}^{i,t}\right)}{l^2}$$ In the formula: $S_i$ denotes the difference score of the $i$-th skeleton point; $T$ denotes the total number of frames in the video; $N(\cdot)$ denotes the function that counts the number of non-zero pixels in the difference pixel neighborhood block; $B_{\tau}^{i,t}$ denotes the pixel neighborhood block determined on the $\tau$-th frame image, centered on the coordinates of the $i$-th skeleton point in the $t$-th frame image; and $l$ denotes the side length of the determined pixel neighborhood block.
4. The video-based method for assessing resting tremor in Parkinson's disease according to claim 1, characterized in that the large tremor smoothing depth model includes a three-branch fully connected layer for learning the coordinates, velocity, and acceleration of the skeleton points; each branch of the three-branch fully connected layer of the large tremor smoothing depth model has four fitting layer modules. The large tremor smoothing depth model is obtained by training with the 2D motion coordinate data marked with infrared reflective markers in the Human3.6M public dataset as ground truth data and the noisy 2D motion coordinate data detected by the HRNet network on the human motion videos of the Human3.6M public dataset as input data.
5. The video-based method for assessing resting tremor in Parkinson's disease according to claim 1, characterized in that the small tremor smoothing depth model includes a three-branch fully connected layer for learning the coordinates, velocity, and acceleration of the skeleton points; each of the three branches of the small tremor smoothing depth model has four temporal convolutional modules. The small tremor smoothing depth model is obtained by training on a dataset with small motion amplitude constructed by extending the per-frame 2D coordinate data.
6. The video-based method for assessing resting tremor in Parkinson's disease according to any one of claims 1 to 5, characterized in that the calculation of the tremor motion information of each skeleton point includes calculating the motion amplitude, motion acceleration, and motion frequency of each skeleton point. The calculation of the motion amplitude of each skeleton point includes: calculating the Euclidean distance between the coordinates of the skeleton point in each frame other than the first frame and its coordinates in the first frame. The formula for calculating the motion amplitude of each skeleton point is: $$A_t^i = \left\| P_t^i - P_1^i \right\|_2$$ In the formula, $A_t^i$ denotes the motion amplitude of the $i$-th skeleton point in the $t$-th frame image, $P_t^i$ denotes the position of the $i$-th skeleton point in the $t$-th frame image, and $P_1^i$ denotes the position of the $i$-th skeleton point in the first frame image. The calculation of the motion acceleration of each skeleton point includes: obtaining the motion velocity of the skeleton point by subtracting its motion amplitudes in every two adjacent frames, and then obtaining the motion acceleration of the skeleton point by subtracting the motion velocities of two adjacent frames. The calculation formulas are as follows: $$v_t^i = A_{t+1}^i - A_t^i$$ $$a_t^i = v_{t+1}^i - v_t^i$$ In the formula: $v_t^i$ and $v_{t+1}^i$ denote the motion velocity of the $i$-th skeleton point in the $t$-th and $(t+1)$-th frame images; $A_t^i$ and $A_{t+1}^i$ denote the motion amplitude of the $i$-th skeleton point in the $t$-th and $(t+1)$-th frame images; $a_t^i$ denotes the motion acceleration of the $i$-th skeleton point in the $t$-th frame image. The calculation of the motion frequency of each skeleton point includes: calculating the reciprocal of the peak interval of the motion amplitude of each skeleton point, which is the motion frequency of that skeleton point. The calculation formula is as follows: $$f^i = \frac{1}{t_{k+1}^i - t_k^i}$$ In the formula: $f^i$ denotes the motion frequency of the $i$-th skeleton point, and $t_k^i$ and $t_{k+1}^i$ denote the times corresponding to the $k$-th and $(k+1)$-th peaks of the motion amplitude of the $i$-th skeleton point.
7. The video-based method for assessing resting tremor in Parkinson's disease according to claim 6, characterized in that the deep classification model includes three structurally identical feature extraction branches, a fully connected layer module, and a Softmax grading module. Each feature extraction branch includes a fully connected layer module for extracting head features, four temporal convolutional modules for fitting the tremor motion features of the skeleton points, and a fully connected layer module for integrating the outputs of the temporal convolutional modules into grade features.
8. A video-based assessment device for resting tremor in Parkinson's disease, characterized in that the device includes: an acquisition module, used to determine the pixel neighborhood blocks of different skeleton points on each frame of the acquired patient video to be evaluated, and to calculate the difference score between the pixel neighborhood blocks corresponding to each skeleton point in every two adjacent frames, wherein a square pixel neighborhood block of the same size is determined at the same position in the grayscale image of the frame following the current frame, and the proportion of the number of non-zero pixels in the difference block, formed by taking the difference between the pixel neighborhood blocks of the two frames, to the total number of pixels in the pixel neighborhood block is used as the difference score of the current two-dimensional skeleton point in the current frame; a clustering judgment module, used to perform K-means clustering with three types on the difference scores of the skeleton points, and to determine whether the type to which the skeleton point with the highest difference score belongs contains a root skeleton point, wherein the root skeleton points are the waist, hip, and chest skeleton points; a noise reduction module, used, if the clustering judgment module determines that this type does not contain a root skeleton point, to input the skeleton points of this type into the large tremor smoothing depth model for noise reduction and the skeleton points of the other two types into the small tremor smoothing depth model for noise reduction, and, if the clustering judgment module determines that this type contains a root skeleton point, to input the skeleton points of all three types into the small tremor smoothing depth model for noise reduction, so as to obtain the overall skeleton point coordinate sequence after noise reduction; and a grading assessment module, used to calculate the tremor motion information of each skeleton point based on the denoised overall skeleton point coordinate sequence, and to input the tremor motion information into a preset deep classification model to obtain the assessment result of the resting tremor grade of Parkinson's disease.
9. An electronic device, characterized in that it includes a processor and a storage medium; the storage medium is used to store instructions; the processor is configured to operate according to the instructions to perform the steps of the video-based method for assessing resting tremor in Parkinson's disease according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the video-based method for assessing resting tremor in Parkinson's disease according to any one of claims 1 to 7.