Fall risk early assessment method based on multi-modal information fusion
By fusing multimodal information, combining video-based and sensor-based assessment, this approach addresses the time cost and experience-dependent accuracy of existing techniques, giving an efficient and accurate fall risk assessment suitable for daily use in homes and communities.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2023-12-29
- Publication Date
- 2026-06-26
AI Technical Summary
Existing methods for predicting fall risk are time-consuming and their accuracy depends on the experience of caregivers, resulting in low efficiency and low precision.
A multimodal information fusion method is employed, simultaneously collecting video data and sensor data from the subject while walking. A first evaluation is performed based on the video data, a second evaluation on the sensor data, and a final evaluation is generated by combining the results of both. The video data is processed using an STGCN network to extract spatiotemporal features of key human points, while the sensor data is evaluated using frequency and amplitude variance analysis. Finally, a comprehensive evaluation is achieved by combining these results with a mathematical statistical model.
It improves the accuracy and efficiency of fall risk assessment, simplifies the detection process, reduces costs, and is suitable for daily use in homes and communities.
Figure CN117672529B_ABST
Description
Technical Field
[0001] This invention relates to an early assessment method for fall risk based on multimodal information fusion.
Background Technology
[0002] To deal effectively with fall-related injuries, predictions must be made in advance so that families or communities are aware of the potential fall risks of the elderly and can take corresponding countermeasures.
[0003] The primary predictive method currently available is still the scale-based model, such as the Tinetti Gait Scale. However, this approach is time-consuming and inefficient. Furthermore, the final judgment still relies on the experience of caregivers, resulting in relatively low overall accuracy.
Summary of the Invention
[0004] This invention provides an early fall risk assessment method based on multimodal information fusion to solve the aforementioned technical problems, specifically adopting the following technical solution:
[0005] A method for early fall risk assessment based on multimodal information fusion, comprising:
[0006] Simultaneously collecting video data and sensor data of the subject while walking;
[0007] Obtaining a first evaluation result based on the video data;
[0008] Obtaining a second evaluation result based on the sensor data;
[0009] Obtaining a final evaluation result by combining the first and second evaluation results.
[0010] Furthermore, the specific method for obtaining the first evaluation result based on video data is as follows:
[0011] Perform data cleaning on the video data;
[0012] Human target detection bounding boxes are used to extract human information from video data;
[0013] Pose estimation is performed on the human body information, and frame-by-frame human body key point information is extracted and converted into a graph structure;
[0014] Perform feature extraction on the human body key point graph structure;
[0015] The extracted features are unfolded into one dimension and scored to obtain the first scoring result;
[0016] The first evaluation result is obtained based on the first scoring result.
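To illustrate the step of converting key points into a graph structure, the following is a minimal sketch that builds an adjacency matrix from a skeleton edge list. The five-joint skeleton, the random key point coordinates, and the names `EDGES`, `build_adjacency`, and `normalize_adjacency` are assumptions made for illustration; the patent does not specify the key point layout or the graph encoding.

```python
import numpy as np

# Hypothetical 5-joint skeleton (0=head, 1=neck, 2=hip, 3=left foot, 4=right foot).
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]

def build_adjacency(num_joints, edges):
    """Return a symmetric adjacency matrix with self-loops."""
    adj = np.eye(num_joints)
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 A D^-1/2, a common choice for graph convolutions."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

adjacency = build_adjacency(5, EDGES)
# Placeholder key points for T=30 frames: shape (T, V, 2) holding (x, y) per joint.
keypoints = np.random.default_rng(0).random((30, 5, 2))
graph = {"nodes": keypoints, "adjacency": normalize_adjacency(adjacency)}
```

In this representation each frame contributes one set of node features, and the shared adjacency matrix encodes which joints are physically connected.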
[0017] Furthermore, the specific method for feature extraction from the human body key point graph structure is as follows:
[0018] The spatiotemporal features between key points of the human body are extracted using the STGCN network.
[0019] Furthermore, the STGCN network includes a normalization layer, several ST-GCN modules, a pooling layer, and a fully connected layer;
[0020] The specific method for extracting the spatiotemporal features between key points of the human body using the STGCN network is as follows:
[0021] The normalization layer performs a normalization operation on the human body key point graph structure, so that the feature vectors of the same joint in different frames are normalized.
[0022] The ST-GCN modules extract the features of human key points in both the time and space dimensions from the normalized information.
[0023] The extracted feature information is pooled and fully connected through the pooling layer and the fully connected layer before being output.
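The normalization step can be sketched as follows. The patent does not state which normalization is used, so this example assumes a simple standardization of each joint's feature channels across frames; the function name `normalize_per_joint` and the `eps` parameter are hypothetical.

```python
import numpy as np

def normalize_per_joint(x, eps=1e-6):
    """Standardize key point features across frames.

    x has shape (T, V, C): frames, joints, channels. Each (joint, channel)
    series is shifted to zero mean and scaled to roughly unit variance.
    """
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps)
```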
[0024] Furthermore, the ST-GCN module includes an ATT submodule, a GCN submodule, and a TCN submodule.
[0025] Furthermore, the sensor data includes three-dimensional acceleration data and deflection angle data.
[0026] Furthermore, the specific method for obtaining the second evaluation result based on sensor data is as follows:
[0027] Clean the sensor data;
[0028] Perform frequency analysis and amplitude variance analysis on sensor data;
[0029] The second evaluation result is obtained by combining the results of frequency analysis and amplitude variance analysis.
[0030] Furthermore, the specific method for performing frequency analysis on sensor data is as follows:
[0031] Collect normal sensor datasets and perform Fast Fourier Transform on each one to find the three most probable frequency values in each data set, with corresponding periods of T1, T2, and T3.
[0032] Based on these three periods, perform ACF plot analysis on each data set to find the lag order that maximizes the absolute value of the autocorrelation coefficient.
[0033] A comprehensive comparison yields the variation period Tix (x = 1, 2, 3, …) of each data set; using statistical methods, the general variation period Ti of the acceleration and deflection angle information during walking is determined from the normal datasets of people with no risk of falling;
[0034] The sensor data is divided into time series segments based on the determined normal period Ti.
[0035] The DTW method is used to compare the similarity between each sequence segment;
[0036] The second scoring result is obtained by scoring the subject in the frequency dimension based on the degree of similarity;
[0037] The frequency analysis evaluation result is obtained based on the second scoring result.
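The period-estimation part of the steps above (FFT candidates followed by ACF lag selection) can be sketched for a single series as follows. This is an illustration under assumptions, not the patented procedure: the function names are hypothetical, the sample interval `dt` is passed in by the caller, and the statistical combination of periods across many normal datasets is not shown.

```python
import numpy as np

def top_periods(signal, dt, k=3):
    """Periods (in seconds) of the k strongest non-DC FFT components."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=dt)
    order = np.argsort(spectrum[1:])[::-1][:k] + 1  # skip the DC bin
    return 1.0 / freqs[order]

def autocorr(signal, lag):
    """Sample autocorrelation coefficient at a positive integer lag."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def pick_period(signal, dt, k=3):
    """Among the FFT candidate periods, keep the one whose lag maximizes |ACF|."""
    candidates = top_periods(signal, dt, k)
    lags = [max(1, int(round(p / dt))) for p in candidates]
    lags = [lag for lag in lags if lag < len(signal)]
    best = max(lags, key=lambda lag: abs(autocorr(signal, lag)))
    return best * dt
```

For example, a series sampled every 0.2 s that repeats once per second yields a selected period of about 1.0 s, i.e. five samples per segment.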
[0038] Furthermore, the specific method for performing amplitude variance analysis on sensor data is as follows:
[0039] Set a normal variance distribution and take a 95% confidence level. Compare the variance value of the measured sensor data against it: if the value falls within the normal range, it is judged as normal; otherwise, it is judged as a fall risk. At the same time, the distance between the variance value of the sensor data and the distribution center is used as the basis for the variance scoring item in the comprehensive evaluation.
[0040] At the amplitude level, a t-test is used to determine whether there is a significant difference between the mean amplitude of the normal group and that of the fall risk group, and a Mann-Whitney U test is used to determine whether there is a significant difference between the median amplitudes of the two groups. The amplitude data of the test subject is compared with the median amplitude data of the dataset, and the amplitude scoring item is scored. Based on the scores of the two scoring items, variance and amplitude, the corresponding third scoring result is obtained.
[0041] The amplitude variance analysis evaluation result is obtained based on the third scoring result;
[0042] The second evaluation result is obtained by combining the results of frequency analysis and amplitude variance analysis.
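The variance check at a 95% level can be sketched as below. The patent does not say how the normal range is constructed, so this example assumes an empirical two-sided quantile interval over variances measured on normal walking data, with the median as the distribution center; `variance_check` and its parameters are hypothetical names.

```python
import numpy as np

def variance_check(normal_variances, measured_variance, level=0.95):
    """Return (is_normal, distance from the center of the normal variance distribution)."""
    tail = (1.0 - level) / 2.0
    low, high = np.quantile(normal_variances, [tail, 1.0 - tail])
    center = np.median(normal_variances)
    is_normal = bool(low <= measured_variance <= high)
    return is_normal, float(abs(measured_variance - center))
```

The returned distance is the quantity that would feed the variance scoring item; how the distance maps to a score is left open here because the source does not specify it.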
[0043] Furthermore, the assessment results include normal, relatively normal, possibly at risk of falling, and at risk of falling;
[0044] The specific method for obtaining the final evaluation result by combining the results of the first and second evaluations is as follows:
[0045] If both the first and second assessment results are normal, then the final assessment result is normal.
[0046] If both the first and second assessments indicate a risk of falling, then the final assessment result is a risk of falling.
[0047] If the first assessment result and the second assessment result are inconsistent, the first score result, the second score result and the third score result are weighted by a certain proportion to obtain the final score result. The final score result is placed in the joint distribution of the score results of the normal and fall risk datasets. Based on its distance from the distribution center, four classification indicators are distinguished: normal, relatively normal, possible fall risk and fall risk, to obtain the final assessment result.
[0048] The advantage of this invention lies in the provision of an early fall risk assessment method based on multimodal information fusion, which utilizes multimodal information from two sources to comprehensively assess the subject's walking posture information, combining the advantages of both.
Attached Figure Description
[0049] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0050] Figure 1 is a flowchart of an early fall risk assessment method based on multimodal information fusion according to the present invention;
[0051] Figure 2 is a schematic diagram of information collection according to the present invention;
[0052] Figure 3 is a flowchart of the video data processing of the present invention;
[0053] Figure 4 is a schematic diagram of the STGCN network of the present invention;
[0054] Figure 5 is a flowchart of the sensor data processing of the present invention;
[0055] Figure 6 is a flowchart of the frequency analysis method of the present invention;
[0056] Figure 7 is a schematic diagram of the comprehensive statistical model of the present invention.
Detailed Implementation
[0057] The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.
[0058] As shown in Figure 1, the main workflow of the early fall risk assessment method based on multimodal information fusion includes: simultaneously collecting video data and sensor data of the subject while walking; and obtaining a first assessment result based on the video data.
[0059] A second evaluation result is obtained based on sensor data. The final evaluation result is obtained by combining the first and second evaluation results.
[0060] Specifically, the overall detection process comprises two parts: one is an assessment model based on video processing methods, and the other is an assessment model based on sensor data analysis, which uses certain data characteristics to distinguish fall risk data from normal data. In the video analysis method, the collected walking video images of the test subject are input into the video analysis model, and analysis is performed based on the spatiotemporal characteristics of key human body points to derive an assessment result for the test subject. In the sensor method, the collected walking acceleration and deviation angle information of the test subject are input into the sensor analysis model, and analysis is performed based on the amplitude variance, frequency, and other characteristics of the data in three-dimensional space to derive another assessment result for the test subject. Finally, a mathematical statistical model is used to comprehensively evaluate the assessment results under the two models to obtain the final assessment result regarding fall risk.
[0061] As shown in Figure 2, during data acquisition the subject is required to walk in a straight line in their usual walking style. A high-definition camera placed directly in front of the subject records gait video information. Simultaneously, a 3D sensor strapped to the waist records 3D acceleration and deflection angle information at 200 ms intervals. These two sets of data are then input into the video analysis model and the sensor analysis model, respectively.
[0062] As shown in Figure 3, in the embodiments of this application, the specific method for obtaining the first evaluation result based on video data is as follows: Data cleaning is performed on the video data. Human target detection bounding boxes are used to extract human information from the video data. Pose estimation is performed on the human information, extracting frame-by-frame human keypoint information and converting it into a graph structure. Feature extraction is performed on the human keypoint graph structure. In this application, a spatiotemporal graph convolutional (STGCN) network is used to extract the spatiotemporal features between human keypoints. The extracted features are unfolded into one dimension and scored to obtain a first scoring result (two scoring items). The first evaluation result is obtained based on the first scoring result.
[0063] Specifically, in the video processing method, the input video is first cleaned, including video length trimming and frame segmentation. Then, human target detection is performed based on the YOLO algorithm, and the human body in each video frame is enclosed in a bounding box. Next, the OpenPose method is used to estimate the pose of the detected human body, extracting frame-by-frame human keypoint information and converting it into a graph structure, which completes the initial data preprocessing. After that, the Spatiotemporal Graph Convolutional Network (STGCN) method, with corresponding algorithm improvements, is applied to the human keypoint graph structure to extract the spatiotemporal features between human keypoints. The features are unfolded into a one-dimensional form and input into a classifier, which scores two sub-items to obtain a first classification result, i.e., evaluation result 1. Specifically, a simple binary classifier is trained beforehand using the one-dimensional spatiotemporal features obtained from processing normal datasets and fall datasets.
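The final step, unfolding features into one dimension and scoring them with a pre-trained binary classifier, might look like the sketch below. The patent does not name the classifier or the two sub-items it scores; this example assumes a logistic regression fitted by gradient descent on toy stand-in features and produces a single score. All names (`flatten_features`, `train_logistic`, `risk_score`) and the synthetic data are hypothetical.

```python
import numpy as np

def flatten_features(feature_map):
    """Unfold a multi-dimensional feature map into a one-dimensional vector."""
    return np.asarray(feature_map, dtype=float).reshape(-1)

def train_logistic(features, labels, lr=0.1, epochs=500):
    """Fit logistic-regression weights by plain gradient descent."""
    x = np.hstack([features, np.ones((len(features), 1))])  # append bias column
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        prob = 1.0 / (1.0 + np.exp(-x @ w))
        w -= lr * x.T @ (prob - labels) / len(labels)
    return w

def risk_score(w, feature_vector):
    """Score in (0, 1); higher means closer to the fall-risk class."""
    z = np.append(feature_vector, 1.0) @ w
    return float(1.0 / (1.0 + np.exp(-z)))

# Toy stand-ins for features from normal (label 0) and fall-risk (label 1) data.
rng = np.random.default_rng(0)
normal = rng.normal(-1.0, 0.5, size=(40, 2, 3, 2))
at_risk = rng.normal(1.0, 0.5, size=(40, 2, 3, 2))
train_x = np.array([flatten_features(f) for f in np.concatenate([normal, at_risk])])
train_y = np.array([0.0] * 40 + [1.0] * 40)
weights = train_logistic(train_x, train_y)
```

A score above 0.5 would then be read as leaning toward the fall-risk class; the threshold is likewise an assumption.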
[0064] In the embodiments of this application, the STGCN network includes a normalization layer, several ST-GCN modules, a pooling layer, and a fully connected layer. Figure 4 shows the basic workflow of the STGCN network. The normalization layer normalizes the human keypoint graph structure so that the feature vectors of the same joint are normalized across different frames. The normalized information is input into several ST-GCN modules to extract the features of human keypoints in both the temporal and spatial dimensions. The ST-GCN module comprises three sub-modules: ATT, GCN, and TCN. ATT is an attention model that weights different torso positions, reflecting their varying roles in inducing fall risk during movement. GCN is a standard graph convolutional network that focuses on the spatial relationships between points in the graph structure, extracting feature information between keypoints at a given moment in human motion. TCN is a temporal convolutional network that focuses on the temporal relationships between points, extracting feature information for each point across the frames before and after a given moment in human motion. After repeated convolutional operations, the extracted feature information is pooled and fully connected before being output.
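The data flow through one ST-GCN module (ATT, then GCN, then TCN), followed by pooling and a fully connected layer, can be sketched in NumPy as follows. This is a simplified forward pass under assumptions: attention is modeled as fixed per-joint weights, the temporal kernel is shared by all joints and channels and has odd length, and all weights are random placeholders rather than trained parameters. It is not the patent's improved network.

```python
import numpy as np

def st_gcn_block(x, adj_norm, joint_attention, w_spatial, temporal_kernel):
    """One simplified ST-GCN block.

    x: (T, V, C_in) key point features; adj_norm: (V, V) normalized adjacency;
    joint_attention: (V,) per-joint weights (ATT); w_spatial: (C_in, C_out);
    temporal_kernel: (K,) odd-length weights along the frame axis (TCN).
    """
    # ATT: reweight each joint before aggregation.
    x = x * joint_attention[None, :, None]
    # GCN: aggregate features of neighbouring joints within each frame.
    x = np.einsum("vw,twc->tvc", adj_norm, x) @ w_spatial
    x = np.maximum(x, 0.0)  # ReLU
    # TCN: convolve along the frame axis, zero-padded so T is unchanged.
    k = len(temporal_kernel)
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    out = sum(temporal_kernel[i] * padded[i:i + x.shape[0]] for i in range(k))
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
T, V, C_in, C_out = 30, 5, 2, 8
x = rng.random((T, V, C_in))
adj = np.eye(V)  # placeholder; a real skeleton adjacency would go here
features = st_gcn_block(x, adj, np.ones(V),
                        rng.normal(size=(C_in, C_out)),
                        np.array([0.25, 0.5, 0.25]))
pooled = features.mean(axis=(0, 1))             # global average pooling
logits = pooled @ rng.normal(size=(C_out, 2))   # fully connected layer
```

Stacking several such blocks before the pooling step corresponds to the "several ST-GCN modules" in the description.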
[0065] In the embodiments of this application, the specific method for obtaining the second evaluation result based on sensor data is as follows:
[0066] Clean the sensor data.
[0067] Perform frequency analysis and amplitude variance analysis on the sensor data.
[0068] Obtain the second evaluation result by combining the results of the frequency analysis and the amplitude variance analysis.
[0069] Specifically, as shown in Figure 5, in the sensor data analysis mode, the raw three-dimensional acceleration and deflection angle data collected by the sensor are given a simple cleaning, and then frequency analysis and amplitude variance analysis are performed separately.
[0070] Frequency analysis primarily screens for fall risks caused by gait instability and disordered movement rhythm. In this analysis mode, the autocorrelation function (ACF) plot is used to determine the period used for detection, and finally dynamic time warping (DTW) is applied. Specifically, as shown in Figure 6, the method for frequency analysis of sensor data is as follows:
[0071] First, each collected normal sensor dataset is processed with the Fast Fourier Transform (FFT) to identify its three most probable frequency values, corresponding to periods T1, T2, and T3. Based on these three periods, an ACF plot analysis is performed on each data set to find the lag order that maximizes the absolute value of the autocorrelation coefficient. After comprehensive comparison, the variation period Tix (x = 1, 2, 3, …) of each data set is determined. By combining the Tix values across the dataset and using statistical methods, the general variation period Ti of the acceleration and deflection angle information during walking is determined from the normal dataset of individuals without fall risk. Using the determined normal period Ti as the dividing interval, the sensor data is divided into time series segments. The similarity between the sequence segments is compared using the DTW method. Then, based on the similarity, a second scoring result (two scoring items) is obtained by assigning a score to the subject in the frequency dimension. A simple classification is then made based on the second scoring result to derive the fall risk status based on frequency analysis, i.e., the frequency analysis assessment result. The first scoring item is based on the similarity of the periods obtained from different dimensions. Specifically, periods for six different dimensions are obtained from the three-dimensional acceleration and deflection angle data. Under normal circumstances, the periods obtained from different dimensions should be similar to some degree; scoring is therefore based on their dispersion, with higher scores for more similar periods and lower scores for less similar ones. The second scoring item is computed within a single dimension: after dividing the time series according to the calculated period, DTW is used to calculate the similarity between segments, which is then compared with the normal and abnormal data obtained from training. If it is more similar to the normal data, a point is added; otherwise, a point is deducted.
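The segmentation and DTW comparison described above can be sketched as follows. The DTW recurrence is the classic dynamic-programming form with absolute difference as the local cost; comparing consecutive segments and averaging the distances is an assumption, since the patent only says the similarity between sequence segments is compared. The function names are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def segment_by_period(signal, samples_per_period):
    """Split a series into consecutive one-period segments, dropping the remainder."""
    count = len(signal) // samples_per_period
    return [signal[i * samples_per_period:(i + 1) * samples_per_period]
            for i in range(count)]

def mean_consecutive_dtw(segments):
    """Average DTW distance between consecutive segments; lower means a steadier rhythm."""
    dists = [dtw_distance(s, t) for s, t in zip(segments[:-1], segments[1:])]
    return float(np.mean(dists))
```

A perfectly periodic series gives a mean distance of zero, while irregular gait cycles push the value up; this value would then be compared against values obtained from the normal and abnormal training data.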
[0072] In the amplitude variance analysis process, the risk of falls caused by excessive "swaying" while walking can be preliminarily screened. Under this model analysis, the variance of the initial normal dataset and the fall risk dataset differs significantly.
[0073] In actual detection and analysis, a normal variance distribution is set. Specifically, while ensuring privacy, walking data (deflection angle and acceleration only) of volunteers is collected to construct a normal dataset and obtain the variance distribution of the normal data. At a 95% confidence level, the variance value of the tested sensor data is compared against this distribution. If it falls within the normal range, the data is considered normal; otherwise, it is considered a fall risk. Simultaneously, the distance between the variance value of the sensor data and the distribution center is used as the scoring criterion for the variance item in the comprehensive evaluation. At the amplitude level, a t-test is used to determine whether there is a significant difference between the mean amplitude of the normal group and that of the fall risk group, and a Mann-Whitney U test is used to determine whether there is a significant difference between the median amplitudes of the two groups. The amplitude data of the tested individual is compared with the median amplitude data of the dataset. The fall risk group is derived from pre-obtained sensor data indicating a fall risk, for example by collecting patients' walking data (deflection angle and acceleration only) during clinical diagnosis, with their consent and while ensuring their privacy, to construct a fall risk dataset. Similarly, the normal group can be built from normal walking data provided by volunteers. The primary goal of testing for significant differences is to verify beforehand whether amplitude can readily distinguish the normal group from the fall-risk group. If a significant difference exists, subsequent testing can determine whether the subject's data leans more towards the normal group or the fall-risk group, i.e., by scoring the amplitude item. Finally, based on the scores for both variance and amplitude, a third scoring result is obtained. The amplitude variance analysis evaluation result is then derived from this third scoring result.
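The group comparison and amplitude scoring can be sketched with the test statistics written out in NumPy. Two assumptions are made: Welch's unequal-variance form is used for the t statistic (the patent only says "t-test"), and the amplitude item is scored as +1 or -1 depending on which group median is closer, which the source does not specify. In practice, `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu` return these statistics together with p-values.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two independent samples."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return float((a.mean() - b.mean()) / se)

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample a: count of pairs with a > b, ties as one half."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a[:, None] - b[None, :]
    return float((diff > 0).sum() + 0.5 * (diff == 0).sum())

def amplitude_score(amplitude, normal_median, risk_median):
    """+1 if the subject's amplitude is closer to the normal median, else -1 (assumed rule)."""
    closer_to_normal = abs(amplitude - normal_median) <= abs(amplitude - risk_median)
    return 1 if closer_to_normal else -1
```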
[0074] The second evaluation result is obtained by combining the frequency analysis evaluation results and the amplitude variance analysis evaluation results. Specifically, in this application, the frequency analysis evaluation results and the amplitude variance analysis evaluation results are weighted in a 1:1 ratio to obtain the combined evaluation result 2.
[0075] In the embodiments of this application, the assessment results include normal, relatively normal, possibly at risk of falling, and at risk of falling.
[0076] Figure 7 illustrates the comprehensive statistical model proposed in this application, used for the fusion and evaluation of multimodal data to ultimately obtain a more accurate and comprehensive fall risk assessment for the test subject. In the above processing, there are a total of two assessment results and three scoring results (six sub-items). Firstly, for the two assessment results, if both are normal or both indicate a fall risk, the corresponding result is directly determined. That is, if both the first and second assessment results are normal, the final assessment result is normal. If both the first and second assessment results indicate a fall risk, the final assessment result is a fall risk.
[0077] If the first and second assessment results are inconsistent, the first, second, and third scores are weighted according to a certain ratio to obtain the final score. This final score is then placed in the joint distribution of the scores from the normal and fall risk datasets. Based on its distance from the distribution center, four classification indicators are determined: normal, relatively normal, possible fall risk, and fall risk, thus deriving the final assessment result. Specifically, as mentioned earlier, there are ultimately two assessment results and three scores (six sub-items: two for video, two for amplitude variance, and two for frequency analysis). When the assessment results are inconsistent, all six scoring items need to be considered together. These scores are summed in a certain ratio reflecting their actual impact to obtain a single value. As shown in Figure 7, the values in the normal data group follow one distribution, and the values in the fall data group follow another. The test subject's value is categorized according to the relevant confidence level. If it lies within x% of the center of the normal distribution, it is considered normal; if it lies between x% and y%, it is considered relatively normal. The fall distribution is analyzed similarly, leading to the classifications "at risk of falling" and "possible risk of falling."
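The fusion rule can be sketched as follows. The source leaves the weighting ratio and the thresholds x% and y% unspecified, so `weights`, `x_pct`, and `y_pct` below are placeholder values, and the way a score is ranked against each distribution (`central_rank`) is one possible reading of "distance from the distribution center", not the patent's stated formula.

```python
import numpy as np

NORMAL, RELATIVELY_NORMAL, POSSIBLE_RISK, RISK = (
    "normal", "relatively normal", "possible fall risk", "fall risk")

def central_rank(value, reference):
    """Percentage of reference scores at least as close to their median as `value`."""
    reference = np.asarray(reference, dtype=float)
    center = np.median(reference)
    return 100.0 * float(np.mean(np.abs(reference - center) <= abs(value - center)))

def fuse(result_video, result_sensor, scores, weights,
         normal_scores, risk_scores, x_pct=50.0, y_pct=95.0):
    """Combine two assessment results and six sub-item scores into a final label."""
    # Agreement on normal or on fall risk decides the outcome directly.
    if result_video == result_sensor and result_video in (NORMAL, RISK):
        return result_video
    # Otherwise weight the six sub-item scores into a single value.
    final = float(np.dot(weights, scores))
    rank_normal = central_rank(final, normal_scores)
    if rank_normal <= x_pct:
        return NORMAL
    if rank_normal <= y_pct:
        return RELATIVELY_NORMAL
    # Outside the normal distribution: analyze the fall-risk distribution similarly.
    return RISK if central_rank(final, risk_scores) <= x_pct else POSSIBLE_RISK
```

For instance, with equal weights of 1/6, a subject whose weighted score sits near the median of the normal reference scores is labeled normal even when the video and sensor results disagree.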
[0078] This method combines video processing with sensor detection, inheriting the high precision of sensors while utilizing deep learning technology and mathematical statistics principles to significantly shorten detection time. It simplifies traditional detection methods, using only a single sensor located at the waist to analyze changes in gait angle and acceleration data during walking, distinguishing between normal and fall-risk data, greatly reducing costs. The detection method is applicable to the assessment and initial screening of various gait-related diseases. In practical applications, a waist belt integrating sensors will be used, along with video capture via mobile phones or tablets, to analyze the data of the tested individual, making it suitable for everyday community and home use.
[0079] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the above embodiments do not limit the present invention in any way, and all technical solutions obtained by equivalent substitution or equivalent transformation fall within the protection scope of the present invention.
Claims
1. A method for early fall risk assessment based on multimodal information fusion, characterized in that it comprises: simultaneously collecting video data and sensor data of the subject while walking; obtaining a first evaluation result based on the video data; obtaining a second evaluation result based on the sensor data; and obtaining a final evaluation result by combining the first and second evaluation results; wherein the specific method for obtaining the first evaluation result based on the video data is as follows: perform data cleaning on the video data; use human target detection bounding boxes to extract human body information from the video data; perform pose estimation on the human body information, extract frame-by-frame human body key point information, and convert it into a graph structure; perform feature extraction on the human body key point graph structure; unfold the extracted features into one dimension and score them to obtain a first scoring result; and obtain the first evaluation result based on the first scoring result; the specific method for feature extraction on the human body key point graph structure is as follows: extract the spatiotemporal features between key points of the human body using the STGCN network; the specific method for obtaining the second evaluation result based on the sensor data is as follows: clean the sensor data; perform frequency analysis and amplitude variance analysis on the sensor data; and obtain the second evaluation result by combining the results of the frequency analysis and the amplitude variance analysis; the specific method for performing frequency analysis on the sensor data is as follows: collect normal sensor datasets and perform a Fast Fourier Transform on each one to find the three most probable frequency values in each data set, with corresponding periods T1, T2, and T3; based on these three periods, perform ACF plot analysis on each data set to find the lag order that maximizes the absolute value of the autocorrelation coefficient; a comprehensive comparison yields the variation period Tix (x = 1, 2, 3, …) of each data set, and, using statistical methods, the general variation period Ti of the acceleration and deflection angle information during walking is determined from the normal datasets of people with no risk of falling; divide the sensor data into time series segments based on the determined normal period Ti; use the DTW method to compare the similarity between the sequence segments; obtain a second scoring result by scoring the subject in the frequency dimension based on the degree of similarity; and obtain the frequency analysis evaluation result based on the second scoring result; the specific method for performing amplitude variance analysis on the sensor data is as follows: set a normal variance distribution and take a 95% confidence level; compare the variance value of the measured sensor data against it, and if the value falls within the normal range, judge it as normal, otherwise judge it as a fall risk; at the same time, use the distance between the variance value of the sensor data and the distribution center as the basis for the variance scoring item in the comprehensive evaluation; at the amplitude level, use a t-test to determine whether there is a significant difference between the mean amplitude of the normal group and that of the fall risk group, and a Mann-Whitney U test to determine whether there is a significant difference between the median amplitudes of the two groups; compare the amplitude data of the test subject with the median amplitude data of the dataset and score the amplitude scoring item; based on the scores of the two scoring items, variance and amplitude, obtain the corresponding third scoring result; obtain the amplitude variance analysis evaluation result based on the third scoring result; and obtain the second evaluation result by combining the results of the frequency analysis and the amplitude variance analysis.
2. The early fall risk assessment method based on multimodal information fusion according to claim 1, characterized in that the STGCN network includes a normalization layer, several ST-GCN modules, a pooling layer, and a fully connected layer; the specific method for extracting the spatiotemporal features between key points of the human body using the STGCN network is as follows: the normalization layer performs a normalization operation on the human body key point graph structure, so that the feature vectors of the same joint in different frames are normalized; the ST-GCN modules extract the features of human key points in both the time and space dimensions from the normalized information; and the extracted feature information is pooled and fully connected through the pooling layer and the fully connected layer before being output.
3. The early fall risk assessment method based on multimodal information fusion according to claim 2, characterized in that the ST-GCN module includes an ATT submodule, a GCN submodule, and a TCN submodule.
4. The early fall risk assessment method based on multimodal information fusion according to claim 1, characterized in that the sensor data includes three-dimensional acceleration data and deflection angle data.
5. The early fall risk assessment method based on multimodal information fusion according to claim 4, characterized in that the assessment results include normal, relatively normal, possibly at risk of falling, and at risk of falling; the specific method for obtaining the final evaluation result by combining the first and second evaluation results is as follows: if both the first and second assessment results are normal, the final assessment result is normal; if both the first and second assessment results indicate a risk of falling, the final assessment result is a risk of falling; if the first assessment result and the second assessment result are inconsistent, the first scoring result, the second scoring result, and the third scoring result are weighted in a certain proportion to obtain the final scoring result; the final scoring result is placed in the joint distribution of the scoring results of the normal and fall risk datasets, and, based on its distance from the distribution center, four classification indicators are distinguished, namely normal, relatively normal, possible fall risk, and fall risk, to obtain the final assessment result.