Ultrasonic radio frequency signal-based bone density prediction method and apparatus, and artificial intelligence ultrasonic bone density meter

By using a bone density prediction method based on ultrasound radio frequency signals, and extracting ultrasound radio frequency signal features through multi-scale attention convolution channels and cross-attention modules, the problems of large size, high radiation, and low accuracy of existing equipment are solved, thus achieving efficient and accurate bone density detection.

WO2026123440A1 · PCT designated stage · Publication Date: 2026-06-18 · SUN YAT SEN MEMORIAL HOSPITAL SUN YAT SEN UNIV +1

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SUN YAT SEN MEMORIAL HOSPITAL SUN YAT SEN UNIV
Filing Date
2025-01-10
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Existing osteoporosis screening equipment suffers from problems such as large size, high price, and high radiation. Furthermore, traditional quantitative ultrasound bone densitometers, based on acoustic parameters, ignore or lose a large amount of information about sound field and tissue interaction, resulting in limited accuracy in diagnosis.

Method used

A bone mineral density prediction method based on ultrasound radiofrequency signals was adopted. Using a trained bone mineral density prediction model, multiple-scale temporal features of ultrasound radiofrequency signals were extracted through multi-scale attention convolution channels and cross-attention modules, and bone mineral density was predicted in combination with clinical risk factors.

Benefits of technology

It improves the accuracy and intelligence of bone density prediction, and has the advantages of low cost, small size, portability and no radiation, making it suitable for large-scale screening and clinical monitoring.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN2025071689_18062026_PF_FP_ABST

Abstract

An ultrasonic radio frequency signal-based bone density prediction method and apparatus, and an artificial intelligence ultrasonic bone density meter. The bone density prediction method comprises: acquiring detection data of a tested user, the detection data comprising ultrasonic radio frequency signal data collected by a quantitative ultrasonic device from the tested user (S101); and on the basis of the detection data, using a trained bone density prediction model to perform bone density prediction on the tested user, wherein the bone density prediction model is obtained by training using multiple sample data collected from a plurality of subjects and respective labels of the multiple sample data; the bone density prediction model comprises an output layer and one or a plurality of multi-scale attention convolution channels; each multi-scale attention convolution channel comprises a cross-attention module and a plurality of convolution layers, wherein the plurality of convolution layers have different convolution kernel sizes for extracting multi-scale temporal features from the ultrasonic radio frequency signal data, and the cross-attention module is configured to perform weighted fusion of the multi-scale temporal features to obtain a fused feature map; and the output layer is configured to obtain a bone density prediction result on the basis of the fused feature map (S102).

Description

Methods, devices, and artificial intelligence-based ultrasound bone densitometers for predicting bone mineral density based on ultrasound radio frequency signals

Technical Field

[0001] This disclosure relates to the field of medical examination and testing technology, and in particular to a method, device and artificial intelligence ultrasound bone densitometer for predicting bone density based on ultrasound radio frequency signals.

Background Technology

[0002] Osteoporosis is a systemic bone disease characterized by low bone mass, damage to bone microstructure, increased bone fragility, and a high susceptibility to fractures. Osteoporotic fractures, also known as fragility fractures, are the most serious consequence of osteoporosis; they are fractures caused by a force equivalent to a fall from standing height or less. Osteoporotic fractures have a high incidence, high disability and mortality rates, and high medical costs, placing a heavy burden on patients, families, and society. Preventing or reducing fractures has always been a primary goal of osteoporosis prevention and control. However, osteoporosis prevention and control in China still faces the dilemma of "high incidence" combined with "low awareness, diagnosis, and treatment rates," the fundamental reason being the lack of economical, portable, and accurate osteoporosis screening equipment.

[0003] Currently, commonly used methods for screening and diagnosing osteoporosis include: dual-energy X-ray absorptiometry (DXA), peripheral dual-energy X-ray absorptiometry (pDXA), quantitative computed tomography (QCT), peripheral quantitative computed tomography (pQCT), and quantitative ultrasound (QUS).

[0004] Dual-energy X-ray absorptiometry (DXA) is an internationally recognized method for osteoporosis detection and is the "gold standard" recommended by the World Health Organization (WHO) for diagnosing osteoporosis. It works by measuring the absorption of X-rays to obtain bone mineral density (BMD) and the corresponding T-score. However, because DXA equipment is expensive and bulky, involves radiation exposure, requires specialized operators, and is scarce in community-level hospitals (fewer than 0.35 DXA bone densitometers per million people), it cannot meet the needs of large-scale osteoporosis screening. Quantitative computed tomography (QCT) utilizes the imaging principle of X-ray CT to achieve quantitative bone density imaging per unit volume. However, it suffers from problems such as large size, high cost, high radiation dose, and algorithmic defects. Currently, it is not recommended as an osteoporosis detection device and remains in the animal experiment and clinical research stage.
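As a brief illustration of the T-value (T-score) mentioned above: it is conventionally the number of young-adult standard deviations by which a measured BMD differs from the young-adult reference mean, and the WHO densitometric categories are defined on it. The sketch below assumes hypothetical reference values; the function names are illustrative and do not come from this disclosure.

```python
def t_score(bmd, young_adult_mean, young_adult_sd):
    """Number of young-adult standard deviations the measured BMD lies
    above (+) or below (-) the young-adult reference mean."""
    return (bmd - young_adult_mean) / young_adult_sd


def who_category(t):
    """WHO densitometric category for a given T-score."""
    if t <= -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"


# Hypothetical reference values in g/cm^2, for illustration only.
t = t_score(bmd=0.64, young_adult_mean=1.00, young_adult_sd=0.12)
print(round(t, 2), who_category(t))
```

Real reference means and standard deviations depend on the measurement site, the device, and the reference population.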

[0005] In addition, quantitative ultrasound (QUS) is a commonly used osteoporosis screening technology. Compared with DXA, QUS has advantages such as low cost, small size, portability, short examination time, and no radiation, making it more suitable for widespread medical use and a promising answer to the shortage of DXA equipment in primary healthcare institutions.

[0006] The basic principle of existing QUS technology is to extract acoustic parameters from ultrasound signals based on prior acoustic knowledge, and then deduce the bone mineral density of the measured site from these parameters. Different QUS manufacturers use different acoustic parameters, including: speed of sound (SOS) in bone, broadband ultrasound attenuation (BUA), and composite parameters combining BUA and SOS such as stiffness index (SI) and quantitative ultrasound index (QUI). However, these parameters, based on acoustic principles and simplified using an "ideal solid model," only reflect some characteristics of sound wave propagation in bone, neglecting or losing a significant amount of information about sound field and tissue interaction, thus limiting the accuracy of their judgments.

Summary of the Invention

[0007] To address the problems in the related technologies, this disclosure provides a method, apparatus, and artificial intelligence ultrasound bone densitometer for predicting bone mineral density based on ultrasound radio frequency signals.

[0008] The first aspect of this disclosure provides a method for predicting bone mineral density based on ultrasound radio frequency signals, characterized in that it includes:

[0009] Acquire detection data from the tested user. The detection data includes ultrasound radio frequency signal data collected from the tested user by a quantitative ultrasound device. The quantitative ultrasound device includes an ultrasound transmitter and an ultrasound receiver. The ultrasound radio frequency signal data includes the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body.

[0010] Based on the detection data, a trained bone density prediction model is used to predict the bone density of the tested user, wherein:

[0011] The bone mineral density prediction model is trained using multiple sample data collected from multiple subjects and the labels of each of the multiple sample data. The sample data includes ultrasound radiofrequency signal data collected from the corresponding subjects by a quantitative ultrasound device, and the labels of each of the multiple sample data are determined based on the bone mineral density data of the multiple subjects.

[0012] The bone density prediction model includes an output layer and one or more multi-scale attention convolutional channels. The multi-scale attention convolutional channels include a cross-attention module and multiple convolutional layers with different kernel sizes, used to extract time features of multiple scales from the ultrasound radio frequency signal data. The cross-attention module is used to perform weighted fusion of the time features of multiple scales to obtain a fused feature map. The output layer is used to obtain the bone density prediction result based on the fused feature map.

[0013] According to embodiments of this disclosure, the designated detection location includes the radius; the label includes axial bone mineral density value and / or axial bone mass category; and for any subject, the sample data and bone mineral density data are collected on the same day.

[0014] According to embodiments of this disclosure, the quantitative ultrasound device includes M ultrasound transmitters and N ultrasound receivers, and the ultrasound radio frequency signal data includes ultrasound radio frequency signal data from M*N transmit-receive channels formed by the M ultrasound transmitters and N ultrasound receivers; the bone density prediction model includes M*N multi-scale attention convolution channels, which are respectively used to process the ultrasound radio frequency signal data from the M*N transmit-receive channels.
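The M*N channel layout can be pictured with a small sketch. It assumes, purely for illustration, that the raw data arrive as an array of shape (M, N, T) with T samples per trace; the array layout, the sizes, and the function name are hypothetical, and a trivial stand-in takes the place of each model branch.

```python
import numpy as np


def split_transmit_receive_channels(rf):
    """Split RF data of shape (M, N, T) into a dict keyed by
    (transmitter index, receiver index), one 1-D trace per channel."""
    M, N, _ = rf.shape
    return {(i, j): rf[i, j] for i in range(M) for j in range(N)}


# Hypothetical sizes: 2 transmitters, 2 receivers, 1024 samples per trace.
M, N, T = 2, 2, 1024
rf = np.random.default_rng(0).standard_normal((M, N, T))
channels = split_transmit_receive_channels(rf)

# One model branch per transmit-receive pair. A stand-in "branch" that only
# reports trace energy replaces the real multi-scale attention convolution
# channel here.
branch_outputs = {pair: float(np.sum(trace ** 2)) for pair, trace in channels.items()}
print(len(branch_outputs))
```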

[0015] According to embodiments of this disclosure, the output layer includes a fully connected layer.

[0016] According to embodiments of this disclosure, the bone mineral density prediction result includes a bone mineral density category and / or a bone mineral density value; the fully connected layer includes a first fully connected layer and / or a second fully connected layer, wherein the first fully connected layer is used to obtain the bone mineral density category based on the fused feature map, and the second fully connected layer is used to obtain the bone mineral density value based on the fused feature map.

[0017] According to an embodiment of this disclosure, when the bone density prediction model includes multiple multi-scale attention convolutional channels, the bone density prediction model further includes: a first splicing module for splicing the outputs of the multiple multi-scale attention convolutional channels; and a first global average pooling layer for performing global average pooling on the output of the first splicing module, wherein the fully connected layer obtains the bone density prediction result based on the output of the first global average pooling layer.

[0018] According to an embodiment of this disclosure, when the bone density prediction model includes multiple multi-scale attention convolution channels, the bone density prediction model further includes: a second splicing module, used to splice clinical risk factors obtained based on clinical risk factor data with the output of the first global average pooling layer, wherein obtaining the bone density prediction result based on the output of the one or more multi-scale attention convolution channels includes obtaining the bone density prediction result based on the output of the second splicing module.

[0019] According to embodiments of this disclosure, the bone mineral density prediction model further includes a third fully connected layer for obtaining the clinical risk factors based on the clinical risk factor data; the clinical risk factors include one or more of the following: gender, age, weight, height, and fracture history.
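The output stage described in the preceding paragraphs (splicing, global average pooling, clinical risk factor branch, and the two fully connected heads) can be sketched in NumPy as follows. This is a shape-level illustration with random weights, not the disclosed implementation. The layer sizes, the concatenation axis, the numeric encoding of the clinical data, the number of categories, and the softmax on the category head are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, w, b):
    """Fully connected layer: y = W x + b."""
    return w @ x + b


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


# Hypothetical sizes: 4 channels, each emitting a (C, T) feature map.
n_channels, C, T = 4, 8, 32
channel_maps = [rng.standard_normal((C, T)) for _ in range(n_channels)]

# First splicing module: concatenate channel outputs (feature axis assumed).
spliced = np.concatenate(channel_maps, axis=0)       # (n_channels * C, T)
# First global average pooling layer: average over the time axis.
pooled = spliced.mean(axis=1)                        # (n_channels * C,)

# Clinical data (assumed encoding): gender, age, weight kg, height cm, fracture history.
clinical_raw = np.array([1.0, 67.0, 58.0, 160.0, 0.0])
d_clin = 4                                           # assumed embedding size
w3, b3 = rng.standard_normal((d_clin, 5)) * 0.01, np.zeros(d_clin)
clinical = dense(clinical_raw, w3, b3)               # third fully connected layer

# Second splicing module: append clinical features to the pooled signal features.
fused = np.concatenate([pooled, clinical])           # (n_channels * C + d_clin,)

n_classes = 3                                        # assumed number of categories
w1, b1 = rng.standard_normal((n_classes, fused.size)) * 0.01, np.zeros(n_classes)
w2, b2 = rng.standard_normal((1, fused.size)) * 0.01, np.zeros(1)
category_probs = softmax(dense(fused, w1, b1))       # first FC layer: category
bmd_value = dense(fused, w2, b2)[0]                  # second FC layer: value
```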

[0020] According to embodiments of this disclosure, the multi-scale attention convolution channel includes a multi-scale attention unit or a plurality of multi-scale attention units in series, and the multi-scale attention unit includes one or a plurality of multi-scale attention modules in series.

[0021] According to embodiments of this disclosure, when the multi-scale attention convolution channel includes multiple multi-scale attention units connected in series, adjacent multi-scale attention units are connected through pooling layers.

[0022] According to embodiments of this disclosure, the pooling layer includes any one of the following: max pooling layer, average pooling layer, global pooling layer, adaptive pooling layer, global average pooling layer, and random pooling layer.

[0023] According to embodiments of this disclosure, the multi-scale attention convolution channel includes three multi-scale attention units connected in series, adjacent multi-scale attention units are connected through a max pooling layer, and the multi-scale attention unit includes two multi-scale attention modules connected in series.
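The three-unit arrangement can be sketched as a simple loop. In the sketch below the multi-scale attention module is replaced by an identity stand-in so that only the series wiring and the pooling between adjacent units are shown; the pooling window of 2 and all function names are assumptions.

```python
import numpy as np


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the time axis of a (C, T) map."""
    C, T = x.shape
    T_trim = (T // size) * size
    return x[:, :T_trim].reshape(C, T_trim // size, size).max(axis=2)


def attention_module(x):
    """Stand-in for one multi-scale attention module (identity here)."""
    return x


def attention_unit(x, n_modules=2):
    """One unit: two multi-scale attention modules in series."""
    for _ in range(n_modules):
        x = attention_module(x)
    return x


def attention_channel(x, n_units=3):
    """Three units in series, with max pooling between adjacent units."""
    for u in range(n_units):
        x = attention_unit(x)
        if u < n_units - 1:  # pooling only between units, not after the last
            x = max_pool1d(x)
    return x


x = np.arange(2 * 16, dtype=float).reshape(2, 16)
y = attention_channel(x)
print(y.shape)
```

With three units there are two pooling stages, so under the assumed window of 2 the time axis shrinks by a factor of four.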

[0024] According to embodiments of this disclosure, the multi-scale attention module includes a multi-scale cross-attention module, which comprises m convolutional layers, m-1 cross-attention modules, and a third concatenation module, where m ≥ 2. Specifically: the m convolutional layers have different kernel sizes and are used to extract temporal features of different scales from the input of the multi-scale cross-attention module; the m-1 cross-attention modules are used to progressively fuse the outputs of the m convolutional layers in ascending order of kernel size based on a cross-attention mechanism; and the third concatenation module is used to concatenate the output of the convolutional layer with the smallest kernel with the outputs of the m-1 cross-attention modules.

[0025] According to an embodiment of this disclosure, the inputs of the m convolutional layers are connected to the input of the multi-scale cross-attention module, the kernel of the i-th convolutional layer is smaller than the kernel of the (i+1)-th convolutional layer, 1≤i≤m-1; the first input of the first cross-attention module is connected to the output of the second convolutional layer, and the second input of the first cross-attention module is connected to the output of the first convolutional layer; the first input of the j-th cross-attention module is connected to the output of the (j+1)-th convolutional layer, and the second input of the j-th cross-attention module is connected to the first output of the (j-1)-th cross-attention module, wherein the first output of the cross-attention module outputs the cross-attention fusion result of the first input data of the first input of the cross-attention module and the second input data of the second input of the cross-attention module, 2≤j≤m-1.
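The wiring described above, for m = 3, can be sketched as follows. The kernel sizes (3, 5, 7), the channel count, the random weights, and the projection-free `cross_attend` stand-in are assumptions made for illustration. The text does not state which output of each cross-attention module is concatenated, so the use of the residual output here is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d_same(x, kernels):
    """1-D convolution (cross-correlation, as in deep learning frameworks)
    with zero padding so the time length is preserved.
    x: (C_in, T); kernels: (C_out, C_in, k) with odd k."""
    C_out, C_in, k = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    out = np.zeros((C_out, T))
    for t in range(T):
        out[:, t] = np.tensordot(kernels, xp[:, t:t + k], axes=([1, 2], [0, 1]))
    return out


def cross_attend(first, second):
    """Simplified stand-in for a cross-attention module (no learned
    projections): queries from `first`, keys and values from `second`.
    Returns (attention result, attention result + first)."""
    scores = first.T @ second                        # (T, T)
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    attended = second @ weights.T                    # (C, T)
    return attended, attended + first


def multi_scale_cross_attention(x, kernel_sizes=(3, 5, 7), C_out=4):
    C_in = x.shape[0]
    sizes = sorted(kernel_sizes)                     # ascending kernel size
    feats = [conv1d_same(x, rng.standard_normal((C_out, C_in, k)) * 0.1)
             for k in sizes]
    parts = [feats[0]]        # smallest-kernel features are concatenated directly
    carried = feats[0]        # second input of the first cross-attention module
    for j in range(1, len(feats)):
        first_out, second_out = cross_attend(feats[j], carried)
        parts.append(second_out)   # assumption: the residual output is concatenated
        carried = first_out        # second input of the next cross-attention module
    return np.concatenate(parts, axis=0)             # (m * C_out, T)


x = rng.standard_normal((2, 20))
fused = multi_scale_cross_attention(x)
print(fused.shape)
```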

[0026] According to embodiments of this disclosure, the cross-attention module includes: a first projection module for linearly projecting the first input data using a 1*1 pointwise convolution to obtain a first vector matrix Q; a second projection module for linearly projecting the second input data using a 1*1 pointwise convolution to obtain a second vector matrix K; a third projection module for linearly projecting the second input data using a 1*1 pointwise convolution to obtain a third vector matrix V; a first dot product module for calculating the dot product of the transpose of the first vector matrix Q and the second vector matrix K; a normalization module for normalizing the output of the first dot product module; a second dot product module for calculating the dot product of the output of the normalization module and the third vector matrix V; a residual connection module for superimposing the first input data and the output of the second dot product module; a first output terminal for outputting the calculation result of the second dot product module; and a second output terminal for outputting the calculation result of the residual connection module.
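A minimal NumPy sketch of one such cross-attention module is given below. It treats a feature map as a (C, T) array, so a 1*1 pointwise convolution becomes a matrix applied at every time step. The use of softmax as the normalization, equal channel counts for all projections (needed for the residual sum), and the random weights are assumptions; the class and variable names are illustrative only.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class CrossAttention:
    """Feature maps are (C, T) arrays; a 1*1 pointwise convolution over such
    a map is a (C, C) matrix applied independently at every time step."""

    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(channels)
        self.wq = rng.standard_normal((channels, channels)) * scale
        self.wk = rng.standard_normal((channels, channels)) * scale
        self.wv = rng.standard_normal((channels, channels)) * scale

    def __call__(self, first, second):
        q = self.wq @ first                  # Q from the first input
        k = self.wk @ second                 # K from the second input
        v = self.wv @ second                 # V from the second input
        weights = softmax(q.T @ k, axis=1)   # (T, T); softmax is an assumption
        attended = v @ weights.T             # (C, T): weighted sum of V columns
        # First output: attention result. Second output: residual sum.
        return attended, first + attended


rng = np.random.default_rng(1)
a, b = rng.standard_normal((4, 10)), rng.standard_normal((4, 10))
first_out, second_out = CrossAttention(channels=4)(a, b)
print(first_out.shape, second_out.shape)
```

Taking Q from the larger-kernel features and K, V from the smaller-kernel (or previously fused) features lets each time step of the coarser scale select which finer-scale positions to draw on.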

[0027] According to embodiments of this disclosure, the multi-scale cross-attention module further includes a batch normalization module and an activation function module that are sequentially connected in series to the output of the third concatenation module.

[0028] According to embodiments of this disclosure, the multi-scale attention module further includes a compressed excitation (squeeze-and-excitation) attention module for scaling the weights of features of different scales contained in the output of the multi-scale cross-attention module to remove redundant information.

[0029] According to an embodiment of this disclosure, the compressed excitation attention module includes a second global average pooling layer, a fourth fully connected layer, a fifth fully connected layer, and a third dot multiplication module connected in series. The third dot multiplication module is used to multiply the output of the fifth fully connected layer with the input of the compressed excitation attention module.
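The pooling, two fully connected layers, and multiplication described above follow the pattern commonly known as squeeze-and-excitation. A minimal NumPy sketch is given below; the ReLU and sigmoid activations and the reduction ratio `r` are assumptions borrowed from the usual form of that pattern and are not stated in this disclosure, and the weights are random.

```python
import numpy as np


def squeeze_excite(x, w4, b4, w5, b5):
    """x: (C, T). Returns x with each channel rescaled by a gate in (0, 1)."""
    squeezed = x.mean(axis=1)                         # second global average pooling
    hidden = np.maximum(0.0, w4 @ squeezed + b4)      # fourth FC layer + ReLU (assumed)
    gate = 1.0 / (1.0 + np.exp(-(w5 @ hidden + b5)))  # fifth FC layer + sigmoid (assumed)
    return x * gate[:, None]                          # third multiplication module


rng = np.random.default_rng(0)
C, T, r = 8, 16, 4                                    # r: assumed reduction ratio
x = rng.standard_normal((C, T))
w4, b4 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w5, b5 = rng.standard_normal((C, C // r)), np.zeros(C)
y = squeeze_excite(x, w4, b4, w5, b5)
print(y.shape)
```

Because each gate lies between 0 and 1, channels judged less informative are attenuated rather than removed outright.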

[0030] The second aspect of this disclosure provides a method for training a bone mineral density prediction model based on ultrasound radio frequency signals, characterized by comprising:

[0031] Acquire multiple sample data collected from multiple subjects, the sample data including ultrasound radio frequency signal data collected from the corresponding subjects by a quantitative ultrasound device, the quantitative ultrasound device including an ultrasound transmitter and an ultrasound receiver, the ultrasound radio frequency signal data including the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body;

[0032] Based on the bone mineral density data of the multiple subjects, the labels of the multiple sample data are determined;

[0033] A bone density prediction model is trained using the plurality of sample data and their respective labels, wherein:

[0034] The bone density prediction model includes an output layer and one or more multi-scale attention convolutional channels. The multi-scale attention convolutional channels include: multiple convolutional layers with different kernel sizes for extracting time features of multiple scales from the ultrasound radio frequency signal data; and a cross-attention module for weighted fusion of the time features of multiple scales to obtain a fused feature map for bone density prediction. The output layer is used to obtain the bone density prediction result based on the fused feature map.

[0035] According to embodiments of this disclosure, the quantitative ultrasound device includes M ultrasound transmitters and N ultrasound receivers, and the ultrasound radio frequency signal data includes ultrasound radio frequency signal data from M*N transmit-receive channels formed by the M ultrasound transmitters and N ultrasound receivers; the bone density prediction model includes M*N multi-scale attention convolution channels, which are respectively used to process the ultrasound radio frequency signal data from the M*N transmit-receive channels.

[0036] According to embodiments of this disclosure, the multi-scale attention convolution channel includes a multi-scale attention unit or a plurality of multi-scale attention units in series, and the multi-scale attention unit includes one or a plurality of multi-scale attention modules in series.

[0037] According to embodiments of this disclosure, when the multi-scale attention convolution channel includes multiple multi-scale attention units connected in series, adjacent multi-scale attention units are connected through pooling layers.

[0038] According to embodiments of this disclosure, the multi-scale attention module includes a multi-scale cross-attention module and a compressed excitation (squeeze-and-excitation) attention module. The multi-scale cross-attention module includes m convolutional layers, m-1 cross-attention modules, and a third concatenation module, where m ≥ 2. It also includes a batch normalization module and an activation function module sequentially connected in series to the output of the third concatenation module. The compressed excitation attention module is used to scale the weights of features of different scales contained in the output of the multi-scale cross-attention module to remove redundant information. Specifically: the m convolutional layers have different kernel sizes and are used to extract temporal features of different scales from the input of the multi-scale cross-attention module; the m-1 cross-attention modules are used to fuse the outputs of the m convolutional layers step by step according to the cross-attention mechanism, in ascending order of kernel size; and the third concatenation module is used to concatenate the output of the convolutional layer with the smallest kernel with the outputs of the m-1 cross-attention modules.

[0039] According to an embodiment of this disclosure, the inputs of the m convolutional layers are connected to the input of the multi-scale cross-attention module, the kernel of the i-th convolutional layer is smaller than the kernel of the (i+1)-th convolutional layer, 1≤i≤m-1; the first input of the first cross-attention module is connected to the output of the second convolutional layer, and the second input of the first cross-attention module is connected to the output of the first convolutional layer; the first input of the j-th cross-attention module is connected to the output of the (j+1)-th convolutional layer, and the second input of the j-th cross-attention module is connected to the first output of the (j-1)-th cross-attention module, wherein the first output of the cross-attention module outputs the cross-attention fusion result of the first input data of the first input of the cross-attention module and the second input data of the second input of the cross-attention module, 2≤j≤m-1; the compressed excitation attention module includes a second global average pooling layer, a fourth fully connected layer, a fifth fully connected layer, and a third dot multiplication module connected in series, the third dot multiplication module being used to multiply the output of the fifth fully connected layer with the input of the compressed excitation attention module.

[0040] According to embodiments of this disclosure, the cross-attention module includes: a first projection module for linearly projecting the first input data using a 1*1 pointwise convolution to obtain a first vector matrix Q; a second projection module for linearly projecting the second input data using a 1*1 pointwise convolution to obtain a second vector matrix K; a third projection module for linearly projecting the second input data using a 1*1 pointwise convolution to obtain a third vector matrix V; a first dot product module for calculating the dot product of the transpose of the first vector matrix Q and the second vector matrix K; a normalization module for normalizing the output of the first dot product module; a second dot product module for calculating the dot product of the output of the normalization module and the third vector matrix V; a residual connection module for superimposing the first input data and the output of the second dot product module; a first output terminal for outputting the calculation result of the second dot product module; and a second output terminal for outputting the calculation result of the residual connection module.

[0041] A third aspect of this disclosure provides a bone mineral density prediction device based on ultrasound radio frequency signals, characterized in that it comprises:

[0042] The first acquisition module is configured to acquire the detection data of the tested user. The detection data includes ultrasound radio frequency signal data collected by the quantitative ultrasound device from the tested user. The quantitative ultrasound device includes an ultrasound transmitter and an ultrasound receiver. The ultrasound radio frequency signal data includes the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body.

[0043] The prediction module is configured to predict the bone density of the tested user based on the detection data using a trained bone density prediction model, wherein:

[0044] The bone mineral density prediction model is trained using multiple sample data collected from multiple subjects and the labels of each of the multiple sample data. The sample data includes ultrasound radiofrequency signal data collected from the corresponding subjects by a quantitative ultrasound device, and the labels of each of the multiple sample data are determined based on the bone mineral density data of the multiple subjects.

[0045] The bone density prediction model includes an output layer and one or more multi-scale attention convolutional channels. The multi-scale attention convolutional channels include a cross-attention module and multiple convolutional layers with different kernel sizes, used to extract time features of multiple scales from the ultrasound radio frequency signal data. The cross-attention module is used to perform weighted fusion of the time features of multiple scales to obtain a fused feature map. The output layer is used to obtain the bone density prediction result based on the fused feature map.

[0046] The fourth aspect of this disclosure provides a bone mineral density prediction model training device based on ultrasound radio frequency signals, characterized in that it includes:

[0047] The second acquisition module is configured to acquire multiple sample data collected from multiple subjects. The sample data includes ultrasound radio frequency signal data collected from the corresponding subjects by a quantitative ultrasound device. The quantitative ultrasound device includes an ultrasound transmitter and an ultrasound receiver. The ultrasound radio frequency signal data includes the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body.

[0048] The determination module is configured to determine the labels of the multiple sample data based on the bone mineral density data of the multiple subjects;

[0049] The training module is configured to train a bone density prediction model using the plurality of sample data and the respective labels of the plurality of sample data, wherein:

[0050] The bone density prediction model includes an output layer and one or more multi-scale attention convolutional channels. The multi-scale attention convolutional channels include a cross-attention module and multiple convolutional layers with different kernel sizes, used to extract time features of multiple scales from the ultrasound radio frequency signal data. The cross-attention module is used to perform weighted fusion of the time features of multiple scales to obtain a fused feature map. The output layer is used to obtain the bone density prediction result based on the fused feature map.

[0051] A fifth aspect of this disclosure provides an electronic device, characterized in that it includes a processor and a memory, wherein the memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the steps of the method described in any of the preceding claims.

[0052] The sixth aspect of this disclosure provides an artificial intelligence ultrasound bone densitometer, characterized in that it comprises:

[0053] A strap;

[0054] One or more ultrasonic transmitters and one or more ultrasonic receivers are disposed on the strap;

[0055] A control host, the control host including a processor and a memory, wherein the memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the steps of the method described above.

[0056] A seventh aspect of this disclosure provides a computer-readable storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed by a processor, implement the steps of the method described in any of the preceding claims.

[0057] The eighth aspect of this disclosure provides a computer program product comprising computer instructions, characterized in that, when executed by a processor, the computer instructions implement the steps of the method described in any of the preceding claims.

[0058] The technical solution provided in this disclosure fully utilizes the rich amplitude, frequency, and phase information in ultrasound radio frequency signals. This information reflects the interaction between the ultrasound sound field and human tissue, as well as the microstructural characteristics of human tissue. A bone density prediction model is obtained by training a deep neural network model containing an attention mechanism. The bone density prediction model is used to predict bone density based on the ultrasound radio frequency signals of the tested user. It can automatically extract key feature variables related to bone density prediction from the ultrasound radio frequency signals through the attention mechanism. Compared with the existing method of extracting acoustic parameters from ultrasound signals and deriving bone density based on acoustic parameters, it has higher intelligence and accuracy. At the same time, compared with detection methods such as DXA and QCT, the quantitative ultrasound detection method according to this disclosure has advantages such as low cost, small size, portability, less time consumption, and no radiation. It does not require a special examination site to shield radiation, making measurement more convenient and better meeting the needs of large-scale screening, clinical monitoring, and epidemiological surveys.

[0059] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Attached Figure Description

[0060] Other features, objects, and advantages of this disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:

[0061] Figure 1 shows a flowchart of a bone mineral density prediction method based on ultrasound radio frequency signals according to an embodiment of the present disclosure.

[0062] Figure 2 shows the ultrasound radio frequency data of four channels of a quantitative ultrasound device for three consecutive cycles.

[0063] Figure 3 shows a structural diagram of a bone mineral density prediction model according to an embodiment of the present disclosure.

[0064] Figure 4 illustrates a specific example of a multi-scale attention module according to an embodiment of the present disclosure.

[0065] Figure 5 illustrates a specific example of a cross-attention module according to an embodiment of this disclosure.

[0066] Figure 6 shows a flowchart of a bone mineral density prediction model training method based on ultrasound radio frequency signals according to an embodiment of the present disclosure.

[0067] Figure 7 shows a structural block diagram of a bone mineral density prediction device based on ultrasound radio frequency signals according to an embodiment of the present disclosure.

[0068] Figure 8 shows a structural block diagram of a bone mineral density prediction model training device based on ultrasound radio frequency signals according to an embodiment of the present disclosure.

[0069] Figure 9 shows a structural block diagram of an electronic device according to an embodiment of the present disclosure.

[0070] Figure 10 shows a schematic diagram of the structure of a computer system suitable for implementing the method according to an embodiment of the present disclosure.

Detailed Description

[0071] In the following, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings to enable those skilled in the art to readily implement them. Furthermore, for clarity, portions unrelated to the description of exemplary embodiments have been omitted from the drawings.

[0072] In this disclosure, it should be understood that terms such as “comprising” or “having” are intended to indicate the presence of features, figures, steps, behaviors, components, parts or combinations thereof disclosed in this specification, and are not intended to exclude the possibility of the presence or addition of one or more other features, figures, steps, behaviors, components, parts or combinations thereof.

[0073] It should also be noted that, unless otherwise specified, the embodiments and features described in this disclosure can be combined with each other. This disclosure will now be described in detail with reference to the accompanying drawings and embodiments.

[0074] In this disclosure, any operation involving the acquisition of user information or user data, or the display of user information or user data to others, is an operation authorized or confirmed by the user, or actively selected by the user.

[0075] As mentioned above, the commonly used bone density detection technologies such as DXA and QCT in the prior art have drawbacks such as large size, high price and large radiation dose, which are not conducive to large-scale promotion and use. The traditional QUS bone densitometer derives bone density based on acoustic parameters extracted from ultrasound radio frequency signals, ignoring or losing a lot of information about the interaction between the sound field and human tissue, which limits the accuracy of its judgment.

[0076] This disclosure provides a bone mineral density prediction method based on ultrasound radio frequency signals, comprising: acquiring detection data of a test user, the detection data including ultrasound radio frequency signal data collected from the test user by a quantitative ultrasound device, the quantitative ultrasound device including an ultrasound transmitter and an ultrasound receiver, the ultrasound radio frequency signal data including the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body; and predicting the bone mineral density of the test user using a trained bone mineral density prediction model based on the detection data, wherein: the bone mineral density prediction model is trained using multiple sample data collected from multiple subjects and the labels of each of the multiple sample data. The obtained sample data includes ultrasound radiofrequency signal data collected from corresponding subjects by a quantitative ultrasound device. The labels of each of the multiple sample data are determined based on the bone mineral density data of the multiple subjects. The bone mineral density prediction model includes an output layer and one or more multi-scale attention convolutional channels. The multi-scale attention convolutional channels include a cross-attention module and multiple convolutional layers with different kernel sizes, used to extract time features of multiple scales from the ultrasound radiofrequency signal data. The cross-attention module is used to perform weighted fusion of the time features of multiple scales to obtain a fused feature map. The output layer is used to obtain the bone mineral density prediction result based on the fused feature map.

[0077] Ultrasonic radio frequency signal data includes the raw radio frequency signal data detected by an ultrasonic receiver after the ultrasonic signal emitted by the ultrasonic transmitter passes through a designated detection location on the human body. For example, it can be the amplitude time series of the raw radio frequency signal. The technical solution provided according to the embodiments of this disclosure fully utilizes the rich amplitude, frequency, and phase information in the ultrasonic radio frequency signal. This information reflects the interaction between the ultrasonic sound field and human tissue, as well as the microstructural characteristics of the human tissue. A bone density prediction model is obtained by training a deep neural network model containing an attention mechanism. Using this bone density prediction model based on the ultrasonic radio frequency signal of the tested user, bone density prediction can be performed. The attention mechanism can automatically extract key feature variables related to bone density prediction from the ultrasonic radio frequency signal. Compared with existing methods that extract acoustic parameters from the ultrasonic signal and derive bone density based on these acoustic parameters, this method has higher intelligence and accuracy. Furthermore, compared with detection methods such as DXA and QCT, the quantitative ultrasound detection method according to the embodiments of this disclosure has advantages such as low cost, small size, portability, less time consumption, and no radiation. It does not require special examination sites to shield radiation, making measurement more convenient and better meeting the needs of large-scale screening, clinical monitoring, and epidemiological surveys.

[0078] Figure 1 shows a flowchart of a bone mineral density prediction method based on ultrasound radio frequency signals according to an embodiment of the present disclosure. As shown in Figure 1, the bone mineral density prediction method based on ultrasound radio frequency signals includes the following steps S101-S102:

[0079] In step S101, the detection data of the tested user is acquired. The detection data includes ultrasound radio frequency signal data collected by the quantitative ultrasound device from the tested user. The quantitative ultrasound device includes an ultrasound transmitter and an ultrasound receiver. The ultrasound radio frequency signal data includes the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body.

[0080] In step S102, based on the detection data, a trained bone density prediction model is used to predict the bone density of the tested user. The bone density prediction model is trained using multiple sample data collected from multiple subjects and the label of each sample data. Each sample data includes ultrasound radio frequency signal data collected from the corresponding subject by a quantitative ultrasound device. The label of each sample data is determined based on the bone density data of the corresponding subject. The bone density prediction model includes an output layer and one or more multi-scale attention convolutional channels. Each multi-scale attention convolutional channel includes a cross-attention module and multiple convolutional layers with different kernel sizes, the convolutional layers being used to extract time features at multiple scales from the ultrasound radio frequency signal data. The cross-attention module is used to perform weighted fusion of the time features at multiple scales to obtain a fused feature map. The output layer is used to obtain the bone density prediction result based on the fused feature map.

[0081] According to the embodiments of this disclosure, the self-developed prototype uses the host of a commercial portable ultrasonic bone densitometer (BMD-9) and is equipped with a self-developed wristband-type probe for measuring at the radius. The wristband-type probe includes a wristband and a probe assembly of modular design that can be detached from the wristband. The probe assembly includes dual-transmitter, dual-receiver piezoelectric ceramic high-performance ultrasonic transducers, namely transducers A, B, C, and D, where A and B serve as transmitting transducers E1 and E2 (the ultrasonic transmitters), and C and D serve as receiving transducers R1 and R2 (the ultrasonic receivers). The angle of each element is approximately 30°, forming an axial-transmission multi-channel ultrasonic transducer combination. In one specific embodiment, the probe center frequency is 1.25 MHz, the bandwidth is 41.27%, the probe sensitivity is -46.69 dB, the maximum crosstalk is -52.60 dB, the capacitance is approximately 520 pF, and the sound velocity deviation for both accuracy and repeatability is less than 50 m/s (PM: 0.1797%; CU: 0.108%).

[0082] The strap can be made of finely frosted soft rubber and is designed with buckles for adjusting the tightness and with grooves that fit the radial side. This keeps the probe assembly close to the radius and as parallel to it as possible, ensuring the stability and reliability of the signal during the test.

[0083] The portable ultrasound bone densitometer (BMD-9), after being matched, connected, and improved with a wristband-type probe, can display SOS and T values on the main unit's display interface, and can save and retrieve the raw ultrasound radio frequency signal data detected by the probe assembly. According to the algorithm built into the main unit system, if the subject's T value is lower than or equal to -2.5, the subject is classified as "osteoporosis"; if the T value is higher than -2.5 but lower than -1.0, the subject is classified as "osteopenia"; and if the T value is higher than or equal to -1.0, the subject is considered "normal".
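The thresholding rule described above can be sketched as a small function. This is only an illustration of the stated cut-offs, not the algorithm built into the BMD-9 host; the function name `classify_t_value` is hypothetical, and the middle band is labeled "osteopenia" to match the three categories (normal bone mass, osteopenia, osteoporosis) used elsewhere in this disclosure.

```python
def classify_t_value(t_value: float) -> str:
    """Map a T value to a bone-mass category using the stated thresholds."""
    if t_value <= -2.5:
        return "osteoporosis"   # T <= -2.5
    if t_value < -1.0:
        return "osteopenia"     # -2.5 < T < -1.0
    return "normal"             # T >= -1.0


# Boundary values fall into the categories named in the text.
categories = [classify_t_value(t) for t in (-3.0, -2.5, -1.7, -1.0, 0.4)]
```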

[0084] According to embodiments of this disclosure, the designated testing site is the distal third of the radius in the non-dominant hand (generally the left hand). If there has been trauma to the forearm or wrist on that side, the contralateral forearm is tested. During the measurement, the user assumes a comfortable seated position, placing their forearm naturally on the examination table. After applying sufficient coupling gel to the probe assembly, the operator assists the user in wearing the wristband-type probe at the designated testing site, begins the measurement, and holds it in that position until the measurement is completed.

[0085] The advantage of using the radius as the designated testing site in the embodiments of this disclosure is that the user being tested does not need to undress or make special preparations, and the testing venue only needs to be equipped with tables and chairs. However, those skilled in the art will understand that the designated testing site can also be other sites besides the radius.

[0086] The probe assembly of this disclosure includes a dual-transmit, dual-receive piezoelectric ceramic high-performance ultrasonic transducer, thus enabling the acquisition of ultrasonic radio frequency signals from four channels (E1 to R1, E1 to R2, E2 to R1, and E2 to R2). Each acquisition session lasts approximately 40 seconds, with a sampling rate of 40 MHz. By defining one transmission and reception cycle as a single period, 1180 time points can be acquired per cycle. The ultrasonic radio frequency data for each cycle records the time series of ultrasonic amplitudes across the four channels, which are then normalized to have a mean of 0 and a variance of 1.

[0087] As an example, Figure 2 shows the ultrasound radio frequency data of four channels of a quantitative ultrasound device for three consecutive cycles. Data from 100 single cycles with stable and effective transmission signals were selected, resulting in a total of 118,000 time points recorded, with a data shape of (118000, 4).

[0088] According to embodiments of this disclosure, when training a bone mineral density prediction model, ultrasound radiofrequency signals from multiple subjects are acquired as samples. For each subject, their bone mineral density (BMD) data is acquired, and a label is determined based on the BMD data for the corresponding sample. For example, since the DXA method is the "gold standard" recommended by the World Health Organization (WHO) for diagnosing osteoporosis, the BMD data used to determine the label can be the axial bone mineral density (AMD) value obtained by a DXA device. Axial bones include, for example, the lumbar spine, hip, and femoral neck. The label can include the axial bone mineral density value and / or the axial bone mass category. According to embodiments of this disclosure, a T-score can be calculated based on the axial bone mineral density value. In this example, healthy adults aged 25-35 years are used as the reference population. The T-score is calculated as: (axial bone mineral density value - mean BMD of the same sex reference population) / standard deviation of BMD of the same sex reference population. Based on the comparison between the T-score and a threshold, axial bone mass can be classified into three categories: normal bone mass, osteopenia, and osteoporosis. Sample data for each subject and bone mineral density data used to determine the label were collected on the same day to ensure a good correlation between the two.
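The T-score definition given in words above can be written compactly as follows. The symbols are notational assumptions introduced here for readability: $\mathrm{BMD}_{\text{axial}}$ is the subject's axial bone mineral density value, and $\mu_{\text{ref}}$ and $\sigma_{\text{ref}}$ are the mean and standard deviation of BMD in the same-sex reference population.

```latex
T = \frac{\mathrm{BMD}_{\text{axial}} - \mu_{\text{ref}}}{\sigma_{\text{ref}}}
```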

[0089] According to embodiments of this disclosure, ultrasound radiofrequency data is preprocessed before being input into the bone density prediction model, including data stitching, data noise reduction, and data normalization.

[0090] Data stitching: Based on the observation and analysis of ultrasound radiofrequency signal data, it was found that each subject's ultrasound radiofrequency signal data contained tens of thousands of data cycles, which was very large, and the differences between each cycle were small. In order to enable the bone mineral density prediction model to learn the characteristics of the subject's ultrasound radiofrequency signal data more effectively and improve the model training efficiency, stable and effective transmission signals were extracted from each subject's ultrasound radiofrequency signal data. A specified number (e.g., 100) of single-cycle data were selected and stitched together to obtain the subject's sample.
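A minimal sketch of the stitching step, assuming the raw recording has already been segmented into cycles of 1180 time points over four channels. How "stable and effective" cycles are identified is not specified here, so the sketch simply takes a list of pre-selected cycle indices; the function name `stitch_cycles` and the synthetic data are hypothetical.

```python
import numpy as np

POINTS_PER_CYCLE = 1180  # time points per transmit-receive cycle
N_CHANNELS = 4           # E1-R1, E1-R2, E2-R1, E2-R2


def stitch_cycles(cycles, selected):
    """Concatenate the selected cycles along the time axis.

    cycles:   array of shape (n_cycles, POINTS_PER_CYCLE, N_CHANNELS)
    selected: indices of the cycles judged stable (selection rule assumed)
    """
    chosen = cycles[np.asarray(selected)]
    return chosen.reshape(-1, cycles.shape[-1])


rng = np.random.default_rng(0)
raw = rng.normal(size=(250, POINTS_PER_CYCLE, N_CHANNELS))  # synthetic stand-in
sample = stitch_cycles(raw, range(100))                     # 100 cycles per sample
```

With 100 selected cycles the stitched sample has shape (118000, 4), matching the data shape stated for the example recording.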

[0091] Data denoising: This embodiment of the present disclosure uses an 8th-order Butterworth low-pass filter with a cutoff frequency of 20 MHz to remove high-frequency noise from the stitched samples, thereby avoiding the impact of noise on the model.
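The denoising step might be sketched with SciPy as below. Several choices are assumptions rather than details taken from this disclosure: the filter is applied digitally at the 40 MHz sampling rate mentioned for the acquisition, zero-phase filtering (`sosfiltfilt`) is used, and the cutoff is set to 19 MHz purely for illustration, because a digital design requires a cutoff strictly below the Nyquist frequency (20 MHz at a 40 MHz sampling rate).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_HZ = 40e6       # sampling rate stated for the acquisition
CUTOFF_HZ = 19e6   # assumed illustrative value, just below Nyquist (20 MHz)


def lowpass(sample, cutoff_hz=CUTOFF_HZ, fs_hz=FS_HZ, order=8):
    """Apply an 8th-order Butterworth low-pass filter along the time axis."""
    sos = butter(order, cutoff_hz, btype="low", output="sos", fs=fs_hz)
    # Forward-backward filtering gives zero phase shift (an assumption here).
    return sosfiltfilt(sos, sample, axis=0)


# Synthetic 1.25 MHz tone on four channels; it lies well inside the passband.
t = np.arange(4000) / FS_HZ
clean = np.sin(2 * np.pi * 1.25e6 * t)
signal = np.stack([clean] * 4, axis=1)
filtered = lowpass(signal)
```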

[0092] Data Normalization: The amplitude of ultrasound radio frequency signal data varies between -40,000 and 40,000. The large amplitude and significant differences in magnitude between signals hinder model convergence and the learning of intrinsic features. Therefore, after data denoising, each sample was normalized to have a specified amplitude mean and variance, for example, an amplitude mean of 0 and a variance of 1. Unifying the numerical range helps avoid gradient explosion during model calculations, facilitating parameter updates. It also prevents the model from becoming overly sensitive to features with large numerical ranges, helping the model learn the intrinsic characteristics of the data, thereby improving the numerical stability and training effectiveness of the model.
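The normalization to zero mean and unit variance can be sketched as follows. Normalizing each channel separately is an assumption made for this illustration; the function name `standardize` and the synthetic amplitudes are hypothetical.

```python
import numpy as np


def standardize(sample, eps=1e-12):
    """Scale each channel of a (time, channels) sample to mean 0, variance 1."""
    mean = sample.mean(axis=0, keepdims=True)
    std = sample.std(axis=0, keepdims=True)
    return (sample - mean) / (std + eps)  # eps guards against a constant channel


rng = np.random.default_rng(0)
raw_sample = rng.uniform(-40000, 40000, size=(118000, 4))  # synthetic amplitudes
normalized = standardize(raw_sample)
```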

[0093] After the above processing, each sample represents one subject.

[0094] Figure 3 shows a structural diagram of a bone mineral density prediction model according to an embodiment of the present disclosure.

[0095] As shown in Figure 3, the bone density prediction model according to an embodiment of this disclosure includes an output layer and one or more multi-scale attention convolution channels. The multi-scale attention convolution channels include: multiple convolutional layers with different kernel sizes for extracting time features of multiple scales from the ultrasound radio frequency signal data; and a cross-attention module for weighted fusion of the time features of the multiple scales to obtain a fused feature map for bone density prediction. The output layer is used to obtain the bone density prediction result based on the fused feature map.

[0096] According to embodiments of this disclosure, the quantitative ultrasound device for acquiring the ultrasound radio frequency signal data includes M ultrasound transmitters and N ultrasound receivers. The ultrasound radio frequency signal data therefore includes data from the M*N transmit-receive channels formed by the M ultrasound transmitters and N ultrasound receivers. Correspondingly, the bone density prediction model includes M*N multi-scale attention convolution channels, each of which processes the ultrasound radio frequency signal data from one of the M*N transmit-receive channels.

[0097] In the example of Figure 3, the quantitative ultrasound device includes two ultrasound transmitters and two ultrasound receivers. The ultrasound radio frequency signal data includes data from the four transmit-receive channels formed by the two ultrasound transmitters and two ultrasound receivers. Accordingly, the bone mineral density prediction model shown in Figure 3 includes four multi-scale attention convolution channels, each of which processes the ultrasound radio frequency signal data from one of the four transmit-receive channels.

[0098] According to embodiments of this disclosure, a quantitative ultrasound device may have more or fewer ultrasound transmitters and / or ultrasound receivers; for example, it may have one ultrasound transmitter and one ultrasound receiver. In this case, the bone mineral density prediction model includes one multi-scale attention convolution channel for processing ultrasound radio frequency signal data from one transmitter-receiver channel.
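As a small illustration of how the number of model channels follows from the probe layout, the transmit-receive pairs can be enumerated directly. The helper name `transmit_receive_channels` is hypothetical; the labels E and R follow the transducer naming used for the probe assembly.

```python
from itertools import product


def transmit_receive_channels(m, n):
    """List the M*N transmit-receive pairs for M transmitters and N receivers."""
    return [(f"E{i}", f"R{j}")
            for i, j in product(range(1, m + 1), range(1, n + 1))]


dual_probe = transmit_receive_channels(2, 2)    # four model channels
single_probe = transmit_receive_channels(1, 1)  # one model channel
```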

[0099] According to embodiments of this disclosure, each multi-scale attention convolutional channel processes ultrasound radio frequency signal data of the corresponding channel, extracts time features of the ultrasound radio frequency signal data at multiple scales through multiple convolutional layers with different kernel sizes, and performs weighted fusion of the time features at multiple scales through a cross-attention module to obtain a fused feature map for bone density prediction.

[0100] According to embodiments of this disclosure, the output layer includes a fully connected layer, and the bone mineral density prediction result includes a bone mineral density category and / or a bone mineral density value. For example, the fully connected layer may include a first fully connected layer and / or a second fully connected layer, wherein the first fully connected layer is used to obtain the bone mineral density category based on the fused feature map, and the second fully connected layer is used to obtain the bone mineral density value based on the fused feature map.

[0101] Specifically, as shown in Figure 3, the output layer may include a first fully connected layer FC1 and a second fully connected layer FC2. The first fully connected layer is used to obtain the bone mineral density category based on the fused feature map, and the second fully connected layer is used to obtain the bone mineral density value based on the fused feature map. Those skilled in the art will understand that the bone mineral density prediction model may also include only either the first fully connected layer FC1 or the second fully connected layer FC2.

[0102] According to embodiments of this disclosure, in model training, the bone mineral density value prediction task employs a mean squared error loss function $L_{pre}$:

$$L_{pre} = \frac{1}{n}\sum_{i=1}^{n}\left(m_i - b_i\right)^2$$

[0103] where $m_i$ is the labeled bone mineral density value of the i-th sample, $b_i$ is the bone mineral density value predicted by the model for the i-th sample, and n is the number of samples.
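A minimal NumPy sketch of the mean squared error, that is, the average over all samples of the squared difference between the labeled and the predicted bone mineral density values. The function name `mse_loss` and the numbers are hypothetical.

```python
import numpy as np


def mse_loss(labels, predictions):
    """Mean of squared differences between labeled and predicted BMD values."""
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return float(np.mean((labels - predictions) ** 2))


example_loss = mse_loss([1.0, 2.0], [1.0, 4.0])  # squared errors are 0 and 4
```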

[0104] According to embodiments of this disclosure, the bone density category classification task employs a cross-entropy loss function $L_{cls}$ during model training:

$$L_{cls} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{M} y_{ic}\log\left(p_{ic}\right)$$

[0105] where $y_{ic}$ is an indicator function that equals 1 if the label of the i-th sample is category c and 0 otherwise, $p_{ic}$ is the probability predicted by the model that the i-th sample belongs to category c, M is the number of categories, and n is the number of samples.
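The cross-entropy described above can be sketched in NumPy as follows: for each sample, only the log-probability of the labeled category contributes, and the result is averaged over samples. The function name, the small `eps` guard, and the example values are assumptions of this sketch.

```python
import numpy as np


def cross_entropy_loss(one_hot_labels, probabilities, eps=1e-12):
    """Average negative log-probability assigned to the labeled category."""
    y = np.asarray(one_hot_labels, dtype=float)   # shape (n, M), entries 0 or 1
    p = np.asarray(probabilities, dtype=float)    # shape (n, M), rows sum to 1
    return float(-np.mean(np.sum(y * np.log(p + eps), axis=1)))


labels = [[1, 0, 0], [0, 1, 0]]                  # M = 3 categories, n = 2 samples
probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
example_ce = cross_entropy_loss(labels, probs)   # both samples contribute -log(0.5)
```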

[0106] According to embodiments of this disclosure, when the bone density prediction model includes multiple multi-scale attention convolutional channels, the bone density prediction model further includes: a first concatenation module for concatenating the outputs of the multiple multi-scale attention convolutional channels; and a first global average pooling layer for performing global average pooling on the output of the first concatenation module, wherein the fully connected layer obtains the bone density prediction result based on the output of the first global average pooling layer. According to embodiments of this disclosure, the output of each multi-scale attention convolutional channel is the fused feature map produced by that channel.

[0107] For example, as shown in Figure 3, the bone density prediction model includes four multi-scale attention convolutional channels, a first concatenation module C1, and a first global average pooling layer GAP1. The first concatenation module C1 concatenates the outputs of these four multi-scale attention convolutional channels, thus cascading the features of the four channels along the channel direction. The first global average pooling layer GAP1 performs global average pooling on the output of the first concatenation module C1 to reduce the number of parameters and the risk of overfitting. The fully connected layers FC1 and FC2 make predictions based on the output of the first global average pooling layer GAP1.
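The concatenation and global average pooling steps can be sketched as below, assuming each channel outputs a feature map of shape (time steps, feature channels). The shapes and the function name `concat_and_pool` are illustrative assumptions.

```python
import numpy as np


def concat_and_pool(channel_outputs):
    """Concatenate feature maps along the channel axis, then average over time."""
    stacked = np.concatenate(channel_outputs, axis=1)  # role of C1
    return stacked.mean(axis=0)                        # role of GAP1


rng = np.random.default_rng(0)
outputs = [rng.normal(size=(50, 8)) for _ in range(4)]  # four model channels
pooled = concat_and_pool(outputs)                       # one value per feature
```

Pooling collapses the time axis, so the fully connected layers that follow need one weight per feature channel rather than one per time step, which is where the reduction in parameter count comes from.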

[0108] According to embodiments of this disclosure, when the bone density prediction model includes multiple multi-scale attention convolution channels, the bone density prediction model further includes: a second splicing module, used to splice clinical risk factors obtained based on clinical risk factor data with the output of the first global average pooling layer, wherein obtaining the bone density prediction result based on the output of the one or more multi-scale attention convolution channels includes obtaining the bone density prediction result based on the output of the second splicing module. The bone density prediction model further includes a third fully connected layer, used to obtain the clinical risk factors based on the clinical risk factor data; the clinical risk factors include at least one or more of the following: gender, age, weight, height, and fracture history.

[0109] For example, as shown in Figure 3, the bone mineral density prediction model also includes a third fully connected layer FC3, used to obtain the clinical risk factors based on the clinical risk factor data. The second splicing module C2 splices the clinical risk factors with the output of the first global average pooling layer GAP1, and the first fully connected layer FC1 and the second fully connected layer FC2 perform predictions based on the output of the second splicing module C2. By introducing clinical risk factors into bone mineral density prediction through the third fully connected layer and the second splicing module, the accuracy of the prediction results can be further improved.
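A rough sketch of the fusion path through FC3, C2, FC1, and FC2, using randomly initialized weights. The layer widths, the ReLU after FC3, the numeric encoding of the clinical data, and all values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, weights, bias):
    """A plain fully connected layer: x @ W + b."""
    return x @ weights + bias


pooled = rng.normal(size=32)  # stand-in for the GAP1 output
# Hypothetical encoding: gender, age, weight (kg), height (cm), fracture history.
clinical_data = np.array([1.0, 62.0, 58.0, 160.0, 0.0])

w3, b3 = rng.normal(size=(5, 8)), np.zeros(8)                      # FC3
risk_factors = np.maximum(dense(clinical_data, w3, b3), 0.0)       # ReLU assumed

fused = np.concatenate([pooled, risk_factors])                     # C2

w1, b1 = rng.normal(size=(40, 3)), np.zeros(3)                     # FC1: 3 categories
w2, b2 = rng.normal(size=(40, 1)), np.zeros(1)                     # FC2: BMD value
category_scores = dense(fused, w1, b1)
bmd_value = dense(fused, w2, b2)
```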

[0110] According to embodiments of this disclosure, the multi-scale attention convolution channel includes one multi-scale attention unit or multiple multi-scale attention units connected in series. When the multi-scale attention convolution channel includes multiple multi-scale attention units connected in series, adjacent multi-scale attention units are connected through pooling layers. The pooling layer includes any of the following: max pooling layer, average pooling layer, global pooling layer, adaptive pooling layer, global average pooling layer, and random pooling layer. For example, as shown in Figure 3, the multi-scale attention convolution channel includes three multi-scale attention units MU connected in series. Adjacent multi-scale attention units are connected through pooling layers PL. The pooling layer PL can be a max pooling layer. Max pooling can reduce the number of parameters and improve model performance by selecting more recognizable features.

[0111] When a multi-scale attention convolutional channel includes a single multi-scale attention unit, the input of that multi-scale attention convolutional channel is fed into that multi-scale attention unit. When a multi-scale attention convolutional channel includes multiple multi-scale attention units in series, the input of that multi-scale attention convolutional channel is fed into the first multi-scale attention unit; the output of the first multi-scale attention unit is fed into the second multi-scale attention unit, or after passing through a pooling layer, it is fed into the second multi-scale attention unit, and so on.
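The series connection of units with pooling between adjacent units can be sketched as follows. The units are replaced by a hypothetical identity stand-in so that only the wiring and the effect of max pooling on the time axis are shown; the pooling window of 2 is an assumption.

```python
import numpy as np


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along time for a (time, channels) array."""
    length, channels = x.shape
    usable = (length // size) * size  # drop a trailing remainder, if any
    return x[:usable].reshape(-1, size, channels).max(axis=1)


def run_channel(x, units, pool=max_pool1d):
    """Apply the units in series, pooling between adjacent units."""
    for index, unit in enumerate(units):
        if index > 0:
            x = pool(x)
        x = unit(x)
    return x


def identity_unit(x):
    """Hypothetical stand-in for a multi-scale attention unit MU."""
    return x


signal = np.arange(64, dtype=float).reshape(64, 1)
out = run_channel(signal, [identity_unit] * 3)  # two pooling layers in total
```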

[0112] According to embodiments of this disclosure, a multi-scale attention unit includes one or more multi-scale attention modules connected in series. When a multi-scale attention unit includes one multi-scale attention module, the input of the multi-scale attention unit is input to that multi-scale attention module. When a multi-scale attention unit includes multiple multi-scale attention modules connected in series, the input of the multi-scale attention unit is input to the first multi-scale attention module; the output of the first multi-scale attention module is input to the second multi-scale attention module, and so on.

[0113] According to embodiments of this disclosure, a multi-scale attention module includes a multi-scale cross-attention module, and the input of the multi-scale attention module is input into the multi-scale cross-attention module.

[0114] The multi-scale cross-attention module includes m convolutional layers, m-1 cross-attention modules, and a third concatenation module, where m ≥ 2. Specifically: the m convolutional layers have different kernel sizes and are used to extract temporal features of different scales from the input of the multi-scale cross-attention module; the m-1 cross-attention modules are used to progressively fuse the outputs of the m convolutional layers according to the cross-attention mechanism, in ascending order of kernel size; and the third concatenation module is used to concatenate the output of the convolutional layer with the smallest kernel with the output of the m-1 cross-attention modules.

[0115] According to an embodiment of this disclosure, the inputs of the m convolutional layers are connected to the input of the multi-scale cross-attention module, and the kernel of the i-th convolutional layer is smaller than the kernel of the (i+1)-th convolutional layer, 1≤i≤m-1. The first input of the first cross-attention module is connected to the output of the second convolutional layer, and the second input of the first cross-attention module is connected to the output of the first convolutional layer. The first input of the j-th cross-attention module is connected to the output of the (j+1)-th convolutional layer, and the second input of the j-th cross-attention module is connected to the first output of the (j-1)-th cross-attention module, 2≤j≤m-1. Here, the first output of a cross-attention module outputs the cross-attention fusion result of the first input data received at its first input and the second input data received at its second input.

[0116] According to embodiments of this disclosure, the cross-attention module includes: a first projection module for linearly projecting the first input data using a 1*1 pointwise convolution to obtain a first vector matrix Q; a second projection module for linearly projecting the second input data using a 1*1 pointwise convolution to obtain a second vector matrix K; a third projection module for linearly projecting the second input data using a 1*1 pointwise convolution to obtain a third vector matrix V; a first dot product module for calculating the dot product of the transpose of the first vector matrix Q and the second vector matrix K; a normalization module for normalizing the output of the first dot product module; a second dot product module for calculating the dot product of the output of the normalization module and the third vector matrix V; a residual connection module for superimposing the first input data and the output of the second dot product module; a first output terminal for outputting the calculation result of the second dot product module; and a second output terminal for outputting the calculation result of the residual connection module.
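The cross-attention module and the progressive fusion wiring can be sketched in NumPy as below. Feature maps are laid out as (time steps, feature channels), so a 1*1 pointwise convolution reduces to a matrix product over the channel axis. Several points are assumptions of this sketch: all feature maps share the same shape so that the residual addition works without a dimension adjustment, no scaling factor is applied before the Softmax, and the residual (second) outputs are the ones passed to the concatenation.

```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(first, second, w_q, w_k, w_v):
    """Return (first output, second output) of one cross-attention module."""
    q = first @ w_q                      # Q projected from the first input
    k = second @ w_k                     # K projected from the second input
    v = second @ w_v                     # V projected from the second input
    weights = softmax(q @ k.T, axis=-1)  # normalized attention weights
    attended = weights @ v               # first output (second dot product)
    residual = first + attended          # second output (residual connection)
    return attended, residual


def fuse_multiscale(features, params):
    """features: conv outputs ordered by ascending kernel size."""
    outputs = [features[0]]              # smallest-kernel output is kept as-is
    context = features[0]
    for feature, (w_q, w_k, w_v) in zip(features[1:], params):
        attended, residual = cross_attention(feature, context, w_q, w_k, w_v)
        outputs.append(residual)         # assumption: concatenate second outputs
        context = attended               # first output feeds the next module
    return np.concatenate(outputs, axis=-1)


rng = np.random.default_rng(0)
f1, f2, f3 = (rng.normal(size=(10, 4)) for _ in range(3))
params = [tuple(rng.normal(size=(4, 4)) for _ in range(3)) for _ in range(2)]
attended, residual = cross_attention(f2, f1, *params[0])
fused = fuse_multiscale([f1, f2, f3], params)
```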

[0117] According to embodiments of this disclosure, the multi-scale cross-attention module further includes a batch normalization module and an activation function module that are sequentially connected in series to the output of the third splicing module.

[0118] According to embodiments of this disclosure, the multi-scale attention module further includes a squeeze-and-excitation (compressed activation) attention module for scaling the weights of features at different scales contained in the output of the multi-scale cross-attention module to remove redundant information. The squeeze-and-excitation attention module includes a second global average pooling layer, a fourth fully connected layer, a fifth fully connected layer, and a third dot product module connected in series. The third dot product module is used to multiply the output of the fifth fully connected layer with the input of the squeeze-and-excitation attention module.
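A minimal sketch of this channel re-weighting, following the common squeeze-and-excitation pattern. The ReLU after the fourth fully connected layer, the sigmoid after the fifth, and the reduced hidden width are assumptions borrowed from that pattern rather than details stated here.

```python
import numpy as np


def squeeze_excite(x, w4, b4, w5, b5):
    """Re-weight the feature channels of a (time, channels) feature map."""
    squeezed = x.mean(axis=0)                        # global average pooling
    hidden = np.maximum(squeezed @ w4 + b4, 0.0)     # fourth FC layer + ReLU
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w5 + b5)))  # fifth FC layer + sigmoid
    return x * scale                                 # channel-wise multiplication


rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))
w4, b4 = rng.normal(size=(4, 2)), np.zeros(2)
w5, b5 = rng.normal(size=(2, 4)), np.zeros(4)
reweighted = squeeze_excite(features, w4, b4, w5, b5)
```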

[0119] The multi-scale attention module according to an embodiment of the present disclosure will now be described in detail with reference to Figures 4 and 5.

[0120] Figure 4 illustrates a specific example of a multi-scale attention module according to an embodiment of the present disclosure.

[0121] The multiscale attention module MA shown in Figure 4 includes a multiscale cross-attention module MSCA and a squeeze-and-excitation (SE) attention module SE. The input of the multiscale attention module MA is fed into the multiscale cross-attention module MSCA.

[0122] The multi-scale cross-attention module (MSCA) consists of three convolutional layers (CN1, CN2, CN3), two cross-attention modules (CA1, CA2), a third concatenation module (C3), a batch normalization module (BN), and an activation function module (ReLU). $X_k$ is the input to the multi-scale cross-attention module (MSCA), and $F'_m$ is its output.

[0123] According to embodiments of this disclosure, the three parallel convolutional layers CN1, CN2, and CN3 of the multi-scale cross-attention module (MSCA) have different kernel sizes, generating receptive fields of different sizes to capture the short-term, medium-term, and long-term temporal dependencies of the time series $X_k$, thereby obtaining multi-scale temporal information to extract features and generate feature maps. The advantage of multi-scale convolution over single-scale convolution is that it can generate receptive fields of different sizes, thus capturing more information. The ultrasound radio frequency signal data input to the four multi-scale attention convolution channels is represented as a time series $T_{k,L} = (t_{k,1}, t_{k,2}, \ldots, t_{k,L-1}, t_{k,L})$, where k represents the k-th multi-scale attention convolution channel, and L represents the length of the time series. This ultrasound radio frequency signal data is input into the corresponding multi-scale attention convolution channel.

[0124] As an example, in the multi-scale cross-attention module (MSCA), the kernel sizes of the three convolutional layers CN1, CN2, and CN3 are 3, 7, and 15, respectively, and the stride is 1 for all of them. Their input is the time series $X_k$, and each layer outputs its own feature map.
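
The parallel multi-scale convolution described above can be illustrated with a minimal NumPy sketch. It is not the disclosed implementation: the zero "same" padding, the number of output channels (`c_out=8`), the random weights, and the function names `conv1d_same` and `multi_scale_features` are assumptions made only for illustration. Only the kernel sizes 3, 7, 15 and the stride of 1 come from the text.

```python
import numpy as np

def conv1d_same(x, weight, bias):
    """1-D convolution (cross-correlation) with stride 1 and zero 'same' padding.

    x:      (C_in, L) input time series
    weight: (C_out, C_in, K) kernel with odd K
    bias:   (C_out,)
    returns (C_out, L)
    """
    k = weight.shape[2]
    pad = k // 2
    x_pad = np.pad(x, ((0, 0), (pad, pad)))
    # windows has shape (C_in, L, K): one length-K window per output position
    windows = np.lib.stride_tricks.sliding_window_view(x_pad, k, axis=1)
    return np.einsum("clk,ock->ol", windows, weight) + bias[:, None]

def multi_scale_features(x, kernel_sizes=(3, 7, 15), c_out=8, seed=0):
    """Apply one convolution per kernel size to the same input (assumed layout)."""
    rng = np.random.default_rng(seed)
    feats = []
    for k in kernel_sizes:
        # Random weights stand in for trained parameters (assumption).
        w = rng.standard_normal((c_out, x.shape[0], k)) / np.sqrt(k * x.shape[0])
        feats.append(conv1d_same(x, w, np.zeros(c_out)))
    return feats

# A synthetic single-channel series stands in for X_k.
x_k = np.sin(np.linspace(0.0, 20.0, 256))[None, :]
f1, f2, f3 = multi_scale_features(x_k)
print(f1.shape, f2.shape, f3.shape)
```

With "same" padding, the three feature maps keep the input length, so each position of the kernel-15 map summarizes a wider time window than the same position of the kernel-3 map.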

[0125] The feature maps generated by the three convolutional layers CN1, CN2, and CN3 are input into the cross-attention module to calculate the attention weights between time features at different scales, thereby obtaining the contextual information of small-scale time features and the rich detailed information contained in large-scale time features.

[0126] Figure 5 shows the structure of the first cross-attention module CA1. The feature map output by the first convolutional layer CN1 and the feature map output by the second convolutional layer CN2 are used as the two inputs of the first cross-attention module CA1. First, three projection modules P1, P2, and P3 perform linear feature projection on these two inputs through 1×1 pointwise convolutions to obtain the corresponding vector matrices $Q$, $K$, and $V$, where $W_Q$, $W_K$, and $W_V$ are the learnable parameters of the three 1×1 pointwise convolutions. The matrices are then fused using a cross-attention mechanism to obtain a fused feature map. The calculation process can be represented as follows:

[0127] The first dot product module M1 computes the dot product of $Q$ and $K^T$, and the Softmax activation function then normalizes the resulting dot product weights. A second dot product module M2 then computes the weighted sum of the normalized attention map with the matrix $V$, thereby modeling the contextual information of small-scale temporal features. Finally, the residual connection module ADD superimposes the attention output on the input feature map to obtain the output feature map. Because the two feature maps being added have inconsistent dimensions, the dimensions of one must be adjusted to match the other before the addition is performed. The result can be represented as:

[0128] Similarly, the second cross-attention module CA2 in the MSCA module applies the same steps, taking as its two inputs the feature map obtained above and the output feature map of the third-scale convolutional layer CN3. One of these feature maps is input to projection module P1 and the other to projection modules P2 and P3, and the two are fused through the cross-attention mechanism to obtain a further fused feature map. The obtained feature maps are then concatenated along the channel dimension to obtain the feature map $F_m$, which can be represented as:

[0129] It is understood that the structure of the cross-attention module and its connection method with the convolutional layer described above are only one implementation of the embodiments of this disclosure. The cross-attention module according to the embodiments of this disclosure can also adopt other structures and connection methods with the convolutional layer, as long as it can fuse the outputs of multiple convolutional layers based on the attention mechanism.
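
As a rough illustration of the cross-attention fusion and the subsequent concatenation, the following NumPy sketch follows the wiring recited in claims 13 to 15: Q comes from the first input, K and V from the second input, and the residual adds the first input. Several points are assumptions rather than statements of the disclosure: all feature maps share one shape so no dimension adjustment is needed, the 1×1 convolutions are bias-free channel-mixing matrices, no scaling factor is applied before the Softmax, and the residual output of CA1 is what is passed on to CA2. The function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x_first, x_second, w_q, w_k, w_v):
    """Fuse two (C, L) feature maps; w_q, w_k, w_v are (C, C) 1x1-conv weights."""
    q = (w_q @ x_first).T    # P1: (L, C), one row per time step
    k = (w_k @ x_second).T   # P2: (L, C)
    v = (w_v @ x_second).T   # P3: (L, C)
    attn = softmax(q @ k.T, axis=-1)  # M1 then Softmax: (L, L), each row sums to 1
    attended = (attn @ v).T           # M2: weighted sum with V, back to (C, L)
    return x_first + attended         # ADD: residual connection

rng = np.random.default_rng(0)
C, L = 4, 32
# Random arrays stand in for the outputs of CN1, CN2, and CN3.
f1, f2, f3 = (rng.standard_normal((C, L)) for _ in range(3))

def rand_w():
    return 0.1 * rng.standard_normal((C, C))

a1 = cross_attention(f2, f1, rand_w(), rand_w(), rand_w())  # CA1
a2 = cross_attention(f3, a1, rand_w(), rand_w(), rand_w())  # CA2
f_m = np.concatenate([f1, a1, a2], axis=0)  # C3: concatenate along channels
print(f_m.shape)
```

The concatenated map has three times the channel count of a single branch, which is what the batch normalization and activation stages then operate on.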

[0130] Subsequently, the batch normalization module BN performs batch normalization on the feature map $F_m$, normalizing the feature maps so that the input distribution remains stable during network training. This operation helps keep gradients from shrinking, which mitigates the vanishing gradient problem; larger gradients also mean faster convergence and improved training efficiency. The final layer of the MSCA module is the activation function module, which employs the ReLU activation function. The main advantage of ReLU is that it introduces non-linearity between network layers and helps reduce overfitting by generating sparse activations. In addition, the combination of BN and ReLU makes the model learning process more robust, thereby improving model performance. Finally, the output of the MSCA module can be expressed as: $F'_m = \mathrm{ReLU}(\mathrm{BN}(F_m))$
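
The expression $F'_m = \mathrm{ReLU}(\mathrm{BN}(F_m))$ can be sketched in NumPy as follows. This is only an illustration under assumptions: the batch layout `(B, C, L)`, the per-channel statistics computed over the batch and time axes (training-mode behavior), the learnable scale `gamma` and shift `beta`, and the value of `eps` are not specified in the text.

```python
import numpy as np

def batch_norm_relu(f_m, gamma, beta, eps=1e-5):
    """Batch-normalize a (B, C, L) feature map per channel, then apply ReLU."""
    mean = f_m.mean(axis=(0, 2), keepdims=True)  # per-channel mean over batch and time
    var = f_m.var(axis=(0, 2), keepdims=True)    # per-channel variance
    normed = (f_m - mean) / np.sqrt(var + eps)
    scaled = gamma[None, :, None] * normed + beta[None, :, None]
    return np.maximum(scaled, 0.0)               # ReLU zeroes out negative responses

rng = np.random.default_rng(0)
f_m = 3.0 + 2.0 * rng.standard_normal((4, 12, 32))
f_m_prime = batch_norm_relu(f_m, gamma=np.ones(12), beta=np.zeros(12))
print(f_m_prime.shape, float((f_m_prime == 0).mean()))
```

With unit scale and zero shift, roughly half of the normalized values are negative and are set to zero by ReLU, which is the sparsity effect the paragraph refers to.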

[0131] According to embodiments of this disclosure, the multi-scale attention module MA may further include a squeeze-and-excitation (SE) attention module SE.

[0132] As shown in Figure 4, the squeeze-and-excitation attention module SE includes a second global average pooling layer GAP2, a fourth fully connected layer FC4, a fifth fully connected layer FC5, and a third dot product module M3 connected in series. The third dot product module M3 multiplies the output of the fifth fully connected layer FC5 with the input $F'_m$ of the SE module. The SE module enhances the network's recognition ability by rescaling the weights of each multi-scale temporal feature, selecting important information and removing redundant information. The SE module first feeds the output $F'_m$ of the multi-scale cross-attention module MSCA into the second global average pooling layer GAP2, which compresses global information into a per-channel representation; this is the "squeeze" operation of the SE attention mechanism. Channel feature responses are generated by averaging each channel over its length, and the output $F_q$ can be represented as: $F_q = \frac{1}{L_f}\sum_{i=1}^{L_f} F'_m(i)$

[0133] Here, $L_f$ is the length of the input vector $X_k$ of the multi-scale cross-attention module MSCA, and $F'_m(i)$ is the $i$-th element of $F'_m$. The second global average pooling layer GAP2 is followed by the fourth fully connected layer FC4 and the fifth fully connected layer FC5, which perform the "excitation" operation of the SE attention mechanism. The channel weight $F_e$ output by the excitation operation can be represented as: $F_e = \sigma(W_2\,\mathrm{ReLU}(W_1 F_q))$

[0134] Here, $W_1$ and $W_2$ are the connection weight parameters of the fourth fully connected layer FC4 and the fifth fully connected layer FC5, respectively, and $\sigma$ is the Sigmoid activation function. Finally, the generated channel weights are applied to the input feature map $F'_m$ to obtain the final output $F_{ma}$ of the MA module: $F_{ma} = F_e \cdot F'_m$
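
The squeeze, excitation, and rescaling steps can be put together in a short NumPy sketch. It mirrors the formulas $F_e = \sigma(W_2\,\mathrm{ReLU}(W_1 F_q))$ and $F_{ma} = F_e \cdot F'_m$; the bias-free fully connected layers follow those formulas, while the hidden width of FC4 (`c_hidden`), the random weights, and the function names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_attention(f_m_prime, w1, w2):
    """Rescale the channels of a (C, L) feature map.

    w1: (C_hidden, C) weights of FC4; w2: (C, C_hidden) weights of FC5.
    """
    f_q = f_m_prime.mean(axis=1)                    # GAP2 (squeeze): one value per channel
    f_e = sigmoid(w2 @ np.maximum(w1 @ f_q, 0.0))   # FC4 -> ReLU -> FC5 -> Sigmoid
    return f_e[:, None] * f_m_prime                 # M3: channel-wise multiplication

rng = np.random.default_rng(0)
c, length, c_hidden = 12, 32, 3  # c_hidden is an assumed bottleneck width
f_m_prime = np.abs(rng.standard_normal((c, length)))
w1 = rng.standard_normal((c_hidden, c))
w2 = rng.standard_normal((c, c_hidden))
f_ma = se_attention(f_m_prime, w1, w2)
print(f_ma.shape)
```

Because the Sigmoid output lies strictly between 0 and 1, each channel is attenuated by its own learned factor and the shape of the feature map is unchanged.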

[0135] It is understood that the squeeze-and-excitation attention module SE is optional, but including it can enhance the network's recognition ability.

[0136] Figure 6 shows a flowchart of a bone mineral density prediction model training method based on ultrasound radio frequency signals according to an embodiment of the present disclosure.

[0137] As shown in Figure 6, the bone density prediction model training method based on ultrasound radio frequency signals according to an embodiment of the present disclosure includes steps S201-S203.

[0138] In step S201, multiple sample data are acquired from multiple subjects. The sample data includes ultrasound radio frequency signal data acquired by a quantitative ultrasound device from the corresponding subjects. The quantitative ultrasound device includes an ultrasound transmitter and an ultrasound receiver. The ultrasound radio frequency signal data includes the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body.

[0139] In step S202, based on the bone density data of the multiple subjects, the labels of the multiple sample data are determined.

[0140] In step S203, a bone density prediction model is trained using the multiple sample data and their respective labels. The bone density prediction model includes an output layer and one or more multi-scale attention convolutional channels. The multi-scale attention convolutional channels include: multiple convolutional layers with different kernel sizes for extracting time features of multiple scales from the ultrasound radio frequency signal data; and a cross-attention module for weighted fusion of the time features of multiple scales to obtain a fused feature map for bone density prediction. The output layer is used to obtain the bone density prediction result based on the fused feature map.

[0141] According to embodiments of this disclosure, the bone mineral density prediction model includes the bone mineral density prediction model described above with reference to Figures 3-5.

[0142] According to embodiments of this disclosure, the designated detection location includes the radius, and the label includes an axial bone mineral density value and / or axial bone mass category determined based on bone mineral density data (e.g., axial bone mineral density value) obtained from DXA device detection. For any subject, the sample data and bone mineral density data are collected on the same day.
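
The disclosure does not state how the axial bone mass category is derived from the DXA measurement. One common convention, shown here purely as an assumed example, is the WHO T-score classification: a T-score of -1.0 or above is normal, between -1.0 and -2.5 is low bone mass (osteopenia), and -2.5 or below is osteoporosis. The reference mean and standard deviation in the usage line are hypothetical placeholders, not values from the source.

```python
def t_score(bmd, ref_mean, ref_sd):
    """T-score: deviation of a BMD value from a young-adult reference, in SD units."""
    return (bmd - ref_mean) / ref_sd

def bone_mass_category(t):
    """WHO-style category from a T-score (assumed labeling rule)."""
    if t <= -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"

def make_label(axial_bmd, ref_mean, ref_sd):
    """Build a label holding both the regression target and the class target."""
    t = t_score(axial_bmd, ref_mean, ref_sd)
    return {"bmd": axial_bmd, "category": bone_mass_category(t)}

# Hypothetical reference values, for illustration only.
label = make_label(axial_bmd=0.64, ref_mean=1.0, ref_sd=0.12)
print(label)
```

Keeping both the value and the category in one label matches the option of training the model for regression, classification, or both.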

[0143] The bone mineral density prediction model based on ultrasound radio frequency signals according to the embodiments of this disclosure can accurately predict axial bone mineral density values and bone mineral density categories. Compared with deriving the T-value of bone mineral density from quantitative parameters of the ultrasound radio frequency signals, such as the SOS value and BUA value, this method is more direct and is conducive to establishing an artificial intelligence osteoporosis diagnosis database based on ultrasound radio frequency signals and to formulating new thresholds for quantitative ultrasound diagnosis of bone. Secondly, the measurement site of the embodiments of this disclosure is the distal 1/3 of the radius on the non-dominant side. Compared with classic axial bone mineral density measurement methods (such as lumbar spine and hip DXA), the measurement can be completed in a shorter time and usually does not require the patient to undress or make special preparations. In addition, the embodiments of this disclosure use ultrasound radio frequency signals from quantitative ultrasound equipment to obtain bone mass information. Compared with common peripheral bone mineral density methods (such as pDXA and pQCT), there is no radiation exposure and no need for a special radiation-shielded examination site, so the measurement is more convenient and better suited to large-scale screening, clinical monitoring, and epidemiological surveys.

[0144] To verify the performance of the bone mineral density prediction method according to the embodiments of this disclosure, the measurement results of the DXA device were used as a benchmark. The prediction results of the bone mineral density prediction method of the embodiments of this disclosure were compared with the measurement results of the original BMD-9 host system. The classification and prediction performance of the bone mineral density prediction method of the embodiments of this disclosure achieved good results under multiple evaluation indicators. The accuracy (ACC), recall, positive predictive value (PPV), and area under the curve (AUC) were improved by 27.70%, 25.86%, 29.23%, and 22.90%, respectively. In the bone mineral density prediction task, the bone mineral density prediction method of the embodiments of this disclosure had a minimum mean square error of 0.011, a coefficient of determination of 0.457, and a Pearson correlation coefficient of 0.689, demonstrating good predictive performance. This indicates that the model has good sensitivity to changes in bone mineral density in different subjects and a strong ability to identify osteoporosis.
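
For reference, the evaluation indicators named above can be computed as in the following plain-Python sketch. It covers accuracy, recall, positive predictive value, mean square error, the coefficient of determination, and the Pearson correlation coefficient; AUC is omitted because it requires predicted scores rather than class labels. The toy inputs are made up for illustration and are unrelated to the reported results, and the function names are hypothetical.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, and PPV for a binary task (assumes both classes occur)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {"acc": correct / len(pairs), "recall": tp / (tp + fn), "ppv": tp / (tp + fp)}

def regression_metrics(y_true, y_pred):
    """MSE, coefficient of determination, and Pearson correlation coefficient."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return {
        "mse": ss_res / n,
        "r2": 1.0 - ss_res / ss_tot,
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
    }

# Toy data, for illustration only.
cls = classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])
reg = regression_metrics([0.8, 0.9, 1.0, 1.1], [0.85, 0.88, 0.97, 1.05])
print(cls, reg)
```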

[0145] Figure 7 shows a structural block diagram of a bone mineral density prediction device based on ultrasound radio frequency signals according to an embodiment of the present disclosure. This device can be implemented as part or all of an electronic device through software, hardware, or a combination of both.

[0146] As shown in Figure 7, the bone density prediction device 700 includes a first acquisition module 710 and a prediction module 720.

[0147] The first acquisition module 710 is configured to acquire detection data of the user under test. The detection data includes ultrasound radio frequency signal data collected by the quantitative ultrasound device from the user under test. The quantitative ultrasound device includes an ultrasound transmitter and an ultrasound receiver. The ultrasound radio frequency signal data includes the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body.

[0148] The prediction module 720 is configured to predict the bone density of the tested user based on the detection data using a trained bone density prediction model. The bone density prediction model is trained using multiple sample data collected from multiple subjects and the labels of each sample data. The sample data includes ultrasound radio frequency signal data collected from the corresponding subjects by a quantitative ultrasound device. The labels of each sample data are determined based on the bone density data of the multiple subjects. The bone density prediction model includes an output layer and one or more multi-scale attention convolutional channels. Each multi-scale attention convolutional channel includes a cross-attention module and multiple convolutional layers with different kernel sizes. The convolutional layers are used to extract time features at multiple scales from the ultrasound radio frequency signal data. The cross-attention module is used to perform weighted fusion of the time features at multiple scales to obtain a fused feature map. The output layer is used to obtain a bone density prediction result based on the fused feature map.

[0149] Figure 8 shows a structural block diagram of a bone mineral density prediction model training device based on ultrasound radio frequency signals according to an embodiment of the present disclosure. This device can be implemented as part or all of an electronic device through software, hardware, or a combination of both.

[0150] As shown in Figure 8, the bone density prediction model training device 800 includes a second acquisition module 810, a determination module 820, and a training module 830.

[0151] The second acquisition module 810 is configured to acquire multiple sample data collected from multiple subjects. The sample data includes ultrasound radio frequency signal data collected from the corresponding subjects by a quantitative ultrasound device. The quantitative ultrasound device includes an ultrasound transmitter and an ultrasound receiver. The ultrasound radio frequency signal data includes the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body.

[0152] The determination module 820 is configured to determine the labels of the plurality of sample data based on the bone mineral density data of the plurality of subjects.

[0153] Training module 830 is configured to train a bone density prediction model using the plurality of sample data and their respective labels, wherein: the bone density prediction model includes an output layer and one or more multi-scale attention convolutional channels, the multi-scale attention convolutional channels include a cross-attention module and multiple convolutional layers with different kernel sizes, used to extract time features of the ultrasound radio frequency signal data at multiple scales, the cross-attention module is used to perform weighted fusion of the time features at multiple scales to obtain a fused feature map, and the output layer is used to obtain a bone density prediction result based on the fused feature map.

[0154] According to embodiments of the present disclosure, the bone mineral density prediction model in the bone mineral density prediction device and the bone mineral density prediction model training device can be the bone mineral density prediction model described above with reference to Figures 3-5.

[0155] This disclosure also provides an electronic device. Figure 9 shows a structural block diagram of the electronic device according to an embodiment of the present disclosure.

[0156] As shown in Figure 9, the electronic device includes a memory and a processor, wherein the memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the bone density prediction method and / or bone density prediction model training method according to embodiments of the present disclosure.

[0157] Figure 10 shows a schematic diagram of the structure of a computer system suitable for implementing the method according to an embodiment of the present disclosure.

[0158] As shown in Figure 10, the computer system includes a processing unit that can execute various methods described above based on a program stored in a read-only memory (ROM) or a program loaded from a storage portion into a random access memory (RAM). The RAM also stores various programs and data required for the operation of the computer system. The processing unit, ROM, and RAM are interconnected via a bus. Input / output (I / O) interfaces are also connected to the bus.

[0159] The following components are connected to the I / O interface: input sections including keyboards, mice, etc.; output sections including cathode ray tubes (CRTs), liquid crystal displays (LCDs), and speakers; storage sections including hard disks, etc.; and communication sections including network interface cards such as LAN cards and modems. The communication section performs communication processes via a network such as the Internet. Drives are also connected to the I / O interface as needed. Removable media, such as disks, optical disks, magneto-optical disks, semiconductor memories, etc., are installed on the drive as needed so that computer programs read from them can be installed into the storage section as needed. The processing unit can be implemented as a CPU, GPU, TPU, FPGA, NPU, etc.

[0160] According to embodiments of this disclosure, the methods described above can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods described above. In such embodiments, the computer program can be downloaded and installed from a network via a communication component, and / or installed from a removable medium.

[0161] Embodiments of this disclosure also provide an artificial intelligence ultrasound bone densitometer. The artificial intelligence ultrasound bone densitometer according to embodiments of this disclosure includes: a strap, one or more ultrasound transmitters and one or more ultrasound receivers disposed on the strap, and a control host.

[0162] The control host includes a processor and a memory, wherein the memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the bone density prediction method according to embodiments of the present disclosure.

[0163] According to embodiments of this disclosure, the artificial intelligence ultrasound bone densitometer employs a ring-type probe to measure at the radius. The ring-type probe includes a wristband and a probe assembly, which has a modular design and can be detached from the wristband. In a specific example, the probe assembly includes dual-transmitter, dual-receiver piezoelectric ceramic high-performance ultrasonic transducers, namely transducers A, B, C, and D, wherein A and B serve as transmitting transducers E1 and E2 (used as ultrasonic transmitters), and C and D serve as receiving transducers R1 and R2 (used as ultrasonic receivers), with each element having an angle of approximately 30°, forming an axial transmission multi-channel ultrasonic transducer assembly.
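
With two transmitters and two receivers, every transmitter-receiver pairing yields one signal path, giving the four transmit-receive channels that feed the four multi-scale attention convolution channels. A minimal sketch of that enumeration follows; the `"E1->R1"` naming scheme and the function name are assumptions for illustration.

```python
from itertools import product

def transmit_receive_channels(transmitters, receivers):
    """List every transmitter-receiver pairing (M transmitters x N receivers)."""
    return [f"{tx}->{rx}" for tx, rx in product(transmitters, receivers)]

channels = transmit_receive_channels(["E1", "E2"], ["R1", "R2"])
print(channels)
```

The same enumeration generalizes to the M*N channels recited in claim 3.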

[0164] The strap can be made of finely frosted soft rubber and is designed with buckles for adjusting its tightness and with grooves that fit the radial side of the forearm. This keeps the probe assembly close to the radius and as parallel to it as possible, ensuring the stability and reliability of the signal during the test.

[0165] The designated testing site is the distal third of the radius in the non-dominant hand (usually the left hand). If there has been trauma to the forearm or wrist on that side, the contralateral forearm will be tested. During the measurement, the user should be comfortably seated with their forearm resting naturally on the examination table. After applying sufficient coupling gel to the probe assembly, the operator assists the user in wearing the wristband probe on the measurement site, begins the measurement, and holds the device in that position until the measurement is complete.

[0166] The technical solution provided in this disclosure fully utilizes the rich amplitude, frequency, and phase information in ultrasound radio frequency signals. This information reflects the interaction between the ultrasound sound field and human tissue, as well as the microstructural characteristics of human tissue. A bone density prediction model is obtained by training a deep neural network model containing an attention mechanism. The bone density prediction model is used to predict bone density based on the ultrasound radio frequency signals of the tested user. It can automatically extract key feature variables related to bone density prediction from the ultrasound radio frequency signals through the attention mechanism. Compared with the existing method of extracting acoustic parameters from ultrasound signals and deriving bone density based on acoustic parameters, it has higher intelligence and accuracy. At the same time, compared with detection methods such as DXA and QCT, the quantitative ultrasound detection method according to this disclosure has advantages such as low cost, small size, portability, less time consumption, and no radiation. It does not require a special examination site to shield radiation, making measurement more convenient and better meeting the needs of large-scale screening, clinical monitoring, and epidemiological surveys.

[0167] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0168] The units or modules described in the embodiments of this disclosure can be implemented in software or programmable hardware. The described units or modules can also be located in a processor, and the names of these units or modules do not necessarily constitute a limitation on the unit or module itself.

[0169] In another aspect, this disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or computer system described above; or it may be a standalone computer-readable storage medium not assembled into a device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the methods described in this disclosure.

[0170] The above description covers only preferred embodiments of this disclosure and explains the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in this disclosure is not limited to technical solutions formed by specific combinations of the above-described technical features, but also covers other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above-described features with (but not limited to) technical features disclosed in this disclosure that have similar functions.

Claims

1. A method for predicting bone mineral density based on ultrasound radio frequency signals, characterized in that it includes: acquiring detection data from the tested user, wherein the detection data includes ultrasound radio frequency signal data collected from the tested user by a quantitative ultrasound device, the quantitative ultrasound device includes an ultrasound transmitter and an ultrasound receiver, and the ultrasound radio frequency signal data includes the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body; and, based on the detection data, using a trained bone density prediction model to predict the bone density of the tested user, wherein: the bone density prediction model is trained using multiple sample data collected from multiple subjects and the labels of each of the multiple sample data; the sample data includes ultrasound radio frequency signal data collected from the corresponding subjects by a quantitative ultrasound device, and the labels of each of the multiple sample data are determined based on the bone mineral density data of the multiple subjects; the bone density prediction model includes an output layer and one or more multi-scale attention convolutional channels; the multi-scale attention convolutional channels include a cross-attention module and multiple convolutional layers with different kernel sizes, the convolutional layers being used to extract time features of multiple scales from the ultrasound radio frequency signal data; the cross-attention module is used to perform weighted fusion of the time features of multiple scales to obtain a fused feature map; and the output layer is used to obtain the bone density prediction result based on the fused feature map.

2. The method according to claim 1, characterized in that: The designated detection location includes the radius; The label includes axial bone mineral density values and / or axial bone mass categories; For any given subject, the sample data and bone density data are collected on the same day.

3. The method according to claim 1, characterized in that: The quantitative ultrasound device includes M ultrasound transmitters and N ultrasound receivers, and the ultrasound radio frequency signal data includes ultrasound radio frequency signal data from M*N transmit-receive channels formed by the M ultrasound transmitters and N ultrasound receivers. The bone density prediction model includes M*N multi-scale attention convolution channels, which are used to process the ultrasound radio frequency signal data of the M*N transmit-receive channels.

4. The method according to claim 1, characterized in that, The output layer includes a fully connected layer.

5. The method according to claim 4, characterized in that: The bone mineral density prediction results include bone mineral density category and / or bone mineral density value; The fully connected layer includes a first fully connected layer and / or a second fully connected layer. The first fully connected layer is used to obtain the bone density category based on the fused feature map, and the second fully connected layer is used to obtain the bone density value based on the fused feature map.

6. The method according to claim 4, characterized in that, When the bone density prediction model includes multiple multi-scale attention convolution channels, the bone density prediction model further includes: The first splicing module is used to splice the outputs of the multiple multi-scale attention convolution channels; The first global average pooling layer is used to perform global average pooling on the output of the first splicing module. The bone density prediction result is obtained by the fully connected layer based on the output of the first global average pooling layer.

7. The method according to claim 6, characterized in that, When the bone density prediction model includes multiple multi-scale attention convolution channels, the bone density prediction model further includes: The second splicing module is used to splice the clinical risk factors obtained based on clinical risk factor data with the output of the first global average pooling layer. The step of obtaining the bone density prediction result based on the output of the one or more multi-scale attention convolution channels includes obtaining the bone density prediction result based on the output of the second stitching module.

8. The method according to claim 7, characterized in that: The bone mineral density prediction model also includes a third fully connected layer, used to obtain the clinical risk factors based on the clinical risk factor data; The clinical risk factors include one or more of the following: sex, age, weight, height, and history of fracture.

9. The method according to claim 1, characterized in that, The multi-scale attention convolution channel includes one multi-scale attention unit or multiple multi-scale attention units in series, and the multi-scale attention unit includes one or multiple multi-scale attention modules in series.

10. The method according to claim 9, characterized in that: When the multi-scale attention convolution channel includes multiple multi-scale attention units connected in series, adjacent multi-scale attention units are connected through pooling layers.

11. The method according to claim 10, characterized in that, The pooling layer includes any of the following: max pooling layer, average pooling layer, global pooling layer, adaptive pooling layer, global average pooling layer, and random pooling layer.

12. The method according to claim 9, characterized in that, The multi-scale attention convolution channel includes three multi-scale attention units connected in series. Adjacent multi-scale attention units are connected through a max pooling layer. Each multi-scale attention unit includes two multi-scale attention modules connected in series.

13. The method according to claim 9, characterized in that, The multi-scale attention module includes a multi-scale cross-attention module, which comprises m convolutional layers, m-1 cross-attention modules, and a third concatenation module, where m ≥ 2. The m convolutional layers have different kernel sizes and are used to extract temporal features at different scales from the input of the multi-scale cross-attention module. The m-1 cross-attention modules are used to fuse the outputs of the m convolutional layers step by step based on the cross-attention mechanism, in ascending order of convolutional kernel size; The third splicing module is used to splice the output of the convolutional layer with the smallest kernel with the outputs of the m-1 cross-attention modules.

14. The method according to claim 13, characterized in that: The inputs of the m convolutional layers are connected to the input of the multi-scale cross-attention module. The kernel of the i-th convolutional layer is smaller than the kernel of the (i+1)-th convolutional layer, and 1≤i≤m-1. The first input of the first cross-attention module is connected to the output of the second convolutional layer, and the second input of the first cross-attention module is connected to the output of the first convolutional layer. The first input of the j-th cross-attention module is connected to the output of the (j+1)-th convolutional layer, and the second input of the j-th cross-attention module is connected to the first output of the (j-1)-th cross-attention module. The first output of the cross-attention module outputs the cross-attention fusion result of the first input data of the first input of the cross-attention module and the second input data of the second input of the cross-attention module, where 2≤j≤m-1.

15. The method according to claim 14, characterized in that the cross-attention module includes: a first projection module, used to perform linear projection on the first input data using 1×1 pointwise convolution to obtain a first vector matrix Q; a second projection module, used to perform linear projection on the second input data using 1×1 pointwise convolution to obtain a second vector matrix K; a third projection module, used to perform linear projection on the second input data using 1×1 pointwise convolution to obtain a third vector matrix V; a first dot product module, used to calculate the dot product of the first vector matrix Q and the transpose of the second vector matrix K; a normalization module, used to normalize the output of the first dot product module; a second dot product module, used to calculate the dot product of the output of the normalization module and the third vector matrix V; a residual connection module, used to add the first input data to the output of the second dot product module; a first output, which outputs the calculation result of the second dot product module; and a second output, which outputs the calculation result of the residual connection module.
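The data flow of the cross-attention module in claim 15 can be sketched in NumPy. Several points are assumptions rather than claim language: features are laid out as (time steps, channels), so a 1×1 pointwise convolution reduces to multiplying by a channel-mixing matrix; the normalization is taken to be a row-wise softmax; no scaling factor is applied to the scores; and the names `cross_attention`, `wq`, `wk`, and `wv` are hypothetical.

```python
import numpy as np


def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(x1, x2, wq, wk, wv):
    """x1, x2: (T, C) feature maps; wq, wk, wv: (C, C) projection weights."""
    q = x1 @ wq                 # first projection: Q from the first input
    k = x2 @ wk                 # second projection: K from the second input
    v = x2 @ wv                 # third projection: V from the second input
    scores = q @ k.T            # first dot product: Q times K transposed
    weights = softmax(scores)   # normalization (softmax is an assumption)
    out1 = weights @ v          # second dot product: first output
    out2 = x1 + out1            # residual connection: second output
    return out1, out2


rng = np.random.default_rng(0)
T, C = 8, 4
x1 = rng.standard_normal((T, C))
x2 = rng.standard_normal((T, C))
wq, wk, wv = (rng.standard_normal((C, C)) for _ in range(3))
out1, out2 = cross_attention(x1, x2, wq, wk, wv)
print(out1.shape, out2.shape)   # (8, 4) (8, 4)
```

Because the residual adds the first input data to the attention result, the sketch requires the projected width to match the channel count of the first input.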

16. The method according to claim 13, characterized in that the multi-scale cross-attention module further includes a batch normalization module and an activation function module connected sequentially in series to the output of the third concatenation module.

17. The method according to claim 13, characterized in that the multi-scale attention module further includes a compressed excitation attention module, which is used to scale the weights of the features at different scales contained in the output of the multi-scale cross-attention module so as to remove redundant information.

18. The method according to claim 17, characterized in that the compressed excitation attention module includes a second global average pooling layer, a fourth fully connected layer, a fifth fully connected layer, and a third dot multiplication module connected in series. The third dot multiplication module is used to multiply the output of the fifth fully connected layer by the input of the compressed excitation attention module.
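The structure in claim 18 resembles what is commonly called a squeeze-and-excitation block, and can be sketched in NumPy as below. The claim names only the pooling layer, the two fully connected layers, and the multiplication, so the ReLU after the fourth layer, the sigmoid after the fifth layer, the (time steps, channels) layout, and the names `compressed_excitation`, `w4`, `b4`, `w5`, and `b5` are assumptions made for illustration.

```python
import numpy as np


def compressed_excitation(x, w4, b4, w5, b5):
    """x: (T, C) features; w4: (C, H); b4: (H,); w5: (H, C); b5: (C,)."""
    squeezed = x.mean(axis=0)                      # global average pooling over time
    hidden = np.maximum(squeezed @ w4 + b4, 0.0)   # fourth FC layer, ReLU assumed
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w5 + b5)))  # fifth FC layer, sigmoid assumed
    return x * gate                                # rescale each channel of the input


rng = np.random.default_rng(0)
T, C, H = 8, 4, 2
x = rng.standard_normal((T, C))
w4, b4 = rng.standard_normal((C, H)), np.zeros(H)
w5, b5 = rng.standard_normal((H, C)), np.zeros(C)
y = compressed_excitation(x, w4, b4, w5, b5)
print(y.shape)  # (8, 4)
```

Under the assumed sigmoid, each channel is multiplied by a gate between 0 and 1, which is one way to down-weight channels that carry redundant information.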

19. A method for training a bone mineral density prediction model based on ultrasound radio frequency signals, characterized in that the method includes: acquiring multiple pieces of sample data collected from multiple subjects, the sample data including ultrasound radio frequency signal data collected from the corresponding subject by a quantitative ultrasound device, the quantitative ultrasound device including an ultrasound transmitter and an ultrasound receiver, and the ultrasound radio frequency signal data including the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body; determining the labels of the multiple pieces of sample data based on the bone mineral density data of the multiple subjects; and training a bone density prediction model using the multiple pieces of sample data and their respective labels, wherein: the bone density prediction model includes an output layer and one or more multi-scale attention convolution channels; each multi-scale attention convolution channel includes multiple convolutional layers with different kernel sizes, used to extract temporal features at multiple scales from the ultrasound radio frequency signal data, and a cross-attention module, used to perform weighted fusion of the temporal features at multiple scales to obtain a fused feature map for bone density prediction; and the output layer is used to obtain the bone density prediction result based on the fused feature map.

20. The method according to claim 19, characterized in that the quantitative ultrasound device includes M ultrasound transmitters and N ultrasound receivers, and the ultrasound radio frequency signal data includes ultrasound radio frequency signal data from the M×N transmit-receive channels formed by the M ultrasound transmitters and the N ultrasound receivers. The bone density prediction model includes M×N multi-scale attention convolution channels, which are used to process the ultrasound radio frequency signal data of the M×N transmit-receive channels.
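The channel count in claim 20 can be made concrete with a small sketch. The function names `transmit_receive_channels` and `assign_branches` are hypothetical, and the one-to-one assignment of each transmitter-receiver pair to its own convolution channel is an assumption about how the M×N model channels could be matched to the M×N signal channels.

```python
from itertools import product
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (transmitter index, receiver index)


def transmit_receive_channels(m: int, n: int) -> List[Pair]:
    """All transmitter-receiver pairs for M transmitters and N receivers."""
    return list(product(range(m), range(n)))


def assign_branches(m: int, n: int) -> Dict[Pair, int]:
    """Assumed one-to-one mapping from each pair to a model channel index."""
    return {pair: idx for idx, pair in enumerate(transmit_receive_channels(m, n))}


branches = assign_branches(2, 3)
print(len(branches))     # 6
print(branches[(1, 2)])  # 5
```

With M = 2 transmitters and N = 3 receivers, the sketch yields six transmit-receive channels and therefore six multi-scale attention convolution channels.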

21. The method according to claim 19, characterized in that the multi-scale attention convolution channel includes one multi-scale attention unit or multiple multi-scale attention units connected in series, and each multi-scale attention unit includes one multi-scale attention module or multiple multi-scale attention modules connected in series.

22. The method according to claim 21, characterized in that, when the multi-scale attention convolution channel includes multiple multi-scale attention units connected in series, adjacent multi-scale attention units are connected through a pooling layer.

23. The method according to claim 21, characterized in that the multi-scale attention module includes a multi-scale cross-attention module and a compressed excitation attention module. The multi-scale cross-attention module includes m convolutional layers, m-1 cross-attention modules, and a third concatenation module, where m ≥ 2, and further includes a batch normalization module and an activation function module connected sequentially in series to the output of the third concatenation module. The compressed excitation attention module is used to scale the weights of the features at different scales contained in the output of the multi-scale cross-attention module so as to remove redundant information. The m convolutional layers have different kernel sizes and are used to extract temporal features at different scales from the input of the multi-scale cross-attention module. The m-1 cross-attention modules are used to fuse the outputs of the m convolutional layers step by step, in ascending order of convolutional kernel size, based on the cross-attention mechanism. The third concatenation module is used to concatenate the output of the convolutional layer with the smallest kernel with the outputs of the m-1 cross-attention modules.

24. The method according to claim 23, characterized in that the inputs of the m convolutional layers are connected to the input of the multi-scale cross-attention module. The kernel of the i-th convolutional layer is smaller than the kernel of the (i+1)-th convolutional layer, where 1≤i≤m-1. The first input of the first cross-attention module is connected to the output of the second convolutional layer, and the second input of the first cross-attention module is connected to the output of the first convolutional layer. For 2≤j≤m-1, the first input of the j-th cross-attention module is connected to the output of the (j+1)-th convolutional layer, and the second input of the j-th cross-attention module is connected to the first output of the (j-1)-th cross-attention module. The first output of each cross-attention module outputs the cross-attention fusion result of the first input data received at its first input and the second input data received at its second input. The compressed excitation attention module includes a second global average pooling layer, a fourth fully connected layer, a fifth fully connected layer, and a third dot multiplication module connected in series. The third dot multiplication module is used to multiply the output of the fifth fully connected layer by the input of the compressed excitation attention module.

25. The method according to claim 24, characterized in that the cross-attention module includes: a first projection module, used to perform linear projection on the first input data using 1×1 pointwise convolution to obtain a first vector matrix Q; a second projection module, used to perform linear projection on the second input data using 1×1 pointwise convolution to obtain a second vector matrix K; a third projection module, used to perform linear projection on the second input data using 1×1 pointwise convolution to obtain a third vector matrix V; a first dot product module, used to calculate the dot product of the first vector matrix Q and the transpose of the second vector matrix K; a normalization module, used to normalize the output of the first dot product module; a second dot product module, used to calculate the dot product of the output of the normalization module and the third vector matrix V; a residual connection module, used to add the first input data to the output of the second dot product module; a first output, which outputs the calculation result of the second dot product module; and a second output, which outputs the calculation result of the residual connection module.

26. A bone mineral density prediction device based on ultrasound radio frequency signals, characterized in that the device includes: a first acquisition module, configured to acquire detection data of a tested user, the detection data including ultrasound radio frequency signal data collected from the tested user by a quantitative ultrasound device, the quantitative ultrasound device including an ultrasound transmitter and an ultrasound receiver, and the ultrasound radio frequency signal data including the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body; and a prediction module, configured to predict the bone density of the tested user based on the detection data using a trained bone density prediction model, wherein: the bone density prediction model is trained using multiple pieces of sample data collected from multiple subjects and the respective labels of the multiple pieces of sample data; the sample data includes ultrasound radio frequency signal data collected from the corresponding subject by a quantitative ultrasound device, and the labels are determined based on the bone mineral density data of the multiple subjects; the bone density prediction model includes an output layer and one or more multi-scale attention convolution channels; each multi-scale attention convolution channel includes multiple convolutional layers with different kernel sizes, used to extract temporal features at multiple scales from the ultrasound radio frequency signal data, and a cross-attention module, used to perform weighted fusion of the temporal features at multiple scales to obtain a fused feature map; and the output layer is used to obtain the bone density prediction result based on the fused feature map.

27. A bone mineral density prediction model training device based on ultrasound radio frequency signals, characterized in that the device includes: a second acquisition module, configured to acquire multiple pieces of sample data collected from multiple subjects, the sample data including ultrasound radio frequency signal data collected from the corresponding subject by a quantitative ultrasound device, the quantitative ultrasound device including an ultrasound transmitter and an ultrasound receiver, and the ultrasound radio frequency signal data including the raw radio frequency signal data detected by the ultrasound receiver after the ultrasound signal emitted by the ultrasound transmitter passes through a designated detection location on the human body; a determination module, configured to determine the labels of the multiple pieces of sample data based on the bone mineral density data of the multiple subjects; and a training module, configured to train a bone density prediction model using the multiple pieces of sample data and their respective labels, wherein: the bone density prediction model includes an output layer and one or more multi-scale attention convolution channels; each multi-scale attention convolution channel includes multiple convolutional layers with different kernel sizes, used to extract temporal features at multiple scales from the ultrasound radio frequency signal data, and a cross-attention module, used to perform weighted fusion of the temporal features at multiple scales to obtain a fused feature map; and the output layer is used to obtain the bone density prediction result based on the fused feature map.

28. An electronic device, characterized in that the electronic device includes a processor and a memory, wherein the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method of any one of claims 1-25.

29. An artificial intelligence ultrasonic bone densitometer, characterized in that the bone densitometer includes: a strap; one or more ultrasonic transmitters and one or more ultrasonic receivers disposed on the strap; and a control host, the control host including a processor and a memory, wherein the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method of any one of claims 1-25.

30. A computer-readable storage medium storing computer instructions thereon, characterized in that, when executed by a processor, the computer instructions implement the method of any one of claims 1-25.

31. A computer program product comprising computer instructions, characterized in that, when executed by a processor, the computer instructions implement the method of any one of claims 1-25.