Head lateral radiograph key point detection method based on feature fusion neural network
By using the backbone network, neck module, and encoder/decoder of a feature fusion neural network, a high-resolution heatmap is generated, which solves the problems of automation and accuracy in key point detection of lateral cephalometric radiographs and achieves efficient key point localization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2024-09-29
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies lack standardized key point detection procedures and well-designed deviation reduction procedures on lateral cephalometric radiographs, resulting in low detection accuracy and reliance on extensive manual annotation. Furthermore, due to limited data volume, the network output resolution is small, introducing quantization errors.
A feature fusion-based neural network approach, including a backbone network, neck module, prediction head, and encoder/decoder, is adopted to generate high-resolution predictions through feature fusion and Gaussian heatmaps, thereby achieving automated and accurate detection of key points.
It achieves fully automated and accurate detection of key points in lateral cephalometric radiographs, alleviates the problem of quantization error, and improves detection accuracy and efficiency on small datasets.
Figure CN119169667B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical image processing technology, specifically to a method for detecting key points in lateral cephalometric radiographs based on feature fusion neural networks.
Background Technology
[0002] Cephalometric analysis is of paramount importance in orthodontics. It involves the precise identification of key points on lateral cephalometric radiographs, providing physicians with crucial information about a patient's craniofacial structure and significantly influencing treatment planning decisions. However, obtaining reliable annotations of these key points from lateral cephalometric radiographs presents significant challenges due to inherent variations in image quality and complex anatomical differences among patients. This typically requires the expertise of highly experienced medical professionals, and even for experienced orthodontists, manually identifying these key points remains a labor-intensive and time-consuming task.
[0003] Fully automated and accurate keypoint detection methods for lateral cephalometric radiographs have long been in demand. Existing methods for keypoint detection in lateral cephalometric radiographs often lack standardized procedures and well-designed bias reduction processes, which severely impacts their performance. Because image acquisition is costly and annotation requires extensive manual work by physicians, the amount of available training data is limited, which leads to the use of very shallow networks and further limits performance. Furthermore, the low resolution of existing network outputs typically introduces quantization errors into keypoint prediction in lateral cephalometric radiographs. Therefore, there is a need to develop feature fusion neural networks capable of generating high-resolution predictive heatmaps to achieve accurate keypoint detection in lateral cephalometric radiographs.
Summary of the Invention
[0004] To address the aforementioned problems in the prior art, the purpose of this invention is to provide a method for key point detection in lateral cephalometric radiographs based on a feature fusion neural network, which can train the feature fusion neural network on a small dataset to improve the accuracy of key point localization on lateral cephalometric radiographs.
[0005] To achieve the above objectives, the present invention provides a method for key point detection in lateral cephalometric radiographs based on feature fusion neural networks, the method comprising the following steps:
[0006] Step 1: Obtain lateral cephalometric radiographs and key point labels marked by the doctor;
[0007] Step 2: Split the images and labels into training, validation, and test sets, and preprocess the dataset;
[0008] Step 3: Input the training set into the keypoint detection model and train the model;
[0009] Step 4: After each training cycle, input the validation set into the model to verify the model performance. After all training is completed, the best-performing pre-trained model is obtained.
[0010] Step 5: Input the test set into the pre-trained model to obtain the Gaussian heatmap of key points on the lateral cephalometric radiograph predicted by the model;
[0011] Step 6: The Gaussian heatmap represents the probability value of each pixel in the lateral cephalometric radiograph being a key point. The key point positions on each lateral cephalometric radiograph are obtained based on the Gaussian heatmap.
[0012] In one implementation, step 2 involves preprocessing the dataset, including:
[0013] Step 2-1: Divide the cephalometric radiograph dataset obtained in Step 1 into a training set Train with 150 images and labels, a test set Test1 with 150 images and labels, and a test set Test2 with 100 images and labels. Use the training set Train for training, the test set Test1 for validation, and the test set Test2 for testing.
[0014] Step 2-2: Perform data augmentation on the training set obtained in Step 2-1 to obtain the augmented training set. Data augmentation operations include random flipping, random cropping, and random bounding box transformation.
[0015] Step 2-3: Pad the validation set images and test set images obtained in Step 2-1 and the augmented training set images obtained in Step 2-2 with zeros to a uniform size (1024, 1024, 3) to obtain the preprocessed training set images, validation set images and test set images.
[0016] Step 2-4: Convert the keypoint labels in the validation and test sets obtained in Step 2-1 and the keypoint labels in the augmented training set obtained in Step 2-2 into the JSON file format required by the model framework input, to obtain the training set JSON label file, the validation set JSON label file, and the test set JSON label file;
[0017] Step 2-5: Read the training set JSON label file obtained in Step 2-4 to obtain the keypoint coordinates of each image in the preprocessed training set images obtained in Step 2-3. For each keypoint in each training set image, create a blank heatmap with the same size as the image. The initial values of the blank heatmap are all zero. Generate a Gaussian distribution for each keypoint on the blank heatmap. The Gaussian distribution can be represented as a two-dimensional normal distribution, as shown in the following formula:
[0018] $G(x, y) = \exp\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right)$;
[0019] Where G(x, y) represents the output Gaussian heatmap, (x0, y0) are the coordinates of the keypoint, (x, y) represent the coordinates of any pixel on the blank heatmap, and σ is the standard deviation of the Gaussian distribution. For each keypoint on each training set image, an independent Gaussian heatmap is generated. The Gaussian heatmaps generated by all keypoints on each training set image are superimposed to form a multi-channel heatmap, which serves as the target for model training, thus obtaining the target Gaussian heatmap for each training set image.
[0020] In one implementation, in steps 2-5, the standard deviation σ of the Gaussian distribution is 6.
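The target construction in Step 2-5 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `gaussian_heatmaps` is hypothetical, the Gaussian is taken to be unnormalized (each channel peaks at 1.0), and a small 64×64 canvas is used instead of the 1024×1024 input to keep the example light.

```python
import numpy as np

def gaussian_heatmaps(keypoints, height, width, sigma=6.0):
    """Return an array of shape (height, width, K), one Gaussian channel per keypoint.

    keypoints: iterable of (x0, y0) pixel coordinates.
    Assumption: the Gaussian is unnormalized, so each channel peaks at 1.0.
    """
    keypoints = np.asarray(keypoints, dtype=np.float64)
    ys, xs = np.mgrid[0:height, 0:width]            # pixel coordinate grids
    heatmaps = np.zeros((height, width, len(keypoints)))
    for k, (x0, y0) in enumerate(keypoints):
        d2 = (xs - x0) ** 2 + (ys - y0) ** 2        # squared distance to the keypoint
        heatmaps[:, :, k] = np.exp(-d2 / (2.0 * sigma ** 2))
    return heatmaps

# Two hypothetical keypoints on a small canvas; sigma = 6 as stated in the text.
target_maps = gaussian_heatmaps([(20, 30), (50, 10)], height=64, width=64)
```

Each keypoint gets its own channel, so stacking the channels gives the multi-channel target described above; with 19 keypoints and a 1024×1024 input the same function would return a (1024, 1024, 19) array.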
[0021] In one implementation, in step 3, the key point detection model includes modules such as a backbone network, a neck, a prediction head, and an encoder / decoder.
[0022] In one implementation, the backbone network module uses a pre-trained HRNet model as the feature extraction module.
[0023] In one implementation, the neck module employs a feature fusion neural network with a structure similar to a feature pyramid network, which includes two feature fusion paths from small-scale feature maps to large-scale feature maps and one feature fusion path from large-scale feature maps to small-scale feature maps; in the feature fusion process, pointwise convolution and depthwise convolution are used to fuse feature maps from different scales.
[0024] In one implementation, the prediction head module receives the fused feature map from the neck module and maps it onto a high-resolution heatmap; this includes encoding the fused feature map and mapping it onto the heatmap using pointwise convolution and pixel rearrangement.
[0025] In one implementation, the codec module includes an encoder and a decoder. The encoder encodes the coordinate values of the keypoint labels in the training set JSON label file obtained in Step 2-4 into a 2D Gaussian distribution heatmap, generating a target Gaussian heatmap as the target for model training. The decoder decodes the predicted Gaussian heatmap output by the model into the keypoint coordinate values on the model input image. The formula for the decoder to generate the keypoint coordinate values is:
[0026] $(x_{\max}, y_{\max}) = \arg\max_{(x, y)} H(x, y)$;
[0027] Where H(x, y) represents the value of the heatmap at pixel (x, y), and (x_max, y_max) is the location of the maximum value in the heatmap, which is also the keypoint location. The heatmap H is a two-dimensional matrix with a size of W×H, and each channel represents the probability distribution of a keypoint. The point with the largest value in the heatmap, i.e., the maximum response point, is the most likely keypoint location in the heatmap.
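The decoding rule, taking the maximum response point of each heatmap channel, might look as follows in NumPy. The function name `decode_heatmaps` and the (H, W, K) channel layout are assumptions made for this sketch.

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Return a (K, 2) integer array of (x, y) peak locations.

    heatmaps: array of shape (H, W, K), one channel per keypoint.
    """
    h, w, k = heatmaps.shape
    flat = heatmaps.reshape(h * w, k)               # flatten the spatial dimensions
    idx = flat.argmax(axis=0)                       # index of the maximum per channel
    ys, xs = np.unravel_index(idx, (h, w))          # back to row (y) and column (x)
    return np.stack([xs, ys], axis=1)

demo = np.zeros((8, 10, 2))
demo[3, 7, 0] = 0.9      # channel 0 peaks at x=7, y=3
demo[5, 1, 1] = 0.4      # channel 1 peaks at x=1, y=5
coords = decode_heatmaps(demo)
```

Because the predicted heatmap has the same resolution as the model input, the decoded coordinates are already expressed in input-image pixels and need no rescaling.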
[0028] In one implementation, in step 3, the loss function used to train the model is calculated using the following formula:
[0029] $L = \frac{1}{N} \sum_{j=1}^{N} \lVert H_j - G_j \rVert_2^2$;
[0030] Where L is the total loss for each training image, H_j represents the predicted heatmap of the j-th keypoint in each training image, G_j represents the target Gaussian heatmap corresponding to the j-th keypoint, N is the total number of keypoints, and ‖·‖_2 represents the L2 norm between the predicted heatmap and the target heatmap.
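A per-image heatmap loss of this kind can be illustrated with NumPy. The sketch assumes that the loss is the squared L2 norm of the difference between predicted and target heatmaps, averaged over the N keypoints; the exact normalization is an assumption, and `heatmap_loss` is a hypothetical name.

```python
import numpy as np

def heatmap_loss(pred, target):
    """Mean over keypoints of the squared L2 distance between heatmaps.

    pred, target: arrays of shape (H, W, N).
    Assumption: squared norm per keypoint, averaged over the N keypoints.
    """
    diff = pred - target
    per_keypoint = np.sum(diff ** 2, axis=(0, 1))   # squared L2 norm for each keypoint j
    return float(per_keypoint.mean())

pred_maps = np.zeros((4, 4, 2))
goal_maps = np.zeros((4, 4, 2))
goal_maps[0, 0, 0] = 1.0          # one pixel differs by 1 in channel 0
goal_maps[1, 1, 1] = 2.0          # one pixel differs by 2 in channel 1
loss = heatmap_loss(pred_maps, goal_maps)   # (1 + 4) / 2
```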
[0031] In one implementation, step 4, obtaining the pre-trained model with optimal performance, is specifically done as follows:
[0032] Step 4-1: After each training cycle, input the preprocessed validation set images obtained in Step 2-3 into the cephalometric radiograph key point detection model to obtain a predicted Gaussian heatmap with the same scale as the input image output by the model.
[0033] Step 4-2: Input the predicted Gaussian heatmap obtained in Step 4-1 into the decoder part of the codec module to obtain the predicted key point coordinates on the input image;
[0034] Step 4-3: Compare the predicted keypoint coordinates on the input image obtained in Step 4-2 with the target keypoint coordinates in the validation set JSON label file obtained in Step 2-4, and calculate the mean radial error (MRE), i.e., the average distance between the predicted keypoint coordinates and the target keypoint coordinates; then calculate the percentage of predicted keypoints whose distance from the corresponding target keypoint is less than or equal to 2.0 mm, obtaining the successful detection rate within the 2.0 mm range (SDR2.0);
[0035] Step 4-4: Compare the SDR2.0 value obtained on the validation set in the current cycle (Step 4-3) with the SDR2.0 values obtained in all previous cycles. If the current value is higher than every previous value, the current model has the best performance so far and is saved as the current best model. After all model training is completed, the pre-trained model with the best performance is obtained.
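The two validation metrics can be sketched with the standard library alone. The function name `mre_and_sdr` is hypothetical, and the default pixel spacing of 0.1 mm is an assumption taken from the dataset description; it only applies when coordinates are expressed at the original image resolution.

```python
import math

def mre_and_sdr(pred, truth, pixel_mm=0.1, threshold_mm=2.0):
    """Return (MRE in mm, SDR in percent) for paired lists of (x, y) pixel coordinates."""
    errors = [math.hypot(px - tx, py - ty) * pixel_mm   # radial error in millimetres
              for (px, py), (tx, ty) in zip(pred, truth)]
    mre = sum(errors) / len(errors)
    sdr = 100.0 * sum(e <= threshold_mm for e in errors) / len(errors)
    return mre, sdr

# Hypothetical predictions and labels: errors of 10 px, 30 px and 0 px.
pred_pts = [(100, 100), (200, 230), (50, 50)]
true_pts = [(100, 110), (200, 200), (50, 50)]
mre, sdr = mre_and_sdr(pred_pts, true_pts)
```

In this toy case the radial errors are 1.0 mm, 3.0 mm and 0.0 mm, so two of the three keypoints fall within the 2.0 mm threshold.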
[0036] In one implementation, step 5 includes: inputting the preprocessed test set images obtained in Step 2-3 into the pre-trained model with the best performance obtained in Step 4-4, and the model outputting a predicted Gaussian heatmap with the same scale as the input image.
[0037] In one implementation, step 6, obtaining the key point positions on each lateral cephalometric image based on the Gaussian heatmap, specifically includes: inputting the predicted Gaussian heatmap obtained in step 5 into the decoder part of the codec module to generate the predicted key point coordinate values on the test set images.
[0038] The key point detection method for lateral cephalometric radiographs provided by this invention has the following beneficial effects:
[0039] 1. The entire cephalometric radiograph key point detection model is subdivided into modules such as backbone network, neck, prediction head, and encoder / decoder, realizing a standardized design process and a fully automatic and accurate key point localization method;
[0040] 2. The neck module employs a feature fusion neural network with a structure similar to a feature pyramid network, containing three feature fusion paths, achieving a good balance between network parameter count and keypoint detection efficiency. Unlike existing methods, whose very shallow networks limit performance, this invention achieves accurate keypoint detection in lateral cephalometric radiographs.
[0041] 3. This invention can generate a high-resolution heatmap with the same scale as the model input image, and predict the coordinate values of key points based on the high-resolution heatmap, which further alleviates the quantization error problem and has high computational efficiency.
Attached Figure Description
[0042] To better illustrate the technical solution of the present invention, related drawings are provided below for illustrative purposes. The embodiments of the present invention can be more intuitively understood and grasped through these drawings. It should be noted that these drawings are for illustration and example only and are not intended to limit the scope of the invention.
[0043] Figure 1 is a flowchart of the method for detecting key points in lateral cephalometric radiographs based on feature fusion neural networks in an embodiment of the present invention;
[0044] Figure 2 is a schematic diagram of the key point detection model for lateral cephalometric radiographs in an embodiment of the present invention;
[0045] Figure 3 is a schematic diagram of the feature fusion module in an embodiment of the present invention;
[0046] Figure 4 is a schematic diagram of the codec in an embodiment of the present invention.
Detailed Implementation
[0047] To more clearly illustrate the technical content and advantages of the present invention, the present invention will be described in detail with reference to the following embodiments. Multiple embodiments of the present invention can be understood in conjunction with the accompanying drawings, which are merely illustrative and should not be construed as limiting the present invention.
[0048] This embodiment establishes a standardized keypoint detection process for lateral cephalometric radiographs and a meticulously designed quantization bias reduction process, achieving a fully automated and accurate keypoint detection method for lateral cephalometric radiographs. This embodiment utilizes a feature fusion neural network, achieving a good balance between the number of network parameters and keypoint detection efficiency.
[0049] Referring to Figure 1, the basic process of the key point detection method for lateral cephalometric radiographs based on feature fusion neural networks provided in this embodiment is as follows:
[0050] Step 1: Obtain lateral cephalometric radiographs and doctor-labeled keypoints. Specifically, this involves using a dataset of lateral cephalometric radiographs with labeled keypoints. This dataset contains 400 lateral cephalometric radiographs, divided into a training set of 150 images, a Test1 dataset of 150 images, and a Test2 dataset of 100 images. Each image in the dataset has a size of 1935×2400 pixels, each pixel covers 0.1×0.1 mm², and each image contains the labeling information of 19 key points. Each key point was manually marked and checked by two doctors, and the average of the two doctors' labeling results was used as the actual key point label.
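As a small illustration of how the reference labels are formed, the snippet below averages two annotators' coordinates and converts them to millimetres. The annotation arrays are randomly generated placeholders, not real data, and the variable names are hypothetical.

```python
import numpy as np

# Placeholder annotations from two doctors for one image: shape (19, 2), (x, y) in pixels.
rng = np.random.default_rng(0)
doctor_a = rng.uniform(0, 1935, size=(19, 2))
doctor_b = doctor_a + rng.normal(0, 5, size=(19, 2))   # second reading, slightly different

label = (doctor_a + doctor_b) / 2.0    # averaged keypoints used as the actual label
label_mm = label * 0.1                 # 0.1 mm per pixel, as stated for this dataset
```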
[0051] Step 2: Split the images and labels into training, validation, and test sets, and preprocess the dataset, specifically including:
[0052] Step 2-1: Divide the cephalometric radiograph dataset obtained in Step 1 into a training set Train with 150 images and labels, a test set Test1 with 150 images and labels, and a test set Test2 with 100 images and labels. Use the training set Train for training, the test set Test1 for validation, and the test set Test2 for testing.
[0053] Step 2-2: Perform data augmentation on the training set obtained in Step 2-1 to obtain the augmented training set. Data augmentation operations include random flipping, random cropping, and random bounding box transformation.
[0054] Step 2-3: Pad the validation set images and test set images obtained in Step 2-1 and the augmented training set images obtained in Step 2-2 with zeros to a uniform size (1024, 1024, 3) to obtain preprocessed training set images, validation set images, and test set images.
[0055] Step 2-4: Convert the keypoint labels in the validation and test sets obtained in Step 2-1 and the keypoint labels in the augmented training set obtained in Step 2-2 into JSON file format required by the model framework input, to obtain training set JSON label file, validation set JSON label file and test set JSON label file.
[0056] Step 2-5: Read the training set JSON label file obtained in Step 2-4 to obtain the keypoint coordinates of each image in the preprocessed training set images obtained in Step 2-3. For each keypoint in each training set image, create a blank heatmap with the same size as the image. The initial values of the blank heatmap are all zero. Generate a Gaussian distribution for each keypoint on the blank heatmap. The Gaussian distribution can be represented as a two-dimensional normal distribution, as shown in the following formula:
[0057] $G(x, y) = \exp\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right)$;
[0058] Where G(x, y) represents the output Gaussian heatmap, (x0, y0) are the coordinates of the keypoint, (x, y) represent the coordinates of any pixel on the blank heatmap, and σ is the standard deviation of the Gaussian distribution. For each keypoint on each training set image, an independent Gaussian heatmap is generated. The Gaussian heatmaps generated by all keypoints on each training set image are superimposed to form a multi-channel heatmap, which serves as the target for model training, resulting in the target Gaussian heatmap (1024*1024*19) for each training set image.
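The zero-padding of Step 2-3 can be sketched as below. The text does not say where the padding is placed or how images larger than 1024 pixels are brought within that size, so this sketch assumes the image already fits (for example after resizing or cropping) and pads on the bottom and right, which leaves keypoint coordinates unchanged. The name `pad_to_square` and the example image size are hypothetical.

```python
import numpy as np

def pad_to_square(image, size=1024):
    """Zero-pad an (H, W, C) image on the bottom and right to (size, size, C).

    Assumption: the image already fits within size x size, and padding is added
    on the bottom/right so that keypoint coordinates stay unchanged.
    """
    h, w, c = image.shape
    if h > size or w > size:
        raise ValueError("image larger than target size")
    padded = np.zeros((size, size, c), dtype=image.dtype)
    padded[:h, :w] = image                  # copy the image into the top-left corner
    return padded

image = np.full((1024, 826, 3), 255, dtype=np.uint8)   # hypothetical resized radiograph
padded = pad_to_square(image)
```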
[0059] Step 3: Input the training set into the keypoint detection model and train the model, specifically including:
[0060] Step 3-1: Input the preprocessed training set images obtained in Step 2-3 into the cephalometric lateral radiograph key point detection model. After each training cycle, obtain the predicted Gaussian heatmap of each training set image at multiple scales output by the model.
[0061] Step 3-2: Compare the predicted Gaussian heatmap of each training image obtained in Step 3-1 with the target Gaussian heatmap of each training image obtained in Step 2-5 to calculate the loss, obtaining the loss for each training image in each cycle. Average the losses across all training images to obtain the total loss for each cycle of model training. The loss function calculation formula is:
[0062] $L = \frac{1}{N} \sum_{j=1}^{N} \lVert H_j - G_j \rVert_2^2$;
[0063] Where L is the total loss for each training image, H_j represents the predicted heatmap of the j-th keypoint in each training image, G_j represents the target Gaussian heatmap corresponding to the j-th keypoint, N is the total number of keypoints, and ‖·‖_2 represents the L2 norm between the predicted heatmap and the target heatmap.
[0064] Referring to Figure 2, in Step 3-1, the lateral cephalometric radiograph key point detection model provided in this embodiment includes modules such as a backbone network, a neck, a prediction head, and an encoder / decoder.
[0065] The backbone network module uses a pre-trained HRNet model as its feature extraction module. HRNet is available as HRNet-W32 and HRNet-W48, where 32 and 48 represent the widths of the high-resolution sub-networks in the last three stages of the model, respectively. This embodiment uses the HRNet-W48 pre-trained model. The HRNet backbone network contains four parallel sub-networks. The feature map output by the first sub-network has a resolution one-quarter that of the input image. From top to bottom, each sub-network has half the resolution of the previous one and twice the width (number of channels). These multi-resolution sub-networks are connected in parallel and perform repeated multi-scale fusion, so that each of the high-to-low resolution representations repeatedly receives information from the other parallel representations, resulting in rich high-resolution representations. After the training set image (1024*1024*3) is input into the lateral cephalometric radiograph keypoint detection model, features are first extracted by the HRNet backbone network to obtain feature maps F2 to F5. Fi denotes a feature map whose resolution is that of the H×W input divided by 2 to the power of i in each dimension. Feature map F2 has a resolution of one-quarter of the input image resolution, and feature map F5 has a resolution of one-thirty-second of the input image resolution. The numbers of channels in feature maps F2 to F5 extracted by the backbone network HRNet are 48, 96, 192, and 384, respectively. These feature maps are then fused in the neck module.
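The feature map sizes quoted above follow directly from the stride and width rules. The short helper below, whose name `hrnet_feature_shapes` is hypothetical, only reproduces that arithmetic; it does not build or run HRNet.

```python
def hrnet_feature_shapes(height=1024, width=1024, base_width=48):
    """Return {name: (H, W, C)} for F2..F5 following the rules stated in the text."""
    shapes = {}
    for i in range(2, 6):
        stride = 2 ** i                          # F_i is downsampled by 2**i
        channels = base_width * 2 ** (i - 2)     # width doubles at each lower resolution
        shapes[f"F{i}"] = (height // stride, width // stride, channels)
    return shapes

shapes = hrnet_feature_shapes()
for name, shape in shapes.items():
    print(name, shape)
```

For a 1024×1024 input and HRNet-W48 this gives F2 = (256, 256, 48), F3 = (128, 128, 96), F4 = (64, 64, 192) and F5 = (32, 32, 384), matching the sizes used in the neck module description.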
[0066] The neck module employs a feature fusion neural network with a structure similar to a feature pyramid network, which includes two feature fusion paths from small-scale feature maps to large-scale feature maps (M5 to M2 and G5 to G2) and one feature fusion path from large-scale feature maps to small-scale feature maps (W2 to W5).
[0067] Referring to Figure 3, to efficiently fuse features from multiple levels, the feature fusion module uses a series of separable convolutions to optimize model parameters. First, the feature maps F and M are passed through convolutional blocks and an upsampling operation, respectively, and then concatenated; the concatenated feature map is passed through a pointwise convolutional block. Next, two consecutive modules are applied, each consisting of depthwise convolution, batch normalization (BN), pointwise convolution, and the GELU activation function. Finally, a pointwise convolution generates the new fused feature map M.
[0068] In this embodiment, the feature map F5 (32*32*384) extracted by the backbone network HRNet is first processed by convolution to obtain feature map M5 (32*32*192). Then, a feature fusion operation is performed. First, feature map M5 is upsampled by interpolation to the resolution of the previous layer (F4) to ensure size matching, resulting in feature map M5 (64*64*192). Convolution is then performed on the previous-scale feature map F4 (64*64*192) with the number of channels remaining unchanged. Then, the interpolated feature map M5 is concatenated with the convolved feature map F4 along the channel dimension, resulting in a concatenated feature map (64*64*384). Subsequently, a pointwise convolution is performed on the concatenated feature map to halve the number of channels (64*64*192). Then, two consecutive fusion modules are applied, and finally, a pointwise convolution generates the new feature map M4 (64*64*192), which fuses the features of M5 and F4.
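The shape bookkeeping of the M5 and F4 fusion step can be traced with NumPy. This sketch covers only the upsample, concatenate and channel-reduction part; the two depthwise/BN/pointwise/GELU modules and the final pointwise convolution are omitted. Nearest-neighbor upsampling and random weights are assumptions made for illustration, since the text does not specify the interpolation mode.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map (assumed mode)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def pointwise_conv(x, weight):
    """1x1 convolution: mixes channels independently at every pixel."""
    return x @ weight                            # (H, W, C_in) @ (C_in, C_out)

m5 = rng.standard_normal((32, 32, 192))         # M5 after the initial convolution
f4 = rng.standard_normal((64, 64, 192))         # F4 after its channel-preserving convolution

m5_up = upsample2x(m5)                                    # (64, 64, 192)
concat = np.concatenate([m5_up, f4], axis=-1)             # (64, 64, 384)
w_reduce = rng.standard_normal((384, 192)) / np.sqrt(384) # placeholder 1x1 weights
reduced = pointwise_conv(concat, w_reduce)                # (64, 64, 192)
```

A pointwise convolution is just a per-pixel matrix multiplication over channels, which is why it can halve the channel count without touching the spatial resolution.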
[0069] In this embodiment, the feature fusion process of subsequent feature maps M4 and F3, and M3 and F2 is the same as that of M5 and F4. Finally, the fused feature map M2 (256*256*48) is output. Then, the feature fusion path from large-scale feature map to small-scale feature map (W2 to W5) is performed. First, feature map M2 is convolved to obtain feature map W2 (256*256*96). Then, feature fusion is performed. Feature map W2 is downsampled and interpolated to reduce its size to the resolution of the next layer (M3), at which point feature map W2 (128*128*96) is obtained. Convolution is performed on the next scale feature map M3 (128*128*96) with the number of channels unchanged. Then, the interpolated feature map W2 and the convolved feature map M3 are concatenated by channel to obtain the concatenated feature map (128*128*192). Subsequently, point convolution operations are performed on the concatenated feature map to reduce the number of channels to half (128*128*96). Then, two consecutive fusion modules are applied, and finally, point convolution operations are performed to generate a new feature map W3 (128*128*96) after fusing the W2 and M3 features.
[0070] The feature fusion process from large-scale feature maps to small-scale feature maps (W2 to W5) is similar to that from M5 to M2, except that the resolution decreases while the number of channels gradually increases. Conversely, in the feature fusion process from small-scale feature maps to large-scale feature maps (M5 to M2), the resolution of the feature map increases after each fusion, while the number of channels gradually decreases. Finally, a second feature fusion process from small-scale feature maps to large-scale feature maps (G5 to G2) is performed, using the same method as from M5 to M2, resulting in the multi-scale fused feature map G2.
[0071] Referring to Figure 4, the prediction head module receives the fused feature map from the neck module and maps it to a high-resolution heatmap. This includes encoding the fused feature map and mapping it to the heatmap using pointwise convolution and pixel rearrangement. First, pointwise convolution is performed on the feature map G2 obtained from the neck module to adjust the number of channels to the number of embedding channels, generating a keypoint embedding representation, and then the GELU activation function is applied. Finally, a feature map H is generated through convolutional layers, and pixel rearrangement is used to upsample H to improve the spatial resolution, obtaining a final heatmap H2 (1024*1024*19) with the same resolution as the input image.
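Pixel rearrangement (often called pixel shuffle) trades channels for spatial resolution. The NumPy sketch below assumes an upscaling factor of 4, inferred from going from a 256×256 feature map to a 1024×1024 heatmap, so the feature map H would need 19 × 16 channels; it also assumes the channel ordering used by PyTorch's `PixelShuffle`, adapted to an (H, W, C) layout. A 16×16 map stands in for the 256×256 one to keep the example light.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (H, W, C*r*r) into (H*r, W*r, C)."""
    h, w, c_in = x.shape
    c = c_in // (r * r)
    x = x.reshape(h, w, c, r, r)          # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)        # reorder to (h, i, w, j, c)
    return x.reshape(h * r, w * r, c)

small = np.arange(2 * 2 * 4).reshape(2, 2, 4)   # H=2, W=2, one output channel, r=2
out = pixel_shuffle(small, 2)                   # shape (4, 4, 1)

# Scaled-down stand-in for H: 19 keypoints times 4*4 sub-pixel positions per channel.
h_map = np.zeros((16, 16, 19 * 16), dtype=np.float32)
heatmap = pixel_shuffle(h_map, 4)               # shape (64, 64, 19)
```

Each group of r×r channels at one low-resolution location becomes an r×r block of pixels in the output, so no interpolation is involved in the upsampling.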
[0072] Step 4: After each training cycle, input the validation set into the model to verify its performance. After all training is completed, the best-performing pre-trained model is obtained, which includes:
[0073] Step 4-1: After each training cycle, input the preprocessed validation set images obtained in Step 2-3 into the cephalometric radiograph key point detection model to obtain a predicted Gaussian heatmap with the same scale as the input image output by the model.
[0074] Step 4-2: Input the predicted Gaussian heatmap obtained in Step 4-1 into the decoder part of the codec module to obtain the predicted key point coordinates on the input image;
[0075] Step 4-3: Compare the predicted keypoint coordinates on the input image obtained in Step 4-2 with the target keypoint coordinates in the validation set JSON label file obtained in Step 2-4, and calculate the mean radial error (MRE), i.e., the average distance between the predicted keypoint coordinates and the target keypoint coordinates; then calculate the percentage of predicted keypoints whose distance from the corresponding target keypoint is less than or equal to 2.0 mm, obtaining the successful detection rate within the 2.0 mm range (SDR2.0);
[0076] Step 4-4: Compare the SDR2.0 value obtained on the validation set in the current cycle (Step 4-3) with the SDR2.0 values obtained in all previous cycles. If the current value is higher than every previous value, the current model has the best performance so far and is saved as the current best model. After all model training is completed, the pre-trained model with the best performance is obtained.
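The model selection rule in Step 4-4 amounts to tracking a running maximum of the validation SDR2.0. A minimal sketch, with a hypothetical function name and made-up SDR values, assuming that ties do not replace the saved model:

```python
def select_best_epoch(sdr_per_epoch):
    """Return (best_epoch, best_sdr); an epoch wins only if strictly higher than all earlier ones."""
    best_epoch, best_sdr = None, float("-inf")
    for epoch, sdr in enumerate(sdr_per_epoch, start=1):
        if sdr > best_sdr:
            best_epoch, best_sdr = epoch, sdr    # a real training loop would save the weights here
    return best_epoch, best_sdr

history = [61.2, 70.5, 69.8, 74.1, 74.1]         # hypothetical validation SDR2.0 values (%)
best_epoch, best_sdr = select_best_epoch(history)
```

With these values the fourth cycle is kept: the fifth cycle matches but does not exceed it, so the saved model is not overwritten.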
[0077] In this embodiment, the encoder-decoder module used in step 4-2 includes an encoder and a decoder. The encoder preprocesses the training set JSON label file obtained in step 2-4, encoding the coordinate values of the keypoint labels in the training set into a 2D Gaussian distribution heatmap to generate a target Gaussian heatmap, which serves as the target for model training. The decoder decodes the predicted Gaussian heatmap output by the model into the keypoint coordinate values on the model input image. The formula for the decoder to generate the keypoint coordinate values is:
[0078] $(x_{\max}, y_{\max}) = \arg\max_{(x, y)} H(x, y)$;
[0079] Where H(x, y) represents the value of the heatmap at pixel (x, y), and (x_max, y_max) is the location of the maximum value in the heatmap, which is also the keypoint location. The heatmap H is a two-dimensional matrix with a size of W×H, and each channel represents the probability distribution of a keypoint. The point with the largest value in the heatmap, i.e., the maximum response point, is the most likely keypoint location in the heatmap.
[0080] Step 5: Input the test set into the pre-trained model to obtain the Gaussian heatmap of key points on the lateral cephalometric radiograph predicted by the model. Specifically, this includes: inputting the preprocessed test set image obtained in Step 2-3 into the best-performing pre-trained model obtained in Step 4-4, and the model outputting a predicted Gaussian heatmap with the same scale as the input image.
[0081] Step 6: The Gaussian heatmap represents the probability value of each pixel in the lateral cephalometric image as a key point. The key point position on each lateral cephalometric image is obtained based on the Gaussian heatmap. Specifically, the predicted Gaussian heatmap obtained in Step 5 is input into the decoder part of the codec module to generate the predicted key point coordinate values on the test set images.
[0082] This embodiment provides a keypoint detection method for lateral cephalometric radiographs based on feature fusion neural networks, focusing on solving the quantization bias problem and designing a unified keypoint detection process. This invention generates high-resolution heatmaps, further mitigating the bias problem, and improves keypoint detection accuracy by predicting keypoint coordinates from the high-resolution heatmaps.
[0083] The embodiments of the present invention have been described in detail above to help understand the technical solutions and effects of the present invention. However, these specific embodiments should not be construed as limiting the scope of the present invention. For those skilled in the art, various modifications, variations, or equivalent substitutions can be made without departing from the spirit and technical solutions of the present invention. All such variations should be considered to be included within the protection scope of the present invention.
Claims
1. A method for key point detection in lateral cephalometric radiographs based on feature fusion neural networks, characterized in that:
The keypoint detection model comprises a backbone network, a neck, a prediction head, and an encoder-decoder module. The backbone network module uses a pre-trained HRNet model as its feature extraction module to extract multi-scale feature maps from the input image. The neck module employs a feature fusion network, containing two feature fusion paths from small-scale to large-scale feature maps and one feature fusion path from large-scale to small-scale feature maps. The prediction head module receives the fused feature maps from the neck module and maps them onto a high-resolution heatmap. The encoder-decoder module includes an encoder and a decoder. The encoder encodes the coordinate values of keypoint labels in the training set into a 2D Gaussian heatmap, generating a target Gaussian heatmap as the target for model training. The decoder decodes the predicted Gaussian heatmap output by the model into the keypoint coordinate values on the input image.
The method comprises the following steps:
Step 1: Obtain lateral cephalometric radiographs and key point labels marked by the doctor;
Step 2: Split the images and labels into training, validation, and test sets, and preprocess the dataset;
Step 3: Input the training set into the keypoint detection model and train the model;
Step 4: After each training cycle, input the validation set into the model to verify the model performance. After all training is completed, the best-performing pre-trained model is obtained.
Step 5: Input the test set into the pre-trained model to obtain the Gaussian heatmap of key points on the lateral cephalometric radiograph predicted by the model;
Step 6: The Gaussian heatmap represents the probability value of each pixel in the lateral cephalometric radiograph being a key point. The key point positions on each lateral cephalometric radiograph are obtained based on the Gaussian heatmap.
2. The method for detecting key points in lateral cephalometric radiographs based on feature fusion neural networks according to claim 1, characterized in that, in Step 1, obtaining the lateral cephalometric radiographs and the key point labels marked by the doctor comprises: using a dataset of lateral cephalometric radiographs with calibrated key point labels.
3. The method for detecting key points in lateral cephalometric radiographs based on feature fusion neural networks according to claim 1, characterized in that, in Step 2, splitting the images and labels into training, validation, and test sets and preprocessing the dataset comprises:
Step 2-1: Divide the cephalometric radiograph dataset obtained in Step 1 into a training set Train with 150 images and labels, a test set Test1 with 150 images and labels, and a test set Test2 with 100 images and labels. Use the training set Train for training, the test set Test1 for validation, and the test set Test2 for testing;
Step 2-2: Perform data augmentation on the training set obtained in Step 2-1 to obtain the augmented training set. The data augmentation operations include random flipping, random cropping, and random bounding box transformation;
Step 2-3: Zero-pad the validation set images and test set images obtained in Step 2-1 and the augmented training set images obtained in Step 2-2 to a uniform size of (1024, 1024, 3), to obtain the preprocessed training set images, validation set images, and test set images;
Step 2-4: Convert the keypoint labels in the validation and test sets obtained in Step 2-1 and the keypoint labels in the augmented training set obtained in Step 2-2 into the JSON file format required as input by the model framework, to obtain the training set JSON label file, the validation set JSON label file, and the test set JSON label file;
Step 2-5: Read the training set JSON label file obtained in Step 2-4 to obtain the keypoint coordinates of each image in the preprocessed training set images obtained in Step 2-3. For each keypoint in each training set image, create a blank heatmap of the same size as the image, with all values initialized to zero, and generate a Gaussian distribution for that keypoint on the blank heatmap. The Gaussian distribution can be represented as a two-dimensional normal distribution, as shown in the following formula:
$$G(x, y) = \exp\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right)$$
where G(x, y) represents the output Gaussian heatmap, (x0, y0) are the coordinates of the keypoint, (x, y) are the coordinates of any pixel on the blank heatmap, and σ is the standard deviation of the Gaussian distribution. An independent Gaussian heatmap is generated for each keypoint on each training set image. The Gaussian heatmaps of all keypoints on each training set image are stacked to form a multi-channel heatmap, which serves as the target for model training, thus obtaining the target Gaussian heatmap for each training set image.
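The zero-padding of Step 2-3 and the heatmap encoding of Step 2-5 can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the function names `pad_to_size` and `encode_heatmaps`, the default `sigma`, the choice to pad at the bottom and right (so keypoint coordinates stay unchanged), and the use of an unnormalized Gaussian with peak value 1 are all assumptions made for the example.

```python
import numpy as np


def pad_to_size(image, size=(1024, 1024)):
    """Zero-pad an (H, W, C) image at the bottom/right to size (assumed layout)."""
    h, w = image.shape[:2]
    if h > size[0] or w > size[1]:
        raise ValueError("image is larger than the target size")
    padded = np.zeros((size[0], size[1], image.shape[2]), dtype=image.dtype)
    padded[:h, :w] = image  # original pixels keep their coordinates
    return padded


def encode_heatmaps(keypoints, height, width, sigma=2.0):
    """Return a (K, height, width) array with one Gaussian heatmap per keypoint.

    keypoints is a sequence of (x0, y0) pixel coordinates; sigma is a
    placeholder value, not one taken from the claims.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmaps = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for k, (x0, y0) in enumerate(keypoints):
        # Unnormalized 2D Gaussian centered on the keypoint (peak value 1).
        heatmaps[k] = np.exp(
            -((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2)
        )
    return heatmaps
```

Each keypoint occupies its own channel, so the stacked array corresponds to the multi-channel target heatmap described in Step 2-5.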
4. The method for detecting key points in lateral cephalometric radiographs based on feature fusion neural networks according to claim 1, characterized in that, in Step 3, inputting the training set into the keypoint detection model and training the model comprises:
Step 3-1: Input the preprocessed training set images obtained in Step 2-3 into the keypoint detection model, and obtain the predicted Gaussian heatmap of each training set image at multiple scales after each training cycle;
Step 3-2: Compare the predicted Gaussian heatmap of each training set image obtained in Step 3-1 with the target Gaussian heatmap of that image obtained in Step 2-5 to calculate the loss for each training set image in each cycle. Average the losses across all training set images to obtain the total loss for each cycle of model training. The loss function is calculated as:
$$L = \frac{1}{N} \sum_{j=1}^{N} \left\lVert H_j - G_j \right\rVert_2^2$$
where L is the total loss for each training set image, H_j represents the predicted heatmap of the j-th keypoint in the training set image, G_j represents the target Gaussian heatmap corresponding to the j-th keypoint, N is the total number of keypoints, and ||·||2 represents the L2 norm between the predicted heatmap and the target heatmap.
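The loss computation of Step 3-2 can be illustrated with a short NumPy sketch. It assumes the per-image loss is the squared L2 distance between predicted and target heatmaps, averaged over the N keypoints; the exact normalization is an assumption, and the names `heatmap_loss` and `epoch_loss` are hypothetical.

```python
import numpy as np


def heatmap_loss(pred, target):
    """Per-image loss: mean over keypoints of the squared L2 distance.

    pred and target are (N, H, W) arrays, one channel per keypoint.
    Averaging over N is an assumed normalization.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    diff = (pred - target).reshape(pred.shape[0], -1)
    per_keypoint = np.sum(diff ** 2, axis=1)  # squared L2 norm per channel
    return float(per_keypoint.mean())


def epoch_loss(preds, targets):
    """Average the per-image losses over all training images of one cycle."""
    return float(np.mean([heatmap_loss(p, t) for p, t in zip(preds, targets)]))
```

In an actual training loop the same quantity would be computed with the framework's differentiable tensor operations so that gradients can flow back into the model.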
5. The method for detecting key points in lateral cephalometric radiographs based on feature fusion neural networks according to claim 1, characterized in that, in Step 4, inputting the validation set into the keypoint detection model after each training cycle to verify its performance, and obtaining the best-performing pre-trained model after all training is completed, comprises:
Step 4-1: After each training cycle, input the preprocessed validation set images obtained in Step 2-3 into the keypoint detection model to obtain the predicted Gaussian heatmap output by the model at the same scale as the input image;
Step 4-2: Input the predicted Gaussian heatmap obtained in Step 4-1 into the decoder of the encoder-decoder module to obtain the predicted keypoint coordinates on the input image;
Step 4-3: Compare the predicted keypoint coordinates on the input image obtained in Step 4-2 with the target keypoint coordinates in the validation set JSON label file obtained in Step 2-4, and calculate the mean radial error (MRE), i.e. the average distance between the predicted and target keypoint coordinates; also calculate the percentage of predicted keypoints whose distance from the corresponding target keypoint is less than or equal to 2.0 mm, obtaining the successful detection rate within 2.0 mm (SDR2.0);
Step 4-4: Compare the SDR2.0 value obtained on the validation set in the current cycle in Step 4-3 with the SDR2.0 values obtained in all previous cycles. If the SDR2.0 value of the current cycle is higher than all previous values, the current model has the best performance so far and is saved as the current best model. After all model training is completed, the pre-trained model with the best performance is obtained.
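The validation metrics of Step 4-3 and the model selection of Step 4-4 can be sketched as below. This is an illustrative NumPy example under stated assumptions: coordinates are given in pixels and converted with a `mm_per_pixel` spacing (the claims do not state the pixel spacing), and `mre_and_sdr` and `BestModelTracker` are hypothetical names.

```python
import numpy as np


def mre_and_sdr(pred, target, mm_per_pixel, threshold_mm=2.0):
    """Return (MRE in mm, SDR in percent) for (K, 2) coordinate arrays.

    mm_per_pixel is an assumed pixel spacing used to convert pixel
    distances to millimetres.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    errors = np.linalg.norm(pred - target, axis=-1) * mm_per_pixel
    mre = float(errors.mean())
    sdr = float((errors <= threshold_mm).mean() * 100.0)
    return mre, sdr


class BestModelTracker:
    """Track the training cycle with the highest validation SDR2.0."""

    def __init__(self):
        self.best_sdr = float("-inf")
        self.best_epoch = None

    def update(self, epoch, sdr):
        """Return True when this cycle beats all previous ones (save the model)."""
        if sdr > self.best_sdr:
            self.best_sdr = sdr
            self.best_epoch = epoch
            return True
        return False
```

A training loop would call `update` once per cycle and write a checkpoint whenever it returns True, so the last saved checkpoint is the best-performing model.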
6. The method for detecting key points in lateral cephalometric radiographs based on feature fusion neural networks according to claim 1, characterized in that, In step 5, the step of inputting the test set into the pre-trained model to obtain the Gaussian heatmap of key points on the lateral cephalometric radiograph predicted by the model includes: for the best-performing pre-trained model obtained in step 4-4, inputting the preprocessed test set images obtained in step 2-3 into the pre-trained model for testing, and the model outputting a predicted Gaussian heatmap with the same scale as the input image.
7. The method for detecting key points in lateral cephalometric radiographs based on feature fusion neural networks according to claim 1, characterized in that, in Step 6, the Gaussian heatmap representing, for each pixel of the lateral cephalometric radiograph, the probability that the pixel is a key point, and obtaining the key point positions on each lateral cephalometric radiograph from the Gaussian heatmap, comprises: inputting the predicted Gaussian heatmap obtained in Step 5 into the decoder of the encoder-decoder module to generate the predicted key point coordinate values on the test set images.
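One simple way to decode a predicted heatmap into coordinates is to take the location of the maximum response in each keypoint channel. The claims do not specify the decoding rule, so the argmax approach below is an assumption chosen for illustration, and `decode_heatmaps` is a hypothetical name.

```python
import numpy as np


def decode_heatmaps(heatmaps):
    """Map a (K, H, W) heatmap stack to a (K, 2) array of (x, y) coordinates.

    Each keypoint is placed at the pixel with the highest value in its
    channel (argmax decoding, an assumed rule).
    """
    heatmaps = np.asarray(heatmaps)
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)
```

Because the heatmap has the same scale as the input image, the returned pixel indices can be used directly as image coordinates without rescaling.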