Real-time visual monitoring method based on edge AI
By deploying a lightweight YOLOv8-pose model on edge devices, the method monitors vehicle parking status in weighbridge scenarios in real time, addressing the high cost and low accuracy of traditional systems. It suits scenarios where weighbridges are moved frequently and provides an intelligent upgrade solution.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: NANJING UNIV OF SCI & TECH
- Filing Date: 2024-12-17
- Publication Date: 2026-06-19
AI Technical Summary
Existing unattended weighbridge systems are costly and cumbersome to install and maintain, and traditional monocular-vision object positioning is not accurate enough to meet real-time and precision requirements. This limits their use in settings such as construction sites, where weighbridges are moved frequently.
A lightweight YOLOv8-pose model based on edge AI is adopted. Through training on multi-view datasets and data augmentation, a key point detection model for vehicle wheels and four corners of the weighbridge is constructed. Combined with homography transformation, the real-time monitoring of vehicle parking status is realized and deployed on edge devices.
It reduces the number of model parameters and computational load, improves detection speed and accuracy, and achieves accurate positioning of vehicle parking status. It is suitable for scenarios with frequently moving weighbridges, and reduces system cost and installation and maintenance complexity.
Figure CN122244777A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of image processing based on edge intelligence and deep learning, and in particular relates to a real-time visual monitoring method, based on edge AI, for vehicle parking status in a weighbridge weighing scenario.
Background Technology
[0002] With the rapid development of deep learning technology, more and more artificial intelligence applications are being implemented. Edge intelligence is a new technology that deploys artificial intelligence in edge scenarios, enabling the efficient execution of intelligent algorithms on resource-constrained edge devices, with significant advantages such as low latency and low cost. Currently, most deep learning models have high hardware resource requirements. How to ensure the accuracy of model calculation results while meeting the requirements of real-time computation is a significant challenge in edge intelligence research.
[0003] Weighing and measuring vehicles has always been an essential part of many industries. Traditional manual weighing methods are not only inefficient and costly, but also prone to false reporting and underreporting. To save costs and improve weighing efficiency, building an automated weighing management system is particularly important. Currently, most unattended weighbridge systems rely on infrared sensors to determine whether vehicles are correctly parked on the weighbridge, typically consisting of a barrier gate, infrared sensors, and surveillance cameras. While these systems achieve automation, their installation and maintenance costs are high, and disassembly and assembly are cumbersome, making them unsuitable for applications such as construction sites where weighbridges need to be frequently moved.
[0004] A real-time monitoring method for vehicle parking status in weighbridge scenarios based on monocular vision can effectively solve the above problems. However, monocular cameras cannot directly provide depth information, so the actual size and distance of objects cannot be estimated directly. Early monocular-vision object localization methods were mostly based on geometric principles and image feature extraction, and their application was limited by factors such as lighting and occlusion. In recent years, with the rapid development of deep learning, data-driven methods have become the mainstream direction for monocular-vision object localization. However, existing monocular 3D object detection models still have poor localization accuracy, making it difficult to meet the needs of practical applications.
Summary of the Invention
[0005] To address the problems existing in the prior art, this invention provides a real-time visual monitoring method for vehicle parking status in a weighbridge weighing scenario based on edge AI. This method can accurately locate the vehicle parking position and determine whether the vehicle has crossed the boundary while maintaining real-time performance.
[0006] The technical solution to achieve the purpose of this invention is: a real-time visual monitoring method for vehicle parking status in a weighbridge weighing scenario based on edge AI, comprising the following steps:
[0007] Step 1: Collect multi-view and multi-type engineering vehicle datasets and weighbridge datasets, and perform data cleaning and annotation to generate corresponding YOLO annotation files;
[0008] Step 2: Divide the engineering vehicle dataset and the weighbridge dataset into training set, validation set and test set respectively, and perform data augmentation on the training set;
[0009] Step 3: Construct two lightweight YOLOv8-pose models. Train the vehicle wheel key point detection model on the engineering vehicle dataset and train the weighbridge four corner key point detection model on the weighbridge dataset.
[0010] Step 4: Export the two lightweight YOLOv8-pose models as ONNX format models, then convert them to a model format supported by edge devices, and deploy them to edge devices;
[0011] Step 5: At fixed intervals, use the weighbridge four-corner key point detection model to detect the camera image, obtain the position coordinates of the four corners of the weighbridge, and record them;
[0012] Step 6: Use the vehicle wheel key point detection model to infer the camera image, and after identifying the vehicle, locate the wheel key points to obtain the coordinates of the vehicle wheel contact point with the ground.
[0013] Step 7: For the coordinates of the four corners of the weighbridge and the coordinates of the vehicle wheels in contact with the ground obtained in Step 5 and Step 6, use homography transformation to project them onto a top-down view, and use the coordinates of the four corners of the weighbridge and the coordinates of the vehicle wheels in contact with the ground to determine whether the vehicle is parked correctly.
[0014] Compared with existing technologies, the significant advantages of this invention are as follows: 1) This invention adopts a monocular vision-based detection method to monitor the parking status of vehicles in weighbridge weighing scenarios in real time and deploys it to edge devices. Compared with sensor-based unattended weighbridge systems, this solution has the characteristics of low cost, simple installation and maintenance, and easy disassembly and relocation. It is particularly suitable for scenarios such as construction sites where weighbridges need to be moved frequently, providing a new solution for the intelligent upgrade of automated weighbridge weighing management systems; 2) This invention improves the network structure of YOLOv8-pose, which can significantly reduce the number of model parameters and computational load, and while improving the detection speed of the model, it can still maintain high detection accuracy, thereby better meeting the real-time and accuracy requirements of weighbridge weighing scenarios.
[0015] The present invention will now be described in further detail with reference to the accompanying drawings.
Attached Figure Description
[0016] Figure 1 shows the real-time visual monitoring method for vehicle parking status in a weighbridge weighing scenario based on edge AI provided by this invention;
[0017] Figure 2 is a schematic diagram of the wheel keypoints of an engineering vehicle (Example 1);
[0018] Figure 3 is a schematic diagram of the wheel keypoints of an engineering vehicle (Example 2);
[0019] Figure 4 is a schematic diagram of the keypoints of a pit-type weighbridge;
[0020] Figure 5 is a schematic diagram of the keypoints of a bridge-type weighbridge;
[0021] Figure 6 is a structural diagram of the C2REPA module;
[0022] Figure 7 is a structural diagram of the REPAPoseHead detection head module;
[0023] Figure 8 is the network structure diagram of the lightweight YOLOv8-pose model;
[0024] Figure 9 is a diagram of the camera deployment positions;
[0025] Figure 10 is an example diagram of vehicle parking status detection in a weighbridge weighing scenario;
[0026] Figure 11 is a schematic diagram of the perspective conversion by homography transformation.
Detailed Implementation
[0027] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0028] Example 1:
[0029] With reference to Figure 1, this invention discloses a real-time visual monitoring method for vehicle parking status in a weighbridge weighing scenario based on edge AI. Starting from the original YOLOv8-pose model, the method adopts an improved re-parameterization feature extraction module, C2REPA (Re-parameterization Block with 2 Convolutions), which improves inference speed while still extracting features effectively. It also introduces an improved re-parameterization detection head, REPAPoseHead (Re-parameterization Pose Head), which generates multi-scale object detection results from the feature maps output by the backbone and the feature pyramid. The method includes the following steps:
[0030] Step 1: Collect multi-view, multi-type engineering vehicle datasets and weighbridge datasets, perform data cleaning and annotation, and generate corresponding YOLO annotation files. The specific steps are as follows:
[0031] Step 1.1: For the engineering vehicle dataset, a large number of web images containing concrete trucks, tank trucks, and trucks (including dump trucks, tractor-trailers, and other models) were selected and collected as the training and validation sets. In addition, a large number of keyframes containing engineering vehicles were extracted from video footage as the test set. During annotation, bounding boxes were used to mark each engineering vehicle and label its vehicle category. For the four front and rear wheels of each vehicle, keypoints were used to mark the points where the inner and outer sides of each wheel touch the ground (only visible keypoints were labeled), and each keypoint was assigned its class label, as shown in Figures 2 and 3. The four tires give a total of 8 keypoints (for trucks with more than 4 wheels, the middle wheels are ignored and only the four front and rear wheels are considered): the outer keypoint of the left front wheel (LFO), the inner keypoint of the left front wheel (LFI), the outer keypoint of the left rear wheel (LRO), the inner keypoint of the left rear wheel (LRI), the outer keypoint of the right front wheel (RFO), the inner keypoint of the right front wheel (RFI), the outer keypoint of the right rear wheel (RRO), and the inner keypoint of the right rear wheel (RRI). Following this method, a total of 2334 images of engineering vehicles (including concrete trucks, trucks, and tank trucks) and 892 images of weighbridges (including bridge weighbridges and pit weighbridges) were collected and labeled for model training. After annotation, the dataset was exported in YOLO format.
[0032] Step 1.2: For the weighbridge dataset, a large number of web images containing bridge weighbridges and pit weighbridges were selected and collected as the training and validation sets. In addition, a large number of keyframes containing weighbridges were extracted from video footage as the test set. During annotation, bounding boxes were used to mark the weighbridge platform area and label its category. For the four corners of the weighbridge, keypoints were used to mark their positions (only visible keypoints were labeled), and each keypoint was assigned its class label, as shown in Figures 4 and 5. There are a total of 4 keypoints: the top left keypoint (TLC), the bottom left keypoint (BLC), the top right keypoint (TRC), and the bottom right keypoint (BRC). After annotation, the dataset was exported in YOLO format.
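As an illustration of the exported annotation, the sketch below formats one labeled vehicle as a single YOLO pose label line. It assumes the commonly used layout `class cx cy w h` followed by `x y visibility` for each keypoint, with all coordinates normalized by the image size, and writes unlabeled keypoints as `0 0 0`. The function name, the keypoint order, and the visibility value 2 for labeled points are assumptions made for this example and are not specified in the text above.

```python
# Assumed keypoint order for the vehicle model (hypothetical; any fixed order works).
WHEEL_KEYPOINTS = ["LFO", "LFI", "LRO", "LRI", "RFO", "RFI", "RRO", "RRI"]


def yolo_pose_line(class_id, box_xyxy, keypoints, img_w, img_h, names=WHEEL_KEYPOINTS):
    """Format one object as a YOLO pose label line with normalized values.

    box_xyxy:  (x_min, y_min, x_max, y_max) in pixels.
    keypoints: dict name -> (x, y) in pixels, containing only the visible keypoints.
    """
    x0, y0, x1, y1 = box_xyxy
    fields = [
        str(class_id),
        f"{(x0 + x1) / 2 / img_w:.6f}",  # box centre x
        f"{(y0 + y1) / 2 / img_h:.6f}",  # box centre y
        f"{(x1 - x0) / img_w:.6f}",      # box width
        f"{(y1 - y0) / img_h:.6f}",      # box height
    ]
    for name in names:
        if name in keypoints:
            kx, ky = keypoints[name]
            fields += [f"{kx / img_w:.6f}", f"{ky / img_h:.6f}", "2"]
        else:
            # Keypoint not visible, so it was not labeled.
            fields += ["0.000000", "0.000000", "0"]
    return " ".join(fields)
```

The same function can serve the weighbridge dataset by passing `names=["TLC", "BLC", "TRC", "BRC"]`.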
[0033] Step 2: Divide the engineering vehicle dataset and the weighbridge dataset into training, validation, and test sets respectively, and perform data augmentation on the training set. The specific steps are as follows:
[0034] Step 2.1: For the engineering vehicle dataset, the collected network images are divided into training and validation sets in a 9:1 ratio, and keyframes extracted from the videos are used as the test set. For the training set, it is first horizontally flipped, and the flipped data is merged with the original training set. Then, the merged training set is further enhanced, including adding Gaussian noise and randomly adjusting the brightness to improve the model's generalization ability. The same data partitioning and data augmentation methods are used for the weighbridge dataset.
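The augmentation steps described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' pipeline: the keypoint order, the noise level, and the brightness factor are assumptions. One detail the text does not spell out is that mirroring an image turns a vehicle's left wheels into right wheels, so the sketch also swaps the left and right keypoint labels when flipping.

```python
import numpy as np

# Assumed keypoint order; a horizontal flip swaps left and right wheels.
NAMES = ["LFO", "LFI", "LRO", "LRI", "RFO", "RFI", "RRO", "RRI"]
FLIP_INDEX = [NAMES.index(("R" if n[0] == "L" else "L") + n[1:]) for n in NAMES]


def hflip(image, kpts):
    """Flip an HxW(xC) image and its (P, 3) keypoints [x_px, y_px, visibility]."""
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    out = kpts[FLIP_INDEX].copy()          # swap left/right labels
    vis = out[:, 2] > 0
    out[vis, 0] = (w - 1) - out[vis, 0]    # mirror x of labeled keypoints only
    return flipped, out


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` (in grey levels)."""
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` (e.g. drawn at random from [0.7, 1.3])."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)
```

For the weighbridge model, the analogous swap under these assumptions would exchange TLC with TRC and BLC with BRC.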
[0035] Step 3: Construct two lightweight YOLOv8-pose models. Train one on the engineering vehicle dataset to obtain a vehicle wheel keypoint detection model, and train the other on the weighbridge dataset to obtain a weighbridge four-corner keypoint detection model. The specific steps are as follows:
[0036] Step 3.1: Construct two lightweight YOLOv8-pose models with identical network structures and network size n. Replace the C2f module and the detection head in the YOLOv8-pose model with the C2REPA module and the REPAPoseHead detection head, respectively. The structure of the C2REPA module is shown in Figure 6, the structure of the REPAPoseHead detection head is shown in Figure 7, and the network structure of the lightweight YOLOv8-pose model is shown in Figure 8. The keypoint loss function used by YOLOv8-pose is calculated as follows:
[0037]
[0038]
[0039] Where $N$ is the total number of samples in each training batch, $n_i$ is the total number of objects in the i-th sample, $P$ is the total number of keypoints, $f_{ij}$ is the keypoint loss factor of the j-th object in the i-th sample, $\delta(v_{jk})$ is the visibility indicator function (1 when the k-th keypoint of the j-th object in the i-th sample is visible, otherwise 0), $\varepsilon$ is a very small number (used to avoid division by zero), $e_{jk}$ is the normalized error of the k-th keypoint of the j-th object in the i-th sample, $\sigma_k$ is the scale constant of the k-th keypoint, and $\mathrm{area}_j$ is the bounding-box area of the j-th object in the i-th sample. In the original YOLOv8-pose model, the scale constants $\sigma_k$ are set only for the 17 human keypoints, based on the COCO human keypoint detection dataset. For other types of keypoint tasks, the scale constant is uniformly set as:
[0040]
[0041] To improve the model's performance during training, the scale constant σ of all key points is uniformly set to 0.05 in this invention.
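For reference, one form of the keypoint loss that is consistent with the symbol definitions above is sketched below. It follows the publicly available Ultralytics YOLOv8 keypoint loss rather than the patent's own equations, so the exact normalization is an assumption, as is the symbol $d_{jk}$ (the Euclidean distance between the predicted and labeled positions of the k-th keypoint of the j-th object).

```latex
\begin{aligned}
L_{kpt} &= \frac{1}{P \sum_{i=1}^{N} n_i}
           \sum_{i=1}^{N} \sum_{j=1}^{n_i} f_{ij}
           \sum_{k=1}^{P} \bigl(1 - \exp(-e_{jk})\bigr)\, \delta(v_{jk}) \\
f_{ij}  &= \frac{P}{\sum_{k=1}^{P} \delta(v_{jk}) + \varepsilon} \\
e_{jk}  &= \frac{d_{jk}^{2}}{2\,(2\sigma_k)^{2}\,(\mathrm{area}_j + \varepsilon)} \\
\sigma_k &= \frac{1}{P} \quad \text{(assumed default for non-COCO keypoint tasks)}
\end{aligned}
```

Under this assumed default, the 8 wheel keypoints would use $\sigma_k = 0.125$ and the 4 weighbridge corners $\sigma_k = 0.25$. The value 0.05 chosen in this invention is smaller, which increases $e_{jk}$ for the same pixel error and therefore penalizes localization errors more strongly.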
[0042] During model training, the initial learning rate was adjusted to 0.001, the Mosaic enhancement strategy was enabled, and training was performed on two datasets respectively. The model weights for each round of training were saved until the model converged.
[0043] Step 3.2: Evaluate the model using the metrics $\mathrm{mAP@50}_{Box}$, $PCK_{weighted}$, and $OKS_{weighted}$, calculate a model score from these metrics, and select the model with the highest score on the validation set.
[0044] $\mathrm{mAP@50}_{Box}$ denotes the mean average precision (mAP) of the object detection task at an IoU threshold of 0.5. The formula for mAP is:
[0045] $$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \int_{0}^{1} P_i(r)\, dr$$
[0046] Where $N$ represents the total number of categories and $P_i(r)$ represents the precision-recall curve of the i-th category;
[0047] $PCK_{weighted}$: PCK (Percentage of Correct Keypoints) represents the weighted proportion of keypoints correctly predicted by the model across all categories. Its calculation formula is as follows:
[0048]
[0049] Where $N$ represents the total number of categories, $n_i$ represents the total number of objects in the i-th category, $PCK_i$ represents the proportion of correctly predicted keypoints in the i-th category, $P$ represents the total number of keypoints for each object, $d_{jk}$ represents the Euclidean distance between the predicted value and the manually labeled value of the k-th keypoint of the j-th object in the i-th category, $s_j$ represents the normalized reference scale of the j-th object in the i-th category, $T$ represents the tolerance threshold (set to 0.025 in this invention), $H(A)$ represents the indicator function (1 when the condition is met, 0 otherwise), and $\delta(v_{jk})$ represents the visibility indicator function (1 when the k-th keypoint of the j-th object in the i-th category is visible, and 0 otherwise);
[0050] $OKS_{weighted}$: OKS (Object Keypoint Similarity) represents the weighted similarity between model-predicted keypoints and ground-truth keypoints across all categories. Its calculation formula is as follows:
[0051]
[0052] Where $N$ represents the total number of categories, $n_i$ represents the total number of objects in the i-th category, $OKS_i$ represents the average similarity between predicted keypoints and ground-truth keypoints over all objects in the i-th category, $P$ represents the total number of keypoints for each object, $d_{jk}$ represents the Euclidean distance between the predicted value and the manually labeled value of the k-th keypoint of the j-th object in the i-th category, $s_j$ represents the normalized reference scale of the j-th object in the i-th category (in this invention, the square root of the object's bounding-box area), $\sigma_k$ represents the scale constant of the k-th keypoint (reflecting the detection difficulty of different keypoints, uniformly set to 0.5 in this invention), and $\delta(v_{jk})$ represents the visibility indicator function (1 when the k-th keypoint of the j-th object in the i-th category is visible, and 0 otherwise);
[0053] Once the model's $\mathrm{mAP@50}_{Box}$, $PCK_{weighted}$, and $OKS_{weighted}$ on the validation set have been calculated, the model score is computed, and the model with the highest score on the validation set is selected. The score calculation formula is as follows:
[0054] $\mathrm{score} = 0.5 \cdot PCK_{weighted} + 0.4 \cdot OKS_{weighted} + 0.1 \cdot \mathrm{mAP@50}_{Box}$ (14)
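The following sketch shows how the score in formula (14) could be computed from per-category keypoint predictions. Only the weights 0.5, 0.4, and 0.1, the threshold 0.025, the scale constant 0.5, and the use of the square root of the bounding-box area as reference scale come from the text. The per-category PCK and OKS expressions are assumptions: both are pooled over all visible keypoints in the category, OKS uses the common form $\exp(-d^2 / (2 s^2 \sigma^2))$, and the weighting across categories uses the object counts $n_i$. The function names are hypothetical.

```python
import numpy as np


def pck_oks(pred, gt, vis, scale, T=0.025, sigma=0.5):
    """PCK and OKS for one category (assumed pooled over visible keypoints).

    pred, gt: (n, P, 2) keypoint coordinates in pixels.
    vis:      (n, P) array, 1 for visible keypoints and 0 otherwise.
    scale:    (n,) reference scale s_j, here sqrt of the bounding-box area.
    """
    d = np.linalg.norm(pred - gt, axis=-1)   # d_jk
    s = scale[:, None]
    n_vis = max(vis.sum(), 1)                # avoid division by zero
    pck = ((d / s <= T) * vis).sum() / n_vis
    oks = (np.exp(-d**2 / (2 * s**2 * sigma**2)) * vis).sum() / n_vis
    return float(pck), float(oks)


def weighted(values, counts):
    """Average per-category values weighted by the object counts n_i."""
    return float(np.dot(values, counts) / np.sum(counts))


def model_score(pck_w, oks_w, map50_box):
    """Formula (14): weighted sum of the three validation metrics."""
    return 0.5 * pck_w + 0.4 * oks_w + 0.1 * map50_box
```

Because PCK carries the largest weight and uses a tight tolerance, the score favors checkpoints whose keypoints are precisely localized over checkpoints that merely detect the vehicle or weighbridge box well.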
[0055] Step 4: Export the two lightweight YOLOv8-pose models as ONNX format models, then convert them to a model format supported by edge devices, and deploy them to edge devices. The specific steps are as follows:
[0056] Step 4.1: The edge device used in this invention is an Orange Pi 5B with a built-in RK3588S chip. First, the two trained lightweight YOLOv8-pose models are exported to ONNX format. Then, the rknn-toolkit2 tool is used to convert the ONNX models into RKNN format. During conversion, every layer of the model is first quantized to int8 precision, and the precision loss before and after quantization is compared. A hybrid quantization strategy is then adopted: layers with a large precision loss are kept at float16 precision, while the remaining layers stay quantized to int8, so that the precision loss from model conversion remains within an acceptable range. Finally, the two RKNN models are deployed on the edge device. The deployment process for other edge devices is the same.
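The layer selection behind the hybrid quantization strategy can be illustrated with a small, toolkit-independent sketch. It assumes that a per-layer similarity score between the float and int8 outputs is already available (for example, a cosine similarity from an accuracy analysis). The function name, the layer names in the usage example, and the 0.98 threshold are hypothetical, and the actual conversion would still be carried out by the quantization toolkit itself.

```python
def plan_hybrid_quantization(layer_scores, min_similarity=0.98):
    """Assign a precision to each layer.

    layer_scores: dict mapping layer name -> similarity between the float and
    int8 outputs of that layer (1.0 means no loss). Layers that fall below
    `min_similarity` are kept in float16; all others are quantized to int8.
    """
    return {
        name: "int8" if score >= min_similarity else "float16"
        for name, score in layer_scores.items()
    }
```

For example, `plan_hybrid_quantization({"backbone.conv1": 0.999, "head.kpt": 0.91})` keeps only `head.kpt` in float16. Keypoint regression outputs are a plausible candidate for float16 because small numeric errors translate directly into pixel offsets, though which layers are affected has to be measured per model.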
[0057] Step 5: At fixed intervals, use the weighbridge four-corner key point detection model to detect the camera footage, obtain the position coordinates of the four corners of the weighbridge, and record them. The specific steps are as follows:
[0058] Step 5.1: During actual deployment, the camera remains fixed. The camera deployment positions are shown in Figure 9: there are four positions, with priority given to positions 1 and 2. Each position must capture the four corner keypoints of the weighbridge without obstruction and at least three vehicle wheel keypoints (position 1 should capture LFO, LRO, and RFI; position 2 should capture RFO, RRO, and LFI; position 3 should capture RFO, RRO, and LRI; and position 4 should capture LFO, LRO, and RRI). At a fixed time interval (e.g., every 15 minutes), a frame containing only the weighbridge is captured and passed to the weighbridge four-corner keypoint detection model. The latest coordinates of the four corner keypoints in the camera view then replace the previously recorded coordinates, which guards against changes in camera angle caused by external influences and keeps the recorded corner coordinates consistent with the actual ones during subsequent vehicle parking status detection.
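The periodic refresh of the corner coordinates can be sketched as a small cache. Here `detect_corners` is a hypothetical callable standing in for the weighbridge four-corner keypoint detection model, and `scale_is_empty` is an assumed flag indicating that the frame contains only the weighbridge. The 15-minute interval is the example value from the text.

```python
class CornerCache:
    """Keep the most recent weighbridge corner coordinates, refreshed periodically."""

    def __init__(self, detect_corners, interval_s=15 * 60):
        self.detect_corners = detect_corners  # frame -> list of 4 (x, y) or None
        self.interval_s = interval_s
        self.corners = None
        self.updated_at = None

    def get(self, frame, now, scale_is_empty):
        """Return the cached corners, re-detecting them first if a refresh is due."""
        due = self.updated_at is None or now - self.updated_at >= self.interval_s
        if due and scale_is_empty:
            corners = self.detect_corners(frame)
            if corners is not None:
                # Replace the previously recorded coordinates.
                self.corners, self.updated_at = corners, now
        return self.corners
```

If a refresh is due while a vehicle is on the weighbridge, the cache keeps the old coordinates and retries on the next call with an empty weighbridge.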
[0059] Step 6: Use the vehicle wheel key point detection model to infer the camera image, and after identifying the vehicle, locate the wheel key points to obtain the coordinates of the vehicle wheel contact point with the ground. The specific steps are as follows:
[0060] Step 6.1: Capture one frame every fixed number of frames (e.g., 30 frames) and perform a difference calculation with the previously captured image to determine whether a vehicle has passed by, thereby reducing the computational load on the device. The image difference calculation formula is as follows:
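[0061] $$D_{mean} = \frac{1}{W \cdot H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| I_1(x, y) - I_2(x, y) \right|$$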
[0062] Here, $D_{mean}$ is the mean difference between the previously sampled image and the currently sampled image, $W$ and $H$ are the width and height of the image, $I_1(x, y)$ is the grayscale value at pixel position $(x, y)$ in the previously sampled image, and $I_2(x, y)$ is the grayscale value at pixel position $(x, y)$ in the currently sampled image. If $D_{mean}$ is greater than the threshold $T_1$, there has been a significant change between the two sampled images and a vehicle may have entered the camera's field of view, so proceed to Step 6.2; otherwise, repeat Step 6.1.
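A minimal NumPy sketch of this check is given below. It computes the mean absolute grayscale difference between two sampled frames and compares it with $T_1$. The function names are hypothetical, and the threshold value has to be chosen for the actual scene.

```python
import numpy as np


def mean_abs_diff(prev_gray, curr_gray):
    """Mean absolute grayscale difference D_mean between two HxW uint8 images."""
    # Cast to a signed type first so that the subtraction cannot wrap around.
    a = prev_gray.astype(np.int16)
    b = curr_gray.astype(np.int16)
    return float(np.abs(a - b).mean())


def vehicle_may_have_entered(prev_gray, curr_gray, t1):
    """True when D_mean exceeds the threshold T1, so the detector should run."""
    return mean_abs_diff(prev_gray, curr_gray) > t1
```

Running this cheap comparison on every 30th frame, and the detection model only when it fires, is what keeps the load on the edge device low while the weighbridge is idle.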
[0063] Step 6.2: When $D_{mean}$ exceeds the threshold $T_1$, the current sampled image is input into the engineering vehicle wheel keypoint detection model, and the model output is parsed. If no vehicle is detected, the process returns to Step 6.1. If a vehicle is detected, the image sampling interval is shortened, and each new sampled image is input into the wheel keypoint detection model to obtain a detection result. At the same time, the ByteTrack algorithm is used to track the vehicle, and the $D_{mean}$ value between the current and previous sampled images is calculated. If $D_{mean}$ is below the threshold $T_2$ three times in a row, the vehicle is considered to have stopped and stabilized, and the wheel keypoint coordinates detected at that moment are recorded.
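The stop criterion in Step 6.2 amounts to counting consecutive samples whose $D_{mean}$ is below $T_2$. A minimal sketch follows. The class name is hypothetical, and vehicle tracking with ByteTrack is outside its scope.

```python
class StopDetector:
    """Report a stable stop after `required` consecutive D_mean values below T2."""

    def __init__(self, t2, required=3):
        self.t2 = t2
        self.required = required
        self.count = 0

    def update(self, d_mean):
        """Feed one D_mean value; return True once the vehicle counts as stopped."""
        # Any sample at or above T2 resets the run of quiet frames.
        self.count = self.count + 1 if d_mean < self.t2 else 0
        return self.count >= self.required
```

Requiring consecutive quiet samples, rather than a single one, avoids recording wheel keypoints while the vehicle is only pausing briefly or still settling on the platform.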
[0064] Step 7: For the coordinates of the four corners of the weighbridge and the coordinates of the vehicle wheels in contact with the ground obtained in Steps 5 and 6, use homography transformation to project them onto a top-down view. Then, using the coordinates of the four corners of the weighbridge and the coordinates of the vehicle wheels in contact with the ground from the top-down view, determine whether the vehicle is correctly parked. The specific steps are as follows:
[0065] Step 7.1: After the vehicle has come to a stable stop, use homography transformation to project the coordinates of the four corners of the weighbridge and the coordinates of the vehicle wheels' contact points with the ground onto a top-down view. The homography transformation formula is:
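[0066] $$w' \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$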
[0067] Where (x,y) represents a point on the original plane, (x′,y′) represents a point on the target plane, H represents the homography matrix, and w′ represents the normalization factor;
[0068] The length and width of the weighbridge, $L$ and $W$, are known, and the coordinates of the four corners of the weighbridge in the camera view are $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and $(x_4, y_4)$. The homography matrix $H$ can be computed from four pairs of corresponding points: $(x_1, y_1)$ and $(0, 0)$, $(x_2, y_2)$ and $(0, W)$, $(x_3, y_3)$ and $(L, 0)$, and $(x_4, y_4)$ and $(L, W)$. The coordinates of the vehicle's wheel contact points are then projected onto the top-down view using $H$ to determine whether the contact points lie within the weighbridge area, as shown in Figures 10 and 11. Figure 10 is an example of vehicle parking status detection in a weighbridge weighing scenario: the left side shows the four weighbridge corners and the wheel keypoints of the engineering vehicle as seen from camera position 2, and the right side shows the relative position of the vehicle and the weighbridge in an actual overhead view. Figure 11 is a schematic diagram of the perspective conversion by homography transformation: the left side shows the keypoint coordinates captured from camera position 2 in Figure 10, and the right side shows the coordinates of the four weighbridge corner keypoints and the vehicle wheel keypoints after the homography transformation.
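Step 7.1 can be sketched in NumPy as follows. The homography is solved directly from the four corner correspondences with $h_{33}$ fixed to 1, the wheel contact points are projected, and each projected point is tested against the rectangle $[0, L] \times [0, W]$. The function names are hypothetical, and the corner order is assumed to match the pairing $(0, 0)$, $(0, W)$, $(L, 0)$, $(L, W)$. A production system might instead rely on a library routine for the perspective transform.

```python
import numpy as np


def homography_from_4(src, dst):
    """Solve the 3x3 homography mapping four src points to four dst points (h33 = 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)


def project(H, pts):
    """Apply H to an (n, 2) array of points and divide by the normalization factor w'."""
    pts = np.asarray(pts, float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


def parked_correctly(corners_img, wheels_img, L, W):
    """Return (all wheels inside the platform, projected wheel points)."""
    dst = [(0, 0), (0, W), (L, 0), (L, W)]
    H = homography_from_4(corners_img, dst)
    top = project(H, wheels_img)
    inside = (top[:, 0] >= 0) & (top[:, 0] <= L) & (top[:, 1] >= 0) & (top[:, 1] <= W)
    return bool(inside.all()), top
```

Because the wheel keypoints are ground contact points, they lie in the same plane as the weighbridge surface, which is the condition under which a single homography maps them correctly to the top-down view.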
[0069] In summary, this invention uses an improved lightweight YOLOv8-pose model to achieve real-time monitoring of vehicle parking status in weighbridge weighing scenarios, and deploys this method on edge devices, providing a new solution for the intelligent upgrade of automated weighbridge weighing management systems.
[0070] The above description elaborates on the specific implementation details of the present invention to help fully understand the invention. However, the above description is only a preferred embodiment of the present invention, and the present invention can be implemented in many different ways. Therefore, the present invention is not limited to the specific implementations disclosed above. Any simple modifications, equivalent changes, and alterations made to the above embodiments based on the technical essence of the present invention without departing from the content of the technical solution of the present invention shall fall within the protection scope of the technical solution of the present invention.
Claims
1. A visual monitoring method for vehicle parking status in a weighbridge weighing scenario based on edge AI, characterized in that: Based on the original YOLOv8-pose model, an improved re-parameterization feature extraction module C2REPA (Re-parameterization Block with 2 Convolutions) is adopted, thereby improving the inference speed while effectively extracting features. An improved re-parameterization detection head REPAPoseHead (Re-parameterization PoseHead) is introduced to generate multi-scale target detection results based on the feature maps output from the backbone and feature pyramid. The method includes the following steps: Step 1: Collect multi-view and multi-type engineering vehicle datasets and weighbridge datasets, and perform data cleaning and annotation to generate corresponding YOLO annotation files; Step 2: Divide the engineering vehicle dataset and the weighbridge dataset into training set, validation set and test set respectively, and perform data augmentation on the training set; Step 3: Construct two lightweight YOLOv8-pose models. Train the vehicle wheel key point detection model on the engineering vehicle dataset and train the weighbridge four corner key point detection model on the weighbridge dataset. Step 4: Export the two lightweight YOLOv8-pose models as ONNX format models, then convert them to a model format supported by edge devices, and deploy them to edge devices; Step 5: At fixed intervals, use the weighbridge four-corner key point detection model to detect the camera image, obtain the position coordinates of the four corners of the weighbridge, and record them; Step 6: Use the vehicle wheel key point detection model to infer the camera image, and after identifying the vehicle, locate the wheel key points to obtain the coordinates of the vehicle wheel contact point with the ground. 
Step 7: For the coordinates of the four corners of the weighbridge and the coordinates of the vehicle wheels in contact with the ground obtained in Step 5 and Step 6, use homography transformation to project them onto a top-down view, and use the coordinates of the four corners of the weighbridge and the coordinates of the vehicle wheels in contact with the ground to determine whether the vehicle is parked correctly.
2. The method according to claim 1, characterized in that, Step 1 includes the following steps: Step 1.1: For the engineering vehicle dataset, a large number of network images containing concrete trucks, tank trucks, and trucks (including dump trucks, tractor-trailers, and other models) are selected and collected as the training and validation sets. Simultaneously, a large number of keyframes containing engineering vehicles are extracted from video footage as the test set. During the annotation process, detection boxes are used to label the engineering vehicles and correctly label their vehicle categories. Furthermore, for the four front and rear wheels of each engineering vehicle, keypoints are used to label the contact points between the inner and outer sides of the wheels and the ground (only visible keypoints are labeled), and the keypoints are correctly classified and labeled. After the dataset annotation is completed, the dataset is exported in YOLO format. Step 1.2: For the weighbridge dataset, filter and collect a large number of network images containing bridge weighbridges and pit weighbridges as training and validation sets. At the same time, extract a large number of keyframes containing weighbridges from video footage as test sets. During the annotation process, use bounding boxes to annotate the weighbridge platform area and label it with the correct classification information. Also, for the four corners of the weighbridge, use keypoints to annotate their positions (only annotate visible keypoints) and correctly classify and label the keypoints. After the dataset annotation is completed, export the dataset in YOLO format.
3. The method according to claim 1, characterized in that, Step 2 includes the following steps: Step 2.1: For the engineering vehicle dataset, the collected network images are divided into training and validation sets in a 9:1 ratio, and keyframes extracted from the videos are used as the test set. For the training set, it is first horizontally flipped, and the flipped data is merged with the original training set. Then, the merged training set is further enhanced, including adding Gaussian noise and randomly adjusting the brightness to improve the model's generalization ability. The same data partitioning and data augmentation methods are used for the weighbridge dataset.
4. The method according to claim 1, characterized in that Step 3 includes the following steps: Step 3.1: Construct two lightweight YOLOv8-pose models with identical network structures and network size n. Replace the C2f module and the detection head in YOLOv8-pose with the C2REPA module and the REPAPoseHead detection head, respectively. The keypoint loss function used in YOLOv8-pose is calculated as follows: where $N$ is the total number of samples in each training batch, $n_i$ is the total number of objects in the i-th sample, $P$ is the total number of keypoints, $f_{ij}$ is the keypoint loss factor of the j-th object in the i-th sample, $\delta(v_{jk})$ is the visibility indicator function (1 when the k-th keypoint of the j-th object in the i-th sample is visible, otherwise 0), $\varepsilon$ is a very small number (used to avoid division by zero), $e_{jk}$ is the normalized error of the k-th keypoint of the j-th object in the i-th sample, $\sigma_k$ is the scale constant of the k-th keypoint, and $\mathrm{area}_j$ is the bounding-box area of the j-th object in the i-th sample; in the original YOLOv8-pose model, the scale constants $\sigma_k$ are set only for the 17 human keypoints, based on the COCO human keypoint detection dataset. For other types of keypoint tasks, the scale constant is uniformly set as: To improve the model's performance during training, the scale constant $\sigma$ of all keypoints is uniformly set to 0.05 in this invention. During model training, the initial learning rate is adjusted to 0.001, the Mosaic enhancement strategy is enabled, and training is performed on the two datasets respectively. The model weights of each training round are saved until the model converges. Step 3.2: Evaluate the model using the metrics $\mathrm{mAP@50}_{Box}$, $PCK_{weighted}$, and $OKS_{weighted}$, and calculate a model score from these metrics.
The model with the highest score on the validation set is selected. $\text{mAP@50}_{Box}$ denotes the mean average precision (mAP) of the object detection task at an IoU threshold of 0.5. The formula for calculating mAP is:

$$mAP = \frac{1}{N}\sum_{i=1}^{N}\int_{0}^{1} P_i(r)\,dr$$

where $N$ is the total number of categories and $P_i(r)$ is the precision-recall curve of the i-th category. $PCK_{weighted}$ (PCK: Percentage of Correct Keypoints) is the weighted proportion of keypoints correctly predicted by the model across all categories. It is calculated as:

$$PCK_{weighted} = \frac{\sum_{i=1}^{N} n_i \cdot PCK_i}{\sum_{i=1}^{N} n_i}$$

$$PCK_i = \frac{\sum_{j=1}^{n_i}\sum_{k=1}^{P} H\!\left(\frac{d_{jk}}{s_j} \le T\right)\delta(v_{jk})}{\sum_{j=1}^{n_i}\sum_{k=1}^{P}\delta(v_{jk})}$$

where $N$ is the total number of categories, $n_i$ is the total number of objects in the i-th category, $PCK_i$ is the proportion of correctly predicted keypoints in the i-th category, $P$ is the total number of keypoints of each object, $d_{jk}$ is the Euclidean distance between the predicted and the manually labeled position of the k-th keypoint of the j-th object in the i-th category, $s_j$ is the normalized reference scale of the j-th object in the i-th category, $T$ is the tolerance threshold (set to 0.025 in this invention), $H(A)$ is the indicator function (1 when condition A is met, 0 otherwise), and $\delta(v_{jk})$ is the visibility indicator function (1 when the k-th keypoint of the j-th object in the i-th category is visible, and 0 otherwise). $OKS_{weighted}$ (OKS: Object Keypoint Similarity) is the weighted similarity between the model-predicted keypoints and the ground-truth keypoints across all categories. It is calculated as:

$$OKS_{weighted} = \frac{\sum_{i=1}^{N} n_i \cdot OKS_i}{\sum_{i=1}^{N} n_i}$$

$$OKS_i = \frac{1}{n_i}\sum_{j=1}^{n_i}\frac{\sum_{k=1}^{P}\exp\!\left(-\frac{d_{jk}^2}{2 s_j^2 \sigma_k^2}\right)\delta(v_{jk})}{\sum_{k=1}^{P}\delta(v_{jk})}$$

where $N$ is the total number of categories, $n_i$ is the total number of objects in the i-th category, $OKS_i$ is the average similarity between predicted and ground-truth keypoints over all objects in the i-th category, $P$ is the total number of keypoints of each object, $d_{jk}$ is the Euclidean distance between the predicted and the manually labeled position of the k-th keypoint of the j-th object in the i-th category, $s_j$ is the normalized reference scale of the j-th object in the i-th category (in this invention, the square root of the object's bounding box area), $\sigma_k$ is the scale constant of the k-th keypoint (reflecting the detection difficulty of different keypoints, uniformly set to 0.5 in this invention), and $\delta(v_{jk})$ is the visibility indicator function (1 when the k-th keypoint of the j-th object in the i-th category is visible, and 0 otherwise). Once $\text{mAP@50}_{Box}$, $PCK_{weighted}$ and $OKS_{weighted}$ have been calculated on the validation set, the score of the model is computed, and the model with the highest score on the validation set is selected. The score calculation formula is as follows:

$$score = PCK_{weighted} \times 0.5 + OKS_{weighted} \times 0.4 + \text{mAP@50}_{Box} \times 0.1 \qquad (14)$$

5. The method according to claim 1, characterized in that Step 4 includes the following step: Step 4.1: The edge device used in this invention is an Orange Pi 5B with a built-in RK3588S chip. First, the two trained lightweight YOLOv8-pose models are exported to ONNX format. Then, the rknn-toolkit2 tool is used to convert the ONNX models into RKNN format. During conversion, every layer of the model is first quantized to int8 precision, and the accuracy before and after quantization is compared. A hybrid quantization strategy is then adopted: layers with a large accuracy loss are kept at float16 precision, while the remaining layers are quantized to int8, so that the accuracy loss caused by the conversion stays within the desired range. Finally, the two RKNN models are deployed on the edge device. The deployment procedure is the same for other edge device models.
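For the model selection in Step 3.2 of claim 4, the following minimal sketch shows one way to compute per-object OKS, PCK counts, an object-count-weighted average over categories, and the score of equation (14). The function names are hypothetical. The OKS form follows the common COCO-style definition, and using the square root of the bounding box area as the PCK reference scale is an assumption, since the claim states that choice only for OKS.

```python
import math

def oks(pred, gt, visible, area, sigma=0.5):
    """COCO-style OKS for one object; s^2 is taken as the bounding box area."""
    terms = [
        math.exp(-((px - gx) ** 2 + (py - gy) ** 2) / (2 * area * sigma ** 2))
        for (px, py), (gx, gy), v in zip(pred, gt, visible)
        if v
    ]
    return sum(terms) / len(terms) if terms else 0.0

def pck_counts(pred, gt, visible, area, tol=0.025):
    """Return (correct, total) over the visible keypoints of one object.

    Assumption: the reference scale is sqrt(area), as stated for OKS.
    """
    scale = math.sqrt(area)
    correct = total = 0
    for (px, py), (gx, gy), v in zip(pred, gt, visible):
        if v:
            total += 1
            correct += math.hypot(px - gx, py - gy) / scale <= tol
    return correct, total

def weighted(values, counts):
    """Average per-category values, weighted by the object count n_i."""
    return sum(v * n for v, n in zip(values, counts)) / sum(counts)

def model_score(pck_w, oks_w, map50_box):
    """Score from equation (14)."""
    return pck_w * 0.5 + oks_w * 0.4 + map50_box * 0.1
```

With this weighting, PCK dominates the score, so a checkpoint with slightly lower box mAP but tighter keypoints would still be preferred.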
6. The method according to claim 1, characterized in that Step 5 includes the following step: Step 5.1: During actual deployment, the camera remains fixed. At regular intervals (e.g., every 15 minutes), a frame containing only the weighbridge is captured, and the weighbridge four-corner keypoint detection model detects the current coordinates of the four corner keypoints in the camera image, which replace the previously stored coordinates. This compensates for changes in the camera angle caused by external influences, so that the four corner keypoint coordinates used in subsequent vehicle parking status detection remain consistent with the actual corner positions.
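The periodic refresh in Step 5.1 could be organized as in the sketch below. The class name, the `detect_corners` callable, and the rule of keeping the old coordinates when a detection fails are all assumptions made for illustration; the claim only states that the newly detected coordinates replace the stored ones.

```python
class CornerCache:
    """Stores the latest weighbridge corners and refreshes them periodically."""

    def __init__(self, detect_corners, interval_s=15 * 60):
        # detect_corners is a hypothetical callable: frame -> four (x, y) points
        self.detect_corners = detect_corners
        self.interval_s = interval_s
        self.corners = None
        self.last_update = None

    def update(self, frame, now):
        """Re-detect the corners if the refresh interval has elapsed.

        The caller is expected to pass a frame showing only the weighbridge.
        """
        due = (self.last_update is None
               or now - self.last_update >= self.interval_s)
        if due:
            detected = self.detect_corners(frame)
            # Assumed policy: keep the old corners if detection is incomplete.
            if detected is not None and len(detected) == 4:
                self.corners = detected
                self.last_update = now
        return self.corners
```

Passing the timestamp in explicitly keeps the class independent of the system clock, which makes the refresh logic easy to test.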
7. The method according to claim 1, characterized in that Step 6 includes the following steps: Step 6.1: Sample one frame every fixed number of frames (e.g., every 30 frames) and compute its difference from the previously sampled image to determine whether a vehicle has passed by, thereby reducing the computational load on the device. The image difference is calculated as:

$$D_{mean} = \frac{1}{W \cdot H}\sum_{x=1}^{W}\sum_{y=1}^{H}\left|I_1(x, y) - I_2(x, y)\right|$$

where $D_{mean}$ is the average difference between the previously sampled image and the currently sampled image, $W$ and $H$ are the width and height of the image, respectively, $I_1(x, y)$ is the grayscale value at pixel position (x, y) in the previously sampled image, and $I_2(x, y)$ is the grayscale value at pixel position (x, y) in the currently sampled image. If $D_{mean}$ is greater than the threshold T1, there has been a significant change between the previous and the current sampled image, and a vehicle may have entered the camera's field of view; proceed to Step 6.2. Otherwise, repeat Step 6.1. Step 6.2: When $D_{mean}$ exceeds the threshold T1, the current sampled image is input into the engineering vehicle wheel keypoint detection model, and the model output is obtained and parsed. If no vehicle is detected, the process returns to Step 6.1. If a vehicle is detected, the image sampling interval is shortened, and each new sampled image is input into the engineering vehicle wheel keypoint detection model to obtain the detection result. Simultaneously, the ByteTrack algorithm is used to track the vehicle, and $D_{mean}$ between the current and the previous sampled image is calculated. If $D_{mean}$ is less than the threshold T2 three times consecutively, the vehicle is considered to have stopped and stabilized, and the wheel keypoint coordinates detected at that time are recorded.
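The frame differencing and the stop test of Step 6 can be sketched as follows. This assumes grayscale frames stored as nested lists and an absolute per-pixel difference; the class name and threshold values are hypothetical, and a real deployment would more likely operate on image arrays from the camera pipeline.

```python
def mean_abs_diff(prev, curr):
    """D_mean: average absolute grayscale difference between two frames."""
    height, width = len(prev), len(prev[0])
    total = sum(
        abs(a - b)
        for row_prev, row_curr in zip(prev, curr)
        for a, b in zip(row_prev, row_curr)
    )
    return total / (width * height)

class StopDetector:
    """Reports a stop once D_mean stays below T2 for several samples in a row."""

    def __init__(self, t2, required=3):
        self.t2 = t2
        self.required = required
        self.count = 0

    def update(self, d_mean):
        # Any sample at or above T2 resets the run of "still" frames.
        self.count = self.count + 1 if d_mean < self.t2 else 0
        return self.count >= self.required
```

A frame pair would first be compared against T1 to wake the wheel keypoint model, and `StopDetector.update` would then be fed each subsequent `D_mean` until it returns `True`.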
8. The method according to claim 1, characterized in that Step 7 includes the following step: Step 7.1: After the vehicle has come to a stable stop, use a homography transformation to project the coordinates of the four corners of the weighbridge and the coordinates of the contact points between the vehicle wheels and the ground onto a top-down view. The homography transformation formula is:

$$w'\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

where (x, y) is a point on the original plane, (x′, y′) is a point on the target plane, H is the homography matrix, and w′ is the normalization factor. The length and width of the weighbridge are L and W (known), and the coordinates of the four corners of the weighbridge in the camera image are (x1, y1), (x2, y1), (x1, y2), and (x2, y2). The homography matrix H can then be calculated from four pairs of corresponding points: (x1, y1) and (0, 0), (x1, y2) and (0, W), (x2, y1) and (L, 0), and (x2, y2) and (L, W). Finally, the coordinates of the contact points between the vehicle wheels and the ground are projected onto the top view through the homography matrix H to determine whether the contact points are within the weighbridge's range.
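The projection in Step 7.1 can be illustrated with the sketch below, which solves for H from four point pairs by fixing its bottom-right entry to 1 and then tests whether every projected wheel contact point falls inside the L by W rectangle. The function names are hypothetical and the solver is a plain Gaussian elimination written for self-containment; in practice a library routine such as OpenCV's `cv2.getPerspectiveTransform` would typically be used instead.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def find_homography(src, dst):
    """Estimate H (with H[2][2] = 1) from four (x, y) -> (u, v) pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def project(h, point):
    """Apply H to an image point and divide by the normalization factor w'."""
    x, y = point
    xp = h[0][0] * x + h[0][1] * y + h[0][2]
    yp = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return xp / w, yp / w

def wheels_on_weighbridge(h, wheel_points, length, width):
    """True if every projected contact point lies inside [0, L] x [0, W]."""
    projected = (project(h, p) for p in wheel_points)
    return all(0 <= u <= length and 0 <= v <= width for u, v in projected)
```

The corner order passed as `src` must match the order of the target points in `dst`, for example image corners paired with (0, 0), (L, 0), (0, W), and (L, W).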