Target detection calculation method and system based on binocular vision

By using a target detection method based on binocular vision, the problem of size and weight estimation errors caused by target posture is solved. Through image processing and posture parameter calculation, more accurate target data acquisition is achieved.

CN121883497B · Active · Publication Date: 2026-06-26 · RUIMU TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
RUIMU TECHNOLOGY CO LTD
Filing Date
2026-03-20
Publication Date
2026-06-26

Abstract

The application provides a target detection calculation method and system based on binocular vision, comprising: acquiring a detected scene image through a binocular camera; performing image processing on the detected scene image to separate a candidate target region from the background; extracting the edge of the candidate target region; performing texture feature analysis on the candidate target region after the edge is extracted to screen out a final target region; determining a plurality of key points of the final target region based on the contour of the final target region, calculating the pose parameters of the target according to the key points; establishing a conversion relationship from an image pixel distance to a real physical distance based on the imaging model and the calibration parameters of the binocular camera; acquiring the real physical size of the target based on the conversion relationship; and combining the real physical size and the pose parameters of the target to calculate the data of the target, so that the error of size and weight estimation caused by the target not being horizontally and directly opposite to the camera can be reduced.

Description

Technical Field

[0001] This invention belongs to the field of image processing technology, and in particular relates to a target detection calculation method and system based on binocular vision.

Background Technology

[0002] In fields such as target physics research, target detection, and resource surveys, there is a need for non-contact measurement and weight estimation of targets (especially fish). When performing three-dimensional size reconstruction, most existing methods implicitly assume that the target is in an ideal posture: parallel to the camera's optical axis and not bent.

[0003] However, if the target is moving freely when the camera takes a picture, its body will inevitably adopt complex postures such as tilting and bending. The pixel size measured on the two-dimensional image then differs significantly from the actual three-dimensional physical size, resulting in a large error in weight calculation.

Summary of the Invention

[0004] To overcome the shortcomings of the prior art, the present invention provides a target detection calculation method and system based on binocular vision, which can reduce the error in size and weight estimation caused by the target not being horizontally facing the camera.

[0005] The technical solution adopted by this invention to solve its technical problem is:

[0006] The target detection calculation method based on binocular vision includes the following steps:

[0007] Acquire images of the scene to be detected using a binocular camera;

[0008] Image processing is performed on the detected scene image to separate candidate target regions from the background;

[0009] Extract the edges of the candidate target region;

[0010] Texture feature analysis is performed on the candidate target regions after edge extraction to select the final target regions;

[0011] Key points are determined based on the contour, and the pose parameters of the target are calculated.

[0012] Based on the imaging model and calibration parameters of the binocular camera, a conversion relationship from image pixel distance to real physical distance is established;

[0013] The true physical dimensions of the target are obtained based on the transformation relationship;

[0014] The target's data is calculated by combining the target's actual physical dimensions and attitude parameters.

[0015] Further, image processing is performed on the detected scene image, including color segmentation of the image using an algorithm that fuses the HSV color space and the RGB color space; the algorithm for fusing the HSV color space and the RGB color space includes:

[0016] In the HSV color space, a threshold range corresponding to the target color is set for initial segmentation;

[0017] The brightness distribution in the RGB color space is analyzed, and the preliminary segmentation results of the HSV color space are cross-validated to filter out background areas that do not conform to the brightness characteristics.

[0018] Furthermore, a dynamic threshold edge detection algorithm is used to extract the edges of the candidate target region. The dynamic threshold edge detection algorithm is a dynamic Canny algorithm, which includes:

[0019] The high threshold of the Canny algorithm is dynamically calculated based on the average gray value of local regions of the image, and the low threshold is set to between 1/3 and 1/2 of the high threshold.

[0020] Morphological operations are performed on the edge detection results to remove noise and connect broken edges.

[0021] Furthermore, LBP texture feature analysis is used to perform texture feature analysis on the candidate target region after edge extraction. The LBP texture feature analysis includes:

[0022] Divide the candidate target region into multiple sub-blocks;

[0023] Calculate the LBP histogram and texture statistical features for each sub-block;

[0024] The texture features of all sub-blocks are compared with a preset texture consistency threshold to determine whether the region is the target to be detected.

[0025] Furthermore, before determining multiple key points based on the contour of the final target region, the following steps are also included:

[0026] Determine the confidence level of the target region. If the confidence level of the target region is lower than a preset threshold, then activate the deep learning model for target detection.

[0027] The detection results of the deep learning model are used to optimize the parameters of color segmentation, edge detection, and texture feature analysis, forming a closed-loop learning mechanism.

[0028] Furthermore, determining multiple key points based on the contour of the final target region includes: calculating the scaling ratio and projection relationship between the target and the template based on a pre-stored key point template; determining whether the intersection-over-union (IoU) between the detected target and the template is less than a threshold; if the IoU is less than the threshold, starting a deep learning model to identify the key points and constructing the target contour from the identified key points; if the IoU is greater than or equal to the threshold, calculating the key points of the target directly from the key point template and constructing the target contour. The key points include the anatomical feature points and contour curvature inflection points of the target, and multiple key points are generated by dynamically matching the preset key point template to construct the skeleton curve of the target.

[0029] Furthermore, establishing the conversion relationship from image pixel distance to actual physical distance includes:

[0030] Images of multiple target samples with known real size and weight at different distances are acquired, their pixel size is obtained, and the distance information between the camera and the sample is recorded. The real physical size of the target sample is obtained through manual measurement.

[0031] Considering the effect of light refraction, a nonlinear mapping model is constructed that includes refraction correction coefficients and sample statistical coefficients;

[0032] The pixel size and the actual physical size are fitted using a dynamic kernel function to obtain the parameters of the nonlinear mapping model, thereby establishing the transformation relationship.

[0033] Furthermore, the attitude parameters include at least the target tilt angle and the target curvature. The target tilt angle is defined as the angle between the principal axis of the target body and the optical axis of the camera. The target curvature is obtained in the following manner:

[0034] The key points of the target are connected to form multiple line segments;

[0035] The perpendicular bisector of each line segment is calculated;

[0036] The target curvature is calculated based on the change in direction of the perpendicular bisectors of adjacent line segments.

[0037] Furthermore, the target data includes the target weight, which is calculated using the following formula:

[0038] ;

[0039] Where w is the target weight, d is the distance between the camera module and the target being detected, u(f)*v(e) is the correction function, f is the target tilt angle calculated from the key points, e is the target curvature calculated from the key points, thick is the average thickness of the target, ρ is the average density of the target, width is the actual width of the target calculated based on the transformation relationship, and height is the actual height of the target calculated based on the transformation relationship.

[0040] The present invention also provides a system for implementing the target detection calculation method described above, comprising:

[0041] A binocular camera module is used to acquire images;

[0042] An artificial intelligence processing module is used to execute the steps of the method;

[0043] The memory stores instructions executable by the artificial intelligence processing module, which, when executed by the artificial intelligence processing module, cause the system to perform the method as described in any one of claims 1 to 9.

[0044] The beneficial effects of this invention are:

[0045] This invention corrects measurement errors caused by target posture by establishing a correction function based on target tilt angle and target curvature. It constructs the target skeleton by combining anatomical feature points and contour inflection points, and dynamically determines target key points through a complementary approach of preset key point template matching and deep learning. This can more accurately depict the target contour and posture, and reduce target data calculation errors caused by the target not being horizontally facing the camera.

Attached Figure Description

[0046] Figure 1 is an overall flowchart of the present invention;

[0047] Figure 2 is a flowchart of determining the key points of the target in the present invention;

[0048] Figure 3 is a flowchart of obtaining the target curvature in the present invention;

[0049] Figure 4 is a flowchart of converting a three-dimensional image of the target into a two-dimensional image in the present invention.

Detailed Implementation

[0050] This specific embodiment is merely an explanation of the present invention and is not intended to limit the invention. After reading this specification, those skilled in the art can make modifications to this embodiment without contributing any inventive element, but such modifications are protected by patent law as long as they fall within the scope of the claims of the present invention.

[0051] This invention provides a target detection calculation method and system based on binocular vision. It comprehensively considers the impact of the target's tilt and angle deviation on the calculation of real data. By mapping the target's tilt function to the real target size, a more accurate target weight can be obtained, thus solving the problem of large errors in existing target detection methods.

[0052] The target detection calculation method based on binocular vision includes the following steps:

[0053] Acquire images of the scene to be detected using a binocular camera;

[0054] The image is processed to separate the candidate target region from the background;

[0055] Extract the edges of the candidate target region;

[0056] Texture feature analysis is performed on the candidate target regions after edge extraction to select the final target regions;

[0057] Key points are determined based on the contour, and the pose parameters of the target are calculated.

[0058] Based on the imaging model and calibration parameters of the binocular camera, a conversion relationship from image pixel distance to real physical distance is established;

[0059] The true physical dimensions of the target are obtained based on the transformation relationship;

[0060] The target's data is calculated by combining the target's actual physical dimensions and attitude parameters.

[0061] In one embodiment of the present invention, image processing is performed on the detected scene image, including color segmentation of the image using an algorithm that fuses the HSV and RGB color spaces. This HSV and RGB color space fusion algorithm effectively reduces the candidate region, whose area is typically 10% to 20% of the original image area, thus significantly improving subsequent computational efficiency. The HSV and RGB color space fusion algorithm includes:

[0062] In the HSV color space, a threshold range corresponding to the target color is set for preliminary segmentation. In this step, a basic threshold range needs to be preset for the typical color characteristics of different targets to be detected. For example, the back of a carp in a freshwater environment is usually bluish-gray, and its HSV parameters are roughly distributed in H∈[100,120], S∈[40,80], and V∈[30,70]; while the silvery-white belly of a crucian carp corresponds to H∈[0,30], S∈[10,30], and V∈[70,90].

[0063] The brightness distribution is analyzed in the RGB color space, and the preliminary segmentation results of the HSV color space are cross-validated to filter out background areas that do not conform to the brightness characteristics. Specifically, by analyzing the ratio of the red channel to the green channel, the reflective areas on the surface of the target body and the light spots of the background water are further distinguished, thereby effectively filtering out background areas that differ greatly from the target color, such as the green of aquatic plants and the gray-brown of rocks.
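
The two-stage color test in paragraphs [0062] and [0063] can be sketched per pixel as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the function names, the reading of hue in degrees (0 to 360) with S and V as percentages, the hue range of 200 to 240 degrees (the quoted H∈[100,120] read on a half-degree hue scale such as the one OpenCV uses for 8-bit images), and the red/green ratio band are all hypothetical choices, since the source does not state its units or the ratio limits.

```python
import colorsys

# Hypothetical thresholds for a bluish-gray target. H is in degrees (0-360),
# S and V are percentages; the source does not state its units.
H_RANGE = (200.0, 240.0)
S_RANGE = (40.0, 80.0)
V_RANGE = (30.0, 70.0)
# Hypothetical acceptable band for the red/green channel ratio.
RG_RATIO_RANGE = (0.6, 1.2)


def hsv_candidate(rgb):
    """Stage 1: preliminary segmentation by HSV thresholds."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h_deg, s_pct, v_pct = h * 360.0, s * 100.0, v * 100.0
    return (H_RANGE[0] <= h_deg <= H_RANGE[1]
            and S_RANGE[0] <= s_pct <= S_RANGE[1]
            and V_RANGE[0] <= v_pct <= V_RANGE[1])


def rgb_cross_check(rgb):
    """Stage 2: reject pixels whose red/green ratio falls outside the band."""
    r, g, _ = rgb
    if g == 0:
        return False
    return RG_RATIO_RANGE[0] <= r / g <= RG_RATIO_RANGE[1]


def is_target_pixel(rgb):
    """A pixel is kept only if it passes both stages."""
    return hsv_candidate(rgb) and rgb_cross_check(rgb)
```

With these illustrative values, the pixel (30, 70, 110) passes the HSV stage but is rejected by the red/green cross-check, which is the filtering role the RGB stage plays in the text.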

[0064] In one embodiment of the present invention, a dynamic threshold edge detection algorithm is used to extract the edges of the candidate target regions to perform shape screening of typical target contours. The targets to be detected typically have specific shape characteristics, with the aspect ratio of the contour generally between H:1 and K:1. For example, shrimp usually exhibit segmented contours, while fish are mostly streamlined or conical, with obvious segmented contours. Therefore, geometric feature verification is required to exclude most non-target contours and further narrow down the target range.

[0065] In this embodiment, the dynamic threshold edge detection algorithm is a dynamic Canny algorithm. Here, "dynamic Canny algorithm" means that when the traditional Canny algorithm fails to complete the detection, the next stage can adjust the high and low thresholds of Canny based on the detection results from deep learning. Specifically, it includes:

[0066] The high threshold of the Canny algorithm is dynamically calculated based on the average gray value of the local area of the image, and the low threshold is set to between 1/3 and 1/2 of the high threshold. The high threshold adapts to the average brightness: for every 20% increase in brightness, the high threshold is increased by 15%. The low threshold is adjusted at the same time so that it stays between 1/3 and 1/2 of the high threshold, ensuring that continuous edge contours can be extracted under different lighting conditions.

[0067] Morphological operations are performed on the edge detection results to remove noise and connect broken edges. The morphological operations include: first performing an erosion operation with a k*k kernel to remove fine noise, and then performing a dilation operation with a k*k kernel to connect broken edges.
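
The threshold rule in paragraph [0066] can be sketched as below. The reference brightness `ref_gray`, the reference high threshold `ref_high`, and the linear reading of "15% per 20% brightness increase" are assumptions made for the example; the source gives the proportions but not the baseline values or whether the adjustment compounds.

```python
def dynamic_canny_thresholds(mean_gray, ref_gray=100.0, ref_high=120.0,
                             low_ratio=0.5):
    """Return (low, high) Canny thresholds for a local mean gray value.

    ref_gray and ref_high are hypothetical baseline values.
    low_ratio must lie between 1/3 and 1/2, as stated in the text.
    """
    if not (1.0 / 3.0 <= low_ratio <= 0.5):
        raise ValueError("low_ratio must lie between 1/3 and 1/2")
    if ref_gray <= 0:
        raise ValueError("ref_gray must be positive")
    # Relative brightness change with respect to the reference level.
    rel_change = (mean_gray - ref_gray) / ref_gray
    # Linear interpretation: +15% high threshold per +20% brightness.
    high = ref_high * (1.0 + 0.15 * (rel_change / 0.20))
    # Keep the threshold inside the 8-bit gray range.
    high = min(255.0, max(1.0, high))
    return high * low_ratio, high
```

For example, a region that is 20% brighter than the reference (`mean_gray=120.0`) yields a high threshold of about 138 and a low threshold of about 69 with the default ratio.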

[0068] As one embodiment of the present invention, dynamic LBP texture feature analysis is used to perform texture feature analysis on the candidate target region after edge extraction. The purpose is to distinguish between the target to be detected and non-target regions in the image. The "dynamic LBP texture feature analysis" referred to here means that when conventional LBP texture feature analysis fails to complete the detection, the next stage can use the detection results from deep learning to correct the upper and lower thresholds set by LBP itself. Dynamic LBP texture feature analysis includes:

[0069] Divide the candidate target region into multiple sub-blocks;

[0070] Calculate the LBP histogram and texture statistical features for each sub-block;

[0071] The texture features of all sub-blocks are compared with a preset texture consistency threshold to determine whether the region is the target to be detected.

[0072] The texture statistical features include texture uniformity, contrast, and entropy. Uniformity measures the evenness and smoothness of the texture. Smooth areas have higher uniformity. Contrast reflects the sharpness of a local image (the dynamic range of grayscale). Areas with sharp textures have higher contrast. Entropy measures the complexity and randomness of the texture. The more complex and disordered the texture, the higher the entropy value.
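
As a rough illustration of the statistics named above, the sketch below computes a basic 8-neighbour LBP histogram for one sub-block and derives three values from it. The concrete definitions are assumptions, since the source does not give formulas: uniformity is taken as the sum of squared histogram probabilities, contrast as the gray-level range of the block, and entropy as the Shannon entropy of the histogram.

```python
import math

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(block, row, col):
    """Basic 8-neighbour LBP code of an interior pixel."""
    center = block[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        if block[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(block):
    """Normalised 256-bin histogram of LBP codes over interior pixels."""
    hist = [0.0] * 256
    count = 0
    for r in range(1, len(block) - 1):
        for c in range(1, len(block[0]) - 1):
            hist[lbp_code(block, r, c)] += 1.0
            count += 1
    return [h / count for h in hist] if count else hist


def texture_statistics(block):
    """Return (uniformity, contrast, entropy) under the assumed definitions."""
    p = lbp_histogram(block)
    uniformity = sum(x * x for x in p)
    entropy = -sum(x * math.log2(x) for x in p if x > 0)
    flat = [v for row in block for v in row]
    contrast = max(flat) - min(flat)
    return uniformity, contrast, entropy
```

A perfectly flat block gives uniformity 1 and entropy 0 under these definitions, while a checkerboard block gives lower uniformity and higher entropy, matching the qualitative behaviour described in the paragraph.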

[0073] The principle behind determining the preset texture consistency threshold is as follows:

[0074] The surface of a smooth non-detection target area is uniform and lacks detailed texture. The gray value of a pixel is very close to that of most of its surrounding neighborhood. There are very few points with a gray value difference exceeding the threshold, so the proportion is very low (<30%), and thus it is judged as a non-detection target.

[0075] The surface of the target area to be detected usually has rich and irregular textures, wrinkles, or patterns, which leads to more numerous and more pronounced gray-level changes between pixels and their neighbors. Therefore, the proportion of gray-level differences exceeding the threshold will be high (≥30%), and such areas are retained for further analysis.

[0076] Therefore, for each pixel of a sub-block, the grayscale difference between that pixel and each pixel in its surrounding w-neighborhood is calculated, and the proportion of neighboring pixels whose grayscale difference from the center pixel exceeds the preset texture consistency threshold is determined. If this proportion is less than 30%, the sub-block is determined to be a smooth non-target area; otherwise, it is retained as a candidate target to be detected.
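
The 30% rule of paragraph [0076] can be sketched as follows. Two points are assumptions of the example: the w-neighborhood is taken to be the 8-neighborhood, and the proportion is pooled over all interior pixels of the sub-block. The gray-difference threshold itself is a caller-supplied parameter because the source gives no value for it.

```python
def exceed_ratio(block, diff_threshold):
    """Fraction of (pixel, neighbour) pairs whose gray difference exceeds
    diff_threshold, pooled over the interior pixels of the sub-block."""
    exceed = 0
    total = 0
    rows, cols = len(block), len(block[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = block[r][c]
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the center pixel itself
                    total += 1
                    if abs(block[r + dr][c + dc] - center) > diff_threshold:
                        exceed += 1
    return exceed / total if total else 0.0


def is_candidate_block(block, diff_threshold, min_ratio=0.30):
    """Keep the sub-block when at least 30% of the differences exceed
    the threshold; otherwise treat it as a smooth non-target area."""
    return exceed_ratio(block, diff_threshold) >= min_ratio
```

A flat sub-block is rejected (ratio 0), whereas a checkerboard sub-block with a gray step of 100 and a threshold of 20 is retained (ratio 0.5).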

[0077] Before determining multiple key points based on the contour of the final target region, the following steps are also included:

[0078] The region after image processing, edge extraction, and texture feature analysis is taken as the target region. The confidence level of the target region is determined. The confidence level is derived from empirical data. If the confidence level of the target region is lower than a preset threshold, a deep learning model is activated for target detection. The detection results of the deep learning model are used to optimize the parameters of image processing, edge extraction, and texture feature analysis, forming a closed-loop learning mechanism. This allows different algorithms to complement each other and continuously improve the overall detection performance.

[0079] If the confidence level of the target region is higher than the preset threshold, image distance measurement proceeds directly. If the confidence level is lower than the preset threshold, or if no target is detected for k consecutive frames but the binocular parallax data shows an abrupt depth change in the region (indicating that a target may be present), the situation is treated as information loss. In this case, deep learning target detection is performed on the acquired image to compensate for the information loss.
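
The decision described in paragraphs [0078] and [0079] reduces to a small predicate, sketched below. The function and parameter names are hypothetical, and how the confidence level and the depth-jump flag are computed is left outside the sketch because the source does not specify it.

```python
def needs_deep_learning(confidence, conf_threshold, missed_frames, k,
                        depth_jump_detected):
    """Return True when the deep learning detector should be activated.

    confidence          -- confidence level of the classical pipeline's region
    conf_threshold      -- preset confidence threshold
    missed_frames       -- consecutive frames in which no target was detected
    k                   -- number of missed frames that triggers the check
    depth_jump_detected -- True if stereo disparity shows an abrupt depth change
    """
    low_confidence = confidence < conf_threshold
    likely_missed_target = missed_frames >= k and depth_jump_detected
    return low_confidence or likely_missed_target
```

When the predicate is false, the pipeline would continue directly to distance measurement, as stated in the text.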

[0080] The parameter optimization for image processing, edge extraction, and texture feature analysis is all performed using deep learning models. Taking image processing as an example, the principle for optimizing the parameters of the HSV color space in color segmentation is as follows:

[0081] When a shift in the color characteristics of a certain fish species is detected under specific turbidity conditions, such as an overall increase of 1 unit in the H value, the HSV threshold range will be dynamically adjusted to adapt to the new environmental conditions. The selection criteria for edge detection algorithms will also be updated based on typical shapes identified by deep learning. For example, if a certain fish species is found to have a flatter outline when swimming rapidly, the aspect ratio threshold will be relaxed to 2:1 accordingly.

[0082] The establishment of the conversion relationship from image pixel distance to actual physical distance includes:

[0083] Images of multiple target samples with known true sizes and weights at different distances are acquired, their pixel dimensions are obtained, and the distance information between the camera and the sample is recorded. The true physical size of the target sample is obtained through manual measurement. The pixel dimensions refer to the length, width, and height of the sample in the image (in pixels); the distance information refers to the distance between the camera and the sample; the true physical size must be measured manually using digital calipers, accurate to millimeters.

[0084] Considering the effect of light refraction, a nonlinear mapping model is constructed that includes refraction correction coefficients and sample statistical coefficients;

[0085] The pixel size and the actual physical size are fitted using a dynamic kernel function to obtain the parameters of the nonlinear mapping model, thereby establishing the transformation relationship.

[0086] Based on the target pixel size obtained from the final target area through screening, the actual physical size is obtained through nonlinear mapping model transformation, which is used for subsequent weight calculation.
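
The calibration steps in paragraphs [0083] to [0086] can be illustrated with a deliberately simplified fit. The source describes a nonlinear mapping model with refraction correction and sample statistical coefficients fitted by a dynamic kernel function, but does not give its form. The sketch below swaps in a much simpler stand-in: a single correction coefficient applied to the pinhole estimate `pixel_size * distance / focal_px` and fitted by least squares. All names and the model form are assumptions.

```python
def fit_scale_coefficient(samples, focal_px):
    """Least-squares fit of one correction coefficient.

    samples  -- iterable of (pixel_size, distance_mm, real_size_mm)
    focal_px -- focal length in pixels (from camera calibration)
    """
    num = 0.0
    den = 0.0
    for pixel_size, distance_mm, real_size_mm in samples:
        # Uncorrected pinhole estimate of the physical size.
        x = pixel_size * distance_mm / focal_px
        num += x * real_size_mm
        den += x * x
    if den == 0.0:
        raise ValueError("samples carry no usable pixel measurements")
    return num / den


def pixel_to_physical(pixel_size, distance_mm, focal_px, coeff):
    """Convert a pixel size to a physical size with the fitted coefficient."""
    return coeff * pixel_size * distance_mm / focal_px
```

In use, the coefficient would be fitted once from the manually measured samples and then applied to the pixel size of each detected target.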

[0087] Based on the contour of the final target region, several key points are determined, specifically as follows:

[0088] First, based on the acquired target region contour, the system performs a 3D-to-2D standardization preprocessing. Since the system's pre-stored keypoint templates are in 2D form, while the detected target exists in 3D space, dimensionality reduction and standardization mapping are necessary. Specifically, a spatial transformation matrix is used to project the contour information of the 3D target onto a 2D plane, forming an initial 2D acquisition image. This initial 2D acquisition image is then input into a correction network module for standardization processing, outputting a standard 2D image.

[0089] In one embodiment of the present invention, 3dT2D is used as the spatial transformation matrix and the Rect2d module is used as the correction network module. The Rect2d module works on the idea of a statistical histogram: it divides the image into rows, extracts the features of each row by convolution, and performs a pooling operation on the brightness of each row resolution, finally achieving the correction of the image.
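
The projection step of paragraph [0088] can be pictured with an ordinary pinhole projection. The sketch below is a stand-in only: the actual form of the 3dT2D matrix and of the Rect2d correction is not given in the source, and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical calibration values.

```python
def project_points(points_3d, fx, fy, cx, cy):
    """Project 3D contour points (camera coordinates) onto the image plane.

    points_3d -- iterable of (x, y, z) with z > 0 along the optical axis
    fx, fy    -- focal lengths in pixels
    cx, cy    -- principal point in pixels
    """
    projected = []
    for x, y, z in points_3d:
        if z <= 0:
            raise ValueError("points must lie in front of the camera (z > 0)")
        projected.append((fx * x / z + cx, fy * y / z + cy))
    return projected
```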

[0090] After obtaining the standard 2D image, the system enters the keypoint matching and calculation stage. The system has a pre-stored keypoint template library. It calculates the geometric transformation relationship (including scaling and projection) between the current standard 2D image and the keypoint template, as well as their intersection-over-union (IoU), and then evaluates the calculated IoU against a preset threshold.

[0091] If the calculated IoU is less than the preset threshold, the current target is determined to be significantly different from the template. At this point, the system activates a pre-trained deep learning model to identify the key points of the current target. The identification results output by this deep learning model serve as the final basis for this size measurement.

[0092] If the IoU is greater than or equal to the preset threshold, the current target is determined to be highly similar to the template. In this case, the system directly applies the geometric transformation relationship calculated above to map the key points in the template onto the contour of the current target, and calculates the size of the currently detected target based on the mapped key points.
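
The branch between template mapping and the deep learning model can be sketched as below. For brevity the example computes the intersection-over-union on axis-aligned bounding boxes, which is an assumption; the source does not say whether boxes, masks, or contours are compared. The function names and the returned labels are hypothetical.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def choose_keypoint_source(target_box, template_box, iou_threshold):
    """Pick the key point source according to the IoU test in the text."""
    if box_iou(target_box, template_box) < iou_threshold:
        return "deep_learning"  # target differs strongly from the template
    return "template"           # map template key points onto the target
```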

[0093] It is worth noting that this invention introduces a self-evolutionary mechanism: when the deep learning model is used for recognition, the successful recognition results are fed back to the system and used to update the key point template library. This mechanism enables the system to continuously accumulate new target morphological features, thereby improving the success rate and coverage of template matching in subsequent measurements, achieving adaptive optimization of the system.

[0094] The key points include the target's anatomical feature points and contour curvature inflection points. By dynamically matching a preset key point template, multiple key points are generated to construct the target's skeletal curve. The anatomical feature points differ between types of detected target. Taking fish as an example, the anatomical feature points include the head (snout), the tail (tail fin tip), the pectoral fin tips (one on each side), and the dorsal fin apex (one or two points). The contour curvature inflection points are the inflection points at the bends of the torso. The number of these key points (3 to 8) is dynamically adjusted according to the posture of the detected target.

[0095] It should be noted that preset keypoint templates can be scaled and merged with keypoint information obtained from deep learning to obtain complete target keypoint information, so as to more accurately depict the outline information of the target.

[0096] The target tilt angle and target curvature are obtained through the following methods:

[0097] The key points of the target are connected to form multiple line segments;

[0098] The perpendicular bisector of each line segment is calculated;

[0099] The target curvature is calculated based on the change in direction of the perpendicular bisectors of adjacent line segments;

[0100] The angle between the main axis of the target body and the optical axis of the camera is defined as the target tilt angle.
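
The steps in paragraphs [0097] to [0100] can be sketched as follows. Several choices are assumptions of the example rather than statements of the source: the bending value is taken as the sum of absolute direction changes between the perpendicular bisectors of adjacent segments (the source does not say how the changes are aggregated or normalised), the principal axis is approximated by the head-to-tail vector in camera coordinates, and the optical axis is taken as (0, 0, 1).

```python
import math


def segment_normals(points):
    """Unit direction of the perpendicular bisector of each segment."""
    normals = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # skip duplicated key points
        normals.append((-dy / length, dx / length))
    return normals


def bending_measure(points):
    """Sum of absolute direction changes (radians) between adjacent bisectors."""
    normals = segment_normals(points)
    total = 0.0
    for (ax, ay), (bx, by) in zip(normals, normals[1:]):
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        total += abs(math.atan2(cross, dot))
    return total


def tilt_angle_deg(head, tail, optical_axis=(0.0, 0.0, 1.0)):
    """Angle in degrees between the head-to-tail axis and the optical axis."""
    axis = tuple(t - h for h, t in zip(head, tail))
    dot = sum(a * o for a, o in zip(axis, optical_axis))
    norm = (math.sqrt(sum(a * a for a in axis))
            * math.sqrt(sum(o * o for o in optical_axis)))
    if norm == 0.0:
        raise ValueError("head and tail must be distinct points")
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))
```

Collinear key points give a bending value of zero, and a single right-angle turn gives π/2. A body lying in a plane of constant depth gives a tilt angle of 90 degrees under this definition.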

[0101] The target data includes the target weight, which is calculated using the following formula:

[0102] ;

[0103] Where w is the target weight; d is the distance between the camera module and the target being detected; u(f)*v(e) is the correction function, which is mainly calculated based on the angle formed by the lines connecting fixed key points; f is the target tilt angle calculated from the key points; e is the target curvature calculated from the key points; thick is the average thickness of the target and ρ is the average density of the target, both obtained from manually collected statistical data for the specific scenario; width is the actual width of the target calculated based on the above conversion relationship; and height is the actual height of the target calculated based on the above conversion relationship. The method for determining the distance between the camera module and the target being detected is well known in the field and is not elaborated upon here.
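
The text leaves the determination of the camera-to-target distance to known techniques. One standard relation for a rectified binocular pair is depth = focal length × baseline / disparity, sketched below for orientation. The parameter names and units are illustrative, and nothing here is taken from the source beyond the fact that a binocular camera is used.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Depth (mm) of a point seen by a rectified stereo pair.

    focal_px     -- focal length in pixels
    baseline_mm  -- distance between the two camera centers in millimeters
    disparity_px -- horizontal pixel offset of the point between the two views
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_mm / disparity_px
```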

[0104] The target data also includes the target volume, which is calculated using the following formula:

[0105] ;

[0106] Where V is the target volume, L is the target length obtained through the nonlinear transformation matrix, B is the target width obtained through the nonlinear transformation matrix, and H is the target height obtained through the nonlinear transformation matrix.

[0107] Alternatively, the following methods can be used for volume calculation:

[0108] ;

[0109] Where V is the target volume, s is the target surface area, and T is the target height or thickness.

[0110] The present invention also provides a system for implementing the method described above, comprising:

[0111] A binocular camera module is used to acquire images;

[0112] An artificial intelligence processing module is used to execute the steps of the method;

[0113] The memory stores instructions that can be executed by the artificial intelligence processing module, which, when executed by the artificial intelligence processing module, cause the system to perform the method described above.

[0114] It is understood that those skilled in the art can combine various implementation methods in the above embodiments under the guidance of the above examples to obtain technical solutions with multiple implementation methods.

[0115] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A target detection calculation method based on binocular vision, characterized in that it includes the following steps: acquiring images of the scene to be detected using a binocular camera; performing image processing on the detected scene image to separate candidate target regions from the background; extracting the edges of the candidate target region; performing texture feature analysis on the candidate target regions after edge extraction to select the final target regions; determining multiple key points based on the contour of the final target region and calculating the pose parameters of the target; establishing a conversion relationship from image pixel distance to real physical distance based on the imaging model and calibration parameters of the binocular camera; obtaining the true physical dimensions of the target based on the conversion relationship; and calculating the target's data by combining its true physical dimensions and pose parameters; wherein the pose parameters include at least the target tilt angle and the target curvature; the target tilt angle is defined as the angle between the principal axis of the target body and the optical axis of the camera; the target curvature is obtained in the following way: the key points of the target are connected to form multiple line segments; the perpendicular bisector of each line segment is calculated; and the target curvature is calculated based on the change in direction of the perpendicular bisectors of adjacent line segments; the target data includes the target weight, which is calculated using the following formula: ; where w is the target weight, d is the distance between the camera module and the target being detected, u(f)*v(e) is the correction function, f is the target tilt angle calculated from the key points, e is the target curvature calculated from the key points, thick is the average thickness of the target, ρ is the average density of the target, width is the actual width of the target calculated based on the conversion relationship, and height is the actual height of the target calculated based on the conversion relationship.

2. The target detection calculation method based on binocular vision as described in claim 1, characterized in that performing image processing on the detected scene image includes performing color segmentation on the image using an algorithm that fuses the HSV and RGB color spaces; the algorithm that fuses the HSV and RGB color spaces includes: setting, in the HSV color space, a threshold range corresponding to the target color for preliminary segmentation; and analyzing the brightness distribution in the RGB color space and cross-validating the preliminary segmentation result of the HSV color space to filter out background areas that do not conform to the brightness characteristics.
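
As a rough illustration of the HSV and RGB fusion in claim 2, the sketch below thresholds each pixel in HSV and then keeps it only if its RGB brightness also falls in an expected range. The function name, the threshold parameters, and the use of Rec. 601 luma as the brightness measure are assumptions; the claim does not specify how the brightness distribution is analyzed, so the brightness range is simply passed in.

```python
import colorsys

def fused_color_mask(pixels, hue_range, min_sat, min_val, lum_range):
    """pixels: rows of (r, g, b) tuples in 0-255. Returns rows of booleans.

    Step 1 (HSV): preliminary segmentation by hue, saturation, value.
    Step 2 (RGB): cross-validation against an expected brightness range.
    """
    mask = []
    for row in pixels:
        out = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hsv_ok = (hue_range[0] <= h <= hue_range[1]
                      and s >= min_sat and v >= min_val)
            # Rec. 601 luma computed from RGB (assumed brightness measure)
            lum = 0.299 * r + 0.587 * g + 0.114 * b
            out.append(hsv_ok and lum_range[0] <= lum <= lum_range[1])
        mask.append(out)
    return mask
```

A dark red pixel such as (60, 5, 5) passes a red hue threshold but is rejected by a brightness range of 40 to 200, which is the kind of background filtering the cross-validation step describes.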

3. The target detection calculation method based on binocular vision as described in claim 1, characterized in that the edges of the candidate target region are extracted using a dynamic threshold edge detection algorithm, the dynamic threshold edge detection algorithm being a dynamic Canny algorithm comprising: dynamically calculating the high threshold of the Canny algorithm based on the average gray value of local regions of the image, and setting the low threshold to 1/3 to 1/2 of the high threshold; and performing morphological operations on the edge detection result to remove noise and connect broken edges.
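
The threshold selection in claim 3 might look like the following sketch, which covers only the threshold computation and not the Canny detector or the morphological step. The linear relation between the local mean gray value and the high threshold (the `gain` parameter), the block size, and the function names are assumptions; the claim states only that the high threshold depends on the local average gray value and that the low threshold is 1/3 to 1/2 of it.

```python
def block_mean(gray, top, left, size):
    """Mean gray value of one local block of a 2D list of 0-255 values."""
    vals = [v for row in gray[top:top + size] for v in row[left:left + size]]
    return sum(vals) / len(vals)

def dynamic_canny_thresholds(gray, block=8, gain=1.0, ratio=0.5):
    """Return {(top, left): (low, high)} for each local block.

    high = gain * local mean gray value, capped at 255 (assumed relation);
    low  = ratio * high, with ratio restricted to [1/3, 1/2] as in the claim.
    """
    if not (1 / 3 <= ratio <= 1 / 2):
        raise ValueError("ratio must lie between 1/3 and 1/2")
    thresholds = {}
    for top in range(0, len(gray), block):
        for left in range(0, len(gray[0]), block):
            high = min(255.0, gain * block_mean(gray, top, left, block))
            thresholds[(top, left)] = (ratio * high, high)
    return thresholds
```

Each (low, high) pair could then be passed to an existing Canny implementation for the corresponding block, followed by a morphological closing to bridge broken edges.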

4. The target detection calculation method based on binocular vision as described in claim 1, characterized in that dynamic LBP texture feature analysis is used to perform the texture feature analysis on the candidate target region after edge extraction, the dynamic LBP texture feature analysis including: dividing the candidate target region into multiple sub-blocks; calculating the LBP histogram and texture statistical features of each sub-block; and comparing the texture features of all sub-blocks with a preset texture consistency threshold to determine whether the region is the target to be detected.
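
The sub-block LBP analysis of claim 4 can be sketched as follows. This is a basic 8-neighbor LBP on a 2D list of gray values; the consistency measure (largest L1 distance between any sub-block histogram and the mean histogram) is an assumption chosen for illustration, since the claim does not define how the sub-block features are compared with the threshold. All function names are hypothetical.

```python
# 8 neighbors, clockwise from the top-left pixel
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, y, x):
    """Basic LBP: one bit per neighbor that is >= the center pixel."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def block_histogram(img, top, left, size):
    """Normalized 256-bin LBP histogram over the interior of one sub-block
    (size must be at least 3 so that the interior is non-empty)."""
    hist = [0] * 256
    for y in range(top + 1, top + size - 1):
        for x in range(left + 1, left + size - 1):
            hist[lbp_code(img, y, x)] += 1
    total = sum(hist)
    return [h / total for h in hist]

def is_texture_consistent(img, block, threshold):
    """Assumed criterion: every sub-block histogram lies within `threshold`
    (L1 distance) of the mean histogram of all sub-blocks."""
    hists = [block_histogram(img, t, l, block)
             for t in range(0, len(img) - block + 1, block)
             for l in range(0, len(img[0]) - block + 1, block)]
    mean = [sum(col) / len(hists) for col in zip(*hists)]
    worst = max(sum(abs(a - b) for a, b in zip(h, mean)) for h in hists)
    return worst <= threshold
```

A region with uniform texture produces identical sub-block histograms and passes any non-negative threshold, while a region that is half flat and half checkerboard is rejected at a small threshold.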

5. The target detection calculation method based on binocular vision as described in claim 1, characterized in that, before determining the multiple key points based on the contour of the final target region, the method further includes: determining the confidence level of the target region; if the confidence level of the target region is lower than a preset threshold, activating a deep learning model for target detection; and using the detection results of the deep learning model to optimize the parameters of the color segmentation, edge detection, and texture feature analysis, forming a closed-loop learning mechanism.

6. The target detection calculation method based on binocular vision as described in claim 1, characterized in that determining the multiple key points based on the contour of the final target region includes: based on a pre-stored key point template, calculating the scaling ratio and transmission relationship between the target and the template, and determining whether the intersection over union (IoU) of the detected target and the template is less than a threshold; if the IoU is less than the threshold, starting a deep learning model to identify the key points, and constructing the target contour based on the identified key points; if the IoU is greater than or equal to the threshold, directly calculating the key points of the target using the key point template, and constructing the target contour; wherein the key points include the anatomical feature points and contour curvature inflection points of the target, and the multiple key points are generated by dynamically matching the preset key point template to construct the skeleton curve of the target.
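
The template branch of claim 6 can be illustrated with a simplified sketch. Several assumptions are made here: the target and the template are compared as axis-aligned bounding boxes, the template is fitted with a single uniform scale that matches the target width, and the deep learning model is represented by a hypothetical callable `model_fallback`. None of these details is fixed by the claim.

```python
def box_area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0

def fit_template_box(template_size, target_box):
    """Uniformly scale the template to the target width and center it
    vertically on the target. Returns (scale, fitted_box)."""
    tw, th = template_size
    x0, y0, x1, y1 = target_box
    scale = (x1 - x0) / tw
    cy = (y0 + y1) / 2
    half_h = th * scale / 2
    return scale, (x0, cy - half_h, x1, cy + half_h)

def key_points(target_box, template_size, template_pts, threshold, model_fallback):
    """Use the template when it overlaps the target well enough;
    otherwise defer to the (hypothetical) learned key point detector."""
    scale, fitted = fit_template_box(template_size, target_box)
    if iou(target_box, fitted) < threshold:
        return model_fallback(target_box)
    return [(fitted[0] + px * scale, fitted[1] + py * scale)
            for px, py in template_pts]
```

With a 100 by 40 template, a 200 by 80 target gives an overlap ratio of 1.0 and the template key points are scaled directly, whereas a 200 by 200 target gives 0.4 and triggers the fallback at a threshold of 0.5.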

7. The target detection calculation method based on binocular vision as described in claim 1, characterized in that establishing the conversion relationship from image pixel distance to real physical distance includes: acquiring images of multiple target samples of known real size and weight at different distances, obtaining their pixel sizes, and recording the distance between the camera and each sample, the real physical size of each target sample being obtained through manual measurement; constructing, in consideration of the effect of light refraction, a nonlinear mapping model that includes refraction correction coefficients and sample statistical coefficients; and fitting the pixel sizes and the real physical sizes using a dynamic kernel function to obtain the parameters of the nonlinear mapping model, thereby establishing the conversion relationship.
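
The calibration in claim 7 is not specified in enough detail to reproduce, so the following is only one possible reading, written as a sketch. It assumes a pinhole-like model in which real size is proportional to pixel size times distance, estimates the proportionality coefficient by Gaussian-kernel-weighted least squares around the query distance (the bandwidth adapts to the spread of the sample distances), and treats refraction as a single multiplicative coefficient. The model form, the kernel, and all names are assumptions.

```python
import math

def fit_scale(samples, query_distance, bandwidth=None):
    """samples: list of (pixel_size, distance, real_size).

    Returns k such that real_size ~ k * pixel_size * distance for samples
    near query_distance, using Gaussian kernel weights on distance.
    """
    dists = [d for _, d, _ in samples]
    if bandwidth is None:
        # Bandwidth derived from the data spread (assumed "dynamic" choice)
        bandwidth = (max(dists) - min(dists)) / 2 or 1.0
    num = den = 0.0
    for pixel, dist, real in samples:
        w = math.exp(-((dist - query_distance) / bandwidth) ** 2 / 2)
        x = pixel * dist
        num += w * x * real      # weighted least squares through the origin
        den += w * x * x
    return num / den

def pixel_to_physical(pixel_size, distance, samples, refraction_coeff=1.0):
    """Convert a pixel size at a given distance to a physical size.
    refraction_coeff is a hypothetical multiplicative correction."""
    return refraction_coeff * fit_scale(samples, distance) * pixel_size * distance
```

When the samples follow the proportional model exactly, the fitted coefficient equals the true one regardless of the kernel weights; the weighting only matters when the relation drifts with distance, which is where a nonlinear mapping would differ from a single global scale.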

8. A system for implementing the target detection calculation method as described in any one of claims 1 to 7, characterized in that it comprises: a binocular camera module for acquiring images; an artificial intelligence processing module for executing the steps of the method; and a memory storing instructions executable by the artificial intelligence processing module, which, when executed by the artificial intelligence processing module, cause the system to perform the method as described in any one of claims 1 to 7.