An agricultural target detection dataset construction method, system and electronic device

By employing a hybrid training and feedback optimization approach, combined with the Unreal Engine platform, the problem of significant differences between virtual and real scenes was addressed, improving the quality and recognition accuracy of agricultural target detection datasets, making them suitable for agricultural autonomous driving.

CN116453004B · Active · Publication Date: 2026-06-26 · LOVOL HEAVY IND CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
LOVOL HEAVY IND CO LTD
Filing Date
2023-04-03
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing methods for constructing virtual scenes cannot be effectively applied to agricultural scenarios, resulting in significant differences between virtual datasets and real-world scenarios, which fails to meet the needs of autonomous driving in agriculture.

Method used

The system trains and validates the data using a dataset that mixes virtual and real agricultural scenarios. It builds virtual agricultural scenarios using the Unreal Engine platform and continuously adjusts the virtual scenarios through a feedback optimization process to ensure that they are consistent with real scenarios.

Benefits of technology

It improves the acquisition quality and recognition accuracy of agricultural target detection datasets, enhances the matching degree between virtual and real scenes, and is suitable for agricultural autonomous driving.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application relates to the technical field of agricultural dataset construction, and particularly relates to an agricultural target detection dataset construction method, system and electronic device. The method comprises the following steps: obtaining an initial agricultural target detection dataset based on an agricultural virtual scene; obtaining a trained first model and a trained second model, verifying the trained first model by using a verification dataset to obtain a first verification result, verifying the trained second model by using the verification dataset to obtain a second verification result; when the second verification result is better than the first verification result, optimizing the agricultural virtual scene until a preset condition is met; collecting images of the current agricultural virtual scene to obtain a plurality of target virtual scene images to form a final agricultural target detection dataset. Through continuous feedback optimization of the agricultural virtual scene, the problem of insufficient data in the agricultural scene dataset is solved, and the collection quality is improved.

Description

Technical Field

[0001] This invention relates to the field of agricultural dataset construction technology, and in particular to a method, system and electronic device for constructing an agricultural target detection dataset.

Background Technology

[0002] Artificial intelligence technology is increasingly being applied in modern agriculture. Unmanned machinery based on visual obstacle avoidance can operate efficiently while ensuring personnel safety. Since visual obstacle avoidance is generally based on deep learning methods, and deep learning models are highly dependent on the training dataset, insufficient training data can lead to low recognition rates. For example, if the number of people in the training set is much smaller than the number of cars, the trained model will recognize people significantly less accurately than it recognizes cars. Conversely, if the amount of training data is sufficient but the data distribution is unbalanced, the trained model may produce false positives. For instance, if the number of four-wheeled vehicles in the dataset is much greater than the number of three-wheeled vehicles, the trained model may misidentify three-wheeled vehicles as four-wheeled vehicles. Therefore, the quantity and quality of the dataset used for training deep learning models are extremely important.

[0003] Traditional methods of acquiring image and video datasets primarily rely on manual fieldwork, which cannot guarantee the integrity and uniformity of the acquired datasets. This is especially true when capturing images of specific scenes and weather conditions, making acquisition even more difficult, time-consuming, and costly in terms of manpower and resources. The development of virtual reality (VR) technology has laid the foundation for virtual simulation. Based on both hardware and software, ray tracing technology can simulate lighting and shadow effects under real-world conditions to the greatest extent possible. In particular, Unreal Engine 5's Nanite virtualized micropolygon geometry and Lumen dynamic global illumination deliver high-quality and realistic scene rendering, freeing scene construction from the limitations of real-world conditions. With the development of virtual technology in scene reconstruction, digital twins, and simulation, its application in agriculture is becoming a reality.

[0004] Currently, internationally, corresponding virtual datasets and methods have been constructed for different tasks. These include AirSim, serving the field of autonomous driving; the Benchmark dataset with data labels for visual tasks such as optical flow estimation, semantic segmentation, object detection, and object tracking; the Virtual KITTI dataset for virtual 3D scene construction; and the NVIDIA DRIVE Sim virtual platform built by NVIDIA for the autonomous driving field. However, most of these methods are applied to the field of road-based autonomous driving or are only used for academic research. Their construction methods are not suitable for autonomous driving in agricultural scenarios, and the scene construction lacks the necessary analysis of layout elements and scene optimization. Currently, virtual scene generation methods based on virtual technology mainly involve three areas, specifically:

[0005] 1) The first method is a virtual dataset construction method applied to the field of road-based autonomous driving. Since autonomous driving focuses more on the complexity of road traffic, the layout of objects is given priority when constructing virtual road scenes. The scene constructed by this method has a relatively rich variety of objects such as vehicles, pedestrians, and streetlights, and has a certain simulation of lighting, shadows, and climate. However, it ignores the situation outside the lane and cannot be directly applied to the agricultural field.

[0006] 2) The second type is the virtual scene construction method applied to academic research. This method is mostly used for pattern recognition algorithm verification. Since academic research focuses more on the algorithm's recognition of objects, the virtual scene construction method is relatively simple. Apart from the object to be recognized, the background is mostly solid color or randomly generated, which cannot simulate the lighting, shadows or climate changes in real scenes. The construction method is also difficult to apply to the agricultural field.

[0007] 3) The third type is a virtual scene construction method applied to the fields of design and architecture. This method is mainly used for the overall or detailed construction of buildings and interiors. Although it also involves some outdoor scenery, it cannot achieve a simulation effect. This method cannot be applied to complex agricultural environments.

[0008] In summary, the virtual scene construction methods applied to autonomous driving, academic research, design, and architecture are all tailored to their respective fields. Compared to complex agricultural scenes, virtual scene construction methods for autonomous driving lack models and scene setups relevant to agricultural scenes. Virtual scene construction methods for academic research, in addition to lacking models and scene setups for agricultural scenes, also lack the ability to simulate lighting, shadows, and climate variations in real-world scenes. While virtual scene construction methods for design and architecture achieve good detail and texture reproduction, they similarly lack models and scene setups relevant to agricultural scenes and do not simulate lighting, shadows, and climate variations in real-world scenes. The datasets constructed by these methods lack a feedback correction process, resulting in significant discrepancies between the constructed virtual scenes and real-world scenes, making the collected virtual datasets unsuitable as extensions of real-world datasets.

Summary of the Invention

[0009] The technical problem to be solved by the present invention is to address the shortcomings of the prior art by providing a method, system and electronic device for constructing an agricultural target detection dataset.

[0010] The technical solution of the agricultural target detection dataset construction method of the present invention is as follows:

[0011] S1. Collect images of the agricultural virtual scene to obtain multiple virtual scene images and form an initial agricultural target detection dataset;

[0012] S2. Mix the initial agricultural target detection dataset and the target detection dataset constructed based on real agricultural scenarios in a certain proportion to obtain a training dataset. Train the first model based on the training dataset to obtain a trained first model. Train the second model based on the target detection dataset constructed based on real agricultural scenarios to obtain a trained second model. The first model and the second model are deep learning models with the same network structure.

[0013] S3. Validate the trained first model using the validation dataset to obtain the first validation result. Validate the trained second model using the validation dataset to obtain the second validation result.

[0014] S4. When the second verification result is better than the first verification result, optimize the agricultural virtual scene, return to execute S1, and continue until the preset conditions are met, then execute S5. The preset conditions include: the second verification result is not better than the first verification result.

[0015] S5. Collect images of the current agricultural virtual scene to obtain multiple target virtual scene images, which will form the final agricultural target detection dataset.
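Steps S1 to S5 form a loop that can be sketched as follows. This is a minimal illustration rather than the patented implementation: the three callbacks (`capture`, `train_and_validate`, `optimize_scene`) are hypothetical placeholders for the scene capture, the training and validation of both models, and the scene adjustment, and the `max_rounds` cap is an added safety assumption that the text does not mention.

```python
from typing import Callable, Sequence, Tuple


def build_dataset(
    capture: Callable[[], Sequence[str]],
    train_and_validate: Callable[[Sequence[str]], Tuple[float, float]],
    optimize_scene: Callable[[], None],
    max_rounds: int = 20,  # assumption: safety cap, not part of the described method
) -> Sequence[str]:
    """Run the S1-S5 feedback loop with caller-supplied (hypothetical) callbacks."""
    for _ in range(max_rounds):
        initial = capture()                          # S1: initial virtual dataset
        first, second = train_and_validate(initial)  # S2 + S3: both validation results
        if not second > first:                       # preset condition: second not better
            break
        optimize_scene()                             # S4: adjust the scene, then repeat
    return capture()                                 # S5: final dataset from current scene
```

The loop exits as soon as the model trained on real data alone no longer outperforms the model trained on the mixed data, which is the point at which the virtual images stop lowering accuracy.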

[0016] The technical solution of the agricultural target detection dataset construction system of the present invention is as follows:

[0017] It includes a data acquisition and construction module, a training module, a validation module, and an optimization module;

[0018] The acquisition and construction module is used to: acquire images of agricultural virtual scenes, obtain multiple virtual scene images, and form an initial agricultural target detection dataset;

[0019] The training module is used to: mix the initial agricultural target detection dataset and the target detection dataset constructed based on real agricultural scenarios in a certain proportion to obtain a training dataset; train the first model based on the training dataset to obtain a trained first model; and train the second model based on the target detection dataset constructed based on real agricultural scenarios to obtain a trained second model. The first model and the second model are deep learning models with the same network structure.

[0020] The verification module is used to: verify the trained first model using the verification dataset to obtain a first verification result; and verify the trained second model using the verification dataset to obtain a second verification result.

[0021] The optimization module is used to: when the second verification result is better than the first verification result, optimize the agricultural virtual scene and call the acquisition and construction module, the training module and the verification module again, repeating until the preset conditions are met; then call the acquisition and construction module to acquire images of the current agricultural virtual scene, obtain multiple target virtual scene images, and form the final agricultural target detection dataset. The preset conditions include: the second verification result is not better than the first verification result.

[0022] The present invention provides a storage medium storing instructions that, when read by a computer, cause the computer to execute any of the agricultural target detection dataset construction methods described above.

[0023] An electronic device according to the present invention includes a processor and the above-described storage medium, wherein the processor executes instructions in the storage medium.

[0024] The beneficial effects of this invention are as follows:

[0025] By continuously optimizing the agricultural virtual scene based on feedback, the invention not only solves the problem of insufficient data in agricultural scene datasets, but also further improves the collection quality of the final agricultural target detection dataset.

Attached Figure Description

[0026] Figure 1 is a first flowchart of a method for constructing an agricultural target detection dataset according to an embodiment of the present invention;

[0027] Figure 2 is a second flowchart of a method for constructing an agricultural target detection dataset according to an embodiment of the present invention;

[0028] Figure 3 is a schematic diagram of agricultural scene analysis;

[0029] Figure 4 is a schematic diagram of virtual agricultural scene construction;

[0030] Figure 5 shows renderings of a virtual agricultural scene;

[0031] Figure 6 is a schematic diagram of the scene feedback optimization process;

[0032] Figure 7 is a schematic structural diagram of an agricultural target detection dataset construction system according to an embodiment of the present invention.

Detailed Implementation

[0033] As shown in Figure 1, an embodiment of the present invention provides a method for constructing an agricultural target detection dataset, which includes the following steps:

[0034] S1. Collect images of the agricultural virtual scene to obtain multiple virtual scene images and form an initial agricultural target detection dataset;

[0035] S2. Mix the initial agricultural target detection dataset and the target detection dataset constructed based on real agricultural scenarios in a certain proportion to obtain the training dataset. Train the first model based on the training dataset to obtain the trained first model. Train the second model based on the target detection dataset constructed based on real agricultural scenarios to obtain the trained second model. The first model and the second model are deep learning models with the same network structure.

[0036] S3. Validate the trained first model using the validation dataset to obtain the first validation result. Validate the trained second model using the validation dataset to obtain the second validation result.

[0037] S4. When the second verification result is better than the first verification result, optimize the agricultural virtual scene and return to execute S1 until the preset conditions are met, then execute S5. The preset conditions include: the second verification result is not better than the first verification result;

S5. Collect images of the current agricultural virtual scene to obtain multiple target virtual scene images, which form the final agricultural target detection dataset.

[0038] The first verification result is the recognition accuracy of the trained first model, and the second verification result is the recognition accuracy of the trained second model. When the recognition accuracy of the trained second model is greater than that of the trained first model, the second verification result is deemed superior to the first verification result; when the recognition accuracy of the trained second model is not greater than that of the trained first model, the second verification result is deemed not superior to the first verification result.

Optionally, the above technical solution further includes:

[0039] When returning to execute S1, increase the proportion of the initial agricultural target detection dataset in the training dataset according to the preset step size;

[0040] The preset conditions also include: the proportion of the initial agricultural target detection dataset in the training dataset exceeds a preset proportion threshold.

[0041] The preset step size can be set according to the actual situation, such as 1% or 2%. Taking a preset step size of 1% as an example:

[0042] In the first execution of S2, the initial agricultural object detection dataset and the object detection dataset constructed based on real agricultural scenarios are mixed in a 30:70 ratio to obtain the training dataset; in the second execution of S2, they are mixed in a 31:69 ratio; in the third execution of S2, they are mixed in a 32:68 ratio; and so on.

[0043] In this embodiment, meeting the preset conditions means that the second verification result is not better than the first verification result, and the proportion of the initial agricultural target detection dataset in the training dataset exceeds a preset proportion threshold. The preset proportion threshold can be set according to the actual situation, such as 50% or 60%.
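The ratio schedule and the combined stopping rule of this embodiment can be written down compactly. The sketch below assumes the 30% starting share and 1% step from the example above and a 50% threshold; the function names are illustrative and do not come from the patent.

```python
def virtual_share(iteration: int, start_pct: int = 30, step_pct: int = 1) -> int:
    """Percentage of the training set taken from the virtual dataset on the
    given execution of S2 (iteration 1 is the first execution)."""
    return min(100, start_pct + (iteration - 1) * step_pct)


def preset_conditions_met(
    first_acc: float,
    second_acc: float,
    share_pct: int,
    threshold_pct: int = 50,  # assumption: one of the example thresholds (50% or 60%)
) -> bool:
    """True when the second result is not better than the first AND the
    virtual share exceeds the preset proportion threshold."""
    return (not second_acc > first_acc) and share_pct > threshold_pct
```

Both conditions must hold at once, so a run that reaches accuracy parity early keeps iterating until the virtual share has also passed the threshold.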

[0044] Optionally, in the above technical solution, the process of building an agricultural virtual scene includes:

[0045] An agricultural scene model library, built from basic agricultural scene elements, is used together with the Unreal Engine platform to build the agricultural virtual scene.

[0046] Optionally, in the above technical solution, S1 includes:

[0047] Adjustments are made to the light intensity, direction, color temperature, and weather conditions in the virtual agricultural scene, and images are captured using camera components placed within the virtual agricultural scene.
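One way to make the adjustment of lighting and weather systematic is to enumerate capture conditions before rendering. The sketch below is engine-agnostic and does not call any Unreal Engine API; the field names, units and value ranges are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class CaptureCondition:
    light_intensity: float    # assumption: relative scale, 1.0 = nominal daylight
    sun_angle_deg: float      # assumption: light direction reduced to one angle
    color_temperature_k: int  # color temperature in kelvin
    weather: str              # e.g. "clear", "rain", "snow"


def condition_grid(
    intensities: Iterable[float],
    angles: Iterable[float],
    temperatures: Iterable[int],
    weathers: Iterable[str],
) -> List[CaptureCondition]:
    """Enumerate every combination of the supplied condition values."""
    return [
        CaptureCondition(*combo)
        for combo in product(intensities, angles, temperatures, weathers)
    ]
```

Each resulting condition would then be applied to the scene before the in-scene camera component captures images, so that every lighting and weather combination is represented in the dataset.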

[0048] Optionally, in the above technical solution, the deep learning model is the YOLOv5 model.

[0049] The present invention provides a method for constructing an agricultural target detection dataset, which mainly includes agricultural scene analysis, construction of a virtual agricultural scene, and scene feedback optimization. Specifically:

[0050] As shown in Figure 2, agricultural scene analysis provides the agricultural scene model library needed to construct agricultural virtual scenes; the Unreal Engine platform, as the construction platform, reproduces the real agricultural scene as closely as possible based on that library. In the completed agricultural virtual scene, the camera component built into the engine simulates image acquisition under real conditions. Virtual scene images are obtained from the images and videos captured by the camera and form the initial agricultural target detection dataset.

[0051] The initial agricultural target detection dataset was selected and target recognition detection boxes were labeled according to the real dataset, i.e., the target detection dataset constructed based on real agricultural scenarios. The initial agricultural target detection dataset and the real dataset were combined in proportion to form the agricultural target detection dataset, i.e., the training dataset. Then, the first model was trained based on the training dataset to obtain the trained first model. The second model was trained based on the target detection dataset constructed based on real agricultural scenarios to obtain the trained second model. The first model and the second model are deep learning models with the same network structure. S3 was used for inference verification. Based on the verification results, i.e. the first verification result and the second verification result, the agricultural virtual scene was optimized and adjusted manually to complete the scene feedback optimization.

[0052] Furthermore, when returning to execute S1, the proportion of the initial agricultural target detection dataset in the training dataset is increased according to a preset step size. When the second validation result is not better than the first validation result, and the proportion of the initial agricultural target detection dataset in the training dataset exceeds a preset proportion threshold, the optimal agricultural virtual scene, i.e., the current agricultural virtual scene in S5, is determined. Images of the current agricultural virtual scene are then acquired to obtain multiple target virtual scene images, forming the final agricultural target detection dataset. The specific steps include the following:

[0053] S100. Obtain basic agricultural scenario elements through agricultural scenario analysis and construct an agricultural scenario model library:

[0054] As shown in Figure 3, elements such as people, vehicles, animals, plants, buildings, terrain, weather, and other facilities in real agricultural scenes are extracted from example images or videos. People include elements such as gender, clothing color, and shape; vehicles include elements such as motor vehicles, non-motor vehicles, vehicle type, color, and texture; animals include elements such as type, color, and shape; plants include crops and other plants, with crops including elements such as type, texture, color, growth status, and shape, and other plants including elements such as type, color, and texture; buildings include elements such as height, color, and texture; terrain includes elements such as soil color, texture, and degree of unevenness, and rock color and texture; weather includes elements such as light intensity, direction, rain, and snow; other facilities include facilities such as roads, streetlights, fire hydrants, and bridges, as well as their colors and shapes.

[0055] All the aforementioned elements together constitute the basic agricultural scene elements. This also determines the relative positions of objects in a real agricultural scene. Using these basic elements, an agricultural scene model library is built. This library contains models of people, vehicles, animals, plants, buildings, and other facilities needed to construct virtual agricultural scenes. The tools used for model creation include commonly used 3D modeling software such as Autodesk Maya, 3D Studio Max, and Blender. Models are created based on previously obtained color, texture, and other relevant information, and the output models are in mainstream 3D model formats such as OBJ and FBX. To accelerate model generation, the agricultural scene model library also includes existing models provided by third parties. Furthermore, pre-created models are saved for reuse, improving the efficiency of virtual agricultural scene production.

[0056] S101, Construction of Virtual Agricultural Scenes:

[0057] As shown in Figure 4, an agricultural virtual scene is built using the Unreal Engine platform; Figure 5 shows the resulting virtual agricultural scene. The Unreal Engine platform provides Blueprint visual scripting, allowing scene construction to be completed using existing scene elements and facilitating rapid scene creation. The engine's Nanite virtualized micropolygon geometry system and virtual shadow maps maintain real-time frame rates without significant distortion while preserving cinematic-level scene quality. Lumen dynamic global illumination creates more realistic lighting and reflection effects, enabling the built virtual scene to simulate real-world conditions as closely as possible. Therefore:

[0058] First, the terrain is edited in the Unreal Engine platform based on the terrain element information in the basic agricultural scene elements. Then, relevant models such as people, animals, vehicles, plants, buildings, and other facilities are obtained from the agricultural scene model library and placed in the scene to complete the scene construction. Finally, the light intensity, direction, color temperature, and other weather conditions in the virtual scene are adjusted based on the weather element information in the basic agricultural scene elements. Images of the virtual scene are acquired by placing the engine's built-in camera component in the agricultural virtual scene. During the acquisition process, adjustments can be made based on the acquisition height, speed, and whether there is shaking. Object detection and annotation are performed on the acquired virtual scene images, and the initial agricultural object detection dataset based on the Unreal Engine platform is completed.
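The annotation format is not specified here. Assuming the YOLOv5 model named elsewhere in this description, whose tooling conventionally reads one text line per object (class index followed by center x, center y, width and height, all normalized to the image size), a pixel-space bounding box exported from the virtual scene could be converted as in this sketch. The function name and the pixel box layout are illustrative assumptions.

```python
from typing import Tuple


def to_yolo_label(
    class_id: int,
    box: Tuple[float, float, float, float],  # assumption: (x_min, y_min, x_max, y_max) in pixels
    image_w: int,
    image_h: int,
) -> str:
    """Convert a pixel bounding box into a normalized YOLO-style label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_w
    y_center = (y_min + y_max) / 2 / image_h
    width = (x_max - x_min) / image_w
    height = (y_max - y_min) / image_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

Because object positions in a virtual scene are known to the engine, labels of this kind can in principle be produced without manual drawing, although the description itself only states that detection and annotation are performed.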

[0059] S102, Scene Feedback Optimization:

[0060] The initial agricultural object detection dataset and the object detection dataset constructed based on real agricultural scenes are mixed in a 3:7 ratio to form the training dataset. For example, if the training dataset contains 10,000 images, 3,000 images are obtained from the initial agricultural object detection dataset and 7,000 images are obtained from the object detection dataset constructed based on real agricultural scenes. The first and second models are YOLOv5 models with the same network structure. Further explanation follows:

[0061] The YOLOv5 model is trained based on the training dataset to obtain the trained YOLOv5 model, i.e., the first trained model. The YOLOv5 model is trained based on the target detection dataset constructed from real agricultural scenarios to obtain the second trained YOLOv5 model.
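The 3:7 mix used to train the first model can be sketched as a simple sampling step. This is an illustration under assumptions: images are represented by file names, sampling is uniform without replacement, and the fixed seed is only there to make the mix reproducible.

```python
import random
from typing import List, Sequence


def mix_datasets(
    virtual: Sequence[str],
    real: Sequence[str],
    total: int,
    virtual_pct: int,
    seed: int = 0,  # assumption: fixed seed for a reproducible mix
) -> List[str]:
    """Draw `total` images, `virtual_pct` percent from the virtual dataset
    and the remainder from the real dataset."""
    n_virtual = total * virtual_pct // 100
    n_real = total - n_virtual
    if n_virtual > len(virtual) or n_real > len(real):
        raise ValueError("not enough source images for the requested mix")
    rng = random.Random(seed)
    return rng.sample(list(virtual), n_virtual) + rng.sample(list(real), n_real)
```

With `total=10000` and `virtual_pct=30` this yields the 3,000 virtual and 7,000 real images of the example. The second model would be trained on the real dataset only, using the same network structure.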

[0062] The first trained model was validated using a validation dataset containing 2,000 real agricultural scenes to obtain the first validation result. The second trained model was then validated using the validation dataset to obtain the second validation result.

[0063] The first verification result is the recognition accuracy of the trained first model, and the second verification result is the recognition accuracy of the trained second model. When the recognition accuracy of the trained second model is greater than that of the trained first model, the second verification result is considered superior to the first verification result. This indicates that similar data is lacking in the agricultural virtual scene. Feedback to necessary scene elements allows for adjustments to information such as clothing colors based on the real scene, thereby altering the performance of character models in the scene model library. Feedback to the Unreal Engine platform increases the number of distant characters, and then virtual scene images are collected again for training and verification. Furthermore, when virtual scene images are collected again, the proportion of the initial agricultural target detection dataset in the training dataset is increased by a preset step size. This process continues until preset conditions are met, at which point step S103 is executed. These preset conditions include: the second verification result is not superior to the first verification result, and the proportion of the initial agricultural target detection dataset in the training dataset exceeds a preset threshold.
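The description compares a single recognition accuracy per model. As an assumption beyond the text, if per-class accuracies are also available from validation, the classes where the real-only model leads the mixed-data model by the widest margin are natural candidates for scene adjustment. The sketch below ranks them; the class names and the `min_gap` parameter are hypothetical.

```python
from typing import Dict, List


def classes_needing_attention(
    first_acc: Dict[str, float],   # per-class accuracy of the mixed-data (first) model
    second_acc: Dict[str, float],  # per-class accuracy of the real-only (second) model
    min_gap: float = 0.0,          # assumption: ignore gaps at or below this value
) -> List[str]:
    """Return classes where the second model beats the first, largest gap first."""
    gaps = {c: second_acc[c] - first_acc.get(c, 0.0) for c in second_acc}
    flagged = [c for c, gap in gaps.items() if gap > min_gap]
    return sorted(flagged, key=lambda c: gaps[c], reverse=True)
```

A ranking like this only points to where virtual data may be lacking; deciding what to change in the scene, such as clothing colors or the number of distant people, remains a manual step as described above.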

[0064] S103. Corresponding to S5, image acquisition is performed on the current virtual agricultural scene to obtain multiple target virtual scene images, forming the final agricultural target detection dataset. After feedback optimization, the inference features of the virtual agricultural scene are quite similar to those of the real agricultural scene at the target detection task level. In subsequent scene expansion, the preliminary optimization process can be used as a guiding principle to improve the efficiency and quality of scene expansion.

[0065] In the above embodiments, although the steps are numbered S1, S2, etc., they are only specific embodiments given in this application. Those skilled in the art can adjust the execution order of S1, S2, etc. according to the actual situation, which is also within the protection scope of this invention. It can be understood that in some embodiments, some or all of the above embodiments may be included.

[0066] As shown in Figure 7, an agricultural target detection dataset construction system 200 according to an embodiment of the present invention includes a data acquisition and construction module 210, a training module 220, a verification module 230 and an optimization module 240.

[0067] The acquisition and construction module 210 is used to: acquire images of agricultural virtual scenes, obtain multiple virtual scene images, and form an initial agricultural target detection dataset;

[0068] The training module 220 is used to: mix the initial agricultural target detection dataset and the target detection dataset constructed based on real agricultural scenarios in a proportional manner to obtain a training dataset; train the first model based on the training dataset to obtain a trained first model; and train the second model based on the target detection dataset constructed based on real agricultural scenarios to obtain a trained second model. The first model and the second model are deep learning models with the same network structure.

[0069] The verification module 230 is used to: verify the trained first model using the verification dataset to obtain a first verification result; and verify the trained second model using the verification dataset to obtain a second verification result.

[0070] The optimization module 240 is used to: when the second verification result is better than the first verification result, optimize the agricultural virtual scene and call the acquisition and construction module 210, the training module 220 and the verification module 230 again, repeating until the preset conditions are met; then call the acquisition and construction module 210 to acquire images of the current agricultural virtual scene, obtain multiple target virtual scene images, and form the final agricultural target detection dataset. The preset conditions include: the second verification result is not better than the first verification result.

[0071] Optionally, in the above technical solution, when the optimization module 240 calls the acquisition and construction module 210, the training module 220 and the verification module 230, the training module 220 increases the proportion of the initial agricultural target detection dataset in the training dataset according to a preset step size.

[0072] The preset conditions also include: the proportion of the initial agricultural target detection dataset in the training dataset exceeds a preset proportion threshold.

[0073] Optionally, the above technical solution also includes a scene building module, which is used for:

[0074] Building the agricultural virtual scene using an agricultural scene model library, built from basic agricultural scene elements, together with the Unreal Engine platform.

[0075] Optionally, in the above technical solution, the data acquisition and construction module 210 is specifically used for:

[0076] Adjustments are made to the light intensity, direction, color temperature, and weather conditions in the virtual agricultural scene, and images are captured using camera components placed within the virtual agricultural scene.

[0077] Optionally, in the above technical solution, the deep learning model is the YOLOv5 model.

[0078] For the parameters and steps by which each unit module of the agricultural target detection dataset construction system 200 described above implements its functions, refer to the parameters and steps in the embodiments of the agricultural target detection dataset construction method described above; they are not repeated here.

[0079] An embodiment of the present invention provides a storage medium that stores instructions which, when read by a computer, cause the computer to execute any of the above-mentioned methods for constructing an agricultural target detection dataset.

[0080] An electronic device according to an embodiment of the present invention includes a processor and the aforementioned storage medium. The processor executes the instructions in the storage medium. The electronic device may be a computer, a mobile phone, or the like, and its program may be computer software, a mobile app, or the like.

[0081] Those skilled in the art will know that this invention can be implemented as a system, method, or computer program product.

[0082] Therefore, this disclosure can be implemented in the following forms: it can be entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software, generally referred to herein as a "circuit," "module," or "system." Furthermore, in some embodiments, the invention can also be implemented as a computer program product in one or more computer-readable media, the computer-readable medium containing computer-readable program code.

[0083] Any combination of one or more computer-readable media may be used. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.

[0084] Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims

1. A method for constructing an agricultural target detection dataset, characterized in that it comprises:

S1. collecting images of an agricultural virtual scene to obtain multiple virtual scene images, which form an initial agricultural target detection dataset;

S2. mixing the initial agricultural target detection dataset with a target detection dataset constructed based on real agricultural scenarios in a certain proportion to obtain a training dataset; training a first model based on the training dataset to obtain a trained first model; and training a second model based on the target detection dataset constructed based on real agricultural scenarios to obtain a trained second model, wherein the first model and the second model are deep learning models with the same network structure;

S3. verifying the trained first model using a verification dataset to obtain a first verification result, and verifying the trained second model using the verification dataset to obtain a second verification result;

S4. when the second verification result is better than the first verification result, optimizing the agricultural virtual scene and returning to execute S1 until preset conditions are met, then executing S5, wherein the preset conditions include: the second verification result is not better than the first verification result;

S5. collecting images of the current agricultural virtual scene to obtain multiple target virtual scene images, which form the final agricultural target detection dataset;

wherein the method further comprises: when returning to execute S1, increasing the proportion of the initial agricultural target detection dataset in the training dataset according to a preset step size; and the preset conditions also include: the proportion of the initial agricultural target detection dataset in the training dataset exceeds a preset proportion threshold.

2. The method for constructing an agricultural target detection dataset according to claim 1, characterized in that the process of building the agricultural virtual scene includes: building the agricultural virtual scene using an agricultural scene model library constructed based on basic agricultural scene elements and the Unreal Engine platform.

3. The method for constructing an agricultural target detection dataset according to claim 1, characterized in that S1 includes: adjusting the light intensity, light direction, color temperature, and weather conditions in the agricultural virtual scene, and acquiring images with a camera component placed in the agricultural virtual scene.

4. The method for constructing an agricultural target detection dataset according to any one of claims 1 to 3, characterized in that the deep learning model is the YOLOv5 model.

5. A system for constructing an agricultural target detection dataset, characterized in that it comprises an acquisition and construction module, a training module, a verification module, and an optimization module;

the acquisition and construction module is used to: acquire images of an agricultural virtual scene, obtain multiple virtual scene images, and form an initial agricultural target detection dataset;

the training module is used to: mix the initial agricultural target detection dataset with a target detection dataset constructed based on real agricultural scenarios in a certain proportion to obtain a training dataset; train a first model based on the training dataset to obtain a trained first model; and train a second model based on the target detection dataset constructed based on real agricultural scenarios to obtain a trained second model, wherein the first model and the second model are deep learning models with the same network structure;

the verification module is used to: verify the trained first model using a verification dataset to obtain a first verification result; and verify the trained second model using the verification dataset to obtain a second verification result;

the optimization module is used to: optimize the agricultural virtual scene when the second verification result is better than the first verification result; call the acquisition and construction module, the training module and the verification module until preset conditions are met; and then call the acquisition and construction module to acquire images of the current agricultural virtual scene, obtain multiple target virtual scene images, and form the final agricultural target detection dataset, wherein the preset conditions include: the second verification result is not better than the first verification result;

wherein, when the optimization module calls the acquisition and construction module, the training module and the verification module, the training module increases the proportion of the initial agricultural target detection dataset in the training dataset according to a preset step size; and the preset conditions also include: the proportion of the initial agricultural target detection dataset in the training dataset exceeds a preset proportion threshold.

6. The agricultural target detection dataset construction system according to claim 5, characterized in that it further comprises a scene building module, which is used to: build the agricultural virtual scene using an agricultural scene model library constructed based on basic agricultural scene elements and the Unreal Engine platform.

7. A storage medium, characterized in that the storage medium stores instructions that, when read by a computer, cause the computer to execute the method for constructing an agricultural target detection dataset according to any one of claims 1 to 4.

8. An electronic device, characterized in that it comprises a processor and the storage medium of claim 7, wherein the processor executes the instructions in the storage medium.