Method and system for detecting object in remote detection image

The Squirrel Search Algorithm optimizes YOLOv8 model hyperparameters, addressing computational inefficiencies and improving object detection accuracy by up to 12% in remote sensing images.

WO2026142304A1 · PCT designated stage · Publication Date: 2026-07-02 · CHANGWON NATIONAL UNIVERSITY INDUSTRY ACADEMY COOPERATION CORPS

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
CHANGWON NATIONAL UNIVERSITY INDUSTRY ACADEMY COOPERATION CORPS
Filing Date
2025-12-24
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Current object detection models such as the YOLO series face challenges in hyperparameter optimization, which requires significant computational resources and time. Existing meta-heuristic algorithms suffer from limited search diversity and performance constraints, leading to suboptimal model performance.

Method used

The Squirrel Search Algorithm (SSA) is applied to optimize hyperparameters of the YOLOv8 model, enhancing search capabilities by mimicking the foraging behavior of flying squirrels, allowing for more accurate object detection in remote sensing images.

Benefits of technology

The SSA-based optimization improves the YOLOv8 model's performance by up to 12% in mean average precision, demonstrating superior detection accuracy and reduced false positives across various IoU thresholds.

✦ Generated by Eureka AI based on patent content.


Abstract

Disclosed are a method and system for detecting an object in a remote detection image. The method for detecting an object according to an embodiment comprises the steps of: loading a pre-trained you only look once (YOLO) model according to a hyperparameter tuned on the basis of a squirrel search algorithm (SSA); receiving an image for object detection; and providing information about an object included in the image and output from the YOLO model by inputting the image to the YOLO model.

Description

Method and System for Object Detection in Remote Detection Images

[0001] This invention is the result of research conducted as part of the Phase 3 Industry-Academic Cooperation Leading University Development Project (LINC 3.0), funded by the Ministry of Education and the National Research Foundation of Korea.

[0002] The following description relates to a method and system for object detection in remote detection images.

[0003] Object detection is a critical task in remote sensing image analysis and is utilized across various industries. It is particularly useful in satellite imagery, surveillance cameras, and drone footage. However, because these object detection technologies require not only identifying the category to which an object of interest belongs but also locating the object using bounding boxes, the task becomes difficult and the algorithm requirements become more complex. Currently, one of the most widely used models in the field of object detection is the YOLO (You Only Look Once) series. While this model has the advantage of detecting multiple objects within an image simultaneously in real time, its performance depends heavily on parameter settings. Hyperparameter optimization is a technique that requires a deep understanding of the model, as well as sufficient time and computational resources. Comparing all possible hyperparameters consumes significant computational time and cost, and the final model can become excessively complex. Therefore, various algorithms have been developed to efficiently find optimal hyperparameters within limited resources and time in practical applications.

[0004] Meta-heuristic algorithms are widely used to solve complex optimization problems by mimicking natural phenomena. They find optimal solutions through various search strategies, including Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC), Bat Algorithm (BA), and Firefly Algorithm (FF). However, these algorithms have limitations, such as a lack of search diversity or performance constraints. To address these limitations, the Squirrel Search Algorithm (SSA) enhances search capabilities by introducing multi-stage strategies, random movement, and seasonal conditions. In particular, by adapting to diverse environmental conditions, it can efficiently explore a wider space without getting stuck in local optima. This approach can maximize model performance by identifying global optima.

[0005] Research on hyperparameter optimization methods for object recognition is ongoing. For example, YOLOv5 optimization research based on genetic algorithms (GA) seeks the combination of hyperparameters that maximizes the fitness function by generating new individuals through crossover and mutation operators. In particular, mutation is applied with an 80% probability to aid optimization. As a result of this optimization, mAP and recall improved, but precision was somewhat lower. The study "Real-Time Flying Object Detection with YOLOv8" proposes a method to tune hyperparameters using a Bayesian optimization technique based on the W&B (Weights & Biases) platform. This technique targets mAP50 by optimizing the learning rate, batch size, and image size, significantly improving the detection accuracy of YOLOv8.

[0006] A method and system for detecting objects in remote detection images are provided.

[0007] An object detection method of an object detection system implemented by at least one computer device, wherein the at least one computer device includes at least one processor, and the object detection method comprises: a step of loading a YOLO (You Only Look Once) model pre-trained according to hyperparameters tuned based on a SSA (Squirrel Search Algorithm) by the at least one processor; a step of receiving an image for object detection by the at least one processor; and a step of inputting the image to the YOLO model by the at least one processor to provide information about an object included in the image output by the YOLO model.

[0008] According to one aspect, the YOLO model may be characterized by including a CIoU (Complete Intersection over Union) loss function, a DFL (Distribution Focal Loss) loss function, and a semantic segmentation model that predicts a semantic segmentation mask for an image.

[0009] According to another aspect, the YOLO model may be characterized by being pre-trained according to a hyperparameter selected through a goodness-of-fit evaluation of the hyperparameter by matching the hyperparameter to the position of the squirrel used in the SSA.

[0010] According to another aspect, the hyperparameters may be characterized by including at least one of a learning rate, a batch size, and an epoch.

[0011] According to another aspect, the goodness-of-fit evaluation is based on the goodness-of-fit according to a goodness-of-fit function, and the goodness-of-fit function may be characterized by including precision, recall, average precision, and loss indicator as terms.

[0012] According to another aspect, the loss indicator may be characterized by including at least one of a bounding box loss representing an error occurring in predicting the location and size of a bounding box, a classification loss representing an error occurring when the YOLO model predicts the class label of each object, and a deformation loss representing a loss used to process shape deformation of an object.

[0013] According to another aspect, the average precision may be characterized by including at least one of a first average precision calculated based on the case where the Intersection over Union (IoU) is 0.5 or higher, and a second average precision calculated at several thresholds where the IoU increases from 0.5 to 0.95 in increments of 0.05.

[0014] A learning method for a learning system implemented by at least one computer device, wherein the at least one computer device includes at least one processor, and the learning method comprises: a step of preprocessing a dataset including a plurality of learning images by the at least one processor; a step of tuning hyperparameters of a YOLO (You Only Look Once) model based on a SSA (Squirrel Search Algorithm) by the at least one processor; and a step of training the YOLO model according to the tuned hyperparameters using the preprocessed dataset.

[0015] According to one aspect, the YOLO model may be characterized by including a CIoU (Complete Intersection over Union) loss function, a DFL (Distribution Focal Loss) loss function, and a semantic segmentation model that predicts a semantic segmentation mask for an image.

[0016] According to another aspect, the step of tuning the hyperparameters may be characterized by selecting the hyperparameters through an evaluation of the goodness of fit for the hyperparameters by matching the hyperparameters to the positions of the squirrels used in the SSA, and the step of training the YOLO model may be characterized by training the YOLO model according to the selected hyperparameters.

[0017] According to another aspect, the hyperparameters may be characterized by including at least one of a learning rate, a batch size, and an epoch.

[0018] According to another aspect, the goodness-of-fit evaluation is based on the goodness-of-fit according to a goodness-of-fit function, and the goodness-of-fit function may be characterized by including precision, recall, average precision, and loss indicator as terms.

[0019] According to another aspect, the loss indicator may be characterized by including at least one of a bounding box loss representing an error occurring in predicting the location and size of a bounding box, a classification loss representing an error occurring when the YOLO model predicts the class label of each object, and a deformation loss representing a loss used to process shape deformation of an object.

[0020] According to another aspect, the average precision may be characterized by including at least one of a first average precision calculated based on the case where the Intersection over Union (IoU) is 0.5 or higher, and a second average precision calculated at several thresholds where the IoU increases from 0.5 to 0.95 in increments of 0.05.

[0021] A computer program stored on a computer-readable recording medium is provided to be combined with a computer device to execute the above method on the computer device.

[0022] A computer-readable recording medium is provided on which a computer program for executing the above method is recorded on a computer device.

[0023] An object detection system implemented by at least one computer device, wherein the at least one computer device includes at least one processor, and the at least one processor loads a YOLO (You Only Look Once) model pre-trained according to hyperparameters tuned based on the SSA (Squirrel Search Algorithm), receives an image for object detection, inputs the image to the YOLO model, and provides information about an object contained in the image output by the YOLO model.

[0024] A learning system implemented by at least one computer device, wherein the at least one computer device includes at least one processor, and the at least one processor preprocesses a dataset including a plurality of training images, tunes the hyperparameters of a YOLO (You Only Look Once) model based on a Squirrel Search Algorithm (SSA), and trains the YOLO model according to the tuned hyperparameters using the preprocessed dataset.

[0025] A method and system for object detection in remote detection images can be provided.

[0026] FIG. 1 is a diagram illustrating an example of an overview of a system for optimizing a YOLO model and a system for object detection using an optimized YOLO model in one embodiment of the present invention.

[0027] FIG. 2 is a flowchart illustrating an example of the optimization process of a YOLO model in one embodiment of the present invention.

[0028] FIG. 3 is a diagram illustrating an example of 20 object classes included in a DIOR data set in an experimental example of the present invention.

[0029] FIG. 4 is a diagram illustrating an example of a data set for learning and verification in one experimental example of the present invention.

[0030] FIG. 5 is a diagram illustrating an example of a test data set in one experimental example of the present invention.

[0031] FIG. 6 is a diagram illustrating an example of a prediction result of remote sensing object detection in an embodiment of the present invention.

[0032] FIGS. 7 to 11 are drawings illustrating examples of the results of model learning in one experimental example of the present invention.

[0033] FIG. 12 is a flowchart illustrating an example of a learning method of a learning system according to an embodiment of the present invention.

[0034] FIG. 13 is a flowchart illustrating an example of an object detection method of an object detection system according to an embodiment of the present invention.

[0035] FIG. 14 is a block diagram illustrating an example of a computer device according to an embodiment of the present invention.

[0036] Hereinafter, embodiments will be described in detail with reference to the attached drawings.

[0037] In embodiments of the present invention, hyperparameter optimization of a YOLO (You Only Look Once) model (e.g., YOLOv8 model) can be performed by applying the Squirrel Search Algorithm (SSA), a nature-inspired metaheuristic optimization technique. Optimization using SSA enables more accurate object detection in various scenes, and the process and results can be analyzed in detail to improve the performance of remote object detection.

[0038] Structure of the YOLOv8 network

[0039] As explained above, in the embodiments of the present invention, the performance of YOLOv8 can be improved by using SSA to optimize the hyperparameters of the YOLOv8 model.

[0040] FIG. 1 is a diagram illustrating an example of a schematic view of a system for optimizing a YOLO model and a system for object detection using an optimized YOLO model in an embodiment of the present invention. In the embodiment of FIG. 1, a learning system (110) and an object detection system (120) are shown. Each of the learning system (110) and the object detection system (120) can be implemented by at least one computer device.

[0041] The learning system (110) may include a data preprocessing unit (111), a parameter tuning unit (112), and a model training unit (113). The data preprocessing unit (111) may preprocess training images that are inputs (130) to the learning system (110). For example, annotation files within a directory of training data containing training images may be in a format incompatible with the YOLO model (e.g., Pascal VOC (Pascal Visual Object Classes) format), and the data preprocessing unit (111) may convert the format of such annotation files into the YOLO format. Additionally, the parameter tuning unit (112) may tune the hyperparameters of the YOLO model using SSA. The model training unit (113) may train the YOLO model according to the tuned hyperparameters using the preprocessed training images. As previously described, the learning system (110) may be implemented by at least one computer device, and the at least one computer device may include at least one processor. Here, the data preprocessing unit (111), the parameter tuning unit (112), and the model training unit (113) may be functional expressions of at least one processor.
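The annotation conversion performed by the data preprocessing unit (111) can be illustrated with a minimal sketch. It assumes Pascal VOC boxes given as pixel corner coordinates and the usual YOLO label layout of a class index followed by a normalized center and size; the function names `voc_to_yolo` and `yolo_label_line` are hypothetical and are not taken from the patent.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a Pascal VOC box (pixel corners) to YOLO form (normalized center and size)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box, img_w, img_h):
    """Format one object as a line of a YOLO label file: 'class xc yc w h'."""
    xc, yc, w, h = voc_to_yolo(*box, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Example: a 200 x 200 pixel box inside an 800 x 800 image.
print(yolo_label_line(3, (100, 200, 300, 400), 800, 800))
```

In practice the corner coordinates and class names would be read from each VOC XML file, and one label file would be written per image.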

[0042] The object detection system (120) may include a model loading unit (121) and an object detection unit (122). The model loading unit (121) may load a YOLO model, which is pre-trained and provided by the learning system (110), into the memory of at least one computer device implementing the object detection system (120). The object detection unit (122) may receive an image input (140) and input the input image into the loaded YOLO model to detect objects contained in the image. The output (150) provided through the object detection unit (122) may include information about objects detected in the image that is the input (140). As previously described, the object detection system (120) may be implemented by at least one computer device, and at least one computer device may include at least one processor. Here, the model loading unit (121) and the object detection unit (122) may be functional representations of at least one processor.

[0043] For example, the data preprocessing unit (111) can download a dataset to load training data and prepare data files, and then redistribute the entire dataset for training, testing, and verification to evaluate the data training results. In one embodiment, 50% of the entire dataset can be redistributed for training, 40% for testing, and 10% for verification, respectively.
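The 50% / 40% / 10% redistribution can be sketched as follows. This is an illustration only: the helper name `split_dataset`, the use of a shuffled list of image identifiers, and the fixed seed are assumptions, and the patent does not state how images are assigned to each subset.

```python
import random


def split_dataset(items, train_pct=50, test_pct=40, seed=0):
    """Shuffle items and split them into training, test, and verification subsets.

    The verification subset receives whatever remains after the first two
    (10% with the default percentages).
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_test = len(items) * test_pct // 100
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    verification = items[n_train + n_test:]
    return train, test, verification


# Example with as many placeholder identifiers as the DIOR dataset has images.
train, test, verification = split_dataset(range(23463))
print(len(train), len(test), len(verification))
```

Integer percentages are used so that the subset sizes are computed without floating-point rounding.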

[0044] The YOLO model is a single-stage detector that integrates localization and classification, the two elements of object detection, into a single process. While it is effective for real-time object detection because it performs localization and classification simultaneously on large volumes of images, it faces challenges in handling objects of various sizes and complex scenes. YOLOv8, the latest version of YOLO, is largely composed of three parts: the Backbone, the Neck, and the Head. The YOLOv8 Backbone is based on the CSPDarknet53 architecture. The core of this architecture is the C2f module (Cross-Stage Partial Bottleneck with Two Convolutions), which improves detection accuracy by extracting high-level features and combining contextual information. This module processes information efficiently using two convolutions. The Neck helps accurately determine the location and size of targets by combining features of various resolutions. The C2f module is also used in the Neck to integrate high-resolution and low-resolution features, which allows the model to better detect objects of various sizes. The Head is designed as an anchor-free model, reducing complexity and improving object detection performance. By using a decoupled head, objectness, classification, and regression tasks can be processed independently. This allows each task to be performed more accurately, improving the overall accuracy of the model. Objectness scores are calculated using a sigmoid function, and class probabilities are calculated using a softmax function. Additionally, YOLOv8 enhances small object detection performance by utilizing the Complete Intersection over Union (CIoU) and Distribution Focal Loss (DFL) loss functions, and includes a semantic segmentation model called YOLOv8-seg to predict semantic segmentation masks for images. With all these elements combined, YOLOv8 records a higher AP and faster speed than the previous version.

[0045] FIG. 2 is a flowchart illustrating an example of the optimization process of a YOLO model in an embodiment of the present invention. For example, the parameter tuning unit (112) of the learning system (110) can set initial hyperparameters for the learning rate, batch size, and epoch of the YOLOv8 model (Parameter initialization (210)). At this time, the parameter tuning unit (112) can train the YOLOv8 model using the given parameters and define an objective function (Defining Objective Function (220)) that minimizes the validation loss based on the results. The objective function can evaluate the performance of the model and be used as a criterion for parameter optimization.

[0046] Squirrel Search Algorithm (SSA)

[0047] Meta-heuristic optimization refers to algorithms designed to apply generally to any problem, rather than being limited to a specific one. Because nature possesses a wealth of mechanisms and principles, nature-inspired optimization algorithms have emerged that mimic certain biological behaviors or physical phenomena. SSA is a meta-heuristic optimization technique that models the foraging behavior of flying squirrels. This algorithm searches for optimal solutions to complex problems by mimicking the seasonal movements and foraging strategies of flying squirrels. Through Global Search and Local Search strategies, SSA explores a broad range in the initial stages to identify promising new areas, and in later stages it refines the solution by exploring areas that have already proved successful. Furthermore, it fine-tunes the optimization process by introducing the concept of seasonality, which applies different strategies depending on resource availability.

[0048] First, assumptions are made to simplify the mathematical model. There are n flying squirrels in the forest, with one flying squirrel per tree. Each flying squirrel searches for food individually and optimizes the use of available resources through dynamic foraging behavior. The forest contains three types of trees: common trees, oak (acorn) trees, and hickory trees; it is assumed that the forest area under consideration contains three oak trees and one hickory tree. In the study setup, the number of flying squirrels (n) is 50. Among the total of 50 trees, there are 4 nutritious food resources (N_fs): 1 hickory tree and 3 oak trees. The remaining 46 trees have no food resources. The number of food resources may vary within 1 < N_fs < n depending on constraints, and one of them is the hickory tree, the optimal winter food resource. Similar to other population-based algorithms, SSA starts by randomly setting the initial positions of the flying squirrels. The position of a flying squirrel can be represented as a vector in a d-dimensional search space. Thus, the flying squirrels can change their position vectors while gliding in a 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional search space. The position of the i-th of the n flying squirrels (FS) in the forest can be represented as a vector as shown in Equation 1 below.

[0049]

[0050] Here, FS_i,j can represent the position of the i-th flying squirrel in the j-th dimension. The position FS_i of the i-th flying squirrel can be expressed as shown in Equation 2 below.

[0051] FS_i = FS_L + U(0,1) × (FS_U - FS_L)    (Equation 2)

[0052] Here, FS_L and FS_U can represent the lower and upper bounds, respectively, of the i-th flying squirrel in the j-th dimension, and U(0,1) can represent a random number uniformly distributed in the range [0,1].
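The random initialization between the lower and upper bounds can be sketched as follows for the three hyperparameters named in this document. The bound values in `LOWER` and `UPPER`, the helper names, and the rounding of batch size and epochs to integers are assumptions made for illustration; the patent does not state the search ranges.

```python
import random

# Hypothetical search bounds per dimension: (learning rate, batch size, epochs).
LOWER = [1e-5, 8.0, 10.0]
UPPER = [1e-1, 64.0, 100.0]


def init_population(n, lower, upper, seed=None):
    """Place n squirrels uniformly at random: FS_i = FS_L + U(0,1) * (FS_U - FS_L)."""
    rng = random.Random(seed)
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n)
    ]


def decode(position):
    """Map a continuous position vector to a hyperparameter setting."""
    lr, batch, epochs = position
    return {"lr": lr, "batch": int(round(batch)), "epochs": int(round(epochs))}


population = init_population(50, LOWER, UPPER, seed=0)
print(decode(population[0]))
```

Each position stays continuous during the search and is only rounded when it is turned into a training configuration.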

[0053] In addition, the value of a user-defined fitness function can be calculated for each flying squirrel's position and stored in an array as shown in Equation 3 below.

[0054]

[0055] The fitness value of a flying squirrel based on its location can represent the quality of the food source sought at that location. In other words, it can mean an optimal food source (hickory tree), an average food source (acorn tree), or no food source (ordinary tree), and this can also affect the probability of survival.
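One way to picture the three food-source tiers is to rank the squirrels by fitness and assign the best one to the hickory tree, the next three to the acorn trees, and the rest to common trees. The sketch below assumes that a higher fitness value is better, which matches a fitness built around mAP; the function name `assign_trees` is hypothetical.

```python
def assign_trees(fitness, n_acorn=3):
    """Rank squirrels by fitness and return their indices grouped by tree type."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    return {
        "hickory": order[0],               # best food source
        "acorn": order[1:1 + n_acorn],     # next-best food sources
        "normal": order[1 + n_acorn:],     # no food source
    }


# Example with six squirrels and made-up fitness values.
print(assign_trees([0.2, 0.9, 0.5, 0.7, 0.1, 0.6]))
```

With the study setup of 50 squirrels this yields 1 hickory, 3 acorn, and 46 common-tree squirrels.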

[0056] Hyperparameter Optimization of YOLOv8 Using SSA

[0057] The parameter tuning unit (112) can randomly initialize a squirrel population with various parameter combinations (Initialize Squirrel Population with random hyperparameters (230)). Each squirrel may represent a combination of hyperparameters such as the learning rate, batch size, and number of epochs. The parameter tuning unit (112) can then train a YOLOv8 model with the hyperparameter combination of each squirrel and evaluate the performance of the model. This process may include an Evaluating Fitness (240) step, in which the fitness function may be built around the mean average precision (mAP) value. For example, as shown in Equation 4 below, the parameter tuning unit (112) can apply, as the evaluation criterion for hyperparameter optimization performance, a function obtained by excluding the loss metrics (L_box, L_cls, L_dft) from the sum of precision (P), recall (R), mAP50, and mAP50-95. Here, mAP50 can represent the average precision calculated for the case where the IoU (Intersection over Union) is 0.5 or higher, and mAP50-95 can represent the average precision calculated at several thresholds where the IoU increases from 0.5 to 0.95 in increments of 0.05. Among the loss metrics, L_box (Bounding Box Loss) can represent the error that occurs in predicting the location and size of bounding boxes. The L_box loss can be used to minimize the difference between the actual object's bounding box and the predicted bounding box, and can be calculated using IoU loss or a modified form of that loss function. In addition, L_cls (Classification Loss) can represent the error that occurs when a model predicts the class label of each object. The L_cls loss can be calculated using cross-entropy loss and can help train the model to classify objects into the correct class. Finally, L_dft (Deformation Loss) is a loss term used to handle object shape deformation, and it can be used to evaluate how well a model predicts deformation when an object's shape is somewhat deformed or atypical.
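The fitness evaluation described above can be sketched as a single function. This is a hedged reading of the text: it assumes that "excluding" the loss metrics means subtracting their sum from the sum of the four accuracy metrics, with equal weights. The function name `fitness` and the pairing of the final metrics with the epoch-25 validation losses in the example are assumptions made for illustration.

```python
def fitness(precision, recall, map50, map50_95, l_box, l_cls, l_dft):
    """Fitness of one hyperparameter set: accuracy metrics minus loss metrics.

    Assumed form: (P + R + mAP50 + mAP50-95) - (L_box + L_cls + L_dft).
    Higher is better.
    """
    gain = precision + recall + map50 + map50_95
    loss = l_box + l_cls + l_dft
    return gain - loss


# Example using figures reported for YOLOv8 + SSA in this document
# (metrics from the results discussion, validation losses at epoch 25).
score = fitness(0.8710, 0.7030, 0.788, 0.5690, 1.0540, 0.7840, 0.9760)
print(round(score, 4))
```

Because every call corresponds to one full training run of the model, an implementation would normally cache the result per hyperparameter set.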

[0058] F(θ) = P + R + mAP50 + mAP50-95 - (L_box + L_cls + L_dft)    (Equation 4)

[0059] Here, θ can represent a set of hyperparameters. In other words, θ represents the position of a squirrel (P_i), and the function F(θ) can correspond to the fitness function f(P_i). The parameter tuning unit (112) can adjust the positions of the squirrels (P_i) according to the fitness evaluation result. In this process, to find a better combination of hyperparameters, the position within the search space can be continuously updated (Update Squirrel Positions (250)), and the position update can be optimized by reflecting the squirrel's behavior according to seasonal changes. The parameter tuning unit (112) iteratively optimizes the hyperparameters, and the process can be repeated (Convergence check (260)) until the fitness function f(P_i) satisfies a specific convergence criterion, such as Equation 5 below. If an optimal combination of hyperparameters is found, that combination can be selected.

[0060]

[0061] The parameter tuning unit (112) can repeat the process of adjusting the positions of the squirrels (Update Squirrel Positions (250)) and the process of the convergence check (Convergence check (260)) if the convergence condition is not satisfied, and the process can be terminated if satisfied. The parameter tuning unit (112) can finally train the YOLOv8 model using optimized hyperparameters and can analyze and evaluate performance (Evaluate Performance (270) and Result Analysis & validation (280)) through a validation dataset.
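The update-and-check loop (250, 260) can be sketched as below. This is a simplified illustration, not the patented procedure: the gliding constant 1.9, the predator probability 0.1, and the gliding-distance range are values commonly quoted for the original SSA and are assumptions here, the seasonal relocation step is left out for brevity, and the stopping rule (no improvement larger than `tol` for `patience` iterations) stands in for the convergence criterion, which is not reproduced in this text. A cheap toy function replaces the YOLOv8 training run.

```python
import random

G_C = 1.9    # gliding constant (assumed, as commonly quoted for SSA)
P_DP = 0.1   # predator presence probability (assumed)


def glide(pos, target, lower, upper, rng):
    """Glide toward a target tree, or jump to a random spot if a predator appears."""
    if rng.random() < P_DP:
        return [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
    d_g = rng.uniform(0.5, 1.11)  # assumed gliding-distance range
    moved = [p + d_g * G_C * (t - p) for p, t in zip(pos, target)]
    return [min(max(x, lo), hi) for x, lo, hi in zip(moved, lower, upper)]


def ssa_optimize(fitness_fn, lower, upper, n=50, n_acorn=3,
                 max_iter=100, tol=1e-6, patience=10, seed=0):
    """Maximize fitness_fn over the box [lower, upper] with a simplified SSA."""
    rng = random.Random(seed)
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
           for _ in range(n)]
    previous_best, stall = None, 0
    for _ in range(max_iter):
        ranked = sorted(pop, key=fitness_fn, reverse=True)
        hickory = ranked[0]
        acorns = ranked[1:1 + n_acorn]
        normals = ranked[1 + n_acorn:]
        best = fitness_fn(hickory)
        # Convergence check: stop once the best fitness has stalled.
        if previous_best is not None and best - previous_best < tol:
            stall += 1
            if stall >= patience:
                break
        else:
            stall = 0
        previous_best = best
        # Position update: the hickory squirrel stays, the others glide.
        new_pop = [hickory]
        new_pop += [glide(a, hickory, lower, upper, rng) for a in acorns]
        for s in normals:
            target = rng.choice(acorns) if rng.random() < 0.5 else hickory
            new_pop.append(glide(s, target, lower, upper, rng))
        pop = new_pop
    return max(pop, key=fitness_fn)


def toy_fitness(position):
    """Stand-in for training and scoring a model; peaks at (0.3, 0.7)."""
    x, y = position
    return -((x - 0.3) ** 2) - (y - 0.7) ** 2


best = ssa_optimize(toy_fitness, [0.0, 0.0], [1.0, 1.0])
print(best, toy_fitness(best))
```

Keeping the hickory squirrel in place means the best fitness found so far never decreases from one iteration to the next, which makes the stall-based stopping rule well defined.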

[0062] Experiment and Results

[0063] This experiment was conducted in a Google Colab environment. The software configuration used was the PyTorch 2.3.1 framework and Python 3.10.12, while the hardware configuration used was an Intel(R) Xeon(R) CPU @ 2.20GHz and an NVIDIA T4 GPU on the Ubuntu 22.04.3 LTS operating system. This environment configuration provided high performance and efficiency during the training and inference processes of the deep learning model. Table 1 below shows an example of the implementation environment configuration.

[0064]

[0065] In this experimental example, object detection was performed using the DIOR (detection in optical remote sensing images) dataset. This dataset contains 23,463 images and 192,472 instances, and includes a total of 20 different object classes, such as buildings, cars, ships, and aircraft. Figure 3 illustrates an example of the 20 object classes included in the DIOR dataset in an experimental example of the present invention. The DIOR dataset is very useful for evaluating and developing detection algorithms because it includes various object classes and complex backgrounds. The entire DIOR dataset was redistributed as 50% for training, 40% for testing, and 10% for verification. Figure 4 illustrates an example of the training and verification datasets in an experimental example of the present invention, and Figure 5 illustrates an example of the test dataset in an experimental example of the present invention. In addition, FIG. 6 is a diagram illustrating an example of the prediction results of remote sensing object detection in an embodiment of the present invention, Table 2 below compares the performance of the YOLOv8 model and the YOLOv8 model with added SSA using four indicators, and Tables 3 and 4 compare the loss of the YOLOv8 model and the YOLOv8 model with added SSA.

[0066]

[0067] YOLOv8 + SSA

epoch  train/box_loss  train/cls_loss  train/dfl_loss  val/box_loss  val/cls_loss  val/dfl_loss
1      2.9760          4.2680          3.0960          2.8210        3.2400        2.6230
5      1.3010          1.4980          1.2840          1.2780        1.3570        1.1660
10     1.1890          1.0420          1.0910          1.1580        1.0150        1.0720
15     1.0690          0.8640          1.0130          1.0970        0.9210        0.9520
20     0.9480          0.8310          0.9780          1.0540        0.8360        0.9950
25     0.8260          0.7320          0.9500          1.0540        0.7840        0.9760

[0068] YOLOv8

epoch  train/box_loss  train/cls_loss  train/dfl_loss  val/box_loss  val/cls_loss  val/dfl_loss
1      3.5006          5.0211          3.6421          3.3188        3.8123        3.0856
5      1.5309          1.76300         1.5110          1.5036        1.5964        1.3714
10     1.3992          1.2256          1.2844          1.3629        1.1947        1.2618
15     1.2576          1.0159          1.1924          1.2913        1.0840        1.1205
20     1.1154          0.9765          1.1504          1.2402        0.9838        1.1698
25     0.9721          0.8615          1.1178          1.2407        0.9222        1.1487

[0069] For mAP50, the YOLOv8+SSA model shows a performance improvement of 10% with a score of 0.788 compared to the YOLOv8 model's 0.6540. This means that the addition of SSA enables the model to detect objects more accurately at the 50% IoU threshold. In addition, for mAP50-95, YOLOv8+SSA scores 0.5690 and YOLOv8 scores 0.4470, which indicates that SSA has generally improved the performance of the model across various IoU thresholds. In terms of Precision, YOLOv8+SSA shows a value of 0.8710 and YOLOv8 shows 0.7650, confirming that the addition of SSA leads to more accurate predictions while reducing false detections. In terms of Recall, YOLOv8+SSA records a higher value of 0.7030 compared to YOLOv8's 0.5750. This signifies that SSA has enhanced the model's detection capabilities and reduced missed objects. It can be confirmed that performance has improved compared to the basic YOLOv8 model across all evaluation metrics. In particular, the YOLOv8+SSA model shows up to a 12% improvement in mean average precision (mAP). This demonstrates that SSA is effective in improving the detection accuracy of the model. Figures 7 through 11 illustrate examples of the results of model training in an experimental example of the present invention. In the graphs of Figure 7, the x-axis may represent epochs, and the y-axis may represent the corresponding values of each graph. Figure 8 shows an example of a confusion matrix that visually represents the relationship between the class predicted by the model and the actual class, and Figure 9 shows an example of an F1 confidence curve, which is used to analyze the overall performance of the model by combining the model's precision and recall. In Figure 9, the x-axis may represent the threshold value, and the y-axis may represent the F1 score. Figure 10 shows an example of a precision confidence curve that visualizes the model's precision as a function of the confidence threshold, and Figure 11 shows an example of a recall confidence curve that visualizes the model's recall as a function of the confidence threshold.
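As a worked check of the figures quoted above, the following sketch computes the absolute and relative gain for each metric. Only the scores themselves come from this document; the dictionary layout and the helper name `gains` are illustrative assumptions.

```python
# (baseline YOLOv8, YOLOv8 + SSA) scores as quoted in this document.
reported = {
    "mAP50": (0.6540, 0.788),
    "mAP50-95": (0.4470, 0.5690),
    "Precision": (0.7650, 0.8710),
    "Recall": (0.5750, 0.7030),
}


def gains(base, tuned):
    """Return (absolute gain, relative gain) of tuned over base."""
    absolute = tuned - base
    return absolute, absolute / base


for name, (base, tuned) in reported.items():
    absolute, relative = gains(base, tuned)
    print(f"{name}: +{absolute:.3f} absolute, +{relative:.1%} relative")
```

On these numbers the absolute gain in mAP50-95 is 0.122, which corresponds to the "up to 12%" figure when read as percentage points.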

[0070] As such, the embodiments of the present invention describe a method and system for detecting objects using an optimized YOLO model, and a method and system for training a YOLO model in which the hyperparameters of a YOLOv8 model are optimized by applying the Squirrel Search Algorithm. SSA possesses excellent search capabilities and is efficient in finding global optimal solutions by adapting to various environmental conditions. Consequently, the model was able to demonstrate higher object detection performance in remote sensing image analysis. In the results of the experimental examples on the DIOR dataset, the SSA-based optimized YOLOv8 model showed improved mAP50 performance, with precision and accuracy improving by 10% and 9%, respectively. This suggests that SSA can significantly improve the performance of real-time object detection, and implies that the SSA-based optimized YOLO model according to the embodiments of the present invention can be applied to various remote detection and real-time object detection applications.

[0071] FIG. 12 is a flowchart illustrating an example of a learning method of a learning system according to an embodiment of the present invention. The learning method according to the present embodiment can be performed by the learning system (110) described through FIG. 1. As previously described, the learning system (110) can be implemented by at least one computer device. At this time, at least one processor included in the at least one computer device may be implemented to execute a control instruction according to the code of an operating system included in memory or the code of at least one computer program. Here, the at least one processor may operate according to the control instruction provided by the code stored in the at least one computer device to control the learning system (110) implemented by the at least one computer device so that the learning system (110) performs steps (1210 to 1230) included in the method of FIG. 12. At this time, the data preprocessing unit (111), parameter tuning unit (112), and model learning unit (113) included in the learning system (110) may be functional representations of at least one processor for performing steps (1210 to 1230) included in the method of FIG. 12.

[0072] In step (1210), the data preprocessing unit (111) can preprocess a dataset containing multiple training images. As previously described, the preprocessing may include a process of converting the format of annotation files within the dataset directory into a format compatible with the YOLO model and a process of redistributing the entire dataset into training, test, and validation sets.
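
The two preprocessing operations of step (1210) can be sketched as follows. This is an illustration only: it assumes the source annotations give boxes as pixel corner coordinates (xmin, ymin, xmax, ymax), which this paragraph does not confirm, and the split ratios, seed, and function names are hypothetical. The target is the YOLO label line format `class cx cy w h` with coordinates normalized by the image size.

```python
import random


def corner_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel corner box to YOLO (cx, cy, w, h), normalized to [0, 1]."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def yolo_label_line(class_id, box, img_w, img_h):
    """Format one object as a line of a YOLO label file."""
    cx, cy, w, h = corner_box_to_yolo(*box, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def split_dataset(image_ids, train=0.7, val=0.2, seed=0):
    """Shuffle image ids and split them into train/val/test lists.

    The 70/20/10 ratio is an assumed default, not a value from the source.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_train = round(len(ids) * train)
    n_val = round(len(ids) * val)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


if __name__ == "__main__":
    print(yolo_label_line(3, (100, 200, 300, 400), 800, 800))
    print({k: len(v) for k, v in split_dataset(range(10)).items()})
```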

[0073] In step (1220), the parameter tuning unit (112) can tune the hyperparameters of the YOLO model based on SSA. Here, the YOLO model may be a model including a CIoU loss function, a DFL loss function, and a semantic segmentation model that predicts a semantic segmentation mask for an image. The YOLOv8 model was previously described as an example of such a model. At this time, the parameter tuning unit (112) may select hyperparameters by mapping the hyperparameters to the squirrel positions used in SSA and evaluating the goodness of fit of the hyperparameters. Here, the hyperparameters may include at least one of the learning rate, the batch size, and the number of epochs. Meanwhile, the goodness-of-fit evaluation may be based on the value of a goodness-of-fit function. This goodness-of-fit function may include precision, recall, average precision, and a loss metric as terms. The loss metric may include at least one of bounding box loss, which represents the error occurring in predicting the location and size of bounding boxes; classification loss, which represents the error occurring when the YOLO model predicts the class label of each object; and deformation loss, which represents the loss used to handle shape deformation of objects. Additionally, the average precision may include at least one of a first average precision calculated based on the case where the IoU is 0.5 or greater, and a second average precision calculated at several thresholds where the IoU increases in increments of 0.05 from 0.5 to 0.95. An example of such a goodness-of-fit function was previously described through Equation 4.
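
The selection described in step (1220) can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the search bounds in `BOUNDS`, the linear scaling of each position coordinate, and all function names are hypothetical, and the goodness-of-fit function is written as the unweighted sum of precision, recall, and the two average precisions minus the three loss terms, which follows the claim wording rather than the exact form of Equation 4. The SSA position update itself is not shown; `evaluate` stands for a caller-supplied routine that trains and validates the model for one hyperparameter set.

```python
from dataclasses import dataclass

# Hypothetical search ranges; the source does not state the bounds used.
BOUNDS = {
    "learning_rate": (1e-4, 1e-1),
    "batch_size": (8, 64),
    "epochs": (50, 300),
}


@dataclass
class Metrics:
    precision: float
    recall: float
    map50: float       # first average precision (IoU >= 0.5)
    map50_95: float    # second average precision (IoU 0.5 to 0.95, step 0.05)
    box_loss: float
    cls_loss: float
    dfl_loss: float    # the "deformation loss" term of the description


def _scale(u, lo, hi):
    """Clip u to [0, 1] and map it linearly onto [lo, hi]."""
    u = min(max(u, 0.0), 1.0)
    return lo + u * (hi - lo)


def decode_position(position):
    """Map a squirrel position in [0, 1]^3 to concrete hyperparameters."""
    lr_u, bs_u, ep_u = position
    return {
        "learning_rate": _scale(lr_u, *BOUNDS["learning_rate"]),
        "batch_size": int(round(_scale(bs_u, *BOUNDS["batch_size"]))),
        "epochs": int(round(_scale(ep_u, *BOUNDS["epochs"]))),
    }


def fitness(m: Metrics) -> float:
    """Goodness of fit: accuracy terms minus loss terms (assumed unweighted)."""
    gain = m.precision + m.recall + m.map50 + m.map50_95
    loss = m.box_loss + m.cls_loss + m.dfl_loss
    return gain - loss


def select_best(positions, evaluate):
    """Return the hyperparameters of the position with the highest fitness."""
    candidates = [decode_position(p) for p in positions]
    return max(candidates, key=lambda hp: fitness(evaluate(hp)))
```

In a full tuner, `select_best` would be called once per SSA iteration on the updated squirrel positions, and the best set found would be passed on to training.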

[0074] In step (1230), the model learning unit (113) can train the YOLO model according to the tuned hyperparameters using the preprocessed dataset. At this time, the model learning unit (113) can train the YOLO model according to the hyperparameters selected by the parameter tuning unit (112) in step (1220).

[0075] FIG. 13 is a flowchart illustrating an example of an object detection method of an object detection system according to an embodiment of the present invention. The object detection method according to the present embodiment can be performed by the object detection system (120) described through FIG. 1. As previously described, the object detection system (120) can be implemented by at least one computer device. At this time, at least one processor included in the at least one computer device may be implemented to execute a control instruction according to the code of an operating system included in memory or the code of at least one computer program. Here, the at least one processor may operate according to the control instruction provided by the code stored in the at least one computer device to control the object detection system (120) implemented by the at least one computer device so that the object detection system (120) performs steps (1310 to 1330) included in the method of FIG. 13. At this time, the model loading unit (121) and the object detection unit (122) included in the object detection system (120) may be functional representations of at least one processor for performing steps (1310 to 1330) included in the method of FIG. 13.

[0076] In step (1310), the model loading unit (121) can load a YOLO model pre-trained according to hyperparameters tuned based on SSA. Here, the YOLO model may be a model including a CIoU loss function, a DFL loss function, and a semantic segmentation model that predicts a semantic segmentation mask for an image. The YOLOv8 model was previously described as an example of such a model. At this time, the YOLO model can be pre-trained according to hyperparameters selected through a goodness-of-fit evaluation of the hyperparameters by mapping the hyperparameters to the squirrel positions used in SSA. It was previously explained that in step (1220) of FIG. 12, the parameter tuning unit (112) can select hyperparameters by mapping the hyperparameters to the squirrel positions used in SSA and evaluating the goodness of fit of the hyperparameters. Here, the hyperparameters may include at least one of a learning rate, a batch size, and a number of epochs. Meanwhile, the goodness-of-fit evaluation may be based on the value of a goodness-of-fit function. Such a goodness-of-fit function may include terms for precision, recall, average precision, and a loss metric. The loss metric may include at least one of bounding box loss, representing the error occurring in predicting the location and size of bounding boxes; classification loss, representing the error occurring when the YOLO model predicts the class label of each object; and deformation loss, representing the loss used to handle shape deformation of objects. Additionally, the average precision may include at least one of a first average precision calculated based on the case where the IoU is 0.5 or greater, and a second average precision calculated at several thresholds where the IoU increases in increments of 0.05 from 0.5 to 0.95. An example of such a goodness-of-fit function was previously described through Equation 4.

[0077] In step (1320), the object detection unit (122) may receive an image for object detection. The image may include remote detection images such as satellite images, aerial images, surveillance camera images, drone footage, etc.

[0078] In step (1330), the object detection unit (122) can input the image into the YOLO model and provide information, output by the YOLO model, about the object contained in the image. As previously explained, the real-time object detection performance of the YOLO model pre-trained according to the hyperparameters tuned through SSA is significantly improved, and due to this performance improvement, the object detection unit (122) can provide a more accurate object detection function for the input image.
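
The load, receive, and detect sequence of FIG. 13 can be sketched as follows. This is an illustration, not the implementation of the embodiment: it assumes the model was trained with the Ultralytics `ultralytics` Python package, whose `YOLO` class returns one result object per image with `boxes` and `names` attributes. The weights path, the image name, and the `stub_model` stand-in (included only so the sketch runs without trained weights) are hypothetical.

```python
from types import SimpleNamespace


def load_model(weights_path):
    """Load a YOLO model trained with the SSA-tuned hyperparameters."""
    # Third-party dependency, imported lazily so the stub demo runs without it.
    from ultralytics import YOLO
    return YOLO(weights_path)


def detect_objects(model, image):
    """Pass one image to the model and flatten its output into plain dicts."""
    detections = []
    for result in model(image):  # one result object per input image
        for box in result.boxes:
            class_id = int(box.cls[0])
            detections.append({
                "label": result.names[class_id],
                "confidence": float(box.conf[0]),
                "box_xyxy": [float(v) for v in box.xyxy[0]],
            })
    return detections


def stub_model(image):
    """Stand-in that mimics the result layout; the values are placeholders."""
    box = SimpleNamespace(cls=[0.0], conf=[0.9], xyxy=[[10.0, 20.0, 110.0, 220.0]])
    return [SimpleNamespace(boxes=[box], names={0: "airplane"})]


if __name__ == "__main__":
    # With real weights: model = load_model("path/to/weights.pt")
    print(detect_objects(stub_model, "tile_0001.jpg"))
```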

[0079] Thus, according to embodiments of the present invention, a method and system for object detection in remote detection images can be provided.

[0080] FIG. 14 is a block diagram illustrating an example of a computer device according to an embodiment of the present invention. For example, the learning system (110) and the object detection system (120) may each be implemented by at least one computer device, and each of the at least one computer device may correspond to the computer device (1400) of FIG. 14. As shown in FIG. 14, the computer device (1400) may include memory (1410), a processor (1420), a communication interface (1430), and an input/output interface (1440). The memory (1410) is a computer-readable recording medium and may include RAM (random access memory), ROM (read only memory), and a permanent mass storage device such as a disk drive. Here, permanent mass storage devices such as ROM and disk drives may be included in the computer device (1400) as separate permanent storage devices distinct from memory (1410). Additionally, an operating system and at least one program code may be stored in memory (1410). These software components may be loaded into memory (1410) from a computer-readable recording medium separate from memory (1410). This separate computer-readable recording medium may include computer-readable recording media such as floppy drives, disks, tapes, DVD/CD-ROM drives, and memory cards. In another embodiment, software components may be loaded into memory (1410) via the communication interface (1430) rather than a computer-readable recording medium. For example, software components can be loaded into the memory (1410) of the computer device (1400) based on a computer program installed by files received through a network (1460).

[0081] The processor (1420) may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processor (1420) via memory (1410) or the communication interface (1430). For example, the processor (1420) may be configured to execute instructions received according to program code stored in a recording device such as memory (1410).

[0082] The communication interface (1430) may provide a function for the computer device (1400) to communicate with other devices through a network (1460). For example, requests, commands, data, files, etc. generated by the processor (1420) of the computer device (1400) according to program code stored in a recording device such as memory (1410) may be transmitted to other devices through the network (1460) under the control of the communication interface (1430). Conversely, signals, commands, data, files, etc. from other devices may be received by the computer device (1400) through the communication interface (1430) of the computer device (1400) via the network (1460). Signals, commands, data, etc. received through the communication interface (1430) may be transmitted to the processor (1420) or memory (1410), and files, etc. may be stored in a storage medium (the permanent storage device described above) that the computer device (1400) may further include.

[0083] The input/output interface (1440) may be a means for interfacing with an input/output device (1450). For example, the input device may include a device such as a microphone, keyboard, or mouse, and the output device may include a device such as a display or speaker. As another example, the input/output interface (1440) may be a means for interfacing with a device in which the functions for input and output are integrated into one, such as a touchscreen. The input/output device (1450) may be configured as a single device together with the computer device (1400).

[0084] Additionally, in other embodiments, the computer device (1400) may include fewer or more components than those shown in FIG. 14. However, most prior art components need not be explicitly illustrated. For example, the computer device (1400) may be implemented to include at least some of the input/output devices (1450) described above, or may include other components such as a transceiver, a database, etc.

[0085] The system or device described above may be implemented as a hardware component, or a combination of a hardware component and a software component. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing unit may execute an operating system (OS) and one or more software applications that run on the operating system. Additionally, the processing unit may access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the description may refer to a single processing unit, but those skilled in the art will understand that the processing unit may include multiple processing elements and/or multiple types of processing elements. For example, the processing unit may include multiple processors or one processor and one controller. In addition, other processing configurations, such as parallel processors, are also possible.

[0086] Software may include computer programs, code, instructions, or a combination of one or more of these, and may configure a processing unit to operate as desired or instruct the processing unit independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device so as to be interpreted by the processing unit or to provide instructions or data to the processing unit. Software may be distributed over networked computer systems and may be stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

[0087] The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, etc., either individually or in combination. The medium may continuously store a program executable by a computer, or temporarily store it for execution or download. Furthermore, the medium may be various recording or storage means in the form of a single or multiple hardware components, and is not limited to a medium directly connected to a computer system, but may also exist distributed over a network. Examples of media may include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and media configured to store program instructions, including ROM, RAM, and flash memory. Additionally, other examples of media may include recording or storage media managed by app stores that distribute applications or sites and servers that supply or distribute various other software. Examples of program instructions include machine code, such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc.

[0088] Although the embodiments have been described above with reference to limited examples and drawings, those skilled in the art can make various modifications and variations from the description above. For example, suitable results can be achieved even if the described techniques are performed in a different order than described, and/or the components of the described system, structure, device, circuit, etc. are combined or assembled in a form different from that described, or replaced or substituted by other components or equivalents.

[0089] Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims set forth below.

Claims

1. An object detection method of an object detection system implemented by at least one computer device, wherein the at least one computer device includes at least one processor, the object detection method comprising: a step of loading, by the at least one processor, a YOLO (You Only Look Once) model pre-trained according to hyperparameters tuned based on SSA (Squirrel Search Algorithm); a step of receiving, by the at least one processor, an image for object detection; and a step of inputting, by the at least one processor, the image into the YOLO model and providing information, output by the YOLO model, about an object included in the image, wherein the YOLO model is pre-trained according to the hyperparameters selected through a goodness-of-fit evaluation of the hyperparameters by mapping the hyperparameters to the positions of the squirrels used in the SSA, the goodness-of-fit evaluation is based on a goodness of fit according to a goodness-of-fit function that subtracts a loss metric from the sum of precision, recall, and average precision, the average precision includes a first average precision calculated based on the case where the Intersection over Union (IoU) is 0.5 or greater and a second average precision calculated at several thresholds where the IoU increases in increments of 0.05 from 0.5 to 0.95, and the loss metric includes bounding box loss, which represents the error occurring in predicting the location and size of bounding boxes; classification loss, which represents the error occurring when the YOLO model predicts the class label of each object; and deformation loss, which represents the loss used to handle shape deformation of objects.

2. The object detection method of claim 1, wherein the YOLO model includes a CIoU (Complete Intersection over Union) loss function, a DFL (Distribution Focal Loss) loss function, and a semantic segmentation model that predicts a semantic segmentation mask for an image.

3. The object detection method of claim 1, wherein the hyperparameters include at least one of a learning rate, a batch size, and a number of epochs.

4. A learning method of a learning system implemented by at least one computer device, wherein the at least one computer device includes at least one processor, the learning method comprising: a step of preprocessing, by the at least one processor, a dataset including a plurality of training images; a step of tuning, by the at least one processor, the hyperparameters of a YOLO (You Only Look Once) model based on SSA (Squirrel Search Algorithm); and a step of training, by the at least one processor, the YOLO model according to the tuned hyperparameters using the preprocessed dataset, wherein the step of tuning the hyperparameters selects the hyperparameters through a goodness-of-fit evaluation of the hyperparameters by mapping the hyperparameters to the positions of the squirrels used in the SSA, the step of training the YOLO model trains the YOLO model according to the selected hyperparameters, the goodness-of-fit evaluation is based on a goodness of fit according to a goodness-of-fit function that subtracts a loss metric from the sum of precision, recall, and average precision, the average precision includes a first average precision calculated based on the case where the Intersection over Union (IoU) is 0.5 or greater and a second average precision calculated at several thresholds where the IoU increases in increments of 0.05 from 0.5 to 0.95, and the loss metric includes bounding box loss, which represents the error occurring in predicting the location and size of bounding boxes; classification loss, which represents the error occurring when the YOLO model predicts the class label of each object; and deformation loss, which represents the loss used to handle shape deformation of objects.

5. The learning method of claim 4, wherein the YOLO model includes a CIoU (Complete Intersection over Union) loss function, a DFL (Distribution Focal Loss) loss function, and a semantic segmentation model that predicts a semantic segmentation mask for an image.

6. The learning method of claim 4, wherein the hyperparameters include at least one of a learning rate, a batch size, and a number of epochs.

7. A computer program stored on a computer-readable recording medium and combined with a computer device to execute the method of any one of claims 1 to 6 on the computer device.

8. A computer-readable recording medium having a computer program recorded thereon for executing the method of any one of claims 1 to 6 on a computer device.

9. An object detection system implemented by at least one computer device, wherein the at least one computer device includes at least one processor, and the at least one processor is configured to: load a YOLO (You Only Look Once) model pre-trained according to hyperparameters tuned based on SSA (Squirrel Search Algorithm); receive an image for object detection; and input the image into the YOLO model and provide information, output by the YOLO model, about an object included in the image, wherein the YOLO model is pre-trained according to the hyperparameters selected through a goodness-of-fit evaluation of the hyperparameters by mapping the hyperparameters to the positions of the squirrels used in the SSA, the goodness-of-fit evaluation is based on a goodness of fit according to a goodness-of-fit function that subtracts a loss metric from the sum of precision, recall, and average precision, the average precision includes a first average precision calculated based on the case where the Intersection over Union (IoU) is 0.5 or greater and a second average precision calculated at several thresholds where the IoU increases in increments of 0.05 from 0.5 to 0.95, and the loss metric includes bounding box loss, which represents the error occurring in predicting the location and size of bounding boxes; classification loss, which represents the error occurring when the YOLO model predicts the class label of each object; and deformation loss, which represents the loss used to handle shape deformation of objects.

10. A learning system implemented by at least one computer device, wherein the at least one computer device includes at least one processor, and the at least one processor is configured to: preprocess a dataset including a plurality of training images; tune the hyperparameters of a YOLO (You Only Look Once) model based on SSA (Squirrel Search Algorithm); and train the YOLO model according to the tuned hyperparameters using the preprocessed dataset, wherein, to tune the hyperparameters, the at least one processor selects the hyperparameters through a goodness-of-fit evaluation of the hyperparameters by mapping the hyperparameters to the positions of the squirrels used in the SSA, to train the YOLO model, the at least one processor trains the YOLO model according to the selected hyperparameters, the goodness-of-fit evaluation is based on a goodness of fit according to a goodness-of-fit function that subtracts a loss metric from the sum of precision, recall, and average precision, the average precision includes a first average precision calculated based on the case where the Intersection over Union (IoU) is 0.5 or greater and a second average precision calculated at several thresholds where the IoU increases in increments of 0.05 from 0.5 to 0.95, and the loss metric includes bounding box loss, which represents the error occurring in predicting the location and size of bounding boxes; classification loss, which represents the error occurring when the YOLO model predicts the class label of each object; and deformation loss, which represents the loss used to handle shape deformation of objects.