Method, apparatus, and recording medium for recognizing target object

The optical inspection system effectively identifies objects using image processing and pattern recognition, addressing the inefficiencies of traditional tagging methods by analyzing unique surface features, ensuring reliable tracking and management without physical attachments.

WO2026127662A1 | PCT designated stage | Publication Date: 2026-06-18 | KOHYOUNG TECH

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: KOHYOUNG TECH
Filing Date: 2025-12-11
Publication Date: 2026-06-18

Application Information

IPC: G16H40/40; G16H40/20; A61B34/10

AI Tagging

Application Domain

Computer-aided planning/modelling; Healthcare resources and facilities

AI Technical Summary

⚠Technical Problem

Existing systems for managing objects in hospitals, factories, and logistics centers rely on barcodes or RFID tags for identification, which can be cumbersome and inefficient, especially in environments where tags may be damaged or unsuitable due to cleaning processes.

⚗Method used

An optical inspection system that uses an apparatus with processors and memory to acquire images, extract feature sets from predefined regions of interest, determine similarities with stored feature sets, and update vectors to identify objects without separate tags or barcodes, utilizing image processing and pattern recognition techniques.

🎯Benefits of technology

Enables accurate identification of objects by analyzing unique surface features, overcoming the limitations of traditional tagging methods and ensuring reliable tracking and management without physical attachments.

✦ Generated by Eureka AI based on patent content.

Abstract

Disclosed is a technology for recognizing a target object. A method performed by an apparatus comprising one or more processors and one or more memories in which instructions to be executed by the one or more processors are stored, according to one aspect of the present disclosure, may comprise the steps of: acquiring, by the one or more processors, an image of a target object; extracting a first feature set from a plurality of image regions on the image corresponding to each of a plurality of predefined regions of interest (ROIs) for the target object, wherein the first feature set includes a plurality of first feature vectors for each of the plurality of image regions; determining an overall similarity with the first feature set for each of the plurality of feature sets stored in the memory; and determining a feature set having the highest overall similarity among the plurality of feature sets stored in the memory as a second feature set.

Description

Method, device, and recording medium for recognizing an object

[0001] The present disclosure relates to a technology for recognizing an object.

[0002] In hospitals, factories, and logistics centers, objects used in large quantities, such as surgical instruments, factory parts, or various materials, can be managed, for example, by being retrieved, cleaned, disinfected, and resupplied after use. In this process, to identify whether objects are missing or damaged and to manage their history, barcodes or RFID tags can be attached to track the objects.

[0003] At least one embodiment of the present disclosure can provide an optical inspection system capable of identifying a target object without attaching a separate tag or barcode.

[0004] The technical problems of the present disclosure are not limited to those mentioned above, and other unmentioned technical problems will be clearly understood by a person skilled in the art of the present disclosure from the description below.

[0005] A method according to one aspect of the present disclosure, performed in an apparatus comprising one or more processors and one or more memories storing instructions to be executed by the one or more processors, may include: acquiring, by the one or more processors, an image of a target object; extracting a first feature set from a plurality of image regions on the image corresponding to each of a plurality of regions of interest (ROI) predefined for the target object, wherein the first feature set comprises a plurality of first feature vectors for each of the plurality of image regions; determining, for each of a plurality of feature sets stored in the memory, an overall similarity with the first feature set; and determining the feature set with the highest overall similarity among the plurality of feature sets stored in the memory as a second feature set.

[0006] In one embodiment, the method according to the present disclosure may further include: determining one or more vectors to be changed among a plurality of second feature vectors included in the second feature set based on the first feature set and the second feature set; and updating the vectors to be changed within the second feature set to the value of a first feature vector corresponding to the vectors to be changed within the first feature set.

[0007] In one embodiment, the step of extracting the first feature set may include: obtaining a brightness value for each of a plurality of pixels included in a first region on the image corresponding to the first region of interest; dividing the first region into a plurality of sub-regions; determining a representative value for each of the plurality of sub-regions based on the brightness value of each of the plurality of pixels; and generating a first feature vector of the first region of interest based on the frequency of occurrence of each representative value of the plurality of sub-regions.

[0008] In one embodiment, the step of determining a representative value for each of the plurality of sub-regions may include: determining a representative pixel of a first sub-region; comparing brightness values between the representative pixel and pixels adjacent to the representative pixel; and determining a representative value of the first sub-region based on the brightness comparison result.
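The extraction steps of paragraphs [0007] and [0008] can be sketched as follows. This is only one possible reading, and it resembles a local binary pattern descriptor, a name the disclosure itself does not use. The 3x3 non-overlapping sub-regions, the choice of the center pixel as the representative pixel, the 8-bit code, the 256-bin histogram, and the function names `sub_region_code` and `roi_feature_vector` are all assumptions made for illustration.

```python
def sub_region_code(image, top, left):
    """8-bit code for the 3x3 sub-region whose top-left pixel is (top, left).

    The center pixel is taken as the representative pixel (an assumption);
    each neighbor contributes a 1 bit when it is at least as bright.
    """
    center = image[top + 1][left + 1]
    # Neighbor offsets, clockwise from the top-left corner of the sub-region.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for dr, dc in offsets:
        code = (code << 1) | (1 if image[top + dr][left + dc] >= center else 0)
    return code


def roi_feature_vector(image):
    """256-bin histogram of sub-region codes over one ROI image region."""
    histogram = [0] * 256
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - 2, 3):
        for left in range(0, cols - 2, 3):
            histogram[sub_region_code(image, top, left)] += 1
    return histogram


# Hypothetical 3x6 brightness patch containing two 3x3 sub-regions.
patch = [
    [10, 10, 10, 90, 90, 90],
    [10, 50, 10, 90, 50, 90],
    [10, 10, 10, 90, 90, 90],
]
vector = roi_feature_vector(patch)
```

In this patch the left sub-region has only darker neighbors (code 0) and the right one only brighter neighbors (code 255), so each of those two bins receives one count.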

[0009] In one embodiment, the step of determining a representative value for each of the plurality of sub-regions may include determining a representative value for the first sub-region based on at least one of the average brightness value, the median brightness value, or the variance value of the brightness values of the plurality of pixels included in the first sub-region.
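The statistical alternative of paragraph [0009] can be illustrated with a short sketch. The function name `representative_value`, the `method` parameter, and the use of population variance are assumptions; the disclosure only states that the average, median, or variance of the brightness values may be used.

```python
from statistics import mean, median, pvariance


def representative_value(pixels, method="mean"):
    """Summarize the brightness values of one sub-region with a single number."""
    if method == "mean":
        return mean(pixels)
    if method == "median":
        return median(pixels)
    if method == "variance":
        return pvariance(pixels)  # population variance over the sub-region
    raise ValueError(f"unknown method: {method}")


# Hypothetical brightness values of a 2x2 sub-region, flattened.
sub_region = [10, 20, 30, 100]
summary = {m: representative_value(sub_region, m) for m in ("mean", "median", "variance")}
```

The single bright pixel pulls the mean to 40 while the median stays at 25, which shows why the choice of statistic matters for surfaces with glare or isolated scratches.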

[0010] In one embodiment, the step of generating a first feature vector of the first region of interest may include: generating a first frequency distribution based on the frequency of occurrence of each representative value of the plurality of sub-regions; dividing the first frequency distribution into a predetermined number of groups and calculating the sum of frequency values included in each of the groups; and generating the first feature vector of the first region of interest based on the sum of frequency values included in each of the groups.
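The grouping step of paragraph [0010] can be sketched as below. Equal-width groups of consecutive bins are an assumption, as is the function name `group_histogram`; the disclosure only specifies a predetermined number of groups and a per-group sum.

```python
def group_histogram(histogram, num_groups):
    """Sum consecutive bins of `histogram` into `num_groups` equal-width groups."""
    if num_groups <= 0 or len(histogram) % num_groups != 0:
        raise ValueError("histogram length must be a positive multiple of num_groups")
    width = len(histogram) // num_groups
    return [sum(histogram[i:i + width]) for i in range(0, len(histogram), width)]


# Hypothetical 8-bin frequency distribution reduced to 4 groups.
first_frequency_distribution = [3, 1, 0, 4, 2, 2, 5, 0]
feature_vector = group_histogram(first_frequency_distribution, 4)
```

Grouping shortens the feature vector and makes it less sensitive to a count shifting between adjacent bins, at the cost of some discriminating detail.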

[0011] In one embodiment, the step of extracting the first feature set may include: extracting, multiple times, a feature set comprising a plurality of first feature vectors from a region on the image corresponding to each of the plurality of regions of interest; and determining the first feature set based on the average value of the feature sets extracted multiple times.

[0012] In one embodiment, the step of extracting the first feature set may include: extracting, multiple times, a feature set comprising a plurality of first feature vectors from a region on the image corresponding to each of the plurality of regions of interest; and determining the first feature set based on the moving average value of the feature sets extracted multiple times.
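The two averaging variants of paragraphs [0011] and [0012] can be sketched for a single region of interest. The disclosure does not say which kind of moving average is meant, so the exponential form and its smoothing factor `alpha` are assumptions, as are the function names.

```python
def average_vectors(vectors):
    """Element-wise mean of equal-length feature vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]


def exponential_moving_average(vectors, alpha=0.5):
    """Element-wise EMA; later extractions weigh more. `alpha` is an assumed parameter."""
    smoothed = list(vectors[0])
    for vector in vectors[1:]:
        smoothed = [alpha * new + (1 - alpha) * old for old, new in zip(smoothed, vector)]
    return smoothed


# Hypothetical feature vectors for one ROI from three repeated captures.
captures = [[4.0, 8.0], [6.0, 8.0], [8.0, 2.0]]
mean_vector = average_vectors(captures)
ema_vector = exponential_moving_average(captures, alpha=0.5)
```

The plain average treats all captures equally, whereas the moving average lets the most recent capture dominate, which may suit a surface that changes between captures.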

[0013] In one embodiment, the plurality of regions of interest may be tool marks or unique pattern regions formed on the surface of the target object.

[0014] In one embodiment, the step of determining the overall similarity may include: determining the individual similarity between each of the plurality of first feature vectors included in the first feature set and the corresponding second feature vector included in the stored feature set; and determining the overall similarity based on the individual similarities.

[0015] In one embodiment, the step of determining the individual similarity may include the step of determining the individual similarity using at least one of a cosine similarity or Euclidean distance algorithm.
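The matching described in paragraphs [0014] and [0015] can be sketched as follows. Representing a feature set as a dictionary from a region-of-interest key to a feature vector, converting Euclidean distance to a similarity with 1 / (1 + d), and aggregating individual similarities with a plain mean are all assumptions; the function names and sample data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_similarity(a, b):
    """Map Euclidean distance to (0, 1]; the 1 / (1 + d) mapping is an assumption."""
    return 1.0 / (1.0 + math.dist(a, b))


def individual_similarities(first_set, stored_set, metric=cosine_similarity):
    """One similarity per ROI, pairing vectors that share the same ROI key."""
    return {roi: metric(vec, stored_set[roi]) for roi, vec in first_set.items()}


def overall_similarity(first_set, stored_set, metric=cosine_similarity):
    sims = individual_similarities(first_set, stored_set, metric)
    return sum(sims.values()) / len(sims)  # assumed aggregation: plain mean


def best_match(first_set, stored_sets, metric=cosine_similarity):
    """Key of the stored feature set with the highest overall similarity."""
    return max(stored_sets, key=lambda k: overall_similarity(first_set, stored_sets[k], metric))


# Hypothetical stored feature sets for two instruments, two ROIs each.
stored = {
    "instrument_A": {"roi_1": [1.0, 0.0, 0.0], "roi_2": [0.0, 1.0, 0.0]},
    "instrument_B": {"roi_1": [0.0, 0.0, 1.0], "roi_2": [1.0, 1.0, 0.0]},
}
query = {"roi_1": [0.9, 0.1, 0.0], "roi_2": [0.1, 0.9, 0.0]}
match_cosine = best_match(query, stored)
match_euclidean = best_match(query, stored, metric=euclidean_similarity)
```

Cosine similarity ignores the overall magnitude of a histogram, while the Euclidean variant does not, so the two can rank candidates differently when image regions differ in size or exposure.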

[0016] In one embodiment, the step of determining the overall similarity may include the step of determining the overall similarity by applying a weight pre-assigned to each of the plurality of regions of interest to the individual similarities.

[0017] In one embodiment, the weights may be determined based on the expected frequency of damage or wear of each of the plurality of regions of interest.
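The weighting of paragraphs [0016] and [0017] can be sketched as a weighted mean. The disclosure says only that weights are based on the expected frequency of damage or wear; lowering the weight of regions that wear more often, the rule `1 - frequency`, the normalization by total weight, and all names below are assumptions.

```python
def weighted_overall_similarity(similarities, weights):
    """Weighted mean of per-ROI similarities; weights need not sum to 1."""
    total_weight = sum(weights[roi] for roi in similarities)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(similarities[roi] * weights[roi] for roi in similarities) / total_weight


def weights_from_wear(expected_wear):
    """Assumed rule: weight = 1 - expected wear frequency (values in [0, 1])."""
    return {roi: 1.0 - freq for roi, freq in expected_wear.items()}


# Hypothetical per-ROI similarities and expected wear frequencies.
similarities = {"hinge": 0.5, "handle": 1.0}
weights = weights_from_wear({"hinge": 0.75, "handle": 0.25})
score = weighted_overall_similarity(similarities, weights)
```

With these numbers the frequently worn hinge region counts for a quarter of the score, so its low similarity reduces the overall value to 0.875 instead of the unweighted 0.75.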

[0018] In one embodiment, the step of determining the overall similarity includes determining the overall similarity based only on individual similarities that are greater than or equal to a predetermined first threshold among the determined plurality of individual similarities, and the step of determining one or more vectors to be changed may include determining a second feature vector, in which the individual similarity is greater than or equal to the first threshold and less than a predetermined second threshold, as the vector to be changed.
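The two-threshold logic of paragraph [0018], combined with the update of paragraph [0006], can be sketched as below. The reading that similarities under the first threshold indicate damage to be ignored, and that similarities between the two thresholds indicate gradual wear to be absorbed, is an interpretation rather than something the disclosure states. The plain mean, the function name `select_and_update`, and the sample values are assumptions.

```python
def select_and_update(first_set, second_set, similarities, first_threshold, second_threshold):
    """Return (overall similarity, ROI keys updated); mutates second_set in place."""
    # Only similarities at or above the first threshold contribute to the overall value.
    used = {roi: s for roi, s in similarities.items() if s >= first_threshold}
    overall = sum(used.values()) / len(used) if used else 0.0
    # Vectors to change: at or above the first threshold but below the second.
    to_change = [roi for roi, s in used.items() if s < second_threshold]
    for roi in to_change:
        second_set[roi] = list(first_set[roi])
    return overall, to_change


# Hypothetical feature sets and per-ROI individual similarities.
first_set = {"roi_1": [5, 1], "roi_2": [3, 3], "roi_3": [0, 6]}
second_set = {"roi_1": [5, 1], "roi_2": [4, 2], "roi_3": [6, 0]}
similarities = {"roi_1": 0.98, "roi_2": 0.85, "roi_3": 0.30}
overall, changed = select_and_update(first_set, second_set, similarities, 0.6, 0.9)
```

Here `roi_3` is excluded from the overall similarity and left untouched in storage, while `roi_2` is overwritten with the newly extracted vector.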

[0019] An apparatus according to another aspect of the present disclosure comprises one or more processors; and one or more memories in which instructions to be executed by the one or more processors are stored. When the instructions are executed, the one or more processors acquire an image of a target object, extract a first feature set from a plurality of image regions on the image corresponding to each of a plurality of regions of interest (ROI) predefined for the target object, wherein the first feature set comprises a plurality of first feature vectors for each of the plurality of image regions, determine, for each of a plurality of feature sets stored in the memory, an overall similarity with the first feature set, and determine the feature set with the highest overall similarity among the plurality of feature sets stored in the memory as a second feature set.

[0020] A non-transitory computer-readable recording medium according to another aspect of the present disclosure stores instructions to be executed by one or more processors. When the instructions are executed, the one or more processors may acquire an image of a target object, extract a first feature set from a plurality of image regions on the image corresponding to each of a plurality of regions of interest (ROI) predefined for the target object, wherein the first feature set includes a plurality of first feature vectors for each of the plurality of image regions, determine, for each of a plurality of feature sets stored in memory, an overall similarity with the first feature set, and determine the feature set with the highest overall similarity among the plurality of feature sets stored in memory as a second feature set.

[0021] According to various embodiments of the present disclosure, an optical inspection system capable of identifying an object without attaching a separate tag or barcode can be provided.

[0022] The effects according to the technical concept of the present disclosure are not limited to those mentioned above, and other unmentioned effects can be clearly understood by a person skilled in the art from the description in the specification.

[0023] FIG. 1 illustrates an environment in which a device according to one embodiment of the present disclosure can be applied.

[0024] FIG. 2 is a block diagram of a device according to one embodiment of the present disclosure.

[0025] FIG. 3 illustrates the structure of an optical inspection system including a first camera and a second camera according to one embodiment of the present disclosure.

[0026] FIG. 4 is a front view of an optical inspection system according to one embodiment of the present disclosure.

[0027] FIG. 5 is an exemplary diagram of a region of interest of a target object according to one embodiment of the present disclosure.

[0028] FIG. 6 is a drawing for explaining the process of generating a set of features according to one embodiment of the present disclosure.

[0029] FIG. 7 is a drawing for explaining the process of generating a set of features according to one embodiment of the present disclosure.

[0030] FIG. 8 is a drawing for explaining the process of generating a set of features according to one embodiment of the present disclosure.

[0031] FIG. 9 is a drawing for explaining the process of generating a set of features according to one embodiment of the present disclosure.

[0032] FIG. 10 is an example of a first set of features according to one embodiment of the present disclosure.

[0033] FIG. 11 is a set of features stored in one or more memories according to one embodiment of the present disclosure.

[0034] FIG. 12 is a diagram illustrating the update of a feature vector according to one embodiment of the present disclosure.

[0035] FIG. 13 is a flowchart of a method for generating a two-dimensional image of a target object according to one embodiment of the present disclosure.

[0036] FIG. 14 is a flowchart of a method for identifying a target object according to one embodiment of the present disclosure.

[0037] FIG. 15 is a flowchart of a method for identifying a target object according to one embodiment of the present disclosure.

[0038] FIG. 16 is a flowchart of a method for recognizing a target object according to one embodiment of the present disclosure.

[0039] FIG. 17 is a flowchart of a method for extracting a first set of features according to one embodiment of the present disclosure.

[0040] FIG. 18 is a flowchart of a method for determining a second set of features according to one embodiment of the present disclosure.

[0041] The various embodiments described in this document are illustrative for the purpose of clearly explaining the technical concept of the present disclosure and are not intended to limit it to specific embodiments. The technical concept of the present disclosure includes various modifications, equivalents, alternatives, and embodiments optionally combined from all or part of each embodiment described in this document. Furthermore, the scope of the technical concept of the present disclosure is not limited to the various embodiments presented below or the specific descriptions thereof.

[0042] Terms used in this document, including technical or scientific terms, may have the meaning generally understood by those skilled in the art to which this disclosure pertains, unless otherwise defined.

[0043] Expressions used in this document, such as “includes,” “may include,” “is equipped with,” “may be equipped with,” “has,” and “may have,” imply the existence of the subject feature (e.g., function, operation, or component, etc.) and do not exclude the existence of other additional features. In other words, such expressions should be understood as open-ended terms implying the possibility of including other embodiments.

[0044] Singular expressions used in this document may include the meaning of the plural form unless the context otherwise indicates, and this applies likewise to singular expressions described in the claims.

[0045] Expressions such as "first" and "second" used in this document are used to distinguish one object from another when referring to multiple objects of the same kind, unless the context implies otherwise, and do not limit the order or importance of said objects.

[0046] Expressions used in this document such as “A, B, and C”, “A, B, or C”, “A, B, and / or C”, “at least one of A, B, and C”, “at least one of A, B, or C”, “at least one of A, B, and / or C”, etc., may mean each of the listed items or all possible combinations of the listed items. For example, “at least one of A or B” may refer to (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.

[0047] The expression "based on" as used in this document is used to describe one or more factors affecting an act or action of a decision or judgment described in the phrase or sentence containing such expression, and this expression does not exclude additional factors affecting said act or action of a decision or judgment.

[0048] As used in this document, the expression that a certain component (e.g., a first component) is "connected" or "coupled" to another component (e.g., a second component) may mean not only that the certain component is directly connected or coupled to the other component, but also that it is connected or coupled through yet another component (e.g., a third component).

[0049] As used in this document, the expression "configured to" may have meanings such as "set to," "capable of," "modified to," "made to," or "able to." This expression is not limited to the meaning of "specifically designed in hardware," and, for example, a processor configured to perform a specific action may refer to a general-purpose processor capable of performing that specific action by executing software.

[0050] In the present disclosure, the "learning model" may be designed to implement the structure of the human brain on a computer and may include a plurality of network nodes having weights that simulate neurons of a human neural network. The plurality of network nodes simulate the synaptic activity of neurons transmitting and receiving signals through synapses and may have interconnected relationships. In the learning model, the plurality of network nodes may be located in layers of different depths and may exchange data according to convolutional connections. For example, the learning model may be an artificial neural network model, a regression analysis model, etc. Meanwhile, the present disclosure is not limited thereto, and various types of models for analyzing data may be used.

[0051] In the present disclosure, the "training process" may refer to a process in which a learning model extracts and analyzes features (patterns) of pairs of input and output data in the training data, repeatedly derives correlations between the input and output data, and optimizes the parameters of the learning model based on those correlations.

[0052] In the present disclosure, the "inference process" may refer to a process in which a learning model applies a pattern previously learned to new input data to generate output data as a result of prediction or classification of the input data.

[0053] Various embodiments of the present disclosure will be described below with reference to the accompanying drawings. In the accompanying drawings and the description thereof, identical or substantially equivalent components may be given the same reference numerals. Furthermore, in the description of the various embodiments below, the description of identical or corresponding components may be omitted, but this does not mean that such components are not included in the embodiments.

[0054] FIG. 1 is a drawing illustrating an environment in which a device according to one embodiment of the present disclosure may be applied. The device (110), the first camera (120), and the second camera (130) may be connected via a network and communicate with each other.

[0055] The device (110) may be a computing device comprising one or more processors that receive and analyze image information acquired from the first camera (120) and the second camera (130) and control the shooting operation of the cameras (120, 130). The device (110) may serve as a brain that performs overall computation and control functions. For example, the device (110) may be a central server, a personal computer (PC), a mobile terminal (tablet PC, etc.), or a control device embedded in the inspection equipment itself. A system for recognizing an object according to one embodiment may include a workbench on which the target object is placed and a frame structure placed above the workbench. According to one embodiment, a plurality of target objects may be placed in a tray and positioned on the workbench. The frame structure may include a gantry structure, such as an X-axis unit and a Y-axis unit, on which the first camera (120) is fixedly installed and which allows the second camera (130) to move over the workbench area.

[0056] The first camera (120) can acquire a three-dimensional image including a plurality of target objects placed in an inspection target area. In one embodiment, the first camera (120) can acquire a three-dimensional image of target objects placed on a tray, and the three-dimensional image acquired by the first camera (120) can be used to identify information such as the position, type, and orientation (e.g., tilt) of each object.

[0057] The second camera (130) can acquire a detailed two-dimensional image of a specific area of an individual target object identified by the first camera (120). In one embodiment, the second camera (130) may be used to capture a two-dimensional image with a higher resolution than the first camera (120) to precisely identify unique surface features (e.g., tool marks) of the target object.

[0058] The device (110) analyzes a three-dimensional image acquired from the first camera (120) to determine the shooting position of the second camera (130), and corrects a two-dimensional image acquired by the second camera (130) based on 3D posture information, thereby enabling accurate identification of a target object even when the object is tilted or overlaps another object in an arbitrary posture. 3D posture information may refer to information indicating the direction and position of a target object in three-dimensional space, extracted from a three-dimensional image. For example, 3D posture information may include at least one of slope (or tilt) information indicating how much and in which direction the target object is tilted relative to the surface of the workbench, or height information for determining whether it lies on top of another object. Such 3D posture information may be used as a parameter for a mathematical coordinate transformation that corrects the two-dimensional image.
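A very reduced illustration of using tilt as a parameter of a coordinate transformation is given below. It is not the correction used in the disclosure, which does not specify one. The sketch assumes a camera looking straight down from far enough away that perspective can be ignored, a flat surface tilted about a horizontal image line, and a hypothetical function name `correct_tilt`. Under those assumptions the tilted surface appears compressed along y by the cosine of the tilt angle, and the correction stretches it back.

```python
import math


def correct_tilt(points, tilt_deg, axis_y=0.0):
    """Stretch y-coordinates to undo foreshortening of a plane tilted by tilt_deg.

    Assumes the tilt axis is the horizontal image line y = axis_y and that the
    camera is far enough away for perspective effects to be ignored.
    """
    if abs(tilt_deg) >= 90.0:
        raise ValueError("tilt must be strictly between -90 and 90 degrees")
    scale = 1.0 / math.cos(math.radians(tilt_deg))
    return [(x, axis_y + (y - axis_y) * scale) for x, y in points]


# Hypothetical ROI corners observed on an object tilted by 60 degrees.
observed = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
corrected = correct_tilt(observed, tilt_deg=60.0)
```

A practical system would more likely apply a full projective transformation derived from the camera model; the one-axis scaling here only conveys the idea that measured tilt can feed a geometric correction.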

[0059] The aforementioned device (110) may be implemented as one or more computing devices. For example, all functions of the device (110) may be implemented in a single computing device. As another example, the first function of the device (110) may be implemented in a first computing device, and the second function may be implemented in a second computing device. As yet another example, multiple computing devices may be used, each implementing all functions or specific functions of the device (110). The aforementioned computing devices may be a desktop computer, a laptop computer, an application server, a proxy server, or a cloud server, but are not limited thereto, and any type of device equipped with computing functions may be a computing device.

[0060] The network can serve to connect the device (110), the first camera (120), the second camera (130), or other external devices. The network can be implemented as any kind of wired or wireless network, such as, for example, a Local Area Network (LAN), a Wide Area Network (WAN), a Mobile Radio Communication Network, or Wibro (Wireless Broadband Internet).

[0061] In the following, "device" may be used as a top-level concept encompassing an entire inspection system that embodies the technical concept of the present disclosure. As such an entire inspection system, the device may include a separate device (110) connected via a network to a first camera (120) and a second camera (130), as illustrated in FIG. 1. According to one embodiment, the device may be implemented in a form in which the device (110), the first camera (120), and the second camera (130) are physically integrated within a single housing or frame.

[0062] FIG. 2 is a block diagram of a device according to one embodiment of the present disclosure.

[0063] The device (110) can process information for recognizing a target object. In one embodiment, the device (110) may include one or more processors (210), one or more memories (220), and a communication circuit (230) as components. In one embodiment, at least one of the components of the device (110) may be omitted, or another component may be added to the device (110). In one embodiment, additionally or alternatively, some components may be implemented as an integrated unit or as a singular or multiple entity. In the present disclosure, one or more processors (210) may be referred to as processors (210). Unless the context clearly indicates otherwise, the expression processors (210) may mean a set of one or more processors. In the present disclosure, one or more memories (220) may be referred to as memories (220). Unless the context clearly indicates otherwise, the expression memories (220) may mean a set of one or more memories. In one embodiment, at least some of the components inside and outside the device (110) may be connected to each other via a bus, GPIO (General Purpose Input / Output), SPI (Serial Peripheral Interface), or MIPI (Mobile Industry Processor Interface), etc., to exchange information (data, signals, etc.).

[0064] The processor (210) can perform various information processing operations related to the technology of the present disclosure (e.g., decision, operation, judgment, information generation, output, modification, update, control of other components, etc.). The processor (210) can perform the corresponding operations by executing instructions stored in the memory (220). That is, the processor (210) can cause the device (110) to perform each of the embodiments of the present disclosure according to the instructions being executed. In one embodiment, the processor (210) can acquire an image of a target object, extract a first feature set from a plurality of image regions on the image corresponding to each of a plurality of regions of interest (ROI) predefined for the target object, determine a second feature set among a plurality of feature sets for the target object stored in one or more memories based on the first feature set, determine one or more change target vectors among a plurality of second feature vectors included in the second feature set based on the first feature set and the second feature set, and update the value of the one or more change target vectors with one or more first feature vectors corresponding to the one or more change target vectors among the plurality of first feature vectors.

[0065] According to one embodiment, a feature set is an identifier for uniquely identifying a target object, and may include a plurality of feature vectors extracted from a plurality of image regions on an image corresponding to each of a plurality of predefined regions of interest for the target object. A feature vector refers to a one-dimensional data array stored in one or more memories (220) as information corresponding to a region of interest.

[0066] In the present disclosure, the processor (210) is hardware configured to perform the operations described above. The processor (210) may be a general-purpose processor capable of performing specific operations by executing instructions, or a special-purpose processor structured to perform said operations through programming. For example, the processor (210) may be a circuit comprising a CPU (Central Processing Unit), GPU (Graphics Processing Unit), NPU (Neural Processing Unit), integrated circuit, microprocessor, ASICs (Application Specific Integrated Circuits), FPGA (Field-Programmable Gate Array), conventional circuitry, or a combination thereof. That is, the processor (210) may be implemented as a circuit comprising transistors, integrated circuits, or other circuits.

[0067] The memory (220) can store various information (data). The information stored in the memory (220) is information acquired, processed, or used by at least one component of the device (110), and may include software (e.g., instructions, programs, etc.). The memory (220) may include volatile and / or non-volatile memory. In the present disclosure, instructions or programs are software stored in the memory (220) and may include an operating system for controlling the resources of the device (110), an application, and / or middleware that provides various functions to the application so that the application can utilize the resources of the device (110). In one embodiment, the memory (220) may store instructions that cause the processor (210) to perform calculations when executed by the processor (210). The memory (220) may store at least a portion of information received from a terminal through the communication circuit (230) and / or information transmitted to the terminal through the communication circuit (230). The processor (210) can store at least a portion of the information received from the terminal through the communication circuit (230) and / or the information transmitted to the terminal through the communication circuit (230) in the memory (220).

[0068] The communication circuit (230) can communicate with a user's terminal and an external device. The communication circuit (230) can perform wireless or wired communication between the device (110) and the terminal. For example, the communication circuit (230) can perform wireless communication according to methods such as eMBB (enhanced Mobile Broadband), URLLC (Ultra Reliable Low-Latency Communications), mMTC (massive Machine Type Communications), LTE (Long-Term Evolution), LTE-A (LTE Advanced), NR (New Radio), UMTS (Universal Mobile Telecommunications System), GSM (Global System for Mobile communications), CDMA (Code Division Multiple Access), WCDMA (Wideband CDMA), WiBro (Wireless Broadband), WiFi (Wireless Fidelity), Bluetooth, NFC (Near Field Communication), GPS (Global Positioning System), or GNSS (Global Navigation Satellite System). For example, the communication circuit (230) can perform wired communication according to methods such as USB (Universal Serial Bus), HDMI (High Definition Multimedia Interface), RS-232 (Recommended Standard-232), or POTS (Plain Old Telephone Service). In one embodiment, the device (110) may be implemented by integrating it with another device. In this case, the communication circuit (230) may function as a connection circuit or interface connecting the device (110) and the other device.

[0069] FIG. 3 illustrates the structure of an optical inspection system including a first camera and a second camera according to one embodiment of the present disclosure. Referring to FIG. 3, the system for recognizing an object may include a workbench (160) on which an object is placed, a first camera (120) and a first light (122) placed on the upper part of the workbench (160), a second camera (130) and a second light (132) configured to be movable and placed between the first camera (120) and the workbench (160), and an X-axis movement unit (140) and a Y-axis movement unit (150) for moving the second camera (130) and the second light (132).

[0070] The workbench (160) may provide a flat stage on which a plurality of target objects to be inspected (e.g., a tray containing surgical tools) are placed. For example, a user may wash a surgical tool that has been used once, place it in a tray, place it on the workbench, and perform the entire inspection process using the first camera (120) and the second camera (130).

[0071] The first camera (120) may be a wide view camera and may be fixedly installed on the top frame (not shown) of the system to provide a view of the entire area of the workbench (160). The first camera (120) may acquire a three-dimensional image of a plurality of target objects placed on the workbench (160). The first light (122) may be positioned together with the first camera (120) to provide lighting necessary for acquiring a three-dimensional image to the entire area of the workbench (160). The processor (210) may analyze the three-dimensional image acquired from the first camera (120) to extract 3D pose information including information on at least one of the accurate three-dimensional position coordinates or tilting of each target object on the workbench (160).

[0072] According to one embodiment, when identifying a first target object from the three-dimensional image, the processor (210) may separate overlapping target objects based on the 3D pose information. According to one embodiment, the processor (210) may perform three-dimensional object segmentation based on height information extracted from the three-dimensional image obtained from the first camera (120). For example, the processor (210) may detect a point where the difference in height (Z value) between adjacent pixels (or points) on the three-dimensional image exceeds a preset threshold, and determine this point as a boundary between different objects. Through this method, even if one object physically overlaps another object, the processor (210) can separate and identify each object as an independent entity (e.g., 'first target object' and 'second target object').
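
The height-based separation described above can be sketched as a region-growing pass over a height map. This is a minimal illustration, not the disclosed implementation: the function name `segment_by_height`, the list-of-lists height map, and the 4-neighbor connectivity are all assumptions made for the example.

```python
from collections import deque

def segment_by_height(height_map, threshold):
    """Label connected regions; a height jump above `threshold` is a boundary.

    `height_map` is a hypothetical 2D list of Z values (one per pixel).
    Returns (labels, region_count).
    """
    rows, cols = len(height_map), len(height_map[0])
    labels = [[-1] * cols for _ in range(rows)]
    region_count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # pixel already belongs to a region
            labels[r0][c0] = region_count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if labels[nr][nc] != -1:
                        continue
                    # Grow the region only across small height differences.
                    if abs(height_map[nr][nc] - height_map[r][c]) <= threshold:
                        labels[nr][nc] = region_count
                        queue.append((nr, nc))
            region_count += 1
    return labels, region_count

# Workbench surface (0), one object (5), and a second object resting on it (9).
example_heights = [
    [0, 0, 5, 5],
    [0, 0, 5, 5],
    [0, 0, 5, 9],
]
example_labels, example_count = segment_by_height(example_heights, threshold=2)
```

In this toy height map the pixel at height 9 receives its own label even though it touches the object at height 5, which mirrors the idea of treating a sharp Z discontinuity as an object boundary.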

[0073] The second camera (130) may be a narrow view camera and may acquire a high-resolution two-dimensional image of a specific region of interest of an individual target object. The second light (132) may be positioned together with the second camera (130) to provide local illumination necessary for capturing the two-dimensional image.

[0074] The second camera (130) and second light (132) modules can be coupled to and moved by an X-axis movement unit (140) and a Y-axis movement unit (150) that constitute a gantry system. The X-axis movement unit (140) and the Y-axis movement unit (150) can be driven according to a control signal from the processor (210) so that the second camera (130) can move freely in the XY plane above the workbench (160).

[0075] A system including the above configuration can recognize a target object through the following process. First, the processor (210) can acquire a three-dimensional image of the entire workbench (160) using the first camera (120) and extract the type of the first target object, three-dimensional position coordinates, and tilt information from it. Next, the processor (210) can determine a region of interest (ROI) to be photographed based on the identified type of the target object and determine two-dimensional shooting position coordinates to photograph the region of interest. Then, the processor (210) controls the X-axis movement unit (140) and the Y-axis movement unit (150) to move the second camera (130) to the calculated shooting position, and the second camera (130) can acquire a high-resolution two-dimensional image at that position. At this time, the first light (122) can be installed at a sufficient height so that no physical interference (collision) occurs in the movement path of the second camera (130).

[0076] According to one embodiment, the processor (210) can control the second camera (130) to capture only the remaining areas excluding the hidden or overlapping areas when acquiring a two-dimensional image of a plurality of regions of interest.

[0077] For example, if a first target object is partially obscured by a second target object, some (e.g., ROI 5) of the regions of interest (e.g., nine) of the first target object may be located below the surface of the second target object, making it impossible or meaningless to photograph them with the second camera (130). In this case, the processor (210) can determine, based on the three-dimensional image obtained from the first camera (120), one or more regions of interest among the multiple regions of interest of the target object that are identifiable because their surfaces are exposed. Then, the processor (210) can control the second camera (130) to selectively obtain two-dimensional images only for the regions of interest determined to be identifiable. That is, regions of interest that are confirmed to be obscured in the three-dimensional image may be excluded from the shooting targets of the second camera (130).
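
One way to picture the selection of exposed regions of interest is to compare the height observed in the three-dimensional image with the height expected for each region's surface. The sketch below is illustrative only: `visible_rois`, the `(x, y, expected_z)` tuple layout, and the `tolerance` parameter are hypothetical names and choices, not taken from the disclosure.

```python
def visible_rois(rois, observed_height, tolerance=1.0):
    """Return the IDs of regions of interest whose surface is exposed.

    `rois` maps an ROI id to (x, y, expected_z), where expected_z is the
    height the ROI surface should have given the object's pose.
    `observed_height[y][x]` is the height measured in the 3D image.
    An ROI counts as hidden when something sits more than `tolerance`
    above its expected surface.
    """
    visible = []
    for roi_id, (x, y, expected_z) in rois.items():
        if observed_height[y][x] - expected_z <= tolerance:
            visible.append(roi_id)
    return visible

# A second object (height 9.0) lies over the top-right corner.
example_observed = [
    [2.0, 2.0, 9.0],
    [2.0, 2.0, 9.0],
    [2.0, 2.0, 2.0],
]
example_rois = {
    "ROI 1": (0, 0, 2.0),
    "ROI 5": (2, 0, 2.0),  # covered by the second object
    "ROI 9": (2, 2, 2.0),
}
rois_to_capture = visible_rois(example_rois, example_observed)
```

Only the regions returned by such a filter would then be queued as shooting positions for the second camera (130).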

[0078] According to one embodiment, a plurality of target objects may be stored in a state where they overlap each other inside a tray, and as a result, the second camera (130) can capture a two-dimensional image of the target objects while they are tilted. The processor (210) can generate a corrected two-dimensional image by performing a coordinate transformation on the two-dimensional image to correct distortion caused by the tilting of the target objects. For example, the processor (210) can obtain tilt information of the first target object from a three-dimensional image. Then, the processor (210) can perform a coordinate transformation that mathematically corrects the pixel positions of the two-dimensional image based on the tilt information.

[0079] For example, the pixel coordinates of a specific region of interest on a reference image may be (100, 50), but on a 2D image acquired by the second camera (130), the region of interest may be located at a deformed coordinate such as (90, 45) due to distortion. The processor (210) can extract tilt information indicating that the target object is tilted by 15 degrees from the 3D image. In this case, the processor (210) can use the 15-degree tilt information as a parameter for coordinate transformation to inverse transform the coordinates (90, 45) of the acquired 2D detailed image into the reference coordinates (100, 50), or forward transform the reference coordinates (100, 50) into the distorted coordinates (90, 45) to align the two coordinate systems. Through this, even if the target object is tilted at any angle on the workbench (160), the processor (210) can always extract features in the same reference coordinate system as an image taken from the front, improving the accuracy of identification.
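
The forward and inverse transforms can be illustrated with a deliberately simple model. The sketch assumes an orthographic view and a tilt about the image Y axis passing through the coordinate origin, so that tilting only foreshortens X by cos(tilt). The disclosure does not specify the transform, and the illustrative coordinates (90, 45) above are not reproduced by this model. The function names `to_image` and `to_reference` are hypothetical.

```python
import math

def to_image(x_ref, y_ref, tilt_deg):
    """Forward transform: reference coordinates -> tilted-image coordinates.

    Assumed model: rotation about the image Y axis under orthographic
    projection, which compresses X by cos(tilt) and leaves Y unchanged.
    """
    scale = math.cos(math.radians(tilt_deg))
    return x_ref * scale, y_ref

def to_reference(x_img, y_img, tilt_deg):
    """Inverse transform: tilted-image coordinates -> reference coordinates."""
    scale = math.cos(math.radians(tilt_deg))
    return x_img / scale, y_img

# Round trip for a 15-degree tilt reported by the 3D pose estimate.
x_tilted, y_tilted = to_image(100.0, 50.0, 15.0)
x_back, y_back = to_reference(x_tilted, y_tilted, 15.0)
```

A practical system would more likely use a full homography or a 3D rotation with a calibrated camera model; the point here is only that the tilt angle acts as the parameter linking the two coordinate systems.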

[0080] FIG. 4 is a front view of an optical inspection system according to one embodiment of the present disclosure. Description of content that overlaps with FIG. 3 will be omitted. A processor (210) can control the shooting operation of the first camera (120) and the second camera (130) to optimize the shooting operation under the optical and physical arrangement conditions shown in FIG. 4.

[0081] According to one embodiment, the first camera (120) may be positioned at a distance of D2 from the workbench (160). The first camera (120) acquires a three-dimensional image to measure the accurate three-dimensional position coordinates and tilt of an object placed on the workbench (160). D2 can be set long enough that the field of view of the first camera (120) covers the entire area of the workbench (160) while minimizing perspective distortion that may occur during three-dimensional measurement. According to one embodiment, the angle of view can have a value close to 90 degrees so that the first camera (120) obtains an image close to an orthographic view, with reduced edge distortion, while capturing the entire workbench (160).

[0082] The first light (122) may be positioned at a height of D1 from the workbench (160). According to one embodiment, D1 may be set at a height that ensures sufficient vertical separation distance so that the second camera (130), which moves in the XY plane on the workbench (160) under the control of the processor (210), does not physically collide with the first light (122).

[0083] The second camera (130) and the second light (132) can operate in close proximity to the surface of the workbench (160) to capture the unique surface features of the target object as a high-resolution two-dimensional image. At this time, the processor (210) can control the height so that the second camera (130) maintains an optimal focal distance and working distance of D3 (e.g., 50 mm or more) from the surface of the target object or the workbench.

[0084] FIG. 5 is an exemplary diagram of a region of interest of an object according to one embodiment of the present disclosure. The object (500) shown in FIG. 5 may be surgical scissors, but embodiments of the present invention are not limited to such surgical tools and may be applied to various objects such as factory parts, bolts, and nuts, where it is difficult to attach a separate identification tag or barcode, or where attaching a tag is inefficient due to high temperature / high pressure cleaning and sterilization processes.

[0085] The processor (210) can uniquely identify each object by using fine tool marks or unique patterns that naturally occur on the surface of the target object (500) during the processing or use process. For example, the plurality of regions of interest (510, 520, 530, 540, 550, 560, 570, 580, 590) shown in FIG. 5 may be specific areas on the surface of the target object (500) where these unique surface features are clearly visible.

[0086] The processor (210) can determine the type of target object (500) based on a three-dimensional image and determine the location of the region of interest based on the type of target object (500). According to one embodiment, the processor (210) may use a pre-trained artificial neural network when determining the type of the first target object based on a three-dimensional image. For example, the artificial neural network may be a Convolutional Neural Network (CNN) specialized for image recognition.

[0087] According to one embodiment, the artificial neural network may be pre-trained using a supervised learning method. In this case, the artificial neural network may be trained based on training data in which a training image containing a plurality of target objects is used as input data, and a label specifying the type of each target object included in the training image is used as output data. Through this training data, the processor (210) can learn the correlation between a specific image pattern and a specific type of object (e.g., 'scissors'). According to one embodiment, the artificial neural network may be trained using an unsupervised learning or self-supervised learning method that uses only the training image without labels. In this case, the model can cluster objects by type by learning the statistical distribution or features of the image data itself.

[0088] According to one embodiment, the training image included in the training data may include at least one of a 2D image or a 3D image. For example, the model may receive a 2D projected image generated from a 3D image as input to identify the type. Alternatively, it may receive the data itself, such as depth information of the 3D image or point cloud data, as input to perform training and identification.

[0089] According to one embodiment, a plurality of regions of interest may be determined by a predefined template depending on the type or model of the target object (500). For example, a plurality of regions of interest may be predetermined for each type of surgical tool (e.g., surgical scissors, scalpel, tweezers, etc.) or factory part (e.g., bolt, nut, etc.). These templates may be determined based on frequently used parts for each type of target object. For example, in the case of surgical scissors, even if they are scissors of different individuals, parts such as the handle (510, 520, 530, 540), joint (550, 560), or blade (570, 580, 590) will be frequently used and wear out; therefore, for surgical scissors, these parts may be predetermined as regions of interest. As another example, in the case of a scalpel, a specific area of the handle surface where the blade is attached or the user mainly grips may be designated as a region of interest. In this way, the processor (210) can compare, analyze, and update unique features extracted from the same location of each object in subsequent steps based on consistent criteria.

[0090] According to one embodiment, the processor (210) may analyze an image of a target object to automatically detect areas with a high density or complexity of feature points, such as tool marks or patterns, and determine those areas as regions of interest. According to one embodiment, the processor (210) may divide the entire or partial area of the target object into a number of candidate regions and apply a preliminary feature extraction algorithm to each candidate region to calculate the feature complexity of the region. For example, a candidate region whose feature vector, generated by analyzing the brightness relationship between each pixel and its surrounding pixels, has a large variance may be determined to contain correspondingly rich and unique information. Among the candidate regions whose complexity calculated in this way exceeds a preset threshold, the processor (210) may automatically determine the top N as the final regions of interest.
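
The threshold-then-top-N selection can be sketched as follows. The example assumes that each candidate region has already been reduced to a list of preliminary feature values and that population variance serves as the complexity score; `select_rois` and the candidate names are hypothetical.

```python
from statistics import pvariance

def select_rois(candidates, threshold, top_n):
    """Pick up to `top_n` candidate regions whose complexity exceeds `threshold`.

    `candidates` maps a region name to its preliminary feature values.
    Complexity is taken to be the population variance of those values
    (an assumption; the text only says "variance").
    """
    scored = [(pvariance(values), name) for name, values in candidates.items()]
    passing = [item for item in scored if item[0] > threshold]
    passing.sort(reverse=True)  # highest complexity first
    return [name for _, name in passing[:top_n]]

example_candidates = {
    "flat": [1, 1, 1, 1],          # variance 0
    "textured": [0, 10, 0, 10],    # variance 25
    "faint": [0, 4, 0, 4],         # variance 4
    "scratched": [0, 100, 0, 100], # variance 2500
}
chosen = select_rois(example_candidates, threshold=3, top_n=2)
```

With these toy values the flat region fails the threshold, and of the three remaining regions only the two most complex are kept.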

[0091] According to one embodiment, the regions of interest of the target object (500) may be dynamically added or deleted depending on the usage process. For example, if, while using the target object, a deep scratch or damage of an identifiable level occurs in a new region other than the previously defined regions of interest (510-590), the processor (210) may add that region as a new region of interest. Conversely, if it is determined that a specific region of interest is so severely damaged that it cannot be recovered and unique features can no longer be identified, that region may be deleted from the regions of interest.

[0092] According to one embodiment, information regarding a region of interest of a target object (500) (e.g., [510...590] coordinate template of 'Model A' scissors) may be stored and managed in one or more memories (220) of the device (110). Furthermore, a unique feature set actually extracted for the region of interest from each object may be mapped to the region of interest information and stored in a database.

[0093] FIG. 6 is a diagram illustrating the process of generating a feature set according to one embodiment of the present disclosure. FIG. 6 illustrates an area of interest (600) that is enlarged from any one of the plurality of areas of interest (510-590) shown in FIG. 5, and a sub-area (610) that is further divided into a certain unit from the area of interest (600). A processor (210) can obtain an image of the area of interest (600) from a two-dimensional image captured by a second camera (130), and then divide it into a plurality of sub-areas (610) for analysis.

[0094] According to one embodiment, the method of dividing the region of interest (600) into sub-regions (610) can be implemented in various ways. For example, if the region of interest (600) consists of 128x128 pixels, it can be divided into non-overlapping blocks of 3x3 pixels in size, such as the sub-region (610) of FIG. 6. As another example, the processor (210) can repeat the process of setting the upper-left 3x3 pixel area of the region of interest (600) as the first sub-region to calculate a representative value, moving it 1 pixel to the right, and then setting the 3x3 pixel area as the second sub-region. Each sub-region (610) can be composed of a pixel matrix of a predetermined size, such as 3x3, 5x5, or 7x7, as shown in FIG. 6.
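
Both division schemes (non-overlapping blocks and a window that slides by one pixel) can be expressed with a single stride parameter. The helper below is a sketch under that assumption; `split_blocks` is a hypothetical name, and the image is a plain 2D list of brightness values.

```python
def split_blocks(image, size=3, stride=3):
    """Split a 2D image into size x size sub-regions.

    stride == size gives non-overlapping blocks; stride == 1 gives a
    sliding window. Pixels that do not fill a complete block at the
    right or bottom edge are ignored (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    blocks = []
    for r in range(0, rows - size + 1, stride):
        row_blocks = []
        for c in range(0, cols - size + 1, stride):
            row_blocks.append([row[c:c + size] for row in image[r:r + size]])
        blocks.append(row_blocks)
    return blocks

# 6x6 test image whose pixel value encodes its position.
example_image = [[r * 6 + c for c in range(6)] for r in range(6)]
tiled = split_blocks(example_image, size=3, stride=3)    # 2 x 2 blocks
sliding = split_blocks(example_image, size=3, stride=1)  # 4 x 4 blocks
```

Under this edge handling, a 128x128 region of interest yields 42 x 42 non-overlapping 3x3 blocks (the last two rows and columns are left over) or 126 x 126 sliding-window blocks.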

[0095] The processor (210) can obtain a brightness value for each pixel from the second camera (130). For example, if the image is 8-bit grayscale, each pixel may have a single brightness value between 0 (black) and 255 (white). Or, if the image is RGB color scale, each pixel may have individual values for three color channels of red (R), green (G), and blue (B) (e.g., 8-bit values of 0-255 per channel).

[0096] According to one embodiment, the processor (210) can calculate a representative value for each sub-region (610) based on the brightness values of the pixels. The representative value can be determined using statistical values for the brightness values of the pixels included in the sub-region (610). For example, the processor (210) can calculate the average brightness value, the median brightness value, or the variance value of the brightness values of the nine pixels included in the 3x3 sub-region (610), and use them as the representative value of the sub-region (610).
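
The statistical representative values mentioned above can be computed directly with Python's standard library. In this sketch, `representative_stats` is a hypothetical helper, and the use of population variance (rather than sample variance) is an assumption, since the text does not say which is meant.

```python
from statistics import mean, median, pvariance

def representative_stats(block):
    """Return candidate representative values for one sub-region.

    `block` is a 2D list of brightness values (e.g., 3x3).
    """
    pixels = [value for row in block for value in row]
    return {
        "mean": mean(pixels),
        "median": median(pixels),
        "variance": pvariance(pixels),  # population variance (assumed)
    }

example_block = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
stats = representative_stats(example_block)
```

Any one of these values could then be stored as the representative value of the sub-region (610).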

[0097] According to one embodiment, the processor (210) can determine a representative value based on the relative brightness relationship between pixels so that the representative value has robustness against environmental factors such as changes in lighting. Since the relative bright and dark relationship between pixels does not change regardless of the brightness and illuminance of the entire workbench area, the processor (210) can extract a consistent representative value.

[0098] For example, the processor (210) can determine the center pixel of a sub-region (610) (e.g., 3x3 pixels) as the 'representative pixel' and compare the brightness value of the representative pixel with the brightness value of each of the eight surrounding pixels adjacent to the representative pixel. Based on the comparison result, the processor (210) can generate a binary code for the representative value. For example, if the surrounding pixels are brighter than or equal to the representative pixel, '1' can be assigned, and if they are darker, '0' can be assigned. The processor (210) can perform this process for all eight surrounding pixels to generate an 8-bit binary code (e.g., 10110010) and convert it to a decimal value (e.g., 178) to determine the final representative value of the corresponding sub-region (610).
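
The comparison scheme in this paragraph resembles what is commonly called a local binary pattern. The following sketch reproduces the example code 10110010 (decimal 178) under one assumed bit ordering: neighbors are visited clockwise starting from the upper-left pixel, and the first neighbor supplies the most significant bit. The disclosure does not state the ordering, and `lbp_code` and `NEIGHBOR_OFFSETS` are hypothetical names.

```python
# Clockwise from the upper-left neighbor (assumed ordering).
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(block):
    """Return the 8-bit code of a 3x3 block as a decimal value.

    A neighbor contributes '1' when it is brighter than or equal to the
    center (representative) pixel, and '0' when it is darker.
    """
    center = block[1][1]
    code = 0
    for dr, dc in NEIGHBOR_OFFSETS:
        bit = 1 if block[1 + dr][1 + dc] >= center else 0
        code = (code << 1) | bit
    return code

example_block = [
    [60, 40, 50],
    [30, 50, 70],
    [90, 20, 10],
]
code = lbp_code(example_block)  # bits 1,0,1,1,0,0,1,0

# Adding a uniform brightness offset leaves every comparison unchanged.
brighter_block = [[value + 40 for value in row] for row in example_block]
same_code = lbp_code(brighter_block)
```

Because only the ordering of brightness values matters, a uniform change in illumination produces the same code, which is the robustness property described in paragraph [0097].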

[0099] FIG. 7 is a diagram illustrating the process of generating a feature set according to one embodiment of the present disclosure. As described in FIG. 6, the processor (210) may divide the original region of interest (600) into a plurality of sub-regions (610) and calculate one representative value for each sub-region (610). The processor (210) may generate a representative value map (700), which is a new two-dimensional image created by maintaining the two-dimensional position coordinates of the sub-regions and using the representative value calculated for each sub-region as the pixel value (e.g., brightness value) at the corresponding location.

[0100] For example, if the representative value calculated in the upper left 3x3 sub-region of the original region of interest (600) is 178, the upper left pixel value of the representative value map (700) becomes 178, and if the representative value of the adjacent sub-region is 30, the corresponding pixel value of the representative value map (700) may be 30. In this representative value map (700), deviations such as lighting changes that the original image (600) may have are removed, and only local unique patterns such as tool marks are highlighted.
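
A representative value map can be sketched as a function that tiles the region of interest and stores one value per tile at the tile's grid position. The names `representative_value_map` and `brighter_neighbors` are hypothetical, and the simple neighbor-count function stands in for whichever representative value an embodiment actually uses.

```python
def representative_value_map(image, rep_fn, size=3):
    """Build a map with one representative value per non-overlapping block.

    `rep_fn` takes a size x size block (2D list) and returns a number.
    The output keeps the spatial layout of the blocks.
    """
    rows, cols = len(image), len(image[0])
    return [
        [rep_fn([row[c:c + size] for row in image[r:r + size]])
         for c in range(0, cols - size + 1, size)]
        for r in range(0, rows - size + 1, size)
    ]

def brighter_neighbors(block):
    """Stand-in representative value: neighbors at least as bright as the center."""
    center = block[1][1]
    pixels = [value for row in block for value in row]
    return sum(1 for value in pixels if value >= center) - 1  # exclude center

example_image = [[r * 6 + c for c in range(6)] for r in range(6)]
value_map = representative_value_map(example_image, brighter_neighbors)

# The same surface under brighter lighting yields an identical map.
brighter_image = [[value + 40 for value in row] for row in example_image]
value_map_brighter = representative_value_map(brighter_image, brighter_neighbors)
```

The two maps are equal because the stand-in value depends only on relative brightness, which illustrates how a uniform lighting offset drops out of the map.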

[0101] FIG. 8 is a diagram illustrating the process of generating a feature set according to one embodiment of the present disclosure. FIG. 8 illustrates a histogram visualizing a feature vector corresponding to a region of interest (600).

[0102] According to one embodiment, the processor (210) may divide the representative value map (700) into a plurality of unit regions, calculate the frequency of occurrence of representative values for each unit region, and then sequentially combine (concatenate) all of this frequency information to generate a histogram. A method for generating a histogram will be described in detail below.

[0103] According to one embodiment, the processor (210) may divide the representative value map (700) into a total of 16 unit regions arranged, for example, in a 4x4 grid, according to a predetermined method (e.g., the grid pattern of FIG. 7). For each unit region, the processor (210) may generate a local histogram indicating the frequency of occurrence of representative values of pixels included within that unit region. For example, if there are a total of 10 types of representative values from 0 to 9, each local histogram may have 10 bins. Alternatively, the number of bins may be determined by grouping or quantizing the entire range of representative values (e.g., 0 to 255) that can be generated through the algorithm of FIG. 6 into a predetermined number (e.g., 10). For example, when the processor (210) generates a local histogram for a first unit region, the first bin may represent the frequency of occurrence of pixels with a representative value of 0-24, the second bin the frequency of pixels with a representative value of 25-49, the third bin the frequency of pixels with a representative value of 50-74, and so on. In this way, a local histogram can be generated in which multiple representative values correspond to one bin.
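
The quantized local histogram can be sketched as below, using the bin ranges given in the example (0-24, 25-49, 50-74, and so on). Ten bins of width 25 cover only 0-249, so this sketch places the values 250-255 in the last bin; that handling is an assumption, as is the helper name `local_histogram`.

```python
def local_histogram(values, num_bins=10, bin_width=25):
    """Count representative values per bin for one unit region.

    Bin k covers [k * bin_width, (k + 1) * bin_width - 1]. Values beyond
    the last bin's range are clamped into the last bin (assumed).
    """
    histogram = [0] * num_bins
    for value in values:
        index = min(value // bin_width, num_bins - 1)
        histogram[index] += 1
    return histogram

# Representative values found in one unit region of the map.
example_values = [0, 24, 25, 49, 50, 178, 255]
example_histogram = local_histogram(example_values)
```

Here the values 0 and 24 share the first bin, 25 and 49 share the second, 178 falls in the eighth bin (175-199), and 255 is clamped into the tenth.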

[0104] Subsequently, the processor (210) can sequentially concatenate the local histograms for each unit region to generate a histogram for the entire region of interest. According to one embodiment, the X-axis and Y-axis of the histogram of FIG. 8, in which local histograms are combined, may represent the index of each dimension of the feature vector corresponding to one region of interest (600) and the normalized frequency corresponding to each dimension, respectively. For example, referring to FIG. 8, if the representative value map (700) is divided into 16 unit regions and the local histogram for each unit region has 10 bins, the X-axis of the final histogram will have a total of 160 bins. Here, the 0-9 interval of the X-axis may represent the local histogram of the first unit region, and the 10-19 interval may represent the local histogram of the second unit region. This means that the feature vector consists of a total of 160 dimensions, and the value corresponding to each dimension can be the Y-axis value corresponding to each bin in the histogram.
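
The concatenation step can be sketched as follows. The text refers to a normalized frequency without saying how it is normalized; this example assumes each local histogram is divided by its own total count. `concat_histograms` is a hypothetical name.

```python
def concat_histograms(local_histograms):
    """Concatenate per-unit-region histograms into one feature vector.

    Each local histogram is normalized by its own total (assumed), so
    the slice [10 * k, 10 * k + 9] describes unit region k when every
    histogram has 10 bins.
    """
    vector = []
    for histogram in local_histograms:
        total = sum(histogram)
        vector.extend(count / total if total else 0.0 for count in histogram)
    return vector

# 16 unit regions with 10 bins each; the second region is concentrated in bin 0.
example_histograms = [[1] * 10 for _ in range(16)]
example_histograms[1] = [10] + [0] * 9
feature_vector = concat_histograms(example_histograms)  # 160 dimensions
```

Because the position of each value in the vector encodes which unit region it came from, the vector carries the spatial information described in the following paragraph.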

[0105] According to one embodiment, the final histogram of FIG. 8 generated through the above process can indicate not only what unique pattern (representative value) exists in the region of interest (600) (frequency on the Y-axis), but also spatial information regarding where the pattern is distributed (interval on the X-axis). Therefore, this can be used as a powerful identifier to uniquely identify the target object.

[0106] FIG. 9 is a diagram illustrating a process for generating a feature set according to one embodiment of the present disclosure. FIG. 9 is an example of a final feature vector generated from the histogram of FIG. 8.

[0107] The processor (210) can compress the histogram of FIG. 8 to generate a final feature vector with fewer bins, as shown in FIG. 9. Through this, the processor (210) can streamline computation and improve the robustness of the features. This compression or dimensionality reduction process can be implemented in various ways. For example, the processor (210) can group 160 bins into 10 groups and calculate the average or sum of the 16 bin values in each group to determine the final 10 bin values.
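
The text does not say how the 160 bins are assigned to the 10 groups. One reading, assumed in this sketch, is that each group collects the same bin index from all 16 unit regions, which gives exactly 16 values per group. Grouping 16 consecutive bins would be an alternative reading. `compress` is a hypothetical name.

```python
def compress(vector, num_bins=10, reduce="mean"):
    """Reduce a concatenated histogram to `num_bins` values.

    Group k holds bin k of every unit region (vector[k], vector[k + 10],
    ...), and is reduced by mean or sum.
    """
    groups = [vector[k::num_bins] for k in range(num_bins)]
    if reduce == "mean":
        return [sum(group) / len(group) for group in groups]
    if reduce == "sum":
        return [sum(group) for group in groups]
    raise ValueError("reduce must be 'mean' or 'sum'")

# Toy 160-dimensional vector in which bin k of every unit region equals k.
example_vector = [float(i % 10) for i in range(160)]
compressed_mean = compress(example_vector)               # 10 values
compressed_sum = compress(example_vector, reduce="sum")  # 10 values
```

Under this reading the compressed vector trades the spatial layout for a shorter, more noise-tolerant description of which representative values occur in the region of interest.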

[0108] FIG. 10 is an example of a first feature set according to one embodiment of the present disclosure. Referring to FIG. 10, the feature set (1000) may be configured in the form of a matrix, wherein each column may correspond to each of a plurality of predefined regions of interest (ROI) for a target object. For example, if nine regions of interest (510-590) are defined as in FIG. 5, the feature set (1000) may have nine columns (n=9). Each column may represent a feature vector extracted from the corresponding region of interest through the process of FIG. 6 to FIG. 9. Each row of the feature set (1000) corresponds to each dimension of the feature vector. For example, if the feature vector consists of 160 dimensions, the feature set (1000) may have 160 rows (m=160), and each row may represent the value of the corresponding dimension.
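
The matrix layout of the feature set can be sketched with plain lists: one feature vector per region of interest is supplied, and the result is stored row by row so that rows index dimensions and columns index regions of interest. `build_feature_set` is a hypothetical name.

```python
def build_feature_set(roi_vectors):
    """Arrange n feature vectors of length m as an m x n matrix.

    Column j is the feature vector of region of interest j; row i holds
    dimension i of every vector.
    """
    length = len(roi_vectors[0])
    if any(len(vector) != length for vector in roi_vectors):
        raise ValueError("all feature vectors must have the same length")
    return [list(row) for row in zip(*roi_vectors)]

# Nine regions of interest, each described by a 160-dimensional vector.
example_vectors = [[float(j * 1000 + i) for i in range(160)] for j in range(9)]
feature_set = build_feature_set(example_vectors)  # 160 rows, 9 columns
```

With n=9 and m=160, `feature_set[i][j]` is dimension i of the vector for the j-th region of interest.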

[0109] FIG. 11 illustrates feature sets stored in one or more memories according to one embodiment of the present disclosure. Referring to FIG. 11, the memory (220) may store a unique feature set corresponding to each of a plurality of target objects. According to one embodiment, each feature set stored in the memory (220) may be a reference feature set that was previously measured for the corresponding target object. For example, the feature set for a first target object stored in the memory (220) may be one that was acquired and stored when the first target object was last inspected. According to another embodiment, the feature set stored in the memory (220) may be a standard template for the type of the corresponding target object. For example, the feature set for a first target object may be a template representing the standard features of 'Model A scissors'.

[0110] After obtaining a current feature set from the target object currently being inspected, the processor (210) performs a similarity comparison with a plurality of reference feature sets stored in memory (220). Through this similarity comparison, the processor (210) determines the feature set most similar to the feature set of the object currently being inspected among the reference feature sets stored in memory (220), and further determines whether to update the reference feature set of the identified target object.

[0111] FIG. 12 is a diagram illustrating the updating of a feature vector according to one embodiment of the present disclosure. According to one embodiment, a processor (210) can generate a feature similarity map (1200) that visualizes the result of comparing the similarity between a current feature set of a target object and a reference feature set stored in memory (220).

[0112] The processor (210) can generate a current feature set of the target object through the process of FIGS. 4 to 10 and compare it with a plurality of feature sets stored in memory (220) to determine a reference feature set with the highest overall similarity. For example, for each region of interest, the processor (210) can calculate an individual similarity between the feature vector in the current feature set and the corresponding feature vector in one of the plurality of feature sets stored in memory (220). The individual similarity can be calculated using, for example, cosine similarity or Euclidean distance.
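
The two named measures can be sketched in a few lines. In this example a feature set is represented as a list of per-region feature vectors, and the helper names are hypothetical. Returning 0.0 for a zero-length vector is a choice made for the sketch.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def individual_similarities(current_set, reference_set):
    """One cosine similarity per region of interest."""
    return [cosine_similarity(c, r) for c, r in zip(current_set, reference_set)]

example_current = [[1.0, 0.0], [1.0, 1.0]]
example_reference = [[1.0, 0.0], [1.0, -1.0]]
example_scores = individual_similarities(example_current, example_reference)
```

Note that cosine similarity grows with resemblance while Euclidean distance shrinks, so a distance would need to be converted (for example by negation or normalization) before being thresholded like a similarity.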

[0113] Subsequently, the processor (210) may calculate the total similarity of the feature set by summing the n individual similarities or computing their weighted sum. The processor (210) may calculate the total similarity for all feature sets stored in memory (220) and determine the one feature set with the highest total similarity as the reference feature set. The reference feature set determined through this process may be a value previously measured for the corresponding target object, or a feature set template for the type of the target object.

[0114] According to one embodiment, the processor (210) can identify a reference set of features based on the similarity of the feature vector corresponding to the remaining undamaged area, even if the similarity of the feature vector corresponding to a specific cell (1210) within the similarity map (1200) is calculated to be low. For example, in the step of calculating the total similarity, the processor (210) may calculate the total similarity by excluding from the calculation cells (1210) having individual similarity values below a predetermined threshold (e.g., 30%), or by applying a pre-assigned weight to the individual similarity according to the importance or expected frequency of damage of each area of interest.
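
Exclusion of low-similarity cells and per-region weighting can be combined in one scoring function. This sketch uses a weighted average of the remaining cells rather than a raw sum, so that excluding a cell does not by itself lower the score; that choice, and the names `total_similarity` and `most_similar`, are assumptions.

```python
def total_similarity(similarities, weights=None, exclude_below=None):
    """Weighted average of individual similarities.

    Cells with similarity below `exclude_below` are left out entirely.
    Returns 0.0 when nothing remains.
    """
    if weights is None:
        weights = [1.0] * len(similarities)
    kept = [(s, w) for s, w in zip(similarities, weights)
            if exclude_below is None or s >= exclude_below]
    weight_sum = sum(w for _, w in kept)
    if weight_sum == 0:
        return 0.0
    return sum(s * w for s, w in kept) / weight_sum

def most_similar(similarities_by_object, **options):
    """Return the stored object whose feature set scores highest."""
    return max(similarities_by_object,
               key=lambda name: total_similarity(similarities_by_object[name], **options))

# Object A matches well except for one damaged region; object B is mediocre everywhere.
example_scores = {
    "A": [0.9, 0.8, 0.1],
    "B": [0.7, 0.7, 0.7],
}
match_plain = most_similar(example_scores)
match_robust = most_similar(example_scores, exclude_below=0.3)
```

In this toy case the plain average favors B (0.7 against 0.6), while excluding the damaged cell lets A win (0.85), which is the behavior the paragraph describes for partially damaged objects.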

[0115] After determining a reference feature set, the processor (210) may refer back to the calculated individual similarity to determine at least some of the feature vectors included in the current feature set as vectors to be changed. For example, the processor (210) may identify an area where the individual similarity is less than a preset update threshold (e.g., 50%) as an update target and change the value of the feature vector of that area in the reference feature set stored in memory (220) to the value of the feature vector included in the current feature set.
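
The threshold-based replacement can be sketched as follows, again treating a feature set as a list of per-region vectors. `update_reference` is a hypothetical name, and the function returns a new reference set instead of modifying the stored one in place, which is a choice of this sketch.

```python
def update_reference(reference_set, current_set, similarities, update_threshold=0.5):
    """Replace reference vectors whose individual similarity fell below the threshold.

    Returns (updated_reference_set, indices_of_changed_regions).
    """
    updated = [list(vector) for vector in reference_set]
    changed = []
    for index, similarity in enumerate(similarities):
        if similarity < update_threshold:
            updated[index] = list(current_set[index])
            changed.append(index)
    return updated, changed

example_reference = [[1, 1], [2, 2], [3, 3]]
example_current = [[9, 9], [8, 8], [7, 7]]
example_similarities = [0.9, 0.4, 0.6]
new_reference, changed_regions = update_reference(
    example_reference, example_current, example_similarities
)
```

Only the second region (similarity 0.4) is overwritten; the other reference vectors are kept as stored.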

[0116] According to one embodiment, the processor (210) can change the value of the vector to be changed to a moving average value or a weighted average value. For example, the processor (210) can update the value of the vector to be changed by giving 70% weight to the feature vector included in the existing reference feature set and 30% weight to the feature vector included in the current feature set. By doing so, the processor (210) can minimize the influence of transient noise by accumulating and reflecting recent measurements.
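
The 70%/30% example corresponds to a simple element-wise weighted average. In the sketch below, `blend` and `current_weight` are hypothetical names.

```python
def blend(reference_vector, current_vector, current_weight=0.3):
    """Weighted average of the stored and newly measured feature vectors.

    The stored vector receives weight (1 - current_weight).
    """
    return [
        (1.0 - current_weight) * ref + current_weight * cur
        for ref, cur in zip(reference_vector, current_vector)
    ]

example_reference = [10.0, 0.0]
example_current = [0.0, 10.0]
blended = blend(example_reference, example_current)  # approximately [7.0, 3.0]
```

Applying the same blend at every inspection behaves like an exponential moving average, so a single noisy measurement shifts the stored vector by at most 30% of the difference.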

[0117] According to one embodiment, the processor (210) may determine the weight to be assigned to the current feature set according to the degree of decline in individual similarity, that is, how far the similarity of the current feature set relative to the reference feature set has fallen for the vector to be changed. According to one embodiment, the processor (210) may assign a greater weight to the current feature set as the degree of decline increases. For example, if the similarity of the current feature set relative to the reference feature set for the vector to be changed falls within a first range (e.g., less than 50%), the processor (210) may determine the value of the vector to be changed as the value of the feature vector included in the current feature set (weight 100%), or set the weight of the current feature set very high. On the other hand, if the similarity falls within a second range (e.g., 50% or more and less than 70%), the processor (210) may set the weight of the current feature set relatively low. In this case, the processor (210) may attribute the decrease in similarity to natural aging of the target object and either skip the update or apply only a very low weight (e.g., 5%) to gradually reflect the long-term wear condition.
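
The similarity-dependent weighting can be sketched as a small step function using the example ranges from this paragraph. The behavior for similarities of 70% or more is not stated; this sketch assumes no update in that case. `current_weight` is a hypothetical name.

```python
def current_weight(similarity):
    """Weight given to the newly measured vector for one region of interest.

    similarity < 0.5          -> 1.0  (replace the stored vector outright)
    0.5 <= similarity < 0.7   -> 0.05 (slowly track natural wear)
    similarity >= 0.7         -> 0.0  (assumed: leave the stored vector alone)
    """
    if similarity < 0.5:
        return 1.0
    if similarity < 0.7:
        return 0.05
    return 0.0

example_weights = [current_weight(s) for s in (0.4, 0.6, 0.9)]
```

The returned weight could then be used as the blending factor between the stored vector and the current vector when the reference feature set is updated.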

[0118] The similarity map (1200) of FIG. 12 visualizes individual similarity scores. For example, a cell may be shown in light gray if the similarity is high (i.e., the two vectors match) and in a dark color if the similarity is low (i.e., the two vectors do not match). Referring to FIG. 12, for the areas corresponding to most cells of the similarity map (1200), the currently measured feature vector is similar to the feature vector included in the reference feature set, but the area corresponding to a specific cell (1210) may have a low individual similarity. This discrepancy may suggest that the surface tool marks of the area corresponding to the specific cell (1210) have been physically altered during use, for example by scratches, dents, or severe contamination, since the feature vector of the target object was last stored in memory (220).

[0119] FIG. 13 is a flowchart of a method for generating a two-dimensional image of a target object according to one embodiment of the present disclosure.

[0120] The processor (210) can acquire a three-dimensional image of at least a portion of a plurality of target objects from the first camera (S1310). This means that the first camera (120) captures the entire area of the workbench (160) to acquire a three-dimensional image, as illustrated in FIGS. 3 and 4. According to one embodiment, the processor (210) can extract 3D pose information from the three-dimensional image, including at least one of the position (X, Y) and height (Z) and tilt information of each object in three-dimensional space.

[0121] The processor (210) can identify a first target object among a plurality of target objects based on a three-dimensional image (S1320). According to one embodiment, the processor (210) can separate objects that overlap each other on a tray into individual objects based on 3D pose information and identify a first target object among them.

[0122] The processor (210) can determine the type of the first target object (S1330). For example, the processor (210) can determine the type of the identified first target object (e.g., scissors, tweezers, etc.) using a trained artificial neural network (e.g., CNN). The training data of the artificial neural network may include both 3D images and 2D images.

[0123] The processor (210) can determine a plurality of predefined regions of interest for the first target object based on the type of the first target object (S1340). According to one embodiment, as the type of the first target object is determined, the processor (210) can obtain a region of interest template corresponding to that type. The processor (210) can determine a plurality of predefined regions of interest for the first target object based on the region of interest template. For example, if the type of the first target object is determined to be 'scissors', the processor (210) can obtain a template corresponding to scissors among the plurality of region of interest templates stored in memory (220) and determine the region of interest of the first target object.
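The template lookup can be pictured as a dictionary keyed by object type. The template contents, the `(x, y, width, height)` layout, and the function name below are hypothetical. The sketch also ignores object rotation, which a real system would have to account for.

```python
# Hypothetical ROI templates: rectangles (x, y, width, height) relative to
# the object's reference point. The values are made up for the example.
ROI_TEMPLATES = {
    "scissors": [(0, 0, 40, 20), (60, 10, 30, 30)],
    "tweezers": [(5, 0, 20, 10)],
}


def regions_of_interest(object_type, origin):
    """Place the template ROIs for object_type at the object's position."""
    ox, oy = origin
    template = ROI_TEMPLATES[object_type]  # KeyError for an unknown type
    return [(ox + x, oy + y, w, h) for x, y, w, h in template]


rois = regions_of_interest("scissors", origin=(100, 50))
```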

[0124] The processor (210) can control the second camera (130) to generate and acquire two-dimensional images of multiple regions of interest (S1350). Based on the location of the target object identified from the three-dimensional image and the coordinates of the regions of interest determined from the region of interest template, the processor (210) can drive the XY-axis movement unit (140, 150) to move the second camera (130) to a position for photographing the corresponding region of interest. According to one embodiment, the processor (210) can control the second camera (130) to exclude regions of interest obscured by other objects from the shooting target based on the three-dimensional image, and selectively photograph only the identifiable regions of interest.

[0125] The processor (210) can generate a corrected 2D image by performing coordinate transformation on a 2D image of multiple regions of interest based on a 3D image (S1360). The processor (210) can extract tilt information of the target object from the acquired 3D image and mathematically correct the pixel position of the 2D image based on this tilt information.
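The text does not specify the coordinate transformation. As one possible illustration, the sketch below undoes the foreshortening of a tilted surface under strong simplifying assumptions: a camera looking straight down with parallel projection, and the two tilt axes treated independently. The function and parameter names are hypothetical.

```python
import math


def correct_pixel(x, y, pivot, tilt_about_y_deg, tilt_about_x_deg=0.0):
    """Undo foreshortening of a pixel position on a tilted surface.

    Assumes a top-down camera with parallel (orthographic) projection and
    treats the two tilt axes independently, which is only an approximation
    when both tilts are non-zero. Tilts close to 90 degrees are not handled.
    """
    px, py = pivot
    # A tilt about the image y-axis compresses distances along x by cos(tilt).
    scale_x = math.cos(math.radians(tilt_about_y_deg))
    # A tilt about the image x-axis compresses distances along y likewise.
    scale_y = math.cos(math.radians(tilt_about_x_deg))
    return (px + (x - px) / scale_x, py + (y - py) / scale_y)


# A point seen 10 px from the pivot on a surface tilted by 60 degrees lies
# about 20 px from the pivot along the surface.
corrected = correct_pixel(10, 4, pivot=(0, 0), tilt_about_y_deg=60.0)
```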

[0126] FIG. 14 is a flowchart of a method for identifying a target object according to one embodiment of the present disclosure. According to one embodiment, the method described in FIG. 14 may be part of an operation (S1320) for identifying a first target object.

[0127] The processor (210) can determine one or more regions of interest that are identifiable on a three-dimensional image among a plurality of regions of interest (S1410). Based on the three-dimensional image, the processor (210) can exclude regions of interest that are obscured by other objects and cannot be photographed or are meaningless, and select only regions of interest where the surface is exposed and a valid two-dimensional image can be obtained as identifiable regions.

[0128] The processor (210) can control the second camera (130) to acquire a two-dimensional image of one or more identifiable regions of interest (S1420). The processor (210) can control the second camera (130) so that no shooting operation is performed on regions of interest that are identified as already obscured in the three-dimensional image. By doing so, the processor (210) can save unnecessary shooting and computational resources.
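One way to decide which regions of interest are identifiable is to check the height map inside each region for material lying above the object's own surface. The sketch below assumes a height map stored as a list of rows, ROIs given as `(x, y, width, height)` in height-map pixels, and a known top height for the object; all of these, and the tolerance value, are assumptions for illustration.

```python
def is_occluded(height_map, roi, object_top, tolerance=1.0):
    """True if anything inside roi rises above the object's own surface."""
    x, y, w, h = roi
    for row in height_map[y:y + h]:
        for z in row[x:x + w]:
            if z > object_top + tolerance:
                return True
    return False


def identifiable_rois(height_map, rois, object_top, tolerance=1.0):
    """Keep only ROIs whose surface is exposed; the rest are not shot."""
    return [roi for roi in rois
            if not is_occluded(height_map, roi, object_top, tolerance)]


# The right half of the object is covered by something 9 units high.
height_map = [[2, 2, 9, 9],
              [2, 2, 9, 9]]
rois = [(0, 0, 2, 2), (2, 0, 2, 2)]
to_shoot = identifiable_rois(height_map, rois, object_top=2)
```

Only the first region remains in `to_shoot`, so the second camera would not be moved to the covered region.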

[0129] FIG. 15 is a flowchart of a method for identifying a target object according to one embodiment of the present disclosure. According to one embodiment, the method described in FIG. 15 may be part of an operation (S1320) for identifying a first target object.

[0130] The processor (210) can extract height information indicating the height of each of the multiple target objects included in the three-dimensional image (S1510). The three-dimensional image may be a depth map or a point cloud obtained from the first camera (120), where the height information may refer to the Z-axis coordinate value of each pixel or point relative to the surface of the workbench (160). The processor (210) can analyze the three-dimensional image to extract height information of all objects placed on the workbench (160).

[0131] The processor (210) can identify a first target object by separating overlapping target objects among a plurality of target objects based on height information (S1520). The processor (210) can determine a point of abrupt change in height as a boundary between different objects based on height information extracted from a three-dimensional image. Through this, even if one object is physically overlapping on another object, the processor (210) can accurately separate each object into independent entities to identify the first target object.
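Separation at abrupt height changes can be sketched as a flood fill over the height map that refuses to cross large height steps. The thresholds, the 4-neighbor connectivity, and the function name are assumptions; the text only states that a point of abrupt change in height is treated as a boundary between objects.

```python
from collections import deque


def segment_by_height(height_map, min_height=0.5, max_step=1.0):
    """Label connected regions, splitting where the height jumps abruptly.

    Pixels lower than min_height are treated as the workbench (label 0).
    Two neighboring pixels join the same object only if their heights
    differ by at most max_step.
    """
    rows, cols = len(height_map), len(height_map[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] or height_map[r0][c0] < min_height:
                continue
            count += 1
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if labels[nr][nc] or height_map[nr][nc] < min_height:
                        continue
                    if abs(height_map[nr][nc] - height_map[r][c]) > max_step:
                        continue  # abrupt change in height: object boundary
                    labels[nr][nc] = count
                    queue.append((nr, nc))
    return labels, count


# One object of height 2 next to (or under) another of height 5.
labels, count = segment_by_height([[0, 2, 2, 5, 5],
                                   [0, 2, 2, 5, 5]])
```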

[0132] FIG. 16 is a flowchart of a method for recognizing a target object according to one embodiment of the present disclosure.

[0133] The processor (210) can acquire an image of a target object (S1610). The processor (210) can control the first camera (120) to acquire a three-dimensional image of the entire area of the workbench (160). By analyzing this three-dimensional image, the processor (210) can identify the exact location, type, overlap status, and 3D pose information of each of the multiple target objects placed on the tray. Based on this identified information, the processor (210) can control the position of the second camera (130) to acquire high-resolution two-dimensional images of multiple regions of interest of the object. According to one embodiment, prior to performing the feature extraction step of S1620, the processor (210) may further perform a step of generating a corrected two-dimensional image by performing a coordinate transformation on the two-dimensional image based on the tilt information extracted from the three-dimensional image.

[0134] The processor (210) can extract a first set of features from multiple image regions on an image corresponding to each of a plurality of predefined regions of interest for a target object (S1620). The processor (210) can divide each region of interest (600) of a two-dimensional image into multiple sub-regions (610) and calculate a representative value for each sub-region by comparing relative brightness values between pixels. The processor (210) can reconstruct the representative values according to two-dimensional spatial locations to generate a representative value map (700) and then divide it again into multiple unit regions. The processor (210) can calculate a local histogram representing the frequency of occurrence of representative values for each unit region and sequentially connect these local histograms to generate a feature vector containing spatial information. Alternatively, it can compress this again to generate a low-dimensional feature vector. The processor (210) can repeat this process for all regions of interest and combine the acquired n feature vectors into a single data matrix to generate a first set of features for the target object.

[0135] The processor (210) can determine a second feature set among a plurality of feature sets for a target object stored in one or more memories based on a first feature set (S1630). The processor (210) compares the first feature set with each feature set stored in memory (220) and, in this process, can calculate an individual similarity for each feature vector included in the first feature set. Subsequently, the total similarity between the first feature set and the stored feature set can be calculated by summing the individual similarities or by combining them with weights. At this time, regions of interest that are not identified due to overlap, etc., or regions of interest that are severely damaged, can be excluded from the total similarity calculation so that identification remains robust. The processor (210) can repeat this process for all stored feature sets and determine the feature set with the highest total similarity score as the second feature set (reference feature set).
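The aggregation and selection in S1630 might look like the sketch below. It assumes the individual similarities have already been computed, uses `None` to mark an excluded region of interest, and divides by the sum of the remaining weights so that exclusions do not lower the score. The normalization, the names, and the stored-object identifiers are assumptions, not details from the text.

```python
def overall_similarity(individual, weights=None):
    """Combine per-ROI similarities; None marks an excluded ROI."""
    if weights is None:
        weights = [1.0] * len(individual)
    kept = [(s, w) for s, w in zip(individual, weights) if s is not None]
    total_weight = sum(w for _, w in kept)
    if total_weight == 0:
        return 0.0  # nothing usable to compare
    return sum(s * w for s, w in kept) / total_weight


def best_match(individual_by_id, weights=None):
    """Return the stored-set identifier with the highest overall score."""
    scores = {key: overall_similarity(sims, weights)
              for key, sims in individual_by_id.items()}
    return max(scores, key=scores.get), scores


# Hypothetical stored sets; the second ROI was hidden and is excluded.
best, scores = best_match({
    "tool-A": [0.8, None, 0.6],
    "tool-B": [0.3, None, 0.4],
})
```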

[0136] The processor (210) can determine one or more change target vectors among a plurality of second feature vectors included in the second feature set based on the first feature set and the second feature set (S1640). The processor (210) can determine that among the feature vectors included in the second feature set, a vector whose individual similarity with a corresponding vector included in the first feature set is less than a preset threshold (e.g., 50%) is a change target vector that has suffered physical damage or wear.

[0137] The processor (210) can update the value of one or more change target vectors with one or more first feature vectors corresponding to one or more change target vectors among a plurality of first feature vectors (S1650). According to one embodiment, in order to minimize the influence of transient measurement errors (noise), the processor (210) may perform an update based on a moving average or a weighted average value. According to one embodiment, the processor (210) may perform an update while dynamically changing the weight between the feature vector of the first feature set and the feature vector of the second feature set based on the value of individual similarity.
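Steps S1640 and S1650 can be sketched together: vectors whose individual similarity falls below the threshold are selected and then blended with the newly measured vectors. The fixed blending factor `alpha`, the use of `None` for excluded regions, and the function name are assumptions; the 50% threshold is the example from the text.

```python
def update_reference(reference_set, current_set, similarities,
                     threshold=0.5, alpha=0.5):
    """Blend change target vectors toward the current measurement.

    A stored vector is a change target when its individual similarity is
    below threshold. alpha is the weight of the current vector in the
    weighted average; alpha=1.0 would overwrite the stored vector.
    """
    updated = []
    for ref, cur, sim in zip(reference_set, current_set, similarities):
        if sim is not None and sim < threshold:
            ref = [(1.0 - alpha) * r + alpha * c for r, c in zip(ref, cur)]
        updated.append(list(ref))
    return updated


stored = [[1.0, 1.0], [2.0, 2.0]]
measured = [[3.0, 3.0], [4.0, 4.0]]
new_stored = update_reference(stored, measured, similarities=[0.9, 0.2])
```

Only the second vector is below the threshold here, so `new_stored` is `[[1.0, 1.0], [3.0, 3.0]]`. Averaging rather than overwriting is what damps a single noisy measurement.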

[0138] FIG. 17 is a flowchart of a method for extracting a first set of features according to one embodiment of the present disclosure. According to one embodiment, the method described below may be part of an operation (S1620) for extracting a first set of features.

[0139] The processor (210) can obtain the brightness value of each of the plurality of pixels included in the first region on the image corresponding to the first region of interest (S1710). For example, if the image is an 8-bit grayscale, each pixel may have a single brightness value between 0 (black) and 255 (white), and if the image is an RGB color scale, the processor (210) may convert each channel value into a single luminance value and use it for subsequent operations.
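The text does not say how the color channels are combined. A common choice, used here purely as an assumption, is the ITU-R BT.601 luma weighting:

```python
def luminance(r, g, b):
    """Convert 8-bit RGB to a single 8-bit brightness value.

    The 0.299 / 0.587 / 0.114 weights are the ITU-R BT.601 luma
    coefficients, chosen here as an assumption.
    """
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_grayscale(rgb_image):
    """Apply luminance() to every pixel of a list-of-rows RGB image."""
    return [[luminance(*pixel) for pixel in row] for row in rgb_image]


gray = to_grayscale([[(0, 0, 0), (255, 255, 255)],
                     [(255, 0, 0), (0, 255, 0)]])
```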

[0140] The processor (210) can divide the first region into a plurality of sub-regions (S1720). This may mean dividing the first region into small blocks such as 3x3 pixels, and the sub-regions may or may not overlap with one another.

[0141] The processor (210) can determine a representative value for each of a plurality of sub-regions based on the brightness value of each of a plurality of pixels (S1730). According to one embodiment, the processor (210) can determine a representative value based on the relative brightness relationship between pixels. For example, the processor (210) can determine a representative value as the average or median value of the brightness values of pixels within a sub-region. As another example, the processor (210) can generate a binary code by comparing the brightness value of the center pixel of a sub-region with those of the surrounding pixels, convert it into a decimal value, and determine it as a representative value.
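The binary-code variant resembles the widely used local binary pattern (LBP) operator, although the text does not name it. The sketch below assumes overlapping 3x3 sub-regions, a clockwise neighbor order starting at the top-left, and a "greater than or equal" comparison; each of these details is an assumption.

```python
def binary_code_value(block):
    """Representative value (0..255) of a 3x3 block of brightness values."""
    center = block[1][1]
    # Clockwise from the top-left pixel; the ordering is an assumption.
    neighbours = [block[0][0], block[0][1], block[0][2], block[1][2],
                  block[2][2], block[2][1], block[2][0], block[1][0]]
    value = 0
    for n in neighbours:
        # Append one bit per neighbour: 1 if it is at least as bright.
        value = (value << 1) | (1 if n >= center else 0)
    return value


def representative_value_map(image):
    """Slide an overlapping 3x3 window over the image (borders skipped)."""
    rows, cols = len(image), len(image[0])
    return [[binary_code_value([image[r + dr][c - 1:c + 2]
                                for dr in (-1, 0, 1)])
             for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


# Bright corners, dark edges: bits 1,0,1,0,1,0,1,0 = 0b10101010 = 170.
value_map = representative_value_map([[9, 1, 9],
                                      [1, 5, 1],
                                      [9, 1, 9]])
```

Because only the ordering of brightness values matters, a uniform change in illumination leaves the code unchanged.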

[0142] The processor (210) can generate a first feature vector of a first region of interest based on the frequency of occurrence of each representative value of a plurality of sub-regions (S1740). The processor (210) can divide the representative value map (700) again into a plurality of unit regions (e.g., 16), calculate a local histogram for each unit region, and then sequentially connect these local histograms to generate a feature vector.
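Concatenating local histograms can be sketched as follows. The 4x4 grid matches the example of 16 unit regions; the number of histogram bins, the assumption that representative values lie in 0..255, and the names are illustrative choices. Rows and columns left over when the map size is not divisible by the grid are ignored in this sketch.

```python
def local_histogram_vector(value_map, grid=4, bins=8, max_value=256):
    """Concatenate per-unit-region histograms of representative values.

    The map is split into grid x grid unit regions. Each region gets a
    histogram with `bins` equal-width bins over 0..max_value-1, and the
    histograms are joined in row-major order so that position within the
    region of interest is preserved in the vector.
    """
    rows, cols = len(value_map), len(value_map[0])
    cell_h, cell_w = rows // grid, cols // grid
    feature = []
    for gr in range(grid):
        for gc in range(grid):
            hist = [0] * bins
            for r in range(gr * cell_h, (gr + 1) * cell_h):
                for c in range(gc * cell_w, (gc + 1) * cell_w):
                    hist[value_map[r][c] * bins // max_value] += 1
            feature.extend(hist)
    return feature


# Tiny 2x2 map, 2x2 grid, 4 bins: one value per unit region.
vector = local_histogram_vector([[0, 255],
                                 [64, 128]], grid=2, bins=4)
```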

[0143] FIG. 18 is a flowchart of a method for determining a second set of features according to one embodiment of the present disclosure. According to one embodiment, the method described below may be part of an operation (S1630) for determining a second set of features.

[0144] The processor (210) can determine the overall similarity with the first feature set for each of the multiple feature sets stored in memory (S1810). To do so, the processor (210) can first determine individual similarities between the first feature set and each feature set in memory (220), and then determine the overall similarity based on those individual similarities. Individual similarity may refer to the similarity between each feature vector of the first feature set and the corresponding feature vector included in the feature set in memory. Individual similarity may be calculated, for example, using cosine similarity or Euclidean distance. The processor (210) can determine the feature set with the highest overall similarity among the multiple feature sets stored in memory as the second feature set (S1820).
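The two measures named above can be written in a few lines. Cosine similarity is the standard definition. Turning a Euclidean distance into a similarity via 1 / (1 + d) is one common convention and is an assumption here, since the text does not say how the distance is converted.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_similarity(a, b):
    """Map Euclidean distance d to (0, 1] as 1 / (1 + d); an assumption."""
    return 1.0 / (1.0 + math.dist(a, b))


def individual_similarities(first_set, stored_set, measure=cosine_similarity):
    """Compare each first feature vector with its stored counterpart."""
    return [measure(a, b) for a, b in zip(first_set, stored_set)]


sims = individual_similarities([[1, 0], [1, 0]], [[1, 0], [0, 1]])
```

For histogram-based vectors, whose entries are non-negative, cosine similarity stays within 0 and 1, which makes percentage thresholds such as 50% straightforward to apply.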

[0145] Although the steps of the method or algorithm according to the present disclosure are described in a sequential order in the flowcharts illustrated in the present disclosure, the steps may, in addition to being performed sequentially, be performed in any order in which they can be combined according to the present disclosure. The description according to the flowcharts does not exclude changes or modifications to the method or algorithm and does not imply that any step is essential or desirable. In one embodiment, at least some of the steps may be performed in parallel, iteratively, or heuristically. In one embodiment, at least some of the steps may be omitted or other steps may be added.

[0146] Various embodiments of the present disclosure may be implemented as software on a machine-readable storage medium. The software may be software for implementing various embodiments of the present disclosure. The software may be inferred from the various embodiments of the present disclosure by programmers skilled in the art to which the present disclosure pertains. For example, the software may be a program containing machine-readable instructions (e.g., code or code segments). The machine may be a device capable of operating according to instructions called from the storage medium, for example, a computer. In one embodiment, the machine may be a device (110) according to the embodiments of the present disclosure. In one embodiment, the processor of the machine may execute the called instruction to cause the components of the machine to perform a function corresponding to the instruction. In one embodiment, the processor may be a processor (210) according to the embodiments of the present disclosure. The storage medium may mean any type of recording medium that stores data readable by the machine. The storage medium may include, for example, ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc. In one embodiment, the storage medium may be a memory (220). In one embodiment, the storage medium may be implemented in a distributed form in a computer system connected by a network, etc. Software may be stored and executed in a distributed form in a computer system, etc. The storage medium may be a non-transitory storage medium. A non-transitory storage medium refers to a tangible medium that exists regardless of whether data is stored semi-permanently or temporarily, and does not include a transitory propagating signal.

[0147] Although the technical concept of the present disclosure has been described by various embodiments above, the technical concept of the present disclosure includes various substitutions, modifications, and alterations that can be made within the scope of understanding of those skilled in the art to which the present disclosure pertains. Furthermore, it should be understood that such substitutions, modifications, and alterations may be included within the scope of the appended claims.

Claims

1. A method performed in a device comprising one or more processors and one or more memories storing instructions to be executed by the one or more processors, the method comprising, by the one or more processors: acquiring an image of a target object; extracting a first feature set from a plurality of image regions on the image corresponding to each of a plurality of predefined regions of interest (ROI) for the target object, wherein the first feature set includes a plurality of first feature vectors for each of the plurality of image regions; determining, for each of a plurality of feature sets stored in the one or more memories, an overall similarity with the first feature set; and determining the feature set with the highest overall similarity among the plurality of feature sets stored in the one or more memories as a second feature set.

2. The method of claim 1, further comprising: determining, based on the first feature set and the second feature set, one or more change target vectors among a plurality of second feature vectors included in the second feature set; and updating the change target vector in the second feature set with the value of the first feature vector corresponding to the change target vector in the first feature set.

3. The method of claim 1, wherein extracting the first feature set comprises: obtaining a brightness value of each of a plurality of pixels included in a first region on the image corresponding to a first region of interest; dividing the first region into a plurality of sub-regions; determining a representative value for each of the plurality of sub-regions based on the brightness value of each of the plurality of pixels; and generating a first feature vector of the first region of interest based on the frequency of occurrence of each representative value of the plurality of sub-regions.

4. The method of claim 3, wherein determining the representative value for each of the plurality of sub-regions comprises: determining a representative pixel of a first sub-region; comparing brightness values between the representative pixel and pixels adjacent to the representative pixel; and determining a representative value of the first sub-region based on a result of the brightness comparison.

5. The method of claim 3, wherein determining the representative value for each of the plurality of sub-regions comprises determining a representative value of a first sub-region based on at least one of an average brightness value, a median brightness value, or a variance value of the brightness values of a plurality of pixels included in the first sub-region.

6. The method of claim 3, wherein generating the first feature vector of the first region of interest comprises: generating a first frequency distribution based on the frequency of occurrence of each representative value of the plurality of sub-regions; dividing the first frequency distribution into a predetermined number of groups and calculating the sum of frequency values included in each of the groups; and generating the first feature vector of the first region of interest based on the sum of frequency values included in each of the groups.

7. The method of claim 1, wherein extracting the first feature set comprises: extracting a plurality of feature sets including a plurality of first feature vectors from a region on the image corresponding to each of the plurality of regions of interest; and determining the first feature set based on an average value of the extracted plurality of feature sets.

8. The method of claim 1, wherein extracting the first feature set comprises: extracting a plurality of feature sets including a plurality of first feature vectors from a region on the image corresponding to each of the plurality of regions of interest; and determining the first feature set based on a moving average value of the extracted plurality of feature sets.

9. The method of claim 1, wherein the plurality of regions of interest are tool marks or unique pattern regions formed on the surface of the target object.

10. The method of claim 1, wherein determining the overall similarity comprises: determining an individual similarity between each of the plurality of first feature vectors included in the first feature set and each of the corresponding second feature vectors included in the feature set; and determining the overall similarity based on the individual similarities.

11. The method of claim 10, wherein determining the individual similarity comprises determining the individual similarity using at least one of a cosine similarity algorithm or a Euclidean distance algorithm.

12. The method of claim 10, wherein determining the overall similarity comprises determining the overall similarity by applying, to the individual similarities, a weight pre-assigned to each of the plurality of regions of interest.

13. The method of claim 12, wherein the weights are determined based on an expected frequency of damage or wear of each of the plurality of regions of interest.

14. The method of claim 10, wherein determining the overall similarity comprises determining the overall similarity based only on individual similarities that are greater than or equal to a predetermined first threshold among the determined plurality of individual similarities, and wherein a step of determining one or more change target vectors comprises determining, as the change target vector, a second feature vector whose individual similarity is greater than or equal to the first threshold and less than a predetermined second threshold.

15. A device comprising: one or more processors; and one or more memories storing instructions to be executed by the one or more processors, wherein the one or more processors, when executing the instructions: acquire an image of a target object; extract a first feature set from a plurality of image regions on the image corresponding to each of a plurality of predefined regions of interest (ROI) for the target object, wherein the first feature set includes a plurality of first feature vectors for each of the plurality of image regions; determine, for each of a plurality of feature sets stored in the one or more memories, an overall similarity with the first feature set; and determine the feature set with the highest overall similarity among the plurality of feature sets stored in the one or more memories as a second feature set.

16. The device of claim 15, wherein the one or more processors: determine, based on the first feature set and the second feature set, one or more change target vectors among a plurality of second feature vectors included in the second feature set; and update the change target vector in the second feature set to the value of the first feature vector corresponding to the change target vector in the first feature set.

17. The device of claim 15, wherein the one or more processors, in extracting the first feature set: obtain a brightness value of each of a plurality of pixels included in a first region on the image corresponding to a first region of interest; divide the first region into a plurality of sub-regions; determine a representative value for each of the plurality of sub-regions based on the brightness value of each of the plurality of pixels; and generate a first feature vector of the first region of interest based on the frequency of occurrence of each representative value of the plurality of sub-regions.

18. The device of claim 17, wherein the one or more processors, in determining the representative value for each of the plurality of sub-regions: determine a representative pixel of a first sub-region; compare brightness values between the representative pixel and pixels adjacent to the representative pixel; and determine a representative value of the first sub-region based on a result of the brightness comparison.

19. The device of claim 17, wherein the one or more processors, in generating the first feature vector of the first region of interest: generate a first frequency distribution based on the frequency of occurrence of each representative value of the plurality of sub-regions; divide the first frequency distribution into a predetermined number of groups and calculate the sum of frequency values included in each of the groups; and generate the first feature vector of the first region of interest based on the sum of frequency values included in each of the groups.

20. A non-transitory computer-readable recording medium storing instructions to be executed by one or more processors, wherein the instructions, when executed, cause the one or more processors to: acquire an image of a target object; extract a first feature set from a plurality of image regions on the image corresponding to each of a plurality of predefined regions of interest (ROI) for the target object, wherein the first feature set includes a plurality of first feature vectors for each of the plurality of image regions; determine, for each of a plurality of feature sets stored in a memory, an overall similarity with the first feature set; and determine the feature set with the highest overall similarity among the plurality of feature sets stored in the memory as a second feature set.