Positioning identification method, device, equipment, system and computer storage medium

By generating a matching template and calculating the matching degree by sliding it across the scene image to determine the object's position and classify it, the problem of low accuracy in neural network localization and recognition is solved, and high-precision object localization and classification are achieved.

CN113762238B · Active · Publication Date: 2026-06-26 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2021-05-27
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In existing technologies, neural networks are prone to misidentification and omission when locating and recognizing objects, resulting in low positioning accuracy and classification accuracy, especially in scenarios with high precision requirements such as robot chess.

Method used

By acquiring the preset common image features of the object, a matching template is generated. The matching template is then slid across the scene image to calculate the matching degree and determine the predicted position information of the object. Finally, target classification and recognition are performed to generate the object classification and recognition result.

Benefits of technology

It improves the accuracy of object localization and classification, reduces the interference of irrelevant information on object classification and recognition, and enhances the recognition effect in high-precision scenarios.


Abstract

The application provides a positioning and recognition method, apparatus, device, system and computer storage medium, and relates to the field of artificial intelligence. The method comprises: acquiring preset common image features corresponding to at least one object in a scene image, and generating a matching template; sliding the matching template in the scene image, and obtaining at least one piece of predicted position information corresponding to the at least one object according to the matching degree between the matching template and the image part corresponding to at least one preset sliding position of the matching template in the scene image; and performing target classification and recognition on at least one pre-classification image corresponding to the at least one piece of predicted position information in the scene image to obtain a classification and recognition result of the at least one object, where the at least one object belongs to at least one object category. The application can improve the accuracy of positioning and recognition.

Description

Technical Field

[0001] This application relates to artificial intelligence technology, and more particularly to a positioning and identification method, apparatus, device, system, and computer storage medium.

Background Technology

[0002] In recent years, with the continuous development of deep learning, the mainstream technology for object detection in image processing, particularly for localization and recognition tasks, tends to directly feed an image into a neural network, which then outputs the locations and classification information of all potential objects. This method is suitable for multi-scale and multi-category object recognition scenarios. However, due to the susceptibility of neural network recognition to false positives and false negatives, and the potential for the center of detected objects to shift due to background interference, the current method suffers from low localization accuracy. Consequently, the accuracy of localization-based classification is also low, making it unsuitable for scenarios requiring high precision and accuracy, such as robot chess.

Summary of the Invention

[0003] This application provides a positioning and identification method, apparatus, device, system, and computer storage medium that can improve the accuracy of object positioning.

[0004] The technical solution of this application embodiment is implemented as follows:

[0005] This application provides a positioning and identification method, including:

[0006] Obtain preset common image features corresponding to at least one object in a scene image, and generate a matching template;

[0007] The matching template is slid in the scene image, and at least one predicted position information corresponding to the at least one object is obtained based on the matching degree between the matching template and the corresponding image part at at least one preset sliding position in the scene image.

[0008] In the scene image, target classification and recognition are performed on at least one pre-classified image corresponding to at least one predicted location information to obtain the classification and recognition result of at least one object; the at least one object belongs to at least one object category.

[0009] In the above method, the step of sliding the matching template in the scene image and obtaining at least one predicted position information corresponding to the at least one object based on the matching degree between the matching template and the corresponding image portion at at least one preset sliding position in the scene image includes:

[0010] Align the center position of the matching template with the at least one preset sliding position to obtain the region to be matched corresponding to the matching template at each preset sliding position;

[0011] Calculate the matching degree between the matching template and the image portion within the region to be matched to obtain the matching score corresponding to each preset sliding position;

[0012] Based on the preset matching strategy and the matching score, at least one predicted position information is determined from at least one preset sliding position.
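
As an illustration only, the sliding and scoring steps above can be sketched in Python as follows. The sketch assumes a grayscale image stored as nested lists, uses the squared difference as the matching degree, and treats "score at or below a threshold" as the preset matching strategy. The names `region_at`, `sqdiff_score`, `match_positions`, and `max_score` are hypothetical and do not come from this application.

```python
from typing import List, Tuple

Image = List[List[float]]


def region_at(image: Image, cy: int, cx: int, th: int, tw: int) -> Image:
    """Return the th x tw image part whose center is aligned with (cy, cx)."""
    top, left = cy - th // 2, cx - tw // 2
    return [row[left:left + tw] for row in image[top:top + th]]


def sqdiff_score(template: Image, patch: Image) -> float:
    """Sum of squared differences; 0 means the patch equals the template."""
    return sum((t - p) ** 2
               for t_row, p_row in zip(template, patch)
               for t, p in zip(t_row, p_row))


def match_positions(image: Image, template: Image,
                    max_score: float) -> List[Tuple[int, int]]:
    """Slide the template over every position where it fits entirely inside
    the image and keep the centers whose score is at or below max_score
    (an assumed matching strategy)."""
    th, tw = len(template), len(template[0])
    h, w = len(image), len(image[0])
    hits = []
    for cy in range(th // 2, h - (th - 1) // 2):
        for cx in range(tw // 2, w - (tw - 1) // 2):
            score = sqdiff_score(template, region_at(image, cy, cx, th, tw))
            if score <= max_score:
                hits.append((cy, cx))
    return hits
```

In this sketch every pixel position at which the template fits is treated as a preset sliding position. A coarser grid, or a strategy that keeps only local minima of the score, would fit the same structure.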

[0013] In the above method, the step of performing target classification and recognition on at least one pre-classified image corresponding to at least one predicted location information in the scene image to obtain the classification and recognition result of the at least one object includes:

[0014] Based on the at least one predicted location information, at least one candidate region is generated according to a preset region size;

[0015] The image portion within the at least one candidate region is used as the at least one pre-classified image. The at least one object category is predicted for each pre-classified image to obtain the prediction result for each object category corresponding to each pre-classified image.

[0016] Based on the prediction results for each object category corresponding to each pre-classified image, the classification and recognition results of at least one object are obtained.
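
The candidate region and classification steps above might look like the following minimal Python sketch. It assumes square candidate regions clipped to the image bounds and a caller-supplied `predict` function that returns one score per object category. `candidate_region`, `crop`, `classify_positions`, and `predict` are hypothetical names, and taking the highest-scoring category is an assumed way of combining the per-category predictions.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def candidate_region(center: Tuple[int, int], size: int,
                     height: int, width: int) -> Box:
    """Build a size x size box around a predicted position, clipped to the image."""
    cy, cx = center
    top = max(0, cy - size // 2)
    left = max(0, cx - size // 2)
    return top, left, min(height, top + size), min(width, left + size)


def crop(image: Image, box: Box) -> Image:
    """Cut the pre-classified image out of the scene image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def classify_positions(
    image: Image,
    positions: List[Tuple[int, int]],
    size: int,
    predict: Callable[[Image], Dict[str, float]],
) -> List[Tuple[Tuple[int, int], str]]:
    """Classify the image part around each predicted position and keep the
    category with the highest predicted score (assumed decision rule)."""
    results = []
    for center in positions:
        box = candidate_region(center, size, len(image), len(image[0]))
        scores = predict(crop(image, box))
        results.append((center, max(scores, key=scores.get)))
    return results
```

Because only the cropped regions reach `predict`, background pixels outside the candidate regions cannot influence the category scores in this sketch.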

[0017] In the above method, obtaining the preset common image features corresponding to at least one object in the scene image and generating a matching template includes:

[0018] Using an image acquisition device, an image of a scene containing at least one object is acquired from a preset acquisition position to obtain the scene image;

[0019] Extract the image portion corresponding to a single object from the scene image as a template image; perform image segmentation on the template image based on the preset common image features to obtain the matching template.
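
One possible reading of the segmentation step, shown only as a sketch: if the preset common image feature is a circular outline of known radius, the matching template can keep the pixels near that outline and blank everything else, so that markings that differ between objects do not affect matching. The function name `ring_template` and the parameters `radius` and `thickness` are assumptions made for this example.

```python
import math
from typing import List

Image = List[List[float]]


def ring_template(template_image: Image, radius: float,
                  thickness: float = 1.0) -> Image:
    """Keep only pixels whose distance from the image center is within
    thickness / 2 of radius; set all other pixels to 0."""
    h, w = len(template_image), len(template_image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dist = math.hypot(y - cy, x - cx)
            on_ring = abs(dist - radius) <= thickness / 2
            row.append(template_image[y][x] if on_ring else 0.0)
        out.append(row)
    return out
```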

[0020] In the above method, the matching degree calculation includes:

[0021] Any one of the following: squared difference matching algorithm, correlation matching algorithm, and standard matching algorithm.
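
For orientation, the measures named above can be written as small Python functions over a flattened template and image patch. The squared difference and correlation forms are the usual textbook definitions. Reading "standard matching" as normalized correlation is an assumption of this sketch, not something this application states, and all function names are hypothetical.

```python
import math
from typing import Sequence


def squared_difference(template: Sequence[float], patch: Sequence[float]) -> float:
    """Lower is better; 0 means identical pixel values."""
    return sum((t - p) ** 2 for t, p in zip(template, patch))


def correlation(template: Sequence[float], patch: Sequence[float]) -> float:
    """Higher is better; grows with overall patch brightness."""
    return sum(t * p for t, p in zip(template, patch))


def normalized_correlation(template: Sequence[float], patch: Sequence[float]) -> float:
    """Correlation divided by the product of the two vector norms, which
    removes the dependence on overall brightness (assumed interpretation)."""
    denom = math.sqrt(sum(t * t for t in template) * sum(p * p for p in patch))
    return correlation(template, patch) / denom if denom else 0.0
```

Note the different conventions: with the squared difference the best match has the smallest score, while with either correlation measure it has the largest, so the matching strategy has to be chosen to suit the measure.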

[0022] In the above method, the preset common image features include:

[0023] At least one of the following: contour features, pattern features, color features, and texture features.

[0024] This application provides a positioning and identification device, including:

[0025] The generation module is used to obtain the preset common image features corresponding to at least one object in the scene image and generate a matching template;

[0026] The positioning module is used to slide the matching template in the scene image and obtain at least one predicted position information corresponding to the at least one object based on the matching degree between the matching template and the corresponding image part at at least one preset sliding position in the scene image.

[0027] The recognition module is used to perform target classification recognition on at least one pre-classified image corresponding to at least one predicted location information in the scene image, and obtain the classification recognition result of at least one object; the at least one object belongs to at least one object category.

[0028] In the above-described device, the positioning module is further configured to align the center position of the matching template with the at least one preset sliding position to obtain the matching region corresponding to the matching template at each preset sliding position; calculate the matching degree between the matching template and the image portion within the matching region to obtain the matching score corresponding to each preset sliding position; and determine the at least one predicted position information from the at least one preset sliding position according to the preset matching strategy and the matching score.

[0029] In the above-described apparatus, the recognition module is further configured to generate at least one candidate region based on a preset region size on the at least one predicted location information; use the image portion within the at least one candidate region as the at least one pre-classified image; perform classification prediction of the at least one object category on each of the at least one pre-classified images to obtain a prediction result for each object category corresponding to each pre-classified image; and obtain a classification recognition result for the at least one object based on the prediction result for each object category corresponding to each pre-classified image.

[0030] In the above-described device, the generation module is further configured to acquire an image of a scene containing at least one object from a preset acquisition location using an image acquisition device, thereby obtaining the scene image; extract the image portion corresponding to a single object from the scene image as a template image; and perform image segmentation on the template image according to the preset common image features to obtain the matching template.

[0031] In the above-mentioned device, the matching degree calculation includes:

[0032] Any one of the following: squared difference matching algorithm, correlation matching algorithm, and standard matching algorithm.

[0033] In the above-mentioned device, the preset common image features include:

[0034] At least one of the following: contour features, pattern features, color features, and texture features.

[0035] This application provides a positioning and identification system, including:

[0036] An image acquisition device is used to acquire images of a scene containing at least one object from a preset acquisition location, thereby obtaining a scene image;

[0037] A positioning and recognition device is used to extract a template image corresponding to a single object from a scene image; perform image segmentation on the template image according to preset common image features to obtain a matching template; slide the matching template in the scene image; and obtain at least one predicted position information corresponding to the at least one object based on the matching degree between the matching template and the corresponding image portion at at least one preset sliding position in the scene image; perform target classification and recognition on at least one pre-classified image corresponding to the at least one predicted position information in the scene image to obtain the classification and recognition result of the at least one object; the at least one object belongs to at least one object category.

[0038] A control device is configured to generate operation instructions for a target object among the at least one object based on the at least one predicted location information and the classification and recognition results of the at least one object;

[0039] An execution device is used to operate on the target object according to the operation instructions.
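
The division of work among the four devices can be pictured with the following Python sketch, in which each device is reduced to a plain callable. The data classes `Recognized` and `Instruction` and the function `run_turn` are hypothetical and only illustrate the order of the data flow: acquire, recognize, plan, execute.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Position = Tuple[int, int]


@dataclass
class Recognized:
    position: Position  # predicted position information
    category: str       # classification and recognition result


@dataclass
class Instruction:
    source: Position    # where the target object currently is
    target: Position    # where it should be placed


def run_turn(
    acquire: Callable[[], Any],
    recognize: Callable[[Any], List[Recognized]],
    plan: Callable[[List[Recognized]], Instruction],
    execute: Callable[[Instruction], None],
) -> Instruction:
    """Run one cycle: image acquisition device -> positioning and recognition
    device -> control device -> execution device."""
    scene_image = acquire()
    objects = recognize(scene_image)
    instruction = plan(objects)
    execute(instruction)
    return instruction
```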

[0040] This application provides a positioning and identification device, including:

[0041] Memory, used to store executable instructions;

[0042] The processor, when executing executable instructions stored in the memory, implements the positioning and identification method provided in the embodiments of this application.

[0043] This application provides a computer storage medium storing executable instructions, which, when executed by a processor, implement the positioning and identification method provided in this application.

[0044] The embodiments of this application have the following beneficial effects:

[0045] In this embodiment, a matching template is generated based on the preset common image features of at least one object, and at least one predicted position information of at least one object in the scene image is obtained through the template matching method, which can improve the accuracy of object positioning; furthermore, target classification and recognition are performed on at least one pre-classified image corresponding to at least one predicted position information, which reduces the interference of irrelevant information on object classification and recognition and improves the accuracy of classification and recognition.

Description of the Drawings

[0046] Figure 1 is an optional structural diagram of the positioning and identification system architecture provided in the embodiments of this application;

[0047] Figure 2 is a schematic diagram of an optional location of the image acquisition device in the positioning and recognition system architecture provided in this application embodiment;

[0048] Figure 3 is an optional structural schematic diagram of the positioning and identification device provided in the embodiments of this application;

[0049] Figure 4 is an optional flowchart illustrating the positioning and identification method provided in an embodiment of this application;

[0050] Figure 5 is an optional flowchart illustrating the positioning and identification method provided in an embodiment of this application;

[0051] Figure 6 is an optional schematic diagram of a Chinese chess board scene image provided in the embodiments of this application;

[0052] Figure 7 is an optional schematic diagram of the chess piece image provided in the embodiments of this application;

[0053] Figure 8 is an optional schematic diagram of a matching template extracted from a chess piece image, provided in an embodiment of this application;

[0054] Figure 9 is an optional flowchart illustrating the positioning and identification method provided in an embodiment of this application;

[0055] Figure 10 is a schematic diagram illustrating the process of the matching template provided in this application sliding on a scene image;

[0056] Figure 11 is an optional flowchart illustrating the positioning and identification method provided in an embodiment of this application;

[0057] Figure 12 is a schematic diagram illustrating the effect of a matching score distribution provided in an embodiment of this application;

[0058] Figure 13 is a schematic diagram illustrating the effect of generating at least one candidate region provided in an embodiment of this application;

[0059] Figure 14 is a schematic diagram of the classification and recognition results of at least one image to be classified provided in an embodiment of this application.

Detailed Implementation

[0060] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limitations on this application. All other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0061] In the following description, references are made to “some embodiments,” which describe a subset of all possible embodiments. However, it is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.

[0062] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.

[0063] In the implementation of this application, the collection and processing of relevant data should strictly comply with the requirements of relevant laws and regulations, obtain the informed consent or separate consent of the personal information subject, and carry out subsequent data use and processing within the scope of laws and regulations and the authorization of the personal information subject.

[0064] Before providing a further detailed description of the embodiments of this application, the nouns and terms involved in the embodiments of this application will be explained, and the nouns and terms involved in the embodiments of this application shall be interpreted as follows.

[0065] 1) Artificial Intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.

[0066] Artificial intelligence (AI) is a comprehensive discipline encompassing a wide range of fields, including both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies primarily include computer vision, speech processing, natural language processing, and machine learning / deep learning.

[0067] 2) Computer Vision (CV) is a science that studies how to enable machines to "see." More specifically, it refers to machine vision, which uses cameras and computers to replace human eyes for target recognition, tracking, and measurement, and further performs image processing to create images more suitable for human observation or transmission to instruments. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content / behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), and common biometric recognition technologies such as facial recognition and fingerprint recognition.

[0068] 3) Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. It specifically studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; its applications span all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and instructional learning.

[0069] 4) A robot is a machine that automatically performs tasks, generally composed of an actuator, drive unit, detection device, control system, and complex mechanics. It can be commanded by humans, run pre-programmed procedures, or act according to principles established using artificial intelligence technology. Its task is to assist or replace human work. A chess-playing robot is a specific application of robots in chess games, capable of autonomously completing the entire chess-playing process, much like a human.

[0070] With the research and advancement of artificial intelligence (AI) technology, AI is being studied and applied in various fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that with the development of technology, AI will be applied in more fields and play an increasingly important role.

[0071] The solutions provided in this application relate to technologies such as artificial intelligence and computer vision, and are specifically illustrated through the following embodiments:

[0072] Currently, for object localization scenarios, the commonly used method is to obtain the object's position in the acquired image using a neural network, then use a depth camera to measure distance and obtain the object's 3D spatial position within the camera's view. Combining the object's position in the image with its 3D spatial position, the final 3D spatial location of the object is obtained. However, neural network recognition is prone to false positives and false negatives, and the center of the object detected by the neural network can be shifted due to background interference. Furthermore, depth camera ranging also has significant errors, reducing the accuracy of object localization. This is especially problematic in scenarios requiring high precision and accuracy, such as robot chess, where the localization performance is very poor, consequently affecting the accuracy of further image recognition based on localization.

[0073] This application provides a positioning and identification method, apparatus, device, system, and computer storage medium, which can improve the accuracy of positioning and identification. The following describes exemplary applications of the positioning and identification device provided in this application. The device can be implemented as various types of user terminals, such as laptops, tablets, desktop computers, set-top boxes, and mobile devices (e.g., mobile phones, portable music players, personal digital assistants, dedicated messaging devices, portable gaming devices), or as a server. The following describes an exemplary application in which the positioning and identification device is implemented as a server.

[0074] Referring to Figure 1, Figure 1 is an optional architecture diagram of the positioning and recognition system 100 provided in the embodiments of this application. To support a positioning and recognition task, such as a robot playing chess, the image acquisition device 400 is connected to the server 200 through the network 300, the server 200 is connected to the control device 600, and the control device 600 is connected to the execution device 500. The network 300 can be a wide area network, a local area network, or a combination of the two.

[0075] The image acquisition device 400 is used to acquire an image of a scene containing at least one object from a preset acquisition location, obtain a scene image, and transmit the scene image to the server 200. For the robot chess task, the at least one object can be multiple chess pieces, and the scene image can be a chessboard image containing images of the multiple chess pieces. In some embodiments, the preset acquisition location can be directly above the scene, such as directly above the chessboard.

[0076] Server 200 extracts a template image corresponding to a single object from the scene image and, based on the preset common image features, performs image segmentation on the template image to obtain a matching template. It then slides the matching template across the scene image and, based on the matching degree between the matching template and the corresponding image portion at at least one preset sliding position in the scene image, obtains at least one piece of predicted position information corresponding to the at least one object. Here, the preset common image features can be features shared by multiple chess pieces. For example, if the outlines of the chess pieces are all circles of the same size, the outline shape can be used as the preset common image feature, and the circular border of a chess piece can be generated as the matching template. Server 200 transmits the at least one piece of predicted position information and the classification and recognition results of the at least one object to the control device 600. The server 200 and the control device 600 can be connected via a network or by other device connection methods; no specific limitation is made here.

[0077] The control device 600 is used to generate operation instructions for a target object among at least one set of ...

[0078] The execution device 500 is used to operate on a target object according to operation instructions. Here, the execution device may include the robot's mechanical gripper, which can grasp the target chess piece and place it at the target position according to the above operation instructions, thereby completing a robot chess-playing operation.

[0079] In some embodiments, server 200 may be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. Control device 600 may be a smartphone, tablet, laptop, desktop computer, smart speaker, smartwatch, etc., but is not limited to these. The control device and server can be directly or indirectly connected via wired or wireless communication, which is not limited in this embodiment of the invention.

[0080] It should be noted that, in Figure 1, the image acquisition device 400 is connected to the execution device 500. The default starting position of the execution device is the preset acquisition position of the image acquisition device 400, such as the position directly above the chessboard. This allows the image acquisition device 400 to acquire images of the chessboard from the preset acquisition position and avoids distortion of the scene image caused by deviation of the acquisition angle, which would reduce positioning accuracy. In some embodiments, after each movement and operation on the target object, for example each time a target chess piece has been grabbed and moved to the target position, the execution device can return to its default starting position so that the image acquisition device is at the preset acquisition position for image acquisition before the next chess move.

[0081] In some embodiments, the image acquisition device can also be fixed at the preset acquisition position by a support component. Figure 2 shows an image acquisition device 110 fixed at a preset acquisition position directly above the chessboard by a support component 111 and acquiring scene images of the chessboard.

[0082] Referring to Figure 3, Figure 3 is a schematic diagram of the structure of the server 200 provided in the embodiments of this application. The server 200 shown in Figure 3 includes at least one processor 210, memory 250, at least one network interface 220, and a user interface 230. The various components in server 200 are coupled together via a bus system 240. It is understood that the bus system 240 is used to implement communication between these components. In addition to a data bus, the bus system 240 also includes a power bus, a control bus, and a status signal bus. However, for clarity, all buses are labeled as bus system 240 in Figure 3.

[0083] Processor 210 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Among them, the general-purpose processor can be a microprocessor or any conventional processor, etc.

[0084] User interface 230 includes one or more output devices 231 that enable the presentation of media content, including one or more speakers and / or one or more visual displays. User interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.

[0085] The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state storage, hard disk drives, optical disk drives, etc. The memory 250 may optionally include one or more storage devices physically located away from the processor 210.

[0086] The memory 250 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), and the volatile memory may be random access memory (RAM). The memory 250 described in this application embodiment is intended to include any suitable type of memory.

[0087] In some embodiments, memory 250 is capable of storing data to support various operations, examples of which include programs, modules, and data structures or subsets or supersets thereof, as illustrated below.

[0088] Operating system 251 includes system programs for handling various basic system services and performing hardware-related tasks, such as the framework layer, core library layer, driver layer, etc., for implementing various basic business functions and handling hardware-based tasks;

[0089] The network communication module 252 is used to reach other computing devices via one or more (wired or wireless) network interfaces 220, such as Bluetooth, WiFi, and Universal Serial Bus (USB).

[0090] Presentation module 253 is configured to enable the presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 231 associated with user interface 230 (e.g., a display screen, a speaker, etc.).

[0091] The input processing module 254 is used to detect and translate one or more user inputs or interactions from one or more input devices 232.

[0092] In some embodiments, the apparatus provided in this application can be implemented in software. Figure 3 shows a positioning and identification device 255 stored in memory 250, which can be software in the form of programs and plug-ins and includes the following software modules: generation module 2551, positioning module 2552, and identification module 2553. These modules are logical and can therefore be arbitrarily combined or further split according to the functions they implement.

[0093] The functions of each module will be explained below.

[0094] In other embodiments, the apparatus provided in this application can be implemented in hardware. As an example, the apparatus provided in this application can be a processor in the form of a hardware decoding processor, which is programmed to execute the positioning and identification method provided in this application. For example, the processor in the form of a hardware decoding processor can be one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.

[0095] The location identification method provided in this application will be described in conjunction with exemplary applications and implementations of the server provided in the embodiments of this application.

[0096] Referring to Figure 4, Figure 4 is an optional flowchart of the positioning and identification method provided in the embodiments of this application. The method is explained below in conjunction with the steps shown in Figure 4.

[0097] S101. Obtain the preset common image features corresponding to at least one object in the scene image and generate a matching template.

[0098] The positioning and recognition method provided in this application is applicable to scenarios where the objects are relatively fixed, but the positioning accuracy and classification accuracy are required to be extremely high, such as items with fixed shapes but different types on an assembly line, robots playing chess, or other object positioning and recognition scenarios with common stable features.

[0099] In this embodiment of the application, the positioning and recognition device acquires a scene image containing at least one object, and then generates a matching template based on the preset common image features corresponding to the at least one object in the scene image.

[0100] In some embodiments, the preset common image features can be image features shared by the object images of the at least one object in the scene image, extracted based on prior knowledge of the at least one object. The preset common image features can also be shared image features extracted by comparing the object images with one another. The specific selection depends on the actual situation, and this application embodiment does not limit this.

[0101] In some embodiments, the preset common image features include at least one of contour features, pattern features, color features, and texture features.

[0102] In this embodiment, the contour feature can be a shape or outline feature common to at least one object image; for example, for a chess piece, the contour feature can be a circular chess piece outline; the pattern feature can be a distinctive pattern feature common to at least one object image; for example, for a product on an assembly line, the pattern feature can be a pattern identifier common to the product. The color feature can be a color distribution, color composition, and interrelationships common to at least one object image; the texture feature can be an image texture feature common to at least one object image.

[0103] In some embodiments, the preset common image features may also be other types of common visual features of at least one object. The specific selection is made according to the actual situation, and this application embodiment does not limit it.

[0104] S102. Slide the matching template in the scene image, and obtain at least one predicted position information corresponding to at least one object based on the matching degree between the matching template and the corresponding image part at at least one preset sliding position in the scene image.

[0105] In this embodiment of the application, the scene image contains at least one preset sliding position, i.e., preset sliding point coordinates. The positioning and recognition device can use the matching template to slide in the scene image. At each preset sliding position, the matching degree between the matching template and the image part in the area is calculated. The matching degree corresponding to each preset sliding position is obtained by traversing at least one preset sliding position.

[0106] In this embodiment of the application, the positioning and recognition device can then determine at least one predicted position information of at least one object in the scene image based on the matching degree corresponding to each preset sliding position.

[0107] In some embodiments, the positioning recognition device may use at least one preset sliding position with a high degree of matching as at least one predicted position information.

[0108] S103. In the scene image, target classification and recognition are performed on at least one pre-classified image corresponding to at least one predicted location information to obtain the classification and recognition result of at least one object; at least one object belongs to at least one object category.

[0109] In this embodiment, when the positioning and recognition device obtains at least one predicted location information, it obtains the positioning result of at least one object in the scene image. Based on at least one predicted location information, the positioning and recognition device can locate at least one predicted region corresponding to at least one object in the scene image, and then perform target classification and recognition on the image within at least one predicted region, i.e., at least one pre-classified image, instead of classifying and recognizing the entire image. This targeted recognition can greatly improve the accuracy of classification and recognition.

[0110] In this embodiment of the application, the positioning and recognition device uses the classification and recognition results of at least one pre-classified image as the classification and recognition results of at least one object.

[0111] It is understood that in the embodiments of this application, generating a matching template based on the preset common image features of at least one object and obtaining at least one predicted position information of at least one object in the scene image through the template matching method can improve the accuracy of object positioning; and, performing target classification and recognition on at least one pre-classified image corresponding to at least one predicted position information reduces the interference of irrelevant information on object classification and recognition, and improves the accuracy of classification and recognition.

[0112] In some embodiments, referring to Figure 5, Figure 5 is an optional flowchart of the positioning and identification method provided in an embodiment of this application. Based on Figure 4, S101 can be implemented by executing S1011-S1013.

[0113] S1011. Use an image acquisition device to acquire an image of a scene containing at least one object from a preset acquisition position, obtaining a scene image.

[0114] In an embodiment of the present application, the positioning and recognition device can use an image acquisition device, such as a camera or an image sensor, to acquire an image of the real scene that needs to be positioned and recognized from a preset acquisition position, obtaining a scene image. The real scene contains at least one object.

[0115] S1012. Extract the image part corresponding to a single object from the scene image as a template image.

[0116] In an embodiment of the present application, the positioning and recognition device can extract the image part corresponding to a single object from the scene image. Exemplarily, from a scene image of a chessboard containing multiple chess piece images, extract the image of any single chess piece as a template image.

[0117] In some embodiments, for the scene image of a chessboard shown in Figure 6, the positioning and recognition device can extract the image of a chess piece from it. Exemplarily, the image of the chess piece "Shi" is extracted as the template image, as shown in Figure 7.

[0118] S1013. Perform image segmentation on the template image according to preset common image features to obtain a matching template.

[0119] In an embodiment of the present application, the positioning and recognition device can perform image segmentation on the template image according to at least one preset common image feature of the object. Exemplarily, when at least one object is a Chinese chess piece, its preset common image feature can be the ring on the chess piece image, and perform image segmentation on the template image to obtain a matching template.

[0120] In some embodiments, for the template image shown in Figure 7, the positioning and recognition device can perform image segmentation on the template image by means of color segmentation to obtain the "Shi" character pattern and the ring border pattern. Based on the preset common image feature, that is, the ring, the positioning and recognition device takes the segmented ring border pattern as the matching template, as shown in Figure 8.

[0121] In some embodiments, after the positioning and recognition device performs image segmentation on the template image, it can further process the segmented image, such as manually removing noise points, etc., to improve the image clarity and obtain a matching template.
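The segmentation in S1013 could be sketched as follows. This is only an illustration under stated assumptions: the ring and the character are assumed to share one ink color that differs clearly from the piece background, and the ring is assumed to be separable from the character by its distance from the piece center. The function `segment_ring`, the predicate `is_red`, the color thresholds, and the radius band are hypothetical and do not come from this application. Plain Python lists are used to keep the sketch dependency-free.

```python
import math

def segment_ring(template_rgb, is_ink, r_min, r_max):
    """Return a binary mask (list of rows) that keeps only ink-colored
    pixels whose distance from the image center lies in [r_min, r_max].

    template_rgb: H x W list of (r, g, b) tuples for a single-piece image.
    is_ink: hypothetical predicate deciding whether a pixel has the ink color.
    r_min, r_max: assumed radius band (in pixels) containing the ring border.
    """
    h, w = len(template_rgb), len(template_rgb[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            dist = math.hypot(y - cy, x - cx)
            keep = is_ink(template_rgb[y][x]) and r_min <= dist <= r_max
            row.append(1 if keep else 0)
        mask.append(row)
    return mask

def is_red(pixel):
    # Assumed color threshold for red ink; real values depend on the camera.
    r, g, b = pixel
    return r > 150 and g < 100 and b < 100

# Toy 7x7 template: a red ring near the border plus one red "character" pixel.
RED, WOOD = (200, 30, 30), (220, 180, 120)
toy = [[WOOD] * 7 for _ in range(7)]
for y in range(7):
    for x in range(7):
        if 2.5 <= math.hypot(y - 3, x - 3) <= 3.2:
            toy[y][x] = RED   # ring border
toy[3][3] = RED               # character stroke at the center

ring_mask = segment_ring(toy, is_red, 2.5, 3.2)
```

The center pixel has the ink color but falls outside the radius band, so it is dropped from the mask, which leaves only the ring border as the matching template.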

[0122] It is understood that, in this application embodiment, by performing image segmentation on the template image of a single object based on the preset common image features of at least one object, a matching template is obtained, which enables the location of at least one object in the scene image to be performed using the matching template containing the preset common image features, thereby improving the positioning accuracy.

[0123] In some embodiments, referring to Figure 9, Figure 9 is an optional flowchart of the positioning and identification method provided in an embodiment of this application. Based on Figure 4 or Figure 5, S102 can be implemented by executing S1021-S1023, which will be explained in conjunction with each step.

[0124] S1021. Align the center position of the matching template with at least one preset sliding position to obtain the region to be matched of the matching template at each preset sliding position.

[0125] In this embodiment of the application, the positioning and identification device obtains the region to be matched corresponding to each preset sliding position of the matching template by aligning the center position of the matching template with at least one preset sliding position.

[0126] In some embodiments, the at least one preset sliding position can be the coordinates of each pixel in the scene image. The positioning and recognition device can slide the matching template along a preset sliding trajectory, such as from left to right and from top to bottom in the scene image, aligning the center position of the matching template with the coordinates of each pixel in the scene image and traversing the entire scene image to obtain the region to be matched corresponding to each preset sliding position of the matching template, as shown in Figure 10.

[0127] In some embodiments, at least one preset sliding position may be pre-specified among all pixels contained in the scene image. For example, one or more pixels within a range in the scene image may be pre-specified as at least one preset sliding position. Alternatively, at least one pixel coordinate point selected by a preset filtering strategy may be used as at least one preset sliding position to reduce the computational load of the positioning and recognition device when performing template sliding matching and improve the matching speed. The specific selection is made according to the actual situation, and the embodiments of this application do not limit it.
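As a rough sketch of how the preset sliding positions and the center-aligned region could be generated: the function names, the `stride` parameter, and the `region` parameter are hypothetical, and a fixed stride or a restricted rectangle is only one possible filtering strategy, since this application does not specify one.

```python
def sliding_positions(height, width, stride=1, region=None):
    """Yield (row, col) preset sliding positions, left to right, top to bottom.

    stride=1 with region=None visits every pixel of the scene image.
    A larger stride, or a (top, left, bottom, right) region, reduces the
    number of positions (an assumed filtering strategy).
    """
    top, left, bottom, right = region if region else (0, 0, height, width)
    for row in range(top, bottom, stride):
        for col in range(left, right, stride):
            yield (row, col)

def region_to_match(center, tmpl_h, tmpl_w):
    """Return (top, left, bottom, right) of the region whose center is
    aligned with `center`; bottom and right are exclusive.

    The region may extend past the image border near the edges, so the
    caller has to pad the image or skip such positions.
    """
    row, col = center
    top = row - tmpl_h // 2
    left = col - tmpl_w // 2
    return (top, left, top + tmpl_h, left + tmpl_w)

all_pixels = list(sliding_positions(4, 5))        # every pixel of a 4x5 image
coarse = list(sliding_positions(4, 5, stride=2))  # every second row and column
box = region_to_match((10, 10), 3, 3)             # region for a 3x3 template
```

With `stride=2` the number of positions on the 4x5 toy image drops from 20 to 6, which illustrates the reduction in computational load that the paragraph above describes.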

[0128] In some embodiments, the positioning and recognition device may also align the center position of the matching template with the at least one preset sliding position through parallel processing to obtain the region to be matched of the matching template at each preset sliding position.

[0129] S1022. Calculate the matching degree between the matching template and the image portion within the region to be matched, and obtain the matching score corresponding to each preset sliding position.

[0130] In this embodiment of the application, the positioning and recognition device calculates the matching degree between the matching template and the image portion within the area to be matched at each sliding position, and obtains the matching score corresponding to each preset sliding position.

[0131] In some embodiments, the positioning and recognition device can calculate the degree of image matching between the matching template and the image portion within the region to be matched using matching degree calculation methods such as squared difference matching, correlation matching, and standard matching, to obtain the matching score corresponding to each preset sliding position.
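This application does not spell out the formulas for these methods. One common textbook formulation of squared difference matching and correlation matching is given below as an assumption, where $T$ is the matching template, $I$ is the scene image, $(x, y)$ is the preset sliding position, and $(i, j)$ ranges over the template pixels as offsets from the template center:

```latex
R_{\mathrm{sqdiff}}(x, y) = \sum_{i, j} \bigl( T(i, j) - I(x + i,\, y + j) \bigr)^{2}

R_{\mathrm{corr}}(x, y) = \sum_{i, j} T(i, j) \, I(x + i,\, y + j)
```

Under this formulation, a smaller $R_{\mathrm{sqdiff}}$ indicates a better match (0 for identical pixels), whereas a larger $R_{\mathrm{corr}}$ indicates a better match.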

[0132] In some embodiments, when a correlation matching algorithm is used to calculate the matching degree, the matching score can be a value in the range [-1, 1]. A matching score of 1 indicates a perfect match (positive correlation), a matching score of -1 indicates a negative correlation, and a matching score of 0 indicates no correlation.

[0133] As can be seen, the matching score represents the degree of matching between the image portion within the region to be matched and the preset common image features. When the correlation between the image portion within the region to be matched and the preset common image features is high, it indicates that the image portion within the region to be matched is more likely to be the image corresponding to the object.
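A minimal sketch of one way to compute a matching score in the range [-1, 1] is the normalized correlation coefficient shown below. This particular formula is an assumption, since the application does not fix one. The names `ncc_score` and `score_map` are hypothetical, the images are grayscale grids stored as plain Python lists, and only positions where the template fits entirely inside the image are scored.

```python
import math

def ncc_score(template, patch):
    """Normalized correlation coefficient of two equally sized 2-D grids.

    Returns a value in [-1, 1]; returns 0.0 if either grid is constant,
    because the correlation is undefined in that case.
    """
    t = [v for row in template for v in row]
    p = [v for row in patch for v in row]
    mean_t, mean_p = sum(t) / len(t), sum(p) / len(p)
    num = sum((a - mean_t) * (b - mean_p) for a, b in zip(t, p))
    den = math.sqrt(sum((a - mean_t) ** 2 for a in t)
                    * sum((b - mean_p) ** 2 for b in p))
    return num / den if den else 0.0

def score_map(image, template):
    """Map each sliding position (the template center) to its matching score."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    scores = {}
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            patch = [row[left:left + tw] for row in image[top:top + th]]
            scores[(top + th // 2, left + tw // 2)] = ncc_score(template, patch)
    return scores

# Toy example: paste a 3x3 ring-like template into a blank 5x6 image.
template = [[0, 9, 0],
            [9, 0, 9],
            [0, 9, 0]]
image = [[0] * 6 for _ in range(5)]
for dy in range(3):
    for dx in range(3):
        image[1 + dy][2 + dx] = template[dy][dx]

scores = score_map(image, template)
best = max(scores, key=scores.get)   # position with the highest score
inverted = [[9 - v for v in row] for row in template]
```

In the toy example the highest score is found where the template center coincides with the pasted pattern, and the inverted pattern yields a score of -1, matching the interpretation of the score range given above.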

[0134] S1023. Based on the preset matching strategy and matching score, determine at least one predicted position information from at least one preset sliding position.

[0135] In this embodiment of the application, when the positioning and identification device obtains the matching score corresponding to each preset sliding position, it can filter the matching score according to the preset matching strategy, determine at least one target matching score that satisfies the preset matching strategy, and use the preset sliding position corresponding to the at least one target matching score as at least one predicted position information, thereby determining at least one predicted position information from the at least one preset sliding position.

[0136] In this embodiment, the preset matching strategy may be to sort the matching scores from high to low and, based on the number of objects N of the at least one object, use the preset sliding positions corresponding to the top N matching scores as the at least one predicted position information. Other preset matching strategies may also be selected according to the actual situation, which is not limited in this embodiment.
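The selection of the highest-scoring positions could be sketched as follows. The function name `top_n_positions` is hypothetical. The optional `min_dist` parameter is an added assumption that is not described in this application: neighboring pixels around one object tend to score similarly, so a minimum separation keeps each object from being selected more than once. With `min_dist=0` the function reduces to a plain selection of the highest scores.

```python
def top_n_positions(scores, n, min_dist=0):
    """Return up to n sliding positions with the highest scores, best first.

    scores: dict mapping (row, col) -> matching score.
    min_dist: assumed minimum separation (Chebyshev distance, in pixels)
              between selected positions; 0 disables the check.
    """
    if n <= 0:
        return []
    chosen = []
    for pos in sorted(scores, key=scores.get, reverse=True):
        far_enough = all(
            max(abs(pos[0] - c[0]), abs(pos[1] - c[1])) >= min_dist
            for c in chosen
        )
        if far_enough:
            chosen.append(pos)
            if len(chosen) == n:
                break
    return chosen

scores = {(5, 5): 0.98, (5, 6): 0.95, (20, 20): 0.90, (40, 8): 0.30}
plain = top_n_positions(scores, 2)                # two highest scores
spaced = top_n_positions(scores, 2, min_dist=10)  # skips the close neighbor
```

Here `plain` contains two adjacent pixels of the same peak, while `spaced` contains two well-separated positions.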

[0137] It is understood that by using a matching template to perform sliding matching in a scene image, the embodiments of this application can accurately locate the position of at least one object in the scene image, thereby improving the accuracy of object positioning.

[0138] In some embodiments, referring to Figure 11, Figure 11 is an optional flowchart of the positioning and identification method provided in an embodiment of this application. Based on Figure 4, Figure 5, or Figure 9, S103 can be implemented by executing S1031-S1033, which will be explained in conjunction with each step.

[0139] S1031. Based on at least one predicted location information, generate at least one candidate region according to a preset region size.

[0140] In this embodiment of the application, the positioning and recognition device can generate at least one candidate region based on a preset region size on at least one predicted location information in a scene image.

[0141] In some embodiments, the positioning and identification device can use each predicted location information in at least one predicted location information as a center point, and generate a candidate region corresponding to each predicted location according to a preset region size, thereby obtaining at least one candidate region.

[0142] In some embodiments, for chess piece positioning scenarios, since the size of each chess piece is fixed, when the positioning and recognition device obtains at least one predicted position information, it can use the target detection neural network in artificial intelligence technology, take the size of the chess piece as the preset area size, and generate at least one candidate box of the preset area size centered on each predicted position, as the at least one candidate area.
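Generating a fixed-size candidate box around a predicted position, and cutting out the corresponding pre-classified image, could look roughly like this. The function names are hypothetical, the image is a plain list of pixel rows, and the sketch assumes that the box lies entirely inside the image.

```python
def candidate_box(center, size):
    """Return (top, left, bottom, right) of a size x size box centered on
    `center`; bottom and right are exclusive."""
    row, col = center
    half = size // 2
    return (row - half, col - half, row - half + size, col - half + size)

def crop_candidate(image, center, size):
    """Cut the candidate region out of the image as a pre-classified image."""
    top, left, bottom, right = candidate_box(center, size)
    if top < 0 or left < 0 or bottom > len(image) or right > len(image[0]):
        raise ValueError("candidate box extends past the image border")
    return [row[left:right] for row in image[top:bottom]]

scene = [[0] * 100 for _ in range(100)]        # blank 100x100 toy image
box = candidate_box((50, 60), 79)              # box around one predicted position
patch = crop_candidate(scene, (50, 60), 79)    # 79x79 pre-classified image
```

Only the cropped patch is passed on to classification, so the classifier never sees the rest of the scene image.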

[0143] S1032. Take an image portion within at least one candidate region as at least one pre-classified image, and perform classification prediction for at least one object category on each of the at least one pre-classified images to obtain the prediction result for each object category corresponding to each pre-classified image.

[0144] In this embodiment of the application, the positioning and recognition device can take an image portion within at least one candidate region as at least one pre-classified image, and classify and recognize the at least one pre-classified image through a convolutional neural network to obtain the probability of each pre-classified image belonging to each object category, which is used as the prediction result for each object category corresponding to each pre-classified image.

[0145] Here, the convolutional neural network can be a multi-object detection neural network, used to output the probability that each pre-classified image belongs to at least one object category. The localization and recognition device can then predict the object category of each pre-classified image based on the probability of each pre-classified image belonging to at least one object category output by the convolutional neural network, and obtain the recognition result of at least one object.

[0146] In some embodiments, the convolutional neural network can be a multi-object classification and recognition network model obtained by training an initial convolutional neural network model using a sample image set of at least one object through machine learning methods. For example, the positioning and recognition device can acquire an image of each chess piece and label the image of each chess piece with its corresponding chess piece category, as a sample image set; then, the multi-object classification and recognition network model can be trained using the sample image set.

[0147] In some embodiments, the multi-object classification and detection network model can be a You Only Look Once (YOLO) model or other multi-object detection models. The specific choice is made according to the actual situation, and the embodiments of this application do not limit it.

[0148] S1033. Based on the prediction results of each object category corresponding to each pre-classified image, obtain the classification and recognition results of at least one object.

[0149] In this embodiment of the application, the positioning and recognition device can predict the final object category of each pre-classified image based on the prediction result of each object category corresponding to each pre-classified image, and use this prediction as the object category to which the object corresponding to each pre-classified image belongs, thereby identifying the object category to which each object belongs and using this identification result as the classification and recognition result of at least one object.

[0150] It is understood that by classifying and recognizing at least one pre-classified image corresponding to at least one predicted location, the embodiments of this application can narrow the scope of target detection processing of the neural network, limit it to predicting the relevant pre-classified image at each predicted location, reduce the interference of background images, and thus improve the accuracy of positioning and recognition.

[0151] The following example illustrates an exemplary application of the positioning and recognition method described in this application, specifically in the scenario of a robot playing chess.

[0152] In this embodiment, the chess scene mainly consists of a chessboard and chess pieces with rings. The positioning and recognition device first takes a picture of the chess scene using a camera suspended directly above the chessboard. From the scene image, the image of each individual chess piece is extracted. After obtaining the chess piece image, color segmentation and manual removal of noise points are used to obtain the border of the chess piece's ring as a matching template.

[0153] In this embodiment, before each move by the robot, the positioning and recognition device captures an image of the current scene using a camera to obtain the latest arrangement of pieces on the chessboard. The positioning and recognition device can employ a template matching method, sliding the matching template across the current scene image from left to right and top to bottom. At each pixel of the current scene image, the matching degree between the template and the local area of the image (i.e., the region to be matched) is calculated. After traversing the entire image, a matching score distribution map is obtained, from which the positions of all the rings in the image can be determined. For example, the positioning and recognition device can perform template matching on the chess scene shown in Figure 6 to obtain the matching score distribution map shown in Figure 12. In Figure 12, the closer a circle is to white, the higher the degree of matching between the matching template and the region to be matched. Based on the number of chess pieces, the positioning and recognition device can determine the top 32 matching scores from the matching score distribution map. The preset sliding positions corresponding to the top 32 matching scores are used as the at least one predicted position, thereby locating the center point of each chess piece in the current scene image.

[0154] In this embodiment, since the chess piece size is fixed by default, the candidate box size is also fixed. Here, 79*79 can be selected as the preset candidate box size, i.e., the preset region size. A 79*79 candidate box to be classified is generated at the center point of each chess piece, serving as the at least one candidate region, as shown by the bounding boxes in Figure 13. The positioning and recognition device feeds the current scene image containing multiple candidate boxes to be classified into a convolutional neural network with ResNet18 as the backbone. Then, a fully connected network transforms the feature tensor output by the ResNet18 backbone into a one-dimensional vector. In this way, the 79*79*3 dimensional pre-classified image of each candidate box is transformed into a 1*14 dimensional vector, with each dimension corresponding to the confidence score of a chess piece category. Here, the 79*79*3 dimensions correspond to the length, width, and RGB channels of the pre-classified image, respectively, and each dimension of the 1*14 dimensional vector corresponds to an object category. When the at least one object is a chess piece, the dimensions of the 1*14 dimensional vector can be as shown in Figure 14 and include: w_chariot, corresponding to the white "chariot"; w_horse, corresponding to the white "horse"; w_elepha, corresponding to the white "elephant"; w_general, corresponding to the white "general"; w_advisor, corresponding to the white "advisor"; w_cannon, corresponding to the white "cannon"; w_soldier, corresponding to the white "pawn"; r_soldier, corresponding to the red "soldier"; r_cannon, corresponding to the red "cannon"; r_chariot, corresponding to the red "chariot"; r_horse, corresponding to the red "horse"; r_elepha, corresponding to the red "elephant"; r_general, corresponding to the red "general"; and r_advisor, corresponding to the red "advisor". For each candidate box to be classified, the positioning and recognition device selects the dimension with the highest value in the 1*14 dimensional vector as the object category of the candidate box, thus obtaining the chess piece category at the corresponding position. As shown in Figure 14, the positioning and recognition device uses template matching to obtain pre-classified images from the original current scene image of a chessboard, then uses a neural network to obtain the object category of each pre-classified image, and finally obtains the state information of all chess pieces on the chessboard image.
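Selecting the category from the 1*14 confidence vector could be sketched as below. The order of the category names in `CATEGORIES` follows the order in which they are listed in the text and is an assumption about how the vector is laid out. The function name `classify` and the example confidence values are made up for illustration.

```python
# Assumed order of the 14 dimensions, following the listing in the text.
CATEGORIES = [
    "w_chariot", "w_horse", "w_elepha", "w_general", "w_advisor",
    "w_cannon", "w_soldier",
    "r_soldier", "r_cannon", "r_chariot", "r_horse", "r_elepha",
    "r_general", "r_advisor",
]

def classify(confidences):
    """Return the category whose dimension has the highest confidence."""
    if len(confidences) != len(CATEGORIES):
        raise ValueError("expected one confidence value per category")
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return CATEGORIES[best]

# Illustrative confidence vector for one candidate box.
confidences = [0.01] * 14
confidences[4] = 0.87
label = classify(confidences)   # "w_advisor"
```

Applying `classify` to the vector of every candidate box gives one category per predicted position, which together describe the state of the board.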

[0155] In some embodiments, since the state of the pieces changes during a game of chess, the positioning and recognition device can capture an image of the current scene of the chessboard at preset time intervals and apply the method of this embodiment to the latest image of the current scene. Alternatively, the positioning and recognition device can also capture an image of the current scene of the chessboard before each move in the game and apply the method of this embodiment to the latest image of the current scene. The specific choice depends on the actual situation, and this embodiment does not limit the choice.

[0156] It is understood that the Chinese chess recognition and positioning method provided in this application can achieve precise positioning of chess pieces, thereby providing visual support for robot chess playing, and does not require modification of the chessboard and pieces; common chess pieces with rings can meet the requirements. Experiments have shown that when this application embodiment is applied to a robot chess playing scenario, the recognition and positioning accuracy can reach 1mm, which meets the requirements of mechanical gripper grasping.

[0157] The following continues to describe the exemplary structure of the positioning and identification device 255 provided in the embodiments of this application as a software module. In some embodiments, as shown in Figure 3, the software modules of the positioning and identification device 255 stored in the memory 250 may include:

[0158] The generation module 2551 is used to obtain the preset common image features corresponding to at least one object in the scene image and generate a matching template.

[0159] The positioning module 2552 is used to slide the matching template in the scene image and obtain at least one predicted position information corresponding to the at least one object based on the matching degree between the matching template and the corresponding image part at at least one preset sliding position in the scene image.

[0160] The recognition module 2553 is used to perform target classification recognition on at least one pre-classified image corresponding to at least one predicted location information in the scene image, and obtain the classification recognition result of at least one object; the at least one object belongs to at least one object category.

[0161] In some embodiments, the positioning module 2552 is further configured to align the center position of the matching template with the at least one preset sliding position to obtain the region to be matched corresponding to the matching template at each preset sliding position; calculate the matching degree between the matching template and the image portion within the region to be matched to obtain the matching score corresponding to each preset sliding position; and determine the at least one predicted position information from the at least one preset sliding position according to the preset matching strategy and the matching score.

[0162] In some embodiments, the recognition module 2553 is further configured to generate at least one candidate region based on a preset region size on the at least one predicted location information; use the image portion within the at least one candidate region as the at least one pre-classified image; perform classification prediction of the at least one object category on each of the at least one pre-classified images to obtain a prediction result for each object category corresponding to each pre-classified image; and obtain a classification recognition result for the at least one object based on the prediction result for each object category corresponding to each pre-classified image.

[0163] In some embodiments, the generation module 2551 is further configured to acquire an image of a scene containing at least one object from a preset acquisition location using an image acquisition device to obtain the scene image; extract the image portion corresponding to a single object from the scene image as a template image; and perform image segmentation on the template image according to the preset common image features to obtain the matching template.

[0164] In some embodiments, the matching degree calculation includes:

[0165] Any one of the following: squared difference matching algorithm, correlation matching algorithm, and standard matching algorithm.

[0166] In some embodiments, the preset common image features include:

[0167] At least one of the following: contour features, pattern features, color features, and texture features.

[0168] It should be noted that the description of the above device embodiments is similar to the description of the above method embodiments, and has similar beneficial effects. For technical details not disclosed in the device embodiments of the present invention, please refer to the description of the method embodiments of the present invention for understanding.

[0169] This application provides a computer storage medium storing executable instructions. This computer storage medium is a computer-readable storage medium in which executable instructions are stored. When the executable instructions are executed by a processor, they cause the processor to execute the method provided in this application embodiment, for example, the method shown in Figure 4, Figure 5, Figure 9, or Figure 11.

[0170] In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or it may be a variety of devices including one or any combination of the above-mentioned memories.

[0171] In some embodiments, executable instructions may take the form of a program, software, software module, script, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

[0172] As an example, executable instructions may, but do not necessarily, correspond to files in the file system. They may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple collaborative files (e.g., a file that stores one or more modules, subroutines, or code sections).

[0173] As an example, executable instructions can be deployed to execute on a single computing device, or on multiple computing devices located in one location, or on multiple computing devices distributed across multiple locations and interconnected via a communication network.

[0174] In summary, by generating a matching template based on preset common image features of at least one object and obtaining at least one predicted position information of at least one object in the scene image through template matching, the accuracy of object localization can be improved. Furthermore, by performing target classification and recognition on at least one pre-classified image corresponding to at least one predicted position information, interference from irrelevant information on object classification and recognition is reduced, thus improving the accuracy of classification and recognition. The localization and recognition method in this embodiment can be applied not only to scenarios involving robots playing chess, but also to the localization of other objects with common stable features, such as the automatic detection and recognition of products on an assembly line, demonstrating good localization and recognition performance.
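The localization step summarized above can be illustrated with a minimal sketch. It is not the implementation of this application: it assumes grayscale images stored as nested lists, uses the squared difference as the matching degree (so a lower score means a better match), and indexes each sliding position by the template's top-left corner rather than its center, which differs only by a constant offset. The names `match_score_map` and `pick_positions`, the threshold `max_ssd`, and the spacing rule `min_gap` are hypothetical and stand in for the preset matching score and preset matching strategy.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def match_score_map(scene: Image, template: Image) -> List[List[float]]:
    """Slide the template over every position where it fits entirely and
    return the sum of squared differences at each position (lower is better)."""
    th, tw = len(template), len(template[0])
    sh, sw = len(scene), len(scene[0])
    scores = []
    for top in range(sh - th + 1):
        row = []
        for left in range(sw - tw + 1):
            ssd = sum(
                (scene[top + y][left + x] - template[y][x]) ** 2
                for y in range(th)
                for x in range(tw)
            )
            row.append(ssd)
        scores.append(row)
    return scores


def pick_positions(
    scores: List[List[float]], num_objects: int, max_ssd: float, min_gap: int
) -> List[Tuple[int, int]]:
    """Pick up to num_objects positions whose score is at most max_ssd,
    best first, skipping any position closer than min_gap (Chebyshev
    distance) to one already chosen. The spacing rule is an assumption."""
    candidates = sorted(
        (score, top, left)
        for top, row in enumerate(scores)
        for left, score in enumerate(row)
        if score <= max_ssd
    )
    chosen: List[Tuple[int, int]] = []
    for _, top, left in candidates:
        if all(max(abs(top - t), abs(left - l)) >= min_gap for t, l in chosen):
            chosen.append((top, left))
        if len(chosen) == num_objects:
            break
    return chosen


# Toy scene: a 5 x 6 blank image with the same 2 x 2 pattern stamped twice.
template = [[5, 9], [9, 5]]
scene = [[0] * 6 for _ in range(5)]
for top, left in [(0, 1), (3, 4)]:
    for y in range(2):
        for x in range(2):
            scene[top + y][left + x] = template[y][x]

score_map = match_score_map(scene, template)
positions = pick_positions(score_map, num_objects=2, max_ssd=1.0, min_gap=2)
print(positions)
```

Because the number of objects is known in advance, the selection stops as soon as that many positions have been found. A caller can detect missing objects by checking whether fewer positions were returned.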

[0175] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, and improvements made within the spirit and scope of this application are included within the scope of protection of this application.

Claims

1. A positioning and identification method, characterized in that it comprises: obtaining preset common image features, including contour features, pattern features, color features and texture features, corresponding to at least one object in a scene image; using an image portion corresponding to a single object extracted from the scene image as a template image; performing image segmentation on the template image according to the preset common image features to generate a matching template; sliding the matching template across the scene image to traverse the scene image and obtain a matching score distribution map; based on a preset number of objects of the at least one object, determining, from the matching score distribution map, preset sliding positions that reach a preset matching score and whose number equals the number of objects, as at least one piece of predicted position information corresponding to the at least one object; generating, based on the at least one piece of predicted position information, at least one candidate region according to a preset region size, and using the image portion within the at least one candidate region as at least one pre-classified image; and performing, in the scene image, target classification and recognition on the at least one pre-classified image corresponding to the at least one piece of predicted position information to obtain a classification and recognition result of the at least one object; wherein the at least one object belongs to at least one object category.

2. The method according to claim 1, characterized in that the step of traversing the scene image to obtain a matching score distribution map comprises: aligning the center position of the matching template with each of at least one preset sliding position to obtain a region to be matched of the matching template at each preset sliding position; and calculating the matching degree between the matching template and the image portion within the region to be matched to obtain a matching score corresponding to each preset sliding position, and obtaining the matching score distribution map based on the matching scores; and the step of determining, from the matching score distribution map, preset sliding positions that reach a preset matching score and whose number equals the number of objects, as at least one piece of predicted position information corresponding to the at least one object, comprises: based on a preset matching strategy and the matching scores, determining, from the at least one preset sliding position in the matching score distribution map, the at least one piece of predicted position information that reaches the preset matching score and whose number equals the number of objects.

3. The method according to claim 1, characterized in that the step of performing, in the scene image, target classification and recognition on the at least one pre-classified image corresponding to the at least one piece of predicted position information to obtain the classification and recognition result of the at least one object comprises: for each of the at least one pre-classified image, performing classification prediction over the at least one object category to obtain a prediction result for each object category corresponding to that pre-classified image; and obtaining the classification and recognition result of the at least one object based on the prediction results for each object category corresponding to each pre-classified image.

4. The method according to claim 1, characterized in that the method further comprises: acquiring, by an image acquisition device and from a preset acquisition position, an image of a scene containing the at least one object to obtain the scene image.

5. The method according to claim 2, characterized in that the calculating of the matching degree between the matching template and the image portion within the region to be matched comprises: calculating the matching degree using any one of a squared difference matching algorithm, a correlation matching algorithm, or a standard matching algorithm.

6. A positioning and identification system, characterized in that it comprises: an image acquisition device, configured to acquire, from a preset acquisition position, an image of a scene containing at least one object, thereby obtaining a scene image; a positioning and recognition device, configured to: extract a template image corresponding to a single object from the scene image, and perform image segmentation on the template image based on preset common image features including contour features, pattern features, color features and texture features to obtain a matching template; slide the matching template across the scene image to traverse the scene image and obtain a matching score distribution map; based on a preset number of objects of the at least one object, determine, from the matching score distribution map, preset sliding positions that reach a preset matching score and whose number equals the number of objects, as at least one piece of predicted position information corresponding to the at least one object; generate, based on the at least one piece of predicted position information, at least one candidate region according to a preset region size; use the image portion within the at least one candidate region as at least one pre-classified image; and perform, in the scene image, target classification and recognition on the at least one pre-classified image corresponding to the at least one piece of predicted position information to obtain a classification and recognition result of the at least one object, wherein the at least one object belongs to at least one object category; a control device, configured to generate an operation instruction for a target object among the at least one object based on the at least one piece of predicted position information and the classification and recognition result of the at least one object; and an execution device, configured to operate on the target object according to the operation instruction.

7. A positioning and identification device, characterized in that it comprises: a generation module, configured to obtain preset common image features, including contour features, pattern features, color features and texture features, corresponding to at least one object in a scene image; to use an image portion corresponding to a single object extracted from the scene image as a template image; and to perform image segmentation on the template image according to the preset common image features to generate a matching template; a positioning module, configured to slide the matching template across the scene image to traverse the scene image and obtain a matching score distribution map, and, based on a preset number of objects of the at least one object, to determine, from the matching score distribution map, preset sliding positions that reach a preset matching score and whose number equals the number of objects, as at least one piece of predicted position information corresponding to the at least one object; and a recognition module, configured to generate, based on the at least one piece of predicted position information, at least one candidate region according to a preset region size; to use the image portion within the at least one candidate region as at least one pre-classified image; and to perform, in the scene image, target classification and recognition on the at least one pre-classified image corresponding to the at least one piece of predicted position information to obtain a classification and recognition result of the at least one object; wherein the at least one object belongs to at least one object category.

8. The device according to claim 7, characterized in that the recognition module is further configured to: for each of the at least one pre-classified image, perform classification prediction over the at least one object category to obtain a prediction result for each object category corresponding to that pre-classified image; and obtain the classification and recognition result of the at least one object based on the prediction results for each object category corresponding to each pre-classified image.

9. A positioning and identification equipment, characterized in that it comprises: a memory, configured to store executable instructions; and a processor, configured to implement the method according to any one of claims 1 to 5 when executing the executable instructions stored in the memory.

10. A computer storage medium, characterized in that it stores executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 5.
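As an illustration only, and not part of the claims, the candidate-region and classification steps recited in claims 1 and 3 might be sketched as follows. The sketch assumes grayscale images stored as nested lists with pixel values in [0, 1]. The function names `crop_region`, `classify_regions`, and `brightness_scores` are hypothetical, and the brightness rule is a toy stand-in for whatever trained classifier would produce the per-category prediction results.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # grayscale image, pixel values assumed in [0, 1]


def crop_region(scene: Image, center: Tuple[int, int], size: int) -> Image:
    """Crop a candidate region of the preset size, approximately centered
    on the predicted position and clipped to the image bounds."""
    cy, cx = center
    half = size // 2
    top = max(0, cy - half)
    left = max(0, cx - half)
    bottom = min(len(scene), top + size)
    right = min(len(scene[0]), left + size)
    return [row[left:right] for row in scene[top:bottom]]


def classify_regions(
    scene: Image,
    positions: List[Tuple[int, int]],
    size: int,
    score_fn: Callable[[Image], Dict[str, float]],
) -> List[Tuple[Tuple[int, int], str]]:
    """For each predicted position, crop the pre-classified image, score it
    against every object category, and keep the best-scoring category."""
    results = []
    for pos in positions:
        patch = crop_region(scene, pos, size)
        scores = score_fn(patch)  # prediction result per object category
        results.append((pos, max(scores, key=scores.get)))
    return results


def brightness_scores(patch: Image) -> Dict[str, float]:
    """Toy stand-in for a trained classifier: score two hypothetical
    categories by the mean brightness of the patch."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    return {"light": mean, "dark": 1.0 - mean}


scene = [
    [0.9, 0.9, 0.5, 0.5, 0.1, 0.1],
    [0.9, 0.9, 0.5, 0.5, 0.1, 0.1],
]
results = classify_regions(scene, [(1, 1), (1, 5)], size=2, score_fn=brightness_scores)
print(results)
```

Cropping before classification means the classifier only ever sees the pixels around a predicted position, which is how the method limits interference from the rest of the scene.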