Object recognition method, apparatus, electronic device, and storage medium

By collecting and locally filtering multiple sets of facial image sequences, the amount of data transmitted to the server is reduced, solving the problem of high data consumption in object recognition and improving recognition efficiency and accuracy.

CN115410241B · Active · Publication Date: 2026-06-16 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date: 2021-05-26
Publication Date: 2026-06-16


IPC: G06V40/16; G06V10/75

AI Tagging

Application Domain

Character and pattern recognition

Technical Efficacy Phrases

Reduce consumption; improve recognition efficiency


AI Technical Summary

⚠Technical Problem

In existing technologies, object recognition consumes a lot of bandwidth, which squeezes the bandwidth of the server's parallel data processing, resulting in low recognition efficiency.

⚗Method used

At least two sets of facial image sequences of the target object are collected, each set of sequences corresponding to one image type. First, the target facial image sequence to be identified is selected locally, then a detection request is sent to the server, image information is received and the matching target facial images are selected, and finally a recognition request is sent to perform object recognition.

🎯Benefits of technology

It effectively reduces traffic consumption during object recognition, improves recognition efficiency, and ensures that the server can quickly and accurately recognize objects.



Abstract

The application discloses an object recognition method and device, electronic equipment and storage medium, and relates to the technical field of cloud computing. The method comprises the following steps: collecting at least two groups of face image sequences of a target object, each group of face image sequences comprising at least one face image, and each group of face image sequences corresponding to one image type; determining a pre-recognized target face image sequence from the at least two groups of face image sequences; sending a detection request carrying the target face image sequence to a server, the detection request instructing the server to detect image information of a face image meeting a predetermined recognition condition from the target face image sequence; receiving image information returned by the server based on the detection request; screening out a target face image matching the image information; and sending a recognition request carrying the target face image to the server to instruct the server to recognize the target object based on the target face image. The application effectively reduces traffic consumption during object recognition and improves recognition efficiency.


Description

Technical Field

[0001] This application relates to the field of cloud computing technology, specifically to an object recognition method, apparatus, electronic device, and storage medium.

Background Technology

[0002] Object recognition is the process of identifying objects through facial images. For example, there are related technologies that use cloud computing for object recognition. Cloud computing is a computing model that distributes computing tasks across a resource pool consisting of a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed.

[0003] Currently, related technologies directly transmit large amounts of facial data to a server for object recognition. This results in high data consumption during object recognition and server bandwidth being squeezed due to parallel data processing, leading to low recognition efficiency.

Summary of the Invention

[0004] This application provides an object recognition method and related apparatus that can effectively reduce traffic consumption and improve recognition efficiency during object recognition.

[0005] To address the aforementioned technical problems, this application provides the following technical solutions:

[0006] According to one embodiment of this application, an object recognition method is provided, the method comprising: acquiring at least two sets of facial image sequences of a target object, each set of facial image sequences including at least one facial image, each set of facial image sequences corresponding to an image type; determining a target facial image sequence for pre-recognition from the at least two sets of facial image sequences; sending a detection request carrying the target facial image sequence to a server, the detection request instructing the server to detect image information of facial images that meet predetermined recognition conditions from the target facial image sequence; receiving the image information returned by the server based on the detection request; filtering out target facial images that match the image information from the at least two sets of facial image sequences; and sending a recognition request carrying the target facial image to the server to instruct the server to recognize the target object based on the target facial image.
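The terminal-side steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: frames are modeled as raw bytes, the "image information" is assumed to be a list of frame positions, and `recognize_object`, `fake_detect`, and `fake_recognize` are hypothetical names standing in for the real terminal logic and server calls.

```python
from typing import Callable, Dict, List

# Hypothetical representation: one list of frames (bytes) per image type.
Sequences = Dict[str, List[bytes]]


def recognize_object(
    sequences: Sequences,
    detect: Callable[[List[bytes]], List[int]],
    recognize: Callable[[Dict[str, List[bytes]]], str],
    pre_recognition_type: str = "color",
) -> str:
    """Terminal-side flow: detect on one sequence, recognize on the matches."""
    # Pick the sequence used for pre-recognition (one image type only).
    target_sequence = sequences[pre_recognition_type]
    # Detection request: the server replies with image information, modeled
    # here as positions of frames that meet the recognition condition.
    positions = detect(target_sequence)
    # Locally select the matching frames from every sequence.
    targets = {
        image_type: [frames[i] for i in positions]
        for image_type, frames in sequences.items()
    }
    # Recognition request carrying only the selected frames.
    return recognize(targets)


# Stand-in server calls, for illustration only.
def fake_detect(frames: List[bytes]) -> List[int]:
    # Pretend the longest frame is the most recognizable one.
    return [max(range(len(frames)), key=lambda i: len(frames[i]))]


def fake_recognize(targets: Dict[str, List[bytes]]) -> str:
    sent = sum(len(f) for frames in targets.values() for f in frames)
    return f"recognized from {sent} bytes"


sequences = {
    "color": [b"c0", b"c1-long", b"c2"],
    "infrared": [b"i0", b"i1-long", b"i2"],
    "depth": [b"d0", b"d1-long", b"d2"],
}
result = recognize_object(sequences, fake_detect, fake_recognize)
```

Only one sequence goes out in the detection request, and only the selected frames go out in the recognition request, which is where the traffic saving described in this application would come from.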

[0007] According to one embodiment of this application, an object recognition device includes: an acquisition module for acquiring at least two sets of facial image sequences of a target object, each set of facial image sequences including at least one facial image, and each set of facial image sequences corresponding to an image type; a determination module for determining a target facial image sequence for pre-recognition from the at least two sets of facial image sequences; a transmission module for sending a detection request carrying the target facial image sequence to a server, the detection request instructing the server to detect image information of facial images that meet predetermined recognition conditions from the target facial image sequence; a receiving module for receiving the image information returned by the server based on the detection request; a filtering module for filtering out target facial images that match the image information from the at least two sets of facial image sequences; and a recognition module for sending a recognition request carrying the target facial image to the server to instruct the server to recognize the target object based on the target facial image.

[0008] In some embodiments of this application, the acquisition module includes: an image stream acquisition unit, used to acquire at least two sets of facial image streams of a target object, each set of facial image streams including at least one initial facial image, and each set of facial image streams corresponding to an image type; an alignment unit, used to convert the initial facial images corresponding to the same acquisition time point between the facial image streams into facial images with pixel points aligned to the same coordinate space; and a sequence generation unit, used to concatenate the facial images obtained from the conversion of the initial facial images in each set of facial image streams to generate the at least two sets of facial image sequences.

[0009] In some embodiments of this application, the alignment unit includes: a time alignment subunit, used to perform time alignment processing on the at least two sets of face image streams to determine initial face images corresponding to the same exposure time between the face image streams; and a spatial alignment subunit, used to align the pixels in the initial face images corresponding to the same exposure time to the coordinate space corresponding to the target face image stream, to obtain the face image with pixels aligned to the same coordinate space.
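As a rough illustration of the time alignment subunit, the sketch below keeps only the exposure times at which every stream has a frame. It assumes each frame carries an integer exposure timestamp and that timestamps match exactly across streams; a real device would more likely need a tolerance window. The name `time_align` and the `(timestamp, payload)` frame layout are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = Tuple[int, bytes]  # (exposure timestamp in ms, image payload)


def time_align(streams: Dict[str, List[Frame]]) -> List[Dict[str, bytes]]:
    """Return, per shared exposure time, one frame from every stream."""
    # Index each stream by timestamp for direct lookup.
    indexed = {name: dict(frames) for name, frames in streams.items()}
    # Exposure times present in all streams.
    common = set.intersection(*(set(by_ts) for by_ts in indexed.values()))
    return [
        {name: by_ts[ts] for name, by_ts in indexed.items()}
        for ts in sorted(common)
    ]


streams = {
    "color": [(0, b"c0"), (33, b"c1"), (66, b"c2")],
    "infrared": [(0, b"i0"), (33, b"i1")],
    "depth": [(33, b"d1"), (66, b"d2")],
}
aligned = time_align(streams)
```

In this example only the frames exposed at 33 ms survive, because that is the only time point shared by all three streams.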

[0010] In some embodiments of this application, the at least two sets of facial image streams include a color image stream, an infrared image stream, and a depth image stream. The image type corresponding to the color image stream includes a color image, the image type corresponding to the infrared image stream includes an infrared image, and the image type corresponding to the depth image stream includes a depth image. The spatial alignment subunit includes one of a first alignment subunit and a second alignment subunit. The first alignment subunit is used to: align the pixels in the initial facial images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the color image stream, to obtain facial images with pixels aligned to the same coordinate space. The second alignment subunit is used to: align the pixels in the initial facial images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the infrared image stream, to obtain one type of facial image with pixels aligned to the same coordinate space.
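One common way to align a depth or infrared pixel to the color camera's coordinate space is pinhole reprojection with a rotation and a translation between the cameras. The sketch below assumes that model; the application does not state the camera model it uses, and `reproject_pixel` and the intrinsics tuple layout `(fx, fy, cx, cy)` are illustrative choices.

```python
from typing import List, Tuple

Intrinsics = Tuple[float, float, float, float]  # fx, fy, cx, cy (assumed layout)


def reproject_pixel(
    u: float,
    v: float,
    depth_m: float,
    src: Intrinsics,
    dst: Intrinsics,
    rotation: List[List[float]],
    translation: Tuple[float, float, float],
) -> Tuple[float, float]:
    """Map pixel (u, v) at a given depth from the source to the target camera."""
    fx_s, fy_s, cx_s, cy_s = src
    fx_d, fy_d, cx_d, cy_d = dst
    # Back-project the pixel to a 3D point in the source camera frame.
    point = (
        (u - cx_s) * depth_m / fx_s,
        (v - cy_s) * depth_m / fy_s,
        depth_m,
    )
    # Apply the rigid transform (rotation, then translation) to the target frame.
    moved = [
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    # Project the 3D point onto the target image plane.
    return (fx_d * moved[0] / moved[2] + cx_d, fy_d * moved[1] / moved[2] + cy_d)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
camera = (500.0, 500.0, 320.0, 240.0)
same = reproject_pixel(100, 200, 1.0, camera, camera, identity, (0.0, 0.0, 0.0))
shifted = reproject_pixel(100, 200, 1.0, camera, camera, identity, (0.1, 0.0, 0.0))
```

With identical cameras and no offset the pixel maps to itself; a 0.1 m horizontal baseline at 1 m depth moves it by 50 pixels for a 500-pixel focal length.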

[0011] In some embodiments of this application, at least two sets of face image sequences include color face image sequences, and the image type corresponding to the color face image sequences includes color images; the transmission module includes: a first transmission unit, configured to send a detection request carrying the color face image sequences to the server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the color face image sequences.

[0012] In some embodiments of this application, the first transmission unit is configured to: compress the color face image sequence into compressed color image data in a target format; and send a detection request carrying the compressed color image data to the server.
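The application does not name the target format. As a stand-in, the sketch below packs a sequence into a single payload with lossless `zlib` compression from the Python standard library; an actual system would more plausibly use an image or video codec. `pack_sequence`, `unpack_sequence`, and the 4-byte length prefix are assumptions made for this example.

```python
import zlib
from typing import List


def pack_sequence(frames: List[bytes], level: int = 6) -> bytes:
    """Concatenate length-prefixed frames and compress them as one payload."""
    raw = b"".join(len(f).to_bytes(4, "big") + f for f in frames)
    return zlib.compress(raw, level)


def unpack_sequence(payload: bytes) -> List[bytes]:
    """Inverse of pack_sequence: decompress and split the frames again."""
    raw = zlib.decompress(payload)
    frames, offset = [], 0
    while offset < len(raw):
        size = int.from_bytes(raw[offset:offset + 4], "big")
        offset += 4
        frames.append(raw[offset:offset + size])
        offset += size
    return frames


frames = [bytes(1000), bytes([7]) * 500]  # highly compressible dummy frames
payload = pack_sequence(frames)
restored = unpack_sequence(payload)
```

The dummy frames here are uniform, so they compress far better than real camera data would; the point is only the round trip from sequence to payload and back.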

[0013] In some embodiments of this application, the at least two sets of face image sequences include an infrared face image sequence and a depth face image sequence corresponding to the same camera. The image type corresponding to the infrared face image sequence includes an infrared image, and the image type corresponding to the depth face image sequence includes a depth image. The transmission module includes a second transmission unit, configured to send a detection request carrying the infrared face image sequence to the server. The detection request instructs the server to detect image information of face images that meet predetermined recognition conditions from the infrared face image sequence.

[0014] In some embodiments of this application, the second transmission unit is used to compress the infrared face image sequence into compressed infrared image data in a target format; and to send a detection request carrying the compressed infrared image data to the server.

[0015] In some embodiments of this application, the image information includes the position information, in the target face image sequence, of a face image whose recognition degree meets predetermined recognition conditions; the filtering module includes: a search unit, used to search for a face image corresponding to the position information from each group of the face image sequences; and a target determination unit, used to use the searched face image as the target face image.
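A minimal sketch of the search unit and target determination unit follows, assuming the position information is a list of zero-based frame indices that apply to every aligned sequence. `select_targets` is a hypothetical name, and skipping out-of-range positions is a design choice of this example, not something the application specifies.

```python
from typing import Dict, List, Sequence


def select_targets(
    sequences: Dict[str, Sequence[bytes]], positions: List[int]
) -> Dict[str, List[bytes]]:
    """Look up the frames at the given positions in every sequence."""
    selected = {}
    for image_type, frames in sequences.items():
        # Ignore positions that fall outside a (possibly shorter) sequence.
        selected[image_type] = [
            frames[i] for i in positions if 0 <= i < len(frames)
        ]
    return selected


sequences = {
    "color": [b"c0", b"c1", b"c2"],
    "infrared": [b"i0", b"i1", b"i2"],
    "depth": [b"d0", b"d1", b"d2"],
}
targets = select_targets(sequences, [2])
```

Because the sequences are aligned frame by frame, one position returned for the pre-recognition sequence identifies the matching frame in each of the other sequences.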

[0016] According to one embodiment of this application, an object recognition method includes: receiving a detection request sent by a terminal carrying a target face image sequence, wherein the target face image sequence originates from at least two sets of face image sequences of a target object, each set of face image sequences including at least one face image, and each set of face image sequences corresponding to an image type; detecting image information of face images that meet predetermined recognition conditions from the target face image sequence according to the detection request; transmitting the image information to the terminal so that the terminal can filter out target face images that match the image information from the at least two sets of face image sequences; receiving a recognition request sent by the terminal carrying the target face image; and recognizing the target object based on the target face image according to the recognition request.
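The server-side counterpart can be sketched as two handlers, one per request type. Everything concrete here is an assumption: `RecognitionServer`, the per-frame `quality` callable, the `min_quality` threshold, and the `identify` callable are hypothetical stand-ins for whatever detection and recognition models the server actually runs.

```python
from typing import Callable, Dict, List, Optional

Frames = List[bytes]


class RecognitionServer:
    """Server-side handlers for the detection and recognition requests."""

    def __init__(
        self,
        quality: Callable[[bytes], float],
        min_quality: float,
        identify: Callable[[Dict[str, Frames]], Optional[str]],
    ) -> None:
        self.quality = quality          # hypothetical per-frame quality model
        self.min_quality = min_quality  # hypothetical recognition condition
        self.identify = identify        # hypothetical recognizer

    def handle_detection(self, target_sequence: Frames) -> List[int]:
        # Image information is modeled as the positions of qualifying frames.
        return [
            i for i, frame in enumerate(target_sequence)
            if self.quality(frame) >= self.min_quality
        ]

    def handle_recognition(self, targets: Dict[str, Frames]) -> Optional[str]:
        # Recognize the target object from the frames the terminal selected.
        return self.identify(targets)


server = RecognitionServer(
    quality=lambda frame: len(frame) / 10,
    min_quality=0.5,
    identify=lambda targets: "user-001" if targets.get("color") else None,
)
positions = server.handle_detection([b"abc", b"abcdefgh", b"ab"])
identity = server.handle_recognition({"color": [b"abcdefgh"]})
```

The detection handler returns only lightweight position data, so the response to the terminal stays small regardless of how large the frames are.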

[0017] According to one embodiment of this application, an object recognition device includes: a detection request receiving module, configured to receive a detection request sent by a terminal carrying a target face image sequence, wherein the target face image sequence originates from at least two sets of face image sequences of a target object, each set of face image sequences including at least one face image, and each set of face image sequences corresponding to an image type; a detection module, configured to detect image information of face images that meet predetermined recognition conditions from the target face image sequence according to the detection request; a sending module, configured to transmit the image information to the terminal, so that the terminal can filter out target face images that match the image information from the at least two sets of face image sequences; a recognition request receiving module, configured to receive a recognition request sent by the terminal carrying the target face image; and a recognition response module, configured to recognize the target object based on the target face image according to the recognition request.

[0018] In some embodiments of this application, the detection module includes: a feature extraction unit, used to perform feature extraction processing on the face images in the target face image sequence to obtain image feature information of the face images in the target image sequence; a recognition degree analysis unit, used to perform recognition degree analysis based on the image feature information of the face images in the target face image sequence to obtain the recognition degree of the face images in the target face image sequence; and an information extraction unit, used to extract image information corresponding to face images whose recognition degree meets predetermined recognition conditions from the target face image sequence.
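The three units of the detection module can be illustrated with a toy pipeline. The features (mean brightness and contrast) and the scoring formula are invented for this example; the application does not say how recognition degree is computed, and a production system would likely use a learned quality model. Images are modeled as flat lists of 8-bit grayscale pixels.

```python
from statistics import mean, pstdev
from typing import Dict, List

Image = List[int]  # flattened 8-bit grayscale pixels (hypothetical representation)


def extract_features(image: Image) -> Dict[str, float]:
    """Feature extraction unit: toy features, not a real face descriptor."""
    return {"brightness": mean(image), "contrast": pstdev(image)}


def recognition_degree(features: Dict[str, float]) -> float:
    """Recognition degree analysis unit: toy score in [0, 1]."""
    # Penalize under- or over-exposure, reward contrast up to a cap.
    exposure = 1.0 - abs(features["brightness"] - 128) / 128
    contrast = min(features["contrast"] / 64, 1.0)
    return exposure * contrast


def detect_image_info(sequence: List[Image], min_degree: float = 0.5) -> List[int]:
    """Information extraction unit: positions of frames meeting the condition."""
    return [
        i for i, image in enumerate(sequence)
        if recognition_degree(extract_features(image)) >= min_degree
    ]


dark = [10] * 16
well_exposed = [64, 192] * 8
overexposed = [250, 255] * 8
info = detect_image_info([dark, well_exposed, overexposed])
```

Only the well-exposed, high-contrast frame passes the threshold, so its position is the image information returned to the terminal.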

[0019] In some embodiments of this application, the target face image includes a first face image of color type, a second face image of depth type, and a third face image of infrared type; the recognition response module includes: a first response unit, used to perform liveness detection based on the second face image and the third face image to determine the authenticity of the target object; and a second response unit, used to perform facial feature comparison recognition based on the first face image when the authenticity meets the target authenticity condition, so as to identify the target object.
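The two-stage response (liveness first, then feature comparison) might look like the sketch below. The liveness model is passed in as a callable, the toy depth-range check is only a placeholder, and comparing embeddings by cosine similarity is a common technique assumed here rather than taken from the application. All names and thresholds are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def respond(
    color_embedding: List[float],
    depth_image: List[int],
    infrared_image: List[int],
    liveness: Callable[[List[int], List[int]], float],
    gallery: Dict[str, List[float]],
    min_liveness: float = 0.9,
    min_similarity: float = 0.8,
) -> Optional[str]:
    """Liveness check on depth and infrared, then comparison on color."""
    # First response unit: reject the request if the face is not judged live.
    if liveness(depth_image, infrared_image) < min_liveness:
        return None
    # Second response unit: best gallery match above the similarity threshold.
    best_identity, best_score = None, min_similarity
    for identity, reference in gallery.items():
        score = cosine_similarity(color_embedding, reference)
        if score >= best_score:
            best_identity, best_score = identity, score
    return best_identity


def toy_liveness(depth: List[int], infrared: List[int]) -> float:
    # Placeholder: a flat depth map (e.g. a printed photo) is treated as not live.
    return 1.0 if max(depth) - min(depth) > 10 else 0.0


gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
live = respond([0.9, 0.1], [400, 430, 450], [90, 95, 99], toy_liveness, gallery)
spoof = respond([0.9, 0.1], [500, 500, 500], [90, 95, 99], toy_liveness, gallery)
```

Running the liveness check before the comparison means a spoofed input never reaches the identity match, which mirrors the ordering of the two response units.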

[0020] According to another embodiment of this application, an electronic device may include: a memory storing computer-readable instructions; and a processor reading the computer-readable instructions stored in the memory to execute the methods described in the embodiments of this application.

[0021] According to another embodiment of this application, a storage medium stores computer-readable instructions thereon, which, when executed by a computer's processor, cause the computer to perform the methods described in the embodiments of this application.

[0022] According to another embodiment of this application, a computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the methods provided in the various optional implementations described in the embodiments of this application.

[0023] In this embodiment of the application, when performing object recognition, at least two sets of facial image sequences of the target object are collected, each set of facial image sequences includes at least one facial image, and each set of facial image sequences corresponds to an image type; a target facial image sequence for pre-recognition is determined from the at least two sets of facial image sequences; a detection request carrying the target facial image sequence from the at least two sets of facial image sequences is sent to the server, the detection request instructing the server to detect image information of facial images that meet predetermined recognition conditions from the target facial image sequences; image information returned by the server based on the detection request is received; target facial images that match the image information are filtered from the at least two sets of facial image sequences; and a recognition request carrying the target facial image is sent to the server to instruct the server to recognize the target object based on the target facial image.

[0024] In this way, after acquiring at least two sets of facial image sequences, the pre-identified target facial image sequence is transmitted to the server for detection to obtain image information of facial images that meet the predetermined recognition conditions. Then, the target facial images that match the image information are selected from the at least two sets of facial image sequences and sent to the server for recognition. This avoids transmitting a large amount of facial image data directly to the server, effectively reducing the traffic consumption during the object recognition process. The server can quickly perform object recognition based on the target facial images that match the image information, thereby improving recognition efficiency while effectively reducing the traffic consumption during object recognition.

Attached Figure Description

[0025] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0026] Figure 1 shows a schematic diagram of a system to which embodiments of this application can be applied.

[0027] Figure 2 shows a schematic diagram of another system to which embodiments of this application can be applied.

[0028] Figure 3 shows a flowchart of an object recognition method according to an embodiment of this application.

[0029] Figure 4 shows a schematic diagram of the coordinate system transformation relationship according to an embodiment of this application.

[0030] Figure 5 shows a schematic diagram of coordinate system transformation relationships according to another embodiment of this application.

[0031] Figure 6 shows a schematic diagram of an image rotation and translation process according to an embodiment of this application.

[0032] Figure 7 shows a flowchart of an object recognition method according to another embodiment of this application.

[0033] Figure 8 shows a flowchart of object recognition in a related technology in a certain scenario.

[0034] Figure 9 shows a system flowchart of object recognition in an embodiment of this application.

[0035] Figure 10 shows a block diagram of an object recognition device according to an embodiment of this application.

[0036] Figure 11 shows a block diagram of an object recognition device according to another embodiment of this application.

[0037] Figure 12 shows a block diagram of an electronic device according to an embodiment of this application.

Detailed Implementation

[0038] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0039] In the following description, specific embodiments of this application are illustrated with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise stated. These steps and operations are therefore referred to at several points as being computer-executed; computer execution as referred to herein includes the manipulation, by a computer processing unit, of electronic signals representing data in a structured format. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations in memory that have specific properties defined by the data format. However, although the principles of this application are described in the foregoing terms, this is not intended to be limiting, and those skilled in the art will understand that many of the steps and operations described below can also be implemented in hardware.

[0040] Figure 1 shows a schematic diagram of a system 100 to which embodiments of this application can be applied. As shown in Figure 1, system 100 may include server 101 and terminal 102. Server 101 and terminal 102 can be connected directly or indirectly via wireless communication, and this application does not impose any special restrictions on this connection.

[0041] Data can be transmitted between server 101 and terminal 102 via a target protocol link. The target protocol link may include a transport layer protocol-based link, such as a Transmission Control Protocol (TCP) link or a User Datagram Protocol (UDP) link, as well as other transport layer protocols.
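TCP delivers a byte stream without message boundaries, so a request sent over such a link needs some framing. The sketch below uses a 4-byte length prefix, which is a common convention assumed for this example and not something the application specifies. To stay self-contained it uses `socket.socketpair()`, a connected pair of stream sockets, in place of a real TCP connection between terminal and server.

```python
import socket
import struct


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv may return fewer."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size)


# Connected stream sockets standing in for the terminal and server endpoints.
terminal_end, server_end = socket.socketpair()
send_message(terminal_end, b"detection-request")
received = recv_message(server_end)
terminal_end.close()
server_end.close()
```

Over UDP, each datagram already preserves message boundaries, so the length prefix would be unnecessary, but large image payloads would then need their own fragmentation and retransmission handling.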

[0042] Server 101 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

[0043] In one implementation of this example, server 101 is a cloud server that can provide artificial intelligence cloud services, such as AI cloud services for massively multiplayer online role-playing games (MMORPGs). AI cloud services are generally also referred to as AIaaS (AI as a Service). This is a mainstream service model for AI platforms. Specifically, AIaaS platforms break down several common AI services and provide them as independent or packaged services in the cloud. This service model is similar to opening an AI-themed marketplace: all developers can access and use one or more AI services provided by the platform through API interfaces. Some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy and maintain their own dedicated cloud AI services. For example, server 101 can provide AI-based facial image recognition services.

[0044] Terminal 102 can be any device. In one embodiment of this example, terminal 102 is a terminal that can collect facial images for facial recognition, such as a dedicated facial recognition payment terminal (for example, a facial recognition payment device in a store or on public transportation), a facial recognition access terminal (for example, a facial recognition device in an access control system), a mobile phone, a computer, a VR/AR device, or a smartwatch.

[0045] In one implementation of this example, referring to Figure 1, in step S110, terminal 102 can acquire at least two sets of facial image sequences of the target object. Each set of facial image sequences includes at least one facial image, and each set of facial image sequences corresponds to an image type. As shown in the terminal interface of terminal 102 in step S110, the interface of terminal 102 can display facial images during the acquisition process. The target facial image sequence for pre-recognition is determined from the at least two sets of facial image sequences. In step S120, terminal 102 can send a detection request carrying the target facial image sequence to server 101. The detection request instructs server 101 to detect image information of facial images that meet predetermined recognition conditions from the target facial image sequence. In step S130, terminal 102 can receive image information returned by server 101 based on the detection request. In step S140, terminal 102 can filter out the target face image that matches the image information from the at least two sets of face image sequences. In step S150, terminal 102 can send a recognition request carrying the target face image to server 101 to instruct server 101 to recognize the target object based on the target face image. In step S160, terminal 102 can receive the recognition result from server 101; as shown on the terminal interface of terminal 102 in step S160, the display information corresponding to the recognition result (such as payment success or payment failure) can be displayed on the terminal interface.

[0046] In one implementation of this example, referring further to Figure 1, in step S120, server 101 can receive a detection request carrying a target face image sequence sent by terminal 102. The target face image sequence comes from at least two sets of face image sequences of the target object. Each set of face image sequences includes at least one face image, and each set of face image sequences corresponds to an image type. In step S130, server 101 can detect image information of face images that meet predetermined recognition conditions from the target face image sequence according to the detection request. In step S130, server 101 can transmit the image information to terminal 102 so that terminal 102 can filter out the target face image that matches the image information from at least two sets of face image sequences. In step S150, server 101 can receive a recognition request carrying a target face image sent by terminal 102. In step S160, server 101 can recognize the target object based on the target face image according to the recognition request, and can return the recognition result to terminal 102.

[0047] Figure 2 shows a schematic diagram of another system 200 to which embodiments of this application can be applied. As shown in Figure 2, system 200 can be a distributed system formed by connecting client 201 and multiple nodes 202 through network communication.

[0048] Taking the case where the distributed system is a blockchain system as an example, see Figure 2, which is an optional structural diagram of the distributed system 200 provided in this application embodiment as applied to a blockchain system. It consists of multiple nodes 202 and clients 201, which form a peer-to-peer (P2P) network. The P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP). In the distributed system, any machine, such as a server, can join and become a node 202 (each node 202 can be, for example, a server such as server 101 in Figure 1). Node 202 can provide facial image recognition services. Each node includes a hardware layer, a middleware layer, an operating system layer, and an application layer.

[0049] Referring to Figure 2, the functions of each node in the blockchain system include:

[0050] 1) Routing: A basic function of nodes used to support communication between nodes.

[0051] In addition to routing capabilities, nodes can also have the following functions:

[0052] 2) Application: deployed in the blockchain to implement specific business needs. It records data related to the implementation of functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system. When the other nodes successfully verify the source and integrity of the record data, they add the record data to a temporary block.

[0053] For example, the business logic implemented by the application includes:

[0054] 2.1) Wallet: used to provide electronic currency transaction functions, including initiating transactions (i.e., sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes successfully verify the transaction, they store the transaction record data in a temporary block of the blockchain as a response acknowledging the validity of the transaction). The wallet also supports querying the remaining electronic currency at an electronic currency address.

[0055] 2.2) Shared ledger, used to provide functions such as storage, query and modification of ledger data. It sends the record data of the operation on the ledger data to other nodes in the blockchain system. After the other nodes verify the validity, as a response to acknowledge the validity of the ledger data, they store the record data in a temporary block. They can also send confirmation to the node that initiated the operation.

[0056] 2.3) Smart contracts are computerized protocols that can execute the terms of a contract. They are implemented through code deployed on a shared ledger that executes when certain conditions are met. Based on actual business needs, the code is used to complete automated transactions, such as querying the logistics status of goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. Of course, smart contracts are not limited to executing contracts for transactions; they can also execute contracts for processing received information.

[0057] 3) A blockchain consists of a series of blocks that are sequentially generated. Once a new block is added to the blockchain, it will not be removed. The blocks contain the data submitted by the nodes in the blockchain system.
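The append-only property of item 3) is usually obtained by having each block store a hash of its predecessor, so that altering an earlier block invalidates every later link. The application does not describe its block format; the sketch below assumes SHA-256 hash linking over JSON-encoded blocks, and `Chain`, `block_hash`, and the field names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def block_hash(block: Dict[str, Any]) -> str:
    """Deterministic SHA-256 digest of a block's JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class Chain:
    """Sequentially generated blocks, each linked to the previous one."""

    def __init__(self) -> None:
        self.blocks: List[Dict[str, Any]] = []

    def append(self, records: List[str]) -> Dict[str, Any]:
        # The first block has no predecessor, so it links to an all-zero hash.
        prev = block_hash(self.blocks[-1]) if self.blocks else "0" * 64
        block = {"index": len(self.blocks), "prev_hash": prev, "records": records}
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        # Every block must reference the hash of the block before it.
        return all(
            self.blocks[i]["prev_hash"] == block_hash(self.blocks[i - 1])
            for i in range(1, len(self.blocks))
        )


chain = Chain()
chain.append(["record-a"])
chain.append(["record-b"])
valid_before = chain.is_valid()
chain.blocks[0]["records"] = ["forged"]  # tamper with an earlier block
valid_after = chain.is_valid()
```

Tampering with the first block changes its hash, so the link stored in the second block no longer matches and validation fails.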

[0058] In one embodiment of this example, the terminal corresponding to client 201 can collect at least two sets of facial image sequences of the target object, each set of facial image sequences including at least one facial image, and each set of facial image sequences corresponding to an image type; determine the target facial image sequence to be identified from the at least two sets of facial image sequences; send a detection request carrying the target facial image sequence from the at least two sets of facial image sequences to the server corresponding to node 202, the detection request instructing the server corresponding to node 202 to detect image information of facial images that meet predetermined recognition conditions from the target facial image sequences; receive image information returned by the server corresponding to node 202 based on the detection request; filter out the target facial image that matches the image information from the at least two sets of facial image sequences; send a recognition request carrying the target facial image to the server corresponding to node 202 to instruct the server corresponding to node 202 to recognize the target object based on the target facial image; and receive the recognition result from the server corresponding to node 202.

[0059] In one implementation of this example, referring further to Figure 2, the server corresponding to node 202 can receive a detection request carrying a target face image sequence sent by the terminal corresponding to client 201. The target face image sequence comes from at least two sets of face image sequences of the target object. Each set of face image sequences includes at least one face image, and each set of face image sequences corresponds to an image type. According to the detection request, the server detects image information of face images that meet predetermined recognition conditions from the target face image sequence. The server transmits the image information to the terminal so that the terminal can filter out the target face image that matches the image information from the at least two sets of face image sequences. The server receives a recognition request carrying the target face image sent by the terminal. According to the recognition request, the server recognizes the target object based on the target face image and can return the recognition result to the terminal.

[0060] In this way, object recognition based on the blockchain network can be achieved, which can effectively reduce the traffic consumption during object recognition, improve recognition efficiency, and ensure the security of object recognition, while keeping the data load on the blockchain network low.

[0061] Figure 3 shows a flowchart of an embodiment of an object recognition method according to this application. The execution subject of this object recognition method can be any terminal, such as the terminal 102 shown in Figure 1 or the terminal corresponding to client 201 shown in Figure 2.

[0062] As shown in Figure 3, the object recognition method may include steps S310 to S360.

[0063] Step S310: Collect at least two sets of facial image sequences of the target object. Each set of facial image sequences includes at least one facial image, and each set of facial image sequences corresponds to an image type.

[0064] Step S320: Determine the target face image sequence for pre-identification from the at least two sets of face image sequences.

[0065] Step S330: Send a detection request carrying a target face image sequence to the server. The detection request instructs the server to detect image information of face images that meet predetermined recognition conditions from the target face image sequence.

[0066] Step S340: Receive image information returned by the server based on the detection request.

[0067] Step S350: Select the target face image that matches the image information from the at least two sets of face image sequences.

[0068] Step S360: Send a recognition request carrying a target face image to the server to instruct the server to recognize the target object based on the target face image.
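Steps S310 to S360 can be sketched from the terminal's point of view as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `FaceImage`, `FakeServer`, `recognize_object`, the `quality` field, and the 0.8 threshold are all hypothetical names and values introduced here, the server is simulated in memory, and the recognition request is assumed to carry only images from the sequences that were not already sent for detection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceImage:
    image_type: str  # "color", "infrared", or "depth"
    index: int       # position within its sequence (shared across sequences)
    quality: float   # hypothetical stand-in for real pixel data


class FakeServer:
    """Hypothetical in-memory stand-in for the server (not a real API)."""

    def __init__(self, min_quality=0.8):
        self.min_quality = min_quality
        self.images_received = 0

    def detect(self, target_sequence):
        # Handles the S330 detection request: report which positions qualify.
        self.images_received += len(target_sequence)
        return [img.index for img in target_sequence
                if img.quality >= self.min_quality]

    def recognize(self, target_images):
        # Handles the S360 recognition request.
        self.images_received += len(target_images)
        return "recognized" if target_images else "rejected"


def recognize_object(sequences, server, pre_id_type="color"):
    target_sequence = sequences[pre_id_type]           # S320
    positions = set(server.detect(target_sequence))    # S330 + S340
    target_images = [                                  # S350
        img
        for image_type, sequence in sequences.items()
        if image_type != pre_id_type
        for img in sequence
        if img.index in positions
    ]
    return server.recognize(target_images)             # S360


# S310 (simulated): three sequences of four images each.
qualities = [0.5, 0.9, 0.6, 0.95]
sequences = {
    image_type: [FaceImage(image_type, i, q) for i, q in enumerate(qualities)]
    for image_type in ("color", "infrared", "depth")
}
server = FakeServer()
result = recognize_object(sequences, server)
print(result, server.images_received)
```

In this simulated run the server receives 8 images in total (4 for detection plus 2 from each of the two remaining sequences), compared with 12 if all three sequences were uploaded directly.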

[0069] The following describes the specific process of each step in object recognition.

[0070] In step S310, at least two sets of facial image sequences of the target object are acquired, each set of facial image sequences includes at least one facial image, and each set of facial image sequences corresponds to an image type.

[0071] In this example implementation, the target object is the object to be identified, such as a target user or a target organism. The facial image is a captured image of the target object's face, such as the face of a target user or a target organism.

[0072] A facial image sequence is a sequence consisting of at least one facial image. At least two sets of facial image sequences are acquired for the target object, each set corresponding to a different image type. This allows for comprehensive recognition of the target object based on multiple image types, ensuring reliable recognition. After the terminal acquires at least two sets of facial image sequences for the target object, they can be cached locally on the terminal.

[0073] Image types can include three kinds: color images, depth images, and infrared images. The face image sequences can accordingly include three sets: a color face image sequence, an infrared face image sequence, and a depth face image sequence. The face images in the color face image sequence can be color images, the face images in the depth face image sequence can be depth images, and the face images in the infrared face image sequence can be infrared images.

[0074] In one embodiment, step S310 involves acquiring at least two sets of facial image sequences of the target object, each set of facial image sequences including at least one facial image, including:

[0075] Acquire at least two sets of facial image streams of the target object, each set of facial image streams includes at least one initial facial image, and each set of facial image streams corresponds to one image type; convert the initial facial images corresponding to the same acquisition time point between the facial image streams into facial images with pixels aligned to the same coordinate space;

[0076] The face images obtained by converting the initial face image in each face image stream are concatenated to generate at least two face image sequences.

[0077] The initial facial image is the raw facial image captured by the camera. A series of initial facial images can be captured within a predetermined acquisition time period to obtain a facial image stream. For example, 10 consecutive initial facial images can be captured within a predetermined 2 seconds to form a facial image stream consisting of 10 initial facial images.

[0078] The face image stream can include three sets: a color image stream, an infrared image stream, and a depth image stream. The initial face image in the color image stream can be a color image, the initial face image in the depth image stream can be a depth image, and the initial face image in the infrared image stream can be an infrared image.

[0079] At least two sets of facial image streams can be obtained by capturing images synchronously with at least one camera on the terminal. In one example, the terminal can be equipped with three cameras: a color camera, an infrared camera, and a depth camera. These three cameras simultaneously and continuously capture initial facial images of the target object, resulting in three sets of facial image streams: a color image stream, an infrared image stream, and a depth image stream. In another example, the terminal can be equipped with two cameras: a color camera and an infrared camera. The color camera continuously captures initial facial images (color images) formed by natural light, obtaining a color image stream. The infrared camera continuously captures initial facial images (infrared images) formed by diffuse infrared light, obtaining an infrared image stream. At the same time, the infrared camera continuously captures speckle-structured infrared light to obtain speckle, and a depth unit analyzes the speckle to produce initial facial images (depth images), resulting in a depth image stream. Here, a depth image is an image or image channel containing information about the distance from the surfaces of scene objects to the viewpoint. Each pixel in the depth image represents the perpendicular distance between the camera plane and the plane of the photographed object, typically stored as a 16-bit value in millimeters.
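As a small illustration of the 16-bit millimeter depth representation, the sketch below decodes one row of raw depth samples into meters. The function name `decode_depth_row` and the little-endian unsigned layout are assumptions made for this example; the actual byte order and signedness depend on the depth camera.

```python
import struct


def decode_depth_row(raw_bytes):
    """Decode little-endian unsigned 16-bit depth samples (mm) into meters.

    The byte layout is an assumption for illustration only.
    """
    count = len(raw_bytes) // 2
    millimeters = struct.unpack(f"<{count}H", raw_bytes[:count * 2])
    return [mm / 1000.0 for mm in millimeters]


# Three samples: 0.5 m, 1 m, and the largest value a 16-bit sample can hold.
row = struct.pack("<3H", 500, 1000, 65535)
print(decode_depth_row(row))
```

With this layout a single sample can represent distances up to 65535 mm, which is far beyond the working range of face capture.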

[0080] Each initial face image in a face image stream corresponds to a capture time point, which can be the camera's exposure time. The face image streams contain initial face images with matching capture time points; that is, for a given capture time point, each face image stream includes an initial face image captured at that time point. For example, face image stream A includes initial face images A1, A2, and A3, and face image stream B includes initial face images B1, B2, and B3. If the capture time point corresponding to A1 is X, and the capture time point corresponding to B1 is X, then the initial face images with the same capture time point X between face image streams A and B are A1 and B1.

[0081] The coordinate space refers to the spatial coordinate system corresponding to the pixels when the initial facial image is captured under the field of view of the camera. It can be understood that different cameras may capture facial image streams corresponding to different coordinate spaces due to differences in their setting position and orientation. The initial facial images corresponding to the same acquisition time point between facial image streams are converted into facial images with pixels aligned to the same coordinate space. For example, for the initial facial images A1 and B1 corresponding to the same acquisition time point X between facial image streams A and B, A1 and B1 are converted into facial images A11 and B11 with pixels aligned to the same coordinate space. This same coordinate space can be a specific coordinate space or the coordinate space corresponding to the target facial image stream in at least two sets of facial image streams. In this example implementation, the same coordinate space can be the coordinate space corresponding to the target facial image stream (e.g., facial image stream A) in at least two sets of facial image streams.

[0082] Finally, the face images converted from the initial face images in each face image stream are concatenated. For example, the face images converted from the initial face images in face image stream A are concatenated sequentially and numbered in order of acquisition time point to generate the face image sequence corresponding to face image stream A; processing every face image stream in this way generates the at least two face image sequences.
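The concatenation and numbering described above can be sketched as follows. The function name `build_sequence` and the `(capture_time, image)` pair layout are hypothetical choices for this example, and numbering from 1 is an assumption.

```python
def build_sequence(converted_frames):
    """Concatenate one stream's converted face images into a numbered sequence.

    converted_frames: list of (capture_time, image) pairs for a single stream.
    Images are numbered from 1 in order of capture time.
    """
    ordered = sorted(converted_frames, key=lambda frame: frame[0])
    return [
        {"id": number, "capture_time": capture_time, "image": image}
        for number, (capture_time, image) in enumerate(ordered, start=1)
    ]


# Two streams captured at the same time points (arrival order may differ).
sequence_a = build_sequence([(0.2, "A12"), (0.0, "A11"), (0.4, "A13")])
sequence_b = build_sequence([(0.0, "B11"), (0.2, "B12"), (0.4, "B13")])
print([(f["id"], f["image"]) for f in sequence_a])
print([(f["id"], f["image"]) for f in sequence_b])
```

Because both streams share the same capture time points, images with the same number in different sequences were captured at the same moment, which is what later allows a number returned by the server to be looked up in every sequence.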

[0083] In this way, at least two sets of facial image sequences are generated by processing at least two sets of facial image streams, and the target object can be reliably identified based on the facial images in the at least two sets of facial image sequences.

[0084] In one embodiment, the step of converting initial facial images from the same acquisition time point in a facial image stream into facial images with pixels aligned to the same coordinate space includes:

[0085] At least two sets of face image streams are time-aligned to determine the initial face images corresponding to the same exposure time between the face image streams; the pixels in the initial face images corresponding to the same exposure time are aligned to the coordinate space corresponding to the target face image stream to obtain face images with pixels aligned to the same coordinate space.

[0086] When acquiring at least two sets of facial image streams, the cameras corresponding to each set of facial image streams are exposed simultaneously. When receiving the initial facial image, the timestamp corresponding to the exposure time is recorded. Based on the timestamp corresponding to the exposure time of each initial facial image, time alignment processing can be performed on at least two sets of facial image streams to determine the initial facial images corresponding to the same timestamp between the facial image streams, that is, the initial facial images corresponding to the same exposure time. Here, the exposure time refers to the time interval from the opening to the closing of the shutter.
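The timestamp-based time alignment can be sketched as below. This is a minimal illustration that assumes integer millisecond timestamps which match exactly across simultaneously exposed cameras; a real system might need a tolerance window. The name `time_align` and the data layout are hypothetical.

```python
def time_align(streams):
    """Group initial face images that share the same exposure timestamp.

    streams: dict mapping stream name -> list of (timestamp_ms, image).
    Returns a list of (timestamp_ms, {stream name: image}) in time order,
    keeping only timestamps that are present in every stream.
    """
    by_stream = {name: dict(frames) for name, frames in streams.items()}
    common = set.intersection(*(set(frames) for frames in by_stream.values()))
    return [
        (ts, {name: frames[ts] for name, frames in by_stream.items()})
        for ts in sorted(common)
    ]


streams = {
    "A": [(100, "A1"), (200, "A2"), (300, "A3")],
    "B": [(100, "B1"), (200, "B2"), (400, "B4")],
}
for timestamp, images in time_align(streams):
    print(timestamp, images)
```

Frames whose timestamp is missing from any stream (here 300 and 400) are dropped, so every remaining group holds one initial face image per stream.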

[0087] Then, the pixels in the initial face images corresponding to the same exposure time are aligned to the coordinate space corresponding to the target face image stream (e.g., the coordinate space corresponding to the color image stream). That is, using the coordinate space corresponding to the target face image stream as a reference, pixel alignment processing is performed on the initial face images in the face image stream other than the target face image stream. The initial face images corresponding to the same exposure time are converted into face images with pixels aligned to the same coordinate space. This can efficiently and reliably perform spatial transformation and alignment processing of the initial face images, effectively ensuring the accuracy of object recognition.

[0088] In one embodiment, at least two sets of facial image streams include a color image stream, an infrared image stream, and a depth image stream. The image type corresponding to the color image stream includes a color image, the image type corresponding to the infrared image stream includes an infrared image, and the image type corresponding to the depth image stream includes a depth image. The step of aligning the pixels in the initial facial images corresponding to the same exposure time to the coordinate space corresponding to the target facial image stream to obtain a facial image with pixels aligned to the same coordinate space includes:

[0089] One approach is to align the pixels in the initial face images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the color image stream, thereby obtaining a face image with pixels aligned to the same coordinate space. Another approach is to align the pixels in the initial face images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the infrared image stream, thereby obtaining a face image with pixels aligned to the same coordinate space.

[0090] As mentioned earlier, different facial image streams may correspond to different coordinate spaces. The intrinsic parameters (i.e., the focal length and principal point position of the camera) and extrinsic parameters (i.e., the offset of angle and distance between cameras) of the camera can be extracted. Based on the intrinsic and extrinsic parameters of the camera, the coordinate transformation of the pixel points can be accurately performed to complete the spatial alignment and obtain a facial image with the pixel points aligned to the same coordinate space.

[0091] In this embodiment, when the facial image stream includes a color image stream, an infrared image stream, and a depth image stream, in the first approach, the pixels in the initial facial images corresponding to the same exposure time are aligned to the coordinate space corresponding to the color image stream, using the coordinate space corresponding to the color image stream as a reference. The applicant found that spatial alignment in this way can effectively ensure the accuracy of object recognition in subsequent steps. In the second approach, the pixels in the initial facial images corresponding to the same exposure time are aligned to the coordinate space corresponding to the infrared image stream, using the coordinate space corresponding to the infrared image stream as a reference. This method can efficiently perform spatial alignment while ensuring the accuracy of object recognition.

[0092] In the first method, specifically, the pixels of the initial face image in the infrared image stream can be aligned to the pixels of the initial face image in the color image stream with the same exposure time to obtain the face image in the infrared image stream aligned to the same coordinate space; then, the pixels of the initial face image in the depth image stream can be aligned to the pixels of the initial face image in the color image stream with the same exposure time to obtain the face image in the depth image stream aligned to the same coordinate space; and the initial face image in the color image stream is directly used as the face image in the color image stream aligned to the same coordinate space.

[0093] In the second method, specifically, the pixels of the initial face image in the color stream can be aligned to the pixels of the initial face image in the infrared stream with the same exposure time to obtain the face image in the color stream aligned to the same coordinate space. Then, when the infrared stream and the depth stream are captured by the same camera, the initial face image in the infrared stream can be directly used as the face image in the infrared stream aligned to the same coordinate space; and the initial face image in the depth stream can be directly used as the face image in the depth stream aligned to the same coordinate space.

[0094] The following takes, as an illustration, the alignment in the first method of the pixels of the initial face image in the infrared image stream to the pixels of the initial face image in the color image stream with the same exposure time.

[0095] In one example, in the first step, the coordinates of a pixel of the initial face image in the infrared image stream (e.g., p(x, y) in the two-dimensional coordinate system shown in Figure 4) are mapped, based on the intrinsic parameters of the camera acquiring the infrared image stream (i.e., the camera's focal length and principal point position), from the pixel coordinate system (i.e., the two-dimensional image coordinate system (x, y) at the focal length position of that camera, as shown in Figure 4) to the camera coordinate system corresponding to that camera (e.g., the three-dimensional coordinate system (Xc, Yc, Zc) shown in Figure 4). This yields the coordinates of the pixel in the camera coordinate system corresponding to the camera acquiring the infrared image stream (e.g., P(Xc, Yc, Zc) in Figure 4). In the second step, the coordinates of the pixel in the camera coordinate system corresponding to the camera acquiring the infrared image stream are rotated and translated based on the extrinsic parameters (i.e., the angle and distance offset between the camera acquiring the infrared image stream and the camera acquiring the color image stream), using a rotation matrix R corresponding to the angle between the cameras and an offset vector T corresponding to the distance. This yields the coordinates of the pixel in the camera coordinate system corresponding to the camera acquiring the color image stream (e.g., the coordinates in the coordinate system (XW, YW, ZW) shown in Figure 5). In the third step, similarly to the first step, the coordinates of the pixel in the camera coordinate system corresponding to the camera acquiring the color image stream are mapped back to the two-dimensional coordinate system at the focal length position of that camera. This is done by reverse mapping based on the intrinsic parameters of the camera acquiring the color image stream (i.e., its focal length and principal point position), thus obtaining the face image corresponding to the infrared image stream aligned to the same coordinate space.
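The three steps can be sketched with a standard pinhole camera model, as below. This is an illustrative sketch rather than the patented procedure: it assumes that a depth value is available for each infrared pixel (for example from the depth image), that intrinsics are given as `(fx, fy, cx, cy)`, and that R and T map infrared camera coordinates to color camera coordinates. All function names and numeric values are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Step 1: pixel (u, v) with depth Z -> point (Xc, Yc, Zc) in the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def rigid_transform(point, R, T):
    """Step 2: rotate by R (3x3 nested list) and translate by T (3-vector)."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + T[i] for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """Step 3: point in the camera frame -> pixel coordinates of that camera."""
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)


def align_pixel(u, v, depth, ir_intrinsics, color_intrinsics, R, T):
    """Map an infrared pixel into the color camera's pixel coordinates."""
    point_ir = backproject(u, v, depth, *ir_intrinsics)
    point_color = rigid_transform(point_ir, R, T)
    return project(point_color, *color_intrinsics)


# Hypothetical calibration: identical intrinsics, cameras 50 mm apart, no rotation.
intrinsics = (500.0, 500.0, 320.0, 240.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
offset = (50.0, 0.0, 0.0)
print(align_pixel(320.0, 240.0, 1000.0, intrinsics, intrinsics, identity, offset))
```

With these example values, a pixel at the infrared principal point and 1000 mm away shifts by 25 pixels horizontally in the color image, which shows why the shift depends on depth as well as on the camera offset.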

[0096] In short, as shown in Figure 6, the initial face image in the infrared image stream is aligned to the coordinate space corresponding to the color image stream by rotation and translation, resulting in a face image in the infrared image stream aligned to the same coordinate space.

[0097] In another example, the transformation relationship of the coordinate spaces corresponding to the two cameras (i.e., the camera that captures the infrared image stream and the camera that captures the color image stream) is expressed in the form of the following quadratic curve fitting:

[0098] ,

[0099] ,

[0100] where the coordinate offsets are each a quadratic function of the coordinates, as follows:

[0101] ,

[0102] The unknown parameters included in the above two formulas are: .

[0103] Based on the formula corresponding to the quadratic function above, the following matrix equation can be constructed:

[0104] ,

[0105] ,

[0106] The unknown parameters can be obtained by solving the above two matrix equations.

[0107] Furthermore, the parameter β = L1 / L2, where L1 is the distance between the optical axes of the two cameras, and L2 is the distance between the optical axis of the camera acquiring the infrared image stream and the main axis of the infrared laser source in the camera acquiring the infrared image stream.

[0108] Furthermore, the coordinates $(x_o, y_o)$ of a pixel in the initial facial image in the infrared image stream can be aligned to the coordinate space corresponding to the color image stream to obtain the coordinates $(x_r, y_r)$ of the pixel in the camera coordinate system corresponding to the camera that captured the color image stream.
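A quadratic fit of this kind can be sketched as an ordinary least-squares problem. The form used below is an assumption made for illustration and is not taken from the formulas of this application: each offset is modeled as c0 + c1·x + c2·y + c3·x² + c4·x·y + c5·y², the aligned coordinate is the original coordinate plus its offset, and the parameter β is not modeled. The function names and calibration values are hypothetical.

```python
def quad_terms(x, y):
    """Monomials of the assumed quadratic offset model."""
    return [1.0, x, y, x * x, x * y, y * y]


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def fit_offsets(points, offsets):
    """Least-squares fit of one axis's offsets via the normal equations."""
    rows = [quad_terms(x, y) for x, y in points]
    n = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * d for r, d in zip(rows, offsets)) for i in range(n)]
    return solve_linear(AtA, Atb)


def align(x_o, y_o, coeffs_x, coeffs_y):
    """Apply the fitted offsets: (x_o, y_o) -> (x_r, y_r)."""
    terms = quad_terms(x_o, y_o)
    dx = sum(c * t for c, t in zip(coeffs_x, terms))
    dy = sum(c * t for c, t in zip(coeffs_y, terms))
    return x_o + dx, y_o + dy


# Hypothetical calibration points on a 3x3 grid with known offsets.
true_x = [1.0, 0.5, -0.25, 0.1, 0.0, 0.2]
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
offsets_x = [sum(c * t for c, t in zip(true_x, quad_terms(x, y))) for x, y in points]
coeffs_x = fit_offsets(points, offsets_x)
print([round(c, 6) for c in coeffs_x])
```

At least six calibration point pairs in general position are needed per axis, since the assumed model has six unknown coefficients per offset.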

[0109] In step S320, a target face image sequence for pre-identification is determined from at least two sets of face image sequences.

[0110] In this example implementation, the target face image sequence for pre-identification is the face image sequence on which the server performs recognition detection before object recognition of the target object. The number of face image sequences it contains is smaller than the number of acquired face image sequences. For example, if three sets of face image sequences are acquired, the target face image sequence for pre-identification contains fewer than three of them. In one implementation, it contains exactly one face image sequence, which minimizes traffic consumption.

[0111] Specifically, a face image sequence of a predetermined image type can be determined from at least two sets of face image sequences as the target face image sequence for pre-identification. In one example, the predetermined image type can be a color image, and in another example, the predetermined image type can be an infrared image.

[0112] In step S330, a detection request carrying a target face image sequence is sent to the server. The detection request instructs the server to detect image information of face images that meet predetermined recognition conditions from the target face image sequence.

[0113] In this example implementation, after the terminal acquires at least two sets of facial image sequences, it can cache the at least two sets of facial image sequences locally on the terminal. Then, it can copy the target facial image sequence from the at least two sets of facial image sequences and send the copied target facial image sequence to the server with low traffic consumption through a detection request, without having to directly transmit all facial image sequences to the server.

[0114] After receiving a detection request, the server can perform recognition detection on the face images in the target face image sequence, determine the recognizability of each face image in the target face image sequence, and then determine the face images that meet the predetermined recognition conditions. The server can also extract the image information of those face images, which may be their position information or identifiers in the target face image sequence.

[0115] In one embodiment, at least two sets of face image sequences include color face image sequences, and the image type corresponding to the color face image sequences includes color images; step S330, sending a detection request carrying a target face image sequence to the server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the target face image sequence, including:

[0116] A detection request carrying a sequence of color face images is sent to the server. The detection request instructs the server to detect image information of face images that meet predetermined recognition conditions from the sequence of color face images.

[0117] Color images allow the facial features of the target object to be compared and identified accurately. By transmitting the color face image sequence to the server for detection and obtaining the image information of the face images that meet the predetermined recognition conditions, the terminal can upload in subsequent steps only the face images matching that image information, which ensures the accuracy of target object recognition while reducing traffic consumption and improving recognition efficiency.

[0118] In one embodiment, sending a detection request carrying the color face image sequence to the server includes: compressing the color face image sequence into compressed color image data in a target format; and sending a detection request carrying the compressed color image data to the server.

[0119] The target format can include H.265 or H.264, etc. Compressing the color face image sequence into compressed color image data in the target format and sending it to the server can further reduce traffic consumption.

[0120] In one embodiment, at least two sets of face image sequences include an infrared face image sequence and a depth face image sequence corresponding to the same camera. The image type corresponding to the infrared face image sequence includes an infrared image, and the image type corresponding to the depth face image sequence includes a depth image. Step S330, sending a detection request carrying a target face image sequence to the server, wherein the detection request instructs the server to detect image information of face images that meet predetermined recognition conditions from the target face image sequence, includes: sending a detection request carrying an infrared face image sequence to the server, wherein the detection request instructs the server to detect image information of face images that meet predetermined recognition conditions from the infrared face image sequence.

[0121] As an example of an infrared face image sequence and a depth face image sequence corresponding to the same camera: infrared camera U can continuously acquire initial face images (infrared images) formed by diffuse infrared light to obtain an infrared image stream. Infrared camera U can also continuously acquire speckle-structured infrared light to obtain speckle, which a depth unit then analyzes to produce initial face images (depth images), yielding a depth image stream. In this case, the infrared face image sequence corresponding to the infrared image stream and the depth face image sequence corresponding to the depth image stream correspond to the same camera U.

[0122] Furthermore, facial images in the infrared and depth facial image sequences from the same camera are correlated. By transmitting the infrared facial image sequence to the server for detection, image information of facial images that meet predetermined recognition conditions can be reliably detected, and the accuracy of target object recognition can be further guaranteed by uploading the facial images that match the image information in subsequent steps.

[0123] In one embodiment, the step of sending a detection request carrying an infrared face image sequence to a server includes: compressing the infrared face image sequence into compressed infrared image data in a target format; and sending a detection request carrying the compressed infrared image data to the server.

[0124] The target format can include H.265 or H.264, etc. Compressing the infrared face image sequence into compressed infrared image data in the target format and sending it to the server can further reduce bandwidth consumption.

[0125] In step S340, the image information returned by the server based on the detection request is received.

[0126] Image information can be location information or identifiers, such as the number of a face image that meets predetermined recognition criteria within a target face image sequence. After detecting image information of a face image that meets the predetermined recognition criteria, the server can send it to the terminal.

[0127] In step S350, a target face image matching the image information is selected from at least two sets of face image sequences.

[0128] The face images in each set of face image sequences, or in the remaining face image sequences, can be filtered to select the target face images that match the image information. The remaining face image sequences are the face image sequences other than the target face image sequence among the at least two sets of face image sequences.

[0129] For example, the image information is the positional information such as the number of the face image that meets the predetermined recognition conditions in the target face image sequence. Face images at the same acquisition time point in the face image sequence can correspond to the same positional information, and the target face image corresponding to the positional information can be selected from each group of face image sequences or the remaining face image sequences. Alternatively, the image information is the identifier corresponding to the face image that meets the predetermined recognition conditions in the target face image sequence. Face images at the same acquisition time point in the face image sequence can be labeled with the same identifier, and the target face image corresponding to the identifier can be selected from each group of face image sequences or the remaining face image sequences.

[0130] In one embodiment, the at least two sets of face image sequences include three sets: a color face image sequence, an infrared face image sequence, and a depth face image sequence. The target face image sequence is the color face image sequence, and the image information is the image information corresponding to the face images in the color face image sequence that meet the predetermined recognition conditions. In this case, a first face image whose image type is color image can be selected from the color face image sequence, a second face image whose image type is infrared image can be selected from the infrared face image sequence, and a third face image whose image type is depth image can be selected from the depth face image sequence. Thus, the target face image includes the first face image, the second face image, and the third face image, or the target face image includes the second face image and the third face image.

[0131] In one embodiment, the at least two sets of face image sequences include three sets: a color face image sequence, an infrared face image sequence, and a depth face image sequence. The target face image sequence is the infrared face image sequence, and the image information is the image information corresponding to the face images in the infrared face image sequence that meet the predetermined recognition conditions. In this case, a first face image whose image type is color image can be selected from the color face image sequence, a second face image whose image type is infrared image can be selected from the infrared face image sequence, and a third face image whose image type is depth image can be selected from the depth face image sequence. Thus, the target face image includes the first face image, the second face image, and the third face image, or the target face image includes the first face image and the third face image.

[0132] In one embodiment, the image information includes the position information of a face image whose recognition degree meets the predetermined recognition conditions in the target face image sequence. Filtering out target face images that match the image information from the at least two sets of face image sequences includes: searching each set of face image sequences for the face image corresponding to the position information; and using the face images found as the target face images.

[0133] The position information can be the ID of a face image that meets the predetermined recognition conditions within the target face image sequence. This ID can be generated when the face images corresponding to each image stream are concatenated, and face images from the same acquisition time point in different face image sequences can share the same ID. Based on the position information, the corresponding face image can be looked up accurately and efficiently in each group of face image sequences.
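
The lookup by shared ID can be illustrated with a minimal sketch. The names `FaceImage` and `select_targets`, the integer IDs, and the dictionary layout are assumptions made for illustration only; the application does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceImage:
    image_id: int    # shared ID assigned at concatenation time (assumed to be an integer)
    image_type: str  # "color", "infrared", or "depth"
    data: bytes      # encoded pixel payload (placeholder)


def select_targets(sequences, image_ids):
    """Return, for every sequence, the face images whose ID is in image_ids."""
    wanted = set(image_ids)
    return {
        image_type: [img for img in seq if img.image_id in wanted]
        for image_type, seq in sequences.items()
    }


# Three locally cached sequences; frames from the same acquisition
# time point carry the same ID in every sequence.
sequences = {
    t: [FaceImage(i, t, b"") for i in range(5)]
    for t in ("color", "infrared", "depth")
}

# Suppose the server reported that frame 3 meets the recognition conditions.
targets = select_targets(sequences, image_ids=[3])
```

Here `targets` holds exactly one frame per image type, so only three images would need to be uploaded in the recognition request.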

[0134] In step S360, a recognition request carrying a target face image is sent to the server to instruct the server to recognize the target object based on the target face image.

[0135] In this example implementation, the terminal sends a recognition request carrying the target face image to the server, instructing the server to recognize the target object based on the target face image. This avoids directly transmitting a large amount of face image data to the server and thus effectively reduces traffic consumption during object recognition. Because the server performs object recognition only on the target face image that matches the image information, recognition efficiency also improves.

[0136] In one embodiment, if the target face image is a target face image that matches the image information in each of at least two sets of face image sequences, then the target object can be directly identified based on the target face image.

[0137] In one embodiment, if the target face image is the target face image that matches the image information in each group of face image sequences in the remaining face image sequences, where the remaining face image sequences are the face image sequences other than the target face image sequence among the at least two groups of face image sequences, then the server can also obtain a pre-identified face image that matches the image information from the target face image sequence that has already been transmitted to the server, and identify the target object based on the target face image and the pre-identified face image. For example, the at least two groups of face image sequences include three groups: a color face image sequence, an infrared face image sequence, and a depth face image sequence. The target face image sequence is the color face image sequence, and the target face image includes a second face image and a third face image. The server can obtain a first face image that matches the image information from the color face image sequence, and perform object recognition based on the first face image, the second face image, and the third face image. In this way, the number of target face images transmitted by the terminal in the second transmission is further reduced, which can further reduce traffic consumption.
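
As a hedged sketch of this variant, the server can fill in the missing image type from the target sequence it already holds. The function name `assemble_recognition_inputs` and the dictionary-based storage are hypothetical and are used only to make the idea concrete.

```python
def assemble_recognition_inputs(received_target_sequence, target_type,
                                uploaded_images, image_id):
    """Combine the pre-identified frame with the uploaded target face images.

    received_target_sequence: dict of image ID -> frame, kept from the detection request
    target_type: image type of the target sequence, e.g. "color"
    uploaded_images: dict of image type -> frame, carried by the recognition request
    image_id: ID reported in the image information
    """
    inputs = dict(uploaded_images)
    if target_type not in inputs:
        # The terminal uploaded only the remaining sequences' frames,
        # so reuse the matching frame that is already on the server.
        inputs[target_type] = received_target_sequence[image_id]
    return inputs


# The color sequence was sent with the detection request.
color_sequence_on_server = {0: "color#0", 1: "color#1", 2: "color#2"}
# The recognition request carries only the depth and infrared frames.
uploaded = {"depth": "depth#1", "infrared": "infrared#1"}

inputs = assemble_recognition_inputs(
    color_sequence_on_server, "color", uploaded, image_id=1
)
```

With this arrangement the second upload carries two images instead of three, at the cost of the server retaining the target sequence until recognition completes.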

[0138] In this way, based on steps S310 to S360, after acquiring at least two sets of facial image sequences, the pre-identified target facial image sequence is transmitted to the server for detection to obtain image information of facial images that meet the predetermined recognition conditions. Then, the target facial images that match the image information are selected from the at least two sets of facial image sequences and sent to the server for recognition. This avoids directly transmitting a large amount of facial image data to the server, effectively reducing the traffic consumption during the object recognition process. The server can quickly perform object recognition based on the target facial images that match the image information, thereby effectively reducing the traffic consumption during object recognition while improving recognition efficiency.

[0139] Figure 7 schematically shows a flowchart of an object recognition method according to an embodiment of this application. The execution entity of this object recognition method can be any server, such as the server 101 shown in Figure 1 or the server corresponding to node 202 shown in Figure 2.

[0140] As shown in Figure 7, the object recognition method may include steps S410 to S450.

[0141] Step S410: Receive a detection request carrying a target face image sequence sent by the terminal. The target face image sequence comes from at least two sets of face image sequences of the target object. Each set of face image sequences includes at least one face image, and each set of face image sequences corresponds to an image type.

[0142] Step S420: Based on the detection request, detect image information of face images that meet the predetermined recognition conditions from the target face image sequence.

[0143] Step S430: Transmit image information to the terminal so that the terminal can filter out a target face image that matches the image information from at least two sets of face image sequences.

[0144] Step S440: Receive a recognition request carrying the target face image sent by the terminal.

[0145] Step S450: Based on the recognition request, the target object is recognized according to the target face image.

[0146] The following describes the specific process of each step in object recognition.

[0147] In step S410, a detection request carrying a target face image sequence sent by the terminal is received. The target face image sequence comes from at least two sets of face image sequences of the target object. Each set of face image sequences includes at least one face image, and each set of face image sequences corresponds to an image type.

[0148] In this example implementation, the server receives a detection request carrying only the target face image sequence, which is one of the at least two sets of face image sequences; it does not receive all of the at least two sets of face image sequences.

[0149] The target object is the object to be identified, such as a target user or a target organism. The facial image is a captured image of the target object's face, such as the face of a target user or a target organism.

[0150] A facial image sequence is a sequence consisting of at least one facial image. At least two sets of facial image sequences are acquired for the target object, each set corresponding to a different image type. This allows for comprehensive recognition of the target object based on multiple image types, ensuring reliable recognition. After the terminal acquires at least two sets of facial image sequences for the target object, they can be cached locally on the terminal.

[0151] Image types can include color images, depth images, and infrared images. Referring to Figure 3, the face image sequences can include three sets: a color face image sequence, an infrared face image sequence, and a depth face image sequence. The face images in the color face image sequence can be color images, the face images in the depth face image sequence can be depth images, and the face images in the infrared face image sequence can be infrared images.

[0152] In step S420, based on the detection request, image information of face images that meet the predetermined recognition conditions is detected from the target face image sequence.

[0153] In this example implementation, after receiving a detection request, the server can perform recognition detection on the face images in the target face image sequence, determine the recognition degree of each face image in the target face image sequence, then determine the face images that meet the predetermined recognition conditions, and extract the image information of those face images (which may be their position information or identifier in the target face image sequence).

[0154] In one embodiment, step S420, detecting image information of face images that meet predetermined recognition conditions from the target face image sequence, includes:

[0155] Feature extraction is performed on the facial images in the target facial image sequence to obtain the image feature information of the facial images in the target facial image sequence; recognition analysis is performed based on the image feature information of the facial images in the target facial image sequence to obtain the recognition degree of the facial images in the target facial image sequence; and image information corresponding to the facial images whose recognition degree meets the predetermined recognition conditions is extracted from the target facial image sequence.

[0156] Image feature information can include features such as face angle, face size, face centering, and image sharpness. After the image feature information is extracted, recognition analysis can be performed with a predetermined scoring function to obtain the recognition degree of each face image in the target face image sequence (e.g., the weighted sum of all items in the image feature information is used as the recognition degree).
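
The weighted-sum scoring mentioned above can be sketched as follows. The feature normalization to the range 0 to 1, the weight values, and the names `FaceFeatures` and `recognition_degree` are assumptions for illustration; the application does not specify them.

```python
from dataclasses import dataclass


@dataclass
class FaceFeatures:
    # Each feature is assumed to be normalized to [0, 1], where 1 is best.
    angle: float      # 1.0 means a frontal face
    size: float       # 1.0 means the face fills the expected area
    centering: float  # 1.0 means the face is centered in the frame
    sharpness: float  # 1.0 means no visible blur


# Hypothetical weights; they sum to 1 so the score also stays in [0, 1].
WEIGHTS = {"angle": 0.3, "size": 0.2, "centering": 0.2, "sharpness": 0.3}


def recognition_degree(features: FaceFeatures) -> float:
    """Weighted sum of the image feature information."""
    return sum(weight * getattr(features, name) for name, weight in WEIGHTS.items())


frontal_sharp = FaceFeatures(angle=1.0, size=0.8, centering=0.9, sharpness=1.0)
turned_blurry = FaceFeatures(angle=0.4, size=0.8, centering=0.9, sharpness=0.3)
```

Under these assumed weights, the frontal, sharp frame scores higher than the turned, blurry one, so it would be preferred when the image information is extracted.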

[0157] The predetermined recognition condition can then be that the face image has the highest recognition degree, in which case the image information corresponds to that single face image. This yields the fewest target face images, minimizing bandwidth consumption and improving recognition efficiency. Alternatively, the predetermined recognition condition can be that the face image is among the few face images with the highest recognition degrees. This yields a small number of target face images; bandwidth consumption is slightly higher than with the previous approach, but overall it is still effectively reduced and recognition efficiency is still improved, while the recognition success rate is effectively ensured.
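
Both selection policies reduce to picking the k best-scoring frames, with k equal to 1 for the first policy. A minimal sketch, in which the function name `select_image_info`, the example scores, and the tie-breaking by lower ID are assumptions:

```python
def select_image_info(scores, k=1):
    """Return the IDs of the k highest-scoring face images, best first.

    scores: dict mapping image ID -> recognition score
    Ties are broken in favor of the lower ID (an arbitrary choice).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(scores, key=lambda image_id: (-scores[image_id], image_id))
    return ranked[:k]


scores = {0: 0.42, 1: 0.91, 2: 0.77, 3: 0.88}

best = select_image_info(scores)          # single best frame
top_three = select_image_info(scores, 3)  # a few best frames
```

Choosing a larger k trades a little extra upload traffic for a better chance that at least one selected frame passes recognition.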

[0158] In step S430, the image information is transmitted to the terminal so that the terminal can filter out the target face image that matches the image information from at least two sets of face image sequences.

[0159] After detecting image information, the server can transmit the image information to the terminal. The terminal can filter each face image in each set of face image sequences and select the target face image that matches the image information in each set of face image sequences.

[0160] For example, the image information may be position information, such as the sequence number, of the face image that meets the predetermined recognition conditions in the target face image sequence. Face images captured at the same acquisition time point in different face image sequences can share the same position information, so the target face image corresponding to the position information can be selected in each group of face image sequences. Alternatively, the image information may be the identifier of the face image that meets the predetermined recognition conditions in the target face image sequence. Face images captured at the same acquisition time point in different face image sequences can be labeled with the same identifier, so the target face image corresponding to the identifier can be selected in each group of face image sequences.

[0161] In step S440, a recognition request carrying a target face image sent by the terminal is received.

[0162] The server can receive a relatively small number of target face images, resulting in low bandwidth consumption. In one embodiment, at least two sets of face image sequences include three sets: a color face image sequence, an infrared face image sequence, and a depth face image sequence, with the target face image sequence being a color face image sequence. In another embodiment, at least two sets of face image sequences include three sets: a color face image sequence, an infrared face image sequence, and a depth face image sequence, with the target face image sequence being an infrared face image sequence.

[0163] In step S450, the target object is identified based on the target face image according to the recognition request.

[0164] The server identifies the target object from only a small number of target facial images, which effectively reduces the server's data processing load, keeps the server's bandwidth for parallel data processing from being squeezed, and improves recognition efficiency.

[0165] In one embodiment, the target face image includes a first face image of color type, a second face image of depth type, and a third face image of infrared type; in step S450, the target object is identified based on the target face image, including:

[0166] Liveness detection is performed based on the second and third face images to determine the realism of the target object; when the realism meets the target realism condition, facial feature comparison and recognition are performed based on the first face image to identify the target object.

[0167] Liveness detection based on the second face image, which is a depth image, can determine whether the captured target object is an abnormal object without contours, such as a photograph. Liveness detection based on the third face image, which is an infrared image, can determine whether the captured target object is an object with abnormal brightness, such as a silicone head mold. In this way, liveness detection can preliminarily determine the realism of the target object. If the target object is determined to be an abnormal object, its realism does not meet the target realism condition; if the target object is determined to be a normal object, its realism meets the target realism condition.

[0168] When the realism meets the target realism condition, facial features can be extracted from the first face image. By comparing the extracted facial features with preset facial feature samples corresponding to the target object, the identity of the target object can be accurately identified. Furthermore, when performing facial feature comparison and recognition based on the first face image, the second face image, which is a depth image, can be used for auxiliary recognition. Facial features can be extracted from the first face image, and image features can be extracted from the second face image. By comparing the extracted facial features and image features with preset facial feature samples and preset image features corresponding to the target object, the identity of the target object can be identified even more accurately.
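
The two-stage logic of step S450 (liveness gating followed by feature comparison) can be sketched as below. The liveness checks and the feature extractor are passed in as callables because the application does not specify them; cosine similarity, the 0.8 threshold, and all names are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(first_img, second_img, third_img, *, depth_check, infrared_check,
              extract_features, enrolled, threshold=0.8):
    """Return the matched identity, or None if liveness or matching fails.

    depth_check / infrared_check: callables returning True for a live face
    extract_features: callable mapping the color image to a feature vector
    enrolled: dict of identity -> preset feature vector
    """
    # Stage 1: liveness detection on the depth (second) and infrared (third) images.
    if not (depth_check(second_img) and infrared_check(third_img)):
        return None
    # Stage 2: compare features of the color (first) image with preset samples.
    probe = extract_features(first_img)
    best_id, best_sim = None, threshold
    for identity, reference in enrolled.items():
        similarity = cosine_similarity(probe, reference)
        if similarity >= best_sim:
            best_id, best_sim = identity, similarity
    return best_id


# Toy stand-ins: the "color image" is already a feature vector.
enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
common = dict(extract_features=lambda img: img, enrolled=enrolled)

live = recognize([0.9, 0.1], "depth", "ir",
                 depth_check=lambda d: True, infrared_check=lambda i: True, **common)
spoof = recognize([0.9, 0.1], "depth", "ir",
                  depth_check=lambda d: False, infrared_check=lambda i: True, **common)
```

Running liveness detection first means the comparatively expensive feature comparison is skipped entirely for photographs or head molds.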

[0169] In this manner, based on steps S410 to S450, for at least two sets of facial image sequences, the server only detects the transmitted target facial image sequences to obtain image information of facial images that meet predetermined recognition conditions, and transmits it to the terminal. The terminal then selects the target facial images that match the image information from at least two sets of facial image sequences. Then, the server only receives a small number of target facial images for object recognition, avoiding the server directly receiving a large amount of facial image data, effectively reducing the traffic consumption during the object recognition process. The server can quickly perform object recognition based on the target facial images that match the image information, and the server pressure is low. Thus, the recognition efficiency is improved while effectively reducing the traffic consumption during object recognition.

[0170] Based on the methods described in the above embodiments, the following examples will provide further detailed explanations.

[0171] The following example illustrates the object recognition process in a cloud-based uplink facial recognition system. Figure 8 shows a flowchart of object recognition in a related technology in one scenario. Figure 9 shows a system flowchart of object recognition using an embodiment of this application in one scenario. In this scenario, the at least two sets of facial image streams acquired for the target object include three sets: a color image stream, an infrared image stream, and a depth image stream; the at least two sets of facial image sequences of the target object include three sets: a color facial image sequence, an infrared facial image sequence, and a depth facial image sequence.

[0172] Referring to Figure 8, in the related technology, a cloud-based uplink facial recognition system collects the color image stream, infrared image stream, and depth image stream of the target object and uploads them in real time to the corresponding server in the cloud. The server then selects the best images from the color, infrared, and depth image streams in the cloud to obtain the initial facial image of the target object, and identifies the target object in the cloud based on that initial facial image, which may include steps such as liveness detection and facial feature comparison and recognition, to complete one facial recognition operation.

[0173] In related technologies, the object recognition process transmits a large amount of facial image data, resulting in high traffic and drawbacks such as high traffic costs and squeezed server bandwidth for parallel data processing. As a result, the object recognition process suffers from high traffic consumption and low recognition efficiency.

[0174] Referring to Figure 9, the process of object recognition using the embodiments of this application in a cloud-based uplink face recognition system includes steps S510 to S570.

[0175] In step S510, the terminal acquires three sets of facial image sequences of the target object (including color facial image sequence, infrared facial image sequence and depth facial image sequence). Each set of facial image sequences includes at least one facial image, and each set of facial image sequences corresponds to an image type (including color image, infrared image and depth image).

[0176] Step S510 specifically includes: Step S511, acquiring three sets of facial image streams of the target object (including color image stream, infrared image stream, and depth image stream), each set of facial image streams includes at least one initial facial image, and each set of facial image streams corresponds to an image type; Step S512, converting the initial facial images corresponding to the same acquisition time point between the facial image streams into facial images with pixel points aligned to the same coordinate space, which can be the coordinate space corresponding to the color image stream or the infrared image stream; Step S513, concatenating the facial images obtained from the initial facial images in each set of facial image streams to generate three sets of facial image sequences, wherein, during concatenation, facial images at the same acquisition time point between the facial image sequences can be set with the same image information (e.g., number), wherein the facial images corresponding to the color image stream are concatenated to generate a color facial image sequence, the facial images corresponding to the infrared image stream are concatenated to generate an infrared facial image sequence, and the facial images corresponding to the depth image stream are concatenated to generate a depth facial image sequence.
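
Step S513 can be illustrated with a short sketch in which frames that are already time- and pixel-aligned are concatenated into per-type sequences and numbered. The function name `build_sequences`, the use of a running integer as the shared number, and the string placeholders for frames are assumptions.

```python
def build_sequences(aligned_frames):
    """Concatenate aligned frames into one sequence per image type.

    aligned_frames: list with one dict per acquisition time point, each
        mapping image type -> frame (already aligned in time and space)
    Returns a dict of image type -> list of (image_id, frame), where frames
    from the same acquisition time point share the same image_id.
    """
    sequences = {}
    for image_id, frames in enumerate(aligned_frames):
        for image_type, frame in frames.items():
            sequences.setdefault(image_type, []).append((image_id, frame))
    return sequences


aligned = [
    {"color": "c@t0", "infrared": "ir@t0", "depth": "d@t0"},
    {"color": "c@t1", "infrared": "ir@t1", "depth": "d@t1"},
]
sequences = build_sequences(aligned)
```

Because the number is assigned once per time point, the image information returned by the server for one sequence identifies the matching frames in the other two without any further alignment.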

[0177] In step S520, a detection request is sent to the server in the cloud, carrying the target face image sequence (which can be a color face image sequence or an infrared face image sequence) from three sets of face image sequences. The detection request instructs the server to detect image information of face images that meet the predetermined recognition conditions from the target face image sequence.

[0178] Step S520 specifically includes: Step S521, the terminal caches the three sets of face image sequences locally; Step S522, the target face image sequence is obtained from the three sets of face image sequences, that is, a copy of the target face image sequence is made from the three sets of face image sequences; Step S523, the target face image sequence is compressed into compressed image data in the target format, which can be H265 or H264 (yielding compressed color image data or compressed infrared image data in the target format), and a detection request carrying the compressed image data is then sent to the server.

[0179] Step S530: The server detects image information of face images that meet predetermined recognition conditions from the target face image sequence in the cloud. Step S530 specifically includes: Step S531: The server decompresses the compressed image data to obtain the target face image sequence; Step S532: The server detects image information of face images that meet predetermined recognition conditions from the target face image sequence.

[0180] Step S532 may specifically include performing feature extraction processing on the face images in the target face image sequence to obtain image feature information of the face images in the target face image sequence; performing recognition analysis based on the image feature information of the face images in the target face image sequence to obtain the recognition degree of the face images in the target face image sequence; and extracting image information corresponding to face images whose recognition degree meets the predetermined recognition conditions from the target face image sequence.

[0181] Step S540: The server sends the image information to the terminal.

[0182] In step S550, the terminal receives the image information returned by the server based on the detection request, and selects a target face image that matches the image information from the at least two sets of face image sequences. The target face image includes a first face image, which is a color image selected from the color face image sequence, a second face image, which is a depth image selected from the depth face image sequence, and a third face image, which is an infrared image selected from the infrared face image sequence.

[0183] In step S560, the terminal sends a recognition request carrying a target face image to the server, instructing the server to recognize the target object based on the target face image.

[0184] Step S570: The server identifies the target object based on the target face image according to the recognition request. Step S570 specifically includes: Step S571: Liveness detection is performed based on the second face image and the third face image to determine the realism of the target object; Step S572: When the realism meets the target realism condition, facial feature comparison and recognition is performed based on the first face image to identify the target object.

[0185] For the related-technology cloud-based uplink facial recognition system shown in Figure 8, the average data consumption of a single object recognition operation is calculated as follows:

[0186] Color image stream bandwidth consumption: 100KB (640x480 resolution / JPEG RAW16 format) x 30FPS x 20 seconds = 60MB; Infrared image stream bandwidth consumption: 50KB (640x480 resolution / JPEG RAW8 format) x 15FPS x 20 seconds = 15MB; Depth image stream bandwidth consumption: 150KB (640x480 resolution / PNG lossless compressed RAW16 format) x 15FPS x 20 seconds = 45MB; Total bandwidth consumption: 60MB + 15MB + 45MB = 120MB.

[0187] For the cloud-based low-traffic uplink facial recognition system shown in Figure 9, which applies the embodiments of this application, the average traffic consumption of a single object recognition operation is calculated as follows:

[0188] Compressed image data corresponding to the target face image sequence: 100KB / s (resolution 640x480) x 20 seconds = 2MB; First face image in the target face image: 100KB (resolution 640x480 / format JPEG RAW16); Third face image in the target face image: 50KB (resolution 640x480 / format JPEG RAW8); Second face image in the target face image: 150KB (resolution 640x480 / format PNG lossless compression RAW16); Total data consumption: 2MB + 0.1MB + 0.05MB + 0.15MB = 2.3MB.
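
The arithmetic above can be checked with a few lines of code. Decimal units (1 MB = 1000 KB) are assumed because they reproduce the stated figures, and the 120MB baseline is taken as stated for the related technology rather than recomputed.

```python
KB_PER_MB = 1000  # decimal units, matching the figures in the text

# Compressed target face image sequence: 100 KB/s for 20 seconds.
stream_kb = 100 * 20

# One selected frame per image type: color (first), infrared (third), depth (second).
first_kb, third_kb, second_kb = 100, 50, 150

proposed_mb = (stream_kb + first_kb + third_kb + second_kb) / KB_PER_MB

baseline_mb = 120  # total stated for the related technology
reduction_factor = baseline_mb / proposed_mb

print(f"proposed: {proposed_mb} MB, about {reduction_factor:.0f}x less than {baseline_mb} MB")
```

Under these figures, nearly all of the remaining traffic is the compressed detection stream; the three selected frames contribute only 0.3MB.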

[0189] As can be seen, in this scenario, the cloud-based uplink face recognition system performs object recognition by applying the embodiments of this application, avoiding the direct transmission of a large amount of facial image data to the server, effectively reducing the traffic consumption during the object recognition process. The server can quickly perform object recognition based on the target facial image that matches the image information, thereby improving recognition efficiency while effectively reducing the traffic consumption during object recognition.

[0190] To facilitate better implementation of the object recognition method provided in this application, this application also provides an object recognition device based on the above-described object recognition method. The meanings of the terms used are the same as in the object recognition method described above, and specific implementation details can be found in the descriptions within the method embodiments. Figure 10 shows a block diagram of an object recognition device according to an embodiment of this application. Figure 11 shows a block diagram of an object recognition device according to another embodiment of this application.

[0191] As shown in Figure 10, the object recognition device 600 may include a collection module 610, a determination module 620, a transmission module 630, an acquisition module 640, a filtering module 650, and a recognition module 660. The object recognition device 600 can be applied to a terminal.

[0192] The collection module 610 can be used to acquire at least two sets of facial image sequences of a target object, each set of facial image sequences including at least one facial image, and each set of facial image sequences corresponding to an image type; the determination module 620 can be used to determine a target facial image sequence for pre-identification from the at least two sets of facial image sequences; the transmission module 630 can be used to send a detection request carrying the target facial image sequence to the server, the detection request instructing the server to detect image information of facial images that meet predetermined recognition conditions from the target facial image sequence; the acquisition module 640 can be used to receive the image information returned by the server based on the detection request; the filtering module 650 can be used to filter out target facial images that match the image information from the at least two sets of facial image sequences; the recognition module 660 can be used to send a recognition request carrying the target facial image to the server, instructing the server to recognize the target object based on the target facial image.

[0193] In some embodiments of this application, the collection module 610 includes: an image stream acquisition unit, used to acquire at least two sets of facial image streams of a target object, each set of facial image streams including at least one initial facial image, and each set of facial image streams corresponding to an image type; an alignment unit, used to convert the initial facial images corresponding to the same acquisition time point between the facial image streams into facial images with pixel points aligned to the same coordinate space; and a sequence generation unit, used to concatenate the facial images obtained from the initial facial images in each set of facial image streams to generate the at least two sets of facial image sequences.

[0194] In some embodiments of this application, the alignment unit includes: a time alignment subunit, used to perform time alignment processing on the at least two sets of face image streams to determine initial face images corresponding to the same exposure time between the face image streams; and a spatial alignment subunit, used to align the pixels in the initial face images corresponding to the same exposure time to the coordinate space corresponding to the target face image stream, to obtain the face image with pixels aligned to the same coordinate space.
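
The time alignment subunit can be sketched as nearest-timestamp matching. This is only one possible approach: the tolerance of 20 ms, the choice of the stream with the fewest frames as reference, and the name `time_align` are assumptions, and the spatial alignment step (which depends on camera calibration) is not shown.

```python
def time_align(streams, tolerance_ms=20):
    """Group frames from different streams that share an exposure time.

    streams: dict of image type -> list of (timestamp_ms, frame)
    The stream with the fewest frames is used as the reference. Each of its
    frames is paired with the nearest frame of every other stream; a group is
    dropped if any nearest frame is farther away than tolerance_ms.
    Returns a list of (timestamp_ms, {image type: frame}).
    """
    ref_type = min(streams, key=lambda t: len(streams[t]))
    groups = []
    for ts, frame in streams[ref_type]:
        group = {ref_type: frame}
        for other, frames in streams.items():
            if other == ref_type:
                continue
            nearest_ts, nearest = min(frames, key=lambda f: abs(f[0] - ts))
            if abs(nearest_ts - ts) > tolerance_ms:
                break  # no frame close enough in this stream
            group[other] = nearest
        else:
            groups.append((ts, group))
    return groups


# Color at roughly 30 FPS; infrared and depth at roughly 15 FPS.
streams = {
    "color": [(0, "c0"), (33, "c1"), (67, "c2"), (100, "c3")],
    "infrared": [(1, "ir0"), (68, "ir1")],
    "depth": [(0, "d0"), (67, "d1")],
}
groups = time_align(streams)
```

In this example, two groups are formed, and the color frames `c1` and `c3` are left out because no infrared or depth frame was exposed at the same time.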

[0195] In some embodiments of this application, the at least two sets of facial image streams include a color image stream, an infrared image stream, and a depth image stream. The image type corresponding to the color image stream includes a color image, the image type corresponding to the infrared image stream includes an infrared image, and the image type corresponding to the depth image stream includes a depth image. The spatial alignment subunit includes one of a first alignment subunit and a second alignment subunit. The first alignment subunit is used to: align the pixels in the initial facial images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the color image stream, to obtain facial images with pixels aligned to the same coordinate space. The second alignment subunit is used to: align the pixels in the initial facial images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the infrared image stream, to obtain facial images with pixels aligned to the same coordinate space.

[0196] In some embodiments of this application, at least two sets of face image sequences include color face image sequences, and the image type corresponding to the color face image sequences includes color images; the transmission module 630 includes: a first transmission unit, configured to send a detection request carrying the color face image sequences to the server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the color face image sequences.

[0197] In some embodiments of this application, the first transmission unit is configured to: compress the color face image sequence to compress the color face image sequence into compressed color image data in a target format; and send a detection request carrying the compressed color image data to the server.

[0198] In some embodiments of this application, the at least two sets of face image sequences include an infrared face image sequence and a depth face image sequence corresponding to the same camera. The image type corresponding to the infrared face image sequence includes an infrared image, and the image type corresponding to the depth face image sequence includes a depth image. The transmission module 630 includes: a second transmission unit, used to send a detection request carrying the infrared face image sequence to the server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the infrared face image sequence.

[0199] In some embodiments of this application, the second transmission unit is used to compress the infrared face image sequence to compress the infrared face image sequence into compressed infrared image data in a target format; and to send a detection request carrying the compressed infrared image data to the server.

[0200] In some embodiments of this application, the image information includes the position information of a face image whose recognition degree meets predetermined recognition conditions in the target face image sequence; the filtering module 650 includes: a search unit, used to search each group of the face image sequences for the face image corresponding to the position information; and a target determination unit, used to take the face image found as the target face image.

[0201] In this way, based on the object recognition device 600, after at least two sets of facial image sequences are acquired, only the target facial image sequence used for pre-identification is transmitted to the server for detection, yielding image information of the facial images that meet the predetermined recognition conditions. The target facial images that match the image information are then selected from the at least two sets of facial image sequences and sent to the server for recognition. This avoids transmitting a large amount of facial image data directly to the server, effectively reducing traffic consumption in the object recognition process. Because the server performs object recognition only on the target facial images that match the image information, recognition efficiency improves as well.

[0202] As shown in Figure 11, the object recognition device 700 may include a detection request receiving module 710, a detection module 720, a sending module 730, a recognition request receiving module 740, and a recognition response module 750. The object recognition device 700 can be applied to a server.

[0203] The detection request receiving module 710 can be used to receive a detection request sent by a terminal carrying a target face image sequence, wherein the target face image sequence originates from at least two sets of face image sequences of a target object, each set of face image sequences includes at least one face image, and each set of face image sequences corresponds to an image type; the detection module 720 can be used to detect image information of face images that meet predetermined recognition conditions from the target face image sequence according to the detection request; the sending module 730 can be used to transmit the image information to the terminal, so that the terminal can filter out target face images that match the image information from the at least two sets of face image sequences; the recognition request receiving module 740 can be used to receive a recognition request sent by the terminal carrying the target face image; the recognition response module 750 can be used to recognize the target object based on the target face image according to the recognition request.

[0204] In some embodiments of this application, the detection module 720 includes: a feature extraction unit, used to perform feature extraction processing on the face images in the target face image sequence to obtain image feature information of the face images in the target image sequence; a recognition degree analysis unit, used to perform recognition degree analysis based on the image feature information of the face images in the target face image sequence to obtain the recognition degree of the face images in the target face image sequence; and an information extraction unit, used to extract image information corresponding to face images whose recognition degree meets predetermined recognition conditions from the target face image sequence.
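
The three units of paragraph [0204] form a simple pipeline: extract features, score each image, and keep the images whose score meets the condition. The sketch below is illustrative only. This passage does not define how the recognition degree is computed, so pixel contrast is used as a placeholder feature, the mapping to a score in [0, 1) is arbitrary, and the names `extract_features`, `recognition_degree`, `detect_positions`, and the threshold value are all hypothetical.

```python
from statistics import pvariance


def extract_features(image: list[list[int]]) -> dict[str, float]:
    """Placeholder feature extraction: contrast as the variance of grayscale pixels."""
    pixels = [p for row in image for p in row]
    return {"contrast": pvariance(pixels)}


def recognition_degree(features: dict[str, float]) -> float:
    """Placeholder score in [0, 1): higher contrast gives a higher degree."""
    contrast = features["contrast"]
    return contrast / (contrast + 1000.0)


def detect_positions(sequence: list[list[list[int]]], threshold: float = 0.5) -> list[int]:
    """Return the positions of images whose recognition degree meets the threshold."""
    positions = []
    for index, image in enumerate(sequence):
        degree = recognition_degree(extract_features(image))
        if degree >= threshold:
            positions.append(index)
    return positions
```

Returning positions rather than the images themselves keeps the response small, which matches the traffic-saving intent described for the device.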

[0205] In some embodiments of this application, the target face image includes a first face image of color type, a second face image of depth type, and a third face image of infrared type; the recognition response module 750 includes: a first response unit, used to perform liveness detection based on the second face image and the third face image to determine the authenticity of the target object; and a second response unit, used to perform facial feature comparison recognition based on the first face image when the authenticity meets the target authenticity condition, so as to identify the target object.
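
The two response units of paragraph [0205] amount to a gate: facial feature comparison on the color image runs only if liveness detection on the depth and infrared images passes. A minimal sketch follows, assuming the authenticity is a score compared against a threshold. The liveness model and the feature matcher are passed in as callables because the application does not specify them here; `recognize_target`, `liveness_score`, `match_identity`, and `min_liveness` are hypothetical names.

```python
from typing import Any, Callable, Optional


def recognize_target(
    color_image: Any,
    depth_image: Any,
    infrared_image: Any,
    liveness_score: Callable[[Any, Any], float],
    match_identity: Callable[[Any], str],
    min_liveness: float = 0.8,
) -> Optional[str]:
    """Run liveness detection first; compare facial features only if it passes."""
    # First response unit: authenticity from the depth and infrared images.
    score = liveness_score(depth_image, infrared_image)
    if score < min_liveness:
        return None  # authenticity condition not met, skip comparison
    # Second response unit: identity from the color image.
    return match_identity(color_image)
```

Ordering the checks this way means the comparatively expensive feature comparison is never run on an input that fails the liveness check.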

[0206] In this way, based on the object recognition device 700, of the at least two sets of facial image sequences, the server detects only the transmitted target facial image sequence to obtain image information of the facial images that meet the predetermined recognition conditions, and transmits that image information to the terminal. The terminal then selects the target facial images that match the image information from the at least two sets of facial image sequences. The server therefore receives only a small number of target facial images for object recognition rather than a large amount of facial image data, which effectively reduces traffic consumption during the object recognition process. Because the server performs object recognition only on the target facial images that match the image information, recognition is fast and server load is low, so recognition efficiency improves as well.

[0207] It should be noted that although several modules or units for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of this application, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.

[0208] Furthermore, embodiments of this application also provide an electronic device, which can be a terminal or a server. Figure 12 shows a structural schematic diagram of the electronic device involved in the embodiments of this application, specifically:

[0209] The electronic device may include components such as a processor 801 with one or more processing cores, a memory 802 with one or more computer-readable storage media, a power supply 803, and an input unit 804. Those skilled in the art will understand that the electronic device structure shown in Figure 12 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or have a different component arrangement. Wherein:

[0210] The processor 801 is the control center of the electronic device. It connects to various parts of the electronic device via various interfaces and lines. By running or executing software programs and/or modules stored in the memory 802, and by calling data stored in the memory 802, it performs various functions of the electronic device and processes data, thereby providing overall monitoring of the electronic device. Optionally, the processor 801 may include one or more processing cores; preferably, the processor 801 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 801.

[0211] The memory 802 can be used to store software programs and modules. The processor 801 executes various functional applications and data processing by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, application programs required for at least one function (such as sound playback function, image playback function, etc.), etc.; the data storage area may store data created according to the use of the computer device, etc. In addition, the memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.

[0212] The electronic device also includes a power supply 803 that supplies power to the various components. Preferably, the power supply 803 can be logically connected to the processor 801 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system. The power supply 803 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components.

[0213] The electronic device may also include an input unit 804, which can be used to receive input digital or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

[0214] Although not shown, the electronic device may also include a display unit, etc., which will not be described in detail here. Specifically, in this embodiment, the processor 801 in the electronic device loads the executable files corresponding to the processes of one or more application programs into the memory 802 according to the following instructions, and the processor 801 runs the application programs stored in the memory 802 to realize various functions, such as the processor 801 executing:

[0215] Acquiring at least two sets of facial image sequences of a target object, each set including at least one facial image and corresponding to an image type; determining a target facial image sequence for pre-identification from the at least two sets of facial image sequences; sending a detection request carrying the target facial image sequence to a server, the detection request instructing the server to detect image information of facial images that meet predetermined recognition conditions from the target facial image sequence; receiving the image information returned by the server based on the detection request; filtering out target facial images that match the image information from the at least two sets of facial image sequences; and sending a recognition request carrying the target facial image to the server, instructing the server to recognize the target object based on the target facial image.
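
The terminal-side steps of paragraph [0215] can be sketched end to end as follows. This is a rough illustration, not the implementation of the application: the `Server` interface, its `detect` and `recognize` methods, the function `run_terminal_flow`, and the choice of the color sequence as the default target are all assumptions, and the image information is assumed to be a list of frame indices.

```python
from typing import Protocol


class Server(Protocol):
    """Hypothetical server interface used only for this sketch."""

    def detect(self, target_sequence: list) -> list[int]: ...

    def recognize(self, target_images: dict[str, list]) -> str: ...


def run_terminal_flow(
    sequences: dict[str, list],
    server: Server,
    target_type: str = "color",
) -> str:
    """Two-round exchange: one sequence for detection, then only the selected frames."""
    # Determine the target sequence for pre-identification.
    target_sequence = sequences[target_type]
    # Send the detection request and receive the image information (frame indices).
    positions = server.detect(target_sequence)
    # Filter the matching frames out of every acquired sequence.
    target_images = {
        image_type: [frames[i] for i in positions]
        for image_type, frames in sequences.items()
    }
    # Send the recognition request carrying only the target images.
    return server.recognize(target_images)
```

Only one full sequence crosses the network in the first round, and the second round carries just the selected frames from each sequence, which is the source of the traffic reduction described above.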

[0216] In one embodiment, the processor 801 may execute: receiving a detection request from a terminal carrying a target face image sequence, wherein the target face image sequence originates from at least two sets of face image sequences of a target object, each set of face image sequences including at least one face image, and each set of face image sequences corresponding to an image type; detecting image information of face images that meet predetermined recognition conditions from the target face image sequence according to the detection request; transmitting the image information to the terminal so that the terminal can filter out target face images that match the image information from the at least two sets of face image sequences; receiving a recognition request from the terminal carrying the target face image; and recognizing the target object based on the target face image according to the recognition request.

[0217] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by a computer program, or by a computer program controlling related hardware. The computer program can be stored in a computer-readable storage medium and loaded and executed by a processor.

[0218] Therefore, embodiments of this application also provide a storage medium storing a computer program that can be loaded by a processor to execute the steps in any of the methods provided in embodiments of this application.

[0219] The storage medium may include: read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.

[0220] Since the computer program stored in the storage medium can execute the steps of any of the methods provided in the embodiments of this application, the beneficial effects that the methods provided in the embodiments of this application can achieve can be realized. For details, please refer to the previous embodiments, which will not be repeated here.

[0221] According to one aspect of this application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the methods provided in the various optional implementations of the above embodiments of this application.

[0222] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein.

[0223] It should be understood that this application is not limited to the embodiments described above and shown in the accompanying drawings, but various modifications and changes can be made without departing from its scope.

Claims

1. An object recognition method, characterized in that it comprises: acquiring at least two sets of face image sequences of a target object, wherein each set of face image sequences includes at least one face image and corresponds to an image type; determining, from the at least two sets of face image sequences, a target face image sequence for pre-identification; sending a detection request carrying the target face image sequence to a server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the target face image sequence, wherein detecting face images that meet predetermined recognition conditions means performing recognition degree detection on the face images in the target face image sequence, determining the recognition degree of each face image in the target face image sequence, thereby determining the face images that meet the predetermined recognition conditions, and extracting image information of the face images that meet the predetermined recognition conditions; receiving the image information returned by the server based on the detection request; selecting, from the at least two sets of face image sequences, a target face image that matches the image information; and sending a recognition request carrying the target face image to the server to instruct the server to recognize the target object based on the target face image.

2. The method according to claim 1, characterized in that, The acquisition of at least two sets of facial image sequences of the target object, each set of facial image sequences including at least one facial image, including: At least two sets of facial image streams of the target object are acquired, each set of facial image streams includes at least one initial facial image, and each set of facial image streams corresponds to an image type; The initial facial images at the same acquisition time point between the facial image streams are converted into facial images with pixels aligned to the same coordinate space. The facial images obtained by converting the initial facial image in each set of facial image streams are concatenated to generate the at least two sets of facial image sequences.

3. The method according to claim 2, characterized in that, The step of converting initial facial images from the facial image streams that correspond to the same acquisition time point into facial images with pixels aligned to the same coordinate space includes: The at least two sets of facial image streams are time-aligned to determine the initial facial images corresponding to the same exposure time between the facial image streams; The pixels in the initial face image with the same exposure time are aligned to the coordinate space corresponding to the target face image stream to obtain the face image with pixels aligned to the same coordinate space.

4. The method according to claim 3, characterized in that, the at least two sets of facial image streams include a color image stream, an infrared image stream, and a depth image stream, wherein the image type corresponding to the color image stream includes a color image, the image type corresponding to the infrared image stream includes an infrared image, and the image type corresponding to the depth image stream includes a depth image; aligning the pixels in the initial face images with the same exposure time to the coordinate space corresponding to the target face image stream to obtain the face images with pixels aligned to the same coordinate space includes: aligning the pixels in the initial face images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the color image stream, to obtain face images with pixels aligned to the same coordinate space; or aligning the pixels in the initial face images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the infrared image stream, to obtain face images with pixels aligned to the same coordinate space.

5. The method according to claim 1, characterized in that, The at least two sets of facial image sequences include color facial image sequences, and the image type corresponding to the color facial image sequences includes color images; Sending a detection request carrying the target face image sequence to the server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the target face image sequence, includes: A detection request carrying the color face image sequence is sent to the server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the color face image sequence.

6. The method according to claim 5, characterized in that, sending a detection request carrying the color face image sequence to the server includes: compressing the color face image sequence into compressed color image data in a target format; and sending a detection request carrying the compressed color image data to the server.

7. The method according to claim 1, characterized in that, The at least two sets of facial image sequences include an infrared facial image sequence and a depth facial image sequence corresponding to the same camera. The image type corresponding to the infrared facial image sequence includes an infrared image, and the image type corresponding to the depth facial image sequence includes a depth image. Sending a detection request carrying the target face image sequence to the server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the target face image sequence, includes: A detection request carrying the infrared face image sequence is sent to the server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the infrared face image sequence.

8. The method according to claim 7, characterized in that, sending a detection request carrying the infrared face image sequence to the server includes: compressing the infrared face image sequence into compressed infrared image data in a target format; and sending a detection request carrying the compressed infrared image data to the server.

9. The method according to claim 1, characterized in that, the image information includes the position information, in the target face image sequence, of the face image that meets the predetermined recognition conditions; the step of filtering out a target face image that matches the image information from the at least two sets of face image sequences includes: searching for a face image corresponding to the position information in each set of face image sequences; and using the searched face image as the target face image.

10. An object recognition method, characterized in that it comprises: receiving a detection request sent by a terminal and carrying a target face image sequence, wherein the target face image sequence is derived from at least two sets of face image sequences of a target object, each set of face image sequences includes at least one face image, and each set of face image sequences corresponds to an image type; detecting, according to the detection request, image information of face images that meet predetermined recognition conditions from the target face image sequence, wherein detecting face images that meet predetermined recognition conditions means performing recognition degree detection on the face images in the target face image sequence, determining the recognition degree of each face image in the target face image sequence, thereby determining the face images that meet the predetermined recognition conditions, and extracting image information of the face images that meet the predetermined recognition conditions; transmitting the image information to the terminal so that the terminal can filter out a target face image that matches the image information from the at least two sets of face image sequences; receiving a recognition request sent by the terminal and carrying the target face image; and recognizing, according to the recognition request, the target object based on the target face image.

11. The method according to claim 10, characterized in that, the step of detecting image information of face images that meet predetermined recognition conditions from the target face image sequence includes: performing feature extraction processing on the face images in the target face image sequence to obtain image feature information of the face images in the target face image sequence; performing recognition degree analysis based on the image feature information of the face images in the target face image sequence to obtain the recognition degree of the face images in the target face image sequence; and extracting, from the target face image sequence, image information corresponding to face images whose recognition degree meets the predetermined recognition conditions.

12. The method according to claim 10, characterized in that, The target face image includes a first face image of color type, a second face image of depth type, and a third face image of infrared type; The process of identifying the target object based on the target facial image includes: Liveness detection is performed based on the second and third facial images to determine the realism of the target object; When the realism meets the target realism condition, facial feature comparison and recognition are performed based on the first facial image to identify the target object.

13. An object recognition device, characterized in that, include: The acquisition module is used to acquire at least two sets of facial image sequences of the target object, each set of facial image sequences including at least one facial image, and each set of facial image sequences corresponds to an image type; A determining module is used to determine a target face image sequence for pre-identification from the at least two sets of face image sequences; The transmission module is used to send a detection request carrying the target face image sequence to the server. The detection request instructs the server to detect image information of face images that meet predetermined recognition conditions from the target face image sequence. The detection of face images that meet predetermined recognition conditions refers to performing recognition degree detection on face images in the target face image sequence, determining the recognition degree of each face image in the target face image sequence, thereby determining face images that meet predetermined recognition conditions, and extracting image information of face images that meet predetermined recognition conditions. The acquisition module is used to receive the image information returned by the server based on the detection request; A filtering module is used to filter out target face images that match the image information from the at least two sets of face image sequences; The recognition module is used to send a recognition request carrying the target face image to the server, so as to instruct the server to recognize the target object based on the target face image.

14. The apparatus according to claim 13, characterized in that, The acquisition module includes: An image stream acquisition unit is used to acquire at least two sets of facial image streams of a target object. Each set of facial image streams includes at least one initial facial image, and each set of facial image streams corresponds to an image type. The alignment unit is used to convert the initial face images at the same acquisition time point between the face image streams into face images with pixels aligned to the same coordinate space. The sequence generation unit is used to concatenate the face images obtained by converting the initial face image in each group of face image streams to generate the at least two groups of face image sequences.

15. The apparatus according to claim 14, characterized in that, The alignment unit includes: A time alignment subunit is used to perform time alignment processing on the at least two sets of face image streams to determine the initial face images corresponding to the same exposure time between the face image streams; The spatial alignment subunit is used to align the pixels in the initial face image corresponding to the same exposure time to the coordinate space corresponding to the target face image stream, so as to obtain the face image with the pixels aligned to the same coordinate space.

16. The apparatus according to claim 15, characterized in that, The at least two sets of facial image streams include a color image stream, an infrared image stream, and a depth image stream. The image type corresponding to the color image stream includes a color image, the image type corresponding to the infrared image stream includes an infrared image, and the image type corresponding to the depth image stream includes a depth image. The spatial alignment subunit includes one of a first alignment subunit and a second alignment subunit. The first alignment subunit is configured to: align pixels in the initial face images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the color image stream, thereby obtaining a face image with pixels aligned to the same coordinate space; and The second alignment subunit is used to: align the pixels in the initial face images corresponding to the same exposure time in the color image stream, infrared image stream, and depth image stream to the coordinate space corresponding to the infrared image stream, so as to obtain one type of face image with pixels aligned to the same coordinate space.

17. The apparatus according to claim 13, characterized in that, At least two sets of facial image sequences include color facial image sequences, wherein the image type corresponding to the color facial image sequences includes color images; the transmission module includes: The first transmission unit is configured to send a detection request carrying the color face image sequence to the server, the detection request instructing the server to detect image information of face images that meet predetermined recognition conditions from the color face image sequence.

18. The apparatus according to claim 17, characterized in that, The first transmission unit is configured to: compress the color face image sequence to compress the color face image sequence into compressed color image data in a target format; and send a detection request carrying the compressed color image data to the server.

19. The apparatus according to claim 13, characterized in that, The at least two sets of facial image sequences include an infrared facial image sequence and a depth facial image sequence corresponding to the same camera. The image type corresponding to the infrared facial image sequence includes an infrared image, and the image type corresponding to the depth facial image sequence includes a depth image. The transmission module includes: The second transmission unit is used to send a detection request carrying the infrared face image sequence to the server. The detection request instructs the server to detect image information of face images that meet predetermined recognition conditions from the infrared face image sequence.

20. The apparatus according to claim 19, characterized in that, the second transmission unit is used to: compress the infrared face image sequence into compressed infrared image data in a target format; and send a detection request carrying the compressed infrared image data to the server.

21. The apparatus according to claim 13, characterized in that, the image information includes the position information, in the target face image sequence, of the face image that meets the predetermined recognition conditions; the filtering module includes: a search unit, used to search for a face image corresponding to the position information in each set of face image sequences; and a target determination unit, used to use the searched face image as the target face image.

22. An object recognition device, characterized in that, include: The detection request receiving module is used to receive a detection request sent by the terminal carrying a target face image sequence. The target face image sequence is derived from at least two sets of face image sequences of the target object. Each set of face image sequences includes at least one face image, and each set of face image sequences corresponds to an image type. The detection module is used to detect image information of face images that meet predetermined recognition conditions from the target face image sequence according to the detection request; wherein, detecting face images that meet predetermined recognition conditions means performing recognition degree detection on face images in the target face image sequence, determining the recognition degree of each face image in the target face image sequence, thereby determining face images that meet predetermined recognition conditions, and extracting image information of face images that meet predetermined recognition conditions. A sending module is used to transmit the image information to the terminal, so that the terminal can filter out a target face image that matches the image information from the at least two sets of face image sequences; The recognition request receiving module is used to receive a recognition request sent by the terminal carrying the target face image; The recognition response module is used to recognize the target object based on the target face image according to the recognition request.

23. The apparatus according to claim 22, characterized in that, The detection module includes: The feature extraction unit is used to perform feature extraction processing on the face images in the target face image sequence to obtain image feature information of the face images in the target image sequence; The recognition analysis unit is used to perform recognition analysis based on the image feature information of the face images in the target face image sequence, and to obtain the recognition degree of the face images in the target face image sequence; The information extraction unit is used to extract image information corresponding to face images whose recognition degree meets predetermined recognition conditions from the target face image sequence.

24. The apparatus according to claim 22, characterized in that, The target face image includes a first face image of color type, a second face image of depth type, and a third face image of infrared type; The identification response module includes: The first response unit is used to perform liveness detection based on the second face image and the third face image to determine the realism of the target object; The second response unit is used to perform facial feature comparison and recognition based on the first facial image when the realism meets the target realism condition, so as to identify the target object.

25. An electronic device, characterized in that it comprises: a memory storing computer-readable instructions; and a processor that reads the computer-readable instructions stored in the memory to execute the method according to any one of claims 1 to 12.

26. A storage medium, characterized in that, It stores computer-readable instructions that, when executed by the processor of a computer, cause the computer to perform the method described in any one of claims 1 to 12.

27. A computer program product, characterized in that, The computer program product includes computer instructions stored in a computer-readable storage medium, a processor of a computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method described in any one of claims 1 to 12.