An image category recognition method, device and medium

By using local feature matching and semantic region determination, the problems of low recall and false positives in existing image recognition methods are solved, achieving more efficient recognition of sensitive content.

CN113762280B · Active · Publication Date: 2026-06-26 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date: 2021-04-23
Publication Date: 2026-06-26


Abstract

The application discloses an image category recognition method and device and a medium, and relates to the field of computer vision. The method comprises the following steps: performing feature extraction on a to-be-recognized image to obtain first feature points and a first feature descriptor of the to-be-recognized image; determining a candidate image from a target category image library according to the first feature descriptor of the to-be-recognized image; determining a key image from the candidate image according to the first feature points and the first feature descriptor of the to-be-recognized image and second feature points and a second feature descriptor of the candidate image, and determining matching feature points of the key image; determining a semantic region representing a target category in the key image, and determining that the category of the to-be-recognized image is the target category if the matching feature points of the key image fall into the semantic region. The scheme provided in the application can improve the precision and recall rate of image category recognition, avoid misjudgment caused by the matching feature points falling into a non-semantic region, and improve the quality and efficiency of content review.

Description

Technical Field

[0001] This application relates to the field of computer vision, specifically to an image category recognition method, apparatus, and medium.

Background Technology

[0002] Artificial Intelligence (AI) is a comprehensive technology within computer science that studies the design principles and implementation methods of various intelligent machines, enabling them to possess perception, reasoning, and decision-making capabilities. AI technology is an interdisciplinary field encompassing a wide range of areas, including computer vision, natural language processing, machine learning, and deep learning. With technological advancements, AI will be applied in more fields and play an increasingly important role.

[0003] Computer vision is the science that studies how to enable machines to "see." Computer vision technology can be used to quickly identify the content and category of large numbers of images. With the development of content industries such as news feeds and short videos, the amount of image and video content uploaded by users on the internet is increasing. Because the content uploaded by users is diverse and some users deliberately upload images containing sensitive content to attract attention, it is necessary to review this image and video content to create a safe and healthy online environment. Images identified as containing sensitive content are used as seed images in an image library, allowing for matching and identification of newly uploaded images. However, in the matching and identification process, whether global feature matching or local feature matching is used, the accuracy and efficiency of the identification still need improvement.

Summary of the Invention

[0004] To improve the accuracy and efficiency of image category recognition, this application provides an image category recognition method, apparatus, and medium. The specific technical solution is as follows:

[0005] In a first aspect, this application provides an image category recognition method, the method comprising:

[0006] An image to be identified is acquired, and features are extracted from the image to be identified to obtain a first feature point and a first feature descriptor of the image to be identified, wherein there is a correspondence between the first feature point and the first feature descriptor;

[0007] Based on the first feature descriptor of the image to be identified, candidate images that meet the first matching condition are determined from the target category image library;

[0008] Based on the first feature points and first feature descriptor of the image to be identified, and the second feature points and second feature descriptor of the candidate image, a key image that satisfies the second matching condition is determined from the candidate image, and the matching feature points of the key image are determined;

[0009] The semantic region representing the target category in the key image is determined. If the matching feature points of the key image fall within the semantic region, the category of the image to be identified is determined to be the target category.
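The final decision step above can be illustrated with a minimal Python sketch. It is not the implementation of this application: it assumes that the semantic region is an axis-aligned rectangle and that the decision uses a minimum fraction of matching feature points inside that rectangle. The names `points_in_region` and `is_target_category` and the `min_ratio` threshold are hypothetical and are introduced only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed region shape: (x_min, y_min, x_max, y_max).
Rect = Tuple[float, float, float, float]


def points_in_region(points: List[Point], region: Rect) -> int:
    """Count the points that lie inside the rectangle (borders included)."""
    x_min, y_min, x_max, y_max = region
    return sum(1 for x, y in points if x_min <= x <= x_max and y_min <= y <= y_max)


def is_target_category(matched_points: List[Point], region: Rect,
                       min_ratio: float = 0.5) -> bool:
    """Hypothetical rule: accept when enough matched points fall in the region."""
    if not matched_points:
        return False
    inside = points_in_region(matched_points, region)
    return inside / len(matched_points) >= min_ratio


region = (10, 10, 50, 50)
# Two of three matching points fall inside the region.
print(is_target_category([(12, 15), (30, 40), (90, 90)], region))
# No matching point falls inside the region.
print(is_target_category([(80, 80), (90, 95)], region))
```

With this rule, an image whose matches land mostly outside the semantic region of the key image is not assigned the target category, even if the number of matches is large.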

[0010] In a second aspect, this application provides an image category recognition device, the device comprising:

[0011] The feature extraction module is used to acquire an image to be identified, and to extract features from the image to be identified to obtain a first feature point and a first feature descriptor of the image to be identified, wherein the first feature point and the first feature descriptor have a corresponding relationship.

[0012] The first matching module is used to determine candidate images that meet the first matching conditions from the target category image library based on the first feature descriptor of the image to be identified;

[0013] The second matching module is used to determine a key image that meets the second matching condition from the candidate images based on the first feature points and first feature descriptors of the image to be identified and the second feature points and second feature descriptors of the candidate images, and to determine the matching feature points of the key images.

[0014] The category recognition module is used to determine the semantic region representing the target category in the key image. If the matching feature points of the key image fall within the semantic region, the category of the image to be recognized is determined to be the target category.

[0015] In a third aspect, this application provides a computer-readable storage medium storing at least one instruction or at least one program, which is loaded and executed by a processor to implement an image category recognition method as described in the first aspect.

[0016] In a fourth aspect, this application provides a computer device including a processor and a memory, wherein the memory stores at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the processor to implement an image category recognition method as described in the first aspect.

[0017] In a fifth aspect, this application provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform an image category recognition method as described in the first aspect.

[0018] The image category recognition method, apparatus, and storage medium provided in this application have the following technical effects:

[0019] The solution provided in this application extracts local feature information (i.e., first feature points and first feature descriptors) from the image to be identified, and uses it to find candidate images in the target category image library that are locally similar to the image to be identified. Compared with global feature matching, the candidates found in this way have higher local similarity to the image to be identified. The solution then selects, from the candidate images, key images with still higher local similarity to the image to be identified, thereby improving the accuracy and recall of the final image category identification. On top of local feature matching, the solution decides whether the image to be identified belongs to the same category as the key image according to the semantic region of the key image that carries the category-specific information and the positions of the matching feature points. Compared with local feature matching alone, this avoids misjudgments caused by matching feature points falling into regions of the key image that do not carry the category semantics, thereby further improving the quality and efficiency of content review.

[0020] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.

Attached Figure Description

[0021] To more clearly illustrate the technical solutions and advantages in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0022] Figure 1 is a schematic diagram of the implementation environment of an image category recognition method provided in an embodiment of this application;

[0023] Figure 2 is a flowchart illustrating an image category recognition method provided in an embodiment of this application;

[0024] Figure 3 is a schematic diagram of a feature extraction process for an image to be recognized, provided in an embodiment of this application;

[0025] Figure 4 is a schematic diagram illustrating the effect of feature extraction on an image to be recognized, provided in an embodiment of this application;

[0026] Figure 5 is a schematic diagram of a process for constructing an image library retrieval table provided in an embodiment of this application;

[0027] Figure 6 is a schematic diagram of another process for constructing an image library retrieval table provided in an embodiment of this application;

[0028] Figure 7 is a schematic diagram of a process for matching candidate images provided in an embodiment of this application;

[0029] Figure 8 is a schematic diagram illustrating the effect of local feature matching provided in an embodiment of this application;

[0030] Figure 9 is a schematic diagram of a process for matching key images provided in an embodiment of this application;

[0031] Figure 10 is a schematic diagram of a feature point matching process provided in an embodiment of this application;

[0032] Figure 11 is a flowchart illustrating a process for determining the category of an image to be identified based on semantic regions, as provided in an embodiment of this application;

[0033] Figure 12 is a schematic diagram of a method flow in a specific application scenario provided by an embodiment of this application;

[0034] Figure 13 is a schematic diagram of another specific application process provided in an embodiment of this application;

[0035] Figure 14 is a schematic diagram of an image category recognition device provided in an embodiment of this application;

[0036] Figure 15 is a schematic diagram of another image category recognition device provided in an embodiment of this application;

[0037] Figure 16 is a schematic diagram of the hardware structure of a device for implementing an image category recognition method provided in an embodiment of this application.

Detailed Implementation

[0038] Artificial Intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to have perception, reasoning, and decision-making capabilities. AI technology is a comprehensive discipline involving a wide range of fields, encompassing both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing technology, operating / interactive systems, and mechatronics.

[0039] The solutions provided in this application involve artificial intelligence technologies such as computer vision (CV) and deep learning (DL).

[0040] Computer vision is the science that studies how to enable machines to "see." More specifically, it refers to machine vision, which uses cameras and computers to replace human eyes in recognizing, tracking, and measuring targets, and then performs image processing to create images more suitable for human observation or transmission to instruments. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content / behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), and common biometric recognition technologies such as facial recognition and fingerprint recognition.

[0041] Deep learning (DL) is a major research direction in the field of machine learning (ML), bringing it closer to its original goal—artificial intelligence. Deep learning learns the inherent patterns and hierarchical representations of sample data; the information gained during this learning process greatly aids in interpreting data such as text, images, and sound. Its ultimate goal is to enable machines to possess analytical and learning capabilities like humans, capable of recognizing data such as text, images, and sound. Deep learning is a complex machine learning algorithm that has achieved results in speech and image recognition far exceeding previous related technologies. Deep learning has yielded significant achievements in search technology, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech recognition, recommendation and personalization technologies, and other related fields. Deep learning enables machines to mimic human activities such as sight, hearing, and thought, solving many complex pattern recognition problems and significantly advancing artificial intelligence-related technologies.

[0042] The solutions provided in this application can be deployed in the cloud, and also involve cloud technologies.

[0043] Cloud technology refers to a hosting technology that unifies hardware, software, and network resources within a wide area network (WAN) or local area network (LAN) to achieve data computation, storage, processing, and sharing. It can also be understood as a general term for network technologies, information technologies, integration technologies, management platform technologies, and application technologies based on cloud computing business models. These technologies can form resource pools, allowing for on-demand use and flexibility. Backend services of network systems, such as video websites, image websites, and many portal websites, require substantial computing and storage resources. With the rapid development and application of the internet industry, every item may have its own identification mark in the future, requiring transmission to backend systems for logical processing. Data at different levels will be processed separately, and various industry data require robust system support; therefore, cloud technology relies on cloud computing as its foundation. Cloud computing is a computing model that distributes computing tasks across a resource pool composed of numerous computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network providing these resources is called the "cloud." From the user's perspective, resources in the "cloud" are infinitely scalable, readily available, and can be used on demand, expanded at any time, and paid for based on usage. As a provider of fundamental cloud computing capabilities, a cloud resource pool platform, often referred to as a cloud platform or Infrastructure as a Service (IaaS), is established. This platform deploys various types of virtual resources within the resource pool for external customers to choose from. The cloud resource pool primarily includes: computing devices (which can be virtualized machines containing operating systems), storage devices, and network devices.

[0044] Existing image matching and recognition methods are mainly divided into two categories: global feature matching and local feature matching. Global feature matching methods represent an image as a feature vector and calculate the similarity between feature vectors to achieve image matching. Typical methods include pHash (perceptual hashing) algorithm (which transforms an input of arbitrary length into a fixed-length output, called the hash value) and feature extraction methods based on deep neural networks (DNNs). Local feature matching methods first extract key points from the image, then represent each key point as a feature vector. The similarity between images is represented by calculating the matching relationship of local key point features, thus completing image matching. Typical methods include Scale-Invariant Feature Transform (SIFT) algorithm and Speeded-Up Robust Features (SURF) algorithm.

[0045] Because an image often contains a wealth of information, two similar images are not necessarily similar in every aspect. More often, they share similar local regions while other areas are dissimilar. Global feature matching methods can only consider global image similarity and cannot handle local similarities, thus often resulting in low recall. While local feature matching methods can represent local matching relationships between images, for images containing specific categories of content, such as violent, pornographic, or vulgar images, not all regions necessarily represent that specific type of content. The similar parts of two images may both contain non-vulgar content, leading to many false positives when using local feature matching methods for image matching.

[0046] To improve the accuracy and efficiency of image category recognition, embodiments of this application provide an image category recognition method, apparatus, and medium. The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout.

[0047] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.

[0048] To facilitate understanding of the technical solutions and their effects described in the embodiments of this application, the relevant technical terms are explained in the embodiments of this application:

[0049] ORB (Oriented FAST and Rotated BRIEF) is a fast feature extraction and description algorithm. The ORB algorithm consists of two parts: feature point extraction and feature point description. Feature extraction is derived from the FAST (Features from Accelerated Segment Test) algorithm, while feature point description is an improvement upon the BRIEF (Binary Robust Independent Elementary Features) feature description algorithm. The ORB algorithm combines the FAST feature point detection method with the BRIEF feature descriptor, and improves and optimizes upon their original versions. The most significant feature of the ORB algorithm is its high computational speed. This is primarily due to the use of FAST for feature point detection, and secondly, the use of the BRIEF algorithm to calculate the descriptor. The unique binary string representation of the descriptor not only saves storage space but also significantly reduces matching time.

[0050] FAST (Features from Accelerated Segment Test) is a corner detection algorithm. Its basic principle is: if a pixel differs in some attribute from a sufficient number of contiguous pixels in its surrounding neighborhood, and the difference is greater than a specified threshold, the pixel is considered distinguishable from its neighbors and can be used as a feature point (corner point). For grayscale images, the attribute examined by the FAST algorithm is the grayscale difference between the pixel and its neighborhood.
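The segment test described above can be sketched in plain Python as follows. This is a simplified illustration, not an optimized detector: the radius-3 circle of 16 pixels, the run length `n = 9`, and the `threshold` value are common choices assumed here rather than taken from this application, and `is_fast_corner` is a hypothetical helper name.

```python
from typing import List

# 16 pixel offsets (dx, dy) on a circle of radius 3, listed in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img: List[List[int]], x: int, y: int,
                   threshold: int = 20, n: int = 9) -> bool:
    """Segment test: True if n contiguous circle pixels are all brighter or all darker."""
    center = img[y][x]
    # Label each circle pixel: +1 brighter, -1 darker, 0 similar to the centre.
    labels = []
    for dx, dy in CIRCLE:
        value = img[y + dy][x + dx]
        if value > center + threshold:
            labels.append(1)
        elif value < center - threshold:
            labels.append(-1)
        else:
            labels.append(0)
    # Walk the circle twice so that runs wrapping past the start are counted.
    for sign in (1, -1):
        run = 0
        for label in labels + labels:
            run = run + 1 if label == sign else 0
            if run >= n:
                return True
    return False


flat = [[100] * 7 for _ in range(7)]                   # uniform patch
spot = [row[:] for row in flat]
spot[3][3] = 200                                       # one bright pixel in the centre
edge = [[0, 0, 0, 0, 255, 255, 255] for _ in range(7)]  # vertical step edge
print(is_fast_corner(flat, 3, 3), is_fast_corner(spot, 3, 3), is_fast_corner(edge, 3, 3))
```

The step edge is rejected because only 7 of the 16 circle pixels differ from the centre, which is fewer than the 9 contiguous pixels required.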

[0051] BRIEF stands for Binary Robust Independent Elementary Features. BRIEF is a feature descriptor only; it does not provide a method for detecting feature points. Rather than first computing a descriptor and then encoding it, BRIEF produces a binary string directly. The algorithm works on a smoothed image: it selects a set of nd pixel pairs (x, y) in a specific way and compares the grayscale values of the two pixels in each pair. For example, suppose the two pixels of the first pair have grayscale values p and q. If p is less than q, the resulting bit is 1; otherwise, it is 0. Repeating this for all nd pairs yields an nd-dimensional binary string, where nd can be 128, 256, or 512.
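The pairwise comparison can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the pixel pairs are drawn uniformly at random with a fixed seed, the smoothing step is omitted, and `make_pairs` and `brief_descriptor` are hypothetical helper names rather than functions of any library.

```python
import random
from typing import List, Tuple

Pair = Tuple[Tuple[int, int], Tuple[int, int]]


def make_pairs(nd: int, patch_size: int, seed: int = 0) -> List[Pair]:
    """Draw nd fixed pixel pairs inside a square patch (uniform sampling assumed)."""
    rng = random.Random(seed)

    def pick() -> Tuple[int, int]:
        return rng.randrange(patch_size), rng.randrange(patch_size)

    return [(pick(), pick()) for _ in range(nd)]


def brief_descriptor(patch: List[List[int]], pairs: List[Pair]) -> str:
    """One bit per pair: '1' if the first pixel is darker than the second, else '0'."""
    bits = []
    for (x1, y1), (x2, y2) in pairs:
        bits.append("1" if patch[y1][x1] < patch[y2][x2] else "0")
    return "".join(bits)


# An 8x8 synthetic patch; in practice the patch would be cut from a smoothed image.
patch = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
pairs = make_pairs(nd=128, patch_size=8)
descriptor = brief_descriptor(patch, pairs)
print(len(descriptor), descriptor[:16])
```

The same set of pairs must be reused for every feature point, otherwise the bits of two descriptors would not be comparable.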

[0052] SIFT: Scale-Invariant Feature Transform; it is a scale-space based image local feature descriptor that is invariant to image scaling, rotation, and even affine transformations. The advantages of the SIFT algorithm are feature stability, invariance to rotation, scaling, and brightness changes, and a certain degree of stability against viewpoint changes and noise. The disadvantages are low real-time performance and weak ability to extract feature points from objects with smooth edges.

[0053] SURF (Speeded Up Robust Features) is a robust local feature point detection and description algorithm. Like SIFT, SURF's basic process can be divided into three parts: local feature point extraction, feature point description, and feature point matching. SURF improves feature extraction and description methods, using a more efficient approach. This is achieved through two key improvements: the use of integral images on the Hessian matrix and the use of dimensionality-reduced feature descriptors.

[0054] Hamming distance: In information theory, the Hamming distance between two strings of equal length is the number of distinct characters at corresponding positions in the two strings. In other words, it is the number of characters that need to be replaced to transform one string into the other.
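As a small illustration, the definition can be written in Python for equal-length strings, together with the XOR form that is commonly used when binary descriptors are packed into integers. Both function names are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def hamming_distance_bits(a: int, b: int) -> int:
    """Same distance for bit strings packed into integers: XOR, then count set bits."""
    return bin(a ^ b).count("1")


print(hamming_distance("10110", "10011"))        # 2
print(hamming_distance_bits(0b10110, 0b10011))   # 2
```

The XOR form explains why binary descriptors are cheap to compare: the distance reduces to one bitwise operation and a bit count.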

[0055] Manhattan distance is a term from metric geometry that denotes the sum of the absolute differences between the coordinates of two points along each axis of a standard coordinate system.
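For points with any number of coordinates, the definition amounts to the following short Python function (the name `manhattan_distance` is chosen here for illustration).

```python
from typing import Sequence


def manhattan_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of the absolute coordinate differences between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p, q))


print(manhattan_distance((1, 2), (4, 6)))  # |1 - 4| + |2 - 6| = 7
```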

[0056] Random Sample Consensus (RANSAC) is an algorithm that iteratively estimates the parameters of a mathematical model from a set of observed data containing outliers. RANSAC is a nondeterministic algorithm: it produces a reasonable result only with a certain probability, and this probability increases as more iterations are performed.
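The idea can be illustrated with a minimal Python sketch that fits a straight line to points containing outliers. The line model, the `tolerance` and `iterations` values, and the function name `ransac_line` are assumptions made for illustration; this passage does not specify which model is estimated in this application.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def ransac_line(points: List[Point], iterations: int = 100, tolerance: float = 0.5,
                seed: int = 0) -> Tuple[float, float, List[Point]]:
    """Fit y = a*x + b; return (a, b, inliers) for the model with the most inliers."""
    rng = random.Random(seed)
    best: Tuple[float, float, List[Point]] = (0.0, 0.0, [])
    for _ in range(iterations):
        # Two points are the minimal sample that defines a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Points close enough to the candidate line count as its inliers.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tolerance]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best


# Ten points on y = 2x + 1 plus three gross outliers.
data = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -25), (1, 18)]
a, b, inliers = ransac_line(data)
print(round(a, 3), round(b, 3), len(inliers))
```

Each iteration succeeds only if both sampled points are inliers, which is why the chance of finding the correct model grows with the number of iterations.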

[0057] Semantics: Semantics is the interpretation of data symbols. The three levels of semantics in images include: low-level semantics, such as the color and texture of pixels; mid-level semantics, such as the roughness, contrast, and compactness of image patches; and high-level semantics, such as information about the categories of objects contained in the image or image region.

[0058] Recall: Also known as the recall rate or recall ratio, recall is the ratio of the relevant information retrieved from a database to the total amount of relevant information in the database. The absolute value of recall is difficult to calculate and can only be estimated based on the database content and quantity.
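As a worked example, recall can be computed as the number of relevant items retrieved divided by the number of relevant items that exist; precision is shown alongside for contrast. The item identifiers below are made up for illustration.

```python
from typing import Set


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of all relevant items that were retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of retrieved items that are relevant."""
    if not retrieved:
        raise ValueError("precision is undefined when nothing was retrieved")
    return len(retrieved & relevant) / len(retrieved)


relevant_items = {"img_a", "img_b", "img_c", "img_d"}
retrieved_items = {"img_a", "img_b", "img_x"}
print(recall(retrieved_items, relevant_items))     # 2 of 4 relevant items found: 0.5
print(precision(retrieved_items, relevant_items))  # 2 of 3 retrieved items are relevant
```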

[0059] Please refer to Figure 1, which is a schematic diagram illustrating the implementation environment of an image category recognition method provided in an embodiment of this application. As shown in Figure 1, the implementation environment may include at least client 01 and server 02.

[0060] Specifically, the client 01 may include devices such as smartphones, desktop computers, tablets, laptops, digital assistants, smart wearable devices, monitoring devices, and voice interaction devices. It may also include software running on the device, such as web pages provided to users by service providers, or applications provided by those service providers. Specifically, the client 01 can be used to transmit images or videos uploaded by users to the network for identification by the server as the images to be recognized in this application.

[0061] Specifically, the server 02 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The server 02 may include a network communication unit, a processor, a memory, and the like. The client 01 and the server 02 can be directly or indirectly connected via wired or wireless communication, which is not limited herein. Specifically, the server 02 can be used to perform the image category recognition method provided in this application on the image to be recognized and determine whether it belongs to the target category, and further to intercept images of specific categories or downgrade them in recommendations, so as to keep network content safe and healthy.

[0062] This application embodiment can also be implemented using cloud technology. Cloud technology refers to a hosting technology that unifies hardware, software, and network resources within a wide area network (WAN) or local area network (LAN) to achieve data computation, storage, processing, and sharing. It can also be understood as a general term for network technologies, information technologies, integration technologies, management platform technologies, and application technologies based on cloud computing business models. Cloud technology requires cloud computing as its support. Cloud computing is a computing model that distributes computing tasks across a resource pool composed of a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network providing these resources is called the "cloud." Specifically, the server 02 and the database are located in the cloud. The server 02 can be a physical machine or a virtualized machine.

[0063] The following describes an image category recognition method provided in this application. Figure 2 is a flowchart illustrating an image category recognition method provided in an embodiment of this application. This application provides the operational steps of the method as described in the embodiments or flowchart, but more or fewer operational steps may be included based on conventional or non-inventive labor. The order of steps listed in the embodiments is merely one possible execution order among many and does not represent the only possible execution order. When an actual system or server product executes the method, the steps can be executed sequentially according to the embodiments or accompanying drawings, or in parallel (e.g., in a parallel processor or multi-threaded processing environment). Referring to Figure 2, an image category recognition method provided in an embodiment of this application may include the following steps:

[0064] S210: Obtain the image to be identified, and perform feature extraction on the image to be identified to obtain the first feature point and the first feature descriptor of the image to be identified, wherein the first feature point and the first feature descriptor have a corresponding relationship.

[0065] With the development of content industries such as news feeds and short videos, the amount of image and video content uploaded by users on the internet is increasing. Due to the diverse categories of user-uploaded content, images or videos may contain sensitive information, such as vulgar, pornographic, violent, or gory content. To address this, computer vision and related technologies offer approaches to reviewing image and video content (which can be viewed as multiple image frames). For example, an image library can be established containing seed images already identified as belonging to specific sensitive categories. Then, based on image matching methods, newly uploaded images or videos can be matched and identified. If a seed image is found, the newly uploaded image or video can be determined to contain sensitive content or belong to a specific sensitive category. Image matching can be mainly divided into grayscale-based matching and feature-based matching. Feature-based matching methods mainly include global feature matching and local feature matching. To address the low recall rate of global feature matching, and considering that sensitive content often appears in local areas of an image, the method provided in this application adopts local feature matching. First, local features are extracted from the image to obtain the image's feature points (in a simple understanding, feature points are relatively prominent points in the image, such as contour points, bright spots in darker areas, dark spots in brighter areas, etc.) and feature descriptors (characterizing the feature attributes of feature points). For each feature point, it can have information such as position, scale, and orientation.

[0066] In the embodiments of this application, the extraction of local image features can be performed using the ORB algorithm, or the SIFT algorithm, SURF algorithm, or a local feature extraction method based on deep learning, etc.

[0067] In one embodiment of this application, as shown in Figure 3, the step of extracting features from the image to be identified to obtain the first feature points and the first feature descriptor of the image to be identified may include the following steps:

[0068] S310: Based on feature extraction and description algorithms, the first feature points of the image to be identified are detected.

[0069] Feature extraction and description algorithms are used to detect and describe features in an image. In one feasible implementation, the ORB algorithm is preferably used for local feature extraction and description to obtain local feature information of the image to be identified, including a first feature point and a first feature descriptor. The "first" and subsequent "second" and "third" are used only to distinguish the image to which the feature point and feature descriptor belong. ORB uses an optimized FAST algorithm to extract feature points (i.e., corner points in FAST) and an optimized BRIEF algorithm to describe the feature points. This approach is fast, and the extracted feature descriptors are scale- and rotation-invariant, making it efficient and robust for local similarity matching.

[0070] Specifically, FAST determines whether a pixel is a corner point by analyzing the values of a contiguous run of pixels on a circle of a certain radius around it. In the ORB algorithm, the obtained FAST corner points are ranked by Harris corner response, and the subset of FAST corner points with the strongest responses is selected. To ensure scale invariance, the ORB algorithm extracts FAST feature points at multiple scales of an image pyramid. Furthermore, to ensure rotation invariance, a patch is taken around each FAST feature point, and the principal orientation of the feature point is determined by calculating the zeroth-order and first-order moments of that patch. Through these operations, the position and orientation of feature points at different scales can be obtained.
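By way of a non-limiting illustration, the orientation step can be sketched as follows. The sketch assumes the intensity-centroid formulation commonly associated with ORB, in which the moments are taken about the patch center and the orientation is the angle from the center to the centroid. The function name `patch_orientation` and the list-of-lists patch layout are hypothetical choices made for this example only.

```python
import math

def patch_orientation(patch):
    """Return the principal orientation (in radians) of a square grayscale patch.

    Moments are computed about the patch center; the angle points from the
    center toward the intensity centroid. Image y grows downward.
    """
    size = len(patch)
    center = (size - 1) / 2.0
    m00 = m10 = m01 = 0.0
    for row_index, row in enumerate(patch):
        for col_index, value in enumerate(row):
            x = col_index - center
            y = row_index - center
            m00 += value        # zeroth-order moment: total intensity
            m10 += x * value    # first-order moment along x
            m01 += y * value    # first-order moment along y
    if m00 == 0:
        return 0.0              # all-black patch: orientation is undefined
    return math.atan2(m01, m10)

# Patch that is brighter on its right side: the centroid lies on the +x axis.
bright_right = [
    [0, 0, 9],
    [0, 0, 9],
    [0, 0, 9],
]
angle_right = patch_orientation(bright_right)    # 0.0

# Patch that is brighter along its bottom row: the centroid lies on the +y axis.
bright_bottom = [
    [0, 0, 0],
    [0, 0, 0],
    [9, 9, 9],
]
angle_bottom = patch_orientation(bright_bottom)  # pi / 2
```

The zeroth-order moment is only needed here to detect an empty patch, since the angle depends on the ratio of the two first-order moments.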

[0071] S330: Generate a first feature descriptor corresponding to the first feature point, wherein the first feature descriptor is a binary string.

[0072] Specifically, the BRIEF algorithm selects N pairs of points around a feature point according to a certain pattern. It then generates an N-bit binary string by comparing the pixel values within each of these N pairs, and this string serves as the feature descriptor for that feature point. Here, N is a pre-defined parameter that fixes both the number of point pairs selected and the number of bits in the binary string. Understandably, binary encoding not only saves storage space but also facilitates the calculation of similarity between feature descriptors, significantly reducing matching time. While BRIEF has low computational complexity and high speed, it lacks rotational and scale invariance and is sensitive to noise. Therefore, when calculating the BRIEF feature descriptor, ORB establishes a two-dimensional coordinate system with the feature point as the origin and the line connecting the feature point and the centroid of the selected region as the X-axis, ensuring rotational invariance. For example, the feature descriptors for feature points A and B can be represented as A: 10101011 and B: 10101010.
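As a minimal, non-limiting sketch of the pairwise comparison described above, the following example builds an N-bit string from N point pairs. It deliberately omits the smoothing and the rotation of the sampling pattern that ORB applies, and the names `make_sampling_pattern` and `brief_descriptor`, the random pattern, and the bit convention (1 when the first pixel is darker) are assumptions made for illustration.

```python
import random

def make_sampling_pattern(n_pairs, radius, seed=0):
    """Return n_pairs of (row, col) offset pairs, fixed once and reused for every point."""
    rng = random.Random(seed)

    def random_offset():
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))

    return [(random_offset(), random_offset()) for _ in range(n_pairs)]

def brief_descriptor(image, keypoint, pattern):
    """Compare pixel pairs around keypoint and return the result as a bit string."""
    row, col = keypoint
    bits = []
    for (dr1, dc1), (dr2, dc2) in pattern:
        first = image[row + dr1][col + dc1]
        second = image[row + dr2][col + dc2]
        bits.append("1" if first < second else "0")
    return "".join(bits)

# 5x5 image whose intensity increases from left to right.
image = [[col for col in range(5)] for _ in range(5)]

# Hand-written pattern: (left vs right neighbor), then (right vs left neighbor).
fixed_pattern = [((0, -1), (0, 1)), ((0, 1), (0, -1))]
fixed_bits = brief_descriptor(image, (2, 2), fixed_pattern)   # "10"

# Pseudo-random pattern with N = 8, giving an 8-bit descriptor.
pattern = make_sampling_pattern(n_pairs=8, radius=2)
descriptor = brief_descriptor(image, (2, 2), pattern)
```

The same pattern must be used for every feature point and every image, otherwise the resulting bit strings are not comparable.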

[0073] In another feasible implementation, SIFT is used for feature extraction. First, a scale space is constructed, extreme points (feature points) in the scale space are detected, the position and scale of the feature points are determined, and directional parameters are assigned to the extreme points. Finally, feature descriptors are generated in the same way. For details, please refer to the SIFT algorithm. This application will not elaborate further on this aspect.

[0074] For example, Figure 4 shows the feature points that can be detected in a face image, where the feature points can be points on the facial contours or points in areas where the brightness changes significantly.

[0075] S230: Based on the first feature descriptor of the image to be identified, determine candidate images that meet the first matching conditions from the target category image library.

[0076] The target category image library contains seed images that have been identified as belonging to the target category, such as pornographic images or violent images. When the image to be identified has been obtained by rotating, translating, scaling, or otherwise modifying a seed image, a matching and recognition method based on seed images achieves a high matching rate and good recognition results for sensitive category images.

[0077] In this embodiment, based on the first feature descriptor of the image to be identified, seed images with the same or similar feature descriptors are preliminarily matched and serve as the candidate images of this application. The obtained candidate images have a certain degree of local similarity with the image to be identified, that is, they satisfy the first matching condition. Feature extraction and description are also performed on the seed images in the target category image library, so that the local feature information of each seed image is available for the matching process in step S230.

[0078] Optionally, a retrieval table can be constructed based on the seed images in the target category image library and their feature information. As shown in Figure 5, the method may further include:

[0079] S510: Obtain the seed image of the target category image library.

[0080] S530: Perform feature extraction on the seed image to obtain the third feature point and the third feature descriptor of the seed image, wherein the third feature point and the third feature descriptor have a corresponding relationship.

[0081] For the process of obtaining the feature information of the seed image (i.e., the third feature point and the third feature descriptor), refer to the corresponding content of step S210 in the embodiment of this application; it will not be repeated here.

[0082] S550: Construct a first retrieval table based on the third feature descriptor of the seed image and the image identifier of the seed image.

[0083] Optionally, the first retrieval table can represent the mapping relationship from the third feature descriptor to the image ID (Identity) of the seed image, facilitating the recall of candidate images and saving query time. In addition to the aforementioned mapping relationship, the first retrieval table can also represent the mapping relationship from the image ID to the third feature descriptor. The former mapping relationship can be understood as an inverted index, and the latter as a forward index. In the method provided in this application embodiment, when the key image of the image to be identified is matched again from the candidate images, the mapping relationship from the image ID to the third feature descriptor can save query time and improve matching efficiency.
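The two mapping directions of the first retrieval table can be sketched, in a non-limiting way, with two in-memory dictionaries. The function name `build_first_retrieval_table`, the image IDs, and the use of bit strings as dictionary keys are assumptions made for this example; the application does not prescribe a storage format.

```python
from collections import defaultdict

def build_first_retrieval_table(seed_descriptors):
    """Build both directions of the first retrieval table.

    seed_descriptors maps image ID -> list of descriptor bit strings.
    Returns (inverted, forward):
      inverted: descriptor -> set of image IDs   (used to recall candidates)
      forward:  image ID   -> list of descriptors (used for re-matching)
    """
    inverted = defaultdict(set)
    forward = {}
    for image_id, descriptors in seed_descriptors.items():
        forward[image_id] = list(descriptors)
        for descriptor in descriptors:
            inverted[descriptor].add(image_id)
    return dict(inverted), forward

# Hypothetical seed library with two images that share one descriptor.
seed_descriptors = {
    "seed_a": ["10101011", "11110000"],
    "seed_b": ["10101011"],
}
inverted_index, forward_index = build_first_retrieval_table(seed_descriptors)
```

Keeping both directions trades extra storage for the ability to answer either query with a single lookup.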

[0084] Optionally, as shown in Figure 6, the method may further include:

[0085] S610: Obtain the seed image of the target category image library.

[0086] S630: Perform target detection on the seed image to determine the semantic region representing the target category in the seed image.

[0087] For example, target detection or semantic segmentation techniques in computer vision can be used to automatically detect, identify, or segment specific semantic regions in seed images. Furthermore, a manual review and annotation system can be integrated to allow for manual judgment and annotation of specific semantic regions, ensuring efficient and accurate annotation.

[0088] S650: Construct a second retrieval table based on the semantic region of the seed image and the image identifier of the seed image.

[0089] Furthermore, by combining the first and second retrieval tables mentioned above, a mapping relationship from image identifiers to feature descriptors to semantic regions can be constructed, which can clearly show the feature descriptors representing specific semantic regions.
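One possible way to combine the two retrieval tables is sketched below, purely as an illustration. It assumes that each semantic region is stored as an axis-aligned box `(x_min, y_min, x_max, y_max)` and that each feature is stored as a `(point, descriptor)` pair; both representations, and the names `point_in_box` and `combine_tables`, are hypothetical.

```python
def point_in_box(point, box):
    """Return True if point (x, y) lies inside box (x_min, y_min, x_max, y_max)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max

def combine_tables(features_by_image, regions_by_image):
    """Map each image ID to its features, its semantic regions, and the
    descriptors whose feature points lie inside a semantic region."""
    combined = {}
    for image_id, features in features_by_image.items():
        regions = regions_by_image.get(image_id, [])
        in_region = [
            descriptor
            for point, descriptor in features
            if any(point_in_box(point, box) for box in regions)
        ]
        combined[image_id] = {
            "features": features,
            "semantic_regions": regions,
            "descriptors_in_region": in_region,
        }
    return combined

# Hypothetical contents of the first and second retrieval tables.
features_by_image = {"seed_a": [((10, 10), "10101011"), ((50, 50), "11110000")]}
regions_by_image = {"seed_a": [(0, 0, 20, 20)]}
combined_table = combine_tables(features_by_image, regions_by_image)
```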

[0090] Based on the first retrieval table, as shown in Figure 7, determining candidate images that meet the first matching condition from the target category image library based on the first feature descriptor of the image to be identified may include the following steps:

[0091] S710: Based on the first feature descriptor, determine a first matching feature descriptor that satisfies a preset similarity condition with the first feature descriptor.

[0092] Preferably, the first feature descriptor is a feature vector represented by a binary string. The similarity between feature vectors can be calculated using Euclidean distance, cosine similarity, Hamming distance, etc. The preset similarity conditions may include preset similarity thresholds, vector distance thresholds, etc. Taking the similarity threshold as an example, feature descriptors with a similarity higher than the similarity threshold can be used as the first matching feature descriptors. In two-dimensional, three-dimensional, or higher-dimensional space, Euclidean distance is the straight-line distance between two points; the smaller the distance, the greater the similarity. Cosine similarity uses the cosine of the angle between two feature vectors to measure the difference between them; the larger the cosine value, the more similar the two feature vectors are. Hamming distance was initially used in error detection coding for data transmission, to count the bits of a fixed-length binary word that were flipped during communication. It represents the number of positions at which two equal-length vectors or strings differ, and it is calculated by performing an XOR operation on the two strings (yielding 1 where they differ and 0 where they are the same) and counting the number of 1s. Hamming distance is widely used in image processing and is an effective way of comparing binary data. For the feature vectors represented by binary strings in this embodiment, the Hamming distance between different feature vectors can be calculated directly using an XOR operation; the smaller the Hamming distance, the greater the similarity. Furthermore, when the feature vector dimension is large, the binary strings can be grouped. For example, a 128-dimensional feature vector can be divided into 16 groups of 8 bits each. If corresponding groups of two feature vectors differ in even one bit, the remaining bits in that group are no longer compared, and the group is directly determined to be different. While this makes the similarity measure coarser, it reduces the computational load.
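The XOR-based Hamming distance and the grouped comparison can be illustrated with the short, non-limiting sketch below. The function names `hamming_distance` and `grouped_distance` are hypothetical, descriptors are assumed to be Python strings of `0` and `1`, and the first call reuses the example descriptors 10101011 and 10101010 given for feature points A and B.

```python
def hamming_distance(a, b):
    """Count differing bit positions of two equal-length bit strings via XOR."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return bin(int(a, 2) ^ int(b, 2)).count("1")

def grouped_distance(a, b, group_size=8):
    """Count the groups that differ in at least one bit.

    A group is marked as different as soon as any bit differs, so the
    result is coarser than the bitwise Hamming distance.
    """
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    differing_groups = 0
    for start in range(0, len(a), group_size):
        if a[start:start + group_size] != b[start:start + group_size]:
            differing_groups += 1
    return differing_groups

distance_ab = hamming_distance("10101011", "10101010")   # 1

# Two 16-bit descriptors: group 1 differs in 1 bit, group 2 in all 8 bits.
first = "1010101111110000"
second = "1010101000001111"
bitwise = hamming_distance(first, second)                # 9
grouped = grouped_distance(first, second)                # 2
```

The last two values show the loss of resolution: the grouped measure cannot tell a group that differs in one bit from a group that differs in all eight.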

[0093] Optionally, based on the mapping relationship between the third feature descriptor and the image identifier of the seed image in the first retrieval table, all the third feature descriptors extracted from the seed images can be retrieved directly. Using the similarity measurement methods described above, the first matching feature descriptor that meets the preset similarity condition with the first feature descriptor of the image to be identified can then be determined from among all the third feature descriptors.

[0094] S730: Determine the first matching image identifier corresponding to the first matching feature descriptor based on the first retrieval table.

[0095] It is understandable that the first retrieval table contains a mapping relationship between the third feature descriptor and the image identifier of the seed image.

[0096] S750: Based on the first matching image identifier, obtain the corresponding seed image from the target category image library, use the seed image as the candidate image, and obtain the second feature point and the second feature descriptor of the candidate image, wherein the second feature point and the second feature descriptor have a corresponding relationship.
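Steps S710 to S750 can be sketched, under simplifying assumptions, as a linear scan over the descriptor-to-image-ID mapping. The function name `recall_candidates`, the `max_distance` parameter, and the choice of Hamming distance as the similarity condition are assumptions made for this example; a practical system would likely replace the scan with an approximate nearest-neighbor index.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length bit strings."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

def recall_candidates(query_descriptors, inverted_index, max_distance=1):
    """Return the sorted IDs of seed images that own at least one descriptor
    within max_distance of any query descriptor."""
    candidate_ids = set()
    for query in query_descriptors:
        for seed_descriptor, image_ids in inverted_index.items():
            if len(seed_descriptor) != len(query):
                continue  # descriptors of different lengths are not comparable
            if hamming(query, seed_descriptor) <= max_distance:
                candidate_ids |= image_ids
    return sorted(candidate_ids)

# Hypothetical descriptor -> image ID mapping of the first retrieval table.
inverted_index = {
    "10101011": {"seed_a"},
    "00000000": {"seed_b"},
}
candidates = recall_candidates(["10101010"], inverted_index)  # ["seed_a"]
```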

[0097] S250: Based on the first feature points and first feature descriptor of the image to be identified and the second feature points and second feature descriptor of the candidate image, determine the key image that satisfies the second matching condition from the candidate image, and determine the matching feature points of the key image.

[0098] After the aforementioned matching, one or more candidate images have a certain degree of local similarity to the image to be identified. Compared with global feature matching, the recall rate of candidate images is higher. The method provided in this application further selects one or more key images with higher similarity to the image to be identified from the candidate images, thereby improving the accuracy of sensitive category image recognition.

[0099] In this embodiment of the application, candidate images are selected during the aforementioned matching process by calculating the similarity of feature descriptors. In step S250, a matching relationship of local features between the image to be identified and the candidate images can likewise be constructed by calculating the similarity of feature descriptors. Then, based on the number or proportion of matching relationships, one or more key images are selected from the candidate images. For example, as shown in Figure 8, local features of the images are matched to construct the matching relationship between local features, which is also the correspondence between feature points.

[0100] Specifically, as shown in Figure 9, the step of determining a key image satisfying a second matching condition from the candidate images based on the first feature points and first feature descriptors of the image to be identified, and the second feature points and second feature descriptors of the candidate images, and determining the matching feature points of the key images, may include the following steps:

[0101] S910: Obtain the second feature point and the second feature descriptor of each candidate image, wherein there is a correspondence between the second feature point and the second feature descriptor.

[0102] Based on the first retrieval table, or other retrieval tables that represent the mapping relationship from image identifiers to feature descriptors, the second feature points and second feature descriptors of each candidate image are directly obtained. The second feature points may have information such as position, scale, or orientation.

[0103] S930: Based on the first feature points and first feature descriptor of the image to be identified, and the second feature points and second feature descriptor of the candidate image, determine the matching feature points of the candidate image from the second feature points, and the matching feature points of the candidate image and the matching feature points corresponding to the image to be identified constitute a pair of matching feature points representing the matching relationship.

[0104] For each first feature descriptor of the image to be identified, the similarity between it and each second feature descriptor of each candidate image is calculated. The matching relationship between the image to be identified and each candidate image is determined based on conditions such as similarity threshold or matching strategy. The matching relationship of a local feature can be represented by a matching feature point pair, where the two matching feature points in the matching feature point pair are the first feature point of the image to be identified and the second feature point of the candidate image, respectively.

[0105] In one feasible implementation, as shown in Figure 10, determining the matching feature points of the candidate image from the second feature points based on the first feature points and first feature descriptor of the image to be identified, and the second feature points and second feature descriptor of the candidate image, may include the following steps:

[0106] S931: Calculate the similarity between the first feature descriptor and the second feature descriptor respectively.

[0107] In this embodiment, the similarity between feature descriptors can also be measured using the aforementioned similarity measurement method, which will not be repeated here. Alternatively, the Manhattan distance method can be used. The Manhattan distance is a relatively simplified way of measuring distance in spatial geometry, requiring only addition and subtraction. Furthermore, a first standard coordinate system can be established in the vector space containing the first feature descriptor of the image to be identified, and the first Manhattan distance between it and the second feature descriptor of the candidate image can be calculated; simultaneously, a second standard coordinate system can be established in the vector space containing the second feature descriptor of the candidate image, and the second Manhattan distance between it and the first feature descriptor of the image to be identified can be calculated. Since the Manhattan distance represents the sum of absolute axial distances, the Manhattan distance will differ under different standard coordinate systems.
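As a small, non-limiting illustration of the Manhattan distance mentioned above, the sketch below sums the absolute coordinate differences of two vectors. The names `manhattan_distance` and `bits_to_vector` are hypothetical. For descriptors whose components are only 0 or 1, this sum equals the Hamming distance, because each differing position contributes exactly 1.

```python
def manhattan_distance(u, v):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))

def bits_to_vector(bits):
    """Turn a bit string such as '1010' into the vector [1, 0, 1, 0]."""
    return [int(bit) for bit in bits]

# General real-valued example: |1-4| + |2-0| + |3-3| = 5.
distance_real = manhattan_distance([1, 2, 3], [4, 0, 3])

# Binary descriptors: one differing bit gives a distance of 1.
distance_bits = manhattan_distance(
    bits_to_vector("10101011"), bits_to_vector("10101010")
)
```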

[0108] S935: Based on a preset similarity condition and the similarity, determine the matching feature points of the image to be identified from the first feature points, and determine the matching feature points of the candidate image from the second feature points, wherein the matching feature points of the candidate image and the matching feature points of the image to be identified have a corresponding relationship.

[0109] The preset similarity conditions may include a manually set similarity threshold. Two feature points corresponding to a set of feature descriptors with a similarity higher than this threshold are considered as a pair of matching feature points. Simply put, based on a brute-force matching method, the similarity threshold can be used to filter and determine the matching relationship between each candidate image and the image to be identified, as well as the matching feature points of the candidate images within that matching relationship. Furthermore, since the calculation of similarity between feature descriptors is a many-to-many process, the similarity-based feature matching process can employ matching strategies such as bidirectional matching or fast nearest neighbor matching.
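A brute-force bidirectional matching strategy with a distance threshold can be sketched as follows. This is an illustrative assumption of how such a strategy might look, not the implementation of this application: the names `nearest` and `bidirectional_matches` are hypothetical, and the similarity threshold is expressed as a maximum Hamming distance.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length bit strings."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

def nearest(descriptor, others):
    """Return (index, distance) of the closest descriptor in others."""
    best_index, best_distance = None, None
    for index, other in enumerate(others):
        distance = hamming(descriptor, other)
        if best_distance is None or distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance

def bidirectional_matches(query_descriptors, candidate_descriptors, max_distance):
    """Keep pair (i, j) only if j is the nearest neighbor of i, i is the
    nearest neighbor of j, and their distance is within max_distance."""
    matches = []
    for i, query in enumerate(query_descriptors):
        j, distance = nearest(query, candidate_descriptors)
        if j is None or distance > max_distance:
            continue
        back_i, _ = nearest(candidate_descriptors[j], query_descriptors)
        if back_i == i:
            matches.append((i, j, distance))
    return matches

query_descriptors = ["11110000", "00001111", "10101010"]
candidate_descriptors = ["00001110", "11110001"]
matches = bidirectional_matches(query_descriptors, candidate_descriptors, max_distance=2)
# matches == [(0, 1, 1), (1, 0, 1)]; query descriptor 2 has no close enough partner
```

Each returned index pair identifies a matching feature point pair, since every descriptor corresponds to one feature point.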

[0110] S950: Determine the key image from the candidate images based on preset filtering conditions and matching feature points of the candidate images.

[0111] In this embodiment of the application, the preset filtering conditions may include a threshold set for the number or proportion of matching feature points, and one or more key images with a number or proportion of matching feature points higher than the threshold are selected from each candidate image. The key images have higher similarity to the image to be identified in local features.

[0112] Specifically, the number of matching feature points in each candidate image can be counted; or the proportion of matching feature points in each candidate image to the total number of feature points in the image to be identified can be calculated; and candidate images that meet the threshold conditions can be selected as key images. Furthermore, key images can also be selected based on the density or dispersion of matching feature points in the candidate images, or based on the density or dispersion of matching feature points in the image to be identified corresponding to the matching feature points of the candidate images. In other words, the selection criteria can also include the distribution characteristics of matching feature points in the image.
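The count-based and proportion-based filtering conditions can be sketched as below. The function name `select_key_images`, the parameter names, and the example thresholds are assumptions made for illustration; the density and dispersion criteria mentioned above are not covered by this sketch.

```python
def select_key_images(match_counts, query_feature_count, min_count=None, min_ratio=None):
    """Select candidate images whose matching feature points exceed the thresholds.

    match_counts maps candidate image ID -> number of matching feature points.
    The ratio is taken against the number of feature points of the image
    to be identified. A threshold left as None is not applied.
    """
    key_ids = []
    for image_id, count in match_counts.items():
        ratio = count / query_feature_count if query_feature_count else 0.0
        if min_count is not None and count <= min_count:
            continue
        if min_ratio is not None and ratio <= min_ratio:
            continue
        key_ids.append(image_id)
    return key_ids

# Hypothetical matching results for an image to be identified with 100 feature points.
match_counts = {"seed_a": 40, "seed_b": 5, "seed_c": 22}
by_count = select_key_images(match_counts, 100, min_count=20)   # ["seed_a", "seed_c"]
by_ratio = select_key_images(match_counts, 100, min_ratio=0.3)  # ["seed_a"]
```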

[0113] S270: Determine the semantic region representing the target category in the key image. If the matching feature points of the key image fall within the semantic region, then determine that the category of the image to be identified is the target category.

[0114] The method provided in this application uses a local feature matching method. However, for sensitive category images, not all regions are necessarily sensitive content, and similar parts between the image to be identified and the key image may not be sensitive content. Therefore, using only the local feature matching method can lead to a certain degree of misjudgment. Therefore, the method provided in this application combines the semantic regions of the image for further judgment. By determining whether the matching feature points of the key image fall within the semantic region representing a specific category, it determines whether the image to be identified ultimately belongs to the same type as the key image, thereby improving the accuracy of image category recognition and content review.

[0115] In one embodiment of this application, feature points may also be mismatched, for example when non-corresponding feature points are detected as matches or when truly matching feature points are missed. Therefore, erroneous matches are filtered out in order to better match and identify the category of the image to be identified. Specifically, as shown in Figure 11, determining the semantic region representing the target category in the key image, and determining the category of the image to be identified as the target category if the matching feature points of the key image fall within the semantic region, may further include the following steps:

[0116] S1110: Determine the semantic regions representing target categories in each of the key images based on the second retrieval table.

[0117] S1130: The matching feature points corresponding to the image to be identified are determined by the matching feature points of the key image, and matching feature point pairs are generated accordingly.

[0118] It is understood that the matching relationship between the key image and the image to be identified can be represented by a pair of matching feature points. The two matching feature points in the pair are the first feature point of the image to be identified and the corresponding feature point of the key image. Since the key image is obtained from the candidate image, the feature point of the key image can also be the second feature point of the key image.

[0119] S1150: In the preset spatial model, the spatial relationship of the matching feature point pairs is constructed based on the random sample consensus (RANSAC) algorithm.

[0120] Specifically, the principle of the random sample consensus (RANSAC) algorithm is to randomly select N point pairs from the matching feature point pairs to fit a perspective transformation matrix. This perspective transformation matrix is then applied to the other point pairs to verify the matching results. The best fitting result is obtained through iterative calculation, so that the spatial relationship of the matching feature point pairs achieves the greatest consistency.

[0121] S1170: Based on the spatial relationship of the matching feature point pairs, filter the matching feature point pairs to obtain key matching feature point pairs.

[0122] Matching feature point pairs that conform to the above consistency in spatial relationships are considered as correct key matching feature point pairs, while others are filtered out as incorrect matches.
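The sample, fit, verify, and iterate loop of steps S1150 and S1170 can be illustrated with the simplified sketch below. Note the substitution: the application fits a perspective transformation matrix, whereas this sketch fits only a pure translation, which needs a single point pair per sample and keeps the example short. The function name `ransac_translation` and the `tolerance`, `iterations`, and `seed` parameters are assumptions made for this example.

```python
import random

def ransac_translation(pairs, tolerance=3.0, iterations=50, seed=0):
    """Filter matching feature point pairs with a RANSAC-style loop.

    pairs is a list of ((xq, yq), (xk, yk)): a point in the image to be
    identified and its matched point in the key image. The model fitted
    here is a translation (dx, dy), a simplification of the perspective
    transformation described in the text.
    """
    rng = random.Random(seed)
    best_inliers = []
    if not pairs:
        return best_inliers
    for _ in range(iterations):
        # Sample the minimal set for a translation: one point pair.
        (xq, yq), (xk, yk) = rng.choice(pairs)
        dx, dy = xk - xq, yk - yq
        # Verify the model against every pair and collect the consistent ones.
        inliers = [
            pair for pair in pairs
            if abs(pair[0][0] + dx - pair[1][0]) <= tolerance
            and abs(pair[0][1] + dy - pair[1][1]) <= tolerance
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Four pairs consistent with a shift of (10, 5) and one mismatched pair.
outlier = ((9, 9), (80, 40))
pairs = [
    ((0, 0), (10, 5)),
    ((5, 5), (15, 10)),
    ((20, 3), (30, 8)),
    ((7, 12), (17, 17)),
    outlier,
]
key_pairs = ransac_translation(pairs)
```

The pairs returned play the role of the key matching feature point pairs; the mismatched pair is dropped because no other pair agrees with the shift it implies.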

[0123] For computational and accuracy considerations, the method provided in this application first calculates similarity to match key images, then uses the random sample consensus (RANSAC) algorithm to filter out mismatched feature point pairs, leaving matching feature point pairs with higher similarity and credibility. Besides the random sample consensus algorithm, cross-matching, K-nearest neighbor matching, and retaining only matches whose Hamming distance is less than twice the minimum distance can also be used for filtering, which will not be elaborated here.

[0124] S1190: In the key matching feature point pair, if the matching feature point of the key image falls within the semantic region, then the category of the image to be identified is determined to be the target category.

[0125] Furthermore, the number or proportion of matching feature points falling within the semantic region can be statistically analyzed or calculated. Only when the number or proportion meets a certain threshold is the category of the image to be identified determined to be the target category. For example, within the semantic region representing the sensitive content category of one or more key images, if the number of matching feature points of the key images falling within the semantic region exceeds a certain number, it can be determined that the corresponding image to be identified contains a semantic region belonging to the sensitive content category, and thus the image to be identified can be determined to be an image of the sensitive content category.
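The final judgment can be sketched as a point-in-region test followed by a threshold check. The sketch assumes that semantic regions are axis-aligned boxes `(x_min, y_min, x_max, y_max)` and treats a threshold as met when the value is greater than or equal to it; the names `point_in_box` and `is_target_category` and the parameters `min_count` and `min_ratio` are hypothetical.

```python
def point_in_box(point, box):
    """Return True if point (x, y) lies inside box (x_min, y_min, x_max, y_max)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max

def is_target_category(key_points, semantic_regions, min_count=1, min_ratio=0.0):
    """Decide whether enough matching feature points of the key image
    fall within any of its semantic regions."""
    if not key_points:
        return False
    inside = sum(
        1 for point in key_points
        if any(point_in_box(point, box) for box in semantic_regions)
    )
    return inside >= min_count and inside / len(key_points) >= min_ratio

# Hypothetical key image with one semantic region and three matching feature points.
semantic_regions = [(100, 100, 200, 200)]
key_points = [(120, 150), (180, 110), (30, 40)]

flagged = is_target_category(key_points, semantic_regions, min_count=2)       # True
not_flagged = is_target_category(key_points, semantic_regions, min_count=3)   # False
```

Matches that lie entirely outside the semantic region, such as a shared background or watermark, therefore do not trigger the target category.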

[0126] In an application scenario for identifying vulgar images provided in this application embodiment, the method combines the construction of retrieval tables for a vulgar image seed library with category matching and identification of query images. Specifically, as shown in Figure 12, the method may include:

[0127] 1) Image local feature extraction. The ORB algorithm is used to detect local key points in images (including query images and vulgar seed images) and extract the features of these key points, thereby representing an image as a combination of several local feature descriptors.

[0128] 2) Semantic region identification and annotation of seed images. Using object detection methods and manual review of annotations, the semantic regions containing pornography and vulgarity in seed images are identified.

[0129] 3) Construction of the seed database retrieval table. Based on the local image features extracted in 1) and the pornographic and vulgar semantic regions marked in 2), an image retrieval table is constructed. Two tables are constructed: Table 1 is a mapping table from feature descriptors to image IDs; Table 2 is a mapping table from image IDs to the feature descriptors and vulgar semantic region information of that image.

[0130] 4) Image retrieval. First, the local feature descriptors of the query image extracted in 1) are called. Then, according to Table 1 constructed in 3), the image IDs with similar feature descriptors to the query image are retrieved, and a set of candidate images matching the query image is obtained.

[0131] 5) Image matching. Based on the candidate image set in 4), the local feature descriptor information of each candidate image is retrieved from Table 2 constructed in 3). By calculating the similarity between the local feature descriptors of the query image and the candidate images, the key image matching the query image and the matching feature points of the key image are obtained.

[0132] 6) Semantic region matching judgment. First, the matching points that do not conform to the spatial transformation relationship are filtered out by the random sample consensus (RANSAC) algorithm. Then, the vulgar semantic region of the key image is queried from Table 2 in 3). Finally, whether the query image is a vulgar image is determined by judging whether the matching feature points of the key image fall within the vulgar semantic region.

[0133] In an application scenario for content review provided in this application embodiment, an image category matching and recognition method combined with semantic regions can be used for reviewing text, image, and video content. For example, as shown in Figure 13, when a user uploads or publishes text, images, or videos, the system first performs semantic-region-based matching against the vulgar image seeds. If the content matches a vulgar semantic region in the vulgar image seed library, the content will be blocked or downgraded.

[0134] The solution provided in this application extracts local feature information (i.e., first feature points and first feature descriptors) from the image to be identified, and matches candidate images with local similarity in the target category image library based on the local feature information. Compared with the global feature matching method, the local similarity between images is higher. At the same time, the solution provided in this application further matches key images with higher local similarity to the image to be identified from the candidate images, thereby improving the accuracy and recall of the final image category identification. Based on the local feature matching, the solution determines whether the image to be identified belongs to the same category as the key image based on the semantic region containing specific category information in the key image and the position of the matching feature points. Compared with the local feature matching method, this solution can avoid misjudgment caused by the matching feature points falling into non-specific category semantic regions in the key image, thereby further improving the quality and efficiency of content review.

[0135] This application embodiment also provides an image category recognition device 1400. As shown in Figure 14, the device may include:

[0136] Feature extraction module 1410 is used to acquire an image to be identified, and to extract features from the image to be identified to obtain a first feature point and a first feature descriptor of the image to be identified, wherein the first feature point and the first feature descriptor have a corresponding relationship.

[0137] The first matching module 1420 is used to determine candidate images that meet the first matching conditions from the target category image library based on the first feature descriptor of the image to be identified;

[0138] The second matching module 1430 is used to determine a key image that meets the second matching condition from the candidate image based on the first feature points and first feature descriptor of the image to be identified and the second feature points and second feature descriptor of the candidate image, and to determine the matching feature points of the key image.

[0139] The category recognition module 1440 is used to determine the semantic region representing the target category in the key image. If the matching feature points of the key image fall within the semantic region, the category of the image to be recognized is determined to be the target category.

[0140] In one embodiment of this application, the feature extraction module 1410 may include:

[0141] The feature point detection unit is used to detect the first feature points of the image to be identified based on feature extraction and description algorithms.

[0142] The feature description unit is used to generate a first feature descriptor corresponding to the first feature point, wherein the first feature descriptor is a binary string.

[0143] In one embodiment of this application, the image category recognition device 1400 may further include:

[0144] The first acquisition unit is used to acquire seed images of the target category image library;

[0145] The feature extraction unit is used to extract features from the seed image to obtain the third feature point and the third feature descriptor of the seed image, wherein the third feature point and the third feature descriptor have a corresponding relationship.

[0146] The first retrieval table unit is used to construct a first retrieval table based on the third feature descriptor of the seed image and the image identifier of the seed image.

[0147] In one embodiment of this application, the image category recognition device 1400 may further include:

[0148] The second acquisition unit is used to acquire seed images of the target category image library;

[0149] The target detection unit is used to perform target detection on the seed image and determine the semantic region representing the target category in the seed image;

[0150] The second retrieval table unit is used to construct a second retrieval table based on the semantic region of the seed image and the image identifier of the seed image.

[0151] In one embodiment of this application, the first matching module 1420 may include:

[0152] The feature descriptor selection unit is used to determine, based on the first feature descriptor, a first matching feature descriptor that satisfies a preset similarity condition with the first feature descriptor.

[0153] The first retrieval unit is used to determine the first matching image identifier corresponding to the first matching feature descriptor based on the first retrieval table.

[0154] The candidate image acquisition unit is configured to acquire a corresponding seed image from the target category image library based on the first matching image identifier, use the seed image as the candidate image, and obtain a second feature point and a second feature descriptor of the candidate image, wherein the second feature point and the second feature descriptor have a corresponding relationship.
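The three units above can be sketched as a single lookup routine. This sketch assumes the preset similarity condition is a maximum Hamming distance and scans the table linearly; `retrieve_candidates`, `max_distance`, and the sample data are hypothetical, and a production system would more likely use an approximate nearest-neighbour index.

```python
from typing import Dict, Iterable, Set


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def retrieve_candidates(
    query_descriptors: Iterable[bytes],
    first_table: Dict[bytes, Set[str]],
    max_distance: int = 2,
) -> Set[str]:
    """Collect identifiers of seed images holding a descriptor close to any query descriptor."""
    candidates: Set[str] = set()
    for query in query_descriptors:
        for seed_descriptor, image_ids in first_table.items():
            if hamming(query, seed_descriptor) <= max_distance:
                candidates.update(image_ids)
    return candidates


# Hypothetical first retrieval table and query descriptor.
first_table = {
    b"\x0f\xa0": {"seed_001", "seed_002"},
    b"\x33\x10": {"seed_001"},
    b"\xff\x00": {"seed_003"},
}
candidate_ids = retrieve_candidates([b"\x0f\xa1"], first_table)
```

Here the query descriptor differs from `b"\x0f\xa0"` by one bit, so both seed images that contain that descriptor become candidates.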

[0155] In one embodiment of this application, the second matching module 1430 may include:

[0156] The third acquisition unit is used to acquire the second feature points and the second feature descriptor of the candidate image, wherein there is a correspondence between the second feature points and the second feature descriptor;

[0157] The feature point matching unit is used to determine the matching feature points of the candidate image from the second feature points based on the first feature points and the first feature descriptor of the image to be identified and the second feature points and the second feature descriptor of the candidate image. The matching feature points of the candidate image and the matching feature points corresponding to the image to be identified constitute a matching feature point pair representing the matching relationship.

[0158] A key image determination unit is used to determine the key image and the matching feature points of the key image from the candidate images according to preset screening conditions and matching feature points of the candidate images.
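One simple form of the preset screening condition is a minimum number of matching feature points per candidate image. The sketch below assumes that form; `select_key_images`, `min_matches`, and the sample coordinates are hypothetical, and a condition based on the distribution of the points could be substituted.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def select_key_images(
    matches_by_candidate: Dict[str, List[Point]],
    min_matches: int = 3,
) -> Dict[str, List[Point]]:
    """Keep candidates whose number of matching feature points reaches the threshold."""
    return {
        image_id: points
        for image_id, points in matches_by_candidate.items()
        if len(points) >= min_matches
    }


# Hypothetical matching feature points for two candidate images.
matches_by_candidate = {
    "seed_001": [(12, 14), (48, 52), (90, 31), (77, 80)],
    "seed_002": [(5, 9)],
}
key_images = select_key_images(matches_by_candidate)
```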

[0159] In one feasible implementation, the feature point matching unit may further include:

[0160] A similarity calculation subunit is used to calculate the similarity between each first feature descriptor and each second feature descriptor;

[0161] The judgment subunit is used to determine the matching feature points of the image to be identified from the first feature points and the matching feature points of the candidate image from the second feature points, based on the preset similarity conditions and the similarity. The matching feature points of the candidate image and the matching feature points of the image to be identified have a corresponding relationship.
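The similarity calculation and judgment subunits can be sketched as nearest-neighbour matching under a distance threshold. The sketch assumes Hamming distance as the similarity measure and a fixed maximum distance as the preset similarity condition; `match_features`, `max_distance`, and the sample features are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Feature = Tuple[Point, bytes]  # (feature point, its descriptor)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_features(
    query: List[Feature],
    candidate: List[Feature],
    max_distance: int = 2,
) -> List[Tuple[Point, Point]]:
    """Pair each query point with its closest candidate point if it is close enough."""
    pairs: List[Tuple[Point, Point]] = []
    if not candidate:
        return pairs
    for q_point, q_desc in query:
        c_point, c_desc = min(candidate, key=lambda f: hamming(q_desc, f[1]))
        if hamming(q_desc, c_desc) <= max_distance:
            pairs.append((q_point, c_point))
    return pairs


# Hypothetical 1-byte descriptors for the image to be identified and one candidate.
query = [((10, 10), b"\x0f"), ((50, 20), b"\xf0")]
candidate = [((12, 14), b"\x0e"), ((90, 90), b"\x55")]
pairs = match_features(query, candidate)
```

Each returned pair holds a matching feature point of the image to be identified followed by the corresponding matching feature point of the candidate image.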

[0162] In one embodiment of this application, the category recognition module 1440 may include:

[0163] The second retrieval unit is used to determine the semantic region representing the target category in each of the key images based on the second retrieval table;

[0164] The matching feature point pair determination unit is used to determine the matching feature points corresponding to the image to be identified from the matching feature points of the key image, and generate matching feature point pairs accordingly.

[0165] The spatial relationship construction unit is used to construct the spatial relationship of the matching feature point pairs in a preset spatial model based on a random sample consensus (RANSAC) algorithm.

[0166] A filtering unit is used to filter the matching feature point pairs based on the spatial relationship of the matching feature point pairs to obtain key matching feature point pairs;

[0167] The determination unit is configured to determine the category of the image to be identified as the target category if, among the key matching feature point pairs, the matching feature points of the key image fall within the semantic region.
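The spatial filtering step can be illustrated with a small random sample consensus loop. The preset spatial model is not specified here, so this sketch assumes the simplest possible one, a pure 2-D translation; an affine or homography model would follow the same pattern with larger samples. The function name, `tolerance`, `iterations`, `seed`, and the sample pairs are all hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Pair = Tuple[Point, Point]  # (point in image to be identified, point in key image)


def ransac_translation(
    pairs: List[Pair],
    tolerance: float = 3.0,
    iterations: int = 50,
    seed: int = 0,
) -> List[Pair]:
    """Return the largest subset of pairs consistent with one 2-D translation."""
    rng = random.Random(seed)
    best: List[Pair] = []
    if not pairs:
        return best
    for _ in range(iterations):
        # Hypothesise a translation from one randomly sampled pair.
        (qx, qy), (kx, ky) = rng.choice(pairs)
        dx, dy = kx - qx, ky - qy
        # Collect every pair that agrees with this translation within the tolerance.
        inliers = [
            pair for pair in pairs
            if abs(pair[1][0] - pair[0][0] - dx) <= tolerance
            and abs(pair[1][1] - pair[0][1] - dy) <= tolerance
        ]
        if len(inliers) > len(best):
            best = inliers
    return best


# Three pairs share roughly the same shift; the last one is a spurious match.
pairs = [
    ((10, 10), (30, 15)),
    ((20, 40), (40, 45)),
    ((55, 12), (75, 18)),
    ((5, 5), (200, 90)),
]
key_pairs = ransac_translation(pairs)
```

The surviving pairs play the role of the key matching feature point pairs; the spurious match is discarded because no other pair supports its displacement.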

[0168] In an application scenario for identifying vulgar images provided in an embodiment of this application, the device specifically uses modules that construct retrieval tables over a vulgar image seed database. As shown in Figure 15, the device may include:

[0169] The image local feature extraction module 1510 uses the ORB algorithm to detect local key points of an image (including query images and vulgar seed images) and extract the features of the key points, thereby representing an image as a combination of several local feature descriptors.

[0170] The seed image semantic region recognition and annotation module 1520 uses target detection methods and manual review and annotation to annotate the pornographic and vulgar semantic regions on the seed image.

[0171] The seed database retrieval table construction module 1530 constructs an image retrieval table based on the local image features extracted by the image local feature extraction module 1510 and the pornographic and vulgar semantic regions marked by the seed image semantic region recognition and annotation module 1520. Two tables are constructed: Table 1 is a mapping table from feature descriptors to image IDs; Table 2 is a mapping table from image IDs to the image's feature descriptors and vulgar semantic region information.

[0172] The image retrieval module 1540 first calls the local feature descriptor of the query image extracted by the image local feature extraction module 1510, and then retrieves the image ID with similar feature descriptors to the query image according to Table 1 constructed by the seed library retrieval table construction module 1530, thus obtaining a set of candidate images matching the query image.

[0173] The image matching module 1550 takes the candidate image set obtained by the image retrieval module 1540 and retrieves the local feature descriptor information of each candidate image from Table 2 constructed by the seed database retrieval table construction module 1530. By calculating the similarity between the local feature descriptors of the query image and those of the candidate images, it obtains the key image matching the query image and the matching feature points of the key image.

[0174] The semantic region matching and judgment module 1560 first filters out matching points that do not conform to the spatial transformation relationship through a random sample consensus (RANSAC) algorithm, then queries the vulgar semantic region of the key image from Table 2 constructed by the seed database retrieval table construction module 1530, and finally determines whether the query image is a vulgar image by judging whether the matching feature points of the key image fall within the vulgar semantic region.
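The final judgment in module 1560 reduces to a point-in-region test. The sketch below assumes rectangular regions given as (x_min, y_min, x_max, y_max) and treats the image as belonging to the target category when at least `min_inside` matching points of the key image lie inside any region; the function names, the tuple layout, and the `min_inside` parameter are hypothetical, since the text does not state how many points must fall inside.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def point_in_region(point: Point, region: Box) -> bool:
    """True if the point lies inside the box, borders included."""
    x, y = point
    x_min, y_min, x_max, y_max = region
    return x_min <= x <= x_max and y_min <= y <= y_max


def is_target_category(
    key_points: List[Point],
    regions: List[Box],
    min_inside: int = 1,
) -> bool:
    """Decide the category from how many key-image matching points fall in a semantic region."""
    inside = sum(
        1 for point in key_points
        if any(point_in_region(point, region) for region in regions)
    )
    return inside >= min_inside


# Hypothetical semantic region of a key image and two sets of matching points.
regions = [(40, 60, 200, 220)]
flagged = is_target_category([(50, 100), (10, 10)], regions)
not_flagged = is_target_category([(10, 10), (300, 300)], regions)
```

The second case shows the misjudgment this step is meant to avoid: the images share matching points, but none of them lies in the region that carries the target category.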

[0175] It should be noted that the apparatus provided in the above embodiments is only illustrated by the division of the above functional modules when implementing its functions. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.

[0176] This application provides a computer device including a processor and a memory. The memory stores at least one instruction or at least one program, which is loaded and executed by the processor to implement an image category recognition method as provided in the above method embodiments.

[0177] Figure 16 shows a schematic diagram of the hardware structure of a device for implementing an image category recognition method provided in an embodiment of this application. This device may constitute or include the apparatus or system provided in the embodiment of this application. As shown in Figure 16, device 16 may include one or more processors 1602 (shown as 1602a, 1602b, ..., 1602n in the figure; a processor 1602 may include, but is not limited to, a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1604 for storing data, and a transmission device 1606 for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. Those skilled in the art will understand that the structure shown in Figure 16 is for illustrative purposes only and does not limit the structure of the electronic device described above. For example, device 16 may also include more or fewer components than those shown in Figure 16, or have a configuration different from that shown in Figure 16.

[0178] It should be noted that the aforementioned one or more processors 1602 and/or other data processing circuitry are generally referred to herein as "data processing circuitry". This data processing circuitry may be embodied, in whole or in part, in software, hardware, firmware, or any combination thereof. Furthermore, the data processing circuitry may be a single, independent processing module, or may be integrated, in whole or in part, into any other element within device 16 (or a mobile device). As involved in the embodiments of this application, this data processing circuitry serves as a processor control mechanism (e.g., selection of a variable resistor termination path connected to an interface).

[0179] The memory 1604 can be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the method described in the embodiments of this application. The processor 1602 executes various functional applications and data processing by running the software programs and modules stored in the memory 1604, thereby realizing the above-mentioned image category recognition method. The memory 1604 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 1604 may further include memory remotely located relative to the processor 1602, and these remote memories can be connected to the device 16 via a network. Examples of the above-mentioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0180] The transmission device 1606 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of device 16. In one example, the transmission device 1606 includes a network interface controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 1606 may be a radio frequency (RF) module for wireless communication with the Internet.

[0181] The display may be, for example, a touchscreen liquid crystal display (LCD) that allows a user to interact with the user interface of device 16 (or a mobile device).

[0182] This application embodiment also provides a computer-readable storage medium, which can be disposed in a server to store at least one instruction or at least one program related to implementing an image category recognition method in the method embodiment. The at least one instruction or at least one program is loaded and executed by the processor to implement the image category recognition method provided in the above method embodiment.

[0183] Optionally, in this embodiment, the storage medium may be located at at least one of the multiple network servers in a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0184] This invention also provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform an image category recognition method provided in the various optional embodiments described above.

[0185] As can be seen from the above embodiments of the image category recognition method, apparatus, and medium provided in this application, the solution extracts local feature information from the image to be recognized and uses it to find candidate images in the target category image library that are locally similar to that image. Compared with the global feature matching method, local feature matching captures the local similarity between images more effectively. The solution then selects, from the candidate images, key images with higher local similarity to the image to be recognized, thereby improving the accuracy and recall of the final image category recognition. On top of local feature matching, the solution determines whether the image to be recognized belongs to the same category as the key image based on the semantic region containing specific category information in the key image and the position of the matching feature points. Compared with local feature matching alone, this avoids misjudgment caused by matching feature points falling into regions of the key image that do not carry the specific category semantics, thereby further improving the quality and efficiency of content review.

[0186] It should be noted that the order of the embodiments described above is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. Furthermore, the above description focuses on specific embodiments of this application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than that shown in the embodiments and still achieve the desired results. Additionally, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

[0187] The various embodiments in this application are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the device, equipment, and storage medium embodiments are basically similar to the method embodiments, so the descriptions are relatively simple; relevant parts can be referred to the descriptions of the method embodiments.

[0188] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0189] The above description is only a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. An image category recognition method, characterized in that, The method includes: An image to be identified is acquired, and features are extracted from the image to be identified to obtain a first feature point and a first feature descriptor of the image to be identified, wherein there is a correspondence between the first feature point and the first feature descriptor; Based on the first feature descriptor of the image to be identified, candidate images that meet the first matching condition are determined from the target category image library; the seed images in the target category image library are marked as sensitive target categories; the first matching condition indicates that there is a local similarity between the image to be identified and the candidate images; Based on the first feature points and first feature descriptor of the image to be identified, and the second feature points and second feature descriptor of the candidate image, the matching feature points of the candidate image are determined from the second feature points. The matching feature points of the candidate image and the matching feature points corresponding to the image to be identified constitute a pair of matching feature points representing the matching relationship. Based on the matching feature points of the candidate images, a key image that satisfies the second matching condition is determined from the candidate images, and the matching feature points of the key image are determined; the second matching condition indicates the number or distribution characteristics of the matching feature points of the candidate images. The semantic region representing the target category in the key image is determined. If the matching feature points of the key image fall within the semantic region, the category of the image to be identified is determined to be the target category.

2. The method according to claim 1, characterized in that, The step of extracting features from the image to be identified to obtain the first feature points and the first feature descriptor of the image to be identified includes: Based on feature extraction and description algorithms, the first feature points of the image to be identified are detected; Generate a first feature descriptor corresponding to the first feature point, wherein the first feature descriptor is a binary string.

3. The method according to claim 1, characterized in that, The method further includes: Obtain the seed image from the target category image library; Feature extraction is performed on the seed image to obtain the third feature point and the third feature descriptor of the seed image, and there is a correspondence between the third feature point and the third feature descriptor; A first retrieval table is constructed based on the third feature descriptor of the seed image and the image identifier of the seed image.

4. The method according to claim 3, characterized in that, The step of determining candidate images that meet the first matching condition from the target category image library based on the first feature descriptor of the image to be identified includes: Based on the first feature descriptor, determine a first matching feature descriptor that satisfies a preset similarity condition with the first feature descriptor; Based on the first retrieval table, determine the first matching image identifier corresponding to the first matching feature descriptor; Based on the first matching image identifier, a corresponding seed image is obtained from the target category image library, and the seed image is used as the candidate image. The second feature point and the second feature descriptor of the candidate image are obtained, and the second feature point and the second feature descriptor have a corresponding relationship.

5. The method according to claim 1, characterized in that, The step of determining matching feature points of the candidate image from the second feature points based on the first feature points and first feature descriptor of the image to be identified, and the second feature points and second feature descriptor of the candidate image, includes: Calculate the similarity between the first feature descriptor and the second feature descriptor respectively; Based on preset similarity conditions and the similarity, matching feature points of the image to be identified are determined from the first feature points, and matching feature points of the candidate image are determined from the second feature points. The matching feature points of the candidate image and the matching feature points of the image to be identified have a corresponding relationship.

6. The method according to claim 1, characterized in that, The method further includes: Obtain the seed images from the target category image library; Target detection is performed on the seed image to determine the semantic regions in the seed image that represent the target category; A second retrieval table is constructed based on the semantic region and image identifier of the seed image.

7. The method according to claim 6, characterized in that, The step of determining the semantic region representing the target category in the key image, and determining the category of the image to be identified as the target category if the matching feature points of the key image fall within the semantic region, includes: Based on the second retrieval table, the semantic regions representing the target category in each of the key images are determined; The matching feature points corresponding to the image to be identified are determined by the matching feature points of the key image, and matching feature point pairs are generated accordingly; In the preset spatial model, the spatial relationship of the matching feature point pairs is constructed based on the random sampling consensus algorithm; Based on the spatial relationship of the matching feature point pairs, the matching feature point pairs are filtered to obtain key matching feature point pairs; In the key matching feature point pair, if the matching feature point of the key image falls within the semantic region, then the category of the image to be identified is determined to be the target category.

8. An image category recognition device, characterized in that, The device includes: The feature extraction module is used to acquire an image to be identified, and to extract features from the image to be identified to obtain a first feature point and a first feature descriptor of the image to be identified, wherein the first feature point and the first feature descriptor have a corresponding relationship; The first matching module is used to determine candidate images that meet the first matching condition from a target category image library based on the first feature descriptor of the image to be identified; the seed images in the target category image library are marked as sensitive target categories; the first matching condition indicates that there is local similarity between the image to be identified and the candidate images; The second matching module is configured to: determine matching feature points of the candidate image from the second feature points based on the first feature points and first feature descriptor of the image to be identified, and the second feature points and second feature descriptor of the candidate image; wherein the matching feature points of the candidate image and the matching feature points corresponding to the image to be identified constitute a pair of matching feature points representing a matching relationship; determine key images that satisfy a second matching condition from the candidate images based on the matching feature points of the candidate images, and determine the matching feature points of the key images; wherein the second matching condition indicates the number or distribution characteristics of the matching feature points of the candidate images; The category recognition module is used to determine the semantic region in the key image that represents the target category. If the matching feature points of the key image fall within the semantic region, the category of the image to be identified is determined to be the target category.

9. The apparatus according to claim 8, characterized in that, The feature extraction module includes: The feature point detection unit is used to detect the first feature points of the image to be identified based on feature extraction and description algorithms. The feature description unit is used to generate a first feature descriptor corresponding to the first feature point, wherein the first feature descriptor is a binary string.

10. The apparatus according to claim 8, characterized in that, The device further includes: The first acquisition unit is used to acquire seed images of the target category image library; The feature extraction unit is used to extract features from the seed image to obtain the third feature point and the third feature descriptor of the seed image, wherein the third feature point and the third feature descriptor have a corresponding relationship. The first retrieval table unit is used to construct a first retrieval table based on the third feature descriptor of the seed image and the image identifier of the seed image.

11. The apparatus according to claim 10, characterized in that, The first matching module includes: The feature descriptor selection unit is used to determine a first matching feature descriptor that satisfies a preset similarity condition to the first feature descriptor based on the first feature descriptor. The first retrieval unit is used to determine the first matching image identifier corresponding to the first matching feature descriptor based on the first retrieval table. The candidate image acquisition unit is configured to acquire a corresponding seed image from the target category image library based on the first matching image identifier, use the seed image as the candidate image, and obtain a second feature point and a second feature descriptor of the candidate image, wherein the second feature point and the second feature descriptor have a corresponding relationship.

12. The apparatus according to claim 8, characterized in that, The second matching module includes: A similarity calculation subunit is used to calculate the similarity between the first feature descriptor and the second feature descriptor, respectively; The judgment subunit is used to determine the matching feature points of the image to be identified from the first feature points and the matching feature points of the candidate image from the second feature points, based on the preset similarity conditions and the similarity. The matching feature points of the candidate image and the matching feature points of the image to be identified have a corresponding relationship.

13. The apparatus according to claim 8, characterized in that, The device further includes: The second acquisition unit is used to acquire seed images of the target category image library; The target detection unit is used to perform target detection on the seed image and determine the semantic region representing the target category in the seed image; The second retrieval table unit is used to construct a second retrieval table based on the semantic region of the seed image and the image identifier of the seed image.

14. The apparatus according to claim 13, characterized in that, The category identification module includes: The second retrieval unit is used to determine the semantic region representing the target category in each of the key images based on the second retrieval table; The matching feature point pair determination unit is used to determine the matching feature points corresponding to the image to be identified from the matching feature points of the key image, and generate matching feature point pairs accordingly. The spatial relationship construction unit is used to construct the spatial relationship of the matching feature point pairs in a preset spatial model based on a random sampling consensus algorithm. A filtering unit is used to filter the matching feature point pairs based on the spatial relationship of the matching feature point pairs to obtain key matching feature point pairs; The determination unit is configured to determine the category of the image to be identified as the target category if the matching feature points of the key image fall within the semantic region in the key matching feature point pair.

15. A computer-readable machine storage medium, characterized in that, The storage medium stores at least one instruction or at least one program segment, which is loaded and executed by a processor to implement an image category recognition method as described in any one of claims 1 to 7.