Image content similarity analysis method, device, and storage medium

By extracting local regions from images and performing similarity analysis, this method solves the problem of low accuracy in complex backgrounds and multi-target scenes, achieving higher precision in image content similarity judgment.

CN115270907B · Active · Publication Date: 2026-06-12 · TENCENT MUSIC ENTERTAINMENT TECH (SHENZHEN) CO LTD

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TENCENT MUSIC ENTERTAINMENT TECH (SHENZHEN) CO LTD
Filing Date: 2019-09-05
Publication Date: 2026-06-12

Application Information

Patent Timeline

05 Sep 2019: Application
12 Jun 2026: Publication (CN115270907B)

IPC: G06V10/74; G06T7/00; G06T7/70

CPC: G06T7/0002; G06T7/70

AI Tagging

Application Domain

Image analysis


AI Technical Summary

⚠Technical Problem

Existing technologies have low accuracy in identifying image similarity, especially in complex backgrounds and multi-object scenarios. Traditional hash algorithms have poor robustness, and feature-based methods cannot effectively determine the similarity of local content in large image backgrounds.

⚗Method used

Local regions are extracted from images using a pre-trained object detection model to form a sequence of local images. Similarity analysis is used to calculate the similarity between local images. The calculation process is optimized by combining a greedy algorithm and selection rules to obtain the content similarity of the images.

🎯Benefits of technology

In complex backgrounds and multi-target scenarios, it improves the accuracy of image content similarity analysis, avoids the shortcomings of traditional methods, and enhances the precision of local similarity judgment.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN115270907B_ABST


Abstract

This application discloses an image content similarity analysis method, apparatus, and storage medium. The method includes: determining multiple local regions to be analyzed from a first image to obtain a first local image sequence; determining multiple local regions to be analyzed from a second image to obtain a second local image sequence; determining a first number of local images in the first local image sequence and a second number of local images in the second local image sequence; calculating the similarity between each local image in the first local image sequence and each local image in the second local image sequence; determining the smaller value between the first number and the second number as a target number; sequentially selecting the target number of local images from the first local image sequence and the second local image sequence according to the order of similarity from high to low, and performing similarity analysis processing to obtain a similarity result; and calculating the content similarity between the first image and the second image based on the similarity result.


Description

Technical Field

[0001] This application relates to the field of information processing technology, and specifically to a method, apparatus, and storage medium for image content similarity analysis.

Background Technology

[0002] With the development of technology and the continuous progress of society, people can obtain image resources quickly and easily, so finding similar or identical images among this vast amount of resources is crucial. In developing this invention, the inventors found that existing technologies identify image similarity using hash values. However, because image content is increasingly diverse and complex, hash-based identification has low accuracy for images whose content is similar or only partially similar.

Summary of the Invention

[0003] This application provides a method, apparatus, and storage medium for image content similarity analysis, which can improve the accuracy of image content similarity analysis results.

[0004] To this end, this application provides a method for image content similarity analysis, including:

[0005] Multiple local regions to be analyzed are identified from the first image, resulting in a first local image sequence;

[0006] Multiple local regions to be analyzed are identified from the second image, resulting in a second local image sequence;

[0007] Determine the first number of local images in the first local image sequence and the second number of local images in the second local image sequence;

[0008] Calculate the similarity between each local image in the first local image sequence and each local image in the second local image sequence;

[0009] The smaller of the first number and the second number is determined as the target number;

[0010] In descending order of similarity, the target number of local images are selected sequentially from the first local image sequence and the second local image sequence, and similarity analysis is performed on them to obtain the similarity result;

[0011] Based on the similarity results, the content similarity between the first image and the second image is calculated.

[0012] Accordingly, this application also provides an image content similarity analysis device, including:

[0013] The first determining unit is used to determine multiple local regions to be analyzed from the first image to obtain a first local image sequence;

[0014] The second determining unit is used to determine multiple local regions to be analyzed from the second image to obtain a second local image sequence;

[0015] The analysis unit is used to determine the first number of local images in the first local image sequence and the second number of local images in the second local image sequence;

[0016] Calculate the similarity between each local image in the first local image sequence and each local image in the second local image sequence;

[0017] Determine the smaller of the first number and the second number as the target number;

[0018] In descending order of similarity, select the target number of local images sequentially from the first local image sequence and the second local image sequence, and perform similarity analysis on them to obtain the similarity result;

[0019] The processing unit is used to calculate the content similarity between the first image and the second image based on the similarity results.

[0020] Accordingly, this application also provides a storage medium storing multiple instructions adapted for loading by a processor to execute the steps in the image content similarity analysis method described above.

[0021] The solution of this application can apply the similarity algorithm to small regions even when the image background is complex, avoiding the problem that image similarity algorithms are not robust to large images with complex backgrounds and multi-target scenes, and thus improving the accuracy of image content similarity analysis results.

Attached Figure Description

[0022] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a flowchart of the image content similarity analysis method provided in the embodiments of this application.

[0024] Figure 2 is a schematic diagram of the image content similarity analysis device provided in the embodiments of this application.

[0025] Figure 3 is a schematic diagram of the terminal structure provided in the embodiments of this application.

Detailed Implementation

[0026] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0027] This application provides a method, apparatus, and storage medium for image content similarity analysis.

[0028] Specifically, the image content similarity analysis device can be integrated into a terminal device such as a tablet PC (Personal Computer) or a mobile phone that has a storage unit and a microprocessor with computing capabilities.

[0029] In related technologies, many traditional methods for image similarity assessment employ hash algorithms. However, hash-based methods are not robust to transformations such as rotation and color changes, and their accuracy is limited. Feature-based similarity assessment, which relies either on handcrafted image features (such as STFT) or on convolutional features, offers better robustness and accuracy. However, for large images, especially those with background interference, these algorithms cannot be used directly for content similarity detection. In summary, traditional image similarity assessment techniques (such as hash-based and feature-based methods) are mainly suited to assessing the similarity between an image and a copy of itself altered by watermarks, blurring, rotation, noise, or cropping.

[0030] Furthermore, two images may contain similar content without being similar as a whole; their similarity may be confined to certain regions. For example, advertisements for a certain type of item (such as computers) can take many forms, but they all include the target product (a computer) or some of its components. Moreover, the degree of local similarity for the same type of target product can vary: two Apple computers will be more similar than an Apple computer and a computer from another brand. The methods mentioned above cannot effectively handle this type of problem.

[0031] Based on this, embodiments of this application provide a method for image content similarity analysis. The method includes: determining multiple local regions to be analyzed from a first image to obtain a first local image sequence; determining multiple local regions to be analyzed from a second image to obtain a second local image sequence; performing similarity analysis on the first local image sequence and the second local image sequence to obtain a similarity result; and calculating the content similarity between the first image and the second image based on the similarity result.

[0032] Each embodiment is described in detail below. Note that the sequence numbers of the following embodiments do not indicate an order of preference.

[0033] Referring to Figure 1, Figure 1 is a flowchart of the image content similarity analysis method provided in an embodiment of this application. The specific flow of the method can be as follows:

[0034] 101. Identify multiple local regions to be analyzed from the first image to obtain the first local image sequence.

[0035] The first image refers to a two-dimensional visual medium composed of graphics, images, and the like. It encompasses drawings, photographs, rubbings, and similar forms that use points, lines, symbols, text, and numbers to depict the geometric characteristics, shape, position, and size of things. Image formats are numerous but generally fall into two categories: raster graphics and vector graphics. Common formats such as BMP and JPG are raster graphics, while formats such as SWF, CDR, and AI are vector graphics. With the development of digital acquisition technology and signal processing theory, more and more images are stored in digital form.

[0036] In this embodiment of the application, the first image can be an image with many constituent elements and a high degree of visual interference, such as an advertising image with a complex background.

[0037] When comparing images of this type, the images contain numerous and complex elements. Therefore, to reduce the complexity of the images, this solution can extract the target to be compared from the original image for analysis and comparison. That is, in some embodiments, the step "determining multiple local regions to be analyzed from the first image to obtain a first local image sequence" may include the following process:

[0038] Entity location information is identified in the first image using the pre-trained object detection model;

[0039] Based on the identified entity location information, local images are extracted from the first image to obtain the first local image sequence.

[0040] Here, "entity" refers to people and/or objects with physical form in the first image, such as living people, animals and plants, and inanimate objects (such as electrical appliances, household products, and daily necessities).

[0041] In this embodiment, a pre-trained object localization model is required, such as a Faster R-CNN, YOLO, or SSD model. Such a model is typically part of a complete object detection model, which generally includes an object localization part and an object classification part. In practice, therefore, the object localization part can be extracted from the pre-trained object detection model; it provides the location information (such as coordinates) of entities that "may be objects." If the pre-trained model already covers the categories of the objects to be detected, the entire model can be retained so that the final entity location information carries category labels.

[0042] Based on a pre-trained object detection model, the location of entities in the first image is detected to determine the specific location of all entities in the first image. Then, based on this specific location, the corresponding region image (i.e., entity image) is extracted from the first image, resulting in one or more region images, thus forming a first local image sequence.

[0043] In some embodiments, taking image A as an example, the localization function of the object detection model can be used to obtain the coordinates of possible targets and extract them, forming a local image sequence Ao = [Ao1, Ao2, Ao3, …, Aon]. For example, the obtained first local image sequence may include a computer image, a cup image, a logo image on the computer, a pattern image on the cup, and other local images.
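The cropping step that turns detector output into a local image sequence can be sketched as follows. This is a minimal illustration, not the patented implementation: the object detector itself (Faster R-CNN, YOLO, SSD, etc.) is not shown, and its output is assumed to be a list of boxes in a hypothetical (x1, y1, x2, y2) pixel convention with exclusive right and bottom edges. Images are plain nested lists so the sketch has no dependencies, and the names crop and local_image_sequence are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical types: an image is a row-major grid of pixel values, and a
# detector box is (x1, y1, x2, y2) in pixels with x2 and y2 exclusive.
Image = List[List[int]]
Box = Tuple[int, int, int, int]


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular region given by box out of image."""
    x1, y1, x2, y2 = box
    height = len(image)
    width = len(image[0]) if image else 0
    # Clamp to the image bounds so out-of-range detector coordinates
    # neither raise nor produce wrapped (negative-index) slices.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def local_image_sequence(image: Image, boxes: Sequence[Box]) -> List[Image]:
    """Build the local image sequence [Ao1, ..., Aon] from detector boxes,
    skipping boxes that become empty after clamping."""
    crops = (crop(image, box) for box in boxes)
    return [c for c in crops if c and c[0]]
```

Clamping and discarding empty crops is a defensive choice for this sketch; the text itself does not say how boxes outside the image are handled.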

[0044] 102. Identify multiple local regions to be analyzed from the second image to obtain the second local image sequence.

[0045] Similarly, the second image is preferably an image with many constituent elements and a high degree of visual interference, such as an advertising image with a complex background. The step "determine multiple local regions to be analyzed from the second image to obtain a second local image sequence" may include the following process:

[0046] Entity location information is identified in the second image using the pre-trained object detection model;

[0047] Based on the identified entity location information, local images are extracted from the second image to obtain a second local image sequence.

[0048] Here, "entity" has the same meaning as defined for the first image above: people and/or objects with physical form in the second image, such as living people, animals and plants, and inanimate objects (such as electrical appliances, household products, and daily necessities).

[0049] Similarly, the positions of entities in the second image are detected using the pre-trained object detection model to determine the specific positions of all entities in the second image. The corresponding region images (i.e., entity images) are then cropped from the second image at those positions, yielding one or more region images that form the second local image sequence.

[0050] In some embodiments, taking the second image to be compared, image B, as an example, the localization function of the object detection model can be used to obtain the coordinates of possible targets and crop them out, forming a local image sequence Bo = [Bo1, Bo2, Bo3, …, Bom]. For example, the obtained second local image sequence may include local images such as a computer image, a cup image, a trademark image on the computer, and a pattern image on the cup.

[0051] Note that each item in the first and second local image sequences is a local image containing a target, and the two sequences may have the same or different lengths.

[0052] 103. Perform similarity analysis on the first local image sequence and the second local image sequence to obtain a similarity result.

[0053] In practice, image similarity can be analyzed along multiple dimensions, such as similarity in hue or similarity after rotation, blurring, and scaling. In the embodiments of this application, the image similarity to be analyzed specifically refers to the similarity of images containing similar content (or themes). This application therefore proposes a method for judging image content similarity based on the target entities contained in the images, which is especially applicable to scenarios where discriminating content similarity matters, such as advertising images.

[0054] In some embodiments, when measuring the similarity between sequence Ao and sequence Bo, assume without loss of generality that n < m. Given a local image similarity metric f, n items must be selected from Bo, and a permutation of these n items must be found that maximizes the similarity Sim(Ao, Bo). The similarity is calculated as follows:

[0055]
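The formula referenced in [0054] is not reproduced in this text. One plausible form, offered only as a reconstruction and assuming the objective averages f over the matched pairs (consistent with the averaging described in [0085] and [0088]), is:

```latex
\mathrm{Sim}(A_o, B_o) = \max_{\sigma} \frac{1}{n} \sum_{i=1}^{n} f\left(A_{oi},\, B_{o\sigma(i)}\right),
\qquad \sigma : \{1, \dots, n\} \to \{1, \dots, m\} \ \text{injective}
```

Here σ chooses n distinct elements of Bo together with their order; there are m!/(m − n)! such maps, which is the search space of the exhaustive method.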

[0056] Specifically, all permutations of all n-item subsequences of Bo can be traversed to find the maximum similarity (i.e., the minimum loss). This requires m!/(m − n)! comparisons, which is at most m! (the factorial of m). Because this exhaustive method is computationally expensive, it is used directly, performing a full-sequence matching of the first and second local image sequences, only when neither sequence contains too many elements. That is, in some embodiments, the step "performing similarity analysis on the first local image sequence and the second local image sequence to obtain a similarity result" may include the following steps:

[0057] (11) Determine the first number of local images in the first local image sequence and the second number of local images in the second local image sequence;

[0058] (12) If neither the first quantity nor the second quantity exceeds the first preset value, then perform a full sequence image similarity analysis on the first local image sequence and the second local image sequence to obtain the similarity result.
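A minimal sketch of the full-sequence (exhaustive) analysis in steps (11) and (12) follows, assuming the objective is the mean of a pairwise similarity function f over matched pairs. The names full_sequence_similarity and within_first_preset, and the default limit of 8, are hypothetical; the text only says the first preset value depends on the terminal's processing power.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def within_first_preset(n: int, m: int, first_preset: int = 8) -> bool:
    """Step (12) gate: neither sequence may exceed the first preset value.
    The default of 8 is a hypothetical placeholder."""
    return n <= first_preset and m <= first_preset


def full_sequence_similarity(
    a: Sequence[T], b: Sequence[T], f: Callable[[T, T], float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Try every ordered choice of len(shorter) items from the longer sequence
    (m!/(m-n)! candidates) and keep the one with the highest mean f.

    Returns (best mean similarity, [(index in a, index in b), ...]).
    """
    swapped = len(a) > len(b)
    short, longer = (b, a) if swapped else (a, b)
    n = len(short)
    if n == 0:
        return 0.0, []
    best_score: float = float("-inf")
    best_perm: Tuple[int, ...] = ()
    for perm in permutations(range(len(longer)), n):
        # perm[i] is the index in the longer sequence matched to short[i].
        score = sum(f(short[i], longer[j]) for i, j in enumerate(perm)) / n
        if score > best_score:
            best_score, best_perm = score, perm
    # Report pairs as (index in a, index in b) regardless of which was shorter.
    pairs = [(j, i) if swapped else (i, j) for i, j in enumerate(best_perm)]
    return best_score, pairs
```

For example, with f(x, y) = 1 − |x − y| on scalar stand-ins for local images, matching [0.1, 0.9] against [0.85, 0.5, 0.15] gives a mean of about 0.95 with pairs [(0, 2), (1, 0)].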

[0059] The first preset value can be set according to the actual processing power of the terminal's processor. Because processing power strongly affects how quickly results are produced, a larger first preset value can be set for terminals with stronger processing power and a smaller one for terminals with weaker processing power. In practice, other factors that affect output speed can also be considered to adjust the first preset value adaptively.

[0060] However, when a local image sequence contains too many local images, directly applying the above calculation to the first and second local image sequences produces a large number of permutations and thus high computational complexity. In this embodiment, a series of filtering rules can be used to remove local images with a low matching degree from the local image sequences, reducing the computational complexity. In a specific implementation, a maximum value m_max can be set; when the number of elements in an image sequence exceeds m_max, its local images are filtered using the filtering rules before similarity analysis is performed. Several filtering methods are described below:

[0061] In some embodiments, the step "performing similarity analysis on the first local image sequence and the second local image sequence to obtain similarity results" may include the following steps:

[0062] (21) Determine the first number of local images in the first local image sequence and the second number of local images in the second local image sequence;

[0063] (22) If the first quantity and the second quantity satisfy the first preset condition, then calculate the similarity between each local image in the first local image sequence and each local image in the second local image sequence;

[0064] (23) From the first local image sequence and the second local image sequence, determine the longer local image sequence and the shorter local image sequence;

[0065] (24) Delete a corresponding number of local images from the longer local image sequence in ascending order of similarity, so that the number of local images remaining in the longer sequence equals the second preset value, to obtain a target local image sequence;

[0066] (25) Perform a full-sequence image similarity analysis on the target local image sequence and the shorter local image sequence to obtain a similarity result.

[0067] Specifically, in this embodiment, m_max can be the second preset value, and the first preset condition can be: the first quantity is less than the second preset value, and the second quantity is greater than the second preset value.

[0068] Still taking n < m as an example, the shorter local image sequence is the first local image sequence Ao, and the longer local image sequence is the second local image sequence Bo. The first preset condition is then: n < m_max and m > m_max. If the first quantity and the second quantity meet this condition, the similarity between each element of Bo and each element of Ao can be calculated, the elements of Bo can be reordered in descending order of the calculated similarity, and the last m − m_max elements of Bo can be eliminated. Finally, a full-sequence similarity analysis is performed on elements 1 to m_max of Bo (i.e., the target local image sequence) and the first local image sequence Ao to obtain a similarity result.
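The filtering in [0068] can be sketched as below. The text does not say how a single score per Bo element is derived from its similarities to all Ao elements; this sketch assumes the maximum similarity to any Ao element. It also assumes n ≥ 1, takes a precomputed n × m matrix sim[i][j] = f(Aoi, Boj), and uses hypothetical function names.

```python
from typing import List, Sequence


def first_preset_condition(n: int, m: int, m_max: int) -> bool:
    """n < m_max and m > m_max (n >= 1 assumed so every Bo element gets a score)."""
    return 0 < n < m_max < m


def prune_longer(sim: Sequence[Sequence[float]], m_max: int) -> List[int]:
    """Return the indices of the m_max Bo elements kept as the target sequence.

    sim[i][j] is the similarity between Ao_i (shorter sequence, rows) and
    Bo_j (longer sequence, columns).
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    if not first_preset_condition(n, m, m_max):
        raise ValueError("first preset condition not met")
    # Assumed per-element score: best match against any Ao element.
    scores = [max(sim[i][j] for i in range(n)) for j in range(m)]
    # Sort Bo in descending order of score (stable for ties); keeping the
    # first m_max drops the last m - m_max elements.
    order = sorted(range(m), key=lambda j: scores[j], reverse=True)
    return order[:m_max]
```

The returned indices, in descending score order, form the target local image sequence that is then passed to the full-sequence analysis together with Ao.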

[0069] In practical applications, the second preset value can be equal to the above first preset value.

[0070] In some embodiments, the step "performing similarity analysis on the first local image sequence and the second local image sequence to obtain a similarity result" may include the following steps:

[0071] (31) Determine the first quantity of local images in the first local image sequence and the second quantity of local images in the second local image sequence;

[0072] (32) If the first quantity and the second quantity meet the second preset condition, calculate the similarity between each local image in the first local image sequence and each local image in the second local image sequence;

[0073] (33) Delete local images from the first local image sequence and the second local image sequence in turn, in ascending order of similarity;

[0074] (34) When the number of local images remaining in the first local image sequence reaches the third preset value, take the remaining local images as the first local image subsequence; when the number remaining in the second local image sequence reaches the third preset value, take the remaining local images as the second local image subsequence;

[0075] (35) Perform a full-sequence image similarity analysis on the first local image subsequence and the second local image subsequence to obtain a similarity result.

[0076] Specifically, in this embodiment, m_max can be the third preset value, and the second preset condition can be: both the first quantity and the second quantity are greater than the third preset value. Still taking the first local image sequence Ao, the second local image sequence Bo, and n < m as an example, the second preset condition is: n > m_max and m > m_max.

[0077] If the first quantity and the second quantity meet the second preset condition, the similarity between each element of Bo and each element of Ao can be calculated, and the elements of both Ao and Bo can be re-sorted in descending order of the calculated similarity. Elements are then eliminated one at a time from the ends of the two re-sorted sequences in turn. Once either sequence is down to m_max elements, it is skipped, and elimination stops when both sequences have m_max elements. Finally, a full-sequence image similarity analysis is performed on the remaining elements of Bo (i.e., the second local image subsequence) and the remaining elements of Ao (i.e., the first local image subsequence) to obtain a similarity result.
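The alternating elimination in [0077] can be sketched as follows. As an assumption not stated in the text, each element is scored by its best similarity to any element of the other sequence, and the scores are computed once rather than updated during elimination. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def second_preset_condition(n: int, m: int, m_max: int) -> bool:
    """Both sequences are longer than m_max (the third preset value)."""
    return n > m_max and m > m_max


def prune_both(
    sim: Sequence[Sequence[float]], m_max: int
) -> Tuple[List[int], List[int]]:
    """Return the kept indices of Ao (rows) and Bo (columns) of sim."""
    n = len(sim)
    m = len(sim[0]) if n else 0
    if not second_preset_condition(n, m, m_max):
        raise ValueError("second preset condition not met")
    # Assumed per-element score: best similarity to the other sequence.
    row_score = [max(row) for row in sim]
    col_score = [max(sim[i][j] for i in range(n)) for j in range(m)]
    a = sorted(range(n), key=lambda i: row_score[i], reverse=True)
    b = sorted(range(m), key=lambda j: col_score[j], reverse=True)
    # Eliminate from the tail of each list in turn; a list already down to
    # m_max is skipped, and the loop stops once both have m_max left.
    while len(a) > m_max or len(b) > m_max:
        if len(a) > m_max:
            a.pop()
        if len(b) > m_max:
            b.pop()
    return a, b
```

Because the scores are fixed, taking turns at the tails yields the same result as keeping the top m_max of each sorted sequence; a variant that recomputed scores after each removal could differ, but the text does not describe one.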

[0078] In practical applications, the third preset value can be equal to the above first preset value and the second preset value.

[0079] In a specific implementation, if the complexity of m! comparisons cannot be tolerated, a greedy algorithm can be used to calculate the similarity between the first local image sequence Ao and the second local image sequence Bo. That is, in some embodiments, the step "performing similarity analysis on the first local image sequence and the second local image sequence to obtain a similarity result" may include the following steps:

[0080] (41) Determine the first quantity of local images in the first local image sequence and the second quantity of local images in the second local image sequence;

[0081] (42) Calculate the similarity between each local image in the first local image sequence and each local image in the second local image sequence;

[0082] (43) Determine the smaller of the first quantity and the second quantity as the target quantity;

[0083] (44) In descending order of similarity, select the target quantity of local images from the first local image sequence and the second local image sequence respectively, and perform similarity analysis to obtain the similarity result.

[0084] Specifically, for each element of the first local image sequence Ao, calculate its similarity to each element of the second local image sequence Bo, and then select the n largest similarity items using a greedy algorithm: first select the largest item, then remove the two elements it pairs (one from each sequence) and select the largest remaining item, and so on.

[0085] Note that the selected similarity values should be averaged so that the resulting similarity stays between 0 and 1.
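Steps (41) to (44), together with the averaging in [0085], can be sketched as below, assuming a precomputed similarity matrix sim[i][j] = f(Aoi, Boj) with values in [0, 1]. The name greedy_similarity and the tie-breaking rule (lower indices first) are assumptions.

```python
from typing import List, Sequence, Set, Tuple


def greedy_similarity(
    sim: Sequence[Sequence[float]],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Greedy matching on an n x m similarity matrix sim[i][j] = f(Ao_i, Bo_j).

    Picks the largest remaining entry, removes its row and column, and repeats
    until min(n, m) pairs (the target number) are chosen. Returns the mean of
    the chosen similarities and the chosen (i, j) pairs in selection order.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    target = min(n, m)
    # Scanning all entries once in descending order while skipping used rows
    # and columns selects the same pairs as repeatedly taking the maximum and
    # deleting its row and column. Ties prefer lower indices.
    entries = sorted(
        ((sim[i][j], i, j) for i in range(n) for j in range(m)),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    used_rows: Set[int] = set()
    used_cols: Set[int] = set()
    pairs: List[Tuple[int, int]] = []
    for _, i, j in entries:
        if len(pairs) == target:
            break
        if i in used_rows or j in used_cols:
            continue
        used_rows.add(i)
        used_cols.add(j)
        pairs.append((i, j))
    if not pairs:
        return 0.0, []
    # Averaging keeps the overall similarity in [0, 1] when each f is in [0, 1].
    mean = sum(sim[i][j] for i, j in pairs) / len(pairs)
    return mean, pairs
```

On the matrix [[0.9, 0.8], [0.85, 0.1], [0.2, 0.3]] the greedy pass picks (0, 0) and then (2, 1), for a mean of 0.6, whereas the best assignment, (1, 0) and (0, 1), averages 0.825. This is the accuracy the greedy variant can give up in exchange for avoiding the exhaustive search.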

[0086] 104. Based on the similarity results, calculate the content similarity between the first image and the second image.

[0087] In some embodiments, the similarity result includes multiple similarity values, each between a local image in the first local image sequence and its corresponding local image in the second local image sequence.

[0088] When calculating the content similarity between the first image and the second image based on the similarity results, the average of the multiple similarity values can be calculated, and the calculated average can be used as the content similarity between the first image and the second image.

[0089] Because the objects compared here are local regions, each local image is typically small, with a concentrated target and weak background interference. In this case, feature-based image similarity algorithms can be used for comparison, such as the well-established convolutional-feature and STFT-feature image similarity methods. The local similarity results should likewise be scaled to the range 0 to 1.
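One common way to obtain a local similarity in the 0-to-1 range from convolutional features is cosine similarity, rescaled from [-1, 1]. The sketch below assumes the feature vectors have already been extracted (the network is not shown). The linear rescaling (cos + 1) / 2 is one choice among several, not necessarily the one used in this application, and the name cosine_to_unit is hypothetical.

```python
import math
from typing import Sequence


def cosine_to_unit(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors, mapped from [-1, 1] to [0, 1]."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        return 0.0  # assumption: an all-zero feature vector matches nothing
    cos = max(-1.0, min(1.0, dot / norm))  # clamp floating-point overshoot
    return (cos + 1.0) / 2.0
```

Identical directions map to 1.0, orthogonal vectors to 0.5, and opposite directions to 0.0.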

[0090] The image content similarity analysis method provided in this embodiment determines multiple local regions to be analyzed from a first image, obtaining a first local image sequence; determines multiple local regions to be analyzed from a second image, obtaining a second local image sequence; performs similarity analysis on the first and second local image sequences to obtain a similarity result; and calculates the content similarity between the first and second images based on the similarity result. This scheme can apply the similarity algorithm to small regions even when the image background is complex, avoiding the problem of image similarity algorithms being ineffective for large images with complex backgrounds and multi-target scenes, thus improving the accuracy of image content similarity analysis results.

[0091] To facilitate better implementation of the image content similarity analysis method provided in this application, this application also provides an apparatus (hereinafter referred to as a processing apparatus) based on the above-described image content similarity analysis method, applied to a client. The meanings of the terms are the same as in the above-described image content similarity analysis method, and specific implementation details can be found in the description of the method embodiments.

[0092] Referring to Figure 2, Figure 2 is a schematic diagram of the image content similarity analysis device provided in an embodiment of this application. The processing device 400 may include a first determining unit 401, a second determining unit 402, an analysis unit 403, and a processing unit 404, as follows:

[0093] The first determining unit 401 is used to determine multiple local regions to be analyzed from the first image to obtain a first local image sequence;

[0094] The second determining unit 402 is used to determine multiple local regions to be analyzed from the second image to obtain a second local image sequence;

[0095] Analysis unit 403 is used to perform similarity analysis on the first local image sequence and the second local image sequence to obtain similarity results;

[0096] The processing unit 404 is used to calculate the content similarity between the first image and the second image based on the similarity result.
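The unit structure of processing device 400 can be mirrored as a small composition of callables. This is a hypothetical sketch: the class and field names are illustrative, each unit is injected rather than implemented, and the processing unit 404 is assumed to average the similarity values as described in [0088].

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, TypeVar

Img = TypeVar("Img")


@dataclass
class ContentSimilarityDevice(Generic[Img]):
    """Hypothetical mirror of processing device 400."""

    # Units 401 and 402: image -> local image sequence (e.g. detector + crop).
    first_determining_unit: Callable[[Img], List[Img]]
    second_determining_unit: Callable[[Img], List[Img]]
    # Unit 403: two local image sequences -> per-pair similarity values.
    analysis_unit: Callable[[Sequence[Img], Sequence[Img]], List[float]]

    def processing_unit(self, values: Sequence[float]) -> float:
        """Unit 404: content similarity as the mean of the similarity values."""
        return sum(values) / len(values) if values else 0.0

    def content_similarity(self, first: Img, second: Img) -> float:
        first_seq = self.first_determining_unit(first)
        second_seq = self.second_determining_unit(second)
        return self.processing_unit(self.analysis_unit(first_seq, second_seq))
```

As a toy usage, with strings standing in for images, str.split as both determining units, and an analysis unit that scores equal tokens 1.0 and others 0.0, comparing "cup laptop" with "cup phone" yields 0.5. In a real system, the analysis unit would wrap one of the matching strategies described above.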

[0097] In some embodiments, the analysis unit 403 may specifically be used for:

[0098] Determine the first number of local images in the first local image sequence and the second number of local images in the second local image sequence;

[0099] If neither the first number nor the second number exceeds the first preset value, a full-sequence image similarity analysis is performed on the first local image sequence and the second local image sequence to obtain the similarity result.
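This section does not define "full-sequence image similarity analysis". One plausible reading, assumed here and not confirmed by the source, is an exhaustive search over every one-to-one pairing of the two sequences, keeping the pairing with the highest total similarity. The sketch also assumes the pairwise similarities are already available as a matrix `sim`; the function name and the default `first_preset_value=5` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Pair = Tuple[int, int, float]  # (index in first sequence, index in second sequence, similarity)


def full_sequence_analysis(sim: Sequence[Sequence[float]], first_preset_value: int = 5) -> List[Pair]:
    """Exhaustively score every one-to-one pairing of two local image sequences.

    sim[i][j] is the (assumed precomputed) similarity between local image i of
    the first sequence and local image j of the second sequence.
    """
    n_first = len(sim)
    n_second = len(sim[0]) if n_first else 0
    if max(n_first, n_second) > first_preset_value:
        raise ValueError("sequence exceeds the first preset value")
    if n_first == 0 or n_second == 0:
        return []
    k = min(n_first, n_second)
    # Keep the shorter side in order and permute k indices of the longer side.
    if n_first <= n_second:
        pairings = ([(i, j) for i, j in enumerate(cols)] for cols in permutations(range(n_second), k))
    else:
        pairings = ([(i, j) for j, i in enumerate(rows)] for rows in permutations(range(n_first), k))
    best: List[Pair] = []
    best_total = float("-inf")
    for pairing in pairings:
        total = sum(sim[i][j] for i, j in pairing)
        if total > best_total:  # strict ">" keeps the first pairing on ties
            best_total = total
            best = [(i, j, sim[i][j]) for i, j in pairing]
    return best
```

For `sim = [[0.9, 0.8], [0.85, 0.1]]` this returns the pairs (0, 1) and (1, 0) with a total of 1.65, whereas always taking the single best pair first would give 0.9 + 0.1. The number of pairings grows factorially with sequence length, which is presumably why such an analysis is limited to sequences no longer than the first preset value.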

[0100] In some embodiments, the analysis unit 403 may specifically be used for:

[0101] Determine the first number of local images in the first local image sequence and the second number of local images in the second local image sequence;

[0102] If the first number and the second number satisfy the first preset condition, the similarity between each local image in the first local image sequence and each local image in the second local image sequence is calculated;

[0103] From the first local image sequence and the second local image sequence, determine the longer local image sequence and the shorter local image sequence;

[0104] Local images are deleted from the longer local image sequence in order of similarity from low to high until the number of remaining local images in the longer local image sequence equals a second preset value; the remaining local images form the target local image sequence;

[0105] A full-sequence image similarity analysis is performed on the target local image sequence and the shorter local image sequence to obtain the similarity results.
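The pruning step above can be sketched as follows. The text does not say which similarity value ranks each local image, so this sketch assumes each image of the longer sequence is scored by its best similarity to any image of the shorter sequence. `sim` is assumed to be oriented with the longer sequence as rows; the function name is hypothetical.

```python
from typing import List, Sequence


def prune_longer_sequence(sim: Sequence[Sequence[float]], second_preset_value: int) -> List[int]:
    """Return indices of the local images kept in the target local image sequence.

    sim[i][j] is the similarity between local image i of the LONGER sequence and
    local image j of the shorter sequence.
    """
    # Assumed ranking score: each local image's best similarity to the shorter sequence.
    scores = [max(row) for row in sim]
    # Order from low to high similarity; the lowest-scoring images are deleted first.
    order = sorted(range(len(sim)), key=lambda i: scores[i])
    n_delete = max(len(sim) - second_preset_value, 0)
    deleted = set(order[:n_delete])
    # Preserve the original sequence order among the survivors.
    return [i for i in range(len(sim)) if i not in deleted]
```

If the first sequence happens to be the shorter one, the matrix can be transposed first, for example with `[list(col) for col in zip(*sim)]`. The kept rows, together with the shorter sequence, would then go to the full-sequence analysis.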

[0106] In some embodiments, the analysis unit 403 may specifically be used for:

[0107] Determine the first number of local images in the first local image sequence and the second number of local images in the second local image sequence;

[0108] If the first number and the second number satisfy the second preset condition, the similarity between each local image in the first local image sequence and each local image in the second local image sequence is calculated;

[0109] Local images are deleted one by one from the first local image sequence and from the second local image sequence, in order of similarity from low to high;

[0110] When the number of remaining local images in the first local image sequence equals a third preset value, the remaining local images are used as the first local image subsequence; and when the number of remaining local images in the second local image sequence equals the third preset value, the remaining local images are used as the second local image subsequence;

[0111] A full-sequence image similarity analysis is performed on the first local image subsequence and the second local image subsequence to obtain the similarity result.
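A sketch of trimming both sequences to the third preset value follows. It makes two assumptions the text leaves open: each local image is ranked by its best similarity to any image of the other sequence, and those scores are computed once against the full sequences rather than recomputed after each deletion. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _keep_highest(scores: Sequence[float], keep: int) -> List[int]:
    # Delete from low to high score until `keep` items remain; keep original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    deleted = set(order[: max(len(scores) - keep, 0)])
    return [i for i in range(len(scores)) if i not in deleted]


def prune_both_sequences(sim: Sequence[Sequence[float]], third_preset_value: int) -> Tuple[List[int], List[int]]:
    """Return kept indices for the first and second local image subsequences.

    sim[i][j] is the similarity between local image i of the first sequence and
    local image j of the second sequence.
    """
    n_first = len(sim)
    n_second = len(sim[0]) if n_first else 0
    # Assumed ranking score: best similarity to any image of the other sequence.
    first_scores = [max(row) for row in sim]
    second_scores = [max(sim[i][j] for i in range(n_first)) for j in range(n_second)]
    return (_keep_highest(first_scores, third_preset_value),
            _keep_highest(second_scores, third_preset_value))
```

Because both subsequences end up with the same length, the subsequent full-sequence analysis compares two equally sized sets.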

[0112] In some embodiments, the analysis unit 403 may be specifically used for:

[0113] Determine the first number of local images in the first local image sequence and the second number of local images in the second local image sequence;

[0114] Calculate the similarity between each local image in the first local image sequence and each local image in the second local image sequence;

[0115] The smaller of the first number and the second number is determined as the target number;

[0116] In order of similarity from high to low, the target number of local images are selected sequentially from the first local image sequence and the second local image sequence for similarity analysis to obtain the similarity result.
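The greedy selection described above (and spelled out in claim 1) can be sketched as below: candidate pairs are visited from highest to lowest similarity, and a pair is accepted only if neither of its local images has been selected yet, until the target number of pairs is reached. The matrix input `sim`, the function name, and the row-major tie-breaking are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int, float]  # (index in first sequence, index in second sequence, similarity)


def greedy_pairs(sim: Sequence[Sequence[float]]) -> List[Pair]:
    """Greedy one-to-one pairing from highest to lowest similarity."""
    n_first = len(sim)
    n_second = len(sim[0]) if n_first else 0
    target_number = min(n_first, n_second)
    # Every candidate pair, ordered from highest to lowest similarity
    # (Python's sort is stable, so ties keep row-major order).
    candidates = sorted(
        ((sim[i][j], i, j) for i in range(n_first) for j in range(n_second)),
        key=lambda c: c[0],
        reverse=True,
    )
    used_first, used_second = set(), set()
    pairs: List[Pair] = []
    for similarity, i, j in candidates:
        if len(pairs) == target_number:
            break
        # Skip pairs whose images have already been selected.
        if i in used_first or j in used_second:
            continue
        used_first.add(i)
        used_second.add(j)
        pairs.append((i, j, similarity))
    return pairs
```

Sorting all candidates costs O(nm log nm) for sequences of lengths n and m, far cheaper than an exhaustive search over pairings, at the price of sometimes missing the best total (for `[[0.9, 0.8], [0.85, 0.1]]` it selects 0.9 and then 0.1).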

[0117] In some embodiments, the similarity result includes multiple similarity values between a corresponding local image in a first local image sequence and a corresponding local image in a second local image sequence. The processing unit 404 can specifically be used for:

[0118] The average of the multiple similarity values is calculated as the content similarity between the first image and the second image.
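A minimal sketch of this averaging step follows, with the optional normalization that claim 3 adds. The preset numerical range is not specified, so this sketch assumes raw values lie in [-1, 1] (as with cosine similarity) and maps them linearly onto [0, 1]. Returning 0.0 when no pairs exist is likewise an assumption.

```python
from typing import Sequence


def content_similarity(values: Sequence[float], normalize: bool = False,
                       low: float = -1.0, high: float = 1.0) -> float:
    """Mean of the per-pair similarity values, optionally min-max normalized first."""
    if not values:
        return 0.0  # assumption: no matched pairs is treated as no shared content
    if normalize:
        # Map [low, high] linearly onto [0, 1] (assumed preset numerical range).
        values = [(v - low) / (high - low) for v in values]
    return sum(values) / len(values)
```

For example, `content_similarity([1.0, -1.0], normalize=True)` gives 0.5, since the two values map to 1.0 and 0.0.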

[0119] In some embodiments, determining multiple local regions to be analyzed from the first image to obtain a first local image sequence includes:

[0120] The entity location information is identified from the first image based on the pre-trained object detection model;

[0121] Based on the identified entity location information, local images are extracted from the first image to obtain the first local image sequence;

[0122] The step of determining multiple local regions to be analyzed from the second image to obtain a second local image sequence includes:

[0123] The entity location information is identified from the second image based on the pre-trained object detection model;

[0124] Based on the identified entity location information, local images are extracted from the second image to obtain a second local image sequence.
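The two-step extraction above (detect entity locations, then crop) can be sketched without committing to any particular detector. Here `detect` is a hypothetical stand-in for the pre-trained object detection model, assumed to return axis-aligned boxes `(x_min, y_min, x_max, y_max)` with exclusive maxima, and the image is a toy list of pixel rows.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max); max is exclusive
Image = List[List[int]]          # toy image: rows of pixel values


def extract_local_image_sequence(image: Image, detect: Callable[[Image], Sequence[Box]]) -> List[Image]:
    """Crop one local image per detected entity box, in detection order."""
    height = len(image)
    width = len(image[0]) if height else 0
    local_images: List[Image] = []
    for x_min, y_min, x_max, y_max in detect(image):
        # Clamp the box to the image and skip boxes that end up empty.
        x_min, y_min = max(0, x_min), max(0, y_min)
        x_max, y_max = min(width, x_max), min(height, y_max)
        if x_max <= x_min or y_max <= y_min:
            continue
        local_images.append([row[x_min:x_max] for row in image[y_min:y_max]])
    return local_images
```

The same function serves both images, since the first and second local image sequences are produced by the same procedure.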

[0125] The image content similarity analysis apparatus provided in this application determines multiple local regions to be analyzed from a first image to obtain a first local image sequence; determines multiple local regions to be analyzed from a second image to obtain a second local image sequence; performs similarity analysis on the first local image sequence and the second local image sequence to obtain a similarity result; and calculates the content similarity between the first image and the second image based on the similarity result. This solution can apply the similarity algorithm to small regions even when the image background is complex, avoiding the problem that image similarity algorithms are not robust to large images with complex backgrounds and multi-target scenes, thus improving the accuracy of image content similarity analysis results.

[0126] This application also provides a terminal on which the client described in the above embodiments is installed. As shown in Figure 3, the terminal may include a radio frequency (RF) circuit 601, a memory 602 including one or more computer-readable storage media, an input unit 603, a display unit 604, a sensor 605, an audio circuit 606, a wireless fidelity (WiFi) module 607, a processor 608 including one or more processing cores, a power supply 609, and other components. Those skilled in the art will understand that the terminal structure shown in Figure 3 does not constitute a limitation on the terminal, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Specifically:

[0127] RF circuit 601 can be used for receiving and transmitting signals during information transmission or calls. Specifically, it receives downlink information from the base station and hands it over to one or more processors 608 for processing; additionally, it transmits uplink data to the base station. Typically, RF circuit 601 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a low-noise amplifier (LNA), a duplexer, etc. Furthermore, RF circuit 601 can also communicate wirelessly with networks and other devices. The wireless communication can use any communication standard or protocol, including but not limited to GSM, GPRS, CDMA, WCDMA, LTE, email, and SMS.

[0128] The memory 602 can be used to store software programs and modules. The processor 608 executes various functional applications and data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required for at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the terminal (such as audio data or a phone book). In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 608 and the input unit 603 with access to the memory 602.

[0129] The input unit 603 can be used to receive input digital or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, in one embodiment, the input unit 603 may include a touch-sensitive surface and other input devices. The touch-sensitive surface, also known as a touch display or touchpad, can collect user touch operations on or near it (e.g., operations performed by the user on or near the touch-sensitive surface with a finger, stylus, or any other suitable object or accessory) and drive the corresponding connected devices according to a preset program. Optionally, the touch-sensitive surface may include a touch detection device and a touch controller. The touch detection device detects the position of the user's touch and the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 608, and can receive and execute commands sent by the processor 608. Furthermore, the touch-sensitive surface can be implemented with various technologies, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch-sensitive surface, the input unit 603 may also include other input devices. Specifically, the other input devices may include, but are not limited to, one or more of the following: a physical keyboard, function keys (such as volume control buttons and power buttons), a trackball, a mouse, and a joystick.

[0130] The display unit 604 can be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of the terminal. These graphical user interfaces can be composed of graphics, text, icons, video, and any combination thereof. The display unit 604 may include a display panel, optionally configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or a similar form. Furthermore, a touch-sensitive surface may cover the display panel. When the touch-sensitive surface detects a touch operation on or near it, it transmits the information to the processor 608 to determine the type of touch event, and the processor 608 then provides corresponding visual output on the display panel according to the type of touch event. Although in Figure 3 the touch-sensitive surface and the display panel are shown as two separate components implementing the input and output functions, in some embodiments the touch-sensitive surface and the display panel can be integrated to implement both input and output functions.

[0131] The terminal may also include at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel according to the ambient light level, and the proximity sensor can turn off the display panel and/or the backlight when the terminal is moved to the ear. As one type of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in various directions (generally along three axes) and, when stationary, the magnitude and direction of gravity; it can be used for applications that recognize the phone's posture (such as landscape/portrait switching, related games, and magnetometer posture calibration) and for vibration recognition functions (such as a pedometer or tap detection). Other sensors that the terminal may be equipped with, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described in detail here.

[0132] Audio circuitry 606, a speaker, and a microphone provide an audio interface between the user and the terminal. Audio circuitry 606 converts received audio data into electrical signals, transmits them to the speaker, and the speaker converts them into sound signals for output. Conversely, the microphone converts collected sound signals into electrical signals, which are then received by audio circuitry 606, converted back into audio data, and processed by processor 608. The processed data is then transmitted via RF circuitry 601 to, for example, another terminal, or output to memory 602 for further processing. Audio circuitry 606 may also include an earphone jack to facilitate communication between a peripheral headset and the terminal.

[0133] WiFi is a short-range wireless transmission technology. Through the WiFi module 607, the terminal can help users send and receive emails, browse web pages, and access streaming media, providing users with wireless broadband internet access. Although Figure 3 shows the WiFi module 607, it is understood that the module is not a necessary component of the terminal and can be omitted as needed without changing the essence of the invention.

[0134] The processor 608 is the control center of the terminal and connects the various parts of the terminal through various interfaces and lines. By running or executing the software programs and/or modules stored in the memory 602 and calling the data stored in the memory 602, it performs the various functions of the terminal and processes data, thereby monitoring the terminal as a whole. Optionally, the processor 608 may include one or more processing cores; preferably, the processor 608 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 608.

[0135] The terminal also includes a power supply 609 (such as a battery) that powers the various components. Preferably, the power supply can be logically connected to the processor 608 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power supply 609 may also include any other components, such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

[0136] Although not shown, the terminal may also include a camera, Bluetooth module, etc., which will not be described in detail here. Specifically, in this embodiment, the processor 608 in the terminal loads the executable files corresponding to the processes of one or more applications into the memory 602 according to the following instructions, and the processor 608 runs the applications stored in the memory 602 to realize various functions:

[0137] Multiple local regions to be analyzed are identified from the first image to obtain a first local image sequence; multiple local regions to be analyzed are identified from the second image to obtain a second local image sequence; similarity analysis is performed on the first local image sequence and the second local image sequence to obtain similarity results; based on the similarity results, the content similarity between the first image and the second image is calculated.

[0138] In the embodiment of this application, when performing image content similarity analysis, the similarity algorithm can be applied to a small area when the image background is complex. This avoids the problem that the image similarity algorithm is not robust to large images with complex backgrounds and multi-target scenes, and improves the accuracy of image content similarity analysis results.

[0139] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.

[0140] Therefore, embodiments of this application provide a storage medium storing multiple instructions that can be loaded by a processor to execute steps in any of the image content similarity analysis methods provided in embodiments of this application. For example, the instructions can execute the following steps:

[0141] Multiple local regions to be analyzed are identified from the first image to obtain a first local image sequence; multiple local regions to be analyzed are identified from the second image to obtain a second local image sequence; similarity analysis is performed on the first local image sequence and the second local image sequence to obtain similarity results; based on the similarity results, the content similarity between the first image and the second image is calculated.

[0142] For details on the implementation of each of the above operations, please refer to the previous examples, which will not be repeated here.

[0143] The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

[0144] Since the instructions stored in the storage medium can execute the steps of any of the image content similarity analysis methods provided in the embodiments of this application, the beneficial effects that any of the image content similarity analysis methods provided in the embodiments of this application can achieve can be realized. For details, please refer to the previous embodiments, which will not be repeated here.

[0145] The image content similarity analysis method, apparatus, and storage medium provided in the embodiments of this application have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. An image content similarity analysis method, characterized by comprising: identifying multiple local regions to be analyzed from a first image to obtain a first local image sequence; identifying multiple local regions to be analyzed from a second image to obtain a second local image sequence; determining a first number of local images in the first local image sequence and a second number of local images in the second local image sequence; calculating the similarity between each local image in the first local image sequence and each local image in the second local image sequence; determining the smaller of the first number and the second number as a target number; in order of similarity from high to low, using a greedy algorithm to sequentially select a first local image from the first local image sequence and a second local image from the second local image sequence to form local image pairs until the number of selected local image pairs reaches the target number, wherein each selection takes, from the currently unselected local images, the pair of local images with the highest similarity value, and determining the similarity values corresponding to the selected local image pairs as the similarity result; and calculating the content similarity between the first image and the second image based on the similarity result.

2. The method of claim 1, wherein calculating the content similarity between the first image and the second image based on the similarity result comprises: calculating the mean of the similarity values corresponding to the selected local image pairs and using it as the content similarity between the first image and the second image.

3. The method of claim 2, wherein calculating the mean of the similarity values as the content similarity between the first image and the second image comprises: normalizing each of the similarity values to a preset numerical range to obtain multiple normalized similarity values; and using the mean of the normalized similarity values as the content similarity between the first image and the second image.

4. The method of claim 1, wherein the pre-trained object detection model comprises an object localization model, and determining multiple local regions to be analyzed from the first image to obtain the first local image sequence comprises: identifying entity location information from the first image based on the object localization model; and extracting local images from the first image based on the identified entity location information to obtain the first local image sequence.

5. The method of claim 4, wherein the object detection model comprises an object classification model, and extracting local images from the first image based on the identified entity location information to obtain the first local image sequence comprises: classifying the entity location information according to the object classification model to obtain entity location information with category labels; and extracting local images from the first image based on the entity location information with category labels to obtain the first local image sequence.

6. The image content similarity analysis method according to claim 1, characterized in that determining multiple local regions to be analyzed from the second image to obtain the second local image sequence comprises: identifying entity location information from the second image based on the pre-trained object detection model; and extracting local images from the second image based on the identified entity location information to obtain the second local image sequence.

7. An image content similarity analysis device, characterized by comprising: a first determining unit, used to determine multiple local regions to be analyzed from a first image to obtain a first local image sequence; a second determining unit, used to determine multiple local regions to be analyzed from a second image to obtain a second local image sequence; an analysis unit, used to determine a first number of local images in the first local image sequence and a second number of local images in the second local image sequence, calculate the similarity between each local image in the first local image sequence and each local image in the second local image sequence, determine the smaller of the first number and the second number as a target number, use a greedy algorithm to sequentially select, in order of similarity from high to low, a first local image from the first local image sequence and a second local image from the second local image sequence to form local image pairs until the number of selected local image pairs reaches the target number, wherein each selection takes, from the currently unselected local images, the pair of local images with the highest similarity value, and determine the similarity values corresponding to the selected local image pairs as the similarity result; and a processing unit, used to calculate the content similarity between the first image and the second image based on the similarity result.

8. The image content similarity analysis device according to claim 7, characterized in that the first determining unit is used to identify entity location information from the first image based on a pre-trained object detection model and to extract local images from the first image based on the identified entity location information to obtain the first local image sequence; and the second determining unit is used to identify entity location information from the second image based on the pre-trained object detection model and to extract local images from the second image based on the identified entity location information to obtain the second local image sequence.

9. A storage medium, characterized in that, The storage medium stores multiple instructions, which are adapted for loading by a processor to execute the steps in the image content similarity analysis method according to any one of claims 1 to 6.