A picture similarity recognition method and device

By performing online augmentation and feature encoding calculations on tobacco display scene images, the problem of low recognition accuracy of images from different angles was solved, achieving efficient and accurate similarity recognition of tobacco display scene images.

CN115861657B · Active · Publication Date: 2026-06-26 · SHENZHEN AIMALL TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN AIMALL TECHNOLOGY CO LTD
Filing Date
2022-12-06
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies have low accuracy in identifying images taken from different angles within the same tobacco display scene, resulting in inaccurate identification of image similarity for tobacco display scenes.

Method used

By acquiring a set of tobacco display images and a target recognition model for tobacco display scenes, online augmentation processing is performed on the target images and reference images to generate target augmented image sets and reference augmented image sets. Feature codes are extracted using a feature extraction model, and cosine similarity is calculated to determine image similarity.

Benefits of technology

It improves the efficiency and accuracy of identifying similarity between images of tobacco display scenes, and can accurately determine whether images are from the same tobacco display scene, especially when there are significant differences in shooting angle or cropping.



Abstract

The present application relates to the technical field of picture similarity recognition, and particularly relates to a picture similarity recognition method, which comprises the following steps: obtaining a tobacco display picture set of a tobacco display scene and a target recognition model, wherein the tobacco display picture set comprises a target picture and a reference picture; performing online augmentation processing on the target picture and the reference picture to obtain a target augmented picture set of the target picture and a reference augmented picture set of the reference picture; and using the target recognition model to recognize the target augmented picture set and the reference augmented picture set to obtain a target similarity of the target picture and the reference picture. The method can quickly, efficiently and accurately recognize the tobacco display scene picture, accurately determine whether any two tobacco display scene pictures are pictures of the same tobacco display scene, and improve the recognition efficiency and accuracy of the tobacco display scene picture similarity.

Description

Technical Field

[0001] This invention relates to the field of image similarity recognition technology, and in particular to a method and apparatus for image similarity recognition.

Background Technology

[0002] To centralize the management and evaluation of tobacco retail outlets and the tobacco sold there, tobacco companies require outlet staff to photograph their own tobacco displays and upload them to the company's management system. However, some retail staff secretly photograph other outlets' tobacco displays to pass them off as their own.

[0003] To distinguish the tobacco display scene images of each tobacco retail store, existing solutions use basic object detection techniques to identify the similarity of images uploaded by stores and images downloaded from the internet. However, when images of the same tobacco display scene are taken from different angles, the accuracy of current solutions in identifying the similarity of tobacco display scene images is low.

Summary of the Invention

[0004] This application provides a method and apparatus for identifying image similarity, which solve the technical problem in the prior art of low accuracy in identifying the similarity of tobacco display scene images taken from different angles in the same tobacco display scene. They achieve fast, efficient, and accurate identification of tobacco display scene images, accurately determine whether any two tobacco display scene images show the same tobacco display scene, and improve the efficiency and accuracy of identifying the similarity of tobacco display scene images.

[0005] In a first aspect, embodiments of the present invention provide a method for recognizing image similarity, comprising:

[0006] Acquire a set of tobacco display images and a target recognition model for a tobacco display scene, wherein the tobacco display image set includes target images and reference images;

[0007] Both the target image and the reference image are subjected to online augmentation processing to obtain a target augmented image set for the target image and a reference augmented image set for the reference image;

[0008] The target recognition model is used to identify the target augmented image set and the reference augmented image set to obtain the target similarity between the target image and the reference image.

[0009] Preferably, the step of using the target recognition model to identify the target augmented image set and the reference augmented image set to obtain the similarity between the target image and the reference image includes:

[0010] The target feature code of each target augmented image in the target augmented image set and the reference feature code of each reference augmented image in the reference augmented image set are extracted using the feature extraction model of the target recognition model;

[0011] The target similarity between the target image and the reference image is obtained based on the target feature encoding of each target augmented image and the reference feature encoding of each reference augmented image.

[0012] Preferably, obtaining the target similarity between the target image and the reference image based on the target feature encoding of each target augmented image and the reference feature encoding of each reference augmented image includes:

[0013] Based on the target feature code of a target augmented image and the reference feature code of a reference augmented image, a cosine similarity between the target image and the reference image is obtained. After performing the above operation on the target feature code of each target augmented image and the reference feature code of each reference augmented image, multiple cosine similarities between the target image and the reference image are obtained.

[0014] If one of the cosine similarities is not less than the similarity threshold, then the cosine similarity is determined as the target similarity, and information is output that the target image and the reference image are images of the same tobacco display scene.

[0015] Preferably, after obtaining multiple cosine similarities between the target image and the reference image, the method further includes:

[0016] If each of the multiple cosine similarities is less than the similarity threshold, then the target similarity is determined based on the multiple cosine similarities, and information about the target image and the reference image being images of different tobacco display scenarios is output.
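
As a non-limiting illustration, the decision rule of paragraphs [0013] to [0016] can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `decide_same_scene` and the default threshold of 0.8 are hypothetical, and reporting the largest cosine similarity as the target similarity is an assumption, since the text states only that the target similarity is determined based on the multiple cosine similarities.

```python
from typing import List, Tuple


def decide_same_scene(similarities: List[float], threshold: float = 0.8) -> Tuple[float, bool]:
    """Return (target_similarity, same_scene) from the pairwise cosine similarities.

    The default threshold is a hypothetical value, not taken from the source.
    """
    if not similarities:
        raise ValueError("at least one cosine similarity is required")
    # Assumption: the largest cosine similarity is reported as the target similarity.
    best = max(similarities)
    # Same scene if at least one similarity is not less than the threshold.
    return best, best >= threshold
```

With this rule, a single augmented pair that matches well is enough to report the two images as the same tobacco display scene.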

[0017] Preferably, the target image is subjected to online augmentation processing to obtain a target augmented image set, including:

[0018] The target image is subjected to online augmentation processing to obtain the target augmented image set. The online augmentation processing includes rotation processing, distortion processing, magnification processing, reduction processing, cropping processing, occlusion processing, watermark addition processing, color adjustment processing, and combination processing. The target augmented image set includes the rotated target image, the distorted target image, the magnified target image, the reduced target image, the cropped target image, the occluded target image, the target image with added watermark, the target image with adjusted color, and the combined target image.

[0019] Preferably, the acquisition of the target recognition model includes:

[0020] The initial recognition model is trained using a set of training images of tobacco displays in the tobacco display scene until the trained recognition model meets the training constraints. The recognition model that meets the training constraints is then determined as the target recognition model.

[0021] Preferably, the process of training the initial recognition model using the tobacco display training image set of the tobacco display scene further includes:

[0022] The tobacco display training image set is augmented offline using a 3D simulation method to obtain a tobacco display simulation image set.

[0023] The initial recognition model is trained using the set of simulated tobacco display images.

[0024] Based on the same inventive concept, in a second aspect, the present invention also provides an image similarity recognition device, comprising:

[0025] The acquisition module is used to acquire a set of tobacco display images and a target recognition model for a tobacco display scene, wherein the tobacco display image set includes target images and reference images;

[0026] An augmentation module is used to perform online augmentation processing on both the target image and the reference image to obtain a target augmented image set for the target image and a reference augmented image set for the reference image.

[0027] The recognition module is used to recognize the target augmented image set and the reference augmented image set using the target recognition model, and to obtain the target similarity between the target image and the reference image.

[0028] Based on the same inventive concept, in a third aspect, the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of a method for recognizing image similarity.

[0029] Based on the same inventive concept, in a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of a method for recognizing image similarity.

[0030] One or more technical solutions in the embodiments of the present invention have at least the following technical effects or advantages:

[0031] In this embodiment of the invention, a set of tobacco display images and a target recognition model are first acquired, wherein the tobacco display image set includes target images and reference images. Then, online augmentation processing is performed on both the target images and the reference images to obtain a target augmented image set and a reference augmented image set. Here, by performing online augmentation processing on the target images and reference images, diverse target images are obtained from the target images, and diverse reference images are obtained from the reference images, resulting in target augmented image sets and reference augmented image sets. This provides a solid foundation for the subsequent accurate identification of target images and reference images, improves the recognition efficiency of the target recognition model, and enhances the accuracy of image similarity recognition.

[0032] After obtaining the target augmented image set and the reference augmented image set, a target recognition model is used to identify the target and reference augmented image sets, obtaining the target similarity between the target and reference images. Here, a feature extraction model is used to extract the target feature code for each target augmented image in the target augmented image set and the reference feature code for each reference augmented image in the reference augmented image set. Then, through multiple target feature codes and reference feature codes, multiple cosine similarities are obtained, thereby accurately determining the target similarity between the target and reference images. This accurately determines whether the target and reference images are from the same tobacco display scene, improving the recognition efficiency and accuracy of tobacco display scene image similarity. It also effectively supports matching when two images differ significantly due to factors such as shooting angle or cropping.

Description of the Drawings

[0033] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:

[0034] Figure 1 shows a flowchart of the steps of the image similarity recognition method in an embodiment of the present invention;

[0035] Figure 2 shows a schematic diagram of a spatial model in an embodiment of the present invention;

[0036] Figure 3 shows a schematic diagram of a rotated spatial model in an embodiment of the present invention;

[0037] Figure 4 shows a schematic diagram of rotating and distorting a target image in an embodiment of the present invention;

[0038] Figure 5 shows a schematic diagram of center magnification processing of a target image in an embodiment of the present invention;

[0039] Figure 6 shows a schematic diagram of the process of extracting the target feature code of each target augmented image in an embodiment of the present invention;

[0040] Figure 7a shows a schematic diagram of a standing tobacco display in the tobacco display scene of store M in an embodiment of the present invention;

[0041] Figure 7b shows a schematic diagram of a standing tobacco display in the tobacco display scene of store N in an embodiment of the present invention;

[0042] Figure 8 shows the ROC curves obtained by comparing the image similarity recognition method with the LPIPS method and the DHASH method in an embodiment of the present invention;

[0043] Figure 9 shows a schematic diagram of the image similarity recognition device in an embodiment of the present invention.

Detailed Implementation

[0044] Exemplary embodiments of the present disclosure will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[0045] Example 1

[0046] The first embodiment of the present invention provides a method for recognizing image similarity. As shown in Figure 1, the method includes:

[0047] S101, Obtain a tobacco display image set and a target recognition model for the tobacco display scene, wherein the tobacco display image set includes target images and reference images;

[0048] S102, perform online augmentation processing on both the target image and the reference image to obtain the target augmented image set for the target image and the reference augmented image set for the reference image;

[0049] S103, use the target recognition model to identify the target augmented image set and the reference augmented image set, and obtain the target similarity between the target image and the reference image.
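
As a non-limiting illustration, steps S102 and S103 can be sketched as a small pipeline. This is a sketch under stated assumptions, not the implementation of the embodiment: the function `recognize_similarity` and its parameters `augment`, `encode`, and `similarity` are hypothetical names, and the augmentation, feature extraction, and similarity measure are passed in as callables so that any concrete choice can be plugged in.

```python
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")
Code = Sequence[float]


def recognize_similarity(
    target: Image,
    reference: Image,
    augment: Callable[[Image], List[Image]],
    encode: Callable[[Image], Code],
    similarity: Callable[[Code, Code], float],
) -> List[float]:
    """Augment both images, encode every variant, and compare all pairs."""
    # S102: build the target and reference augmented image sets, then encode each image.
    target_codes = [encode(img) for img in augment(target)]
    reference_codes = [encode(img) for img in augment(reference)]
    # S103: one similarity value per (target variant, reference variant) pair.
    return [similarity(t, r) for t in target_codes for r in reference_codes]
```

The returned list contains one value for every pairing of a target augmented image with a reference augmented image, so two sets of 30 images each would yield 900 values.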

[0050] It should be noted that the tobacco display scenario refers to the scene where tobacco is displayed in a counter or cabinet in a tobacco sales store, such as a scene where a tobacco sales store has a counter or cabinet for displaying cigarettes.

[0051] In this embodiment, a set of tobacco display images and a target recognition model are first acquired for the tobacco display scene. The tobacco display image set includes target images and reference images. Then, online augmentation processing is performed on both the target images and reference images to obtain a target augmented image set and a reference augmented image set. Here, by performing online augmentation processing on the target images and reference images, diverse target images are obtained from the target images, and diverse reference images are obtained from the reference images, resulting in target augmented image sets and reference augmented image sets. This provides a solid foundation for the subsequent accurate identification of target images and reference images, improves the recognition efficiency of the target recognition model, and enhances the accuracy of image similarity recognition.

[0052] After obtaining the target augmented image set and the reference augmented image set, a target recognition model is used to identify the target and reference images, obtaining their target similarity. Here, the target recognition model obtains multiple cosine similarities between the target and reference images, thus accurately determining their target similarity. This precisely identifies whether the target and reference images represent the same tobacco display scene, improving the efficiency and accuracy of similarity recognition for tobacco display scenes. It also effectively supports matching when two images differ significantly due to factors such as shooting angle or cropping.

[0053] The specific implementation steps of the image similarity recognition method provided in this embodiment are described in detail below with reference to Figure 1:

[0054] First, step S101 is executed to obtain a set of tobacco display images and a target recognition model for the tobacco display scene. The tobacco display image set includes target images and reference images.

[0055] Specifically, the process begins with obtaining a set of tobacco display images and a target recognition model for the tobacco display scene. Each tobacco display image in the set is unlabeled and typically taken using a camera, video camera, or drone. The set includes target images and reference images. Each image is selected as the target image, and the remaining images are used as reference images. Subsequent descriptions of the target and reference images will use a specific tobacco display image from the set as the target image and another image from the set as a reference image.

[0056] The target recognition model is designed to identify the similarity between the tobacco display images in a set of tobacco display images. The model is obtained by training the initial recognition model using a set of training images of tobacco display scenes. Each training image in the set carries a manually labeled tag; images from the same tobacco retailer share the same tag. For example, suppose the training image set includes images 1, 2, and 3. Since images 1 and 2 originate from the same tobacco retailer (store A), both images are labeled A. Since image 3 originates from a different tobacco retailer (store B), it is labeled B.

[0057] In training the initial recognition model using a tobacco display training image set, the image set is first offline augmented using 3D simulation to obtain a simulated tobacco display image set. Then, each simulated tobacco display image is augmented online to obtain an augmented training image set for each simulated tobacco display image. In this augmented training image set, each simulated tobacco display image shares the same label, and each training image in an augmented training image set also shares the same label. This ensures diversity among training images with the same label, compensating for missing image data and improving the performance of the recognition model during the training phase. Finally, each augmented training image set is input into the initial recognition model. The model is controlled to determine the similarity between each training image in each augmented training image set, aiming to maximize the cosine similarity between training images with the same label and minimize the cosine similarity between training images with different labels.
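
As a non-limiting illustration, the training goal described above (high cosine similarity for images with the same label, low cosine similarity for images with different labels) can be checked on a batch of feature codes as follows. This is a monitoring sketch only, not the training procedure itself; the helper names `cosine` and `label_separation` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_separation(codes: List[Sequence[float]], labels: List[str]) -> Tuple[float, float]:
    """Return (mean same-label similarity, mean different-label similarity)."""
    same, different = [], []
    for i, j in combinations(range(len(codes)), 2):
        bucket = same if labels[i] == labels[j] else different
        bucket.append(cosine(codes[i], codes[j]))

    def mean(values: List[float]) -> float:
        return sum(values) / len(values) if values else float("nan")

    return mean(same), mean(different)
```

A well-trained model would be expected to drive the first value up and the second value down as training proceeds.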

[0058] The process of obtaining a simulated tobacco display image set by offline augmentation of a tobacco display training image set using the 3D simulation method is as follows. First, obtain M cabinet images, including the front, left, right, top, and bottom views of a counter cabinet, and the front, left, right, top, and bottom views of a standing cabinet. Each cabinet image shows a cabinet displaying tobacco and contains the texture of the tobacco display, and M ≥ 2. Next, each tobacco display training image is input into the background plane of the spatial model, where it serves as the background image of the 3D scene. Then, N cabinet images are randomly selected from the M cabinet images and placed in N adjustable cabinet-shaped cubes on the background plane, so that they appear as cabinet images on the background image. Here M ≥ N, and the N cabinet images correspond one-to-one with the N adjustable cabinet-shaped cubes: one cabinet image is placed in each adjustable cabinet-shaped cube, that is, each cube is filled with the tobacco display texture of one cabinet image. Finally, a set of simulated tobacco display images is generated by adjusting the model parameters of the spatial model. Model parameters include, but are not limited to, the number, shape, rotation angle, and display position of the adjustable cabinet-shaped cubes, the cabinet images randomly assigned to the adjustable cabinet-shaped cubes, the shape and rotation angle of the background plane, the rotation angle of the spatial model, and additional background plane textures. For example, as shown in Figure 2, the spatial model includes a background plane 201 and multiple adjustable cabinet-shaped cubes 202 set on the background plane 201. The background plane 201 is used to display tobacco display training images, and the adjustable cabinet-shaped cubes 202 are used to display cabinet images containing tobacco display textures placed on the tobacco display training images.
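
As a non-limiting illustration, the random selection of model parameters for one simulated image can be sketched as follows. This sketch only samples parameters and does not render a 3D scene; the names `CubeParams` and `sample_scene`, the angle ranges, and the normalized positions are all assumptions introduced for illustration.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CubeParams:
    cabinet_image: str              # cabinet image whose texture fills this cube
    rotation_deg: float             # rotation angle of the cube (assumed range)
    position: Tuple[float, float]   # placement on the background plane, normalized to [0, 1]


def sample_scene(cabinet_images: List[str], max_cubes: int, rng: random.Random) -> Dict:
    """Sample parameters for one simulated tobacco display image."""
    if len(cabinet_images) < 2:
        raise ValueError("the source requires M >= 2 cabinet images")
    # N cubes with N <= M, one distinct cabinet image per cube.
    n = rng.randint(1, min(max_cubes, len(cabinet_images)))
    chosen = rng.sample(cabinet_images, n)
    cubes = [
        CubeParams(img, rng.uniform(-30.0, 30.0), (rng.random(), rng.random()))
        for img in chosen
    ]
    return {"model_rotation_deg": rng.uniform(-45.0, 45.0), "cubes": cubes}
```

Calling `sample_scene` repeatedly with the same background image gives a different parameter set each time, which corresponds to generating many simulated images from one training image.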

[0059] Taking a tobacco display training image as an example, the image is input into the background plane of the spatial model. Then, N cabinet images are randomly selected from M cabinet images, and these N images are used to fill N adjustable cabinet-shaped cubes on the background plane. This allows the 3D scene background image to display the tobacco display training image, while the adjustable cabinet-shaped cubes display the tobacco display texture of the cabinet images. Next, different tobacco display simulation images are created by randomly selecting different textures for the background plane, by adjusting model parameters such as the number, shape, rotation angle, and display position of the adjustable cabinet-shaped cubes, the cabinet images randomly assigned to the cubes, and the shape and rotation angle of the background plane, and by randomly rotating the spatial model.

[0060] In the left image of Figure 3, a tobacco display training image is input into the background plane 201 of the spatial model. Two adjustable cabinet-shaped cubes 202 are placed on the background plane 201 and can be randomly filled with two cabinet images. By adjusting the model parameters of the spatial model, here the rotation angle of the spatial model, a simulated tobacco display image is generated, as shown in the right image of Figure 3.

[0061] During the initial recognition model training process, offline augmentation processing is used to generate a large number of simulated tobacco display images based on the tobacco display training image set. This addresses the problem of insufficient training data, allows for better control of the augmentation effect, and improves model performance.

[0062] During training of the initial recognition model, the recognition model is designated as the target recognition model once it meets the training constraints. These constraints can be that the current value of the recognition model's ArcFace loss function is not greater than a loss function threshold, that the number of training iterations reaches an iteration threshold, or other constraints. The loss function threshold and iteration threshold can be set according to actual needs. For example, the current recognition model is designated as the target recognition model when the current loss function value obtained during training is not greater than the loss function threshold.
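
As a non-limiting illustration, the stopping rule described above can be sketched as follows. This is a minimal sketch: `train_until_constraints` and `train_step` are hypothetical names, `train_step` is assumed to run one training iteration and return the current loss value, and "meets the iteration threshold" is interpreted as the iteration count reaching that threshold.

```python
from typing import Callable


def train_until_constraints(
    train_step: Callable[[], float],
    loss_threshold: float,
    iteration_threshold: int,
) -> int:
    """Run training steps until a training constraint is met; return the iterations used."""
    iteration = 0
    while True:
        loss = train_step()  # one training iteration, returns the current loss value
        iteration += 1
        # Stop when the loss is not greater than its threshold,
        # or when the iteration count reaches the iteration threshold.
        if loss <= loss_threshold or iteration >= iteration_threshold:
            return iteration
```

The model state after the final iteration would then be taken as the target recognition model.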

[0063] Next, step S102 is executed, and online augmentation processing is performed on both the target image and the reference image to obtain the target augmented image set of the target image and the reference augmented image set of the reference image.

[0064] Specifically, taking the target image as an example, online augmentation processing is performed on the target image to obtain a target augmented image set. This online augmentation processing includes rotation, distortion, enlargement, reduction, cropping, occlusion, watermark addition, color adjustment, and combination. The target augmented image set includes the rotated target image, the distorted target image, the enlarged target image, the reduced target image, the cropped target image, the occluded target image, the target image with added watermark, the target image with color adjustment, and the combined target image. Different processing methods can be set according to actual needs. Similarly, the process of online augmentation processing on the reference image is the same as that on the target image, and the process of online augmentation processing on the reference image will not be described further.

[0065] The target image is rotated to obtain a rotated target image. The target image is distorted to obtain a distorted target image. The target image is enlarged to obtain an enlarged target image. The target image is reduced to obtain a reduced target image. And so on.
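
As a non-limiting illustration, several of the online augmentation operations can be sketched on a small grayscale image. This is a sketch under stated assumptions: images are represented as lists of rows of 0 to 255 values, the function names are hypothetical, the parameters (90 degree rotation, factor 2, top-left quarter, brightness factor 1.2) are arbitrary choices, and distortion, watermark addition, and combination processing are omitted here.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale image as rows of 0..255 values (assumed representation)


def rotate90(img: Image) -> Image:
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def enlarge2x(img: Image) -> Image:
    """Nearest-neighbour enlargement by a factor of 2."""
    return [[v for v in row for _ in range(2)] for row in img for _ in range(2)]


def reduce2x(img: Image) -> Image:
    """Reduction that keeps every second row and column."""
    return [row[::2] for row in img[::2]]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut out a rectangular region."""
    return [row[left:left + width] for row in img[top:top + height]]


def occlude(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return a copy with a rectangular region set to black."""
    out = [row[:] for row in img]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = 0
    return out


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale pixel values and clamp them to 0..255."""
    return [[max(0, min(255, round(v * factor))) for v in row] for row in img]


def augment_online(img: Image) -> Dict[str, Image]:
    """Build a small augmented image set from a single image."""
    h, w = len(img), len(img[0])
    half_h, half_w = max(1, h // 2), max(1, w // 2)
    return {
        "rotated": rotate90(img),
        "enlarged": enlarge2x(img),
        "reduced": reduce2x(img),
        "cropped": crop(img, 0, 0, half_h, half_w),
        "occluded": occlude(img, 0, 0, half_h, half_w),
        "color_adjusted": adjust_brightness(img, 1.2),
    }
```

Each function returns a new image and leaves its input unchanged, so one source image can be reused to produce every member of the augmented set.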

[0066] It should be noted that combination processing is a process that combines at least two of the following: rotation, distortion, enlargement, reduction, cropping, occlusion, watermark addition, color adjustment, and other processing. For example, suppose the combination processing involves rotation and distortion. Combining a target image means first rotating the target image to obtain the rotated target image, then distorting the rotated target image to obtain the combined target image. Alternatively, it means simultaneously rotating and distorting the target image to obtain the combined target image. Or, it means first distorting the target image to obtain the distorted target image, then rotating the distorted target image to obtain the combined target image. Therefore, the combination processing method is set according to the actual needs.
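
As a non-limiting illustration, sequential combination processing can be sketched as function composition. This is a sketch under stated assumptions: `combine`, `rotate90`, and `shift_rows` are hypothetical names, the 90 degree rotation stands in for rotation processing, and the cyclic row shift is a toy stand-in for distortion processing, not the distortion used in the embodiment. Simultaneous rotation and distortion is not covered by this sketch.

```python
from functools import reduce
from typing import Callable, List

Image = List[List[int]]
Transform = Callable[[Image], Image]


def combine(*steps: Transform) -> Transform:
    """Return a transform that applies the given steps from left to right."""
    def combined(img: Image) -> Image:
        return reduce(lambda acc, step: step(acc), steps, img)
    return combined


def rotate90(img: Image) -> Image:
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def shift_rows(img: Image) -> Image:
    """Toy distortion (an assumption): cyclically shift row i to the right by i pixels."""
    out = []
    for i, row in enumerate(img):
        s = i % len(row)
        out.append(row[-s:] + row[:-s] if s else row[:])
    return out


rotate_then_distort = combine(rotate90, shift_rows)
distort_then_rotate = combine(shift_rows, rotate90)
```

The two orders generally produce different combined images, which is why the order is treated as a setting to be chosen according to actual needs.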

[0067] This embodiment provides a specific combination processing method, namely rotation-and-distortion processing. As shown in Figure 4, this combination processing is applied to the target image to obtain the combined target image. In Figure 4, the black-and-white grid on the left is the target image, and the image on the right is the combined target image, in which the center of the image is rotated and distorted. The rotation and distortion processing function is as follows:

[0068]

[0069]

[0070] where $(x_{c1}, y_{c1})$ is the center point of the target image, $(x_{s1}, y_{s1})$ represents the coordinates of any point in the target image relative to the center point, $\theta$ is the predefined rotation angle, $r_1$ is the rotation radius, and $(x_1, y_1)$ represents the coordinates in the combined target image.

[0071] This embodiment also provides a specific method for center magnification processing. As shown in Figure 5, the target image is magnified at the center to obtain the magnified target image. In Figure 5, the black-and-white grid on the left is the target image, and the image on the right is the target image after center magnification, in which the center of the image is magnified within a circular area. The center magnification processing function is as follows:

[0072] $x_2 = (x_{s2} - x_{c2}) \cdot k + x_{c2}$

[0073] $y_2 = (y_{s2} - y_{c2}) \cdot k + y_{c2}$

[0074] where $(x_{c2}, y_{c2})$ represents the center point of the target image, $(x_{s2}, y_{s2})$ represents the source coordinate point, $(x_2, y_2)$ represents the coordinate point in the magnified target image, $r_2$ represents the radius of the magnified area, and $C_0$ is a constant with a value range of (0, 1) used to determine the degree of magnification.
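
As a non-limiting illustration, the coordinate mapping of paragraphs [0072] and [0073] can be written as a small function. This is a sketch under stated assumptions: the name `magnify_point` is hypothetical, and the factor k is passed in directly because this text does not state how k is obtained from the radius of the magnified area and the magnification constant.

```python
from typing import Tuple


def magnify_point(xs: float, ys: float, xc: float, yc: float, k: float) -> Tuple[float, float]:
    """Apply x2 = (xs - xc) * k + xc and y2 = (ys - yc) * k + yc.

    (xs, ys) is the source coordinate point and (xc, yc) is the center point.
    k is taken as given (assumption), since its definition is not stated here.
    """
    return (xs - xc) * k + xc, (ys - yc) * k + yc
```

For any value of k, the center point maps to itself, and every other point moves along the line joining it to the center.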

[0075] Therefore, online augmentation processing is the process of creating an image set from a single image. In practical applications, the target image undergoes online augmentation processing to obtain 30 target augmented images, thus creating a target augmented image set. Similarly, the reference image undergoes online augmentation processing to obtain multiple reference augmented images, thus creating a reference augmented image set.

[0076] In this embodiment, by performing online augmentation processing on the target image and the reference image, a variety of target images are obtained from the target image, and a variety of reference images are obtained from the reference image. This results in a target augmented image set and a reference augmented image set, providing a solid foundation for the subsequent accurate identification of the target image and the reference image, improving the recognition efficiency of the target recognition model, and increasing the recognition accuracy of image similarity.

[0077] Then, step S103 is executed, in which the target augmented image set and the reference augmented image set are identified using the target recognition model to obtain the target similarity between the target image and the reference image.

[0078] Specifically, the process of obtaining the similarity between the target image and the reference image involves first using the feature extraction model of the target recognition model to extract the target feature code for each target augmented image in the target augmented image set and the reference feature code for each reference augmented image in the reference augmented image set. Then, based on the target feature code of each target augmented image and the reference feature code of each reference augmented image, the target similarity between the target image and the reference image is obtained. The preferred feature extraction model is the lightweight network model MobileNetV2, but other models can be selected according to actual needs.

[0079] As shown in Figure 6, taking the target image as an example, online augmentation processing is performed on the target image to obtain a target augmented image set. This target augmented image set is then input into a feature extraction model (MobileNetV2). The feature extraction model extracts the target feature code for each target augmented image, i.e., the feature code of each target augmented image. Similarly, the process of the feature extraction model extracting the reference feature code for each reference augmented image in the reference augmented image set is the same as the process of the feature extraction model extracting the target feature code for each target augmented image in the target augmented image set.

[0080] After extracting the target feature code of each target augmented image in the target augmented image set and the reference feature code of each reference augmented image in the reference augmented image set, the target similarity between the target image and the reference image is obtained by using the cosine similarity algorithm.

[0081] In this embodiment, the preferred length of the feature code extracted by the feature extraction model is 256, which improves the computational speed and recognition efficiency of both the target recognition model and the feature extraction model. A shorter feature code shortens the processing time for cosine similarity matching, but also has lower expressive power. To select a suitable code length based on actual business needs, a set of 10,000 original images was constructed, and online augmentation was performed on these 10,000 original images to obtain 300,000 augmented images. Three feature extraction models were trained with feature lengths of 128, 256, and 512, respectively. The accuracy and speed of the three models were compared in the process of querying one original image in the database and generating 30 augmented images from it. The test results are shown in Table 1.

[0082] Table 1

Feature length | Precision (1:1) | Speed
128            | 94.8%           | 0.65 seconds
256            | 99.4%           | 0.86 seconds
512            | 99.5%           | 1.75 seconds

[0083] As can be seen from the test results in Table 1, a feature length of 256 achieves a good balance between accuracy and speed.
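The trade-off can be made concrete with a little arithmetic on the figures reported in Table 1. The following is a minimal sketch; the dictionary `results` and the helper `step` are names chosen here for illustration only.

```python
# Precision (%) and query time (seconds) per feature length, copied from Table 1.
results = {128: (94.8, 0.65), 256: (99.4, 0.86), 512: (99.5, 1.75)}


def step(shorter, longer):
    """Return (precision gain in percentage points, time ratio) when moving
    from the shorter feature length to the longer one."""
    (acc_a, time_a), (acc_b, time_b) = results[shorter], results[longer]
    return round(acc_b - acc_a, 1), round(time_b / time_a, 2)


for shorter, longer in [(128, 256), (256, 512)]:
    gain, ratio = step(shorter, longer)
    print(f"{shorter} -> {longer}: +{gain} points precision, {ratio}x time")
```

Going from 128 to 256 gains 4.6 percentage points of precision for about 1.32 times the query time, whereas going from 256 to 512 gains only 0.1 percentage points for about 2.03 times the query time.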

[0084] The specific process of obtaining the target similarity between the target image and the reference image using the cosine similarity algorithm is as follows: based on the target feature code of one target augmented image and the reference feature code of one reference augmented image, one cosine similarity between the target image and the reference image is obtained. Performing this operation on the target feature code of each target augmented image and the reference feature code of each reference augmented image yields multiple cosine similarities between the target image and the reference image.

[0085] For example, the target augmented image set includes target augmented images C1, C2, and C3, whose feature codes are c1, c2, and c3, respectively. The reference augmented image set includes reference augmented images D1, D2, and D3, whose feature codes are d1, d2, and d3, respectively. c1 and d1 are processed using the cosine similarity algorithm to obtain the cosine similarity c1d1, c1 and d2 are processed to obtain the cosine similarity c1d2, and so on. After performing this operation on the target feature code of each target augmented image and the reference feature code of each reference augmented image, nine cosine similarities are obtained, namely c1d1, c1d2, c1d3, c2d1, c2d2, c2d3, c3d1, c3d2, and c3d3.
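The pairwise computation in this example can be sketched as follows. This is an illustration only: the three-dimensional feature codes are made-up values (the embodiment prefers a length of 256), and the function names `cosine_similarity` and `pairwise_similarities` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature codes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairwise_similarities(target_codes, reference_codes):
    """Map (i, j) to the cosine similarity of target code c_i and reference code d_j."""
    return {
        (i, j): cosine_similarity(c, d)
        for i, c in enumerate(target_codes, start=1)
        for j, d in enumerate(reference_codes, start=1)
    }


# Toy codes standing in for c1..c3 and d1..d3 (assumed values, not real model output).
target_codes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
reference_codes = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 2.0, 0.0]]

similarities = pairwise_similarities(target_codes, reference_codes)
for (i, j), value in sorted(similarities.items()):
    print(f"c{i}d{j} = {value:.3f}")
```

With three target codes and three reference codes, the dictionary holds the nine values c1d1 through c3d3.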

[0086] After obtaining multiple cosine similarities, if any one of the cosine similarities is not less than the similarity threshold (i.e., there exists at least one pair of target feature codes and reference feature codes whose cosine similarity is not less than the similarity threshold), indicating that the target image and the reference image belong to the same tobacco display scene, then this cosine similarity is determined as the target similarity, and the information that the target image and the reference image belong to the same tobacco display scene is output. The similarity threshold is set according to actual needs, and is usually set to 0.75.

[0087] Furthermore, if n of the multiple cosine similarities (where n ≥ 2) are not less than the similarity threshold, then the average, the median, or another value calculated from these n cosine similarities is taken as the target similarity, and information indicating that the target image and the reference image are from the same tobacco display scene is output.

[0088] If each of the multiple cosine similarities is less than the similarity threshold, it indicates that the target image and the reference image are not from the same tobacco display scene. In this case, the target similarity is determined based on the mean or median of the multiple cosine similarities, and information indicating that the target image and the reference image are from different tobacco display scenes is output. The specific method for determining the target similarity using multiple cosine similarities is set according to actual needs.
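The three cases described above (exactly one similarity reaching the threshold, n ≥ 2 reaching it, and none reaching it) can be sketched as one decision function. This is a minimal sketch under assumptions: the function name `decide` and the `reducer` parameter are hypothetical, and the mean is used as the default only because it is one of the options the text names.

```python
from statistics import mean, median


def decide(similarities, threshold=0.75, reducer=mean):
    """Return (target_similarity, same_scene) from a list of cosine similarities.

    - If at least one similarity is not less than the threshold, the images are
      treated as the same scene; a single passing value is used directly, and
      several passing values are combined with `reducer` (mean or median).
    - Otherwise the images are treated as different scenes and `reducer` is
      applied to all similarities.
    """
    if not similarities:
        raise ValueError("at least one cosine similarity is required")
    passing = [s for s in similarities if s >= threshold]
    if passing:
        target = passing[0] if len(passing) == 1 else reducer(passing)
        return target, True
    return reducer(similarities), False


print(decide([0.12, 0.81, 0.30]))                  # one value passes
print(decide([0.80, 0.90, 0.10]))                  # two values pass, mean of both
print(decide([0.10, 0.20, 0.30], reducer=median))  # none passes
```

Passing `reducer=median` switches both branches to the median, reflecting that the specific aggregation is set according to actual needs.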

[0089] In this embodiment, a feature extraction model is used to extract the target feature code for each target augmented image in the target augmented image set and the reference feature code for each reference augmented image in the reference augmented image set. Then, multiple cosine similarities are obtained through these multiple target and reference feature codes, thereby accurately determining the target similarity between the target image and the reference image. This enables matching multiple augmented target images and multiple augmented reference images, accurately identifying target and reference images, and precisely determining whether the target image and reference image belong to the same tobacco display scene. This improves the efficiency and accuracy of identifying tobacco display scene image similarity, and effectively supports matching when two images differ significantly due to factors such as shooting angle or cropping.

[0090] It should also be noted that the image similarity recognition method of this embodiment can be used for images taken from different angles within the same tobacco display scene, images taken in different tobacco display scenes, or images of other scenes that are difficult to distinguish, for example images taken from different angles of the same beverage area in a supermarket scene. As shown in Figure 7a and Figure 7b, Figure 7a is a photo of a standing tobacco display case taken in the tobacco display scene of store M, and Figure 7b is a photo of a standing tobacco display case taken at store N. In Figure 7a and Figure 7b, different grid patterns represent different brands of tobacco. Using the image similarity recognition method of this embodiment, the target similarity between Figure 7a and Figure 7b is identified as 0.12, indicating that the two images are from different tobacco display scenes.

[0091] Furthermore, the image similarity recognition method of this embodiment is compared with the Learned Perceptual Image Patch Similarity (LPIPS) method and the traditional DHASH method: a test set of tobacco display images from 500 tobacco sales stores is selected, and this test set is tested using the three recognition methods. The comparison results are shown in Figure 8.

[0092] In Figure 8, the horizontal axis represents recall, and the vertical axis represents the precision of the recognition method in identifying image similarity. The DHASH curve is the ROC (Receiver Operating Characteristic) curve obtained through the traditional DHASH method, the LPIPS curve is the ROC curve obtained through the LPIPS method, and the ISC curve is the ROC curve obtained through the recognition method of this embodiment. As can be seen intuitively from Figure 8, the recognition method of this embodiment is significantly better than the other methods in terms of recall and precision.

[0093] Embodiment 2

[0094] Based on the same inventive concept, the second embodiment of the present invention also provides an image similarity recognition device which, as shown in Figure 9, includes:

[0095] The acquisition module 301 is used to acquire a set of tobacco display images and a target recognition model for a tobacco display scene, wherein the tobacco display image set includes target images and reference images;

[0096] The augmentation module 302 is used to perform online augmentation processing on both the target image and the reference image to obtain a target augmented image set for the target image and a reference augmented image set for the reference image;

[0097] The recognition module 303 is used to recognize the target augmented image set and the reference augmented image set using the target recognition model, and obtain the target similarity between the target image and the reference image.
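One way the three modules could be composed is sketched below. Everything here is an assumption made for illustration: the class name `SimilarityDevice`, the `augment` and `encode` callables, and the use of the mean to combine similarities are not specified by the text, and the already acquired images are simply passed in as arguments in place of acquisition module 301.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature codes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class SimilarityDevice:
    """Hypothetical composition of the augmentation and recognition modules."""

    def __init__(self, augment, encode, threshold=0.75):
        self.augment = augment      # augmentation module 302: image -> list of images
        self.encode = encode        # feature extraction model: image -> feature code
        self.threshold = threshold  # similarity threshold (0.75 is the usual setting)

    def recognize(self, target_image, reference_image):
        """Recognition module 303: return (target_similarity, same_scene)."""
        target_codes = [self.encode(img) for img in self.augment(target_image)]
        reference_codes = [self.encode(img) for img in self.augment(reference_image)]
        sims = [cosine_similarity(c, d) for c in target_codes for d in reference_codes]
        passing = [s for s in sims if s >= self.threshold]
        if passing:
            return mean(passing), True
        return mean(sims), False


# Stand-ins: "images" are plain vectors, augmentation rescales them,
# and the encoder is the identity. A real system would use image data
# and a trained feature extraction model instead.
device = SimilarityDevice(
    augment=lambda v: [v, [2 * x for x in v]],
    encode=lambda v: v,
)
print(device.recognize([1.0, 0.0], [1.0, 0.0]))
print(device.recognize([1.0, 0.0], [0.0, 1.0]))
```

Passing the augmentation and encoding steps in as callables keeps the sketch independent of any particular image library or network.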

[0098] As an optional embodiment, the recognition module 303 is used to recognize the target augmented image set and the reference augmented image set using the target recognition model to obtain the similarity between the target image and the reference image, including:

[0099] The target feature code of each target augmented image in the target augmented image set and the reference feature code of each reference augmented image in the reference augmented image set are extracted using the feature extraction model of the target recognition model;

[0100] The target similarity between the target image and the reference image is obtained based on the target feature encoding of each target augmented image and the reference feature encoding of each reference augmented image.

[0101] As an optional embodiment, the recognition module 303 is used to obtain the target similarity between the target image and the reference image based on the target feature encoding of each target augmented image and the reference feature encoding of each reference augmented image, including:

[0102] Based on the target feature code of a target augmented image and the reference feature code of a reference augmented image, a cosine similarity between the target image and the reference image is obtained. After performing the above operation on the target feature code of each target augmented image and the reference feature code of each reference augmented image, multiple cosine similarities between the target image and the reference image are obtained.

[0103] If one of the cosine similarities is not less than the similarity threshold, then the cosine similarity is determined as the target similarity, and information is output that the target image and the reference image are images of the same tobacco display scene.

[0104] As an optional embodiment, the recognition module 303 is configured to: after obtaining multiple cosine similarities between the target image and the reference image, if each of the multiple cosine similarities is less than the similarity threshold, determine the target similarity based on the multiple cosine similarities, and output information that the target image and the reference image are images of different tobacco display scenes.

[0105] As an optional embodiment, the augmentation module 302 is used to perform online augmentation processing on the target image to obtain a target augmented image set, including:

[0106] The target image is subjected to online augmentation processing to obtain the target augmented image set. The online augmentation processing includes rotation processing, distortion processing, magnification processing, reduction processing, cropping processing, occlusion processing, watermark addition processing, color adjustment processing, and combination processing. The target augmented image set includes the rotated target image, the distorted target image, the magnified target image, the reduced target image, the cropped target image, the occluded target image, the target image with added watermark, the target image with adjusted color, and the combined target image.
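A few of the listed augmentation operations can be illustrated on a tiny grayscale image stored as a nested list. This is a simplified sketch, not the embodiment's pipeline: the function names, the fixed parameter choices, and the use of a brightness factor as the "color adjustment" are all assumptions, and a real implementation would normally rely on an image processing library.

```python
import random


def rotate90(img):
    """Rotate a 2D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def crop(img, top, left, height, width):
    """Keep the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def occlude(img, top, left, height, width, fill=0):
    """Overwrite a rectangular region with a constant fill value."""
    out = [list(row) for row in img]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out


def adjust_brightness(img, factor):
    """Scale pixel values by factor and clamp them to the range 0..255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


def augment_online(img, rng):
    """Return an augmented image set: rotated, cropped, occluded,
    color-adjusted, and one combined (rotated then color-adjusted) image."""
    h, w = len(img), len(img[0])
    return [
        rotate90(img),
        crop(img, 0, 0, max(1, h // 2), max(1, w // 2)),
        occlude(img, rng.randrange(h), rng.randrange(w), 1, 1),
        adjust_brightness(img, rng.uniform(0.7, 1.3)),
        adjust_brightness(rotate90(img), 0.9),
    ]


image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
augmented_set = augment_online(image, random.Random(0))
print(len(augmented_set), "augmented images")
```

Seeding `random.Random` makes the sketch reproducible; an online augmentation step would typically draw fresh random parameters for every image.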

[0107] As an optional embodiment, the acquisition module 301, used for acquiring the target recognition model, includes:

[0108] The initial recognition model is trained using a set of training images of tobacco displays in the tobacco display scene until the trained recognition model meets the training constraints. The recognition model that meets the training constraints is then determined as the target recognition model.

[0109] As an optional embodiment, the process of training the initial recognition model using a training image set of tobacco display scenes further includes:

[0110] The tobacco display training image set is augmented offline using a 3D simulation method to obtain a tobacco display simulation image set.

[0111] The initial recognition model is trained using the set of simulated tobacco display images.

[0112] Since the image similarity recognition device described in this embodiment is the same device used to implement the image similarity recognition method in Embodiment 1 of this application, those skilled in the art can understand the specific implementation and various variations of the image similarity recognition device in this embodiment based on the image similarity recognition method described in Embodiment 1 of this application. Therefore, how the image similarity recognition device implements the method in Embodiment 1 of this application will not be described in detail here. Any device used by those skilled in the art to implement the image similarity recognition method in Embodiment 1 of this application falls within the scope of protection of this application.

[0113] Embodiment 3

[0114] Based on the same inventive concept, the third embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of any of the above-described image similarity recognition methods.

[0115] Embodiment 4

[0116] Based on the same inventive concept, the fourth embodiment of the present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of any of the image similarity recognition methods described in the first embodiment above.

[0117] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0118] This invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each process and/or box of the flowcharts and/or block diagrams, and combinations of processes and/or boxes in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0119] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device, which implements the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0120] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.

[0121] Although preferred embodiments of the invention have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including both the preferred embodiments and all changes and modifications falling within the scope of the invention.

[0122] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.

Claims

1. A method for recognizing image similarity, characterized in that it comprises:
acquiring a set of tobacco display images and a target recognition model for a tobacco display scene, wherein the tobacco display image set includes target images and reference images;
performing online augmentation processing on both the target image and the reference image to obtain a target augmented image set for the target image and a reference augmented image set for the reference image;
using the target recognition model to recognize the target augmented image set and the reference augmented image set to obtain the target similarity between the target image and the reference image, so as to accurately determine whether the target image and the reference image are images of the same tobacco display scene;
wherein, in training the initial recognition model using a tobacco display training image set, the tobacco display training image set is first augmented offline using a 3D simulation method to obtain a tobacco display simulation image set; then, each tobacco display simulation image is augmented online to obtain an augmented training image set for each tobacco display simulation image, wherein, in this augmented training image set, each tobacco display simulation image has the same label, and each training image in an augmented training image set also has the same label, which ensures diversity among training images with the same label, compensates for missing image data, and improves the performance of the recognition model during the training phase; finally, each augmented training image set is input into the initial recognition model, and the similarity of each training image in each augmented training image set is controlled to ensure that the trained recognition model meets the requirement of maximizing the cosine similarity between training images with the same label and minimizing the cosine similarity between training images with different labels.

2. The method as described in claim 1, characterized in that the step of using the target recognition model to recognize the target augmented image set and the reference augmented image set to obtain the target similarity between the target image and the reference image includes: extracting, using the feature extraction model of the target recognition model, the target feature code of each target augmented image in the target augmented image set and the reference feature code of each reference augmented image in the reference augmented image set; and obtaining the target similarity between the target image and the reference image based on the target feature code of each target augmented image and the reference feature code of each reference augmented image.

3. The method as described in claim 2, characterized in that the step of obtaining the target similarity between the target image and the reference image based on the target feature code of each target augmented image and the reference feature code of each reference augmented image includes: obtaining one cosine similarity between the target image and the reference image based on the target feature code of one target augmented image and the reference feature code of one reference augmented image; after performing the above operation on the target feature code of each target augmented image and the reference feature code of each reference augmented image, obtaining multiple cosine similarities between the target image and the reference image; and, if one of the cosine similarities is not less than the similarity threshold, determining that cosine similarity as the target similarity and outputting information that the target image and the reference image are images of the same tobacco display scene.

4. The method as described in claim 3, characterized in that, after obtaining multiple cosine similarities between the target image and the reference image, the method further includes: if each of the multiple cosine similarities is less than the similarity threshold, determining the target similarity based on the multiple cosine similarities, and outputting information that the target image and the reference image are images of different tobacco display scenes.

5. The method as described in claim 1, characterized in that performing online augmentation processing on the target image to obtain a target augmented image set includes: performing online augmentation processing on the target image to obtain the target augmented image set, wherein the online augmentation processing includes rotation processing, distortion processing, magnification processing, reduction processing, cropping processing, occlusion processing, watermark addition processing, color adjustment processing, and combination processing, and the target augmented image set includes the rotated target image, the distorted target image, the magnified target image, the reduced target image, the cropped target image, the occluded target image, the target image with added watermark, the target image with adjusted color, and the combined target image.

6. The method as described in claim 1, characterized in that acquiring the target recognition model includes: training the initial recognition model using a tobacco display training image set of the tobacco display scene until the trained recognition model meets the training constraints, and determining the recognition model that meets the training constraints as the target recognition model.

7. The method as described in claim 6, characterized in that the process of training the initial recognition model using the tobacco display training image set of the tobacco display scene also includes: augmenting the tobacco display training image set offline using a 3D simulation method to obtain a tobacco display simulation image set; and training the initial recognition model using the tobacco display simulation image set.

8. A device for recognizing image similarity, characterized in that the device is used to perform the image similarity recognition method as described in any one of claims 1-7, the device comprising: an acquisition module, used to acquire a set of tobacco display images and a target recognition model for a tobacco display scene, wherein the tobacco display image set includes target images and reference images; an augmentation module, used to perform online augmentation processing on both the target image and the reference image to obtain a target augmented image set for the target image and a reference augmented image set for the reference image; and a recognition module, used to recognize the target augmented image set and the reference augmented image set using the target recognition model, and to obtain the target similarity between the target image and the reference image.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the method as described in any one of claims 1-7.

10. A computer-readable storage medium storing a computer program thereon, characterized in that, when the program is executed by a processor, it implements the steps of the method as described in any one of claims 1-7.