Method and apparatus for unsupervised multi-level clustering of images
By employing an unsupervised multi-level clustering method, leveraging GPU acceleration and streaming database processing, the clustering bottleneck of large-scale image data was resolved, achieving efficient and accurate image grouping and generating high-quality group label data.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHUXING TECH (BEIJING) CO LTD
- Filing Date
- 2022-06-02
- Publication Date
- 2026-06-23
Figure CN117218357B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates generally to the field of computer technology, and more specifically to a method and apparatus for unsupervised multi-level clustering of images.
Background Technology
[0002] Large-scale fine-grained classification, recognition, and retrieval tasks require large amounts of group-labeled data, and they also face the problem of open-class re-identification caused by domain adaptation.
Summary of the Invention
[0003] A brief overview of this disclosure is given below to provide a basic understanding of some aspects of it. However, it should be understood that this overview is not an exhaustive summary of this disclosure. It is not intended to identify key or essential parts of this disclosure, nor is it intended to limit the scope of this disclosure. Its purpose is merely to present certain concepts of this disclosure in a simplified form as a prelude to the more detailed description that follows.
[0004] According to a first aspect of this disclosure, a method for unsupervised multi-level clustering of images is provided, the method comprising: acquiring feature vectors and attribute information of multiple images in a streaming or non-streaming manner; performing first-level clustering to divide the images into multiple first-level clusters based on the image attribute information; performing second-level clustering on each first-level cluster based on the image feature vectors to divide the images into multiple refined second-level clusters, the second-level clustering employing the k-means algorithm and being executed using a GPU; performing third-level clustering on each second-level cluster based on the image feature vectors to divide the images into multiple further refined third-level clusters, the third-level clustering employing the DBSCAN algorithm; and determining whether the result of the multi-level clustering meets predetermined conditions, and if not, repeating one or more of the first-level clustering, second-level clustering, and third-level clustering.
[0005] According to a second aspect of this disclosure, a system for unsupervised multi-level clustering of images is provided, the system comprising: a data acquisition module configured to acquire feature vectors and attribute information of multiple images in a streaming or non-streaming manner; a first clustering module configured to perform first-level clustering based on the attribute information of the images to divide the images into multiple first-level clusters; and a second clustering module configured to: perform second-level clustering on each first-level cluster based on the feature vectors of the images to divide the images into multiple refined second-level clusters, the second-level clustering employing the k-means algorithm and executed using a GPU; perform third-level clustering on each second-level cluster based on the feature vectors of the images to divide the images into multiple further refined third-level clusters, the third-level clustering employing the DBSCAN algorithm; and determine whether the result of the multi-level clustering meets predetermined conditions, and if not, repeat one or more of the first-level clustering, second-level clustering, and third-level clustering.
[0006] According to a third aspect of this disclosure, an apparatus for unsupervised multi-level clustering of images is provided, comprising: a memory having instructions stored thereon; and a processor configured to execute the instructions stored in the memory to perform the method according to a first aspect of this disclosure.
[0007] According to a fourth aspect of this disclosure, a computer-readable storage medium is provided, including computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method according to a first aspect of this disclosure.
[0008] Other features and advantages of this disclosure will become clearer from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Attached Figure Description
[0009] The accompanying drawings, which form part of this specification, illustrate embodiments of this disclosure and, together with the specification, serve to explain the principles of this disclosure.
[0010] This disclosure will be more clearly understood with reference to the accompanying drawings and the following detailed description, wherein:
[0011] Figure 1 shows a schematic diagram of at least a portion of an apparatus for unsupervised multi-level clustering of images according to an embodiment of the present disclosure;
[0012] Figure 2 shows a flowchart of at least a portion of a method for unsupervised multi-level clustering of images according to embodiments of the present disclosure;
[0013] Figure 3 shows a schematic diagram of at least a portion of an exemplary system for unsupervised multi-level clustering of images according to embodiments of the present disclosure;
[0014] Figure 4 shows a schematic diagram of at least a portion of a computer system for unsupervised multi-level clustering of images according to embodiments of the present disclosure.
Detailed Implementation
[0015] The following detailed description is based on the accompanying drawings and provides various exemplary embodiments of the present disclosure to aid in a comprehensive understanding. Various details are included in the following description to aid understanding; however, these details are considered exemplary only and not intended to limit the present disclosure, which is defined by the appended claims and their equivalents. The words and phrases used in the following description are intended only to provide a clear and consistent understanding of the present disclosure. Additionally, descriptions of well-known structures, functions, and configurations may have been omitted for clarity and brevity. Those skilled in the art will recognize that various changes and modifications can be made to the examples described herein without departing from the spirit and scope of the present disclosure.
[0016] The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the scope of this disclosure or its application or use. That is, the structures and methods herein are shown in an exemplary manner to illustrate different embodiments of the structures and methods in this disclosure. However, those skilled in the art will understand that they merely illustrate exemplary ways that can be used to implement this disclosure, and not exhaustive ways. Furthermore, the drawings are not necessarily drawn to scale, and some features may be enlarged to show details of specific components.
[0017] Techniques, methods, and equipment known to those skilled in the art may not be discussed in detail, but where appropriate, such techniques, methods, and equipment should be considered part of the specification.
[0018] In all examples shown and discussed herein, any specific values should be interpreted as merely exemplary and not as limitations. Therefore, other examples of exemplary embodiments may have different values.
[0019] For tasks such as image search, it's necessary to locate the subject of the current image and search for identical or similar images. This provides users with an entry point for in-depth exploration based on a clear interest in a single piece of content, enriching the user's search experience. These tasks typically expect very fine-grained matching results. However, the content may involve very complex scenes, and the image data may contain a lot of noise. The image angle, background, and clarity may not meet the requirements, and there may be a large amount of image copying. Therefore, obtaining a large-scale, high-quality image dataset (such as a collection of sufficiently clean and rich image data) is time-consuming and labor-intensive.
[0020] How to efficiently and quickly acquire large-scale grouped training sample data from massive discrete images and solve the domain adaptation problem of fine-grained retrieval in complex data scenarios is a technical bottleneck that needs to be overcome.
[0021] Existing technologies include distance-based unsupervised clustering algorithms, density-based unsupervised clustering algorithms, and graph-based supervised clustering algorithms, among others. However, when faced with unsupervised clustering of large-scale (hundreds of millions) data points, these algorithms suffer from bottlenecks in terms of computing power, data, and algorithmic capabilities.
[0022] Figure 1 shows a schematic diagram of at least a portion of an apparatus 100 for unsupervised multi-level clustering of images according to an embodiment of the present disclosure. It should be understood that the modules shown in Figure 1 are merely exemplary, and the apparatus 100 may include only a portion of these modules, or it may include many other modules.
[0023] As shown in Figure 1, the apparatus 100 may include a data acquisition module 101, a first clustering module 102, and a second clustering module 103.
[0024] Specifically, the data acquisition module 101 can be configured to acquire feature vectors and attribute information of the images of multiple pieces of content from the data source 104 in a streaming or non-streaming manner. The data source 104 can include image data from the content, with or without prior information. The content can include image data and other information associated with the images; for example, the content can be notes, and each note can include one or more images and associated text descriptions, tags, etc. Furthermore, each piece of content can have an identifier corresponding to that content; for example, each note can have a note ID. Other information associated with the images in the content can include prior information associated with one or more images in the content. For example, in the case of notes, the text descriptions or tags included in a note can describe the image content and serve as prior information, and can therefore include image attribute information. A detection module can detect objects in the images of the content to obtain their bounding boxes. For example, for an image containing a person, the detection module can be used to obtain the bounding box of the person in the image. Image attribute information can refer to the category information of objects included in the image. For example, for an image including a person, the attribute information could be the category of clothing worn by the person, such as pants, skirt, or T-shirt. Image attribute information can be obtained directly from the content (e.g., based on prior information included in the content) or extracted from the image by an attribute module (e.g., based on an image with a bounding box). Image feature vectors can be obtained in advance or extracted from the image by a feature module (e.g., through deep learning, based on an image with a bounding box). For example, a feature vector can be a 128-dimensional, 256-dimensional, or 512-dimensional vector of floating-point values. In embodiments according to this disclosure, the apparatus 100 may further include a first database 105, which is a database capable of streaming. The feature vectors and attribute information of images acquired in a streaming manner can be extracted in advance from images from the data source 104 and stored in the first database 105. Furthermore, in embodiments according to this disclosure, feature vectors and attribute information of images can also be acquired in a non-streaming manner (e.g., in real time); in that case, the data acquisition module 101 obtains them from the data source 104 directly, without their being written into the first database 105 and read back by the data acquisition module 101.
[0025] The first clustering module 102 can be configured to perform first-level clustering based on the attribute information of the images to divide the images into multiple first-level clusters. Specifically, images with the same attribute information can be grouped into the same first-level cluster, and each first-level cluster can include multiple images with the same attribute information. For example, images including people wearing skirts can be grouped into one first-level cluster, and images including people wearing pants can be grouped into another first-level cluster.
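As a minimal illustration of the first-level clustering described above, the following Python sketch groups image records by their attribute label. The record layout (the keys `image_id` and `attribute`) and the function name are assumptions made for this example only; the disclosure does not prescribe a data format.

```python
from collections import defaultdict


def first_level_clusters(images):
    """Group image IDs by attribute label (one first-level cluster per label).

    `images` is an iterable of dicts with the hypothetical keys
    'image_id' and 'attribute'.
    """
    clusters = defaultdict(list)
    for img in images:
        clusters[img["attribute"]].append(img["image_id"])
    return dict(clusters)


# Example records; the IDs and labels are made up for illustration.
images = [
    {"image_id": "a", "attribute": "skirt"},
    {"image_id": "b", "attribute": "pants"},
    {"image_id": "c", "attribute": "skirt"},
]
clusters = first_level_clusters(images)
```

Because this step only compares labels, it needs a single pass over the data and no distance computations, which is consistent with its role as a cheap pre-partitioning step before the feature-based levels.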
[0026] The second clustering module 103 can be configured to: perform second-level clustering on each first-level cluster based on the feature vectors of the images to divide the images into multiple refined second-level clusters, wherein the second-level clustering adopts the k-means algorithm and is executed using a GPU; perform third-level clustering on each second-level cluster based on the feature vectors of the images to divide the images into multiple further refined third-level clusters, wherein the third-level clustering adopts the DBSCAN algorithm; and determine whether the result of the multi-level clustering meets predetermined conditions, and if not, repeat one or more of the first-level clustering, second-level clustering, and third-level clustering.
[0027] In embodiments according to this disclosure, the apparatus 100 may further include a second database 106, which is a database capable of streaming read and write operations. Specifically, after first-level clustering is performed, the feature vectors and related information of the images can be stored in the second database 106, wherein the related information includes one or more of the following: an identifier of the content to which the image belongs, an identifier of the first-level cluster to which the image belongs, an identifier of the image, and attribute information of the image. The feature vectors and related information can then be streamed from the second database 106 to perform second-level and third-level clustering.
[0028] In this disclosure, based on these databases 105 and 106 that are capable of streaming processing, image data acquisition and clustering can be performed asynchronously, which helps to solve problems such as insufficient data storage space and low clustering efficiency.
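The asynchronous decoupling of data acquisition from clustering can be sketched as a producer and a consumer that communicate through a bounded buffer. In the sketch below, Python's in-process `queue.Queue` merely stands in for a streaming-capable database such as databases 105 and 106; the function names, the batch size, and the `None` sentinel are assumptions made for this example.

```python
import queue
import threading


def producer(records, q):
    """Write records into the buffer as they become available."""
    for rec in records:
        q.put(rec)  # blocks while the bounded buffer is full
    q.put(None)  # sentinel: no more data


def consumer(q, batch_size, out):
    """Read records as they arrive and hand them on in fixed-size batches."""
    batch = []
    while True:
        rec = q.get()
        if rec is None:
            break
        batch.append(rec)
        if len(batch) == batch_size:
            out.append(batch)  # a real system would cluster the batch here
            batch = []
    if batch:
        out.append(batch)  # flush the final partial batch


q = queue.Queue(maxsize=4)  # small bound: storage stays limited
batches = []
writer = threading.Thread(target=producer, args=(range(10), q))
writer.start()
consumer(q, 3, batches)
writer.join()
```

The bounded buffer means the writer never runs far ahead of the reader, which illustrates how streaming can limit the storage needed at any one time.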
[0029] Figure 2 shows a flowchart of at least a portion of a method 200 for unsupervised multi-level clustering of images according to embodiments of the present disclosure.
[0030] As shown in Figure 2, at S21, feature vectors and attribute information of the images of multiple pieces of content can be acquired in a streaming or non-streaming manner. Specifically, the feature vectors and attribute information of images acquired in a streaming manner can be extracted from the images in advance and stored in a first database capable of streaming; the feature vectors and attribute information of images acquired in a non-streaming manner can be extracted from the images in real time. In embodiments according to this disclosure, the images of the multiple pieces of content can be sparsely sampled according to the attribute information, so that only one feature vector is obtained for multiple images in the same content.
[0031] At S22, first-level clustering can be performed based on the attribute information of the images to divide the images into multiple first-level clusters. In embodiments according to this disclosure, after first-level clustering is performed, the feature vectors and related information of the images can be stored in a second database capable of streaming read and write operations. The related information may include one or more of the following: an identifier of the content to which the image belongs, an identifier of the first-level cluster to which the image belongs, an image identifier, and image attribute information. The feature vectors and related information of the images can also be streamed from the second database to perform second-level and third-level clustering.
[0032] At S23, second-level clustering can be performed on each first-level cluster based on the feature vectors of the images to divide the images into multiple refined second-level clusters. The second-level clustering adopts the k-means algorithm and is executed using a GPU.
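For readers unfamiliar with k-means, the following sketch shows the assignment and update loop (Lloyd's algorithm) on a handful of 2-D points. It is a plain CPU illustration in pure Python, not the GPU implementation referred to in this disclosure; the function name, the random initialization, the iteration cap, and the sample points are all assumptions made for this example.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centroids, labels


# Two well-separated groups of hypothetical 2-D feature vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

The assignment step is a large batch of independent distance computations, which is the part that benefits most from GPU parallelism when the feature vectors have hundreds of dimensions and the clusters hold millions of images.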
[0033] At S24, third-level clustering can be performed on each second-level cluster based on the feature vectors of the images to divide the images into multiple further refined third-level clusters. The third-level clustering can employ the DBSCAN algorithm. As a non-limiting example, performing third-level clustering once may include: repeatedly executing the DBSCAN algorithm on each second-level cluster, wherein the parameters of the DBSCAN algorithm are adaptively adjusted based on the number of images included in the clusters obtained from the previous execution; after multiple executions, determining whether the number of images included in each cluster obtained from each execution has converged; and, if convergence has been achieved, obtaining multiple further refined third-level clusters. It should be understood that third-level clustering is not limited to this cyclic adaptive DBSCAN algorithm; other DBSCAN algorithms can also be used.
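One possible reading of the cyclic adaptive DBSCAN described above is sketched below in pure Python. The disclosure does not state how the parameters are adjusted, so the rule used here (shrink `eps` by a fixed factor while any cluster holds more than `max_size` points, and stop when the sorted cluster sizes no longer change between runs) is an assumption, as are all function and parameter names.

```python
from collections import Counter


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(seeds)
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:  # only core points expand the cluster
                stack.extend(nj)
    return labels


def cluster_sizes(labels):
    """Sorted sizes of all non-noise clusters."""
    return sorted(Counter(l for l in labels if l != -1).values())


def cyclic_dbscan(points, eps, min_pts, max_size, shrink=0.8, max_runs=10):
    """Rerun DBSCAN, tightening eps while a cluster is too large (assumed rule)."""
    prev = None
    labels = dbscan(points, eps, min_pts)
    for _ in range(max_runs):
        sizes = cluster_sizes(labels)
        if sizes == prev:
            break  # cluster sizes have converged
        prev = sizes
        if sizes and sizes[-1] > max_size:
            eps *= shrink  # stricter neighborhood for the next run
        labels = dbscan(points, eps, min_pts)
    return labels, eps


# 1-D toy data: two tight groups that a loose eps chains together, plus an outlier.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (50.0,)]
labels, final_eps = cyclic_dbscan(points, eps=9.0, min_pts=2, max_size=3)
```

With the starting value of `eps`, the two groups are chained into a single cluster of six points; after one tightening step they separate into two clusters of three, and the next run confirms that the sizes are stable.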
[0034] At S25, it can be determined whether the result of the multi-level clustering meets predetermined conditions. If not, one or more of the first-level clustering, second-level clustering, and third-level clustering can be repeated. Specifically, determining whether the result meets the predetermined conditions may include: determining whether the number of rounds of multi-level clustering has reached a round threshold and whether the number of images included in a third-level cluster exceeds a first image number threshold. If the round threshold has not been reached or the first image number threshold is exceeded, another round of multi-level clustering is performed, where one round includes performing second-level clustering and third-level clustering.
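The repetition condition at S25 can be written as a small predicate. The sketch below follows the wording of this paragraph literally; the function and parameter names are hypothetical, and the disclosure gives no concrete threshold values.

```python
def needs_another_round(rounds_done, round_threshold, cluster_sizes, max_images):
    """Return True if another round of second- and third-level clustering is due.

    Another round is performed when the round threshold has not yet been
    reached, or when any third-level cluster holds more images than the
    first image number threshold (`max_images`).
    """
    too_large = any(size > max_images for size in cluster_sizes)
    return rounds_done < round_threshold or too_large
```

Read literally, the "or" means that oversized clusters keep triggering new rounds even after the round threshold is reached, so a practical implementation would probably add a hard upper limit on the number of rounds; that limit is not part of the text above.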
[0035] In embodiments according to this disclosure, the method may further include obtaining an identifier (e.g., a note ID) of the content to which each image belongs, wherein each content includes one or more images and their corresponding feature vectors and attribute information; if the number of images in the third-level clusters obtained after performing third-level clustering is less than a threshold, obtaining the feature vectors and attribute information of the images from the corresponding content based on the identifiers of the content corresponding to the images in the third-level clusters where the number of images is less than the threshold for data augmentation; and performing first-level clustering, second-level clustering, and third-level clustering again on the feature vectors and attribute information of the data-augmented images.
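The data augmentation step can be illustrated as a lookup from undersized third-level clusters back to their source content. In the following sketch, the two mappings (`image_to_note` and `note_to_images`), the function name, and the sample IDs are assumptions; it returns the additional image IDs that would be fetched before clustering is run again.

```python
def augmentation_candidates(clusters, image_to_note, note_to_images, min_size):
    """Collect extra images from the notes behind undersized clusters.

    `clusters` maps a cluster ID to a list of image IDs. Images that are
    already in some cluster are not returned again.
    """
    clustered = {img for members in clusters.values() for img in members}
    extra = set()
    for members in clusters.values():
        if len(members) < min_size:
            for img in members:
                # Look up the note via its ID and pull in all of its images.
                extra.update(note_to_images[image_to_note[img]])
    return sorted(extra - clustered)


# Hypothetical example: cluster 1 has a single image and is below the threshold.
clusters = {0: ["a1", "b1", "c1"], 1: ["d1"]}
image_to_note = {"a1": "A", "b1": "B", "c1": "C", "d1": "D"}
note_to_images = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"], "D": ["d1", "d2", "d3"]}
candidates = augmentation_candidates(clusters, image_to_note, note_to_images, min_size=2)
```

Only note D is revisited here, because it is the only note contributing to a cluster below the threshold; note A's unsampled image `a2` is left alone.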
[0036] Figure 3 shows a schematic diagram of at least a portion of an exemplary system 300 for unsupervised multi-level clustering of images according to embodiments of the present disclosure.
[0037] The input data for system 300 can be notes, which include image data with or without prior information, as well as text descriptions, tags, and other information associated with the images. In embodiments according to this disclosure, the feature vectors and attribute information of the images in the notes can be pre-acquired. As shown in Figure 3, the pre-acquired image feature vectors and attribute information can be stored in the business's MySQL table and then synchronized to the MySQL database of system 300 via Hive. For example, as shown in Figure 3, incremental business data can be synchronized to the MySQL database of system 300 via Hive, thus enabling streaming data reading. If the business's MySQL table does not exist, the feature vectors and attribute information can be extracted directly from the images in the notes using the detection module, attribute module, feature module, etc., that is, acquired in a non-streaming manner.
[0038] Because the image data in the content may be unevenly distributed (for example, images of trending items might make up a large proportion), sparse sampling can be used. For instance, the images within the note corresponding to a note ID can be sampled based on attribute information, keeping only one bounding box for each attribute. Sparse sampling reduces the number of images that need to be processed, thereby improving processing speed. Furthermore, for notes that have only a small number of images in the final third-level clusters, data sampling can be performed on the note module itself, i.e., horizontal data augmentation based on the note ID.
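The sparse sampling rule of keeping one bounding box per attribute within each note can be sketched as a de-duplication over (note ID, attribute) pairs. The record keys and the choice to keep the first detection encountered are assumptions made for this example; the disclosure does not say which detection is retained.

```python
def sparse_sample(detections):
    """Keep the first detection for each (note_id, attribute) pair.

    `detections` is an iterable of dicts with the hypothetical keys
    'note_id', 'attribute', and 'box_id'.
    """
    seen = set()
    kept = []
    for det in detections:
        key = (det["note_id"], det["attribute"])
        if key not in seen:
            seen.add(key)
            kept.append(det)
    return kept


# Note n1 contains two skirt detections; only the first one is kept.
detections = [
    {"note_id": "n1", "attribute": "skirt", "box_id": 1},
    {"note_id": "n1", "attribute": "skirt", "box_id": 2},
    {"note_id": "n1", "attribute": "pants", "box_id": 3},
    {"note_id": "n2", "attribute": "skirt", "box_id": 4},
]
sampled = sparse_sample(detections)
```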
[0039] System 300 may include a pre-clustering module, in which the sampled data can be pre-clustered (first-level clustering) based on attribute information. This is because images in the same category (or other prior attributes) have a certain probability of being the same. Through pre-clustering, image clustering at the scale of hundreds of millions can be reduced to image clustering at the scale of tens of millions.
[0040] As shown in Figure 3, system 300 may also include a Redis database capable of streaming read and write, which stores the feature vectors (embeddings) and related information (infos) of the images. The related information may include, for example, one or more of the following: the note ID of the note to which the image belongs, the identifier of the category of the pre-cluster to which the image belongs, the image identifier, and the image attribute information.
[0041] System 300 may also include a k-means GPU clustering module, in which the GPU is used to accelerate clustering (second-level clustering) based on the feature vectors of the images, thereby achieving coarse-grained second-level unsupervised clustering that reduces data at the scale of tens of millions to the scale of millions.
[0042] System 300 may also include a DBSCAN clustering module, in which an improved cyclic DBSCAN unsupervised clustering algorithm is used to cluster data within the millions. Cyclic DBSCAN clustering mainly tightens the clustering conditions by adjusting weights based on the number of images within the third-level clusters, adaptively adjusts the DBSCAN clustering parameters, and increases confidence in the clustering by comparing the results obtained under different conditions.
[0043] As shown in Figure 3, if there are too many images in the third-level clusters or the number of iterations is insufficient, another round of k-means clustering and DBSCAN clustering can be performed.
[0044] Furthermore, if the number of images in a third-level cluster is too small, the process can return to the module that samples the notes, expand the data through data sampling from the note module, and then perform multi-level clustering again on the expanded data.
[0045] After clustering results that meet the requirements are obtained, the data post-processing module can generate data in a format with group labels, which can be used to output soft labels.
[0046] Figure 4 shows a schematic diagram of at least a portion of a computer system 400 for unsupervised multi-level clustering of images according to embodiments of the present disclosure. System 400 includes one or more processors 410, one or more memories 420, and other components (not shown) typically found in devices such as computers. Each of the one or more memories 420 may store content accessible by the one or more processors 410, including instructions 421 executable by the one or more processors 410, and data 422 that may be retrieved, manipulated, or stored by the one or more processors 410.
[0047] Instructions 421 can be any set of instructions to be executed directly by one or more processors 410, such as machine code, or any set of instructions to be executed indirectly, such as a script. The terms “instruction,” “application,” “procedure,” “step,” and “program” used herein are interchangeable. Instructions 421 can be stored in object code format for direct processing by one or more processors 410, or stored as a script or a set of independent source code modules in any other computer language, including those interpreted on demand or compiled ahead of time. Instructions 421 may include instructions that cause one or more processors 410 to act as the various models described herein. The functions, methods, and routines of instructions 421 are explained in more detail elsewhere in this document.
[0048] One or more memories 420 may be any transitory or non-transitory computer-readable storage medium capable of storing content accessible by one or more processors 410, such as hard disk drives, memory cards, ROM, RAM, DVDs, CDs, USB storage, writable memory, and read-only memory. One or more of the memories 420 may include a distributed storage system, wherein instructions 421 and/or data 422 may be stored on multiple different storage devices that may be physically located in the same or different geographical locations. One or more of the memories 420 may be connected to one or more processors 410 via a network, and/or may be directly connected to or incorporated into any of the one or more processors 410.
[0049] One or more processors 410 may retrieve, store, or modify data 422 according to instructions 421. Data 422 stored in one or more memories 420 may include at least a portion of one or more of the items stored in the one or more storage devices 410 described above. For example, while the subject matter described herein is not limited to any particular data structure, data 422 may also be stored in computer registers (not shown), or as a table or XML document with many different fields and records in a relational database. Data 422 may be formatted in any computing device-readable format, such as, but not limited to, binary values, ASCII, or Unicode. Furthermore, data 422 may include any information sufficient to identify the relevant information, such as numbers, descriptive text or symbols, proprietary codes, pointers, references to data stored in other memories such as other network locations, or information used by functions to calculate the relevant data.
[0050] One or more processors 410 can be any conventional processor, such as a commercially available central processing unit (CPU), graphics processing unit (GPU), etc. Alternatively, one or more processors 410 can also be special-purpose components, such as application-specific integrated circuits (ASICs) or other hardware-based processors. While not required, one or more processors 410 may include specialized hardware components to perform specific computational processes, such as image processing, faster or more efficiently.
[0051] Although Figure 4 schematically shows one or more processors 410 and one or more memories 420 within the same box, system 400 may actually include multiple processors or memories that may reside within the same physical housing or within multiple different physical housings. For example, one of the one or more memories 420 may be a hard disk drive or other storage medium located in a housing different from the housing of each of the one or more computing devices (not shown) described above. Therefore, references to processors, computers, computing devices, or memories should be understood to include references to a collection of processors, computers, computing devices, or memories that may operate in parallel or not in parallel.
[0052] The method and apparatus disclosed herein can achieve feature-based clustering of hundreds of millions of discrete images, generating hundreds of millions of image clusters with group labels at low cost, providing data support for large-scale classification tasks. By combining Hive, MySQL, and Redis data storage media and accelerating the process with both CPU and GPU, it solves the problems of computing power, storage, and efficiency. Through multi-level unsupervised clustering algorithms, it gradually achieves unsupervised clustering of hundreds of millions of discrete data points, down to tens of millions, millions, and finally grouped data. By designing cyclic clustering, it improves the accuracy of grouped data, ensuring that the accuracy within a group is greater than 90%. In scenarios where manual annotation and data mining are costly, the method and apparatus disclosed herein can be applied to situations with scarce data, solving the problems of training quality and efficiency.
[0053] The term "A or B" in the specification and claims includes "A and B" as well as "A or B", and does not exclusively include only "A" or only "B", unless otherwise specified.
[0054] In this disclosure, references to "one embodiment" or "some embodiments" mean that a feature, structure, or characteristic described in connection with that embodiment is included in at least one embodiment or at least some embodiments of this disclosure. Therefore, the appearances of the phrases "in one embodiment" or "in some embodiments" throughout this disclosure do not necessarily refer to the same embodiment or embodiments. Furthermore, in one or more embodiments, features, structures, or characteristics can be combined in any suitable combination and/or sub-combination.
[0055] As used herein, the term "exemplary" means "serving as an example, instance, or illustration," and not as a "model" to be precisely copied. Any implementation described herein by example is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, this disclosure is not limited to any theory expressed or implied as given in the foregoing technical field, background, summary of invention, or detailed description.
[0056] Additionally, certain terms may be used in the following description for reference only and are therefore not intended to be limiting. For example, unless the context clearly indicates otherwise, the words “first,” “second,” and other such numerical terms relating to structures or elements do not imply order or sequence. It should also be understood that the term “including / comprising,” as used herein, indicates the presence of the indicated feature, whole, step, operation, unit, and / or component, but does not preclude the presence or addition of one or more other features, wholes, steps, operations, units, and / or components, and / or combinations thereof.
[0057] In this disclosure, the terms "component" and "system" are intended to refer to a computer-related entity, or hardware, a combination of hardware and software, software, or software in execution. For example, a component can be, but is not limited to, a process, object, executable, thread of execution, and / or program running on a processor. By way of example, both an application running on a server and the server itself can be a component. One or more components can exist within an executing process and / or thread, and a component can be located on a single computer and / or distributed across two or more computers.
[0058] Those skilled in the art will recognize that the boundaries between the above operations are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be performed with at least partial overlap in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be changed in various other embodiments. However, other modifications, variations, and substitutions are equally possible. Therefore, this specification and the accompanying drawings should be considered illustrative rather than restrictive.
[0059] While specific embodiments of this disclosure have been described in detail by way of example, those skilled in the art should understand that the examples are for illustrative purposes only and not intended to limit the scope of this disclosure. The various embodiments disclosed herein can be combined in any way without departing from the spirit and scope of this disclosure. Those skilled in the art should also understand that various modifications can be made to the embodiments without departing from the scope and spirit of this disclosure. The scope of this disclosure is defined by the appended claims.
Claims
1. A method for unsupervised multi-level clustering of images, the method comprising: acquiring, in a streaming or non-streaming manner, feature vectors of images included in multiple contents and attribute information of the images; performing first-level clustering based on the attribute information of the images to divide the images into multiple first-level clusters; performing second-level clustering on each first-level cluster based on the feature vectors of the images to divide the images into multiple refined second-level clusters, the second-level clustering employing the k-means algorithm and being executed using a GPU; performing third-level clustering on each second-level cluster based on the feature vectors of the images to divide the images into multiple further refined third-level clusters, the third-level clustering employing the DBSCAN algorithm; and determining whether the result of the multi-level clustering meets predetermined conditions and, if not, repeating one or more of the first-level clustering, the second-level clustering, and the third-level clustering.
2. The method according to claim 1, wherein performing the third-level clustering includes: executing the DBSCAN algorithm repeatedly for each second-level cluster, wherein, during the repeated execution, the parameters of the DBSCAN algorithm are adaptively adjusted according to the number of images included in the clusters obtained from the previous execution of the DBSCAN algorithm; after executing the DBSCAN algorithm multiple times, determining whether the number of images included in each cluster obtained by each execution of the DBSCAN algorithm has converged; and, if convergence has been achieved, obtaining the multiple further refined third-level clusters.
3. The method according to claim 1 or 2, wherein determining whether the result of the multi-level clustering meets the predetermined conditions includes: determining whether the number of rounds of multi-level clustering has reached a round threshold and whether the number of images included in the third-level clusters exceeds a first image number threshold; and, if the number of rounds has not reached the round threshold or the number of images exceeds the first image number threshold, performing another round of multi-level clustering, wherein performing one round of multi-level clustering includes performing the second-level clustering and performing the third-level clustering.
4. The method according to claim 1, wherein: the feature vectors and attribute information of images acquired in the streaming manner are extracted in advance from the corresponding images and stored in a first database that supports streaming reads; and the feature vectors and attribute information of images acquired in the non-streaming manner are extracted from the corresponding images in real time.
5. The method according to claim 1, wherein the multiple contents include images and other information related to the images, and the method further includes: performing, according to the attribute information, sparse sampling on the images included in the acquired multiple contents, so that only one feature vector is obtained for the multiple images included in the same content.
6. The method according to claim 5, wherein: the method further includes obtaining an identifier of the content to which each image belongs, wherein each content includes one or more images, feature vectors of the one or more images, and attribute information of the one or more images; and determining whether the result of the multi-level clustering meets the predetermined conditions includes: determining whether the number of images in a third-level cluster resulting from the multi-level clustering is lower than a second image number threshold; if the number of images is lower than the second image number threshold, obtaining, based on the identifiers of the contents corresponding to the images in that third-level cluster, feature vectors and attribute information of images from the corresponding contents for data augmentation; and performing the first-level clustering, the second-level clustering, and the third-level clustering on the feature vectors and attribute information of the data-augmented images.
7. The method according to claim 1, wherein the method further includes: after performing the first-level clustering, storing the feature vectors of the images and related information in a second database that supports streaming read and write operations, the related information including one or more of the following: an identifier of the content to which an image belongs, an identifier of the first-level cluster to which the image belongs, an image identifier, and the attribute information of the image, wherein the content includes the image and other information related to the image; and streaming the feature vectors and attribute information of the images from the second database to perform the second-level clustering and the third-level clustering.
8. A system for unsupervised multi-level clustering of images, the system comprising: a data acquisition module configured to acquire feature vectors of multiple images and attribute information of the images in a streaming or non-streaming manner; a first clustering module configured to perform first-level clustering based on the attribute information of the images to divide the images into multiple first-level clusters; and a second clustering module configured to: perform second-level clustering on each first-level cluster based on the feature vectors of the images to divide the images into multiple refined second-level clusters, the second-level clustering employing the k-means algorithm and being executed using a GPU; perform third-level clustering on each second-level cluster based on the feature vectors of the images to divide the images into multiple further refined third-level clusters, the third-level clustering employing the DBSCAN algorithm; and determine whether the result of the multi-level clustering meets predetermined conditions and, if not, repeat one or more of the first-level clustering, the second-level clustering, and the third-level clustering.
9. The system according to claim 8, wherein performing the third-level clustering includes: executing the DBSCAN algorithm repeatedly for each second-level cluster, wherein, during the repeated execution, the parameters of the DBSCAN algorithm are adaptively adjusted according to the number of images included in the clusters obtained from the previous execution of the DBSCAN algorithm; after executing the DBSCAN algorithm multiple times, determining whether the number of images included in each cluster obtained by each execution of the DBSCAN algorithm has converged; and, if convergence has been achieved, obtaining the multiple further refined third-level clusters.
10. The system according to claim 8 or 9, wherein determining whether the result of the multi-level clustering meets the predetermined conditions includes: determining whether the number of rounds of multi-level clustering has reached a round threshold and whether the number of images included in the third-level clusters exceeds a first image number threshold; and, if the number of rounds has not reached the round threshold or the number of images exceeds the first image number threshold, performing another round of multi-level clustering, wherein performing one round of multi-level clustering includes performing the second-level clustering and performing the third-level clustering.
11. The system according to claim 8, wherein: the system further includes a first database that supports streaming reads, and the feature vectors and attribute information of images acquired in the streaming manner are extracted in advance from the corresponding images and stored in the first database; and the feature vectors and attribute information of images acquired in the non-streaming manner are extracted from the corresponding images in real time.
12. The system according to claim 8, wherein the multiple contents include images and other information related to the images, and wherein the data acquisition module is further configured to perform, according to the attribute information, sparse sampling on the images included in the acquired multiple contents, so that only one feature vector is obtained for the multiple images included in the same content.
13. The system according to claim 12, wherein: the data acquisition module is further configured to: obtain an identifier of the content to which each image belongs, wherein each content includes one or more images, feature vectors of the one or more images, and attribute information of the one or more images; and, when the number of images in a third-level cluster is lower than a second image number threshold, obtain, based on the identifiers of the contents corresponding to the images in that third-level cluster, feature vectors and attribute information of images from the corresponding contents for data augmentation; and the first clustering module and the second clustering module are further configured to perform the first-level clustering, the second-level clustering, and the third-level clustering on the feature vectors and attribute information of the data-augmented images.
14. The system according to claim 8, wherein the system further includes a second database that supports streaming read and write operations, the second database being configured such that: after the first-level clustering is performed, the feature vectors of the images and related information are stored in the second database, the related information including one or more of the following: an identifier of the content to which an image belongs, an identifier of the first-level cluster to which the image belongs, an image identifier, and the attribute information of the image, wherein the content includes the image and other information related to the image; and the feature vectors and attribute information of the images are streamed from the second database to perform the second-level clustering and the third-level clustering.
15. An apparatus for unsupervised multi-level clustering of images, comprising: a memory that stores instructions; and a processor configured to execute the instructions stored in the memory to perform the method according to any one of claims 1 to 7.
16. A computer-readable storage medium comprising computer-executable instructions, which, when executed by one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 7.
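The flow recited in claims 1 and 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the k-means step runs on the CPU in pure Python as a stand-in for the GPU execution recited in the claims, and every function name, the farthest-point initialization, the rule of shrinking `eps` when a cluster exceeds `max_size`, and all parameter values are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict

def kmeans(points, k, iters=20):
    """Second level: plain CPU k-means (stand-in for the GPU execution in the claims)."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster emptied out
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; label -1 marks noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels

def adaptive_dbscan(points, eps=1.0, min_pts=2, max_size=50, shrink=0.8, max_runs=10):
    """Third level: rerun DBSCAN until the cluster sizes stop changing."""
    prev_sizes, labels = None, []
    for _ in range(max_runs):
        labels = dbscan(points, eps, min_pts)
        sizes = sorted(Counter(lab for lab in labels if lab != -1).values())
        if sizes == prev_sizes:  # convergence check on cluster sizes
            break
        prev_sizes = sizes
        if sizes and max(sizes) > max_size:
            eps *= shrink  # hypothetical rule: tighten eps when a cluster is too large
    return labels

def multilevel_cluster(images, k=2, **dbscan_kwargs):
    """images: list of (image_id, attribute, feature_vector) tuples."""
    level1 = defaultdict(list)
    for img in images:
        level1[img[1]].append(img)  # first level: group by attribute
    result = {}
    for attr, group in level1.items():
        level2 = kmeans([feat for _, _, feat in group], k)
        for c2 in sorted(set(level2)):
            sub = [img for img, lab in zip(group, level2) if lab == c2]
            level3 = adaptive_dbscan([feat for _, _, feat in sub], **dbscan_kwargs)
            for (img_id, _, _), c3 in zip(sub, level3):
                result[img_id] = (attr, c2, c3)
    return result

# Toy data: two attributes, each with two well-separated feature blobs.
blobs = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (10.0, 10.0), (10.0, 10.1), (10.1, 10.0)]
images = [(f"r{i}", "red", p) for i, p in enumerate(blobs)]
images += [(f"b{i}", "blue", p) for i, p in enumerate(blobs)]
result = multilevel_cluster(images, k=2)
```

In this toy run each image is mapped to a triple of first-level, second-level, and third-level labels, so images with near-identical feature vectors but different attributes still land in different groups. The round-based stopping test of claim 3 and the data augmentation of claim 6 are left out to keep the sketch short.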