Object storage file association tagging cleanup method, apparatus, and electronic device
The method obtains the object list of an object storage space and generates object tags from each object's storage type and parsed file content. This addresses the lack of automated identification and association in prior-art object storage file management, and improves identification accuracy, batch processing efficiency, and traceability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHONGKE TIMES (SHENZHEN) COMPUTER SYST CO LTD
- Filing Date
- 2026-05-15
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, object storage file management relies on manual identification and configuration, making it difficult to automatically parse file content attributes. It also lacks automatic association and tag transfer mechanisms, resulting in low identification efficiency, poor accuracy, and a tendency for anomalies to occur during batch processing. Furthermore, it lacks a tracking mechanism.
The method obtains the object list of the object storage space, determines candidate objects based on storage type, performs file type identification and content parsing, generates object tags, searches the object list for related objects, and generates a tagging queue. It supports tag writing and report generation in both simulation and production modes.
It improves the accuracy of object recognition and the efficiency of association tagging, enhances the traceability of batch processing, reduces access anomalies, and realizes automated file management.
[Figure: abstract drawing, CN122240566A_ABST]
Description
Technical Field
[0001] This application relates to the field of data management technology, and in particular to a method, apparatus and electronic device for object storage file association tagging and cleaning.
Background Technology
[0002] With the continuous growth of documents, images, tool resources, and business files in enterprise R&D, production, and customer service processes, object storage services have become a crucial infrastructure for centralized storage and access to cloud resources. Object storage typically organizes files through buckets and object paths, and supports management capabilities such as object tags, storage types, and lifecycle rules to perform processing such as storage level adjustments, expiration cleanup, or access control for different types of files.
[0003] In existing technologies, file management in object storage typically relies on manual filtering based on directory, file name, upload time, or access frequency, and manual configuration of lifecycle rules to achieve low-frequency file dumping or historical file cleanup. For some business resources, although the subsequent processing method can be determined by the environment information, version information, or usage information in the file content, the existing management method has difficulty in automatically reading and parsing the file content, and it is also difficult to synchronously associate the attributes parsed from the configuration file with the corresponding image files, binary files, and other resource objects.
[0004] Therefore, existing technologies have at least the following problems: the number of files in object storage is large and the types are complex, making manual identification and tagging inefficient; processing based solely on directory or time makes it difficult to accurately reflect the true attributes of files; there is a lack of automatic matching and tag transfer mechanisms between configuration files and associated resource files; different storage types of objects have different content access capabilities, making it easy for processing anomalies to occur during batch reading; and online batch tag changes lack simulation verification, log recording, and report tracking mechanisms, making it difficult to meet the needs of automated governance of object storage files.
Summary of the Invention
[0005] In view of this, embodiments of this application provide a method, apparatus, and electronic device for object storage file association tagging and cleaning, in order to solve the problems of difficulty in identifying content attributes, inaccurate tagging of associated objects, and lack of tracking of batch changes in the prior art.
[0006] A first aspect of this application provides a method for associating and tagging cleanup of object storage files, comprising: obtaining a list of objects under a target path in a target object storage space, and extracting the object name, storage type, and object path of each object; determining the content access status of each object according to the storage type, and identifying objects that meet preset reading conditions as candidate objects; performing file type identification on the candidate objects, identifying text configuration objects, reading the file content of the text configuration objects, and extracting environment identifiers based on preset field parsing rules; generating object tags corresponding to the text configuration objects according to the environment identifiers, and extracting base names according to the object names of the text configuration objects; searching for associated objects in the object list whose object names and base names meet preset matching relationships, associating object tags with associated objects, and generating a queue of objects to be tagged; determining the current running mode according to running parameters, writing object tags according to the queue of objects to be tagged in the formal running mode, and generating a record of tags to be written in the simulated running mode; generating an object processing report according to the tag processing record and the object list, and using the objects with written object tags as processing objects of lifecycle rules.
[0007] A second aspect of this application provides an object storage file association tagging and cleanup apparatus, comprising: an acquisition module, configured to acquire a list of objects under a target path in a target object storage space, and extract the object name, storage type, and object path of each object; a determination module, configured to determine the content access status of each object according to the storage type, and determine objects that meet preset reading conditions as candidate objects; an identification module, configured to perform file type identification on the candidate objects, determine text configuration objects, read the file content of the text configuration objects, and extract environment identifiers based on preset field parsing rules; an extraction module, configured to generate object tags corresponding to the text configuration objects according to the environment identifiers, and extract base names according to the object names of the text configuration objects; an association module, configured to search for associated objects in the object list whose object names and base names meet preset matching relationships, associate object tags with associated objects, and generate a queue of objects to be tagged; a writing module, configured to determine the current running mode according to running parameters, write object tags according to the queue of objects to be tagged in the formal running mode, and generate a record of tags to be written in the simulated running mode; and a generation module, configured to generate an object processing report according to the tag processing record and the object list, and use the objects with written object tags as processing objects of lifecycle rules.
[0008] A third aspect of this application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the above-described method.
[0009] The above-described technical solutions adopted in the embodiments of this application can achieve the following beneficial effects. The application acquires a list of objects under the target path in the target object storage space and extracts the object name, storage type, and object path of each object. Based on the storage type, it determines the content access status of each object and identifies objects that meet preset reading conditions as candidate objects. For candidate objects, it performs file type identification to determine text configuration objects, reads the file content of the text configuration objects, and extracts environment identifiers based on preset field parsing rules. Based on the environment identifiers, it generates object tags corresponding to the text configuration objects and extracts the base name from the object name of each text configuration object. It searches the object list for associated objects whose object names and base names meet preset matching relationships, associates object tags with the associated objects, and generates a queue of objects to be tagged. Based on the running parameters, it determines the current running mode; in the formal running mode, it writes object tags according to the queue of objects to be tagged; in the simulated running mode, it generates a record of tags to be written. Based on the tag processing record and the object list, it generates an object processing report and uses objects with written object tags as processing objects for lifecycle rules. Together, these steps improve object recognition accuracy, enhance association tagging efficiency, and strengthen the traceability of batch processing.
Attached Figure Description
[0010] To more clearly illustrate the technical solutions in the embodiments of this application, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0011] Figure 1 is a flowchart illustrating the object storage file association tagging and cleanup method provided in an embodiment of this application; Figure 2 is a schematic diagram of the object storage file association tagging and cleanup apparatus provided in the embodiments of this application; Figure 3 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0012] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of this application. However, those skilled in the art will understand that this application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this application with unnecessary detail.
[0013] In existing technologies, object storage services are typically used to centrally store documents, images, tool resources, and other business files generated during enterprise R&D, production, and customer service. As the number of resources in object storage continues to increase, current management methods largely rely on external information such as directory paths, file names, upload times, and access frequencies to filter objects. They also manually configure lifecycle rules to adjust the storage type, archive, or clean up some infrequently accessed or historical files. While this approach achieves a certain level of object storage management, it primarily depends on manual experience and static rules, making it difficult to fully identify the environmental information, usage information, and relationships contained within object files.
[0014] The existing technology has several problems. Object storage contains a large number of complex file types, and processing based solely on directory or time makes it difficult to accurately reflect the true attributes of files. For environment identifiers and other information contained in text configuration files, existing methods struggle to automatically parse and convert them into object tags. For image files or other resource files with name associations to text configuration files, existing methods cannot automatically associate and synchronously tag them based on the parsing results of the configuration files. Furthermore, objects of different storage types have different content access capabilities, and the lack of storage type detection during batch reads can easily lead to access anomalies. In addition, the lack of simulation verification, logging, and report output mechanisms for online batch tagging and cleanup operations hinders the tracking and verification of the operation process. Therefore, the technical problem this application addresses is: how to automatically identify file content attributes in an object storage environment and associate the identification results with the corresponding objects to achieve traceable object tagging and lifecycle management.
[0015] To address the aforementioned technical problems, this application provides a method for object storage file association tagging and cleanup. This method first obtains a list of objects under a target path in the target object storage space and extracts the object name, storage type, and object path of each object. Then, it determines the content access status of each object based on its storage type, identifying objects that meet preset reading conditions as candidate objects to avoid performing content access operations on objects that do not support direct reading. Next, it identifies the file type of the candidate objects, determines text configuration objects, reads the file content of the text configuration objects, and extracts environment identifiers from the file content according to preset field parsing rules.
[0016] After obtaining the environment identifier, this application generates object tags corresponding to the text configuration object based on the environment identifier, and extracts the base name from the object name of the text configuration object. It then searches the object list for associated objects whose object names and base names satisfy a preset matching relationship, associates the object tags corresponding to the text configuration object with the associated objects, and generates a queue of objects to be tagged. Thus, the system can not only parse and tag text configuration objects, but also synchronously pass the tag information parsed from the text configuration objects to the related image objects or resource objects, based on the matching relationship between object names.
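The association step can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes the "preset matching relationship" is simply that two objects share the same file name once the directory and the final extension are removed, and the names `base_name` and `build_tag_queue` are hypothetical.

```python
from pathlib import PurePosixPath


def base_name(object_name: str) -> str:
    """Strip the directory part and the final extension from an object name."""
    return PurePosixPath(object_name).stem


def build_tag_queue(config_tags: dict[str, dict[str, str]],
                    object_names: list[str]) -> list[tuple[str, dict[str, str]]]:
    """Pair each listed object with the tags of the config object sharing its base name.

    config_tags maps the name of a text configuration object to its tags.
    The matching rule (identical base name) is an assumption for illustration.
    """
    tags_by_base = {base_name(name): tags for name, tags in config_tags.items()}
    queue = []
    for name in object_names:
        tags = tags_by_base.get(base_name(name))
        if tags is not None:
            queue.append((name, tags))
    return queue


objects = ["v1/fw-a.txt", "v1/fw-a.bin", "v1/fw-b.bin"]
queue = build_tag_queue({"v1/fw-a.txt": {"env": "test"}}, objects)
```

Here the configuration file and the image that shares its base name both enter the queue, while `v1/fw-b.bin` has no matching configuration object and is left untagged.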
[0017] Furthermore, this application determines the current operating mode based on operating parameters. In formal operating mode, the system calls the object tag writing interface according to the queue of objects to be tagged, and writes the object tags to the corresponding objects; in simulated operating mode, the system only generates records of tags to be written, and does not submit actual tag writing requests. After the tagging process is completed, the system generates an object processing report based on the tag processing records and the object list, and uses the objects with written object tags as the processing objects of lifecycle rules, so that subsequent storage type adjustment, archiving, or cleanup operations can be performed based on object tags.
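The split between the two operating modes might look like the sketch below. The tag-writing interface is passed in as a callable because the source does not name a specific storage API; `process_queue`, the record fields, and the status strings are all assumptions made for illustration.

```python
from typing import Callable

TagQueue = list[tuple[str, dict[str, str]]]


def process_queue(queue: TagQueue, dry_run: bool,
                  write_tags: Callable[[str, dict[str, str]], None]) -> list[dict]:
    """Write tags in formal mode; only record the intended writes in simulated mode."""
    records = []
    for object_path, tags in queue:
        if dry_run:
            status = "pending"   # simulated mode: no write request is submitted
        else:
            write_tags(object_path, tags)
            status = "written"
        records.append({"object": object_path, "tags": tags, "status": status})
    return records


written = []  # stands in for the storage service in this example
queue = [("v1/fw-a.bin", {"env": "test"})]
simulated = process_queue(queue, True, lambda path, tags: written.append(path))
formal = process_queue(queue, False, lambda path, tags: written.append(path))
```

Both modes return records of the same shape, so the same report generator can consume either a simulated or a formal run.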
[0018] Through the above technical solutions, this application converts the content attributes of object files into tag data that the object storage system can recognize, and tags related objects synchronously based on file name associations, thereby improving the accuracy of object recognition and the efficiency of association tagging. Checking the storage type before reading reduces access anomalies during batch content reads. Simulated runs, log records, and object processing reports record and provide feedback on the batch tag change process, thereby enhancing the traceability of batch processing of object storage files.
[0019] The technical solution of this application will now be described in detail with reference to the accompanying drawings and specific embodiments.
[0020] Figure 1 is a flowchart illustrating the object storage file association tagging and cleanup method provided in an embodiment of this application. As shown in Figure 1, the method may specifically include:
S101: obtain the list of objects under the target path in the target object storage space, and extract the object name, storage type, and object path of each object;
S102: determine the content access status of each object according to the storage type, and identify the objects that meet the preset reading conditions as candidate objects;
S103: perform file type identification on the candidate objects, determine text configuration objects, read the file content of the text configuration objects, and extract environment identifiers based on preset field parsing rules;
S104: generate object tags corresponding to the text configuration objects based on the environment identifiers, and extract base names from the object names of the text configuration objects;
S105: find the associated objects in the object list whose object names and base names satisfy the preset matching relationship, associate the object tags with the associated objects, and generate a queue of objects to be tagged;
S106: determine the current operating mode according to the operating parameters, write object tags according to the queue of objects to be tagged in the formal operating mode, and generate a record of tags to be written in the simulated operating mode;
S107: generate an object processing report based on the tag processing records and the object list, and use the objects that have been written with object tags as the processing objects of the lifecycle rules.
[0021] In some embodiments, obtaining a list of objects under a target path in the target object storage space, and extracting the object name, storage type, and object path of each object, includes: parsing the target service address, target object storage space, and target path from the runtime parameters, loading the object storage access configuration, and establishing an access connection with the target object storage space; calling the object enumeration interface based on the target path to obtain the original object list; and extracting and formatting the object metadata in the original object list to obtain an object list containing the object name, storage type, and object path, and writing the object list into the memory state data.
[0022] Specifically, during the execution of the object storage file association tagging and cleanup method, the system first parses the runtime parameters carried in the startup command or task configuration. These parameters include the target service address, target object storage space, target path, and simulation run identifier. The target service address determines the access point for the object storage service, the target object storage space determines the storage bucket to be processed, and the target path limits the directory to be processed. After parsing the parameters, the system performs format validation and default handling on each parameter. If the target path does not end with a uniform path separator, it is padded according to preset path rules to ensure that the object enumeration interface returns object results according to the same directory.
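A minimal sketch of this parameter handling, using Python's standard `argparse` module. The flag names (`--endpoint`, `--bucket`, `--prefix`, `--dry-run`), the empty default prefix, and the `/` separator are assumptions chosen for illustration; the source does not specify them.

```python
import argparse


def parse_run_parameters(argv: list[str]) -> argparse.Namespace:
    """Parse and normalize the runtime parameters of one tagging task."""
    parser = argparse.ArgumentParser(description="Tag objects under a target path")
    parser.add_argument("--endpoint", required=True, help="target service address")
    parser.add_argument("--bucket", required=True, help="target object storage space")
    parser.add_argument("--prefix", default="", help="target path")
    parser.add_argument("--dry-run", action="store_true", help="simulation run flag")
    args = parser.parse_args(argv)
    # Pad the target path with a trailing separator so enumeration stays inside
    # one directory: the prefix "products/v1" would also match "products/v10/...".
    if args.prefix and not args.prefix.endswith("/"):
        args.prefix += "/"
    return args


args = parse_run_parameters(
    ["--endpoint", "https://storage.example.com", "--bucket", "rd-assets",
     "--prefix", "products/v1", "--dry-run"]
)
```

After parsing, `args.prefix` is `products/v1/`, which is the form handed to the enumeration step.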
[0023] After determining the target object storage space and target path, the system loads a pre-configured object storage access configuration, which includes access credentials, region information, and connection control parameters. The system initializes the object storage access management component based on the access configuration and establishes an access connection with the target object storage space. For scenarios involving batch processing of a large number of files, the access management component maintains reusable connection states to reduce the overhead of repeated connections during continuous enumeration, reading, and tagging processes. If the access configuration loading fails, the credentials are invalid, or the target object storage space is inaccessible, the system writes the exception information to the runtime log and stops subsequent processes.
[0024] Furthermore, after the access connection is established, the system calls the object enumeration interface with the target path as a prefix condition to obtain the original object list under the target path. Taking the R&D images and tool configuration resources in the company's object storage as an example, the target path can correspond to a product version directory, which simultaneously stores text configuration files, image files, compressed resources, and historical backup files. In the original object list returned by the object enumeration interface, each object carries metadata such as object key name, storage type, object size, and update time. The system iterates through the original object list item by item, extracting the object name, storage type, and object path for subsequent processing, and converts the path prefix, file name field, and storage type field into a unified structure.
[0025] During field extraction, the system performs consistency checks based on the inclusion relationship between object paths and object names to avoid incorrect object directory assignments due to path truncation or prefix matching errors. For abnormal objects with empty names, paths not belonging to the target path, or missing metadata, the system marks them as an exception record and writes the exception reason to the runtime log. For objects with normal formats, the system retains their storage type as a unified field for later determination of whether the object's content can be directly read; its object path as a unified field for restoring the object's location in subsequent reports; and its object name as a unified field for subsequent text configuration object identification and associated object matching.
[0026] After processing the original object list, the system generates an object list under the target path and writes the object list to the memory state data. The memory state data uses a key-value structure to maintain the processing status of this task, where the object path is used as the basis for object location, the object name is used as the basis for object matching, and the storage type is used as the basis for content access determination.
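The field extraction, consistency checks, and key-value state described above could be sketched as follows. The raw entry fields `Key` and `StorageClass` follow a common S3-style listing format and are an assumption, as are the function name and the exception reason strings.

```python
from pathlib import PurePosixPath


def normalize_listing(raw_objects: list[dict], target_path: str):
    """Split a raw listing into a state dict keyed by object path and a list of exceptions."""
    state, exceptions = {}, []
    for entry in raw_objects:
        key = entry.get("Key") or ""
        storage_type = entry.get("StorageClass")
        name = PurePosixPath(key).name
        if not name or key.endswith("/"):
            exceptions.append((key, "empty object name"))
        elif not key.startswith(target_path):
            exceptions.append((key, "outside target path"))
        elif not storage_type:
            exceptions.append((key, "missing storage type"))
        else:
            # Unified record: path locates the object, name drives matching,
            # storage type drives the later content access decision.
            state[key] = {"name": name, "storage_type": storage_type, "path": key}
    return state, exceptions


raw = [
    {"Key": "products/v1/fw-a.txt", "StorageClass": "STANDARD"},
    {"Key": "products/v10/fw-x.bin", "StorageClass": "STANDARD"},
    {"Key": "products/v1/fw-a.bin"},
]
state, exceptions = normalize_listing(raw, "products/v1/")
```

Only the first entry reaches the state data; the other two are returned as exception records together with the reason they were rejected.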
[0027] Through the above implementation methods, this embodiment can reuse the same object list in subsequent content reading, environment identifier parsing, associated object search, tag writing and report generation processes, reduce data inconsistency caused by repeated object listing, and provide traceable basic data for batch object processing.
[0028] In some embodiments, determining the content access status of each object based on the storage type, and identifying objects that meet preset read conditions as candidate objects, includes: matching the storage type of each object against the preset set of readable storage types to determine whether the file content of each object may be read; marking objects whose file content may be read as accessible objects, and identifying accessible objects as candidate objects; and marking objects whose file content may not be read as restricted access objects, and writing the object name, object path, and storage type of the restricted access objects to the memory state data and the runtime log.
[0029] Specifically, after the object list is written to the memory state data, the system reads the storage type field of each object item by item in the object list and matches this storage type field with a preset set of readable storage types. This preset set of readable storage types is pre-configured by the object storage service's content access rules and represents storage types that can directly read file content. In some examples, standard storage and infrequently accessed storage can directly access object content and are therefore configured as readable storage types; archive storage or other storage types that require recovery before reading are not configured as readable storage types. The system completes this matching process before performing content analysis to avoid initiating read requests for objects that cannot be directly accessed when subsequently reading text configuration files.
[0030] In the specific processing, the system retrieves an object record from the memory state data and reads the object name, object path, and storage type from that record. If the storage type matches a preset set of readable storage types, the system sets the content access status of the object to accessible and marks the object as an accessible object. Accessible objects then enter the candidate object set for subsequent file type identification, text content reading, and environment identifier parsing.
[0031] For example, if an environment configuration text file, an image file, and a compressed resource file exist in the target path, and the environment configuration text file is in standard storage, the system can use it as a candidate for subsequent content reading; if some image files are in low-frequency access storage, although the system can determine that they have direct access capabilities, it can still decide whether to read the content or only participate in association matching based on the subsequent file type identification results.
[0032] If an object's storage type does not match the preset set of readable storage types, the system sets the object's content access status to restricted access and marks the object as a restricted access object. For restricted access objects, the system does not call the object read interface to obtain its file content. Instead, it writes the object name, object path, and storage type into the memory status data and records in the runtime log that the object was skipped because its storage type does not meet the read conditions. Taking historical image resources in a bucket as an example, some earlier versions of binary image files may have been converted to archive storage. When the system traverses objects of this type, it only records their asset information and does not perform content read operations. The distribution of their storage types and their processing status can be displayed in subsequent reports.
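The storage type check reduces to a set membership test performed before any read request. In the sketch below, the storage type names `STANDARD`, `STANDARD_IA`, and `ARCHIVE` are illustrative placeholders; the actual readable set would come from the content access rules of the storage service in use.

```python
import logging

logger = logging.getLogger("tagging")

# Illustrative values only: standard and infrequent-access storage are readable.
READABLE_STORAGE_TYPES = frozenset({"STANDARD", "STANDARD_IA"})


def classify_by_storage_type(state: dict[str, dict]) -> list[str]:
    """Mark each record accessible or restricted and return the candidate paths."""
    candidates = []
    for path, record in state.items():
        if record["storage_type"] in READABLE_STORAGE_TYPES:
            record["access"] = "accessible"
            candidates.append(path)
        else:
            # No read request is issued; the skip is recorded for the report.
            record["access"] = "restricted"
            logger.info("skipped %s (%s): storage type %s is not directly readable",
                        record["name"], path, record["storage_type"])
    return candidates


state = {
    "products/v1/fw-a.txt": {"name": "fw-a.txt", "storage_type": "STANDARD"},
    "products/v0/fw-old.bin": {"name": "fw-old.bin", "storage_type": "ARCHIVE"},
}
candidates = classify_by_storage_type(state)
```

The archived image stays in the state data with its `restricted` status, so it can still appear in the report and take part in name matching without ever being read.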
[0033] Content access status can also serve as a status field for subsequent tagging and report generation. For accessible objects, the system records whether text recognition, environment identifier extraction, and tag association have been completed during subsequent processing. For restricted access objects, the system records the reason why they did not enter the content analysis process. Thus, the operation report can distinguish between analyzed objects, accessible objects that did not match the text configuration attributes, and restricted access objects, enabling administrators to determine the coverage of the object list and the reasons for non-processing based on the report.
[0034] Through the above embodiments, this application determines the access capability of objects based on storage type before content analysis, so that subsequent text reading and environment identifier parsing only apply to objects that meet the reading conditions, reducing interface exceptions caused by reading archived objects; at the same time, the name, path and storage type of restricted access objects are synchronously written into memory status data and running logs, improving the stability and traceability of the batch object analysis process.
[0035] In some embodiments, performing file type identification on candidate objects to determine text configuration objects, reading the file content of the text configuration objects, and extracting environment identifiers based on preset field parsing rules, includes: filtering the candidate objects based on their file type characteristics to identify text configuration objects with text configuration attributes; calling the object reading interface to obtain the file content of each text configuration object, and then performing character format standardization and invalid content filtering on the file content; and matching the environment field in the file content according to the preset field parsing rules, extracting the environment identifier corresponding to the text configuration object, and writing the environment identifier into the memory state data.
[0036] Specifically, after the candidate object set is generated, the system filters the candidate objects according to object name, object path, and file type characteristics to identify text configuration objects with text configuration attributes. File type characteristics can include file extension, object name structure, object content type identifier, and resource classification information in the path. In some examples, configuration manual files, binary image files, tool archives, and historical resource files often coexist in the target path. The system prioritizes identifying text-formatted configuration files and uses them as objects for subsequent content analysis. For objects such as image files and compressed resource files that are not suitable as sources for environment field parsing, text content parsing is not performed; instead, they are retained in the object list for subsequent association and matching.
[0037] After identifying the text configuration object, the system calls the object reading interface to retrieve its file content. Since configuration files in different business directories may be generated by different R&D or production processes, the file content may exhibit inconsistent character encoding, different line break formats, excessive blank fields, or numerous comments. Therefore, after reading the file content, the system first performs character format unification, converting the file content into a unified text processing format and filtering out blank lines, invalid delimiters, unrecognizable characters, and content irrelevant to environment identification. For text configuration objects that fail to read, have empty content, or exhibit obviously abnormal content format, the system writes the object's reading status and the reason for the exception to the memory status data and the runtime log, and stops extracting the environment identifier for that object.
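The character format unification and filtering step might be sketched as below. It assumes UTF-8 input (optionally with a byte order mark) and `#` as the comment marker; both are assumptions, since the source does not fix an encoding or a comment syntax.

```python
def normalize_config_text(raw: bytes) -> list[str]:
    """Decode a config file and return its meaningful lines in a uniform format."""
    # "utf-8-sig" drops a leading byte order mark; undecodable bytes become U+FFFD.
    text = raw.decode("utf-8-sig", errors="replace")
    lines = []
    for line in text.splitlines():  # handles \n, \r\n and \r line endings
        line = line.replace("\ufffd", "").strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        lines.append(line)
    return lines


raw = b"\xef\xbb\xbf# build info\r\n\r\nenv = test\r\nversion=1.2\xff\n"
lines = normalize_config_text(raw)
```

An empty result corresponds to the "empty content" case, which the caller would record as an exception instead of passing on to field parsing.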
[0038] After the file content is preprocessed, the system matches environment fields in the file content according to preset field parsing rules. These preset field parsing rules describe the field names, boundaries, and value formats of environment fields in the configuration file, and can be implemented using field matching rules. In some examples, text configuration files can record information about the runtime, release, or test environments corresponding to image resources. The system extracts the corresponding environment identifiers from the text content through field matching and generates the data source for subsequent object tags based on the extraction results. For example, when a configuration file in a product version directory records its corresponding environment as a test environment, the system uses this test environment information as the environment identifier for the text configuration object; when the configuration file records its corresponding environment as a production environment, the system uses this production environment information as the environment identifier for the text configuration object.
[0039] In some examples, to avoid erroneous extraction, the system can also perform validity checks on the matched environment fields. If multiple candidate environment fields are matched in the same text configuration object, the system determines the target environment field according to the preset field priority or the field's location. If the value of the environment field does not meet the preset value format, the system marks the object as a field parsing exception object and records its object name, object path, and exception type in the runtime log. For text configuration objects whose environment identifiers are successfully extracted, the system writes the environment identifier, object name, and object path together into the memory state data, so that subsequent object tag generation, basic name extraction, and associated object lookup can reference the same parsing result.
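A minimal sketch of field matching with a validity check, assuming the file content has already been normalized into a list of `key = value` or `key: value` lines. The field names, their priority order, and the allowed values are hypothetical; the application only states that such rules are preset.

```python
import re

# Hypothetical rules: candidate field names in priority order, and allowed values.
FIELD_PRIORITY = ["env", "environment", "deploy_env"]
ALLOWED_VALUES = {"test", "staging", "production"}
FIELD_PATTERN = re.compile(r"^\s*([A-Za-z_][\w.-]*)\s*[=:]\s*(\S+)\s*$")


def extract_environment(lines):
    """Return (environment, status); environment is None when parsing fails."""
    candidates = {}
    for line in lines:
        match = FIELD_PATTERN.match(line)
        if not match:
            continue
        name = match.group(1).lower()
        value = match.group(2).strip("\"'").lower()
        # Keep the first occurrence of each field name (position-based tie-break).
        if name in FIELD_PRIORITY and name not in candidates:
            candidates[name] = value

    if not candidates:
        return None, "no_environment_field"

    # Among the matched fields, pick the one with the highest preset priority.
    chosen = min(candidates, key=FIELD_PRIORITY.index)
    value = candidates[chosen]
    if value not in ALLOWED_VALUES:
        return None, "field_parsing_exception"
    return value, "ok"
```

The status string is what a caller would record in the runtime log together with the object name and object path.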
[0040] Through the above embodiments, this application can accurately filter text configuration objects from candidate objects, perform normalized reading and field parsing on the file content of the text configuration objects, convert the environment information in the configuration file into an environment identifier that can be used in the subsequent tagging process, improve the accuracy of file content attribute identification, and provide a stable data foundation for the synchronous tagging of associated image objects.
[0041] In some embodiments, generating an object tag corresponding to the text configuration object based on the environment identifier, and extracting the base name based on the object name of the text configuration object, includes: matching the environment identifier against preset tag mapping rules to generate an object tag containing a tag name and a tag value, and establishing a correspondence between the text configuration object and the object tag; and separating the path field and the file type field from the object name of the text configuration object, extracting the base name used for associated object lookup, associating the base name with the object tag, and writing them into the memory state data.
[0042] Specifically, after extracting the environment identifier from the text configuration object, the system standardizes the environment identifier according to preset tag mapping rules. These preset tag mapping rules define the correspondence between environment identifiers and object tags, including standardized expression of tag names, standard format of tag values, and handling of abnormal values. In some examples, after parsing the environment identifier representing the image's runtime environment from the text configuration file, the system does not directly write the original field to object storage. Instead, it first converts the environment identifier into an object tag recognizable by the object storage lifecycle rules. For example, if an environment field recorded in a configuration file corresponds to a test environment, the system maps it to a preset environment class tag and its corresponding value; if the environment field corresponds to a production environment or other release environment, the system generates different values for the same type of tag according to the same mapping rule, creating a unified tag structure for configuration files from different sources.
[0043] When generating object tags, the system simultaneously establishes a correspondence between text configuration objects and object tags. This correspondence includes the object path, object name, parsed environment identifier, and generated object tag of the text configuration object. The system writes this correspondence into the memory state data, ensuring that subsequent object lookups, tagging queue generation, and operation report statistics all reference the same tag source. For text configuration objects with empty environment identifiers, those that cannot match preset tag mapping rules, or those with conflicting values, the system does not generate valid object tags but instead records the tag generation status of the object in the memory state data and runtime logs.
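The mapping step and the correspondence record can be illustrated roughly as below. The tag key `env`, the mapping table, and the status strings are assumptions made for the example; handling of conflicting values is not covered.

```python
# Hypothetical mapping rules: several source spellings map to one standard value.
TAG_KEY = "env"
TAG_VALUE_BY_ENVIRONMENT = {
    "test": "test",
    "testing": "test",
    "production": "prod",
    "release": "prod",
}


def build_tag_record(object_path, object_name, environment):
    """Map an environment identifier to a tag and keep the source relationship."""
    record = {
        "object_path": object_path,
        "object_name": object_name,
        "environment": environment,
        "tag": None,
        "tag_status": "generated",
    }
    if not environment:
        # No valid tag is generated; only the status is recorded.
        record["tag_status"] = "empty_environment"
        return record
    value = TAG_VALUE_BY_ENVIRONMENT.get(environment.strip().lower())
    if value is None:
        record["tag_status"] = "unmapped_environment"
        return record
    record["tag"] = {TAG_KEY: value}
    return record
```

Keeping the raw identifier next to the generated tag lets later steps (queue generation, report statistics) trace every tag back to its source.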
[0044] After generating object tags, the system further extracts the base name from the text configuration object's name. Specifically, the system first separates the path field from the filename field in the complete object name according to object path separation rules. Then, based on the file type identification result, it removes the file type field from the filename field, obtaining a base name without the path and file type suffix. In some cases, when both a text configuration file and its corresponding binary image file exist under the target path, they often have the same or similar filename prefixes. By extracting the base name of the text configuration object, the system provides a matching basis for subsequently searching the object list for image objects with the same name or prefix.
[0045] In some examples, the system can also perform format standardization on the base name to remove invalid separators, standardize capitalization, and eliminate redundant characters in the version field, enabling the base name to adapt to naming differences between configuration files and image files in different directories. After the base name is generated, the system writes the base name, object tag, and object name of the text configuration object together into the memory state data, forming an intermediate record for association matching. This intermediate record serves as input for association object lookup in subsequent steps, allowing the system to continue passing the object tag obtained from parsing the text configuration object content to the name-matching image object or other associated resource objects.
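A small sketch of base name extraction and normalization, assuming `/`-separated object keys. The suffix list and the normalization choices (lowercasing, collapsing separators) are hypothetical, and cleanup of redundant characters in the version field is not attempted.

```python
import posixpath
import re

# Hypothetical file type suffixes recognized as text configuration files.
CONFIG_SUFFIXES = (".cfg", ".conf", ".ini", ".env", ".txt")


def extract_base_name(object_key: str) -> str:
    """Strip the path and the file type suffix, then normalize the remainder."""
    # Separate the path field from the filename field.
    filename = posixpath.basename(object_key).lower()
    # Remove the file type field only when it is a known configuration suffix,
    # so that a dotted version such as "1.2.0" is not truncated by mistake.
    for suffix in CONFIG_SUFFIXES:
        if filename.endswith(suffix):
            filename = filename[: -len(suffix)]
            break
    # Collapse runs of spaces, underscores, and hyphens into a single hyphen.
    return re.sub(r"[\s_\-]+", "-", filename).strip("-")
```

Applying the same normalization to candidate image object names before comparison keeps the two sides of the match consistent.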
[0046] Through the above embodiments, this application can convert the environment identifier in the text configuration object into a unified object tag, extract the base name from the object name of the text configuration object, and simultaneously solidify the tag source and object name association basis in the memory state data, thereby improving the standardization of tag generation and the accuracy of association matching, and providing a stable data foundation for subsequent object tagging and lifecycle rule processing.
[0047] In some embodiments, searching the object list for associated objects whose object names and the base name satisfy a preset matching relationship, associating the object tag with the associated objects, and generating a queue of objects to be tagged, includes: matching object names in the object list based on the base name to identify associated objects that have a name association with the text configuration object; using the object tag corresponding to the text configuration object as the tag to be written to the associated object, and establishing an association record between the associated object, the object tag, and the text configuration object; and generating, based on the association records, a queue of objects to be tagged that contains the text configuration objects and the associated objects.
[0048] Specifically, after generating object tags and extracting base names for text configuration objects, the system uses the base names as the basis for association matching, matching the object names of each object in the object list corresponding to the target path. During the matching process, the system first limits the objects to be matched to belong to the same target path or the same business resource directory, then performs path field stripping and file type field identification on the object names of the objects to be matched to obtain the base names to be matched, and then judges the consistency or prefix relationship between the base names to be matched and the base names of the text configuration objects.
[0049] For example, in some cases, the target path can contain both an environment configuration text file and its corresponding binary image file. The text configuration file is used to record the image's runtime environment. The image file itself is usually not suitable for direct reading. The system automatically searches for image files with the same name or prefix based on the base name of the configuration text file, so that the image file can inherit the environment class object tag obtained by parsing the configuration text file.
[0050] When performing object name matching, the system excludes text configuration objects themselves and irrelevant resource objects. Objects that match the base name according to a preset relationship and whose file type characteristics conform to resource object attributes are identified as associated objects. Objects with similar names but inconsistent paths, mismatched version fields, or already recorded as abnormal objects are not added to the associated object scope, and the matching results are recorded in the runtime log. Taking a product version directory as an example, after the system parses the test environment identifier from a text configuration file and generates the corresponding object tag, the system searches for the corresponding image file in the object list based on the base name of the text configuration file, identifies the image file as an associated object, and retains the source relationship between the text configuration file and the image file.
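The matching rules above might look roughly like the following sketch. It assumes the base name has already been normalized to lowercase, treats "same directory" as the path constraint, and uses a hypothetical list of image suffixes; the version field check mentioned above is omitted.

```python
import posixpath

# Hypothetical suffixes treated as resource (image) objects.
IMAGE_SUFFIXES = (".img", ".bin", ".iso")


def find_associated_objects(config_key, base_name, object_keys, abnormal_keys=frozenset()):
    """Return (object_key, match_basis) pairs associated with one config object."""
    config_dir = posixpath.dirname(config_key)
    associated = []
    for key in object_keys:
        # Exclude the configuration object itself and known abnormal objects.
        if key == config_key or key in abnormal_keys:
            continue
        # Similar names under a different path are not associated.
        if posixpath.dirname(key) != config_dir:
            continue
        filename = posixpath.basename(key).lower()
        if not filename.endswith(IMAGE_SUFFIXES):
            continue
        stem = filename.rsplit(".", 1)[0]
        if stem == base_name:
            associated.append((key, "same_name"))
        elif stem.startswith(base_name):
            associated.append((key, "prefix"))
    return associated
```

A plain prefix test can over-match (for example `app-1` also prefixes `app-10`), so a production rule would likely require a separator after the prefix or compare version fields explicitly.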
[0051] After identifying the associated objects, the system uses the object tag corresponding to the text configuration object as the tag to be written to the associated object, and establishes an association record between the associated object, the object tag, and the text configuration object. The association record includes the associated object's object name, object path, original storage type, tag to be written, tag source object, and matching criteria. This association record is written to memory state data for subsequent tag writing in formal operation mode, generation of pending operation records in simulated operation mode, and operation report statistics. If the same associated object is matched by multiple text configuration objects, the system determines the target tag according to preset conflict handling rules and records the conflict information in the operation log.
[0052] When generating the queue of objects to be tagged, the system organizes the text configuration objects of the generated object tags and their associated objects into a unified queue record, and maintains the queue order according to object path, matching source, and processing status. The queue record stores object location information, tags to be written, and operation status, so that the subsequent tagging process can be executed item by item according to the queue or output item by item in simulation mode.
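As an illustration of queue generation with conflict handling, the sketch below assumes association records are plain dictionaries and uses a hypothetical conflict rule: an exact-name match outranks a prefix match, and otherwise the first record wins. The application only states that a preset conflict rule exists.

```python
from dataclasses import dataclass


@dataclass
class QueueRecord:
    object_key: str
    tag: dict
    source_key: str
    match_basis: str
    status: str = "pending"


# Hypothetical ranking used to resolve conflicts; lower is stronger.
MATCH_RANK = {"self": 0, "same_name": 1, "prefix": 2}


def build_tagging_queue(association_records):
    """Deduplicate records per object and return (queue, conflicts)."""
    chosen = {}
    conflicts = []
    for rec in association_records:
        current = chosen.get(rec["object_key"])
        if current is None:
            chosen[rec["object_key"]] = rec
            continue
        if current["tag"] != rec["tag"]:
            # Record the conflict so it can be written to the operation log.
            conflicts.append((rec["object_key"], current["source_key"], rec["source_key"]))
        if MATCH_RANK[rec["match_basis"]] < MATCH_RANK[current["match_basis"]]:
            chosen[rec["object_key"]] = rec
    # Keep a stable queue order by object path, then by matching source.
    ordered = sorted(chosen.values(), key=lambda r: (r["object_key"], r["source_key"]))
    queue = [
        QueueRecord(r["object_key"], r["tag"], r["source_key"], r["match_basis"])
        for r in ordered
    ]
    return queue, conflicts
```

Each `QueueRecord` carries the location, the tag to be written, and a status field, which is what the later write or simulation step updates item by item.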
[0053] Through the above embodiments, this application can automatically transfer the content parsing results of the text configuration object to the name-associated image object, reduce the manual confirmation of association relationships, improve the accuracy of associated object tagging, and enhance the traceability of the batch object governance process.
[0054] In some embodiments, determining the current operating mode based on the operating parameters, writing object tags according to the queue of objects to be tagged in the formal operating mode, and generating a record of tags to be written in the simulated operating mode, includes: parsing the mode identifier in the operating parameters to determine the current operating mode; when the current operating mode is the formal operating mode, calling the object tag writing interface according to the queue of objects to be tagged, writing the object tags to the corresponding objects, and updating the tag processing records and the memory state data; and when the current operating mode is the simulated operating mode, prohibiting the submission of object tag writing requests and generating a record of tags to be written based on the queue of objects to be tagged.
[0055] Specifically, after the queue of objects to be tagged is generated, the system continues to read the mode identifier from the startup command or task configuration, and determines the current running mode based on the mode identifier. The mode identifier can be passed in the running parameters along with the target service address, target object storage space, and target path, and is used to indicate whether this task should actually modify the object tags in the object storage.
[0056] For example, in some cases, the system enters the simulated operating mode when the operating parameters include a simulation run identifier; when the operating parameters do not include a simulation run identifier and the access configuration verification passes, the system enters the formal operating mode. After determining the current operating mode, the system writes the operating mode to the memory state data and records the execution scope, target path, and mode status of this task in the runtime log.
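A minimal sketch of mode selection from startup parameters, using Python's standard `argparse`. The flag names (`--endpoint`, `--bucket`, `--prefix`, `--dry-run`) are hypothetical, and raising an error when verification fails outside simulation is an assumption, since the application does not describe that case.

```python
import argparse


def parse_run_parameters(argv):
    """Parse the hypothetical startup parameters of a tagging task."""
    parser = argparse.ArgumentParser(description="object tagging task (illustrative)")
    parser.add_argument("--endpoint", required=True, help="target service address")
    parser.add_argument("--bucket", required=True, help="target object storage space")
    parser.add_argument("--prefix", default="", help="target path")
    parser.add_argument("--dry-run", action="store_true", help="simulation run identifier")
    return parser.parse_args(argv)


def determine_mode(args, access_config_ok):
    """Return 'simulated' or 'formal' following the rule described above."""
    if args.dry_run:
        return "simulated"
    if access_config_ok:
        return "formal"
    # Assumption: without a verified access configuration the task does not start.
    raise RuntimeError("access configuration verification failed")
```

Making simulation an explicit flag rather than the absence of one is a design choice of this sketch; some teams prefer the opposite default so that writes always require an opt-in.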
[0057] When the current operating mode is the formal operating mode, the system reads the records of objects to be processed one by one according to the queue order of the objects to be tagged, and extracts the object path, object name, and object tag to be written from the object record. For text configuration objects, the system writes the object tag generated based on the environment identifier into the text configuration object; for associated objects that have a name association with the text configuration object, the system uses the object tag from the text configuration object as the tag to be written to the associated object.
[0058] Before calling the object tag writing interface, the system can first read the current tag status of the object and determine the final content to be written according to the preset merging rules, avoiding overwriting existing tags that are unrelated to this cleanup task. After the object tag writing interface returns a successful result, the system updates the tag processing record, marks the object as written, and synchronously updates the number of processed tags, the number of environment tags, and the object status in the memory status data.
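The merge step matters in services where a tag write replaces the object's whole tag set (the S3 `PutObjectTagging` operation behaves this way). A minimal sketch of one possible merging rule follows; the set of managed tag keys is a hypothetical configuration.

```python
# Hypothetical set of tag keys that this task is allowed to create or overwrite.
MANAGED_KEYS = {"env"}


def merge_tags(existing, to_write):
    """Return the final tag set: unrelated existing tags are preserved."""
    merged = dict(existing)
    for key, value in to_write.items():
        if key not in MANAGED_KEYS:
            # Ignore keys outside this task's scope instead of overwriting them.
            continue
        merged[key] = value
    return merged
```

The merged dictionary is what would be submitted to the tag writing interface, so tags such as an owner or cost-center label survive the cleanup task.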
[0059] In the formal operating mode, if a write operation fails due to access restrictions, object non-existence, an abnormal API response, or an incorrect tag format, the system does not resubmit requests for that object indefinitely. Instead, it records the object name, object path, tag to be written, and reason for failure, and marks the object as a write exception object. For subsequent queue objects, the system can continue processing or stop the task according to a preset exception handling strategy, preventing exception objects from affecting the overall processing results. In some examples, the running results are fed back to the logging system in real time, and the memory state is maintained through a dictionary structure, ensuring that each object tag change has a corresponding processing record.
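A sketch of bounded retries with per-object failure records. The `write_tag` callable stands in for the object tag writing interface, and `max_attempts` and `stop_on_error` are hypothetical names for the preset exception handling strategy.

```python
def process_queue(queue, write_tag, max_attempts=2, stop_on_error=False):
    """Write tags item by item; never retry an object more than max_attempts times.

    queue: iterable of (object_key, tag) pairs.
    write_tag: callable(object_key, tag) that raises an exception on failure.
    """
    records = []
    for object_key, tag in queue:
        status, reason = "write_exception", None
        for _attempt in range(max_attempts):
            try:
                write_tag(object_key, tag)
                status, reason = "written", None
                break
            except Exception as exc:  # the sketch treats every failure alike
                reason = f"{type(exc).__name__}: {exc}"
        records.append(
            {"object_key": object_key, "tag": tag, "status": status, "reason": reason}
        )
        if status != "written" and stop_on_error:
            # Strategy "stop": leave the remaining queue items unprocessed.
            break
    return records
```

Every queue item produces exactly one record, whether it succeeded or failed, which is what makes each tag change traceable afterwards.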
[0060] When the current operating mode is the simulated operating mode, the system still performs the complete processes of object traversal, text configuration object recognition, file content reading, environment identifier extraction, base name matching, and generation of the queue of objects to be tagged. However, when processing the queue of objects to be tagged, the system prohibits submitting object tag write requests. The system only generates records of tags to be written based on the queue, and writes the object name, object path, tag to be written, tag source object, and matching criteria for each object to be processed to the runtime log and the memory state data. For example, when a text configuration file is recognized under the target path and a matching image file with the same name is found, the simulated operating mode only outputs the environment class tags to be written to the text configuration file and the image file, without modifying the actual object tags in the object storage.
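One way to guarantee that the simulated mode never submits a write is to branch before the write call, as in this sketch. The mode strings, the record fields, and the function name `run_queue` are assumptions for illustration.

```python
def run_queue(queue, mode, write_tag):
    """Return (pending_records, written_records) for one pass over the queue.

    queue: list of dicts with object_key, tag, source_key, match_basis.
    mode: "simulated" or "formal".
    """
    pending, written = [], []
    for item in queue:
        if mode == "simulated":
            # Simulation: record what would be written, submit nothing.
            pending.append({**item, "status": "to_be_written"})
            continue
        write_tag(item["object_key"], item["tag"])
        written.append({**item, "status": "written"})
    return pending, written
```

Because both modes share everything up to this loop, the simulation output reflects exactly the queue that a formal run would process.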
[0061] Through the above embodiments, this application can switch between formal operation mode and simulated operation mode according to the operating parameters. In formal operation mode, the actual writing of object tags and status updates are completed. In simulated operation mode, the analysis results and records to be changed are fully retained without changing the online object status, thereby improving the security, verifiability and traceability of batch labeling operations.
[0062] In some embodiments, generating an object processing report based on the tag processing records and the object list, and using objects with written object tags as processing objects for lifecycle rules, includes: extracting the tag writing status and object processing status from the tag processing records, and generating operation statistics data corresponding to object tag changes; classifying and summarizing the objects in the target object storage space based on the object path, storage type, and object tag in the object list to generate storage asset data; and generating an object processing report based on the operation statistics data and the storage asset data, and matching objects with written object tags to the corresponding lifecycle rules.
[0063] Specifically, after the queue of objects to be tagged is processed, the system extracts the processing results of this task from the memory state data and tag processing records, and summarizes the tag writing status and object processing status of each object. The tag writing status indicates whether the object tag has been written, whether it is in a pending writing state, or whether a writing exception has occurred; the object processing status indicates whether the object has completed content reading, environment identifier parsing, association matching, simulated recording, or skipped processing.
[0064] In some examples, when the main tagging logic loop ends, the system extracts the total number of processed objects, the number of accessible objects, the number of restricted access objects, the number of text configuration objects, the number of associated objects, the number of each environment tag, and the number of abnormal objects from the statistical variables maintained by the dictionary structure, and generates operation statistics data corresponding to object tag changes.
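The statistics described above could be accumulated in a dictionary roughly as follows. The record fields (`kind`, `accessible`, `tag`, `status`) and the status strings are hypothetical names chosen for this sketch.

```python
from collections import Counter

# Hypothetical statuses that count as abnormal objects.
ABNORMAL_STATUSES = {"write_exception", "field_parsing_exception"}


def build_operation_statistics(records):
    """Summarize per-object processing records into operation statistics."""
    stats = {
        "total": 0,
        "accessible": 0,
        "restricted": 0,
        "config_objects": 0,
        "associated_objects": 0,
        "abnormal": 0,
        "env_tags": Counter(),
    }
    for rec in records:
        stats["total"] += 1
        stats["accessible" if rec["accessible"] else "restricted"] += 1
        if rec["kind"] == "config":
            stats["config_objects"] += 1
        elif rec["kind"] == "associated":
            stats["associated_objects"] += 1
        if rec["status"] in ABNORMAL_STATUSES:
            stats["abnormal"] += 1
        elif rec.get("tag"):
            # Count each environment tag value once per successfully handled object.
            stats["env_tags"][rec["tag"]["env"]] += 1
    return stats
```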
[0065] After generating operation statistics, the system further reads the object list corresponding to the target object storage space and categorizes and summarizes the objects in the target object storage space according to object path, storage type, and object tags. Specifically, the system groups objects by directory under the target path, distinguishes between directly accessible and restricted access objects according to storage type, and differentiates between objects included in the lifecycle rule processing scope and untagged objects according to object tags. In some examples, the system simultaneously obtains the full object list, organizing objects of different storage levels according to directory and storage type to form storage asset data. This allows the report to reflect not only the current tag change but also the overall distribution of object resources in the current bucket.
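A sketch of the storage asset summary, grouping by directory and storage type. The storage type names and the readable set are placeholders; actual names depend on the object storage service in use.

```python
import posixpath
from collections import defaultdict

# Hypothetical set of storage types whose content can be read directly.
READABLE_STORAGE_TYPES = {"STANDARD", "IA"}


def build_storage_assets(object_list):
    """Group objects by (directory, storage type) and count their states."""
    assets = defaultdict(
        lambda: {"objects": 0, "readable": 0, "restricted": 0, "tagged": 0, "untagged": 0}
    )
    for obj in object_list:
        directory = posixpath.dirname(obj["key"]) or "."
        group = assets[(directory, obj["storage_type"])]
        group["objects"] += 1
        readable = obj["storage_type"] in READABLE_STORAGE_TYPES
        group["readable" if readable else "restricted"] += 1
        group["tagged" if obj.get("tags") else "untagged"] += 1
    return dict(assets)
```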
[0066] When generating object processing reports, the system converts operation statistics and stored asset data into a readable text format and generates report content according to a preset report template. The report content includes task execution parameters, target object storage space, target path, current operating mode, tag writing statistics, environment tag statistics, restricted access object statistics, abnormal object records, and stored asset classification information. For formal operating mode, the system generates an operation audit report, recording the object name, object path, written tag, tag source object, and processing result for each actual tag change. For simulated operating mode, the system generates a tag record to be written, recording each tag operation that is to be executed but not actually submitted. The system can also generate a stored asset snapshot report, recording the complete asset overview of the objects in the current target object storage space, and persistently saving the report file according to time stamps.
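As a rough illustration of template-based report rendering with a time-stamped file name, consider the sketch below. The report layout, the file naming scheme, and the parameter names are all assumptions; the application does not prescribe a format.

```python
from datetime import datetime, timezone


def render_report(params, stats, changes, now=None):
    """Return (filename, text) for an audit or pending-tags report."""
    now = now or datetime.now(timezone.utc)
    # Formal runs produce an audit report; simulated runs list pending tags.
    kind = "pending-tags" if params["mode"] == "simulated" else "audit"
    filename = f"{kind}-{now.strftime('%Y%m%dT%H%M%SZ')}.txt"
    lines = [
        f"report: {kind}",
        f"bucket: {params['bucket']}",
        f"path: {params['prefix']}",
        f"mode: {params['mode']}",
    ]
    lines += [f"{name}: {value}" for name, value in sorted(stats.items())]
    for change in changes:
        lines.append(
            f"{change['object_key']}\t{change['tag']}\t{change['source_key']}\t{change['status']}"
        )
    return filename, "\n".join(lines) + "\n"
```

Passing `now` explicitly keeps the function deterministic, which makes report generation easy to verify in tests.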
[0067] After the report is generated, the system determines the objects to be processed by lifecycle rules based on the objects that have been tagged. Specifically, the system matches object tags with pre-configured lifecycle rules in the object storage service, allowing objects that meet the tag and path conditions to enter subsequent storage type adjustment, low-frequency storage optimization, or cleanup processes. Untagged objects, objects for which only records of tags to be written were generated in the simulated operating mode, and objects with tag write errors are not treated as actual lifecycle rule processing objects; the system retains their corresponding status in the report for later review and re-execution.
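In practice the object storage service itself evaluates lifecycle rules; the sketch below is only a local preview of which rule an object would fall under, assuming rules are expressed as a path prefix plus required tags. The rule structure and field names are hypothetical.

```python
def matching_rule(obj, rules):
    """Return the id of the first rule the object satisfies, or None.

    obj: dict with key, tags, and status.
    rules: list of dicts with id, prefix, and tags (required key/value pairs).
    """
    # Only objects whose tags were actually written are eligible.
    if obj["status"] != "written":
        return None
    for rule in rules:
        if not obj["key"].startswith(rule["prefix"]):
            continue
        if all(obj["tags"].get(k) == v for k, v in rule["tags"].items()):
            return rule["id"]
    return None
```

Objects that return `None` here correspond to the untagged, simulated-only, or failed objects that stay in the report for later review.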
[0068] Through the above embodiments, this application can synchronously generate operation statistics and storage asset data after object tagging is completed, and record tag changes, object status and asset distribution through object processing reports, so that objects with written object tags can be identified and processed by lifecycle rules, thereby improving the traceability, verifiability and accuracy of subsequent processing of the object storage batch governance process.
[0069] The following are embodiments of the apparatus described in this application, which can be used to execute the embodiments of the method described in this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the method described in this application.
[0070] Figure 2 is a schematic diagram of the object storage file association tagging and cleanup apparatus provided in an embodiment of this application. As shown in Figure 2, the apparatus includes: an acquisition module 201, used to acquire a list of objects under the target path in the target object storage space, and extract the object name, storage type, and object path of each object; a determination module 202, used to determine the content access status of each object according to the storage type, and to determine the objects that meet the preset reading conditions as candidate objects; an identification module 203, used to identify the file type of the candidate objects, determine the text configuration object, read the file content of the text configuration object, and extract the environment identifier based on the preset field parsing rules; an extraction module 204, used to generate an object tag corresponding to the text configuration object based on the environment identifier, and extract the base name based on the object name of the text configuration object; an association module 205, used to find associated objects in the object list whose object names and the base name meet a preset matching relationship, associate the object tag with the associated objects, and generate a queue of objects to be tagged; a writing module 206, used to determine the current operating mode based on the operating parameters, write object tags according to the queue of objects to be tagged in the formal operating mode, and generate a record of tags to be written in the simulated operating mode; and a generation module 207, used to generate an object processing report based on the tag processing records and the object list, and to use the objects that have been written with object tags as the processing objects of the lifecycle rules.
[0071] In some embodiments, the acquisition module 201 of Figure 2 parses the target service address, target object storage space, and target path according to the operating parameters, loads the object storage access configuration, and establishes an access connection with the target object storage space; it calls the object enumeration interface based on the target path to obtain the original object list, extracts the fields and unifies the format of the object metadata in the original object list, obtains the object list containing the object name, storage type, and object path, and writes the object list into the memory state data.
[0072] In some embodiments, the determination module 202 of Figure 2 matches the storage type of each object with a preset set of readable storage types to determine whether each object is allowed to execute file content reading; it marks objects that are allowed to execute file content reading as accessible objects and determines accessible objects as candidate objects; it marks objects that are not allowed to execute file content reading as restricted access objects and writes the object name, object path, and storage type of the restricted access objects into the memory state data and the runtime log.
[0073] In some embodiments, the identification module 203 of Figure 2 filters candidate objects based on their file type characteristics to determine text configuration objects with text configuration attributes; it calls the object reading interface to obtain the file content of the text configuration object and performs character format unification and invalid content filtering on the file content; it matches the environment field in the file content according to the preset field parsing rules, extracts the environment identifier corresponding to the text configuration object, and writes the environment identifier into the memory state data.
[0074] In some embodiments, the extraction module 204 of Figure 2 matches the environment identifier against the preset tag mapping rules to generate an object tag containing a tag name and a tag value, and establishes a correspondence between the text configuration object and the object tag; it separates the path field and the file type field from the object name of the text configuration object, extracts the base name used for associated object lookup, associates the base name with the object tag, and writes them into the memory state data.
[0075] In some embodiments, the association module 205 of Figure 2 matches the object names in the object list based on the base name to determine the associated objects that have a name association relationship with the text configuration object; it uses the object tag corresponding to the text configuration object as the tag to be written to the associated object, and establishes an association record between the associated object, the object tag, and the text configuration object; it generates a queue of objects to be tagged containing the text configuration object and the associated object based on the association record.
[0076] In some embodiments, the writing module 206 of Figure 2 parses the mode identifier in the operating parameters to determine the current operating mode. When the current operating mode is the formal operating mode, it calls the object tag writing interface according to the queue of objects to be tagged, writes the object tags to the corresponding objects, and updates the tag processing record and the memory state data. When the current operating mode is the simulated operating mode, it prohibits the submission of object tag writing requests and generates a record of tags to be written according to the queue of objects to be tagged.
[0077] In some embodiments, the generation module 207 of Figure 2 extracts the tag writing status and object processing status from the tag processing record, and generates operation statistics data corresponding to the object tag change; according to the object path, storage type, and object tag in the object list, it classifies and summarizes the objects in the target object storage space to generate storage asset data; based on the operation statistics data and the storage asset data, it generates an object processing report, and matches the objects that have been written with object tags to the corresponding lifecycle rules.
[0078] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0079] Figure 3 is a schematic diagram of the electronic device 3 provided in an embodiment of this application. As shown in Figure 3, the electronic device 3 of this embodiment includes: a processor 301, a memory 302, and a computer program 303 stored in the memory 302 and executable on the processor 301. When the processor 301 executes the computer program 303, it implements the steps in the various method embodiments described above. Alternatively, when the processor 301 executes the computer program 303, it implements the functions of each module / unit in the various apparatus embodiments described above.
[0080] Electronic device 3 can be a desktop computer, laptop, handheld computer, cloud server, or other electronic device. Electronic device 3 may include, but is not limited to, processor 301 and memory 302. Those skilled in the art will understand that Figure 3 is merely an example of electronic device 3 and does not constitute a limitation on electronic device 3. It may include more or fewer components than shown, or different components.
[0081] The processor 301 may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
[0082] The memory 302 can be an internal storage unit of the electronic device 3, such as a hard disk or memory of the electronic device 3. The memory 302 can also be an external storage device of the electronic device 3, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the electronic device 3. The memory 302 can also include both internal and external storage units of the electronic device 3. The memory 302 is used to store computer programs and other programs and data required by the electronic device.
[0083] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0084] If integrated modules / units are implemented as software functional units and sold or used as independent products, they can be stored in a readable storage medium (e.g., a computer-readable storage medium). Based on this understanding, all or part of the processes in the methods of the above embodiments can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program may include computer program code, which may be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable storage medium may include: any entity or device capable of carrying computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc.
[0085] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.
Claims
1. A method for association tagging and cleanup of object storage files, characterized in that it comprises: obtaining a list of objects under a target path in a target object storage space, and extracting the object name, storage type, and object path of each object; determining the content access status of each object according to the storage type, and identifying objects that meet preset reading conditions as candidate objects; performing file type identification on the candidate objects to determine a text configuration object, reading the file content of the text configuration object, and extracting an environment identifier based on preset field parsing rules; generating an object tag corresponding to the text configuration object based on the environment identifier, and extracting a base name from the object name of the text configuration object; searching the object list for associated objects whose object names satisfy a preset matching relationship with the base name, associating the object tag with the associated objects, and generating a queue of objects to be tagged; determining the current operating mode based on operating parameters, writing object tags according to the queue of objects to be tagged in the formal operating mode, and generating a record of tags to be written in the simulated operating mode; and generating an object processing report based on tag processing records and the object list, and using the objects to which object tags have been written as the processing objects of lifecycle rules.
2. The method according to claim 1, characterized in that the step of obtaining the list of objects under the target path in the target object storage space and extracting the object name, storage type, and object path of each object comprises: parsing the target service address, the target object storage space, and the target path from the operating parameters, loading the object storage access configuration, and establishing an access connection with the target object storage space; calling the object enumeration interface based on the target path to obtain an original object list; and extracting and formatting the object metadata in the original object list to obtain the object list containing the object name, the storage type, and the object path, and writing the object list into the memory state data.
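For illustration only, the formatting step of this claim might be sketched in Python as below. The function name `normalize_listing`, the raw record fields `Key` and `StorageClass` (modelled on S3-style listings), and the `"UNKNOWN"` fallback are assumptions made for the example; the application does not name a particular storage service or field layout.

```python
import posixpath


def normalize_listing(raw_entries, target_path):
    """Format raw listing records into an object list (name, path, storage type)."""
    object_list = []
    for entry in raw_entries:
        key = entry["Key"]  # assumed field name for the full object path
        # Skip records outside the target path and folder placeholder keys.
        if not key.startswith(target_path) or key.endswith("/"):
            continue
        object_list.append({
            "name": posixpath.basename(key),
            "path": key,
            # Assumed fallback when the listing omits the storage type.
            "storage_type": entry.get("StorageClass", "UNKNOWN"),
        })
    return object_list
```

The returned list plays the role of the object list held in the memory state data; enumeration itself is left to whichever client the storage service provides.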
3. The method according to claim 1, characterized in that the step of determining the content access status of each object according to the storage type and identifying objects that meet preset reading conditions as candidate objects comprises: matching the storage type of each object against a preset set of readable storage types to determine whether reading the file content of each object is allowed; marking objects whose file content is allowed to be read as accessible objects, and identifying the accessible objects as candidate objects; and marking objects whose file content is not allowed to be read as restricted access objects, and writing the object name, object path, and storage type of the restricted access objects into the memory state data and the runtime log.
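As a minimal sketch, the access check of this claim could look like the following Python. The storage type names in `READABLE_STORAGE_TYPES`, the function name `split_by_access`, and the dictionary layout of each object are hypothetical and chosen only for the example.

```python
import logging

logger = logging.getLogger("tagging")

# Assumed set of storage types whose content can be read directly.
READABLE_STORAGE_TYPES = frozenset({"STANDARD", "INFREQUENT_ACCESS"})


def split_by_access(object_list, readable=READABLE_STORAGE_TYPES):
    """Return (candidate objects, restricted access objects)."""
    candidates, restricted = [], []
    for obj in object_list:
        if obj["storage_type"] in readable:
            candidates.append(obj)
        else:
            restricted.append(obj)
            # Record the restricted object in the runtime log.
            logger.info("restricted: %s (%s) type=%s",
                        obj["name"], obj["path"], obj["storage_type"])
    return candidates, restricted
```

Filtering on storage type before any read is what keeps archived objects from triggering access errors during batch processing.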
4. The method according to claim 1, characterized in that the step of performing file type identification on the candidate objects to determine the text configuration object, reading the file content of the text configuration object, and extracting the environment identifier based on preset field parsing rules comprises: filtering the candidate objects based on their file type characteristics to determine the text configuration objects having text configuration attributes; calling the object reading interface to obtain the file content of the text configuration object, and performing character format unification and invalid content filtering on the file content; and matching the environment field in the file content according to the preset field parsing rules, extracting the environment identifier corresponding to the text configuration object, and writing the environment identifier into the memory state data.
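The content parsing in this claim can be illustrated with a short Python sketch. The field names `env` and `environment`, the `key = value` / `key: value` syntax, and the treatment of `#` lines as invalid content are assumptions standing in for the preset field parsing rules, which the application does not spell out.

```python
import re

# Hypothetical parsing rule: a line such as "env = prod" or "environment: prod".
ENV_FIELD = re.compile(
    r"^(?:env|environment)\s*[:=]\s*([A-Za-z0-9_-]+)$", re.IGNORECASE
)


def extract_environment(raw: bytes):
    """Return the environment identifier found in the file content, or None."""
    # Character format unification: drop a UTF-8 BOM and normalize line endings.
    text = raw.decode("utf-8-sig", errors="replace")
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    for line in text.split("\n"):
        line = line.strip()
        # Invalid content filtering: skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = ENV_FIELD.match(line)
        if match:
            return match.group(1).lower()
    return None
```

For example, `extract_environment(b"# demo\r\nenv = PROD\r\n")` yields `"prod"` under these assumed rules.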
5. The method according to claim 1, characterized in that the step of generating the object tag corresponding to the text configuration object based on the environment identifier, and extracting the base name from the object name of the text configuration object, comprises: matching the environment identifier against a preset tag mapping rule to generate an object tag containing a tag name and a tag value, and establishing a correspondence between the text configuration object and the object tag; and separating the path field and the file type field from the object name of the text configuration object, extracting the base name used for associated object lookup, associating the base name with the object tag, and writing them into the memory state data.
6. The method according to claim 1, characterized in that the step of searching the object list for associated objects whose object names satisfy a preset matching relationship with the base name, associating the object tag with the associated objects, and generating a queue of objects to be tagged comprises: matching the object names in the object list against the base name to determine the associated objects that have a name association with the text configuration object; using the object tag corresponding to the text configuration object as the tag to be written to the associated objects, and establishing an association record among the associated object, the object tag, and the text configuration object; and generating, based on the association records, the queue of objects to be tagged, which includes the text configuration object and the associated objects.
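A minimal Python sketch of base name extraction and association lookup follows. It assumes, purely for illustration, that the preset matching relationship is equality of the file name with its directory path and extension removed; the function names and the record layout are likewise hypothetical.

```python
import posixpath


def base_name(object_name: str) -> str:
    """Strip the path field and the file type field from an object name."""
    filename = posixpath.basename(object_name)
    stem, _extension = posixpath.splitext(filename)
    return stem


def build_tag_queue(config_name, tag, object_names):
    """Queue the text configuration object plus every object sharing its base name."""
    base = base_name(config_name)
    queue = [{"object": config_name, "tag": tag, "source": config_name}]
    for name in object_names:
        if name != config_name and base_name(name) == base:
            # Association record: associated object, tag, originating config object.
            queue.append({"object": name, "tag": tag, "source": config_name})
    return queue
```

Keeping the originating configuration object in each record is what allows a tag on an associated object to be traced back to the file it was derived from.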
7. The method according to claim 1, characterized in that the step of determining the current operating mode based on operating parameters, writing object tags according to the queue of objects to be tagged in the formal operating mode, and generating a record of tags to be written in the simulated operating mode comprises: parsing the mode identifier in the operating parameters to determine the current operating mode; when the current operating mode is the formal operating mode, calling the object tag writing interface according to the queue of objects to be tagged to write the object tag to the corresponding object, and updating the tag processing record and the memory state data; and when the current operating mode is the simulated operating mode, prohibiting the submission of object tag writing requests, and generating the record of tags to be written according to the queue of objects to be tagged.
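The two operating modes can be sketched in Python as below. The mode identifiers `"formal"` and `"simulated"`, the status strings, and the injected `write_tag` callable are assumptions for the example; in practice `write_tag` would wrap whatever tag writing interface the storage service exposes.

```python
def process_queue(queue, mode, write_tag):
    """Write tags in formal mode; only record the intended writes in simulated mode."""
    if mode not in ("formal", "simulated"):
        raise ValueError(f"unknown operating mode: {mode!r}")
    records = []
    for entry in queue:
        if mode == "simulated":
            # No write request is submitted to the storage service.
            status = "pending"
        else:
            try:
                write_tag(entry["object"], entry["tag"])
                status = "written"
            except Exception as exc:
                # Record the failure and continue with the rest of the batch.
                status = f"failed: {exc}"
        records.append(
            {"object": entry["object"], "tag": entry["tag"], "status": status}
        )
    return records
```

Running the same queue first in simulated mode gives a reviewable list of pending tag writes before anything in the storage space is changed.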
8. The method according to claim 1, characterized in that the step of generating the object processing report based on the tag processing records and the object list, and using the objects to which object tags have been written as the processing objects of lifecycle rules, comprises: extracting the tag writing status and the object processing status from the tag processing records, and generating operation statistics data corresponding to the object tag changes; classifying and summarizing the objects in the target object storage space based on the object path, storage type, and object tag in the object list to generate storage asset data; and generating the object processing report based on the operation statistics data and the storage asset data, and matching the objects to which object tags have been written to the corresponding lifecycle rules.
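One possible shape for the report step, sketched in Python under stated assumptions: tag processing records carry a `status` field with the value `"written"` for successful writes, objects carry a `storage_type` field, and the report is a plain dictionary. None of these names come from the application.

```python
from collections import Counter


def build_report(tag_records, object_list):
    """Summarize tag changes and storage assets, and list lifecycle targets."""
    # Operation statistics: how many objects ended in each tag writing status.
    operation_stats = Counter(record["status"] for record in tag_records)
    # Storage asset data: object counts grouped by storage type.
    storage_assets = Counter(obj["storage_type"] for obj in object_list)
    # Objects with a written tag become candidates for lifecycle rules.
    lifecycle_targets = sorted(
        record["object"] for record in tag_records if record["status"] == "written"
    )
    return {
        "operation_stats": dict(operation_stats),
        "storage_assets": dict(storage_assets),
        "lifecycle_targets": lifecycle_targets,
    }
```

Only objects whose tag write succeeded appear in `lifecycle_targets`, so a simulated run produces statistics without marking anything for lifecycle processing.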
9. An apparatus for association tagging and cleanup of object storage files, characterized in that it comprises: an acquisition module, used to obtain a list of objects under a target path in a target object storage space, and to extract the object name, storage type, and object path of each object; a determination module, used to determine the content access status of each object according to the storage type, and to identify objects that meet preset reading conditions as candidate objects; an identification module, used to perform file type identification on the candidate objects to determine a text configuration object, read the file content of the text configuration object, and extract an environment identifier based on preset field parsing rules; an extraction module, used to generate an object tag corresponding to the text configuration object based on the environment identifier, and to extract a base name from the object name of the text configuration object; an association module, used to search the object list for associated objects whose object names satisfy a preset matching relationship with the base name, associate the object tag with the associated objects, and generate a queue of objects to be tagged; a writing module, used to determine the current operating mode based on operating parameters, write object tags according to the queue of objects to be tagged in the formal operating mode, and generate a record of tags to be written in the simulated operating mode; and a generation module, used to generate an object processing report based on tag processing records and the object list, and to use the objects to which object tags have been written as the processing objects of lifecycle rules.
10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the steps of the method according to any one of claims 1 to 8 are implemented.