Metadata anomaly detection method and distributed storage system

CN115757301B · Active · Publication Date: 2026-06-26 · ZHEJIANG DAHUA TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG DAHUA TECH CO LTD
Filing Date
2022-09-21
Publication Date
2026-06-26


Abstract

The application relates to a metadata anomaly detection method and a distributed storage system. A detection period is set for each data node server respectively; when the detection period of a current data node server is reached, a detection task is generated according to first detection information and is sent to the current data node server; wherein the first detection information is metadata of all data blocks distributed in the current data node server and recorded in a metadata server; a consistency comparison result fed back by the current data node server is received, and whether the metadata saved in the distributed storage system is abnormal is confirmed according to the consistency comparison result; the consistency comparison result is a consistency comparison result obtained by the current data node server by performing consistency comparison on the first detection information and second detection information; the second detection information is metadata of data blocks actually distributed in the current data node server, and the stability of metadata anomaly detection is effectively improved.

Description

Technical Field

[0001] This application relates to the field of distributed storage technology, and in particular to a metadata anomaly detection method and a distributed storage system.

Background Technology

[0002] In a distributed storage system, a file is divided into multiple objects, each composed of multiple data blocks. These data blocks are distributed across various data node servers. Data node servers store the data of the data blocks, while metadata servers store the file's metadata, including data block metadata such as block length and the data node server where each block is located. When a request is made to read or write a file stored in the distributed storage system, the distribution status of the file's data blocks must first be obtained from the metadata server. Based on this distribution status, the system then attempts to read or write the data blocks on the corresponding data node server. Therefore, if the metadata held by the data node server and the metadata recorded by the metadata server are inconsistent, file read/write operations will fail.

[0003] In existing technologies, the metadata server periodically sends a task to all data node servers to simultaneously report metadata. This means each data node server needs to aggregate the metadata of all its data blocks and submit it to the metadata server. The metadata server then compares the reported metadata with its own recorded metadata to determine if there are any anomalies in the metadata stored in the distributed storage system. However, having all data node servers report metadata to the metadata server simultaneously leads to a significant increase in network load on the metadata server. Furthermore, the metadata server's need to receive all reported metadata and compare it with its own recorded metadata results in excessive CPU load. Both the increased network load and the excessive CPU load on the metadata server contribute to the poor stability of metadata anomaly detection in existing technologies.

Summary of the Invention

[0004] Therefore, it is necessary to provide a metadata anomaly detection method and a distributed storage system to address the aforementioned technical problems and solve the issue of poor stability in metadata anomaly detection in related technologies.

[0005] In a first aspect, embodiments of this application provide a method for detecting metadata anomalies. The method is applied to a metadata server, wherein the distributed storage system includes a data node server and the metadata server; the method includes the following steps:

[0006] A detection period is set for each of the data node servers; wherein, the data of each data block of the file stored in the distributed storage system is distributed across different data node servers;

[0007] When the detection cycle of the current data node server is reached, a detection task is generated and sent to the current data node server according to the first detection information; wherein, the first detection information is the metadata of all data blocks distributed in the current data node server recorded in the metadata server;

[0008] The system receives the consistency comparison result fed back by the current data node server and confirms whether there is any anomaly in the metadata stored in the distributed storage system based on the consistency comparison result; the consistency comparison result is the consistency comparison result obtained by the current data node server by comparing the first detection information and the second detection information; the second detection information is the metadata of the data blocks actually distributed in the current data node server.

[0009] In some embodiments, confirming whether there are any anomalies in the metadata stored in the distributed storage system based on the consistency comparison result includes the following steps:

[0010] If the consistency comparison result shows that there are data blocks that appear in the second detection information but do not appear in the first detection information, then it is determined that the metadata stored in the distributed storage system has a first type of anomaly; and / or,

[0011] If the consistency comparison result shows that there are data blocks that appear in the first detection information but do not appear in the second detection information, then it is determined that the metadata stored in the distributed storage system has a second type of anomaly; and / or,

[0012] If the consistency comparison result shows that the length of the same data block is inconsistent in the first detection information and the second detection information, then it is determined that the metadata stored in the distributed storage system has a third type of anomaly.

[0013] In some embodiments, the method further includes:

[0014] If the metadata stored in the distributed storage system has a first type of anomaly, then the data block with the first type of anomaly will be used as the first data block.

[0015] If the corresponding file is found by the ID of the first data block, then the metadata of the first data block is added to the metadata server;

[0016] If the corresponding file cannot be found using the ID of the first data block, then the current data node server is instructed to delete the data in the first data block.

[0017] In some embodiments, the method further includes:

[0018] If the metadata stored in the distributed storage system has a second type of anomaly, then the data block with the second type of anomaly will be used as the second data block.

[0019] The metadata of the second data block recorded by the metadata server is modified according to a preset rule to indicate that the data of the second data block does not exist in the distributed storage system.

[0020] In some embodiments, the method further includes:

[0021] If the metadata stored in the distributed storage system has a third type of anomaly, then the data block containing the third type of anomaly will be used as the third data block.

[0022] The length of the third data block recorded in the metadata server is corrected with reference to the length of the third data block in the second detection information.

[0023] In some embodiments, before generating a detection task based on the first detection information and sending it to the current data node server, the method further includes:

[0024] The metadata of the data blocks of files in the write state are removed from the first detection information and used as the latest first detection information.

[0025] In a second aspect, embodiments of this application provide a metadata anomaly detection method, which is applied to a data node server, wherein the distributed storage system includes the data node server and a metadata server; the method includes the following steps:

[0026] When the detection period of the current data node server is reached, a detection task generated by the metadata server based on the first detection information is received; wherein, the detection period is set by the metadata server for the data node server; the first detection information is the metadata of all data blocks distributed in the current data node server recorded in the metadata server.

[0027] The first detection information and the second detection information are compared for consistency, and the consistency comparison result is fed back to the metadata server; the second detection information is the metadata of the data blocks actually distributed in the current data node server.

[0028] In some embodiments, the step of performing a consistency comparison between the first detection information and the second detection information, and feeding back the consistency comparison result to the metadata server, includes the following steps:

[0029] The first detection information and the second detection information are compared for consistency to determine whether the length of the same data block is consistent in the first detection information and the second detection information, and whether the same data block appears in both the first detection information and the second detection information.

[0030] If the length of the same data block is inconsistent in the first detection information and the second detection information, the metadata of the data block whose length is inconsistent in the first detection information and the second detection information will be sent to the metadata server as the consistency comparison result.

[0031] If the same data block does not appear in both the first detection information and the second detection information, the metadata of the data block that does not appear in both the first detection information and the second detection information is sent to the metadata server as the consistency comparison result.

[0032] In some embodiments, the method further includes:

[0033] If a data block changes on the data node server, the metadata of the changed data block is reported to the metadata server.

[0034] In a third aspect, embodiments of this application provide a distributed storage system, which includes a data node server and a metadata server;

[0035] The metadata server is used to set a detection period for each of the data node servers; wherein, the data of each data block of the file stored in the distributed storage system is distributed across different data node servers;

[0036] The metadata server is configured to generate a detection task based on first detection information and send it to the current data node server when the detection cycle of the current data node server is reached; wherein, the first detection information is the metadata of all data blocks distributed in the current data node server recorded in the metadata server.

[0037] The data node server is used to perform a consistency comparison between the first detection information and the second detection information, and send the consistency comparison result to the metadata server; wherein, the second detection information is the metadata of the data blocks actually distributed in the current data node server;

[0038] The metadata server is used to confirm whether there are any anomalies in the metadata stored in the distributed storage system based on the received consistency comparison results.

[0039] The aforementioned metadata anomaly detection method and distributed storage system set detection cycles for each data node server. The data of each data block of a file stored in the distributed storage system is distributed across different data node servers. When the detection cycle for the current data node server arrives, a detection task is generated based on first detection information and sent to the current data node server. The first detection information is the metadata of all data blocks distributed across the current data node server, recorded in the metadata server. The system receives consistency comparison results from the current data node server and confirms whether there are any anomalies in the metadata stored in the distributed storage system based on these results. The consistency comparison result is the result obtained by the current data node server through a consistency comparison of the first and second detection information. The second detection information is the metadata of the data blocks actually distributed across the current data node server. This application, by setting detection cycles for each data node server, distributes the data transmission task, effectively reducing network pressure caused by centralized data transmission. Furthermore, distributing the metadata consistency comparison task across various data node servers significantly reduces the CPU pressure on the metadata server, thereby effectively improving the stability of metadata anomaly detection.

Description of the Drawings

[0040] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:

[0041] Figure 1 is an application scenario diagram of the metadata anomaly detection method provided in the embodiments of this application;

[0042] Figure 2 is a first flowchart of the metadata anomaly detection method provided in the embodiments of this application;

[0043] Figure 3 is a second flowchart of the metadata anomaly detection method provided in the embodiments of this application;

[0044] Figure 4 is a schematic diagram of the structure of a computer device provided in the embodiments of this application.

Detailed Implementation

[0045] To make the objectives, technical solutions, and advantages of this application clearer, the application is described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application. All other embodiments obtained by those skilled in the art based on the embodiments provided in this application without inventive effort are within the scope of protection of this application.

[0046] Obviously, the accompanying drawings described below are merely some examples or embodiments of this application. Those skilled in the art can apply this application to other similar scenarios based on these drawings without any inventive effort. Furthermore, it is understood that although the efforts made in this development process may be complex and lengthy, for those skilled in the art related to the content disclosed in this application, any changes to design, manufacturing, or production based on the technical content disclosed in this application are merely conventional technical means and should not be construed as insufficient disclosure of the content of this application.

[0047] In this application, the reference to "embodiment" means that a specific feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment that is mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described in this application may be combined with other embodiments without conflict.

[0048] Unless otherwise defined, the technical or scientific terms used in this application shall have the ordinary meaning understood by one of ordinary skill in the art to which this application pertains. The terms "a," "an," "the," and similar words used in this application do not indicate quantity limitation and may indicate singular or plural. The terms "comprising," "including," "having," and any variations thereof used in this application are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or modules (units) is not limited to the listed steps or units, but may also include steps or units not listed, or may include other steps or units inherent to these processes, methods, products, or devices. The terms "connected," "linked," "coupled," and similar words used in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Multiple" used in this application refers to two or more. "And/or" describes the relationship between related objects, indicating that three relationships may exist; for example, "A and/or B" can represent: A alone, both A and B, or B alone. The character "/" generally indicates that the preceding and following objects are in an "or" relationship. The terms "first," "second," and "third" used in this application are merely to distinguish similar objects and do not represent a specific ordering of the objects.

[0049] Figure 1 illustrates an application scenario for a metadata anomaly detection method provided in one embodiment of this application. As shown in Figure 1, the metadata server 101 and the data node server 102 can transmit data via a network. The metadata server 101 sets a detection period for each data node server 102. The data of each data block of a file stored in the distributed storage system is distributed across different data node servers 102. When the detection period for the current data node server 102 arrives, the metadata server 101 generates a detection task based on first detection information and sends it to the current data node server 102. The first detection information is the metadata of all data blocks distributed across the current data node server 102, recorded in the metadata server 101. The current data node server 102 performs a consistency comparison between the first and second detection information and sends the consistency comparison result to the metadata server 101. The second detection information is the metadata of the data blocks actually distributed across the current data node server 102. The metadata server 101 confirms whether there are any anomalies in the metadata stored in the distributed storage system based on the received consistency comparison result. The metadata server 101 and the data node server 102 can be implemented by any server.

[0050] This embodiment provides a metadata anomaly detection method applied to a metadata server. The distributed storage system includes data node servers and a metadata server. As shown in Figure 2, the method includes the following steps:

[0051] Step S210: Set a detection period for each data node server; wherein, the data of each data block of the file stored in the distributed storage system is distributed across different data node servers.

[0052] A typical distributed storage system consists of two metadata servers (primary and backup) and several data node servers. The data node servers store the actual data of each data block in a file, while the metadata servers store the metadata of each data block. In this embodiment, to distribute the detection tasks among the data node servers, different detection cycles can be set for each data node within the metadata server.

[0053] Step S220: When the detection cycle of the current data node server is reached, a detection task is generated and sent to the current data node server according to the first detection information; wherein, the first detection information is the metadata of all data blocks distributed in the current data node server recorded in the metadata server.

[0054] Specifically, a timer can be set in the metadata server for each data node server to determine when its detection cycle has arrived. When the detection cycle for a given data node server arrives, the metadata server retrieves the metadata of all data blocks distributed across that data node server from its own memory, using this as the first detection information. Based on this first detection information, it generates a detection task and sends it to the current data node server. After the detection task for the current data node server is completed, if another data node server reaches its detection cycle, its detection task can be immediately sent. Specifically, after retrieving the first detection information from its own memory, the metadata server can use heartbeat packets with the current data node server to generate detection tasks in batches and send them to the current data node server.
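The per-node timers described above can be sketched as a small scheduling simulation. This is a minimal illustration only, not the implementation of this application: the function name `detection_schedule`, the `periods` and `start_offsets` mappings, and the use of staggered start offsets to keep nodes from coming due together are all assumptions made for the example.

```python
import heapq


def detection_schedule(periods, start_offsets, until):
    """Return (time, node_id) pairs in the order detection tasks come due.

    periods: hypothetical mapping node_id -> detection period in seconds.
    start_offsets: hypothetical mapping node_id -> first due time, used here
        to stagger nodes so that they are not all checked at the same moment.
    until: stop once the next due time exceeds this value.
    """
    # One pending timer per data node server, ordered by due time.
    heap = [(start_offsets[node], node) for node in periods]
    heapq.heapify(heap)
    schedule = []
    while heap and heap[0][0] <= until:
        due, node = heapq.heappop(heap)
        schedule.append((due, node))
        # Re-arm the timer for this node's next detection period.
        heapq.heappush(heap, (due + periods[node], node))
    return schedule
```

With three nodes sharing a 60-second period and offsets of 0, 20, and 40 seconds, the tasks come due one node at a time rather than all at once, which is the load-spreading effect the method relies on.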

[0055] In one implementation, before generating a detection task based on the first detection information and sending it to the current data node server, the metadata of data blocks of files in a write state is removed from the first detection information and used as the latest first detection information. Since the data blocks of files in a write state will increase in length as the file is written, or the file may be closed at any time, not sending the metadata of data blocks of files in a write state as the first detection information to the current data node server, but only sending the metadata of the file's data blocks as the first detection information after the file writing is complete, can effectively reduce network bandwidth transmission pressure and effectively reduce the false alarm rate of metadata anomaly detection.
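The filtering step described above might look like the following sketch. The record layout (dictionaries with `block_id`, `file_id`, and `length` keys) and the `files_being_written` set are assumptions for illustration; the text does not specify how write state is tracked.

```python
def build_first_detection_info(recorded_blocks, files_being_written):
    """Drop blocks that belong to files currently in the write state.

    recorded_blocks: list of dicts with hypothetical keys
        "block_id", "file_id", and "length".
    files_being_written: set of file IDs that are still open for writing.
    """
    return [
        block
        for block in recorded_blocks
        if block["file_id"] not in files_being_written
    ]
```

Blocks of a file that is still being written are simply left out of this cycle's task and become eligible again once the file is closed.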

[0056] Step S230: Receive the consistency comparison result fed back by the current data node server, and confirm whether there is any abnormality in the metadata stored in the distributed storage system based on the consistency comparison result; the consistency comparison result is the consistency comparison result obtained by the current data node server by comparing the first detection information and the second detection information; the second detection information is the metadata of the data blocks actually distributed in the current data node server.

[0057] Specifically, after receiving the detection task, the current data node server compares the metadata of all data blocks distributed across the current data node server (recorded by the metadata server in the first detection information) with the metadata of the actual data blocks distributed across the current data node server (i.e., the second detection information), and reports the consistency comparison result to the metadata server. It can be assumed that if the metadata stored in the distributed storage system is normal, then the metadata in the first detection information and the metadata in the second detection information are consistent, ensuring smooth file reading and writing. After receiving the consistency comparison result from the current data node server, the metadata server can confirm whether there are any anomalies in the metadata stored in the distributed storage system.
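The comparison performed on the data node server can be sketched as a set difference plus a length check. This is a minimal sketch under stated assumptions: the `BlockMeta` type, the function name `compare_metadata`, and the shape of the returned dictionary are hypothetical and are not defined by this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockMeta:
    """Hypothetical per-block metadata record."""
    block_id: str
    length: int


def compare_metadata(first_info, second_info):
    """Compare the metadata server's view (first) with the local view (second)."""
    first = {b.block_id: b for b in first_info}
    second = {b.block_id: b for b in second_info}
    return {
        # Present on the data node but unknown to the metadata server.
        "only_in_second": sorted(second.keys() - first.keys()),
        # Recorded by the metadata server but missing on the data node.
        "only_in_first": sorted(first.keys() - second.keys()),
        # Present in both, but recorded and actual lengths differ.
        "length_mismatch": sorted(
            block_id
            for block_id in first.keys() & second.keys()
            if first[block_id].length != second[block_id].length
        ),
    }
```

Only the differing blocks are returned, so a node whose metadata is fully consistent sends back three empty lists rather than its whole block inventory.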

[0058] In existing technologies, the metadata server periodically sends a task to all data node servers to simultaneously report metadata. This means each data node server needs to aggregate the metadata of all its data blocks and submit it to the metadata server. The metadata server then compares the reported metadata with its own recorded metadata to determine if there are any anomalies in the metadata stored in the distributed storage system. However, having all data node servers report metadata to the metadata server simultaneously leads to a significant increase in network load on the metadata server. Furthermore, the metadata server's need to receive all reported metadata and compare it with its own recorded metadata results in excessive CPU load. Both the increased network load and the excessive CPU load on the metadata server contribute to the poor stability of metadata anomaly detection in existing technologies.

[0059] To address the aforementioned issues, this application proposes a metadata anomaly detection method. This method involves setting a detection period for each data node server. The data of each data block in a file stored in a distributed storage system is distributed across different data node servers. When the detection period for the current data node server arrives, a detection task is generated based on first detection information and sent to the current data node server. The first detection information consists of the metadata of all data blocks distributed across the current data node server, recorded in the metadata server. The method receives a consistency comparison result from the current data node server and uses this result to confirm whether there are any anomalies in the metadata stored in the distributed storage system. The consistency comparison result is the result obtained by comparing the first and second detection information. The second detection information is the metadata of the data blocks actually distributed across the current data node server. By setting a detection period for each data node server, this application distributes the data transmission task, effectively reducing network pressure caused by centralized data transmission. Furthermore, distributing the metadata consistency comparison task across various data node servers significantly reduces the CPU pressure on the metadata server, thereby effectively improving the stability of metadata anomaly detection.

[0060] In one embodiment, step S230 above confirms whether there are any anomalies in the metadata stored in the distributed storage system based on the consistency comparison result, including the following steps:

[0061] Step S231: If the consistency comparison result shows that there are data blocks that appear in the second detection information but do not appear in the first detection information, then it is determined that there is a first type of anomaly in the metadata stored in the distributed storage system.

[0062] If the consistency comparison results show that there are data blocks that appear in the second detection information but not in the first detection information, it means that although the data of some data blocks are distributed in the current data node server, due to network connection and other reasons, the metadata of these data blocks has not been successfully reported to the metadata server, resulting in the absence of this record in the metadata server. In this case, it is determined that there is a first type of anomaly in the metadata stored in the distributed storage system.

[0063] Step S232: If the consistency comparison result shows that there are data blocks that appear in the first detection information but do not appear in the second detection information, then it is determined that there is a second type of anomaly in the metadata stored in the distributed storage system.

[0064] If the consistency comparison results show that some data blocks that appear in the first detection information do not appear in the second detection information, it means that the metadata of some data blocks is recorded in the metadata server, but the data of these data blocks is not saved in the data node server. In this case, it is determined that there is a second type of anomaly in the metadata stored in the distributed storage system.

[0065] Step S233: If the consistency comparison result shows that the length of the same data block is inconsistent in the first detection information and the second detection information, then it is determined that the metadata stored in the distributed storage system has a third type of anomaly.

[0066] If the consistency comparison results show that the length of the same data block is inconsistent in the first and second detection information, it indicates that the metadata recorded in the metadata server has not been updated in time, resulting in errors in the recorded metadata. In this case, it is determined that there is a third type of anomaly in the metadata stored in the distributed storage system.

[0067] Through the above steps S231 to S233, various abnormal situations that occur in the distributed storage system can be effectively distinguished based on the consistency comparison results.
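Steps S231 to S233 amount to a three-way classification of each reported block. The sketch below assumes, purely for illustration, that each entry of the consistency comparison result is a tuple `(block_id, recorded_length, actual_length)` in which `None` marks a block absent from that side; the `AnomalyType` enum and `classify` function are hypothetical names.

```python
from enum import Enum


class AnomalyType(Enum):
    FIRST = "on data node, missing from metadata server"
    SECOND = "in metadata server, missing from data node"
    THIRD = "length mismatch"


def classify(entry):
    """Map one comparison entry to an anomaly type, or None if consistent.

    entry: (block_id, recorded_length, actual_length); None means the block
    does not appear in that detection information.
    """
    _, recorded, actual = entry
    if recorded is None and actual is not None:
        return AnomalyType.FIRST   # step S231
    if recorded is not None and actual is None:
        return AnomalyType.SECOND  # step S232
    if recorded is not None and actual is not None and recorded != actual:
        return AnomalyType.THIRD   # step S233
    return None
```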

[0068] Furthermore, in one embodiment, the metadata anomaly detection method further includes the following steps:

[0069] If the metadata stored in the distributed storage system has a first type of anomaly, then the data block with the first type of anomaly will be used as the first data block.

[0070] If the corresponding file is found by the ID of the first data block, then the metadata of the first data block is added to the metadata server;

[0071] If the corresponding file cannot be found by the ID of the first data block, the current data node server is instructed to delete the data in the first data block.

[0072] Specifically, if the metadata stored in the distributed storage system exhibits a Type I anomaly, it indicates that although some data blocks are distributed across the current data node servers, their metadata failed to be successfully reported to the metadata server due to network connectivity issues, resulting in the absence of this record in the metadata server. In this case, based on the consistency comparison results, the data block exhibiting the Type I anomaly, i.e., the first data block, can be identified. The file corresponding to this data block is then searched using its ID. If the file is found using the ID, it means that the file corresponding to the first data block is indeed stored in the distributed storage system, and the metadata server simply did not record its metadata. Therefore, the metadata for the first data block is added to the metadata server, effectively eliminating the Type I anomaly in the distributed storage system. Furthermore, if the file is found using the ID of the first data block, its status can be changed: from corrupted to recoverable, or from recoverable to a normal file. If the corresponding file cannot be found by the ID of the first data block, it means that the file corresponding to the first data block has been deleted from the distributed storage system, but the current data node server has not deleted the data of the first data block in time. By instructing the current data node server to delete the data of the first data block, the first type of anomaly of the distributed storage system can be effectively eliminated.
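The handling of the first type of anomaly can be sketched as follows. Everything named here is an assumption for illustration: `find_file` stands in for whatever lookup maps a block ID to its file, `block_table` and `file_status` are plain dictionaries standing in for the metadata server's records, and the status strings are placeholders for the normal, recoverable, and corrupted states mentioned in the text.

```python
# Status improvement applied when a missing block is found again.
UPGRADE = {"corrupted": "recoverable", "recoverable": "normal"}


def handle_first_type(block_id, actual_meta, find_file, block_table, file_status):
    """Repair a block that exists on the data node but not in the metadata server.

    Returns ("added", file_id) when the metadata was restored, or
    ("delete", block_id) when the data node should delete the orphaned block.
    """
    file_id = find_file(block_id)
    if file_id is None:
        # The file no longer exists, so the block data is stale.
        return ("delete", block_id)
    # The file exists; record the block's metadata on the metadata server.
    block_table[block_id] = actual_meta
    status = file_status.get(file_id, "normal")
    file_status[file_id] = UPGRADE.get(status, status)
    return ("added", file_id)
```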

[0073] In one embodiment, the metadata anomaly detection method further includes the following steps:

[0074] If the metadata stored in the distributed storage system has a second type of anomaly, then the data block with the second type of anomaly will be used as the second data block.

[0075] The metadata of the second data block recorded by the metadata server is modified according to preset rules to indicate that the data of the second data block does not exist in the distributed storage system.

[0076] Specifically, if the metadata stored in the distributed storage system exhibits a Type II anomaly, it indicates that the metadata of certain data blocks is recorded in the metadata server, but the data of these data blocks is not stored in the data node servers. In this case, based on the consistency comparison results, the data blocks exhibiting the Type II anomaly, i.e., the second data block, are identified. The metadata of the second data block recorded in the metadata server is then modified according to preset rules to indicate that the data of the second data block does not exist in the distributed storage system. For example, the preset rule could be to modify the length of the second data block recorded in the metadata server to 1, and simultaneously modify the number of data node servers on which the second data block is distributed to 0, thereby indicating that the data of the second data block does not exist in the distributed storage system and effectively eliminating the Type II anomaly of the distributed storage system. Furthermore, since the file corresponding to the second data block has lost its data, the corresponding file can be found through the ID of the second data block, and the status of the corresponding file can be changed from a normal file to a recoverable file, or from a recoverable file to a corrupted file. Further, when the file status is a recoverable file, a recovery task needs to be issued to the current data node server, instructing the current data node server to recover the data of the second data block.
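A sketch of the second-type handling follows, under stated assumptions. The marker values are taken from the example preset rule in the paragraph above; the dictionary-based `block_table` and `file_status`, the key names `length` and `node_count`, and the status strings are hypothetical.

```python
# Status degradation applied when a recorded block has no data on the node.
DOWNGRADE = {"normal": "recoverable", "recoverable": "corrupted"}

# Marker values from the example preset rule in the text.
MISSING_LENGTH = 1
MISSING_NODE_COUNT = 0


def handle_second_type(block_id, file_id, block_table, file_status):
    """Mark a recorded block as missing and downgrade its file's status.

    Returns True when a recovery task should be issued to the data node,
    i.e. when the file is in the recoverable state after the downgrade.
    """
    entry = block_table[block_id]
    entry["length"] = MISSING_LENGTH
    entry["node_count"] = MISSING_NODE_COUNT
    status = file_status[file_id]
    file_status[file_id] = DOWNGRADE.get(status, status)
    return file_status[file_id] == "recoverable"
```

A second loss on an already recoverable file moves it to the corrupted state, at which point no recovery task is requested.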

[0077] In one embodiment, the metadata anomaly detection method further includes the following steps:

[0078] If the metadata stored in the distributed storage system has a third type of anomaly, the data block exhibiting the third type of anomaly is taken as the third data block.

[0079] The length of the third data block recorded in the metadata server is corrected with reference to the length of the third data block in the second detection information.

[0080] If the consistency comparison results show that the length of the same data block is inconsistent between the first and second detection information, the metadata recorded in the metadata server has not been updated in a timely manner and the recorded length is wrong. In this case, the data block exhibiting the third type of anomaly, i.e., the third data block, can be identified from the consistency comparison results. The length of the third data block in the second detection information, that is, the actual length of the third data block on the current data node server, is used as the reference to correct the length of the third data block recorded in the metadata server, thereby effectively eliminating the third type of anomaly in the distributed storage system.
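As a minimal sketch of this correction, assuming that both sets of detection information are represented as dictionaries mapping a block ID to a length (the function name `correct_lengths` and this representation are hypothetical):

```python
def correct_lengths(recorded, actual):
    """Overwrite recorded lengths with the lengths measured on the data node.

    recorded: block ID -> length recorded by the metadata server.
    actual:   block ID -> length reported in the second detection information.
    Returns the IDs of the blocks whose recorded length was corrected.
    """
    corrected = []
    for block_id, actual_length in actual.items():
        # Only blocks known to both sides can have a length mismatch.
        if block_id in recorded and recorded[block_id] != actual_length:
            recorded[block_id] = actual_length
            corrected.append(block_id)
    return corrected
```

Blocks that appear on only one side are left untouched here, since they belong to the first or second type of anomaly rather than the third.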

[0081] As one implementation method, before generating a detection task based on the first detection information and sending it to the current data node server in step S220, the metadata anomaly detection method provided in this application further includes:

[0082] The metadata reported by the current data node server to the metadata server between the previous detection cycle and the current detection cycle is removed from the first detection information, and the remaining metadata is used as the latest first detection information.

[0083] In a distributed storage system, data node servers and metadata servers exchange data continuously. Metadata reported by the current data node server between the previous and current detection cycles may not yet have been fully processed by the metadata server, so inconsistencies are highly likely for these entries when the current detection cycle arrives. Therefore, the metadata server removes the metadata reported by the current data node server between the previous and current detection cycles from the first detection information and uses the remaining metadata as the latest first detection information. This reduces both the number of metadata comparisons and the false alarm rate. The removed metadata can be included in the first detection information of the next detection cycle for subsequent consistency comparison.
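The filtering step can be illustrated as follows. This is a sketch under the assumption that the metadata server keeps the recorded metadata as a dictionary and tracks the IDs of blocks reported since the previous cycle in a set; `build_first_detection_info` is a hypothetical name.

```python
def build_first_detection_info(recorded, reported_since_last_cycle):
    """Return the recorded metadata minus recently reported blocks.

    recorded: block ID -> length recorded by the metadata server.
    reported_since_last_cycle: set of block IDs reported by the data node
        between the previous and the current detection cycle.
    """
    return {
        block_id: length
        for block_id, length in recorded.items()
        if block_id not in reported_since_last_cycle
    }
```

The skipped blocks are not lost: once the set of recent reports is reset for the next cycle, they are part of the recorded metadata and are compared then.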

[0084] This embodiment also provides a metadata anomaly detection method applied to a data node server, wherein the distributed storage system includes the data node server and a metadata server. As shown in Figure 3, the method includes the following steps:

[0085] Step S310: When the detection period of the current data node server is reached, a detection task generated by the metadata server based on the first detection information is received; wherein the detection period is set by the metadata server for the data node server; the first detection information is the metadata, recorded in the metadata server, of all data blocks distributed in the current data node server.

[0086] Step S320: Perform a consistency comparison between the first detection information and the second detection information, and feed back the consistency comparison result to the metadata server; the second detection information is the metadata of the data blocks actually distributed in the current data node server.

[0087] In one embodiment, step S320 above compares the consistency of the first detection information and the second detection information, and feeds back the consistency comparison result to the metadata server, including the following steps:

[0088] Step S321: Compare the first detection information and the second detection information to determine whether the length of the same data block is consistent in the first detection information and the second detection information, and whether the same data block appears in both the first detection information and the second detection information.

[0089] Step S322: If the length of the same data block is inconsistent in the first detection information and the second detection information, the metadata of the data block whose length is inconsistent in the first detection information and the second detection information is sent to the metadata server as the consistency comparison result.

[0090] Step S323: If the same data block does not appear in both the first detection information and the second detection information at the same time, the metadata of the data block that does not appear in both the first detection information and the second detection information is sent to the metadata server as a consistency comparison result.

[0091] Specifically, the metadata includes the length of the data blocks and their distribution status. Since the first and second detection information each contain multiple metadata entries, the data node server can compare the metadata entries in the first and second detection information one by one. Metadata entries that have been compared are marked as complete, while metadata entries for data blocks whose lengths differ between the first and second detection information are marked as inconsistent. After traversing all metadata entries in the first and second detection information, the data node server can determine from these markings which data blocks have inconsistent lengths and which data blocks appear in only one of the two sets of detection information. The corresponding metadata entries are then sent to the metadata server as the consistency comparison result.
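Steps S321 to S323 and the marking-based traversal can be sketched as follows. The sketch assumes that each set of detection information is a dictionary from block ID to length; the function name `compare_detection_info` and the keys of the result dictionary are hypothetical and chosen only for illustration.

```python
def compare_detection_info(first, second):
    """Compare recorded metadata (first) with actual metadata (second).

    Both arguments map block ID -> length. The result lists the blocks that
    appear only in the first information, only in the second information,
    or in both with different lengths.
    """
    result = {"only_in_first": [], "only_in_second": [], "length_mismatch": []}
    compared = set()  # entries marked as complete

    for block_id, recorded_length in first.items():
        compared.add(block_id)
        if block_id not in second:
            result["only_in_first"].append(block_id)
        elif second[block_id] != recorded_length:
            result["length_mismatch"].append(block_id)

    # Entries of the second information that were never marked exist only
    # on the data node server.
    for block_id in second:
        if block_id not in compared:
            result["only_in_second"].append(block_id)

    return result
```

Only the entries in the three lists need to be sent back to the metadata server, so the size of the feedback grows with the number of inconsistencies rather than with the number of blocks.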

[0092] Furthermore, in one embodiment, the metadata anomaly detection method provided in this application further includes the following steps:

[0093] Step S330: If a data block has changed on a data node server, the metadata of the changed data block is reported to the metadata server.

[0094] When data blocks on a data node server change, for example when data blocks are added or deleted, the data node server on which the change occurred reports the metadata of the changed data blocks to the metadata server. This keeps the metadata in the data node server consistent with the metadata in the metadata server between two detection cycles. Specifically, when a disk is inserted into a data node server, data blocks are added: the data node server mounts the new disk, scans all data blocks on that disk, establishes a mapping from the disk slot number to the set of data blocks, and reports the metadata of the data blocks on that disk to the metadata server. When a disk is removed from a data node server, data blocks are deleted: upon detecting the disk removal, the data node server collects the metadata of all data blocks recorded for that disk and reports this metadata to the metadata server as a deletion list. When a data node server has newly written data blocks, it periodically reports the metadata of these data blocks to the metadata server via heartbeat.
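The three reporting paths (disk insertion, disk removal, heartbeat) can be sketched as below. The class `DataNodeReporter`, its method names and the shape of the report messages are all hypothetical; the actual transport to the metadata server is omitted.

```python
class DataNodeReporter:
    """Tracks blocks per disk slot and builds reports for the metadata server."""

    def __init__(self):
        self.slot_blocks = {}  # disk slot number -> {block ID: length}
        self.pending = {}      # newly written blocks awaiting the next heartbeat

    def on_disk_inserted(self, slot, scanned_blocks):
        # Record the slot-to-blocks mapping and report every scanned block.
        self.slot_blocks[slot] = dict(scanned_blocks)
        return {"type": "add", "blocks": dict(scanned_blocks)}

    def on_disk_removed(self, slot):
        # Report all blocks recorded for the removed disk as a deletion list.
        removed = self.slot_blocks.pop(slot, {})
        return {"type": "delete", "blocks": sorted(removed)}

    def on_block_written(self, slot, block_id, length):
        self.slot_blocks.setdefault(slot, {})[block_id] = length
        self.pending[block_id] = length

    def heartbeat(self):
        # Send the blocks written since the last heartbeat, then reset.
        report, self.pending = {"type": "add", "blocks": self.pending}, {}
        return report
```

Keeping the slot-to-blocks mapping is what allows the deletion list to be built after the disk is gone, when its contents can no longer be scanned.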

[0095] This embodiment also provides a distributed storage system, which includes a data node server and a metadata server;

[0096] The metadata server is used to set the detection period for each data node server; the data of each data block of a file stored in the distributed storage system is distributed across different data node servers.

[0097] The metadata server is used to generate a detection task based on the first detection information and send it to the current data node server when the detection cycle of the current data node server is reached. The first detection information is the metadata of all data blocks distributed in the current data node server, which is recorded in the metadata server.

[0098] The data node server is used to perform a consistency comparison between the first detection information and the second detection information, and send the consistency comparison result to the metadata server; wherein, the second detection information is the metadata of the data blocks actually distributed in the current data node server.

[0099] The metadata server is used to confirm whether there are any anomalies in the metadata stored in the distributed storage system based on the received consistency comparison results.

[0100] The aforementioned distributed storage system utilizes a metadata server to set a detection cycle for each data node server, distributing data transmission tasks and effectively reducing network pressure caused by centralized data transmission. Furthermore, distributing metadata consistency comparison tasks across various data node servers significantly reduces the CPU load on the metadata server, thereby effectively improving the stability of metadata anomaly detection.
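One possible way to set a separate detection cycle for each data node server is to stagger the detection times within a common period, so that the detection tasks are not all sent at once. The application does not specify how the cycles are chosen; the even staggering below, and the names `assign_detection_times` and `due_nodes`, are assumptions made purely for illustration.

```python
def assign_detection_times(node_ids, period_seconds, start=0.0):
    """Spread the first detection time of each node evenly over one period."""
    if not node_ids:
        return {}
    step = period_seconds / len(node_ids)
    return {node: start + i * step for i, node in enumerate(node_ids)}


def due_nodes(next_due, now, period_seconds):
    """Return the nodes whose detection time has arrived and reschedule them."""
    due = [node for node, t in next_due.items() if t <= now]
    for node in due:
        next_due[node] += period_seconds
    return due
```

For each node returned by `due_nodes`, the metadata server would generate a detection task from that node's first detection information and send it to the node.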

[0101] In one embodiment, the data node server is further configured to perform a consistency comparison between the first detection information and the second detection information to determine whether the length of the same data block is consistent in the first detection information and the second detection information, and whether the same data block appears simultaneously in the first detection information and the second detection information; if the length of the same data block is inconsistent in the first detection information and the second detection information, the metadata of the data block whose length is inconsistent in the first detection information and the second detection information is sent to the metadata server as a consistency comparison result; if the same data block does not appear simultaneously in the first detection information and the second detection information, the metadata of the data block that does not appear simultaneously in the first detection information and the second detection information is sent to the metadata server as a consistency comparison result.

[0102] In one embodiment, the metadata server is further configured to: if the consistency comparison result shows that a data block appearing in the second detection information does not appear in the first detection information, then determine that the metadata stored in the distributed storage system has a first type of anomaly; and / or, if the consistency comparison result shows that a data block appearing in the first detection information does not appear in the second detection information, then determine that the metadata stored in the distributed storage system has a second type of anomaly; and / or, if the consistency comparison result shows that the length of the same data block is inconsistent in the first detection information and the second detection information, then determine that the metadata stored in the distributed storage system has a third type of anomaly.

[0103] In one embodiment, the metadata server is further configured to: if the metadata stored in the distributed storage system has a first type of anomaly, then designate the data block with the first type of anomaly as the first data block; if the corresponding file is found by the ID of the first data block, then add the metadata of the first data block to the metadata server; if the corresponding file cannot be found by the ID of the first data block, then instruct the current data node server to delete the data of the first data block.
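The handling of the first type of anomaly can be sketched as follows. The sketch assumes, hypothetically, that a block ID has the form `<file_id>:<index>` so that the owning file can be derived from it; the application only states that the file is looked up by the ID of the first data block. The function name `handle_type1` and the returned action strings are likewise illustrative.

```python
def handle_type1(block_id, actual_length, known_files, recorded):
    """Handle a block that exists on the data node but is not recorded.

    known_files: set of file IDs known to the metadata server.
    recorded:    block ID -> length recorded by the metadata server.
    Returns the action taken for the block.
    """
    file_id = block_id.split(":", 1)[0]  # assumed "<file_id>:<index>" layout
    if file_id in known_files:
        # The owning file exists: add the missing metadata.
        recorded[block_id] = actual_length
        return "metadata_added"
    # No owning file: the block is an orphan, so the data node should delete it.
    return "delete_on_data_node"
```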

[0104] In one embodiment, the metadata server is further configured to, if there is a second type of anomaly in the metadata stored in the distributed storage system, use the data block with the second type of anomaly as the second data block; and modify the metadata of the second data block recorded by the metadata server according to a preset rule to indicate that the data of the second data block does not exist in the distributed storage system.

[0105] In one embodiment, the metadata server is further configured to, if the metadata stored in the distributed storage system has a third type of anomaly, use the data block with the third type of anomaly as the third data block; and, with reference to the length of the third data block in the second detection information, correct the length of the third data block recorded in the metadata server.

[0106] In one embodiment, before generating a detection task based on the first detection information and sending it to the current data node server, the metadata server is also used to remove the metadata of the data blocks of files in the write state from the first detection information and use the remaining metadata as the latest first detection information.

[0107] In one embodiment, if a data block changes on the data node server, the data node server is also used to report the metadata of the changed data block to the metadata server.

[0108] In one embodiment, before generating a detection task based on the first detection information and sending it to the current data node server, the metadata server is further configured to remove the metadata reported by the current data node server to the metadata server between the previous detection cycle and the current detection cycle from the first detection information, and use the remaining metadata as the latest first detection information.

[0109] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 4. The computer device includes a processor, memory, network interface, and database connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and database. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The database stores a set of preset configuration information. The network interface communicates with external terminals via a network connection. When executed by the processor, the computer program implements a metadata anomaly detection method.

[0110] In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, memory, a network interface, a display screen, and an input device connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system and computer programs. The memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal via a network connection. When the computer program is executed by the processor, it implements a metadata anomaly detection method. The display screen of the computer device may be a liquid crystal display (LCD) or an e-ink display. The input device of the computer device may be a touch layer covering the display screen, or buttons, a trackball, or a touchpad located on the casing of the computer device, or an external keyboard, touchpad, or mouse, etc.

[0111] Those skilled in the art will understand that the structure shown in Figure 4 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0112] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, the computer program performing the following steps when executed by a processor:

[0113] A detection cycle is set for each data node server; the data of each data block of a file stored in the distributed storage system is distributed across different data node servers.

[0114] When the detection cycle of the current data node server is reached, a detection task is generated and sent to the current data node server based on the first detection information; wherein, the first detection information is the metadata of all data blocks distributed in the current data node server, which is recorded in the metadata server.

[0115] The consistency comparison result fed back by the current data node server is received, and whether there are any anomalies in the metadata stored in the distributed storage system is confirmed based on that result. The consistency comparison result is obtained by the current data node server by comparing the first detection information with the second detection information. The second detection information is the metadata of the data blocks actually distributed in the current data node server.

[0116] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0117] If the consistency comparison result shows that there are data blocks that appear in the second detection information but do not appear in the first detection information, then it is determined that the metadata stored in the distributed storage system has a first type of anomaly; and / or,

[0118] If the consistency comparison results show that there are data blocks that appear in the first detection information but do not appear in the second detection information, then it is determined that the metadata stored in the distributed storage system contains a second type of anomaly; and / or,

[0119] If the consistency comparison results show that the length of the same data block is inconsistent in the first and second detection information, then it is determined that there is a third type of anomaly in the metadata stored in the distributed storage system.

[0120] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0121] If the metadata stored in the distributed storage system has a first type of anomaly, the data block exhibiting the first type of anomaly is taken as the first data block.

[0122] If the corresponding file is found by the ID of the first data block, then the metadata of the first data block is added to the metadata server;

[0123] If the corresponding file cannot be found by the ID of the first data block, the current data node server is instructed to delete the data in the first data block.

[0124] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0125] If the metadata stored in the distributed storage system has a second type of anomaly, the data block exhibiting the second type of anomaly is taken as the second data block.

[0126] The metadata of the second data block recorded by the metadata server is modified according to preset rules to indicate that the data of the second data block does not exist in the distributed storage system.

[0127] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0128] If the metadata stored in the distributed storage system has a third type of anomaly, the data block exhibiting the third type of anomaly is taken as the third data block.

[0129] The length of the third data block recorded in the metadata server is corrected with reference to the length of the third data block in the second detection information.

[0130] In one embodiment, before generating a detection task based on the first detection information and sending it to the current data node server, the processor executes the computer program and further performs the following steps:

[0131] The metadata of the data blocks of files in the write state is removed from the first detection information, and the remaining metadata is used as the latest first detection information.

[0132] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0133] When the detection cycle of the current data node server is reached, the detection task generated by the metadata server based on the first detection information is received; wherein the detection cycle is set by the metadata server for the data node server; the first detection information is the metadata, recorded in the metadata server, of all data blocks distributed in the current data node server;

[0134] The first detection information and the second detection information are compared for consistency, and the consistency comparison result is fed back to the metadata server; the second detection information is the metadata of the data blocks actually distributed in the current data node server.

[0135] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0136] The first detection information and the second detection information are compared for consistency to determine whether the length of the same data block is consistent in the first detection information and the second detection information, and whether the same data block appears in both the first detection information and the second detection information.

[0137] If the length of the same data block is inconsistent in the first detection information and the second detection information, the metadata of the data block whose length is inconsistent in the first detection information and the second detection information will be sent to the metadata server as the consistency comparison result.

[0138] If the same data block does not appear in both the first and second detection information, the metadata of the data block that does not appear in both the first and second detection information will be sent to the metadata server as a consistency comparison result.

[0139] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0140] If a data block changes on a data node server, the metadata of the changed data block is reported to the metadata server.

[0141] The aforementioned storage medium sets a detection period for each data node server. The data of each data block of a file stored in the distributed storage system is distributed across different data node servers. When the detection period for the current data node server arrives, a detection task is generated based on first detection information and sent to the current data node server. The first detection information is the metadata of all data blocks distributed across the current data node server, recorded in the metadata server. The system receives the consistency comparison result from the current data node server and confirms whether there are any anomalies in the metadata stored in the distributed storage system based on the consistency comparison result. The consistency comparison result is the consistency comparison result obtained by the current data node server by comparing the first and second detection information. The second detection information is the metadata of the data blocks actually distributed across the current data node server. This application, by setting a detection period for each data node server, distributes the data transmission task, effectively reducing network pressure caused by centralized data transmission. Furthermore, distributing the metadata consistency comparison task across various data node servers significantly reduces the CPU pressure on the metadata server, thereby effectively improving the stability of metadata anomaly detection.

[0142] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0143] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.

[0144] It should be understood that the specific embodiments described herein are merely illustrative of the application and not intended to limit it. All other embodiments derived by those skilled in the art based on the embodiments provided in this application without inventive effort are within the scope of protection of this application.

[0145] Obviously, the accompanying drawings are merely some examples or embodiments of this application. Those skilled in the art can apply this application to other similar situations based on these drawings without any creative effort. Furthermore, it is understood that although the work done in this development process may be complex and lengthy, for those skilled in the art, certain design, manufacturing, or production modifications made based on the technical content disclosed in this application are merely conventional technical means and should not be considered as insufficient disclosure of this application.

[0146] The term "embodiment" in this application refers to a specific feature, structure, or characteristic described in connection with an embodiment that may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily imply the same embodiment, nor does it imply that it is mutually exclusive with or independent of other embodiments. It will be clearly or implicitly understood by those skilled in the art that the embodiments described in this application may be combined with other embodiments without conflict.

[0147] The above embodiments merely illustrate several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of patent protection. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the appended claims.

Claims

1. A method for detecting metadata anomalies, characterized in that the method is applied to a metadata server, wherein the distributed storage system includes a data node server and the metadata server; the method includes the following steps: A detection period is set for each of the data node servers; wherein, the data of each data block of the file stored in the distributed storage system is distributed across different data node servers; When the detection cycle of the current data node server is reached, a detection task is generated and sent to the current data node server according to the first detection information; wherein, the first detection information is the metadata of all data blocks distributed in the current data node server recorded in the metadata server; The system receives the consistency comparison result fed back by the current data node server and confirms whether there is any anomaly in the metadata stored in the distributed storage system based on the consistency comparison result. The consistency comparison result is the consistency comparison result obtained by the current data node server by comparing the first detection information and the second detection information. The second detection information is the metadata of the data blocks actually distributed in the current data node server. The process of comparing the first detection information and the second detection information to obtain the consistency comparison result includes: comparing the first detection information and the second detection information to determine whether the length of the same data block is consistent in the first detection information and the second detection information, and whether the same data block appears in both the first detection information and the second detection information. If the length of the same data block is inconsistent in the first detection information and the second detection information, the metadata of the data block with inconsistent length in the first detection information and the second detection information is used as the consistency comparison result. If the same data block does not appear in both the first detection information and the second detection information, the metadata of the data block that does not appear in both the first detection information and the second detection information is used as the consistency comparison result.

2. The metadata anomaly detection method according to claim 1, characterized in that, The step of confirming whether there are any anomalies in the metadata stored in the distributed storage system based on the consistency comparison results includes the following steps: If the consistency comparison result shows that there are data blocks that appear in the second detection information but do not appear in the first detection information, then it is determined that the metadata stored in the distributed storage system has a first type of anomaly; and / or, If the consistency comparison result shows that there are data blocks that appear in the first detection information but do not appear in the second detection information, then it is determined that the metadata stored in the distributed storage system has a second type of anomaly; and / or, If the consistency comparison result shows that the length of the same data block is inconsistent in the first detection information and the second detection information, then it is determined that the metadata stored in the distributed storage system has a third type of anomaly.

3. The metadata anomaly detection method according to claim 2, characterized in that, The method further includes: If the metadata stored in the distributed storage system has a first type of anomaly, then the data block with the first type of anomaly will be used as the first data block. If the corresponding file is found by the ID of the first data block, then the metadata of the first data block is added to the metadata server; If the corresponding file cannot be found using the ID of the first data block, then the current data node server is instructed to delete the data in the first data block.

4. The metadata anomaly detection method according to claim 2, characterized in that, The method further includes: If the metadata stored in the distributed storage system has a second type of anomaly, then the data block with the second type of anomaly will be used as the second data block. The metadata of the second data block recorded by the metadata server is modified according to a preset rule to indicate that the data of the second data block does not exist in the distributed storage system.

5. The metadata anomaly detection method according to claim 2, characterized in that, The method further includes: If the metadata stored in the distributed storage system has a third type of anomaly, then the data block containing the third type of anomaly will be used as the third data block. The length of the third data block recorded in the metadata server is corrected with reference to the length of the third data block in the second detection information.

6. The metadata anomaly detection method according to claim 1, characterized in that, before generating a detection task based on the first detection information and sending it to the current data node server, the method further includes: The metadata of the data blocks of files in the write state is removed from the first detection information, and the remaining metadata is used as the latest first detection information.

7. A metadata anomaly detection method, characterized in that, the method is applied to a data node server of a distributed storage system, wherein the distributed storage system includes the data node server and a metadata server; the method includes the following steps: when the detection period of the current data node server is reached, receiving a detection task generated by the metadata server based on first detection information; wherein the detection period is set by the metadata server for the data node server, and the first detection information is the metadata, recorded in the metadata server, of all data blocks distributed in the current data node server; comparing the first detection information and second detection information for consistency, and feeding back the consistency comparison result to the metadata server; wherein the second detection information is the metadata of the data blocks actually distributed in the current data node server; the step of comparing the first detection information and the second detection information for consistency and feeding back the consistency comparison result to the metadata server includes: comparing the first detection information and the second detection information for consistency, determining whether the length of the same data block is consistent in the first detection information and the second detection information, and whether the same data block appears in both the first detection information and the second detection information; if the length of the same data block is inconsistent in the first detection information and the second detection information, sending the metadata of the data block with the inconsistent length to the metadata server as the consistency comparison result; if a data block does not appear in both the first detection information and the second detection information, sending the metadata of the data block that appears in only one of them to the metadata server as the consistency comparison result.
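
The comparison performed on the data node server in claim 7 can be sketched as a set comparison over block IDs. The sketch assumes both sets of detection information are reduced to a mapping from block ID to block length; the function name `compare` and the result layout are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (block_id, length recorded by the metadata server, length on the data node server)
Entry = Tuple[str, Optional[int], Optional[int]]


def compare(first_info: Dict[str, int], second_info: Dict[str, int]) -> List[Entry]:
    """Return only the blocks that are inconsistent between the two sides.

    A length of None means the block is absent from that side.
    """
    result: List[Entry] = []
    for block_id in sorted(first_info.keys() | second_info.keys()):
        recorded = first_info.get(block_id)
        actual = second_info.get(block_id)
        # Covers both a missing block (one side is None) and a length mismatch.
        if recorded != actual:
            result.append((block_id, recorded, actual))
    return result
```

Consistent blocks are omitted from the result, so the feedback sent to the metadata server grows with the number of inconsistencies rather than with the number of blocks stored on the node.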

8. The metadata anomaly detection method according to claim 7, characterized in that, the method further includes: if a data block on the data node server changes, reporting the metadata of the changed data block to the metadata server.

9. A distributed storage system, characterized in that, the distributed storage system includes data node servers and a metadata server; the metadata server is configured to set a detection period for each of the data node servers; wherein the data of the data blocks of a file stored in the distributed storage system is distributed across different data node servers; the metadata server is configured to generate a detection task based on first detection information and send it to the current data node server when the detection period of the current data node server is reached; wherein the first detection information is the metadata, recorded in the metadata server, of all data blocks distributed in the current data node server; the data node server is configured to perform a consistency comparison between the first detection information and second detection information, and send the consistency comparison result to the metadata server; wherein the second detection information is the metadata of the data blocks actually distributed in the current data node server; the step of performing a consistency comparison between the first detection information and the second detection information, and feeding back the consistency comparison result to the metadata server, includes: performing a consistency comparison between the first detection information and the second detection information, determining whether the length of the same data block is consistent in the first detection information and the second detection information, and whether the same data block appears in both the first detection information and the second detection information; if the length of the same data block is inconsistent in the first detection information and the second detection information, sending the metadata of the data block with the inconsistent length to the metadata server as the consistency comparison result; if a data block does not appear in both the first detection information and the second detection information, sending the metadata of the data block that appears in only one of them to the metadata server as the consistency comparison result; the metadata server is further configured to confirm, based on the received consistency comparison result, whether the metadata stored in the distributed storage system is abnormal.
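
The per-node detection period of claim 9 can be sketched as a small scheduling table kept by the metadata server. The claims do not specify how periods are tracked, so the `NodeSchedule` structure, the use of seconds, and the `due_nodes` function are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeSchedule:
    period_s: float  # detection period set for this data node server (seconds, assumed unit)
    next_due: float  # time at which the next detection task should be generated


def due_nodes(schedules: Dict[str, NodeSchedule], now: float) -> List[str]:
    """Return the data node servers whose detection period has been reached.

    Each returned node is rescheduled one period after `now`.
    """
    due: List[str] = []
    for node_id in sorted(schedules):
        schedule = schedules[node_id]
        if now >= schedule.next_due:
            due.append(node_id)
            schedule.next_due = now + schedule.period_s
    return due
```

Giving each data node server its own period lets detection tasks be spread over time instead of being sent to every node at once.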