Method and device for updating data in distributed storage system

A distributed storage and data update technology, applied in the field of distributed systems, that addresses problems such as the inability to guarantee data consistency across multiple replicas and increased overhead, achieving the effects of ensuring data consistency, improving efficiency, and preserving read performance

Active Publication Date: 2013-09-11
SHANDA INTERACTIVE ENTERTAINMENT
Cites: 6 | Cited by: 62

AI-Extracted Technical Summary

Problems solved by technology

[0005] In view of this, the present invention provides a method and device for updating data in a distributed storage system to overcome the pr...


Abstract

The invention discloses a method and a device for updating data in a distributed storage system. The method comprises the following steps: a current server node assigns a unique, monotonically increasing version number to the data to be updated, and obtains, from a metadata information repository, the identifiers of the multiple replica server nodes where the replicas of the data to be updated are located; the current server node sends the data to be updated and the assigned version number to the replica server nodes corresponding to those identifiers, so that each replica server node updates its stored replica and the corresponding version number according to the data to be updated, where the version number represents the number of times the replica has been updated; and the current server node judges whether more than half of the replica server nodes have updated the data successfully, and if so, returns a data-update-success message and the updated version number to the client. With the method and device, data consistency among the multiple replicas can be guaranteed without increasing overhead.

Application Domain

Transmission; Special data processing applications

Technology Topic

Client-side; Metadata +3

Image

  • Method and device for updating data in distributed storage system

Examples

  • Experimental program (4)

Example Embodiment

[0050] Example 1
[0051] Referring to Figure 1, which is a flowchart of Embodiment 1 of the data update method in the distributed storage system disclosed in an embodiment of the present invention. In this embodiment, the method may include:
[0052] Step 101: The current server node receives the data to be updated sent by the client.
[0053] The client first sends the data to be updated to a server node in the distributed storage system. For example, suppose user information is stored on the server nodes of the distributed storage system and needs to be changed: the new user information that the client sends to one of the server nodes is the data to be updated, and the server node that receives it is the current server node.
[0054] Step 102: The current server node incrementally assigns a unique version number to the data to be updated, and acquires the identifiers of multiple replica server nodes where multiple copies of the data to be updated are located from the metadata information repository.
[0055] The metadata information repository is pre-established and stores the identifier of each server node in the distributed storage system, the distribution information of the replicas among the server nodes, and the status of the replicas. When a server node starts, it can register its replica distribution information and replica status with the metadata information repository through its own data service module, and then maintain a heartbeat with the repository. The metadata information repository maintains the replica distribution information and replica state data, and externally provides a replica query interface and a replica state-change monitoring interface.
[0056] After receiving the data to be updated sent by the client, the current server node may acquire, from the metadata information repository, the identifiers of the multiple replica server nodes where the copies of the data to be updated are located. In a distributed storage system, a piece of data is divided into multiple sub-data items stored across multiple server nodes, with each such server node holding a copy of some sub-data item. Therefore, when the data to be updated is received, it is first necessary to determine which server nodes store it, that is, the multiple replica server nodes where its copies are located.
[0057] For example, suppose the data to be updated is user information, and the user information is divided into 10 parts with each part replicated on a group of 5 servers: servers 1 to 5 hold the 1st to 100th user records, servers 6 to 10 hold the 101st to 200th, and so on, until servers 46 to 50 hold the 901st to 1000th, with each server node in a group storing a copy of that part. If the data to be updated in this step is the 99th user record, it can be determined that the copies saved on server nodes 1 to 5 need to be updated; that is, server nodes 1 to 5 are the replica server nodes determined in this step.
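The partitioning in the worked example above can be sketched as a simple range lookup. This is a minimal illustration with assumed constants (10 sub-ranges of 100 user records, 5 replica servers per range), not the patent's implementation:

```python
GROUP_SIZE = 5      # replica servers per sub-range (assumed from the example)
RANGE_SIZE = 100    # user records per sub-range (assumed from the example)

def replica_servers(user_index: int) -> list[int]:
    """Return the 1-based server numbers holding copies of this user record."""
    group = (user_index - 1) // RANGE_SIZE   # 0-based sub-range index
    first = group * GROUP_SIZE + 1           # first server of the replica group
    return list(range(first, first + GROUP_SIZE))
```

Under these assumptions, record 99 maps to servers 1-5 and record 1000 to servers 46-50, matching the example.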
[0058] When the current server node assigns a version number to the data to be updated, for a given data item the version number of the first update can be assigned as 1, the second as 2, and so on, so that after any number of updates each update carries a unique version number. Each time a replica server receives the data to be updated together with a version number, it updates the data and records the received version number as the copy's current version number.
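The incremental version assignment described above can be sketched as a small per-key counter. The class and method names here are illustrative assumptions, not taken from the patent:

```python
import threading

class VersionAllocator:
    """Sketch: per-data-item monotonically increasing version numbers,
    starting at 1 for the first update (illustrative names)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions: dict[str, int] = {}

    def next_version(self, key: str) -> int:
        # Serialize allocation so each key's versions are unique and increasing.
        with self._lock:
            v = self._versions.get(key, 0) + 1
            self._versions[key] = v
            return v
```

The lock models the requirement that the current server node hands out each version number exactly once per data item.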
[0059] Step 103: The current server node sends the data to be updated and its assigned version number to the replica server nodes corresponding to the identifiers of the multiple replica server nodes, so that the multiple replica server nodes store the data separately according to the data to be updated. The copy and the corresponding version number are updated; the version number indicates the update times of the copy.
[0060] The current server node sends the data to be updated to the replica server nodes corresponding to the identifiers of the multiple replica server nodes, and the replica server nodes then update their stored copies and the corresponding version numbers according to the data to be updated, where the version number represents the number of times the copy has been updated. For example, a version number of 1 indicates that the current update is the first update, and a version number of n indicates the nth update. When a replica server performs a data update, it also needs to check that the version number is consistent, that is, that the replica version number it has recorded is exactly 1 smaller than the received version number; if not, it does not perform the update.
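The replica-side version check just described can be sketched as follows; a replica accepts an update only when the received version number is exactly one greater than its recorded version. Names are illustrative assumptions:

```python
class Replica:
    """Sketch of a replica server node's update rule (illustrative names)."""

    def __init__(self) -> None:
        self.version = 0   # number of updates applied so far
        self.data = None   # the stored copy

    def apply_update(self, data, version: int) -> bool:
        # Accept only the next expected version; otherwise report failure
        # so the current server node does not count this as a success.
        if version != self.version + 1:
            return False
        self.data = data
        self.version = version
        return True
```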
[0061] The current server node may send data to the multiple replica server nodes in different ways, for example, the data to be updated may be sent to the multiple replica server nodes in parallel or in an orderly manner.
[0062] Step 104: the current server node judges whether more than half of the multiple replica server nodes successfully updated the data, and if so, goes to step 105.
[0063] After a replica server node successfully updates the copy it stores, it informs the current server node. If the current server node determines that more than half of the replica server nodes have successfully updated the data, it goes to step 105 and returns the data-update-success message and the updated version number to the client. For example, following the above example with 5 replica servers, if three replica server nodes are successfully updated, step 105 is executed; if there were 6 replica servers in total, at least 4 of them would need to update successfully.
[0064] Step 105: Return the data update success message and the updated version number to the client.
[0065] In this embodiment, an update is considered successful once more than half of the replica server nodes have been updated successfully, which improves the efficiency of updating data. At the same time, a scheme in which the version number of the data corresponds to the number of updates is adopted, so that when the client reads and carries the successfully written version number to a server node, if that node's version number indicates that its stored copy is not the latest, the read operation can be rejected; the client then retries on other server nodes and is thereby guaranteed to read the latest data, that is, read-write consistency is achieved. Therefore, this embodiment solves the problem of data consistency among multiple copies more completely while preserving subsequent read performance, and the solution provided by the embodiments of the present invention can be applied directly to various storage systems.
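The write path of steps 102 to 105 can be sketched as a majority-quorum check. This is a minimal sketch with hypothetical send callables standing in for the per-replica update RPCs, not the patent's implementation:

```python
from typing import Callable, Sequence

def quorum_write(send_fns: Sequence[Callable[[object, int], bool]],
                 data: object, version: int) -> bool:
    """Send (data, version) to every replica; succeed once more than
    half of the replicas acknowledge the update (illustrative sketch)."""
    acks = sum(1 for send in send_fns if send(data, version))
    return acks > len(send_fns) // 2
```

With 5 replicas, 3 acknowledgements suffice; with 6 replicas, at least 4 are needed, matching the example in [0063].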

Example Embodiment

[0066] Embodiment 2
[0067] Referring to Figure 2, which is a flowchart of Embodiment 2 of the method for updating data in a distributed storage system disclosed in an embodiment of the present invention. In addition to steps 101 to 104 of Embodiment 1, after step 104 the method may further include:
[0068] Step 201: The current server node caches the data to be updated, and deletes the data to be updated after all the multiple replica server nodes are updated.
[0069] The current server node first caches the data to be updated and deletes it once the data has been successfully updated on all replica server nodes. It may also delete the cached data when its space is insufficient, which saves the storage space of the current server node.
[0070] It should be noted that, in this embodiment of the present invention, the recently updated data can additionally be cached on at least one replica node; that is, on such a replica server node, not only is the stored copy updated, but the most recently updated data is also saved for that copy in version-number order. In this way, if the current server fails and its own cached data is lost, the other replica servers can still recover from the latest data cached on that replica server.
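The version-ordered cache of recent updates described above can be sketched with an insertion-ordered map that evicts the oldest versions. Class names and the capacity bound are illustrative assumptions:

```python
from collections import OrderedDict

class UpdateLog:
    """Sketch: keep recently updated data in version order on a replica so a
    failed node can recover the latest items (illustrative names)."""

    def __init__(self, capacity: int = 100) -> None:
        self._log: "OrderedDict[int, object]" = OrderedDict()
        self._capacity = capacity

    def record(self, version: int, data: object) -> None:
        self._log[version] = data
        while len(self._log) > self._capacity:
            self._log.popitem(last=False)   # evict the oldest cached version

    def latest(self) -> tuple:
        # Newest entry: the most recently recorded (version, data) pair.
        version = next(reversed(self._log))
        return version, self._log[version]
```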
[0071] Step 202: The replica server node receives the version number carried by the client when reading the data.
[0072] When the client needs to perform a data read operation, it carries, in the request sent to the replica server node, the version number of the data to be read this time.
[0073] Step 203: The replica server node determines whether the carried version number is newer than the version number of the replica stored by itself, and if so, rejects the current data read operation.
[0074] The replica server node determines whether the carried version number is newer than the version number of the copy it stores. If so, for example when the carried version number is 3 while the replica server's own copy has version number 2, the stored copy has been updated only twice while the client needs the data that has been updated three times, so this read operation is not allowed. If the carried version number is the same as or older than the version number of the stored copy, the stored copy can serve as the data the client needs to read this time, and the read operation is accepted.
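The read-side check in step 203 can be sketched as a single comparison; function and parameter names are illustrative assumptions:

```python
def serve_read(stored_version: int, stored_data: object,
               requested_version: int):
    """Sketch of step 203: reject the read when the client's version number
    is newer than the locally stored copy's version (illustrative names)."""
    if requested_version > stored_version:
        return None          # reject: local copy is stale; client retries elsewhere
    return stored_data       # local copy is current enough to serve
```

Returning a rejection (here `None`) is what lets the client retry on other replica server nodes until it finds one holding the latest version.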
[0075] Step 204: When the current server node restarts, a request is sent to the plurality of replica server nodes to obtain the latest version numbers corresponding to the replica server nodes.
[0076] In practical applications, once the current server has gone down or restarted, the current server node sends a request to the other replica server nodes to obtain the latest version numbers held by those replica servers.
[0077] Step 205: The current server updates the initial version number of the copy to the latest version number, so that the data to be updated is subsequently allocated with the latest version number as the initial version number.
[0078] The current server node updates the initial version number of the copy to the latest version number obtained in step 204, so that when a version number is subsequently assigned to data to be updated, the latest version number serves as the starting point of the assignment.
[0079] Step 206: Compare the version numbers of the copies saved by the multiple replica server nodes; if some replica server node's version number is smaller than that of the others, trigger the replica server node with the smaller version number to request, from a replica server node with the larger version number, the copy corresponding to that larger version number.
[0080] The current server node then compares the version numbers obtained from the multiple replica server nodes. If there are simultaneously replica server nodes with version number 1 and replica server nodes with version number 2, the current server triggers the node with version number 1 to request the copy corresponding to the larger version number from a node with version number 2, thereby ensuring consistency among the data copies even when a server node encounters an unexpected situation.
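The repair step just described can be sketched as planning a fetch for every lagging replica from a replica that holds the newest version. The function name and the (node, version) map shape are illustrative assumptions:

```python
def repair_plan(versions: dict) -> list:
    """Sketch of step 206: for each replica whose version number lags the
    maximum, pair it with a replica holding the newest version so it can
    request the up-to-date copy (illustrative names)."""
    newest = max(versions.values())
    source = next(node for node, v in versions.items() if v == newest)
    return [(lagging, source)
            for lagging, v in versions.items() if v < newest]
```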
[0081] It should be noted that, during the data update process, steps 201 to 205 above are not limited to the sequence defined above.
[0082] In the embodiment of the present invention, the consistency of data updates can thus be ensured, and the problem of data recovery after downtime and restart can be solved by having at least one server node cache the latest data.
[0083] The method is described in detail in the embodiments disclosed above; the method of the present invention can be implemented by devices of various forms. Therefore, the present invention also discloses a data update device in a distributed storage system, which is described in detail in the specific embodiments below.

Example Embodiment

[0084] Embodiment 3
[0085] Referring to Figure 3, which is a schematic structural diagram of Embodiment 1 of a data update apparatus in a distributed storage system disclosed in an embodiment of the present invention. In this embodiment, the apparatus may include:
[0086] A metadata information repository 301, configured to store the identifier of each server node in the distributed storage system, the distribution information of the replicas among the server nodes, and the status of the replicas;
[0087] A receiving module 302, configured to receive the data to be updated sent by the client;
[0088] An allocation module 303, configured to incrementally allocate a unique version number for the data to be updated;
[0089] An obtaining module 304, configured to obtain, from the metadata information repository, the identifiers of the multiple replica server nodes where the copies of the data to be updated are located;
[0090] A sending module 305, configured to send the data to be updated and the allocated version number to the replica server nodes corresponding to those identifiers, so that the replica server nodes update their stored copies and the corresponding version numbers according to the data to be updated, where the version number indicates the number of updates of the copy;
[0091] A judging module 306, configured to judge whether more than half of the multiple replica server nodes have updated the data successfully and, if so, to trigger the returning module 307;
[0092] A returning module 307, configured to return the data-update-success message and the updated version number to the client.
[0093] In this embodiment, the data update device in the distributed storage system considers an update successful once more than half of the replica server nodes have been updated successfully, which improves the efficiency of updating data. At the same time, a scheme in which the version number of the data corresponds to the number of updates is adopted, so that when the client reads and carries the successfully written version number to a server node, if that node's version number indicates that its stored copy is not the latest, the read operation can be rejected; the client then retries on other server nodes and is thereby guaranteed to read the latest data, that is, read-write consistency is achieved. Therefore, this embodiment solves the problem of data consistency among multiple copies more completely while preserving subsequent read performance, and the solution provided by the embodiments of the present invention can be applied directly to various storage systems.
