Database backup system ensuring consistency between primary and mirrored backup database copies despite backup interruption

A database management system ensures consistency between primary and mirrored backup copies of a database despite the occurrence of a suspending condition that interrupts the normal mirroring of the primary database. One or more primary controllers are provided, each having a data storage unit with multiple primary data storage devices. Multiple secondary controllers each have multiple associated secondary data storage devices, and each secondary controller is coupled to one primary controller. One or more primary databases reside on the primary devices, with a corresponding number of secondary databases residing on the secondary devices; each secondary database mirrors a corresponding primary database. Either a host attached to a primary controller, or one of the primary controllers itself, maintains a map cross-referencing each primary and secondary database with the primary and secondary devices that contain portions of it. If a predefined "suspending condition" affecting data mirroring occurs, the host or primary controller consults its map to identify all primary and secondary devices affected by the condition. Each primary controller then stops all ongoing and future reads and writes with each of its affected primary devices, and directs each secondary controller that has an affected secondary device to stop mirroring the primary databases whose copies are stored on that device. Next, each primary controller starts intermediate change recording and resumes reads and writes with its primary devices. When the suspending condition ends, each primary controller applies the appropriate logged changes to its secondary database(s) and then reactivates each secondary database.
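The suspend/record/resume cycle can be sketched in a few lines. This is a minimal, single-process illustration under several assumptions that are not part of the abstract: one controller object stands in for all primary and secondary controllers, databases are plain key/value dicts, and a condition on any device holding part of a database is treated as affecting every device that holds that database. The names `DatabaseMap`, `MirrorController`, `affected`, `suspend`, `write`, and `resume` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatabaseMap:
    # Hypothetical map: database name -> ids of devices holding portions of it.
    primary: dict[str, set[str]]
    secondary: dict[str, set[str]]

    def affected(self, failed: set[str]):
        """Return (databases, primary devices, secondary devices) touched by a condition.

        Assumption: a condition on any device of a database affects all its devices.
        """
        dbs = {db for db in self.primary
               if self.primary[db] & failed or self.secondary[db] & failed}
        prim = set().union(*(self.primary[db] for db in dbs))
        sec = set().union(*(self.secondary[db] for db in dbs))
        return dbs, prim, sec


class MirrorController:
    """Single stand-in for the primary/secondary controllers (a simplification)."""

    def __init__(self, dbmap: DatabaseMap):
        self.dbmap = dbmap
        self.primary_data = {db: {} for db in dbmap.primary}
        self.secondary_data = {db: {} for db in dbmap.primary}
        self.suspended: set[str] = set()
        self.change_log: list[tuple[str, str, object]] = []  # intermediate change record

    def write(self, db: str, key: str, value: object) -> None:
        self.primary_data[db][key] = value
        if db in self.suspended:
            # Mirroring stopped: record the change instead of copying it.
            self.change_log.append((db, key, value))
        else:
            self.secondary_data[db][key] = value

    def suspend(self, failed: set[str]) -> set[str]:
        # Writes here are synchronous, so no I/O is in flight when this runs;
        # a real controller would first quiesce its affected primary devices.
        dbs, _, _ = self.dbmap.affected(failed)
        self.suspended |= dbs
        return dbs

    def resume(self) -> None:
        # Condition ended: replay logged changes in order, then reactivate mirrors.
        for db, key, value in self.change_log:
            self.secondary_data[db][key] = value
        self.change_log.clear()
        self.suspended.clear()
```

For example, a condition on secondary device `S2` suspends only the database stored there, while other databases keep mirroring normally:

```python
m = DatabaseMap(primary={"payroll": {"P1", "P2"}, "orders": {"P3"}},
                secondary={"payroll": {"S1", "S2"}, "orders": {"S3"}})
ctl = MirrorController(m)
ctl.write("payroll", "alice", 100)
ctl.suspend({"S2"})              # affects "payroll" only
ctl.write("payroll", "bob", 200) # logged, not mirrored
ctl.resume()                     # secondary "payroll" now includes "bob"
```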

Fast primary cluster recovery

A cluster recovery process is implemented across a set of distributed archives, where each individual archive is a storage cluster of preferably symmetric nodes. Each node of a cluster typically executes an instance of an application that provides object-based storage of fixed content data and associated metadata. According to the storage method, an association or “link” between a first cluster and a second cluster is first established to facilitate replication. The first cluster is sometimes referred to as the “primary,” and the second cluster as the “replica.” Once the link is made, the first cluster's fixed content data and metadata are replicated from the first cluster to the second cluster, preferably in a continuous manner. If the first cluster fails, a failover operation occurs and clients of the first cluster are redirected to the second cluster. Upon repair or replacement of the first cluster (a “restore”), the repaired or replaced first cluster resumes authority for servicing the clients of the first cluster. This restore operation preferably occurs in two stages: a “fast recovery” stage that involves a preferably “bulk” transfer of the first cluster's metadata, followed by a “fail back” stage that involves the transfer of the fixed content data. Upon receipt of the metadata from the second cluster, the repaired or replaced first cluster resumes authority for its clients irrespective of whether the fail back stage has completed or even begun.
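The key property of the two-stage restore is that authority returns to the first cluster as soon as the metadata arrives, while fixed content trickles back afterward. The sketch below models that ordering only. It assumes in-memory dicts for each cluster's metadata and content, uses a synchronous per-object copy in place of continuous replication, and transfers content in fixed-size batches during fail back; `Cluster`, `ReplicatedArchive`, `put`, `failover`, `restore`, and `fail_back` are hypothetical names, not an actual archive API.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    metadata: dict = field(default_factory=dict)  # object id -> metadata
    content: dict = field(default_factory=dict)   # object id -> fixed content bytes


class ReplicatedArchive:
    """Hypothetical model of a primary/replica link with a two-stage restore."""

    def __init__(self, primary: Cluster, replica: Cluster):
        self.primary = primary
        self.replica = replica
        self.authority = primary      # cluster currently servicing clients
        self._pending: list[str] = []  # object ids still awaiting fail back

    def put(self, obj_id: str, data: bytes, meta: dict) -> None:
        # Clients always write to whichever cluster holds authority.
        self.authority.metadata[obj_id] = meta
        self.authority.content[obj_id] = data
        if self.authority is self.primary:
            self._replicate(obj_id)

    def _replicate(self, obj_id: str) -> None:
        # Simplification: per-object synchronous copy stands in for continuous replication.
        self.replica.metadata[obj_id] = self.primary.metadata[obj_id]
        self.replica.content[obj_id] = self.primary.content[obj_id]

    def failover(self) -> None:
        # First cluster failed: redirect clients to the replica.
        self.authority = self.replica

    def restore(self, new_primary: Cluster) -> None:
        # Stage 1 ("fast recovery"): bulk metadata copy, then authority returns at once.
        new_primary.metadata = dict(self.replica.metadata)
        self.primary = new_primary
        self.authority = new_primary
        self._pending = [oid for oid in new_primary.metadata
                         if oid not in new_primary.content]

    def fail_back(self, batch_size: int) -> int:
        # Stage 2 ("fail back"): copy fixed content in batches; returns items remaining.
        batch, self._pending = self._pending[:batch_size], self._pending[batch_size:]
        for oid in batch:
            self.primary.content[oid] = self.replica.content[oid]
        return len(self._pending)
```

In this model, objects written to the replica during the outage are included in the restore because the metadata is copied from the replica rather than from any surviving state on the first cluster. Authority switches inside `restore`, before any call to `fail_back`, which mirrors the abstract's claim that the first cluster resumes service "irrespective of whether the fail back stage has completed or even begun."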
