[0017] Example
[0018] To address the huge volume of unstructured business data, the strict real-time requirements, and the frequent concurrent access found in specific areas of the electric power industry, the present invention proposes a cloud-architecture-based solution for massive data storage and related services that is scalable, low-cost, high-performance, easy to use, and reliable. Experimental data show that this solution effectively improves data storage and response time under existing conditions.
[0019] Cloud storage is a distributed file system composed of a large cluster of ordinary computers interconnected through a high-speed network. It is managed and maintained by administrators and provides data storage and business access functions as a whole. The system exposes network access to the outside world through an API or through API-based applications, and it is scalable, low-cost, high-performance, and easy to use. In view of the huge volume of unstructured business data, the strict real-time requirements, and the frequent concurrent access in specific areas of electric power, the system uses multiple inexpensive standard PCs to build distributed storage services. The cluster can be expanded to hundreds or even thousands of machines, and the overall performance of the system grows linearly with the cluster size. By introducing consistent hashing and data redundancy at the software level, the system trades a certain degree of data consistency for high availability and scalability. It supports a multi-tenant mode and read and write operations on containers and objects, and it solves the storage and management of massive unstructured data in power application scenarios.
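The consistent hashing and data redundancy mentioned above can be illustrated with a minimal sketch. It assumes a ring with virtual nodes, SHA-256 as the hash function, and three replicas per object; the class name `HashRing`, the node names, and these parameters are illustrative assumptions and are not taken from the invention.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (SHA-256 is an illustrative choice)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes and replica placement (sketch)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node contributes `vnodes` points to even out the load.
        self._ring = sorted(
            (ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, key: str, count: int = 3):
        """Return up to `count` distinct nodes, walking clockwise from the key."""
        count = min(count, self._node_count)
        start = bisect.bisect(self._points, ring_hash(key))
        chosen = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == count:
                    break
        return chosen


# Hypothetical usage: place one object on three of four storage nodes.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
placement = ring.replicas("account/container/object.dat")
```

In this sketch, adding a node only moves the keys that now fall on the new node's points, which is the property that lets the cluster grow without reshuffling all stored data.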
[0020] In cloud storage, a server, or a process running on a server, is usually called a node, and nodes are interconnected through a network. A core problem of cloud storage is automatic fault tolerance, because server nodes are often unreliable, and so is the network. To ensure the reliability of data and the availability of services, multiple copies of the data must be kept. When a server goes down, the network behaves abnormally or times out, or a disk fails, consistency problems arise among these copies. Because such anomalies exist, distributed storage systems are designed to store data redundantly, and each redundant copy is called a replica. When a node fails, the data can then be read from the other replicas. Replication can be regarded as the only fault tolerance technique available to distributed storage systems. Because multiple replicas exist, ensuring consistency among them is the theoretical core of the entire distributed system.
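Reading from another replica when a node fails can be sketched as follows. The in-memory `StorageNode` class, the `NodeDown` exception, and the function name `read_with_failover` are hypothetical stand-ins introduced only for illustration.

```python
class NodeDown(Exception):
    """Raised when a storage node cannot be reached (hypothetical error type)."""


class StorageNode:
    """In-memory stand-in for one storage server."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self._objects = {}

    def put(self, key, data):
        if not self.online:
            raise NodeDown(self.name)
        self._objects[key] = data

    def get(self, key):
        if not self.online:
            raise NodeDown(self.name)
        return self._objects[key]


def read_with_failover(replica_nodes, key):
    """Try each replica in order and return the first successful read."""
    for node in replica_nodes:
        try:
            return node.get(key)
        except (NodeDown, KeyError):
            continue  # this replica is unavailable or does not hold the object
    raise NodeDown(f"no replica could serve {key!r}")


# Hypothetical usage: the first replica is down, so the read is served elsewhere.
nodes = [StorageNode(f"node-{i}") for i in range(3)]
for node in nodes:
    node.put("meter-readings.bin", b"payload")
nodes[0].online = False
data = read_with_failover(nodes, "meter-readings.bin")
```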
[0021] The invention is based on OpenStack, which is well known in the industry. According to the characteristics of power business data, it adopts object storage technology and deeply customizes the required services and interfaces. It has built-in object-based data management strategies that ensure the safety and reliability of data in the event of partial system failures. From the perspective of the client, strong consistency, weak consistency, and eventual consistency of data writes are guaranteed. Single points of failure in the storage system are completely eliminated and, combined with automatic failure detection and rapid failure recovery, this ensures the continuous and stable operation of user applications while reducing the difficulty of deployment and management.
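The text does not state how the different consistency levels are obtained. One common technique, shown here purely as an assumption, is quorum replication: with N replicas, W write acknowledgements, and R replicas consulted per read, a read is guaranteed to see the latest write when R + W > N. All names in this sketch are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data, storing a (version, value) pair per key."""
    store: dict = field(default_factory=dict)


def quorum_write(replicas, key, value, version, w):
    """Write to the first `w` replicas only. The rest are left stale on purpose
    to mimic replicas that have not been synchronized yet."""
    for replica in replicas[:w]:
        replica.store[key] = (version, value)


def quorum_read(replicas, key, r):
    """Read from the last `r` replicas (r >= 1) and return the newest value."""
    answers = [rep.store[key] for rep in replicas[-r:] if key in rep.store]
    if not answers:
        return None
    return max(answers, key=lambda answer: answer[0])[1]


def is_strongly_consistent(n, w, r):
    """Read and write sets overlap in at least one replica when r + w > n."""
    return r + w > n


# Hypothetical usage with N = 3 replicas.
replicas = [Replica() for _ in range(3)]
quorum_write(replicas, "k", "v1", version=1, w=3)
quorum_write(replicas, "k", "v2", version=2, w=2)
fresh = quorum_read(replicas, "k", r=2)  # R + W = 4 > 3, overlaps the write set
stale = quorum_read(replicas, "k", r=1)  # R + W = 3, may miss the latest write
```

Choosing smaller R and W lowers latency at the cost of possibly stale reads, which converge later once the lagging replicas are synchronized.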
[0022] For a more intuitive understanding, consider that a cloud storage system built on general-purpose disk arrays manages the data on disk through the operating system API. Such a system can be logically divided into four parts: metadata nodes (control nodes), data nodes (storage nodes), management nodes, and clients. These four parts correspond to the four-layer structural model of cloud storage. Cloud storage built on distributed storage relies on these characteristics and this design to organize and sort massive numbers of data files efficiently. While accommodating massive amounts of data, it can also quickly locate and manage the required files, which makes large-scale applications of technologies such as geographic imaging, virtual reality, and point cloud fitting analysis possible.
[0023] As shown in Figure 1, the cloud storage system includes a control server, together with a proxy server, an object server, a container server, and an account server that are connected to the control server. The system also includes a verification server and a cache server, each of which is connected to the proxy server. The verification server performs identity verification on the data in the proxy server, and the cache server caches the data in the proxy server.
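The interaction among the proxy server, the verification server, and the cache server can be sketched as a single request path. This is a simplified illustration: the token set, the dictionary-based cache, the dictionary-based backend, and the choice to cache object data are all assumptions made for the example.

```python
class ProxyServer:
    """Sketch of a proxy that authenticates, then serves from cache or backend."""

    def __init__(self, valid_tokens, backend):
        self._valid_tokens = set(valid_tokens)  # stand-in for the verification server
        self._cache = {}                        # stand-in for the cache server
        self._backend = backend                 # stand-in for the object servers
        self.backend_reads = 0                  # counts reads that missed the cache

    def get_object(self, token, path):
        # Step 1: identity verification before any data is touched.
        if token not in self._valid_tokens:
            raise PermissionError("invalid token")
        # Step 2: serve from the cache when possible.
        if path in self._cache:
            return self._cache[path]
        # Step 3: fall back to the storage backend and populate the cache.
        self.backend_reads += 1
        data = self._backend[path]
        self._cache[path] = data
        return data


# Hypothetical usage: the second read is answered from the cache.
backend = {"/grid/images/tower-17.tif": b"image-bytes"}
proxy = ProxyServer(valid_tokens={"token-123"}, backend=backend)
first = proxy.get_object("token-123", "/grid/images/tower-17.tif")
second = proxy.get_object("token-123", "/grid/images/tower-17.tif")
```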
[0024] The object server includes object files, an object update service module, an object replication service module, and an object audit service module. The object audit service module scans the object files and checks their integrity. If an object file is found to be damaged, the object replication service module replaces it with a copy of the object file, and the object update service module completes the update of the object file. If the update fails, the update is added to a queue to be processed again.
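The audit, replication, and update-retry cycle described above can be sketched in a few lines. The sketch assumes that integrity is checked by comparing a SHA-256 checksum against a recorded value and that a good copy is available from a replica; the text does not specify either detail, and the function names are hypothetical.

```python
import hashlib
from collections import deque


def checksum(data: bytes) -> str:
    """Content checksum used by the audit step (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(local, expected, replica, retry_queue):
    """Scan local objects, replace damaged ones from a replica, queue failures.

    local:       dict of object name -> bytes held by this server
    expected:    dict of object name -> recorded checksum
    replica:     dict of object name -> bytes held by another server
    retry_queue: queue that receives names whose repair could not be completed
    """
    repaired = []
    for name, digest in expected.items():
        data = local.get(name)
        if data is not None and checksum(data) == digest:
            continue  # object is intact
        good = replica.get(name)
        if good is not None and checksum(good) == digest:
            local[name] = good          # replace the damaged file with a good copy
            repaired.append(name)
        else:
            retry_queue.append(name)    # repair failed, process again later
    return repaired


# Hypothetical usage with one damaged object.
local_objects = {"a.dat": b"intact", "b.dat": b"damaged"}
recorded = {"a.dat": checksum(b"intact"), "b.dat": checksum(b"original")}
remote_copy = {"a.dat": b"intact", "b.dat": b"original"}
pending = deque()
fixed = audit_and_repair(local_objects, recorded, remote_copy, pending)
```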
[0025] The container server includes a container database, a container update service module, a container replication service module, and a container audit service module. The container audit service module scans the data in the container database and checks its integrity. If the data is found to be damaged, the container replication service module replaces it with a copy of the data, and the container update service module completes the data update. If the update fails, the update is added to a queue to be processed again.
[0026] The account server includes an account database, an account update service module, an account replication service module, and an account audit service module. The account audit service module scans the data in the account database and checks its integrity. If the data is found to be damaged, the account replication service module replaces it with a copy of the data, and the account update service module completes the data update. If the update fails, the update is added to a queue to be processed again.
[0027] According to the characteristics of electric power application data, the present invention exposes a unified, simple, and reliable RESTful service to the outside. Internally, to achieve safe and reliable functions and fast and stable performance, an identity verification mechanism and a cache service mechanism are added. The audit module runs in the background of each Swift server and continuously scans the disk to check the integrity of the data. If data is found to be damaged, the audit module moves the file to an isolated area, and the replication module then replaces the data with a good copy. If an update fails, the update is added to a queue on the local file system, and the update module keeps processing these failed updates. In this way, massive data storage becomes stable and easy to manage.
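The isolation of damaged files and the on-disk queue of failed updates can be sketched as follows. The directory layout, the JSON file format, and the function names are assumptions chosen for the example and do not describe the actual Swift on-disk format.

```python
import json
import os
import shutil
import uuid


def quarantine(path, quarantine_dir):
    """Move a damaged file out of the data directory into an isolated area."""
    os.makedirs(quarantine_dir, exist_ok=True)
    target = os.path.join(quarantine_dir, os.path.basename(path))
    shutil.move(path, target)
    return target


def enqueue_update(pending_dir, update):
    """Persist a failed update as a JSON file so that it survives a restart."""
    os.makedirs(pending_dir, exist_ok=True)
    name = os.path.join(pending_dir, f"{uuid.uuid4().hex}.json")
    with open(name, "w", encoding="utf-8") as handle:
        json.dump(update, handle)
    return name


def process_pending(pending_dir, apply_update):
    """Retry every queued update; delete its file only when the retry succeeds."""
    if not os.path.isdir(pending_dir):
        return 0
    done = 0
    for entry in sorted(os.listdir(pending_dir)):
        path = os.path.join(pending_dir, entry)
        with open(path, encoding="utf-8") as handle:
            update = json.load(handle)
        try:
            apply_update(update)
        except Exception:
            continue  # leave the file in place for the next pass
        os.remove(path)
        done += 1
    return done
```

A background update process would call `process_pending` periodically, passing a callable that sends the update to the container or account server; updates that still fail simply remain in the queue.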
[0028] At the same time, through an asymmetric architecture that separates metadata from stored data, and through load balancing and concurrent data access strategies, the system can reach transmission rates of up to tens of Gbps and storage capacities of hundreds of PB on ordinary hardware, and it can be expanded dynamically online, promptly and on demand, as user applications grow. Unlike a stand-alone file system, the distributed file system does not place the data on a single disk managed by the operating system above it. Instead, the data is stored on a server cluster, in which the servers each perform their own duties and cooperate with one another to provide services for the entire file system.
[0029] The invention effectively supports the storage and management of massive unstructured data and provides an effective basic service for the development of many electric power business applications. Especially in information applications where both the individual data items and the overall data volume are huge, such as geographic images, 3D models, and 3D point clouds, it exhibits extremely high data durability, data consistency, and unlimited storage scalability, and it is easy to use and highly reliable.