Data storage methods, devices, and business systems across HBase clusters

By generating and reversing index numbers in the HBase cluster to determine the data storage location, the hotspot problem caused by storing files sequentially in the same area is solved, thus improving data storage speed and retrieval efficiency.

CN117130561B · Active · Publication Date: 2026-06-30 · Postal Savings Bank of China Co., Ltd. (中国邮政储蓄银行股份有限公司)

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: Postal Savings Bank of China Co., Ltd. (中国邮政储蓄银行股份有限公司)
Filing Date: 2023-09-01
Publication Date: 2026-06-30

IPC: G06F3/06; G06F16/13; G06F16/14

AI Tagging

Technology Topics

Time information; Endianness

Technical Efficacy Phrases

Issues that reduce storage speed; Improve retrieval speed

AI Technical Summary

Technical Problem

In existing technologies, storing files sequentially in the same area in an HBase cluster leads to hotspot phenomena, resulting in reduced data storage speed and decreased throughput.

Method used

By obtaining the location, time, and identification information of the data to be stored, an index number is generated through encoding. The index number is then reversed to generate a row key value. The data is then stored in the storage partition corresponding to the identification information to avoid overloading of the same area.

Benefits of technology

This solves the hotspot problem caused by storing files sequentially in the same area, improving data storage speed and accelerating data retrieval.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

This application provides a data storage method, apparatus, and business system across an HBase cluster. The method includes: upon receiving a data upload message, obtaining the location information, first time information, and first identification information of the data to be stored; encoding the location information, first time information, and first identification information to obtain an index number, which uniquely identifies the data to be stored; reversing the index number to obtain the row key value of the data to be stored; and storing the data to be stored in a first target partition on a target server according to the row key value. The first target partition is the storage partition corresponding to the bytes of the first identification information in the row key value, and the byte order of the row key value is the reverse of that of the index number. This method solves the problem in existing technologies where storing files sequentially in the same area causes hotspots and reduces data storage speed.

Description

Technical Field

[0001] This invention relates to the field of data storage and retrieval, and more specifically, to a data storage method, apparatus, computer-readable storage medium, and business system across an HBase cluster.

Background Technology

[0002] With the rapid development of digitalization, unstructured data is playing an increasingly important role. In the course of enterprise operations and business processing, large volumes of many kinds of unstructured data, such as vouchers, contracts, archives, reports, audio, and video, are constantly generated. These data come in various formats, including widely used office documents such as DOC, PPT, EXCEL, and PDF, semi-structured formats such as HTML, XML, and EMAIL, and other professional formats such as images, audio, video, and multimedia stream files.

[0003] Currently, massive amounts of unstructured data are generally stored and managed with a combination of HDFS and HBase. The file index number is a globally unique identifier generated for each file, serving as the unique index information for identifying and accessing the file. In traditional HBase, rows are sorted lexicographically by RowKey, so files whose index numbers are generated sequentially are written to the same region. A large number of accesses can cause the single machine hosting a hot region to exceed its capacity, leading to performance degradation or even region unavailability. This also affects other regions on the same RegionServer, because the host can no longer serve their requests. The result is a data hotspot, with decreased throughput and reduced data upload speed.

Summary of the Invention

[0004] The main objective of this application is to provide a data storage method, apparatus, computer-readable storage medium, and business system across an HBase cluster, so as to at least solve the problem of hotspot phenomenon caused by storing files sequentially in the same area in the prior art, which reduces data storage speed.

[0005] To achieve the above objective, according to one aspect of this application, a data storage method across an HBase cluster is provided. The method includes: upon receiving a data upload message, obtaining location information, first time information, and first identification information of data to be stored, wherein the location information represents the planned storage location of the data to be stored, the first time information represents the upload time of the data to be stored, and the first identification information represents the planned storage partition of the data to be stored in a first target server, the first target server being the server where the data to be stored is planned to be stored; encoding the location information, the first time information, and the first identification information to obtain an index number, the index number being used to uniquely identify the data to be stored; reversing the index number to obtain a row key value of the data to be stored; and storing the data to be stored in a first target partition in the first target server according to the row key value, the first target partition being the storage partition corresponding to the bytes of the first identification information in the row key value, and the byte order of the row key value being the reverse of that of the index number.

[0006] Optionally, encoding the location information, the first time information, and the first identification information to obtain an index number includes: encoding the first location information in decimal to obtain the first byte and the second byte of the index number, wherein the first location information is used to characterize the geographical location of the cluster group where the data to be stored is planned to be stored, and the cluster group includes multiple HBase clusters and Hadoop clusters; encoding the second location information in binary to obtain the third byte of the index number, wherein the second location information is used to characterize the cluster where the data to be stored is planned to be stored; encoding the third location information in base-32 to obtain the fourth byte of the index number, wherein the third location information is used to characterize the server where the data to be stored is planned to be stored; encoding the first time sub-information in decimal to obtain the fifth and sixth bytes of the index number, and encoding the second time sub-information in base-32 to obtain the seventh and eighth bytes of the index number, wherein the first time sub-information is the part of the first time information used to characterize the year, and the second time sub-information is the part of the first time information used to characterize the month and day; and encoding the first identification information in base-32 to obtain the ninth to twelfth bytes of the index number.

[0007] Optionally, before obtaining the location information, first time information, and first identification information of the data to be stored, the method further includes: parsing the data upload message to obtain a first message format and a message data size, wherein the first message format includes the format information of the data upload message, and the message data size is the data size of the data upload message; if the first message format is consistent with a first preset format and the message data size is consistent with a preset data size, determining that the data upload message has passed verification; if the first message format is inconsistent with the first preset format and / or the message data size is inconsistent with the preset data size, determining that the data upload message has failed verification and issuing a first signaling, wherein the first signaling is used to indicate that the storage of the data to be stored has failed.
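
The verification described above can be sketched as follows. This is a minimal illustration only: the UploadMessage type, the FIRST_SIGNALING value, and the reading of "consistent with" as plain equality are assumptions made for the example and are not specified in the application.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical signaling value; the application does not define its content.
FIRST_SIGNALING = "DATA_STORAGE_FAILED"


@dataclass
class UploadMessage:
    message_format: str  # format information parsed from the upload message
    data_size: int       # data size parsed from the upload message


def verify_upload_message(
    message: UploadMessage, preset_format: str, preset_size: int
) -> Optional[str]:
    """Return None when verification passes, otherwise the first signaling."""
    # Assumption: "consistent with" is modeled as equality for both checks.
    format_ok = message.message_format == preset_format
    size_ok = message.data_size == preset_size
    if format_ok and size_ok:
        return None
    # Either mismatch (format and / or size) is treated as a failed verification.
    return FIRST_SIGNALING
```

Only a message that passes both checks proceeds to index number generation; any mismatch ends the upload with the failure signaling.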

[0008] Optionally, storing the data to be stored in the first target partition of the first target server according to the row key value includes: determining second identification information according to the row key value, wherein the second identification information corresponds to the ninth to twelfth bytes of the index number and is in reverse order; obtaining third identification information, wherein the third identification information is a unique identifier for each partition in the first target server; determining the first target partition according to the second identification information and the third identification information and writing the data to be stored into the first target partition.
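
One way to picture this lookup is sketched below. The application does not say how the second and third identification information are matched, so the sketch assumes that each partition identifier is the lowest reversed sequence code the partition accepts, in the manner of HBase region start keys. The function names and the example partition identifiers are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def second_identification(row_key: str) -> str:
    """The reversed sequence code: the first four characters of the row key."""
    return row_key[:4]


def pick_partition(row_key: str, partition_start_keys: Sequence[str]) -> str:
    """Pick the partition whose start key is the greatest one <= the identifier.

    Assumption: partition_start_keys is sorted and each entry acts as the
    unique identifier (third identification information) of one partition.
    """
    identifier = second_identification(row_key)
    position = bisect_right(partition_start_keys, identifier) - 1
    if position < 0:
        raise ValueError("no partition covers this row key")
    return partition_start_keys[position]


# Four hypothetical partitions splitting the reversed sequence code space.
partitions = ["0000", "8000", "G000", "O000"]
target = pick_partition("A30017321111", partitions)
```

Because the comparison is lexicographic on the leading characters of the row key, the digit that changes fastest in the original sequence code decides which partition receives the write.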

[0009] Optionally, after storing the data to be stored in the first target partition of the first target server according to the row key value, the method further includes: obtaining a target index number and determining the storage status of the target data according to the target index number, wherein the target index number is the index number corresponding to the target data, and the storage status includes deleted and not deleted; if the storage status is not deleted, determining a target cluster group according to the target index number, wherein the target cluster group is the cluster group to which the target data belongs; determining a target cluster according to the target index number, wherein the target cluster is the cluster to which the target data belongs; determining a second target server according to the target index number, wherein the second target server is the server to which the target data belongs; reversing the target index number to obtain a target row key value, and reading the target data from the second target partition of the second target server according to the target row key value.

[0010] Optionally, before obtaining the target index number, the method further includes: obtaining a data access message and parsing the data access message to obtain a second message format, the second message format including the format information of the data access message; if the second message format is consistent with a second preset format, determining that the data access message has passed verification; if the second message format is inconsistent with the second preset format, determining that the data access message has failed verification and sending a second signaling, the second signaling being used to indicate data access failure.

[0011] Optionally, determining the storage status of target data based on the target index number includes: determining the first time information based on the target index number; obtaining second time information and target lifecycle, wherein the second time information is the current time and the target lifecycle is the retention time of the target data in the target server; and determining the storage status of the target data based on the first time information, the second time information, and the target lifecycle.
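
A minimal sketch of this status check follows. It assumes the date code occupies the fifth to eighth characters of the index number (two decimal year digits, then one base-32 character each for month and day), that years fall in the 2000s, that the lifecycle is counted in days, and that data counts as deleted once the lifecycle has fully elapsed. The digit alphabet and all function names are likewise assumptions for illustration.

```python
from datetime import date, timedelta

# Assumed base-32 digit alphabet; the application does not name one.
BASE32_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def upload_date_from_index(index_number: str, century: int = 2000) -> date:
    """Recover the first time information (upload date) from the date code."""
    code = index_number[4:8]
    year = century + int(code[:2])          # two decimal year digits
    month = BASE32_DIGITS.index(code[2])    # one base-32 character
    day = BASE32_DIGITS.index(code[3])      # one base-32 character
    return date(year, month, day)


def storage_status(index_number: str, today: date, lifecycle_days: int) -> str:
    """Compare the upload date plus the lifecycle against the current date."""
    expires_on = upload_date_from_index(index_number) + timedelta(days=lifecycle_days)
    return "deleted" if today >= expires_on else "not deleted"


status = storage_status("111123710000", date(2023, 7, 15), lifecycle_days=30)
```

Because the upload date is embedded in the index number, the check needs no lookup in the cluster before deciding whether a read is worth attempting.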

[0012] According to another aspect of this application, a cross-HBase cluster data storage device is provided, the device comprising: a first acquisition unit, configured to acquire, upon receiving a data upload message, location information, first time information, and first identification information of data to be stored, wherein the location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in a first target server, the first target server being the server where the data to be stored is planned to be stored; an encoding unit, configured to encode the location information, the first time information, and the first identification information to obtain an index number, the index number being used to uniquely identify the data to be stored; and a reversal unit, configured to reverse the index number to obtain the row key value of the data to be stored, and store the data to be stored in a first target partition in the first target server according to the row key value, wherein the first target partition is the storage partition corresponding to the byte in the row key value corresponding to the first identification information, and the byte order of the row key value is the reverse of the index number.

[0013] According to another aspect of this application, a computer-readable storage medium is provided, the computer-readable storage medium including a stored program, wherein, when the program is executed, it controls the device on which the computer-readable storage medium is located to perform any of the methods described.

[0014] According to another aspect of this application, a business system is provided, comprising: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods described.

[0015] Applying the technical solution of this application, in the above-mentioned cross-HBase cluster data storage method, firstly, upon receiving a data upload message, the location information, first time information, and first identification information of the data to be stored are obtained. The location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server, where the first target server is the server where the data to be stored is planned to be stored. Then, the location information, the first time information, and the first identification information are encoded to obtain an index number, which is used to uniquely identify the data to be stored. Finally, the index number is reversed to obtain the row key value of the data to be stored, and the data to be stored is stored in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the byte of the first identification information in the row key value, and the byte order of the row key value is the reverse of the index number. This application designs a file index code, including a cluster group code, a cluster code, a sequence code, a machine code, and a date code, to support fast data indexing. The file index number is then reversed to obtain a row key value. During the reversal process, the sequence code changes order, and files originally stored sequentially in the same area are distributed across different areas according to the row key value. This solves the problem of hotspots caused by storing files sequentially in the same area in existing technologies, which reduces data storage speed. 
Simultaneously, the cluster group code, cluster code, machine code, and date code support rapid location of data, accelerating retrieval.

Attached Figure Description

[0016] Figure 1 shows a hardware structure block diagram of a mobile terminal for performing a data storage method across an HBase cluster according to an embodiment of this application.

[0017] Figure 2 shows a flowchart of a data storage method across an HBase cluster according to an embodiment of this application.

[0018] Figure 3 shows a schematic flowchart of a data storage process according to an embodiment of this application.

[0019] Figure 4 shows a schematic flowchart of a data access process according to an embodiment of this application.

[0020] Figure 5 shows a structural block diagram of a data storage device across an HBase cluster provided according to an embodiment of this application.

[0021] The above figures include the following reference numerals:

[0022] 102. Processor; 104. Memory; 106. Transmission device; 108. Input / output device.

Detailed Implementation

[0023] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0024] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.

[0025] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate for the embodiments of this application described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0026] For ease of description, the following explains some of the nouns or terms used in the embodiments of this application:

[0027] Unstructured data: In contrast to structured data, data that cannot conveniently be represented by a two-dimensional logical table in a database is called unstructured data. This includes office documents of all formats, text, images, XML, HTML, various reports, and audio / video information.

[0028] Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. Users can develop distributed programs without understanding the underlying details of distributed systems. It is a software framework capable of distributed processing of large amounts of data in a reliable, efficient, and scalable manner.

[0029] HBase is a highly reliable, high-performance, column-oriented distributed database and an important sub-project of the Apache Software Foundation's open-source project Hadoop.

[0030] RowKey: The row key of an HBase table, which is the unique primary key for adding, deleting, modifying, and querying data.

[0031] File index number: A unique identifier for a file in a distributed file system.

[0032] As described in the background section, in existing technologies, files are sorted in dictionary order according to the RowKey. A large number of accesses can cause a single machine containing a hot region to exceed its capacity, leading to performance degradation or even region unavailability. This also affects other regions on the same RegionServer. Since the host cannot serve requests from other regions, this creates a data hotspot phenomenon, resulting in a decrease in throughput. To solve the problem of hotspot phenomenon and reduced data storage speed caused by storing files sequentially in the same region in existing technologies, embodiments of this application provide a data storage method, apparatus, computer-readable storage medium, and business system across HBase clusters.

[0033] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

[0034] The methods and embodiments provided in this application can be executed on a mobile terminal, computer terminal, or similar computing device. Taking running on a mobile terminal as an example, Figure 1 is a hardware structure block diagram of a mobile terminal for a cross-HBase cluster data storage method according to an embodiment of the present invention. As shown in Figure 1, a mobile terminal may include one or more processors 102 (only one is shown in Figure 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may further include a transmission device 106 for communication functions and an input / output device 108. Those skilled in the art will understand that the structure shown in Figure 1 is for illustrative purposes only and does not limit the structure of the mobile terminal described above. For example, the mobile terminal may include more or fewer components than shown in Figure 1, or have a configuration different from that shown in Figure 1.

[0035] The memory 104 can be used to store computer programs, such as application software programs and modules, for example the computer program corresponding to the data storage method across an HBase cluster in this embodiment of the invention. The processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, thereby implementing the above-described method. The memory 104 may include high-speed random access memory and non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory remotely located relative to the processor 102, and these remote memories can be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof. The transmission device 106 is used to receive or send data via a network. Specific examples of such networks may include wireless networks provided by the mobile terminal's communication provider. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices via a base station to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.

[0036] This embodiment provides a data storage method across an HBase cluster that runs on a mobile terminal, computer terminal, or similar computing device. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order than that shown here.

[0037] Figure 2 is a flowchart of a cross-HBase cluster data storage method according to an embodiment of this application. As shown in Figure 2, the method includes the following steps:

[0038] Step S201: Upon receiving a data upload message, obtain the location information, first time information, and first identification information of the data to be stored. The location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server. The first target server is the server where the data to be stored is planned to be stored.

[0039] Specifically, when new business data is generated in the business system, a data upload message is sent to the resource management module in the system. After receiving the upload message, the resource management module parses the message to obtain the storage location requested by the message, including the cluster group, cluster, and server to be stored, i.e., the aforementioned location information. At the same time, in order to determine the timeliness of the file, the message also includes the data upload time, i.e., the aforementioned first time information. In addition, in order to determine the specific partition in the server to which the data will be stored, the message also includes the sequence code corresponding to the partition, i.e., the aforementioned first identification information.

[0040] Step S202: Encode the location information, the first time information, and the first identification information to obtain an index number, which is used to uniquely identify the data to be stored.

[0041] In this embodiment, the index number consisting of 12 letters or numbers can be obtained by encoding the information obtained from the above parsing. This includes a 2-digit cluster group code, a 1-digit cluster identifier code, a 1-digit server machine identifier code, a 4-digit date code, and a 4-digit sequence code. The order of different types of encoding is not limited, and each index number corresponds to a unique data file.
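
The 12-character layout can be sketched as a simple record. The text states that the order of the codes is not limited, so the field order below (the one used in the optional implementation of step S202) is only one possible arrangement, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class IndexFields(NamedTuple):
    cluster_group: str  # 2 characters
    cluster: str        # 1 character
    server: str         # 1 character
    date: str           # 4 characters
    sequence: str       # 4 characters


def split_index_number(index_number: str) -> IndexFields:
    """Split a 12-character index number into its five codes."""
    if len(index_number) != 12 or not index_number.isalnum():
        raise ValueError("index number must be 12 letters or digits")
    return IndexFields(
        cluster_group=index_number[0:2],
        cluster=index_number[2:3],
        server=index_number[3:4],
        date=index_number[4:8],
        sequence=index_number[8:12],
    )


# Made-up example value, not taken from the application.
fields = split_index_number("1111237100A3")
```

Fixed-width fields mean each code can be read back by position alone, which is what allows the cluster group, cluster, and server to be located directly from the index number.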

[0042] Step S203: Reverse the index number to obtain the row key value of the data to be stored, and store the data to be stored in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the bytes of the first identification information in the row key value. The byte order of the row key value is the reverse of that of the index number.

[0043] Specifically, the byte order in the above index number is reversed to obtain the row key value of the corresponding data. After the sequence code is reversed, the files that were originally uploaded in adjacent order are stored according to the partition corresponding to the reversed sequence code, and are no longer stored in the same partition. This avoids the phenomenon of hotspots caused by a large number of storage operations on the same partition in a short period of time.
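
The effect of the reversal can be seen in a short sketch. It assumes the index number is a 12-character ASCII string, so reversing its characters is the same as reversing its bytes; the index numbers used are made-up examples.

```python
def to_row_key(index_number: str) -> str:
    """Reverse the character (byte) order of an ASCII index number."""
    return index_number[::-1]


# Three consecutively issued index numbers that differ only in the last digit.
index_numbers = ["111123710000", "111123710001", "111123710002"]
row_keys = [to_row_key(number) for number in index_numbers]

# Before reversal the keys share an 11-character prefix and sort next to each
# other; after reversal each key starts with a different character.
leading_characters = [key[0] for key in row_keys]
```

Since HBase orders rows lexicographically by row key, keys that start with different characters can fall into different partitions, while reversing a row key again recovers the original index number.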

[0044] In this embodiment, firstly, upon receiving a data upload message, the location information, first time information, and first identification information of the data to be stored are obtained. The location information represents the planned storage location of the data to be stored, the first time information represents the upload time of the data to be stored, and the first identification information represents the planned storage partition of the data to be stored in a first target server, where the first target server is the server where the data to be stored is planned to be stored. Then, the location information, the first time information, and the first identification information are encoded to obtain an index number, which is used to uniquely identify the data to be stored. Finally, the index number is reversed to obtain the row key value of the data to be stored, and the data to be stored is stored in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the bytes of the first identification information in the row key value, and the byte order of the row key value is the reverse of that of the index number. This application designs a file index code, including a cluster group code, a cluster code, a sequence code, a machine code, and a date code, to support fast data indexing. The file index number is then reversed to obtain a row key value. During the reversal process, the sequence code changes order, and files originally stored sequentially in the same area are distributed across different areas according to the row key value. This solves the problem in existing technologies where storing files sequentially in the same area causes hotspots and reduces data storage speed. Simultaneously, the cluster group code, cluster code, machine code, and date code support rapid location of data, accelerating retrieval.

[0045] In order to generate an index number corresponding to the data file according to a preset rule, in one optional implementation, step S202 above includes:

[0046] Step S2021: Encode the first location information in decimal to obtain the first byte and the second byte of the index number. The first location information is used to characterize the geographical location of the cluster group where the data to be stored is planned to be stored. The cluster group includes multiple HBase clusters and Hadoop clusters.

[0047] Specifically, cluster groups are identified by assigning codes to them.

[0048] In practice, cluster groups can be divided according to administrative division codes, taking the first two digits of the nationally unified administrative division code, such as Beijing-11, Tianjin-12, Hebei-13, etc.

[0049] Step S2022: Encode the second location information in binary to obtain the third byte of the index number. The second location information is used to characterize the cluster where the data to be stored is planned to be stored.

[0050] Specifically, files are stored in different clusters depending on their size. Based on the characteristics of Hadoop and HBase clusters, HDFS in Hadoop is suitable for storing large files, while HBase is suitable for storing small files. Corresponding encodings are set for different storage locations.

[0051] In practice, depending on the file size, with 50MB as the limit, small files smaller than 50MB are stored in HBase and their identifier is set to 1; files larger than 50MB are stored in HDFS and their identifier is set to 0. Small files are directly stored in columns in HBase, while large files are stored in HDFS, with their paths also stored in columns in HBase. This identifier identifies the file's storage and access method. The methods for reading content objects stored in different locations differ.
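The size-based routing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, 50MB is taken as 50 * 1024 * 1024 bytes, and a file of exactly 50MB is assumed to be treated as a large file because the text does not specify that case.

```python
SIZE_LIMIT_BYTES = 50 * 1024 * 1024  # 50MB limit from the description (binary MB assumed)


def cluster_flag(file_size_bytes: int) -> str:
    """Return the cluster identifier: '1' for HBase (small file), '0' for HDFS (large file)."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # Assumption: a file of exactly 50MB counts as large; the text leaves this open.
    return "1" if file_size_bytes < SIZE_LIMIT_BYTES else "0"


def storage_target(flag: str) -> str:
    """Map the identifier back to the store that must be read at access time."""
    targets = {"1": "hbase", "0": "hdfs"}
    if flag not in targets:
        raise ValueError(f"unknown cluster flag: {flag!r}")
    return targets[flag]
```

Because the identifier is part of the index number, a reader can tell from the index number alone whether the content sits in an HBase column or whether the column only holds an HDFS path.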

[0052] Step S2023: Encode the third position information in base-32 to obtain the fourth byte of the index number. The third position information is used to characterize the server where the data to be stored is planned to be stored.

[0053] Specifically, this code uniquely identifies each server within a cluster group, so the server on which a data file is stored can be determined from it.

[0054] Step S2024: Encode the first time sub-information in decimal to obtain the fifth and sixth bytes of the index number, and encode the second time sub-information in base-32 to obtain the seventh and eighth bytes of the index number. The first time sub-information is the part of the first time information used to represent the year, and the second time sub-information is the part of the first time information used to represent the month and day.

[0055] Specifically, the file's upload time is obtained, for example 2023/07/01. During encoding, the last two digits of the year are extracted, and the month and day are converted to base-32, to obtain the fifth to eighth bytes mentioned above.

[0056] Step S2025: Encode the first identification information in base-32 to obtain the ninth to twelfth bytes of the index number.

[0057] Specifically, the sequence code is an identifier generated by each server for the data file. Different segments of the identifier code correspond to different partitions, and in this application, it is represented by a 4-digit base-32 code.
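Steps S2021 to S2025 can be illustrated with a short sketch. Several details here are assumptions rather than statements from the application: the base-32 alphabet (digits 0-9 followed by A-V), the placement of the month in the seventh byte and the day in the eighth, and all function and parameter names. Base-32 is used for the machine, date, and sequence fields in line with the 4-digit base-32 sequence code described in [0057].

```python
from datetime import date

BASE32_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # assumed alphabet, not given in the text


def to_base32(value: int, width: int) -> str:
    """Encode a non-negative integer as a fixed-width base-32 string."""
    if not 0 <= value < 32 ** width:
        raise ValueError(f"{value} does not fit in {width} base-32 digit(s)")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 32)
        digits.append(BASE32_DIGITS[remainder])
    return "".join(reversed(digits))


def build_index_number(group_code: int, cluster_flag: int, server_id: int,
                       upload_date: date, sequence: int) -> str:
    """Assemble the 12-character index number from its five fields."""
    if not 0 <= group_code <= 99:
        raise ValueError("cluster group code must be two decimal digits")
    if cluster_flag not in (0, 1):
        raise ValueError("cluster flag must be 0 (HDFS) or 1 (HBase)")
    return (
        f"{group_code:02d}"                  # bytes 1-2: cluster group, decimal
        + str(cluster_flag)                  # byte 3: cluster, binary
        + to_base32(server_id, 1)            # byte 4: machine code, base-32
        + f"{upload_date.year % 100:02d}"    # bytes 5-6: last two digits of the year
        + to_base32(upload_date.month, 1)    # byte 7: month (assumed position)
        + to_base32(upload_date.day, 1)      # byte 8: day (assumed position)
        + to_base32(sequence, 4)             # bytes 9-12: sequence code, base-32
    )


print(build_index_number(11, 1, 3, date(2023, 7, 1), 1000))  # prints 1113237100V8
```

A single base-32 digit covers the values 0 to 31, which is enough for any month or day of the month, and four base-32 digits give 32^4 = 1,048,576 distinct sequence codes.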

[0058] To ensure the integrity of the stored data file, in one optional implementation, before obtaining the location information, first time information, and first identification information of the data to be stored, the above method further includes:

[0059] Step S301: Parse the above data upload message to obtain the first message format and the message data volume. The first message format includes the format information of the above data upload message, and the message data volume is the data size of the above data upload message.

[0060] Specifically, as shown in Figure 3, before data is stored on the server, the integrity of both the data upload message and the data file is checked. First, the format of the data message is obtained and matched against the preset format. Then, the size of the data file is compared with the recorded size.

[0061] Step S302: If the first message format is consistent with the first preset format and the message data volume is consistent with the preset data volume, determine that the data upload message is correct.

[0062] Specifically, as shown in Figure 3, if the message format is the same as the preset format, the request is confirmed to be secure. If the data volume is consistent, it is confirmed that no data was lost during transmission, and the storage process can continue.

[0063] Step S303: If the first message format is inconsistent with the first preset format and / or the message data volume is inconsistent with the preset data volume, determine that the data upload message verification is incorrect and issue a first signaling, the first signaling being used to indicate that storage of the data to be stored has failed.

[0064] Specifically, as shown in Figure 3, if the message format differs from the preset format, the request is determined to be insecure; if the data volume is inconsistent, the data file is determined to have lost data during transmission. If either situation occurs, an instruction is issued indicating that the upload has failed and that the data should be uploaded again.
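The check in steps S301 to S303 can be sketched as below. The message structure, field names, and the idea that the expected size is carried as a declared value alongside the payload are all assumptions made for illustration; the application only states that the format and the data volume are compared with preset values.

```python
from dataclasses import dataclass


@dataclass
class UploadMessage:
    fmt: str            # format identifier parsed from the message (hypothetical field)
    declared_size: int  # data volume recorded for the file (hypothetical field)
    payload: bytes      # the data file itself


def verify_upload(msg: UploadMessage, preset_format: str) -> bool:
    """Return True only if both the format and the data volume match."""
    format_ok = msg.fmt == preset_format
    size_ok = len(msg.payload) == msg.declared_size
    return format_ok and size_ok


def handle_upload(msg: UploadMessage, preset_format: str) -> str:
    """Return a status string standing in for the 'first signaling' on failure."""
    if verify_upload(msg, preset_format):
        return "verified"
    return "upload-failed-retry"
```

A mismatch in either check is enough to reject the message, which matches the "and / or" condition in step S303.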

[0065] In an optional implementation, to store the data file at the target location, step S203 includes:

[0066] Step S2031: Determine the second identification information based on the above row key value. The second identification information corresponds to the ninth to twelfth bytes of the above index number and is in the reverse order.

[0067] Specifically, as shown in Figure 3, after the data verification succeeds, the data file is sent to the target server according to the cluster group code, the cluster code, and the machine code in the index code. The target server then obtains the reversed sequence code, i.e., the second identification information mentioned above, from the portion of the row key value (obtained by reversing the index code) that corresponds to the sequence code.

[0068] Step S2032: Obtain third identification information, which is the unique identifier of each partition in the first target server.

[0069] Specifically, obtain the number corresponding to each storage partition in the server, i.e., the third identification information mentioned above.

[0070] In practice, each partition in the target server is used to store data files with sequence numbers corresponding to a certain segment. For example, partition number 1 can be used to store data files with sequence numbers 1 to 1000.

[0071] Step S2033: Determine the first target partition based on the second and third identification information and write the data to be stored into the first target partition.

[0072] Specifically, based on the second identification information mentioned above, the third identification information corresponding to it can be determined, thereby determining the target partition into which the data file is written.
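Steps S2031 to S2033 can be pictured with the following sketch. The application does not say how the reversed sequence code is matched to a partition identifier, so the range-based lookup, the split points, and the partition numbers below are purely hypothetical. The sketch assumes a 12-character row key whose first four characters are the reversed sequence code, and a base-32 alphabet of 0-9 followed by A-V, whose ASCII order matches its numeric order.

```python
from bisect import bisect_right

# Hypothetical layout: four partitions, split on the leading characters of the row key.
SPLIT_POINTS = ["8", "G", "O"]   # lower bounds of partitions 2, 3 and 4
PARTITION_IDS = [1, 2, 3, 4]     # third identification information (one ID per partition)


def second_identification(row_key: str) -> str:
    """Extract the reversed sequence code, i.e. the first four characters of the row key."""
    if len(row_key) != 12:
        raise ValueError("row key must be 12 characters long")
    return row_key[:4]


def target_partition(row_key: str) -> int:
    """Find the partition whose key range contains the reversed sequence code."""
    position = bisect_right(SPLIT_POINTS, second_identification(row_key))
    return PARTITION_IDS[position]


print(target_partition("8V0017323111"))  # prints 2
```

With this layout, the fastest-changing character of the sequence code decides the partition, so consecutive uploads move from one partition to the next instead of piling onto a single one.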

[0073] In order to retrieve target data from the server, in one optional implementation, after storing the data to be stored in the first target partition of the first target server according to the row key value, the method further includes:

[0074] Step S401: Obtain the target index number and determine the storage status of the target data based on the target index number. The target index number is the index number corresponding to the target data. The storage status includes deleted and not deleted.

[0075] Specifically, as shown in Figure 4, the index number of the target data is obtained, the file upload time is determined from the bytes of the index number that encode the upload time, and this time is then compared with the current time to determine whether the target file has been deleted.

[0076] Step S402: If the storage status is not deleted, determine the target cluster group based on the target index number. The target cluster group is the cluster group to which the target data belongs.

[0077] Specifically, as shown in Figure 4, if the file has not been deleted, the cluster group where the target data is located is determined from the cluster group code in the index number.

[0078] Step S403: Determine the target cluster based on the target index number mentioned above. The target cluster is the cluster to which the target data belongs.

[0079] Specifically, as shown in Figure 4, the cluster where the target data is located, and the corresponding access method, are determined from the byte of the index number that holds the cluster code.

[0080] Step S404: Determine the second target server based on the target index number mentioned above. The second target server is the server to which the target data belongs.

[0081] Specifically, as shown in Figure 4, the server storing the target data is determined from the byte of the index number that holds the machine code.

[0082] Step S405: Reverse the target index number to obtain the target row key value, and read the target data from the second target partition of the second target server according to the target row key value.

[0083] Specifically, as shown in Figure 4, the index number is reversed to obtain the corresponding row key value. The target partition is determined by comparing row key values, and the data with the matching row key value is then read from the target partition.
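The lookup in steps S402 to S405 amounts to slicing fixed positions out of the index number and then reversing it. The sketch below assumes the field order given in the encoding steps (cluster group, cluster, machine, date, sequence); the type and function names are hypothetical.

```python
from typing import NamedTuple


class IndexFields(NamedTuple):
    group_code: str     # bytes 1-2: cluster group
    cluster_flag: str   # byte 3: cluster (and access method)
    machine_code: str   # byte 4: server
    date_code: str      # bytes 5-8: upload date
    sequence_code: str  # bytes 9-12: sequence code


def parse_index_number(index_number: str) -> IndexFields:
    """Split a 12-character index number into its fields."""
    if len(index_number) != 12:
        raise ValueError("index number must be 12 characters long")
    return IndexFields(
        group_code=index_number[0:2],
        cluster_flag=index_number[2],
        machine_code=index_number[3],
        date_code=index_number[4:8],
        sequence_code=index_number[8:12],
    )


def to_row_key(index_number: str) -> str:
    """Reverse the index number to obtain the row key value."""
    return index_number[::-1]


fields = parse_index_number("1113237100V8")
print(fields.group_code, fields.cluster_flag, fields.machine_code)  # prints 11 1 3
print(to_row_key("1113237100V8"))                                   # prints 8V0017323111
```

Routing uses the unreversed index number, while the final read uses the reversed row key, so both forms are derived from the same 12 characters.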

[0084] To determine whether the data is stored on the target server, in one optional implementation, step S401 includes:

[0085] Step S4011: Determine the first time information based on the target index number.

[0086] Specifically, the upload time of the file is determined based on the index number.

[0087] Step S4012: Obtain second time information and target lifecycle, wherein the second time information is the current time and the target lifecycle is the retention time of the target data in the target server;

[0088] Specifically, the data retention time on the target server is determined based on the aforementioned lifecycle and the time for data access.

[0089] Step S4013: Determine the storage state of the target data based on the first time information, the second time information, and the target lifecycle.

[0090] Specifically, the time interval between the data upload time and the data access time is determined. If the time interval is greater than the target lifecycle, the target data has been deleted; if the time interval is less than the target lifecycle, the target data is still stored on the target server.
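Steps S4011 to S4013 can be sketched as a date comparison. The sketch assumes that the two-digit year belongs to the 2000s, that the month and day are single base-32 digits in the seventh and eighth bytes (alphabet 0-9 followed by A-V), that the lifecycle is expressed in days, and that an interval exactly equal to the lifecycle counts as not deleted. None of these details are stated in the application, and the names are hypothetical.

```python
from datetime import date

BASE32_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # assumed alphabet


def upload_date(index_number: str) -> date:
    """Recover the first time information from bytes 5-8 of the index number."""
    year = 2000 + int(index_number[4:6])          # assumption: years 2000-2099
    month = BASE32_DIGITS.index(index_number[6])  # assumed: month in byte 7
    day = BASE32_DIGITS.index(index_number[7])    # assumed: day in byte 8
    return date(year, month, day)


def is_deleted(index_number: str, today: date, lifecycle_days: int) -> bool:
    """Return True if the time since upload exceeds the retention period."""
    elapsed_days = (today - upload_date(index_number)).days
    return elapsed_days > lifecycle_days


print(is_deleted("1113237100V8", date(2023, 7, 31), 90))  # prints False
print(is_deleted("1113237100V8", date(2024, 1, 1), 90))   # prints True
```

Because the upload date is embedded in the index number, this check needs no round trip to the cluster, so expired requests can be rejected before any server is contacted.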

[0091] To determine the legitimacy of the access request, in one optional implementation, the method further includes, before obtaining the target index number:

[0092] Step S501: Obtain the data access message and parse the data access message to obtain the second message format, wherein the second message format includes the format information of the data access message.

[0093] Specifically, as shown in Figure 4, to determine the legitimacy of a data access message before the server is accessed, the format of the data message is first obtained and matched against a preset format.

[0094] Step S502: If the second message format is consistent with the second preset format, determine that the data access message verification is correct.

[0095] Specifically, as shown in Figure 4, if the message format is the same as the preset format, the request is confirmed and proceeds to the next access step.

[0096] Step S503: If the second message format is inconsistent with the second preset format, determine that the data access message verification is incorrect and send a second signaling message, which is used to indicate that the data access has failed.

[0097] Specifically, as shown in Figure 4, if the message format differs from the preset format, the request is determined to be insecure, and an instruction is issued indicating that the access has failed and should be attempted again.

[0098] It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.

[0099] This application also provides a cross-HBase cluster data storage device. It should be noted that the cross-HBase cluster data storage device of this application can be used to execute the cross-HBase cluster data storage method provided in this application. This device is used to implement the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and / or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

[0100] The following describes the data storage device across HBase clusters provided in the embodiments of this application.

[0101] Figure 5 is a structural block diagram of a cross-HBase cluster data storage device according to an embodiment of this application. As shown in Figure 5, the device includes:

[0102] The first acquisition unit 10 is used to acquire, upon receiving a data upload message, the location information, the first time information, and the first identification information of the data to be stored. The location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server. The first target server is the server where the data to be stored is planned to be stored.

[0103] Specifically, when new business data is generated in the business system, a data upload message is sent to the resource management module in the system. After receiving the upload message, the resource management module parses the message to obtain the storage location requested by the message, including the cluster group, cluster, and server to be stored, i.e., the aforementioned location information. At the same time, in order to determine the timeliness of the file, the message also includes the data upload time, i.e., the aforementioned first time information. In addition, in order to determine the specific partition in the server to which the data will be stored, the message also includes the sequence code corresponding to the partition, i.e., the aforementioned first identification information.

[0104] Encoding unit 20 is used to encode the above-mentioned location information, the above-mentioned first time information and the above-mentioned first identification information to obtain an index number, the index number being used to uniquely identify the above-mentioned data to be stored;

[0105] In this embodiment, the index number consisting of 12 letters or numbers can be obtained by encoding the information obtained from the above parsing. This includes a 2-digit cluster group code, a 1-digit cluster identifier code, a 1-digit server machine identifier code, a 4-digit date code, and a 4-digit sequence code. The order of different types of encoding is not limited, and each index number corresponds to a unique data file.

[0106] The reversal unit 30 is used to reverse the index number to obtain the row key value of the data to be stored, and store the data to be stored in the first target partition in the first target server according to the row key value. The first target partition is the storage partition corresponding to the bytes of the first identification information in the row key value. The byte order of the row key value is the reverse of that of the index number.

[0107] Specifically, the byte order in the above index number is reversed to obtain the row key value of the corresponding data. After the sequence code is reversed, the files that were originally uploaded in adjacent order are stored according to the partition corresponding to the reversed sequence code, and are no longer stored in the same partition. This avoids the phenomenon of hotspots caused by a large number of storage operations on the same partition in a short period of time.
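The effect of the reversal on consecutive uploads can be seen in a small experiment. The sketch assumes a 4-digit base-32 sequence code at the end of the index number and an alphabet of 0-9 followed by A-V; the 8-character prefix and the helper names are hypothetical.

```python
BASE32_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # assumed alphabet


def to_base32(value: int, width: int) -> str:
    """Encode a non-negative integer as a fixed-width base-32 string."""
    if not 0 <= value < 32 ** width:
        raise ValueError(f"{value} does not fit in {width} base-32 digit(s)")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 32)
        digits.append(BASE32_DIGITS[remainder])
    return "".join(reversed(digits))


def leading_chars(prefix: str, start: int, count: int):
    """Compare the first character of each key before and after reversal."""
    plain, reversed_keys = [], []
    for sequence in range(start, start + count):
        index_number = prefix + to_base32(sequence, 4)
        plain.append(index_number[0])
        reversed_keys.append(index_number[::-1][0])
    return plain, reversed_keys


plain, reversed_keys = leading_chars("11132371", 1000, 4)
print(plain)          # prints ['1', '1', '1', '1']
print(reversed_keys)  # prints ['8', '9', 'A', 'B']
```

Unreversed keys from the same cluster group all share a leading character and therefore sort next to each other, whereas the reversed keys begin with the fastest-changing digit of the sequence code and spread across the key space.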

[0108] In this embodiment, upon receiving a data upload message, the first acquisition unit acquires the location information, first time information, and first identification information of the data to be stored. The location information represents the planned storage location of the data to be stored, the first time information represents the upload time of the data to be stored, and the first identification information represents the planned storage partition of the data to be stored in the first target server. The first target server is the server on which the data is planned to be stored. The encoding unit encodes the location information, the first time information, and the first identification information to obtain an index number that uniquely identifies the data to be stored. The reversal unit reverses the index number to obtain the row key value of the data to be stored, and stores the data in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the bytes of the first identification information in the row key value, and the byte order of the row key value is the reverse of that of the index number. This application designs a file index code comprising a cluster group code, a cluster code, a machine code, a date code, and a sequence code to support fast data indexing. The file index number is then reversed to obtain the row key value. Reversal changes the order of the sequence code, so files that would otherwise be stored sequentially in the same area are distributed across different areas according to the row key value. This solves the hotspot problem in existing technologies, where storing files sequentially in the same area reduces data storage speed. At the same time, the cluster group code, cluster code, machine code, and date code allow data to be located quickly, which accelerates retrieval.

[0109] In order to generate an index number corresponding to the data file according to preset rules, in one optional implementation, the above-mentioned encoding unit includes:

[0110] The first encoding module is used to encode the first location information in decimal to obtain the first byte and the second byte of the index number. The first location information is used to characterize the geographical location of the cluster group where the data to be stored is planned to be stored. The cluster group includes multiple HBase clusters and Hadoop clusters.

[0111] Specifically, cluster groups are identified by assigning codes to them.

[0112] In practice, cluster groups can be divided according to administrative area codes, taking the first two digits of the national unified administrative division code, such as Beijing-11, Tianjin-12, Hebei-13, etc.

[0113] The second encoding module is used to encode the second position information into binary to obtain the third byte of the index number. The second position information is used to characterize the cluster where the data to be stored is planned to be stored.

[0114] Specifically, files are stored in different clusters depending on their size. Based on the characteristics of Hadoop and HBase clusters, HDFS in Hadoop is suitable for storing large files, while HBase is suitable for storing small files. Corresponding encodings are set for different storage locations.

[0115] In practice, depending on the file size, with 50MB as the limit, small files smaller than 50MB are stored in HBase and their identifier is set to 1; files larger than 50MB are stored in HDFS and their identifier is set to 0. Small files are directly stored in columns in HBase, while large files are stored in HDFS, with their paths also stored in columns in HBase. This identifier identifies the file's storage and access method. The methods for reading content objects stored in different locations differ.

[0116] The third encoding module is used to encode the third position information in base-32 to obtain the fourth byte of the index number. The third position information is used to characterize the server where the data to be stored is planned to be stored.

[0117] Specifically, this code uniquely identifies each server within a cluster group, so the server on which a data file is stored can be determined from it.

[0118] The fourth encoding module is used to encode the first time sub-information in decimal to obtain the fifth and sixth bytes of the index number, and to encode the second time sub-information in base-32 to obtain the seventh and eighth bytes of the index number. The first time sub-information is the part of the first time information used to represent the year, and the second time sub-information is the part of the first time information used to represent the month and day.

[0119] Specifically, the file's upload time is obtained, for example 2023/07/01. During encoding, the last two digits of the year are extracted, and the month and day are converted to base-32, to obtain the fifth to eighth bytes mentioned above.

[0120] The fifth encoding module is used to encode the first identification information in base-32 to obtain the ninth to twelfth bytes of the index number.

[0121] Specifically, the sequence code is an identifier generated by each server for the data file. Different segments of the identifier code correspond to different partitions, and in this application, it is represented by a 4-digit base-32 code.

[0122] To ensure the integrity of the stored data files, in one optional embodiment, the above-mentioned apparatus includes:

[0123] The first parsing unit is used to parse the data upload message before obtaining the location information, first time information and first identification information of the data to be stored, to obtain the first message format and message data volume. The first message format includes the format information of the data upload message, and the message data volume is the data size of the data upload message.

[0124] Specifically, as shown in Figure 3, before data is stored on the server, the integrity of both the data upload message and the data file is checked. First, the format of the data message is obtained and matched against the preset format. Then, the size of the data file is compared with the recorded size.

[0125] The first determining unit is used to determine that the data upload message is correct when the first message format is consistent with the first preset format and the message data volume is consistent with the preset data volume.

[0126] Specifically, as shown in Figure 3, if the message format is the same as the preset format, the request is confirmed to be secure. If the data volume is consistent, it is confirmed that no data was lost during transmission, and the storage process can continue.

[0127] The second determining unit is used to determine that the data upload message verification is incorrect and issue a first signaling when the first message format is inconsistent with the first preset format and / or the message data volume is inconsistent with the preset data volume. The first signaling is used to indicate that storage of the data to be stored has failed.

[0128] Specifically, as shown in Figure 3, if the message format differs from the preset format, the request is determined to be insecure; if the data volume is inconsistent, the data file is determined to have lost data during transmission. If either situation occurs, an instruction is issued indicating that the upload has failed and that the data should be uploaded again.

[0129] In an optional implementation, to store the data file to the target location, the aforementioned reversal unit includes:

[0130] The first determining module is used to determine the second identification information based on the above row key value. The second identification information corresponds to the ninth to twelfth bytes of the above index number and is in the reverse order.

[0131] Specifically, as shown in Figure 3, after the data verification succeeds, the data file is sent to the target server according to the cluster group code, the cluster code, and the machine code in the index code. The target server then obtains the reversed sequence code, i.e., the second identification information mentioned above, from the portion of the row key value (obtained by reversing the index code) that corresponds to the sequence code.

[0132] The first acquisition module is used to acquire third identification information, which is the unique identifier of each partition in the first target server.

[0133] Specifically, obtain the number corresponding to each storage partition in the server, i.e., the third identification information mentioned above.

[0134] In practice, each partition in the target server is used to store data files with sequence numbers corresponding to a certain segment. For example, partition number 1 can be used to store data files with sequence numbers 1 to 1000.

[0135] The second determining module is used to determine the first target partition based on the second identification information and the third identification information and to write the data to be stored into the first target partition.

[0136] Specifically, based on the second identification information mentioned above, the third identification information corresponding to it can be determined, thereby determining the target partition into which the data file is written.

[0137] In an optional embodiment, to retrieve target data from the server, the above-mentioned apparatus further includes:

[0138] The second acquisition unit is used to acquire a target index number and determine the storage status of the target data according to the target index number after storing the data to be stored in the first target partition of the first target server according to the row key value. The target index number is the index number corresponding to the target data. The storage status includes deleted and not deleted.

[0139] Specifically, as shown in Figure 4, the index number of the target data is obtained, the file upload time is determined from the bytes of the index number that encode the upload time, and this time is then compared with the current time to determine whether the target file has been deleted.

[0140] The third determining unit is used to determine the target cluster group based on the target index number when the storage status is not deleted. The target cluster group is the cluster group to which the target data belongs.

[0141] Specifically, as shown in Figure 4, if the file has not been deleted, the cluster group where the target data is located is determined from the cluster group code in the index number.

[0142] The fourth determining unit is used to determine the target cluster based on the target index number mentioned above, wherein the target cluster is the cluster to which the target data belongs;

[0143] Specifically, as shown in Figure 4, the cluster where the target data is located, and the corresponding access method, are determined from the byte of the index number that holds the cluster code.

[0144] The fifth determining unit is used to determine the second target server based on the target index number mentioned above, wherein the second target server is the server to which the target data belongs;

[0145] Specifically, as shown in Figure 4, the server storing the target data is determined from the byte of the index number that holds the machine code.

[0146] The sixth determining unit is used to reverse the target index number to obtain the target row key value, and read the target data from the second target partition of the second target server according to the target row key value.

[0147] Specifically, as shown in Figure 4, the index number is reversed to obtain the corresponding row key value. The target partition is determined by comparing row key values, and the data with the matching row key value is then read from the target partition.

[0148] To determine whether data is stored on the target server, in one optional implementation, the second acquisition unit includes:

[0149] The third determining module is used to determine the aforementioned first time information based on the aforementioned target index number;

[0150] Specifically, the upload time of the file is determined based on the index number.

[0151] The second acquisition module is used to acquire second time information and target lifecycle, wherein the second time information is the current time and the target lifecycle is the retention time of the target data in the target server.

[0152] Specifically, the data retention time on the target server is determined based on the aforementioned lifecycle and the time for data access.

[0153] The fourth determining module is used to determine the storage state of the target data based on the first time information, the second time information, and the target lifecycle.

[0154] Specifically, the time interval between the data upload time and the data access time is determined. If the time interval is greater than the target lifecycle, the target data has been deleted; if the time interval is less than the target lifecycle, the target data is still stored on the target server.

[0155] To determine the legitimacy of the access request, in one optional implementation, the above-mentioned apparatus further includes:

[0156] The third acquisition unit is used to acquire a data access message and parse the data access message to obtain a second message format, wherein the second message format includes the format information of the data access message.

[0157] Specifically, as shown in Figure 4, to determine the legitimacy of a data access message before the server is accessed, the format of the data message is first obtained and matched against a preset format.

[0158] The seventh determining unit is used to determine that the above data access message verification is correct when the above second message format is consistent with the second preset format;

[0159] Specifically, as shown in Figure 4, if the message format is the same as the preset format, the request is confirmed and proceeds to the next access step.

[0160] The eighth determining unit is used to determine that the data access message verification is incorrect and send a second signaling when the second message format is inconsistent with the second preset format. The second signaling is used to indicate that the data access has failed.

[0161] Specifically, as shown in Figure 4, if the message format differs from the preset format, the request is determined to be insecure, and an instruction is issued indicating that the access has failed and should be attempted again.

[0162] The aforementioned cross-HBase cluster data storage device includes a processor and memory. The first acquisition unit, encoding unit, and reversal unit are all stored as program units in the memory, and the processor executes these program units stored in the memory to achieve the corresponding functions. All of the above modules reside in the same processor; alternatively, the modules may be located in different processors in any combination.

[0163] A processor contains a kernel, which retrieves the corresponding program units from memory. One or more kernels can be configured, and hotspots in data storage can be avoided by adjusting kernel parameters.

[0164] The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and / or non-volatile memory, such as read-only memory (ROM) or flash RAM, and the memory includes at least one memory chip.

[0165] This invention provides a computer-readable storage medium including a stored program, wherein the program, when running, controls the device containing the computer-readable storage medium to execute the cross-HBase cluster data storage method.

[0166] Specifically, data storage methods across HBase clusters include:

[0167] Step S201: Upon receiving a data upload message, obtain the location information, first time information, and first identification information of the data to be stored. The location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server. The first target server is the server where the data to be stored is planned to be stored.

[0168] Step S202: Encode the location information, the first time information, and the first identification information to obtain an index number, which is used to uniquely identify the data to be stored.

[0169] Step S203: Reverse the index number to obtain the row key value of the data to be stored, and store the data to be stored in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the byte of the first identification information in the row key value. The byte order of the row key value is the reverse of that of the index number.
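The reversal in step S203 can be illustrated with a minimal Python sketch. It assumes that each byte of the index number is one ASCII character; the function name `to_row_key` and the sample index numbers are hypothetical and do not come from the patent.

```python
def to_row_key(index_number: str) -> str:
    """Reverse the index number character by character.

    Assumption: one ASCII character per byte, so reversing the string
    reverses the byte order.
    """
    return index_number[::-1]


# Hypothetical 12-character index numbers that differ only in their last character.
index_numbers = ["01A1230C0007", "01A1230C0008", "01A1230C0009"]

# After reversal, the part that varies moves to the front of the row key.
row_keys = [to_row_key(n) for n in index_numbers]
```

Because the reversal is its own inverse, applying `to_row_key` to a row key returns the original index number, which is what the read path relies on.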

[0170] This invention provides a processor for running a program, wherein the program executes the data storage method across an HBase cluster.

[0171] Specifically, data storage methods across HBase clusters include:

[0172] Step S201: Upon receiving a data upload message, obtain the location information, first time information, and first identification information of the data to be stored. The location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server. The first target server is the server where the data to be stored is planned to be stored.

[0173] Step S202: Encode the location information, the first time information, and the first identification information to obtain an index number, which is used to uniquely identify the data to be stored.

[0174] Step S203: Reverse the index number to obtain the row key value of the data to be stored, and store the data to be stored in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the byte of the first identification information in the row key value. The byte order of the row key value is the reverse of that of the index number.

[0175] This invention provides a business system, which includes a processor, a memory, and a program stored in the memory and executable on the processor. When the processor executes the program, it performs at least the following steps:

[0176] Step S201: Upon receiving a data upload message, obtain the location information, first time information, and first identification information of the data to be stored. The location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server. The first target server is the server where the data to be stored is planned to be stored.

[0177] Step S202: Encode the location information, the first time information, and the first identification information to obtain an index number, which is used to uniquely identify the data to be stored.

[0178] Step S203: Reverse the index number to obtain the row key value of the data to be stored, and store the data to be stored in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the byte of the first identification information in the row key value. The byte order of the row key value is the reverse of that of the index number.

[0179] This application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with at least the following method steps:

[0180] Step S201: Upon receiving a data upload message, obtain the location information, first time information, and first identification information of the data to be stored. The location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server. The first target server is the server where the data to be stored is planned to be stored.

[0181] Step S202: Encode the location information, the first time information, and the first identification information to obtain an index number, which is used to uniquely identify the data to be stored.

[0182] Step S203: Reverse the index number to obtain the row key value of the data to be stored, and store the data to be stored in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the byte of the first identification information in the row key value. The byte order of the row key value is the reverse of that of the index number.

[0183] It is obvious to those skilled in the art that the modules or steps of the present invention described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. They can be implemented using computer-executable program code, and thus can be stored in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those described herein, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any particular combination of hardware and software.

[0184] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0185] This application is described with reference to flowcharts and / or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It will be understood that each flow and / or block of the flowcharts and / or block diagrams, and combinations of flows and / or blocks in the flowcharts and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, produce means for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0186] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0187] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0188] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.

[0189] Memory may include non-persistent memory in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0190] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0191] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0192] As can be seen from the above description, the embodiments of this application achieve the following technical effects:

[0193] 1) In the cross-HBase cluster data storage method of this application, upon receiving a data upload message, the location information, first time information, and first identification information of the data to be stored are first obtained. The location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in a first target server, where the first target server is the server where the data to be stored is planned to be stored. Then, the location information, the first time information, and the first identification information are encoded to obtain an index number, which is used to uniquely identify the data to be stored. Finally, the index number is reversed to obtain the row key value of the data to be stored, and the data to be stored is stored in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the byte of the first identification information in the row key value, and the byte order of the row key value is the reverse of that of the index number. This application designs a file index code, including a cluster group code, a cluster code, a sequence code, a machine code, and a date code, to support fast data indexing. The file index number is then reversed to obtain the row key value. During the reversal, the sequence code changes order, so files that would otherwise be stored sequentially in the same area are distributed across different areas according to the row key value. This solves the problem in existing technologies that storing files sequentially in the same area creates hotspots, which reduce data storage speed. At the same time, the cluster group code, cluster code, machine code, and date code support rapid location of data, accelerating retrieval.
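The hotspot effect described above can be seen in a small simulation. HBase keeps rows sorted by row key, and each region serves a contiguous key range, so keys sharing a long common prefix land in the same region. The sketch below is a toy model under stated assumptions: the partition is chosen from the first character of the key only, and the 8-character prefix, the 4-digit sequence code, and the name `partition_of` are hypothetical.

```python
from collections import Counter


def partition_of(key: str, partitions: int = 10) -> int:
    """Toy range partitioning: the leading decimal digit selects the partition."""
    return int(key[0]) % partitions


# 100 consecutive index numbers: a fixed hypothetical prefix plus a 4-digit sequence code.
index_numbers = [f"01A1230C{seq:04d}" for seq in range(100)]

# Using the index number directly as the key: every write hits the same partition.
before = Counter(partition_of(n) for n in index_numbers)

# Using the reversed index number as the row key: writes spread over all partitions.
after = Counter(partition_of(n[::-1]) for n in index_numbers)

print(before)  # all 100 keys in partition 0
print(after)   # 10 keys in each of partitions 0..9
```

In this model the unreversed keys all begin with the same character, while the reversed keys begin with the fastest-changing digit of the sequence code, which is why consecutive uploads no longer pile up in one place.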

[0194] 2) In the cross-HBase cluster data storage device of this application, upon receiving a data upload message, the first acquisition unit acquires the location information, first time information, and first identification information of the data to be stored. The location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, and the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server. The first target server is the server where the data to be stored is planned to be stored. The encoding unit encodes the location information, the first time information, and the first identification information to obtain an index number, which is used to uniquely identify the data to be stored. The reversal unit reverses the index number to obtain the row key value of the data to be stored, and stores the data to be stored in the first target partition of the first target server according to the row key value. The first target partition is the storage partition corresponding to the byte of the first identification information in the row key value, and the byte order of the row key value is the reverse of that of the index number. This application designs a file index code, including a cluster group code, a cluster code, a sequence code, a machine code, and a date code, to support fast data indexing. The file index number is then reversed to obtain the row key value. During the reversal, the sequence code changes order, so files that would otherwise be stored sequentially in the same area are distributed across different areas according to the row key value. This solves the problem in existing technologies that storing files sequentially in the same area creates hotspots, which reduce data storage speed. At the same time, the cluster group code, cluster code, machine code, and date code support rapid location of data, accelerating retrieval.

[0195] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A data storage method across an HBase cluster, characterized in that the method includes: upon receiving a data upload message, obtaining the location information, first time information, and first identification information of the data to be stored, wherein the location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server, and the first target server is the server where the data to be stored is planned to be stored; encoding the location information, the first time information, and the first identification information to obtain an index number, which is used to uniquely identify the data to be stored; reversing the index number to obtain the row key value of the data to be stored; and storing the data to be stored in the first target partition of the first target server according to the row key value, wherein the first target partition is the storage partition corresponding to the byte of the first identification information in the row key value, and the byte order of the row key value is the reverse of that of the index number;
wherein encoding the location information, the first time information, and the first identification information to obtain an index number includes: encoding the first location information in decimal to obtain the first byte and the second byte of the index number, wherein the first location information is used to characterize the geographical location of the cluster group where the data to be stored is planned to be stored, and the cluster group includes multiple HBase clusters and Hadoop clusters; binary encoding the second location information to obtain the third byte of the index number, wherein the second location information is used to characterize the cluster where the data to be stored is planned to be stored; encoding the third location information into 30 binary to obtain the fourth byte of the index number, wherein the third location information is used to characterize the server where the data to be stored is planned to be stored; encoding the first time sub-information in decimal to obtain the fifth and sixth bytes of the index number, and encoding the second time sub-information in 30-bit binary to obtain the seventh and eighth bytes of the index number, wherein the first time sub-information is the part of the first time information used to represent the year, and the second time sub-information is the part of the first time information used to represent the month and day; and encoding the first identification information in 30-bit binary to obtain the ninth to twelfth bytes of the index number;
wherein storing the data to be stored in the first target partition of the first target server according to the row key value includes: determining second identification information based on the row key value, wherein the second identification information corresponds to the ninth to twelfth bytes of the index number and is in reverse order; obtaining third identification information, which is the unique identifier of each partition in the first target server; and determining the first target partition based on the second identification information and the third identification information, and writing the data to be stored into the first target partition.
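The 12-byte layout recited in claim 1 can be sketched as follows. This is only an illustration under stated assumptions: each byte is treated as one ASCII character, the fields are passed in already encoded (the radix of each field is left to the caller), and the names `FIELD_WIDTHS`, `build_index_number`, and `second_identification` are hypothetical.

```python
# Field widths in bytes, in index-number order, following claim 1.
FIELD_WIDTHS = {
    "cluster_group": 2,  # bytes 1-2: first location information
    "cluster": 1,        # byte 3: second location information
    "server": 1,         # byte 4: third location information
    "year": 2,           # bytes 5-6: first time sub-information
    "month_day": 2,      # bytes 7-8: second time sub-information
    "partition_id": 4,   # bytes 9-12: first identification information
}


def build_index_number(fields: dict) -> str:
    """Concatenate pre-encoded fields into a 12-character index number."""
    parts = []
    for name, width in FIELD_WIDTHS.items():
        value = fields[name]
        if len(value) != width:
            raise ValueError(f"{name} must be {width} characters, got {value!r}")
        parts.append(value)
    return "".join(parts)


def second_identification(row_key: str) -> str:
    """Bytes 9-12 of the index number appear reversed at the start of the row key."""
    return row_key[:4]


# Hypothetical field values, for illustration only.
fields = {"cluster_group": "01", "cluster": "A", "server": "1",
          "year": "23", "month_day": "0C", "partition_id": "0007"}
index_number = build_index_number(fields)
row_key = index_number[::-1]
```

How the second identification information is matched against the third identification information to pick the partition is not specified in the claim, so the sketch stops at extracting it.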

2. The method according to claim 1, characterized in that, before acquiring the location information, first time information, and first identification information of the data to be stored, the method further includes: parsing the data upload message to obtain a first message format and a message data size, wherein the first message format includes the format information of the data upload message, and the message data size is the data size of the data upload message; if the first message format is consistent with the first preset format and the message data size is consistent with the preset data size, determining that the data upload message passes verification; and if the first message format is inconsistent with the first preset format and / or the message data size is inconsistent with the preset data size, determining that the data upload message fails verification and issuing a first signaling, wherein the first signaling is used to indicate that storage of the data to be stored has failed.
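The check in claim 2 can be sketched in Python. The sketch assumes that "consistent" means exact equality for both the format and the size; the claim does not define the comparison, and the names `check_upload_message` and `FIRST_SIGNALING` are hypothetical.

```python
from typing import Optional

# Hypothetical value standing in for the first signaling (storage failed).
FIRST_SIGNALING = "STORAGE_FAILED"


def check_upload_message(message_format: str, data_size: int,
                         preset_format: str, preset_size: int) -> Optional[str]:
    """Return None if the message passes verification, else the failure signaling.

    Assumption: both the format and the size must match the preset values exactly.
    """
    if message_format == preset_format and data_size == preset_size:
        return None
    return FIRST_SIGNALING
```

A mismatch in either value, or in both, yields the same signaling, which mirrors the "and / or" wording of the claim.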

3. The method according to claim 1, characterized in that, after storing the data to be stored in the first target partition of the first target server according to the row key value, the method further includes: obtaining a target index number and determining the storage status of the target data based on the target index number, wherein the target index number is the index number corresponding to the target data, and the storage status includes deleted and not deleted; if the storage status is not deleted, determining the target cluster group according to the target index number, the target cluster group being the cluster group to which the target data belongs; determining the target cluster based on the target index number, the target cluster being the cluster to which the target data belongs; determining the second target server based on the target index number, the second target server being the server to which the target data belongs; and reversing the target index number to obtain the target row key value, and reading the target data from the second target partition of the second target server according to the target row key value.
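The lookup in claim 3 can be sketched as plain slicing of the target index number. The sketch assumes the 12-byte layout of claim 1 with one ASCII character per byte; the function name `locate` and the sample index number are hypothetical.

```python
def locate(target_index: str) -> dict:
    """Split a 12-character target index number into routing fields and the row key.

    Assumption: bytes 1-2 identify the cluster group, byte 3 the cluster,
    and byte 4 the server, as laid out in claim 1.
    """
    if len(target_index) != 12:
        raise ValueError("index number must be 12 characters")
    return {
        "cluster_group": target_index[0:2],
        "cluster": target_index[2],
        "server": target_index[3],
        "row_key": target_index[::-1],  # reversed index number
    }


route = locate("01A1230C0007")
```

The routing fields narrow the search to one server, and the reversed index number is then used as the row key for the read.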

4. The method according to claim 3, characterized in that, before obtaining the target index number, the method further includes: acquiring and parsing a data access message to obtain a second message format, the second message format including the format information of the data access message; if the second message format is consistent with the second preset format, determining that the data access message passes verification; and if the second message format is inconsistent with the second preset format, determining that the data access message fails verification and sending a second signaling, the second signaling being used to indicate that the data access has failed.

5. The method according to claim 4, characterized in that determining the storage status of the target data based on the target index number includes: determining the first time information based on the target index number; obtaining second time information and a target lifecycle, wherein the second time information is the current time and the target lifecycle is the retention time of the target data in the target server; and determining the storage status of the target data based on the first time information, the second time information, and the target lifecycle.
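One way to read claim 5 is as an expiry check. The sketch below assumes that the data counts as deleted once the current time reaches the upload time plus the lifecycle; the claim names the three inputs but not the rule, so this rule, the day granularity, and the name `storage_status` are assumptions.

```python
from datetime import date, timedelta


def storage_status(upload_date: date, today: date, lifecycle_days: int) -> str:
    """Return "deleted" or "not deleted" for the target data.

    Assumption: data expires when today >= upload_date + lifecycle_days.
    """
    expires_on = upload_date + timedelta(days=lifecycle_days)
    return "deleted" if today >= expires_on else "not deleted"


# Hypothetical example: uploaded on 2023-09-01 with a 30-day lifecycle.
status = storage_status(date(2023, 9, 1), date(2023, 9, 30), 30)
```

Since the upload time is carried inside the index number, this check needs no lookup in the cluster before deciding whether a read is worth attempting.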

6. A data storage device across an HBase cluster, characterized in that the device includes: a first acquisition unit, used to acquire the location information, first time information, and first identification information of the data to be stored when a data upload message is received, wherein the location information is used to characterize the planned storage location of the data to be stored, the first time information is used to characterize the upload time of the data to be stored, the first identification information is used to characterize the planned storage partition of the data to be stored in the first target server, and the first target server is the server where the data to be stored is planned to be stored; an encoding unit, used to encode the location information, the first time information, and the first identification information to obtain an index number, the index number being used to uniquely identify the data to be stored; and a reversal unit, used to reverse the index number to obtain the row key value of the data to be stored, and to store the data to be stored in the first target partition of the first target server according to the row key value, wherein the first target partition is the storage partition corresponding to the byte corresponding to the first identification information in the row key value, and the byte order of the row key value is the reverse of that of the index number;
wherein the encoding unit includes: a first encoding module, used to encode the first location information in decimal to obtain the first byte and the second byte of the index number, wherein the first location information is used to characterize the geographical location of the cluster group where the data to be stored is planned to be stored, and the cluster group includes multiple HBase clusters and Hadoop clusters; a second encoding module, used to binary encode the second location information to obtain the third byte of the index number, wherein the second location information is used to characterize the cluster where the data to be stored is planned to be stored; a third encoding module, used to encode the third location information into 30 binary to obtain the fourth byte of the index number, wherein the third location information is used to characterize the server where the data to be stored is planned to be stored; a fourth encoding module, used to encode the first time sub-information in decimal to obtain the fifth and sixth bytes of the index number, and to encode the second time sub-information in 30-bit binary to obtain the seventh and eighth bytes of the index number, wherein the first time sub-information is the part of the first time information used to represent the year, and the second time sub-information is the part of the first time information used to represent the month and day; and a fifth encoding module, used to encode the first identification information in 30-bit binary to obtain the ninth to twelfth bytes of the index number;
wherein the reversal unit includes: a first determining module, used to determine second identification information based on the row key value, wherein the second identification information corresponds to the ninth to twelfth bytes of the index number and is in reverse order; a first acquisition module, used to acquire third identification information, wherein the third identification information is the unique identifier of each partition in the first target server; and a second determining module, used to determine the first target partition based on the second identification information and the third identification information, and to write the data to be stored into the first target partition.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein, when the program is executed, it controls the device on which the computer-readable storage medium is located to perform the method according to any one of claims 1 to 5.

8. A business system, characterized in that it includes: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the method according to any one of claims 1 to 5.

Citation Information

Patent Citations

Nucleic acid based data storage
CN110248724A
Reverse-byte indexing
US5956705A
