Data storage method, device, equipment, storage medium and product
By generating unique identifiers and querying them in the data storage area, the problem of redundant data in cloud storage is solved, and efficient utilization of storage resources is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING QIHOOD TECHNOLOGY CO LTD
- Filing Date
- 2024-12-17
- Publication Date
- 2026-06-19
AI Technical Summary
There is a lot of redundant data in cloud storage, which wastes storage resources.
By generating a unique identifier for the target file, a file query is performed in the data storage area based on this identifier. If no identical file is found, the identifier is added to the target file and stored to avoid storing redundant data.
Ensure that files with identical content are saved only once, saving storage resources and reducing storage costs.
[Figure: abstract drawing, CN122240577A_ABST]
Abstract
Description
Technical Field
[0001] This application relates to the field of data storage technology, and in particular to a data storage method, apparatus, device, storage medium and product.
Background Technology
[0002] Cloud storage is a data storage model that stores data on remote servers via the internet, managed and maintained by third-party providers. The advantage of cloud storage is that it allows users to access and manage their data from any location using any device over a network. This not only increases the flexibility of data management for users but also reduces the cost of local hardware investment and maintenance.
[0003] In related technologies, after receiving files uploaded by users, the server stores them. Due to the large user base of cloud storage, the amount of data stored on the server is enormous, inevitably containing a large amount of redundant data and wasting storage resources.
[0004] The above content is only used to help understand the technical solution of this application and does not represent an admission that the above content is prior art.
Summary of the Invention
[0005] The main objective of this application is to provide a data storage method, apparatus, device, storage medium, and product that can avoid storing redundant data, thereby saving storage resources.
[0006] To achieve the above objectives, this application proposes a data storage method, executed by a storage server, the method comprising:
[0007] In response to a data storage request, the target file to be stored is extracted from the data storage request;
[0008] Generate a unique identifier corresponding to the target file based on the content of the target file;
[0009] Based on the unique identifier, a file query is performed in the data storage area. If no file containing the unique identifier is found, the unique identifier is added to the target file.
[0010] The target file, with the unique identifier added, is stored in the data storage area.
[0011] Optionally, generating a unique identifier corresponding to the target file based on its content includes:
[0012] Perform a hash operation on the contents of the target file to obtain the hash value of the target file;
[0013] The hash value is determined as a unique identifier corresponding to the target file.
[0014] Optionally, adding the unique identifier to the target file includes:
[0015] The unique identifier is used as the filename of the target file, and the extension of the target file is removed, yielding a target file that is named with the unique identifier and has no extension.
[0016] Optionally, storing the target file with the added unique identifier in the data storage area includes:
[0017] A file index is created based on the unique identifier, and the file index contains the unique identifier and the target location indicated by the unique identifier;
[0018] The target file, which has been given the unique identifier, is stored at the target location in the data storage area.
[0019] Optionally, storing the target file with the added unique identifier at the target location in the data storage area includes:
[0020] Obtain the metadata of the target file;
[0021] The metadata is standardized to obtain metadata in the target format;
[0022] The target file, with the added unique identifier, and the metadata of the target format are stored at the target location in the data storage area.
[0023] Optionally, the method further includes:
[0024] In response to a data query request, a unique identifier for the target file to be queried is extracted from the data query request;
[0025] Based on the file index, determine the target location indicated by the unique identifier;
[0026] Query the target file from the target location in the data storage area;
[0027] Return a data query response, which carries the target file.
[0028] Optionally, the data storage request is sent by the database server in response to the data upload request, and the data upload request carries the user identifier, the target file to be uploaded, and the metadata of the target file;
[0029] After storing the target file with the added unique identifier in the data storage area, the method further includes:
[0030] A data storage response carrying the unique identifier is returned to the database server, instructing the database server to store the user identifier, the metadata of the target file, and the unique identifier together.
[0031] Optionally, the method further includes:
[0032] If a file containing the unique identifier is found in the data storage area, the target file in the data storage request is deleted, and the data storage response is returned to the database server.
[0033] To achieve the above objectives, this application proposes another data storage method, executed by a database server, the method comprising:
[0034] In response to a data upload request from a user terminal, the user identifier, the target file to be uploaded, and the metadata of the target file are extracted from the data upload request;
[0035] A data storage request carrying the target file is sent to the storage server, so that the storage server generates a unique identifier corresponding to the target file based on the content of the target file; a file query is performed in the data storage area based on the unique identifier; if no file containing the unique identifier is found, the target file with the added unique identifier is stored in the data storage area, and a data storage response carrying the unique identifier is returned.
[0036] In response to the data storage response, the unique identifier is extracted from the data storage response, and the user identifier, the metadata of the target file, and the unique identifier are stored together.
[0037] Optionally, the method further includes:
[0038] In response to a data access request from the user terminal, the user identifier and metadata of the target file are extracted from the data access request;
[0039] Determine a unique identifier that is associated with the user identifier and the metadata of the target file;
[0040] Send a data query request carrying the unique identifier to the storage server, so that the storage server can query the target file in the data storage area based on the unique identifier, and return a data query response carrying the target file to the database server;
[0041] In response to the data query response, a data access response is returned to the user terminal, the data access response carrying the target file.
[0042] Furthermore, to achieve the above objectives, this application also proposes a data storage device configured on a storage server, the device comprising:
[0043] The file acquisition module is used to extract the target file to be stored from the data storage request in response to the data storage request;
[0044] An identifier generation module is used to generate a unique identifier corresponding to the target file based on the content of the target file;
[0045] The identifier addition module is used to perform file queries in the data storage area based on the unique identifier, and add the unique identifier to the target file if no file containing the unique identifier is found.
[0046] The file storage module is used to store the target file with the added unique identifier in the data storage area.
[0047] Optionally, the identifier generation module is used to perform a hash operation on the content of the target file to obtain the hash value of the target file; and to determine the hash value as a unique identifier corresponding to the target file.
[0048] Optionally, the identifier addition module is used to determine the unique identifier as the filename of the target file, and to remove the extension of the target file, so as to obtain an extension-free target file named with the unique identifier.
[0049] Optionally, the file storage module includes:
[0050] An index creation unit is configured to create a file index based on the unique identifier, wherein the file index includes the unique identifier and the target location indicated by the unique identifier;
[0051] A file storage unit is used to store the target file, which has been given the unique identifier, at the target location in the data storage area.
[0052] Optionally, the file storage unit is used to acquire the metadata of the target file; perform standardization processing on the metadata to obtain metadata in the target format; and store the target file with the added unique identifier and the metadata in the target format in the target location of the data storage area.
[0053] Optionally, the device further includes:
[0054] The data query module is configured to, in response to a data query request, extract a unique identifier for the target file to be queried from the data query request; determine the target location indicated by the unique identifier based on the file index; query the target file from the target location in the data storage area; and return a data query response carrying the target file.
[0055] Optionally, the data storage request is sent by the database server in response to the data upload request, and the data upload request carries the user identifier, the target file to be uploaded, and the metadata of the target file;
[0056] The device further includes:
[0057] The data sending module is configured to, after storing the target file with the added unique identifier in the data storage area, return a data storage response carrying the unique identifier to the database server, so as to instruct the database server to store the user identifier, the metadata of the target file and the unique identifier together.
[0058] Optionally, the data sending module is further configured to, if a file containing the unique identifier is found in the data storage area, delete the target file in the data storage request and return the data storage response to the database server.
[0059] Furthermore, to achieve the above objectives, this application also proposes another data storage device configured on a database server, the device comprising:
[0060] The data acquisition module is used to respond to a data upload request from a user terminal and extract the user identifier, the target file to be uploaded, and the metadata of the target file from the data upload request.
[0061] The data sending module is used to send a data storage request carrying the target file to the storage server, so that the storage server generates a unique identifier corresponding to the target file based on the content of the target file; performs a file query in the data storage area based on the unique identifier; if no file containing the unique identifier is found, stores the target file with the added unique identifier in the data storage area; and returns a data storage response carrying the unique identifier.
[0062] A data storage module is configured to, in response to the data storage response, extract the unique identifier from the data storage response and associate and store the user identifier, the metadata of the target file, and the unique identifier together.
[0063] Optionally, the device further includes:
[0064] A data query module is configured to, in response to a data access request from the user terminal, extract the user identifier and metadata of the target file from the data access request; determine a unique identifier associated with the user identifier and the metadata of the target file; send a data query request carrying the unique identifier to the storage server, so that the storage server queries the target file in the data storage area based on the unique identifier and returns a data query response carrying the target file to the database server; and, in response to the data query response, return a data access response to the user terminal, the data access response carrying the target file.
[0065] In addition, to achieve the above objectives, this application also proposes a data storage device, the device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the data storage method described above.
[0066] In addition, to achieve the above objectives, this application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, and which, when executed by a processor, implements the steps of the data storage method described above.
[0067] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a computer program that, when executed by a processor, implements the steps of the data storage method described above.
[0068] One or more technical solutions proposed in this application have at least the following technical effects:
[0069] The data storage scheme provided in this application, upon receiving a data storage request carrying a target file, does not directly store the target file. Instead, it generates a unique identifier for the target file based on its content and uses this unique identifier to perform a file search in the data storage area. If no file containing this unique identifier is found, it means that the data storage area does not store a file with the same content as the target file. In this case, the unique identifier is added to the target file, and then the target file with the added unique identifier is stored in the data storage area. This scheme ensures that files with identical content are stored only once, avoiding the storage of large amounts of redundant data and thus saving storage resources.
Attached Figure Description
[0070] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0071] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0072] Figure 1 is a schematic diagram of an implementation environment for the data storage method of this application;
[0073] Figure 2 is a flowchart illustrating the first embodiment of the data storage method of this application;
[0074] Figure 3 is a schematic diagram of a file storage process provided by this application;
[0075] Figure 4 is a flowchart illustrating the second embodiment of the data storage method of this application;
[0076] Figure 5 is a schematic diagram of the module structure of a data storage device provided in an embodiment of this application;
[0077] Figure 6 is a schematic diagram of the module structure of another data storage device provided in an embodiment of this application;
[0078] Figure 7 is a schematic diagram of the device structure of the hardware operating environment involved in the data storage method in an embodiment of this application.
[0079] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0080] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.
[0081] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.
[0082] Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of this application. Referring to Figure 1, the implementation environment includes a user terminal 101, a database server 102, and a storage server 103, which are connected via a wireless or wired network. For example, a target application served by the database server 102 and the storage server 103 is installed on the user terminal 101, enabling the user terminal 101 to perform functions such as data transmission and message interaction through this target application.
[0083] For example, the user terminal 101 is a computer, mobile phone, tablet computer, or other terminal. For example, the target application is an application built into the operating system of the user terminal 101, or an application provided by a third party. For example, the target application is a search application, short video application, photo application, or the like that provides an online data storage function.
[0084] In this application, user terminal 101 sends a data upload request, which carries a user identifier, a target file to be uploaded, and the target file's metadata. Database server 102 responds to the data upload request from the user terminal by extracting the user identifier, the target file to be uploaded, and the target file's metadata from the data upload request. Then, it sends a data storage request carrying the target file to storage server 103. Storage server 103 generates a unique identifier corresponding to the target file based on the target file's content; performs a file query in the data storage area based on this unique identifier; if no file containing the unique identifier is found, it adds the unique identifier to the target file and stores the target file with the added unique identifier in the data storage area. Then, it returns a data storage response carrying the unique identifier to database server 102. Database server 102 also responds to the data storage response by extracting the unique identifier from the data storage response and storing the user identifier, the target file's metadata, and the unique identifier together.
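The message flow described above can be sketched as follows. This is a minimal in-memory illustration, not the application's implementation: the class names, method names, and dictionary keys (`user_id`, `target_file`, `metadata`, `unique_identifier`) are hypothetical, and MD5 is used only because it is one of the hash operations mentioned later in the description.

```python
import hashlib


class StorageServer:
    """Hypothetical stand-in for storage server 103."""

    def __init__(self):
        # Data storage area, modeled as: unique identifier -> file content.
        self.data_storage_area = {}

    def handle_storage_request(self, request: dict) -> dict:
        content = request["target_file"]
        identifier = hashlib.md5(content).hexdigest()
        # Store the file only if no file with this identifier exists yet.
        if identifier not in self.data_storage_area:
            self.data_storage_area[identifier] = content
        # Either way, the response carries the unique identifier.
        return {"unique_identifier": identifier}


class DatabaseServer:
    """Hypothetical stand-in for database server 102."""

    def __init__(self, storage_server: StorageServer):
        self.storage_server = storage_server
        # Associated records: (user identifier, metadata, unique identifier).
        self.records = []

    def handle_upload_request(self, request: dict) -> str:
        response = self.storage_server.handle_storage_request(
            {"target_file": request["target_file"]}
        )
        identifier = response["unique_identifier"]
        self.records.append((request["user_id"], request["metadata"], identifier))
        return identifier


storage = StorageServer()
database = DatabaseServer(storage)
id_a = database.handle_upload_request(
    {"user_id": "user-A", "target_file": b"movie bytes", "metadata": {"name": "Movie B"}}
)
id_c = database.handle_upload_request(
    {"user_id": "user-C", "target_file": b"movie bytes", "metadata": {"name": "My copy"}}
)
```

In this sketch, two users uploading the same content produce two records on the database server but only one stored file on the storage server.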
[0085] For example, this implementation environment also includes an application server. The application server is connected to the user terminal 101 and the database server 102 via a network. The application server is used to receive data upload requests from the user terminal 101 and forward the data upload requests to the database server 102. For example, the application server is used to authenticate the user identifier in the data upload request, and if the authentication is successful, forward the data upload request to the database server 102.
[0086] For example, this implementation environment includes multiple application servers and a load balancer. The load balancer is connected to the user terminal 101 and the multiple application servers via a network. The load balancer receives data upload requests from the user terminal 101 and forwards each data upload request to an application server, among the multiple application servers, whose current load has not reached the load threshold.
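One way to express the threshold-based forwarding rule is sketched below. The function name, the server names, and the representation of load as a fraction between 0 and 1 are assumptions for illustration. The description only requires that the chosen server be below the load threshold; picking the least-loaded eligible server is an additional assumption of this sketch.

```python
from typing import Dict, Optional


def pick_application_server(current_loads: Dict[str, float],
                            load_threshold: float) -> Optional[str]:
    """Return an application server whose load is below the threshold, if any."""
    # Keep only servers whose current load has not reached the threshold.
    eligible = {name: load for name, load in current_loads.items()
                if load < load_threshold}
    if not eligible:
        return None  # No server can accept the request right now.
    # Assumption: among eligible servers, prefer the least-loaded one.
    return min(eligible, key=eligible.get)


chosen = pick_application_server(
    {"app-1": 0.92, "app-2": 0.40, "app-3": 0.75}, load_threshold=0.80
)
none_available = pick_application_server({"app-1": 0.95}, load_threshold=0.80)
```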
[0087] The data storage method provided in this application is applicable to various scenarios. Optionally, it can be used for storing film and television data. For example, the solution provided in this application can be used to store a target movie uploaded by a user terminal to a storage server. It ensures that for the same movie, only one movie file will be stored on the storage server. Optionally, it can also be used for storing e-books. For example, the solution provided in this application can be used to store a target e-book uploaded by a user terminal to a storage server. It ensures that for the same e-book, only one e-book file will be stored on the storage server.
[0088] Figure 2 is a flowchart of the first embodiment of the data storage method of this application. Referring to Figure 2, and taking the storage server as the executing entity as an example, the data storage method includes the following steps S210 to S240:
[0089] Step S210: In response to the data storage request, extract the target file to be stored from the data storage request.
[0090] A data storage request is an instruction or command issued by a user terminal or other server to a storage server, requesting that certain data be saved to the storage medium. The target file is the specific file specified in the data storage request that needs to be saved. It can be any type of file, such as a text file, image, video, or binary file.
[0091] Storage servers are where user files are actually stored. These servers are typically configured with large amounts of hard disk space and use distributed file systems to manage the data. For example, distributed file systems include OBS (Object-Based Storage).
[0092] After receiving a data storage request, the storage server responds by extracting the target file to be stored from the data storage request. In other words, it separates the target file that needs to be stored from the data storage request so that the target file can be processed later.
[0093] Step S220: Generate a unique identifier corresponding to the target file based on the content of the target file.
[0094] A unique identifier is an identifier used to uniquely identify a target file. It can be numbers, letters, or a combination of them. Because the unique identifier is generated based on the content of the target file rather than the filename or other external attributes, the generated unique identifier will remain consistent even if the filename changes, as long as the content remains the same. If the file content is modified, the unique identifier will also change accordingly.
[0095] Optionally, generating a unique identifier corresponding to the target file based on the content of the target file includes: performing a hash operation on the content of the target file to obtain the hash value of the target file; and determining the hash value as the unique identifier corresponding to the target file.
[0096] Hash operations are a process of mapping data of arbitrary length to a fixed-length value, accomplished through a specific algorithm. For example, a storage server performs an MD5 (Message-Digest Algorithm 5) operation on the content of a target file to obtain a hash value. This hash value is a 128-bit binary number. When this binary number is converted to hexadecimal representation, it consists of 32 characters. Alternatively, the storage server performs a SHA-1 (Secure Hash Algorithm 1) operation on the content of the target file to obtain a hash value. This hash value is a 160-bit binary number. When this binary number is converted to hexadecimal representation, it consists of 40 characters.
[0097] With a given hash algorithm, the same input always produces exactly the same hash value, and different inputs are extremely unlikely to produce the same hash value. Therefore, hashing the content of the target file and using the resulting hash value as the unique identifier of the target file allows this unique identifier to indicate, in practice uniquely, the content of the target file.
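The hash-based identifier can be illustrated with Python's standard `hashlib` module. This is a sketch only; the function name `content_identifier` is hypothetical, and the choice between MD5 and SHA-1 simply mirrors the two examples given above.

```python
import hashlib


def content_identifier(content: bytes, algorithm: str = "md5") -> str:
    """Return the hexadecimal hash of the content, used as the unique identifier."""
    return hashlib.new(algorithm, content).hexdigest()


# Same content gives the same identifier, regardless of any filename.
same_a = content_identifier(b"hello world")
same_b = content_identifier(b"hello world")
# Modified content gives a different identifier.
changed = content_identifier(b"hello world!")

md5_length = len(content_identifier(b"hello world", "md5"))    # 128 bits in hex
sha1_length = len(content_identifier(b"hello world", "sha1"))  # 160 bits in hex
```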
[0098] Step S230: Perform a file query in the data storage area based on the unique identifier. If no file containing the unique identifier is found, add the unique identifier to the target file.
[0099] A data storage area refers to a physical or virtual location used for long-term data storage, such as a hard drive or solid-state drive. This data storage area stores a large number of files, and each file carries a unique identifier that indicates its content.
[0100] For example, given the unique identifier of the target file, the storage server traverses the data storage area to check if a file containing that unique identifier is stored. If no file containing that unique identifier is found, it means the data storage area does not store a file with the same content as the target file. If a file containing that unique identifier is found, it means the data storage area already stores a file with the same content as the target file. Accordingly, if no file containing that unique identifier is found, the target file should be stored in the data storage area. Before storing, the unique identifier must be added to the target file.
[0101] Optionally, adding the unique identifier to the target file includes: using the unique identifier as the filename of the target file and removing the file extension, yielding a target file that is named with the unique identifier and has no extension. Because the target file is named with the unique identifier and has no extension, if files with the same content as the target file exist, the storage server can identify and delete the redundant copies through a simple filename comparison, saving storage space.
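The renaming step can be sketched as follows. The function name and the example paths are hypothetical, and MD5 is assumed as the hash operation. `PurePosixPath.with_name` replaces the whole final path component, so the original name and extension are both dropped.

```python
import hashlib
from pathlib import PurePosixPath


def rename_to_identifier(path: PurePosixPath, content: bytes) -> PurePosixPath:
    """Rename a file to its content identifier, dropping the extension."""
    identifier = hashlib.md5(content).hexdigest()
    # The identifier becomes the entire filename; no extension is kept.
    return path.with_name(identifier)


movie = b"movie bytes"
first = rename_to_identifier(PurePosixPath("uploads/holiday.mp4"), movie)
second = rename_to_identifier(PurePosixPath("uploads/copy_of_holiday.mkv"), movie)
```

Two uploads with different names and extensions but identical content end up with the same stored name, which is what makes duplicate detection a plain filename comparison.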
[0102] Step S240: Store the target file with the added unique identifier in the data storage area.
[0103] After adding the identifier, the next step is to save the target file with the unique identifier to the data storage area. This includes writing the target file itself and its unique identifier to the storage medium so that the unique identifier can be used to quickly determine whether there are duplicate files in the future.
[0104] As can be understood from the above process, if the content of a file is being uploaded for the first time, meaning it has never been stored before, a new unique identifier is generated for the file and saved along with it. If a file with the same identifier already exists, a file with the same content has already been stored, and the file is not stored again. This avoids storing redundant data, reduces the storage capacity consumed, and lowers storage costs.
[0105] The data storage scheme provided in this application, upon receiving a data storage request carrying a target file, does not directly store the target file. Instead, it generates a unique identifier for the target file based on its content and uses this unique identifier to perform a file search in the data storage area. If no file containing this unique identifier is found, it means that the data storage area does not store a file with the same content as the target file. In this case, the unique identifier is added to the target file, and then the target file with the added unique identifier is stored in the data storage area. This scheme ensures that files with identical content are stored only once, avoiding the storage of large amounts of redundant data and thus saving storage resources.
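Steps S210 to S240 can be sketched end to end as below. This is an illustrative sketch under stated assumptions: the data storage area is modeled as a local directory, the identifier is a SHA-1 hex digest used directly as the filename, and the function name `store_file` is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def store_file(storage_area: Path, content: bytes) -> Tuple[str, bool]:
    """Store content at most once; return (identifier, newly_stored)."""
    # S220: generate the unique identifier from the file content.
    identifier = hashlib.sha1(content).hexdigest()
    # The identifier is the filename, with no extension.
    target = storage_area / identifier
    # S230: query the data storage area by identifier.
    if target.exists():
        return identifier, False  # Same content already stored; skip.
    # S240: store the file under its identifier.
    target.write_bytes(content)
    return identifier, True


with tempfile.TemporaryDirectory() as tmp:
    area = Path(tmp)
    first = store_file(area, b"movie B")
    second = store_file(area, b"movie B")    # duplicate content
    other = store_file(area, b"e-book C")    # different content
    stored_count = len(list(area.iterdir()))
```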
[0106] Optionally, storing the target file with the added unique identifier in the data storage area includes: creating a file index based on the unique identifier, the file index containing the unique identifier and the target location indicated by the unique identifier; and storing the target file with the added unique identifier at the target location in the data storage area.
[0107] A file index is a data structure that contains key information about a file to accelerate file retrieval and management. In this application, the file index records the unique identifier of each file and the target location indicated by each identifier, i.e., the specific storage location of the file. This makes it possible to locate a file quickly without traversing the entire data storage area. The target location refers to the specific storage location or path of the file within the data storage area. It can be a directory on a physical disk, a bucket in a cloud storage service, or a node in a distributed file system. The target location determines where the file will be stored and is determined by entries in the file index.
[0108] This application embodiment establishes a file index containing unique identifiers and their corresponding locations, so that the query operation only needs to access the index structure instead of traversing the entire data storage area, thus enabling fast file searching.
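A file index of this kind can be sketched as a mapping from unique identifier to target location. The class name, the method names, and the `bucket-01/...` location string are hypothetical; the application does not prescribe a particular index structure.

```python
from typing import Dict, Optional


class FileIndex:
    """Maps each unique identifier to the target location of its file."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def add(self, unique_identifier: str, target_location: str) -> None:
        # Keep the first location recorded for an identifier.
        self._entries.setdefault(unique_identifier, target_location)

    def locate(self, unique_identifier: str) -> Optional[str]:
        # A dictionary lookup; the storage area itself is never traversed.
        return self._entries.get(unique_identifier)


index = FileIndex()
example_id = "5d41402abc4b2a76b9719d911017c592"  # example 32-character identifier
index.add(example_id, "bucket-01/" + example_id)
found = index.locate(example_id)
missing = index.locate("0" * 32)
```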
[0109] Optionally, storing the target file with the added unique identifier at a target location in the data storage area includes: obtaining the metadata of the target file; standardizing the metadata to obtain metadata in the target format; and storing the target file with the added unique identifier and the metadata in the target format at the target location in the data storage area.
[0110] Metadata refers to data that describes file attributes, providing additional information about the file. Examples include filename, creation time, modification time, file size, file type, author, and copyright information. Metadata helps in better managing and understanding file content. Standardization refers to the process of converting raw metadata into a format that conforms to a specific standard. For example, standardizing field names, data types, units, and encoding methods ensures that metadata from different sources or formats can be consistently parsed and used. Target format metadata is metadata organized according to a predetermined format after standardization. The target format is a structured representation agreed upon internally by the system or that follows industry standards. Target format metadata facilitates the exchange and sharing of information between different systems.
[0111] In this embodiment, by standardizing the metadata, the consistency and normalization of metadata fields are ensured, facilitating subsequent parsing and use. This also helps improve the readability and usability of metadata, making it easier for users and the system to understand and manipulate this information. Furthermore, by storing the target format metadata together with the target file, which has been given a unique identifier, at the target location in the data storage area, subsequent searches based on the file index can not only quickly locate the target file but also quickly locate its metadata, improving data query efficiency.
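Metadata standardization might look like the following sketch. The target format chosen here (keys `name`, `size_bytes`, `created_at`, with the time as an ISO 8601 UTC string) and the source field names (`FileName`, `filename`, `size_kb`, `created`) are assumptions for illustration; the application leaves the target format to the system or to an industry standard.

```python
from datetime import datetime, timezone


def standardize_metadata(raw: dict) -> dict:
    """Convert metadata from differing sources into one target format."""
    # Unify field names.
    name = raw.get("name") or raw.get("FileName") or raw.get("filename")
    # Unify units: always report size in bytes.
    if "size_bytes" in raw:
        size_bytes = int(raw["size_bytes"])
    else:
        size_bytes = int(raw["size_kb"]) * 1024
    # Unify time encoding: Unix timestamps become ISO 8601 strings in UTC.
    created = raw.get("created_at", raw.get("created"))
    if isinstance(created, (int, float)):
        created = datetime.fromtimestamp(created, tz=timezone.utc).isoformat()
    return {"name": name, "size_bytes": size_bytes, "created_at": created}


from_source_a = standardize_metadata(
    {"FileName": "Movie B.mp4", "size_kb": 2, "created": 0}
)
from_source_b = standardize_metadata(
    {"name": "Movie B.mp4", "size_bytes": 2048,
     "created_at": "1970-01-01T00:00:00+00:00"}
)
```

Both inputs describe the same file in different conventions and normalize to the same dictionary.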
[0112] After the target file is stored in the data storage area, the storage server can provide a target file query service. Accordingly, in response to a data query request, the storage server extracts the unique identifier of the target file from the query request; determines the target location indicated by the unique identifier based on the file index; queries the target file from the target location in the data storage area; and then returns a data query response carrying the target file.
[0113] A data query request is a message sent from a user terminal or other server to the storage server to request a specific file. This request includes filtering criteria such as a unique identifier for the file to be queried, so that the storage server can accurately locate and return the required file.
[0114] In this application, the file index structure includes a unique identifier for each file and the file storage location indicated by that unique identifier. The storage server first locates the unique identifier of the target file from the file index structure, determines the location indicated by that unique identifier, and then retrieves the target file from that location. This data query method only requires accessing the file index structure and does not require traversing the entire data storage area, enabling fast searching of the target file.
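The query path can be sketched as a request handler that consults the file index first and only then reads from the data storage area. The request and response keys (`unique_identifier`, `status`, `target_file`), the location string, and the not-found response are hypothetical; the storage area is modeled as a dictionary keyed by location.

```python
def handle_query_request(request: dict, file_index: dict, storage_area: dict) -> dict:
    """Answer a data query request using the file index."""
    # Extract the unique identifier of the file to be queried.
    identifier = request["unique_identifier"]
    # Determine the target location from the index, without scanning storage.
    location = file_index.get(identifier)
    if location is None:
        return {"status": "not_found"}  # Assumption: how a miss is reported.
    # Read the file from its target location and return it in the response.
    return {"status": "ok", "target_file": storage_area[location]}


file_index = {"abc123": "disk-1/abc123"}
storage_area = {"disk-1/abc123": b"movie bytes"}
hit = handle_query_request({"unique_identifier": "abc123"}, file_index, storage_area)
miss = handle_query_request({"unique_identifier": "zzz999"}, file_index, storage_area)
```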
[0115] Optionally, the storage server is configured with a security component to prevent unauthorized access and protect the system from attacks by encrypting data. For example, during the query service process, after retrieving the target file corresponding to the unique identifier from the data storage area, the storage server uses this security component to regenerate the unique identifier based on the content of the target file. The newly generated unique identifier is compared with the original unique identifier; if they match, it indicates that the content of the target file has not been tampered with. In this case, a data query response carrying the target file is returned. This scheme ensures the authenticity and reliability of the returned target file content.
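The integrity check performed during the query service can be sketched as below. SHA-256 is assumed as the hash function, the dictionary `stored` is a stand-in for the data storage area, and raising an error on mismatch is an assumption, since the paragraph above only specifies the behavior when the identifiers match.

```python
import hashlib
from typing import Optional


def verified_read(uid: str, stored: dict) -> Optional[bytes]:
    """Return the file for uid only if its content still hashes to uid."""
    content = stored.get(uid)
    if content is None:
        return None
    # Regenerate the identifier from the stored content and compare.
    regenerated = hashlib.sha256(content).hexdigest()
    if regenerated != uid:
        # Assumed handling: refuse to return content that may have been tampered with.
        raise ValueError("stored content does not match its unique identifier")
    return content
```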
[0116] In one possible implementation, a data storage request is sent by the database server in response to a data upload request. This data upload request carries the user identifier, the target file to be uploaded, and the target file's metadata. Correspondingly, after storing the target file with the added unique identifier in the data storage area, the storage server returns a data storage response carrying the unique identifier to the database server, instructing the database server to associate and store the user identifier, the target file's metadata, and the unique identifier.
[0117] The database server stores user information and file metadata. It can use a relational database management system, such as MySQL. In this embodiment, the database server stores structured information at the user level. The storage server, acting as the underlying layer of the cloud storage system, stores file-level data, such as documents, images, videos, and log files. This data typically does not undergo complex structured processing. For example, after user A uploads movie B, the database server records user A's information and the metadata of movie B. This includes, for instance, the name user A gave to movie B and the upload time. The storage server stores the file "movie B," i.e., the actual content of movie B.
[0118] In this embodiment, after the storage server stores the target file with a unique identifier added in the data storage area, it returns a data storage response carrying the unique identifier to the database server. The database server then knows that the target file has been successfully stored and subsequently stores the user identifier, the target file's metadata, and the unique identifier together. Thus, when the database server needs to retrieve the content of the target file later, it only needs to provide the storage server with the unique identifier associated with the target file's metadata to obtain the target file from the storage server.
[0119] The above embodiments describe a data storage scheme when no file containing the unique identifier of the target file is found in the data storage area. Specifically, if it is confirmed that no file with the same content as the target file is stored in the data storage area, the target file is stored in the data storage area, and a data storage response is provided. The following describes the processing flow when a file containing the unique identifier is found in the data storage area.
[0120] Optionally, if a file containing the unique identifier is found in the data storage area, the target file in the data storage request is deleted, and a data storage response carrying the unique identifier is returned to the database server. If a file containing the unique identifier is found, it means that the data storage area already stores a file with the same content as the target file. In this case, deleting the target file in the data storage request without storing it avoids storing redundant data. Furthermore, returning a data storage response carrying the unique identifier to the database server informs the database server that the target file has been stored, allowing the database server to associate and store the user identifier, the target file's metadata, and the unique identifier. Thus, when the database server subsequently needs to retrieve the content of the target file, it only needs to provide the storage server with the unique identifier associated with the target file's metadata to retrieve the file with the same content as the target file.
[0121] For example, if a reference file containing a unique identifier for the target file is found in the data storage area, a unique identifier is regenerated based on the content of the reference file using a security component. The newly generated unique identifier is compared with the unique identifier of the target file. If they match, it indicates that the content of the stored reference file has not been tampered with. In this case, the target file in the data storage request is deleted, and a data storage response carrying the unique identifier is returned to the database server. If, after comparing the newly generated unique identifier with the unique identifier of the target file, it is determined that the newly generated unique identifier does not match the unique identifier of the target file, it indicates that the content of the stored reference file has been tampered with, and therefore the content of the reference file is inconsistent with the content of the target file. In this case, the reference file is deleted, and the target file, after adding its corresponding unique identifier, is stored in the data storage area. Since this application stores only one copy of files with identical content, by deleting newly uploaded target files only when it is determined that the content of the stored files has not been tampered with, the authenticity of the stored file content can be guaranteed, improving the reliability of file management.
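The two branches described above (no matching file found, or a reference file found and then verified) can be sketched in a few lines. SHA-256 is assumed as the hash function, and the dictionary `storage_area`, keyed by unique identifier, is a hypothetical stand-in for the data storage area.

```python
import hashlib


def handle_storage_request(target: bytes, storage_area: dict) -> str:
    """Store target unless an untampered file with the same content already exists."""
    uid = hashlib.sha256(target).hexdigest()
    reference = storage_area.get(uid)
    if reference is not None:
        # A reference file carries this identifier: verify it before trusting it.
        if hashlib.sha256(reference).hexdigest() == uid:
            # Content intact: discard the uploaded copy and reuse the stored one.
            return uid
        # Content was tampered with: delete the reference file.
        del storage_area[uid]
    # No valid copy exists, so store the target file under its identifier.
    storage_area[uid] = target
    return uid
```

In both branches the caller receives the same unique identifier, so the database server does not need to know whether the upload was actually written or deduplicated.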
[0122] Figure 3 is a diagram illustrating a file storage process. Referring to Figure 3, after receiving a data storage request from the database server, the storage server extracts the target file from the request. It then performs file transformation, converting the target file into a uniquely named file without an extension. Next, it normalizes the file metadata. It then performs file deduplication, checking whether a file with content identical to the target file is already stored; if so, the target file is deleted; otherwise, a file index is created. Finally, it stores the file, including both the target file and its metadata.
[0123] Based on the first embodiment described above, a second embodiment of this application is proposed. For content that is the same as or similar to the first embodiment, refer to the above description; it will not be repeated here. Referring to Figure 4, taking a database server as the executing entity as an example, in the second embodiment the data storage method includes the following steps S410 to S430:
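The storage process of Figure 3 can be sketched against a local directory. Everything concrete here is an assumption for illustration: SHA-256 as the hash, the `files/` and `metadata/` directory layout, the `index.json` file acting as the file index, and lowercase field names as the normalized metadata format.

```python
import hashlib
import json
from pathlib import Path
from typing import Tuple


def store_file(content: bytes, metadata: dict, root: Path) -> Tuple[str, bool]:
    """Return (unique identifier, True if a new file was written)."""
    # File transformation: the stored name is the content hash, with no extension.
    uid = hashlib.sha256(content).hexdigest()
    # Metadata normalization (hypothetical target format: lowercase field names).
    normalized = {key.strip().lower(): value for key, value in metadata.items()}
    # File deduplication: a file with this name already holds the same content.
    target = root / "files" / uid
    if target.exists():
        return uid, False
    # File storage: write the target file and its metadata.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    meta_dir = root / "metadata"
    meta_dir.mkdir(parents=True, exist_ok=True)
    (meta_dir / uid).write_text(json.dumps(normalized, sort_keys=True))
    # File index creation: record the location indicated by the identifier.
    index_path = root / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    index[uid] = target.relative_to(root).as_posix()
    index_path.write_text(json.dumps(index))
    return uid, True
```

A second call with the same content returns the same identifier and writes nothing, which corresponds to the "delete the target file" branch of the deduplication step.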
[0124] Step S410: In response to a data upload request from the user terminal, extract the user identifier, the target file to be uploaded, and the metadata of the target file from the data upload request.
[0125] A data upload request is a message sent from a user terminal to a database server, with the purpose of transferring a target file to the server for storage. This data upload request carries a user identifier, the target file to be uploaded, and the target file's metadata. Correspondingly, upon receiving the data upload request, the database server, in response, extracts the user identifier, the target file, and the target file's metadata from the request, stores the user identifier and the target file's metadata together, and sends a data storage request carrying the target file to the storage server to store the target file there.
[0126] Step S420: Send a data storage request carrying the target file to the storage server so that the storage server generates a unique identifier corresponding to the target file based on the content of the target file, performs a file query in the data storage area based on the unique identifier, and if no file containing the unique identifier is found, stores the target file with the added unique identifier in the data storage area and returns a data storage response carrying the unique identifier.
[0127] Please refer to the implementation example above for how this step is implemented; it will not be repeated here.
[0128] Step S430: In response to the data storage response, extract the unique identifier from the data storage response and associate the user identifier, the metadata of the target file, and the unique identifier for storage.
[0129] After receiving the data storage response from the storage server, the database server determines that the storage server has successfully stored the target file. In this case, it extracts the unique identifier of the target file from the data storage response, and then associates and stores this unique identifier with the user identifier and the target file's metadata for subsequent file queries.
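Steps S410 to S430 can be sketched from the database server's side as follows. This is an illustrative sketch only: `sqlite3` stands in for the relational database, `StorageServerStub` is a hypothetical in-memory stand-in for the storage server, SHA-256 is an assumed hash function, and the table and column names are invented for the example.

```python
import hashlib
import sqlite3


class StorageServerStub:
    """Hypothetical stand-in: stores each distinct content once, returns its identifier."""

    def __init__(self) -> None:
        self.files = {}

    def handle_storage_request(self, content: bytes) -> str:
        uid = hashlib.sha256(content).hexdigest()
        self.files.setdefault(uid, content)
        return uid


def open_database() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE user_files (user_id TEXT, file_name TEXT, unique_id TEXT)")
    return db


def handle_upload(db, storage, user_id: str, content: bytes, file_name: str) -> str:
    # S410: the user identifier, target file, and metadata arrive as arguments here.
    # S420: send the target file to the storage server and await the identifier.
    uid = storage.handle_storage_request(content)
    # S430: associate and store the user identifier, metadata, and unique identifier.
    db.execute(
        "INSERT INTO user_files (user_id, file_name, unique_id) VALUES (?, ?, ?)",
        (user_id, file_name, uid),
    )
    db.commit()
    return uid
```

If two users upload identical content under different names, the table gains two rows that share one `unique_id`, while the storage stub holds a single copy of the content.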
[0130] Optionally, in response to a data access request from a user terminal, the database server extracts the user identifier and metadata of the target file from the data access request; determines a unique identifier associated with the user identifier and the metadata of the target file; sends a data query request carrying the unique identifier to the storage server, enabling the storage server to query the target file in the data storage area based on the unique identifier, and returns a data query response carrying the target file to the database server. Then, in response to the data query response, the database server returns a data access response to the user terminal, the data access response carrying the target file.
[0131] A data access request is a request sent by the user terminal to the database server to access file data stored on the database server or storage server. This data access request carries the user identifier and metadata of the target file, such as the target file's name. Upon receiving this data access request, the database server determines which file is the target file, and then retrieves it from the storage server by sending a data query request carrying the target file's unique identifier. Finally, it sends the target file to the user terminal in a data access response.
[0132] Optionally, the data access request is sent from the user terminal to the application server, which then forwards it to the database server. Data access requests can be of various types, such as requests to download a target file, requests to modify the name of a target file, and requests to delete a target file. After receiving the data access request forwarded by the application server, the database server determines the next processing operation based on the type of the data access request. For example, if the data access request is for downloading a target file, the database server sends a data query request to the storage server to retrieve the target file. As another example, if the data access request is for modifying the name of a target file, the database server directly modifies the name in the locally stored metadata of the target file, without sending a data query request to the storage server. As yet another example, if the data access request is for deleting a target file, the database server simply deletes the locally stored metadata and unique identifier of the target file associated with the user's identifier, without sending a data query request to the storage server.
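The handling of the three request types can be sketched as a simple dispatch. The class name `DatabaseServer`, the action strings, and the in-memory dictionaries are hypothetical; `storage_files` stands in for the storage server, and `storage_queries` exists only to show which request types reach it.

```python
from typing import Optional


class DatabaseServer:
    """Hypothetical sketch of request-type dispatch on the database server."""

    def __init__(self, storage_files: dict) -> None:
        self.storage_files = storage_files  # stand-in for the storage server
        self.records = {}                   # (user_id, file_name) -> unique identifier
        self.storage_queries = 0            # counts data query requests sent

    def handle_access(self, user_id: str, file_name: str, action: str,
                      new_name: Optional[str] = None) -> Optional[bytes]:
        key = (user_id, file_name)
        uid = self.records[key]
        if action == "download":
            # Only downloads need the file content from the storage server.
            self.storage_queries += 1
            return self.storage_files[uid]
        if action == "rename":
            # Renaming touches only the locally stored metadata.
            self.records[(user_id, new_name)] = self.records.pop(key)
            return None
        if action == "delete":
            # Deleting removes only this user's association with the file.
            del self.records[key]
            return None
        raise ValueError(f"unsupported action: {action}")
```

Note that a delete in this sketch leaves the stored content in place, since other users may still reference the same unique identifier.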
[0133] Understandably, the database server stores user-level information. For the same file in the storage server, the metadata associated with that file by different user identifiers in the database server may be different, but they are all linked to the same file in the storage server through the same identifier. In this way, for files with different metadata uploaded by different users but duplicate content, the database server stores the metadata for each user, while the storage server stores the actual file content, thus achieving a reduction in storage capacity.
[0134] The data storage scheme provided in this application involves a storage server receiving a data storage request carrying a target file. Instead of directly storing the target file, it generates a unique identifier for the target file based on its content and then searches the data storage area using this unique identifier. If no file containing the unique identifier is found, it means the data storage area does not store a file with the same content as the target file. In this case, the unique identifier is added to the target file, and then the target file with the added unique identifier is stored in the data storage area. This scheme ensures that files with identical content are stored only once, avoiding the storage of large amounts of redundant data and thus saving storage resources.
[0135] This application also provides a data storage device configured on a storage server. Referring to Figure 5, the data storage device includes:
[0136] The file acquisition module 510 is used to extract the target file to be stored from the data storage request in response to the data storage request;
[0137] The identifier generation module 520 is used to generate a unique identifier corresponding to the target file based on the content of the target file;
[0138] The identifier addition module 530 is used to perform a file query in the data storage area based on the unique identifier, and to add the unique identifier to the target file if no file containing the unique identifier is found;
[0139] File storage module 540 is used to store target files with added unique identifiers in the data storage area.
[0140] Optionally, the identifier generation module 520 is used to perform a hash operation on the content of the target file to obtain the hash value of the target file; and to determine the hash value as a unique identifier corresponding to the target file.
[0141] Optionally, the identifier addition module 530 is used to determine the unique identifier as the filename of the target file and remove the extension of the target file to obtain a target file with the unique identifier but no extension.
[0142] Optionally, the file storage module 540 includes:
[0143] The index creation unit is used to create a file index based on a unique identifier. The file index contains the unique identifier and the target location indicated by the unique identifier.
[0144] The file storage unit is used to store target files with unique identifiers at the target location in the data storage area.
[0145] Optionally, the file storage unit is used to obtain the metadata of the target file; to standardize the metadata to obtain metadata in the target format; and to store the target file with the added unique identifier and the metadata in the target format in the target location of the data storage area.
[0146] Optionally, the device further includes:
[0147] The data query module is used to respond to data query requests by extracting the unique identifier of the target file to be queried from the data query request; determining the target location indicated by the unique identifier based on the file index; querying the target file from the target location in the data storage area; and returning a data query response, which carries the target file.
[0148] Optionally, the data storage request is sent by the database server in response to the data upload request. The data upload request carries the user identifier, the target file to be uploaded, and the metadata of the target file.
[0149] The device also includes:
[0150] The data sending module is used to return a data storage response carrying the unique identifier to the database server after storing the target file with the added unique identifier in the data storage area, so as to instruct the database server to store the user identifier, the metadata of the target file and the unique identifier together.
[0151] Optionally, the data sending module is also used to delete the target file in the data storage request and return a data storage response carrying the unique identifier to the database server if a file containing the unique identifier is found in the data storage area.
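The optional behaviors of the identifier generation module 520 and the identifier addition module 530 can be sketched as two small functions. SHA-256 and the chunked read are assumptions (the chunked read is a common way to hash large files without loading them fully into memory), and the function names are hypothetical.

```python
import hashlib
import io
from pathlib import PurePosixPath
from typing import BinaryIO


def generate_identifier(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Module 520 (sketch): hash the file content and use the hash as the identifier."""
    digest = hashlib.sha256()  # assumed hash function
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def add_identifier(original_name: str, uid: str) -> str:
    """Module 530 (sketch): the identifier becomes the filename; the extension is dropped."""
    return str(PurePosixPath(original_name).with_name(uid))


# Usage: the stored name depends only on the content, not on the uploaded name.
uid = generate_identifier(io.BytesIO(b"movie B"))
print(add_identifier("uploads/movie_b.mp4", uid))
```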
[0152] This application also provides a data storage device configured on a database server. Referring to Figure 6, the data storage device includes:
[0153] The data acquisition module 610 is used to respond to a data upload request from a user terminal and extract the user identifier, the target file to be uploaded, and the metadata of the target file from the data upload request.
[0154] The data sending module 620 is used to send a data storage request carrying a target file to the storage server, so that the storage server generates a unique identifier corresponding to the target file based on the content of the target file; performs a file query in the data storage area based on the unique identifier; if no file containing the unique identifier is found, stores the target file with the added unique identifier in the data storage area, and returns a data storage response carrying the unique identifier.
[0155] The data storage module 630 is used to extract a unique identifier from the data storage response in response to the data storage response, and to associate and store the user identifier, the metadata of the target file and the unique identifier.
[0156] Optionally, the device further includes:
[0157] The data query module is used to respond to data access requests from user terminals, extract user identifiers and target file metadata from the data access requests; determine a unique identifier associated with the user identifiers and target file metadata; send a data query request carrying the unique identifier to the storage server, so that the storage server can query the target file in the data storage area based on the unique identifier, and return a data query response carrying the target file to the database server; and return a data access response carrying the target file to the user terminal in response to the data query response.
[0158] The data storage device provided in this application, employing the data storage method in the above embodiments, can solve the technical problem in related technologies where servers store large amounts of redundant data, wasting storage resources. Compared with the prior art, the beneficial effects of the data storage device provided in this application are the same as those of the data storage method provided in the above embodiments, and other technical features in the data storage device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.
[0159] This application provides a data storage device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the data storage method in Embodiment 1 above.
[0160] Figure 7 illustrates a structural schematic of a data storage device suitable for implementing embodiments of this application. The data storage device in these embodiments may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The data storage device shown in Figure 7 is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of this application.
[0161] As shown in Figure 7, the data storage device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the data storage device. The processing unit 1001, ROM 1002, and RAM 1004 are interconnected via a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. Typically, the following devices can be connected to the I/O interface 1006: input devices 1007 including, for example, touchscreens, touchpads, keyboards, mice, image sensors, microphones, accelerometers, gyroscopes, etc.; output devices 1008 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 1003 including, for example, magnetic tapes, hard disks, etc.; and communication devices 1009. The communication device 1009 allows the data storage device to communicate with other devices, wirelessly or by wire, to exchange data. Although the figure shows a data storage device with various devices, it should be understood that it is not required to implement or have all of the devices shown. More or fewer devices may be implemented alternatively.
[0162] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 1009, or installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing unit 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.
[0163] The data storage device provided in this application, employing the data storage method described in the above embodiments, can solve the technical problem in related technologies where servers store large amounts of redundant data, wasting storage resources. Compared with the prior art, the beneficial effects of the data storage device provided in this application are the same as those of the data storage method provided in the above embodiments, and other technical features of this data storage device are the same as those disclosed in the previous embodiment method, and will not be repeated here.
[0164] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.
[0165] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0166] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the data storage method in the above embodiments.
[0167] The computer-readable storage medium provided in this application may be, for example, a USB flash drive, or, without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.
[0168] The aforementioned computer-readable storage medium may be included in a data storage device, or it may exist independently without being assembled into a data storage device.
[0169] The aforementioned computer-readable storage medium carries one or more programs that, when executed by the data storage device, cause the data storage device to: in response to a data storage request, extract a target file to be stored from the data storage request; generate a unique identifier corresponding to the target file based on the content of the target file; perform a file query in the data storage area based on the unique identifier and, if no file containing the unique identifier is found, add the unique identifier to the target file; and store the target file with the added unique identifier in the data storage area.
[0170] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including a Local Area Network (LAN) or a Wide Area Network (WAN)—or can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0171] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0172] The modules described in the embodiments of this application can be implemented in software or hardware. The names of the modules do not necessarily limit the functionality of the modules themselves.
[0173] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described data storage method. This solves the technical problem in related technologies where servers store large amounts of redundant data, wasting storage resources. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as those of the data storage method provided in the above embodiments, and will not be repeated here.
[0174] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the data storage method described above.
[0175] The computer program product provided in this application can solve the technical problem in related technologies of servers storing large amounts of redundant data, resulting in wasted storage resources. Compared with the prior art, the beneficial effects of the computer program product provided in this application are the same as those of the data storage method provided in the above embodiments, and will not be repeated here.
[0176] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.
Claims
1. A data storage method, characterized in that the method is performed by a storage server and includes: in response to a data storage request, extracting the target file to be stored from the data storage request; generating a unique identifier corresponding to the target file based on the content of the target file; performing a file query in the data storage area based on the unique identifier and, if no file containing the unique identifier is found, adding the unique identifier to the target file; and storing the target file, with the unique identifier added, in the data storage area.
2. The method as described in claim 1, characterized in that the step of generating a unique identifier corresponding to the target file based on the content of the target file includes: performing a hash operation on the content of the target file to obtain the hash value of the target file; and determining the hash value as the unique identifier corresponding to the target file.
3. The method as described in claim 1, characterized in that adding the unique identifier to the target file includes: determining the unique identifier as the filename of the target file, and removing the extension of the target file to obtain the target file with the unique identifier but no extension.
4. The method as described in claim 1, characterized in that storing the target file with the added unique identifier in the data storage area includes: creating a file index based on the unique identifier, the file index containing the unique identifier and the target location indicated by the unique identifier; and storing the target file, with the unique identifier added, at the target location in the data storage area.
5. A data storage method, characterized in that, The method, executed by the database server, includes: In response to a data upload request from a user terminal, the user identifier, the target file to be uploaded, and the metadata of the target file are extracted from the data upload request; A data storage request carrying the target file is sent to the storage server, so that the storage server generates a unique identifier corresponding to the target file based on the content of the target file; a file query is performed in the data storage area based on the unique identifier; if no file containing the unique identifier is found, the target file with the added unique identifier is stored in the data storage area, and a data storage response carrying the unique identifier is returned. In response to the data storage response, the unique identifier is extracted from the data storage response, and the user identifier, the metadata of the target file, and the unique identifier are stored together.
6. A data storage device, characterized in that the device is configured on a storage server and includes: a file acquisition module, used to extract, in response to a data storage request, the target file to be stored from the data storage request; an identifier generation module, used to generate a unique identifier corresponding to the target file based on the content of the target file; an identifier addition module, used to perform a file query in the data storage area based on the unique identifier, and to add the unique identifier to the target file if no file containing the unique identifier is found; and a file storage module, used to store the target file with the added unique identifier in the data storage area.
7. A data storage device, characterized in that the device is configured on a database server and includes: a data acquisition module, used to extract, in response to a data upload request from a user terminal, the user identifier, the target file to be uploaded, and the metadata of the target file from the data upload request; a data sending module, used to send a data storage request carrying the target file to the storage server, so that the storage server generates a unique identifier corresponding to the target file based on the content of the target file; performs a file query in the data storage area based on the unique identifier; if no file containing the unique identifier is found, stores the target file with the added unique identifier in the data storage area; and returns a data storage response carrying the unique identifier; and a data storage module, used to extract, in response to the data storage response, the unique identifier from the data storage response, and to store the user identifier, the metadata of the target file, and the unique identifier in association with one another.
8. A data storage device, characterized in that the device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the data storage method as described in any one of claims 1 to 5.
9. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the data storage method as described in any one of claims 1 to 5.
10. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the steps of the data storage method as described in any one of claims 1 to 5.