Storage file filtering method and apparatus
A method for filtering files on import, applied in the field of data processing. It addresses problems such as load failures and low traversal efficiency, for which no effective solution had been proposed, and reduces abnormal situations without reducing storage efficiency.
Examples
Embodiment 1
[0029] Embodiment 1 of the present invention provides a filtering method for storage files, implemented with the data warehouse tool Hive. Referring to the schematic flow chart of the filtering method for storage files shown in Figure 1, the method includes the following steps:
[0030] Step S11, using the data warehouse tool Hive to obtain the check code of the current storage file.
[0031] The method provided in this embodiment applies when Hive imports data, where the current storage file is the source file to be imported. The check code is a value calculated from each file that corresponds uniquely to that file. For example, the check code may be obtained with a checksum or another algorithm such as a hash function, which makes it possible to tell whether the content of a file is the same as that of other files (regardless of whether the two file names are the same).
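As an illustration of a check code that depends only on file content, the following is a minimal Python sketch. It is not the Hive mechanism described in this embodiment: the helper names `check_code` and `demo`, the choice of SHA-256, and the sample file names are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def check_code(path, chunk_size=1 << 20):
    """Hypothetical helper: return a hex digest computed from the file's bytes only."""
    digest = hashlib.sha256()  # SHA-256 is an assumed stand-in for "a hash function"
    with open(path, "rb") as f:
        # Read in chunks so large source files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Write three sample files and return their check codes keyed by file name."""
    contents = {
        "a.csv": b"id,name\n1,x\n",
        "copy_of_a.csv": b"id,name\n1,x\n",  # same content, different name
        "b.csv": b"id,name\n2,y\n",          # different content
    }
    codes = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, data in contents.items():
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(data)
            codes[name] = check_code(path)
    return codes
```

In this sketch, `a.csv` and `copy_of_a.csv` receive the same check code because the file name never enters the digest, while `b.csv` receives a different one.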
[0032] Step S12, searching whether there is a check code consistent with the check code of the cur...
Embodiment 2
[0047] Embodiment 2 of the present invention provides a method for filtering files stored in a database, taking a checksum as the check code for illustration. Referring to the schematic flow chart of the filtering method for storage files shown in Figure 2, the method includes the following steps:
[0048] Step S21, obtaining the checksum of the current storage file through the Hive CheckSum function of the data warehouse tool.
[0049] First, the program uses Hive to execute the load operation and obtains the checksum of the current storage file (that is, the source file) through the Hive CheckSum function. The checksum must be calculated before the source file is stored in the database. The calculation process itself belongs to the prior art and is not repeated here.
[0050] Step S22, using the checksum of the current storage file as the check code of the current storage file.
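Steps S21 and S22 can be sketched in Python under stated assumptions. The text does not say which checksum algorithm the Hive CheckSum function uses, so CRC-32 from the standard `zlib` module is used here purely as a stand-in, and `crc32_check_code` is a hypothetical name.

```python
import zlib


def crc32_check_code(chunks):
    """Hypothetical helper: fold byte chunks into one CRC-32, formatted as 8 hex digits.

    CRC-32 is an assumed stand-in; it is not claimed to be what Hive computes.
    """
    crc = 0
    for chunk in chunks:
        # Passing the previous value continues the running checksum.
        crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")
```

Because the checksum is updated chunk by chunk, the result is the same whether the source file is read in one piece or streamed in several, so the value can serve directly as the check code in step S22.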
[0051] In step S23,...
Embodiment 3
[0069] Embodiment 3 of the present invention provides a filtering device for storage files, implemented with the data warehouse tool Hive. Referring to the schematic structural diagram shown in Figure 4, the device includes a check code acquisition module 410, a check code search module 420, a discarding module 430 and an import module 440, wherein the functions of each module are as follows:
[0070] The check code acquisition module 410 is used to obtain the check code of the current storage file using the data warehouse tool Hive;
[0071] The check code search module 420 is used to search the pre-stored check codes in the target directory for a check code consistent with the check code of the current storage file; the pre-stored check codes are the check codes of the files already imported into the target directory;
[0072] The discarding module 430 is used to discard the current storage file if such a check code is found;
[0073] The import module 440 is configure...
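The module structure of this embodiment can be sketched in Python as follows. This is an illustration, not the device itself: the class `StorageFileFilter`, its method names, the in-memory set of check codes, and the use of SHA-256 and `shutil.copy` are all assumptions. Because the description of the import module is cut off in the text above, the import behavior shown here (copy the file when no matching check code is found, then record its check code) is also an assumption.

```python
import hashlib
import os
import shutil
import tempfile


class StorageFileFilter:
    """Hypothetical sketch of modules 410-440 as methods of one class."""

    def __init__(self, target_dir):
        self.target_dir = target_dir
        self.stored_codes = set()  # check codes of files already imported

    def acquire_check_code(self, path):
        # Stands in for module 410; SHA-256 is an assumed algorithm.
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def has_check_code(self, code):
        # Stands in for module 420: look up the pre-stored check codes.
        return code in self.stored_codes

    def process(self, path):
        code = self.acquire_check_code(path)
        if self.has_check_code(code):
            return "discarded"  # module 430: a matching check code exists
        # Assumed behavior of module 440: import the file and record its code.
        shutil.copy(path, os.path.join(self.target_dir, os.path.basename(path)))
        self.stored_codes.add(code)
        return "imported"


def demo():
    """Process three sample files; the second duplicates the first's content."""
    files = [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"different")]
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        file_filter = StorageFileFilter(dst)
        results = []
        for name, data in files:
            path = os.path.join(src, name)
            with open(path, "wb") as f:
                f.write(data)
            results.append(file_filter.process(path))
        return results, sorted(os.listdir(dst))
```

Keeping the check codes in a set makes each lookup a constant-time membership test rather than a scan over the imported files, which is one plausible reading of the traversal-efficiency concern in the summary.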