Distributed processing method based on massive data
A distributed processing technology for massive data, applied in the fields of electrical digital data processing, special-purpose data processing applications, instruments, etc. It solves the problem that existing approaches cannot be applied directly to scientific data processing, and achieves simple use and efficient operation.
Examples
Example 1
[0059] The SQL command is: select time, var1, var2 from example where y>=3 and y<=9;
[0060] (1) Condition variable selection optimization and array merge storage are not performed
[0061] From the query command, NcFileInputFormat determines that three variables are used, time, y, and var1, where time and var1 are output variables and y is a condition variable. From the NetCDF file header information, it determines that var1 and var2 are the main variables and that time and y are dimension variables of both var1 and var2. NcFileInputFormat then traverses the NetCDF file and outputs data tuples of the form {time, y, var1, var2}; for example, the first data tuple is {1000.0, 3, 1, 2}. The number of data tuples depends on the dimensions of the main variable var1 or var2, here 6x2=12. NcFileSerDe learns from NcFileInputFormat that the variables to be used are time, y, var1, and var2, and deserializes the data tuples output by NcFileInputFormat according to the variable types in the table {time is double type, ...
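The tuple generation described in [0061] can be sketched in Python. This is a minimal illustration, not the NcFileInputFormat implementation: it assumes the main variables are stored flat in row-major order over their dimensions (time, y). The function `generate_tuples` and all coordinate and data values are hypothetical, chosen only so that the first tuple equals {1000.0, 3, 1, 2} and the tuple count is 6x2=12.

```python
from itertools import product


def generate_tuples(dims, coords, main_vars):
    """Yield one tuple (dimension values..., main-variable values...) per array cell.

    dims      -- dimension names in storage order, e.g. ["time", "y"]
    coords    -- dimension name -> list of coordinate values
    main_vars -- main-variable name -> flat, row-major list of values
    """
    sizes = [len(coords[d]) for d in dims]
    # product() advances the last index fastest (row-major order), so the
    # enumerate() counter is the matching offset into each flat array.
    for flat, idx in enumerate(product(*(range(n) for n in sizes))):
        dim_values = tuple(coords[d][i] for d, i in zip(dims, idx))
        yield dim_values + tuple(vals[flat] for vals in main_vars.values())


# Hypothetical toy data: 2 time steps x 6 y values gives 6x2 = 12 tuples.
coords = {"time": [1000.0, 1001.0], "y": [3, 6, 9, 12, 15, 18]}
main_vars = {
    "var1": list(range(1, 24, 2)),  # 1, 3, 5, ..., 23 (12 values)
    "var2": list(range(2, 25, 2)),  # 2, 4, 6, ..., 24 (12 values)
}

tuples = list(generate_tuples(["time", "y"], coords, main_vars))

# Apply the WHERE clause y>=3 and y<=9 (y is the second field of each tuple).
selected = [t for t in tuples if 3 <= t[1] <= 9]
print(len(tuples), tuples[0], len(selected))  # 12 (1000.0, 3, 1, 2) 6
```

Generating every tuple and filtering afterwards mirrors case (1), where no condition-based pruning happens before the scan.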
Example 2
[0071] The SQL command is: select time, y, x, var3 from example where y=6 and x>=6 and x<=8;
[0072] (1) Conditional selection optimization and array merge storage are not performed
[0073] The main variable in {time, y, x, var3} is var3. A total of 2x6x4=48 data tuples are generated, and 2x1x2=4 of them satisfy the condition after filtering: {1000.0, 6, 6, 6}, {1000.0, 6, 8, 8}, {1001.0, 6, 6, 30}, {1001.0, 6, 8, 32}. The final generated NetCDF file is as follows:
[0074]
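The filtering arithmetic of Example 2 (48 tuples generated, 4 qualified) can be checked with a similar Python sketch. Assumptions, all hypothetical: var3 holds the values 1 to 48 in row-major (time, y, x) order, y[1] is 6, and x[1] and x[3] are 6 and 8. These are the simplest choices consistent with the four qualified tuples listed in [0073]; the remaining coordinate values are placeholders outside the query range.

```python
from itertools import product

# Hypothetical coordinates; only y_coord[1] == 6, x_coord[1] == 6 and
# x_coord[3] == 8 are needed to reproduce the tuples in the example.
time_coord = [1000.0, 1001.0]
y_coord = [3, 6, 9, 12, 15, 18]
x_coord = [4, 6, 10, 8]
# Assumption: var3 stores 1..48 in row-major (time, y, x) order.
var3 = list(range(1, 2 * 6 * 4 + 1))


def tuples_3d():
    """Yield (time, y, x, var3) for every cell of var3 in row-major order."""
    shape = (len(time_coord), len(y_coord), len(x_coord))
    for flat, (t, j, i) in enumerate(product(*(range(n) for n in shape))):
        yield (time_coord[t], y_coord[j], x_coord[i], var3[flat])


all_tuples = list(tuples_3d())
# WHERE y=6 and x>=6 and x<=8
qualified = [tup for tup in all_tuples if tup[1] == 6 and 6 <= tup[2] <= 8]
print(len(all_tuples))  # 48
print(qualified)  # [(1000.0, 6, 6, 6), (1000.0, 6, 8, 8), (1001.0, 6, 6, 30), (1001.0, 6, 8, 32)]
```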
[0075] (2) The case with conditional selection optimization
[0076] For the condition variable x, the selection range [1, 3] given by x>=6 and x<=8 is not contiguous, so conditional selection optimization cannot be used.
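The contiguity test in [0076] can be sketched in Python. It assumes, as the example suggests, that conditional selection optimization applies only when the indices matched on a condition variable form one unbroken range. The function `selection_range` and the coordinate values are hypothetical; only x[1] = 6 and x[3] = 8 are consistent with Example 2.

```python
def selection_range(coord, predicate):
    """Return (first, last, contiguous) for the indices of coord that satisfy
    predicate, or None if no coordinate matches.

    contiguous is True only when every index in [first, last] matches, i.e.
    the condition corresponds to a single start/count slice of the dimension.
    """
    hits = [i for i, v in enumerate(coord) if predicate(v)]
    if not hits:
        return None
    first, last = hits[0], hits[-1]
    return (first, last, last - first + 1 == len(hits))


# Hypothetical coordinate values for the condition variables.
x_coord = [4, 6, 10, 8]
y_coord = [3, 6, 9, 12, 15, 18]

print(selection_range(x_coord, lambda v: 6 <= v <= 8))  # (1, 3, False)
print(selection_range(y_coord, lambda v: v == 6))       # (1, 1, True)
```

With these values, x>=6 and x<=8 matches indices 1 and 3 but not 2, so the range [1, 3] has a gap and the optimization is skipped, while y=6 maps to a single contiguous index.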
[0077] As the above embodiments show, the present invention provides a MapReduce-based distributed processing method for massive data stored as arrays, so that users can use SQL commands to perform distributed processing on the ...