Local computing and distributed computing based data computing method and system

A local computing and distributed computing technology in the field of computer science, which solves the problems of high implementation cost, surplus processing capacity, and long data preparation time, and achieves the effects of optimized computing efficiency, avoided data preparation time, and guaranteed computing efficiency.

Active Publication Date: 2016-04-06
GUANGZHOU SHIYUAN ELECTRONICS CO LTD
Cites: 4 · Cited by: 2

AI-Extracted Technical Summary

Problems solved by technology

[0003] To optimize computing efficiency, one approach is to enhance the performance of local computing or to optimize the local algorithms; after such optimization, small-scale data projects leave a surplus of processing capacity, which wastes resources, an...

Method used

[0052] For example, whether a subroutine completes first can be judged by detecting whether it is the first to return the calculation result of the data item...

Abstract

The invention relates to a data computing method and system based on local computing and distributed computing. The method comprises: calling a preset local computing mode and a preset distributed computing mode to compute the same data item, marking the computing mode with the shorter computing time as the preferred computing mode of the data item, and obtaining training samples each containing the data volume, the preferred computing mode, and the computing time; generating a training model from the training samples; and estimating the data volume of a to-be-processed data item, determining the computing mode suited to that data item according to the training model and its data volume, and calling that computing mode to compute the data item. With this data computing method and system, computing policies adapted to the scale of each data item can be selected for different data items, the implementation cost is low, and computing efficiency is optimized.

Application Domain

Relational databases · Database distribution/replication +1

Technology Topic

Item selection · Time information +2

Image

  • Figure 1: schematic flowchart of the data computing method (first embodiment)
  • Figure 2: schematic flowchart of the method with dynamic model adjustment (second embodiment)
  • Figure 3: schematic structural diagram of the data computing system

Examples

  • Experimental program (1)

Example Embodiment

[0041] In order to make the purpose, features, and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
[0042] The embodiments provided by the present invention include embodiments of a data computing method based on local computing and distributed computing, as well as embodiments of a corresponding data computing system based on local computing and distributed computing; each is described in detail below.
[0043] Figure 1 is a schematic flowchart of a data computing method based on local computing and distributed computing according to an embodiment of the present invention. As shown in Figure 1, the data computing method based on local computing and distributed computing in this embodiment includes the following steps S101 to S103, described in detail as follows:
[0044] S101: respectively call a preset local computing mode and a preset distributed computing mode to calculate the same data item, and record the calculation mode with the shorter calculation time as the preferred calculation mode of the data item; obtain the calculation time of the preferred calculation mode, and obtain a training sample comprising the data volume of the data item, the preferred calculation mode, and the calculation time;
[0045] In this embodiment, two calculation modes (i.e., a local calculation mode and a distributed calculation mode) are preset to process data items, and the preferred calculation mode for a data item is obtained based on both: the two calculation modes perform calculations on the same data item; it is detected whether one of the calculation modes returns the calculation result first, and if so, the calculation mode that returns the calculation result first is recorded as the preferred calculation mode of the data item.
[0046] Preferably, once the calculation mode that returns the calculation result first has been detected, the calculation of the data item by the other calculation mode is terminated immediately and its resources are reclaimed in time.
[0047] It can be understood that the purpose of step S101 is to collect the time consumed by the two calculation modes on different data items and to use this as training samples for establishing a training model. The training process can be as follows:
[0048] Main training program:
[0049] (1) Prepare the data item that needs to be processed, and count the data volume of the data item;
[0050] (2) Run the local computing subprogram and the distributed computing subprogram at the same time, so that the two preset computing modes process the data item simultaneously;
[0051] (3) Judge whether one subroutine has completed first; if so, record the calculation mode corresponding to that subroutine and execute the next step; otherwise, continue to judge;
[0052] For example, whether a subroutine completes first can be judged by detecting whether it is the first to return the calculation result of the data item. The purpose of this process is to determine which calculation mode is better suited to the data item to be processed, so that calculation efficiency is highest;
[0053] (4) Obtain the execution time of the subprogram that completed the data item first, that is, the calculation time of the corresponding local or distributed computing mode for this data item; then form and record a training sample containing the data volume of the data item, the calculation mode that completed first, and the calculation time;
[0054] (5) Terminate the other, uncompleted subroutine, that is, terminate the processing of the data item by the subroutine that has not returned a calculation result, and reclaim its resources in time.
[0055] The two subroutines are basically similar: one invokes the local computing mode and the other the distributed computing mode to process the data item, and the main difference between them is the way the algorithm is written.
[0056] Two subroutines:
[0057] (1) Run the local computing algorithm (or the distributed computing algorithm) to process the data item;
[0058] Here, the local computing subroutine can call the preset algorithm directly, while the distributed computing subroutine needs to rewrite the preset algorithm into a format suitable for distributed processing;
[0059] (2) Judge whether execution is complete; if so, execute the next step; otherwise, continue to judge;
[0060] (3) Obtain the execution time (that is, the calculation time), and return the execution time and the calculation result to the main training program. This race between the two subroutines is sketched in code below.
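As an illustration of the main training program and the two subroutines above, the following Python sketch races a local and a distributed computation on the same data item. It is a minimal sketch under assumed names: compute_local and compute_distributed are hypothetical stand-ins (the patent specifies no language, framework, or algorithm), and thread-based cancellation is only a best-effort analogue of terminating the losing subroutine.

    import time
    from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


    def compute_local(data_item):
        """Hypothetical local computing subroutine (stand-in workload)."""
        return sum(data_item)


    def compute_distributed(data_item):
        """Hypothetical distributed computing subroutine (stand-in workload)."""
        return sum(data_item)


    def collect_training_sample(data_item):
        """Run both modes on the same data item and record which returns first."""
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = {
                pool.submit(compute_local, data_item): "local",
                pool.submit(compute_distributed, data_item): "distributed",
            }
            done, not_done = wait(futures, return_when=FIRST_COMPLETED)
            elapsed = time.perf_counter() - start  # calculation time of the winner
            winner = futures[done.pop()]
            for f in not_done:
                # Best-effort termination of the slower subroutine; a real
                # implementation would kill its process and reclaim resources.
                f.cancel()
        return {
            "data_volume": len(data_item),  # step (1): data volume of the item
            "preferred_mode": winner,       # step (3): mode that completed first
            "elapsed": elapsed,             # step (4): calculation time
            "timestamp": time.time(),       # recorded for later sample selection
        }

The returned dictionary is one training sample in the sense of step S101; the record layout, field names included, is an assumption carried through the sketches that follow.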
[0061] S102: generate a training model from a number of the training samples;
[0062] It can be understood that the training model includes information such as the data volume, the preferred calculation mode corresponding to each data volume, and the calculation time.
[0063] As a preferred embodiment, identification information can be set in advance for the local computing mode and the distributed computing mode so that they can be distinguished, and the corresponding mode identifier may be recorded in the training sample or the training model.
[0064] As a preferred embodiment, the training model may be generated from all training samples produced within a set time window, or from a set number of training samples closest to the current time; both policies are sketched below. It can be understood that the larger the time window, or the greater the number of referenced training samples, the more accurate, but also the more complex, the generated training model.
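The following minimal sketch illustrates the two sample-selection policies just named; it assumes each training sample carries the timestamp field recorded in the earlier sketch, since the patent names the policies but prescribes no data layout.

    import time


    def select_samples(samples, window_seconds=None, max_count=None):
        """Keep samples from a recent time window, or the newest max_count samples."""
        if window_seconds is not None:
            cutoff = time.time() - window_seconds
            samples = [s for s in samples if s["timestamp"] >= cutoff]
        if max_count is not None:
            samples = sorted(samples, key=lambda s: s["timestamp"])[-max_count:]
        return samples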
[0065] As a preferred embodiment, before the training model is generated from a number of the training samples, the data set of training samples can be analyzed: data with the same or similar data volume can be clustered, or a data distribution curve fitted, to eliminate outlier data, as in the sketch below.
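One simple realization of this pre-filtering, sketched under the same assumed sample layout, groups samples by data volume and drops those whose calculation time lies far from the group mean; the patent leaves the concrete clustering or curve-fitting method open.

    from statistics import mean, stdev


    def build_model(samples, k=2.0):
        """Group samples by data volume and discard time outliers beyond k sigma."""
        by_volume = {}
        for s in samples:
            by_volume.setdefault(s["data_volume"], []).append(s)
        model = []
        for group in by_volume.values():
            times = [s["elapsed"] for s in group]
            if len(times) >= 3:  # stdev needs enough points to be meaningful
                mu, sigma = mean(times), stdev(times)
                group = [s for s in group if abs(s["elapsed"] - mu) <= k * sigma]
            model.extend(group)
        return model  # the retained samples serve as the training model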
[0066] S103: estimate the data volume of the data item to be processed, determine a calculation mode suitable for the data item according to the training model and that data volume, and call the determined calculation mode to calculate the data item to be processed.
[0067] When there is a new data item to be processed, first estimate its data volume, then input that data volume into the generated training model; the model matches the training sample whose data volume is closest to that of the data item and whose calculation time is shortest, and the preferred calculation mode of that training sample is taken as the calculation mode suitable for the data item to be processed.
[0068] For example, suppose the data volume of the data item to be processed is 12,000 data units, and the nearby data volumes in the training model are 10,000 and 14,000 data units. For the 10,000-unit sample, the preferred calculation mode is local computing with a calculation time of 0.25 seconds; for the 14,000-unit sample, the preferred mode is distributed computing with a calculation time of 0.3 seconds. Inputting the 12,000-unit data volume into the model therefore matches the sample with data volume 10,000, local computing, and calculation time 0.25 seconds, so the local calculation mode is taken as the calculation mode suitable for the data item to be processed, as the sketch below shows.
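The matching rule of step S103 reduces to a nearest-neighbour lookup over the training samples. The sketch below, under the same assumed sample layout, reproduces the worked example: the two candidate volumes are equally distant from 12,000 units, and the shorter calculation time breaks the tie in favour of local computing.

    def choose_mode(model, estimated_volume):
        """Match the sample nearest in data volume; break ties by shortest time."""
        best = min(
            model,
            key=lambda s: (abs(s["data_volume"] - estimated_volume), s["elapsed"]),
        )
        return best["preferred_mode"]


    model = [
        {"data_volume": 10_000, "preferred_mode": "local", "elapsed": 0.25},
        {"data_volume": 14_000, "preferred_mode": "distributed", "elapsed": 0.30},
    ]
    assert choose_mode(model, 12_000) == "local"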
[0069] It can be understood that once the training model reaches an ideal state, the calculation strategy suited to data items of different scales can be selected accurately, so that their calculation time is minimized and computing efficiency is optimized.
[0070] On the basis of the above embodiment, Figure 2 is a schematic flowchart of a data computing method based on local computing and distributed computing according to another embodiment of the present invention. The main difference between the Figure 2 embodiment and the Figure 1 embodiment is that, after the training model is generated, it can also be dynamically adjusted according to feedback from actual processing, so that it gradually approaches the ideal state.
[0071] It should be noted that in the Figure 2 embodiment, the process of generating training samples and generating a training model from them is similar to that of the Figure 1 embodiment; this part is therefore not shown in Figure 2.
[0072] Referring to Figure 2, the following specifically describes, once the training model has been generated, the process of performing data processing based on the training model and dynamically adjusting the model according to the actual processing results, comprising steps S201 to S209 (a code sketch follows the step list).
[0073] S201: prepare the data item to be processed, and estimate its data volume;
[0074] S202: determine a calculation mode suitable for the data item to be processed (Mode 1 in Figure 2) as the main mode;
[0075] S203: call the main mode to calculate the data item to be processed;
[0076] S204: take the other calculation mode (Mode 2 in Figure 2) as the auxiliary mode, and call the auxiliary mode to calculate the data item to be processed;
[0077] S205: judge whether the main mode is the first to return the calculation result of the data item to be processed; if so, go to step S206; otherwise, go to step S207;
[0078] S206: terminate the calculation of the data item to be processed by the auxiliary mode; execute step S208;
[0079] S207: when the auxiliary mode returns the calculation result of the data item to be processed, terminate the calculation of the data item by the main mode;
[0080] S208: obtain the calculation time of whichever of the main mode and the auxiliary mode returned the calculation result first, and form a new training sample from the data volume of the data item to be processed, the mode that returned the result first, and the calculation time;
[0081] S209: use the new training sample to adjust the training model.
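Steps S201 to S209 amount to a feedback loop around the helpers sketched earlier (choose_mode and collect_training_sample, both assumed names): the model predicts a main mode, the two modes still race, and the actual winner is fed back as a new training sample.

    def process_and_adjust(model, data_item):
        """One pass of S201-S209: predict, race both modes, feed the result back."""
        estimated_volume = len(data_item)                 # S201: estimate volume
        main_mode = choose_mode(model, estimated_volume)  # S202: pick main mode
        sample = collect_training_sample(data_item)       # S203-S208: race modes
        model.append(sample)                              # S209: adjust model
        return sample["preferred_mode"] == main_mode      # did prediction hold?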
[0082] Through the above method embodiments, when there are data items to be processed, the local computing mode and the distributed computing mode are compared comprehensively, and the calculation mode suited to the data volume of small, medium, and large-scale data items is selected automatically. This is automatic and convenient, reduces manual intervention, and avoids the data preparation delay that occurs when distributed computing is invoked to process small and medium-sized data items.
[0083] It should be noted that, for ease of description, the foregoing method embodiments are expressed as series of action combinations; however, those skilled in the art should appreciate that the present invention is not limited by the described action sequence, because certain steps may be performed in other orders or simultaneously in accordance with the present invention. Secondly, those skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the present invention.
[0084] The following describes a data computing system based on local computing and distributed computing that can be used to execute the above data computing method based on local computing and distributed computing according to the embodiments of the present invention. Figure 3 is a schematic structural diagram of such a system. For convenience of description, Figure 3 shows only the parts related to the embodiments of the present invention; those skilled in the art will understand that the structure shown in Figure 3 does not limit the system, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
[0085] As shown in Figure 3, the data computing system based on local computing and distributed computing in this embodiment includes a training module 310, a model generation module 320, and a call execution module 330, wherein:
[0086] The training module 310 is used to respectively call the preset local computing mode and distributed computing mode to calculate the same data item, and to record the calculation mode with the shorter calculation time as the preferred calculation mode of the data item; to obtain the calculation time of the preferred calculation mode; and to obtain a training sample containing the data volume of the data item, the preferred calculation mode, and the calculation time;
[0087] Preferably, the training module 310 specifically includes:
[0088] A preparation unit, used to simultaneously call the preset local computing mode and distributed computing mode to calculate the same data item, and also to count the data volume of the data item; a detection unit, used to detect the calculation mode that returns the calculation result first and to record it as the preferred calculation mode of the data item; and a termination unit, used to obtain the training sample containing the data volume, preferred calculation mode, and calculation time of the data item, to terminate the calculation of the data item by the calculation mode that did not return a result, and to reclaim resources in time.
[0089] The model generation module 320 is configured to generate a training model from a number of the training samples;
[0090] It can be understood that the training model includes information such as the data volume, the preferred calculation mode corresponding to each data volume, and the calculation time.
[0091] As a preferred embodiment, identification information can be set in advance for the local computing mode and the distributed computing mode so that they can be distinguished, and the corresponding mode identifier may be recorded in the training sample or the training model.
[0092] As a preferred embodiment, the training model may be generated from all training samples produced within a set time window, or from a set number of training samples closest to the current time. It can be understood that the larger the time window, or the greater the number of referenced training samples, the more accurate, but also the more complex, the generated training model.
[0093] As a preferred embodiment, the model generation module 320 includes a model tuning unit configured to analyze the data set of the training samples before the training model is generated: data with the same or similar data volume is clustered, or a data distribution curve is fitted, to eliminate outliers.
[0094] The call execution module 330 is used to estimate the data volume of the data item to be processed, determine a calculation mode suitable for the data item according to the training model and that data volume, and call the determined calculation mode to calculate the data item to be processed.
[0095] Here, determining a calculation mode suitable for the data item to be processed according to the training model and its data volume may include inputting the data volume of the data item into the training model, matching the training sample whose data volume is closest to that of the data item and whose calculation time is shortest, and taking the preferred calculation mode of that training sample as the calculation mode suitable for the data item to be processed.
[0096] It can be understood that once the training model reaches an ideal state, the calculation strategy suited to data items of different scales can be selected accurately, so that their calculation time is minimized and computing efficiency is optimized.
[0097] As a preferred embodiment, the data computing system based on local computing and distributed computing further includes:
[0098] The model adjustment module 340 is configured to take the calculation mode suitable for the data item to be processed as the main mode, take the other calculation mode as the auxiliary mode, and call the auxiliary mode to calculate the data item; to judge whether the main mode is the first to return the calculation result of the data item, and if so, to terminate the calculation of the data item by the auxiliary mode, otherwise to terminate the calculation by the main mode when the auxiliary mode returns the result; to obtain the calculation time of whichever mode returned the calculation result first; to form a new training sample from the data volume of the data item, the mode that returned the result first, and the calculation time; and to adjust the training model with the new training sample.
[0099] According to the above embodiments of the data computing system based on local computing and distributed computing of the present invention, when there are data items to be processed, the local computing mode and the distributed computing mode are compared comprehensively, and the calculation mode adapted to the data volume of small, medium, and large-scale data items is selected automatically; this is automatic and convenient, reduces manual intervention, keeps implementation cost low, and optimizes computing efficiency.
[0100] It should be noted that the information exchange and execution processes among the modules/units in the above embodiments are based on the same concept as the foregoing method embodiments of the present invention and bring the same technical effects; for details, refer to the description in the method embodiments, which is not repeated here.
[0101] In addition, in the above example implementation of the data computing system based on local computing and distributed computing, the division into functional modules is only illustrative; in practical applications, the above functions may be allocated to different functional modules as required, for example to suit the configuration of the corresponding hardware or for convenience of software implementation, that is, the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above.
[0102] In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
[0103] In addition, each functional module in the foregoing embodiments of the present invention may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
[0104] If the integrated modules are implemented as software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Those of ordinary skill in the art can understand that all or part of the steps of the method specified in any embodiment of the present invention can be completed by a program instructing the relevant hardware (a personal computer, server, network device, etc.). The program can be stored in a computer-readable storage medium; when executed, it performs all or part of the steps of the methods specified in any of the foregoing embodiments. The storage medium may include any medium that can store program code, such as read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
[0105] The above describes the data computing method and system based on local computing and distributed computing provided by the present invention. Those of ordinary skill in the art may, following the idea of the embodiments of the present invention, make changes to the specific implementations and the scope of application; in conclusion, the content of this specification should not be construed as limiting the present invention.



Similar technology patents

Ice coating simulation method for 220kV transmission line tower-coupling system

Inactive · CN108710763A · Short modeling cycle; Improve computing efficiency
Owner: 国网江西省电力有限公司经济技术研究院 +1

Hydraulic power plant comprehensive data analysis method

Inactive · CN104574212A · Guaranteed Computational Efficiency
Owner: NANJING NARI GROUP CORP +2

Ship-based image-stabilizing method based on sea-sky boundary detecting

Active · CN103514587A · Reduce computing burden; Improve computing efficiency
Owner: BEIJING INST OF ENVIRONMENTAL FEATURES

Classification and recommendation of technical efficacy words

  • Guaranteed Computational Efficiency

Hydraulic power plant comprehensive data analysis method

Inactive · CN104574212A · Guaranteed Computational Efficiency
Owner: NANJING NARI GROUP CORP +2