Dynamic priority iterator based on data characteristics in Gaia system

A technology based on dynamic priority and data characteristics, applied in the fields of electronic digital data processing, instrumentation, program startup/switching, etc. It addresses the problems that task completion time and throughput are affected, that system execution efficiency is reduced, and that iterative algorithms fail to converge. Its effect is to accelerate convergence and improve execution efficiency.

Active Publication Date: 2021-03-19
NORTHEASTERN UNIV +1
Cites: 3 | Cited by: 0

AI Technical Summary

Problems solved by technology

In large-scale data, only a small number of data update calculations play a decisive role in the convergence of an iterative algorithm, while most update calculations contribute little. The existing iteration mechanism therefore causes Gaia to perform some unnecessary calculations, which prevents the iterative algorithm from converging quickly and greatly reduces the execution efficiency of the system.
[0005] However, in today's era of explosive data growth, the Gaia system processes time-varying data inefficiently, the iterative algorithms it supports are increasingly ill-suited to the workload, and system processing is prone to bottlenecks, so it cannot meet users' needs for real-time performance.
This is especially true for skewed data. If the iterative computing process is optimized in the traditional way, considering only the memory and performance of the processing nodes in the big data computing system and ignoring the characteristics of the iterative data being processed, the system will perform unnecessary operations and may even fail to converge within the effective time, which affects the completion time and throughput of the overall task.
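The observation that a few updates dominate convergence can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names are hypothetical, and using the absolute pending change of a data unit as its priority is an assumption made only for illustration, since the patent's priority adjustment formula is not reproduced here.

```python
def select_work_set(pending_change, k):
    """Return the ids of the k data units with the largest pending change.

    Assumption: priority = absolute pending change (not the patent's formula).
    """
    ranked = sorted(pending_change, key=lambda uid: abs(pending_change[uid]),
                    reverse=True)
    return ranked[:k]


def share_of_change(pending_change, selected):
    """Fraction of the total absolute change covered by the selected units."""
    total = sum(abs(v) for v in pending_change.values())
    if total == 0:
        return 0.0
    return sum(abs(pending_change[uid]) for uid in selected) / total


if __name__ == "__main__":
    # Hypothetical skewed data: one unit carries most of the pending change.
    pending = {"a": 90.0, "b": 5.0, "c": -3.0, "d": 2.0}
    chosen = select_work_set(pending, 1)
    print(chosen, share_of_change(pending, chosen))
```

With skewed inputs like the hypothetical ones above, processing only the highest-ranked units already covers most of the outstanding change, which is the intuition behind prioritizing data units instead of treating them uniformly.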


Examples

Embodiment Construction

[0055] The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention but are not intended to limit its scope.

[0056] In this embodiment, the Gaia system runs the Lloyd-Forgy clustering algorithm as the actual iterative application scenario. This is the classic and simplest K-means iterative algorithm. It uses distance as the measure of similarity: the closer two objects are, the more similar they are considered to be. In this embodiment, the core iterative calculation of the Lloyd-Forgy clustering algorithm is performed by the incremental iteration module of the data-characteristic-based dynamic priority iterator of the present invention.
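The Lloyd-Forgy iteration described in [0056] can be sketched in plain Python as follows. This is a single-machine illustration under stated assumptions, not the Gaia implementation: the function names are hypothetical, and the "incremental" aspect is modeled only by updating per-cluster sums and counts for points whose assignment changed, which is one possible reading of incremental iteration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


def kmeans_incremental(points, initial_centroids, max_iters=100):
    """Lloyd-Forgy K-means with running per-cluster sums and counts.

    Only points whose cluster changed touch the accumulators (the delta);
    distances are still evaluated for every point in every pass.
    """
    k, dim = len(initial_centroids), len(points[0])
    centroids = [list(c) for c in initial_centroids]
    assign = [None] * len(points)
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k

    for _ in range(max_iters):
        changed = 0
        for i, p in enumerate(points):
            j = nearest(p, centroids)
            if j == assign[i]:
                continue  # unchanged point: no update work
            old = assign[i]
            if old is not None:  # remove the point from its old cluster
                counts[old] -= 1
                for d in range(dim):
                    sums[old][d] -= p[d]
            counts[j] += 1  # add the point to its new cluster
            for d in range(dim):
                sums[j][d] += p[d]
            assign[i] = j
            changed += 1
        if changed == 0:  # assignments are stable: converged
            break
        for j in range(k):
            if counts[j]:  # keep the old centroid if a cluster is empty
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids, assign


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_incremental(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

As clusters stabilize, fewer points change assignment in each pass, so the amount of accumulator work shrinks. That shrinking working set is the property an incremental iteration is meant to exploit.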

[0057] The data used in th...


Abstract

The invention provides a dynamic priority iterator based on data characteristics in a Gaia system, and relates to the technical field of distributed big data computing. The iterator comprises a priority scheduling module, a full iterative computation module and an incremental iterative computation module. The priority scheduling module reads the data of the data source as the initial work set of the iterative computation. During the execution of each iteration task, it maintains a skip list for searching and selecting data units and a state table for storing the state information corresponding to each data unit. At the beginning of each iteration task, the state table is updated according to a priority adjustment formula, and the priority of each data unit is determined after all data units input to the current iteration task have been updated. The Gaia system then performs the iterative computation according to the priority information of each data unit. The full iterative computation module implements iterative computation through a BulkIterate operator, and the incremental iterative computation module implements iterative computation through a Delta Iterate operator.
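A rough Python sketch of the priority scheduling module described in the abstract is given below. It rests on several assumptions that are not in the source: the class and method names are hypothetical, the priority adjustment formula is replaced by a placeholder (the absolute change of a unit's value), and `heapq.nlargest` stands in for the skip list used to search and select data units.

```python
import heapq


class PriorityScheduler:
    """Illustrative state table plus priority-based selection of data units."""

    def __init__(self, initial_work_set):
        # State table: data unit id -> current value and priority.
        self.state = {
            uid: {"value": value, "priority": 0.0}
            for uid, value in initial_work_set.items()
        }

    def update(self, uid, new_value):
        """Record a new value for a data unit and adjust its priority.

        Placeholder formula (an assumption): priority = |new - old|.
        """
        entry = self.state[uid]
        entry["priority"] = abs(new_value - entry["value"])
        entry["value"] = new_value

    def select(self, k):
        """Return the ids of the k highest-priority data units.

        heapq.nlargest is used here in place of the patent's skip list.
        """
        return heapq.nlargest(
            k, self.state, key=lambda uid: self.state[uid]["priority"]
        )


if __name__ == "__main__":
    scheduler = PriorityScheduler({"a": 1.0, "b": 1.0, "c": 1.0})
    # All units of the current iteration are updated before selection.
    scheduler.update("a", 1.5)
    scheduler.update("b", 4.0)
    scheduler.update("c", 1.1)
    print(scheduler.select(2))
```

Selection happens only after every unit of the current iteration has been updated, mirroring the order described in the abstract. A skip list would keep the units ordered incrementally rather than re-ranking them on each call.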

Description

Technical field

[0001] The invention relates to the technical field of distributed big data computing, in particular to a dynamic priority iterator based on data characteristics in a Gaia system.

Background

[0002] The Gaia system is a new-generation big data computing system with high timeliness and scalability, based on the hybrid coexistence of multiple computing models. It addresses a series of key technical problems at several core levels of big data analysis systems, such as adaptive and scalable big data storage, batch-stream fused big data computing, high-dimensional large-scale machine learning, and highly time-efficient intelligent interactive guidance for big data. The aim is to build an independent, controllable, time-efficient and scalable new-generation big data analysis system and to master the core technology of world-leading big data analysis systems.

[0003] In the big data environment, distributed iterative computing plays a vital role in data processing and analysis, ...


Application Information

IPC(8): G06F9/48, G06F9/50, G06F9/54
CPC: G06F9/4881, G06F9/5016, G06F9/544, G06F2209/484, G06F2209/5021
Inventor: 岳晓飞, 赵宇海, 王国仁, 季航旭, 李博扬
Owner NORTHEASTERN UNIV