A method for eliminating accumulation of rounding errors
By reinitializing components during iterative computation and employing a role-swapping mechanism between the main computation window and the backup computation window, the problem of rounding error accumulation is solved, enabling efficient and accurate iterative computation that meets the needs of real-time data processing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 吕纪竹
- Filing Date
- 2021-09-13
- Publication Date
- 2026-06-16
AI Technical Summary
Existing iterative computation methods suffer from accumulating rounding errors when processing large and streaming data, causing the calculation results to gradually deviate from the correct results. Furthermore, existing methods are time-consuming and cannot meet the needs of real-time data processing.
By re-initializing components during iterative calculations and employing a role-swapping mechanism between the main and backup calculation windows, rounding errors are gradually eliminated through iterative and incremental calculations.
It effectively eliminates rounding errors, ensures the accuracy and stability of calculation results, and improves calculation efficiency to meet the needs of real-time data processing.
Figure CN115809261B_ABST
Description
Technical Field
[0001] Big data or streaming data analysis.
Background Technology
[0002] The internet, mobile communications, navigation, online games, sensing technologies, and large-scale computing infrastructure generate massive amounts of data daily. Big data, due to its enormous scale, rapid changes, and growth rate, exceeds the processing capabilities of traditional database systems and the analytical capabilities of traditional analytical methods. Many companies now rely on big data and / or streaming data for real-time decision-making to solve various problems. Existing methods involve the use of large amounts of computing resources, resulting in significant waste and not necessarily meeting the needs of real-time decision-making based on the latest information, especially for the financial industry. Iterative computation is an effective method for reusing existing computation results to avoid repeated data access and redundant calculations, thus enabling timely and efficient processing and analysis of big data and / or streaming data. However, rounding errors accumulate during iterative computation, causing the computation results to gradually deviate from the correct outcome. Recalculating initial values for iterative computation can avoid the continuous accumulation of rounding errors, but the calculation of initial values may be time-consuming and may not meet the needs of real-time data processing. Therefore, a method that can overcome the accumulation of rounding errors and provide efficient and stable iterative computation is needed.
Summary of the Invention
[0003] This invention extends to methods, systems, and computing device program products for eliminating rounding error accumulation during iterative computation of large and / or streaming data on a computing device-based computing system. A computing device-based computing system includes one or more computing devices and one or more storage media, wherein each computing device includes one or more processors. Eliminating rounding error accumulation is a specially designed process and mechanism for reinitializing components during iterative computation so that rounding errors accumulated during iterative computation are eliminated rather than escalated. Embodiments of this invention include iteratively computing a function for a primary computing window of a specified size while incrementally computing the function for one or more backup computing windows starting at different time points, and when one of the backup computing windows reaches a specified window size, swapping the roles of the backup and primary computing windows: by resetting the size of the primary computing window and making it a backup computing window on which incremental computation of the function begins, and by making the backup computing window that has reached the specified size the primary computing window on which iterative computation of the function begins. Embodiments of this invention return the result of computing the function on a computing window with a window size of the specified window size as the output computation result of the function.
[0004] In this document, a component of a function is a quantity or expression that appears in the function's definition or in any transformation of that definition. A function can be computed based on one or more of its components.
[0005] In this document, a data source can be a real-time data stream or a storage medium.
[0006] A computing system based on a computing device initializes one or more components of a function on a pre-modification main computing window that is accessible to one or more data sources. The pre-modification main computing window contains a specified number of sets of data elements, n (n > 1), and each set of data elements contains k (k ≥ 1) data elements from the one or more data sources.
[0007] The computing system initializes one or more components of the function on each of l (l ≥ 1) pre-modification backup computing windows.
[0008] The computing system accesses r (r≥1) groups of data elements from one or more data sources to be added to the main computing window before the modification, where each group of data contains k data elements.
[0009] The computing system stores the accessed r sets of data elements into one or more data buffers. This is an optional operation that is only performed if the one or more data sources include real-time data streams.
[0010] The computing system modifies the original main computing window by: removing the earliest r groups of data elements from the original main computing window; and adding the r groups of data elements to be added to the original main computing window.
[0011] The computing system modifies each of the l pre-modification backup computing windows. Modifying a pre-modification backup computing window includes adding the r groups of data elements to be added to that window and updating the window's size counter accordingly.
[0012] The computing system iteratively computes one or more components of the function on the main computing window after the modification, based on one or more components of the function on the main computing window before the modification.
[0013] The computing system incrementally computes one or more components of the function on each of the l modified backup computing windows, based on the one or more components of the function on the corresponding pre-modification backup computing window.
[0014] The computing system can generate one or more computation results of the function based on one or more components of iterative computation or one or more components of incremental computation on a backup computation window of size n when the one or more computation results are accessed or queried.
[0015] Whenever the size of any of the l modified backup computing windows reaches n, the computing system interchanges the roles of the modified main computing window and the modified backup computing window of size n: it sets that modified backup computing window as the pre-modification main computing window and resets the modified main computing window as a pre-modification backup computing window. The reset includes making the modified main computing window contain only the latest n mod r groups of data, setting it as a pre-modification backup computing window, and initializing one or more components of the function on that pre-modification backup computing window.
[0016] The computing system can continuously: access r groups of data elements to be added to the pre-modification main computing window; modify the pre-modification main computing window; modify each of the l backup computing windows; iteratively compute one or more components of the function on the modified main computing window; incrementally compute one or more components of the function on each of the l modified backup computing windows; generate, when one or more computation results of the function are accessed or queried, those results based on one or more of the iteratively computed components on the modified main computing window or one or more of the incrementally computed components on a backup computing window of size n; and, whenever the size of a modified backup computing window reaches n, interchange the roles of the modified main computing window and that modified backup computing window. The computing system can repeat this process as many times as needed, for example, until a preset number of repetitions is reached or until the computing system is notified to stop.
[0017] This brief description is intended to present some alternative concepts in a simplified manner, which will be described in further detail below. This brief description is not intended to identify key or essential features of the subject matter of the claims, nor is it intended to help determine the scope of the subject matter of the claims.
[0018] Other features and advantages of the invention will become apparent from the description which follows, or in part will be obvious from the description, or may be learned by practice of the invention. The features and advantages of the invention may be realized and obtained from the methods, apparatus, and combinations thereof particularly pointed out in the appended claims. These and other features of the invention will become more fully apparent and clear from the following description and the appended claims, or from practice of the invention.
Attached Figure Description
[0019] To illustrate how the above and other advantages and features of the present invention can be obtained, a more specific description of the invention briefly described above will be presented by referring to the specific embodiments shown in the following accompanying figures. These figures merely illustrate typical embodiments of the invention and therefore should not be construed as limiting the scope of the invention. The invention will be described and explained herein using the following accompanying figures and some added specificity and detail:
[0020] Figure 1 illustrates a high-level overview of an example computing system that implements iterative computation for big data or streaming data.
[0021] Figure 2-1 shows an example method 200A for eliminating rounding error accumulation with two computation streams.
[0022] Figure 2-2 shows an example method 200B for eliminating rounding error accumulation with three computation streams.
[0023] Figure 2-3 shows an example method 200C for eliminating rounding error accumulation with four computation streams.
[0024] Figure 3 is a flowchart of an example method for eliminating the accumulation of rounding errors in iterative computation on big data or streaming data.
Specific Implementation Methods
[0025] This invention extends to methods, systems, and computing device program products for eliminating rounding error accumulation during iterative computation of large and / or streaming data on a computing device-based computing system. A computing device-based computing system includes one or more computing devices and one or more storage media, wherein each computing device includes one or more processors. Eliminating rounding error accumulation is a specially designed process and mechanism for reinitializing components during iterative computation so that rounding errors accumulated during iterative computation are eliminated rather than escalated. Embodiments of this invention include iteratively computing a function for a primary computing window of a specified size while incrementally computing the function for one or more backup computing windows starting at different time points, and when one of the backup computing windows reaches a specified window size, swapping the roles of the backup and primary computing windows: by resetting the size of the primary computing window and making it a backup computing window on which incremental computation of the function begins, and by making the backup computing window that has reached the specified size the primary computing window on which iterative computation of the function begins. Embodiments of this invention return the result of computing the function on a computing window with a window size of the specified window size as the output computation result of the function. Eliminating the accumulation of rounding errors enables the computing system based on the computing device to run iterative calculations stably over a long period of time without accumulating rounding errors.
[0026] In this document, a component of a function is a quantity or expression that appears in the function's definition or in any transformation of that definition. A function itself can be considered its largest component. A function can be computed based on one or more of its components.
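As an illustration of the notion of a component, the sketch below treats the count, the sum, and the sum of squares as components of the population variance and computes the variance from them. The choice of variance, the `Components` class, and the function names are assumptions made for this example; the description does not prescribe any particular set of components.

```python
from dataclasses import dataclass


@dataclass
class Components:
    """Hypothetical components of the population variance."""
    count: int = 0
    total: float = 0.0      # sum of the data elements
    total_sq: float = 0.0   # sum of the squared data elements


def init_components(window):
    """Initialize the components directly from the data in a window."""
    c = Components()
    for x in window:
        c.count += 1
        c.total += x
        c.total_sq += x * x
    return c


def variance(c):
    """Compute the function (population variance) from its components."""
    mean = c.total / c.count
    return c.total_sq / c.count - mean * mean


comps = init_components([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
result = variance(comps)
```

Once the components are available, the function value is obtained without touching the data elements again, which is what makes component reuse attractive for sliding windows.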
[0027] An embodiment of the present invention includes simultaneously computing one or more components of a function on multiple computation windows with different initial window sizes. These different initial window sizes are predefined and maintained by the computation system. All predefined initial window sizes are different, including a predefined full window size and a predefined minimum window size. One computation window with a predefined full window size, i.e., the window size used for iterative computation, is called the main computation window, on which one or more components of the function are iteratively computed. All other computation windows with smaller window sizes are called backup computation windows, on which one or more components of the function are incrementally computed. When the size of one of the backup computation windows on which incremental computation is performed reaches the predefined full window size, the computation system converts the main computation window into a backup computation window by resetting its window size to the predefined minimum computation window size, and converts the backup computation window that has reached the predefined full window size into the main computation window. The computation system uses the computation result of the function generated on the main computation window or on a backup computation window whose window size has reached the full window size as the output computation result of the function.
[0028] A function may require k (k ≥ 1) inputs (e.g., k data elements from k variables). For example, univariate functions such as variance, standard deviation, skewness, kurtosis, and autocorrelation need a single data element as input when iteratively computed on a true real-time data stream, i.e., k = 1. Bivariate functions such as covariance, correlation, and simple linear regression need data from two variables as input, i.e., k = 2. In other words, when such a bivariate function is iteratively computed on a true real-time data stream, a group of data containing two data elements is needed as input at each time point. Similarly, for functions such as multiple linear regression, a group of data containing k data elements may be needed as input at each time point. Depending on actual needs, iterative computation can be performed as soon as new data is collected at a single time point, or after new data has been collected at multiple time points over a period of time. Assuming the number of data collection points is r (r ≥ 1), then in each round of iterative computation a total of r × k data elements are removed from the main computing window and a total of r × k data elements are added to it. For micro-batch iterative computation, the input data of a function is collected over a time interval consisting of multiple collection points. In this case, the input of the function can be an r × k matrix, k vectors of length r, or r groups of data elements, each group consisting of k data elements. For simplicity, in this document all of these input cases are represented as an input consisting of r groups of data elements, each group consisting of k data elements. The fastest possible change to a computing window is to change it whenever a single new data element arrives for each variable. Therefore, for a function requiring k inputs, the fastest change is to remove k data elements from the computing window and add k data elements to it. This is the case for true real-time streaming data, i.e., when r = 1. Note that data from multiple variables may be combined into a single data stream. Therefore, the input to a function may come from k data sources, each consisting of data elements from a single variable, or from a single data source consisting of data elements from multiple variables. For simplicity, assume that a main computing window of size n (n > 1) contains n groups of data elements, each group containing data elements from one or more (from 1 to k) data sources, and that in each round of iterative computation the r (r ≥ 1) newly accessed groups of data are added to the pre-modification main computing window and the r earliest accessed groups of data are removed from it.
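The window change described above can be sketched as follows for n = 4, k = 2, and r = 2. Representing each group as a tuple of k values and the window as a `collections.deque` is an assumption of this example, as is the helper name `slide`.

```python
from collections import deque


def slide(window, new_groups):
    """Remove the earliest r groups and append the r new groups.

    Returns the removed groups so that a caller can use them to update
    the components of the function.
    """
    removed = [window.popleft() for _ in range(len(new_groups))]
    window.extend(new_groups)
    return removed


# n = 4 groups in the window, k = 2 data elements per group
window = deque([(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)])

# r = 2 newly accessed groups (a micro-batch); r = 1 would be true streaming
removed = slide(window, [(5.0, 50.0), (6.0, 60.0)])
```

In this round r × k = 4 data elements leave the window and 4 enter it, so the window size stays at n.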
[0029] In this document, a data source can be a real-time data stream or a storage medium.
[0030] A computing-device-based computing system initializes one or more components of a function on a pre-modification main computing window whose data the computing system can access from one or more data sources. The pre-modification main computing window contains a specified number n (n > 1) of groups of data elements, and each group contains k (k ≥ 1) data elements from one or more (from 1 to k) data sources. In each round of iterative computation, the function takes a total of r × k (r ≥ 1, k ≥ 1) data elements as input. Initialization here means computing one or more components of the function on the pre-modification main computing window in any way, for example iteratively, incrementally, decrementally, dynamically, or directly from the data elements in the pre-modification main computing window.
[0031] The computing system initializes one or more components of the function on each of l (l ≥ 1) pre-modification backup computing windows in one of the following ways:
[0032] (a) The computing system initializes one or more components of the function on each of l pre-modification backup computing windows with different window sizes, which consist of data from one or more data sources at the same starting time. Here, the window size of the i-th pre-modification backup computing window is $m_i$, where $0 \le m_i < n$, $(n - m_i) \bmod r = 0$, and $m_i \ne m_j$ when $i \ne j$ ($1 \le i, j \le l$), and the window contains the latest $m_i$ groups of data elements, each group containing k (k ≥ 1) data elements from one or more data sources. Or
[0033] (b) The computing system initializes one or more components of the function on each of l pre-modification backup computing windows of the same window size, which consist of data from one or more data sources at different start times. Here, the window size of the i-th pre-modification backup computing window is $m_i$, where $0 \le m_i < n$, $(n - m_i) \bmod r = 0$, and $m_i \ne m_j$ when $i \ne j$ ($1 \le i, j \le l$), and the window contains the latest $m_i$ groups of data elements, each group containing k data elements from one or more data sources. Or
[0034] (c) The computing system initializes one or more components of the function on each of the l pre-modification backup computing windows of different window sizes, which consist of data from one or more data sources at the same point in time, in a manner combining (a) and (b).
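The size constraints stated in (a) can be checked mechanically. The sketch below is only an illustration of those constraints; the function name `valid_backup_sizes` is hypothetical and not taken from the description.

```python
def valid_backup_sizes(sizes, n, r):
    """Check the constraints on backup window sizes m_1..m_l.

    0 <= m_i < n            each backup window is smaller than the main one
    (n - m_i) mod r == 0    growing by r per round lands exactly on n
    m_i != m_j for i != j   no two backup windows share a size
    """
    in_range = all(0 <= m < n for m in sizes)
    aligned = all((n - m) % r == 0 for m in sizes)
    distinct = len(set(sizes)) == len(sizes)
    return in_range and aligned and distinct


ok = valid_backup_sizes([1, 4], n=10, r=3)          # both sizes reach 10 in steps of 3
misaligned = valid_backup_sizes([2, 4], n=10, r=3)  # 2 + 3j never equals 10
duplicate = valid_backup_sizes([4, 4], n=10, r=3)   # sizes must differ
too_large = valid_backup_sizes([10], n=10, r=3)     # must be strictly below n
```

The alignment condition matters because each backup window grows by exactly r groups per round, so a misaligned size would step over n without ever equalling it.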
[0035] The computing system accesses r groups of data elements to be added to the main computing window before modification from one or more data sources, where each group of data contains k data elements.
[0036] The computing system stores the accessed r sets of data elements into one or more (from 1 to k) data buffers. This is an optional operation that is only executed if the one or more data sources include real-time data streams.
[0037] The computing system modifies the original main computing window by: removing the earliest r groups of data elements from the original main computing window; and adding the r groups of data elements to be added to the original main computing window.
[0038] The computing system modifies each of the l pre-modification backup computing windows. Modifying a pre-modification backup computing window includes adding the r groups of data elements to the window and increasing the window's size counter by r.
[0039] The computing system iteratively computes one or more components of the function on the main computing window after the modification, based on one or more components of the function on the main computing window before the modification.
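A minimal sketch of such an iterative update, assuming the components are a running sum and a running sum of squares for k = 1 (an assumption of this example, not something the description fixes):

```python
def iterate(total, total_sq, removed, added):
    """Update components for the modified main window from the components
    of the pre-modification window, without rescanning the window."""
    for x in removed:          # the earliest r data elements leave the window
        total -= x
        total_sq -= x * x
    for x in added:            # the r newly accessed data elements enter it
        total += x
        total_sq += x * x
    return total, total_sq


# Pre-modification window [1, 2, 3, 4]: sum 10, sum of squares 30
new_total, new_total_sq = iterate(10.0, 30.0, removed=[1.0], added=[5.0])
```

Each round costs a number of operations proportional to r rather than n, but every subtraction and addition can contribute a rounding error that is carried into all later rounds.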
[0040] The computing system incrementally computes one or more components of the function on each of the l modified backup computing windows, based on the one or more components of the function on the corresponding pre-modification backup computing window.
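Incremental computation differs from iterative computation in that data is only added, so the backup window grows. A minimal sketch, again assuming a count, a sum, and a sum of squares as the components for k = 1:

```python
def increment(count, total, total_sq, added):
    """Update components of a growing backup window; nothing is removed."""
    for x in added:
        total += x
        total_sq += x * x
    return count + len(added), total, total_sq


# A backup window that starts empty (size 0) and receives r = 2 data elements
size, total, total_sq = increment(0, 0.0, 0.0, added=[3.0, 4.0])
```

Because the backup components are built only from recently added data, they do not inherit rounding errors from data that has already left the main window.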
[0041] When one or more computation results of the function are accessed or queried, the computing system can generate them on the modified main computing window based on one or more of the iteratively computed components.
[0042] The computing system continues to access r groups of data elements, modify the main computing window and the l backup computing windows, compute the components of the function, and generate one or more computation results of the function on the modified main computing window, based on one or more of the iteratively computed components, when those results are accessed or queried. This continues until the window size of any one of the l backup computing windows reaches n - r.
[0043] At this point, the computing system accesses r sets of data elements, modifies the main computing window and l backup computing windows, and calculates multiple components of the function. Since the size of the main computing window and one of the l backup computing windows is n, the computing system can generate one or more calculation results for the function based either on one or more components of the function obtained through iterative calculation on the main computing window or on one or more components of the function obtained through incremental calculation on the backup computing window of size n.
[0044] When one or more computation results of the function are accessed or queried, the computing system generates them for a computing window of size n (the modified main computing window or the backup computing window that has reached size n), based on one or more of the components of the function computed on that computing window.
[0045] The computing system interchanges the roles of the modified main computing window and the modified backup computing window of size n by setting that modified backup computing window as the pre-modification main computing window and resetting the modified main computing window as a pre-modification backup computing window. The reset includes making the modified main computing window contain only the latest n mod r groups of data, setting its size to n mod r, setting it as a pre-modification backup computing window, and initializing one or more components of the function on that pre-modification backup computing window.
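The role swap can be sketched for the simplest case of a running sum with one backup window (l = 1, k = 1). The class `SwappingWindowSum`, its attributes, and the use of a bounded `deque` as the data buffer are assumptions made for this illustration; it is not the implementation of the embodiments.

```python
from collections import deque


class SwappingWindowSum:
    """Sliding sum over the latest n values with one backup window."""

    def __init__(self, initial, r, backup_size):
        n = len(initial)
        if not 1 <= r <= n:
            raise ValueError("r must satisfy 1 <= r <= n")
        if not (0 <= backup_size < n and (n - backup_size) % r == 0):
            raise ValueError("need 0 <= m < n and (n - m) mod r == 0")
        self.n, self.r = n, r
        self.buffer = deque(initial, maxlen=n)            # latest n values
        self.main_sum = sum(initial)                      # main window component
        self.backup_size = backup_size
        self.backup_sum = sum(initial[n - backup_size:])  # backup window component
        self.swaps = 0

    def update(self, new_values):
        if len(new_values) != self.r:
            raise ValueError("expected exactly r new values")
        oldest = [self.buffer[i] for i in range(self.r)]
        for old in oldest:                 # iterative step on the main window
            self.main_sum -= old
        for new in new_values:
            self.main_sum += new
            self.backup_sum += new         # incremental step on the backup window
        self.buffer.extend(new_values)     # maxlen drops the oldest r values
        self.backup_size += self.r
        if self.backup_size == self.n:     # backup reached full size: swap roles
            self.main_sum = self.backup_sum
            keep = self.n % self.r         # old main is reset to n mod r values
            self.backup_size = keep
            self.backup_sum = sum(list(self.buffer)[self.n - keep:])
            self.swaps += 1
        return self.main_sum


w = SwappingWindowSum([1.0, 2.0, 3.0, 4.0, 5.0], r=2, backup_size=1)
results = [w.update([6.0, 7.0]), w.update([8.0, 9.0]), w.update([10.0, 11.0])]
```

With n = 5 and r = 2 the backup window starts at size 1, reaches size 5 on the second update, and takes over as the main window; the former main window restarts as a backup of size n mod r = 1.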
[0046] The steps described above, excluding the two initializations, can be repeated multiple times as needed. That is, the computing system can continuously: access r groups of data elements to be added to the pre-modification main computing window; modify the pre-modification main computing window; modify each of the l backup computing windows; iteratively compute one or more components of the function on the modified main computing window; incrementally compute one or more components of the function on each of the l modified backup computing windows; generate, when one or more computation results of the function are accessed or queried, those results based on one or more of the iteratively computed components on the modified main computing window or one or more of the incrementally computed components on a backup computing window of size n; and, whenever the size of a modified backup computing window reaches n, interchange the roles of the modified main computing window and that modified backup computing window. The computing system can repeat this process as needed, for example, until a preset number of repetitions is reached or until the computing system is notified to stop.
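To see why repeating the swap matters, the following sketch runs the loop on a deliberately constructed stream in which a few very large values are followed by small ones, and compares a purely iterative sliding sum with one that uses a single backup window. The function `sliding_sum`, the stream, and the parameter values are assumptions chosen for this illustration; `math.fsum` serves as the reference value.

```python
import math
from collections import deque


def sliding_sum(stream, n, r, use_backup):
    """Return (iteratively maintained sum, exact sum) of the final window."""
    buf = deque(stream[:n], maxlen=n)
    main = sum(buf)
    b_size = n % r                       # backup starts at the minimum size
    b_sum = sum(list(buf)[n - b_size:])
    for start in range(n, len(stream) - r + 1, r):
        new = stream[start:start + r]
        for i in range(r):               # iterative step: remove the oldest r
            main -= buf[i]
        for x in new:                    # ... and add the newest r
            main += x
        buf.extend(new)
        if use_backup:
            for x in new:                # incremental step on the backup
                b_sum += x
            b_size += r
            if b_size == n:              # swap roles and reset the old main
                main = b_sum
                b_size = n % r
                b_sum = sum(list(buf)[n - b_size:])
    return main, math.fsum(buf)


n, r = 8, 2
stream = [1e16] * n + [0.1] * (5 * n)    # large values first, then small ones
plain, exact = sliding_sum(stream, n, r, use_backup=False)
swapped, _ = sliding_sum(stream, n, r, use_backup=True)
```

In this constructed stream the small values added while the sum is still dominated by the large ones are rounded away, so the plain iterative sum ends far from the exact window sum; the version with a backup window replaces the contaminated components at each swap and stays close to the exact value.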
[0047] Embodiments of the present invention may include or utilize computing device hardware, such as one or more processors and storage devices as described in more detail below, whether dedicated or general-purpose computing devices. The scope of embodiments of the present invention also includes physical and other computing device-readable media used to carry or store computing device-executable instructions and / or data structures. These computing device-readable media may be any media accessible to general-purpose or dedicated computing devices. Computing device-readable media storing computing device-executable instructions is storage media (device). Computing device-readable media carrying computing device-executable instructions is transmission media. Therefore, by way of example and not limitation, embodiments of the present invention may include at least two different types of computing device-readable media: storage media (devices) and transmission media.
[0048] Storage media (devices) include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), read-only optical disc memory (CD-ROM), solid-state drive (SSD), flash memory, phase-change memory (PCM), other types of memory, other optical disk storage, disk storage or other magnetic storage devices, or any other media that can be used to store program code in the form of computer device-executable instructions or data structures and that can be accessed by general-purpose or special-purpose computing devices.
[0049] A “network” is defined as one or more data links that enable computing devices and / or modules and / or other electronic devices to transmit electronic data. When information is transmitted or provided to a computing device via a network or other communication connection (wired, wireless, or a combination of wired and wireless), the computing device treats the connection as a transmission medium. The transmission medium may include a network and / or data link for carrying necessary program code in the form of instructions or data structures executable by the computing device, and which can be accessed by general-purpose or special-purpose computing devices. Combinations of the above should also be included within the scope of media readable by the computing device.
[0050] Furthermore, when using different computing device components, program code in the form of executable instructions or data structures can be automatically transferred from the transmission medium to the storage medium (device) (or vice versa). For example, executable instructions or data structures received from a network or data link can be temporarily stored in the random access memory (RAM) of a network interface module (e.g., a NIC) and then eventually transferred to the RAM of the computing device and / or to a less volatile storage medium (device) of the computing device. Therefore, it should be understood that the storage medium (device) can be included in computing device components that also (or even primarily) utilize the transmission medium.
[0051] Computer device executable instructions include, for example, instructions and data, which, when executed by a processor, cause a general-purpose or special-purpose computing device to perform a specific function or set of functions. Computer device executable instructions can be, for example, binary, intermediate format instructions such as assembly code, or even source code. Although the described subject matter is described in a specific language of structural features and / or methodological actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the features or actions described above. Rather, the described features or actions are disclosed only as examples of implementing the claims.
[0052] Embodiments of the present invention can be implemented in a network computing environment configured with various types of computing devices, including personal computers, desktops, laptops, information processors, handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network computers, minicomputers, mainframe computers, supercomputers, mobile phones, PDAs, tablets, pagers, routers, switches, and similar products. Embodiments of the present invention can also be applied to a distributed system environment consisting of local or remote computing devices performing tasks via network interconnection (i.e., via wired data links, wireless data links, or a combination of wired and wireless data links). In a distributed system environment, program modules can be stored on local or remote storage devices.
[0053] Embodiments of the present invention can also be implemented in a cloud computing environment. In this description and the following claims, "cloud computing" is defined as a model that enables on-demand access to a shared pool of configurable computing resources over a network. For example, cloud computing can be offered in the marketplace to provide ubiquitous and convenient on-demand access to a shared pool of configurable computing resources. The shared pool of configurable computing resources can be rapidly provisioned via virtualization, released with low management effort or service provider interaction, and then scaled accordingly.
[0054] Cloud computing models can include various characteristics such as on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so on. Cloud computing models can also be embodied in various service models, such as Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”). Cloud computing models can also be deployed through different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so on.
[0055] Iterative computation reuses the results of previous calculations to generate new results, thus improving computational efficiency and reducing computational resource requirements. This makes it possible to process and analyze data where it is generated and collected. Therefore, embodiments of the present invention can also be implemented on sensors or in edge computing environments.
[0056] Several examples will be given in the following sections.
[0057] Figure 1 illustrates a high-level overview of an example computing system 100 that supports iterative computation on big data or streaming data. Referring to Figure 1, the computing system 100 includes multiple devices connected by different networks, such as a local area network 1021, a wireless network 1022, and the Internet 1023. These devices include, for example, a data analysis engine 1007, a storage system 1011, a real-time data stream 1006, and multiple distributed computing devices, such as a personal computer 1016, a handheld device 1017, and a desktop computer 1018, that can schedule data analysis tasks and/or query data analysis results.
[0058] The data analysis engine 1007 may include one or more processors, such as CPU 1009 and CPU 1010, one or more system memories, such as system memory 1008, and component computing modules 131 and 192. The storage system 1011 may include one or more storage media, such as storage media 1012 and storage media 1014, which can be used to store large datasets. For example, storage media 1012 and/or 1014 may store dataset 124. The datasets in the storage system 1011 can be accessed by the data analysis engine 1007.
[0059] Typically, data stream 1006 can include streaming data from various data sources, such as stock prices, audio data, video data, geospatial data, internet data, mobile communication data, online game data, bank transaction data, sensor data, and/or closed caption data. As a few examples, real-time data 1000 can include data collected in real time from sensors 1001, stock data 1002, communication data 1003, banking data 1004, and so on. The data analysis engine 1007 can receive data elements from data stream 1006. Data from different data sources can also be stored in storage system 1011 and accessed for big data analytics; for example, dataset 124 can come from different data sources and be accessed for big data analytics.
[0060] It should be understood that Figure 1 presents these concepts in a greatly simplified form. For example, distributed devices 1016 and 1017 may connect to the data analysis engine 1007 through a firewall and a load balancer, and data accessed or received by the data analysis engine 1007 from data stream 1006 and/or storage system 1011 may first pass through data filters, and so on.
[0061] Figure 2-1 illustrates an example method 200A for eliminating rounding error accumulation using two computation flows. Referring to Figure 2-1, "INI" indicates initialization, "INC" indicates incremental computation, and "ITR" indicates iterative computation. While one computation flow, called the main computation flow, performs iterative computation on a main computation window with a predefined window size n (n > 1), another computation flow, called the backup computation flow, performs incremental computation on a backup computation window. Whenever the backup computation window reaches the predefined window size n, the two computation flows switch roles: the backup computation flow, which had been performing incremental computation, begins iterative computation and becomes the main computation flow, while the main computation flow, which had been performing iterative computation, begins incremental computation and becomes the backup computation flow. The main computation window and the backup computation window switch roles accordingly: the original main computation window is reset and becomes a backup computation window containing only one data element, while the original backup computation window becomes the main computation window and maintains a fixed window size of n. The computing system outputs the computation results generated on the main computation window or on a backup computation window whose window size reaches n.
[0062] In this description, a computation flow is simply a logical process containing a series of operations; it is not equivalent to an actual process in an operating system. In other words, the two computation flows can be executed separately by two processes or threads within a computing system, or even by two different computing devices, or they can be combined and executed by a single process or thread. The two computation flows are driven and synchronized by data. Therefore, their relative execution order does not matter, as long as both complete the necessary operations before the next data element is accessed or received.
[0063] For simplicity, assume that a particular function requires one data element (i.e., k = 1) as input for each round of iterative computation, and that the function is computed iteratively over a computation window of a predefined size n (n > 1). A new round of computation begins whenever one data element is removed from the main computation window and a new data element is added, so the main computation window maintains a fixed window size n. The expected computation result is the result of the function over all data elements in the main computation window. The iterative computation reuses the computation result of the pre-modification computation window, thus avoiding access to all data elements in the modified computation window and saving computing resources. The two computation flows operate as follows. For the first n data elements, computation flow 1, the main computation flow, initializes one or more components of the function according to their definitions, based on all n data elements in the first main computation window (201). At the next time point, the (n+1)th data element is accessed or received and added to the main computation window, and one data element is removed, so that the main computation window still contains n data elements. Because the data in the main computation window has changed, the function must be recomputed. After the (n+1)th data element is accessed or received, computation flow 1 begins iterative computation based on the one or more components initialized on the first main computation window (202). The computing system generates its output from computation flow 1. Computation flow 2 initializes one or more components of the function to 0 (221) and begins incremental computation of those components on the backup computation window with the (n+1)th data element (222).
At this point, the backup computation window has a window size of 1 and contains only the (n+1)th data element. At the next time point, the (n+2)th data element is accessed or received. Computation flow 1 continues iterative computation (203), and computation flow 2 continues incremental computation (223). This process continues until the (2n)th data element is accessed or received. During this process, the computing system generates its output computation results (from 202, 203, …, up to 204) from the iterative computation of computation flow 1. After the (2n)th data element is accessed or received, computation flow 1 becomes the backup computation flow and computation flow 2 becomes the main computation flow. The computing system then generates the required output computation results (from 225, 226, …, up to 227) from the iterative computation of computation flow 2. The former main computation window is reset to empty, and computation flow 1 begins incremental computation. The above process can be repeated as needed. Because the original backup computation window reaches size n during the last incremental computation before the roles are switched, both the main computation window and the backup computation window have size n at that time point, and the result from either window can be used as the output computation result. For example, the computing system can generate the output computation result through 224 or 204, 227 or 207, 230 or 216. In short, the two computation flows alternate as the main computation flow. When one computation flow acts as the main computation flow, the other acts as the backup computation flow. The main computation flow performs iterative computation, and its results are used to produce the required output. The backup computation flow performs incremental computation, and its results are used only to continue incremental computation, not to produce the required output, except when its window size reaches n.
Note that Figure 2-1 illustrates a preferable arrangement of the two computation flows, in which both flows perform the same number of iterations. In a less favorable arrangement, the two flows perform different numbers of iterations.
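The two-flow scheme of method 200A can be sketched in a few lines of Python. This is a minimal illustration only, assuming the function is a sliding sum whose single component is the running total; the helper name `two_flow_window_sums` and the use of a list as the data source are hypothetical choices, not part of the described method.

```python
from collections import deque


def two_flow_window_sums(data, n):
    """Return the sum of the latest n elements after each arrival (k = 1, r = 1).

    Hypothetical helper: the name and the choice of a plain sum as the
    function's only component are illustrative assumptions.
    """
    if n < 2 or len(data) < n:
        raise ValueError("need n > 1 and at least n data elements")
    window = deque(data[:n], maxlen=n)  # elements of the main computation window
    main = float(sum(window))           # INI: initialize by definition on n elements
    backup, backup_size = 0.0, 0        # INI: backup component starts at 0 (empty window)
    results = [main]
    for x in data[n:]:
        oldest = window[0]
        window.append(x)                # maxlen=n drops the oldest element
        main = main - oldest + x        # ITR: reuse the previous result
        backup += x                     # INC: additions only, no subtraction
        backup_size += 1
        if backup_size == n:
            # The backup window now holds the same n elements as the main
            # window. Swap roles: the freshly accumulated value becomes the
            # main component and the old main flow restarts from empty.
            main, backup, backup_size = backup, 0.0, 0
        results.append(main)
    return results
```

In this sketch the main component is replaced every n arrivals by a value built from additions alone, so any error introduced by the repeated subtract-and-add updates is discarded at each role swap.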
[0064] Figure 2-2 illustrates an example method 200B for eliminating rounding error accumulation using three computation flows. The only difference between method 200B and method 200A is that method 200B uses an additional computation flow to reduce the number of iterative computations performed by the main computation flow, thereby further mitigating the impact of rounding error accumulation. Method 200B can be used in scenarios where the main computation window is large and rounding error accumulation is severe. Referring to Figure 2-2, "INI" indicates initialization, "INC" indicates incremental computation, and "ITR" indicates iterative computation. In method 200B, three computation flows take turns acting as the main computation flow, while the other two act as backup computation flows. Similar to method 200A, the main computation flow performs iterative computation, and the two backup computation flows begin incremental computation at two different time points. The computing system uses as its output the computation results generated on the main computation window or on a backup computation window whose window size reaches n. The computation results of the two backup computation flows are used only to continue incremental computation, not to generate the required output, except when one of their window sizes reaches n. Referring to Figure 2-2, computation flows 1, 2, and 3 are initialized at different time points (201, 221, 241). Computation flow 1 initially acts as the main computation flow (202, 203, …, up to 204). Computation flow 2 begins incremental computation (222, …, 223, …, up to 224) when the (n/2+1)th data element is accessed or received. Computation flow 3 begins incremental computation (242, 243, …, up to 244) when the (n+1)th data element is accessed or received.
When the (n+n/2+1)th data element is accessed or received, computation flow 1 becomes a backup computation flow and begins incremental computation (205, 206, …, up to 207), computation flow 2 becomes the main computation flow and begins iterative computation (225, 226, …, up to 227), and computation flow 3 continues incremental computation (245, 246, …, up to 247). This process continues until the (2n)th data element is accessed or received, after which the roles of the three computation flows change again. When the (2n+1)th data element is accessed or received, computation flow 1 continues incremental computation (208, 209, …, up to 210), computation flow 2 becomes a backup computation flow and begins incremental computation (228, 229, …, up to 230), and computation flow 3 becomes the main computation flow and begins iterative computation (248, 249, …, up to 250). This process continues until the (2n+n/2)th data element is accessed or received, after which the roles of the three computation flows change again. When the (2n+n/2+1)th data element is accessed or received, computation flow 1 becomes the main computation flow and begins iterative computation (217, 218, …, up to 213), computation flow 2 continues incremental computation (231, 232, …, up to 233), and computation flow 3 becomes a backup computation flow and begins incremental computation (251, 252, …, up to 253). This process continues until the (3n)th data element is accessed or received, after which the roles of the three computation flows change again.
When the (3n+1)th data element is accessed or received, computation flow 1 becomes a backup computation flow and begins incremental computation (214, 215, …), computation flow 2 becomes the main computation flow and begins iterative computation (234, 235, …), and computation flow 3 continues incremental computation (254, 255, …). Because a backup computation window reaches size n during the last incremental computation before each role swap, both the main computation window and that backup computation window have size n at that time point, and the result from either window can be used as the output computation result. For example, the computing system can generate the output computation result using 224 or 204, 227 or 247, 250 or 210, 253 or 213. In short, the three computation flows take turns acting as the main computation flow. While one flow is the main computation flow, the other two act as backup computation flows. The main computation flow performs iterative computation, and its results are used to produce the required output. The backup computation flows perform incremental computation, and their results are used only to continue incremental computation, not to produce the required output, except when one of their window sizes reaches n. Note that Figure 2-2 illustrates a preferable arrangement of the three computation flows, in which all three flows perform the same number of iterations. In a less favorable arrangement, the computation flows do not all perform the same number of iterations.
[0065] Figure 2-3 illustrates an example method 200C for eliminating rounding error accumulation using four computation flows. The only difference between method 200C and methods 200A and 200B is that method 200C uses more computation flows to reduce the number of iterations performed by the main computation flow, further mitigating the impact of rounding error accumulation. Method 200C can be used in scenarios where the main computation window is large and rounding error accumulation is severe. Referring to Figure 2-3, "INI" indicates initialization, "INC" indicates incremental computation, and "ITR" indicates iterative computation. In method 200C, four computation flows take turns acting as the main computation flow, while the other three act as backup computation flows. Similar to methods 200A and 200B, the main computation flow performs iterative computation, and the three backup computation flows begin incremental computation at three different time points. The computing system can select the result produced by the main computation flow as the output result of the iterative computation. The computation results of the three backup computation flows are used only to continue incremental computation, not to produce the required output, except when the window size of one of them reaches n. Referring to Figure 2-3, computation flows 1, 2, 3, and 4 are initialized at different time points (201, 221, 241, 261). Computation flow 1 initially acts as the main computation flow (202, 203, …, up to 204). Computation flow 2 begins incremental computation (222, …, 223, …, up to 224) when the (n/3+1)th data element is accessed or received. Computation flow 3 begins incremental computation (242, …, 243, …, up to 244) when the (2n/3+1)th data element is accessed or received. Computation flow 4 begins incremental computation (262, 263, …, up to 264) when the (n+1)th data element is accessed or received.
When the (n+n/3+1)th data element is accessed or received, computation flow 1 becomes a backup computation flow and begins incremental computation (205, 206, …, up to 207), computation flow 2 becomes the main computation flow and begins iterative computation (225, 226, …, up to 227), computation flow 3 continues incremental computation (245, 246, …, up to 247), and computation flow 4 continues incremental computation (265, 266, …, up to 267). This process continues until the (n+2n/3)th data element is accessed or received, after which the roles of the four computation flows change again. When the (n+2n/3+1)th data element is accessed or received, computation flow 1 continues incremental computation (208, 209, …, up to 210), computation flow 2 becomes a backup computation flow and begins incremental computation (228, 229, …, up to 230), computation flow 3 becomes the main computation flow and begins iterative computation (248, 249, …, up to 250), and computation flow 4 continues incremental computation (268, 269, …, up to 270). This process continues until the (2n)th data element is accessed or received, after which the roles of the four computation flows change again. When the (2n+1)th data element is accessed or received, computation flow 1 continues incremental computation (211, 212, …, up to 219), computation flow 2 continues incremental computation (231, 232, …, up to 233), computation flow 3 becomes a backup computation flow and begins incremental computation (251, 252, …, up to 253), and computation flow 4 becomes the main computation flow and begins iterative computation (271, 272, …, up to 273). This process continues until the (2n+n/3)th data element is accessed or received.
After that point, the roles of the four computation flows change again. When the (2n+n/3+1)th data element is accessed or received, computation flow 1 becomes the main computation flow and begins iterative computation (217, 218, …), computation flow 2 continues incremental computation (236, 237, …), computation flow 3 continues incremental computation (254, 255, …), and computation flow 4 becomes a backup computation flow and begins incremental computation (274, 275, …). Because a backup computation window reaches size n during the last incremental computation before each role swap, both the main computation window and that backup computation window have size n at that time point, and the result from either window can be used as the output computation result. For example, the computing system can generate the output computation result using 224 or 204, 227 or 247, 250 or 270, 273 or 219. In short, the four computation flows take turns acting as the main computation flow. While one flow is the main computation flow, the other three act as backup computation flows. The main computation flow performs iterative computation, and its results are used to produce the required output. The backup computation flows perform incremental computation, and their results are used only to continue incremental computation, not to produce the required output, except when one of their window sizes reaches n. Note that Figure 2-3 illustrates a preferable arrangement of the four computation flows, in which all four flows perform the same number of iterations. In a less favorable arrangement, the computation flows do not all perform the same number of iterations.
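The staggered arrangement of methods 200B and 200C can be generalized to one main flow and l backup flows. The following Python sketch is an illustration under stated assumptions: the function is a sliding sum, n is divisible by l, and backup flow j begins incremental computation at element j·n/l + 1, which is the stagger implied by the role-switch times described above. The helper name `multi_flow_window_sums` is hypothetical.

```python
def multi_flow_window_sums(data, n, l):
    """Sliding sums over the latest n elements with 1 main and l backup flows.

    Illustrative assumptions: `data` is a list, n is divisible by l, and the
    only component of the function is a running total.
    """
    if l < 1 or n < 2 or n % l != 0 or len(data) < n:
        raise ValueError("need l >= 1, n > 1, n divisible by l, len(data) >= n")
    step = n // l
    starts = [(j + 1) * step for j in range(l)]  # 0-based start index per backup slot
    sums = [0.0] * l                             # backup components, initialized to 0
    sizes = [0] * l                              # backup window size counters
    main = 0.0
    results = []
    for i, x in enumerate(data):
        if i < n:
            main += x                            # INI of the first main window
        else:
            main = main - data[i - n] + x        # ITR on the main window
        for j in range(l):
            if i >= starts[j]:
                sums[j] += x                     # INC on each started backup window
                sizes[j] += 1
        for j in range(l):
            if sizes[j] == n:
                main = sums[j]                   # backup flow becomes the main flow
                sums[j], sizes[j] = 0.0, 0       # demoted flow restarts from empty
        if i >= n - 1:
            results.append(main)
    return results
```

With l = 1 this reduces to the two-flow arrangement of method 200A; with l = 2 or l = 3 the main component is refreshed every n/2 or n/3 arrivals, which is why additional flows shorten the run of iterative updates on any one value.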
[0066] For simplicity, the example methods 200A, 200B, and 200C above use the smallest data unit (a single data element) to modify the main computation window (i.e., by removing and adding a single data element) and the backup computation window (i.e., by adding a single data element). These methods can be extended to modify the main and backup computation windows using larger data units. For example, in each round, r (r > 1) data elements are removed from the main computation window, r data elements are added to the main computation window, and r data elements are added to each of the one or more backup computation windows. As long as each backup computation window starts at a different time point with an initial window size of n mod r, every backup computation window will reach the window size n at some point in time. Note that in the three example methods the backup computation flows begin incremental computation at different time points, but this is not the only way to start a backup computation flow. Backup computation flows can also be started when the main computation flow initializes one or more components of the function: each backup computation flow initializes one or more components of the function on a backup computation window with a different initial window size, using that many of the latest data elements rather than the latest n data elements, and then begins incremental computation. In this case, n minus each initial window size must be an integer multiple of r.
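The window-size arithmetic in this paragraph can be checked with a small Python sketch. The helper names `backup_window_sizes` and `reaches_n` are hypothetical and serve only to illustrate why an initial size of n mod r (or any size whose difference from n is a multiple of r) lets a backup window land exactly on n.

```python
def backup_window_sizes(n, r, initial=None):
    """List the sizes a backup window passes through when r elements are added
    per round, stopping once the size is at least n.

    By default the window starts at n mod r, as described for r > 1.
    """
    if initial is None:
        initial = n % r
    sizes = [initial]
    while sizes[-1] < n:
        sizes.append(sizes[-1] + r)
    return sizes


def reaches_n(n, r, initial):
    """True if a backup window starting at `initial` hits size n exactly."""
    return 0 <= initial <= n and (n - initial) % r == 0


print(backup_window_sizes(10, 3))             # [1, 4, 7, 10]: lands exactly on n
print(backup_window_sizes(10, 3, initial=0))  # [0, 3, 6, 9, 12]: skips past n
print(reaches_n(10, 3, 4))                    # True, since 10 - 4 is a multiple of 3
```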
[0067] Figure 3 illustrates a flowchart of an example method 300 for eliminating the accumulation of rounding errors in iterative computation on big data or streaming data. Method 300 will be described in conjunction with methods 200A, 200B, and 200C.
[0068] Method 300 includes initializing one or more components of a function on a pre-modification main computation window containing n (n > 1) groups of data from one or more data sources, each group containing k (k ≥ 1) data elements (301). Initialization here refers to computing one or more components of the function on the pre-modification main computation window in any way, such as iteratively, incrementally, decrementally, dynamically, or directly from the data elements in the window according to the definitions of the components. For example, referring to method 200A in Figure 2-1, the first main computation window contains n groups of data, each group containing a single data element. Computation flow 1 uses the earliest n data elements to initialize one or more components of the function (201). Methods 200A, 200B, and 200C demonstrate the simplest case, where k = 1.
[0069] Method 300 includes initializing one or more components of the function on each of l (l ≥ 1) pre-modification backup computation windows, wherein each pre-modification backup computation window contains the latest n (mod r) (r ≥ 1) groups of data and the windows start from different start points (302). Here, n (mod r) refers to n modulo r. For example, referring to method 200A in Figure 2-1, the computing system initializes one or more components of the function on one pre-modification backup computation window (l = 1). Because one group of data (r = 1) is added each time, n (mod r) = 0, so the pre-modification backup computation window contains no data elements and the one or more components can be initialized to zero. Similar to 200A, method 200B in Figure 2-2 has one additional computation flow, with one main computation window and two backup computation windows (i.e., l = 2) at any given time. Because one group of data (r = 1) is added each time, n (mod r) = 0, so the pre-modification backup computation windows contain no data elements and the one or more components can be initialized to zero. Similar to 200B, method 200C in Figure 2-3 has one more computation flow, with one main computation window and three backup computation windows (i.e., l = 3) at any given time. Because one group of data (r = 1) is added each time, n (mod r) = 0, so the pre-modification backup computation windows contain no data elements and the one or more components can be initialized to zero. Methods 200A, 200B, and 200C demonstrate the simplest cases, where k = 1, r = 1, and l equals 1, 2, and 3 respectively.
[0070] Method 300 includes accessing, from the one or more data sources, r groups of data elements to be added to the pre-modification main computation window, where each group contains k data elements (303). For methods 200A, 200B, and 200C, r = 1.
[0071] Method 300 includes storing the accessed r sets of data elements into one or more (from 1 to k) data buffers (304). This is an optional operation that is only performed if the one or more data sources include truly real-time data streams rather than streaming big data read from storage media.
[0072] Method 300 includes modifying the pre-modification main computation window and modifying the l pre-modification backup computation windows (305), including: removing the earliest r groups of data elements from the pre-modification main computation window and adding the r groups of data elements to be added to the pre-modification main computation window (306); and adding the r groups of data elements to be added to each pre-modification backup computation window and updating the corresponding window size counter of that pre-modification backup computation window (307).
[0073] Method 300 includes iteratively and incrementally computing one or more components of the function (308), including iteratively computing one or more components of the function on the modified main computation window based on one or more components of the function on the pre-modification main computation window (309), and incrementally computing one or more components of the function on the modified backup computation window corresponding to each pre-modification backup computation window based on one or more components of the function on that pre-modification backup computation window (310).
[0074] Method 300 includes generating, when the computation results are accessed or queried, one or more computation results of the function on the modified main computation window based on the one or more iteratively computed components (311).
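The difference between the iterative update (309) and the incremental update (310) can be illustrated with a small Python sketch. It assumes, purely for illustration, that the function is the population variance and that its components are the window size, the sum, and the sum of squares; the class name `Components` and this choice of components are hypothetical, and the sum-of-squares formula is used for brevity rather than numerical quality.

```python
from dataclasses import dataclass


@dataclass
class Components:
    """Illustrative components of a function (here: mean and variance)."""
    size: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def incremental(self, added):
        """Backup window update: only add the new elements (window grows)."""
        for x in added:
            self.size += 1
            self.total += x
            self.total_sq += x * x

    def iterative(self, removed, added):
        """Main window update: remove the oldest elements, add the newest
        ones, and keep the window size fixed."""
        if len(removed) != len(added):
            raise ValueError("main window must keep a fixed size")
        for x in removed:
            self.total -= x
            self.total_sq -= x * x
        for x in added:
            self.total += x
            self.total_sq += x * x

    def mean(self):
        return self.total / self.size

    def variance(self):
        m = self.mean()
        return self.total_sq / self.size - m * m


main = Components()
main.incremental([1.0, 2.0, 3.0, 4.0])      # initialize on the first window
main.iterative([1.0, 2.0], [5.0, 6.0])      # window becomes [3, 4, 5, 6]
print(main.mean(), main.variance())         # 4.5 1.25
```

Only the iterative update involves subtraction of previously added terms, which is where rounding errors can build up over many rounds; the incremental update on a backup window never subtracts.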
[0075] Method 300 includes determining whether the size of any one of the one or more backup computation windows reaches n-r (312).
[0076] If not, method 300 includes accessing r sets of data elements to begin the next round of iterative computation (303, 304, 305, 308, 311, 312, …).
[0077] If so, method 300 includes repeating steps (303, 304, 305, 308) and generating one or more computation results of the function on a computation window of size n based on one or more components of the function (313).
[0078] Method 300 includes swapping the roles of the modified main computation window and a modified backup computation window of size n (314). The swap is performed by setting the modified main computation window as a pre-modification backup computation window, making that pre-modification backup computation window contain the latest n (mod r) groups of data, setting its size to n (mod r), and initializing one or more components of the function on it (315), and by setting the modified backup computation window of size n as the pre-modification main computation window (316). Method 300 then includes accessing r groups of data elements to begin the next round of iterative computation (303, 304, 305, 308, 311, 312, …). For example, for 200A, 200B, and 200C, r = 1, so n (mod r) = 0 for any n; the modified main computation window is reset to empty (i.e., it contains no data elements), and one or more components of the function are initialized to 0.
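The overall loop of method 300 can be sketched for the case of one backup window (l = 1), k = 1, and a general step r. This is a simplified illustration under stated assumptions: the function is a sliding sum, the data source is a list, and the size check is made after each update (size equal to n) rather than one round ahead (size equal to n - r) as in step 312. The helper name `method_300_sums` is hypothetical, and the step numbers in the comments are an approximate mapping.

```python
def method_300_sums(data, n, r):
    """Sliding sum over the latest n elements, advancing r elements per round,
    with one backup window that periodically replaces the main window."""
    if n < 2 or not 1 <= r <= n or len(data) < n:
        raise ValueError("need n > 1, 1 <= r <= n, and at least n elements")
    m = n % r
    main = sum(data[:n])                  # 301: initialize on the first main window
    backup = sum(data[n - m:n])           # 302: latest n mod r elements (empty if m == 0)
    backup_size = m
    results = [main]
    for start in range(n, len(data) - r + 1, r):
        added = data[start:start + r]             # 303: access r new elements
        removed = data[start - n:start - n + r]   # earliest r elements of the main window
        main = main - sum(removed) + sum(added)   # 306, 309: iterative update
        backup += sum(added)                      # 307, 310: incremental update
        backup_size += r
        if backup_size == n:                      # both windows now hold the same data
            main = backup                         # 316: backup becomes the main window
            end = start + r
            backup = sum(data[end - m:end])       # 315: re-initialize on latest n mod r
            backup_size = m
        results.append(main)
    return results


print(method_300_sums(list(range(1, 21)), 5, 2))
# [15, 25, 35, 45, 55, 65, 75, 85]
```

Because the backup window starts at n mod r elements and grows by r per round, it reaches exactly n after (n - n mod r) / r rounds, at which point its component can replace the iteratively updated one.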
[0079] This invention can be implemented in other specific ways without departing from its spirit or essential characteristics. The implementations described in this application are entirely illustrative in all respects and not intended to limit the invention in any way, either formally or substantively. Therefore, the scope of the invention is defined by the appended claims rather than the foregoing description. It should be noted that those skilled in the art can make various improvements, additions, modifications, alterations, or equivalent changes to the disclosed technical content without departing from its spirit or essential characteristics; all such equivalent changes are equivalent embodiments of the invention. All changes equivalent to the meaning and scope of the claims are included within the scope of the claims.
Claims
1. A method for eliminating the accumulation of rounding errors in the iterative computation of a function, characterized in that:
a computing system based on one or more computing devices initializes one or more components of a function on a pre-modification main computation window for one or more data sources accessible by the computing system, wherein the pre-modification main computation window has a window size of n and contains n groups of data elements, n > 1, and each group of data elements contains k data elements from the one or more data sources, k ≥ 1;
the computing system initializes one or more components of the function on each of l pre-modification backup computation windows, where l ≥ 1;
the computing system accesses, from the one or more data sources, r groups of data elements to be added to the pre-modification main computation window, where r ≥ 1 and each group contains k data elements;
the computing system modifies the pre-modification main computation window by: removing the earliest r groups of data elements from the pre-modification main computation window; and adding the accessed r groups of data elements to the pre-modification main computation window;
the computing system modifies each of the l pre-modification backup computation windows, wherein modifying each pre-modification backup computation window includes adding the accessed r groups of data elements to that pre-modification backup computation window and modifying the value of its corresponding window size counter;
the computing system iteratively computes one or more components of the function on the modified main computation window based on one or more components of the function on the pre-modification main computation window;
the computing system incrementally computes one or more components of the function on the modified backup computation window corresponding to each of the l pre-modification backup computation windows, based on one or more components of the function on each of the l pre-modification backup computation windows;
the computing system determines whether any of the l modified backup computation windows has a size of n;
when any of the l modified backup computation windows reaches a size of n, the computing system generates one or more computation results of the function based on one or more components of the function on the modified main computation window or on the modified backup computation window with a size of n, and swaps the roles of the modified main computation window and the modified backup computation window with a size of n, so that the modified backup computation window becomes a pre-modification main computation window and the modified main computation window becomes a pre-modification backup computation window;
when none of the l modified backup computation windows has a size of n, the computing system generates one or more computation results of the function based on one or more components of the function on the modified main computation window.
2. The method according to claim 1, characterized in that... l = 1。 3. The method according to claim 1, characterized in that... l > 1。 4. The method according to claim 1, characterized in that... r = 1。 5. The method according to claim 1, characterized in that... r > 1。 6. A computing system, characterized in that: One or more computing devices; Each computing device contains one or more processors; One or more storage media; as well as One or more computing modules, when executed by at least one of one or more computing devices, eliminate the accumulation of rounding errors in the iterative computation of a function, wherein the one or more computing modules are configured to: a. Initialize one or more components of a function on a pre-modification main computation window that can be accessed by the computation system for one or more data sources. The pre-modification main computation window has a window size of n and contains n sets of data elements, n > 1, and each set of data elements contains k data elements from one or more data sources, k ≥ 1. b. Initialize one or more components of the function on each of the l pre- and post-modification standby calculation windows, where l ≥ 1; c. Access r groups of data elements to be added to the main calculation window before modification from one or more data sources, where r ≥ 1, and each group of data contains k data elements; d. Modify the original main calculation window by: Remove the earliest r group of data elements from the main calculation window before the modification; And add the accessed r groups of data elements to the main calculation window before the modification; e. Modify each of the l pre- and post-modification standby calculation windows respectively. 
Modifying each pre-modification backup computation window includes adding the accessed r groups of data elements to that window and correspondingly modifying the value of its window size counter;
f. iteratively compute one or more components of the function on the modified main computation window, based on one or more components of the function on the pre-modification main computation window;
g. for each of the l modified backup computation windows, incrementally compute one or more components of the function on that window, based on one or more components of the function on the corresponding pre-modification backup computation window;
h. determine whether any of the l modified backup computation windows has a size of n;
i. when any one of the l modified backup computation windows has a size of n, generate one or more computation results of the function based on one or more components of the function on the modified main computation window or on the modified backup computation window with a size of n, and swap the roles of the modified main computation window and the modified backup computation window with a size of n, so that the modified backup computation window becomes a pre-modification main computation window and the modified main computation window becomes a pre-modification backup computation window;
j. when none of the l modified backup computation windows has a size of n, generate one or more computation results of the function based on one or more components of the function on the modified main computation window.
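Steps a to j can be sketched for l ≥ 1 backup windows and a function with more than one component. The Python sketch below keeps r = 1 and k = 1, uses the window sum and the sum of squares as components, and returns the window mean and population variance. The names StaggeredWindows and init_components are hypothetical, and the staggering rule (backup window j starts on the newest (j * n) // l data elements) is an assumption made for illustration. The claim states only that each backup window is initialized and has a window size counter.

```python
from collections import deque


def init_components(values):
    """Components on a window: sum "s", sum of squares "q", size counter "size"."""
    return {
        "s": float(sum(values)),
        "q": float(sum(v * v for v in values)),
        "size": len(values),
    }


class StaggeredWindows:
    """Main window plus l backup windows (r = 1, k = 1)."""

    def __init__(self, initial, l):
        initial = list(initial)
        self.n = len(initial)
        if self.n < 2 or not 1 <= l <= self.n:
            raise ValueError("need n > 1 and 1 <= l <= n")
        self.data = deque(initial)
        self.main = init_components(initial)          # step a
        self.backups = []                             # step b
        for j in range(l):
            size = (j * self.n) // l                  # assumed staggering rule
            self.backups.append(init_components(initial[self.n - size:]))
        self.swaps = 0

    def push(self, x):
        # Steps c and d: access one new element and slide the main window.
        oldest = self.data.popleft()
        self.data.append(x)
        # Step f: iterative update of the main components.
        self.main["s"] = self.main["s"] - oldest + x
        self.main["q"] = self.main["q"] - oldest * oldest + x * x
        # Steps e and g: add to every backup window and update incrementally.
        for b in self.backups:
            b["s"] += x
            b["q"] += x * x
            b["size"] += 1
        # Steps h and i: a backup window that reaches size n swaps roles with
        # the main window; the new backup is reset (assumption of this sketch).
        for i, b in enumerate(self.backups):
            if b["size"] == self.n:
                self.main = b
                self.backups[i] = init_components([])
                self.swaps += 1
                break
        # Steps i and j: generate the results from the main components.
        mean = self.main["s"] / self.n
        return mean, self.main["q"] / self.n - mean * mean


sw = StaggeredWindows([1, 2, 3, 4], l=2)
results = [sw.push(x) for x in (5, 6, 7, 8)]
print(results[-1], sw.swaps)
```

Under this staggering assumption a role swap occurs roughly every n / l steps, so the main components never carry more than about n / l iterative updates.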
7. The computing system according to claim 6, characterized in that... l > 1.
8. The computing system according to claim 6, characterized in that... r > 1.
9. A computing system program product comprising a plurality of computing-device-executable instructions which, when executed by at least one computing device in a computing system comprising one or more computing devices, cause the computing system to perform a method, characterized in that:
The computing system initializes one or more components of a function on a pre-modification main computing window for one or more data sources accessible to the computing system, the pre-modification main computing window having a window size of n and containing n groups of data elements, n > 1, each group of data elements containing k data elements from the one or more data sources, k ≥ 1;
The computing system initializes one or more components of the function on each of l pre-modification backup computing windows, where l ≥ 1;
The computing system accesses, from the one or more data sources, r groups of data elements to be added to the pre-modification main computing window, where r ≥ 1 and each group contains k data elements;
The computing system modifies the pre-modification main computing window by removing the earliest r groups of data elements from it and adding the accessed r groups of data elements to it;
The computing system modifies each of the l pre-modification backup computing windows, respectively. Modifying each pre-modification backup computing window includes adding the accessed r groups of data elements to that window and correspondingly modifying the value of its window size counter.
The computing system iteratively computes one or more components of the function on the modified main computing window, based on one or more components of the function on the pre-modification main computing window;
The computing system incrementally computes, for each of the l modified backup computing windows, one or more components of the function on that window, based on one or more components of the function on the corresponding pre-modification backup computing window;
The computing system determines whether any of the l modified backup computing windows has a size of n;
When any of the l modified backup computing windows has a size of n, the computing system generates one or more computation results of the function based on one or more components of the function on the modified main computing window or on the modified backup computing window with a size of n, and swaps the roles of the modified main computing window and the modified backup computing window with a size of n, so that the modified backup computing window becomes a pre-modification main computing window and the modified main computing window becomes a pre-modification backup computing window;
When none of the l modified backup computing windows has a size of n, the computing system generates one or more computation results of the function based on one or more components of the function on the modified main computing window.
10. A computing system program product comprising a plurality of computing device executable instructions, which, when executed by at least one computing device in a computing system comprising one or more computing devices and one or more storage devices, cause the computing system to implement the method as described in any one of claims 1-5.
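The effect the claims aim at can be seen in a small numerical experiment. The Python sketch below is an illustration under stated assumptions, not the patented implementation: the helper window_sums and the input stream are hypothetical, the window size is n = 2, and one backup window (l = 1, r = 1) is assumed to start empty and to be reset after each role swap. The stream is chosen so that the initial sum loses the small element to rounding in IEEE 754 double precision.

```python
import math


def window_sums(stream, n, use_backup):
    """Sliding sums of the last n elements, with or without role swapping."""
    total = 0.0
    for v in stream[:n]:
        total += v                  # initialization on the first window
    backup, size = 0.0, 0           # backup component and its size counter
    out = [total]
    for i in range(n, len(stream)):
        # Iterative update: subtract the earliest element, add the newest one.
        total = total - stream[i - n] + stream[i]
        if use_backup:
            # Incremental update of the backup window (additions only).
            backup += stream[i]
            size += 1
            if size == n:
                # Role swap: adopt the freshly built sum, reset the backup.
                total, backup, size = backup, 0.0, 0
        out.append(total)
    return out


stream = [1e16, 1.0, 1.0, 1.0, 1.0]
naive = window_sums(stream, 2, use_backup=False)
swapped = window_sums(stream, 2, use_backup=True)
exact = math.fsum(stream[-2:])      # correctly rounded reference sum
print(naive[-1], swapped[-1], exact)
```

In this run the purely iterative sum stays at 1.0 for every later window, because the 1.0 lost when it was added to 1e16 is never recovered, while the role-swapping variant returns to the correct value 2.0 as soon as the backup window reaches size n.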