Method, apparatus, and system for predicting continuous sequence

The method addresses scalability and temporal non-uniformity in time-series forecasting by employing a mean-field game framework with neural graphons, reducing computational complexity and improving predictive accuracy.

WO2026127370A1 · PCT designated stage · Publication Date: 2026-06-18 · LG MANAGEMENT DEV INST CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: LG MANAGEMENT DEV INST CO LTD
Filing Date: 2025-10-31
Publication Date: 2026-06-18

Application Information

Patent Timeline

31 Oct 2025: Application

18 Jun 2026: Publication (WO2026127370A1)

IPC: G06F16/2458; G06F16/242; G06F16/248; G06F16/22; G06F16/25; G06F16/28; G06N20/00; G06F123/02

AI Tagging

Application Domain

Database management systems Machine learning

AI Technical Summary

⚠Technical Problem

Existing time-series forecasting methods struggle with scalability and temporal non-uniformity, particularly in handling spatiotemporal causality, leading to inefficient computational processes.

⚗Method used

A method utilizing a mean-field game framework with neural graphons to model continuous time series, employing a mean-field predictor and neural graphon structures to handle infinite-dimensional data, reducing computational complexity through gradient descent-based solutions.

🎯Benefits of technology

Enables accurate and efficient prediction of future events in non-uniform time series data by minimizing computational load and capturing complex inductive biases, enhancing predictive accuracy.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure KR2025017718_18062026_PF_FP_ABST

Abstract

An embodiment of the present disclosure may provide a system comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the system to perform operations, wherein the operations comprise: an operation of acquiring an observation sequence; an operation of generating, with the observation sequence and time information as inputs, a latent sequence at a first time point by using an encoder module; an operation of generating a latent sequence at a future time point by inputting, into a first linear operation module, a first parameter regarding the coefficients of a linear stochastic differential equation and the latent sequence at the first time point; and an operation of converting the latent sequence at the future time point into a prediction sequence and outputting the same, by using a decoder module.

Description

Continuous sequence prediction method, device, and system

[0001] The present disclosure relates to a method, apparatus, and system for predicting a continuous sequence, and more specifically, to a system, apparatus, and method for increasing prediction speed by reducing the amount of computation.

[0002] Time-series data refers to data recorded sequentially over time. Predicting the future by analyzing observed time-series data is the time-series forecasting problem. However, despite extensive recent research, there is currently no forecaster that is both applicable and scalable with respect to temporal non-uniformity and spatiotemporal causality.

[0003] [Prior Art Literature]

[0004] [Non-patent literature]

[0005] Chankyu Lee et al., "NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models", arxiv: 2405.17428, 27 May 2024

[0006] One embodiment of the present disclosure aims to provide a continuous sequence prediction method, apparatus, and system capable of reducing the number of operations and deriving rapid results when predicting the future using past time series data.

[0007] One embodiment of the present disclosure may provide a system performed by a computer comprising: at least one processor; and at least one memory storing instructions that cause the system to perform operations when executed by the at least one processor, wherein the operations include: an operation of acquiring an observation sequence; an operation of generating a latent sequence at a first time point using an encoder module with the observation sequence as input; an operation of generating a latent sequence at a second time point by inputting a first parameter relating to the coefficients of a linear stochastic differential equation and the latent sequence at the first time point to a first linear operation module; and an operation of converting the latent sequence at the second time point into a prediction sequence and outputting it using a decoder module, wherein the value of a third timestamp of the latent sequence at the second time point is determined based on the value of a first timestamp of the observation sequence, the value of a first timestamp of the first parameter, and the value of a second timestamp of the first parameter.

[0008] In one embodiment, the value of the third timestamp of the latent sequence at the second time point may be determined based on the first and second timestamp values of the first parameter sequence and on the first timestamp value of the observation sequence.

[0009] In one embodiment, the operation of generating a latent sequence at a second time point may include: defining the propagation of the latent sequence at the first time point to a future time point as a transition operator corresponding to a continuous-time linear stochastic dynamics model; restructuring the transition operator into a chain of operators that satisfies associativity; and generating the latent sequence at the second time point by performing a parallel operation on the chain of operators.

[0010] In one embodiment, the total number of iterations of the parallel operation may be limited to log N or less with respect to the input sequence length N.
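Paragraphs [0009] and [0010] rely on the fact that composing linear transition steps is associative, so N steps can be combined in about log N parallel rounds instead of N sequential ones. The following is a minimal sketch, assuming a scalar latent state, Ornstein-Uhlenbeck-style mean dynamics dz = -theta (z - mu) dt with irregular step sizes (noise omitted), and a Hillis-Steele parallel prefix scan. The names `theta`, `mu`, and `parallel_affine_scan` are hypothetical; the disclosure does not specify this particular scan.

```python
import math

import numpy as np


def sequential_rollout(a, b, z0):
    """Reference: apply z_{k+1} = a_k * z_k + b_k one step at a time (N dependent steps)."""
    z, out = z0, []
    for a_k, b_k in zip(a, b):
        z = a_k * z + b_k
        out.append(z)
    return np.array(out)


def parallel_affine_scan(a, b):
    """Hillis-Steele inclusive scan over affine maps z -> a*z + b.

    Applying (a1, b1) and then (a2, b2) equals (a2*a1, a2*b1 + b2); this composition
    is associative, so prefixes can be combined in ceil(log2 N) rounds.
    Returns (A, B, rounds) with z_{k+1} = A[k] * z0 + B[k].
    """
    A = np.asarray(a, dtype=float).copy()
    B = np.asarray(b, dtype=float).copy()
    n, offset, rounds = len(A), 1, 0
    while offset < n:
        # Element k (later segment) absorbs element k - offset (earlier segment).
        # All updates within one round are independent and could run in parallel.
        A_new = A[offset:] * A[:-offset]
        B_new = A[offset:] * B[:-offset] + B[offset:]
        A[offset:], B[offset:] = A_new, B_new
        offset *= 2
        rounds += 1
    return A, B, rounds


# Hypothetical OU-style mean dynamics with non-uniform step sizes dt_k.
theta, mu, z0 = 0.7, 2.0, 0.0
dt = np.array([0.1, 0.5, 0.05, 0.3, 0.2, 0.8, 0.15, 0.4])
a = np.exp(-theta * dt)   # exact decay factor of the linear SDE mean over dt_k
b = mu * (1.0 - a)

A, B, rounds = parallel_affine_scan(a, b)
print(rounds, math.ceil(math.log2(len(dt))), np.allclose(A * z0 + B, sequential_rollout(a, b, z0)))
# prints: 3 3 True
```

Because the step sizes enter only through the per-step coefficients, irregular timestamps do not change the scan itself.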

[0011] In one embodiment, the operations further include an operation of acquiring time information corresponding to the observation sequence, and the operation of generating a latent sequence at the first time point may include: an operation of combining a condition parameter with the observation sequence; and an operation of generating a latent sequence at the first time point using an encoder module with the observation sequence combined with the condition parameter and the time information as inputs.

[0012] In one embodiment, the operations further include an operation of acquiring time information corresponding to the observation sequence, and the operation of generating a latent sequence at the first time point may include: an operation of combining a condition parameter with the observation sequence; and an operation of generating a latent sequence at the first time point using an encoder module that takes as inputs the past observation sequence combined with the condition parameter and the time information combined with the observation sequence and the condition parameter.
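One way to read [0011] and [0012] is that the condition parameter is attached to every observation and the timestamps are fed alongside them. The following is a minimal sketch under that assumption: a single linear layer with tanh stands in for the encoder module, and the time encoding (raw t, sin t, cos t), the function `encode`, and all shapes are hypothetical choices, not the disclosed architecture.

```python
import numpy as np


def encode(observations, times, condition, weight, bias):
    """Hypothetical encoder: [observation | condition | time features] -> latent vector per step."""
    n = observations.shape[0]
    cond = np.broadcast_to(condition, (n, condition.shape[0]))           # same condition for every step
    time_feat = np.stack([times, np.sin(times), np.cos(times)], axis=1)  # assumed time encoding
    x = np.concatenate([observations, cond, time_feat], axis=1)
    return np.tanh(x @ weight + bias)


rng = np.random.default_rng(0)
obs = rng.standard_normal((6, 2))          # 6 observations with 2 channels
times = np.sort(rng.uniform(0.0, 5.0, 6))  # non-uniform timestamps
cond = np.array([1.0, 0.0, 0.5])           # hypothetical condition parameter
d_in, d_latent = 2 + 3 + 3, 4
W = rng.standard_normal((d_in, d_latent)) * 0.1
b0 = np.zeros(d_latent)
z = encode(obs, times, cond, W, b0)        # latent sequence at the first time point, shape (6, 4)
```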

[0013] In one embodiment, the decoder module includes a linear operation module, a non-linear activation module, and an adaptive layer normalization module, and the adaptive layer normalization module can generate the prediction sequence by taking condition parameters into account.
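A decoder that combines a linear operation, a non-linear activation, and condition-aware normalization can be sketched as follows. This assumes the adaptive normalization behaves like adaptive layer normalization (AdaLN), where a per-feature scale and shift are predicted linearly from the condition parameter. The module order, the ReLU activation, the output projection `w_out`, and all parameter names are illustrative assumptions.

```python
import numpy as np


def adaptive_layer_norm(h, condition, w_scale, w_shift, eps=1e-5):
    """Normalize each row of h, then apply a scale and shift predicted from the condition."""
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    h_norm = (h - mean) / np.sqrt(var + eps)
    gamma = 1.0 + condition @ w_scale   # condition-dependent scale (identity when weights are zero)
    beta = condition @ w_shift          # condition-dependent shift
    return gamma * h_norm + beta


def decode(z, condition, params):
    h = z @ params["w1"] + params["b1"]   # linear operation module
    h = np.maximum(h, 0.0)                 # non-linear activation (ReLU assumed)
    h = adaptive_layer_norm(h, condition, params["w_scale"], params["w_shift"])
    return h @ params["w_out"]             # assumed projection to output channels


rng = np.random.default_rng(0)
params = {
    "w1": rng.standard_normal((4, 8)) * 0.5,
    "b1": np.zeros(8),
    "w_scale": rng.standard_normal((3, 8)) * 0.1,
    "w_shift": rng.standard_normal((3, 8)) * 0.1,
    "w_out": rng.standard_normal((8, 2)) * 0.5,
}
z = rng.standard_normal((6, 4))          # latent sequence at the future time point
cond = np.array([1.0, 0.0, 0.5])         # hypothetical condition parameter
y_hat = decode(z, cond, params)          # prediction sequence, shape (6, 2)
```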

[0014] In one embodiment, the first parameter sequence can be generated from a condition parameter.

[0015] One embodiment of the present disclosure may provide a graphics processing device comprising: an encoder module that generates a latent sequence at a first time point by taking an observation sequence as input; a first linear operation module that generates a latent sequence at a second time point by taking as input a first parameter relating to the coefficients of a linear stochastic differential equation and the latent sequence at the first time point; and a decoder module that converts the latent sequence at the second time point into a prediction sequence and outputs it, wherein the value of a third timestamp of the latent sequence at the second time point is determined based on the value of a first timestamp of the observation sequence, the value of a first timestamp of the first parameter, and the value of a second timestamp of the first parameter.

[0016] One embodiment of the present disclosure may provide a method comprising: an operation of acquiring an observation sequence; an operation of generating a latent sequence at a first time point using an encoder module with the observation sequence as input; an operation of generating a latent sequence at a second time point by inputting a first parameter regarding the coefficients of a linear stochastic differential equation and the latent sequence at the first time point into a linear module; and an operation of converting the latent sequence at the second time point into a prediction sequence and outputting it using a decoder module, wherein the value of a third timestamp of the latent sequence at the second time point is determined based on the value of a first timestamp of the observation sequence, the value of a first timestamp of the first parameter, and the value of a second timestamp of the first parameter.

[0017] One embodiment of the present disclosure includes a program stored on a recording medium to execute a method according to one embodiment of the present disclosure on a computer.

[0018] One embodiment of the present disclosure includes a computer-readable recording medium having a program for executing a method according to one embodiment of the present disclosure on a computer.

[0019] One embodiment of the present disclosure includes a computer-readable recording medium that records a database used in one embodiment of the present disclosure.

[0020] According to one embodiment of the present disclosure, the computational load of the predictor can be effectively reduced.

[0021] FIG. 1a is a diagram illustrating a method in which past information is sampled at an arbitrary time distribution according to one embodiment of the present disclosure.

[0022] FIG. 1b is a diagram illustrating a method for generating future predictions through propagation according to one embodiment of the present disclosure.

[0023] FIG. 2a is a drawing showing an example of an exponential graphon according to one embodiment of the present disclosure.

[0024] FIG. 2b is a drawing showing an example of a cosinusoidal graphon according to one embodiment of the present disclosure.

[0025] FIG. 3 is a flowchart of a prediction method using mean field theory according to one embodiment of the present disclosure.

[0026] FIG. 4 is a diagram showing the gradient system of a mean-field predictor associated with the updated parameters of neural agents in the m-th iteration step according to one embodiment of the present disclosure.

[0027] FIG. 5 is a block diagram of a computing system that performs a method for predicting future data according to one embodiment of the present disclosure.

[0028] FIG. 6 is a block diagram of a computing device, which is one of the components of a computing system that performs a method for predicting future data according to one embodiment of the present disclosure.

[0029] FIG. 7 is a block diagram of another aspect of a computing device, which is one of the components of a computing system that performs a method for predicting future data according to one embodiment of the present disclosure.

[0030] FIG. 8 is a flowchart of a method for predicting the outlook of a target through a machine learning model according to one embodiment of the present disclosure.

[0031] FIG. 9 is a diagram showing a meta-architecture for performing a method of predicting the outlook of a target according to one embodiment of the present disclosure.

[0032] FIG. 10 is a diagram illustrating the process of executing a target prediction task and determining relationship information with a target prediction variable according to one embodiment of the present disclosure.

[0033] FIG. 11 is a diagram illustrating a process of performing data preparation based on relationship information according to one embodiment of the present disclosure.

[0034] FIG. 12 is a diagram illustrating a process of converting unstructured data into quantitative data according to one embodiment of the present disclosure.

[0035] FIG. 13 is a diagram illustrating a process of integrating structured data and quantitative data to calculate a target outlook according to one embodiment of the present disclosure.

[0036] FIG. 14 is a diagram illustrating the process of deriving the basis for a target forecast according to one embodiment of the present disclosure and the process of predicting an additional target forecast based on a user's simulation.

[0037] FIG. 15 is an example of a chart for a predicted target outlook according to one embodiment of the present disclosure.

[0038] FIG. 16 is an example of a causal relationship graph presented as supporting data for a predicted target outlook according to one embodiment of the present disclosure.

[0039] FIG. 17 is another example of a causal relationship graph presented as supporting data for a predicted target outlook according to one embodiment of the present disclosure.

[0040] FIG. 18 is a flowchart of a method for performing a hypothetical situation simulation according to one embodiment of the present disclosure.

[0041] FIG. 19 is a graph relating to general results and assumption results according to one embodiment of the present disclosure.

[0042] FIG. 20a is a diagram illustrating a time series forecasting method of a transformer model in which one observation becomes one token according to one embodiment of the present disclosure.

[0043] FIG. 20b is a diagram illustrating a time series forecasting method for a segment-based transformer model according to one embodiment of the present disclosure.

[0044] FIG. 21 is a schematic diagram of an ESSformer (Efficient Segment-based Sparse Transformer) block according to one embodiment of the present disclosure.

[0045] FIG. 22 is a flowchart illustrating a method for generating prediction data according to one embodiment of the present disclosure.

[0046] FIG. 23 is an exemplary on-premises full-stack structure according to one embodiment of the present disclosure.

[0047] FIG. 24 is a schematic diagram of a time series forecasting system according to one embodiment of the present disclosure.

[0048] FIG. 25 is a diagram showing the operation method of the first linear operation module.

[0049] FIG. 26 is a diagram illustrating a method of operation of a first linear operation module according to one embodiment of the present disclosure.

[0050] FIGS. 27a to 27c are schematic diagrams of a time series prediction system using condition parameters according to one embodiment of the present disclosure.

[0051] To clarify the technical concept of the present disclosure, embodiments of the present disclosure will be described in detail with reference to the attached drawings. In describing the present disclosure, detailed descriptions of related known functions or components will be omitted if it is determined that such detailed descriptions would unnecessarily obscure the essence of the present disclosure. Components having substantially the same functional configuration among the drawings have been assigned the same reference numerals and symbols as much as possible, even if they are shown in different drawings. For convenience of explanation, devices and methods will be described together where necessary. Each operation of the present disclosure does not necessarily have to be performed in the order described and may be performed in parallel, selectively, or individually.

[0052] The terms used in the embodiments of this disclosure have been selected to be as widely used and general as possible, taking into account the functions of this disclosure; however, these terms may vary depending on the intent of those skilled in the art, case law, the emergence of new technologies, etc. Additionally, in specific cases, terms have been selected at the applicant's discretion, and in such cases, their meanings will be described in detail in the description of the relevant embodiments. Therefore, terms used in this specification should be defined not merely by their names, but based on their meanings and the overall content of this disclosure.

[0053] Throughout this disclosure, singular expressions may include plural expressions unless the context clearly indicates otherwise. Terms such as “comprising” or “having” are intended to specify the presence of features, numbers, steps, actions, components, parts, or combinations thereof, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof. That is, throughout this disclosure, when a part is described as “comprising” a certain component, it means that, unless specifically stated otherwise, it does not exclude other components but may include additional components.

[0054] Expressions such as "at least one" modify the entire list of components and do not modify the components of the list individually. For example, "at least one of A, B, and C" and "at least one of A, B, or C" refer to only A, only B, only C, both A and B, both B and C, both A and C, all of A, B, and C, or any combination thereof.

[0055] Additionally, terms such as “...part,” “...module,” etc., as described in this disclosure refer to a unit that processes at least one function or operation, and may be implemented in hardware or software, or a combination of hardware and software.

[0056] Throughout the entire disclosure, when a part is described as being “connected” to another part, this includes not only cases where they are “directly connected” but also cases where they are “electrically connected” with other elements interposed between them. Furthermore, when a part is described as “comprising” a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may include additional components.

[0057] As used throughout this disclosure, the expression “configured to” may be replaced, depending on the context, with, for example, “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of.” The term “configured to” does not necessarily mean only “specifically designed to” in hardware. Instead, in some situations, the expression “system configured to” may mean that the system is “capable of” doing something together with other devices or components. For example, the phrase “a processor configured (or set) to perform A, B, and C” may mean a dedicated processor for performing said operations (e.g., an embedded processor), or a general-purpose processor (e.g., a CPU or an application processor) capable of performing said operations by executing one or more software programs stored in memory.

[0058] Throughout this disclosure, the expression [N : M] is a set of integers from N to M, wherein N is included and M is not included. That is, [N : M] may mean {N, N+1, ..., M-1}.
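The half-open interval [N : M] has the same semantics as Python's built-in `range(N, M)`. The helper name `half_open` below is only for illustration.

```python
def half_open(n, m):
    """[N : M] = {N, N+1, ..., M-1}: N is included, M is excluded."""
    return list(range(n, m))


print(half_open(2, 6))  # [2, 3, 4, 5]
```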

[0059] Modeling spatiotemporal processes can enhance the ability to predict the behavior of complex systems and provide deep insights. Although neural differential equation models have recently been proposed to model spatiotemporal processes, they have not yet resolved how to handle the substantial computation required to process an infinite (or effectively infinite) number of observations obtained by finely subdividing time intervals. Therefore, one embodiment of the present disclosure aims to directly model future data over continuous intervals of infinite (or effectively infinite) complexity, to develop an infinite-dimensional (or effectively infinite-dimensional) predictive decision framework using a mean-field game, and to provide a generalization of differential equation models.

[0060] Throughout this disclosure, "observation" may refer to an act of acquiring, obtaining, verifying, recording, etc., the state, events, signals, metadata, etc. of a target by an agent. However, observation is not limited thereto and may be a concept that includes measurement, detection, tracking, sampling, collection, logging, accumulation, storage, aggregation, etc. Accordingly, throughout this disclosure, "observation data" or "observation sequence" may refer to the output of an observation act.

[0061] FIG. 1a is a diagram illustrating a method in which past information is sampled at an arbitrary time distribution according to one embodiment of the present disclosure.

[0062] Referring to FIG. 1a, a processor may generate multiple observation data by sampling past data at an arbitrary time distribution. Throughout the present disclosure, observation data may be referred to as observations. In the example of FIG. 1a, past data may be sampled at times t1, t2, t3, and t4. In this case, the time intervals between t1, t2, t3, and t4 do not have to be uniform. One embodiment of the present disclosure aims to provide a system that enables accurate prediction even in irregular observations where the time intervals of sampling (e.g., the intervals between t1, t2, t3, and t4) are not uniform.
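Sampling past data at an arbitrary time distribution, as in FIG. 1a, can be illustrated by drawing sorted random timestamps and reading the signal there. This is a minimal sketch, assuming uniformly distributed sampling times and a sine wave as the hypothetical past signal; `sample_irregular` is an illustrative name.

```python
import numpy as np


def sample_irregular(signal, t_start, t_end, n_obs, seed=0):
    """Draw n_obs sorted timestamps uniformly at random (assumed) and observe the signal there."""
    rng = np.random.default_rng(seed)
    t = np.sort(rng.uniform(t_start, t_end, n_obs))
    return t, signal(t)


t, y = sample_irregular(np.sin, 0.0, 10.0, n_obs=4)  # observations at t1, t2, t3, t4
gaps = np.diff(t)  # the intervals between t1..t4 are generally not equal
```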

[0063] In one embodiment, a plurality of observation data may be generated based on past data sampled at an arbitrary time distribution. The labels of the past observation data may be represented by u. For example, a sequence of infinitely many labels can be set conditionally over the past observation interval. For example, observation data label u1 at time t1, observation data label u2 at time t2, observation data label u3 at time t3, and observation data label u4 at time t4 can be generated. In one embodiment, p(u) is a label distribution that provides a continuous representation of past observation data.

[0064] In one embodiment, a probability measure [symbol missing in source] can provide a continuous representation of past observation data by concisely expressing the dynamic laws of the system. Specifically, it can be defined as [expression missing in source], where [symbol missing in source] is a measure for label v at time t, and O×T is a set representation of labels and time.

[0065] In one embodiment, [symbol missing in source] can represent the mean-field predictor with label u and its state variable at time t. It can include continuous information from its initialization in the past observation data [symbol missing in source] up to time t.

[0066] In one embodiment, the neural graphon is denoted W and can be used to model continuous time-series data. [symbol missing in source] is defined as a measure-valued function of v and, combined with W, can be represented as [expression missing in source]. Each spatiotemporal dynamic can be interconnected through a neural graphon that utilizes an inductive bias tailored to sequential data.

[0067] In one embodiment, an Euler-Maruyama sampling method for graphon-interacting particles may be used to generate a set of mean-field predictors at each timestamp. In one embodiment, [Algorithm 1] below is an algorithm for sampling mean-field predictors; it assumes that, in a gradient system operating with FBSDEs (forward-backward stochastic differential equations), the predictors are optimal from the perspective of mean-field equilibrium.

[0068] [Algorithm 1]

[0069]
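The body of [Algorithm 1] is not reproduced in this text. As a generic illustration only (not Algorithm 1 itself), Euler-Maruyama simulation of graphon-interacting particles can be sketched as follows. It assumes the interaction drift (1/N) sum_j W(u_i, u_j)(X_j - X_i), an exponential graphon with a fixed length scale, constant noise level `sigma`, and labels spread uniformly on [0, 1]; the optimized neural-agent control of the disclosure is omitted.

```python
import numpy as np


def exponential_graphon(u, v, length_scale=0.5):
    """Assumed graphon W(u, v) = exp(-|u - v| / length_scale), evaluated on all label pairs."""
    return np.exp(-np.abs(u[:, None] - v[None, :]) / length_scale)


def euler_maruyama_particles(x0, labels, t_grid, sigma=0.1, seed=0):
    """Simulate dX_i = (1/N) sum_j W(u_i, u_j)(X_j - X_i) dt + sigma dB_i on a possibly irregular grid."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = x.shape[0]
    w = exponential_graphon(labels, labels)          # (n, n), symmetric
    path = [x.copy()]
    for t_prev, t_next in zip(t_grid[:-1], t_grid[1:]):
        dt = t_next - t_prev
        drift = (w @ x - w.sum(axis=1) * x) / n      # graphon-weighted pull toward other particles
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
        path.append(x.copy())
    return np.stack(path)


labels = np.linspace(0.0, 1.0, 50)                    # labels u_i (assumed uniform on [0, 1])
x0 = np.sin(2 * np.pi * labels)
t_grid = np.array([0.0, 0.1, 0.15, 0.4, 0.45, 1.0])   # irregular timestamps
path = euler_maruyama_particles(x0, labels, t_grid)   # shape (len(t_grid), 50)
```

Because W is symmetric, the interaction drift sums to zero across particles, so without noise the particle mean is preserved.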

[0070] In one embodiment, because of its infinite (or effectively infinite) dimensional nature, sampling a mean-field predictor can introduce inherent, complex errors when applied to finite-dimensional real-world datasets. The mean-field predictors (MFPs) sampled by [Algorithm 1] and the infinite-dimensional MFPs can be defined as follows.

[0071] - MFPs by [Algorithm 1]:

[0072] - Infinite-dimensional MFPs:

[0073] Here, [symbol missing in source] is a sampled predictor variable obtained by running [Algorithm 1], and the weighted sum [symbol missing in source] can approximate the actual collective prediction performed by the mean-field predictor variables [symbol missing in source].

[0074] For arbitrary [symbol missing in source], if [symbol missing in source] is a probability measure, then with constants c, c7, c8, c9 > 0 and w > 0, and for [symbol missing in source], the squared 2-Wasserstein distance can be controlled in probability as follows.

[0075]

[0076] The above bound describes the distribution of prediction results when N samples are taken. According to one embodiment, as N increases, that is, as more samples are taken, the reliability of the prediction results increases.
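The trend in [0076], where empirical measures built from more samples lie closer to the underlying distribution, can be checked numerically in one dimension. This sketch assumes a standard normal target and approximates the squared 2-Wasserstein distance by matching sorted samples to normal quantiles at (i + 0.5) / N. It illustrates concentration of empirical measures generally and is not the bound stated above.

```python
from statistics import NormalDist

import numpy as np


def w2_squared_to_normal(samples):
    """Approximate W2^2 between the empirical measure of 1-D samples and N(0, 1)
    by matching sorted samples to quantiles at (i + 0.5) / N."""
    x = np.sort(samples)
    n = len(x)
    q = np.array([NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)])
    return float(np.mean((x - q) ** 2))


rng = np.random.default_rng(0)
for n in (10, 100, 1000, 10000):
    vals = [w2_squared_to_normal(rng.standard_normal(n)) for _ in range(5)]
    print(n, np.mean(vals))  # the average distance tends to shrink as n grows
```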

[0077] FIG. 1b is a diagram illustrating a method for generating future predictions through propagation according to one embodiment of the present disclosure.

[0078] Referring to FIG. 1b, the processor can calculate the propagation signal of each of the plurality of observation data based on mean-field theory.

[0079] In one embodiment, mean-field theory (which may be referred to as the mean-field principle or mean-field game throughout this disclosure) can be used as a tool to probabilistically model and analyze how a large number of interacting agents behave in a dynamic, distributed environment. In the mean-field setting, many agents can reach a Nash equilibrium by individually adjusting their dynamics to partially observed historical sequence data and by collectively interacting with one another to make optimal group decisions that predict future events. One embodiment of this disclosure aims to recast the problem of predicting a continuous-time sequence in the formal setting of a mean-field game.

[0080] In one embodiment, a mean-field graphon stochastic differential equation (SDE) may be used as a new framework for modeling a sequence predictor.

[0081] The mean-field graphon stochastic differential equation can be defined as in [Definition 1] below. [Equation 1], defined in [Definition 1], is a stochastic differential equation (SDE) designed to represent a continuous signal of infinite order by integrating inductive biases used in time-series modeling.

[0082] [Definition 1]

[0083]

[0084] [Mathematical Equation 1 (or Eq (1))]

[0085]

[0086] In one embodiment, [Equation 1] is an equation that focuses on 1) a mean-field predictor and 2) a neural graphon, which are important for comprehensive and continuous time series modeling, and these mean-field predictors and neural graphons will be explained in detail below.

[0087] 1) Mean-field Prediction System

[0088] A system according to one embodiment of the present disclosure may include two types of continuity encodings. For example, the system may include an encoding for positionality (t) and an encoding for labeling (u).

[0089] In one embodiment, the continuum of predictors, i.e., the mean-field predictors (MFPs), can be expressed as the state variables [symbol missing in source] in [Equation 1]. Each state variable is labeled by u ~ p(u) and can represent a set of continuous information trajectories initialized from the past observations [symbol missing in source]. For example, in the mean-field system, the continuum of predictors for infinitely many independent and identically distributed labels [symbol missing in source] can be conditioned, according to the past observation interval (i.e., the label distribution p(u)), by the future causal effects that produce the future event interval [symbol missing in source] obtained from [Equation 1].

[0090] According to one embodiment of the present disclosure, since both the input and the output are processed in a continuous manner, a continuous signal can be appropriately processed through [Equation 1]. In processing a continuous signal, a closed-loop Markovian control process parameterized by a neural network [symbol missing in source] can be referred to as a neural agent, and it can control the trajectory of the state [symbol missing in source].

[0091] In one embodiment of the present disclosure, the trajectory of the predictor can be corrected by determining the optimal neural agent [symbol missing in source] that brings the prediction closest to the target interval. The collective behavior of the mean-field predictor can be captured through the aggregation of these decisions.

[0092] 2) Neural Graphon

[0093] Basic assumptions regarding inductive biases, such as temporal decay, cycles, and seasonality, are essential in time series modeling. To incorporate these into the mean-field system of the present disclosure, neural graphons may be utilized. Neural graphons are graphon structures parameterized by neural networks that can capture the inherent heterogeneity among predictor variables. In one embodiment, neural graphons may include exponential graphons, cosinusoidal graphons, etc.

[0094] In one embodiment, a neural graphon may be defined as in [Definition 2] below.

[0095] [Definition 2]

[0096]

[0097] In one embodiment, for two tuples [symbol missing in source] and [symbol missing in source], the symmetric function [symbol missing in source] can measure the scaled relative difference between the spatial features x and y. In addition, the neural agent [symbol missing in source] can adjust the importance of this dissimilarity by rescaling the magnitude of the projected vector. The neural graphon W can encode the degree of interaction between the time variables u and v. Among the various available graphon designs, an exponential graphon (e.g., FIG. 2a) and a cosinusoidal graphon (e.g., FIG. 2b), known for inductive biases specialized for continuous-time series, may be used. According to one embodiment of the present disclosure, the graphon structure allows the inductive bias to be modeled directly in the data space rather than in a latent feature space.
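The two graphon families named above can be illustrated with fixed, hand-chosen functional forms. This sketch assumes W_exp(u, v) = exp(-decay |u - v|) for temporal decay and W_cos(u, v) = (1 + cos(2 pi (u - v) / period)) / 2 for periodicity; in a neural graphon, parameters such as `decay` or `period` could instead be produced by a network, which is not shown. Both forms are symmetric and bounded in [0, 1].

```python
import numpy as np


def exponential_graphon(u, v, decay=3.0):
    """Assumed form: interaction decays exponentially with label distance |u - v|."""
    return np.exp(-decay * np.abs(u - v))


def cosinusoidal_graphon(u, v, period=0.25):
    """Assumed form: interaction peaks whenever u - v is a multiple of the period."""
    return 0.5 * (1.0 + np.cos(2.0 * np.pi * (u - v) / period))


grid = np.linspace(0.0, 1.0, 101)
U, V = np.meshgrid(grid, grid, indexing="ij")
W_exp = exponential_graphon(U, V)    # values in (0, 1], symmetric
W_cos = cosinusoidal_graphon(U, V)   # values in [0, 1], symmetric
```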

[0098] According to one embodiment of the present disclosure, by extending existing differential equation models, the stochastic spatiotemporal dynamics of an infinite continuum of agents can be effectively captured using assumptions from time-series analysis (e.g., seasonality).

[0099] In one embodiment, the processor can generate forward and backward propagation signals for each of a plurality of observation data using a neural graphon, which is a symmetric integrable function.

[0100] In one embodiment, to efficiently solve the mean-field game, the processor may solve the forward-backward stochastic differential equations (FBSDEs) using gradient descent, which can significantly reduce the computational complexity of approximating the Nash equilibrium. The processor may generate propagation signals for each of the multiple observation data by solving the differential equations using gradient descent. Throughout this disclosure, generating propagation signals may include deriving values obtained by solving the differential equations.

[0101] In one embodiment, the processor can apply updates to the neural agents using a gradient-descent-based algorithm. In one embodiment, for a fixed flow of measures [symbol missing in source] and a fixed label u at each step m, the triple of processes (X^u(t), Y^u(t), Z^u(t)) forming a gradient system that solves the forward-backward stochastic differential equation associated with the graphon system of [Equation 1] can be defined as in [Definition 3] below.

[0102] [Definition 3]

[0103]

[0104] Accordingly, (X^u(t), Y^u(t), Z^u(t)) = [expression missing in source] can be obtained.

[0105] In one embodiment, the gradient system can decompose the equation by repeating the two-step procedure of the information propagation step and the control profile update step over a total of M steps. This will be described in more detail later with reference to FIG. 4.
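The two-step loop of [0105], a propagation step followed by a control update repeated M times, can be illustrated with a deliberately simple toy model. This sketch does not solve FBSDEs and is not [Definition 3]: it assumes deterministic particles coupled to their empirical mean, a single scalar control drift `c` playing the role of the neural agent's parameters, a squared-error terminal loss, and an analytic gradient valid only for this toy.

```python
import numpy as np

HORIZON = 1.0  # assumed prediction horizon T


def propagate(x0, c, theta=1.0, n_steps=20):
    """Propagation step: roll particles forward with mean-field coupling and control drift c."""
    dt = HORIZON / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        m = x.mean()                          # empirical mean-field term
        x = x + dt * (theta * (m - x) + c)    # coupling pulls particles toward the mean
    return x


def alternate(x0, target, n_iters=50, lr=0.1):
    """Alternate propagation and control updates; returns the control and the loss history."""
    c, history = 0.0, []
    for _ in range(n_iters):
        x_T = propagate(x0, c)                               # step 1: information propagation
        history.append(float(np.mean((x_T - target) ** 2)))
        # Step 2: control update. In this toy model every terminal state shifts by
        # HORIZON per unit of c, so d(loss)/dc = 2 * HORIZON * mean(x_T - target).
        grad = 2.0 * HORIZON * float(np.mean(x_T - target))
        c -= lr * grad
    return c, history


c_opt, history = alternate(x0=[0.0, 1.0, 2.0, -1.0], target=3.0)
```

Here the particle mean starts at 0.5, so the control converges toward c = 2.5, which moves the terminal mean onto the target while the coupling shrinks the spread.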

[0106] In one embodiment, the processor may determine an aggregation distribution by aggregating the results of the propagation signal calculations for a plurality of observation data, and determine a predicted value from that distribution. Additionally, the processor may determine the aggregation distribution using an attention mechanism.
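Aggregation with an attention mechanism can be sketched as softmax weights over the sampled predictors followed by a weighted sum of their forecasts. This assumes scaled dot-product attention with a single query and hypothetical per-predictor key vectors; the disclosure does not specify which attention variant is used.

```python
import numpy as np


def attention_aggregate(query, keys, predictions):
    """Scaled dot-product attention over predictors, then a weighted sum of their predictions."""
    scores = keys @ query / np.sqrt(query.shape[0])
    scores = scores - scores.max()                    # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # aggregation distribution over predictors
    return weights, weights @ predictions


rng = np.random.default_rng(0)
keys = rng.standard_normal((5, 8))    # one key per sampled predictor (hypothetical features)
query = rng.standard_normal(8)
preds = rng.standard_normal((5, 3))   # each predictor's 3-step forecast
w, forecast = attention_aggregate(query, keys, preds)
```

Because the weights form a probability distribution, each aggregated value lies between the smallest and largest individual forecasts.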

[0107] In one embodiment, during the training process, the predicted value corresponding to the group decision of the mean-field predictor can be corrected to approximate the target interval of the future event. That is, the processor determines a loss value based on the difference between the predicted value and the true value, and can train the artificial intelligence prediction model until the loss value becomes less than or equal to a preset value (e.g., a very small value). Once trained in this way, the artificial intelligence prediction model can be used as a predictor to forecast future information.

[0108] In one embodiment, to generate an accurate target interval, the artificial intelligence prediction model can be trained to derive a value function. This value function characterizes a state in which a continuum of players forms a coalition to cooperatively predict the best future event.

[0109] According to one embodiment of the present disclosure, the concentration of empirical measures and the propagation of chaos clarify how leakage from past observations affects the generalization performance of a mean-field system. Furthermore, according to one embodiment of the present disclosure, accuracy increases as the number of agents increases, and reliable predictions can be generated.

[0110] FIG. 2a is a drawing showing an example of an exponential graphon according to one embodiment of the present disclosure.

[0111] Referring to FIG. 2a, an example of an exponential graphon is illustrated. In this graphon, temporal decay is incorporated for the spatiotemporal variables, so the influence of past events decreases exponentially. In the exponential graphon of FIG. 2a, temporally close events tend to exhibit strong interactions. Here, a neural agent can determine the magnitude of the interaction. With respect to the deviation between labels, the influence of temporally dissimilar events may be penalized as shown in [Equation 2] below.

[0112] [Mathematical Formula 2]

[0113]
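[Equation 2] is not reproduced in the source, so the following sketch assumes one common form of exponential graphon: W(u, v) = a * exp(-lambda * |u - v|). In this form, the interaction decays with the temporal distance between labels u and v. The decay rate lambda (`decay`) is a hypothetical parameter. The amplitude a (`amplitude`) is also hypothetical and stands in for the magnitude that a neural agent would determine.

```python
import numpy as np


def exponential_graphon(u, v, decay=5.0, amplitude=1.0):
    """Assumed exponential graphon W(u, v) = amplitude * exp(-decay * |u - v|).

    Symmetric in (u, v), bounded by `amplitude`, and integrable on [0, 1]^2.
    Temporally close labels interact strongly; distant ones are penalized.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return amplitude * np.exp(-decay * np.abs(u - v))


# Interaction matrix on a grid of labels in [0, 1] (broadcast to shape (5, 5)).
labels = np.linspace(0.0, 1.0, 5)
W = exponential_graphon(labels[:, None], labels[None, :])
```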

[0114] FIG. 2b is a drawing showing an example of a cosinusoidal graphon according to one embodiment of the present disclosure.

[0115] Referring to FIG. 2b, an example of a cosinusoidal graphon is illustrated. It emphasizes a continuous-cycle assumption that captures the periodic characteristics of a time series. In one embodiment, the eigenvalue decomposition of the graphon operator can be performed using sinusoidal eigenfunctions and eigenvalues for various frequency modes, as in [Equation 3] below.

[0116] [Mathematical Formula 3]

[0117]

[0118] In one embodiment, the graphon operator can be parameterized by neural networks by replacing the Fourier coefficients with corresponding neural agents. To represent various periods, a set of predetermined frequencies can be defined. In this case, the cosinusoidal graphon can be expressed by the following [Equation 4].

[0119] [Mathematical Formula 4]

[0120]

[0121] In one embodiment, for convenience of calculation, the summation may be limited to a finite number of modes L.
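A truncated cosinusoidal graphon with L modes can be sketched as below. The sketch assumes the form W(u, v) = sum over l of a_l * cos(2*pi*omega_l*(u - v)). The coefficients a_l stand in for the outputs of neural agents, and omega_l are the predetermined frequencies. [Equation 3] and [Equation 4] are not reproduced here, so this form is an assumption. The second function rewrites the same kernel using the identity cos(x - y) = cos x cos y + sin x sin y, which exposes the sinusoidal eigenfunctions.

```python
import numpy as np


def cosine_graphon(u, v, coeffs, freqs):
    """Assumed form W(u, v) = sum_l coeffs[l] * cos(2*pi*freqs[l]*(u - v)), L = len(freqs)."""
    u = np.asarray(u, dtype=float)[..., None]  # trailing axis indexes the L modes
    v = np.asarray(v, dtype=float)[..., None]
    coeffs = np.asarray(coeffs, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    return np.sum(coeffs * np.cos(2 * np.pi * freqs * (u - v)), axis=-1)


def cosine_graphon_separable(u, v, coeffs, freqs):
    """Same kernel written through the sinusoidal eigenfunctions cos(2*pi*f*x) and sin(2*pi*f*x)."""
    u = np.asarray(u, dtype=float)[..., None]
    v = np.asarray(v, dtype=float)[..., None]
    coeffs = np.asarray(coeffs, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    pu = 2 * np.pi * freqs * u
    pv = 2 * np.pi * freqs * v
    return np.sum(coeffs * (np.cos(pu) * np.cos(pv) + np.sin(pu) * np.sin(pv)), axis=-1)


labels = np.linspace(0.0, 1.0, 7)
W = cosine_graphon(labels[:, None], labels[None, :], coeffs=[0.5, 0.3], freqs=[1, 2])
```

The separable form shows why a finite number of modes is convenient: the kernel has rank at most 2L, so it can be evaluated through 2L basis functions per label.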

[0122] According to one embodiment, the mean-field system can formulate the objective function as a stochastic control problem by using a stochastic differential equation controlled by a neural agent.

[0123] FIG. 3 is a flowchart of a prediction method using mean field theory according to one embodiment of the present disclosure.

[0124] Referring to FIG. 3, in operation 310, the processor may generate multiple observation data by sampling past data according to an arbitrary time distribution. In one embodiment, the arbitrary time distribution may include a non-regular, i.e., irregular, distribution. To this end, a system according to one embodiment models the increments dy rather than the values y. This allows it to make accurate future predictions even for irregular time series data.
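The following sketch illustrates irregular sampling and the modeling of increments dy instead of levels y. The helper names are illustrative assumptions. Sampling indices uniformly without replacement is also an assumption; any other time distribution could be substituted.

```python
import numpy as np


def sample_irregular(times, values, n_obs, rng):
    """Pick n_obs distinct time indices at random, giving an irregular observation grid."""
    idx = np.sort(rng.choice(len(times), size=n_obs, replace=False))
    return times[idx], values[idx]


def to_increments(t, y):
    """Return (dt, dy): a model would target the increment dy over each irregular gap dt."""
    return np.diff(t), np.diff(y)


def from_increments(y0, dy):
    """Rebuild the level series from an initial value and the increments."""
    return y0 + np.concatenate([[0.0], np.cumsum(dy)])


rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 101)
y = np.sin(t)
ts, ys = sample_irregular(t, y, n_obs=20, rng=rng)
dt, dy = to_increments(ts, ys)
```

Because each increment is paired with its own gap dt, unevenly spaced samples need no resampling onto a regular grid.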

[0125] In operation 330, the processor can generate propagation signals for each of the multiple observation data based on mean-field theory. In one embodiment, the processor can model the average behavior of an infinite or near-infinite number of observations based on mean-field theory. That is, based on mean-field theory, the processor can determine what behavioral patterns such an infinite or near-infinite set of observations will exhibit.

[0126] In one embodiment, the processor may generate a forward propagation signal based on mean field theory and generate a backward propagation signal based on the forward propagation signal. The processor may generate a forward or backward propagation signal and evaluate the forward or backward propagation signal. Additionally, the processor may update a control profile based on the backward propagation signal.

[0127] In one embodiment, the processor can generate a forward propagation signal for each of the plurality of observed data.
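A forward propagation signal for many interacting observations might be simulated with the Euler-Maruyama scheme, as in the sketch below. The following are all assumptions, since the disclosure's own dynamics are given by equations not reproduced here:

- the drift form, a per-particle control plus a graphon-weighted average of pairwise differences;
- the constant noise level `sigma`;
- all names.

```python
import numpy as np


def forward_propagate(x0, labels, control, graphon, sigma, dt, n_steps, rng):
    """Euler-Maruyama simulation of N particles indexed by labels in [0, 1].

    Assumed drift for particle i: control[i] + (1/N) * sum_j W(u_i, u_j) * (X_j - X_i).
    Returns the path with shape (n_steps + 1, N).
    """
    n = len(labels)
    W = graphon(labels[:, None], labels[None, :])  # (N, N) interaction weights
    path = np.empty((n_steps + 1, n))
    path[0] = x0
    for k in range(n_steps):
        x = path[k]
        # Entry (i, j) is W_ij * (x_j - x_i); averaging over j gives the mean-field pull.
        interaction = (W * (x[None, :] - x[:, None])).mean(axis=1)
        noise = sigma * np.sqrt(dt) * rng.standard_normal(n)
        path[k + 1] = x + (control + interaction) * dt + noise
    return path
```

With a symmetric graphon and no noise, the interaction term preserves the population mean while pulling the particles together.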

[0128] In operation 350, the processor can determine a predicted value by aggregating the results of propagation signal calculations for multiple observation data.

[0129] In one embodiment of the present disclosure, the neural agent can be trained to minimize a designed cost function and to derive the value function.

[0130] In one embodiment, for a neural graphon and a fixed set of admissible control elements, the value function can be defined as in [Equation 5] below.

[0131] [Mathematical Formula 5]

[0132]

[0133] Here, G represents the terminal cost at time t = T, and the remaining term is an aggregation function.
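As a rough illustration of a cost functional with terminal cost G, the sketch below uses Monte Carlo averaging over simulated paths. It estimates the expected accumulated running cost plus the terminal cost. The left-point time discretization, the callable signatures, and the name `estimate_cost` are assumptions. The aggregation function of [Equation 5] is not modeled.

```python
import numpy as np


def estimate_cost(paths, controls, dt, terminal_cost, running_cost):
    """Monte Carlo estimate of E[ sum_k running_cost(X_k, a_k) * dt + G(X_T) ].

    paths:    (n_steps + 1, n_paths) simulated states
    controls: (n_steps, n_paths) control applied on each time interval
    """
    running = running_cost(paths[:-1], controls).sum(axis=0) * dt  # left-point rule per path
    terminal = terminal_cost(paths[-1])                            # G evaluated at X_T
    return float(np.mean(running + terminal))                      # average over paths
```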

[0134] In one embodiment, to generate a future prediction, the mean-field predictors can collaborate by forming a coalition over the time differences of the predictors. Here, the expectation over label u can be used to aggregate the weight determinations (i.e., w) for a continuum of predictor variables close to the target continuous interval.

[0135] In one embodiment, the neural agent can be trained to derive a value function that characterizes a state in which a continuum of players forms a coalition to cooperatively predict the best future event. The neural agent influences the predictor variables. These in turn can continuously affect individual state variables as the dynamics propagate through interactions via neural graphons. To address this, one embodiment may formulate the continuous sequence prediction problem as a mean-field game. One embodiment of the present disclosure investigates the forward-backward partial differential equation (FBPDE) system in the mean-field domain. Its aim is to find the best control variable, that is, the one that induces the best possible response in the underlying recursive relationship, and thereby the accurate solution for the optimal control profile over time. In one embodiment, for the obtained optimal neural agent, the value function of [Equation 5] can be obtained by solving the two PDEs below.

[0136] [Hamilton-Jacobi-Bellman (HJB) Equation]

[0137]

[0138] [Fokker-Planck-Kolmogorov (FPK) equations]

[0139]

[0140] Here, Δ and ∇· represent the Laplacian and the divergence operator, respectively.

[0141] In one embodiment, the stochastic Hamiltonian system H can be expressed by the following [Equation 6].

[0142] [Mathematical Formula 6]

[0143]

[0144] Here, the interaction term in [Equation 6] is the graphon interaction term of [Definition 2].

[0145] In one embodiment, the HJB equation and the FPK equation can describe how the value function and the law of the state variables, respectively, propagate over time. In mean-field equilibrium, these PDEs can be coupled by matching the law of the state variables with the limit error. This mean-field equilibrium state can be expressed as [Definition 4] below.

[0146] [Definition 4]

[0147]

[0148] In one embodiment, the mean-field equilibrium state can mean a state in which a continuum of predictor variables has no incentive to change its policies to non-optimal counterparts that cause marginal error. In other words, the mean-field equilibrium state can mean the state characterized in [Definition 4]. In one embodiment, the optimal mean-field predictor can approximate the limit-error population.

[0149] In one embodiment, solving the HJB equation and the FPK equation can be computationally difficult because of nonlinearities, such as those introduced by neural networks. Accordingly, one embodiment of the present disclosure may use a gradient system, described below with reference to FIG. 4, to solve the equations.

[0150] In operation 370, the processor determines a loss value based on the difference between the predicted value and the actual value, and in operation 390, the processor can train an artificial intelligence prediction model until the loss value becomes less than or equal to a preset value. In one embodiment, the artificial intelligence prediction model can be used as a predictor to predict future information after being trained until the loss value becomes less than or equal to a preset value.
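Operations 370 and 390 can be sketched as a loop that computes a loss from the difference between predicted and actual values. The loop stops once the loss is less than or equal to a preset value. Here, a linear predictor trained by gradient descent on a mean squared error is a hypothetical stand-in for the artificial intelligence prediction model. The `max_iters` cap is an added safeguard not mentioned in the disclosure.

```python
import numpy as np


def train_until_threshold(features, targets, lr=0.1, threshold=1e-6, max_iters=10_000):
    """Gradient descent on an MSE loss, stopping once loss <= threshold (the preset value).

    Returns (weights, final_loss, iterations_used).
    """
    n, d = features.shape
    w = np.zeros(d)
    loss = float("inf")
    for it in range(max_iters):
        pred = features @ w               # predicted values
        err = pred - targets              # difference from the actual values
        loss = float(np.mean(err ** 2))   # loss value
        if loss <= threshold:
            return w, loss, it
        w -= lr * (2.0 / n) * (features.T @ err)  # gradient of the MSE with respect to w
    return w, loss, max_iters
```

Stopping on a loss threshold, rather than after a fixed epoch count, mirrors the training criterion described above.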

[0151] According to one embodiment of the present disclosure, a mean-field continuous sequence predictor may be provided that efficiently generates continuous sequences of near-infinite-order complexity. Additionally, according to one embodiment of the present disclosure, complex inductive biases in time-series data may be captured by using graphons. Furthermore, a gradient descent-based method and a fictitious play approach may be used for three purposes: to reformulate the time-series forecasting problem as a mean-field game, to apply the stochastic maximum principle, and to identify Nash equilibria.

[0152] FIG. 4 is a diagram showing the gradient system of a mean-field predictor associated with the updated parameters of neural agents at the m-th iteration step according to one embodiment of the present disclosure.

[0153] Referring to FIG. 4, the gradient system may include an information propagation step (410) and a control profile update step (420).

[0154] In one embodiment, the information propagation step (410) may include providing population information from the previous ((m-1)-th) step to the neural agent performing the m-th iteration. At this time, information about the updated population can be propagated through the FBSDE, as shown in [Equation 7] below.

[0155] [Mathematical Formula 7]

[0156]

[0157] Here, backward propagation starts from the terminal state while forward propagation starts from the initial state, which parallels the PDE system of [Definition 2].

[0158] In one embodiment, the control profile update step (420) may include updating the parameters of the neural agent along the steepest-descent direction that minimizes the backward-dynamics value. The backward dynamics associated with the cost function provide the parameter updates, allowing the mean-field predictor to gradually approximate the target interval.

[0159] In one embodiment, the processor can provide the result of the m-th iteration to the neural agent performing the (m+1)-th iteration.

[0160] In one embodiment, when the information propagation step (410) and the control profile update step (420) are repeated M times, the loss is minimized, enabling optimal prediction.
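The two-step procedure of FIG. 4 can be illustrated on a toy problem, sketched below. The assumptions are deliberately simple:

- one-dimensional deterministic dynamics X_{k+1} = X_k + a_k * dt;
- running cost a^2 / 2;
- terminal cost G(x) = (x - target)^2 / 2;
- Hamiltonian H(x, a, y) = y * a + a^2 / 2.

Each repetition performs information propagation, that is, a forward pass from the initial state and a backward adjoint pass from the terminal state. A steepest-descent update of the control profile follows. This is not the disclosure's gradient system, only an instance of the same forward-backward pattern.

```python
import numpy as np


def gradient_system(x0, target, horizon=1.0, n_steps=20, lr=0.2, n_iters=200):
    """Repeat propagation + control update n_iters (= M) times on an assumed toy problem."""
    dt = horizon / n_steps
    controls = np.zeros(n_steps)  # control profile a_k on each interval
    for _ in range(n_iters):
        # Step 1, information propagation.
        # Forward pass from the initial state: X_{k+1} = X_k + a_k * dt.
        states = x0 + np.concatenate([[0.0], np.cumsum(controls * dt)])
        # Backward pass from the terminal state: Y_T = G'(X_T) = X_T - target.
        # dY = -dH/dx dt = 0 because the drift does not depend on x, so Y stays constant.
        adjoint = np.full(n_steps, states[-1] - target)
        # Step 2, control profile update: steepest descent along dH/da = y + a.
        controls = controls - lr * (adjoint + controls)
    final_state = x0 + np.sum(controls * dt)
    return controls, final_state
```

For x0 = 0, target = 2, and horizon T = 1, the control converges to the constant (target - x0) / (1 + T) = 1. The terminal state therefore settles at 1 rather than 2, because the optimum trades control effort against terminal error.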

[0161] In one embodiment, the gradient system of [Definition 4] can derive the optimal neural agent that induces a realizable function, as expressed in [Equation 8] below.

[0162] [Mathematical Formula 8]

[0163]

[0164] In one embodiment, [Equation 8] shows that both the HJB and FPK equations can be solved, which guarantees optimality in a probabilistic sense.

[0165] In one embodiment, for convergence to equilibrium, the projector Φ and the updater can be expressed as [Equation 9] and [Equation 10], respectively.

[0166] [Mathematical Formula 9]

[0167]

[0168] [Mathematical Formula 10]

[0169]

[0170] At step m, the composition of these operators takes the population information of the previous step and maps it to the next step. That is, the population generated according to the above algorithm can converge in the Wasserstein metric as step m increases.
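Convergence in the Wasserstein metric can be monitored numerically. The sketch below computes the 1-Wasserstein distance between two equal-size empirical populations on the real line. In one dimension this reduces to the mean absolute difference of the sorted samples. The sketch then applies it to a toy contraction that stands in for the composition of the projector Φ and the updater. The contraction, the anchor population, and the function names are hypothetical.

```python
import numpy as np


def wasserstein_1d(a, b):
    """W1 distance between two equal-size empirical measures on the real line.

    In one dimension the optimal coupling pairs sorted samples, so
    W1 = mean |a_(i) - b_(i)| over the order statistics.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))


def contraction_distances(n_iters=10, size=1000, seed=0):
    """Toy stand-in for the operator composition: p_m = 0.5 * p_{m-1} + 0.5 * anchor."""
    rng = np.random.default_rng(seed)
    population = rng.normal(size=size)
    anchor = rng.normal(loc=3.0, size=size)  # hypothetical equilibrium population
    distances = []
    for _ in range(n_iters):
        updated = 0.5 * population + 0.5 * anchor
        distances.append(wasserstein_1d(population, updated))  # W1 between steps m-1 and m
        population = updated
    return distances
```

A stopping rule could then compare the step-to-step distance against a tolerance.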

[0171] According to one embodiment, the gradient system converges when it is repeated a sufficient number of times m. This demonstrates that a mean-field game can be used efficiently for continuous sequence prediction.

[0172] It can be seen that the prediction method according to one embodiment of the present disclosure has superior performance compared to other methods, as shown in [Table 1] below.

[0173] (MSE: mean squared error; MAE: mean absolute error)

| Method | MIT Humanoid Robot MSE | MIT Humanoid Robot MAE | MIMIC-II MSE | MIMIC-II MAE | Beijing Air Quality MSE | Beijing Air Quality MAE |
|---|---|---|---|---|---|---|
| Neural Laplace | 8.11±0.25 | 17.03±0.33 | 7.76±0.04 | 18.70±0.08 | 3.21±0.12 | 11.45±0.23 |
| MaSDEs | 16.51±0.21 | 27.89±0.30 | 8.41±0.06 | 20.67±0.08 | 3.47±0.03 | 13.13±0.07 |
| CRU | 32.08±5.07 | 42.50±3.90 | 13.09±0.31 | 24.68±0.47 | 3.48±0.06 | 12.76±0.19 |
| Latent SDE | 6.01±0.14 | 15.94±0.14 | 8.04±0.02 | 19.63±0.06 | 3.29±0.03 | 11.99±0.07 |
| Neural LSDE | 6.80±0.14 | 16.51±0.08 | 7.93±0.05 | 19.09±0.07 | 3.74±0.04 | 11.98±0.15 |
| CONTIME | 6.88±0.29 | 16.60±0.25 | 12.29±0.14 | 25.26±0.12 | 5.15±0.17 | 15.86±0.27 |
| Contiformer | 5.94±0.23 | 15.29±0.26 | 7.90±0.12 | 19.05±0.18 | 3.25±0.10 | 11.48±0.16 |
| S4 | 5.59±0.16 | 13.98±0.19 | 13.24±0.01 | 24.79±0.30 | 3.95±0.15 | 12.35±0.17 |
| Mamba | 5.21±0.09 | 13.71±0.15 | 13.23±0.02 | 24.76±0.19 | 3.68±0.14 | 11.56±0.24 |
| MFPs (Exp.) | 3.89±0.10 | 11.42±0.14 | 7.51±0.08 | 18.59±0.11 | 3.14±0.07 | 11.45±0.13 |
| MFPs (Cosin.) | 3.91±0.07 | 11.43±0.07 | 7.51±0.06 | 18.60±0.10 | 3.13±0.07 | 11.38±0.08 |

[0174] An artificial intelligence prediction model sufficiently trained by the method described above with reference to FIGS. 1a to 4 can be used to predict future data. Hereinafter, a method for predicting future data using such a trained artificial intelligence prediction model will be described.

[0175] FIG. 5 is a block diagram of a computing system that performs a method for predicting future data according to one embodiment of the present disclosure.

[0176] Referring to FIG. 5, a computing system (1000) for predicting future data according to one embodiment of the present disclosure includes a user computing device (110), a training computing system (150), and a server computing system (130), and each device and system may be connected to communicate via a network (170). Throughout the present disclosure, the future data to be predicted may also be referred to as a target.

[0177] In one embodiment, 1) a user computing device (110) may perform a method for predicting future data by using a local and/or external machine learning model (120) or by using a machine learning model (140) provided by a server. The machine learning model (120, 140) of FIG. 5 may correspond to the artificial intelligence prediction model described above with reference to FIG. 1a through FIG. 4. The machine learning model (120, 140) may include a model trained by a training computing system (150) according to the training method described above with reference to FIG. 1a through FIG. 4.

[0178] In another embodiment, 2) a server computing system (130) communicating with a user computing device (110) may provide a future data prediction service to the user computing device (110) through an application and/or the web, in response to a request from a user through the user computing device (110).

[0179] In another embodiment, 3) the user computing device (110) and the server computing system (130) may perform at least part of the method of performing future data prediction in conjunction with each other to provide a future data prediction service to the user.

[0180] Additionally, according to various embodiments, a user computing device (110) and/or a server computing system (130) can train a machine learning model (120, 140) for future data prediction through interaction with a training computing system (150) that is communicatively connected via a network (170). In this case, the training computing system (150) may be separate from the server computing system (130) or may be part of the server computing system (130).

[0181] In some embodiments, the training computing system (150) may be part of the server computing system (130) or part of the user computing device (110).

[0182] The following description assumes that a user computing device (110) connects to a server computing system (130) to execute a prediction task. It further assumes that the data necessary for future data prediction is collected and analyzed directly by the server computing system (130), or by using a model from a separate server, and that future forecasts are made based on the collected and analyzed data. However, embodiments in which part of the process described as performed by the server computing system (130) is instead performed by the user computing device (110) should be understood as also included in the description of the present invention.

[0183] The user computing device (110) may be any type of computing device, such as a smartphone, a mobile phone, a digital broadcasting device, a PDA (personal digital assistant), a PMP (portable multimedia player), a desktop, a wearable device, an embedded computing device, and/or a tablet PC.

[0184] The user computing device (110) includes at least one processor (111) and memory (112). Here, the processor (111) may be composed of at least one of a central processing unit (CPU), a graphics processing unit (GPU), ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), controllers, microcontrollers, microprocessors, and / or electrical units for performing other functions, or a plurality of electrically connected processors.

[0185] The memory (112) may include one or more non-transient / transient computer-readable storage media such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, and combinations thereof, and may include web storage of a server that performs memory storage functions on the internet. This memory (112) may store data and instructions necessary for the at least one processor (111) to perform the operation of an application for performing target prediction.

[0186] In one embodiment, the user computing device (110) may store at least one machine learning model (120). The machine learning model (120) may be any of various machine learning models, or a combination thereof. Examples include a plurality of neural networks (e.g., deep neural networks) that predict future data (targets) based on structured/quantitative data, and other types of machine learning models, including non-linear and/or linear models. In one embodiment, the machine learning models (120) may include an artificial intelligence prediction model trained through the following operations: generating a plurality of observation data by sampling past data according to an arbitrary time distribution; generating a propagation signal for each of the plurality of observation data based on mean-field theory; determining a predicted value by aggregating the results of calculating the propagation signals for the plurality of observation data; and determining a loss value based on the difference between the predicted value and the actual value.

[0187] For example, the predictive model may include linear regression, decision trees, random forests, gradient boosting, pre-trained language models, and/or deep learning models. The neural network may include at least one of feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks, and/or other forms of neural networks.

[0188] Additionally, the user computing device (110) may store the model to be used in each process for predicting future data, along with prompt templates that serve as the basis for input to the model. For example, the user computing device (110) may store: 1) a prompt for generating a query from user input, 2) a prompt for determining the relationship between the future data (target) and variables that influence the future data (target), 3) a prompt for identifying raw data associated with the determined relationship, and 4) a prompt template for quantifying unstructured data.

[0189] That is, in one embodiment, the user computing device (110) can perform future data prediction based on the received data by requesting an external server to perform some execution steps in the future data prediction task through a prompt, etc.

[0190] In another embodiment, for a future data prediction task requested through a user computing device (110), the server computing system (130) may perform future data prediction through at least one machine learning model (140) and a machine learning model of another server and provide the predicted data to the user computing device (110).

[0191] The user computing device (110) may include at least one user input component (121) for detecting user input. The user input component (121) may include, for example:

- a touch sensor (e.g., a touch screen and/or a touchpad) for detecting a touch of a user input medium (e.g., a finger or a stylus);
- an image sensor for detecting user motion input;
- a microphone for detecting user voice input;
- a button, a mouse, and/or a keyboard.

Additionally, when input is received from an external controller (e.g., a mouse or a keyboard) through an interface, the user input component (121) may include both the interface and the external controller.

[0192] The server computing system (130) includes at least one processor (131) and memory (132).

[0193] Here, the processor (131) may be composed of at least one of a central processing unit (CPU), a graphics processing unit (GPU), ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), controllers, microcontrollers, microprocessors, and / or other electrical units for performing functions, or a plurality of electrically connected processors.

[0194] The memory (132) may include one or more non-transient/transient computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, and combinations thereof. This memory (132) may store the data and instructions, such as prompt templates and machine learning models (140) for future prediction, that the processor (131) requires to perform tasks. These tasks may be performed through a language model of the server computing system (130) and/or a language model of an external server. For example, the server computing system (130) may include a neural network and/or other multi-layer non-linear models as a machine learning model (140) for future prediction. Exemplary neural networks may include feed-forward neural networks, deep neural networks, recurrent neural networks, and convolutional neural networks.

[0195] In one embodiment, the server computing system (130) may be implemented to include at least one computing device. For example, the server computing system (130) may be implemented to operate a plurality of computing devices according to a sequential computing architecture, a parallel computing architecture, or a combination thereof. Additionally, the server computing system (130) may include a plurality of computing devices connected via a network.

[0196] In one embodiment, the server computing system (130) may further include a data store computing system (1000) (hereinafter, data store). The data store continuously stores and manages the raw data that forms the basis for predicting future data (target). The data store may include various forms of data storage, ranging from file systems to cloud storage.

[0197] For example, a data store may include at least one of the following:

- a relational database that uses a structured query language (SQL) to define and manipulate data;
- a NoSQL database designed for flexibility and scalability in processing unstructured and semi-structured data;
- a data warehouse, which is a system used for reporting and data analysis that centralizes large volumes of data from multiple sources and is optimized for querying and analysis;
- a data lake that stores large volumes of raw data in native formats, including structured, semi-structured, and unstructured data;
- a local storage device or Network Attached Storage (NAS) that stores data as files in a format generally accessible by a computer operating system.

[0198] The training computing system (150) includes at least one processor (151) and a memory (152). Here, the processor (151) may be composed of at least one of a central processing unit (CPU), a graphics processing unit (GPU), ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), controllers, micro-controllers, microprocessors, and/or electrical units for performing other functions, or a plurality of electrically connected processors. In one embodiment, the training computing system (150) can train an artificial intelligence prediction model by repeating the following operations: generating a plurality of observation data by sampling past data according to an arbitrary time distribution; generating a propagation signal for each of the plurality of observation data based on mean-field theory; determining a predicted value by aggregating the results of calculating the propagation signals for the plurality of observation data; and determining a loss value based on the difference between the predicted value and the actual value. For example, the training computing system (150) can train the artificial intelligence prediction model by repeating these operations until the calculated loss value becomes less than or equal to a preset value.

[0199] The memory (152) may include one or more non-transient/transient computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, and combinations thereof.

[0200] This memory (152) can store data and instructions necessary for the processor (151) to train a future prediction model.

[0201] For example, the training computing system (150) may include a model trainer (160) that trains an artificial intelligence model stored in a user computing device (110) and / or a server computing system (130) using various training or learning techniques, such as back propagation of error.

[0202] For example, a model trainer (160) can update one or more parameters of a machine learning model for future prediction based on a defined loss function in a backpropagation manner.

[0203] In some embodiments, performing backpropagation of the error may include performing truncated backpropagation through time. The model trainer (160) may apply a number of generalization techniques (e.g., weight decay, dropout, knowledge distillation) to improve the generalization ability of the forecasting model being trained.

[0204] The model trainer (160) may include computer logic used to provide the desired functions. The model trainer (160) may be implemented as hardware, firmware, and/or software that controls a general-purpose processor. For example, in one embodiment, the model trainer (160) may include a program file stored in a storage device, loaded into memory, and executed by one or more processors. In another embodiment, the model trainer (160) includes one or more sets of computer-executable instructions stored in a tangible computer-readable storage medium, such as RAM, a hard disk, or an optical or magnetic medium.

[0205] The network (170) includes, but is not limited to, 3GPP (3rd Generation Partnership Project) networks, LTE (Long Term Evolution) networks, WiMAX (Worldwide Interoperability for Microwave Access) networks, the Internet, LAN (Local Area Network), Wireless LAN (Wireless Local Area Network), WAN (Wide Area Network), PAN (Personal Area Network), Bluetooth networks, satellite broadcasting networks, analog broadcasting networks, and/or DMB (Digital Multimedia Broadcasting) networks. Communication through the network (170) can generally use any type of wired and/or wireless connection. It can use various communication protocols (e.g., TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g., HTML, XML), and/or protection schemes (e.g., VPN, Secure HTTP, SSL).

[0206] FIG. 6 is a block diagram of a computing device, which is one of the components of a computing system (1000) that performs a method for predicting future data according to one embodiment of the present disclosure.

[0207] Referring to FIG. 6, the computing device (100) included in the user computing device (110), the server computing system (130), and the training computing system (150) includes a plurality of applications (e.g., Application 1 to Application N). Each application may include a machine learning library. For example, the applications may include a future prediction application, a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, a separate future prediction application, etc. In one embodiment, the computing device (100) may include a model trainer (160) for training a future prediction model. It may also store and operate the future prediction model to perform a future data prediction task on input data.

[0208] Each application of the computing device (100) can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and / or additional components. In one embodiment, each application can communicate with each device component using an API (e.g., a public API). In one embodiment, the API used by each application may be specific to that application.

[0209] FIG. 7 is a block diagram of another aspect of a computing device, which is one of the components of a computing system (1000) that performs a method for predicting future data according to one embodiment of the present disclosure.

[0210] Referring to FIG. 7, the computing device (200) includes a plurality of applications (e.g., Application 1 through Application N). Each application may communicate with a central intelligence layer. For example, applications may include an image processing application, a text messaging application, an email application, a dictation application, a virtual keyboard application, a browser application, etc. In one embodiment, each application may communicate with the central intelligence layer (and a model stored therein) using an API (e.g., an API common across all applications). The central intelligence layer may include prompts that use a plurality of machine learning models and/or language models. For example, as illustrated in FIG. 7, a respective machine learning model may be provided for each application, or for at least some of the applications, and managed by the central intelligence layer. In another embodiment, two or more applications may share a single machine learning model. For example, in some embodiments, the central intelligence layer may provide a single model for all applications. In some embodiments, the central intelligence layer may be included within the operating system of the computing device (200) or implemented otherwise.
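The shared-model arrangement of the central intelligence layer might look like the registry sketched below. Every application calls one common `predict` method. When no application-specific model is registered, the registry falls back to a single shared model. The class, its methods, and the fallback policy are hypothetical illustrations of the architecture, not an API from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Model = Callable[[List[float]], List[float]]  # hypothetical model signature


@dataclass
class CentralIntelligenceLayer:
    """Hypothetical registry of per-application models with an optional shared model."""

    shared_model: Optional[Model] = None
    per_app: Dict[str, Model] = field(default_factory=dict)

    def register(self, app_name: str, model: Model) -> None:
        """Attach an application-specific model managed by the layer."""
        self.per_app[app_name] = model

    def predict(self, app_name: str, inputs: List[float]) -> List[float]:
        """Common API for all applications; uses the shared model if none is registered."""
        model = self.per_app.get(app_name, self.shared_model)
        if model is None:
            raise LookupError(f"no model available for {app_name!r}")
        return model(inputs)
```

Keeping the lookup behind one method lets the layer switch between per-application models and a single shared model without changing the applications.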

[0211] The central intelligence layer can communicate with the central device data layer. The central device data layer may be a centralized data store for the computing device (200). As illustrated in FIG. 7, the central device data layer can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and / or additional components. In some embodiments, the central device data layer can communicate with each device component using an API (e.g., a private API).

[0212] The technology described herein may refer to servers, databases, software applications, and other computer-based systems, as well as to actions taken by, and information transmitted to and from, such systems. The inherent flexibility of computer-based systems allows for a wide range of possible configurations, combinations, and divisions of tasks and functionality among components. For example, the processes described herein may be implemented using a single device or component or multiple devices or components operating in combination. Databases and applications may be implemented in a single system or distributed across multiple systems. Distributed components may operate sequentially or in parallel.

[0213] In one embodiment, a computing system (1000) may collect observation data (e.g., historical data or raw data), analyze the collected observation data to predict the outlook of a target, and provide relationship information that serves as the basis for the outlook prediction. This will be explained in more detail with reference to FIGS. 8 to 19.

[0214] FIG. 8 is a flowchart of a method for predicting the outlook of a target through a machine learning model according to one embodiment of the present disclosure.

[0215] In operation S101, a target prediction request is received from a user computing device (110) of a computing system (1000), and a target prediction task can be executed according to the received target prediction request. In one embodiment, the user computing device (110) receives a text-based target prediction request from a user through a chat interface and transmits the text containing the target prediction request to a server computing system (130) to execute a target prediction task of the server computing system (130).

[0216] The server computing system (130) can execute a target prediction task by detecting a pre-stored phrase indicating a target prediction request in the text input through the chat interface, or by analyzing the context of the text to detect a target prediction request.

[0217] The server computing system (130) can then recognize the text containing the target prediction request and determine the target prediction elements for the target prediction.

[0218] Here, the target prediction elements include the target to be predicted and may further include at least one of a total forecast period and a prediction unit period. Throughout the disclosure, 'target' may also be referred to as 'future data'.

[0219] In one embodiment, the target refers to a value that varies over time, and predicting the target means calculating the target value at predetermined future points in time, namely at each prediction unit period (the cycle) over the total forecast period.

[0220] Specifically, the server computing system (130) can determine the target prediction elements by inputting the text of the target prediction request into a query generation prompt template for determining the target prediction elements, and receiving at least one of the target prediction elements as output from the language model.
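As a minimal sketch of what using the prediction unit period as a cycle implies, the hypothetical helper below enumerates the future time points to be forecast. It assumes both periods are expressed in whole months and clamps the day of month to 28 for simplicity; the disclosure does not fix a calendar convention.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to 28 so the result is always valid."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    return date(year, month, min(d.day, 28))


def forecast_points(start: date, total_months: int, unit_months: int) -> list:
    """One forecast time point per prediction unit period, up to the end of the total period."""
    if total_months % unit_months != 0:
        raise ValueError("total period must be a whole number of unit periods")
    steps = total_months // unit_months
    return [add_months(start, unit_months * k) for k in range(1, steps + 1)]


# "Monthly basis for 12 months", starting from a hypothetical reference date.
points = forecast_points(date(2025, 1, 1), total_months=12, unit_months=1)
print(len(points), points[0], points[-1])  # 12 2025-02-01 2026-01-01
```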

[0221] For example, a query generation prompt template can be configured to take the “text of a target forecast request” as input in an interactive forecast request field, to recognize, as its action, the values corresponding to the target, total forecast period, and unit period using Named Entity Recognition (NER), and to return the target, total forecast period, and unit period of the query as output values.

[0222] As a more specific example, when a user inputs a target prediction request text such as “Predict how the lithium price will be in the future on a monthly basis for 12 months,” the server computing system (130) can determine the target prediction elements by inputting the following prompt into a language model: <<Input: Interactive prediction request “Predict how the lithium price will be in the future on a monthly basis for 12 months”, Action: NER, to recognize values corresponding to the target, total forecast period, and unit period for the input text, and to generate and return a query as follows, Output: Query - {Target:, Unit period:, Total forecast period:}>>. The language model then outputs the target prediction elements as {Target: Lithium market price, Unit period: Monthly, Total forecast period: 12 months}. If the target prediction elements are not specified or are abstract, the server computing system (130) can provide a separate future prediction interface for inputting the target prediction elements, and the user computing device (110) can transmit the target prediction elements input through that interface to the server computing system (130) to execute the target prediction task. For instance, when targets are classified by category from a higher concept into multiple lower concepts, the server computing system (130) can list the target keywords mapped to the higher and lower concepts and present them for the user to select.

[0223] For example, the future prediction interface presents target keywords derived through named entity recognition in order from superordinate to subordinate concepts and lets the user select among them, enabling the user to specify more precisely the target they wish to predict.
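The language model's output in the example above is a loosely formatted record. The sketch below parses it and reports which elements are missing, which is the point at which the separate future prediction interface would be offered. It assumes the reply follows exactly the `{Key: value, ...}` shape shown and that values contain no commas; `parse_query` and `missing_elements` are hypothetical names.

```python
REQUIRED = ("Target", "Unit period", "Total forecast period")


def parse_query(output: str) -> dict:
    """Parse a reply shaped like '{Target: ..., Unit period: ..., Total forecast period: ...}'.

    Assumes values contain no commas; keys and values are whitespace-trimmed.
    """
    fields = {}
    for part in output.strip().strip("{}").split(","):
        if ":" in part:
            key, value = part.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def missing_elements(fields: dict) -> list:
    """Required elements that are absent or empty and must be requested from the user."""
    return [key for key in REQUIRED if not fields.get(key)]


reply = "{Target: Lithium market price, Unit period: Monthly, Total forecast period: 12 months}"
query = parse_query(reply)
print(query["Target"], missing_elements(query))  # Lithium market price []
```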
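One way to picture the superordinate-to-subordinate selection is a walk down a keyword tree. The taxonomy below and the `drill_down` function are illustrative assumptions; the disclosure does not say how the concept hierarchy is stored or presented.

```python
# Hypothetical concept hierarchy: each keyword maps to its narrower keywords.
TAXONOMY = {
    "lithium": {
        "lithium compounds": {"lithium carbonate": {}, "lithium hydroxide": {}},
        "lithium minerals": {"spodumene": {}},
    },
}


def drill_down(tree: dict, choose) -> list:
    """Walk from broad to narrow concepts.

    `choose` receives the sorted keywords of the current level and returns one of
    them, or None to stop at the current (broader) concept.
    """
    path, level = [], tree
    while level:
        picked = choose(sorted(level))
        if picked is None:
            break
        if picked not in level:
            raise ValueError(f"{picked!r} is not offered at this level")
        path.append(picked)
        level = level[picked]
    return path


# Simulate a user who selects three levels deep.
answers = iter(["lithium", "lithium compounds", "lithium carbonate"])
print(drill_down(TAXONOMY, lambda options: next(answers)))
```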

[0224] Once the target prediction elements are determined, the server computing system (130) can determine relationship information between the target and the target influence variables. In operation S103, the server computing system (130) can collect target analysis data for the target, either by filtering data in a data store within the server computing system (130) or by crawling data available on the internet.

[0225] For example, the server computing system (130) can detect target analysis data through a keyword search based on keywords representing the determined target. Here, the target analysis data may be target analysis reports related to the target.

[0226] Specifically, based on a target analysis report collection prompt template set in the language model, the server computing system (130) can request a search for analysis data related to the target using the target's keywords and have the results returned as analysis reports.

[0227] More specifically, the server computing system (130) can obtain a target analysis report as output by using a target analysis report collection prompt template of the form <<Input: Target - Lithium Market Price, Action: Find and return an analysis report with a title associated with the target through keyword search>>.
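For illustration only, the title-based keyword search can be approximated by a case-insensitive substring match. The report records and the ranking by number of matched keywords are assumptions; the disclosure performs this step through a prompt to the language model.

```python
def find_reports(reports: list, target_keywords: list) -> list:
    """Return reports whose title contains any target keyword (case-insensitive),
    most keyword matches first; ties keep their original order."""
    keywords = [k.lower() for k in target_keywords]
    scored = []
    for report in reports:
        title = report["title"].lower()
        hits = sum(k in title for k in keywords)
        if hits:
            scored.append((hits, report))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # Python's sort is stable
    return [report for _, report in scored]


# Hypothetical report titles.
reports = [
    {"title": "Lithium price outlook"},
    {"title": "Copper demand review"},
    {"title": "Lithium market price and EV demand"},
]
for r in find_reports(reports, ["lithium", "market price"]):
    print(r["title"])
```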

[0228] When such a target analysis report is obtained, the server computing system (130) can record a reference to the target so that semantic information about the target can be extracted. The server computing system (130) can detect target influence variables that affect the target from the collected target analysis data and generate relationship information between the target influence variables and the target.

[0229] In one embodiment, the relationship information may include information about target influence variables that affect the future prediction of the target, and information about the relationship between the target influence variables and the target.

[0230] More specifically, information regarding target influence variables refers to information defining target influence variables at a semantic level, and information regarding the relationship between the target and target influence variables may refer to the chronological relationship, proportion, and weight of influence exerted between the target and target influence variables, as well as among the target influence variables.

[0231] In one embodiment, the server computing system (130) can generate the relationship information by analyzing the collected target analysis data to build a semantic-level causal graph representing the associations between the target and the target influence variables.

[0232] To this end, in one embodiment, the server computing system (130) can run a topic-relevant term recognition module on the target analysis data to detect and annotate the target influence variables associated with the target.

[0233] The server computing system (130) can then generate a semantic-level relationship graph by inputting the target analysis data, annotated with the target and the target influence variables, into a causal graph generation model trained to generate a relationship graph between a target and its target influence variables.

[0234] Here, each node of the relationship graph between the target and the target influence variables can include the node name together with information defining the target or target influence variable at the semantic level.

[0235] For example, the information defining the targets and target influence variables at the semantic level may include the name, keyword, source, domain, region, place, and characteristics of the relevant element as additional annotations.

[0236] Referring to FIG. 10, the relationship graph between the target and the target influence variables can also use arrows between the nodes (the target and the target influence variables) to indicate whether one node influences another as a preceding or succeeding factor.
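A minimal data structure for such a relationship graph might look like the following sketch: nodes carry the semantic annotations listed above, and a directed edge `(cause, effect)` stands for an arrow from a preceding influence to the node it affects. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    annotations: dict = field(default_factory=dict)  # e.g. keyword, source, domain, region


@dataclass
class RelationGraph:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: set = field(default_factory=set)    # (cause, effect): arrow from cause to effect

    def add_node(self, name: str, **annotations) -> None:
        self.nodes[name] = Node(name, annotations)

    def add_edge(self, cause: str, effect: str) -> None:
        if cause not in self.nodes or effect not in self.nodes:
            raise KeyError("add both endpoints as nodes before linking them")
        self.edges.add((cause, effect))

    def drivers_of(self, name: str) -> list:
        """Names of nodes with an arrow pointing into `name`, sorted alphabetically."""
        return sorted(cause for cause, effect in self.edges if effect == name)


g = RelationGraph()
g.add_node("Lithium market price", domain="commodities")
g.add_node("Spodumene production", region="Australia")
g.add_node("China EV sales", region="China")
g.add_edge("Spodumene production", "Lithium market price")
g.add_edge("China EV sales", "Lithium market price")
print(g.drivers_of("Lithium market price"))  # ['China EV sales', 'Spodumene production']
```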

[0237] In one embodiment, the server computing system (130) can perform the process of collecting target analysis data based on context and outputting relationship information between target and target influence variables based on the collected target analysis data through a RAG (Retrieval Augmented Generation) model.

[0238] Here, the RAG model can operate as a module that is activated to supply up-to-date information to a large language model (LLM) according to an embodiment of the present disclosure.

[0239] More specifically, the RAG model can operate to detect additional target influence variables from the virtually unlimited body of information related to at least one target prediction element included in a target prediction request received from the user computing device (110).

[0240] For example, the RAG model can be a naive RAG, advanced RAG, or modular RAG model. Taking advanced RAG as an example, pre-retrieval procedures can refine the data by increasing data granularity and removing unnecessary information and special characters, by optimizing the index structure through chunk size adjustments and index path changes, and by adding metadata such as date and purpose to each data chunk.

[0241] Furthermore, the embedding model can be adjusted through fine-tuned embeddings and/or dynamic embeddings to enhance the relevance between the user's question and the retrieved content. Additionally, post-retrieval processing can refine the data by combining important contextual information from the retrieved content with the user's question before inputting it into the LLM, reranking the retrieved content in order of relevance, and compressing prompts based on importance.

[0242] Such a RAG model may combine a pre-trained parametric memory (e.g., a sequence-to-sequence (seq2seq) model) with a non-parametric memory (e.g., a dense vector index of Wikipedia). Generation may be conditioned on the same retrieved passage across the entire output sequence, or on different retrieved passages for each generated token.

[0243] Accordingly, the RAG model can generate more specific, diverse, and fact-based language through the LLM while excluding unnecessary information.

[0244] Through the process of generating relationship information according to such an embodiment, target influence variables can be clearly identified and defined at a semantic level by concepts, categories, topics, and/or specific criteria, thereby accurately determining the context and domain related to the target influence variables at a semantic level. Furthermore, the information defined in this way can be attached as annotations to the target influence variables and used to perform semantic-level data preparation thereafter, thereby accurately determining the raw data required for target prediction. In operation S105, once the relationship information between the target and the target influence variables is determined, the server computing system (130) can perform a data preparation step based on the determined relationship information.
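The pre-retrieval steps named above (removing special characters, choosing a chunk size, and attaching date and purpose metadata) can be sketched as follows. The regular expression, the overlapping word-window chunking, and the function names are assumptions for illustration; actual advanced-RAG pipelines vary widely.

```python
import re


def clean(text: str) -> str:
    """Replace characters other than word characters, whitespace and . , % - with spaces,
    then collapse runs of whitespace."""
    text = re.sub(r"[^\w\s.,%-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_with_metadata(text: str, chunk_words: int, overlap: int,
                        date: str, purpose: str) -> list:
    """Split cleaned text into overlapping windows of `chunk_words` words,
    attaching the same date/purpose metadata to every chunk."""
    if not 0 <= overlap < chunk_words:
        raise ValueError("overlap must be non-negative and smaller than chunk_words")
    words = clean(text).split()
    if not words:
        return []
    step = chunk_words - overlap
    chunks = []
    for start in range(0, max(len(words) - overlap, 1), step):
        chunks.append({"text": " ".join(words[start:start + chunk_words]),
                       "date": date, "purpose": purpose})
    return chunks


for c in chunk_with_metadata("a b c d e f g h i j", chunk_words=4, overlap=1,
                             date="2025-01-01", purpose="outlook"):
    print(c["text"])
# a b c d
# d e f g
# g h i j
```

Overlapping windows are one common way to keep a sentence that straddles a chunk boundary retrievable from either side; the chunk size and overlap are the tuning knobs the paragraph refers to.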
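Post-retrieval reranking and importance-based compression can be illustrated with cosine similarity over precomputed embedding vectors. This sketch assumes the question and chunks have already been embedded (no embedding model is implied), and keeping the highest-ranked chunks that fit a character budget is a deliberately crude stand-in for prompt compression.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_context(question_vec, retrieved, budget_chars):
    """Rerank chunks by similarity to the question, then greedily keep the most relevant
    ones whose combined text length (separators not counted) fits the budget."""
    ranked = sorted(retrieved, key=lambda c: cosine(question_vec, c["vec"]), reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        if used + len(chunk["text"]) <= budget_chars:
            kept.append(chunk["text"])
            used += len(chunk["text"])
    return "\n".join(kept)


def make_prompt(question, context):
    """Combine the compressed context with the user's question for the LLM."""
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy two-dimensional vectors stand in for real embeddings.
chunks = [
    {"text": "Chile brine output rose.", "vec": [0.9, 0.1]},
    {"text": "Copper smelter maintenance.", "vec": [0.0, 1.0]},
]
print(make_prompt("Will lithium supply grow?", build_context([1.0, 0.0], chunks, 40)))
```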

[0245] First, the server computing system (130) can collect raw data related to the target and the target influence variables in the relationship information, for use in forecasting the target.

[0246] In one embodiment, the server computing system (130) can collect unstructured data (e.g., text such as news and analysis reports) and structured data related to the target and target influence variables through keyword searches representing the target and target influence variables, and store the collected raw data in a data store.

[0247] For example, the server computing system (130) can store vectors derived from the unstructured and structured data in a vector database.

[0248] The server computing system (130) can then determine whether the raw data stored in the data store is related to the target influence variables at a semantic level and extract the relevant data. Here, the raw data can be filtered according to whether it matches the semantic definitions included in the aforementioned target-target influence variable relationship information, yielding the prediction basis data necessary for target prediction.

[0249] For example, the server computing system (130) can use a prompt that takes a document to be judged as input and, as its action, outputs the document's semantic-level correlation with a target influence variable, thereby extracting from the raw data the prediction basis data related to the target and the target influence variables at a semantic level.

[0250] In order to identify data related to target influence variables that affect such targets, past data analysis knowledge and domain expertise regarding the target-related field are important. To complement this, the server computing system (130) can derive events related to the target and events unrelated to the target through a language model.

[0251] For example, the server computing system (130) can instruct a language model, through an associated/unassociated event creation prompt containing a phrase instructing it to act as a domain expert for the target, to return a plurality of associated events that affect changes in the target at a semantic level and a plurality of unassociated events that either do not affect the target or affect it only below a threshold.

[0252] Specifically, by including the information defining each target influence variable at the semantic level in the associated/unassociated event creation prompt, the language model can be instructed to distinguish between associated and unassociated events affecting the target in the prediction basis data.

[0253] The server computing system (130) can then use the returned associated and unassociated events to create a document identification prompt for identifying prediction basis data within the raw data, and request the language model to classify the documents in the raw data based on this prompt, thereby accurately extracting the prediction basis data related to the target and the target influence variables.

[0254] In addition, unstructured data related to the outlook of the target can be detected from the prediction basis data related to the target and/or target influence variables. That is, the server computing system (130) can classify documents related to the target and/or target influence variables from the raw data stored in the data store, and detect the related events and/or sentences in those documents that affect the target.

[0255] For example, a document classification prompt may be configured to: 1) instruct the language model to act as an expert on the target when predicting it; 2) input at least one document included in the raw data as the input data to be identified; 3) instruct the model, as its action, to select either an associated event option, meaning the document is related to the prediction of the target, or a non-associated event option, meaning it does not affect the target; and 4) add associated events that affect the target, found among the information in the document, to the associated event options, or add non-associated events that do not affect the target to the non-associated event options.

[0256] As a specific example, when the element to be predicted is “lithium production,” the server computing system (130) can identify whether a document in the raw data is related to “lithium production” through a prompt consisting of: <<1) Please act as a lithium expert. 2) Input: [Document] 3) Classify whether the [Document] is related to an increase or decrease in lithium production. There are two options for your answer. - Option 1: Highly relevant (Related Event List), - Option 2: Irrelevant (Non-Related Event List), 4) First, describe why the provided information does or does not relate to an increase or decrease in lithium production. Then, place the option number on the last line.>>
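The four-part prompt and the "option number on the last line" convention lend themselves to a small builder and parser. The sketch below is hypothetical: it mirrors the structure of the example prompt rather than any specific model API, and it treats a reply as unparseable unless its last non-empty line contains exactly one digit, 1 or 2.

```python
def build_classification_prompt(target, document, related_events, unrelated_events):
    """Assemble a four-part document classification prompt like the example above."""
    return "\n".join([
        f"1) Please act as an expert on {target}.",
        f"2) Input: {document}",
        f"3) Classify whether the document is related to changes in {target}. Options:",
        f"   - Option 1: Highly relevant (related events: {', '.join(related_events)})",
        f"   - Option 2: Irrelevant (non-related events: {', '.join(unrelated_events)})",
        "4) First explain your reasoning, then put only the option number on the last line.",
    ])


def parse_option(reply):
    """Return 1 or 2 from the last non-empty line of the reply, else None."""
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    if not lines:
        return None
    digits = [ch for ch in lines[-1] if ch.isdigit()]
    if len(digits) != 1 or digits[0] not in "12":
        return None
    return int(digits[0])


prompt = build_classification_prompt(
    "lithium production", "[Document]", ["new mine opening"], ["copper strike"])
print(prompt)
print(parse_option("A new brine project raises output.\n1"))  # 1
```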

[0257] That is, the server computing system (130) can collect raw data related to the target and/or target influence variables, classify the prediction basis data related to the target and/or target influence variables from the raw data, determine from the classified prediction basis data the associated events and sentences that influence the target's outlook, and extract those sentences and associated events from the raw data as unstructured data.

[0258] Next, the server computing system (130) can use a language model to identify whether each feature stored in the data store belongs to a relevant semantic-level target influence variable, and can generate a structured dataset consisting of structured data for the relevant features. Here, features refer to attributes of data stored in a structured format (for example, CSV files, Excel files, and/or tables) that represent the various factors that can influence the outlook of the target.

[0259] For example, if the target is the price of lithium, the target influence variables are variables related to the price of lithium, such as “spodumene, lithium mine, lithium salt lake, lithium carbonate, lithium hydroxide, lithium battery,” and the features may be structured data representing “Australia spodumene production volume, Australia spodumene export volume, Chile lithium hydroxide production volume, Chile lithium hydroxide export volume, China spodumene import volume, China lithium carbonate import volume, China lithium carbonate production volume, China lithium carbonate sales volume, lithium battery efficiency (km/Wh), China electric vehicle sales volume, China electric vehicle subsidy plan,” which belong to the target influence variables and affect the outlook of the target.

[0260] That is, in one embodiment, the target influence variable may be a specific concept, topic, or category that influences the target outlook, and the feature may refer to the attributes of the structured data of the data repository related to the target influence variable.

[0261] The server computing system (130) can then filter, among the features of the data store, the relevant features related to the target influence variables and integrate the filtered features to generate structured data or a structured dataset.

[0262] To explain the process of generating structured data or a structured dataset in more detail, the server computing system (130) can first list the features available in the data store by feature name, together with a description of each feature.

[0263] At this time, the server computing system (130) can refine each description using an LLM before embedding it, so that the embedding better captures the important content of the feature description.

[0264] Among the listed features, the server computing system (130) can then filter those related to the target influence variables that can influence the target, based on their association with the target influence variables defined at a semantic level.

[0265] To this end, the server computing system (130) may utilize a machine learning model or a language model that classifies the relationship between features and target influence variables.

[0266] In one embodiment, the server computing system (130) lists the feature names and descriptions of the data store, inputs them together with the keywords of the target influence variables in the relationship information into a word embedding model, and detects, according to feature relevance, the feature names associated with the keywords of each target influence variable, thereby mapping the features classified under each target influence variable. Here, word embedding refers to representing words as vectors; the vectors of the feature names and descriptions are used to classify features by their relevance to the semantic-level target influence variables.

[0267] The server computing system (130) can then retrieve the tabular data corresponding to the names of the classified features from the data store, clean and preprocess it, and arrange it in a structured format suitable for input into a target prediction model, thereby generating a time-series structured data file (e.g., CSV, Excel, etc.). In this way, the server computing system (130) can collect accurate raw data that serves as the basis for target prediction based on the relationship information between the target and the target influence variables, and can precisely filter the tabular and unstructured data required for target prediction from the collected raw data and use them as input data for the target prediction model.
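The mapping step can be sketched as nearest-neighbour assignment in embedding space. The toy three-dimensional vectors stand in for real word embeddings (no particular embedding model is implied), and the similarity cut-off `min_sim` is an assumed parameter for discarding features unrelated to every target influence variable.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_features(feature_vecs: dict, variable_vecs: dict, min_sim: float = 0.5) -> dict:
    """Assign each feature to its most similar target influence variable; features whose
    best similarity is below `min_sim` are dropped as irrelevant."""
    mapping = {}
    for feature, fv in feature_vecs.items():
        best = max(variable_vecs, key=lambda var: cosine(fv, variable_vecs[var]))
        if cosine(fv, variable_vecs[best]) >= min_sim:
            mapping.setdefault(best, []).append(feature)
    return mapping


# Toy vectors standing in for embeddings of feature names/descriptions and variable keywords.
variables = {"spodumene": [1.0, 0.0, 0.0], "lithium battery": [0.0, 1.0, 0.0]}
features = {
    "Australia spodumene production volume": [0.9, 0.1, 0.0],
    "China electric vehicle sales volume": [0.2, 0.9, 0.1],
    "Copper price": [0.0, 0.0, 1.0],
}
print(map_features(features, variables))
```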
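As a sketch of arranging retrieved tables into a single time-series file, the function below outer-joins per-feature series on their dates and forward-fills gaps. Forward filling is only one possible preprocessing choice and is an assumption here, as is the CSV layout (one row per date, one column per feature) and the feature names in the example.

```python
import csv
import io


def build_time_series_csv(series: dict) -> str:
    """Outer-join per-feature {date: value} series on date, forward-fill gaps,
    and return CSV text with one row per date and one column per feature."""
    features = sorted(series)
    dates = sorted({d for s in series.values() for d in s})  # ISO dates sort lexically
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", *features])
    last = {f: "" for f in features}  # empty until a feature's first observation
    for d in dates:
        for f in features:
            if d in series[f]:
                last[f] = series[f][d]
        writer.writerow([d, *(last[f] for f in features)])
    return out.getvalue()


print(build_time_series_csv({
    "china_ev_sales": {"2025-01": 100, "2025-03": 120},
    "spodumene_export": {"2025-01": 5, "2025-02": 6},
}))
```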

[0268] In operation S107, the server computing system (130) can generate quantitative data by quantifying unstructured data. For example, the server computing system (130) can generate quantitative data by quantifying unstructured data through text processing for prediction.

[0269] First, the server computing system (130) can generate prediction scoring data by scoring the target prediction values for each target prediction report among the documents classified as unstructured data.

[0270] Specifically, in one embodiment, the server computing system (130) inputs each target forecast report into a language model operating according to a target forecast scoring prompt, which performs sentiment analysis on the associated sentences classified as target forecasts, classifies each target forecast as positive, neutral, or negative, and returns a numerical tone level. The server computing system (130) then arranges the resulting forecast scoring data in chronological order to generate quantitative data.

[0271] Specifically, when a target outlook report (or sentences associated with the target outlook extracted from the report) is input, the target outlook scoring prompt can be configured to classify the opinion on the target outlook in the input text as positive, neutral, or negative, and to select the tone of that opinion within a predetermined level range.
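Assuming the language model returns a label and an integer tone level for each report, turning those judgements into a chronological numeric series could look like this. The 0 to 5 tone scale and the signed normalization to [-1, 1] are assumptions; the disclosure only states that the tone lies within a predetermined level range.

```python
SIGN = {"positive": 1, "neutral": 0, "negative": -1}


def score_reports(judgements, max_tone=5):
    """Turn (date, label, tone) judgements into (date, score) pairs sorted by date,
    with score = sign(label) * tone / max_tone, so scores lie in [-1, 1]."""
    series = []
    for day, label, tone in judgements:
        if label not in SIGN or not 0 <= tone <= max_tone:
            raise ValueError(f"bad judgement: {label!r}, {tone!r}")
        series.append((day, SIGN[label] * tone / max_tone))
    return sorted(series)


print(score_reports([
    ("2025-03-01", "negative", 4),
    ("2025-01-15", "positive", 5),
    ("2025-02-10", "neutral", 2),
]))
```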

[0272] Additionally, the server computing system (130) can generate an event list based on associated events that affect the outlook of targets detected in documents when filtering unstructured data.

[0273] For example, the server computing system (130) can generate quantitative data in the form of an event list that records, for each event affecting the target's outlook, its date of occurrence, the related features and their values, and a quantified measure of its impact and influence on the target's outlook.

[0274] Additionally, the server computing system (130) can use the encoder of the language model to encode each document classified as unstructured data into a latent vector, and return an embedding matrix that captures the semantic essence of each document.

[0275] Specifically, the server computing system (130) can input documents such as news articles from the unstructured data into the encoder of a language model to generate a document embedding matrix for modeling the themes prevalent in each document. By identifying these prevalent themes using an algorithm such as LDA (Latent Dirichlet Allocation), the document embeddings can highlight themes (variables, features) that may influence the future outlook of the target.

[0276] In operation S109, the server computing system (130) can predict a target outlook based on the generated structured dataset and quantitative data.

[0277] Specifically, the server computing system (130) can calculate the forecast value of the target for each forecast unit period during the total forecast period based on the quantitative data and structured dataset, etc.

[0278] To this end, the server computing system (130) can create an integrated structured dataset by concatenating a structured dataset created based on structured data and a quantitative dataset created based on unstructured data.

[0279] Specifically, the server computing system (130) can first classify the data according to its degree of influence on the target and combine the data by assigning weights.

[0280] For example, among the features included in the structured dataset, the server computing system (130) can classify variables whose effect on the target is above a threshold as macro variables, and variables whose effect is below the threshold as micro variables.

[0281] The server computing system (130) can then match the macro variables with the quantitative data in time series and integrate them into one macro time-series structured dataset, and integrate the data classified as micro variables into one micro time-series structured dataset.

[0282] That is, in one embodiment, an integrated structured dataset containing both structured data information and unstructured data information can be generated by matching and combining event lists and prediction scoring data according to the time-series flow of the structured dataset.
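Matching event lists and prediction scoring data to the time-series flow of the structured dataset can be pictured as a join on a period key. In this hypothetical sketch, scores within a period are averaged, events are counted, and periods without scores get a neutral 0.0; all of these aggregation choices are assumptions.

```python
from collections import Counter, defaultdict


def integrate(rows: list, scores: list, events: list) -> list:
    """Add a per-period outlook score (mean) and event count to structured rows.

    rows:   dicts with a 'period' key (e.g. '2025-01') plus feature columns
    scores: (period, score) pairs from the forecast scoring step
    events: (period, description) pairs from the event list
    """
    score_lists = defaultdict(list)
    for period, s in scores:
        score_lists[period].append(s)
    event_counts = Counter(period for period, _ in events)
    merged = []
    for row in rows:
        vals = score_lists.get(row["period"], [])
        merged.append({
            **row,
            "outlook_score": sum(vals) / len(vals) if vals else 0.0,  # 0.0 = neutral
            "event_count": event_counts[row["period"]],
        })
    return merged


rows = [{"period": "2025-01", "ev_sales": 100}, {"period": "2025-02", "ev_sales": 110}]
print(integrate(rows, [("2025-01", 0.5), ("2025-01", 0.25)], [("2025-02", "mine strike")]))
```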

[0283] The server computing system (130) can then input the generated integrated structured dataset into a prediction model to calculate the forecast value of the target for each prediction unit period during the total forecast period. Here, the prediction model may include linear regression, a decision tree, a random forest, gradient boosting, a deep learning model, and/or a pre-trained language model.
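Of the model families listed above, linear regression is the simplest to show. The sketch fits an ordinary least-squares trend to the target history alone and extrapolates one value per prediction unit period; it is a univariate baseline for illustration, not the multivariate prediction model over the integrated dataset.

```python
def linear_trend_forecast(history: list, horizon: int) -> list:
    """Fit y = a + b*t by ordinary least squares on t = 0..n-1 and extrapolate
    the next `horizon` steps (one per prediction unit period)."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx             # slope
    a = y_mean - b * t_mean   # intercept
    return [a + b * t for t in range(n, n + horizon)]


print(linear_trend_forecast([10, 12, 14, 16], horizon=3))  # [18.0, 20.0, 22.0]
```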

[0284] In one embodiment, the server computing system (130) may additionally input relationship information at the semantic level into the prediction model to induce the prediction of a target outlook based on the relationship information. In addition, in one embodiment, the server computing system (130) may input the aforementioned embedding matrix into a second prediction model that predicts a target outlook based on the embedding matrix, thereby enabling unstructured target prediction information not present in the structured data to be reflected in the prediction value.

[0285] Specifically, in one embodiment, the server computing system (130) can input the integrated structured dataset into a first prediction model to calculate a first target forecast. The server computing system (130) can then adjust the first target forecast based on the semantic relationship graph to calculate a second target forecast that reflects the relationship information between the target influence variables and the target.

[0286] Finally, the server computing system (130) can calculate the final target forecast by calibrating the calculated second target forecast based on the unstructured target forecast information.

[0287] In operation S111, the server computing system (130) can generate basis information by interpreting the basis for the target outlook based on relation information and a structured dataset.

[0288] Referring to FIG. 14, the server computing system (130) can output basis information by interpreting the basis for the final target forecast based on semantic-level relationship information and a structured dataset.

[0289] Specifically, the server computing system (130) can generate a feature-level past relationship graph based on the historical target values up to the present in the structured dataset, the structured dataset itself, and the semantic relationship graph.

[0290] The server computing system (130) can then generate a feature-level future relationship graph from the final target forecast and the past relationship graph, using a relationship discovery model (data-driven causal discovery) trained with the structured dataset and the semantic relationship graph.

[0291] The server computing system (130) can then provide the future relationship graph mapped to the target forecast value and, as basis information, explain how the target forecast value was calculated in terms of the extent to which particular features influenced it.

[0292] FIG. 15 is an example of a chart for a predicted target outlook according to one embodiment of the present disclosure.

[0293] Referring to FIG. 15, the server computing system (130) can provide a target forecast graph showing the target forecast value calculated for each forecast unit period during the total forecast period through the user computing device (110).

[0294] FIG. 16 is an example of a causal relationship graph presented as supporting data for a predicted target outlook according to one embodiment of the present disclosure.

[0295] Referring to FIG. 16, the server computing system (130) can provide a relationship graph at the feature level, which interprets the basis for the prediction of the target forecast, as basis information through the user computing device (110).

[0296] FIG. 17 is another example of a causal relationship graph presented as supporting data for a predicted target outlook according to one embodiment of the present disclosure.

[0297] Referring to FIG. 17, the server computing system (130) can further increase the user's confidence in the target forecast by displaying the specific numerical values of the features that influenced the predicted target forecast at a specific forecast point in time.

[0298] In this way, the server computing system (130) can output and provide the final target forecast and supporting information calculated for the target in response to a target prediction request.

[0299] In addition, the server computing system (130) can receive a hypothetical situation (what-if) from a user and perform a simulation of the input hypothetical situation to predict a hypothetical target outlook.

[0300] In operation S113, when the server computing system (130) receives input from a user regarding an assumption situation for changing the prediction environment, it can predict the assumption target outlook by performing a simulation again to predict the target outlook for the target prediction request according to the input assumption situation, and by providing the target outlook value and supporting information in the changed environment (i.e., assumption situation) again. Specifically, referring to FIG. 14, the user can input a change in the prediction environment by changing the target influence variable (hereinafter, target value) that affects the target outlook value through the user computing device (110) or by inputting the occurrence of a specific event (hereinafter, event value).

[0301] For example, regarding a target prediction request such as “predict how lithium prices will go in the future on a monthly basis for 12 months,” a change in target value means changing ‘lithium’ to lithium carbonate and / or lithium hydroxide, etc., and a change in event value may mean adding events such as war situations or supply and demand situations.

[0302] In one embodiment, when a target influence variable changes, the server computing system (130) may change the integrated structured data set according to the changed variable, re-execute the process of interpreting the target forecast and its basis (i.e., run the simulation), and provide the resulting target forecast and basis information to the user computing device (110).

[0303] For convenience of explanation, the target outlook predicted without reflecting assumptions will be referred to as the 'General Result (or Target Outlook),' and the target outlook predicted by reflecting assumptions will be referred to as the 'Assumption Result (or Target Outlook).' In addition, the process of predicting the assumption target outlook by reflecting changes in target influence variables will be referred to as 'Assumption Simulation.'

[0304] In addition, since the simulation of the assumption situation in this embodiment is based on steps S101 to S111, the description below focuses only on the differences.

[0305] FIG. 18 is a flowchart of a method for performing a hypothetical situation simulation according to one embodiment of the present disclosure.

[0306] Referring to FIG. 18, in operation S201, the server computing system (130) can extract and obtain the assumptions included in the target prediction request upon receiving the target prediction request that includes the assumptions.

[0307] For example, if the existing target prediction request was “Predict what the price of lithium will be on a monthly basis for 12 months” without reflecting assumptions, the target prediction request reflecting assumptions could be entered as “Predict what the price of lithium will be on a monthly basis for 12 months in the event of a war between China and Taiwan.”

[0308] In other words, in the above example, the occurrence of a specific event, the 'breakout of a war between China and Taiwan,' can be extracted as a hypothetical situation.

[0309] Additionally, the server computing system (130) may perform counterfactual inference to predict a hypothetical target outlook through a what-if simulation. Throughout this disclosure, counterfactual inference may be referred to as conditional inference.

[0310] Here, counterfactual inference refers to predicting the result that would be obtained if a situation assumed in the form of a scenario were to occur. Such counterfactual inference can be used to specify causal paths and remove their effects, thereby reducing recommendation bias in the existing target outlook.

[0311] To this end, given a specific path, namely the hypothetical situation, as input, the server computing system (130) can derive a first hypothetical result, which is the counterfactual result when the hypothetical situation occurs, and a first general result, which is the realistic result when it does not occur. For example, when the hypothetical situation “when a China-Taiwan war occurs” is added to the target prediction request text “predict how lithium prices will be in the future on a monthly basis for 12 months,” the server computing system (130) can derive a first hypothetical result, which is the counterfactual result when a China-Taiwan war occurs, and a first general result, which is the realistic result when a China-Taiwan war does not occur.

[0312] That is, through counterfactual reasoning, the server computing system (130) can simulate how the hypothetical result differs from the general result when an artificial intervention is applied to the observed data distribution.

[0313] In other words, the server computing system (130) can derive a first assumption result by setting a first assumption target to derive a counterfactual result according to the assumption situation based on counterfactual reasoning.

[0314] Since the first general result is derived in the same way as in steps S101 to S111, its description is omitted, and the process for deriving the first assumption result is explained below.

[0315] In operation S203, the server computing system (130) can determine, based on a vector database, a first similar situation for the obtained assumption situation. In one embodiment, the server computing system (130) can determine target prediction elements by analyzing the text of the acquired assumption situation and performing a target prediction task. Based on the determined target prediction elements, the server computing system (130) can then search a data store (in this example, a vector database) for at least one similar situation whose similarity to the first assumption target, in terms of the target and the target influence variable, is greater than or equal to a preset reference value. The server computing system (130) can then select, as the first similar situation, the situation with the highest similarity among those found. That is, the first similar situation may be the situation (news item) in the data store that is most similar to the assumption situation.
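The threshold-then-best selection of operation S203 can be sketched as follows. This is a minimal illustration assuming that situations and the assumption situation are already embedded as vectors and that cosine similarity is the similarity measure; the function name `find_first_similar_situation` and the example embeddings are hypothetical, and the disclosure does not specify the embedding model or the metric.

```python
import numpy as np

def find_first_similar_situation(query_vec, situation_vecs, situation_ids, threshold):
    """Hypothetical helper: return (best_id, best_score, candidates).

    query_vec: embedding of the assumption situation, shape (d,)
    situation_vecs: embeddings of stored situations (news), shape (n, d)
    Cosine similarity is an assumption, not taken from the disclosure.
    """
    q = query_vec / np.linalg.norm(query_vec)
    m = situation_vecs / np.linalg.norm(situation_vecs, axis=1, keepdims=True)
    scores = m @ q  # cosine similarity of every stored situation to the query
    # Keep only situations at or above the preset reference value.
    candidates = [(situation_ids[i], float(scores[i])) for i in np.flatnonzero(scores >= threshold)]
    if not candidates:
        return None, None, []
    best_id, best_score = max(candidates, key=lambda c: c[1])  # highest similarity wins
    return best_id, best_score, candidates

ids = ["news-001", "news-002", "news-003"]
vecs = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
best_id, best_score, found = find_first_similar_situation(
    np.array([1.0, 0.0, 0.0]), vecs, ids, threshold=0.6)
```

Here two stored items pass the threshold and the closest one becomes the first similar situation.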

[0316] In operation S205, the server computing system (130) can predict a similar target outlook for the determined first similar situation. In one embodiment, the server computing system (130) can search for attributes of the first similar situation (e.g., its time and/or other associated targets) through a large language model (LLM) and/or data tagging, and predict the similar target outlook based on the retrieved content. At this time, the server computing system (130) can predict the similar target outlook using a retrieval-augmented generation (RAG) model that collects the latest information available as of the time of the first similar situation. In this way, a hypothetical past anchored at the time of the first similar situation can be determined.

[0317] In operation S207, the server computing system (130) can determine the hypothetical impact by comparing actual data for the first similar situation and the predicted similar target outlook.

[0318] In operation S209, the server computing system (130) can calculate the hypothesis relevance and the hypothesis similarity between the determined hypothesis impact and the hypothetical situation. In one embodiment, the hypothesis relevance can be calculated through a large language model (LLM) to determine how relevant the hypothesis impact is to the hypothetical situation. Similarly, the hypothesis similarity can be calculated through a large language model (LLM) to determine how similar the hypothesis impact is to the hypothetical situation.

[0319] In operation S211, the server computing system (130) can predict a first assumption target forecast (or first assumption result) by applying the hypothesis impact, hypothesis relevance, and hypothesis similarity (hereinafter, the hypothesis dataset) to the assumption situation as of the current time.
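Operations S207 to S211 can be illustrated with a small numerical sketch. The disclosure does not state how the hypothesis dataset is combined, so the forms below are assumptions chosen for illustration only: the hypothesis impact is taken as the per-step difference between the actual data and the predicted similar target outlook, and the LLM-scored relevance and similarity (assumed to lie in [0, 1]) scale that impact before it is added to the general forecast. The function names are hypothetical.

```python
import numpy as np

def hypothesis_impact(actual, predicted_similar):
    # S207 (assumed form): per-step gap between what actually happened after the
    # similar situation and the outlook predicted for that situation.
    return np.asarray(actual, dtype=float) - np.asarray(predicted_similar, dtype=float)

def assumption_forecast(general_forecast, impact, relevance, similarity):
    # S211 (assumed form): weight the impact by relevance * similarity and add it
    # to the general forecast; horizons are assumed to have equal length.
    weight = float(relevance) * float(similarity)
    return np.asarray(general_forecast, dtype=float) + weight * impact

impact = hypothesis_impact(actual=[110.0, 125.0, 130.0],
                           predicted_similar=[100.0, 105.0, 110.0])
result = assumption_forecast(general_forecast=[200.0, 202.0, 204.0],
                             impact=impact, relevance=0.8, similarity=0.5)
```

With an impact of (10, 20, 20) and a weight of 0.4, the assumption result shifts the general forecast by (4, 8, 8). A multiplicative or learned combination would be equally plausible under the text.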

[0320] FIG. 19 is a graph relating to general results and assumption results according to one embodiment of the present disclosure.

[0321] Referring to FIG. 19, the server computing system (130) can compare the derived first general result (1910) with the first assumption result (1920) and provide the difference between the two results to the user as visualized data. In another embodiment, the server computing system (130) can receive input regarding a change in the prediction environment caused by the occurrence of a specific event. In this case, if the occurrence of the specific event can be quantitatively reflected in the event list, the server computing system (130) can calculate the changed quantitative data, change the integrated structured data set accordingly, and re-execute the process of interpreting the target forecast and its basis. It can thereby output target forecast and basis information from the assumption situation simulation and provide it to the user computing device (110).

[0322] An artificial intelligence prediction model according to one embodiment may be further trained according to the ESSformer (Efficient Segment-based Sparse Transformer) method to capture both long-term temporal dependencies and dependencies between features of different variables. This will be explained below.

[0323] Time series forecasting is a fundamental machine learning task that aims to predict future events based on past observations. These forecasting problems often require long-term predictions and can involve various variables. For example, stock price forecasting may require predicting multiple market values over a long time axis. In these multivariate long-term time-series forecasting (M-LTSF) problems, it is important to capture both the long-term temporal dependencies between past and future events and the dependencies between the features of different variables.

[0324] In recent years, many deep neural architectures, including linear models, state-space models, and recurrent neural networks (RNNs), have been developed for the M-LTSF problem. Among them, the Transformer is a neural network that learns context and meaning by tracking relationships within sequential data, such as the words in a sentence. Transformer models have shown remarkable performance in fields such as language and image processing, and because they can capture long-range relationships, they are also being studied for multivariate long-term time series forecasting. For example, as shown in FIG. 20a, Transformer models in which each single observation becomes one token have been used for time series forecasting. More recently, segment-based Transformer models have been proposed, as shown in FIG. 20b, in which each token represents a contiguous group of observations rather than a single observation. However, self-attention in a segment-based Transformer converts each segment into one token: prediction performance improves as the segments become finer, but finer segments greatly increase the number of tokens and therefore the computational cost of attention. In addition, as shown in FIG. 20b, inter-feature attention, which finds associations between features, can become highly inefficient when the number of features is very large. To solve these problems, one embodiment of the present disclosure provides a time series prediction method that maintains prediction performance at a lower attention cost for finely subdivided segments, and that also maintains performance in inter-feature attention with a large number of features. The Transformer model provided by one embodiment of the present disclosure may be referred to as an Efficient Segment-based Sparse Transformer (ESSformer).

[0325] FIG. 21 is a schematic diagram of an ESSformer (Efficient Segment-based Sparse Transformer) block according to one embodiment of the present disclosure.

[0326] Referring to FIG. 21, Dimension-Segment-Wise (DSW) embedding can be performed to process historical time series information. In DSW embedding, the series of each dimension is divided into segments, and each segment is embedded into a feature vector. The output of DSW embedding can be a 2D array of vectors whose two axes are time and dimension. Two attention layers can be used to efficiently capture the cross-time and cross-dimension dependencies in this vector array.

[0327] In one embodiment, the ESSformer block (2100) may include sparse attention modules customized for a segment-based transformer. In one embodiment, the ESSformer may include a Dilated Attention (DilA) module (2110) that periodically learns interactions between distant segments to efficiently capture temporal dependencies, and a Random-Partition Attention (R-PartA) module (2120) that captures inter-feature dependencies. The DilA module (2110) may be an attention module in the temporal dimension, and the R-PartA module (2120) may be an attention module in the feature dimension. That is, the DilA module (2110) may be a model that efficiently learns temporal dependencies, and the R-PartA module (2120) may be a model that efficiently learns inter-feature dependencies.

[0328] In the following, the ESSformer block (2100) will be described in more detail using formulas. In one embodiment, the DilA module (2110) can be designed by combining dilated attention with a stride P and block-diagonal attention with a block size P, based on the observation that periodic patterns appear in the self-attention matrices of segment-based Transformers. Through this, given $N_S$ segments as input, the computational cost of the temporal attention layer can be reduced from $O(N_S^2)$ to $O(N_S^{3/2})$ when the period is chosen as $P = \sqrt{N_S}$.

[0329] In one embodiment, the R-PartA module (2120) can be designed to capture dependencies among various features by randomly partitioning the features into groups of the same size $S_G$ and masking the attention matrices between different groups. With this design, when the number of features is D, the attention computation cost can be reduced from $O(D^2)$ to $O(D S_G)$. According to one embodiment, the randomness inherent in the random partitioning of the R-PartA module (2120) can enable efficient and effective learning. In addition, according to one embodiment, a test-time ensemble technique used in the inference step can resolve the limitation that masked attention cannot capture relationships between features assigned to different groups.

[0330] In one embodiment, the observation of the D variables at time t can be expressed as $x_t = (x_{t,1}, \ldots, x_{t,D}) \in \mathbb{R}^D$, where $x_{t,d}$ represents the real-valued observation of the d-th feature at time t. The goal of time series forecasting may be to predict the future observations $x_{T+1:T+\tau}$ based on the previous observations $x_{1:T}$. Here, T is the length of the past time window, and $\tau$ represents the length of the future time window. One embodiment of the present disclosure aims to provide an efficient time series forecasting method for multivariate long-term time series forecasting, i.e., the case where $D > 1$ and $\tau \gg 1$.

[0331] In one embodiment, the multivariate time series observations can be divided into $N_S$ segments of the same length. That is, the b-th segment of the d-th feature can be expressed as [Equation 11] below.

[0332] [Mathematical Formula 11]

[0333]

[0334] In one embodiment, each segment is passed through a linear layer to embed it into the latent space, and a learnable time encoding and a feature-based position encoding are added to it, so that the input can be expressed as [Equation 12] below.

[0335] [Mathematical Formula 12]

[0336]

[0337] Given this initial representation $H^{(0)}$ as input, a segment-based Transformer encoder with L layers outputs the final representation $H^{(L)}$, which can be passed through a decoder to predict future observations.

[0338] In one embodiment, a linear decoder is used, so that $H^{(L)}$ can be mapped to the future observations by a single linear layer.
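The data flow of [0331] to [0338] (segmentation, embedding with two learnable encodings, and a single-linear-layer decoder) can be traced with array shapes. This is a minimal sketch assuming a (T, D) input, T divisible by $N_S$, random weights standing in for learned parameters, and a decoder that flattens each feature's segment representations; the encoder layers are skipped, so $H^{(0)}$ is fed to the decoder in place of $H^{(L)}$. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def segment(x, n_seg):
    """Split x of shape (T, D) into n_seg equal-length segments per feature.
    Returns shape (D, n_seg, seg_len); element [d, b, i] = x[b * seg_len + i, d]."""
    T, D = x.shape
    assert T % n_seg == 0, "T must be divisible by the number of segments"
    return x.T.reshape(D, n_seg, T // n_seg)

def embed(segs, W, time_enc, feat_enc):
    """Linear projection of each segment plus a time encoding (per segment index)
    and a feature encoding (per feature). Returns H0 of shape (D, N_S, d_model)."""
    return segs @ W + time_enc[None, :, :] + feat_enc[:, None, :]

def linear_decode(H, W_out):
    """Single linear layer from each feature's flattened representation to the
    forecast horizon. H: (D, N_S, d_model) -> predictions of shape (horizon, D)."""
    D = H.shape[0]
    return (H.reshape(D, -1) @ W_out).T

T, D, N_S, d_model, horizon = 24, 3, 6, 8, 12
x = rng.normal(size=(T, D))
segs = segment(x, N_S)                                     # (3, 6, 4)
H0 = embed(segs,
           rng.normal(size=(T // N_S, d_model)),           # segment -> latent
           rng.normal(size=(N_S, d_model)),                # time encoding
           rng.normal(size=(D, d_model)))                  # feature encoding
y_hat = linear_decode(H0, rng.normal(size=(N_S * d_model, horizon)))  # (12, 3)
```

Keeping the (feature, segment) axes separate is what later lets temporal attention run along the segment axis and inter-feature attention run along the feature axis.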

[0339] Below, an ESSformer according to an embodiment of the present disclosure will be described based on the above expressions. In one embodiment, given the input segment representation $H^{(0)}$, each layer of the ESSformer can be expressed as [Equation 13] and [Equation 14] below.

[0340] [Mathematical Formula 13]

[0341]

[0342] [Mathematical Formula 14]

[0343]

[0344] The DilA module (2110) is described below. In one embodiment, to capture temporal relationships from the input segments, the DilA module (2110) processes the input through two attention modules (2112, 2114), each of which can discover distinct temporal relationships. In one embodiment, the attention modules (2112, 2114) may be multi-head self-attention (MHSA) modules. For relationships within a period, the block-diagonal attention module (2112) with block size P can mix features among segments within the same period. For relationships between periods, the dilated attention module (2114) with stride P can periodically share representations among distant segments for longer-range contextualization.

[0345] Here, Q, K, and V are the query, key, and value, respectively, and MHSA(Q, K, V) denotes a vanilla MHSA layer. Given a set of indices C, indexing with C (e.g., $H[C]$) denotes selecting all entries whose indices are included in C. The step-by-step procedure of the DilA module (2110) can then be expressed as [Equation 15] and [Equation 16] below.

[0346] [Mathematical Formula 15]

[0347]

[0348] [Mathematical Formula 16]

[0349]

[0350] Here, [j :: P] represents a set of indices starting from j with a stride of P. That is, [j :: P] := {j, j+P, j+2P, ...}. In one embodiment, the block-diagonal attention module captures relationships within a period through [Equation 15], and relationships between periods can be considered through [Equation 16].
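The two DilA steps can be sketched for a single feature by reshaping rather than masking. This is a minimal sketch, not the disclosed implementation: it assumes single-head attention with Q = K = V = H (no learned projections, no residual or normalization), and $N_S$ divisible by P. Step one attends within consecutive blocks of P segments (block-diagonal attention); step two attends within each index set [j :: P] (dilated attention). The dense masked version is included only to check that the reshaped computation equals full attention restricted by the block-diagonal and strided masks.

```python
import numpy as np

def attention(X):
    """Single-head self-attention with Q = K = V = X over axis -2. X: (..., n, d)."""
    scores = X @ np.swapaxes(X, -1, -2) / np.sqrt(X.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ X

def dilated_attention_block(H, P):
    """DilA sketch for one feature. H: (N_S, d), N_S divisible by P."""
    n, d = H.shape
    assert n % P == 0
    # Block-diagonal step: row j of `blocks` holds segments jP, ..., jP + P - 1.
    H1 = attention(H.reshape(n // P, P, d)).reshape(n, d)
    # Dilated step: after the transpose, group j holds segments j, j + P, j + 2P, ...
    strided = H1.reshape(n // P, P, d).transpose(1, 0, 2)        # (P, n // P, d)
    return attention(strided).transpose(1, 0, 2).reshape(n, d)

def masked_attention(X, mask):
    """Dense reference: full attention with disallowed pairs set to -inf."""
    scores = np.where(mask, X @ X.T / np.sqrt(X.shape[-1]), -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ X

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
out = dilated_attention_block(H, P=2)
idx = np.arange(6)
block_mask = idx[:, None] // 2 == idx[None, :] // 2    # same block of P
dilated_mask = idx[:, None] % 2 == idx[None, :] % 2    # same residue mod P
ref = masked_attention(masked_attention(H, block_mask), dilated_mask)
```

The reshaped form never builds the $N_S \times N_S$ score matrix: the block step computes $N_S P$ scores and the dilated step $N_S^2 / P$ scores.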

[0351] If the DilA module (2110) is not used, encoding $N_S$ segments through self-attention requires $O(N_S^2)$ computational cost, which can become difficult to handle for time series data with a large T. Extending the duration of each segment reduces $N_S$, but in Transformer-based generative modeling, coarser segment granularity lowers inference quality. Accordingly, since time series forecasting is similar to generating future observations conditioned on past signals, an efficient architecture whose asymptotic cost is sub-quadratic in the number of segments is required. To address this limitation, the DilA module (2110) according to one embodiment applies block-diagonal and strided sparse attention masks, thereby reducing computational cost without significantly sacrificing the expressiveness of self-attention.

[0352] In one embodiment, sparse attention refers to attention that reduces computational complexity by imposing a sparsity bias on the attention matrix (a matrix filled with non-zero elements is called a dense matrix, and a matrix containing many zeros is called a sparse matrix). Sparse attention may include position-based sparse attention, content-based sparse attention, and the like.

[0353] The periodically dilated sparsity structure according to one embodiment was inspired by plots of the attention score matrices of various Transformer models trained on M-LTSF. In one embodiment, the period is set to $P^* = \sqrt{N_S}$, so that the time and memory complexity can be reduced from $O(N_S^2)$ to $O(N_S^{3/2})$. Periodic sparse attention using $P^*$ may be sufficient to preserve the downstream performance of full attention.
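The choice of the period can be checked with a short derivation. It counts attention-score entries and assumes $N_S$ is a multiple of P, with $N_S / P$ blocks of size P in the block-diagonal step and P strided groups of size $N_S / P$ in the dilated step; it is a worked illustration of that counting assumption rather than a formula reproduced from the disclosure.

```latex
\begin{aligned}
C(P) &= \underbrace{\frac{N_S}{P}\,P^2}_{\text{block-diagonal}}
      + \underbrace{P\left(\frac{N_S}{P}\right)^2}_{\text{dilated}}
      = N_S P + \frac{N_S^2}{P}, \\
\frac{dC}{dP} &= N_S - \frac{N_S^2}{P^2} = 0
      \;\Longrightarrow\; P^* = \sqrt{N_S}, \\
C(P^*) &= N_S^{3/2} + N_S^{3/2} = 2\,N_S^{3/2} = O\!\left(N_S^{3/2}\right)
      \quad \text{versus } N_S^2 \text{ for full attention.}
\end{aligned}
```

For example, with $N_S = 64$ the optimum is $P^* = 8$, giving $512 + 512 = 1024$ score entries instead of $4096$.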

[0354] [Mathematical Formula 17]

[0355]

[0356] Since this operation considers only interactions within each group, the computational cost can be reduced from $O(D^2)$ to $O(D S_G)$. However, if the prediction procedure is executed only once during the inference phase, only the relationships among the subset of features within each group are considered. To address this limitation, in which not all inter-feature information is utilized, the test-time ensemble method executes the prediction procedure $N_E$ times, each with a different random partition, and ensembles (e.g., averages) the $N_E$ predicted outputs. The ensemble procedure can be performed according to [Algorithm 2] below.

[0357] [Algorithm 2]

[0358]

[0359] According to one embodiment of the present disclosure, the R-PartA module (2120) can not only reduce computational costs but also increase prediction performance.

[0360] In the description with reference to FIG. 21, the case in which both the DilA module (2110) and the R-PartA module (2120) are used in the ESSformer block (2100) was described as an example, but it is obvious that only one of the DilA module (2110) and the R-PartA module (2120) can be used.

[0361] FIG. 22 is a flowchart illustrating a method for generating prediction data according to one embodiment of the present disclosure.

[0362] Referring to FIG. 22, in operation 3310, the method for generating prediction data may include the operation of dividing input data along a time axis to generate one or more segments. In one embodiment, the input data may include input time series data (2210) in the example of FIG. 21. The input time series data (2210) may be multivariate time series data. Such input data may be segmented along a time axis to generate an input sequence or an input segment. Accordingly, a system for generating prediction data may include a segmentation module (2240) for dividing input data along a time axis to generate one or more segments. An ESSformer block (2100) according to one embodiment of the present disclosure may include a neural network for generating prediction data (2230) using such an input sequence (2220). The segmentation module (2240) may be located outside the ESSformer block (2100) as in FIG. 21, or it may be located inside the ESSformer block (2100).

[0363] In operation 3330, the method for generating prediction data may include an operation of randomly distributing features of the input data. Although operation 3330 is depicted in FIG. 22 as being located between operations 3310 and 3350, this is merely an example, and operation 3330 can be performed regardless of order as long as it is before operation 3370. For example, operation 3330 may be performed before operation 3310, or between operations 3350 and 3370.

[0364] In one embodiment, the system may include a random partitioning module (2250) that randomly distributes features of input data. In one embodiment, the partitioning information (2260) of the features partitioned by the random partitioning module (2250) may be used when using a second neural network to extract dependencies between features.

[0365] In one embodiment, the random partitioning module (2250) may be located outside the ESSformer block (2100) as shown in FIG. 21, or it may be located inside the ESSformer block (2100). For example, when the random partitioning module (2250) and the segmentation module (2240) are located outside the ESSformer block (2100), the ESSformer block (2100) may receive the feature partitioning information (2260) and the segmented input sequence (2220) as inputs and use them to generate the prediction data (2230).

[0366] In operation 3350, the prediction data generation method may include an operation of determining temporal relationship information of the input data using a first neural network. The first neural network may include a neural network that applies dilated attention to an input sequence segmented along the time axis.

[0367] In one embodiment, at least one processor performing the prediction data generation method may rearrange the segmented input sequence based on a predetermined period and perform multi-head self-attention (MHSA) on the rearranged data. Here, the predetermined period may be the period P.

[0368] For example, consider the case of six segments as shown in FIG. 21, numbered sequentially from the beginning as Segment #0, Segment #1, ..., Segment #5. In this case, the predetermined period may be 2. At least one processor may first rearrange the input sequence by cutting it into consecutive blocks of one period each, so that the first rearranged data (2270) becomes {Segment #0, Segment #1}, {Segment #2, Segment #3}, {Segment #4, Segment #5}. Time-axis dependencies may be extracted by applying the first MHSA (2112) to this first rearranged data. Because dependencies between distant segments are difficult to identify in this arrangement, at least one processor may also rearrange the input sequence by grouping segments that are one period apart, so that the second rearranged data (2280) becomes {Segment #0, Segment #2, Segment #4}, {Segment #1, Segment #3, Segment #5}. At least one processor may then identify dependencies between segments one period apart by applying the second MHSA (2114) to the second rearranged data (2280). That is, the first neural network may include a first MHSA (MultiHead Self-Attention) module (2112) that, based on rearranging the input sequence segmented along the time axis for each feature, extracts features among segments within the same period, and a second MHSA module (2114) that extracts features across periods among periodically separated segments.
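The two rearrangements in the six-segment example can be reproduced with plain index arithmetic. This sketch only computes which segment numbers end up in each group; the helper names `block_groups` and `dilated_groups` are hypothetical.

```python
def block_groups(n_segments, period):
    """First rearrangement: consecutive blocks of `period` segments."""
    return [list(range(s, min(s + period, n_segments)))
            for s in range(0, n_segments, period)]

def dilated_groups(n_segments, period):
    """Second rearrangement: segments j, j + period, j + 2 * period, ... (index set [j :: P])."""
    return [list(range(j, n_segments, period)) for j in range(period)]

first = block_groups(6, 2)
second = dilated_groups(6, 2)
print(first)   # [[0, 1], [2, 3], [4, 5]]
print(second)  # [[0, 2, 4], [1, 3, 5]]
```

Every pair of segments is connected after the two steps, either directly or through an intermediate segment: for instance, Segment #1 reaches Segment #4 through Segment #0, which shares a block with #1 and a stride group with #4.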

[0369] In one embodiment, temporal relationship information of the input data can be determined by a first neural network. According to one embodiment, prediction performance can be maintained while extracting only the temporal dependencies between segments within a period and the temporal dependencies between segments separated by a period, rather than extracting all temporal dependencies between all segments. That is, according to one embodiment, by extracting the dependencies between consecutive segments and the dependencies between segments separated by a certain period, the complexity of the calculation can be reduced while maintaining prediction performance.

[0370] In operation 3370, the method for generating prediction data may include an operation of determining characteristic relationship information of input data using a second neural network. In one embodiment, the second neural network may include a neural network that applies random partition attention to data arranged along feature axes. The second neural network may use data arranged along feature axes of the output data of the first neural network, or may use data arranged along feature axes of segmented input sequences.

[0371] In one embodiment, the second neural network may include a third MHSA module (2290) that extracts dependencies between features based on rearranging the features of the input data according to the partitioning information (2260) determined by the random partitioning module (2250). For example, if there are a total of four features from top to bottom, such as feature #1, feature #2, feature #3, and feature #4, and the partitioning information (2260) is {feature #4, feature #2} and {feature #3, feature #1} as shown in FIG. 2, at least one processor may generate third rearranged data by rearranging each data aligned along the feature axis into {feature #4, feature #2} and {feature #3, feature #1}. Additionally, at least one processor may apply MHSA to the third rearranged data and rearrange it again based on the partitioning information (2260) to restore the order of the features. Through this, at least one processor can determine characteristic relationship information of input data using a second neural network.
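The gather-and-restore step around the third MHSA can be traced with the four-feature example. In this sketch, rows are labeled strings, and the per-group MHSA is replaced by a hypothetical stand-in that just tags each row with the group it was processed in, so the restored order is visible. The helper names are hypothetical.

```python
def rearrange_by_partition(rows, partition):
    """Gather rows (dict keyed by feature number) group by group, following `partition`."""
    order = [f for group in partition for f in group]
    return order, [rows[f] for f in order]

def restore_order(order, rearranged):
    """Scatter processed rows back to their original feature positions."""
    by_feature = dict(zip(order, rearranged))
    return [by_feature[f] for f in sorted(by_feature)]

rows = {f: f"row{f}" for f in range(1, 5)}   # feature #1 .. feature #4
partition = [[4, 2], [3, 1]]                 # partitioning information (2260)
order, third = rearrange_by_partition(rows, partition)

# Stand-in for the third MHSA (2290): process each group separately.
processed, start = [], 0
for g, group in enumerate(partition):
    processed += [f"{row}@group{g}" for row in third[start:start + len(group)]]
    start += len(group)

restored = restore_order(order, processed)
```

After restoration, feature #1 and feature #3 carry the tag of the group they shared, and feature #2 and feature #4 carry the tag of theirs, while the rows are back in feature order.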

[0372] In operation 3390, the method for generating prediction data may include an operation of generating prediction data based on temporal relationship information and characteristic relationship information. In one embodiment, the prediction data may be generated based on temporal relationship information determined using a first neural network and characteristic relationship information determined using a second neural network. That is, at least one processor may generate prediction data by processing an input sequence segmented along the time axis using neural networks.

[0373] In one embodiment, at least one of the first neural network and the second neural network may include a sparse attention module.

[0374] [Table 2] is a table showing the performance of an ESSformer (Efficient Segment-based Sparse Transformer) block according to one embodiment of the present disclosure.

[0375] [Table 2]

[0376]

[0377] Referring to [Table 2], the ESSformer method achieves the most efficient computational complexity among the various segment-based Transformers. For example, FIG. 4 illustrates that the ESSformer achieved the best performance in 27 of the 28 M-LTSF tasks and placed second in the remaining task. According to one embodiment of the present disclosure, the ESSformer method can therefore not only reduce computational complexity but also improve prediction performance.

[0378] FIG. 24 is a schematic diagram of a time series forecasting system according to one embodiment of the present disclosure.

[0379] Referring to FIG. 24, the time series prediction system may be a GPU (graphics processing unit) or may include a GPU. The time series prediction system may include an encoder module (2410), a decoder module (2480), a first linear operation module (2450), and the like. In one embodiment, the time series prediction system may acquire an observation sequence (2420), which may include, for example, a past observation sequence. In the example of FIG. 24, the observation sequence (2420) may include x0 at time t=0, x1 at time t=1, and x2 at time t=2. In one embodiment, the time series prediction system may further acquire time information, which may include the agent time (2430), the time point at which each item of the observation sequence (2420) was obtained, and the like.

[0380] In one embodiment, the agent time (2430) may include the timestamp at which each agent made an observation. By acquiring the agent time (2430), the encoder module (2410) can generate different representations depending on the time in an asynchronous multi-agent system. The normalization layer of the encoder can correct time differences caused by asynchronous collection through agent time-based modulation. Additionally, the decoder can generate output predictions by modulating with the same or different condition parameters.

[0381] In one embodiment, the encoder module (2410) can take the observation sequence (2420) and the time information as inputs and generate a latent sequence (2440) of a first time point. For example, the first time point may refer to the time point at which the observation sequence (2420) was acquired; that is, if the observation sequence (2420) was observed at a past time point, the latent sequence (2440) of the first time point may be a latent sequence of that past time point. Each latent representation in the latent sequence may be a compressed encoding of the corresponding observation. For example, the encoder module (2410) can generate the latent sequence {z0, z1, z2} of the first time point based on the observation sequence {x0, x1, x2} and the agent time (2430): it generates the first time step z0 of the latent sequence from the first time step x0 of the observation sequence, the second time step z1 from x1, and the third time step z2 from x2.

[0382] In one embodiment, the encoder module (2410) may include a first linear layer, a SiLU (Sigmoid Linear Unit), a second linear layer, and an AdaLN (Adaptive Layer Normalization) layer. For example, when the encoder module (2410) receives the observation sequence $x_0, x_1, \ldots, x_k$ together with the timestamps and agent time information, it can generate the latent representations $z_0, z_1, \ldots, z_k$ of the respective time points through the first linear layer, the SiLU, the second linear layer, the non-linear transformation layer, and the AdaLN layer, and thereby generate the latent sequence (2440) of the first time point as the set of these representations.
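The encoder path of [0382] can be sketched as Linear, SiLU, Linear, then AdaLN conditioned on agent time. This is a minimal sketch under several assumptions not fixed by the disclosure: AdaLN is taken as LayerNorm(h) * (1 + gamma) + beta, with gamma and beta predicted linearly from a sinusoidal embedding of the agent time; the separate non-linear transformation layer and the timestamp input are omitted; and random weights stand in for learned ones. The class and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def silu(x):
    return x / (1.0 + np.exp(-x))            # SiLU(x) = x * sigmoid(x)

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def time_features(t):
    """Hypothetical 2-dim embedding of agent time: (k,) -> (k, 2)."""
    t = np.asarray(t, dtype=float)[:, None]
    return np.concatenate([np.sin(t), np.cos(t)], axis=-1)

class Encoder:
    """Sketch: Linear -> SiLU -> Linear -> AdaLN conditioned on agent time."""
    def __init__(self, d_in, d_hidden, d_latent, rng):
        self.W1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
        self.W2 = rng.normal(scale=0.1, size=(d_hidden, d_latent))
        # Maps agent-time features to per-observation scale (gamma) and shift (beta).
        self.Wc = rng.normal(scale=0.1, size=(2, 2 * d_latent))

    def __call__(self, x, agent_time):
        h = silu(x @ self.W1) @ self.W2
        gamma, beta = np.split(time_features(agent_time) @ self.Wc, 2, axis=-1)
        return layer_norm(h) * (1.0 + gamma) + beta

x = rng.normal(size=(3, 4))                  # observations x0, x1, x2 with 4 channels
enc = Encoder(d_in=4, d_hidden=16, d_latent=8, rng=rng)
z = enc(x, agent_time=[0.0, 1.0, 2.5])       # latents z0, z1, z2
```

Because the scale and shift depend on agent time, the same observation collected at different times by asynchronous agents maps to different latent representations, which is the role [0380] assigns to agent-time modulation.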

[0383] In one embodiment, the first linear operation module (2450) can propagate the latent representation at each time point to the second time points $t_{k+1}, t_{k+2}, \ldots, t_{k+m}$ by solving a continuous-time linear stochastic differential equation defined by the first parameter (2460), thereby calculating the latent representations $z_{k+1}, z_{k+2}, \ldots, z_{k+m}$ at those time points. The second time point may be, for example, a future time point. The set of latent representations at the second time point may be referred to as the latent sequence of the second time point (2470).

[0384] In one embodiment, the first parameter (2460) may include the set of coefficients of the stochastic differential equation of the linear operation module (2450). For example, when the stochastic differential equation is expressed as $dz_t = (A z_t + b)\,dt + \Sigma\,dW_t$, the first parameter (2460) may be $\{A, b, \Sigma\}$. The first parameter (2460) defines the continuous-time transition distribution of the latent state and can be updated by learning.

[0385] In one embodiment, the first linear operation module (2450) may include a linear stochastic dynamics module. The first linear operation module (2450) may take the first parameter (2460) and the latent sequence (2440) of the first time point as inputs and generate the latent sequence (2470) of the second time point by operating on the latent sequence (2440) according to the continuous-time linear stochastic differential equation defined by the first parameter (2460). For example, by solving that equation for the latent representations z0, z1, and z2 at time points t=0, t=1, and t=2, the first linear operation module (2450) can calculate the latent representations z3, z4, and z5 at the future time points t=3, t=4, and t=5, which together form the latent sequence (2470) of the second time point.
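To make the role of $\{A, b, \Sigma\}$ concrete, the scalar case of the linear SDE in [0384] has a closed-form one-step transition. The sketch below is an illustration under simplifying assumptions (one latent dimension, constant coefficients, a fixed step `dt`); a multivariate version would replace the exponentials with a matrix exponential. It propagates the mean from the last observed latent (z2 in the example above) to future steps. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def discretize_linear_sde(a: float, b: float, sigma: float, dt: float) -> Tuple[float, float, float]:
    """Exact one-step transition of dz = (a z + b) dt + sigma dW (scalar case).

    Returns (gain, offset, noise_var) such that
    z_{t+dt} = gain * z_t + offset + noise,  noise ~ N(0, noise_var).
    """
    if abs(a) < 1e-12:  # a -> 0 limit: pure drift plus Brownian motion
        return 1.0, b * dt, sigma ** 2 * dt
    gain = math.exp(a * dt)
    offset = (gain - 1.0) / a * b
    noise_var = sigma ** 2 * (math.exp(2.0 * a * dt) - 1.0) / (2.0 * a)
    return gain, offset, noise_var


def propagate_mean(z_last: float, a: float, b: float, dt: float, m: int) -> List[float]:
    """Mean of the latent at the next m time points, starting from z_last."""
    gain, offset, _ = discretize_linear_sde(a, b, 0.0, dt)
    out, z = [], z_last
    for _ in range(m):
        z = gain * z + offset  # affine update: the mean follows the drift only
        out.append(z)
    return out


# e.g. z2 = 0.0 propagated to z3, z4, z5 with pure drift b = 2 and dt = 0.5
future = propagate_mean(0.0, a=0.0, b=2.0, dt=0.5, m=3)  # [1.0, 2.0, 3.0]
```

Note that each step is an affine map of the previous latent, which is the property the following figures exploit.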

[0386] In one embodiment, the decoder module (2480) may include a third linear layer, a SiLU (Sigmoid Linear Unit), a fourth linear layer, and an LN (Layer Normalization) layer. When the decoder module (2480) receives the latent sequence $z_{k+1}, z_{k+2}, \dots, z_{k+m}$ of the second time point as input, it passes the latent representation at each time point through the third linear layer, the SiLU, the fourth linear layer, the non-linear transformation layer, and the LN layer to obtain the corresponding predicted values $x_{k+1}, x_{k+2}, \dots, x_{k+m}$, and the set of these forms the prediction sequence (2490) of the second time point.
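A matching sketch of the decoder in [0386], under the same assumptions as an encoder illustration would use: hypothetical weights and the non-linear transformation layer folded into the SiLU. Plain LayerNorm is applied without a learned gain or bias, exactly as the layer list reads, so each output vector has zero mean.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine layer: y = W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def silu(x: Vector) -> Vector:
    """SiLU activation: v * sigmoid(v)."""
    return [v / (1.0 + math.exp(-v)) for v in x]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Plain LayerNorm without learned gain or bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def decode(zs: List[Vector], p: Dict[str, list]) -> List[Vector]:
    """Map latents {z_{k+1}..z_{k+m}} to predictions {x_{k+1}..x_{k+m}}."""
    preds = []
    for z in zs:
        h = silu(linear(p["w3"], p["b3"], z))  # third linear layer + SiLU
        h = linear(p["w4"], p["b4"], h)        # fourth linear layer
        preds.append(layer_norm(h))            # LN layer
    return preds


# Hypothetical weights: 3-dim latents -> 2-dim predictions.
params = {
    "w3": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], "b3": [0.0, 0.0, 0.0],
    "w4": [[1.0, -1.0, 0.0], [0.0, 1.0, 1.0]], "b4": [0.0, 0.0],
}
preds = decode([[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]], params)
```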

[0387] FIG. 25 is a diagram showing the operation method of the first linear operation module.

[0388] Referring to FIG. 25, the latent representation $z_{t+1}$ at timestamp t+1 is obtained by moving the latent representation $z_t$ at timestamp t, so it can be expressed as $z_{t+1} = z_t + f(z_t; \theta_t) = \theta_t(z_t)$. That is, given the latent representation z0 at time t=0, the latent representation at time t=1 can be calculated as $z_1 = z_0 + f(z_0; \theta_0) = \theta_0(z_0)$. The latent representation at time t=2 is then obtained from the calculated value of z1 as $z_2 = z_1 + f(z_1; \theta_1) = \theta_1(z_1)$. Similarly, the latent representation at time t=3 is obtained from the value of z2 as $z_3 = z_2 + f(z_2; \theta_2) = \theta_2(z_2)$. In this way, the latent representation at time t=k can be calculated with a for loop.
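The for-loop computation of FIG. 25 can be sketched as follows, assuming a scalar latent and assuming each $\theta_t$ is an affine pair $(A_t, b_t)$ as in [0396]. The residual form f(z; θ) = (A - 1) z + b is a hypothetical choice that makes z + f(z; θ) = A z + b. Each iteration needs the previous result, so N steps require N dependent iterations.

```python
from typing import List, Tuple

Affine = Tuple[float, float]  # (A_t, b_t) for a scalar latent: z -> A_t * z + b_t


def f(z: float, theta: Affine) -> float:
    """Hypothetical residual so that z + f(z; theta) = A * z + b."""
    a, b = theta
    return (a - 1.0) * z + b


def sequential_rollout(z0: float, thetas: List[Affine]) -> List[float]:
    """Compute z_1..z_N one step at a time: z_{t+1} = z_t + f(z_t; theta_t)."""
    zs = [z0]
    for theta in thetas:
        z_t = zs[-1]  # step t+1 cannot start until z_t is known
        zs.append(z_t + f(z_t, theta))
    return zs


thetas = [(2.0, 1.0), (0.5, 0.0), (1.0, -3.0)]
print(sequential_rollout(1.0, thetas))  # [1.0, 3.0, 1.5, -1.5]
```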

[0389] That is, because the latent representation at time t=k+1 cannot be calculated until the calculation at time t=k is complete, the calculation time may be long. FIG. 26 illustrates a method of shortening this calculation.

[0390] FIG. 26 is a diagram illustrating a method of operation of a first linear operation module according to one embodiment of the present disclosure.

[0391] As explained with reference to FIG. 25, the latent representation $z_{t+1}$ at timestamp t+1 is obtained by moving the latent representation $z_t$ at timestamp t, so $z_{t+1} = z_t + f(z_t; \theta_t) = \theta_t(z_t)$. The latent representation at timestamp t+2 is $z_{t+2} = z_{t+1} + f(z_{t+1}; \theta_{t+1}) = \theta_{t+1}(z_{t+1})$. Substituting $z_{t+1} = \theta_t(z_t)$ gives $z_{t+2} = \theta_{t+1}(\theta_t(z_t))$. Regrouping according to the associative law, this can be summarized as $z_{t+2} = (\theta_{t+1} \circ \theta_t)(z_t)$.
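Assuming, as in [0396], that each transition is an affine operator $\theta_t(z) = A_t z + b_t$, the composition has a closed form, and composing such pairs is associative, which is what permits the regrouping:

```latex
\begin{aligned}
(\theta_{t+1} \circ \theta_t)(z) &= A_{t+1}(A_t z + b_t) + b_{t+1}
  = (A_{t+1} A_t)\, z + (A_{t+1} b_t + b_{t+1}), \\
(A_2, b_2) \bullet (A_1, b_1) &:= (A_2 A_1,\; A_2 b_1 + b_2), \\
\big((A_3, b_3) \bullet (A_2, b_2)\big) \bullet (A_1, b_1)
  &= (A_3 A_2 A_1,\; A_3 A_2 b_1 + A_3 b_2 + b_3)
  = (A_3, b_3) \bullet \big((A_2, b_2) \bullet (A_1, b_1)\big).
\end{aligned}
```

Because $\bullet$ is associative, $\theta_{t+1} \circ \theta_t$ can be formed before $z_t$ is known, so adjacent pairs of steps can be merged independently of one another.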

[0392] That is, whereas in FIG. 25 the calculation at time t+1 can begin only after the calculation at time t is complete, according to one embodiment of the present disclosure the calculations can be performed in parallel, as shown in FIG. 26.

[0393] In one embodiment, when the first, second, and third timestamps represent consecutive time points, the value $z_{t+2}$ of the third timestamp of the latent sequence at the second time point (e.g., a future time point) can be determined based on the value $z_t$ of the first timestamp of the observation sequence, the value $\theta_t$ of the first timestamp of the first parameter, and the value $\theta_{t+1}$ of the second timestamp of the first parameter. That is, the value of the third timestamp of the latent sequence at the second time point can be calculated as $z_{t+2} = (\theta_{t+1} \circ \theta_t)(z_t)$.

[0394] In one embodiment, by the associative law, a first operation result $\theta_{t+1} \circ \theta_t$ is generated from the value $\theta_t$ of the first timestamp of the first parameter and the value $\theta_{t+1}$ of the second timestamp of the first parameter, and the value $z_{t+2}$ of the third timestamp of the latent sequence at the second time point can be determined from this first operation result and the value $z_t$ of the first timestamp of the observation sequence. Accordingly, whereas the embodiment of FIG. 25 requires N calculation iterations for an input sequence length N, the embodiment of FIG. 26 can reduce the number of calculation iterations to log N.
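One standard way to obtain the log N behaviour is a parallel prefix scan over the affine pairs. The sketch below uses a Hillis-Steele inclusive scan, a swapped-in textbook technique that is not necessarily the disclosed implementation, and it assumes scalar affine operators $\theta_t = (A_t, b_t)$. Within each round every update reads only the previous round's values, so on parallel hardware the number of dependent rounds is ceil(log2 N), at the cost of O(N log N) total work.

```python
from typing import List, Tuple

Affine = Tuple[float, float]  # scalar (A, b): z -> A * z + b


def compose(later: Affine, earlier: Affine) -> Affine:
    """later ∘ earlier: apply `earlier` first, then `later`. Associative."""
    a2, b2 = later
    a1, b1 = earlier
    return (a2 * a1, a2 * b1 + b2)


def prefix_operators(thetas: List[Affine]) -> Tuple[List[Affine], int]:
    """Hillis-Steele inclusive scan: prefix[i] = theta_i ∘ ... ∘ theta_0.

    Returns the prefix operators and the number of rounds used.
    """
    prefix = list(thetas)
    n, offset, rounds = len(prefix), 1, 0
    while offset < n:
        # All entries of one round read only the previous round's values,
        # so they could be computed simultaneously on a GPU.
        prefix = [
            compose(prefix[i], prefix[i - offset]) if i >= offset else prefix[i]
            for i in range(n)
        ]
        offset *= 2
        rounds += 1
    return prefix, rounds


def parallel_rollout(z0: float, thetas: List[Affine]) -> Tuple[List[float], int]:
    """z_{i+1} = prefix[i](z0): every future latent depends only on z0."""
    prefix, rounds = prefix_operators(thetas)
    return [z0] + [a * z0 + b for a, b in prefix], rounds


thetas = [(2.0, 1.0), (0.5, 0.0), (1.0, -3.0)]
print(parallel_rollout(1.0, thetas))  # ([1.0, 3.0, 1.5, -1.5], 2)
```

For these operators the result equals the step-by-step computation of FIG. 25; only the dependency structure differs (2 rounds instead of 3 sequential steps here, 3 rounds instead of 8 for N = 8).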

[0395] In one embodiment, the first linear operation module (2450) can generate the latent sequence (2470) of the second time point by defining the propagation of the latent sequence (2440) of the first time point to the second time point as a transition operator corresponding to a continuous-time linear stochastic dynamics model, reorganizing the transition operation into a chain of operators that satisfies the associative law, and performing parallel operations on that chain of operators.

[0396] In one embodiment, the processor takes an observation sequence as input and uses an encoder module to generate a latent sequence at a first time point; inputs, into the first linear operation module, the latent sequence at the first time point and the first parameter, represented as a set of affine transition operators $\{\theta_t = (A_t, b_t)\}$ corresponding to continuous-time linear stochastic dynamics or their discrete equivalents; generates a latent sequence at the second time point by propagation based on composition of these operators; and converts the latent sequence at the second time point into a prediction sequence and outputs it using a decoder module.

[0397] FIGS. 27a to 27c are schematic diagrams of a time series prediction system using condition parameters according to one embodiment of the present disclosure.

[0398] Referring to FIG. 27a, a condition parameter (2710) is combined with the observation sequence (2420), and the encoder module (2410) can generate the latent sequence of the first time point by taking the condition-combined observation sequence and the time information as inputs. In one embodiment, the processor of the time series prediction system can obtain the observation sequence (2420), time information for each time point, the agent time (2430), and the condition parameter (2710) as inputs. The processor may form the input to the encoder module (2410) by combining the values of the observation sequence (2420) and the condition parameter (2710) by at least one of concatenation or element-wise summation; generate a latent representation for each time point through the encoder module (2410); pass it through an intermediate transition module that propagates the latent representation to a second time point; and then convert the latent representation at the second time point into a predicted value in the decoder module (2480) to output a prediction sequence (2490).
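The two combination manners named in [0398] can be sketched as follows. This assumes, hypothetically, that a single condition vector is attached to every time step of the observation sequence; element-wise summation requires matching dimensions, which [0402] addresses with a linear projection layer.

```python
from typing import List

Vector = List[float]


def combine(x: Vector, c: Vector, mode: str = "concat") -> Vector:
    """Combine one observation x_i with a condition vector c."""
    if mode == "concat":
        return x + c  # dimension grows to len(x) + len(c)
    if mode == "sum":
        if len(x) != len(c):
            raise ValueError("element-wise sum needs equal dimensions; project c first")
        return [xi + ci for xi, ci in zip(x, c)]
    raise ValueError(f"unknown combination mode: {mode}")


def condition_sequence(xs: List[Vector], c: Vector, mode: str = "concat") -> List[Vector]:
    """Attach the same condition vector to every time step (an assumption)."""
    return [combine(x, c, mode) for x in xs]


xs = [[1.0, 2.0], [3.0, 4.0]]
print(condition_sequence(xs, [0.0, 1.0], "concat"))  # [[1.0, 2.0, 0.0, 1.0], [3.0, 4.0, 0.0, 1.0]]
print(condition_sequence(xs, [0.0, 1.0], "sum"))     # [[1.0, 3.0], [3.0, 5.0]]
```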

[0399] In one embodiment, the condition parameter can generate a modulation vector (scale and shift) for the normalization layer of the encoder and decoder to reflect time, action, and environmental conditions. The encoder receives an observation sequence and time information (e.g., agent time) as input to produce a latent representation, and the normalization output can be scaled and shifted by a modulation vector generated by combining the condition parameter and agent time. If the condition parameter is absent, it can operate in an unconditional mode using a zero-vector or a template vector.
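A minimal sketch of the modulation described in [0399], assuming, hypothetically, that scale and shift are each a linear function of the concatenation [agent time, condition] and that a missing condition falls back to a zero vector. The weight matrices are illustrative placeholders, not values from the disclosure.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def modulation(tau: float, cond: Optional[Vector], w_scale: Matrix, w_shift: Matrix) -> Tuple[Vector, Vector]:
    """Hypothetical linear map from [tau] + cond to per-channel (scale, shift)."""
    if cond is None:
        cond = [0.0] * (len(w_scale[0]) - 1)  # unconditional mode: zero vector
    feat = [tau] + cond
    scale = [sum(w * f for w, f in zip(row, feat)) for row in w_scale]
    shift = [sum(w * f for w, f in zip(row, feat)) for row in w_shift]
    return scale, shift


def ada_ln(h: Vector, scale: Vector, shift: Vector) -> Vector:
    """Scale and shift the normalized output: LN(h) * (1 + scale) + shift."""
    return [n * (1.0 + s) + b for n, s, b in zip(layer_norm(h), scale, shift)]


W_SCALE = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]  # rows: channels; cols: [tau, c0, c1]
W_SHIFT = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scale, shift = modulation(1.0, [1.0, 0.0], W_SCALE, W_SHIFT)  # ([3.0, 0.5], [1.0, 0.0])
out = ada_ln([1.0, 3.0], scale, shift)
```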

[0400] In one embodiment, the condition parameter may be provided as a one-hot vector representing a type of behavior. For example, in the case of robot control, the condition parameter may include slow jogging, fast jogging, forward running, backward running, slow walking in place, fast walking in place, left turn, right turn, forward stride length, etc.
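Encoding a behavior label as a one-hot condition vector, as in [0400], might look like this. The vocabulary is the example list above; because that list ends with "etc.", a real vocabulary would likely be longer, and the ordering here is an arbitrary assumption.

```python
from typing import List

# Behavior vocabulary taken from the example list; ordering is an assumption.
BEHAVIORS = [
    "slow jogging", "fast jogging", "forward running", "backward running",
    "slow walking in place", "fast walking in place", "left turn", "right turn",
    "forward stride length",
]


def one_hot(label: str, vocab: List[str] = BEHAVIORS) -> List[float]:
    """Return a one-hot vector; raises ValueError for an unknown label."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(label)] = 1.0
    return vec


v = one_hot("left turn")  # 1.0 at index 6, zeros elsewhere
```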

[0401] In one embodiment, the condition parameter may be combined with the observation sequence (2420) to form the input of the encoder module (2410). The condition parameter may also be combined with the input of the decoder module (2480).

[0402] In one embodiment, before the condition parameters are combined element by element, the time series forecasting system may further include a linear projection layer to align the dimension of the condition parameters with the dimension of the time series feature.

[0403] In one embodiment, an adaptive layer normalization (Ada Layer Normalization) step may be further performed to apply the scale and shift calculated from the condition parameter in the normalization step.

[0404] In one embodiment, the condition parameter may include at least one of a behavior label, an environment state, or a text embedding.

[0405] In one embodiment, both concatenation and element-wise combination may be applied as methods of combining the condition parameter, and the two combination results can be fused using a gate or a weighted sum.

[0406] In one embodiment, if no condition parameter is provided, an unconditional prediction mode may be provided in which the missing condition parameter is replaced with a zero (0) vector or a basic template before combination.
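Paragraphs [0402], [0405], and [0406] together suggest a combination step along the lines of the sketch below: project the condition to the feature dimension, form both an element-wise sum and a concatenation (mapped back to the feature dimension), and blend the two with a sigmoid gate, replacing a missing condition with a zero vector. The projection matrices, the mapping of the concatenated path, and the single scalar gate logit are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Linear projection without bias."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def fuse(x: Vector, c: Optional[Vector], w_proj: Matrix, w_cat: Matrix, gate_logit: float) -> Vector:
    """Gated fusion of the sum path and the concatenation path."""
    if c is None:
        c = [0.0] * len(w_proj[0])   # unconditional mode: zero vector
    c_proj = matvec(w_proj, c)       # align dim(c) with dim(x)
    summed = [xi + ci for xi, ci in zip(x, c_proj)]
    concat = matvec(w_cat, x + c)    # concatenate, then map back to dim(x)
    g = sigmoid(gate_logit)          # g near 1 favors the sum path
    return [g * s + (1.0 - g) * k for s, k in zip(summed, concat)]


W_PROJ = [[1.0], [0.0]]                     # 1-dim condition -> 2-dim feature
W_CAT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # keeps x, drops c (illustrative)
print(fuse([1.0, 2.0], [1.0], W_PROJ, W_CAT, gate_logit=0.0))  # [1.5, 2.0]
```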

[0407] Referring to FIG. 27b, a first condition parameter (2720) is combined with the agent time (2430), and the encoder module (2410) can generate the latent sequence (2440) of a first time point (e.g., a past time point) by taking as input the observation sequence (2420) and the agent time (2430) combined with the first condition parameter (2720).

[0408] In one embodiment, the processor receives the observation sequence (2420), the agent time (2430), and the first condition parameter (2720) as inputs and generates latent representations using the encoder module (2410), whose adaptive layer normalization layer is configured to scale and shift the normalized output using a modulation vector generated by combining the agent time (2430) and the first condition parameter (2720); the latent representations may then pass through a transition module that propagates them to a second time point (e.g., a future time point). The decoder module (2480) includes a linear operation module, a non-linear activation module, and an adaptive layer normalization module, and the adaptive layer normalization module of the decoder module can generate the prediction sequence using a modulation vector generated from the second condition parameter (2730). The second condition parameter (2730) may have the same value as the first condition parameter (2720) or a different value.

[0409] Referring to FIG. 27c, the processor receives the observation sequence (2420), the agent time (2430), and a condition parameter (2740) as inputs; generates latent representations through the encoder module (2410), which includes a linear operator, a non-linear activation, and a normalization layer whose normalization step is modulated by the agent time; generates, from the condition parameter (2740), the first parameter (2460) that defines the operation of the transition module; inputs the latent representations into the transition module governed by the first parameter (2460) to generate the latent sequence (2470) of the second time point; and applies the decoder module (2480) to the latent sequence (2470) of the second time point to generate the prediction sequence (2490).

[0410] In one embodiment, the first parameter (2460) can be determined based on a condition parameter.
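For FIG. 27c and [0410], the first parameter is derived from the condition parameter. The sketch below assumes a scalar latent, a hypothetical linear mapping from the condition vector to the drift coefficients (a, b), and a negative softplus on a to keep the latent dynamics stable; the diffusion term is omitted. None of these choices is stated in the disclosure.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softplus(v: float) -> float:
    return math.log1p(math.exp(v))


def condition_to_first_parameter(cond: Vector, w_a: Vector, w_b: Vector) -> Tuple[float, float]:
    """Hypothetical map: condition vector -> scalar SDE drift coefficients (a, b)."""
    a = -softplus(sum(w * c for w, c in zip(w_a, cond)))  # a < 0: mean-reverting latent
    b = sum(w * c for w, c in zip(w_b, cond))
    return a, b


# One-hot conditions for two behaviors select different dynamics.
W_A, W_B = [0.0, 2.0], [2.0, -1.0]
a0, b0 = condition_to_first_parameter([1.0, 0.0], W_A, W_B)  # a0 = -log(2) ≈ -0.693, b0 = 2.0
a1, b1 = condition_to_first_parameter([0.0, 1.0], W_A, W_B)  # a1 = -softplus(2) ≈ -2.127, b1 = -1.0
```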

[0411] One embodiment of the present disclosure may also be implemented in the form of a recording medium comprising computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium accessible by a computer and includes both volatile and non-volatile media, and both removable and non-removable media. Additionally, a computer-readable medium may include both computer storage media and communication media. A computer storage medium includes both volatile and non-volatile, removable and non-removable media implemented by any method or technique for storing information, such as computer-readable instructions, data structures, program modules, or other data. A communication medium typically includes computer-readable instructions, data structures, or program modules and includes any information transmission medium.

[0412] The foregoing description of the present disclosure is for illustrative purposes only, and those skilled in the art will understand that modifications can be easily made to other specific forms without altering the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

[0413] The scope of the present disclosure is defined by the claims set forth below rather than by the detailed description above, and all modifications or variations derived from the meaning and scope of the claims and equivalent concepts thereof should be interpreted as being included within the scope of the present disclosure.

Claims

1. A computer-implemented system comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the system to perform operations comprising: acquiring an observation sequence; generating a latent sequence at a first time point using an encoder module with the observation sequence as input; inputting a first parameter relating to the coefficients of a linear stochastic differential equation and the latent sequence at the first time point into a first linear operation module to generate a latent sequence at a second time point; and converting the latent sequence at the second time point into a prediction sequence and outputting it using a decoder module, wherein the value of a third timestamp of the latent sequence at the second time point is determined based on the value of a first timestamp of the observation sequence, the value of a first timestamp of the first parameter, and the value of a second timestamp of the first parameter.

2. The system of claim 1, wherein the value of the third timestamp of the latent sequence at the second time point is determined by generating a first operation result based on the value of the first timestamp of the first parameter and the value of the second timestamp of the first parameter, and then based on the first operation result and the value of the first timestamp of the observation sequence.

3. The system of claim 1, wherein generating the latent sequence at the second time point comprises: defining the propagation of the latent sequence at the first time point to future time points as a transition operator corresponding to a continuous-time linear stochastic dynamics model; reorganizing the transition operation into a chain of operators that satisfies the associative law; and generating the latent sequence at the second time point by performing a parallel operation on the chain of operators.

4. The system or method of claim 3, characterized in that the total number of iterations of the parallel operation is limited to log N or less with respect to an input sequence length N.

5. The system of claim 1, wherein the operations further comprise acquiring time information corresponding to the observation sequence, and generating the latent sequence at the first time point comprises: combining a condition parameter with the observation sequence; and generating the latent sequence at the first time point using the encoder module with the observation sequence combined with the condition parameter and the time information as inputs.

6. The system of claim 1, wherein the operations further comprise acquiring time information corresponding to the observation sequence, and generating the latent sequence at the first time point comprises: combining a condition parameter with the time information; and generating the latent sequence at the first time point using the encoder module with the observation sequence and the time information combined with the condition parameter as inputs.

7. The system of claim 1, wherein the decoder module includes a linear operation module, a non-linear activation module, and an adaptive layer normalization module, and the adaptive layer normalization module generates the prediction sequence taking a condition parameter into account.

8. The system of claim 1, wherein the first parameter is determined based on a condition parameter.

9. A graphics processing unit comprising: an encoder module that generates a latent sequence at a first time point using an observation sequence as input; a first linear operation module that generates a latent sequence at a second time point by taking as input a first parameter relating to the coefficients of a linear stochastic differential equation and the latent sequence at the first time point; and a decoder module that converts the latent sequence at the second time point into a prediction sequence and outputs it, wherein the value of a third timestamp of the latent sequence at the second time point is determined based on the value of a first timestamp of the observation sequence, the value of a first timestamp of the first parameter, and the value of a second timestamp of the first parameter.

10. The graphics processing unit of claim 9, wherein the value of the third timestamp of the latent sequence at the second time point is determined by generating a first operation result based on the value of the first timestamp of the first parameter and the value of the second timestamp of the first parameter, and then based on the first operation result and the value of the first timestamp of the observation sequence.

11. A method comprising: acquiring an observation sequence; generating a latent sequence at a first time point using an encoder module with the observation sequence as input; inputting a first parameter relating to the coefficients of a linear stochastic differential equation and the latent sequence at the first time point into a linear module to generate a latent sequence at a second time point; and converting the latent sequence at the second time point into a prediction sequence and outputting it using a decoder module, wherein the value of a third timestamp of the latent sequence at the second time point is determined based on the value of a first timestamp of the observation sequence, the value of a first timestamp of the first parameter, and the value of a second timestamp of the first parameter.

12. The method of claim 11, wherein the value of the third timestamp of the latent sequence at the second time point is determined by generating a first operation result based on the value of the first timestamp of the first parameter and the value of the second timestamp of the first parameter, and then based on the first operation result and the value of the first timestamp of the observation sequence.

13. The method of claim 11, wherein generating the latent sequence at the second time point comprises: defining the propagation of the latent sequence at the first time point to future time points as a transition operator corresponding to a continuous-time linear stochastic dynamics model; reorganizing the transition operation into a chain of operators that satisfies the associative law; and generating the latent sequence at the second time point by performing a parallel operation on the chain of operators.

14. A program stored on a computer-readable recording medium for causing a computer to execute the method of any one of claims 11 to 13.