A large language model driven time series anomaly detection method
By combining the anomaly injection-reconstruction paradigm with a large language model, the method addresses ambiguous supervision signals and high computational resource requirements in time series anomaly detection, achieving efficient anomaly detection with strong generalization that is applicable to a variety of application scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- BEIHANG UNIV
- Filing Date
- 2026-04-03
- Publication Date
- 2026-06-12
AI Technical Summary
Existing time series anomaly detection methods suffer from ambiguous supervision signals, overfitting, insufficient exposure to anomaly information, high computational resource requirements, slow inference, and difficult modality alignment, especially in scenarios with few samples or complex anomalies.
We adopt the anomaly injection-reconstruction paradigm, which generates synthetic anomalies and combines them with a large language model to provide explicit supervision signals, thereby aligning time series data with text semantics. We also design a lightweight framework for training and inference.
It improves generalization performance, enhances modality alignment, reduces computational resource requirements, increases detection efficiency, and supports efficient deployment in various application scenarios.
Figure CN122196834A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of time series anomaly detection and specifically relates to a time series anomaly detection method driven by a large language model.
Background Technology
[0002] Time series anomaly detection is a key technology for monitoring the reliability of complex dynamic systems such as industrial operations, medical infrastructure, financial markets, and cybersecurity platforms. Its core objective is to identify points or segments in time series data that deviate from normal temporal behavior, enabling early fault diagnosis, proactive maintenance, and risk mitigation. In practice, time series anomaly detection systems often must operate under evolving data distributions, limited or missing anomaly labels, and strict computational constraints, which challenge both model performance and the reliability of real-world deployment.
[0003] Existing time series anomaly detection methods are mainly based on either a prediction paradigm or a reconstruction paradigm. Prediction paradigm methods (such as Timer and Sundial) predict future values from historical observations and detect anomalies using prediction errors, but they usually rely on a strong stationarity assumption, making it difficult to capture complex temporal dependencies or subsequence-level anomalies. Reconstruction paradigm methods (such as autoencoders and recurrent neural networks) are widely used: the trained model reconstructs the input sequence itself and uses the reconstruction error as the anomaly score, which supports unsupervised learning and subsequence-level anomaly detection. The reconstruction paradigm has also become a common training standard in unified evaluation frameworks, and many models designed for time series prediction produce better and more stable results on anomaly detection tasks when trained under it. However, the reconstruction paradigm has inherent limitations. First, the reconstruction objective requires the output to match the input, which yields ambiguous supervision signals and lets the model easily degenerate into an identity mapping. Second, the model is trained only on normal data and is never explicitly exposed to anomaly patterns; it assumes that anomalies cannot be reconstructed well, an assumption that often fails in practice and leads to poor generalization, especially in few-shot and zero-shot settings. These limitations stem from the reconstruction paradigm itself rather than from any specific model architecture. Addressing them requires re-examining reconstruction-based time series anomaly detection from the perspective of training signals and representation learning.
Models should actively engage with anomalous patterns during training rather than passively assuming their absence, and should learn representations that abstract away from raw numerical values toward higher-level temporal semantics.
[0004] In recent years, advances in large language models for sequence modeling have created new opportunities for time series analysis. However, directly applying large language models to time series anomaly detection faces challenges: aligning continuous-valued time series with discrete text representations is difficult, and existing approaches incur high computational costs, large model footprints, and low inference efficiency. This highlights the need for a framework that fully leverages the semantic reasoning capabilities of large language models while explicitly aligning temporal patterns with text representations.
[0005] The first existing technology is the traditional deep learning-based time series anomaly detection method. This type of method uses deep learning models (such as DLinear, Informer, TimesNet, Anomaly Transformer, PatchTST, FEDformer, iTransformer, and KAN-AD) to model the time series, and evaluation frameworks usually train and test them under a unified reconstruction paradigm. The specific steps are as follows: the time series is divided into subsequences with a sliding window, which serve as model input; during the training phase, only anomaly-free data is used, and the model is optimized to minimize the mean squared error between the input and reconstructed sequences; during the inference phase, the mean squared error between the input and reconstructed sequences is computed as the anomaly score, an anomaly threshold is set at a specific quantile, and comparing each score with the threshold yields a binary label indicating whether each time point is anomalous. This method relies on the model learning latent patterns from normal data and assumes that anomalies cause large reconstruction deviations.
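As an illustration of the baseline pipeline just described, the following is a minimal sketch in Python using only the standard library. It covers the sliding-window segmentation and the per-point reconstruction error only; the reconstruction model itself is omitted, and the function names `sliding_windows` and `pointwise_scores` are hypothetical.

```python
from typing import List

Series = List[List[float]]  # T rows (time steps) x N columns (variables)


def sliding_windows(series: Series, window: int, stride: int) -> List[Series]:
    """Split a T x N series into subsequences of `window` time steps."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [series[start:start + window]
            for start in range(0, len(series) - window + 1, stride)]


def pointwise_scores(inputs: Series, reconstruction: Series) -> List[float]:
    """Anomaly score per time point: squared error averaged over variables."""
    scores = []
    for x_row, r_row in zip(inputs, reconstruction):
        errors = [(x - r) ** 2 for x, r in zip(x_row, r_row)]
        scores.append(sum(errors) / len(errors))
    return scores
```

Setting `stride` equal to `window` gives non-overlapping subsequences; a smaller stride gives overlapping ones.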
[0006] These methods support end-to-end training and testing on specific datasets, and some achieve good performance. However, they still have problems in terms of training methodology and generalization.
[0007] 1. Ambiguous supervision signals: the reconstruction target requires the output to be identical to the input. Because the objective does not distinguish abnormal behavior, it gives the model little guidance toward discriminative representations, even though detection at inference time depends on reconstruction deviations. When anomalies share structure with normal patterns, detection performance degrades.
[0008] 2. Overfitting: the model tends to memorize normal patterns in the training data rather than learn generalizable representations of temporal dynamics. When training data is scarce or the anomalies at test time differ greatly from the training data, generalization is limited. The performance of these models depends heavily on hyperparameter tuning, and different hyperparameter sets must be designed for different datasets.
[0009] 3. Insufficient information exposure: only normal data is used during training, so the model is never explicitly exposed to abnormal patterns. It cannot learn how anomalies deviate from normal behavior, which limits its robustness to unknown anomalies.
[0010] The second existing technology is time series anomaly detection based on large language models. These methods (such as GPT4TS and Time-LLM) leverage the sequence modeling capabilities of large language models, using techniques such as numerical sequence encoding and prompt construction to map time series tasks into a large language model framework. The technical solution includes: converting time series data into embeddings that the large language model can process (e.g., by projection through multilayer perceptrons or reprogramming layers); combining textual prompts such as task descriptions as multimodal inputs; performing sequence modeling with the large language model; and outputting the reconstructed time series. Because large language models incur high overhead, the training and inference phases also generally follow the reconstruction paradigm. These methods aim to align the temporal and textual modalities and use the pre-trained knowledge of large language models to improve generalization.
[0011] The second existing technology has the following shortcomings:
[0012] 1. Limited performance improvement: despite drawing on the pre-training of large language models, their advantage over traditional methods in time series anomaly detection is limited, especially in scenarios with few samples or complex anomalies, and performance may suffer from modality misalignment.
[0013] 2. High computational resource requirements: large language models already have a large number of parameters, and existing methods add more complex encoding and decoding structures on top of them, resulting in higher memory usage and computational overhead during training and inference and high deployment costs, which makes them unsuitable for resource-constrained scenarios.
[0014] 3. Low detection efficiency: The model structure is complex and the inference speed is slow, making it difficult to meet the low latency required for real-time monitoring.
[0015] 4. Modal alignment challenge: The misalignment between time series numerical features and text semantics may limit the model's performance, requiring additional design to bridge the gap.
[0016] The third existing technology is based on large-scale pre-trained time series models (such as Chronos, Sundial, and Moirai). These models learn general temporal dynamic patterns through unified pre-training on massive amounts of multivariate time series data (usually using an autoregressive prediction objective), thus possessing powerful zero-shot prediction capabilities. The core of their technical solution is that the model directly utilizes pre-trained parameters, without needing fine-tuning for specific datasets, to complete time series prediction tasks. In time series anomaly detection applications, these methods typically use a prediction paradigm for downstream task adaptation: the model receives historical time series as input and outputs a predicted sequence of future values; then, it calculates the error between the predicted and actual values (such as mean squared error) as an anomaly score, and time points with errors exceeding a preset threshold are judged as anomalies. For example, the Chronos series models discretize time series values into token sequences and use a Transformer architecture for pre-training; Sundial, on the other hand, uses multi-task learning to uniformly handle time series from different domains. These models rely on their generalization ability and can be directly applied to anomaly detection in a zero-shot setting, avoiding the need for labeled data during the training phase.
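The prediction-paradigm scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions: a last-value forecaster stands in for a pre-trained model such as Chronos or Sundial, the series is univariate, and the function names are hypothetical.

```python
from typing import List


def naive_forecast(history: List[float]) -> float:
    """Stand-in forecaster (assumption): predicts the last observed value.
    A pre-trained forecasting model would take its place in practice."""
    return history[-1]


def prediction_error_scores(series: List[float], context: int) -> List[float]:
    """Squared one-step-ahead prediction error for every point after the first `context`."""
    scores = []
    for t in range(context, len(series)):
        predicted = naive_forecast(series[t - context:t])
        scores.append((series[t] - predicted) ** 2)
    return scores


def label_by_threshold(scores: List[float], threshold: float) -> List[int]:
    """1 = anomaly (error exceeds the preset threshold), 0 = normal."""
    return [1 if s > threshold else 0 for s in scores]
```

For the series 1, 1, 1, 5, 1 with a context of 2, the scores are 0, 16, 16: the spike is flagged, but it also corrupts the forecast for the following normal point, a small instance of the failure mode discussed in [0019].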
[0017] The third existing technology has the following shortcomings:
[0018] 1. Poor prediction stability: Although the pre-trained model performs well on normal time series, the prediction results may be unstable in environments with dense anomalies or distributional shifts. The pre-training data is mainly based on normal patterns, and the model lacks explicit learning of anomalous behavior. When the test data contains unknown anomaly types (such as sudden failures or gradual drift), the model has difficulty distinguishing between anomalies and normal fluctuations, its generalization ability is limited, and it is difficult to adapt to cross-domain anomalies.
[0019] 2. Inherent limitations of relying on prediction paradigms: Prediction paradigms assume that time series are stationary, but actual anomalies often disrupt the periodic or trend structure, leading to prediction failure.
[0020] 3. High training cost: pre-trained models have hundreds of millions of parameters, and their training data reaches the scale of hundreds of billions. Training such a model from scratch takes far longer than few-shot fine-tuning.
Summary of the Invention
[0021] To address the aforementioned technical problems, this invention proposes a time series anomaly detection method driven by a large language model. This method solves the problems of decreased generalization performance caused by ambiguous supervision signals and insufficient exposure of anomaly information in traditional reconstruction paradigms, overcomes the difficulties in modal alignment when directly applying large language models, and addresses the issues of high computational resource requirements and slow inference speed of methods based on large language models.
[0022] The specific technical solution of the present invention is as follows:
[0023] A time series anomaly detection method driven by a large language model includes the following steps:
[0024] Step S1: Given a multivariate time series, the anomaly injection module generates a time series containing synthetic anomalies during the training phase;
[0025] Step S2: The anomaly injection module simultaneously generates anomaly category-related prompt text and time series after anomaly injection. The anomaly category-related prompt text is used as text prompt input to the large model after the prompt is constructed.
[0026] Step S3: The processed time series is mapped into a two-dimensional slice representation and fed into a large model for time series feature extraction;
[0027] Step S4: The anomaly category-related prompt text and the two-dimensional slices are used for feature extraction through forward propagation to reconstruct the time series containing anomalies;
[0028] Step S5: The model outputs the reconstructed time series;
[0029] Step S6: Calculate the loss between the anomalous time series and the reconstructed sequence;
[0030] Step S7: During the training phase, the loss is backpropagated to optimize the model;
[0031] Step S8: During the inference phase, the loss is used for binary classification to assign anomaly labels.
[0032] Preferably, step S1 includes the following sub-steps:
[0033] Step S11: During the training phase, apply random transformations to the input multivariate time series $X \in \mathbb{R}^{T \times N}$, where $T$ is the number of time steps and $N$ is the number of variables, to generate the anomaly-injected sequence $\tilde{X}$;
[0034] Step S12: The injection operation covers five types of anomalies, including global point anomalies, context point anomalies, shape pattern anomalies, seasonal pattern anomalies, and trend pattern anomalies, to simulate diverse abnormal behaviors.
[0035] Step S13: The injection intensity is controlled to avoid excessive perturbation affecting convergence;
[0036] Step S14: The anomaly injection module actively exposes anomalies, enabling the model to learn repair capabilities. The calculation formula for the anomaly injection operator is:
[0037] $\tilde{X} = \mathcal{A}(X; \theta)$
[0038] where $\mathcal{A}$ denotes the anomaly injection operation and $\theta$ encapsulates the anomaly configuration.
[0039] Preferably, step S2 includes the following steps:
[0040] Step S21: Describe the domain and details of the time series data;
[0041] Step S22: Clearly define the objectives of the time series anomaly detection and reconstruction paradigm;
[0042] Step S23: For the anomaly injection process during training, inform the model of the type of anomaly injected;
[0043] Step S24: During training, 30% of the prompts are replaced with heuristic variants to prevent overfitting;
[0044] Step S25: The anomaly prompt and time series generation module achieves explicit alignment between time series and text semantics, promoting cross-modal reasoning.
[0045] Preferably, step S3 includes the following steps:
[0046] Step S31: Apply symmetric padding to the input tensor $X_{\mathrm{in}} \in \mathbb{R}^{B \times T \times N}$ (the processed time series, with batch size $B$) so that its time and variable dimensions are divisible by the slice size $P$, then extract the slice set $\mathcal{P}$ with a sliding window:
[0047] $\mathcal{P} = \mathrm{Unfold}\big(\mathrm{Pad}(X_{\mathrm{in}}),\ \text{kernel\_size} = P,\ \text{stride} = s\big) \in \mathbb{R}^{B \times N_p \times P \times P}$
[0048] where $N_p$ is the total number of slices. The unfold operation slides a window of size kernel_size $= P$ over the two-dimensional (time × variable) series with sampling step stride $= s$, so that each sample in the batch is split into $N_p$ slices of size $P \times P$.
[0049] Step S32: Process the slices with multi-layer convolutional blocks, doubling the channel dimension at each convolutional layer, to output a feature map:
[0050] $F^{(l)} = \mathrm{ConvBlock}_l\big(F^{(l-1)}\big),\quad l = 1, \dots, L,\quad F^{(0)} = \mathcal{P}$
[0051] where $L$ is the total number of convolutional layers, $\mathrm{ConvBlock}_l$ is the $l$-th convolutional block, $F^{(l-1)}$ (the output of the previous layer) is its input, and $F^{(l)}$ is the feature map of the $l$-th layer. Because the channel count doubles at every layer, the last layer has $C_L$ channels, so after feature extraction each $P \times P$ slice becomes a $P \times P \times C_L$ feature map.
[0052] Step S33: Apply global average pooling to obtain the pooled representation $h$, then project it onto the hidden dimension $D$ of the large language model with a multilayer perceptron to generate the temporal embedding vectors $E_{ts}$:
[0053] $h = \mathrm{GAP}\big(F^{(L)}\big) \in \mathbb{R}^{B \times N_p \times C_L}$
[0054] $E_{ts} = \mathrm{MLP}(h) \in \mathbb{R}^{B \times N_p \times D}$
[0055] where $C_L$ is the channel dimension of the last convolutional layer.
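Step S31 can be illustrated with a short, self-contained sketch. It assumes zero padding split as evenly as possible between the two ends of each axis (the padding values are not specified above) and works on a single sample stored as a list of rows; `symmetric_pad` and `unfold_2d` are hypothetical names, and in practice a framework operator such as PyTorch's `torch.nn.Unfold` would normally be used.

```python
from typing import List

Matrix = List[List[float]]  # rows = time steps, columns = variables


def symmetric_pad(x: Matrix, p: int) -> Matrix:
    """Zero-pad both axes, splitting the padding between the two ends,
    so that each dimension becomes divisible by p."""
    rows, cols = len(x), len(x[0])
    pad_r, pad_c = (-rows) % p, (-cols) % p  # amount needed to reach a multiple of p
    top, left = pad_r // 2, pad_c // 2
    bottom, right = pad_r - top, pad_c - left
    width = cols + pad_c
    padded = [[0.0] * width for _ in range(top)]
    padded += [[0.0] * left + [float(v) for v in row] + [0.0] * right for row in x]
    padded += [[0.0] * width for _ in range(bottom)]
    return padded


def unfold_2d(x: Matrix, p: int, stride: int) -> List[Matrix]:
    """Extract every p x p slice with the given stride, in row-major order."""
    rows, cols = len(x), len(x[0])
    slices = []
    for i in range(0, rows - p + 1, stride):
        for j in range(0, cols - p + 1, stride):
            slices.append([row[j:j + p] for row in x[i:i + p]])
    return slices
```

For a 5 × 3 series and P = 2, padding yields 6 × 4; a stride of 2 then gives 3 × 2 = 6 non-overlapping slices, while a stride of 1 gives 5 × 3 = 15 overlapping ones.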
[0056] Preferably, step S4 specifically includes:
[0057] The anomaly category-related prompt text constructed in step S2 and the two-dimensional segments from step S3 are fused across modalities. A large language model is used as the backbone. During training, the anomaly category-related prompt text and the time series are aligned to enable the large language model to analyze time series. Finally, the time series is output through a simple decoder.
[0058] Its expression is:
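[0059] $Z^{(0)} = \big[\, e_{\mathrm{start}};\ E_{\mathrm{prompt}};\ E_{ts};\ e_{\mathrm{end}} \,\big] \in \mathbb{R}^{M \times D}$
[0060] $Z^{(l)} = \mathrm{LLMLayer}_l\big(Z^{(l-1)}\big),\quad l = 1, \dots, L_{\mathrm{llm}}$
[0061] where, for a single sample, $e_{\mathrm{start}}$ is a learnable start-token embedding, $E_{\mathrm{prompt}}$ is the prompt embedding, $E_{ts}$ is the embedding of the time series slices, $e_{\mathrm{end}}$ is a learnable end-token embedding, $M$ is the total number of tokens across all embedded vectors, and $L_{\mathrm{llm}}$ is the total number of layers in the large language model backbone. The output $Z^{(L_{\mathrm{llm}})}$ of the final backbone layer is the context representation, which encapsulates temporal and semantic information.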
[0062] Preferably, step S5 specifically includes:
[0063] The forward propagation process ultimately uses a lightweight multilayer perceptron to map the output of the large language model back to the time series domain, generating a reconstructed sequence. The calculation formula for the decoding process is:
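[0064] $\hat{X} = \mathrm{MLP}_{\mathrm{dec}}\big(Z^{(L_{\mathrm{llm}})}\big)$
[0065] where $Z^{(L_{\mathrm{llm}})}$ is the output of the last layer of the large language model. The output is cropped to the original dimensions $T \times N$, removing the padding and ensuring an exact match with the input.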
[0066] Preferably, step S6 includes the following steps:
[0067] Step S61: The training objective is to minimize the mean squared error between the reconstructed sequence and the original normal sequence (before anomaly injection), so that the output is as close as possible to the original input and the injected anomalies are repaired;
[0068] Step S62: During inference, the mean squared error between the input and output time series is also calculated and used for inference classification.
[0069] Preferably, step S7 includes the following steps:
[0070] Step S71: During the training process, mean squared error is used as the loss for backpropagation to train the model;
[0071] Step S72: The model is trained in two stages. In the first stage, it is trained jointly on multiple datasets to learn common anomaly categories, and only the encoder and decoder parameters are trained. In the second stage, it is fine-tuned on an individual dataset to adapt to the downstream task, and the LoRA adapter is activated for backpropagation on top of the trained encoder and decoder.
[0072] Preferably, step S8 includes the following steps:
[0073] Step S81: During inference, the mean squared errors obtained from the model are analyzed statistically, and a high quantile of their distribution is used as the anomaly threshold. A time point whose error exceeds the threshold is marked as an anomaly; otherwise, it is a normal point;
[0074] Step S82: By comparing the binary classification labels and the true labels, the F1 score that represents the model's inference performance is calculated, while the binary classification labels are provided to the user as a reference for anomaly detection.
[0075] The beneficial effects of the large language model-driven time series anomaly detection method of this invention are as follows:
[0076] 1. This invention improves generalization performance: by providing explicit supervision signals through anomaly injection, it avoids model overfitting to normal data and achieves stable detection in few-shot and zero-shot settings. Experiments show that on five publicly available multivariate anomaly detection datasets (MSL, PSM, SMAP, SMD, and SWaT), AIR-Time improves the average F1 score by 4.68% in the few-shot setting and also demonstrates strong competitiveness in the zero-shot setting, achieving the best performance on all four datasets.
[0077] 2. This invention enhances modality alignment: the prompt construction module of this invention facilitates the alignment of time series numerical features with text semantics and uses the semantic understanding capabilities of large language models to strengthen the model's understanding of anomaly categories.
[0078] 3. This invention improves computational efficiency: The lightweight framework design significantly reduces resource consumption. Compared to baselines GPT4TS and Time-LLM, AIR-Time reduces training memory usage by 46%, increases training throughput by 4.45 times, and inference throughput by 4.56 times, supporting efficient deployment.
[0079] 4. This invention improves compatibility and scalability: the solution is compatible with different large language model backbones and requires no architectural modifications, making it easily adaptable to various application scenarios. Ablation experiments confirm that each component contributes to performance improvement.
Attached Figure Description
[0080] To more clearly illustrate the purpose, design concept, and innovation of the large language model-driven time series anomaly detection method proposed in this invention, the invention will be described in detail below with reference to the accompanying drawings and tables.
[0081] Figure 1 illustrates the traditional reconstruction paradigm and the anomaly injection-reconstruction paradigm of this invention.
[0082] Figure 2 is a diagram of the AIR-Time framework of the present invention.
Detailed Implementation
[0083] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments.
[0084] The specific embodiments of the present invention are described below to enable those skilled in the art to understand the present invention. However, it should be understood that the present invention is not limited to the scope of the specific embodiments. For those skilled in the art, various changes are obvious as long as they are within the spirit and scope of the present invention as defined and determined by the appended claims. All inventions utilizing the concept of the present invention are protected.
[0085] This invention proposes a time series anomaly detection method driven by a large language model, whose core framework is called AIR-Time (Anomaly Injection-Reconstruction Based Time Series Anomaly Detection). The scheme trains under an anomaly injection-reconstruction paradigm, turning traditional passive reconstruction into an anomaly-aware repair task, and leverages the semantic reasoning capabilities of the large language model to align temporal patterns with text representations. As shown in Figure 1, traditional reconstruction methods give the model its target output directly as input, which leads to ambiguous supervision signals. In contrast, the anomaly injection-reconstruction paradigm preprocesses the data by synthesizing anomalies of different types, so that the model learns to repair each category of anomaly specifically. This provides stronger supervision signals and incorporates prior knowledge from the traditional anomaly type taxonomy into the model, improving both performance and interpretability.
[0086] Figure 2 provides an overview of the AIR-Time training and inference process, which mainly includes the following eight steps:
[0087] ① Anomaly Injection: Given a multivariate time series, the anomaly injection module generates a sequence containing synthetic anomalies during the training phase (no anomalies during inference).
[0088] ② Anomaly Prompt and Time Series Generation: The anomaly injection module generates both anomaly category-related prompt text and the time series after anomaly injection. The former is used as text prompt input to the large model after the prompt is constructed.
[0089] ③ Two-dimensional segmented mapping: The processed time series is mapped into a two-dimensional segmented representation, which is then fed into a large model for time series feature extraction.
[0090] ④ Forward propagation: Text and time series segments are used for feature extraction through forward propagation to reconstruct time series containing anomalies.
[0091] ⑤ Reconstructed sequence output: The model outputs the reconstructed time series.
[0092] ⑥ Loss calculation: Calculate the loss between the original sequence and the reconstructed sequence.
[0093] ⑦ Backpropagation: During the training phase, loss is backpropagated for model optimization.
[0094] ⑧ Inference Classification: In the inference phase, the loss is used for binary classification to assign anomaly labels.
[0095] The AIR-Time framework consists of modules including anomaly injection, prompt construction, slice embedding, the large language model backbone, and the decoder. It keeps a lightweight, modular design and requires no modification to the internal architecture of the large language model. Each step is detailed below.
[0096] ① Anomaly injection
[0097] During the training phase, random transformations are applied to the input multivariate time series $X \in \mathbb{R}^{T \times N}$, where $T$ is the number of time steps and $N$ is the number of variables, to generate the anomaly-injected sequence $\tilde{X}$. The injection covers five types of anomalies: global point anomalies, context point anomalies, shape pattern anomalies, seasonal pattern anomalies, and trend pattern anomalies, simulating diverse anomalous behaviors. The injection strength is controlled so that excessive perturbation does not impair convergence. By actively exposing the model to anomalies, this module enables it to learn repair capabilities. Specifically, the anomaly injection operator is defined as:
[0098] $\tilde{X} = \mathcal{A}(X; \theta)$
[0099] where $\mathcal{A}$ denotes the anomaly injection operation and $\theta$ encapsulates the anomaly configuration (such as the anomaly category, affected variables, and temporal location). For example, global point anomalies introduce isolated extreme deviations at random time steps, while shape pattern anomalies introduce structured distortions over contiguous time intervals. This design encourages the model to infer temporal structure, cross-variable dependencies, and semantic consistency. During training, 70% of the data is injected with anomalies and the remainder is left unchanged, so the model is exposed to both anomalous and normal data, which enhances generalization. No anomalies are injected during inference.
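The anomaly injection operator can be sketched as below. The concrete transformation for each of the five anomaly types is not given above, so the ones used here are illustrative assumptions: an additive spike for global points, a shift relative to the local mean for context points, a sine bump for shape anomalies, a double-speed replay of the interval for seasonal anomalies, and a linear ramp for trend anomalies. The returned dictionary plays the role of the configuration θ, the hypothetical `strength` parameter stands in for the injection-strength control, and the 70% injection ratio is approximated by a per-sample coin flip.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Series = List[List[float]]  # T time steps x N variables
ANOMALY_TYPES = ("global_point", "context_point", "shape", "seasonal", "trend")


def _std(values: List[float]) -> float:
    """Population standard deviation; falls back to 1.0 for constant input."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0


def inject(x: Series, rng: random.Random, strength: float = 3.0) -> Tuple[Series, Dict]:
    """Return a corrupted copy of x and the configuration theta that produced it."""
    t_len = len(x)
    out = [list(row) for row in x]
    kind = rng.choice(ANOMALY_TYPES)
    var = rng.randrange(len(x[0]))
    column = [row[var] for row in x]
    sigma = _std(column)
    length = max(2, t_len // 10)  # interval length for pattern anomalies
    start = rng.randrange(0, t_len - length + 1)
    if kind == "global_point":  # isolated extreme deviation at one time step
        out[start][var] += strength * sigma * rng.choice((-1.0, 1.0))
        length = 1
    elif kind == "context_point":  # unusual only relative to its neighbourhood
        local = column[max(0, start - length):start + length]
        out[start][var] = sum(local) / len(local) + strength * _std(local)
        length = 1
    else:
        for k in range(length):
            t = start + k
            if kind == "shape":  # structured bump over a contiguous interval
                out[t][var] += strength * sigma * math.sin(math.pi * k / length)
            elif kind == "seasonal":  # replay the interval at double speed
                out[t][var] = column[start + (2 * k) % length]
            else:  # "trend": linear drift across the interval
                out[t][var] += strength * sigma * k / length
    theta = {"type": kind, "variable": var, "start": start, "length": length}
    return out, theta


def make_training_pair(x: Series, rng: random.Random,
                       inject_ratio: float = 0.7) -> Tuple[Series, Series, Optional[Dict]]:
    """Return (model input, reconstruction target, theta); the target is always clean x."""
    if rng.random() < inject_ratio:
        corrupted, theta = inject(x, rng)
        return corrupted, x, theta
    return [list(row) for row in x], x, None
```

The clean series, not the corrupted one, is returned as the reconstruction target, which is what separates this setup from the identity-mapping objective criticized in [0007].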
[0100] ② Anomaly prompt and time series generation
[0101] The anomaly injection module generates the anomalous time series, constructs a text prompt based on the anomaly type and injection method, and passes it to the prompt construction module.
[0102] The prompt module constructs structured text prompts, providing dataset descriptions, task instructions, and anomaly injection details (e.g., "Reconstruct t-step n-dimensional time series and fix outliers"). The prompts are tokenized and embedded, concatenated with the time series embedding to form a multimodal input. Prompt templates include:
[0103] 1) Dataset description: Describe the domain and details of the time series data (e.g., "The MSL dataset is a widely used dataset in multivariate time series anomaly detection, originating from the NASA Mars Science Laboratory rover").
[0104] 2) Task Description: Clearly define the objectives of the time series anomaly detection and reconstruction paradigm.
[0105] 3) Anomaly injection details: For the anomaly injection process during training, inform the model of the type of anomaly injected (e.g., "inject context point anomalies within local windows of different variables").
[0106] During training, 30% of the prompts are replaced with heuristic variants to prevent overfitting. This module achieves explicit alignment between time series and text semantics, facilitating cross-modal reasoning.
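A prompt builder along these lines could assemble the three template parts and apply the 30% heuristic replacement. It is a sketch only: apart from the two quoted examples above, the template sentences and the variant sentences are hypothetical, and it assumes that only the anomaly injection detail is swapped for a variant (the text above does not say which part is replaced).

```python
import random
from typing import Optional

# The task sentence and the context-point sentence follow the examples quoted above;
# every other sentence is a hypothetical placeholder.
TASK_TEMPLATE = "Reconstruct {t}-step {n}-dimensional time series and fix outliers."
INJECTION_DETAILS = {
    "global_point": "Inject global point anomalies at random time steps.",
    "context_point": "Inject context point anomalies within local windows of different variables.",
    "shape": "Inject shape pattern anomalies over contiguous intervals.",
    "seasonal": "Inject seasonal pattern anomalies over contiguous intervals.",
    "trend": "Inject trend pattern anomalies over contiguous intervals.",
}
HEURISTIC_VARIANTS = (
    "Some values may deviate from normal behavior; restore the normal pattern.",
    "Repair any segment that looks inconsistent with the rest of the series.",
)


def build_prompt(dataset_description: str, t: int, n: int,
                 anomaly_type: Optional[str], rng: random.Random,
                 variant_ratio: float = 0.3) -> str:
    """Concatenate dataset description, task description and injection detail."""
    parts = [dataset_description, TASK_TEMPLATE.format(t=t, n=n)]
    if anomaly_type is not None:  # no injection detail when nothing was injected
        if rng.random() < variant_ratio:
            parts.append(rng.choice(HEURISTIC_VARIANTS))
        else:
            parts.append(INJECTION_DETAILS[anomaly_type])
    return " ".join(parts)
```

The resulting string would then be tokenized and embedded before being concatenated with the time series embedding.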
[0107] ③ Two-dimensional piecewise mapping
[0108] This step transforms the original time series into an embedded representation compatible with large language models. A two-dimensional slicing strategy is used: a sliding window of size $P \times P$ moves across the time and variable dimensions with stride $s$, extracting overlapping slices. Features are extracted by a multi-layer convolutional neural network consisting of convolutional blocks and global average pooling, and are finally projected onto the hidden dimension of the large language model. The specific steps are:
[0109] $\mathcal{P} = \mathrm{Unfold}\big(\mathrm{Pad}(X_{\mathrm{in}}),\ \text{kernel\_size} = P,\ \text{stride} = s\big) \in \mathbb{R}^{B \times N_p \times P \times P}$
[0110] $F^{(l)} = \mathrm{ConvBlock}_l\big(F^{(l-1)}\big),\quad l = 1, \dots, L,\quad F^{(0)} = \mathcal{P}$
[0111] $h = \mathrm{GAP}\big(F^{(L)}\big) \in \mathbb{R}^{B \times N_p \times C_L}$
[0112] $E_{ts} = \mathrm{MLP}(h) \in \mathbb{R}^{B \times N_p \times D}$
[0113] Slice extraction: apply symmetric padding to the input tensor $X_{\mathrm{in}} \in \mathbb{R}^{B \times T \times N}$ (batch size $B$) so that its dimensions are divisible by $P$, then extract the slice set $\mathcal{P}$ with a sliding window, where $N_p$ is the total number of slices.
[0114] Convolutional feature extraction: multi-layer convolutional blocks process the slices, each layer doubling the channel dimension, and output the feature map $F^{(L)}$, where $L$ is the total number of convolutional layers and $F^{(l)}$ is the feature representation of the $l$-th layer.
[0115] Feature aggregation and projection: global average pooling produces the pooled representation $h$, where $C_L$ is the channel dimension of the last convolutional layer; a multilayer perceptron then projects $h$ onto the hidden dimension $D$ of the large language model to generate the temporal embedding vectors $E_{ts}$.
[0116] This module captures multi-scale spatiotemporal features and supports cross-variable dependency modeling.
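The shape bookkeeping of this module can be checked with a small helper. It is a sketch under assumptions: the convolutional blocks are assumed to keep each slice's P × P spatial size (as the P × P × C_L description in Step S32 implies), the initial channel count `c0` and all example numbers are hypothetical, and shapes are given per sample with the batch axis omitted.

```python
from typing import Dict, Tuple


def slice_embedding_shapes(t: int, n: int, p: int, stride: int,
                           c0: int, num_layers: int, d_model: int) -> Dict[str, Tuple[int, ...]]:
    """Trace per-sample tensor shapes through the 2-D slice embedding module."""
    t_pad = t + (-t) % p  # padding makes the time axis divisible by P
    n_pad = n + (-n) % p  # and the variable axis divisible by P
    n_p = ((t_pad - p) // stride + 1) * ((n_pad - p) // stride + 1)
    c_last = c0 * 2 ** num_layers  # channels double at every conv layer
    return {
        "padded": (t_pad, n_pad),
        "slices": (n_p, p, p),
        "conv_out": (n_p, p, p, c_last),
        "pooled": (n_p, c_last),  # after global average pooling
        "embedded": (n_p, d_model),  # after the MLP projection to the LLM width
    }
```

For example, with T = 100, N = 55, P = s = 8, `c0` = 16, three layers and D = 4096, padding gives 104 × 56, there are 13 × 7 = 91 slices, and the last layer has 16 · 2³ = 128 channels, so the backbone receives 91 time series tokens of width 4096.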
[0117] ④ Forward propagation
[0118] This step performs cross-modal fusion of the text prompts constructed in step 2 and the feature vectors extracted in step 3, using a large language model as the backbone. During training, the text and time series modalities are aligned to enable the large language model to analyze time series. Finally, the time series is output through a simple decoder.
[0119] A pre-trained large language model (such as the DeepSeek, Llama, or Qwen series) is used as the backbone; its parameters are frozen, and it is fine-tuned only through the LoRA adapter. The input is the concatenated embedding sequence $Z^{(0)}$:
[0120] $Z^{(0)} = \big[\, e_{\mathrm{start}};\ E_{\mathrm{prompt}};\ E_{ts};\ e_{\mathrm{end}} \,\big] \in \mathbb{R}^{M \times D}$
[0121] $Z^{(l)} = \mathrm{LLMLayer}_l\big(Z^{(l-1)}\big),\quad l = 1, \dots, L_{\mathrm{llm}}$
[0122] where, for a single sample, $e_{\mathrm{start}}$ is a learnable start-token embedding, $E_{\mathrm{prompt}}$ is the prompt embedding, $E_{ts}$ is the embedding of the time series slices, $e_{\mathrm{end}}$ is a learnable end-token embedding, $M$ is the total number of tokens across all embedded vectors, and $L_{\mathrm{llm}}$ is the total number of layers in the large language model backbone. The output $Z^{(L_{\mathrm{llm}})}$ of the final backbone layer is the context representation. This module encapsulates temporal and semantic information and casts anomaly reconstruction as a language-guided repair problem targeting anomaly categories, leveraging the general temporal modeling capabilities of large language models.
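The construction of the concatenated input sequence (start token, prompt embedding, time slice embedding, end token) and the layer-by-layer recursion through the backbone can be mirrored in a few lines. This is a toy sketch: embeddings are plain lists, the backbone layers are arbitrary callables standing in for frozen large language model blocks, and `build_llm_input` and `run_backbone` are hypothetical names.

```python
from typing import Callable, List

Vector = List[float]
Sequence = List[Vector]


def build_llm_input(start: Vector, prompt_emb: Sequence, ts_emb: Sequence,
                    end: Vector) -> Sequence:
    """Concatenate [start; prompt; time slices; end]; every token must have width D."""
    sequence = [start] + prompt_emb + ts_emb + [end]
    d = len(start)
    if any(len(token) != d for token in sequence):
        raise ValueError("all embeddings must share the hidden size D")
    return sequence


def run_backbone(z: Sequence, layers: List[Callable[[Sequence], Sequence]]) -> Sequence:
    """Apply each layer to the previous layer's output and return the last output."""
    for layer in layers:
        z = layer(z)
    return z
```

With 3 prompt tokens and 5 time slice tokens, the sequence length is 1 + 3 + 5 + 1 = 10.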
[0123] ⑤ Reconstructed sequence output
[0124] Finally, a lightweight multilayer perceptron maps the output of the large language model back to the time series domain, generating the reconstructed sequence.
[0126] The output is cropped to the original dimensions, removing the padding added during slice extraction, so that it matches the input exactly.
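Decoding and cropping can be sketched as the inverse of slice extraction. The sketch assumes that only the hidden states at the N_p time-slice positions are decoded, that each is mapped to the P · P values of its slice, that slices do not overlap, and that padding was split as evenly as possible between the two ends of each axis. A single linear projection `w_out` stands in for the lightweight MLP; `decode_and_crop` is a hypothetical name.

```python
import numpy as np


def decode_and_crop(h_ts: np.ndarray, w_out: np.ndarray, t: int, n: int, p: int) -> np.ndarray:
    """Map (N_p, D) time-slice states to p*p values each, fold the slices, crop the padding.

    w_out has shape (D, p*p) and stands in for the lightweight MLP decoder.
    """
    pad_t, pad_n = (-t) % p, (-n) % p
    rows, cols = (t + pad_t) // p, (n + pad_n) // p
    slices = (h_ts @ w_out).reshape(rows, cols, p, p)
    # Inverse of the slicing reshape: (rows, cols, p, p) -> (rows * p, cols * p)
    full = slices.transpose(0, 2, 1, 3).reshape(rows * p, cols * p)
    top, left = pad_t // 2, pad_n // 2  # padding that was placed before the data
    return full[top:top + t, left:left + n]  # exactly (T, N), matching the input


h_ts = np.random.default_rng(0).standard_normal((13, 16))   # N_p = 13 states, D = 16
w_out = np.random.default_rng(1).standard_normal((16, 64))  # P = 8 -> 64 values per slice
print(decode_and_crop(h_ts, w_out, t=100, n=7, p=8).shape)  # (100, 7)
```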
[0127] ⑥ Loss Calculation
[0128] The training objective is to minimize the mean squared error (MSE) between the reconstructed sequence and the original normal sequence (the series before anomaly injection), so that the output is as close as possible to the original input and the injected anomalies are repaired. During inference, the MSE between the input and output time series is computed and used for classification.
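The two uses of the mean squared error in [0128] are shown below. The training loss compares the reconstruction with the series before anomaly injection; for the inference score, the sketch assumes that the per-time-point error is averaged over the N variables, which the source does not state explicitly. Both function names are hypothetical.

```python
import numpy as np


def reconstruction_loss(x_clean: np.ndarray, x_hat: np.ndarray) -> float:
    """Training loss: MSE between the reconstruction and the series before injection."""
    return float(np.mean((x_hat - x_clean) ** 2))


def pointwise_error(x_in: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Inference score: squared error per time step, averaged over the N variables."""
    return np.mean((x_hat - x_in) ** 2, axis=1)  # shape (T,)


x = np.zeros((3, 2))
x_hat = np.array([[1.0, 1.0], [0.0, 0.0], [2.0, 0.0]])
print(reconstruction_loss(x, x_hat))  # 1.0
print(pointwise_error(x, x_hat))      # [1. 0. 2.]
```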
[0129] ⑦ Backpropagation
[0130] During training, the MSE is used as the loss for backpropagation. The model is trained in two stages. In the first stage, it is trained jointly on multiple datasets to learn a shared classification of anomaly categories, and only the encoder and decoder parameters are trainable. In the second stage, it is fine-tuned on an individual dataset to adapt to the downstream task, and the LoRA adapter is activated for backpropagation on top of the trained encoder and decoder. No backpropagation is performed during inference.
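LoRA keeps a pre-trained weight W frozen and learns a low-rank update, so an adapted layer computes x Wᵀ + (α/r) x Aᵀ Bᵀ. The sketch below shows that standard LoRA form in NumPy together with one reading of the two-stage schedule in [0130]. Whether the encoder and decoder remain trainable in the second stage is not explicit in the source; here they are assumed to stay trainable alongside the adapter. `LoRALinear`, `trainable_groups`, and the rank and alpha defaults are illustrative.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = x W^T + (alpha / r) x A^T B^T."""

    def __init__(self, w: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w.shape
        self.w = w                                         # frozen pre-trained weight
        self.a = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small random init
        self.b = np.zeros((d_out, rank))                   # trainable, zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T


def trainable_groups(stage: int) -> set:
    """Parameter groups updated by backpropagation in each training stage (one reading of [0130])."""
    if stage == 1:
        return {"encoder", "decoder"}          # backbone frozen, adapter not yet active
    if stage == 2:
        return {"encoder", "decoder", "lora"}  # assumption: encoder/decoder stay trainable
    raise ValueError(f"unknown stage: {stage}")
```

Initializing B to zero means the adapted backbone initially behaves exactly like the frozen pre-trained model, which is the usual LoRA design choice.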
[0131] ⑧ Inference and Classification
[0132] During inference, the distribution of the MSE values produced by the model is analyzed statistically, and a quantile (for example, the 99th percentile) is used as the anomaly threshold. Time points whose error exceeds the threshold are labeled anomalous; all others are labeled normal. Comparing these binary labels with the ground-truth labels yields an F1 score that measures the model's detection performance, and the binary labels are provided to the user as a reference for anomaly detection.
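A sketch of the quantile thresholding and F1 evaluation in [0132] follows. It uses a plain point-wise F1 without any point-adjustment protocol, and the quantile is a parameter because the source gives the 99th percentile only as an example; the function names are hypothetical.

```python
import numpy as np


def detect(scores: np.ndarray, q: float = 99.0):
    """Label points whose error exceeds the q-th percentile as anomalous (1), else normal (0)."""
    threshold = np.percentile(scores, q)
    return (scores > threshold).astype(int), threshold


def f1_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Point-wise F1 between binary predictions and ground-truth labels."""
    tp = int(np.sum((pred == 1) & (truth == 1)))
    fp = int(np.sum((pred == 1) & (truth == 0)))
    fn = int(np.sum((pred == 0) & (truth == 1)))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```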
[0133] Other alternative technical solutions:
[0134] 1. Alternatives to the anomaly injection strategy:
[0135] This invention employs five predefined anomaly types (global point anomaly, context point anomaly, shape segment anomaly, seasonal segment anomaly, and trend segment anomaly). Alternatives may include:
[0136] (1) Expanding or reducing the anomaly types: for example, using only a subset of the point and pattern anomaly types, or introducing new anomaly categories (such as semantic anomalies generated by generative adversarial networks) to suit specific application scenarios.
[0137] (2) Dynamic injection mechanism: the anomaly injection parameters (such as location, amplitude, and duration) can be sampled from a data-driven distribution rather than fixed in advance, to increase diversity.
[0138] (3) Real anomaly fusion: real anomaly data (where labels are available) can be mixed with synthetic anomalies during training to improve robustness to complex anomalies.
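Two of the five anomaly types can be injected with randomly sampled parameters, in the spirit of the dynamic injection mechanism in item (2). The definitions below are simplified assumptions, not the patent's injector: a global point anomaly is modeled as a spike of ±scale standard deviations at one random entry, and a trend anomaly as a linear ramp added to a random segment of one variable. Each function also returns the affected (start, end, variable) span, which could be used to fill in the anomaly details of a prompt; the function names and defaults are hypothetical.

```python
import numpy as np


def inject_global_point(x: np.ndarray, rng: np.random.Generator, scale: float = 3.0):
    """Add a +/- scale * std spike at one random (time, variable) entry."""
    x = x.copy()
    t = int(rng.integers(x.shape[0]))
    v = int(rng.integers(x.shape[1]))
    sign = rng.choice([-1.0, 1.0])
    x[t, v] += sign * scale * (x[:, v].std() + 1e-8)  # epsilon keeps the spike nonzero for flat series
    return x, (t, t + 1, v)


def inject_trend(x: np.ndarray, rng: np.random.Generator, max_len: int = 20, slope: float = 0.1):
    """Add a linear ramp of random length and position to one variable."""
    x = x.copy()
    length = int(rng.integers(2, min(max_len, x.shape[0]) + 1))
    start = int(rng.integers(0, x.shape[0] - length + 1))
    v = int(rng.integers(x.shape[1]))
    x[start:start + length, v] += slope * np.arange(length)
    return x, (start, start + length, v)


rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 12, 200))[:, None].repeat(3, axis=1)  # T = 200, N = 3
injected, span = inject_trend(series, rng)
print(span)  # (start, end, variable) of the injected segment
```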
[0139] 2. Alternatives to the model architecture:
[0140] This invention uses a large language model backbone with convolutional slice embedding and LoRA adaptation. Alternatives may include:
[0141] (1) Embedding module: slice embedding can use a Transformer encoder, a graph neural network, or a similar network instead of a convolutional neural network to capture spatiotemporal features at different scales.
[0142] (2) Backbone replacement: the large language model can be replaced with other general-purpose foundation models or with domain-specific (vertical) models, which may improve multimodal alignment and sequence reconstruction performance.
[0143] (3) Decoder adjustment: the decoder can use a linear projection or a recurrent neural network instead of a multilayer perceptron, for example to reduce complexity.
[0144] 3. Alternatives to the training and inference processes:
[0145] This method trains the model with a two-stage strategy. The first stage trains jointly on multiple datasets, using text semantics to learn to distinguish the different anomaly categories. The second stage trains on a single dataset and fine-tunes the large model through an adapter to adapt it to the downstream task. Alternatives may include:
[0146] (1) Training strategy adjustment: the two-stage training can be simplified to single-stage end-to-end learning, or, within single-stage learning, the datasets can be arranged so that the learning difficulty increases gradually.
[0147] (2) Fine-tuning method extension: in addition to LoRA, prefix tuning or full-parameter fine-tuning can be used; full-parameter fine-tuning has a higher computational cost but may improve performance.
[0148] (3) Enhanced inference mechanism: the reconstruction error can be supplemented with a prediction error in the anomaly score, for example by jointly training a prediction head.
[0149] Protection points of this invention:
[0150] 1. AIR-Time is a time series anomaly detection framework driven by a large language model based on the anomaly injection-reconstruction paradigm. It mainly includes five core components: (1) anomaly injection-reconstruction module, (2) prompt construction module, (3) segmented embedding module, (4) large language model backbone, and (5) decoder.
[0151] 2. For multimodal anomaly injection and alignment during the training phase, this invention proposes a unified anomaly injector that can generate a variety of synthetic anomalies, including point anomalies and pattern anomalies. It also designs a structured prompt template that combines a dataset description, task instructions, and anomaly details. By fusing the text embeddings with the time-series segment embeddings, the numerical sequences are effectively aligned with the semantic space.
[0152] 3. For the temporal repair and reconstruction stage based on a large language model, this invention constructs a complete model architecture including a lightweight segmented embedding network, a frozen large language model backbone (adapted with LoRA), and a reconstruction decoder. Its key point is to cast the anomaly repair task as a language-guided sequence-to-sequence transformation problem, using the general sequence modeling capabilities of the large language model to repair and reconstruct abnormal sequences into normal sequences without modifying its internal architecture.
[0153] 4. Regarding anomaly detection and system implementation, this invention provides a system solution comprising a two-stage training and inference workflow. During training, anomaly injection is activated to learn repair capabilities. The first stage involves unified training on multiple datasets to learn a common anomaly category classification. The second stage involves fine-tuning on a separate dataset to adapt to downstream tasks. During inference, injection is disabled, and anomaly detection is performed based on reconstruction errors.
Claims
1. A time series anomaly detection method driven by a large language model, characterized in that it includes the following steps: Step S1: given a multivariate time series, the anomaly injection module generates a time series containing synthetic anomalies during the training phase; Step S2: the anomaly injection module simultaneously generates anomaly category-related prompt text and the anomaly-injected time series, and the anomaly category-related prompt text, after prompt construction, is used as the text prompt input to the large model; Step S3: the processed time series is mapped into a two-dimensional slice representation and fed into the large model for time series feature extraction; Step S4: the anomaly category-related prompt text and the two-dimensional slices are used for feature extraction through forward propagation to reconstruct the time series containing the anomalies; Step S5: the model outputs the reconstructed time series; Step S6: the loss between the anomalous time series and the reconstructed sequence is calculated; Step S7: during the training phase, the loss is backpropagated for model optimization; Step S8: during the inference phase, the loss is used for binary classification to assign anomaly labels.
2. The time series anomaly detection method driven by a large language model according to claim 1, characterized in that Step S1 includes the following sub-steps: Step S11: during the training phase, random transformations are applied to the input multivariate time series $X \in \mathbb{R}^{T \times N}$, where T is the number of time steps and N is the number of variables, to generate the anomaly-injected sequence $\tilde{X}$; Step S12: the injection operation covers five types of anomalies, namely global point anomalies, contextual point anomalies, shape pattern anomalies, seasonal pattern anomalies, and trend pattern anomalies, to simulate diverse abnormal behaviors; Step S13: the injection intensity is controlled to prevent excessive perturbation from affecting convergence; Step S14: the anomaly injection module actively exposes anomalies, enabling the model to learn repair capabilities; the anomaly injection operator is calculated as $\tilde{X} = A(X, \theta)$, where A represents the anomaly injection operation and θ encapsulates the anomaly configuration.
3. The time series anomaly detection method driven by a large language model according to claim 1, characterized in that Step S2 includes the following steps: Step S21: describe the domain and details of the time series data; Step S22: clearly define the objectives of the time series anomaly detection and reconstruction paradigm; Step S23: for the anomaly injection process during training, inform the model of the type of anomaly injected; Step S24: during training, replace 30% of the prompts with heuristic variants to prevent overfitting; Step S25: the anomaly prompt and time series generation module achieves explicit alignment between time series and text semantics, promoting cross-modal reasoning.
4. The time series anomaly detection method driven by a large language model according to claim 1, characterized in that Step S3 includes the following steps: Step S31: symmetric padding is applied to the input tensor so that its dimensions are divisible by P, and a sliding window is then used to extract the set of slices, where $N_p$ is the total number of slices; the unfold operation samples the two-dimensional time series with a window of size kernel_size and a sampling step of stride, and outputs each sample in the batch split into $N_p$ slices of size $P \times P$; Step S32: the slices are processed by multi-layer convolutional blocks, with the channel dimension doubled at each convolutional layer, to output a feature map according to $F^{(l)} = \mathrm{ConvBlock}\big(F^{(l-1)}\big)$, $l = 1, \dots, L$, where L is the total number of convolutional layers, ConvBlock denotes the convolutional block at each layer, $F^{(l-1)}$ is the output of the previous layer and the input of the l-th layer, and $F^{(l)}$ is the feature representation output by the l-th layer; features are thus extracted from each $P \times P$ slice, and because the number of channels doubles at every layer, the final number of channels is $C_L$, so that each slice yields features of size $P \times P \times C_L$; Step S33: global average pooling is applied to generate the pooled representation $z = \mathrm{GAP}\big(F^{(L)}\big) \in \mathbb{R}^{C_L}$, and a multilayer perceptron projects it onto the hidden dimension D of the large language model to generate the temporal embedding vector $e = \mathrm{MLP}(z) \in \mathbb{R}^{D}$, where $C_L$ is the channel dimension of the last convolutional layer.
5. The time series anomaly detection method driven by a large language model according to claim 1, characterized in that Step S4 specifically involves: the anomaly category-related prompt text constructed in Step S2 and the two-dimensional slices from Step S3 are fused across modalities, with a large language model as the backbone; during training, the anomaly category-related prompt text and the time series are aligned so that the large language model can analyze time series, and the time series is finally output through a simple decoder; the expressions are $Z = [\,e_{\text{bos}};\ E_{\text{p}};\ E_{\text{x}};\ e_{\text{eos}}\,] \in \mathbb{R}^{M \times D}$ and $H^{(0)} = Z$, $H^{(l)} = \mathrm{LLMLayer}_l\big(H^{(l-1)}\big)$, $l = 1, \dots, L_m$, where $e_{\text{bos}}$ is the learnable start-of-sequence token embedding, $E_{\text{p}}$ is the prompt text embedding, $E_{\text{x}}$ is the time-slice embedding, $e_{\text{eos}}$ is the learnable end-of-sequence token embedding, M is the number of tokens across all embedded vectors, D is the hidden dimension of the large language model, $L_m$ is the total number of layers in the backbone of the large language model, and $H^{(l)}$ is the context representation output by the l-th layer of the backbone, which encapsulates temporal and semantic information.
6. The time series anomaly detection method driven by a large language model according to claim 1, characterized in that Step S5 specifically involves: at the end of forward propagation, a lightweight multilayer perceptron maps the output of the large language model back to the time series domain to generate the reconstructed sequence, and the output is cropped to the original dimensions so that it matches the input exactly.
7. The time series anomaly detection method driven by a large language model according to claim 1, characterized in that Step S6 includes the following steps: Step S61: the training objective after reconstruction is to minimize the mean squared error between the reconstructed sequence and the original normal sequence, so that the output sequence is as close as possible to the original input sequence, thereby reconstructing the abnormal sequence; Step S62: during inference, the mean squared error between the input and output time series is also calculated to perform inference classification.
8. The time series anomaly detection method driven by a large language model according to claim 1, characterized in that Step S7 includes the following steps: Step S71: during training, the mean squared error is used as the loss for backpropagation to train the model; Step S72: the model is trained in two stages: in the first stage, it is trained jointly on multiple datasets to learn a common anomaly category classification, and only the encoder and decoder parameters are activated for training; in the second stage, it is fine-tuned on a separate dataset to adapt to downstream tasks, and the LoRA adapter is activated for backpropagation on top of the trained encoder and decoder.
9. The time series anomaly detection method driven by a large language model according to claim 1, characterized in that Step S8 includes the following steps: Step S81: during inference, the mean squared error obtained from model inference is analyzed statistically, and a quantile of the error is used as the anomaly threshold; a time point whose error exceeds the threshold is marked as an anomalous point, and otherwise as a normal point; Step S82: the binary classification labels are compared with the ground-truth labels to calculate an F1 score representing the model's inference performance, and the binary classification labels are provided to the user as a reference for anomaly detection.