Time-frequency-based framework and algorithm for explainability of time-series classification model
The time-frequency-based framework addresses the lack of frequency domain consideration in conventional models by using conversion and perturbation algorithms to enhance explainability and provide rapid decision rationale in time series classification models.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- KOREA ADVANCED INST OF SCI & TECH
- Filing Date
- 2024-12-12
- Publication Date
- 2026-06-18
AI Technical Summary
Conventional techniques for explaining time series classification models lack the ability to consider the frequency domain, leading to decreased explainability due to increased model complexity, which is critical in fields like finance, climate, and medicine.
A time-frequency-based framework and algorithm that includes a first transformation unit for converting time series data to frequency data, a feature extraction unit for deriving perturbation outputs using insertion, deletion, and combination steps, and a second transformation unit for applying inverse Fourier transforms to verify classification results.
Provides detailed explanations for time series classification by considering both time and frequency domains, enhancing explainability and enabling faster decision rationale derivation in fields such as finance, climate, and medicine.
Abstract
Description
Time-Frequency Based Framework and Algorithm for Explaining Time Series Classification Models
[0001] The present invention relates to a time-frequency-based framework and algorithm for explaining a time series classification model, and more specifically, to a time-frequency-based framework and algorithm for explaining a time series classification model that provides an explanation for class determination of a time series classification model by considering a time domain and a frequency domain.
[0002] While the performance of time series classification models has improved due to the rapid recent advancements in artificial intelligence technology, the complexity of these models has also increased significantly, leading to a decrease in the explainability of their decisions.
[0003] In particular, since time series classification models are primarily used in fields where the model's decisions regarding time series data are critical, such as finance, climate, science, and medicine, explaining the model's decisions is essential for users to understand those decisions and verify the rationale.
[0004] Conventional techniques for explaining decisions in time series models have limitations in that they cannot consider the frequency domain because they rely only on the time domain to extract and explain important features using perturbation methods.
[0005] Therefore, there is a need for a framework that provides an explanation for the determination of time series classification models by considering both the time domain and the frequency domain.
[0006] Accordingly, an object of the present invention is to provide a time-frequency-based framework and algorithm for explaining a time series classification model, which provides an explanation for class determination of the time series classification model by considering both the time domain and the frequency domain.
[0007] According to an embodiment of the present invention for achieving such technical challenges, a time-frequency-based framework for explaining a time series classification model may include: a first transformation unit that receives time series data from a user and extracts frequency data by applying a predetermined first transformation method; a feature extraction unit that derives a perturbation output by applying the frequency data, a target time series classification model, and a classification result derived by applying the time series data to the target time series classification model to a constructed perturbation algorithm; and a second transformation unit that derives predicted time series data by applying a predetermined second transformation method to the perturbation output and applies the derived predicted time series data to the target time series classification model to check whether the predicted classification result matches the input classification result.
[0008] The first transformation unit may convert the time series data into frequency data in the frequency domain through the following mathematical formula:
[0009] $$S[m,k] = \sum_{n=0}^{N-1} x[n+mH]\,w[n]\,e^{-j 2\pi k n / N}, \quad m = 0, \dots, M-1,\; k = 0, \dots, K-1$$
[0010] Here, S is the frequency data converted to the frequency domain, M is the total number of time segments, K is the total number of frequency bins, N is the number of samples in the window, w[n] is the window function applied to the n-th sample within the windowed portion of the time series data, x[n+mH] is the signal sample in the m-th time segment, whose start is offset by m times the hop size (H), and $e^{-j 2\pi k n / N}$ is the Fourier transform kernel for discrete frequency k.
[0011] The feature extraction unit may derive a perturbation output from the frequency data through a perturbation algorithm that performs any one of an insertion step, a deletion step, and a combination step.
[0012] The insertion step inserts a plurality of features into the frequency data, analyzes the classification probability of the frequency data with the inserted features for each class of the target time series classification model to measure a first importance of the features for each class, determines a ranking of the features in descending order of the first importance for classification into the correct class, and derives features of a predetermined rank or higher as the perturbation output.
[0013] The perturbation algorithm may determine the ranking of the features, in descending order of the first importance for classification into the correct class, by repeatedly measuring the first importance for each class while inserting the plurality of features into the frequency data in a plurality of different combinations.
[0014] The deletion step analyzes the degree to which the probability of the target time series classification model classifying the frequency data by class decreases when one or more features of the frequency data are deleted, measures a second importance of the features of the frequency data by class, determines a ranking of the features in descending order of the second importance for classification into the correct class, and derives features of a predetermined rank or higher as the perturbation output.
[0015] The perturbation algorithm may determine the ranking of the features, in descending order of the second importance for classification into the correct class, by repeatedly deleting one or more features in the time-frequency domain and measuring the second importance while varying the deleted features over a plurality of different combinations.
[0016] The combination step analyzes the classification probability of the frequency data with inserted features for each class of the target time series classification model when a plurality of features are inserted into the frequency data, and measures the first importance of the features for each class; analyzes the degree to which the probability of the target time series classification model classifying the frequency data for each class decreases when one or more features of the frequency data are deleted, and measures the second importance of the features for each class; calculates the difference between the first importance and the second importance in the correct class; determines a ranking of the features in order of smallest difference; and derives features of a predetermined rank or higher as the perturbation output.
[0017] The first transformation unit may derive the frequency data by applying a short-time Fourier transform (STFT) to the input time series data, and the second transformation unit may derive the predicted time series data by applying an inverse short-time Fourier transform (ISTFT) to the perturbation output.
[0018] The second transformation unit may apply the pre-specified second transformation method to the perturbation output by setting hyperparameters so as to satisfy a preset condition.
[0019] The second transformation unit may provide an explanation of the classification result of the target time series classification model through a weighted sum, based on the perturbation output, of the class-specific classification probabilities included in the predicted classification result.
[0020] As such, according to the present invention, it is possible to provide a high-quality explanation of the decisions of a time series classification model by considering both the time domain and the frequency domain, and to provide a detailed explanation of why time series data is classified into a specific class by utilizing multiple features derived from the time domain and the frequency domain.
[0021] Furthermore, by applying it to the fields of finance, climate, science, and medicine, the reasons for the decisions of time series classification models can be derived in the time and frequency domains.
[0022] In addition, it is possible to derive an explanation for the determination of the time series classification model more quickly than conventional technology.
[0023] FIG. 1 is a diagram of a time-frequency-based framework explaining the determination of a time series classification model according to one embodiment of the present invention.
[0024] FIG. 2 is a diagram illustrating the structure of a time-frequency-based framework explaining the determination of a time series classification model according to one embodiment of the present invention.
[0025] FIG. 3 is a diagram illustrating time series data in the time domain of each class and converted frequency data according to an embodiment of the present invention.
[0026] FIG. 4 is a diagram illustrating the structure of a perturbation algorithm according to one embodiment of the present invention.
[0027] Figure 5 is a table showing the difference in class prediction probabilities before and after when randomly generated noise is added in the insertion step according to an embodiment of the present invention and a conventional model.
[0028] Figure 6 is a table of reduced class prediction probabilities when the most important features in each class are removed in the deletion step according to a conventional model and an embodiment of the present invention.
[0029] Figure 7 is a table of reduced class prediction probabilities when k features are removed for the correct class in the deletion step according to a conventional model and an embodiment of the present invention.
[0030] Figure 8 is a graph using the most important features of each class derived using a conventional model and a perturbation algorithm according to an embodiment of the present invention.
[0031] Preferred embodiments according to the present invention will be described in detail below with reference to the attached drawings. In this process, the thickness of lines or the size of components shown in the drawings may be exaggerated for clarity and convenience of explanation.
[0032] Furthermore, the terms described below are defined in consideration of their functions within the present invention, and these may vary depending on the intent or practice of the user or operator. Therefore, the definitions of these terms should be based on the content throughout this specification.
[0033] In the embodiments described below, the time-frequency-based framework (100) for explaining the determination of a time series classification model may be operated by a computing device comprising one or more memories and one or more processors capable of performing the following process.
[0034] FIG. 1 is a configuration diagram of a time-frequency-based framework explaining the determination of a time series classification model according to one embodiment of the present invention, and FIG. 2 is a diagram illustrating the structure of a time-frequency-based framework explaining the determination of a time series classification model according to one embodiment of the present invention.
[0035] As illustrated in FIGS. 1 and 2, the time-frequency based framework (100) includes a first transformation unit (110), a feature extraction unit (120), and a second transformation unit (130).
[0036] First, the first transformation unit (110) can receive time series data from the user and apply a pre-specified first transformation method (e.g., Short-time Fourier transform (STFT)) to extract frequency data.
[0037] Specifically, the first conversion unit (110) can apply a predetermined first conversion method (e.g., short-time Fourier transform) to the input time series data to convert and extract frequency data including frequency changes of the time series data. At this time, the frequency data is formed by decomposing the input time series data into the sum of a plurality of simple periodic functions, and represents the change in the intensity of the frequency components over time for the plurality of frequency components included in the time series data.
[0038] At this time, the first transformation unit (110) can extract frequency data by converting the time series data into the frequency domain through the following [Equation 1].
[0039] [Equation 1] $$S[m,k] = \sum_{n=0}^{N-1} x[n+mH]\,w[n]\,e^{-j 2\pi k n / N}, \quad m = 0, \dots, M-1,\; k = 0, \dots, K-1$$
[0040] Here, S is the frequency data converted to the frequency domain, M is the total number of time segments, K is the total number of frequency bins, N is the number of samples in the window, w[n] is the window function applied to the n-th sample within the windowed portion of the time series data, x[n+mH] is the signal sample in the m-th time segment, whose start is offset by m times the hop size (H), and $e^{-j 2\pi k n / N}$ is the Fourier transform kernel for discrete frequency k.
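As a minimal sketch of the windowed transform described above, the following NumPy code computes S for a toy signal. The function name `stft`, the Hann window, and the choice to drop a final partial segment instead of padding are assumptions made for illustration; the specification only fixes the roles of N, H, w[n], and x[n+mH].

```python
import numpy as np

def stft(x, window_size=16, hop=8):
    """Return S with shape (M, K): M time segments, K = window_size frequency bins."""
    x = np.asarray(x, dtype=float)
    w = np.hanning(window_size)               # w[n]; the Hann window is an assumption
    M = 1 + (len(x) - window_size) // hop     # number of complete segments (no padding)
    # Row m holds the windowed segment x[n + mH] * w[n] for n = 0..N-1.
    frames = np.stack([x[m * hop : m * hop + window_size] * w for m in range(M)])
    # The FFT along n applies the kernel exp(-j 2 pi k n / N) for every bin k.
    return np.fft.fft(frames, axis=1)

# A pure tone with 4 cycles per 16-sample window should peak at bin k = 4.
n = np.arange(128)
x = np.sin(2 * np.pi * 4 * n / 16)
S = stft(x)
print(S.shape)                                # (15, 16)
print(int(np.argmax(np.abs(S[0, :8]))))       # 4
```

The magnitude of each row of `S` shows how strongly each frequency bin is present in that time segment, which is the time-frequency representation the perturbation algorithm operates on.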
[0041] FIG. 3 is a diagram illustrating time series data in the time domain of each class and converted frequency data according to an embodiment of the present invention.
[0042] As illustrated in FIG. 3, the first conversion unit (110) can convert time series data in the time domain by class into frequency data by applying a predetermined first conversion method (e.g., short-time Fourier transform).
[0043] Next, the feature extraction unit (120) can derive a perturbation output by applying the frequency data, the target time series classification model, and the classification result derived by applying the time series data to the target time series classification model to the constructed perturbation algorithm. At this time, the class of the classification result derived by applying the time series data to the target time series classification model is the correct class.
[0044] FIG. 4 is a diagram illustrating the structure of a perturbation algorithm according to one embodiment of the present invention.
[0045] As illustrated in FIG. 4, the perturbation algorithm is a model constructed to derive a perturbation output from input frequency data by performing any one of an insertion step, a deletion step, and a combination step. In this case, the perturbation algorithm can select features that are more accurately classified into the correct class through realistic background perturbation (RBP) on the frequency data.
[0046] In the insertion step, the perturbation algorithm analyzes the classification probability of the frequency data with inserted features for each class of the target time series classification model when multiple features are inserted into the frequency data, measures the first importance of the features of the frequency data with multiple features inserted for each class, determines the ranking of the features in order of highest first importance to be classified as the correct class, and derives features of a predetermined rank or higher as perturbation outputs. At this time, the first importance for each class can be calculated by repeating the features of the frequency data with multiple features inserted in multiple different combinations.
[0047] To elaborate, perturbation algorithms can derive a perturbation output by applying realistic background perturbations to frequency data to identify the frequency band with the highest representation and lowest variance in time-frequency.
[0048] At this time, the feature extraction unit (120) can produce a perturbation output derived by applying a binary perturbation mask to a realistic background perturbation through [Equation 2] below.
[0049]
[0050] Here, the left-hand side of [Equation 2] is the p-th perturbed output of the i-th iteration. The equation combines the result of applying realistic background perturbation to the time series data (x), the frequency data (S), and a set of P random binary perturbation masks, where each binary perturbation mask represents the R unmasked time-frequency regions as 1 and the remaining masked regions as 0. The index i ranges from 1 to the number of feature sets in S.
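The masking described above can be sketched as follows. This is an illustration under stated assumptions, not the formula of [Equation 2]: the mixing rule `mask * S + (1 - mask) * background`, the hypothetical helper names `random_region_masks` and `perturb`, and the treatment of each time-frequency cell as one region are all assumed here, and the background is passed in as a given array rather than computed by realistic background perturbation.

```python
import numpy as np

def random_region_masks(shape, n_masks, n_unmasked, rng):
    """Draw n_masks binary masks; each keeps n_unmasked cells (1) and masks the rest (0)."""
    size = shape[0] * shape[1]
    masks = np.zeros((n_masks, size), dtype=int)
    for p in range(n_masks):
        keep = rng.choice(size, size=n_unmasked, replace=False)
        masks[p, keep] = 1
    return masks.reshape((n_masks,) + tuple(shape))

def perturb(S, background, mask):
    """Keep S where mask == 1 and substitute the background where mask == 0 (assumed rule)."""
    return mask * S + (1 - mask) * background

rng = np.random.default_rng(0)
S = np.arange(12, dtype=float).reshape(3, 4)   # toy stand-in for the frequency data
background = np.full_like(S, -1.0)             # toy stand-in for the RBP result
masks = random_region_masks(S.shape, n_masks=5, n_unmasked=4, rng=rng)
S_perturbed = perturb(S, background, masks[0])
```

With P masks drawn this way, each perturbed output differs only in which regions of S survive, so differences in the classifier's output can be attributed to those regions.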
[0051] In addition, the feature extraction unit (120) can convert the perturbation output into the time domain to obtain a converted perturbation output, and can apply the converted perturbation output to the target time series classification model (C) to calculate the classification probability of the frequency data by class.
[0052] In addition, the feature extraction unit (120) can calculate the first importance through the following [Equation 3], based on the classification probability with which the time series data is classified into the correct class and the classification probability with which the frequency data is classified into the correct class.
[0053]
[0054] Here, the indicator function in [Equation 3] is 1 if the binary perturbation mask for the feature is removed at the p-th perturbation, and 0 if the binary perturbation mask for the feature is present at the p-th perturbation.
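One common way to turn such an indicator into a per-feature score is a Monte Carlo average in the style of RISE: average the correct-class probability over the perturbations in which the feature was left unmasked. The sketch below uses that estimator as a stand-in; it is an assumption and not the formula of [Equation 3]. The function name `insertion_importance` and the toy numbers are hypothetical.

```python
import numpy as np

def insertion_importance(masks, probs):
    """masks: (P, F) binary, 1 = feature left unmasked at perturbation p.
    probs: (P,) correct-class probability of each perturbed input.
    Returns, per feature, the mean probability over perturbations that kept it."""
    masks = np.asarray(masks, dtype=float)
    probs = np.asarray(probs, dtype=float)
    kept_counts = masks.sum(axis=0)
    # Guard against features that were never kept in any perturbation.
    return (masks * probs[:, None]).sum(axis=0) / np.maximum(kept_counts, 1)

masks = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]])
probs = np.array([0.9, 0.8, 0.3, 0.2])
scores = insertion_importance(masks, probs)
ranking = np.argsort(-scores)   # feature indices, highest score first
print(scores)                   # [0.85 0.55 0.25]
print(ranking)                  # [0 1 2]
```

Feature 0 ranks first because the classifier stays confident whenever it is present.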
[0055] In addition, the feature extraction unit (120) can select the feature with the highest first importance as the perturbation output, and can update the feature set (F) of the frequency data and the selected feature set through [Equation 4] below.
[0056]
[0057] Figure 5 is a table showing the difference in class prediction probabilities before and after when randomly generated noise is added in the insertion step according to an embodiment of the present invention and a conventional model.
[0058] As shown in Figure 5, unlike conventional models (LIME, KernelSHAP, RISE, LIMESegment), the perturbation algorithm (FIA(ours)) of the present invention does not show a large difference in the class prediction probability of frequency data with inserted noise, indicating robustness against noise.
[0059] In the deletion step, the perturbation algorithm measures the second importance of features by class by analyzing the degree to which the probability of a target time series classification model classifying frequency data by class decreases when one or more features of the frequency data are deleted from the frequency data, and determines the rank of features in order of the greatest second importance in the ground truth class to derive features of a predetermined rank or higher as the perturbation output. At this time, the perturbation algorithm deletes one or more features in the time-frequency domain and calculates the second importance of each class by repeating the features of the frequency data from which one or more features have been removed in multiple different combinations.
[0060] To elaborate, the feature extraction unit (120) can generate P different random mask features for the frequency data (S) at the i-th iteration, and for each perturbation, the R unmasked time-frequency regions can be represented by 1 and the masked regions by 0. Here, i is an index ranging from 1 to the number of feature sets of the frequency data (S) (e.g., i=1, 2, 3, …, F), and P is the total number of masks.
[0061] At this time, the number of masks (P) is a hyperparameter that involves a tradeoff: as the number of masks increases, the computation cost increases, whereas if the number of masks is too small, too few masked features are exposed and the measured importance becomes inaccurate. In the present invention, the number of masks (P) is limited to 2,000 in all processes (insertion step, deletion step, and combination step).
[0062] In addition, the feature extraction unit (120) can derive a perturbation output using each mask p of the i-th iteration number through the following [Equation 5].
[0063]
[0064] Here, the left-hand side of [Equation 5] is the p-th perturbed output of the i-th iteration, and the equation combines the result of applying realistic background perturbation to the time series data (x), the i-th mask feature, and the frequency data (S).
[0065] In addition, the feature extraction unit (120) can calculate the second importance through the following [Equation 6].
[0066]
[0067] Here, the indicator in [Equation 6] is 1 if the i-th mask feature is not masked at the p-th perturbation and 0 otherwise, and the two probability terms are the classification probability of classifying the time series data into the correct class and the classification probability of classifying the frequency data into the correct class.
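The deletion-side measurement can be sketched in the same spirit: for each feature, average the drop in correct-class probability over the perturbations in which that feature was masked. This estimator is an assumption chosen for illustration and is not the formula of [Equation 6]; `deletion_importance`, `base_prob`, and the toy numbers are hypothetical.

```python
import numpy as np

def deletion_importance(masks, probs, base_prob):
    """masks: (P, F) binary, 1 = feature kept, 0 = feature deleted at perturbation p.
    probs: (P,) correct-class probability of each perturbed input.
    base_prob: correct-class probability of the unperturbed input.
    Returns, per feature, the mean probability drop over perturbations that deleted it."""
    deleted = 1.0 - np.asarray(masks, dtype=float)
    drop = base_prob - np.asarray(probs, dtype=float)
    deleted_counts = deleted.sum(axis=0)
    # Guard against features that were never deleted in any perturbation.
    return (deleted * drop[:, None]).sum(axis=0) / np.maximum(deleted_counts, 1)

masks = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]])
probs = np.array([0.9, 0.8, 0.3, 0.2])
scores = deletion_importance(masks, probs, base_prob=0.95)
print(scores)   # [0.7 0.4 0.1]
```

A large score means the classifier loses confidence in the correct class when the feature is removed.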
[0068] In addition, the feature extraction unit (120) can select the feature with the lowest second importance, and can update the feature set (F) of the frequency data and the selected feature set through [Equation 7] below.
[0069]
[0070] Figure 6 is a table of reduced class prediction probabilities when the most important features in each class are removed in the deletion step according to a conventional model and an embodiment of the present invention.
[0071] As shown in Fig. 6, unlike conventional models (LIME, KernelSHAP, RISE, LIMESegment), the perturbation algorithm (FIA(ours)) of the present invention has the largest reduced class prediction probability, so it can be seen that the perturbation algorithm of the present invention selects important features better than conventional models.
[0072] Figure 7 is a table of reduced class prediction probabilities when k features are removed for the correct class in the deletion step according to a conventional model and an embodiment of the present invention.
[0073] As shown in Fig. 7, unlike conventional models (LIME, KernelSHAP, RISE, LIMESegment), the perturbation algorithm (FIA(ours)) of the present invention has the largest reduced class prediction probability, so it can be seen that the perturbation algorithm of the present invention selects important features better than conventional models.
[0074] In the combination step, the perturbation algorithm analyzes the classification probability of the frequency data with inserted features for each class of the target time series classification model when multiple features are inserted into the frequency data, measures the first importance of the features of the frequency data with inserted features for each class, analyzes the degree to which the probability of classifying the frequency data for each class decreases when one or more features of the frequency data are deleted, measures the second importance of the features for each class, calculates the difference between the first importance and the second importance in the correct class to determine the rank of the features in order of smallest difference, and can derive features of a predetermined rank or higher as perturbation outputs.
[0075] To elaborate, the feature extraction unit (120) can calculate the difference between the first importance and the second importance using the following [Equation 8].
[0076]
[0077] Here, the left-hand side of [Equation 8] is the difference between the first and second importance of the i-th feature, which is computed from the first importance of the i-th feature, the second importance of the i-th feature, and α, a preset weight less than 1.
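The ranking rule of the combination step (smallest difference first) can be sketched as follows. The exact form of [Equation 8] is not recoverable from the text, so the expression `abs(first - alpha * second)` is purely an assumed placeholder; `combination_ranking`, `top_k`, and the toy scores are likewise hypothetical.

```python
import numpy as np

def combination_ranking(first, second, alpha=0.5, top_k=2):
    """Rank features by the smallest weighted difference between the two importances."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    diff = np.abs(first - alpha * second)     # ASSUMED form of the difference
    order = np.argsort(diff, kind="stable")   # smallest difference first
    return diff, order[:top_k]

first = [0.85, 0.55, 0.25]    # toy first (insertion) importances
second = [0.70, 0.40, 0.10]   # toy second (deletion) importances
diff, selected = combination_ranking(first, second)
print(selected)               # [2 1]
```

Only the ordering logic carries over to other choices of the difference; swapping in a different expression for `diff` leaves the rest unchanged.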
[0078] Figure 8 is a graph using the most important features of each class derived using a conventional model and a perturbation algorithm according to an embodiment of the present invention.
[0079] As shown in Fig. 8, it can be seen that the combination step of the perturbation algorithm (FIA(ours)) of the present invention has the highest proportion of first place compared to conventional models (LIME, KernelSHAP, RISE, LIMESegment).
[0080] Furthermore, perturbation algorithms can be applied to and combined with conventional explainable algorithms (Explainable AI, XAI) (e.g., LIME (Local interpretable model-agnostic explanations), KernelSHAP, RISE (Randomized input sampling for explanation)).
[0081] Next, the second transformation unit (130) can derive predicted time series data by applying a pre-specified second transformation method (e.g., inverse short-time Fourier transform, ISTFT) to the perturbation output, and apply the derived predicted time series data to the target time series classification model to check whether the predicted classification result matches the input classification result.
[0082] Specifically, because the target time series classification model is trained on time series data (raw signal) in the time domain, the second transformation unit (130) derives predicted time series data by applying the pre-specified second transformation method to the perturbation output.
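The round trip from the time-frequency domain back to a raw signal, followed by the classification check, can be sketched as below. The overlap-add inverse with window-sum normalization is one standard ISTFT construction and is assumed here; `toy_classifier` is a hypothetical stand-in for the target model, and the Hann window, window size 16, and hop size 8 are illustrative choices.

```python
import numpy as np

def stft(x, w, hop):
    N = len(w)
    M = 1 + (len(x) - N) // hop
    frames = np.stack([x[m * hop : m * hop + N] * w for m in range(M)])
    return np.fft.fft(frames, axis=1)

def istft(S, w, hop):
    """Overlap-add inverse: sum the windowed frames, then divide by the summed window."""
    M, N = S.shape
    length = (M - 1) * hop + N
    y = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.ifft(S, axis=1).real      # row m is x[n + mH] * w[n]
    for m in range(M):
        y[m * hop : m * hop + N] += frames[m]
        norm[m * hop : m * hop + N] += w
    out = np.zeros(length)
    ok = norm > 1e-8                          # samples the window never covers stay 0
    out[ok] = y[ok] / norm[ok]
    return out

def toy_classifier(x):
    """Hypothetical model: class 1 if the upper half of the spectrum dominates, else 0."""
    spectrum = np.abs(np.fft.rfft(x))
    half = len(spectrum) // 2
    return int(spectrum[half:].sum() > spectrum[:half].sum())

w = np.hanning(16)
n = np.arange(128)
x = np.sin(2 * np.pi * 2 * n / 16)            # low-frequency toy signal
x_hat = istft(stft(x, w, hop=8), w, hop=8)
matches = toy_classifier(x_hat) == toy_classifier(x)
print(matches)                                # True
```

In the framework, the array passed to `istft` would be the perturbation output rather than the unmodified transform, and the comparison would be against the classification result of the original input.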
[0083] At this time, the second transformation unit (130) can apply a second transformation method specified in advance to the perturbation output by setting hyperparameters to satisfy a pre-set condition (e.g., an overlap-add condition (e.g., applying 70% overlap), a match rate condition (e.g., satisfying a match rate of 80% or more between the predicted classification result and the input classification result), etc.).
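The two example conditions can be made concrete with a short sketch. Both helper names (`hop_from_overlap`, `match_rate`) and the toy label lists are hypothetical, and rounding the hop size to the nearest sample is an assumption.

```python
def hop_from_overlap(window_size, overlap):
    """Hop size (in samples) that gives the requested fractional overlap."""
    return max(1, round(window_size * (1.0 - overlap)))

def match_rate(predicted, reference):
    """Fraction of samples whose predicted class equals the input class."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)

hop = hop_from_overlap(16, 0.7)                       # 70% overlap on a 16-sample window
rate = match_rate([0, 1, 1, 2, 0], [0, 1, 2, 2, 0])   # 4 of 5 labels agree
print(hop, rate)                                      # 5 0.8
print(rate >= 0.8)                                    # True
```

A hyperparameter search could accept a window and hop setting only when both conditions hold.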
[0084] According to one embodiment of the present invention, the second transformation unit (130) can set the window size and the hop size (which determines how much adjacent windows overlap) among the hyperparameters so as to satisfy a preset condition (e.g., being designated as an optimal size for the input frequency data).
[0085] Here, the present invention can set optimal hyperparameters based on the length of the samples using the UCR repository dataset. For example, in the UCR repository dataset, Twopatterns has the shortest samples, with 128 timesteps each, and CinCECGTorso has the longest, with 1,639 timesteps each. With a window size of 16 and a hop size of 8, the reliability for Twopatterns is significantly higher than that for CinCECGTorso because the amount of information contained in Twopatterns is greater. In addition, the second transformation unit (130) can provide an explanation of the classification result of the target time series classification model through a weighted sum, based on the perturbation output, of the class-specific classification probabilities included in the predicted classification result.
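To make the window and hop example concrete, the sketch below counts the time segments that a window size of 16 and a hop size of 8 produce for the two sample lengths, and the share of each sample that a single window spans. The helper name `n_segments` and the no-padding convention are assumptions; the sketch reports counts only and does not reproduce the reliability measurement.

```python
def n_segments(length, window_size, hop):
    """Number of complete windows that fit in a signal when no padding is applied."""
    return 1 + (length - window_size) // hop

window_size, hop = 16, 8
for name, length in [("Twopatterns", 128), ("CinCECGTorso", 1639)]:
    segments = n_segments(length, window_size, hop)
    share = window_size / length
    print(f"{name}: {segments} segments, one window spans {share:.2%} of the sample")
# Twopatterns: 15 segments, one window spans 12.50% of the sample
# CinCECGTorso: 203 segments, one window spans 0.98% of the sample
```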
[0086] According to the embodiments of the present invention described above, it is possible to provide a high-quality explanation of the decisions of a time series classification model by considering both the time domain and the frequency domain, and to provide a detailed explanation of why time series data is classified into a specific class by utilizing multiple features derived from the time domain and the frequency domain.
[0087] Furthermore, by applying it to the fields of finance, climate, science, and medicine, the reasons for the decisions of time series classification models can be derived in the time and frequency domains.
[0088] In addition, it is possible to derive an explanation for the determination of the time series classification model more quickly than conventional technology.
[0089] The present invention has been described with reference to the embodiments illustrated in the drawings, but this is merely illustrative, and those skilled in the art will understand that various modifications and equivalent alternative embodiments are possible therefrom. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the following claims.
[0090] [Explanation of reference numerals]
[0091] 100: Time-frequency-based framework for explaining decisions of a time series classification model
[0092] 110: First transformation unit
[0093] 120: Feature extraction unit
[0094] 130: Second transformation unit
Claims
1. A time-frequency-based framework for explaining a time series classification model, comprising: a first transformation unit that receives time series data from a user and extracts frequency data by applying a pre-specified first transformation method; a feature extraction unit that derives a perturbation output by applying, to a constructed perturbation algorithm, the frequency data, a target time series classification model, and a classification result derived by applying the time series data to the target time series classification model; and a second transformation unit that derives predicted time series data by applying a pre-specified second transformation method to the perturbation output, and applies the derived predicted time series data to the target time series classification model to verify whether the predicted classification result matches the input classification result.
2. The framework of claim 1, wherein the first transformation unit converts the time series data into frequency data in the frequency domain through the following mathematical formula: $$S[m,k] = \sum_{n=0}^{N-1} x[n+mH]\,w[n]\,e^{-j 2\pi k n / N}$$ Here, S is the frequency data converted to the frequency domain, M is the total number of time segments, K is the total number of frequency bins, N is the number of samples in the window, w[n] is the window function applied to the n-th sample within the windowed portion of the time series data, x[n+mH] is the signal sample in the m-th time segment, whose start is offset by m times the hop size (H), and $e^{-j 2\pi k n / N}$ is the Fourier transform kernel for discrete frequency k.
3. The framework of claim 1, wherein the feature extraction unit derives the perturbation output from the frequency data through a perturbation algorithm that performs any one of an insertion step, a deletion step, and a combination step.
4. The framework of claim 3, wherein the insertion step inserts a plurality of features into the frequency data, analyzes the classification probability of the frequency data with the inserted features for each class of the target time series classification model to measure a first importance of the features for each class, determines a ranking of the features in descending order of the first importance for classification into the correct class, and derives features of a predetermined rank or higher as the perturbation output.
5. The framework of claim 4, wherein the perturbation algorithm determines the ranking of the features, in descending order of the first importance for classification into the correct class, by repeatedly measuring the first importance while inserting the plurality of features into the frequency data in a plurality of different combinations.
6. The framework of claim 3, wherein the deletion step analyzes the degree to which the probability of the target time series classification model classifying the frequency data by class decreases when one or more features of the frequency data are deleted, measures a second importance of the features of the frequency data by class, determines a ranking of the features in descending order of the second importance for classification into the correct class, and derives features of a predetermined rank or higher as the perturbation output.
7. The framework of claim 6, wherein the perturbation algorithm deletes one or more features in the time-frequency domain and repeatedly measures the second importance for each class while varying the deleted features over a plurality of different combinations, thereby determining the ranking of the features in descending order of the second importance for classification into the correct class.
8. The framework of claim 3, wherein the combination step: analyzes the classification probability of the frequency data with inserted features for each class of the target time series classification model when a plurality of features are inserted into the frequency data, and measures the first importance of the features for each class; analyzes the degree to which the probability of the target time series classification model classifying the frequency data by class decreases when one or more features of the frequency data are deleted, and measures the second importance of the features for each class; and calculates the difference between the first importance and the second importance in the correct class, determines a ranking of the features in order of smallest difference, and derives features of a predetermined rank or higher as the perturbation output.
9. The framework of claim 1, wherein the first transformation unit derives the frequency data by applying a short-time Fourier transform (STFT) to the input time series data, and the second transformation unit derives the predicted time series data by applying an inverse short-time Fourier transform (ISTFT) to the perturbation output.
10. The framework of claim 1, wherein the second transformation unit applies the pre-specified second transformation method to the perturbation output by setting hyperparameters so as to satisfy a preset condition.
11. The framework of claim 1, wherein the second transformation unit provides an explanation of the classification result of the target time series classification model through a weighted sum, based on the perturbation output, of the class-specific classification probabilities included in the predicted classification result.