Anomaly detection using measurement time series and event sequences for medical decision-making

Anomaly detection using multivariate time series and event sequences improves predictive maintenance and health management by encoding feature vectors for timely corrective actions.

JP7876721B2 · Active · Publication Date: 2026-06-19 · NEC LABORATORIES AMERICA INC

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
NEC LABORATORIES AMERICA INC
Filing Date
2023-10-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing systems struggle to accurately predict anomalies in complex systems like cyber-physical and health management systems, leading to inefficiencies in preventive maintenance and health treatment adjustments.

Method used

Multivariate time series and event sequences are encoded by transformers and aggregation networks to generate feature vectors. Anomaly detection is performed on these feature vectors, and corrective actions are then taken to mitigate the detected anomalies.

Benefits of technology

Enhances the reliability and efficiency of anomaly detection, allowing for timely preventive maintenance and health event predictions, thereby improving system stability and patient safety.

✦ Generated by Eureka AI based on patent content.
Patent Text Reader

Abstract

The method and system for anomaly detection includes encoding (506) multivariate time series and multiple types of event sequences using respective transformers and aggregation networks to generate feature vectors. Anomaly detection is performed (508) using the feature vectors to identify anomalies in the system. Corrective actions are taken (510) in response to the anomalies to correct or mitigate the effects of the anomalies. The detected anomalies can be used in healthcare settings to assist medical professionals in making decisions regarding patient treatment. The encoding can include machine learning models for implementing the transformers and aggregation networks using deep learning.

Description

【Technical Field】

Related Application Information

【0001】 This application claims priority to U.S. Patent Application No. 63/418,999, filed on October 25, 2022, and U.S. Patent Application No. 18/493,374, filed on October 24, 2023, the entire contents of each of which are incorporated herein by reference.

【Background Art】

【0002】 The present invention relates to event prediction, and more particularly to inferring a system state from an event history and time series information.

Description of Related Art

【0003】 Event prediction is useful for managing complex systems. In cyber-physical systems such as information technology systems, hardware failures can be predicted and preventive maintenance can be planned. In health management systems, predicting harmful events allows doctors to adjust treatments early and prevent adverse effects on health.

【Summary of the Invention】

【0004】 Methods for anomaly detection include encoding a multivariate time series and multiple types of event sequences using respective transformers and an aggregation network to generate feature vectors. Anomaly detection is performed using the feature vectors to identify anomalies within the system. Corrective actions corresponding to the anomalies are performed to correct or mitigate the effects of the anomalies.

【0005】 An anomaly detection system includes a hardware processor and a memory that stores a computer program. When executed by the hardware processor, the computer program causes the hardware processor to encode a multivariate time series and multiple types of event sequences using respective transformers and an aggregation network to generate feature vectors. Anomaly detection is performed using these feature vectors to identify anomalies within the system. Corrective actions corresponding to the anomalies are then performed to correct or mitigate their effects.

【0006】 A treatment method includes measuring time-series information about a patient.
Using respective transformers and aggregation networks, the time-series information and health event sequences for the patient are encoded to generate feature vectors. Anomaly detection is performed using the feature vectors to identify health events. Corrective actions corresponding to the health events are taken to correct or mitigate the adverse health effects caused by the health events.

【0007】 These and other features and advantages will become apparent from the following detailed description of exemplary embodiments, which is to be read in conjunction with the accompanying drawings.

【Brief Description of the Drawings】

【0008】 This disclosure provides further details in the following description of preferred embodiments with reference to the following figures.

【0009】 [Figure 1] A diagram illustrating the configuration of a cyber-physical system that performs anomaly detection based on time-series information and event log information according to an embodiment of the present invention.

【0010】 [Figure 2] A diagram of a patient receiving treatment in response to a detected health anomaly according to an embodiment of the present invention.

【0011】 [Figure 3] A block diagram of a health management facility that detects anomalies using information regarding the functions of a treatment system and medical records according to an embodiment of the present invention.

【0012】 [Figure 4] A block/flow diagram of an anomaly detection method/system according to an embodiment of the present invention.

【0013】 [Figure 5] A block/flow diagram of a method for training and using a deep learning model for anomaly detection according to an embodiment of the present invention.

【0014】 [Figure 6] A block/flow diagram of a method for training a deep learning model for detecting anomalies according to an embodiment of the present invention.
【0015】 [Figure 7] A block diagram of an arithmetic unit that can train and use an event prediction model to detect and correct anomalies according to an embodiment of the present invention.

【0016】 [Figure 8] A diagram of an exemplary neural network architecture that can be used to implement a part of an anomaly prediction model according to an embodiment of the present invention.

【0017】 [Figure 9] A diagram of an exemplary deep neural network architecture that can be used to implement a part of an anomaly prediction model according to an embodiment of the present invention.

【Embodiments for Carrying Out the Invention】

【0018】 Event prediction uses past event information, but may also use information regarding a system state that is constantly changing. The system state can be inferred from current time-series information regarding system measurement values.

【0019】 Events and time series information have complex causal relationships. For example, an increasing trend in memory usage in a computer system is likely to ultimately cause a memory shortage event. Conversely, an increase in the disk queue length may be the result of the startup of a disk-access-intensive application. Event prediction that considers both time series information and event history yields better results than a system that considers only one of them.

【0020】 Measurement values from the system may be correlated with event logs, because many actions that cause changes in measurement values are recorded in the logs. For example, when a large-scale application is launched, processor usage is expected to increase rapidly. If no such activity is recorded, the increase may be abnormal. On the other hand, a sudden burst of network traffic events is not abnormal if it is correlated with an increase in the number of users. However, if the traffic measurement values are normal, the burst of events may suggest a hardware failure.
Furthermore, different measurement values may be correlated with each other because they measure the same underlying performance factors. For example, processor usage and memory usage both reflect the level of system activity.

【0021】 The interpretation of the system state can therefore make use of the context provided jointly by the time series of all measurement values and the event logs. To make the detection results reliable and explainable, the events responsible for a specific anomaly can be pinpointed.

【0022】 For this purpose, machine learning models can be used to predict the type and timing of anomalies given a history of previous events and one or more time-series measurements. Transformers and attention mechanisms are used to explicitly model the interaction between system events and measurements, encapsulating the interaction in a hidden state. A support vector data description (SVDD) loss can be used to characterize the encoded state of normal data and to detect whether the input multivariate sensor data deviates from a normal state.

【0023】 Referring now to Figure 1, a maintenance system 106 is shown in the context of a monitored system 102. The monitored system 102 can be any appropriate system, including physical systems such as manufacturing lines and physical plant operations, electronic systems such as computers and other computerized devices, software systems such as operating systems and applications, and cyber-physical systems that combine physical systems with electronic and/or software systems. Exemplary systems 102 include railway systems, power plants, vehicle sensors, data centers, and transportation systems.

【0024】 One or more sensors 104 record information about the state of the monitored system 102.
Sensors 104 can be any suitable type of sensor, including physical sensors such as temperature, humidity, vibration, pressure, voltage, current, magnetic field, electric field, and light sensors, as well as software sensors such as logging utilities installed on a computer system to record information about the state and operation of the operating system and applications running on the computer system. The information generated by sensors 104 can be in any suitable format and may include sensor log information generated in heterogeneous formats.

【0025】 Sensors 104 may transmit recorded sensor information to the maintenance system 106 via any suitable communication medium and protocol, including wireless and wired communication. The maintenance system 106 can identify anomalies or abnormal operation, for example, by monitoring the multivariate time series generated by sensors 104. When abnormal operation is detected, the maintenance system 106 communicates with a system control unit and corrects the abnormal operation by changing one or more parameters of the monitored system 102.

【0026】 Exemplary corrective actions include changing the security settings of an application or hardware component, changing the operating parameters (e.g., operating speed) of an application or hardware component, stopping and/or restarting an application, stopping and/or restarting a hardware component, changing environmental conditions, or changing the state or settings of a network interface. This allows the maintenance system 106 to automatically correct or mitigate abnormal behavior. Identifying the specific sensors 104 associated with an anomaly classification can reduce the time required to isolate the problem.

【0027】 Each of the sensors 104 outputs a time series that encodes the measurements taken by the sensor over time.
For example, a time series may include pairs of information, each pair containing a measurement and a timestamp representing the time at which the measurement was taken. Each time series may be divided into segments that represent the measurements taken by the sensor over a specific time range. The time series segments may represent any appropriate interval, such as one second, one minute, one hour, or one day. Alternatively, the time series segments may represent a set number of collection points, such as 100 measurements, rather than a fixed period.

【0028】 The maintenance system 106 can track the occurrence of events related to the state of the monitored system 102. For example, the maintenance system 106 can receive information related to workload, job start and stop, and failures. This information may be recorded together with appropriate timestamps and other information related to the state of the monitored system 102 at the time the event occurred, including, for example, information collected by sensors 104. While time-series information may be recorded periodically, or aperiodically but frequently, event information may be recorded each time a discrete event occurs.

【0029】 The maintenance system 106 may use event prediction 108 to predict when future events are likely to occur. The event prediction uses both historical event information and time-series information to determine when an event is likely to occur. The maintenance system 106 can further determine which information sources contribute most to the prediction, thereby aligning the corrective actions taken by the maintenance system 106 with the root cause of the problem.

【0030】 Referring to Figure 2, a patient 202 is shown in the context of a health management system. For example, patient 202 may be undergoing hemodialysis (also simply referred to as "dialysis"). During dialysis, the dialysis machine 204 automatically draws the patient's blood, processes and purifies it, and reintroduces the purified blood into the patient's body.
Dialysis can take up to four hours and may be performed every three days, but other intervals and durations are also possible. While dialysis is specifically envisioned, it should be understood that other appropriate medical procedures, monitoring, or systems may be used instead.

【0031】 Before, during, and after dialysis sessions, patient 202 may experience treatment-related health events. While such events may be dangerous for patient 202, they can be predicted based on knowledge of previous health events and the patient's current health measurements. Before patient 202 undergoes a medical procedure or treatment, such as dialysis, healthcare professionals 206 should consider recommendations 208, including a predictive score. This predictive score indicates the likelihood of health events occurring during the dialysis session. Recommendations 208 may further include information related to the type of event expected and measurements of the patient's condition. These recommendations are particularly intended to be made before the start of a dialysis session, allowing for adjustments to the treatment.

【0032】 Recommendations may be based on various input information. Some of this information may include a patient's static profile, such as age, sex, dialysis initiation date, and previous health events. This information may also include dynamic data such as dialysis measurement records, blood pressure, weight, venous pressure, blood test measurements, and cardiothoracic ratio (CTR), which may be collected at each dialysis session. Blood test measurements may be performed regularly, for example twice a month, and may measure factors such as albumin, glucose, and platelet count. CTR may also be measured regularly, for example once a month. Dynamic information may also be recorded during dialysis sessions, for example using sensors on the dialysis machine 204. Dynamic information may be modeled as time series at their respective measurement frequencies.
【0033】 Furthermore, the system itself may be monitored within the healthcare environment. For example, the operating parameters of the dialysis machine 204 or other systems within the hospital or other healthcare facility, along with a history of past events in the system, may be monitored to detect anomalies, as described below. If an anomaly is detected, corrective actions may be taken and/or the system administrator may be notified.

【0034】 Referring next to Figure 3, anomaly detection is shown in the context of a healthcare facility 300. Instead of, or in addition to, detecting anomalies in the treatment of a single patient, the present principles may be applied to all systems within the facility. This can be useful for monitoring and treating multiple patients, for example, in response to changes in environmental conditions or shortages of materials. Such facilities may also be vulnerable to cyberattacks, and detecting anomalies in such situations can help identify and prevent attacks, thereby maintaining the facility's ability to treat patients.

【0035】 The healthcare facility may include one or more healthcare professionals 302 who provide the anomaly detection system 308 with information about anomalies and measurements of the system status. The treatment system 304 may further be designed to monitor the patient's condition, create medical records 306, and automatically administer and adjust treatment as needed.

【0036】 Based on information drawn from at least the medical professionals 302, the treatment systems 304, and the medical records 306, the anomaly detection system 308 can detect anomalies and automatically respond to correct or mitigate them. For example, corrective actions may be taken and/or the facility administrator may be notified. If the administrator is notified, the anomaly detection can be used to assist the hospital administrator in making decisions.
【0037】 Different elements of the health management facility 300 can communicate with each other via the network 310, for example, using any suitable wired or wireless communication protocol and medium. Thus, the anomaly detection system 308 can access medical records 306 stored remotely, communicate with the treatment system 304, receive instructions from medical professionals 302, and send reports to them.

【0038】 Referring next to Figure 4, anomaly detection using time series information and event sequence information is shown. The input may include, for example, a multivariate time series 402 that combines multiple time series from different sensors. The input may further include a multi-type event sequence 404 that can indicate the type of each event, the time the event occurred, and other appropriate information related to the event. The multi-type event sequence may include events of multiple different types.

【0039】 Encoder 406 embeds the inputs into a latent space and generates a set of features representing the combined inputs 402 and 404. Each input generates its own feature vector, and these vectors may be aggregated by the aggregation network 408 to create a context vector. Encoder 406 may include a transformer having a stack of self-attention and cross-attention layers that combine information between different time steps from each sequence. The hidden states from each time step of each sequence are then concatenated and passed to the aggregation network 408. The aggregation network 408 may include a self-attention layer that combines information from all time steps of both streams to output a context vector.

【0040】 The context vector may be used as input to the SVDD loss function 410. Feature vectors from all training data may also be used to compute the SVDD loss 410, which is interpreted as the radius of the smallest hypersphere encompassing all training data in the latent feature space. The output of the SVDD loss function 410 may be used to detect anomalies 414.
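The SVDD scoring described for Figure 4 can be illustrated with a small sketch. This is not the patented implementation: it assumes the common one-class formulation in which the hypersphere center is the mean of the training feature vectors, the loss is the mean squared distance to that center plus an L2 penalty, and the anomaly score of a new context vector is its squared distance from the center. The function names, the toy vectors, and the flat `params` list are hypothetical stand-ins for the output and weights of encoder 406 and aggregation network 408.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def svdd_loss(features, center, params, lam):
    """Mean squared distance to the center plus an L2 penalty on the parameters."""
    data_term = sum(squared_distance(f, center) for f in features) / len(features)
    reg_term = 0.5 * lam * sum(p * p for p in params)
    return data_term + reg_term


def anomaly_score(feature, center):
    """Score a new context vector by its squared distance from the center."""
    return squared_distance(feature, center)


# Toy context vectors standing in for the output of the aggregation network.
normal_features = [[0.0, 0.0], [2.0, 0.0]]
center = mean_vector(normal_features)            # [1.0, 0.0]
loss = svdd_loss(normal_features, center, params=[1.0, 2.0], lam=0.1)
score = anomaly_score([4.0, 4.0], center)        # 3**2 + 4**2 = 25.0
```

In this sketch a vector far from the normal cluster receives a large score, which is the property the detection step relies on.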
【0041】 Referring next to Figure 5, a method for detecting and addressing anomalies is shown. Block 502 acquires a set of training data, which may include system measurement time series and timestamped event sequences, and trains a machine learning model to detect anomalies, as will be described in more detail below. The training process 502 can use multivariate time series and event logs collected during the normal operating state of the system, and this information can be used to optimize the model according to the SVDD loss 410. In block 504, the trained model is deployed to a system such as the maintenance system 106 of a cyber-physical system or a health management analysis system.

【0042】 In block 506, new data including event information and multivariate time-series information may be collected from the system. This new data represents the operating state of the system and its specific event history. In block 508, the trained model is used to generate an anomaly score related to the system's operation. For example, an anomaly score that exceeds a threshold is interpreted as an abnormal operating state. If the anomaly score is higher than the threshold, block 508 may further generate a ranked list of system events that are potential causes of the anomaly. For example, this event prediction may relate to anticipated system malfunctions or harmful health events.

【0043】 Next, block 510 takes action to prevent or mitigate damage caused by the predicted event. In the case of the cyber-physical system 102, the action may include an automated response directed at one or more subsystems that are expected to be related to the predicted event. For example, the action may include changing environmental parameters to prevent overheating, or shutting down subsystems to prevent damage.
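Blocks 508 and 510 can be sketched as a threshold test followed by a ranking of candidate causes and a lookup of corrective actions. The sketch below is illustrative only: it assumes each event in the context window comes with an attention weight that can be used for ranking, and the event names, the `CORRECTIVE_ACTIONS` table, and the fallback action are hypothetical.

```python
# Hypothetical mapping from event type to a corrective action.
CORRECTIVE_ACTIONS = {
    "overheating": "adjust_cooling",
    "disk_failure": "shut_down_subsystem",
}


def rank_causes(events, attention_weights, k=3):
    """Return the k events with the highest attention weights, best first."""
    ranked = sorted(zip(events, attention_weights),
                    key=lambda pair: pair[1], reverse=True)
    return [event for event, _ in ranked[:k]]


def handle_score(score, threshold, events, attention_weights, k=3):
    """Flag an anomaly when the score exceeds the threshold and pick actions."""
    if score <= threshold:
        return {"anomalous": False, "causes": [], "actions": []}
    causes = rank_causes(events, attention_weights, k)
    # Unknown event types fall back to notifying an administrator.
    actions = [CORRECTIVE_ACTIONS.get(c, "notify_administrator") for c in causes]
    return {"anomalous": True, "causes": causes, "actions": actions}


result = handle_score(
    score=0.9,
    threshold=0.5,
    events=["job_start", "overheating", "disk_failure"],
    attention_weights=[0.1, 0.6, 0.3],
    k=2,
)
```

Keeping the ranking separate from the action lookup means the ranked list can still be reported to an operator when no automated action is configured.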
【0044】 In the case of patient 202, if the detected anomaly is an adverse health event, the response may include automated adjustments to treatment, such as adjusting the operation of the dialysis machine 204, adjusting the dosage of intravenously administered drugs, or stopping any procedure deemed dangerous.

【0045】 Referring next to Figure 6, the details of training 502 are shown. Block 602 obtains a set of training data, which includes a synchronized system measurement time series and a timestamped system event sequence. Block 604 parses the event messages in the system event sequence, identifying the event type from each message, for example, by representing each different type of event as a different integer value.

【0046】 The training data can be divided into context windows, for example, using overlapping windows of a fixed length. A training sample is obtained from each time window and includes the time-series segment $x_i$ within the window, the event subsequence $v_i$ within the window, and the event that immediately follows the window (of type $u_i$ at time $t_i$).

【0047】 Training can be performed in two phases. In the first phase, autoencoder training 606 can be performed, as will be described in more detail below. Autoencoder training 606 trains an encoder to determine the hidden state of the input time series and uses a decoder to reconstruct the input time series. The loss of the autoencoder is used to update the autoencoder parameters, for example, using stochastic gradient descent. The encoder portion of the autoencoder can be used as encoder 406.

【0048】 In the second phase, the entire model is trained using the SVDD loss, as will be described in more detail below. The encoder 406 and the aggregation network 408 are used to generate feature vectors for the training examples, and the SVDD loss is calculated.
Based on the SVDD loss, the parameters of the encoder 406 and the aggregation network 408 may be adjusted, for example, using stochastic gradient descent.

【0049】 The multivariate time series is first processed by a one-dimensional convolutional layer, and the result at each timestamp is concatenated with the corresponding time embedding vector before being input to encoder 406. The event sequence may first be parsed by a log parser to decompose each event message into a template and parameters. For example, the message "ESMCommonService has transitioned to the stopped state" may be converted into the template "[*] has transitioned to the stopped state" and the parameter "ESMCommonService". A template embedding layer and a parameter embedding layer are learned to convert the template type and the parameters, respectively, into vectors. For each event in the sequence, the template embedding vector, parameter embedding vector, and time embedding vector are concatenated and used as input to the transformer encoder.

【0050】 The aggregation network 408 is a stack of self-attention layers. The hidden state of the time-series transformer encoder at the last time step may be used to calculate a latent vector, which is used as the condition and initial hidden state for a gated recurrent unit (GRU) decoder. The GRU decoder outputs a time series of the same length as the input time series. Encoder 406 and the decoder may be trained together in block 606 to minimize the autoencoder error between the input time series and the decoder output.

【0051】 In the second training phase 608, the feature vectors $z_i = \phi(x_i, v_i; W)$ from the aggregation network 408 are used to calculate the SVDD loss

$$\mathcal{L}(W) = \frac{1}{n}\sum_{i=1}^{n} \lVert \phi(x_i, v_i; W) - c \rVert^2 + \frac{\lambda}{2} \lVert W \rVert_F^2 .$$

Here, $\phi$ is the encoder network, which includes the transformers and the aggregation network, $W$ denotes the parameters of the neural network, $c$ is the center of a hypersphere in the feature space, and $\lambda$ is a hyperparameter. A batch is randomly sampled from the training dataset, and the mean of the resulting feature vectors is used as the center $c$. The first term of the loss is the radius of a hypersphere that encloses all $n$ training data in the feature space. The second term regularizes the magnitude of the network parameters. The loss can be minimized by adjusting the parameters $W$, for example, using stochastic gradient descent.

【0052】 After the model is trained and deployed, at time t, the events and time series within a fixed-size context window preceding t may be input to the model. The model outputs an anomaly score, for example, by calculating the SVDD loss. When the anomaly score exceeds a threshold, the system events with the highest attention weights (e.g., the top k weights) may be output as potential causes.

【0053】 Referring next to Figure 7, an exemplary arithmetic unit 700 according to an embodiment of the present invention is shown. The arithmetic unit 700 is configured to perform anomaly detection.

【0054】 The arithmetic unit 700 can be embodied as any type of computing or computer device capable of performing the functions described herein, including but not limited to computers, servers, rack-based servers, blade servers, workstations, desktop computers, laptop computers, notebook computers, tablet computers, mobile computing devices, wearable computing devices, network devices, web devices, distributed computing systems, processor-based systems, and/or consumer electronic devices. Additionally or alternatively, the arithmetic unit 700 may be embodied as one or more compute sleds, memory sleds, or other racks, sleds, computing chassis, or other components of a physically disaggregated computing device.
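The log-parsing step described earlier, in which each event message is split into a template and parameters and each event type is represented as an integer, can be sketched as follows. This is a minimal illustration, not the log parser used in the embodiment: it assumes the set of templates is already known, it writes each template with a space after the `[*]` wildcard so that the regular expression matches the example message, and the second template and all function names are hypothetical.

```python
import re

# Known templates; the list index serves as the integer event type.
# The second entry is a made-up example added for illustration.
TEMPLATES = [
    "[*] has transitioned to the stopped state",
    "[*] has transitioned to the running state",
]


def template_to_regex(template):
    """Turn a template with "[*]" wildcards into a regex with one group per wildcard."""
    literal_parts = [re.escape(part) for part in template.split("[*]")]
    return re.compile("^" + r"(\S+)".join(literal_parts) + "$")


def parse_event(message, templates=TEMPLATES):
    """Return (template_id, parameters) for the first matching template."""
    for template_id, template in enumerate(templates):
        match = template_to_regex(template).match(message)
        if match:
            return template_id, list(match.groups())
    return None, []  # message does not fit any known template


template_id, params = parse_event(
    "ESMCommonService has transitioned to the stopped state"
)
```

The resulting integer id and parameter strings are the kind of inputs that a template embedding layer and a parameter embedding layer could then map to vectors.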
【0055】 As shown in Figure 7, the arithmetic unit 700 illustratively includes a processor 710, an input/output subsystem 720, memory 730, a data storage device 740, and a communication subsystem 750, and/or other components and devices commonly found in a server or similar arithmetic unit. In other embodiments, the arithmetic unit 700 may include other or additional components (e.g., various input/output devices) commonly found in a server computer. Furthermore, in some embodiments, one or more of the illustrative components may be incorporated into another component or form part of another component. For example, memory 730, or part thereof, may be incorporated into the processor 710 in some embodiments.

【0056】 The processor 710 can be embodied as any type of processor capable of performing the functions described herein. The processor 710 may be embodied as a single processor, a multiprocessor, a central processing unit (CPU), a graphics processing unit (GPU), a single or multicore processor, a digital signal processor, a microcontroller, or other processor or processing/control circuit.

【0057】 Memory 730 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. During operation, memory 730 may store various data and software used during the operation of the arithmetic unit 700, such as operating systems, applications, programs, libraries, and drivers. Memory 730 may be communicatively coupled to the processor 710 via the I/O subsystem 720, which may be embodied as circuits and/or components to facilitate input/output operations with the processor 710, memory 730, and other components of the arithmetic unit 700.
For example, the I/O subsystem 720 may be embodied as, or otherwise include, a memory controller hub, an input/output control hub, a platform controller hub, an integrated control circuit, a firmware device, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.), and/or other components and subsystems to facilitate input/output operations. In some embodiments, the I/O subsystem 720 may form part of a system-on-a-chip (SOC) and be integrated into a single integrated circuit chip together with the processor 710, memory 730, and other components of the arithmetic unit 700.

【0058】 The data storage device 740 can be embodied as any type of device or apparatus configured for short-term or long-term storage of data, such as a memory device and circuit, a memory card, a hard disk drive, a solid-state drive, or other data storage device. The data storage device 740 can store program code 740A for training a model, program code 740B for detecting anomalies, and/or program code 740C for performing corrective actions in response to detected anomalies. The communication subsystem 750 of the arithmetic unit 700 can be embodied as any network interface controller or other communication circuit, device, or collection thereof that can enable communication between the arithmetic unit 700 and other remote devices over a network. The communication subsystem 750 can be configured to achieve such communication using any one or more communication technologies (e.g., wired or wireless) and associated protocols (e.g., Ethernet, InfiniBand®, Bluetooth®, Wi-Fi®, WiMAX®, etc.).

【0059】 As shown in the figure, the arithmetic unit 700 may also include one or more peripheral devices 760. The peripheral devices 760 may include any number of additional input/output devices, interface devices, and/or other peripheral devices.
For example, in some embodiments, the peripheral devices 760 may include a display, a touchscreen, a graphics circuit, a keyboard, a mouse, a speaker system, a microphone, a network interface, and/or other input/output devices, interface devices, and/or peripheral devices.

【0060】 Of course, the arithmetic unit 700 may include other elements (not shown) and may omit certain elements, as can be easily conceived by those skilled in the art. For example, various other sensors, input devices, and/or output devices may be included in the arithmetic unit 700, depending on the specific implementation, as can be easily understood by those skilled in the art. For example, various types of wireless and/or wired input and/or output devices may be used. Furthermore, additional processors, controllers, memory, and so on may be used in various configurations. These and other variations of the arithmetic unit 700 can be easily conceived by those skilled in the art, given the teachings of the invention provided herein.

【0061】 Referring next to Figures 8 and 9, exemplary neural network architectures are shown that can be used to implement parts of the present model, such as encoder 406. A neural network is a generalized system whose functionality and accuracy improve with exposure to additional empirical data. A neural network is trained by exposure to empirical data. During training, the neural network stores and adjusts multiple weights that are applied to the input empirical data. By applying the adjusted weights to the data, the network can identify whether the data belongs to a predefined class from a set of classes, or output the probability that the input data belongs to each class.

【0062】 The empirical data obtained from a series of examples (also called training data) is formatted as a string of values and fed into the neural network. Each example is associated with a known result or output.
Each example can be represented as a pair (x, y), where x is the input data and y is the known output. The input data can be of various data types and may contain multiple different values. The network can have one input node for each value that makes up the example's input data, and each input value can be assigned a separate weight. The input data can be formatted as a vector, array, or string, for example, depending on the architecture of the neural network being built and trained. 【0063】 A neural network "learns" by comparing the output it generates from the input data with the known values from the examples, and adjusting the stored weights to minimize the difference between the output and the known values. This adjustment can be performed on the stored weights through backpropagation: the effect of each weight on the output is determined by calculating a mathematical gradient, and the weights are adjusted in the direction that shifts the output toward the minimum difference. This optimization, called gradient descent, is a non-limiting example of how training takes place. A subset of examples with known values that were not used in training can be used to test and validate the accuracy of the neural network. 【0064】 During operation, the trained neural network can be applied, through generalization, to new data that was not previously used for training or validation. The tuned weights of the neural network can be applied to the new data, and the weights estimate a function developed from the training examples. The parameters of the estimated function captured by the weights are based on statistical inference. 【0065】 In a layered neural network, nodes are arranged in layers. An exemplary simple neural network has an input layer 820 with source nodes 822 and a single computation layer 830 with one or more computation nodes 832 that also function as output nodes, with one computation node 832 for each possible category into which the input example can be classified. 
The input layer 820 can have a number of source nodes 822 equal to the number of data values 812 of the input data 810. The data values 812 of the input data 810 can be represented as a column vector. Each computation node 832 in the computation layer 830 generates a linear combination of weighted values from the input data 810 supplied to the input layer 820 and applies a differentiable nonlinear activation function to the sum. The exemplary simple neural network can perform classification on linearly separable examples (e.g., patterns). 【0066】 Deep neural networks, such as multilayer perceptrons, can have an input layer 820 with source nodes 822, one or more computation layers 830 with one or more computation nodes 832, and an output layer 840 with one output node 842 for each category into which the input example may be classified. The input layer 820 can have a number of source nodes 822 equal to the number of data values 812 of the input data 810. The computation nodes 832 of the computation layers 830 lie between the source nodes 822 and the output nodes 842 and are not directly observed; the computation layers are therefore also called hidden layers. Each node 832, 842 generates a linear combination of weighted values from the output values of the nodes of the previous layer and applies a nonlinear activation function that is differentiable over the range of the linear combination. The weights applied to the values from each previous node can be denoted, for example, by w_1, w_2, ..., w_{n-1}, w_n. The output layer provides the network's overall response to the input data. Deep neural networks can be fully connected, where each node in a computation layer is connected to all nodes in the previous layer, or the connections between layers can be in other configurations. If there are missing links between nodes, the network is said to be partially connected. 
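The layer computation and the gradient-descent weight adjustment described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the sigmoid activation, the cross-entropy-style update, the learning rate, the toy data, and the function names (`layer_forward`, `mlp_forward`, `train_single_node`) are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Differentiable nonlinear activation (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One computation layer: each node forms a weighted sum of the
    previous layer's outputs and applies the activation to it."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Forward phase: propagate the input through every layer with the
    weights held fixed. `layers` is a list of (weights, biases) pairs."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


def train_single_node(examples, learning_rate=0.5, epochs=2000):
    """Gradient descent for a single computation node on (x, y) pairs.
    Uses the cross-entropy gradient (output - y), an assumed loss."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            output = layer_forward(x, [weights], [bias])[0]
            error = output - y  # gradient with respect to the weighted sum
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Linearly separable toy data (logical OR), purely illustrative.
or_examples = [([0.0, 0.0], 0), ([0.0, 1.0], 1),
               ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
trained_weights, trained_bias = train_single_node(or_examples)
predictions = [
    round(layer_forward(x, [trained_weights], [trained_bias])[0])
    for x, _ in or_examples
]
print(predictions)
```

A single node suffices here only because the toy data is linearly separable; data that is not would need at least one hidden layer, with the error propagated back through `mlp_forward`-style layers.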
【0067】 Training a deep neural network involves two phases: a forward phase in which the weights of each node are fixed and the input is propagated through the network, and a backward phase in which error values are propagated back through the network and the weight values are updated. 【0068】 The computation nodes 832 in the one or more computation (hidden) layers 830 perform a nonlinear transformation on the input data 812 that generates a feature space. Classes and categories may be easier to separate in the feature space than in the original data space. 【0069】 The embodiments described herein may be entirely hardware, entirely software, or may include both hardware and software elements. In preferred embodiments, the present invention is implemented in software including, but not limited to, firmware, resident software, and microcode. 【0070】 Embodiments may include computer program products accessible from computer-usable or computer-readable media that provide program code for use by or in connection with a computer or any instruction execution system. Computer-usable or computer-readable media may include any device that stores, communicates, propagates, or transports programs for use by or in connection with an instruction execution system, apparatus, or device. The medium may be a magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device), or a propagation medium. The medium may include computer-readable storage media such as semiconductor or solid-state memory, magnetic tape, removable computer diskettes, random-access memory (RAM), read-only memory (ROM), rigid magnetic disks, and optical disks. 
【0071】 Each computer program can be tangibly stored in a machine-readable storage medium or device (e.g., program memory or magnetic disk) that is readable by a general-purpose or special-purpose programmable computer, so as to configure and control the operation of the computer when the storage medium or device is read by the computer to perform the procedures described herein. The system of the present invention can also be considered to be implemented on a computer-readable storage medium configured with a computer program, in which case the configured storage medium causes the computer to operate in a specific predetermined manner to perform the functions described herein. 【0072】 A data processing system suitable for storing and/or executing program code may include at least one processor directly or indirectly coupled to memory elements via a system bus. The memory elements may include local memory used during actual execution of the program code, bulk storage, and cache memory that provides temporary storage for at least some of the program code to reduce the number of times the code is retrieved from bulk storage during execution. Input/output or I/O devices (including, but not limited to, keyboards, displays, pointing devices, etc.) may be coupled to the system directly or via an intermediary I/O controller. 【0073】 Network adapters can also be coupled to the system to enable the data processing system to connect to other data processing systems, remote printers, or storage devices via intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the types of network adapters currently available. 【0074】 As used herein, the terms “hardware processor subsystem” or “hardware processor” may refer to a processor, memory, software, or combination thereof that works together to perform one or more specific tasks. 
In useful embodiments, a hardware processor subsystem may include one or more data processing elements (e.g., logic circuits, processing circuits, instruction execution devices, etc.). The one or more data processing elements may be included in a central processing unit, a graphics processing unit, and/or a separate processor-based or computing-element-based controller (e.g., logic gates, etc.). A hardware processor subsystem may include one or more onboard memories (e.g., caches, dedicated memory arrays, read-only memory, etc.). In some embodiments, a hardware processor subsystem may include one or more memories (e.g., ROM, RAM, Basic Input/Output System (BIOS), etc.) that may be onboard or offboard, or that may be dedicated for use by the hardware processor subsystem. 【0075】 In some embodiments, a hardware processor subsystem may include and execute one or more software elements. These software elements may include an operating system and/or one or more applications and/or specific code to achieve a specified result. 【0076】 In other embodiments, the hardware processor subsystem may include dedicated circuits that perform one or more electronic processing functions to achieve a specified result. Such circuits may include one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or programmable logic arrays (PLAs). 【0077】 These and other variations of the hardware processor subsystem are also contemplated in accordance with embodiments of the present invention. 【0078】 In this specification, any reference to “one embodiment” or “an embodiment” of the present invention, and to other variations thereof, means that a particular feature, structure, characteristic, etc., described in connection with the embodiment is included in at least one embodiment of the present invention. 
Therefore, expressions such as “in one embodiment” or “in an embodiment” appearing in various places throughout this specification, and any other variations thereof, do not necessarily all refer to the same embodiment. However, it should be understood that, given the teachings of the present invention provided herein, features of one or more embodiments can be combined. 【0079】 The use of any of "/", "and/or", and "at least one of", for example in the cases of "A/B", "A and/or B", and "at least one of A and B", is intended to encompass the selection of only the first listed option (A), only the second listed option (B), or both options (A and B). As further examples, in the cases of "A, B, and/or C" and "at least one of A, B, and C", such expressions are intended to encompass the selection of only the first listed option (A), only the second listed option (B), only the third listed option (C), only the first and second listed options (A and B), only the first and third listed options (A and C), only the second and third listed options (B and C), or all three options (A, B, and C). This can be extended to as many items as are listed. 【0080】 The foregoing is to be understood in all respects to be illustrative and not restrictive, and the scope of the invention disclosed herein is to be determined not from the detailed description but from the claims as interpreted in accordance with the full breadth permitted by patent law. The embodiments shown and described herein are merely illustrative of the invention, and those skilled in the art should understand that various modifications can be implemented without departing from the scope and spirit of the invention. Those skilled in the art can implement various other combinations of features without departing from the scope and spirit of the invention. 
Thus, while aspects of the invention have been described with the detail and specificity required by patent law, what is claimed and intended to be protected by the patent is as stated in the appended claims.

Claims

[Claim 1] A computer-implemented method for detecting anomalies, comprising: encoding (506) a multivariate time series and multiple types of event sequences using respective transformers and aggregation networks to generate feature vectors; performing (508) anomaly detection using the feature vectors to identify an anomaly within a cyber-physical system; and performing a corrective action to correct or mitigate the effects of the anomaly, the corrective action including changing environmental parameters to prevent overheating or shutting down a subsystem to prevent damage.

[Claim 2] The method according to claim 1, wherein performing anomaly detection uses a support vector data description that includes a hypersphere radius term and a network parameter regularization term.

[Claim 3] The method according to claim 2, wherein the hypersphere radius term represents the radius of a hypersphere that encloses the input multivariate time series data in a feature space.

[Claim 4] The method according to claim 1, wherein performing anomaly detection includes determining an anomaly score, comparing the anomaly score with a threshold, and indicating an anomaly if the anomaly score exceeds the threshold.

[Claim 5] The method according to claim 1, wherein the aggregation network includes a stack of self-attention layers that convert the outputs of the respective transformers into the feature vector.

[Claim 6] The method according to claim 1, further comprising determining a ranked list of past events and time-series measurements that have the greatest impact on the anomaly.

[Claim 7] The method according to claim 6, wherein the ranked list is determined according to attention weights from the aggregation network.

[Claim 8] The method according to claim 1, wherein the transformers and the aggregation network are trained using deep learning with a set of training data that includes synchronized time-series information and timestamped event sequences.

[Claim 9] The method according to claim 1, further comprising reporting the detected anomaly to a medical professional in order to support medical decision-making.

[Claim 10] The method according to claim 1, wherein the corrective action includes an action selected from the group consisting of changing the security settings of an application or hardware component, changing the operating parameters of an application or hardware component, stopping and/or restarting an application, stopping and/or restarting a hardware component, changing environmental conditions, and changing the state or settings of a network interface.

[Claim 11] A system for detecting anomalies, comprising: a hardware processor (710); and a memory (740) that stores a computer program which, when executed by the hardware processor, causes the hardware processor to: encode (506) a multivariate time series and multiple types of event sequences using respective transformers and aggregation networks to generate feature vectors; perform (508) anomaly detection using the feature vectors to identify an anomaly within a cyber-physical system; and perform a corrective action to correct or mitigate the effects of the anomaly, the corrective action including changing environmental parameters to prevent overheating or shutting down a subsystem to prevent damage.

[Claim 12] The system according to claim 11, wherein the computer program causes the hardware processor to use, for anomaly detection, a support vector data description that includes a hypersphere radius term and a network parameter regularization term.

[Claim 13] The system according to claim 12, wherein the hypersphere radius term represents the radius of a hypersphere that encloses the input multivariate time series data in a feature space.

[Claim 14] The system according to claim 11, wherein the computer program further causes the hardware processor to determine an anomaly score, compare the anomaly score with a threshold, and indicate an anomaly if the anomaly score exceeds the threshold.

[Claim 15] The system according to claim 11, wherein the aggregation network includes a stack of self-attention layers that convert the outputs of the respective transformers into the feature vector.

[Claim 16] The system according to claim 11, wherein the computer program further causes the hardware processor to determine a ranked list of past events and time-series measurements that have the greatest impact on the anomaly.

[Claim 17] The system according to claim 16, wherein the ranked list is determined according to attention weights from the aggregation network.

[Claim 18] The system according to claim 11, wherein the corrective action includes an action selected from the group consisting of changing the security settings of an application or hardware component, changing the operating parameters of an application or hardware component, stopping and/or restarting an application, stopping and/or restarting a hardware component, changing environmental conditions, and changing the state or settings of a network interface.
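
The scoring, thresholding, and ranking steps recited in claims 2 to 4, 6, and 7 (and their system counterparts) can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the soft-boundary objective follows one commonly published form of deep support vector data description, and the hypersphere center `center`, the trade-off parameters `nu` and `lam`, the zero threshold, the sample names, and all function names are hypothetical.

```python
def squared_distance(feature, center):
    """Squared Euclidean distance between a feature vector and the center."""
    return sum((f - c) ** 2 for f, c in zip(feature, center))


def anomaly_score(feature, center, radius):
    """Positive when the feature vector lies outside the hypersphere."""
    return squared_distance(feature, center) - radius ** 2


def is_anomaly(score, threshold=0.0):
    """Indicate an anomaly if the score exceeds the threshold."""
    return score > threshold


def soft_boundary_objective(features, center, radius, nu, parameters, lam):
    """Assumed training objective: a hypersphere radius term, a penalty
    for points outside the sphere, and a network parameter
    regularization term (`parameters` is a flat list of weights)."""
    radius_term = radius ** 2
    slack = sum(max(0.0, anomaly_score(f, center, radius)) for f in features)
    slack_term = slack / (nu * len(features))
    regularization_term = 0.5 * lam * sum(p ** 2 for p in parameters)
    return radius_term + slack_term + regularization_term


def rank_by_attention(names, attention_weights, top_k=3):
    """Rank inputs (events or measurements) by attention weight,
    highest first, and keep the top_k entries."""
    ranked = sorted(zip(names, attention_weights),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical feature vectors and attention weights for illustration.
center, radius = [0.0, 0.0], 4.0
outside = anomaly_score([3.0, 4.0], center, radius)
inside = anomaly_score([1.0, 1.0], center, radius)
print(is_anomaly(outside), is_anomaly(inside))
print(rank_by_attention(["temperature", "alarm_event", "pressure"],
                        [0.2, 0.7, 0.1], top_k=2))
```

In practice the feature vectors would come from the encoder and the attention weights from the aggregation network; here both are hard-coded so that the sketch stays self-contained.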
