Exploring automatic neural network architecture construction with Bayesian networks
By using the AutoBayes framework for automatic Bayesian inference, the architecture of artificial neural networks is optimized, solving the problem of search space explosion caused by manual design in existing technologies. This achieves perturbation-robust neural network construction, improving learning efficiency and adaptability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- MITSUBISHI ELECTRIC CORP
- Filing Date
- 2021-02-26
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies require extensive manual design and trial-and-error methods to optimize the architecture when constructing artificial neural networks, leading to an explosion in the search space and excessively long exploration time, making it difficult to achieve perturbation-invariant machine learning.
We introduce the AutoBayes automatic Bayesian inference framework, which automatically constructs artificial neural networks through Bayesian graph exploration. By utilizing the Bayes-Ball algorithm and ordered factorization, we optimize the connections of the classifier, encoder, decoder, and adversarial network blocks to achieve learning that is robust against perturbations.
It automatically constructs compact architectures, improves the performance and adaptability of neural networks, reduces sensitivity to perturbation changes, and is applicable to datasets in different domains.
Figure CN115769228B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to an automated construction system for artificial neural networks, and more specifically, to an automated construction system for artificial neural networks using Bayesian graph exploration.
Background Technology
[0002] The tremendous advancements in deep learning technology, based on deep neural networks (DNNs), have solved a wide range of problems in data processing, including media signal processing for video, speech, and images; physical data processing for radio waves, electrical impulses, and light beams; and physiological data processing for heart rate, temperature, and blood pressure. For example, DNNs have enabled more practical designs for human-machine interfaces (HMIs) by analyzing users' physiological data, such as electroencephalograms (EEGs) and electromyograms (EMGs). However, these biosignals are highly susceptible to variations in the biological state of each individual subject. Therefore, frequent calibration is often required in typical HMI systems.
[0003] To address this issue, subject-invariant methods employing adversarial training of conditional variational autoencoders (A-CVAE) have emerged to reduce the user calibration needed for a successful HMI system. Integrating the additional functional blocks of an encoder, a perturbation-conditional decoder, and an adversarial network provides better subject-invariant performance than a standard DNN classifier. DNN architectures can potentially be expanded using more functional blocks and more hidden nodes. However, most work relies on manual design to determine the block connectivity and architecture of DNNs. Specifically, DNN techniques are often handcrafted by experts with human insight into the data model, and optimizing the DNN architecture requires trial and error. A framework called Automated Machine Learning (AutoML) has been proposed to automatically explore different DNN architectures. Automating hyperparameter and architecture exploration within the context of AutoML facilitates the design of DNNs suitable for subject-invariant biosignal processing.
[0004] Learning data representations that capture task-relevant features but are invariant to perturbation remains a key challenge in machine learning. VAEs introduced variational Bayesian inference methods that incorporate auto-associative architectures, in which generative and inference models can be learned jointly. This approach was extended by CVAEs, which introduce conditioning variables to represent perturbations, and by regularized VAEs, which consider decoupling the perturbation variables from the latent representation. The adversarial concept was introduced with Generative Adversarial Networks (GANs) and has been adopted in numerous applications. The simultaneously discovered Adversarially Learned Inference (ALI) and Bidirectional GANs (BiGANs) propose adversarial methods for training autoencoders. Adversarial training has also been combined with VAEs to regularize and decouple latent representations, enabling perturbation-robust learning. Searching for DNN models using hyperparameter optimization has been extensively explored within a related framework known as AutoML. Automated methods include architecture search, learning rule design, and augmentation exploration. Most works use evolutionary optimization or reinforcement learning frameworks to tune hyperparameters or construct network architectures from pre-selected building blocks. The recent AutoML-Zero extends this toward fully automated design from scratch, excluding human knowledge and insight.
[0005] However, AutoML requires a significant amount of exploration time to find the optimal hyperparameters because the search space explodes. Furthermore, without a sound justification, most of the search space for link connections is meaningless. Therefore, there is a need to develop a system for automatically constructing neural networks using a more systematic exploration approach.
Summary of the Invention
[0006] Technical issues
[0007] This disclosure relates to systems and methods for automatically constructing artificial neural networks using Bayesian graphs. Specifically, the system of the present invention introduces an automated Bayesian inference framework called AutoBayes, which explores different graphical models linking classifiers, encoders, decoders, estimators, and adversarial network blocks to optimize a perturbation-invariant machine learning pipeline. AutoBayes also allows for justifying decoupled representations that split latent variables into multiple paths to impose different relationships with subject / session variations and task labels. The framework is applied to a range of physiological datasets where we access subjects and class labels during training and provide capability analysis for subject transfer learning in both variational and adversarial training scenarios. The framework can also be effectively used for semi-supervised multi-class classification and reconstruction tasks on datasets from different domains.
[0008] The core nontrivial achievements of this invention under existing prior knowledge are in the following five aspects:
[0009] a. AutoBayes explores the underlying graphical models inherent in the data, rather than exploring the hyperparameters of DNN blocks.
[0010] b. Based on the explored Bayesian graphs, AutoBayes provides a solid rationale for how to connect multiple DNN blocks so as to impose conditioning and adversarial censoring on task classifiers, feature encoders, decoders, perturbation estimators, and adversarial networks.
[0011] c. It provides a framework for system automation to explore different inference models by using the Bayes-Ball algorithm and ordered factorization.
[0012] d. The framework can also be extended to multiple latent representations and multiple perturbation factors.
[0013] e. In addition to fully supervised training, AutoBayes can automatically build some relevant graphical models suitable for semi-supervised learning.
[0014] Some embodiments of this disclosure are based on the understanding that a novel concept called AutoBayes explores various Bayesian graphical models to facilitate the search for the optimal inference strategy suitable for a perturbation-robust HMI system. Utilizing the Bayes-Ball algorithm, our method automatically constructs reasonable link connections between the classifier, encoder, decoder, perturbation estimator, and adversarial DNN blocks. As a proof-of-concept analysis, we demonstrate the benefits of AutoBayes on various neural / physiological datasets. We observe a significant performance gap between the best and worst graphical models, implying that using a single fixed model without graph exploration may result in poor classification outcomes. Furthermore, the best model for one physiological dataset does not always perform best on different data, which encourages the use of AutoBayes for adaptive model generation given a target dataset. One implementation extends the AutoBayes framework to integrate AutoML to optimize the hyperparameters of individual DNN blocks. The exponentially growing search space of possible Bayesian graphs with the number of random variables is also addressed, using belief propagation analysis of factor graphs for progressive edge pruning / grafting.
[0015] Our invention enables AutoML to efficiently search for potential architectures with solid theoretical justification. The method is based on the understanding that datasets are modeled using directed Bayesian graphs, hence the name AutoBayes. One implementation explores Bayesian graphs with different factorization orders of the joint probability distribution. The invention also provides a method for creating compact architectures by pruning links based on the conditional independence that the Bayes-Ball algorithm derives from the assumed Bayesian graph. Another method optimizes inference graphs with different factorization orders of the likelihood, enabling the automatic construction of joint generative and inference graphs. It yields natural VAE-based architectures both with and without conditional links. Additionally, another implementation uses adversarial training, with adversarial networks attached to latent variables that are independent of the perturbation parameters, to achieve perturbation-robust feature extraction. Another case uses deliberately redundant graphs with conditional grafting to facilitate perturbation-robust feature extraction. Another implementation uses variational sampling for a semi-supervised setting. Yet another implementation provides a way to transform one data structure into another data structure of mismatched dimensions using tensor projection. Another implementation uses ensemble graphs, which combine estimates from multiple different Bayesian graphs to improve performance. One implementation uses a dynamic attention network to implement the ensemble approach. Additionally, it jointly addresses the cycle consistency of the VAE and model consistency across different inference graphs. Another implementation uses graph neural networks to leverage the geometric information of the data and assists a pruning strategy that validates relevance through belief propagation across Bayesian graphs. Furthermore, it integrates AutoBayes and AutoML to tune the hyperparameters of individual DNN blocks. Instead of a divergence, the Wasserstein distance can also be used.
[0016] This system provides an automated framework for searching for the optimal inference graph model associated with a Bayesian graph model well-suited for reproducing the training dataset. The proposed system automatically represents various Bayesian graphs by factoring the joint probability distribution in terms of data, class labels, subject identifiers (IDs), and intrinsic latent representations. Given a Bayesian graph, redundant links are pruned using the Bayes-Ball algorithm to generate meaningful inference graphs, achieving high-accuracy estimation. To enhance robustness to perturbation parameters such as subject IDs, the explored Bayesian graph provides justification for adversarial training with / without variational modeling and latent decoupling. As an implementation, we demonstrate that AutoBayes achieves superior performance across a wide range of physiological datasets.
[0017] The accompanying drawings are included to provide a further understanding of the invention, illustrating embodiments of the invention and explaining the principles of the invention together with the description.
Attached Figure Description
[0018] [Figure 1] Figure 1(a) to Figure 1(c) illustrate inference methods for classifying Y given data X under latent Z and semi-labeled perturbation S according to an embodiment of the present disclosure;
[0019] [Figure 2] Figure 2(a) to Figure 2(c) show a Bayesian model (graph), Z-first inference, and S-first inference, corresponding to a fully connected Bayesian graph and the inference models for Z-first or S-first factorization according to some embodiments of the present disclosure;
[0020] [Figure 3] Figure 3(a) to Figure 3(k) show example Bayesian graphs of data generation models under automated exploration according to some embodiments of this disclosure;
[0021] [Figure 4] Figure 4(a) to Figure 4(l) show Z-first and S-first inference graph models associated with generative models D to G, J, and K according to some embodiments of the present disclosure;
[0022] [Figure 5] Figure 5 is a schematic diagram illustrating the overall network structure for pairing generative model K and inference model Kz according to some embodiments of the present disclosure;
[0023] [Figure 6A] Figure 6A shows performance on datasets, indicating reconstruction loss, perturbation classification score, and task classification score under variational / non-variational and adversarial / non-adversarial settings, according to embodiments of this disclosure;
[0024] [Figure 6B] Figure 6B shows performance on datasets, indicating reconstruction loss, perturbation classification score, and task classification score under variational / non-variational and adversarial / non-adversarial settings, according to embodiments of this disclosure;
[0025] [Figure 6C] Figure 6C shows performance on datasets, indicating reconstruction loss, perturbation classification score, and task classification score under variational / non-variational and adversarial / non-adversarial settings, according to embodiments of this disclosure;
[0026] [Figure 7A] Figure 7A shows performance on datasets, indicating that the optimal inference strategy depends strongly on the dataset, according to embodiments of this disclosure;
[0027] [Figure 7B] Figure 7B shows performance on datasets, indicating that the optimal inference strategy depends strongly on the dataset, according to embodiments of this disclosure;
[0028] [Figure 7C] Figure 7C shows performance on datasets, indicating that the optimal inference strategy depends strongly on the dataset, according to embodiments of this disclosure;
[0029] [Figure 8] Figure 8(a) to Figure 8(j) illustrate the basic rules of the Bayes-Ball algorithm with shaded conditioning nodes according to an embodiment of the present disclosure;
[0030] [Figure 9] Figure 9 shows an example algorithm describing the overall process of the AutoBayes algorithm according to embodiments of the present disclosure;
[0031] [Figure 10] Figure 10 shows examples of DNN blocks for classifiers, encoders, decoders, estimators, and adversarial networks according to embodiments of this disclosure; and
[0032] [Figure 11] Figure 11 shows a schematic diagram of a system configured with a processor, memory, and interface according to an embodiment of the present disclosure.
Detailed Implementation
[0033] Various embodiments of the present invention are described below with reference to the accompanying drawings. It should be noted that the drawings are not drawn to scale, and elements with similar structures or functions are indicated by similar reference numerals throughout the drawings. It should also be noted that the drawings are intended only to facilitate the description of specific embodiments of the invention. They are not intended to be an exhaustive description of the invention or to limit the scope of the invention. Furthermore, aspects described in connection with specific embodiments of the invention are not necessarily limited to those embodiments and may be practiced in any other embodiment of the invention.
[0034] Figure 1(a), Figure 1(b), and Figure 1(c) are schematic diagrams of three classifier networks (a), (b), and (c) according to an embodiment of the present disclosure, describing inference methods for classifying Y from given data X under latent Z and semi-labeled perturbation variations S.
[0035] At the heart of our approach is a graphical Bayesian model that captures the probabilistic relationships between random variables representing data features X, task labels Y, perturbation variation labels S, and (hidden) latent representations Z. The ultimate goal is to infer task labels Y from measured data features X, which is hampered by the presence of perturbation variations (e.g., subject / session variations) that are (partially) labeled by S. Latent representations Z (further denoted Z1, Z2, ... if needed) are also optionally introduced into these models to help capture the underlying relationships between S, X, and Y.
[0036] Let p(y,s,z,x) represent the joint probability distribution of the underlying biosignal dataset for four random variables, namely Y, S, Z, and X. The chain rule can produce the following factorization of the generative model from Y to X (note that there are at most 4! factorization sequences, including the useless ones):
[0037] p(y,s,z,x)=p(y)p(s|y)p(z|s,y)p(x|z,s,y), (1)
[0038] This is visualized in the Bayesian graph of Figure 2(a). The probability conditioned on X can then be factored, for example, as follows (among the 3! different inference factorization orders of the four-node graph):
[0039] p(y,s,z|x)=p(z|x)p(s|z,x)p(y|s,z,x) (Z-first inference), or p(y,s,z|x)=p(s|x)p(z|s,x)p(y|z,s,x) (S-first inference), (2)
[0040] It is marginalized to obtain the likelihood of class Y given data X: p(y|x)=E_{s,z}[p(y,s,z|x)]. The two inference scheduling strategies above are illustrated by the factor graph models in Figure 2(b) and Figure 2(c). The number of Bayesian graphs and inference graphs may increase rapidly when more nodes with multiple perturbation and latent variables are considered.
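The factorization orders mentioned above can be enumerated mechanically. The following is a minimal Python sketch, not part of the disclosure: the function name `chain_rule_factors` and the lowercase variable names are hypothetical, and it only writes out chain-rule terms for a given ordering (4! orders for the generative model over Y, S, Z, X and 3! orders for inference conditioned on X).

```python
from itertools import permutations

def chain_rule_factors(order, given=None):
    """Return chain-rule factors for `order`, optionally conditioning every term on `given`.

    Earlier variables in `order` become conditioning variables of later ones;
    they are listed most-recent-first to match the notation of equation (1).
    """
    factors = []
    for i, var in enumerate(order):
        conditions = list(reversed(order[:i]))
        if given is not None:
            conditions.append(given)
        if conditions:
            factors.append(f"p({var}|{','.join(conditions)})")
        else:
            factors.append(f"p({var})")
    return factors

# All generative orders over four nodes, and all inference orders given x.
generative_orders = list(permutations("yszx"))
inference_orders = list(permutations("ysz"))

full_chain = "".join(chain_rule_factors(("y", "s", "z", "x")))
z_first = "".join(chain_rule_factors(("z", "s", "y"), given="x"))
s_first = "".join(chain_rule_factors(("s", "z", "y"), given="x"))

print(len(generative_orders), len(inference_orders))
print(full_chain)
print(z_first)
print(s_first)
```

With the order (y, s, z, x) the sketch reproduces the right-hand side of (1); the Z-first and S-first orders give the two full-chain inference factorizations discussed in the text.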
[0041] The graphical models in Figure 2(a), Figure 2(b), and Figure 2(c) described above make no assumptions about potential inherent independence in the dataset and are therefore the most general. However, depending on the underlying independence in the biosignals, we may be able to prune some of the edges in those graphs. For example, if the data follow a Markov chain Y-X that is independent of S and Z, then all links except the link between X and Y are unjustified, leading to the model in Figure 1(a). This implies that the most complex inference models with high degrees of freedom do not always perform best across arbitrary datasets. This prompts us to consider extending the AutoML framework to automatically explore, in addition to hyperparameter design, the best pair of inference factor graph and corresponding Bayesian graph model that matches the dataset.
[0042] AutoBayes begins by exploring any potential Bayesian graph obtained by cutting links of the fully-linked graph in Figure 2(a), thereby imposing possible independence. Then, we apply the Bayes-Ball algorithm to each hypothetical Bayesian graph to check conditional independence under different inference strategies, such as the full-chain Z-first / S-first inference graphs in Figure 2(b) / Figure 2(c). Bayes-Ball justifies the reasonable pruning of links in the full-chain inference graphs of Figure 2(b) / Figure 2(c), as well as potential adversarial censoring when Z is independent of S. This process automatically constructs the connectivity of inference, generative, and adversarial blocks with sound justification, for example constructing the A-CVAE classifier in Figure 1(b) from the arbitrary model of Figure 1(c). Before describing the more detailed system configuration for the general case, some implementations of Bayesian graph exploration are described below.
[0043] Bayesian graph exploration
[0044] Given sensor measurements such as media data, physical data, and physiological data, we never know the true joint probability beforehand, so we must assume a possible generative model. AutoBayes aims to explore any such potential graphical models to match the measurement distribution. Since the maximum possible number of graphical models is enormous, even for a four-node case involving Y, S, Z, and X, some implementations of these Bayesian graphs are shown in Figure 3(a) to Figure 3(k). Each Bayesian graph corresponds to the following assumptions on the joint probability factorization (the p(x|…) term specifies the generative model of X):
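To give a feel for the size of this search space, the sketch below enumerates every edge subset of the fully-linked four-node graph for one fixed node order. It is an illustrative assumption rather than the disclosed procedure: the helper names are hypothetical, and the two example edge sets are chosen to match the textual descriptions of a direct Markov chain Y-X and a hidden Markov chain Y-Z-X.

```python
from itertools import combinations

NODES = ("y", "s", "z", "x")  # node order of the full-chain factorization in (1)

def full_chain_edges(order):
    """Directed edges (parent, child) of the fully-linked graph for a node order."""
    return [(order[i], order[j])
            for i in range(len(order))
            for j in range(i + 1, len(order))]

def pruned_graphs(order):
    """Yield every subset of the fully-linked edges; each subset is a valid DAG."""
    edges = full_chain_edges(order)
    for k in range(len(edges) + 1):
        for kept in combinations(edges, k):
            yield frozenset(kept)

def factorization(order, edges):
    """Write the joint factorization implied by the edges that were kept."""
    terms = []
    for i, var in enumerate(order):
        parents = [p for p in reversed(order[:i]) if (p, var) in edges]
        terms.append(f"p({var}|{','.join(parents)})" if parents else f"p({var})")
    return "".join(terms)

graphs = list(pruned_graphs(NODES))
direct_markov = factorization(NODES, {("y", "x")})
hidden_markov = factorization(NODES, {("y", "z"), ("z", "x")})

print(len(full_chain_edges(NODES)), len(graphs))
print(direct_markov)
print(hidden_markov)
```

Even for a single node order there are 2^6 candidate graphs, which is why only a handful of representative graphs are discussed individually.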
[0045]
[0046] Independence is explicitly indicated by slash-cancelled factors relative to the full-chain case in (1). The relevant inference strategy changes according to the hypothetical Bayesian graph, because some variables can be conditionally independent, which allows links in the inference factor graph to be pruned. For example, as shown in Figure 4(a) to Figure 4(l), a reasonable inference graph model can be generated automatically using the Bayes-Ball algorithm based on the various Bayesian graph assumptions about the dataset. Specifically, the conditional probability p(y,s,z|x) can be obtained for each model as follows.
[0047] Bayesian graphical model A (direct Markov): The simplest model between X and Y is a single Markov chain with no dependencies on S and Z, as shown in the Bayesian graph in Figure 3(a). This graphical model assumes that the biological signal is perturbation-invariant. In this case, there is no reason to use a complex inference model such as A-CVAE, because most factors are independent: the full-chain inference p(y,s,z|x)=p(z|x)p(s|z,x)p(y|s,z,x) reduces to p(z)p(s)p(y|x). Therefore, we should use the standard classification method in Figure 1(a), which infers Y from X based on the inference model p(y|x), without involving S and Z.
[0048] Bayesian graphical model B (hidden Markov): Assuming a latent Z operates within the Markov chain Y-Z-X shown in Figure 3(b), we obtain a simple inference model in which the full chain p(y,s,z|x)=p(z|x)p(s|z,x)p(y|s,z,x) reduces to p(z|x)p(s)p(y|z). Note that this model assumes independence between Z and S, which allows it to be made more robust to perturbations through adversarial censoring. Specifically, the adversarial DNN block that estimates the perturbation variations should be fed with the latent vectors and trained alternately to achieve adversarial mini-max optimization.
[0049] Bayesian graphical model C (subject-dependent): We can model situations where data X directly depends on subject S and task Y, as shown in Figure 3(c). For this case, by Bayes-Ball, we can consider the corresponding inference model:
[0050]
[0051] Note that this model does not depend on Z; therefore, the Z-first inference strategy reduces to the S-first inference strategy. For reference, we also consider a Y-first inference strategy to evaluate the difference.
[0052] Bayesian graphical model D (latent summary): Another graphical model is shown in Figure 3(d), in which the latent space bridges all other random variables. Bayes-Ball produces the following models:
[0053]
[0054] Its graphical models are depicted in Figure 4(a) and Figure 4(b).
[0055] Bayesian graphical model E (task-latent summary): Another graphical model involving latent variables is shown in Figure 3(e), in which the latent space only summarizes Y. Bayes-Ball produces the following inference models:
[0056]
[0057] They are shown in Figure 4(c) and Figure 4(d). Note that the generative model E has no marginal dependency between Z and S, which provides a rationale for using adversarial censoring to suppress the perturbation information S in the latent space Z. Furthermore, since the generative model of X depends on both Z and S, it is reasonable to adopt the A-CVAE classifier shown in Figure 1(b).
[0058] Bayesian graphical model F (subject-latent summary): Consider Figure 3(f), in which the latent variables summarize the subject information S. Bayes-Ball provides the inference graphs shown in Figure 4(e) and Figure 4(f), which correspond to:
[0059]
[0060] Bayesian graphical model G: Letting the joint distribution follow model G in Figure 3(g), we obtain the following inference models via Bayes-Ball:
[0061]
[0062] Its graphical models are depicted in Figure 4(g) and Figure 4(h). Note that the inference model Gs in Figure 4(h) is the same as the inference model Ds in Figure 4(b). Although the inference graphs Gs and Ds are identical, the generative models of X differ, as shown in Figure 3(g) and Figure 3(d). Specifically, the VAE decoder of model G should be fed S along with the variational latent space Z, which justifies using a CVAE for model G but not for model D. This difference in generative models can potentially affect inference performance, even though the inference graphs themselves are the same.
[0063] Bayesian graphical models H and I: The two generative models H and I shown in Figure 3(h) and Figure 3(i) both have the fully connected inference strategy given in (2), whose graphs are shown in Figure 2(a) to Figure 2(c), because no useful conditional independence can be found using Bayes-Ball. Similar to the relationship between models Ds and Gs, the inference graphs for Bayesian graphs H and I can be the same while the generative models of X differ, as shown in Figure 3(h) and Figure 3(i).
[0064] Bayesian graphical model J (latent decoupling): We can also consider multiple latent vectors to generalize the Bayesian graph with more vertices. Here, we focus on two such implementations of a graphical model with two latent spaces, as shown in Figure 3(j) and Figure 3(k). These models belong to the same class as model D, except that the single latent Z is decoupled into two parts Z1 and Z2, which are associated with S and Y respectively. Given the Bayesian graph of Figure 3(j), Bayes-Ball generates several inference strategies, including the following two models:
[0065]
[0066] They are shown in Figure 4(i) and Figure 4(j). Note that Z2 is marginally independent of the perturbation variable S, which encourages the use of adversarial training that is robust to subject / session changes.
[0067] Bayesian graphical model K (conditional latent decoupling): Another modified model, shown in Figure 3(k), links Z1 and Z2 and produces the following inference models:
[0068]
[0069] as shown in Figure 4(k) and Figure 4(l). The main difference from model J is that the inference graph should use Z1 along with Z2 to infer Y.
[0070] As described in the above implementations, AutoBayes explores different Bayesian graphs, such as those in Figure 3, by hypothesizing the independence factors in (3), and generates non-redundant inference graphs, such as those in Figure 4, by pruning links using the Bayes-Ball algorithm. Given a pair of generative and inference graphs, the corresponding DNN architecture is trained. For example, for the generative graph model K in Figure 3(k), the related inference graph Kz in Figure 4(k) yields the overall network structure shown in Figure 5, which includes an adversarial network because Z2 is (conditionally) independent of S. Individual factor blocks are implemented using DNNs; for example, p_θ(z1,z2|x) is a DNN parameterized by θ. The entire network except for the adversarial network is optimized to minimize the corresponding loss function, as follows:
[0071]
[0072]
[0073] where λ_* denotes a regularization coefficient, KL is the Kullback–Leibler divergence, and the adversarial network p_η(s′|z2) is trained in an alternating manner to minimize its own loss.
[0074] Bayesian Sphere Algorithm
[0075] The system of this invention relies on the Bayes-Ball algorithm to facilitate the automatic pruning of links in the inference factor graph by analyzing conditional independence. The Bayes-Ball algorithm uses only ten rules to identify conditional independence, as shown in Figure 8. Given a directed Bayesian graph, we can determine whether two disjoint sets of nodes are conditionally independent given other nodes by applying the graph separation criterion. Specifically, an undirected path is active if a Bayes ball can travel along it without encountering the stop arrow symbols in Figure 8. If there is no active path between two sets of nodes when some other conditioning nodes are shaded, then these sets of random variables are conditionally independent. Using the Bayes-Ball algorithm, this invention generates a list of independence relationships between pairs of disjoint nodes for the AutoBayes algorithm.
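As a rough illustration of the separation test described above, the following Python sketch checks conditional independence between two nodes of a directed graph by tracking whether a "ball" can travel between them. It is a minimal sketch of the standard reachability formulation of d-separation, not the ten-rule figure of the disclosure; the function name `d_separated`, the `parents` dictionary format, and the two example graphs (a chain Y→Z→X and a graph with assumed edges Y→Z, Z→X, S→X) are assumptions made for illustration.

```python
from collections import deque

def d_separated(parents, a, b, given=frozenset()):
    """Return True if node a is conditionally independent of node b given `given`.

    parents: dict mapping each node to the set of its parent nodes (a DAG).
    """
    given = set(given)
    children = {n: set() for n in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].add(node)

    # Conditioning nodes together with all of their ancestors.
    ancestors = set()
    stack = list(given)
    while stack:
        n = stack.pop()
        if n not in ancestors:
            ancestors.add(n)
            stack.extend(parents[n])

    # "up": the ball arrived from a child; "down": it arrived from a parent.
    queue = deque([(a, "up")])
    visited = set()
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == b and node not in given:
            return False  # an active path reaches b
        if direction == "up" and node not in given:
            queue.extend((p, "up") for p in parents[node])
            queue.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in given:
                queue.extend((c, "down") for c in children[node])
            if node in ancestors:  # a collider is opened by conditioning
                queue.extend((p, "up") for p in parents[node])
    return True

# Chain Y -> Z -> X (S isolated), and a graph where X depends on both Z and S.
chain = {"y": set(), "s": set(), "z": {"y"}, "x": {"z"}}
collider = {"y": set(), "s": set(), "z": {"y"}, "x": {"z", "s"}}

print(d_separated(chain, "y", "x", {"z"}))
print(d_separated(collider, "z", "s"))
print(d_separated(collider, "z", "s", {"x"}))
```

In the second graph Z and S are marginally independent but become dependent once X is observed, which is the kind of relationship the independence list has to record for each conditioning set.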
[0076] AutoBayes algorithm
[0077] Figure 9 shows the pseudocode of Algorithm 1 according to some embodiments of this disclosure, which describes the overall process of the AutoBayes algorithm for more general cases, not only those in Figure 3 and Figure 4. Given a hypothetical Bayesian graph, AutoBayes automatically constructs a non-redundant inference factor graph using the Bayes-Ball algorithm. Based on the derived conditional independence and the pruned factor graph, the encoder, decoder, classifier, perturbation estimator, and adversarial DNN blocks are connected accordingly. Adversarial learning is used to train the full set of DNN blocks in variational Bayesian inference. Note that, as an implementation, the hyperparameters of the individual DNN blocks can be further optimized using AutoML on top of the AutoBayes framework.
[0078] The system of this invention uses memory to store hyperparameters, trainable variables, intermediate neuron signals, and provisional computational values including forward propagation signals and backward propagation gradients. It reconfigures DNN blocks by exploring various Bayesian graphs based on the Bayes-Ball algorithm, pruning redundant links to make them compact. Based on several different criteria for matching the Bayesian model to the dataset, AutoBayes first creates a fully-linked directed Bayesian graph to connect all nodes in a specific permutation order. Then, the system prunes specific combinations of graph edges in the fully-linked Bayesian graph. Next, the Bayes-Ball algorithm is used to list conditional independence relationships between pairs of disjoint nodes. For each hypothesized Bayesian graph, another fully-linked directed factor graph is constructed, starting from the nodes associated with the data signal X, in a different factorization order to infer the other nodes. Then, redundant links in the fully-linked factor graph are pruned according to the independence list, making the DNN links compact. In another embodiment, redundant links are intentionally preserved and progressively grafted. The pruned Bayesian graph and the pruned factor graph are combined to make the generative and inference models consistent. Given a combined graphical model, all DNN blocks (encoder, decoder, classifier, estimator, and adversarial network) are connected according to the model. AutoBayes thereby implements perturbation-robust inference, which can be transferred to new data domains for the test dataset.
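The pruning step can be sketched as follows. This is a simplified, hypothetical illustration rather than Algorithm 1 itself: the independence list is supplied as a hand-written function for a chain Y→Z→X with S isolated (in the full procedure it would come from the Bayes-Ball analysis), and a conditioning variable is dropped from a factor whenever the target is independent of it given the remaining conditions.

```python
def prune_factor(var, conditions, independent):
    """Drop each conditioning variable that `var` is independent of given the rest."""
    kept = list(conditions)
    changed = True
    while changed:
        changed = False
        for c in kept:
            rest = frozenset(kept) - {c}
            if independent(var, c, rest):
                kept.remove(c)
                changed = True
                break  # restart the scan because the conditioning set changed
    return kept

def prune_inference_graph(order, given, independent):
    """Prune the fully-linked inference factorization of p(order | given)."""
    factors = []
    for i, var in enumerate(order):
        conditions = list(reversed(order[:i])) + [given]
        kept = prune_factor(var, conditions, independent)
        factors.append(f"p({var}|{','.join(kept)})" if kept else f"p({var})")
    return factors

def chain_independent(a, b, given):
    """Assumed independence list for the chain Y -> Z -> X with S isolated."""
    if "s" in (a, b):
        return True  # S is not connected to any other node
    return {a, b} == {"y", "x"} and "z" in given  # Z blocks the path Y - X

z_first = "".join(prune_inference_graph(("z", "s", "y"), "x", chain_independent))
s_first = "".join(prune_inference_graph(("s", "z", "y"), "x", chain_independent))

print(z_first)
print(s_first)
```

Under these assumptions both factorization orders collapse to the same compact factors, an encoder p(z|x) followed by a classifier p(y|z), which is the kind of reduction that makes the resulting DNN links compact.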
[0079] The AutoBayes algorithm can be generalized to more than four node factors. In examples of these implementations, the perturbation variation S is further decomposed into multiple factors S1, S2, ..., SN as multi-domain side information, according to combinations of supervised, semi-supervised, and unsupervised settings. In another example of the implementation, the latent variables are further decomposed into multiple factors Z1, Z2, ..., ZL as decoupled feature vectors. Figure 5 shows one of these implementations. In an example of an implementation with decomposed factors, the perturbation variations are grouped into different factors such as subject identification, session number, biological state, environmental state, sensor state, location, orientation, sampling rate, time, and sensitivity.
[0080] When exploring different graph models, one implementation uses the outputs of all the different models explored to improve performance, for example, by weighted summation to achieve ensemble performance. Another implementation uses additional DNN blocks that learn the optimal weights for combining different graph models. This implementation leverages attention networks to adaptively select relevant graph models given data. This implementation considers consensus equilibrium across different graph models because the original joint probabilities are the same. It also identifies the cycle consistency of the encoder / decoder DNN blocks.
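As a minimal sketch of the weighted-summation ensemble mentioned above, the following combines the class-probability outputs of several explored graph models. The use of validation accuracy as the weight, and all names and numbers, are assumptions made only for illustration; the text does not specify how the weights are chosen.

```python
def ensemble(prob_lists, weights):
    """Weighted average of per-model class-probability vectors."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, prob_lists)) / total
        for k in range(n_classes)
    ]


# Hypothetical outputs of three graph models for one input sample.
model_probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.6, 0.2],
]
# Hypothetical weights, e.g. each model's validation accuracy.
val_accuracy = [0.9, 0.8, 0.5]

combined = ensemble(model_probs, val_accuracy)
predicted_class = max(range(len(combined)), key=combined.__getitem__)
```

A learned combiner, such as the attention network described above, would replace the fixed weights with weights computed from the input data.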
[0081] Examples and experimental evaluations can be implemented.
[0082] Example dataset: We demonstrate the performance of AutoBayes on publicly available physiological datasets and the benchmark MNIST through experiments, as follows.
[0083] QMNIST: The benchmark MNIST handwritten-digit image dataset with extended label information, including writer IDs. There are |S| = 539 writers, and the task is to classify |Y| = 10 digits from 28×28 grayscale images, with 60,000 training samples.
[0084] Stress: A physiological dataset considering neural stress levels. |Y| = 4 discrete stress states from |S| = 20 subjects. Data were recorded using C = 7 sensors for 300 samples, including heart rate, skin conductance, temperature, and arterial oxygenation levels.
[0085] RSVP: EEG data for rapid serial visual presentation (RSVP) drowsiness. The data comprise 41,400 epochs of T = 128 samples over C = 16 channels, collected across three sessions from |S| = 10 subjects. There are |Y| = 4 labels for emotion elicitation, resting state, or motor imagery / execution tasks.
[0086] MI: The PhysioNet EEG Motor Imagery (MI) dataset. It consists of T = 480 samples over C = 64 channels from |S| = 106 subjects, with 90 trials of a |Y| = 4 class MI task.
[0087] ErrP: Error-related potentials (ErrP) from an EEG dataset. This data consists of |S| = 16 subjects participating in a spelling task, recorded from C = 56 channels across 340 trials and T = 250 samples. |Y| = 2 binary labels represent errors or correct responses.
[0088] Ninapro: An EMG dataset for finger movement detection for prosthetic hands, recorded from 10 subjects. Subjects repeated 12 finger movements shown in a movie on a laptop screen. Each movement was repeated for 5 seconds, followed by a 3-second rest. Muscle activity was collected at 200 Hz using two Thalmic Myo armbands with C = 16 active differential wireless electrodes.
[0089] The examples of datasets described above include a variety of different sensor modalities; specifically, images, electroencephalograms (EEGs), electromyograms (EMGs), temperature, heart rate, etc. In addition to these examples, the system of the present invention is applicable to a wide range of datasets that include combinations of sensor measurements, such as…
[0090] a. Media data, such as images, photographs, movies, text, letters, voice, music, audio, and speech;
[0091] b. Physical data, such as radio waves, optical signals, electrical pulses, temperature, pressure, acceleration, velocity, vibration, and force; and
[0092] c. Physiological data, such as heart rate, blood pressure, weight, water content, electroencephalogram (EEG), electromyography (EMG), electrocardiogram (ECG), kineography, electrooculography (EOG), skin conductance, magnetoencephalography (MEG), and electrocorticography (ECoG).
[0093] Model Implementation: Each DNN block is configured with hyperparameters that specify a set of layers of neurons, which are interconnected by trainable variables to pass signals from layer to layer. The trainable variables are numerically optimized with gradient methods such as stochastic gradient descent, adaptive momentum (Adam), AdaGrad, AdaBound, Nesterov accelerated gradient, and root mean square propagation (RMSprop). The gradient methods use the training data to update the trainable parameters of the DNN blocks so that the block outputs yield smaller loss values, such as mean squared error, cross-entropy, structural similarity, negative log-likelihood, absolute error, cross-covariance, clustering loss, divergence, hinge loss, Huber loss, negative sampling, Wasserstein distance, and triplet loss. Multiple loss functions are further weighted by regularization coefficients according to the training scheduling strategy.
[0094] In some implementations, the DNN block can be reconfigured based on hyperparameters, such that the DNN block is configured with a set of fully connected layers, convolutional layers, graph convolutional layers, recurrent layers, loopy connections, skip connections, and inception layers, with a set of nonlinear activations including rectified linear variants, hyperbolic tangent, sigmoid, gated linear, softmax, and thresholding. The DNN block is further regularized with a set of dropout, swap-out, zone-out, block-out, drop-connect, noise injection, shaking, and batch normalization. In another implementation, the layer parameters are further quantized to reduce the memory size, as specified by the adjustable hyperparameters.
[0095] As an example of implementation, all models were trained with the Adam optimizer at an initial learning rate of 0.001 and a mini-batch size of 64. The learning rate was halved whenever the validation loss plateaued. A compact convolutional neural network (CNN) with four layers was used as the encoder network E to extract features from C×T multichannel biomedical data. The first three layers had 1D temporal convolutional kernels to exploit long-term, mid-term, and short-term temporal dependencies. Each temporal convolution was followed by batch normalization and Rectified Linear Unit (ReLU) activation. The final convolutional layer was a 1D spatial convolution across all channels. AutoBayes selected either a deterministic latent encoder or a variational latent encoder under a Gaussian prior. The raw data was reconstructed by a decoder network D with 1D spatial and temporal transposed convolutions at the same kernel resolution. The data was split into training (70%) and validation (30%) sets. All methods were initialized with data normalization and without data augmentation. For models where adversarial training was available, the regularization parameter λ_a was set to 0.01.
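The learning-rate rule in this example can be sketched as follows. The exact trigger for halving is not spelled out in the text, so this sketch assumes "no improvement in validation loss for `patience` consecutive epochs"; the class name, the `patience` parameter, and the loss values are hypothetical.

```python
class HalvingSchedule:
    """Halve the learning rate when the validation loss stops improving."""

    def __init__(self, lr=0.001, patience=2):
        self.lr = lr
        self.patience = patience      # assumed number of stale epochs tolerated
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the current rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= 0.5
                self.stale = 0
        return self.lr


schedule = HalvingSchedule()
# Hypothetical validation losses over seven epochs.
history = [schedule.step(loss) for loss in [1.0, 0.8, 0.85, 0.9, 0.7, 0.75, 0.8]]
```

With these made-up losses the rate is halved twice, after the fourth and the seventh epoch.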
[0096] Figure 10 shows the DNN model parameters for Figure 5. Conv(h,w)^c_g denotes a 2D convolutional layer with kernel size (h, w), c output channels, and g groups. FC(h) denotes a fully connected layer with h output nodes, and BN denotes batch normalization. For 2D datasets, we use a deep CNN for the encoder and decoder blocks. For the classifier, nuisance estimator, and adversarial DNN blocks, we use a multilayer perceptron (MLP) with three layers whose hidden nodes are doubled from the input dimension. We also use batch normalization in addition to ReLU activation. Note that for tabular data such as the Stress dataset, the CNN is replaced with a 3-layer MLP with ReLU activation and a dropout ratio of 0.2. Additionally, for cases with a 2D input dimension (e.g., in model A), the MLP classifier is replaced with a CNN. The number of hidden dimensions is chosen to be 64. When S must be fed into the CNN encoder along with the 2D data (e.g., in model Ds), we use interpolation to concatenate it as additional channel inputs. In another implementation of link concatenation, the system uses multidimensional tensor projection with multidimensional trainable linear filters to convert low-dimensional signals into high-dimensional signals for dimension-mismatched links.
[0097] Another implementation integrates AutoML into AutoBayes for hyperparameter exploration and learning scheduling of the individual DNN blocks, since AutoBayes can be readily combined with AutoML to optimize any hyperparameter of an individual DNN block. More specifically, the system modifies hyperparameters using reinforcement learning, evolutionary strategies, differential evolution, particle swarm optimization, genetic algorithms, annealing, Bayesian optimization, hyperband, and multi-objective Lamarckian evolution to explore different combinations of discrete and continuous hyperparameter values.
[0098] The system of this invention also provides a testing step with post-training adaptation, which refines the trained DNN blocks by unfreezing some trainable variables so that the DNN blocks become robust to new datasets with new perturbation variations (e.g., new subjects). This implementation can reduce the calibration time required for new users of the HMI system.
[0099] Results: The results in Figures 6 and 7 show that the optimal inference strategy depends strongly on the dataset. Specifically, the best model for one dataset does not necessarily perform best on another; for example, model Kz is best for the Stress dataset, while the simple model B is best for the ErrP dataset. This suggests that different inference strategies should be considered adaptively for each target dataset, and AutoBayes provides such an adaptive framework. Furthermore, a large performance gap was observed between the best and worst models for each dataset. For example, model Dz achieved 93.1% task accuracy on the MI dataset, while model Es achieved 25.5%. This implies a risk that a particular model may fail to perform well if different models are not explored. It should also be noted that reconstruction loss may not be a good metric for selecting a graph model. To explore potential graphs efficiently, one implementation uses graph neural networks to relate factor graphs, where belief propagation is applied to progressively prune graph edges from the fully connected graph. Specifically, the training schedule consists of strategies that adaptively control the learning rate, the regularization weights, and the factorization permutations, and that prune low-priority links, by measuring the discrepancy between training and validation data with belief propagation.
[0100] Variational Bayesian inference using adversarial training
[0101] Variational AE: AutoBayes can automatically construct an autoencoder architecture when latent variables are involved, for example, for model E in Figure 3(e). In this case, Z represents a random node that is marginalized for X reconstruction and Y inference, thus requiring a VAE. In contrast to a regular autoencoder, a VAE uses variational inference by assuming a marginal distribution p(z) for the latent. In the variational approach, we reparameterize Z from a prior distribution such as the normal distribution in order to marginalize. Depending on the Bayesian graphical model, we can also consider reparameterizing semi-supervision on S (i.e., incorporating a reconstruction loss for S) as a conditioning variable. Conditioning on Y and / or S should be consistent with the assumptions of the graphical model. Since a VAE is a special case of a CVAE, we explore the more general CVAE in further detail below.
[0102] Conditional VAE: AutoBayes generates CVAE architectures when X directly depends on S or Y along with Z in the Bayesian graph, for example, in models E / F / G / H / I of Figure 3. For these generative models, the decoder DNN needs to be fed S or Y as conditioning parameters. Even for other Bayesian graphs, the S-first inference strategy still requires a conditional encoder in the CVAE, as in models Ds / Es / Fs / Gs / Js / Ks of Figure 4, where the latent Z depends on S.
[0103] Consider the case where S is used as a conditioning variable in a data model with the factorization:
[0104] p(s,x,z)=p(s)p(z)p(x|s,z), (13)
[0105] Here we directly parameterize p(x|s,z), setting p(z) to be simple (e.g., isotropic Gaussian), and keeping p(s) arbitrary (because it will not be used directly). CVAE is trained by maximizing the likelihood of the data tuple (s,x) with respect to p(x|s), given by the following equation.
[0106] p(x|s)=∫p(x|s,z)p(z)dz, (14)
[0107] Given the potential complexity of the parameterization of p(x|s,z), this is generally intractable to compute exactly. Although it could be approximated by sampling Z, the crux of the VAE approach is to use instead a variational lower bound on the likelihood, which involves a variational approximation of the posterior p(z|s,x) implied by the generative model. With q(z|s,x) denoting the variational approximation of the posterior, the evidence lower bound (ELBO) is given by:
[0108] log p(x|s) ≥ E_{z∼q(z|s,x)}[log p(x|s,z)] − KL(q(z|s,x) ‖ p(z)). (15)
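As an illustration of how the two terms of the bound (15) can be evaluated, the following sketch assumes a diagonal Gaussian posterior q(z|s,x), an isotropic Gaussian prior p(z), and a unit-variance Gaussian likelihood p(x|s,z). These modeling choices, the toy decoder, and all names are assumptions for illustration and are not taken from the text.

```python
import math
import random


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def elbo_estimate(x, s, mu, log_var, decode, rng):
    """Single-sample estimate of the right-hand side of (15)."""
    z = reparameterize(mu, log_var, rng)
    x_hat = decode(s, z)
    # Log-likelihood of x under a unit-variance Gaussian centered at x_hat.
    log_lik = -0.5 * sum((a - b) ** 2 + math.log(2.0 * math.pi) for a, b in zip(x, x_hat))
    return log_lik - gaussian_kl(mu, log_var)


def toy_decode(s, z):
    """Hypothetical decoder: shifts each latent coordinate by the value s."""
    return [zi + s for zi in z]


rng = random.Random(0)
value = elbo_estimate(x=[1.0, 2.0], s=1.0, mu=[0.0, 1.0], log_var=[0.0, 0.0],
                      decode=toy_decode, rng=rng)
```

In training, the negative of this estimate would be minimized with respect to the encoder outputs `mu` and `log_var` and the decoder parameters.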
[0109] The parameterization of the variational posterior can also be decomposed into parameterized components, for example, q(z,s|x) = q(s|x)q(z|s,x), as in the S-first models of Figure 4. This decomposition also enables semi-supervised training, which is convenient when some variables (e.g., perturbation variations) are not always labeled. For data tuples that include s, the likelihood q(s|x) can be optimized directly, and the given value of s is used as the input to the computation of q(z|s,x). For tuples where s is missing, the component q(s|x) can instead be used to generate an estimate of s to be input to q(z|s,x). The implementation of semi-supervised learning and the sampling method for categorical perturbation variables are discussed further below.
[0110] Adversarial CVAE: When Z and S should be marginally independent (e.g., in Figure 1(b) and Figure 5), we can use adversarial censoring to promote learning a representation Z that is disentangled from the nuisance variation S. This is achieved by introducing an adversarial network that aims to maximize a parameterized approximation q(s|z) of the likelihood p(s|z), while this objective is incorporated with a negative sign into the loss of the other modules. By maximizing the log-likelihood log q(s|z), the adversarial network essentially maximizes a lower bound on the mutual information I(S;Z), and the main network thus receives additional regularization that corresponds to minimizing this estimate of the mutual information. This is because the log-likelihood maximized by the adversarial network is given by
[0111] E[log q(s|z)] = I(S;Z) − H(S) − KL(p(s|z) ‖ q(s|z)), (16)
[0112] where the entropy H(S) is constant.
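The identity (16) can be checked numerically on a small discrete example. The joint distribution p(s,z) and the adversary output q(s|z) below are made-up numbers chosen only for the check, and the KL term is taken as its expectation over z.

```python
import math

# Hypothetical joint p(s, z): 2 nuisance classes (rows), 3 latent bins (columns).
p_joint = [
    [0.20, 0.10, 0.10],
    [0.05, 0.25, 0.30],
]
# Hypothetical adversary output q(s | z); each column sums to 1.
q_cond = [
    [0.7, 0.4, 0.3],
    [0.3, 0.6, 0.7],
]

cells = [(s, z) for s in range(2) for z in range(3)]
p_s = [sum(row) for row in p_joint]
p_z = [p_joint[0][z] + p_joint[1][z] for z in range(3)]

entropy_s = -sum(p * math.log(p) for p in p_s)
mutual_info = sum(
    p_joint[s][z] * math.log(p_joint[s][z] / (p_s[s] * p_z[z])) for s, z in cells
)
# Expected KL between the true conditional p(s|z) and the adversary q(s|z).
kl_term = sum(
    p_joint[s][z] * math.log((p_joint[s][z] / p_z[z]) / q_cond[s][z]) for s, z in cells
)

lhs = sum(p_joint[s][z] * math.log(q_cond[s][z]) for s, z in cells)
rhs = mutual_info - entropy_s - kl_term
```

Because the KL term is non-negative, `lhs + entropy_s` never exceeds `mutual_info`, which is the sense in which the adversary's objective lower-bounds I(S;Z).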
[0113] In another implementation, the adversarial DNN block is configured to learn the trainable variables by alternating updates of gradient ascent and gradient descent, such that the set of latent vectors is minimally correlated with the combination of perturbation variations, wherein the adversarial DNN block is also configured to minimize the discrepancy between the encoder DNN block and the decoder DNN block (referred to as the cycle consistency loss).
[0114] Semi-supervised learning: categorical sampling
[0115] Graphical Models for Semi-Supervised Learning: For typical physiological datasets, especially in the testing phase of an HMI system deployed with new users, the nuisance value S, such as a subject ID or session ID, may not always be available, which necessitates semi-supervised methods. Some graphical models are well suited to this kind of semi-supervised training. For example, among the Bayesian graphical models in Figure 3, models C / E / G / I require the nuisance S to reproduce X. If no ground-truth labels for S are available, we need to marginalize S over all possible categories for the decoder DNN. Even for other Bayesian models, the corresponding inference factor graphs in Figure 4 may be inconvenient in the semi-supervised setting. Specifically, models Ez / Fz / Jz / Kz infer S at the end node, whereas the other inference models use the inferred S for the subsequent inference of other parameters. If S is missing or unknown in the semi-supervised setting, the inference graphs with S at an intermediate node are inconvenient because we need to sample over all possible nuisance categories. For example, the model Kz shown in Figure 5 does not require S marginalization, so it is readily applicable to semi-supervised datasets.
[0116] Variational Categorical Reparameterization: In one implementation, variational sampling is used with the reparameterization technique for latent variables whose prior is an isotropic normal distribution, and with the Gumbel-Softmax technique, based on a random number generator, for categorical variables of unknown perturbation variations and task labels, to produce near-one-hot vectors. Specifically, to address the categorical sampling problem, we can use the Gumbel-Softmax reparameterization technique, which allows a differentiable approximation of one-hot encoding. Let [π_1, π_2, ..., π_|S|] denote the target probability mass function of the categorical variable S, and let g_1, g_2, ..., g_|S| be independent and identically distributed samples drawn from the Gumbel distribution Gumbel(0,1). A Gumbel(0,1) sample is obtained by drawing a uniform random variable u ∼ Uniform(0,1) and computing g = −log(−log(u)). Then, an |S|-dimensional vector ŝ = [ŝ_1, ŝ_2, ..., ŝ_|S|] is generated according to the following formula.
[0117] ŝ_k = exp((log π_k + g_k)/τ) / Σ_{i=1}^{|S|} exp((log π_i + g_i)/τ), k = 1, 2, ..., |S|, (17)
[0118] where τ > 0 is the softmax temperature. As τ approaches 0, samples from the Gumbel-Softmax distribution become one-hot, and the distribution becomes identical to the target categorical distribution. The temperature τ is typically decreased across training epochs as an annealing technique (e.g., with exponential decay specified by a scheduling policy).
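The sampling procedure just described can be sketched in a few lines. The function names, the decay constants in the annealing schedule, and the example probabilities are hypothetical; the schedule is one possible exponential decay, not a schedule prescribed by the text.

```python
import math
import random


def gumbel_softmax(probs, tau, rng):
    """Draw one relaxed one-hot sample for a categorical distribution `probs`.

    All entries of `probs` are assumed to be strictly positive.
    """
    logits = []
    for p in probs:
        u = max(rng.random(), 1e-300)          # guard against log(0)
        g = -math.log(-math.log(u))            # Gumbel(0, 1) sample
        logits.append((math.log(p) + g) / tau)
    peak = max(logits)                         # subtract the max for stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def annealed_tau(epoch, tau0=1.0, decay=0.9, tau_min=0.1):
    """Hypothetical exponential temperature decay with a lower floor."""
    return max(tau_min, tau0 * decay ** epoch)


rng = random.Random(0)
target = [0.6, 0.3, 0.1]
sample = gumbel_softmax(target, tau=annealed_tau(5), rng=rng)

# The argmax of a Gumbel-Softmax sample follows the target distribution,
# which can be checked empirically.
draws = [gumbel_softmax(target, 0.5, rng) for _ in range(5000)]
freq_first = sum(1 for d in draws if d.index(max(d)) == 0) / len(draws)
```

At lower temperatures the samples concentrate near the corners of the simplex, which is why the temperature is typically annealed rather than fixed at a small value from the start.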
[0119] Figure 11 is a block diagram illustrating an example of a system 500 for automatically constructing an artificial neural network architecture, according to some embodiments of the present disclosure. System 500 includes an interface and data link set 105 configured to receive and transmit signals, at least one processor 120, a memory (or set of memory banks) 130, and a storage unit 140. The interface and data link set 105 may include a human-machine interface (HMI) 110 and a network interface controller 150. The processor 120, in conjunction with the memory 130, loads and executes the computer-executable programs and algorithms stored in the storage unit 140. These may include a reconfigurable deep neural network (DNN) 141, hyperparameters 142, scheduling criteria 143, forward / backward data 144, a temporary cache 145, a Bayes-Ball algorithm 146, and an AutoBayes algorithm 147.
[0120] System 500 can receive signals via the interface and data link set 105. The signals can be datasets of training data, validation data, and test data, and they include a set of random number factors in a multidimensional signal X, wherein some of the random number factors are associated with the task labels Y to be identified and with the disturbance variation S.
[0121] In some cases, each reconfigurable DNN block 141 is configured to encode the multidimensional signal X into latent variables Z, decode the latent variables Z to reconstruct the multidimensional signal X, classify the task label Y, estimate the perturbation variation S, adversarially estimate the perturbation variation S, or select a graphical model. In this case, the memory 130 also stores hyperparameters, trainable variables, intermediate neuron signals, and temporary computation values (including forward-pass signals and backward-pass gradients).
[0122] The at least one processor 120, in connection with the interface and the memory 130, is configured to submit the signals and datasets to the reconfigurable DNN blocks 141. Furthermore, the at least one processor 120 performs Bayesian graph exploration using the Bayes-Ball algorithm 146 to reconfigure the DNN blocks, pruning redundant links to achieve compactness by modifying the hyperparameters 142 in the memory 130.
[0123] System 500 can apply the analysis of user physiological data to the design of human-machine interface (HMI). System 500 can receive physiological data 195B as user physiological data via network 190 and interface and data link set 105. In some embodiments, system 500 can receive electroencephalogram (EEG) and electromyography (EMG) from sensor set 111 as user physiological data.
[0124] The embodiments of the present invention described above can be implemented in any of a variety of ways. For example, the embodiments can be implemented using hardware, software, or a combination thereof. When implemented in software, the software code can execute on any suitable processor or set of processors (whether located in a single computer or distributed across multiple computers). These processors can be implemented as integrated circuits, with one or more processors in an integrated circuit assembly. However, the processors can be implemented using circuits of any suitable format.
[0125] Furthermore, embodiments of the present invention can be specifically implemented as a method, examples of which have been provided. Actions performed as part of this method can be ordered in any suitable manner. Therefore, embodiments can be constructed that perform actions in a different order than those shown, which may include performing some actions simultaneously, although they are shown as sequential actions in the illustrative embodiments.
[0126] The use of ordinal numbers such as "first" and "second" to modify claim elements in claims does not imply any priority or order of one claim element over another, or the temporal order of the execution of method actions. Rather, it serves only as a label to distinguish one claim element with a specific name from another element with the same name (but using ordinal numbers), thus differentiating claim elements.
[0127] Although the invention has been described by way of example of preferred embodiments, it will be understood that various other adjustments and modifications may be made within the spirit and scope of the invention.
[0128] Therefore, the purpose of the appended claims is to cover all such variations and modifications that fall within the true spirit and scope of the invention.
Claims
1. A system for automatically constructing artificial neural network architectures, the system being used to design human-machine interface systems, the system comprising:
an interface and data link set (105) configured to receive and transmit signals related to images, electroencephalograms, electromyograms, temperature, heart rate, or combinations thereof, wherein the signals include datasets of training data, validation data, and test data, and wherein the signals include a combination of sensor measurements and a set of random number factors in multidimensional signals X, wherein a portion of the random number factors is associated with task labels Y to be identified and with disturbance variations S;
a set of memory banks (130) storing a set of reconfigurable deep neural network (DNN) blocks (141), wherein each reconfigurable DNN block (141) is configured to encode the multidimensional signals X into multiple latent variables Z, to decode the latent variables Z to reconstruct the multidimensional signals X, to classify the task labels Y, to estimate the disturbance variations S, to adversarially estimate the disturbance variations S, or to select a graphical model, wherein the memory banks (130) further include hyperparameters (142), trainable variables, intermediate neuron signals, and temporary computed values including forward-pass signals and backward-pass gradients (144); and
at least one processor (120) that, in connection with the interface (105) and the memory banks (130), is configured to submit the signals and the datasets to the reconfigurable DNN blocks (141), wherein the at least one processor (120) is configured to perform a Bayesian graph exploration using a Bayes-Ball algorithm (146) to reconfigure the DNN blocks (141) such that redundant links are pruned to achieve compactness by modifying the hyperparameters (142) in the memory banks (130),
wherein the dataset includes a combination of sensor measurements and also includes: media data, including images, photographs, films, text, letters, voice, music, audio, and speech; physical data, including radio waves, optical signals, electrical pulses, temperature, pressure, acceleration, velocity, vibration, and force; and physiological data, including heart rate, blood pressure, weight, water content, electroencephalogram (EEG), electromyography (EMG), electrocardiogram (ECG), kineography, electrooculography (EOG), skin conductance, magnetoencephalography (MEG), and electrocorticography (ECoG).
2. The system according to claim 1, wherein the at least one processor (120) performs the following steps:
modifying the hyperparameters (142) to specify a set of training schedules, a set of internal layers of the reconfigurable DNN blocks (141), and a set of criteria for the underlying dataset;
creating a fully connected directed Bayesian graph that links multiple nodes with graph edges in a specific permutation order based on the set of criteria, wherein the graph nodes are associated with the random number factors of the multidimensional signals X, the task labels Y, the disturbance variations S, and the latent variables Z;
pruning a specific combination of graph edges in the permuted fully connected Bayesian graph according to a specified hypothetical Bayesian graph model that represents the random behavior of the dataset;
listing the conditional independence relationships between two disjoint nodes in the pruned Bayesian graph using the Bayes-Ball algorithm (146);
creating another fully connected directed factor graph, originating from the node associated with the data signals X, to infer the other nodes;
pruning redundant links in the fully connected factor graph based on the independence list to make the node connectivity compact;
merging the pruned Bayesian graph and the pruned factor graph so that the generative model and the inference model are consistent with the hypothesized Bayesian graph model;
attaching an adversarial reconfigurable DNN block (141) to those latent nodes Z that are independent of some disturbance variations S according to the independence list;
assigning other reconfigurable DNN blocks (141) to an encoder, a decoder, a nuisance estimator, and a task classifier based on the link connectivity specified by the merged factor graph, with link concatenation used to feed multiple data;
training all the reconfigurable DNN blocks (141), connected as constructed, with variational sampling and gradient methods for encoding, decoding, estimation, classification, adversarial estimation, and model selection according to the specified training schedule;
selecting a graph model by a model selector DNN based on the outputs of all the reconfigurable DNN blocks (141) for the validation data;
repeating the above steps according to the specified schedule; and
testing the trained reconfigurable DNN blocks (141) on the test data and on newly incoming data in operation, for perturbation-robust transfer.
3. The system according to claim 2, wherein the variational sampling is used with the reparameterization technique for the latent variables, which have an isotropic normal distribution as their prior distribution, and with the Gumbel-Softmax technique, based on a random number generator and a softmax temperature, for the categorical variables of unknown perturbation variations and task labels, to produce near-one-hot vectors.
4. The system according to claim 2, wherein the link concatenation further includes a step of multidimensional tensor projection using multiple trainable linear filters to convert low-dimensional signals for dimension-mismatched links.
5. The system according to claim 2, wherein, The model selection also includes the step of weighted integration and selection of multiple outputs of the hypothetical graphical model based on a model selector DNN block that takes into account the model consensus, attention mechanism and cycle consistency of the encoder / decoder DNN blocks.
6. The system according to claim 1, wherein the reconfigurable DNN block is configured with a combination of fully connected layers, convolutional layers, graph convolutional layers, recurrent layers, loopy connections, skip connections, and inception layers, with a set of nonlinear activations including rectified linear variants, hyperbolic tangent, sigmoid, gated linear, softmax, and thresholding, and is regularized using a combination of dropout, swap-out, zone-out, block-out, drop-connect, noise injection, shaking, and batch normalization, wherein the layer parameters are further quantized to reduce the memory size, as specified by a plurality of hyperparameters (142) to be tuned by the processor (120).
7. The system according to claim 2, wherein, The training execution uses the training data to update the trainable parameters of the reconfigurable DNN block, such that the output of the reconfigurable DNN block provides a smaller loss value with a combination of objective functions, wherein the objective function further includes a combination of mean squared error, cross-entropy, structural similarity, negative log-likelihood, absolute error, cross-covariance, clustering loss, divergence, hinge loss, Huber loss, negative sampling, Wasserstein distance, and triplet loss, wherein the loss function is weighted by multiple regularization coefficients adjusted according to a specified training schedule.
8. The system according to claim 2, wherein the gradient method employs a combination of stochastic gradient descent, adaptive momentum (Adam), AdaGrad, AdaBound, Nesterov accelerated gradient, and root mean square propagation (RMSprop) to optimize the trainable parameters of the reconfigurable DNN block.
9. The system according to claim 1, wherein, The disturbance variations were grouped into different factors, including a set of subject identifiers, session numbers, biological states, environmental states, sensor states, locations, orientations, sampling rates, times, and sensitivity.
10. The system according to claim 1, wherein each of the reconfigurable DNN blocks also includes hyperparameters (142) specifying a set of layers with a set of artificial neuron nodes, wherein pairs of neuron nodes from neighboring layers are interconnected by multiple trainable variables and activation functions to pass signals from the previous layer to the next layer.
11. The system according to claim 1, wherein the disturbance variations S are further decomposed, according to a combination of supervised, semi-supervised, and unsupervised settings, into multiple factors S_1, S_2, ..., S_N used as multi-domain side information, and wherein the latent variables are further decomposed into multiple factors Z_1, Z_2, ..., Z_L used as disentangled feature vectors.
12. The system according to claim 2, wherein the step of modifying the hyperparameters (142) employs a combination of reinforcement learning, evolutionary strategies, differential evolution, particle swarm optimization, genetic algorithms, annealing, Bayesian optimization, hyperband, and multi-objective Lamarckian evolution to explore different combinations of discrete and continuous hyperparameter (142) values.
13. The system according to claim 2, wherein, The testing step also includes refining the trained reconfigurable DNN block by unfreezing the combination of the trainable variables to adapt the reconfigurable DNN block to a post-training step with new perturbation variations on a new dataset.
14. The system according to claim 2, wherein, The adversarial reconfigurable DNN block is configured to learn the trainable variables using alternating updates of gradient ascent and gradient descent, such that the latent vector set is minimally correlated with the combination of the perturbation changes, wherein the adversarial reconfigurable DNN block is further configured to minimize the difference between the encoder DNN block and the decoder DNN block.
15. The system according to claim 2, wherein, The training schedule set includes strategies for adaptively controlling the learning rate, regularizing weights, factoring and permuting, and pruning low-priority links by measuring the difference between the training data and the validation data through belief propagation.