A method and system for managing ultra-large-scale neural simulation data based on zero-storage virtual instantiation
By dividing neurons into four states and utilizing a blueprint database and a deterministic pseudo-random number generator, the memory bottleneck problem of large-scale neuron simulation systems is solved, achieving efficient storage resource management and hardware cost optimization, and supporting whole-brain simulation of 86 billion neurons.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- 沈青雷
- Filing Date
- 2026-03-14
- Publication Date
- 2026-06-19
AI Technical Summary
Existing large-scale neuron simulation systems suffer from memory bottlenecks caused by full instantiation: quiescent neurons waste storage resources, hardware costs grow linearly with scale, the sparse activity of the brain goes unexploited, and neuron state information cannot be recovered once released.
The zero-storage virtual instantiation data management method divides neurons into four states: virtual, cold storage, dormant, and active. It uses a blueprint database to store type templates and pseudo-random number seeds, reconstructs neuron state through instantaneous instantiation, and dynamically allocates storage resources by combining a deterministic pseudo-random number generator with a multi-level storage management strategy.
It reduces the graphics processing unit memory requirement from about 5.85TB to about 292GB while keeping the metadata of virtual neurons within a few GB of index space, exploits sparse activity to optimize storage usage, and guarantees deterministic state reconstruction and flexibility in hardware scale.
Figure CN122242220A_ABST
Description
Technical Field
[0001] This invention relates to the interdisciplinary field of computational neuroscience and high-performance computing, and in particular to a data management method and system for implementing large-scale neuronal whole-brain simulation on a heterogeneous computing platform with graphics processing units.
Background Technology
[0002] Whole-brain simulation is an important research direction in computational neuroscience. The human brain contains approximately 86 billion neurons, with an average of about 7,000 synaptic connections per neuron, totaling about 600 trillion synapses. Existing large-scale neuron simulation systems mainly employ a full instantiation strategy, meaning that each neuron is always kept in a complete state in memory.
[0003] Taking the adaptive exponential integrate-and-fire (AdEx) model as an example, each neuron's state requires approximately 68 bytes of storage. Instantiating all 86 billion neurons would therefore require approximately 5.85TB, far exceeding the memory capacity of any current single machine or small graphics processing unit cluster. If a complete ion channel model is used, each neuron requires approximately 1500 bytes, for a total of approximately 129TB.
[0004] Existing neural simulation systems have the following shortcomings. First, all neurons occupy the same level of storage resources regardless of whether they are active, so a large number of silent neurons waste graphics processing unit memory. Second, there is no mechanism for allocating storage dynamically according to activity, so hardware costs rise linearly as the simulation scales up. Third, they fail to exploit the biological fact that only 1% to 5% of the brain's neurons are firing at any given time. Fourth, once a neuron's state is released, its information is permanently lost, because it cannot be deterministically reconstructed from templates and seeds.
Summary of the Invention
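The storage totals in [0003] follow directly from multiplying the neuron count by the per-neuron state size. A minimal Python check, assuming decimal units (1 TB = 10^12 bytes):

```python
# Back-of-the-envelope check of the full-instantiation totals in [0003].
NUM_NEURONS = 86_000_000_000      # ~86 billion neurons in the human brain
BYTES_ADEX = 68                   # per-neuron state, AdEx model
BYTES_ION_CHANNEL = 1500          # per-neuron state, complete ion channel model


def full_instantiation_tb(bytes_per_neuron: int, n: int = NUM_NEURONS) -> float:
    """Total storage in decimal terabytes when every neuron is kept resident."""
    return n * bytes_per_neuron / 1e12


print(f"AdEx:        {full_instantiation_tb(BYTES_ADEX):.2f} TB")
print(f"Ion channel: {full_instantiation_tb(BYTES_ION_CHANNEL):.0f} TB")
```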
[0005] The purpose of this invention is to solve the memory bottleneck problem caused by full instantiation in existing large-scale neuron simulation systems, and to provide a data management method and system based on zero-storage virtual instantiation, making it possible to achieve whole-brain simulation of 86 billion neurons on limited graphics processing unit hardware resources.
[0006] To achieve the above objectives, this invention provides a data management method for large-scale neuron simulations, comprising the following steps:
[0007] Step (a): Mark each of the N neuron entities in the neural simulation system with one of four existence states: virtual state, cold storage state, dormant state, or active state.
[0008] Step (b): A neuron entity in the virtual state stores no dynamic state data in any physical storage medium; only its type template identifier and pseudo-random number seed are retained in the blueprint database.
[0009] Step (c): An active neuron entity maintains complete dynamic state data in the graphics processing unit's memory, the dynamic state data including membrane potential, ion channel state, and adaptation variables.
[0010] Step (d): When a signal reaches a target neuron in the virtual state, an immediate instantiation operation is performed. The immediate instantiation operation includes: querying the corresponding neuron type template from the blueprint database based on the type template identifier to obtain the default parameter set for that type; using a deterministic pseudo-random number generator with the pseudo-random number seed as input to generate parameter perturbation values; combining the default parameter set with the parameter perturbation values to generate a complete neuron state vector; and allocating the neuron state vector to the graphics processing unit's video memory, causing the neuron to transition to the active state.
[0011] Step (e): When the active neuron does not receive a signal or generate firing activity within a preset time threshold, it is downgraded step by step according to the path of active state, dormant state, cold storage state, and virtual state. When downgraded to virtual state, all physical storage space is released, and only the type template identifier and the pseudo-random number seed are retained.
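The four existence states and the demotion order of step (e) can be sketched as a small ordered enumeration. This is a minimal Python illustration, not the patented implementation: the names `ExistenceState`, `demote`, and `demotion_path` are hypothetical, and the sketch models only the one-step-at-a-time demotion order, not the idle-time threshold that triggers it.

```python
from enum import IntEnum
from typing import List, Optional


class ExistenceState(IntEnum):
    """Hypothetical encoding: higher value = more resident, more storage used."""
    VIRTUAL = 0   # no dynamic state; only template id + seed in the blueprint index
    COLD = 1      # minimal record in non-volatile memory
    DORMANT = 2   # simplified state in CPU main memory
    ACTIVE = 3    # full state vector in GPU video memory


def demote(state: ExistenceState) -> Optional[ExistenceState]:
    """One step along active -> dormant -> cold -> virtual; None if already virtual."""
    if state == ExistenceState.VIRTUAL:
        return None
    return ExistenceState(state - 1)


def demotion_path(start: ExistenceState) -> List[ExistenceState]:
    """Every state a neuron passes through when demoted all the way to virtual."""
    path = [start]
    while (nxt := demote(path[-1])) is not None:
        path.append(nxt)
    return path


print([s.name for s in demotion_path(ExistenceState.ACTIVE)])
```

Ordering the enumeration by residency lets demotion be a simple decrement, so the step-by-step path cannot skip a level.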
[0012] Furthermore, the blueprint database is a read-only database, and the neuron type template includes membrane capacitance, leakage conductance, leakage reversal potential, threshold potential, threshold sharpness parameter, adaptation time constant, and adaptation current increment. The neuron type template remains unchanged during simulation.
[0013] Furthermore, the deterministic pseudo-random number generator is a counter-based pseudo-random number generator, and the pseudo-random number seed is determined by the XOR operation of the neuron's globally unique identifier and the global seed. For the same neuron identifier and the same global seed, regardless of the time and location of the instantiation operation, the neuron state vector generated by the instantaneous instantiation operation is exactly the same.
[0014] Furthermore, the specific calculation formula for the parameter perturbation value is: param_i = default_i × (1.0 + noise_level × PRNG(seed, i)), where param_i is the reconstructed value of the i-th parameter, default_i is the default value of the i-th parameter, noise_level is the globally configured noise level coefficient, PRNG is the deterministic pseudo-random number generator, seed is the pseudo-random number seed, and i is the parameter index.
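The seed derivation of [0013] and the perturbation formula of [0014] can be combined into one reconstruction routine. The patent does not name the counter-based generator (Philox is a widely used one); purely as an assumption, this sketch uses the i-th SplitMix64 output computed directly from (seed, i), mapped to the interval [-1, 1). The template values and `noise_level = 0.05` are illustrative placeholders, and all function names are hypothetical.

```python
from typing import List, Sequence

MASK64 = (1 << 64) - 1
GOLDEN_GAMMA = 0x9E3779B97F4A7C15


def mix64(z: int) -> int:
    """SplitMix64 output function: a bijective 64-bit mixer."""
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def prng(seed: int, i: int) -> float:
    """Counter-based draw: the i-th SplitMix64 output for `seed`, mapped to [-1, 1).

    A pure function of (seed, i), so no generator state has to be stored.
    """
    x = mix64((seed + (i + 1) * GOLDEN_GAMMA) & MASK64)
    return (x >> 11) * 2.0 ** -53 * 2.0 - 1.0


def neuron_seed(neuron_id: int, global_seed: int) -> int:
    """Per-neuron seed: XOR of the globally unique neuron id and the global seed."""
    return (neuron_id ^ global_seed) & MASK64


def reconstruct(defaults: Sequence[float], neuron_id: int, global_seed: int,
                noise_level: float = 0.05) -> List[float]:
    """param_i = default_i * (1.0 + noise_level * PRNG(seed, i))."""
    seed = neuron_seed(neuron_id, global_seed)
    return [d * (1.0 + noise_level * prng(seed, i)) for i, d in enumerate(defaults)]


# Illustrative type template (C, g_L, E_L, V_T, Delta_T, tau_w, b); placeholder values.
TEMPLATE = [281.0, 30.0, -70.6, -50.4, 2.0, 144.0, 0.0805]
params = reconstruct(TEMPLATE, neuron_id=42, global_seed=7)
```

Because `prng` depends only on `(seed, i)`, a neuron released to the virtual state and instantiated again later receives bit-identical parameters, which is the determinism property stated in [0013].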
[0015] Furthermore, the dormant neuron entities are stored in the main memory of the central processing unit, maintaining simplified state data, which includes membrane potential values, ion channel state tables, and the last firing timestamp. The cold-stored neuron entities are stored in non-volatile memory, retaining only the neuron identifier, the last firing timestamp, and the weight level marker.
[0016] Furthermore, the step-by-step demotion adopts a two-factor eviction strategy that combines the least recently used strategy with an importance score. The importance score is a weighted combination of firing frequency, number of synaptic modifications, number of participations in global workspace broadcasts, and connectivity.
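[0016] names the factors of the importance score but not their weights or how they are combined with recency. One plausible reading, sketched below under those assumptions, first takes a window of least recently used neurons and then demotes the least important among them. The weights in `WEIGHTS`, the normalization of each factor to [0, 1], and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights; the patent lists the factors but not their weights.
WEIGHTS = {"firing_rate": 0.4, "synaptic_mods": 0.3, "broadcasts": 0.2, "connectivity": 0.1}


@dataclass
class NeuronStats:
    neuron_id: int
    last_access: float     # simulation time of the last received signal or spike
    firing_rate: float     # each factor assumed normalized to [0, 1]
    synaptic_mods: float
    broadcasts: float
    connectivity: float


def importance(s: NeuronStats) -> float:
    """Weighted combination of the four factors named in [0016]."""
    return (WEIGHTS["firing_rate"] * s.firing_rate
            + WEIGHTS["synaptic_mods"] * s.synaptic_mods
            + WEIGHTS["broadcasts"] * s.broadcasts
            + WEIGHTS["connectivity"] * s.connectivity)


def pick_victims(pool: List[NeuronStats], k: int, window: int) -> List[int]:
    """Two-factor choice: the `window` least recently used, then the k least important."""
    lru = sorted(pool, key=lambda s: s.last_access)[:window]
    return [s.neuron_id for s in sorted(lru, key=importance)[:k]]
```

The `window` parameter sets the balance: `window == k` reduces to pure LRU, while `window == len(pool)` ignores recency and ranks by importance alone.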
[0017] Furthermore, it also includes a predictive prefetching step, in which when the neuronal activity rate of a certain brain region exceeds a preset threshold, neurons in the spatial neighborhood of that region that are in a cold storage state or a virtual state are preloaded in batches to a dormant state or an active state.
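A minimal sketch of the prefetch trigger in [0017], assuming brain regions are indexed as cells of a coarse 3-D grid and that a region's spatial neighborhood is its 26 adjacent cells. The grid, the state labels, and the batching helper are hypothetical; the asynchronous promotion itself is omitted.

```python
from typing import Dict, List, Set, Tuple

Region = Tuple[int, int, int]   # hypothetical grid cell standing in for a brain region


def neighborhood(r: Region) -> List[Region]:
    """The 26 grid cells surrounding region r."""
    x, y, z = r
    return [(x + dx, y + dy, z + dz)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
            if (dx, dy, dz) != (0, 0, 0)]


def prefetch_batches(activity: Dict[Region, float], threshold: float,
                     neurons_by_region: Dict[Region, List[int]],
                     state_of: Dict[int, str], batch_size: int) -> List[List[int]]:
    """Group cold/virtual neurons near hot regions into batches for promotion."""
    targets: Set[Region] = set()
    for region, rate in activity.items():
        if rate > threshold:                       # activity rate exceeds the preset threshold
            targets.update(neighborhood(region))
    candidates = [nid for region in sorted(targets)
                  for nid in neurons_by_region.get(region, [])
                  if state_of[nid] in ("cold", "virtual")]
    return [candidates[i:i + batch_size] for i in range(0, len(candidates), batch_size)]
```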
[0018] Furthermore, the active neurons are stored in the graphics processing unit's memory using a structure-of-arrays layout, in which all values of the same data field are stored in one contiguous memory region. The neurons are ordered along a space-filling curve, so that spatially adjacent neurons occupy adjacent positions in memory.
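The layout in [0018] can be illustrated with a structure-of-arrays container whose rows are sorted by a space-filling curve. The patent does not say which curve is used; this sketch assumes a 3-D Morton (Z-order) code on 10-bit integer coordinates and keeps only two hypothetical state fields.

```python
from array import array
from typing import List, Sequence, Tuple


def part1by2(v: int) -> int:
    """Spread the low 10 bits of v so two zero bits separate each original bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton3(x: int, y: int, z: int) -> int:
    """Interleave x, y, z bits into a 30-bit Z-order (Morton) code."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)


class ActiveNeuronsSoA:
    """Structure of arrays: one contiguous array per field, rows in Morton order."""

    def __init__(self, positions: Sequence[Tuple[int, int, int]],
                 v_mem: Sequence[float], w_adapt: Sequence[float]) -> None:
        order: List[int] = sorted(range(len(positions)),
                                  key=lambda i: morton3(*positions[i]))
        self.ids = array("q", order)                            # original index per row
        self.v_mem = array("f", (v_mem[i] for i in order))      # membrane potentials
        self.w_adapt = array("f", (w_adapt[i] for i in order))  # adaptation variables
```

Each field lives in its own contiguous array, so a kernel updating membrane potentials reads `v_mem` with unit stride, and the Morton ordering keeps spatially nearby neurons close together within each array.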
[0019] This invention also provides a data management system for large-scale neuron simulation, comprising: a blueprint management module for storing and managing neuron type templates; a virtual storage index module for maintaining metadata indexes of neurons in a virtual state; an instantiation engine module for performing instantaneous instantiation operations when a neuron in a virtual state receives a signal; a multi-level storage management module for managing four storage levels and automatically performing upgrade and downgrade migration operations; and a downgrade decision module for determining downgraded objects based on a least recently used strategy and importance score.
[0020] The beneficial effects of this invention are as follows. First, through the zero-storage design of the virtual state, the metadata of 86 billion neurons is compressed into a few GB of index space, and because only active neurons reside in video memory, the graphics processing unit memory requirement drops from approximately 5.85TB to approximately 292GB. Second, by exploiting the sparse activity of the brain, only active neurons consume computing resources. Third, the deterministic pseudo-random number generator ensures that the same seed always produces the same parameters, so reconstruction is completely deterministic and repeatable. Fourth, the same codebase can be adapted through configuration parameters to hardware of various scales, from consumer-grade graphics processing units to data center clusters.
Attached Figure Description
[0021] Figure 1 is a schematic diagram of the four-level neuron existence states of this invention.
[0022] Figure 2 is a flowchart of the instantaneous instantiation process for a virtual neuron in this invention.
[0023] Figure 3 is a flowchart of the neuron degradation path in this invention.
[0024] Figure 4 is a chart comparing the scale invariance of the present invention under different hardware configurations.
[0025] Figure 5 is a schematic diagram of the deterministic reconstruction principle of the present invention.
[0026] Figure 6 is a diagram of the data management system architecture of the present invention.
[0027] Figure 7 is a chart comparing graphics processing unit memory usage between this invention and existing technologies.
[0028] Figure 8 is a schematic diagram of the predictive prefetching strategy of the present invention.
Detailed Implementation
[0029] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0030] The core idea of this invention is to define a neuron entity as having four states. Virtual neurons do not occupy any physical storage space, retaining only a type template identifier and a pseudo-random number seed. When a virtual neuron receives a signal and needs to participate in computation, the system instantly reconstructs its complete state using template parameters combined with deterministic random noise. This approach is based on a key insight from neuroscience: at any given time, only about 5% of the 86 billion neurons are in an active computational state.
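This sparse-activity figure is what makes the 292GB video memory requirement possible: only the active fraction of neurons needs a full state vector on the graphics processing unit. A quick check, assuming the 68-byte AdEx state and an active fraction of 5%, the upper end of the 1% to 5% range:

```python
# Memory needed when only the active fraction of neurons keeps a full state vector.
NUM_NEURONS = 86_000_000_000
BYTES_PER_ACTIVE = 68        # AdEx state size per neuron
ACTIVE_FRACTION = 0.05       # assumed: upper end of the 1%-5% active range

full_bytes = NUM_NEURONS * BYTES_PER_ACTIVE
active_bytes = NUM_NEURONS * ACTIVE_FRACTION * BYTES_PER_ACTIVE
print(f"full instantiation:  {full_bytes / 1e12:.2f} TB")
print(f"active neurons only: {active_bytes / 1e9:.0f} GB")
```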
[0031] As shown in Figure 1, this invention defines four neuron existence states that form a hierarchical storage framework:
[0032] Active state: The neuron's state is stored in the graphics processing unit's memory as a complete dynamic state vector, including membrane potential, ion channel states, adaptation variables, and so on. Each neuron occupies 68 to 1500 bytes, depending on the model's precision level. Active neurons participate directly in real-time simulation without any state transition. They are stored in a structure-of-arrays layout and arranged along a space-filling curve to improve the cache hit rate.
[0033] Dormant state: The neuron's state is stored in the central processing unit's main memory as simplified state data of approximately 40 bytes, including the membrane potential value, the ion channel state table, and the timestamp of the last firing. This information is sufficient to restore the neuron to a complete active state within microseconds.
[0034] Cold storage state: Neuron states are stored in non-volatile memory, retaining only the minimum state of approximately 16 bytes, including the neuron identifier, last firing timestamp, and weight level marker. The cold storage layer is managed using a key-value storage engine based on a log-structured merge tree, and each data file is equipped with an independent Bloom filter to accelerate negative lookups.
[0035] Virtual state: The neuron occupies no physical storage space (zero storage). The system retains only its type template identifier and pseudo-random number seed in the blueprint database index. The metadata of all 86 billion virtual neurons can be managed with only a few gigabytes of index space.
[0036] As shown in Figure 2, when a signal reaches a neuron in the virtual state, the system executes the following on-the-fly instantiation process:
[0037] Step 1, Template Query. Extract the type identifier from the neuron identifier and query the corresponding neuron type template in the read-only cache of the blueprint management module to obtain the default parameter set: membrane capacitance, leakage conductance, leakage reversal potential, threshold potential, threshold sharpness parameter, adaptation time constant, and adaptation current increment. Frequently used type templates are cached in the constant memory of the graphics processing unit, with a query latency of approximately 10 nanoseconds.
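The cold tier in [0034] pairs each data file with its own Bloom filter so that a lookup can skip files that certainly do not contain a given neuron. The sketch below is a simplified in-memory stand-in for that idea, not a log-structured merge tree engine: each hypothetical `ColdDataFile` holds a dict plus a Bloom filter using double hashing over a BLAKE2b digest, and files are searched newest first. The sizes (about 10 bits per key, 7 hash functions) and all names are assumptions.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Record = Tuple[float, int]   # (last firing timestamp, weight level marker)


class BloomFilter:
    """Bit array with k hash positions per key; never gives false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = max(1, num_bits)
        self.k = num_hashes
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: int) -> List[int]:
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(key.to_bytes(8, "little"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: int) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: int) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class ColdDataFile:
    """One immutable data file of the cold tier, with its own Bloom filter."""

    def __init__(self, records: Dict[int, Record]) -> None:
        self.records = dict(records)
        self.bloom = BloomFilter(num_bits=10 * len(records), num_hashes=7)
        for neuron_id in self.records:
            self.bloom.add(neuron_id)

    def get(self, neuron_id: int) -> Optional[Record]:
        if not self.bloom.might_contain(neuron_id):
            return None                      # definite miss: file contents never touched
        return self.records.get(neuron_id)   # possible hit: confirm against the data


def cold_lookup(files_newest_first: List[ColdDataFile], neuron_id: int) -> Optional[Record]:
    """Return the newest record for a neuron, or None if no file holds it."""
    for f in files_newest_first:
        record = f.get(neuron_id)
        if record is not None:
            return record
    return None
```

A real engine would add sorted runs, compaction, and on-disk block indexes; the point here is only that `might_contain` returning False lets a lookup skip a file without reading it.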
[0038] Step 2, State Initialization. Load default parameters from the neuron type template to populate the state structure, and select the model precision based on the ion channel hierarchy field. Initialize the membrane potential to the resting potential.
[0039] Step 3, Deterministic Random Noise Injection. A counter-based deterministic pseudo-random number generator is used, taking the seed obtained by XORing the neuron identifier and the global seed as input, to apply a deterministic perturbation to each parameter. The calculation formula is param_i = default_i × (1.0 + noise_level × PRNG(seed, i)), where noise_level is the globally configured noise level coefficient. This process guarantees that for the same neuron identifier and the same global seed, the reconstructed parameter vector is exactly the same regardless of when or where instantiation is performed.
[0040] Step 4, Memory Allocation. Check the available memory of the graphics processing unit (GPU); if it is full, trigger the degradation strategy to free space. Then allocate a new slot in the structure-of-arrays data structure, transfer the initialized state data to the GPU, and update the metadata index.
[0041] As shown in Figure 3, when a neuron ceases to be active, it is demoted step by step along the following path: active to dormant, dormant to cold storage, and cold storage to virtual. The demotion decision is based on the idle time since the last firing, the importance score, the current capacity pressure on the storage layer, and the least recently used ranking. A neuron may be demoted from the cold storage state to the virtual state only if its weight increment is zero, meaning it has never undergone learning; if the weight increment is nonzero, the neuron remains in the cold storage state.
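Step 4 can be pictured as a fixed pool of slots in the structure-of-arrays buffer, with the degradation strategy invoked only when no slot is free. A minimal sketch: `GpuSlotPool` and the `evict` callback (standing in for the degradation decision module) are hypothetical, and the host-to-device copy of the state data is omitted.

```python
from typing import Callable, Dict, List


class GpuSlotPool:
    """Hypothetical fixed-capacity pool of structure-of-arrays slots on the GPU."""

    def __init__(self, capacity: int) -> None:
        self.free: List[int] = list(range(capacity - 1, -1, -1))  # free-slot stack, slot 0 on top
        self.slot_of: Dict[int, int] = {}                          # metadata index: neuron id -> slot

    def allocate(self, neuron_id: int, evict: Callable[[Dict[int, int]], int]) -> int:
        if not self.free:
            victim = evict(self.slot_of)                # degradation strategy picks a neuron
            self.free.append(self.slot_of.pop(victim))  # its slot is released for reuse
        slot = self.free.pop()
        self.slot_of[neuron_id] = slot                  # update the metadata index
        return slot
```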
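The demotion rules of [0041] can be sketched as a single step function. The idle-time threshold is a free parameter here, the state labels are plain strings for brevity, and the function name is hypothetical; importance scores and capacity pressure are left out of this sketch.

```python
# One demotion step; the dict encodes the active -> dormant -> cold -> virtual path.
DEMOTION = {"active": "dormant", "dormant": "cold", "cold": "virtual"}


def demote_if_idle(state: str, idle_ms: float, idle_threshold_ms: float,
                   weight_delta: float) -> str:
    """Return the neuron's next state under the idle-time and weight rules."""
    if state not in DEMOTION or idle_ms < idle_threshold_ms:
        return state                 # already virtual, or not idle long enough
    if state == "cold" and weight_delta != 0.0:
        return state                 # learned weights cannot be regenerated from template + seed
    return DEMOTION[state]
```

The weight-increment check is what keeps demotion lossless: a virtual neuron is rebuilt only from its template and seed, so any neuron whose weights have changed must stay in a tier that still stores them.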
[0042] Example 1: With a hardware configuration including a consumer-grade graphics processing unit (GPU) with 24GB of video memory, 64GB of CPU memory, and 2TB of non-volatile memory, the maximum capacity is set at 5 million neurons in active state, 20 million neurons in dormant state, and 80 million neurons in cold storage state. The 5 million active neurons occupy approximately 340MB of GPU video memory, the 20 million dormant neurons occupy approximately 800MB of CPU memory, and the 80 million cold storage neurons occupy approximately 1.28GB of non-volatile memory; the remainder is entirely virtual neurons.
[0043] Example 2: With a hardware configuration of four graphics processing units (GPUs) with 80GB each, 512GB of CPU memory, and 8TB of non-volatile memory, the maximum capacities are set at 200 million neurons in the active state, 800 million in the dormant state, and 3.2 billion in the cold storage state. The 200 million active neurons occupy approximately 13.6GB of GPU memory, and the remaining approximately 81.8 billion neurons, about 95.1% of the 86 billion, are virtual.
[0044] Example 3: Virtual neuron instantiation latency test. 100 virtual neurons simultaneously receive input signals that trigger instantiation. The time taken by each step is: template query, approximately 30 nanoseconds; state initialization, approximately 200 nanoseconds; noise injection, approximately 50 nanoseconds; memory slot allocation, approximately 5 microseconds; and data transfer, approximately 2 microseconds. The total is approximately 7.3 microseconds (under 8 microseconds), far less than the 0.1-millisecond simulation time step.
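The memory figures in Examples 1 and 2 follow from the per-neuron sizes given for each tier (68, 40, and 16 bytes). A short check, using the 68-byte lower bound for active neurons; exact arithmetic gives about 81.8 billion virtual neurons in Example 2, roughly 95.1% of the total.

```python
from typing import Dict, Tuple

NUM_NEURONS = 86_000_000_000
BYTES = {"active": 68, "dormant": 40, "cold": 16}   # per-neuron bytes in each tier


def footprint(caps: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Bytes used per tier at full capacity, and the count left in the virtual state."""
    usage = {tier: caps[tier] * BYTES[tier] for tier in BYTES}
    virtual = NUM_NEURONS - sum(caps.values())
    return usage, virtual


ex1 = {"active": 5_000_000, "dormant": 20_000_000, "cold": 80_000_000}
ex2 = {"active": 200_000_000, "dormant": 800_000_000, "cold": 3_200_000_000}

for name, caps in (("Example 1", ex1), ("Example 2", ex2)):
    usage, virtual = footprint(caps)
    print(name, {t: f"{b / 1e9:.2f} GB" for t, b in usage.items()},
          f"virtual: {virtual / NUM_NEURONS:.1%}")
```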
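The per-step latencies in Example 3 can be totaled and compared against the 0.1 ms simulation time step:

```python
# Per-step instantiation latencies from Example 3, in nanoseconds.
STEPS_NS = {"template query": 30, "state initialization": 200, "noise injection": 50,
            "slot allocation": 5_000, "data transfer": 2_000}
TIME_STEP_NS = 100_000   # 0.1 ms simulation time step

total_ns = sum(STEPS_NS.values())
print(f"total {total_ns / 1000:.2f} us = {total_ns / TIME_STEP_NS:.1%} of one time step")
```

Slot allocation and data transfer account for almost all of the cost, so the template lookup and noise injection are negligible by comparison.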
[0045] As shown in Figure 6, the data management system of this invention includes a blueprint management module, a virtual storage index module, an instantiation engine module, a multi-level storage management module, and a degradation decision module. The blueprint management module stores templates for various neuron types and loads them into a read-only memory area upon system startup. The virtual storage index module maintains the type template identifier and pseudo-random number seed for each virtual neuron. When a virtual neuron receives a signal, the instantiation engine module queries the default parameters from the blueprint management module, injects parameter perturbations using a deterministic pseudo-random number generator, and generates a complete neuron state vector. The multi-level storage management module manages four storage levels and automatically performs upgrade and degradation migration operations. The degradation decision module determines the degradation targets based on a least recently used strategy and importance scoring.
[0046] As shown in Figure 8, the present invention also includes a predictive prefetching mechanism. When the neuronal activity rate of a certain brain region exceeds a preset threshold, the prefetching scheduling module obtains the spatial neighborhood of the current hotspot region and asynchronously instantiates, in batches, the neighborhood's neurons that are in the virtual or cold storage state into the active or dormant state, without blocking the main simulation loop.
[0047] The above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A data management method for ultra-large-scale neuron simulation, characterized in that it includes the following steps:
(a) labeling each of the N neuron entities in the neural simulation system with one of four existence states: virtual state, cold storage state, dormant state, and active state;
(b) a neuron entity in the virtual state stores no dynamic state data in any physical storage medium, and only the neuron's type template identifier and pseudo-random number seed are retained in the blueprint database;
(c) an active neuron entity maintains complete dynamic state data in the graphics processing unit's memory, the dynamic state data including membrane potential, ion channel state, and adaptation variables;
(d) when a signal reaches a target neuron in the virtual state, an immediate instantiation operation is performed, the immediate instantiation operation including:
(d1) querying the corresponding neuron type template from the blueprint database according to the type template identifier, and obtaining the default parameter set for that type;
(d2) using a deterministic pseudo-random number generator, with the pseudo-random number seed as input, to generate parameter perturbation values;
(d3) combining the default parameter set with the parameter perturbation values to generate a complete neuron state vector;
(d4) allocating the neuron state vector to the video memory of the graphics processing unit, so that the neuron transitions to the active state;
(e) when the active neuron does not receive a signal or generate firing activity within a preset time threshold, it is downgraded step by step along the path of active state, dormant state, cold storage state, and virtual state; when downgraded to the virtual state, all physical storage space is released and only the type template identifier and the pseudo-random number seed are retained.
2. The method according to claim 1, characterized in that: The blueprint database is a read-only database. The neuron type template includes membrane capacitance, leakage conductance, leakage reversal potential, threshold potential, threshold sharpness parameter, adaptation time constant, and adaptation current increment. The neuron type template remains unchanged during simulation.
3. The method according to claim 1, characterized in that: The deterministic pseudo-random number generator is a counter-based pseudo-random number generator. The pseudo-random number seed is determined by the XOR operation between the globally unique identifier of the neuron and the global seed. For the same neuron identifier and the same global seed, the neuron state vector generated by the instantaneous instantiation operation is exactly the same regardless of the time and location of the instantiation operation.
4. The method according to claim 1, characterized in that: In step (d2), the parameter perturbation value is a deterministic perturbation within the preset perturbation range of the default parameter, and the specific calculation formula is as follows: param_i = default_i × (1.0 + noise_level × PRNG(seed, i)) Wherein, param_i is the reconstructed value of the i-th parameter, default_i is the default value of the i-th parameter, noise_level is the globally configured noise level coefficient, PRNG is the deterministic pseudo-random number generator, seed is the pseudo-random number seed, and i is the parameter index.
5. The method according to claim 1, characterized in that: The dormant neuron entities are stored in the main memory of the central processing unit, maintaining simplified state data, which includes membrane potential values, ion channel state tables, and the last firing timestamp; the cold-stored neuron entities are stored in non-volatile memory, retaining only the neuron identifier, the last firing timestamp, and the weight level marker.
6. The method according to claim 1, characterized in that: The step-by-step demotion in step (e) adopts a two-factor eviction strategy that combines the least recently used strategy with an importance score. The importance score is a weighted combination of firing frequency, number of synaptic modifications, number of participations in global workspace broadcasts, and connectivity.
7. The method according to claim 1, characterized in that: It also includes a predictive prefetching step, in which when the neuronal activity rate of a certain brain region exceeds a preset threshold, neurons in the spatial neighborhood of that region that are in a cold storage state or virtual state are preloaded in batches to a dormant state or an active state.
8. The method according to claim 1, characterized in that: The active neurons are stored in the graphics processing unit's memory using a structure-of-arrays layout, in which all values of the same data field are stored in one contiguous memory area; the neurons are ordered along a space-filling curve, so that spatially adjacent neurons occupy adjacent positions in memory.
9. The method according to claim 1, characterized in that: The maximum capacity limit for each of the four existence states is specified through a configuration file, enabling the same program code to run under different hardware configurations and achieving scale invariance from consumer-grade graphics processing units to data center graphics processing unit clusters.
10. The method according to claim 1, characterized in that: The blueprint database contains templates for various neuron types, covering major neuron subtypes including pyramidal cells, basket cells, and dendritic cells. Each template contains multiple kinetic parameters and ion channel hierarchical configuration parameters.
11. The method according to claim 1, characterized in that: The cold storage state is managed using a key-value storage engine based on a log-structured merge tree, and each data file is equipped with an independent Bloom filter to accelerate negative lookups.
12. A data management system for ultra-large-scale neuron simulation, characterized in that it includes:
a blueprint management module, used to store and manage neuron type templates, the neuron type templates containing a set of default dynamic parameters for various types of neurons and being loaded into a read-only memory area when the system starts;
a virtual storage index module, used to maintain the metadata index of neurons in the virtual state, the metadata index containing only the type template identifier and pseudo-random number seed of each virtual neuron;
an instantiation engine module, used to perform an instantaneous instantiation operation when a neuron in the virtual state receives a signal, the instantaneous instantiation operation including querying default parameters from the blueprint management module, injecting parameter perturbations using a deterministic pseudo-random number generator, and generating a complete neuron state vector;
a multi-level storage management module, which manages four storage levels (graphics processing unit video memory, central processing unit memory, non-volatile memory, and virtual index) and automatically performs upgrade and downgrade migration operations based on the activity state of neurons; and
a degradation decision module, used to determine, based on the least recently used strategy and importance score, which active neurons should be degraded to lower-level storage, until they are degraded to the virtual state.
13. The system according to claim 12, characterized in that: The deterministic pseudo-random number generator in the instantiation engine module is deployed on the graphics processing unit, supporting the parallel instantiation of multiple neurons across the graphics processing unit's compute threads.
14. The system according to claim 12, characterized in that: It also includes a prefetching scheduling module, which monitors the activity rate of neurons in each brain region. When the activity rate exceeds a preset threshold, it asynchronously instantiates virtual neurons in the spatial neighborhood of that region into the graphics processing unit memory layer in batches.
15. The system according to claim 12, characterized in that: The multi-level storage management module receives capacity parameters for each level through a configuration file, enabling the same system to run active neurons of different scales on graphics processing units equipped with different capacities of video memory.