An end-side dynamic inference method and system of semantic compression and neuromorphic perception

By combining a lightweight convolutional neural network and a neuromorphic storage pool with a dynamic sparse activation module, the resource waste and device security issues in edge inference technology are solved, realizing an edge inference system with instant response and self-evolution.

CN122222012A · Pending · Publication Date: 2026-06-16 · SHENZHEN SAHARA INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN SAHARA INFORMATION TECHNOLOGY CO LTD
Filing Date
2026-03-16
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing edge-side inference technology solutions suffer from large model parameters and high computational complexity, resulting in wasted memory and computing resources. They cannot provide an instant interactive experience, cannot evolve on their own, and cannot perceive device status in real time. This can easily lead to the device continuing to perform high-load calculations when it is overheating or low in battery, causing device damage.

Method used

A lightweight convolutional neural network is used for semantic compression. Combined with a neuromorphic storage pool and a dynamic sparse activation module, the network depth and computing resource allocation are dynamically adjusted through information entropy calculation and hardware status monitoring to achieve self-evolution and security protection.

Benefits of technology

While reducing computing and memory usage, it provides instant responsiveness, adaptive learning capabilities, ensures device safety, avoids hardware damage during overheating or low battery conditions, and guarantees system survivability and service continuity.

✦ Generated by Eureka AI based on patent content.

Figure CN122222012A_ABST

Abstract

An end-side dynamic inference method based on semantic compression and neuromorphic perception. An input preprocessing module receives a natural language sequence and outputs a structured input sequence to a semantic compression network. The semantic compression network, located downstream of the input preprocessing module, maps the structured input sequence into a low-dimensional semantic feature vector and transmits it to a semantic gating unit and a neuromorphic storage pool. The semantic gating unit calculates the information entropy of the low-dimensional semantic feature vector and sends a task complexity level signal to a dynamic sparse activation module. The system uses a lightweight convolutional neural network with less than one tenth of the parameters of a traditional encoder as the semantic compression network at the input end, so that it can map a high-dimensional natural language sequence into a compact low-dimensional semantic feature vector with almost zero delay. This reduces the computational overhead and memory occupation of front-end processing, and the system gains an adaptive learning ability that improves with use.

Description

Technical Field

[0001] This invention belongs to the field of neural perception and reasoning technology, and specifically relates to an end-side dynamic reasoning method and system based on semantic compression and neuromorphic perception.

Background Technology

[0002] With the explosive growth of large language models in natural language processing, deploying inference capabilities to resource-constrained edge environments such as mobile phones, IoT terminals, and embedded devices has become an industry trend. However, existing edge inference technologies face the following technical challenges in practice. First, they generally rely on traditional heavy encoder architectures (such as the standard Transformer encoder or a deep bidirectional LSTM) in the input data processing stage. Such a model has a large number of parameters and a complex computation graph, so converting high-dimensional natural language sequences into semantic feature vectors consumes substantial memory bandwidth and computing power. This causes significant inference delay, which prevents an immediate interactive experience. The large memory footprint also often forces the edge device to swap data frequently or even triggers memory overflow, crowding out the resources available to the subsequent core inference layer. As a result, the system falls into a high-load state early in startup, leaving no room for subsequent dynamic scheduling.

[0003] Secondly, when faced with each new input request, traditional solutions blindly start all or part of a fixed neural network layer for repeated calculations, regardless of whether the task has been processed before. This not only causes a huge waste of computing power and increases unnecessary energy consumption, but also prevents the system from accumulating knowledge over time and lacks the ability to self-evolve.

[0004] Furthermore, existing scheduling algorithms cannot detect in real time a sharp drop in battery power, an abnormal rise in chip core temperature, or current load congestion in the operating system. When the device is in a low-power or overheated state, traditional systems still activate a large number of network layers according to the standards of high-complexity tasks. This can easily cause the device to shut down suddenly, force the system to throttle its frequency and lag, or even cause permanent hardware damage through prolonged high-temperature operation.

Summary of the Invention

[0005] In view of the above situation and to overcome the shortcomings of the prior art, the present invention provides an edge-side dynamic reasoning method and system for semantic compression and neuromorphic perception, so as to at least partially solve the above technical problems.

[0006] The technical solution adopted in this invention is as follows. This invention proposes an end-side dynamic reasoning method based on semantic compression and neuromorphic perception, comprising the following steps:
Step 1: The input preprocessing module receives the natural language sequence and outputs the structured input sequence to the semantic compression network.
Step 2: The semantic compression network, located downstream of the input preprocessing module, maps the structured input sequence into a low-dimensional semantic feature vector and transmits it to the semantic gating unit and the neuromorphic storage pool.
Step 3: The semantic gating unit calculates the information entropy based on the low-dimensional semantic feature vector and sends the task complexity level signal to the dynamic sparse activation module.
Step 4: The neuromorphic storage pool stores historical semantic feature vectors and the corresponding inference results. It compares the current low-dimensional semantic feature vector with the stored historical semantic feature vectors. If the match succeeds, the historical inference result is output directly. If the match fails, a miss signal is generated and sent to the dynamic sparse activation module.
Step 5: The dynamic sparse activation module is connected to the semantic gating unit, the neuromorphic storage pool, and the hardware status monitoring module respectively, and receives the task complexity level signal, the miss signal, and the device physical status parameters to generate a dynamic sparse mask.
Step 6: The core inference layer of the large model receives the dynamic sparse mask, activates only the weight blocks identified by the dynamic sparse mask to process the structured input sequence, and outputs preliminary inference results.

[0007] In one embodiment of the present invention, the semantic gating unit includes an information entropy calculation submodule, which calculates the information entropy of the low-dimensional semantic feature vector according to the formula $H = -\sum_{i} p_i \log p_i$, where $p_i$ is the probability of the $i$-th dimension feature of the semantic feature vector. The semantic gating unit also includes a threshold comparison submodule, which compares the calculated information entropy with a preset low-complexity threshold and a preset high-complexity threshold. If the information entropy is less than the low-complexity threshold, the task is classified as low-complexity; if it is greater than the high-complexity threshold, the task is classified as high-complexity; if it falls between the two thresholds, the task is classified as medium-complexity. The classification result is then used as the task complexity level signal.
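The entropy-based gating described above can be illustrated with a minimal Python sketch. The source does not say how the per-dimension probabilities are derived from the feature vector, so normalizing absolute values to sum to 1 is an assumption made here, as are the function names and the use of the natural logarithm.

```python
import math

def information_entropy(features):
    """Shannon entropy (in nats) of a feature vector.

    Assumption: probabilities are the absolute feature values
    normalized to sum to 1; the patent text does not specify this.
    """
    magnitudes = [abs(x) for x in features]
    total = sum(magnitudes)
    if total == 0.0:
        return 0.0
    probs = [m / total for m in magnitudes]
    # Zero-probability terms contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def complexity_level(entropy, low_threshold, high_threshold):
    """Map an entropy value to the three-level task complexity signal."""
    if entropy < low_threshold:
        return "low"
    if entropy > high_threshold:
        return "high"
    return "medium"
```

A vector concentrated in one dimension yields an entropy of zero, while a uniform vector over n dimensions yields log(n), so the two thresholds would need to be chosen relative to the feature dimension.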

[0008] In one embodiment of the present invention, the neuromorphic storage pool includes a storage array, a fuzzy matching submodule, and a dynamic update submodule. The storage array stores data items, each of which includes a historical semantic feature vector, a historical inference result, a storage timestamp, and a usage frequency count. The fuzzy matching submodule calculates the cosine similarity between the current low-dimensional semantic feature vector and each historical semantic feature vector in the storage array. When the maximum cosine similarity is greater than the dynamic matching threshold τ, a successful match is determined. The dynamic update submodule is connected to the storage array and periodically updates the data items according to a time decay factor and a frequency enhancement factor, deleting data items whose timestamps exceed a preset expiration time or increasing the matching weight of frequently used data items.
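As a rough illustration of the fuzzy matching submodule, the sketch below scans a list of stored items, computes cosine similarity against the current vector, and reports a hit only when the maximum similarity exceeds the threshold τ. The `MemoryItem` record and the linear scan are illustrative assumptions; the source only lists the fields each data item holds.

```python
import math
from dataclasses import dataclass

@dataclass
class MemoryItem:
    # Fields follow the data item described in the text; names are hypothetical.
    vector: list
    result: str
    timestamp: float
    use_count: int = 0

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def fuzzy_match(query, items, tau):
    """Return the best-matching item if its similarity exceeds tau, else None (a miss)."""
    best_item, best_sim = None, -1.0
    for item in items:
        sim = cosine_similarity(query, item.vector)
        if sim > best_sim:
            best_item, best_sim = item, sim
    if best_item is not None and best_sim > tau:
        best_item.use_count += 1  # record the hit for frequency enhancement
        return best_item
    return None
```

Returning `None` plays the role of the miss signal here; a real deployment would likely replace the linear scan with an indexed search once the storage array grows.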

[0009] In one embodiment of the present invention, the hardware status monitoring module collects battery power parameters, chip core temperature parameters, and current computing load parameters in real time through the device's underlying interface; the dynamic sparse activation module has a built-in mapping function generator, which takes the task complexity level signal, the battery power parameters, the chip core temperature parameters, and the current computing load parameters as input variables, and generates the dynamic sparse mask in binary matrix form through a preset nonlinear mapping function. The elements in the binary matrix correspond one-to-one with the weight blocks in the core inference layer of the large model. An element value of 1 indicates that the corresponding weight block is activated, and an element value of 0 indicates that the corresponding weight block is masked.
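The source does not disclose the nonlinear mapping function itself, so the following is only a sketch of one possible shape: a base activation ratio per complexity level, scaled by logistic penalties for low battery and high temperature and a linear penalty for load, then turned into a binary matrix. Every constant, the choice of logistic curves, and the rule of activating the leading blocks in row-major order are assumptions for illustration.

```python
import math

# Hypothetical base activation ratios per task complexity level.
BASE_RATIO = {"low": 0.3, "medium": 0.6, "high": 1.0}

def activation_ratio(level, battery, temperature_c, load):
    """Fraction of weight blocks to activate.

    battery and load are fractions in [0, 1]; temperature_c is in degrees
    Celsius and assumed to be within a realistic device range.
    """
    battery_factor = 1.0 / (1.0 + math.exp(-10.0 * (battery - 0.3)))
    thermal_factor = 1.0 / (1.0 + math.exp(0.3 * (temperature_c - 70.0)))
    load_factor = 1.0 - 0.5 * load
    ratio = BASE_RATIO[level] * battery_factor * thermal_factor * load_factor
    return max(0.0, min(1.0, ratio))

def build_mask(rows, cols, ratio):
    """Binary matrix with round(rows * cols * ratio) ones, filled in row-major order."""
    total = rows * cols
    active = round(total * ratio)
    flat = [1] * active + [0] * (total - active)
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

Each mask element corresponds to one weight block, with 1 meaning activated and 0 meaning masked, as in the text. Which specific blocks are kept is a design choice the source leaves open.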

[0010] In one embodiment of the present invention, the dynamic sparse activation module further includes a protection threshold trigger submodule, which is connected to the hardware status monitoring module and the mapping function generator respectively. When the battery power parameter is lower than a first preset power threshold or the chip core temperature parameter is higher than a first preset temperature threshold, the protection threshold trigger submodule sends a forced power-saving command to the mapping function generator. In response to the forced power-saving command, the mapping function generator forcibly limits the proportion of elements with a value of 1 in the generated dynamic sparse mask to a range of 10% to 15%, regardless of the task complexity indicated by the task complexity level signal.
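The protection override can be sketched as a clamp applied to whatever activation ratio the mapping function requested. The 10% to 15% range comes from the text; the concrete battery and temperature thresholds below are placeholder values, and clamping (rather than, say, always picking a fixed ratio inside the range) is an assumption.

```python
def protected_ratio(requested_ratio, battery, temperature_c,
                    battery_threshold=0.15, temperature_threshold_c=80.0,
                    floor=0.10, ceiling=0.15):
    """Return (ratio, forced).

    When the battery fraction is below its threshold or the temperature is
    above its threshold, the ratio is forced into [floor, ceiling] regardless
    of what was requested. Threshold defaults are hypothetical.
    """
    forced = battery < battery_threshold or temperature_c > temperature_threshold_c
    if not forced:
        return requested_ratio, False
    return min(max(requested_ratio, floor), ceiling), True
```

For example, a high-complexity request of 0.9 on a device at 10% battery would be cut to 0.15, while the same request on a healthy device passes through unchanged.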

[0011] In one embodiment of the present invention, a system for the end-side dynamic reasoning method based on semantic compression and neuromorphic perception includes:
An input preprocessing module, whose input end receives external natural language sequences and whose output end is connected to the input end of the semantic compression network.
A semantic compression network, located on the output path of the input preprocessing module, which is used to convert input data into low-dimensional semantic feature vectors.
A semantic gating unit, whose input is connected to the output of the semantic compression network, which is used to receive the low-dimensional semantic feature vector and output a task complexity level signal.
A neuromorphic storage pool, whose input is connected to the output of the semantic compression network, which stores a historical semantic feature vector library and is used to perform similarity comparison and output matching results or miss signals.
A hardware status monitoring module, located at the bottom layer of the system, which is used to collect and output the physical status parameters of the device in real time.
A dynamic sparse activation module, which has a first input terminal connected to the semantic gating unit, a second input terminal connected to the neuromorphic storage pool, and a third input terminal connected to the hardware status monitoring module, and which is used to generate a dynamic sparse mask by integrating these signals.
A large model core inference layer, whose control end is connected to the output end of the dynamic sparse activation module to receive the dynamic sparse mask and whose calculation end is connected to the semantic compression network, which is used to selectively activate its internal weight blocks according to the dynamic sparse mask, perform inference calculation, and output the result.

[0012] In one embodiment of the present invention, the semantic compression network adopts a lightweight convolutional neural network structure, with a parameter count less than one-tenth of that of a traditional encoder, and is directly embedded in the data path between the input preprocessing module and the semantic gating unit; the neuromorphic storage pool is physically deployed in the cache area of the system memory, and communicates bidirectionally with the semantic compression network and the dynamic sparse activation module via a bus to support the writing and reading of historical data.

[0013] In one embodiment of the present invention, the dynamic sparse activation module includes a signal fusion unit and a mask generation unit. The signal fusion unit is electrically connected to the semantic gating unit, the neuromorphic storage pool, and the hardware state monitoring module, respectively, and is used to perform vector concatenation of the received task complexity level signal, miss signal, and device physical state parameters. The mask generation unit is connected to the output of the signal fusion unit and has a configurable mapping table stored inside, which is used to generate binary control signals that control the switching states of each Transformer encoder layer and decoder layer in the core inference layer of the large model by looking up the table or calculating based on the concatenated vector.
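A minimal sketch of the signal fusion unit and the table-driven mask generation unit might look like the following. The numeric encoding of the complexity level, the contents of `LAYER_TABLE`, the "constrained device" cutoffs, and the 12-layer default are all hypothetical; the source states only that a configurable mapping table produces per-layer binary switch signals from the concatenated vector.

```python
LEVEL_CODE = {"low": 0, "medium": 1, "high": 2}

def fuse_signals(level, miss, battery, temperature_c, load):
    """Concatenate the heterogeneous signals into one flat vector."""
    return [float(LEVEL_CODE[level]), 1.0 if miss else 0.0,
            battery, temperature_c, load]

# Hypothetical configurable table:
# (level code, device constrained?) -> number of layers to switch on.
LAYER_TABLE = {
    (0, False): 4, (1, False): 8, (2, False): 12,
    (0, True): 2, (1, True): 4, (2, True): 6,
}

def layer_switches(fused, num_layers=12):
    """Per-layer binary control signals (1 = layer runs, 0 = layer skipped)."""
    level_code, miss, battery, temperature_c, load = fused
    if miss == 0.0:
        return [0] * num_layers  # storage pool hit: no layer needs to run
    # Assumed cutoffs for treating the device as resource-constrained.
    constrained = battery < 0.3 or temperature_c > 70.0 or load > 0.8
    active = min(LAYER_TABLE[(int(level_code), constrained)], num_layers)
    # Shallow layers are kept first, deeper layers are added as needed.
    return [1] * active + [0] * (num_layers - active)
```

Keeping the table as plain data means the depth policy can be retuned per device without touching the lookup logic, which seems to be the point of making it configurable.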

[0014] In one embodiment of the present invention, an output fusion module is further included. The input of the output fusion module is connected to the output of the neuromorphic storage pool and the output of the large model core inference layer, respectively. When the neuromorphic storage pool outputs a successfully matched historical inference result, the output fusion module uses that result directly as the final output of the system. When the neuromorphic storage pool outputs a miss signal and the large model core inference layer has completed its calculation, the output fusion module receives the preliminary inference result from the large model core inference layer, performs syntax correction and format standardization on it, and uses it as the final output of the system. At the same time, the low-dimensional semantic feature vector generated for this request and the final output of the system are fed back and written into the neuromorphic storage pool.
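The two branches of the output fusion module can be sketched as follows. `normalize_output` is only a stand-in for the syntax correction and format standardization step, which the source does not detail, and the storage pool is modeled as a plain list of (vector, result) pairs for brevity.

```python
def normalize_output(text):
    """Placeholder post-processing: collapse whitespace and ensure terminal punctuation."""
    cleaned = " ".join(text.split())
    if cleaned and cleaned[-1] not in ".!?":
        cleaned += "."
    return cleaned

def fuse_output(cached_result, preliminary_result, feature_vector, pool):
    """Select the final output and write new results back to the pool.

    cached_result is None when the storage pool reported a miss.
    """
    if cached_result is not None:
        return cached_result  # hit: reuse the historical inference result
    final = normalize_output(preliminary_result)
    # Miss: store the new (vector, result) pair so later requests can hit.
    pool.append((list(feature_vector), final))
    return final
```

The write-back on a miss is what lets the pool accumulate results over time, so repeated or similar requests can later be served without running the core inference layer.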

[0015] In one embodiment of the present invention, the hardware status monitoring module includes a power detection submodule, a temperature sensing submodule, and a load monitoring submodule. The power detection submodule is connected to the device's battery management chip, the temperature sensing submodule is connected to the thermistor inside the device's SoC, and the load monitoring submodule is connected to the device's operating system kernel scheduler. The dynamic sparse activation module also integrates a mode switch, which is controlled by an abnormal status flag bit output by the hardware status monitoring module. When the abnormal status flag bit is detected to be valid, the mode switch cuts off the semantic gating unit's control over mask generation, and a preset extreme energy-saving strategy instead directly generates a dynamic sparse mask with a fixed sparsity.

[0016] The beneficial effects of the technical solution of this invention are as follows: This invention employs a lightweight convolutional neural network with fewer than one-tenth the parameters of a traditional encoder as a semantic compression network at the input end. This enables the system to map high-dimensional natural language sequences into compact low-dimensional semantic feature vectors with almost zero latency, reducing the computational overhead and memory usage of front-end processing and laying the foundation for subsequent rapid decision-making. The neuromorphic storage pool leverages the physical advantage of being deployed in a high-speed cache area. Through a fuzzy matching submodule, it calculates the cosine similarity between the current feature and the historical database in real time. Through a self-evolutionary strategy based on time decay and frequency enhancement, the dynamic update submodule ensures that the memory bank always retains high-value data and automatically removes expired information, giving the system an adaptive learning capability that becomes smarter with use.

[0017] This invention introduces a multi-dimensional signal fusion mechanism. The semantic gating unit accurately quantifies task complexity by calculating information entropy, while the hardware status monitoring module comprehensively senses the physical health of the device through three sub-modules: power detection, temperature sensing, and load monitoring. The signal fusion unit in the dynamic sparse activation module performs vector splicing of these two types of heterogeneous data, enabling the mask generation unit to generate refined binary control signals based on a configurable mapping table. This allows for flexible adjustment of network depth according to the difficulty of the task, calling only shallow networks for simple tasks and activating deep networks as needed for complex tasks, thereby minimizing matrix operations and memory access pressure while ensuring accuracy.

[0018] The built-in protection threshold triggering submodule of this invention acts as an absolute safety circuit breaker mechanism. When the battery level is detected to be lower than the preset threshold or the chip temperature is higher than the safety red line, the module immediately sends a forced power-saving command, forcing the mapping function generator to ignore the task complexity and forcibly compress the activation ratio of the dynamic sparse mask to an extremely low range of 10% to 15%. Although the extreme strategy temporarily sacrifices some inference performance, it effectively prevents the device from suddenly shutting down due to over-computation or causing permanent hardware damage due to overheating, ensuring the system's survivability and service continuity in extreme environments.

[0019] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Attached Figure Description

[0020] The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:
Figure 1 is a flowchart of the end-side dynamic reasoning method based on semantic compression and neuromorphic perception proposed in an embodiment of the present invention.
Figure 2 is a first functional diagram of the end-side dynamic reasoning method based on semantic compression and neuromorphic perception proposed in an embodiment of the present invention.
Figure 3 is a second functional diagram of the end-side dynamic reasoning method based on semantic compression and neuromorphic perception proposed in an embodiment of the present invention.
Figure 4 is a schematic diagram of the framework of the end-side dynamic reasoning system based on semantic compression and neuromorphic perception proposed in an embodiment of the present invention.

Detailed Implementation

[0021] Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain the present invention, and should not be construed as limiting the present invention.

[0022] The following describes an end-side dynamic reasoning method and system based on semantic compression and neuromorphic perception according to an embodiment of the present invention, with reference to the accompanying drawings.

[0023] As shown in Figures 1 to 4, this embodiment of the invention provides an end-side dynamic reasoning method based on semantic compression and neuromorphic perception, comprising the following steps:
Step 1: The input preprocessing module receives the natural language sequence and outputs the structured input sequence to the semantic compression network.
Step 2: The semantic compression network, located downstream of the input preprocessing module, maps the structured input sequence into a low-dimensional semantic feature vector and transmits it to the semantic gating unit and the neuromorphic storage pool.
Step 3: The semantic gating unit calculates the information entropy based on the low-dimensional semantic feature vector and sends the task complexity level signal to the dynamic sparse activation module.
Step 4: The neuromorphic storage pool stores historical semantic feature vectors and the corresponding inference results. It compares the current low-dimensional semantic feature vector with the stored historical semantic feature vectors. If the match succeeds, the historical inference result is output directly. If the match fails, a miss signal is generated and sent to the dynamic sparse activation module.
Step 5: The dynamic sparse activation module is connected to the semantic gating unit, the neuromorphic storage pool, and the hardware status monitoring module respectively, and receives the task complexity level signal, the miss signal, and the device physical status parameters to generate a dynamic sparse mask.
Step 6: The core inference layer of the large model receives the dynamic sparse mask, activates only the weight blocks identified by the dynamic sparse mask to process the structured input sequence, and outputs the preliminary inference results.
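The control flow of the six steps above can be summarized in a short orchestration sketch. Each module is passed in as a callable, so the function names and signatures are hypothetical. The sketch runs Steps 3 and 4 sequentially for simplicity, whereas the text describes them as parallel branches.

```python
def run_pipeline(text, preprocess, compress, gate, lookup,
                 read_hardware, make_mask, infer):
    """One request through the pipeline; returns the inference result."""
    sequence = preprocess(text)        # Step 1: structured input sequence
    features = compress(sequence)      # Step 2: low-dimensional feature vector
    level = gate(features)             # Step 3: task complexity level signal
    cached = lookup(features)          # Step 4: storage pool match
    if cached is not None:
        return cached                  # hit: the core inference layer is skipped
    # Step 5: a miss occurred, so build the mask from all three signal sources.
    mask = make_mask(level, True, read_hardware())
    return infer(sequence, mask)       # Step 6: masked inference
```

The early return on a storage pool hit is the property that matters most: no mask is generated and no weight block is activated for a request that matches a stored result.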

[0024] In a specific application of this invention, the system first receives the natural language sequence input by the user through the input preprocessing module. The module immediately performs deep structured cleaning and standardization on the original text. The preprocessing module filters out data noise and extracts key semantic units. Then, the high-quality structured input sequence is sent to the downstream semantic compression network. The semantic compression network, as the feature abstraction core of the system, uses a lightweight neural network architecture to map the high-dimensional sparse structured sequence into a compact low-dimensional semantic feature vector. The generated low-dimensional semantic feature vector is then simultaneously distributed to the two parallel processing branches: the semantic gating unit and the neuromorphic storage pool.

[0025] Upon receiving a feature vector, the semantic gating unit immediately initiates a cognitive evaluation mechanism. Its internal information entropy calculation submodule deeply analyzes the probability distribution of each dimension of the feature vector. By quantifying the uncertainty of the data, the semantic gating unit accurately determines the inherent complexity of the task. After comparing the calculation results with a preset threshold, the semantic gating unit generates a specific task complexity level signal, which directly reflects the task's theoretical computational requirements. Simultaneously, the neuromorphic storage pool performs a memory retrieval operation. The fuzzy matching submodule within the storage pool calculates the cosine similarity between the current feature vector and vectors in the historical database. The neuromorphic storage pool determines whether the current task is a "re-doing of an old problem." If the similarity exceeds a dynamic threshold, a successful match is determined. The neuromorphic storage pool directly retrieves and outputs the cached historical inference results to achieve instantaneous response with zero computational power consumption. If a match fails, the neuromorphic storage pool generates a miss signal and triggers a dynamic update submodule to perform self-evolution of the memory database based on time decay and frequency enhancement.

[0026] As the intelligent scheduling hub of the system, the dynamic sparse activation module simultaneously receives task complexity level signals from the semantic gating unit, miss signals from the neuromorphic storage pool, and device physical state parameters from the hardware status monitoring module. The dynamic sparse activation module inputs multi-dimensional information into its internal mapping function generator. The dynamic sparse activation module balances the contradiction between task demand and physical supply through a nonlinear mapping strategy. The hardware status monitoring module collects battery power, chip temperature, and computing load data in real time. When it detects that the battery power is too low or the temperature is too high, the protection threshold triggering submodule immediately sends a forced energy-saving command to the mapping function generator. The dynamic sparse activation module responds to this command and forcibly limits the activation ratio in the generated dynamic sparse mask to an extremely low range to ensure device safety. If the device is in good condition, the dynamic sparse activation module generates an appropriate sparse strategy based on the task complexity.

[0027] In one specific implementation, the semantic gating unit includes an information entropy calculation submodule, which calculates the information entropy of the low-dimensional semantic feature vector according to the formula $H = -\sum_{i} p_i \log p_i$, where $p_i$ is the probability of the $i$-th dimension feature of the semantic feature vector. The semantic gating unit also has a threshold comparison submodule, which compares the calculated information entropy with a preset low-complexity threshold and a preset high-complexity threshold. If the information entropy is less than the low-complexity threshold, the task is classified as low-complexity; if it is greater than the high-complexity threshold, the task is classified as high-complexity; if it falls between the two thresholds, the task is classified as medium-complexity. The classification result is used as the task complexity level signal. The neuromorphic storage pool includes a storage array, a fuzzy matching submodule, and a dynamic update submodule. The storage array stores data items, each containing a historical semantic feature vector, a historical inference result, a storage timestamp, and a usage frequency count. The fuzzy matching submodule calculates the cosine similarity between the current low-dimensional semantic feature vector and each historical semantic feature vector in the storage array. When the maximum cosine similarity is greater than the dynamic matching threshold τ, a successful match is determined. The dynamic update submodule is connected to the storage array and periodically updates data items based on a time decay factor and a frequency enhancement factor, deleting data items whose timestamps exceed a preset expiration time or increasing the matching weight of frequently used data items.

[0028] In specific applications, the system first performs deep information entropy calculation on the input low-dimensional semantic feature vector through a semantic gating unit. The semantic gating unit deeply analyzes the probability distribution of each dimension feature in the feature vector. The semantic gating unit uses the information entropy formula to quantify the inherent uncertainty and logical complexity of the data. The semantic gating unit compares the calculated value with preset low complexity thresholds and high complexity thresholds in real time. When the value is less than the low threshold, the semantic gating unit determines the task to be low complexity; when the value is greater than the high threshold, the semantic gating unit determines the task to be high complexity; when the value is between the two, the semantic gating unit determines the task to be medium complexity. Finally, the semantic gating unit generates a clear task complexity level signal and uses it as the basis for subsequent resource scheduling.

[0029] Meanwhile, the neuromorphic storage pool initiates a memory retrieval mechanism in parallel. The storage array, as the system's dynamic long-term memory, stores multidimensional data items including historical semantic feature vectors, historical inference results, storage timestamps, and usage frequency counts. The fuzzy matching submodule immediately calculates the cosine similarity between the current low-dimensional semantic feature vector and all historical vectors in the storage array. The fuzzy matching submodule searches for the historical memory with the smallest geometric angle in the high-dimensional space. When the fuzzy matching submodule finds that the maximum cosine similarity exceeds the dynamic matching threshold τ, the system determines that the match is successful. The neuromorphic storage pool directly retrieves and outputs the corresponding historical inference results to achieve an instantaneous response with zero computational power consumption. If the fuzzy matching submodule does not find a matching item that meets the threshold, it determines that the match has failed and generates a miss signal to trigger a new inference process.

[0030] The dynamic update submodule continuously maintains the health and timeliness of the storage array during this process. Based on the time decay factor, the dynamic update submodule automatically reduces the weight of data items that have not been accessed for a long time. The dynamic update submodule periodically deletes old data whose timestamps have exceeded the preset expiration time to free up storage space. At the same time, the dynamic update submodule uses the frequency enhancement factor to increase the matching weight of high-value data items that are frequently accessed. The dynamic update submodule ensures that the storage pool can adaptively evolve with changes in user habits and gradually accumulate high-quality memories that best suit the current scenario.
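One way to picture the dynamic update submodule is a periodic refresh pass over the stored items. The source names a time decay factor and a frequency enhancement factor without giving their form, so the exponential decay, the logarithmic frequency boost, and all parameter names below are assumptions chosen for illustration.

```python
import math

def refresh_pool(items, now, expiry_seconds, decay_rate, frequency_gain):
    """Drop expired items and recompute each survivor's matching weight.

    Each item is a dict with at least "timestamp" (seconds) and "use_count".
    """
    kept = []
    for item in items:
        age = now - item["timestamp"]
        if age > expiry_seconds:
            continue  # timestamp exceeded the preset expiration time
        decay = math.exp(-decay_rate * age)                    # older -> lower weight
        boost = 1.0 + frequency_gain * math.log1p(item["use_count"])  # frequent -> higher
        kept.append({**item, "weight": decay * boost})
    return kept
```

With these forms, two items of the same age differ only by how often they were used, and an unused item's weight falls smoothly toward zero until the expiry cutoff removes it.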

[0031] In one specific implementation, the hardware status monitoring module collects battery power parameters, chip core temperature parameters, and current computing load parameters in real time through the device's underlying interface. The dynamic sparse activation module has a built-in mapping function generator. The mapping function generator takes the task complexity level signal, battery power parameters, chip core temperature parameters, and current computing load parameters as input variables and generates a dynamic sparse mask in binary matrix form through a preset nonlinear mapping function. The elements in the binary matrix correspond one-to-one with the weight blocks in the core inference layer of the large model. An element value of 1 indicates that the corresponding weight block is activated, and an element value of 0 indicates that the corresponding weight block is masked. The dynamic sparse activation module also has a protection threshold trigger submodule, which is connected to both the hardware status monitoring module and the mapping function generator. When the battery power parameter is lower than a first preset power threshold or the chip core temperature parameter is higher than a first preset temperature threshold, the protection threshold trigger submodule sends a forced power-saving command to the mapping function generator. In response to the forced power-saving command, the mapping function generator forcibly limits the proportion of elements with a value of 1 in the generated dynamic sparse mask to the range of 10% to 15%, regardless of the task complexity indicated by the task complexity level signal.

[0032] In a specific application of this invention, the hardware status monitoring module first collects battery power parameters, chip core temperature parameters, and current computing load parameters in real time at high frequency through the device's underlying interface. The hardware status monitoring module converts the physical quantities reflecting the device's health into standardized input signals and continuously sends them to the dynamic sparse activation module. The mapping function generator inside the dynamic sparse activation module then starts a multivariate fusion decision mechanism. The mapping function generator takes the received task complexity level signal and the three types of physical status parameters—battery power, chip temperature, and computing load—as input variables. The mapping function generator uses a preset nonlinear mapping function to find the optimal solution in a multidimensional space. The mapping function generator finally outputs a dynamic sparse mask in the form of a binary matrix. Each element in the matrix corresponds to a weight block in the core inference layer of the large model. Elements with a value of 1 in the matrix represent authorized activated computing units, and elements with a value of 0 represent energy-saving units that are forced to sleep. This allows the large model to dynamically change its computing form according to real-time task requirements and device status.

[0033] Building on this basic workflow, the protection threshold trigger submodule acts as the system's safety circuit breaker. This submodule monitors the data stream from the hardware status monitoring module in real time. Once it detects that the battery power parameter has fallen below the first preset power threshold, or that the chip core temperature parameter has risen above the first preset temperature threshold, the submodule determines that the system is in a critical state and sends a high-priority forced power-saving command to the mapping function generator. Upon receiving the forced power-saving command, the mapping function generator overrides its regular decision logic. Regardless of the task complexity indicated by the task complexity level signal, the mapping function generator restricts the proportion of elements with a value of 1 in the generated dynamic sparse mask to the range of 10% to 15%. As a result, 85% to 90% of the weight blocks in the core inference layer of the large model are masked, and only the backbone network path is retained for minimal inference calculations. Although this temporarily sacrifices some inference accuracy, it guards against sudden power loss due to excessive computation and against permanent hardware damage caused by overheating.
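
The override behaviour can be illustrated with a minimal sketch. The source fixes only the 10% to 15% band and the two trigger conditions; the threshold values, the ratio formula used in the normal branch, the choice of 12.5% inside the band, and the convention that the leading blocks form the backbone path are all assumptions, and `sparse_mask` is a hypothetical name.

```python
def sparse_mask(n_blocks, complexity, battery, temp_c, load,
                min_battery=0.2, max_temp_c=80.0):
    """Return a flat 0/1 mask over the weight blocks.

    complexity: 0 (low), 1 (medium), 2 (high); battery and load lie in [0, 1].
    Thresholds and the normal-mode formula are illustrative assumptions.
    """
    if battery < min_battery or temp_c > max_temp_c:
        # Forced power saving: ignore task complexity entirely.
        ratio = 0.125  # a value inside the 10%-15% band
    else:
        base = (0.3, 0.6, 1.0)[complexity]
        # Assumed nonlinear mapping: shrink under load and low charge.
        ratio = max(0.15, base * (1.0 - 0.5 * load) * battery ** 0.5)
    n_active = round(ratio * n_blocks)
    # Assumption: blocks are ordered so the leading ones form the backbone.
    return [1] * n_active + [0] * (n_blocks - n_active)
```

With 200 weight blocks, a battery level below the threshold yields 25 active blocks (12.5%) even for a high-complexity task, whereas a healthy, idle device running the same task keeps all 200 active. For very small block counts the rounding step may not land inside the band.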

[0034] In one specific implementation, the system for the end-side dynamic reasoning method of semantic compression and neuromorphic perception includes the following components. An input preprocessing module, whose input end receives external natural language sequences and whose output end is connected to the input end of the semantic compression network. A semantic compression network, located on the output path of the input preprocessing module, which converts input data into low-dimensional semantic feature vectors. A semantic gating unit, whose input is connected to the output of the semantic compression network, which receives the low-dimensional semantic feature vectors and outputs task complexity level signals. A neuromorphic storage pool, whose input is connected to the output of the semantic compression network, which stores a library of historical semantic feature vectors and uses it to perform similarity comparison and output matching results or miss signals. A hardware status monitoring module, located at the bottom layer of the system, which collects and outputs the physical status parameters of the device in real time. A dynamic sparse activation module, which has a first input connected to the semantic gating unit, a second input connected to the neuromorphic storage pool, and a third input connected to the hardware status monitoring module, and which generates a dynamic sparse mask by integrating these signals. A core inference layer of the large model, whose control end is connected to the output end of the dynamic sparse activation module to receive the dynamic sparse mask, and whose computation end is connected to the semantic compression network, and which selectively activates its internal weight blocks according to the dynamic sparse mask to perform inference calculations and output the results.

[0035] In practical applications, this invention forms a closed loop from data perception to dynamic computation. The input preprocessing module first receives external natural language sequences as the system's entry point. It performs structured cleaning and standardization on the original text, filters out data noise, extracts key semantic units, and then sends the structured input sequence to the downstream semantic compression network. The semantic compression network, as the core of the system's feature abstraction, uses a lightweight neural network architecture to map high-dimensional sparse structured sequences into compact low-dimensional semantic feature vectors. The generated low-dimensional semantic feature vectors are then distributed simultaneously to two parallel processing branches: the semantic gating unit and the neuromorphic storage pool.

[0036] Upon receiving a feature vector, the semantic gating unit initiates a cognitive evaluation mechanism. Its internal information entropy calculation submodule deeply analyzes the probability distribution of each dimension of the feature vector. The semantic gating unit determines the inherent complexity of the task by quantifying the uncertainty of the data. After comparing the calculation results with a preset threshold, the semantic gating unit generates a specific task complexity level signal, which directly reflects the task's theoretical computational requirements. Simultaneously, the neuromorphic storage pool performs a memory retrieval operation. The fuzzy matching submodule within the storage pool calculates the cosine similarity between the current feature vector and vectors in the historical database. The neuromorphic storage pool determines whether the current task is a "re-doing of an old problem." If the similarity exceeds a dynamic threshold, a successful match is determined. The neuromorphic storage pool directly retrieves and outputs the cached historical inference results to achieve instantaneous response with zero computational power consumption. If a match fails, the neuromorphic storage pool generates a miss signal and triggers a dynamic update submodule to perform self-evolution of the memory database based on time decay and frequency enhancement.
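
A minimal sketch of the entropy-based gating step follows. The source does not state how the feature vector is turned into a probability distribution, so the softmax normalization, the base-2 logarithm, and the threshold values `low` and `high` are assumptions; `entropy_level` is a hypothetical name.

```python
import math


def entropy_level(vector, low=1.0, high=2.0):
    """Classify task complexity from the Shannon entropy of a feature vector.

    Returns (level, entropy_in_bits), where level is "low", "medium" or "high".
    """
    # Assumption: softmax turns the vector into a probability distribution.
    peak = max(vector)
    exps = [math.exp(v - peak) for v in vector]
    total = sum(exps)
    probs = [e / total for e in exps]
    h = -sum(p * math.log2(p) for p in probs if p > 0.0)
    if h < low:
        return "low", h
    if h > high:
        return "high", h
    return "medium", h
```

Under these assumptions a sharply peaked vector such as `[10.0, 0.0, 0.0, 0.0]` is rated low complexity, while a flat eight-dimensional vector has an entropy of 3 bits and is rated high complexity.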

[0037] The hardware status monitoring module, as the physical sensing cornerstone of the system, collects battery power, chip core temperature, and current computing load physical parameters in real time at high frequency through the device's underlying interface. The hardware status monitoring module converts analog quantities reflecting device health into standardized digital signals and continuously outputs them. The dynamic sparse activation module, as the intelligent scheduler of the entire system, simultaneously receives task complexity level signals from the semantic gating unit, miss signals from the neuromorphic storage pool, and device physical status parameters from the hardware status monitoring module. The mapping function generator inside the dynamic sparse activation module inputs multi-dimensional information into a nonlinear mapping strategy for deep fusion. The dynamic sparse activation module weighs the contradiction between task demand and physical supply. When low battery or high temperature is detected, the protection threshold triggering submodule within the dynamic sparse activation module immediately sends a forced energy-saving command. The dynamic sparse activation module responds to this command and forcibly limits the activation ratio in the generated dynamic sparse mask to an extremely low range to ensure device safety. If the device is in good condition, the dynamic sparse activation module generates an adapted sparse strategy based on task complexity and finally outputs a binary matrix corresponding to the weight block of the core inference layer of the large model.

[0038] In one specific implementation, the semantic compression network adopts a lightweight convolutional neural network structure, with fewer than one-tenth the number of parameters of a traditional encoder, and is directly embedded in the data path between the input preprocessing module and the semantic gating unit. The neuromorphic storage pool is physically deployed in the high-speed cache area of the system memory and communicates bidirectionally with the semantic compression network and the dynamic sparse activation module via a bus to support the writing and reading of historical data. The dynamic sparse activation module includes a signal fusion unit and a mask generation unit. The signal fusion unit is electrically connected to the semantic gating unit, the neuromorphic storage pool, and the hardware status monitoring module, respectively, and performs vector concatenation of the received task complexity level signal, miss signal, and device physical status parameters. The mask generation unit is connected to the output of the signal fusion unit and stores a configurable mapping table. Based on the concatenated vector, it generates, by table lookup or calculation, the binary control signals that control the switching state of each Transformer encoder layer and decoder layer in the core inference layer of the large model.

[0039] In specific applications, the semantic compression network, acting as a hub for data flow, employs a lightweight convolutional neural network structure with fewer than one-tenth the number of parameters of a traditional encoder. This allows the semantic compression network to be embedded in the high-speed data path between the input preprocessing module and the semantic gating unit. After receiving the structured input sequence, the semantic compression network maps it into a low-dimensional semantic feature vector with almost zero latency, reducing the computational overhead and memory usage of the front-end processing. The neuromorphic storage pool is physically deployed in the high-speed cache area of the system memory. Its close proximity to the computing unit enables the neuromorphic storage pool to establish a bidirectional high-speed communication link with the semantic compression network and the dynamic sparse activation module via the system bus. The neuromorphic storage pool uses this channel to perform real-time writing operations of historical data to update the memory bank, while simultaneously quickly reading historical semantic feature vectors for similarity comparison, ensuring that the memory retrieval process does not become a system bottleneck.

[0040] The dynamic sparse activation module achieves intelligent decision-making through close collaboration between the signal fusion unit and the mask generation unit. The signal fusion unit first establishes stable electrical connections with the semantic gating unit, the neuromorphic storage pool, and the hardware state monitoring module. It captures three types of heterogeneous data in real time: task complexity level signals, miss signals, and device physical state parameters. The signal fusion unit aligns the discrete signals on the time axis and performs vector concatenation to form a comprehensive state vector that includes both task logic requirements and device physical constraints. The mask generation unit then receives this comprehensive state vector. Its pre-stored configurable mapping table is immediately activated. Based on the specific numerical characteristics of the concatenated vector, the mask generation unit performs a fast lookup in the mapping table or executes preset interpolation calculations. Finally, the mask generation unit generates a set of refined binary control signals, where each bit precisely corresponds to the on / off state of a specific Transformer encoder and decoder layer in the core inference layer of the large model.
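
The fusion-then-lookup flow described above might look like the following sketch. The six-layer model, the contents of `LAYER_TABLE`, the thresholds, the rule that a cache hit switches every layer off, and the rule that heavy load lowers the level by one step are all hypothetical choices for illustration; the source specifies only that a configurable mapping table is consulted with the concatenated vector.

```python
# Hypothetical mapping table: complexity level -> one on/off bit per layer.
LAYER_TABLE = {
    0: (1, 1, 0, 0, 0, 0),  # low complexity: shallow path only
    1: (1, 1, 1, 1, 0, 0),  # medium complexity
    2: (1, 1, 1, 1, 1, 1),  # high complexity: full depth
}
MINIMAL = (1, 0, 0, 0, 0, 0)  # extremely sparse fallback for a stressed device


def fuse(level, miss, battery, temp_c, load):
    """Concatenate the heterogeneous signals into one state vector."""
    return (level, int(miss), battery, temp_c, load)


def control_bits(state, min_battery=0.2, max_temp_c=80.0, max_load=0.9):
    """Look up the layer control bits for a fused state vector."""
    level, miss, battery, temp_c, load = state
    if not miss:
        # Assumption: on a cache hit no layer needs to run.
        return (0,) * len(MINIMAL)
    if battery < min_battery or temp_c > max_temp_c:
        return MINIMAL
    if load > max_load:
        level = max(0, level - 1)  # assumed: back off one level under heavy load
    return LAYER_TABLE[level]
```

A table keyed on a coarse level keeps the lookup constant-time; a real mapping table would presumably be indexed on quantized versions of all fused signals.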

[0041] When the binary control signal is transmitted to the large model core inference layer, the large model core inference layer dynamically reconstructs its computation graph according to the signal instructions. The large model core inference layer only activates the encoder and decoder layers that are marked as active to participate in the current inference operation. The large model core inference layer puts the unselected layers into a completely dormant state to cut off the current path, so that the system can flexibly adjust the network depth according to the difficulty of the task. For simple tasks, the large model core inference layer can output the result by calling only the shallow network. For complex tasks, the large model core inference layer activates the deep network as needed to ensure accuracy. At the same time, if the hardware status monitoring module detects that the device is overheating or the power is insufficient, the comprehensive vector generated by the signal fusion unit will guide the mask generation unit to output extremely sparse control signals, forcing the large model core inference layer to shut down most layers to protect hardware safety.
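
How the control bits change the effective network depth can be shown with a toy sketch. Here each layer is any callable and a skipped layer simply passes its input through unchanged; treating a dormant layer as an identity is an assumption that fits residual-style architectures, and `run_layers` is a hypothetical helper rather than part of the described system.

```python
def run_layers(x, layers, bits):
    """Apply only the layers whose control bit is 1.

    Layers with a 0 bit are skipped, so the input passes through unchanged.
    """
    for layer, bit in zip(layers, bits):
        if bit:
            x = layer(x)
    return x
```

For example, with three stand-in layers that add 1, double, and subtract 3, the bits `(1, 0, 1)` turn an input of 5 into 3, while `(1, 1, 1)` yields 9.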

[0042] In one specific implementation, an output fusion module is further included. The input of the output fusion module is connected to the output of the neuromorphic storage pool and the output of the large model core inference layer, respectively. When the neuromorphic storage pool outputs a successfully matched historical inference result, the output fusion module directly uses the result as the final output of the system. When the neuromorphic storage pool outputs a miss signal and the large model core inference layer completes its calculation, the output fusion module receives the preliminary inference result from the large model core inference layer, performs syntax correction and format standardization on it, and uses it as the final output of the system. At the same time, the low-dimensional semantic feature vector generated this time and the final output of the system are written into the neuromorphic storage pool. The hardware status monitoring module includes a power detection submodule, a temperature sensing submodule, and a load monitoring submodule. The power detection submodule is connected to the device's battery management chip, the temperature sensing submodule is connected to the thermistor inside the device's SoC, and the load monitoring submodule is connected to the device's operating system kernel scheduler. The dynamic sparse activation module also integrates a mode switching switch, which is controlled by the abnormal status flag bit output by the hardware status monitoring module. When the abnormal status flag bit is detected to be valid, the mode switching switch cuts off the semantic gating unit's control over mask generation, and instead, a preset extreme energy-saving strategy directly generates a dynamic sparse mask with a fixed sparsity.

[0043] In practical applications, the output fusion module of this invention serves as the final convergence point of the data stream. Its input simultaneously monitors the output status of the neuromorphic storage pool and the core inference layer of the large model. When the neuromorphic storage pool determines that the current task is a historical recurrence and outputs a successfully matched historical inference result, the output fusion module immediately bypasses the complex computational path and directly delivers the historical result as the final system output to the user, thus achieving zero-latency instantaneous response. If the neuromorphic storage pool outputs a miss signal indicating that the current task is a novel scenario, the output fusion module enters a standby state until the core inference layer of the large model completes dynamic sparse computation. Once the core inference layer of the large model outputs a preliminary inference result, the output fusion module immediately starts a post-processing program to perform syntax correction on the result to eliminate generation errors and performs format standardization to ensure output specifications. The processed data is then established as the final system output. Simultaneously, the output fusion module performs a memory write operation, packaging the generated low-dimensional semantic feature vector with the newly determined final system output and writing it back to the historical database of the neuromorphic storage pool via a high-speed bus. This allows the system to continuously accumulate new knowledge with increased usage, achieving adaptive evolution that becomes smarter with use.
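
The dual-path arbitration and write-back can be sketched as below. Whitespace normalization stands in for the syntax correction and format standardization steps, which the source does not detail; the list-based `pool` and the names `fuse_output` and `infer` are hypothetical.

```python
def fuse_output(cached, miss, infer, vector, pool):
    """Return the final system output and write back to the pool on a miss.

    cached: result from the storage pool, or None when nothing matched.
    infer:  zero-argument callable that runs the sparse inference path.
    """
    if not miss and cached is not None:
        return cached  # hit: bypass the inference path entirely
    raw = infer()
    # Stand-in for syntax correction and format standardization.
    final = " ".join(raw.split())
    pool.append((vector, final))  # memory write-back for future matches
    return final
```

Passing the inference path as a callable means it is only invoked on a miss, which mirrors the bypass behaviour described above.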

[0044] At the underlying sensing level, the hardware status monitoring module constructs a comprehensive physical sensing network through three dedicated submodules. The power detection submodule connects directly to the device's battery management chip to obtain the remaining power percentage. The temperature sensing submodule is connected to the thermistor inside the device's SoC to capture real-time core temperature changes. The load monitoring submodule reads the current computing queue length and resource utilization from the device's operating system kernel scheduler. These three submodules work together to convert analog quantities in the physical world into digital status parameters. The mode switching switch integrated within the dynamic sparse activation module acts as a safety gatekeeper, continuously monitoring the abnormal status flag bit output by the hardware status monitoring module. Once the power detection submodule reports a risk of running out of power or the temperature sensing submodule reports an overheating alarm, the abnormal status flag bit becomes valid and the mode switching switch performs a forced takeover. The mode switching switch cuts off the semantic gating unit's dynamic control over mask generation, preventing it from allocating resources according to task complexity, and activates the preset extreme energy-saving strategy instead. This strategy ignores the difficulty of the task and directly generates a dynamic sparse mask with a fixed sparsity, that is, a fixed and extremely low proportion of activated weight blocks. The large model core inference layer retains only the minimum weight blocks required to maintain basic functions, preventing equipment downtime or permanent hardware damage due to excessive computation. At the same time, through the dual-path arbitration and memory feedback mechanism of the output fusion module, the system maximizes inference efficiency and accuracy under normal operating conditions and prioritizes equipment safety and survivability under abnormal operating conditions, achieving robust operation and continuous optimization of the edge intelligent system in complex and changing environments.

[0045] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0046] The present invention and its embodiments have been described above. This description is not restrictive, and the accompanying drawings are only one embodiment of the present invention; the actual structure is not limited thereto. In conclusion, if those skilled in the art are inspired by this description and design similar structures and embodiments without departing from the spirit of the invention, such designs should fall within the protection scope of the present invention.

Claims

1. An end-side dynamic reasoning method for semantic compression and neuromorphic perception, characterized in that it includes the following steps: Step 1: The input preprocessing module receives the natural language sequence and outputs the structured input sequence to the semantic compression network; Step 2: The semantic compression network, located downstream of the input preprocessing module, maps the structured input sequence into a low-dimensional semantic feature vector and transmits it to the semantic gating unit and the neuromorphic storage pool; Step 3: The semantic gating unit calculates the information entropy based on the low-dimensional semantic feature vector and sends the task complexity level signal to the dynamic sparse activation module; Step 4: The neuromorphic storage pool stores historical semantic feature vectors and corresponding inference results, and compares the current low-dimensional semantic feature vector with the stored historical semantic feature vectors; if the match is successful, the historical inference result is directly output; if the match fails, a miss signal is generated and sent to the dynamic sparse activation module; Step 5: The dynamic sparse activation module is connected to the semantic gating unit, the neuromorphic storage pool, and the hardware status monitoring module, respectively, and receives the task complexity level signal, the miss signal, and the device physical status parameters to generate a dynamic sparse mask; Step 6: The core inference layer of the large model receives the dynamic sparse mask, activates only the weight blocks identified by the dynamic sparse mask to process the structured input sequence, and outputs preliminary inference results.

2. The end-side dynamic reasoning method for semantic compression and neuromorphic perception according to claim 1, characterized in that, The semantic gating unit contains an information entropy calculation submodule, which calculates the information entropy of the low-dimensional semantic feature vector according to the formula $H = -\sum_{i} p_i \log p_i$, where $p_i$ is the probability distribution of the $i$-th dimension feature of the semantic feature vector; the semantic gating unit also includes a threshold comparison submodule, which compares the calculated information entropy with a preset low-complexity threshold and a preset high-complexity threshold. If the information entropy is less than the low-complexity threshold, the task is classified as a low-complexity task; if the information entropy is greater than the high-complexity threshold, the task is classified as a high-complexity task; if the information entropy falls between the two thresholds, the task is classified as a medium-complexity task. The classification result is then used as the task complexity level signal.

3. The end-side dynamic reasoning method for semantic compression and neuromorphic perception according to claim 1, characterized in that, The neuromorphic storage pool includes a storage array, a fuzzy matching submodule, and a dynamic update submodule. The storage array stores data items, each containing a historical semantic feature vector, historical inference results, a storage timestamp, and a usage frequency count. The fuzzy matching submodule calculates the cosine similarity between the current low-dimensional semantic feature vector and each historical semantic feature vector in the storage array. When the maximum cosine similarity is greater than the dynamic matching threshold τ, a successful match is determined. The dynamic update submodule is connected to the storage array and periodically updates the data items according to a time decay factor and a frequency enhancement factor, deleting data items whose timestamps exceed a preset expiration time or increasing the matching weight of frequently used data items.

4. The end-side dynamic reasoning method for semantic compression and neuromorphic perception according to claim 1, characterized in that, The hardware status monitoring module collects battery power parameters, chip core temperature parameters, and current computing load parameters in real time through the device's underlying interface. The dynamic sparse activation module has a built-in mapping function generator. The mapping function generator takes the task complexity level signal, battery power parameters, chip core temperature parameters, and current computing load parameters as input variables, and generates the dynamic sparse mask in binary matrix form through a preset nonlinear mapping function. The elements in the binary matrix correspond one-to-one with the weight blocks in the core inference layer of the large model. An element value of 1 indicates that the corresponding weight block is activated, and an element value of 0 indicates that the corresponding weight block is masked.

5. The end-side dynamic reasoning method for semantic compression and neuromorphic perception according to claim 4, characterized in that, The dynamic sparse activation module also includes a protection threshold trigger submodule, which is connected to the hardware status monitoring module and the mapping function generator. When the battery power parameter is lower than a first preset power threshold or the chip core temperature parameter is higher than a first preset temperature threshold, the protection threshold trigger submodule sends a forced power-saving command to the mapping function generator. In response to the forced power-saving command, the mapping function generator forcibly limits the proportion of elements with a value of 1 in the generated dynamic sparse mask to a range of 10% to 15%, regardless of the task complexity indicated by the task complexity level signal.

6. A system for the end-side dynamic reasoning method for semantic compression and neuromorphic perception according to any one of claims 1-5, characterized in that it includes: an input preprocessing module, whose input end receives external natural language sequences and whose output end is connected to the input end of the semantic compression network; a semantic compression network, located on the output path of the input preprocessing module, which is used to convert input data into low-dimensional semantic feature vectors; a semantic gating unit, whose input is connected to the output of the semantic compression network, which is used to receive the low-dimensional semantic feature vector and output a task complexity level signal; a neuromorphic storage pool, whose input is connected to the output of the semantic compression network, which stores a historical semantic feature vector library and is used to perform similarity comparison and output matching results or miss signals; a hardware status monitoring module, located at the bottom layer of the system, which is used to collect and output the physical status parameters of the device in real time; a dynamic sparse activation module, which has a first input terminal connected to the semantic gating unit, a second input terminal connected to the neuromorphic storage pool, and a third input terminal connected to the hardware status monitoring module, and which is used to generate a dynamic sparse mask by integrating these signals; and a core inference layer of the large model, whose control end is connected to the output end of the dynamic sparse activation module to receive the dynamic sparse mask, and whose calculation end is connected to the semantic compression network, which is used to selectively activate the internal weight blocks according to the dynamic sparse mask to perform inference calculation and output the result.

7. The system of the end-side dynamic reasoning method for semantic compression and neuromorphic perception according to claim 6, characterized in that, The semantic compression network adopts a lightweight convolutional neural network structure, with fewer than one-tenth the number of parameters of a traditional encoder, and is directly embedded in the data path between the input preprocessing module and the semantic gating unit; the neuromorphic storage pool is physically deployed in the high-speed cache area of the system memory, and communicates bidirectionally with the semantic compression network and the dynamic sparse activation module through a bus to support the writing and reading of historical data.

8. The system of the end-side dynamic reasoning method for semantic compression and neuromorphic perception according to claim 6, characterized in that, The dynamic sparse activation module includes a signal fusion unit and a mask generation unit. The signal fusion unit is electrically connected to the semantic gating unit, the neuromorphic storage pool, and the hardware state monitoring module, respectively, and is used to perform vector concatenation of the received task complexity level signal, miss signal, and device physical state parameters. The mask generation unit is connected to the output of the signal fusion unit and has a configurable mapping table stored inside. It is used to generate binary control signals that control the switching states of each Transformer encoder layer and decoder layer in the core inference layer of the large model by looking up the table or calculating based on the concatenated vector.

9. The system of the end-side dynamic reasoning method for semantic compression and neuromorphic perception according to claim 6, characterized in that, It also includes an output fusion module, whose input is connected to the output of the neuromorphic storage pool and the output of the large model core inference layer, respectively; when the neuromorphic storage pool outputs a successfully matched historical inference result, the output fusion module directly uses the result as the final output of the system; when the neuromorphic storage pool outputs a miss signal and the large model core inference layer completes the calculation, the output fusion module receives the preliminary inference result of the large model core inference layer, performs syntax correction and format standardization on it, and uses it as the final output of the system. At the same time, the low-dimensional semantic feature vector generated this time and the final output of the system are fed back and written into the neuromorphic storage pool.

10. The system of the end-side dynamic reasoning method for semantic compression and neuromorphic perception according to claim 6, characterized in that, The hardware status monitoring module includes a power detection submodule, a temperature sensing submodule, and a load monitoring submodule. The power detection submodule is connected to the device's battery management chip, the temperature sensing submodule is connected to the thermistor inside the device's SoC, and the load monitoring submodule is connected to the device's operating system kernel scheduler. The dynamic sparse activation module also integrates a mode switching switch, which is controlled by an abnormal status flag bit output by the hardware status monitoring module. When an abnormal status flag bit is detected to be valid, the mode switching switch cuts off the semantic gating unit's control over mask generation, and instead, a preset extreme energy-saving strategy directly generates a dynamic sparse mask with a fixed sparsity.