A Smart Metering-Oriented TinyML Hardware Acceleration System and Method

By integrating an instruction control unit, a power-off lightweight computing array, a hybrid quantization processing unit and a power scheduling unit into smart metering and smart meter reading devices, the invention addresses high power consumption and low computing efficiency. It accelerates depthwise separable convolution and hybrid quantization operations, delivering highly energy-efficient TinyML inference that meets real-time and low-power requirements.

CN122242602A · Pending · Publication Date: 2026-06-19 · HUNAN TENGFA MICROELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUNAN TENGFA MICROELECTRONICS CO LTD
Filing Date
2026-05-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies suffer from high power consumption, low computational utilization, and low computational efficiency when accelerating TinyML inference for battery-powered IoT edge devices such as smart metering and smart meter reading. In particular, they lack support for depthwise separable convolution, pointwise convolution, and INT4/INT8 mixed quantization, and have poor memory access efficiency, making it difficult to meet the system requirements of real-time performance and extremely low power consumption.

Method used

It combines an instruction control unit, a power-off lightweight computing array, a hybrid quantization processing unit, a hierarchical cache unit and a power scheduling unit; supports depthwise convolution, pointwise convolution and activation function operations; uses INT4/INT8 hybrid quantization; applies a three-level power management strategy to achieve microsecond-level wake-up and fast sleep; and optimizes data layout and memory access efficiency.

Benefits of technology

It achieves extreme energy efficiency: compared with a general-purpose NPU, computational redundancy is reduced by more than 70%, power consumption by 90% and area by 60%; compared with software inference, speed is improved by 5 to 10 times. Computational utilization and efficiency are greatly improved, and TinyML lightweight models are executed efficiently.

✦ Generated by Eureka AI based on patent content.

Figure CN122242602A_ABST

Abstract

This invention relates to the field of energy Internet of Things (IoT) technology, specifically a TinyML hardware acceleration system and method for smart metering. The system includes: an instruction control unit for parsing inference instructions and scheduling the execution flow of each operator, and for automatically extracting quantization parameters from the cache and configuring them to a hybrid quantization processing unit before operator execution; a power-off lightweight computing array, including a depthwise convolution array, a pointwise convolution array, and activation units; a hybrid quantization processing unit for performing quantization, dequantization, scaling, and offset operations for INT4 and INT8; a hierarchical cache unit, including a weight cache, a feature cache, and an output cache; and a power scheduling unit for managing power consumption of each unit based on the inference task status. This invention addresses the problems of high power consumption, low computational utilization, and low computational efficiency in existing technologies for TinyML inference acceleration requirements of battery-powered IoT edge devices such as smart metering and smart meter reading.
Description

Technical Field

[0001] This invention belongs to the field of energy Internet of Things technology, and in particular relates to a TinyML hardware acceleration system and method for smart metering.

Background Technology

[0002] Smart metering and smart meter reading devices in the energy IoT ecosystem need to perform AI inference tasks such as data anomaly detection, load identification, and meter-reading OCR under the tight constraints of battery power. The TinyML lightweight models commonly used for these tasks are small, computationally light, run intermittently, and are extremely sensitive to power consumption, so a dedicated hardware acceleration architecture must be designed for them. This architecture must fully support key TinyML operators such as depthwise separable convolution, pointwise convolution, pooling, and activation; it must support low-bit quantization such as INT4/INT8 to further improve energy efficiency; it must have microwatt-level standby power consumption and millisecond-level wake-up capability to match intermittent operating modes; and its area and power consumption must meet the requirements of system-on-chip (SoC) integration, thereby achieving a high-efficiency, low-cost, and integrated edge AI solution.

[0003] The existing technologies have several shortcomings. General-purpose low-power NPU (Neural Processing Unit, a dedicated integrated circuit designed for neural network inference) solutions use large systolic arrays and multi-core parallel architectures to support complex DNN models, but they occupy a large area and consume considerable power (usually tens of milliwatts or more); they are also heavily redundant for lightweight models, and their standby power consumption cannot meet the requirements of metering equipment. Simplified MCU + software inference library solutions execute TinyML models in MCU software without a hardware accelerator, but inference is slow (hundreds of milliseconds) and computational efficiency is low, so they cannot meet the requirements of real-time anomaly detection. Simple convolution accelerators support only standard convolution and not lightweight operators such as depthwise separable convolution, so they cannot efficiently execute TinyML models such as MobileNet and SqueezeNet, resulting in low computational utilization.

[0004] In summary, existing technologies have the following drawbacks when addressing the TinyML inference acceleration needs of battery-powered IoT edge devices such as smart metering and smart meter reading:

[0005] Excessive power consumption. The inference and standby power consumption of the general-purpose NPU architecture far exceeds the power budget for battery-powered scenarios, and the large standby leakage current makes it difficult to meet the long battery life requirements of devices.

[0006] Low computational utilization. The computing array designed for large models has a large number of redundant computing units when running TinyML lightweight models, resulting in extremely poor energy efficiency.

[0007] Insufficient operator support. Existing solutions lack efficient hardware support for depthwise separable convolution, pointwise convolution, and INT4/INT8 hybrid quantization (simultaneously using multiple quantization precisions such as INT4 and INT8 to minimize storage and power consumption while ensuring inference accuracy), resulting in low model inference efficiency and high storage and computational overhead.

[0008] Poor memory access efficiency. Existing solutions do not optimize data reuse or data layout for the small feature maps typical of metering scenarios, so off-chip memory access accounts for a high proportion of total power consumption.

[0009] Lack of a low-power scheduling mechanism suited to intermittent inference tasks. Existing solutions cannot achieve microsecond-level wake-up and fast sleep, making it difficult to meet the system requirements of real-time performance and extremely low power consumption at the same time.

Summary of the Invention

[0010] To address the shortcomings of existing technologies, the purpose of this invention is to provide a TinyML hardware acceleration system for smart metering, which solves the problems of high power consumption, low computational utilization, and low computational efficiency in existing technologies when meeting the TinyML inference acceleration requirements of battery-powered IoT edge devices such as smart metering and smart meter reading. In addition, this invention also provides a TinyML hardware acceleration method for smart metering.

[0011] To solve the above-mentioned technical problems, the present invention adopts the following technical solution:

[0012] In a first aspect, the present invention provides a TinyML hardware acceleration system for smart metering, comprising:

[0013] The instruction control unit is used to parse inference instructions and schedule the execution flow of each operator.

[0014] A power-off lightweight computing array, connected to the instruction control unit, includes a depthwise convolution array, a pointwise convolution array, and an activation unit. The depthwise convolution array is used to perform depthwise convolution operations, the pointwise convolution array is used to perform pointwise convolution operations, and the activation unit is used to perform activation function operations. Each of the depthwise convolution array, the pointwise convolution array, and the activation unit supports independent clock shutdown and power shutdown.

[0015] The hybrid quantization processing unit, connected to the power-off lightweight computing array, is used to perform INT4 and INT8 quantization (using 4-bit and 8-bit mixed precision to reduce storage and computing power consumption), dequantization, scaling and offset operations.

[0016] The hierarchical cache unit is connected to the hybrid quantization processing unit and the power-off lightweight computing array, and includes a weight cache, a feature cache and an output cache.

[0017] The power scheduling unit is connected to the instruction control unit, the power-off lightweight computing array, the hybrid quantization processing unit, and the hierarchical cache unit, respectively, and is used to manage the power consumption of each unit according to the inference task status.

[0018] Furthermore, the depthwise convolutional array supports 3×3 and 1×1 convolutional kernels; the activation unit supports ReLU, ReLU6, and Hardswish activation functions.
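For reference, ReLU, ReLU6 and Hardswish have the standard definitions modeled below (Hardswish being the piecewise approximation of swish popularized by MobileNetV3). This is a floating-point behavioral sketch only; the patent does not state how the activation unit implements these functions in fixed point, and the function names are hypothetical.

```python
# Floating-point behavioral models of the activation functions listed for the
# activation unit. A hardware unit would typically work on quantized integers.

def relu(x: float) -> float:
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)


def relu6(x: float) -> float:
    """ReLU6(x) = min(max(0, x), 6): ReLU clipped at 6."""
    return min(max(0.0, x), 6.0)


def hardswish(x: float) -> float:
    """Hardswish(x) = x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


if __name__ == "__main__":
    for v in (-4.0, -1.0, 0.0, 1.0, 4.0, 8.0):
        print(f"x={v:5.1f} relu={relu(v):5.2f} relu6={relu6(v):5.2f} hswish={hardswish(v):6.3f}")
```

ReLU6 and Hardswish need only comparisons, an add and a multiply by a constant, which is why they are popular on hardware without floating-point exponentials.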

[0019] Furthermore, the hybrid quantization processing unit includes a quantization multiplier, an efficient adder tree, and a shift scaling circuit, which are used to perform quantization and dequantization operations at the hardware level.
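The patent does not give the bit widths or rounding behavior of the quantization multiplier and shift scaling circuit. A common integer-only scheme (used, for example, by TensorFlow Lite's integer kernels) approximates each real rescaling factor as an integer multiplier followed by a rounding right shift, so a wide accumulator can be rescaled without floating-point hardware. The sketch below assumes that scheme, a 15-bit multiplier and INT8 output; `quantize_scale` and `requantize` are hypothetical names.

```python
def quantize_scale(scale: float, mult_bits: int = 15) -> tuple[int, int]:
    """Approximate a positive real scale as multiplier * 2**-shift.

    The shift is chosen so the multiplier occupies about `mult_bits` bits,
    keeping the relative approximation error small.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    shift = 0
    while scale * (1 << shift) < (1 << (mult_bits - 1)):
        shift += 1
    return round(scale * (1 << shift)), shift


def requantize(acc: int, multiplier: int, shift: int, zero_point: int = 0,
               qmin: int = -128, qmax: int = 127) -> int:
    """Rescale a wide integer accumulator to INT8 using integer operations only."""
    prod = acc * multiplier                           # quantization multiplier
    if shift > 0:
        prod = (prod + (1 << (shift - 1))) >> shift   # round half up, then shift
    return max(qmin, min(qmax, prod + zero_point))    # add offset, saturate to INT8


if __name__ == "__main__":
    m, s = quantize_scale(0.05)
    print(m, s, requantize(1000, m, s), requantize(10_000, m, s))
```

With a scale of 0.05, for instance, an accumulator value of 1000 maps to the INT8 value 50, while values beyond the INT8 range saturate at 127 or -128.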

[0020] Furthermore, the hybrid quantization processing unit supports a hybrid precision calculation mode in which weights use INT4 precision and activation values use INT8 precision.
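A minimal sketch of the INT4-weight / INT8-activation mode, assuming two signed INT4 weights are packed per byte (low nibble first) and products are summed in a wide accumulator. The patent specifies neither the packing order nor the accumulator width, and the function names are hypothetical. Each INT4 × INT8 product fits in 12 signed bits, so the multipliers can be narrower than INT8 × INT8 ones.

```python
def pack_int4(values: list[int]) -> bytes:
    """Pack signed INT4 values (-8..7) two per byte, low nibble first."""
    if any(v < -8 or v > 7 for v in values):
        raise ValueError("INT4 values must lie in [-8, 7]")
    if len(values) % 2:
        values = values + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data: bytes, count: int) -> list[int]:
    """Unpack `count` signed INT4 values, sign-extending each 4-bit nibble."""
    vals = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals[:count]


def dot_int4_int8(packed_w: bytes, activations: list[int]) -> int:
    """Mixed-precision dot product: INT4 weights times INT8 activations."""
    weights = unpack_int4(packed_w, len(activations))
    acc = 0  # wide accumulator (an adder tree in hardware)
    for w, a in zip(weights, activations):
        acc += w * a  # each product fits in a 12-bit signed value
    return acc


if __name__ == "__main__":
    packed = pack_int4([-8, 7, 3, -1, 0])
    print(packed.hex(), dot_int4_int8(packed, [1, 2, 3, 4, 5]))
```

Packing two weights per byte halves weight storage relative to INT8, which is where most of the storage saving of this mode comes from.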

[0021] Furthermore, in the hierarchical caching unit, the feature data is arranged in NHWC format; the weight data is stored in groups according to the convolution kernel; and the output cache is used to connect to the subsequent communication interface.
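The address arithmetic behind the NHWC choice can be sketched with the standard row-major index formulas below. The function names, and the reading of "stored in groups according to the convolution kernel" as all taps of one depthwise kernel being contiguous, are illustrative assumptions; the patent does not give the exact layout. In NHWC, all C channels of one pixel are adjacent, which suits a pointwise convolution that reads every channel at a pixel; in NCHW, consecutive channels of the same pixel are H × W elements apart.

```python
def nhwc_offset(n: int, h: int, w: int, c: int, H: int, W: int, C: int) -> int:
    """Flat index of element (n, h, w, c) in an NHWC tensor: channels vary fastest."""
    return ((n * H + h) * W + w) * C + c


def nchw_offset(n: int, h: int, w: int, c: int, H: int, W: int, C: int) -> int:
    """Flat index of the same element in NCHW, for comparison: columns vary fastest."""
    return ((n * C + c) * H + h) * W + w


def dw_weight_offset(c: int, ky: int, kx: int, K: int = 3) -> int:
    """Depthwise weights grouped per kernel: all K*K taps of channel c are contiguous."""
    return (c * K + ky) * K + kx


if __name__ == "__main__":
    H, W, C = 8, 8, 16
    pixel = [nhwc_offset(0, 2, 5, c, H, W, C) for c in range(C)]
    print("NHWC addresses of one pixel:", pixel[:4], "...")
    print("NCHW stride between channels:",
          nchw_offset(0, 2, 5, 1, H, W, C) - nchw_offset(0, 2, 5, 0, H, W, C))
```

For a 16-channel 8 × 8 map, reading one pixel's channels is a single 16-element burst in NHWC but 16 accesses strided by 64 elements in NCHW.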

[0022] Furthermore, the power scheduling unit adopts a three-stage power management strategy, including:

[0023] Sleep state: except for the wake-up detection circuit, all modules are turned off, and the static power consumption is less than 1μW;

[0024] Standby state: only the instruction control unit and the hierarchical cache unit are powered, with power consumption of less than 10μW;

[0025] Inference state: the power-off lightweight computing array and the hybrid quantization processing unit are powered on as needed to perform calculations, with power consumption of less than 5mW.

[0026] Furthermore, after detecting the completion of the inference task, the power scheduling unit sequentially shuts down the clock and power supply of the power-off lightweight computing array, causing the system to fall back to the sleep state.
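The three states and the shutdown sequence above can be modeled as a small state machine. This is a behavioral illustration only: the class and method names are hypothetical, the power budgets appear only as comments, and the power-up order (power before clock) is an assumption mirroring the described shutdown order (clock before power).

```python
from enum import Enum, auto


class PowerState(Enum):
    SLEEP = auto()      # only the wake-up detection circuit is on (spec: < 1 uW)
    STANDBY = auto()    # instruction control unit + hierarchical cache on (spec: < 10 uW)
    INFERENCE = auto()  # compute array + quantization unit powered as needed (spec: < 5 mW)


class PowerScheduler:
    """Behavioral model of a three-state power scheduling unit (hypothetical API)."""

    def __init__(self) -> None:
        self.state = PowerState.SLEEP
        self.log: list[str] = []

    def wake(self) -> None:
        """Task signal detected: leave SLEEP and power the control unit and cache."""
        if self.state is PowerState.SLEEP:
            self.log.append("power on: control unit, cache")
            self.state = PowerState.STANDBY

    def start_inference(self) -> None:
        """Power the compute blocks first, then ungate their clock (assumed order)."""
        if self.state is not PowerState.STANDBY:
            raise RuntimeError("inference can only start from STANDBY")
        self.log.append("power on: compute array, quantization unit")
        self.log.append("clock on: compute array")
        self.state = PowerState.INFERENCE

    def inference_done(self) -> None:
        """Shutdown sequence: gate the clock, cut compute power, fall back to SLEEP."""
        if self.state is not PowerState.INFERENCE:
            raise RuntimeError("no inference in progress")
        self.log.append("clock off: compute array")
        self.log.append("power off: compute array, quantization unit")
        self.log.append("power off: control unit, cache")
        self.state = PowerState.SLEEP


if __name__ == "__main__":
    pmu = PowerScheduler()
    pmu.wake()
    pmu.start_inference()
    pmu.inference_done()
    print(pmu.state, pmu.log)
```

Gating the clock before removing power stops switching activity first, so the power domain is quiescent when it is cut.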

[0027] Furthermore, the instruction control unit has automatic parameter loading logic which, before the operator of each layer starts, automatically extracts the quantization scaling factor and offset parameter matching that layer from the hierarchical cache unit and configures them to the hybrid quantization processing unit.
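A behavioral sketch of the automatic parameter loading logic, assuming a per-layer table of (scaling factor, offset) pairs held in the cache and a quantization unit exposed as two configuration fields; all names here are hypothetical. The point it illustrates is ordering: the unit is reprogrammed before each layer executes, with no MCU step in the loop.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping


@dataclass(frozen=True)
class QuantParams:
    scale: float  # per-layer quantization scaling factor
    offset: int   # per-layer offset (zero point)


@dataclass
class QuantUnit:
    """Stand-in for the configuration registers of the quantization unit."""
    scale: float = 1.0
    offset: int = 0

    def configure(self, params: QuantParams) -> None:
        self.scale = params.scale
        self.offset = params.offset


def run_layers(layers: Iterable[int],
               param_cache: Mapping[int, QuantParams],
               qunit: QuantUnit,
               execute: Callable[[int, QuantUnit], None]) -> None:
    """Before each layer runs, load that layer's parameters into the quant unit."""
    for layer in layers:
        qunit.configure(param_cache[layer])  # automatic load, no MCU step
        execute(layer, qunit)                # operator runs with matching params


if __name__ == "__main__":
    cache = {0: QuantParams(0.05, 0), 1: QuantParams(0.10, -3)}
    run_layers([0, 1], cache, QuantUnit(),
               lambda layer, q: print(layer, q.scale, q.offset))
```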

[0028] In a second aspect, the present invention also provides a TinyML hardware acceleration method for smart metering, comprising the following steps:

[0029] S10. The hybrid quantization processing unit dequantizes and scales the input INT8 activation values and the preloaded INT4 weights;

[0030] S20. Using a dual-array separation architecture, the depthwise convolution array first extracts spatial features by 3×3 or 1×1 depthwise convolution; the pointwise convolution array then performs 1×1 convolution to complete channel fusion; and the activation unit applies a nonlinear transformation to the results;

[0031] S30. The hybrid quantization processing unit performs inverse quantization and scaling on the output, and the layout-optimized output cache temporarily stores the result in NHWC format and prepares it for output;

[0032] S40. Upon detecting the completion of an inference task, the power scheduling unit immediately initiates a shutdown sequence, sequentially shutting down the clock and power supply of the computing array, so that the system automatically falls back to the sleep state until the next task wakes it up.

[0033] Furthermore, in S20, the activation function includes one or more of ReLU, ReLU6, and Hardswish.
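Step S20 can be illustrated with a small pure-Python model of one depthwise separable block on a batch-1 NHWC feature map: a per-channel 3×3 depthwise convolution (stride 1, zero padding 1), a 1×1 pointwise convolution for channel fusion, then ReLU6. It is a functional sketch under those assumptions, not the arrays' dataflow; the quantization steps S10 and S30 are omitted, and the function names are hypothetical.

```python
FeatureMap = list[list[list[int]]]  # [H][W][C], i.e. NHWC with batch size 1


def depthwise_conv3x3(x: FeatureMap, k: list[list[list[int]]]) -> FeatureMap:
    """Per-channel 3x3 convolution (k is [C][3][3]); stride 1, zero padding 1."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    out = [[[0] * C for _ in range(W)] for _ in range(H)]
    for h in range(H):
        for w in range(W):
            for c in range(C):
                acc = 0
                for dy in range(3):
                    for dx in range(3):
                        yy, xx = h + dy - 1, w + dx - 1
                        if 0 <= yy < H and 0 <= xx < W:  # zero padding at borders
                            acc += x[yy][xx][c] * k[c][dy][dx]
                out[h][w][c] = acc
    return out


def pointwise_conv(x: FeatureMap, w_pw: list[list[int]]) -> FeatureMap:
    """1x1 convolution (w_pw is [C_out][C_in]): mixes channels at every pixel."""
    return [[[sum(wo[ci] * px[ci] for ci in range(len(px))) for wo in w_pw]
             for px in row] for row in x]


def relu6(v: int) -> int:
    return min(max(v, 0), 6)


def separable_block(x: FeatureMap, k_dw: list[list[list[int]]],
                    w_pw: list[list[int]]) -> FeatureMap:
    """S20 in miniature: depthwise (spatial) -> pointwise (channel fusion) -> ReLU6."""
    y = depthwise_conv3x3(x, k_dw)
    y = pointwise_conv(y, w_pw)
    return [[[relu6(v) for v in px] for px in row] for row in y]


if __name__ == "__main__":
    ident = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    x = [[[1, 2] for _ in range(3)] for _ in range(3)]
    print(separable_block(x, [ident, ident], [[1, 1], [1, -1]])[0][0])
```

Keeping the two convolutions as separate stages is what lets a design dedicate one array to each and power-gate whichever is idle.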

[0034] The TinyML hardware acceleration system and method for smart metering provided by this invention have at least the following advantages compared with the prior art:

[0035] Existing technologies suffer from high power consumption, low computational utilization, and low computational efficiency when addressing the TinyML inference acceleration needs of battery-powered IoT edge devices such as smart metering and smart meter reading. This invention addresses these issues through a power-off lightweight array, hybrid quantization acceleration, caching optimized for metering features, and three-level power management. Its dual-array, power-off lightweight computing structure for TinyML provides hardware acceleration for depthwise separable convolution (which splits a convolution into a depthwise convolution and a pointwise convolution and is the core operator of lightweight networks), achieving extreme energy efficiency in smart metering scenarios. Its INT4/INT8 hybrid quantization hardware processing unit requires no software intervention, and its operator-level autonomous hardware execution architecture eliminates the need for MCU intervention. Compared with a general-purpose NPU, it reduces computational redundancy by over 70%, power consumption by 90%, and area by 60%. Compared with software inference, it offers a 5-10x speed improvement. It supports lightweight TinyML models and can be integrated directly into edge metering SoCs, effectively reducing power consumption and significantly improving computational utilization and efficiency.

Attached Figure Description

[0036] To more clearly illustrate the solution of the present invention, a brief introduction will be given to the drawings used in the description of the embodiments below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0037] Figure 1 is an overall architecture diagram of a TinyML hardware acceleration system for smart metering provided in an embodiment of the present invention;

[0038] Figure 2 is a block diagram of the power-off lightweight computing array of a TinyML hardware acceleration system for smart metering provided in an embodiment of the present invention;

[0039] Figure 3 is a block diagram of the INT4/INT8 hybrid quantization processing unit of a TinyML hardware acceleration system for smart metering provided in an embodiment of the present invention;

[0040] Figure 4 is a schematic diagram of the hierarchical caching and data layout of a TinyML hardware acceleration system for smart metering provided in an embodiment of the present invention;

[0041] Figure 5 is a state transition diagram of the power scheduling unit of a TinyML hardware acceleration system for smart metering provided in an embodiment of the present invention;

[0042] Figure 6 is a flowchart of a TinyML hardware acceleration method for smart metering provided in an embodiment of the present invention.

Detailed Implementation

[0043] To facilitate understanding of the present invention, a more complete description is given below with reference to the accompanying drawings, which show preferred embodiments of the invention. However, the invention can be implemented in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure of the invention will be thorough and complete.

[0044] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

[0045] This invention provides a TinyML hardware acceleration system for smart metering, applicable to smart metering and smart meter reading scenarios. The TinyML hardware acceleration system for smart metering includes:

[0046] The system comprises: an instruction control unit for parsing inference instructions and scheduling the execution of each operator; a power-off lightweight computing array connected to the instruction control unit, including a depthwise convolution array for performing depthwise convolution operations, a pointwise convolution array for performing pointwise convolution operations, and an activation unit for performing activation function operations, wherein the depthwise convolution array, the pointwise convolution array, and the activation unit all support independent clock shutdown and power shutdown; a hybrid quantization processing unit connected to the power-off lightweight computing array for performing quantization, dequantization, scaling, and offset operations for INT4 and INT8; a hierarchical cache unit connected to the hybrid quantization processing unit and the power-off lightweight computing array, including a weight cache, a feature cache, and an output cache; and a power scheduling unit connected to the instruction control unit, the power-off lightweight computing array, the hybrid quantization processing unit, and the hierarchical cache unit for implementing power management for each unit according to the inference task status.

[0047] This invention addresses the problems of high power consumption, low computational utilization, and low computing efficiency in existing technologies when addressing the TinyML inference acceleration needs of battery-powered IoT edge devices such as smart metering and smart meter reading.

[0048] To enable those skilled in the art to better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.

[0049] This invention provides a TinyML hardware acceleration system for smart metering, applicable to smart metering and smart meter reading scenarios. With reference to Figures 1 to 5, the TinyML hardware acceleration system for smart metering in this embodiment includes:

[0050] The instruction control unit is used to parse inference instructions and schedule the execution flow of each operator.

[0051] The power-off lightweight computing array supports independent shutdown at the module level and is connected to the instruction control unit. It includes a depthwise convolution array, a pointwise convolution array, and an activation unit: the depthwise convolution array performs depthwise convolution operations, the pointwise convolution array performs pointwise convolution operations, and the activation unit performs activation function operations. Each of the three supports independent clock and power shutdown. Specifically, the power-off lightweight computing array adopts a dual-array separation architecture to efficiently accelerate the two core stages of depthwise separable convolution: the depthwise convolution array is dedicated to spatial filtering and supports small convolution kernels such as 3×3 and 1×1, while the pointwise convolution array is dedicated to 1×1 convolution to improve the computational efficiency of channel fusion. The arrays are followed by a dedicated activation unit that supports common TinyML activation functions such as ReLU, ReLU6, and Hardswish. The core energy-efficiency feature of this design is that the depthwise convolution array, the pointwise convolution array, and the activation unit all support independent clock and power shutdown, coordinated by the power scheduling unit. The corresponding circuits therefore enter a low-power state automatically whenever they have no computing task, greatly reducing static power consumption.
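Why a dedicated depthwise/pointwise pair pays off can be seen from the standard multiply-accumulate (MAC) count. For an H × W output, a standard K × K convolution costs H·W·C_in·C_out·K² MACs, while the depthwise-plus-pointwise pair costs H·W·C_in·K² + H·W·C_in·C_out, a ratio of 1/C_out + 1/K². The helper below (hypothetical names; stride 1 and "same" padding assumed) checks this for an illustrative 16 × 16, 32-channel layer.

```python
def standard_conv_macs(H: int, W: int, c_in: int, c_out: int, K: int) -> int:
    """MACs of a standard KxK convolution producing an H x W x c_out output."""
    return H * W * c_in * c_out * K * K


def separable_conv_macs(H: int, W: int, c_in: int, c_out: int, K: int) -> int:
    """MACs of a KxK depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = H * W * c_in * K * K  # one KxK filter per input channel
    pointwise = H * W * c_in * c_out  # 1x1 channel fusion
    return depthwise + pointwise


if __name__ == "__main__":
    std = standard_conv_macs(16, 16, 32, 32, 3)
    sep = separable_conv_macs(16, 16, 32, 32, 3)
    print(std, sep, sep / std, 1 / 32 + 1 / 9)
```

For that layer the separable form needs about 14% of the MACs of a standard 3 × 3 convolution. This is the generic MobileNet-style argument, not a derivation of the patent's 70% redundancy figure.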

[0052] The hybrid quantization processing unit, designed specifically for low-bit computation, is connected to the power-off lightweight computing array and efficiently performs quantization, dequantization, scaling, and offset operations for INT4 and INT8. Specifically, the INT4/INT8 hybrid quantization processing unit of this embodiment achieves extreme energy efficiency through mixed-precision inference: it supports mixed-precision computation with INT4 weights and INT8 activation values. Its core hardware circuitry includes a quantization multiplier, an efficient adder tree, and a shift scaling circuit, enabling automatic, seamless quantization and dequantization at the hardware level without software intervention. Compared with traditional all-INT8 solutions, this design can reduce the model storage footprint by 30% to 50% and computational power consumption by approximately 40%.
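One way to read the quoted 30% to 50% storage reduction is the following, assuming an all-INT8 baseline of B bytes in which a fraction f of the bytes holds weights that move to INT4 while the rest (for example offsets, scaling factors or layers kept at INT8) is unchanged. The patent does not break the model down this way; f and B are illustrative symbols.

```latex
S_{\mathrm{INT8}} = B, \qquad
S_{\mathrm{mixed}} = (1 - f)\,B + \frac{f}{2}\,B = \left(1 - \frac{f}{2}\right) B,
\qquad
\text{saving} = 1 - \frac{S_{\mathrm{mixed}}}{S_{\mathrm{INT8}}} = \frac{f}{2}.
```

Under this reading, a 30% to 50% reduction corresponds to f between 0.6 and 1.0, with 50% as the ceiling reached only when every stored byte is an INT4 weight.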

[0053] The hierarchical cache unit, whose data layout is optimized for the small feature dimensions of metering scenarios, is connected to the hybrid quantization processing unit and the power-off lightweight computing array. It includes a weight cache, a feature cache, and an output cache to maximize data reuse and reduce memory access. Specifically, this embodiment significantly improves energy efficiency through hierarchical caching and data layout optimization. For the small data dimensions typical of smart metering scenarios (8-64 channels, 8×8 to 32×32 feature maps), storage access is optimized in depth: feature data is arranged compactly in NHWC format to improve access locality; weights are grouped and stored by convolution kernel to ensure read continuity; and the output cache connects directly to the subsequent communication interface for efficient data throughput. After optimization, the system cache hit rate is no less than 95%, significantly reducing SRAM access frequency and power consumption.

[0054] The power scheduling unit, connected to the instruction control unit, the power-off lightweight computing array, the hybrid quantization processing unit, and the hierarchical cache unit, manages the power consumption of each unit according to the inference task state (clock gating, power gating, and fast sleep and wake-up strategies). Specifically, the power scheduling unit designed in this embodiment adopts a refined three-stage power management strategy to achieve seamless switching from deep sleep to full-speed operation and extreme energy efficiency. During periods without tasks, the system enters a sleep state, where all modules except the necessary wake-up detection circuit are turned off, and the static power consumption is less than 1μW. When a task signal is detected, the system switches to a standby state, supplying power only to the core control unit and cache, maintaining power consumption below 10μW to prepare for rapid response. After entering the inference state, high-power modules such as the computing array are powered on as needed to perform calculations. After the task is completed, the system automatically and quickly falls back to the sleep state. This strategy ensures that the chip always operates at the lowest power consumption state matching its task load, thereby meeting the stringent requirements of battery-powered devices for constant low power consumption and instantaneous high performance.

[0055] This invention also provides a TinyML hardware acceleration method for smart metering, applied to the TinyML hardware acceleration system for smart metering described in the above embodiments. With reference to Figures 1 to 6, the TinyML hardware acceleration method for smart metering in this embodiment includes the following steps:

[0056] S10. When the input feature data is sensed, the system first enters the quantization preprocessing stage: the hybrid quantization processing unit dequantizes and scales the input INT8 activation value and the preloaded INT4 weights to prepare for subsequent calculations.

[0057] Specifically, in this embodiment, before starting the calculation of each layer of operators, the instruction control unit automatically extracts the quantization scaling factor and offset parameter that match the layer from the weight cache and configures them to the hybrid quantization processing unit, thereby achieving fully automatic precision alignment at the hardware level without the need for intervention from an external microcontroller (MCU).

[0058] S20. Next, the core computation stage begins, using a dual-array separation architecture: the depthwise convolution array first performs 3×3 or 1×1 depthwise convolution on the spatial features to extract local features; the pointwise convolution array then performs 1×1 convolution to complete channel fusion; and the results are nonlinearly transformed by the activation unit (supporting ReLU, ReLU6, Hardswish, etc.).

[0059] S30. After the calculation is completed, the result output stage begins: the hybrid quantization processing unit performs inverse quantization and scaling on the output, and the layout-optimized output cache temporarily stores the result in NHWC format and prepares it for output.

[0060] S40. Finally, the power management phase begins: After the power scheduling unit detects that the inference task has been completed, it immediately starts the shutdown sequence, turning off the computing array clock and power supply in sequence, so that the system automatically falls back to a sleep state with power consumption below 1μW until the next task wakes it up.

[0061] Specifically, in this embodiment, the power scheduling unit is connected to an external wake-up pin. When the system is in a sleep state with power consumption below 1μW, the external microcontroller sends a level trigger signal through this pin. After recognizing the signal, the power scheduling unit immediately turns on the clock and power domains of the computing array, achieving a microsecond-level wake-up response.

[0062] The entire process is completed automatically under hardware scheduling, realizing fully pipelined processing from data input, parallel computing to result output, and maintaining extremely low static power consumption between tasks.

[0063] Example 1

[0064] Smart metering anomaly detection scenario

[0065] System configuration: core control is handled by a Cortex-M0+ microcontroller running at 32MHz, while the AI acceleration task is handled by the dedicated hardware acceleration structure proposed in this embodiment. The accelerator is designed in a 40nm process with a chip area of 0.26mm², meeting the requirements for system-on-chip (SoC) integration. The algorithm running on the system is a lightweight anomaly detection network composed primarily of depthwise separable convolution and LSTM units. To achieve extreme energy efficiency, the model uses a hybrid quantization strategy, with weights at INT4 precision and activation values at INT8 precision. The entire system operates at 1.2V and is designed specifically for battery-powered scenarios, achieving high-efficiency real-time intelligent processing.

[0066] Implementation Process: This system operates under strict low-power cycles while powered by battery. Upon power-up, the hardware acceleration structure enters a sleep state, with a static power consumption of only 0.7μW. During operation, the meter collects voltage, current, and power characteristics every 10 seconds, and then the MCU writes this characteristic data into the accelerator's input buffer via the bus. The power scheduling unit then detects the task and wakes the acceleration structure from sleep mode. Next, the hardware automatically executes the complete inference pipeline: first, the data undergoes mixed quantization preprocessing, followed by depthwise convolution, pointwise convolution, and activation function calculation. After inference is complete, the hardware directly outputs an anomaly judgment result (0 for normal, 1 for anomaly). This result is directly sent to the communication module for reporting. Simultaneously, the acceleration structure automatically shuts down the computing unit power and immediately returns to sleep mode, awaiting the next cycle's task.
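To see how the sleep power and the task period interact, the average power of this duty cycle can be estimated as below. The description gives the 0.7μW sleep power, the 10-second period and an inference-state budget below 5mW, but not the inference duration, so the 2 ms active time in the usage example is purely hypothetical, and the brief standby phase is ignored. `average_power_uw` is an illustrative helper.

```python
def average_power_uw(sleep_uw: float, active_mw: float,
                     active_ms: float, period_s: float) -> float:
    """Average power (uW) of a periodic task: active for active_ms every period_s.

    Assumes the device is asleep for the rest of each period and ignores the
    short standby/wake-up transitions.
    """
    active_fraction = (active_ms / 1000.0) / period_s
    active_uw = active_mw * 1000.0
    return active_fraction * active_uw + (1.0 - active_fraction) * sleep_uw


if __name__ == "__main__":
    # 0.7 uW sleep and 10 s period from the example; 5 mW is the stated upper
    # bound for the inference state; the 2 ms inference time is hypothetical.
    print(average_power_uw(0.7, 5.0, 2.0, 10.0))
```

With those assumed numbers the active bursts contribute about 1μW, so the average is roughly 1.7μW: at this duty cycle, inference energy and sleep leakage are of the same order, which is why both the sleep power and the inference speed matter.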

[0067] Existing technologies suffer from high power consumption, low computational utilization, and low computational efficiency when addressing the TinyML inference acceleration needs of battery-powered IoT edge devices such as smart metering and smart meter reading. Compared with them, the TinyML hardware acceleration system and method for smart metering described in the above embodiments use a power-off lightweight array, hybrid quantization acceleration, caching optimized for metering features, and three-level power management to provide a dual-array, power-off lightweight computing structure for TinyML. It supports hardware acceleration of depthwise separable convolution (which splits a convolution into a depthwise convolution and a pointwise convolution and is the core operator of lightweight networks), achieving extreme energy efficiency for smart metering scenarios. Its INT4/INT8 hybrid quantization hardware processing unit requires no software intervention, and its operator-level autonomous hardware execution architecture eliminates the need for MCU intervention. Compared with a general-purpose NPU, computational redundancy is reduced by more than 70%, power consumption by 90%, and area by 60%. Compared with software inference, speed is increased by 5-10 times. The design supports lightweight TinyML models and can be integrated directly into edge metering SoCs, effectively reducing power consumption and significantly improving computational utilization and efficiency.

[0068] Obviously, the embodiments described above are merely preferred embodiments of the present invention, not all embodiments. The accompanying drawings illustrate preferred embodiments of the present invention but do not limit the scope of the patent. The present invention can be implemented in many different forms; these embodiments are provided so that the disclosure of the present invention will be more thorough and complete. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent substitutions for some of their technical features. Any equivalent structure made using the content of this specification and drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this invention.

Claims

1. A smart metering oriented TinyML hardware acceleration system, characterized by comprising: an instruction control unit, configured to parse inference instructions, schedule the execution of each operator, and, before an operator is executed, automatically fetch quantization parameters from the cache and configure them into the hybrid quantization processing unit; a power-gatable lightweight computing array, connected to the instruction control unit and comprising a depthwise convolution array, a pointwise convolution array, and an activation unit, wherein the depthwise convolution array performs depthwise convolution, the pointwise convolution array performs pointwise convolution, the activation unit performs activation function operations, and each of the depthwise convolution array, the pointwise convolution array, and the activation unit supports independent clock shutdown and power shutdown; a hybrid quantization processing unit, connected to the power-gatable lightweight computing array and configured to perform INT4 and INT8 quantization, dequantization, scaling, and offset operations; a hierarchical cache unit, connected to the hybrid quantization processing unit and the power-gatable lightweight computing array and comprising a weight cache, a feature cache, and an output cache; and a power scheduling unit, connected to each of the instruction control unit, the power-gatable lightweight computing array, the hybrid quantization processing unit, and the hierarchical cache unit, and configured to manage the power consumption of each unit according to the status of the inference task.

2. The TinyML hardware acceleration system for smart metering of claim 1, wherein the depthwise convolution array supports 3×3 and 1×1 convolution kernels, and the activation unit supports the ReLU, ReLU6, and Hardswish activation functions.

3. The smart metering oriented TinyML hardware acceleration system of claim 1, wherein the hybrid quantization processing unit comprises a quantization multiplier, an efficient adder tree, and a shift-scaling circuit, which perform quantization and dequantization in hardware.
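
Claim 3 names a quantization multiplier, an adder tree, and a shift-scaling circuit without fixing how they interact. One common arrangement, assumed here as a simplified form of gemmlowp/TensorFlow Lite style integer requantization and not confirmed by the claim, is sketched below: an adder tree reduces the products to one wide accumulator, the real-valued rescale factor is approximated as an integer multiplier divided by 2^shift, an offset is added, and the result is clamped to INT8. The function names and parameter values are hypothetical.

```python
def adder_tree(values: list[int]) -> int:
    """Reduce products pairwise, level by level, as a binary adder tree would."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd-sized level with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def requantize(acc: int, multiplier: int, shift: int, zero_point: int) -> int:
    """Scale acc by multiplier / 2**shift (shift >= 1), round, add offset, clamp to INT8."""
    scaled = (acc * multiplier + (1 << (shift - 1))) >> shift  # round half up
    return max(-128, min(127, scaled + zero_point))


if __name__ == "__main__":
    acc = adder_tree([10, 40, -15, -4])  # level 1: [50, -19], level 2: [31]
    # Hypothetical rescale factor 0.5, encoded as multiplier 16384 with shift 15.
    print(acc, requantize(acc, 16384, 15, 0))
```

Using a multiply followed by a right shift keeps the whole rescale in integer arithmetic, which is presumably why a shift-scaling circuit rather than a divider appears in the claim.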

4. The TinyML hardware acceleration system for smart metering of claim 3, wherein the hybrid quantization processing unit supports a mixed-precision computation mode in which weights use INT4 precision and activation values use INT8 precision.
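
Claim 4 pairs INT4 weights with INT8 activations. The sketch below shows one way such operands can be combined, under stated assumptions: two signed 4-bit weights are packed per byte with the low nibble first (an assumed packing order), weights are symmetric (zero point 0), activations are already zero-centered, and the helper names are hypothetical.

```python
def unpack_int4(byte: int) -> tuple[int, int]:
    """Split one byte into two signed INT4 weights, low nibble first (assumed order)."""
    def to_signed(nibble: int) -> int:
        return nibble - 16 if nibble >= 8 else nibble  # sign-extend 4-bit two's complement
    return to_signed(byte & 0x0F), to_signed((byte >> 4) & 0x0F)


def mixed_precision_dot(packed_weights: bytes, activations: list[int]) -> int:
    """INT4 weights times INT8 activations, accumulated in a wide integer."""
    weights = [w for b in packed_weights for w in unpack_int4(b)]
    if len(weights) != len(activations):
        raise ValueError("need exactly two activations per packed weight byte")
    if any(not -128 <= a <= 127 for a in activations):
        raise ValueError("activation out of INT8 range")
    return sum(w * a for w, a in zip(weights, activations))


if __name__ == "__main__":
    # Two bytes hold four INT4 weights: 0x21 -> (1, 2), 0xF3 -> (3, -1).
    print(mixed_precision_dot(bytes([0x21, 0xF3]), [10, 20, -5, 4]))
```

Packing two weights per byte halves weight storage relative to INT8, which matters for the on-chip weight cache of a small metering SoC.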

5. The TinyML hardware acceleration system for smart metering of claim 1, wherein, in the hierarchical cache unit, feature data are arranged in NHWC format, weight data are stored in groups by convolution kernel, and the output cache connects to the downstream communication interface.
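
Claim 5 fixes the feature layout as NHWC and stores weights grouped by convolution kernel. The offset arithmetic below (hypothetical helper names) suggests why that pairing is convenient: in NHWC the channel values of one pixel are adjacent, and in a kernel-grouped weight buffer each kernel's weights are adjacent, so both operands of a pointwise dot product can be read sequentially. The KH × KW × CI ordering inside each weight group is an assumption.

```python
def nhwc_offset(n: int, h: int, w: int, c: int, H: int, W: int, C: int) -> int:
    """Flat offset of element (n, h, w, c) in an NHWC tensor of shape (N, H, W, C)."""
    return ((n * H + h) * W + w) * C + c


def grouped_weight_offset(k: int, kh: int, kw: int, ci: int, KH: int, KW: int, CI: int) -> int:
    """Flat offset when each kernel's KH x KW x CI weights are stored as one contiguous group."""
    return ((k * KH + kh) * KW + kw) * CI + ci


if __name__ == "__main__":
    H, W, C = 4, 4, 8
    # All C channel values of pixel (h=1, w=2) form one contiguous run,
    # which is what a 1x1 pointwise convolution reads per output pixel.
    print([nhwc_offset(0, 1, 2, c, H, W, C) for c in range(C)])
    # Depthwise 3x3 kernels (CI = 1): kernel k occupies offsets 9k .. 9k + 8.
    print([grouped_weight_offset(2, kh, kw, 0, 3, 3, 1) for kh in range(3) for kw in range(3)])
```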

6. The TinyML hardware acceleration system for smart metering of claim 1, wherein the power scheduling unit adopts a three-level power management strategy comprising: a sleep state, in which all modules except the wake-up detection circuit are turned off and static power consumption is less than 1 μW; a standby state, in which the instruction control unit and the hierarchical cache unit are powered, with power consumption of less than 10 μW; and an inference state, in which the power-gatable lightweight computing array and the hybrid quantization processing unit are powered on as needed to perform computation, with power consumption of less than 5 mW.
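
The three power levels of claim 6 translate directly into an average-power budget once a duty cycle is known. The sketch below uses the claimed upper bounds per state; the duty cycle (one 10 ms inference and 1 ms of standby per 1 s metering period) and all names are hypothetical, so the result is only an illustrative upper bound.

```python
from enum import Enum


class PowerState(Enum):
    SLEEP = "sleep"          # only the wake-up detection circuit is on
    STANDBY = "standby"      # instruction control unit and hierarchical cache powered
    INFERENCE = "inference"  # computing array and quantization unit powered as needed


# Upper bounds stated in claim 6, in watts.
POWER_BOUND_W = {
    PowerState.SLEEP: 1e-6,
    PowerState.STANDBY: 10e-6,
    PowerState.INFERENCE: 5e-3,
}


def average_power_bound(seconds_in_state: dict[PowerState, float]) -> float:
    """Time-weighted upper bound on average power over one duty cycle."""
    total_s = sum(seconds_in_state.values())
    energy_j = sum(POWER_BOUND_W[state] * s for state, s in seconds_in_state.items())
    return energy_j / total_s


if __name__ == "__main__":
    # Hypothetical 1 s metering period: 10 ms inference, 1 ms standby, rest asleep.
    cycle = {PowerState.INFERENCE: 0.010, PowerState.STANDBY: 0.001, PowerState.SLEEP: 0.989}
    print(f"{average_power_bound(cycle) * 1e6:.1f} uW")
```

Under these assumptions the bound is about 51 μW, and roughly 98% of the energy is spent in the inference state, which is why fast wake-up and fast return to sleep dominate battery life.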

7. The TinyML hardware acceleration system for smart metering of claim 6, wherein, after detecting that the inference task has been completed, the power scheduling unit sequentially shuts down the clock and the power supply of the power-gatable lightweight computing array, returning the system to the sleep state.
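
Claim 7 says only that the clock and power supply are shut down "sequentially". The model below assumes one plausible ordering, common in power-gating practice: stop the clocks of all array sub-units first, then remove their power, then enter sleep. The class and function names are hypothetical, and the sub-unit list follows claim 1.

```python
class GatedUnit:
    """Hypothetical model of one sub-unit with independent clock and power gating."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clock_on = True
        self.power_on = True

    def gate_clock(self, log: list[str]) -> None:
        self.clock_on = False
        log.append(f"{self.name}: clock off")

    def gate_power(self, log: list[str]) -> None:
        # Enforce the assumed ordering: power is removed only after the clock stops.
        if self.clock_on:
            raise RuntimeError(f"{self.name}: gate the clock before removing power")
        self.power_on = False
        log.append(f"{self.name}: power off")


def shut_down_array(units: list[GatedUnit], log: list[str]) -> str:
    """On task completion: stop every clock, then remove power, then fall back to sleep."""
    for unit in units:
        unit.gate_clock(log)
    for unit in units:
        unit.gate_power(log)
    log.append("system: sleep")
    return "sleep"


if __name__ == "__main__":
    array = [GatedUnit("depthwise array"), GatedUnit("pointwise array"), GatedUnit("activation unit")]
    events: list[str] = []
    print(shut_down_array(array, events))
    print("\n".join(events))
```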

8. The TinyML hardware acceleration system for smart metering of claim 1, wherein the instruction control unit has automatic parameter-loading logic which, before the operator of each layer starts, automatically fetches from the hierarchical cache unit the quantization scaling factor and offset parameter corresponding to that layer and configures them into the hybrid quantization processing unit.
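
The per-layer loading of claim 8 amounts to a lookup-then-configure step placed before each operator launch. The sketch below models it in software; the parameter record, the register stand-in, and the scale and offset values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class QuantParams:
    scale: float     # quantization scaling factor for one layer
    zero_point: int  # offset parameter for one layer


class QuantUnitRegisters:
    """Hypothetical stand-in for the hybrid quantization unit's parameter registers."""

    def __init__(self) -> None:
        self.active: Optional[QuantParams] = None

    def configure(self, params: QuantParams) -> None:
        self.active = params


def run_model(layers: List[str], param_cache: Dict[str, QuantParams],
              regs: QuantUnitRegisters,
              execute_operator: Callable[[str, QuantUnitRegisters], None]) -> None:
    for layer in layers:
        # Fetch this layer's parameters and configure the quantization unit
        # before the layer's operator starts; no MCU call is involved.
        regs.configure(param_cache[layer])
        execute_operator(layer, regs)


if __name__ == "__main__":
    cache = {"dw1": QuantParams(0.02, 0), "pw1": QuantParams(0.05, -3)}  # made-up values
    regs = QuantUnitRegisters()
    run_model(["dw1", "pw1"], cache, regs,
              lambda layer, r: print(layer, "runs with", r.active))
```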

9. A method applied to the system of any one of claims 1 to 8, characterized by comprising the following steps:
S10: the hybrid quantization processing unit dequantizes and scales the input INT8 activation values and the preloaded INT4 weights;
S20: using a dual-array separated architecture, the depthwise convolution array first extracts spatial features by 3×3 or 1×1 depthwise convolution, the pointwise convolution array then performs 1×1 convolution to fuse the channels, and the activation unit applies a nonlinear activation function to the results;
S30: the hybrid quantization processing unit dequantizes and scales the output, and the layout-optimized output cache temporarily stores the result in NHWC format, ready for output;
S40: upon detecting that an inference task has completed, the power scheduling unit immediately initiates a shutdown sequence, shutting down the clock and the power supply of the computing array in sequence, so that the system automatically falls back to the sleep state until the next task wakes it.
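
To make the data flow of steps S10 to S40 concrete, a toy pure-Python model is sketched below. It is not the patented hardware: it computes in floating point after dequantization, models a single 3×3 depthwise plus 1×1 pointwise block with ReLU6, interprets the output scaling of S30 as a mapping back to INT8 for the output cache, and takes INT4 weights as already unpacked integers. The function name and tensor nesting are assumptions.

```python
def relu6(v: float) -> float:
    return min(max(v, 0.0), 6.0)


def run_inference(q_x, x_scale, x_zp, dw_q, dw_scale, pw_q, pw_scale, out_scale):
    """Toy single-block inference following steps S10 to S40.

    q_x:  INT8 input activations nested as [H][W][C] (NHWC with N = 1)
    dw_q: INT4 depthwise weights, one 3x3 kernel per channel: [C][3][3]
    pw_q: INT4 pointwise weights, one row per output channel: [C_out][C]
    """
    H, W, C = len(q_x), len(q_x[0]), len(q_x[0][0])
    # S10: dequantize INT8 activations and INT4 weights to real values.
    x = [[[x_scale * (q_x[h][w][c] - x_zp) for c in range(C)] for w in range(W)] for h in range(H)]
    dw = [[[dw_scale * v for v in row] for row in kernel] for kernel in dw_q]
    pw = [[pw_scale * v for v in row] for row in pw_q]
    # S20 (depthwise): 3x3, stride 1, zero padding; each channel uses its own kernel.
    d = [[[sum(x[h + i - 1][w + j - 1][c] * dw[c][i][j]
               for i in range(3) for j in range(3)
               if 0 <= h + i - 1 < H and 0 <= w + j - 1 < W)
           for c in range(C)] for w in range(W)] for h in range(H)]
    # S20 (pointwise + activation): 1x1 convolution fuses channels, then ReLU6.
    y = [[[relu6(sum(d[h][w][c] * pw[o][c] for c in range(C)))
           for o in range(len(pw))] for w in range(W)] for h in range(H)]
    # S30: rescale to INT8 and flatten in NHWC order (channels innermost).
    out = [max(-128, min(127, round(v / out_scale)))
           for h in range(H) for w in range(W) for v in y[h][w]]
    # S40: the task is complete; real hardware would now gate clocks and power.
    return out, "sleep"


if __name__ == "__main__":
    ones = [[[1] for _ in range(3)] for _ in range(3)]          # 3x3x1 input of ones
    box = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]                    # one all-ones 3x3 kernel
    print(run_inference(ones, 1.0, 0, box, 1.0, [[1]], 1.0, 1.0))
```

With an all-ones input and kernel, corner pixels sum 4 neighbors, edge pixels 6, and the center 9, which ReLU6 clips to 6; the zero padding and the clip are both visible in the flattened output.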

10. The method of claim 9, wherein, in step S20, the activation function comprises one or more of ReLU, ReLU6, and Hardswish.
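
Claims 2 and 10 name ReLU, ReLU6, and Hardswish. Their standard floating-point definitions are given below as a reference sketch; the hardware activation unit most likely operates on quantized integers, which this sketch does not model, and the dictionary of names is a hypothetical convenience. Each function reduces to comparisons, one addition, and multiplication by constants, which plausibly explains why all three fit in a small fixed-function activation unit.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def relu6(x: float) -> float:
    """ReLU clipped at 6."""
    return min(max(0.0, x), 6.0)


def hardswish(x: float) -> float:
    """x * ReLU6(x + 3) / 6: zero for x <= -3, identity for x >= 3."""
    return x * relu6(x + 3.0) / 6.0


ACTIVATIONS = {"ReLU": relu, "ReLU6": relu6, "Hardswish": hardswish}

if __name__ == "__main__":
    for name, f in ACTIVATIONS.items():
        print(name, [round(f(x), 3) for x in (-4.0, -1.0, 0.0, 1.0, 3.0, 7.0)])
```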