A Softmax operator splitting method for neural network inference chips
By applying normalized or non-normalized splitting on demand in the neural network inference chip, the method solves the numerical out-of-bounds problem under FP16 precision. It balances computational accuracy and efficiency at low cost, avoiding both the dependence on FP32 precision and the redundant calculations caused by full normalization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- JIUZHI (SUZHOU) INTELLIGENT TECH CO LTD
- Filing Date
- 2026-03-17
- Publication Date
- 2026-06-16
AI Technical Summary
In existing neural network inference chips that support only FP16 precision, numerical values are prone to going out of bounds when operators are split. Existing solutions either rely on FP32 precision, which raises cost and latency, or apply normalized splitting to every operator, which reduces inference efficiency.
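The out-of-bounds problem can be reproduced in a few lines. The following is a minimal sketch using NumPy's `float16` as a stand-in for the chip's FP16 arithmetic; the helper name `exp_fp16` is an assumption made for illustration and does not come from the patent.

```python
import numpy as np

# Largest finite FP16 value (65504). exp(x) exceeds it once x > ln(65504), about 11.09.
FP16_MAX = float(np.finfo(np.float16).max)
EXP_OVERFLOW_THRESHOLD = float(np.log(FP16_MAX))


def exp_fp16(values):
    """Evaluate exp on FP16 inputs and return an FP16 result (hypothetical helper)."""
    # Silence NumPy's overflow/underflow warnings so the inf values can be inspected.
    with np.errstate(over="ignore", under="ignore"):
        return np.exp(np.asarray(values, dtype=np.float16))


small = exp_fp16([1.0, 5.0, 11.0])  # inputs below the threshold stay finite
large = exp_fp16([12.0, 20.0])      # inputs above the threshold overflow to inf

print(EXP_OVERFLOW_THRESHOLD)
print(small, large)
```

Once any exponential in a Softmax row becomes `inf`, the division by the row sum can no longer produce a valid probability, which is why an unnormalized split is only safe when the input range is known to stay below this threshold.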
The method statistically analyzes the target data range of each layer's operator in the neural network and determines whether the out-of-bounds condition is met. It then performs normalized or non-normalized splitting as needed and generates configuration files to guide compilation. This avoids overflow and underflow of the exponential operation under FP16 precision and reduces redundant operations.
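The decision flow described above might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the out-of-bounds condition, the function names (`choose_split`, `softmax_fp16`, `build_config`), the layer names, and the JSON layout of the configuration are all hypothetical, and NumPy's `float16` stands in for the chip's arithmetic.

```python
import json
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)
EXP_MAX_INPUT = float(np.log(FP16_MAX))    # about 11.09
EXP_MIN_INPUT = float(np.log(2.0 ** -24))  # log of the smallest FP16 subnormal


def choose_split(calib_inputs):
    """Pick a split mode for one Softmax layer from calibration data (assumed rule)."""
    lo = min(float(np.min(x)) for x in calib_inputs)
    hi = max(float(np.max(x)) for x in calib_inputs)
    width = max(x.shape[-1] for x in calib_inputs)
    # Assumed condition: the row sum is at most width * exp(hi), so require
    # hi + ln(width) to stay below the overflow limit, and lo above the underflow limit.
    out_of_bounds = (hi + np.log(width) > EXP_MAX_INPUT) or (lo < EXP_MIN_INPUT)
    mode = "normalized" if out_of_bounds else "non_normalized"
    return {"min": lo, "max": hi, "split": mode}


def softmax_fp16(x, split):
    """Softmax evaluated in FP16, with or without the max-subtraction step."""
    x = np.asarray(x, dtype=np.float16)
    if split == "normalized":
        # Extra reduce-max and subtract: the cost that on-demand splitting avoids.
        x = x - np.max(x, axis=-1, keepdims=True)
    with np.errstate(over="ignore", under="ignore", invalid="ignore"):
        e = np.exp(x)
        return e / np.sum(e, axis=-1, keepdims=True)


def build_config(calib_by_layer):
    """Collect per-layer decisions into a dict that can be dumped for the compiler."""
    return {name: choose_split(data) for name, data in calib_by_layer.items()}


calibration = {
    "small_range_softmax": [np.array([[0.5, 1.0, 2.0]])],
    "attn_softmax": [np.array([[3.0, 15.0, 20.0]])],
}
config = build_config(calibration)
print(json.dumps(config, indent=2))
```

In this sketch only `attn_softmax` pays for the max-subtraction, because its calibrated range exceeds the FP16 exponential limit; the narrow-range layer keeps the cheaper unnormalized form.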
The method balances the accuracy and efficiency of the computation under FP16 precision, avoids dependence on FP32, reduces hardware cost and latency, and improves computational efficiency.