Floating-point number conversion method and device
A floating-point number conversion method and device, applied in the computer field, that address the problem of low neural network training efficiency.
Examples
Embodiment 1
[0078] Figure 2 is a flow chart of the floating-point number conversion method provided by Embodiment 1 of the present invention. As shown in Figure 2, the floating-point number conversion method provided by this embodiment includes the following steps:
[0079] S101. Obtain the value of the first sign segment, the value of the first exponent segment, and the value of the first mantissa segment of the first floating-point number. The first floating-point number is a single-precision floating-point number in normalized data format, that is, a normalized single-precision floating-point number based on the IEEE 754 specification.
[0080] Generally, a normalized single-precision floating-point number based on the IEEE 754 specification is represented as:
[0081] $A = (-1)^{S} \times 2^{E1-127} \times 1.F$,
[0082] where E1 is the value of the exponent segm...
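Step S101 can be illustrated with a minimal Python sketch that splits a single-precision number into its three IEEE 754 fields and then re-evaluates the normalized formula from them. The function names `decompose_float32` and `recompose` are hypothetical and are not taken from the patent; the sketch covers normalized numbers only.

```python
import struct


def decompose_float32(x):
    """Split a float32 into (sign, biased exponent, mantissa) fields."""
    # Reinterpret the 32-bit IEEE 754 pattern as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1-bit sign segment S
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent segment E1 (biased by 127)
    mantissa = bits & 0x7FFFFF       # 23-bit mantissa segment F
    return sign, exponent, mantissa


def recompose(sign, exponent, mantissa):
    """Evaluate (-1)^S * 2^(E1 - 127) * 1.F for a normalized number."""
    return (-1) ** sign * 2.0 ** (exponent - 127) * (1 + mantissa / 2 ** 23)


s, e1, f = decompose_float32(0.125)
print(s, e1, f)             # sign, biased exponent, mantissa fields
print(recompose(s, e1, f))  # reconstructs the original value
```

For 0.125, which equals 2 to the power -3, the biased exponent field is 124 and the mantissa field is 0, so the effective exponent is 124 - 127 = -3.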
Embodiment 2
[0099] In the floating-point number conversion method provided by Embodiment 2 of the present invention, on the basis of Embodiment 1 above, the step of determining the value of the regime segment and the value of the second exponent segment from the value of the first exponent segment and the preset exponent bit width may specifically include:
[0100] S201. Determine the value of the regime segment by using the value of the first exponent segment and the preset exponent bit width.
[0101] Specifically, this embodiment uses the following formula to determine the value of the regime segment:
[0102] $r = \lfloor E / 2^{es} \rfloor$,
[0103] where r represents the value of the regime segment, rounded down when the quotient is not an integer; E represents the value of the first exponent segment; and es represents the preset exponent bit width.
[0104] Taking the number whose true value is 0.125 as an example, it is expressed in the form of floating-point scientific...
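The formula for r can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: E is taken to be the effective (unbiased) exponent E1 - 127, which the visible text does not confirm; the remainder e follows the standard Posit definition of the exponent field rather than anything shown above; and the name `split_exponent` and the es values used are hypothetical.

```python
def split_exponent(E, es):
    """Return (r, e) such that E == r * 2**es + e and 0 <= e < 2**es.

    E  : effective exponent (assumed here to be E1 - 127)
    es : preset exponent bit width
    """
    scale = 1 << es        # 2**es
    r = E // scale         # floor division rounds down, also for negative E
    e = E - r * scale      # remainder (assumed second exponent segment value)
    return r, e


# 0.125 = 2**-3, so the effective exponent is -3.
print(split_exponent(-3, 1))
print(split_exponent(-3, 2))
```

Python's `//` operator rounds toward negative infinity, which matches the "rounded down" requirement for negative exponents; integer division in languages such as C truncates toward zero instead and would need an explicit adjustment.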
Embodiment 3
[0113] In the floating-point number conversion method provided by Embodiment 3 of the present invention, on the basis of the above embodiments, the step of forming the second floating-point number as a binary code of the preset total bit width from the value of the second sign segment, the value of the regime segment, the value of the second exponent segment, and the value of the second mantissa segment specifically includes:
[0114] S301. Using the value of the regime segment, determine the binary code corresponding to that value.
[0115] For floating-point numbers in the Posit data format, the regime segment r has a variable width. In the data representation, the regime segment is encoded in one of two ways: a run of consecutive 1s followed by a 0, such as 111...0, or a run of consecutive 0s followed by a 1, such as 000...1. For the true value r of the regime segment, ...
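The two run-length forms can be sketched in Python as follows. The mapping from r to run length is the standard Posit convention and is assumed here, since the text above is cut off before stating it: a non-negative r is written as r + 1 ones followed by a 0, and a negative r as -r zeros followed by a 1. The name `encode_regime` is hypothetical.

```python
def encode_regime(r):
    """Return the regime bit string for regime value r (standard Posit rule)."""
    if r >= 0:
        # r + 1 consecutive ones, terminated by a single zero.
        return "1" * (r + 1) + "0"
    # -r consecutive zeros, terminated by a single one.
    return "0" * (-r) + "1"


for r in (-2, -1, 0, 2):
    print(r, encode_regime(r))
```

The sketch ignores the preset total bit width. In a complete encoder the regime string would have to fit in the bits remaining after the sign bit, and the terminating bit may be omitted when the run fills all of them.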