A data encoding method and related apparatus
By using a volume-preserving flow model whose layers support multiplication while remaining reversible, the method overcomes the poor representation ability of integer flow models and the irreversibility of general flow models. It achieves a higher compression ratio with fewer encoding operations, improving the efficiency and accuracy of lossless compression.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: HUAWEI TECH CO LTD
- Filing Date: 2021-02-27
- Publication Date: 2026-06-19
AI Technical Summary
In existing lossless compression techniques, integer flow models are limited to integer addition and subtraction, which gives them poor representation capability: they cannot accurately estimate the data distribution, and their compression ratio is low. Furthermore, general flow models cannot achieve numerical invertibility in discrete space, which makes the algorithms inefficient.
By adopting a volume-preserving flow model whose target volume-preserving flow layer performs multiplication while remaining reversible, combined with entropy coding, the method represents the data distribution better, is numerically reversible, needs fewer encoding operations, and improves compression ratio and throughput.
It achieves higher compression ratios with fewer encoding operations, improving the efficiency and effectiveness of lossless compression and ensuring that data encoding and decoding are reversible and accurate.
Figure: abstract drawing (CN114978189B)
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence, and more particularly to a data encoding method and related equipment.
Background
[0002] The core of lossless compression is finding the distribution patterns within data. For example, the letter 'e' appears much more frequently than 'z' in English documents. If 'e' is stored using fewer bits, the document's storage length can be shortened, thus achieving document compression. Artificial intelligence (AI) lossless compression is a new technological field that uses artificial intelligence for lossless compression. Its core is using AI to find better distribution patterns within data and leveraging these patterns for compression, aiming to achieve a higher lossless compression ratio.
[0003] In lossless compression, the input data and the latent variable output used for encoding must be discrete, and the mapping between them must be exactly reversible. This directly limits which data encoding and decoding methods can be used, because most of them introduce numerical errors when performing floating-point operations, which makes exact inversion impossible.
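To make the integer-only idea concrete, here is a minimal sketch of an additive coupling step that stays on the integer grid, in the spirit of the integer discrete flow described in the next paragraph. The half-and-half split and the function `toy_t`, a hypothetical stand-in for a neural network, are assumptions for illustration. Because the shift is rounded to an integer before it is added, the inverse subtracts exactly the same integer and recovers x with no numerical error. Note that the only operation applied to the data itself is integer addition.

```python
from typing import Callable, List

Shift = Callable[[List[int]], List[float]]

def idf_forward(x: List[int], t: Shift) -> List[int]:
    """Integer additive coupling: z_a = x_a, z_b = x_b + round(t(x_a)). Assumes an even-length x."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    shift = [round(v) for v in t(xa)]           # rounding keeps the shift on the integer grid
    return xa + [b + s for b, s in zip(xb, shift)]

def idf_inverse(z: List[int], t: Shift) -> List[int]:
    """Exact inverse: z_a equals x_a, so the same integer shift is recomputed and subtracted."""
    half = len(z) // 2
    za, zb = z[:half], z[half:]
    shift = [round(v) for v in t(za)]
    return za + [b - s for b, s in zip(zb, shift)]

# Hypothetical stand-in for the neural network t(.)
def toy_t(xa: List[int]) -> List[float]:
    return [0.37 * v + 1.9 for v in xa]

x = [3, -2, 7, 5]
z = idf_forward(x, toy_t)
print(z, idf_inverse(z, toy_t) == x)
```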
[0004] In one existing implementation, lossless compression is achieved with an integer discrete flow (IDF) model. To avoid floating-point errors and keep the flow model numerically reversible, the model applies only integer addition and subtraction to its input data during computation, so both the input data x and the latent variable output z = f(x) are integers, which guarantees that f^{-1}(f(x)) = x. However, because encoding and decoding are limited to integer addition and subtraction, IDF has poor representation ability and cannot accurately estimate the data distribution, resulting in a low compression ratio.
Summary of the Invention
[0005] In a first aspect, this application provides a data encoding method, the method comprising:
[0006] Obtain the data to be encoded;
[0007] The data to be encoded can be image, video, or text data.
[0008] Taking image data as an example, the image can be an image captured by the terminal device through a camera, or it can be an image obtained from within the terminal device (e.g., an image stored in the terminal device's photo album, or an image obtained by the terminal device from the cloud). It should be understood that the image can be an image that requires image compression, and this application does not limit the source of the image to be processed.
[0009] The data to be encoded is processed by a volume-preserving flow model to obtain a latent variable output. The volume-preserving flow model includes a target volume-preserving flow layer; the operation corresponding to the target volume-preserving flow layer is an invertible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to multiply the first data input to it by a preset coefficient, where the preset coefficient is not 1;
[0010] The target volume-preserving flow layer may also be called the target volume-preserving coupling layer;
[0011] The volume-preserving flow constraint means that the input space and the output space of the operation corresponding to the volume-preserving flow layer have the same volume. Equal volume implies a one-to-one correspondence between input data and output data: different output data correspond to different input data. To ensure that the operation corresponding to the target volume-preserving flow layer satisfies this constraint, the product of the coefficients of the first-order terms in the operation must be 1. Specifically, the first data and the preset coefficients are vectors: the first data includes N elements, the preset coefficients include N coefficients, and the N elements correspond one-to-one with the N coefficients. The N coefficients are the coefficients of the first-order terms in the operation corresponding to the target volume-preserving flow layer, and their product is 1.
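Written out for the coupling form used in this section (first data x_1 multiplied elementwise by coefficients s computed from the second data x_2, plus an optional constant term t), the constraint says the Jacobian determinant equals 1. The symbols x_1, x_2, s, and t are our notation. The block marked ∗ does not affect the determinant because the Jacobian is block upper triangular.

```latex
\begin{aligned}
z_1 &= s \odot x_1 + t(x_2), \qquad z_2 = x_2, \qquad s = s(x_2),\\
\frac{\partial (z_1, z_2)}{\partial (x_1, x_2)} &=
\begin{pmatrix} \operatorname{diag}(s_1, \dots, s_N) & \ast \\ 0 & I \end{pmatrix},\\
\det \frac{\partial (z_1, z_2)}{\partial (x_1, x_2)} &= \prod_{i=1}^{N} s_i = 1 .
\end{aligned}
```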
[0012] The term "reversible operation" refers to an operation that can both obtain output data from input data and deduce input data from output data. For example, if the input data is x and the output data is z = f(x), x can be recovered from the output data z through the inverse operation.
[0013] The latent variable output is encoded to obtain encoded data.
[0014] In this embodiment of the application, the latent variable output z follows a probability distribution p_Z(z), and the latent variable output z is encoded according to p_Z(z) to obtain the encoded data.
[0015] In one alternative implementation, the encoded data is a binary bitstream. The probability estimate of each point in the latent variable output can be obtained using an entropy estimation network. The latent variable output is then entropy encoded using this probability estimate to obtain the binary bitstream. It should be noted that the entropy encoding process mentioned in this application can use existing entropy encoding techniques, which will not be elaborated upon here.
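As an illustration only: the bitstream an entropy coder produces approaches the sum of -log2 p(z_i) over the latent elements, so a sharper, more accurate probability estimate yields a shorter bitstream. The sketch below assumes, hypothetically, that the entropy estimation network outputs a mean and scale per element of a discretized Gaussian; the text does not specify the distribution family, and the numbers are made up.

```python
import math
from typing import Sequence

def discretized_gaussian_pmf(k: int, mu: float, sigma: float) -> float:
    """Probability of integer k under N(mu, sigma^2), integrated over [k - 0.5, k + 0.5]."""
    def cdf(v: float) -> float:
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    return cdf(k + 0.5) - cdf(k - 0.5)

def ideal_code_length_bits(z: Sequence[int], mus: Sequence[float], sigmas: Sequence[float]) -> float:
    """Sum of -log2 p(z_i): the length an ideal entropy coder approaches."""
    return sum(-math.log2(discretized_gaussian_pmf(k, m, s)) for k, m, s in zip(z, mus, sigmas))

z = [0, 1, -1]
mus = [0.0, 1.0, -1.0]
print(ideal_code_length_bits(z, mus, [0.5] * 3))  # sharp estimate: fewer bits
print(ideal_code_length_bits(z, mus, [3.0] * 3))  # flat estimate: more bits
```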
[0016] This application uses a volume-preserving flow model to achieve lossless compression. Compared with the integer flow model, the target volume-preserving flow layer in the volume-preserving flow model remains reversible while also supporting an operation other than integer addition and subtraction, namely multiplication. This gives the volume-preserving flow model stronger representation capability, so it can estimate the data distribution more accurately and thereby achieve a better compression ratio.
[0017] On the other hand, for general flow models it can be shown that no method achieves numerical invertibility in discrete space, because numerical errors always produce cases in which one latent variable corresponds to multiple input data. Eliminating these errors then requires multiple encoding operations, which makes the algorithm inefficient. The volume-preserving flow model in this embodiment instead uses a numerically invertible target volume-preserving flow layer, so its operations are numerically invertible. While the model keeps strong representation capability, compression needs only a very small number of encoding operations, yielding higher compression throughput and a higher compression ratio.
[0018] In one possible implementation, the first data and the preset coefficients are vectors, the first data includes N elements, the preset coefficients include N coefficients, the N elements of the first data correspond one-to-one with the N coefficients, and the product of the N coefficients is 1; the multiplication operation between the first data and the preset coefficients includes:
[0019] Perform a multiplication operation on each element in the first data with its corresponding coefficient to obtain the product result.
[0020] In one possible implementation, the method further includes: processing the second data input to the target volume-preserving flow layer with a first neural network to obtain a first network output, and performing a preset operation on the first network output to obtain the preset coefficient. In one implementation, the preset operation is exponentiation with the natural constant e as the base.
[0021] The first data and the second data are two parts of the input data. For example, if the input data is a vector [A, B], then the first data is vector [A] and the second data is vector [B].
[0022] In one possible implementation, the first network output is a vector comprising N elements, and the preset operation on the output of the first neural network includes:
[0023] Obtain the average of the N elements included in the first network output, and subtract the average from each element included in the first network output to obtain the processed N elements;
[0024] Each of the N processed elements is subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which include N coefficients.
[0025] To ensure that the product of the N coefficients in the preset coefficients is 1, the average of each element in the first network output can be subtracted. Specifically, the first network output is a vector containing N elements. The average of the N elements in the first network output can be obtained, and the average can be subtracted from each element in the first network output to obtain the processed N elements. Each of the processed N elements is then subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which consist of N coefficients.
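A minimal sketch of the mean-subtraction step, with a made-up first-network output `h`. Since the centered values sum to zero, the exponentials multiply to e^0 = 1, up to floating-point rounding.

```python
import math
from typing import List

def coefficients_from_network_output(h: List[float]) -> List[float]:
    """Turn a first-network output h (N elements) into N positive coefficients whose product is 1."""
    mean = sum(h) / len(h)
    centered = [v - mean for v in h]          # the centered values sum to 0
    return [math.exp(v) for v in centered]    # exp(a) * exp(b) * ... = exp(a + b + ...) = exp(0) = 1

h = [0.3, -1.2, 2.0]                          # made-up first-network output
s = coefficients_from_network_output(h)
print(s, math.prod(s))                        # product is 1 up to rounding
```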
[0026] In one possible implementation, the output of the target volume-preserving flow layer includes the second data.
[0027] In one possible implementation, the target volume-preserving flow layer is further used to add a constant term to the product of the first data and the preset coefficient, where the constant term is not 0.
[0028] In one possible implementation,
[0029] The method further includes:
[0030] The second data input to the target volume-preserving flow layer is processed by a second neural network to obtain the constant term.
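The coupling structure described above (coefficients computed from the second data by a first network followed by mean subtraction and exponentiation, a constant term from a second network, and the second data passed through unchanged) can be sketched in floating point as follows. `toy_s` and `toy_t` are hypothetical stand-ins for the two neural networks. In floating point the round trip is only approximately exact, which is the problem the fixed-point, remainder-carrying computation in the following paragraphs addresses; the inverse shown here (subtract the constant term, then divide) mirrors the decoding steps described later.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
Net = Callable[[Vec], Vec]

def _coefficients(h: Vec) -> Vec:
    mean = sum(h) / len(h)
    return [math.exp(v - mean) for v in h]        # product of the coefficients is 1

def coupling_forward(x1: Vec, x2: Vec, net_s: Net, net_t: Net) -> Tuple[Vec, Vec]:
    """z1 = s(x2) * x1 + t(x2); the second data x2 is output unchanged."""
    s = _coefficients(net_s(x2))
    t = net_t(x2)                                  # constant term from the second network
    z1 = [si * xi + ti for si, xi, ti in zip(s, x1, t)]
    return z1, x2

def coupling_inverse(z1: Vec, z2: Vec, net_s: Net, net_t: Net) -> Tuple[Vec, Vec]:
    """x1 = (z1 - t(z2)) / s(z2); possible because z2 equals x2."""
    s = _coefficients(net_s(z2))
    t = net_t(z2)
    x1 = [(zi - ti) / si for zi, ti, si in zip(z1, t, s)]
    return x1, z2

# Hypothetical stand-ins for the first and second neural networks
def toy_s(x2: Vec) -> Vec:
    return [0.5 * v for v in x2]

def toy_t(x2: Vec) -> Vec:
    return [v - 1.0 for v in x2]

z1, z2 = coupling_forward([1.0, 2.0], [0.4, -0.8], toy_s, toy_t)
print(z1, coupling_inverse(z1, z2, toy_s, toy_t))
```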
[0031] In one possible implementation, the first data and the preset coefficients are vectors. The first data includes N elements, and the preset coefficients include N coefficients. The N elements of the first data correspond one-to-one with the N coefficients. The N elements include a first target element and a second target element. The first target element corresponds to a first target coefficient, and the second target element corresponds to a second target coefficient. The multiplication operation between the first data and the preset coefficients includes:
[0032] Obtain the first fixed-point number corresponding to the first target element and the second fixed-point number corresponding to the second target element;
[0033] Obtain the first fraction corresponding to the first target coefficient and the second fraction corresponding to the second target coefficient. The first fraction includes a first numerator and a first denominator, and the second fraction includes a second numerator and a second denominator. The first numerator, the first denominator, the second numerator, and the second denominator are integers, and the first denominator is equal to the second numerator;
[0034] Multiply the first fixed-point number with the first numerator to obtain the first result;
[0035] The first result is divided by the first denominator to obtain a second result, which includes a first quotient and a first remainder. The first quotient is used as the result of multiplying the first target element and the first target coefficient.
[0036] Multiply the second fixed-point number with the second numerator to obtain the third result;
[0037] The third result is added to the first remainder result to obtain the fourth result;
[0038] The fourth result is divided by the second denominator to obtain the fifth result, which includes the second quotient and the second remainder. The second quotient is used as the result of multiplying the second target element and the second target coefficient.
[0039] In this embodiment, the reversible computation problem is solved with division with remainder. The coefficient of each linear term is converted into a fraction whose numerator equals the denominator of the previous dimension's fraction. For each dimension, the data is multiplied by the numerator of the current coefficient and the remainder from the previous dimension is added; the sum is then divided by the denominator with remainder to obtain the result for the current dimension. The remainder of this division is passed on to the next dimension, which eliminates numerical error.
[0040] For example, the first data x may be the fixed-point values [44/16, 55/16, 66/16], where 16 is the fixed-point scale (precision) and does not take part in the multiplication; the fixed-point integers of the first data are therefore x = [44, 55, 66]. The preset coefficient s is [0.65, 0.61, 2.52], and the corresponding fractions are [2/3, 3/5, 5/2]. Here, the first fixed-point number is 44, the second fixed-point number is 55, the first target coefficient is 0.65, the second target coefficient is 0.61, the first fraction is 2/3, and the second fraction is 3/5; thus the first numerator is 2, the first denominator is 3, the second numerator is 3, and the second denominator is 5. Multiplying the first fixed-point number (44) by the first numerator (2) gives the first result (88). Dividing the first result (88) by the first denominator (3) gives the second result, which consists of the first quotient (29) and the first remainder (1); the first quotient (29) is used as the product of the first target element and the first target coefficient. Multiplying the second fixed-point number (55) by the second numerator (3) gives the third result (165). Adding the first remainder (1) to the third result (165) gives the fourth result (166). Dividing the fourth result (166) by the second denominator (5) gives the fifth result, which consists of the second quotient (33) and the second remainder (1); the second quotient (33) is used as the product of the second target element and the second target coefficient.
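The worked example above can be reproduced with a short sketch. The function name and the `carry_in` parameter (the remainder received from a preceding volume-preserving flow layer, 0 for the layer that reads the data to be encoded) are our own choices. Two assumptions of this sketch: the fractions must chain (each denominator equals the next numerator), and for exact inversion the incoming remainder should be smaller than the first numerator.

```python
from typing import List, Tuple

def multiply_with_remainder(x: List[int], fracs: List[Tuple[int, int]],
                            carry_in: int = 0) -> Tuple[List[int], int]:
    """Exactly invertible integer version of z_i ~= x_i * (n_i / d_i).

    x: fixed-point integers of the first data
    fracs: (numerator, denominator) pairs; denominator i must equal numerator i + 1
    carry_in: remainder handed over by a preceding layer (0 for the first layer)
    Returns the products and the final remainder, which is passed on to the next layer.
    """
    for (_, d), (n_next, _) in zip(fracs, fracs[1:]):
        assert d == n_next, "each denominator must equal the next numerator"
    z, r = [], carry_in
    for xi, (n, d) in zip(x, fracs):
        q, r = divmod(xi * n + r, d)   # add the previous remainder, then divide with remainder
        z.append(q)
    return z, r

z, r = multiply_with_remainder([44, 55, 66], [(2, 3), (3, 5), (5, 2)])
print(z, r)  # [29, 33, 165] 1
```

The third element, not worked out in the text, follows the same rule: 66 × 5 + 1 = 331 = 2 × 165 + 1, so its product is 165 and the remainder 1 is what the layer outputs.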
[0041] In one possible implementation, the second target element is the last of the N elements to be multiplied by its corresponding coefficient during the multiplication of the first data by the preset coefficient, and the target volume-preserving flow layer is further used to output the second remainder. Specifically, the target volume-preserving flow layer can output the second remainder to the next volume-preserving flow layer adjacent to it. That is, each element of the first data produces a remainder by the method above, which is fed into the computation for the next element, until the multiplication for the last element of the first data is complete; the final remainder can then be fed into the next adjacent volume-preserving flow layer.
[0042] In one possible implementation, the target volume-preserving flow layer is further configured to output the second remainder to the next volume-preserving flow layer adjacent to the target volume-preserving flow layer.
[0043] In one possible implementation, the volume-preserving flow model further includes a first volume-preserving flow layer, which is the volume-preserving flow layer adjacent to the target volume-preserving flow layer. Multiplying the first fixed-point number by the first numerator to obtain the first result includes:
[0044] Obtain the remainder output by the first volume-preserving flow layer;
[0045] Multiply the first fixed-point number by the first numerator, and add the remainder output by the first volume-preserving flow layer to the product to obtain the first result.
[0046] In one implementation, if the target volume-preserving flow layer is the layer of the volume-preserving flow model that directly processes the data to be encoded, the first result is the product of the first fixed-point number and the first numerator. If the target volume-preserving flow layer is not that layer (that is, it processes the output of another intermediate layer rather than the data to be encoded), the first result is the sum of the product of the first fixed-point number and the first numerator and the remainder output by the adjacent preceding volume-preserving flow layer.
[0047] In one possible implementation, the volume-preserving flow model includes M serially connected volume-preserving flow layers, including the target volume-preserving flow layer. The output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is an integer from 2 to M. The input of the 1st volume-preserving flow layer is the data to be encoded, and the output of the M-th volume-preserving flow layer is the latent variable output. In other words, the volume-preserving flow model can be a stack of multiple volume-preserving flow layers.
[0048] In one possible implementation, the volume-preserving flow model further includes a target convolutional layer connected to the target volume-preserving flow layer. The output of the target volume-preserving flow layer is used as the input of the target convolutional layer, and the target convolutional layer is used to multiply the output of the target volume-preserving flow layer by a weight matrix.
[0049] In one possible implementation, multiplying the output of the target volume-preserving flow layer by the weight matrix includes:
[0050] Obtain the weight matrix;
[0051] Perform LU decomposition on the weight matrix to obtain a first matrix, a second matrix, a third matrix, and a fourth matrix, where the first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix;
[0052] Multiply the output of the target volume-preserving flow layer by the fourth matrix to obtain a sixth result;
[0053] The sixth result is multiplied by the third matrix to obtain the seventh result;
[0054] The seventh result is multiplied by the second matrix to obtain the eighth result;
[0055] Multiply the eighth result by the first matrix to obtain a ninth result, which is used as the result of multiplying the output of the target volume-preserving flow layer by the weight matrix.
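As a numerical check of the factorization and of the multiplication order in [0052] to [0055], the pure-Python sketch below performs a standard LU decomposition with partial pivoting and then splits the diagonal of the upper factor into a separate diagonal matrix, giving W = P·L·D·U. It works in floating point only. Giving both triangular factors unit diagonals is our assumption; the text states only that the third matrix's diagonal elements multiply to 1. The weight matrix is made up and chosen so that |det W| = 1.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def plu_with_diagonal(w: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Factor W = P @ L @ D @ U with partial pivoting (W assumed invertible).

    P: permutation, L: unit lower triangular, D: diagonal, U: unit upper triangular.
    """
    n = len(w)
    a = [row[:] for row in w]
    perm = list(range(n))                     # perm[k] = original row now at position k
    lower = [[0.0] * n for _ in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))   # pivot row
        a[k], a[p] = a[p], a[k]
        lower[k], lower[p] = lower[p], lower[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lower[i][k] = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= lower[i][k] * a[k][j]
    for i in range(n):
        lower[i][i] = 1.0
    # a is now upper triangular: move its diagonal into D, leaving a unit upper U
    diag = [[a[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[a[i][j] / a[i][i] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    pmat = [[0.0] * n for _ in range(n)]
    for k in range(n):
        pmat[perm[k]][k] = 1.0                # undoes the row swaps: W = pmat @ (L @ D @ U)
    return pmat, lower, diag, upper

w = [[1.0, 2.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 0.5]]    # made-up weight matrix
P, L, D, U = plu_with_diagonal(w)
x = [[1.0], [-2.0], [4.0]]                                   # column vector
# Order used in the text: fourth matrix (U) first, then D, then L, then P.
y = matmul(P, matmul(L, matmul(D, matmul(U, x))))
print(y, matmul(w, x))                                       # agree up to rounding
print(D[0][0] * D[1][1] * D[2][2])                           # product of the diagonal
```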
[0056] In this embodiment, the multiplication by the weight matrix in the target convolutional layer is transformed into successive multiplications by an upper triangular matrix, a diagonal matrix, a lower triangular matrix, and a permutation matrix. These four multiplications are computed, respectively, by iterative computation, by the same numerical computation used in the volume-preserving coupling layer, by iterative computation, and by element rearrangement. A reversible computation method is provided for each of the four, so the target convolutional layer as a whole is computed in a numerically reversible way.
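The two iterative computations and the element rearrangement can be sketched on integer data as follows. The rounding rule is an assumption borrowed from integer flows (round the off-diagonal contribution to the nearest integer); the text does not state how the iteration is carried out. The sketch also assumes the triangular factors have unit diagonals, with all scaling kept in the diagonal matrix; that diagonal step would use the remainder-carrying fraction method of the coupling layer and is omitted here.

```python
from typing import List

Matrix = List[List[float]]

def unit_upper_forward(x: List[int], u: Matrix) -> List[int]:
    """Integer stand-in for U @ x: y_i = x_i + round(sum_{j>i} U_ij x_j)."""
    n = len(x)
    return [x[i] + round(sum(u[i][j] * x[j] for j in range(i + 1, n))) for i in range(n)]

def unit_upper_inverse(y: List[int], u: Matrix) -> List[int]:
    """Iterate from the last row upward; every x_j with j > i is already recovered."""
    n = len(y)
    x = [0] * n
    for i in reversed(range(n)):
        x[i] = y[i] - round(sum(u[i][j] * x[j] for j in range(i + 1, n)))
    return x

def unit_lower_forward(x: List[int], l: Matrix) -> List[int]:
    """Integer stand-in for L @ x: y_i = x_i + round(sum_{j<i} L_ij x_j)."""
    return [x[i] + round(sum(l[i][j] * x[j] for j in range(i))) for i in range(len(x))]

def unit_lower_inverse(y: List[int], l: Matrix) -> List[int]:
    """Iterate from the first row downward; every x_j with j < i is already recovered."""
    x = [0] * len(y)
    for i in range(len(y)):
        x[i] = y[i] - round(sum(l[i][j] * x[j] for j in range(i)))
    return x

def permute(x: List[int], perm: List[int]) -> List[int]:
    """Element rearrangement y[k] = x[perm[k]]: no arithmetic, trivially reversible."""
    return [x[p] for p in perm]

def unpermute(y: List[int], perm: List[int]) -> List[int]:
    x = [0] * len(y)
    for k, p in enumerate(perm):
        x[p] = y[k]
    return x

u = [[1.0, 0.5, -0.25], [0.0, 1.0, 0.75], [0.0, 0.0, 1.0]]   # made-up unit upper factor
l = [[1.0, 0.0, 0.0], [0.4, 1.0, 0.0], [-1.3, 0.2, 1.0]]     # made-up unit lower factor
perm = [2, 0, 1]
x = [7, -3, 12]
y = permute(unit_lower_forward(unit_upper_forward(x, u), l), perm)
print(y, unit_upper_inverse(unit_lower_inverse(unpermute(y, perm), l), u))
```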
[0057] In one possible implementation, the volume-preserving flow model includes M volume-preserving flow layers and M convolutional layers connected in series. The M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include the target convolutional layer. The output of the i-th volume-preserving flow layer is used as the input of the i-th convolutional layer (for i from 1 to M), and the output of the i-th convolutional layer is used as the input of the (i+1)-th volume-preserving flow layer (for i from 1 to M-1). The input of the 1st volume-preserving flow layer is the data to be encoded, and the output of the M-th convolutional layer is the latent variable output. In other words, the volume-preserving flow model can be a stack of alternating volume-preserving flow layers and convolutional layers.
[0058] In a second aspect, this application provides a data decoding method, the method comprising:
[0059] Obtain encoded data;
[0060] In this embodiment of the application, after the encoded data is obtained, it can be sent to a terminal device for decompression, and the device performing decompression obtains the encoded data and decompresses it. Alternatively, the terminal device that performed compression can store the encoded data in a storage device and, when needed, retrieve it from the storage device and decompress it.
[0061] It should be understood that the decoding device can also obtain the remainder result as described in the above embodiments.
[0062] The encoded data is decoded to obtain the latent variable output;
[0063] In this embodiment of the application, the decoding device can decode the encoded data to obtain the latent variable output.
[0064] Specifically, an existing entropy decoding technique can be used to decode the encoded data and obtain the reconstructed latent variable output.
[0065] The latent variable output is processed by the volume-preserving flow model to obtain the decoded output. The volume-preserving flow model includes a target volume-preserving flow layer; the operation corresponding to the target volume-preserving flow layer is an invertible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to multiply the first data input to it by a preset coefficient, where the preset coefficient is not 1.
[0066] The term "reversible operation" refers to an operation that can both obtain output data from input data and deduce input data from output data. For example, if the input data is x and the output data is z = f(x), x can be recovered from the output data z through the inverse operation.
[0067] In this embodiment of the application, after the latent variable output is obtained, it can be processed with the inverse of the operation corresponding to each layer of the volume-preserving flow model to restore the original data to be encoded (that is, the decoded output), thereby achieving lossless decompression.
[0068] In one possible implementation, the volume-preserving flow constraint includes: the input space and the output space of the operation corresponding to the volume-preserving flow layer have the same volume.
[0069] In one possible implementation, the first data and the preset coefficients are vectors, the first data includes N elements, the preset coefficients include N coefficients, the N elements of the first data correspond one-to-one with the N coefficients, and the product of the N coefficients is 1; the division operation between the first data and the preset coefficients includes:
[0070] Perform a division operation on each element in the first data with its corresponding coefficient to obtain the division result.
[0071] In this embodiment of the application, to ensure that the operation corresponding to the target volume-preserving flow layer satisfies the volume-preserving flow constraint, the product of the coefficients of the first-order terms in that operation needs to be 1. Specifically, the first data and the preset coefficients are vectors: the first data includes N elements, the preset coefficients include N coefficients, and the N elements correspond one-to-one with the N coefficients. The N coefficients are the coefficients of the first-order terms in the operation corresponding to the target volume-preserving flow layer, and their product is 1.
[0072] In one possible implementation, the method further includes: processing the second data input to the target volume-preserving flow layer with a first neural network to obtain a first network output, and performing a preset operation on the first network output to obtain the preset coefficient.
[0073] In one possible implementation, the first network output is a vector comprising N elements, and the preset operation on the output of the first neural network includes:
[0074] Obtain the average of the N elements included in the first network output, and subtract the average from each element included in the first network output to obtain the processed N elements;
[0075] Each of the N processed elements is subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which include N coefficients.
[0076] To ensure that the product of the N coefficients in the preset coefficients is 1, the average of each element in the first network output can be subtracted. Specifically, the first network output is a vector containing N elements. The average of the N elements in the first network output can be obtained, and the average can be subtracted from each element in the first network output to obtain the processed N elements. Each of the processed N elements is then subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which consist of N coefficients.
[0077] In one possible implementation, the output of the target volume-preserving flow layer includes the second data.
[0078] In one possible implementation, the target volume-preserving flow layer is further used to subtract a constant term from the first data to obtain a subtraction result, where the constant term is not 0;
[0079] The division operation between the first data and the preset coefficient includes:
[0080] Perform a division operation between the subtraction result and the preset coefficient.
[0081] In one possible implementation, the method further includes:
[0082] The second data input to the target volume-preserving flow layer is processed by a second neural network to obtain the constant term.
[0083] In one possible implementation, the first data and the preset coefficients are vectors. The first data includes N elements, and the preset coefficients include N coefficients. The N elements of the first data correspond one-to-one with the N coefficients. The N elements include a first target element and a second target element. The first target element corresponds to a first target coefficient, and the second target element corresponds to a second target coefficient. The division operation between the first data and the preset coefficients includes:
[0084] Obtain the first fixed-point number corresponding to the first target element and the second fixed-point number corresponding to the second target element;
[0085] Obtain the first fraction corresponding to the first target coefficient and the second fraction corresponding to the second target coefficient. The first fraction includes a first numerator and a first denominator, and the second fraction includes a second numerator and a second denominator. The first numerator, the first denominator, the second numerator, and the second denominator are integers, and the first numerator is equal to the second denominator;
[0086] Multiply the first fixed-point number with the first denominator to obtain the first result;
[0087] The first result is divided by the first numerator to obtain a second result, which includes a first quotient and a first remainder. The first quotient is used as the result of the division between the first target element and the first target coefficient.
[0088] Multiply the second fixed-point number with the second denominator to obtain the third result;
[0089] The third result is added to the first remainder result to obtain the fourth result;
[0090] Divide the fourth result by the second numerator to obtain a fifth result, which includes a second quotient and a second remainder. The second quotient is used as the result of dividing the second target element by the second target coefficient.
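Decoding undoes the encoding example by walking the elements in the reverse of the encoding order, which is what the condition "the first numerator and the second denominator are the same" implies when the decoder's first target element is the encoder's last. A minimal sketch under that assumption, reusing the (numerator, denominator) pairs of the encoding example; the function name and `carry_in` are our own.

```python
from typing import List, Tuple

def divide_with_remainder(z: List[int], fracs: List[Tuple[int, int]],
                          carry_in: int) -> Tuple[List[int], int]:
    """Invert the chained-fraction multiplication: recover x_i ~= z_i * (d_i / n_i).

    Elements are processed from last to first, so carry_in is the remainder the
    encoder output last. Each step multiplies by the denominator, adds the
    remainder, then divides by the numerator with remainder.
    """
    x = [0] * len(z)
    r = carry_in
    for i in reversed(range(len(z))):
        n, d = fracs[i]
        x[i], r = divmod(z[i] * d + r, n)
    return x, r   # r is the remainder handed back toward the preceding layer

x, r = divide_with_remainder([29, 33, 165], [(2, 3), (3, 5), (5, 2)], carry_in=1)
print(x, r)  # [44, 55, 66] 0
```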
[0091] In one possible implementation, the second target element is the last of the N elements to be divided by its corresponding coefficient during the division of the first data by the preset coefficient, and the target volume-preserving flow layer is further used to output the second remainder.
[0092] In one possible implementation, the volume-preserving flow model further includes a first volume-preserving flow layer, which is the volume-preserving flow layer adjacent to the target volume-preserving flow layer. Multiplying the first fixed-point number by the first denominator to obtain the first result includes:
[0093] Obtain the remainder output by the first volume-preserving flow layer;
[0094] Multiply the first fixed-point number by the first denominator, and add the remainder output by the first volume-preserving flow layer to the product to obtain the first result.
[0095] In one possible implementation, the volume-preserving flow model includes M serially connected volume-preserving flow layers, including the target volume-preserving flow layer. The output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is an integer from 2 to M. The input of the 1st volume-preserving flow layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0096] In one possible implementation, the volume-preserving flow model further includes a target convolutional layer connected to the target volume-preserving flow layer. The output of the target convolutional layer is the first data, and the target convolutional layer is used to perform a division operation between its input data and the weight matrix, that is, to multiply the input data by the inverse of the weight matrix.
[0097] In one possible implementation, the division operation between the input data and the weight matrix includes:
[0098] Obtain the weight matrix;
[0099] Perform LU decomposition on the weight matrix to obtain a first matrix, a second matrix, a third matrix, and a fourth matrix, where the first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix;
[0100] The input data is multiplied by the inverse of the first matrix to obtain the sixth result;
[0101] The sixth result is multiplied by the inverse of the second matrix to obtain the seventh result;
[0102] The seventh result is multiplied by the inverse of the third matrix to obtain the eighth result;
[0103] The eighth result is multiplied by the inverse of the fourth matrix to obtain the ninth result, which is used as the result of the division operation between the input data and the weight matrix.
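The order of the four decoding steps can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the weight matrix is stored directly in factored form W = P·L·D·U, that the lower and upper triangular factors have unit diagonals so that all scaling sits in the diagonal factor D, and that the permutation is stored as an index array; `conv_forward` and `conv_inverse` are hypothetical names. The sketch works in floating point, so it shows only the factor order and the use of substitution instead of an explicit inverse, not the numerically invertible integer scheme.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def conv_forward(perm: List[int], L: Matrix, d: Vector, U: Matrix, x: Vector) -> Vector:
    """Encoding side: y = P L D U x, applied factor by factor (U, then D, then L, then P)."""
    h = matvec(U, x)                        # fourth matrix (upper triangular)
    h = [di * hi for di, hi in zip(d, h)]   # third matrix (diagonal, product of entries 1)
    h = matvec(L, h)                        # second matrix (lower triangular)
    return [h[p] for p in perm]             # first matrix (permutation): (P h)[i] = h[perm[i]]

def conv_inverse(perm: List[int], L: Matrix, d: Vector, U: Matrix, y: Vector) -> Vector:
    """Decoding side: x = U^-1 D^-1 L^-1 P^-1 y, without forming W^-1."""
    n = len(y)
    h = [0.0] * n
    for i, p in enumerate(perm):            # P^-1: put each element back in place
        h[p] = y[i]
    for i in range(n):                      # L^-1: forward substitution (unit diagonal)
        h[i] -= sum(L[i][j] * h[j] for j in range(i))
    h = [hi / di for hi, di in zip(h, d)]   # D^-1: elementwise division
    for i in reversed(range(n)):            # U^-1: back substitution (unit diagonal)
        h[i] -= sum(U[i][j] * h[j] for j in range(i + 1, n))
    return h

perm = [2, 0, 1]
L = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [-0.25, 0.75, 1.0]]
d = [2.0, 0.5, 1.0]
U = [[1.0, 0.3, -0.2], [0.0, 1.0, 0.4], [0.0, 0.0, 1.0]]
x = [1.0, -2.0, 3.0]
y = conv_forward(perm, L, d, U, x)
x_rec = conv_inverse(perm, L, d, U, y)
```

Solving the triangular systems by substitution keeps each step cheap and mirrors the step-by-step inverse described above.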
[0104] The second target element is the last of the N elements to be divided by its corresponding coefficient during the division operation between the first data and the preset coefficients, and the target volume-preserving flow layer is further used to output the second remainder result. Specifically, the target volume-preserving flow layer may output the second remainder result to the next volume-preserving flow layer adjacent to it. That is, each element of the first data produces a remainder result in the manner described above, and that remainder is fed into the calculation for the next element until the division for the last element of the first data is completed; the remainder result at that point is then passed to the adjacent next volume-preserving flow layer.
[0105] In one possible implementation, the volume-preserving flow model includes M sequentially connected volume-preserving flow layers and M convolutional layers. The M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include the target convolutional layer. The output of the i-th convolutional layer is used as the input of the i-th volume-preserving flow layer, and the output of the i-th volume-preserving flow layer is used as the input of the (i+1)-th convolutional layer, where i is a positive integer not greater than M. The input of the first convolutional layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0106] First, the input data may be multiplied by the inverse of the first matrix to obtain a sixth result; the sixth result may be multiplied by the inverse of the second matrix to obtain a seventh result; the seventh result may be multiplied by the inverse of the third matrix to obtain an eighth result; and the eighth result may be multiplied by the inverse of the fourth matrix to obtain a ninth result, which is used as the result of the division operation between the input data and the weight matrix. For how the inverse operation of the target convolutional layer is performed, refer to the description of that inverse operation in the embodiment corresponding to Figure 3; details are not repeated here.
[0107] This application uses a volume-preserving flow model to achieve lossless compression. Compared with an integer flow model, the target volume-preserving flow layer, while remaining reversible, performs an operation other than integer addition and subtraction, namely multiplication. This gives the volume-preserving flow model a stronger representation capability, so that it can estimate the data distribution more accurately and thereby achieve a better compression ratio.
[0108] On the other hand, for general flow models it can be proven that numerical invertibility cannot be achieved in discrete space, because numerical errors always produce cases in which one latent variable corresponds to multiple input data. In such cases, multiple encoding operations must be performed to eliminate the numerical errors, which makes the algorithm inefficient. In contrast, the volume-preserving flow model in the embodiments of this application uses a numerically invertible target volume-preserving flow layer, so that its operations are numerically invertible. While the model retains a strong representation capability, compression requires only a very small number of encoding operations, achieving higher compression throughput and a higher compression ratio.
[0109] In a third aspect, this application provides a data encoding apparatus, the apparatus including:
[0110] an acquisition module, configured to acquire data to be encoded;
[0111] a volume-preserving flow module, configured to process the data to be encoded by using a volume-preserving flow model to obtain a latent variable output, where the volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is an invertible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to multiply first data input to the target volume-preserving flow layer by a preset coefficient, the preset coefficient not being 1; and
[0112] an encoding module, configured to encode the latent variable output to obtain encoded data.
[0113] This application uses a volume-preserving flow model to achieve lossless compression. Compared with an integer flow model, the target volume-preserving flow layer, while remaining reversible, performs an operation other than integer addition and subtraction, namely multiplication. This gives the volume-preserving flow model a stronger representation capability, so that it can estimate the data distribution more accurately and thereby achieve a better compression ratio.
[0114] On the other hand, for general flow models it can be proven that numerical invertibility cannot be achieved in discrete space, because numerical errors always produce cases in which one latent variable corresponds to multiple input data. In such cases, multiple encoding operations must be performed to eliminate the numerical errors, which makes the algorithm inefficient. In contrast, the volume-preserving flow model in the embodiments of this application uses a numerically invertible target volume-preserving flow layer, so that its operations are numerically invertible. While the model retains a strong representation capability, compression requires only a very small number of encoding operations, achieving higher compression throughput and a higher compression ratio.
[0115] In one possible implementation, the volume-preserving flow constraint includes: the input space and the output space of the operation corresponding to the volume-preserving flow layer have the same volume.
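As a short worked check of this constraint, assume an elementwise layer $y = c \odot x + b$ in which $c$ and $b$ are held fixed (for example, computed from data the layer does not modify). Its Jacobian is diagonal, so the volume of any region $A$ is scaled by the absolute product of the coefficients, and requiring equal volume amounts to the condition that the product of the N coefficients is 1 (taking $c_i > 0$):

$$
y = c \odot x + b
\;\Rightarrow\;
\frac{\partial y}{\partial x} = \operatorname{diag}(c_1, \dots, c_N),
\qquad
\operatorname{vol}\big(f(A)\big) = \Big|\prod_{i=1}^{N} c_i\Big| \operatorname{vol}(A) = \operatorname{vol}(A)
\iff \prod_{i=1}^{N} c_i = 1 .
$$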
[0116] In one possible implementation, the first data and the preset coefficients are vectors, the first data includes N elements, the preset coefficients include N coefficients, the N elements of the first data correspond one-to-one with the N coefficients, and the product of the N coefficients is 1; the multiplication operation between the first data and the preset coefficients includes:
[0117] Perform a multiplication operation on each element in the first data with its corresponding coefficient to obtain the product result.
[0118] In one possible implementation, the volume-preserving flow module is used to process the second data input to the target volume-preserving flow layer through a first neural network to obtain a first network output, and to perform a preset operation on the first network output to obtain the preset coefficient.
[0119] In one possible implementation, the first network output is a vector comprising N elements. The acquisition module is configured to acquire the average of the N elements in the first network output and subtract the average from each element in the first network output to obtain the processed N elements.
[0120] Each of the N processed elements is subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which include N coefficients.
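A minimal Python sketch of the coefficient computation in [0119] and [0120], assuming the first network output is a plain list of floats; the helper name and the sample values are hypothetical. Subtracting the mean makes the N processed elements sum to zero, so the product of their exponentials is $e^0 = 1$, up to floating-point rounding.

```python
import math
from typing import List

def coefficients_from_network_output(h: List[float]) -> List[float]:
    """Hypothetical helper: map N network outputs to N positive coefficients with product 1."""
    mean = sum(h) / len(h)                  # average of the N elements
    centered = [v - mean for v in h]        # processed elements, which now sum to 0
    return [math.exp(v) for v in centered]  # base-e exponential; product = exp(0) = 1

h = [0.3, -1.2, 2.0, 0.5]                   # hypothetical first-network output
c = coefficients_from_network_output(h)
print(math.prod(c))                         # approximately 1.0 (floating-point rounding)
```

Because every coefficient is an exponential, all coefficients are positive, which also keeps the elementwise multiplication invertible.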
[0121] In one possible implementation, the output of the target volume-preserving flow layer includes the second data.
[0122] In one possible implementation, the target volume-preserving flow layer is further used to add a constant term to the product of the first data and the preset coefficient, where the constant term is not 0.
[0123] In one possible implementation, the volume-preserving flow module is used to process the second data input to the target volume-preserving flow layer through a second neural network to obtain the constant term.
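One way to read [0118] to [0123] together is as an affine coupling layer: the second data passes through unchanged, and a first and a second network computed from it supply the coefficients and the constant term. The Python sketch below assumes that reading; `first_net` and `second_net` are hypothetical stand-ins for the two neural networks. Because the second data is unchanged, the decoder can recompute the same coefficients and constant term, then subtract and divide. In floating point this round trip is only approximate, which is exactly the numerical-invertibility problem that the fraction-based fixed-point scheme addresses.

```python
import math
from typing import List, Tuple

Vec = List[float]

def first_net(x_b: Vec) -> Vec:
    # Hypothetical stand-in for the first neural network (N = 3 outputs).
    return [0.5 * x_b[0], -0.2 * x_b[1], 0.1 * (x_b[0] + x_b[1])]

def second_net(x_b: Vec) -> Vec:
    # Hypothetical stand-in for the second neural network (the constant term).
    return [x_b[0], x_b[1], x_b[0] - x_b[1]]

def coefficients(x_b: Vec) -> Vec:
    h = first_net(x_b)
    mean = sum(h) / len(h)
    return [math.exp(v - mean) for v in h]  # product of the coefficients is 1

def forward(x_a: Vec, x_b: Vec) -> Tuple[Vec, Vec]:
    """Encoding: y_a = c * x_a + t; the second data is passed through unchanged."""
    c, t = coefficients(x_b), second_net(x_b)
    return [ci * xi + ti for ci, xi, ti in zip(c, x_a, t)], x_b

def inverse(y_a: Vec, y_b: Vec) -> Tuple[Vec, Vec]:
    """Decoding: subtract the constant term, then divide by the coefficients."""
    c, t = coefficients(y_b), second_net(y_b)
    return [(yi - ti) / ci for ci, yi, ti in zip(c, y_a, t)], y_b

y_a, y_b = forward([1.0, 2.0, -3.0], [0.4, -1.5])
x_a, x_b = inverse(y_a, y_b)
```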
[0124] In one possible implementation, the first data and the preset coefficients are vectors. The first data includes N elements, and the preset coefficients include N coefficients. The N elements of the first data correspond one-to-one with the N coefficients. The N elements include a first target element and a second target element; the first target element corresponds to a first target coefficient, and the second target element corresponds to a second target coefficient. The volume-preserving flow module is used to obtain a first fixed-point number corresponding to the first target element and a second fixed-point number corresponding to the second target element;
[0125] Obtain a first fraction corresponding to the first target coefficient and a second fraction corresponding to the second target coefficient. The first fraction includes a first numerator and a first denominator, and the second fraction includes a second numerator and a second denominator; the first numerator, the first denominator, the second numerator, and the second denominator are integers, and the first denominator is the same as the second numerator.
[0126] Multiply the first fixed-point number with the first numerator to obtain the first result;
[0127] The first result is divided by the first denominator to obtain a second result, which includes a first quotient and a first remainder. The first quotient is used as the result of multiplying the first target element and the first target coefficient.
[0128] Multiply the second fixed-point number with the second numerator to obtain the third result;
[0129] The third result is added to the first remainder result to obtain the fourth result;
[0130] The fourth result is divided by the second denominator to obtain the fifth result, which includes the second quotient and the second remainder. The second quotient is used as the result of multiplying the second target element and the second target coefficient.
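The steps above extend from two elements to all N elements. The Python sketch below is a minimal illustration, assuming the fixed-point numbers are already plain integers (for example, real values scaled by a power of two), that the coefficient fractions are given as positive integer lists `num` and `den` in processing order with each denominator equal to the next numerator, and that the incoming remainder `r_in` (zero, or the remainder received from the adjacent layer) is smaller than the first numerator. The helper name is hypothetical, and the constant term is omitted. Python's `divmod` uses floor division, so the remainder stays non-negative even for negative inputs.

```python
from typing import List, Tuple

def vpf_multiply_encode(x: List[int], num: List[int], den: List[int],
                        r_in: int = 0) -> Tuple[List[int], int]:
    """Multiply fixed-point integers x[k] by num[k]/den[k], carrying remainders.

    Hypothetical helper. Assumes num[k] == den[k - 1] for k >= 1 and
    0 <= r_in < num[0]. Returns the quotients (the products) and the last
    remainder, which would be passed on to the adjacent layer.
    """
    if any(num[k] != den[k - 1] for k in range(1, len(num))):
        raise ValueError("each numerator must equal the previous denominator")
    if not 0 <= r_in < num[0]:
        raise ValueError("incoming remainder must lie in [0, num[0])")
    y, r = [], r_in
    for xk, nk, dk in zip(x, num, den):
        t = xk * nk + r          # fixed-point number times numerator, plus carried remainder
        q, r = divmod(t, dk)     # division with remainder by the denominator
        y.append(q)              # quotient = result of the multiplication
    return y, r

y, r_out = vpf_multiply_encode([10, 7, -4], num=[3, 2, 5], den=[2, 5, 3], r_in=1)
```

With coefficients 3/2, 2/5 and 5/3 (product 1), the call returns `([15, 3, -7], 1)`. The real-valued products would be 15, 2.8 and about -6.67, so each quotient stays close to the true product while the remainder carries the lost fraction forward.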
[0131] In one possible implementation, the second target element is the last of the N elements to be multiplied by its corresponding coefficient during the multiplication operation between the first data and the preset coefficients, and the target volume-preserving flow layer is further used to output the second remainder result.
[0132] In one possible implementation, the target volume-preserving flow layer is further configured to output the second remainder result to the next volume-preserving flow layer adjacent to the target volume-preserving flow layer.
[0133] In one possible implementation, the volume-preserving flow model further includes a first volume-preserving flow layer, which is the volume-preserving flow layer adjacent to the target volume-preserving flow layer, and the volume-preserving flow module is used to obtain the remainder result output by the first volume-preserving flow layer;
[0134] The first fixed-point number is multiplied by the first numerator, and the multiplication result is added to the remainder result output by the first volume-preserving flow layer to obtain the first result.
[0135] In one possible implementation, the volume-preserving flow model includes M serially connected volume-preserving flow layers, which include the target volume-preserving flow layer. The output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is an integer greater than 1 and not greater than M; the input of the first volume-preserving flow layer is the data to be encoded, and the output of the M-th volume-preserving flow layer is the latent variable output.
[0136] In one possible implementation, the volume-preserving flow model further includes a target convolutional layer connected to the target volume-preserving flow layer, where the output of the target volume-preserving flow layer is used as the input of the target convolutional layer, and the target convolutional layer is used to multiply the output of the target volume-preserving flow layer by the weight matrix.
[0137] In one possible implementation, the volume-preserving flow module is used to obtain the weight matrix;
[0138] LU decomposition is performed on the weight matrix to obtain a first matrix, a second matrix, a third matrix, and a fourth matrix, where the first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix;
[0139] The output of the target volume-preserving flow layer is multiplied by the fourth matrix to obtain a sixth result;
[0140] The sixth result is multiplied by the third matrix to obtain the seventh result;
[0141] The seventh result is multiplied by the second matrix to obtain the eighth result;
[0142] The eighth result is multiplied by the first matrix to obtain a ninth result, which is used as the result of multiplying the output of the target volume-preserving flow layer by the weight matrix.
[0143] In one possible implementation, the volumetric-preserving model includes M sequentially connected volumetric-preserving layers and M convolutional layers. The M volumetric-preserving layers include the target volumetric-preserving layer, and the M convolutional layers include the target convolutional layer. The output of the i-th volumetric-preserving layer is used as the input of the i-th convolutional layer, and the output of the i-th convolutional layer is used as the input of the (i+1)-th volumetric-preserving layer, where i is a positive integer not greater than M. The input of the first volumetric-preserving layer is the data to be encoded, and the output of the M-th convolutional layer is the latent variable output.
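To make the ordering in [0143] concrete, the Python sketch below chains M pairs of layers on the encoding side and undoes them in reverse on the decoding side. This matches the decoding-order numbering in [0105] and [0177], where the first convolutional layer receives the latent variable output. The layers are deliberately trivial, hypothetical stand-ins (an integer shift for a volume-preserving flow (VPF) layer, a cyclic rotation for a convolutional layer); only the wiring is the point.

```python
from typing import Callable, List, Tuple

Vec = List[int]
Layer = Tuple[Callable[[Vec], Vec], Callable[[Vec], Vec]]  # (forward, inverse)

def shift(offsets: Vec) -> Layer:
    """Toy stand-in for a VPF layer: add fixed integer offsets."""
    return (lambda v: [a + o for a, o in zip(v, offsets)],
            lambda v: [a - o for a, o in zip(v, offsets)])

def rotate() -> Layer:
    """Toy stand-in for a convolutional layer: cyclic rotation of the elements."""
    return (lambda v: v[1:] + v[:1], lambda v: v[-1:] + v[:-1])

def encode_chain(x: Vec, vpf: List[Layer], conv: List[Layer]) -> Vec:
    for f, g in zip(vpf, conv):
        x = g[0](f[0](x))   # i-th VPF layer, then i-th convolutional layer
    return x                # output of the M-th convolutional layer = latent variable output

def decode_chain(z: Vec, vpf: List[Layer], conv: List[Layer]) -> Vec:
    for f, g in zip(reversed(vpf), reversed(conv)):
        z = f[1](g[1](z))   # undo the convolutional layer first, then the VPF layer
    return z                # decoded output

vpf_layers = [shift([1, 0, 0]), shift([0, 10, 0])]
conv_layers = [rotate(), rotate()]
z = encode_chain([1, 2, 3], vpf_layers, conv_layers)
x = decode_chain(z, vpf_layers, conv_layers)
```

Here `z` is `[13, 2, 2]` and decoding returns `[1, 2, 3]`; swapping the order of any pair of inverse steps would generally fail to recover the input.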
[0144] In a fourth aspect, this application provides a data decoding apparatus, the apparatus including:
[0145] an acquisition module, configured to acquire encoded data;
[0146] a decoding module, configured to decode the encoded data to obtain a latent variable output; and
[0147] a volume-preserving flow module, configured to process the latent variable output by using a volume-preserving flow model to obtain a decoded output, where the volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to perform a multiplication operation on first data input to the target volume-preserving flow layer and a preset coefficient, the preset coefficient not being 1.
[0148] This application uses a volume-preserving flow model to achieve lossless compression. Compared with an integer flow model, the target volume-preserving flow layer, while remaining reversible, performs an operation other than integer addition and subtraction, namely multiplication. This gives the volume-preserving flow model a stronger representation capability, so that it can estimate the data distribution more accurately and thereby achieve a better compression ratio.
[0149] On the other hand, for general flow models it can be proven that numerical invertibility cannot be achieved in discrete space, because numerical errors always produce cases in which one latent variable corresponds to multiple input data. In such cases, multiple encoding operations must be performed to eliminate the numerical errors, which makes the algorithm inefficient. In contrast, the volume-preserving flow model in the embodiments of this application uses a numerically invertible target volume-preserving flow layer, so that its operations are numerically invertible. While the model retains a strong representation capability, compression requires only a very small number of encoding operations, achieving higher compression throughput and a higher compression ratio.
[0150] In one possible implementation, the volume-preserving flow constraint includes: the input space and the output space of the operation corresponding to the volume-preserving flow layer have the same volume.
[0151] In one possible implementation, the first data and the preset coefficients are vectors, the first data includes N elements, the preset coefficients include N coefficients, the N elements of the first data correspond one-to-one with the N coefficients, and the product of the N coefficients is 1; the division operation between the first data and the preset coefficients includes:
[0152] Perform a division operation on each element in the first data with its corresponding coefficient to obtain the division result.
[0153] In one possible implementation, the volume-preserving flow module is used to process the second data input to the target volume-preserving flow layer through a first neural network to obtain a first network output, and to perform a preset operation on the first network output to obtain the preset coefficient.
[0154] In one possible implementation, the first network output is a vector comprising N elements. The acquisition module is configured to acquire the average of the N elements in the first network output and subtract the average from each element in the first network output to obtain the processed N elements.
[0155] Each of the N processed elements is subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which include N coefficients.
[0156] In one possible implementation, the output of the target volume-preserving flow layer includes the second data.
[0157] In one possible implementation, the target volume-preserving flow layer is further used to subtract a constant term from the first data to obtain a subtraction result, where the constant term is not 0;
[0158] The acquisition module is used to perform a division operation between the subtraction result and the preset coefficient.
[0159] In one possible implementation, the volume-preserving flow module is used to process the second data input to the target volume-preserving flow layer through a second neural network to obtain the constant term.
[0160] In one possible implementation, the first data and the preset coefficients are vectors. The first data includes N elements, and the preset coefficients include N coefficients. The N elements of the first data correspond one-to-one with the N coefficients. The N elements include a first target element and a second target element; the first target element corresponds to a first target coefficient, and the second target element corresponds to a second target coefficient. The volume-preserving flow module is used to obtain a first fixed-point number corresponding to the first target element and a second fixed-point number corresponding to the second target element;
[0161] Obtain a first fraction corresponding to the first target coefficient and a second fraction corresponding to the second target coefficient. The first fraction includes a first numerator and a first denominator, and the second fraction includes a second numerator and a second denominator; the first numerator, the first denominator, the second numerator, and the second denominator are integers, and the first numerator is the same as the second denominator.
[0162] Multiply the first fixed-point number with the first denominator to obtain the first result;
[0163] The first result is divided by the first numerator to obtain a second result, which includes a first quotient and a first remainder. The first quotient is used as the result of the division between the first target element and the first target coefficient.
[0164] Multiply the second fixed-point number with the second denominator to obtain the third result;
[0165] The third result is added to the first remainder result to obtain the fourth result;
[0166] The fourth result is divided by the second numerator to obtain a fifth result, which includes a second quotient and a second remainder. The second quotient is used as the result of dividing the second target element by the second target coefficient.
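A matching Python sketch of the decoding steps, under the same illustrative assumptions as on the encoding side: integer fixed-point values, positive integer numerators and denominators, and fraction lists `num`/`den` indexed in encoding order, with each encoding-side denominator equal to the next numerator. Decoding walks the elements from last to first, which is why, in the decoding text's own numbering, the first numerator equals the second denominator. The helper name is hypothetical, and the constant term (which would be subtracted first) is omitted.

```python
from typing import List, Tuple

def vpf_divide_decode(y: List[int], num: List[int], den: List[int],
                      r_out: int) -> Tuple[List[int], int]:
    """Divide fixed-point integers y[k] by num[k]/den[k] exactly (hypothetical helper).

    num/den are indexed in encoding order; elements are processed from last to
    first. Returns the recovered integers and the remainder originally fed into
    the first element, to be handed to the adjacent layer.
    """
    x = [0] * len(y)
    r = r_out
    for k in reversed(range(len(y))):
        t = y[k] * den[k] + r         # fixed-point number times denominator, plus remainder
        x[k], r = divmod(t, num[k])   # division with remainder by the numerator
    return x, r

x, r_in = vpf_divide_decode([15, 3, -7], num=[3, 2, 5], den=[2, 5, 3], r_out=1)
```

Given the values `[15, 3, -7]` and final remainder 1, which the remainder-carrying multiplication produces from inputs `[10, 7, -4]` with coefficients 3/2, 2/5 and 5/3 and incoming remainder 1, the call returns `([10, 7, -4], 1)`: both the data and the incoming remainder are recovered exactly.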
[0167] In one possible implementation, the second target element is the last of the N elements to be divided by its corresponding coefficient during the division operation between the first data and the preset coefficients, and the target volume-preserving flow layer is further used to output the second remainder result.
[0168] In one possible implementation, the volume-preserving flow model further includes a first volume-preserving flow layer, which is the volume-preserving flow layer adjacent to the target volume-preserving flow layer. The acquisition module is used to acquire the remainder result output by the first volume-preserving flow layer, multiply the first fixed-point number by the first denominator, and add the multiplication result to that remainder result to obtain the first result.
[0169] In one possible implementation, the volume-preserving flow model includes M serially connected volume-preserving flow layers, which include the target volume-preserving flow layer. The output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is an integer greater than 1 and not greater than M; the input of the first volume-preserving flow layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0170] In one possible implementation, the volume-preserving flow model further includes a target convolutional layer connected to the target volume-preserving flow layer. The output of the target convolutional layer is the first data, and the target convolutional layer is used to perform a division operation between the input data and the weight matrix, that is, to multiply the input data by the inverse of the weight matrix.
[0171] In one possible implementation, the volume-preserving flow module is used to obtain the weight matrix;
[0172] LU decomposition is performed on the weight matrix to obtain a first matrix, a second matrix, a third matrix, and a fourth matrix, where the first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix;
[0173] The input data is multiplied by the inverse of the first matrix to obtain the sixth result;
[0174] The sixth result is multiplied by the inverse of the second matrix to obtain the seventh result;
[0175] The seventh result is multiplied by the inverse of the third matrix to obtain the eighth result;
[0176] The eighth result is multiplied by the inverse of the fourth matrix to obtain the ninth result, which is used as the result of the division operation between the input data and the weight matrix.
[0177] In one possible implementation, the volume-preserving flow model includes M sequentially connected volume-preserving flow layers and M convolutional layers. The M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include the target convolutional layer. The output of the i-th convolutional layer is used as the input of the i-th volume-preserving flow layer, and the output of the i-th volume-preserving flow layer is used as the input of the (i+1)-th convolutional layer, where i is a positive integer not greater than M. The input of the first convolutional layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0178] In a fifth aspect, this application provides a data encoding apparatus, including a storage medium, a processing circuit, and a bus system, where the storage medium is used to store instructions, and the processing circuit is used to execute the instructions in the storage medium to perform the data encoding method according to the first aspect or any possible implementation of the first aspect.
[0179] In a sixth aspect, this application provides a data decoding apparatus, including a storage medium, a processing circuit, and a bus system, where the storage medium is used to store instructions, and the processing circuit is used to execute the instructions in the storage medium to perform the data decoding method according to the second aspect or any possible implementation of the second aspect.
[0180] In a seventh aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method according to either of the first and second aspects.
[0181] In an eighth aspect, embodiments of this application provide a computer program that, when run on a computer, causes the computer to perform the method according to either of the first and second aspects.
[0182] In a ninth aspect, this application provides a chip system including a processor configured to support an execution device or a training device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data and/or information involved in the foregoing methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the execution device or training device. The chip system may consist of chips, or may include chips and other discrete devices.
[0183] This application provides a data encoding method, the method including: acquiring data to be encoded; processing the data to be encoded by using a volume-preserving flow model to obtain a latent variable output, where the volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to multiply first data input to the target volume-preserving flow layer by a preset coefficient, the preset coefficient not being 1; and encoding the latent variable output to obtain encoded data.
[0184] On the one hand, this application uses a volume-preserving flow model to achieve lossless compression. Compared with an integer flow model, the target volume-preserving flow layer, while remaining reversible, performs an operation other than integer addition and subtraction, namely multiplication. This gives the volume-preserving flow model a stronger representation capability, so that it can estimate the data distribution more accurately and thereby achieve a better compression ratio.
[0185] On the other hand, during multiplication, division with remainder is used to make the target volume-preserving flow layer numerically invertible. The first-order coefficients (that is, the preset coefficients in the foregoing embodiments) are converted into fractions such that the numerator for each dimension equals the denominator for the previous dimension. For each dimension, the data is multiplied by the numerator of the current first-order coefficient and the remainder from the previous dimension is added; division with remainder by the denominator is then performed, and finally a constant term is added to obtain the result for the current dimension. The remainder of this division is passed to the next dimension to eliminate numerical error, thereby making the target volume-preserving flow layer numerically invertible.
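The mechanism in this paragraph can be written out as a short derivation. The notation is introduced here for illustration: element $k$ has fixed-point value $x_k$, coefficient fraction $n_k/d_k$ with $n_k = d_{k-1}$ for $k \ge 2$, and constant term $b_k$ (assumed to be a fixed-point integer); $r_0$ is the remainder received from the adjacent layer, assumed to satisfy $0 \le r_0 < n_1$.

$$
\begin{aligned}
&\text{Encoding } (k = 1, \dots, N): && x_k n_k + r_{k-1} = q_k d_k + r_k,\quad 0 \le r_k < d_k,\quad y_k = q_k + b_k,\\
&\text{Decoding } (k = N, \dots, 1): && q_k = y_k - b_k,\quad x_k = \left\lfloor \frac{q_k d_k + r_k}{n_k} \right\rfloor,\quad r_{k-1} = (q_k d_k + r_k) \bmod n_k,\\
&\text{Product of coefficients:} && \prod_{k=1}^{N} \frac{n_k}{d_k} = \frac{n_1}{d_N} = 1 \iff n_1 = d_N .
\end{aligned}
$$

Decoding is exact because $q_k d_k + r_k = x_k n_k + r_{k-1}$ and $0 \le r_{k-1} < n_k$ (from $n_k = d_{k-1}$ for $k \ge 2$, and from the assumption on $r_0$ for $k = 1$), so the floor division leaves no ambiguity.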
[0186] On the other hand, when the target convolutional layer multiplies its input by the weight matrix, the operation is transformed into successive multiplications by an upper triangular matrix, a diagonal matrix, a lower triangular matrix, and a permutation matrix. The four matrix multiplications use four calculation methods, respectively: iterative calculation, the numerical calculation method of the target volume-preserving flow layer, iterative calculation, and element rearrangement. A reversible counterpart is given for each calculation method, thereby achieving numerically invertible calculation for the target convolutional layer.
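The triangular and permutation steps can be made exactly invertible on integers. The Python sketch below shows one common way to realize an iterative calculation, assuming unit-diagonal triangular factors and rounding only the off-diagonal contribution; this rounding rule is an assumption, not a detail taken from this application. The diagonal step would use the fraction-and-remainder method of the target volume-preserving flow layer and is omitted here. All names and matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def lower_forward(L: Matrix, x: List[int]) -> List[int]:
    """y = L x for unit lower-triangular L; only the off-diagonal sum is rounded."""
    return [x[i] + round(sum(L[i][j] * x[j] for j in range(i))) for i in range(len(x))]

def lower_inverse(L: Matrix, y: List[int]) -> List[int]:
    """Iterative inverse: recover x[0], x[1], ... in order, reusing recovered values."""
    x = [0] * len(y)
    for i in range(len(y)):
        x[i] = y[i] - round(sum(L[i][j] * x[j] for j in range(i)))
    return x

def upper_forward(U: Matrix, x: List[int]) -> List[int]:
    """y = U x for unit upper-triangular U; only the off-diagonal sum is rounded."""
    n = len(x)
    return [x[i] + round(sum(U[i][j] * x[j] for j in range(i + 1, n))) for i in range(n)]

def upper_inverse(U: Matrix, y: List[int]) -> List[int]:
    """Iterative inverse: recover x[n-1], x[n-2], ... in reverse order."""
    n = len(y)
    x = [0] * n
    for i in reversed(range(n)):
        x[i] = y[i] - round(sum(U[i][j] * x[j] for j in range(i + 1, n)))
    return x

def permute(perm: List[int], x: List[int]) -> List[int]:
    """Element rearrangement: y[i] = x[perm[i]]."""
    return [x[p] for p in perm]

def unpermute(perm: List[int], y: List[int]) -> List[int]:
    """Inverse rearrangement: put each element back at its original index."""
    x = [0] * len(y)
    for i, p in enumerate(perm):
        x[p] = y[i]
    return x

L = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [-0.25, 0.75, 1.0]]
U = [[1.0, 0.3, -0.2], [0.0, 1.0, 0.4], [0.0, 0.0, 1.0]]
perm = [2, 0, 1]
x = [5, -3, 12]
# Encoding order: upper triangular, (diagonal omitted), lower triangular, permutation.
y = permute(perm, lower_forward(L, upper_forward(U, x)))
x_rec = upper_inverse(U, lower_inverse(L, unpermute(perm, y)))
```

The inverse is exact because each rounded sum is recomputed from the same already-recovered integers, so rounding error never accumulates.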
[0187] On the other hand, for general flow models it can be proven that numerical invertibility cannot be achieved in discrete space, because numerical errors always produce cases in which one latent variable corresponds to multiple input data. In such cases, multiple encoding operations must be performed to eliminate the numerical errors, which makes the algorithm inefficient. In contrast, the volume-preserving flow model in the embodiments of this application uses a numerically invertible target volume-preserving flow layer, so that its operations are numerically invertible. While the model retains a strong representation capability, compression requires only a very small number of encoding operations, achieving higher compression throughput and a higher compression ratio.
Attached Figure Description
[0188] Figure 1 is a structural diagram illustrating the main framework of artificial intelligence;
[0189] Figure 2a is a schematic diagram of an application architecture according to an embodiment of this application;
[0190] Figure 2b is a schematic diagram of a convolutional neural network according to an embodiment of this application;
[0191] Figure 2c is a schematic diagram of a convolutional neural network according to an embodiment of this application;
[0192] Figure 3 is a schematic diagram of an embodiment of a data encoding method provided in this application;
[0193] Figure 4 is a schematic diagram of an application architecture according to an embodiment of this application;
[0194] Figure 5 is a schematic diagram of a volume-preserving flow model according to an embodiment of this application;
[0195] Figure 6 is a schematic diagram of a volume-preserving flow model according to an embodiment of this application;
[0196] Figure 7 is a schematic diagram of an embodiment of a data decoding method provided in this application;
[0197] Figure 8 is a schematic diagram of an embodiment of a data decoding method provided in this application;
[0198] Figure 9 is a schematic diagram of a volume-preserving flow model according to an embodiment of this application;
[0199] Figure 10 is a schematic diagram of a volume-preserving flow model according to an embodiment of this application;
[0200] Figure 11 is a system architecture diagram according to an embodiment of this application;
[0201] Figure 12 is a schematic diagram of a data encoding apparatus according to an embodiment of this application;
[0202] Figure 13 is a schematic diagram of a data decoding apparatus according to an embodiment of this application;
[0203] Figure 14 is a schematic structural diagram of an execution device according to an embodiment of this application;
[0204] Figure 15 is a schematic diagram of a chip structure according to an embodiment of this application.
Detailed Implementation
[0205] The embodiments of the present invention will now be described with reference to the accompanying drawings. The terminology used in the embodiments section is for illustrative purposes only and is not intended to limit the scope of the invention.
[0206] The embodiments of this application will now be described with reference to the accompanying drawings. Those skilled in the art will recognize that, with technological advancements and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
[0207] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such terms are interchangeable where appropriate; this is merely a way of distinguishing objects with the same attributes in the embodiments of this application. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or apparatus that comprises a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to those processes, methods, products, or apparatuses.
[0208] First, the overall workflow of an artificial intelligence system is described with reference to Figure 1, which illustrates a structural framework of artificial intelligence (AI). The framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The intelligent information chain reflects a series of processes from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. Along this chain, data is progressively refined: data → information → knowledge → wisdom. The IT value chain reflects the value that AI brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the technologies that provide and process it) up to the industrial ecosystem of the system.
[0209] (1) Infrastructure
[0210] Infrastructure provides computing power to support artificial intelligence systems, enabling communication with the external world and providing support through a basic platform. This communication occurs through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes distributed computing frameworks and related platform guarantees and support, which may include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to acquire data, and this data is provided to intelligent chips in the distributed computing system provided by the basic platform for computation.
[0211] (2) Data
[0212] Data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, as well as Internet of Things (IoT) data from conventional devices, including business data from existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
[0213] (3) Data processing
[0214] Data processing typically includes methods such as data training, machine learning, deep learning, search, reasoning, and decision-making.
[0215] Among them, machine learning and deep learning can perform intelligent information modeling, extraction, preprocessing, and training on data, including symbolization and formalization.
[0216] Reasoning refers to the process in which, in a computer or intelligent system, the machine thinks and solves problems by simulating human intelligent reasoning, based on reasoning control strategies and using formalized information. Typical functions include search and matching.
[0217] Decision-making refers to the process of making decisions based on intelligent information after reasoning, and it typically provides functions such as classification, sorting, and prediction.
[0218] (4) General ability
[0219] After the data processing mentioned above, the results of the data processing can be used to form some general capabilities, such as algorithms or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
[0220] (5) Smart Products and Industry Applications
[0221] Intelligent products and industry applications refer to products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application areas mainly include: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, etc.
[0222] This application can be applied to lossless compression of data such as images, video, and text in the field of artificial intelligence, for example, to image compression on a terminal device.
[0223] Specifically, the image compression method provided in this application can be applied to image compression on a terminal device, for example in its photo album or in video surveillance. Refer to Figure 2a, which illustrates an application scenario of an embodiment of this application. As shown in Figure 2a, the terminal device acquires an image to be compressed (also referred to in this application as data to be encoded), which may be a photograph taken by a camera or a frame extracted from a video. The terminal device processes the image with a volume-preserving flow model, transforming the image data into a latent variable output and generating a probability estimate for each point of the latent variable output. The encoder encodes the latent variable output using these per-point probability estimates, which reduces the coding redundancy of the latent variable output and therefore the amount of data transmitted during image compression; the encoded data is then saved as a data file at the corresponding storage location. When the user needs the file, the CPU retrieves and loads it from that storage location and decodes it to obtain the latent variable output. The CPU then processes the latent variable output with the volume-preserving flow model (in its inverse direction) to reconstruct the image, that is, the decoded output.
[0224] Since the embodiments of this application involve a large number of neural network applications, for ease of understanding, the relevant terms and concepts of neural networks that may be involved in the embodiments of this application will be introduced below.
[0225] (1) Neural Network
[0226] A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and whose output may be:
[0227] $$h_{W,b}(x) = f\left(W^{\mathsf{T}}x\right) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$
[0228] where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinearity into the neural network so that the input signal of the neural unit is converted into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be, for example, the sigmoid function. A neural network is formed by connecting many such neural units, that is, the output of one neural unit may be the input of another. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of that local receptive field, where a local receptive field may be a region composed of several neural units.
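A minimal Python sketch of the neural unit defined above, using the sigmoid activation that the text gives as an example; the function names are illustrative only and are not taken from this application.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation f(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(xs, ws, b, f=sigmoid):
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    z = sum(w * x for w, x in zip(ws, xs)) + b  # weighted sum plus bias
    return f(z)                                 # nonlinearity


# Weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0, and sigmoid(0) = 0.5
print(neural_unit([1.0, 2.0], [0.5, -0.25], 0.0))
```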
[0229] (2) Deep Neural Networks
[0230] A deep neural network (DNN), also known as a multilayer neural network, can be understood as a neural network with multiple hidden layers. By position, the layers of a DNN fall into three categories: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. Adjacent layers are fully connected, meaning that every neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
[0231] Although DNNs seem complex, the operation of each layer is not. Each layer computes $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer merely applies this simple operation to its input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, it also has a large number of coefficients W and offset vectors $\vec{b}$. These parameters are defined in a DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as $W^3_{24}$, where the superscript 3 is the layer in which the coefficient W resides, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
[0232] In general, the coefficient from the k-th neuron in layer L−1 to the j-th neuron in layer L is defined as $W^L_{jk}$.
[0233] Note that the input layer has no W parameters. In a deep neural network, more hidden layers allow the network to better model complex real-world situations. In theory, a model with more parameters has higher complexity and larger "capacity," meaning it can perform more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained network (the matrices formed by the vectors W of the many layers).
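The layer rule $\vec{y} = \alpha(W\vec{x} + \vec{b})$ and the $W^L_{jk}$ indexing convention can be sketched with NumPy as below. This is an illustrative sketch only: ReLU is an assumed activation (the text leaves α() unspecified), and the weights are chosen by hand so the result is easy to check. Note that 0-based array indices are one less than the 1-based neuron numbers in the text.

```python
import numpy as np


def relu(z):
    """Assumed activation function alpha(); any elementwise nonlinearity would do."""
    return np.maximum(z, 0.0)


def dnn_forward(x, weights, biases, act=relu):
    """Apply y = act(W x + b) layer by layer.

    weights[l][j, k] is the coefficient from neuron k of the previous layer to
    neuron j of the current layer, matching the W^L_{jk} convention. The input
    layer has no W, so the list starts with the weights into the second layer.
    """
    y = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        y = act(W @ y + b)
    return y


# Three-layer example: 3 input neurons -> 4 hidden neurons -> 2 output neurons.
W2 = np.ones((4, 3)); b2 = np.zeros(4)    # layer 1 -> layer 2
W3 = np.zeros((2, 4)); b3 = np.zeros(2)   # layer 2 -> layer 3
W3[1, 3] = 1.0  # W^3_{24}: 4th neuron of layer 2 -> 2nd neuron of layer 3
y = dnn_forward([1.0, 2.0, 3.0], [W2, W3], [b2, b3])
print(y)  # hidden layer is [6, 6, 6, 6]; only W^3_{24} is nonzero
```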
[0234] (3) Convolutional neural network: A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A CNN contains a feature extractor consisting of convolutional layers and subsampling layers. This feature extractor can be viewed as a filter, and the convolution process can be seen as convolving a trainable filter with an input image or a convolutional feature map. A convolutional layer is a layer of neurons in a CNN that performs convolution on the input signal (e.g., the first and second convolutional layers in this embodiment). In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer typically contains several feature planes, each of which may be composed of several neural units arranged in a rectangle. Neural units on the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as extracting image information in a way that is independent of location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another, and the same learned image information can therefore be used at all locations in the image. Within one convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the more kernels there are, the richer the image information reflected by the convolution operation.
[0235] A convolution kernel may be initialized as a matrix of random values and learns appropriate weights during training of the CNN. In addition, weight sharing directly reduces the number of connections between layers of the CNN while also lowering the risk of overfitting.
[0236] Specifically, as shown in Figure 2b, a convolutional neural network (CNN) 100 may include an input layer 110, a convolutional/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130.
[0237] The structure consisting of the convolutional layer / pooling layer 120 and the neural network layer 130 can be the first convolutional layer and the second convolutional layer described in this application. The input layer 110 is connected to the convolutional layer / pooling layer 120, and the convolutional layer / pooling layer 120 is connected to the neural network layer 130. The output of the neural network layer 130 can be input to the activation layer, and the activation layer can perform non-linear processing on the output of the neural network layer 130.
[0238] Convolutional / pooling layers 120:
[0239] Convolutional layers:
[0240] As shown in Figure 2b, the convolutional/pooling layer 120 may include, for example, layers 121 to 126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer. In another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
[0241] Taking convolutional layer 121 as an example, it can include multiple convolution operators, also known as kernels. In image processing, a convolution operator acts as a filter, extracting specific information from the input image matrix. Essentially, a convolution operator can be a weight matrix, which is usually predefined. During the convolution operation, the weight matrix processes the input image pixel by pixel (or two pixels by two pixels, depending on the stride) along the horizontal direction, thus extracting specific features. The size of the weight matrix should be related to the image size. It's important to note that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during convolution, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a single-depth convolutional output. However, in most cases, multiple weight matrices of the same dimension are applied instead of a single weight matrix. The outputs of each weight matrix are stacked to form the depth dimension of the convolutional image. Different weight matrices can be used to extract different features from an image. For example, one weight matrix can be used to extract image edge information, another weight matrix can be used to extract specific colors from the image, and yet another weight matrix can be used to blur unwanted noise in the image. These multiple weight matrices have the same dimension, and the feature maps extracted by these multiple weight matrices with the same dimension also have the same dimension. The extracted feature maps with the same dimension are then merged to form the output of the convolution operation.
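The points above (a weight matrix spanning the full input depth, a stride, and several same-sized weight matrices whose outputs are stacked along the depth dimension) can be illustrated with a small NumPy sketch. It is a plain, unoptimized "valid" convolution written for clarity, not the implementation used in this application; like most deep learning frameworks, it does not flip the kernel (strictly, it is cross-correlation).

```python
import numpy as np


def conv2d(image, kernels, stride=1):
    """Valid (unpadded) 2-D convolution as used in CNNs.

    image:   array of shape (H, W, C); C is the input depth
    kernels: array of shape (N, kh, kw, C); N weight matrices, each spanning depth C
    returns: array of shape (H_out, W_out, N); one output channel per weight matrix
    """
    H, W, C = image.shape
    N, kh, kw, kc = kernels.shape
    if kc != C:
        raise ValueError("kernel depth must equal input depth")
    H_out = (H - kh) // stride + 1
    W_out = (W - kw) // stride + 1
    out = np.zeros((H_out, W_out, N))
    for i in range(H_out):
        for j in range(W_out):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            # Each weight matrix yields one number: sum over height, width and depth.
            out[i, j, :] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


img = np.ones((5, 5, 1))
kernels = np.stack([np.ones((3, 3, 1)), np.zeros((3, 3, 1))])  # two 3x3 weight matrices
out = conv2d(img, kernels, stride=2)
print(out.shape)  # (2, 2, 2): the two feature maps are stacked along depth
```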
[0242] The weight values in these weight matrices need to be obtained through extensive training in practical applications. The weight matrices formed by the weight values obtained through training can extract information from the input image, thereby helping the convolutional neural network 100 to make correct predictions.
[0243] When a convolutional neural network 100 has multiple convolutional layers, the initial convolutional layers (e.g., 121) tend to extract more general features, which can also be called low-level features. As the depth of the convolutional neural network 100 increases, the features extracted by later convolutional layers (e.g., 126) become more and more complex, such as high-level semantic features. Features with higher semantic levels are more suitable for the problem to be solved.
[0244] Pooling layer:
[0245] Because it is often necessary to reduce the number of training parameters, pooling layers are often introduced periodically after convolutional layers. In layers 121 to 126 of the convolutional/pooling layer 120 shown in Figure 2b, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
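The text does not specify the pooling operator. As an illustrative assumption, the sketch below uses 2×2 max pooling with stride 2, a common choice, to show how a pooling layer shrinks each feature map spatially while keeping the depth unchanged.

```python
import numpy as np


def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over each channel of an (H, W, C) feature map, without padding."""
    H, W, C = fmap.shape
    H_out = (H - size) // stride + 1
    W_out = (W - size) // stride + 1
    out = np.empty((H_out, W_out, C), dtype=fmap.dtype)
    for i in range(H_out):
        for j in range(W_out):
            window = fmap[i * stride:i * stride + size, j * stride:j * stride + size, :]
            out[i, j, :] = window.max(axis=(0, 1))  # largest value per channel
    return out


fmap = np.arange(16).reshape(4, 4, 1)  # a 4x4 single-channel feature map
print(max_pool2d(fmap)[:, :, 0])       # 2x2 result, one value per window
```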
[0246] Neural network layer 130:
[0247] After processing by the convolutional/pooling layer 120, the CNN 100 is not yet able to output the required information. As mentioned earlier, the convolutional/pooling layer 120 only extracts features and reduces the parameters introduced by the input image. To generate the final output information (the required class information or other relevant information), the CNN 100 uses the neural network layer 130 to generate one output or a set of outputs whose number equals the number of required classes. The neural network layer 130 may therefore include multiple hidden layers (131, 132, ..., 13n in Figure 2b) and an output layer 140. The parameters of these hidden layers may be pre-trained on training data relevant to a specific task type, such as image recognition, image classification, or image super-resolution reconstruction.
[0248] After the hidden layers of the neural network layer 130, the final layer of the entire CNN 100 is the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy, which is used to compute the prediction error. Once forward propagation through the entire CNN 100 is complete (in Figure 2b, propagation from 110 to 140), backpropagation (in Figure 2b, propagation from 140 to 110) begins to update the weights and biases of the layers described above, in order to reduce the loss of the CNN 100, that is, the error between the result output by the CNN 100 through the output layer and the ideal result.
[0249] It should be noted that the CNN 100 shown in Figure 2b is merely one example of a convolutional neural network. In specific applications, a CNN may take the form of other network models; for example, as shown in Figure 2c, multiple convolutional/pooling layers may run in parallel, and the extracted features are all fed into the neural network layer 130 for processing.
[0252] (4) Loss Function
[0253] When training a deep neural network, the goal is for the network's output to be as close as possible to the value that is actually desired. The network's current prediction is therefore compared with the desired target value, and the weight vector of each layer is updated according to the difference between them (there is usually an initialization step before the first update, in which parameters are preconfigured for each layer). For example, if the prediction is too high, the weight vectors are adjusted so that the network predicts a lower value, and this adjustment continues until the network predicts the desired target value or a value very close to it. It is therefore necessary to define in advance how the difference between the predicted value and the target value is measured; this is the loss function or objective function, an important equation for measuring that difference. A higher loss value indicates a greater difference, so training a deep neural network becomes a process of reducing this loss as much as possible.
[0254] (5) Backpropagation algorithm
[0255] Neural networks can employ backpropagation (BP) to correct the parameters of the initial neural network model during training, thereby reducing the reconstruction error loss. Specifically, forward propagation of the input signal to the output generates error loss; this error loss information is then propagated back to update the parameters of the initial neural network model, leading to convergence of the error loss. The backpropagation algorithm is an error-loss-driven backpropagation process aimed at obtaining the optimal parameters of the neural network model, such as the weight matrix.
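The loss-function and backpropagation descriptions can be made concrete with the smallest possible case: a single linear unit $y = wx + b$ trained by gradient descent on the mean squared error. The loss choice, learning rate, and step count are assumptions made for this illustration; the chain-rule gradients are computed by hand rather than by an autodiff library.

```python
import numpy as np


def train_linear_unit(xs, ts, lr=0.1, steps=500):
    """Fit y = w*x + b by minimizing mean squared error with gradient descent.

    The gradients propagate the output error back to the parameters via the
    chain rule, which is backpropagation in its simplest form.
    """
    w, b = 0.0, 0.0                          # preconfigured initial parameters
    losses = []
    for _ in range(steps):
        ys = w * xs + b                      # forward propagation
        err = ys - ts                        # prediction minus target
        losses.append(float(np.mean(err ** 2)))  # loss: mean squared error
        grad_w = float(np.mean(2 * err * xs))    # dLoss/dw
        grad_b = float(np.mean(2 * err))         # dLoss/db
        w -= lr * grad_w                     # step against the gradient
        b -= lr * grad_b
    return w, b, losses


xs = np.array([0.0, 1.0, 2.0, 3.0])
ts = 2.0 * xs + 1.0                          # targets produced by w = 2, b = 1
w, b, losses = train_linear_unit(xs, ts)
print(round(w, 3), round(b, 3), losses[0], losses[-1])
```

The loss shrinks toward zero as the parameters converge to the values that generated the targets, which is exactly the "error loss convergence" described above.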
[0256] (6) Lossless compression
[0257] A data compression technique in which the compressed data is shorter than the original data and the data recovered by decompression is identical to the original data.
[0258] (7) Compression length
[0259] The storage space occupied by the compressed data.
[0260] (8) Compression ratio
[0261] The ratio of the original data length to the compressed data length. If there is no compression, the value is 1. A higher value is better.
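The definitions of lossless compression, compression length, and compression ratio can be checked with Python's standard `zlib` module. zlib is a conventional dictionary-plus-Huffman compressor, not the AI-based method of this application; it is used here only to illustrate the terms. The input deliberately has skewed symbol frequencies, echoing the 'e' versus 'z' example in the background.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Original length divided by compressed length; 1.0 means no compression."""
    return len(original) / len(compressed)


data = b"e" * 900 + b"z" * 100     # highly skewed symbol frequencies
packed = zlib.compress(data, 9)    # compression length = len(packed) bytes
restored = zlib.decompress(packed)
assert restored == data            # lossless: exact recovery
ratio = compression_ratio(data, packed)
print(len(packed), ratio)
```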
[0262] (9) Latent variables
[0263] Data that follow a specific probability distribution. By establishing the conditional probability between these data and the original data, the probability distribution of the original data can be obtained.
[0264] (10) Flow Model
[0265] A reversible deep generative model that enables bidirectional transformation between latent variables and original data.
[0266] (11) Volumetric flow model
[0267] A special form of the flow model where the input space and the corresponding latent variable space have the same volume.
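The volume-preserving property can be expressed with the standard change-of-variables formula for flow models. Writing the flow as an invertible map $z = f(x)$ from the data $x$ to the latent variables $z$ (notation introduced here for illustration), the densities are related by

$$
p_X(x) = p_Z\big(f(x)\big)\left|\det \frac{\partial f(x)}{\partial x}\right|
$$

For a volume-preserving flow the Jacobian determinant has absolute value 1, so $p_X(x) = p_Z\big(f(x)\big)$: the probability of the data can be read directly from the probability of its latent variables.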
[0268] (12) Fixed point number
[0269] A number with fixed precision: a fixed-point number x with precision k satisfies the condition that $2^k x$ is an integer.
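A small Python sketch of the fixed-point definition: a number is a fixed-point number with precision k exactly when multiplying it by $2^k$ gives an integer. Rounding to the nearest multiple of $2^{-k}$ is an assumed conversion rule (the text does not say how a real number is converted), and `fractions.Fraction` keeps the arithmetic exact so the check is not disturbed by floating-point error.

```python
from fractions import Fraction


def to_fixed_point(x: float, k: int) -> Fraction:
    """Round x to the nearest multiple of 2**-k (assumed rounding rule)."""
    return Fraction(round(x * 2 ** k), 2 ** k)


def is_fixed_point(x: Fraction, k: int) -> bool:
    """True if 2**k * x is an integer, i.e. x has precision k."""
    return (x * 2 ** k).denominator == 1


q = to_fixed_point(0.3, 4)   # 0.3 * 16 = 4.8, rounded to 5, so q = 5/16
print(q, float(q))           # 5/16 0.3125
```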
[0270] (13) Floating-point numbers
[0271] A floating-point number is a decimal number stored using the computer's floating-point storage format.
[0272] (14) Reverse encoding
[0273] A special encoding technique that uses additional binary data stored in the system to generate specific data through decoding.
[0274] The execution subject of this application embodiment can be a terminal device or a server.
[0275] As one example, the terminal device can be a mobile phone, tablet, laptop, smart wearable device, etc., and the terminal device can compress the acquired data (such as image data, video data, or text data). As another example, the terminal device can be a virtual reality (VR) device. As yet another example, the embodiments of this application can also be applied to intelligent monitoring, where a camera can be configured to acquire images to be compressed. It should be understood that the embodiments of this application can also be applied to other scenarios requiring data compression, and these other application scenarios will not be listed here.
[0276] Refer to Figure 3, which illustrates an embodiment of the image processing method provided in this application. As shown in Figure 3, the image processing method provided in this embodiment includes the following steps:
[0277] 301. Obtain the data to be encoded.
[0278] In this embodiment of the application, the data to be encoded can be image, video, or text data.
[0279] Taking image data as an example, the image can be an image captured by the terminal device through a camera, or it can be an image obtained from within the terminal device (e.g., an image stored in the terminal device's photo album, or an image obtained by the terminal device from the cloud). It should be understood that the image can be an image that requires image compression, and this application does not limit the source of the image to be processed.
[0280] In one possible implementation, the data to be encoded can also be preprocessed.
[0281] Specifically, the data to be encoded can be converted into fixed-point numbers and normalized, after which a decoding technique can be used to obtain $u \sim U(0, 2^{-h})\,\delta$, where $\delta = 2^{-k}$, k is the precision of the fixed-point numbers, and U denotes a uniform distribution. After u is obtained, the processed data to be encoded, $\bar{x}$, is obtained from u and the data to be encoded x.
[0282] If the data to be encoded is video data and the video size does not match the input size of the model, the video is cut into several video blocks, each with the same size as the input of the model (the volume-preserving flow model). If the video is longer than the length required by the model, it is cut into multiple video input segments. If the input size or video length is insufficient, blocks of a specific color can be used to pad the input to the required size or length.
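The cutting and padding rules above can be sketched with NumPy. The video layout (frames, height, width, channels), the block sizes, and the single fill value standing in for "a block of a specific color" are all assumptions for illustration; `split_video` is a hypothetical helper, not part of this application.

```python
import numpy as np


def split_video(video, block_h, block_w, seg_len, fill_value=0):
    """Cut a (T, H, W, C) video into pieces of shape (seg_len, block_h, block_w, C).

    The time, height and width axes are first padded with fill_value (one fixed
    colour) up to the next multiple of the model's input size.
    """
    T, H, W, _ = video.shape
    pad_t = (-T) % seg_len
    pad_h = (-H) % block_h
    pad_w = (-W) % block_w
    padded = np.pad(video, ((0, pad_t), (0, pad_h), (0, pad_w), (0, 0)),
                    mode="constant", constant_values=fill_value)
    pieces = []
    for t in range(0, padded.shape[0], seg_len):         # video input segments
        for i in range(0, padded.shape[1], block_h):     # spatial blocks
            for j in range(0, padded.shape[2], block_w):
                pieces.append(padded[t:t + seg_len, i:i + block_h, j:j + block_w, :])
    return pieces


video = np.ones((5, 30, 50, 3), dtype=np.uint8)  # 5 frames of 30x50 RGB
pieces = split_video(video, block_h=16, block_w=16, seg_len=4)
print(len(pieces), pieces[0].shape)  # 2 segments x 2 x 4 blocks = 16 pieces
```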
[0283] If the data to be encoded is text data, word vector representations must be constructed for the characters or words of the text; that is, compression first requires converting the text into vectors. For example, let an input w (a word or character) be represented by a d-dimensional vector $x = \mu(w)$, and construct the probability distribution $p(x \mid w) = \mathcal{N}\left(\mu(w), \sigma^2(w)\right)\delta$ with $\delta = 2^{-dk}$, where p(w) is the prior of w, usually the word frequency of w, so that the posterior satisfies $p(w \mid x) \propto p(x \mid w)\,p(w)$. During data preprocessing, given the input w, x is decoded using $p(x \mid w)$, and w is encoded using $p(w \mid x)$, to obtain the data to be encoded.
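The posterior $p(w \mid x)$ used to encode w can be computed from $p(x \mid w)$ and the word-frequency prior $p(w)$ by Bayes' rule. The sketch below assumes diagonal Gaussians, as in the formula above; the three-word vocabulary, means, deviations, and frequencies are hypothetical values for illustration. Because the bin factor δ is the same for every word, it cancels during normalization and is omitted.

```python
import math


def log_gaussian_diag(x, mu, sigma):
    """log N(x; mu, diag(sigma^2)) for d-dimensional vectors given as lists."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - (xi - m) ** 2 / (2 * s * s)
        for xi, m, s in zip(x, mu, sigma)
    )


def posterior_over_words(x, mu, sigma, prior):
    """p(w|x) proportional to p(x|w) p(w), normalized over the vocabulary."""
    logits = {w: log_gaussian_diag(x, mu[w], sigma[w]) + math.log(prior[w]) for w in prior}
    top = max(logits.values())          # subtract the max for numerical stability
    unnorm = {w: math.exp(v - top) for w, v in logits.items()}
    z = sum(unnorm.values())
    return {w: v / z for w, v in unnorm.items()}


# Hypothetical vocabulary: mean vectors mu(w), deviations sigma(w), word frequencies p(w).
mu = {"the": [0.0, 0.0], "cat": [3.0, 0.0], "sat": [0.0, 3.0]}
sigma = {w: [0.5, 0.5] for w in mu}
prior = {"the": 0.5, "cat": 0.25, "sat": 0.25}
post = posterior_over_words([2.9, 0.1], mu, sigma, prior)
print(max(post, key=post.get))  # the word whose mean is closest to x
```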
[0284] 302. Process the data to be encoded by using a volume-preserving flow model to obtain a latent variable output, where the volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to multiply first data input to it by a preset coefficient, where the preset coefficient is not 1.
[0285] In this embodiment of the application, a volume preserving flow (VPF) model can be obtained.
[0286] The volume-preserving flow model is used to process the data to be encoded to obtain the latent variable output. The latent variable output is data with a specific probability distribution; by establishing the conditional probability between the latent variable output and the data to be encoded, the probability distribution of the data to be encoded can be obtained.
[0287] Refer to Figure 4, which is a schematic diagram of a data encoding process according to an embodiment of this application: the volume-preserving flow model processes the data to be encoded to obtain the latent variable output, and an encoder processes the latent variable output to obtain the encoded data.
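The encoder's job can be quantified by the ideal code length $\sum_i -\log_2 P(z_i)$, which an entropy coder driven by the per-point probability estimates approaches. The sketch below assumes a standard normal prior discretized into bins of width δ around each fixed-point latent value; the text does not fix the latent distribution, and the function only estimates the length rather than producing a bitstream.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ideal_code_length_bits(latents, delta):
    """Estimated entropy-coding cost, sum of -log2 P(z_i), for fixed-point latents.

    Each latent value z (a multiple of delta) gets the probability mass of the bin
    [z - delta/2, z + delta/2) under an assumed standard normal prior.
    """
    bits = 0.0
    for z in latents:
        p = std_normal_cdf(z + delta / 2) - std_normal_cdf(z - delta / 2)
        # Clamp so that bins far in the tail (p underflows to 0.0) stay finite.
        p = max(p, 2.0 ** -1074)
        bits += -math.log2(p)
    return bits


delta = 2.0 ** -6                       # fixed-point precision k = 6
print(ideal_code_length_bits([0.0], delta), ideal_code_length_bits([3.0], delta))
```

Values the prior considers likely (near 0 here) cost fewer bits, which is why a flow model that maps data to well-modelled latents improves the compression ratio.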
[0288] The structure of the volume-preserving flow model in the embodiments of this application is described below.
[0289] In one implementation, the volume-preserving flow model is a stack of multiple volume-preserving flow layers. Specifically, the model includes M serially connected volume-preserving flow layers, and the output of the (i−1)-th layer is the input of the i-th layer, where i is an integer with 2 ≤ i ≤ M. The input of the first volume-preserving flow layer is the data to be encoded, and the output of the M-th volume-preserving flow layer is the latent variable output.
[0290] Refer to Figure 5, which is a schematic diagram of a volume-preserving flow model provided in this application. The model may include M volume-preserving flow layers (volume-preserving flow layers 1, 2, 3, ..., M in Figure 5). The output of volume-preserving flow layer 1 is the input of volume-preserving flow layer 2, the output of volume-preserving flow layer 2 is the input of volume-preserving flow layer 3, and so on. The output of the M-th volume-preserving flow layer is the latent variable output.
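A serial stack of reversible layers is decoded by undoing the layers in reverse order, from layer M back to layer 1. The sketch below shows this with NumPy, using additive coupling and half-swapping as stand-in volume-preserving layers. These layer types and the shift functions are assumptions for illustration, not the multiplicative target layer of this application, and floating-point arithmetic only recovers the input approximately, whereas this application targets exact reversibility on fixed-point numbers.

```python
import numpy as np


class AdditiveCoupling:
    """Stand-in volume-preserving layer: shift the second half by a function of the first."""

    def __init__(self, shift_fn):
        self.shift_fn = shift_fn

    def forward(self, x):
        x1, x2 = np.split(x, 2)
        return np.concatenate([x1, x2 + self.shift_fn(x1)])

    def inverse(self, y):
        y1, y2 = np.split(y, 2)
        return np.concatenate([y1, y2 - self.shift_fn(y1)])


class Swap:
    """Swap the two halves so that successive couplings update different halves."""

    def forward(self, x):
        x1, x2 = np.split(x, 2)
        return np.concatenate([x2, x1])

    def inverse(self, y):
        return self.forward(y)  # swapping twice restores the input


def flow_forward(layers, x):
    for layer in layers:            # layer i consumes the output of layer i-1
        x = layer.forward(x)
    return x                        # output of layer M is the latent variable output


def flow_inverse(layers, z):
    for layer in reversed(layers):  # undo layer M first and layer 1 last
        z = layer.inverse(z)
    return z


def half_square(h):
    return 0.5 * h ** 2


layers = [AdditiveCoupling(np.tanh), Swap(), AdditiveCoupling(half_square), Swap()]
x = np.array([0.1, -0.4, 0.7, 1.2])
z = flow_forward(layers, x)
print(z, flow_inverse(layers, z))
```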
[0291] In another implementation, the volume-preserving flow model is a stack of volume-preserving flow layers and convolutional layers. Specifically, the model includes M volume-preserving flow layers and M convolutional layers connected alternately in sequence; the M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include a target convolutional layer. The output of the i-th volume-preserving flow layer is the input of the i-th convolutional layer, and the output of the i-th convolutional layer is the input of the (i+1)-th volume-preserving flow layer, where i is a positive integer not greater than M (the second relation applies for i < M). The input of the first volume-preserving flow layer is the data to be encoded, and the output of the M-th convolutional layer is the latent variable output.
[0292] Refer to Figure 6, which is a schematic diagram of a volume-preserving flow model provided in this application. The model may include M volume-preserving flow layers (volume-preserving flow layers 1, 2, 3, ..., M in Figure 6) and M convolutional layers (convolutional layers 1, 2, 3, ..., M in Figure 6). The output of volume-preserving flow layer 1 is the input of convolutional layer 1, the output of convolutional layer 1 is the input of volume-preserving flow layer 2, and so on. The output of the M-th convolutional layer is the latent variable output.
[0293] The volume-preserving flow layer in the embodiments of this application is described below.
[0294] Taking the target volume-preserving flow layer as an example: in this embodiment, the volume-preserving flow model may include a target volume-preserving flow layer, which is used to obtain first data and multiply the first data by a preset coefficient, where the preset coefficient is not 1.
[0295] Specifically, the first data is the input of the target volume-preserving flow layer, and the layer multiplies the first data by the preset coefficient.
[0296] In one implementation, because computers store data in binary, the input of the target volume-preserving flow layer can be converted into a fixed-point number $\bar{x}$ with precision k, that is, $2^k \bar{x}$ is an integer, where x is the first data.
[0297] In this embodiment of the application, the first data and the preset coefficient are vectors. The first data includes N elements, and the preset coefficient includes N coefficients. The N elements of the first data correspond one-to-one with the N coefficients. Therefore, when performing the multiplication operation between the first data and the preset coefficient, the multiplication operation between each element in the first data and the corresponding coefficient can be performed to obtain the product result, which is a vector containing N elements.
[0298] The following describes how to calculate the preset coefficients:
[0299] In one implementation, the second data input to the target volume-preserving flow layer can be processed by a first neural network to obtain a first network output, and a preset operation can be performed on the first network output to obtain the preset coefficient. In one implementation, the preset operation is an exponential operation with the natural constant e as the base.
[0300] In this embodiment of the application, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies the volume-preserving flow constraint.
[0301] The volume-preserving flow constraint means that the input space and the output space of the operation corresponding to the volume-preserving flow layer have the same volume.
[0302] In this embodiment of the application, to ensure that the operation corresponding to the target volume-preserving flow layer satisfies the volume-preserving flow constraint, the product of the coefficients of the first-order terms in that operation must be 1. Specifically, the first data and the preset coefficients are vectors. The first data includes N elements and the preset coefficients include N coefficients, in one-to-one correspondence with the N elements. The N coefficients are the coefficients of the first-order terms in the operation corresponding to the target volume-preserving flow layer, and their product is 1.
[0303] To ensure that the product of the N coefficients in the preset coefficients is 1, the mean of the elements of the first network output can be subtracted from each element. Specifically, the first network output is a vector of N elements. The average of the N elements is computed and subtracted from each element to obtain N processed elements, which then sum to 0. Each processed element is exponentiated with the natural constant e as the base, yielding the N coefficients of the preset coefficient; because the exponents sum to 0, the product of the coefficients is $e^0 = 1$.
[0304] In this embodiment of the application, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies the volume-preserving flow constraint.
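The mean-subtraction step can be sketched in a few lines of Python. `preset_coefficients` is a hypothetical helper, and the list `h` merely stands in for the first network output.

```python
import math


def preset_coefficients(network_output):
    """Map the first network output h (N values) to N coefficients whose product is 1.

    Subtracting the mean makes the exponents sum to 0, so their exponentials
    multiply to e**0 = 1.
    """
    mean = sum(network_output) / len(network_output)
    return [math.exp(h_i - mean) for h_i in network_output]


h = [0.3, -1.2, 2.0]          # hypothetical first network output
s = preset_coefficients(h)
print(math.prod(s))           # 1.0 up to floating-point rounding
```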
[0305] The term "reversible operation" refers to an operation that can both obtain output data from input data and deduce input data from output data. For example, if the input data is x and the output data is z = f(x), x can be recovered from the output data z through the inverse operation.
[0306] The following describes how the operation corresponding to the target volume-preserving flow layer is kept reversible even though it includes multiplication.
[0307] In this embodiment, the first data and the preset coefficients are vectors. The first data includes N elements and the preset coefficients include N coefficients, in one-to-one correspondence with the N elements. The N elements include a first target element and a second target element; the first target element corresponds to a first target coefficient, and the second target element corresponds to a second target coefficient. A first fixed-point number corresponding to the first target element and a second fixed-point number corresponding to the second target element are obtained, as are a first fraction corresponding to the first target coefficient and a second fraction corresponding to the second target coefficient. The first fraction consists of a first numerator and a first denominator, and the second fraction consists of a second numerator and a second denominator; all four are integers, and the first denominator equals the second numerator. The first fixed-point number is multiplied by the first numerator to obtain a first result, and the first result is divided by the first denominator to obtain a second result consisting of a first quotient and a first remainder; the first quotient is used as the result of multiplying the first target element by the first target coefficient. The second fixed-point number is multiplied by the second numerator to obtain a third result, the first remainder is added to the third result to obtain a fourth result, and the fourth result is divided by the second denominator to obtain a fifth result consisting of a second quotient and a second remainder; the second quotient is used as the result of multiplying the second target element by the second target coefficient.
[0308] In this embodiment, the reversible calculation problem is solved by division with remainder. Specifically, the coefficient of each linear term is converted into a fraction, and the numerator of each dimension equals the denominator of the previous dimension. The data of each dimension is multiplied by the numerator of the current linear-term coefficient and the remainder from the previous dimension is added; the sum is then divided, with remainder, by the denominator of the current coefficient to obtain the result for the current dimension. The remainder of this division is passed to the next dimension to eliminate numerical error.
[0309] For example, the fixed-point first data x can be [44/16, 55/16, 66/16], where the common denominator 16 (that is, $2^k$ with k = 4) expresses the fixed-point precision and does not take part in the multiplication, so the first data is represented by the integers x = [44, 55, 66]. The preset coefficient s is [0.65, 0.61, 2.52], approximated by the fractions [2/3, 3/5, 5/2]. Here, the first fixed-point number is 44, the second fixed-point number is 55, the first target coefficient is 0.65, the second target coefficient is 0.61, the first fraction is 2/3, the second fraction is 3/5, the first numerator is 2, the first denominator is 3, the second numerator is 3, and the second denominator is 5. The first fixed-point number (44) is multiplied by the first numerator (2) to obtain a first result (88). The first result (88) is divided by the first denominator (3) to obtain a second result consisting of a first quotient result (29) and a first remainder result (1). The first quotient result (29) is used as the result of multiplying the first target element by the first target coefficient. The second fixed-point number (55) is multiplied by the second numerator (3) to obtain a third result (165), and the first remainder result (1) is added to obtain a fourth result (166). The fourth result (166) is divided by the second denominator (5) to obtain a fifth result consisting of a second quotient result (33) and a second remainder result (1). The second quotient result (33) is used as the result of multiplying the second target element by the second target coefficient.
[0310] The second target element is the last of the N elements to be multiplied by its corresponding coefficient during the multiplication of the first data by the preset coefficient. The target volume-preserving flow layer also outputs the second remainder result, specifically to the next volume-preserving flow layer adjacent to it. That is, each element of the first data yields a remainder result in the manner above, and that remainder is fed into the calculation for the next element until the last element of the first data has been multiplied; the remainder result at that point is passed to the next adjacent volume-preserving flow layer.
[0311] In one implementation, if the target volume-preserving flow layer is the first volume-preserving flow layer in the volume-preserving flow model (that is, the layer that processes the data to be encoded), the first result is the product of the first fixed-point number and the first numerator. If the target volume-preserving flow layer is not the first layer (that is, it processes the output of another intermediate layer rather than the data to be encoded), the first result is the sum of the product of the first fixed-point number and the first numerator and the remainder result output by the adjacent previous volume-preserving flow layer.
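The chained computation in this example can be reproduced with Python's `divmod`, which returns the quotient and the remainder together. This is a sketch assuming the integer fixed-point representation above; `multiply_with_remainder` and its parameter names are hypothetical, and `r` is the remainder carried in (0 for the first layer). Continuing one element further also gives the third product: 66 × 5 + 1 = 331, which divides by 2 into 165 with remainder 1.

```python
def multiply_with_remainder(x, numerators, denominators, r=0):
    """Multiply integer fixed-point values x[i] by numerators[i] / denominators[i].

    The remainder of each division is carried into the next element; the chain
    requires numerators[i] == denominators[i - 1]. Returns the quotients and the
    final remainder, which is handed to the next layer.
    """
    for i in range(1, len(x)):
        assert numerators[i] == denominators[i - 1], "numerator must equal previous denominator"
    out = []
    for x_i, num, den in zip(x, numerators, denominators):
        quotient, r = divmod(x_i * num + r, den)
        out.append(quotient)
    return out, r


# Worked example: x = [44, 55, 66] with the fractions 2/3, 3/5, 5/2.
y, r = multiply_with_remainder([44, 55, 66], [2, 3, 5], [3, 5, 2])
print(y, r)  # [29, 33, 165] 1
```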
[0312] In one implementation, the target volume-preserving flow layer is further used to add a constant term to the product of the first data and the preset coefficient, where the constant term is not 0.
[0313] Specifically, the second data input to the target volume-preserving flow layer can be processed by a second neural network to obtain the constant term.
[0314] The first and second neural networks can be complex convolutional neural networks, such as ResNet or DenseNet. For a detailed description of convolutional neural networks, refer to the embodiments corresponding to Figure 2b and Figure 2c; details are not repeated here.
[0315] In this embodiment of the application, the output of the target volume-preserving flow layer may include the second data; that is, the second data is both the basis for calculating the constant term and the preset coefficients and part of the output of the target volume-preserving flow layer.
[0316] The calculations of the target volume-preserving flow layer in this embodiment are described below using formulas:
[0317] Decompose the input x of the layer into two parts along its dimensions, $x = [x_a, x_b]$, where $x_a$ is the second data and $x_b$ is the first data, a $d_b$-dimensional vector. The operation $z = f(x)$ corresponding to the target volume-preserving flow layer can be expressed as:
[0318] $z_a = x_a,\quad z_b = \exp(s(x_a)) \odot x_b + t(x_a),\quad z = [z_a, z_b];$
[0319] where $s(\cdot)$ and $t(\cdot)$ are trainable neural networks (such as convolutional neural networks), $s(\cdot)$ being the first neural network and $t(\cdot)$ the second neural network; $\odot$ denotes element-wise multiplication of vectors, and $\exp(\cdot)$ is the exponential with the natural constant e as the base. Because of the volume-preserving constraint, the product of the elements of $\exp(s(x_a))$ must be 1, which is equivalent to the elements of $s(x_a)$ summing to 0, i.e., $\mathrm{sum}(s(x_a)) = 0$, where sum is the sum of the vector's elements. In the implementation, $s(x_a)$ is replaced by $s(x_a) \leftarrow s(x_a) - \mathrm{mean}(s(x_a))$, where mean is the average of the vector's elements.
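The real-valued form of this layer can be sketched as follows. `s_net` and `t_net` are hypothetical stand-ins for the trained networks $s(\cdot)$ and $t(\cdot)$; the sketch only shows that centering $s(x_a)$ makes the log-determinant of the Jacobian (here the sum of the exponents) equal to 0, and that $x_b$ can be recovered because $z_a = x_a$.

```python
import math


def centered_log_scale(x_a, s_net):
    """s(x_a) - mean(s(x_a)): the exponents sum to 0, so their exponentials multiply to 1."""
    raw = s_net(x_a)
    mean = sum(raw) / len(raw)
    return [v - mean for v in raw]


def coupling_forward(x_a, x_b, s_net, t_net):
    """z_a = x_a, z_b = exp(s(x_a)) * x_b + t(x_a); also returns log|det J| = sum of exponents."""
    log_s = centered_log_scale(x_a, s_net)
    t = t_net(x_a)
    z_b = [math.exp(ls) * xb + ti for ls, xb, ti in zip(log_s, x_b, t)]
    return x_a, z_b, sum(log_s)


def coupling_inverse(z_a, z_b, s_net, t_net):
    """x_a = z_a, x_b = (z_b - t(x_a)) / exp(s(x_a))."""
    log_s = centered_log_scale(z_a, s_net)
    t = t_net(z_a)
    x_b = [(zb - ti) / math.exp(ls) for zb, ti, ls in zip(z_b, t, log_s)]
    return z_a, x_b


# Hypothetical stand-ins for the trainable networks, mapping 2 dims to 3 dims.
def s_net(x_a):
    return [0.5 * x_a[0], -0.2 * x_a[1], 0.1]


def t_net(x_a):
    return [x_a[0] + x_a[1], 1.0, -x_a[0]]


z_a, z_b, log_det = coupling_forward([1.0, 2.0], [0.3, -0.7, 1.5], s_net, t_net)
_, x_b = coupling_inverse(z_a, z_b, s_net, t_net)
print(log_det)  # approximately 0: the layer preserves volume
print(x_b)      # approximately [0.3, -0.7, 1.5], equal only up to floating-point error
```

The floating-point round trip is only approximate, which is the numerical-invertibility problem that the fixed-point procedure with division with remainder addresses.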
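[0320] Let $s = \exp(s(x_a))$ and $t = t(x_a)$. The operation corresponding to the target volume-preserving flow layer can then be expressed as the linear transformation $z_b = s \odot x_b + t$, where s and t are the coefficient of the linear term and the constant term, respectively, and the product of the elements of s is 1. This embodiment uses division with remainder to guarantee exact numerical reversibility. First, additional data (the remainder result) $r \in [0, 2^C)$ is initialized. Let $s_i$, $t_i$ and $x_i$ ($i = 1, \ldots, d_b$) denote the i-th elements of s, t and $x_b$, respectively. The layer must compute $z_i \leftarrow s_i \cdot x_i + t_i$ and output the fixed-point result $\bar{z}$ together with the remainder result r.
[0321] For example, $z_i$ can be calculated as follows. First, obtain the fixed-point number $\bar{x}_i$ of $x_i$ with precision k, for $i = 1, 2, \ldots, d_b$, and choose positive integers $m_0, m_1, \ldots, m_{d_b}$ with $m_{d_b} = m_0$ such that $s_i \approx m_{i-1}/m_i$; the approximated coefficients then still multiply to exactly 1. Then, for $i = 1, 2, \ldots, d_b$: $v = 2^k \bar{x}_i \cdot m_{i-1} + r$; $y_i = \lfloor v / m_i \rfloor$, $r \leftarrow v \bmod m_i$. At this point $y_i \approx 2^k \bar{x}_i \cdot s_i$, where $\lfloor \cdot \rfloor$ denotes the floor function and mod the modulo operation. Finally, $\bar{z}_i = (y_i + \mathrm{round}(2^k t_i)) / 2^k$. Because $y_i \approx 2^k \bar{x}_i \cdot s_i$, it follows that $\bar{z}_i \approx s_i \bar{x}_i + t_i$, so the numerical result for $z_i$ is accurate.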
[0322] The following describes how to perform the inverse operation to recover x from z. During the inverse operation, z is decomposed into two parts, $z = [z_a, z_b]$, and the inverse operation $x = f^{-1}(z)$ of this layer is: $x_a = z_a,\quad x_b = (z_b - t(x_a)) \oslash \exp(s(x_a)),\quad x = [x_a, x_b];$
[0323] where $\oslash$ denotes element-wise division of vectors. The inverse process must compute $x_i = (z_i - t_i)/s_i$, given the output remainder $r \in [0, 2^C)$. The corresponding numerical procedure is as follows. First, compute $y_i = 2^k \bar{z}_i - \mathrm{round}(2^k t_i)$ for $i = 1, \ldots, d_b$, which recovers exactly the $y_i$ of the forward pass. Then, for $i = d_b, \ldots, 2, 1$: $v = y_i \cdot m_i + r$; $y_i \leftarrow \lfloor v / m_{i-1} \rfloor$, $r \leftarrow v \bmod m_{i-1}$. At this point $y_i = 2^k \bar{x}_i$, so $\bar{x}_i = y_i / 2^k \approx (z_i - t_i)/s_i$. Output $\bar{x}$ and r.
[0324] In the above calculation, because the quotient and remainder of a division with remainder are unique, x and r are guaranteed to be exactly recoverable; that is, the forward and inverse operations of the target volume-preserving flow layer fully restore the original x and the remainder result r.
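The forward and inverse procedures can be combined into one Python sketch that operates on the integer representations $2^k\bar{x}_i$ and $2^k\bar{z}_i$. How the integers $m_i$ are derived from s is not specified in this section; the rule $m_0 = 2^C$, $m_i = \mathrm{round}(m_{i-1}/s_i)$ in the hypothetical helper `make_moduli` is an assumption, chosen so that an incoming remainder $r < 2^C$ is smaller than $m_0$ as invertibility requires. With C = 2 it happens to reproduce the fractions 2/3, 3/5, 5/2 of the worked example.

```python
def make_moduli(s, C):
    """Hypothetical choice of integers m_0..m_d with s_i ~ m_{i-1} / m_i and m_d == m_0.

    m_0 = 2**C and m_i = round(m_{i-1} / s_i) are illustrative assumptions. Closing the
    chain with m_d = m_0 makes the approximated coefficients multiply to exactly 1.
    """
    m = [2 ** C]
    for s_i in s[:-1]:
        m.append(max(1, round(m[-1] / s_i)))
    m.append(m[0])
    return m


def layer_forward(x_int, s, t, k, C, r):
    """x_int holds 2**k * x_i as integers; r in [0, 2**C). Returns 2**k * z_i and the new r."""
    m = make_moduli(s, C)
    z_int = []
    for i, x_i in enumerate(x_int):
        y_i, r = divmod(x_i * m[i] + r, m[i + 1])   # y_i ~ 2**k * x_i * s_i
        z_int.append(y_i + round(2 ** k * t[i]))    # add the constant term in fixed point
    return z_int, r


def layer_inverse(z_int, s, t, k, C, r):
    """Undo layer_forward: visit the dimensions in reverse and unwind the carried remainder."""
    m = make_moduli(s, C)
    x_int = [0] * len(z_int)
    for i in reversed(range(len(z_int))):
        y_i = z_int[i] - round(2 ** k * t[i])
        x_int[i], r = divmod(y_i * m[i + 1] + r, m[i])
    return x_int, r


s = [0.65, 0.61, 2.52]    # with C = 2 the moduli are [4, 6, 10, 4], i.e. 2/3, 3/5, 5/2
t = [0.1, -0.25, 0.0]
z, r_out = layer_forward([44, 55, 66], s, t, k=4, C=2, r=3)
print(layer_inverse(z, s, t, k=4, C=2, r=r_out))  # ([44, 55, 66], 3)
```

Because `divmod` is exact integer arithmetic, the round trip restores both the data and the incoming remainder bit for bit, which is the property stated above.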
[0325] In one implementation, referring to the architecture shown in Figure 6, the volume-preserving flow model further includes a target convolutional layer connected to the target volume-preserving flow layer. The output of the target volume-preserving flow layer is used as the input of the target convolutional layer, and the target convolutional layer multiplies that output by a weight matrix. In one implementation, the target convolutional layer is a 1x1 convolutional layer.
[0326] Specifically, the weight matrix can be obtained and LU-decomposed into a first matrix, a second matrix, a third matrix, and a fourth matrix. The first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix.
[0327] First, the input of the target convolutional layer (that is, the output of the target volume-preserving flow layer) can be multiplied by the fourth matrix to obtain a sixth result. Specifically, let the weight matrix of the target convolutional layer be $W \in \mathbb{R}^{c \times c}$; the operation of the target convolutional layer is then equivalent to matrix multiplication, so for input data x of dimension c, $z = Wx$. LU decomposition gives $W = PL\Lambda U$, where P is the permutation matrix, L the lower triangular matrix, U the upper triangular matrix, and Λ the diagonal matrix whose diagonal elements multiply to 1 ($\det \Lambda = 1$). z can then be obtained by multiplying x successively by U, Λ, L and P, i.e., $z = PL\Lambda Ux$, starting with $Ux$. Specifically, let the sixth result be $z = Ux$, let $u_{ij}$ be the element in the i-th row and j-th column of U, and let $x_i$ and $z_i$ be the i-th elements of x and z, respectively. The forward calculation is then:
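One way to obtain such factors is to build W directly from them, as in Glow-style LU-parameterized 1x1 convolutions; this is an assumption for illustration, since the text describes decomposing a given W. The NumPy sketch assumes L and U have unit diagonals, matching the iterative equations that follow, and reuses the mean-centering trick of the preset coefficients for Λ, which yields a weight matrix with |det W| = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 4

# Hypothetical parameterization: construct the four factors directly.
P = np.eye(c)[rng.permutation(c)]                         # permutation matrix
L = np.tril(rng.normal(size=(c, c)), k=-1) + np.eye(c)    # unit lower triangular
U = np.triu(rng.normal(size=(c, c)), k=1) + np.eye(c)     # unit upper triangular
h = rng.normal(size=c)
Lam = np.diag(np.exp(h - h.mean()))                       # diagonal, entries multiply to 1

W = P @ L @ Lam @ U                                       # the 1x1 convolution weight
print(np.prod(np.diag(Lam)))    # approximately 1.0
print(abs(np.linalg.det(W)))    # approximately 1.0, since |det P| = det L = det U = det Lam = 1
```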
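[0328] $z_i = x_i + \sum_{j=i+1}^{c} u_{ij} x_j,\quad i = 1, \ldots, c$ (so that $z_c = x_c$);
[0329] The reverse calculation starts from the last dimension and iterates through each dimension:
[0330] $x_c = z_c,\quad x_i = z_i - \sum_{j=i+1}^{c} u_{ij} x_j,\quad i = c-1, \ldots, 1.$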
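A minimal integer sketch of this iterative calculation follows, assuming U has a unit diagonal and that each off-diagonal sum is rounded to the integer grid (the rounding is an assumption; this section does not state it). The rounded term depends only on elements that the step leaves unchanged, so the reverse pass reproduces it exactly. Function names are hypothetical.

```python
def upper_forward(x, u):
    """z = U x for a unit upper triangular U on integer fixed-point values.

    z_i = x_i + round(sum_{j>i} u_ij * x_j); the rounding is an illustrative assumption.
    """
    c = len(x)
    return [x[i] + round(sum(u[i][j] * x[j] for j in range(i + 1, c))) for i in range(c)]


def upper_inverse(z, u):
    """x_c = z_c, then x_i = z_i - round(sum_{j>i} u_ij * x_j) for i = c-1, ..., 1."""
    c = len(z)
    x = [0] * c
    for i in reversed(range(c)):
        # x_j for j > i is already recovered, so the same rounded sum is reproduced exactly.
        x[i] = z[i] - round(sum(u[i][j] * x[j] for j in range(i + 1, c)))
    return x


u = [[1.0, 0.5, -0.3],
     [0.0, 1.0, 0.25],
     [0.0, 0.0, 1.0]]
z = upper_forward([44, 55, 66], u)
print(upper_inverse(z, u))  # [44, 55, 66]
```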
[0331] The sixth result can then be multiplied by the third matrix to obtain a seventh result. Specifically, let the seventh result be $z = \Lambda x$, let $\lambda_i$ be the i-th diagonal element of Λ, and let $x_i$ and $z_i$ be the i-th elements of x and z, respectively; then $z_i = \lambda_i \cdot x_i$. Because $\prod_i \lambda_i = 1$, the product of the diagonal matrix and the sixth result can be calculated with the method described above for the operation of the target volume-preserving flow layer (division with remainder), and details are not repeated here.
[0332] The seventh result can then be multiplied by the second matrix to obtain an eighth result. Specifically, let the eighth result be $z = Lx$, let $l_{ij}$ be the element in the i-th row and j-th column of L, and let $x_i$ and $z_i$ be the i-th elements of x and z, respectively. The forward calculation is then:
[0333] $z_1 = x_1,\quad z_i = x_i + \sum_{j=1}^{i-1} l_{ij} x_j,\quad i = 2, \ldots, c;$
[0334] Reverse computation can start from the first dimension and iterate through each dimension:
[0335] $x_1 = z_1,\quad x_i = z_i - \sum_{j=1}^{i-1} l_{ij} x_j,\quad i = 2, \ldots, c.$
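The lower triangular case mirrors the upper one, with the iteration running forwards from the first dimension. The same assumptions apply: unit diagonal, rounding of the off-diagonal sum to the integer grid, and hypothetical function names.

```python
def lower_forward(x, l):
    """z = L x for a unit lower triangular L: z_1 = x_1, z_i = x_i + round(sum_{j<i} l_ij * x_j)."""
    return [x[i] + round(sum(l[i][j] * x[j] for j in range(i))) for i in range(len(x))]


def lower_inverse(z, l):
    """x_1 = z_1, then x_i = z_i - round(sum_{j<i} l_ij * x_j), walking forwards."""
    x = []
    for i in range(len(z)):
        # x_j for j < i is already recovered, so the rounded sum matches the forward pass.
        x.append(z[i] - round(sum(l[i][j] * x[j] for j in range(i))))
    return x


l = [[1.0, 0.0, 0.0],
     [0.4, 1.0, 0.0],
     [-1.3, 0.7, 1.0]]
z = lower_forward([29, 33, 165], l)
print(lower_inverse(z, l))  # [29, 33, 165]
```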
[0336] The eighth result can then be multiplied by the first matrix to obtain a ninth result, which is used as the result of multiplying the output of the target volume-preserving flow layer by the weight matrix. Specifically, let the ninth result be $z = Px$: z is obtained by rearranging the elements of x according to the permutation matrix P, and x is restored by rearranging the elements of z according to $P^{-1}$.
[0337] In this embodiment, the target convolutional layer is converted into successive multiplications by an upper triangular matrix, a diagonal matrix, a lower triangular matrix and a permutation matrix, which are computed by iterative calculation, the numerical calculation of the coupling (volume-preserving flow) layer, iterative calculation, and element rearrangement, respectively.
[0338] 303. Encode the latent variable output to obtain encoded data.
[0339] In this embodiment of the application, the latent variable output z can be described by the probability distribution $p_Z(z)$, and z is encoded according to $p_Z(z)$ to obtain the encoded data.
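Multiplying by a permutation matrix is a pure rearrangement, so it can be sketched without any arithmetic. Here `perm[i]`, a hypothetical encoding of P, is the column holding the 1 in row i of P.

```python
def permute(x, perm):
    """z = P x: row i of the permutation matrix P picks element perm[i] of x."""
    return [x[p] for p in perm]


def unpermute(z, perm):
    """x = P^{-1} z: send each element of z back to the index it was taken from."""
    x = [None] * len(z)
    for i, p in enumerate(perm):
        x[p] = z[i]
    return x


perm = [2, 0, 1]
z = permute([10, 20, 30], perm)
print(z)                   # [30, 10, 20]
print(unpermute(z, perm))  # [10, 20, 30]
```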
[0340] In one alternative implementation, the encoded data is a binary bitstream. The probability estimate of each point in the latent variable output can be obtained using an entropy estimation network. The latent variable output is then entropy encoded using this probability estimate to obtain the binary bitstream. It should be noted that the entropy encoding process mentioned in this application can use existing entropy encoding techniques, which will not be elaborated upon here.
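The link between the probability estimates and the size of the bitstream can be illustrated by the ideal code length $\sum_i -\log_2 p(z_i)$, which a good entropy coder approaches. The sketch below is not an entropy coder, and its probability table is a hypothetical stand-in for the entropy estimation network.

```python
import math


def ideal_code_length_bits(symbols, probs):
    """Bits an ideal entropy coder needs: the sum of -log2 p(symbol) over the latent symbols."""
    return sum(-math.log2(probs[sym]) for sym in symbols)


probs = {0: 0.5, 1: 0.25, -1: 0.25}   # hypothetical per-symbol probability estimates
print(ideal_code_length_bits([0, 1, 0, -1], probs))  # 6.0
```

Sharper probability estimates for the latent variable output lower this bound, which is why a model that captures the data distribution more accurately compresses better.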
[0341] In this embodiment of the application, after obtaining the encoded data, the encoded data can be sent to a device for decompression, which can then decompress (or decode) the data. Alternatively, the terminal device for compression can store the encoded data in a storage device, and when needed, the terminal device can retrieve the encoded data from the storage device and decompress it.
[0342] This application provides a data encoding method, the method comprising: acquiring data to be encoded; acquiring a volume-preserving flow model and processing the data to be encoded with the volume-preserving flow model to obtain a latent variable output, wherein the volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to acquire first data and multiply the first data by a preset coefficient, wherein the preset coefficient is not 1; and encoding the latent variable output to obtain encoded data.
[0343] On the one hand, this application uses a volume-preserving flow model to achieve lossless compression. Compared with an integer flow model, the target volume-preserving flow layer of the volume-preserving flow model includes, while remaining reversible, an operation other than integer addition and subtraction (multiplication). This gives the volume-preserving flow model stronger representation capability and lets it estimate the data distribution more accurately, thereby achieving a better compression ratio.
[0344] On the other hand, during multiplication, division with remainder is used to solve the numerical reversibility problem of the target volume-preserving flow layer. The coefficients of the first-order terms (that is, the preset coefficients in the above embodiments) are converted into fractions, with the numerator of each dimension equal to the denominator of the previous dimension. The data of each dimension is multiplied by the numerator of the current first-order coefficient and the remainder from the previous dimension is added; the sum is then divided, with remainder, by the denominator, and finally the constant term is added to obtain the result for the current dimension. The remainder of the division is passed to the next dimension to eliminate numerical error, which makes the target volume-preserving flow layer numerically reversible.
[0345] On the other hand, for the multiplication by the weight matrix in the convolutional layer, the target convolutional layer is converted into successive multiplications by an upper triangular matrix, a diagonal matrix, a lower triangular matrix and a permutation matrix. These four matrix multiplications are computed by iterative calculation, the numerical calculation of the target volume-preserving flow layer, iterative calculation, and element rearrangement, respectively, and a reversible calculation method is given for each, which makes the target convolutional layer numerically reversible.
[0346] On the other hand, for general flow models it can be proven that no method achieves numerical invertibility in discrete space: because of numerical errors, a latent variable can always correspond to multiple input data. Multiple encoding operations must then be performed to eliminate the numerical errors, which makes the algorithm inefficient. The volume-preserving flow model in this embodiment instead uses the numerically invertible target volume-preserving flow layer, so the model keeps strong representational capability while compression needs very few encoding operations, achieving higher compression throughput and a better compression ratio.
[0347] Refer to Figure 7, which is a flowchart of a data decoding method according to an embodiment of this application. As shown in Figure 7, the data decoding method includes:
[0348] 701. Obtain the encoded data.
[0349] In this embodiment of the application, the decoding device can obtain the encoded data obtained in step 303 of the embodiment corresponding to Figure 3.
[0350] In this embodiment of the application, after obtaining the encoded data, the encoded data can be sent to a terminal device for decompression. The image processing device for decompression can then obtain the encoded data and decompress it. Alternatively, the terminal device for compression can store the encoded data in a storage device. When needed, the terminal device can retrieve the encoded data from the storage device and decompress it.
[0351] It should be understood that the decoding device can also obtain the remainder result as described in the above embodiments.
[0352] 702. Decode the encoded data to obtain the latent variable output.
[0353] In this embodiment of the application, the decoding device can decode the encoded data to obtain the latent variable output.
[0354] Specifically, an existing entropy decoding technique can be used to decode the encoded data and obtain the reconstructed latent variable output.
[0355] 703. Process the latent variable output with the volume-preserving flow model to obtain the decoded output, where the volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to multiply the first data input to it by a preset coefficient, where the preset coefficient is not 1.
[0356] In this embodiment of the application, after the latent variable output is obtained, it can be processed with the inverse of the operation of each layer of the volume-preserving flow model in the embodiment corresponding to Figure 3, so as to restore the original data to be encoded (that is, the decoded output), thereby realizing lossless decompression.
[0357] For details, refer to Figure 8, which is a schematic diagram of a data encoding and decoding process according to an embodiment of this application. In the forward operation, the volume-preserving flow model processes the data to be encoded to obtain the latent variable output, and the encoder processes the latent variable output to obtain the encoded data; in the reverse operation, the volume-preserving flow model processes the latent variable output to obtain the decoded output.
[0358] Referring to the embodiment corresponding to Figure 3, in one implementation the volume-preserving flow model can be a stack of multiple volume-preserving flow layers. Specifically, the volume-preserving flow model can include M serially connected volume-preserving flow layers, where the output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, i is a positive integer not greater than M, the input of the first volume-preserving flow layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0359] For details, refer to Figure 9, which is a schematic diagram of a volume-preserving flow model according to an embodiment of this application. The volume-preserving flow model may include M volume-preserving flow layers, shown in Figure 9 as volume-preserving flow layers 1, 2, 3, ..., M. During decoding, the output of the first layer in decoding order (volume-preserving flow layer M) is used as the input of the second (volume-preserving flow layer M-1), and so on: the output of the (M-2)-th (volume-preserving flow layer 3) is the input of the (M-1)-th (volume-preserving flow layer 2), the output of the (M-1)-th (volume-preserving flow layer 2) is the input of the M-th (volume-preserving flow layer 1), and the output of the M-th (volume-preserving flow layer 1) is the decoded output.
[0360] In one implementation, the volume-preserving flow model can be a stack of multiple volume-preserving flow layers and convolutional layers. Specifically, it includes M serially connected volume-preserving flow layers and M convolutional layers; the M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include the target convolutional layer. Counting in decoding order, the output of the i-th convolutional layer is used as the input of the i-th volume-preserving flow layer, and the output of the i-th volume-preserving flow layer is used as the input of the (i+1)-th convolutional layer, where i is a positive integer not greater than M; the input of the first convolutional layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0361] For details, refer to Figure 10, which is a schematic diagram of a volume-preserving flow model according to an embodiment of this application. The volume-preserving flow model may include M volume-preserving flow layers, shown in Figure 10 as volume-preserving flow layers 1, 2, 3, ..., M, and M convolutional layers, shown in Figure 10 as convolutional layers 1, 2, 3, ..., M. During decoding, the output of the first convolutional layer (convolutional layer M) is used as the input of the first volume-preserving flow layer (volume-preserving flow layer M), whose output is the input of the second convolutional layer (convolutional layer M-1), and so on: the output of the (M-1)-th convolutional layer (convolutional layer 2) is the input of the (M-1)-th volume-preserving flow layer (volume-preserving flow layer 2), the output of volume-preserving flow layer 2 is the input of the M-th convolutional layer (convolutional layer 1), the output of convolutional layer 1 is the input of the M-th volume-preserving flow layer (volume-preserving flow layer 1), and the output of volume-preserving flow layer 1 is the decoded output.
[0362] In one possible implementation, the volume-preserving flow constraint includes: the input space and the output space of the operation corresponding to the volume-preserving flow layer have the same volume.
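The layer ordering can be sketched with toy invertible layers. `toy_flow_layer` and `toy_conv_layer` are hypothetical integer stand-ins (an additive shift and a swap), not the layers of this application; only the order of application matters here: encoding runs the layer pairs 1 to M, and decoding applies the inverses from convolutional layer M back to volume-preserving flow layer 1.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Layer:
    forward: Callable[[List[int]], List[int]]
    inverse: Callable[[List[int]], List[int]]


def toy_flow_layer(a):
    """Hypothetical integer stand-in for a volume-preserving flow layer (2 dims)."""
    return Layer(lambda x: [x[0], x[1] + a * x[0]],
                 lambda z: [z[0], z[1] - a * z[0]])


# Hypothetical stand-in for a 1x1 convolutional layer: a permutation (swap).
toy_conv_layer = Layer(lambda x: [x[1], x[0]], lambda z: [z[1], z[0]])


def encode(x, pairs):
    """Encoding: flow layer 1 -> conv layer 1 -> flow layer 2 -> ... -> conv layer M."""
    for flow, conv in pairs:
        x = conv.forward(flow.forward(x))
    return x


def decode(z, pairs):
    """Decoding: inverse of conv layer M -> flow layer M -> ... -> flow layer 1."""
    for flow, conv in reversed(pairs):
        z = flow.inverse(conv.inverse(z))
    return z


pairs = [(toy_flow_layer(a), toy_conv_layer) for a in (2, -3, 5)]   # M = 3
latent = encode([7, -4], pairs)
print(decode(latent, pairs))  # [7, -4]
```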
[0363] In one possible implementation, the first data and the preset coefficients are vectors. The first data includes N elements and the preset coefficients include N coefficients, in one-to-one correspondence with the N elements, and the product of the N coefficients is 1. During decoding, each element of the first data is divided by its corresponding coefficient to obtain the division result.
[0364] The following describes how to calculate the preset coefficients:
[0365] In one possible implementation, the second data input to the target volume-preserving flow layer can be processed by a first neural network to obtain the first network output, and a preset operation can be performed on the first network output to obtain the preset coefficient. In one implementation, the preset operation is an exponential operation with the natural constant e as the base.
[0366] The volume-preserving flow constraint means that the input space and the output space of the operation corresponding to the volume-preserving flow layer have the same volume.
[0367] In this embodiment of the application, to ensure that the operation corresponding to the target volume-preserving flow layer satisfies the volume-preserving flow constraint, the product of the coefficients of the first-order terms in that operation must be 1. Specifically, the first data and the preset coefficients are vectors. The first data includes N elements and the preset coefficients include N coefficients, in one-to-one correspondence with the N elements. The N coefficients are the coefficients of the first-order terms in the operation corresponding to the target volume-preserving flow layer, and their product is 1.
[0368] To ensure that the product of the N coefficients in the preset coefficients is 1, the mean of the elements of the first network output can be subtracted from each element. Specifically, the first network output is a vector of N elements. The average of the N elements is computed and subtracted from each element to obtain N processed elements, which then sum to 0. Each processed element is exponentiated with the natural constant e as the base, yielding the N coefficients of the preset coefficient; because the exponents sum to 0, the product of the coefficients is $e^0 = 1$.
[0369] In this embodiment of the application, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies the volume-preserving flow constraint.
[0370] The term "reversible operation" refers to an operation that can both obtain output data from input data and deduce input data from output data. For example, if the input data is x and the output data is z = f(x), x can be recovered from the output data z through the inverse operation.
[0371] In one possible implementation, the first network output is a vector comprising N elements. The average of the N elements in the first network output can be obtained, and the average can be subtracted from each element in the first network output to obtain the processed N elements. Each of the processed N elements is subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficient, which comprises N coefficients.
[0372] In this embodiment, the reversible computation problem is solved by using division with remainder during the encoding process. Specifically, the coefficients of the linear terms are converted into fractional form, with the numerator of each dimension serving as the denominator of the previous dimension. The data for each dimension is multiplied by the numerator of the current linear term coefficient and the remainder from the previous dimension is added. Then, division with remainder is performed on the denominator to obtain the result for the current dimension. Simultaneously, the remainder of the division with remainder is passed to the next dimension to eliminate numerical errors. During the decoding process, the inverse operation of division with remainder is required.
[0373] In one possible implementation, the first data and the preset coefficients are vectors: the first data includes N elements, the preset coefficients include N coefficients, and the N elements of the first data correspond one-to-one with the N coefficients. The N elements include a first target element, corresponding to a first target coefficient, and a second target element, corresponding to a second target coefficient. A first fixed-point number corresponding to the first target element and a second fixed-point number corresponding to the second target element are obtained, as are a first fraction corresponding to the first target coefficient and a second fraction corresponding to the second target coefficient. The first fraction consists of a first numerator and a first denominator, and the second fraction consists of a second numerator and a second denominator; all four are integers, and the first numerator is the same as the second denominator. The first fixed-point number is multiplied by the first denominator to obtain a first result; the first result is divided by the first numerator to obtain a second result consisting of a first quotient and a first remainder, and the first quotient is used as the result of dividing the first target element by the first target coefficient. The second fixed-point number is multiplied by the second denominator to obtain a third result; the first remainder is added to the third result to obtain a fourth result; and the fourth result is divided by the second numerator to obtain a fifth result consisting of a second quotient and a second remainder, where the second quotient is used as the result of dividing the second target element by the second target coefficient.
[0374] For example, suppose the fixed-point numbers of the first data x are [29, 33, 165], the remainder output by the adjacent previous volume-preserving flow layer (or by the encoding side) is 1, the preset coefficients s are [0.65, 0.61, 2.52], and their fractional representations are [2/3, 3/5, 5/2]. The first fixed-point number is 165, the second fixed-point number is 33, the first target coefficient is 2.52, the second target coefficient is 0.61, the first fraction is 5/2, and the second fraction is 3/5; thus the first numerator is 5, the first denominator is 2, the second numerator is 3, and the second denominator is 5. Multiplying the first fixed-point number (165) by the first denominator (2) and adding the remainder 1 gives the first result (331). Dividing the first result (331) by the first numerator (5) gives the second result, consisting of the first quotient (66) and the first remainder (1); the first quotient (66) is used as the result of dividing the first target element by the first target coefficient. Multiplying the second fixed-point number (33) by the second denominator (5) gives the third result (165), and adding the first remainder (1) gives the fourth result (166). Dividing the fourth result (166) by the second numerator (3) gives the fifth result, consisting of the second quotient (55) and the second remainder (1); the second quotient (55) is used as the result of dividing the second target element by the second target coefficient.
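The integer procedure of [0372] to [0374] can be sketched in Python as below. This is an illustrative sketch, not the embodiment's implementation: `check_chain`, `divide_chain` and `multiply_chain` are hypothetical names, each coefficient is assumed to be supplied already as an integer fraction `(numerator, denominator)`, and each fraction's numerator is assumed to equal the previous element's denominator (`fracs[i + 1][0] == fracs[i][1]`). Exact inversion additionally assumes the incoming remainder is smaller than the denominator of the element processed first during division.

```python
def check_chain(fracs):
    """Check the fraction chain: each numerator equals the previous element's denominator."""
    for i in range(len(fracs) - 1):
        assert fracs[i + 1][0] == fracs[i][1], "numerator must equal previous denominator"
    # Closing the chain makes the product of all fractions exactly 1.
    assert fracs[0][0] == fracs[-1][1], "product of the fractions must be 1"


def divide_chain(x, fracs, r):
    """Decoding side: divide integer vector x by the coefficients.

    Elements are processed from last to first: multiply by the denominator,
    add the incoming remainder, then divide with remainder by the numerator.
    The remainder is carried into the next element.
    """
    check_chain(fracs)
    y = list(x)
    for i in reversed(range(len(y))):
        num, den = fracs[i]
        y[i], r = divmod(y[i] * den + r, num)
    return y, r


def multiply_chain(x, fracs, r):
    """Encoding side: multiply integer vector x by the coefficients.

    Elements are processed from first to last: multiply by the numerator,
    add the incoming remainder, then divide with remainder by the denominator.
    Fed with the outputs of divide_chain, this restores its inputs exactly.
    """
    check_chain(fracs)
    y = list(x)
    for i in range(len(y)):
        num, den = fracs[i]
        y[i], r = divmod(y[i] * num + r, den)
    return y, r


# Worked example: s is approximated by [2/3, 3/5, 5/2], incoming remainder 1.
fracs = [(2, 3), (3, 5), (5, 2)]
y, r = divide_chain([29, 33, 165], fracs, 1)
print(y, r)                          # [44, 55, 66] 0
print(multiply_chain(y, fracs, r))   # ([29, 33, 165], 1)
```

The first two division steps reproduce the 331 → (66, 1) and 166 → (55, 1) computations of [0374]. The third element (29 → 44, remainder 0) follows the same rule but is not worked out in the text.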
[0375] The second target element is the last of the N elements to be divided by its corresponding coefficient during the division of the first data by the preset coefficients, and the target volume-preserving flow layer is further used to output the second remainder. Specifically, the target volume-preserving flow layer can output the second remainder to the next volume-preserving flow layer adjacent to it. That is, each element of the first data yields a remainder by the above method, and that remainder is fed into the computation for the next element until the division for the last element of the first data is completed; the remainder at that point can then be input into the next adjacent volume-preserving flow layer.
[0376] In one possible implementation, the volumetric flow model further includes a first volumetric flow layer, which is the volumetric flow layer adjacent to the target volumetric flow layer. The remainder result output by the first volumetric flow layer is obtained. The first fixed-point number is multiplied by the first denominator, and the multiplication result is added to the remainder result output by the first volumetric flow layer to obtain the first result.
[0377] In one possible implementation, the input to the target volume-preserving flow layer is the latent variable output. In this case, the remainder output by the encoding side can be obtained, and after the first fixed-point number is multiplied by the first denominator, the remainder output by the encoding side is added to the product to obtain the first result.
[0378] In one possible implementation, the target preservation volumetric flow layer is further used to perform a subtraction operation between the first data and a constant term to obtain a subtraction result, wherein the constant term is not 0; then the subtraction result can be divided by the preset coefficient.
[0379] Specifically, the second data input to the target volume-preserving flow layer can be processed by a second neural network to obtain the constant term.
[0380] The first and second neural networks may be complex convolutional neural networks such as ResNet or DenseNet. For a detailed description of convolutional neural networks, refer to the embodiments corresponding to Figure 2b and Figure 2c; details are not repeated here.
[0381] In this embodiment of the application, the output of the target volume-preserving flow layer may include the second data; that is, the second data serves both as the basis for calculating the constant term and the preset coefficients and as part of the output of the target volume-preserving flow layer.
[0382] Next, the inverse operation corresponding to the target volume-preserving flow layer in this embodiment of the application is described, that is, how to perform the inverse operation on z to recover x.
[0383] During the inverse operation, z is split into two parts, $z = [z_a, z_b]$, and the inverse operation of this layer, $x = f^{-1}(z)$, is: $x_a = z_a$, $x_b = (z_b - t) \oslash s$, $x = [x_a, x_b]$;
[0384] where $\oslash$ denotes element-wise division of vectors. The inverse process therefore requires computing $x_i = (z_i - t_i)/s_i$. Given a remainder $r \in [0, 2^C)$, the corresponding numerical computation is as follows. First, compute $y_i = 2^k \cdot z_i - \mathrm{round}(2^k \cdot t_i)$ for $i = 1, 2, \ldots, d_b$. Then, for $i = d_b, \ldots, 2, 1$: $v = y_i \cdot m_i + r$, $y_i \leftarrow \lfloor v / m_{i-1} \rfloor$, $r \leftarrow v \bmod m_{i-1}$, where each coefficient is represented as the fraction $s_i = m_{i-1}/m_i$. At this point, $y_i \approx 2^k \cdot (z_i - t_i)/s_i$, the fixed-point representation of $x_i$. Output $y$ and $r$.
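The fixed-point inverse listed in [0384], together with the forward computation it undoes, can be sketched in Python as follows. Several details are assumptions not stated in the text: `z_fixed` already holds the integers 2^k·z_i, the coefficients are supplied as the integer list `m = [m_0, ..., m_{d_b}]` with `m[0] == m[-1]` (so that the fractions s_i = m_{i-1}/m_i multiply to 1), and the incoming remainder satisfies 0 ≤ r < m_{d_b}. The names `inverse_layer` and `forward_layer` are hypothetical.

```python
def inverse_layer(z_fixed, t, m, k, r):
    """Compute x_b = (z_b - t) / s in fixed point, carrying a remainder r.

    z_fixed: integers 2**k * z_i; t: real constant terms t_i;
    m: integers m_0..m_d with m[0] == m[d], so that s_i = m[i-1] / m[i].
    Returns y (y_i is approximately 2**k * x_i) and the outgoing remainder.
    """
    d = len(z_fixed)
    # Subtract the constant term in fixed point: y_i = 2^k z_i - round(2^k t_i).
    y = [z_fixed[i] - round(2**k * t[i]) for i in range(d)]
    for i in range(d, 0, -1):               # i = d, ..., 2, 1 (1-based as in the text)
        v = y[i - 1] * m[i] + r             # multiply by the denominator m_i, add remainder
        y[i - 1], r = divmod(v, m[i - 1])   # divide with remainder by the numerator m_{i-1}
    return y, r


def forward_layer(y, t, m, k, r):
    """Exact inverse of inverse_layer: z = x * s + t in fixed point."""
    d = len(y)
    y = list(y)
    for i in range(1, d + 1):               # i = 1, 2, ..., d
        v = y[i - 1] * m[i - 1] + r
        y[i - 1], r = divmod(v, m[i])
    return [y[i] + round(2**k * t[i]) for i in range(d)], r


m = [5, 3, 7, 5]                  # s = [5/3, 3/7, 7/5], product 1
t = [0.25, -1.5, 3.1]
z_fixed = [256, 512, -192]        # 2**8 * [1.0, 2.0, -0.75]
y, r = inverse_layer(z_fixed, t, m, 8, 2)
print(y, r)                           # [115, 2090, -704] 3
print(forward_layer(y, t, m, 8, r))   # ([256, 512, -192], 2)
```

Because each step is a division with remainder whose remainder is smaller than the next step's multiplier, running `forward_layer` on the outputs restores `z_fixed` and the original remainder bit for bit, which is the reversibility property stated in [0385].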
[0385] In the above calculation process, it is obvious that... Because division with remainder is unique, x and r are guaranteed to be completely reversible; that is, applying the forward and inverse operations of the target volume-preserving flow layer recovers the original x and the remainder r exactly.
[0386] In one implementation, referring to the architecture shown in Figure 10, the volume-preserving flow model further includes a target convolutional layer connected to the target volume-preserving flow layer. The output of the target convolutional layer is the first data, and the target convolutional layer is used to perform a division operation between its input data and a weight matrix, that is, to multiply the input data by the inverse of the weight matrix. In one implementation, the target convolutional layer is a 1×1 convolutional layer.
[0387] Specifically, a weight matrix can be obtained and decomposed into a first matrix, a second matrix, a third matrix, and a fourth matrix, where the first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix.
[0388] First, the input data is multiplied by the inverse of the first matrix to obtain a sixth result; the sixth result is multiplied by the inverse of the second matrix to obtain a seventh result; the seventh result is multiplied by the inverse of the third matrix to obtain an eighth result; and the eighth result is multiplied by the inverse of the fourth matrix to obtain a ninth result, which is used as the result of the division operation between the input data and the weight matrix. For how to perform the inverse operation of the target convolutional layer, refer to the description of the inverse operation of the target convolutional layer in the embodiment corresponding to Figure 3; details are not repeated here.
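The LU-factored 1×1 convolution of [0386] to [0388] can be illustrated with a small floating-point sketch in Python, where the weight matrix is W = P·L·D·U. Several details are assumptions not found in the text: the lower and upper triangular factors are taken to have unit diagonals (so all scaling sits in the diagonal factor D, as in Glow-style LU parameterizations), the permutation is stored as an index list, and the function names are hypothetical. An exactly invertible integer version would additionally need rounding with carried remainders, which this sketch omits.

```python
def matvec(A, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def conv1x1_forward(x, perm, L, d, U):
    """Encoding side: y = P L D U x (multiply by U, then D, then L, then P)."""
    h = matvec(U, x)
    h = [di * hi for di, hi in zip(d, h)]
    h = matvec(L, h)
    return [h[p] for p in perm]            # P as an index list: y[i] = h[perm[i]]


def conv1x1_inverse(y, perm, L, d, U):
    """Decoding side: x = U^-1 D^-1 L^-1 P^-1 y, i.e. division by W = P L D U."""
    n = len(y)
    h = [0.0] * n
    for i, p in enumerate(perm):           # P^-1: undo the permutation
        h[p] = y[i]
    for i in range(n):                     # L^-1: forward substitution (unit diagonal)
        h[i] -= sum(L[i][j] * h[j] for j in range(i))
    h = [hi / di for hi, di in zip(h, d)]  # D^-1: element-wise division
    for i in reversed(range(n)):           # U^-1: back substitution (unit diagonal)
        h[i] -= sum(U[i][j] * h[j] for j in range(i + 1, n))
    return h


perm = [2, 0, 1]
L = [[1, 0, 0], [0.5, 1, 0], [-0.3, 0.2, 1]]
d = [2.0, 0.5, 1.0]                        # product of the diagonal entries is 1
U = [[1, 0.4, -0.1], [0, 1, 0.7], [0, 0, 1]]
y = conv1x1_forward([1.0, -2.0, 3.5], perm, L, d, U)
print(conv1x1_inverse(y, perm, L, d, U))   # recovers x up to floating-point rounding
```

With unit-diagonal triangular factors and a diagonal factor whose entries multiply to 1, |det W| = 1, so the convolution itself preserves volume.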
[0389] In one implementation, preprocessing operations such as dequantization can be performed on the data to be encoded during encoding. To recover the original data to be encoded, the inverse of that preprocessing can be performed after the decoding result is obtained. For example, the fixed-point number of the decoded output x can be calculated, where...; u is then encoded using $U(0, 2^{-h})$ with precision δ, and $x = 2^h \cdot (x + 0.5)$ is output, where 0.5 is a preset parameter.
[0390] As shown in Table 1 below, compared with local bits-back coding (LBB), currently the best lossless compression method based on flow models, the embodiment of this application reduces the number of encoding operations by a factor of more than 180 and improves compression efficiency by a factor of 7.
[0391] Table 1
[0392]
[0393] Taking image data as an example, as shown in Table 2, the embodiments of this application achieve good compression ratios on various image datasets and can be effectively applied to lossless image compression tasks, achieving the best lossless compression ratios on datasets such as CIFAR10, ImageNet32, and ImageNet. A key reason for these results is the strong data-distribution fitting capability of the volume-preserving flow model used.
[0394] Table 2
[0395]
[0396] Furthermore, as shown in Table 3 below, the volume-preserving flow compression model in this embodiment generalizes well: a single model can compress images of various types and sizes. The model was trained on the ImageNet64 dataset (input size 64×64), and lossless compression tests were performed on natural images, which were cut into 64×64 blocks (images smaller than 64×64 were padded to 64×64 with color blocks of suitable pixel values). The resulting compression ratio exceeds 3, far surpassing existing lossless compression methods.
[0397] Table 3
[0398]
[0399] Figure 11 is a schematic diagram of a system architecture according to an embodiment of this application. In Figure 11, the execution device 110 is configured with an input/output (I/O) interface 112 for exchanging data with external devices. Users can input data (e.g., data to be encoded or encoded data) into the I/O interface 112 through the client device 140.
[0400] When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculations and other related processing (such as implementing the neural network functions in this application), the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained from that processing in the data storage system 150.
[0401] Finally, the I/O interface 112 returns the processing result (e.g., encoded or decoded data) to the client device 140, thereby providing it to the user.
[0402] Optionally, the client device 140 may be, for example, a control unit in an autonomous driving system or a functional algorithm module in a mobile terminal, such as a module used to perform related tasks.
[0403] It is worth noting that the training device 120 can generate corresponding target models / rules (such as the target neural network model in this embodiment) based on different training data for different objectives or tasks. The corresponding target models / rules can be used to achieve the above objectives or complete the above tasks, thereby providing the user with the required results.
[0404] In the scenario shown in Figure 11, the user can manually provide input data through the interface provided by the I/O interface 112. Alternatively, the client device 140 can automatically send input data to the I/O interface 112; if this requires the user's authorization, the user can set the corresponding permission on the client device 140. The user can view the output results of the execution device 110 on the client device 140, presented as a display, sound, action, or the like. The client device 140 can also act as a data collection terminal, collecting the input data fed into the I/O interface 112 and the output results it returns as new sample data and storing them in the database 130. Alternatively, the input data and output results of the I/O interface 112 can be collected directly from the I/O interface 112, without going through the client device 140, and stored in the database 130 as new sample data.
[0405] It should be noted that Figure 11 is merely a schematic diagram of a system architecture according to an embodiment of this application, and the positional relationships among the devices, components, modules, and so on shown in the diagram do not constitute any limitation. For example, in Figure 11 the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed within the execution device 110.
[0406] Based on the embodiments corresponding to Figures 3 to 11, and to better implement the above solutions of this application, related devices for implementing these solutions are further provided below. Referring to Figure 12, Figure 12 is a schematic diagram of a data encoding device 1200 according to an embodiment of this application. The data encoding device 1200 may be a terminal device or a server, and includes:
[0407] Acquisition module 1201 is used to acquire data to be encoded;
[0408] The volumetric flow preservation module 1202 is used to process the data to be encoded through a volumetric flow preservation model to obtain latent variable output; wherein, the volumetric flow preservation model includes a target volumetric flow preservation layer, the operation corresponding to the target volumetric flow preservation layer is an invertible operation that satisfies the volumetric flow preservation constraint, and the target volumetric flow preservation layer is used to perform a multiplication operation on the first data input to the target volumetric flow preservation layer and a preset coefficient, wherein the preset coefficient is not 1;
[0409] The encoding module 1203 is used to encode the latent variable output to obtain encoded data.
[0410] In one possible implementation, the volume-preserving flow constraint includes: the input space and the output space of the operation corresponding to the volume-preserving flow layer have the same volume.
[0411] In one possible implementation, the first data and the preset coefficients are vectors, the first data includes N elements, the preset coefficients include N coefficients, the N elements of the first data correspond one-to-one with the N coefficients, and the product of the N coefficients is 1; the multiplication operation between the first data and the preset coefficients includes:
[0412] Perform a multiplication operation on each element in the first data with its corresponding coefficient to obtain the product result.
[0413] In one possible implementation, the volumetric flow preservation module is used to process the second data input to the target volumetric flow preservation layer through a first neural network to obtain a first network output, and to perform a preset operation on the first network output to obtain the preset coefficient.
[0414] In one possible implementation, the first network output is a vector comprising N elements. The acquisition module is configured to acquire the average of the N elements in the first network output and subtract the average from each element in the first network output to obtain the processed N elements.
[0415] Each of the N processed elements is subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which include N coefficients.
[0416] In one possible implementation, the output of the target volumetric flow layer includes the second data.
[0417] In one possible implementation, the target preservation volumetric flow layer is further used to add the product of the first data and a preset coefficient to a constant term, wherein the constant term is not 0.
[0418] In one possible implementation, the volumetric flow preservation module is used to process the second data input to the target volumetric flow preservation layer through a second neural network to obtain the constant term.
[0419] In one possible implementation, the first data and the preset coefficients are vectors. The first data includes N elements, and the preset coefficients include N coefficients. The N elements of the first data correspond one-to-one with the N coefficients. The N elements include a first target element and a second target element. The first target element corresponds to the first target coefficient, and the second target element corresponds to the second target coefficient. The volumetric flow preservation module is used to obtain the first fixed-point number corresponding to the first target element and the second fixed-point number corresponding to the second target element.
[0420] Obtain the first fraction corresponding to the first target coefficient and the second fraction corresponding to the second target coefficient. The first fraction includes a first numerator and a first denominator, and the second fraction includes a second numerator and a second denominator. The first numerator, the first denominator, the second numerator, and the second denominator are integers, and the first denominator is the same as the second numerator.
[0421] Multiply the first fixed-point number with the first numerator to obtain the first result;
[0422] The first result is divided by the first denominator to obtain a second result, which includes a first quotient and a first remainder. The first quotient is used as the result of multiplying the first target element and the first target coefficient.
[0423] Multiply the second fixed-point number with the second numerator to obtain the third result;
[0424] The third result is added to the first remainder result to obtain the fourth result;
[0425] The fourth result is divided by the second denominator to obtain the fifth result, which includes the second quotient and the second remainder. The second quotient is used as the result of multiplying the second target element and the second target coefficient.
[0426] In one possible implementation, the second target element is the last of the N elements that is multiplied with the corresponding coefficient during the multiplication operation of the first data and the preset coefficient. The target preservation volumetric flow layer is also used to output the second remainder result.
[0427] In one possible implementation, the target volumetric flow layer is further configured to output the second remainder result to the next volumetric flow layer adjacent to the target volumetric flow layer.
[0428] In one possible implementation, the volumetric flow model further includes a first volumetric flow layer, which is the volumetric flow layer adjacent to the target volumetric flow layer, and the volumetric flow module is used to obtain the remainder result output by the first volumetric flow layer.
[0429] The first fixed-point number is multiplied by the first numerator, and the result of the multiplication is added to the remainder of the first volumetric flow layer output to obtain the first result.
[0430] In one possible implementation, the volume-preserving flow model includes M serially connected volume-preserving flow layers, which include the target volume-preserving flow layer. The output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is a positive integer not greater than M; the input of the first volume-preserving flow layer is the data to be encoded, and the output of the M-th volume-preserving flow layer is the latent variable output.
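The serial arrangement of [0427] to [0430], in which each layer hands its final remainder to the next, can be sketched in Python by stacking layers that keep only their integer scaling step. This is illustrative only: the function names are hypothetical, every layer's coefficient list `m` (with s_i = m[i-1]/m[i]) is assumed to share the same boundary value `m[0] == m[-1]` so that any layer's outgoing remainder is a valid incoming remainder for the next layer, and the coupling split, constant terms, and convolutional layers are omitted.

```python
def encode_layer(x, m, r):
    """Multiply x_i by s_i = m[i-1] / m[i] exactly, carrying the remainder."""
    y = list(x)
    for i in range(1, len(m)):
        y[i - 1], r = divmod(y[i - 1] * m[i - 1] + r, m[i])
    return y, r


def decode_layer(y, m, r):
    """Divide y_i by s_i in reverse order; exact inverse of encode_layer."""
    x = list(y)
    for i in range(len(m) - 1, 0, -1):
        x[i - 1], r = divmod(x[i - 1] * m[i] + r, m[i - 1])
    return x, r


def encode_stack(x, layers, r=0):
    """Layers 1..M in order; each layer's final remainder feeds the next layer."""
    for m in layers:
        x, r = encode_layer(x, m, r)
    return x, r


def decode_stack(z, layers, r):
    """Layers M..1 in reverse order, starting from the remainder encode_stack returned."""
    for m in reversed(layers):
        z, r = decode_layer(z, m, r)
    return z, r


# Three hypothetical layers sharing the boundary value 16.
layers = [[16, 5, 23, 16], [16, 9, 4, 16], [16, 31, 7, 16]]
z, r = encode_stack([100, -37, 58], layers, 3)
print(decode_stack(z, layers, r))   # ([100, -37, 58], 3)
```

Sharing the boundary value is one simple way to keep the remainder in a fixed range between layers; the embodiment's remainder range $[0, 2^C)$ suggests a similar role for the boundary value, but the exact choice is not specified here.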
[0431] In one possible implementation, the volumetric flow model further includes a target convolutional layer connected to the target volumetric flow layer, wherein the output of the target volumetric flow layer is used as the input of the target convolutional layer, and the target convolutional layer is used to perform a multiplication operation between the output of the target volumetric flow layer and the weight matrix.
[0432] In one possible implementation, the volumetric module is used to obtain the weight matrix;
[0433] The weight matrix is LU-decomposed to obtain a first matrix, a second matrix, a third matrix, and a fourth matrix, where the first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix.
[0434] The output of the target preserving volumetric layer is multiplied with the fourth matrix to obtain the sixth result;
[0435] The sixth result is multiplied by the third matrix to obtain the seventh result;
[0436] The seventh result is multiplied by the second matrix to obtain the eighth result;
[0437] The eighth result is multiplied with the first matrix to obtain a ninth result, which is used as the result of multiplying the output of the target preserving volumetric layer with the weight matrix.
[0438] In one possible implementation, the volumetric-preserving model includes M sequentially connected volumetric-preserving layers and M convolutional layers. The M volumetric-preserving layers include the target volumetric-preserving layer, and the M convolutional layers include the target convolutional layer. The output of the i-th volumetric-preserving layer is used as the input of the i-th convolutional layer, and the output of the i-th convolutional layer is used as the input of the (i+1)-th volumetric-preserving layer, where i is a positive integer not greater than M. The input of the first volumetric-preserving layer is the data to be encoded, and the output of the M-th convolutional layer is the latent variable output.
[0439] Referring to Figure 13, Figure 13 is a schematic diagram of a data decoding device 1300 according to an embodiment of this application. The data decoding device 1300 may be a terminal device or a server, and includes:
[0440] Acquisition module 1301 is used to acquire encoded data;
[0441] Decoding module 1302 is used to decode the encoded data to obtain the latent variable output;
[0442] The volumetric flow preservation module 1303 is used to process the latent variable output through the volumetric flow preservation model to obtain the decoded output; wherein, the volumetric flow preservation model includes a target volumetric flow preservation layer, the operation corresponding to the target volumetric flow preservation layer is a reversible operation that satisfies the volumetric flow preservation constraint, and the target volumetric flow preservation layer is used to perform a multiplication operation on the first data input to the target volumetric flow preservation layer and a preset coefficient, wherein the preset coefficient is not 1.
[0443] In one possible implementation, the volume-preserving flow constraint includes: the input space and the output space of the operation corresponding to the volume-preserving flow layer have the same volume.
[0444] In one possible implementation, the first data and the preset coefficients are vectors, the first data includes N elements, the preset coefficients include N coefficients, the N elements of the first data correspond one-to-one with the N coefficients, and the product of the N coefficients is 1; the division operation between the first data and the preset coefficients includes:
[0445] Perform a division operation on each element in the first data with its corresponding coefficient to obtain the division result.
[0446] In one possible implementation, the volumetric flow preservation module is used to process the second data input to the target volumetric flow preservation layer through a first neural network to obtain a first network output, and to perform a preset operation on the first network output to obtain the preset coefficient.
[0447] In one possible implementation, the first network output is a vector comprising N elements. The acquisition module is configured to acquire the average of the N elements in the first network output and subtract the average from each element in the first network output to obtain the processed N elements.
[0448] Each of the N processed elements is subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which include N coefficients.
[0449] In one possible implementation, the output of the target volumetric flow layer includes the second data.
[0450] In one possible implementation, the target volume-preserving flow layer is further used to perform a subtraction operation between the first data and a constant term to obtain a subtraction result, wherein the constant term is not 0;
[0451] The acquisition module is used to perform a division operation between the subtraction result and the preset coefficient.
[0452] In one possible implementation, the volumetric flow preservation module is used to process the second data input to the target volumetric flow preservation layer through a second neural network to obtain the constant term.
[0453] In one possible implementation, the first data and the preset coefficients are vectors. The first data includes N elements, and the preset coefficients include N coefficients. The N elements of the first data correspond one-to-one with the N coefficients. The N elements include a first target element and a second target element. The first target element corresponds to the first target coefficient, and the second target element corresponds to the second target coefficient. The volumetric flow preservation module is used to obtain the first fixed-point number corresponding to the first target element and the second fixed-point number corresponding to the second target element.
[0454] Obtain the first fraction corresponding to the first target coefficient and the second fraction corresponding to the second target coefficient. The first fraction includes a first numerator and a first denominator, and the second fraction includes a second numerator and a second denominator. The first numerator, the first denominator, the second numerator, and the second denominator are integers, and the first numerator is the same as the second denominator.
[0455] Multiply the first fixed-point number with the first denominator to obtain the first result;
[0456] The first result is divided by the first numerator to obtain a second result, which includes a first quotient and a first remainder. The first quotient is used as the result of the division between the first target element and the first target coefficient.
[0457] Multiply the second fixed-point number with the second denominator to obtain the third result;
[0458] The third result is added to the first remainder result to obtain the fourth result;
[0459] The fourth result is divided by the second numerator to obtain a fifth result, which includes a second quotient and a second remainder. The second quotient is used as the result of the division between the second target element and the second target coefficient.
[0460] In one possible implementation, the second target element is the last element among the N elements that is divided with the corresponding coefficient during the division operation between the first data and the preset coefficient. The target preservation volume flow layer is also used to output the second remainder result.
[0461] In one possible implementation, the volumetric flow model further includes a first volumetric flow layer, which is the volumetric flow layer adjacent to the target volumetric flow layer. The acquisition module is used to acquire the remainder result output by the first volumetric flow layer; multiply the first fixed-point number by the first denominator, and add the multiplication result to the remainder result output by the first volumetric flow layer to obtain the first result.
[0462] In one possible implementation, the volume-preserving flow model includes M serially connected volume-preserving flow layers, which include the target volume-preserving flow layer. The output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is a positive integer not greater than M; the input of the first volume-preserving flow layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0463] In one possible implementation, the volumetric flow model further includes a target convolutional layer connected to the target volumetric flow layer, wherein the output of the target convolutional layer is the first data, and the target convolutional layer is used to perform a division operation on the input data and the weight matrix.
[0464] In one possible implementation, the volumetric module is used to obtain the weight matrix;
[0465] The weight matrix is LU-decomposed to obtain a first matrix, a second matrix, a third matrix, and a fourth matrix, where the first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix.
[0466] The input data is multiplied by the inverse of the first matrix to obtain the sixth result;
[0467] The sixth result is multiplied by the inverse of the second matrix to obtain the seventh result;
[0468] The seventh result is multiplied by the inverse of the third matrix to obtain the eighth result;
[0469] The eighth result is multiplied by the inverse of the fourth matrix to obtain the ninth result, which is used as the result of the division operation between the input data and the weight matrix.
[0470] In one possible implementation, the volume-preserving flow model includes M sequentially connected volume-preserving flow layers and M convolutional layers. The M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include the target convolutional layer. The output of the i-th convolutional layer is used as the input of the i-th volume-preserving flow layer, and the output of the i-th volume-preserving flow layer is used as the input of the (i+1)-th convolutional layer, where i is a positive integer not greater than M. The input of the first convolutional layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0471] The following describes an execution device provided in an embodiment of this application. Refer to Figure 14, which is a schematic diagram of an execution device provided in an embodiment of this application. The execution device 1400 may specifically be a virtual reality (VR) device, a mobile phone, a tablet, a laptop computer, a smart wearable device, a monitoring data processing device, a server, or the like, and is not limited thereto. The execution device 1400 may be equipped with the data encoding apparatus described in the embodiment corresponding to Figure 3 or the data decoding apparatus described in the embodiment corresponding to Figure 7. Specifically, the execution device 1400 may include a receiver 1401, a transmitter 1402, a processor 1403 and a memory 1404 (there may be one or more processors 1403 in the execution device 1400; one processor is taken as an example in Figure 15), where the processor 1403 may include an application processor 14031 and a communication processor 14032. In some embodiments of this application, the receiver 1401, the transmitter 1402, the processor 1403 and the memory 1404 may be connected via a bus or in another manner.
[0472] Memory 1404 may include read-only memory and random access memory, and provides instructions and data to processor 1403. A portion of memory 1404 may also include non-volatile random access memory (NVRAM). Memory 1404 stores processor-executable operation instructions, executable modules or data structures, or a subset or an extended set thereof, where the operation instructions may include various operation instructions for implementing various operations.
[0473] Processor 1403 controls the operation of the execution device. In specific applications, the components of the execution device are coupled together through a bus system, which may include a power bus, a control bus and a status signal bus in addition to a data bus. However, for clarity of description, the various buses are all referred to as the bus system in the figure.
[0474] The methods disclosed in the embodiments of this application can be applied to or implemented by the processor 1403. The processor 1403 can be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of the processor 1403 or by instructions in software form. The processor 1403 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The processor 1403 can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module can reside in a mature storage medium in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory 1404. Processor 1403 reads the information in memory 1404 and, in conjunction with its hardware, completes the steps of the above method.
[0475] Receiver 1401 can be used to receive input digital or character information, and to generate signal inputs related to the settings and function control of the execution device. Transmitter 1402 can be used to output digital or character information through the first interface; transmitter 1402 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; transmitter 1402 may also include a display device such as a display screen.
[0476] Specifically, the application processor 14031 is used to acquire the data to be encoded;
[0477] The data to be encoded can be image, video, or text data.
[0478] Taking image data as an example, the image can be an image captured by the terminal device through a camera, or it can be an image obtained from within the terminal device (e.g., an image stored in the terminal device's photo album, or an image obtained by the terminal device from the cloud). It should be understood that the image can be an image that requires image compression, and this application does not limit the source of the image to be processed.
[0479] The data to be encoded is processed by a volume-preserving flow model to obtain a latent variable output. The volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is an invertible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to multiply the first data input to it by a preset coefficient, where the preset coefficient is not 1;
[0480] The target volume-preserving flow layer may also be called the target volume-preserving coupling layer.
[0481] The volume-preserving flow constraint means that the input space and the output space of the operation corresponding to the volume-preserving operation layer have the same volume. Having the same volume means that data in the input space and data in the output space are in one-to-one correspondence: different output data correspond to different input data. To ensure that the operation corresponding to the target volume-preserving flow layer satisfies the volume-preserving flow constraint, the product of the coefficients of the first-order terms in the operation must be 1. Specifically, the first data and the preset coefficients are vectors. The first data includes N elements, the preset coefficients include N coefficients, and the N elements of the first data correspond one-to-one with the N coefficients. The N coefficients are the coefficients of the first-order terms in the operation corresponding to the target volume-preserving flow layer, and their product is 1.
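As a worked illustration in the continuous case only, consider the elementwise scaling z_i = s_i x_i applied to the N elements of the first data. Its Jacobian is diagonal, so the factor by which it scales volume is the product of the coefficients, and requiring that product to be 1 is exactly the volume-preserving condition. A constant term that does not depend on the first data, as in [0495], does not change the Jacobian.

```latex
z_i = s_i x_i \quad (i = 1, \dots, N), \qquad
\frac{\partial z}{\partial x} = \operatorname{diag}(s_1, \dots, s_N), \qquad
\left| \det \frac{\partial z}{\partial x} \right| = \prod_{i=1}^{N} \lvert s_i \rvert = 1
```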
[0482] The term "reversible operation" refers to an operation that can both obtain output data from input data and deduce input data from output data. For example, if the input data is x and the output data is z = f(x), x can be recovered from the output data z through the inverse operation.
[0483] The latent variable output is encoded to obtain the encoded data.
[0484] In this embodiment of the application, the latent variable output z can be described by a probability distribution p_Z(z), and the latent variable output z is encoded according to the probability distribution p_Z(z) to obtain the encoded data.
[0485] In one alternative implementation, the encoded data is a binary bitstream. The probability estimate of each point in the latent variable output can be obtained using an entropy estimation network. The latent variable output is then entropy encoded using this probability estimate to obtain the binary bitstream. It should be noted that the entropy encoding process mentioned in this application can use existing entropy encoding techniques, which will not be elaborated upon here.
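Neither the entropy estimation network nor the concrete entropy coder is specified here. As a rough sketch, assuming the network has already produced a probability table (the table below is hypothetical), an ideal entropy coder such as an arithmetic coder spends about -log2 p bits on a symbol of probability p, so the bitstream length can be estimated as follows. The function name is illustrative only.

```python
import math


def ideal_code_length_bits(symbols, probs):
    """Approximate bitstream length (in bits) that an ideal entropy coder would produce.

    probs maps each symbol value to its estimated probability; in the scheme
    described above these estimates would come from the entropy estimation network.
    """
    return sum(-math.log2(probs[s]) for s in symbols)


# Hypothetical probability table for the values of a tiny latent variable output.
probs = {0: 0.5, 1: 0.25, 2: 0.25}
print(ideal_code_length_bits([0, 0, 1, 2], probs))  # 6.0
```

The closer the estimated probabilities are to the true distribution of the latent variable output, the shorter the bitstream, which is why a model that estimates the data distribution more accurately yields a better compression ratio.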
[0486] This application uses a volume-preserving flow model to achieve lossless compression. Compared with an integer flow model, whose operations are limited to integer addition and subtraction, the target volume-preserving flow layer also performs multiplication while remaining reversible. This gives the volume-preserving flow model stronger representation capability, so it can estimate the data distribution more accurately and thereby achieve a better compression ratio.
[0487] On the other hand, for general flow models it can be proven that numerical invertibility cannot be achieved in discrete space: because of numerical errors, there are always cases in which one latent variable corresponds to multiple input data. In such cases, multiple encoding operations must be performed to eliminate the numerical errors, which makes the algorithm inefficient. In contrast, the volume-preserving flow model in this embodiment uses a numerically invertible target volume-preserving flow layer, so its operations are numerically invertible. While the model retains strong representation capability, the compression process needs very few encoding operations, which yields higher compression throughput and a higher compression ratio.
[0488] In one possible implementation, the first data and the preset coefficients are vectors, the first data includes N elements, the preset coefficients include N coefficients, the N elements of the first data correspond one-to-one with the N coefficients, and the product of the N coefficients is 1; the multiplication operation between the first data and the preset coefficients includes:
[0489] Perform a multiplication operation on each element in the first data with its corresponding coefficient to obtain the product result.
[0490] In one possible implementation, specifically, the application processor 14031 is configured to process the second data input to the target volume-preserving flow layer through a first neural network to obtain a first network output, and to perform a preset operation on the first network output to obtain the preset coefficients. In one implementation, the preset operation is an exponential operation with the natural constant e as the base.
[0491] In one possible implementation, the first network output is a vector comprising N elements. Specifically, the application processor 14031 is used to obtain the average of the N elements in the first network output and subtract the average from each element in the first network output to obtain the processed N elements.
[0492] Each of the N processed elements is subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which include N coefficients.
[0493] The average is subtracted to ensure that the product of the N coefficients in the preset coefficients is 1. Specifically, the first network output is a vector containing N elements. The average of these N elements is obtained and subtracted from each element to obtain the processed N elements, which therefore sum to 0. Each processed element is then subjected to an exponential operation with the natural constant e as the base, yielding the N coefficients of the preset coefficients; because the exponents sum to 0, the product of the N coefficients is e^0 = 1.
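The coefficient computation in [0491] to [0493] can be sketched as follows. The first neural network is not specified, so this minimal sketch starts from an arbitrary first network output h whose values are hypothetical; the function name is likewise illustrative.

```python
import math


def preset_coefficients(h):
    """Turn a first network output h (N values) into N positive coefficients with product 1."""
    mean = sum(h) / len(h)
    # After subtracting the mean, the exponents sum to 0, so the product of exp(...) is exp(0) = 1.
    return [math.exp(v - mean) for v in h]


h = [0.3, -1.2, 2.0]  # hypothetical first network output
s = preset_coefficients(h)
print(math.isclose(math.prod(s), 1.0))  # True
```

In floating point the product is 1 only up to rounding, which is one motivation for representing the coefficients as chained integer fractions in the fixed-point scheme described below.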
[0494] In one possible implementation, the output of the target volumetric flow layer includes the second data.
[0495] In one possible implementation, the target volume-preserving flow layer is further used to add a constant term to the product of the first data and the preset coefficient, where the constant term is not 0.
[0496] In one possible implementation, specifically, the application processor 14031 is configured to process the second data input to the target volume-preserving flow layer through a second neural network to obtain the constant term.
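Taken together, the scaling and the constant term form a coupling-style layer. The sketch below is a minimal illustration, assuming that the first data x1 is transformed while the second data x2 passes through unchanged (consistent with [0494], where the layer's output includes the second data); toy_first_net and toy_second_net are hypothetical stand-ins for the first and second neural networks. It works in floating point, so it shows reversibility in real arithmetic only; exact invertibility relies on the fixed-point scheme described next.

```python
import math


def toy_first_net(x2):
    """Hypothetical stand-in for the first neural network (produces the scale logits)."""
    return [0.1 * v for v in x2]


def toy_second_net(x2):
    """Hypothetical stand-in for the second neural network (produces the constant term)."""
    return [v - 1.0 for v in x2]


def coefficients(h):
    mean = sum(h) / len(h)
    return [math.exp(v - mean) for v in h]


def coupling_forward(x1, x2):
    """Encoder side: z1 = s * x1 + t, with s and t computed from the second data x2."""
    s = coefficients(toy_first_net(x2))
    t = toy_second_net(x2)
    z1 = [si * xi + ti for si, xi, ti in zip(s, x1, t)]
    return z1, x2  # the second data is passed through unchanged


def coupling_inverse(z1, x2):
    """Decoder side: x1 = (z1 - t) / s, recomputing s and t from the unchanged x2."""
    s = coefficients(toy_first_net(x2))
    t = toy_second_net(x2)
    x1 = [(zi - ti) / si for zi, ti, si in zip(z1, t, s)]
    return x1, x2


x1, x2 = [1.0, 2.0, 3.0], [0.5, -1.0, 4.0]
z1, _ = coupling_forward(x1, x2)
x1_rec, _ = coupling_inverse(z1, x2)
print(all(math.isclose(a, b) for a, b in zip(x1, x1_rec)))  # True
```

The layer can be inverted without inverting either network, because the decoder receives the unchanged second data and can recompute the same coefficients and constant term from it.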
[0497] In one possible implementation, the first data and the preset coefficients are vectors. The first data includes N elements, and the preset coefficients include N coefficients. The N elements of the first data correspond one-to-one with the N coefficients. The N elements include a first target element and a second target element. The first target element corresponds to the first target coefficient, and the second target element corresponds to the second target coefficient. Specifically, the application processor 14031 is used to obtain the first fixed-point number corresponding to the first target element and the second fixed-point number corresponding to the second target element.
[0498] Obtain the first fraction corresponding to the first target coefficient and the second fraction corresponding to the second target coefficient. The first fraction includes a first numerator and a first denominator, and the second fraction includes a second numerator and a second denominator. The first numerator, the first denominator, the second numerator and the second denominator are integers, and the first denominator is equal to the second numerator.
[0499] Multiply the first fixed-point number with the first numerator to obtain the first result;
[0500] The first result is divided by the first denominator to obtain a second result, which includes a first quotient and a first remainder. The first quotient is used as the result of multiplying the first target element and the first target coefficient.
[0501] Multiply the second fixed-point number with the second numerator to obtain the third result;
[0502] The third result is added to the first remainder result to obtain the fourth result;
[0503] The fourth result is divided by the second denominator to obtain the fifth result, which includes the second quotient and the second remainder. The second quotient is used as the result of multiplying the second target element and the second target coefficient.
[0504] In this embodiment, numerical reversibility is achieved by division with remainder. Specifically, the coefficient of each first-order term is converted into fractional form, with the numerator of each dimension equal to the denominator of the previous dimension. The data of each dimension is multiplied by the numerator of the current coefficient and the remainder from the previous dimension is added; division with remainder by the denominator then gives the result for the current dimension. The remainder of this division is passed on to the next dimension to eliminate numerical errors.
[0505] For example, the first data x may be [44/16, 55/16, 66/16], where 16 is the precision (scale) of the fixed-point numbers and does not take part in the multiplication, so the fixed-point numbers of the first data are x = [44, 55, 66]. The preset coefficient s is [0.65, 0.61, 2.52], and the corresponding fractions are [2/3, 3/5, 5/2]. Here, the first fixed-point number is 44, the second fixed-point number is 55, the first target coefficient is 0.65, the second target coefficient is 0.61, the first fraction is 2/3, the second fraction is 3/5, the first numerator is 2, the first denominator is 3, the second numerator is 3, and the second denominator is 5. Multiplying the first fixed-point number (44) by the first numerator (2) gives the first result (88). Dividing the first result (88) by the first denominator (3) gives the second result, which consists of the first quotient result (29) and the first remainder result (1); the first quotient result (29) is used as the result of multiplying the first target element by the first target coefficient. Multiplying the second fixed-point number (55) by the second numerator (3) gives the third result (165). Adding the first remainder result (1) to the third result (165) gives the fourth result (166). Dividing the fourth result (166) by the second denominator (5) gives the fifth result, which consists of the second quotient result (33) and the second remainder result (1); the second quotient result (33) is used as the result of multiplying the second target element by the second target coefficient.
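The division-with-remainder scheme can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name and the r_in parameter are assumptions, with r_in standing for the remainder received from the adjacent preceding layer ([0509] to [0511]) and defaulting to 0 for the first layer. The sketch requires denominators[i] == numerators[i + 1], as stated above, and assumes r_in is smaller than numerators[0].

```python
def multiply_with_remainder(x, numerators, denominators, r_in=0):
    """Fixed-point multiplication of x[i] by numerators[i] / denominators[i].

    Each remainder is carried into the next element, and the final remainder
    is returned so that it can be passed to the next layer.
    """
    for i in range(len(numerators) - 1):
        if denominators[i] != numerators[i + 1]:
            raise ValueError("each denominator must equal the next numerator")
    z = []
    r = r_in  # remainder from the adjacent previous layer (0 for the first layer)
    for xi, n, d in zip(x, numerators, denominators):
        q, r = divmod(xi * n + r, d)  # quotient is the output, remainder moves on
        z.append(q)
    return z, r


z, r = multiply_with_remainder([44, 55, 66], numerators=[2, 3, 5], denominators=[3, 5, 2])
print(z, r)  # [29, 33, 165] 1
```

The first two outputs match the worked example above. The third element, which the example does not walk through, gives 66 * 5 + 1 = 331 = 2 * 165 + 1, so the final remainder 1 is what the layer would pass on.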
[0506] In one possible implementation, the second target element is the last of the N elements to be multiplied by its corresponding coefficient during the multiplication of the first data by the preset coefficients, and the target volume-preserving flow layer is further used to output the second remainder result. Specifically, the target volume-preserving flow layer may output the second remainder result to the next volume-preserving flow layer adjacent to it. That is, each element of the first data produces a remainder result in the manner described above, which is fed into the calculation for the next element; once the product for the last element of the first data has been computed, the final remainder result is fed into the adjacent next volume-preserving flow layer.
[0507] In one possible implementation, the target volumetric flow layer is further configured to output the second remainder result to the next volumetric flow layer adjacent to the target volumetric flow layer.
[0508] In one possible implementation, the volume-preserving flow model further includes a first volume-preserving flow layer, which is the volume-preserving flow layer immediately preceding the target volume-preserving flow layer. Specifically, the application processor 14031 is configured to:
[0509] Obtain the remainder result output by the first volume-preserving flow layer;
[0510] Multiply the first fixed-point number by the first numerator, and add the remainder result output by the first volume-preserving flow layer to the product to obtain the first result.
[0511] In one implementation, if the target volume-preserving flow layer is the first volume-preserving flow layer in the volume-preserving flow model (that is, the layer that processes the data to be encoded), the first result is the result of multiplying the first fixed-point number by the first numerator. If the target volume-preserving flow layer is not the first layer in the model (that is, it processes the output of another intermediate layer rather than the data to be encoded), the first result is the sum of the result of multiplying the first fixed-point number by the first numerator and the remainder result output by the adjacent previous volume-preserving flow layer.
[0512] In one possible implementation, the volume-preserving flow model includes M serially connected volume-preserving flow layers, including the target volume-preserving flow layer. The output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is an integer greater than 1 and not greater than M. The input of the first volume-preserving flow layer is the data to be encoded, and the output of the M-th volume-preserving flow layer is the latent variable output. In other words, the volume-preserving flow model may be a stack of multiple volume-preserving flow layers.
[0513] In one possible implementation, the volumetric flow model further includes a target convolutional layer connected to the target volumetric flow layer, wherein the output of the target volumetric flow layer is used as the input of the target convolutional layer, and the target convolutional layer is used to perform a multiplication operation between the output of the target volumetric flow layer and the weight matrix.
[0514] In one possible implementation, specifically, application processor 14031 is used to obtain the weight matrix;
[0515] LU decomposition is performed on the weight matrix to obtain a first matrix, a second matrix, a third matrix and a fourth matrix. The first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix.
[0516] The output of the target volume-preserving flow layer is multiplied by the fourth matrix to obtain the sixth result;
[0517] The sixth result is multiplied by the third matrix to obtain the seventh result;
[0518] The seventh result is multiplied by the second matrix to obtain the eighth result;
[0519] The eighth result is multiplied by the first matrix to obtain the ninth result, which is used as the result of multiplying the output of the target volume-preserving flow layer by the weight matrix.
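The four-step multiplication can be made exactly invertible on integers, as sketched below. This is a minimal sketch under several assumptions that are not taken from this application: the lower and upper triangular matrices have unit diagonals (so all scaling sits in the diagonal matrix, whose entries are given as chained fractions as in [0505]), each triangular step rounds its off-diagonal contribution to the nearest integer, and the helper names and example matrices are hypothetical. The triangular steps use iterative computation, the diagonal step uses the remainder-carrying scheme, and the permutation step is an element rearrangement.

```python
from fractions import Fraction as F


def _round_sum(terms):
    """Round an exact rational sum to the nearest integer (ties to even)."""
    return round(sum(terms, F(0)))


def upper_forward(u, x):
    # Unit upper triangular: y_i = x_i + round(sum_{j>i} u_ij * x_j), using the original x_j.
    n = len(x)
    return [x[i] + _round_sum(u[i][j] * x[j] for j in range(i + 1, n)) for i in range(n)]


def upper_inverse(u, y):
    # Recover x from the last element upwards, so every x_j with j > i is already known.
    n = len(y)
    x = [0] * n
    for i in reversed(range(n)):
        x[i] = y[i] - _round_sum(u[i][j] * x[j] for j in range(i + 1, n))
    return x


def lower_forward(l, x):
    # Unit lower triangular: y_i = x_i + round(sum_{j<i} l_ij * x_j).
    return [x[i] + _round_sum(l[i][j] * x[j] for j in range(i)) for i in range(len(x))]


def lower_inverse(l, y):
    # Recover x from the first element downwards.
    x = []
    for i in range(len(y)):
        x.append(y[i] - _round_sum(l[i][j] * x[j] for j in range(i)))
    return x


def diag_forward(x, nums, dens, r=0):
    # Diagonal step: remainder-carrying multiplication by nums[i] / dens[i].
    z = []
    for xi, n, d in zip(x, nums, dens):
        q, r = divmod(xi * n + r, d)
        z.append(q)
    return z, r


def diag_inverse(z, nums, dens, r):
    x = [0] * len(z)
    for i in reversed(range(len(z))):
        x[i], r = divmod(z[i] * dens[i] + r, nums[i])
    return x, r


def conv_forward(perm, l, nums, dens, u, x):
    """Encoder: integer counterpart of y = P L D U x."""
    v6 = upper_forward(u, x)              # sixth result (fourth matrix, U)
    v7, r = diag_forward(v6, nums, dens)  # seventh result (third matrix, D)
    v8 = lower_forward(l, v7)             # eighth result (second matrix, L)
    v9 = [v8[p] for p in perm]            # ninth result (first matrix, P): rearrangement
    return v9, r


def conv_inverse(perm, l, nums, dens, u, y, r):
    """Decoder: undo P, then L, then D, then U."""
    v8 = [0] * len(y)
    for i, p in enumerate(perm):
        v8[p] = y[i]
    v7 = lower_inverse(l, v8)
    v6, _ = diag_inverse(v7, nums, dens, r)
    return upper_inverse(u, v6)


# Hypothetical example: D = diag(2/3, 3/5, 5/2), whose product is 1.
perm = [2, 0, 1]
L = [[1, 0, 0], [F(1, 2), 1, 0], [F(-1, 3), F(2, 5), 1]]
U = [[1, F(3, 2), -1], [0, 1, F(1, 3)], [0, 0, 1]]
nums, dens = [2, 3, 5], [3, 5, 2]

y, r = conv_forward(perm, L, nums, dens, U, [44, 55, 66])
print(conv_inverse(perm, L, nums, dens, U, y, r))  # [44, 55, 66]
```

Rounding never breaks invertibility here because each rounded term depends only on elements that the inverse has already recovered when it needs them.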
[0520] In this embodiment, when the target convolutional layer performs the multiplication with the weight matrix, the operation is converted into successive multiplications by an upper triangular matrix, a diagonal matrix, a lower triangular matrix and a permutation matrix. These four matrix multiplications are computed, respectively, by iterative computation, by the numerical computation used for the volume-preserving coupling layer, by iterative computation, and by element rearrangement. A reversible computation method is provided for each of the four, so that the computation of the target convolutional layer is numerically reversible.
[0521] In one possible implementation, the volume-preserving flow model includes M serially connected volume-preserving flow layers and M convolutional layers. The M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include the target convolutional layer. For each positive integer i not greater than M, the output of the i-th volume-preserving flow layer is used as the input of the i-th convolutional layer, and, for i less than M, the output of the i-th convolutional layer is used as the input of the (i+1)-th volume-preserving flow layer. The input of the first volume-preserving flow layer is the data to be encoded, and the output of the M-th convolutional layer is the latent variable output. The volume-preserving flow model may thus be a stack of alternating volume-preserving flow layers and convolutional layers.
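The layer ordering in [0521] (encoder) and [0470] (decoder) can be sketched as follows. The decoder's first convolutional layer undoes the encoder's last one, so the decoder walks the encoder's (volume-preserving flow layer, convolutional layer) pairs in reverse and applies each inverse. The Layer type and the two toy layers are hypothetical stand-ins: an additive shift plays the role of a volume-preserving flow layer and an element reversal plays the role of a permutation-like convolution.

```python
from dataclasses import dataclass
from typing import Callable, List

Vec = List[int]


@dataclass
class Layer:
    """Hypothetical invertible layer: a forward map plus its exact inverse."""
    forward: Callable[[Vec], Vec]
    inverse: Callable[[Vec], Vec]


def toy_vp(k: int) -> Layer:
    # Stand-in for a volume-preserving flow layer: an additive shift (Jacobian determinant 1).
    return Layer(lambda v: [a + k * (i + 1) for i, a in enumerate(v)],
                 lambda v: [a - k * (i + 1) for i, a in enumerate(v)])


# Stand-in for a convolutional layer: reversing the elements is a permutation.
toy_conv = Layer(lambda v: v[::-1], lambda v: v[::-1])


def encode_stack(pairs, x):
    """Encoder order: VP_1 -> conv_1 -> VP_2 -> ... -> conv_M -> latent variable output."""
    for vp, conv in pairs:
        x = conv.forward(vp.forward(x))
    return x


def decode_stack(pairs, z):
    """Decoder order: the first decoder conv layer inverts the last encoder conv layer."""
    for vp, conv in reversed(pairs):
        z = vp.inverse(conv.inverse(z))
    return z


pairs = [(toy_vp(k), toy_conv) for k in (1, 2, 3)]  # M = 3
z = encode_stack(pairs, [44, 55, 66])
print(decode_stack(pairs, z))  # [44, 55, 66]
```

Because the toy layers do not commute, decoding in any order other than the exact reverse would fail to recover the input.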
[0522] Specifically, the application processor 14031 is used to acquire encoded data;
[0523] The encoded data is decoded to obtain the latent variable output;
[0524] The latent variable output is processed by the volume-preserving flow model to obtain the decoded output. The volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is an invertible operation that satisfies the volume-preserving flow constraint, and the target volume-preserving flow layer is used to perform a division operation between the first data input to it and a preset coefficient, where the preset coefficient is not 1.
[0525] In one possible implementation, the volume-preserving flow constraint includes: the input space and output space of the operation corresponding to the volume-preserving operation layer have the same volume size.
[0526] In one possible implementation, the first data and the preset coefficients are vectors, the first data includes N elements, the preset coefficients include N coefficients, the N elements of the first data correspond one-to-one with the N coefficients, and the product of the N coefficients is 1; the division operation between the first data and the preset coefficients includes:
[0527] Perform a division operation on each element in the first data with its corresponding coefficient to obtain the division result.
[0528] In one possible implementation, specifically, the application processor 14031 is configured to process the second data input to the target volume-preserving flow layer through a first neural network to obtain the first network output, and to perform a preset operation on the first network output to obtain the preset coefficient.
[0529] In one possible implementation, the first network output is a vector comprising N elements. Specifically, the application processor 14031 is used to obtain the average of the N elements in the first network output and subtract the average from each element in the first network output to obtain the processed N elements.
[0530] Each of the N processed elements is subjected to an exponential operation with the natural constant e as the base to obtain the preset coefficients, which include N coefficients.
[0531] In one possible implementation, the output of the target volumetric flow layer includes the second data.
[0532] In one possible implementation, the target volume-preserving flow layer is further used to subtract a constant term from the first data to obtain a subtraction result, where the constant term is not 0;
[0533] Perform a division operation between the subtraction result and the preset coefficient.
[0534] In one possible implementation, specifically, the application processor 14031 is configured to process the second data input to the target volume-preserving flow layer through a second neural network to obtain the constant term.
[0535] In one possible implementation, the first data and the preset coefficients are vectors. The first data includes N elements, and the preset coefficients include N coefficients. The N elements of the first data correspond one-to-one with the N coefficients. The N elements include a first target element and a second target element. The first target element corresponds to a first target coefficient, and the second target element corresponds to a second target coefficient. Specifically, the application processor 14031 is configured to obtain the first fixed-point number corresponding to the first target element and the second fixed-point number corresponding to the second target element.
[0536] Obtain the first fraction corresponding to the first target coefficient and the second fraction corresponding to the second target coefficient. The first fraction includes a first numerator and a first denominator, and the second fraction includes a second numerator and a second denominator. The first numerator, the first denominator, the second numerator and the second denominator are integers, and the first denominator is equal to the second numerator.
[0537] Multiply the first fixed-point number with the first denominator to obtain the first result;
[0538] The first result is divided by the first numerator to obtain a second result, which includes a first quotient and a first remainder. The first quotient is used as the result of the division between the first target element and the first target coefficient.
[0539] Multiply the second fixed-point number with the second denominator to obtain the third result;
[0540] The third result is added to the first remainder result to obtain the fourth result;
[0541] The fourth result is divided by the second numerator to obtain a fifth result, which includes a second quotient and a second remainder. The second quotient is used as the result of dividing the second target element by the second target coefficient.
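The decoding steps can be sketched as follows. This minimal sketch assumes that the decoder visits the elements in the reverse of the encoder's order and adds each incoming remainder after multiplying by the denominator, so that every remainder is consumed by the element that produced it; under these assumptions it exactly undoes the encoder example in [0505]. The function name is hypothetical, and r_in stands for the final remainder produced on the encoder side.

```python
def divide_with_remainder(z, numerators, denominators, r_in=0):
    """Decoder-side inverse of the remainder-carrying fixed-point multiplication.

    Elements are visited in reverse order: each step multiplies by the
    denominator, adds the incoming remainder and divides by the numerator.
    """
    x = [0] * len(z)
    r = r_in  # in this sketch, the final remainder produced by the encoder
    for i in reversed(range(len(z))):
        x[i], r = divmod(z[i] * denominators[i] + r, numerators[i])
    return x, r


# Encoder outputs for [44, 55, 66] with fractions [2/3, 3/5, 5/2]: [29, 33, 165], remainder 1.
x, r = divide_with_remainder([29, 33, 165], [2, 3, 5], [3, 5, 2], r_in=1)
print(x, r)  # [44, 55, 66] 0
```

The returned remainder 0 equals the remainder the encoder started from, which is the value that would be handed back to the adjacent layer.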
[0542] In one possible implementation, the second target element is the last of the N elements to be divided by its corresponding coefficient during the division of the first data by the preset coefficients, and the target volume-preserving flow layer is further used to output the second remainder result.
[0543] In one possible implementation, the volume-preserving flow model further includes a first volume-preserving flow layer, which is the volume-preserving flow layer adjacent to the target volume-preserving flow layer. Specifically, the application processor 14031 is configured to obtain the fixed-point number of the first target element and add it to the remainder result output by the first volume-preserving flow layer, to obtain the first fixed-point number corresponding to the first target element.
[0544] In one possible implementation, the volume-preserving flow model includes M serially connected volume-preserving flow layers, including the target volume-preserving flow layer. The output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is an integer greater than 1 and not greater than M. The input of the first volume-preserving flow layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0545] In one possible implementation, the volume-preserving flow model further includes a target convolutional layer connected to the target volume-preserving flow layer. The output of the target convolutional layer is the first data, and the target convolutional layer performs a division operation between its input data and a weight matrix, that is, it multiplies the input data by the inverse of the weight matrix.
[0546] In one possible implementation, specifically, application processor 14031 is used to obtain the weight matrix;
[0547] LU decomposition is performed on the weight matrix to obtain a first matrix, a second matrix, a third matrix and a fourth matrix. The first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix.
[0548] The input data is multiplied by the inverse of the first matrix to obtain the sixth result;
[0549] The sixth result is multiplied by the inverse of the second matrix to obtain the seventh result;
[0550] The seventh result is multiplied by the inverse of the third matrix to obtain the eighth result;
[0551] The eighth result is multiplied by the inverse of the fourth matrix to obtain the ninth result, which is used as the result of the division operation between the input data and the weight matrix.
[0552] In one possible implementation, the volume-preserving flow model includes M sequentially connected volume-preserving flow layers and M convolutional layers. The M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include the target convolutional layer. For each positive integer i not greater than M, the output of the i-th convolutional layer is used as the input of the i-th volume-preserving flow layer, and, for i less than M, the output of the i-th volume-preserving flow layer is used as the input of the (i+1)-th convolutional layer. The input of the first convolutional layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
[0553] An embodiment of this application also provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in Figure 14.
[0554] An embodiment of this application also provides a computer-readable storage medium storing a program for signal processing, which, when run on a computer, causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in Figure 14.
[0555] The execution device, training device, or terminal device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins, or circuitry. The processing unit can execute computer-executable instructions stored in a storage unit, so that the chip in the execution device performs the data encoding method described in the embodiment shown in Figure 3, or so that the chip in the training device performs the data decoding method described in the embodiment shown in Figure 7. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache. The storage unit may also be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
[0556] Refer to Figure 15, which is a schematic diagram of a chip provided in an embodiment of this application. The chip may be implemented as a neural network processing unit (NPU) 1500. The NPU 1500 is mounted on the host CPU as a coprocessor, and the host CPU assigns tasks to it. The core of the NPU is the arithmetic circuit 1503, which the controller 1504 controls to fetch matrix data from memory and perform multiplication operations.
[0557] In some implementations, the arithmetic circuit 1503 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.
[0558] For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data of matrix B from the weight memory 1502 and caches it in each PE of the arithmetic circuit. It then fetches the data of matrix A from the input memory 1501 and performs a matrix operation with matrix B. Partial or final results of the resulting matrix are stored in the accumulator 1508.
[0559] The unified memory 1506 is used to store input data and output data. Weight data is transferred directly to the weight memory 1502 through the direct memory access controller (DMAC) 1505. Input data is also transferred to the unified memory 1506 through the DMAC.
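The dataflow in [0558] can be illustrated with a minimal pure-Python sketch. It models only the order of operations, not the hardware: one slice of the weight matrix B is held fixed (standing in for the data cached in the PEs), the rows of A stream past it, and partial sums are added into an accumulator until the final result is complete. The function name `tiled_matmul` and the tile size `tile_k` are illustrative assumptions, not part of the described NPU.

```python
def tiled_matmul(a, b, tile_k=2):
    """Compute a @ b (lists of rows), accumulating partial results tile by tile.

    Each pass over k0..k1 uses one slice of B, standing in for the weight data
    cached in the PEs; the running sums in `acc` play the role of the accumulator.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    acc = [[0] * n for _ in range(m)]
    for k0 in range(0, k, tile_k):          # load one slice of the weight matrix
        k1 = min(k0 + tile_k, k)
        for i in range(m):                  # stream rows of the input matrix
            for j in range(n):
                acc[i][j] += sum(a[i][p] * b[p][j] for p in range(k0, k1))
    return acc
```

Whatever the tile size, the accumulator ends up holding the full product; only the number of partial-sum passes changes.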
[0560] The bus interface unit (BIU) 1510 is used for interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1509.
[0561] The bus interface unit 1510 is used by the instruction fetch buffer 1509 to fetch instructions from external memory, and by the direct memory access controller 1505 to fetch the original data of the input matrix A or the weight matrix B from external memory.
[0562] The DMAC is mainly used to move input data from the external memory (DDR) to the unified memory 1506, to move weight data to the weight memory 1502, or to move input data to the input memory 1501.
[0563] The vector computation unit 1507 includes multiple arithmetic processing units that, as needed, further process the output of the arithmetic circuit 1503, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparisons. It is mainly used for computations in the non-convolutional and non-fully-connected layers of a neural network, such as batch normalization, pixel-level summation, and upsampling of feature maps.
[0564] In some implementations, the vector computation unit 1507 can store the processed output vectors in the unified memory 1506. For example, the vector computation unit 1507 can apply a linear and/or nonlinear function to the output of the arithmetic circuit 1503, such as performing linear interpolation on feature maps extracted by convolutional layers, or accumulating vectors of values to generate activation values. In some implementations, the vector computation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vectors can be used as activation inputs to the arithmetic circuit 1503, for example, for use in subsequent layers of the neural network.
[0565] The instruction fetch buffer 1509 connected to the controller 1504 is used to store the instructions used by the controller 1504.
[0566] The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories. The external memory is private to this NPU hardware architecture.
[0567] Any processor mentioned above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control execution of a program implementing the method of the first aspect.
[0568] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.
[0569] From the description of the foregoing embodiments, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function can vary, for example an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software implementation is preferable in most cases. Based on this understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions for instructing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
[0570] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.
[0571] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or some of the procedures or functions described in the embodiments of this application are produced. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, over coaxial cable, optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a training device or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)).
Claims
1. A data encoding method, characterized in that the method includes: obtaining data to be encoded; processing the data to be encoded by using a volume-preserving flow model to obtain a latent variable output, where the volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies a volume-preserving constraint, the target volume-preserving flow layer is used to multiply first data input to the target volume-preserving flow layer by a preset coefficient, and the preset coefficient is not 1; the volume-preserving flow model includes multiple volume-preserving flow layers; and the volume-preserving constraint means that the input space and the output space of the operation corresponding to the target volume-preserving flow layer are of the same size; and encoding the latent variable output to obtain encoded data.
2. The method according to claim 1, characterized in that the first data and the preset coefficient are vectors, the first data includes N elements, the preset coefficient includes N coefficients, the N elements of the first data correspond one-to-one to the N coefficients, and the product of the N coefficients is 1; and multiplying the first data by the preset coefficient includes: multiplying each element of the first data by its corresponding coefficient to obtain a product result.
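The volume-preservation argument behind claim 2 can be checked with a short pure-Python sketch. Elementwise multiplication by c_1, ..., c_N has a diagonal Jacobian whose determinant is c_1 * ... * c_N, so requiring that product to be 1 leaves volume unchanged, and elementwise division undoes the map. The function names `scale` and `unscale` and the tolerance `tol` are illustrative assumptions; this floating-point version does not show the numerically exact integer scheme of claim 8.

```python
import math

def scale(x, coeffs, tol=1e-9):
    """Multiply each element of x by its coefficient (elementwise, as in claim 2).

    The Jacobian of this map is diagonal with entries `coeffs`, so its
    determinant is prod(coeffs); requiring the product to be 1 preserves volume.
    """
    if len(x) != len(coeffs):
        raise ValueError("x and coeffs must have the same length N")
    if abs(math.prod(coeffs) - 1.0) > tol:
        raise ValueError("coefficients must multiply to 1")
    return [xi * ci for xi, ci in zip(x, coeffs)]

def unscale(y, coeffs):
    """Inverse map: divide each element by its coefficient."""
    return [yi / ci for yi, ci in zip(y, coeffs)]
```

For example, with coefficients [2.0, 0.5, 1.0] the input [3.0, 4.0, 5.0] maps to [6.0, 2.0, 5.0], and `unscale` recovers the input.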
3. The method according to claim 1, characterized in that the method further includes: processing second data input to the target volume-preserving flow layer by using a first neural network to obtain a first network output, and performing a preset operation on the first network output to obtain the preset coefficient.
4. The method according to claim 3, characterized in that the first network output is a vector including N elements, and performing the preset operation on the first network output includes: obtaining the average of the N elements included in the first network output, and subtracting the average from each of the N elements to obtain N processed elements; and performing an exponential operation with the natural constant e as the base on each of the N processed elements to obtain the preset coefficient, where the preset coefficient includes N coefficients.
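A minimal sketch of the preset operation in claim 4, assuming the first network output is available as a plain list of floats. Because the mean-subtracted elements sum to zero, their exponentials multiply to e^0 = 1, which is exactly the product constraint of claim 2. The function name `coefficients_from_output` is hypothetical.

```python
import math

def coefficients_from_output(h):
    """Turn a network output vector h into N coefficients whose product is 1.

    Subtracting the mean makes the processed elements sum to 0, so
    exp(h_1 - mean) * ... * exp(h_N - mean) = exp(0) = 1 (up to rounding).
    """
    mean = sum(h) / len(h)
    return [math.exp(v - mean) for v in h]
```

The exponential also guarantees every coefficient is strictly positive, so each one can later be divided out during decoding.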
5. The method according to claim 3, characterized in that the output of the target volume-preserving flow layer includes the second data.
6. The method according to claim 1, characterized in that the target volume-preserving flow layer is further used to add a constant term to the product of the first data and the preset coefficient, where the constant term is not 0.
7. The method according to claim 6, characterized in that the method further includes: processing second data input to the target volume-preserving flow layer by using a second neural network to obtain the constant term.
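Claims 3, 5, 6, and 7 together describe the structure commonly called an affine coupling layer: the second data passes through unchanged and drives the networks that produce the coefficients and the constant term, so a decoder can recompute both from the layer output. The sketch below assumes the first data and the second data have the same length N, and uses two trivial hypothetical functions, `first_network` and `second_network`, as stand-ins for the trained neural networks. The inverse (subtract the constant term, then divide) follows the decoding side described in claim 20.

```python
import math

# Hypothetical stand-ins for the first and second neural networks. Any function
# of the second data works, because the second data is passed through unchanged
# and is therefore available again at decoding time.
def first_network(x2):
    return [0.5 * v for v in x2]

def second_network(x2):
    return [v + 1.0 for v in x2]

def coefficients(x2):
    h = first_network(x2)
    mean = sum(h) / len(h)
    return [math.exp(v - mean) for v in h]          # product is 1 (claim 4)

def coupling_forward(x1, x2):
    s, t = coefficients(x2), second_network(x2)
    y1 = [a * b + c for a, b, c in zip(x1, s, t)]   # multiply, then add constant term
    return y1, list(x2)                             # second data is part of the output

def coupling_inverse(y1, y2):
    s, t = coefficients(y2), second_network(y2)
    x1 = [(a - c) / b for a, b, c in zip(y1, s, t)] # subtract constant, then divide
    return x1, list(y2)
```

In floating point the round trip is only accurate to rounding error; the integer procedure of claim 8 is what removes that error.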
8. The method according to claim 1, characterized in that the first data and the preset coefficient are vectors, the first data includes N elements, the preset coefficient includes N coefficients, and the N elements of the first data correspond one-to-one to the N coefficients; the N elements include a first target element and a second target element, the first target element corresponds to a first target coefficient, and the second target element corresponds to a second target coefficient; and multiplying the first data by the preset coefficient includes: obtaining a first fixed-point number corresponding to the first target element and a second fixed-point number corresponding to the second target element; obtaining a first fraction corresponding to the first target coefficient and a second fraction corresponding to the second target coefficient, where the first fraction includes a first numerator and a first denominator, the second fraction includes a second numerator and a second denominator, the first numerator, the first denominator, the second numerator, and the second denominator are integers, and the first denominator is the same as the second numerator; multiplying the first fixed-point number by the first numerator to obtain a first result; dividing the first result by the first denominator to obtain a second result, where the second result includes a first quotient and a first remainder, and the first quotient is used as the result of multiplying the first target element by the first target coefficient; multiplying the second fixed-point number by the second numerator to obtain a third result; adding the third result and the first remainder to obtain a fourth result; and dividing the fourth result by the second denominator to obtain a fifth result, where the fifth result includes a second quotient and a second remainder, and the second quotient is used as the result of multiplying the second target element by the second target coefficient.
9. The method according to claim 8, characterized in that the second target element is the last of the N elements to be multiplied by its corresponding coefficient when the first data is multiplied by the preset coefficient, and the target volume-preserving flow layer is further used to output the second remainder.
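The integer procedure of claims 8 and 9 can be sketched in Python under these assumptions: the first data is already a list of integer fixed-point values, each coefficient has been approximated by an integer fraction (numerator, denominator), and consecutive fractions are chained so that each denominator equals the next numerator. With N fractions of the form c_1/c_2, c_2/c_3, ..., c_N/c_1, the product of the coefficients is exactly 1. The function name `fixed_point_scale`, the `carry_in` parameter (modelling the remainder from an adjacent layer described in claim 11), and the example fractions 4/3, 3/5, 5/4 are illustrative; how the fractions are chosen is not specified in this section.

```python
def fixed_point_scale(x, fractions, carry_in=0):
    """Multiply integer (fixed-point) elements by chained rational coefficients.

    x         : N integers, the fixed-point representation of the first data
    fractions : N (numerator, denominator) pairs approximating the coefficients;
                the denominator of pair k must equal the numerator of pair k + 1
    carry_in  : remainder received from an adjacent layer (claim 11); 0 if none
    Returns the N quotients and the final remainder, which the layer also outputs.
    """
    if len(x) != len(fractions):
        raise ValueError("x and fractions must have the same length N")
    for (_, den), (num_next, _) in zip(fractions, fractions[1:]):
        if den != num_next:
            raise ValueError("each denominator must equal the next numerator")
    carry = carry_in
    y = []
    for xk, (num, den) in zip(x, fractions):
        # Quotient is the rounded product; the remainder is carried to the next element.
        q, carry = divmod(xk * num + carry, den)
        y.append(q)
    return y, carry
```

For x = [10, 7, 9] the steps are 10*4 = 40 = 13*3 + 1, then 7*3 + 1 = 22 = 4*5 + 2, then 9*5 + 2 = 47 = 11*4 + 3, giving quotients [13, 4, 11] (close to the real-valued products 13.33, 4.2, 11.25) and a final remainder of 3. Each remainder is smaller than the current denominator, which equals the next numerator; that bound is what makes the operation exactly invertible.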
10. The method according to claim 9, characterized in that the volume-preserving flow model includes M volume-preserving flow layers connected in series, the M volume-preserving flow layers include the target volume-preserving flow layer, and the output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is a positive integer not greater than M; the input of the 1st volume-preserving flow layer is the data to be encoded, and the output of the M-th volume-preserving flow layer is the latent variable output.
11. The method according to claim 10, characterized in that the volume-preserving flow model further includes a first volume-preserving flow layer, which is a volume-preserving flow layer adjacent to the target volume-preserving flow layer; and multiplying the first fixed-point number by the first numerator to obtain the first result includes: obtaining the remainder output by the first volume-preserving flow layer; and multiplying the first fixed-point number by the first numerator and adding the product to the remainder output by the first volume-preserving flow layer to obtain the first result.
12. The method according to claim 1, characterized in that the volume-preserving flow model further includes a target convolutional layer connected to the target volume-preserving flow layer, the output of the target volume-preserving flow layer is used as the input of the target convolutional layer, and the target convolutional layer is used to multiply the output of the target volume-preserving flow layer by a weight matrix.
13. The method according to claim 12, characterized in that multiplying the output of the target volume-preserving flow layer by the weight matrix includes: obtaining the weight matrix; performing LU decomposition on the weight matrix to obtain a first matrix, a second matrix, a third matrix, and a fourth matrix, where the first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix; multiplying the output of the target volume-preserving flow layer by the fourth matrix to obtain a sixth result; multiplying the sixth result by the third matrix to obtain a seventh result; multiplying the seventh result by the second matrix to obtain an eighth result; and multiplying the eighth result by the first matrix to obtain a ninth result, where the ninth result is used as the result of multiplying the output of the target volume-preserving flow layer by the weight matrix.
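The staged product of claim 13 can be sketched in pure Python with exact `Fraction` arithmetic. The sketch assumes the four factors are already available as lists, with the diagonal matrix stored as a list of its diagonal entries; the function names and the 2x2 example factors are illustrative, and the numerically invertible rounding a practical codec would need is not modelled. Because |det P| = 1 and the diagonal entries multiply to 1, |det W| = |det L| * |det U|, which equals 1 in the example since its triangular factors have unit diagonals.

```python
import math
from fractions import Fraction

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def staged_multiply(p, l, lam, u, x):
    """Compute W x for W = P L Lam U by applying U, Lam, L and P in that order.

    p   : permutation matrix (first matrix)
    l   : lower triangular matrix (second matrix)
    lam : diagonal entries of the diagonal matrix (third matrix); must multiply to 1
    u   : upper triangular matrix (fourth matrix)
    """
    if math.prod(lam) != 1:
        raise ValueError("diagonal entries must multiply to 1")
    sixth = matvec(u, x)                             # fourth matrix first
    seventh = [d * v for d, v in zip(lam, sixth)]    # then the diagonal matrix
    eighth = matvec(l, seventh)                      # then the lower triangular matrix
    return matvec(p, eighth)                         # finally the permutation matrix
```

With P = [[0, 1], [1, 0]], L = [[1, 0], [2, 1]], diagonal (2, 1/2), and U = [[1, 3], [0, 1]], the stages map x = [1, 1] to [4, 1], [8, 1/2], [8, 33/2], and finally [33/2, 8], which equals W x for W = [[4, 25/2], [2, 6]].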
14. The method according to claim 12, characterized in that the volume-preserving flow model includes M volume-preserving flow layers and M convolutional layers connected in series, the M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include the target convolutional layer; the output of the i-th volume-preserving flow layer is used as the input of the i-th convolutional layer, and the output of the i-th convolutional layer is used as the input of the (i+1)-th volume-preserving flow layer, where i is a positive integer not greater than M; and the input of the 1st volume-preserving flow layer is the data to be encoded, and the output of the M-th convolutional layer is the latent variable output.
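The layer ordering of claim 14, and its reversal on the decoding side in claim 28, can be sketched with two toy layer types that expose `forward` and `inverse` methods. `ScaleLayer` multiplies by coefficients whose product is 1, and `PermuteLayer` multiplies by a permutation matrix, which is one admissible weight matrix. Both classes, their method names, and the use of exact `Fraction` arithmetic are assumptions for illustration, not the trained layers of the embodiment. Decoding applies the inverses starting from the last encoding pair, undoing the convolutional layer before the volume-preserving layer.

```python
import math
from fractions import Fraction

class ScaleLayer:
    """Toy volume-preserving layer: elementwise scaling, coefficients multiply to 1."""

    def __init__(self, coeffs):
        self.coeffs = [Fraction(c) for c in coeffs]
        if math.prod(self.coeffs) != 1:
            raise ValueError("coefficients must multiply to 1")

    def forward(self, x):
        return [v * c for v, c in zip(x, self.coeffs)]

    def inverse(self, y):
        return [v / c for v, c in zip(y, self.coeffs)]

class PermuteLayer:
    """Toy stand-in for a convolutional layer: multiplication by a permutation matrix."""

    def __init__(self, order):
        self.order = list(order)

    def forward(self, x):
        return [x[i] for i in self.order]

    def inverse(self, y):
        x = [None] * len(y)
        for pos, i in enumerate(self.order):
            x[i] = y[pos]
        return x

def encode_chain(x, vp_layers, conv_layers):
    """Claim 14 order: VP layer i, then conv layer i, then VP layer i+1, and so on."""
    for vp, conv in zip(vp_layers, conv_layers):
        x = conv.forward(vp.forward(x))
    return x

def decode_chain(z, vp_layers, conv_layers):
    """Decoding order: undo the last conv layer, then the last VP layer, back to the first."""
    for vp, conv in zip(reversed(vp_layers), reversed(conv_layers)):
        z = vp.inverse(conv.inverse(z))
    return z
```

For example, with scaling coefficients (2, 1/2, 1) and (1/3, 3, 1) and permutations [2, 0, 1] and [1, 2, 0], the input [1, 2, 3] encodes to [6, 1, 1], and `decode_chain` returns [1, 2, 3].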
15. A data decoding method, characterized in that the method includes: obtaining encoded data; decoding the encoded data to obtain a latent variable output; and processing the latent variable output by using a volume-preserving flow model to obtain a decoded output, where the volume-preserving flow model includes a target volume-preserving flow layer, the operation corresponding to the target volume-preserving flow layer is a reversible operation that satisfies a volume-preserving constraint, the target volume-preserving flow layer is used to divide first data input to the target volume-preserving flow layer by a preset coefficient, and the preset coefficient is not 1; the volume-preserving flow model includes multiple volume-preserving flow layers; and the volume-preserving constraint means that the input space and the output space of the operation corresponding to the target volume-preserving flow layer are of the same size.
16. The method according to claim 15, characterized in that the first data and the preset coefficient are vectors, the first data includes N elements, the preset coefficient includes N coefficients, the N elements of the first data correspond one-to-one to the N coefficients, and the product of the N coefficients is 1; and dividing the first data by the preset coefficient includes: dividing each element of the first data by its corresponding coefficient to obtain a division result.
17. The method according to claim 15, characterized in that the method further includes: processing second data input to the target volume-preserving flow layer by using a first neural network to obtain a first network output, and performing a preset operation on the first network output to obtain the preset coefficient.
18. The method according to claim 17, characterized in that the first network output is a vector including N elements, and performing the preset operation on the first network output includes: obtaining the average of the N elements included in the first network output, and subtracting the average from each of the N elements to obtain N processed elements; and performing an exponential operation with the natural constant e as the base on each of the N processed elements to obtain the preset coefficient, where the preset coefficient includes N coefficients.
19. The method according to claim 17, characterized in that the output of the target volume-preserving flow layer includes the second data.
20. The method according to claim 15, characterized in that the target volume-preserving flow layer is further used to subtract a constant term from the first data to obtain a subtraction result, where the constant term is not 0; and dividing the first data by the preset coefficient includes: dividing the subtraction result by the preset coefficient.
21. The method according to claim 20, characterized in that the method further includes: processing second data input to the target volume-preserving flow layer by using a second neural network to obtain the constant term.
22. The method according to claim 15, characterized in that the first data and the preset coefficient are vectors, the first data includes N elements, the preset coefficient includes N coefficients, and the N elements of the first data correspond one-to-one to the N coefficients; the N elements include a first target element and a second target element, the first target element corresponds to a first target coefficient, and the second target element corresponds to a second target coefficient; and dividing the first data by the preset coefficient includes: obtaining a first fixed-point number corresponding to the first target element and a second fixed-point number corresponding to the second target element; obtaining a first fraction corresponding to the first target coefficient and a second fraction corresponding to the second target coefficient, where the first fraction includes a first numerator and a first denominator, the second fraction includes a second numerator and a second denominator, the first numerator, the first denominator, the second numerator, and the second denominator are integers, and the first numerator is the same as the second denominator; multiplying the first fixed-point number by the first denominator to obtain a first result; dividing the first result by the first numerator to obtain a second result, where the second result includes a first quotient and a first remainder, and the first quotient is used as the result of dividing the first target element by the first target coefficient; multiplying the second fixed-point number by the second denominator to obtain a third result; adding the third result and the first remainder to obtain a fourth result; and dividing the fourth result by the second numerator to obtain a fifth result, where the fifth result includes a second quotient and a second remainder, and the second quotient is used as the result of dividing the second target element by the second target coefficient.
23. The method according to claim 22, characterized in that the second target element is the last of the N elements to be divided by its corresponding coefficient when the first data is divided by the preset coefficient, and the target volume-preserving flow layer is further used to output the second remainder.
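The integer division of claims 22 and 23 can be sketched under the same assumptions as the fixed-point multiplication of claim 8, with one addition: the decoder visits the elements in the reverse of the encoding order, so the decoding claim's first fraction is the last fraction used during encoding. Each step multiplies by the denominator, adds the carried remainder, and divides by the numerator. The function name and the example values are illustrative: [13, 4, 11] with remainder 3 is what chained-remainder multiplication of [10, 7, 9] by 4/3, 3/5, 5/4 produces (40 = 13*3 + 1, 7*3 + 1 = 22 = 4*5 + 2, 9*5 + 2 = 47 = 11*4 + 3).

```python
def fixed_point_unscale(y, fractions, carry_in=0):
    """Exactly invert chained fixed-point multiplication using integer arithmetic.

    y         : N integer quotients produced on the encoding side
    fractions : the same N (numerator, denominator) pairs used for encoding,
                where each denominator equals the next pair's numerator
    carry_in  : remainder output by the encoding layer (claims 23 and 24)
    Returns the recovered integers and the remainder left after the last step.
    """
    if len(y) != len(fractions):
        raise ValueError("y and fractions must have the same length N")
    for (_, den), (num_next, _) in zip(fractions, fractions[1:]):
        if den != num_next:
            raise ValueError("each denominator must equal the next numerator")
    carry = carry_in
    x = [0] * len(y)
    for k in range(len(y) - 1, -1, -1):          # reverse of the encoding order
        num, den = fractions[k]
        x[k], carry = divmod(y[k] * den + carry, num)
    return x, carry
```

Decoding [13, 4, 11] with remainder 3 gives 11*4 + 3 = 47 = 9*5 + 2, then 4*5 + 2 = 22 = 7*3 + 1, then 13*3 + 1 = 40 = 10*4 + 0, recovering [10, 7, 9]. The returned remainder (0 here) is the remainder that entered the encoding layer, so it can be handed back to the adjacent layer in turn.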
24. The method according to claim 22, characterized in that the volume-preserving flow model further includes a first volume-preserving flow layer, which is a volume-preserving flow layer adjacent to the target volume-preserving flow layer; and multiplying the first fixed-point number by the first denominator to obtain the first result includes: obtaining the remainder output by the first volume-preserving flow layer; and multiplying the first fixed-point number by the first denominator and adding the product to the remainder output by the first volume-preserving flow layer to obtain the first result.
25. The method according to claim 22, characterized in that the volume-preserving flow model includes M volume-preserving flow layers connected in series, the M volume-preserving flow layers include the target volume-preserving flow layer, and the output of the (i-1)-th volume-preserving flow layer is used as the input of the i-th volume-preserving flow layer, where i is a positive integer not greater than M; the input of the 1st volume-preserving flow layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
26. The method according to claim 15, characterized in that the volume-preserving flow model further includes a target convolutional layer connected to the target volume-preserving flow layer, the output of the target convolutional layer is the first data, and the target convolutional layer is used to divide the input data of the target convolutional layer by a weight matrix.
27. The method according to claim 26, characterized in that dividing the input data by the weight matrix includes: obtaining the weight matrix; performing LU decomposition on the weight matrix to obtain a first matrix, a second matrix, a third matrix, and a fourth matrix, where the first matrix is a permutation matrix, the second matrix is a lower triangular matrix, the third matrix is a diagonal matrix whose diagonal elements have a product of 1, and the fourth matrix is an upper triangular matrix; multiplying the input data by the inverse of the first matrix to obtain a sixth result; multiplying the sixth result by the inverse of the second matrix to obtain a seventh result; multiplying the seventh result by the inverse of the third matrix to obtain an eighth result; and multiplying the eighth result by the inverse of the fourth matrix to obtain a ninth result, where the ninth result is used as the result of dividing the input data by the weight matrix.
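The inverse in claim 27 never needs W^{-1} explicitly, because each factor is easy to invert on its own: a permutation matrix by its transpose, a triangular matrix by substitution, and a diagonal matrix by elementwise division. A minimal sketch with exact `Fraction` arithmetic, assuming the factors are given as lists (the diagonal matrix as a list of its diagonal entries) and using a small 2x2 example: P = [[0, 1], [1, 0]], L = [[1, 0], [2, 1]], diagonal (2, 1/2), U = [[1, 3], [0, 1]], for which W = P L Λ U = [[4, 25/2], [2, 6]] and W [1, 1] = [33/2, 8]. The function and parameter names are hypothetical, and the numerically invertible rounding a practical codec would need is not modelled.

```python
from fractions import Fraction

def staged_inverse(p, l, lam, u, y):
    """Recover x from y = W x, where W = P L Lam U, by undoing P, L, Lam, U in turn.

    p   : permutation matrix (first matrix); its inverse is its transpose
    l   : lower triangular matrix (second matrix), solved by forward substitution
    lam : diagonal entries of the diagonal matrix (third matrix)
    u   : upper triangular matrix (fourth matrix), solved by back substitution
    """
    n = len(y)
    # Sixth result: multiply by the inverse (transpose) of the permutation matrix.
    sixth = [sum(p[i][j] * y[i] for i in range(n)) for j in range(n)]
    # Seventh result: forward substitution applies the inverse of L.
    seventh = []
    for i in range(n):
        s = sixth[i] - sum(l[i][j] * seventh[j] for j in range(i))
        seventh.append(Fraction(s) / l[i][i])
    # Eighth result: divide elementwise by the diagonal entries.
    eighth = [v / d for v, d in zip(seventh, lam)]
    # Ninth result: back substitution applies the inverse of U.
    ninth = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = eighth[i] - sum(u[i][j] * ninth[j] for j in range(i + 1, n))
        ninth[i] = Fraction(s) / u[i][i]
    return ninth
```

For y = [33/2, 8] the stages produce [8, 33/2], [8, 1/2], [4, 1], and finally [1, 1], the input that W maps to y.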
28. The method according to claim 26, characterized in that the volume-preserving flow model includes M volume-preserving flow layers and M convolutional layers connected in series, the M volume-preserving flow layers include the target volume-preserving flow layer, and the M convolutional layers include the target convolutional layer; the output of the i-th convolutional layer is used as the input of the i-th volume-preserving flow layer, and the output of the i-th volume-preserving flow layer is used as the input of the (i+1)-th convolutional layer, where i is a positive integer not greater than M; and the input of the 1st convolutional layer is the latent variable output, and the output of the M-th volume-preserving flow layer is the decoded output.
29. A data encoding device, characterized in that it includes a storage medium, a processing circuit, and a bus system, where the storage medium is used to store instructions, and the processing circuit is used to execute the instructions stored in the storage medium to perform the steps of the method according to any one of claims 1 to 14.
30. A data decoding device, characterized in that it includes a storage medium, a processing circuit, and a bus system, where the storage medium is used to store instructions, and the processing circuit is used to execute the instructions stored in the storage medium to perform the steps of the method according to any one of claims 15 to 28.
31. A computer-readable storage medium storing a computer program, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 28 are implemented.
32. A computer program product, characterized in that, The computer program product includes code that, when executed, performs the steps of the method according to any one of claims 1 to 28.