Decoding / encoding method applying linear model coefficient update and apparatus incorporating the same

The method updates linear model coefficients using L1-based robust regression to address prediction challenges in video coding, enhancing accuracy and efficiency by minimizing objective functions and reducing overfitting.

WO2026137361A1 · PCT designated stage · Publication Date: 2026-07-02 · SHENZHEN TCL NEW-TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: SHENZHEN TCL NEW-TECH CO LTD
Filing Date: 2024-12-26
Publication Date: 2026-07-02

IPC: H04N19/176

AI Technical Summary

Technical Problem

Existing video coding technologies face challenges in accurately predicting chroma components using linear models due to mismatch between cost functions in least squares regression and evaluation metrics, overfitting issues, and complexity in L1-based robust regression, particularly in small block sizes with large models.

Method used

Implement a method for updating linear model coefficients using L1-based robust regression, involving deriving an update vector and step size to minimize the objective function, which includes sum of absolute differences or regularized least squares, to improve prediction accuracy.

Benefits of technology

Enhances prediction accuracy and reduces overfitting by optimizing coefficient updates, thereby improving video decoding and encoding efficiency.

Abstract

A decoding method and an encoding method are provided. The decoding method includes: acquiring initial values for coefficients of a linear model prediction mode; deriving a coefficient update; acquiring updated values for the coefficients of the linear model prediction mode based on the coefficient update and the initial values; and applying the updated values to generate prediction samples of a current block.

Description

DECODING / ENCODING METHOD APPLYING LINEAR MODEL COEFFICIENT UPDATE AND APPARATUS INCORPORATING THE SAME

TECHNICAL FIELD

[0001] The present disclosure generally relates to encoding and decoding technology, and in particular to a decoding method, an encoding method, an apparatus for video decoding or encoding, and a computer readable non-transitory medium.

BACKGROUND

[0002] In the context of video compression, a colour image or a frame of a colour video usually consists of three colour components, namely a luma component Y and two chroma components Cb and Cr. Each component is represented as a data matrix. The data matrix for each component is decomposed into blocks associated with specific encoding parameters. A block is usually a square or rectangle whose dimensions are integer powers of 2. The blocks of an image are coded in raster scanning order: from left to right, then from top to bottom. Within a specific block or a plurality of blocks, the luma component is usually coded before the chroma components.

[0003] Popular video coding standards such as Versatile Video Coding (VVC) use a prediction / transform hybrid coding framework. Prediction refers to predicting the current block (i.e., the block to be coded) using coded blocks or coded areas within the same frame (i.e., intra prediction) or from a different frame (i.e., inter prediction) . When performing intra or inter prediction, the encoder tries multiple prediction modes available in the coding standard, computes and compares the corresponding prediction blocks and chooses the best prediction mode. The difference between the original current block and the prediction block generated by the chosen prediction mode, namely the residual, is also coded. By transmitting only prediction modes and residuals, the encoder is able to instruct the decoder to decode and reconstruct the original image or video, or an approximation of it.

[0004] In VVC and in recent studies towards future video coding standards, several prediction modes use linear models to generate prediction blocks. Notable examples of these prediction modes include Cross-Component Linear Model (CCLM) , Gradient Linear Model (GLM) , Convolutional Cross-Component Model (CCCM) , Local Illumination Compensation (LIC) , etc. Such prediction modes are used in parallel with other intra and / or inter prediction modes like DC mode, planar mode, angular modes, chroma direct mode, etc.

[0005] There are common aspects amongst the above-mentioned prediction modes. General procedures involve building a linear model, solving a linear regression problem and applying the linear model to predict images: 1) Define a formula to generate the prediction block. The formula is a linear function of reconstructed samples and known attributes. 2) Derive coefficients in the formula using samples from template areas with respect to the reference block and the current block. For example, in CCLM where chroma block is predicted by co-located luma block, samples of chroma values and co-located luma values in the template area are used. The derivation of coefficients is the process of solving the linear regression problem using these samples. 3) Apply the derived formula on the current block to generate the prediction block.

SUMMARY

[0006] Accordingly, the present disclosure aims to provide a decoding method, an encoding method, an apparatus for video decoding or encoding, and a computer readable non-transitory medium.

[0007] A technical scheme adopted by the present disclosure is to provide a decoding method. The method includes: acquiring initial values for coefficients of a linear model of a prediction mode; deriving a coefficient update; acquiring updated values for the coefficients of the linear model based on the coefficient update and the initial values; and applying the updated values to generate prediction samples of a current block.

[0008] Another technical scheme adopted by the present disclosure is to provide an encoding method. The method includes: acquiring initial values for coefficients of a linear model of a prediction mode; deriving a coefficient update; acquiring updated values for the coefficients of the linear model based on the coefficient update and the initial values; and applying the updated values to generate prediction samples of a current block.

[0009] Another technical scheme adopted by the present disclosure is to provide an apparatus for video decoding or encoding. The apparatus includes a processor and a memory. The memory is configured to store executable instructions that, when executed by the processor, cause the processor to perform any of the foregoing methods.

[0010] Another technical scheme adopted by the present disclosure is to provide a computer readable medium. The computer readable medium is configured to store executable instructions that, when executed by a processor, cause the processor to perform any of the foregoing methods.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] In order to clearly explain the technical solutions in the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. Obviously, the drawings in the following description are merely some embodiments of the present disclosure. For those of ordinary skill in the art, other drawings may also be obtained based on these drawings without any creative work.

[0012] FIG. 1 shows a schematic diagram of a conventional video encoding system.

[0013] FIG. 2 shows a schematic diagram of a conventional video decoding system.

[0014] FIG. 3 illustrates template areas of a current block and a corresponding reference block.

[0015] FIG. 4 shows several Sobel operators.

[0016] FIG. 5 is a flowchart of a decoding method applying linear model coefficient update according to an embodiment of the present disclosure.

[0017] FIG. 6 is a flowchart of an encoding method applying linear model coefficient update according to an embodiment of the present disclosure.

[0018] FIG. 7 shows a schematic diagram of an apparatus for encoding or decoding according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

[0019] The disclosure will now be described in detail with reference to the accompanying drawings and examples. Apparently, the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.

[0020] To help understand the technical solutions proposed in the embodiments of this application, a brief introduction of video encoding and decoding system will be provided below.

[0021] As shown in FIG. 1, a video encoding system 110 may include multiple modules, such as block partitioning unit 1101, transform and quantization unit 1102, intra-frame estimation unit 1103, intra-frame prediction unit 1104, motion compensation unit 1105, motion estimation unit 1106, an inverse transformation and inverse quantization unit 1107, a filter control analysis unit 1108, a filtering unit 1109, an encoding unit 1110, an encoded image buffer unit 1111 and a subtractor 1112.

[0022] Original video signals include video frames. Each video frame is divided into blocks by a block partitioning unit 1101. For each of the video frames, the subtractor 1112 generates residual pixel information about a residual frame by subtracting the input video frame from the output of the intra-frame prediction unit 1104 or the motion compensation unit 1105. The residual pixel information obtained after intra-frame prediction or inter-frame prediction (motion compensation) , is transformed by the transformation and quantization unit 1102. The transformation includes transforming the residual pixel information from the pixel domain to a transform domain, and the resulting transform coefficients are quantized to further reduce the bit rate. The intra-frame estimation unit 1103 performs intra-frame estimation, and the intra-frame prediction unit 1104 performs intra-frame prediction on the video reconstructed blocks. Motion estimation performed by the motion estimation unit 1106 is a process of generating a motion vector that can estimate the displacement of the reconstructed video block, and then motion compensation is performed by the motion compensation unit 1105 based on the determined motion vector. After determining an intra-frame prediction mode, the intra-frame prediction unit 1104 provides selected intra-frame predicted data to the encoding unit 1110, and the motion estimation unit 1106 also sends calculated motion vector data to the encoding unit 1110. The inverse transform and inverse quantization unit 1107 reconstructs the video reconstructed blocks and reconstructs a residual block in the pixel domain, and the filtering unit 1109 is controlled by the filter analysis unit 1108 to remove the blocking artifacts in the reconstructed residual block, and the encoding unit 1110 adds the reconstructed residual block to the prediction block of the encoded image buffer unit 1111 to generate a reconstructed block. 
The encoding unit 1110 is used for encoding various encoding parameters and quantized transform coefficients into a bitstream, and outputs the bitstream of the video signals. The encoded image buffer unit 1111 is used for storing reconstructed blocks as the reference blocks for intra-frame prediction. As the video image encoding progresses, new reconstructed blocks are continuously generated, and these blocks are stored in the encoded image buffer unit 1111.

[0023] As shown in FIG. 2, a video decoding system 120 may include multiple modules such as a decoding unit 1201, an inverse transform and inverse quantization unit 1202, an intra-frame prediction unit 1203, a motion compensation unit 1204, a filtering unit 1205, a decoded image buffer unit 1206 and a post filtering unit 1207.

[0024] The input signals of video frames are encoded by the video encoding system 110 to obtain an output bitstream. The video encoding system 110 transmits the bitstream to the video decoding system 120. The video decoding system 120 receives the bitstream representing the video frames in an encoded format (i.e., in a compressed format) . In the video decoding system 120, the bitstream is processed by the decoding unit 1201 to obtain decoded transform coefficients. The inverse transform and inverse quantization unit 1202 process the transform coefficients to generate a residual block in the pixel domain. The intra-frame prediction unit 1203 is operable to generate an intra-frame prediction block for a current video decoding block based on a determined intra-frame prediction mode and data from previously decoded blocks of the current video frame or picture. The motion compensation unit 1204 determines the inter-frame prediction information for the current video decoding block and generates an inter-frame prediction block by parsing the motion vector and other associated syntax elements. Finally, the decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 1202 and the corresponding prediction block generated by the intra-frame prediction unit 1203 or the motion compensation unit 1204. In order to improve video quality, the decoded video blocks are filtered through the filtering unit 1205 to remove blocking artifacts. The decoded video block is then stored in the decoded image buffer unit 1206 as the reference block for subsequent intra-prediction or motion compensation, and for video output, i.e., to reproduce and reconstruct the original video signals. The output video can be optionally further processed by a post filtering unit 1207 for more suitable or enhanced viewing experiences.

[0025] The present disclosure aims to specify and apply coefficient update of a linear model for a corresponding prediction mode. The following sections provide introduction for several related technologies of the present disclosure, including: 1) Prediction modes with linear models; and 2) Alternative regression methods and strategies.

● Prediction modes with linear models

[0026] In VVC and in recent studies towards future video coding standards, several prediction modes use linear models to generate prediction blocks. Notable examples of these prediction modes include Cross-Component Linear Model (CCLM) , Gradient Linear Model (GLM) , Convolutional Cross-Component Model (CCCM) , Local Illumination Compensation (LIC) , etc. Such prediction modes are used in parallel with other intra and / or inter prediction modes like DC mode, planar mode, angular modes, chroma direct mode, etc. In the decoding unit 1201 of the video decoding system, bitstreams are decoded in order to decide the prediction mode to be used for the current block.

[0027] There are common aspects amongst the above-mentioned prediction modes. General procedures involve building a linear model, solving a linear regression problem and applying the linear model to predict images: 1) Define a formula to generate the prediction block. The formula is a linear function of reconstructed samples and known attributes. 2) Derive coefficients in the formula using samples from template areas with respect to the reference block and the current block. For example, in CCLM where chroma block is predicted by co-located luma block, samples of chroma values and co-located luma values in the template area are used. The derivation of coefficients is the process of solving the linear regression problem using these samples. 3) Apply the derived formula on the current block to generate the prediction block. In mathematical terms, the procedures for generating a prediction block for the current block Predcur is:

[0028] In this formulation, β is the vector of the coefficients of the linear prediction function and βopt contains the optimal function coefficients derived from the samples in the template areas T. Recref and RecTref are sample vectors formed from the reconstructed values in the reference block and in the template of the reference block, respectively. βopt is derived by minimizing the sum of squared differences (SSD) between the reconstructed samples RecTcur in the template area Tcur of the current block and the prediction values obtained by feeding the reconstructed samples RecTref in the template area Tref of the reference block into the prediction function. The derivation of βopt amounts to solving an ordinary least squares (OLS) linear regression problem, which has a closed-form solution:

[0029] In the above equation, X is an n-by-p matrix and y is an n-by-1 vector, with n being the number of pixels in T and p being the dimension of the sample vector and of β. Each row of X is the transpose of the sample vector of a pixel in the template area of the reference block, and p is the number of elements of that sample vector. The i-th element of y is the value in the template area of the current block corresponding to the i-th row of X.
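
The displayed equations referred to in paragraphs [0027] to [0029] are not present in this text. A hedged reconstruction from the surrounding definitions might read as follows; the symbol f for the linear prediction function, written here as f(x; β) = βᵀx, and the per-sample index i are notation assumed for this sketch and are not taken from the source.

```latex
\begin{aligned}
\beta_{opt} &= \arg\min_{\beta} \sum_{i \in T}
  \bigl( Rec_{T_{cur}}(i) - f(\mathbf{Rec}_{T_{ref}}(i);\, \beta) \bigr)^{2} \\
Pred_{cur} &= f(\mathbf{Rec}_{ref};\, \beta_{opt}) \\
\beta_{opt} &= (X^{\top} X)^{-1} X^{\top} y
\end{aligned}
```

The last line is the standard OLS normal-equation solution, with X and y as defined in paragraph [0029].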

[0030] With βopt and the reconstructed samples of the reference block Recref , the prediction values for the current block can be generated.
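
The OLS derivation and the prediction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, floating-point arithmetic is used for readability (a real codec would typically use integer arithmetic), and the normal equations are solved by plain Gaussian elimination with partial pivoting.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_coefficients(X, y):
    """Return beta_opt solving (X^T X) beta = X^T y.

    X: n rows of p-element sample vectors from the reference template.
    y: n reconstructed values from the current-block template.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(sample_vectors, beta):
    """Apply the linear model to sample vectors of the reference block."""
    return [sum(b * v for b, v in zip(beta, vec)) for vec in sample_vectors]


# Toy template: y = 2*x + 1, with a constant 1 as the second element.
beta = ols_coefficients([[1, 1], [2, 1], [3, 1], [4, 1]], [3, 5, 7, 9])
prediction = predict([[10, 1]], beta)
```

With the toy template above, the fitted coefficients are approximately 2 and 1, so the prediction for a reference sample of 10 is approximately 21.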

[0031] Typically, template areas are defined as the left and top neighbour areas of the reference and current blocks as shown in FIG. 3. The predictive ability of the linear model comes from the spatial proximity between the blocks and the templates.

[0032] In the present disclosure, these modes are in general referred to as linear model prediction modes. Below are brief descriptions of these prediction modes.

[0033] (1) Cross-Component Linear Model (CCLM)

[0034] CCLM is a cross-component intra prediction mode that predicts chroma image or chroma block using reconstructed luma image or luma block. The term ‘cross-component’ stems from the fact that prediction is made from one colour component to another.

[0035] The formula for predicting a chroma block (Cb or Cr) in CCLM is as follows: PredC (x, y) =a·Rec′L (x, y) +b

[0036] In the above equation, PredC (x, y) is the value of pixel to be predicted in a chroma block, Rec′L (x, y) is the value of the corresponding pixel in the co-located reconstructed luma block, which has been adjusted to the size of the chroma block and (x, y) is the coordinate of the pixel. As the dimensions (height and width) of the luma image are twice the dimensions of the chroma image in 4: 2: 0 picture format, Rec′L is usually obtained by downsampling the actually reconstructed luma image RecL.

[0037] The model coefficients a and b are calculated according to the relations between the luma and chroma images in the template area, which includes a row of pixels adjacent to the top of the current block and a column of pixels adjacent to the left of the current block. b=M (tC) -a·M (tL)

[0038] In the above equation, tL is the value of a pixel in the downsampled luma template area, tC is the value of a pixel in the chroma template area, (x, y) is the coordinate of the pixel and M (X) is the mean of sample X in the corresponding template area.

[0039] As per the above-mentioned common aspects, chroma block is the current block to be predicted, luma block serves as the reference block, and the template areas are the reconstructed pixels on the left and the top of respective blocks.
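
As a rough illustration of the CCLM steps, the sketch below fits a and b on template samples and applies them to a downsampled luma block. The intercept follows the relation b = M(tC) - a·M(tL) given above; the slope formula (covariance over variance, i.e. the least-squares slope) is an assumption, because the source equation for a is not present in this text. The function names, the clipping to the valid sample range and the default bit depth are likewise hypothetical.

```python
def cclm_coefficients(t_luma, t_chroma):
    """Fit PredC = a * Rec'L + b on template samples.

    t_luma:   downsampled luma values in the template area (tL).
    t_chroma: co-located chroma values in the template area (tC).
    """
    n = len(t_luma)
    mean_l = sum(t_luma) / n
    mean_c = sum(t_chroma) / n
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(t_luma, t_chroma))
    var = sum((l - mean_l) ** 2 for l in t_luma)
    # Assumed fallback: a flat template gives a constant (mean) predictor.
    a = cov / var if var else 0.0
    b = mean_c - a * mean_l  # b = M(tC) - a * M(tL)
    return a, b


def cclm_predict(rec_luma_ds, a, b, bit_depth=10):
    """Predict a chroma block from the downsampled reconstructed luma block."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(int(round(a * v + b)), 0), max_val) for v in row]
            for row in rec_luma_ds]


a, b = cclm_coefficients([100, 200, 300, 400], [60, 110, 160, 210])
pred = cclm_predict([[120, 240]], a, b)
```

For this toy template the fit is a = 0.5 and b = 10, so luma values 120 and 240 map to chroma predictions 70 and 130.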

[0040] (2) Convolutional Cross-Component Model (CCCM)

[0041] CCCM is another cross-component prediction mode. CCCM differs from CCLM and GLM by its typically wider template area and the integration of various types of samples in the prediction formula.

[0042] The formula for predicting a chroma block (Cb or Cr) in CCCM is as follows: PredC (x, y) =c0·Rec′L (x, y) +c1·Rec′L (x-1, y) +c2·Rec′L (x+1, y) +c3·Rec′L (x, y-1) +c4·Rec′L (x, y+1) +c5· [Rec′L (x, y) ] ^2+c6·2^ (D-1) where D is the colour bit depth of the image and the constant 2^ (D-1) is also known as the bias term.

[0043] The model coefficients β= {c0, c1, …, c6} in CCCM are obtained by a cost minimization formula:

[0044] In the above equation, R is the set of coordinates of all pixels in the template area, which includes several rows of pixels adjacent to the top of the current block and several columns of pixels adjacent to the left of the current block. This numerical optimization problem can be solved with Gaussian elimination method or other approximation algorithms.
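
To make the structure of the CCCM regression concrete, the sketch below builds the 7-element sample vector in the order of the coefficients c0 to c6 and assembles the design matrix over a set of template coordinates. The function names are hypothetical, arrays are indexed as `luma[y][x]`, and the sketch assumes every coordinate has all four neighbours available (border padding is left out).

```python
def cccm_sample_vector(luma, x, y, bit_depth=10):
    """Return the terms multiplied by c0..c6 for the pixel at (x, y)."""
    centre = luma[y][x]
    return [
        centre,                # c0: Rec'L(x, y)
        luma[y][x - 1],        # c1: Rec'L(x-1, y)
        luma[y][x + 1],        # c2: Rec'L(x+1, y)
        luma[y - 1][x],        # c3: Rec'L(x, y-1)
        luma[y + 1][x],        # c4: Rec'L(x, y+1)
        centre * centre,       # c5: squared centre sample
        1 << (bit_depth - 1),  # c6: bias term 2^(D-1)
    ]


def cccm_design_matrix(luma, chroma, coords, bit_depth=10):
    """Build X (one sample vector per row) and y (chroma targets) over R."""
    X = [cccm_sample_vector(luma, x, y, bit_depth) for x, y in coords]
    targets = [chroma[y][x] for x, y in coords]
    return X, targets


def cccm_predict_sample(luma, x, y, coeffs, bit_depth=10):
    """Evaluate the CCCM formula for one pixel with given coefficients."""
    vec = cccm_sample_vector(luma, x, y, bit_depth)
    return sum(c * v for c, v in zip(coeffs, vec))


luma = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
vec = cccm_sample_vector(luma, 1, 1, bit_depth=8)
```

The resulting X and y can then be passed to any least-squares solver; the source mentions Gaussian elimination as one option.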

[0045] Details can be obtained from P. Astola, J. Lainema, R.G. Youvalari, A. Aminlou, K. Panusopone, EE2-1.1a: Convolutional cross-component intra prediction model, document JVET-AA0126, Joint Video Experts Team (JVET) , Jul. 2022.

[0046] (3) Multi-Model Linear Model (MMLM)

[0047] MMLM is an extension of linear model prediction. In recent development of video coding standards, MMLM is applied as a variant of CCLM and CCCM, namely Multi-Model CCLM (MM-CCLM) and Multi-Model CCCM (MM-CCCM) . In contrast to ordinary linear model prediction methods, where the whole template area is regarded as a single training set and one linear model is derived for predicting the current block, MMLM classifies all pixels in the template area into two training sets and derives one linear model for each training set. For example, in MM-CCLM and MM-CCCM, pixels in the template area are classified based on whether their luma values are less than the average luma value of the whole template area. In the prediction stage, pixels in the current block are classified using the same criterion and the corresponding linear model is applied for the prediction.

[0048] The formula for predicting a chroma block (Cb or Cr) in MM-CCLM is as follows:

[0049] In the above equation, ML is the mean value of luma samples in the template area. Function coefficients {α0, β0} are used for predicting pixels whose luma values are smaller than or equal to ML and {α1, β1} are used for predicting pixels whose luma values are greater than ML. They are obtained separately: β1=M (tC1) -α1·M (tL1)

[0050] In the above equations, tL0 or tL1 is the value of a pixel in the downsampled luma template area, tC0 or tC1 is the value of a pixel in the chroma template area, (x, y) is the coordinate of the pixel and M (X) is the mean of sample X in the corresponding template area; tL0 and tC0 only include the pixels whose luma values are smaller than or equal to ML; tL1 and tC1 only include the pixels whose luma values are greater than ML.
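
The two-class procedure of MM-CCLM can be sketched as below. The threshold ML and the "smaller than or equal to" / "greater than" split follow the text; the least-squares slope used inside `fit_line`, the function names, and the fallback of reusing the low-class model when the high class is empty are assumptions made for this illustration.

```python
def fit_line(pairs):
    """Least-squares fit of chroma = a * luma + b over (luma, chroma) pairs."""
    n = len(pairs)
    mean_l = sum(l for l, _ in pairs) / n
    mean_c = sum(c for _, c in pairs) / n
    cov = sum((l - mean_l) * (c - mean_c) for l, c in pairs)
    var = sum((l - mean_l) ** 2 for l, _ in pairs)
    a = cov / var if var else 0.0
    return a, mean_c - a * mean_l


def mmclm_fit(t_luma, t_chroma):
    """Split template pixels at the mean luma ML and fit one model per class."""
    threshold = sum(t_luma) / len(t_luma)  # ML
    pairs = list(zip(t_luma, t_chroma))
    low = [p for p in pairs if p[0] <= threshold]
    high = [p for p in pairs if p[0] > threshold]
    model_low = fit_line(low)
    # Assumed fallback: with a flat template the high class is empty.
    model_high = fit_line(high) if high else model_low
    return threshold, model_low, model_high


def mmclm_predict_sample(luma_value, threshold, model_low, model_high):
    """Classify a current-block pixel with the same rule, then predict."""
    a, b = model_low if luma_value <= threshold else model_high
    return a * luma_value + b


t_luma = [10, 20, 30, 40, 110, 120, 130, 140]
t_chroma = [21, 41, 61, 81, 190, 180, 170, 160]
ml, low_model, high_model = mmclm_fit(t_luma, t_chroma)
```

In this toy template the dark pixels follow chroma = 2·luma + 1 and the bright pixels follow chroma = 300 - luma, which a single line could not capture.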

[0051] Similarly in MM-CCCM, the formula for predicting a chroma block (Cb or Cr) is as follows:

[0052] The two sets of model coefficients in MM-CCCM are obtained separately:

[0053] (4) Local-Boosting Cross-Component Prediction (LBCCP)

[0054] LBCCP is a method applied on top of the above-mentioned CCLM and CCCM, with the intention of improving prediction accuracy via noise removal or correction of wrong classification in MMLM. After a prediction block is obtained with CCLM or CCCM, an additional filter is applied to it by linear convolution with the LBCCP filter kernel F, and the filtered block is used as the final prediction block.

[0055] (5) Gradient Linear Model (GLM)

[0056] GLM is another cross-component prediction mode that is similar to CCLM. Instead of directly using luma sample value as in CCLM, GLM utilizes luma sample gradients to derive the linear model. Therefore, the formula for predicting a chroma block (Cb or Cr) is changed to: PredC (x, y) =a·G (x, y) +b

[0057] The gradient G can be computed by applying one of the Sobel operators as shown in FIG. 4 on the luma samples.
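
The following sketch shows how a gradient G(x, y) could be computed and fed into the GLM formula. Since FIG. 4 is not reproduced in this text, the standard 3×3 horizontal Sobel kernel is used as a stand-in; the kernel choice, the function names and the `luma[y][x]` indexing are assumptions, and the kernel is applied as a correlation (no flipping) on interior pixels only.

```python
# Standard 3x3 horizontal Sobel kernel, used here as an assumed example
# of "one of the Sobel operators"; the operators of FIG. 4 may differ.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def gradient_at(luma, x, y, kernel=SOBEL_X):
    """Apply a 3x3 kernel centred on (x, y) of the luma samples."""
    return sum(
        kernel[j][i] * luma[y - 1 + j][x - 1 + i]
        for j in range(3)
        for i in range(3)
    )


def glm_predict_sample(luma, x, y, a, b):
    """PredC(x, y) = a * G(x, y) + b, with G from the kernel above."""
    return a * gradient_at(luma, x, y) + b


luma = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
g = gradient_at(luma, 1, 1)
```

The coefficients a and b would be fitted on the template in the same way as for CCLM, with the gradient taking the place of the luma sample value.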

[0058] Details can be obtained from: 1) C. -W. Kuo, X. Xiu, N. Yan, H. -J. Jhu, W. Chen, H. Gao, X. Wan, AHG12: Enhanced CCLM, document JVET-Z0140, Joint Video Experts Team (JVET) , Apr. 2022; 2) P. Astola, J. Lainema, R. G. Youvalari, A. Aminlou, K. Panusopone, C. -W. Kuo, H. -J. Jhu, X. Xiu, N. Yan, W. Chen, X. Wang, EE2-1.1c, 1.3a and 1.3b: Combined tests of EE2-1.1a, 1.1b and 1.2, document JVET-AA0126, Joint Video Experts Team (JVET) , Jul. 2022.

[0059] (6) Chroma Fusion Linear Model (CFLM)

[0060] Chroma fusion is an intra prediction method for chroma block, which takes in the predictor generated by direct mode (DM) and the predictor generated by MMLM and generates a new predictor via weighted average. The weights are picked from a set of fixed values.

[0061] Chroma fusion linear model (CFLM) is a further enhancement to chroma fusion. It uses a linear model of the reconstructed luma block and the predicted chroma block from non-linear model modes to generate the final predicted chroma block: PredC (x, y) =a0·Rec′L (x, y) +a1·Pred′C (x, y) +a2·2^ (D-1)

[0062] The model coefficients β= {a0, a1, a2} are obtained by a cost minimization formula with samples on the template:

[0063] Details can be obtained from C. Zhou, Z. Lv, J. Zhang, EE2-1.6: On Chroma Fusion improvement, document JVET-AC0119, Joint Video Experts Team (JVET) , Jan. 2023.

[0064] (7) Extrapolation filter-based Intra Prediction (EIP)

[0065] EIP is an intra prediction mode that progressively predicts each pixel in the current block from the pixel’s reconstructed or predicted neighbours.

[0066] The formula for predicting a block in EIP is as follows: N (x, y) = { (x-ox, y-oy) |0≤ox≤w, 0≤oy≤h, (ox≠0) ∨ (oy≠0) } where N (x, y) is the set of pixels within a w×h rectangle whose bottom-right-most location is (x, y) , excluding (x, y) itself.

[0067] To calculate the model coefficients c (x′, y′) , a template area is defined as one or more rows adjacent to the top of the current block and one or more columns adjacent to the left of the current block. The solving process is similar to CCCM. If (x′, y′) is in the current block instead of the template area, Pred (x′, y′) is used in place of Rec (x′, y′) as there is no reconstructed value for that location.

[0068] Alternatively, the formula for predicting a block in EIP is as follows: N (x, y) = { (x-ox, y-oy) |0≤ox≤w, 0≤oy≤h, (ox≠0) ∨ (oy≠0) , (ox≠w) ∨ (oy≠h) } where cB is the coefficient related to the constant 2^ (D-1) (also known as the bias term) .
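
The two neighbourhood definitions N (x, y) given for EIP can be enumerated directly. The sketch below implements the sets exactly as written, with offsets running from 0 to w and 0 to h inclusive; the function name and the `exclude_far_corner` flag (which selects the alternative definition) are hypothetical.

```python
def eip_neighbourhood(x, y, w, h, exclude_far_corner=False):
    """List the positions (x - ox, y - oy) that belong to N(x, y).

    The offset (0, 0), i.e. the pixel itself, is always excluded.
    With exclude_far_corner=True the offset (w, h) is excluded as well,
    matching the alternative definition.
    """
    positions = []
    for oy in range(h + 1):
        for ox in range(w + 1):
            if ox == 0 and oy == 0:
                continue
            if exclude_far_corner and ox == w and oy == h:
                continue
            positions.append((x - ox, y - oy))
    return positions


first = eip_neighbourhood(5, 5, 1, 1)
second = eip_neighbourhood(5, 5, 1, 1, exclude_far_corner=True)
```

For w = h = 1 the first definition yields the left, top and top-left neighbours of (5, 5), and the alternative definition drops the top-left one.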

[0069] Details can be obtained from: 1) L. Xu, Y. Yu, H. Yu, D. Wang, EE2-1.14: An extrapolation filter-based intra prediction mode, document JVET-AG0058, Joint Video Experts Team (JVET) , Jan. 2024; 2) J. Lainema, P. Astola, EE2-2.9: EIP with bias and clipping, document JVET-AH0086, Joint Video Experts Team (JVET) , Apr. 2024.

[0070] (8) Filtered Intra Block Copy (FIBC)

[0071] Intra block copy (IBC) is an intra prediction mode in which the prediction block is generated by copying a previously coded block (reference block) . Filtered IBC enhances IBC by creating a linear model between the reference block and the current block. PredCur (x, y) =c0·RecRef (x, y) +c1·RecRef (x-1, y) +c2·RecRef (x+1, y) +c3·RecRef (x, y-1) +c4·RecRef (x, y+1) +c5· [RecRef (x, y) ] ^2+c6·2^ (D-1)

[0072] The model coefficients β= {c0, c1, …, c6} are obtained by a cost minimization formula with samples on the templates:

[0073] Details can be obtained from H. -J. Jhu, X. Xiu, C. -W. Kuo, W. Chen, N. Yan, C. Ma, X. Wang, B. Ray, M. Coban, V. Seregin, M. Karczewicz, EE2-2.5: Filtered Intra Block Copy (FIBC) , document JVET-AE0159, Joint Video Experts Team (JVET) , Jul. 2023.

[0074] (9) Enhanced Intra Template Matching Prediction (IntraTMP) methods

[0075] Intra Template Matching Prediction (IntraTMP) is a series of intra prediction methods where the coder searches from the coded areas within the current frame using template matching method and uses a reference block from the coded areas similar to the current block as a predictor. The search is usually done by using a similarity metric to measure the similarity of the template of the current block and the template of candidate reference blocks. While IntraTMP reference block is found using similarity of the templates, the similarity of the reference block (s) and current block is less guaranteed. Two enhancements of IntraTMP have been proposed.

[0076] In J. -Y. Huo, W. -H. Qiao, X. Hao, Z. -Y. Zhang, H. -Q. Du, Y. -Z. Ma, F. -Z. Yang, EE2-1.15a: Intra template matching (Intra TMP) based on linear filter model, document JVET-AD0112, Joint Video Experts Team (JVET) , Apr. 2023, it was proposed to establish a linear model between the reference block and the current block: PredCur (x, y) =c0·RecRef (x, y) +c1·RecRef (x-1, y) +c2·RecRef (x+1, y) +c3·RecRef (x, y-1) +c4·RecRef (x, y+1) +c5·2^ (D-1) where the predicted value in the current block PredCur (x, y) is calculated from the reconstructed values RecRef in the reference block at and near (x, y) .

[0077] The model coefficients β= {c0, c1, …, c5} are obtained by a cost minimization formula with samples on the templates:

[0078] In J. -Y. Huo, H. -Q. Du, H. -L. Zhang, W. -H. Qiao, Y. -Z. Ma, F. -Z. Yang, EE2-1.16: A Fusion method of Intra Template Matching Prediction (Intra TMP) , document JVET-AD0116, Joint Video Experts Team (JVET) , Apr. 2023, it was proposed to perform fusion (weighted average) of multiple reference blocks using a linear model:

[0079] The model coefficients β= {w0, w1, …, wN-1, wN} are obtained by a cost minimization formula with samples on the templates:

[0080] (10) Local Illumination Compensation (LIC)

[0081] LIC is a prediction enhancement method that applies to inter prediction. In inter prediction, a reference block is found by the above-mentioned motion compensation unit. In order to compensate the illumination difference between reference block and current block, LIC is used to measure such difference and adjust prediction blocks. The formula for generating the prediction block is: Pred (x, y) =a·Ref (x, y) +b where Pred (x, y) is the value of pixel to be predicted and Ref (x, y) is the value of pixel in the reference block.

[0082] To calculate the model coefficients a and b, template areas are defined around both the reference block and the current block by choosing one or more rows adjacent to the top of the respective blocks and one or more columns adjacent to the left of the respective blocks. The calculation is similar to that of CCLM described above.

[0083] (11) Regression-based Geometric Partitioning Mode blending (Regression-based GPM blending)

[0084] Geometric partitioning mode (GPM) is an inter prediction mode that uses two reference blocks and a partitioning line to generate the predicted block. The partitioning line splits the block into two parts and each part is filled with predicted samples from the respective reference block. Optionally, there is also a blending operation near the partitioning line. The samples near the partitioning line are computed as a weighted average of the samples from the two reference blocks:

$$\mathrm{Pred}(x, y) = w_0 \cdot \mathrm{Rec}_0(x, y) + w_1 \cdot \mathrm{Rec}_1(x, y), \qquad w_1 = 1 - w_0$$

where Rec0 (x, y) and Rec1 (x, y) are the reconstructed samples in the two reference blocks.

[0085] Or in another formulation: Pred (x, y) -Rec0 (x, y) =w1· [Rec1 (x, y) -Rec0 (x, y) ]

[0086] Regression-based GPM blending removes the need to specify the partitioning line but instead derives the blending weights from the templates of the reference blocks and the current block. The blending operation is then applied to the whole block. The linear model is: w1=a·x+b·y+c

[0087] The model coefficients β= {a, b, c} are obtained by a cost minimization formula with samples on the template:

[0088] Details can be obtained from P. Bordes, K. Reuzé, F. Galpin, F. Urban, K. Naser, F. Le Léannec, E. Francois, EE2-2.11: Regression-based GPM blending (tests a, b, c), document JVET-AG0112, Joint Video Experts Team (JVET), Jan. 2024.

● Alternative regression methods and strategies

[0089] This section introduces some notable regression methods. Ordinary least squares (OLS) uses the sum of squared differences (SSD) over the whole set of training samples as the cost function to be minimized. A number of other regression methods may mitigate some of the disadvantages of OLS. The main differences between OLS and these methods are usually the choice of cost function and / or the processing of training samples.

[0090] (1) Minimization of a different cost function

[0091] In regression analysis, there are a few choices of cost function that in general render better robustness than SSD as in OLS:

[0092] Sum of absolute differences (SAD). Since SAD is the L1-norm of the difference of two vectors in mathematical terms, this method is also referred to as L1-loss regression. Using the absolute difference instead of the squared difference as the target of minimization reduces the impact of outliers. One drawback is that the cost function is not differentiable at zero, which makes algorithm design relatively challenging. Another drawback is that, without additional rules, multiple solutions may exist, which is undesirable in data compression.

[0093] Huber loss. Huber loss function is a mix of SSD and SAD. Huber loss acts like SSD near zero and acts like SAD otherwise. It offers both differentiability and resilience to outliers but is more complex to implement and requires tuning of the threshold to switch between SSD and SAD.

[0094] Log-Cosh loss. Log-Cosh loss function also offers similar benefits as Huber loss but is also more complex to design.
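The behaviour of these cost functions in the presence of an outlier can be compared with a short sketch. The function names, the Huber threshold `delta` and the residual values are illustrative assumptions, not taken from any codec.

```python
import math


def ssd(residuals):
    """Sum of squared differences (the OLS cost)."""
    return sum(r * r for r in residuals)


def sad(residuals):
    """Sum of absolute differences (the L1 loss)."""
    return sum(abs(r) for r in residuals)


def huber(residuals, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it."""
    total = 0.0
    for r in residuals:
        a = abs(r)
        total += 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)
    return total


def log_cosh(residuals):
    """log(cosh(r)), written as |r| + log1p(exp(-2|r|)) - log(2) to avoid overflow."""
    return sum(abs(r) + math.log1p(math.exp(-2.0 * abs(r))) - math.log(2.0)
               for r in residuals)


clean = [1.0, -1.0, 0.5, -0.5]
with_outlier = clean + [20.0]
for name, cost in (("SSD", ssd), ("SAD", sad), ("Huber", huber), ("Log-Cosh", log_cosh)):
    print(name, cost(clean), cost(with_outlier))
```

The single outlier dominates the SSD, while it enters the other three costs only about linearly.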

[0095] (2) Iterative methods

[0096] Iterative regression with outlier detection. Instead of performing OLS regression only once, it might also be desirable to perform regression multiple times and iteratively. In between each iteration, outlier detection can be performed in order to reduce the impact of outliers in the sample pool.

[0097] Random sample consensus (RANSAC). RANSAC is another class of iterative methods for mathematical model estimation. Instead of putting all available samples into one pool and performing regression once, random subsets of the original data are used to fit a model, and all data are then tested against the fitted model to determine whether they are consistent with it. By iteratively drawing samples and repeating the model fitting, the model with the largest number of consistent samples prevails as the solution.
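A minimal RANSAC sketch for a two-coefficient model y = a·x + b is given below. The tolerance, iteration count, fixed seed and final refit on the consensus set are assumptions made for illustration. A fixed seed is used because encoder and decoder would need identical results.

```python
import random


def fit_line(points):
    """Ordinary least squares fit of y = a * x + b; returns None if degenerate."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        return None
    a = (n * sxy - sx * sy) / denom
    return a, (sy - a * sx) / n


def ransac_line(points, n_iter=50, tol=1.0, seed=0):
    """Keep the largest consensus set found over n_iter random two-point fits."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        model = fit_line(rng.sample(points, 2))
        if model is None:
            continue
        a, b = model
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set so the result uses all consistent samples.
    return fit_line(best_inliers), best_inliers


points = [(x, 2 * x + 1) for x in range(10)]
points[3] = (3, 50)    # outlier
points[7] = (7, -30)   # outlier
model, inliers = ransac_line(points)
```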

[0098] (3) Slope adjustment for CCLM and LIC

[0099] As of ECM version 14.0, slope adjustment is adopted as an additional option for CCLM and LIC. In addition to the slope and offset calculated by the ordinary least squares method, an adjustment value is added to the slope, and the corresponding offset is recalculated. Based on the slope adjustment flags and values in the bitstream, the decoder can obtain the adjusted slope and offset to perform predictions in CCLM or LIC.

[0100] In addition, an LIC refinement based on SAD or L1-norm has been proposed. In addition to the slope and offset calculated by the ordinary least squares method, two adjustment values are added to the slope, and the corresponding offsets are recalculated. The three sets of slopes and offsets are evaluated on the input samples by measuring the SAD between predicted values and reconstructed values. The set of coefficients with the lowest SAD is chosen as the LIC model coefficients.
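The SAD-based selection among three slope/offset pairs can be sketched as below. The text does not state how the offsets are recalculated. This sketch assumes the line pivots about the mean of the reference template, and the adjustment value `delta` and all names are hypothetical.

```python
def sad_cost(model, ref_t, cur_t):
    """SAD between the template of the current block and its linear prediction."""
    a, b = model
    return sum(abs(y - (a * x + b)) for x, y in zip(ref_t, cur_t))


def refine_lic_by_sad(ref_t, cur_t, a, b, delta):
    """Return the lowest-SAD pair among (a, b) and two slope-adjusted variants."""
    mean_ref = sum(ref_t) / len(ref_t)
    candidates = [
        (a, b),
        # Assumed offset rule: keep the prediction at mean_ref unchanged.
        (a + delta, b - delta * mean_ref),
        (a - delta, b + delta * mean_ref),
    ]
    return min(candidates, key=lambda m: sad_cost(m, ref_t, cur_t))


ref_t = [10, 20, 30, 40, 50]
cur_t = [10, 20, 30, 40, 120]      # last sample is an outlier
initial = (2.4, -28.0)             # OLS solution for these samples
chosen = refine_lic_by_sad(ref_t, cur_t, initial[0], initial[1], delta=0.5)
```

Because the initial pair is always among the candidates, the selected model never has a higher template SAD than the least squares solution.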

[0101] Details can be obtained from: 1) J. Lainema, A. Aminlou, P. Astola, R.G. Youvalari, EE2-1.1: Slope adjustment for CCLM, document JVET-Z0049, Joint Video Experts Team (JVET) , Apr. 2022; 2) Y. Wang, K. Zhang, Y. He, H. Liu, L. Zhang, Y. Zhang, C. -C. Chen, H. Huang, Z. Zhang, V. Seregin, H. Wang, M. Karczewicz, X. Xiu, C. Ma, N. Yan, H. -J. Jhu, C. -W. Kuo, W. Chen, X. Wang, EE2 Test 2.6g, 2.6h, 2.6i, 2.6j: Combination of tests on LIC improvement, document JVET-AG0276, Joint Video Experts Team (JVET) , Jan. 2024; 3) T. M. Bae, S. Deshpande, EE2-Related: LIC for Screen Content, document JVET-AI0229, Joint Video Experts Team (JVET) , Jul. 2024.

[0102] (4) Regularized least squares methods

[0103] As of ECM version 14.0, a regularized least squares method is introduced. Instead of performing OLS regression, a regularization term is added to the target function to be minimized.

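[0104] One example is L2-regularized least squares:

$$\min_{\beta} \; \lVert y - X\beta \rVert^2 + \lambda \cdot \lVert \beta \rVert^2$$

[0105] In addition to the sum of squared differences, a penalty term represented by the squared L2-norm of the function coefficients multiplied by a regularization parameter λ is added. This problem has a closed-form solution:

$$\beta = (X^T X + \lambda I)^{-1} X^T y$$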

[0106] This problem is also called ridge regression.

[0107] Another example is L1-regularized least squares:

$$\min_{\beta} \; \lVert y - X\beta \rVert^2 + \lambda \cdot |\beta|$$

[0108] In addition to the sum of squared difference, a penalty term represented by the L1-norm of the function coefficients multiplied by a regularization parameter is added. This problem is also called lasso regression.

[0109] A third example is a combination of L2-regularized and L1-regularized least squares:

$$\min_{\beta} \; \lVert y - X\beta \rVert^2 + \lambda_1 \cdot |\beta| + \lambda_2 \cdot \lVert \beta \rVert^2$$

[0110] In addition to the sum of squared differences, a penalty term represented by a linear combination of the L2-norm and the L1-norm of the function coefficients is added. The two multipliers are regularization parameters. This problem is also called elastic net regression.

[0111] Details of L2-regularized least squares for linear model prediction can be obtained from H. Qin, J. Konieczny, K. Ding, Z. Xu, EE2-2.7: Regularized EIP / CCCM, document JVET-AI0066, Joint Video Experts Team (JVET) , Jul. 2024.

[0112] In existing technologies, the decoding process of any of the above-mentioned prediction modes may include: starting decoding a linear model prediction mode; obtaining input samples from neighbouring coded samples; deriving linear model coefficients; generating prediction samples based on the linear model; and finishing decoding. Certain limitations may exist in the process described above, as further elaborated below.

[0113] There may exist mismatch between the cost function of least squares regression and the evaluation metrics of video coding. In linear regression, the most typical cost function is SSD or L2-norm between predicted values and training samples. Meanwhile in video coding, the most frequently used evaluation metric for evaluating the performance of a prediction is SAD or L1-norm between predicted values and training samples. While L1-norm has been empirically shown to be strongly related to the bit rate for coding the residuals, L2-norm is used instead in the above-mentioned linear model prediction modes since a unique and analytical solution exists for least squares method.

[0114] Besides, there may exist overfitting in linear regression. While linear regression is generally effective in predicting data in the near future or in spatial proximity, there is a risk of ‘overfitting’. With an increasing number of coefficients in a linear model, the prediction residual on the test data usually decreases at first but starts to increase after some point. This happens because the model over-fits to the training data and picks up distorted information from input noise. The risk of overfitting is higher when fewer input samples are used to train a relatively larger model. In image and video coding, block sizes in general vary from 4x4 samples to 256x256 samples. The input samples range from dozens to thousands of pixels. Meanwhile, the linear models in different coding tools usually contain two to fifteen coefficients. Therefore, overfitting is more likely to happen with small block sizes and large models.

[0115] Furthermore, L1-based robust regression problems are difficult to solve. Although L1-based robust regression is more resilient to input noise and may alleviate the overfitting problem that comes with a small sample size, there is typically no analytical solution, and sometimes no unique solution, to such minimization problems. Iterative solutions have high computational complexity, which makes them less feasible for implementation, and the lack of uniqueness could break the principle of data compression.

[0116] The present disclosure introduces coefficient update for L1-based robust regression in linear model prediction modes for intra and inter prediction in image and video coding. Different embodiments of the present disclosure may solve one or several of the above-mentioned technical issues.

[0117] FIG. 5 is a flowchart of a decoding method applying linear model coefficient update according to an embodiment of the present disclosure. As shown in FIG. 5, the method includes operations described in blocks S201 to S204.

[0118] In S201, initial values for coefficients of a linear model of a prediction mode are acquired.

[0119] The prediction mode may be, but not limited to, any one of the prediction modes described in foregoing sections, for which the derivation of linear model coefficients by regression is part of the coding process. The initial values for the linear model coefficients β can be obtained from existing methods. The initial linear model coefficients with the initial values can be denoted as β0.

[0120] For example, the initial linear model coefficients can be the solution of ordinary least squares regression, which can be calculated as:

$$\beta_0 = (X^T X)^{-1} X^T y$$

[0121] As another example, the initial linear model coefficients can be the solution of L2-regularized least squares regression, which can be calculated as:

$$\beta_0 = (X^T X + \lambda I)^{-1} X^T y$$
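The two choices of initial coefficients can be sketched with one routine, where a regularization parameter of zero gives the ordinary least squares solution. This is a floating-point illustration only. The names `solve` and `initial_coefficients` and the sample data are assumptions, and each row of `X` is taken to be one input sample with the constant bias regressor last.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def initial_coefficients(X, y, lam=0.0):
    """beta0 = (X^T X + lam * I)^-1 X^T y; lam = 0 is ordinary least squares."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
y = [7.0, 9.0, 11.0, 13.0]          # y = 2 * x + 5
beta_ols = initial_coefficients(X, y)
beta_ridge = initial_coefficients(X, y, lam=10.0)
```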

[0122] In S202, a coefficient update is derived.

[0123] The coefficient update required for doing an L1-based robust regression is the difference between the desired coefficients and the initial coefficients. Given the high dimensionality of the linear model coefficients, it is unfeasible to use a brute-force or unguided search to find the optimal coefficients.

[0124] In some embodiments, the coefficient update may include an update vector. The update vector is a vector of the same length as the linear model coefficients which points approximately to the direction in which the optimal coefficients are located.

[0125] The operation of deriving the update vector may include: determining an objective function associated with the coefficients; and determining the update vector as an update direction in which the objective function is estimated to change most rapidly at a position of the initial values. The objective function represents a difference between reconstructed samples and predicted samples of the prediction mode.

[0126] For example, the base update vector v is the gradient of the objective function S (β) of the corresponding L1-based robust regression with respect to the linear model coefficients, evaluated at the initial coefficients:

$$v = \nabla_{\beta} S(\beta) \big|_{\beta = \beta_0}$$

[0127] In one implementation, a coefficient update is applied for L1-loss regression, in which the objective function is a sum of absolute differences between reconstructed samples and predicted samples. The predicted samples are associated with the coefficients (the predicted samples are calculated from a corresponding linear model utilizing the coefficients) .

[0128] In another implementation, a coefficient update is applied for L1-regularized least squares regression, in which the objective function is a sum of squared differences between reconstructed samples and predicted samples plus a regularization term. The predicted samples are associated with the coefficients (the predicted samples are calculated from a corresponding linear model utilizing the coefficients) .

[0129] Specifically, in the above formula, S (β) may be one of the following: 1) for L1-loss regression, $S(\beta) = |y - X\beta|$ is the sum of absolute differences between the reconstructed samples and the predicted samples in the input samples; 2) for L1-regularized least squares regression, $S(\beta) = \lVert y - X\beta \rVert^2 + \lambda \cdot |\beta|$ is the sum of squared differences between the reconstructed samples and the predicted samples in the input samples plus a regularization term, where λ is the regularization parameter. Here |·| denotes the L1-norm (sum of absolute values) of a vector.

[0130] It should be noted that, in an alternative embodiment, the derivation of the update vector may be performed at the encoding side. The encoder may then transmit values of the update vector or an indication of the update vector to the decoder.

[0131] In some embodiments, the coefficient update may further include an update step size. In this implementation, the updated values for the coefficients may be calculated based on the update vector and the update step size.

[0132] The operation of deriving the update step size may include: using a difference between the initial values of the coefficients and a product of the update step size and the update vector as the argument of the objective function; and determining a value for the update step size that makes a derivative of the objective function with respect to the update step size substantially equal to zero.

[0133] Ideally, the optimal update step size ηopt should be the value that minimizes the objective function when the coefficient update is applied:

$$\beta = \beta_0 - \eta \cdot v$$

[0134] Since the optimal update step size is unknown and an exhaustive search is not feasible, the update step size can be an estimator that minimizes the objective function based on the derivative of the objective function with respect to the update step size.

[0135] For example, the optimal update step size is chosen from one of the values such that the derivative of the objective function over the update step size is zero when the derivative is evaluated at these values.

[0136] Optionally, the list of optimal update step sizes can be reduced by checking the Hessian matrix of the objective function with respect to the updated model coefficients. The Hessian matrix is a p×p matrix whose element at the ith row and jth column is defined as the second order partial derivative of the objective function with respect to the ith and then the jth model coefficient.

[0137] If the Hessian matrix is positive definite, the model coefficients represent a local minimum of the objective function and are retained in the list; otherwise, the model coefficients are removed.

[0138] It should be noted that, in an alternative embodiment, the derivation of the update step size may be performed at the encoding side. The encoder may then transmit values of the update step size or an indication of the update step size to the decoder.

[0139] In S203, updated values for the coefficients of the linear model are acquired based on the coefficient update and the initial values.

[0140] With the update vector v, the update step size and the initial coefficients β0, the updated linear model coefficients can be obtained.

[0141] In one embodiment, one set of updated coefficients may be generated. For example, a set of updated linear model coefficients is obtained by subtracting the update η·v from the initial coefficients, i.e., β=β0-η·v.

[0142] In another embodiment, multiple sets of updated coefficients may be generated. Each set of updated linear model coefficients is obtained by subtracting an update from the initial coefficients, where the update is a multiple m·η·v of the base update, with m∈ {0.5, 1, 1.5, 2}.

[0143] The multiple sets of updated coefficients may be further evaluated to determine which one should be selected for the prediction mode.

[0144] The coder may check the updated coefficients and the initial coefficients, and determine whether a set of updated coefficients is accepted and used to update the linear model to be used in subsequent coding procedures. The process may include: acquiring a first difference between reconstructed samples and predicted samples of the prediction mode within a template area by applying the coefficients with the updated values; acquiring a second difference between reconstructed samples and predicted samples of the prediction mode within the template area by applying the coefficients with the initial values; and determining whether the first difference is less than the second difference. If the first difference is less than the second difference, the coefficients with the updated values may be selected. Otherwise, the coefficients with the initial values may be kept.

[0145] For example, when one set of updated coefficients is generated, if the objective function with the updated coefficients is smaller than that with the initial coefficients, the updated coefficients are used as the output linear model; otherwise, the initial coefficients remain the output linear model.

[0146] As another example, when multiple sets of updated coefficients are generated, the set of coefficients with the smallest objective function value, be it the initial coefficients or a set of updated coefficients, is used as the output linear model coefficients.
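The generation and evaluation of several candidate sets can be sketched as follows, here with the L1 loss as the objective. The function names and the one-coefficient sample data are illustrative assumptions. Index 0 denotes the initial coefficients, so the result is never worse than the starting point on the input samples.

```python
def l1_loss(X, y, beta):
    """Sum of absolute differences between y and the linear prediction X * beta."""
    return sum(abs(yi - sum(x * b for x, b in zip(row, beta)))
               for row, yi in zip(X, y))


def select_coefficients(X, y, beta0, v, eta, multipliers=(0.5, 1.0, 1.5, 2.0)):
    """Return (index, coefficients) of the candidate with the lowest objective."""
    candidates = [list(beta0)]
    for m in multipliers:
        candidates.append([b - m * eta * vj for b, vj in zip(beta0, v)])
    costs = [l1_loss(X, y, c) for c in candidates]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return best, candidates[best]


# One-coefficient model y ~ beta * x with an outlier in the last sample.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 40.0]
beta0 = [188.0 / 30.0]   # ordinary least squares solution
v = [2.0]                # gradient of the L1 loss at beta0
eta = 32.0 / 15.0        # step size from the weighted median of thresholds
best_index, chosen = select_coefficients(X, y, beta0, v, eta)
```

In the embodiment where the index is signaled, `best_index` is the value the encoder would write to the bitstream.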

[0147] As another example, when multiple sets of updated coefficients are generated, an additional integer indicating the index of the set of coefficients is sent to the bitstream. The encoder and the decoder shall use the set of coefficients indicated by the index.

[0148] In S204, the updated values are applied to generate prediction samples of a current block.

[0149] The updated coefficients with the updated values may be used as a replacement to the coefficients from least squares regression. Then the linear model with the updated coefficients may be utilized to generate prediction samples of a current block from reference samples of the current block.

[0150] In one embodiment, coefficient update may be enabled and used unconditionally. That is, whenever least squares regression is used to compute the coefficients of linear models for the generation of predicted samples, the updated coefficients for L1-robust regression may be used instead.

[0151] In another embodiment, the enabling of coefficient update may be derived at the decoder side based on codec configurations. The process may include: determining whether the updated values for the coefficients are enabled based on at least one selected from: sequence configuration; the linear model prediction mode; input sample size; noise level of the input samples; or indication from an encoder.

[0152] For example, the enabling of coefficient update depends on sequence configuration. In VVC or its subsequent standard, coefficient update is enabled when the quantization parameter (QP) is less than a pre-set value, e.g., 30.

[0153] As another example, the enabling of coefficient update depends on the prediction modes. In this example, coefficient update is used on prediction modes with ordinary least squares regression, as shown in Table 1. Table 1

[0154] As another example, the enabling of coefficient update depends on block size or sample size (number of samples) . Coefficient update is made available to all or certain block sizes or sample sizes. For example, coefficient update may be enabled if the number of input samples is not greater than 128, since smaller sample size leads to high risk of overfitting to noise samples.

[0155] As another example, the enabling of coefficient update depends on both the sample size and the number of parameters of the linear model. The ratio between the sample size and the number of parameters of the linear model offers an estimate of the likelihood of overfitting in a linear regression task. Coefficient update is enabled only when the following relation is satisfied:

$$n < T \times p^2$$

where n is the input sample size, p is the number of coefficients in the relevant linear model and T is a positive multiplier threshold.
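The configuration-based enabling rules from the preceding examples can be written as simple predicates. The QP limit of 30 and the sample limit of 128 are the example values from the text. The function names and the value of T are hypothetical, and the relation is read here as n < T × p squared, which is an assumption about the original typesetting.

```python
def enabled_by_qp(qp, qp_threshold=30):
    """Sequence-configuration rule: enable when QP is below a pre-set value."""
    return qp < qp_threshold


def enabled_by_sample_count(n, max_samples=128):
    """Sample-size rule: enable when the number of input samples is small."""
    return n <= max_samples


def enabled_by_ratio(n, p, T):
    """Sample-size versus model-size rule, read as n < T * p^2 (assumed)."""
    return n < T * p * p


print(enabled_by_qp(27), enabled_by_sample_count(128), enabled_by_ratio(64, 7, 2.0))
```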

[0156] In another embodiment, the enabling of coefficient update may be derived at the decoder side based on an analysis of the input samples. In the situation where there is no noise in the input samples, the model coefficients calculated by existing methods and by L1-based robust regression shall converge. In other words, coefficient update is not enabled if the estimated noise in the input samples is below a certain level. For example, the level of noise of the input samples can be evaluated by the regression residuals r obtained with the model coefficients of existing methods:

$$r = y - X\beta_0$$

[0157] If the mean absolute residual or the mean squared residual is under a threshold, i.e., |r| / n<T or ‖r‖² / n<T, coefficient update is not enabled.
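The residual-based check can be sketched as below. The function name, the threshold value and the sample data are assumptions for illustration.

```python
def update_enabled_by_noise(X, y, beta0, threshold, use_squared=False):
    """Enable coefficient update only if the mean residual level reaches the threshold."""
    residuals = [yi - sum(x * b for x, b in zip(row, beta0))
                 for row, yi in zip(X, y)]
    n = len(residuals)
    if use_squared:
        level = sum(r * r for r in residuals) / n    # mean squared residual
    else:
        level = sum(abs(r) for r in residuals) / n   # mean absolute residual
    return level >= threshold


X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
beta0 = [2.0, 1.0]
noise_free = update_enabled_by_noise(X, [3.0, 5.0, 7.0], beta0, threshold=0.5)
noisy = update_enabled_by_noise(X, [3.0, 5.0, 10.0], beta0, threshold=0.5)
```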

[0158] In another embodiment, the enabling of coefficient update may be signaled in the bitstream which is transmitted by the encoder.

[0159] Specifically, the indication for enabling of coefficient update may be signaled in a flag. A flag indicating whether coefficient update is enabled is transmitted in the bitstream. When the flag is ‘true’, coefficient update is used; otherwise, the ordinary least squares method or another existing method is used. For example, in CCCM, the following flags are used to indicate the specific CCCM model, which is one of {BVG-CCCM, CCCM, CCCM-MDF, CCCM-NoSub, GL-CCCM}. After the specific CCCM model is decoded, a flag for coefficient update (CCCM-L1) is decoded. Pseudocode for the process is given in Table 2. Table 2

[0160] When CCCM-L1 flag is ‘true’ , the decoded CCCM variant is executed using coefficient update; otherwise, the decoded CCCM variant is executed using ordinary least squares or other existing method.

[0161] As another example, in CFLM, the following flags as shown in Table 3 are used to indicate whether coefficient update is used. Table 3

[0162] When CFLM-L1 flag is ‘true’ , CFLM is executed using coefficient update; otherwise, CFLM is executed using ordinary least squares or other existing method.

[0163] In addition, the flag can be attached to any other modes listed in Table 1 to indicate whether the decoding process should use coefficient update or not.

[0164] Alternatively, the indication for enabling of coefficient update may be signaled as part of mode selection procedures. In this embodiment, linear model prediction with coefficient update is added as additional modes and the flags or signals being used for identifying the coding mode to be used are updated accordingly.

[0165] For example, in CCCM, the following flags are used to indicate the specific CCCM model, which is from one of {BVG-CCCM, CCCM, CCCM-MDF, CCCM-NoSub, GL-CCCM} . By including CCCM with coefficient update, the following flags are used to indicate one of {BVG-CCCM, CCCM, CCCM-MDF, CCCM-NoSub, GL-CCCM, CCCM-L1} , as shown in Table 4. Table 4

[0166] When CCCM-L1 is ‘true’ , coefficient update is used for performing CCCM prediction.

[0167] Once it is determined that the coefficient update is enabled for the current prediction mode, the updated coefficients with the updated values may be applied to generate prediction samples of a current block from its corresponding reference samples.

[0168] In some embodiments, several parameters of the linear model prediction mode may be changed when L1-robust regression is used instead of ordinary least squares regression. The changes may be made to cater for a different objective function. For example, before the operation of applying the updated values to generate prediction samples of the current block, the method may further include: changing a number of reference lines for the linear model prediction mode.

[0169] In this implementation, the number of reference lines used for linear model prediction is reduced. The benefits of reducing the number of reference lines and thus the size of input samples include reducing encoding and decoding computation complexity and memory usage, as L1-loss regression tends to require fewer samples than ordinary least squares regression. Table 5 shows an example of changing the number of reference lines for several prediction modes. Table 5

[0170] According to the present disclosure, a coefficient update may be derived for updating the linear model coefficients of a prediction mode. The implementation of the present disclosure allows for the adjustment of linear model coefficients based on the coefficient update, which could result in more accurate predictions, and improvement of the overall performance of the decoding system.

[0171] A brief introduction about the process for derivation of the coefficient update has been given in the section with regard to the operation S202. Subsequent sections will offer a more thorough explanation on the application of L1-loss regression and L1-regularized least squares regression, respectively.

[0172] (1) Derivation of update vector and update step size for L1-loss regression

[0173] In L1-loss regression, the objective function is the sum of absolute differences between observed responses and the predicted responses in the input samples. S (β) =|y-Xβ|

[0174] The update vector and update step size are derived from the derivatives of the objective function. Specifically, the base update vector is determined by the gradient of the objective function with respect to the linear model coefficients, and the update step size is determined by the derivative of the objective function with respect to the magnitude of the update vector.

[0175] To find the base update vector, a gradient descent approach is used. For the gradient of the L1 loss S (β) at β, the partial derivative with respect to each coefficient βj is:

$$\frac{\partial S(\beta)}{\partial \beta_j} = -\sum_{i=1}^{n} x_{i,j} \cdot \mathrm{sign}\left(y_i - x_i^T \beta\right)$$

[0176] In the above formula, n is the number of input samples, p is the number of coefficients in the linear model, j∈ [1, p] is the index of each coefficient, xi,j is the jth element of input sample xi and sign (·) is the sign function.

[0177] Combine the partial derivatives of each coefficient back to the vector form to obtain the gradient of S (β) with respect to β:

$$\nabla_{\beta} S(\beta) = -X^T \, \mathrm{sign}(y - X\beta)$$

[0178] As the negative of the gradient points towards the direction in which the L1 loss decreases the fastest, the gradient at the initial coefficients is used as the base update vector:

$$v = \nabla_{\beta} S(\beta) \big|_{\beta = \beta_0} = -X^T \, \mathrm{sign}(y - X\beta_0)$$
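A minimal sketch of the base update vector for L1-loss regression follows, computing the gradient of S (β) =|y-Xβ| at the initial coefficients. The function names and sample values are illustrative assumptions.

```python
def sign(t):
    """Sign function returning -1, 0 or 1."""
    return (t > 0) - (t < 0)


def l1_update_vector(X, y, beta0):
    """v_j = -sum_i x_ij * sign(y_i - x_i . beta0), the gradient of the L1 loss."""
    residual_signs = [sign(yi - sum(x * b for x, b in zip(row, beta0)))
                      for row, yi in zip(X, y)]
    p = len(beta0)
    return [-sum(row[j] * s for row, s in zip(X, residual_signs))
            for j in range(p)]


X = [[1, 1], [2, 1], [3, 1]]
y = [1, 2, 10]
v = l1_update_vector(X, y, [0, 0])   # all residuals are positive here
```

Since the update is β=β0-η·v, a negative entry of v increases the corresponding coefficient for positive η.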

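[0179] To obtain the update step size, consider the cost function evaluated at β=β0-η·v:

$$S(\beta_0 - \eta v) = \sum_{i=1}^{n} \left| y_i - x_i^T (\beta_0 - \eta v) \right| = \sum_{i=1}^{n} \left| r_i + \eta \cdot x_i^T v \right|$$

[0180] To find the optimal value for the update step size η, consider the derivative of S (β0-η·v) with respect to η:

$$\frac{dS(\beta_0 - \eta v)}{d\eta} = \sum_{i=1}^{n} x_i^T v \cdot \mathrm{sign}\left(r_i + \eta \cdot x_i^T v\right)$$

where $r_i = y_i - x_i^T \beta_0$ is the regression residual of the ith input sample under the initial coefficients.

[0181] Therefore, the above derivative of S (β0-η·v) with respect to η can be rewritten as:

$$\frac{dS(\beta_0 - \eta v)}{d\eta} = \sum_{i=1}^{n} \left| x_i^T v \right| \cdot \mathrm{sign}(\eta - \eta_i)$$

[0182] The optimal update step size is obtained when the derivative is approximately zero.

[0183] Therefore, the update step size that approximately minimizes the L1 loss is the weighted median of all step size threshold values {η1, η2, …, ηn}, where $\eta_i = -r_i / (x_i^T v)$, with non-negative weights {w1, w2, …, wn}, where $w_i = |x_i^T v|$.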

[0184] A typical algorithm for finding the weighted median includes a stable sort procedure with O (n·log n) complexity and an accumulate-and-compare procedure with O (n) complexity. Alternatively, it is also desirable to consider low-complexity, stable and parallelizable alternatives that approximate this value.
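The sort-then-accumulate procedure can be sketched as below. The thresholds and weights are derived here from the residuals under β=β0-η·v, i.e., threshold -r_i / (x_i·v) and weight |x_i·v|. This form follows from that update rule and is an assumption about the notation of the original formulas. Samples with x_i·v equal to zero do not depend on η and are skipped.

```python
def weighted_median(values, weights):
    """Smallest value at which the accumulated weight reaches half the total."""
    pairs = sorted(zip(values, weights), key=lambda vw: vw[0])  # sorted() is stable
    half = sum(weights) / 2.0
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value
    raise ValueError("weighted_median needs at least one sample")


def l1_step_size(X, y, beta0, v):
    """Step size minimizing |y - X (beta0 - eta * v)| along the direction v."""
    thresholds, weights = [], []
    for row, yi in zip(X, y):
        r = yi - sum(x * b for x, b in zip(row, beta0))
        u = sum(x * vj for x, vj in zip(row, v))
        if u != 0:
            thresholds.append(-r / u)
            weights.append(abs(u))
    return weighted_median(thresholds, weights) if thresholds else 0.0


# One-coefficient model y ~ beta * x with an outlier in the last sample.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 40.0]
beta0 = [188.0 / 30.0]   # ordinary least squares solution
v = [2.0]                # gradient of the L1 loss at beta0 for these samples
eta = l1_step_size(X, y, beta0, v)
beta = [b - eta * vj for b, vj in zip(beta0, v)]
```

In this one-coefficient example the updated coefficient fits the three consistent samples exactly and ignores the outlier.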

[0185] For example, the update step size is the weighted mean of threshold values {ηi} .

[0186] As another example, the update step size is the arithmetic mean of threshold values {ηi} .

[0187] As another example, the update step size can be chosen from a shortened list. A shortened candidate list of update step sizes is created by taking K evenly spaced values between the minimum and the maximum threshold values:

$$\eta_{min} = \min(\eta_1, \eta_2, \ldots, \eta_n), \qquad \eta_{max} = \max(\eta_1, \eta_2, \ldots, \eta_n)$$

[0188] The update step size is chosen as the candidate that results in the lowest objective function value.

[0189] As a special case, if K=1, the update step size is the average of the minimum and the maximum threshold values.
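The shortened candidate list can be sketched as follows. The exact spacing rule is not given in the text. This sketch assumes K interior points between the minimum and the maximum, which reproduces the special case K=1 as the average of the two. The function name and the toy objective are hypothetical.

```python
def shortlist_step_size(thresholds, K, objective):
    """Pick the best of K evenly spaced interior points in [min, max]."""
    lo, hi = min(thresholds), max(thresholds)
    candidates = [lo + (hi - lo) * k / (K + 1) for k in range(1, K + 1)]
    return min(candidates, key=objective)


thresholds = [0.0, 1.5, 4.0]
midpoint = shortlist_step_size(thresholds, 1, lambda eta: 0.0)
best_of_three = shortlist_step_size(thresholds, 3, lambda eta: abs(eta - 2.9))
```

In practice `objective` would evaluate S (β0-η·v) on the input samples, so the cost is K objective evaluations instead of a sort over n thresholds.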

[0190] (2) Derivation of update vector and update step size for L1-regularized least squares regression

[0191] In L1-regularized least squares regression, the objective function is the sum of squared differences between observed responses and the predicted responses in the input samples plus a regularization term. The regularization term is the L1-norm of the linear model coefficients multiplied by a regularization parameter.

[0192] The update vector and update step size are derived from the derivatives of the objective function. Specifically, the update vector is determined by the gradient of the objective function with respect to the linear model coefficients. The update step size is determined by the derivative of the objective function with respect to the magnitude of the update vector. Optionally, model coefficients can be forced to zero if their magnitudes are small.

[0193] To find the base update vector, a gradient descent approach is used. For the gradient of the loss function of L1-regularized least squares regression, $S(\beta) = \lVert y - X\beta \rVert^2 + \lambda \cdot |\beta|$, at β, the partial derivative with respect to each coefficient βj is:

$$\frac{\partial S(\beta)}{\partial \beta_j} = -2 \sum_{i=1}^{n} x_{i,j} \left( y_i - \sum_{k=1}^{p} x_{i,k} \beta_k \right) + \lambda \cdot \mathrm{sign}(\beta_j)$$

[0194] In the formula, n is the number of input samples, p is the number of coefficients in the linear model, j∈ [1, p] is the index of each coefficient, xi,k is the kth element of input sample xi and sign (·) is the sign function.

[0195] Combine the partial derivatives of each coefficient back to the vector form to obtain the gradient of S (β) with respect to β:

$$\nabla_{\beta} S(\beta) = -2 X^T (y - X\beta) + \lambda \cdot \mathrm{sign}(\beta)$$

[0196] If β0 is obtained from ordinary least squares regression, the gradient can be simplified since

$$X^T (y - X\beta_0) = 0$$

[0197] As the negative of the gradient points towards the direction in which the objective function decreases the fastest, the gradient at the initial coefficients is used as the base update vector:

$$v = \nabla_{\beta} S(\beta) \big|_{\beta = \beta_0} = -2 X^T (y - X\beta_0) + \lambda \cdot \mathrm{sign}(\beta_0)$$

which reduces to v=λ·sign (β0) when β0 is the ordinary least squares solution.
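The base update vector for L1-regularized least squares can be sketched as below, assuming the objective is the plain sum of squared differences plus λ times the L1-norm of the coefficients, with no additional scaling factor. The function names and sample data are illustrative.

```python
def sign(t):
    """Sign function returning -1, 0 or 1."""
    return (t > 0) - (t < 0)


def lasso_update_vector(X, y, beta0, lam):
    """Gradient of ||y - X b||^2 + lam * |b| at beta0 (assumed objective scaling)."""
    residuals = [yi - sum(x * b for x, b in zip(row, beta0))
                 for row, yi in zip(X, y)]
    p = len(beta0)
    return [-2.0 * sum(row[j] * r for row, r in zip(X, residuals))
            + lam * sign(beta0[j]) for j in range(p)]


X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
y = [7.0, 9.0, 11.0, 13.0]
beta_ols = [2.0, 5.0]    # ordinary least squares solution for these samples
v = lasso_update_vector(X, y, beta_ols, lam=0.5)
```

With least squares initial coefficients the data term vanishes, so only λ·sign (β0) remains.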

[0198] To obtain the update step size, consider the objective function of L1-regularized least squares regression evaluated at the desired linear model coefficients

[0199] Let regression residual and thus the above equation can be reorganized as:

[0200] The derivative of with respect to update step size η is:

[0201] Let a=λ²·‖X·sign (β0) ‖²:

[0202] In the formula, N (η) =n ( {β|β∈β0, |β|<η·λ} ) is the number of coefficients in β0 whose absolute value is smaller than η·λ (n (A) denotes the number of elements in set A). Therefore, N (η) is a step function and N (0) =0. As η increases, the value of N (η) increases by 1 every time η passes through a threshold determined by one of the coefficients in β0.

[0203] The derivative with respect to update step size η is a linear function plus a step function with multiple thresholds. The derivative is negative when η=0 and will eventually become positive as η increases and thus the objective function is a convex function of η. The optimal update step size is obtained when the derivative is zero. The solution is given as follows. wj=|a·tj+1-λ· (p-2·j) | wj+1=|a·tj-λ· (p-2·j) |
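The search for the zero of a derivative that is a linear function plus a step function can be sketched as a walk over the segments between thresholds. This is not the closed-form solution of the text. It is a generic line search under the assumption that the objective is ‖y-Xβ‖²+λ·|β| evaluated at β0-η·v, and all names are hypothetical.

```python
def sign(t):
    return (t > 0) - (t < 0)


def lasso_step_size(X, y, beta0, v, lam):
    """Non-negative eta minimizing ||y - X b||^2 + lam * |b| at b = beta0 - eta * v."""
    Xv = [sum(x * vj for x, vj in zip(row, v)) for row in X]
    r = [yi - sum(x * b for x, b in zip(row, beta0)) for row, yi in zip(X, y)]
    A = sum(u * u for u in Xv)                        # curvature of the data term
    c = 2.0 * sum(ri * ui for ri, ui in zip(r, Xv))   # data-term slope at eta = 0
    # Thresholds where one coefficient beta0_j - eta * v_j crosses zero.
    breaks = sorted({b / vj for b, vj in zip(beta0, v) if vj != 0 and b / vj > 0})
    edges = [0.0] + breaks + [float("inf")]
    for lo, hi in zip(edges, edges[1:]):
        mid = lo + 1.0 if hi == float("inf") else 0.5 * (lo + hi)
        # Slope of the L1 term is constant inside a segment.
        l1_slope = lam * sum(-vj * sign(b - mid * vj) for b, vj in zip(beta0, v))
        if c + 2.0 * A * lo + l1_slope >= 0:
            return lo                                 # derivative changes sign at lo
        if A > 0:
            root = -(c + l1_slope) / (2.0 * A)
            if root <= hi:
                return root                           # zero derivative inside segment
    return edges[-2]


X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
y = [7.0, 9.0, 11.0, 13.0]
beta0 = [2.0, 5.0]       # ordinary least squares solution
lam = 0.5
v = [lam * sign(b) for b in beta0]
eta = lasso_step_size(X, y, beta0, v, lam)
```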

[0204] L1-regularization is a widely used technique that adaptively reduces the number of active (non-zero) coefficients: the L1-norm of the coefficients in the penalty term pushes some coefficients towards zero. This effect of automatically selecting the most relevant coefficients while suppressing others can largely alleviate the overfitting problem.

[0205] In some embodiments, after the operation of acquiring updated values for the coefficients of the linear model prediction mode, the process may further include: determining whether an element of the updated values for the coefficients is less than a pre-set threshold; and responsive to the element of the updated values for the coefficients being less than the pre-set threshold, setting the element to zero.

[0206] One of the benefits of L1-regularization in linear regression is ‘model selection’. Some of the model coefficients become zero when the L1-regularized cost function is minimized. Since the procedures presented in this invention are an approximation to the unknown solution, it is desirable to snap some of the coefficients with small magnitude to zero. As a side benefit, the processing time for computing the prediction values is reduced since some coefficients are effectively removed from the model. The process is as follows: β′j = 0 if |βj| < T, and β′j = βj otherwise, where β′j is the new coefficient that replaces βj in the output model and T is a threshold.
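
The snapping step of [0205] and [0206] can be illustrated with a short Python sketch. The function name snap_small_coefficients and the sample values are hypothetical, and the comparison is assumed to be on the magnitude of each coefficient.

```python
def snap_small_coefficients(beta, threshold):
    """Replace every coefficient whose magnitude is below threshold with zero."""
    return [0.0 if abs(b) < threshold else b for b in beta]


snapped = snap_small_coefficients([0.82, -0.004, 0.0009, -0.31], threshold=0.01)
# snapped == [0.82, 0.0, 0.0, -0.31]
```

A predictor can then skip the terms whose coefficient is zero, which is where the reduction in processing time comes from.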

[0207] In some embodiments, after the operation of acquiring updated values for the coefficients of the linear model prediction mode, the process may further include: setting an element of the updated values for the coefficients which corresponds to a bias term to zero. Alternatively, the process may further include: adjusting an element of the updated values for the coefficients which corresponds to a bias term based on input samples. Detailed explanations are given below.

[0208] In a first option, the bias term is relaxed in L1-regularized least squares regression. The magnitude of the coefficient associated with the bias term is excluded from the objective function. The bias term refers to the element of the feature vector (regressor) x that is the same constant for all input samples. Without loss of generality, the bias term is designated as the last element in the regressor and thus the last coefficient of the linear model is associated with the bias term. By this definition, the objective function is changed to: where βj are linear model coefficients.

[0209] The base update vector is then changed to:

[0210] By replacing the base update vector using the one with relaxed bias term, the condition for determining the update step size is changed accordingly: where and N(η) = n({β | β ∈ {β0,1, β0,2, …, β0,p-1}, |β| < η·λ}).

[0211] In a second option, the coefficient associated with the bias term can be recalculated after L1-regularization coefficient update. The benefit for recalculating the coefficient associated with the bias term is to remove the mean bias in the prediction model.

[0212] For example, the updated coefficient can be calculated by back-substituting the updated model into the input samples to obtain the average prediction residual, and then adding the required adjustment value to the previously obtained coefficient.
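
A minimal Python sketch of this second option follows. It assumes that the bias regressor is the last element of every feature vector and has the same nonzero value in every input sample; the function name recalculate_bias and the sample data are hypothetical.

```python
def recalculate_bias(X, y, beta):
    """Adjust the last coefficient so that the mean prediction residual becomes zero."""
    bias_value = X[0][-1]  # assumed constant and nonzero across all rows
    residuals = [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]
    mean_residual = sum(residuals) / len(residuals)
    adjusted = list(beta)
    adjusted[-1] += mean_residual / bias_value
    return adjusted


X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]  # last column is the bias regressor
y = [2.5, 4.5, 6.5]
adjusted = recalculate_bias(X, y, [2.0, 0.0])
# adjusted == [2.0, 0.5]; the mean residual of the adjusted model is zero
```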

[0213] Furthermore, in some embodiments, after the operation of acquiring updated values for the coefficients of the linear model prediction mode, the process may further include: determining a regularization parameter for the linear model prediction mode. The regularization parameter may be determined based on at least one selected from: a pre-set value; image-bit depth; number of elements of the coefficients; input sample size.

[0214] The regularization parameter λ may be determined by one of the following factors: image bit depth; input sample size; model size (the number of coefficients); a combination of the previous factors; none of these factors (constant value). The objective function of L1-regularized least squares regression includes a squared error term and a penalty term. The squared error term is roughly proportional to input sample size and the square of pixel value dynamic range, while the penalty term is roughly proportional to model size and pixel value dynamic range. Making regularization parameter λ dependent on these factors is a necessary step to equalize the impact of regularization for different images and videos, block sizes, etc.

[0215] In a first option, the regularization parameter λ is configured to be a constant value. That is: λ=C where C may be a pre-set constant.

[0216] In a second option, the regularization parameter λ depends on the image bit depth. In general, λ is positively correlated to the image bit depth. For example: λ=C·2^D where D is the image bit depth and C is a constant value.

[0217] In a third option, the regularization parameter λ depends on the number of model coefficients. In general, λ is positively correlated to the number of model coefficients. For example: λ=C·p where p is the number of coefficients of the prediction model and C is a constant value.

[0218] In a fourth option, the regularization parameter λ depends on the input sample size. In general, λ is negatively correlated to the input sample size. For example: λ=C / n where n is the number of input samples and C is a constant value.

[0219] In a fifth option, the regularization parameter λ depends on the combination of the image bit depth, the number of model coefficients and the input sample size. where D is the image bit depth, p is the number of coefficients of the prediction model, n is the number of input samples and C is a constant value.

[0220] In a sixth option, a look-up table is used to determine the regularization parameter. For example, the regularization parameter can be snapped to a power of 2 so that shifting can be used in place of multiplication. where denotes rounding down to the nearest integer, D is the image bit depth, p is the number of coefficients of the prediction model, n is the number of input samples and C is a constant value.
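
The first four options can be written down directly, as in the Python sketch below. The function names are hypothetical. The combined formula of the fifth option and the exact look-up rule of the sixth option are not reproduced here; snap_to_power_of_two only illustrates, as an assumption, how a positive value can be rounded down to a power of 2 so that a shift can replace a multiplication.

```python
import math


def lambda_constant(C):
    """First option: a pre-set constant."""
    return C


def lambda_bit_depth(C, D):
    """Second option: scales with 2 to the power of the image bit depth D."""
    return C * 2 ** D


def lambda_model_size(C, p):
    """Third option: proportional to the number of model coefficients p."""
    return C * p


def lambda_sample_size(C, n):
    """Fourth option: inversely proportional to the number of input samples n."""
    return C / n


def snap_to_power_of_two(value):
    """Round a positive value down to a power of 2; return (exponent, snapped value)."""
    exponent = math.floor(math.log2(value))
    return exponent, 2.0 ** exponent


lam_depth = lambda_bit_depth(0.5, 10)   # 512.0
lam_size = lambda_sample_size(64, 16)   # 4.0
shift, lam_pow2 = snap_to_power_of_two(100)  # (6, 64.0)
```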

[0221] The present disclosure further provides a video encoding method applying linear model coefficient update. FIG. 6 is a flowchart of an encoding method applying linear model coefficient update according to an embodiment of the present disclosure. As shown in FIG. 6, the method may include operations described in blocks S301 to S304.

[0222] In S301, initial values for coefficients of a linear model of a prediction mode are acquired.

[0223] In S302, a coefficient update is derived.

[0224] In S303, updated values for the coefficients of the linear model are acquired based on the coefficient update and the initial values.

[0225] In S304, the updated values are applied to generate prediction samples of a current block.

[0226] The process for linear model coefficient update involved in the encoding process is similar to that involved in the decoding process as described with regard to FIG. 5. Thus, for simplicity and clarity, detailed explanation about each operation will not be repeated herein.

[0227] The encoding system may apply the linear model coefficient update, and then use the updated coefficient to generate prediction samples of a current block. The residual may be calculated by subtracting the predicted samples from the original samples in the current block. Then the encoder may transmit the residual to the decoder.
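
Operations S303 and S304 and the residual calculation can be sketched in Python as follows. The sketch assumes that the coefficient update has already been derived (S302) as an update vector and an update step size, and that the updated coefficients are the initial values minus the product of the two; the function names and the sample data are hypothetical.

```python
def update_coefficients(beta0, update_vector, step_size):
    """S303: combine the initial values with the coefficient update."""
    return [b - step_size * d for b, d in zip(beta0, update_vector)]


def predict_block(features, beta):
    """S304: apply the linear model to the feature vector of each sample."""
    return [sum(f * b for f, b in zip(row, beta)) for row in features]


def encode_block(original, features, beta0, update_vector, step_size):
    """Return updated coefficients, prediction samples and the residual to transmit."""
    beta = update_coefficients(beta0, update_vector, step_size)
    prediction = predict_block(features, beta)
    residual = [o - q for o, q in zip(original, prediction)]
    return beta, prediction, residual


features = [[64, 1], [128, 1]]  # one reference sample plus a bias regressor
beta, prediction, residual = encode_block(
    original=[33, 55],
    features=features,
    beta0=[0.5, 8.0],            # S301: initial coefficient values
    update_vector=[0.25, 0.0],
    step_size=0.5,
)
# beta == [0.375, 8.0], prediction == [32.0, 56.0], residual == [1.0, -1.0]
```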

[0228] In some embodiments, the encoder may transmit information related to the coefficient update (i.e., the update vector or the update step size) to the decoder. Thus, the decoder may not need to perform the derivation of the coefficient update.

[0229] In the present disclosure, L1-based robust regression includes the following types of regression: 1) L1-loss regression, where the objective function is the sum of absolute differences between the observed response and the predicted response in the input samples; 2) L1-regularized least squares regression, where the objective function is the sum of squared differences between the observed response and the predicted response in the input samples plus a regularization term defined as a multiple of the sum of absolute values of the model coefficients.

[0230] Using L1-loss regression for certain modes in image and video compression can improve compression efficiency based on one or several sources of improvements as explained below.

[0231] In linear regression, minimizing L1-loss is known to be more resilient to noise in the input samples than minimizing L2-loss, as is done in the prior art, because the prediction residuals of noisy samples are not squared and therefore contribute less to the regression loss.
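
The difference is easiest to see for the simplest possible model, a single constant: the value that minimizes the L2-loss is the mean of the samples, while the value that minimizes the L1-loss is their median. The Python sketch below uses made-up sample values with one noisy sample.

```python
import statistics

samples = [100, 101, 99, 100, 180]  # the last sample is noise


def l2_loss(c):
    return sum((s - c) ** 2 for s in samples)


def l1_loss(c):
    return sum(abs(s - c) for s in samples)


l2_fit = statistics.mean(samples)    # 116, pulled towards the noisy sample
l1_fit = statistics.median(samples)  # 100, unaffected by the noisy sample
```

The single noisy sample shifts the L2 fit by 16 while leaving the L1 fit on the value shared by the clean samples.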

[0232] Since the impact of noise samples is better suppressed in L1-loss regression, fewer input samples are required to find stable prediction models. In the context of image and video compression, the coder can avoid using input samples that are relatively far away from the samples to be predicted. Therefore, the statistical features of input samples and the samples to be predicted will be more positively correlated.

[0233] As a positive side effect of using fewer input samples, the encoding and / or decoding time and the memory requirement are also reduced.

[0234] As to L1-regularized least squares regression, compression efficiency can be improved based on one or several sources of improvements as explained below.

[0235] Adding an L1-regularization term to the objective function has the power of ‘model selection’. In several linear model prediction modes, the number of model coefficients is relatively large. Ordinary least squares regression may overfit the input samples when the model size is large relative to the sample size. L1-regularized least squares regression can alleviate the overfitting effect by automatically pushing some of the model coefficients towards zero.

[0236] Besides, similar to L1-loss regression, L1-regularized least squares regression also has the benefits of suppressing noise samples and requiring fewer input samples.

[0237] As the solution of L1-based robust regression is not necessarily unique and may not be obtained under constraints imposed on computation complexity, the coefficient update methods introduced in the present disclosure can provide one or more of the following benefits.

[0238] 1) A close approximation of L1-based robust regression is generated under controllable computation time and memory consumption.

[0239] 2) The approximated solution is unique, highly parallelizable and thus suitable in image and video coding applications.

[0240] The present disclosure may be used in various codecs, including proprietary ones, and standardized video coding solutions (e.g., MPEG / ISO / IEC, AOM, AVS). Solutions could be obtained by applying one or several embodiments to a selected video codec. Combinations of embodiments could also be applied to a selected video coding framework. The solution can also be used for multiple prediction modes as shown in Table 6.

[0241] FIG. 7 conceptually illustrates an apparatus 400 with which some embodiments of the invention are implemented. The apparatus 400 may be a video decoder or a video encoder. The apparatus 400 may be a computer (e.g., a desktop computer, personal computer, tablet computer, etc.), phone, PDA, or any other sort of electronic device. Such an apparatus includes various types of computer readable media and interfaces for various other types of computer readable media. The apparatus 400 includes a processor 402 and a memory 404. The memory 404 is configured to store executable instructions that, when executed by the processor, cause the processor to perform any one of the foregoing decoding or encoding methods.

[0242] The processor 402 may be a single processor or a multi-core processor in different embodiments. In some embodiments, the processor may include a GPU, NPU or DSP which may offload various computations or complement the image processing provided by the processor 402.

[0243] Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media) . Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM) , recordable compact discs (CD-R) , rewritable compact discs (CD-RW) , read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM) , a variety of recordable / rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc. ) , flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc. ) , magnetic and / or solid state hard drives, read-only and recordable discs, ultra density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

[0244] While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) . In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In addition, some embodiments execute software stored in programmable logic devices (PLDs) , ROM, or RAM devices.

[0245] As used in this specification and any claims of this application, the terms “computer” , “server” , “processor” , and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. As used in this specification and any claims of this application, the terms “computer readable medium, ” “computer readable media, ” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.

[0246] The present disclosure further provides a computer readable medium which is configured to store executable instructions. When the instructions are executed by a processor, the processor may perform any one of the foregoing methods and processes. Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more computational or processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random access memory (RAM) chips, hard drives, erasable programmable read only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), etc. The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.

[0247] In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.

[0248] While the disclosure has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. In addition, a number of the figures conceptually illustrate processes and methods. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process.

[0249] The foregoing describes merely embodiments of the present disclosure and is not intended to limit the scope of the disclosure. Any transformation of equivalent structure or equivalent process made using the specification and the accompanying drawings of the present disclosure, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the present disclosure.

Claims

1. A decoding method, comprising: acquiring initial values for coefficients of a linear model of a prediction mode; deriving a coefficient update; acquiring updated values for the coefficients of the linear model of the prediction mode based on the coefficient update and the initial values; and applying the updated values to generate prediction samples of a current block.

2. The method of claim 1, wherein the coefficient update comprises an update vector.

3. The method of claim 2, wherein the deriving the coefficient update comprises: determining an objective function associated with the coefficients, wherein the objective function represents a difference between reconstructed samples and predicted samples of the prediction mode; and determining the update vector as an update direction in which the objective function is estimated to change most rapidly at a position of the initial values.

4. The method of claim 3, wherein the objective function is a sum of absolute differences between reconstructed samples and predicted samples, wherein the predicted samples are associated with the coefficients.

5. The method of claim 3, wherein the objective function is a sum of squared differences between reconstructed samples and predicted samples plus a regularization term, wherein the predicted samples are associated with the coefficients.

6. The method of claim 2, wherein the coefficient update further comprises an update step size; wherein the updated values for the coefficients are calculated based on the update vector and the update step size.

7. The method of claim 6, wherein the deriving the coefficient update comprises: using a difference between the initial values of the coefficients and a product of the update step size and the update vector as a variant of the objective function; determining a value for the update step size that makes a derivative of the objective function with respect to the update step size substantially equal to zero.

8. The method of claim 1, before the applying the updated values to generate prediction samples of the current block, further comprising: acquiring a first difference between reconstructed samples and predicted samples of the prediction mode within a template area by applying the coefficients with the updated values; acquiring a second difference between reconstructed samples and predicted samples of the prediction mode within the template area by applying the coefficients with the initial values; and determining whether the first difference is less than the second difference.

9. The method of claim 1, after the acquiring updated values for the coefficients of the linear model, further comprising: determining whether an element of the updated values for the coefficients is less than a pre-set threshold; and responsive to the element of the updated values for the coefficients being less than the pre-set threshold, setting the element to zero.

10. The method of claim 1, after the acquiring updated values for the coefficients of the linear model, further comprising: setting an element of the updated values for the coefficients which corresponds to a bias term to zero.

11. The method of claim 1, after the acquiring updated values for the coefficients of the linear model, further comprising: adjusting an element of the updated values for the coefficients which corresponds to a bias term based on input samples.

12. The method of claim 1, after the acquiring updated values for the coefficients of the linear model, further comprising: determining a regularization parameter for the linear model; wherein the regularization parameter is determined based on at least one selected from: a pre-set value; image-bit depth; number of elements of the coefficients; input sample size.

13. The method of claim 1, before the applying the updated values to generate prediction samples of the current block, further comprising: determining whether the updated values for the coefficients are enabled based on at least one selected from: sequence configuration; the linear model; input sample size; noise level of input samples; or indication from an encoder.

14. The method of claim 1, before the applying the updated values to generate prediction samples of the current block, further comprising: changing a number of reference lines for the linear model.

15. An encoding method, comprising: acquiring initial values for coefficients of a linear model of a prediction mode; deriving a coefficient update; acquiring updated values for the coefficients of the linear model of a prediction mode based on the coefficient update and the initial values; and applying the updated values to generate prediction samples of a current block.

16. An apparatus for video decoding or encoding, comprising a processor and a memory, wherein the memory is configured to store executable instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1 to 15.

17. A computer readable non-transitory medium storing executable instructions that, when executed by a processor, cause the processor to perform the method of any of claims 1 to 15.