Sub-sampling for implicit neural representation image encoding

By using an iterative optimization process and a sample subset selection method in the implicit neural representation, the complexity of parameter learning is reduced, the encoding and decoding efficiency is improved, and the efficiency problem of INR networks in real-time video applications is solved.

CN122249813A · Pending · Publication Date: 2026-06-19 · INTERDIGITAL CE PATENT HOLDINGS SAS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
INTERDIGITAL CE PATENT HOLDINGS SAS
Filing Date
2024-11-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing implicit neural representation (INR) networks have high parameter learning complexity on the encoder side, which becomes a bottleneck and affects efficiency, especially in real-time video applications.

Method used

The parameters of the implicit neural representation are learned through an iterative optimization process in which the loss function is computed on a subset of the image samples, and the subset can grow over successive iterations. The samples of the subset can be selected randomly, based on their contribution to the loss, or so that they are evenly distributed over the signal.

Benefits of technology

It reduces the learning complexity of INR network parameters, improves encoding and decoding efficiency, and is suitable for real-time video applications.

✦ Generated by Eureka AI based on patent content.


Abstract

A method includes: obtaining (41) an input signal represented by a set of image samples; learning (42) parameters of an implicit neural representation using an iterative optimization process based on a loss function, the implicit neural representation allowing values representing the image samples to be derived from the coordinates of the image samples; and signaling (43) the learned parameters in a dataset; wherein, during the learning of the parameters, a subset of image samples from the set is used to compute the loss function in at least one iteration of the iterative optimization process.

Description

Cross-reference to related applications

[0001] This application claims priority to European application No. 23307059.8, filed on 27 November 2023, which is incorporated herein by reference in its entirety.

Technical Field

[0002] At least one of the present embodiments generally relates to a method and an apparatus for encoding and decoding image or video data based on implicit neural representations.

Background

[0003] Implicit Neural Representation (INR)-based compression techniques are relatively new and can be applied to 2D images, videos, 3D scenes, or objects. These techniques have a significantly lower computational complexity than compression methods based on end-to-end neural networks.

[0004] INR networks are typically neural networks composed of multiple neural layers, such as fully connected layers. Each neural layer can be described as a function that first multiplies the input signal by a tensor, adds a vector called the bias, and then applies a nonlinear function to the resulting value. The shape (and other properties) of the tensor and the type of nonlinear function are referred to as the network's architecture.
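The layer operation described in paragraph [0004] can be sketched in a few lines. This is only an illustration: the layer size, the weight values, and the choice of ReLU as the nonlinear function are assumptions made for the example and are not taken from the application.

```python
def relu(value):
    """Nonlinear function applied to each output of the layer (assumed choice)."""
    return max(0.0, value)


def dense_layer(inputs, weights, bias, activation):
    """Compute activation(W @ inputs + bias) for one fully connected layer."""
    outputs = []
    for row, offset in zip(weights, bias):
        # Multiply the input by one row of the weight tensor, then add the bias.
        pre_activation = sum(w * x for w, x in zip(row, inputs)) + offset
        outputs.append(activation(pre_activation))
    return outputs


# Hypothetical 2-input, 3-output layer applied to a pixel coordinate (x, y).
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.5, 0.25]
hidden = dense_layer([0.5, -1.0], weights, bias, relu)
print(hidden)  # [0.5, 0.0, 0.0]
```

Stacking several such layers, with the shapes of `weights` and the choice of `activation` fixed in advance, corresponds to what the paragraph calls the network's architecture.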

[0005] Typically, parameters describing the INR network are learned for the input signal, and it is assumed that the characteristics of the INR network are provided to the decoding unit responsible for reconstructing the input signal. These parameters are used to generate a reconstructed version of the input signal.

[0006] INR networks are known to be far less complex than other end-to-end neural networks, which makes them a candidate for implementing neural networks on the decoder side. However, learning the parameters of the INR network makes its use on the encoder side much more complex. This complexity can be problematic for applications such as video streaming and real-time video applications.

[0007] It is desirable to propose solutions that overcome the aforementioned problems. In particular, it is desirable to propose a solution that reduces the complexity of learning the INR network parameters.

Summary of the Invention

[0008] In a first aspect, one or more of the present embodiments provide a method comprising: obtaining an input signal represented by a set of image samples; learning parameters of an implicit neural representation using an iterative optimization process based on a loss function, the implicit neural representation allowing values representing the image samples to be derived from the coordinates of the image samples; and signaling the learned parameters in a dataset; wherein, during the learning of the parameters, a subset of image samples from the set is used to compute the loss function in at least one iteration of the iterative optimization process.
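A minimal sketch of the three steps of the first aspect follows, under strong simplifying assumptions: the "INR" is a toy three-parameter linear model rather than a neural network, the optimizer is plain gradient descent on a mean squared error, and the subset is drawn at random with a fixed size. The names `inr`, `learn_parameters`, and the `"inr_parameters"` key are hypothetical and chosen only for the illustration.

```python
import random


def inr(params, x, y):
    """Toy stand-in for an INR: maps a coordinate (x, y) to a sample value."""
    a, b, c = params
    return a * x + b * y + c


def mse(params, samples):
    """Mean squared error over a list of (x, y, value) samples."""
    return sum((inr(params, x, y) - v) ** 2 for x, y, v in samples) / len(samples)


def learn_parameters(samples, subset_size, iterations, lr=0.1, seed=0):
    """Gradient descent in which each iteration's loss uses only a random subset."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        subset = rng.sample(samples, subset_size)
        grad = [0.0, 0.0, 0.0]
        for x, y, v in subset:
            err = inr(params, x, y) - v
            # Gradient of the squared error with respect to (a, b, c).
            for k, feature in enumerate((x, y, 1.0)):
                grad[k] += 2.0 * err * feature / subset_size
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Step 1: obtain an input signal (a synthetic 8x8 image on normalized coordinates).
size = 8
coords = [2.0 * i / (size - 1) - 1.0 for i in range(size)]
image = [(x, y, 0.3 * x - 0.2 * y + 0.5) for y in coords for x in coords]

# Step 2: learn the parameters, evaluating the loss on 16 of the 64 samples per iteration.
learned = learn_parameters(image, subset_size=16, iterations=300)

# Step 3: place the learned parameters in a dataset to be signaled.
dataset = {"inr_parameters": learned}
full_loss = mse(learned, image)
```

Each iteration touches a quarter of the samples, which is where the reduction in encoder-side work would come from; `full_loss` is evaluated on the whole image only to check the result.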

[0009] In one embodiment, in response to using a subset of image samples from the set to compute the loss function in multiple successive iterations of the iterative optimization process, the number of image samples in the subset increases with each successive iteration.
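One way to picture a subset that grows over successive iterations is a fixed schedule of subset sizes. The linear ramp below, its `start_fraction` parameter, and the function name are assumptions for illustration; the application does not prescribe a particular growth law.

```python
def subset_size_schedule(total_samples, iterations, start_fraction=0.25):
    """Return one subset size per iteration, growing linearly up to the full set."""
    start = max(1, round(total_samples * start_fraction))
    if iterations == 1:
        return [start]
    span = total_samples - start
    # Integer interpolation between the starting size and the full set.
    return [start + span * i // (iterations - 1) for i in range(iterations)]


sizes = subset_size_schedule(total_samples=64, iterations=5)
print(sizes)  # [16, 28, 40, 52, 64]
```

When there are more iterations than samples left to add, consecutive entries of this schedule can repeat, so it is non-decreasing rather than strictly increasing in that case.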

[0010] In an embodiment, the number of image samples in the subset is a proportion of the number of samples representing the input signal, or depends on information representing the complexity of the input signal.
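The two options for sizing the subset can be sketched as follows. The complexity measure used here (mean absolute difference between horizontally adjacent samples, for sample values in the range 0 to 1) and the mapping from that measure to a proportion are assumptions chosen for the example; the application does not specify how complexity is measured.

```python
def subset_size_from_proportion(num_samples, proportion):
    """Subset size as a fixed proportion of the number of samples."""
    return max(1, min(num_samples, round(num_samples * proportion)))


def horizontal_activity(image):
    """Mean absolute difference between horizontally adjacent samples."""
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def subset_size_from_complexity(image, min_proportion=0.1, max_proportion=0.5):
    """Use a larger proportion of the samples for more active (complex) images."""
    num_samples = sum(len(row) for row in image)
    activity = min(1.0, horizontal_activity(image))
    proportion = min_proportion + (max_proportion - min_proportion) * activity
    return subset_size_from_proportion(num_samples, proportion)


flat = [[0.5] * 4 for _ in range(4)]
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
print(subset_size_from_complexity(flat))     # 2
print(subset_size_from_complexity(checker))  # 8
```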

[0011] In one embodiment, the number of image samples in the subset is increased based on a comparison between a first value and a second value provided by the loss function.

[0012] In one embodiment, the number of image samples in the subset is increased by monitoring the rate at which the value provided by the loss function decreases.
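Paragraphs [0011] and [0012] tie the growth of the subset to the behavior of the loss. A hedged sketch: compare the loss values of two successive iterations and enlarge the subset once the relative decrease falls below a threshold. The direction of the rule (grow when progress slows), the threshold, and the doubling factor are all assumptions for illustration.

```python
def next_subset_size(current_size, total_samples, previous_loss, current_loss,
                     min_relative_decrease=0.01, growth_factor=2.0):
    """Grow the subset when the loss is no longer decreasing fast enough."""
    if previous_loss <= 0.0:
        return current_size  # nothing left to improve, keep the current subset
    relative_decrease = (previous_loss - current_loss) / previous_loss
    if relative_decrease >= min_relative_decrease:
        return current_size  # still converging quickly on the current subset
    grown = max(current_size + 1, int(current_size * growth_factor))
    return min(total_samples, grown)


print(next_subset_size(16, 64, previous_loss=1.0, current_loss=0.5))    # 16
print(next_subset_size(16, 64, previous_loss=1.0, current_loss=0.999))  # 32
print(next_subset_size(48, 64, previous_loss=1.0, current_loss=1.0))    # 64
```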

[0013] In an embodiment, during iteration, the image samples in the subset are selected in the following ways: randomly, or based on the contribution of the image samples to a first value provided by the loss function, or based on image samples used in the subset in at least one previous iteration, or in such a way that the samples in the subset are uniformly distributed on the input signal, or in response to the input signal being at least one partition of an image, the image samples in the subset are selected based on the position of the image samples in each partition.
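Three of the selection options listed in paragraph [0013] can be sketched as small helper functions: random selection, selection by contribution to the loss, and selection on a regular grid so that the samples are evenly spread. The function names, the use of per-sample errors as the "contribution", and the regular-grid interpretation of "uniformly distributed" are assumptions for this example.

```python
import random


def select_random(num_samples, k, rng):
    """Pick k distinct sample indices at random."""
    return sorted(rng.sample(range(num_samples), k))


def select_by_contribution(per_sample_errors, k):
    """Pick the k samples that contribute most to the current loss value."""
    order = sorted(range(len(per_sample_errors)),
                   key=lambda i: per_sample_errors[i], reverse=True)
    return sorted(order[:k])


def select_uniform(width, height, step):
    """Pick (x, y) positions on a regular grid spread evenly over the image."""
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]


print(select_by_contribution([0.1, 0.9, 0.3, 0.7], 2))  # [1, 3]
print(select_uniform(4, 4, 2))  # [(0, 0), (2, 0), (0, 2), (2, 2)]
random_indices = select_random(64, 8, random.Random(0))
```

Reusing samples from a previous iteration, the remaining option in the paragraph, would amount to seeding the new subset with the indices returned by an earlier call.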

[0014] In an embodiment, the image samples of the subset that are selected based on the position of the image samples in each partition are image samples located at the boundaries of the partitions or image samples corresponding to edges included in the partitions.
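For the boundary-based option, a sketch that lists the sample positions on the border of one partition is given below. It assumes rectangular partitions described by a top-left corner and a size, which the paragraph does not state; detecting edges inside a partition is not covered here.

```python
def boundary_positions(x0, y0, width, height):
    """Positions of the samples on the outer border of a rectangular partition."""
    last_x = x0 + width - 1
    last_y = y0 + height - 1
    positions = []
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            # Keep the sample if it lies on the first or last row or column.
            if x in (x0, last_x) or y in (y0, last_y):
                positions.append((x, y))
    return positions


border = boundary_positions(0, 0, 4, 4)
print(len(border))  # 12
```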

[0015] In a second aspect, one or more of the present embodiments provide an apparatus including electronic circuitry configured to: obtain an input signal represented by a set of image samples; learn parameters of an implicit neural representation using an iterative optimization process based on a loss function, the implicit neural representation allowing values representing the image samples to be derived from the coordinates of the image samples; and signal the learned parameters in a dataset; wherein, during the learning of the parameters, a subset of image samples from the set is used to compute the loss function in at least one iteration of the iterative optimization process.

[0016] In one embodiment, in response to using a subset of image samples from the set to compute the loss function in multiple successive iterations of the iterative optimization process, the number of image samples in the subset increases with each successive iteration.

[0017] In an embodiment, the number of image samples in the subset is a proportion of the number of samples representing the input signal, or depends on information representing the complexity of the input signal.

[0018] In one embodiment, the number of image samples in the subset is increased based on a comparison between a first value and a second value provided by the loss function.

[0019] In one embodiment, the number of image samples in the subset is increased by monitoring the rate at which the value provided by the loss function decreases.

[0020] In an embodiment, during iteration, the image samples in the subset are selected in the following ways: randomly, or based on the contribution of the image samples to a first value provided by the loss function, or based on image samples used in the subset in at least one previous iteration, or in such a way that the samples in the subset are uniformly distributed on the input signal, or in response to the input signal being at least one partition of an image, the image samples in the subset are selected based on the position of the image samples in each partition.

[0021] In an embodiment, the image samples of the subset that are selected based on the position of the image samples in each partition are image samples located at the boundaries of the partitions or image samples corresponding to edges included in the partitions.

[0022] In a third aspect, one or more of the present embodiments provide an output signal generated by the method of the first aspect or by the apparatus of the second aspect.

[0023] In a fourth aspect, one or more of the present embodiments provide a non-transitory information storage medium including program code instructions for implementing the method according to the first aspect.

[0024] In a fifth aspect, one or more of the present embodiments provide a computer program including program code instructions for implementing the method according to the first aspect.

Brief Description of the Drawings

[0025] Figure 1 illustrates an example of a context in which various embodiments can be implemented;
Figure 2A schematically shows an example of a hardware architecture of a processing module capable of implementing an encoding module or a decoding module, in which various aspects and embodiments are implemented;
Figure 2B shows a block diagram of an example of a first system in which various aspects and embodiments are implemented;
Figure 2C shows a block diagram of an example of a second system in which various aspects and embodiments are implemented;
Figure 3 shows a simple neural network for implicit neural representation;
Figure 4A illustrates a process for encoding a signal using an implicit neural representation;
Figure 4B illustrates a process for decoding a signal using an implicit neural representation;
Figure 5 shows an example of partitioning undergone by an image of pixels of an original video sequence; and
Figure 6 schematically illustrates an encoding process according to various embodiments.

Detailed Description

[0026] In the following description, various embodiments are applied to 2D signals, such as image or video data. It can be noted that these various embodiments can also be applied in the same way to other types of signals, such as 3D signals representing 3D scenes or objects.

[0027] Figure 1 illustrates an example of a context in which the following embodiments can be implemented.

[0028] In Figure 1, system 11 (which may be a camera, a storage device, a computer, a server, or any device capable of delivering a video stream, i.e., video data) uses communication channel 12 to send a video stream to system 13. The video stream is either encoded and sent by system 11, or received and / or stored by system 11 and then sent. Communication channel 12 is a wired (e.g., Internet or Ethernet) network link or a wireless (e.g., WiFi, 3G, 4G, or 5G) network link.

[0029] System 13 (which may be, for example, a set-top box) receives a video stream and decodes it to generate a sequence of decoded images.

[0030] The decoded image is then transmitted to the display system 15 via communication channel 14, which can be a wired or wireless network. The display system 15 then displays the image.

[0031] In an embodiment, system 13 is included in display system 15. In this case, system 13 and display system 15 are included in a TV, computer, tablet computer, smartphone, head-mounted display, etc.

[0032] Figure 2A schematically shows an example of a hardware architecture of a processing module 200 capable of implementing an encoding module or a decoding module. The encoding module and the decoding module are capable of implementing, respectively, the encoding method described in relation to Figure 6 and the decoding method described in relation to Figure 7. The encoding module is included, for example, in system 11 when that system is responsible for encoding the video stream. The decoding module is included, for example, in system 13. Processing module 200 includes the following items connected via communication bus 2005: a processor or CPU (Central Processing Unit) 2000, which, as a non-limiting example, includes one or more microprocessors, general-purpose computers, special-purpose computers, and processors based on multi-core architectures; random access memory (RAM) 2001; read-only memory (ROM) 2002; a storage unit 2003, which may include non-volatile memory and / or volatile memory, including but not limited to electrically erasable programmable read-only memory (EEPROM), read-only memory (ROM), programmable read-only memory (PROM), random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, disk drives and / or optical disk drives, or storage media readers such as SD (Secure Digital) card readers and / or hard disk drives (HDDs) and / or network-accessible storage devices; and at least one communication interface 2004 for exchanging data with other modules, devices, or equipment. Communication interface 2004 may include, but is not limited to, a transceiver configured to send and receive data via a communication channel. Communication interface 2004 may include, but is not limited to, a modem or a network card.

[0033] If processing module 200 implements a decoding module, then communication interface 2004 enables, for example, processing module 200 to receive encoded video streams and provide decoded image sequences. If processing module 200 implements an encoding module, then communication interface 2004 enables, for example, processing module 200 to receive raw image data sequences for encoding and provide encoded video streams.

[0034] Processor 2000 is capable of executing instructions loaded into RAM 2001 from ROM 2002, an external memory (not shown), a storage medium, or a communication network. When processing module 200 is powered on, processor 2000 is capable of reading instructions from RAM 2001 and executing them. These instructions form a computer program that causes, for example, processor 2000 to implement the decoding method described in relation to Figure 4B or the encoding methods described in relation to Figures 4A and 6, including the various aspects and embodiments described below.

[0035] All or some of the algorithms and steps of the methods of Figures 4A, 4B and 6 can be implemented in software, by the execution of a set of instructions by a programmable machine such as a DSP (digital signal processor) or a microcontroller, or in hardware, by a machine or a dedicated component such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).

[0036] As can be seen, microprocessors, general-purpose computers, special-purpose computers, processors based or not on multi-core architectures, DSPs, microcontrollers, FPGAs, and ASICs are electronic circuitry adapted (i.e., configured) to implement, at least partially, the methods of Figures 4A, 4B and 6.

[0037] Figure 2C shows a block diagram of an example of system 13 in which various aspects and embodiments are implemented. System 13 may be embodied as a device including the various components described below and configured to perform one or more embodiments described in this document. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, digital multimedia set-top boxes, digital television receivers, personal video recording systems, connected home appliances, and head-mounted displays. Elements of system 13 may be embodied individually or in combination in a single integrated circuit (IC), multiple ICs, and / or discrete components. For example, in at least one embodiment, system 13 includes a processing module 200 implementing a decoding module. In various embodiments, system 13 is communicatively coupled to one or more other systems or other electronic devices via, for example, a communication bus or through dedicated input and / or output ports. In various embodiments, system 13 is configured to implement one or more aspects described in this document.

[0038] Inputs to processing module 200 can be provided through various input modules, as indicated in block 231. Such input modules include, but are not limited to, (i) a radio frequency (RF) module that receives, for example, an RF signal transmitted over the air by a broadcaster, (ii) a component (COMP) input module (or a set of COMP input modules), (iii) a Universal Serial Bus (USB) input module, and / or (iv) a High-Definition Multimedia Interface (HDMI) input module. Other examples, not shown in Figure 2C, include composite video.

[0039] In various embodiments, the input modules of block 231 have associated respective input processing elements as known in the art. For example, the RF module may be associated with elements suitable for: (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down-converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select, for example, a signal frequency band that may be referred to as a channel in some embodiments, (iv) demodulating the down-converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets. The RF module of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers. The RF module may include a tuner that performs various of these functions, including, for example, down-converting the received signal to a lower frequency (e.g., an intermediate frequency or a near-baseband frequency) or to baseband. In one set-top box embodiment, the RF module and its associated input processing elements receive an RF signal transmitted over a wired (e.g., cable) medium and perform frequency selection by filtering, down-converting, and filtering again to a desired frequency band. Various embodiments rearrange the order of the above-described (and other) elements, remove some of these elements, and / or add other elements performing similar or different functions. Adding elements may include inserting elements between existing elements, for example, inserting amplifiers and an analog-to-digital converter. In various embodiments, the RF module includes an antenna.

[0040] Additionally, the USB and / or HDMI modules may include corresponding interface processors for connecting system 13 to other electronic devices via USB and / or HDMI connections. It should be understood that aspects of input processing (e.g., Reed-Solomon error correction) can be implemented, for example, within a separate input processing IC or within processing module 200 as needed. Similarly, aspects of USB or HDMI interface processing can be implemented, as needed, within a separate interface IC or within processing module 200. The demodulated, error-corrected, and demultiplexed stream is provided to processing module 200.

[0041] Various components of system 13 can be housed within an integrated housing. Within the integrated housing, the various components can be interconnected and transmit data between them using suitable connection means (e.g., internal buses known in the art, including inter-IC (I2C) buses, wiring, and printed circuit boards). For example, in system 13, processing module 200 is interconnected with other components of system 13 via bus 2005.

[0042] The communication interface 2004 of the processing module 200 allows the system 13 to communicate on the communication channel 12. As mentioned above, the communication channel 12 can be implemented, for example, in a wired and / or wireless medium.

[0043] In various embodiments, data is streamed to or otherwise provided to system 13 using a wireless network, such as a Wi-Fi network, for example, IEEE 802.11 (IEEE refers to the Institute of Electrical and Electronics Engineers). In these embodiments, the Wi-Fi signal is received via a communication channel 12 and a communication interface 2004 adapted for Wi-Fi communication. The communication channel 12 in these embodiments is typically connected to an access point or router that provides access to external networks, including the Internet, to allow streaming applications and other over-the-top communications. Other embodiments use the RF connection of input block 231 to provide streaming data to system 13. As indicated above, various embodiments provide data in a non-streaming manner. Additionally, various embodiments use wireless networks other than Wi-Fi, such as cellular networks or Bluetooth networks.

[0044] System 13 can provide output signals to various output devices, including display system 15, speakers 26, and other peripheral devices 27. Display system 15 in various embodiments includes one or more of, for example, a touchscreen display, an organic light-emitting diode (OLED) display, a curved display, and / or a foldable display. Display system 15 can be used in televisions, tablets, laptops, mobile phones, head-mounted displays, or other devices. Display system 15 can also be integrated with other components (e.g., as in a smartphone) or stand alone (e.g., an external monitor for a laptop computer). In various examples of embodiments, other peripheral devices 27 include one or more of a standalone digital video disc (or digital versatile disc) (DVR, used for both terms), an optical disc player, a stereo system, and / or a lighting system. Various embodiments use one or more peripheral devices 27 based on the output of system 13 to provide functionality. For example, an optical disc player performs the function of playing the output of system 13.

[0045] In various embodiments, control signals are communicated between system 13 and display system 15, speaker 26, or other peripheral devices 27 using signaling such as AV.Link, Consumer Electronics Control (CEC), or other communication protocols that enable device-to-device control with or without user intervention. The output devices may be communicatively coupled to system 13 via dedicated connections through respective interfaces 232, 233, and 234. Alternatively, the output devices may be connected to system 13 using communication channel 12 via communication interface 2004, or using a dedicated communication channel, corresponding to communication channel 14 in Figure 1, via communication interface 2004. Display system 15 and speaker 26 can be integrated into a single unit along with other components of system 13 in an electronic device (e.g., a television). In various embodiments, display interface 232 includes a display driver, such as a timing controller (TCon) chip.

[0046] Display system 15 and speaker 26 may alternatively be separate from one or more other components. In various embodiments where display system 15 and speaker 26 are external components, output signals may be provided via dedicated output connections, including, for example, HDMI ports, USB ports, or COMP outputs.

[0047] Figure 2B shows a block diagram of an example of system 11 in which various aspects and embodiments are implemented. System 11 is very similar to system 13. System 11 can be embodied as a device including the various components described below and configured to perform one or more embodiments described in this document. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, cameras, and servers. Elements of system 11 can be embodied individually or in combination in a single integrated circuit (IC), multiple ICs, and / or discrete components. For example, in at least one embodiment, system 11 includes a processing module 200 implementing an encoding module. In various embodiments, system 11 is communicatively coupled to one or more other systems or other electronic devices via, for example, a communication bus or through dedicated input and / or output ports. In various embodiments, system 11 is configured to implement one or more aspects described in this document.

[0048] Inputs to processing module 200 can be provided through various input modules, as indicated in block 231, already described in relation to Figure 2C.

[0049] Various components of system 11 can be housed within an integrated housing. Within the integrated housing, the various components can be interconnected and transmit data between them using suitable connection means (e.g., internal buses known in the art, including inter-IC (I2C) buses, wiring, and printed circuit boards). For example, in system 11, processing module 200 is interconnected with other components of system 11 via bus 2005.

[0050] The communication interface 2004 of the processing module 200 allows the system 11 to communicate on the communication channel 12.

[0051] In various embodiments, data is streamed to or otherwise provided to system 11 using a wireless network, such as a Wi-Fi network, for example, IEEE 802.11 (IEEE refers to the Institute of Electrical and Electronics Engineers). In these embodiments, the Wi-Fi signal is received via a communication channel 12 and a communication interface 2004 adapted for Wi-Fi communication. The communication channel 12 in these embodiments is typically connected to an access point or router that provides access to external networks, including the Internet, to allow streaming applications and other over-the-top communications. Other embodiments use the RF connection of input block 231 to provide streaming data to system 11.

[0052] As indicated above, various embodiments provide data in a non-streaming manner. Additionally, various embodiments use wireless networks other than Wi-Fi, such as cellular networks or Bluetooth networks.

[0053] The data provided to system 11 may be provided in different formats. In various embodiments, this data is raw data, for example, provided by an image acquisition module connected to or included in system 11. In this case, processing module 200 is responsible for encoding this data.

[0054] System 11 can provide output signals to various output devices (such as System 13) that are capable of storing and / or decoding output signals.

[0055] Various implementations involve decoding. As used herein, "decoding" can encompass all or part of a process performed, for example, on a received encoded video stream (i.e., received video data) to produce a final output suitable for display. In various embodiments, such a process includes processes performed by a decoder of the various implementations described herein, for example the process described in relation to Figure 4B.

[0056] Various implementations involve encoding. Similar to the discussion of "decoding" above, the term "encoding" as used in this application can encompass all or part of a process performed, for example, on an input video sequence to produce an encoded video stream. In various embodiments, such a process includes processes performed by an encoder of the various implementations described in this application, for example the processes described in relation to Figures 4A and 6.

[0057] When a diagram is presented as a flowchart, it should be understood that it also provides a block diagram of the corresponding apparatus. Similarly, when a diagram is presented as a block diagram, it should be understood that it also provides a flowchart of the corresponding method / process.

[0058] The implementations and aspects described herein can be implemented, for example, in methods or processes, apparatuses, software programs, data streams, or signals. Even if discussed only in the context of a single implementation (e.g., discussed only as a method), the implementation of the features in question can also be implemented in other forms (e.g., apparatuses or programs). Apparatuses can be implemented, for example, with appropriate hardware, software, and firmware. Methods can be implemented, for example, in a processor, which generally refers to a processing device, including, for example, a computer, microprocessor, integrated circuit, or programmable logic device. Processors also include communication devices, such as computers, cellular phones, portable / personal digital assistants (“PDAs”), and other devices that facilitate information communication between end users.

[0059] References to "one embodiment" or "an embodiment" or "one implementation" or "an implementation," and their variations, mean that a particular feature, structure, characteristic, etc., described in connection with that embodiment is included in at least one embodiment. Therefore, the phrases "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation," and any other variations appearing throughout this application, do not necessarily all refer to the same embodiment.

[0060] Additionally, this application may refer to "determining" various pieces of information. Determining the information may include, for example, one or more of the following: estimating the information, calculating the information, predicting the information, retrieving the information from memory, or obtaining the information, for example, from another device, module, or user.

[0061] Furthermore, this application may relate to "accessing" various types of information. Accessing information may include one or more of the following: for example, receiving information, retrieving information (e.g., retrieving from memory), storing information, moving information, copying information, calculating information, determining information, predicting information, or estimating information.

[0062] Additionally, this application may refer to "receiving" various pieces of information. As with "accessing," "receiving" is intended to be a broad term. Receiving the information may include one or more of the following: for example, accessing the information or retrieving the information (e.g., from memory). Furthermore, "receiving" is typically involved, in one way or another, during operations such as storing information, processing information, transmitting information, moving information, copying information, erasing information, calculating information, determining information, predicting information, or estimating information.

[0063] It should be understood that, for example, in the cases of “A / B,” “A and / or B,” “at least one of A and B,” and “one or more of A and B,” the use of any of the following “ / ,” “and / or,” “at least one,” and “one or more” is intended to cover selecting only the first listed option (A), or only the second listed option (B), or selecting both options (A and B). As yet another example, in the cases of “A, B, and / or C,” “at least one of A, B, and C,” and “one or more of A, B, and C,” this wording is intended to include selecting only the first listed option (A), or only the second listed option (B), or only the third listed option (C), or only the first and second listed options (A and B), or only the first and third listed options (A and C), or only the second and third listed options (B and C), or selecting all three options (A, B, and C). As will be apparent to those skilled in the art and related fields, this can be extended to a large number of listed items.

[0064] Furthermore, as used herein, the term "signaling" refers in particular to indicating something to a corresponding decoder. For example, in some embodiments, the encoder signals the use of some INR parameters. In this way, in embodiments, the same parameters can be used on both the encoder and decoder sides. Thus, for example, the encoder can send a specific parameter to the decoder (explicit signaling) so that the decoder can use the same specific parameter. Conversely, if the decoder already has the specific parameter as well as other parameters, signaling can be used without sending (implicit signaling), simply to allow the decoder to know and select the specific parameter. By avoiding the transmission of any actual functions, bit savings are achieved in various embodiments. It should be understood that signaling can be implemented in various ways. For example, in various embodiments, one or more syntax elements, flags, etc., are used to send information to the corresponding decoder. Although the verb form of the term "signaling" has been used above, the term "signal" can also be used as a noun herein.

[0065] It will be apparent to those skilled in the art that implementations can generate various signals that are formatted to carry, for example, information that can be stored or transmitted. The information may include, for example, instructions for performing a method, or data generated by one of the described implementations. For example, a signal may be formatted to carry an encoded video stream (i.e., encoded data). Such a signal may be formatted as, for example, electromagnetic waves (e.g., using the radio frequency portion of the spectrum) or baseband signals. Formatting may include, for example, encoding the encoded video stream and modulating a carrier wave with the encoded video stream. The information carried by the signal may be, for example, analog or digital information. It is well known that signals can be transmitted via a variety of different wired or wireless links. The signal may be stored on a processor-readable medium.

[0066] Figure 3 shows a simple neural network for implicit neural representation (INR). This type of neural network can be called an INR network. For clarity, a 2D signal (such as an image) is used for illustration, but as mentioned above, INR can be used for signals of any dimension. INR parameterizes the signal as a function (300) that takes coordinates (310) as input and outputs the (possibly approximate) signal values (320) at those coordinates. When the signal processed by INR is an image, the input coordinates (310) can be the sample coordinates (x, y) of an image sample, and the INR output (320) is the image sample value. The image sample value can be the original sample value of the original image, or a residual value representing the difference between a predicted sample and the original sample. The image sample can be a single-component signal (such as a grayscale image) or a multi-component signal including multiple components, such as RGB, YUV, or YUV+d images, where d represents the depth component. In the case of video, the output is similar, but the input can also include an image index t in addition to the sample coordinates. The signal can be reconstructed by using the INR to calculate the image sample values at some or each of the sample coordinates (x, y).

[0067] INR networks are typically neural networks composed of multiple neural layers, such as fully connected layers. In Figure 3, the network has four neural layers. The intermediate outputs are represented by circles. Each neural layer can be described as a function that first multiplies the input by a tensor, adds a vector called the bias, and then applies a non-linear function to the resulting value. In this document, a neural layer can also be simply referred to as a layer. The shape of the tensor (and other properties of tensors) and the type of non-linear function define the architecture of the neural network. In the following text, tensor values and bias values are referred to as weights. The weights and the parameters (if applicable) of the non-linear functions are referred to as the parameters of the neural network. The architecture and the parameters define a model. In the following text, the notation "…" is used to denote the INR function parameterized by these parameters.
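The layer description above can be sketched in a few lines of Python. This is a minimal illustration and not the network of Figure 3: the layer widths, the random initialization, the sine non-linearity, the linear output layer, and the names `make_layer`, `apply_layer` and `inr_forward` are all assumptions introduced here.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Create one fully connected layer: a weight matrix and a bias vector."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def apply_layer(layer, inputs, nonlinearity):
    """Multiply the input by the weights, add the bias, apply the non-linearity."""
    weights, bias = layer
    return [nonlinearity(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]

def inr_forward(layers, x, y):
    """Evaluate the INR function at the sample coordinates (x, y)."""
    values = [x, y]
    for layer in layers[:-1]:
        values = apply_layer(layer, values, math.sin)    # assumed non-linearity
    return apply_layer(layers[-1], values, lambda v: v)  # assumed linear output

rng = random.Random(0)
sizes = [2, 16, 16, 16, 3]   # (x, y) in, three components out: four layers
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
sample = inr_forward(layers, 0.25, 0.5)
```

Here the parameters of the model are the entries of every `weights` matrix and `bias` vector, while `sizes` and the choice of non-linearity play the role of the architecture.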

[0068] Using a single INR network globally for the entire signal makes learning difficult because all parameters contribute to all values, resulting in a massive network that must encode every detail of the signal. A solution is to divide the signal into partitions and define a local INR network for each partition. When the signal is an image, these partitions can be slices, tiles, coding units, etc.

[0069] Figure 5 shows an example of the partitioning undergone by an image 51 of pixels belonging to an original video sequence 50.

[0070] An image can be divided into multiple coding entities. Figure 5 illustrates coding entities commonly used in video compression standards. First, as indicated by reference numeral 53 in Figure 5, the image is divided into a grid of blocks referred to as coding tree units (CTUs). A CTU consists of, for example, a block of luma samples together with two corresponding blocks of chroma samples. The block dimension V is usually a power of two. Second, the image is divided into one or more groups of CTUs. For example, it can be divided into one or more tile rows and tile columns, a tile being a sequence of CTUs that covers a rectangular area of the image. In some cases, a tile can be divided into one or more strips, each strip consisting of at least one row of CTUs within the tile. Above the concepts of tiles and strips, there exists another coding entity, called a slice, which may contain at least one tile of the image or at least one strip of a tile.

[0071] In the example of Figure 5, as indicated by reference numeral 52, image 51 is divided into three slices S1, S2 and S3 in raster-scan slice mode. Each slice includes multiple tiles (not shown), and each tile includes only one strip.

[0072] As indicated by reference numeral 54 in Figure 5, a CTU can be partitioned, in the form of a hierarchical tree, into one or more sub-blocks referred to as coding units (CUs). The CTU is the root (i.e., the parent node) of the hierarchical tree and can be partitioned into multiple CUs (i.e., child nodes). Each CU either becomes a leaf of the hierarchical tree, if it is not further partitioned into smaller CUs, or becomes the parent node of smaller CUs (i.e., child nodes), if it is further partitioned.

[0073] In the example of Figure 5, CTU 54 is first partitioned into "4" square CUs using quadtree partitioning. The top-left CU is a leaf of the hierarchical tree because it is not further partitioned, i.e., it is not the parent node of any other CU. Using quadtree partitioning again, the top-right CU is further partitioned into "4" smaller square CUs. Using binary tree partitioning, the bottom-right CU is vertically partitioned into "2" rectangular CUs. Using ternary tree partitioning, the bottom-left CU is vertically partitioned into "3" rectangular CUs.
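The partitioning of CTU 54 described above can be reproduced with a short Python sketch. The 64×64 CTU size, the `(x, y, width, height)` block representation, the function names, and the 1/4, 1/2, 1/4 widths of the ternary split are assumptions made for illustration; the text only states how many CUs each split produces.

```python
def quad_split(block):
    """Split a block into four equal squares: TL, TR, BL, BR."""
    x, y, w, h = block
    hw, hh = w // 2, h // 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]

def binary_split_vertical(block):
    """Split a block into two side-by-side rectangles."""
    x, y, w, h = block
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]

def ternary_split_vertical(block):
    """Split a block into three side-by-side rectangles (assumed 1/4, 1/2, 1/4)."""
    x, y, w, h = block
    q = w // 4
    return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]

ctu = (0, 0, 64, 64)  # hypothetical CTU size
top_left, top_right, bottom_left, bottom_right = quad_split(ctu)

# Leaves of the hierarchical tree, following the example in the text.
leaves = ([top_left]                               # not partitioned further
          + quad_split(top_right)                  # "4" smaller square CUs
          + binary_split_vertical(bottom_right)    # "2" rectangular CUs
          + ternary_split_vertical(bottom_left))   # "3" rectangular CUs
```

Each entry of `leaves` is a CU that could serve as one partition with its own local INR network.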

[0074] During image encoding, partitioning is adaptive: each CTU is partitioned so as to optimize a criterion, such as a homogeneity criterion for the samples within a partition (based on CU characteristics such as the pixel mean, variance, texture, and / or any other statistics of the signal within the CU under consideration) or a compression-efficiency criterion for the CTU.

[0075] In this application, the term "block" or "picture block" may be used to refer to either CTU or CU.

[0076] In this application, the terms "reconstruction" and "decoding" are used interchangeably, as are the terms "pixel" and "sample," and the terms "image," "picture," "subpicture," "slice," and "frame." Generally, but not necessarily, the term "reconstruction" is used on the encoder side, while "decoding" is used on the decoder side.

[0077] Figure 4A illustrates a typical process for encoding a signal using INR.

[0078] The process of Figure 4A is executed, for example, by the processing module 200 of system 11.

[0079] In step 41, the processing module 200 obtains an input signal represented by a set of samples. The input signal is, for example, an image.

[0080] In step 42, the processing module 200 applies a learning phase, during which the INR parameters of the INR network are learned. These parameters allow the values of the input signal to be reconstructed from the sample space coordinates (or a subset thereof). The INR parameters (or weights) are typically learned through an iterative optimization process, such as batch gradient descent based on a loss function.

[0081] Figure 6 shows an example of the batch gradient descent method.

[0082] In this example, the batch gradient descent method is performed during step 42.

[0083] In step 4201, the processing module 200 initializes a variable j to zero. The variable j counts the iterations of the batch gradient descent method.

[0084] In step 4202, the processing module 200 initializes a variable i to zero. Here, it is assumed that the image has already been divided into NumPart partitions. A partition can be a CU, a CTU, a slice, a superpixel, or an object determined by an object detection process.

[0085] In step 4204, processing module 200 calculates the local distortion of the current partition, i.e., the partition of index i, by comparing its image samples with the output of the INR function.

[0086] In this calculation, I(x,y) is an image sample, and the INR function is evaluated with the parameters of the partition. Here, the distortion is the mean squared error. However, other metrics, such as LPIPS (Learned Perceptual Image Patch Similarity), can also be used.

[0087] In step 4205, processing module 200 determines whether all partitions of the image have been processed. If not, the processing module increments the variable i by one in step 4206 and returns to step 4204. Otherwise, processing module 200 executes step 4207.

[0088] In step 4207, the processing module 200 calculates the global distortion of the image by summing up all local distortions:

[0089] where M and N are the width and height of the image.

[0090] In step 4208, processing module 200 calculates the value loss of the loss function from the global distortion D and a bit rate R.

[0091] In the loss function, β represents the trade-off parameter between the distortion D and the bit rate R, and R is the bit rate value for encoding the INR parameters.

[0092] The bit rate R is obtained from the bit rate used to signal the INR parameters.

[0093] In step 4209, the processing module updates the INR parameters using, for example, a stochastic gradient descent method.

[0094] In step 4211, processing module 200 determines whether a termination condition for stopping the batch gradient descent method is met. For example, the termination condition is met when the iteration count j equals a value J_MAX, or when the difference between the values loss obtained in two consecutive iterations is lower than a value DIFF_LOSS.

[0095] If the termination condition is not met, step 4211 is followed by step 4212, in which processing module 200 increments the variable j by one. Otherwise, if the termination condition is met, processing module 200 stops the process of Figure 6.
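The loop of steps 4201 to 4212 can be sketched in Python as follows. This is a hedged illustration and not the patented implementation. The following choices are assumptions made here: the affine stand-in `inr` used in place of a real INR network, the additive loss form D + β·R, the placeholder `rate` function (which returns zero and is not differentiated), the plain gradient step, and the default values of `step`, `j_max` and `diff_loss`.

```python
def inr(theta, x, y):
    """Stand-in INR function: an affine model a*x + b*y + c (assumption)."""
    a, b, c = theta
    return a * x + b * y + c

def local_distortion(image, part, theta):
    """Step 4204: sum of squared errors over the samples of one partition."""
    return sum((image[y][x] - inr(theta, x, y)) ** 2 for (x, y) in part)

def gradient(image, part, theta, norm):
    """Gradient of the normalized squared error with respect to theta."""
    g = [0.0, 0.0, 0.0]
    for (x, y) in part:
        err = inr(theta, x, y) - image[y][x]
        for k, basis in enumerate((x, y, 1.0)):
            g[k] += 2.0 * err * basis / norm
    return g

def encode(image, partitions, beta=0.0, rate=lambda thetas: 0.0,
           step=0.01, j_max=500, diff_loss=1e-9):
    height, width = len(image), len(image[0])
    thetas = [[0.0, 0.0, 0.0] for _ in partitions]
    previous = None
    j = 0                                            # step 4201
    while True:
        local = [local_distortion(image, p, t)       # steps 4202 to 4206
                 for p, t in zip(partitions, thetas)]
        d = sum(local) / (width * height)            # step 4207
        loss = d + beta * rate(thetas)               # step 4208 (assumed form)
        for p, t in zip(partitions, thetas):         # step 4209
            g = gradient(image, p, t, width * height)
            for k in range(3):
                t[k] -= step * g[k]
        converged = previous is not None and abs(previous - loss) < diff_loss
        if j == j_max or converged:                  # step 4211
            return thetas, loss, j
        previous = loss
        j += 1                                       # step 4212

image = [[float(x + 2 * y) for x in range(4)] for y in range(4)]
left = [(x, y) for y in range(4) for x in range(2)]
right = [(x, y) for y in range(4) for x in range(2, 4)]
thetas, final_loss, iterations = encode(image, [left, right])
```

In this sketch one set of parameters is learned per partition, which mirrors the local INR networks described earlier.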

[0096] Returning to Figure 4A, in step 43, the processing module 200 signals the INR parameters (or a subset thereof) in the output bitstream (i.e., in the output data or dataset). When the signal is an image, the processing module 200 also adds information representing the image, such as the image's width and height, and information representing the partitions.

[0097] In some variations of the process of Figure 4A, processing module 200 can modify the input coordinates (x, y) through a transformation in an optional preprocessing step (between steps 41 and 42), and then use the transformed coordinates as input in step 42. This transformation can be a Fourier mapping, a coordinate transformation, a normalization, etc. Tancik, M., et al. (2020), "Fourier features let networks learn high frequency functions in low dimensional domains," Advances in Neural Information Processing Systems (pp. 7537-7547), demonstrates that mapping to Fourier features (i.e., Fourier mapping) enables a multilayer perceptron (MLP) to learn the high-frequency components of the input signal. Otherwise, the MLP suffers from spectral bias and is unable to learn the high frequencies of the input signal, which significantly degrades visual quality when reconstructing the coded signal.

[0098] Technically, the Fourier mapping of the input coordinates is defined in terms of sinusoids of the coordinates.

[0099] Therefore, when the mapping is considered as a Fourier approximation of a kernel function, the mapping depends on a set of coefficients, among which are the Fourier basis frequencies. In some implementations, the coefficients are predefined.
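As an illustration of such a mapping, the following Python sketch uses the form given in the cited Tancik et al. paper, in which each basis frequency contributes a cosine and a sine term scaled by an amplitude. The exact formula of this application is not reproduced here, so the function name `fourier_map`, the amplitude and frequency values, and the normalized coordinates are assumptions chosen for illustration.

```python
import math

def fourier_map(v, amplitudes, frequencies):
    """Map coordinates v to [a_j*cos(2*pi*b_j.v), a_j*sin(2*pi*b_j.v)] for each j."""
    features = []
    for a, b in zip(amplitudes, frequencies):
        phase = 2.0 * math.pi * sum(bk * vk for bk, vk in zip(b, v))
        features.append(a * math.cos(phase))
        features.append(a * math.sin(phase))
    return features

# Hypothetical predefined coefficients: four 2D basis frequencies with amplitudes.
frequencies = [(1.0, 0.0), (0.0, 1.0), (2.0, 0.0), (0.0, 2.0)]
amplitudes = [1.0, 1.0, 0.5, 0.5]

# Coordinates are assumed to be normalized to [0, 1) before mapping.
mapped = fourier_map((0.25, 0.5), amplitudes, frequencies)
```

The mapped vector, here of length eight, would replace the raw pair (x, y) as the input of the INR network.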

[0100] Figure 4B illustrates a typical process for decoding data using INR.

[0101] The process of Figure 4B is executed, for example, by the processing module 200 of system 13.

[0102] In step 44, the processing module 200 obtains input data. For example, when the method of Figure 4A has been applied, the input data corresponds to the output data generated by the processing module 200 of system 11. The input data includes the encoded INR parameters. During step 44, processing module 200 decodes the INR parameters from the input data and regenerates the INR network by applying the INR function to each partition. When the signal is an image, the processing module 200 also decodes the information representing the image.

[0103] In step 45, the processing module 200 applies the regenerated INR network (i.e., the processing module 200 applies the INR function) to sample coordinates in order to generate a reconstructed version of the input signal obtained in step 41 by system 11. If the input signal is an image, the processing module 200 applies the regenerated INR network to at least a sub-part of the sample coordinates (x, y) of the image. As an example, for a 256×256 sample image, these coordinates can be all pairs (x, y) with x∈{0,1,...,255} and y∈{0,1,...,255}. Other options are also possible, such as generating upsampled, downsampled, or expanded versions of the input image.
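Step 45 amounts to evaluating the INR function over a grid of coordinates, which the following Python sketch illustrates. The function name `reconstruct`, the `step` argument, and the lambda standing in for a decoded INR network are assumptions made for this example.

```python
def reconstruct(inr_function, width, height, step=1.0):
    """Evaluate the INR function on a regular grid covering width x height."""
    cols = int(width / step)
    rows = int(height / step)
    return [[inr_function(c * step, r * step) for c in range(cols)]
            for r in range(rows)]

def decoded_inr(x, y):
    """Stand-in for the regenerated INR network (assumption)."""
    return 0.5 * x + y

# All pairs (x, y) with x and y in {0, 1, ..., 255}.
decoded = reconstruct(decoded_inr, 256, 256)

# Sampling between the original coordinates gives an upsampled version.
upsampled = reconstruct(decoded_inr, 256, 256, step=0.5)
```

Because the representation is a continuous function of the coordinates, changing `step` is enough to obtain upsampled or downsampled versions without retraining.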

[0104] If a transformation, such as a Fourier mapping based on coefficients, was applied to the input coordinates (x, y) during the process of Figure 4A, the same transformation is applied by the processing module 200 to the sample coordinates (x, y) in an optional post-processing step following step 44, and step 45 is applied to the transformed sample coordinates.

[0105] Although INR networks are less complex at decoding than other end-to-end neural compression methods, encoding is time-consuming because a model must be trained for each signal. Furthermore, when the signal is partitioned, a different set of INR parameter values is learned for each element of the partition. Embodiments are proposed here that reduce encoding complexity and time while maintaining encoding quality.

[0106] In various embodiments, the processing module 200 does not calculate the distortion of each partition on every sample of the partition. Instead, it calculates the distortion of each partition on a subset of the samples of the partition.

[0107] The process for encoding a signal using INR described in relation to Figure 4A then applies a modified version of the batch gradient descent method of Figure 6. In this modified version, steps 4201, 4202, 4205, 4206, 4208, 4209, 4211, 4212 and 4213 remain unchanged.

[0108] An additional step 4203 is performed between step 4202 (or 4206) and step 4204. During step 4203, processing module 200 selects a subset of the samples of the current partition. Various embodiments of step 4203 are described in detail below.

[0109] In step 4204, the local distortion of the partition is calculated on the selected subset.

[0110] Here, the distortion is evaluated only at the coordinates of the samples in the sample subset of the partition.

[0111] In step 4207, the processing module 200 calculates the global distortion of the image from the local distortions.

[0112] This calculation takes into account the number of samples in each partition.
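A possible reading of the modified steps 4204 and 4207 is sketched below in Python. The exact expressions are not reproduced in this text, so the sketch assumes that the local distortion is the mean squared error over the selected subset and that the global distortion weights each local distortion by the number of samples in its partition. The function names and the toy data are hypothetical.

```python
def subset_local_distortion(image, subset, inr_function):
    """Mean squared error evaluated only at the selected sample coordinates."""
    total = sum((image[y][x] - inr_function(x, y)) ** 2 for (x, y) in subset)
    return total / len(subset)

def global_distortion(local_distortions, partition_sizes):
    """Weight each local distortion by the sample count of its partition."""
    weighted = sum(n * d for n, d in zip(partition_sizes, local_distortions))
    return weighted / sum(partition_sizes)

image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]

def zero_inr(x, y):
    """Stand-in INR function that always outputs zero (assumption)."""
    return 0.0

left = [(0, 0), (1, 0), (0, 1), (1, 1)]
right = [(2, 0), (3, 0), (2, 1), (3, 1)]

# Using every sample of each partition gives the exact mean squared error.
exact = global_distortion(
    [subset_local_distortion(image, p, zero_inr) for p in (left, right)],
    [len(left), len(right)])

# Using two samples per partition gives a cheaper approximation.
approx = global_distortion(
    [subset_local_distortion(image, s, zero_inr)
     for s in ([(0, 0), (1, 0)], [(2, 0), (3, 1)])],
    [len(left), len(right)])
```

With this weighting, the result reduces to the ordinary image-wide mean squared error whenever each subset contains every sample of its partition.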

[0113] Various processes for selecting a subset of samples in each partition are considered for implementing step 4203.

[0114] In a first embodiment, during step 4203, a fixed number of samples is randomly selected in the partition. In a first variant of the first embodiment, suitable for partitions of uneven size, the number of randomly selected samples is a proportion of the number of samples in the partition.
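Both forms of random selection can be sketched with the Python standard library. The function name `select_random_subset`, its arguments, and the rule of keeping at least one sample are assumptions made for this illustration.

```python
import random

def select_random_subset(partition, rng, count=None, proportion=None):
    """Draw samples without replacement, by fixed count or by proportion."""
    if (count is None) == (proportion is None):
        raise ValueError("give exactly one of count or proportion")
    if proportion is not None:
        # Variant for partitions of uneven size: scale with the partition.
        count = max(1, round(proportion * len(partition)))
    return rng.sample(partition, min(count, len(partition)))

rng = random.Random(0)
large = [(x, y) for y in range(8) for x in range(8)]   # 64 samples
small = [(x, y) for y in range(4) for x in range(4)]   # 16 samples

fixed = select_random_subset(large, rng, count=16)
scaled_large = select_random_subset(large, rng, proportion=0.25)
scaled_small = select_random_subset(small, rng, proportion=0.25)
```

With a fixed count, a small partition would be sampled much more densely than a large one, which is what the proportional variant avoids.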

[0115] However, because the distortion computed at each iteration j is only an approximation, this approximation can lead to an increase in the number of iterations required to reach convergence (e.g., when the termination condition of step 4211 depends on the difference between the values loss of two consecutive iterations). Therefore, the time gained in each iteration may be offset by the need to increase the number of iterations to achieve the same coding performance.

[0116] To address this issue, in a second variant of the first embodiment, the number of samples in the subset increases with each successive iteration. In other words, the number of samples in the subset is time-dependent. For example, the number of samples in the subset is increased every Δ iterations (where Δ is a fixed value), or is increased based on a comparison of the value loss provided by the loss function with a threshold, or is increased by monitoring the rate at which the value provided by the loss function decreases in successive iterations.
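Two of these growth rules can be sketched in Python as follows. The function names, the additive growth of `growth` samples, and the `maximum` cap are assumptions; the text only states that the subset grows every Δ iterations or in response to the behavior of the loss.

```python
def subset_size(iteration, initial, delta, growth, maximum):
    """Grow the subset by `growth` samples every `delta` iterations."""
    return min(maximum, initial + growth * (iteration // delta))

def grow_on_plateau(current, losses, min_decrease, growth, maximum):
    """Grow the subset when the last loss decrease fell below `min_decrease`."""
    if len(losses) >= 2 and losses[-2] - losses[-1] < min_decrease:
        return min(maximum, current + growth)
    return current

# Fixed schedule: 16 samples at first, 16 more every 10 iterations, up to 64.
sizes = [subset_size(j, initial=16, delta=10, growth=16, maximum=64)
         for j in (0, 9, 10, 25, 100)]

# Loss-driven schedule: grow only once the loss has nearly stopped decreasing.
still_improving = grow_on_plateau(16, [1.0, 0.5], 0.01, 16, 64)
stalled = grow_on_plateau(16, [1.0, 0.999], 0.01, 16, 64)
```

The loss-driven rule spends cheap iterations on a small subset for as long as they keep reducing the loss.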

[0117] In a second embodiment of step 4203, the impact of samples on the overall image quality is considered. For example, the boundaries and edges of partitions are particularly important because the accumulation of errors on boundaries and edges can lead to visible artifacts in the reconstructed image. Therefore, it is ensured that boundary and edge samples are used for the local distortion calculation in each iteration, or more frequently than other samples. This results in better quality, at least perceptually. In an embodiment, a certain percentage of samples is selected from samples located at partition boundaries and / or pixels corresponding to edges. An edge detection process is used to identify edge samples. For example, an edge detection process is applied to each image before applying the process of Figure 4A.
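One way to favor boundary and edge samples is sketched below in Python. The sketch assumes rectangular partitions, takes the edge samples as an input produced by some separate edge detector, always keeps all priority samples, and fills the rest with a random fraction of the interior. These choices and the function names are hypothetical.

```python
import random

def boundary_samples(x0, y0, width, height):
    """Coordinates lying on the outer border of a rectangular partition."""
    return [(x, y)
            for y in range(y0, y0 + height)
            for x in range(x0, x0 + width)
            if x in (x0, x0 + width - 1) or y in (y0, y0 + height - 1)]

def select_with_priority(x0, y0, width, height, edge_samples,
                         interior_ratio, rng):
    """Keep every boundary and edge sample, plus a random part of the rest."""
    priority = set(boundary_samples(x0, y0, width, height)) | set(edge_samples)
    others = [(x, y)
              for y in range(y0, y0 + height)
              for x in range(x0, x0 + width)
              if (x, y) not in priority]
    extra = rng.sample(others, round(interior_ratio * len(others)))
    return sorted(priority) + extra

rng = random.Random(0)
edges = [(3, 3), (4, 4)]   # assumed output of an edge detection process
subset = select_with_priority(0, 0, 8, 8, edges, interior_ratio=0.5, rng=rng)
```

Lowering `interior_ratio` shrinks the subset while leaving the boundary and edge samples in every iteration.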

[0118] In a variant of the second embodiment of step 4203, the subset of samples is selected in part based on the influence of the samples on the reconstruction error (e.g., samples with larger reconstruction errors are preferred). For example, for iteration j, a sample is selected if its reconstruction error exceeds a threshold, where the threshold depends, for example, on the iteration number j (i.e., the higher j, the lower the threshold).
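The error-driven selection can be sketched in Python as follows. The squared error as the per-sample measure, the geometric decay of the threshold, and the names `select_by_error`, `initial_threshold` and `decay` are assumptions; the text only requires that the threshold decrease as j increases.

```python
def select_by_error(image, partition, inr_function, iteration,
                    initial_threshold, decay):
    """Keep the samples whose squared error exceeds a threshold shrinking with j."""
    threshold = initial_threshold * (decay ** iteration)
    return [(x, y) for (x, y) in partition
            if (image[y][x] - inr_function(x, y)) ** 2 > threshold]

image = [[0, 1, 2, 3]]
partition = [(0, 0), (1, 0), (2, 0), (3, 0)]

def zero_inr(x, y):
    """Stand-in INR function that always outputs zero (assumption)."""
    return 0.0

# Squared errors are 0, 1, 4 and 9; the threshold halves at each iteration.
early = select_by_error(image, partition, zero_inr, 0, 4.0, 0.5)
later = select_by_error(image, partition, zero_inr, 2, 4.0, 0.5)
late = select_by_error(image, partition, zero_inr, 4, 4.0, 0.5)
```

Early iterations therefore concentrate on the worst-reconstructed samples, and the subset widens as the threshold drops.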

[0119] It may also be of interest to ensure that the entire partition, or the most important part of the partition, is fully covered. In a third embodiment of step 4203, the subset of samples used for training is constructed in such a way that the samples in the subset are uniformly distributed across the partition. This can be achieved by keeping, in the current iteration, the samples selected in previous iterations, progressively increasing the number of samples in the subset with each iteration, and increasing the probability of including samples that are far from the samples already present in the subsets used in previous iterations. For example, a subsampling rate is defined for the first iteration, and this subsampling rate decreases with each iteration. For example, for the first iteration, the subsampling rate is fixed at "32" (i.e., "1" sample out of "32" is selected in each direction in the first iteration), and the subsampling rate in any given subsequent iteration is equal to the subsampling rate of the preceding iteration divided by "2". In another variant, the partition is divided into equal sub-partitions, and the same number of samples is selected from each sub-partition in each iteration; for example, this number increases with each iteration. In another variant, different subsampling rates are applied during the iterations to boundary and / or edge samples on the one hand and to other samples on the other hand. For example, the subsampling rate for boundary and edge samples is systematically lower than the subsampling rate for other samples (more samples are retained among the edge and boundary samples than among other samples). In yet another variant, when randomly selecting a subset of samples from a partition, more samples are randomly selected from those corresponding to edges and boundaries than from other samples.
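The halving subsampling rate can be sketched in Python as follows. The sketch assumes a rectangular partition, a grid anchored at the partition origin, and a rate that stops at 1; the function names are hypothetical.

```python
def subsampling_rate(iteration, initial_rate=32):
    """Start at `initial_rate` and halve at each iteration, never below 1."""
    return max(1, initial_rate >> iteration)

def grid_subset(width, height, iteration, initial_rate=32):
    """Keep one sample out of `rate` in each direction."""
    rate = subsampling_rate(iteration, initial_rate)
    return [(x, y)
            for y in range(0, height, rate)
            for x in range(0, width, rate)]

first = grid_subset(64, 64, iteration=0)    # rate 32
second = grid_subset(64, 64, iteration=1)   # rate 16
full = grid_subset(64, 64, iteration=5)     # rate 1, every sample
```

Because each rate divides the previous one and the grid keeps the same origin, every sample selected in one iteration is still present in the next, and the samples stay uniformly spread over the partition.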

[0120] It may also be of interest to ensure the variability of the samples in the subset across different iterations. In a fourth embodiment, for example, in a given iteration j, the samples of the partition are selected based on probability values P: the samples of the partition are selected in order of decreasing probability P. In a first variant, the probability P of a sample decreases if the sample was used at least once in iterations j-1 ... j-k, where k is an integer value (k>1). In a second variant, the decrease in the probability P associated with a sample is proportional to the number of times the sample was selected in the last k iterations. In a third variant, the reduction of the probability P associated with a sample also depends on the difference between the current iteration j and the iteration j' in which the sample was used.
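The second variant, in which the probability drops in proportion to recent use, can be sketched in Python as follows. The base probability of 1.0, the `penalty` per use, the tie-breaking by original order, and the function names are assumptions made for illustration.

```python
def update_probabilities(partition, history, k, penalty):
    """Lower P for each use of a sample within the last k iterations."""
    recent = history[-k:]
    probabilities = {}
    for sample in partition:
        uses = sum(1 for subset in recent if sample in subset)
        probabilities[sample] = max(0.0, 1.0 - penalty * uses)
    return probabilities

def select_by_probability(partition, history, count, k=2, penalty=0.4):
    """Pick `count` samples in order of decreasing probability P."""
    probabilities = update_probabilities(partition, history, k, penalty)
    ranked = sorted(partition, key=lambda s: probabilities[s], reverse=True)
    return ranked[:count]

partition = [(0, 0), (1, 0), (2, 0), (3, 0)]
history = []   # one set of selected samples per past iteration
rounds = []
for _ in range(3):
    chosen = select_by_probability(partition, history, count=2)
    history.append(set(chosen))
    rounds.append(chosen)
```

In this toy run the selection alternates between the two halves of the partition, so no sample is reused in consecutive iterations.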

[0121] In an image, some parts may contain many details that are very useful for training the parameters of the INR network, while other parts may be homogeneous regions containing little or no detail. More attention should be paid to regions containing more detail. In an embodiment, the number of samples in the subset of a partition depends on information representing the complexity of that partition. For example, the variance is calculated for each partition, and the number of samples in the subset of the partition is proportional to the variance.
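A variance-proportional allocation can be sketched in Python as follows. The shared `total_budget`, the minimum of one sample per partition, the cap at the partition size, and the function names are assumptions added so that the example is well defined for flat partitions.

```python
def variance(values):
    """Population variance of the sample values of one partition."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def allocate_subset_sizes(partition_values, total_budget, minimum=1):
    """Share a sample budget between partitions in proportion to variance."""
    variances = [variance(values) for values in partition_values]
    total = sum(variances)
    sizes = []
    for values, var in zip(partition_values, variances):
        share = var / total if total > 0 else 1.0 / len(partition_values)
        size = max(minimum, round(share * total_budget))
        sizes.append(min(size, len(values)))
    return sizes

flat = [5] * 16          # homogeneous region, variance 0
textured = [0, 10] * 8   # detailed region, variance 25
sizes = allocate_subset_sizes([flat, textured], total_budget=8)
```

The homogeneous partition keeps a single sample while the detailed partition receives the whole budget.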

[0122] The foregoing describes numerous embodiments. The features of these embodiments may be provided individually or in any combination. Furthermore, embodiments may encompass one or more of the following features, devices, or aspects, individually or in any combination, across various claim classes and types:
● A bitstream or signal or video data including information representing one or more of the described INR parameters, INR architecture, or picture partitions, or variations thereof.

[0123] ● Creating and / or transmitting and / or receiving and / or decoding a bitstream or signal that includes information representing one or more of the described INR parameters, INR architecture, or picture partitions, or variations thereof.

[0124] ● A TV, set-top box, mobile phone, tablet computer or other electronic device that performs at least one of the described embodiments.

[0125] ● A TV, set-top box, mobile phone, tablet computer, or other electronic device that performs at least one of the described embodiments and displays (e.g., using a monitor, screen, or other type of display) a resulting image.

[0126] ● A TV, set-top box, mobile phone, tablet computer or other electronic device that tunes the channel (e.g., using a tuner) to receive signals including encoded video streams and performs at least one of the described embodiments.

[0127] ● A TV, set-top box, mobile phone, tablet computer or other electronic device that receives signals including encoded video streams over the air (e.g., using an antenna) and performs at least one of the described embodiments.

[0128] ● A server, camera, mobile phone, tablet or other electronic device that transmits signals including encoded video streams over the air (e.g., using an antenna) and performs at least one of the described embodiments.

[0129] ● A server, camera, mobile phone, tablet, or other electronic device that tunes the channel (e.g., using a tuner) to send signals including encoded video streams and performs at least one of the described embodiments.

Claims

1. A method, the method comprising: obtaining (41) an input signal represented by a set of image samples; learning (42) parameters of an implicit neural representation using an iterative optimization process based on a loss function, the implicit neural representation allowing values representing the image samples to be derived from the coordinates of the image samples; and signaling (43) the learned parameters in a dataset; wherein, during the learning of the parameters, a subset (4203) of the image samples of the set is used to calculate (4208) the loss function in at least one iteration of the iterative optimization process.

2. The method according to claim 1, wherein, in response to using a subset of image samples from the set to compute the loss function in multiple successive iterations of the iterative optimization process, the number of image samples in the subset increases with each successive iteration.

3. The method according to claim 2, wherein the number of image samples in the subset is a proportion of the number of samples representing the input signal, or depends on information representing the complexity of the input signal.

4. The method according to claim 2, wherein the number of image samples in the subset is increased based on a comparison between a first value provided by the loss function and a second value.

5. The method according to claim 2, wherein the number of image samples in the subset is increased by monitoring the rate at which the value provided by the loss function decreases.

6. The method according to any of the preceding claims, wherein, in the iterations of the iterative optimization process, the image samples in the subset are selected in one of the following ways: randomly, or based on the contribution of the image sample to a first value provided by the loss function, or based on the image samples used in the subset in at least one previous iteration, or in such a way that the samples in the subset are uniformly distributed on the input signal, or, in response to the input signal being at least one partition of an image, based on the position of the image samples in each partition.

7. The method according to claim 6, wherein the image samples of the subset that are selected based on the position of the image samples in each partition are image samples located at the boundaries of the partition or image samples corresponding to edges included in the partition.

8. A device including electronic circuitry, the electronic circuitry being configured to: obtain (41) an input signal represented by a set of image samples; learn (42) parameters of an implicit neural representation using an iterative optimization process based on a loss function, the implicit neural representation allowing values representing the image samples to be derived from the coordinates of the image samples; and signal (43) the learned parameters in a dataset; wherein, during the learning of the parameters, a subset (4203) of the image samples of the set is used to compute (4208) the loss function in at least one iteration of the iterative optimization process.

9. The device according to claim 8, wherein, in response to using a subset of image samples from the set to compute the loss function in multiple successive iterations of the iterative optimization process, the number of image samples in the subset increases with each successive iteration.

10. The device according to claim 9, wherein the number of image samples in the subset is a proportion of the number of samples representing the input signal, or depends on information representing the complexity of the input signal.

11. The device according to claim 9, wherein the number of image samples in the subset is increased based on a comparison between a first value provided by the loss function and a second value.

12. The device according to claim 9, wherein the number of image samples in the subset is increased by monitoring the rate at which the value provided by the loss function decreases.

13. The device according to any one of claims 8 to 12, wherein, in the iterations of the iterative optimization process, the image samples in the subset are selected in one of the following ways: randomly, or based on the contribution of the image sample to a first value provided by the loss function, or based on the image samples used in the subset in at least one previous iteration, or in such a way that the samples in the subset are uniformly distributed on the input signal, or, in response to the input signal being at least one partition of an image, based on the position of the image samples in each partition.

14. The device according to claim 13, wherein the image samples of the subset that are selected based on the position of the image samples in each partition are image samples located at the boundaries of the partition or image samples corresponding to edges included in the partition.

15. An output signal generated by the method according to any one of the preceding claims 1 to 7 or by the device according to any one of the preceding claims 8 to 14.