Processing a group of images associated with a three-dimensional representation of a scene
By identifying and rearranging similar images within groups, the method addresses the inefficiencies of three-dimensional representations, enhancing processing efficiency and reducing file sizes for storage and transmission.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- V NOVA INT LTD
- Filing Date
- 2025-12-03
- Publication Date
- 2026-06-11
Abstract
Description
[0001] Processing a group of images associated with a three-dimensional representation of a scene
[0002] Field of the Disclosure
[0003] The present disclosure relates to methods, systems, and apparatuses for processing a group of images; in particular, the present disclosure relates to methods, systems, and apparatuses for processing a group of images associated with a three-dimensional representation of a scene (e.g. a point cloud). According to one aspect of the disclosure, the disclosure relates to methods, systems, and apparatuses for rearranging the group of images in dependence on a further group of images. According to another aspect of the disclosure, the disclosure relates to methods, systems, and apparatuses for encoding a plurality of groups of images as a video.
[0004] Background to the Disclosure
[0005] Three-dimensional representations of environments are used in many contexts, including for the generation of virtual reality videos, in which depth information for a plurality of points of the representation is used to generate different images for a left eye and a right eye of a user. Typically, substantial processing power is required to determine such a three-dimensional representation, and the file size of files associated with these representations is typically large so that substantial amounts of storage are needed to keep the files and substantial amounts of bandwidth are required to transfer the files.
[0006] Summary of the Disclosure
[0007] According to an aspect of the present disclosure, there is described a method of processing a group of images, the method comprising: identifying a first group of images and a second group of images, wherein each image in each group of images is associated with an index; identifying a first image in the first group of images; determining a second image in the second group of images, the second image being similar to the first image; and updating an index of the second image based on an index of the first image.
[0008] Preferably, the method comprises updating the index of the second image to be the same as the index of the first image.
[0009] Preferably, determining the second image comprises comparing one or more potential images from the second group of images to the first image, preferably comprising identifying the second image as a most similar image from the plurality of potential images.
[0010] Preferably, the potential images are selected from those images with an index equal to or greater than an index of the first image, preferably wherein the potential images include each image of the second group of images that has an index equal to or greater than the index of the first image.
[0011] Preferably, determining the second image comprises determining one or more of: a distance between the first image and the second image; a Euclidean distance between the first image and the second image, and a sum of absolute differences (SAD) between the first image and the second image.
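By way of illustration only, the two named distance measures can be sketched as follows. The sketch assumes that each image is held as a flat sequence of integer pixel values of equal length; this representation and the function names are assumptions made for the example and are not taken from the disclosure.

```python
import math
from typing import Sequence

# Assumed representation: an image is a flat sequence of pixel values.
Image = Sequence[int]


def sum_of_absolute_differences(a: Image, b: Image) -> int:
    """Sum of absolute differences (SAD) between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a: Image, b: Image) -> float:
    """Euclidean (L2) distance between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With either measure, a smaller value indicates a more similar pair of images.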
[0012] Preferably, the second group of images is associated with a three-dimensional representation of a scene, wherein one or more texture points in the three-dimensional representation reference images within the second group of images.
[0013] Preferably, the method comprises: identifying a texture point in the three-dimensional representation that is associated with the second image; and updating a reference of the texture point based on the updated index of the second image.
[0014] Preferably, the method comprises outputting a record of the update made to the index of the second image. Preferably, the method comprises outputting a record of one or more changes made to the second group of images, the changes indicating correspondences between original indexes of the images of the second group of images and updated indexes of said images of the second group of images. Preferably, one or more of the first group of images and / or the second group of images is associated with a plurality of three-dimensional representations.
[0015] Preferably, the method comprises reordering the first group of images prior to the identifying of the first image. Preferably, the method comprises reordering the first group of images based on a characteristic of each image in the first group of images, more preferably wherein the characteristic comprises a luminance.
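The index update, the record of changes, and the update of texture point references described above might be combined as in the following minimal sketch. It assumes flat-sequence images compared using SAD, and it assumes that an index is "updated" by swapping the selected image with whichever image currently holds the target index; the swap, the function names, and the dictionary form of the record are assumptions of this example rather than features stated in the disclosure.

```python
from typing import Dict, List, Sequence, Tuple

Image = Sequence[int]  # assumed: flat sequence of pixel values


def sad(a: Image, b: Image) -> int:
    """Sum of absolute differences between two images."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rearrange(first: List[Image],
              second: List[Image]) -> Tuple[List[Image], Dict[int, int]]:
    """Rearrange a copy of `second` so that similar images share indexes
    with `first`. Returns the rearranged group and a record mapping each
    original index to its updated index."""
    images = list(second)
    origin = list(range(len(images)))  # origin[i]: original index of images[i]
    for i in range(min(len(first), len(images))):
        # Candidates are limited to indexes equal to or greater than i.
        best = min(range(i, len(images)),
                   key=lambda j: sad(first[i], images[j]))
        # Assumed update rule: swap the best match into position i.
        images[i], images[best] = images[best], images[i]
        origin[i], origin[best] = origin[best], origin[i]
    record = {old: new for new, old in enumerate(origin)}
    return images, record


def update_references(references: List[int],
                      record: Dict[int, int]) -> List[int]:
    """Rewrite texture point references to follow the updated indexes."""
    return [record[r] for r in references]
```

In this sketch the record can either be applied immediately to the texture points, as in `update_references`, or output alongside the rearranged group so that the references can be resolved later.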
[0016] Preferably, the method comprises reordering the second group of images prior to the identifying of the first image. Preferably, the method comprises reordering the second group of images based on a characteristic used to reorder the first group of images.
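As a hedged illustration of reordering based on luminance, the sketch below sorts a group of images by mean pixel value. It assumes that each image is a flat sequence of luma samples and that mean luminance is the characteristic used; both are assumptions of the example, and the same function could be applied to the first and second groups so that both are ordered by the same characteristic.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Image = Sequence[int]  # assumed: flat sequence of luma samples


def mean_luminance(image: Image) -> float:
    """Average luma of an image (the assumed ordering characteristic)."""
    return fmean(image)


def reorder_by_luminance(group: List[Image]) -> Tuple[List[Image], List[int]]:
    """Return the group sorted from darkest to brightest, together with
    the original index of the image now held at each position."""
    order = sorted(range(len(group)), key=lambda i: mean_luminance(group[i]))
    return [group[i] for i in order], order
```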
[0017] Preferably, the method comprises iterating through a plurality of images in the first group of images and, for each image: determining a further image from the second group of images that is similar to said image; and updating an index of the further image based on an index of said image.
[0018] Preferably, the method comprises performing a plurality of iterations through the first group of images so as to update indexes of images of the second group of images. Preferably, the plurality of iterations are performed in a plurality of different directions. Preferably, the plurality of iterations includes a forwards iteration and a backwards iteration.
[0019] Preferably, the method comprises, for one or more of the images in the second group of images: identifying, during the first iteration, a first similar image in the first texture atlas; identifying, during the second iteration, a second similar image in the first texture atlas; determining a first similarity between the image in the second group of images and the first similar image; determining a second similarity between the image in the second group of images and the second similar image; comparing the first similarity and the second similarity; and updating the index of the image in the second group of images in dependence on the comparison of the first similarity and the second similarity.
[0020] Preferably, the method comprises updating the index of the image in the second group of images based on the image in the first group of images with a greater similarity.
[0021] Preferably, the method comprises: determining a first rearranged second group of images following a first iteration through the first group of images; determining a second rearranged second group of images following a second iteration through the first group of images; determining a first similarity between the first rearranged second group of images and the first group of images; determining a second similarity between the second rearranged second group of images and the first group of images; and outputting one of the first rearranged second group of images and the second rearranged second group of images based on a comparison of the first similarity and the second similarity.
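One possible reading of the two-pass selection is sketched below: the second group is rearranged once while iterating forwards through the first group and once while iterating backwards, and the rearrangement with the smaller total difference is output. The sketch assumes groups of equal size, flat-sequence images, SAD as the similarity measure, and a swap-based update in which positions already visited are excluded from later candidate sets; all of these are assumptions of the example.

```python
from typing import Iterable, List, Sequence, Tuple

Image = Sequence[int]  # assumed: flat sequence of pixel values


def sad(a: Image, b: Image) -> int:
    return sum(abs(x - y) for x, y in zip(a, b))


def rearrange(first: List[Image], second: List[Image],
              order: Iterable[int]) -> List[Image]:
    """Visit indexes of `first` in `order`; at each, swap the most similar
    not-yet-fixed image of the second group into that index."""
    images = list(second)
    fixed = set()
    for i in order:
        candidates = [j for j in range(len(images)) if j not in fixed]
        best = min(candidates, key=lambda j: sad(first[i], images[j]))
        images[i], images[best] = images[best], images[i]
        fixed.add(i)
    return images


def total_difference(first: List[Image], arranged: List[Image]) -> int:
    """Sum of per-index SAD values; lower means more similar groups."""
    return sum(sad(a, b) for a, b in zip(first, arranged))


def best_of_two_passes(first: List[Image],
                       second: List[Image]) -> Tuple[List[Image], int]:
    """Run a forwards and a backwards pass and keep the better result."""
    n = len(first)
    forward = rearrange(first, second, range(n))
    backward = rearrange(first, second, range(n - 1, -1, -1))
    f = total_difference(first, forward)
    b = total_difference(first, backward)
    return (forward, f) if f <= b else (backward, b)
```

Because each pass is greedy, the two directions can settle on different arrangements, which is why comparing them can be worthwhile.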
[0022] Preferably, each group of images comprises a two-dimensional macro image that is composed of the images in the group of images.
[0023] Preferably, each group of images is arranged such that the indices of the component images are arranged in a z-pattern.
[0024] Preferably, the method comprises iterating through the first group of images following a z-pattern scanning pattern.
[0025] Preferably, the method comprises: identifying a plurality of potential first images in the first group of images, the plurality of potential first images comprising images with a range of indices; for one or more potential second images from the second group of images: comparing said potential second image to each of the plurality of potential first images; determining, for said potential second image, a most similar first image from the plurality of potential first images; and updating the index of said potential second image based on the index of the most similar first image. Preferably, updating the index of the second image comprises storing a record of the change in index.
[0026] Preferably, the first group of images is associated with a first three-dimensional representation and the second group of images is associated with a second three-dimensional representation. Preferably, the first and second three-dimensional representations are successive three-dimensional representations.
[0027] Preferably, the method comprises: selecting a set of potential second images from the second group of images. Preferably, the set of potential images is selected by random or stratified sampling.
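The selection of a candidate set by random or stratified sampling could, for example, look like the sketch below. It assumes that candidates are identified by index, and that stratified sampling means splitting the index range into contiguous strata and drawing one index from each; the disclosure does not specify how strata are formed, so that choice is an assumption of the example.

```python
import random
from typing import List


def random_candidates(n_images: int, k: int, rng: random.Random) -> List[int]:
    """Draw k distinct candidate indexes uniformly at random."""
    return sorted(rng.sample(range(n_images), k))


def stratified_candidates(n_images: int, k: int,
                          rng: random.Random) -> List[int]:
    """Split the index range into k contiguous strata (an assumed choice
    of strata) and draw one candidate index from each stratum."""
    if not 0 < k <= n_images:
        raise ValueError("k must be between 1 and n_images")
    bounds = [s * n_images // k for s in range(k + 1)]
    return [rng.randrange(bounds[s], bounds[s + 1]) for s in range(k)]
```

Comparing the first image against a sampled set rather than against every image of the second group trades some matching quality for fewer comparisons.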
[0028] Preferably, the method comprises: determining, for each of the first image and the second image, a characteristic set of pixels; and determining that the second image is similar to the first image based on the respective characteristic sets of pixels.
[0029] Preferably, the method comprises forming a bitstream comprising the second group of images.
[0030] Preferably, the method comprises encoding the first group of images and the second group of images as a video. Preferably, the encoding uses one or more of: AVC, HEVC, VVC, and LCEVC processes.
[0031] According to an aspect of the present disclosure, there is described a method of encoding a group of images, the method comprising: identifying a first group of images that forms a first two-dimensional image comprising these images, wherein each image of the first group of images is present in the first two-dimensional image; identifying a second group of images that forms a second two-dimensional image comprising these images, wherein each image of the second group of images is present in the second two-dimensional image; and encoding the first group of images and the second group of images as a video.
[0032] Preferably, each two-dimensional image comprises a plurality of tiles, wherein each tile comprises an image.
[0033] Preferably, the method comprises: arranging the first group of images so as to form the first two-dimensional image; and arranging the second group of images so as to form a second two-dimensional image.
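A minimal sketch of arranging a group of images into a single two-dimensional image of tiles is given below. It assumes that every tile is a list of pixel rows of identical size and that tiles are placed in row-major order of their index, with unused tile positions left as zeros; the placement order and the padding value are assumptions of the example, and other index arrangements are equally possible.

```python
from typing import List, Tuple

Tile = List[List[int]]  # assumed: a tile is a list of pixel rows


def tile_origin(index: int, columns: int,
                tile_w: int, tile_h: int) -> Tuple[int, int]:
    """Pixel (x, y) of the top-left corner of the tile with this index."""
    row, col = divmod(index, columns)
    return col * tile_w, row * tile_h


def build_atlas(tiles: List[Tile], columns: int) -> List[List[int]]:
    """Compose equally sized tiles into one two-dimensional image."""
    tile_h, tile_w = len(tiles[0]), len(tiles[0][0])
    rows = -(-len(tiles) // columns)  # ceiling division
    atlas = [[0] * (columns * tile_w) for _ in range(rows * tile_h)]
    for index, tile in enumerate(tiles):
        x, y = tile_origin(index, columns, tile_w, tile_h)
        for dy in range(tile_h):
            atlas[y + dy][x:x + tile_w] = tile[dy]
    return atlas
```

Two such two-dimensional images, one per group, could then be passed to a video encoder as successive frames.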
[0034] Preferably, each of the images of the first group of images and / or each of the images of the second group of images is associated with an index such that said image can be identified based on the index.
[0035] Preferably, each of the images of the first group of images and / or each of the images of the second group of images is associated with a point in a three-dimensional representation associated with the groups of images. Preferably, the point comprises a reference to an index of an image within one of the groups of images.
[0036] Preferably, the method comprises encoding the first group of images and the second group of images as a video using a video codec. Preferably, the encoding uses a MP4, HEVC, VVC, and / or LCEVC codec.
[0037] Preferably, the method comprises rearranging the first group of images and / or the second group of images so as to decrease a spatial difference within the first group of images and / or the second group of images, thereby allowing more efficient encoding of the video.
[0038] Preferably, the method comprises rearranging the second group of images so as to decrease a difference between the second group of images and the first group of images, thereby allowing more efficient encoding of the video.
[0039] According to another aspect of the present disclosure, there is described an apparatus for processing a group of images, the apparatus comprising: means for (e.g. a processor for) identifying a first group of images and a second group of images, wherein each image in each group of images is associated with an index; means for (e.g. a processor for) identifying a first image in the first group of images; means for (e.g. a processor for) determining a second image in the second group of images, the second image being similar to the first image; and means for (e.g. a processor for) updating an index of the second image based on an index of the first image. According to another aspect of the present disclosure, there is described an apparatus for encoding a group of images, the apparatus comprising: means for (e.g. a processor for) identifying a first group of images that forms a first two-dimensional image comprising these images, wherein each image of the first group of images is present in the first two-dimensional image; means for (e.g. a processor for) identifying a second group of images that forms a second two-dimensional image comprising these images, wherein each image of the second group of images is present in the second two-dimensional image; and means for (e.g. a processor for) encoding the first group of images and the second group of images as a video.
[0040] According to another aspect of the present disclosure, there is disclosed a bitstream comprising one or more groups of images determined and / or encoded using the aforesaid method.
[0041] According to another aspect of the present disclosure, there is disclosed a bitstream comprising one or more groups of images modified using the aforesaid method.
[0042] According to another aspect of the present disclosure, there is disclosed a bitstream comprising: one or more texture points of a three-dimensional representation, each texture point comprising a reference to an image in a group of images; a group of images; and a record of one or more changes made previously to the groups of images, the changes indicating a correspondence between an index contained in the texture point and an actual index of an image referenced by the texture point.
[0043] According to another aspect of the present disclosure, there is disclosed a bitstream comprising: a plurality of groups of images, wherein each group of images is in the form of a two-dimensional image comprising a plurality of component images, and wherein the two-dimensional images are encoded in the form of a video.
[0044] Preferably, the bitstream comprises a plurality of groups of images. Preferably, the images are encoded using video encoding processes. Preferably, the images are encoded using one or more of: AVC, HEVC, VVC, and LCEVC processes.
[0045] According to another aspect of the present disclosure, there is described an apparatus (e.g. an encoder) for forming and / or encoding the aforesaid bitstream.
[0046] According to another aspect of the present disclosure, there is described an apparatus (e.g. a decoder) for receiving and / or decoding the aforesaid bitstream.
[0047] Any feature in one aspect of the disclosure may be applied to other aspects of the invention, in any appropriate combination. In particular, method aspects may be applied to apparatus aspects, and vice versa.
[0048] Furthermore, features implemented in hardware may be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly.
[0049] Any apparatus feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory.
[0050] It should also be appreciated that particular combinations of the various features described and defined in any aspects of the disclosure can be implemented and / or supplied and / or used independently.
[0051] The disclosure also provides a computer program and a computer program product comprising software code adapted, when executed on a data processing apparatus, to perform any of the methods described herein, including any or all of their component steps.
[0052] The disclosure also provides a computer program and a computer program product comprising software code which, when executed on a data processing apparatus, comprises any of the apparatus features described herein. The disclosure also provides a computer program and a computer program product having an operating system which supports a computer program for carrying out any of the methods described herein and / or for embodying any of the apparatus features described herein.
[0053] The disclosure also provides a computer readable medium having stored thereon the computer program as aforesaid.
[0054] The disclosure also provides a signal carrying the computer program as aforesaid, and a method of transmitting such a signal.
[0055] The disclosure extends to methods and / or apparatus substantially as herein described with reference to the accompanying drawings.
[0056] The disclosure will now be described, by way of example, with reference to the accompanying drawings.
[0057] Description of the Drawings
[0058] Figure 1 shows a system for generating a sequence of images.
[0059] Figure 2 shows a computer device on which components of the system of Figure 1 may be implemented.
[0060] Figure 3 shows a method of determining a three-dimensional representation of a scene.
[0061] Figures 4a and 4b show a method of determining a point based on a plurality of sub-points.
[0062] Figure 5 shows a scene comprising a viewing zone.
[0063] Figures 6a and 6b show arrangements of capture devices for determining points of the three-dimensional representation.
[0064] Figure 7 shows a point that can be captured by a plurality of capture devices.
[0065] Figures 8a and 8b show grids formed by the different capture devices.
[0066] Figure 9 describes a method of determining a location of a point of the three-dimensional representation.
[0067] Figure 10 shows a method of determining an angle of a point from a capture device used to capture the point.
[0068] Figure 11 shows a method of determining a texture patch associated with a point of the three-dimensional representation.
[0069] Figures 12a, 12b, 12c, and 12d illustrate the determination and use of a texture patch.
[0070] Figures 13a and 13b each show a texture atlas that comprises a plurality of texture patches.
[0071] Figures 14a - 14d show a method of rearranging a second texture atlas based on a first texture atlas.
[0072] Figure 15 shows a method of iterating through texture patches of a first texture atlas so as to rearrange a second texture atlas based on these texture patches.
[0073] Figures 16a and 16b show a method of comparing texture patches of a second texture atlas to texture patches of a first texture atlas.
[0074] Figures 17a - 17d show possible scanning patterns for scanning through texture patches in a texture atlas.
[0075] Figure 18 shows a method of determining an attribute value for a texture point in a three-dimensional representation.
[0076] Figure 19 shows a schematic of a bitstream.
[0077] Figure 20 shows a method of encoding a plurality of groups of images, where each group of images comprises one or more images.
Description of the Preferred Embodiments
[0078] Referring to Figure 1, there is shown a system for generating a sequence of images. This system can be used to generate, and then display, a representation of an environment, which may comprise a VR environment (or an XR environment).
[0079] The system comprises an image generator 11, an encoder 12, a transmitter 13, a network 14, a receiver 15, a decoder 16 and a display device 17.
[0080] These components may each be implemented on separate apparatuses. Equally, various combinations of these components may be implemented on a shared apparatus; for example, the image generator 11, the encoder 12, and the transmitter 13 may all be part of a single image data generation device. Similarly, the receiver 15, the decoder 16, and the display device 17 may all be a part of a single image rendering device.
[0081] Typically, the system comprises at least one encoding computer device (e.g. a server of a content provider) and at least one rendering computer device (e.g. a VR headset).
[0082] Referring to Figure 2, each of the components, and in particular the image generator 11, the encoder 12, the transmitter 13, the receiver 15, the decoder 16 and the display device 17 is typically implemented on a computer device 20, where, as described above, a plurality of these components may be implemented on a shared computer device.
[0083] Each computer device comprises one or more of: a processor 21 for executing instructions (e.g. so as to perform one or more of the steps of the various methods described below), a communication interface 22 for facilitating communication between computer devices (e.g. an Ethernet interface, a Bluetooth® interface, or a universal serial bus (USB) interface), a memory 23 and / or storage 24 for storing information and instructions (e.g. a random access memory (RAM), a read only memory (ROM), a hard disk drive (HDD), a solid state drive (SSD), and / or a flash memory), and a user interface 25 (e.g. a display, a mouse, and / or a keyboard) for enabling a user to interact with the computer device. These components may be coupled to one another by a bus 25 of the computer device.
[0084] The computer device 20 may comprise further (or fewer) components. In particular, the computer device (e.g. the display device 17) may comprise one or more sensors, such as an accelerometer, a GPS sensor, or a light sensor. These sensors typically enable the computer device to identify an environmental condition and / or an action of a wearer of the display device.
[0085] Turning back to Figure 1, the image generator 11 is configured to generate a sequence of image data (e.g. a sequence of image frames) to enable the display device 17 to use this image data to display a plurality of images. The image data may comprise one or more digital objects and the image data may be generated or encoded in any format. For example, the image data may comprise point cloud data, where each point has a 3D position and one or more attributes. These attributes may, for example, include a surface colour, a transparency value, an object size and a surface normal direction. Each attribute may have a value chosen from a continuous range or may have a value chosen from a discrete set.
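Purely as an illustration of such point cloud data, a point with a 3D position and the example attributes listed above might be modelled as follows. The class name, field names, default values, and the choice of which attributes are discrete or continuous are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Point:
    """One point of a point cloud (hypothetical layout)."""
    position: Tuple[float, float, float]             # 3D position
    colour: Tuple[int, int, int] = (0, 0, 0)         # discrete: 0..255 per channel
    transparency: float = 0.0                        # continuous: 0.0 to 1.0
    size: float = 1.0                                # object size
    normal: Tuple[float, float, float] = (0.0, 0.0, 1.0)  # surface normal
```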
[0086] The image data enables the later rendering of images. This image data may enable a direct rendering (e.g. the image data may directly represent an image). Equally, the image data may require further processing in order to enable rendering. For example, the image data may comprise three-dimensional point cloud data, where rendering a two-dimensional image using this data requires processing based on a viewpoint of this two-dimensional image.
[0087] The image data may comprise depth map data, where one or more pixels or objects in the image is associated with a depth that is specified by the depth map data. The depth map data may be provided as a depth map layer, separate from an image layer. In some contexts, such as MPEG Immersive Video (MIV), the image layer may instead be described as a texture layer. Similarly, in some contexts, the depth map layer may instead be described as a geometry layer.
[0088] The image data may include a predicted display window location. The predicted display window location may indicate a portion of an image that is likely to be displayed by the display device 17. The predicted display window location may be based on a viewing position (such as a virtual position and / or orientation of the user in a 3D environment) of the user, where this viewing position may be obtained from the display device. The predicted display window location may be defined using one or more coordinates. For example, the predicted display window location may be defined using the coordinates of a corner or center of a predicted display window, and may be defined using a size of the predicted display window. The predicted display window location may be encoded as part of metadata included with the frame.
[0089] The image data for each image (e.g. each frame) may include further information, which may be provided as a part of an image, e.g. as part of the point cloud data, or as separate layers. In particular, the image data may include audio information or haptic feedback information indicating audio or haptics which can accompany displayed visual data. An audio layer or haptic layer may accompany each image, and may be omitted for images where no accompanying audio or haptics are required.
[0090] Similarly, the image data may comprise interactivity information, where the image data may contain or indicate elements with which a user can interact. The interactivity information may, for example, define a behaviour of an element, where a user is able to interact with the element based on this behaviour. The behaviour typically defines a change in an element that occurs as a result of a user interaction where this change may comprise a change in the attributes of the element or in the rendering of the element. As an example, where an image contains a target element, the target element may be arranged to disappear when a user interacts with this element, or to provide feedback indicating that the user has interacted with the target. This interactivity data may be provided as part of, or separately to, the image data.
[0091] The image data may indicate, or may be combinable with, a state of the virtual environment, a position of a user, or a viewing direction of the user. Here, the position and viewing direction may be physical properties of the user in the real-world, or position and viewing direction may also be purely virtual, for example being controlled using a handheld controller. The image generator 11 may, for example, obtain information from the display device 17 that indicates the position, viewing direction, or motion of the user. Equally, the image generator may generate image data such that it can later be combined with this position, viewing direction, or motion, where the image generator may generate a full scene which is only partially viewed by a user depending on the position of that user.
[0092] In some cases, the generated image may be independent of user position and viewing direction. This type of image generation typically requires significant computer resources such as a powerful GPU, and may be implemented in a cloud service, or on a local but powerful computer. For example, a cloud service (such as a Cloud Rendering Service (CRN)) may reduce the cost per-user and thereby make the image frame generation more accessible to a wider range of users. Here “rendering” refers at least to an initial stage of rendering to generate an image. Further rendering may occur at the display device 17 based on the generated image to produce a final image which is displayed.
[0093] The image generator 11 may, for example, comprise a rendering engine for initially rendering a virtual environment such as a game or a virtual meeting room.
[0094] The encoder 12 is configured to encode frames to be transmitted to the display device 17. The encoder may be implemented using executable software or may be implemented on specific hardware such as an ASIC. In some embodiments, the image generator 11 may transmit raw, unencoded, data through the network 14. However, such transmission typically leads to a high file size and requires a high bandwidth so that it is typically desirable to encode the data prior to the transmission. The encoder 12 may encode the image data in a lossless manner or may encode the data in a lossy manner. The encoder may apply inter-frame or intra-frame compression based on a currently-encoded frame and optionally one or more previously encoded frames. The encoder may be a multi-layer encoder, such as a low complexity enhancement video codec (LCEVC) enabled encoder.
[0095] Where the generated frames comprise depth map data, the encoder 12 may perform layered encoding on each instance of image data (e.g. each frame) to generate an encoded frame comprising a base depth map layer and an enhancement depth map layer. Encoding a depth map in this way may improve compression. In some applications, such as HDR video, depth maps are desirably highly detailed with a bit depth of up to twelve or fourteen bits, which is a significant increase in the data to be transmitted. As a result, providing ways to improve compression of the depth map can make more realistic depth map-based displays viable when performing rendering or transmission of rendered data in real-time. Furthermore, this type of layered encoding makes it easy to drop (and then pick back up) one or more of the layers, which provides flexibility and tools for bandwidth management.
[0096] Layered encoding is also helpful as the final decoder / user device (such as a user display device) can choose whether to process these extra layers. For example, in a non-layered approach, the best the end device (i.e. the receiver, decoder or display device associated with a user that will view the images) can do is determine that it does not have enough resources for a given quality (be it resolution, frame rate, inclusion of depth map) and then signal to the controller / renderer / encoder that it does not have enough resources. The controller then will send future images at a lower quality. In that alternative scenario, the end device still unfortunately has to process the higher quality data until the lower quality data arrives, if it can process the received images at all.
[0097] In some of the described embodiments, this situation is improved upon because when / if the end device determines for example that it does not have the processing capabilities to handle the highest level of quality, then it can drop and / or choose not to process certain layers. The end device may also signal to the controller that it needs a lower level of quality, but in the meantime the end device can process only the number of layers that it can handle. Therefore, the end device can react to conditions much more quickly.
[0098] In some cases, depth map data may be embedded in image data. In this case, the base depth map layer may be a base image layer with embedded depth map data, and the enhancement depth map layer may be an enhancement image layer with embedded depth map data.
[0099] Alternatively, when the generated images comprise a depth map layer separate from an image layer and multi-layer encoding is applied, the encoded depth map layers may be separate from the encoded image layers. This has the advantage that the encoded depth map layers can be dropped under some conditions while still retaining image layers that can be displayed (albeit with a lower level of realism). For example, the encoded depth map layers can be dropped by a transmitter or encoder when available communication resources are reduced, or can be dropped by an end device which lacks the processing resources to handle the highest level of quality.
[0100] Similarly, if some images comprise an audio base layer, a haptic feedback base layer, an audio enhancement layer or a haptic feedback enhancement layer, these can be processed or dropped flexibly.
[0101] Again similarly, if some images comprise an interactivity data base layer or an interactivity enhancement layer these can be processed or dropped flexibly. For example, certain interactions may only be possible where a threshold bandwidth is available, where complex interactions (e.g. those enabling a conversation with a digital object) may be disabled before less complex interactions (e.g. changing a pixel colour) are disabled.
[0102] Additionally or alternatively, where the image data comprises point cloud data, the encoder may apply a point cloud data encoding technique such as described in European patent application EP21386059.6, which is incorporated herein by reference. Such a point cloud encoder may act as a base encoder for a layered encoding technique such as LCEVC or VC-6. Notably LCEVC and VC-6 techniques encode and decode a layered signal, but are agnostic about the content type of data encoded in the signal. For example, the signal can include textures, video frames, geometry or depth data, meshes, point clouds, rendering attributes or physics engine attributes.
[0103] The transmitter 13 may be any known type of transmitter for wired or wireless communications, including an Ethernet transmitter or a Bluetooth transmitter.
[0104] The transmitter 13 may be configured to make decisions about how to transmit the image data, and / or may provide feedback to the encoder 12 or the image generator 11. For example, the transmitter may determine available communication resources (e.g. bandwidth) for transmitting image data, and may drop one or more layers from an encoded frame, or indicate to the image generator and / or encoder that image data should be generated and encoded with fewer layers, when insufficient bandwidth is available for transmission of all generated data. As specific examples, the transmitter may be configured to drop a depth map layer, an LCEVC enhancement layer, or a VC-6 enhancement layer from a frame when insufficient communication resources are available.
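By way of a non-limiting illustration, the layer-dropping decision of the transmitter 13 might be sketched as follows. The `Layer` class, the `select_layers` function, the bit-rate figures and the rule of dropping the last-listed droppable layer first are all assumptions made for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str        # e.g. "base", "enhancement", "depth_map" (hypothetical labels)
    bitrate: float   # hypothetical cost of the layer, in Mbit/s
    droppable: bool  # the base layer is assumed to be non-droppable


def select_layers(layers, available_bitrate):
    """Drop droppable layers, last-listed first, until the frame fits."""
    kept = list(layers)
    total = sum(layer.bitrate for layer in kept)
    for layer in reversed(layers):
        if total <= available_bitrate:
            break
        if layer.droppable:
            kept.remove(layer)
            total -= layer.bitrate
    return kept


# Hypothetical encoded frame: a base layer plus two optional layers.
frame = [
    Layer("base", 8.0, False),
    Layer("enhancement", 6.0, True),
    Layer("depth_map", 4.0, True),
]
```

In this sketch, `select_layers(frame, 15.0)` keeps the base and enhancement layers and drops the depth map layer. If even the non-droppable layers exceed the available bit rate, they are still returned, and the transmitter would then need to signal the encoder 12 or the image generator 11 as described above.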
[0105] The network 14 provides a channel for communication between the transmitter 13 and the receiver 15, and may be any known type of network such as a WAN or LAN or a wireless Wi-Fi or Bluetooth network. The network may further be a composite of several networks of different types. Many users only have access to a network with a bandwidth of 30MBps which can lead to latency jitter when streaming. The required bandwidth and the observed latency can be reduced by means of tactics such as forward-looking rendering and last-millisecond reprojection, which are enabled by improved compression.
[0106] The receiver 15 may be any known type of receiver for wired or wireless communications, including an Ethernet receiver or a Bluetooth receiver.
[0107] The decoder 16 is configured to receive and decode an encoded frame. The decoder may be implemented using executable software or may be implemented on specific hardware such as an ASIC.
[0108] The display device 17 may for example be a television screen or a VR headset. The timing of the display may be linked to a configured frame rate, such that the display device may wait before displaying the image. The display device may be configured to perform warping, that is, to obtain a final display window location, adjust a warpable image to obtain a final image corresponding to a final viewing direction of the user, and display the final image.
[0109] In this regard, the image data is typically arranged to provide a warpable image for which a portion of the image that is displayed at the display device 17 is dependent on a position or orientation of a viewer. The warpable image may then be rendered before a most up to date viewing direction of the user is known. The warpable image may be transmitted to the display device, or the warpable image may be transmitted to a rendering node which is near to the display device, and the display device or rendering node may perform time warping to generate a displayed image portion based on the warpable image and the most up to date viewing direction of the user.
[0110] As mentioned above, a single device may provide a plurality of the described components. For example, a first rendering node may comprise the image generator 11, encoder 12 and transmitter 13. Additional similar rendering nodes may be included in the system, and may work together to generate the sequence of frames.
[0111] In one case, multiple rendering nodes may each provide separate image data to an image data assembling node; for example, each rendering node may provide a part of a sequence of frames to a frame assembling node. For example, the receiver 15, decoder 16 or display device 17 may be configured to assemble parts of image data from multiple sources to generate a sequence of images for display on the display device.
[0112] Alternatively, the image data assembling node may be separate from the receiver 15, decoder 16 and display device 17.
[0113] Additionally or alternatively, multiple rendering nodes may be chained. In other words, successive rendering nodes may add to a sequence of image data as it passes from rendering node to rendering node, and eventually a complete sequence of image data is then provided to the receiver 15. Furthermore, each rendering node may obtain components of a render from multiple upstream rendering nodes and / or distribute components of a render to multiple downstream rendering nodes.
[0114] A chain of rendering nodes may be useful for performing different rendering tasks that require different quantities of processing resources, or different frame rates. For example, a company may provide distributed processing in the form of a centralised hub which has abundant processing resources but is distant from users, and peripheral locations which have more scarce processing resources but are closer to users. Expensive but fairly static rendering features such as background lighting or environmental impact on sound may be generated at the central hub (for example using ray tracing), while features that require fewer resources but faster responses or higher frame rates may be generated closer to the user. In other words, the more responsive a rendering feature needs to be, the lower latency it needs between the rendering node which generates the feature and the user display and, in a chain of rendering nodes, the node which generates each rendering feature can be chosen based on a required maximum latency of that feature. On the other hand, if it is expensive to generate a rendering feature, then it may be preferable to generate the feature less frequently and with a higher maximum latency. For example, a static, high-quality background feature may be generated early in the chain of rendering nodes and a dynamic, but potentially lower-quality, foreground feature may be generated later in the chain of rendering nodes, closer to the user device. Here, environmental impact on sound means, for example, a set of surfaces may be constructed where each surface has different sound reflection and absorption properties depending upon material and shape. The frame rates may be matched by creating multiple frames with features generated at the lower frame rate, and combining them with the frames with features generated at the higher frame rate.
In a nonlimiting embodiment, a preliminary rendering generates volumetric object data including motion vectors at a first (lowest) frame rate, then produces 2D rendered frames plus depth information for a specific user at a second (higher) frame rate, then transmits video plus depth data to the user device, which produces final frames for display via space warping (depth-based reprojections) at a third (highest) frame rate. One or more of these steps may be performed in combination with the other described embodiments. The viewing position of the user may change as additional rendering tasks are performed at different rendering nodes in the chain. Each or any rendering node may obtain an updated viewing position before performing its respective rendering task.
[0115] Additionally, the system may simultaneously generate multiple sequences of image data for different respective users or different respective display devices. For example, in the context of a VR or AR experience, each user or display device may view a different 3D environment, or may view different parts of a same 3D environment. When using a chain of rendering nodes, each node may serve multiple users or just one user.
[0116] For example, a starting rendering node (e.g. at a centralised hub) may serve a large group of users. For example, the group of users may be viewing nearby parts of a same 3D environment. In this case, the starting node may render a wide zone of view (“field of view”) which is relevant for all users in the large group.
[0117] The starting node may send this wide field of view to a first middle rendering node which renders additional aspects of the 3D environment. These additional aspects may for example be aspects which require less processing power to render, or may be aspects which are specific to individual users of the group. Additionally, the middle rendering node may render features in a smaller field of view than the starting node - this smaller field of view may be relevant to each user rather than the group of users. The first middle rendering node may additionally only serve a smaller number of users (e.g. half of the large group of users), with the remaining users being served by a second middle rendering node which also receives the wide field of view from the starting node.
[0118] The middle rendering node(s) may then send sequences of second partially or fully rendered frames to an end device for each user. The end device may perform further processes such as warping or focal distance adjustments, optionally using depth map data.
[0119] Preferably, each rendering node encodes the partially or fully rendered frames before transmitting them on to a next rendering node or to the receiver 15. This means that the required communication resources can be reduced when the rendering nodes are separated by one or more networks, or more generally are implemented in a distributed system such as a cloud.
[0120] However, each rendering node in a chain is encoding a different partially or fully rendered frame, with different data. Therefore, it may be advantageous for different rendering nodes to use different rendering formats and / or encoding formats. For example, the output from a first rendering node may be point cloud data which logically describes a 3D scene. This point cloud data can be encoded using the techniques of EP21386059.6. A second rendering node may then operate on the point cloud data to generate image data that is more readily displayed by a generic display device, without requiring the display device to model the 3D environment. This image data may be encoded using video coding techniques.
[0121] The chaining of rendering nodes may be extended to arbitrary tree structures, where a rendering node obtains partially rendered frames from more than one preceding rendering node, and generates further partially or fully rendered frames based on the multiple obtained sequences of partially rendered frames.
[0122] For example, a content rendering network (CRN) comprising numerous rendering nodes may be used to serve a volumetric event to a large number of same-time users, such as users participating in a shared virtual environment. Rendering the same event for each user is far more expensive in terms of computation time and power consumption than rendering the volumetric effect once and performing the rendering equivalent of multicasting the volumetric effect for multiple users. For example, each user may have a second rendering node (such as a VR headset), and the network may comprise a central first rendering node. The first rendering node may render the volumetric event, and distribute partially rendered frames depicting the volumetric event to the different second rendering nodes. The second rendering node for each user may then integrate the partially rendered frames depicting the volumetric event into a view of the virtual environment which is currently being shown to each user, based on parameters such as the user’s virtual position.
[0123] The receiver 15, decoder 16 and display device 17 may be consolidated into a single device, or may be separated into two or more devices. For example, some VR headset systems comprise a base unit and a headset unit which communicate with each other. The receiver 15 and decoder 16 may be incorporated into such a base unit.
[0124] In some embodiments, the network 14 may be omitted. For example, a home display system may comprise a base unit configured as an image source, and a portable display unit comprising the display device 17.
[0125] In the event that the decoder 16 or the display device 17 does not or cannot handle one or more layers, the receiver 15 or another transmitter associated with the decoder or display device may send a corresponding layer drop indication back through the network 14. The layer drop indication may be received by each rendering node. A rendering node which generates partially or fully rendered frames for that specific decoder or display device may cease generating the dropped layer. On the other hand, a rendering node which generates partially or fully rendered frames for multiple end devices may disregard a layer drop indication received from one end device (as the dropped layer is still needed for other devices). Alternatively, rendering nodes which serve multiple end devices may record received layer drop indications, and may cease generating the dropped layer only when all end devices served by the rendering node indicate that the layer is to be dropped.
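As a minimal sketch of the last alternative above, a rendering node serving several end devices might record layer drop indications and stop generating a layer only once every served device has asked for it to be dropped. The class name `RenderingNode`, its method names and the device identifiers are hypothetical and serve only to illustrate the bookkeeping.

```python
class RenderingNode:
    """Tracks layer drop indications for the end devices it serves."""

    def __init__(self, served_devices):
        self.served_devices = set(served_devices)
        # Maps a layer name to the set of devices that asked to drop it.
        self.drop_requests = {}

    def receive_drop_indication(self, device, layer):
        # Indications from devices this node does not serve are ignored.
        if device in self.served_devices:
            self.drop_requests.setdefault(layer, set()).add(device)

    def should_generate(self, layer):
        # Keep generating the layer until all served devices have dropped it.
        return self.drop_requests.get(layer, set()) != self.served_devices


node = RenderingNode(["headset_a", "headset_b"])
node.receive_drop_indication("headset_a", "depth_map")
still_needed = node.should_generate("depth_map")   # True: headset_b still uses it
node.receive_drop_indication("headset_b", "depth_map")
now_needed = node.should_generate("depth_map")     # False: all devices dropped it
```

A node serving a single device reduces to the simpler case described above, in which the layer ceases to be generated as soon as that device sends its indication.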
[0126] In preferred examples, the encoders or decoders are part of a tier-based hierarchical coding scheme or format. Hierarchical coding enables frames to be communicated with higher resolution and / or higher frame rate than is possible in single-tier coding schemes. In hierarchical coding, one or more enhancement layers are communicated with base data, where the enhancement layers can be used to up-sample the base data at the decoder, for example providing up-sampling in a spatial or temporal dimension. When combined with equivalent down-sampling of the original frames and generation of the enhancement layer at an encoder, hierarchical coding can overall provide lossless compression of data, with higher resolution and / or higher frame rate for a given transmission bit rate. Examples of a tier-based hierarchical coding scheme include LCEVC: MPEG-5 Part 2 LCEVC (“Low Complexity Enhancement Video Coding”) and VC-6: SMPTE VC-6 ST-2117, the former being described in PCT / GB2020 / 050695, published as WO 2020 / 188273, (and the associated standard document) and the latter being described in PCT / GB2018 / 053552, published as WO 2019 / 111010, (and the associated standard document), all of which are incorporated by reference herein. However, the concepts illustrated herein need not be limited to these specific hierarchical coding schemes.
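The general principle of a base layer plus a residual enhancement layer can be illustrated with a deliberately simplified, one-dimensional sketch. This is not the LCEVC or VC-6 algorithm: the decimation by two, the nearest-neighbour up-sampling and the function names are assumptions chosen only to show that down-sampling at the encoder, combined with residuals against the up-sampled base, allows exact reconstruction at the decoder.

```python
def downsample(signal):
    """Keep every second sample (a stand-in for the base layer)."""
    return signal[::2]


def upsample(base, length):
    """Nearest-neighbour up-sampling back to the original length."""
    return [base[i // 2] for i in range(length)]


def encode(signal):
    """Return the base layer and the residuals forming the enhancement layer."""
    base = downsample(signal)
    prediction = upsample(base, len(signal))
    residuals = [s - p for s, p in zip(signal, prediction)]
    return base, residuals


def decode(base, residuals):
    """Up-sample the base layer and add the residuals."""
    prediction = upsample(base, len(residuals))
    return [p + r for p, r in zip(prediction, residuals)]


original = [10, 12, 20, 21, 30]
base, residuals = encode(original)   # base == [10, 20, 30]
restored = decode(base, residuals)   # identical to the original
```

A decoder that receives only `base` can still present a lower-resolution signal, which mirrors the way enhancement layers can be dropped under the conditions described earlier.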
[0127] A further example is described in WO2018 / 046940, which is incorporated by reference herein. In this example, a set of residuals are encoded relative to the residuals stored in a temporal buffer.
[0128] LCEVC (Low-Complexity Enhancement Video Coding) is a standardised coding method set out in standard specification documents including the Text of ISO / IEC 23094-2 Ed 1 Low Complexity Enhancement Video Coding published in November 2021, which is incorporated by reference herein.
[0129] The system described above is suitable for generating and presenting a representation of a scene, where this scene displays media content to a user. The scene typically comprises an environment, where the user is able to move (e.g. to move their head or to turn their head) to look around the environment and / or to move around the environment. For example, the scene may be a scene of a room in a building, where the user is able to move around the room (e.g. by moving in the real world and / or by providing an input to a user interface) in order to inspect various parts of the room. Typically, the scene is an XR (e.g. a VR) scene, where the user is able to move about the scene in three degrees of freedom (3DoF) or six degrees of freedom (6DoF) so as to experience the scene.
[0130] As has been described with reference to Figure 1, the image generator 11 may be arranged to determine point cloud data, where each point of the point cloud has a 3D position and one or more attributes. More generally, the image generator (or another component) is arranged to determine a three-dimensional representation of a scene, where this three-dimensional representation is thereafter used to generate two-dimensional images that are presented to a user at the display device 17.
[0131] While the points are typically points of a point cloud, more generally the disclosure extends to any point that is associated with a location and a value. Therefore, the points may, more generally, be considered to be data (or datapoints), which data is associated with a location and a value, and the ‘points’ may comprise polygons, planes (regular or irregular), Gaussian splats, etc.
[0132] Referring to Figure 3, there is described a method of determining (an attribute for) a point of such a three-dimensional representation. The method comprises determining the attribute using a capture device, such as a camera or a scanner. The scene may comprise a real scene, in which attribute values are captured using a camera, or a virtual scene (e.g. a three-dimensional model of a scene), in which attribute values are captured using a virtual scanner. Where this disclosure describes ‘determining a point’ it will be understood that this generally refers to determining a point that has a location and an attribute value, where determining the point comprises determining the attribute value and / or storing a point that comprises at least an attribute value and a location value (these values may be indirect values, e.g. where the location is identified relative to another point). Once a plurality of points have been captured, these points can be stored as a three-dimensional representation (e.g. a point cloud) so as to enable the reconstruction of the three-dimensional scene based on this representation.
[0133] Typically, the scene comprises a simulated scene that exists only on a computer. Such a scene may, for example, be generated using software such as the Maya software produced by Autodesk®. The attributes determined using the methods described herein may then depend on virtual objects located within the scene as well as a virtual lighting arrangement used in the scene.
[0134] In a first step 31, a computer device initiates a capture process for a capture device, the capture process being initiated with an initial azimuth angle (e.g. of 0°) and an initial elevation angle (e.g. of 0°).
[0135] In a second step 32, the computer device causes a point to be captured using the capture device at the current azimuth angle and current elevation angle. Capturing a point typically comprises assigning an attribute value to the point, which attribute value may, for example, be a color of the point and / or a transparency value of the point. Typically, the point has one or more color values associated with each of a left eye and a right eye of a viewer. Capturing the point may also comprise determining a normal value associated with the point, e.g. a normal of a surface on which the point lies. Typically, capturing the point further comprises determining a location of the point, e.g. by determining a distance of the point from the camera.
[0136] In practice, determining the point may comprise sending a ‘ray’ from the capture device and then stepping through a computer model to determine which surface of the computer model is impacted by the ray. The color, transparency, and normal of this surface are then recorded alongside the distance of the surface from the capture device.
[0137] In a third step 33, the computer device determines whether a point has been captured for the capture device at each azimuth of a range of azimuths and, in a fourth step 34, if points have not been captured at each azimuth, then the azimuth angle is incremented and the method returns to the second step 32 and another point is captured. The azimuth angle may, for example, be incremented by between 0.01° and 1° and / or by between 0.025° and 0.1°. Typically, the range of azimuth angles is selected to be 360° (i.e. so that the capture device captures points surrounding the entirety of the capture device), but it will be appreciated that other ranges are possible.
[0138] Once a point has been captured for each azimuth, in a fifth step 35, the computer device determines whether a point has been captured for the capture device at each elevation of a range of elevations and, in a sixth step 36, if points have not been captured at each elevation, then the azimuth angle is reset to the initial value, the elevation angle is incremented and the method returns to the second step 32 and another point is captured. The elevation angles may, for example, be incremented by between 0.01° and 1° and / or by between 0.025° and 0.1°. Typically, the range of elevation angles is selected to be 360° (i.e. so that the capture device captures points surrounding the entirety of the capture device), but it will be appreciated that other ranges are possible.
[0139] In a seventh step 37, once points have been captured for each azimuth angle and each elevation angle, the scanning process ends.
[0140] This method enables a capture device to capture points at a range of elevation and azimuth angles. This point data is typically stored in a matrix. The point data may then be used to provide a representation of the scene to a user, e.g. the three-dimensional representation formed by the point data may be processed to produce two-dimensional images for each eye of a user, with these images then being shown to a user via the display device 17 to provide a virtual reality experience to the viewer. By using the captured data, a video can be provided to a viewer that enables the viewer to move their head to look around the scene (while remaining at the location of the capture device).
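The capture process of steps 31 to 37 can be summarised, purely as an illustrative sketch, by two nested loops that fill a matrix of points. The callback `capture_point`, the function name `capture_scan` and the coarse 90° increments used in the usage lines are assumptions for this example; a real capture would use the much finer increments mentioned above and would record attributes and a distance for each point.

```python
def capture_scan(capture_point, az_step, el_step, az_range=360.0, el_range=360.0):
    """Capture one point per (azimuth, elevation) pair and return a matrix.

    capture_point is a hypothetical callback taking (azimuth, elevation)
    in degrees and returning the captured point data.
    """
    # Integer step counts avoid accumulating floating-point error.
    n_az = round(az_range / az_step)
    n_el = round(el_range / el_step)
    matrix = []
    for j in range(n_el):                 # steps 35 and 36: next elevation
        elevation = j * el_step
        row = []
        for i in range(n_az):             # steps 33 and 34: next azimuth
            azimuth = i * az_step
            row.append(capture_point(azimuth, elevation))  # step 32
        matrix.append(row)                # azimuth restarts for the next row
    return matrix                         # step 37: scan complete


# Usage with a dummy capture that simply records the angles.
grid = capture_scan(lambda az, el: (az, el), az_step=90.0, el_step=90.0)
```

Each row of the returned matrix corresponds to one elevation angle and each column to one azimuth angle, so `grid[1][2]` holds the point captured at an azimuth of 180° and an elevation of 90°.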
[0141] It will be appreciated that the capture pattern (or scanning pattern) described with reference to Figure 3 is purely exemplary and that numerous capture patterns are possible. In general, the capture process for each capture device comprises capturing one or more points at one or more azimuth angles and / or one or more elevation angles.
[0142] The ‘points’ captured by the capture device are typically associated with a size, such as a height, a width, or a depth. That is, the points typically relate to two-dimensional planes / pixels and / or three-dimensional voxels. In this regard, there is necessarily some space between the locations of adjacent points (since if the points had no width, then an infinite number of points would be required to capture points at each angle). The size provides points that depict a non-negligible area of the three-dimensional space so that a plurality of points can be fit together to provide a depiction of the scene to a viewer.
[0143] The width and height of each point is typically dependent on the distance of that point from the capture device, where more distant points have a larger width / height. The width and height of each point is typically determined so that when each point is displayed, there is no space between adjacent points (indeed, there may be some overlap between points to ensure that no gaps appear between points). This height / width of each point can be determined at the time of capturing the points, or can be determined or defined after the capture of the points.
[0144] Typically, the points comprise a size value, which is stored as a part of the point data. For example, the points may be stored with a width value and / or a height value. Typically, the minimum width and the minimum height of a point are set by the angle increment of the azimuth angle and the elevation angle respectively. The size may then be specified in terms of this angle increment and / or in terms of this minimum width / minimum height (e.g. as being a multiple of the angle increment). In some embodiments, the size value is stored as an index, which index relates to a known list of sizes (e.g. if the size may be any of 1x1, 2x1, 1x2, or 2x2 pixels, this may be specified by using 3 bits and a list that relates each combination of bits to a size). The size may be stored based on an underscan value. In this regard, where an object is very near to the viewing zone it may be captured using an unnecessarily dense arrangement of points. Therefore, certain surfaces or areas of the representation may be associated with an underscan value, which underscan value defines a reduction in the number of points captured as compared to a representation without underscan. The size of the points may be defined so as to indicate this underscan value. In an exemplary embodiment, the underscan value is an integer value between 0 and 3 and the size is stored as a combination of point dimensions (e.g. a width in the range [0,2] and a height in the range [0,2]) and an underscan factor (e.g. an underscan factor in the range [0,3]).
[0145] In some embodiments, the width and the height are dependent on the underscan factor. For example, when the underscan factor exceeds a threshold value, the possible height and width values may be limited. In a specific example, when the underscan factor is 3, the width and the height may be limited to the range [0,1]. The size may then be defined as size = underscan*9 + height*3 + width. Such a method provides efficient storage and indication of width, height, and underscan values.
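The packing size = underscan*9 + height*3 + width can be checked with a short sketch. The function names `encode_size` and `decode_size` are hypothetical, and the validation reflects only the specific example above (width and height in [0,2], limited to [0,1] when the underscan factor is 3).

```python
def encode_size(width, height, underscan):
    """Pack width, height and underscan into a single size index."""
    if not 0 <= underscan <= 3:
        raise ValueError("underscan must be in [0, 3]")
    # In the specific example, an underscan factor of 3 limits width/height to [0, 1].
    limit = 1 if underscan == 3 else 2
    if not (0 <= width <= limit and 0 <= height <= limit):
        raise ValueError("width/height out of range for this underscan factor")
    return underscan * 9 + height * 3 + width


def decode_size(size):
    """Recover (width, height, underscan) from a size index."""
    underscan, rest = divmod(size, 9)
    height, width = divmod(rest, 3)
    return width, height, underscan


largest = encode_size(1, 1, 3)   # 3*9 + 1*3 + 1 = 31
```

Under these limits the largest index is 31, so every valid combination fits in five bits, which is consistent with the statement that the method provides efficient storage of the three values.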
[0146] As shown in Figure 4a, typically, for each capture step (e.g. each azimuth angle and / or each elevation angle), a plurality of sub-points SP1, SP2, SP3, SP4, SP5 is determined. For example, where the azimuth angle increment is 0.1°, then for an azimuth angle of 0°, sub-points may be determined at azimuth angles of -0.05°, -0.025°, 0°, 0.025°, and 0.05° (and similar sub-points may be determined for a plurality of elevation angles). Attribute values of these sub-points may then be combined to obtain an attribute value for the point. For example, a maximum attribute value of the sub-points may be used as the value for the point, an average attribute value of the sub-points may be used as the value for the point, and / or a weighted average of the sub-points may be used as the value for the point. It will be appreciated that numerous other methods for combining the attribute values of the sub-points are possible.
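A minimal sketch of the sub-point arrangement and of the three combination options named above is given below. The function names, the even spacing of sub-points across one angle increment and the scalar attribute values are assumptions made for illustration; real attributes such as colours would typically be combined per channel.

```python
def sub_point_angles(nominal, increment, count=5):
    """Evenly spaced sub-point angles spanning one increment around nominal."""
    half = increment / 2
    step = increment / (count - 1)
    return [nominal - half + k * step for k in range(count)]


def combine_attributes(values, mode="mean", weights=None):
    """Combine sub-point attribute values into a single point attribute."""
    if mode == "max":
        return max(values)
    if mode == "mean":
        return sum(values) / len(values)
    if mode == "weighted":
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"unknown mode: {mode}")


# Sub-point angles for an azimuth of 0 degrees and an increment of 0.1 degrees:
# approximately -0.05, -0.025, 0, 0.025 and 0.05 degrees.
angles = sub_point_angles(0.0, 0.1)

# Hypothetical scalar attribute values for SP1 to SP5.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_value = combine_attributes(values)
max_value = combine_attributes(values, mode="max")
weighted_value = combine_attributes(values, mode="weighted", weights=[1, 1, 2, 1, 1])
```

Only the combined value is stored for the point, so the sub-points improve the captured attribute without increasing the amount of stored data.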
[0147] By determining the attribute of a point based on the attributes of sub-points, the accuracy of the capture process can be increased. While it would be possible to simply reduce the increment of the angle steps to provide a higher resolution scene, by considering sub-points but only storing attributes for points, a balance can be struck between accuracy and file size (since storing every sub-point would lead to a substantial increase in the amount of data that needs storing).
[0148] With the example of Figure 4a, for each point of the three-dimensional representation that is captured by a capture device, this capture device may obtain attributes associated with each of the sub-points SP1, SP2, SP3, SP4, SP5, combine these attributes to obtain a point attribute, and then store a point with a distance that is an average (e.g. a weighted average) of the distances of the sub-points from the capture device, at the nominal angle of the point, with the point attribute.
[0149] As shown in Figure 4b, where a plurality of sub-points SP1, SP2, SP3, SP4, SP5 are considered, these points may have different distances from the location of the capture device. In some embodiments, the attributes of the sub-points may be combined in dependence on this distance, e.g. so that sub-points nearer to the capture device have higher weightings.
[0150] However, the possibility of sub-points with substantially different distances raises a potential problem. Typically, in order to determine a distance for a point, the distances for the sub-points are averaged. But where the sub-points have substantially different distances and / or are related to different surfaces in the scene, this may result in the point having a distance that does not correspond to any actual surface in the scene. Therefore, the point may seem to hang in space (e.g. to hang between the front and rear surfaces shown in Figure 4b).
[0151] Similarly, where the attribute values of the sub-points greatly differ, e.g. if the sub-points SP1 and SP2 are white in colour and the sub-points SP3 and SP4 are black in colour, then the attribute value of the point may be substantially different to the attribute value of other points in the scene. In an example, if the scene were composed of black and white objects, the point may appear as a grey point hanging in space between these objects.
[0152] In some embodiments, the computer device is arranged to aggregate sub-points so as not to create any floating points. For example, the computer device may determine whether the sub-points are spatially coherent by employing a clustering algorithm (e.g. a k-means clustering algorithm). Where the sub-points are spatially coherent (e.g. where a difference in the distance of the sub-points is below a threshold value), these distances may be averaged to obtain a distance for the point. Where the sub-points are not spatially coherent, the sub-points may be processed to ensure that the distance of any point places it upon a surface; for example, in the system of Figure 4b, sub-points SP1, SP2, and SP3 may be grouped into a first point and sub-points SP4 and SP5 may be grouped into a second point. Since each sub-point is associated with the same capture device and capture angle (all of these sub-points being associated with a capture step that has a particular azimuth angle and elevation angle), these points may be located at the same angle with respect to a capture device. Therefore, to ensure that each sub-point affects the representation considered, the first point (made up of sub-points SP1, SP2, and SP3) may have a smaller distance value than the second point (made up of sub-points SP4 and SP5) and the first point may be assigned a nonzero transparency value so that the second point can be seen through the first point.
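As a simplified illustration of this aggregation, the sketch below groups sub-point distances using a gap threshold rather than the k-means clustering mentioned above; this substitution, the function name `aggregate_sub_points`, the dictionary layout and the transparency value of 0.5 are all assumptions made for the example.

```python
def aggregate_sub_points(distances, threshold, transparency=0.5):
    """Group sub-point distances into one point per coherent surface.

    Sub-points are sorted by distance and a new group is started whenever
    the gap to the previous sub-point exceeds the threshold. Each group
    becomes a point at the group's mean distance. Every point except the
    farthest is given a nonzero transparency so that points behind it at
    the same capture angle remain visible.
    """
    if not distances:
        return []
    ordered = sorted(distances)
    groups = [[ordered[0]]]
    for d in ordered[1:]:
        if d - groups[-1][-1] <= threshold:
            groups[-1].append(d)
        else:
            groups.append([d])
    points = []
    for index, group in enumerate(groups):
        is_farthest = index == len(groups) - 1
        points.append({
            "distance": sum(group) / len(group),
            "transparency": 0.0 if is_farthest else transparency,
        })
    return points


# Hypothetical distances for SP1 to SP5: three on a front surface, two on a rear surface.
points = aggregate_sub_points([2.0, 2.0, 2.0, 8.0, 8.0], threshold=1.0)
```

With these values two points are produced, at distances 2.0 and 8.0, rather than a single point at the overall mean of 4.4, which would lie on neither surface.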
[0153] By capturing points at a plurality of azimuth angles and elevation angles, e.g. using the method described with reference to Figure 3, it is possible to provide a three-dimensional representation of the scene that can later be used to enable a viewer to view the scene from a plurality of angles. More specifically, given the three-dimensional points captured by the capture device, a computer device is able to render a two-dimensional representation (e.g. a two-dimensional image) of the scene for each eye of a viewer so as to provide a representation with an impression of depth. The computer device may render a series of two-dimensional representations to enable the viewer to look around the scene, where the two-dimensional representations are rendered based on an orientation of the viewer’s head. In this way, the determined representation is useable to provide, for example, a virtual reality (VR), mixed reality (MR), augmented reality (AR), and / or extended reality (XR) experience to the viewer.
[0154] To enable such a display, the display device 17 is typically a virtual reality headset that comprises a plurality of sensors to track a head movement of the user. By tracking this head movement, the display device is able to update the images being displayed to the viewer as the viewer moves their head to look about the scene. Typically, this involves the display device sending the sensor data to an external computer device (e.g. a computer connected to the display device via a wire). The external computer device may comprise powerful graphical processing units (GPUs) and / or central processing units (CPUs) so that the external computer device is able to rapidly render appropriate two-dimensional images for the viewer based on the three-dimensional images and the sensor data.
[0155] In some embodiments, the external computer device may comprise a server device, where the display device 17 may be connected to this server device wirelessly. This enables the two-dimensional images to be streamed from the server to the display device so as to enable the display of high-quality images without the need for a viewer to purchase expensive computer equipment. In other words, operations that require large amounts of computing power, such as the rendering of two-dimensional images based on the three-dimensional representation, may be performed by the server, so that the display device is only required to perform relatively simple operations. This enables the experience to be provided to a wide range of viewers.
[0156] In some embodiments, a first two-dimensional image is provided to the display device 17 (and / or a connected device) and this first image is ‘warped’ in order to provide an image for viewing at the display device. The warping of the image comprises processing the image based on the sensor data in order to provide an image that matches a current viewpoint of the viewer. By performing the warping at the display device or another local device, the lag between a head movement of the user and an updating of the two-dimensional representation of the scene can be reduced.
[0157] One issue with the above-described method of capturing a three-dimensional representation is that it only enables a viewer to make rotational movements. That is, since the points are captured using a single capture device at a single capture location, there is no possibility of enabling translational movements of a viewer through a scene. This inability to move translationally can induce motion sickness within a viewer, can reduce a degree of immersion of the viewer, and can reduce the viewer’s enjoyment of the scene.
[0158] Therefore, it is desirable to enable translational movements through the scene. To enable such movements, the three-dimensional representation of the scene may be captured using a plurality of capture devices placed at different locations (or the same capture device placed at different locations). A viewer is then able to move around the scene translationally (e.g. by moving between these locations).
[0159] More generally, by capturing points for every possible surface that might be viewed by a viewer, a three-dimensional representation of a scene may be captured that allows a suitable two-dimensional representation of this scene to be rendered regardless of a location of a viewer (e.g. regardless of where a user is standing within a virtual room).
[0160] This need to capture points for every possible surface (so as to enable movement about a scene) greatly increases the amount of data that needs to be stored to form the three-dimensional representation.
[0161] Therefore, as has been described in the application WO 2016 / 061640 A1, which is hereby incorporated by reference, the three-dimensional representation may be associated with a viewing zone, or a zone of viewpoints (ZVP), where the three-dimensional representation is arranged to enable a user to move about the viewing zone so as to view the scene.
[0162] Figure 5 illustrates such a viewing zone 1 and illustrates how the use of a viewing zone limits the amount of image data that needs to be stored to provide a three-dimensional representation of the scene. With the scene shown in this figure, and the viewing zone 1 shown in this figure, it is not necessary to determine attribute data for the occluded surface 2 since this occluded surface cannot be viewed from any point in the viewing zone. Therefore, by enabling the user to only move within the viewing zone (as opposed to around the whole scene) the amount of data needed to depict the scene is greatly reduced.
[0163] While Figure 5 shows a two-dimensional viewing zone, it will be appreciated that in practice the viewing zone 1 is typically a three-dimensional zone or volume.
[0164] The viewing zone 1 may, for example, comprise a rectangular volume, or a rectangular parallelepiped, and the viewing zone may have a height of at least 30 cm, a depth of at least 30 cm, and / or a width of at least 30 cm, where these dimensions enable a user to move their head while remaining in the viewing zone. This is merely an exemplary arrangement of the viewing zone; it will be appreciated that viewing zones of various shapes and sizes may be used (e.g. spherical viewing zones). That being said, it is preferable that the viewing zone is limited so as to cover only a part of the volume of the scene, e.g. no more than 50% of the scene, no more than 25% of the scene, and / or no more than 10% of the scene. In this regard, if the viewing zone is the same size as the scene, then the three-dimensional representation will simply be a standard representation for virtual reality (that enables a user to move freely about the scene) - and so the use of the viewing zone will not provide any reduction in file size.
[0165] The viewing zone 1 enables movement of a viewer around (a portion of) the scene. For example, where the scene is a room, the base representation may enable a user to walk around the room so as to view the room from different angles. In particular, the viewing zone enables a user to move through the scene with six degrees of freedom (6DoF), where this aids in the provision of an immersive experience.
[0166] In some embodiments, the viewing zone 1 may be four-dimensional, where a three-dimensional location of the viewing zone changes over time - and in such embodiments the size and location of the occluded surface 2 may also change over time. More generally, it will be appreciated that viewing zones may be formed in any size or shape, with different sizes and shapes being suitable for different scenes.
[0167] The volume of the viewing zone 1 is typically selected so that a user is able to move to a degree sufficient to avoid motion sickness and to provide an immersive sensation, while still only enabling a limited amount of movement (where this leads to a smaller file size as compared to an implementation where a user is able to fully move about the scene). Typically, the viewing zone is arranged to enable a user to move their head while they are sitting or standing, but not to freely roam around a room.
[0168] The viewing zone 1 may have a (e.g. real-world) volume of less than five cubic metres (5 m³), less than one cubic metre (1 m³), less than one-tenth of a cubic metre (0.1 m³) and / or less than one-hundredth of a cubic metre (0.01 m³).
[0169] The viewing zone 1 may also have a minimum size, e.g. the viewing zone may have a volume of at least 1% of the volume of the scene, at least 5% of the volume of the scene, and / or at least 10% of the volume of the scene. Similarly, the viewing zone may have a volume of at least one-thousandth of a cubic metre (0.001 m³); at least one-hundredth of a cubic metre (0.01 m³); and / or at least one cubic metre (1 m³).
[0170] The ‘size’ of the viewing zone 1 typically relates to a size in the real world, where if the viewing zone has a length of one metre this means that a user is able to move one metre in the real world while staying within the viewing zone. The size of the viewing zone in the scene may be greater than, equal to, or less than the size of the viewing zone in the real world. For example, the viewing zone may scale a real-world distance so that moving one metre in the real world moves the user less than (or more than) one metre in the scene. This enables the scene to provide different perceptions to the user (e.g. to make the user feel larger or smaller than they are in real life). Similarly, the viewing zone may scale a real-world angle so that rotating one degree in the real world rotates the user less than (or more than) one degree in the scene.
[0171] Therefore, a viewing zone with a volume of one cubic metre typically connotes a viewing zone in which the user is able to move about a one cubic metre volume in the real world while remaining in the viewing zone. This may cause the user to move about a volume in the scene that is more than, or less than, one cubic metre.
[0172] Referring to Figure 6a, in order to capture points for each surface and location that is visible from the viewing zone 1, a plurality of capture devices C1, C2, ..., C9 may be used (e.g. a plurality of virtual scanners and / or a plurality of cameras). Each capture device is typically arranged to perform a capture process, e.g. as described with reference to Figure 3, in which the capture device captures points at a plurality of azimuth angles and elevation angles. By locating the capture devices appropriately, e.g. by locating a capture device at each corner of the viewing zone, it can be ensured that most (or all) points of a scene are captured.
[0173] Typically, a first capture device C1 is located at a centrepoint of the viewing zone 1. In various embodiments, one or more capture devices C2, C3, C4, C5 may be located at the centre of faces of the viewing zone; and / or one or more capture devices C6, C7, C8, C9 may be located at edges of and / or corners of the viewing zone.
[0174] Figure 6a shows a two-dimensional view (e.g. a plan view) of a rectangular viewing zone. It will be appreciated that within this viewing zone each capture device may be located on a shared plane. Equally, the various capture devices may be located on different planes. Referring, for example, to Figure 6b, there is shown a three-dimensional view of a cuboid viewing zone, where there is a capture device located: at the centre of the viewing zone; at the centre of each face of the viewing zone; and at each corner of the viewing zone.
[0175] With this arrangement, many locations in the scene (e.g. specific surfaces) will be captured by a plurality of capture devices so that there will be overlapping points relating to different capture devices. This is shown in Figure 7, which shows a first point P1 being captured by each of a first capture device C1, a sixth capture device C6, and a seventh capture device C7. Each capture device captures this point at a different angle and distance and may be considered to capture a different ‘version’ of the point.
[0176] Typically, only a single version of the point is stored, where this version may be the highest quality version of the point and / or may be the version of the point associated with the nearest and / or least angled capture device.
[0177] In this regard, the highest ‘quality’ version of the point is captured by the capture device with the smallest distance and smallest angle to the point (e.g. the smallest solid angle). In this regard, as described with reference to Figures 4a and 4b, capturing a point for a given azimuth angle and elevation angle typically comprises capturing a plurality of sub-points at varying sub-point azimuth and elevation angles spread around the point azimuth and elevation angles. Due to the different spreads of sub-points, each capture device will capture a different version of the point (that has a different attribute) even when the points are at the same location. Capture devices that are close to the point and less angled with respect to the point typically have a smaller spread of sub-points and so typically obtain a version of a point that is sharper than a version of that point captured by more distant capture devices.
[0178] In some embodiments, a quality value of a version of the point is determined based on the spread of sub-points associated with this version (e.g. based on the perimeter formed by these sub-points and / or based on a surface area or volume bounded by these sub-points). The version of the point that is stored may depend on the respective quality values of possible versions of the points. Regarding the ‘versions’ of the points, it will be appreciated that two ‘points’ in approximately the same location captured by each capture device may not have exactly the same location in the three-dimensional representation. More specifically, since each capture device typically projects a ‘ray’ at a given angle, the rays of differing capture devices may contact the surface at different locations for each capture device. Two points may be considered to be two ‘versions’ of a single point when they are within a certain proximity, e.g. a threshold proximity. For example, where the first capture device C1 captures a first point and a second point at subsequent azimuth angles, and the sixth capture device C6 captures a further point that is in between the locations of the first point and the second point, this further point may be considered to be a ‘version’ of one of the first point and the second point.
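The selection of a stored version can be sketched as follows. This is a minimal illustration only: it assumes that the quality value is a single number derived from the sub-point spread (smaller being sharper) and that proximity is a Euclidean distance to a reference location. The names PointVersion, sub_point_spread and best_version, and all sample values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PointVersion:
    capture_device: str      # e.g. "C1"
    location: tuple          # (x, y, z) of this version of the point
    sub_point_spread: float  # hypothetical quality measure: smaller is sharper


def best_version(candidates, reference, threshold):
    """Return the sharpest candidate within `threshold` of `reference`, or None."""
    nearby = [v for v in candidates
              if math.dist(v.location, reference) <= threshold]
    if not nearby:
        return None
    return min(nearby, key=lambda v: v.sub_point_spread)


versions = [
    PointVersion("C1", (0.00, 0.0, 1.0), 0.02),
    PointVersion("C6", (0.01, 0.0, 1.0), 0.05),
    PointVersion("C7", (0.50, 0.0, 1.0), 0.001),  # outside the threshold proximity
]
stored = best_version(versions, reference=(0.0, 0.0, 1.0), threshold=0.05)
```

Here the C7 candidate is ignored because it lies outside the threshold proximity, and of the remaining two versions the one with the smaller spread is kept.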
[0179] This difference in the points captured by different capture devices is illustrated by Figures 8a and 8b, which show the separate captured grids that are formed by two different capture devices. As shown by these figures, each capture device will capture a slightly different ‘version’ of a point at a given location and these captured points will have different sizes. Each capture step is associated with a particular range of angles (e.g. a nominal capture angle of 1° might encompass angles from 0.9° to 1.1°), and therefore capture devices that are far from a point to be captured represent a wider region at the capture distance than capture devices closer to that point to be captured. As shown in Figure 8a, the capture device C1 would capture the points P1 and P2 in separate brackets, whereas for the capture device C2 these points are in the same bracket. Therefore, the capture device C2 might determine a single point that encompasses both points P1 and P2, whereas the capture device C1 would determine separate points for these two points.
[0180] Considering then a situation in which points P1 and P2 are captured separately, and capture device C1 is used to capture point P1 while capture device C2 is used to capture point P2, it should be apparent that the ‘sizes’ of these captured points, and the locations in space that are encompassed by the captured points, will be based on different grids. For example, the width of the captured point P2 captured by the capture device C2 will be larger than the width of the captured point P1 captured by the capture device C1. The capture process may be determined based on the existence of these different grids, and on the different bracket widths that occur at different distances from a capture device.
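The growth of bracket width with distance can be illustrated with a short sketch. It assumes, purely for illustration, that a bracket of angular width Δ viewed face-on at distance d spans a width of 2·d·tan(Δ/2). The function name bracket_width and the sample distances are hypothetical; the 0.2° width corresponds to the example range of 0.9° to 1.1°.

```python
import math


def bracket_width(distance: float, angular_width_deg: float) -> float:
    """Width spanned at `distance` by a bracket of the given angular width.

    Assumes the captured surface faces the capture device directly.
    """
    half_angle = math.radians(angular_width_deg) / 2.0
    return 2.0 * distance * math.tan(half_angle)


near = bracket_width(1.0, 0.2)  # capture device close to the point
far = bracket_width(3.0, 0.2)   # capture device three times further away
```

Under this assumption the width scales linearly with distance, so the more distant device captures a point three times as wide.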
[0181] Figure 8a shows an exaggerated difference between grids for the sake of illustration. Figure 8b shows a more realistic embodiment in which the three-dimensional representation comprises a plurality of points associated with different capture devices, where these points lie on different grids associated with these different capture devices.
[0182] In order to store the points of the three-dimensional representation, the points may be stored as a string of bits, where a first portion of the string indicates a location of the point (e.g. using x, y, z coordinates) and a second portion of the string indicates an attribute of the point. In various embodiments, further portions of the string may be used to indicate, for example, a transparency of the point, a size of the point, and / or a shape of the point.
[0183] A computer device that processes the three-dimensional representation after the generation of this representation is then able to determine the location and attribute of each point so as to recreate the scene. This location and attribute may then be used to render a two-dimensional representation of the scene that can be displayed to a viewer wearing the display device 17. Specifically, the locations and attributes of the points of the three-dimensional representation can be used to render a two-dimensional image for each of the left eye of the viewer and the right eye of the viewer so as to provide an immersive extended reality (XR) experience to the viewer.
[0184] The present disclosure considers an efficient method of storing the locations of the points (e.g. at an encoder) and of determining the locations of the points (e.g. at a decoder).
[0185] As has been described with reference to Figures 6a and 6b, the points of the three-dimensional representation are determined using a set of capture devices placed at locations about the viewing zone, where these capture devices are arranged to capture points at a series of azimuth angles and elevation angles. Typically, each of the capture devices is arranged to use the same capture process (e.g. the same series of azimuth angles and elevation angles), though it will be appreciated that different series of capture angles are possible. For example, there may be a plurality of possible series of capture angles, where different capture devices use different capture angles.
[0186] In general, the present disclosure considers a method in which points are stored based on a capture device identifier and an indication of a distance of the point from the capture device associated with this capture device identifier. Typically, the point is also associated with an angular indicator, which indicates an azimuth angle and / or an elevation angle of the point relative to the identified capture device.
[0187] It will be appreciated that the storage of the distance and the angle may take many forms. For example, the distance and the angle of each point may be converted into a universal coordinate system, where each capture device has a different location in this universal coordinate system. In particular, each point may be stored with reference to a centre of this universal coordinate system, which centre may be co-located with a central capture device. Where a point is determined based on a distance and an angle from a capture device of a known location in this universal coordinate system, the coordinates of the point in this universal coordinate system can be determined trivially - and the location of the point may then be stored either relative to the capture device or as a coordinate in the universal coordinate system.
[0188] The capture device identifier may comprise a location of a capture device (e.g. a location in a co-ordinate system of the three-dimensional representation). Equally, the capture device identifier may comprise an index of a capture device. Similarly, the indication of the azimuth angle and the elevation angle for a point may comprise an angle with reference to a zero-angle of a co-ordinate system of the three-dimensional representation. Equally, the azimuth angle and / or the elevation angle may be indicated using an angle index.
[0189] In some embodiments, the three-dimensional representation is associated with configuration information, which configuration information comprises one or more of: a set of capture device indexes; locations associated with the capture devices and / or the capture device indexes; a spacing of capture devices (e.g. so that locations of the capture devices can be determined from a location of a first capture device and the spacing); angles associated with a capture process for the capture devices; an azimuth angle increment and / or an elevation angle increment associated with the capture process; and a set of angle indexes (e.g. to match an angle index to an angle).
[0190] With this configuration information, it is possible to determine a location of each capture device from an index of that capture device and / or to determine a capture angle from a known capture process. Therefore, given two numbers: a capture device index and an angle index (that is associated with a combination of a specific azimuth angle and a specific elevation angle), a location of a capture device and a direction of a point from this capture device can be determined. By also signalling a distance of the point from the signalled capture device, a precise location of the point in the three-dimensional space can be signalled efficiently.
[0191] Typically, the point is associated with each of: a camera index, a distance, a first angular index (e.g. an azimuth index), and a second angular index (e.g. an elevation index).
[0192] This method of indicating a location of a point enables point locations to be identified using a much smaller number of bits than if each point location is identified using x, y, z coordinates.
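The potential saving can be illustrated with a rough bit count. All field widths here are assumptions chosen for illustration and are not taken from the disclosure: 32 bits per x, y, z coordinate, 1000 capture devices, 1000 azimuth brackets, 1000 elevation brackets, and a 16-bit distance. The helper bits_for is hypothetical.

```python
def bits_for(count: int) -> int:
    """Bits needed to index `count` distinct values."""
    return max(1, (count - 1).bit_length())


# Location as three 32-bit coordinates (assumed width).
xyz_bits = 3 * 32

# Location as capture device index + azimuth index + elevation index + distance.
indexed_bits = bits_for(1000) + 2 * bits_for(1000) + 16
```

With these assumed widths the indexed form needs 46 bits per location against 96 bits for the coordinate form; the actual ratio depends on the chosen grid, angular resolution and distance precision.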
[0193] Referring to Figure 9, there is shown a method of determining a location of a point. This method is carried out by a computer device, e.g. the image generator 11 and / or the decoder 15.
[0194] In a first step 41, the computer device identifies an indicator of a capture device used to capture the point. Typically, this comprises identifying a portion of a string of bits associated with a capture device index. In a second step 42, the computer device identifies an indicator of an angle of the point from the capture device. Typically, this comprises identifying an angle index, e.g. an azimuth index and / or an elevation index and / or a combined azimuth / elevation index, which index(es) identifies a step of the capture process during which the point was captured.
[0195] In a third step 43, based on the identifiers, the computer device determines the location of the capture device and the angle of the point from the capture device.
[0196] The capture device identifier is typically a capture device index, which is related to a capture device location based on configuration information that has been sent before, or along with, the point data. For example, the configuration information may specify:
[0197] Location of first capture device is (0,0,0).
[0198] Step between capture devices is (0,0,1) along the grid, then across the grid, then up the grid.
[0199] The grid is (10,10,10).
[0200] With this information, a capture device with an index of 1 can be determined to be located at (0,0,0); a capture device with an index of 5 can be determined to be located at (0,0,4); a capture device with an index of 11 can be determined to be located at (0,1,0), and so on.
[0201] Equally, the configuration information may specify a list of camera indexes and locations associated with these indexes, where this enables the use of a wide range of setups of capture devices.
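A minimal sketch of the grid-based mapping from capture device index to location is given below. It assumes 1-based, contiguous indexes and assumes that the last coordinate varies fastest (along the grid), then the middle coordinate (across), then the first (up). The function name capture_device_location and its parameters are hypothetical.

```python
def capture_device_location(index, origin=(0, 0, 0), grid=(10, 10, 10), spacing=1):
    """Map a 1-based capture device index to a location on a regular grid.

    Assumes the last coordinate varies fastest, then the middle, then the first.
    """
    if not 1 <= index <= grid[0] * grid[1] * grid[2]:
        raise ValueError("capture device index outside the grid")
    rest, k = divmod(index - 1, grid[2])  # position along the grid
    i, j = divmod(rest, grid[1])          # position up / across the grid
    return (origin[0] + i * spacing,
            origin[1] + j * spacing,
            origin[2] + k * spacing)


first = capture_device_location(1)  # (0, 0, 0)
fifth = capture_device_location(5)  # (0, 0, 4)
```

A list-based configuration, as in the alternative described above, would replace this arithmetic with a simple lookup from index to location.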
[0202] Typically, the three-dimensional representation is associated with a frame of video. The configuration information may be constant over the frames of the video so that the configuration information needs to be signalled only once for an entire video. Therefore, the configuration information may be transmitted alongside a three-dimensional representation of a first frame of the video, with this same information being used for any subsequent frames (e.g. until updated configuration information is sent).
[0203] The angle identifier may similarly be related to an angle by a location and an increment that are signalled in a configuration file. For example, the configuration information may specify:
[0204] An azimuth increment and an elevation increment are each 1°.
[0205] There are 359 increments for each angle type.
[0206] With this information: a capture angle with an index of 1 can be determined to be at an azimuth angle of 0° and an elevation angle of 0°; a capture angle with an index of 10 can be determined to be at an azimuth angle of 9° and an elevation angle of 0°; a capture angle with an index of 361 can be determined to be at an azimuth angle of 0° and an elevation angle of 1°; and a capture angle with an index of 370 can be determined to be at an azimuth angle of 9° and an elevation angle of 1°; etc.
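The mapping from a combined angle index to a pair of angles can be sketched as follows. The sketch assumes 1-based indexes and 360 azimuth positions (0° to 359°, i.e. 359 increments from 0°) per elevation row; this indexing convention is an assumption, and the function name capture_angle is hypothetical.

```python
def capture_angle(index, azimuth_step=1.0, elevation_step=1.0, steps_per_row=360):
    """Map a 1-based combined angle index to (azimuth, elevation) in degrees.

    Assumes the azimuth varies fastest, with `steps_per_row` azimuth
    positions for each elevation position.
    """
    if index < 1:
        raise ValueError("angle index must be at least 1")
    elevation_idx, azimuth_idx = divmod(index - 1, steps_per_row)
    return (azimuth_idx * azimuth_step, elevation_idx * elevation_step)


start = capture_angle(1)    # (0.0, 0.0)
later = capture_angle(370)  # (9.0, 1.0)
```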
[0207] In a fourth step 44, based on the determined location of the capture device and the determined angle, a location of the point is determined. Typically, this comprises determining the location of the point based on the location of the capture device, the capture angle, and a distance of the point from the capture device (where this distance is specified in the point data for the point).
[0208] Determining the location of the point typically comprises determining the location of the point relative to a centrepoint of the three-dimensional representation. This location of the point may then be converted into a desired coordinate system and / or the point may be processed based on its location (e.g. to stitch together adjacent points).
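The fourth step can be sketched as a spherical-to-Cartesian conversion offset by the capture device location. The axis convention is an assumption: azimuth is measured in the x-y plane from the x axis and elevation is measured from the x-y plane towards the z axis. The function name point_location is hypothetical.

```python
import math


def point_location(device_location, azimuth_deg, elevation_deg, distance):
    """Location of a point given its capture device, capture angles and distance.

    Assumed convention: azimuth in the x-y plane from the x axis,
    elevation from the x-y plane towards the z axis.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    dx = distance * math.cos(el) * math.cos(az)
    dy = distance * math.cos(el) * math.sin(az)
    dz = distance * math.sin(el)
    cx, cy, cz = device_location
    return (cx + dx, cy + dy, cz + dz)


# A point 2 units from a capture device at (0, 0, 4), at azimuth 90°, elevation 0°.
location = point_location((0, 0, 4), azimuth_deg=90.0, elevation_deg=0.0, distance=2.0)
```

Where the capture device locations are already expressed relative to the centrepoint of the representation, the result is directly a location relative to that centrepoint.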
[0209] The angular identifier typically comprises a first angular identifier and a second angular identifier, where the first identifier provides the azimuthal angle of the point and the second identifier provides the elevation angle of the point. Referring to Figure 10, each angular identifier may be provided as an index of a segment of the three-dimensional representation, where, for example, an index of 0 may identify the point as being in a first angular bracket 101 and an index of 1 may identify the point as being in a second angular bracket 102.
[0210] In this regard, the capture devices are arranged to perform a capture process, e.g. as described with reference to Figure 3, with a non-infinite angular resolution. Given this non-infinite resolution, each point is not a one-dimensional point located at a precise angle. Instead, each point is a point for a particular area of space, with the size of this area being dependent on the angular resolution as well as the distance of the point from the capture device. In other words, each capture angle determines a point for an angular range (with the range being dependent on the angular resolution). That is, if the capture process leads to points being captured at angles of 10°, 11°, and 12° then this can equally be considered to relate to points being captured at a first range of 9.5°-10.5°, a second range of 10.5°-11.5°, and a third range of 11.5°-12.5°.
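The relationship between a capture angle and its angular range can be sketched as below, assuming regularly spaced capture angles and ranges centred on those angles. The function name bracket_for is hypothetical.

```python
import math


def bracket_for(angle_deg, resolution_deg=1.0):
    """Return (lower bound, capture angle, upper bound) of the range containing `angle_deg`.

    Assumes capture angles at integer multiples of `resolution_deg`, each
    covering a range of one resolution step centred on the capture angle.
    """
    centre = math.floor(angle_deg / resolution_deg + 0.5) * resolution_deg
    half = resolution_deg / 2.0
    return (centre - half, centre, centre + half)


first_range = bracket_for(10.4)   # (9.5, 10.0, 10.5)
second_range = bracket_for(11.2)  # (10.5, 11.0, 11.5)
```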
[0211] This is shown in Figure 10, which shows a series of angular brackets, with the size of these angular brackets at a given distance being dependent on the angular resolution. The angular identifier(s) typically comprise a reference to such an angular bracket. Consider, for example, a cube placed with the capture device C1 at the centre of this cube. By dividing this cube into x segments at regular azimuth angles and y segments at regular elevation angles, it is possible to identify any angular range of the representation by reference to an x segment and a y segment (and then the space bracketed by this angular range will depend on both the angular resolution (e.g. the angle between adjacent brackets) and the distance of the point from the capture device).
[0212] Typically, each capture device has the same capture pattern so that the angular bracketing of each device is the same (albeit centred differently at the location of the relevant capture device). For example, in an embodiment with 1000 equal angular brackets, the angle for each bracket may be 360° / 1000 = 0.36°.
[0213] In some embodiments, different capture devices are associated with different capture patterns, where this may be signalled in configuration information relating to the three-dimensional representation.
[0214] In some embodiments, each capture device is arranged to capture a point for a plurality of angular brackets, where each bracket is associated with a different angle. The angular spread of each bracket (that is, the angle between a first, e.g. left, angular boundary of the bracket and a second, e.g. right, angular boundary of the bracket) may be the same; equally, this angular spread may vary. In particular, the angular spread may vary so as to be smaller for points which are directly in front of (or behind, or to a side of) the capture device. For example, the embodiment shown in Figure 10 shows an angular bracketing system that is based on a cube. With this system, a cube is placed such that a capture device is located at the centre of the cube and the cube is then split into 1000 sections of equal size (it will be appreciated that the use of 1000 sections is exemplary and any number of sections may be used). Each of these sections is then associated with an angular index. With this arrangement, the angular spread of each section (or bracket) varies, as has been described above.
[0215] Figure 10 shows a two-dimensional square, where each angular bracket of the square is referenced by an index number (between 1 and 100). In a three-dimensional implementation, an angular bracket of a cube could be indicated with two separate numbers (with a first azimuthal indicator that identifies a ‘column’ of the cube and a second elevational indicator that identifies a ‘row’ of the cube). Equally, a singular indicator may be provided that indicates a specific bracket of the cube. Therefore, for a cube that is divided into 1000 elevational sections and 1000 azimuthal sections, the bracket may be indicated with two separate indicators that are each between 0 and 999 or with a single indicator that is between 0 and 999999.
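The equivalence between two separate indicators and a single combined indicator can be sketched as follows, assuming 0-based indicators and a row-major layout in which the azimuthal indicator varies fastest. The function names combine and split are hypothetical.

```python
def combine(azimuth_index, elevation_index, azimuth_sections=1000):
    """Pack 0-based azimuthal and elevational indicators into a single indicator."""
    return elevation_index * azimuth_sections + azimuth_index


def split(combined, azimuth_sections=1000):
    """Recover (azimuth_index, elevation_index) from a single indicator."""
    elevation_index, azimuth_index = divmod(combined, azimuth_sections)
    return azimuth_index, elevation_index


largest = combine(999, 999)   # 999999
recovered = split(largest)    # (999, 999)
```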
[0216] It will be appreciated that the use of a cube to define the brackets is exemplary and that other bracketing systems are possible. For example, a spherical bracketing system may be used (where this leads to curved angular brackets). Equally, a lookup table may be provided that relates angular indexes to angles, where this enables irregularly spaced brackets to be used. Typically, determining the location of the point comprises determining the location of the point so as to be at the centre of the angular bracket identified by the angular identifier(s).
[0217] Texture patches
[0218] In order to reduce the file size of the three-dimensional representation (and the bandwidth required to transmit the three-dimensional representation) it is desirable to reduce the number of points within the three-dimensional representation. Therefore, referring to Figure 11, there is described a method of determining a texture patch that can replace a plurality of points in the representation.
[0219] In a first step 51, the computer device identifies a plurality of points of the representation; in a second step 52, the computer device determines that the points lie on a shared plane; in a third step 53, the computer device determines a texture patch based on the attributes of the points; and in a fourth step 54, the computer device determines a new point that references the texture patch (this new point may be referred to as a ‘texture point’).
[0220] The texture patch typically comprises a patch with a plurality of attribute values, which attribute values may be the same as the attribute values of the identified points. Therefore, the texture patch enables the recreation of the plurality of points. A benefit of using the texture patch is that a single point, with a single location value and a (single) reference to the texture patch, can replace the plurality of identified points. The attribute values of each point are contained in the texture patch so that little (or no) information is lost from the original representation, but by representing all of these attribute values by reference to the texture patch, only a single location needs to be signalled (saving on the cost of signalling locations for a plurality of points). For example, an 8x8 square of identified points that each have separate locations and attribute values may be replaced by a single point with a single location and an attribute value that is a reference to a texture patch (which texture patch comprises the attribute values of the identified points arranged in the relative positions of the identified points); this would reduce the size of the representation by 63 points (where a single point replaces an 8x8 grid of points) at the cost of needing to signal an 8x8 texture patch (that has 64 attribute values and / or transparency values and / or normal values).
[0221] This is shown in Figures 12a, 12b, and 12c. Figure 12a shows a plurality of points of a three-dimensional representation that lie on a shared plane. Each of these points has a location and an attribute value. Figure 12b shows how these points may be replaced by a single point (e.g. a ‘texture point’) that contains a reference to the texture patch shown in Figure 12c. This texture patch may comprise the attribute values of the plurality of points without separately storing the locations of the attributes (instead, the attribute values are laid out in a predetermined pattern, which is a 5x5 grid in the example of Figure 12c).
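The 8x8 example can be sketched as follows. The input format (column, row, attribute), the dictionary used for the texture point, and the name build_texture_patch are all hypothetical choices made for illustration.

```python
def build_texture_patch(points, width, height):
    """Lay attribute values out in a width x height grid.

    `points` is a list of (column, row, attribute) tuples giving each point's
    relative position on the shared plane (a hypothetical input format).
    """
    patch = [[None] * width for _ in range(height)]
    for column, row, attribute in points:
        patch[row][column] = attribute
    return patch


# 64 illustrative points on an 8x8 grid, each with its own attribute value.
points = [(c, r, r * 8 + c) for r in range(8) for c in range(8)]
patch = build_texture_patch(points, width=8, height=8)

# One texture point replaces all 64 points: a single location plus a patch reference.
texture_point = {"location": (1.0, 2.0, 3.0), "patch_id": 0}
points_saved = len(points) - 1  # 63
```

The patch keeps every attribute value in its relative position, so only the 64 separate locations are collapsed into one.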
[0222] It will be appreciated that various sizes of texture patch are possible and that the 5x5 grid of Figure 12c is only an example. Another (practical) example of a texture patch is shown in Figure 12d, which shows an 8x8 arrangement of values laid out in the form of a texture patch. As shown in Figure 12d, typically the texture patch provides a continuous grid of pixel values (e.g. that can be used to form a continuous image) - in this regard, the points shown in Figures 12a - 12c are shown as separated points. In practice, these ‘points’ are typically abutting points that form a joined arrangement of values.
[0223] In some embodiments, the method comprises determining the texture patch in dependence on a difference of the attributes of the identified points exceeding a threshold (e.g. in dependence on a variance, a range, or a maximum difference of these attributes exceeding a threshold). In this regard, points that are similar in both location and attribute may be aggregated into a single point with a location and attribute that is based on the initial points and a size that covers both of the initial points (e.g. two adjacent points of the same colour and a size of 1 may be aggregated into a single point of this colour with a size of 2). Such an aggregation does not require any determination of a texture patch. In contrast, a texture patch may be determined where there is a plurality of dissimilar points (e.g. points with dissimilar attributes) that lie on a shared plane, where the use of the texture patch enables the attributes of each of these points to be signalled in an efficient manner.
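By way of illustration only, the choice between aggregating similar points and determining a texture patch may be sketched as follows. The sketch assumes scalar attribute values and uses the range of the attributes as the difference measure; the function name and the threshold value are hypothetical and are not taken from the disclosure.

```python
def should_use_texture_patch(attribute_values, max_range):
    """Return True when the attributes differ by more than max_range.

    Assumption: attributes are scalars (e.g. a single colour channel) and
    the 'difference' is measured as the range (max minus min).
    """
    return max(attribute_values) - min(attribute_values) > max_range


# Similar points could be aggregated into one larger point;
# dissimilar coplanar points are candidates for a texture patch.
similar_points = [10, 11, 10, 12]
dissimilar_points = [10, 200, 30, 90]
use_patch_for_similar = should_use_texture_patch(similar_points, max_range=5)
use_patch_for_dissimilar = should_use_texture_patch(dissimilar_points, max_range=5)
```

A variance or a maximum pairwise difference could be substituted for the range without changing the structure of the test.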
[0224] The second step 52 of determining that the points lie on a shared plane may comprise determining that the points lie on a shared surface (e.g. on the same object), where the method may comprise identifying a surface associated with the identified points.
[0225] Determining that the points lie on a shared plane may comprise comparing a distance of (each of) the points from this plane and / or surface to a threshold distance and determining that the points lie on the plane / surface if they are within this threshold distance from the plane / surface.
[0226] This second step 52 may also, or alternatively, comprise identifying a normal for each of the points, which normal may be contained in point data of the points, and determining a similarity of the normals (e.g. determining that each of the normals is within a threshold value of an average normal and / or determining that a variance of the normals is below a threshold value).
[0227] The texture patch is typically determined based on this determination in the second step 52, where if (e.g. only if) the identified points lie on a shared surface or plane then they may be replaced by a single point that references a texture patch.
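The two tests described for the second step 52 may be sketched as follows. This is a minimal sketch under stated assumptions: points and normals are (x, y, z) tuples, the candidate plane is supplied as a point on the plane together with a unit normal, and the similarity of normals is measured as a Euclidean distance from the average normal. The function names and threshold values are hypothetical.

```python
import math


def distance_to_plane(point, plane_point, unit_normal):
    """Absolute distance from a point to a plane (point plus unit normal)."""
    return abs(sum((p - q) * n for p, q, n in zip(point, plane_point, unit_normal)))


def lie_on_shared_plane(points, plane_point, unit_normal, max_distance):
    """True if every point is within max_distance of the plane."""
    return all(
        distance_to_plane(p, plane_point, unit_normal) <= max_distance
        for p in points
    )


def normals_are_similar(normals, max_deviation):
    """True if each normal is within max_deviation of the average normal."""
    count = len(normals)
    mean = [sum(n[i] for n in normals) / count for i in range(3)]
    return all(math.dist(n, mean) <= max_deviation for n in normals)


points = [(0.0, 0.0, 0.01), (1.0, 0.0, -0.02), (0.0, 1.0, 0.0)]
coplanar = lie_on_shared_plane(points, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.05)
```

Only if such a test passes would the identified points be replaced by a single point that references a texture patch.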
[0228] In some embodiments, the texture patch may be determined for points that lie on a curved plane, where the second step 52 may comprise determining that the points lie on a curved plane or a curved surface. Such a texture patch may be associated with a bend value to enable the reproduction of the identified points. Typically, the texture patch comprises a quadrilateral, where the texture patch may be able to bend about a line that is formed between opposite corners of this quadrilateral so as to map the texture patch to a curved surface.
[0229] The threshold distance (for the points to be considered co-planar) may depend on the distance of the points from the viewing zone; in particular, points that are located far from the viewing zone may have a higher threshold separation than points that are located nearer to the viewing zone. Typically, users are better able to identify separations between a plurality of surfaces when these surfaces are near to the viewing zone whereas users may not be able to identify separations between surfaces that are distant from the viewing zone. Therefore, the maximum (threshold) acceptable distance between the identified points and a plane passing through the identified points may be dependent on the distance of the identified points from the viewing zone (e.g. the threshold may increase from a first value when the identified points are within 1 km of the viewing zone to a second value when the identified points are more than 1 km from the viewing zone).
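A distance-dependent threshold of this kind may be sketched as a simple step function. The 1 km cut-off follows the example above; the two threshold values and the function name are hypothetical placeholders chosen only for illustration.

```python
def coplanarity_threshold(distance_to_viewing_zone_m,
                          near_threshold_m=0.01,
                          far_threshold_m=0.1,
                          cutoff_m=1000.0):
    """Return the maximum point-to-plane distance for points to count as coplanar.

    Assumption: a single cut-off with two threshold values; the values
    themselves are illustrative and not taken from the disclosure.
    """
    if distance_to_viewing_zone_m <= cutoff_m:
        return near_threshold_m
    return far_threshold_m
```

A smoothly increasing function of distance could equally be used in place of the step.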
[0230] Typically, the texture patch is associated with a size, where there may also be provided a plurality of texture patches of different sizes. Identifying the plurality of points may then comprise identifying a plurality of points that could potentially be replaced with a single point that references a texture patch. This may involve the computer device iterating through a plurality of pluralities of identified points and then evaluating each of these pluralities of identified points in order to determine whether the points lie on a shared plane. If these identified points are found to lie on such a shared plane, then a texture patch may be determined based on the attributes of these points and this texture patch may be added to a database of texture patches.
[0231] The texture patch may be associated with one or more of: one or more attribute values; one or more transparency values; one or more normals; etc. The texture patch may comprise a plurality of points that correspond to the points used to form the texture patch (e.g. where each point of the texture patch comprises an attribute, a normal, and / or a transparency of a corresponding point of the three-dimensional representation). The method may comprise determining a multi-layered texture patch and / or a plurality of texture patches. For example, the method may comprise determining a texture patch for each eye of a user (where these texture patches may be located at the same index of separate databases of texture patches so that they can be signalled by a single reference in a point).
[0232] In some embodiments, texture patches for each of a left eye and a right eye are stored in a shared database. The index for a texture patch for a left eye may then be set as being one greater than the index for a texture patch of a right eye, where this simplifies the signalling of the texture patches. In some situations, e.g. for diffuse non-reflective materials, each eye may be associated with the same texture patch. In these situations, only a single texture patch may be included in the database (with the index that would otherwise contain a second texture patch instead pointing to the single texture patch). Equally, the same texture patch may be stored twice. By storing the texture patches for each eye adjacent to each other, there is an increased ability to benefit from similarities between these texture patches when encoding the texture patch database.
[0233] Texture Atlas
[0234] The present disclosure considers an efficient method of encoding a database of texture patches, which is henceforth referred to as a ‘texture atlas’. More generally, the present disclosure considers an efficient method of processing one or more images of a plurality of images so as to enable computer devices to more efficiently store and transmit this plurality of images. This method is particularly applicable to texture patches, but more generally may be applied to any type of image (or group of images).
[0235] The ‘image’ typically comprises a plurality of attribute values (e.g. a plurality of pixel values and / or a plurality of colours). For example, the image may be a texture patch that comprises an, e.g., 8x8 arrangement of pixels. A point that references an image is therefore a point that is associated with a plurality of attribute values (whereas other points are typically associated with only a single attribute value).
[0236] As described above, replacing groups of points within the three-dimensional representation with a texture point that refers to a texture patch within a texture atlas provides a reduction in the file size of the three-dimensional representation since this reduces the amount of location data that is stored for the points represented by the texture patch.
[0237] In some embodiments, each texture patch is stored separately (e.g. each texture patch is stored without any consideration of other texture patches) and / or each texture atlas is stored separately (e.g. an entirely new texture atlas is determined for each three-dimensional representation). In these embodiments, each texture atlas may have a large file size, and this can slow down processing (e.g. encoding, transmitting and decoding) of the three-dimensional representation of a scene. Therefore, it is desirable to reduce or compress the file size of each texture atlas - methods for providing such a reduction in the size of a texture atlas are described below.
[0238] Figures 13a and 13b show schematic examples of different arrangements of texture atlases. Specifically: Figure 13a depicts a first texture atlas TA-1 that is formed from three square texture patches TP-X, TP-Y, TP-Z arranged in a line; Figure 13b depicts a different texture atlas TA-2 formed from five square texture patches TP-A, TP-B, TP-C, TP-D, TP-E arranged in an irregular arrangement.
[0239] In general, the texture atlas may be considered to be a database. Therefore, each texture patch may be stored with reference to an index within this database. In practice, the texture atlas typically comprises a two-dimensional image, where each texture patch occupies a different location in this image. Typically, each texture patch has a predetermined size and a predetermined attribute spacing; for example, each texture patch may have a size of 5x5, 6x6, 7x7, or 8x8 and the attributes of each texture patch may form a contiguous arrangement of attributes (e.g. an 8x8 texture patch may occupy an 8x8 arrangement of pixels within the texture atlas and this may relate to a contiguous 8x8 arrangement of adjacent angular brackets in the three-dimensional representation). In such embodiments, the references of the texture points may each comprise an index and these indexes may be related to texture patches within the texture atlas based on the known size of each texture patch; e.g. a first texture patch may start at the pixel (1,1) with a second texture patch starting at (9,1), and so on. It will be appreciated that texture patches with various, and / or irregular, sizes and spacings may equally be used where, for example, a size and a starting pixel of each texture patch may be defined in a header of the texture atlas.
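The relationship between an index and a starting pixel may be sketched as follows for patches of a known, uniform size. The sketch assumes a row-major layout, a 0-based patch index, and 1-based pixel coordinates (matching the (1,1) and (9,1) example above); the function name and the `patches_per_row` parameter are hypothetical.

```python
def patch_origin(index, patch_size, patches_per_row):
    """Return the 1-based (x, y) pixel at which the patch with this index starts.

    Assumption: patches are square, of equal size, and laid out row by row.
    """
    row, column = divmod(index, patches_per_row)
    return (column * patch_size + 1, row * patch_size + 1)


first_origin = patch_origin(0, patch_size=8, patches_per_row=4)
second_origin = patch_origin(1, patch_size=8, patches_per_row=4)
```

For irregular patch sizes, the starting pixel would instead be read from a header of the texture atlas, as noted above.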
[0240] Typically, the texture atlas is a collation of all the texture patches which have been determined from the three-dimensional representation. As described above, the texture patches can be of different sizes and shapes but typically the texture patch forms an M x N rectangle of points (and hence a corresponding M x N rectangle of attribute values, typically for each attribute of said rectangle of points). More typically the texture patch is an N x N square, also known as a texture quad, as depicted in Figures 13a and 13b. In such embodiments the texture atlas may be formed by simply appending all the square texture patches into one 2D image, wherein the texture patches are identified and located by a texture atlas index which locates the texture patch within the texture atlas (2D image). For example, in Figure 13b, TP-A (the upper leftmost texture patch) may be given the texture atlas index [0,0] and TP-E may be given the texture atlas index [1,2]. Thus, said texture atlas index may be associated with the corresponding texture point (in the three-dimensional representation) such that the correct texture patch may be retrieved from the texture atlas, during decoding, by way of locating the texture patch at said texture atlas index in the texture atlas. It will be appreciated that the number of texture patches determined within any three-dimensional representation will vary and hence the size of the texture atlas will similarly vary. Generally, the texture atlas may be a 2D image of any dimensions.
[0241] In some embodiments, the texture atlas may instead be implemented as a lookup table or dictionary wherein the texture patches are not arranged in a spatially adjacent manner. In this embodiment, the texture atlas index may be implemented as a dictionary index within the dictionary such that a texture patch may be identified and selected from within the dictionary by the dictionary index.
[0242] It will be appreciated that the specific implementation of the texture atlas will vary depending on the implementation of the texture patches. For example, if the texture patches are of an irregular shape or size then it may be preferable to use a dictionary implementation as they may not be as easily formed into a 2D image. Alternatively, storage as a 2D image may allow quicker generation of the texture atlas and so may be preferable in other cases. In general, the methods disclosed herein are applicable to the processing of an image of a plurality of images, regardless of how this plurality of images is stored.
[0243] As described above, each point in the three-dimensional representation may have multiple attributes. Typically, texture patches are generated for each attribute of a point (e.g. a right eye attribute, a left eye attribute, a transparency, etc.) and hence a texture atlas may be generated for each attribute (such that multiple texture atlases may be generated from one three-dimensional representation). Equally, each texture atlas may define a plurality of attributes that relate to a corresponding texture point (e.g. by containing a plurality of adjacent texture patches for each texture point or by containing texture patches that define a plurality of points).
[0244] For one or more of the texture points, a texture atlas may be formed that defines only a subset of the possible attributes. For example, for a texture point that is far from the viewing zone, there may be formed only a single texture patch that is used for both of a left eye and a right eye (in this regard, viewers typically are not able to notice stereoscopic effects for objects far from the viewing zone). In some embodiments a texture atlas may be formed from only a subset of possible texture patches or of generated texture patches. In such embodiments, the computing device may determine which texture patches to include in the texture atlas based on information about the texture patch - for example, the determination may be based on one or more of: an average of attribute values within a texture patch, a standard deviation of attribute values within a texture patch, or a location of a texture point in the three-dimensional representation. In particular, the computer device may be arranged to form the texture atlas in dependence on the distances of texture points in the three-dimensional representation from the viewing zone, where more information (e.g. more texture patches or texture patches for more attributes) is stored for points closer to the viewing zone. In some embodiments, the computer device is arranged to determine a distance of a texture point from the viewing zone and to store: a first number of texture patches for this texture point if the distance does not exceed a threshold and a second number of texture patches for this texture point if the distance exceeds the threshold, with the first number typically being greater than the second number.
[0245] As described above, in some embodiments, a texture patch may combine multiple attributes such that a resulting texture atlas similarly represents a combination of the attribute values. For example, a single texture atlas may include all three RGB values of a point. In some embodiments, related texture atlases may be combined such that related texture patches (such as those for a left and right eye of a user) are given adjacent texture atlas indexes.
[0246] In some embodiments, a plurality of texture atlases are generated based on a plurality of three-dimensional representations. In such embodiments this plurality of texture atlases may be compiled into a ‘composite’ texture atlas, where this composite texture atlas may be arranged as previously described (for example, as a composite two-dimensional image or a composite database of two-dimensional-images). The composite texture atlas may comprise some or all of the texture patches within each constituent texture atlas and / or the composite texture atlas may comprise texture patches that have been determined based on these texture patches within the constituent texture atlases.
[0247] For example, the texture atlas TA-1 of Figure 13a may be compiled with the texture atlas TA-2 of Figure 13b to create a composite texture atlas (not shown) comprising all texture patches TP-A, TP-B, TP-C, TP-D, TP-E, TP-X, TP-Y, and TP-Z. This composite texture atlas may be arranged differently to the texture atlas TA-1 and the texture atlas TA-2 (e.g. the composite texture atlas may be larger than either of these original texture atlases). This composite texture atlas allows texture patches derived from a plurality of three-dimensional representations to be combined into a singular (composite) texture atlas. Typically, a composite texture atlas comprises more texture patches than a texture atlas derived from only one three-dimensional representation. It will be appreciated that the methods described below, in regard to texture patches in a texture atlas, may be equally applicable to texture patches in a composite texture atlas. Indeed, typically a composite texture atlas may be functionally identical to a (regular) texture atlas.
[0248] The composite texture atlas may then be associated with a plurality of three-dimensional representations, e.g. with at least three, at least five, and / or at least ten three-dimensional representations. For example, the three-dimensional representations may be sent in a bitstream as a group of representations, where this group may be accompanied by the composite texture atlas (e.g. ten three-dimensional representations may be sent alongside a single composite texture atlas). The references in each representation of this group of representations may then be treated as being references to the composite texture atlas.
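The compilation of a composite texture atlas may be sketched as follows. This is a minimal sketch that assumes each texture atlas is a simple list of texture patches and that the composite atlas is formed by concatenation of all constituent patches (the disclosure also permits composites holding only some patches, or patches derived from them). The function name and the index map are hypothetical; the map records where each original patch ends up so that references can later be resolved against the composite atlas.

```python
def compile_composite_atlas(atlases):
    """Concatenate atlases and record the new index of every original patch.

    Returns (composite, index_map), where index_map maps
    (atlas_number, original_index) to the index in the composite atlas.
    """
    composite = []
    index_map = {}
    for atlas_number, atlas in enumerate(atlases):
        for original_index, patch in enumerate(atlas):
            index_map[(atlas_number, original_index)] = len(composite)
            composite.append(patch)
    return composite, index_map


ta_1 = ["TP-X", "TP-Y", "TP-Z"]
ta_2 = ["TP-A", "TP-B", "TP-C", "TP-D", "TP-E"]
composite, index_map = compile_composite_atlas([ta_1, ta_2])
```

In this sketch the patch labels stand in for the pixel data of each texture patch.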
[0249] The method may include processing one or more of the representations in the group of representations so as to update the texture points in these representations. In this regard, the compilation of the composite texture atlas typically leads to changes in the index values of each texture patch (e.g. the texture patch TP-X may be located at a first index in the initial texture atlas TA-1 and at a second, different, index in the composite texture atlas). The method may then comprise processing a texture point that references the texture patch TP-X via the first index so as to update a reference of this texture point to relate to the second index. Typically, this comprises modifying an attribute datafield of the texture point so that this modified attribute datafield references the second index.
Atlas inter-compression
[0250] In general, the present disclosure considers methods of processing a second group (or set) of images (e.g. a second texture atlas) in dependence on an arrangement of a first group (or set) of images (e.g. a first texture atlas) so as to enable more efficient encoding of a plurality of sets of images. In particular, the images in a second texture atlas may be rearranged based on the images in a first texture atlas so as to enable more efficient encoding of the second texture atlas. This encoding may involve defining the second texture atlas based on differences between the first and second texture atlases, where the reordering of the second texture atlas is used to reduce these differences so as to increase the efficiency of the encoding.
[0251] The description below considers an implementation in which each image is a texture patch and each group of images is a texture atlas. It will be appreciated that this is merely exemplary and that all of the teachings in this application are more generally applicable to any arrangement of images within groups of images.
[0252] Referring to Figure 14a, there is described a method of reordering a second group of images based on an image in a first group of images. With this example, each group of images is a texture atlas and each image is a texture patch. This method is typically performed by a computer device such as the image generator 11 and / or the encoder 13.
[0253] In a first step 61, the computer device identifies a first texture patch in a first texture atlas.
[0254] In a second step 62, the computer device identifies a second, similar, texture patch in a second texture atlas.
[0255] In a third step 63, the computer device updates an index of the second texture patch based on an index of the first texture patch.
[0256] The updating of the index may comprise the computer device rearranging (or reordering) one or more texture patches in the second texture atlas based on the identified pair of similar texture patches (e.g. the method may comprise updating indexes of a plurality of texture patches in order to rearrange the texture patches in the second texture atlas).
[0257] Typically, each texture atlas comprises an (e.g. two-dimensional) image, where each texture patch is located in a different portion of this image. This is shown by Figures 13a and 13b; for example, Figure 13a shows how a 15x5 texture atlas is formed of three 5x5 texture patches. Updating the index of the second texture patch then comprises rearranging the texture patches within this image. Effectively, the method of Figure 14a may then comprise rearranging a second texture atlas (a second image) so that it is more similar to a first texture atlas (a first image) than before the rearranging. The second image can then be encoded based on the first image, where the rearrangement enables more efficient encoding of the second image.
[0258] This is shown in Figures 14b, 14c, and 14d. Figure 14b shows a first texture atlas TA-1 that comprises three texture patches TP1-a, TP1-b, TP1-c. Figure 14c shows a second texture atlas TA-2 that comprises three texture patches TP2-a, TP2-b, TP2-c. Figure 14d shows a rearranged second texture atlas TA-2r that comprises the same three texture patches TP2-a, TP2-b, TP2-c in a different order.
[0259] Considering the Figures 14b-14d in view of the method of Figure 14a, the computer device may identify the first texture patch TP1-a of the first texture atlas TA-1 and then search the second texture atlas TA-2 to identify the most similar texture patch of the second texture atlas. Upon identifying that the most similar texture patch is the second texture patch TP2-b of the second texture atlas, the computer device may move this second texture patch to the first position in the second texture atlas. This results in the first texture patch TP2-a of the second texture atlas moving to a second position of the second texture atlas (so that effectively the first and second texture patches TP2-a, TP2-b switch places).
[0260] Referring to Figure 14d, this process may then be repeated for the second texture patch TP1-b of the first texture atlas TA-1, for which the most similar texture patch of the second texture atlas TA-2 is the third texture patch TP2-c. This third texture patch is then moved to the second position of the second texture atlas, which results in the original first texture patch TP2-a of the second texture atlas moving from the second position into the third position of the second texture atlas. Therefore, the computer device eventually forms a rearranged second texture atlas TA-2r that is more similar to the first texture atlas than the original second texture atlas.
[0261] In general, the method may comprise iterating through the first texture atlas and, for each texture patch in the first texture atlas, determining a similar texture patch in the second texture atlas. An index of this similar texture patch is then updated to reposition the similar texture patch in the second texture atlas.
[0262] Yet more generally, the method may comprise: identifying one or more images (e.g. a texture patch) in a first set of images (e.g. a texture atlas) and then, for each identified image, identifying a similar image in a second set of images. A computer device may then, for each identified similar image, update an index of the similar image. This may be considered as the computer device rearranging the second set of images based on correspondences between images in the first set of images and respective similar images in the second set of images.
[0263] Updating an index of a second texture patch of a second texture atlas typically has a knock-on effect on other texture patches in this second texture atlas. For example, if an index of the second texture patch is updated from 5 to 2, then the other texture patches in the second texture atlas may be rearranged so that, for example, the texture patches with original indexes of 2, 3, and 4 are moved to indexes 3, 4, and 5 respectively. Typically, the updating of the index of the second texture patch also involves updating the index of one or more other texture patches in the second texture atlas to ensure that each index of the second texture atlas is occupied by only a single texture patch. This typically comprises incrementing the index of any texture patches in the second texture atlas that have an index between the index of the first texture patch and the (original) index of the second texture patch. In this way, the second texture patch can be slotted into the second texture atlas at the index of the first texture patch.
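The knock-on effect of updating a single index may be sketched as follows, assuming the texture atlas is held as a list so that moving one texture patch shifts the intervening patches by one position. The function name is hypothetical, and the list uses 0-based indexes, so the move from index 5 to index 2 described above corresponds to a move from position 4 to position 1.

```python
def move_patch(atlas, from_index, to_index):
    """Return a copy of the atlas with one patch moved and the others shifted.

    Assumption: 0-based list indexes; the input atlas is left unmodified.
    """
    reordered = list(atlas)
    patch = reordered.pop(from_index)
    reordered.insert(to_index, patch)
    return reordered


atlas = ["p1", "p2", "p3", "p4", "p5", "p6"]
moved = move_patch(atlas, from_index=4, to_index=1)
```

Here the patches originally at (1-based) indexes 2, 3, and 4 end up at indexes 3, 4, and 5, and each index remains occupied by exactly one texture patch.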
[0264] As described above, the method comprises identifying a second texture patch (or more generally image) that is similar to a first texture patch. Identifying a ‘similar’ texture patch may comprise determining a distance between the first texture patch and the second texture patch (e.g. a distance in Euclidean space). In some embodiments, identifying a similar image comprises determining a sum of absolute differences (SAD) between the first texture patch and the second texture patch. But it will be appreciated that this is just an exemplary implementation and that various other methods are possible for comparing two images in order to identify a similarity (or difference) between these images.
[0265] Typically, each image (e.g. each texture patch) comprises an M x N (e.g. an 8 x 8) arrangement of pixels. Determining the similarity of images may then comprise determining a sum of the (absolute) differences between corresponding pixels of each image. Determining a pair of similar texture patches - e.g. determining the second texture patch - may involve determining a sum of differences between the identified first texture patch in the first texture atlas and each texture patch in the second texture atlas. The second texture patch that is paired with the first texture patch can then be determined by determining the texture patch in the second atlas that provides the smallest sum of differences.
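A sum of absolute differences and the selection of the texture patch with the smallest sum may be sketched as follows. The sketch assumes each texture patch is a list of rows of scalar pixel values and that the compared patches have the same dimensions; the function names are hypothetical.

```python
def sum_of_absolute_differences(patch_a, patch_b):
    """Sum of absolute differences between corresponding pixels of two patches."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(patch_a, patch_b)
        for a, b in zip(row_a, row_b)
    )


def most_similar_index(reference, candidates):
    """Index of the candidate patch with the smallest SAD to the reference."""
    return min(
        range(len(candidates)),
        key=lambda i: sum_of_absolute_differences(reference, candidates[i]),
    )


reference = [[0, 0], [0, 0]]
candidates = [
    [[5, 5], [5, 5]],   # SAD of 20 against the reference
    [[1, 0], [0, 1]],   # SAD of 2 against the reference
    [[9, 9], [9, 9]],   # SAD of 36 against the reference
]
best = most_similar_index(reference, candidates)
```

Other distance measures could be substituted for the SAD without changing the selection step.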
[0266] Identifying a similar image may comprise identifying a most similar image. Equally, identifying a similar image may comprise identifying an image that exceeds a threshold similarity.
[0267] A potential issue with this method of determining similarity is that a single texture patch in the second texture atlas may be the most similar texture patch to a plurality of texture patches in the first texture atlas. Therefore, if the reordering occurs purely based on a highest similarity, this could result in a repeated updating of an index of a single texture patch that would lead to a reduction in overall accuracy.
[0268] Consider an example where a single texture patch in the second texture atlas is the most similar for every single texture patch in the first texture atlas. The above method might result in the computer device repeatedly moving this single texture patch through the second texture atlas until it ends up in the very last index of the second texture atlas. This would lead to a situation where the final texture patches of the first and second texture atlases are similar, but each other pair of texture patches has not been considered or reordered.
[0269] Therefore, referring to Figure 15a, there is described a method of reordering a second texture atlas based on texture patches of a first texture atlas that avoids this issue. Specifically, this method involves only considering texture patches in the second texture atlas that have an index equal to or greater than the index of an identified texture patch of the first atlas (where the computer device iterates through a plurality of identified texture patches with different indexes).
[0270] In a first step 71, the computer device identifies a first texture patch in a first texture atlas, the first texture patch having a first index I. This first index may, for example, be an index of 1 (or 0 depending on an implemented programming language) so that the computer device begins at the first texture patch in the first texture atlas.
[0271] In a second step 72, the computer device iterates through a second texture atlas to determine a texture patch in a second texture atlas that is most similar to the first texture patch. This may be considered to be the computer device comparing a plurality of potential (or ‘candidate’) texture patches to the first texture patch to find a second texture patch that is similar to the first texture patch.
[0272] Typically, the iterating comprises iterating through the second texture atlas starting from an index that is the same as the first index I. That is, the iterating comprises iterating from an index I to an index N of the second texture atlas (where the second texture atlas comprises N texture patches).
[0273] In a third step 73, an index of the second texture patch is updated based on an index of the first texture patch. In particular, the second texture patch may be moved to the first index I of the second texture atlas. Therefore, when the texture atlases are encoded, the second texture patch can readily be encoded with reference to the first texture patch.
[0274] In a fourth step 74, the computer device increments the index associated with the first texture patch and identifies a further texture patch in the first texture atlas based on the incremented index. The computer device then returns to the second step 72 and again iterates through the second texture atlas to find a most similar texture patch of the second texture atlas to this further texture patch of the first texture atlas.
[0275] With this method, the second iteration is typically performed starting from an index that is the same as the index of the further texture patch. That is, the computer device typically only considers texture patches of the second texture atlas that have an index equal to or greater than an index of an identified texture patch of the first texture atlas.
[0276] Considering the example of Figures 14b-14d, the computer device may first identify a first texture patch TP1-a in the first texture atlas. This texture patch has an index of 1. The computer device then compares this texture patch to each texture patch TP2-a, TP2-b, TP2-c in the second texture atlas and determines that the second texture patch TP2-b is the most similar to the first texture patch TP1-a. This texture patch TP2-b is then moved to index 1 of the second texture atlas (leaving the texture patch TP2-a at the index 2 and the texture patch TP2-c at the index 3). The computer device then identifies a further texture patch TP1-b at an index 2 of the first texture atlas. This further texture patch is compared only to the texture patches of the second texture atlas that have indexes equal to or greater than 2 (i.e. to TP2-a and TP2-c). The most similar of these remaining texture patches to TP1-b is the texture patch TP2-c, so this texture patch is then moved to the index 2 of the second texture atlas, moving the texture patch TP2-a to the index 3 of the second texture atlas.
[0277] Typically, for a first texture patch that is at an index I of the first texture atlas, the computer device iterates through the second texture atlas from index I to index N to find a most similar texture patch of the second atlas. This most similar texture patch is moved to an index I of the second texture atlas. Then, for a further texture patch that is at an index I+1 of the first texture atlas, the computer device iterates through the second texture atlas from index I+1 to index N to again find a most similar texture patch of the second atlas. This further most similar texture patch is moved to an index I+1 of the second texture atlas. This process continues until the computer device reaches a texture patch that is at an index N of the first texture atlas (at which point there is only one remaining candidate texture patch in the second texture atlas, the texture patch that is at the index N of the second texture atlas).
[0278] With this method, the texture patch that is moved to the index I of the second texture atlas is excluded from consideration in later comparisons between texture patches of the first and second texture atlas. This prevents the repeated moving of a single texture patch in the second texture atlas (which repeated moving could lead to an inefficient rearrangement of the second texture atlas).
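The index-range procedure described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: it assumes each texture patch is a flat list of sample values, uses the sum of absolute differences (SAD) as the similarity measure, and uses 0-based indexes where the description uses 1-based indexes. The names `sad` and `rearrange` are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized flat patches."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rearrange(first, second):
    """Greedily rearrange `second` against `first`.

    For each index i of the first atlas, only the patches of the second
    atlas at indexes i..N-1 are candidates, so a patch that has already
    been placed is never moved again.
    """
    second = list(second)  # work on a copy, leave the input untouched
    n = len(first)
    for i in range(n - 1):  # at the last index only one candidate remains
        best = min(range(i, n), key=lambda j: sad(first[i], second[j]))
        # Move the best match to index i; patches in between shift by one.
        second.insert(i, second.pop(best))
    return second


# Stand-ins for TP1-a..TP1-c and TP2-a..TP2-c of Figures 14b-14d.
first_atlas = [[10, 10], [50, 50], [90, 90]]
second_atlas = [[88, 92], [11, 9], [52, 49]]
rearranged = rearrange(first_atlas, second_atlas)
```

With these stand-in values, the patch playing the role of TP2-b moves to the first position, TP2-c to the second, and TP2-a ends at the third, matching the walk-through of Figures 14b-14d.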
[0279] As described above, the computer device may be arranged to consider only a certain index range to identify potential second texture patches (e.g. to consider only indexes that are equal to or greater than an index of an identified texture patch of the first texture atlas).
[0280] Additionally, or alternatively, each texture patch of the second atlas that has been determined to be similar to a given texture patch of the first texture atlas may be marked once this second texture patch has been paired with a first texture patch. The computer device may then consider only unmarked texture patches in the second texture atlas when determining potential second texture patches for comparison to an identified first texture patch. With the example of Figures 14b-14d, this may comprise marking the texture patch TP2-b as being used once this texture patch has been paired with the similar texture patch TP1-a. This texture patch TP2-b may still be moved in the second texture atlas (or may be left in place with the pairing being recorded in a separate file).
[0281] In practice, this may comprise the computer device initialising a flag for each texture patch of the second set of texture patches, with each flag initially being set to a value of 0. Then, the computer device iterates through the texture patches of the first texture atlas to find similar texture patches in the second texture atlas. Each time the computer device identifies a texture patch of the second texture atlas that is similar to a texture patch of the first texture atlas, an index of this texture patch of the second texture atlas may be updated and / or a flag associated with this texture patch of the second texture atlas may be updated to identify that it has been paired with a texture patch of the first texture atlas. Then, when the computer device is considering a further texture patch of the first texture atlas, the computer device may compare this further texture patch only with texture patches of the second texture atlas that are associated with non-updated flags (e.g. flags identified with a 0 value).
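A minimal sketch of the flag-based variant is given below, under the same assumptions as before (flat lists of sample values, SAD as the similarity measure, 0-based indexes). Here the second texture atlas is left in place and the pairing is recorded separately, which is one of the options mentioned above; the function name `pair_with_flags` is hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized flat patches."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pair_with_flags(first, second):
    """Pair each patch of `first` with the most similar unmarked patch of `second`.

    Returns (pairs, flags), where pairs maps an index of the first atlas to
    an index of the second atlas, and flags[j] is 1 once patch j is paired.
    """
    flags = [0] * len(second)  # every flag starts at 0 (not yet paired)
    pairs = {}
    for i, patch in enumerate(first):
        # Only patches whose flag is still 0 are candidates.
        candidates = [j for j, flag in enumerate(flags) if flag == 0]
        if not candidates:
            break
        best = min(candidates, key=lambda j: sad(patch, second[j]))
        flags[best] = 1  # mark as used
        pairs[i] = best
    return pairs, flags


first_atlas = [[10, 10], [50, 50], [90, 90]]
second_atlas = [[88, 92], [11, 9], [52, 49]]
pairs, flags = pair_with_flags(first_atlas, second_atlas)
```

The resulting `pairs` mapping could be stored in a separate file, or used afterwards to move the patches of the second texture atlas.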
[0282] The above method of reordering avoids the repeated moving of a single texture patch through the second texture atlas and so typically leads to a more efficient overall rearrangement of the second texture atlas. However, this method can also lead to an inefficient ordering where a texture patch of the second atlas that is only slightly similar to an initial texture patch of the first atlas has already been moved and so this texture patch of the second atlas is not available for consideration during the evaluation of the later, more similar, texture patches of the first texture atlas.
[0283] Therefore, the method may comprise performing a plurality of iterations through the first texture atlas where these iterations may include one or more of: an iteration through the first texture atlas in a first order, e.g. a front-to-back order; an iteration through the first texture atlas in a second order, e.g. an order opposite to the first order and / or a back-to-front order; an iteration in a random order; and / or an iteration in a selected order (e.g. through a selected list of indexes, where the computer device considers texture patches in the first texture atlas by moving through the list of indexes, and for each entry in the list the computer device identifies an index for the entry, identifies a texture patch in the first texture atlas that is at this index, and then compares this texture patch of the first texture atlas to each texture patch in the second texture atlas that has an index that is a subsequent entry in the list). In some embodiments, a first iteration through the texture patches of the first texture atlas proceeds as described above. A second (or later) iteration may involve an additional step in which, following the determination of a similar texture patch of the second texture atlas for an identified texture patch of the first texture atlas, the computer device determines a similarity of a given texture patch of the second atlas to one or more of: an identified texture patch of the first texture atlas with a different index than the given texture patch; one or more texture patches of the first texture atlas with indexes similar to a current index of the given texture patch; one or more texture patches of the first texture atlas that have previously been determined to be similar to the given texture patch.
[0284] Considering a practical example, on a first iteration moving front-to-back through the first texture atlas, the texture patch of the first texture atlas that has an index of 3 may be paired with a given texture patch of the second texture atlas (and this given texture patch may be moved to the index 3 of the second texture atlas). In a second iteration moving back-to-front through the first texture atlas, the texture patch of the first texture atlas that has an index of 20 may be paired with this same given texture patch of the second atlas. The similarity (e.g. the SAD) of each pair of texture patches may then be compared to determine whether this given texture patch is more similar to the texture patch at index 3 of the first texture atlas or the texture patch at the index 20 of the first texture atlas. If the given texture patch is more similar (e.g. by a threshold amount) to the texture patch at index 3, then another candidate similar texture patch may be determined for the texture patch at index 20 (e.g. a second most similar texture patch may be identified).
[0285] Given the rearranging of the second texture atlas, the given texture patch may have moved from the index 3 position by the time the index 20 texture patch of the first texture atlas is being evaluated (e.g. as a result of other texture patches in the second texture atlas being moved). Therefore, the computer device may compare the given texture patch to a range of texture patches of the first texture atlas that have a similar index (e.g. to the texture patches of the first texture atlas that have an index within + / - 2 of the current index of the given texture patch). Equally, the computer device may store (e.g. in a cache) the rearranged second texture atlas or the pairs of similar texture patches that were determined in the first iteration before the second iteration of the method so that the computer device can identify pairs of similar texture patches that were determined in the first iteration. The given texture patch of the second texture atlas can then be compared to each of: a texture patch of the first texture atlas determined to be similar during the second iteration; and a texture patch of the first texture atlas determined to be similar during the first iteration to determine whether to update an index of this given texture patch.
[0286] In some embodiments, the computer device considers a plurality of possible orderings of the second texture atlas in order to determine a most efficient ordering. For example, the computer device may consider each possible ordering of texture patches in the second texture atlas and identify the ordering that provides the lowest overall encoding cost as compared to the first texture atlas. However, at least for large texture atlases, this is typically infeasible. Similarly, the computer device may perform a plurality of iterations and store the rearranged second texture atlas following each iteration. The computer device may then output the rearranged second texture atlas that is most similar to the first texture atlas.
[0287] Determining the rearranged second texture atlas that is most similar to the first texture atlas may comprise determining a sum of absolute differences (SAD) between the first texture atlas and each candidate rearranged second texture atlas (and selecting the rearranged second texture atlas with the smallest SAD). Equally, other methods of determining similarity (e.g. other than SAD) may be used.
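The selection between candidate arrangements can be sketched as below. This is an illustration under stated assumptions: patches are flat lists of sample values, the whole-atlas SAD is the sum of the per-patch SADs at matching indexes, and the names `atlas_sad`, `select_best` and `exhaustive_best` are hypothetical. The exhaustive search evaluates N! orderings, which illustrates why considering every ordering is typically infeasible for large texture atlases.

```python
from itertools import permutations


def atlas_sad(first, candidate):
    """Whole-atlas SAD: per-sample absolute differences at matching indexes."""
    return sum(
        abs(x - y)
        for patch_a, patch_b in zip(first, candidate)
        for x, y in zip(patch_a, patch_b)
    )


def select_best(first, candidates):
    """Return the candidate arrangement with the smallest SAD against `first`."""
    return min(candidates, key=lambda c: atlas_sad(first, c))


def exhaustive_best(first, second):
    """Try every ordering of `second`; only practical for very small atlases."""
    return list(select_best(first, permutations(second)))


first_atlas = [[10, 10], [50, 50], [90, 90]]
second_atlas = [[88, 92], [11, 9], [52, 49]]
best = exhaustive_best(first_atlas, second_atlas)
```

In practice `select_best` would more likely be applied to the handful of rearranged atlases stored after each iteration than to every permutation.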
[0288] In some embodiments, a machine learning model is used to determine the most efficient arrangement of texture patches in the second texture atlas (the most efficient ordering being the ordering that leads to the smallest difference between the first and second texture atlases, e.g. the lowest SAD when the texture atlases as a whole are compared). Referring to Figure 16a, in some embodiments the identification of a second texture patch in the second texture atlas (e.g. the second step 61 of the method of Figure 14a or the second step 72 of the method of Figure 15) comprises comparing one or more potential second texture patches to a plurality (or range) of texture patches from the first texture atlas.
[0289] In particular, in a first step 81 the computer device may identify a first texture patch in a first texture atlas (this first texture patch having a certain index, where the computer device then iterates through a plurality of indexes as described above). In a second step 82, the computer device may identify one or more potential second texture patches in a second texture atlas that are to be compared to the first texture patch (e.g. the computer device may identify one or more potential texture patches that are similar to the first texture patch or the computer device may identify each texture patch of the second texture atlas that has an index equal to or greater than an index of the first texture patch). In a third step, the computer device compares the (or each) potential second texture patch to a plurality of texture patches in the first texture atlas. In particular, the computer device may compare the (or each) potential second texture patch to a range of texture patches surrounding (and including) the first texture patch.
[0290] For example, as shown in Figure 16b, five texture patches of the first texture atlas may be considered, with these including the first texture patch itself, two texture patches with lower indexes than the first texture patch and two texture patches with higher indexes than the first texture patch. It will be appreciated that the use of five texture patches is exemplary and that any number of texture patches of the first texture atlas may be considered.
[0291] The computer device may then identify the most similar pair of texture patches (i.e. the most similar pair from the plurality of texture patches of the first texture atlas and the potential texture patches of the second texture atlas), where this pair includes a primary texture patch from the first texture atlas and a secondary texture patch from the second texture atlas (where the primary texture patch may be the same as the first texture patch, but equally may be a different texture patch - e.g. the first texture patch may be at the index i and the primary texture patch may be at the index i-1). Then, as described previously, the computer device may update the index of the secondary texture patch based on the index of the primary texture patch.
[0292] Equally, the computer device may update the index of the secondary texture patch based on the index of the first texture patch. That is, where the first texture patch is located at an index i, the computer device may identify a plurality of texture patches in the first texture atlas with indexes from i-2 to i+2. The computer device may then identify the texture patch in the second texture atlas (or the texture patch in the second atlas with an index equal to or greater than i) that is the most similar to any of the plurality of texture patches of the first texture atlas. For example, a secondary texture patch of the second texture atlas may be identified that is most similar to a primary texture patch at index i-1 of the first texture atlas. This secondary texture patch may be moved to the index i of the second texture atlas even though it is more similar to the primary texture patch at index i-1 than the first texture patch at index i (this avoids moving a texture patch of the second texture atlas that has already been moved to index i-1).
[0293] In a practical example, the second texture atlas may comprise two texture patches that are very similar to a texture patch at index i-1 of the first texture atlas. During the consideration of this (i-1)th texture patch of the first texture atlas, one of these texture patches of the second texture atlas may be moved to the index i-1 of the second texture atlas. Then, during the consideration of the ith texture patch of the first texture atlas, the other similar texture patch of the second texture atlas may be moved to the index i of the second texture atlas based on its similarity to the (i-1)th patch of the first texture atlas. This can provide efficient encoding since these similar blocks will be located spatially adjacent in a two-dimensional image formed by the texture atlas. For example, the encoder may eventually encode the ith block of the second texture atlas based on the (i-1)th block of the first texture atlas and a motion vector. The range of indexes considered when identifying the plurality of texture patches of the first texture atlas may depend, for example, on the available time or the available hardware, where a larger range typically provides a more efficient rearrangement of texture patches at the cost of an additional computational requirement.
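The windowed comparison can be sketched as follows, taking the variant in which the secondary texture patch is moved to the index of the first texture patch. The sketch assumes flat lists of sample values, SAD as the similarity measure, 0-based indexes, and a symmetric window clipped at the ends of the atlas; the name `rearrange_with_window` and its `radius` parameter are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized flat patches."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rearrange_with_window(first, second, radius=2):
    """Greedy rearrangement where each candidate of `second` is scored
    against the patches of `first` at indexes i-radius..i+radius."""
    second = list(second)
    n = len(first)
    for i in range(n - 1):
        window = range(max(0, i - radius), min(n, i + radius + 1))
        # A candidate's score is its best (lowest) SAD against any patch
        # in the window; candidates are limited to indexes i..n-1.
        best = min(
            range(i, n),
            key=lambda j: min(sad(first[k], second[j]) for k in window),
        )
        second.insert(i, second.pop(best))
    return second


first_atlas = [[10, 10], [100, 100], [200, 200]]
# Two patches of the second atlas closely resemble first_atlas[0].
second_atlas = [[120, 120], [11, 10], [10, 12]]
windowed = rearrange_with_window(first_atlas, second_atlas, radius=1)
plain = rearrange_with_window(first_atlas, second_atlas, radius=0)
```

With `radius=1` the two look-alike patches end up at adjacent indexes 0 and 1, as in the practical example above, whereas with `radius=0` the method reduces to the plain index-range procedure and the second look-alike patch is left at the last index.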
[0294] Referring to Figures 17a-17d, in some embodiments the indexes are ordered left-to-right and / or top-to-bottom as shown in Figures 17a and 17b. Therefore, the plurality of texture patches of the first texture atlas that are selected for comparison to the potential second texture patches may be arranged along a line.
[0295] In some embodiments, as shown in Figure 17c and 17d, the plurality of texture patches are arranged in a z-order, where this leads to consecutive texture patches being located in spatially similar locations (e.g. instead of being arranged in a line, the plurality of texture patches of the first texture atlas may be arranged in a box).
[0296] This use of a z-order for the indexing therefore leads to each potential second texture patch being compared to a plurality of texture patches of the first texture atlas that are located in a spatially similar location (which texture patches may, in some situations, be expected to contain similar values).
[0297] While Figure 17d shows one arrangement of a z-order pattern, it will be appreciated that various z-order arrangements are possible (e.g. where the texture atlas comprises separate blocks of texture patches, where a z-order pattern is used for each block as well as the texture atlas as a whole).
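One common way of producing a z-order index is bit interleaving (a Morton code), sketched below. The exact pattern of Figure 17d is not reproduced here, so the choice of placing the column bits in the even positions and the row bits in the odd positions is an assumption, as are the names `z_index` and `z_position`.

```python
def z_index(x, y, bits=16):
    """Interleave the bits of column x and row y into a single z-order index."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)      # x bits go to even positions
        z |= ((y >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return z


def z_position(z, bits=16):
    """Inverse of z_index: recover (x, y) from a z-order index."""
    x = y = 0
    for b in range(bits):
        x |= ((z >> (2 * b)) & 1) << b
        y |= ((z >> (2 * b + 1)) & 1) << b
    return x, y


# The first four indexes cover a 2x2 box rather than a line of four patches.
first_four = [z_position(z) for z in range(4)]
```

Under this scheme, a run of consecutive indexes tends to map to a compact box of patches, so a window of indexes around a given patch covers spatially nearby patches.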
[0298] Reordering
[0299] In some embodiments, the computer device is arranged to reorder the first texture atlas before the reordering of the second texture atlas. For example, the computer device may reorder the first texture atlas based on luminance values of the texture patches of the first texture atlas (e.g. to order the texture patches from lightest to darkest or vice versa). As used herein ‘reordering’ is typically used to indicate the movement of texture patches in a texture atlas where these movements are not dependent on another texture atlas while ‘rearranging’ is typically used to indicate the movement of texture patches in a texture atlas in dependence on (texture patches in) another texture atlas. This terminology is used largely for the sake of clarity.
[0300] Reordering the first texture atlas in this manner may involve updating the references of one or more texture points of an associated three-dimensional representation. In particular, the computer device may determine a correspondences list that defines original indexes for each texture patch (before the reordering) and updated indexes for each texture patch (after the reordering) where the correspondences list may then be used to update references associated with texture points within the three-dimensional representation. For example, if a texture patch is moved from an index i+4 to an index i, the associated texture point may be updated so as to point to this index i (instead of the originally referenced index i+4).
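A luminance-based reordering together with its correspondences list can be sketched as below. The sketch assumes each texture patch is a flat list of luma samples and uses the mean sample value as the patch luminance; both are illustrative assumptions, as are the names `reorder_by_luminance` and `update_references`. Indexes are 0-based.

```python
def reorder_by_luminance(atlas):
    """Sort patches from darkest to lightest.

    Returns (reordered_atlas, correspondences), where correspondences maps
    each original index to the index of the same patch after reordering.
    """
    order = sorted(range(len(atlas)), key=lambda i: sum(atlas[i]) / len(atlas[i]))
    reordered = [atlas[i] for i in order]
    correspondences = {old: new for new, old in enumerate(order)}
    return reordered, correspondences


def update_references(references, correspondences):
    """Rewrite texture point references to follow the moved patches."""
    return [correspondences[ref] for ref in references]


atlas = [[200, 210], [10, 20], [100, 90]]
point_refs = [0, 0, 2, 1]  # patch index referenced by each texture point
reordered, correspondences = reorder_by_luminance(atlas)
new_refs = update_references(point_refs, correspondences)
```

After the update, every texture point still resolves to the same patch content as before the reordering, only at a different index.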
[0301] Similarly, following the reordering of the second texture atlas, the method typically comprises updating one or more texture points of the second three-dimensional representation based on updated index values of texture patches in the second texture atlas.
[0302] As described above, in some embodiments the method comprises reordering the first texture atlas based on a characteristic of the texture patches in the first texture atlas, e.g. based on the luminance of each texture patch. The method may comprise reordering the second texture atlas based on a characteristic of the texture patches in the first texture atlas. Typically, the same characteristic is used to implement an initial reordering of each texture atlas. Reordering the second texture atlas based on the characteristic before carrying out the above-described rearranging of the second texture atlas based on the first texture atlas can reduce the time taken to obtain an accurate rearranged second texture atlas since this initial reordering step moves the texture patches in the second texture atlas into a roughly correct order before the rearranging is performed. Typically, correspondences (between texture points and initial texture patch indexes) are determined before any reordering or rearranging takes place. Any movements of texture patches (any updates of indexes of texture patches) are then recorded during the reordering and rearranging processes, and then once a final rearranged texture atlas is obtained, each texture point is updated so as to reference the correct texture patch (where this typically comprises updating a reference in the texture point to point to a new index of a rearranged texture patch). In some embodiments, any changes in index are tracked, and then these changes are transmitted in a bitstream alongside the texture atlas. This enables an unedited version of the three-dimensional representation to be transmitted, where the correct texture patch for each texture point can then be determined based on the index referenced by the texture point and the stored changes that indicate any updates in the index of the texture patches.
[0303] In a practical example, a given texture point may reference the 5th texture patch in a texture atlas. As part of the reordering and / or the rearranging of a texture atlas, this texture patch may be moved to the 3rd index of the texture atlas. To ensure that the texture point is correctly interpreted, the reference of a texture point associated with this texture atlas may be updated (to point to the 3rd index instead of the 5th index).
[0304] Additionally or alternatively, a list of index updates may be transmitted alongside the rearranged texture atlas so that a computer device that is parsing the texture point is able to identify that it pointed to the original 5th index and to identify that the texture patch at this original location has been moved to the 3rd index. The use of a list of index updates precludes the need to alter the three-dimensional representation alongside the rearranging of the texture atlases. This may be useful, for example, when a first device already has a three-dimensional representation stored. The above-described rearranging of a texture atlas may be performed at a second device that sends the rearranged texture atlas and a list of index updates to the first device. With this arrangement, the second device does not need to have access to the three-dimensional representation (which may be useful, for example, if the three-dimensional representation comprises data that must be kept secure) and the first device does not need to alter the stored three-dimensional representation (which may be useful, for example, if the first device has been used to make modifications to the three-dimensional representation, e.g. to alter the locations of points of the three-dimensional representation or to alter attributes of non-texture points of the three-dimensional representation).
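The use of a list of index updates at the parsing device can be sketched as follows. The representation of the list as a mapping from original index to new index is an assumption made for illustration, and the names `resolve` and `fetch_patch` are hypothetical. Indexes are 1-based here to match the 5th and 3rd index example above.

```python
def resolve(reference, index_updates):
    """Translate an original patch index into its index in the rearranged atlas.

    Indexes that do not appear in the list of updates are unchanged.
    """
    return index_updates.get(reference, reference)


def fetch_patch(reference, rearranged_atlas, index_updates):
    """Look up the patch for an unmodified texture point reference (1-based)."""
    return rearranged_atlas[resolve(reference, index_updates) - 1]


# The patch originally at the 5th index has moved to the 3rd index, which
# pushes the patches originally at the 3rd and 4th indexes back by one.
original_atlas = ["A", "B", "C", "D", "E"]
rearranged_atlas = ["A", "B", "E", "C", "D"]
index_updates = {5: 3, 3: 4, 4: 5}

patch = fetch_patch(5, rearranged_atlas, index_updates)
```

The stored three-dimensional representation is never modified in this sketch: the texture point keeps its original reference and the translation happens at look-up time.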
[0305] In some embodiments, the method comprises reordering and / or rearranging a plurality of texture atlases. In some embodiments, a plurality of texture atlases are rearranged in dependence on the first texture atlas (e.g. a second, third, fourth etc. texture atlas are each rearranged based on the same first texture atlas). In some embodiments, a plurality of texture atlases are rearranged based on previous texture atlases (e.g. a second texture atlas is rearranged based on the first texture atlas, a third texture atlas is rearranged based on the second texture atlas, a fourth texture atlas is rearranged based on the third texture atlas, etc.) so as to indirectly rearrange a plurality of texture atlases based on the first texture atlas. Such methods may involve reordering the first texture atlas based on, e.g. the relative luminance of the component texture patches and then rearranging each other texture atlas based on the first texture atlas in a successive pattern. Any of these other texture atlases may be reordered before being rearranged.
[0306] When each of the texture atlases have been reordered and / or rearranged, the present disclosure envisages a method of encoding the plurality of texture patches so as to form a video bitstream. For example, the texture patches may be encoded using AVC, HEVC, VVC, LCEVC or any other video compression technology. The rearranging of the texture atlases enables the efficient compression of the texture atlases.
[0307] Determination of an attribute value of the scene
[0308] As described above, the texture patches may be used to render a two-dimensional image based on the three-dimensional representation. In particular, the texture patches may be used so that a single point within the three-dimensional representation can be associated with a plurality of attribute values that lie on a shared plane (where the single ‘texture’ point can then define a location of these attribute values so as to provide an increase in efficiency as compared to an implementation where each attribute value is associated with a separate point with its own location).
[0309] Typically, the texture patches are stored within a texture atlas, where each texture point in the three-dimensional representation references a texture patch (e.g. by indicating an index within the texture atlas).
[0310] As has been described above, the methods of processing a texture patch within the texture atlas may comprise updating the references of texture points associated with any processed texture patch.
[0311] Thereafter, a representation of the scene may be rendered based on the updated references (and, e.g. based on deltas that are associated with the texture atlas).
[0312] Figure 18 shows a method of determining an attribute value of the scene in dependence on a reference in a texture point. In particular, the attribute value may be used to determine a pixel value of a rendered two-dimensional image of the scene.
[0313] In a first step 91, the computer device identifies a texture point.
[0314] In a second step 92, the computer device identifies a reference in this texture point to a texture patch. Typically, the texture points comprise a reference to a texture patch in place of an attribute value (where other points may contain an attribute value, such as a colour, in an attribute field, the texture points may contain a texture patch index in this field).
[0315] The texture atlas index may be a modified texture atlas index, wherein the original texture atlas index has been replaced during the previously described processing. In some embodiments, the method comprises identifying that a reference has been modified, where the computer device may be arranged to modify the reference so as to identify that the modified reference refers to, for example, a texture patch within a rearranged texture atlas.
[0316] In some embodiments, the method comprises comparing the reference in the texture point to a list of updates, which list of updates identifies texture patches that have been rearranged. Where the texture point has not been modified following the rearrangement of the texture atlas, this may involve determining, based on the list, a correspondence between an original position of a texture patch (as identified by the reference of the texture point) and a final position of a texture patch (the position of this texture patch in the rearranged texture atlas) so as to identify the correct texture patch for a texture point. For example, the computer device may identify that a texture point points to the 5th index of a texture atlas, and the computer device may query the list of updates to identify that the texture patch that was at the 5th index in the original texture atlas has been moved to the 3rd index of the rearranged texture atlas. This texture patch at the 3rd index of the rearranged texture atlas may then be retrieved so that it can be displayed at the location of the texture point.
[0317] In this regard, in a third step 93, the computer device determines a plurality of attribute values for the texture point based on the referenced texture patch. Determining a plurality of attribute values may comprise identifying a texture patch located at a texture atlas index and identifying the attribute values of this texture patch.
[0318] Typically, the attribute values comprise colour values. For example, the texture patch may define an 8x8 square of colour values that can then be rendered at the location defined by the texture point. The texture patch may also define, for example, one or more transparency values, one or more normal values, etc. Typically, the texture patch defines a predetermined arrangement of attribute values (e.g. an 8x8 square).
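The three steps of Figure 18 can be sketched as below. The data layout is assumed for illustration only: a texture point is modelled as a location plus a patch index held in place of a colour, a texture patch is an 8x8 grid of RGB tuples, and indexes are 0-based. The names `TexturePoint` and `attribute_values` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TexturePoint:
    position: tuple   # (x, y, z) location of the point
    patch_index: int  # reference to a texture patch, in place of a colour


def attribute_values(point, atlas):
    """Return the attribute values of the patch referenced by a texture point.

    Each entry is (u, v, colour), where (u, v) is the offset of the value
    within the patch.
    """
    patch = atlas[point.patch_index]  # follow the reference (step 92)
    return [                          # collect the values (step 93)
        (u, v, colour)
        for v, row in enumerate(patch)
        for u, colour in enumerate(row)
    ]


black = [[(0, 0, 0)] * 8 for _ in range(8)]
gradient = [[(255, u, v) for u in range(8)] for v in range(8)]
atlas = [black, gradient]

point = TexturePoint(position=(1.0, 2.0, 3.0), patch_index=1)  # step 91
values = attribute_values(point, atlas)
```

How the (u, v) offsets are mapped onto the shared plane at the location of the texture point is left open in this sketch.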
[0319] Encoding (and formation of a bitstream)
[0320] Typically, the processing of a plurality of images as described above is performed on a computer device, wherein the resulting processed plurality of images (e.g. the texture atlas) may then be stored and / or transmitted by the computer device. Typically, the processed plurality of images will be encoded into a bitstream (a series of bits) by an encoding device before being transmitted. Typically, the bitstream will be decoded by a decoding device after being transmitted.
[0321] Typically, the texture atlas is associated with a three-dimensional representation, and the computer device may be arranged to generate a bitstream that comprises the processed texture atlas and one or more points of the three-dimensional representation. In some embodiments the texture atlas is a composite texture atlas which is associated with multiple three-dimensional representations and the computer device may be arranged to generate a bitstream that comprises the processed composite texture atlas as well as one or more points of a plurality of three-dimensional representations.
[0322] Figure 19 shows a schematic of a bitstream comprising 3 sections wherein each section comprises bits encoded from a portion of the processed plurality of images.
[0323] Bit-a to Bit-d forms a first section of the bitstream wherein one or more texture patches and / or one or more texture atlases may be encoded. The first section of the bitstream may comprise one or more flags, wherein flags are bits which may indicate the format of the data related to the processed plurality of images. For example, flags indicating information related to the size and shape of said texture patches may also be encoded into the bitstream. In various embodiments, the bitstream may comprise one or more flags that indicate: whether texture patches are used in the three-dimensional representation(s); whether rearranged texture patches are used in the three-dimensional representation(s); and whether texture atlases are encoded using intra-encoding. In some embodiments, one or more texture atlases may be encoded by reference to other texture atlases (e.g. a second texture atlas may be encoded as differences between the second texture atlas and a first texture atlas; a flag may then be included in the bitstream to indicate that the second texture atlas is encoded in this manner).
[0324] Bit-e to Bit-f forms a second section of the bitstream wherein a plurality of texture points are encoded (other points that are not texture points may also be encoded). Each texture point comprises a texture patch reference that identifies an index of a texture patch in a texture atlas.
[0325] Typically, each texture atlas is associated with a different three-dimensional representation so that the index of a texture point relates to a specific texture atlas. In some embodiments, the reference of a texture point may identify each of a texture atlas and a texture patch where a single texture atlas may be referenceable by the texture points of a plurality of different three-dimensional representations.
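Encoding a second texture atlas as differences from a first texture atlas can be sketched as below. This is a simplified stand-in for what a video codec does with inter prediction, not a description of any particular codec: patches are assumed to be flat lists of sample values, and the names `residual`, `reconstruct` and `cost` are hypothetical.

```python
def residual(first, second):
    """Per-sample differences between matching patches of two atlases."""
    return [[s - f for f, s in zip(p, q)] for p, q in zip(first, second)]


def reconstruct(first, res):
    """Rebuild the second atlas from the first atlas and the residual."""
    return [[f + r for f, r in zip(p, q)] for p, q in zip(first, res)]


def cost(res):
    """Total residual magnitude, used here as a rough proxy for coding cost."""
    return sum(abs(v) for patch in res for v in patch)


first_atlas = [[10, 10], [50, 50], [90, 90]]
original_second = [[88, 92], [11, 9], [52, 49]]
rearranged_second = [[11, 9], [52, 49], [88, 92]]

cost_before = cost(residual(first_atlas, original_second))
cost_after = cost(residual(first_atlas, rearranged_second))
```

With these stand-in values the residual is far smaller after rearranging, which illustrates why rearranging the second texture atlas to resemble the first can reduce the amount of data to be encoded.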
[0326] As described above, the bitstream may comprise a list of tracked changes that indicates any updates that have been made to the indexes of texture patches in a texture atlas. Therefore, a computer device parsing the bitstream may be able to match references in texture points to an original texture atlas to texture patches within a rearranged texture atlas.
[0327] In some embodiments, flags may indicate the length (in number of bits) of each section of the bitstream and / or if any section of the bitstream can be subdivided into portions of equal length. Hence flags may indicate if a decoder device may parallelise the decoding step by segmenting the bitstream into a plurality of decodable portions.
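Signalling the length of each section so that a decoder can split the bitstream can be sketched as below. The layout is purely hypothetical and is not the format of Figure 19: it assumes a header of two 32-bit big-endian lengths, counted in bytes rather than bits for simplicity, followed by the two sections. The names `pack_sections` and `split_sections` are also hypothetical.

```python
import struct

HEADER = ">II"  # hypothetical header: two unsigned 32-bit big-endian lengths


def pack_sections(atlas_bytes, point_bytes):
    """Prefix the two sections with their lengths."""
    header = struct.pack(HEADER, len(atlas_bytes), len(point_bytes))
    return header + atlas_bytes + point_bytes


def split_sections(bitstream):
    """Use the signalled lengths to cut the bitstream into its two sections,
    which could then be handed to separate decoders."""
    first_len, second_len = struct.unpack_from(HEADER, bitstream, 0)
    start = struct.calcsize(HEADER)
    first = bitstream[start:start + first_len]
    second = bitstream[start + first_len:start + first_len + second_len]
    return first, second


stream = pack_sections(b"ATLAS", b"PTS")
atlas_section, point_section = split_sections(stream)
```

Because the section boundaries are known from the header alone, neither section has to be decoded before the other can be located.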
[0328] It will be appreciated that the number of bits in each section and the ratio of bits between sections are provided purely as an example and that in practice the number and ratio will vary based on the nature of the plurality of images provided.
[0329] While the bitstream is typically encoded in the order of the sections provided above, the bitstream may be encoded in any order. In some embodiments bits from the first and second section may be ‘interlaced’, wherein, for example, each texture atlas is immediately followed by the points of a three-dimensional representation associated with that texture atlas. The bitstream described above may be decoded by a decoding device and this may allow the original (or similar to the original) plurality of images to be regenerated.
[0330] A method of decoding said bitstream may comprise the steps of: identifying a first and a second section of bits in a bitstream; generating one or more texture atlases based on the first section; identifying one or more texture points based on the second section; and generating one or more images based on the texture atlas(es) and the texture points.
[0331] Typically, the encoding of the texture atlases comprises inter encoding. In particular, video coding processes may be used to efficiently encode a plurality of successive texture atlases.
[0332] Optionally, initial texture atlases may be rearranged prior to encoding to enable more efficient encoding of one or more texture atlases. For example, the texture patches in a texture atlas may be rearranged to reduce spatial differences in the texture atlas (e.g. to place similar texture patches adjacent each other). Equally, a second texture atlas may be rearranged to be more similar to a first texture atlas as described above.
[0333] Typically, the encoder 12 and the decoder 16 will comprise a hardware video encoder / decoder chip that is capable of such video encoding so that this method of encoding / compressing the texture atlases is efficient from a resource point of view.
[0334] In some embodiments, the aforementioned sections of a bitstream are arranged to be decoded separately (e.g. by separate computer devices or processing units), where this enables a parallelised method of decoding the bitstream so as to speed up a process of decoding and rendering a scene.
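The structure of such a parallelised decoder may be sketched as below, with each section handed to its own worker. The two section decoders are hypothetical placeholders that simply unpack bytes. Note that threads in standard CPython builds do not execute Python bytecode in parallel, so a real speed-up would rely on separate processes, devices, or hardware decoders; the sketch shows only how the work is split and rejoined.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_atlas_section(section: bytes) -> list[int]:
    # Placeholder: a real implementation would run a video decoder here.
    return list(section)


def decode_point_section(section: bytes) -> list[int]:
    # Placeholder: a real implementation would parse point data here.
    return list(section)


def decode_in_parallel(atlas_section: bytes, point_section: bytes):
    """Decode both sections concurrently and return both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        atlas_future = pool.submit(decode_atlas_section, atlas_section)
        point_future = pool.submit(decode_point_section, point_section)
        return atlas_future.result(), point_future.result()


atlas_pixels, point_data = decode_in_parallel(b"\x01\x02\x03", b"\x00\x01")
```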
[0335] Referring to Figure 20, there is described a method of encoding a plurality of texture atlases, where each texture atlas comprises one or more texture points. More broadly, there is described a method of encoding a plurality of groups of images, where each group of images comprises one or more images. This method is carried out by a computer device (e.g. a computer device encoding one or more three-dimensional representations).
[0336] In a first step 101, the computer device identifies a first texture atlas. In a second step 102, the computer device identifies a second texture atlas. In a third step 103, the computer device encodes the first texture atlas and the second texture atlas as a video (e.g. using a video encoding technique such as LCEVC).
[0337] The computer device may arrange (or rearrange or reorder) the texture patches in the first or second texture atlas before the encoding of the first and second texture atlas in order to improve the efficiency of the encoding. For example, the first and / or second texture atlas may be rearranged using the methods described above.
[0338] Typically, the method comprises encoding a plurality of texture atlases as a video (e.g. at least 3, at least 5, or at least 10 texture atlases), where this provides an efficient method of encoding and transmitting texture atlases. The texture atlases may be provided in the same bitstream as the points of one or more three-dimensional representations. Equally, the texture atlases may be provided in a separate bitstream.
[0339] Alternatives and modifications
[0340] It will be understood that the present invention has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.
[0341] The representation is typically arranged to provide an extended reality (XR) experience (e.g. a representation that is useable to render an XR video). The term extended reality (XR) covers each of virtual reality (VR), augmented reality (AR), and mixed reality (MR) and it will be appreciated that the disclosures herein are applicable to any of these technologies. The representation may be encoded into, and / or transmitted using, a bitstream, which bitstream typically comprises point data for one or more points of the three-dimensional representation. The point data may be compressed or encoded to form the bitstream. The bitstream may then be transmitted between devices before being decoded at a receiving device so that this receiving device can determine the point data and reform the three-dimensional representation (or form one or more two-dimensional images based on this three-dimensional representation). In particular, the encoder 12 may be arranged to encode (e.g. one or more points of) the three-dimensional representation in order to form the bitstream and the decoder 14 may be arranged to decode the bitstream to generate the one or more two-dimensional images.
[0342] Typically, each texture atlas is associated with a separate three-dimensional representation and these texture atlases are processed separately. In some embodiments, the plurality of images includes images from a plurality of texture atlases, where this enables the efficient encoding of texture atlases for different three-dimensional representations (e.g. a shared set of representative images may be determined for a plurality of texture atlases associated with a plurality of three-dimensional representations where this set of representative images can then be signalled once at the start of a bitstream comprising this plurality of three-dimensional representations).
[0343] While the detailed description has primarily described the processing of a texture atlas that comprises a plurality of texture patches, it will be appreciated that more generally the methods disclosed herein may be applied to any group or set of images, where these sets of images may be used for various purposes. Typically, the methods are applicable to sets of images that comprise code books, where the sets of images are not intended to be displayed themselves but instead are intended to be used as a source from which images can be extracted. These extracted images can then be used to form images for display (e.g. by arranging the images in a different configuration to the code book).
[0344] Typically, a second image in the second group of images (e.g. a potential second texture patch in the second texture atlas) is compared to a first image in the first group of images (e.g. a first texture patch in the first texture atlas) in order to determine a pair of similar images. In some embodiments, a subset of pixels of the second group of images may be compared to the first image, where this subset of pixels may not be a texture patch (e.g. this subset of pixels may be spread across a plurality of texture patches). Such embodiments enable the formation of a rearranged second group of images that is more similar to the first group of images, but this may require the tracking of the rearrangements in order to enable the texture patches of the second texture atlas to be reconstituted in order to form an image based on these texture patches.
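A minimal sketch of this pairing is given below, assuming greyscale patches stored as flat lists of pixel values and a sum of absolute differences (SAD) as the similarity measure. For each index i of the first group, the most similar image of the second group with an index equal to or greater than i is moved to index i. The greedy strategy, the swap-based update, and all function names are assumptions made for illustration; the record `origin` maps each updated index back to the original index, and is used to update the references held by texture points.

```python
def sad(a: list[int], b: list[int]) -> int:
    """Sum of absolute differences between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rearrange_second_group(first, second):
    """Return the rearranged second group and origin[new] = original index."""
    second = list(second)
    origin = list(range(len(second)))
    for i in range(min(len(first), len(second))):
        # Candidates are the images with an index equal to or greater than i.
        best = min(range(i, len(second)),
                   key=lambda j: sad(first[i], second[j]))
        second[i], second[best] = second[best], second[i]
        origin[i], origin[best] = origin[best], origin[i]
    return second, origin


def update_references(point_refs: list[int], origin: list[int]) -> list[int]:
    """Rewrite texture point references to use the updated indexes."""
    new_index = {old: new for new, old in enumerate(origin)}
    return [new_index[ref] for ref in point_refs]


first = [[0, 0], [100, 100], [200, 200]]
second = [[198, 202], [1, 0], [99, 101]]
rearranged, origin = rearrange_second_group(first, second)
refs = update_references([0, 0, 2], origin)
total_before = sum(sad(a, b) for a, b in zip(first, second))
total_after = sum(sad(a, b) for a, b in zip(first, rearranged))
```

In this toy case the total SAD between the two groups falls from 799 to 7 after rearrangement, and the texture point references that originally pointed at index 0 now point at index 2.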
[0345] Typically, the computer device is arranged to evaluate texture patches based on indexes of these texture atlases in order to rearrange the texture patches in the second texture atlas. In some embodiments, the computer device may also (or alternatively) evaluate another set of pixels of the images that compose the texture atlases. For example, instead of considering a texture patch at an index i, the computer device may consider a shifted set of pixels (e.g. a set of pixels that comprises the last seven columns of the texture patch i and the first column of the texture patch i+1). This may lead to the computer device comparing sets of pixels that are not texture patches (and that are not associated with any texture points of the three-dimensional representation), but that are pixels from texture patches located adjacent to each other in the texture atlas.
[0346] In general, the computer device may consider a plurality of sets (or blocks) of pixels from the first texture atlas and compare these to one or more corresponding blocks of pixels from the second texture atlas in order to rearrange these blocks of the second texture atlas. This may comprise identifying a plurality of blocks from the first texture atlas at one or more positions that are shifted from the indices of the first texture atlas (e.g. the computer device may consider a texture patch at an index i, then an image that is shifted by one column of pixels, then an image that is shifted by two columns of pixels, etc., then an image that is shifted by one row of pixels, then an image that is shifted by two rows of pixels, etc.). Therefore, each block of pixels of a given size or arrangement may be considered with the second texture atlas being arranged based on these blocks of pixels. This can provide more efficient encoding of the second texture atlas at the cost of requiring the recording of pixel movements so that the initial texture patches of the second texture atlas can be reconstituted.
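The shifted set of pixels in the example above may be extracted as sketched below. The sketch assumes texture patches that are eight pixels wide and laid out side by side in a single strip of the atlas, with the atlas held as a list of pixel rows; the patch width, the layout, and the function name are assumptions for illustration.

```python
PATCH_WIDTH = 8  # assumed patch width, matching the seven-plus-one example


def shifted_block(atlas_rows: list[list[int]], index: int, shift: int):
    """Return a patch-sized block starting `shift` columns after patch `index`."""
    start = index * PATCH_WIDTH + shift
    return [row[start:start + PATCH_WIDTH] for row in atlas_rows]


# One pixel row spanning two patches: columns 0-7 are patch 0, 8-15 patch 1.
atlas_rows = [list(range(16))]
patch_0 = shifted_block(atlas_rows, 0, 0)
shifted = shifted_block(atlas_rows, 0, 1)
```

With a shift of one column, the block holds the last seven columns of patch 0 (values 1 to 7) followed by the first column of patch 1 (value 8).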
[0347] Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.
Claims
1. A method of processing a group of images, the method comprising: identifying a first group of images and a second group of images, wherein each image in each group of images is associated with an index; identifying a first image in the first group of images; determining a second image in the second group of images, the second image being similar to the first image; and updating an index of the second image based on an index of the first image.
2. The method of any preceding claim, wherein the second group of images is associated with a three-dimensional representation of a scene, wherein one or more texture points in the three-dimensional representation reference images within the second group of images.
3. The method of claim 2, comprising: identifying a texture point in the three-dimensional representation that is associated with the second image; and updating a reference of the texture point based on the updated index of the second image.
4. The method of any preceding claim, comprising updating the index of the second image to be the same as the index of the first image.
5. The method of any preceding claim, wherein determining the second image comprises comparing one or more potential images from the second group of images to the first image, preferably comprising identifying the second image as a most similar image from the plurality of potential images.
6. The method of claim 5, wherein the potential images are selected from those images with an index equal to or greater than an index of the first image, preferably wherein the potential images include each image of the second group of images that has an index equal to or greater than the index of the first image.
7. The method of any preceding claim, wherein determining the second image comprises determining one or more of: a distance between the first image and the second image; a Euclidean distance between the first image and the second image, and a sum of absolute differences (SAD) between the first image and the second image.
8. The method of any preceding claim, comprising outputting a record of the update made to the index of the second image, preferably comprising outputting a record of one or more changes made to the second group of images, the changes indicating correspondences between original indexes of the images of the second group of images and updated indexes of said images of the second group of images.
9. The method of any preceding claim, comprising reordering the first group of images prior to the identifying of the first image, preferably comprising reordering the first group of images based on a characteristic of each image in the first group of images, more preferably wherein the characteristic comprises a luminance.
10. The method of claim 9, comprising reordering the second group of images prior to the identifying of the first image, preferably comprising reordering the second group of images based on a characteristic used to reorder the first group of images.
11. The method of any preceding claim, comprising iterating through a plurality of images in the first group of images and, for each image: determining a further image from the second group of images that is similar to said image; and updating an index of the further image based on an index of said image.
12. The method of claim 11, comprising performing a plurality of iterations through the first group of images so as to update indexes of images of the second group of images, preferably wherein the plurality of iterations are performed in a plurality of different directions, more preferably wherein the plurality of iterations includes a forwards iteration and a backwards iteration.
13. The method of claim 12 comprising, for one or more of the images in the second group of images: identifying, during the first iteration, a first similar image in the first texture atlas; identifying, during the second iteration, a second similar image in the first texture atlas; determining a first similarity between the image in the second group of images and the first similar image; determining a second similarity between the image in the second group of images and the second similar image; comparing the first similarity and the second similarity; and updating the index of the image in the second group of images in dependence on the comparison of the first similarity and the second similarity; preferably, comprising updating the index of the image in the second group of images based on the image in the first group of images with a greater similarity.
14. The method of claim 12 or 13, comprising: determining a first rearranged second group of images following a first iteration through the first group of images; determining a second rearranged second group of images following a second iteration through the first group of images; determining a first similarity between the first rearranged second group of images and the first group of images; determining a second similarity between the second rearranged second group of images and the first group of images; and outputting one of the first rearranged second group of images and the second rearranged second group of images based on a comparison of the first similarity and the second similarity.
15. The method of any preceding claim, wherein each group of images comprises a two-dimensional macro image that is composed of the images in the group of images.
16. The method of claim 15, wherein each group of images is arranged such that the indices of the component images are arranged in a z-pattern.
17. The method of any preceding claim, comprising: identifying a plurality of potential first images in the first group of images, the plurality of potential first images comprising images with a range of indices; for one or more potential second images from the second group of images: comparing said potential second image to each of the plurality of potential first images; determining, for said potential second image, a most similar first image from the plurality of potential first images; and updating the index of said potential second image based on the index of the most similar first image.
18. The method of any preceding claim, wherein the first group of images is associated with a first three-dimensional representation and the second group of images is associated with a second three-dimensional representation; preferably, wherein the first and second three-dimensional representations are successive three-dimensional representations.
19. The method of any preceding claim, comprising: selecting a set of potential second images from the second group of images; preferably, the set of potential images is selected by random or stratified sampling.
20. The method of any preceding claim, comprising: determining, for each of the first image and the second image, a characteristic set of pixels; and determining that the second image is similar to the first image based on the respective characteristic sets of pixels.
21. The method of any preceding claim, comprising forming a bitstream comprising the second group of images.
22. The method of any preceding claim, comprising encoding the first group of images and the second group of images as a video, preferably using one or more of: AVC, HEVC, VVC, and LCEVC processes.
23. A method of encoding a group of images, the method comprising: identifying a first group of images that forms a first two-dimensional image comprising these images, wherein each image of the first group of images is present in the first two-dimensional image; identifying a second group of images that forms a second two-dimensional image comprising these images, wherein each image of the second group of images is present in the second two-dimensional image; and encoding the first group of images and the second group of images as a video.
24. The method of claim 22 or 23, comprising rearranging the first group of images and / or the second group of images so as to decrease a spatial difference within the first group of images and / or the second group of images, thereby allowing more efficient encoding of the video.
25. The method of any of claims 22 to 24, comprising rearranging the second group of images so as to decrease a difference between the second group of images and the first group of images, thereby allowing more efficient encoding of the video.
26. The method of any preceding claim, wherein each two-dimensional image comprises a plurality of tiles, wherein each tile comprises an image.
27. The method of any preceding claim, comprising: arranging the first group of images so as to form the first two-dimensional image; and arranging the second group of images so as to form a second two-dimensional image.
28. The method of any preceding claim, wherein each of the images of the first group of images and / or each of the images of the second group of images is associated with an index such that said image can be identified based on the index.
29. The method of any preceding claim, wherein each of the images of the first group of images and / or each of the images of the second group of images is associated with a point in a three-dimensional representation associated with the groups of images, preferably wherein the point comprises a reference to an index of an image within one of the groups of images.
30. A computer program product comprising software code that, when executed on a computer device, causes the computer device to perform the method of any preceding claim.
31. A machine-readable storage medium that includes instructions that, when executed by one or more processors of a machine, cause the machine to perform the method of any of claims 1 to 29.
32. An apparatus for processing a group of images, the apparatus comprising: means for identifying a first group of images and a second group of images, wherein each image in each group of images is associated with an index; means for identifying a first image in the first group of images; means for determining a second image in the second group of images, the second image being similar to the first image; and means for updating an index of the second image based on an index of the first image.
33. An apparatus for encoding a group of images, the apparatus comprising: means for identifying a first group of images that forms a first two-dimensional image comprising these images, wherein each image of the first group of images is present in the first two-dimensional image; means for identifying a second group of images that forms a second two-dimensional image comprising these images, wherein each image of the second group of images is present in the second two-dimensional image; and means for encoding the first group of images and the second group of images as a video.
34. A bitstream comprising one or more groups of images determined using the method of any of claims 1 to 29, preferably wherein the groups of images are encoded using a video codec, more preferably wherein the images are encoded using one or more of: AVC, HEVC, VVC, and LCEVC processes.
35. The bitstream of claim 34, comprising: one or more texture points of a three-dimensional representation, each texture point comprising a reference to an image in a group of images; a group of images; and a record of one or more changes made previously to the groups of images, the changes indicating a correspondence between an index contained in the texture point and an actual index of an image referenced by the texture point.
36. An apparatus for forming and / or encoding the bitstream of claim 34 or 35.
37. An apparatus for receiving and / or decoding the bitstream of claim 34 or 35.