Method, apparatus, and computer program for decoding time-referenced Moving Picture Experts Group (MPEG) Immersive Haptics Stream (MIHS) units

The introduction of MIHS units addresses the challenge of integrating haptic experiences by providing a structured format for haptic data delivery, enhancing efficiency and reducing computational overhead in multimedia presentations.

JP7883673B2 · Active · Publication Date: 2026-07-01 · TENCENT AMERICA LLC

Patent Information

Authority / Receiving Office: JP
Patent Type: Patents
Current Assignee / Owner: TENCENT AMERICA LLC
Filing Date: 2023-10-17
Publication Date: 2026-07-01

AI Technical Summary

Technical Problem

The integration of haptic experiences in multimedia presentations faces challenges due to the lack of a concept of a haptic access unit that enables mapping a haptic delivery format to a file format with an ISOBMFF sample structure, leading to cumbersome and computationally inefficient corrections of haptic sample offsets.

Method used

Haptic data is carried in time-referenced Moving Picture Experts Group (MPEG) Immersive Haptics Stream (MIHS) units, which do not overlap in time and each include a start time, allowing efficient mapping to ISOBMFF sample structures.

Benefits of technology

Because each MIHS unit carries its own start time and does not overlap other units, it can be mapped to an ISOBMFF sample without the cumbersome correction of haptic sample offsets, which simplifies haptic data delivery in multimedia presentations.

✦ Generated by Eureka AI based on patent content.


Abstract

A method, an apparatus, and a system for haptic signal processing are provided. The process may include receiving a media stream including at least one haptic track and at least one media track. The process may include obtaining at least one MIHS unit from the media stream and obtaining, from the media stream, a start time associated with each of the at least one MIHS unit. The process may include rendering the media stream based on the start times. Each of the at least one MIHS unit does not overlap in time, and each MIHS unit in the at least one MIHS unit includes a start time for the MIHS unit, and the MIHS unit is associated with one or more haptic channels.

Description

Technical Field

[0001] This disclosure is directed to a series of advanced video coding techniques. More specifically, this disclosure is directed to the encoding and decoding of haptic experiences for multimedia presentations.

Background Art

[0002] Haptic experiences are part of multimedia presentations. In applications where multimedia presentations include aspects of haptic experiences, haptic signals are delivered to a device or wearable, and a user can feel a haptic experience in conjunction with visual and / or audio media experiences during use of the application.

[0003] Recognizing the growing popularity of haptic experiences in multimedia presentations, the Moving Picture Experts Group (MPEG) has begun working on the development of compression standards (both MPEG-DASH and MPEG-I) for haptics, along with the carriage of compressed haptic signals in the ISO base media file format (ISOBMFF).

[0004] One of the problems to be solved when incorporating aspects of haptic experiences within a multimedia presentation is the lack of a concept of a haptic access unit that would enable mapping a haptic delivery format to a file format having an ISOBMFF sample structure. This makes the mapping difficult and often requires cumbersome and computationally inefficient corrections of haptic sample offsets. Therefore, a solution to address this problem is needed.

Summary of the Invention

[0005] According to an embodiment, a method for processing haptic data may include the steps of: receiving a media stream including at least one haptic track and at least one media track; obtaining at least one MIHS unit from the media stream, wherein each of the at least one MIHS unit does not overlap in time, and each MIHS unit within the at least one MIHS unit includes the start time of that MIHS unit, and each MIHS unit is associated with one or more haptic channels; obtaining each start time associated with the at least one MIHS unit from the media stream; and rendering data in at least one haptic track based on each start time.

[0006] In accordance with embodiments, an apparatus for processing haptic data may be provided. The apparatus may include at least one memory configured to store program code, and at least one processor configured to read the program code and operate as directed by the program code. The program code may include a first receive code configured to cause at least one processor to receive a media stream including at least one haptic track and at least one media track, a first acquire code configured to cause at least one processor to acquire at least one MIHS unit from the media stream, each of the at least one MIHS unit not overlapping in time, an MIHS unit within the at least one MIHS unit including the start time of that MIHS unit, and an MIHS unit associated with one or more haptic channels, a second acquire code configured to cause at least one processor to acquire the respective start times associated with at least one MIHS unit from the media stream, and a rendering code configured to cause at least one processor to render the media stream based on the respective start times.

[0007] In accordance with embodiments, a non-transitory computer-readable medium for storing computer instructions may be provided. The instructions may include one or more instructions that, when executed by one or more processors of a device for processing haptic data, cause the one or more processors to perform the steps of: receiving a media stream including at least one haptic track and at least one media track; obtaining at least one MIHS unit from the media stream, each of the at least one MIHS unit not overlapping in time, each MIHS unit in the at least one MIHS unit including the start time of that MIHS unit, and each MIHS unit being associated with one or more haptic channels; obtaining the respective start times associated with the at least one MIHS unit from the media stream; and rendering the media stream based on the respective start times.

[0008] Further features, properties, and various advantages of the disclosed subject matter will become clearer from the following detailed description and accompanying drawings.

Brief Description of the Drawings

[0009]
[Figure 1] A simplified block diagram of a communication system according to an embodiment of the present disclosure.
[Figure 2] A simplified block diagram of a streaming system according to an embodiment of the present disclosure.
[Figure 3] A simplified block diagram of a haptic encoder according to an embodiment of the present disclosure.
[Figure 4] A simplified block diagram of a haptic decoder and a haptic renderer according to embodiments of the present disclosure.
[Figure 5] An illustrative diagram of a timing model for a haptic track according to embodiments of the present disclosure.
[Figure 6] An exemplary flowchart illustrating the process for decoding haptic data according to embodiments of the present disclosure.
[Figure 7] A diagram of a computer system suitable for implementing an embodiment.

Modes for Carrying Out the Invention

[0010] Methods, systems, and non-transitory storage media for parallel processing of dynamic mesh compression are provided in accordance with aspects of this disclosure. Embodiments of this disclosure can also be applied to static meshes.

[0011] With reference to Figures 1 and 2, embodiments of the present disclosure for implementing the encoding and decoding structures of the present disclosure are described.

[0012] Figure 1 shows a simplified block diagram of a communication system 100 according to an embodiment of the present disclosure. The system 100 may include at least two terminals 110, 120 interconnected via a network 150. In the case of one-way data transmission, the first terminal 110 may code video data, which may include mesh data, at its local location for transmission to the other terminal 120 via the network 150. The second terminal 120 may receive the coded video data from the other terminal via the network 150, decode the coded data, and display the recovered video data. One-way data transmission may be common in media serving applications, etc.

[0013] Figure 1 shows a second pair of terminals 130, 140 provided to support bidirectional transmission of coded video, for example, during a video conference. In the case of bidirectional data transmission, each terminal device 130, 140 may code video data captured at its local location for transmission to the other terminal over the network 150. Each terminal 130, 140 may also receive coded video data transmitted by the other terminal, decode the coded video data, and display the recovered video data on a local display device.

[0014] In Figure 1, terminals 110-140 may be represented, for example, as servers, personal computers, and smartphones, and / or any other type of terminal. For example, terminals 110-140 may be laptop computers, tablet computers, media players, and / or dedicated video conferencing equipment. Network 150 corresponds to any number of networks that carry coded video data between terminals 110-140, including, for example, wireline and / or wireless communication networks. Communication network 150 may exchange data over circuit-switched and / or packet-switched channels. Typical networks include telecommunications networks, local area networks, wide area networks, and / or the Internet. For the purposes of this discussion, the architecture and topology of network 150 may be irrelevant to the operation of this disclosure unless described hereafter herein.

[0015] Figure 2 illustrates the arrangement of a video encoder and decoder in a streaming environment as an example of an application of the disclosed subject matter. The disclosed subject matter can be used in other video-enabled applications, including, for example, video conferencing, digital TV, and storage of compressed video on digital media such as CDs, DVDs, and memory sticks.

[0016] As shown in Figure 2, the streaming system 200 may include a capture subsystem 213 which includes a video source 201 and an encoder 203. The streaming system 200 may further include at least one streaming server 205 and / or at least one streaming client 206.

[0017] The video source 201 can generate a stream 202 containing, for example, a 3D mesh and metadata associated with the 3D mesh. The video source 201 may include, for example, a 3D sensor (e.g., a depth sensor) or 3D imaging technology (e.g., a digital camera) and a computing device configured to generate a 3D mesh using data received from the 3D sensor or 3D imaging technology. The stream 202, which may have a higher data volume compared to the encoded video bitstream, can be processed by an encoder 203 coupled to the video source 201. The encoder 203 may include hardware, software, or a combination thereof to enable or implement the aspects of the disclosed subject matter as described in more detail below. The encoder 203 may also generate an encoded video bitstream 204. The encoded video bitstream 204 may have a lower data volume compared to the uncompressed stream 202 and can be stored in the streaming server 205 for future use. One or more streaming clients 206 can access the streaming server 205 to read a video bitstream 209, which may be a copy of the encoded video bitstream 204.

[0018] The streaming client 206 may include a video decoder 210 and a display 212. The video decoder 210 can, for example, decode a video bitstream 209, which is an incoming copy of an encoded video bitstream 204, and generate an outgoing video sample stream 211 that can be rendered on a display 212 or other rendering device (not shown). In some streaming systems, the video bitstreams 204, 209 may be encoded according to a specific video coding / compression standard.

[0019] An embodiment of the present disclosure for implementing the haptic encoder 300 and the haptic decoder 350 will be described with reference to Figures 3 and 4.

[0020] As shown in Figure 3, the haptic encoder 300 can receive both descriptive data and waveform haptic data. Accordingly, the haptic encoder 300 can process three types of input files: .ohm metadata files (Object Haptic Metadata, a text file format for haptic metadata), descriptive haptic files (.ivs, .ahap, and .hjif), and waveform PCM files (.wav). Examples of descriptive data include Apple's .ahap (Apple Haptic and Audio Pattern, a JSON-like file format that represents the expected haptic output by a set of modulated continuous signals and a set of modulated parameterized transient signals), Immersion's .ivs (which represents the expected haptic output by a set of basic effects parameterized by a set of parameters), and the proposed MPEG format .hjif (Haptics JSON Interchange Format). An example of a waveform pulse code modulation (PCM) signal is a .wav file. The .ohm input file contains metadata information.

[0021] According to an embodiment, the haptic encoder 300 can process descriptive content and waveform PCM content differently. In the case of descriptive content, the haptic encoder 300 can semantically analyze the input to transcode the data (if necessary) into the proposed coded representation.

[0022] According to an embodiment, the .ohm metadata input file may include a description of the haptic system and setup. In particular, it may include the name of each associated haptic file (either descriptive or PCM) along with a description of the signal. It also provides a mapping between each channel of the signal and the target body part on the user's body. For the .ohm metadata input file, the haptic encoder 300 reads the associated haptic file from its URI and encodes it based on its type, then extracts the metadata from the .ohm file and maps it to the metadata information of the data model.

[0023] According to an embodiment, descriptive haptics files (e.g., .ivs, .ahap, and .hjif) can be encoded by a simple process. The haptic encoder 300 first identifies the input format. If the input format is a .hjif file, transcoding is not required and the file can be further edited, compressed into a binary format, and finally packetized into a MIHS stream. If an .ahap or .ivs input file is used, transcoding is required. The haptic encoder 300 first semantically analyzes the input file information and transcodes it into the selected data model. After transcoding, the data can be exported as a .hjif file, a .hmpg binary file, or a MIHS stream.

[0024] According to an embodiment, the haptic encoder 300 may perform signal analysis to interpret the signal structure of a .wav file and convert it into the proposed encoded representation. In the case of waveform PCM content, the signal analysis process may be divided by the haptic encoder 300 into two sub-processes. After frequency band decomposition is performed on the signal, the first sub-process encodes the low frequencies by a keyframe extraction process. The low frequency band may then be reconstructed, and the error between this reconstructed signal and the original low frequency signal may be calculated. This residual signal may then be added to the original high frequency band before encoding by wavelet transform, which is the second sub-process. According to an embodiment, when several low frequency bands are used, the residual errors from all the low frequency bands are added to the high frequency band before encoding. In an embodiment where several high frequency bands are used, the residual error from the low frequencies is added to the first high frequency band before encoding.
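The two sub-processes of paragraph [0024] can be illustrated with a minimal Python sketch. It is not the codec's actual algorithm: the moving-average filter used for band decomposition, the uniform keyframe spacing, the linear-interpolation reconstruction, and all function names are assumptions chosen only to show how the low-band residual is folded into the high band before wavelet encoding.

```python
from typing import List, Tuple


def moving_average(signal: List[float], window: int) -> List[float]:
    """Crude low-pass filter (assumption: stands in for the real band split)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def extract_keyframes(band: List[float], step: int) -> List[Tuple[int, float]]:
    """Keep every `step`-th sample, plus the last one, as (index, value) pairs."""
    if not band:
        return []
    indices = list(range(0, len(band), step))
    if indices[-1] != len(band) - 1:
        indices.append(len(band) - 1)
    return [(i, band[i]) for i in indices]


def reconstruct(keyframes: List[Tuple[int, float]], length: int) -> List[float]:
    """Rebuild a band by linear interpolation between consecutive keyframes."""
    out = [0.0] * length
    for i, v in keyframes:
        out[i] = v
    for (i0, v0), (i1, v1) in zip(keyframes, keyframes[1:]):
        for i in range(i0 + 1, i1):
            out[i] = v0 + (i - i0) / (i1 - i0) * (v1 - v0)
    return out


def analyze(signal: List[float], window: int = 5, step: int = 4):
    """Return (keyframes, reconstructed low band, high band plus residual)."""
    low = moving_average(signal, window)
    high = [s - l for s, l in zip(signal, low)]
    keyframes = extract_keyframes(low, step)             # first sub-process
    low_rec = reconstruct(keyframes, len(signal))
    residual = [l - r for l, r in zip(low, low_rec)]     # keyframe coding error
    high_in = [h + e for h, e in zip(high, residual)]    # input to wavelet stage
    return keyframes, low_rec, high_in
```

In this sketch, adding the residual to the high band means that the reconstructed low band plus the wavelet-stage input equals the original signal, up to floating-point rounding, before any lossy quantization is applied.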

[0025] According to an embodiment, keyframe extraction includes obtaining a low frequency band from the frequency band decomposition and analyzing its content in the time domain. According to an embodiment, wavelet processing may include obtaining a high frequency band from the frequency band decomposition together with the low frequency residuals, and dividing it into blocks of equal size. These equal-size signal blocks are then analyzed with a psychohaptic model. Lossy compression may be applied by wavelet-transforming the blocks and quantizing them with the help of the psychohaptic model. Finally, each block is formatted and stored as an individual effect within a single band. Binary compression may then apply lossless compression using appropriate coding techniques, such as the SPIHT (Set Partitioning in Hierarchical Trees) algorithm and arithmetic coding (AC).

[0026] As shown in Figure 3, the haptic encoder 300 may be configured to encode descriptive data and quantized haptic data, and may output three types of formats: an interchange format (.hjif), a binary compressed format (.hmpg), and a streaming format (e.g., MPEG Immersive Haptic Stream (MIHS)). The .hjif format is a human-readable format based on JSON, easily parsable and manually editable, making it an ideal interchange format, especially when designing / creating content. For distribution, .hjif data can be compressed into a more memory-efficient binary .hmpg bitstream. This compression may be lossy, and various parameters affecting the encoding depth of the amplitude and frequency that make up the bitstream are used. For streaming, data can be compressed and packetized into an MPEG-I haptic stream (MIHS). The three formats described above serve complementary purposes, and lossy one-to-one conversions can occur between them.

[0027] As shown in Figure 4, the haptic decoder 350 can take either a .hmpg compressed binary file format or a MIHS bitstream as input. The haptic decoder 350 can output a .hjif interchange format that can be used directly for rendering. Both input formats undergo binary decompression to extract both metadata and the data itself from the file and map it to a selected data structure. The data can then be exported to the haptic renderer 380 in .hjif format.

[0028] As shown in Figure 4, the renderer 380 has a synthesizer. The synthesizer can render haptic data from an .hjif input file to a PCM output file. Rendering and / or synthesizing is informative. According to the embodiment, the synthesizer parses the input file and dispatches the synthesis to the appropriate method (vectorial, wavelet, etc.). The synthesis process is then invoked on the band components of the codec. Finally, all bands of a given channel are mixed by a simple addition operator to reproduce the desired haptic signal.
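The final mixing step of paragraph [0028] amounts to a sample-wise sum. The short Python sketch below assumes that each band has already been synthesized to a PCM sample list of the same length; the function name `mix_bands` is hypothetical.

```python
from typing import List


def mix_bands(bands: List[List[float]]) -> List[float]:
    """Sum equally long per-band sample lists into one channel signal."""
    if not bands:
        return []
    length = len(bands[0])
    if any(len(band) != length for band in bands):
        raise ValueError("all bands must be synthesized to the same length")
    # zip(*bands) yields, for each sample index, the values from every band.
    return [sum(samples) for samples in zip(*bands)]
```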

[0029] According to the embodiment, the haptic experience defines the root of the hierarchical data model. It provides information about the file's date and format version, it describes the haptic experience, it lists the various avatars (i.e., bodily representations) used throughout the experience, and it defines all haptic perceptions.

[0030] Depending on the embodiment, haptic signals may be encoded in multiple channels. In some embodiments, haptic channels may define signals rendered at specific body positions by dedicated actuators / devices. Metadata stored at the channel level may include information such as gain, mixing weights, desired body positions for haptic feedback, and optionally, a reference device and / or orientation associated with that channel. Additional information such as a desired sampling frequency or number of samples may also be provided. Finally, the haptic data of a channel is contained in a set of haptic bands defined by their frequency ranges. A haptic band describes the haptic signals of a channel within a given frequency range. A band is defined by a type and a sequential list of haptic effects, each containing a set of keyframes. For any type of haptic band, the haptic effect may be defined by at least its position (temporal or spatial) and type. Depending on the type of band and the type of effect, further characteristics may be specified, including phase, base signal, configuration, and the number of sequential haptic keyframes describing the effect.

[0031] In accordance with the embodiments, a haptic data hierarchy is defined in this disclosure.

  • Haptic channels
    • Haptic bands
      • Haptic effects
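The hierarchy of paragraphs [0030] and [0031] can be modeled as nested records. The Python sketch below is illustrative only: the class and field names are hypothetical, and the keyframe fields in particular are assumptions, since the text says only that each effect contains a set of keyframes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Keyframe:
    position: int       # assumed: position relative to the effect
    amplitude: float    # assumed: the text does not list keyframe fields


@dataclass
class HapticEffect:
    position: int       # temporal or spatial position of the effect
    effect_type: str
    keyframes: List[Keyframe] = field(default_factory=list)


@dataclass
class HapticBand:
    band_type: str
    lower_hz: float     # frequency range that defines the band
    upper_hz: float
    effects: List[HapticEffect] = field(default_factory=list)


@dataclass
class HapticChannel:
    gain: float
    mixing_weight: float
    body_part: str      # desired body position for the feedback
    bands: List[HapticBand] = field(default_factory=list)


def count_effects(channels: List[HapticChannel]) -> int:
    """Walk channel -> band -> effect and count the leaves."""
    return sum(len(band.effects) for ch in channels for band in ch.bands)
```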

[0032] In some embodiments, a self-contained stream format for carrying MPEG-I haptic data may use a packetization approach and may include two levels of packetization: an MPEG-I haptic stream (MIHS) unit that covers a certain time and contains zero or more MIHS packets, and an MIHS packet containing metadata or haptic effect data. In some embodiments, an MIHS unit may be referred to as a network abstraction layer unit associated with haptic data. In some embodiments, an MIHS unit may be referred to as an MIHS sample associated with haptic data.

[0033] Depending on the embodiment, each MIHS unit may cover a non-overlapping span of haptic presentation time. That is, it can begin at the end of the previous MIHS unit and cover a time defined by its lifetime field. Each MIHS unit may be followed by a next MIHS unit, up to the last MIHS unit of the haptic experience. All MIHS packets of an MIHS unit may share the start time and lifetime of the MIHS unit that contains them.

[0034] Depending on the embodiment, the MIHS unit may be a sync unit or a non-sync unit. A sync unit resets the previous effect and provides a haptic experience independent of the previous MIHS unit. A non-sync unit is a continuation of the previous MIHS unit and cannot be decoded and rendered independently without decoding the previous MIHS unit.

[0035] Embodiments of this disclosure relate to time-referenced elements (also referred to as haptic access units, access units, or MIHS units) orthogonal to the haptic data hierarchy. That is, each time-referenced MIHS unit includes one or more channels, each channel includes one or more bands, and each band includes one or more haptic effects. In this disclosure, time-referenced MIHS units are used synonymously with haptic access units.

[0036] As shown in Figure 5, channel information, band information, and haptic effects are packed into the MIHS unit MIHS1.

[0037] In each embodiment, MIHS units do not overlap in time. In each embodiment, each MIHS unit may have a start time and / or lifetime. The start time and lifetime may be defined in units of a time scale. The time scale may be defined by the number of ticks per second within that time scale. An MIHS unit with only a lifetime starts at the end of the previous access unit. If a start time is set for an MIHS unit and that start time occurs before the lifetime of the previous MIHS unit has ended, the start time overrides the lifetime of the previous MIHS unit.
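The timing rules of paragraph [0037] can be sketched in Python as follows. The `UnitTiming` record and both functions are hypothetical names. The sketch assumes that a unit without an explicit start time begins where the previous unit ends, and that an explicit start time falling inside the previous unit truncates that unit. How a gap between units should be handled is not stated in the text, so the sketch simply leaves the gap in place.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UnitTiming:
    lifetime: int                      # in ticks of the time scale
    start_time: Optional[int] = None   # explicit start in ticks, if signaled


def resolve_timeline(units: List[UnitTiming]) -> List[Tuple[int, int]]:
    """Return non-overlapping (start, end) tick spans, one per unit."""
    spans: List[Tuple[int, int]] = []
    cursor = 0  # end of the previous unit
    for unit in units:
        start = cursor if unit.start_time is None else unit.start_time
        if spans and start < spans[-1][1]:
            prev_start = spans[-1][0]
            if start < prev_start:
                raise ValueError("unit starts before the previous unit began")
            # The explicit start time overrides the previous unit's lifetime.
            spans[-1] = (prev_start, start)
        spans.append((start, start + unit.lifetime))
        cursor = start + unit.lifetime
    return spans


def ticks_to_seconds(ticks: int, timescale: int) -> float:
    """Convert ticks to seconds; `timescale` is the number of ticks per second."""
    return ticks / timescale
```

For example, with a time scale of 1000 ticks per second, a unit that signals start time 1200 while the previous unit would run until tick 1500 shortens that previous unit so that it ends at tick 1200.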

[0038] In some embodiments, the position of a haptic effect of an MIHS unit is defined as an offset from the start time of the MIHS unit carrying the effect. In other or the same embodiments, the MIHS unit may not contain any effects; that is, an empty access unit indicates that no haptic effect starts during the duration of this access unit.

[0039] In accordance with aspects of this disclosure, an MIHS unit may consist of multiple MIHS packets, each packet having a type that defines what information the packet carries. Exemplary MIHS packet types include perceptual information, body part information, device information, channel information, band information, effect information for the effects carried by the MIHS unit, and empty or synchronization information for MIHS packets within a synchronous MIHS unit. A synchronous MIHS unit is a random access point that can reset all previous effects.
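A small Python sketch can show how typed packets and synchronous units might be used for random access. The enum members, class names, and the backward search are all assumptions made for illustration; the actual packet type codes and header layout are not given in this text.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class PacketType(Enum):
    # Hypothetical names for the packet types listed in the paragraph above.
    PERCEPTION = auto()
    BODY_PART = auto()
    DEVICE = auto()
    CHANNEL = auto()
    BAND = auto()
    EFFECT = auto()
    SYNC = auto()


@dataclass
class MihsPacket:
    packet_type: PacketType
    payload: bytes = b""


@dataclass
class MihsUnit:
    is_sync: bool
    packets: List[MihsPacket] = field(default_factory=list)


def packets_of_type(unit: MihsUnit, packet_type: PacketType) -> List[MihsPacket]:
    """Select the packets of one type from a unit."""
    return [p for p in unit.packets if p.packet_type is packet_type]


def random_access_index(units: List[MihsUnit], target: int) -> int:
    """Index of the nearest sync unit at or before `target`.

    Decoding would have to start there, because non-sync units depend on
    the units that precede them.
    """
    for i in range(target, -1, -1):
        if units[i].is_sync:
            return i
    raise ValueError("no sync unit at or before the target unit")
```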

[0040] The advantage of using MIHS units in haptic elementary streams is that mapping the haptic stream to ISOBMFF or any other time-sample-based file format becomes efficient and easy. In some embodiments, each MIHS unit may be mapped to an ISOBMFF sample in the corresponding audio / video stream.

[0041] Accordingly, in embodiments, a method for defining time-referenced access units for haptic signals may be provided. In the method, information may be packaged in one or more MIHS units. Each MIHS unit may cover a certain time and may have a start time and / or lifetime in units of a time scale, and all haptic aspects such as perception, device, channel, band, and haptic effects may be packaged in the MIHS unit. Metadata or binary aspects of the information may be included in the MIHS unit or MIHS packet. The anchor point of the start time of any haptic effect within the MIHS unit is the start time of the MIHS unit, and the start time of any effect does not exceed the lifetime of the MIHS unit carrying this effect. The type of MIHS unit may also be specified by a field in the MIHS unit header. Some MIHS units are defined as synchronous access units, which reset all previous effects, thus providing random access points / synchronization points in the stream, and are mapped to ISOBMFF sync samples.
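The mapping to ISOBMFF samples described in paragraph [0041] can be sketched as building two in-memory tables: run-length (sample count, sample duration) entries in the style of the ISOBMFF time-to-sample box (`stts`), and 1-based sync sample numbers in the style of the sync sample box (`stss`). The Python sketch below makes several assumptions: the `MihsUnit` record and function name are hypothetical, units are taken to be contiguous, and no box is actually serialized.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MihsUnit:
    start_time: int                    # ticks
    lifetime: int                      # ticks
    is_sync: bool
    effect_offsets: List[int] = field(default_factory=list)


def build_sample_tables(units: List[MihsUnit]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Map one MIHS unit to one sample; return (stts-like, stss-like) tables."""
    stts: List[Tuple[int, int]] = []   # (sample_count, sample_delta) runs
    stss: List[int] = []               # 1-based numbers of sync samples
    expected_start = units[0].start_time if units else 0
    for number, unit in enumerate(units, start=1):
        if unit.start_time != expected_start:
            raise ValueError("units must be contiguous and non-overlapping")
        # An effect may not start before its unit or after the unit's lifetime.
        if any(off < 0 or off > unit.lifetime for off in unit.effect_offsets):
            raise ValueError("effect offset outside the unit's lifetime")
        if stts and stts[-1][1] == unit.lifetime:
            stts[-1] = (stts[-1][0] + 1, unit.lifetime)  # extend the current run
        else:
            stts.append((1, unit.lifetime))
        if unit.is_sync:
            stss.append(number)
        expected_start = unit.start_time + unit.lifetime
    return stts, stss
```

Because effect offsets are already relative to the unit's start time, the sample payloads need no rewriting in this mapping; only the timing and sync tables are derived.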

[0042] For example, the start of an experience may be defined as a common anchor point for all effects in the stream. For instance, the first effect might have position 0, and the positions of all other effects may be defined relative to the position of the first effect. Then, in the case of ISOBMFF transmission of a haptic channel, the effect position should be relative to the start time of the sample carrying that effect. Then, when the haptic channel is carried in ISOBMFF, the position of its effect needs to be adjusted. Similarly, after parsing the ISOBMFF, the position of the effect needs to be readjusted by adding the start time of the sample before sending it to the haptic decoder.
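The two adjustments described in paragraph [0042] can be made concrete with a short Python sketch. It assumes effect positions and sample start times are integer ticks on a shared time scale and that each effect belongs to the sample whose start time is the latest one not after the effect's position; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List


def pack_effects(positions: List[int], sample_starts: List[int]) -> List[List[int]]:
    """Rebase experience-anchored positions to sample-relative offsets.

    `sample_starts` must be sorted in ascending order.
    """
    per_sample: List[List[int]] = [[] for _ in sample_starts]
    for pos in sorted(positions):
        i = bisect_right(sample_starts, pos) - 1   # last sample starting <= pos
        if i < 0:
            raise ValueError("effect starts before the first sample")
        per_sample[i].append(pos - sample_starts[i])
    return per_sample


def unpack_effects(per_sample: List[List[int]], sample_starts: List[int]) -> List[int]:
    """Undo the rebasing after parsing: add each sample's start time back."""
    return [start + off
            for start, offsets in zip(sample_starts, per_sample)
            for off in offsets]
```

This round trip is the per-effect correction that stream formats anchored to the start of the experience would require on every write and read.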

[0043] In other or the same example, two types of ISOBMFF tracks may be provided: the first is a track in which the start time of the experience is the anchor for the effects' positions, and the second is a track in which the start time of the sample is the anchor for the effects' positions.

[0044] In other or the same example, a sample structure within a haptic elementary stream may be defined. In this case, a haptic channel may consist of one or more samples / frames, and the timing of each effect within each sample / frame is relative to that sample.

[0045] As an example, a method for decoding MIHS units according to one embodiment may include the steps of: receiving a media stream including at least one haptic track and at least one media track; obtaining at least one MIHS unit from the media stream, wherein none of the at least one MIHS unit overlaps with any other MIHS unit in time, the MIHS unit includes the start time of the MIHS unit, and the MIHS unit is associated with one or more haptic channels; obtaining the respective start times associated with the at least one MIHS unit from the media stream; and decoding the media stream based on the respective start times.

[0046] Figure 6 shows process 600 for decoding time-referenced MIHS units.

[0047] In operation 605, a media stream may be received, which includes at least one haptic track and at least one media track.

[0048] In operation 610, at least one MIHS unit is obtained from the media stream. In embodiments, each of the at least one MIHS unit does not overlap in time. In embodiments, an MIHS unit within the at least one MIHS unit may include the start time of that MIHS unit. In embodiments, an MIHS unit may be associated with one or more haptic channels.

[0049] In some embodiments, the MIHS unit may further include one or more haptic effects, each of which includes an offset indicating the start time of the haptic effect relative to the start time of the MIHS unit. In some embodiments, the offset is shorter than or equal to the lifetime of the MIHS unit.

[0050] In some embodiments, when an MIHS unit does not contain a haptic effect, its lifetime indicates the length of time during which no haptic effect is present in the media stream. In some embodiments, a synchronization MIHS unit from at least one haptic track is mapped to an ISO base media file format (ISOBMFF) synchronization sample from at least one media track.

[0051] According to the embodiment, an MIHS unit may include one or more MIHS packets, each containing a type parameter that defines what information the MIHS packet carries. The type parameter may be one of the following: perceptual information type, body part information type, device information type, channel information type, band information type, and effect information type.

[0052] In operation 615, the respective start time associated with each of the at least one MIHS unit can be obtained. In embodiments, the MIHS unit may further include a lifetime, and the start time and the lifetime of the MIHS unit may be defined in units of a time scale. The time scale may be defined by the number of ticks per second within the time scale.

[0053] In operation 620, media streams may be rendered and / or displayed based on their respective start times.
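Operations 605 to 620 of process 600 can be strung together in a short Python sketch. Everything here is illustrative: the `MediaStream`, `MihsUnit`, and `Effect` records are hypothetical stand-ins for parsed data, only the first haptic track is used, and "rendering" is reduced to producing a list of (time in seconds, effect name) events.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Effect:
    offset: int          # ticks relative to the start time of the carrying unit
    name: str


@dataclass
class MihsUnit:
    start_time: int      # ticks
    lifetime: int        # ticks
    effects: List[Effect] = field(default_factory=list)


@dataclass
class MediaStream:
    timescale: int                       # ticks per second
    haptic_tracks: List[List[MihsUnit]]
    media_tracks: List[str]


def receive(stream: MediaStream) -> MediaStream:                 # operation 605
    if not stream.haptic_tracks or not stream.media_tracks:
        raise ValueError("need at least one haptic track and one media track")
    return stream


def obtain_units(stream: MediaStream) -> List[MihsUnit]:         # operation 610
    units = sorted(stream.haptic_tracks[0], key=lambda u: u.start_time)
    for prev, cur in zip(units, units[1:]):
        if cur.start_time < prev.start_time + prev.lifetime:
            raise ValueError("MIHS units must not overlap in time")
    return units


def obtain_start_times(units: List[MihsUnit]) -> List[int]:      # operation 615
    return [u.start_time for u in units]


def render(stream: MediaStream, units: List[MihsUnit],
           start_times: List[int]) -> List[Tuple[float, str]]:   # operation 620
    events = []
    for unit, start in zip(units, start_times):
        for effect in unit.effects:   # an empty unit contributes no events
            events.append(((start + effect.offset) / stream.timescale, effect.name))
    return events
```

A caller would chain the four functions in order, for example `render(s, u, obtain_start_times(u))` with `s = receive(stream)` and `u = obtain_units(s)`.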

[0054] In the embodiment, the MIHS unit may be signaled with higher-level syntax. The MIHS unit may further include one or more haptic bands, each containing one or more haptic effects.

[0055] Those skilled in the art will understand that the techniques described herein may be implemented on both the encoder and decoder sides. These techniques can be implemented as computer software using computer-readable instructions and physically stored on one or more computer-readable media. For example, Figure 7 shows a computer system 700 suitable for implementing a particular embodiment of the present disclosure.

[0056] Computer software can be coded in any suitable machine code or computer language that can follow mechanisms such as assembly, compilation, and linking to generate code that includes instructions that can be executed directly or through interpretation, microcode execution, etc., by a central processing unit (CPU), graphics processing unit (GPU), etc.

[0057] The instructions can be executed on various types of computers or their components, including, for example, personal computers, tablet computers, servers, smartphones, game consoles, and Internet of Things devices.

[0058] The components shown in Figure 7 with respect to the computer system 700 are illustrative and are not intended to imply any limitation on the scope of use or functionality of computer software implementing embodiments of the present disclosure. The configuration of the components should not be construed as having any dependence or requirement on any one or combination of components described in non-limiting embodiments of the computer system 700.

[0059] The computer system 700 may include certain human interface input devices. Such human interface input devices may respond to input from one or more users through, for example, tactile input (e.g., keystrokes, swipes, data glove movements), audio input (e.g., voice, clapping), visual input (e.g., gestures), or olfactory input (not shown). Human interface devices may also be used to capture certain media that are not necessarily directly related to conscious human input, such as sound (e.g., speech, music, ambient sounds), images (e.g., scanned images, photographic images taken from a still camera), or video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).

[0060] The input human interface device may include one or more of the following: keyboard 701, mouse 702, trackpad 703, touchscreen 710, data glove, joystick 705, microphone 706, scanner 707, and camera 708 (only one of each is shown).

[0061] The computer system 700 may also include certain human interface output devices. Such human interface output devices can stimulate the senses of one or more users through, for example, tactile output, sound, light, and smell / taste. Such human interface output devices may include tactile output devices (e.g., tactile feedback via touchscreen 710, a data glove, or joystick 705, although there may also be tactile feedback devices that do not function as input devices), audio output devices (e.g., speaker 709, headphones (not shown)), visual output devices (e.g., screens 710, including CRT screens, LCD screens, plasma screens, and OLED screens, each with or without touchscreen input functionality and each with or without tactile feedback functionality, some of which may be capable of two-dimensional visual output or of output in more than three dimensions by means such as stereoscopic output; virtual reality glasses (not shown); holographic displays; and smoke tanks (not shown)), and printers (not shown).

[0062] The computer system 700 may also include human-accessible storage devices and their associated media, such as CD / DVD ROM / RW 720 including CD / DVD or similar media 721, a thumb drive 722, a removable hard disk or solid-state drive 723, legacy magnetic media, such as tape and floppy disks (not shown), dedicated ROM / ASIC / PLD-based devices, such as security dongles (not shown), and the like.

[0063] Those skilled in the art will understand that the term “computer-readable medium” as used in relation to the subject matter currently disclosed does not include transmission media, carrier waves, or other transient signals.

[0064] The computer system 700 may also include interfaces to one or more communication networks. These networks can be, for example, wireless, wireline, or optical. They can also be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, etc. Examples of networks include local area networks such as Ethernet® and wireless LANs, cellular networks including GSM, 3G, 4G, 5G, LTE, etc., TV wireline or wireless wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV, and vehicular and industrial networks including CAN bus. Certain networks generally require an external network interface adapter attached to a specific general-purpose data port or peripheral bus 749 (e.g., a USB port on the computer system 700). Others are generally integrated into the core of the computer system 700 by attachment to a system bus as described below (e.g., an Ethernet interface in a PC computer system, or a cellular network interface in a smartphone computer system). Using any of these networks, the computer system 700 can communicate with other entities. Such communications can be unidirectional and receive-only (e.g., broadcast TV), unidirectional and transmit-only (e.g., CAN bus to a specific CAN bus device), or bidirectional, for example, to other computer systems using local or wide-area digital networks. Such communications may include communications to a cloud computing environment 755. Specific protocols and protocol stacks can be used on each of the aforementioned networks and network interfaces.

[0065] The above-mentioned human interface devices, human-accessible storage devices, and network interface 754 may be attached to the core 740 of the computer system 700.

[0066] The core 740 may include one or more central processing units (CPUs) 741, graphics processing units (GPUs) 742, dedicated programmable processing units in the form of field-programmable gate arrays (FPGAs) 743, hardware accelerators 744 for specific tasks, etc. These devices may be connected via a system bus 748, along with read-only memory (ROM) 745, random access memory (RAM) 746, and internal mass storage 747, such as internal hard drives and SSDs that are not accessible to the user. In some computer systems, the system bus 748 may be accessible in the form of one or more physical plugs to allow expansion with additional CPUs, GPUs, etc. Peripherals may be attached to the core's system bus 748 directly or via a peripheral bus 749. Architectures for peripheral buses include PCI, USB, etc. A graphics adapter 750 may be included in the core 740.

[0067] The CPU 741, GPU 742, FPGA 743, and accelerator 744 are capable of executing specific instructions that can be combined to form the computer code described above. This computer code can be stored in ROM 745 or RAM 746. Temporary data can also be stored in RAM 746, while persistent data can be stored, for example, in the built-in mass storage device 747. High-speed storage and retrieval to any of the memory devices can be enabled by using cache memory. Cache memory may be closely associated with one or more CPUs 741, GPUs 742, mass storage devices 747, ROM 745, RAM 746, etc.

[0068] Computer-readable media may contain computer code for performing various computer implementation operations. The media and computer code may be specifically designed and configured for the purposes of this disclosure, or they may be of a type that is well known and available to those skilled in the art in computer software technology.

[0069] By way of example, and not as a limitation, a computer system having architecture 700, and specifically the core 740, can provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, etc.) executing software embodied in one or more tangible computer-readable media. Such computer-readable media can be media associated with the user-accessible mass storage devices introduced earlier, as well as specific storage of the core 740 that is of a non-transitory nature, such as the core-internal mass storage device 747 or ROM 745. Software implementing various embodiments of the present disclosure can be stored in such devices and executed by the core 740. The computer-readable media may include one or more memory devices or chips, depending on the specific needs. The software can cause the core 740, and specifically the processors within it (including CPUs, GPUs, FPGAs, etc.), to execute specific processes or specific parts of specific processes described herein, including defining data structures stored in RAM 746 and modifying such data structures according to processes defined by the software. Additionally, or alternatively, a computer system may provide functionality as a result of logic hardwired or otherwise embodied in circuitry (e.g., accelerator 744), which can operate in place of or together with software to perform specific processes or specific parts of specific processes described herein. References to software may, where appropriate, include logic, and vice versa. References to computer-readable media may, where appropriate, include circuitry storing software for execution (e.g., integrated circuits (ICs)), circuitry embodying logic for execution, or both. This disclosure encompasses any suitable combination of hardware and software.

[0070] While this disclosure has described several exemplary embodiments, alternatives, substitutions, and various substitute equivalents exist and are included within the scope of this disclosure. Therefore, it will be apparent to those skilled in the art that numerous systems and methods embodying the principles of this disclosure, and thus falling within its spirit and scope, can be conceived, even if not explicitly illustrated or described herein.

[0071] [Cross-references to related applications] This application claims priority to U.S. Provisional Patent Application No. 63 / 417184, filed on 18 October 2022, and U.S. Patent Application No. 18 / 487870, filed on 16 October 2023. The disclosures of these U.S. applications are incorporated herein by reference in their entirety.

Claims

1. A method for decoding a Moving Picture Experts Group (MPEG) Immersive Haptics Stream (MIHS) unit, the method being performed by at least one processor and comprising: receiving a media stream that includes at least one haptic track and at least one media track; obtaining at least one MIHS unit from the media stream, wherein the media stream comprises a plurality of MIHS units, each of the plurality of MIHS units does not temporally overlap with other MIHS units in the media stream, each MIHS unit includes its start time, a current MIHS unit overrides the lifespan of a previous MIHS unit if the start time of the current MIHS unit occurs before the end of the lifespan of the previous MIHS unit, such that the start time of the current MIHS unit begins after the lifespan of the previous MIHS unit, and the MIHS unit is associated with one or more haptic channels; obtaining, from the media stream, the respective start time associated with each of the at least one MIHS unit; and decoding the media stream based on each of the start times.
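The override rule recited in claim 1 can be illustrated with a minimal sketch. It assumes that "overriding the lifespan" means shortening the previous unit so that it ends where the current unit starts; that reading, the `MihsUnit` class, its field names, and `resolve_overlaps` are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class MihsUnit:
    """Hypothetical in-memory view of an MIHS unit (times in time-scale units)."""
    start: int                      # start time carried by the unit
    duration: int                   # lifespan of the unit
    channels: Tuple[str, ...] = ()  # associated haptic channels


def resolve_overlaps(units: List[MihsUnit]) -> List[MihsUnit]:
    """Return units ordered by start time with no temporal overlap.

    Assumption: when a unit starts before the previous unit's lifespan ends,
    the previous lifespan is cut short at the new unit's start time.
    """
    resolved: List[MihsUnit] = []
    for unit in sorted(units, key=lambda u: u.start):
        if resolved:
            prev = resolved[-1]
            if unit.start < prev.start + prev.duration:
                # The current unit overrides the rest of the previous lifespan.
                resolved[-1] = replace(prev, duration=unit.start - prev.start)
        resolved.append(unit)
    return resolved


stream = [MihsUnit(0, 100), MihsUnit(80, 50), MihsUnit(200, 10)]
timeline = resolve_overlaps(stream)
```

In this sketch the first unit's lifespan is shortened from 100 to 80 so that the second unit begins only after the first has ended; the third unit is untouched because it starts after the second unit's lifespan.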

2. The method according to claim 1, wherein the start time and the duration of the MIHS unit are defined in units of a time scale, and the time scale is defined by a number of ticks per second.
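As a worked illustration of claim 2, the sketch below converts a start time or duration expressed in time-scale units into seconds. It assumes, as in ISOBMFF-style timing, that the time scale counts time units (ticks) per second; the function name `ticks_to_seconds` is hypothetical.

```python
from fractions import Fraction


def ticks_to_seconds(ticks: int, timescale: int) -> Fraction:
    """Convert a value in time-scale units to exact seconds.

    Assumption: `timescale` is the number of time units (ticks) per second.
    """
    if timescale <= 0:
        raise ValueError("timescale must be a positive number of ticks per second")
    return Fraction(ticks, timescale)


# A unit starting at tick 1500 on a 1000-ticks-per-second time scale
# starts 1.5 seconds into the stream.
start_seconds = ticks_to_seconds(1500, 1000)
```

Using `Fraction` keeps the result exact, which avoids accumulating rounding error when many start times and durations are added together.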

3. The method according to claim 1, wherein the MIHS unit further includes one or more haptic effects, and a haptic effect among the one or more haptic effects includes an offset indicating the start time of the haptic effect relative to the start time of the MIHS unit.

4. The method according to claim 3, wherein the offset is less than or equal to the lifespan of the MIHS unit.
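The relationship between a unit's start time and an effect's offset in claims 3 and 4 can be sketched as follows. The function name and the additional requirement that the offset be non-negative are assumptions made for illustration; the claims only state that the offset is relative to the unit's start time and does not exceed the unit's lifespan.

```python
def effect_absolute_start(unit_start: int, unit_lifespan: int, offset: int) -> int:
    """Return the absolute start time of a haptic effect, in time-scale units.

    Assumptions: all values share one time scale, and the offset is
    non-negative. The upper bound (offset <= lifespan) follows claim 4.
    """
    if not 0 <= offset <= unit_lifespan:
        raise ValueError("offset must lie within the lifespan of the MIHS unit")
    return unit_start + offset


# An effect with offset 50 inside a unit that starts at tick 1000.
effect_start = effect_absolute_start(unit_start=1000, unit_lifespan=200, offset=50)
```

Because each effect is positioned relative to its own unit, shifting a unit in time only requires changing the unit's start time; the effect offsets stay unchanged.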

5. The method according to claim 1, wherein the MIHS unit does not include haptic effects, and, in that case, the duration of the MIHS unit indicates the length of time in the media stream during which haptic effects are absent.
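A minimal sketch of how a decoder might read claim 5: a unit that carries no haptic effects marks a stretch of the stream without haptic output, and its duration gives the length of that stretch. The `Unit` tuple and `silent_intervals` function are hypothetical names used only for this illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Unit(NamedTuple):
    """Hypothetical MIHS unit view (times in time-scale units)."""
    start: int
    duration: int
    effects: Tuple[str, ...]  # empty tuple means the unit carries no effects


def silent_intervals(units: Sequence[Unit]) -> List[Tuple[int, int]]:
    """Return (start, end) intervals covered by units without haptic effects."""
    return [(u.start, u.start + u.duration) for u in units if not u.effects]


units = [
    Unit(0, 100, ("rumble",)),
    Unit(100, 50, ()),          # no effects: 50 ticks without haptic output
    Unit(150, 30, ("click",)),
]
gaps = silent_intervals(units)
```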

6. The method according to claim 1, wherein the MIHS unit includes one or more MIHS packets, each MIHS packet including a type parameter that defines what information the MIHS packet carries.

7. The method according to claim 6, wherein the type parameter is one of: a perceptual information type, a body part information type, a device information type, a channel information type, a bandwidth information type, and an effect-equipped type.
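The packet typing described in claims 6 and 7 can be pictured as an enumeration plus a small grouping step. This is only a sketch: the enumeration member names paraphrase the six types listed in claim 7, the numeric values are arbitrary (the actual coded values are not given in this text), and `MihsPacket` and `group_by_type` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Iterable, List


class MihsPacketType(Enum):
    """The six packet types of claim 7 (values are placeholders, not wire codes)."""
    PERCEPTUAL_INFO = auto()
    BODY_PART_INFO = auto()
    DEVICE_INFO = auto()
    CHANNEL_INFO = auto()
    BANDWIDTH_INFO = auto()
    EFFECT = auto()


@dataclass
class MihsPacket:
    type: MihsPacketType   # type parameter: what the packet carries
    payload: bytes = b""   # opaque payload, not parsed in this sketch


def group_by_type(packets: Iterable[MihsPacket]) -> Dict[MihsPacketType, List[MihsPacket]]:
    """Group the packets of one MIHS unit by their type parameter."""
    grouped: Dict[MihsPacketType, List[MihsPacket]] = defaultdict(list)
    for packet in packets:
        grouped[packet.type].append(packet)
    return dict(grouped)


unit_packets = [
    MihsPacket(MihsPacketType.EFFECT, b"\x01"),
    MihsPacket(MihsPacketType.DEVICE_INFO),
    MihsPacket(MihsPacketType.EFFECT, b"\x02"),
]
by_type = group_by_type(unit_packets)
```

Grouping by type lets a decoder handle metadata packets (perceptual, body part, device, channel, and bandwidth information) before it processes the packets that carry effects.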

8. The method according to claim 1, wherein the MIHS unit is signaled using higher-level syntax.

9. The method according to claim 1, wherein the MIHS unit further includes one or more haptic bands, each containing one or more haptic effects.

10. The method according to claim 1, wherein a synchronization MIHS unit from the at least one haptic track is mapped to an ISO base media file format (ISOBMFF) synchronization sample from the at least one media track.
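One way to picture the mapping in claim 10 is to list which samples would be marked as synchronization samples when MIHS units are stored in a file. The sketch assumes one MIHS unit per sample, which is an assumption of this example rather than something stated in the claim, and uses 1-based sample numbers as in the ISOBMFF sync sample table; the function name is hypothetical.

```python
from typing import List, Sequence


def sync_sample_numbers(unit_is_sync: Sequence[bool]) -> List[int]:
    """Return 1-based sample numbers of units flagged as synchronization units.

    Assumption: the i-th MIHS unit is stored as the i-th sample of the track.
    """
    return [number for number, is_sync in enumerate(unit_is_sync, start=1) if is_sync]


# Units 1 and 4 are synchronization units; units 2 and 3 are not.
sync_numbers = sync_sample_numbers([True, False, False, True])
```

A player seeking into the stream could then start decoding haptics at the nearest listed sample, in the same way it would for a sync sample of a media track.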

11. A device for decoding a Moving Picture Experts Group (MPEG) Immersive Haptics Stream (MIHS) unit, the device comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, wherein the program code, when executed by the at least one processor, causes the at least one processor to perform the method according to any one of claims 1 to 10.

12. A computer program comprising instructions that, when executed by one or more processors of a device for decoding a Moving Picture Experts Group (MPEG) Immersive Haptics Stream (MIHS) unit, cause the one or more processors to perform the method according to any one of claims 1 to 10.