ISOBMFF tactile track with tactile effect sample anchoring
The method and apparatus address the timing model issue for haptic tracks in multimedia presentations by encoding and decoding haptic data with MPEG Immersive Haptic Stream units, ensuring synchronized haptic effects with other media tracks.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- TENCENT AMERICA LLC
- Filing Date
- 2023-10-17
- Publication Date
- 2026-07-01
AI Technical Summary
The lack of a clear timing model for the transmission of haptic tracks in multimedia presentations, specifically how the timing of ISOBMFF tracks relates to the timing of underlying haptic signals, is unresolved.
A method and apparatus for encoding and decoding haptic data, including receiving a media stream with haptic and video tracks, obtaining MPEG Immersive Haptic Stream (MIHS) units with start times, obtaining timing information for the haptic effects, and rendering the stream based on the obtained timing information, along with a non-transitory computer-readable medium storing instructions to perform these steps.
Provides a clear timing model for haptic effects, enabling efficient synchronization with other media tracks and enhancing the manipulation and processing of media streams.
Abstract
Description
Technical Field
[0001] [Cross-Reference to Related Applications] This application claims priority based on U.S. Provisional Patent Application No. 63/416,780, filed on October 17, 2022, and U.S. Patent Application No. 18/487,688, filed on October 16, 2023, the disclosures of which are hereby incorporated herein by reference in their entirety.
[0002] [Technical Field] This disclosure relates to advanced video coding techniques. More specifically, this disclosure relates to the encoding and decoding of haptic experiences for multimedia presentations.
Background Art
[0003] Haptic experiences are already part of multimedia presentations. In applications where a multimedia presentation includes haptic aspects, haptic signals are transmitted to a device or wearable device, allowing the user to feel haptic sensations that complement the visual and/or audio experience while using the application.
[0004] Recognizing that haptic experiences are becoming increasingly popular in multimedia presentations, the Moving Picture Experts Group (MPEG) has begun considering compression standards for haptics (for both MPEG-DASH and MPEG-I), as well as signaling for the transport of compressed haptics in the ISO (International Organization for Standardization) Base Media File Format (ISOBMFF).
Summary of the Invention
Problems to be Solved by the Invention
[0005] One of the issues that needs to be addressed regarding the haptic aspect of multimedia presentations is the lack of a clear timing model for the transport of haptic tracks; specifically, it is unclear how the timing of ISOBMFF tracks relates to the timing of the underlying haptic signals. A solution to this problem is needed.
[Means for Solving the Problem]
[0006] According to an embodiment, a method for encoding or decoding haptic data is provided. The method may be performed by at least one processor and may include: receiving a media stream including at least one haptic track and at least one video track; obtaining one or more Moving Picture Experts Group (MPEG) Immersive Haptic Stream (MIHS) units from the media stream, wherein each MIHS unit includes one or more haptic effects and a start time; obtaining, from the media stream, timing information relating to the one or more haptic effects, wherein the timing information includes the temporal position of at least one of the haptic effects; and rendering the media stream based on the obtained timing information.
[0007] According to an embodiment, an apparatus for haptic processing may be provided. The apparatus includes at least one memory configured to store program code and at least one processor configured to read the program code and operate as instructed by the program code. The program code may include: first receiving code configured to cause the at least one processor to receive a media stream including at least one haptic track and at least one video track; first obtaining code configured to cause the at least one processor to obtain one or more Moving Picture Experts Group (MPEG) Immersive Haptic Stream (MIHS) units from the media stream, wherein each MIHS unit includes one or more haptic effects and a start time; second obtaining code configured to cause the at least one processor to obtain, from the media stream, timing information related to the one or more haptic effects, wherein the timing information includes the temporal position of at least one of the haptic effects; and rendering code configured to cause the at least one processor to render the media stream based on the obtained timing information.
[0008] According to an embodiment, a non-transitory computer-readable medium storing computer instructions may be provided. The instructions may include one or more instructions that, when executed by one or more processors of a device for haptic processing, cause the one or more processors to: receive a media stream including at least one haptic track and at least one video track; obtain one or more Moving Picture Experts Group (MPEG) Immersive Haptic Stream (MIHS) units from the media stream, wherein the MIHS units include one or more haptic effects and a start time; obtain, from the media stream, timing information related to the one or more haptic effects, wherein the timing information includes the temporal position of at least one of the haptic effects; and render the media stream based on the obtained timing information.
[0009] Further features, properties, and advantages of the disclosed subject matter will become clearer from the following detailed description and the accompanying drawings.
[Brief Description of the Drawings]
[0010] [Figure 1] A simplified block diagram of a communication system according to an embodiment of the present disclosure.
[Figure 2] A simplified block diagram of a streaming system according to an embodiment of the present disclosure.
[Figure 3A] A simplified block diagram of a haptic encoder according to an embodiment of the present disclosure.
[Figure 3B] A simplified block diagram of a haptic decoder and a haptic renderer according to an embodiment of the present disclosure.
[Figure 4] An illustrative diagram of a process for determining the relative timing of MPEG Immersive Haptic Stream (MIHS) units according to an embodiment of the present disclosure.
[Figure 5] An illustrative diagram of a process for determining the relative timing of MIHS units according to an embodiment of the present disclosure.
[Figure 6] An exemplary flowchart of a process for processing haptic media according to an embodiment of the present disclosure.
[Figure 7] A diagram of a computer system suitable for implementing an embodiment.
[Modes for Carrying Out the Invention]
[0011] One aspect of this disclosure provides a method, system, and non-transitory storage medium for parallel processing of dynamic mesh compression. Embodiments of this disclosure can also be applied to static meshes.
[0012] An embodiment of the present disclosure for realizing the encoding and decoding configurations of the present disclosure will be described with reference to Figures 1 and 2.
[0013] Figure 1 shows a simplified block diagram of a communication system 100 according to an embodiment of the present disclosure. The system 100 may include at least two terminals 110, 120 interconnected via a network 150. In one-way data transmission, the first terminal 110 can encode video data captured at its local location, which may include mesh data, and transmit it to the other terminal 120 via the network 150. The second terminal 120 can receive the encoded video data from the other terminal via the network 150, decode the encoded data, and display the recovered video data. One-way data transmission is common in media serving applications and the like.
[0014] Figure 1 shows a second set of terminals 130, 140, which are provided to support the bidirectional transmission of encoded video that may occur, for example, during a video conference. In the case of bidirectional data transmission, each terminal 130, 140 can encode video data captured at its local location and transmit it to another terminal via the network 150. Each terminal 130, 140 may receive encoded video data transmitted by another terminal, decode the encoded data, and further display the restored video data on a local display device.
[0015] In Figure 1, terminals 110-140 may be, for example, servers, personal computers, smartphones, and/or any other type of terminal. For example, terminals 110-140 may be laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. Network 150 represents any number of networks that convey encoded video data among terminals 110-140, including, for example, wired and/or wireless communication networks. Communication network 150 can exchange data over circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the Internet. For the purposes of this discussion, the architecture and topology of network 150 may be immaterial to the operation of this disclosure unless explained below.
[0016] As an example of an application of the disclosed subject matter, Figure 2 shows the arrangement of a video encoder and decoder in a streaming environment. The disclosed subject matter is equally applicable to other video-enabled applications, including, for example, video conferencing, digital TV, and the storage of compressed video on digital media such as CDs (Compact Discs), DVDs (Digital Versatile Discs), and Memory Sticks.
[0017] As shown in Figure 2, the streaming system 200 includes a capture subsystem 213 which includes a video source 201 and an encoder 203. The streaming system 200 may further include at least one streaming server 205 and / or at least one streaming client 206.
[0018] The video source 201 can create a stream 202 containing, for example, a three-dimensional (3D) mesh and metadata associated with the 3D mesh. The video source 201 may include, for example, a 3D sensor (e.g., a depth sensor) or 3D imaging technology (e.g., a digital camera), and a computing device configured to generate a 3D mesh using data received from the 3D sensor or 3D imaging technology. The sample stream 202, which has a higher data volume than the encoded video bitstream, can be processed by an encoder 203 coupled to the video source 201. The encoder 203 may include hardware, software, or a combination thereof to enable or implement aspects of the disclosed subject matter, as described in detail below. The encoder 203 can generate an encoded video bitstream 204. The encoded video bitstream 204, which has a lower data volume than the uncompressed stream 202, can be stored on the streaming server 205 for future use. One or more streaming clients 206 can access the streaming server 205 to retrieve a video bitstream 209, which may be a copy of the encoded video bitstream 204.
[0019] The streaming client 206 can include a video decoder 210 and a display 212. The video decoder 210 can decode a video bitstream 209, which is, for example, an incoming copy of the encoded video bitstream 204, and generate an outgoing video sample stream 211 that can be rendered on the display 212 or another rendering device (not shown). In some streaming systems, the video bitstreams 204, 209 can be encoded according to certain video coding/compression standards.
[0020] Embodiments of the present disclosure implementing a haptic encoder 300 and a haptic decoder 350 will be described with reference to Figures 3A and 3B.
[0021] As shown in Figure 3A, the haptic encoder 300 can receive both descriptive haptic data and waveform haptic data. Accordingly, the haptic encoder 300 can process three types of input files: .ohm metadata files (object haptic metadata, a text file format used for haptic metadata), descriptive haptic files (.ivs, .ahap, and .hjif), and waveform PCM (Pulse Code Modulation) files (.wav). Examples of descriptive data include .ahap from Apple (a JSON (JavaScript Object Notation)-like file format that specifies haptic and audio patterns, representing the expected haptic output as a set of parameterized, modulated continuous signals and a set of modulated transients), .ivs from Immersion (representing the expected haptic output as a set of basic effects, each parameterized by a set of parameters), and the proposed MPEG format .hjif (Haptic JSON Interchange Format). Waveform PCM signals are provided as .wav files, which may be accompanied by .ohm input files that contain metadata information.
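As a minimal illustration of the three input categories listed above, the following Python sketch routes an input file by its extension. The function name `classify_input` and the table `INPUT_CATEGORIES` are hypothetical; a real encoder would likely inspect file contents rather than trust the extension.

```python
from pathlib import PurePath

# Extension -> input category, following the three input types listed above.
INPUT_CATEGORIES = {
    ".ohm": "metadata",      # object haptic metadata (text)
    ".ivs": "descriptive",   # Immersion
    ".ahap": "descriptive",  # Apple
    ".hjif": "descriptive",  # proposed MPEG interchange format
    ".wav": "waveform",      # PCM
}


def classify_input(filename: str) -> str:
    """Return 'metadata', 'descriptive', or 'waveform' for a haptic input file."""
    suffix = PurePath(filename).suffix.lower()  # case-insensitive match
    if suffix not in INPUT_CATEGORIES:
        raise ValueError(f"unsupported haptic input file: {filename!r}")
    return INPUT_CATEGORIES[suffix]
```

For example, `classify_input("explosion.AHAP")` returns `"descriptive"`, which would send the file down the transcoding path described in [0024].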
[0022] According to an embodiment, the haptic encoder 300 processes the two types of input content (descriptive and waveform) differently. For descriptive content, the haptic encoder 300 can semantically analyze the input and, if necessary, transcode the data into the proposed coded representation.
[0023] According to an embodiment, the .ohm metadata input file may include a description of the haptic system and its setup. In particular, it may include the name of each associated haptic file (descriptive or PCM) and a description of the signal, and it may provide a mapping between each channel of the signal and a target body part on the user's body. For a .ohm metadata input file, the haptic encoder performs metadata extraction: it retrieves each associated haptic file from its URI (Uniform Resource Identifier), encodes that file according to its type, extracts the metadata from the .ohm file, and maps that metadata to the metadata fields of the data model.
[0024] According to an embodiment, descriptive haptic files (e.g., .ivs, .ahap, and .hjif) can be encoded through a simple process. The haptic encoder 300 first identifies the input format. If the input is a .hjif file, no transcoding is necessary: the file can be edited further, compressed into a binary format, and finally packetized into an MIHS stream. If an .ahap or .ivs input file is used, transcoding is necessary: the haptic encoder 300 first semantically analyzes the input file, then transcodes and formats the information into the selected data model. After transcoding, the data is exported to a .hjif file, an .hmpg binary file, or an MIHS stream.
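The branching described in this paragraph can be sketched as a small routing function. This is an illustrative sketch, not the reference encoder: the step labels and the name `descriptive_pipeline` are hypothetical, and the optional manual editing of .hjif files is omitted.

```python
# Export steps per output format (illustrative labels, not API calls).
EXPORT_STEPS = {
    "hjif": ["export .hjif"],
    "hmpg": ["binary-compress to .hmpg"],
    "mihs": ["binary-compress", "packetize into MIHS"],
}


def descriptive_pipeline(input_format: str, target: str = "mihs") -> list:
    """Return the ordered processing steps for a descriptive haptic input file."""
    fmt = input_format.lower().lstrip(".")
    if fmt == "hjif":
        steps = []  # already in the MPEG data model: no transcoding needed
    elif fmt in ("ahap", "ivs"):
        steps = ["semantic analysis", "transcode into data model"]
    else:
        raise ValueError(f"not a descriptive haptic format: {input_format!r}")
    if target not in EXPORT_STEPS:
        raise ValueError(f"unknown output format: {target!r}")
    return steps + EXPORT_STEPS[target]  # new list; the table is not mutated
```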
[0025] According to an embodiment, the haptic encoder 300 can perform signal analysis to describe the signal structure of a .wav file and convert it into the proposed coded representation. For waveform PCM content, the haptic encoder 300 can divide the signal analysis into two subprocesses. After the signal is decomposed into frequency bands, the first subprocess encodes the low-frequency band using keyframe extraction. The low-frequency band is then reconstructed, and the error between the reconstructed and the original low-frequency signal is calculated. This residual signal is added to the original high-frequency band, which is then encoded using a wavelet transform; this is the second subprocess. According to an embodiment, if several low-frequency bands are used, the residuals from all low-frequency bands are added to the high-frequency band before encoding. In an embodiment where several high-frequency bands are used, the residuals from the low-frequency bands are added to the first high-frequency band before encoding.
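The hand-off between the two subprocesses can be illustrated with a simplified two-band sketch. Assumptions, stated plainly: the moving-average split stands in for whatever filter bank the codec actually uses, keyframes are taken to be the endpoints plus local extrema, and reconstruction is linear interpolation. What the sketch does show faithfully is the residual step: the low-band coding error is added to the high band, so the reconstructed low band plus the augmented high band still sums to the original signal.

```python
def split_bands(signal, window=5):
    """Crude two-band split: a centered moving average is the low band,
    and the remainder (signal - low) is the high band."""
    n, half = len(signal), window // 2
    low = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        low.append(sum(signal[lo:hi]) / (hi - lo))
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def extract_keyframes(band):
    """Keep the endpoints and every strict local extremum as (index, value) pairs."""
    n = len(band)
    if n == 0:
        return []
    idx = [0]
    for i in range(1, n - 1):
        if (band[i] - band[i - 1]) * (band[i + 1] - band[i]) < 0:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return [(i, band[i]) for i in idx]


def reconstruct(keyframes, n):
    """Rebuild n samples by linear interpolation between consecutive keyframes."""
    out = [0.0] * n
    for i, v in keyframes:
        out[i] = v
    for (i0, v0), (i1, v1) in zip(keyframes, keyframes[1:]):
        for i in range(i0 + 1, i1):
            out[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    return out


def encode_two_band(signal, window=5):
    """First subprocess (keyframes) plus the residual hand-off to the second."""
    low, high = split_bands(signal, window)
    keyframes = extract_keyframes(low)
    residual = [l - r for l, r in zip(low, reconstruct(keyframes, len(low)))]
    # The low-band coding error is added to the high band before wavelet coding.
    high_plus_residual = [h + r for h, r in zip(high, residual)]
    return keyframes, high_plus_residual
```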
[0026] According to one embodiment, keyframe extraction includes taking the low-frequency band from the frequency band decomposition and analyzing its contents in the time domain. According to another embodiment, wavelet processing includes taking the high-frequency band from the frequency band decomposition together with the low-frequency residuals, and dividing the resulting high-frequency band into blocks of equal size. These equal-size signal blocks are then analyzed by a psychohaptic model, with whose help lossy compression is applied by wavelet-transforming and quantizing the blocks. Finally, in the formatting step, each block is stored as a separate effect within a single band. Binary compression can then be applied losslessly using appropriate coding techniques, such as the Set Partitioning in Hierarchical Trees (SPIHT) algorithm and Arithmetic Coding (AC).
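The block-based lossy stage can be sketched as follows. The text does not name the wavelet, so a single-level orthonormal Haar transform is swapped in here as a stand-in, and a fixed uniform quantizer step replaces the psychohaptic model; SPIHT and arithmetic coding are not shown. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def split_blocks(band, block_size):
    """Cut a band into equal-size blocks, zero-padding the last block."""
    if block_size <= 0 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    blocks = []
    for start in range(0, len(band), block_size):
        block = [float(x) for x in band[start:start + block_size]]
        block += [0.0] * (block_size - len(block))
        blocks.append(block)
    return blocks


def haar_forward(block):
    """One level of the orthonormal Haar transform: approximations, then details."""
    pairs = list(zip(block[0::2], block[1::2]))
    return [(a + b) / SQRT2 for a, b in pairs] + [(a - b) / SQRT2 for a, b in pairs]


def haar_inverse(coeffs):
    """Invert haar_forward."""
    half = len(coeffs) // 2
    out = []
    for a, d in zip(coeffs[:half], coeffs[half:]):
        out += [(a + d) / SQRT2, (a - d) / SQRT2]
    return out


def quantize(coeffs, step):
    """Uniform scalar quantizer; a psychohaptic model would pick `step` per block."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Map integer levels back to coefficient values."""
    return [q * step for q in levels]
```

Because the Haar transform used here is orthonormal and each coefficient is off by at most `step / 2`, each reconstructed sample is off by at most `step / sqrt(2)`, which makes the quantizer step a direct knob on the lossy error.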
[0027] As shown in Figure 3A, the haptic encoder 300 is configured to encode descriptive haptic data and quantized haptic data, and can output three formats: an interchange format (.hjif), a binary compressed format (.hmpg), and a stream format (e.g., MPEG Immersive Haptic Stream (MIHS)). The .hjif format is a human-readable, JSON-based format that is easy to parse and edit manually, which makes it an ideal interchange format, especially when designing or creating content. For distribution, .hjif data can be compressed into a more memory-efficient binary .hmpg bitstream. This compression is lossy, and its parameters determine the bit depth used to encode the amplitudes and frequencies that make up the bitstream. For streaming, the data is compressed and packetized into an MPEG-I Haptic Stream (MIHS). The three formats serve complementary purposes, and lossy one-to-one conversions can be performed between them.
[0028] As shown in Figure 3B, the haptic decoder 350 can take either the .hmpg compressed binary file format or the MIHS bitstream as input, and can output the .hjif interchange format, which can be used directly for rendering. For both input formats, the metadata and the data itself are extracted from the file by binary decompression and mapped to the selected data structure. The data can then be passed to the haptic renderer 380 in the .hjif format.
[0029] As shown in Figure 3B, the renderer 380 includes a synthesizer that can render haptic data from a .hjif input file into a PCM output file. Rendering and/or synthesis are informative. According to an embodiment, the synthesizer parses the input file and dispatches the synthesis at a high level among vectorial bands, wavelet bands, and so on. The synthesis process then descends to the band components of the codec, each of which invokes its own synthesis process. Finally, all bands of a given channel are mixed by a simple adder to regenerate the desired haptic signal.
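The final mixing stage, a simple adder over the bands of one channel, amounts to a sample-wise sum. The sketch below assumes each band has already been synthesized into a PCM sample list of the same length at a common sampling rate; `mix_channel` is a hypothetical name.

```python
def mix_channel(bands):
    """Sum per-band PCM sample lists sample by sample (the 'simple adder')."""
    if not bands:
        return []
    n = len(bands[0])
    if any(len(b) != n for b in bands):
        raise ValueError("all bands of a channel must have the same length")
    # zip(*bands) yields one tuple per time index, holding that sample from every band.
    return [sum(samples) for samples in zip(*bands)]
```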
[0030] According to an embodiment, the haptic experience defines the root of the hierarchical data model. It provides information about the date and format version of the file, describes the haptic experience, lists the different avatars (i.e., body representations) used throughout the experience, and defines all haptic perceptions.
[0031] According to an embodiment, a self-contained stream format for transporting MPEG-I haptic data may use a packetization approach with two levels of packets: MPEG-I Haptic Stream (MIHS) units, each of which covers a duration and contains zero or more MIHS packets; and MIHS packets, which contain metadata or haptic effect data. MIHS units may cover non-overlapping intervals of the haptic presentation time; that is, each MIHS unit may start at the end of the previous MIHS unit and cover the duration given by its duration field. Every MIHS unit except the last one of the haptic experience may have a successor MIHS unit. All MIHS packets in an MIHS unit may have a start time and duration that fall within that MIHS unit.
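The two-level structure and its timing rules can be modeled with plain data classes. This is a sketch under stated assumptions: field names and millisecond units are hypothetical, units carry only a duration (their start is derived from the end of the previous unit, as described above), and packet start times are taken to be on the same timeline as the units.

```python
from dataclasses import dataclass, field
from itertools import accumulate
from typing import List


@dataclass
class MihsPacket:
    start: int     # presentation time of the packet (assumed ms)
    duration: int


@dataclass
class MihsUnit:
    duration: int  # the unit's duration field
    packets: List[MihsPacket] = field(default_factory=list)


def unit_start_times(units, experience_start=0):
    """Each unit starts where the previous one ends: a running sum of durations."""
    if not units:
        return []
    return list(accumulate((u.duration for u in units[:-1]), initial=experience_start))


def check_packets(units, experience_start=0):
    """Raise ValueError if a packet is not contained in the interval of its unit."""
    for unit, start in zip(units, unit_start_times(units, experience_start)):
        end = start + unit.duration
        for p in unit.packets:
            if p.start < start or p.start + p.duration > end:
                raise ValueError(
                    f"packet [{p.start}, {p.start + p.duration}) outside unit [{start}, {end})"
                )
```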
[0032] In some embodiments, an MIHS unit is also referred to as a network abstraction layer unit, or as an MIHS sample, associated with haptic data.
[0033] According to an embodiment, an MIHS unit may be either a synchronous or an asynchronous unit. A synchronous unit resets previous effects and provides a haptic experience that is independent of the previous MIHS unit. An asynchronous unit is a continuation of the previous MIHS unit and cannot be decoded and rendered independently, i.e., without first decoding the previous MIHS unit.
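One practical consequence of the synchronous/asynchronous distinction is random access: to render a given unit, a decoder has to start at the nearest synchronous unit at or before it. A minimal sketch, assuming the sync flag of each unit is already known (`decode_start_index` is a hypothetical helper):

```python
def decode_start_index(is_sync, target):
    """Return the index of the unit where decoding must begin to render `target`.

    `is_sync[i]` is True for synchronous units. Asynchronous units depend on
    their predecessors, so we walk back to the nearest synchronous unit.
    """
    if not 0 <= target < len(is_sync):
        raise IndexError("target unit out of range")
    for i in range(target, -1, -1):
        if is_sync[i]:
            return i
    raise ValueError("no synchronous unit at or before the target unit")
```

With flags `[True, False, False, True, False]`, seeking to unit 2 requires decoding from unit 0, while seeking to unit 4 only requires decoding from unit 3.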
[0034] According to some embodiments, multiple channels can encode haptic signals. In some embodiments, a haptic channel can be defined as a signal rendered at a specific body location by a dedicated actuator or device. Metadata stored at the channel level may include, for example, the gain associated with the channel, the mixing weight, the desired body location for the haptic feedback, and optional reference devices and/or directions. Additional information, such as a desired sampling frequency and sample count, may also be provided. The haptic data of a channel is contained in a set of haptic bands, each defined by its frequency range; a band describes the haptic signal of the channel within that frequency range. A band is defined by its type and an ordered list of haptic effects, and each haptic effect includes a set of keyframes. For each type of haptic band, a haptic effect can be defined by at least its position and its type. The position can indicate the temporal or spatial location of the effect. In some embodiments, a position value of 0 corresponds to the start of the experience, and the scale of the position depends on the unit factor of the perception modality in which the effect is placed. The default unit for temporal haptic feedback may be milliseconds, and the default unit for spatial haptic feedback may be millimeters. Positions are expressed relative to the start of the experience because the binary delivery format has no concept of a bounded time interval, i.e., a frame or a sample.
[0035] Depending on the band type and effect type, additional properties can be specified, including the phase, the base signal, and the composition and number of the successive haptic keyframes that describe the effect.
[0036] According to some embodiments, this disclosure defines the following haptic data hierarchy:
- Haptic channels
  - Haptic bands
    - Haptic effects
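The hierarchy above, together with the channel-level metadata from [0034], maps naturally onto nested records. The following data classes are a hypothetical sketch; field names and units are assumptions, and only a subset of the properties listed in [0034] and [0035] is modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HapticEffect:
    position: int      # temporal (ms) or spatial (mm) position
    effect_type: str
    keyframes: List[Tuple[int, float]] = field(default_factory=list)  # (relative position, amplitude)


@dataclass
class HapticBand:
    band_type: str
    lower_frequency_hz: float
    upper_frequency_hz: float
    effects: List[HapticEffect] = field(default_factory=list)  # ordered list of effects


@dataclass
class HapticChannel:
    body_part: str
    gain: float = 1.0
    mixing_weight: float = 1.0
    bands: List[HapticBand] = field(default_factory=list)


def effects_by_position(channel):
    """Walk channel -> bands -> effects and return all effects ordered by position."""
    return sorted(
        (e for band in channel.bands for e in band.effects), key=lambda e: e.position
    )
```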
[0037] Embodiments of this disclosure describe two anchoring options for the position of haptic effects with respect to the ISOBMFF track.
[0038] Figure 4 shows a first embodiment. As shown in Figure 4, each MIHS unit (also called an MIHS sample, an ISOBMFF haptic sample, or simply a sample in this embodiment) includes information on one or more haptic channels and one or more haptic bands. As described above, each MIHS unit includes one or more channels, each channel includes one or more bands, and each band may have one or more effects.
[0039] In the first embodiment, the temporal position of an effect can be defined as an offset from the start time of the sample that carries the effect (e.g., the MIHS unit start time). In the second embodiment, or in the same embodiment, the offset is instead based on the start time and/or presentation time of the media or haptic track.
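Under the first (sample-anchored) option, converting between an effect's offset and its position on the track timeline only requires the start time of the carrying sample. ISOBMFF expresses sample times in ticks of the track's timescale, while [0034] gives milliseconds as the default unit for temporal haptic positions, so the sketch below converts ticks to milliseconds with exact fractions. It assumes that a sample's start equals its decode time (no composition offsets); the function names are hypothetical.

```python
from fractions import Fraction


def sample_start_ms(sample_time, timescale):
    """Convert an ISOBMFF sample time in track timescale ticks to exact milliseconds."""
    return Fraction(sample_time * 1000, timescale)


def onset_from_offset(sample_time, timescale, offset_ms):
    """Sample-anchored position -> onset on the track timeline (ms)."""
    return sample_start_ms(sample_time, timescale) + offset_ms


def offset_from_onset(sample_time, timescale, onset_ms):
    """Track-anchored onset (ms) -> offset from the start of the carrying sample."""
    offset = onset_ms - sample_start_ms(sample_time, timescale)
    if offset < 0:
        raise ValueError("effect would start before the sample that carries it")
    return offset
```

For a sample at tick 90000 in a 90 kHz track, an offset of 150 ms gives an onset of 1150 ms; exact fractions avoid drift with timescales such as 30000 where a tick count like 1001 does not map to a whole number of milliseconds.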
[0040] According to an embodiment, the first embodiment allows the track to be manipulated without affecting the positions of the haptic effects, because a change in ISOBMFF sample timing does not change the position of an effect relative to its sample. According to an embodiment, the second embodiment can be used with an elementary haptic stream (e.g., a high-level syntax stream) that is not carried in ISOBMFF.
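The robustness argument in this paragraph can be checked with a toy edit that shifts every sample by the same amount. In this hypothetical sketch, `effects` holds `(sample_index, position)` pairs, and the `anchor` argument selects how `position` is interpreted.

```python
def absolute_onsets(sample_starts, effects, anchor):
    """Resolve effect onsets on the track timeline.

    effects: (sample_index, position) pairs. With anchor='sample' the position
    is an offset from the carrying sample's start; with anchor='track' it is
    already a track-timeline position.
    """
    if anchor == "sample":
        return [sample_starts[i] + pos for i, pos in effects]
    if anchor == "track":
        return [pos for _, pos in effects]
    raise ValueError(f"unknown anchor: {anchor!r}")


def shift_samples(sample_starts, delta):
    """A toy track manipulation: move every sample by `delta` (same units)."""
    return [s + delta for s in sample_starts]


starts = [0, 1000, 2000]
shifted = shift_samples(starts, 500)
# The same effect, 250 ms into sample 1, written both ways.
print(absolute_onsets(shifted, [(1, 250)], "sample"))  # [1750]: follows the sample
print(absolute_onsets(shifted, [(1, 1250)], "track"))  # [1250]: does not move
```

After the shift, sample 1 starts at 1500, so the track-anchored position 1250 now precedes its own sample and would have to be rewritten; the sample-anchored offset needs no change.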
[0041] According to some embodiments, several types of haptic tracks can be used. In one embodiment, a haptic track may use samples (MIHS units) in which the temporal position of each effect is defined as an offset from the start time of the sample. In another embodiment, a haptic track may use samples (MIHS units) in which the temporal position of each effect is relative to the start time of the track. In yet another embodiment, a mix of both types of MIHS units or samples can be used.
[0042] This disclosure provides a method, apparatus, and system for defining the timing model of haptic effects in ISOBMFF tracks that carry haptic effects. Two timing options are provided. In the first, the anchor is the presentation start time of the media track, and all haptic effect positions are expressed relative to that point in time. In the second, each haptic effect is anchored to the start time of the sample in which the effect is carried, and the effect position is expressed in sample time. Embodiments may include tracks using option 1, tracks using option 2, or mixed tracks.
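A mixed track needs some way to tell the two options apart per effect or per sample. The text does not specify that signaling, so the `Anchor` flag below is a hypothetical stand-in. The sketch resolves every effect onto the presentation timeline, assuming sample start times are measured from the start of the track's media timeline and `presentation_start` is where that timeline begins in the presentation.

```python
from dataclasses import dataclass
from enum import Enum


class Anchor(Enum):
    TRACK = 1   # option 1: relative to the media track presentation start
    SAMPLE = 2  # option 2: relative to the start of the carrying sample


@dataclass
class EffectRef:
    sample_start: int  # start of the carrying sample on the media timeline (ms)
    position: int      # effect position (ms), interpreted according to `anchor`
    anchor: Anchor     # hypothetical per-effect flag


def resolve_onset(effect, presentation_start=0):
    """Map one effect to the presentation timeline."""
    if effect.anchor is Anchor.TRACK:
        return presentation_start + effect.position
    return presentation_start + effect.sample_start + effect.position


def resolve_all(effects, presentation_start=0):
    """Resolve a mixed list of effects and return their onsets in time order."""
    return sorted(resolve_onset(e, presentation_start) for e in effects)
```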
[0043] Embodiments of this disclosure provide timing models that allow haptic effects to be synchronized with other media tracks in the same or related ISOBMFF file. Since the timing model of the haptic track relates to the timing model of the related ISOBMFF file, the manipulation and processing of the media track become more efficient.
[0044] As shown in Figure 6, process 600 is an exemplary process for decoding haptic data.
[0045] In operation 605, a media stream that includes one or more haptic tracks and one or more video tracks can be received.
[0046] In operation 610, one or more Moving Picture Experts Group (MPEG) Immersive Haptic Stream (MIHS) units can be obtained from the media stream. In some embodiments, an MIHS unit may include one or more haptic effects and an MIHS unit start time.
[0047] In an embodiment, the MIHS unit is associated with at least one haptic channel, the at least one haptic channel comprising one or more haptic bands, each of which has at least one haptic effect.
[0048] In operation 615, timing information associated with the one or more haptic effects can be obtained. In an embodiment, the timing information may include the temporal position of at least one of the one or more haptic effects.
[0049] In an embodiment, the temporal position of a haptic effect indicates the onset time of the haptic effect, expressed as an offset from the start time of the corresponding MIHS unit.
[0050] In an embodiment, the onset time of the haptic effect is an absolute time measured from the start time of the at least one haptic track or the at least one video track.
[0051] In operation 620, the media stream is rendered based on the acquired timing information.
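Operations 605 to 620 can be strung together in a toy decoder loop. This is a sketch, not the disclosed implementation: the record types and the `relative` flag (choosing between the offset form of [0049] and the absolute form of [0050]) are hypothetical, and actual haptic rendering is reduced to producing a sorted schedule of onset times.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Effect:
    position_ms: int
    relative: bool  # True: offset from the unit start; False: absolute track time


@dataclass
class MihsUnit:
    start_ms: int
    effects: List[Effect] = field(default_factory=list)


@dataclass
class MediaStream:
    haptic_units: List[MihsUnit]  # units from the haptic track(s)
    video_sample_count: int       # stand-in for the video track(s)


def process_600(stream):
    """Return the rendering schedule (effect onsets in ms) for a received stream."""
    # Operation 605: the stream must carry haptic and video tracks.
    if not stream.haptic_units or stream.video_sample_count == 0:
        raise ValueError("expected at least one haptic track and one video track")
    schedule = []
    for unit in stream.haptic_units:  # operation 610: MIHS units
        for effect in unit.effects:   # operation 615: timing information
            onset = effect.position_ms + (unit.start_ms if effect.relative else 0)
            schedule.append(onset)
    # Operation 620: a renderer would trigger each effect at these times,
    # on the same timeline as the video samples.
    return sorted(schedule)
```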
[0052] According to an embodiment, manipulating the sequence of the one or more MIHS units does not affect the temporal position of the at least one haptic effect, because the one or more MIHS units correspond to one or more ISO Base Media File Format (ISOBMFF) samples associated with the at least one video track.
[0053] In some embodiments, synchronous MIHS units can be obtained from the media stream. In some embodiments, a synchronous MIHS unit is a special type of MIHS unit that provides a reset point in the bitstream. In some embodiments, a synchronous MIHS unit is mapped to a sync sample in the video bitstream corresponding to the one or more haptic channels.
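If synchronous MIHS units are carried one-to-one as ISOBMFF samples, their positions could be signaled in the haptic track's own sync sample table. The sketch below assumes that mapping (the text does not specify how it is signaled) and relies on two properties of the ISOBMFF SyncSampleBox ('stss'): sample numbers are 1-based, and when the box is absent every sample is a sync sample. `sync_sample_numbers` is a hypothetical name.

```python
def sync_sample_numbers(is_sync_unit):
    """Return the entries of an ISOBMFF 'stss' box for a sequence of MIHS units.

    Assumes MIHS unit i is carried as ISOBMFF sample i + 1 (sample numbers are
    1-based). Returns None when every unit is synchronous, because an absent
    'stss' box already means that all samples are sync samples.
    """
    numbers = [i + 1 for i, sync in enumerate(is_sync_unit) if sync]
    if len(numbers) == len(is_sync_unit):
        return None
    return numbers
```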
[0054] As those skilled in the art will understand, the techniques described herein can be implemented on both the encoder and decoder sides. The techniques described above can be implemented as computer software using computer-readable instructions and can be stored physically on one or more computer-readable media. For example, Figure 7 shows a computer system 700 suitable for implementing some embodiments of the present disclosure.
[0055] Computer software can be coded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that can be executed directly by a computer's central processing unit (CPU), graphics processing unit (GPU), etc., or through interpretation, microcode execution, and the like.
[0056] The instructions can be executed on various types of computers or their components, including, for example, personal computers, tablet computers, servers, smartphones, game consoles, and Internet of Things devices.
[0057] The components of the computer system 700 shown in Figure 7 are exemplary and are not intended to suggest any limitation on the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Nor should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the non-limiting embodiment of the computer system 700.
[0058] The computer system 700 may include certain human-machine interface input devices. Such human-machine interface input devices may respond to input from one or more human users through, for example, tactile input (e.g., keystrokes, swipes, data glove movements), audio input (e.g., voice, tapping), visual input (e.g., gestures), and olfactory input (not shown). The human-machine interface devices may also be used to capture certain media not necessarily directly related to conscious human input, such as audio (e.g., speech, music, ambient sound), images (e.g., scanned images, photographic images obtained from a still image camera), and video (e.g., two-dimensional video, and three-dimensional video including stereoscopic video).
[0059] The input human-machine interface device may include one or more of the following: keyboard 701, mouse 702, touchpad 703, touchscreen 710, data glove, joystick 705, microphone 706, scanner 707, and imaging device 708 (only one of each is shown).
[0060] The computer system 700 may further include certain human-machine interface output devices. Such devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human-machine interface output devices may include tactile output devices (e.g., tactile feedback via the touchscreen 710, a data glove, or the joystick 705, although there may also be tactile feedback devices that do not serve as input devices), audio output devices (e.g., speaker 709, headphones (not shown)), visual output devices (e.g., screen 710, including CRT, LCD, plasma, and OLED screens, each with or without touchscreen input capability and with or without tactile feedback capability, some of which may be capable of two-dimensional visual output or output in more than three dimensions by means such as stereographic output; virtual reality glasses (not shown); holographic displays and smoke tanks (not shown)), and printers (not shown).
[0061] The computer system 700 may further include human-accessible storage devices and associated media, such as optical media including a CD / DVD ROM / RW 720 having a CD / DVD media 721, a thumb drive 722, a removable hard drive or solid-state drive 723, traditional magnetic media such as magnetic tape or floppy disks (not shown), and devices such as security dongles (not shown) using dedicated ROM / ASIC / PLD.
[0062] Those skilled in the art should understand that the term “computer-readable medium” as used in combination with the currently disclosed subject matter does not include a transmission medium, carrier wave, or other transient signal.
[0063] The computer system 700 may include interfaces to one or more communication networks. The networks may be, for example, wireless, wired, or optical. The networks may further be local, wide-area, metropolitan, vehicular, industrial, real-time, or delay-tolerant networks. Examples of networks include local area networks such as Ethernet® and wireless LANs (Local Area Networks); cellular networks including GSM (Global System for Mobile Communications), 3G (third generation), 4G (fourth generation), 5G (fifth generation), and LTE (Long Term Evolution); wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV; and vehicular and industrial networks including CANBus. Some networks typically require an external network interface adapter attached to a general-purpose data port or peripheral bus 749 (e.g., a USB port of the computer system 700), while others are integrated into the core of the computer system 700 by attachment to a system bus as described below (e.g., an Ethernet interface in a PC computer system or a cellular network interface in a smartphone computer system). Using any of these networks, the computer system 700 can communicate with other entities. Such communication may be unidirectional receive-only (e.g., broadcast TV), unidirectional send-only (e.g., CANbus to certain CAN (Controller Area Network) bus devices), or bidirectional, for example to other computer systems over local or wide-area digital networks. Such communication may include communication to a cloud computing environment 755. Certain protocols and protocol stacks can be used on each of these networks and network interfaces.
[0064] The aforementioned human interface devices, human-accessible storage devices, and network interface 754 can be attached to the core 740 of the computer system 700.
[0065] The core 740 may include one or more central processing units (CPUs) 741, graphics processing units (GPUs) 742, specialized programmable processing units in the form of field programmable gate arrays (FPGAs) 743, and hardware accelerators 744 for certain tasks. These devices, together with read-only memory (ROM) 745, random access memory (RAM) 746, and internal mass storage 747 such as internal hard disk drives or SSDs that are not user-accessible, may be connected through a system bus 748. In some computer systems, the system bus 748 can be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. Peripheral devices can be attached to the core's system bus 748 either directly or through a peripheral bus 749. Peripheral bus architectures include PCI (Peripheral Component Interconnect) and USB (Universal Serial Bus). A graphics adapter 750 may be included in the core 740.
[0066] The CPUs 741, GPUs 742, FPGAs 743, and accelerators 744 can execute certain instructions that, in combination, make up the computer code described above. That computer code can be stored in the ROM 745 or the RAM 746. Transitional data can also be stored in the RAM 746, whereas persistent data can be stored, for example, in the internal mass storage 747. Fast storage and retrieval to and from any of the memory devices can be enabled through the use of cache memory, which can be closely associated with one or more of the CPUs 741, GPUs 742, mass storage 747, ROM 745, RAM 746, and the like.
[0067] A computer-readable medium may contain computer code for performing various operations that a computer can perform. The medium and computer code may be specially designed and constructed for the purposes of this disclosure, or they may be of a type known and available to those skilled in the computer software field.
[0068] As an example and not by way of limitation, a computer system having the architecture of computer system 700, and in particular the core 740, can provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media can be media associated with the user-accessible mass storage described above, as well as certain storage of the core 740 that is non-transitory in nature, such as the core-internal mass storage 747 or the ROM 745. Software implementing various embodiments of this disclosure can be stored in such devices and executed by the core 740. A computer-readable medium can include one or more memory devices or chips, according to particular requirements. The software can cause the core 740, and in particular the processors therein (including CPUs, GPUs, FPGAs, and the like), to execute particular processes or particular parts of particular processes described herein, including defining data structures stored in the RAM 746 and modifying such data structures according to the processes defined by the software. In addition, or as an alternative, the computer system can provide functionality as a result of logic hardwired or otherwise embodied in circuitry (e.g., the accelerator 744), which can operate in place of or together with software to execute particular processes or particular parts of particular processes described herein. Where appropriate, references to software can encompass logic, and references to logic can encompass software. Where appropriate, references to a computer-readable medium can encompass circuitry that stores software for execution (such as an integrated circuit (IC)), circuitry that embodies logic for execution, or both. This disclosure encompasses any suitable combination of hardware and software.
[0069] While this disclosure has described several non-limiting embodiments, there are alterations, permutations, and various substitute equivalents that fall within the scope of this disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods that, although not explicitly shown or described herein, embody the principles of this disclosure and are therefore within its spirit and scope.
Claims
1. A method for decoding timing information of haptic data, performed by at least one processor, the method comprising: receiving a media stream that includes at least one haptic track and at least one video track; receiving a first Moving Picture Experts Group (MPEG) Immersive Haptic Stream (MIHS) unit from the media stream, wherein the first MIHS unit includes one or more haptic effects and an initiation time of the first MIHS unit, the first MIHS unit has one or more channels, each channel has one or more bands, and each band includes haptic effects, such that a first band of a first channel of the first MIHS unit includes the one or more haptic effects; obtaining, from the media stream, temporal position information associated with a first haptic effect among the one or more haptic effects, wherein the temporal position information indicates an effect start time of the first haptic effect as an offset with respect to the start time of the first MIHS unit; and rendering the media stream based on the obtained temporal position information.
2. The method according to claim 1, wherein the order in which the first MIHS unit and a second MIHS unit are operated on does not affect the start time of the first haptic effect.
3. The method according to claim 1, wherein the first MIHS unit corresponds to an ISO base media file format (ISOBMFF) sample associated with the at least one video track.
4. A device for decoding haptic data, the device comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code including: first receiving code configured to cause the at least one processor to receive a media stream including at least one haptic track and at least one video track; first obtaining code configured to cause the at least one processor to obtain a first Moving Picture Experts Group (MPEG) Immersive Haptic Stream (MIHS) unit from the media stream, wherein the first MIHS unit includes one or more haptic effects and an initiation time of the first MIHS unit, the first MIHS unit has one or more channels, each channel has one or more bands, and each band includes haptic effects, such that a first band of a first channel of the first MIHS unit includes the one or more haptic effects; second obtaining code configured to cause the at least one processor to obtain, from the media stream, temporal position information associated with a first haptic effect among the one or more haptic effects, wherein the temporal position information indicates an effect start time of the first haptic effect as an offset with respect to the start time of the first MIHS unit; and rendering code configured to cause the at least one processor to render the media stream based on the obtained temporal position information.
5. The device according to claim 4, wherein the order in which the first MIHS unit and a second MIHS unit are operated on does not affect the start time of the first haptic effect.
6. The device according to claim 4, wherein the first MIHS unit corresponds to an ISO base media file format (ISOBMFF) sample associated with the at least one video track.
7. A computer-executable program comprising one or more instructions which, when executed by one or more processors of a device for decoding haptic data, cause the one or more processors to perform the method according to any one of claims 1 to 3.
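The timing model recited in claims 1 to 3 (an MIHS unit with its own start time, containing channels, bands, and effects whose start times are offsets from that unit start) can be illustrated with a minimal Python sketch. This is not the bitstream syntax of the disclosure: every class and function name below is hypothetical, times are assumed to be exact fractions of a second, and the unit start is assumed to equal the decoding time of the ISOBMFF sample carrying the unit, derived from run-length `(sample_count, sample_delta)` entries of an 'stts' box and a media timescale (composition offsets and edit lists are ignored).

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Dict, List, Tuple

@dataclass
class HapticEffect:
    effect_id: str
    offset: Fraction  # assumed: effect start relative to its MIHS unit start (seconds)

@dataclass
class Band:
    effects: List[HapticEffect] = field(default_factory=list)

@dataclass
class Channel:
    bands: List[Band] = field(default_factory=list)

@dataclass
class MIHSUnit:
    start: Fraction  # assumed: unit initiation time on the presentation timeline (seconds)
    channels: List[Channel] = field(default_factory=list)

def sample_start_times(stts_entries: List[Tuple[int, int]], timescale: int) -> List[Fraction]:
    """Expand (sample_count, sample_delta) runs, as carried in an ISOBMFF
    'stts' box, into per-sample decoding times in seconds."""
    times: List[Fraction] = []
    ticks = 0  # running decoding time in media-timescale ticks
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append(Fraction(ticks, timescale))
            ticks += sample_delta
    return times

def resolve_effect_times(units: List[MIHSUnit]) -> Dict[str, Fraction]:
    """Absolute effect start = start of its own MIHS unit + the effect's offset.
    Each result depends only on the effect's own unit, not on processing order."""
    schedule: Dict[str, Fraction] = {}
    for unit in units:
        for channel in unit.channels:
            for band in channel.bands:
                for effect in band.effects:
                    schedule[effect.effect_id] = unit.start + effect.offset
    return schedule

# Hypothetical haptic track: timescale of 1000 ticks/s, two samples of 500 ticks,
# each sample carrying one MIHS unit with one channel and one band.
starts = sample_start_times([(2, 500)], timescale=1000)
first = MIHSUnit(starts[0], [Channel([Band([HapticEffect("tap", Fraction(0))])])])
second = MIHSUnit(starts[1], [Channel([Band([HapticEffect("buzz", Fraction(3, 25))])])])

in_order = resolve_effect_times([first, second])
reordered = resolve_effect_times([second, first])
print(in_order == reordered)    # True
print(float(in_order["buzz"]))  # 0.62
```

Because each offset in this sketch is anchored to the start of its own unit rather than to a previously processed unit, reordering the units leaves every resolved start time unchanged, which is one way to read the property stated in claims 2 and 5.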