VR system

The VR system reduces bandwidth by generating and transmitting only necessary regions of 360-degree images based on gaze information and predicted transitions, effectively addressing the bandwidth challenge for multiple camera images in VR video transmission.

JP7883262B2 · Active · Publication Date: 2026-07-01 · NIPPON TELEGRAPH & TELEPHONE CORP +1

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
NIPPON TELEGRAPH & TELEPHONE CORP
Filing Date
2023-01-30
Publication Date
2026-07-01


Abstract

PROBLEM TO BE SOLVED: To reduce the bit rate in transmitting a plurality of camera videos constituting a 360-degree video.

SOLUTION: A server device connected to a controlled object through a wireless network comprises: a conversion processing part which acquires, from the controlled object, a plurality of camera videos captured by a plurality of cameras and generates a 360-degree video from the plurality of camera videos; and a conversion table which defines the correspondence between positions in the plurality of camera videos and positions in the 360-degree video. Once a view region in the 360-degree video is designated, the regions in the plurality of camera videos corresponding to the view region are specified by referring to a reverse conversion table of the conversion table, and sight information including the specified regions is transmitted to the controlled object.

SELECTED DRAWING: Figure 1

Description

[Technical Field]

[0001] This disclosure relates to a method for reducing the video bitrate in a VR system.

[Background Art]

[0002] When operating controlled objects such as remotely controlled robots, the field of view is limited with conventional 2D video, so there is a demand to operate them while viewing VR (Virtual Reality) video. The following form of system can be considered for transmitting VR video between the controlled object (such as a robot), the operator, and a server that acts as an intermediary between them.
(1) The controlled object acquires images from multiple cameras (e.g., in Dual Fisheye format) to create VR images.
(2) The multiple camera feeds are transmitted to a server via wireless communication.
(3) The server stitches together the received camera footage to generate a 360-degree video.
(4) The stitched 360-degree video is transmitted to the operator.
This allows the operator to view the video area corresponding to their line of sight through the headset.

[0003] To create high-definition VR images, multiple high-quality camera images must be uploaded from the controlled object. However, since the controlled object and the server are connected via a wireless network, sufficient bandwidth cannot be obtained. Another possible method to reduce bandwidth is to place the stitching server on the controlled object, but due to the size and power constraints of the controlled robot, it is difficult to mount a high-performance server.

[0004] Therefore, traffic reduction methods using tile division have been proposed [see, for example, Non-Patent Documents 1-3]. Non-Patent Document 1 transmits only the tiles corresponding to the operator's viewport. Non-Patent Documents 2 and 3 transmit the tiles related to the viewport at improved quality.

[Prior Art Documents]

[Non-Patent Literature]

[0005]
[Non-Patent Document 1] Feng Qian et al., “Optimizing 360 video delivery over cellular networks,” ACM ATC, 2016.
[Non-Patent Document 2] M. Hosseini, “View-aware tile-based adaptations in 360 virtual reality video streaming,” IEEE VR, 2017.
[Non-Patent Document 3] D. V. Nguyen et al., “An Optimal Tile-Based Approach for Viewport-Adaptive 360-Degree Video Streaming,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 9, No. 1, 2019.

[Summary of the Invention]

[Problems to Be Solved by the Invention]

[0006] However, while the methods described in Non-Patent Documents 1-3 can reduce the bandwidth of the 360-degree video itself that is projected onto the headset, they cannot reduce the bandwidth of the multiple camera images used to generate the 360-degree video. Therefore, this disclosure aims to reduce the bitrate when transmitting the multiple camera images that constitute a 360-degree video.

[Means for Solving the Problem]

[0007] The VR system of this disclosure comprises a server device of this disclosure and a controlled device of this disclosure. The server device and the controlled device of this disclosure execute the method of this disclosure.

[0008] The server device of this disclosure is a server device connected to a controlled device via a wireless network, comprising: a conversion processing unit that acquires, from the controlled device, multiple camera images captured by multiple cameras and generates a 360-degree image from the multiple camera images; and a conversion table that defines the correspondence between positions in the multiple camera images and positions in the 360-degree image. When a viewing area in the 360-degree image is specified, the server device identifies the region in the multiple camera images corresponding to the viewing area by referring to the inverse conversion table of the conversion table, and transmits gaze information including the identified region to the controlled device.

[0009] The server device of this disclosure may acquire the gaze transitions of the operator of the controlled device, predict the gaze information to follow the acquired gaze transitions, identify the regions of the multiple camera images corresponding to the gaze information obtained by the prediction, and transmit the gaze information including the identified region to the controlled device.

[0010] The controlled device of this disclosure is a controlled device connected to a server device via a wireless network and equipped with multiple cameras. The controlled device receives, from the server device, gaze information indicating where the line of sight should be set, and transmits to the server device the region identified by the gaze information from among the multiple camera images captured by the multiple cameras.

[0011] The controlled device of the present disclosure may acquire the gaze transitions of the operator of the controlled device, predict the gaze information in accordance with the acquired gaze transitions, and transmit to the server device the region identified by the gaze information obtained by the prediction from among the multiple camera images.

[0012] The program of the present disclosure is a program for causing a computer to realize each function provided in the server device or the controlled device according to the present disclosure, and is a program for causing a computer to execute each procedure included in the method executed by the server device or the controlled device according to the present disclosure.

[0013] Note that the above disclosures can be combined as much as possible.

Effects of the Invention

[0014] In the present invention, it is possible to realize bit rate reduction when transmitting a plurality of camera images for constituting a 360-degree image.

Brief Description of the Drawings

[0015]
[Figure 1] Shows a configuration example of a real-time VR system.
[Figure 2] Shows an example of a Dual fisheye image transmitted from a controlled object to a server.
[Figure 3] Shows an example of a 360-degree image transmitted from a server to an operator.
[Figure 4] A diagram for explaining the problem addressed by the present disclosure.
[Figure 5] A diagram for explaining the outline of the present disclosure.
[Figure 6] Shows a configuration example of the VR system of the present disclosure.
[Figure 7] Shows an example of a line-of-sight transition.
[Figure 8] Shows an example of a transition of a viewing area.
[Figure 9] Shows an example of a transition of a transmission area.

Embodiments for Carrying Out the Invention

[0016] Embodiments of this disclosure will be described in detail below with reference to the drawings. However, this disclosure is not limited to the embodiments shown below. These examples are illustrative, and this disclosure can be implemented in various modified and improved forms based on the knowledge of those skilled in the art. In this specification and in the drawings, components with the same reference numerals refer to the same components.

[0017] (VR system) Figure 1 shows an example of a VR system configuration. The robot 91 and the server 92 are connected by a wireless network 81, and the server 92 and the robot 91 operator's headset 93 are connected by an arbitrary communication network 82. The communication network 82 may include a wired network or a wireless network.

[0018] Robot 91 functions as a controlled device in this disclosure, and server 92 functions as a server device in this disclosure. The server device and controlled device in this disclosure can also be realized by a computer and a program, and the program can be recorded on a recording medium or provided over a network.

[0019] Server 92 receives multiple camera images from robot 91 as shown in Figure 2, uses them to generate a 360-degree image as shown in Figure 3, and transmits it to the operator. As a result, the VR image is displayed on the headset 93. In this embodiment, as shown in Figure 2, the multiple images captured by the multiple cameras are transmitted as a composite image projected onto a single image. Hereinafter, the multiple cameras are assumed to be dual fisheye cameras, and a composite image in which the frame images captured simultaneously by the dual fisheye cameras are projected onto a single frame image is referred to as a dual fisheye image.

[0020] This disclosure addresses the bandwidth shortage of the wireless network 81 shown in Figure 4 by reducing the bitrate of the multiple camera images transmitted by the robot 91, using the following approaches.
(1) Deterministically reduce a portion of the area.
(2) Reduce the area using statistics based on past history.
(3) Reduce the area using the results of prediction.
The following provides a detailed explanation of a robot VR system in which the controlled object is robot 91.

[0021] (First embodiment) The headset 93 can detect the operator's viewing position in the stitched 360-degree video. Furthermore, the correspondence between a position in the dual fisheye video and a position in the 360-degree video can be determined using a conversion table. In this embodiment, a conversion table defining this correspondence is prepared in advance and is used to selectively transmit only the part of the dual fisheye video at the viewing position.

[0022] (Pre-processing) The preprocessing steps will be explained with reference to Figure 5.
1. Obtain the viewing position on the Dual fisheye video P11.
2. Use the conversion table to convert the viewing position to the viewing position on the stitched 360-degree video P12.
3. Based on the viewing position reported by the headset 93, narrow down the viewing area on the 360-degree video P12. This results in the 360-degree video P13.
4. Using the inverse conversion table of the conversion table, convert the viewing area on the 360-degree video P12 into the form of the Dual fisheye video P14.
This makes it possible to identify the viewing area within the Dual fisheye video P14. In this embodiment, this viewing area within the Dual fisheye video P14 is notified to the robot 91. The following describes an example in which the viewing area is provided as part of the operator's gaze information.
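The inverse lookup in step 4 can be illustrated with a minimal sketch. It is not the table used in the disclosure: it assumes an ideal equidistant fisheye model with a 180-degree field of view per lens, a front camera (index 0) and a rear camera (index 1), and an equirectangular 360-degree image. The function names `build_inverse_table` and `regions_for_view` are hypothetical.

```python
import math

def build_inverse_table(pano_w, pano_h, fish_size):
    """Map each 360-degree (equirectangular) pixel to (camera, x, y) in a fisheye image.

    Assumption: ideal equidistant lenses, 180-degree FOV each,
    camera 0 looks along +z and camera 1 along -z.
    """
    table = {}
    r_max = (fish_size - 1) / 2.0  # image-circle radius in pixels
    for v in range(pano_h):
        lat = math.pi / 2 - (v + 0.5) / pano_h * math.pi
        for u in range(pano_w):
            lon = (u + 0.5) / pano_w * 2 * math.pi - math.pi
            # Unit viewing direction for this 360-degree pixel
            dx = math.cos(lat) * math.sin(lon)
            dy = math.sin(lat)
            dz = math.cos(lat) * math.cos(lon)
            cam = 0 if dz >= 0 else 1
            if cam == 1:
                dx, dz = -dx, -dz  # express the direction in the rear camera's frame
            theta = math.acos(max(-1.0, min(1.0, dz)))  # angle from the optical axis
            phi = math.atan2(dy, dx)
            r = theta / (math.pi / 2) * r_max  # equidistant projection
            x = int(round(r_max + r * math.cos(phi)))
            y = int(round(r_max - r * math.sin(phi)))
            table[(u, v)] = (cam, x, y)
    return table

def regions_for_view(table, view):
    """Return per-camera bounding boxes (x0, y0, x1, y1) covering a viewing area.

    view is an inclusive rectangle (u0, v0, u1, v1) in the 360-degree image.
    """
    u0, v0, u1, v1 = view
    boxes = {}
    for v in range(v0, v1 + 1):
        for u in range(u0, u1 + 1):
            cam, x, y = table[(u, v)]
            if cam in boxes:
                bx0, by0, bx1, by1 = boxes[cam]
                boxes[cam] = (min(bx0, x), min(by0, y), max(bx1, x), max(by1, y))
            else:
                boxes[cam] = (x, y, x, y)
    return boxes

if __name__ == "__main__":
    table = build_inverse_table(64, 32, 32)
    # A viewing area straight ahead should only touch the front camera.
    print(regions_for_view(table, (28, 12, 35, 19)))
```

Because the table depends only on the camera geometry, it can be computed once and reused for every frame. Only the bounding boxes need to be sent to the controlled device.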

[0023] In this embodiment, when the robot 91 transmits the multiple camera feeds used to create a 360-degree video, the server 92 uses a pre-calculated inverse conversion table to determine the transmission area on the multiple camera images that corresponds to the viewing area on the 360-degree video. The server 92 notifies the controlled robot 91 of the transmission area, and the controlled robot 91 transmits only the transmission area from among the multiple camera images to the server 92. In this way, this embodiment significantly reduces the amount of video traffic transmitted from the controlled robot 91.

[0024] Figure 6 shows an example of the configuration of the VR system of this disclosure. The robot 91 includes a 360-degree video capture unit 11, a video encoding unit and video transmission unit 12, a robot operation unit 13, and a video control unit 14. The server 92 comprises a conversion processing unit 21, a video transmission unit 22, a transmission / reception unit 23, and an inverse conversion processing unit 24.

[0025] The 360-degree video capture unit 11 is a functional unit capable of capturing 360-degree video. For example, a dual fisheye system can be used to capture the surroundings of the robot 91 in all directions using two cameras. The video encoding unit and video transmission unit 12 are functional units capable of transmitting camera images from the 360-degree video capture unit 11 to the server 92. For example, they encode the images from each of the multiple cameras and transmit them wirelessly to the server 92. The robot operation unit 13 is a functional unit that performs arbitrary operations on the robot 91 based on operation information from the server 92. The video control unit 14 stores control information used to control the robot 91 and, based on this information, controls the camera images transmitted from the video encoding unit and video transmission unit 12. In this embodiment, the control information includes operation information from the server 92, movement information indicating that a part of the robot 91 has been moved by the robot operation unit 13, and gaze information obtained from the server 92.

[0026] The conversion processing unit 21 generates a 360-degree video using the multiple camera images captured by the 360-degree video capture unit 11. The video transmission unit 22 transmits the 360-degree video to the operator. As a result, the 360-degree video is displayed on the headset 93. The transmission / reception unit 23 receives operation information from the operator and the operator's gaze information obtained from the headset 93, and transmits them to the robot 91. The operator's gaze information obtained from the headset 93 represents the viewing position on the 360-degree video. The inverse conversion processing unit 24 uses the inverse conversion table to convert the viewing position on the 360-degree video to the viewing position on the dual fisheye video.

[0027] The video encoding unit and video transmission unit 12 transmit only the portion of the dual fisheye video P11 obtained by the inverse conversion processing unit 24 at the viewing position. This allows this embodiment to reduce the amount of data transmitted from the controlled robot 91.
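The selective transmission on the robot side can be sketched as a simple crop. This is an illustrative example only: frames are modeled as lists of pixel rows, the boxes use an inclusive (x0, y0, x1, y1) format, and the helper names `crop_region` and `pixel_saving` are hypothetical. The pixel ratio it reports is only a rough proxy for the saving, since the actual bitrate depends on the encoder.

```python
def crop_region(frame, box):
    """Return the sub-image inside an inclusive (x0, y0, x1, y1) box.

    frame is a list of rows, each row a list of pixel values.
    """
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]

def pixel_saving(frame_w, frame_h, boxes):
    """Fraction of pixels not sent when only the boxed regions are transmitted.

    Assumption: the boxes do not overlap.
    """
    sent = sum((x1 - x0 + 1) * (y1 - y0 + 1) for x0, y0, x1, y1 in boxes)
    return 1.0 - sent / (frame_w * frame_h)

if __name__ == "__main__":
    # Hypothetical 8x6 frame whose pixel value encodes its position.
    frame = [[10 * y + x for x in range(8)] for y in range(6)]
    region = crop_region(frame, (2, 1, 4, 3))
    print(region)
    print(pixel_saving(8, 6, [(2, 1, 4, 3)]))
```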

[0028] (Second embodiment) Figure 7 shows an example of gaze transition. In this embodiment, an example is shown in which gaze information is notified from the server 92 to the robot 91 at intervals τ. In the first embodiment, the following delays occur between the time the server 92 transmits the operator's gaze information (θ,ψ) to the robot 91 and the time the server 92 transmits the video corresponding to the gaze information (θ,ψ) to the operator: the transmission time t1 of the gaze information (θ,ψ), the video encoding time t2 in the robot 91, the video transmission time t3 from the robot 91 to the server 92, and the video decoding time t4 in the server 92.
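As a rough illustration of how these components add up, the sketch below sums t1 to t4 and converts the result to a number of video frames. The millisecond values and the frame rate are hypothetical, and the total delay experienced by the operator may include further components (for example, delivery from the server to the headset) that are not modeled here.

```python
import math

def round_trip_delay_ms(t1, t2, t3, t4):
    """Sum of gaze transmission, encoding, video transmission and decoding times (ms)."""
    return t1 + t2 + t3 + t4

def delay_in_frames(delay_ms, fps):
    """Number of whole frames that elapse during the delay, rounded up."""
    return math.ceil(delay_ms * fps / 1000)

if __name__ == "__main__":
    # Hypothetical timings in milliseconds, not taken from the disclosure.
    delay = round_trip_delay_ms(10, 15, 20, 5)
    print(delay, delay_in_frames(delay, 30))
```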

[0029] When the operator's viewing area moves, the delay in tracking these movements can potentially degrade image quality. Therefore, in this embodiment, the video control unit 14 predicts the temporal changes in gaze information and sets the transmission area. This improves the ability to track changes in gaze movement in this embodiment.

[0030] For example, let d be the delay from when the line-of-sight information and operation information are transmitted from the operator until the operator views the video frame processed based on that information. The video processed based on the line-of-sight information at time i is then actually viewed at time i + d. The video control unit 14 uses the following formula to calculate the transmission area $S_{i+d}$ at time i + d.

$$S_{i+d} = v_i + \alpha M_{i,w} + \beta \sigma_{i,w} + \gamma a_{i,w} + \delta \tag{1}$$

Here, the parameters are as follows.
  • $S_{i+d}$: the area to be transmitted at time i (viewed at time i + d)
  • $v_i$: the viewing area at time i
  • $M_{i,w}$: the maximum value of the line-of-sight movement amount in the past w frames from time i
  • $\sigma_{i,w}$: the variance of the line of sight in the past w frames from time i
  • $a_{i,w}$: the acceleration of the line of sight in the past w frames from time i
  • $\alpha, \beta, \gamma, \delta$: constants

[0031] $\alpha M_{i,w}$ is a term that reflects the maximum movement amount of the line of sight. The transmission area is expanded on the assumption that the movement amount from time i to time i + d is less than or equal to the maximum movement amount of the line of sight in the past w frames. When the direction of movement is not considered, α = 2 can be adopted to cover movement to both sides (up and down, or left and right) of the viewing position at time i. $\beta \sigma_{i,w}$ is a term that reflects the tendency of the line-of-sight change. A small variance indicates small line-of-sight changes, and a large variance indicates large line-of-sight changes. The variance of the line-of-sight information differs for each video and each operator. $\gamma a_{i,w}$ is a term that reflects the acceleration of the line-of-sight change. It quickly captures the points at which the line-of-sight movement changes and allows the transmission area to follow that movement. The terms based on these statistics can be used alone or in combination by setting the constants appropriately.
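Equation (1) can be sketched in code for a single axis. This is a simplified illustration under stated assumptions, not the implementation of the video control unit 14: the gaze is a list of angles (one per frame), the viewing and transmission areas are treated as scalar extents in the same unit, the acceleration term is taken as the magnitude of the most recent change in per-frame movement, and the function name `transmission_extent` is hypothetical.

```python
from statistics import pvariance

def transmission_extent(gaze, view_extent, w, alpha=2.0, beta=0.0, gamma=0.0, delta=0.0):
    """Evaluate S = v + alpha*M + beta*sigma + gamma*a + delta along one axis.

    gaze        : gaze angles, one per frame, most recent last
    view_extent : extent v of the viewing area at the current frame
    w           : number of past frames used for the statistics (at least 3)
    """
    window = gaze[-w:]
    if len(window) < 3:
        raise ValueError("need at least 3 gaze samples in the window")
    # Per-frame line-of-sight movement within the window
    moves = [b - a for a, b in zip(window, window[1:])]
    m = max(abs(step) for step in moves)   # M: maximum movement amount
    sigma = pvariance(window)              # sigma: variance of the line of sight
    a = abs(moves[-1] - moves[-2])         # a: assumed definition of acceleration
    return view_extent + alpha * m + beta * sigma + gamma * a + delta

if __name__ == "__main__":
    # Hypothetical gaze history in degrees, accelerating to the right.
    history = [0.0, 1.0, 3.0, 6.0, 10.0]
    print(transmission_extent(history, view_extent=90.0, w=4))
    print(transmission_extent(history, 90.0, 4, alpha=2.0, beta=1.0, gamma=1.0, delta=0.5))
```

Setting β and γ to zero reduces the calculation to the maximum-movement term alone, which corresponds to using a single term of equation (1).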

[0032] By calculating the transmission area $S_{i+d}$ using equation (1), the ranges A1, A2, and A3 of the transmission area can be changed to follow the operator's gaze transition, as shown in Figure 8. As a result, as shown in Figure 9, the viewing area, which was the Dual fisheye image P21 at time (t-τ), expands to the Dual fisheye image P22 at time (t-τ-ε) and then to the Dual fisheye image P23 at time t, following the operator's gaze transition.

[0033] In this embodiment, an example is shown in which the video control unit 14 predicts the time change of gaze information and sets the transmission area based on this prediction, but this disclosure is not limited to this. For example, the video control unit 14 may predict the time change of the transmission area and set the transmission area based on this prediction.

[0034] (Third embodiment) The server 92 may perform the prediction of the temporal changes in gaze information. For example, the inverse conversion processing unit 24 may predict the temporal changes in the viewing area in the same way as in the second embodiment. In this embodiment, however, the inverse conversion processing unit 24 determines the transmission area to be transmitted from the robot 91 by applying the inverse conversion to the viewing area obtained by the prediction.

[0035] (Fourth embodiment) This disclosure allows the server 92 to be mounted on the robot 91. Even in this configuration, this disclosure reduces the amount of video processing required, thus lowering the performance requirements of the mounted server 92.

[0036] As explained above, this disclosure can adaptively determine the video area to be reduced based on the control amount derived from the control information. Furthermore, this disclosure can reduce video areas that cannot be viewed in time based on the temporal progression of gaze information and operation information. As a result, this disclosure can streamline VR video transmission for realizing remote control and tele-existence via the robot 91.

[Explanation of Symbols]

[0037]
11: 360-degree video capture unit
12: Video encoding unit and video transmission unit
13: Robot operation unit
14: Video control unit
21: Conversion processing unit
22: Video transmission unit
23: Transmission / reception unit
24: Inverse conversion processing unit
81: Wireless network
82: Communication network
91: Robot
92: Server
93: Headset

Claims

1. A server device connected to a controlled device via a wireless network, comprising: a conversion processing unit that acquires, from the controlled device, multiple camera images captured by multiple cameras and generates a 360-degree image from the multiple camera images; and a conversion table that defines the correspondence between positions in the multiple camera images and positions in the 360-degree image, wherein, when a viewing area in the 360-degree image is specified, the server device identifies the region in the multiple camera images corresponding to the viewing area by referring to the inverse conversion table of the conversion table, and transmits gaze information including the identified region to the controlled device.

2. A VR system comprising: the server device according to claim 1; and the controlled device including the multiple cameras, wherein the controlled device receives the gaze information from the server device, and transmits to the server device the camera images of the region identified by the gaze information from among the multiple camera images.

3. A method executed by a server device connected to a controlled device via a wireless network, wherein the server device comprises: a conversion processing unit that acquires, from the controlled device, multiple camera images captured by multiple cameras and generates a 360-degree image from the multiple camera images; and a conversion table that defines the correspondence between positions in the multiple camera images and positions in the 360-degree image, the method comprising: when a viewing area in the 360-degree image is specified, identifying the region in the multiple camera images corresponding to the viewing area by referring to the inverse conversion table of the conversion table; and transmitting gaze information including the identified region to the controlled device.

4. A program for causing a server device connected to a controlled device via a wireless network to execute: a procedure for acquiring, from the controlled device, multiple camera images captured by multiple cameras and generating a 360-degree image from the multiple camera images; a procedure for, when a viewing area in the 360-degree image is specified, identifying the region in the multiple camera images that corresponds to the viewing area by referring to an inverse conversion table of a conversion table that defines the correspondence between positions in the multiple camera images and positions in the 360-degree image; and a procedure for transmitting gaze information including the identified region to the controlled device.