Image processing device, image processing method, and program
The image processing apparatus generates virtual viewpoint images with preventive objects or altered properties to prevent accurate three-dimensional shape reconstruction, ensuring privacy and security in virtual viewpoint systems.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- CANON KK
- Filing Date
- 2024-12-20
- Publication Date
- 2026-07-02
Abstract
Description
Technical Field
[0001] The present disclosure relates to an image processing apparatus, an image processing method, and a program, and particularly to a technique for generating a virtual viewpoint video.
Background Art
[0002] There is a system that generates a virtual viewpoint video from a virtual viewpoint specified by a user, based on images captured by a plurality of cameras. In the image processing system described in Patent Document 1, an image processing apparatus extracts a foreground image from each image of a subject captured by a camera by comparing the captured image with the background. An image generation apparatus then estimates the three-dimensional shape of the subject based on the foreground images obtained from the plurality of captured images. In this system, the time and the viewpoint can be manipulated freely, so a subject at a given time can be viewed from an arbitrary viewpoint.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] Three-dimensional reconstruction techniques, which estimate the three-dimensional shape of a stationary subject from images captured from a plurality of directions, have become widespread. Photogrammetry and NeRF (Neural Radiance Fields) are known examples of such techniques. Moreover, these techniques are now readily available through applications and web services.
[0005] By using a system that generates virtual viewpoint images in which the time and the viewpoint can be manipulated, users can easily obtain images of a subject from multiple viewpoints at a specific time. By applying three-dimensional reconstruction techniques to these images, users could potentially reconstruct the three-dimensional shape of the subject.
[0006] In a technology for generating a virtual viewpoint image of a subject, the present disclosure aims to make it difficult to estimate the three-dimensional shape of the subject from the generated virtual viewpoint image.
Means for Solving the Problem
[0007] An image processing apparatus according to one embodiment has the following configuration. That is, the apparatus comprises: acquisition means for acquiring multiple pieces of information, each indicating a time corresponding to an object, the position of a virtual viewpoint, and the line-of-sight direction from the virtual viewpoint; determination means for determining whether the multiple pieces of information satisfy predetermined conditions; generation means for generating, based on each of the multiple pieces of information, a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time; and means for changing the generated virtual viewpoint image according to the determination result of the determination means.
Effects of the Invention
[0008] In a technology for generating a virtual viewpoint image of a subject, it becomes difficult to estimate the three-dimensional shape of the subject based on the generated virtual viewpoint image.
Brief Description of the Drawings
[0009]
[Figure 1] A block diagram of an image processing device according to one embodiment.
[Figure 2] A flowchart of an image processing method according to one embodiment.
[Figure 3] A diagram showing an example of a virtual viewpoint image including a prevention object.
[Figure 4] A diagram showing an example of a three-dimensional reconstruction result based on video including a prevention object.
[Figure 5] A diagram illustrating an example of changing the relative positions of the subject and the background.
[Figure 6] A block diagram showing an example of a computer hardware configuration.
Modes for Carrying Out the Invention
[0010] The embodiments will be described in detail below with reference to the attached drawings. Note that the following embodiments do not limit the scope of the claims. While the embodiments describe multiple features, not all of these features are necessarily essential, and the features may be combined in any way. Furthermore, in the attached drawings, identical or similar configurations are given the same reference numerals, and redundant descriptions are omitted.
[0011] Figure 1 shows an example of the configuration of an image processing system according to one embodiment. In this embodiment, the image processing system includes an image processing device and a user terminal 10. The image processing device according to this embodiment generates virtual viewpoint images, and can also generate a virtual viewpoint video composed of multiple frames, each of which is a virtual viewpoint image. The image processing device includes a synchronization unit 2, a shape estimation unit 3, a storage unit 4, and a video generation unit 6. The image processing device may have multiple video generation units 6, in which case they may be connected to a single storage unit 4.
[0012] Furthermore, the user terminal 10 has a viewpoint instruction unit 5 and a display unit 7. The user terminal 10 may also include a storage unit (not shown) for storing virtual viewpoint images acquired by the user terminal 10.
[0013] The general operation of each component of the image processing system according to one embodiment will be described. First, the multiple imaging units 1 perform imaging in synchronization with each other based on the synchronization signal from the synchronization unit 2. In this way, the multiple imaging units 1 generate images (texture images) from multiple viewpoints. The imaging units 1 output the texture images obtained by imaging to the shape estimation unit 3. The multiple imaging units 1 can be arranged to surround the shooting area where the subject is located. With this arrangement, the multiple imaging units 1 can photograph the subject from multiple directions.
[0014] The shape estimation unit 3 estimates the three-dimensional shape of the subject using the texture images input from the imaging units 1. For example, the shape estimation unit 3 extracts the silhouette of the subject from each of the texture images captured from multiple viewpoints. Then, based on the silhouettes thus obtained, the shape estimation unit 3 generates data indicating the three-dimensional shape of the subject using a method such as the visual volume intersection method (visual hull). The shape estimation unit 3 then outputs the generated three-dimensional shape data and the texture images to the storage unit 4.
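The visual volume intersection (visual hull) step can be illustrated with a minimal voxel-carving sketch. It assumes a voxel grid and toy orthographic projections with hand-made silhouettes; the function name `carve_visual_hull` and the view setup are hypothetical, and real imaging units 1 would use calibrated perspective cameras. It shows why more viewpoints constrain the shape: one silhouette leaves a whole column of voxels, while three orthogonal silhouettes isolate a single voxel.

```python
from itertools import product

def carve_visual_hull(grid_size, views):
    """Return voxels whose projection lies inside the silhouette in every view.

    views: list of (project, silhouette) pairs; project maps a voxel index
    (x, y, z) to a pixel (u, v), and silhouette is the set of foreground pixels.
    """
    hull = set()
    for voxel in product(range(grid_size), repeat=3):
        # Carve away any voxel that falls outside at least one silhouette.
        if all(project(voxel) in silhouette for project, silhouette in views):
            hull.add(voxel)
    return hull

# Toy orthographic views along the z, x and y axes (a simplifying assumption).
front = (lambda v: (v[0], v[1]), {(1, 1)})
side = (lambda v: (v[1], v[2]), {(1, 1)})
top = (lambda v: (v[0], v[2]), {(1, 1)})

one_view = carve_visual_hull(3, [front])                  # a column of 3 voxels
three_views = carve_visual_hull(3, [front, side, top])    # only (1, 1, 1) survives
```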
[0015] Here, the subject is an object that is the target of three-dimensional shape generation. Examples of subjects include people, items handled by people, and animals. Furthermore, the subject may be a virtual object generated using CG or CAD technology. In this specification, these subjects are referred to as subject objects.
[0016] The storage unit 4 stores data (material data) used for generating virtual viewpoint videos. The material data includes, for example, the texture images and the three-dimensional shape data of the subject object input from the shape estimation unit 3. The storage unit 4 stores the material data in association with information regarding time. For example, the material data for a specific imaging time can include the texture images at that imaging time and the three-dimensional shape data for that imaging time. The texture images for a specific imaging time can be the captured images obtained by each of the plurality of imaging units 1 at that time, and the three-dimensional shape data for that time can represent the three-dimensional shape of the subject estimated from those captured images. The storage unit 4 can thus store material data for each of a plurality of times. The storage unit 4 also stores camera parameters such as the position, orientation, and optical characteristics of each of the plurality of imaging units 1.
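One way to picture material data associated with time is a small store keyed by imaging time. This is a sketch under stated assumptions: times are frame indices, images and shapes are placeholder strings, and `MaterialData` and `MaterialStore` are hypothetical names rather than part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class MaterialData:
    time: int                        # imaging time, here a frame index (assumption)
    texture_images: Dict[str, Any]   # imaging-unit id -> image captured at `time`
    shape_data: Any                  # 3D shape estimated from those images

@dataclass
class MaterialStore:
    camera_params: Dict[str, Any] = field(default_factory=dict)  # per imaging unit
    by_time: Dict[int, MaterialData] = field(default_factory=dict)

    def put(self, material: MaterialData) -> None:
        # Associate the material data with its imaging time.
        self.by_time[material.time] = material

    def get(self, time: int) -> MaterialData:
        # Fetch everything needed to render the subject at `time`.
        return self.by_time[time]

store = MaterialStore(camera_params={"cam0": {"f": 1200.0}, "cam1": {"f": 1200.0}})
store.put(MaterialData(10, {"cam0": "img0@10", "cam1": "img1@10"}, "hull@10"))
store.put(MaterialData(11, {"cam0": "img0@11", "cam1": "img1@11"}, "hull@11"))
```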
[0017] Furthermore, the storage unit 4 stores background data used for generating the background of the virtual viewpoint video. In this specification, an object serving as the background is referred to as a background object. The background data includes the three-dimensional shape data of the background object and the background texture image. In one embodiment, the same background data is used to generate the virtual viewpoint image regardless of the time information indicated by the operation information. The storage unit 4 may store other data such as audio data.
[0018] The user terminal 10 includes a viewpoint instruction unit 5 and a display unit 7. Using the user terminal 10, the user can operate the virtual viewpoint and the time of the playback target, and can view the virtual viewpoint video. The viewpoint instruction unit 5 can have a user interface such as a touch panel, a joystick, or a jog dial. The display unit 7 can be a display. The virtual viewpoint video created by the video generation unit 6, described later, is successively displayed on the display unit 7.
[0019] Based on input to the user interface, the viewpoint instruction unit 5 generates information indicating the time corresponding to the subject object, the position of the virtual viewpoint, and the line-of-sight direction from the virtual viewpoint, and outputs the generated information to the video generation unit 6. The time corresponding to the subject object is the time associated with the material data used to generate the virtual viewpoint image. That is, when a time corresponding to the subject object is specified, the virtual viewpoint image is generated from the texture images of the subject at the specified time and the three-dimensional shape data of the subject estimated from the images captured at that time. In this specification, the information indicating the time corresponding to the subject object is referred to as time information, and the information for specifying the virtual viewpoint is referred to as virtual viewpoint information. The virtual viewpoint information indicates the position of the virtual viewpoint and the line-of-sight direction from the virtual viewpoint. The virtual viewpoint information can include information indicating the position and line-of-sight direction (or pose) of the virtual viewpoint, corresponding to the extrinsic parameters of a camera, and information indicating the focal length and angle of view of the virtual viewpoint, corresponding to the intrinsic parameters of a camera.
[0020] For example, the viewpoint instruction unit 5 can generate operation information indicating operations on the time of the virtual viewpoint image and on the viewpoint. The operation information can include the virtual viewpoint information and the time information. In this specification, the time corresponding to the subject object indicated by the time information may be referred to as the time of the virtual viewpoint image; a virtual viewpoint image at a specific time includes the subject object corresponding to that time. The operation information can thus include time information specifying the imaging time to be reproduced. In the present embodiment, the user can operate the time and the viewpoint of the virtual viewpoint image so that at least one of them changes over time. That is, the viewpoint instruction unit 5 can generate a plurality of pieces of information, each indicating the time corresponding to the subject object, the position of the virtual viewpoint, and the line-of-sight direction from the virtual viewpoint. In other words, the viewpoint instruction unit 5 can generate operation information indicating a plurality of combinations of virtual viewpoint information and time information.
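Operation information as a sequence of combinations of virtual viewpoint information and time information might be modeled as below. The field layout (position and line of sight as extrinsic parameters, focal length and angle of view as intrinsic parameters) follows the description above; the names `VirtualViewpoint`, `OperationEntry`, and `orbit_while_paused` and all numeric values are hypothetical. The example produces the kind of sequence that paragraphs [0030] and [0031] are concerned with: the virtual viewpoint circles the subject while the time stays fixed.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(frozen=True)
class VirtualViewpoint:
    position: Vec3          # extrinsic: position of the virtual viewpoint
    line_of_sight: Vec3     # extrinsic: viewing direction
    focal_length: float     # intrinsic
    angle_of_view: float    # intrinsic, in degrees

@dataclass(frozen=True)
class OperationEntry:
    viewpoint: VirtualViewpoint  # virtual viewpoint information
    time: int                    # time information (frame index of material data)

def orbit_while_paused(frame: int, radius: float, steps: int) -> List[OperationEntry]:
    """Circle a subject standing at the origin while the time stays at `frame`."""
    entries = []
    for k in range(steps):
        a = 2 * math.pi * k / steps
        position = (radius * math.cos(a), radius * math.sin(a), 1.5)
        look = (-math.cos(a), -math.sin(a), 0.0)   # horizontal, toward the vertical axis
        entries.append(OperationEntry(VirtualViewpoint(position, look, 35.0, 60.0), frame))
    return entries

ops = orbit_while_paused(frame=120, radius=5.0, steps=8)
```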
[0021] The video generation unit 6 generates virtual viewpoint images based on multiple pieces of information, each indicating the time corresponding to the subject object, the position of the virtual viewpoint, and the line-of-sight direction from the virtual viewpoint. Specifically, the video generation unit 6 generates a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time. The video generation unit 6 can generate virtual viewpoint images based on the operation information; that is, according to the operation information, it can generate virtual viewpoint images showing the subject object at each specified time from each specified viewpoint. Specifically, the video generation unit 6 can generate a virtual viewpoint image at the specified time given by the time information in the operation information, from the specified viewpoint given by the virtual viewpoint information in the operation information. The method of generating the virtual viewpoint image is not particularly limited. For example, the video generation unit 6 can acquire, from the storage unit 4, the material data corresponding to the specified time indicated by the time information in the input operation information. The video generation unit 6 can then use the three-dimensional shape data of the subject object and the multiple texture images of the subject object included in the acquired material data to generate a virtual viewpoint image at the virtual viewpoint indicated by the virtual viewpoint information in the operation information. For example, the video generation unit 6 can map the texture images onto a three-dimensional shape model of the subject object, represented by the three-dimensional shape data, by referring to the camera parameters of the imaging units 1.
The video generation unit 6 can also place the three-dimensional shape model in a virtual space, and then generate a virtual viewpoint image by rendering the virtual space, including the subject object, as seen from the virtual viewpoint.
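Texture mapping and rendering both rely on projecting 3D points through a camera model. A minimal pinhole-projection sketch follows, assuming a world-to-camera rotation `R`, a translation `t`, a focal length `f` in pixels, and a principal point `c`; the dictionary layout and the name `project` are assumptions. With the camera parameters of an imaging unit 1 it locates where a surface point appears in a texture image, and with the virtual viewpoint information it locates where that point lands in the virtual viewpoint image.

```python
def project(point, cam):
    """Pinhole projection of a world point into a camera image.

    cam: {'R': 3x3 world-to-camera rotation (nested lists), 't': translation,
          'f': focal length in pixels, 'c': principal point (cx, cy)}.
    Returns pixel (u, v), or None if the point is behind the camera.
    """
    R, t = cam["R"], cam["t"]
    # Extrinsics: camera coordinates X_c = R X_w + t.
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    # Intrinsics: perspective division, focal scaling, principal-point offset.
    return (cam["f"] * xc[0] / xc[2] + cam["c"][0],
            cam["f"] * xc[1] / xc[2] + cam["c"][1])

cam = {"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]], "t": [0, 0, 5],
       "f": 100.0, "c": (320.0, 240.0)}
uv = project((1.0, 0.0, 0.0), cam)   # (340.0, 240.0)
```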
[0022] Furthermore, using the three-dimensional shape data of the background object, the video generation unit 6 can generate a virtual viewpoint image showing both the subject object and the background object from the specified viewpoint. For example, the video generation unit 6 can acquire the background data stored in the storage unit 4 and map the background texture image onto the three-dimensional shape model of the background object represented by the three-dimensional shape data contained in the background data. In this way, the video generation unit 6 can render the three-dimensional shape model of the subject object and the three-dimensional shape model of the background object together. The video generation unit 6 can output the virtual viewpoint image thus generated to the display unit 7.
[0023] The video generation unit 6 can sequentially generate virtual viewpoint images corresponding to each combination of virtual viewpoint information and time information. The video generation unit 6 can output a virtual viewpoint video composed of the multiple virtual viewpoint images thus generated to the display unit 7. For example, the operation information can specify that the time be advanced along the time axis. In this case, the display unit 7 displays the virtual viewpoint image for each time point. In this way, the display unit 7 plays back a virtual viewpoint video that changes along the time axis.
[0024] The video generation unit 6 further determines whether the multiple pieces of information, each indicating the time corresponding to the subject object, the position of the virtual viewpoint, and the line-of-sight direction from the virtual viewpoint, satisfy predetermined conditions, and controls the generated virtual viewpoint image according to the determination result. For example, the video generation unit 6 can control the generated virtual viewpoint image according to whether the time-related operation indicated by the operation information satisfies the predetermined conditions. When it determines that the multiple pieces of information satisfy the predetermined conditions, the video generation unit 6 modifies the virtual viewpoint image so that it becomes more difficult to reconstruct the three-dimensional shape of the subject object from the virtual viewpoint image. In one embodiment, no such modification is made when the multiple pieces of information do not satisfy the predetermined conditions, which prevents the quality of the virtual viewpoint image from deteriorating in that case.
[0025] The video generation unit 6 includes an operation detection unit 61, a generation processing unit 62, and a prevention processing unit 63. The operation detection unit 61 acquires multiple pieces of information indicating the time corresponding to an object, the position of a virtual viewpoint, and the line-of-sight direction from the virtual viewpoint, and determines whether the multiple pieces of information satisfy predetermined conditions. For example, the operation detection unit 61 acquires operation information and determines whether the time-related operation indicated by the operation information satisfies the predetermined conditions. In this way, the operation detection unit 61 can monitor the time information and detect when a specific time-related operation has been performed. The operation detection unit 61 outputs the detection result to the prevention processing unit 63.
[0026] Based on each of the multiple pieces of information, the generation processing unit 62 generates a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time. For example, according to the operation information, the generation processing unit 62 can sequentially generate virtual viewpoint images showing the subject object at each specified time from each specified viewpoint, in the manner described above.
[0027] The prevention processing unit 63 performs control to make it more difficult to reconstruct the three-dimensional shape of the subject object based on the virtual viewpoint image. In one embodiment, the prevention processing unit 63 can add a prevention indicator to the virtual viewpoint image generated by the generation processing unit 62 based on the output of the operation detection unit 61. This prevention indicator is added in such a way that it makes it more difficult to reconstruct the three-dimensional shape of the subject object based on the virtual viewpoint image output from the video generation unit 6. In this case, the video generation unit 6 outputs a virtual viewpoint image to which the prevention indicator has been added. Specific examples of prevention indicators will be described later.
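The interplay of the three units can be sketched as a per-frame loop. This is a simplified illustration, not the disclosed implementation: `condition`, `render`, and `add_prevention` are hypothetical callables standing in for the operation detection unit 61, the generation processing unit 62, and the prevention processing unit 63, and the sliding `window` of recent operations is an assumed parameter.

```python
from typing import Callable, List, Sequence

def generate_video(operations: Sequence, condition: Callable[[Sequence], bool],
                   render: Callable, add_prevention: Callable, window: int = 5) -> List:
    frames = []
    for i, op in enumerate(operations):
        # Operation detection unit 61: inspect the most recent `window` operations.
        recent = operations[max(0, i - window + 1): i + 1]
        # Prevention processing unit 63: add prevention objects only if the condition holds.
        extras = add_prevention(op) if condition(recent) else []
        # Generation processing unit 62: render the frame with any extra objects.
        frames.append(render(op, extras))
    return frames

# Example: operations are just requested times; three equal times in a row count as a pause.
times = [0, 1, 2, 2, 2, 2, 2]
frames = generate_video(
    times,
    condition=lambda recent: len(recent) == 3 and len(set(recent)) == 1,
    render=lambda t, extras: (t, len(extras)),
    add_prevention=lambda t: ["mirror_sphere"],
    window=3,
)
```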
[0028] Next, the processing performed by the image processing system according to this embodiment will be explained with reference to the flowchart in Figure 2.
[0029] In S201, the user operates the viewpoint instruction unit 5 to control the virtual viewpoint and the time. The viewpoint instruction unit 5 then inputs operation information corresponding to the user's operation into the video generation unit 6.
[0030] In S202, the operation detection unit 61 acquires multiple pieces of information, each indicating the time corresponding to the object, the position of the virtual viewpoint, and the line-of-sight direction from the virtual viewpoint, and determines whether the multiple pieces of information satisfy predetermined conditions. The operation detection unit 61 may determine whether a predetermined number of the most recently acquired pieces of information satisfy the predetermined conditions. The predetermined conditions are not particularly limited. They may be conditions relating to changes in the time indicated by the multiple pieces of information, for example conditions on the time-related operation indicated by the operation information. The predetermined conditions may also be satisfied when an operation is performed that facilitates reconstruction of the three-dimensional shape of the subject object from the virtual viewpoint images. For example, generating virtual viewpoint images of the subject object from multiple virtual viewpoints at the same or nearly the same time may make it easy to reconstruct the three-dimensional shape of the subject object. If the multiple pieces of information satisfy the predetermined conditions, the process proceeds to S203; otherwise, it proceeds to S204.
[0031] For example, a predetermined condition may be met when time is stopped. That is, the predetermined condition may include pausing playback. The operation detection unit 61 can detect the pausing of playback based on the time information included in the operation information. For example, the operation detection unit 61 can determine that playback is paused if the times indicated by a predetermined number of recently acquired pieces of information match. Through such an operation, the user can generate virtual viewpoint images of a subject object from multiple virtual viewpoints at the same time. As another example, a predetermined condition may be met when time is changed at a speed below a threshold. For example, the predetermined condition may include performing playback at an extremely slow speed in which the subject is substantially motionless. Through such an operation, the user can generate virtual viewpoint images of a subject object that is substantially motionless from multiple virtual viewpoints.
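Both time-based conditions can be checked from the recent time information alone. In this sketch, times are content timestamps in seconds, `frame_interval` is the assumed wall-clock interval between requests, and the function names and threshold values are hypothetical. The playback rate is measured as content seconds per wall-clock second, so normal playback gives about 1.0 and a pause gives 0.

```python
def is_paused(times, n):
    """True if the last n requested times are identical (playback stopped)."""
    recent = times[-n:]
    return len(recent) == n and len(set(recent)) == 1

def is_slow(times, n, frame_interval, threshold):
    """True if, over the last n requests (n >= 2), content time advanced at a rate
    below `threshold` content seconds per wall-clock second."""
    recent = times[-n:]
    if n < 2 or len(recent) < n:
        return False
    rate = abs(recent[-1] - recent[0]) / ((n - 1) * frame_interval)
    return rate < threshold

normal = [k / 60 for k in range(4)]             # 1x playback at 60 requests/s
crawl = [10.0 + 0.001 * k for k in range(4)]    # about 0.06x playback
```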
[0032] Furthermore, the predetermined conditions may be satisfied when the same time is repeated. That is, the predetermined conditions may include repeatedly playing back the same time or time period. A time-related operation that satisfies the predetermined conditions may be one that repeats the same time or time period a predetermined number of times or more. The predetermined number of times is not particularly limited and can be any number of two or more. The operation detection unit 61 can record a generation history of virtual viewpoint images, which may include the time information, indicated by the operation information, corresponding to each virtual viewpoint image. By referring to this generation history, the operation detection unit 61 can detect that the same time or time period is being played back repeatedly. By repeating the same time while changing the virtual viewpoint on each playback, the user could obtain virtual viewpoint images of the subject object at that time from multiple virtual viewpoints.
[0033] In S203, the prevention processing unit 63 performs processing to change the virtual viewpoint image generated in S204. In this embodiment, the prevention processing unit 63 adds an object; in this specification, the added object is called a prevention object. The prevention processing unit 63 can add a three-dimensional object as the prevention object. In this case, the prevention object is placed in the virtual space in addition to the subject object included in the material data corresponding to the specified time of the display target.
[0034] A prevention object may be an object that changes depending on the position of the virtual viewpoint, for example an object whose color, shape, or arrangement changes depending on the position of the virtual viewpoint. In this specification, processing that depends on the position of the virtual viewpoint includes processing that depends on the direction of the virtual viewpoint. The direction of the virtual viewpoint may be the direction of the optical axis of the virtual viewpoint (i.e., the line of sight), or the direction from the virtual viewpoint to an object such as a prevention object. For example, a prevention object may be an object whose color, shape, or arrangement changes depending on the direction of the virtual viewpoint, so that they differ for each direction of the virtual viewpoint.
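A generation history that detects repeated playback of the same time period could look like the following. Grouping times into fixed-length periods and counting a pass only on entry to a period are assumptions made for this sketch; `GenerationHistory` and its parameters are hypothetical.

```python
from collections import Counter

class GenerationHistory:
    """Counts playback passes over fixed-length time periods."""

    def __init__(self, period=1.0):
        self.period = period        # length of one time period, in seconds
        self.passes = Counter()     # period index -> number of passes
        self.current = None         # period of the previously rendered image

    def record(self, t):
        p = int(t // self.period)
        # Count a new pass only on entering a period, not for every frame within it.
        if p != self.current:
            self.passes[p] += 1
        self.current = p

    def repeated(self, t, min_repeats=2):
        return self.passes[int(t // self.period)] >= min_repeats

history = GenerationHistory(period=1.0)
for t in [0.0, 0.5, 1.0, 1.5, 2.0, 1.0, 1.5]:   # plays 0 to 2 s, then jumps back to 1 s
    history.record(t)
```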
[0035] In one embodiment, the color of the prevention object in the virtual viewpoint image changes according to the position of the specified viewpoint. For example, the prevention object may have surface properties whose color changes according to the position of the specified viewpoint. In one embodiment, the prevention object is an object with a mirror surface. The image reflected on a mirror surface changes with the viewing direction, so the color of the surface changes according to the position of the virtual viewpoint. The shape of such a mirror-surfaced object is difficult to estimate using photogrammetry or similar techniques. Figure 4 shows an example of the result of estimating, by photogrammetry, the three-dimensional shape of an object with a mirror surface. In Figure 4, the mirror-surfaced object is detected as an irregularly shaped object 430; that is, its estimated shape is inaccurate. Furthermore, the irregularly shaped object 430 is joined to the subject object 410, such as a person, and to the background object 420. As a result, the estimated shapes of these objects are also inaccurate, as shown in parts 411 and 412, and the color information of the joined portions is likewise inaccurate. In addition, the shape estimation of parts of an object hidden by other objects becomes inaccurate. Adding such occluding objects therefore makes it more difficult to reconstruct the three-dimensional shape of the subject object from the generated virtual viewpoint images. At the same time, adding objects with mirror surfaces causes little discomfort to a user viewing the virtual viewpoint image.
[0036] Furthermore, the prevention object may be semi-transparent. The color of such an object changes according to the color of the objects behind it, and therefore also changes depending on the position of the virtual viewpoint.
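The view dependence of a mirror surface follows from the reflection law: for a unit surface normal n and viewing direction d, the reflected direction is r = d - 2(d·n)n. The sketch below, with hypothetical names and a toy two-region environment, shows that the same surface point reflects different parts of the surroundings when viewed from two directions, which conflicts with the largely view-independent appearance that multi-view reconstruction typically assumes.

```python
def reflect(d, n):
    """Reflect viewing direction d about unit normal n: r = d - 2(d.n)n."""
    k = 2 * sum(a * b for a, b in zip(d, n))
    return tuple(a - k * b for a, b in zip(d, n))

def environment(r):
    """Toy environment (hypothetical): +x half is bright sky, -x half is dark stands."""
    return "sky" if r[0] > 0 else "stands"

mirror_normal = (0, 0, 1)   # horizontal mirror facing up
from_west = (1, 0, -1)      # looking down toward +x
from_east = (-1, 0, -1)     # looking down toward -x

seen_from_west = environment(reflect(from_west, mirror_normal))   # "sky"
seen_from_east = environment(reflect(from_east, mirror_normal))   # "stands"
```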
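Semi-transparency can be modeled with standard alpha compositing, C = αF + (1 - α)B, where F is the prevention object's color, α its opacity, and B the color behind it. A minimal sketch (the function name `over` is hypothetical) shows the same semi-transparent object taking different colors over different backgrounds, which is what happens when the virtual viewpoint moves and a different background lines up behind it.

```python
def over(fg, alpha, bg):
    """'Over' compositing of a semi-transparent color fg (opacity alpha) onto bg."""
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))

white_glass = (1.0, 1.0, 1.0)
seen_over_black = over(white_glass, 0.5, (0.0, 0.0, 0.0))   # (0.5, 0.5, 0.5)
seen_over_red = over(white_glass, 0.5, (1.0, 0.0, 0.0))     # (1.0, 0.5, 0.5)
```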
[0037] The type of prevention object is not particularly limited. A prevention object may be, for example, an object with unrealistic characteristics that differ from an object existing in real space. For example, the color, shape, or placement of the prevention object in virtual space may change depending on the position of a specified viewpoint or for each virtual viewpoint image generated. The prevention processing unit 63 may set the color, shape, or placement of the prevention object so that the color, shape, or placement of the prevention object changes depending on the position of the virtual viewpoint. The prevention processing unit 63 can change the color, shape, or placement of the prevention object when the position of the virtual viewpoint changes. In this specification, changing the color of an object includes changing the transparency of the object.
[0038] In another example, the prevention processing unit 63 may randomly change the color, shape, or placement of the prevention object. For example, the prevention processing unit 63 can change the color, shape, or placement of the prevention object for each virtual viewpoint image that is generated. In this embodiment, the prevention processing unit 63 changes the color, shape, or placement of the object placed in the virtual space before the rendering process for generating the virtual viewpoint image. In this way, the prevention object can have the unrealistic characteristic of having its color, shape, or placement fluctuate at the same specified time.
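Per-image random variation might be implemented by perturbing the prevention object's parameters before each render. The parameter names, ranges, and dictionary layout below are assumptions; a seeded `random.Random` is used only so the example is reproducible.

```python
import random

def jitter_prevention_object(base, rng, max_shift=0.2, max_scale=0.3):
    """Return randomly perturbed prevention-object parameters for one rendered image."""
    return {
        # Shift each coordinate by up to +/- max_shift.
        "position": tuple(p + rng.uniform(-max_shift, max_shift) for p in base["position"]),
        # Scale by a factor in [1 - max_scale, 1 + max_scale].
        "scale": base["scale"] * (1 + rng.uniform(-max_scale, max_scale)),
        # Pick a fresh random RGB color.
        "color": tuple(rng.random() for _ in range(3)),
    }

base = {"position": (0.0, 0.0, 0.0), "scale": 1.0}
rng = random.Random(0)
# Three images rendered for the same specified time get different prevention objects.
frames = [jitter_prevention_object(base, rng) for _ in range(3)]
```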
[0039] For example, the prevention processing unit 63 can change the texture image of the prevention object according to the position of the virtual viewpoint. The prevention processing unit 63 can also change the color of the object according to the position of the virtual viewpoint. For example, the prevention processing unit 63 can set a mixture of multiple colors as the object's color. In this case, the weight of each of the multiple colors can be changed according to the position of the virtual viewpoint. Furthermore, the prevention processing unit 63 can change the transparency of the object according to the position of the virtual viewpoint. The prevention processing unit 63 may also deform the prevention object according to the position of the virtual viewpoint. For example, the prevention processing unit 63 may change the size of the prevention object according to the position of the virtual viewpoint. Furthermore, the prevention processing unit 63 may translate or rotate the position of the prevention object according to the position of the virtual viewpoint. For example, the prevention processing unit 63 can position the prevention object so that a specific face of the prevention object faces a specified viewpoint. That is, the prevention object may be rotated so that a specific face of the prevention object faces the virtual viewpoint. For example, the prevention object may be a billboard-shaped object with a texture image. The prevention processing unit 63 can set the orientation of this object so that the texture image always faces the virtual viewpoint.
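Two of the viewpoint-dependent behaviors above, billboard orientation and a color mixture whose weights depend on the virtual viewpoint, are sketched below. The assumptions are that the billboard rotates only about the vertical z axis, that its front face is its local +x direction, and that the weight is a cosine of the viewing azimuth; these are illustrative choices, not taken from the disclosure, and the function names are hypothetical.

```python
import math

def billboard_yaw(obj_pos, view_pos):
    """Yaw angle (radians, about the vertical z axis) that turns the billboard's
    front face, assumed to be its local +x direction, toward the viewpoint."""
    return math.atan2(view_pos[1] - obj_pos[1], view_pos[0] - obj_pos[0])

def view_dependent_color(obj_pos, view_pos, color_a, color_b):
    """Mix two colors with weights that depend on the viewing azimuth."""
    w = 0.5 * (1 + math.cos(billboard_yaw(obj_pos, view_pos)))  # 1 from +x, 0 from -x
    return tuple(w * a + (1 - w) * b for a, b in zip(color_a, color_b))

red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
from_plus_x = view_dependent_color((0, 0, 0), (5, 0, 0), red, blue)    # red
from_minus_x = view_dependent_color((0, 0, 0), (-5, 0, 0), red, blue)  # blue
```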
[0040] Furthermore, the prevention object may have faces that are visible from the inside of the prevention object but invisible from the outside. For example, only one side of the mesh representing the three-dimensional shape of the prevention object may be a hidden face. Specifically, the mesh constituting the prevention object may be set so that the normal direction (front surface) is inward and the back surface is hidden. In the virtual viewpoint image, the face of such a prevention object that is close to the virtual viewpoint is not displayed, and only the face that is far from the virtual viewpoint is displayed.
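Whether a face of such an inward-facing mesh is drawn follows the usual back-face culling test: a face is rendered only if its front side points toward the viewpoint. The sketch (names hypothetical) uses two opposite faces of a cube of side 2 centered at the origin, with front sides pointing inward; seen from outside, only the far face passes the test, while from inside both do.

```python
def is_drawn(face_center, front_normal, view_pos):
    """Back-face culling: draw a face only if its front side faces the viewpoint."""
    to_view = tuple(v - c for v, c in zip(view_pos, face_center))
    return sum(a * b for a, b in zip(front_normal, to_view)) > 0

# Two opposite faces of a cube with side 2 centered at the origin; front = inward.
faces = {
    "x=+1": ((1, 0, 0), (-1, 0, 0)),
    "x=-1": ((-1, 0, 0), (1, 0, 0)),
}
outside = (5, 0, 0)
inside = (0, 0, 0)
visible_outside = {name for name, (c, n) in faces.items() if is_drawn(c, n, outside)}
visible_inside = {name for name, (c, n) in faces.items() if is_drawn(c, n, inside)}
```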
[0041] These objects have display characteristics that do not occur in reality. Therefore, when three-dimensional reconstruction techniques such as photogrammetry or NeRF are applied, the estimation results for the subject objects are expected to be inaccurate. Moreover, even with fewer or smaller prevention objects, estimating the three-dimensional shape of the objects is expected to become difficult, which reduces the impact on the experience of users viewing the virtual viewpoint image.
[0042] On the other hand, the prevention object may be a normal object whose color or shape does not change depending on the position of the virtual viewpoint. By placing such an object near the subject object, it is expected that a part of the subject object will be hidden by the prevention object. This makes it more difficult to reconstruct the three-dimensional shape of the object based on the generated virtual viewpoint image.
[0043] Prevention objects may be prepared in advance. For example, their shape, color, or texture may be set in advance. The prevention processing unit 63 may, however, change the number of prevention objects, for example by adding prevention objects depending on the situation. The prevention processing unit 63 may also change the arrangement of prevention objects according to the subject object. For example, the prevention processing unit 63 can change the arrangement of prevention objects in the virtual space according to the size, number (e.g., the number of subjects), or position of the subject objects. As a specific example, the prevention processing unit 63 can arrange multiple prevention objects so as to surround the subject object. The prevention processing unit 63 may also place prevention objects between the subject object and the virtual viewpoint. The prevention processing unit 63 may also change the size of the prevention objects according to the field of view of the virtual viewpoint or the distance between the virtual viewpoint and the subject object.
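The placement options in [0043] can be sketched geometrically: placing objects evenly on a circle around the subject, placing an object on the segment between the subject and the viewpoint, and sizing an object from the viewing distance and field of view. The horizontal ring, the `margin` and `screen_fraction` parameters, the pinhole-camera sizing rule, and all function names are assumptions.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z), y is up (assumed)


def ring_positions(subject_center: Vec3, subject_radius: float,
                   count: int, margin: float = 1.5) -> List[Vec3]:
    """Evenly spaced positions on a horizontal circle surrounding the subject."""
    r = subject_radius * margin
    cx, cy, cz = subject_center
    return [(cx + r * math.cos(2.0 * math.pi * i / count), cy,
             cz + r * math.sin(2.0 * math.pi * i / count)) for i in range(count)]


def between(subject_center: Vec3, viewpoint: Vec3, t: float = 0.5) -> Vec3:
    """Point on the segment from the subject (t = 0) to the viewpoint (t = 1)."""
    return tuple(s + t * (v - s) for s, v in zip(subject_center, viewpoint))


def size_for_view(distance: float, vertical_fov_deg: float,
                  screen_fraction: float = 0.05) -> float:
    """World-space height that spans `screen_fraction` of the image height
    for an object at `distance` from a pinhole camera."""
    visible_height = 2.0 * distance * math.tan(math.radians(vertical_fov_deg) / 2.0)
    return screen_fraction * visible_height
```

Scaling the size with distance and field of view keeps a prevention object at a roughly constant share of the image. This is one way to keep the objects "fewer or smaller" while they still occlude part of the subject.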
[0044] Furthermore, the prevention processing unit 63 may change the number or size of prevention objects according to how long time has been stopped, how long time has been advancing at a rate below a threshold, or how many times the same time period has been repeated. For example, the longer time remains stopped, the more numerous or the larger the prevention objects may become. The prevention processing unit 63 may also change the number or size of prevention objects according to the distance the virtual viewpoint has traveled while time is stopped or advancing at a rate below the threshold.
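One way to realize the monotone relationship in [0044] is a stepped schedule driven by the stop duration and the distance traveled by the viewpoint. The step sizes, caps, growth factor, and function names below are hypothetical values chosen for illustration.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def path_length(viewpoints: Sequence[Vec3]) -> float:
    """Total distance traveled along a sequence of viewpoint positions."""
    return sum(math.dist(a, b) for a, b in zip(viewpoints, viewpoints[1:]))


def prevention_schedule(stopped_seconds: float, travel: float,
                        seconds_per_step: float = 2.0, travel_per_step: float = 5.0,
                        base_count: int = 1, max_count: int = 12,
                        base_size: float = 0.2, max_size: float = 1.0) -> Tuple[int, float]:
    """Return (count, size) for prevention objects. Both grow in steps with
    the stop duration and with viewpoint travel, up to fixed caps."""
    steps = int(stopped_seconds // seconds_per_step) + int(travel // travel_per_step)
    count = min(max_count, base_count + steps)
    size = min(max_size, base_size * (1.0 + 0.25 * steps))
    return count, size
```

The caps keep the objects from overwhelming the image when a user pauses for a long time. `math.dist` requires Python 3.8 or later.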
[0045] In S204, as described above, the generation processing unit 62 generates a virtual viewpoint image, viewed from the specified viewpoint, that includes the object corresponding to the specified time. If the multiple pieces of information are determined not to satisfy the predetermined conditions, the generation processing unit 62 can generate a virtual viewpoint image of the subject object represented by the material data corresponding to the specified time of the display target. If, on the other hand, the multiple pieces of information are determined to satisfy the predetermined conditions, the virtual space contains both this subject object and the prevention object added by the prevention processing unit 63 in S203, and the generation processing unit 62 generates a virtual viewpoint image of this virtual space. Figure 3 shows an example of a virtual viewpoint image generated by the generation processing unit 62 when the multiple pieces of information are determined to satisfy the predetermined conditions. The virtual viewpoint image shown in Figure 3 includes the subject object 310, the background object 320, and the prevention object 330.
[0046] Furthermore, the generation processing unit 62 outputs the generated virtual viewpoint image to the display unit 7. The display unit 7 displays the virtual viewpoint image corresponding to the time information and the virtual viewpoint information.
[0047] The video generation unit 6 generates multiple virtual viewpoint images, that is, a virtual viewpoint video, by repeating S201 to S204 for each of multiple pieces of information indicating the time corresponding to an object, the position of the virtual viewpoint, and the direction of the line of sight from the virtual viewpoint.
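The per-frame loop in [0045] to [0047] can be sketched as follows. For each piece of information, the loop checks the conditions, adds a prevention object if they are met (the role of S203), and generates the frame (the role of S204). The condition check follows the three cases listed in Item 2. The `ViewInfo` type, the `rate_threshold` value, and the function names are hypothetical, and a dictionary stands in for the rendered image.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class ViewInfo:
    """One piece of information: scene time, viewpoint position, line of sight."""
    time: float
    position: Vec3
    direction: Vec3


def conditions_met(history: Sequence[ViewInfo], frame_interval: float,
                   rate_threshold: float = 0.1) -> bool:
    """Hypothetical check of the conditions in Item 2: scene time stopped or
    advancing slower than `rate_threshold` (scene seconds per output second),
    or a scene time that was shown earlier being shown again."""
    if len(history) < 2:
        return False
    prev, cur = history[-2], history[-1]
    rate = abs(cur.time - prev.time) / frame_interval
    # Exact float comparison assumes times come from one shared frame clock.
    repeated = any(h.time == cur.time for h in history[:-1])
    return rate < rate_threshold or repeated


def generate_video(infos: Sequence[ViewInfo], frame_interval: float) -> List[Dict]:
    """One frame description per piece of information. A dict stands in for
    the rendered image; a real system would rasterize the virtual space."""
    frames = []
    for k, info in enumerate(infos):
        add_prevention = conditions_met(infos[: k + 1], frame_interval)  # determination
        objects = ["subject", "background"]
        if add_prevention:
            objects.append("prevention")  # role of S203: add a prevention object
        # Role of S204: generate the image for this time and viewpoint.
        frames.append({"time": info.time, "viewpoint": info.position, "objects": objects})
    return frames
```

At 30 output frames per second, frames whose scene time advances normally are generated without a prevention object. Once the scene time freezes while the viewpoint keeps moving, the prevention object is added.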
[0048] According to this embodiment, the video generation unit 6 can generate a virtual viewpoint image in which a prevention object is placed. As a result, if a user creates a virtual viewpoint video by manipulating the virtual viewpoint and then uses that video to generate a three-dimensional shape model of the subject object, the accuracy of the resulting model is reduced.
[0049] (Another embodiment) In the embodiment described above, the prevention processing unit 63 places a prevention object in the virtual space to make it difficult to reconstruct the three-dimensional shape of the subject object. However, a prevention object is not essential for this purpose, and the prevention processing unit 63 may employ other methods.
[0050] For example, when the multiple pieces of information are determined to satisfy the predetermined conditions, the prevention processing unit 63 may superimpose an additional image (prevention image) onto the virtual viewpoint image. For example, the prevention processing unit 63 can generate the prevention image in S203, and the generation processing unit 62 can superimpose it onto the virtual viewpoint image generated from the source data in S204. The type of prevention image is not particularly limited; it may be, for example, a watermark. Such a watermark is difficult for the user to see but is expected to interfere with reconstructing the three-dimensional shape of the subject object from the virtual viewpoint video. The prevention image may, however, also be visible to the user. The prevention image may differ depending on the position of the virtual viewpoint, or for each generated virtual viewpoint image.
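A low-opacity overlay whose pattern depends on the viewpoint is one simple way to build a prevention image that is hard to see and differs between viewpoints. The sketch derives a deterministic seed from the rounded viewpoint position, draws a diagonal stripe pattern with that phase, and alpha-blends the pattern at 3 % opacity. The stripe pattern, `alpha`, the rounding, and the function names are assumptions. Robust invisible watermarking (for example, in a frequency domain) is more involved than this.

```python
import hashlib
from typing import List, Tuple

Image = List[List[float]]  # grayscale, values in [0, 1]


def viewpoint_seed(position: Tuple[float, float, float], decimals: int = 2) -> int:
    """Deterministic integer derived from the (rounded) viewpoint position."""
    key = ",".join(f"{c:.{decimals}f}" for c in position).encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def stripe_watermark(height: int, width: int, seed: int, period: int = 8) -> Image:
    """Diagonal stripes whose phase depends on the seed (hypothetical pattern)."""
    phase = seed % period
    return [[1.0 if (x + y + phase) % period < period // 2 else 0.0
             for x in range(width)] for y in range(height)]


def overlay(image: Image, mark: Image, alpha: float = 0.03) -> Image:
    """Alpha-blend the watermark at low opacity so that it is hard to see."""
    return [[(1.0 - alpha) * p + alpha * m for p, m in zip(row, mark_row)]
            for row, mark_row in zip(image, mark)]
```

Because of the blend, no pixel moves by more than `alpha` from its original value. The seed makes the overlay repeatable for the same viewpoint while varying it across viewpoints.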
[0051] Furthermore, the generation processing unit 62 can superimpose a foreground image or background image onto the generated virtual viewpoint image regardless of whether the multiple pieces of information satisfy the predetermined conditions. In this case, the prevention processing unit 63 may change the color, shape, or position of the foreground or background image depending on whether the multiple pieces of information satisfy the predetermined conditions, for example according to the position of the virtual viewpoint or for each generated virtual viewpoint image. Such processing is also expected to reduce the accuracy of shape estimation for the subject object.
[0052] Furthermore, depending on whether the multiple pieces of information satisfy the predetermined conditions, the color, shape, or placement of the subject object may be changed according to the position of the specified viewpoint or for each generated virtual viewpoint image. For example, the prevention processing unit 63 can translate, enlarge, reduce, or rotate the subject object. Through such processing, the prevention processing unit 63 can slightly change the positional relationship between the background object and the subject object according to the position of the virtual viewpoint. Figure 5 shows an example in which a subject object placed in the virtual space is slightly translated from position 510 to position 520; the translation changes the positional relationship between the subject object and the background object 530. Such processing makes it difficult to reconstruct the three-dimensional shape of the subject object from the virtual viewpoint images. For example, it can hinder the estimation of camera position and orientation in photogrammetry. The prevention processing unit 63 may also change the texture of the subject object.
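The small, viewpoint-dependent translation shown in Figure 5 could be computed as in the following sketch. The sinusoidal dependence on the viewpoint's azimuth, the amplitude `max_shift`, the `cycles` parameter, and the function names are assumptions. The key property is that the offset changes smoothly with the viewpoint. Each single view therefore looks normal, while different views disagree about where the subject sits relative to the background. This inconsistency is what disturbs camera pose estimation in photogrammetry.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def viewpoint_offset(viewpoint: Vec3, subject_center: Vec3,
                     max_shift: float = 0.05, cycles: int = 3) -> Vec3:
    """Horizontal offset of at most `max_shift` that varies smoothly with the
    viewpoint's azimuth around the subject (hypothetical scheme)."""
    azimuth = math.atan2(viewpoint[2] - subject_center[2],
                         viewpoint[0] - subject_center[0])
    return (max_shift * math.sin(cycles * azimuth), 0.0,
            max_shift * math.cos(cycles * azimuth))


def translate(vertices: List[Vec3], offset: Vec3) -> List[Vec3]:
    """Translate every vertex of the subject object; the background is left alone."""
    return [tuple(v + o for v, o in zip(vertex, offset)) for vertex in vertices]
```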
[0053] Similarly, the prevention processing unit 63 may change the color, shape, or position of the background object when the multiple pieces of information are determined to satisfy the predetermined conditions, for example according to the position of the virtual viewpoint or for each generated virtual viewpoint image. Such a configuration is expected to reduce the accuracy of shape estimation for the background object. It is also expected to reduce the accuracy of shape estimation for the subject object, for example because the estimated background object may become merged with the subject object. For example, the prevention processing unit 63 can translate, scale, or rotate the background object. Such processing also changes the positional relationship between the background object and the subject object.
[0054] Furthermore, the prevention processing unit 63 may change the positional relationships among multiple objects when the multiple pieces of information are determined to satisfy the predetermined conditions. The multiple objects may include subject objects, background objects, and prevention objects. For example, the positional relationship between a subject object and a background object, between a subject object and a prevention object, or among multiple subject objects may change depending on the position of the virtual viewpoint or for each generated virtual viewpoint image. The prevention processing unit 63 can change these positional relationships by translating, scaling, or rotating at least one object.
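Changing the relationship between objects by transforming only one of them can be sketched as follows. The chosen object is rotated about the vertical axis and scaled about a pivot, such as its own centroid, by a small amount that differs for each generated image. The yaw and scale limits, the per-frame seeding, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def centroid(vertices: List[Vec3]) -> Vec3:
    n = len(vertices)
    return tuple(sum(v[k] for v in vertices) / n for k in range(3))


def rotate_scale_about(vertices: List[Vec3], pivot: Vec3,
                       yaw: float, scale: float) -> List[Vec3]:
    """Rotate about the vertical axis through `pivot`, then scale uniformly
    about `pivot`. Other objects are untouched, so their relationship changes."""
    px, py, pz = pivot
    c, s = math.cos(yaw), math.sin(yaw)
    out = []
    for x, y, z in vertices:
        dx, dy, dz = x - px, y - py, z - pz
        out.append((px + scale * (c * dx + s * dz),
                    py + scale * dy,
                    pz + scale * (-s * dx + c * dz)))
    return out


def per_frame_params(frame_index: int, max_yaw_deg: float = 2.0,
                     max_scale_change: float = 0.02, seed: int = 0) -> Tuple[float, float]:
    """Small yaw (radians) and scale that differ for each generated image."""
    rng = random.Random(seed * 1_000_003 + frame_index)
    yaw = math.radians(rng.uniform(-max_yaw_deg, max_yaw_deg))
    scale = 1.0 + rng.uniform(-max_scale_change, max_scale_change)
    return yaw, scale
```

Seeding by frame index makes the perturbation reproducible. Keeping the limits small, here 2 degrees and 2 %, is meant to keep the change hard for the viewer to notice.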
[0055] Furthermore, depending on whether the multiple pieces of information are determined to satisfy the predetermined conditions, the prevention processing unit 63 may reduce the resolution or sharpness of the virtual viewpoint image generated by the generation processing unit 62. For example, the prevention processing unit 63 can blur the virtual viewpoint image. Specifically, if the virtual viewpoint moves while time is stopped, the prevention processing unit 63 can apply motion blur to the virtual viewpoint image. This reduces the effective resolution of the virtual viewpoint image while limiting the discomfort felt by the user. The prevention processing unit 63 may also deliberately add blur, for example by shifting the focus position. The prevention processing unit 63 may also increase the transparency of the subject object; raising the transparency to a degree that does not cause the user much discomfort reduces the sharpness of the virtual viewpoint image. Furthermore, the prevention processing unit 63 may superimpose noise such as block noise onto the virtual viewpoint image. Because a user who sees such noise may attribute it to a drop in transmission bandwidth, the discomfort caused by the reduced resolution or sharpness can be suppressed.
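The two degradations in [0055] can be sketched on a small grayscale image stored as nested lists. The first is motion blur, applied only while time is stopped and lengthened as the viewpoint moves faster. The second is block noise, a constant random offset per tile that resembles compression artifacts. Using a horizontal blur direction (as for sideways viewpoint motion), the gain, the radius cap, the block size, and the function names are all assumptions.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale, values in [0, 1]


def blur_radius(time_stopped: bool, viewpoint_speed: float,
                gain: float = 4.0, max_radius: int = 7) -> int:
    """Blur only while time is stopped; faster viewpoint motion, longer blur."""
    if not time_stopped:
        return 0
    return min(max_radius, int(viewpoint_speed * gain))


def horizontal_motion_blur(image: Image, radius: int) -> Image:
    """Box filter of width 2 * radius + 1 along each row (window truncated at
    the borders), approximating motion blur for sideways viewpoint movement."""
    if radius <= 0:
        return [row[:] for row in image]
    out = []
    for row in image:
        w = len(row)
        new_row = []
        for x in range(w):
            lo, hi = max(0, x - radius), min(w, x + radius + 1)
            new_row.append(sum(row[lo:hi]) / (hi - lo))
        out.append(new_row)
    return out


def add_block_noise(image: Image, block: int = 8, amplitude: float = 0.05,
                    seed: int = 0) -> Image:
    """Add one random offset per block x block tile, clipped to [0, 1]."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    offsets = {(by, bx): rng.uniform(-amplitude, amplitude)
               for by in range(0, height, block) for bx in range(0, width, block)}
    return [[min(1.0, max(0.0, image[y][x] + offsets[(y - y % block, x - x % block)]))
             for x in range(width)] for y in range(height)]
```

Applied to a single row `[0, 0, 0, 1, 1, 1]` with radius 1, the blur turns the sharp edge into a ramp `[0, 0, 1/3, 2/3, 1, 1]`.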
[0056] Furthermore, when the multiple pieces of information are determined to satisfy the predetermined conditions, the prevention processing unit 63 can apply visual effects such as lens flare, ghosting, glow, or glare to the virtual viewpoint image generated by the generation processing unit 62. Such effects blur the outline of the subject. Different effects may also be superimposed on the subject object depending on the position of the virtual viewpoint, which can hinder reconstruction of the three-dimensional shape of the subject object by photogrammetry or similar techniques. Because the user can perceive these effects as a visual effect accompanying the time stop, the user's sense of unease can be reduced.
[0057] Furthermore, depending on whether the multiple pieces of information are determined to satisfy the predetermined conditions, the prevention processing unit 63 may superimpose imperceptible noise onto the virtual viewpoint image generated by the generation processing unit 62. This processing is expected to make it difficult to reconstruct the shape of the subject object from the virtual viewpoint images using machine learning techniques.
[0058] The above describes various processes that the prevention processing unit 63 can perform to make it difficult to estimate the three-dimensional shape of an object. The prevention processing unit 63 may combine these processes, which makes estimating the three-dimensional shape even more difficult, or may switch among them.
[0059] Furthermore, in the above-described embodiment, the prevention processing unit 63 performs processing to make it difficult to estimate the three-dimensional shape of the object when the multiple pieces of information are determined to satisfy the predetermined conditions. However, the conditions under which the prevention processing unit 63 performs this processing are not limited to those described above. For example, the operation detection unit 61 may determine whether multiple pieces of virtual viewpoint information satisfy a predetermined condition, such as the virtual viewpoint rotating around a specific subject object. In this case, the prevention processing unit 63 performs the processing when the user operates the virtual viewpoint to create virtual viewpoint images of the subject object from various directions. Alternatively, the prevention processing unit 63 may always perform the processing, which also makes it difficult to estimate the three-dimensional shape of a background object that does not change over time. In this case, the prevention processing unit 63 may switch among multiple prevention displays according to the performance.
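The example condition in [0059], "the virtual viewpoint rotates around a specific subject object", could be detected by accumulating the azimuth that the viewpoint sweeps around the subject. The sketch unwraps each step into [-pi, pi) so that crossing the ±180° boundary is handled correctly, and it reports orbiting once the swept angle reaches a threshold. The 90° threshold, the use of the horizontal plane only, and the function names are assumptions. A fuller check might also require the line of sight to point at the subject.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def swept_azimuth(viewpoints: Sequence[Vec3], subject_center: Vec3) -> float:
    """Signed angle (radians) swept by the viewpoint around the vertical axis
    through the subject, with each step unwrapped into [-pi, pi)."""
    total = 0.0
    prev = None
    for x, _, z in viewpoints:
        angle = math.atan2(z - subject_center[2], x - subject_center[0])
        if prev is not None:
            step = (angle - prev + math.pi) % (2.0 * math.pi) - math.pi
            total += step
        prev = angle
    return total


def is_orbiting(viewpoints: Sequence[Vec3], subject_center: Vec3,
                threshold_deg: float = 90.0) -> bool:
    """Hypothetical condition: the viewpoint has circled the subject by at
    least `threshold_deg` degrees in either direction."""
    return abs(swept_azimuth(viewpoints, subject_center)) >= math.radians(threshold_deg)
```

A viewpoint that moves along a half circle around the subject sweeps about pi radians and triggers the condition. A viewpoint that only moves straight toward or away from the subject sweeps no angle and does not.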
[0060] As described above, the generation processing unit 62 can generate, according to information indicating the time corresponding to an object, the position of a virtual viewpoint, and the direction of the line of sight from the virtual viewpoint, multiple virtual viewpoint images containing objects corresponding to the same specified time. In one embodiment, the color, shape, or arrangement of objects in the virtual space differs among these virtual viewpoint images. As described above, this can be realized by changing the color, shape, or arrangement of objects such as subject objects, background objects, or prevention objects according to the position of the specified viewpoint or for each generated virtual viewpoint image. For example, the color, shape, or arrangement of objects may differ depending on the direction of the virtual viewpoint. Such an embodiment makes it difficult to reconstruct the three-dimensional shape of an object from the virtual viewpoint images. This processing may be performed depending on whether the multiple pieces of information satisfy the predetermined conditions, or regardless of them; for example, it may always be performed. If it is always performed, the color, shape, or arrangement of objects can be changed in a way that the user does not easily notice.
[0061] (Other examples) Each processing unit of the image processing device shown in Figure 1 can be implemented in hardware. Alternatively, the functions of each processing unit may be implemented by computer programs. For example, the image processing device according to the above embodiment can be implemented by a computer equipped with a processor and memory. Furthermore, the image processing device may be composed of multiple information processing devices connected, for example, via a network, and its functions may be provided as a cloud service.
[0062] Figure 6 is a block diagram showing an example of the hardware configuration of a computer usable as an image processing device according to one embodiment. The CPU 601 controls the entire computer using computer programs or data stored in RAM 602 or ROM 603. The CPU 601 also executes the processing performed by each processing unit of the image processing device. In other words, the CPU 601 can function as each processing unit of the image processing device.
[0063] RAM 602 has an area for temporarily storing computer programs or data read from the external storage device 606. RAM 602 can also store data acquired from the outside via the interface 607. In addition, RAM 602 provides a work area used by the CPU 601 when executing processing. For example, RAM 602 can provide a memory area such as frame memory.
[0064] ROM 603 stores configuration data, a boot program, and the like. The operation unit 604, such as a keyboard or mouse, inputs various instructions to the CPU 601 in response to user operations. The output unit 605, such as a liquid crystal display, displays the processing results obtained by the CPU 601. The operation unit 604 can function as the viewpoint indication unit 5, and the output unit 605 can function as the display unit 7.
[0065] The external storage device 606 stores an operating system (OS) and computer programs that enable the CPU 601 to implement the functions of each processing unit of the image processing device. The external storage device 606 can also function as the storage unit 4 that stores source data. The external storage device 606 may be a large-capacity information storage device such as a hard disk drive.
[0066] In this way, a processor such as the CPU 601 can realize the functions of each processing unit of the image processing device by executing a program stored in memory such as the RAM 602, the ROM 603, or the external storage device 606. For example, a computer program or data stored in the external storage device 606 is loaded into the RAM 602 as appropriate under the control of the CPU 601 and is then processed by the CPU 601.
[0067] The I/F 607 can be connected to a network such as a LAN or the Internet, or to other devices such as a projection device or a display device. The image processing device can send and receive various information via the I/F 607. The imaging unit 1 and the user terminal 10 may be connected to the I/F 607, in which case the captured images and operation information are input via the I/F 607. The image processing device can also control the imaging unit 1, or transmit images to be displayed on the display unit 7, via the I/F 607. The bus 608 connects the above components.
[0068] The configuration of this disclosure can also be realized by supplying a program that implements one or more of the functions of the embodiments described above to a system or device via a network or storage medium, and by a process in which one or more processors in the computer of the system or device read and execute the program. It can also be realized by a circuit (e.g., an ASIC) that implements one or more functions.
[0069] The present disclosure includes the following image processing apparatuses, image processing method, and program. (Item 1) An image processing apparatus comprising: acquisition means for acquiring multiple pieces of information indicating the time corresponding to an object, the position of a virtual viewpoint, and the direction of line of sight from the virtual viewpoint; determination means for determining whether the multiple pieces of information satisfy predetermined conditions; and generation means for generating, based on each of the multiple pieces of information, a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time, the generation means changing the generated virtual viewpoint image according to the determination result of the determination means. (Item 2) The image processing apparatus according to item 1, characterized in that the predetermined conditions are met when time is stopped, time is changed at a speed below a threshold, or the same time is repeated. (Item 3) The image processing apparatus according to item 1 or 2, characterized in that the generation means modifies the virtual viewpoint image so that it becomes more difficult to reconstruct the three-dimensional shape of the object based on the virtual viewpoint image, depending on whether the plurality of pieces of information are determined to satisfy the predetermined conditions. (Item 4) The image processing apparatus according to any one of items 1 to 3, characterized in that the generation means adds a three-dimensional object as an additional object when the plurality of pieces of information satisfy the predetermined conditions. (Item 5) The image processing apparatus according to item 4, characterized in that the color of the additional object in the virtual viewpoint image changes according to the position of the virtual viewpoint.
(Item 6) The image processing apparatus according to item 4 or 5, characterized in that the additional object has a mirror surface. (Item 7) The image processing apparatus according to any one of items 4 to 6, characterized in that the additional object has properties different from those of an object in real space. (Item 8) The image processing apparatus according to any one of items 4 to 7, characterized in that the additional object has a surface that is visible from the inside of the additional object and invisible from the outside of the additional object. (Item 9) The image processing apparatus according to any one of items 4 to 8, characterized in that the color, shape, or arrangement of the additional objects in the virtual space changes depending on the position of the virtual viewpoint or for each generated virtual viewpoint image. (Item 10) The image processing apparatus according to any one of items 4 to 9, characterized in that the generation means arranges the additional object such that a specific face of the additional object faces the virtual viewpoint. (Item 11) The image processing apparatus according to any one of items 1 to 10, wherein the generation means superimposes an additional image onto the virtual viewpoint image in accordance with whether the plurality of pieces of information satisfy the predetermined conditions. (Item 12) The image processing apparatus according to item 11, characterized in that the additional image is a watermark image. (Item 13) The image processing apparatus according to any one of items 1 to 12, characterized in that, in response to the multiple pieces of information satisfying the predetermined conditions, the color, shape, or arrangement of the object in the virtual space changes depending on the position of the virtual viewpoint or for each generated virtual viewpoint image.
(Item 14) The image processing apparatus according to any one of items 1 to 13, characterized in that the generation means further generates, using three-dimensional shape data of a background object, a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time and the background object, and that, in accordance with whether the multiple pieces of information satisfy the predetermined conditions, the color, shape, or arrangement of the background object in the virtual space, or the positional relationship between the object and the background object, changes according to the position of the virtual viewpoint or for each generated virtual viewpoint image. (Item 15) The image processing apparatus according to any one of items 1 to 14, characterized in that the generation means performs a process to reduce the resolution or sharpness of the virtual viewpoint image in response to the multiple pieces of information satisfying the predetermined conditions. (Item 16) The image processing apparatus according to any one of items 1 to 15, characterized in that the generation means obtains three-dimensional shape data and a plurality of texture images of the object corresponding to each of a plurality of time points from a storage means that stores three-dimensional shape data and texture images of the object corresponding to each of the plurality of time points, and generates the virtual viewpoint image using the three-dimensional shape data and the texture images.
(Item 17) An image processing apparatus comprising: acquisition means for acquiring multiple pieces of information indicating the time corresponding to an object, the position of a virtual viewpoint, and the direction of line of sight from the virtual viewpoint; and generation means for generating, based on each of the multiple pieces of information, a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time, characterized in that the generation means generates a plurality of virtual viewpoint images including objects corresponding to the same time such that the color, shape, or arrangement of the objects in the virtual space changes between the plurality of virtual viewpoint images. (Item 18) An image processing method performed by an image processing device, comprising: a step of acquiring multiple pieces of information indicating the time corresponding to an object, the position of a virtual viewpoint, and the direction of line of sight from the virtual viewpoint; a step of determining whether the multiple pieces of information satisfy predetermined conditions; and a step of generating, based on each of the multiple pieces of information, a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time, characterized in that the generated virtual viewpoint image is changed according to the determination result in the determining step. (Item 19) A program for causing a computer to function as the image processing device according to any one of items 1 to 17.
[0070] The configurations relating to this disclosure are not limited to the embodiments described above, and various modifications and variations are possible without departing from the spirit and scope of this disclosure. Accordingly, the following claims are appended to make the scope of this disclosure public.
[Explanation of Symbols]
[0071] 1: Imaging unit, 2: Synchronization unit, 3: Shape estimation unit, 4: Storage unit, 5: Viewpoint indication unit, 6: Image generation unit, 7: Display unit, 10: User terminal, 61: Operation detection unit, 62: Generation processing unit, 63: Prevention processing unit
Claims
1. An image processing apparatus comprising: acquisition means for acquiring multiple pieces of information indicating the time corresponding to an object, the position of a virtual viewpoint, and the direction of line of sight from the virtual viewpoint; determination means for determining whether the multiple pieces of information satisfy predetermined conditions; and generation means for generating, based on each of the multiple pieces of information, a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time, the generation means changing the generated virtual viewpoint image according to the determination result of the determination means.
2. The image processing apparatus according to claim 1, characterized in that the predetermined conditions are met when time is stopped, time is changed at a speed below a threshold, or the same time is repeated.
3. The image processing apparatus according to claim 1, characterized in that the generation means modifies the virtual viewpoint image in such a way that it becomes more difficult to reconstruct the three-dimensional shape of the object based on the virtual viewpoint image, depending on whether the plurality of pieces of information are determined to satisfy the predetermined conditions.
4. The image processing apparatus according to claim 1, wherein the generation means adds a three-dimensional object as an additional object in response to a plurality of pieces of information satisfying the predetermined conditions.
5. The image processing apparatus according to claim 4, characterized in that the color of the additional object in the virtual viewpoint image changes according to the position of the virtual viewpoint.
6. The image processing apparatus according to claim 4, characterized in that the additional object has a mirror surface.
7. The image processing apparatus according to claim 4, characterized in that the additional object has different characteristics from an object in real space.
8. The image processing apparatus according to claim 4, characterized in that the additional object has a surface that is visible from the inside of the additional object and invisible from the outside of the additional object.
9. The image processing apparatus according to claim 4, characterized in that the color, shape, or arrangement of the additional objects in the virtual space changes depending on the position of the virtual viewpoint or for each virtual viewpoint image generated.
10. The image processing apparatus according to claim 4, wherein the generation means arranges the additional objects such that a specific face of the additional objects faces the virtual viewpoint.
11. The image processing apparatus according to claim 1, wherein the generation means superimposes an additional image onto the virtual viewpoint image in accordance with whether a plurality of pieces of information satisfy the predetermined conditions.
12. The image processing apparatus according to claim 11, characterized in that the additional image is a watermark image.
13. The image processing apparatus according to claim 1, characterized in that the color, shape, or arrangement of the objects in the virtual space changes depending on the position of the virtual viewpoint or for each virtual viewpoint image generated, in accordance with the multiple pieces of information satisfying the predetermined conditions.
14. The image processing apparatus according to claim 1, characterized in that the generation means further generates, using three-dimensional shape data of a background object, a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time and the background object, and that, depending on whether the multiple pieces of information satisfy the predetermined conditions, the color, shape, or arrangement of the background objects in the virtual space, or the positional relationship between the object and the background objects, changes according to the position of the virtual viewpoint or for each virtual viewpoint image to be generated.
15. The image processing apparatus according to claim 1, characterized in that the generation means performs a process to reduce the resolution or sharpness of the virtual viewpoint image in response to the plurality of pieces of information satisfying the predetermined conditions.
16. The image processing apparatus according to claim 1, characterized in that the generation means obtains three-dimensional shape data and a plurality of texture images of the object corresponding to each of a plurality of time points from a storage means that stores three-dimensional shape data and texture images of the object corresponding to each of a plurality of time points, and generates the virtual viewpoint image using the three-dimensional shape data and the texture images.
17. An image processing apparatus comprising: acquisition means for acquiring multiple pieces of information indicating the time corresponding to an object, the position of a virtual viewpoint, and the direction of line of sight from the virtual viewpoint; and generation means for generating, based on each of the multiple pieces of information, a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time, characterized in that the generation means generates a plurality of virtual viewpoint images including objects corresponding to the same time such that the color, shape, or arrangement of the objects in the virtual space changes between the plurality of virtual viewpoint images.
18. An image processing method performed by an image processing device, comprising: a step of acquiring multiple pieces of information indicating the time corresponding to an object, the position of a virtual viewpoint, and the direction of line of sight from the virtual viewpoint; a step of determining whether the multiple pieces of information satisfy predetermined conditions; and a step of generating, based on each of the multiple pieces of information, a virtual viewpoint image that corresponds to the virtual viewpoint and includes the object corresponding to the time, characterized in that the generated virtual viewpoint image is changed according to the determination result in the determining step.
19. A program for causing a computer to function as an image processing device according to any one of claims 1 to 17.