Programs, methods, and systems

The system addresses augmented reality content display challenges by aligning virtual objects with real-world conditions using silhouette images and depth information, ensuring consistent movement and preventing display failures.

JP2026096768A · Pending · Publication Date: 2026-06-15 · COVER CORP

Cites: 3 · Cited by: 0

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: COVER CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

Existing augmented reality technologies face issues where content display is compromised due to environmental factors such as marker unreadability or inaccurate location estimation, leading to impaired user experience.

Method used

A system and method that generates augmented reality content by combining real and virtual spaces, using silhouette images and depth information to ensure consistent virtual object movement with the real-world situation, allowing content display even in imperfect conditions.

Benefits of technology

Enhances user experience by ensuring virtual objects' movements align with real-world situations, preventing content display issues and improving satisfaction.

✦ Generated by Eureka AI based on patent content.

Abstract drawing: Figure 2026096768000001_ABST


Abstract

To provide a program, method, system, and movable object that enables the display of augmented reality images without compromising the user experience. [Solution] When a user initiates the process, the system executes a process to display content video that combines video of the real world being captured with video of virtual objects operating in the virtual world. During the period in which the system accepts the initiate operation, the system executes a process to display a specific image, which is a silhouette image that matches a specific object that can be placed at a predetermined location in the real world when the real world is captured from a predetermined viewing position. The video of the virtual objects operating in the virtual world is video in which, in relation to the video of the real world being captured when a user initiates the process at the viewing position, the movement of the virtual objects is consistent with the situation in the real world.


Description

Technical Field

[0001] The present invention relates to a program, a method, a system, and a movable object.

Background Art

[0002] Augmented reality (AR) technology adds information such as virtual objects to real space and thereby extends the real world. To give users an augmented reality experience, one known technique reads a marker such as a two-dimensional code with a terminal and displays augmented reality content video (AR content) (see, for example, Patent Document 1). Attempts have also been made to display AR content according to the position information of the user terminal (see, for example, Patent Documents 2 and 3).

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Patent Document 2

Patent Document 3

Summary of the Invention

Problems to be Solved by the Invention

[0004] However, when the terminal reads a marker and displays content video superimposed on it, as in Patent Document 1, the content cannot be displayed if the marker cannot be read properly. Likewise, when the user's location information is required to display content, as in Patent Documents 2 and 3 (for example, to determine whether the user is at an event venue), the content may fail to be displayed even though the user is in the correct location (for example, at the event venue) whenever that location becomes difficult to estimate. Either failure impairs the user experience.

[0005] This invention was conceived in view of the above circumstances and provides a program, method, system, and movable object that enable the display of augmented reality images without impairing the user experience.

Means for Solving the Problem

[0006] (1) A program according to one aspect of the present invention causes a computer (e.g., the distribution server 100 or the user terminal 300) capable of generating augmented reality content video (e.g., AR content video that combines real-space and virtual-space data) by combining an operating virtual object (e.g., character C) with video of real space (e.g., video of real space captured by the information acquisition unit 350) to function as: a means (e.g., the image processing unit 361 or the control unit 130) for executing, when the user performs a start operation (e.g., when the start button 23 for starting playback of AR content, displayed on a start operation screen such as that in Figure 4(B), is operated), a process (e.g., the AR content playback process in Figure 11) that displays content video combining the captured video of real space with video of a virtual object operating in the virtual space (e.g., AR content video that combines the 3DCG character C displayed on the screen of the user terminal 300 in Figure 5(C) with real space, or video of character C operating in real space as in Figures 6 and 7); and a means (e.g., the control unit 130, the image processing unit 361, or the display control unit 364) for executing, during the period in which the start operation is accepted (e.g., after the planar scan in step S102 of Figure 11, while the start button 23 is displayed on the user terminal 300 as in Figures 4(B), 5(A), and 5(B)), a process (e.g., step S103 of Figure 11) that displays a specific image. The specific image is a silhouette image that matches a specific object (e.g., the panel 10; see also the modified example (movable object)) that can be placed at a predetermined position in real space (e.g., the installation location of the panel 10 planned in advance), as that object appears when real space is imaged from a predetermined viewing viewpoint (e.g., a viewpoint that views the panel 10 from a predetermined direction at a predetermined distance from its planned installation location). An example is the silhouette image 22 in Figures 5(A) and 5(B), which matches the contour and shape of the panel 10 as seen from that correct viewing viewpoint.
The video of the virtual object operating in the virtual space (e.g., the video of character C) is video in which, when the user performs the start operation from the viewing viewpoint (e.g., by operating the start button 23 on a start operation screen such as that in Figure 4(B)), the actions of the virtual object are consistent with the situation in real space relative to the captured video of real space (e.g., as illustrated in Figures 6 and 7, the character performs actions that match the real-space situation, such as the location and intended use of buildings).

[0007] With this configuration, once the user initiates the process, augmented reality content video can be displayed. This prevents the user experience from being compromised due to environmental factors preventing the display of content video, and also enhances the viewing experience by displaying content video in which the movement of virtual objects is consistent with the situation in the real world when the real world space is being captured from the viewing viewpoint.

[0008] (2) In (1) above, the virtual object is an object with a three-dimensional solid shape (for example, character C, which is a 3DCG avatar object).

[0009] With this configuration, users can visualize the movement of virtual objects in three-dimensional images, thereby enhancing their interest.

[0010] (3) In (1) above, the virtual object is a character (for example, character C, which is a 3DCG avatar object), the specific object placed in real space displays an illustration of the character (for example, illustration 12 drawn on the panel 10), and the outline of the specific object follows the outline of the character illustration (for example, the outline of the panel 10 is the outline of illustration 12).

[0011] With this configuration, the specific image becomes a silhouette image that follows the outline of the character illustration, which enhances user interest and allows content providers to adopt complex shapes as specific images.

[0012] (4) In (3) above, playback of the video of the character operating in the virtual space is started by the user's start operation while the specific image is superimposed on the outline of the character illustration (for example, as illustrated in Figure 5(B), the start button 23 is operated with the silhouette image 22 aligned with the panel 10), and the video begins in a manner in which the character shown on the specific object appears to emerge into real space (for example, by aligning the real-space coordinates of the panel 10 with the origin of the virtual space and placing character C at that origin, character C appears to pop out of the panel 10).

[0013] With this configuration, by starting the program with the silhouette image aligned to the character's illustration, the content video is displayed in a manner that makes the character appear to emerge from a specific object, thereby enhancing the sense of the character's presence.

[0014] (5) In (1) above, the specific object can be placed in a location different from the predetermined location in real space (for example, movable objects such as panel 10 can have their placement changed depending on the circumstances of the event venue).

[0015] With this configuration, it is possible to move specific objects according to the conditions in the real world.

[0016] (6) In (1) above, the means for executing the process of displaying the content video displays the content video, which is played back based on the user's start operation, whenever a start operation is performed, even if the specific image does not match the outline of the specific object placed in real space (for example, even if the silhouette image 22 does not match the panel 10, the content is made playable in step S105 once it is determined in step S104 that the user has operated the start button 23).

[0017] With this configuration, even if the outline of the specific object does not match the specific image (the silhouette image that matches that outline), the content video can be displayed once the user performs the start operation. This prevents the user experience from being impaired by content video failing to display because environmental conditions are not met.
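The gating behavior in (1) and (6), together with steps S102 to S105 of Figure 11, can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions: the class, method, and state names are hypothetical, plane detection is reduced to a single callback, and rendering is omitted. Its point is that the start operation alone unlocks playback, while silhouette alignment is recorded but never required.

```python
from enum import Enum, auto
from typing import Optional


class PlaybackState(Enum):
    """Hypothetical states loosely following steps S102-S105 of Figure 11."""

    PLANE_SCAN = auto()      # S102: scanning real space for a plane
    AWAITING_START = auto()  # S103: silhouette image and start button shown
    PLAYING = auto()         # S105: AR content made playable


class ArPlaybackController:
    """Sketch of start-operation gating; all names are illustrative only."""

    def __init__(self) -> None:
        self.state = PlaybackState.PLANE_SCAN
        self.silhouette_visible = False
        # Whether playback began with the silhouette aligned to the panel.
        # It affects only how well the performance matches the real scene.
        self.started_aligned: Optional[bool] = None

    def on_plane_detected(self) -> None:
        """After the plane scan, overlay the silhouette image (S103)."""
        if self.state is PlaybackState.PLANE_SCAN:
            self.state = PlaybackState.AWAITING_START
            self.silhouette_visible = True

    def on_start_pressed(self, silhouette_aligned: bool) -> None:
        """Start operation (S104): play even if the silhouette is misaligned."""
        if self.state is not PlaybackState.AWAITING_START:
            return  # start operations are accepted only in this period
        self.started_aligned = silhouette_aligned
        self.silhouette_visible = False  # assumption: overlay hidden on playback
        self.state = PlaybackState.PLAYING  # S105
```

A misaligned start (`on_start_pressed(False)`) still reaches `PLAYING`; only the recorded `started_aligned` flag differs, which mirrors the idea that misalignment degrades consistency with the real scene rather than blocking playback.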

[0018] (7) In (6) above, the program further causes the computer to function as a means (for example, the acquisition information analysis unit 362) for executing a process that determines, based on the video of real space captured by the user, the degree of coincidence between the contour of the specific image and the contour of the specific object placed in real space (see, for example, the modified example (automatic playback according to the degree of coincidence)). When that means determines that the degree of coincidence exceeds a predetermined value (for example, the area where the panel 10 and the silhouette image 22 overlap, as determined by image segmentation, is 90% or more), the means for executing the process of displaying the content video displays the content video without a start operation by the user (for example, regardless of whether the user operates the start button 23).

[0019] With this configuration, if the contour of the specific image matches that of the specific object, the content video can be displayed without a start operation by the user, which increases convenience for the user.
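Aspect (7) leaves open how the 90% overlap is measured. The sketch below assumes, purely for illustration, that the silhouette image 22 and the segmented panel 10 are available as equal-sized binary masks in screen space and that the "degree of coincidence" is intersection-over-union (IoU); a ratio against the silhouette area alone would be another reading. The function names and the 0.9 default are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[bool]]  # row-major binary mask, True = pixel belongs


def intersection_over_union(silhouette: Mask, panel: Mask) -> float:
    """IoU of two same-sized binary masks (0.0 if both are empty)."""
    intersection = 0
    union = 0
    for sil_row, panel_row in zip(silhouette, panel):
        for in_sil, in_panel in zip(sil_row, panel_row):
            if in_sil and in_panel:
                intersection += 1
            if in_sil or in_panel:
                union += 1
    return intersection / union if union else 0.0


def should_autoplay(silhouette: Mask, panel: Mask, threshold: float = 0.9) -> bool:
    """Start playback without a start operation once overlap reaches threshold."""
    return intersection_over_union(silhouette, panel) >= threshold
```

IoU is used here because, unlike a ratio against the silhouette alone, it also penalizes a segmented panel that spills far outside the silhouette.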

[0020] (8) In (1) above, the content video is displayed based on content display data that includes information on the display start position of the virtual object (for example, the content data 121 in FIG. 10, which is downloaded and read into the user terminal 300 based on the two-dimensional code 11 displayed on the panel 10); each item of content data includes the coordinates of the character's display start position in the virtual space (for example, placement at the origin). The content display data further includes correction value data for correcting the predetermined display start position of the virtual object (for example, the data of the display start position correction parameters illustrated in FIG. 10). The correction value is determined according to the position at which the specific object is placed in real space (for example, if the panel 10 has been moved 1 m to the left of its predetermined position, the correction value shifts the start position 1 m to the right).

[0021] With this configuration, because the display start position of the virtual object can be corrected according to where the specific object is placed, content video can be displayed in which the movement of the virtual object is consistent with the actual situation in real space.
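The 1 m example in (8) follows from a simple offset if one assumes, as in the example given for (4), that the virtual-space origin sits on the panel 10: moving the panel moves the origin, so the start position must be shifted by the opposite amount to keep the character at its planned real-world spot. A minimal sketch under that assumption, ignoring rotation; the `Vec3` type and function names are hypothetical and do not reflect the actual format of the content data 121.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    """Position in metres; assumed axes: +x right, +y up, +z forward."""

    x: float
    y: float
    z: float

    def __add__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)


def correction_value(planned_panel: Vec3, actual_panel: Vec3) -> Vec3:
    """Offset that cancels the panel's displacement from its planned spot.

    Assumes the virtual-space origin is anchored to the panel, so moving the
    panel drags the whole virtual scene with it unless this offset is applied.
    """
    return planned_panel - actual_panel


def corrected_start(default_start: Vec3, correction: Vec3) -> Vec3:
    """Display start position in panel-anchored virtual coordinates."""
    return default_start + correction
```

With the panel moved from `Vec3(0, 0, 0)` to `Vec3(-1, 0, 0)`, the correction is `Vec3(1, 0, 0)`, and the character's world position (panel position plus corrected start) lands back on the originally planned point.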

[0022] (9) In (1) above, the program further causes the computer to function as: a means (for example, the control unit 130 or the acquisition information analysis unit 362) for executing a process that estimates the real-space position of the user terminal when the user performs the start operation (for example, using GPS information or SLAM-based self-position estimation); and a means (for example, the image processing unit 361) for executing a process that determines whether the error between the estimated real-space position and the coordinates of the predetermined viewing viewpoint is greater than or equal to a specific value (for example, whether the distance between a marker placed at the ideal panel position, or the planned display position of the character, and the character's coordinates is 2 m or more, indicating a deviation between the actual and ideal viewing viewpoints). When the determination means finds that the error is greater than or equal to the specific value, the means for executing the process of displaying the content video corrects the display position of the virtual object to the position at which it would have been displayed had the user performed the start operation from the predetermined viewing viewpoint (for example, the coordinates of character C when played back from the predetermined correct viewing viewpoint; see the modified example (correction when the error is greater than or equal to the specific value)).

[0023] With this configuration, even if the start operation is performed from a viewpoint different from the predetermined viewing viewpoint, the content video can be corrected so that the movement of the virtual object is consistent with the situation in real space. This avoids impairing the user experience through content video not being displayed, while also improving user satisfaction.
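Aspect (9) can be illustrated with a distance check. The sketch assumes, hypothetically, that the estimated terminal position (from GPS or SLAM) and the predetermined viewing viewpoint are expressed in the same metric frame, that the character is authored as an offset from the viewpoint where playback starts, and that the 2 m example is the threshold. Under those assumptions the correction amounts to re-anchoring that offset at the ideal viewpoint; the function names are illustrative only.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]  # metres in a shared real-space frame


def needs_correction(estimated: Point, ideal: Point, threshold_m: float = 2.0) -> bool:
    """True when the estimated start viewpoint is too far from the ideal one."""
    return math.dist(estimated, ideal) >= threshold_m


def character_position(
    offset_from_viewpoint: Point,
    estimated_viewpoint: Point,
    ideal_viewpoint: Point,
    threshold_m: float = 2.0,
) -> Point:
    """Place the character relative to the viewpoint playback is treated as
    starting from (assumes content is authored as an offset from it)."""
    if needs_correction(estimated_viewpoint, ideal_viewpoint, threshold_m):
        anchor = ideal_viewpoint  # snap back to the predetermined viewpoint
    else:
        anchor = estimated_viewpoint  # close enough: use the actual start point
    return tuple(a + o for a, o in zip(anchor, offset_from_viewpoint))
```

Below the threshold the sketch trusts the measured start point, so small estimation noise does not make the character jump.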

[0024] (10) In (5) above, even if the specific object has been moved, the means for executing the process of displaying the content video corrects the display position of the virtual object to the position at which it would have been displayed had the user performed the start operation from the predetermined viewing viewpoint, by scanning a marker placed in real space that identifies the predetermined position of the specific object (for example, a correction marker placed on the ground at, or near, the predetermined ideal placement position of the panel 10) (see, for example, the modified example (correction of position error of movable object with markers)).

[0025] With this configuration, even if the location of a specific object is moved from its predetermined position, it becomes possible to correct the content video so that the movement of the virtual object is consistent with the situation in the real world. This makes it possible to improve user satisfaction while avoiding the loss of user experience due to the content video not being displayed.
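One way to read aspect (10) is that content is anchored to a marker fixed at the panel's planned location rather than to the panel itself. A 2D sketch of that re-anchoring follows, assuming the AR framework already reports the marker's pose on the ground plane; the `MarkerPose` type, the axis names, and the yaw convention are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MarkerPose:
    """Pose of a scanned correction marker on the ground plane.

    x, z: position in metres in the AR session's world frame (assumed).
    yaw: rotation about the vertical axis in radians (assumed convention).
    """

    x: float
    z: float
    yaw: float


def marker_to_world(marker: MarkerPose, local_x: float, local_z: float) -> Tuple[float, float]:
    """Map a point authored relative to the marker into world coordinates.

    Because content is anchored to the marker fixed at the planned placement
    position, moving the panel itself no longer shifts the character.
    """
    cos_y, sin_y = math.cos(marker.yaw), math.sin(marker.yaw)
    world_x = marker.x + cos_y * local_x - sin_y * local_z
    world_z = marker.z + sin_y * local_x + cos_y * local_z
    return world_x, world_z
```

The rotation term matters when the marker is scanned at an angle: a character authored 1 m in front of the marker stays 1 m in front of it regardless of the session's own axes.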

[0026] (11) In (5) above, the means for performing the process for displaying the content video moves the display position of the virtual object in the content video so that the movement of the virtual object is displayed in a manner consistent with the situation in the real space, by scanning a guidance marker placed in the real space for correcting the display position of the virtual object (for example, a marker placed near the predetermined movement path of character C) (see, for example, the modified example (guidance marker to a suitable location)).

[0027] With this configuration, even if the start operation is performed from a viewpoint different from the predetermined viewing viewpoint, it becomes possible to correct the content video so that the movement of the virtual object is consistent with the situation in the real world. This makes it possible to improve user satisfaction while avoiding the user experience being impaired by the content video not being displayed.

[0028] (12) In (1) above, the program further causes the computer to function as: a means (for example, the acquired information analysis unit 362 or the control unit 130) for executing a process that acquires depth information from which the depth position of each real object that may appear in the real-space video captured by the user can be identified (for example, image processing that identifies depth, or processing of depth information obtained by a depth sensor); and a means (for example, the acquired information analysis unit 362 or the control unit 130) for executing a process that, based on that depth information, identifies real objects that may appear in the captured video and identifies their depth positions in the virtual space in which the virtual object operates (for example, by converting the position of a real object into virtual-space coordinates). The means for executing the process of displaying the content video moves the virtual camera placed in the virtual space in conjunction with the movement of the user's imaging viewpoint in real space (for example, the position and rotation of the camera capturing real space are reflected in the parameters of the virtual camera). Even as the imaging viewpoint moves, the portion of the virtual object hidden behind a real object in front of it is not displayed, and the portion that is not hidden is displayed in the content video, based on the identified depth information (for example, the virtual object is occluded by objects in real space).

[0029] This configuration makes it possible to display content images that emphasize the geometric consistency between the real and virtual spaces based on depth information, thereby improving the sense of realism.
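The occlusion behavior in (12) reduces to a per-pixel depth comparison once real and virtual depths are available for the same viewpoint. This is a minimal pure-Python sketch, assuming a depth map for the camera frame (from image processing or a depth sensor, as the text allows) and a rendered color and depth buffer for the virtual object; the buffer layouts and names are hypothetical, and production AR renderers typically perform this test on the GPU.

```python
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def composite_with_occlusion(
    camera_rgb: Sequence[Sequence[RGB]],
    real_depth: Sequence[Sequence[float]],
    virtual_rgb: Sequence[Sequence[Optional[RGB]]],
    virtual_depth: Sequence[Sequence[float]],
) -> List[List[RGB]]:
    """Per-pixel depth test between the camera frame and the rendered character.

    All four buffers are assumed to share one resolution and one viewpoint:
    the virtual camera is assumed to copy the real camera's pose every frame.
    virtual_rgb holds None where the virtual object does not cover a pixel.
    Depths are distances from the camera in metres (smaller = closer).
    """
    out: List[List[RGB]] = []
    for y, camera_row in enumerate(camera_rgb):
        out_row: List[RGB] = []
        for x, real_px in enumerate(camera_row):
            virtual_px = virtual_rgb[y][x]
            if virtual_px is not None and virtual_depth[y][x] < real_depth[y][x]:
                out_row.append(virtual_px)  # virtual object is in front: show it
            else:
                out_row.append(real_px)  # empty, or occluded by a real object
        out.append(out_row)
    return out
```

Because the test runs every frame with the virtual camera following the real one, the hidden portion of the character updates automatically as the user walks around.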

[0030] (13) A method according to one aspect of the present invention is a method for generating augmented reality content video (for example, AR content video that combines real-space and virtual-space data) by combining video of real space (for example, video of real space captured by the information acquisition unit 350) with an operating virtual object (for example, character C). The method includes: a step of causing a computer, when the user performs a start operation (for example, when the start button 23 for starting playback of AR content, displayed on a start operation screen such as that in Figure 4(B), is pressed), to execute a process (for example, the AR content playback process in Figure 11) that displays content video combining the captured video of real space with video of a virtual object operating in the virtual space (for example, AR content video that combines the 3DCG character C displayed on the screen of the user terminal 300 in Figure 5(C) with real space, or video of character C operating in real space as in Figures 6 and 7); and a step of causing the computer, during the period in which the start operation is accepted (for example, after the planar scan in step S102 of Figure 11, while the start button 23 is displayed on the user terminal 300 as in Figures 4(B), 5(A), and 5(B)), to execute a process (for example, step S103 of Figure 11) that displays a specific image. The specific image is a silhouette image that matches a specific object (for example, the panel 10; see also the modified example (movable object)) that can be placed at a predetermined position in real space (for example, the installation location of the panel 10 planned in advance), as that object appears when real space is imaged from a predetermined viewing viewpoint (for example, a viewpoint that views the panel 10 from a predetermined direction at a predetermined distance); an example is the silhouette image 22 in Figures 5(A) and 5(B), which matches the contour and shape of the panel 10 as seen from that correct viewing viewpoint.
The video of the virtual object operating in the virtual space (for example, the video of character C) is video in which, when the user performs the start operation from the viewing viewpoint (for example, by operating the start button 23 on a start operation screen such as that in Figure 4(B)), the actions of the virtual object are consistent with the situation in real space relative to the captured video of real space (for example, as illustrated in Figures 6 and 7, the character performs actions that match the real-space situation, such as the location and intended use of buildings).

[0031] With this configuration, once the user initiates the process, augmented reality content video can be displayed. This prevents the user experience from being compromised due to environmental factors preventing the display of content video, and also enhances the viewing experience by displaying content video in which the movement of virtual objects is consistent with the situation in the real world when the real world space is being captured from the viewing viewpoint.

[0032] (14) A system according to one aspect of the present invention comprises a computer (for example, the distribution server 100 or the user terminal 300) capable of generating augmented reality content video (for example, AR content video that combines real-space and virtual-space data) by combining an operating virtual object (for example, character C) with video of real space (for example, video of real space captured by the information acquisition unit 350). The system includes: a means (for example, the image processing unit 361 or the control unit 130) for executing, when the user performs a start operation (for example, when the start button 23 for starting playback of AR content, displayed on a start operation screen such as that in Figure 4(B), is operated), a process (for example, the AR content playback process in Figure 11) that displays content video combining the captured video of real space with video of a virtual object operating in the virtual space (for example, AR content video that combines the 3DCG character C displayed on the screen of the user terminal 300 in Figure 5(C) with real space, or video of character C operating in real space as in Figures 6 and 7); and a means (for example, the control unit 130, the image processing unit 361, or the display control unit 364) for executing, during the period in which the start operation is accepted (for example, after the planar scan in step S102 of Figure 11, while the start button 23 is displayed on the user terminal 300 as in Figures 4(B), 5(A), and 5(B)), a process (for example, step S103 of Figure 11) that displays a specific image. The specific image is a silhouette image that matches a specific object (for example, the panel 10; see also the modified example (movable object)) that can be placed at a predetermined position in real space (for example, the installation location of the panel 10 planned in advance), as that object appears when real space is imaged from a predetermined viewing viewpoint (for example, a viewpoint at a predetermined distance from the panel 10, as planned in advance); an example is the silhouette image 22 in Figures 5(A) and 5(B), which matches the contour and shape of the panel 10 viewed from a predetermined direction at a predetermined distance (from the correct viewing viewpoint).
The video of the virtual object operating in the virtual space (for example, the video of character C) is video in which, when the user performs the start operation from the viewing viewpoint (for example, by operating the start button 23 on a start operation screen such as that in Figure 4(B)), the character performs actions consistent with the situation in real space (for example, the location and intended use of buildings) relative to the captured video of real space.

[0033] With this configuration, once the user initiates the process, augmented reality content video can be displayed. This prevents the user experience from being compromised due to environmental factors preventing the display of content video, and also enhances the viewing experience by displaying content video in which the movement of virtual objects is consistent with the situation in the real world when the real world space is being captured from the viewing viewpoint.

[0034] (15) A movable object according to one aspect of the present invention is a movable object that can be installed in real space (for example, a panel or a display; see the modified example (movable object)) and that displays content access information (for example, a QR code or a URL). The content access information makes it possible to display a silhouette image that matches the movable object as photographed from a predetermined viewing viewpoint relative to it (for example, a viewpoint that views the panel 10 from a predetermined direction at a predetermined distance from its predetermined installation location), such as an image matching the contour and shape of the panel 10 as seen from that correct viewing viewpoint. When the movable object is installed at the predetermined position in real space and photographed from the viewing viewpoint, the content access information enables a performance video that is consistent with the situation in real space (for example, virtual-space video that is geometrically consistent with the real-space video being captured) to be displayed by compositing it, using augmented reality, onto the real-space video captured when the user performs the start operation (for example, when the panel 10 is installed at the predetermined correct location and the user operates the start button 23 from the corresponding correct viewing viewpoint).
On the other hand, when the movable object is not placed at the predetermined position in real space, or is not captured from the viewing viewpoint, the performance video is inconsistent with the situation in real space relative to the real-space video captured at the start operation, but the performance video can still be displayed by compositing it onto the real-space video (for example, if the panel 10 is not placed at the installation location planned in advance, or if the playback operation is not performed from the planned viewing viewpoint, the movement of character C does not correspond to the originally planned real-space situation, but the AR content video itself can still be played).

[0035] With this configuration, content video consistent with the situation in real space can be displayed based on the content access information shown on a movable object, and content can still be displayed when the situation is inconsistent. This helps prevent the user experience from being impaired when environmental factors would otherwise make it impossible to display content video.

Brief Description of the Drawings

[0036]
[Figure 1] A diagram showing an example of the hardware configuration of the communication system.
[Figure 2] A diagram illustrating the configuration of the distribution server.
[Figure 3] A diagram illustrating the configuration of the user terminal.
[Figure 4] A diagram illustrating examples of specific objects and examples of images used for identification.
[Figure 5] A diagram illustrating an example of the procedure for displaying AR content in relation to the real world.
[Figure 6] A diagram illustrating an example of how virtual objects in AR content are displayed in relation to the real world.
[Figure 7] A diagram illustrating another example of how virtual objects in AR content are displayed in relation to the real world.
[Figure 8] A diagram illustrating the relationship between a specific object and a specific image.
[Figure 9] A diagram illustrating examples of how AR content is displayed in relation to the real world.
[Figure 10] An example of the content data used for playing AR content.
[Figure 11] A flowchart showing an example of AR content playback processing.

Modes for Carrying Out the Invention

[0037] Embodiments of the present invention will be described below with reference to the drawings. However, the present invention is not limited to the following examples, and all modifications within the meaning and scope of the claims are intended to be included in the present invention. In the following description, the same elements in the drawings will be denoted by the same reference numerals, and redundant descriptions will not be repeated.

[0038] This embodiment relates to a method that lets users experience augmented reality by combining virtual three-dimensional computer graphics (3DCG) with captured images of real space and displaying the result. For augmented reality (AR) content (augmented reality images that users experience through application software that provides augmented reality; hereinafter sometimes referred to simply as "content"), the user experience is improved by displaying content that is consistent with the situation in real space. One example of AR content consistent with the situation in real space is a composite image in which computer graphics (CG) corresponding to a specific building or object in real space are displayed over that building or object in the captured image.

[0039] Traditionally, in order to play AR content that can be aligned with the conditions of the real world, methods have been employed such as reading markers displayed in the real world (for example, predetermined icons or QR codes) and using the position of the marker as a guide (for example, by aligning the marker's position with the origin of the virtual space or by using it as the display position of a virtual object) to play the AR content. In addition, even without using markers, methods have been employed to display content aligned with the real world by recognizing predetermined objects in the real world (such as the recognition of predetermined three-dimensional objects or the automatic recognition of predetermined images through image segmentation) or based on the location information of the user's terminal.

[0040] However, if content cannot be played unless the user's terminal recognizes a marker, the marker may become unreadable as the medium on which it is printed deteriorates (e.g., fading, damage, or dirt). Also, when content is played by recognizing a marker or a designated object, the user's terminal can read the marker or object properly only if there are no obstacles between it and the terminal's camera and a suitable location for the reading process has been secured. At events where many people gather, however, it may be difficult to secure a location from which the marker or other target can be recognized, or to keep the space in front of it clear of obstacles. In that case, users may be unable to play the content even though they are at the event venue, which impairs the user experience.

[0041] Furthermore, if a method is used that plays content appropriate to the real-world location based on the user's terminal location information, such as GPS data, it is limited to outdoor environments where GPS information can be properly acquired, and there is a risk that accurate location information cannot be obtained due to environmental factors such as nearby buildings. In addition, there is a risk that location information cannot be acquired properly depending on the specifications of the user's terminal's sensors.

[0042] Another method measures 3D data of the event venue in advance, determines the position of the user's device within the venue's 3D space using the device's self-position estimation function, and then plays the content. However, measuring 3D data requires an enormous amount of time and effort, and objects that are not rigid bodies can be difficult to measure, so accurate venue data sometimes cannot be obtained.

[0043] As described above, if the conditions for playing AR content depend on environmental factors in the real world, the AR content becomes difficult to play and the user experience is impaired. Therefore, in this embodiment, we prevent the user experience from being impaired by enabling AR content playback that matches the situation in the real world (such as three-dimensional shapes) without relying on real-world environmental factors. Specifically, we adopted a unique approach that presents to the user the correct real-world playback position (viewing viewpoint) for the AR content, while leaving the precision of that viewpoint and the judgment of playback timing to the user.

[0044] Figure 1 shows an example of the hardware configuration of communication system 1. Communication system 1 includes a distribution server 100 and multiple user terminals 300a, 300b, 300c, etc. The multiple user terminals 300a, 300b, 300c, etc. are terminals owned by each of the multiple users, and are collectively referred to as user terminal 300 below.

[0045] The distribution server 100 and the user terminal 300 are each capable of communicating via network 2 and sending and receiving information (data) bidirectionally. Network 2 is, for example, the internet and consists of access networks such as LAN (Local Area Network), WAN (Wide Area Network), mobile communication networks (e.g., 5G, wireless networks, etc.), wired telephone networks, FTTH (Fiber To The Home), and CATV (Cable Television) networks.

[0046] The distribution server 100 is, for example, a computer such as a workstation or personal computer with communication capabilities. The distribution server 100 manages programs and content data for providing (allowing users to experience) XR (Extended Reality), which merges the real world and the virtual world. For example, it manages a program that enables the execution of augmented reality image display processing in this embodiment on the user terminal, and a virtual space (for example, a virtual space for augmented reality image display processing, including a metaverse space) which is a virtual world constructed on the computer, and provides it to the user terminal 300 via network 2.

[0047] The virtual space is managed and configured according to the type of content provided. Available content includes, but is not limited to, augmented reality, as well as content that allows users to view and experience games, live events, programs, etc., and content that facilitates interaction and communication between users using chat, emotes, etc. Furthermore, for example, if the virtual space is a so-called metaverse space, users can access the distribution server 100 using their user terminal 300, select the desired content, and seamlessly participate in (navigate) that content, allowing them to view and experience the virtual space corresponding to that content.

[0048] Depending on the type of content, the virtual space includes 3D spaces (spaces constructed from 3D data) and 2D spaces (spaces constructed from 2D data) generated by CG (computer graphics). The virtual space also contains virtual characters (avatar objects / character objects), background objects and virtual objects according to the content type, and user-selectable menu objects.

[0049] Content data includes, for example, information to identify objects placed in the virtual space (e.g., object type, placement location, orientation, posture, and appearance), information to identify each user character of a user participating in the virtual space (e.g., user character type, placement location, orientation, posture, appearance, motion data, and audio data), objects representing backgrounds and virtual objects according to the type of content, and menu objects selected by the user.

[0050] In this embodiment, the distribution server 100 may be implemented by a single computer, or it may be implemented by multiple computers (for example, multiple servers).

[0051] The user terminal 300 is a computer capable of displaying images using augmented reality, and is, for example, a personal computer, tablet device, or smartphone, which has operation input functions and communication functions.

[0052] The user terminal 300 can communicate with the distribution server 100 and execute programs received in response to operations performed on the terminal. The user terminal 300 receives content data of the content selected by the user from the distribution server 100. Based on the received content data, the user terminal 300 constructs a virtual space of the content selected by the user from the virtual space constructed on the distribution server 100 within the user terminal 300's memory area, displays images within that virtual space, and outputs sound. This makes it possible to view and experience the virtual space of the content via the user terminal 300. For example, the user terminal 300 can provide the user with augmented reality by combining captured images (image data) obtained by the information acquisition unit (camera, etc.) with CG, etc., based on a program for augmented reality image display processing received from the distribution server 100. Hereinafter, an application that provides augmented reality (AR content) to the user by a program for augmented reality image display processing will also be referred to as an AR application. An application that provides augmented reality to the user is, for example, an application that can be installed on the user terminal 300.

[0053] <Configuration of the distribution server> Next, the configuration of the distribution server 100 will be described. As shown in Figure 2, the distribution server 100 includes a communication unit 110 that communicates with other computers, a storage unit 120 that stores various data, and a control unit 130 that controls the entire computer. The communication unit 110, the storage unit 120, and the control unit 130 are interconnected by a bus line.

[0054] The communication unit 110 is a communication interface equipped with a NIC (Network Interface Card) for wired or wireless communication. The communication unit 110 communicates with other computers via network 2.

[0055] The storage unit 120 consists of RAM (Random Access Memory), ROM (Read Only Memory), flash memory, an HDD (Hard Disk Drive), etc. The storage unit 120 stores programs for executing various control processes (for example, programs for managing and providing AR content that enables augmented reality shooting and image display, or content using virtual spaces such as the metaverse), various data, and so on. The data stored in the storage unit 120 includes content data 121, which stores object data that can be displayed in the AR content provided to the user. Although not shown in the figures, it also includes game programs containing information for identifying images in the virtual space provided for each type of content (for example, game content), and user information about the user. User information includes, for example, an ID for identifying the user and a username.

[0056] The content data 121 includes and manages information for identifying various objects (also referred to as virtual objects) placed in the virtual space, as information for the AR content provided to the user. The table of content data 121 stored in the storage unit 120 holds information such as the shapes and patterns of the various objects, motion information, placement information, and the amount of cash and virtual currency required to generate each object.

[0057] Objects include 2D and 3D CG data, such as character objects and decorative objects. This CG data can be displayed on the user terminal 300 as an augmented reality image through arbitrary user operations. Hereafter in this embodiment, virtual objects that are different from real-world objects, such as character objects and decorative objects, may also be referred to as unreal objects. Unreal objects are simply objects that are different from real-world objects in the captured image, and include various virtual objects such as objects that cannot exist in real space and objects that mimic real-world objects. Objects that correspond to real-world objects in the captured image (for example, objects, polygon masks, etc., that correspond to objects or surfaces included in an image captured from real space) may also be referred to as real objects. Furthermore, the display manner of character objects can be changed according to user operations. For example, the character's facial expression, pose, clothing, and motion can be changed. Character objects have a predetermined height according to the character.

[0058] Decorative objects include, for example, decorations such as letters and icons (stickers, stamps, etc.), frames, and particles. The display of decorative objects can be changed according to user operations. For example, the size, color, and pattern of decorative objects can be changed. Particles are visual effects accompanied by animation, such as objects composed of multiple particles like snow, flowers, rain, and light, or objects that manipulate fluids like water and fire, enabling effects such as snow falling in a designated area or a designated area catching fire. Decorative objects can be edited by the user or linked to character object data and played back as AR content.

[0059] The control unit 130 consists of a CPU (Central Processing Unit) and the like. The control unit 130 controls the overall operation of the distribution server 100 by executing programs stored in the storage unit 120.

[0060] The functional configuration of the control unit 130 is described below. The control unit 130 functions as at least a data transmission / reception unit 131, a content management unit, and a user management unit (not shown).

[0061] The data transmission / reception unit 131 receives various information transmitted from the user terminal 300 and transmits various information to the user terminal 300. The data transmission / reception unit 131 refers to various management tables stored in the storage unit 120. The various information transmitted and received by the data transmission / reception unit 131 includes, for example, information about objects placed in the virtual space managed by the content management unit (for example, operations to place, delete, or move objects associated with a user in the virtual space), as well as game information and various notifications. Information about objects placed in the virtual space includes, for example, information for displaying (playing) character objects and decorative objects stored as content data 121 as AR content on the user terminal 300.

[0062] The data used to play (display) AR content includes virtual objects placed in a virtual space, such as 3DCG three-dimensional shapes (shape models), character objects, game objects, and moving objects such as cars, as well as object information such as 2D video content and images of paintings. The data for each virtual object also includes a 3D model (for example, the object's shape and color information), motion data for the 3D model, and information about the object's placement position in the virtual space. As placement position information, for example, placement at the origin (0,0,0) of the virtual space is associated in advance. Correction values that modify this predetermined placement position may also be included, as may information about various effects applied as visual effects to the content video. The data further includes AR content data, such as games (for example, shooting games) and paintings, that is displayed as if projected onto buildings and the like, matched to the real space, when viewed from the correct position.
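As a rough illustration of the placement information described above, one virtual object's entry in the content data might be modeled as below. This is a minimal sketch under assumptions: the class name `VirtualObjectData` and all field names are hypothetical, and the shape model and motion data are reduced to string keys rather than real 3D assets.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class VirtualObjectData:
    """Hypothetical content-data entry for one virtual object (names are illustrative)."""

    object_id: str
    shape_model: str  # key or path of the 3D model (shape and color information)
    motion: str  # key or path of the motion data for the 3D model
    placement: Vec3 = (0.0, 0.0, 0.0)  # pre-associated position: the virtual-space origin
    correction: Vec3 = (0.0, 0.0, 0.0)  # optional correction to the predetermined placement

    def effective_position(self) -> Vec3:
        """Placement position after adding the correction value component-wise."""
        x, y, z = (p + c for p, c in zip(self.placement, self.correction))
        return (x, y, z)
```

For example, `VirtualObjectData("character_c", "models/c", "motions/c", correction=(0.0, 0.0, -0.5)).effective_position()` yields `(0.0, 0.0, -0.5)`: the default origin placement is kept in the record and the correction is applied only when the position is resolved.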

[0063] <User terminal configuration> Next, the configuration of the user terminal 300 will be described in detail. As shown in Figure 3, the user terminal 300 includes a communication unit 310 that communicates with other computers, including the distribution server 100; a storage unit 320 that stores various data; an operation unit 330 for inputting operations, etc.; an output unit 340 for outputting images, etc.; an information acquisition unit 350; and a control unit 360 that controls the entire computer. The communication unit 310, storage unit 320, operation unit 330, output unit 340, information acquisition unit 350, and control unit 360 are interconnected by bus lines.

[0064] The communication unit 310 is a communication interface for wired or wireless communication. The communication unit 310 communicates with other computers, including the distribution server 100, via the network 2.

[0065] The storage unit 320 is a memory composed of RAM, ROM, etc. The storage unit 320 stores programs for executing various control processes (for example, programs for running AR applications that enable augmented reality photography and image display, and programs for viewing content using virtual spaces such as the metaverse), various data, and so on. The various data includes, for example, character objects and decorative objects stored as content data 121 of the distribution server 100.

[0066] The operation unit 330 includes input devices (e.g., a touch panel, touchpad, pointing device such as a mouse, keyboard, microphone, etc.) for receiving user input and voice commands. User operations in this embodiment refer to user operations on the operation unit 330. These include, for example, touch operations on the touch panel, slide, flick, button, drag (swipe), and pinch-in/out operations, operations on icons displayed on the display unit of the user terminal 300, operations on pointing devices and keyboards, and voice input to the microphone. User operations on the operation unit 330 enable the selection, movement, rotation, and scaling of unreal objects that may be displayed in AR content and the like.

[0067] The output unit 340 includes an output device (such as a display unit, speaker, etc.) for presenting and outputting information (text, images, audio, etc.) to the user.

[0068] The information acquisition unit 350 is an input device for acquiring information about the real world, and includes various sensors such as a camera, an acceleration sensor, and a gyroscope. The information acquisition unit 350 captures images of the real world and identifies the distance (e.g., depth or depth position) from the information acquisition unit 350 to objects in the real world, as well as the shapes of those objects. It includes, for example, an imaging unit such as a depth-sensing camera that combines a lens (optical system) with an image sensor and is equipped with a depth sensor that measures the distance from the terminal to an object and the distances between objects, for example a camera with a ToF (Time of Flight) sensor. The camera may instead be a stereo camera capable of estimating depth; any camera equipped with sensors that enable depth estimation in the captured image can be used.
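The ToF sensor mentioned above rests on a simple relation: emitted light travels to the object and back, so the one-way distance is half the round-trip path length. The sketch below assumes an idealized direct ToF measurement of the round-trip time; many handheld depth sensors instead measure the phase shift of modulated light, which is not modeled here, and the function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact SI value


def tof_distance_m(round_trip_time_s: float) -> float:
    """One-way distance from an idealized direct time-of-flight measurement (hypothetical helper)."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # Light covers the camera-to-object distance twice, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

A round trip of 20 ns, for instance, corresponds to roughly 3 m, which is the order of distance at which a user would typically stand from a panel.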

[0069] The control unit 360 consists of a CPU and other components. The control unit 360 controls the overall operation of the user terminal 300 and performs the associated calculations by executing programs stored in the storage unit 320.

[0070] The functional configuration of the control unit 360 is described below. The control unit 360 includes an image processing unit 361 for displaying (playing) AR content. The image processing unit 361 functions as an acquired information analysis unit 362, an augmented reality data generation unit 363, and a display control unit 364, among others.

[0071] The acquired information analysis unit 362 performs self-position estimation and mapping (using SLAM or the like) based on the captured image (ordinary camera footage) acquired by the information acquisition unit 350 and on depth information identified (estimated) via the depth sensor of the information acquisition unit 350 (for example, image-format data generated from sensor measurements or from depth estimation by machine learning or other image processing). From these, it detects planes (horizontal or vertical) in real space, detects objects, and determines their coordinates. For example, it can identify the shape of an object in the image acquired by the information acquisition unit 350 and determine that object's position (coordinates). The acquired information analysis unit 362 also identifies camera-related information associated with the captured image, such as the camera sensor size, focal length, and brightness. Based on the information acquired by the information acquisition unit 350, the acquired information analysis unit 362 determines the coordinate system (e.g., world coordinates).
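Determining an object's coordinates from the captured image and depth information, as described above, typically involves back-projecting a pixel through a camera model. A minimal sketch, assuming an ideal pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` in pixels, an OpenCV-style camera frame (x right, y down, z forward), depth measured along the optical axis, and a camera-to-world pose (rotation and translation) supplied by self-position estimation. Lens distortion is ignored, and both function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def unproject_pixel(u: float, v: float, depth_m: float,
                    fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) with depth Z into camera coordinates (hypothetical helper)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def camera_to_world(p_cam: Vec3, rotation: Sequence[Sequence[float]],
                    translation: Vec3) -> Vec3:
    """Rigid transform p_world = R * p_cam + t using the estimated camera pose."""
    wx, wy, wz = (
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (wx, wy, wz)
```

With `fx = 500` and the principal point at (320, 240), a pixel 100 px to the right of center at a depth of 2 m back-projects to x = 0.4 m in camera coordinates; the pose then moves that point into the world coordinate system in which planes and objects are registered.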

[0072] The augmented reality data generation unit 363 generates a virtual space (a virtual space for realizing augmented reality) in which unreal objects that can be displayed as AR content (such as character objects and decorative objects) are arranged. As described below, this virtual space is information for identifying coordinates in augmented reality and may include information from which the spatial relationship between unreal objects and objects in real space (real objects) can be identified. For example, in this embodiment, positions within the virtual space may be identified based on coordinates determined by the acquired information analysis unit 362 from depth information and the like. The augmented reality data generation unit 363 places character objects, decorative objects, etc., stored in the storage unit 320 in the virtual space based on user operations. Based on the information acquired by the information acquisition unit 350, the augmented reality data generation unit 363 also places a virtual camera in the virtual space at a position corresponding to that of the camera capturing the real world, and controls the position, orientation, tilt, etc. of the virtual camera. In other words, the position, rotation, and other states of the camera photographing the real world are reflected in the parameters of the virtual camera, which moves in conjunction with the real camera. Furthermore, the augmented reality data generation unit 363 generates a composite image (composite video) that becomes AR content by combining an image of objects in the virtual space, acquired from the virtual camera, with the image captured by the information acquisition unit 350.
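The final compositing step can be pictured as the standard "over" operator applied per pixel: where the virtual layer rendered from the virtual camera has coverage (alpha 1), the CG pixel replaces the camera pixel, and where it has none (alpha 0), the captured image shows through. This is a simplified sketch with assumed names; it takes an already-rendered RGB layer and alpha channel as nested lists rather than calling any particular rendering engine, and it assumes the virtual camera's pose has already been synchronized with the real camera.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def composite_over(camera_frame: List[List[RGB]],
                   virtual_rgb: List[List[RGB]],
                   virtual_alpha: List[List[float]]) -> List[List[RGB]]:
    """Composite a rendered virtual layer over the captured frame (hypothetical helper).

    Each output channel is alpha * virtual + (1 - alpha) * camera.
    """
    out: List[List[RGB]] = []
    for cam_row, vr_row, a_row in zip(camera_frame, virtual_rgb, virtual_alpha):
        out_row: List[RGB] = []
        for cam_px, vr_px, a in zip(cam_row, vr_row, a_row):
            r, g, b = (a * v + (1.0 - a) * c for v, c in zip(vr_px, cam_px))
            out_row.append((r, g, b))
        out.append(out_row)
    return out
```

Fractional alpha values at object edges blend the two sources, which softens the boundary between the CG character and the real scene.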

[0073] Furthermore, the augmented reality data generation unit 363 generates a transparent polygon mesh (hereinafter simply referred to as a mask or mask polygon) as a real object corresponding to an object detected by the acquired information analysis unit 362. The mask is placed at coordinates (placement positions) specified by the depth information of the captured image, which converts the position of the object in real space into coordinates in virtual space. For example, by creating mask polygons from the depth information acquired by the information acquisition unit 350, the augmented reality data generation unit 363 can establish a front-to-back relationship (occlusion) between the character object and the mask polygon in virtual space. Although the mask is transparent and not displayed, it is also referred to as a real object because it corresponds to an object in the real world.
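The front-to-back relationship created by the mask polygon amounts to a per-pixel depth comparison. The sketch below is an assumption-level illustration, not the embodiment's renderer: it compares the rendered depth of the character with the depth of the real surface represented by the transparent mask and marks the character visible only where it is nearer to the camera. The result could serve as the alpha channel for compositing; the function name and data layout are hypothetical.

```python
from typing import List, Optional


def occlusion_mask(character_depth: List[List[Optional[float]]],
                   real_depth: List[List[float]]) -> List[List[float]]:
    """Per-pixel visibility of the character against real geometry (hypothetical helper).

    character_depth: rendered character depth per pixel, or None where the character is absent.
    real_depth: depth of the real surface (the transparent mask polygon) per pixel.
    Returns 1.0 where the character is in front of the real surface, otherwise 0.0.
    """
    mask: List[List[float]] = []
    for c_row, r_row in zip(character_depth, real_depth):
        mask.append([1.0 if (c is not None and c < r) else 0.0
                     for c, r in zip(c_row, r_row)])
    return mask
```

In a GPU renderer the same effect is usually obtained by drawing the mask into the depth buffer only (no color), so that character fragments behind it fail the depth test.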

[0074] The display control unit 364 causes the display unit (such as a display) to display an image corresponding to the viewing area, which is the field of view from the virtual camera, among the virtual spaces stored in the storage unit 320 based on the content data acquired by the communication unit 310. That is, the user can change the position, orientation, tilt, etc. of the virtual camera arranged in the virtual space according to the operation on the operation unit 330, and display and view an image within the field of view of the virtual camera among the images in the virtual space on the user terminal 300. Further, the display control unit 364 causes the display unit (such as a display) to display a composite image (composite video) of the image of the real space generated by the augmented reality data generation unit 363 and the image of the virtual space. The display control unit 364 renders and draws, for example, two-dimensional or three-dimensional data.

[0075] <Example of screen display mode during AR shooting> Hereinafter, referring to FIGS. 4 to 9, an example of the form of AR content (composite video by augmented reality) that can be displayed in this embodiment and an example of the procedure until the AR content is displayed will be described. In this embodiment, a video of AR content that matches a specific location, such as an event venue, can be displayed at that location. Specifically, the display mode of the AR content and the actions (staged performance) of characters and the like form a video that is consistent with the specific location in real space: for example, a staged video of a character that is geometrically consistent with the location and consistent with how the location is used. The AR content can be played on the user terminal 300 once the user terminal 300 has received content data from the distribution server 100. Usage here covers how the environment and situation of the location are put to use, for example entering through the entrance of a building, worshipping at a shrine, climbing onto a stage, or pointing at a predetermined location for guidance (possibly combined with voice data).

[0076] In this embodiment, the AR content video shows a 3DCG character in motion, and the character's movements are geometrically aligned with a specific location (the real environment of that location), as will be described later with reference to Figures 6 and 7. The video is played back in alignment with the real environment because, when the AR content playback start operation is performed, the user terminal 300 (including information that determines the viewpoint, such as the camera direction (rotation angle)) is located at the correct viewpoint (viewing viewpoint).

[0077] In other words, if the user terminal 300 is positioned at the correct viewing viewpoint and the AR content is played, the user can experience the correct content video. Conversely, if the user is experiencing the correct content video, it can be said that the viewpoint from which the AR content was played is the correct viewing viewpoint. In this embodiment, the correct viewing viewpoint includes not only the correct position but also the correct orientation. The correct orientation is the orientation from which a specific object (for example, the panel 10 described later) placed at a predetermined position and orientation (predetermined position) is viewed from the front, and the correct position is the position where the contour shape of the specific object placed at the predetermined position and orientation (predetermined position) and the shape of a predetermined image displayed on the user terminal 300 (for example, the silhouette image 22 described later) substantially coincide in the image captured by the user terminal 300 and displayed on the screen of the user terminal 300.
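Why a match between the panel's contour and the silhouette image pins down the viewing position can be seen from the pinhole camera model: an object of height H at distance Z appears about f·H/Z pixels tall, where f is the focal length in pixels. The source does not say how the silhouette image is sized, so the sketch below only assumes a frontal view of a flat panel, no lens distortion, and hypothetical function and parameter names. Under those assumptions, the on-screen height of the silhouette fixes the distance at which the contours coincide.

```python
def projected_height_px(object_height_m: float, distance_m: float, focal_px: float) -> float:
    """Apparent height in pixels of a frontal object under a pinhole model (hypothetical helper)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return focal_px * object_height_m / distance_m


def distance_for_silhouette(object_height_m: float, silhouette_height_px: float,
                            focal_px: float) -> float:
    """Camera-to-object distance at which the object's outline fills the silhouette."""
    if silhouette_height_px <= 0:
        raise ValueError("silhouette height must be positive")
    return focal_px * object_height_m / silhouette_height_px
```

With illustrative values not taken from the source (f = 1500 px, a 1.8 m panel), the panel appears 900 px tall at 3 m, and conversely a 900 px silhouette implies a 3 m viewing distance. Standing farther away shrinks the panel below the silhouette, so the size match constrains distance while the shape match constrains the frontal orientation.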

[0078] In this embodiment, a silhouette image of the character is displayed on the screen used to start AR content playback, prompting the user to move the user terminal 300 to the correct viewpoint. When the user moves the user terminal 300 so that the character silhouette image aligns with the panel and then starts the AR content (playback operation), the AR content as seen from the viewing viewpoint at which the operation was performed is played back as an image consistent with the situation in the real world. As a result, even without reading markers or acquiring location information such as GPS, the AR content can be played back at the user's chosen timing, and the image that would be seen from the correct viewing viewpoint can be displayed. The details are explained below.

[0079] Figure 4 is a diagram illustrating how content data for playing AR content is received based on content access information displayed on a panel depicting a character. As illustrated in Figure 4(A), the user terminal 300 can receive the content data for playing AR content of the character shown on panel 10 based on the content access information displayed on panel 10, which is installed at the event venue. The content access information is, for example, a two-dimensional code. Panel 10 also depicts an illustration 12 of a predetermined character. Panel 10 is placed at a predetermined location and faces a predetermined direction (hereinafter, the "predetermined location" may also include the direction it faces). The predetermined location is an ideal placement position corresponding to the correct viewing viewpoint: when the user terminal 300 is at a predetermined distance from panel 10 and photographs it from a predetermined angle, aligning the point at the camera-to-panel distance from the camera origin (where the user terminal 300 is located) with the origin of the virtual space makes the movements of character C consistent with the situation in the real world.

[0080] The user (hereinafter referred to as User U in the diagrams) scans the two-dimensional code 11 displayed on the panel 10 with the user terminal 300 at an event venue or similar location. Figure 4(B) shows an example of a screen (hereinafter also referred to as the AR start operation screen) displayed on the user terminal 300 that accepts the start operation for playing the AR content, based on the information received from the distribution server 100 accessed via the two-dimensional code.

[0081] In this embodiment, when the content data received by the user terminal 300 has been loaded, a start button 23 for playing (displaying) the AR content is displayed on the display unit of the user terminal 300, along with a silhouette image 22 of the character shown on the panel. The silhouette image 22 is an image whose size and shape match the outline of the panel 10 as seen from the correct viewing viewpoint. When the user operates the start button 23, the AR content video with the 3DCG character in motion is displayed. The AR start operation screen may be displayed only after plane detection in real space has been performed and a plane has been detected (ground detection). Alternatively, the AR start operation screen may be displayed with the start button 23 grayed out and inoperable before plane detection, and the start button 23 may be enabled after plane detection.
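The second variant above, in which the start button 23 is shown grayed out until a plane has been detected, is a small piece of UI state. A sketch follows; the class and method names are hypothetical, and how plane detection is actually reported (for example, through a callback from an AR framework) is left abstract.

```python
from enum import Enum


class StartButtonState(Enum):
    DISABLED = "disabled"  # displayed grayed out; presses are ignored
    ENABLED = "enabled"  # operable once a plane (ground) has been detected


class ARStartScreen:
    """Hypothetical model of the AR start operation screen with plane-detection gating."""

    def __init__(self) -> None:
        self.plane_detected = False
        self.playing = False

    @property
    def start_button(self) -> StartButtonState:
        return StartButtonState.ENABLED if self.plane_detected else StartButtonState.DISABLED

    def on_plane_detected(self) -> None:
        """Called when plane detection in real space succeeds."""
        self.plane_detected = True

    def on_start_pressed(self) -> bool:
        """Begin AR playback if the button is operable; return whether playback began."""
        if self.start_button is StartButtonState.DISABLED:
            return False
        self.playing = True
        return True
```

Deriving the button state from `plane_detected` rather than storing it separately keeps the displayed state and the gating logic from drifting apart.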

[0082] Figure 5 is a diagram illustrating the procedure for playing AR content from the correct viewing perspective. Figures 5(A) to 5(C) each represent a user U who photographs a panel 10 placed in real space with a user terminal 300, and below them are examples of the display screen of the user terminal 300 operated by the user U. First, the user operating the user terminal 300 overlays the character silhouette image 22, as explained in Figure 4, onto the panel 10 in real space by manipulating the position and tilt of the user terminal 300.

[0083] Specifically, as illustrated in Figure 5(A), user U points the imaging unit of user terminal 300 toward panel 10, on which a character illustration 12 is drawn. Then, as shown in the example screen of user terminal 300 at the bottom of Figure 5(A), the silhouette image 22 and the start button 23 of the AR start operation screen are displayed superimposed on the image of the real space, in which panel 10 may appear. In this embodiment, the outline of panel 10 approximates the outline of the character illustration 12 drawn on it, and the silhouette image 22 approximates the shape of the illustration. For example, the outline of panel 10 may follow the outline of illustration 12 with a margin of several centimeters to several tens of centimeters, or it may be a shape that roughly resembles the outline of the illustration.

[0084] As illustrated in Figure 5(B), user U adjusts their posture and position, such as their standing position and the angle of the user terminal 300 they are holding, so that panel 10 and silhouette image 22 are aligned. In other words, the user should preferably operate start button 23 when silhouette image 22 and panel 10, as shown in the image captured by the imaging unit of the information acquisition unit 350, are in near-perfect alignment. Near-perfect alignment may be, for example, a state in which, when the degree of alignment is determined by image segmentation, the area where panel 10 and silhouette image 22 intersect in the image reaches a predetermined value (for example, 90%) or more, or a state that a person would perceive as near-perfect alignment.
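The segmentation-based degree of alignment can be sketched as an area-overlap ratio between two binary masks. This sketch assumes the segmentation step (not shown) has already produced a mask of panel 10 and a mask of silhouette image 22 at the same resolution. It also assumes intersection-over-union as the "degree of match", which is one reasonable reading of the source rather than its stated formula.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels


def overlap_ratio(panel_mask: Mask, silhouette_mask: Mask) -> float:
    """Intersection-over-union of two same-sized binary masks."""
    if len(panel_mask) != len(silhouette_mask) or any(
        len(p) != len(s) for p, s in zip(panel_mask, silhouette_mask)
    ):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for p_row, s_row in zip(panel_mask, silhouette_mask):
        for p, s in zip(p_row, s_row):
            inter += 1 if (p and s) else 0
            union += 1 if (p or s) else 0
    # Two empty masks share no area, so treat them as not aligned.
    return inter / union if union else 0.0


def is_aligned(panel_mask: Mask, silhouette_mask: Mask, threshold: float = 0.9) -> bool:
    """True when the overlap reaches the predetermined value (90% in the example above)."""
    return overlap_ratio(panel_mask, silhouette_mask) >= threshold


panel = [[0, 1, 1, 0],
         [1, 1, 1, 1],
         [0, 1, 1, 0]]
silhouette = [[0, 1, 1, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 0]]
print(overlap_ratio(panel, silhouette))  # 7 shared pixels / 8 covered pixels = 0.875
print(is_aligned(panel, silhouette))     # False
```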

[0085] When the user operates the start button 23, as shown in Figure 5(C), an AR content video in which character C is superimposed on the image of the real world captured by the imaging unit of the information acquisition unit 350 is played on the display unit. Specifically, the image processing unit 361 and other units perform a process that matches a predetermined position in the real world with the origin of the virtual world, based on the position of the user terminal 300 at which the user operated the start button 23 (start operation). The image of the virtual world (CG image) is then combined with the image of the real world captured by the user terminal 300. Information about the origin of the virtual world is included in advance in the content data received by the user terminal 300. The origin is predetermined, for example, as a position at a predetermined distance from the camera position of the user terminal 300 (camera origin), such as 3m from the camera position; that is, the relationship between the camera origin and the origin of the virtual world is predetermined.
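Placing the virtual-world origin at a fixed distance from the camera at the moment of the start operation can be sketched as follows. Several details here are assumptions rather than statements from the source: a y-up coordinate system, a camera that looks along -z, and a floor height taken from plane detection. The function name is hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def virtual_origin(camera_pos: Vec3, camera_forward: Vec3,
                   distance: float = 3.0, floor_y: float = 0.0) -> Vec3:
    """Place the virtual-space origin `distance` metres ahead of the camera, on the floor.

    Assumes a y-up coordinate system; the camera's forward vector is flattened onto
    the horizontal plane so that tilting the terminal up or down does not move the origin.
    """
    fx, _, fz = camera_forward
    norm = math.hypot(fx, fz)
    if norm == 0.0:
        raise ValueError("camera forward vector has no horizontal component")
    cx, _, cz = camera_pos
    return (cx + distance * fx / norm, floor_y, cz + distance * fz / norm)


# Start operation at the intended viewpoint, looking along -z and slightly downward.
print(virtual_origin((0.0, 1.5, 0.0), (0.0, -0.2, -1.0)))   # (0.0, 0.0, -3.0)
# The same start operation performed 2 m to the left shifts the origin by 2 m as well.
print(virtual_origin((-2.0, 1.5, 0.0), (0.0, -0.2, -1.0)))  # (-2.0, 0.0, -3.0)
```

The second call shows why the start position matters. The origin is anchored to wherever the terminal happens to be, so any offset in the user's position carries over directly to character C.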

[0086] Furthermore, in this embodiment, the initial placement information for character C specifies placement at the origin of the virtual space and is included in the content data received by the user terminal 300, together with the character's shape model information and motion information. Note that the origin of the virtual space is not limited to a point at the distance from the camera to panel 10; it may also be set between the camera and panel 10. This makes it possible, for example, to display character C closer to the camera than panel 10 when character C is placed at the origin of the virtual space. Moreover, the relationships among the panel position, the camera origin, and the origin of the virtual space are not limited to these, and various positional relationships can be set depending on the content the creator wants to present.

[0087] When the user performs the start operation, the real-world coordinates obtained from the user terminal 300 are matched to the origin of the virtual space. As a result, character C is placed at the origin, and an AR content video is displayed in which character C is placed in the real-world image. In this embodiment, if the AR content is played from the correct viewing viewpoint, the coordinates of panel 10 coincide with the origin of the virtual space, and character C is placed at that origin. This makes character C appear to pop out of panel 10, enhancing the sense of the character's presence.

[0088] Figure 6 shows an example in which building B1 is located behind panel 10, which is placed at a predetermined position in real space. In the example of Figure 6, character C, positioned as a result of the user operating start button 23 while aligning panel 10 with silhouette image 22 as explained with reference to Figure 5 and elsewhere, performs actions based on the motion data included in the content data. Because the user performs the start operation from the correct viewing viewpoint, character C acts as if entering building B1 in real space through its entrance. In other words, because the start operation is performed from the correct viewing viewpoint (camera origin), the origin of the virtual space is set to a position at which character C moves correctly relative to the camera position used to play the content, and the content data is created so that character C performs actions consistent with the real-space situation (geometrically, in terms of intended use, and so on).

[0089] Furthermore, in this embodiment, character C, a virtual object, can be occluded by objects in real space (for example, walls, floors, and other three-dimensional objects) based on depth information acquired by the information acquisition unit 350 (imaging unit, etc.) of the user terminal 300. When an occluding object is detected in real space based on the depth information, its position is identified in the virtual space where character C is placed, and processing is performed so that virtual objects located behind the occluding object from the camera's viewpoint are not rendered. As a result, as illustrated in Figure 6, when character C enters building B1, the parts of character C behind the wall of building B1 from the camera's viewpoint are not rendered and are not displayed on the display unit.
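The occlusion described above amounts to a per-pixel depth comparison. This sketch assumes a real-world depth map aligned pixel-for-pixel with the camera image, and a rendered depth for character C at each pixel. Pixels are plain labels here for readability. A production renderer would perform the same test in a GPU depth buffer; this code is only an illustration of the comparison.

```python
from typing import List, Optional, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def composite_with_occlusion(
    camera_image: Sequence[Sequence[Pixel]],
    real_depth: Sequence[Sequence[float]],
    virtual_image: Sequence[Sequence[Optional[Pixel]]],
    virtual_depth: Sequence[Sequence[float]],
) -> List[List[Pixel]]:
    """Per-pixel depth test: draw a virtual pixel only if it is nearer than the real surface."""
    out: List[List[Pixel]] = []
    for cam_row, rd_row, v_row, vd_row in zip(camera_image, real_depth, virtual_image, virtual_depth):
        row: List[Pixel] = []
        for cam_px, rd, v_px, vd in zip(cam_row, rd_row, v_row, vd_row):
            # None means no virtual object covers this pixel.
            if v_px is not None and vd < rd:
                row.append(v_px)
            else:
                # The real-world pixel is shown: the virtual pixel is absent or behind a real surface.
                row.append(cam_px)
        out.append(row)
    return out


camera = [["sky", "wall", "wall"]]
real = [[50.0, 2.0, 2.0]]                 # metres from the camera
virtual = [["C", "C", None]]              # character C covers the first two pixels
v_depth = [[3.0, 3.0, float("inf")]]
print(composite_with_occlusion(camera, real, virtual, v_depth))  # [['C', 'wall', 'wall']]
```

In the second pixel, character C (3 m away) is behind the wall (2 m away), so the wall is drawn. This is the effect the text describes when character C walks into building B1.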

[0090] If, on the other hand, the AR content is played from a viewpoint other than the correct viewing viewpoint, the actions of character C will not correspond to the originally intended real-world situation. For example, suppose that in Figure 6 the content is played from a position shifted 2m horizontally (for example, to the left in Figure 6) from the originally intended correct viewing viewpoint. If the content data assumed the ideal position of the user terminal 300 (the correct viewing viewpoint) to be 1m in front of panel 10, then when the playback (start) operation is performed, the user terminal 300 will align the origin of the virtual space with the point 1m in front of its actual position, which is shifted 2m horizontally from the intended viewpoint, and character C will be displayed at a real-world position shifted 2m horizontally from the intended one. In that case, when the character acts according to the motion data, it will move toward the wall next to the entrance instead of entering the entrance of building B1. Because the character's coordinates then move behind those of the real-world object recognized as the wall, the occlusion process hides the character, producing a video in which it appears to disappear into the wall. The result may not match the AR content creator's intentions.

[0091] However, by performing the start operation from the correct viewing viewpoint, geometric consistency is achieved between the coordinates of the virtual space in which the character is placed and the coordinates of the real space. This allows predetermined motions that reflect the real-space situation, such as character C entering a building through its entrance. It is also possible to create a video in which character C beckons toward the outside of building B1. In Figure 6, user U is shown moving as if following character C. However, even if user U stays in place and simply turns the imaging unit of the user terminal 300 to follow the moving character C, the real-space video captured by the user terminal 300 shows character C moving in a way that is geometrically consistent with the real space. Accordingly, as character C moves toward a building or other object farther from the user terminal 300, its displayed size decreases.

[0092] Figure 7 shows an example in which panel 10 is installed at a shrine and character C performs the action of praying at the shrine. As explained with reference to Figure 5, when the user roughly matches silhouette image 22 to panel 10, the start operation is performed at the correct viewing position, and character C, positioned based on the start operation position, performs actions that match the situation of the real-world shrine (positional relationships, purpose of use, etc.). In Figure 7, after character C is displayed in response to the start operation, character C moves to the position of offering box B2 and performs the action of praying. Also, as illustrated in Figure 7, user U can move to a position in front of character C and view character C from the front, because character C is a 3DCG model.

[0093] This allows character C to perform motions that match the actual venue conditions. The motion data may also be motion capture data of the performer playing character C. Based on venue information measured in advance, markers indicating the location of panel 10, building B1, and offering box B2 may be placed in the motion capture studio. The performer may then use these markers as a guide to move into the building and perform actions such as beckoning or praying, and the data of these actions may be recorded and used as motion data for the content.

[0094] Figure 8 is a diagram illustrating how the correct viewing viewpoint (position) is achieved through the relationship between panel 10 and silhouette image 22. In this embodiment, instead of scanning a marker as in the conventional method, the AR content start screen displays a silhouette image 22 that approximates the outline of a complex-shaped panel, whose outline in turn approximates that of the character illustration. If the panel (or the illustration displayed on it) and the silhouette image superimposed on it were simple shapes such as squares, the user would be unlikely to notice discrepancies in angle and size even after aligning the silhouette image with the panel. In other words, there would be a wide range of positions from which the user might mistakenly believe they are at the correct viewing viewpoint even though the panel and silhouette image do not actually match, making it difficult to guide the user to the correct viewing viewpoint. In contrast, this embodiment adopts a complex shape with many curves instead of a simple shape. Any discrepancy between the panel and the silhouette image to be matched to its outline therefore becomes conspicuous, as illustrated in Figure 8, making it easier for the user to realize that they are not at the correct viewing viewpoint. Furthermore, because the character in this embodiment is humanoid, its outline tends to have more curves and combinations of parts of different sizes, making it more complex than simple shapes. The character's design may also include decorations such as hair ornaments or horns, which further complicate the shape. As a result, the AR content in this embodiment can guide the user of user terminal 300 to the correct viewing viewpoint without markers and without relying on location information from user terminal 300.

[0095] Figure 9 illustrates how, unlike conventional approaches, the AR content video in this embodiment can be displayed without being affected by real-world environmental factors. For example, if panel 10 is installed at an event venue, a crowd P may form in front of it. In such a case, if playback of the AR content were conditional on scanning markers or images displayed on the panel, as in conventional approaches (for example, to set the real-world location of the marker as the origin of the virtual space), the crowd could block the markers indefinitely and force users to wait. In addition, printed materials can deteriorate due to environmental factors, which may make some markers unreadable. Limitations in the reading accuracy of the user terminal 300 may also prevent the markers from being read, potentially preventing content playback indefinitely.

[0096] Furthermore, suppose the system used image segmentation to determine the degree of match between the panel shape and the silhouette image shape, and automatically played the AR content when the match exceeded a predetermined value (for example, 90% or more). If a crowd of people gathers in front of the panel, as shown in Figure 9, silhouette image 22 cannot be aligned with it. Since the range of positions that counts as the correct viewing viewpoint is already limited, even users who are near it would not see the AR content because panel 10 is hidden, and many users would miss the opportunity to experience it.

[0097] However, in this embodiment, the user can play the AR content by operating start button 23 regardless of the actual environment and regardless of whether panel 10 is visible to the user (or whether panel 10 can be detected in the information acquired by the information acquisition unit 350 of the user terminal 300). For example, as shown in Figure 9(A), suppose only part of panel 10 is visible beyond crowd P. Using this small clue, the user can, for example, align silhouette image 22 with just the top part of panel 10. Even though panel 10 is not completely visible, the user can infer that this is the correct viewing viewpoint because part of the silhouette matches. Furthermore, since panel 10 and silhouette image 22 are not simple shapes but complex shapes formed by combinations of curves and the like, the user is likely to notice any discrepancy even when only part is visible. In this embodiment, the start operation is not automatic but is left to the user. For example, even if crowd P is gathered in front of panel 10 as shown in Figure 9(A), the user can operate start button 23 to display the AR content video with character C, as shown in Figure 9(B).

[0098] In other words, playback is possible even if the panel is not visible at all, and the user is left to judge whether the viewing viewpoint is correct. Users can therefore decide, according to their own preferences, how closely the played AR content must match the real world. They may demand high accuracy, or they may be satisfied simply to be able to play and view the content even if character C moves in a way that is inconsistent with the real world. Unlike conventional approaches, this makes it possible to experience AR content that is consistent with the real world while also avoiding situations in which nothing plays and no content can be experienced.

[0099] Conventionally, the location of the user terminal 300 can be estimated from GPS information, and the relationship between that location and the coordinates of the virtual space in which the character operates can be determined so that the character's movements are correct. However, depending on environmental factors (e.g., objects blocking radio signals, or crowds) and the specifications of the user terminal 300, the location information may not be obtained correctly. In addition, GPS cannot be used indoors, which limits where the content can be experienced. Indoors, one conceivable approach is to pre-scan the three-dimensional indoor environment using photogrammetry and provide AR content in a virtual space constructed from that information. The location of the user terminal 300 is then estimated from the three-dimensional information of the real space scanned by the user terminal 300 (using a self-localization function or the like), so that virtual objects can be placed at predetermined coordinates in a virtual space aligned with the real space. However, building such an environment with photogrammetry is extremely time-consuming, because it requires a large number of images and depends on environmental factors such as the weather. Moreover, the real-world environment can change. For example, part of a building may be demolished, or non-rigid objects in the scene may change shape, so that the three-dimensional environment no longer matches its original state. Responding flexibly to each such change is time-consuming, laborious, and difficult.

[0100] Another conceivable approach is to use immovable real-world objects such as buildings as markers for determining coordinates in the virtual space, in order to align the viewpoint (viewing viewpoint). However, if only large buildings are used as alignment objects, the buildings that can serve as targets are limited in location and shape, which restricts the places where content can be provided. Moreover, AR content providers cannot freely decorate such objects as they wish, which imposes many constraints.

[0101] In contrast, in this embodiment, the real-world objects used to determine the coordinates of the virtual space are not limited to immobile or permanently installed objects; they can be various movable objects such as panels. This makes it possible to use panels that are installed only during an event and that the event provider can decorate freely.

[0102] Furthermore, with the AR content provision method of this embodiment, the operation to start the AR content is left to the user, even if the location of a panel or the like shifts due to venue circumstances or physical contact. Therefore, even if the character's movement is out of sync with the real world when the content is played, the user can find the optimal viewpoint by playing the content repeatedly. For example, in the situation of Figure 6, the user may be able to tell from the playback that the character was meant to enter the entrance of building B1 but instead walked into the wall adjacent to the door. In that case, the user can make fine adjustments by replaying the content.

[0103] In this way, even if the object to be aligned is not placed in a position that always guides the user to the correct viewpoint, the user experience will not be compromised, and the user will be able to experience the correct AR content image without being affected by real-world environmental factors.

[0104] Furthermore, if the conditions for playing AR content were determined automatically, for example by marker scanning or automatic image recognition as mentioned above, then even when scanning succeeded, not everyone could experience the correct AR content if, for example, the panel had moved. In this embodiment, however, the AR content video is played at the user's discretion, and playback can be restarted from a different position, so this risk does not arise.

[0105] The correction of the appearance position of character C in AR content will now be explained with reference to Figure 10. Figure 10 shows an example of content data stored as content data 121 on the distribution server 100 and distributed to the user terminal 300. Each character in the content data is assigned an ID. In this embodiment, a separate panel 10 is created for each character, and each panel is placed at a different location in real space. Different AR content is also provided for each character. As described above, in this embodiment the content data can be downloaded to the user terminal 300 by scanning the two-dimensional code 11 displayed on the panel 10.

[0106] The content data stored in the distribution server 100 includes, for each character ID, data for the corresponding alignment silhouette image (silhouette image 22), 3D model data for the character (display data for character C), and motion data for character C (as explained with reference to Figures 7 and 8). The content data may also include correction values (correction parameters) for the display start position (coordinates in virtual space) of character C. Each two-dimensional code 11 displayed on a panel 10 is associated with the content data for one of the content IDs. For example, when the two-dimensional code 11 associated with content ID_1 is scanned, the content data for content ID_1 is downloaded to the user terminal 300. As a result, the silhouette image data matching the outline of the panel 10 for character a (associated with content ID_1), the 3D model data of character a, and the motion data for animating character a are stored in the storage unit 320 of the user terminal 300. Based on the content data stored in the storage unit 320, the user terminal 300 displays a silhouette image 22 corresponding to the outline of character a's panel 10 and a video of the AR content in which character a is in action.
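The per-character content data of Figure 10, and its lookup from a scanned two-dimensional code 11, can be modeled with a small record type. This is a sketch only. The field names, the code payload strings, and the two lookup tables are hypothetical stand-ins, and the binary fields are placeholders rather than real assets.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ContentData:
    """One entry of content data 121, mirroring the items listed for Figure 10."""

    content_id: str
    character: str
    silhouette_image: bytes        # alignment silhouette image 22
    model_3d: bytes                # 3D model (display data) of the character
    motion: bytes                  # motion data for the character
    start_correction: Optional[Tuple[float, float, float]] = None  # optional correction parameter


# Hypothetical server-side tables: 2D code payload -> content ID -> content data.
CODE_TO_CONTENT_ID: Dict[str, str] = {"panel-a-code": "ID_1", "panel-b-code": "ID_2"}
CONTENT_DATA_121: Dict[str, ContentData] = {
    "ID_1": ContentData("ID_1", "a", b"<silhouette a>", b"<model a>", b"<motion a>"),
    "ID_2": ContentData("ID_2", "b", b"<silhouette b>", b"<model b>", b"<motion b>",
                        start_correction=(1.0, 0.0, 0.0)),
}


def download_for_code(code_payload: str) -> ContentData:
    """Resolve a scanned two-dimensional code 11 to the content data to store in storage unit 320."""
    return CONTENT_DATA_121[CODE_TO_CONTENT_ID[code_payload]]


data = download_for_code("panel-a-code")
print(data.content_id, data.character, data.start_correction)  # ID_1 a None
```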

[0107] Next, the parameters for correcting the placement location will be explained. As mentioned earlier, panel 10 may be installed at an event venue or the like. In such cases, the placement location may change due to venue constraints, or the panel's position may shift due to physical contact. If, for these reasons, the panel can no longer be installed at the originally planned location within the venue, the character's appearance position must be adjusted so that character C acts in accordance with the panel's new installation location. For example, suppose the installation location is moved 1m to the left of the originally planned location. To handle this, a correction value for the character's appearance position, based on on-site information about the change in installation location, is included in the content data corresponding to the panel, stored as content data 121 in the distribution server 100. This correction value may, for example, make the character appear at a position 1m from the origin in the x-axis direction. Therefore, even if the AR content is played from a point shifted 1m to the left of the originally planned viewing viewpoint, character C is positioned in the virtual space at coordinates corresponding to the originally planned installation location of panel 10. In this case, character C appears to the right of and slightly in front of the user. This makes it possible to position the character at a location consistent with the motion data even if the installation position is misaligned. The correction parameter may be applied either by adding its value to the original character display position coordinates or by overwriting those coordinates with it.
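The two ways of applying the correction parameter (add versus overwrite) can be sketched as follows. The function name and the `mode` switch are hypothetical, and +x is assumed to point to the user's right, consistent with the example above.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def corrected_start_position(display_pos: Vec3, correction: Optional[Vec3], mode: str = "add") -> Vec3:
    """Apply a correction parameter to character C's display start position (virtual-space coordinates).

    mode="add": the correction is an offset added to the original coordinates.
    mode="overwrite": the correction replaces the original coordinates outright.
    """
    if correction is None:
        return display_pos
    if mode == "add":
        return (display_pos[0] + correction[0],
                display_pos[1] + correction[1],
                display_pos[2] + correction[2])
    if mode == "overwrite":
        return correction
    raise ValueError(f"unknown mode: {mode!r}")


# Panel moved 1 m to the left: make character C appear 1 m along +x from the origin.
print(corrected_start_position((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))                    # (1.0, 0.0, 0.0)
print(corrected_start_position((0.5, 0.0, 0.0), (1.0, 0.0, 0.0), mode="overwrite"))  # (1.0, 0.0, 0.0)
```

The two modes coincide when the original position is the origin. They differ once the content data already places character C elsewhere, as in the second call, where adding would have given (1.5, 0.0, 0.0).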

[0108] Although the example above includes a correction value for the character's appearance position in the content data, the correction data could instead be used to correct the character's motion data. For example, if the panel is moved 1m to the left of its originally planned location, a motion in which character C moves 1m in the x-axis direction could be added to maintain consistency.
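Correcting the motion data instead of the start position could look like the following. This sketch assumes that the motion is represented as timed root-position keyframes, which is a simplification; real motion-capture data also carries joint rotations. It also assumes that the added 1 m walk takes a fixed duration. Both the representation and the function name are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Keyframe = Tuple[float, Vec3]  # (time in seconds, root position in virtual space)


def prepend_offset_walk(keyframes: List[Keyframe], offset: Vec3, walk_duration: float = 1.0) -> List[Keyframe]:
    """Insert a walk by `offset` before the original motion, then play the motion shifted by `offset`."""
    if not keyframes:
        raise ValueError("motion has no keyframes")
    t0, p0 = keyframes[0]
    shifted = [
        (t + walk_duration, (p[0] + offset[0], p[1] + offset[1], p[2] + offset[2]))
        for t, p in keyframes
    ]
    # The first keyframe is kept at the original start, so the character walks from p0 to p0 + offset.
    return [(t0, p0)] + shifted


motion = [(0.0, (0.0, 0.0, 0.0)), (2.0, (0.0, 0.0, -4.0))]
print(prepend_offset_walk(motion, (1.0, 0.0, 0.0)))
# [(0.0, (0.0, 0.0, 0.0)), (1.0, (1.0, 0.0, 0.0)), (3.0, (1.0, 0.0, -4.0))]
```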

[0109] Next, with reference to Figure 11, the flow of the aforementioned AR content playback process executed on the user terminal 300 will be explained. The AR content playback process may be made executable, for example, by installing dedicated application software for AR content playback on the user terminal 300.

[0110] In step S101 of Figure 11, content data for playing (displaying) the AR content is loaded. For example, when the two-dimensional code 11 displayed on the panel 10 installed at the event venue, as illustrated in Figure 4(A), is read by the user terminal 300, the content data is downloaded from the distribution server 100 to the user terminal 300, and the process of loading the content data is executed.

[0111] Next, in step S102, a plane scan is performed to determine a horizontal plane corresponding to the floor (ground) of the virtual space. The floor coordinates are set to correspond to the real-space plane detected by the plane scan. For example, the user may move the user terminal 300 to scan the ground or another floor surface, following plane-scan instructions displayed on the user terminal 300 by the AR application software. The acquired information analysis unit 362 then performs plane detection based on the information acquired by the information acquisition unit 350. The plane scan makes it possible to display the 3DCG of character C so that it appears to stand on the actual ground when composited with the real-space image.

[0112] For example, after the content data is downloaded in step S101, instructions to perform plane detection may be displayed as the first step in displaying the AR content based on that content data. Alternatively, after downloading, the user terminal 300 may offer the user the option of displaying the AR content or not, and plane detection may be executed if the user chooses to display it.

[0113] Furthermore, the process is not limited to downloading content data in step S101 before performing plane detection in step S102. Plane detection may instead be performed first, using software capable of displaying AR content (for example, an AR display application installed in advance). AR content may then be displayed based on content data that was downloaded earlier or is downloaded on the spot.

[0114] After the plane scan in step S102, a process is executed in step S103 to display a silhouette image and a start button on the user terminal 300; that is, the AR start operation screen is displayed. For example, as illustrated in Figure 4(B), the AR start operation screen displays a silhouette image 22 approximating the outline of panel 10 in real space, whose outline in turn approximates the shape of the character drawn on it, together with a start button 23 for starting the AR content.

[0115] In step S104, it is determined whether the start button has been operated. If it is determined that the user has operated the start button, the AR content is played in step S105, and the AR content playback process then ends. For example, when the user taps start button 23 in Figure 4(B), a video of the AR content in which character C is displayed in the real space captured by the imaging unit of the information acquisition unit 350 is played, as illustrated in Figure 5(C). Specifically, based on the position of the user terminal 300 at which the user operated start button 23 (start operation), a predetermined position in the real space is matched with the origin of the virtual space defined by the content data loaded in step S101. The image (CG) of the virtual space, in which character C may be placed, is then combined with the image of the real space captured by the user terminal 300. The origin is determined by the relationship between a predetermined camera origin and the origin of the virtual space contained in the content data. For example, the origin may be a position at a predetermined distance from the camera position of the user terminal 300 (camera origin), such as 3m from the camera position. Alternatively, it may be a position obtained by correcting that predetermined distance using correction value data contained in the content data.
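The flow of Figure 11 (S101 to S105) can be summarized as a short driver function. This is an outline sketch, not the application's actual code. The `Terminal` interface and all of its method names are hypothetical. `wait_for_start` is assumed to block until start button 23 is operated. The recording terminal at the end exists only to show the order of steps.

```python
from typing import Any, List, Protocol


class Terminal(Protocol):
    """Hypothetical interface of user terminal 300; method names are illustrative only."""

    def load_content_data(self) -> Any: ...
    def detect_plane(self) -> float: ...
    def show_start_screen(self, content: Any) -> None: ...
    def wait_for_start(self) -> None: ...
    def play_ar_content(self, content: Any, floor_y: float) -> None: ...


def ar_content_playback(terminal: Terminal) -> None:
    """Steps S101 to S105 of Figure 11."""
    content = terminal.load_content_data()      # S101: e.g. after reading two-dimensional code 11
    floor_y = terminal.detect_plane()           # S102: plane scan fixes the virtual floor
    terminal.show_start_screen(content)         # S103: silhouette image 22 + start button 23
    terminal.wait_for_start()                   # S104: blocks until start button 23 is operated
    terminal.play_ar_content(content, floor_y)  # S105: align origin to terminal pose, composite, play


class RecordingTerminal:
    """Stand-in terminal that records which step ran, for illustration."""

    def __init__(self) -> None:
        self.steps: List[str] = []

    def load_content_data(self) -> Any:
        self.steps.append("S101")
        return {"content_id": "ID_1"}

    def detect_plane(self) -> float:
        self.steps.append("S102")
        return 0.0

    def show_start_screen(self, content: Any) -> None:
        self.steps.append("S103")

    def wait_for_start(self) -> None:
        self.steps.append("S104")

    def play_ar_content(self, content: Any, floor_y: float) -> None:
        self.steps.append("S105")


t = RecordingTerminal()
ar_content_playback(t)
print(t.steps)  # ['S101', 'S102', 'S103', 'S104', 'S105']
```

Paragraph [0113] allows S101 and S102 to run in the opposite order. In this sketch, that would amount to swapping the first two calls.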

[0116] For example, as illustrated in Figures 5(A) and 5(B), a silhouette image 22 is displayed on the user terminal 300 so as to be superimposed on the image of the real space captured by the imaging unit. After the user adjusts the position and tilt of the user terminal 300 to align the silhouette image 22 with the outline of the panel 10, the start button 23 is operated, which allows the character's motion to become consistent with the situation in the real space.

[0117] Furthermore, if the AR content is started from the correct viewing viewpoint, the motion of character C is executed consistently with the real-world situation, as illustrated in Figures 6 and 7, based on the motion data included in the content data of character C. However, if the AR content is not started from the correct viewing viewpoint, then in the example of Figure 7 the AR content may show the character praying at a location entirely different from offering box B2 instead of in front of it. In such cases, character C may not only appear in empty space but may also appear to be embedded in buildings or pillars.

[0118] <Variation> The following are some examples of modifications to the embodiments described above.

[0119] (Automatic playback based on the degree of match) The embodiment described above shows an example in which the AR content, a composite of a real-world image and character C placed in a virtual space, is displayed in response to a start operation such as the user operating start button 23. However, this is not limiting. The start operation may be combined with a system that plays the content automatically when the degree of contour matching between panel 10 and silhouette image 22 in the real-world image is determined to reach a predetermined value or more. Here, the degree of contour matching refers to the degree of area matching. For example, the system may use image segmentation to extract only panel 10 from the captured image, determine whether the area where panel 10 and silhouette image 22 intersect in that image is at or above a certain value, and if so start playing the AR content. For example, if the areas match by 90% or more, display of the AR content image may start even if the user does not operate start button 23. Users can thus start the AR content at any time through the start operation described above, while the AR content is also played from a suitable viewing viewpoint even when the user does not start it. Even if the user has difficulty holding the terminal in position, for example due to hand tremor, the correct video can be played automatically, reducing the burden on the user.

[0120] Furthermore, the user may be allowed to enable or disable the automatic matching process based on image segmentation. For example, suppose panel 10 has moved from its originally planned location. In this case, even if the user aligns silhouette image 22 with panel 10 and starts the AR content, the character's movements based on the motion data in the content data may not match the real-world situation. If automatic matching is enabled in such a case, the AR content will be played while the user terminal 300 is at a position other than the correct viewing viewpoint. By disabling automatic matching, the user can instead find the correct viewing viewpoint by repeatedly performing the start operation.
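The start decision that combines the manual start button, automatic matching, and the user's on/off setting can be sketched as a single predicate. The function name is hypothetical. `match_ratio` is assumed to come from a segmentation step that is not shown, and 0.9 mirrors the 90% example above.

```python
def should_start_playback(start_tapped: bool, match_ratio: float,
                          auto_enabled: bool, threshold: float = 0.9) -> bool:
    """Decide whether to start AR playback on the current frame.

    A tap on start button 23 always starts playback (the manual path is never removed).
    Automatic start happens only if the user has left automatic matching enabled and the
    panel/silhouette match ratio reaches the predetermined value.
    """
    if start_tapped:
        return True
    return auto_enabled and match_ratio >= threshold


print(should_start_playback(False, 0.93, auto_enabled=True))   # True: automatic start
print(should_start_playback(False, 0.93, auto_enabled=False))  # False: user disabled auto matching
print(should_start_playback(True, 0.10, auto_enabled=False))   # True: manual start always works
```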

[0121] (Marker correction for positional errors of movable objects) In the above-described embodiment, an example was described, with reference to Figure 5, in which the silhouette image 22 is displayed aligned with panel 10 so that the AR content is viewed from the correct viewing viewpoint. It was also described that the user can start the AR content at any time, so the viewing viewpoint at which the AR content starts is not necessarily the correct viewing viewpoint, and that the content may not be played from the correct viewing viewpoint because panel 10 has moved from its predetermined position. For that case, an example was described in which a correction value for the display start position of character C, as exemplified in Figure 10, is sent to the user terminal 300 together with the content data, so that character C's movements are consistent with the situation in the real world when the silhouette image 22 is aligned with panel 10, even though the installation position of panel 10 has moved. However, the embodiment is not limited to this; even if the AR content is started at coordinates other than the correct viewing viewpoint, the display may be corrected to the correct position by reading a marker. For example, a two-dimensional code for position correction may be placed at the ideal position of panel 10 (for example, on a separate panel displaying the code, on the ground where panel 10 is to be installed, or on a stationary object in the nearby real space such as a building or a pillar). After the user starts the AR content, the position correction may be performed when it is determined that the position correction marker is included in the captured image (for example, when the marker is scanned).
Alternatively, the position correction marker need not be a two-dimensional code; a predetermined illustration or object may be defined in advance as the marker for playing each piece of content data in Figure 10, and the correction may be performed when that predefined marker is read.
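A translation-only sketch of the marker-based correction in [0121] follows, assuming the AR framework already reports the scanned marker's virtual-space coordinates. The function name and the comparison against an "expected" marker coordinate (where the marker would appear had playback started from the correct viewing viewpoint) are illustrative assumptions; rotation error is ignored.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def correct_start_position(character_start: Vec3,
                           marker_detected: Vec3,
                           marker_expected: Vec3) -> Vec3:
    """Shift the character's display start position by the marker error.

    marker_detected: virtual-space coordinates at which the scanned position
        correction marker was found in this session.
    marker_expected: hypothetical virtual-space coordinates the marker would
        have if the content had been started from the correct viewpoint.
    """
    # Offset between where the marker is and where it "should" be.
    offset = tuple(d - e for d, e in zip(marker_detected, marker_expected))
    # Apply the same offset so the character keeps its intended relation
    # to the real-space marker, and hence to the ideal panel position.
    return tuple(c + o for c, o in zip(character_start, offset))
```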

[0122] (Guidance marker to the appropriate location) Furthermore, a marker may be placed at the location in the real world to which character C is scheduled to move when the content is played from the correct viewing viewpoint, as shown in Figures 6 and 7. By reading this marker, character C can be guided to the coordinates of its pre-planned destination (movement path), that is, the virtual-space coordinates it would have if played from the correct viewing viewpoint, even when the AR content is not played from the correct viewing viewpoint. This makes it possible to correct the display so that character C appears in the correct position even if the user starts the AR content at a location of their choosing that differs from the ideal viewing viewpoint. In other words, regardless of whether panel 10 is placed in its predetermined position, even if the correct AR content video is not played at the time of the start operation, it can be corrected to the correct content video afterward. Also, if the user aligns the silhouette image 22 with panel 10 before playback, then even if playback starts at a location different from the correct viewing viewpoint, it is likely to start somewhat close to that viewpoint, so the guidance marker is more likely to appear in the video captured by the imaging unit.

[0123] Furthermore, guidance markers may be provided by placing panels displaying guidance QR codes along character C's path, or by attaching them to stationary objects in real space (e.g., buildings, pillars) adjacent to character C's predetermined path. After the user starts the AR content, the display position of character C may be corrected to a given position when it is determined that a guidance marker is included in the captured image (e.g., when the marker is scanned). For example, character C may be displayed at a predetermined distance relative to the detected marker coordinates (e.g., at -3 (3m) from the marker's z-coordinate). In addition, these markers are not limited to QR codes; predetermined illustrations or objects may be defined in advance as markers for playing each piece of content data in Figure 10, and the adjustment may be made when such a predetermined marker is read.
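The guidance-marker update in [0122] and [0123] might look like the following hypothetical sketch. It assumes the scanned marker's virtual-space coordinates are available, uses meters, and applies the "-3 (3m)" z-offset example literally; the function name and axis convention are assumptions.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def update_character_position(current: Vec3,
                              guidance_marker: Optional[Vec3],
                              dz: float = -3.0) -> Vec3:
    """Return character C's display position after a guidance-marker scan.

    guidance_marker: virtual-space coordinates of a detected guidance
        marker, or None if no marker is in the captured image.
    dz: offset along z from the marker, following the "-3 (3m)" example.
    """
    if guidance_marker is None:
        # Nothing scanned yet: keep the character where it is.
        return current
    x, y, z = guidance_marker
    # Place the character at a fixed offset from the detected marker.
    return (x, y, z + dz)
```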

[0124] (Correction due to error exceeding a specific value) Furthermore, when the AR content is not started from the correct viewing viewpoint, the user terminal 300 may determine the error from the correct viewing viewpoint and, if the error is determined to be greater than a certain value, correct the display position of character C so that the user can view the AR content as if from the correct viewing viewpoint. For example, as with the position correction markers and guidance markers described above, scan targets such as markers may be placed at the ideal placement position of panel 10 or at the location in real space to which character C would move if played from the correct viewing viewpoint. If the user terminal 300 determines that the distance between the character's coordinates and the coordinates of the scanned marker (for example, after conversion to virtual-space coordinates) is greater than a certain value (for example, a straight-line distance of 2m or more between the two points), the character's display position may be changed to a position corresponding to the marker's coordinates (for example, the marker's coordinates themselves, or a position shifted from them by a predetermined distance, such as 1m in a predetermined direction). Alternatively, based on venue data scanned in advance using photogrammetry (such as the shape of the venue and the objects placed in it), if the relationship between the venue data scanned by the user terminal 300 (such as 3D data based on depth information) and the character's coordinates deviates by a certain value or more from the predefined ideal positional relationship with the venue, the system may move the character to the ideal display position.
As a result, a slight deviation from the ideal display position that is still within the range acceptable for enjoying the content does not cause the character's display position to shift, so the user experience is not disturbed. When no marker is scanned (recognized by the user terminal 300), it cannot be determined whether the error exceeds the value, so the display position is not changed on the basis of the error. This includes, for example, cases where the AR content playback operation is performed in a location entirely different from the originally planned real space, such as the user's home rather than the event venue.
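The threshold rule of [0124] can be sketched as below. The 2m threshold and 1m shift come from the examples in the text; the +x shift direction, the function name, and the use of a plain Euclidean distance in virtual space are assumptions. Making `threshold_m` a parameter reflects [0125], where creators may set the criterion per content.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def maybe_correct_display_position(character: Vec3,
                                   marker: Optional[Vec3],
                                   threshold_m: float = 2.0,
                                   shift: Vec3 = (1.0, 0.0, 0.0)) -> Vec3:
    """Hypothetical sketch of error-threshold correction of character C."""
    # No marker scanned: the error cannot be judged, so leave it unchanged.
    if marker is None:
        return character
    # Straight-line distance between character and marker in virtual space.
    error = math.dist(character, marker)
    if error < threshold_m:
        # Within the acceptable range: do not move the character.
        return character
    # Error of threshold_m or more: move to the marker plus a fixed offset.
    return tuple(m + s for m, s in zip(marker, shift))
```

Small deviations are deliberately left alone, which matches the text's point that moving the character for an acceptable error would itself disturb the experience.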

[0125] Furthermore, when an unacceptable error (one outside a predetermined range) occurs, the character's display position can be changed to improve the user experience. The process of determining whether the virtual-space coordinates converted from information such as markers placed in real space deviate by more than a certain value from the coordinates at which the marker should be located in the virtual space can also be regarded as a process for determining the error between the real-space position estimated by the user terminal 300 (i.e., the actual viewing viewpoint) and the predetermined ideal viewing viewpoint. The specific distance used as the criterion for determining the error can vary with the content, so creators can set various values.

[0126] (Regarding cases where character motion data is not included) In the embodiments described above, the character displayed as AR content is a moving character (virtual object); that is, the content data for displaying AR content includes motion data, and character C performs actions consistent with the situation in real space, such as moving from place to place. However, the embodiment is not limited to this: the content data may omit motion data, and character C may perform no action. Even without actions, if the AR content is started from the correct viewing viewpoint, the pose of the displayed character C (its posture while stationary) may be made to match the situation in real space, for example sitting on a chair in real space or pointing to a predetermined location in real space. Alternatively, even if motion data is included, the display position of character C in the virtual space may remain fixed, for example at its initial placement position, while the character performs actions in place. In other words, the virtual objects displayed in AR content are not limited to moving virtual objects; they may be non-moving (or non-transforming) virtual objects, or a combination of both.

[0127] (The user can choose the appearance location) In the embodiments described above, the starting position of the character's display is predetermined relative to the camera coordinates. In addition, however, the user may be able to move character C's placement by tapping. Alternatively, based on character data associated with motion data consistent with the situation in real space, the position tapped by the user may be used as the display position of character C, and the motion may be played from there. The user may then freely adjust the tapped position while watching how the motion data plays back.

[0128] (Regarding movable objects) In the embodiment described above, panel 10, on which a character illustration is drawn and whose shape (outline) approximates that illustration, was used as an example of the real-world object to be matched with the silhouette image 22. However, the embodiment is not limited to this, and other objects may be used. For example, various movable objects can serve as the object to be matched with the silhouette image 22, such as displays that show images electronically, paper media such as posters (for example, attached to signs, bulletin boards, or other objects to which paper media can be attached), and moving objects such as cars. The silhouette image may then be a shape that approximates the outline of such an object when viewed from a predetermined position (for example, the correct viewing viewpoint).

[0129] Alternatively, a shape that approximates the outline of an illustration or other artwork drawn on such a movable object may be used as the silhouette image. An outline composed of a combination of curves of different sizes is preferable to a simple shape such as a square, because, as mentioned above, a complex shape makes it easier to identify the viewpoint. When a shape approximating the outline of an illustration or other artwork is used as the silhouette image, the display object itself, such as a panel, may have a simple shape such as a square or a circle. Conversely, panel 10 may have a complex shape while the illustration drawn on it is a simple shape such as a square. For example, even if the illustration or character design the creator wants to draw does not consist of a combination of curves, the outline of panel 10 on which it is drawn may be made complex. Then, if the shape of the silhouette image 22 roughly matches the outline of panel 10, it becomes easier to guide the user to the correct viewing viewpoint.

[0130] In this embodiment, an example was described in which the real object whose shape (outline) the silhouette image 22 is aligned with is a movable object such as panel 10. However, the embodiment is not limited to this; instead of a movable object, an immovable object with a firmly fixed position, such as a building, may be used as the object to be aligned with the silhouette image 22 from the viewing viewpoint. Even in this case, by displaying the silhouette image 22 of this embodiment on the user terminal 300 to guide the user to the correct viewing viewpoint, and by applying AR content playback processing that lets the user start the AR content at will, it is possible to avoid a deterioration of the user experience, such as being unable to experience the content when a crowd has formed in front of the building, as explained with reference to Figure 9.

[0131] Furthermore, in the above-described embodiment, the correct orientation (viewing viewpoint) was described as the one from which the alignment object, such as panel 10, is viewed from the front, and the shape of the silhouette image 22, the alignment image, was described as corresponding to that orientation. However, the silhouette image 22 may instead match the shape of the panel (or of the illustration) when panel 10 is viewed from a direction other than the front. Viewing from the front (the direction from which the alignment object appears largest) is desirable because it makes alignment easy for the user, but any viewing angle from which enough of the object is visible for alignment not to become difficult is acceptable. For example, the view of panel 10 from an oblique angle (for example, 45 degrees to the left) may be adopted as the correct viewing viewpoint. This makes it possible to create multiple pieces of content data for a single panel. Also, even if circumstances at the venue prevent the front of the panel from being used as the viewing viewpoint, the positional relationship between the panel and the viewing viewpoint can be determined to suit the real space.

[0132] Furthermore, movable objects such as panel 10 may have a different shape for each type of content (for example, for each character, each character motion, or each character costume), or the same shape regardless of content type. In addition, the correct placement position and viewing viewpoint of a movable object such as panel 10 may be predetermined to differ for each type of content. For example, multiple AR contents may be predetermined for one panel 10. First content data, which enables playback of first AR content, may display on the user terminal 300 a first silhouette image 22 corresponding to the outline of panel 10 as photographed from 1m directly in front of it. Second content data, which enables playback of second AR content, may display a second silhouette image 22 corresponding to the outline of the same panel 10 as photographed from a point 2m in front of it and shifted 2m to the side. Each piece of content data can then play character C with a different appearance and different motions.

[0133] Although movable objects such as panel 10 have been described as installed at an event venue, information identifying the correct installation location (including orientation) may be displayed on part of panel 10 so that venue staff and users can understand where it belongs. For example, a message such as "Please install 3m in front of building A", or a photograph or map that helps identify the installation location, may be displayed on the back of panel 10, opposite the side bearing illustration 12.

[0134] (Content access information displayed on movable objects) In the embodiment described above, a two-dimensional code 11 is displayed on panel 10 as information for receiving content data. However, the invention is not limited to this; instead of information that the user terminal 300 reads, other information may be displayed on panel 10. For example, keywords may be listed, and the content may become downloadable when the user enters information matching the predetermined keywords on a predetermined web page or within an application. Alternatively, the user terminal 300 may request the content data corresponding to panel 10 from the distribution server 100 in advance via an AR application, and the information displayed on panel 10 at the venue may be required to play that content data. In other words, permission to download content data, or permission to play downloaded AR content, may be granted based on content access information displayed on panel 10. For example, user information and playback permission information may be associated with each piece of content data stored in the storage unit 120 of the distribution server 100 or the storage unit 320 of the user terminal 300. The QR code 11 for downloading content data was described as displayed on the front of panel 10, but it is not limited to this and may also be displayed on the back of the panel.

[0135] Furthermore, content data may be downloadable without relying on local information. It may be distributed as data usable at event venues, or under certain restrictions. These restrictions can take various forms, such as whether a purchase operation has been performed, whether the user has the right to participate in the event, whether a specified password has been entered, or whether the current time is within a specified period. At the event venue, by performing an AR playback operation to display the content video, users can view character motions that are geometrically consistent with the real world; they can also play the content in environments where it is not consistent (for example, the user's home) and still enjoy the playback itself. As a result, even if the information displayed on panel 10 cannot be read because of crowds at the event venue or deterioration of printed materials, a screen for starting AR content playback, such as that in Figure 4(B), can be displayed.
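One way to combine the distribution restrictions listed in [0135] is shown below as a hypothetical sketch. All class and field names are assumptions, and requiring every configured restriction at once (logical AND) is only one possible policy; the text lists the restrictions as forms that may be used.

```python
import hmac
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AccessPolicy:
    # Hypothetical restriction set; None or False means "not applied".
    require_purchase: bool = False
    require_participation_right: bool = False
    password: Optional[str] = None
    valid_from: Optional[datetime] = None
    valid_until: Optional[datetime] = None


@dataclass
class UserState:
    purchased: bool = False
    has_participation_right: bool = False
    entered_password: Optional[str] = None


def can_play(policy: AccessPolicy, user: UserState, now: datetime) -> bool:
    """Return True only if every configured restriction is satisfied."""
    if policy.require_purchase and not user.purchased:
        return False
    if policy.require_participation_right and not user.has_participation_right:
        return False
    if policy.password is not None:
        entered = user.entered_password or ""
        # Constant-time comparison avoids leaking timing information.
        if not hmac.compare_digest(entered.encode(), policy.password.encode()):
            return False
    if policy.valid_from is not None and now < policy.valid_from:
        return False
    if policy.valid_until is not None and now > policy.valid_until:
        return False
    return True
```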

[0136] In the embodiments described above, downloading one piece of content data was explained with reference to Figure 10 and elsewhere, but multiple pieces may be downloaded and the user may select which to play. Multiple pieces of content data, played at the user's discretion or at random, may also be associated with one panel 10 and its corresponding silhouette image 22, allowing the user to experience multiple AR contents with a single panel.

[0137] Furthermore, once the content data has been downloaded, it may be stored in the user terminal 300 and be executable (the content can be played) at any time. Also, the corresponding content data may be updated through communication with the server or by scanning the same QR code 11 again. Alternatively, all content data may be downloaded and stored at other specific times, such as when downloading the AR application, and the content data associated with (corresponding to) the QR code 11 scanned in step S101 may be made available for playback.

[0138] (Regarding processes that can be executed on distribution server 100) In the embodiment described above, the AR content playback process shown in Figure 11 is executed on the user terminal 300. However, the invention is not limited to this, and some of the processing (for example, some functions of the image processing unit 361) may be performed on the distribution server 100. For example, imaging information and terminal position estimation information acquired by the information acquisition unit 350 of the user terminal 300 may be transmitted to the distribution server 100, which, based on the received information, streams virtual space data containing the non-real objects to the user terminal 300; the user terminal 300 then generates the composite AR content image. Alternatively, the information acquired by the information acquisition unit 350 may be transmitted to the distribution server 100, which identifies, from the depth information derived from that information, the shape and coordinates of real objects (information for generating a transparent polygon mesh). Based on the identified objects, the distribution server 100 may generate virtual space data containing a polygon mask corresponding to the real objects and distribute it, with the non-real objects placed, to the user terminal 300.

[0139] <Examples of configurations and effects>

[0140] (1) In the above-described embodiment, the distribution server 100 and the user terminal 300 can generate augmented reality (AR) content video by compositing a virtual object, such as character C, a moving 3DCG character, onto video of the real space captured by the information acquisition unit 350. When the user operates the start button 23 for starting AR content playback, displayed on a start operation screen such as Figure 4(B), the AR content playback process shown in Figure 11 displays on the screen of the user terminal 300 either AR content video in which the 3DCG character C shown in Figure 5(C) is composited with the real space, or video of character C moving in the real space, as shown in Figures 6 and 7. After the planar scan in step S102 of Figure 11, in step S103 the user terminal 300 displays the start button 23 and accepts an operation to start the AR content, as in Figure 4(B) and Figure 5(A) and (B). During this period, a silhouette image, such as the silhouette image 22 in Figure 5(A) and (B), is displayed; it approximately matches the outline of a specific object such as panel 10 (or the outline of the illustration drawn on it) as that object would appear, placed at the ideal installation location in the real space, when the real space is imaged from the predetermined viewing viewpoint, that is, from a predetermined distance and direction relative to the predetermined installation location of panel 10. The video of virtual objects such as character C operating in the virtual space is designed to be viewed from this correct viewing viewpoint.
When the user operates the start button 23 on a start operation screen such as Figure 4(B), the video is displayed with the character performing actions that match (align with) the situation in the real space (for example, the location and use of buildings) relative to the video of the real space captured by the imaging unit of the information acquisition unit 350, as illustrated in Figures 6 and 7. As a result, the augmented reality content video can be displayed whenever the user performs the start operation, without requiring positioning information such as markers to be read as a condition for display. In other words, the judgment of positioning is left to the user. This helps prevent the user experience from being compromised when environmental factors would otherwise prevent the content video from being displayed, and, when the real space is captured from the correct viewing viewpoint, makes it possible to display content video in which the movement of virtual objects is consistent with the situation in the real world, increasing enjoyment.

[0141] (2) In the above-described embodiment, the virtual object operating within the AR content video is a three-dimensional object such as character C, which is a 3DCG avatar object. This allows the user to see the movements of CG such as characters in three-dimensional video, thereby enhancing the user's interest.

[0142] (3) In the embodiment described above, panel 10, placed in real space as the target of the silhouette image 22, bears an illustration 12 of character C displayed in the AR content, and the outline of panel 10 is formed to approximate the outline of illustration 12. As a result, the identification image used for matching with a specific object on the screen, such as the silhouette image 22, is a silhouette that follows the outline of the character illustration; this enhances user interest and also allows content providers to use complex shapes as identification images.

[0143] (4) In the embodiment described above, as illustrated in Figure 5(B), when the start button 23 is operated on the user terminal 300 screen with the silhouette image 22 superimposed on the panel 10, the real-world coordinates of the panel 10 are aligned with the origin of the virtual space, and character C is placed at the origin, resulting in an AR content video that appears as if character C is popping out of the panel 10. As a result, when the user starts the operation with the silhouette image aligned with the character's illustration, the content video is displayed in a manner that makes it appear as if the character is emerging from a specific object, thereby improving the sense of the character's presence.
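Under strong assumptions, the origin alignment in (4) can be sketched as placing the virtual-space origin a fixed distance in front of the camera at the moment of the start operation. This is a hypothetical illustration, not the embodiment's method: it assumes the start operation is performed from the correct viewing viewpoint, uses only the camera's yaw, and takes the vertical position from the plane found in the planar scan; none of the names come from the document.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def virtual_origin_from_camera(camera_pos: Vec3,
                               camera_yaw_rad: float,
                               panel_distance_m: float,
                               ground_y: float = 0.0) -> Vec3:
    """Place the virtual-space origin at the panel's estimated position.

    Assumes the panel lies panel_distance_m straight ahead of the camera.
    Only yaw is used: at yaw 0 the camera faces +z, at yaw pi/2 it faces +x.
    """
    x, _, z = camera_pos
    return (x + panel_distance_m * math.sin(camera_yaw_rad),
            ground_y,  # e.g. the height of the plane found in the planar scan
            z + panel_distance_m * math.cos(camera_yaw_rad))
```

Character C would then be spawned at the returned point, so that it appears to emerge from the panel.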

[0144] (5) In the embodiment described above, the specific object, such as the panel 10 that is the target of the silhouette image 22, is an object whose placement can be changed depending on the circumstances of the event venue. This makes it possible to move the specific object according to the situation in the real space, thus improving convenience for both the content creator and the event venue.

[0145] (6) In the embodiment described above, even if the silhouette image 22 does not match panel 10, the AR content video can be played in step S105 if it is determined in step S104 that the user has operated the start button 23. The content video can therefore be displayed whenever the user performs the start operation, even if the outline of a specific object such as a panel does not match the specific image, that is, the silhouette image matching that outline. This prevents the content video from failing to display, and the user experience from being impaired, because of environmental factors such as missing markers or location information.

[0146] (7) In the above-described embodiment, the degree of agreement between the area of panel 10 and the area of silhouette image 22 may be determined by image segmentation. When it is determined that the degree of agreement exceeds a predetermined value, such as 90% or more of the area where panel 10 and silhouette image 22 intersect, the AR content can be played regardless of whether the user has operated the start button 23 or not. This increases user convenience because, if the identification image matches the outline of a specific object (for example, the area where they intersect), the content video can be displayed without any user operation.
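The "degree of agreement" in (7) is not defined precisely; one common choice, assumed here, is the intersection over union (IoU) of the two binary masks. The sketch uses plain nested lists to stay self-contained, and the function name is hypothetical; a real implementation would operate on the segmentation output directly.

```python
from typing import Sequence

Mask = Sequence[Sequence[bool]]


def mask_agreement(panel_mask: Mask, silhouette_mask: Mask) -> float:
    """Intersection over union of two same-sized binary masks (0.0 to 1.0)."""
    inter = 0
    union = 0
    for prow, srow in zip(panel_mask, silhouette_mask):
        for p, s in zip(prow, srow):
            if p and s:
                inter += 1
            if p or s:
                union += 1
    # Two empty masks are treated as "no agreement".
    return inter / union if union else 0.0
```

A value of 0.9 or more would correspond to the 90% example that triggers playback without a button press.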

[0147] (8) In the embodiment described above, the AR content video is stored as content data 121 such as that in Figure 10, is downloaded to and read by the user terminal 300 based on the two-dimensional code 11 displayed on panel 10, and is displayed based on each piece of content data, which includes the coordinates of the character's display start position in the virtual space (such as placement at the origin). The content data also includes correction value data for correcting the display start position of a predetermined virtual object, such as the display start position correction parameters exemplified in Figure 10. This correction value is adjusted and determined according to how the placement position of the specific object in the real space has moved; for example, if panel 10 has moved 1m to the left of its predetermined position, the start position is corrected 1m to the right. As a result, the display start position of the virtual object can be corrected according to the placement position of the specific object, so even if the specific object has moved from its pre-planned ideal placement position because of venue circumstances or physical contact, content video can be displayed in which the movement of the virtual object is consistent with the real situation.
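A hypothetical subset of a Figure 10 content-data record with a display start position correction is sketched below. Field names are assumptions, +x is assumed to point right, and the helper encodes the example in (8): a panel moved 1m to the left yields a start position corrected 1m to the right.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


def correction_from_displacement(panel_displacement: Vec3) -> Vec3:
    """Hypothetical helper: the correction opposes the panel's displacement."""
    return tuple(-d for d in panel_displacement)


@dataclass
class ContentData:
    """Hypothetical subset of a content-data record."""
    content_id: str
    start_position: Vec3 = (0.0, 0.0, 0.0)    # e.g. the virtual-space origin
    start_correction: Vec3 = (0.0, 0.0, 0.0)  # correction parameter

    def corrected_start_position(self) -> Vec3:
        # Add the per-axis correction to the nominal display start position.
        return tuple(s + c for s, c in zip(self.start_position,
                                           self.start_correction))
```

Because the user aligns the silhouette with the moved panel, their viewpoint moves with it; shifting the start position the opposite way is one reading of how the character stays consistent with the rest of the venue.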

[0148] (9) In the above-described embodiment, a process determines whether the error between the real-world position of the user terminal at the time of the start operation (the actual viewing viewpoint), estimated by a process such as GPS-based positioning or SLAM-based self-position estimation, and the coordinates of the predetermined ideal viewing viewpoint is greater than or equal to a specific value. For example, if the distance between the character's coordinates and a marker placed at the ideal display position of the panel or the character is 2m or more, or if the correspondence between the venue data (the real objects in the actual venue) and the coordinates at which the character is displayed falls outside a predetermined acceptable range, the character's display position can be corrected to the coordinates it would have if played from the predetermined correct viewing viewpoint. This makes it possible to correct the content video so that the movement of the virtual object is consistent with the situation in the real world, even if the start operation is performed from a viewpoint different from the predetermined viewing viewpoint.

[0149] (10) In the above-described embodiment, even if the placement location of panel 10 has moved, the coordinates (display position) of character C in the content video being played can be corrected, by scanning a marker that identifies the predetermined position of a specific object (such as a correction marker placed on or near the ideal placement location of panel 10), to the ideal position at which it would have been displayed had the user started the content from the predetermined viewing viewpoint. This makes it possible to correct the content video so that the movement of the virtual object is consistent with the situation in the real world, even if the placement location of the specific object has moved from its predetermined position. Furthermore, even if panel 10 has not moved but the user started the AR content from an undesirable position, the display position of the virtual object can be corrected when the user terminal 300 reads a position correction marker. As a result, user satisfaction can be improved while avoiding the loss of user experience that would result from the content video not being displayed.

[0150] (11) In the above-described embodiment, regardless of whether panel 10 is positioned at its predetermined ideal location, scanning guidance markers placed in the real world for correcting the display position of virtual objects (such as markers placed near the predetermined movement path of character C) changes the coordinates (display position) to which character C moves, so that character C's movements are displayed consistently with the situation in the real world. As a result, even if the start operation is performed from a viewpoint different from the predetermined viewing viewpoint, the content video can be corrected so that the movements of virtual objects are consistent with the situation in the real world.

[0151] (12) In the above-described embodiment, depth information is identified through image processing and through processing of depth information acquired by a depth sensor, and the coordinates and shapes of real objects in the captured video are identified from that depth information. Converting the positions of these real objects into virtual-space coordinates enables mask processing based on depth information, and the position and rotation of the camera capturing the real space are reflected in the parameters of a virtual camera linked to it. AR content video can therefore be displayed in which virtual objects such as character C are occluded by objects in the real space. As a result, content video with stronger geometric consistency between the real space and the virtual space can be displayed based on depth information, improving the sense of reality.
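In the embodiment, the occlusion effect in (12) comes from depth-based masking with a linked virtual camera. The following sketch swaps in a simpler per-pixel depth test, stated plainly as a substitute technique: a virtual pixel is shown only where the virtual object is closer to the camera than the real surface. Inputs are assumed to be same-sized, camera-aligned arrays, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def composite_with_occlusion(camera_rgb: Sequence[Sequence[RGB]],
                             real_depth: Sequence[Sequence[float]],
                             virtual_rgb: Sequence[Sequence[RGB]],
                             virtual_depth: Sequence[Sequence[Optional[float]]]
                             ) -> List[List[RGB]]:
    """Per-pixel AR compositing with a depth test.

    camera_rgb / real_depth: captured frame and its depth map (meters).
    virtual_rgb / virtual_depth: rendered virtual layer; virtual_depth is
        None where no virtual object covers the pixel.
    """
    out: List[List[RGB]] = []
    for y, row in enumerate(camera_rgb):
        out_row: List[RGB] = []
        for x, real_px in enumerate(row):
            vd = virtual_depth[y][x]
            if vd is not None and vd < real_depth[y][x]:
                # Virtual object is in front of the real surface: draw it.
                out_row.append(virtual_rgb[y][x])
            else:
                # Real object occludes it, or nothing virtual is here.
                out_row.append(real_px)
        out.append(out_row)
    return out
```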

[0152] (13) In the above-described embodiment, a movable object that can be installed in the real world, such as a panel or a display, displays content access information such as a two-dimensional code or a URL. Based on this content access information, content data for playing AR content can be downloaded, and permission can be granted to play AR content video based on that content data. The content access information also causes a silhouette image 22, such as those in Figure 5(A) and (B), to be displayed. The silhouette image 22 substantially matches the outline of the panel 10 and the outline of the illustration drawn on it as they appear when photographed from a predetermined viewing viewpoint (for example, a viewpoint that views the panel 10 from a predetermined distance and direction) relative to the installation location of the panel 10, a movable object whose location was planned in advance. When the panel 10 is installed at the correct predetermined location and the user operates the start button 23 from the corresponding correct viewing viewpoint, a virtual space image that is geometrically consistent with the real space image being captured, and that therefore enables AR content effects, can be composited with the real space image and displayed on the user terminal 300 as AR content video. On the other hand, even if the panel 10 is not placed at the originally planned location, or playback is not started from the planned viewing viewpoint, the AR content video itself can still be played, although the movements of character C may be displayed in a manner that does not correspond to the originally planned real-world situation. This makes it possible to display content video that is consistent with the real-world situation based on content access information displayed on a movable object, while still allowing the video to be displayed in inconsistent situations. Therefore, the user experience can be prevented from being impaired by an inability to display content video due to environmental factors or other reasons.
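The playback behavior in (13), where content always plays on the start operation but is only expected to match the real scene when the panel and viewpoint are as planned, can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: it assumes the silhouette image 22 and the outline of the panel 10 as seen in the camera frame are available as same-sized binary masks, and it uses intersection-over-union with a hypothetical threshold of 0.8 as the alignment measure. `silhouette_iou`, `on_start_pressed`, and `PlaybackDecision` are illustrative names.

```python
from dataclasses import dataclass
from typing import List

# Assumed representation: True where the outline covers a pixel.
Mask = List[List[bool]]


def silhouette_iou(silhouette: Mask, observed: Mask) -> float:
    """Intersection-over-union of two same-sized binary masks."""
    inter = 0
    union = 0
    for s_row, o_row in zip(silhouette, observed):
        for s, o in zip(s_row, o_row):
            if s and o:
                inter += 1
            if s or o:
                union += 1
    return inter / union if union else 0.0


@dataclass
class PlaybackDecision:
    play: bool     # whether the AR content video is played
    aligned: bool  # whether its motion is expected to match the real scene


def on_start_pressed(silhouette: Mask, observed: Mask,
                     threshold: float = 0.8) -> PlaybackDecision:
    """Handle the start operation (hypothetical handler).

    Playback is never blocked: a misplaced panel or an off-plan viewpoint
    still plays the content; only the expected consistency differs.
    """
    score = silhouette_iou(silhouette, observed)
    return PlaybackDecision(play=True, aligned=score >= threshold)


if __name__ == "__main__":
    planned = [[True, True, False]]
    print(on_start_pressed(planned, [[True, True, False]]))   # panel where planned
    print(on_start_pressed(planned, [[False, False, True]]))  # panel moved
```

How the observed mask is extracted from the camera frame (for example, by segmenting the panel) is outside the scope of this sketch.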

[0153] [Examples of implementation using software] The various control blocks of the control unit in the server, terminal, or other computer in the embodiments described above may be implemented by logic circuits (hardware) formed on an integrated circuit (IC chip), or by software using a CPU (Central Processing Unit). When implemented by software, the computer equipped with the control unit includes a CPU that executes the instructions of a program, which is software that realizes each function; a ROM (Read Only Memory) or storage device (collectively referred to as a "recording medium") on which the program and various data are recorded in a form readable by the computer (or CPU); and a RAM (Random Access Memory) into which the program is loaded. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" such as a tape, disk, card, semiconductor memory, or programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or broadcast wave). One aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[0154] The embodiments disclosed herein should be considered in all respects to be illustrative and not restrictive. The scope of this invention is indicated by the claims rather than by the foregoing description, and all modifications within the meaning and scope equivalent to the claims are intended to be included.

[Explanation of Symbols]

[0155] 1 Communication system, 2 Network, 100 Distribution server, 300 User terminal, 110 Communication unit, 120 Storage unit, 310 Communication unit, 320 Storage unit, 330 Operation unit, 340 Output unit, 350 Information acquisition unit, 360 Control unit

Claims

1. A program that causes a computer capable of generating augmented reality content by compositing moving virtual objects onto images of real space to function as: means for performing a process to display content video, which is a composite of video of the real space captured by the user and video of virtual objects operating in a virtual space, when the user performs a start operation; and means for performing, during the period in which the start operation is accepted, a process to display a specific image, which is a silhouette image that matches a specific object that can be placed at a predetermined position in the real space when the real space is imaged from a predetermined viewing viewpoint, wherein the video of the virtual object operating in the virtual space is video in which, when the user performs the start operation from the viewing viewpoint, the movement of the virtual object is consistent with the situation in the real space in relation to the captured video of the real space.

2. The program according to claim 1, wherein the virtual object is an object with a three-dimensional solid shape.

3. The program according to claim 1, wherein the virtual object is a character, the specific object placed in real space has an illustration of the character displayed on it, and the outline of the specific object is the outline of the illustration of the character.

4. The program according to claim 3, wherein the video of the character operating in the virtual space is video that is started when the user performs the start operation with the specific image superimposed on the outline of the character's illustration, so that the character displayed on the specific object appears in the real space.

5. The program according to claim 1, wherein the specific object can be placed at a position in real space different from the predetermined position.

6. The program according to claim 1, wherein the means for performing the process to display the content video displays the content video, which is played back based on a start operation by the user, when the start operation is performed, even if the specific image and the contour of the specific object placed in real space do not match.

7. The program according to claim 6, wherein the program further causes the computer to function as means for performing a determination process that determines, based on video of the real space captured by the user, the degree of match between the contour of the specific image and the contour of the specific object placed in the real space, and wherein, when the means for performing the determination process determines that the degree of match exceeds a predetermined value, the means for performing the process for displaying the content video displays the content video without any start operation by the user.

8. The program according to claim 1, wherein the content video is displayed based on content display data that includes information on the display start position of the virtual object, the content display data includes correction value data for correcting the predetermined display start position of the virtual object, and the correction value is determined according to the position of the specific object placed in real space.

9. The program according to claim 1, wherein the program further causes the computer to function as: means for performing an estimation process that estimates the position in real space of the user terminal when the user performs the start operation; and means for performing a determination process that determines whether the error between the position in real space estimated by the means for performing the estimation process and the coordinates of the predetermined viewing viewpoint is greater than or equal to a specific value, and wherein, when the means for performing the determination process determines that the error is greater than or equal to the specific value, the means for performing the process for displaying the content video corrects the display position of the virtual object to the position at which it would be displayed if the user had performed the start operation from the predetermined viewing viewpoint.

10. The program according to claim 5, wherein, even if the location of the specific object has moved, the means for performing the process for displaying the content video corrects the display position of the virtual object, by scanning a marker placed in real space for identifying the predetermined position of the specific object, to the position at which it would be displayed if the user had performed the start operation from the predetermined viewing viewpoint.

11. The program according to claim 5, wherein, regardless of whether the specific object was placed at the predetermined position in the real space, the means for performing the process for displaying the content video moves the display position of the virtual object in the content video, by scanning a guidance marker placed in the real space for correcting the display position of the virtual object, so that the movement of the virtual object is displayed in a manner consistent with the situation in the real space.

12. The program according to claim 1, wherein the program further causes the computer to function as: means for performing an acquisition process that acquires depth information making it possible to identify the depth position of each real object that may be included in video of real space captured by the user; and means for performing an identification process that, based on the depth information identifiable from the information acquired by the means for performing the acquisition process, identifies real objects that may be included in the captured video and identifies the depth positions of those real objects in the virtual space in which the virtual object operates, and wherein the means for performing the process to display the content video moves the position of a virtual camera placed in the virtual space in conjunction with the user's movement of the imaging viewpoint in the real space, and, even when the position of the imaging viewpoint is moved, displays the content video such that, based on the depth information identified by the means for performing the acquisition process, the portion of the virtual object that overlaps a real object located in front of the virtual object is hidden and the portion that does not overlap is displayed.

13. A method for generating augmented reality content video by compositing moving virtual objects onto images of real space, the method comprising: causing a computer to perform a process to display content video, which is a composite of video of the real space captured by the user and video of virtual objects operating in a virtual space, when the user performs a start operation; and causing the computer to perform, during the period in which the start operation is accepted, a process to display a specific image, which is a silhouette image that matches a specific object that can be placed at a predetermined position in the real space when the real space is imaged from a predetermined viewing viewpoint, wherein the video of the virtual object operating in the virtual space is video in which, when the user performs the start operation from the viewing viewpoint, the movement of the virtual object is consistent with the situation in the real space in relation to the captured video of the real space.

14. A system comprising a computer capable of generating augmented reality content video by compositing moving virtual objects onto images of real space, the system comprising: means for performing a process to display content video, which is a composite of video of the real space captured by the user and video of virtual objects operating in a virtual space, when the user performs a start operation; and means for performing, during the period in which the start operation is accepted, a process to display a specific image, which is a silhouette image that matches a specific object that can be placed at a predetermined position in the real space when the real space is imaged from a predetermined viewing viewpoint, wherein the video of the virtual object operating in the virtual space is video in which, when the user performs the start operation from the viewing viewpoint, the movement of the virtual object is consistent with the situation in the real space in relation to the captured video of the real space.

15. A movable object that can be placed in real space, wherein the movable object displays content access information, and the content access information is information for displaying a silhouette image that matches the movable object as it appears when photographed from a predetermined viewing viewpoint relative to the movable object; for making it possible, when the movable object is placed at a predetermined position in real space and photographed from the viewing viewpoint, to display an augmented reality performance video that is consistent with the situation in real space by compositing it with the video of real space captured in response to the user's start operation; and for making it possible, when the movable object is not placed at the predetermined position in real space or is not photographed from the viewing viewpoint, to display the performance video, which is then inconsistent with the situation in real space, by compositing it with the video of real space captured in response to the user's start operation.