Character display device and AI assistant display device
The character display device addresses the limitations of existing airborne video display technologies by incorporating a display unit, speaker, microphone, and communication unit with a large-scale language model for enhanced brightness and interactive user experience.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- MAXELL LTD
- Filing Date
- 2023-03-16
- Publication Date
- 2026-06-12
Abstract
Description
【Technical Field】 【0001】 The present invention relates to a character display device and an AI assistant display device. 【Background Art】 【0002】 Airborne information display technology is disclosed in, for example, Patent Document 1. 【Prior Art Document】 【Patent Document】 【0003】 【Patent Document 1】 Japanese Patent Application Laid-Open No. 2019-128722 【Summary of the Invention】 【Problems to be Solved by the Invention】 【0004】 However, the disclosure of Patent Document 1 does not sufficiently consider configurations for obtaining practical brightness and quality of the airborne video, configurations that let the user view the airborne video more enjoyably, and the like. 【0005】 An object of the present invention is to provide a more suitable airborne video display device. 【Means for Solving the Problems】 【0006】 In order to solve the above problems, for example, the configuration described in the claims is adopted. This application includes a plurality of means for solving the above problems. To give one example, a character display device that allows conversation with a character may be configured as follows: it comprises a display unit capable of displaying the character, a speaker, a microphone, a communication unit, and a control unit; the communication unit is capable of communicating with a server that can perform inference with a large-scale language model, which is an artificial intelligence, transmits an instruction sentence containing natural language text information to the server, and receives a response containing natural language text information from the server; and the speaker outputs, based on the text information contained in the response, a synthesized natural language voice that the user hears as the voice of the character displayed on the display unit.
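The exchange described in paragraph 0006 (instruction sentence sent to the server, text response received, response voiced as the character) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: `StubLanguageModelServer` stands in for the remote inference server and returns a canned reply, `build_instruction` shows one possible way to wrap the user's words in an instruction sentence, and `speak` only records text where a real device would run speech synthesis and drive the speaker. The sketch also assumes the microphone input has already been converted to text.

```python
from dataclasses import dataclass, field
from typing import List


class StubLanguageModelServer:
    """Hypothetical stand-in for the server that performs LLM inference."""

    def infer(self, instruction: str) -> str:
        # A real server would run a large-scale language model here.
        # The stub returns a canned reply so the sketch runs offline.
        return f"(reply to: {instruction})"


@dataclass
class CharacterDisplayDevice:
    """Illustrative model of the control flow only, not of the hardware."""

    character_name: str
    server: StubLanguageModelServer
    spoken: List[str] = field(default_factory=list)  # log of speaker output

    def build_instruction(self, user_text: str) -> str:
        # Instruction sentence: natural language text sent to the server.
        return f"You are {self.character_name}. The user says: {user_text}"

    def speak(self, text: str) -> None:
        # Placeholder for speech synthesis followed by speaker output.
        self.spoken.append(f"{self.character_name}: {text}")

    def converse(self, recognized_text: str) -> str:
        # recognized_text is assumed to be the microphone input after
        # speech-to-text conversion (an assumption of this sketch).
        instruction = self.build_instruction(recognized_text)
        response = self.server.infer(instruction)  # communication unit
        self.speak(response)                       # voiced as the character
        return response
```

Calling `CharacterDisplayDevice("Mika", StubLanguageModelServer()).converse("Hello")` runs one conversational turn; swapping the stub for a client of an actual inference service would leave the device-side flow unchanged.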
【Effects of the Invention】 【0007】 According to the present invention, a more suitable airborne video display device can be realized. Other problems, configurations, and effects will be clarified in the description of the following embodiments. 【Brief Description of the Drawings】 【0008】 [Figure 1] This figure shows an example of how to use a spatially floating image display device according to one embodiment of the present invention. [Figure 2A] This figure shows an example of the main component configuration and retroreflective component configuration of a spatial floating image display device according to one embodiment of the present invention. [Figure 2B] This figure shows an example of the main component configuration and retroreflective component configuration of a spatial floating image display device according to one embodiment of the present invention. [Figure 2C] This figure shows an example of the main component configuration and retroreflective component configuration of a spatial floating image display device according to one embodiment of the present invention. [Figure 2D] This figure shows an example of the main components and retroreflective components of an aerial levitation image display device according to one embodiment of the present invention. [Figure 2E] This is a projection view of a retroreflective plate that constitutes an aerial levitation image display device according to one embodiment of the present invention. [Figure 2F] This is a top view of a retroreflective plate that constitutes an aerial levitation image display device according to one embodiment of the present invention. [Figure 2G] This is a perspective view showing a corner reflector that constitutes a retroreflective plate, which is part of an aerial floating image display device according to one embodiment of the present invention. 
[Figure 2H] This is a top view showing a corner reflector, which constitutes a retroreflective plate, as part of an aerial floating image display device according to one embodiment of the present invention. [Figure 2I] This is a side view showing a corner reflector, which constitutes a retroreflective plate, as part of an aerial floating image display device according to one embodiment of the present invention. [Figure 3] This figure shows an example of the configuration of a spatially floating image display device according to one embodiment of the present invention. [Figure 4A] This figure shows an example of the configuration of a spatially floating image display device according to one embodiment of the present invention. [Figure 4B] This figure shows an example of the configuration of a spatially floating image display device according to one embodiment of the present invention. [Figure 4C] This figure shows an example of the configuration of a spatially floating image display device according to one embodiment of the present invention. [Figure 4D] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4E] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4F] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4G] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4H] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. 
[Figure 4I] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4J] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4K] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4L] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4M] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4N] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 4O] It is a diagram showing an example of the configuration of a spatial floating image display device according to an embodiment of the present invention. [Figure 5] It is a cross-sectional view showing an example of the specific configuration of a light source device according to an embodiment of the present invention. [Figure 6] It is a cross-sectional view showing an example of the specific configuration of a light source device according to an embodiment of the present invention. [Figure 7] It is a cross-sectional view showing an example of the specific configuration of a light source device according to an embodiment of the present invention. [Figure 8] It is an arrangement diagram showing the main part of a spatial floating image display device according to an embodiment of the present invention. [Figure 9] This is a cross-sectional view showing the configuration of a display device according to one embodiment of the present invention.
[Figure 10] This is a cross-sectional view showing the configuration of a display device according to one embodiment of the present invention. [Figure 11] This is an explanatory diagram illustrating the light source diffusion characteristics of an image display device according to one embodiment of the present invention. [Figure 12] This is an explanatory diagram illustrating the diffusion characteristics of an image display device according to one embodiment of the present invention. [Figure 13A] This is an explanatory diagram illustrating an example of a problem that the image processing according to one embodiment of the present invention solves. [Figure 13B] This is an explanatory diagram of an example of image processing according to one embodiment of the present invention. [Figure 13C] This is an explanatory diagram illustrating an example of video display processing according to one embodiment of the present invention. [Figure 13D] This is an explanatory diagram illustrating an example of video display processing according to one embodiment of the present invention. [Figure 14] This figure shows an example of the main component configuration and retroreflective component configuration of a spatial floating image display device according to one embodiment of the present invention. [Figure 15A] This is an explanatory diagram illustrating an example of a display example of a spatial floating image display device according to one embodiment of the present invention. [Figure 15B] This is an explanatory diagram illustrating an example of a display example of a spatial floating image display device according to one embodiment of the present invention. [Figure 15C] This is an explanatory diagram illustrating an example of a display example for a floating image display device. [Figure 15D] This is an explanatory diagram illustrating an example of user perception regarding the display of a floating image display device. 
[Figure 16A] This is an explanatory diagram illustrating an example of a method for generating a rendered image of a 3D model of a character in a virtual 3D space. [Figure 16B] This is an explanatory diagram illustrating an example of the display process for a rendering image of a spatial floating image display device according to one embodiment of the present invention. [Figure 16C] This is an explanatory diagram illustrating an example of the user's visual viewing distance for a spatially floating image display device according to one embodiment of the present invention. [Figure 16D] This is an explanatory diagram illustrating an example of the calculation results for arm length based on survey results. [Figure 16E] This is an explanatory diagram illustrating an example of the focal length of a virtual 3D space camera used for rendering according to one embodiment of the present invention. [Figure 16F] This is an explanatory diagram illustrating an example of the focal length of a virtual 3D space camera used for rendering according to one embodiment of the present invention. [Figure 17A] This is an explanatory diagram illustrating an example of a display example of a spatial floating image display device according to one embodiment of the present invention. [Figure 17B] This is an explanatory diagram illustrating an example of user recognition regarding the display of a floating image display device according to one embodiment of the present invention. [Figure 18] This is an explanatory diagram illustrating an example of a predetermined region in a floating image of a floating image display device according to one embodiment of the present invention. [Figure 19A] This is an explanatory diagram of an example of a character conversation device and character conversation system according to one embodiment of the present invention. 
[Figure 19B] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 19C] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 19D] This is an explanatory diagram illustrating an example of a conversation in a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 19E] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 19F] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 19G] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 19H] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 19I] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 19J] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention.
[Figure 20A] This is an explanatory diagram of an example of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 20B] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 20C] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 20D] This is an explanatory diagram illustrating an example of a conversation in a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 20E] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 20F] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 20G] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Figure 20H] This is an explanatory diagram illustrating an example of the operation of a character conversation device and character conversation system according to one embodiment of the present invention. [Modes for carrying out the invention] 【0009】 Embodiments of the present invention will be described in detail below with reference to the drawings. 
However, the present invention is not limited to the examples described herein, and various modifications and alterations are possible by those skilled in the art within the scope of the technical ideas disclosed herein. Furthermore, in all the figures used to illustrate the present invention, components having the same function are given the same reference numerals, and repeated descriptions may be omitted. 【0010】 The following embodiments relate to an image display device capable of transmitting an image generated by image light from an image light source through a transparent component that partitions a space, such as glass, and displaying it as a floating image in space outside the transparent component. In the following description of embodiments, the image floating in space is referred to as a "floating image in space." Instead of this term, other terms such as "aerial image," "spatial image," "floating image in space," "floating optical image of a displayed image," or "floating optical image of a displayed image in space" may be used. The term "floating image in space," which is mainly used in the description of embodiments, is used as a representative example of these terms. 【0011】 According to the following embodiment, a suitable video display device can be realized for applications such as bank ATMs, train station ticket machines, and digital signage. For example, currently, bank ATMs and train station ticket machines typically use touch panels, but by using a transparent glass surface or a light-transmitting plate material, high-resolution video information can be displayed on this glass surface or light-transmitting plate material in a state of floating in space. At this time, by making the divergence angle of the emitted video light small, i.e., acute, and further aligning it to a specific polarization, only the normal reflected light is efficiently reflected by the retroreflector, resulting in high light utilization efficiency. 
This suppresses ghost images that occur in addition to the main floating image, which was a problem in conventional retroreflection methods, and allows for the acquisition of a clear floating image. Furthermore, the device including the light source of this embodiment can provide a novel and highly usable floating image display device (floating image display system) that can significantly reduce power consumption. In addition, for example, a floating image display device for vehicles can be provided that enables so-called unidirectional floating image display, which is visible inside and / or outside a vehicle. <Example 1> 【0012】 <An example of how a spatially floating image display device can be used> Figure 1 is a diagram showing an example of how to use a spatially floating image display device according to one embodiment of the present invention, and is a diagram showing the overall configuration of the spatially floating image display device according to this embodiment. The specific configuration of the spatially floating image display device will be described in detail using Figure 2, etc., but light with narrow-angle directivity and specific polarization is emitted from the image display device 1 as an image light beam, and after reflection in the optical system inside the spatially floating image display device, it enters the retroreflector plate 2, is retroreflected and passes through a transparent member 100 (glass, etc.), and forms an aerial image (spatially floating image 3), which is a real image, on the outside of the glass surface. In the following embodiments, the retroreflector plate 2 (retroreflective plate) will be used as an example of a retroreflective member. 
However, the retroreflector plate 2 of the present invention is not limited to a planar plate, but is used as an example of a concept that includes a sheet-like retroreflector attached to a planar or non-planar member, or the entire assembly in which a sheet-like retroreflector is attached to a planar or non-planar member. Furthermore, since the light rays reflected by the retroreflector 2 have imaging optical properties, the retroreflector 2 may also be described as an imaging optical member or imaging optical plate. 【0013】 Furthermore, in stores and other similar establishments, the space is partitioned by a translucent material such as glass, called a "show window" (also known as "window glass") 105. According to the spatial floating image display device of this embodiment, it is possible to transmit such a transparent material and display the floating image in one direction to the outside and / or inside of the store (space). 【0014】 In Figure 1, the inside of the window glass 105 (inside the store) is shown in the depth direction, and the outside (for example, the sidewalk) is shown in the foreground. Alternatively, by providing the window glass 105 with means for reflecting specific polarizations, it is also possible to reflect the light and form an aerial image at a desired location inside the store. 【0015】 <Example of optical system configuration for a spatially floating image display device> Figure 2A is a diagram showing an example of the configuration of the optical system of a spatially floating image display device according to one embodiment of the present invention. The configuration of the spatially floating image display device will be explained in more detail using Figure 2A. As shown in Figure 2A(1), a display device 1 is provided that emits image light of a specific polarization at a narrow angle in the oblique direction of a transparent member 100 such as glass. 
The display device 1 comprises a liquid crystal display panel 11 and a light source device 13 that generates light of a specific polarization having narrow-angle diffusion characteristics. 【0016】 The image light of the specific polarization from the display device 1 is reflected by a polarization separation member 101 (in the figure, the polarization separation member 101 is formed in a sheet shape and adhered to the transparent member 100), which has a film that selectively reflects the image light of the specific polarization, and is incident on the retroreflector 2. A λ/4 plate 21 is provided on the image light incident surface of the retroreflector 2. The image light is converted from the specific polarization to the other polarization by passing through the λ/4 plate 21 twice, once when it is incident on the retroreflector 2 and once when it is emitted. Here, the polarization separation member 101, which selectively reflects the image light of the specific polarization, has the property of transmitting light of the other polarization, so the image light, having been converted from the specific polarization to the other polarization, is transmitted through the polarization separation member 101. The image light that has been transmitted through the polarization separation member 101 forms a spatially floating image 3, which is a real image, on the outside of the transparent member 100. Note that in Figure 2A, the principal ray of the image light incident on the retroreflector 2 is shown as being incident at a 90° angle to the retroreflector 2. However, the incident angle of the principal ray of the image light on the retroreflector 2 is not limited to 90°; for example, 90°±15° can also be used. 【0017】 Here, we will describe a first example of polarization design in the optical system shown in Figure 2A.
For example, the display device 1 may emit S-polarized image light to the polarization separation member 101, and the polarization separation member 101 may have the characteristic of reflecting S-polarized light and transmitting P-polarized light. In this case, the S-polarized image light that reaches the polarization separation member 101 from the display device 1 is reflected by the polarization separation member 101 and heads towards the retroreflector 2. When the image light is reflected by the retroreflector 2, it passes twice through the λ/4 plate 21 provided on the incident surface of the retroreflector 2, so the image light is converted from S-polarized to P-polarized light. The image light converted to P-polarized light heads towards the polarization separation member 101 again. Here, since the polarization separation member 101 has the characteristic of reflecting S-polarized light and transmitting P-polarized light, the P-polarized image light passes through the polarization separation member 101 and then through the transparent member 100. Since the image light transmitted through the transparent member 100 is light retroreflected by the retroreflector 2, a floating image 3, which is an optical image of the display image of the display device 1, is formed at a position in a mirror-image relationship with the display image of the display device 1 with respect to the polarization separation member 101. With such a polarization design, a floating image 3 can be suitably formed. 【0018】 Next, a second example of polarization design in the optical system shown in Figure 2A will be described. For example, the display device 1 may emit P-polarized image light to the polarization separation member 101, and the polarization separation member 101 may be configured to reflect P-polarized light and transmit S-polarized light.
In this case, the P-polarized image light that reaches the polarization separation member 101 from the display device 1 is reflected by the polarization separation member 101 and heads towards the retroreflector 2. When the image light is reflected by the retroreflector 2, it passes twice through the λ/4 plate 21 provided on the incident surface of the retroreflector 2, so the image light is converted from P-polarized to S-polarized light. The image light converted to S-polarized light heads towards the polarization separation member 101 again. Here, since the polarization separation member 101 has the characteristic of reflecting P-polarized light and transmitting S-polarized light, the S-polarized image light passes through the polarization separation member 101 and then through the transparent member 100. Since the image light transmitted through the transparent member 100 is light retroreflected by the retroreflector 2, a floating image 3, which is an optical image of the display image of the display device 1, is formed at a position in a mirror-image relationship with the display image of the display device 1 with respect to the polarization separation member 101. With such a polarization design, a floating image 3 can be suitably formed. 【0019】 The light that forms the floating image 3 is a collection of light rays converging from the retroreflector 2 to the optical image of the floating image 3, and these light rays continue to travel in a straight line even after passing through the optical image of the floating image 3. Therefore, unlike the diffused image light formed on a screen by a typical projector, the floating image 3 is an image with high directivity. Thus, in the configuration of Figure 2A, when a user views from the direction of arrow A, the floating image 3 is visible as a bright image. However, when another person views from the direction of arrow B, the floating image 3 cannot be seen as an image at all.
This characteristic is very suitable for use in systems that display images requiring high security or highly confidential images that should be hidden from people directly facing the user. 【0020】 Furthermore, depending on the performance of the retroreflector 2, the polarization axis of the reflected image light may become uneven. The reflection angle may also become uneven. Such uneven light may not maintain the polarization state and propagation angle assumed in the design. For example, light with such an unintended polarization state and propagation angle may re-enter the image display side of the liquid crystal display panel 11 directly from the position of the retroreflector 2 without passing through the polarization separation member. Light with such an unintended polarization state and propagation angle may also be reflected by components within the floating image display device and then re-enter the image display side of the liquid crystal display panel 11. This re-entered light on the image display side of the liquid crystal display panel 11 may be re-reflected by the image display surface of the liquid crystal display panel 11 that constitutes the display device 1, potentially generating ghost images and degrading the image quality of the floating image. Therefore, in this embodiment, an absorbing polarizer 12 may be provided on the image display surface of the display device 1. The image light emitted from the display device 1 is transmitted through the absorbing polarizer 12, and the reflected light returning from the polarization separation member 101 is absorbed by the absorbing polarizer 12, thereby suppressing the above-mentioned re-reflection. This prevents image quality degradation due to ghost images of floating images in space. Specifically, if the display device 1 emits S-polarized image light to the polarization separation member 101, the absorbing polarizer 12 should be a polarizer that absorbs P-polarized light. 
Alternatively, if the display device 1 emits P-polarized image light to the polarization separation member 101, the absorbing polarizer 12 should be a polarizer that absorbs S-polarized light. 【0021】 The polarization separation member 101 described above may be formed, for example, from a reflective polarizer or a multilayer metal film that reflects a specific polarization. 【0022】 Next, Figure 2A(2) shows an example of the surface shape of a typical retroreflector 2. Light rays incident inside the regularly arranged hexagonal prisms are reflected by the walls and bottom surfaces of the hexagonal prisms and emitted as retroreflected light in the direction corresponding to the incident light, and a spatially floating image, which is a real image, is displayed based on the image displayed on the display device 1. 【0023】 The resolution of this floating image depends not only on the resolution of the liquid crystal display panel 11, but also largely on the outer diameter D and pitch P of the retroreflective portion of the retroreflective plate 2 shown in Figure 2A(2). For example, when using a 7-inch WUXGA (1920 x 1200 pixels) liquid crystal display panel, even if one pixel (one triplet) is approximately 80 μm, if the diameter D of the retroreflective portion is 240 μm and the pitch P is 300 μm, then one pixel of the floating image will be equivalent to 300 μm. As a result, the effective resolution of the floating image is reduced to about one-third. 【0024】 Therefore, in order to make the resolution of the floating image in space equivalent to the resolution of the display device 1, it is desirable to make the diameter and pitch of the retroreflective portion close to the size of one pixel of the liquid crystal display panel. On the other hand, in order to suppress the occurrence of moiré patterns caused by the retroreflective plate and the pixels of the liquid crystal display panel, it is advisable to design the pitch ratio so that it is not an integer multiple of one pixel.
Furthermore, the shape should be arranged so that none of the sides of the retroreflective portion overlap with any of the sides of one pixel of the liquid crystal display panel. 【0025】 The surface shape of the retroreflector in this embodiment is not limited to the examples described above. It may have various surface shapes that realize retroreflection. Specifically, retroreflective elements formed by periodically arranging triangular pyramidal prisms, hexagonal pyramidal prisms, other polygonal prisms, or combinations thereof may be provided on the surface of the retroreflector in this embodiment. Alternatively, retroreflective elements that form cube corners by periodically arranging these prisms may be provided on the surface of the retroreflector in this embodiment. These can also be expressed as corner reflector arrays or polyhedron reflector arrays. Alternatively, capsule lens type retroreflective elements formed by periodically arranging glass beads may be provided on the surface of the retroreflector in this embodiment. Since existing technology can be used for the detailed configuration of these retroreflective elements, a detailed explanation is omitted. Specifically, the technology disclosed in Japanese Patent Publication No. 2001-33609, Japanese Patent Publication No. 2001-264525, Japanese Patent Publication No. 2005-181555, Japanese Patent Publication No. 2008-70898, Japanese Patent Publication No. 2009-229942, etc., can be used. 【0026】 <Another example (1) of the optical system configuration for a spatially floating image display device> Another example of the optical system configuration for the floating image display device will be explained using Figure 2B. In Figure 2B, components with the same reference numerals as those in Figure 2A have the same function and configuration as those in Figure 2A. For simplicity, repeated explanations of such components will be omitted.
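The resolution and moiré considerations in paragraphs 0023 and 0024 can be checked numerically with a short Python sketch. It rests on simplifying assumptions that are not stated in the patent: the panel has square pixels, the floating image's effective pixel is taken to be the larger of the panel pixel and the retroreflector pitch, and "integer multiple" is tested with an arbitrary tolerance. The function names are illustrative only. With the figures from paragraph 0023 (80 μm pixel, 300 μm pitch), this crude model gives a ratio of roughly 0.27, in the same range as the "about one-third" stated in the text.

```python
import math


def panel_pixel_pitch_um(diagonal_inch: float, h_px: int, v_px: int) -> float:
    """Pixel pitch in micrometres, assuming square pixels."""
    diagonal_um = diagonal_inch * 25.4 * 1000.0  # inches -> micrometres
    return diagonal_um / math.hypot(h_px, v_px)


def effective_resolution_ratio(pixel_um: float, reflector_pitch_um: float) -> float:
    """Fraction of the panel resolution retained in the floating image.

    Assumes the floating image's effective pixel equals the larger of the
    panel pixel and the retroreflector pitch (a simplification).
    """
    return pixel_um / max(pixel_um, reflector_pitch_um)


def is_near_integer_multiple(reflector_pitch_um: float, pixel_um: float,
                             tol: float = 0.05) -> bool:
    """True if pitch / pixel is within tol of an integer >= 1.

    Such ratios are the ones paragraph 0024 advises avoiding to suppress
    moire; the tolerance value is an assumption of this sketch.
    """
    ratio = reflector_pitch_um / pixel_um
    nearest = round(ratio)
    return nearest >= 1 and abs(ratio - nearest) < tol


if __name__ == "__main__":
    pitch = panel_pixel_pitch_um(7.0, 1920, 1200)
    print(f"7-inch WUXGA pixel pitch: {pitch:.1f} um")
    print(f"resolution ratio: {effective_resolution_ratio(80.0, 300.0):.2f}")
    print(f"moire risk at 300 um pitch: {is_near_integer_multiple(300.0, 80.0)}")
```

The first function confirms that a 7-inch WUXGA panel has a pixel pitch close to the approximately 80 μm quoted in the text; the last one flags pitch choices such as 240 μm (exactly three pixels) that would fall on an integer multiple.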
【0027】 In the optical system shown in Figure 2B, as in Figure 2A, image light with a specific polarization is output from the display device 1. The image light with a specific polarization output from the display device 1 is input to the polarization separation member 101B. The polarization separation member 101B is a member that selectively transmits image light with a specific polarization. Unlike the polarization separation member 101 in Figure 2A, the polarization separation member 101B is not integrated with the transparent member 100, but has an independent plate-like shape. Therefore, the polarization separation member 101B may also be described as a polarization separation plate. The polarization separation member 101B may be configured as a reflective polarizer, for example, by attaching a polarization separation sheet to a transparent member. Alternatively, it may be formed by providing, on a transparent member, a metal multilayer film that selectively transmits the specific polarization and reflects the other polarization. In Figure 2B, the polarization separation member 101B is configured to transmit image light with a specific polarization output from the display device 1. 【0028】 The image light that has passed through the polarization separation member 101B is incident on the retroreflector 2. A λ / 4 plate 21 is provided on the image light incident surface of the retroreflector 2. The image light is converted from one polarization to the other by passing through the λ / 4 plate 21 twice, once when it is incident on the retroreflector 2 and once when it is emitted. Here, the polarization separation member 101B has the property of reflecting the other polarization produced by this conversion at the λ / 4 plate 21, so the image light after polarization conversion is reflected by the polarization separation member 101B. 
The image light reflected by the polarization separation member 101B passes through the transparent member 100, forming a spatially floating image 3, which is a real image, on the outside of the transparent member 100. 【0029】 Here, a first example of polarization design in the optical system shown in Figure 2B will be described. For example, the display device 1 may emit P-polarized video light to the polarization separation member 101B, and the polarization separation member 101B may have the characteristic of reflecting S-polarized light and transmitting P-polarized light. In this case, the P-polarized video light that reaches the polarization separation member 101B from the display device 1 passes through the polarization separation member 101B and heads towards the retroreflector 2. When the video light is reflected by the retroreflector 2, it passes through the λ / 4 plate 21 provided on the incident surface of the retroreflector 2 twice, so the video light is converted from P-polarized to S-polarized light. The video light converted to S-polarized light heads towards the polarization separation member 101B again. Here, since the polarization separation member 101B has the characteristic of reflecting S-polarized light and transmitting P-polarized light, the S-polarized video light is reflected by the polarization separation member 101B and transmitted through the transparent member 100. Since the image light transmitted through the transparent member 100 is light generated by the retroreflector 2, a floating image 3, which is an optical image of the display image of the display device 1, is formed at a position that is mirror-symmetric to the display image of the display device 1 with respect to the polarization separation member 101B. With such a polarization design, a floating image 3 can be suitably formed. 【0030】 Next, a second example of polarization design in the optical system shown in Figure 2B will be described. 
For example, the display device 1 may be configured to emit S-polarized video light to the polarization separation member 101B, and the polarization separation member 101B may be configured to reflect P-polarized light and transmit S-polarized light. In this case, the S-polarized video light that reaches the polarization separation member 101B from the display device 1 passes through the polarization separation member 101B and heads towards the retroreflector 2. When the video light is reflected by the retroreflector 2, it passes through the λ / 4 plate 21 provided on the incident surface of the retroreflector 2 twice, so the video light is converted from S-polarized to P-polarized light. The video light converted to P-polarized light heads towards the polarization separation member 101B again. Here, since the polarization separation member 101B has the characteristic of reflecting P-polarized light and transmitting S-polarized light, the P-polarized video light is reflected by the polarization separation member 101B and transmitted through the transparent member 100. Since the image light transmitted through the transparent member 100 is light generated by the retroreflector 2, a floating image 3, which is an optical image of the display image of the display device 1, is formed at a position that is mirror-symmetric to the display image of the display device 1 with respect to the polarization separation member 101B. With such a polarization design, the floating image 3 can be suitably formed. 【0031】 In Figure 2B, the image display surface of the display device 1 and the surface of the retroreflector 2 are arranged parallel to each other. The polarization separation member 101B is positioned at an angle α (e.g., 30°) relative to the image display surface of the display device 1 and the surface of the retroreflector 2. 
As a result, in the reflection by the polarization separation member 101B, the direction of propagation of the image light reflected by the polarization separation member 101B (the direction of the principal ray of the image light) differs from the direction of propagation of the image light incident from the retroreflector 2 (the direction of the principal ray of the image light) by an angle β (e.g., 60°). With this configuration, in the optical system of Figure 2B, image light is output at the predetermined angle shown in the figure toward the outside of the transparent member 100, forming the floating image 3, which is a real image. In the configuration of Figure 2B, when a user views from the direction of arrow A, the floating image 3 is visible as a bright image. However, when another person views from the direction of arrow B, the floating image 3 cannot be seen as an image at all. This characteristic makes it well suited to systems that display video requiring high security, or highly confidential video that should be kept hidden from people directly facing the user. 【0032】 As explained above, the optical system in Figure 2B, although having a different configuration from the optical system in Figure 2A, can form suitable floating images in space, just like the optical system in Figure 2A. 【0033】 Alternatively, an absorbing polarizer may be provided on the side of the transparent member 100 facing the polarization separation member 101B. This absorbing polarizer should transmit the polarization of the image light from the polarization separation member 101B and absorb the polarization that is 90° out of phase with the polarization of the image light from the polarization separation member 101B. In this way, the image light for forming the floating image 3 can be sufficiently transmitted, while the ambient light incident on the floating image 3 side of the transparent member 100 can be reduced by approximately 50%. 
This reduces stray light in the optical system shown in Figure 2B, which is caused by ambient light incident on the floating image 3 side of the transparent member 100. 【0034】 <Another example of the optical system configuration for a floating image display device (2)> Another example of the optical system configuration for the floating image display device will be explained using Figure 2C. In Figure 2C, components with the same reference numerals as those in Figure 2B have the same function and configuration as those in Figure 2B. Such configurations will not be repeated in order to simplify the explanation. 【0035】 The only difference between the optical system in Figure 2C and the optical system in Figure 2B is the positioning angle of the polarization separation member 101B with respect to the image display surface of the display device 1 and the surface of the retroreflector 2. All other configurations are the same as those of the optical system in Figure 2B, so repeated explanations are omitted. The polarization design of the optical system in Figure 2C is also the same as that of the optical system in Figure 2B, so repeated explanations are omitted. 【0036】 In the optical system shown in Figure 2C, the polarization separation member 101B is positioned at an angle α with respect to the image display surface of the display device 1 and the surface of the retroreflector 2. In Figure 2C, this angle α is 45°. With this configuration, in the reflection by the polarization separation member 101B, the angle β between the direction of propagation of the image light reflected by the polarization separation member 101B (the direction of the principal ray of the image light) and the direction of propagation of the image light incident from the retroreflector 2 (the direction of the principal ray of the image light) is 90°. 
With this configuration, the image display surface of the display device 1 and the surface of the retroreflector 2 are perpendicular to the direction of propagation of the image light reflected by the polarization separation member 101B, thus simplifying the angular relationships of the surfaces constituting the optical system. If the surface of the transparent member 100 is positioned perpendicular to the direction of propagation of the image light reflected by the polarization separation member 101B, the angular relationships of the surfaces constituting the optical system can be further simplified. In the configuration shown in Figure 2C, when a user views the image from the direction of arrow A, the floating image 3 is visible as a bright image. However, when another person views the image from the direction of arrow B, the floating image 3 is not visible at all. This characteristic makes it ideal for systems that display highly secure images or highly confidential images that should be concealed from people directly facing the user. 【0037】 As explained above, the optical system in Figure 2C, while having a different configuration from the optical systems in Figures 2A and 2B, can form suitable floating images in space, just like the optical systems in Figures 2A and 2B. Furthermore, the angles of the surfaces constituting the optical system can be made simpler. 【0038】 Alternatively, an absorbing polarizer may be provided on the side of the transparent member 100 facing the polarization separation member 101B. This absorbing polarizer should transmit the polarization of the image light from the polarization separation member 101B and absorb the polarization that is 90° out of phase with the polarization of the image light from the polarization separation member 101B. 
In this way, the image light for forming the floating image 3 can be sufficiently transmitted, while the ambient light incident on the floating image 3 side of the transparent member 100 can be reduced by approximately 50%. This reduces stray light in the optical system of Figure 2C caused by ambient light incident on the floating image 3 side of the transparent member 100. 【0039】 <Another example of the optical system configuration for a floating image display device 3> Another example of the optical system configuration for the floating image display device will be explained using Figure 2D. The optical system in Figure 2D is an optical system that uses a retroreflector 5, which is different from the retroreflector 2 used in Figures 2A to 2C. Below, another example of the optical system configuration 3 will be explained in more detail using Figures 2D to 2I. In Figure 2D, components that are denoted by the same reference numerals as in Figures 2A to 2C have the same function and configuration as those in Figures 2A to 2C. Such components will not be explained again in order to simplify the explanation. 【0040】 Figure 2D shows an example of the main components and retroreflective components of a spatially floating image display device according to one embodiment of the present invention. A display device 10 that emits image light is provided obliquely to a transparent member 100 such as glass. The display device 10 comprises a liquid crystal display panel 11 and a light source device 13 that generates light. 【0041】 The principal ray 9020, which represents the light beam emitted from the display device 10, travels toward the retroreflector 5 and is incident on the retroreflector 5 at an incident angle α. The incident angle α can be, for example, 45°. However, the incident angle α is not limited to 45°; for example, 45°±15° can also be used. 
【0042】 The retroreflector 5 is an optical component having optical properties that retroreflect light rays in at least some directions. Furthermore, since the reflected light rays have optical properties that form an image, the retroreflector 5 may also be described as an imaging optical component or imaging optical plate. 【0043】 The specific configuration of the retroreflector 5 will be described in detail using Figures 2E and 2F, but the retroreflector 5 causes the principal ray 9020 to propagate in the z direction while being retroreflected in the x and y directions. As a result, the reflected ray 9021 travels away from the retroreflector 5 along an optical path that is mirror-symmetric to that of the principal ray 9020 with respect to the retroreflector 5, passes through the transparent member 100, and forms a floating image 3 as a real image at the imaging plane. 【0044】 The light beam forming the floating image 3 is a collection of light rays converging from the retroreflector 5 to the optical image of the floating image 3, and these light rays continue to travel in a straight line even after passing through the optical image of the floating image 3. Therefore, unlike diffuse images formed on a screen by general projectors, the floating image 3 is an image with high directivity. Thus, in the configuration of Figure 2D, when a user views from the direction of arrow A, the floating image 3 is perceived as a bright image. However, when another person views from the direction of arrow B, the floating image 3 cannot be seen as an image at all. This characteristic is suitable for use in systems that display images requiring high security or highly confidential images that should be hidden from people directly facing the user. 【0045】 An example of the configuration of the retroreflector 5 will be explained using Figures 2E and 2F. 
The retroreflector 5 has a configuration in which multiple corner reflectors 9040 are arranged in an array on the surface of a transparent member 50. This may also be called a corner reflector array or a polyhedron reflector array. The specific configuration of the corner reflectors 9040 will be described in detail using Figures 2G, 2H, and 2I, but the light rays 9111, 9112, 9113, and 9114 emitted from the light source 9110 are reflected twice by the two mirror surfaces 9041 and 9042 of the corner reflectors 9040, becoming reflected light rays 9121, 9122, 9123, and 9124. This double reflection is a retroreflection in the x and y directions, where the light is reflected back in the same direction as the incident direction (moving in a direction rotated 180°), and in the z direction, it is a specular reflection in which the angle of incidence and the angle of reflection coincide due to total internal reflection. 【0046】 In other words, the light rays 9111 to 9114 produce reflected light rays 9121 to 9124 on a straight line symmetrical in the z direction with respect to the corner reflector 9040, forming the aerial real image 9120. The light rays 9111 to 9114 emitted from the light source 9110 are four representative rays of diffused light from the light source 9110, and depending on the diffusion characteristics of the light source 9110, the light rays incident on the retroreflector 5 are not limited to these, but any incident light ray will cause similar reflection and form the aerial real image 9120. For the sake of clarity in the diagram, the position of the light source 9110 and the position of the aerial real image 9120 in the x direction are shown offset, but in reality, the position of the light source 9110 and the position of the aerial real image 9120 in the x direction are at the same position, and when viewed from the z direction, they are in overlapping positions. 
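As a rough numerical check of the behavior described in 【0045】 and 【0046】, the following sketch traces a few diverging rays through an idealized corner reflector array. It rests on assumptions that are not in the original text: the plate is treated as an infinitely thin plane at z = 0, the two mirror surfaces are taken to have normals along the x and y axes, and the function names `reflect`, `corner_reflect`, and `image_point` are hypothetical. Under those assumptions, the double reflection reverses the x and y components of the ray direction and leaves the z component unchanged, so every ray from one source point converges at the point that is plane-symmetric to the source with respect to the plate.

```python
def reflect(d, n):
    """Mirror-reflect direction vector d off a plane with unit normal n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2.0 * dot * ni for di, ni in zip(d, n))


def corner_reflect(d):
    """Two successive reflections off perpendicular mirrors.

    Assumption: the mirror normals lie along x and y, so the result is
    (-dx, -dy, dz): retroreflection in x and y, z component unchanged.
    """
    return reflect(reflect(d, (1.0, 0.0, 0.0)), (0.0, 1.0, 0.0))


def image_point(source, d):
    """Trace one ray from `source` (z < 0) through the idealized plate at z = 0.

    Returns the point reached after travelling the same path parameter on the
    far side of the plate. Requires d[2] > 0 so the ray actually reaches the plate.
    """
    sx, sy, sz = source
    t = -sz / d[2]                      # parameter at which the ray hits z = 0
    hit = (sx + t * d[0], sy + t * d[1], 0.0)
    out = corner_reflect(d)
    return tuple(h + t * o for h, o in zip(hit, out))


# Hypothetical source point 50 mm below the plate, with four diverging rays.
source = (0.01, -0.02, -0.05)
rays = [(0.3, 0.1, 1.0), (-0.2, 0.4, 1.0), (0.0, -0.5, 1.0), (0.25, 0.25, 1.0)]
images = [image_point(source, d) for d in rays]

for point in images:
    print(point)
```

All four rays arrive at approximately (0.01, -0.02, 0.05): the same x and y position as the source, at the mirrored height. A real array has finite mirror height and pitch, which this sketch ignores.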
【0047】 Next, the configuration and effects of the corner reflectors 9040 that constitute the retroreflector 5 will be explained in Figures 2G, 2H, and 2I. The corner reflector 9040 is a rectangular parallelepiped in which only two specific faces, 9041 and 9042, are mirrored surfaces, while the other four faces are made of a transparent material. The retroreflector 5 has a configuration in which these corner reflectors 9040 are arranged in an array such that their corresponding mirrored surfaces face the same direction. 【0048】 When viewed from above (+z direction), the light ray 9111 emitted from the light source 9110 is incident on the mirror surface 9041 (or mirror surface 9042) at a specific angle of incidence, undergoes total internal reflection at the reflection point 9130, and then undergoes total internal reflection again at the reflection point 9132 on the mirror surface 9042 (or mirror surface 9041). 【0049】 If the angle of incidence of ray 9111 to mirror surface 9041 (or mirror surface 9042) is θ, then the angle of incidence of the first reflected ray 9131, reflected by mirror surface 9041 (or mirror surface 9042), to mirror surface 9042 (or mirror surface 9041) can be expressed as 90°-θ. Therefore, with respect to ray 9111, the second reflected ray 9121 gains a rotation of 2θ from the first reflection and 2×(90°-θ) from the second reflection, resulting in a total reversed optical path of 180°. On the other hand, when viewed from the side (the direction midway between -x and -y), total internal reflection in the z direction occurs only once. Therefore, if the angle of incidence to mirror surface 9041 or mirror surface 9042 is φ, then with respect to ray 9111, the reflected ray 9121 gains a rotation of 2×φ from one reflection. 【0050】 From the above, the light rays incident on the corner reflector 9040 undergo retroreflection with reversed optical paths in the x and y directions, and specular reflection due to total internal reflection in the z direction. 
Considering the retroreflector 5, similar reflections occur in each optical path, so in the x and y directions, the image is formed at a point symmetrical with respect to the z axis due to the reversing optical path with convergence properties. 【0051】 In the optical system shown in Figures 2A to 2C, the retroreflector 2 has retroreflective properties in three axes. As a result, when a diffusive incident light beam is incident on the retroreflector 2, a convergent reflected light beam travels toward the side of the incident light beam where the light source is located. This convergent reflected light beam forms an image in the air, creating a floating image 3. The direction of propagation of the principal ray of the convergent reflected light beam reflected from the retroreflector 2 is opposite to the direction of propagation of the principal ray of the diffusive incident light beam incident on the retroreflector 2. 【0052】 In contrast, in the optical system shown in Figure 2D, the retroreflector 5 has retroreflective properties in two axes and specular reflection in the other axis. As a result, when a diffuse incident light beam is incident on the retroreflector 5, the convergent reflected light beam reflected by the corner reflector array travels toward the side of the retroreflector 5 opposite to the side where the light source of the incident light beam is located. This convergent reflected light beam forms an image in the air, creating a floating image 3. 【0053】 The direction of propagation of the principal ray of the convergent reflected light beam reflected by the corner reflector array of the retroreflector 5 is not in the opposite direction to the direction of propagation of the principal ray of the diffuse incident light beam incident on the retroreflector 5. 
The component of the direction of propagation of the principal ray of the diffuse incident light beam incident on the retroreflector 5 in the direction of the plate-shaped surface of the retroreflector 5, and the component of the direction of propagation of the principal ray after it has been reflected by the retroreflector 5 and become a convergent reflected light beam, remain in a straight line before and after reflection by the corner reflector array. 【0054】 In other words, the diffusive incident light beam is converted into a convergent reflected light beam by reflection at the retroreflector 5, but in the direction normal to the plate-shaped surface of the retroreflector 5, the light beam will travel through the retroreflector 5. Here, the diffusive incident light beam that enters the retroreflector 5 and the convergent reflected light beam that exits the retroreflector 5 are geometrically symmetrical with respect to the plate-shaped surface of the retroreflector 5. 【0055】 The resolution of the floating image formed by the light rays from the video output unit 10 depends not only on the resolution of the liquid crystal display panel 11, but also largely on the diameter D and pitch P (not shown) of the retroreflective portion of the retroreflective plate 5, as shown in Figures 2E and 2F. For example, when using a 7-inch WUXGA (1920 x 1200 pixels) liquid crystal display panel, even if one pixel (one triplet) is approximately 80 μm, if the diameter D of the retroreflective portion is 240 μm and the pitch P is 300 μm, then one pixel of the floating image will be equivalent to 300 μm. As a result, the effective resolution of the floating image is reduced to about one-third. 【0056】 Therefore, in order to make the resolution of the floating image in space equivalent to the resolution of the display device 10, it is desirable to bring the diameter D and pitch P of the retroreflective portion close to that of one pixel of the liquid crystal display panel. 
On the other hand, in order to suppress the occurrence of moiré patterns caused by the retroreflective plate and the pixels of the liquid crystal display panel, it is preferable to design the respective pitch ratios to be outside of integer multiples of one pixel. Furthermore, the shape should be arranged so that none of the sides of the retroreflective portion overlap with any of the sides of one pixel of the liquid crystal display panel. 【0057】 The shape of the retroreflector (imaging optical plate) in this embodiment is not limited to the example described above. It may have various shapes that realize retroreflection. Specifically, it may be various cubic corner bodies, corner reflector arrays, slit mirror arrays, two-sided corner reflector arrays, multi-sided reflector arrays, or a shape in which combinations of their reflective surfaces are arranged periodically. Alternatively, a capsule lens type retroreflector element with glass beads arranged periodically may be provided on the surface of the retroreflector in this embodiment. The detailed configuration of these retroreflector elements can be described using existing technology, so a detailed explanation is omitted. Specifically, the technology disclosed in Japanese Patent Publication No. 2017-33005, Japanese Patent Publication No. 2019-133110, Japanese Patent Publication No. 2017-67933, WO2009 / 131128, etc., can be used. 【0058】 In the optical system shown in Figure 2D, the image light emitted from the display device 10 can be in any polarization state. Both S-polarization and P-polarization are acceptable. 【0059】 As explained above, the optical system in Figure 2D, while using a different retroreflective plate than the optical systems in Figures 2A to 2C, can form a more suitable floating image in space, similar to the optical systems in Figures 2A to 2C. 【0060】 As described above, the optical systems shown in Figures 2A, 2B, 2C, and 2D can provide brighter, higher-quality floating images in space. 
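The resolution estimate in 【0055】 and the pitch-ratio guideline in 【0056】 can be sketched as simple arithmetic. This is an illustration under stated assumptions, not a design rule from the original: it assumes the floating-image pixel size is set by whichever is coarser, the panel pixel or the retroreflective pitch, and the helper names and the 5% tolerance are hypothetical.

```python
def effective_pixel_um(pixel_um: float, reflector_pitch_um: float) -> float:
    """Assumed model: the coarser of the two pitches limits the floating image."""
    return max(pixel_um, reflector_pitch_um)


def resolution_ratio(pixel_um: float, reflector_pitch_um: float) -> float:
    """Effective resolution of the floating image relative to the panel (1.0 = no loss)."""
    return pixel_um / effective_pixel_um(pixel_um, reflector_pitch_um)


def is_near_integer_multiple(pixel_um: float, reflector_pitch_um: float,
                             tol: float = 0.05) -> bool:
    """True if the pitch is close to an integer multiple of the pixel size,
    which is the condition the text advises avoiding to suppress moiré."""
    ratio = reflector_pitch_um / pixel_um
    return abs(ratio - round(ratio)) < tol


# Values from the text: 80 um pixel, 300 um retroreflective pitch.
ratio = resolution_ratio(80.0, 300.0)
print(f"resolution ratio: {ratio:.2f}")
print("pitch near integer multiple:", is_near_integer_multiple(80.0, 300.0))
```

With the values from the text, the ratio comes out at roughly 0.27, which the description rounds to about one-third, and the 300 μm pitch is 3.75 pixels, so it is not an integer multiple.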
【0061】 <<Block diagram of the internal structure of the floating image display device>> 【0062】 Next, a block diagram of the internal configuration of the floating image display device 1000 will be described. Figure 3 is a block diagram showing an example of the internal configuration of the floating image display device 1000. 【0063】 The floating video display device 1000 includes a retroreflective section 1101, a video display section 1102, a light guide 1104, a light source 1105, a power supply 1106, an external power input interface 1111, an operation input section 1107, a non-volatile memory 1108, a memory 1109, a control section 1110, a video signal input section 1131, an audio signal input section 1133, a communication section 1132, an aerial operation detection sensor 1351, an aerial operation detection section 1350, an audio output section 1140, a microphone 1139, a video control section 1160, a storage section 1170, an imaging section 1180, and the like. It may also include a removable media interface 1134, an attitude sensor 1113, a transmissive self-emissive video display device 1650, a second display device 1680, or a secondary battery 1112. 【0064】 Each component of the floating video display device 1000 is arranged in the housing 1190. Note that the imaging unit 1180 and the aerial operation detection sensor 1351 shown in Figure 3 may be provided on the outside of the housing 1190. 【0065】 The retroreflective section 1101 in Figure 3 corresponds to the retroreflective plate 2 in Figures 2A, 2B, and 2C. The retroreflective section 1101 retroreflectively reflects light modulated by the image display section 1102. The floating image 3 is formed by the light from the reflected light from the retroreflective section 1101 that is output to the outside of the floating image display device 1000. 【0066】 The image display unit 1102 in Figure 3 corresponds to the liquid crystal display panel 11 in Figures 2A, 2B, and 2C. 
The light source 1105 in Figure 3 corresponds to the light source device 13 in Figures 2A, 2B, and 2C. The image display unit 1102, light guide 1104, and light source 1105 in Figure 3 correspond to the display device 1 in Figures 2A, 2B, and 2C. 【0067】 The video display unit 1102 is a display unit that generates an image by modulating transmitted light based on a video signal input under control by the video control unit 1160, which will be described later. The video display unit 1102 corresponds to the liquid crystal display panel 11 in Figures 2A, 2B, and 2C. For example, a transmissive liquid crystal panel can be used as the video display unit 1102. Alternatively, a reflective liquid crystal panel or a DMD (Digital Micromirror Device: registered trademark) panel that modulates reflected light may also be used as the video display unit 1102. 【0068】 The light source 1105 generates light for the image display unit 1102 and is a solid-state light source such as an LED or laser light source. The power supply 1106 converts the AC current input from an external source via the external power input interface 1111 into DC current and supplies power to the light source 1105. The power supply 1106 also supplies the necessary DC current to each part of the floating image display device 1000. The secondary battery 1112 stores the power supplied from the power supply 1106. When power is not supplied from an external source via the external power input interface 1111, the secondary battery 1112 supplies power to the light source 1105 and other components that require power. In other words, if the floating image display device 1000 is equipped with a secondary battery 1112, the user can use the floating image display device 1000 even when power is not supplied from an external source. 【0069】 The light guide 1104 guides the light generated by the light source 1105 and illuminates the image display unit 1102. 
The combination of the light guide 1104 and the light source 1105 can also be called the backlight of the image display unit 1102. The light guide 1104 may be mainly made of glass. The light guide 1104 may be mainly made of plastic. The light guide 1104 may be made of mirrors. Various combinations of the light guide 1104 and the light source 1105 are possible. Specific examples of combinations of the light guide 1104 and the light source 1105 will be explained in detail later. 【0070】 The aerial operation detection sensor 1351 is a sensor that detects the operation of the floating video 3 by the user 230's finger. The aerial operation detection sensor 1351 senses, for example, the area that overlaps with the entire display range of the floating video 3. Alternatively, the aerial operation detection sensor 1351 may sense only the area that overlaps with at least a portion of the display range of the floating video 3. 【0071】 Specific examples of the aerial operation detection sensor 1351 include distance sensors using invisible light such as infrared, invisible light lasers, and ultrasonic waves. The aerial operation detection sensor 1351 may also be configured by combining multiple sensors to detect coordinates on a two-dimensional plane. Furthermore, the aerial operation detection sensor 1351 may consist of a Time of Flight (ToF) LiDAR (Light Detection and Ranging) or an image sensor. 【0072】 The aerial operation detection sensor 1351 only needs to be able to sense touch operations, etc., on an object displayed as a floating image 3 in space, using the user's finger. Such sensing can be performed using existing technologies. 
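As one illustration of the kind of sensing described in 【0070】 to 【0072】, the sketch below decides whether a fingertip position reported by a distance sensor counts as a touch on the floating image. The description leaves the actual method to existing technologies, so everything here is an assumption: the coordinate convention (x and y in the image plane, z as distance from the plane toward the user), the `ImagePlane` type, the `detect_touch` function, and the 2 mm threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ImagePlane:
    """Display range of the floating image, in millimetres (hypothetical type)."""
    width_mm: float
    height_mm: float


def detect_touch(plane: ImagePlane,
                 fingertip_mm: Tuple[float, float, float],
                 touch_depth_mm: float = 2.0) -> Optional[Tuple[float, float]]:
    """Return the contact position (x, y) if the fingertip touches the image.

    A touch is assumed when the fingertip lies within the display range and
    its distance z from the image plane is at or below `touch_depth_mm`.
    Returns None when there is no contact.
    """
    x, y, z = fingertip_mm
    inside = 0.0 <= x <= plane.width_mm and 0.0 <= y <= plane.height_mm
    if inside and z <= touch_depth_mm:
        return (x, y)
    return None


plane = ImagePlane(width_mm=150.0, height_mm=95.0)
print(detect_touch(plane, (40.0, 30.0, 1.0)))    # finger at the image plane
print(detect_touch(plane, (40.0, 30.0, 25.0)))   # finger still approaching
print(detect_touch(plane, (200.0, 30.0, 0.0)))   # outside the display range
```

A sensor that covers only part of the display range, as the text allows, could be modelled by shrinking the rectangle that `inside` tests against.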
【0073】 The aerial operation detection unit 1350 acquires sensing signals from the aerial operation detection sensor 1351 and, based on the sensing signals, calculates whether or not the user 230's finger has made contact with an object in the floating spatial image 3, and the position where the user 230's finger made contact with the object (contact position). The aerial operation detection unit 1350 is composed of circuits such as an FPGA (Field Programmable Gate Array). In addition, some functions of the aerial operation detection unit 1350 may be implemented in software, for example, by a spatial operation detection program executed in the control unit 1110. 【0074】 The aerial operation detection sensor 1351 and the aerial operation detection unit 1350 may be built into the floating video display device 1000, or they may be provided separately from the floating video display device 1000. When provided separately from the floating video display device 1000, the aerial operation detection sensor 1351 and the aerial operation detection unit 1350 are configured to transmit information and signals to the floating video display device 1000 via wired or wireless communication lines or video signal transmission lines. 【0075】 Furthermore, the aerial operation detection sensor 1351 and the aerial operation detection unit 1350 may be provided as separate components. This makes it possible to construct a system in which the floating video display device 1000 without the aerial operation detection function is used as the main unit, and only the aerial operation detection function can be added as an option. Alternatively, the aerial operation detection sensor 1351 may be a separate component, and the aerial operation detection unit 1350 may be built into the floating video display device 1000. 
When it is desired to place the aerial operation detection sensor 1351 more freely relative to the installation position of the floating video display device 1000, there is an advantage to having only the aerial operation detection sensor 1351 as a separate component. 【0076】 The imaging unit 1180 is a camera with an image sensor that captures images of the space near the floating image 3 and / or the user's face, arms, fingers, etc. Multiple imaging units 1180 may be provided. By using multiple imaging units 1180, or by using imaging units with depth sensors, the aerial operation detection unit 1350 can be assisted when detecting touch operations on the floating image 3 by the user 230. The imaging unit 1180 may be provided separately from the floating image display device 1000. If the imaging unit 1180 is provided separately from the floating image display device 1000, it should be configured to transmit imaging signals to the floating image display device 1000 via a wired or wireless communication connection path. 【0077】 For example, if the aerial operation detection sensor 1351 is configured as an object intrusion sensor that detects whether or not an object has entered a plane (intrusion detection plane) that includes the display surface of the floating spatial image 3, the aerial operation detection sensor 1351 may not be able to detect information such as how far away an object that has not entered the intrusion detection plane (for example, a user's finger) is from the intrusion detection plane, or how close an object is to the intrusion detection plane. 【0078】 In such cases, the distance between the object and the intrusion detection plane can be calculated by using information such as depth calculation information of the object based on images captured by multiple imaging units 1180 and depth information of the object from a depth sensor. 
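One way to picture the distance calculation in 【0078】 is as a signed point-to-plane distance, assuming the depth information has already been converted into a 3D position of the fingertip. The sketch below is only an assumption about how such a calculation might look; the coordinate convention, the function name, and the sample values are hypothetical.

```python
import math
from typing import Sequence


def signed_distance_to_plane(
    point: Sequence[float],
    plane_point: Sequence[float],
    plane_normal: Sequence[float],
) -> float:
    """Signed distance from `point` to the plane through `plane_point`.

    The sign is positive on the side the normal points to (here assumed to
    be the user side of the intrusion detection plane).
    """
    normal_length = math.sqrt(sum(c * c for c in plane_normal))
    if normal_length == 0.0:
        raise ValueError("plane normal must be non-zero")
    # Project the offset vector onto the unit normal of the plane.
    dot = sum((p - q) * n for p, q, n in zip(point, plane_point, plane_normal))
    return dot / normal_length


if __name__ == "__main__":
    # Hypothetical fingertip 5 cm in front of a plane at z = 0 (units: metres).
    print(signed_distance_to_plane((0.10, 0.20, 0.05), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```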
This information, as well as various other information such as the distance between the object and the intrusion detection plane, is then used for various display controls of the floating spatial image 3. 【0079】 Alternatively, instead of using the aerial operation detection sensor 1351, the aerial operation detection unit 1350 may detect touch operations on the floating video 3 by the user 230 based on the image captured by the imaging unit 1180. 【0080】 Alternatively, the imaging unit 1180 may capture an image of the face of the user 230 operating the floating video 3, and the control unit 1110 may perform user identification processing. Furthermore, in order to determine whether other people are standing around or behind the user 230 operating the floating video 3 and whether they are peeking at the user 230's operation of the floating video 3, the imaging unit 1180 may capture an area that includes the user 230 operating the floating video 3 and the area surrounding the user 230. 【0081】 The operation input unit 1107 is, for example, an operation button, a signal receiving unit such as a remote controller, or an infrared light receiving unit, and inputs signals for operations other than aerial operations (touch operations) by the user 230. Separately from the aforementioned user 230 who touches the floating spatial image 3, the operation input unit 1107 may also be used, for example, by an administrator to operate the floating spatial image display device 1000. 【0082】 The video signal input unit 1131 receives video data by connecting an external video output device. Various digital video input interfaces can be considered for the video signal input unit 1131. For example, it can be configured with a video input interface conforming to the HDMI (High-Definition Multimedia Interface) standard, a video input interface conforming to the DVI (Digital Visual Interface) standard, or a video input interface conforming to the DisplayPort standard. 
【0083】 Alternatively, an analog video input interface such as analog RGB or composite video may be provided. The audio signal input unit 1133 receives audio data by connecting an external audio output device. The audio signal input unit 1133 may be configured as an HDMI standard audio input interface, an optical digital terminal interface, or a coaxial digital terminal interface. In the case of an HDMI standard interface, the video signal input unit 1131 and the audio signal input unit 1133 may be configured as an interface with integrated terminals and cables. The audio output unit 1140 is capable of outputting audio based on the audio data input to the audio signal input unit 1133. The audio output unit 1140 may be configured as a speaker. 【0084】 Furthermore, the audio output unit 1140 may output built-in operation sounds or error warning sounds. Alternatively, the audio output unit 1140 may be configured to output as a digital signal to an external device, such as the Audio Return Channel function specified in the HDMI standard. The microphone 1139 is a microphone that picks up sounds from the vicinity of the floating image display device 1000, converts them into signals, and generates an audio signal. The microphone may record human voices, such as the user's voice, and the control unit 1110, described later, may perform speech recognition processing on the generated audio signal to obtain text information from the audio signal. 【0085】 The non-volatile memory 1108 stores various data used by the floating image display device 1000. The data stored in the non-volatile memory 1108 includes, for example, data for various operations displayed on the floating image 3, display icons, data and layout information for objects that the user operates. Memory 1109 stores video data to be displayed as the floating image 3, control data for the device, and the like. 【0086】 The control unit 1110 controls the operation of each connected part. 
The control unit 1110 may also work in cooperation with a program stored in the memory 1109 to perform calculation processing based on information acquired from each part of the floating image display device 1000. 【0087】 The communication unit 1132 communicates with external devices, external servers, etc., via a wired or wireless communication interface. If the communication unit 1132 has a wired communication interface, the wired communication interface may be configured as, for example, an Ethernet standard LAN interface. If the communication unit 1132 has a wireless communication interface, it may be configured as, for example, a Wi-Fi communication interface, a Bluetooth communication interface, or a mobile communication interface such as 4G or 5G. Various types of data, such as video data, image data, and audio data, are transmitted and received through communication via the communication unit 1132. 【0088】 Furthermore, the removable media interface 1134 is an interface for connecting removable recording media. Removable recording media may consist of semiconductor memory such as solid-state drives (SSDs), magnetic recording media recording devices such as hard disk drives (HDDs), or optical recording media such as optical discs. The removable media interface 1134 can read various types of information, such as video data, image data, and audio data, recorded on the removable recording media. Video data, image data, etc., recorded on the removable recording media are output as floating-in-space images 3 via the video display unit 1102 and the retroreflective unit 1101. 【0089】 The storage unit 1170 is a storage device that records various types of information, such as video data, image data, and audio data. The storage unit 1170 may be composed of a magnetic recording medium such as a hard disk drive (HDD) or a semiconductor memory such as a solid-state drive (SSD). 
The storage unit 1170 may have various types of information, such as video data, image data, and audio data, pre-recorded in it at the time of product shipment. The storage unit 1170 may also record various types of information, such as video data, image data, and audio data, acquired from external devices or external servers via the communication unit 1132. 【0090】 The video data, image data, etc., recorded in the storage unit 1170 are output as floating-in-space video 3 via the video display unit 1102 and the retroreflection unit 1101. The video data, image data, etc., of display icons and user-operated objects, etc., which are displayed as floating-in-space video 3, are also recorded in the storage unit 1170. 【0091】 Layout information such as display icons and objects shown as the floating spatial image 3, as well as various metadata information related to the objects, are also recorded in the storage unit 1170. Audio data recorded in the storage unit 1170 is output as audio from, for example, the audio output unit 1140. 【0092】 The video control unit 1160 performs various controls related to the video signals input to the video display unit 1102. The video control unit 1160 may also be called a video processing circuit and may be composed of hardware such as an ASIC, FPGA, or video processor. The video control unit 1160 may also be called a video processing unit or image processing unit. For example, the video control unit 1160 performs video switching control, such as determining which video signal to input to the video display unit 1102 from among the video signals stored in the memory 1109 and the video signals (video data) input to the video signal input unit 1131. 
【0093】 Alternatively, the video control unit 1160 may generate a superimposed video signal by superimposing the video signal to be stored in the memory 1109 and the video signal input from the video signal input unit 1131, and then input the superimposed video signal to the video display unit 1102 to form the composite image as a floating image 3. 【0094】 Furthermore, the video control unit 1160 may perform image processing on video signals input from the video signal input unit 1131 and video signals stored in the memory 1109. Examples of image processing include scaling, which enlarges, reduces, and transforms images; brightness adjustment, which changes the brightness; contrast adjustment, which changes the contrast curve of an image; and retinex processing, which decomposes an image into its light components and changes the weighting of each component. 【0095】 Furthermore, the video control unit 1160 may perform special effects video processing on the video signal input to the video display unit 1102 to assist the user 230's aerial operation (touch operation). Special effects video processing is performed, for example, based on the detection result of the user 230's touch operation by the aerial operation detection unit 1350 or on the image captured by the imaging unit 1180 of the user 230. 【0096】 The attitude sensor 1113 is a sensor composed of a gravity sensor, an acceleration sensor, or a combination thereof, and can detect the orientation in which the floating video display device 1000 is installed. Based on the attitude detection result of the attitude sensor 1113, the control unit 1110 may control the operation of each connected part. For example, if an undesirable orientation for the user is detected, the control unit 1110 may stop displaying the video that the video display unit 1102 was showing and display an error message to the user. 
Alternatively, if the attitude sensor 1113 detects that the installation orientation of the floating video display device 1000 has changed, the control unit 1110 may rotate the video being shown by the video display unit 1102. 【0097】 As explained above, the floating image display device 1000 is equipped with various functions. However, the floating image display device 1000 does not need to have all of these functions; any configuration is acceptable as long as it has the function of forming the floating image 3. 【0098】 <Example configuration of a floating image display device> Next, example configurations of the floating image display device are described. The layout of the components of the floating image display device according to this embodiment can vary depending on the usage. Each of the layouts shown in Figures 4A to 4M is described below. In all of the examples in Figures 4A to 4M, the thick lines surrounding the floating image display device 1000 indicate an example of the housing structure of the floating image display device 1000. 【0099】 Figure 4A shows an example of the configuration of a floating image display device. The floating image display device 1000 shown in Figure 4A is equipped with an optical system corresponding to the optical system in Figure 2A. The floating image display device 1000 shown in Figure 4A is installed horizontally so that the side on which the floating image 3 is formed faces upward. That is, in Figure 4A, the floating image display device 1000 has a transparent member 100 installed on the top surface of the device. The floating image 3 is formed above the surface of the transparent member 100 of the floating image display device 1000. The light of the floating image 3 travels diagonally upward. If the aerial operation detection sensor 1351 is provided as shown in the figure, it is possible to detect operation of the floating image 3 by the finger of the user 230. 
Note that the x direction is the left-right direction as seen from the user, the y direction is the front-back direction (depth direction) as seen from the user, and the z direction is the up-down direction (vertical direction). The x, y, and z directions are defined in the same way in the remaining drawings of Figure 4, so the explanation is not repeated. 【0100】 Figure 4B shows an example of the configuration of a floating image display device. The floating image display device 1000 shown in Figure 4B is equipped with an optical system corresponding to the optical system in Figure 2A. The floating image display device 1000 shown in Figure 4B is installed vertically so that the side on which the floating image 3 is formed faces the front of the floating image display device 1000 (towards the user 230). That is, in Figure 4B, the floating image display device 1000 has a transparent member 100 installed on the front of the device (towards the user 230). The floating image 3 is formed on the user 230 side relative to the surface of the transparent member 100 of the floating image display device 1000. The light of the floating image 3 travels diagonally upward. If the aerial operation detection sensor 1351 is provided as shown in the figure, it is possible to detect operation of the floating image 3 by the finger of the user 230. As shown in Figure 4B, the aerial operation detection sensor 1351 senses the user's finger from above, so the reflection of the sensing light from the user's fingernail can be used for touch detection. Generally, fingernails have a higher reflectivity than the pads of the fingers, so this configuration can improve the accuracy of touch detection. 【0101】 Figure 4C shows an example of the configuration of a floating image display device. The floating image display device 1000 shown in Figure 4C is equipped with an optical system corresponding to the optical system in Figure 2B. 
In the floating image display device 1000 shown in Figure 4C, it is installed horizontally so that the side on which the floating image 3 is formed faces upward. That is, in Figure 4C, the floating image display device 1000 has a transparent member 100 installed on the top surface of the device. The floating image 3 is formed above the surface of the transparent member 100 of the floating image display device 1000. The light of the floating image 3 travels diagonally upward. If the aerial operation detection sensor 1351 is provided as shown in the figure, it is possible to detect operation of the floating image 3 by the user 230's finger. 【0102】 Figure 4D shows an example of the configuration of a floating image display device. The floating image display device 1000 shown in Figure 4D is equipped with an optical system corresponding to the optical system in Figure 2B. The floating image display device 1000 shown in Figure 4D is installed vertically so that the side on which the floating image 3 is formed faces the front of the floating image display device 1000 (towards the user 230). That is, in Figure 4D, the floating image display device 1000 has a transparent member 100 installed on the front of the device (towards the user 230). The floating image 3 is formed on the user 230 side of the transparent member 100 of the floating image display device 1000. The light of the floating image 3 travels diagonally upward. If the aerial operation detection sensor 1351 is provided as shown in the figure, it is possible to detect operation of the floating image 3 by the user 230's finger. Here, as shown in Figure 4D, the airborne operation detection sensor 1351 senses the user's finger from above, and the reflection of the sensing light from the user's fingernail can be used for touch detection. Generally, fingernails have a higher reflectivity than the pads of the fingers, so this configuration can improve the accuracy of touch detection. 
【0103】 Figure 4E shows an example of the configuration of a floating image display device. The floating image display device 1000 shown in Figure 4E is equipped with an optical system corresponding to the optical system in Figure 2C. In the floating image display device 1000 shown in Figure 4E, it is installed horizontally so that the side on which the floating image 3 is formed faces upward. That is, in Figure 4E, the floating image display device 1000 has a transparent member 100 installed on the top surface of the device. The floating image 3 is formed above the surface of the transparent member 100 of the floating image display device 1000. The light of the floating image 3 travels in a direction directly upward. If the air operation detection sensor 1351 is provided as shown in the figure, it is possible to detect operation of the floating image 3 by the user 230's finger. 【0104】 Figure 4F shows an example of the configuration of a floating image display device. The floating image display device 1000 shown in Figure 4F is equipped with an optical system corresponding to the optical system in Figure 2C. The floating image display device 1000 shown in Figure 4F is installed vertically so that the side on which the floating image 3 is formed faces the front of the floating image display device 1000 (towards the user 230). That is, in Figure 4F, the floating image display device 1000 has a transparent member 100 installed on the front of the device (towards the user 230). The floating image 3 is formed on the user 230 side relative to the surface of the transparent member 100 of the floating image display device 1000. The light of the floating image 3 travels in the direction toward the user. If the aerial operation detection sensor 1351 is provided as shown in the figure, it is possible to detect operation of the floating image 3 by the user 230's finger. 【0105】 Figure 4G shows an example of the configuration of a floating image display device. 
The floating image display device 1000 shown in Figure 4G is equipped with an optical system corresponding to the optical system in Figure 2C. In the optical systems of the floating image display devices shown in Figures 4A to 4F, the optical path of the center of the image light emitted from the display device 1 was on the yz plane. That is, within the optical systems of the floating image display devices shown in Figures 4A to 4F, the image light traveled in the front-to-back and up-and-down directions as viewed from the user. In contrast, in the optical system of the floating image display device shown in Figure 4G, the optical path of the center of the image light emitted from the display device 1 is on the xy plane. That is, within the optical system of the floating image display device shown in Figure 4G, the image light travels in the left-to-right and front-to-back directions as viewed from the user. In the floating image display device 1000 shown in Figure 4G, the side on which the floating image 3 is formed is installed so that it faces the front of the device (towards the user 230). In other words, in Figure 4G, the floating image display device 1000 has a transparent member 100 installed on the front of the device (towards the user 230). The floating image 3 is formed on the user side relative to the surface of the transparent member 100 of the floating image display device 1000. The light of the floating image 3 travels toward the user. If the aerial operation detection sensor 1351 is provided as shown in the figure, it is possible to detect operation of the floating image 3 by the user 230's finger. 【0106】 Figure 4H shows an example of the configuration of a floating image display device. 
The floating image display device 1000 in Figure 4H differs from the floating image display device in Figure 4G in that it has a window with a transparent plate 100B made of glass or plastic on the back of the device (opposite the position from which the user 230 views the floating image 3, i.e., opposite the direction of propagation of the image light of the floating image 3 directed toward the user 230). The other configurations are the same as those of the floating image display device in Figure 4G, so repeated explanations are omitted. The floating image display device 1000 in Figure 4H has a window with a transparent plate 100B at a position opposite to the direction of propagation of the image light of the floating image 3. Therefore, when the user 230 views the floating image 3, they can recognize the scenery behind the floating image display device 1000 as the background of the floating image 3. Therefore, the user 230 can perceive the floating image 3 as floating in the air in front of the scenery behind the floating image display device 1000. This further enhances the sense of floating in the air of the floating image 3. 【0107】 Furthermore, depending on the polarization distribution of the video light output from the display device 1 and the performance of the polarization separation member 101B, a portion of the video light output from the display device 1 may be reflected by the polarization separation member 101B and directed toward the transparent plate 100B. Depending on the coating performance of the surface of the transparent plate 100B, this light may be reflected again by the surface of the transparent plate 100B and may be visible to the user as stray light. Therefore, in order to prevent such stray light, the transparent plate 100B may not be provided in the window on the back of the floating video display device 1000. 【0108】 Figure 4I shows an example of the configuration of a floating image display device. 
The floating image display device 1000 in Figure 4I differs from the floating image display device in Figure 4H in that it has an opening / closing door 1410 for light shielding on the window of the transparent plate 100B located on the back of the device (opposite the position from which the user 230 views the floating image 3). The other configurations are the same as those of the floating image display device in Figure 4H, so repeated explanations are omitted. 【0109】 The opening and closing door 1410 of the floating spatial image display device 1000 in Figure 4I has, for example, a light-shielding plate and is equipped with a mechanism for moving (sliding), rotating, or attaching / detaching the light-shielding plate, thereby enabling switching between an open state and a light-shielding state for the window (rear window) of the transparent plate 100B located at the back of the floating spatial image display device 1000. The movement (sliding) and rotation of the light-shielding plate by the opening and closing door 1410 may be electrically driven by a motor (not shown). This motor may be controlled by the control unit 1110 in Figure 3. Note that in the example in Figure 4I, an example is disclosed in which the opening and closing door 1410 has two light-shielding plates. However, the opening and closing door 1410 may have only one light-shielding plate. 【0110】 For example, if the view beyond the window of the transparent panel 100B of the floating image display device 1000 is outdoors, the brightness of sunlight will vary depending on the weather. If the sunlight outdoors is strong, the background of the floating image 3 may become too bright, reducing the visibility of the floating image 3 for the user 230. 
In such cases, by moving (sliding), rotating, or attaching the light-shielding plate of the opening / closing door 1410 so that the rear window is shielded from light, the background of the floating image 3 becomes darker, which relatively improves the visibility of the floating image 3. This shielding operation by the light-shielding plate of the opening / closing door 1410 may be performed directly by hand by the user 230. Alternatively, the control unit 1110 may control a motor (not shown) in response to an operation input via the operation input unit 1107 in Figure 3 to perform the shielding operation by the light-shielding plate of the opening / closing door 1410. 【0111】 Furthermore, an illuminance sensor may be installed on the rear side of the floating image display device 1000 (the side opposite the user 230), such as near the rear window, to measure the brightness of the space beyond the rear window. In this case, the control unit 1110 in Figure 3 may control a motor (not shown) to open and close the light-shielding plate of the opening / closing door 1410 according to the detection result of the illuminance sensor. By controlling the opening and closing operation of the light-shielding plate of the opening / closing door 1410 in this way, the visibility of the floating image 3 can be maintained more favorably without the user 230 having to open or close the light-shielding plate of the opening / closing door 1410 manually. 【0112】 Furthermore, the light-shielding plate of the opening / closing door 1410 may be manually detachable. Depending on the intended use and installation environment of the floating image display device 1000, the user can choose whether to keep the rear window in the open state or in the light-shielded state. If the rear window is to be used in the light-shielded state for a long period of time, the detachable light-shielding plate can be fixed in place in the light-shielding state. 
If the rear window is to be used in an open state for a long period of time, the detachable light-shielding plate can be removed. The light-shielding plate may be attached and detached using screws, a hook structure, or a snap-in structure. 【0113】 In the example of the floating video display device 1000 shown in Figure 4I, depending on the polarization distribution of the video light output from the display device 1 and the performance of the polarization separation member 101B, some of the video light output from the display device 1 may be reflected by the polarization separation member 101B and directed toward the transparent plate 100B. Depending on the coating performance of the surface of the transparent plate 100B, this light may be reflected again by the surface of the transparent plate 100B and seen by the user as stray light. Therefore, in order to prevent such stray light, the floating video display device 1000 may be configured without a transparent plate 100B in the window on the back of the device. The window without the transparent plate 100B can be provided with the above-mentioned opening and closing door 1410. In order to prevent such stray light, it is desirable that the inner surface of the housing of the light-shielding plate of the above-mentioned opening and closing door 1410 has a coating or material with low light reflectivity. 【0114】 Figure 4J shows an example of the configuration of a floating image display device. The floating image display device 1000 in Figure 4J differs from the floating image display device in Figure 4H in that instead of placing a transparent plate 100B made of glass or plastic in the rear window, it places an electronically controlled variable transmittance device 1620. The other configurations are the same as those of the floating image display device in Figure 4H, so repeated explanations are omitted. An example of the electronically controlled variable transmittance device 1620 is a liquid crystal shutter. 
【0115】 A liquid crystal shutter can control the amount of transmitted light by controlling the voltage applied to a liquid crystal element sandwiched between two polarizing plates. Therefore, by controlling the liquid crystal shutter to increase the transmittance, the background of the floating image 3 becomes the scenery seen through the rear window. Conversely, by controlling the liquid crystal shutter to decrease the transmittance, the scenery seen through the rear window can be made invisible as the background of the floating image 3. Furthermore, since the liquid crystal shutter can be controlled at intermediate gradations, it can also be set to an intermediate transmittance such as 50%. For example, the control unit 1110 can control the transmittance of the electronically controlled variable transmittance device 1620 in response to an operation input via the operation input unit 1107 in Figure 3. With this configuration, when the view through the rear window is desired as the background for the floating image 3 but that view is too bright and reduces the visibility of the floating image 3, the visibility of the floating image 3 can be adjusted by adjusting the transmittance of the electronically controlled variable transmittance device 1620. 【0116】 Alternatively, an illuminance sensor may be installed on the rear side of the floating image display device 1000 (the side opposite the user 230), such as near the rear window, to measure the brightness of the space beyond the rear window. In this case, the control unit 1110 in Figure 3 can control the transmittance of the electronically controlled variable transmittance device 1620 according to the detection result of the illuminance sensor. 
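The illuminance-based control in 【0116】 could, for instance, map the measured brightness to a target transmittance. The following Python sketch is a minimal illustration under stated assumptions: the linear mapping, the lux thresholds, and the transmittance limits are all hypothetical and do not come from the disclosure.

```python
def transmittance_for_illuminance(
    lux: float,
    dark_lux: float = 500.0,       # assumed: at or below this, keep the window fully transmissive
    bright_lux: float = 20000.0,   # assumed: at or above this, use the minimum transmittance
    min_transmittance: float = 0.1,
    max_transmittance: float = 1.0,
) -> float:
    """Return a target transmittance (0..1) that falls as the background gets brighter."""
    if bright_lux <= dark_lux:
        raise ValueError("bright_lux must be greater than dark_lux")
    if lux <= dark_lux:
        return max_transmittance
    if lux >= bright_lux:
        return min_transmittance
    # Linear interpolation between the two thresholds.
    fraction = (lux - dark_lux) / (bright_lux - dark_lux)
    return max_transmittance - fraction * (max_transmittance - min_transmittance)


if __name__ == "__main__":
    for lux in (100.0, 10250.0, 50000.0):
        print(lux, transmittance_for_illuminance(lux))
```

A practical controller would probably add smoothing or hysteresis so that the transmittance does not flicker when the illuminance hovers near a threshold.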
In this way, the transmittance of the electronically controlled variable transmittance device 1620 can be adjusted according to the brightness of the space beyond the rear window without the user 230 having to perform an operation input via the operation input unit 1107 in Figure 3, which makes it possible to maintain the visibility of the floating image 3 more favorably. 【0117】 In the above example, a liquid crystal shutter was described as an example of the electronically controlled variable transmittance device 1620. Alternatively, electronic paper may be used as another example of the electronically controlled variable transmittance device 1620. The same effects as described above can be obtained even when electronic paper is used. Moreover, electronic paper consumes very little power to maintain an intermediate gradation (halftone) state. Therefore, a floating image display device with lower power consumption can be realized compared to the case in which a liquid crystal shutter is used. 【0118】 Figure 4K shows an example of the configuration of a floating image display device. The floating image display device 1000 in Figure 4K differs from the floating image display device in Figure 4G in that it has a transmissive self-emissive image display device 1650 instead of a transparent member 100. The other configurations are the same as those of the floating image display device in Figure 4G, so repeated explanations are omitted. 【0119】 In the floating image display device 1000 shown in Figure 4K, the floating image 3 is formed outside the floating image display device 1000 after the image light beam passes through the display surface of the transmissive self-emissive image display device 1650. 
That is, while an image is displayed on the transmissive self-emissive image display device 1650, which is a two-dimensional planar display, the floating image 3 can be displayed as an image that appears to project toward the user, in front of the image on the transmissive self-emissive image display device 1650. At this time, the user 230 can simultaneously view two images at different depth positions. The transmissive self-emissive image display device 1650 can be constructed using existing technologies such as transmissive organic EL panels, as disclosed in, for example, Japanese Patent Application Publication No. 2014-216761. Although the transmissive self-emissive image display device 1650 is not shown in Figure 3, it can be configured as a component of the floating image display device 1000 shown in Figure 3 and connected to other processing units such as the control unit 1110. 【0120】 Here, if the transmissive self-emissive image display device 1650 displays both the background and objects such as characters, and an effect is then performed in which only the objects such as characters move into the floating image 3 in the space in front, the user 230 can be provided with a more striking and surprising visual experience. 【0121】 Furthermore, if the interior of the floating image display device 1000 is kept dark, the background of the transmissive self-emissive image display device 1650 becomes sufficiently dark. 
Therefore, when no image is displayed on the display device 1, or when the light source of the display device 1 is turned off and only the transmissive self-emissive image display device 1650 displays an image, the user 230 will perceive the transmissive self-emissive image display device 1650 as an ordinary two-dimensional planar display rather than a transmissive display. (The floating image 3 in the embodiment of the present invention is displayed as a real image in space without a screen, so if the light source of the display device 1 is turned off, the location where the floating image 3 would be displayed is simply empty space.) Therefore, while the transmissive self-emissive image display device 1650 is being used to show an image as if it were an ordinary two-dimensional planar display, characters or objects can suddenly be displayed in mid-air as the floating image 3, providing the user 230 with a more striking and surprising visual experience. 【0122】 Furthermore, the darker the interior of the floating image display device 1000 is made, the more the transmissive self-emissive image display device 1650 appears like a two-dimensional planar display. Therefore, an absorbing polarizing plate (not shown) may be provided on the interior side of the transmissive self-emissive image display device 1650, that is, on the surface on which the image light reflected by the polarization separation member 101B is incident, which is the side of the transmissive self-emissive image display device 1650 opposite to the floating image 3. This absorbing polarizing plate transmits the polarized light of the image light reflected by the polarization separation member 101B and absorbs polarized light whose polarization direction differs from it by 90°. 
In this way, the impact on the image light forming the floating image 3 is not so great, but the amount of light incident on the interior of the floating image display device 1000 from the outside via the transmissive self-emissive image display device 1650 can be significantly reduced, making the interior of the floating image display device 1000 darker, which is preferable. 【0123】 Figure 4L shows an example of the configuration of a floating image display device. The floating image display device 1000 in Figure 4L is a modified version of the floating image display device in Figure 4K. The orientation of the components in the floating image display device 1000 differs from that of the floating image display device in Figure 4K, and is closer to that of the floating image display device in Figure 4F. The functions and operations of each component are the same as those of the floating image display device in Figure 4K, so repeated explanations are omitted. 【0124】 In the floating image display device shown in Figure 4L, after the light beam of the image passes through the transmissive self-emissive image display device 1650, the floating image 3 is formed on the user 230 side of the transmissive self-emissive image display device 1650. 【0125】 In both the example of the floating image display device in Figure 4K and the example of the floating image display device in Figure 4L, the floating image 3 is displayed superimposed in front of the image on the transmissive self-emissive image display device 1650 to the user 230. Here, the position of the floating image 3 and the position of the image on the transmissive self-emissive image display device 1650 are configured to have a difference in the depth direction. Therefore, when the user moves their head (viewpoint position), they can perceive the depth of the two images due to parallax. 
Thus, by displaying two images with different depth positions, a three-dimensional image experience can be more favorably provided to the user with the naked eye, without the need for stereoscopic glasses or other such devices. 【0126】 Figure 4M shows an example of the configuration of a floating image display device. In the floating image display device 1000 of Figure 4M, a second display device 1680 is provided on the side furthest from the user's perspective relative to the polarization separation member 101B of the floating image display device in Figure 4G. The other configurations are the same as those of the floating image display device in Figure 4G, so repeated explanations are omitted. 【0127】 In the configuration example shown in Figure 4M, the second display device 1680 is located behind the display position of the floating image 3, with its display surface facing the floating image 3. With this configuration, the user 230 can view the image from the second display device 1680 and the floating image 3, which are displayed at two different depths, superimposed on each other. In other words, the second display device 1680 is positioned to display the image in the direction of the user 230 who is viewing the floating image 3. Although the second display device 1680 is not shown in Figure 3, it can be configured as a component of the floating image display device 1000 in Figure 3, connected to other processing units such as the control unit 1110. 【0128】 In Figure 4M, the video light from the second display device 1680 of the floating video display device 1000 is seen by the user 230 after passing through the polarization separation member 101B. 
Therefore, in order for the video light from the second display device 1680 to pass through the polarization separation member 101B more favorably, it is desirable that the video light output from the second display device 1680 has polarization in the direction of vibration that the polarization separation member 101B passes through more favorably. That is, it is desirable that the polarization is in the same direction of vibration as the video light output from the display device 1. For example, if the video light output from the display device 1 is S-polarized, it is desirable that the video light output from the second display device 1680 is also S-polarized. Also, if the video light output from the display device 1 is P-polarized, it is desirable that the video light output from the second display device 1680 is also P-polarized. 【0129】 The example of the floating image display device in Figure 4M has the same effect as the examples of the floating image display devices in Figure 4K and Figure 4L, in that it displays a second image behind the floating image 3. However, unlike the examples of the floating image display devices in Figure 4K and Figure 4L, in the example of the floating image display device in Figure 4M, the light beam of the image light that forms the floating image 3 does not pass through the second display device 1680. Therefore, the second display device 1680 does not need to be a transmissive self-emissive image display device, and can be a liquid crystal display, which is a two-dimensional planar display. The second display device 1680 can also be an organic EL display. Therefore, in the example of the floating image display device in Figure 4M, it is possible to realize the floating image display device 1000 at a lower cost than in the examples of the floating image display devices in Figure 4K and Figure 4L. 
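The preference in 【0128】 for matching the polarization of the second display device 1680 to the polarization that the polarization separation member 101B passes can be illustrated with Malus's law. The sketch assumes, as a simplification not stated in the source, that the polarization separation member 101B behaves like an ideal linear polarizer with a single pass axis; the function name is hypothetical.

```python
import math


def transmitted_fraction(angle_deg):
    """Fraction of linearly polarized light passed by an ideal polarizer.

    angle_deg is the angle between the light's vibration direction and the
    pass axis. Malus's law: I / I0 = cos^2(angle).
    """
    return math.cos(math.radians(angle_deg)) ** 2


if __name__ == "__main__":
    # 0 deg: polarization matched to the pass axis; 90 deg: crossed.
    for angle in (0, 45, 90):
        print(angle, round(transmitted_fraction(angle), 3))
```

Under this idealization, matched polarization is fully transmitted and crossed polarization is blocked, which is why the same vibration direction as the video light from the display device 1 is desirable.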
【0130】 Here, depending on the polarization distribution of the video light output from the display device 1 and the performance of the polarization separation member 101B, a portion of the video light output from the display device 1 may be reflected by the polarization separation member 101B and directed toward the second display device 1680. This light (a portion of the video light) may be reflected again by the surface of the second display device 1680 and may be visible to the user as stray light. 【0131】 Therefore, in order to prevent stray light, an absorbing polarizer may be provided on the surface of the second display device 1680. In this case, the absorbing polarizer should be one that transmits the polarization of the image light output from the second display device 1680 and absorbs polarization that is 90° out of phase with the polarization of the image light output from the second display device 1680. If the second display device 1680 is a liquid crystal display, an absorbing polarizer also exists on the image output side inside the liquid crystal display. However, if there is a cover glass (cover glass on the image display surface side) on the output surface of the absorbing polarizer on the image output side inside the liquid crystal display, it is not possible to prevent stray light caused by reflection from the cover glass due to light from outside the liquid crystal display. Therefore, it is necessary to separately provide the above-mentioned absorbing polarizer on the surface of the cover glass. 【0132】 Furthermore, when displaying an image on the second display device 1680, which is a two-dimensional planar display, the floating spatial image 3 can be displayed as an image further in front of the user than the image on the second display device 1680. In this case, the user 230 can simultaneously view two images with different depth positions. 
By displaying a character on the floating spatial image 3 and a background on the second display device 1680, it is possible to provide the user 230 with the effect of viewing the space in which the character exists in three dimensions. 【0133】 Furthermore, by displaying both the background and objects such as characters on the second display device 1680, and then having only the objects such as characters move to the floating image 3 in the foreground, it is possible to provide the user 230 with a more effective surprise-style video experience. 【0134】 Next, Figure 4N shows an example of the configuration of a floating image display device. The floating image display device 1000 in Figure 4N is a floating image display device that employs the optical system shown in Figure 2D. Similar to the examples of floating image display devices employing the optical systems in Figures 2A to 2C, the floating image 3 is projected into the air as an image by the image light that has passed through the transparent member 100. Furthermore, the operation of the floating image 3 by the user's finger 9004 can be detected using the sensing light of the air operation detection sensor 1351, which is located behind the transparent member 100 as seen from the user's perspective. 【0135】 In both the example of a floating image display device employing the optical systems shown in Figures 2A to 2C, and the example of a floating image display device employing the optical system shown in Figure 2D, the floating image 3 is projected in front of the transparent member 100, and the user's operation of the floating image 3 by their finger can be detected using the sensing light of the aerial operation detection sensor 1351, which is positioned behind the transparent member 100 as seen from the user's perspective. 
The floating image display device employing the optical system in Figure 2D thus differs from the floating image display devices employing the optical systems in Figures 2A to 2C in the optical system positioned behind the transparent member 100 as seen from the user's perspective. 【0136】 However, the usability of the floating image display device employing the optical system shown in Figure 2D from the user's perspective will be almost the same as that of the floating image display devices employing the optical systems shown in Figures 2A to 2C. 【0137】 Next, Figure 4O is a diagram showing an example of the configuration of a floating image display device. Figure 4O shows the configuration of the internal optical system in the floating image display device 1000 of Figure 4N. The floating image display device 1000 shown in Figure 4O is equipped with an optical system corresponding to the optical system in Figure 2D. The floating image display device 1000 shown in Figure 4O is installed horizontally so that the side on which the floating image 3 is formed faces upward. 【0138】 In other words, in Figure 4O, the floating image display device 1000 has a transparent member 100 installed on its upper surface. The floating image 3 is formed above the surface of the transparent member 100 of the floating image display device 1000. The light of the floating image 3 travels diagonally upward. If the aerial operation detection sensor 1351 is provided as shown in the figure, it is possible to detect operation of the floating image 3 by the finger of the user 230. 【0139】 Here, we compare the configuration of Figure 4O with the configuration of Figure 4A and confirm the differences. In Figure 4A, the display device 1 and the floating image 3 are symmetrical with respect to the plane of the polarization separation member 101. 
In contrast, in Figure 4O, the display device 1 and the floating image 3 are symmetrical with respect to the plane of the retroreflector 5. Also, the configuration of Figure 4A includes a retroreflector 2 and a λ / 4 plate 21, but these are not present in Figure 4O. Furthermore, in Figure 4A, it is preferable to have an absorptive polarizer 12, but in Figure 4O, an absorptive polarizer 12 is not particularly necessary. 【0140】 In other words, to replace the optical system of Figure 2A in the configuration of Figure 4A with the optical system of Figure 2D and thereby obtain the configuration of Figure 4O, the following should be done. That is, the polarization separation member 101 in the configuration of Figure 4A should be replaced with the retroreflector 5, and the retroreflector 2 and λ / 4 plate 21 should be removed from the configuration of Figure 4A. The absorptive polarizer 12 may or may not be included. By performing substitutions based on this idea, the optical systems of Figures 2A to 2C mounted in the floating image display device configurations of Figures 4A to 4G can be replaced with the optical system of Figure 2D, yielding floating image display devices that employ the optical system of Figure 2D. In this case, in Figures 4A and 4B, the polarization separation member 101 should be replaced with the retroreflector 5, and in Figures 4C to 4G, the polarization separation member 101B should be replaced with the retroreflector 5. 【0141】 In this way, a floating image display device can be realized by replacing the optical system in the configuration of the floating image display device shown in Figures 4A to 4G with the optical system shown in Figure 2D. Even with these floating image display devices in which the optical system has been replaced with the optical system shown in Figure 2D, it is possible to realize a floating image display device with almost the same ease of use as the floating image display devices shown in Figures 4A to 4G. 
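The substitution rule of 【0140】 can be written out as a small procedure. The sketch below is purely illustrative: representing a configuration as a list of component names is an assumption made here, and the example list is a hypothetical simplification rather than the actual parts list of Figure 4A.

```python
# Components replaced by the retroreflector 5 (Figures 4A/4B use 101,
# Figures 4C to 4G use 101B, per paragraph 0140).
REPLACED = {"polarization separation member 101",
            "polarization separation member 101B"}
# Components removed when moving to the Figure 2D optical system.
REMOVED = {"retroreflector 2", "lambda/4 plate 21"}


def convert_to_fig_2d(components):
    """Apply the substitution rule to a list of component names."""
    result = []
    for name in components:
        if name in REMOVED:
            continue  # not present in the Figure 2D optical system
        result.append("retroreflector 5" if name in REPLACED else name)
    return result


if __name__ == "__main__":
    # Hypothetical, simplified stand-in for the Figure 4A configuration.
    fig_4a = ["display device 1", "polarization separation member 101",
              "retroreflector 2", "lambda/4 plate 21",
              "absorptive polarizer 12", "transparent member 100"]
    print(convert_to_fig_2d(fig_4a))
```

The absorptive polarizer 12 is left in place here because the text states it may or may not be included.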
【0142】 <Display device> Next, the display device 1 of this embodiment will be described with reference to the figures. The display device 1 of this embodiment includes a video display element 11 (liquid crystal display panel) and a light source device 13 that constitutes its light source. Figure 5 shows the light source device 13 together with the liquid crystal display panel as an exploded perspective view. 【0143】 As shown by the arrow 30 in Figure 5, this liquid crystal display panel (image display element 11) receives an illumination beam from the light source device 13, which is a backlight device, that has narrow-angle diffusion characteristics, that is, strong directionality (straight-line propagation) and characteristics similar to laser light with the polarization plane aligned in one direction. The liquid crystal display panel (image display element 11) modulates the received illumination beam according to the input video signal. The modulated video light is reflected by the retroreflector 2 and transmitted through the transparent member 100 to form a floating image in space, which is a real image (see Figure 1). 【0144】 Furthermore, Figure 5 shows that the display device 1 is configured with a liquid crystal display panel 11, an optical direction conversion panel 54 that controls the directional characteristics of the light beam emitted from the light source device 13, and a narrow-angle diffuser (not shown) as needed. That is, polarizing plates are provided on both sides of the liquid crystal display panel 11, and image light with a specific polarization is emitted after the intensity of the light is modulated by the image signal (see arrow 30 in Figure 5). 
As a result, the desired image is projected as light with a specific polarization that has high directionality (straight-line propagation) towards the retroreflector 2 via the optical direction conversion panel 54, reflected by the retroreflector 2, and transmitted towards the eyes of an observer outside the store (space) to form a floating image 3 in space. Note that a protective cover 50 (see Figures 6 and 7) may be provided on the surface of the optical direction conversion panel 54 described above. 【0145】 <Example of a display device 1> Figure 6 shows an example of the specific configuration of the display device 1. In Figure 6, a liquid crystal display panel 11 and an optical direction conversion panel 54 are arranged on top of the light source device 13 shown in Figure 5. The case of the light source device 13 shown in Figure 5 is formed from, for example, plastic, and houses LED elements 201 and a light guide 203 inside. As shown in Figure 5, in order to convert the divergent light emitted from each LED element 201 into a nearly parallel light beam, the end face of the light guide 203 has a lens shape whose cross-sectional area gradually increases toward the light receiving part, so that the divergence angle gradually decreases through multiple total internal reflections as the light propagates through the interior. The liquid crystal display panel 11 that constitutes the display device 1 is mounted on the top surface of the light source device 13. In addition, an LED (Light Emitting Diode) element 201, which is a semiconductor light source, and an LED substrate 202 on which its control circuit is mounted are attached to one side of the case of the light source device 13 (the left end face in this example). A heat sink, which is a component for dissipating the heat generated by the LED elements and control circuit, may be attached to the outer surface of the LED substrate 202. 
【0146】 Furthermore, the frame (not shown) of the liquid crystal display panel, which is mounted on the top surface of the case of the light source device 13, holds the liquid crystal display panel 11 and an FPC (Flexible Printed Circuits) (not shown) electrically connected to the liquid crystal display panel 11. In other words, the liquid crystal display panel 11, which is an image display element, generates a display image by modulating the intensity of transmitted light based on a control signal from a control circuit (image control unit 1160 in Figure 3) that constitutes the electronic device, together with the LED element 201, which is a solid light source. At this time, the generated image light has a narrow diffusion angle and consists only of specific polarization components, so a new and unprecedented image display device is obtained that is similar to a surface-emitting laser image source driven by an image signal. Currently, obtaining a laser beam of the same size as the image obtained by the above-described display device 1 using a laser device is not feasible, for both technical and safety reasons. Therefore, in this embodiment, for example, light similar to the surface-emitting laser image light described above is obtained from a light beam from a general light source equipped with an LED element. 【0147】 Next, the configuration of the optical system housed within the case of the light source device 13 will be explained in detail with reference to Figure 7, along with Figure 6. 【0148】 Since Figures 6 and 7 are cross-sectional views, only one of the multiple LED elements 201 constituting the light source is shown; the light from these elements is converted into approximately collimated light by the shape of the light-receiving end face 203a of the light guide 203. 
For this reason, the light-receiving portion of the end face of the light guide and the LED elements are mounted while maintaining a predetermined positional relationship. 【0149】 Each of these light guides 203 is formed from a translucent resin such as acrylic. The LED light-receiving surface at the end of the light guide 203 has, for example, a cone-shaped outer surface obtained by rotating a parabolic cross-section, with a recess at its apex that forms a convex portion (i.e., a convex lens surface) in its center, and a convex lens surface (or a concave lens surface that is recessed inward) in the center of its flat surface (not shown). The outer shape of the light-receiving portion of the light guide to which the LED element 201 is attached is a parabolic shape that forms a cone-shaped outer surface, and is set within an angle range that allows for total internal reflection of light emitted from the LED element in the peripheral direction, or a reflective surface is formed. 【0150】 On the other hand, the LED elements 201 are each positioned at predetermined locations on the surface of the LED substrate 202, which is their circuit board. The LED substrate 202 is fixed to the LED collimator (light-receiving end face 203a) such that the LED elements 201 on its surface are each positioned in the center of the aforementioned recess. 【0151】 With this configuration, the shape of the light-receiving end face 203a of the light guide 203 makes it possible to extract the light emitted from the LED element 201 as substantially parallel light, thereby improving the utilization efficiency of the generated light. 【0152】 As described above, the light source device 13 is configured by attaching a light source unit, which consists of multiple LED elements 201 arranged in a row, to a light-receiving end surface 203a, which is a light-receiving part provided on the end face of a light guide 203. 
The divergent light beam from the LED elements 201 is converted into approximately parallel light by the lens shape of the light-receiving end surface 203a on the end face of the light guide, and guided through the inside of the light guide 203 (in the direction parallel to the drawing) as shown by the arrows. The light beam direction conversion means 204 then emits the light towards the liquid crystal display panel 11, which is arranged approximately parallel to the light guide 203 (in the direction perpendicular to the viewer in the drawing). By optimizing the distribution (density) of this light beam direction conversion means 204 depending on the shape of the inside or surface of the light guide, the uniformity of the light beam incident on the liquid crystal display panel 11 can be controlled. 【0153】 The aforementioned light beam direction conversion means 204, by the shape of the surface of the light guide or by providing, for example, a portion with a different refractive index inside the light guide, emits the light beam propagated within the light guide toward the liquid crystal display panel 11, which is positioned approximately parallel to the light guide 203 (in a direction perpendicular to the viewer in the drawing). At this time, if the relative brightness ratio when comparing the brightness of the center of the screen and the periphery of the screen with the liquid crystal display panel 11 facing the center of the screen and the viewpoint positioned at the same position as the screen diagonal dimension is 20% or more, there is no practical problem, and if it exceeds 30%, it is an even better characteristic. 【0154】 Figure 6 is a cross-sectional diagram illustrating the configuration and operation of the light source in this embodiment, which performs polarization conversion, in the light source device 13 including the light guide 203 and LED element 201 described above. 
In Figure 6, the light source device 13 consists of a light guide 203 with a light beam direction conversion means 204 provided on the surface or inside, which is formed of plastic or the like, an LED element 201 as a light source, a reflective sheet 205, a phase difference plate 206, a lenticular lens, etc., and a liquid crystal display panel 11 equipped with polarizing plates on the light source light incident surface and the image light output surface is attached to its upper surface. 【0155】 Furthermore, a film or sheet-like reflective polarizer 49 is provided on the light source light incidence surface (lower surface in the figure) of the liquid crystal display panel 11 corresponding to the light source device 13, selectively reflecting one side of the polarization (e.g., P-wave) 212 of the natural light beam 210 emitted from the LED element 201. The reflected light is reflected again by a reflective sheet 205 provided on one side (lower surface in the figure) of the light guide 203 and directed towards the liquid crystal display panel 11. Therefore, a phase difference plate (λ / 4 plate) is provided between the reflective sheet 205 and the light guide 203 or between the light guide 203 and the reflective polarizer 49, causing the reflected light beam to be reflected by the reflective sheet 205 and passed through twice, converting the reflected light beam from P-polarization to S-polarization and improving the efficiency of utilizing the light source light as image light. The image light beam whose light intensity has been modulated by the image signal in the liquid crystal display panel 11 (arrow 213 in Figure 6) is incident on the retroreflector 2. After reflection by the retroreflector 2, a spatially floating image, which is a real image, can be obtained. 
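The polarization conversion in 【0155】, where light passes through the λ / 4 plate twice and changes from P-polarization to S-polarization, can be checked with Jones calculus. The sketch rests on assumptions not given in the source: the plate's fast axis is taken to be at 45° to the P direction, the reflective sheet 205 is idealized as leaving the Jones vector unchanged in a fixed frame, and x and y stand in for P and S. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(m, v):
    """Apply a 2x2 matrix to a 2-component Jones vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def quarter_wave_plate(theta_deg):
    """Jones matrix of a quarter-wave plate, fast axis at theta_deg."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    rot = [[c, -s], [s, c]]          # rotation into the lab frame
    rot_t = [[c, s], [-s, c]]        # its transpose (inverse)
    retard = [[1, 0], [0, 1j]]       # quarter-wave (90 degree) retardation
    return matmul(matmul(rot, retard), rot_t)


def double_pass(v, theta_deg=45.0):
    """Jones vector after two passes through the same plate."""
    qwp = quarter_wave_plate(theta_deg)
    return matvec(matmul(qwp, qwp), v)


if __name__ == "__main__":
    p_polarized = [1, 0]             # assumed: x component represents P
    out = double_pass(p_polarized)
    print([round(abs(z), 6) for z in out])  # magnitudes of (P, S) parts
```

Under these assumptions the two passes act like a half-wave plate at 45°, so the energy moves entirely from the P component to the S component, which is the conversion the paragraph relies on to reuse the reflected light.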
【0156】 Figure 7, similar to Figure 6, is a cross-sectional arrangement diagram illustrating the configuration and operation of the light source in this embodiment, which performs polarization conversion, in a light source device 13 including a light guide 203 and an LED element 201. The light source device 13 is similarly composed of a light guide 203 with a light beam direction conversion means 204 provided on the surface or inside, for example, made of plastic, an LED element 201 as a light source, a reflective sheet 205, a phase difference plate 206, a lenticular lens, and the like. A liquid crystal display panel 11, equipped with polarizing plates on the light source light incident surface and the image light emission surface, is mounted on the upper surface of the light source device 13 as an image display element. 【0157】 Furthermore, a film or sheet-like reflective polarizer 49 is provided on the light source light incidence surface (lower surface in the figure) of the liquid crystal display panel 11 corresponding to the light source device 13, selectively reflecting one side of the polarization (e.g., S-wave) 211 of the natural light beam 210 emitted from the LED element 201. That is, in the example of Figure 7, the selective reflection characteristics of the reflective polarizer 49 are different from those in Figure 6. The reflected light is reflected by a reflective sheet 205 provided on one side (lower surface in the figure) of the light guide 203 and returns to the liquid crystal display panel 11. A phase difference plate (λ / 4 plate) is provided between the reflective sheet 205 and the light guide 203 or between the light guide 203 and the reflective polarizer 49, causing the reflected light beam to be reflected by the reflective sheet 205 and passed through twice, thereby converting the reflected light beam from S-polarization to P-polarization and improving the utilization efficiency of the light source light as image light. 
The image light beam whose light intensity has been modulated by the image signal in the liquid crystal display panel 11 (arrow 214 in Figure 7) is incident on the retroreflector 2. After reflection by retroreflector 2, a real image, which is a floating image in space, can be obtained. 【0158】 In the light source devices shown in Figures 6 and 7, in addition to the action of the polarizer provided on the light incident surface of the corresponding liquid crystal display panel 11, a reflective polarizer reflects one side of the polarization component. Therefore, the theoretically obtainable contrast ratio is the product of the reciprocal of the cross transmittance of the reflective polarizer and the reciprocal of the cross transmittance obtained by the two polarizers attached to the liquid crystal display panel. This results in high contrast performance. In actual experiments, it was confirmed that the contrast performance of the displayed image improved by more than 10 times. As a result, high-quality images comparable to those of self-emissive organic EL displays were obtained. 【0159】 <Example of display device 2> Figure 8 shows another example of the specific configuration of the display device 1. This light source device 13 is constructed by housing LEDs, a collimator, a composite diffusion block, a light guide, etc., in a case made of, for example, plastic, and a liquid crystal display panel 11 is mounted on its top surface. In addition, an LED substrate on which semiconductor light source LED (Light Emitting Diode) elements 14a and 14b and their control circuits are mounted is attached to one side of the case of the light source device 13, and a heat sink 103, which is a component for cooling the heat generated by the LED elements and control circuits, is attached to the outer surface of the LED substrate. 
【0160】 Furthermore, the liquid crystal display panel frame attached to the top surface of the case is configured with a liquid crystal display panel 11 mounted on the frame, and an FPC (Flexible Printed Circuits) 403 electrically connected to the liquid crystal display panel 11. In other words, the liquid crystal display panel 11, which is a liquid crystal display element, generates a display image by modulating the intensity of transmitted light based on control signals from a control circuit (not shown here) that constitutes the electronic device, together with LED elements 14a and 14b, which are solid light sources. 【0161】 <Example of a display device 3> Next, using Figure 9, another example of the specific configuration of the display device 1 (Example 3 of the display device) will be explained. In this display device 1, the light source device converts the divergent luminous flux of light from the LED (a mixture of P-polarized and S-polarized light) into a nearly parallel luminous flux by the collimator 18, and reflects it toward the liquid crystal display panel 11 by the reflective surface of the reflective light guide 304. The reflected light is incident on a reflective polarizer 49 placed between the liquid crystal display panel 11 and the reflective light guide 304. The reflective polarizer 49 transmits light of a specific polarization (e.g., P-polarized light), and the transmitted polarized light is incident on the liquid crystal display panel 11. Here, other polarizations other than the specific polarization (e.g., S-polarized light) are reflected by the reflective polarizer 49 and return to the reflective light guide 304. 【0162】 The reflective polarizer 49 is installed at an angle to the liquid crystal display panel 11 so as not to be perpendicular to the principal ray of light from the reflective surface of the reflective light guide 304. 
The principal ray of light reflected by the reflective polarizer 49 is incident on the transmissive surface of the reflective light guide 304. The light incident on the transmissive surface of the reflective light guide 304 passes through the back of the reflective light guide 304, passes through the λ / 4 plate 270 which is a phase difference plate, and is reflected by the reflector 271. The light reflected by the reflector 271 passes through the λ / 4 plate 270 again and passes through the transmissive surface of the reflective light guide 304. The light that has passed through the transmissive surface of the reflective light guide 304 is incident on the reflective polarizer 49 again. 【0163】 At this time, the light that again enters the reflective polarizer 49 has passed through the λ / 4 plate 270 twice, so its polarization has been converted to a polarization that can be transmitted through the reflective polarizer 49 (for example, P-polarization). Therefore, the light whose polarization has been converted passes through the reflective polarizer 49 and enters the liquid crystal display panel 11. Regarding the polarization design related to polarization conversion, it is also acceptable to configure the polarization in reverse from the above explanation (reversing S-polarization and P-polarization). 【0164】 As a result, the light from the LEDs is aligned to a specific polarization (e.g., P-polarization), incident on the liquid crystal display panel 11, and is luminance-modulated in accordance with the video signal to display an image on the panel surface. Similar to the example described above, multiple LEDs constituting the light source are shown (however, only one is shown in Figure 9 because it is a vertical cross-section), and these are mounted in predetermined positions relative to the collimator 18. 【0165】 Each collimator 18 is formed from a translucent resin such as acrylic or glass. 
The collimator 18 may have a cone-shaped outer surface obtained by rotating a parabolic cross-section. The collimator 18 may also have a recess with a convex portion (i.e., a convex lens surface) in the center of the top portion (the side facing the LED substrate 102). Furthermore, the central part of the planar portion of the collimator 18 (the side opposite to the top portion) has a convex lens surface that protrudes outward (or a concave lens surface that is recessed inward). The parabolic surface forming the cone-shaped outer surface of the collimator 18 is set within an angle range that allows for total internal reflection of light emitted from the LED in the peripheral direction, or a reflective surface is formed thereon. 【0166】 The LEDs are each positioned at predetermined locations on the surface of the LED board 102, which is the circuit board for the LEDs. The LED board 102 is fixed to the collimator 18 such that the LEDs on its surface are each positioned at the center of the apex of the cone-shaped convex form (or in the recess if there is a recess at the apex). 【0167】 With this configuration, the collimator 18 focuses the light emitted from the LED, particularly the light emitted from its central portion, into parallel light due to the convex lens surface that forms the outer shape of the collimator 18. Similarly, the light emitted from other parts toward the periphery is reflected by the parabolic surface that forms the conical outer surface of the collimator 18, and is also focused into parallel light. In other words, a collimator 18 with a convex lens in its center and a parabolic surface around its periphery makes it possible to extract almost all of the light generated by the LED as parallel light, thereby improving the utilization efficiency of the generated light. 【0168】 Furthermore, the light converted to nearly parallel light by the collimator 18 shown in Figure 9 is reflected by the reflective light guide 304. 
Of this light, light of a specific polarization is transmitted through the reflective polarizer 49, and the light of the other polarization reflected by the reflective polarizer 49 is transmitted through the light guide 304 again. This light is reflected by the reflector 271, which is located opposite the liquid crystal display panel 11 to the reflective light guide 304. At this time, the light undergoes polarization conversion by passing through the λ / 4 plate 270, which is a phase difference plate, twice. The light reflected by the reflector 271 is transmitted through the light guide 304 again and incident on the reflective polarizer 49 located on the opposite side. Since the polarization conversion has been performed on this incident light, it is transmitted through the reflective polarizer 49, and its polarization direction is aligned before it is incident on the liquid crystal display panel 11. As a result, all of the light from the light source can be utilized, so the geometrical optical utilization efficiency of the light is doubled. Furthermore, since the polarization degree (extinction ratio) of the reflective polarizer is also added to the extinction ratio of the entire system, the contrast ratio of the entire display device is significantly improved by using the light source device of this embodiment. The reflection and diffusion angles of light at each reflective surface can be adjusted by adjusting the surface roughness of the reflective surface of the reflective light guide 304 and the surface roughness of the reflector 271. The surface roughness of the reflective surface of the reflective light guide 304 and the surface roughness of the reflector 271 should be adjusted for each design to further optimize the uniformity of the light incident on the liquid crystal display panel 11. 
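The polarization conversion described above (S-polarized light passing through the phase difference plate 270, reflecting at the reflector 271, and passing through the plate again) can be checked with a small Jones-calculus sketch. The following is an illustration under stated assumptions, not the patent's design procedure: the plate is an ideal retarder with its fast axis at 45° to the P/S axes, the reflector is treated as an ideal mirror at normal incidence that leaves the Jones vector unchanged in fixed transverse coordinates, and all function names are hypothetical.

```python
import cmath
import math

def matmul(a, b):
    """Multiply two 2x2 complex matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def retarder(delta, theta):
    """Jones matrix of an ideal retarder: retardation delta, fast axis at theta."""
    c, s = math.cos(theta), math.sin(theta)
    rot = ((c, -s), (s, c))      # rotation by +theta
    rot_t = ((c, s), (-s, c))    # rotation by -theta
    phase = ((1, 0), (0, cmath.exp(1j * delta)))
    return matmul(matmul(rot, phase), rot_t)

def p_fraction_after_double_pass(delta, theta=math.pi / 4):
    """Fraction of S-polarized input power that is P-polarized after two passes."""
    plate = retarder(delta, theta)
    round_trip = matmul(plate, plate)  # plate, ideal mirror (assumed identity), plate
    s_in = (0, 1)                      # (P amplitude, S amplitude)
    p_out = round_trip[0][0] * s_in[0] + round_trip[0][1] * s_in[1]
    s_out = round_trip[1][0] * s_in[0] + round_trip[1][1] * s_in[1]
    power = abs(p_out) ** 2 + abs(s_out) ** 2
    return abs(p_out) ** 2 / power

# An exact quarter-wave retardation per pass converts S to P completely.
exact = p_fraction_after_double_pass(math.pi / 2)
# A plate detuned to 80 degrees per pass (hypothetical value) converts most of it.
detuned = p_fraction_after_double_pass(math.radians(80))
```

Under these assumptions the converted fraction equals the squared sine of the per-pass retardation, so what matters is the net effect of the two passes rather than the nominal single-pass value of the plate.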
【0169】 Note that the λ / 4 plate 270, which is the phase difference plate in Figure 9, does not necessarily need to have a phase difference of λ / 4 with respect to polarized light incident perpendicularly on the λ / 4 plate 270. In the configuration of Figure 9, any phase difference plate that changes the polarization by 90° (a total phase difference of λ / 2) after the polarized light passes through it twice is sufficient. The thickness of the phase difference plate should be adjusted according to the incident angle distribution of the polarized light. 【0170】 <Example of a display device 4> Furthermore, another example of the optical system configuration for the light source device of a display device (Example 4 of a display device) will be explained using Figure 10. This is an example of a configuration in which a diffusion sheet is used instead of the reflective light guide 304 in the light source device of Example 3 of a display device. Specifically, two optical sheets (optical sheet 207A and optical sheet 207B) that convert the diffusion characteristics in the vertical and horizontal directions (front and back directions in the figure, not shown) are used on the light output side of the collimator 18, and the light from the collimator 18 is incident between the two optical sheets (diffusion sheets). 【0171】 Note that the optical sheet described above may be a single sheet instead of two. If a single sheet is used, the vertical and horizontal diffusion characteristics are adjusted by the micro-shapes of the front and back surfaces of the single optical sheet. Alternatively, multiple diffusion sheets may be used to share the function. 
In the example shown in Figure 10, the reflection and diffusion characteristics of optical sheets 207A and 207B are optimized, using the number of LEDs, the divergence angle from the LED substrate (optical element) 102, and the optical specifications of the collimator 18 as design parameters, so that the surface density of the light beam emitted from the liquid crystal display panel 11 is uniform. In other words, the diffusion characteristics are adjusted by the surface shapes of multiple diffusion sheets instead of a light guide. 【0172】 In the example shown in Figure 10, polarization conversion is performed in the same manner as in Example 3 of the display device described above. That is, in the example shown in Figure 10, the reflective polarizer 49 should be configured to have the characteristic of reflecting S-polarized light (and transmitting P-polarized light). In that case, P-polarized light from the light source LED is transmitted, and the transmitted light is incident on the liquid crystal display panel 11. S-polarized light from the light source LED is reflected, and the reflected light passes through the phase difference plate 270 shown in Figure 10. The light that has passed through the phase difference plate 270 is reflected by the reflector 271. The light reflected by the reflector 271 is converted to P-polarized light by passing through the phase difference plate 270 again. The converted light passes through the reflective polarizer 49 and is incident on the liquid crystal display panel 11. 【0173】 Note that the λ / 4 plate 270, which is the phase difference plate in Figure 10, does not necessarily need to have a phase difference of λ / 4 with respect to polarized light incident perpendicularly on the λ / 4 plate 270. In the configuration of Figure 10, any phase difference plate that changes the polarization by 90° (a total phase difference of λ / 2) after the polarized light passes through it twice is sufficient. 
The thickness of the phase difference plate should be adjusted according to the incident angle distribution of the polarized light. Also, in Figure 10, regarding the polarization design related to polarization conversion, the polarization can be reversed from the explanation above (S-polarization and P-polarization are reversed). 【0174】 In typical TV applications, the light emitted from the liquid crystal display panel 11 has similar diffusion characteristics in both the horizontal direction (shown on the X-axis in Figure 12(a)) and the vertical direction (shown on the Y-axis in Figure 12(b)). In contrast, the diffusion characteristics of the light beam emitted from the liquid crystal display panel in this embodiment are such that, for example, as shown in Example 1 in Figure 12, the viewing angle at which the brightness is 50% of that of a front view (0-degree angle) is 13 degrees, which is about 1 / 5 of the 62 degrees of a typical TV application. Similarly, the vertical viewing angle is made asymmetric, with the upper viewing angle being reduced to about 1 / 3 of the lower viewing angle by optimizing the reflection angle of the reflective light guide and the area of the reflective surface. As a result, compared to conventional liquid crystal TVs, the amount of image light directed towards the viewing direction is significantly increased, and the brightness is more than 50 times higher. 【0175】 Furthermore, with the viewing angle characteristics shown in Example 2 of Figure 12, the viewing angle at which the brightness is 50% of that of a front view (0-degree angle) is set to 5 degrees, which is about 1 / 12 of the 62 degrees of a typical TV device. Similarly, the vertical viewing angle is made equal for the upper and lower sides, and the reflection angle of the reflective light guide and the area of the reflective surface are optimized to reduce the viewing angle to about 1 / 12 of that of a typical TV device. 
As a result, the amount of image light directed towards the viewing direction is significantly increased compared to conventional liquid crystal TVs, and the brightness becomes more than 100 times higher. 【0176】 As described above, by narrowing the viewing angle, the light flux can be concentrated towards the viewing direction, significantly improving the efficiency of light utilization. As a result, even when using a liquid crystal display panel for general TV applications, by controlling the light diffusion characteristics of the light source device, a significant increase in brightness can be achieved with similar power consumption, making it possible to create a video display device that is suitable for information display systems for bright outdoor environments. 【0177】 When using a large liquid crystal display panel, the overall brightness of the screen can be improved by directing the light from the edges of the screen inward so that it travels towards the viewer when the viewer is facing the center of the screen. Figure 11 shows the convergence angles of the long and short sides of the panel, with the viewer's distance L from the panel and the panel size (screen aspect ratio 16:10) as parameters. When viewing the screen in portrait orientation, the convergence angle should be set to match the short side. For example, with a 22-inch panel used vertically and a viewing distance of 0.8m, setting the convergence angle to 10 degrees will effectively direct the image light from the four corners of the screen towards the viewer. 【0178】 Similarly, when viewing a 15-inch panel in portrait orientation, if the viewing distance is 0.8m, a convergence angle of 7 degrees will effectively direct the image light from the four corners of the screen towards the viewer. 
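The two numerical examples above can be roughly reproduced with simple geometry. The sketch below assumes that the convergence angle is the half-angle subtended by one side of a 16:10 panel at the viewing (monitoring) distance L, measured from the screen center. This reading is an assumption, since Figure 11 is not reproduced here, and the function names are hypothetical.

```python
import math

MM_PER_INCH = 25.4

def panel_sides_mm(diagonal_inch, aspect=(16, 10)):
    """Return (long side, short side) in mm for a panel of the given diagonal."""
    long_units, short_units = aspect
    scale = diagonal_inch * MM_PER_INCH / math.hypot(long_units, short_units)
    return long_units * scale, short_units * scale

def convergence_angle_deg(side_mm, distance_mm):
    """Half-angle subtended by a side of the panel, seen from distance_mm away."""
    return math.degrees(math.atan((side_mm / 2) / distance_mm))

# Portrait use: match the short side, as the text recommends.
_, short_22 = panel_sides_mm(22)
_, short_15 = panel_sides_mm(15)
angle_22 = convergence_angle_deg(short_22, 800)  # 0.8 m viewing distance
angle_15 = convergence_angle_deg(short_15, 800)
```

With these assumptions the 22-inch case comes out a little above 10 degrees and the 15-inch case a little above 7 degrees, which is consistent with the values quoted in the text.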
As described above, depending on the size of the liquid crystal display panel and whether it is used vertically or horizontally, the overall brightness of the screen can be improved by directing the image light from the periphery of the screen towards a viewer who is in the optimal position to view the center of the screen. 【0179】 In its basic configuration, as shown in Figure 9, a light source device emits a light beam with narrow-angle directional characteristics onto the liquid crystal display panel 11. The light beam is luminance-modulated in accordance with the video signal, the video information displayed on the screen of the liquid crystal display panel 11 is reflected by a retroreflector, and the resulting floating image is displayed outdoors or indoors via a transparent member 100. 【0180】 By using the display device and light source device according to one embodiment of the present invention described above, it becomes possible to realize a spatial floating image display device with higher light utilization efficiency. 【0181】 <Example of video display processing in a spatially floating video display device> Next, an example of a problem that the image processing of this embodiment solves will be explained using Figure 13A. In the floating spatial image display device 1000, when the area behind the floating spatial image 3 is inside the housing of the floating spatial image display device 1000 as seen from the user's perspective, and is sufficiently dark, the user perceives the background of the floating spatial image 3 as black. 【0182】 Here, using Figure 13A, we will explain an example of displaying the character "Panda" 1525 in the floating spatial image 3. 
First, as shown in Figure 13A(1), for an image that includes both the pixel area for drawing the image of the character "Panda" 1525 and the transparent information area 1520, which is the background image, the video control unit 1160 in Figure 3 recognizes these two areas as distinct from each other. 【0183】 One method for distinguishing and recognizing character images and background images is to configure the image processing of the video control unit 1160 so that the background image layer and the character image layer in front of the background image layer can be processed as separate layers, and the character images and background images can be distinguished and recognized based on the superposition relationship when these layers are combined. 【0184】 Here, the video control unit 1160 recognizes the black pixels that render objects such as character images and the transparent information pixels as different information. However, it is assumed that both the black pixels that render objects and the transparent information pixels have a brightness of 0. In this case, when displaying the floating spatial image 3, there is no difference in brightness between the pixels that render black in the image of the character "Panda" 1525 and the pixels of the transparent information area 1520, which is the background image. Therefore, in the floating spatial image 3, as shown in Figure 13A(2), neither the pixels that render black in the image of the character "Panda" 1525 nor the pixels of the transparent information area 1520 have any brightness, and both are optically perceived by the user as the same black space. In other words, the parts of the image of the character "Panda" 1525 that are rendered in black blend into the background, and only the non-black parts of the character "Panda" 1525 are perceived as floating in the display area of the floating spatial image 3. 
【0185】 An example of image processing in this embodiment will be explained using Figure 13B. Figure 13B illustrates an example of image processing that more effectively resolves the problem described in Figure 13A, where the black image area of an object blends into the background. In Figures 13B(1) and (2), the display state of the floating spatial image 3 is shown on the upper side, and the input / output characteristics of the image processing of the object's image are shown on the lower side. The image of the object (character "Panda" 1525) and its corresponding data may be read from the storage unit 1170 or memory 1109 in Figure 3. Alternatively, it may be input from the video signal input unit 1131. Alternatively, it may be acquired via the communication unit 1132. 【0186】 In the state shown in Figure 13B(1), the input / output characteristics of the image processing of the object's image are in a linear state with no special adjustments. In this case, the display state is the same as in Figure 13A(2), and the black image area of the object blends into the background. In contrast, in Figure 13B(2), the video control unit 1160 of this embodiment adjusts the input / output characteristics of the image processing of the object (character "Panda" 1525) to the input / output characteristics shown in the lower section. 【0187】 In other words, the video control unit 1160 performs input-output characteristic image processing on the image of the object (character "Panda" 1525), which has the characteristic of converting the pixels of the input image into output pixels with increased brightness values for pixels in the low-brightness region. After the image of the object (character "Panda" 1525) is subjected to this input-output characteristic image processing, the video including the image of the object (character "Panda" 1525) is input to the display device 1 and displayed. 
As a result, the display state of the floating video 3 is as shown in the upper part of Figure 13B(2), where the brightness of the pixel regions that draw black in the image of the character "Panda" 1525 increases. This makes it possible to distinguish the regions that draw black within the area where the image of the character "Panda" 1525 is drawn from the black background and allow the user to recognize it, making it possible to display the object more favorably. 【0188】 In other words, by using the image processing shown in Figure 13B(2), the area displaying the image of the object character "Panda" 1525 can be distinguished from the black background inside the housing of the floating image display device 1000 via the window, thereby improving the visibility of the object. Therefore, for example, even if an object contains pixels with a brightness value of 0 before the image processing (i.e., when the image of the object and its corresponding data are read from the storage unit 1170 or memory 1109 in Figure 3, or when the image of the object is input from the video signal input unit 1131, or when the data of the object is acquired via the communication unit 1132, etc.), the image processing of the input / output characteristics by the video control unit 1160 converts it into an object with increased brightness values in the low-brightness region, which is then displayed on the display device 1 and converted into a floating image 3 by the optical system of the floating image display device 1000. 【0189】 In other words, the pixels constituting the object after image processing of the input / output characteristics are converted to a state in which no pixels with a brightness value of 0 are included, and then displayed on the display device 1, and converted into a floating image 3 by the optical system of the floating image display device 1000. 
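The input / output characteristic described above, which raises the brightness of pixels in the low-brightness region so that no object pixel is left at a brightness value of 0, can be sketched as a lookup table. This is only one possible curve: the piecewise-linear shape, the 8-bit range, and the parameters black_floor and knee are assumptions chosen for illustration and are not taken from Figure 13B(2).

```python
def lift_shadows(value, black_floor=24, knee=96):
    """Map an 8-bit input brightness to an output with raised low-brightness values.

    Inputs below `knee` are remapped linearly from [0, knee] to [black_floor, knee];
    inputs at or above `knee` pass through unchanged (the linear characteristic).
    """
    if not 0 <= value <= 255:
        raise ValueError("brightness must be in the range 0..255")
    if value >= knee:
        return value
    return round(black_floor + value * (knee - black_floor) / knee)

# Precompute the whole characteristic once, then apply it per pixel.
lut = [lift_shadows(v) for v in range(256)]

# Hypothetical object pixels: pure black, dark gray, mid gray, white.
object_pixels = [0, 32, 128, 255]
processed = [lut[p] for p in object_pixels]
```

A curve of this kind keeps mid and high brightness values unchanged while guaranteeing that black parts of the object still emit some light, which is what separates them from a background that emits none.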
【0190】 Furthermore, one method for applying the input / output characteristics image processing shown in Figure 13B(2) only to the area of the object (character "Panda" 1525) image is to configure the image processing of the video control unit 1160 so that the background image layer and the character image layer in front of the background image layer can be processed as separate layers, and to apply the input / output characteristics image processing shown in Figure 13B(2) to the character image layer while not applying it to the background image layer. 【0191】 Subsequently, when these layers are combined, as shown in Figure 13B(2), only the character image will be subjected to image processing that enhances the low-luminance areas of the input image. Alternatively, the character image layer and the background image layer may be combined first, and then the image processing with the input / output characteristics shown in Figure 13B(2) may be applied only to the character image area. 【0192】 Furthermore, the input / output characteristics used in video processing that enhances the low-luminance region of the input video are not limited to the example shown in Figure 13B(2). Any video processing that enhances low luminance is acceptable, including so-called brightness adjustment. Alternatively, video processing that improves visibility may be performed by controlling the gain that changes the weighting of the retinex processing, as disclosed in International Publication No. 2014 / 162533. 【0193】 As explained above, the image processing shown in Figure 13B(2) allows areas rendered in black within the image rendering area of characters, objects, and the like to be recognized by the user without blending into the black background, thereby achieving a more suitable display. 
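The layer-based approach described above, in which the low-luminance enhancement is applied to the character image layer and not to the background image layer before the two are combined, can be sketched as follows. The data layout is an assumption made for illustration: each layer is a list of rows of 8-bit brightness values, and None in the character layer marks a transparent pixel. The simple linear lift used as the enhancement is likewise a placeholder for whatever characteristic is actually chosen.

```python
def composite(character_layer, background_layer, enhance):
    """Combine two layers, applying `enhance` only to character pixels.

    A character pixel of None is transparent, so the background pixel is used
    unchanged at that position.
    """
    result = []
    for char_row, bg_row in zip(character_layer, background_layer):
        row = []
        for char_px, bg_px in zip(char_row, bg_row):
            row.append(bg_px if char_px is None else enhance(char_px))
        result.append(row)
    return result

def linear_lift(value, floor=24):
    """Placeholder enhancement: remap 0..255 linearly onto floor..255."""
    return floor + value * (255 - floor) // 255

# 2x2 example: black and white character pixels over a transparent (0) background.
character = [[None, 0], [255, None]]
background = [[0, 0], [0, 0]]
frame = composite(character, background, linear_lift)
```

In this example the black character pixel leaves the compositor at brightness 24 while the transparent background positions stay at 0, so the two are no longer identical in the displayed image.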
【0194】 In the examples shown in Figures 13A and 13B, the challenges and more suitable image processing methods were explained using examples of spatial levitation image display devices where the background appears black (for example, the spatial levitation image display device 1000 in Figures 4A-G, or the spatial levitation image display device 1000 in Figures 4I and 4J with the rear window shading). However, this image processing method is also effective for devices other than these spatial levitation image display devices. 【0195】 Specifically, in the floating image display device 1000 shown in Figure 4H, and in the floating image display device 1000 shown in Figures 4I and 4J with the rear window not shading, the background of the floating image 3 is not black, but rather the scenery behind the floating image display device 1000 through the window. In this case as well, the problems described in Figures 13A and 13B still exist. 【0196】 In other words, the parts of the image of the character "Panda" 1525 that are rendered in black blend into the scenery behind the floating video display device 1000 through the window. In this case as well, by using the image processing shown in Figure 13B(2), the parts of the image of the character "Panda" 1525 that are rendered in black can be distinguished and recognized from the scenery behind the floating video display device 1000 through the window, thereby improving the visibility of the object. 【0197】 In other words, by using the image processing shown in Figure 13B(2), the area displaying the image of the object character "Panda" 1525 can be distinguished from the scenery behind the spatially floating image display device 1000 through the window, making it more favorable to recognize that the object character "Panda" 1525 is in front of the scenery, thereby improving the visibility of the object. 
【0198】 Furthermore, in the floating image display device 1000 shown in Figures 4K, 4L, and 4M, as described above, if another image (such as the image from the transmissive self-emissive image display device 1650 or the image from the second display device 1680) is displayed at a different depth from the floating image 3, the background of the floating image 3 will not be black, but will be that other image. In this case as well, the problems described in Figures 13A and 13B still exist. 【0199】 In other words, the parts of the image of the object character "Panda" 1525 that are rendered in black blend into the other image which is displayed at a different depth from the floating spatial image 3. In this case as well, by using the image processing shown in Figure 13B(2), the parts of the image of the object character "Panda" 1525 that are rendered in black can be distinguished and recognized from the other image, thereby improving the visibility of the object. 【0200】 In other words, by using the image processing shown in Figure 13B(2), the area displaying the image of the object character "Panda" 1525 can be recognized separately from the other video, making it more favorable to recognize that the object character "Panda" 1525 is in front of the other video, thereby improving the visibility of the object. 【0201】 An example of the video display processing in this embodiment will be explained using Figure 13C. Figure 13C is an example of video display in this embodiment in which the floating video 3 and a second image 2050, which is another video, are displayed simultaneously. The second image 2050 may correspond to the display video of the transmissive self-emissive video display device 1650 in Figure 4K or Figure 4L. Alternatively, the second image 2050 may correspond to the display video of the second display device 1680 in Figure 4M. 
【0202】 In other words, the example of video display in Figure 13C is a concrete example of the video display example of the spatial floating video display device 1000 shown in Figures 4K, 4L, and 4M. In this example, a bear character is displayed in spatial floating video 3. Areas other than the bear character in spatial floating video 3 are displayed in black and are transparent as spatial floating video. The second image 2050 is a background image depicting a plain, mountains, and the sun. 【0203】 In Figure 13C, the floating image 3 and the second image 2050 are displayed at different depths. By viewing the two images, floating image 3 and the second image 2050, in the line of sight indicated by arrow 2040, user 230 can see the two images superimposed. Specifically, the bear character from floating image 3 appears superimposed in front of the background of plains, mountains, and the sun depicted in the second image 2050. 【0204】 Here, since the floating image 3 is projected as a real image in the air, when user 230 slightly moves their viewpoint, they can perceive the depth of the floating image 3 and the second image 2050 due to parallax. Therefore, user 230 can view the two images superimposed and obtain a stronger sense of floating in the floating image 3. 【0205】 An example of the video display processing in this embodiment will be explained using Figure 13D. Figure 13D(1) is a view of the floating-in-space video 3 from the user 230's line of sight, as seen in the example of video display in this embodiment shown in Figure 13C. Here, a bear character is displayed in the floating-in-space video 3. Areas other than the bear character in the floating-in-space video 3 are displayed in black and are transparent as a floating-in-space video. 【0206】 Figure 13D(2) is a view of the second image 2050 from the user 230's line of sight, as seen in the example of video display of this embodiment in Figure 13C. 
In this example, the second image 2050 is a background image depicting a plain, mountains, and the sun. 【0207】 Figure 13D(3) shows how, in the example of video display of this embodiment in Figure 13C, the second image 2050 and the floating spatial image 3 appear superimposed in the user 230's line of sight. Specifically, the bear character from the floating spatial image 3 appears superimposed in front of the background of plains, mountains, and the sun depicted in the second image 2050. 【0208】 Here, when displaying the floating image 3 and the second image 2050 simultaneously, it is desirable to pay attention to the balance of brightness between the two images in order to better ensure the visibility of the floating image 3. If the second image 2050 is too bright compared to the floating image 3, the displayed content of the floating image 3 will appear see-through, and the second image 2050 in the background will show through it strongly. 【0209】 Therefore, the output of the light source for the floating image 3 and the display image brightness of the display device 1, as well as the output of the light source of the display device that displays the second image 2050 and the display image brightness of that display device, should be set such that the brightness per unit area of the floating image 3 at the display position of the floating image 3 is greater than the brightness per unit area of the image light reaching the display position of the floating image 3 from the second image 2050. 
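The brightness condition above can be expressed as a small calculation: given the brightness per unit area of the floating image 3 at its display position and the brightness per unit area of the light arriving there from the second image 2050, find how far the second image must be dimmed for the floating image to stay brighter. The sketch below is illustrative only. The function name, the safety margin, and the sample values are assumptions, and the text does not specify any particular numbers.

```python
def second_image_gain(floating_luminance, second_luminance, margin=1.25):
    """Return a gain in (0, 1] for the second image.

    After applying the gain, the floating image is at least `margin` times
    brighter than the light arriving from the second image at the same position.
    """
    if floating_luminance <= 0 or second_luminance < 0:
        raise ValueError("luminance values must be positive (second may be 0)")
    if margin <= 1:
        raise ValueError("margin must exceed 1 so the inequality is strict")
    if second_luminance == 0:
        return 1.0
    return min(1.0, floating_luminance / (margin * second_luminance))

# Hypothetical values in arbitrary but consistent units.
gain_bright_background = second_image_gain(300.0, 480.0)  # dimming needed
gain_dim_background = second_image_gain(300.0, 100.0)     # no dimming needed
```

A gain of 1.0 means the second image can be shown at its normal brightness; a smaller value is the factor by which its light source output or display image brightness would be reduced when both images are shown together.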
【0210】 Furthermore, since this condition only needs to be met when the floating image 3 and the second image 2050 are displayed simultaneously, when switching from the first display mode, which displays only the second image 2050 without displaying the floating image 3, to the second display mode, which displays the floating image 3 and the second image 2050 simultaneously, control may be performed to reduce the brightness of the second image 2050 by lowering the output of the light source of the display device that displays the second image 2050 and / or the display image brightness of the display device. These controls can be achieved by the control unit 1110 in Figure 3 controlling the display device 1 and the display device that displays the second image 2050 (the transmissive self-emissive image display device 1650 in Figure 4K or Figure 4L, or the second display device 1680 in Figure 4M). 【0211】 Furthermore, when switching from the first display mode to the second display mode described above, if control is performed to reduce the brightness of the second image 2050, the brightness may be reduced uniformly across the entire screen of the second image 2050. Alternatively, instead of uniformly reducing the brightness across the entire screen of the second image 2050, the area where an object is displayed in the floating spatial image 3 may be set to the state with the highest brightness reduction effect, and the brightness reduction effect may be gradually reduced in the surrounding areas. In other words, if the brightness reduction of the second image 2050 is achieved only in the area where the floating spatial image 3 is superimposed on the second image 2050 and visible, the visibility of the floating spatial image 3 can be sufficiently ensured. 
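The non-uniform brightness reduction described above, strongest where the object of the floating image 3 overlaps the second image 2050 and gradually weaker in the surrounding area, can be sketched as a per-pixel gain map. The linear radial falloff, the pixel coordinates, and the parameter names are assumptions made for illustration. The text only requires that the reduction effect change gradually with position.

```python
import math

def dimming_gain(x, y, center, radius, max_reduction=0.6):
    """Gain for one pixel of the second image.

    The gain is 1 - max_reduction at `center`, rises linearly with distance,
    and is exactly 1.0 (no dimming) at or beyond `radius`.
    """
    distance = math.hypot(x - center[0], y - center[1])
    falloff = max(0.0, 1.0 - distance / radius)
    return 1.0 - max_reduction * falloff

def gain_map(width, height, center, radius, max_reduction=0.6):
    """Build a height x width grid of gains for the second image."""
    return [
        [dimming_gain(x, y, center, radius, max_reduction) for x in range(width)]
        for y in range(height)
    ]

# Hypothetical 9x9 second image with the object at its center.
gains = gain_map(9, 9, center=(4, 4), radius=4.0)
center_gain = gains[4][4]
edge_gain = gains[4][8]
```

Because the gain changes smoothly rather than following the outline of the object, a small shift in the overlap position does not expose a sharp dimmed edge in the second image.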
【0212】 Here, since the floating image 3 and the second image 2050 are displayed at different depths, if the user 230 slightly changes their viewpoint, the superimposed position of the floating image 3 on the second image 2050 changes due to parallax. Therefore, when switching from the first display mode to the second display mode described above, if the brightness is to be reduced unevenly across the entire screen of the second image 2050, it is not desirable to sharply reduce the brightness based on the outlines of the objects displayed in the floating image 3. Rather, it is desirable to perform a gradient processing of the brightness reduction effect, which changes the brightness reduction effect step by step depending on the position, as described above. 【0213】 In the case of the floating image display device 1000, where the position of the object displayed in the floating image 3 is approximately in the center of the floating image 3, the position where the brightness reduction effect of the gradient processing is highest should be the center of the floating image 3. 【0214】 According to the video display processing of this embodiment described above, the user 230 can more favorably view the floating video 3 and the second image 2050. 【0215】 Furthermore, when displaying the floating image 3, the display of the second image 2050 may be controlled to be disabled. Disabling the display of the second image 2050 improves the visibility of the floating image 3, making it suitable for floating image display devices 1000 and other applications where the user must reliably view the floating image 3 when it is displayed. 【0216】 <Example 2> As Embodiment 2 of the present invention, an example of another configuration of the floating image display device will be described. 
The floating image display device according to this embodiment is a modified version of the floating image display device described in Embodiment 1, with the optical system stored in the floating image display device being changed to the optical system shown in Figure 14(1) or Figure 14(2). In this embodiment, the differences from Embodiment 1 will be explained, and the same configuration as in Embodiment 1 will not be repeated. In the following description of this embodiment, the predetermined polarization and the other polarization are polarizations with a phase difference of 90° from each other. 【0217】 Figure 14(1) shows an example of the optical system and optical path according to this embodiment. The optical system shown in Figure 14(1) is a more compact configuration of the optical system in Figure 2C, with the display device 1 closer to the polarization separation member 101B. Detailed explanations of components in Figure 14(1) that have the same reference numerals as in Figure 2C will be omitted. 【0218】 In Figure 14(1), similar to Figure 2C, image light of a predetermined polarization (P-polarization in the figure) emitted from the display device 1 propagates perpendicularly from the image display surface of the display device 1. Here, similar to Figure 2C, the polarization separation member 101B selectively transmits the predetermined polarization (P-polarization in the figure) emitted from the display device 1 and reflects the other polarization (S-polarization in the figure). 【0219】 Therefore, the image light with a predetermined polarization (P polarization in the figure) that travels perpendicularly from the image display surface of the display device 1 passes through the polarization separation member 101B and reaches the retroreflector 2 to which the λ / 4 plate 21 is attached. 
The image light that is retroreflected by the retroreflector 2 and travels again toward the polarization separation member 101B has been converted from the predetermined polarization (P polarization in the figure) when emitted from the display device 1 to the other polarization (S polarization in the figure) by passing through the λ / 4 plate 21 twice. The image light that travels again toward the polarization separation member 101B is the other polarization (S polarization in the figure), and is therefore reflected by the polarization separation member 101B toward the position where the user should be. The direction of propagation of the image reflected by the polarization separation member 101B is determined based on the angle at which the polarization separation member 101B is positioned. 【0220】 In the example shown in Figure 14(1), the image light traveling toward the polarization separation member 101B is reflected at a right angle by the polarization separation member 101B and travels as shown in the figure. The image light reflected by the polarization separation member 101B forms a floating image 3A. The floating image 3A can be preferably viewed by the user from the direction of arrow A. 【0221】 Here, due to the retroreflective properties of the retroreflector 2, the optical path length from the image light emitted from the display device 1 to the retroreflector 2 is equal to the optical path length from the image light emitted from the retroreflector 2 to the position where the floating image 3A is formed. This relationship determines the position where the floating image 3A is formed in the direction of propagation of the image light reflected by the polarization separation member 101B. 【0222】 In the example in Figure 14(1), the display device 1, the polarization separation member 101B, and the retroreflector 2 are arranged closer together than in the example in Figure 2C. This makes the entire optical system more compact. 
However, the distance by which the floating image 3A protrudes from the optical system in Figure 14(1) is not very large. As one indicator of this protrusion distance, the figure shows the distance from the position where the light rays from the central part of the image light are reflected by the polarization separation member 101B to the position where the image light forms the floating image 3A (L1 in the example in Figure 14(1)). 【0223】 Furthermore, regarding the polarization design in the optical system shown in Figure 14(1), the characteristics of P-polarization and S-polarization may be swapped. Specifically, the predetermined polarization of the image light emitted from the display device 1 may be defined as S-polarization, and the reflection characteristics of the polarization separation member 101B may be swapped between P-polarization and S-polarization. In this case, although the P-polarization and S-polarization shown in the figure will be reversed, the optical design, such as the optical path, can be realized in exactly the same way. 【0224】 Next, Figure 14(2) shows another example of the optical system and optical path according to this embodiment. The optical system in Figure 14(2) is a modified version of the optical system in Figure 14(1) that achieves the same compactness while increasing the distance by which the floating image protrudes from the optical system. Detailed explanations of components in Figure 14(2) that are given the same reference numerals as in Figure 14(1) will be omitted. 【0225】 In Figure 14(2), similar to Figure 14(1), image light with a predetermined polarization (P-polarization in the figure) emitted from the display device 1 travels perpendicularly from the image display surface of the display device 1. Here, the polarization characteristics of the polarization separation member 101B are arranged 90 degrees differently from those in Figure 14(1).
The image light with the predetermined polarization (P-polarization in the figure) traveling perpendicularly from the image display surface of the display device 1 passes through the polarization separation member 101B. 【0226】 Here, unlike in Figure 14(1), what is positioned in the path of the image light after it passes through the polarization separation member 101B is not the retroreflector 2 to which the λ / 4 plate 21 is attached, but the specular reflector 4 to which the λ / 4 plate 21B is attached. The reflection at the specular reflector 4 is specular reflection (also called regular reflection), not retroreflection. 【0227】 Therefore, the image light that passes through the polarization separation member 101B is specularly reflected by the specular reflector 4 to which the λ / 4 plate 21B is attached. The image light that is specularly reflected by the specular reflector 4 and travels toward the polarization separation member 101B again has been converted from the predetermined polarization (P polarization in the figure) it had when emitted from the display device 1 to the other polarization (S polarization in the figure) by passing through the λ / 4 plate 21B twice. The image light that travels toward the polarization separation member 101B again has the other polarization (S polarization in the figure) and is therefore reflected by the polarization separation member 101B. 【0228】 Here, the orientation of the polarization separation member 101B in Figure 14(2) is different from that in Figure 14(1), so the image light reflected by the polarization separation member 101B travels in the direction opposite to where the user should be. The retroreflector 2, with a λ / 4 plate 21C attached, is located where the image light reflected by the polarization separation member 101B travels. The image light is retroreflected by the retroreflector 2.
The image light that is retroreflected by the retroreflector 2 and travels back toward the polarization separation member 101B is converted from the other polarization (S polarization in the figure) back to the predetermined polarization (P polarization in the figure) by passing through the λ / 4 plate 21C twice. 【0229】 The image light that travels again toward the polarization separation member 101B has the predetermined polarization (P polarization in the figure), so it passes through the polarization separation member 101B and continues toward the position where the user should be. The image light that has passed through the polarization separation member 101B forms the floating image 3B. The floating image 3B can be viewed favorably by the user from the direction of arrow A. 【0230】 Here, in Figure 14(2), as in Figure 14(1), due to the retroreflective properties of the retroreflector 2, the optical path length of the image light from the display device 1 to the retroreflector 2 is equal to the optical path length of the image light from the retroreflector 2 to the position where the floating image 3B is formed. This relationship determines the position where the floating image 3B is formed in the direction of propagation of the image light that has passed through the polarization separation member 101B. 【0231】 In Figure 14(2), the optical path length from the display device 1 to the retroreflector 2 is longer than the optical path length from the display device 1 to the retroreflector 2 in Figure 14(1). This is because, in the optical system of Figure 14(2), a round trip between the polarization separation member 101B and the specular reflector 4, which is not present in the optical system of Figure 14(1), is added to the optical path length from the display device 1 to the retroreflector 2.
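The path-length relationship above can be checked numerically along the central ray. The following is a minimal sketch under stated assumptions: the function name `protrusion_distance` and the segment lengths (30, 40, and 25, in arbitrary units) are hypothetical values chosen for illustration and do not come from the figures. It uses only the rule that the path from the display device 1 to the retroreflector 2 equals the path from the retroreflector 2 to the floating image.

```python
def protrusion_distance(forward_segments, return_segments_inside):
    """Distance from the polarization separation member to the floating image.

    forward_segments       : segment lengths from the display device to the
                             retroreflector along the central ray
    return_segments_inside : segment lengths from the retroreflector back to
                             the last interaction with the separation member
    The retroreflected light forms the image once its return path equals the
    forward path, so the remainder is the distance beyond the member.
    """
    remaining = sum(forward_segments) - sum(return_segments_inside)
    if remaining < 0:
        raise ValueError("image would form before the light exits the member")
    return remaining

# Hypothetical segment lengths (arbitrary units).
a = 30  # display device 1 -> polarization separation member 101B
b = 40  # polarization separation member 101B -> retroreflector 2
c = 25  # polarization separation member 101B -> specular reflector 4

# Figure 14(1): display -> member -> retroreflector, then back to the member.
L1 = protrusion_distance([a, b], [b])

# Figure 14(2): display -> member -> mirror -> member -> retroreflector,
# then back to the member.
L2 = protrusion_distance([a, c, c, b], [b])
```

Under these assumptions L1 equals a and L2 equals a + 2c, so the round trip to the specular reflector 4 is added directly to the protrusion distance.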
【0232】 As a result, in the optical system of Figure 14(2), the distance from the position where the central portion of the light beam passes through the polarization separation member 101B to the position where the light beam forms the floating image 3B (L2 in the example of Figure 14(2)) is significantly longer than the distance from the position where the central portion of the light beam is reflected by the polarization separation member 101B to the position where the light beam forms the floating image 3A (L1 in the example of Figure 14(1)) in the optical system of Figure 14(1). 【0233】 Furthermore, the polarization design in the optical system shown in Figure 14(2) may also involve swapping the characteristics of P-polarization and S-polarization. Specifically, a predetermined polarization of the image light emitted from the display device 1 may be defined as S-polarization, and the reflection characteristics of the polarization separation member 101B may be swapped between P-polarization and S-polarization. In this case, although the P-polarization and S-polarization shown in the figure will be reversed, the optical design, such as the optical path, can be realized in exactly the same way. 【0234】 As described above, the optical systems in Embodiment 2 of the present invention, as shown in Figures 14(1) and 14(2), allow for the realization of a more compact optical system. In particular, the optical system in Figure 14(2) allows for a larger amount of spatially levitating images to project from the optical system, while maintaining a more compact design. 【0235】 Furthermore, when incorporating the optical system shown in Figure 14(1) or Figure 14(2) into a floating image display device, this can be achieved by replacing the optical system in the floating image display device described in Example 1 with the optical system shown in Figure 14(1) or Figure 14(2). 
Specifically, the optical system of the floating image display device shown in Figures 4E, 4F, 4G, 4H, 4I, 4J, 4K, 4L, or 4M may be replaced with the optical system in Figure 14(1). In this case, the optical system becomes more compact, making it possible to further reduce the size of the housing of the floating image display device shown in each figure. 【0236】 Furthermore, the optical system of the floating image display device in Figures 4E, 4F, 4G, 4K, or 4L may be replaced with the optical system in Figure 14(2). In this case, it becomes possible to increase the distance by which the floating image protrudes from the optical system. Also, since the optical system becomes more compact, it becomes possible to make the housing of the floating image display device in each figure smaller. 【0237】 <Example 3> As Embodiment 3 of the present invention, an example of a display in the floating spatial image display device 1000 described in the figures of Embodiment 1 or Embodiment 2 will be described. For Embodiment 3 of the present invention, the floating spatial image display device 1000 shown in the figures of either Embodiment 1 or Embodiment 2 may be used. In this embodiment, the differences from Embodiment 1 or Embodiment 2 will be explained, and explanations of configurations similar to those in these embodiments will not be repeated. 【0238】 Figure 15A shows a display example in the floating-space image display device 1000 of Embodiment 3, in which a rendered image of a 3D model of character 1532 is displayed in floating-space image 3. The 3D model of character 1532 displayed in floating-space image 3 is rendered as an image captured from a predetermined viewpoint (virtual 3D space camera) in a virtual 3D space. The rendered image is displayed on the display device 1, passes through the optical system of the floating-space image display device 1000, and is then displayed in floating-space image 3. 【0239】 Regarding the description of FIG.
15A, the black areas indicate the black display state on the liquid crystal display panel 11 of the display device 1. In the spatial floating image 3, areas displayed in black are transparent in space: they have no luminance or color, and the user sees only empty air there. That is, in the example of FIG. 15A, only the character 1532 displayed in the spatial floating image 3 is visually recognized by the user as floating in the air. 【0240】 A first processing example for realizing, in the spatial floating image display device 1000, the display of a rendered image of a 3D model such as the character 1532 shown in FIG. 15A will be described using the configuration of the spatial floating image display device 1000 in FIG. 3. Specifically, a video generation program capable of generating a rendered image of a 3D model such as a character is first stored in the storage unit 1170. The control unit 1110 reads out the video generation program from the storage unit 1170 and expands it in the memory 1109. The control unit 1110 executes the video generation program expanded in the memory 1109, and the video generation program renders a 3D model such as a character to generate a video to be displayed on the display device 1. 【0241】 Here, the video control unit 1160 may perform control to display the generated video of the character 1532 on the display device 1. 【0242】 Next, a second processing example for realizing, in the spatial floating image display device 1000, the display of a rendered image of a 3D model such as the character 1532 shown in FIG. 15A will be described using the configuration of the spatial floating image display device 1000 in FIG. 3. 【0243】 In the second processing example, a pre-rendered image is stored in the storage unit 1170 in advance. In this example, the video control unit 1160 performs control to reproduce the rendered image stored in the storage unit 1170 and display it on the display device 1.
The rendering image stored in the storage unit 1170 in advance may be a rendering image by the video generation program expanded in the memory 1109 as in the first processing example. 【0244】 Alternatively, the rendered video stored in the storage unit 1170 may have been pre-rendered by an external device, acquired by the spatial floating video display device 1000 via the communication unit 1132, and stored in the storage unit 1170. 【0245】 Furthermore, a third processing example for realizing the display of a rendered image of a 3D model such as a character 1532 in the floating spatial image display device 1000 as shown in Figure 15A will be explained using the configuration of the floating spatial image display device 1000 shown in Figure 3. In the third processing example, the floating spatial image display device 1000 acquires a rendered image that has been pre-rendered by an external device via the communication unit 1132. In the third processing example, the rendered image acquired via the communication unit 1132 can be displayed on the display device 1 as a rendered image of the generated character 1532 under the control of the image control unit 1160, without going through storage in the storage unit 1170. 【0246】 Next, Figure 15B shows an example of a display example in the floating-space image display device 1000 of Embodiment 3, where a rendered image of a virtual 3D space containing 3D models of character 1531 and character 1532 is displayed in floating-space image 3. The 3D models of character 1531 and character 1532 displayed in floating-space image 3 are rendered as images captured from a predetermined viewpoint in the virtual 3D space. The rendered image is displayed on the display device 1, passes through the optical system of the floating-space image display device 1000, and is displayed in floating-space image 3. 
For the purposes of explaining Figure 15B, the black display is a black display state on the liquid crystal display panel 11 of the display device 1. In floating-space image 3, the black display is transparent in space, so there is no brightness or color and it is in the air. That is, in the example of Figure 15B, only character 1531 and character 1532 displayed in floating-space image 3 are perceived by the user as floating in the air. 【0247】 Here, Figure 15B shows a rendering using parallel projection in a virtual 3D space containing 3D models of character 1531 and character 1532. Therefore, the height settings of the 3D models of character 1531 and character 1532 can be accurately recognized based on the rendering image in Figure 15B. Character 1531 is taller than character 1532, and character 1532 is shorter than character 1531. 【0248】 Next, Figure 15C shows a rendering image displayed on the display device 1, rendered using perspective projection in a virtual 3D space containing 3D models of character 1531 and character 1532. Here, Figure 15C is an example where the field of view of the virtual 3D space camera, which serves as the rendering viewpoint, is set to wide angle. Due to the wide-angle distortion of the virtual 3D space camera, character 1532, who is standing near the edge of the field of view, appears more distorted than character 1531, who is standing near the center of the field of view. As a result, character 1532, who is actually shorter than character 1531, appears taller than character 1531. 【0249】 For example, if the same image as in Figure 15C is displayed on a fixed-pixel display device with a general rectangular display area instead of a floating-space image, the user can see that the area corresponding to the black area of the floating-space image 3 is displayed in black on the fixed pixels. 
This allows the user to recognize the rectangular display area of the fixed-pixel display device that corresponds to the rectangle of the floating-space image 3. If the user can recognize the rectangular display area of the fixed-pixel display device, the user can infer the effect of wide-angle distortion from that rectangle and the positions of characters 1532 and 1531 relative to it, and can therefore recognize that character 1532 is significantly distorted. 【0250】 In contrast, when the image in Figure 15C is displayed as a floating image, the black area of the floating image 3 is transparent to the user and therefore cannot be seen. Thus, the user cannot perceive the rectangular outer perimeter of the display area of the floating image 3. In the example of Figure 15C, only characters 1531 and 1532 displayed in the floating image 3 appear to the user to be floating in the air. The user perceives that character 1532 is displayed larger than character 1531. Because the user cannot perceive the rectangular outer perimeter of the display area of the floating image 3, it is not easy for the user to infer that character 1532 is displayed larger because of wide-angle distortion. 【0251】 For illustrative purposes, Figure 15D shows an example in which the black area within the rectangular display area of the floating spatial image 3 is removed from the image in Figure 15C. 【0252】 As can be seen from the example in Figure 15D, if the display of the black area within the rectangular display area of the floating space image 3 is removed from the image in Figure 15C, it is not easy for the user to recognize the center of the rectangular display area of the floating space image 3.
Consequently, it is not easy for the user to understand that character 1531 is standing in the center of the rectangular display area of the floating space image 3, and character 1532 is standing at the right edge of the rectangular display area of the floating space image 3. Therefore, it is not easy for the user to recognize whether or not character 1532 is distorted due to wide-angle distortion. In such a display state, it is quite possible that the user may mistakenly perceive character 1532 as a larger character than character 1531. In the following explanation, an example of more suitable display control will be described. 【0253】 First, we will explain an example of how to generate a rendered image by capturing a 3D model of a character in a virtual 3D space with a virtual camera, using Figure 16A. When generating a rendered image that includes objects in a virtual 3D space, the field of view of the virtual 3D camera determines the range of that image. The field of view of the virtual 3D camera can also be set by mimicking an optical camera in real space, using the size of the virtual image sensor and the focal length of the virtual lens. 【0254】 Figure 16A shows an example of the field of view of virtual 3D space cameras 1611 and 1612, each equipped with a predetermined imaging sensor, in a virtual 3D space. In this example, the size of the imaging sensors for virtual 3D space cameras 1611 and 1612 is assumed to be the same. Both virtual 3D space cameras capture images within the range including the character position 1620 where the character is placed, such that the screen width of the captured image range is 1630. 【0255】 Here, the focal length of the virtual 3D camera 1611 is shorter than the focal length of the virtual 3D camera 1612. That is, even though the screen width 1630 of the image capture range is the same, the field of view of the virtual 3D camera 1611 is wider than that of the virtual 3D camera 1612. 
Note that the relationship between the specific value of the focal length setting of the virtual 3D camera and the field of view is also affected by the size of the image sensor. Therefore, it is possible to simplify the explanation by converting the image sensor size to the focal length equivalent to that of 35mm film. Accordingly, in the following explanation, the focal length equivalent to 35mm film will be used. 【0256】 Here, since the field of view of the virtual 3D space camera 1611 is wider than that of the virtual 3D space camera 1612, the captured character may be distorted by wide-angle distortion at the field of view of the virtual 3D space camera 1611 with focal length A. In contrast, since the field of view of the virtual 3D space camera 1612 is more telephoto than that of the virtual 3D space camera 1611, the wide-angle distortion of the captured character will be relatively reduced. Also, the captured range will change depending on the field of view for backgrounds that are farther from the virtual 3D space camera than the character position 1620 in the virtual 3D space. However, in black background spaces, it is not easy for the user to recognize the difference in the range of the cropped background based on the rendered image. 【0257】 As explained above, it is possible to generate rendered images by capturing a 3D model of a character in a virtual 3D space with a virtual camera. However, due to the field of view of the virtual 3D space camera, the character in the rendered image will be affected by wide-angle distortion. 【0258】 Next, using Figure 16B, we will explain the occurrence of discomfort due to wide-angle distortion when a user views the display screen of a display device such as the floating image display device 1000, and how to resolve it. Figure 16B shows an example in which a user views the optical image of the floating image, which is the display screen of the floating image display device. 
The floating image, which is the display screen, is rectangular in shape. Let Pa be the diagonal length of the rectangle. The floating image can display a rendering image as an image, such as a 3D model of a character in a virtual 3D space, which is captured by a virtual camera and rendered using perspective projection, as shown in Figure 15A. The viewing distance Lm is defined as the distance from the user's eye position 1660 in real space to view the display screen of the floating image display device 1000. 【0259】 In Figure 16B, the screen size Im in the virtual space, when the rendered image is generated, is superimposed on the display screen of a display device such as the floating image display device 1000 in the real space. This allows for a comparison between the field of view of the virtual 3D camera in the virtual 3D space and the field of view when the user perceives the display screen in the real space. Figure 16B shows three virtual 3D cameras with different fields of view: virtual 3D camera 1651, virtual 3D camera 1652, and virtual 3D camera 1653. 【0260】 In the comparison of field of view shown in Figure 16B, a virtual 3D camera 1651, which has a wider field of view than the field of view from the user's eye position 1660 in real space to view the display screen of a display device such as the floating spatial image display device 1000, captures an image rendered using perspective projection and displays it on the floating spatial image display device 1000. When the user views the image from their eye position 1660, a wide-angle distortion occurs that would not normally be visible from the user's eye position 1660, causing the user to feel a sense of unease with the image. 
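The comparison of angles of view in Figure 16B can be sketched numerically. This is an illustrative sketch only: the function names are introduced here, a simple pinhole model on the screen diagonal is assumed, and 43.3 mm is used as the diagonal of a 35 mm film frame. The example screen diagonal (254 mm) and viewing distance (365 mm) are sample values, not fixed parameters of the device.

```python
import math

def viewing_angle_deg(screen_diagonal_mm, viewing_distance_mm):
    """Diagonal angle subtended by the display screen (Pa) seen from Lm."""
    return math.degrees(2 * math.atan(screen_diagonal_mm / (2 * viewing_distance_mm)))

def camera_angle_deg(focal_length_35mm, film_diagonal_mm=43.3):
    """Diagonal angle of view of a virtual camera, 35 mm film equivalent."""
    return math.degrees(2 * math.atan(film_diagonal_mm / (2 * focal_length_35mm)))

def is_wider_than_viewer(focal_length_35mm, screen_diagonal_mm, viewing_distance_mm):
    """True if the camera's angle of view exceeds the user's viewing angle,
    which is the case where wide-angle distortion may look unnatural."""
    return camera_angle_deg(focal_length_35mm) > viewing_angle_deg(
        screen_diagonal_mm, viewing_distance_mm
    )
```

For a 254 mm screen viewed from 365 mm, `is_wider_than_viewer(24, 254, 365)` is true (a case like virtual 3D camera 1651), while `is_wider_than_viewer(100, 254, 365)` is false (a case like virtual 3D camera 1653).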
【0261】 To address this, the image is instead captured with either the virtual 3D camera 1652, whose angle of view is almost the same as the angle of view when viewing the display screen of a display device such as the floating image display device 1000 from the user's eye position 1660 in real space, or the virtual 3D camera 1653, whose angle of view is more telephoto than that viewing angle. The captured image is then rendered using perspective projection and displayed on the floating image display device 1000. 【0262】 In this case, when the user views the image from eye position 1660, the wide-angle distortion seen in the image captured by the virtual 3D space camera 1651 and rendered using perspective projection does not occur, so the user does not feel the aforementioned discomfort. Note that as the angle of view is made more telephoto, as with the virtual 3D space camera 1653, rendering using perspective projection approaches parallel projection, but the human eye finds parallel projection less uncomfortable than wide-angle distortion in perspective projection. 【0263】 Therefore, in a situation like that shown in FIG. 16B, when an object such as a 3D model of a character in a virtual 3D space is captured with a virtual 3D space camera, rendered using perspective projection, and displayed, and a user in real space views the display screen of a display device such as the spatial floating video display device 1000 from the user's eye position 1660, it is desirable that the angle of view of the virtual 3D space camera used for rendering be equal to, or more telephoto than, the angle of view when viewing that display screen from the user's eye position 1660 in real space.
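The statement that telephoto perspective projection approaches parallel projection can be illustrated with a simple pinhole model. The sketch below is an assumption-laden illustration, not the rendering method of the embodiment: the function names, the 36 mm sensor width, and the sample widths and depth offsets are introduced here. As in Figure 16A, the camera is assumed to be placed so that the same width is framed at the character position regardless of focal length.

```python
def camera_distance_mm(focal_length_mm, framed_width_mm, sensor_width_mm=36.0):
    """Distance at which a pinhole camera frames framed_width_mm at the
    character position (similar triangles: distance / width = f / sensor)."""
    return focal_length_mm * framed_width_mm / sensor_width_mm

def relative_magnification(focal_length_mm, depth_offset_mm, framed_width_mm,
                           sensor_width_mm=36.0):
    """How much larger a part of the model appears when it is depth_offset_mm
    closer to the camera than the character position. Under parallel
    projection this ratio is exactly 1."""
    d = camera_distance_mm(focal_length_mm, framed_width_mm, sensor_width_mm)
    if depth_offset_mm >= d:
        raise ValueError("point is at or behind the camera")
    return d / (d - depth_offset_mm)
```

With a framed width of 500 mm and a part of the model 100 mm closer to the camera, the ratio is well above 1 at a 24 mm focal length and falls toward 1 as the focal length increases, which is the sense in which the telephoto rendering approaches parallel projection.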
【0264】 That is, if the focal length of the virtual 3D space camera is Lf in terms of 35 mm film conversion, and the focal length of the virtual 3D space camera that gives the same angle of view as the angle of view when viewing the display screen of a display device such as the spatial floating video display device 1000 from the position 1660 of the user's eyes in the real space is Lf0 in terms of 35 mm film conversion, then the video may be rendered using a virtual 3D space camera such that the 35 mm film conversion focal length Lf ≥ Lf0 and displayed on the display screen of a display device such as the spatial floating video display device 1000. 【0265】 Lf0, which is the 35 mm film conversion of the focal length of the virtual 3D space camera that gives the same angle of view as the angle of view when viewing the display screen of a display device such as the spatial floating video display device 1000 from the position 1660 of the user's eyes in the real space, is obtained as follows. Since the angle of view when viewing the display screen of a display device such as the spatial floating video display device 1000 from the position 1660 of the user's eyes in the real space is equal to the angle of view of the virtual 3D space camera with a focal length of Lf0 in terms of 35 mm film conversion, the ratio of the viewing distance Lm to the diagonal length Pa of the rectangle of the spatial floating image, which is the display screen, is equal to the ratio of the 35 mm film conversion Lf0 of the focal length of the virtual 3D space camera to the diagonal length of the 35 mm film. 【0266】 In other words, if the diagonal length of a 35mm film is Fi, then Lm / Pa = Lf0 / Fi holds true. Rearranging this equation for Lf0, we get Lf0 = Lm × Fi / Pa. 
To make the angle of view of the virtual 3D space camera used for rendering equal to, or more telephoto than, the angle of view when viewing the display screen of a display device such as the floating image display device 1000 from the user's eye position 1660 in real space, the focal length must satisfy Lf ≥ Lf0. Therefore, it is desirable to capture objects such as 3D models of characters in the virtual 3D space with a virtual 3D space camera whose 35mm film equivalent focal length satisfies Lf ≥ Lm × Fi / Pa, render them using perspective projection, and display them on the display screen of a display device such as the floating image display device 1000. 【0267】 By performing the display processing described above, it becomes possible to display rendered images of 3D character models and other objects that are less likely to cause discomfort to the user when viewed. 【0268】 The viewing distance Lm can be predetermined in the spatial floating image display device 1000 as a technically suitable distance. The value of the viewing distance Lm may be recorded in the non-volatile memory 1108 or storage unit 1170 shown in Figure 3. The user manual may be displayed on the display screen, such as the spatial floating image 3, to present the viewing distance Lm to the user and encourage the user to use the device at that distance. Alternatively, the distance from the display screen, such as the spatial floating image 3, to the user may be measured using the imaging unit 1180 shown in Figure 3, and the measured distance may be used as the viewing distance Lm. This distance measurement process can be performed by the control unit 1110 shown in Figure 3 using the image captured by the imaging unit 1180. 【0269】 Next, using Figure 16C, an example of the viewing distance will be explained for a display device having an operation detection unit capable of detecting operations such as user finger touches on the display screen.
In a display device capable of detecting a touch by the finger of user 230 on the display screen (space-floating image 3 in Figures 4A to 4O), such as the space-floating image display device 1000 shown in Figures 4A to 4O that has an air operation detection sensor 1351 and an air operation detection unit 1350, the viewing distance Lm explained in Figure 16B is determined, in the case of a touch sensor model, by the length of the arm including the fingertip. For example, the URL given below as Reference information 1 publishes the results of the "Human Characteristics Infrastructure Development Project (size-JPN)", which was implemented by the Ministry of Economy, Trade and Industry from FY2004 to FY2006 and collected data from approximately 7,000 people. 【0270】 [Reference information 1] https://warp.ndl.go.jp/info:ndljp/pid/286890/www.meti.go.jp/press/20071001007/20071001007.html 【0271】 Figure 16D shows the arm length, including finger length, calculated from these survey results and broken down by gender and age group. 【0272】 According to the survey results, when classified by gender and age, the group with the longest arm length was men aged 20-24, with an average arm length of 571 mm, and the group with the shortest arm length was women aged 75-79, with an average arm length of 456 mm. 【0273】 However, when operating a screen by touch, users almost never fully extend the arm. In practice, users operate the device with the arm slightly bent, at a distance somewhat shorter than the full arm length. When the ratio of this distance to arm length was measured for multiple individuals, the average was approximately 80%. From this data, it is estimated that the distance Lm at which men aged 20-24, who have the longest arm lengths among the gender and age-classified groups, view the image on the touch sensor surface is approximately 457 mm (571 mm × 0.8). 
Similarly, it is estimated that the distance Lm at which women aged 75-79, who have the shortest arm lengths among the gender and age-classified groups, visually view the image on the touch sensor surface is approximately 365 mm. 【0274】 Figure 16E is a table showing the specific values of the diagonal length Pa of the rectangle representing the floating image on the display screen, as explained in Figure 16B, and Lf0, which is the 35mm film equivalent of the focal length of a virtual 3D camera that has the same field of view as when viewing the display screen of a display device such as the floating image display device 1000 from the user's eye position 1660 in real space, assuming a user's visual viewing distance Lm = 365 mm, as calculated in the calculation results of Figure 16D. Figure 16E shows examples of diagonal lengths Pa of the floating image on the display screen being 10 inches, 5 inches, and 3 inches. 【0275】 Here, we perform the calculation using Lf0 = Lm × Fi / Pa as explained in Figure 16B. When the diagonal length Pa of the rectangle of the floating image, which is the display screen, is 10 inches, 5 inches, and 3 inches, the focal length Lf0 is calculated to be 62 mm, 124 mm, and 207 mm in each example. In the floating image display device 1000, when adopting each display screen size, when generating the image to be displayed on the display screen, it is sufficient to capture objects such as 3D models of characters in the virtual 3D space with a virtual 3D space camera with a focal length Lf equal to or greater than these respective focal lengths Lf0 in 35mm film equivalent, and render them using perspective projection. In this way, it becomes possible to display rendered images of 3D models of characters, etc., that are less likely to cause discomfort to the user when viewed. 【0276】 Here, substituting the visual distance Lm = 365 mm and the diagonal length of 35 mm film Fi = 43.3 mm into the formula Lf0 = Lm × Fi / Pa, we derive Lf0 = 15805 / Pa. 
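The arithmetic behind the viewing distances and the Lf0 values for Lm = 365 mm can be reproduced with a short script. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, the 80% ratio is the approximate average quoted above, and the diagonal length Pa is converted from inches to millimetres (1 inch = 25.4 mm) so that it has the same unit as Lm and Fi.

```python
MM_PER_INCH = 25.4   # unit conversion (assumption: Pa must be in mm in the formula)
FI_MM = 43.3         # diagonal length of 35 mm film used in the text
REACH_RATIO = 0.8    # approximate ratio of operating distance to arm length


def viewing_distance_mm(arm_length_mm: float) -> float:
    """Estimated viewing distance Lm for a touch operation."""
    return arm_length_mm * REACH_RATIO


def lf0_mm(lm_mm: float, pa_inch: float, fi_mm: float = FI_MM) -> float:
    """Lf0 = Lm x Fi / Pa, with Pa given in inches and converted to mm."""
    pa_mm = pa_inch * MM_PER_INCH
    return lm_mm * fi_mm / pa_mm


# Shortest and longest average arm lengths quoted from the survey (mm).
lm_short = round(viewing_distance_mm(456))
lm_long = round(viewing_distance_mm(571))

# Lf0 for the three example screen sizes at the shorter viewing distance.
lf0_table = {pa: round(lf0_mm(lm_short, pa)) for pa in (10, 5, 3)}

print(lm_short, lm_long)
print(lf0_table)
```

With Fi = 43.3 mm this reproduces the 62 mm, 124 mm, and 207 mm values given for the 10 inch, 5 inch, and 3 inch screens.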
【0277】 Therefore, by determining the focal length Lf(mm) of the virtual 3D space camera in video rendering such that Lf≧Lf0, it can be said that by using a focal length of the virtual 3D space camera such that Lf≧15805 / Pa on any display screen, the effect of displaying rendered images of 3D models of characters, etc., that are less likely to cause discomfort to the user, can be obtained, at least for a specific group of users. 【0278】 Next, Figure 16F is a table showing the specific values of the diagonal length Pa of the rectangle representing the floating image on the display screen, as explained in Figure 16B, and Lf0, which is the 35mm film equivalent of the focal length of a virtual 3D camera that has the same field of view as when viewing the display screen of a display device such as the floating image display device 1000 from the user's eye position 1660 in real space, assuming a user's visual viewing distance Lm = 457 mm based on the calculation results in Figure 16D. Figure 16F shows examples where the diagonal length Pa of the floating image on the display screen is 10 inches, 5 inches, and 3 inches. 【0279】 Here, we perform the calculation using Lf0 = Lm × Fi / Pa as explained in Figure 16B. When the diagonal length Pa of the rectangle of the floating image, which is the display screen, is 10 inches, 5 inches, and 3 inches, the focal length Lf0 is calculated to be 78 mm, 156 mm, and 259 mm in each example. In the floating image display device 1000, when adopting each display screen size, when generating the image to be displayed on the display screen, it is sufficient to capture objects such as 3D models of characters in the virtual 3D space with a virtual 3D space camera with a focal length Lf equal to or greater than these respective focal lengths Lf0 in 35mm film equivalent, and render them using perspective projection. 
In this way, it becomes possible to display rendered images of 3D models of characters, etc., that are less likely to cause discomfort to the user when viewed. 【0280】 Here, substituting the visual distance Lm = 457 mm and the diagonal length of 35 mm film Fi = 43.3 mm into the formula Lf0 = Lm × Fi / Pa, we derive Lf0 = 19788 / Pa. 【0281】 Therefore, by determining the focal length Lf(mm) of the virtual 3D space camera in video rendering such that Lf≧Lf0, it can be said that by using a focal length of the virtual 3D space camera where Lf≧19788 / Pa on any display screen, the effect of displaying rendered images of 3D models of characters and the like that are less likely to cause discomfort to the user can be obtained for almost all user groups. 【0282】 The effects of the rendering method for displaying rendering images of 3D models of characters, etc., according to this embodiment, as described above using Figures 16B to 16F, will now be explained using Figure 17A. Figure 17A shows the rendering image, rendered using perspective projection, displayed on the display device 1 in a virtual 3D space containing 3D models of character 1531 and character 1532. 【0283】 Here, Figure 17A shows an example of the image a user would see when rendering and displaying are performed with the focal length of the virtual 3D space camera, which serves as the rendering viewpoint, set to Lf ≥ Lm × Fi / Pa. Thus, even with perspective projection, if the field of view of the virtual 3D space camera, which serves as the rendering viewpoint, is equivalent to or more telephoto than the field of view that does not cause discomfort to the user, the user will not perceive any uncomfortable wide-angle distortion in the first place. 【0284】 Furthermore, in the example shown in Figure 17A, the phenomenon where character 1532, who is actually supposed to be shorter than character 1531, appears taller than character 1531 does not occur. 
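The condition Lf ≥ Lf0 can also be checked as a condition on the angle of view. The sketch below is illustrative only and assumes a simple pinhole camera model, in which a 35 mm film equivalent focal length Lf gives a diagonal angle of view of 2·atan(Fi / (2·Lf)), and a screen of diagonal Pa viewed head-on from distance Lm subtends 2·atan(Pa / (2·Lm)). The function names, including choose_focal_length, are hypothetical and do not appear in this description.

```python
import math

FI_MM = 43.3  # diagonal length of 35 mm film used in the text


def lf0_mm(lm_mm: float, pa_mm: float, fi_mm: float = FI_MM) -> float:
    """Lf0 = Lm x Fi / Pa, all lengths in millimetres."""
    return lm_mm * fi_mm / pa_mm


def camera_angle_deg(focal_mm: float, fi_mm: float = FI_MM) -> float:
    """Diagonal angle of view of a pinhole camera (assumed model)."""
    return math.degrees(2 * math.atan(fi_mm / (2 * focal_mm)))


def screen_angle_deg(lm_mm: float, pa_mm: float) -> float:
    """Diagonal angle subtended by the screen at the user's eye."""
    return math.degrees(2 * math.atan(pa_mm / (2 * lm_mm)))


def choose_focal_length(requested_mm: float, lm_mm: float, pa_mm: float) -> float:
    """Raise the requested focal length to Lf0 if it is too wide-angle."""
    return max(requested_mm, lf0_mm(lm_mm, pa_mm))


lm, pa = 457.0, 254.0  # Lm = 457 mm, Pa = 10 inches expressed in mm
lf0 = lf0_mm(lm, pa)

# At Lf = Lf0 the camera sees exactly the angle the screen subtends.
same_angle = math.isclose(camera_angle_deg(lf0), screen_angle_deg(lm, pa))

# Any Lf above Lf0 is more telephoto, i.e. a narrower angle of view.
narrower = camera_angle_deg(100.0) < camera_angle_deg(lf0)

print(round(lf0), same_angle, narrower)
print(choose_focal_length(50.0, lm, pa) == lf0)
```

A renderer could apply a clamp like choose_focal_length before setting up its perspective projection, so that a wide-angle setting is never used for a small display screen.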
【0285】 Furthermore, when the image in Figure 17A is displayed as a floating image, the black area of the floating image 3 is transparent to the user and therefore cannot be seen, and the user cannot recognize the rectangular shape of the outer perimeter of the display area of the floating image 3. As a result, in the example of Figure 17A, only characters 1531 and 1532 displayed in the floating image 3 appear to the user as if they are floating in the air. However, unlike the image in Figure 15C, the image in Figure 17A does not have the issue of character 1532 being displayed larger than character 1531, so the problem in user perception that arises with the image in Figure 15C does not occur. 【0286】 For illustrative purposes, Figure 17B shows an example where the black area within the rectangular display area of the floating spatial image 3 is removed from the image in Figure 17A. 【0287】 As can be seen from the example in Figure 17B, once the black area within the rectangular display area of the floating spatial image 3 is removed, it is not easy for the user to recognize the center of the rectangle of the floating spatial image 3. As a result, it is not easy for the user to understand that character 1531 is standing in the center of the rectangular display area of the floating spatial image 3 and that character 1532 is standing at the right edge of that area. However, in the image in Figure 17B, rendering and display are performed with the focal length of the virtual 3D space camera, which is the rendering viewpoint, set to Lf ≥ Lm × Fi / Pa, so characters 1531 and 1532 are displayed more favorably, without wide-angle distortion that would look unnatural to the user. 【0288】 The user can therefore perceive characters 1531 and 1532 more favorably even without recognizing the rectangular shape of the display area of the floating image 3. 
This display processing allows the user to recognize the characters displayed in the floating image 3 with less discomfort, improving the sense of realism of the characters displayed in the air and enabling a more optimal display. 【0289】 Figure 18 illustrates a predetermined area within the rectangular display area of the floating spatial image 3. This predetermined area is the area that, when displayed in black, makes it difficult for the user to recognize the rectangular display area of the floating spatial image 3. Figures 15A to 15D, and Figures 17A and 17B, illustrate this using an example of a video where all areas other than the character are displayed in black. However, even when only a portion of the areas other than the character are displayed, it may still be difficult for the user to recognize the rectangular display area of the floating spatial image 3. 【0290】 Figure 18 shows an example of this region. In Figure 18, character 1532 is displayed in the floating image 3. The white area 1800 behind character 1532 is assumed to display a background image with brightness. In contrast, the top-left vertex 1801 and the top-right vertex 1802 of the rectangle of the display area of floating image 3 are displayed in black. 【0291】 Furthermore, the area 1803, which includes the upper edge of the rectangle of the display area of the floating spatial image 3, is also displayed in black. In this way, when the vertices of the rectangle of the display area of the floating spatial image 3 are displayed in black and appear transparent to the user, making them invisible, or when the area containing one edge of the rectangle of the display area of the floating spatial image 3 is displayed in black and appears transparent to the user, making it invisible, the elements that make up the rectangle of the display area of the floating spatial image 3 become transparent to the user and invisible. 
At this time, it becomes difficult for the user to recognize the rectangle of the display area of the floating spatial image 3. 【0292】 As described above, if an area that makes it difficult for the user to recognize the rectangular display area of the floating spatial image 3 is displayed in black, then even if the image is not entirely black except for the character, the more favorable display processing effect according to this embodiment, as explained using Figures 15A to 17B, can be obtained. 【0293】 <Example 4> Next, as Embodiment 4 of the present invention, we will describe an example in which the spatial floating image display device 1000 described in Figures 1 to 3 is connected to the internet, and a new operation is performed by connecting to a server equipped with a large-scale language model artificial intelligence via the internet. For Embodiment 4 of the present invention, any of the spatial floating image display devices 1000 shown in Figures 1 to 3 may be used. In this embodiment, we will explain the differences from Embodiments 1 to 3, and will omit repeated explanations of configurations similar to those in these embodiments. 【0294】 Using Figure 19A, an example of the connection state between the floating spatial image display device 1000 and the large-scale language model server 19001 of Embodiment 4 of the present invention will be explained. The floating spatial image display device 1000 according to Embodiment 4 may also be called a character conversation device. Furthermore, the system including the floating spatial image display device 1000 and the large-scale language model server 19001 according to Embodiment 4 may also be called a character conversation system. The floating spatial image 3 displayed by the floating spatial image display device 1000 shows an image of character 19051. The image of character 19051 is an image generated by rendering a 3D model of the character in virtual space. 
【0295】 In the example shown in Figure 19A, the audio output unit 1140 of the floating image display device 1000 is composed of a speaker. The floating image display device 1000 is also equipped with a microphone 1139 that can pick up the user's voice. The communication unit 19010 is an example of the communication unit 1132 in Figure 3. The floating image display device 1000 can communicate with a communication device 19011 connected to the internet 19000 via the communication unit 19010. In the example shown in Figure 19A, the communication between the communication unit 19010 and the communication device 19011 is shown as wireless, but wired communication is also acceptable. The communication path from the communication unit 19010 to the internet 19000 may have both wired and wireless sections. The floating image display device 1000 can communicate with the large-scale language model server 19001 via the communication device 19011 and the internet 19000. Furthermore, the floating spatial image display device 1000 can communicate with a second server 19002, which is different from the large-scale language model server 19001, via the communication device 19011 and the internet 19000. The configuration including the floating spatial image display device 1000 and the large-scale language model server 19001 may be considered as a single system. 【0296】 Next, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 4 of the present invention will be described using Figure 19B. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 19001. Note that in Figure 19B, the communication paths such as the Internet 19000 shown in Figure 19A have been omitted. In Figure 19B, the user 230 of the spatial floating image display device 1000 is also shown. 
【0297】 First, the large-scale language model server 19001 is a server equipped with a large-scale language model artificial intelligence. A large-scale language model is also called an LLM (Large Language Model). Specifically, various models such as GPT-1, GPT-2, GPT-3, InstructGPT, and ChatGPT have been made publicly available, and these technologies can be used in this embodiment as well. These large-scale language models are artificial intelligence models generated through extensive pre-training on natural language contained in a large number of documents and texts that exist in the human world. The number of parameters of such models exceeds hundreds of millions. In addition, some models also incorporate reinforcement learning based on human feedback. An example of a base model is the model called Transformer. An example of how such models are trained is publicly available as Reference 1. 【0298】 [Reference 1] Long Ouyang, et al. “Training language models to follow instructions with human feedback”, https://arxiv.org/pdf/2203.02155.pdf 【0299】 These large-scale language models are capable of natural language translation, natural language proofreading, and natural language text summarization. More advanced models can even perform natural language question answering (also known as dialogue or conversation), natural language suggestion generation, and programming code generation. Because these artificial intelligence models have a very large number of parameters, training requires vast amounts of data and computing resources. Therefore, training an artificial intelligence of this scale separately for each specific application is extremely resource-inefficient. A more resource-efficient approach is to generate a foundation model (Foundation Model) through large-scale training, and then utilize it from various devices via an API (Application Programming Interface). 
【0300】 Here, the sequence of operations of the floating spatial image display device 1000 will be explained. The floating spatial image display device 1000 loads the character operation program stored in the storage unit 1170 or the like into the memory 1109, and the control unit 1110 executes the character operation program, thereby enabling the various processes described below. 【0301】 First, the floating spatial image display device 1000 is equipped with a microphone 1139. When user 230 speaks to character 19051, the microphone 1139 picks up the user's voice (the words spoken by the user) and converts it into an audio signal. The character operation program executed by the control unit 1110 then extracts the text of the words spoken by user 230 from the audio signal. This text is in natural language. The extraction of the text of the words spoken by user 230 may be performed continuously for all words, or it may be performed only for words spoken within a predetermined period after a trigger keyword. For example, the trigger keyword may be "Hello" followed by the character's name. If the name of character 19051 is "Koto," then "Hello, Koto!" could be the trigger keyword. 【0302】 The character operation program of the floating spatial image display device 1000 creates an instruction (prompt) based on the text of the words spoken by user 230, and sends the instruction to the large-scale language model server 19001 using an API. Here, the instruction may be metadata in which information is stored using tags in a markup language, JSON format, or the like. The instruction contains natural language text information as the main message. The types of instruction sent from the floating spatial image display device 1000 to the large-scale language model server 19001 include setting instruction statements, which store instructions such as initial settings, and user instruction statements, which reflect instructions from the user. 
Type identification information that identifies whether an instruction statement is a setting instruction statement or a user instruction statement may be stored in a part of the instruction statement other than the main message. When the character movement program of the floating image display device 1000 creates an instruction (prompt) based on the text of the words spoken by the user 230, it creates a user instruction statement and sends it to the large-scale language model server 19001. 【0303】 Next, the large-scale language model of the artificial intelligence in the large-scale language model server 19001 performs inference based on the instruction text transmitted from the floating spatial image display device 1000, and generates a response containing natural language text information based on the result. The large-scale language model server 19001 transmits the response to the floating spatial image display device 1000 using an API. The response contains natural language text information as the main message. Here, the response may also be metadata in which information is stored using tags in a markup language or JSON format, etc., in the same format as the instruction text. If the same format as the instruction text is used in the response, type identification information may be stored in a part other than the main message to indicate that it is a different type of information from the initial setup instruction text and the user instruction text. For example, information indicating that it is a response text from the large-scale language model may be stored. 【0304】 Next, the floating image display device 1000 receives a response from the large-scale language model server 19001 and extracts the natural language text information stored as the main message in that response. 
Based on the natural language text information extracted from the aforementioned response, the character operation program of the floating image display device 1000 uses speech synthesis technology to generate natural language speech that serves as a response to the user, and outputs it from the speaker-like audio output unit 1140 so that it sounds as if it were the voice of character 19051. This process may also be described as the character "speaking". 【0305】 As described above, the processing by the floating spatial image display device 1000 and the large-scale language model server 19001 allows for specific examples of the voice responses of character 19051 to words spoken by user 230, as shown in conversation examples 1-5 of Figure 19C. In this way, user 230 can converse with character 19051 as if it were a real person. 【0306】 As described above, with the floating-image display device 1000 or system including the floating-image display device 1000 shown in Figure 19B, it is not necessary to install the large-scale language model itself, which requires a vast amount of data and computing resources for training, into the floating-image display device 1000 itself. Furthermore, the advanced natural language processing capabilities of the large-scale language model can be utilized via an API, enabling the character to provide more appropriate responses and engage in more suitable conversations when the user speaks to it. 【0307】 Next, using Figure 19D, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 4 of the present invention will be described. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 19001. 
Specifically, Figure 19D shows an example of the natural language text of the main message of each instruction sent from the spatial floating image display device 1000 to the large-scale language model server 19001, and of the natural language text of the main message of each server response returned in reply; together these form the basis of the conversation between character 19051 displayed on the spatial floating image display device 1000 and user 230. 【0308】 Furthermore, Figure 19D shows the exchange of instructions and responses in chronological order, from the setting instruction, through the first round of user instructions and their responses, up to the fourth round of user instructions and their responses. 【0309】 As shown in Figure 19D, the setting instruction can give the large-scale language model of the artificial intelligence in the large-scale language model server 19001 initial settings such as the name of the large-scale language model itself, the role it should play, and the characteristics of its conversation. The user's name can also be conveyed to the model as part of the initial settings. As a result, the large-scale language model generates responses from the first round onward while adhering to the assigned role. When the user hears the voice of character 19051 based on these responses, it feels as if character 19051 embodies the settings and personality of the person described in the setting instruction. Furthermore, the large-scale language model server 19001 in this embodiment is equipped with memory that stores the content of the conversation until the series of conversations is completed, and is configured to generate each response with the preceding user instructions and responses of that series stored. This enables conversations like the one shown in Figure 19D. 
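The exchange of a setting instruction followed by rounds of user instructions and responses can be pictured as a list of typed messages held in memory for one series of conversations. The following is a minimal sketch and not the server's actual interface: the JSON field names "type" and "message", the type labels "setting", "user", and "response", the class ConversationSession, and the example setting text are all hypothetical, and the large-scale language model is replaced by a stub function.

```python
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def make_instruction(kind: str, text: str) -> Message:
    """Build an instruction with type identification and a main message."""
    if kind not in ("setting", "user"):
        raise ValueError(f"unknown instruction type: {kind}")
    return {"type": kind, "message": text}


class ConversationSession:
    """Hypothetical server-side memory for one series of conversations."""

    def __init__(self, infer: Callable[[List[Message]], str]) -> None:
        self._infer = infer
        self.history: List[Message] = []

    def handle(self, instruction_json: str) -> str:
        """Store the instruction, infer on the whole history, store the reply."""
        instruction = json.loads(instruction_json)
        self.history.append(instruction)
        reply = {"type": "response", "message": self._infer(self.history)}
        self.history.append(reply)
        return json.dumps(reply, ensure_ascii=False)

    def end(self) -> None:
        """End the series of conversations and erase its memory."""
        self.history.clear()


def stub_infer(history: List[Message]) -> str:
    """Stand-in for the language model: counts remembered user instructions."""
    count = sum(1 for m in history if m["type"] == "user")
    return f"(stub reply; user instructions remembered: {count})"


session = ConversationSession(stub_infer)
session.handle(json.dumps(make_instruction("setting", "Your name is Koto.")))
first = json.loads(
    session.handle(json.dumps(make_instruction("user", "Hello, Koto!")))
)
print(first["message"])
```

Keeping the type label outside the main message mirrors the description that type identification information may be stored in a part of the instruction other than the main message.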
【0310】 Next, using Figure 19E, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 4 of the present invention will be described. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 19001. Specifically, Figure 19E is an example of the natural language text of the main message of the instruction sent from the spatial floating image display device 1000 to the large-scale language model server 19001, which forms the basis of the conversation between the character 19051 displayed on the spatial floating image display device 1000 and the user 230, and the natural language text of the main message of the server response that is the response. 【0311】 Figure 19E shows an example of a new conversation that takes place after the series of conversations shown in Figure 19D has ended, when user 230 speaks to character 19051 again. In Figure 19E, the exchange of instructions and responses is shown chronologically, from the first round of user instructions and their responses to the third round of user instructions and their responses. 【0312】 Here, "termination" of "continuation of a series of conversations" refers to the process by which the large-scale language model server 19001 erases the conversation memory it held while the series of conversations was continuing, when predetermined conditions are met. An example of predetermined conditions is when the spatial floating image display device 1000 instructs the large-scale language model server 19001 to "terminate" the "continuation of a series of conversations" using an instruction message. 
Another example of predetermined conditions is when a predetermined amount of time has passed since the spatial floating image display device 1000 stopped sending instruction messages to the large-scale language model server 19001 regarding the series of conversations (timeout). Furthermore, in the connection between the spatial floating image display device 1000 and the large-scale language model server 19001, authentication may be lost due to factors such as communication interruption or the spatial floating image display device 1000 being powered off (OFF) while the above instruction messages and responses are being exchanged after the authentication process has been performed. 【0313】 Furthermore, once the "continuation of a series of conversations" ends, the large-scale language model server 19001 erases the memories of the conversations it held while the series of conversations was ongoing. Therefore, even though the conversation shown in Figure 19E takes place after the series of conversations shown in Figure 19D, the server response to the user instruction is one in which the server has no memory whatsoever of the name of the large-scale language model itself, the role to be played, the characteristics of the conversation, the user's name, etc., which were included in the setting instruction shown in Figure 19D. Similarly, the conversation shown in Figure 19E is one in which the server has no memory whatsoever of the series of conversations shown in Figure 19D. In other words, the "end" of the "continuation of a series of conversations" shown in Figure 19D means that the conversation in Figure 19E starts from a state in which the large-scale language model of the artificial intelligence of the large-scale language model server 19001 has been initialized. 【0314】 This can make user 230 feel as if character 19051 has lost their memories of them, or as if they are a completely different person. 
From user 230's perspective, the character's response feels very unnatural, resulting in a lonely and disappointing experience. Such behavior presents a challenge in that it is impossible to ensure the identity of settings and memories of character 19051, such as their name, role, conversational characteristics, and personality, as displayed on the spatial floating image display device 1000. 【0315】 Next, using Figure 19F, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 4 of the present invention will be described. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 19001. Specifically, Figure 19F is an example of the natural language text of the main message of the instruction sent from the spatial floating image display device 1000 to the large-scale language model server 19001, which forms the basis of the conversation between the character 19051 displayed on the spatial floating image display device 1000 and the user 230, and the natural language text of the main message of the server response that is the response. 【0316】 Figure 19F shows an example of a new conversation that takes place after the series of conversations shown in Figure 19D has ended, when user 230 speaks to character 19051 again. Unlike the process in Figure 19E, in the process in Figure 19F, when a new conversation is started, the floating spatial image display device 1000 sends a setting instruction as the first instruction to the large-scale language model server 19001. This setting instruction contains the same natural language text as the initial setting instruction in Figure 19D. This may also be referred to as the reset text. The setting instruction then contains natural language text that explains the history of past conversations. 
This may also be referred to as the conversation history text. The history of past conversations can be recorded by the floating spatial image display device 1000 as natural language text information in the storage unit 1170, linked to the date and time information of the conversation, while the series of conversations described in Figure 19D is continuing. If there are conversations on different dates, each conversation can be recorded linked to the date and time information, and the conversation history can be accumulated. When generating the initial instruction for a later conversation, as shown in Figure 19F, the natural language text information of the conversation and the date and time information of the conversation recorded in the storage unit 1170 can be read and used to generate the instruction. 【0317】 When using natural language text information from past conversation history to generate the setting instruction statement, the format can be determined to some extent freely, as this data is sent to a large-scale language model. However, as shown in Figure 19F, it is advisable to prepare natural language prefixes and suffixes such as "I said the following on [date]," and "You said the following on [date]," and then combine them with the recorded conversation's natural language text information to generate the text of the setting instruction statement. Additionally, the date and time information of the conversation read from the storage unit 1170 may be combined with the "[date]" portion and used as part of the text of the setting instruction statement. 
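The generation of such a setting instruction from recorded conversations can be sketched as follows. This is only an illustration under stated assumptions: the RecordedUtterance structure, the function names, the date format, and the example texts are hypothetical, and "I" is assumed to refer to the user and "You" to the character, since the instruction is addressed to the large-scale language model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class RecordedUtterance:
    """One utterance stored with its date and time (hypothetical layout)."""
    spoken_at: datetime
    speaker: str  # "user" or "character"
    text: str


def conversation_history_text(records: List[RecordedUtterance]) -> str:
    """Combine natural language prefixes with the recorded utterances."""
    lines = []
    for record in sorted(records, key=lambda r: r.spoken_at):
        date = record.spoken_at.strftime("%Y-%m-%d")
        if record.speaker == "user":
            prefix = f"I said the following on {date}, "
        else:
            prefix = f"You said the following on {date}, "
        lines.append(f'{prefix}"{record.text}"')
    return "\n".join(lines)


def build_setting_instruction(reset_text: str,
                              records: List[RecordedUtterance]) -> str:
    """Reset text first, then the conversation history text."""
    return reset_text + "\n" + conversation_history_text(records)


records = [
    RecordedUtterance(datetime(2023, 3, 1, 9, 1), "character", "That sounds fun!"),
    RecordedUtterance(datetime(2023, 3, 1, 9, 0), "user", "I like hiking."),
]
instruction = build_setting_instruction("Your name is Koto.", records)
print(instruction)
```

Sorting by the recorded date and time keeps the history in chronological order even when conversations from different dates are accumulated in the storage unit.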
【0318】 Even when user 230 speaks to character 19051 again to start a new conversation after a series of conversations has ended, generating and transmitting the setting instruction as described in Figure 19F ensures that the responses to subsequent user instructions reflect the character's settings from the previous conversation, such as role, name, conversational characteristics, or personality, as well as the conversation history. This is preferable because the user perceives greater consistency in the character's settings and memories from the previous conversation. 【0319】 Next, using Figure 19G, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 4 of the present invention will be described. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 19001. Specifically, Figure 19G is an example of the natural language text of the main message of the instruction sent from the spatial floating image display device 1000 to the large-scale language model server 19001, which forms the basis of the conversation between the character 19051 displayed on the spatial floating image display device 1000 and the user 230, and of the natural language text of the main message of the server response returned in reply. 【0320】 Figure 19G shows an example of a series of exchanges in the same conversation as shown in Figure 19F, specifically the first round of user instructions and their responses, followed by the third round of user instructions and their responses. In Figure 19G, the exchange of instructions and responses is shown chronologically. 
The content of the setting instructions is the same as shown in Figure 19F, so repeated descriptions are omitted. 【0321】 As shown in the natural language text of the server response in the table in Figure 19F, by using the setting instructions shown in Figure 19F, the server response by the large-scale language model artificial intelligence of the large-scale language model server 19001 will reflect the settings and conversation history of the character, such as the character's role, name, conversational characteristics, or personality, at the time of the previous conversation. This is preferable because, from the user's perspective, it is perceived that the identity of the character's settings and memories, such as the character's role, name, conversational characteristics, or personality, at the time of the previous conversation is better maintained. This can also be called pseudo-identity of the character from the user's perspective, as the character can be perceived as identical. 【0322】 Furthermore, from the user's perspective, they can share memories with the character, resulting in a more enjoyable character conversation experience. 【0323】 Next, using Figure 19H, an example of the operation of the character conversation device (space-floating image display device 1000) of Embodiment 4 of the present invention will be described. This can also be described as an example of the operation of a character conversation system including the space-floating image display device 1000 and the large-scale language model server 19001. Specifically, Figure 19H shows an example of operation in which the character displayed on the space-floating image 3 of the space-floating image display device 1000 is switched from among multiple character candidates. 
The character operation program executed by the control unit 110 of the spatial floating image display device 1000 can switch the displayed character based, for example, on operation inputs to the operation input unit 1107 or operations detected by the aerial operation detection unit 1350. 【0324】 In the example in Figure 19H, in addition to character 19051 (named "Koto") used in the explanations of Figures 19A to 19G, characters 19052 (named "Tom") and 19053 (named "Necco") are shown. Characters 19051 (named "Koto") and 19052 (named "Tom") are human characters, while character 19053 (named "Necco") is a cat character. The character shown in the spatial floating image 3 can be switched by rendering a different 3D model in a virtual 3D space for each character and switching the image displayed on the display device 1. The rendered image of each character's 3D model can be displayed by performing, for example, one of the first to third processing examples explained in Figure 15A. In addition, for some characters, a dynamically moving 2D image may be displayed. 【0325】 Furthermore, when the character operation program executed by the control unit 110 switches the character shown in the spatial floating image 3, it is preferable that the synthesized voice used for the character's "speech" is also changed. This can be done by storing synthesized voice data for each character's voice in the storage unit 1170 in advance and performing the synthesized voice change process when the displayed character is switched. 【0326】 In the example shown in Figure 19H, the system is configured so that user 230 can converse with any of the characters. In the spatial floating image display device 1000 of Figure 19H, each of these characters is assigned a different role, name, conversational characteristics, or personality. 
Furthermore, the memories of each character based on their conversation history are managed separately for each character. 【0327】 Therefore, the floating spatial image display device 1000 constructs a database shown in Figure 19I in its storage unit 1170, and uses this database to manage character settings and character conversation history. 【0328】 Next, using Figure 19I, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 4 of the present invention will be described. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 19001. Specifically, Figure 19I is an explanatory diagram of the database 19200 for managing character settings and character conversation history for multiple characters displayed in the spatial floating image 3 of the spatial floating image display device 1000. 【0329】 The character operation program executed by the control unit 110 of the floating image display device 1000 constructs the database 19200 in the storage unit 1170, for example. The character ID is an identification number that identifies each of the multiple characters that can be displayed by the floating image display device 1000, and may be a natural number or use the alphabet, etc. The name is data of the name of each of the multiple characters that can be displayed by the floating image display device 1000. 【0330】 The initial setup instruction is natural language text information that describes the settings of each of the multiple characters that can be displayed on the floating spatial image display device 1000, such as their role, name, conversational characteristics, or personality. 
Since this initial setup instruction is the main data of the setting instruction sent from the floating spatial image display device 1000 to the large-scale language model server 19001, it is desirable that the content be such that the large-scale language model of the artificial intelligence in the large-scale language model server 19001 can read it directly. 【0331】 The conversation history, numbered 1, 2, ..., is a record of conversations between each character and the user, and is recorded separately for each character. Since this conversation history will be included in the natural language text information, which is the main data of the setting instruction message sent from the spatial floating image display device 1000 to the large-scale language model server 19001, it is desirable that the content be such that the large-scale language model of the artificial intelligence in the large-scale language model server 19001 can read it directly. 【0332】 The character operation program executed by the control unit 110 of the floating-space image display device 1000, when switching the character displayed on the floating-space image 3 of the floating-space image display device 1000, uses the database 19200 in Figure 19I to select and switch the initial setting instruction text and conversation history used for the natural language text information, which is the main data of the setting instruction text sent from the floating-space image display device 1000 to the large-scale language model server 19001, so as to correspond to the character displayed on the floating-space image 3 of the floating-space image display device 1000. In addition, each time a conversation takes place between the user 230 and the character, the character operation program records the history of that conversation in the conversation history area of the database 19200 in Figure 19I that corresponds to the character displayed on the floating-space image 3. 
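The per-character management described for database 19200 can be sketched as below. This is an illustrative in-memory model only, assuming a simple record per character; the class names, method names, and sample setting texts are hypothetical and do not reproduce the actual layout of Figure 19I.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CharacterRecord:
    """One row of the (hypothetical) character database."""
    character_id: str      # identifier such as "19051"
    name: str              # character name
    initial_setting: str   # initial setting instruction text
    history: List[str] = field(default_factory=list)  # conversation history 1, 2, ...


class CharacterDatabase:
    """Keeps settings and conversation history separately for each character."""

    def __init__(self, records: List[CharacterRecord]) -> None:
        self._records: Dict[str, CharacterRecord] = {r.character_id: r for r in records}
        self._displayed_id: Optional[str] = None

    def switch_to(self, character_id: str) -> None:
        """Select the record that corresponds to the newly displayed character."""
        if character_id not in self._records:
            raise KeyError(character_id)
        self._displayed_id = character_id

    def _displayed(self) -> CharacterRecord:
        if self._displayed_id is None:
            raise RuntimeError("no character is displayed")
        return self._records[self._displayed_id]

    def record_conversation(self, text: str) -> None:
        """Append a conversation to the history of the displayed character only."""
        self._displayed().history.append(text)

    def setting_instruction(self) -> str:
        """Initial setting plus history for the displayed character."""
        rec = self._displayed()
        return "\n".join([rec.initial_setting, *rec.history])


if __name__ == "__main__":
    db = CharacterDatabase([
        CharacterRecord("19051", "Koto", "Your name is Koto."),   # placeholder text
        CharacterRecord("19053", "Necco", "Your name is Necco. You are a cat."),
    ])
    db.switch_to("19051")
    db.record_conversation("I said the following: \"Hello.\"")
    db.switch_to("19053")
    print(db.setting_instruction())  # contains nothing from Koto's history
```

Keying every read and write on the currently displayed character is what keeps one character's memories from leaking into another's setting instruction, even though the same server and model are used for all of them.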
【0333】 By using the database 19200, the character operation program executed by the control unit 110 of the floating spatial image display device 1000 uses the same large-scale language model of the same artificial intelligence on the same large-scale language model server 19001 to establish a conversation between the user 230 and the character. However, from the user's perspective, the uniqueness of each character's settings is preserved, and the memory of different conversations continues for each character. From the user's perspective, this is preferable because it is perceived that the identity of the settings and memories, such as the character's role, name, conversational characteristics, or personality at the time of the previous conversation, is more readily maintained for each character. This can also be expressed as ensuring a pseudo-identity of each character from the user's perspective. 【0334】 Therefore, even when the floating image display device 1000 is configured to switch between displaying characters from among multiple character candidates in the floating image 3, the operation using the database 19200 described above will result in a less jarring experience for the user in conversations with each character, and will allow them to share memories with each of the multiple characters, providing a more enjoyable character conversation experience. 【0335】 Furthermore, if the user is prevented from editing the initial setting instructions for multiple characters, the settings for each character, such as their role, name, conversational characteristics, or personality, can be maintained in a state close to the intentions of the provider of the floating spatial image display device 1000 or the creator of the character's content. Alternatively, the user may be allowed to edit the initial setting instructions for characters in response to input from the operation input unit 1107 or the like. 
In this case, the user can customize the settings for the character's role, name, conversational characteristics, or personality, and converse with the character they have set up themselves. In this case, the character's 3D model, its rendered image, and the type of synthesized voice for the character may also be replaced accordingly. 【0336】 Next, using Figure 19J, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 4 of the present invention will be described. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 19001. Specifically, a method will be described for providing, at a lower cost, a character conversation service using a character conversation device with the spatial floating image display device 1000, or a character conversation system with the spatial floating image display device 1000 and the large-scale language model server 19001. 【0337】 As explained in Figure 19B, training a large-scale language model with this level of artificial intelligence for a single specific application is extremely resource-inefficient. It is therefore more resource-efficient to generate, through large-scale training, a foundation model that can be applied to various uses, and then to make it available to various devices via an API (Application Programming Interface). In this case, the provider of the large-scale language model often recovers the cost of training the model from users of the API in the form of API usage fees. For natural language models, the API usage fee is often charged according to the number of tokens processed, where tokens are the word-like units into which sentences are divided. 
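To make the cost argument concrete, the following sketch estimates how the tokens spent on setting instructions grow when every new conversation resends the initial setting together with all earlier history, as in Figure 19F. It is arithmetic only: the function names and every number used with them are hypothetical, and real token counts and prices depend on the model provider.

```python
def cumulative_setting_tokens(setting_tokens: int,
                              history_tokens_per_conversation: int,
                              conversations: int) -> int:
    """Total tokens sent in setting instructions over several conversations.

    Conversation k (counting from 0) resends the initial setting plus the
    history of the k earlier conversations.
    """
    total = 0
    for k in range(conversations):
        total += setting_tokens + history_tokens_per_conversation * k
    return total


def usage_fee(tokens: int, price_per_1000_tokens: float) -> float:
    """Fee under a simple per-token price (hypothetical pricing model)."""
    return tokens / 1000 * price_per_1000_tokens


if __name__ == "__main__":
    # Assumed values: 50-token setting, 200 tokens of history per conversation.
    tokens = cumulative_setting_tokens(50, 200, 10)
    print(tokens, usage_fee(tokens, 0.002))
```

With these assumed values the ten setting instructions total 9,500 tokens, of which only 500 are the fixed initial setting; the rest is resent history, which is why the history is the natural target for token reduction.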
【0338】 Therefore, in the spatial floating image display device 1000 of Embodiment 4 of the present invention, reducing the number of tokens in the natural language text information exchanged via the API between the spatial floating image display device 1000 and the large-scale language model server 19001 makes it possible to provide users, at a lower cost, with a character conversation device using the spatial floating image display device 1000 and with a character conversation service using a character conversation system that includes the spatial floating image display device 1000 and the large-scale language model server 19001. 【0339】 For example, the processing and configurations shown in Examples 1 to 3 in the table in Figure 19J make it technically possible to reduce the number of tokens in the natural language text information exchanged via the API between the spatial floating image display device 1000 and the large-scale language model server 19001. 【0340】 Example 1 is a method of reducing the number of tokens in the conversation history text that is included in the setting instruction and transmitted via the API, specifically by shortening the conversation history text through document summarization. For example, the natural language text of the conversation history with the character recorded in the storage unit 1170 is summarized, and the summary is recorded. While document summarization can be performed at the start of the next conversation, it is more time-efficient to perform it at the end of a "series of conversations". 【0341】 Alternatively, the text summarization process may be requested from the large-scale language model server 19001 itself. However, in this case, the token saving effect is low, since the summarization request is itself processed as tokens by that server. 
Therefore, for example, if a second server 19002 provides natural language text summarization via an API at a lower cost than the large-scale language model server 19001, the text summarization may be requested from the second server 19002 via its API, and the resulting summary of the conversation history may be included in the setting instruction sent to the large-scale language model server 19001. 【0342】 Furthermore, if only text summarization is required, it can also be performed on the terminal side. The control unit 110 may perform text summarization by executing a document summarization program loaded into the memory 1109 of the spatial floating image display device 1000. In this case, the token saving effect is high. In addition, specifying an upper limit on the number of characters after summarization bounds the length of the conversation history text even when the conversation history becomes long, so an upper limit on the number of tokens can be set and tokens can be saved. 【0343】 Furthermore, since the amount of text information in the character's initial settings, such as the character's role, name, conversational characteristics, or personality, does not grow the way the conversation history does, it is efficient and preferable to keep the text information of the initial setting instruction intact while reducing the number of tokens in the conversation history text. 【0344】 The process described in Example 1 can be carried out by the character operation program executed by the control unit 110, which controls each part. 【0345】 Example 2 is another method of reducing the number of tokens in the conversation history text that is included in the setting instruction and transmitted via the API. 
For example, the number of tokens can be reduced by deleting older entries from the conversation history with the character recorded in the storage unit 1170. Specifying an upper limit on the number of characters in the conversation history bounds the length of the conversation history text, which sets an upper limit on the number of tokens and enables token saving. Alternatively, a predetermined retention period for the conversation history can be specified, and conversation history older than that period can be deleted. In this case as well, token saving is possible. In Example 2 as well, since the text information of the character's initial settings, such as the character's role, name, conversational characteristics, or personality, does not grow the way the conversation history does, it is efficient and preferable to keep the text information of the initial setting instruction intact and reduce the number of tokens in the conversation history text. 【0346】 The process described in Example 2 can be carried out by the character operation program executed by the control unit 110, which controls each part. 【0347】 Example 3 is a method of reducing the number of tokens by reducing the frequency with which setting instructions are sent via the API. Specifically, after the device power is turned on or after the displayed character is switched, the setting instruction is not sent in advance, even once the video settings and synthesized voice settings for the displayed character are complete. Instead, the setting instruction is sent to the large-scale language model server 19001 only when the control unit 110 determines that the natural language text information contained in the user's speech picked up by the microphone 1139 is text information that should be processed by the large-scale language model of artificial intelligence. 
This reduces the frequency of sending setting instructions to the large-scale language model server 19001 and thus reduces the number of tokens. 【0348】 Specifically, for example, after the device is powered on or after an operation input to switch the displayed character, the display processing of the display device 1, controlled by the character operation program executed by the control unit 110, causes character 19051 (named "Koto") to be displayed in the spatial floating image 3 as shown in Figure 19H. At this time, if, for example, synthesized voice for the appearance of character 19051 is stored in the storage unit 1170 or the like, that synthesized voice, such as "Good morning. I'm Koto," "Good afternoon. I'm Koto," or "Good evening. I'm Koto," may be output from the speaker, which is the audio output unit 1140. At this point, the image of character 19051 has already been set as the image of the character displayed in the spatial floating image 3, and the synthesized voice output from the speaker, which is the audio output unit 1140, has been set to the synthesized voice corresponding to character 19051. 【0349】 Here, as already explained, the inference processing of the large-scale language model of artificial intelligence in the large-scale language model server 19001 takes longer when the instruction is long. In particular, if the setting instruction includes text information about past conversation history, the number of tokens in the instruction increases, and the inference processing time becomes especially long. The setting instruction itself and the response to it are not output to the user 230. Synthesized speech as the character's "utterance" is output from the speaker, which is the audio output unit 1140, only from the response to the user instruction that follows the setting instruction. 
It might therefore seem preferable to send the setting instruction from the spatial floating image display device 1000 to the large-scale language model server 19001 in advance and have the large-scale language model complete its inference processing for the setting instruction beforehand, because this would speed up the output of the synthesized speech of the "utterance" of character 19051 after the user 230 speaks to character 19051. 【0350】 However, suppose the setting instruction is sent to the large-scale language model server 19001 before the user 230 speaks, and the inference processing of the large-scale language model for the setting instruction is completed in advance. If the user 230 then, before speaking to character 19051, turns off the power to the spatial floating image display device 1000 through an operation via the operation input unit 1107 or the aerial operation detection unit 1350, or switches the displayed character from character 19051 to another character through such an operation, the tokens that the large-scale language model server 19001 processed for the pre-sent setting instruction incur usage fees to no purpose. This hinders providing users, at a lower cost, with a character conversation device using the spatial floating image display device 1000 and with a character conversation service using the spatial floating image display device 1000 and the large-scale language model server 19001. 
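The deferral at the core of Example 3 can be sketched as follows. This is a minimal illustration rather than the device's actual control flow: `DeferredSession`, the injected `send` function standing in for the API call to the large-scale language model server 19001, and the `needs_llm` predicate standing in for the determination made by the control unit 110 are all hypothetical names.

```python
from typing import Callable, Optional


class DeferredSession:
    """Sends the setting instruction only when an utterance needs the language model."""

    def __init__(self,
                 setting_instruction: str,
                 send: Callable[[str], str],
                 needs_llm: Callable[[str], bool]) -> None:
        self._setting = setting_instruction
        self._send = send            # hypothetical stand-in for the server API call
        self._needs_llm = needs_llm  # hypothetical stand-in for the device-side check
        self._setting_sent = False

    def on_character_displayed(self) -> None:
        """Power-on or character switch: video and voice are set up, nothing is sent."""
        self._setting_sent = False

    def on_user_text(self, text: str) -> Optional[str]:
        """Handle text extracted from the user's speech."""
        if not self._needs_llm(text):
            return None  # handled on the device; no tokens are consumed
        if not self._setting_sent:
            # The response to the setting instruction is not output to the user.
            self._send(self._setting)
            self._setting_sent = True
        return self._send(text)  # basis for the character's synthesized "utterance"
```

If the user powers the device off or switches characters before saying anything that needs the model, `send` is never called, so no tokens are billed for that session. The trade-off is that the first reply waits for the setting instruction to be processed as well.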
【0351】 Therefore, after the power of the spatial floating image display device 1000 is turned on or after an operation input to switch the displayed character, even once the character operation program executed by the control unit 110 has set the image of character 19051 as the image of the character displayed in the spatial floating image 3 and has set the synthesized voice output from the speaker, which is the audio output unit 1140, to the synthesized voice corresponding to character 19051, it is desirable that the device refrain from sending the setting instruction to the large-scale language model server 19001 until it recognizes that the user 230 is speaking to character 19051. 【0352】 Here, the point at which it is recognized that user 230 is speaking to character 19051 may be, for example, the point at which the trigger keyword described in Figure 19B is detected, or the point at which the text of the words spoken by user 230 is extracted. In this way, the number of processed tokens that needlessly incur usage fees can be reduced, and the character conversation device using the spatial floating image display device 1000, or the character conversation service using the spatial floating image display device 1000 and the large-scale language model server 19001, can be provided to users at a lower cost. 【0353】 Furthermore, even after the point at which the device recognizes that user 230 is speaking to character 19051, it is desirable to keep withholding the setting instruction from the large-scale language model server 19001 if, for example, the text information extracted from the voice of user 230 picked up by microphone 1139 corresponds to preset keywords that do not require inference processing by a large-scale language model. 
Specifically, examples of such preset keywords include phrases with which user 230 asks character 19051 for a reaction, such as performing an animation or uttering synthesized speech, for example "Try jumping" or "Try dancing." In this case, the character operation program executed by the control unit 110 can read the motion data, animation video, and/or synthesized speech data corresponding to the reaction stored for character 19051 in the storage unit, and use this data to generate the video to be displayed in the spatial floating image 3 and to output the synthesized speech from the speaker, which is the audio output unit 1140. 【0354】 Such processing does not necessarily require the inference processing of the large-scale language model server 19001. If, after such processing, user 230 turns off the power of the spatial floating image display device 1000 through an operation via the operation input unit 1107 or the aerial operation detection unit 1350, or switches the displayed character from character 19051 to another character through such an operation, then having sent the setting instruction to the large-scale language model server 19001 beforehand and had it processed by the large-scale language model would have wasted processed tokens and usage fees. 【0355】 Therefore, even after the point at which the device recognizes that user 230 is speaking to character 19051, it is desirable to keep withholding the setting instruction from the large-scale language model server 19001 until, for example, it has been determined whether the text information extracted from the voice of user 230 picked up by microphone 1139 corresponds to preset keywords that do not require inference processing by the large-scale language model. 
Only when it is determined that inference processing by the large-scale language model is necessary should the setting instruction be sent to the large-scale language model server 19001 and the inference processing of the large-scale language model be carried out. 【0356】 Furthermore, the process described in Example 3 can be carried out by the character operation program executed by the control unit 110, which controls each part. 【0357】 As described above, the methods for reducing (saving) the number of tokens processed by the large-scale language model, as shown in the examples in Figure 19J, make it possible to provide users, at a lower cost, with character conversation devices using the spatial floating image display device 1000 and with character conversation services using the spatial floating image display device 1000 and the large-scale language model server 19001. 【0358】 As described above, the character conversation device and character conversation system according to Embodiment 4 can reduce the sense of incongruity that users feel when conversing with characters displayed on the display device. Furthermore, the character conversation device and character conversation system according to Embodiment 4 can provide character conversation services to users at a lower cost. 【0359】 In addition, the display device of the character conversation device and character conversation system according to Embodiment 4 has been described using a spatial floating image display device as an example. However, in Embodiment 4, the display device does not necessarily have to be one that displays spatial floating images in the air. For example, it may be a display device that displays images on a physical surface, such as a liquid crystal panel, an organic EL panel, a plasma display, a projector that projects an image onto an opaque screen for reflection, or a projector that projects an image onto a transparent screen for diffusion. 
In this case, the part of any of these display devices that displays an image visible to the user may be referred to as the display unit, whatever form the display screen takes. 【0360】 In this case as well, the character conversation device and character conversation system according to Embodiment 4 can reduce the sense of incongruity that users feel when conversing with characters displayed on the display device. Furthermore, the character conversation device and character conversation system according to Embodiment 4 can provide character conversation services to users at a lower cost. 【0361】 <Example 5> Next, Embodiment 5 of the present invention will be described. Embodiment 5 is an improvement on the character conversation device (spatial floating image display device 1000) and character conversation system described in the figures of Embodiment 4. In this embodiment, the differences from Embodiment 4 will be explained, and repeated explanations of configurations similar to those in Embodiment 4 will be omitted. 【0362】 An example of a character conversation device and character conversation system according to Embodiment 5 of the present invention will be described using Figure 20A. In the character conversation system of Embodiment 5, a large-scale language model server 20001 is provided instead of the large-scale language model server 19001 in Figure 19A, and it is connected to the Internet 19000. 【0363】 Here, the large-scale language model server 20001 is a server equipped with a large-scale language model artificial intelligence, specifically a multimodal one that can process not only natural language text information, which the large-scale language model server 19001 could also process, but also types of information other than natural language text information. 
Examples of multimodal large-scale language model artificial intelligence include GPT-4 (see Reference 2) and Gato (see Reference 3), which have been made publicly available. These technologies may also be used in this embodiment. These multimodal large-scale language models are artificial intelligence models generated by performing large-scale pre-training on natural language and other types of information (e.g., images, videos, audio, etc.) contained in numerous documents and texts that exist in the human world. Furthermore, there are also models that incorporate reinforcement learning based on human feedback. Hereafter, information other than natural language text information, such as images, videos, and audio, may be referred to as non-natural language information sources. 【0364】 [Reference 2] OpenAI “GPT-4 Technical Report”, https://cdn.openai.com/papers/gpt-4.pdf 【0365】 [Reference 3] Scott Reed, et al. “A Generalist Agent”, https://arxiv.org/pdf/2205.06175.pdf 【0366】 Here, as an example, the spatial floating image display device 1000 serving as the character conversation device is described as having the same configuration as the character conversation device (spatial floating image display device 1000) of Embodiment 4. 【0367】 In Embodiment 5, the spatial floating image display device 1000, which is a character conversation device, can communicate with the large-scale language model of the large-scale language model server 20001 via the Internet 19000 using an API. 【0368】 The character conversation system in Embodiment 5 includes a mobile information processing terminal 20010 used by user 230. The mobile information processing terminal 20010 is a so-called smartphone or tablet information processing terminal. 【0369】 Here, an example of a mobile information processing terminal 20010 will be described using Figure 20B. 
The mobile information processing terminal 20010 includes a display panel 20011 which is a touch operation input panel, a control unit 20012, an external power input interface 20013, a power supply 20014, a secondary battery 20015, a storage unit 20016, a video control unit 20017, a posture sensor 20018, a communication unit 20020, an audio output unit 20021, a microphone 20022, a video signal input unit 20023, an audio signal input unit 20024, an imaging unit 20025, and the like. 【0370】 The display panel 20011 is equipped with a touch input sensor and can accept touch input from the finger of user 230. The display panel 20011 displays images using a liquid crystal panel or an organic EL panel. The display panel 20011 may also be called a display unit. 【0371】 The communication unit 20020 can be configured with a Wi-Fi communication interface, a Bluetooth communication interface, or a mobile communication interface such as 4G or 5G. Using these communication methods, the communication unit 20020 of the mobile information processing terminal 20010 can communicate with the communication unit 19010 of the character conversation device (spatial floating image display device 1000). The mobile information processing terminal 20010 is equipped with a control unit such as a CPU and memory, and the control unit controls the display panel 20011 and the communication unit 20020. Furthermore, the communication unit 20020 can communicate with the communication device 19011 connected to the Internet 19000 using one of its communication methods. As a result, the mobile information processing terminal 20010 can communicate with various servers connected to the Internet 19000. 【0372】 The power supply 20014 converts AC current input from an external source via the external power input interface 20013 into DC current and supplies the necessary DC current to each part of the mobile information processing terminal 20010. 
The secondary battery 20015 stores the power supplied by the power supply 20014. In addition, when power is not supplied from outside via the external power input interface 20013, the secondary battery 20015 supplies power to each part that requires it. 【0373】 The video signal input unit 20023 receives video data from an external video output device connected to it. Various digital video input interfaces are possible for the video signal input unit 20023. For example, it can be configured with an HDMI (High-Definition Multimedia Interface) standard video input interface, a DVI (Digital Visual Interface) standard video input interface, or a DisplayPort standard video input interface. Alternatively, analog video input interfaces such as analog RGB or composite video may be provided. The video signal input unit 20023 may also use various USB interfaces. 【0374】 The audio signal input unit 20024 receives audio data from an external audio output device connected to it. The audio signal input unit 20024 may be configured as an HDMI audio input interface, an optical digital terminal interface, or a coaxial digital terminal interface, etc. The audio signal input unit 20024 may also be various USB interfaces, etc. In the case of an HDMI interface, the video signal input unit 20023 and the audio signal input unit 20024 may be configured as an interface with integrated terminals and cables. 【0375】 The audio output unit 20021 is capable of outputting audio based on audio data input to the audio signal input unit 20024. The audio output unit 20021 is also capable of outputting audio based on audio data stored in the storage unit 20016. The audio output unit 20021 may be configured as a speaker. In addition, the audio output unit 20021 may output built-in operation sounds or error warning sounds. Alternatively, the audio output unit 20021 may be configured to output a digital signal to an external device, as with the Audio Return Channel function specified in the HDMI standard. 
【0376】 Microphone 20022 is a microphone that picks up sounds from the surrounding area of the mobile information processing terminal 20010, converts them into signals, and generates audio signals. The microphone may be configured to record human voices, such as the user's voice, and the control unit 20012, described later, may perform speech recognition processing on the generated audio signal to obtain text information from the audio signal. 【0377】 The imaging unit 20025 is a camera having an image sensor. The camera may be provided on the front of the display panel 20011 side of the mobile information processing terminal 20010, or on the back of the display panel 20011 side. Both a front camera and a rear camera may be provided. In this embodiment, the imaging unit 20025 will be described as having both a front camera and a rear camera. 【0378】 The storage unit 20016 is a storage device that records various types of information, such as video data, image data, and audio data. The storage unit 20016 may be composed of a magnetic recording medium such as a hard disk drive (HDD) or a semiconductor memory such as a solid-state drive (SSD). For example, the storage unit 20016 may have various types of information, such as video data, image data, and audio data, pre-recorded in it at the time of product shipment. The storage unit 20016 may also record various types of information, such as video data, image data, and audio data, acquired from external devices or external servers via the communication unit 20020. The video data, image data, etc., recorded in the storage unit 20016 are output to the display panel 20011. The video data, image data, etc., recorded in the storage unit 20016 may also be output to external devices or external servers via the communication unit 20020. 【0379】 The video control unit 20017 performs various controls related to the video signals input to the display panel 20011. 
The video control unit 20017 may also be called a video processing circuit and may be composed of hardware such as an ASIC, FPGA, or video processor. The video control unit 20017 may also be called a video processing unit or image processing unit. For example, the video control unit 20017 controls video switching, such as determining which video signal to input to the display panel 20011 from among the video signals to be stored in memory 20026 and the video signals (video data) input to the video signal input unit 20023. The video control unit 20017 may also perform control to perform image processing on the video signals input from the video signal input unit 20023 and the video signals to be stored in memory 20026. Examples of image processing include scaling processing, such as enlarging, reducing, and transforming images; brightness adjustment processing, which changes the brightness; contrast adjustment processing, which changes the contrast curve of an image; and retinex processing, which decomposes an image into its light components and changes the weighting of each component. 【0380】 The attitude sensor 20018 is a sensor composed of a gravity sensor, an acceleration sensor, or a combination thereof, and can detect the attitude of the mobile information processing terminal 20010. Based on the attitude detection result of the attitude sensor 20018, the control unit 20012 may control the operation of each connected part. 【0381】 The non-volatile memory 20027 stores various data used by the mobile information processing terminal 20010. The data stored in the non-volatile memory 20027 includes, for example, data for various operations displayed on the display panel 20011 of the mobile information processing terminal 20010, display icons, data and layout information for objects used by the user. Memory 20026 stores video data and device control data displayed on the display panel 20011. 
The control unit 20012 may also read various software from the storage unit 20016 and load it into memory 20026. 【0382】 The control unit 20012 controls the operation of each connected component. The control unit 20012 may also work in cooperation with a program stored in memory 20026 to perform calculations based on information acquired from each component within the mobile information processing terminal 20010. 【0383】 Next, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 5 of the present invention will be described using Figure 20C. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 20001. In Embodiment 5 as well, the character conversation device (spatial floating image display device 1000) loads the character operation program stored in the storage unit 1170 or the like into the memory 1109, and the control unit 1110 executes the character operation program, thereby enabling the various processes described below to be realized. 【0384】 In Example 4, the actions performed by User 230 on the character conversation device (spatial floating image display device 1000) were mainly through User 230's voice. In Example 4, the character conversation device (spatial floating image display device 1000) performed a series of operations starting with the process of picking up User 230's voice with a microphone. The character conversation device (spatial floating image display device 1000) in Example 5 is likewise capable of performing the series of operations described in Example 4, starting with the process of picking up User 230's voice with a microphone. 
In addition, in the character conversation device (spatial floating image display device 1000) in Example 5, User 230 can perform actions to the character conversation device (spatial floating image display device 1000) through user operation via the operation input unit 1107 in Figure 3. Here, an example of the operation input unit 1107 in Figure 3 is a mouse, keyboard, touch panel, etc. 【0385】 Furthermore, in the character conversation device (space-floating image display device 1000) of Example 5, the user 230 can perform actions on the character conversation device (space-floating image display device 1000) through aerial operations detected by the aerial operation detection sensor 1351 and aerial operation detection unit 1350 shown in Figure 3. In addition, the user 230 can also input user operations to the character conversation device (space-floating image display device 1000) by operating the mobile information processing terminal 20010 and communicating from the mobile information processing terminal 20010 to the character conversation device (space-floating image display device 1000). 【0386】 Alternatively, the display panel 20011 of the mobile information processing terminal 20010 may display an information-storage image, such as a two-dimensional code containing information that the user wants to convey to the character conversation device (space-floating image display device 1000), and the imaging unit 1180 of the character conversation device (space-floating image display device 1000) as shown in Figure 3 may capture this display. The control unit 1110 of the character conversation device (space-floating image display device 1000) may extract information from the information-storage image, such as a two-dimensional code, captured by the imaging unit 1180 and obtain the information. 
Alternatively, the display panel 20011 of the mobile information processing terminal 20010 may display an image that the user wants to convey to the character conversation device (space-floating image display device 1000), and the imaging unit 1180 of the character conversation device (space-floating image display device 1000) as shown in Figure 3 may capture this display. The control unit 1110 of the character conversation device (space-floating image display device 1000) may perform image recognition processing on the image captured by the imaging unit 1180 and obtain the result of said image recognition processing. 【0387】 Thus, in the character conversation device (spatial floating image display device 1000) of Example 5, the types of actions that the user 230 can perform on the character conversation device (spatial floating image display device 1000) are greater than those of the character conversation device (spatial floating image display device 1000) described in Example 4. As a result, the character conversation device (spatial floating image display device 1000) of Example 5 can acquire the results of actions performed by the user 230 other than the user's voice, and generate instruction sentences (prompts) to send to the large-scale language model server 20001 based on these results. This makes it possible to more favorably include types of information other than natural language text information extracted from the user's voice in the instruction sentences sent to the large-scale language model server 20001. Examples of types of information other than natural language text information extracted from the user's voice include images, videos, and audio. 【0388】 Next, the character conversation device (spatial floating image display device 1000) of this embodiment sends instruction texts to the large-scale language model server 20001 using an API. 
In this embodiment as well, instruction texts may be metadata in which information is stored using tags in a markup language or JSON format. In this embodiment as well, there are two types of instruction texts: setting instruction texts that store instructions such as initial settings, and user instruction texts that reflect instructions from the user. Type identification information that identifies whether an instruction text is a setting instruction text or a user instruction text may be stored in a part of the instruction text other than the main message. In this case, the instruction text includes natural language text information as the main message. Furthermore, in this embodiment, in addition to natural language text information, the main message of the instruction text may include non-natural language information sources such as images, videos, or audio as a type of information other than natural language text information. A specific method for including non-natural language information sources in instruction texts will be described later. 【0389】 The large-scale language model server 20001 of this embodiment has a multimodal large-scale language model that can process non-natural language information sources in conjunction with natural language text information. The large-scale language model server 20001 receives an instruction sentence from a character conversation device (spatial floating image display device 1000). Based on the instruction sentence, the multimodal large-scale language model performs inference and generates a response containing natural language text information as a result of the inference. Here, since the artificial intelligence of the large-scale language model server 20001 is a multimodal large-scale language model, the response can include non-natural language information sources such as images, videos, or audio in addition to natural language text information. 
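The instruction text structure described in paragraph 【0388】 can be illustrated with a minimal sketch in Python, assuming a JSON encoding. The field names "type", "message", "text", and "attachments" are hypothetical names chosen for illustration only; the embodiment does not fix a particular schema, and an actual large-scale language model server would define its own API format.

```python
import json


def build_instruction(kind, text, attachments=None):
    """Build one instruction text as a JSON string (illustrative schema only).

    kind        -- "setting" or "user"; plays the role of the type
                   identification information, kept outside the main message.
    text        -- natural language text information of the main message.
    attachments -- optional list of dicts describing non-natural language
                   information sources (images, videos, audio).
    """
    if kind not in ("setting", "user"):
        raise ValueError("kind must be 'setting' or 'user'")
    message = {"text": text}
    if attachments:
        # Non-natural language information sources travel with the main message.
        message["attachments"] = list(attachments)
    return json.dumps({"type": kind, "message": message}, ensure_ascii=False)


# Example: a setting instruction followed by a user instruction with an image.
setting = build_instruction("setting", "You are a friendly character.")
user = build_instruction(
    "user",
    "What do you see in this picture?",
    [{"media": "image", "url": "https://example.com/pool.jpg"}],
)
```

Keeping the type identification outside the main message lets the receiving side distinguish setting instructions from user instructions without parsing the natural language text.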
【0390】 The character conversation device (spatial floating image display device 1000) receives a response from the large-scale language model server 20001 and extracts natural language text information and non-natural language information sources such as images, videos, or audio stored as the main message in the response. The character operation program of the character conversation device (spatial floating image display device 1000) may use speech synthesis technology to generate natural language audio as a response to the user based on the natural language text information extracted from the aforementioned response, and output it from the audio output unit 1140, which is a speaker, so that it sounds as if it were the voice of the character 19051 displayed on the display screen. 【0391】 Furthermore, the character operation program of the character conversation device (space-floating image display device 1000) may display natural language characters that serve as a response to the user on the display screen of the character conversation device (space-floating image display device 1000), based on the natural language text information extracted from the aforementioned response. In this case, the characters may be displayed together with character 19051, superimposed on the image of character 19051, or displayed in place of the image of character 19051. The video control unit 1160 may perform these specific processes. 【0392】 Furthermore, the character operation program of the character conversation device (space-floating image display device 1000) may display an image on the display screen of the character conversation device (space-floating image display device 1000) in order to present it to the user, based on the image information of the non-natural language information source extracted from the aforementioned response. 
At this time, the image may be displayed together with character 19051, superimposed on the image of character 19051, or displayed in place of the image of character 19051. These specific processes can be executed by the video control unit 1160. 【0393】 Furthermore, the character operation program of the character conversation device (space-floating image display device 1000) may display the video information of the non-natural language information source extracted from the aforementioned response on the display screen of the character conversation device (space-floating image display device 1000) in order to present it to the user. In this case, the video may be displayed together with character 19051, superimposed on the video of character 19051, or displayed in place of the video of character 19051. These specific processes can be executed by the video control unit 1160. 【0394】 Furthermore, the character operation program of the character conversation device (space-floating image display device 1000) may output, from the audio output unit 1140, which is a speaker, audio generated based on the audio information of the non-natural language information source extracted from the aforementioned response. 【0395】 As described above, with the character conversation device (spatial floating image display device 1000) shown in Figure 20C, or the character conversation system including the character conversation device (spatial floating image display device 1000) and the large-scale language model server 20001, it is not necessary to install the large-scale language model itself, which requires a massive amount of data and computing resources for training, in the character conversation device (spatial floating image display device 1000). Furthermore, the advanced natural language processing and non-natural language information processing capabilities of the multimodal large-scale language model can be utilized via an API. 
In addition to responses based on natural language text, responses based on non-natural language information sources can be provided in response to user actions towards the character, enabling more appropriate conversations. 【0396】 Next, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 5 of the present invention will be described using Figure 20D. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 20001. Specifically, Figure 20D shows an example of natural language text and non-natural language information sources such as images for the main message of an instruction sent from the character conversation device (spatial floating image display device 1000) to the large-scale language model server 20001, and an example of natural language text and non-natural language information sources such as images for the main message of the server response. In this embodiment, non-natural language information sources can include images, videos, and audio, but Figure 20D shows an example of an image as a non-natural language information source. 【0397】 Furthermore, Figure 20D shows the exchange of instructions and responses in chronological order, from the first round of setting instructions and user instructions and their responses to the second round of user instructions and their responses. Here, the instructions and responses shown in Figure 20D include non-natural language information sources 20061 and 20062, which were not shown in Figure 19D of Example 4. In the example of Figure 20D, both non-natural language information sources 20061 and 20062 are images. 【0398】 In Figure 20D, for the sake of simplicity, an image of the non-natural language information source 20061 is shown embedded within the instruction text. 
However, there are multiple methods for transmitting or specifying data for the non-natural language information source 20061 in the instruction text sent from the character conversation device (spatial floating image display device 1000) to the large-scale language model server 20001. The character conversation device (spatial floating image display device 1000) can use any one of these methods, or switch between them. An example of each method will be explained below. 【0399】 The first method for transmitting or specifying non-natural language information source data in an instruction is used, for example, when the non-natural language information source to be specified is located on a server or other location connected to a network such as the Internet. A specific example of the first method is to use tag information in the instruction to specify a non-natural language information source file located on a network such as the Internet, using the network location information (so-called URL, etc.) and the file name. 【0400】 For example, by using a tag that specifies an image in a markup language, <img src="****">, and writing the location information and file name information of the image file in the **** part, an image that exists on a network such as the Internet can be specified. Alternatively, by using a tag that specifies a video in a markup language, <video src="****">, and writing the location information and file name information of the video file in the **** part, a video that exists on a network such as the Internet can be specified. Alternatively, by using a tag that specifies audio in a markup language, <audio src="****">, and writing the location information and file name information of the audio file in the **** part, audio that exists on a network such as the Internet can be specified. The tag formats shown here are only examples, and other proprietary formats may be used. 
In any case, the information specifying the location and filename of the non-natural language source file should be stored in the instruction statement. 【0401】 As in the first method, when information specifying the location and filename of a non-natural language information source file is stored in the instruction statement, the instruction statement itself does not need to store the data of the non-natural language information source file. Therefore, the amount of data in the instruction statement can be reduced. In the first method, the large-scale language model server 20001 that receives an instruction statement specifying non-natural language information source data can use the location and filename information of the non-natural language information source file stored in the instruction statement to obtain the non-natural language information source file located on a server or other location connected to a network such as the Internet. 【0402】 Here, we will explain how location information and file name information are input when the character conversation device (spatial floating image display device 1000) specifies non-natural language information source data in an instruction sentence using the first method. In Figure 20C, we have explained that in this embodiment, the types of actions that the user 230 can perform on the character conversation device (spatial floating image display device 1000) have increased compared to Embodiment 4, in addition to the user 230's voice. Therefore, for example, the user 230 may input location information such as a URL for specifying non-natural language information source data, file name information, etc., through user operation (e.g., mouse, keyboard, touch panel) via the operation input unit 1107 in Figure 3. 
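The first method can be sketched as follows, assuming the markup-language tag format given as an example in paragraph 【0400】. The function names and the example URL are hypothetical and serve only as an illustration; the embodiment permits other formats.

```python
from html import escape

# Mapping from media kind to the example tag names of paragraph [0400].
_TAGS = {"image": "img", "video": "video", "audio": "audio"}


def reference_tag(media, url):
    """Return a tag that specifies a non-natural language source by location.

    Only the location and file name (here, a URL) are stored, so the
    instruction text stays small regardless of the size of the file.
    """
    tag = _TAGS[media]  # raises KeyError for an unsupported media kind
    # Escape the URL so that characters such as '&' or '"' cannot break the tag.
    return f'<{tag} src="{escape(url, quote=True)}">'


def instruction_with_reference(text, media, url):
    """Append the reference tag to the natural language main message."""
    return f"{text} {reference_tag(media, url)}"


prompt = instruction_with_reference(
    "Is it safe to swim here?", "image", "https://example.com/media/pool.jpg"
)
# prompt == 'Is it safe to swim here? <img src="https://example.com/media/pool.jpg">'
```

With this approach, the server that receives the instruction is responsible for fetching the file from the stated location, so the location must be reachable from the server.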
【0403】 Furthermore, in the character conversation device (space-floating image display device 1000), the control unit 1110 may work in cooperation with the memory 1109 to execute a web browser program and display the GUI of the web browser program on the display screen of the character conversation device (space-floating image display device 1000). User operations on the GUI of the web browser program may be received via the operation input unit 1107 (for example, mouse, keyboard, touch panel) or by the user's aerial operations detectable by the aerial operation detection sensor 1351 and the aerial operation detection unit 1350, and non-natural language information source data such as images, videos, and audio selected on the browser screen of the web browser program may be used as the data to be specified in the instruction statement. In this case, the web browser program should acquire the location information and file name information of the non-natural language information source data and pass it to the character operation program. 【0404】 Alternatively, user 230 may operate the mobile information processing terminal 20010 to communicate with the character conversation device (spatial floating image display device 1000) and input location information such as a URL for specifying non-natural language information source data into the character conversation device (spatial floating image display device 1000). Alternatively, as explained in Figure 20C, location information such as a URL for specifying non-natural language information source data, file name information, etc., may be input by displaying an information-storing image such as a two-dimensional code on the display panel 20011 of the mobile information processing terminal 20010, performing image recognition processing on the image captured by the imaging unit 1180 of the character conversation device (spatial floating image display device 1000), and obtaining the result of the image recognition processing. 
【0405】 Furthermore, the use of the first method for transmitting or specifying non-natural language information source data in an instruction is not limited to cases where the non-natural language information source file already exists on a server or other location connected to a network such as the Internet. For example, if it is desired to include non-natural language information source data such as images, videos, and audio stored in the storage unit 1170 of the character conversation device (space-floating image display device 1000) in an instruction, the character conversation device (space-floating image display device 1000) may upload the non-natural language information source data to a second server 19002 via the Internet 19000 and include the Internet location information (so-called URL, etc.) and file name of the uploaded non-natural language information source data on the second server 19002 in the instruction. In this case, the second server 19002 functions as a so-called intermediate server. 【0406】 Similarly, if it is desired to include non-natural language information source data such as images, videos, and audio stored in the storage unit 20016 of the mobile information processing terminal 20010 in the instruction text, the mobile information processing terminal 20010 may upload the non-natural language information source data to the second server 19002 via the internet 19000. The mobile information processing terminal 20010 or the second server 19002 may transmit the internet location information (so-called URL, etc.) and file name of the non-natural language information source data on the second server 19002 to the character conversation device (space-floating image display device 1000), and the character operation program of the character conversation device (space-floating image display device 1000) may include the acquired internet location information (so-called URL, etc.) 
and file name of the non-natural language information source data uploaded to the second server 19002 in the instruction text. 【0407】 Furthermore, the character operation program of the character conversation device (space-floating image display device 1000) may work in cooperation with the memory 1109 and the storage unit 1170 to construct a media server within the character conversation device (space-floating image display device 1000) that can be accessed from other servers via the internet 19000. In this case, when the character conversation device (space-floating image display device 1000) specifies non-natural language information source data in an instruction statement using the first method, it may store in the instruction statement location information on the internet (such as a URL) indicating the media server constructed within the character conversation device (space-floating image display device 1000) itself, and the file name of the corresponding non-natural language information source data. 【0408】 Next, a second method for specifying the transmission or designation of non-natural language information source data in an instruction is, for example, simply to store (attach) the non-natural language information source data itself in the instruction (prompt) and send it. Generally, non-natural language information source data such as images, videos, and audio are larger in data size than natural language text information. Therefore, in this case, the data size of the instruction (prompt) itself will be larger than in the first method. 
The character operation program of the character conversation device (spatial floating image display device 1000) can store the non-natural language information source data that it wants to store (attach) in the instruction (prompt) in memory 1109, and when sending the instruction (prompt), it can store (attach) the data in the instruction (prompt) via the communication unit 1132 and output it to the large-scale language model server 20001. The non-natural language information source data that the character operation program of the character conversation device (space-floating image display device 1000) stores in memory 1109 may be acquired by the communication unit 1132 via the internet 19000, acquired by the communication unit 1132 from the mobile information processing terminal 20010, or read from the storage unit 1170 and stored in memory 1109. 【0409】 As described above, the character conversation device (spatial floating image display device 1000) can transmit or specify non-natural language information source data using instructional text. 【0410】 The large-scale language model server 20001 is a multimodal large-scale language model that can process non-natural language information sources together with natural language text information. As shown in the example in Figure 20D, through the first round of user instructions, it can acquire images of a swimming pool and poolside, which are non-natural language information sources 20061, and natural language text information. As a result of this inference, it can output natural language text information as shown in the figure, in response to the first round of user instructions. 
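The second method described in paragraph 【0408】, in which the data itself is stored (attached) in the instruction, can be sketched as follows. This is only an illustration under stated assumptions: Base64 encoding inside a JSON body is one common way to carry binary data in text, and the field names "text", "attachment", "mime_type", "encoding", and "data" are hypothetical. The embodiment does not prescribe an encoding.

```python
import base64
import json


def attach_inline(text, data, mime_type):
    """Store the raw bytes of a non-natural language source in the instruction."""
    encoded = base64.b64encode(data).decode("ascii")
    return json.dumps(
        {
            "text": text,
            "attachment": {
                "mime_type": mime_type,
                "encoding": "base64",
                "data": encoded,
            },
        }
    )


def extract_inline(instruction):
    """Recover the attached bytes on the receiving side."""
    attachment = json.loads(instruction)["attachment"]
    return base64.b64decode(attachment["data"])


# Stand-in for image data held in memory before transmission.
fake_image = bytes(range(256)) * 4  # 1024 bytes
instruction = attach_inline("What is in this picture?", fake_image, "image/png")

# The attachment survives the round trip, and the instruction is larger than
# the raw data because Base64 expands every 3 bytes into 4 characters.
assert extract_inline(instruction) == fake_image
assert len(instruction) > len(fake_image)
```

The size comparison at the end reflects the trade-off noted in the text: the second method needs no separately hosted file, at the cost of a larger instruction than in the first method.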
【0411】 Furthermore, since the large-scale language model server 20001 has a multimodal large-scale language model that can process non-natural language information sources together with natural language text information, as shown in the example in Figure 20D, in the response to the second round of user instructions, the large-scale language model server 20001 can include the non-natural language information source 20062 generated by the inference of the multimodal large-scale language model in its response and transmit it to the character conversation device (spatial floating image display device 1000). In Figure 20D, the non-natural language information source 20062 is an example of an image in which a circle is added to an image of a swimming pool and poolside, which is the non-natural language information source 20061. Note that the non-natural language information source 20062 stored in the response is not limited to the image shown in Figure 20D, but may also be a video or audio. 【0412】 When the response from the large-scale language model server 20001 includes non-natural language information sources other than natural language text information, a method similar to the first method or the second method used by the character conversation device (spatial floating image display device 1000) to transmit or specify non-natural language information source data in the instruction statement can be used. 【0413】 Specifically, in a method similar to the first method described above, the large-scale language model server 20001 may store in its response, as is done in the instruction statement, information specifying the location and file name of the non-natural language information source file. The non-natural language information source 20062 itself, such as images, videos, and audio, may be kept by the large-scale language model server 20001, or it may be transferred to and kept by the second server 19002, which functions as an intermediate server. 
In either case, the large-scale language model server 20001 may store, in its response, information specifying the location and file name of the non-natural language information source file. The character conversation device (spatial floating image display device 1000), having received the response, may use the location and file name information of the non-natural language information source file described in the response to access the large-scale language model server 20001 or the second server 19002 and obtain the non-natural language information source 20062. 【0414】 Furthermore, specifically, as a method similar to the second method described above, the large-scale language model server 20001 may store (attach) the file data of the non-natural language information source 20062 itself in the response and send it to the character conversation device (spatial floating image display device 1000). The character conversation device (spatial floating image display device 1000) can acquire the data of the non-natural language information source 20062 stored (attached) in the response and use it for various outputs to the user 230. 【0415】 As described above using Figure 20D, the operation of the character conversation device (spatial floating image display device 1000) and the character conversation system of Embodiment 5 enables instruction statements and responses to be transmitted and received as the basis of the conversation between the character displayed on the character conversation device (spatial floating image display device 1000) and the user 230, enabling conversation that uses non-natural language information such as images, videos, and audio. This makes it possible to achieve more advanced and natural conversations, as shown in the messages in Figure 20D. 【0416】 Next, using Figure 20E, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 5 of the present invention will be described. 
This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 20001. Specifically, Figure 20E shows an example of the main messages of the instruction statements sent from the spatial floating image display device 1000 to the large-scale language model server 20001, and the main messages of the server's responses to them, which form the basis of the conversation between the character 19051 displayed on the spatial floating image display device 1000 and the user 230. 【0417】 Figure 20E shows an example of a new conversation that takes place after the series of conversations shown in Figure 20D has ended, when the user 230 speaks to the character 19051 again. In the example in Figure 20E, the processing using the conversation history that was explained with Figures 19F, 19G, and 19I of Embodiment 4 is not performed. Therefore, Figure 20E, like Figure 19E of Embodiment 4, shows a response in which the name of the large-scale language model itself, the role to be played, the characteristics of the conversation, the user's name, and the conversation history, which were included in the setting instruction statement, are not remembered at all. 【0418】 Next, using Figure 20F, an example of the operation of the character conversation device (spatial floating image display device 1000) of Embodiment 5 of the present invention will be described. This can also be described as an example of the operation of a character conversation system including the spatial floating image display device 1000 and the large-scale language model server 20001. 
Specifically, Figure 20F shows an example of the main messages of the instruction statements sent from the spatial floating image display device 1000 to the large-scale language model server 20001, and the main messages of the server's responses to them, which form the basis of the conversation between the character 19051 displayed on the spatial floating image display device 1000 and the user 230. 【0419】 Figure 20F shows an example of a new conversation that takes place after the series of conversations shown in Figure 20D has ended, when the user 230 speaks to the character 19051 again. In Figure 20F, the method of storing a message explaining the history of past conversations in the setting instruction statement, as explained with Figure 19F of Embodiment 4, is also applied to the character conversation device (spatial floating image display device 1000) of Embodiment 5. Specifically, the message that constitutes the content of the setting instruction statement in Figure 20D is stored as a reset message in Figure 20F, and following the reset message, a message explaining the history of past conversations is stored as a conversation history message. 【0420】 The large-scale language model server 20001 in Embodiment 5 has a multimodal large-scale language model that can process non-natural language information sources together with natural language text information. Therefore, non-natural language information source data may have been transmitted or specified in past instructions and responses. Accordingly, in the example in Figure 20F, the conversation history message reflects not only the natural language text information in past instructions and responses, but also the transmission or specification of non-natural language information source data in past instructions and responses. The specific method of transmitting or specifying non-natural language information source data in the instructions in Figure 20F is the same as the transmission or specif...
Claims
[Claim 1] A character display device that enables conversation with a character, comprising a display unit capable of displaying the character, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the character displayed on the display unit. The character display device further comprises a storage unit. The instruction sentence includes a setting instruction statement containing natural language text information, and the setting instruction statement can specify, by means of the natural language text information, at least one setting among the role that the character should play, the name, the conversational characteristics, or the personality. The storage unit stores and maintains in a database, in natural language, the conversation history between the character and the user, which is generated based on the natural language text information communicated between the character display device and the server via the communication unit. Character display device. 
[Claim 2] A character display device that enables conversation with a character, comprising a display unit capable of displaying the character, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the character displayed on the display unit. The character display device further comprises a storage unit. The instruction sentence includes a setting instruction statement containing natural language text information, and the setting instruction statement can specify, by means of the natural language text information, at least one setting among the role that the character should play, the name, the conversational characteristics, or the personality. The storage unit stores and maintains in a database, in natural language, the conversation history between the character and the user, which is generated based on the natural language text information communicated between the character display device and the server via the communication unit. After a series of conversations in which natural language text information is communicated between the character display device and the server via the communication unit has concluded, a new conversation is initiated by sending a setting instruction statement containing natural language text information to the server again via the communication unit. In this case, natural language text information based on the conversation history stored in the database is included in the setting instruction statement sent to the server. Character display device. 
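For illustration only, the idea of starting a new conversation by re-sending a setting instruction statement that carries the stored conversation history can be sketched as below. This is a minimal sketch under stated assumptions: the in-memory dictionary standing in for the database, the helper names `record_turn` and `build_setting_instruction`, and the wording of the generated text are all hypothetical and are not taken from the claims or the figures.

```python
def record_turn(db, speaker, text):
    """Append one utterance to the conversation history kept in natural language."""
    db["history"].append((speaker, text))


def build_setting_instruction(settings, history):
    """Compose a setting instruction statement as plain natural-language text.

    `settings` holds the character settings (name, role, conversation style).
    `history` is a list of (speaker, text) pairs read from the database.
    """
    lines = [
        f"Your name is {settings['name']}.",
        f"Your role is {settings['role']}.",
        f"Speak in this style: {settings['style']}.",
    ]
    if history:
        # Past conversations are passed along as ordinary text, so that a
        # new conversation can continue from what was said before.
        lines.append("This is the history of your past conversations with the user:")
        lines.extend(f"{speaker}: {text}" for speaker, text in history)
    return "\n".join(lines)
```

Because the history is plain text, the same statement can be sent to any server that accepts natural language instructions, without relying on the server to remember earlier sessions.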
[Claim 3] A character display device that enables conversation with a character, comprising a display unit capable of displaying the character, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the character displayed on the display unit. The character display device further comprises a storage unit. The instruction sentence includes a setting instruction statement containing natural language text information, and the setting instruction statement can specify, by means of the natural language text information, at least one setting among the role that the character should play, the name, the conversational characteristics, or the personality. The storage unit stores and maintains in a database, in natural language, the conversation history between the character and the user, which is generated based on the natural language text information communicated between the character display device and the server via the communication unit. After a series of conversations in which natural language text information is communicated between the character display device and the server via the communication unit has concluded, a new conversation is initiated by sending a setting instruction statement containing natural language text information to the server again via the communication unit. In this case, natural language text information based on the conversation history stored in the database is included in the setting instruction statement sent to the server. 
The database in the storage unit stores natural language text information resulting from text summarization processing of the conversation history, thereby reducing the number of tokens in the natural language text information based on the conversation history included in the setting instruction statement sent to the server. Character display device. [Claim 4] A character display device according to claim 3, in which the text summarization processing is performed on a server different from the aforementioned server, or by a text summarization program stored in the memory of the character display device. Character display device. [Claim 5] A character display device according to claim 3, in which, by deleting older conversation history entries from the database of the storage unit, the number of tokens in the natural language text information based on the conversation history included in the setting instruction statement sent to the server is reduced. Character display device. [Claim 6] A character display device that enables conversation with a character, comprising a display unit capable of displaying the character, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the character displayed on the display unit. 
The character display device further comprises a storage unit. The instruction sentence includes a setting instruction statement containing natural language text information, and the setting instruction statement can specify, by means of the natural language text information, at least one setting among the role that the character should play, the name, the conversational characteristics, or the personality. The storage unit stores and maintains in a database, in natural language, the conversation history between the character and the user, which is generated based on the natural language text information communicated between the character display device and the server via the communication unit. The control unit is capable of controlling the display unit to switch the character displayed on the display unit from the currently displayed character to a different character selected from a plurality of characters. The database held by the storage unit records, in association with each other for each of the plurality of characters, a setting instruction statement containing natural language text information that specifies at least one setting among the role that the character should play, the name, the conversational characteristics, or the personality, and a conversation history between the character and the user generated based on the natural language text information communicated between the character display device and the server via the communication unit. Character display device. [Claim 7] A character display device according to claim 6, in which, after a series of conversations for a predetermined character in which natural language text information is communicated between the character display device and the server via the communication unit has concluded, a new conversation for the predetermined character is initiated by sending a setting instruction statement containing natural language text information to the server again via the communication unit. 
At this time, the conversation history associated with the predetermined character is selected from the conversation histories associated with each of the plurality of characters stored in the database, and natural language text information based on that conversation history is included in the setting instruction statement sent to the server. Character display device. [Claim 8] A character display device according to claim 7, in which, when the control unit performs control to switch to a character other than the one displayed on the display unit, there is a state in which, even after the character displayed on the display unit has been switched to the image of the other character and the synthesized voice output from the speaker has been set to the synthesized voice corresponding to the other character, the setting instruction statement that includes natural language text information based on the conversation history stored in the database in association with the other character has not been sent to the server, thereby reducing the frequency of sending the setting instruction statement to the server. Character display device. [Claim 9] A character display device according to claim 6, in which the display screen of the display unit is a display screen formed by a spatial floating image. Character display device. [Claim 10] A character display device that enables conversation with a character, comprising a display unit capable of displaying the character, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. 
The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the character displayed on the display unit. The instruction sentence sent to the server by the communication unit includes, in addition to the natural language text information, information that specifies data of a non-natural language information source. The character display device further comprises a storage unit. The storage unit stores and maintains in a database the conversation history between the character and the user, which is generated based on the natural language text information and on information regarding the data of the non-natural language information source communicated together with the natural language text information via the communication unit. Character display device. [Claim 11] A character display device according to claim 10, in which the information regarding the data of the non-natural language information source included in the conversation history stored in the database includes information specifying the data of the non-natural language information source, or the data of the non-natural language information source itself. Character display device. [Claim 12] A character display device according to claim 11, in which, after a series of conversations in which natural language text information and information regarding the data of the non-natural language information source are communicated between the character display device and the server via the communication unit has concluded, a new conversation is initiated by sending a setting instruction statement containing natural language text information to the server again via the communication unit, and the setting instruction statement sent to the server includes natural language text information and information regarding the data of the non-natural language information source based on the conversation history stored in the database. Character display device. 
[Claim 13] A character display device according to claim 12, in which the data of the non-natural language information source is image data, video data, or audio data. Character display device. [Claim 14] A character display device according to claim 13, in which the server is a server having a multimodal large-scale language model capable of performing inference that includes information from non-natural language information sources in addition to natural language text information. Character display device. [Claim 15] A character display device that enables conversation with a character, comprising a display unit capable of displaying the character, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the character displayed on the display unit. The instruction sentence sent to the server by the communication unit includes, in addition to the natural language text information, information that specifies data of a non-natural language information source. The character display device further comprises a storage unit. The storage unit stores and maintains in a database the conversation history between the character and the user, which is generated based on the natural language text information and on information regarding the data of the non-natural language information source communicated together with the natural language text information via the communication unit. 
The conversation history in the database includes location information indicating the location of the data of the non-natural language information source. The location information stored in the database is obtained by rewriting other location information, which indicated the location of the data of the non-natural language information source before being stored as the conversation history, into location information that indicates a location within the character display device. Character display device. [Claim 16] An AI assistant display device, comprising a display unit capable of displaying an AI assistant, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the AI assistant displayed on the display unit. The AI assistant display device further comprises a storage unit. The instruction sentence includes a setting instruction statement containing natural language text information, and the setting instruction statement can specify, by means of the natural language text information, at least one setting among the role that the AI assistant should play, the name, the conversational characteristics, or the personality. 
The storage unit stores and maintains in a database, in natural language, the conversation history between the AI assistant and the user, which is generated based on the natural language text information communicated between the AI assistant display device and the server via the communication unit. AI assistant display device. [Claim 17] An AI assistant display device, comprising a display unit capable of displaying an AI assistant, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the AI assistant displayed on the display unit. The AI assistant display device further comprises a storage unit. The instruction sentence includes a setting instruction statement containing natural language text information, and the setting instruction statement can specify, by means of the natural language text information, at least one setting among the role that the AI assistant should play, the name, the conversational characteristics, or the personality. The storage unit stores and maintains in a database, in natural language, the conversation history between the AI assistant and the user, which is generated based on the natural language text information communicated between the AI assistant display device and the server via the communication unit. 
After a series of conversations in which natural language text information is communicated between the AI assistant display device and the server via the communication unit has concluded, a new conversation is initiated by sending a setting instruction statement containing natural language text information to the server again via the communication unit. In this case, natural language text information based on the conversation history stored in the database is included in the setting instruction statement sent to the server. AI assistant display device. [Claim 18] An AI assistant display device, comprising a display unit capable of displaying an AI assistant, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the AI assistant displayed on the display unit. The AI assistant display device further comprises a storage unit. The instruction sentence includes a setting instruction statement containing natural language text information, and the setting instruction statement can specify, by means of the natural language text information, at least one setting among the role that the AI assistant should play, the name, the conversational characteristics, or the personality. 
The storage unit stores and maintains in a database, in natural language, the conversation history between the AI assistant and the user, which is generated based on the natural language text information communicated between the AI assistant display device and the server via the communication unit. After a series of conversations in which natural language text information is communicated between the AI assistant display device and the server via the communication unit has concluded, a new conversation is initiated by sending a setting instruction statement containing natural language text information to the server again via the communication unit. In this case, natural language text information based on the conversation history stored in the database is included in the setting instruction statement sent to the server. The database in the storage unit stores natural language text information resulting from text summarization processing of the conversation history, thereby reducing the number of tokens in the natural language text information based on the conversation history included in the setting instruction statement sent to the server. AI assistant display device. [Claim 19] An AI assistant display device according to claim 18, in which the text summarization processing is performed on a server different from the aforementioned server, or by a text summarization program stored in the memory of the AI assistant display device. AI assistant display device. [Claim 20] An AI assistant display device according to claim 18, in which, by deleting older conversation history entries from the database of the storage unit, the number of tokens in the natural language text information based on the conversation history included in the setting instruction statement sent to the server is reduced. AI assistant display device. 
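For illustration only, reducing the token count by deleting older conversation history entries can be sketched as follows. This is a minimal sketch under stated assumptions: `count_tokens` is a crude whitespace-based stand-in (real servers count tokens with model-specific tokenizers), and the names `prune_history` and `max_tokens` are hypothetical.

```python
def count_tokens(text):
    """Rough token estimate: one token per whitespace-separated word.

    This is an assumption for the sketch; actual token counts depend on
    the tokenizer of the language model in use.
    """
    return len(text.split())


def prune_history(history, max_tokens):
    """Keep the newest entries whose combined size fits within max_tokens.

    `history` is a list of text entries ordered from oldest to newest.
    Older entries are dropped first; the order of kept entries is preserved.
    """
    kept = []
    total = 0
    for entry in reversed(history):  # walk from newest to oldest
        cost = count_tokens(entry)
        if total + cost > max_tokens:
            break  # this entry and everything older is deleted
        kept.append(entry)
        total += cost
    kept.reverse()  # restore oldest-to-newest order
    return kept
```

Summarization keeps more of the older context at the cost of an extra processing step, whereas deleting older entries is simpler but loses that context entirely.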
[Claim 21] An AI assistant display device, comprising a display unit capable of displaying an AI assistant, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the AI assistant displayed on the display unit. The AI assistant display device further comprises a storage unit. The instruction sentence includes a setting instruction statement containing natural language text information, and the setting instruction statement can specify, by means of the natural language text information, at least one setting among the role that the AI assistant should play, the name, the conversational characteristics, or the personality. The storage unit stores and maintains in a database, in natural language, the conversation history between the AI assistant and the user, which is generated based on the natural language text information communicated between the AI assistant display device and the server via the communication unit. The control unit is capable of switching the AI assistant displayed on the display unit from the currently displayed AI assistant to a different AI assistant selected from a plurality of AI assistants. 
The database held by the storage unit records, for each of the multiple AI assistants, a setting instruction statement containing natural language text information that specifies at least one setting from the roles, names, conversational characteristics, or personalities that the AI assistant should play, and a conversation history between the AI assistant and the user generated based on natural language text information communicated between the AI assistant display device and the server via the communication unit, in correspondence. AI assistant display device. [Claim 22] An AI assistant display device according to claim 21, After a series of conversations has been completed between the AI assistant display device and the server via the communication unit for a predetermined AI assistant, when a new conversation is initiated for the predetermined AI assistant by sending a setting instruction statement containing natural language text information to the server via the communication unit, the system selects the conversation history associated with the predetermined AI assistant from among the conversation histories associated with each of the multiple AI assistants stored in the database, and includes the natural language text information based on that conversation history in the setting instruction statement sent to the server. AI assistant display device. 
[Claim 23] An AI assistant display device according to claim 22, in which, when the control unit performs control to switch to an AI assistant other than the one displayed on the display unit, there is a state in which, even after the AI assistant displayed on the display unit has been switched to the image of the other AI assistant and the synthesized voice output from the speaker has been set to the synthesized voice corresponding to the other AI assistant, the setting instruction statement that includes natural language text information based on the conversation history stored in the database in association with the other AI assistant has not been sent to the server, thereby reducing the frequency of sending the setting instruction statement to the server. AI assistant display device. [Claim 24] An AI assistant display device according to claim 21, in which the display screen of the display unit is a display screen formed by a spatial floating image. AI assistant display device. [Claim 25] An AI assistant display device, comprising a display unit capable of displaying an AI assistant, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference on a large-scale language model, which is an artificial intelligence, and sends an instruction sentence containing natural language text information to the server and receives a response containing natural language text information from the server. The speaker outputs natural language speech using synthesized speech based on the text information contained in the response, so that it sounds to the user as the voice of the AI assistant displayed on the display unit. The instruction sentence sent to the server by the communication unit includes, in addition to the natural language text information, information that specifies data of a non-natural language information source. 
The device further comprises a storage unit, and the storage unit stores and maintains in a database a conversation history between the AI assistant and the user, which is generated based on the natural language text information and the information regarding the data of the non-natural language information source communicated together with the natural language text information via the communication unit.
[Claim 26] The AI assistant display device according to claim 25, wherein the information regarding the data of the non-natural language information source included in the conversation history stored in the database includes information specifying the data of the non-natural language information source or the data of the non-natural language information source itself.
[Claim 27] The AI assistant display device according to claim 26, wherein, after a series of conversations in which the AI assistant display device and the server communicate natural language text information and information regarding the data of the non-natural language information source via the communication unit has concluded, when a new conversation is initiated by again sending a setting instruction statement containing natural language text information to the server via the communication unit, the setting instruction statement sent to the server includes natural language text information and information regarding the data of the non-natural language information source, both based on the conversation history stored in the database.
[Claim 28] The AI assistant display device according to claim 27, wherein the data of the non-natural language information source is image data, video data, or audio data.
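As an illustration of claims 25 to 28, the following Python sketch shows one way a conversation history could carry references to image, video, or audio data alongside the text, and how both could be folded into a later setting instruction. It is a sketch under assumptions, not the claimed implementation: `MediaRef`, `Turn`, `build_setting_instruction`, and the dictionary payload layout are hypothetical names and shapes chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional

ALLOWED_KINDS = {"image", "video", "audio"}  # data types named in claim 28


@dataclass
class MediaRef:
    """Information regarding non-natural language data: a locator or the data itself."""
    kind: str
    locator: Optional[str] = None  # information specifying the data
    data: Optional[bytes] = None   # or the data itself

    def __post_init__(self) -> None:
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported media kind: {self.kind}")
        if self.locator is None and self.data is None:
            raise ValueError("either locator or data must be provided")


@dataclass
class Turn:
    """One history entry: natural language text plus any accompanying media."""
    speaker: str
    text: str
    media: list[MediaRef] = field(default_factory=list)


def build_setting_instruction(base: str, history: list[Turn]) -> dict:
    """Combine the base instruction with text and media taken from the history."""
    text_lines = [f"{turn.speaker}: {turn.text}" for turn in history]
    media = [ref for turn in history for ref in turn.media]
    return {"text": base + "\n" + "\n".join(text_lines), "media": media}


# Usage: a stored conversation in which the user showed a photo.
history = [
    Turn("user", "What is in this photo?",
         [MediaRef("image", locator="photos/0001.jpg")]),
    Turn("assistant", "A red bicycle."),
]
payload = build_setting_instruction("You are Aoi, a cheerful guide.", history)
print(payload["text"])
print(payload["media"])
```

Storing a locator rather than the raw bytes keeps the history small, while storing the data itself avoids depending on an external location; claim 26 allows either form.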
[Claim 29] The AI assistant display device according to claim 28, wherein the server has a multimodal large-scale language model capable of performing inference on information from non-natural language information sources in addition to natural language text information.
[Claim 30] An AI assistant display device comprising a display unit capable of displaying an AI assistant, a speaker, a microphone, a communication unit, and a control unit. The communication unit is capable of communicating with a server that can perform inference with a large-scale language model, which is an artificial intelligence, sends an instruction sentence containing natural language text information to the server, and receives a response containing natural language text information from the server. The speaker outputs synthesized natural language speech, based on the text information contained in the response, that is heard by the user as the voice of the AI assistant displayed on the display unit. The instruction sentence sent by the communication unit to the server includes, in addition to the natural language text information, information specifying data of a non-natural language information source. The device further comprises a storage unit, and the storage unit stores and maintains in a database a conversation history between the AI assistant and the user, which is generated based on the natural language text information and the information regarding the data of the non-natural language information source communicated together with the natural language text information via the communication unit.
The conversation history in the database includes location information indicating the location of the data of the non-natural language information source, and the location information stored in the database is obtained by rewriting other location information, which indicated the location of the data of the non-natural language information source before the conversation history was stored, into location information indicating a location within the AI assistant display device.
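The location rewriting in claim 30 can be sketched in Python as follows. This is a hypothetical illustration under stated assumptions: the claim only requires that the stored location point inside the device, so copying the data into a local directory, the helper names `localize` and `store_entry`, the `media_location` key, and the injected `fetch` callable are all assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path, PurePosixPath
from typing import Callable


def localize(location: str, fetch: Callable[[str], bytes], store_dir: Path) -> str:
    """Copy the data at `location` into store_dir and return the new local location."""
    data = fetch(location)
    # Keep the original file extension, ignoring any query string.
    suffix = PurePosixPath(location.split("?", 1)[0]).suffix
    name = hashlib.sha256(data).hexdigest()[:16] + suffix
    store_dir.mkdir(parents=True, exist_ok=True)
    local_path = store_dir / name
    local_path.write_bytes(data)
    return str(local_path)


def store_entry(entry: dict, fetch: Callable[[str], bytes], store_dir: Path) -> dict:
    """Return a copy of the history entry whose location points inside the device."""
    stored = dict(entry)
    stored["media_location"] = localize(entry["media_location"], fetch, store_dir)
    return stored


def fake_fetch(location: str) -> bytes:
    """Stand-in for a real download; returns fixed placeholder bytes."""
    return b"placeholder image bytes"


# Usage: the stored entry refers to local storage, not the original location.
with tempfile.TemporaryDirectory() as tmp:
    entry = {"text": "Look at this picture.",
             "media_location": "https://example.invalid/cat.png"}
    stored = store_entry(entry, fake_fetch, Path(tmp))
    print(stored["media_location"])
```

Rewriting the location at storage time means a later conversation can still refer to the data even if the original location is no longer reachable.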