Information processing apparatus, information processing method, and program

JP2025004939A5 · Pending · Publication Date: 2026-06-25 · CANON KK

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: CANON KK
Filing Date: 2023-06-27
Publication Date: 2026-06-25

AI Technical Summary

Technical Problem

Existing AR technologies require users to maintain a trimming pose through hand gestures for framing, leading to potential fatigue and hand shadows in the image, which can disrupt the intended processing.

Method used

An information processing device that recognizes a specific hand gesture, displays a framing frame based on it, and continues to show the frame for a predetermined time, allowing users to provide additional instructions or adjustments without maintaining the pose.

Benefits of technology

Enables the user to execute intended processing without fatigue and avoids hand shadows, ensuring the captured image aligns with their composition intentions.

✦ Generated by Eureka AI based on patent content.

Abstract

PROBLEM TO BE SOLVED: To allow execution of processing intended by a user with a specific hand gesture. SOLUTION: An information processing apparatus allows a user to visually recognize a real space through display means, and has: recognition means that recognizes a specific hand gesture made by a user; display control means that (1) displays a first frame according to the specific hand gesture on the display means, and (2) even when the recognition means changes to a state where it does not recognize the specific hand gesture after the first frame is displayed, displays the first frame continuously for at least a predetermined time duration after the specific hand gesture is recognized; and processing means that, with the first frame being displayed on the display means, executes specific processing on a range of the real space indicated by the first frame. SELECTED DRAWING: Figure 3

Description

[Technical field]

[0001] The present invention relates to an information processing device, an information processing method, and a program.

[Background technology]

[0002] In recent years, technologies such as Augmented Reality (AR) and Mixed Reality (MR) have emerged. These technologies use devices such as Head Mounted Displays (HMD), smartphones equipped with cameras, and tablet terminals. By using the cameras of these devices, it is possible to generate images in which virtual objects are superimposed on real space.

[0003] In Patent Document 1, a user wearing an HMD makes a hand gesture of a trimming pose in front of the user, and a shooting frame determined by the hand gesture is displayed in the AR space seen by the user. Then, a camera mounted on the HMD captures an image within the range indicated by the shooting frame. This hand gesture approach allows the angle of view to be determined by an intuitive operation, so that the user can easily perform shooting.

[Prior art documents]

[Patent documents]

[0004] [Patent Document 1] Patent Publication No. 2022-6502

[Summary of the invention]

[Problem to be solved by the invention]

[0005] However, in the technology of Patent Document 1, the user needs to be holding the trimming pose by hand gesture at the timing of shooting. Therefore, the user's arm may become tired before the timing of shooting arrives, and the position of the shooting frame may shift due to the fatigue. In addition, when the user takes a trimming pose in an AR space, the shadow of the user's hand may be cast on the subject. For these reasons, the processing intended by the user may not be executed.

[0006] Therefore, an object of the present invention is to enable a user to realize an intended process by using a specific hand gesture.

[Means for solving the problem]

[0007] One aspect of the present invention is an information processing device capable of allowing a user to visually recognize a real space via a display means, the information processing device comprising: recognition means for recognizing a specific hand gesture by the user; display control means for 1) displaying a first frame corresponding to the specific hand gesture on the display means, and 2) continuing to display the first frame for at least a predetermined time after the specific hand gesture is recognized, even if the recognition means changes to a state in which it does not recognize the specific hand gesture after the first frame is displayed; and processing means for executing a specific process targeting a range of the real space indicated by the first frame in a state in which the first frame is displayed on the display means.

[0008] Another aspect of the present invention is an information processing method for allowing a user to visually recognize a real space via a display means, the information processing method comprising: a recognition step of recognizing a specific hand gesture by the user; a display control step of 1) displaying a first frame corresponding to the specific hand gesture on the display means, and 2) continuing to display the first frame for at least a predetermined time after the specific hand gesture is recognized, even if the specific hand gesture changes to an unrecognized state in the recognition step after the first frame is displayed; and a processing step of executing a specific process targeting a range of the real space indicated by the first frame in a state in which the first frame is displayed on the display means.

[Effect of the invention]

[0009] According to the present invention, a process intended by a user can be realized by a specific hand gesture.

[Brief description of the drawings]

[0010]
[Figure 1] An external view of an HMD according to the first embodiment.
[Figure 2] A hardware configuration diagram of the HMD according to the first embodiment.
[Figure 3] A flowchart of a shooting control process according to the first embodiment.
[Figure 4] A diagram for explaining a photographing frame according to the first embodiment.
[Figure 5] A diagram showing the relationship between a photographing frame and a user according to the first embodiment.
[Figure 6] A diagram showing the relationship between a photographing frame and a user according to the first embodiment.
[Figure 7] A flowchart of a shooting control process according to the second embodiment.
[Figure 8] A diagram for explaining a photographing frame according to the second embodiment.
[Figure 9] A hardware configuration diagram of an HMD according to the third embodiment.
[Figure 10] A flowchart of a shooting control process according to the third embodiment.
[Figure 11] A diagram for explaining a photographing frame according to the third embodiment.
[Figure 12] A diagram showing the relationship between a photographing frame and a user according to the third embodiment.
[Figure 13] A flowchart of a process for arranging a photographing frame according to the third embodiment.

[Detailed description of the preferred embodiments]

[0011] Hereinafter, an embodiment of the present invention will be described with reference to the accompanying drawings.

[0012] <Embodiment 1> Fig. 1 shows an external view of an HMD 1 which is an optical see-through display device according to embodiment 1. Fig. 2 shows a hardware configuration of the HMD 1. Elements with the same numbers in Fig. 1 and Fig. 2 are the same elements.

[0013] The HMD 1 is an information processing device (electronic device) for allowing a user to experience an AR space. As shown in FIG. 1, the HMD 1 includes a display unit 11 (display unit 11a and display unit 11b), an imaging unit 14 (imaging unit 14a and imaging unit 14b), a three-dimensional space recognition unit 17, and a frame.

[0014] The display unit 11 has an image projection unit 12 and a light guide unit 13. The image projection unit 12a and the light guide unit 13a guide light from a display element (not shown) of the display unit 11a to the right eye of a user wearing the HMD 1. Similarly, the image projection unit 12b and the light guide unit 13b guide light from a display element (not shown) of the display unit 11b to the left eye of a user wearing the HMD 1. The user can simultaneously perceive the images displayed on the display units 11a and 11b and the incident light (light from a subject) from the front of the HMD 1.

[0015] The imaging units 14a and 14b each capture (take images of) the real space around (including the front of) the HMD 1.

[0016] The three-dimensional space recognition unit 17 acquires information (distance information) on the distance to objects around (including in front of) the HMD 1, and generates three-dimensional space information. In the first embodiment, the HMD 1 is an optical see-through HMD using a transparent display. However, the HMD 1 may be a video see-through HMD using an opaque display.

[0017] The frame has a rim 15 and temples 16a and 16b. The display unit 11 (display units 11a and 11b) is joined to the lower surface of the rim 15. The temples 16a and 16b are joined to both sides of the rim 15.

[0018] As shown in FIG. 2, the HMD 1 has a calculation unit 101, an information processing unit 102, a communication unit 103, a primary storage unit 104, a secondary storage unit 105, a display unit 11, an imaging unit 14, an operation unit 108, a voice recognition unit 109, and a three-dimensional space recognition unit 17. The components of the HMD 1 exchange data with each other via a bus 111. Alternatively, the HMD 1 may have only the display unit 11 and the imaging unit 14. In that case, the HMD 1 may be controlled by an information processing device (such as a computer) having the components shown in FIG. 2 other than the display unit 11 and the imaging unit 14.

[0019] The calculation unit 101 is a control unit that controls the other components. For example, the calculation unit 101 controls the display of the display unit 11. For example, the calculation unit 101 is a CPU (Central Processing Unit).

[0020] The information processing unit 102 performs, for example, various types of calculation processing (calculation processing using images acquired by the imaging unit 14, calculation processing using various evaluation values acquired by the imaging unit 14, and calculation processing using data acquired by the three-dimensional space recognition unit 17).

[0021] The communication unit 103 communicates with external devices and the like.

[0022] The primary storage unit 104 temporarily stores data used by the calculation unit 101 and the information processing unit 102. The primary storage unit 104 is, for example, a dynamic random access memory (DRAM).

[0023] The secondary storage unit 105 stores data used by the calculation unit 101 and the information processing unit 102. The secondary storage unit 105 also stores recorded images and the like that have been processed (encoded) by the information processing unit 102. The secondary storage unit 105 is, for example, a flash memory.

[0024] The display unit 11 includes a display for the right eye (display unit 11a) and a display for the left eye (display unit 11b). Each of the two displays can display an image.

[0025] The imaging unit 14 converts an image formed by collecting light incident from a subject into digital data, thereby acquiring a captured image of the subject (real space).

[0026] The operation unit 108 is an operation member (such as a button or a dial) that accepts a user's operation. The operation unit 108 may include two or more operation members. In addition, the operation unit 108 may not be mounted on the HMD 1, but may be mounted on an external device that can communicate with the HMD 1 via the communication unit 103.

[0027] The voice recognition unit 109 acquires external voice using a microphone or the like.

[0028] The three-dimensional space recognition unit 17 acquires distance information and position coordinate information in the real space using a distance sensor or the like.

[0029] Shooting control by hand gestures according to the first embodiment will be described with reference to Fig. 3 to Fig. 5B. Fig. 4A to Fig. 5B are schematic diagrams showing the relationship between the position of the hand of the user 405 and the position of the shooting frame 402 in a space (AR space; a space where real space and virtual objects are fused) represented by the display of the display unit 11. Fig. 4A and Fig. 4B are schematic diagrams of the shooting frame 402 as seen by the user 405 wearing the HMD 1. Fig. 5A and Fig. 5B are schematic diagrams of the user 405 wearing the HMD 1 and the shooting frame 402 as seen from the side.

[0030] The process of the flowchart in Fig. 3 starts when the user 405 operates the operation unit 108 of the HMD 1 and the HMD 1 is set to the shooting mode. Each process of the flowchart in Fig. 3 is realized by the calculation unit 101 of the HMD 1 executing a program. Note that the process described as being executed by the calculation unit 101 in each step of the flowchart in Fig. 3 may be executed by the information processing unit 102.

[0031] In step S301, the calculation unit 101 controls the three-dimensional space recognition unit 17 to obtain information on the arrangement of objects in real space, information on the distance from the HMD 1 to the objects, and the like as three-dimensional space information. The calculation unit 101 stores the three-dimensional space information in the primary storage unit 104.

[0032] In step S302, the calculation unit 101 recognizes (detects) a hand gesture of the user 405 based on the three-dimensional space information. Then, as shown in Figs. 4A and 5A, when a pose (trimming pose) that represents the trimming range is made by the right hand 403 and the left hand 404, the calculation unit 101 generates a photographing frame 402 that is a "rectangle formed by a diagonal line connecting the right hand 403 and the left hand 404." Note that the trimming pose may be any pose as long as it is a pose that can specify a certain range. For example, the trimming pose may be a pose in which a circle is made by the thumb and index finger of one hand.
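The "rectangle formed by a diagonal line connecting the right hand and the left hand" can be illustrated with a minimal sketch. It assumes, purely for illustration, that each hand has already been reduced to a single 2D point in a view plane with y increasing downward; the Frame class and the frame_from_diagonal function are hypothetical names, not part of the described device, which also tracks orientation and 3D position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Axis-aligned rectangle in a 2D view plane (hypothetical representation)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def frame_from_diagonal(right_hand: tuple, left_hand: tuple) -> Frame:
    """Build the rectangle whose diagonal joins the two hand positions.

    The hands may be given in either order and at either pair of opposite
    corners; min/max normalizes them into left/top/right/bottom.
    """
    (x1, y1), (x2, y2) = right_hand, left_hand
    return Frame(left=min(x1, x2), top=min(y1, y2),
                 right=max(x1, x2), bottom=max(y1, y2))


# Right hand at the upper right, left hand at the lower left.
frame = frame_from_diagonal((4.0, 1.0), (1.0, 3.0))
print(frame)
```

Normalizing with min/max means the same frame results whichever hand is higher, which matches the idea that only the diagonal matters.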

[0033] Further, the calculation unit 101 holds information (position, orientation, and shape information) of the photographing frame 402 in the primary storage unit 104, and outputs the information of the photographing frame 402 to the display unit 11. In this way, the calculation unit 101 can display on the display unit 11 the photographing frame 402 superimposed on the real space (the AR space visually recognized by the user via the display unit 11). In other words, the calculation unit 101 can display the photographing frame 402 on the display unit 11 so that the user 405 can perceive that the photographing frame 402 is displayed at the position of the hand gesture of the trimming pose in the real space. In FIG. 4A, the photographing frame 402 is superimposed on the subject 401 existing in the real space. Note that, in the first embodiment, the information (position, orientation, and shape) of the photographing frame 402 coincides with information of a rectangle (position, orientation, and shape information in the AR space) determined by the trimming pose formed by the user 405.

[0034] Then, in a predetermined case, the calculation unit 101 confirms (completes) the arrangement of the displayed photographing frame 402. For example, the predetermined case is when a predetermined operation is performed on the operation unit 108, or when the voice recognition unit 109 acquires a predetermined voice from the user 405. The predetermined case may be when the user 405 performs a specific action by a hand gesture, or when a certain time has passed since the hand gesture stopped.

[0035] In step S303, the calculation unit 101 determines whether or not the arrangement of the photographing frame 402 is complete. If it is determined that the arrangement of the photographing frame 402 is complete, the process proceeds to step S304. If it is determined that the arrangement of the photographing frame 402 is not complete, the process proceeds to step S302.

[0036] In step S304, the calculation unit 101 keeps the shooting frame 402 displayed on the display unit 11. At this time, as shown in FIG. 4B and FIG. 5B, even if the trimming pose hand gesture ends (even after the trimming pose hand gesture has changed to an unrecognized state), the shooting frame 402 continues to be displayed.

[0037] In step S305, the calculation unit 101 judges whether or not an instruction for editing (an instruction by the user 405) has been detected. If it is judged that an instruction for editing has been detected, the process proceeds to step S306. If it is judged that an instruction for editing has not been detected, the process proceeds to step S307.

[0038] In step S306, the calculation unit 101 edits the photographing frame 402 (the position, orientation, and shape of the photographing frame 402) in response to, for example, an instruction by operating the operation unit 108 or an instruction by voice from the user 405. Alternatively, the calculation unit 101 may edit the photographing frame 402 in response to a hand gesture (motion) of the user 405 picking up the photographing frame 402 with the right hand 403 (or left hand 404) of the user 405.

[0039] In step S307, the calculation unit 101 determines whether or not an instruction regarding shooting has been detected. If it is determined that an instruction regarding shooting has been detected, the process proceeds to step S308. If it is determined that an instruction regarding shooting has not been detected, the process proceeds to step S309. In step S307, even after the hand gesture of the trimming pose is completed, the shooting frame 402 continues to be displayed. Therefore, the user 405 can freely use either hand to give an instruction regarding shooting.

[0040] In step S308, the calculation unit 101 performs processing according to an instruction regarding shooting. For example, the calculation unit 101 captures the range of real space indicated by the shooting frame 402 by the imaging unit 14 in response to an instruction by operation on the operation unit 108 or a voice from the user 405 (an instruction by a voice uttered by the user). The calculation unit 101 may capture the range of real space indicated by the shooting frame 402 by the imaging unit 14 in response to a hand gesture by the right hand 403 or the left hand 404 of the user 405 instructing shooting. In addition, the calculation unit 101 performs image processing on the captured image capturing the range of real space indicated by the shooting frame 402. Note that in steps S307 and S308, shooting may be performed in response to a specific condition being satisfied, regardless of an instruction regarding shooting. For example, the calculation unit 101 may capture the range of real space indicated by the shooting frame 402 by the imaging unit 14 in response to a certain time having elapsed since the shooting frame 402 began to be displayed.
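One possible reading of "capturing the range of real space indicated by the shooting frame" is cropping a wider captured image to the frame. The sketch below shows only that reading. It assumes the frame has already been projected into pixel coordinates of the imaging unit, and crop_to_frame is a hypothetical helper; the description does not state how the capture range is actually restricted.

```python
def crop_to_frame(image, left, top, right, bottom):
    """Return the pixels of `image` that fall inside the frame.

    `image` is a list of rows. The frame is given in pixel coordinates with
    `right` and `bottom` exclusive, and is clamped to the image bounds so a
    frame that partly leaves the field of view still yields a valid crop.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]


# A 3x4 test image whose pixel values encode (row, column) as row*10 + column.
image = [[r * 10 + c for c in range(4)] for r in range(3)]
print(crop_to_frame(image, 1, 1, 3, 5))
```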

[0041] In step S309, the calculation unit 101 determines whether or not an instruction to delete the photographing frame 402 has been detected. If it is determined that an instruction to delete the photographing frame 402 has been detected, the process proceeds to step S310. If it is determined that an instruction to delete the photographing frame 402 has not been detected, the process proceeds to step S304.

[0042] In step S310, the calculation unit 101 deletes the photographing frame 402. Specifically, the calculation unit 101 deletes the information of the photographing frame 402 from the primary storage unit 104, and hides the photographing frame 402 displayed on the display unit 11. Note that, if a predetermined time (e.g., 5 seconds) has not elapsed since the photographing frame 402 was displayed at the time when an instruction to delete the photographing frame 402 is detected, the calculation unit 101 may delete the photographing frame 402 at a timing when the predetermined time has elapsed since the photographing frame 402 was displayed. In other words, the calculation unit 101 controls the display unit 11 so that the photographing frame 402 continues to be displayed for at least a predetermined time after the photographing frame 402 is displayed.
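The minimum display time in step S310 amounts to deferring the deletion until the predetermined time has elapsed. A minimal sketch, assuming timestamps in seconds and using the 5-second example value from the text; deletion_time is a hypothetical function name.

```python
MIN_DISPLAY_SECONDS = 5.0  # the "predetermined time"; 5 s is the example value in [0042]


def deletion_time(displayed_at: float, delete_requested_at: float,
                  min_display: float = MIN_DISPLAY_SECONDS) -> float:
    """Return the time at which the frame may actually be hidden.

    If the delete instruction arrives before the minimum display time has
    elapsed, the deletion is postponed to the end of that period; otherwise
    it takes effect at the moment of the request.
    """
    earliest_allowed = displayed_at + min_display
    return max(delete_requested_at, earliest_allowed)


# Frame shown at t=100 s, delete requested at t=102 s: hidden at t=105 s.
print(deletion_time(100.0, 102.0))
```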

[0043] For example, the calculation unit 101 deletes the photographing frame 402 in response to an operation on the operation unit 108 or a voice from the user 405. When a hand gesture instructing deletion of the photographing frame 402 is made with the left hand 404, the photographing frame 402 may be deleted.

[0044] In steps S309 and S310, the photographing frame 402 may be deleted (the photographing frame 402 may be made invisible) in response to a specific condition being satisfied (at the timing when the specific condition is satisfied), regardless of an instruction to delete the photographing frame 402. The calculation unit 101 may delete the photographing frame 402 when the photographing frame 402 has been displayed for a certain period of time (for example, 10 seconds) or when the imaging unit 14 has captured one image. The calculation unit 101 may also delete the photographing frame 402 when the main subject 401, which was within the range indicated by the photographing frame 402, moves out of the range. The calculation unit 101 may delete the photographing frame 402 when the main subject 401 within the range indicated by the photographing frame 402 turns away, or when the user 405 takes his/her line of sight away from the photographing frame 402 (when the user 405 no longer looks at the photographing frame 402).
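The automatic deletion conditions listed above can be collected into a single predicate. This is a sketch under assumptions: the text presents each condition as an independent option ("may delete"), so combining them all with a logical OR is one possible configuration, and the FrameStatus fields are hypothetical inputs that subject detection, face detection, and gaze detection would supply.

```python
from dataclasses import dataclass


@dataclass
class FrameStatus:
    """Hypothetical snapshot of the state relevant to automatic deletion."""
    seconds_displayed: float
    shots_taken: int
    subject_in_frame: bool
    subject_facing_camera: bool
    user_looking_at_frame: bool


def should_auto_delete(status: FrameStatus,
                       max_display_seconds: float = 10.0) -> bool:
    """Return True if any of the example conditions in [0044] is satisfied."""
    return (
        status.seconds_displayed >= max_display_seconds  # displayed long enough
        or status.shots_taken >= 1                       # one image captured
        or not status.subject_in_frame                   # subject left the range
        or not status.subject_facing_camera              # subject turned away
        or not status.user_looking_at_frame              # user looked away
    )


status = FrameStatus(seconds_displayed=3.0, shots_taken=0,
                     subject_in_frame=True, subject_facing_camera=True,
                     user_looking_at_frame=True)
print(should_auto_delete(status))
```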

[0045] The main subject 401 can be detected by a known subject detection technique. Also, whether the main subject 401 has turned away can be detected by a known face detection technique.

[0046] The gaze of the user 405 can be detected by mounting a gaze detection unit on the HMD 1. The gaze detection unit is composed of a dichroic mirror, an imaging lens, a gaze detection sensor, an infrared light emitting diode, and the like. The gaze detection unit detects the position (viewpoint position) at which the user 405 is looking on the display unit 11. The gaze detection unit detects the viewpoint position, for example, by a method called the corneal reflection method. The corneal reflection method detects the gaze direction and viewpoint position based on the "positional relationship between the pupil of the eyeball and the light that is emitted from an infrared light emitting diode and reflected by the eyeball (particularly the cornea)." In addition to this, there are various methods for detecting the gaze direction and viewpoint position, such as a method called the scleral reflection method, which utilizes the difference in light reflectance between the dark part and the white part of the eye. Note that gaze detection methods other than those described above may be used as long as they can detect the gaze direction and viewpoint position.

[0047] In the first embodiment, information (position, orientation, and shape information) of the photographing frame 402 is stored in the primary storage unit 104, and the photographing frame 402 is mapped so as to be fixed at the same position in the AR space (real space). In other words, even if the user 405 changes the direction of his / her head, the photographing frame 402 is fixed in the AR space by absolute coordinates (x, y, z). This allows the user 405 to visually recognize (perceive) the photographing frame 402 as if it were continuously displayed at the same position in the real space.

[0048] FIGS. 6A and 6B are diagrams of the HMD 1, the photographing frame 402, and the user 405 in the AR space as viewed from above. FIG. 6A shows the position of the photographing frame 402 before the user 405 changes the direction of his/her head. FIG. 6B shows the position of the photographing frame 402 after the user 405 changes the direction of his/her head. The photographing frame 402 stays fixed at the position it occupied before the user 405 changed the direction of his/her head, regardless of the movement of the user's head. This can be realized by the calculation unit 101 changing the position of the photographing frame 402 on the display unit 11 according to the movement of the head.

[0049] On the other hand, the calculation unit 101 may continue to display the shooting frame 402 so that the user 405 visually recognizes (perceives) that the shooting frame 402 in real space is moving in accordance with the movement of the head of the user 405. FIG. 6C shows the position of the shooting frame 402 after the user 405 changes the direction of his/her head from the state of FIG. 6A. The shooting frame 402 moves in AR space in conjunction with the movement of the head of the user 405 while maintaining a distance d from the user 405. This can be achieved by the calculation unit 101 continuing to display the shooting frame 402 in the same position on the display unit 11. Furthermore, the photographing frame 402 may move dynamically to follow the movement of the main subject 401.
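The difference between the frame fixed in the AR space (FIG. 6B) and the frame that follows the head (FIG. 6C) can be sketched in the top view with a simple yaw rotation. The coordinate conventions here are assumptions made for illustration: x points to the user's right and z points forward at yaw 0, a positive yaw turns the head to the right, and both function names are hypothetical.

```python
import math


def world_to_head(point, head_pos, yaw):
    """Express a world-fixed point in head coordinates (FIG. 6B case).

    The returned (x, z) is where a world-locked frame must be drawn relative
    to the head, so it changes whenever the head turns.
    """
    dx, dz = point[0] - head_pos[0], point[1] - head_pos[1]
    x = dx * math.cos(yaw) - dz * math.sin(yaw)
    z = dx * math.sin(yaw) + dz * math.cos(yaw)
    return (x, z)


def head_to_world(point_head, head_pos, yaw):
    """Express a head-fixed point in world coordinates (FIG. 6C case).

    A head-locked frame keeps the same head coordinates, e.g. (0, d), so its
    world position sweeps around the user as the head turns.
    """
    x, z = point_head
    wx = head_pos[0] + x * math.cos(yaw) + z * math.sin(yaw)
    wz = head_pos[1] - x * math.sin(yaw) + z * math.cos(yaw)
    return (wx, wz)


head = (0.0, 0.0)
turn_right = math.radians(90)

# World-locked: a frame 2 m straight ahead ends up 2 m to the user's left.
print(world_to_head((0.0, 2.0), head, turn_right))

# Head-locked at distance d = 1.5 m: the frame moves to the user's new forward direction.
print(head_to_world((0.0, 1.5), head, turn_right))
```

The two functions are inverses of each other, which reflects the text: fixing the frame in space requires updating its display position, while fixing its display position makes it move in space.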

[0050] In the first embodiment, the calculation is performed by the calculation unit 101 or the information processing unit 102 of the HMD 1. However, if there is a system (cloud) that can be connected to the HMD 1 via the communication unit 103, the system may perform the calculation.

[0051] Furthermore, in the first embodiment, a specific frame indicating a range in which a specific process is executed may be used instead of the shooting frame 402, which is a frame for determining the shooting angle of view. In this case, when it is determined in step S307 that an instruction to execute the specific process has been given, the specific process is executed in step S308. The specific process is, for example, a process of capturing or translating characters in the range of real space indicated by the specific frame. The specific process is, for example, a process of performing AF (autofocus) on the range of real space indicated by the specific frame.

[0052] As described above, according to the first embodiment, the HMD can display a shooting frame according to a hand gesture and then continue to display the shooting frame for a certain period of time. This eliminates the need for the user to keep holding a specific pose with a hand gesture, so the shooting frame does not shift and the shadow of the hand does not appear in the image. This allows an image with the composition intended by the user to be acquired.

[0053] <Embodiment 2> In the first embodiment, an example in which one photographing frame is arranged has been described. In the second embodiment, the HMD 1 recognizes multiple trimming pose hand gestures and arranges multiple photographing frames. Hereinafter, only the differences between the first embodiment and the second embodiment will be described in detail.

[0054] FIGS. 8A and 8B show examples of the arrangement of a plurality of shooting frames. Each of FIG. 8A and FIG. 8B shows a display that the user 405 visually recognizes through the display unit 11. That is, FIG. 8A and FIG. 8B are diagrams showing an AR space expressed by the display of the display unit 11. In FIG. 8A, there are a plurality of subjects 802 (subjects 802-1 to 802-4) and a plurality of shooting frames 803 (shooting frames 803-1 to 803-4). The angle of view 801 is the angle of view of the HMD 1. In FIG. 8B, there are a shooting frame 804 selected by the user and shooting frames 805-1 to 805-3 not selected by the user.

[0055] Shooting control using hand gestures according to the second embodiment will be described with reference to the flowchart of Fig. 7. Note that in the description of the flowchart of Fig. 7, a description of parts common to the flowchart of Fig. 3 will be omitted.

[0056] In step S701, the calculation unit 101 controls the three-dimensional space recognition unit 17 in the same manner as in step S302 to recognize (detect) a hand gesture of the user 405. As a result, the calculation unit 101 generates one photographing frame 803 and arranges the one photographing frame 803.

[0057] In step S702, the calculation unit 101 determines whether or not the arrangement of the necessary number of shooting frames 803 has been completed. If it is determined that the arrangement of the necessary number of shooting frames 803 has been completed, the process proceeds to step S703. If it is determined that the arrangement of the necessary number of shooting frames 803 has not been completed, the process proceeds to step S701.

[0058] In the example of FIG. 8A and FIG. 8B, the processes of step S701 and step S702 are repeated until the arrangement of the four photographing frames 803 (photographing frames 803-1 to 803-4) is completed. In the first embodiment, the completion of the arrangement of the photographing frame 402 is determined, for example, according to whether or not the user 405 has made a dedicated hand gesture. In the second embodiment, the completion of arranging the photographing frames 803 may be determined according to whether or not an upper limit on the number of arrangements has been reached, the voice of the user 405, or the time since the photographing frames 803 were arranged. The calculation unit 101 holds information on the photographing frames 803-1 to 803-4 (information on the positions, orientations, and shapes of the photographing frames 803-1 to 803-4) in the primary storage unit 104. The calculation unit 101 also outputs the photographing frames 803-1 to 803-4 to the display unit 11. As a result, the display unit 11 displays the photographing frames 803-1 to 803-4 as if they were superimposed on the real space.
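The repetition of steps S701 and S702 can be sketched as a loop over recognized events that stops on any of the completion criteria named above (upper limit, voice, elapsed time). The event tuples, the "done" keyword, and the default values are all hypothetical and chosen only to make the sketch runnable.

```python
def arrange_frames(events, max_frames=4, idle_timeout=3.0):
    """Collect shooting frames until one of the completion criteria is met.

    `events` is an iterable of (kind, value) pairs:
      ("gesture", frame)  - a trimming pose was recognized (step S701)
      ("voice", word)     - the user said something
      ("idle", seconds)   - time elapsed since the last frame was arranged
    """
    frames = []
    for kind, value in events:
        if kind == "gesture":
            frames.append(value)
            if len(frames) >= max_frames:      # upper limit reached (S702)
                break
        elif kind == "voice" and value == "done":
            break                              # completion by voice
        elif kind == "idle" and frames and value >= idle_timeout:
            break                              # completion by elapsed time
    return frames


events = [("gesture", "803-1"), ("gesture", "803-2"),
          ("voice", "done"), ("gesture", "803-3")]
print(arrange_frames(events))
```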

[0059] In step S703, the calculation unit 101 continues to display the arranged multiple shooting frames 803 in the same manner as in step S304. At this time, when at least one of the subjects 802-1 to 802-4 is included in a certain shooting frame 803, the calculation unit 101 may perform a frame adjustment process (such as a process of changing the color or thickness of the shooting frame 803) according to the subject 802 included in the shooting frame 803.

[0060] In step S704, the calculation unit 101 controls the three-dimensional space recognition unit 17 to detect a hand gesture of the user 405. Based on the detected hand gesture, the calculation unit 101 selects a photographing frame 804 to be used in the processing from step S305 onwards from among the photographing frames 803-1 to 803-4. At this time, the calculation unit 101 may perform processing (photographic frame processing) such as changing the color or thickness of the photographing frame between the selected photographing frame 804 and the non-selected photographing frames 805-1 to 805-3, as shown in FIG. 8B.

[0061] In the second embodiment, one photographing frame 804 is selected, but multiple photographing frames 804 may be selectable. The calculation unit 101 may select a photographing frame 804 depending on whether or not any of the subjects 802-1 to 802-4 is included in each of the photographing frames 803-1 to 803-4. The determination of whether or not any of the subjects 802-1 to 802-4 is included in each of the photographing frames 803-1 to 803-4 can be realized by a known subject detection process or image determination process.
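Selecting frames according to whether they contain a subject can be sketched as a containment test. The sketch assumes each frame is an axis-aligned rectangle (left, top, right, bottom) in the display plane and that a subject detector has already reduced each subject to a center point; both the representation and the function names are hypothetical, and a real implementation might instead compare bounding boxes.

```python
def contains(frame, point):
    """Return True if `point` (x, y) lies inside `frame` (left, top, right, bottom)."""
    left, top, right, bottom = frame
    x, y = point
    return left <= x <= right and top <= y <= bottom


def select_frames_with_subjects(frames, subject_centers):
    """Return the indices of the frames that contain at least one subject."""
    return [index for index, frame in enumerate(frames)
            if any(contains(frame, center) for center in subject_centers)]


frames = [(0, 0, 2, 2), (3, 0, 5, 2), (0, 3, 2, 5)]
subjects = [(1, 1), (4, 1)]
print(select_frames_with_subjects(frames, subjects))
```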

[0062] Step S705 starts when it is determined in step S307 that an instruction regarding shooting has been given. In step S705, the calculation unit 101 performs processing according to the instruction regarding shooting. For example, the calculation unit 101 shoots the range of real space indicated by the selected shooting frame 804 in the same manner as in step S308. Furthermore, the calculation unit 101 performs image processing on a shot image obtained by shooting the range of real space indicated by the shooting frame 804. At this time, the calculation unit 101 may determine shooting parameters and image processing parameters based on the range of real space indicated by the non-selected shooting frames 805-1 to 805-3 in addition to the range of real space indicated by the shooting frame 804.

[0063] According to the second embodiment, the HMD can display multiple shooting frames in response to multiple hand gestures, and then display the multiple shooting frames continuously for a certain period of time. This allows the user to select one or multiple shooting frames, take a picture according to the selected shooting frame, and take a picture with parameters that take the multiple shooting frames into consideration. This allows the user to obtain an image with the intended composition and the intended parameters.
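
The continued display of a frame after the pose is released can be sketched as a simple hold timer. This is only an illustration under stated assumptions: the `FrameHold` class, the per-update boolean recognition result, and the use of seconds as the time unit are all hypothetical and are not taken from the disclosure.

```python
class FrameHold:
    """Keeps a frame visible for a hold period after the gesture was last recognized."""

    def __init__(self, hold_seconds: float):
        self.hold_seconds = hold_seconds  # assumed "predetermined time"
        self.last_recognized = None       # time of the most recent recognition

    def update(self, gesture_recognized: bool, now: float) -> bool:
        """Return True while the frame should be displayed."""
        if gesture_recognized:
            self.last_recognized = now
        if self.last_recognized is None:
            return False  # no gesture has been recognized yet
        return (now - self.last_recognized) <= self.hold_seconds
```

With this structure the user can lower the hands and still issue a shooting instruction while `update` keeps returning True.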

[0064] <Embodiment 3> In the first and second embodiments, the HMD 1 displays the shooting frame at the position of the hand gesture of the trimming pose for at least a certain period of time even if the user releases the trimming pose after displaying the shooting frame. In the third embodiment, the HMD 1 displays (places) the shooting frame at a position according to the user's selection.

[0065] Hereinafter, only the differences between the third embodiment and the first embodiment (or the second embodiment) will be described in detail.

[0066] FIG. 9 shows a hardware configuration of an optical see-through type HMD 1 according to embodiment 3. The HMD 1 according to embodiment 3 has a gaze detection unit 110 in addition to the configuration of the HMD 1 according to embodiment 1. The gaze detection unit 110 includes a dichroic mirror, an imaging lens, a gaze detection sensor, an infrared light emitting diode, and the like. The gaze detection unit 110 detects the direction of the gaze of the user 405 (the direction in which the user 405 is looking) and the viewpoint position of the user 405 (the position at which the user 405 is looking).

[0067] Shooting control by hand gestures according to the third embodiment will be described with reference to the flowchart in Fig. 10. Figs. 11A to 11E and 12 are schematic diagrams showing the relationship between the position of the hand of a user 405 and the position of a shooting frame 402 in an AR space according to the third embodiment. Figs. 11A to 11E are schematic diagrams of the shooting frame 402 as seen by a user 405 wearing an HMD1. Fig. 12 is a schematic diagram of the user 405 wearing an HMD1 and the shooting frame 402 as seen from the side.

[0068] In embodiment 3, when a trimming pose is created by the right hand 403 and left hand 404 of a user 405, the HMD 1 generates a rectangular shooting frame 402 consisting of a diagonal line connecting the right hand 403 and the left hand 404, at a position away from the right hand 403 and the left hand 404.

[0069] As shown in FIG. 11A, even if a trimming pose is formed below the position of the subject 401, the HMD 1 generates a photographing frame 402 at a position that includes the subject 401, above the position of the trimming pose. This reduces arm fatigue for the user 405 when making the trimming pose.

[0070] The process of the flowchart in Fig. 10 starts when the user 405 operates the operation unit 108 of the HMD 1 and the HMD 1 is set to the shooting mode. Each process of the flowchart in Fig. 10 is realized by the calculation unit 101 of the HMD 1 executing a program. Note that steps S307 and S308 according to the third embodiment are the same as steps S307 and S308 according to the first embodiment, and therefore a description thereof will be omitted.

[0071] In step S1001, the calculation unit 101 selects a method for determining a position for displaying the shooting frame 402 (hereinafter referred to as a "position determination method"). In the third embodiment, the calculation unit 101 selects a position determination method from among "a method using a viewpoint position", "a method using a position of the HMD1", "a method using a fixed angle", and "a method using a position of a main subject within a field of view" in response to a selection made by the user 405.

[0072] The display unit 11 displays an operation screen on which option buttons representing a plurality of options for the position determination method are arranged. Then, the calculation unit 101 selects the position determination method corresponding to the option button at the position touched by the user 405 when the operation screen is displayed. Information on the position determination method selected by the user 405 is stored in the secondary storage unit 105. Note that instead of the user 405 selecting the position determination method, the calculation unit 101 may select the position determination method based on the amount of movement of the viewpoint position of the user 405 or the distance to the main subject. For example, when the amount of movement of the viewpoint position of the user 405 is larger than a predetermined amount, there is a possibility that the user 405 is tracking a certain object with his / her line of sight, so the calculation unit 101 selects the "method using the viewpoint position" as the position determination method.
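
The automatic selection based on the amount of viewpoint movement can be sketched as follows. The sketch assumes that the viewpoint is sampled as a sequence of (x, y) positions and that the movement amount is the total path length between successive samples; the names `PositionMethod`, `viewpoint_travel`, and `choose_method`, and the fallback argument, are hypothetical.

```python
import math
from enum import Enum


class PositionMethod(Enum):
    VIEWPOINT = "method using a viewpoint position"
    HMD_POSITION = "method using a position of the HMD1"
    FIXED_ANGLE = "method using a fixed angle"
    MAIN_SUBJECT = "method using a position of a main subject within a field of view"


def viewpoint_travel(samples):
    """Total path length over successive viewpoint samples (assumed metric)."""
    return sum(math.dist(a, b) for a, b in zip(samples, samples[1:]))


def choose_method(samples, threshold, fallback):
    """Pick the viewpoint method when the gaze moved more than `threshold`."""
    if viewpoint_travel(samples) > threshold:
        return PositionMethod.VIEWPOINT
    return fallback  # otherwise keep another method (an assumption of this sketch)
```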

[0073] In step S1002, the calculation unit 101 recognizes a hand gesture of the trimming pose of the user 405. In the first embodiment, an example in which the hand gesture of the user 405 is recognized by the three-dimensional space recognition unit 17 has been described. In the third embodiment, however, the hand gesture of the user 405 is recognized by image recognition. Here, an example of recognizing a hand gesture will be described. Note that the calculation unit 101 may recognize a hand gesture using the three-dimensional space recognition unit 17, similarly to the first embodiment.

[0074] In step S1002, the calculation unit 101 determines whether or not the shape of the hand gesture of the preregistered trimming pose matches the shape of the recognized hand gesture. When a trimming pose is made by the right hand 403 and the left hand 404 as shown in FIG. 11A, the calculation unit 101 determines whether or not the shape of the recognized hand gesture matches the shape of the hand gesture of the preregistered trimming pose. If it is determined that the two shapes match, the process proceeds to step S1003. If it is determined that the two shapes do not match, the process of step S1002 is repeated.

[0075] In step S1003, the calculation unit 101 generates a photographing frame 402 in response to the hand gesture.

[0076] As shown in FIGS. 11A and 12, when a trimming pose is made by the right hand 403 and left hand 404 of a user 405, the calculation unit 101 generates a rectangular photographing frame 402 defined by a diagonal line connecting the right hand 403 and the left hand 404. Information about the generated photographing frame 402 (information about the position, orientation, and shape of the photographing frame 402) is held in the primary storage unit 104.
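
Generating a rectangle from the diagonal connecting the two hands can be sketched as below. The sketch is simplified to two dimensions and assumes an axis-aligned frame with each hand reduced to a single (x, y) point; the function name `frame_from_hands` and the dictionary layout are hypothetical, and the orientation information mentioned above is not modeled.

```python
def frame_from_hands(right_hand, left_hand):
    """Build an axis-aligned rectangle whose diagonal joins the two hand points."""
    (x1, y1), (x2, y2) = right_hand, left_hand
    left, right = min(x1, x2), max(x1, x2)
    top, bottom = min(y1, y2), max(y1, y2)
    return {
        "left": left,
        "top": top,
        "width": right - left,
        "height": bottom - top,
    }
```

Taking the minimum and maximum of each coordinate makes the result independent of which hand is the upper or lower corner.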

[0077] In step S1004, the calculation unit 101 displays (places) the shooting frame 402 on the display unit 11 based on the position determination method selected in step S1001.

[0078] The process of step S1004 will be described in detail below with reference to the flowchart in FIG.

[0079] In step S1301, the calculation unit 101 determines whether or not the user 405 has selected the "method using the viewpoint position" as the position determination method. If it is determined that the "method using the viewpoint position" has been selected, the process proceeds to step S1302. If it is determined that the "method using the viewpoint position" has not been selected, the process proceeds to step S1304.

[0080] In step S1302, the calculation unit 101 detects the viewpoint position of the user 405. Specifically, the calculation unit 101 uses the gaze detection unit 110 to detect the position on the display unit 11 at which the user 405, who is looking into the HMD 1, is looking, and treats that position as the viewpoint position.

[0081] In step S1303, the calculation unit 101 places (displays) the shooting frame 402 at the viewpoint position of the user 405.

[0082] Fig. 11B shows a schematic diagram of the shooting frame 402 as seen by the user 405 when the shooting frame 402 is placed at the viewpoint position of the user 405. As shown in Fig. 11B, the calculation unit 101 displays the shooting frame 402 on the display unit 11 so that the viewpoint position 1101 of the user and the center position of the shooting frame 402 coincide with each other. This allows the user 405 to perceive the shooting frame 402 as if it were placed in real space (AR space).

[0083] In step S1304, the calculation unit 101 determines whether or not the user 405 has selected the "method of using the position of HMD1" as the position determination method. If it is determined that the "method of using the position of HMD1" has been selected, the process proceeds to step S1305. If it is determined that the "method of using the position of HMD1" has not been selected, the process proceeds to step S1307.

[0084] In step S1305, the calculation unit 101 detects the position of the center of the display unit 11 of the HMD 1.

[0085] In step S1306, the shooting frame 402 is placed (displayed) at the center position of the display unit 11 of the HMD 1.

[0086] Fig. 11C shows a schematic diagram of the shooting frame 402 as seen by the user 405 when the shooting frame 402 is placed at the center position of the display unit 11. As shown in Fig. 11C, the calculation unit 101 displays the shooting frame 402 so that the center position 1102 of the display unit 11 and the center position of the shooting frame 402 coincide with each other. This allows the user 405 to perceive the shooting frame 402 as if it were placed in real space.
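
Both the viewpoint-based placement and the display-center placement reduce to making the frame center coincide with a given point. A minimal sketch, assuming display coordinates with the origin at the top-left corner and a hypothetical function name `center_frame`:

```python
def center_frame(center, frame_size):
    """Return the top-left corner that centers a frame of `frame_size` on `center`."""
    cx, cy = center
    width, height = frame_size
    return (cx - width / 2, cy - height / 2)
```

The same function would be called with the viewpoint position 1101 in one case and with the center position 1102 of the display unit in the other.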

[0087] In step S1307, the calculation unit 101 checks whether or not the user 405 has selected the "method using a fixed angle" as the position determination method. If it is determined that the "method using a fixed angle" has been selected, the process proceeds to step S1308. If it is determined that the "method using a fixed angle" has not been selected, the process proceeds to step S1310.

[0088] In step S1308, the calculation unit 101 acquires information on a fixed angle that is stored in advance in the secondary storage unit 105.

[0089] In step S1309, the calculation unit 101 arranges (displays) the photographing frame 402 at "a position in the AR space that is a specific distance away from the position of the hand gesture in a direction of a fixed angle (predetermined)" (see FIG. 12). That is, the calculation unit 101 arranges (displays) the photographing frame 402 at a position on the display unit 11 that corresponds to "a position in the AR space that is a specific distance away from the position of the hand gesture in a direction of a fixed angle". Specifically, the calculation unit 101 displays the photographing frame 402 on the display unit 11 so that the user 405 can perceive, by looking at the display unit 11, that the photographing frame 402 is arranged at a position a specific distance away from the position of the hand gesture in a direction of a fixed angle.

[0090] Fig. 11D shows a schematic diagram of the photographing frame 402 seen from the user 405 when the photographing frame 402 is arranged at a position in a direction of a fixed angle from the position of the hand gesture. As shown in Fig. 11D, the calculation unit 101 displays the photographing frame 402 on the display unit 11 so that "position 1105 at a distance 1104 away from the position 1103 of the hand gesture in a direction of a fixed angle" and "the center position of the photographing frame 402" match in the AR space. This allows the user 405 to perceive the photographing frame 402 as if it were arranged in the real space.
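
The fixed-angle placement can be sketched geometrically in the side view of FIG. 12. The sketch assumes that positions are expressed as (forward, up) coordinates in the vertical plane in front of the user and that the fixed angle is an elevation measured from the horizontal forward direction; the disclosure does not specify these conventions, and `fixed_angle_position` is a hypothetical name.

```python
import math


def fixed_angle_position(hand_pos, angle_deg, distance):
    """Point `distance` away from `hand_pos` in the direction of `angle_deg`.

    hand_pos is (forward, up); angle_deg is the elevation above horizontal.
    """
    forward, up = hand_pos
    theta = math.radians(angle_deg)
    return (forward + distance * math.cos(theta),
            up + distance * math.sin(theta))
```

The returned point would then be projected onto the display unit so that it coincides with the center of the photographing frame.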

[0091] In step S1310, calculation unit 101 determines whether user 405 has selected "method of using the position of the main subject within the field of view" as the position determination method. If it is determined that "method of using the position of the main subject within the field of view" has been selected, the process proceeds to step S1311. If it is determined that "method of using the position of the main subject within the field of view" has not been selected, the process proceeds to step S1001.
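
The four checks in steps S1301, S1304, S1307, and S1310 amount to a dispatch on the selected position determination method. The following sketch records only which steps follow each decision; the method labels are quoted from the text, while the table and function names are hypothetical.

```python
# Steps that follow each decision in the flowchart, keyed by the selected method.
STEPS_BY_METHOD = {
    "method using a viewpoint position": ["S1302", "S1303"],
    "method using a position of the HMD1": ["S1305", "S1306"],
    "method using a fixed angle": ["S1308", "S1309"],
    "method using a position of a main subject within a field of view": ["S1311", "S1312"],
}


def placement_steps(selected_method):
    """Return the steps executed for the method; fall back to S1001 if none match."""
    return STEPS_BY_METHOD.get(selected_method, ["S1001"])
```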

[0092] In step S1311, the calculation unit 101 detects the position of the main subject within the field of view.

[0093] FIG. 11E shows a schematic diagram of the photographing frame 402 as seen by the user 405 when the photographing frame 402 is arranged at the position of the main subject in the visual field. When calculating the position information of the main subject in the visual field, the calculation unit 101 first obtains the user's viewpoint position 1101. That is, the calculation unit 101 uses the gaze detection unit 110 to detect the position on the display unit 11 at which the user 405, who is looking into the HMD 1, is looking, and acquires that position as the viewpoint position 1101.

[0094] Next, the calculation unit 101 determines whether or not a main subject is present in the viewing area 406 of the user 405. The viewing area 406 of the user 405 is, for example, a circular area centered on the user's viewpoint position 1101. In FIG. 11E, a subject 401 and an object 407 are present in the viewing area 406. The calculation unit 101 determines whether or not a main subject is present based on whether or not a "subject having a shape that matches the shape of a main subject registered in advance" is present in the viewing area 406. The example of FIG. 11E shows a case where a person is registered in advance as the main subject, and the calculation unit 101 determines that the subject 401, which matches the shape of a person, is present in the viewing area 406. Then, the calculation unit 101 calculates the position on the display unit 11 of the subject 401 determined to be the main subject as the position of the main subject in the field of view.
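
The search for a main subject inside the circular viewing area can be sketched as follows. The sketch replaces the shape matching described above with class labels from an assumed detector, and it picks the candidate nearest to the viewpoint when several match; both choices, and the name `find_main_subject`, are assumptions of this illustration.

```python
import math


def find_main_subject(detections, viewpoint, radius, main_label="person"):
    """Return the center of the matching subject nearest the viewpoint, or None.

    detections: list of (label, (x, y)) pairs in display coordinates.
    Only subjects within `radius` of `viewpoint` are considered.
    """
    candidates = []
    for label, center in detections:
        distance = math.dist(center, viewpoint)
        if label == main_label and distance <= radius:
            candidates.append((distance, center))
    if not candidates:
        return None  # no main subject in the viewing area
    return min(candidates)[1]
```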

[0095] In step S1312, the calculation unit 101 arranges (displays) the photographing frame 402 at a position that indicates the main subject within the field of view on the display unit 11. For example, as shown in Fig. 11E, the calculation unit 101 arranges the photographing frame 402 so that the boundary of the subject 401 determined as the main subject is entirely contained within the photographing frame 402. This allows the user 405 to perceive the photographing frame 402 as if it were arranged in real space.
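
Arranging the frame so that the whole boundary of the main subject lies inside it can be sketched as a bounding-box computation. The sketch assumes that the boundary is available as a list of (x, y) points on the display; the optional margin and the name `enclosing_frame` are hypothetical additions.

```python
def enclosing_frame(boundary_points, margin=0.0):
    """Smallest axis-aligned frame (left, top, right, bottom) around the points."""
    xs = [x for x, _ in boundary_points]
    ys = [y for _, y in boundary_points]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)
```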

[0096] As described above, according to the third embodiment, the HMD 1 can place (display) a shooting frame at a position away from the hand gesture position. This allows the user to perform the hand gesture pose with the arm lowered, and thus allows the user to obtain an image with the intended composition with reduced fatigue and strain on the arm.

[0097] Although the present invention has been described in detail based on the preferred embodiments, the present invention is not limited to these specific embodiments, and various forms within the scope of the gist of the present invention are also included in the present invention. Parts of the above-described embodiments may be combined as appropriate.

[0098] Also, in the above, "If A is equal to or greater than B, proceed to step S1, and if A is smaller (lower) than B, proceed to step S2" may be read as "If A is greater (higher) than B, proceed to step S1, and if A is equal to or less than B, proceed to step S2." Conversely, "If A is greater (higher) than B, proceed to step S1, and if A is equal to or less than B, proceed to step S2" may be read as "If A is equal to or greater than B, proceed to step S1, and if A is smaller (lower) than B, proceed to step S2." Therefore, unless a contradiction occurs, "equal to or greater than A" may be read as "greater than A (higher; longer; more)," and "equal to or less than A" may be read as "less than A (lower; shorter; fewer)." Likewise, "greater than A (higher; longer; more)" may be read as "equal to or greater than A," and "less than A (lower; shorter; fewer)" may be read as "equal to or less than A."

[0099] Each functional unit in each of the above embodiments (variations) may or may not be individual hardware. The functions of two or more functional units may be realized by common hardware. Each of a plurality of functions of one functional unit may be realized by individual hardware. Two or more functions of one functional unit may be realized by common hardware. Furthermore, each functional unit may or may not be realized by hardware such as an ASIC, FPGA, or DSP. For example, the device may have a processor and a memory (storage medium) in which a control program is stored. Then, the functions of at least some of the functional units of the device may be realized by the processor reading and executing the control program from the memory.

[0100] (Other embodiments) The present invention can also be realized by a process in which a program for implementing one or more of the functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. The present invention can also be realized by a circuit (e.g., ASIC) for implementing one or more of the functions.

[0101] The disclosure of the above embodiments includes the following configurations, methods, and programs.

(Configuration 1) An information processing device capable of allowing a user to visually recognize a real space via a display means, the information processing device comprising: recognition means for recognizing a specific hand gesture by the user; display control means for 1) displaying a first frame corresponding to the specific hand gesture on the display means, and 2) continuing to display the first frame for at least a predetermined time after the specific hand gesture is recognized, even if the recognition means changes to a state in which the recognition means does not recognize the specific hand gesture after the first frame is displayed; and processing means for executing a specific process targeting a range of the real space indicated by the first frame in a state in which the first frame is displayed on the display means.

(Configuration 2) The information processing device according to configuration 1, wherein the specific process is a process of photographing the range of the real space indicated by the first frame.

(Configuration 3) The information processing device according to configuration 1 or 2, wherein the recognition means recognizes a specific hand gesture by the user multiple times; the display control means displays, on the display means, a plurality of frames corresponding to the plurality of specific hand gestures for at least the predetermined time period after the plurality of specific hand gestures are recognized; and the processing means, after the first frame is selected from the plurality of frames, executes the specific process targeting the range of the real space indicated by the first frame in a state in which the first frame is displayed on the display means.

(Configuration 4) The information processing device according to configuration 3, wherein the processing means determines parameters related to the specific process based on the plurality of frames.

(Configuration 5) The information processing device according to any one of configurations 1 to 4, wherein the display control means edits the position, orientation, and shape of the first frame in response to an instruction from the user.

(Configuration 6) The information processing device according to any one of configurations 1 to 5, wherein the display control means, in a state in which the first frame is displayed, 1) continues to display the first frame until a specific condition is satisfied, and 2) hides the first frame when the specific condition is satisfied.

(Configuration 7) The information processing device according to configuration 6, wherein the specific condition being satisfied means that a specific instruction is issued by the user's operation of an operation member.

(Configuration 8) The information processing device according to configuration 6, wherein the specific condition being satisfied means that a specific instruction is given by a voice uttered by the user.

(Configuration 9) The information processing device according to configuration 6, wherein the specific condition being satisfied means that a specific instruction is given by the user's hand gesture.

(Configuration 10) The information processing device according to configuration 6, wherein the specific condition being satisfied means that a certain period of time has elapsed since the first frame started to be displayed.

(Configuration 11) The information processing device according to any one of configurations 1 to 10, wherein when the specific hand gesture is recognized, the display control means displays the first frame at a position on the display means corresponding to a position of the recognized specific hand gesture.

(Configuration 12) The information processing device according to any one of configurations 1 to 10, wherein when the specific hand gesture is recognized, the display control means displays the first frame at a position determined by a method selected from a plurality of methods.

(Configuration 13) The information processing device according to configuration 12, wherein the plurality of methods includes a method of displaying the first frame at a position on the display means where the user is looking.

(Configuration 14) The information processing device according to configuration 12 or 13, wherein the plurality of methods includes a method of displaying the first frame at a center position of the display means.

(Configuration 15) The information processing device according to any one of configurations 12 to 14, wherein the plurality of methods includes a method of displaying the first frame at a position that indicates a specific subject in a visual field of the user.

(Configuration 16) The information processing device according to any one of configurations 12 to 15, wherein the plurality of methods includes a method of displaying the first frame at a position on the display means corresponding to a position in a direction at a predetermined angle from a position of the specific hand gesture.

(Configuration 17) The information processing device according to any one of configurations 1 to 16, wherein the display control means, while displaying the first frame, continues to display the first frame so that the user perceives the first frame as being fixed at the same position in the real space.

(Configuration 18) The information processing device according to any one of configurations 1 to 16, wherein the display means is included in a display device that is worn on the user's head, and the display control means, while displaying the first frame, continues to display the first frame so that the user perceives that the first frame is moving in the real space in response to a change in the orientation of the head.

(Method) An information processing method capable of allowing a user to visually recognize a real space via a display means, comprising: a recognition step of recognizing a specific hand gesture by the user; a display control step of 1) displaying a first frame corresponding to the specific hand gesture on the display means, and 2) continuing to display the first frame for at least a predetermined time after the specific hand gesture is recognized, even if the specific hand gesture is changed to an unrecognized state in the recognition step after the first frame is displayed; and a processing step of executing a specific process targeted at a range of the real space indicated by the first frame in a state in which the first frame is displayed on the display means.

(Program) A program for causing a computer to function as each of the means of the information processing device according to any one of configurations 1 to 18.

[Explanation of symbols]

[0102] 1: HMD, 11: display unit, 17: three-dimensional space recognition unit, 101: calculation unit

Claims

1. An information processing apparatus capable of allowing a user to visually recognize a real space via a display means, the information processing apparatus comprising: recognition means for recognizing a specific hand gesture made by the user; display control means for 1) displaying a first frame corresponding to the specific hand gesture on the display means, and 2) continuing to display the first frame for at least a predetermined time after the specific hand gesture is recognized, even if the recognition means changes to a state in which it does not recognize the specific hand gesture after the first frame is displayed; and processing means for executing, with the first frame displayed on the display means, a specific process targeting the range of the real space indicated by the first frame.

2. The information processing apparatus according to claim 1, wherein the specific process is a process of photographing the range of the real space indicated by the first frame.

3. The recognition means recognizes multiple specific hand gestures made by the user, The display control means displays a plurality of frames corresponding to the plurality of specific hand gestures on the display means for at least a predetermined time after the plurality of specific hand gestures are recognized. The processing means, after the first frame is selected from among the plurality of frames, and with the first frame displayed on the display means, executes the specific processing targeting the range of the real space indicated by the first frame within the real space. The information processing apparatus according to claim 1 or 2.

4. The processing means determines parameters related to the specific processing based on the plurality of frames. The information processing apparatus according to claim 3.

5. The display control means edits the position, orientation, and shape of the first frame in accordance with the user's instructions. The information processing apparatus according to claim 1 or 2.

6. The information processing apparatus according to claim 1 or 2, wherein the display control means, in a state in which the first frame is displayed, 1) continues to display the first frame until a specific condition is satisfied, and 2) hides the first frame when the specific condition is satisfied.

7. The information processing apparatus according to claim 6, wherein the specific condition is satisfied when a specific instruction is given by the user's operation of an operation member.

8. The information processing apparatus according to claim 6, wherein the specific condition is satisfied when a specific instruction is given by the user's voice.

9. The information processing apparatus according to claim 6, wherein the specific condition is satisfied when a specific instruction is given by the user's hand gesture.

10. The information processing apparatus according to claim 6, wherein the specific condition is satisfied when a certain amount of time has elapsed since the first frame began to be displayed.

11. The information processing apparatus according to claim 1 or 2, characterized in that when the display control means recognizes the specific hand gesture, it displays the first frame at the position of the display means corresponding to the position of the recognized specific hand gesture.

12. When the specific hand gesture is recognized, the display control means displays the first frame at a position determined by a method selected from among a plurality of methods. The information processing apparatus according to claim 1 or 2.

13. The information processing apparatus according to claim 12, wherein the plurality of methods include a method of displaying the first frame at the position on the display means at which the user is looking.

14. The information processing apparatus according to claim 12, characterized in that the plurality of methods include a method for displaying the first frame at the center of the display means.

15. The information processing apparatus according to claim 12, wherein the plurality of methods include a method of displaying the first frame at a position that indicates a specific subject in the user's field of view.

16. The information processing apparatus according to claim 12, wherein the plurality of methods include a method of displaying the first frame at a position on the display means corresponding to a position in a direction at a predetermined angle from the position of the specific hand gesture.

17. When the display control means is displaying the first frame, it continues to display the first frame so that the user perceives that the first frame is fixed in the same position in the real space. The information processing apparatus according to claim 3.

18. The information processing apparatus according to claim 1 or 2, wherein the display means is included in a display device that is worn on the user's head, and the display control means, while displaying the first frame, continues to display the first frame so that the user perceives that the first frame is moving in the real space in response to a change in the orientation of the head.

19. An information processing method capable of allowing a user to visually recognize a real space via a display means, the method comprising: a recognition step of recognizing a specific hand gesture made by the user; a display control step of 1) displaying a first frame corresponding to the specific hand gesture on the display means, and 2) continuing to display the first frame for at least a predetermined time after the specific hand gesture is recognized, even if the specific hand gesture is no longer recognized in the recognition step after the first frame is displayed; and a processing step of executing, with the first frame displayed on the display means, a specific process targeting the range of the real space indicated by the first frame.

20. A program for causing a computer to function as each of the means of the information processing apparatus described in claim 1 or 2.