Information processing device and device location estimation method

The information processing apparatus integrates image analysis and sensor data with a Kalman filter to address device tracking in VR. It stabilizes the estimated device position, which supports accurate and intuitive user interaction in applications such as games.

JP7875936B2 · Active · Publication Date: 2026-06-18 · SONY INTERACTIVE ENTERTAINMENT LLC

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
SONY INTERACTIVE ENTERTAINMENT LLC
Filing Date
2022-08-05
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Existing information processing technologies face challenges in accurately tracking the position and orientation of devices within a 3D virtual reality space, particularly when intuitive user interactions are required, such as in gaming environments.

Method used

The information processing apparatus and method combine image analysis of device markers with sensor data from an inertial measurement unit to estimate device position and orientation. A Kalman filter integrates the two estimation results, and contact and stationary determinations are used to stabilize the estimated device position.

Benefits of technology

Enhances the accuracy and intuitiveness of device tracking in VR environments by stabilizing device position when not in use, thereby improving user interaction and immersion in applications like gaming.

✦ Generated by Eureka AI based on patent content.


Abstract

A captured image acquisition unit 212 acquires a captured image of a device. A sensor data acquisition unit 214 acquires sensor data indicating the acceleration and / or angular velocity of the device. An estimation processing unit 230 estimates the position of the device on the basis of the captured image of the device. A contact determination unit 232 determines whether a user is touching the device. A stationary state determination unit 234 determines whether the device is in a stationary state on the basis of the sensor data. When it is determined that the user is not touching the device and the device is in the stationary state, the estimation processing unit 230 fixes the estimated position of the device.

Description

【Technical Field】 【0001】 The present disclosure relates to a technique for estimating the position of a device. 【Background Art】 【0002】 Patent Document 1 discloses an information processing apparatus that identifies representative coordinates of a marker image from an image obtained by photographing a device provided with a plurality of markers, and derives position information and orientation information of the device using the representative coordinates of the marker image. The information processing apparatus disclosed in Patent Document 1 identifies a first boundary box that encloses a region where pixels having a first luminance or higher are continuous in a photographed image, and also identifies a second boundary box that encloses a region where pixels having a second luminance or higher, which is higher than the first luminance, are continuous within the first boundary box, and derives representative coordinates of the marker image based on the pixels within the first boundary box or the second boundary box. 【0003】 Patent Document 2 discloses an input device provided with a plurality of light emitting units and a plurality of operation members. The light emitting units of the input device are photographed by a camera provided in a head mounting device, and the position and orientation of the input device are calculated based on the detected positions of the light emitting units. 【Prior Art Documents】 【Patent Documents】 【0004】 【Patent Document 1】 Japanese Unexamined Patent Application Publication No. 2020-181322 【Patent Document 2】 International Publication No. 2021 / 240930 【Summary of the Invention】 【Problems to be Solved by the Invention】 【0005】 In recent years, information processing technologies that track the position and orientation of a device and reflect it in a 3D model in a VR space have become widespread. 
By linking the movement of a player character or game object in a game space to changes in the position and orientation of a device to be tracked, intuitive operations by the user are realized. 【0006】 The present disclosure aims to provide a technique for estimating the position of a device. The device may be an input device having an operation member, or may simply be a device to be tracked without an operation member. 【Means for Solving the Problem】 【0007】 To solve the above problems, an information processing apparatus according to an aspect of the present disclosure is an information processing apparatus for estimating the position of a device, and includes a captured image acquisition unit that acquires an image of the device, a sensor data acquisition unit that acquires sensor data indicating the acceleration and / or angular velocity of the device, an estimation processing unit that estimates the position of the device based on the image of the device, a contact determination unit that determines whether or not the user is touching the device, and a stationary determination unit that determines whether or not the device is stationary based on the sensor data. When it is determined that the user is not touching the device and the device is stationary, the estimation processing unit fixes the estimated position of the device. 【0008】 Another aspect of the present disclosure is an information processing apparatus for estimating the position of a device, comprising: an image acquisition unit for acquiring an image of the device; a sensor data acquisition unit for acquiring sensor data indicating the acceleration and / or angular velocity of the device; a contact determination unit for determining whether a user is touching the device; a stationary determination unit for determining whether the device is stationary based on the sensor data; and an estimation processing unit for estimating the position of the device. 
The estimation processing unit comprises a first estimation processing unit for estimating the position of the device based on an image of the device; a second estimation processing unit for estimating the position of the device based on sensor data; and a third estimation processing unit for deriving the position of the device based on the position of the device estimated by the first estimation processing unit and the position of the device estimated by the second estimation processing unit. When it is determined that a user is not touching the device and the device is stationary, the estimation processing unit fixes the estimated position of the device. 【0009】 A device position estimation method in yet another aspect of the present disclosure includes the steps of: acquiring an image of the device; acquiring sensor data indicating the acceleration and / or angular velocity of the device; estimating the position of the device based on the image of the device; determining whether a user is touching the device; determining whether the device is stationary based on the sensor data; and fixing the estimated position of the device if it is determined that the user is not touching the device and the device is stationary. 
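As an illustration only, the position-fixing rule of the method in 【0009】 can be sketched in Python. The text here states only that the stationary determination is based on the sensor data, so the stillness rule, the threshold value, and all names below (`is_stationary`, `PositionEstimator`, `GYRO_STILL_THRESHOLD`) are assumptions made for the sketch, not the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Assumed value: the source does not state how stillness is decided.
GYRO_STILL_THRESHOLD = 0.02  # rad/s


def is_stationary(gyro_samples: Sequence[Vec3],
                  threshold: float = GYRO_STILL_THRESHOLD) -> bool:
    """Assumed rule: stationary if every angular-velocity component of
    every recent sample stays below the threshold."""
    return bool(gyro_samples) and all(
        abs(component) < threshold
        for sample in gyro_samples
        for component in sample
    )


@dataclass
class PositionEstimator:
    """Holds the fixed position while the device is untouched and still."""
    fixed_position: Optional[Vec3] = None

    def update(self, image_position: Vec3, touching: bool,
               gyro_samples: Sequence[Vec3]) -> Vec3:
        if not touching and is_stationary(gyro_samples):
            # Fix the estimate at the first position seen in this state.
            if self.fixed_position is None:
                self.fixed_position = image_position
            return self.fixed_position
        # Touched or moving: release the fix and follow the image estimate.
        self.fixed_position = None
        return image_position
```

In this sketch the estimate stays pinned to the first image-based position observed after the device becomes untouched and still, so frame-to-frame jitter in the image estimate does not move a device resting on a table.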
【0010】 A device position estimation method in yet another aspect of the present disclosure comprises the steps of: acquiring an image captured by an imaging device; acquiring sensor data indicating the acceleration and / or angular velocity of a device; determining whether a user is touching the device; determining whether the device is stationary based on the sensor data; and estimating the position of the device. The estimation step comprises a first estimation step of estimating the position of the device based on an image of the device, a second estimation step of estimating the position of the device based on the sensor data, and a third estimation step of estimating the position of the device based on the position of the device estimated in the first estimation step and the position of the device estimated in the second estimation step. If it is determined that the user is not touching the device and the device is stationary, the estimation step fixes the estimated position of the device. 【0011】 Furthermore, any combination of the above components, as well as conversions of the expressions of this disclosure between methods, apparatus, systems, computer programs, recording media on which computer programs are recorded in a readable manner, data structures, etc., are also valid as aspects of this disclosure. [Brief description of the drawings] 【0012】 [Figure 1] This figure shows an example of the configuration of the information processing system in the embodiment. [Figure 2] This figure shows an example of the external shape of an HMD (Head-Mounted Display). [Figure 3] This diagram shows the functional blocks of the HMD. [Figure 4] This diagram shows the shape of the input device. [Figure 5] This diagram shows the shape of the input device. [Figure 6] This figure shows an example of a portion of an image taken of the input device. [Figure 7] This diagram shows the functional blocks of the input device. 
[Figure 8] This diagram shows the functional blocks of an information processing device. [Figure 9] This is a flowchart showing the position and orientation estimation process. [Figure 10] This is a diagram showing the internal configuration of the estimation processing unit. [Figure 11] This is a flowchart showing the position fixing process. [Modes for carrying out the invention] 【0013】 Figure 1 shows an example configuration of the information processing system 1 in an embodiment. The information processing system 1 comprises an information processing device 10, a recording device 11, a head-mounted display (HMD) 100, an input device 16 that the user holds and operates with their fingers, and an output device 15 that outputs images and sound. The output device 15 may be a television. The information processing device 10 is connected to an external network 2, such as the internet, via an access point (AP) 17. The AP 17 has the functions of a wireless access point and a router, and the information processing device 10 may be connected to the AP 17 by a cable or by a known wireless communication protocol. 【0014】 The recording device 11 records system software and applications such as game software. The information processing device 10 may download game software from the content server to the recording device 11 via the network 2. The information processing device 10 executes the game software and supplies game image data and audio data to the HMD 100. The information processing device 10 and the HMD 100 may be connected by a known wireless communication protocol or by a cable. 【0015】 The HMD100 is a display device that, when worn on the user's head, displays images on a display panel positioned in front of the user's eyes. The HMD100 separately displays the left eye image on the left eye display panel and the right eye image on the right eye display panel. 
These images constitute parallax images viewed from the left and right viewpoints, achieving stereoscopic vision. Since the user views the display panel through optical lenses, the information processing device 10 supplies the HMD100 with parallax image data corrected for optical distortion caused by the lenses. 【0016】 The user wearing the HMD 100 does not need the output device 15, but by providing the output device 15, another user can view the image displayed on the output device 15. The information processing device 10 may display the same image on the output device 15 as the image seen by the user wearing the HMD 100, or it may display a different image. For example, if the user wearing the HMD and another user are playing a game together, the output device 15 may display the game image from the perspective of the other user's character. 【0017】 The information processing device 10 and the input device 16 may be connected by a known wireless communication protocol or by a cable. The input device 16 is equipped with multiple operating elements such as operation buttons, and the user operates the operating elements with their fingers while holding the input device 16. When the information processing device 10 runs a game, the input device 16 is used as a game controller. The input device 16 is equipped with an inertial measurement unit (IMU) including a 3-axis accelerometer and a 3-axis angular velocity sensor, and transmits sensor data to the information processing device 10 at a predetermined period (for example, 800 Hz). 【0018】 The game in this embodiment handles not only the operation information of the control elements of the input device 16, but also the speed, position, and orientation of the input device 16 as operation information, and reflects this in the movement of the player character in the virtual 3D space. 
For example, the operation information of the control elements may be used as information to move the player character, and the operation information such as the speed, position, and orientation of the input device 16 may be used as information to move the player character's arms. In the game's combat scenes, the movement of the input device 16 is reflected in the movement of the player character holding a weapon, enabling intuitive operation for the user and enhancing immersion in the game. 【0019】 To track the position and orientation of the input device 16, the input device 16 is provided with multiple markers (light-emitting parts) that can be photographed by the imaging device 14. The information processing device 10 has a function (hereinafter also referred to as the "first estimation function") to analyze the image of the input device 16 and estimate the position and orientation of the input device 16 in real space. 【0020】 The HMD100 is equipped with multiple imaging devices 14. The multiple imaging devices 14 are mounted in different positions and orientations on the front of the HMD100 so that the combined imaging range of each device covers the entire field of view of the user. The imaging devices 14 are equipped with image sensors capable of acquiring images of multiple markers on the input device 16. For example, if the markers emit visible light, the imaging device 14 has a visible light sensor, such as a CCD (Charge Coupled Device) sensor or a CMOS (Complementary Metal Oxide Semiconductor) sensor, which are commonly used in digital video cameras. If the markers emit invisible light, the imaging device 14 has an invisible light sensor. The multiple imaging devices 14 capture images of the area in front of the user at a predetermined period (for example, 120 frames / second) in synchronous timing and transmit the image data capturing the real space to the information processing device 10. 
【0021】 The information processing device 10 performs a first estimation function to identify the positions of multiple marker images of the input device 16 included in the captured image. Although one input device 16 may be captured by multiple imaging devices 14 at the same time, since the mounting position and orientation of the imaging devices 14 are known, the information processing device 10 may synthesize the multiple captured images to identify the positions of the marker images. 【0022】 The three-dimensional shape of the input device 16 and the position coordinates of multiple markers placed on its surface are known, and the information processing device 10 estimates the position and orientation of the input device 16 in real space based on the position coordinates of the multiple marker images in the captured image. The position of the input device 16 is estimated as a world coordinate value in a three-dimensional space with a reference position as the origin, and the reference position may be a position coordinate (latitude, longitude, altitude) set before the start of the game. 【0023】 The information processing device 10 in this embodiment includes a function (hereinafter also referred to as the "second estimation function") that analyzes sensor data transmitted from the input device 16 to estimate the velocity, position, and orientation of the input device 16 in real space. The information processing device 10 derives the position and orientation of the input device 16 using the estimation results from the first estimation function and the estimation results from the second estimation function. The information processing device 10 in this embodiment uses a state estimation technique using a Kalman filter to integrate the estimation results from the first estimation function and the estimation results from the second estimation function, thereby estimating the state of the input device 16 at the current time with high accuracy. 
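To make the integration in 【0023】 concrete, the following is a minimal sketch of a predict-and-correct loop, assuming a one-dimensional position and velocity state and a plain linear Kalman filter. The embodiment's actual filter, state vector, and noise models are not specified at this point in the text, so the class name `Kalman1D`, the noise variances, and the one-axis simplification are all assumptions for illustration. The prediction step stands in for the sensor-data estimate (second estimation function) and the correction step for the image-based estimate (first estimation function).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Kalman1D:
    """Linear Kalman filter over one axis; state is (position p, velocity v)."""
    p: float = 0.0
    v: float = 0.0
    # 2x2 state covariance, stored as nested lists.
    P: List[List[float]] = field(
        default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])

    def predict(self, accel: float, dt: float, accel_var: float) -> None:
        """Propagate the state by integrating one acceleration sample."""
        self.p += self.v * dt + 0.5 * accel * dt * dt
        self.v += accel * dt
        (a, b), (c, d) = self.P
        g0, g1 = 0.5 * dt * dt, dt  # how acceleration noise enters p and v
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = G G^T * accel_var
        self.P = [
            [a + dt * (b + c) + dt * dt * d + g0 * g0 * accel_var,
             b + dt * d + g0 * g1 * accel_var],
            [c + dt * d + g0 * g1 * accel_var,
             d + g1 * g1 * accel_var],
        ]

    def correct(self, z: float, obs_var: float) -> None:
        """Correct the prediction with an observed position z."""
        (a, b), (c, d) = self.P
        s = a + obs_var          # innovation variance
        k0, k1 = a / s, c / s    # Kalman gain for p and v
        y = z - self.p           # innovation
        self.p += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P, with H = [1, 0]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
```

A typical use would call `predict` for every sensor sample and `correct` whenever an image-based position arrives, so the two streams can run at different rates.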
【0024】 Figure 2 shows an example of the external shape of the HMD100. The HMD100 consists of an output mechanism 102 and a mounting mechanism 104. The mounting mechanism 104 includes a mounting band 106 that wraps around the user's head to secure the HMD100 to the head. The mounting band 106 has a material or structure that allows its length to be adjusted to the user's head circumference. 【0025】 The output mechanism 102 includes a housing 108 shaped to cover the left and right eyes when the HMD 100 is worn by the user, and inside is a display panel that faces the eyes when worn. The display panel may be an LCD panel or an organic EL panel. Inside the housing 108 is a pair of left and right optical lenses positioned between the display panel and the user's eyes to expand the user's field of view. The HMD 100 may also be equipped with speakers or earphones at positions corresponding to the user's ears, and may be configured to allow connection of external headphones. 【0026】 Multiple imaging devices 14a, 14b, 14c, and 14d are provided on the front outer surface of the housing 108. Using the user's face as a reference, imaging device 14a is mounted in the upper right corner of the front outer surface with the camera optical axis pointing diagonally upward to the right; imaging device 14b is mounted in the upper left corner of the front outer surface with the camera optical axis pointing diagonally upward to the left; imaging device 14c is mounted in the lower right corner of the front outer surface with the camera optical axis pointing diagonally downward to the right; and imaging device 14d is mounted in the lower left corner of the front outer surface with the camera optical axis pointing diagonally downward to the left. By installing multiple imaging devices 14 in this manner, the combined imaging range of each device covers the entire user's field of view. This user's field of view may be the user's field of view in a three-dimensional virtual space. 
【0027】 The HMD100 transmits sensor data detected by the IMU (Inertial Measurement Unit) and image data captured by the imaging device 14 to the information processing device 10, and also receives game image data and game audio data generated by the information processing device 10. 【0028】 Figure 3 shows the functional blocks of the HMD100. The control unit 120 is the main processor that processes and outputs various data such as image data, audio data, and sensor data, as well as commands. The memory unit 122 temporarily stores the data and commands processed by the control unit 120. The IMU 124 acquires sensor data related to the movement of the HMD100. The IMU 124 may include at least a three-axis acceleration sensor and a three-axis angular velocity sensor. The IMU 124 detects the value of each axis component (sensor data) at a predetermined rate (for example, 800 Hz). 【0029】 The communication control unit 128 transmits data output from the control unit 120 to the external information processing device 10 via a network adapter or antenna, either by wired or wireless communication. The communication control unit 128 also receives data from the information processing device 10 and outputs it to the control unit 120. 【0030】 When the control unit 120 receives game image data and game audio data from the information processing device 10, it supplies the image data to the display panel 130 for display and the audio data to the audio output unit 132 for audio output. The display panel 130 consists of a left-eye display panel 130a and a right-eye display panel 130b, and a pair of parallax images is displayed, one on each display panel. The control unit 120 also transmits sensor data from the IMU 124, audio data from the microphone 126, and captured image data from the imaging device 14 to the information processing device 10 via the communication control unit 128. 【0031】 Figure 4(a) shows the shape of the left-handed input device 16a. 
The left-handed input device 16a comprises a case body 20, a plurality of operating members 22a, 22b, 22c, and 22d (hereinafter referred to as "operating members 22" unless otherwise specified) operated by the user, and a plurality of markers 30 that emit light to the outside of the case body 20. The markers 30 may have an emission portion with a circular cross-section. The operating members 22 may include an analog stick for tilting operation, a push-button, etc. The case body 20 has a gripping portion 21 and a curved portion 23 connecting the top and bottom of the case body, and the user places their left hand in the curved portion 23 and grips the gripping portion 21. While gripping the gripping portion 21, the user uses their left thumb to operate the operating members 22a, 22b, 22c, and 22d. 【0032】 Figure 4(b) shows the shape of the right-handed input device 16b. The right-handed input device 16b comprises a case body 20, a plurality of operating members 22e, 22f, 22g, and 22h (hereinafter referred to as "operating members 22" unless otherwise specified) operated by the user, and a plurality of markers 30 that emit light to the outside of the case body 20. The operating members 22 may include an analog stick for tilting operation, a push-button, etc. The case body 20 has a gripping portion 21 and a curved portion 23 connecting the top and bottom of the case body, and the user places their right hand in the curved portion 23 and grips the gripping portion 21. While gripping the gripping portion 21, the user uses their right thumb to operate the operating members 22e, 22f, 22g, and 22h. 【0033】 Figure 5 shows the shape of the input device 16b for the right hand. In addition to the operating members 22e, 22f, 22g, and 22h shown in Figure 4(b), the input device 16b has operating members 22i and 22j. The user holds the gripping part 21 and operates operating member 22i with the index finger of the right hand and operating member 22j with the middle finger. 
Hereafter, unless otherwise distinguished, the input device 16a and the input device 16b will be referred to as "input device 16". 【0034】 The operating members 22 provided on the input device 16 may be equipped with a touch-sense function that recognizes a finger simply by touching it, without requiring a press. Regarding the input device 16b for the right hand, the operating members 22f, 22g, and 22j may be equipped with capacitive touch sensors. While the touch sensors may be mounted on other operating members 22, it is preferable that they be mounted on operating members 22 that do not come into contact with the surface when the input device 16 is placed on a surface such as a table. 【0035】 The marker 30 is a light-emitting part that emits light to the outside of the case body 20, and includes a resin part on the surface of the case body 20 that diffuses and emits light to the outside from a light source such as an LED (Light Emitting Diode) element. The marker 30 is photographed by the imaging device 14 and used for tracking processing of the input device 16. 【0036】 The information processing device 10 uses the images captured by the imaging device 14 for tracking processing of the input device 16 and for SLAM (Simultaneous Localization and Mapping) processing of the HMD 100. In this embodiment, of the images captured by the imaging device 14 at 120 frames per second, the grayscale images captured at 60 frames per second are used for tracking processing of the input device 16, and another full-color image captured at 60 frames per second may be used for the HMD 100 to simultaneously perform self-position estimation and environmental map creation. 【0037】 Figure 6 shows an example of a portion of an image taken of the input device 16. This image is of the input device 16b held in the right hand and includes images of multiple markers 30 that emit light. 
In the HMD 100, the communication control unit 128 transmits the image data captured by the imaging device 14 to the information processing device 10 in real time. 【0038】 Figure 7 shows the functional blocks of the input device 16. The control unit 50 receives operation information input to the operating member 22. The control unit 50 also receives sensor data detected by the IMU (Inertial Measurement Unit) 32 and sensor data detected by the touch sensor 24. As described above, the touch sensor 24 is attached to at least some of the multiple operating members 22 and detects when the user's finger is in contact with the operating member 22. 【0039】 The IMU 32 acquires sensor data related to the movement of the input device 16 and includes an accelerometer 34 that detects acceleration data for at least three axes and an angular velocity sensor 36 that detects angular velocity data for three axes. The accelerometer 34 and angular velocity sensor 36 detect values (sensor data) for each axis component at a predetermined rate (e.g., 800 Hz). The control unit 50 supplies the received operation information and sensor data to the communication control unit 54, and the communication control unit 54 transmits the operation information and sensor data to the information processing device 10 via a network adapter or antenna by wired or wireless communication. 【0040】 The input device 16 includes multiple light sources 58 for illuminating multiple markers 30. The light sources 58 may be LED elements that emit light in a predetermined color. When the communication control unit 54 receives a light emission instruction from the information processing device 10, the control unit 50 illuminates the light sources 58 based on the light emission instruction, thereby illuminating the markers 30. In the example shown in Figure 7, one light source 58 is provided for each marker 30, but one light source 58 may illuminate multiple markers 30. 
【0041】 The vibrator 52 provides the user with tactile stimuli for game presentation. During the user's gameplay, the information processing device 10 transmits vibration instructions to the input device 16 according to the game's progress. When the communication control unit 54 receives a vibration instruction from the information processing device 10, the control unit 50 vibrates the vibrator 52 based on the instruction. By providing the user with tactile sensations corresponding to the game's progress, the vibrator 52 can enhance the user's immersion in the game. The vibrator 52 may be, for example, a voice coil motor. 【0042】 Figure 8 shows the functional blocks of the information processing device 10. The information processing device 10 comprises a processing unit 200 and a communication unit 202. The processing unit 200 comprises an acquisition unit 210, a game execution unit 220, an image signal processing unit 222, a marker information holding unit 224, a state holding unit 226, an estimation processing unit 230, a contact determination unit 232, a stationary determination unit 234, an image signal processing unit 268, and a SLAM processing unit 270. The communication unit 202 receives operation information and sensor data of the operating member 22 transmitted from the input device 16 and supplies it to the acquisition unit 210. The communication unit 202 also receives captured image data and sensor data transmitted from the HMD 100 and supplies it to the acquisition unit 210. The acquisition unit 210 comprises a captured image acquisition unit 212, a sensor data acquisition unit 214, and an operation information acquisition unit 216. 【0043】 The information processing device 10 includes a computer, and various functions shown in Figure 8 are realized by the computer executing a program. The computer includes, as hardware, a memory for loading the program, one or more processors for executing the loaded program, auxiliary storage devices, and other LSIs. 
The processor is composed of multiple electronic circuits, including semiconductor integrated circuits and LSIs, and these multiple electronic circuits may be mounted on one chip or on multiple chips. The functional blocks shown in Figure 8 are realized through the cooperation of hardware and software, and therefore, it will be understood by those skilled in the art that these functional blocks can be realized in various ways by hardware alone, software alone, or a combination thereof. 【0044】 (SLAM function) The captured image acquisition unit 212 acquires a full-color image for SLAM processing of the HMD 100 and supplies it to the image signal processing unit 268. The image signal processing unit 268 applies image signal processing such as noise reduction and optical correction (shading correction) to the image data and supplies the processed image data to the SLAM processing unit 270. 【0045】 The sensor data acquisition unit 214 acquires sensor data transmitted from the HMD 100 and supplies it to the SLAM processing unit 270. The SLAM processing unit 270 simultaneously performs self-position estimation of the HMD 100 and environmental map creation based on the image data supplied from the captured image acquisition unit 212 and the sensor data supplied from the sensor data acquisition unit 214. 【0046】 (First estimation function using captured images) The captured image acquisition unit 212 acquires a grayscale image for tracking processing of the input device 16 and supplies it to the image signal processing unit 222. The image signal processing unit 222 applies image signal processing such as noise reduction and optical correction (shading correction) to the image data and supplies the processed image data to the first estimation processing unit 240. 
【0047】 The first estimation processing unit 240 includes a marker image coordinate identification unit 242, a position and orientation derivation unit 244, and a noise derivation unit 246, and realizes a first estimation function that estimates the position and orientation of the input device 16 based on an image of the input device 16. The first estimation processing unit 240 extracts marker images of multiple markers 30 of the input device 16 from the captured image, and estimates the position and orientation of the input device 16 from the arrangement of the extracted multiple marker images. The first estimation processing unit 240 outputs the estimated position and orientation of the input device 16, along with the variance of its noise (error), to the third estimation processing unit 260. 【0048】 (Second estimation function using sensor data) The sensor data acquisition unit 214 acquires sensor data transmitted from the input device 16 and supplies it to the second estimation processing unit 250. The second estimation processing unit 250 implements a second estimation function that estimates the velocity, position, and attitude of the input device 16 based on the sensor data indicating the acceleration and angular velocity of the input device 16. In this embodiment, the second estimation function is a function that performs the state prediction step in the Kalman filter, and the second estimation processing unit 250 estimates the state vector at the current time by adding the amount of change in the state vector obtained by integrating the supplied sensor data to the state vector (velocity, position, attitude) at the previous time. The second estimation processing unit 250 outputs the estimated state vector, along with its noise variance, to the third estimation processing unit 260. 
Note that the amount of change obtained by integration accumulates noise over time, so the state vector (velocity, position, attitude) estimated by the second estimation processing unit 250 tends to deviate from the actual state vector (velocity, position, attitude). 【0049】 (Integration function of estimation results) The third estimation processing unit 260 derives the velocity, position, and orientation of the input device 16 with high accuracy from the position and orientation of the input device 16 estimated by the first estimation processing unit 240 and the state vector (velocity, position, orientation) of the input device 16 estimated by the second estimation processing unit 250. The third estimation processing unit 260 may perform a UKF (unscented Kalman filter) filtering step (correction step). The third estimation processing unit 260 acquires the state vector estimated by the second estimation processing unit 250 as a "prior estimate," acquires the position and orientation estimated by the first estimation processing unit 240 as "observed values," calculates the Kalman gain, and obtains a "post-estimate" by correcting the "prior estimate" using the Kalman gain. The "post-estimate" accurately represents the velocity, position, and orientation of the input device 16 and is provided to the game execution unit 220, as well as recorded in the state holding unit 226 and used by the second estimation processing unit 250 to estimate the state vector at the next time step. 【0050】 The technique of improving accuracy by integrating analysis results from multiple sensors, such as the imaging device 14 and the IMU 32, is known as sensor fusion. In sensor fusion, it is necessary to represent the time at which data was acquired by each sensor on a common time axis. In the information processing system 1, the imaging period of the imaging device 14 and the sampling period of the IMU 32 are different and asynchronous. 
Therefore, by accurately managing the image acquisition time and the detection time of acceleration and angular velocity, the third estimation processing unit 260 can estimate the position and orientation of the input device 16 with high accuracy. 【0051】 The operation information acquisition unit 216 acquires operation information transmitted from the input device 16 and supplies it to the game execution unit 220. The game execution unit 220 proceeds with the game based on the operation information and the position and orientation information of the input device 16 estimated by the estimation processing unit 230. 【0052】 Figure 9 is a flowchart showing the position and orientation estimation process by the first estimation processing unit 240. The captured image acquisition unit 212 acquires image data of the input device 16 (S10) and supplies it to the image signal processing unit 222. The image signal processing unit 222 performs image signal processing such as noise reduction and optical correction on the image data (S12) and supplies the processed image data to the marker image coordinate identification unit 242. 【0053】 The marker image coordinate identification unit 242 identifies the representative coordinates of multiple marker images included in the captured image (S14). When the brightness of each pixel in a grayscale image is represented by 8 bits and takes values from 0 to 255, the marker image is captured as a high-brightness image as shown in Figure 6. The marker image coordinate identification unit 242 may identify a region of the captured image in which pixels with brightness values at or above a predetermined value (for example, 128) are continuous, calculate the centroid coordinates of that continuous pixel region, and use them as the representative coordinates of the marker image. 【0054】 The captured image includes not only the marker image but also images of lighting equipment such as lamps.
Therefore, the marker image coordinate identification unit 242 checks whether a continuous pixel region with brightness values greater than or equal to the predetermined value corresponds to a marker image by comparing it against several predetermined criteria. For example, if the continuous pixel region is too large or elongated, it clearly does not correspond to a marker image, and the marker image coordinate identification unit 242 may determine that such a region is not a marker image. The marker image coordinate identification unit 242 calculates the centroid coordinates of each continuous pixel region that satisfies the predetermined criteria, identifies them as the representative coordinates of the marker image (marker image coordinates), and stores the identified representative coordinates in memory (not shown). 【0055】 The marker information holding unit 224 holds the three-dimensional coordinates of each marker in a three-dimensional model of the input device 16 at a reference position and reference orientation. Solving the PnP (Perspective-n-Point) problem is a known method for estimating the position and orientation of an imaging device that has captured an image of an object whose three-dimensional shape and size are known. 【0056】 In this embodiment, the position and orientation derivation unit 244 reads N (where N is an integer of 3 or more) marker image coordinates from memory (not shown), and estimates the position and orientation of the input device 16 from the read N marker image coordinates and the three-dimensional coordinates of the N markers in the three-dimensional model of the input device 16. The position and orientation derivation unit 244 estimates the position and orientation of the imaging device 14 using the following (Equation 1), and derives the position and orientation of the input device 16 in three-dimensional space based on the estimation result.
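The marker extraction described in paragraphs 0053 and 0054 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the marker image coordinate identification unit 242: the function name, the 4-connected flood fill, and the `max_area` and `max_aspect` limits are hypothetical stand-ins for the "predetermined criteria", which the text does not specify. Only the 8-bit brightness range and the example threshold of 128 come from the text.

```python
from collections import deque


def find_marker_centroids(image, threshold=128, max_area=50, max_aspect=3.0):
    """Return centroids (x, y) of bright regions that pass simple shape checks.

    image: list of rows of 8-bit brightness values (0 to 255).
    max_area, max_aspect: hypothetical limits that reject regions which are
    too large or too elongated to be a marker (for example, a lamp).
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or image[y0][x0] < threshold:
                continue
            # Flood-fill one 4-connected region of pixels at or above threshold.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            box_w = max(xs) - min(xs) + 1
            box_h = max(ys) - min(ys) + 1
            aspect = max(box_w, box_h) / min(box_w, box_h)
            if len(pixels) > max_area or aspect > max_aspect:
                continue  # too large or too elongated: not treated as a marker
            # Centroid of the region serves as the representative coordinate.
            centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centroids
```

A region's bounding-box aspect ratio is only one possible way to test for a "long shape"; a real tracker would likely tune such checks to the marker size expected at the working distance.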
【Equation 1】 【0057】 Here, (u, v) are the marker image coordinates in the captured image, and (X, Y, Z) are the position coordinates in three-dimensional space of the marker 30 when the three-dimensional model of the input device 16 is in the reference position and reference orientation. The three-dimensional model has exactly the same shape and size as the input device 16, with markers arranged at the same positions. The marker information holding unit 224 holds the three-dimensional coordinates of each marker in the three-dimensional model in the reference position and reference orientation. The position and orientation derivation unit 244 reads out the three-dimensional coordinates of each marker from the marker information holding unit 224 and obtains (X, Y, Z). 【0058】 (f_x, f_y) is the focal length of the imaging device 14 and (c_x, c_y) is the principal point of the image; both are internal parameters of the imaging device 14. The matrix with elements r_11 to r_33 and t_1 to t_3 is the rotation and translation matrix. In Equation (1), (u, v), (f_x, f_y), (c_x, c_y), and (X, Y, Z) are known, and the position and orientation derivation unit 244 solves the equation for N markers 30 to obtain the rotation and translation matrix common to them. In the embodiment, the process of estimating the position and orientation of the input device 16 is implemented by solving the P3P problem. 【0059】 Specifically, the position and orientation derivation unit 244 extracts any three marker image coordinates from among the plurality of marker image coordinates identified by the marker image coordinate identification unit 242. The position and orientation derivation unit 244 reads out the three-dimensional coordinates of the markers in the three-dimensional model from the marker information holding unit 224 and solves the P3P problem using Equation (1).
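The image of Equation (1) is not reproduced in this text. Assuming it has the standard pinhole form s·(u, v, 1)^T = A·[R|t]·(X, Y, Z, 1)^T, where A holds (f_x, f_y) and (c_x, c_y), [R|t] holds r_11 to r_33 and t_1 to t_3, and s is a scale factor that the text does not name, the projection and the reprojection-error comparison between candidate matrices might be sketched as follows. The P3P solver itself is not shown: `candidates` is a hypothetical list of (rotation, translation) results from such a solver, and all function names are illustrative.

```python
import math


def project(point, rotation, translation, fx, fy, cx, cy):
    """Project a model point (X, Y, Z) to image coordinates (u, v).

    Assumes a pinhole camera. rotation is a 3x3 nested list and
    translation a 3-element sequence.
    """
    X, Y, Z = point
    cam = [rotation[i][0] * X + rotation[i][1] * Y + rotation[i][2] * Z
           + translation[i] for i in range(3)]
    # Dividing by the camera-space depth removes the scale factor s.
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy)


def reprojection_error(pose, model_points, image_points, intrinsics):
    """Mean pixel distance between observed and reprojected marker coordinates."""
    rotation, translation = pose
    total = 0.0
    for point, (u, v) in zip(model_points, image_points):
        pu, pv = project(point, rotation, translation, *intrinsics)
        total += math.hypot(pu - u, pv - v)
    return total / len(model_points)


def select_best_pose(candidates, model_points, image_points, intrinsics):
    """Return the candidate (rotation, translation) with the smallest error."""
    return min(
        candidates,
        key=lambda pose: reprojection_error(
            pose, model_points, image_points, intrinsics),
    )
```

In the scheme the text describes, the markers passed to `reprojection_error` would be those other than the three used to compute each candidate, so that the error measures how well the candidate explains independent observations.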
When the position and orientation derivation unit 244 has identified the rotation and translation matrix common to the three extracted marker image coordinates, it calculates the reprojection error using the marker image coordinates of the input device 16 other than the three extracted marker image coordinates. 【0060】 The position and orientation derivation unit 244 extracts a predetermined number of combinations of three marker image coordinates. For each extracted combination, the position and orientation derivation unit 244 identifies a rotation and translation matrix and calculates its reprojection error. Then, the position and orientation derivation unit 244 identifies the rotation and translation matrix with the smallest reprojection error among the predetermined number of reprojection errors and derives the position and orientation of the input device 16 (S16). Here, the position and orientation derivation unit 244 derives the world coordinate position and orientation of the input device 16 by combining the position and orientation of the input device 16 estimated in the HMD coordinate system with the world coordinate position and orientation of the HMD 100. 【0061】 The noise derivation unit 246 derives the variance of the noise (error) for each of the estimated position and orientation (S18). The noise variance corresponds to the reliability of the estimated position and orientation; the higher the reliability, the smaller the variance, and the lower the reliability, the larger the variance. The noise derivation unit 246 may derive the noise variance based on the distance between the imaging device 14 and the input device 16, and the position of the marker image within the field of view.
For example, if the imaging device 14 and the input device 16 are far apart or extremely close, or if the marker image is located at the edge of the captured image, it becomes difficult to derive the accurate centroid coordinates of the marker image, and the noise variance tends to be large.
【0062】 The position and orientation estimation process by the first estimation processing unit 240 is performed at the image acquisition cycle (60 frames / second) of the tracking image of the input device 16 (N in S20). When the game execution unit 220 ends the game, the position and orientation estimation process by the first estimation processing unit 240 ends (Y in S20).
【0063】 Figure 10 shows the internal configuration of the estimation processing unit 230. At time k, the first estimation processing unit 240 outputs the estimated position and orientation as the "observed value n_k", and the variances of the position noise and orientation noise as the "observation noise R_k", to the third estimation processing unit 260.
• Observed value n_k: observation vector at time k
• Observation noise R_k: error covariance matrix of the observed value at time k
【0064】 The second estimation processing unit 250 reads the "state vector m_{k-1|k-1}" and the "estimation error P_{k-1|k-1}" of one time step earlier (time k-1) from the state holding unit 226 and inputs them to the prediction unit. The state variable m in this embodiment includes the velocity, position, and attitude of the input device 16, but may also include acceleration bias and angular velocity bias.
• State vector m_{k-1|k-1}: state vector at time k-1, estimated using information up to time k-1
• Estimation error P_{k-1|k-1}: estimation error covariance matrix of the state at time k-1, estimated using information up to time k-1
【0065】 Furthermore, the second estimation processing unit 250 acquires the acceleration a_k and angular velocity ω_k of the input device 16 from the sensor data acquisition unit 214 and inputs them to the prediction unit as the "process input l_k".
• Acceleration a_k: acceleration at time k
• Angular velocity ω_k: angular velocity at time k
• Process input l_k: process input vector at time k
【0066】 The second estimation processing unit 250 calculates the variances of the acceleration noise and angular velocity noise from the acceleration a_k, the angular velocity ω_k, and fixed noise parameters (including axial misalignment, scale misalignment, value misalignment, and bias misalignment), and inputs them to the prediction unit as the "process noise Q_k".
• Process noise Q_k: error covariance matrix of the process input at time k
【0067】 The prediction unit integrates the acceleration a_k and the angular velocity ω_k to calculate the amount of change from the "state vector m_{k-1|k-1}" (that is, the change in velocity, change in position, and change in attitude), and adds it to the "state vector m_{k-1|k-1}". Specifically, the prediction unit calculates the change in velocity by integrating the acceleration a_k, and estimates the velocity at time k by adding the calculated change in velocity to the velocity at time k-1 included in the "state vector m_{k-1|k-1}". The prediction unit calculates the change in position by integrating the estimated velocity at time k, and estimates the position at time k by adding the calculated change in position to the position at time k-1 included in the "state vector m_{k-1|k-1}". The prediction unit also calculates the change in attitude by integrating the angular velocity ω_k, and estimates the attitude at time k by adding the calculated change in attitude to the attitude at time k-1 included in the "state vector m_{k-1|k-1}". In this way, the prediction unit calculates the "state vector m_{k|k-1}". The prediction unit outputs the "state vector m_{k|k-1}" and the "estimation error P_{k|k-1}" to the third estimation processing unit 260.
• State vector m_{k|k-1}: state vector at time k, estimated using information up to time k-1
• Estimation error P_{k|k-1}: estimation error covariance matrix of the state at time k, estimated using information up to time k-1
【0068】 The third estimation processing unit 260 acquires the "observed value n_k" and the "observation noise R_k" from the first estimation processing unit 240, and acquires the "state vector m_{k|k-1}" and the "estimation error P_{k|k-1}" from the second estimation processing unit 250, and calculates the Kalman gain for correcting the "state vector m_{k|k-1}". The third estimation processing unit 260 corrects the "state vector m_{k|k-1}" using the Kalman gain and outputs the "state vector m_{k|k}" and the "estimation error P_{k|k}".
• State vector m_{k|k}: state vector at time k, estimated using information up to time k
• Estimation error P_{k|k}: estimation error covariance matrix of the state at time k, estimated using information up to time k
【0069】 The state vector estimated by the second estimation processing unit 250 contains integration drift, and the drift accumulates each time the integration is repeated. In the estimation processing unit 230, the third estimation processing unit 260 corrects the state vector (velocity, position, attitude) output by the second estimation processing unit 250 using the highly reliable "observed value n_k" at the tracking processing period (60 Hz) of the first estimation processing unit 240, thereby estimating a highly accurate "state vector m_{k|k}". The "state vector m_{k|k}" includes the velocity, position, and attitude in world coordinates, is provided to the game execution unit 220, and may be used for game operation. The "state vector m_{k|k}" and the "estimation error P_{k|k}" are temporarily held in the state holding unit 226 and read out during the estimation process at time k+1 in the second estimation processing unit 250.
【0070】 In the estimation processing unit 230, the estimation process by the first estimation processing unit 240 is performed at 60 Hz, while the estimation process by the second estimation processing unit 250 is performed at 800 Hz. Therefore, between the time the first estimation processing unit 240 outputs an observed value and the time it outputs the next observed value, the second estimation processing unit 250 sequentially updates the state vector, and during this time the state vector is not corrected. In the embodiment, the estimation processing unit 230 performs the correction step based on the state at time k-1 immediately preceding the observation time k, that is, it uses the observed value to correct a past state.
【0071】 In the information processing device 10, the world coordinate position of the input device 16 is determined by combining the position of the input device 16 estimated in the HMD coordinate system of the HMD 100 with the world coordinate position of the HMD 100. The world coordinate system generated by SLAM contains errors, and errors are also introduced by the estimation process in the estimation processing unit 230, so the estimated world coordinate position of the input device 16 always contains errors.
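The prediction and correction steps of paragraphs 0064 to 0069 can be illustrated with a deliberately reduced example. The sketch below is a linear Kalman filter on a single axis with state (position, velocity); it is a simplified stand-in, not the UKF that the third estimation processing unit 260 may use, and it omits attitude, biases, and the time alignment of paragraph 0070. The class name, method names, and noise values are assumptions made for illustration.

```python
class PositionVelocityFilter:
    """1-D Kalman filter with state (position, velocity) and covariance P."""

    def __init__(self, position=0.0, velocity=0.0, variance=1.0):
        self.p = position
        self.v = velocity
        self.P = [[variance, 0.0], [0.0, variance]]

    def predict(self, accel, dt, accel_var):
        """Prediction step: integrate one acceleration sample.

        Mirrors the order in the text: the velocity is updated first, and
        the position is then advanced using the updated velocity.
        """
        self.v += accel * dt
        self.p += self.v * dt
        (a, b), (c, d) = self.P
        # P <- F P F^T + G q G^T with F = [[1, dt], [0, 1]], G = [dt^2, dt].
        self.P = [
            [a + dt * (b + c) + dt * dt * d + accel_var * dt ** 4,
             b + dt * d + accel_var * dt ** 3],
            [c + dt * d + accel_var * dt ** 3,
             d + accel_var * dt ** 2],
        ]

    def correct(self, observed_position, obs_var):
        """Correction step: blend in one image-based position observation."""
        (a, b), (c, d) = self.P
        s = a + obs_var                  # innovation variance
        k0, k1 = a / s, c / s            # Kalman gain for H = [1, 0]
        residual = observed_position - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
```

With the rates given in the text, `predict` would be called for every 800 Hz sensor sample and `correct` roughly once every 13 predictions, when a 60 Hz image-based observation arrives. Between corrections the covariance grows, which is the integration drift that paragraph 0069 describes.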
【0072】 Therefore, even when the input device 16 is placed on a table or the like and not touched by the user, the position of the input device 16 estimated by the estimation processing unit 230 will change slightly due to fluctuations in the error component and will not be maintained in exactly the same position. In particular, when the HMD 100 equipped with the imaging device 14 moves significantly, the error component increases, causing the estimated world coordinate position to fluctuate even though the input device 16 is not moved at all on the table. For this reason, the estimation processing unit 230 in this embodiment performs a position fixing process to fix and prevent fluctuations in the world coordinate position of the input device 16 when it is certain that the input device 16 is stationary. 【0073】 <Position fixing process> The processing unit 200 includes a contact determination unit 232 and a stationary determination unit 234 to determine whether the input device 16 is stationary. First, the stationary determination unit 234 determines whether the input device 16 is stationary based on sensor data indicating the acceleration and / or angular velocity of the input device 16. The stationary determination unit 234 may determine whether the input device 16 is stationary based, for example, on sensor data indicating angular velocity. 【0074】 In this case, the stationary determination unit 234 determines that the input device 16 is stationary when the angular velocity is below a predetermined threshold for a predetermined period of time, and determines that the input device 16 is moving if the angular velocity does not fall below the predetermined threshold for a predetermined period of time. The time for determining stationary status may be 1.5 seconds, and the predetermined threshold may be set to, for example, 3 degrees per second. 
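The angular velocity test just described can be sketched as follows. This is a minimal illustration, assuming one angular speed value per sensor sample. The 1.5 second hold time and the 3 degrees per second threshold are the example values from the text; the 800 Hz sample rate is an assumption borrowed from the rate stated for the second estimation process, and the class and parameter names are hypothetical.

```python
class StationaryDetector:
    """Reports stationary once the angular speed stays below a threshold
    for a continuous hold time."""

    def __init__(self, sample_rate_hz=800.0, hold_seconds=1.5, threshold_dps=3.0):
        # Number of consecutive quiet samples required (1200 at the defaults).
        self.required = int(round(sample_rate_hz * hold_seconds))
        self.threshold = threshold_dps
        self.quiet_samples = 0

    def update(self, angular_speed_dps):
        """Feed one angular speed sample (degrees per second).

        Returns True while the device is judged stationary.
        """
        if angular_speed_dps < self.threshold:
            self.quiet_samples += 1
        else:
            self.quiet_samples = 0  # any fast sample restarts the hold time
        return self.quiet_samples >= self.required
```

A single sample at or above the threshold resets the counter, so the detector reports movement immediately but needs the full hold time before it reports a stationary state again.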
The stationary determination unit 234 may also determine that the input device 16 is stationary if the variance of the most recent 10 samples is 1 degree per second or less as an additional condition. Determining that the input device 16 is stationary constitutes the first condition for the estimation processing unit 230 to perform position fixing processing. 【0075】 The contact determination unit 232 determines whether the user is touching the input device 16. The contact determination unit 232 may determine whether the user is touching the input device 16 based on sensor data from the touch sensor 24 provided on the input device 16. Determining that the user is not touching the input device 16 constitutes a second condition for the estimation processing unit 230 to perform position fixing processing. 【0076】 The position fixing unit 262 detects that the input device 16 is reliably stationary when both the first and second conditions are met simultaneously. According to the Discloser's experiments, it has been found that a stationary determination time of about 10 seconds is required to reliably detect that the input device 16 is stationary using only the first condition. The Discloser has found that by adding the second condition, the stationary determination time for the first condition can be shortened (for example, to 1.5 seconds). Therefore, in the embodiment, the stationary state is reliably detected in a short time by having both the first and second conditions met simultaneously. 【0077】 When both the first and second conditions are met simultaneously, the position fixing unit 262 performs a position fixing process to fix the world coordinate position P of the input device 16 estimated by the third estimation processing unit 260. While the position fixing process is being performed, the position fixing unit 262 prevents the third estimation processing unit 260 from deriving a new estimated position. 
Since the estimated position is not updated, the position fixing unit 262 may suspend the execution of the state prediction step by the second estimation processing unit 250 and the correction step by the third estimation processing unit 260. 【0078】 The position fixing unit 262 performs the position fixing process to avoid situations where the position of the completely stationary input device 16 fluctuates in the world coordinate system. If the contact determination unit 232 determines that the user is touching the input device 16, or if the stationary determination unit 234 determines that the input device 16 is moving, the position fixing unit 262 terminates the position fixing process and releases the fixed position of the input device 16. At the same time, the position fixing unit 262 restarts the operation of the second estimation processing unit 250 and the third estimation processing unit 260 to perform normal position and orientation estimation processing. After the position fixing process ends, the third estimation processing unit 260 estimates the velocity, position, and orientation of the input device 16 with high accuracy from the position and orientation of the input device 16 estimated by the first estimation processing unit 240 and the state vector of the input device 16 estimated by the second estimation processing unit 250. 【0079】 When the position fixing process ends and the normal position and orientation estimation process resumes, the fixed position of the input device 16 is instantly updated to the position estimated by the position and orientation estimation process.
Therefore, if there is a large discrepancy (distance in the world coordinate system) between the world coordinate position estimated by the third estimation processing unit 260 and the fixed world coordinate position, the user will have the impression that the game object corresponding to the input device 16 has instantly moved to a different location on the game screen. For this reason, it is preferable to keep the discrepancy (distance) between the estimated world coordinate position and the fixed position as small as possible in preparation for the resumption of the normal position-orientation estimation process. During the position-fixing process, the operation of the second estimation processing unit 250 and the third estimation processing unit 260 is suspended, so in this embodiment, the first estimation processing unit 240 operates to keep the discrepancy (distance) between the estimated position and the fixed position small. 【0080】 The position fixing unit 262 then monitors the relationship between the position estimated by the first estimation processing unit 240 and the fixed position. When the positions of the two reach a predetermined relationship, it temporarily suspends the position fixing process, releases the position of the input device 16, and resumes the normal position and orientation estimation process. Specifically, the position fixing unit 262 monitors the distance between the position estimated by the first estimation processing unit 240 and the fixed position. When the distance between the two positions exceeds a predetermined distance (Dth), it temporarily suspends the position fixing process, releases the position of the input device 16, and resumes the normal position and orientation estimation process. 
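The fix and release behavior of paragraphs 0077 to 0080 can be sketched as a small state holder. This is an illustration under stated assumptions rather than the logic of the position fixing unit 262: the class and function names are hypothetical, positions are plain tuples in meters, and the caller is assumed to supply the fused estimate even while the position is fixed, whereas in the text the second and third estimation processing units are suspended during fixing. The release distances use the example values that paragraph 0081 gives for Dth.

```python
import math


def release_distance_m(camera_distance_m):
    """Example Dth: 5 cm when the device is under 1 m from the camera, else 20 cm."""
    return 0.05 if camera_distance_m < 1.0 else 0.20


class PositionFixer:
    """Holds a fixed world position while the device is untouched and stationary."""

    def __init__(self):
        self.fixed_position = None  # None means normal estimation is running

    def update(self, fused_position, image_position, camera_distance_m,
               is_stationary, is_touched):
        """Return the position to report for this step."""
        if self.fixed_position is not None:
            # While fixed, watch how far the image-based estimate has drifted.
            drift = math.dist(image_position, self.fixed_position)
            if (is_touched or not is_stationary
                    or drift >= release_distance_m(camera_distance_m)):
                self.fixed_position = None  # release; normal estimation resumes
        elif is_stationary and not is_touched:
            # Both conditions hold: fix the newly estimated position.
            self.fixed_position = tuple(fused_position)
        if self.fixed_position is not None:
            return self.fixed_position
        return tuple(fused_position)
```

Because a release and a new fix never happen in the same call, a device that is still untouched and stationary is fixed again on the next step at its newly estimated position, which keeps the gap between the fixed and estimated positions small.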
【0081】 The predetermined distance Dth is determined according to the distance between the input device 16 and the imaging device 14, and the longer the distance between the input device 16 and the imaging device 14, the longer the predetermined distance Dth may be set. For example, when the distance between the input device 16 and the imaging device 14 is less than 1 m, the predetermined distance Dth is set to 5 cm, and when the distance between the input device 16 and the imaging device 14 is 1 m or more, the predetermined distance Dth may be set to 20 cm. As the distance between the input device 16 and the imaging device 14 increases, the estimation accuracy by the first estimation processing unit 240 decreases. Therefore, by setting the predetermined distance Dth longer than when the distance between the input device 16 and the imaging device 14 is short, the situation of frequently interrupting the position fixing process can be avoided. 【0082】 After the normal position and orientation estimation process resumes, if both the first and second conditions are met simultaneously, the position fixing unit 262 performs position fixing processing to fix the world coordinate position P of the input device 16, which has been newly estimated by the third estimation processing unit 260. By adjusting the fixed position each time in this way to prevent a large discrepancy (gap) between the estimated position and the fixed position, it is possible to avoid a situation where the game object corresponding to the input device 16 instantly moves when the normal position and orientation estimation process resumes. 【0083】 Figure 11 shows a flowchart of the position fixing process. Upon the start of the game, the first estimation processing unit 240, the second estimation processing unit 250, and the third estimation processing unit 260 work together to initiate position and orientation estimation (S30). 
During gameplay by the user, the position and orientation estimation process is performed at a frequency of 800 Hz (N in S32). 【0084】 During the position and orientation estimation process, the stationary determination unit 234 determines whether the input device 16 is stationary or not based on sensor data indicating the acceleration and / or angular velocity of the input device 16 (S34). If the angular velocity is not below a predetermined threshold for a predetermined period of time, the stationary determination unit 234 determines that the input device 16 is moving (N in S34), and the position and orientation estimation process continues (S30). The stationary determination unit 234 determines that the input device 16 is stationary when the angular velocity is below a predetermined threshold for a predetermined period of time (Y in S34). In addition, to determine that the input device 16 is stationary, it may be set as an additional condition that the variance of the most recent 10 samples is 1 deg / sec or less. 【0085】 The contact determination unit 232 determines whether the user is touching the input device 16 (S36). The contact determination unit 232 determines that the user is touching the device if the sensor data of the touch sensor 24 is a value indicating contact (Y in S36), and the position and orientation estimation process continues (S30). The contact determination unit 232 determines that the user is not touching the device if the sensor data of the touch sensor 24 is a value indicating non-contact (N in S36). 【0086】 When the input device 16, which is not being touched by the user, is in a stationary state, the position fixing unit 262 fixes the world coordinate position P of the input device 16, which has been estimated by the third estimation processing unit 260 (S38). 
At this time, the position fixing unit 262 stops the execution of the state prediction step by the second estimation processing unit 250 and the correction step by the third estimation processing unit 260, and maintains the fixed position coordinate P. 【0087】 During the position fixing process (N in S40), the position fixing unit 262 monitors the distance between the position estimated by the first estimation processing unit 240 and the fixed position P (S42). In this embodiment, the predetermined distance Dth is set according to the distance between the input device 16 and the imaging device 14. The position fixing unit 262 identifies the distance between the input device 16 and the imaging device 14 and compares the predetermined distance Dth set according to that distance with the distance between the estimated position and the fixed position P. If the distance between the estimated position and the fixed position P is less than the predetermined distance Dth (N in S42), the process returns to S34 to determine if the conditions for continuing the position fixing process are met. On the other hand, if the distance between the estimated position and the fixed position P becomes greater than or equal to the predetermined distance Dth (Y in S42), the position fixing unit 262 temporarily suspends the position fixing process and resumes the position and orientation estimation process in order to perform the fixed position P update process (S30). Subsequently, when the conditions for performing the position fixing process (Y in S34, N in S36) are met, the position fixing unit 262 resumes the position fixing process and fixes the estimated position at that time (S38). When the game execution unit 220 ends the game, this flow ends (Y in S32, Y in S40). 【0088】 The present disclosure has been explained above based on examples. 
The above examples are illustrative, and it will be understood by those skilled in the art that various modifications are possible in combinations of their components and processes, and that such modifications are also within the scope of the present disclosure. In the examples, the estimation process was performed by the information processing device 10, but the functions of the information processing device 10 may be provided in the HMD 100, and the HMD 100 may perform the estimation process. In other words, the HMD 100 may be the information processing device 10. 【0089】 In this embodiment, the arrangement of multiple markers 30 in an input device 16 equipped with an operating member 22 has been described, but the device to be tracked does not necessarily have to be equipped with an operating member 22. In this embodiment, the imaging device 14 is attached to the HMD 100, but the imaging device 14 only needs to be able to capture marker images and may be attached to a location other than the HMD 100. [Industrial applicability] 【0090】 This disclosure can be used in techniques for estimating the location of a device. [Explanation of symbols] 【0091】 1... Information processing system, 10... Information processing device, 14... Imaging device, 16, 16a, 16b... Input device, 20... Case body, 21... Gripping part, 22... Operating member, 23... Curved part, 24... Touch sensor, 30... Marker, 32... IMU, 34... Acceleration sensor, 36... Angular velocity sensor, 50... Control unit, 52... Vibrator, 54... Communication control unit, 58... Light source, 100... HMD, 102... Output mechanism, 104... Mounting mechanism, 106... Mounting band, 108... Housing, 120... Control unit, 122... Memory unit, 124... IMU, 126... Microphone, 128... Communication control unit, 130... Display panel, 130a... Left eye display panel, 130b... Right eye display panel, 132... Audio output unit, 200... Processing unit, 202... Communication unit, 210... Acquisition unit, 212... Captured image acquisition unit, 214... Sensor data acquisition unit, 216... Operation information acquisition unit, 220... Game execution unit, 222... Image signal processing unit, 224... Marker information holding unit, 226... State holding unit, 230... Estimation processing unit, 232... Contact determination unit, 234... Stationary determination unit, 240... First estimation processing unit, 242... Marker image coordinate identification unit, 244... Position and orientation derivation unit, 246... Noise derivation unit, 250... Second estimation processing unit, 260... Third estimation processing unit, 262... Position fixing unit, 268... Image signal processing unit, 270... SLAM processing unit.

Claims

[Claim 1] An information processing device for estimating the position of a device, comprising: a captured image acquisition unit that acquires a captured image of the device; a sensor data acquisition unit that acquires sensor data indicating the acceleration and/or angular velocity of the device; an estimation processing unit that estimates the position of the device based on the captured image of the device; a contact determination unit that determines whether or not a user is touching the device; and a stationary determination unit that determines whether or not the device is stationary based on the sensor data, wherein, when it is determined that the user is not touching the device and the device is stationary, the estimation processing unit fixes the estimated position of the device.
[Claim 2] The information processing device according to claim 1, wherein the contact determination unit determines whether or not a user is touching the device based on sensor data from a touch sensor provided on the device.
[Claim 3] The information processing device according to claim 1, wherein, after fixing the position of the device, the estimation processing unit releases the fixing of the device's position when the contact determination unit determines that a user is touching the device, or when the stationary determination unit determines that the device is moving.
[Claim 4] The information processing device according to claim 1, wherein, after fixing the position of the device, the estimation processing unit releases the fixing of the device's position when the fixed position of the device and the estimated position of the device reach a predetermined relationship.
[Claim 5] The information processing device according to claim 4, wherein, after fixing the position of the device, the estimation processing unit releases the fixing of the device's position when the fixed position of the device and the estimated position of the device are separated by a predetermined distance or more.
[Claim 6] The information processing device according to claim 5, wherein the predetermined distance is set according to the distance between the device and the imaging device.
[Claim 7] The information processing device according to claim 4, wherein the estimation processing unit releases the fixing of the device's position and then fixes the newly estimated position of the device.
[Claim 8] An information processing device for estimating the position of a device, comprising: a captured image acquisition unit that acquires a captured image of the device; a sensor data acquisition unit that acquires sensor data indicating the acceleration and/or angular velocity of the device; a contact determination unit that determines whether or not a user is touching the device; a stationary determination unit that determines whether or not the device is stationary based on the sensor data; and an estimation processing unit that estimates the position of the device, wherein the estimation processing unit includes: a first estimation processing unit that estimates the position of the device based on the captured image of the device; a second estimation processing unit that estimates the position of the device based on the sensor data; and a third estimation processing unit that derives the position of the device based on the position of the device estimated by the first estimation processing unit and the position of the device estimated by the second estimation processing unit, and wherein, when it is determined that the user is not touching the device and the device is stationary, the estimation processing unit fixes the estimated position of the device.
[Claim 9] The information processing device according to claim 8, wherein, after fixing the position of the device, the estimation processing unit releases the fixing of the device's position when the fixed position of the device and the position of the device estimated by the first estimation processing unit reach a predetermined relationship.
[Claim 10] A device position estimation method for estimating the position of a device, comprising: a step of acquiring a captured image of the device; a step of acquiring sensor data indicating the acceleration and/or angular velocity of the device; a step of estimating the position of the device based on the captured image of the device; a step of determining whether or not a user is touching the device; a step of determining whether or not the device is stationary based on the sensor data; and a step of fixing the estimated position of the device when it is determined that the user is not touching the device and the device is stationary.
[Claim 11] A device position estimation method for estimating the position of a device, comprising: a step of acquiring a captured image of the device; a step of acquiring sensor data indicating the acceleration and/or angular velocity of the device; a step of determining whether or not a user is touching the device; a step of determining whether or not the device is stationary based on the sensor data; and an estimation step of estimating the position of the device, wherein the estimation step includes: a first estimation step of estimating the position of the device based on the captured image of the device; a second estimation step of estimating the position of the device based on the sensor data; and a third estimation step of estimating the position of the device based on the position of the device estimated in the first estimation step and the position of the device estimated in the second estimation step, and wherein, when it is determined that the user is not touching the device and the device is stationary, the estimation step fixes the estimated position of the device.
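For illustration only, the fixing and release conditions recited in claims 1, 3 to 5 and 7 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class name `PositionFixer`, the parameter `base_release_distance`, its default value, and the linear scaling of the release distance with camera distance are all hypothetical. Claim 6 states only that the predetermined distance is set according to the distance between the device and the imaging device, not how.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PositionFixer:
    """Hypothetical helper that holds a device position while it is at rest."""

    # Assumed release distance (metres) when the device is 1 m from the camera.
    base_release_distance: float = 0.02
    fixed_position: Optional[Vec3] = None

    def release_threshold(self, camera_distance: float) -> float:
        # Assumption: the threshold grows linearly with camera distance.
        return self.base_release_distance * camera_distance

    def update(self, estimated: Vec3, touching: bool, stationary: bool,
               camera_distance: float) -> Vec3:
        """Return the position to report for this frame."""
        if self.fixed_position is not None:
            if touching or not stationary:
                # Claim 3: release when the user touches or the device moves.
                self.fixed_position = None
            elif (dist(self.fixed_position, estimated)
                  >= self.release_threshold(camera_distance)):
                # Claims 4, 5 and 7: the estimate has drifted too far from the
                # fixed position, so release and fix the new estimate.
                self.fixed_position = estimated
        elif not touching and stationary:
            # Claim 1: fix the position of an untouched, stationary device.
            self.fixed_position = estimated
        if self.fixed_position is not None:
            return self.fixed_position
        return estimated


if __name__ == "__main__":
    fixer = PositionFixer()
    # Device is put down: the position is fixed at the current estimate.
    print(fixer.update((0.1, 0.0, 0.0), touching=False, stationary=True,
                       camera_distance=1.0))
    # Small estimation jitter is suppressed while the position is fixed.
    print(fixer.update((0.105, 0.0, 0.0), touching=False, stationary=True,
                       camera_distance=1.0))
```

While the position is fixed, small frame-to-frame changes in the estimate are not reported, which corresponds to the stabilizing effect the disclosure attributes to fixing the position of a device that is not in use.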