Smart glasses using an event camera
Smart glasses employ an event camera system with AI-assisted image processing to reduce latency, allowing for high refresh rate images and precise image correction, addressing the limitations of existing eye tracking technologies.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: KOREA ELECTRONICS TECH INST
- Filing Date: 2024-12-06
- Publication Date: 2026-06-11
Abstract
Description
Smart glasses using an event camera
[0001] The present invention relates to smart glasses, and more specifically, to smart glasses that estimate a user's gaze and posture to perform foveated rendering and image correction.
[0002] The eye tracking methods used in existing smart glasses-type devices (AR / VR / MR / XR devices, etc.) rely on IR cameras or standard CMOS sensors, which limits eye tracking performance. The Motion-to-Photon latency (MTP latency) generally required of smart glasses is 20ms or less, which corresponds to an update rate of approximately 50Hz (1 / 20ms). Advanced displays, however, take 60Hz or higher for granted and often offer refresh rates ranging from 120Hz to 240Hz.
[0003] Therefore, if the MTP latency of eye tracking remains at 20ms (an effective rate of 50Hz), the tracking stage becomes the bottleneck regardless of how high the video refresh rate is, and this can be considered the biggest challenge in the development of smart glasses. For example, assume that the MTP latency required to prevent the dragging effect perceived by the eye when the head is turned from left to right while wearing smart glasses is 20ms (50Hz). Then, no matter how far the video refresh rate is increased, the rate at which an eye movement can be detected and reflected in the displayed video is limited to 50Hz or less, making it impossible to improve the quality of the video.
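The relationship between the 20ms latency and the 50Hz figure above can be checked with simple arithmetic. The following is an illustrative sketch only; the function names are hypothetical and do not appear in the specification.

```python
# Illustrative arithmetic only; all function names are hypothetical.

def latency_to_rate_hz(latency_ms: float) -> float:
    """Update rate (Hz) that is sustainable if each update takes latency_ms."""
    return 1000.0 / latency_ms


def frame_interval_ms(refresh_hz: float) -> float:
    """Time between two consecutive display frames, in milliseconds."""
    return 1000.0 / refresh_hz


def frames_per_tracking_update(latency_ms: float, refresh_hz: float) -> float:
    """Number of display frames that elapse during one tracking update."""
    return latency_ms / frame_interval_ms(refresh_hz)


if __name__ == "__main__":
    print("20 ms latency ->", latency_to_rate_hz(20.0), "Hz")
    for hz in (60, 120, 240):
        print(hz, "Hz display:",
              round(frame_interval_ms(hz), 2), "ms per frame,",
              round(frames_per_tracking_update(20.0, hz), 2),
              "frames per 20 ms tracking update")
```

Under these numbers a 240Hz display draws several frames for every tracking update, which is why raising the refresh rate alone does not help when tracking stays at 20ms.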
[0004] In addition, even when the head is not turned, whether the head is fixed and the gaze moves or the gaze is generally fixed, the eye constantly exhibits minute movements. Eye tracking therefore demands a very high level of technology, and applying it to smart glasses has been difficult to date.
[0005] The present invention has been devised to solve the above-mentioned problems. Its objective is to provide smart glasses capable of foveated rendering of high refresh rate images and image correction by rapidly tracking the user's gaze and head posture using an event camera, as a way to reduce the MTP latency of smart glasses.
[0006] A wearable device according to one embodiment of the present invention for achieving the above objective comprises: a first event camera that detects an event by photographing a user's pupil; an eye tracking unit that tracks the gaze using the event detection result of the first event camera; a generation unit that generates an image based on the gaze tracking result of the eye tracking unit; and a display unit that displays the image generated by the generation unit.
[0007] The wearable device according to the present invention may further comprise: a second event camera that detects an event by capturing an external environment viewed by a user; and a posture estimation unit that estimates the user's head posture using the event detection result of the second event camera; and the generation unit can generate an image by further referring to the head posture estimated by the posture estimation unit.
[0008] The wearable device may be a device that cannot display video with a refresh rate above a certain level due to Motion-to-Photon latency.
[0009] The wearable device may be a glass-type device.
[0010] The generation unit can generate a foveated rendered image based on the eye tracking results.
[0011] The generation unit can correct the generated image based on the eye tracking results.
[0012] Image correction may include correction for at least one of pincushion distortion and barrel distortion.
[0013] The correction unit may be an artificial intelligence model pre-trained to generate a corrected image by receiving an image generated by the generation unit and the eye tracking result of the eye tracking unit.
[0014] According to another aspect of the present invention, a method for generating an image is provided, comprising: a step of detecting an event by photographing a user's pupil with a first event camera; a step of tracking a gaze using the event detection result; a step of generating an image based on the gaze tracking result; and a step of displaying the generated image.
[0015] According to another aspect of the present invention, a wearable device is characterized by comprising: an event camera that detects an event by capturing an external environment viewed by a user; a posture estimation unit that estimates the user's head posture using the event detection result of the event camera; a generation unit that generates an image based on the posture estimation result of the posture estimation unit; and a display unit that displays the image generated by the generation unit.
[0016] According to another aspect of the present invention, a method for generating an image is provided, comprising: a step of detecting an event by capturing an external environment viewed by a user with an event camera; a step of estimating the head posture of the user using the event detection result; a step of generating an image based on the posture estimation result; and a step of displaying the generated image.
[0017] As explained above, according to the embodiments of the present invention, by rapidly tracking the user's gaze and head posture using an event camera, the MTP latency of the smart glasses is lowered, allowing for foveated rendering of high refresh rate images and rapid image correction.
[0018] FIG. 1. Overview diagram explaining Motion-to-Photon latency
[0019] FIG. 2. Smart glasses operation process
[0020] FIG. 3. Principle of an event camera
[0021] FIG. 4. Smart glasses according to an embodiment of the present invention
[0022] FIG. 5. Function of the internal event camera
[0023] FIG. 6. Method for generating a foveated image according to another embodiment of the present invention
[0024] FIG. 7. Smart glasses according to another embodiment of the present invention
[0025] FIG. 8. Function of the external event camera
[0026] FIG. 9. Method for generating a foveated image according to another embodiment of the present invention
[0027] The present invention will be described in more detail below with reference to the drawings.
[0028] In the case of conventional smart glasses, the MTP latency is around 20ms. As shown in FIG. 1, MTP latency is the time interval from when a user performs an action (such as turning their head) until the change in content resulting from that action appears on the screen (if the user turns their head to the right, the displayed world must move to the left). The term 'Photon' refers to the moment when the display emits the corresponding light, since the entire display system is involved in this process. The human visual system detects such changes very well: a delay of about 20ms is difficult to notice, but longer delays are noticeable, making it difficult to achieve a natural impression or high-quality video.
[0029] FIG. 2 illustrates how the system operates in smart glasses. Generally, data is acquired from a motion-detecting sensor (camera) to calculate a pose with 6 DoF (degrees of freedom), and a rendering process is performed to express this as an image. The rendering input includes the pose information and the spatial information acquired from the camera. Only when the rendering result is displayed on the screen can an image with the movement applied be shown to the user's eyes.
[0030] Most smart glasses go through this process, so the MTP latency of eye tracking or pose tracking is around 20ms, which makes it somewhat difficult to produce natural-looking video. Significant time delays are already incurred in the sensor and tracking stages, so there are limits to displaying video for the correct pose no matter how quickly the data is rendered and displayed. Technologies that predict movement before rendering and displaying images have recently been advancing to overcome these limitations, but they are currently used only in very limited cases.
[0031] Accordingly, an embodiment of the present invention presents smart glasses using an event camera. By performing eye tracking and head pose estimation at high speed using an event camera to reduce MTP latency, this technology enables foveated rendering of high refresh rate images and rapid image correction.
[0032] Unlike conventional cameras, which capture frame by frame, an event camera has each pixel report the amount of change (displacement) relative to the previous time point (about 1 / 1000s earlier). When used in other fields (sports, drones, etc.), event cameras are generally combined with images acquired from conventional CMOS cameras to acquire images at very high speeds.
[0033] In an embodiment of the present invention, the advantages of such an event camera are utilized to capture the movements of a user wearing smart glasses more accurately and quickly, and furthermore, to precisely track eye movements (gaze) so that images shown on the smart glasses are displayed more effectively.
[0034] As shown in FIG. 3, the event camera derives, for each pixel, the amount of change relative to the previous time point: the pixel is turned on when the change is in the positive direction and turned off when it is in the negative direction. Its advantage is that the interval between these time points is very short.
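The on/off behavior described above can be sketched as follows. This is a minimal illustration rather than the sensor's actual circuit: it assumes that an event is emitted when the log-intensity change of a pixel between two time points exceeds a threshold, and the function name, threshold value, and frame representation (nested lists of brightness values) are all hypothetical.

```python
import math

def generate_events(prev_frame, curr_frame, threshold=0.2, eps=1e-6):
    """Return a list of (row, col, polarity) events.

    polarity is +1 (on) when a pixel became brighter and -1 (off) when
    it became darker; pixels whose change stays below the threshold
    produce no event at all. The log-intensity threshold is an
    assumption made for this sketch.
    """
    events = []
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            change = math.log(q + eps) - math.log(p + eps)
            if change >= threshold:
                events.append((r, c, +1))
            elif change <= -threshold:
                events.append((r, c, -1))
    return events


# A 4x4 scene in which one pixel brightens and one darkens.
prev = [[0.5] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[1][2] = 1.0   # brighter -> on event
curr[3][0] = 0.1   # darker   -> off event
print(generate_events(prev, curr))
```

Only the two changed pixels produce output, which illustrates why event data can be read out at much shorter intervals than full frames.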
[0035] Therefore, the embodiment of the present invention was conceived based on the idea that this result can be combined with existing video to track motion or gaze with great precision. Furthermore, since very precise tracking has become possible by combining event results with an artificial intelligence network model, the invention proposes a method that goes beyond motion recognition to very precisely track the user's eyes (pupils) when using smart glasses and applies this to foveated rendering.
[0036] FIG. 4 is a diagram illustrating the configuration of smart glasses according to an embodiment of the present invention. As illustrated, the smart glasses according to an embodiment of the present invention are configured to include an internal event camera (110), an eye tracking unit (120), a communication unit (130), a foveated image generation unit (140), an image correction unit (150), and an image display unit (160).
[0037] The internal event camera (110) captures the pupil of a user wearing smart glasses to generate an event video and outputs the generated event video as an event detection result. The eye tracking unit (120) tracks the user's gaze from the event video output from the internal event camera (110). FIG. 5 schematically illustrates the eye tracking by the internal event camera (110) and the eye tracking unit (120).
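The specification does not state how the eye tracking unit (120) derives the gaze from the event video. As one simple possibility, and purely as an assumption for illustration, the pupil position could be approximated by the centroid of the events that occurred within a short recent time window. The event tuple layout (timestamp in microseconds, x, y, polarity) and the function name are hypothetical.

```python
def estimate_pupil_center(events, now_us, window_us=1000):
    """Approximate the pupil position as the centroid of recent events.

    events: iterable of (t_us, x, y, polarity) tuples (hypothetical layout).
    Returns (x, y) as floats, or None when no event falls in the window.
    This is a stand-in heuristic, not the tracker of the specification.
    """
    recent = [(x, y) for (t, x, y, _pol) in events
              if now_us - window_us <= t <= now_us]
    if not recent:
        return None
    n = len(recent)
    return (sum(x for x, _ in recent) / n,
            sum(y for _, y in recent) / n)


sample_events = [
    (0, 10, 10, +1),      # too old, outside the 1 ms window
    (9500, 20, 30, +1),
    (9800, 22, 34, -1),
]
print(estimate_pupil_center(sample_events, now_us=10_000))
```

A practical tracker would need to reject noise events and fit the pupil boundary; the sketch only shows how a short event window yields a position estimate without waiting for a full frame.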
[0038] The foveated image generation unit (140) generates a foveated image based on the eye tracking results of the eye tracking unit (120). Specifically, the foveated image generation unit (140) generates a high-resolution central view image in the direction of the user's gaze and generates a low-resolution peripheral view image around the central view image.
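The idea of a high-resolution central view surrounded by a low-resolution peripheral view can be sketched as below. This is a minimal illustration under stated assumptions: the image is a grayscale grid of numbers, the peripheral view is simulated by block averaging, and the central view is a circle around the gaze point. The function name and the radius and block parameters are hypothetical.

```python
def foveate(image, gaze_rc, radius, block=4):
    """Return a foveated copy of a grayscale image (list of rows).

    Peripheral view: every block x block tile is replaced by its mean,
    which simulates rendering at lower resolution.
    Central view: pixels within `radius` of the gaze point (row, col)
    keep their original full-resolution values.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]

    # Low-resolution peripheral view.
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            rows = range(r0, min(r0 + block, h))
            cols = range(c0, min(c0 + block, w))
            tile = [image[r][c] for r in rows for c in cols]
            mean = sum(tile) / len(tile)
            for r in rows:
                for c in cols:
                    out[r][c] = mean

    # High-resolution central view around the gaze point.
    gr, gc = gaze_rc
    for r in range(h):
        for c in range(w):
            if (r - gr) ** 2 + (c - gc) ** 2 <= radius ** 2:
                out[r][c] = image[r][c]
    return out


test_image = [[r * 8 + c for c in range(8)] for r in range(8)]
foveated = foveate(test_image, gaze_rc=(2, 2), radius=1, block=4)
```

In an actual renderer the peripheral region would be rendered at reduced resolution in the first place rather than averaged afterwards; the sketch only shows how the gaze point partitions the image.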
[0039] The communication unit (130) receives the original video for generating a foveated image from a content server (not shown) and transmits it to the foveated image generation unit (140).
[0040] The image correction unit (150) corrects pincushion distortion and barrel distortion on the foveated image generated by the foveated image generation unit (140) based on the eye tracking result of the eye tracking unit (120).
[0041] The image correction unit (150) can be implemented with a pre-trained artificial intelligence model that receives a foveated image and an eye tracking result (the point where the user's gaze is directed in the foveated image) and generates an image with corrected pincushion distortion and barrel distortion.
[0042] Meanwhile, image correction by the image correction unit (150) may be performed on the entire foveated image, but it is also possible to perform it only on the center-view image.
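In the specification the correction is performed by a pre-trained artificial intelligence model. To show what pincushion and barrel distortion mean geometrically, the sketch below swaps in the classical single-coefficient radial distortion model instead; it is not the method of the specification. The coefficient k1, the normalized coordinates, and the use of the gaze point as the distortion center are assumptions made for this illustration.

```python
def distort(x, y, k1, center=(0.0, 0.0)):
    """Apply radial distortion to a point in normalized coordinates.

    With this convention k1 < 0 gives barrel distortion (points are
    pulled toward the center) and k1 > 0 gives pincushion distortion
    (points are pushed away from it).
    """
    cx, cy = center
    dx, dy = x - cx, y - cy
    scale = 1.0 + k1 * (dx * dx + dy * dy)
    return cx + dx * scale, cy + dy * scale


def undistort(xd, yd, k1, center=(0.0, 0.0), iterations=20):
    """Invert distort() by fixed-point iteration.

    Converges for the mild distortions used here; strong distortion
    would need a more robust solver.
    """
    cx, cy = center
    dxd, dyd = xd - cx, yd - cy
    dx, dy = dxd, dyd                      # initial guess
    for _ in range(iterations):
        scale = 1.0 + k1 * (dx * dx + dy * dy)
        dx, dy = dxd / scale, dyd / scale
    return cx + dx, cy + dy


# Barrel-distort a point, then recover it.
xd, yd = distort(0.5, 0.3, k1=-0.1)
xu, yu = undistort(xd, yd, k1=-0.1)
```

Passing the tracked gaze point as `center` is one hypothetical way the eye tracking result could enter such a correction; restricting the correction to the center-view image would simply limit which pixels are remapped.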
[0043] The image display unit (160) is configured to display a foveated image corrected by the image correction unit (150) and can be implemented as a waveguide display.
[0044] FIG. 6 is a diagram illustrating the flow of a method for generating a foveated image according to another embodiment of the present invention.
[0045] As described above, first, an internal event camera (110) captures the pupil of a user wearing smart glasses to generate an event video (S210), and an eye tracking unit (120) tracks the user's gaze from the event video generated in step S210 (S220).
[0046] Then, the foveated image generation unit (140) generates a foveated image based on the eye tracking result from step S220 (S230), and the image correction unit (150) corrects pincushion distortion and barrel distortion on the foveated image generated in step S230 based on the eye tracking result from step S220 (S240).
[0047] Afterwards, the image display unit (160) displays the foveated image corrected at step S240 (S250).
[0048] FIG. 7 is a diagram illustrating the configuration of smart glasses according to another embodiment of the present invention. As illustrated, the smart glasses according to the embodiment of the present invention are configured to include an internal event camera (110), an eye tracking unit (120), a communication unit (130), a foveated image generation unit (140), an image correction unit (150), an image display unit (160), an external event camera (170), and a head pose estimation unit (180).
[0049] The internal event camera (110), the eye tracking unit (120), and the communication unit (130) can be implemented in the same way as the configurations shown in FIG. 4, so a detailed description of them is omitted.
[0050] The external event camera (170) captures the external environment viewed by the user wearing smart glasses to generate an event video, and outputs the generated event video as an event detection result.
[0051] FIG. 8 schematically illustrates the detection of external environment change events by an external event camera (170). Changes in the external environment may occur due to the movement of external objects, but may also occur due to changes in the user's head posture. By using the external event camera (170), changes in the user's head posture can be detected very precisely and quickly, which can be very effectively utilized to reduce MTP latency.
[0052] Accordingly, the head posture estimation unit (180) estimates the user's head posture from an external environment change event appearing in an event video output from an external event camera (170), and transmits the estimated head posture to the foveated image generation unit (140).
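The specification does not give the estimation formula used by the head posture estimation unit (180). As a rough illustration of why image changes carry head posture information, the sketch below converts a horizontal image shift into a yaw angle using pinhole camera geometry. It assumes a static scene, a pure yaw rotation, and a horizontal shift already measured from the events; the function name and both parameters are hypothetical.

```python
import math

def yaw_from_image_shift(shift_px, focal_length_px):
    """Yaw angle (radians) implied by a horizontal image shift.

    Sign convention assumed here: turning the head to the right
    (positive yaw) makes the scene move to the left in the image
    (negative shift_px), hence the minus sign.
    Assumes a static scene and pure rotation about the vertical axis.
    """
    return math.atan2(-shift_px, focal_length_px)


# The scene moved 10 pixels to the left with a 500-pixel focal length.
yaw = yaw_from_image_shift(-10.0, 500.0)
print(round(math.degrees(yaw), 4), "degrees to the right")
```

Because external object motion also produces events, a practical estimator would have to separate object motion from head motion, which this sketch does not attempt.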
[0053] The foveated image generation unit (140) generates a foveated image based on the eye tracking result of the eye tracking unit (120) and the head pose estimation result of the head pose estimation unit (180).
[0054] The communication unit (130), image correction unit (150), and image display unit (160) can be implemented in the same way as the configurations shown in FIG. 4, so a detailed description of them is omitted.
[0055] FIG. 9 is a diagram illustrating the flow of a method for generating a foveated image according to another embodiment of the present invention.
[0056] As described above, first, an internal event camera (110) captures the pupil of a user wearing smart glasses to generate an event video (S310), and an eye tracking unit (120) tracks the user's gaze from the event video generated in step S310 (S320).
[0057] Then, the external event camera (170) captures the external environment viewed by the user wearing smart glasses to generate an event video (S330), and the head posture estimation unit (180) estimates the user's head posture from the external environment change event shown in the event video generated in step S330 (S340).
[0058] Then, the foveated image generation unit (140) generates a foveated image based on the eye tracking result from step S320 and the head pose estimation result from step S340 (S350), and the image correction unit (150) corrects pincushion distortion and barrel distortion on the foveated image generated in step S350 based on the eye tracking result from step S320 (S360).
[0059] Afterwards, the image display unit (160) displays the foveated image corrected at step S360 (S370).
[0060] Up to this point, preferred embodiments of smart glasses using an event camera have been described in detail.
[0061] Conventional technology utilizes IR camera or CMOS camera sensors, whereas the proposed technology can use an event camera as a hybrid with an existing camera, or use an event camera alone and convert the acquired event-unit data into image data using artificial intelligence technology.
[0062] In the case of conventional smart glasses, the eye tracking latency is about 20ms, making it somewhat difficult to improve video quality. In other words, considering the eye tracking latency, it is difficult to display video with a refresh rate of 50Hz or higher. On the other hand, the smart glasses according to the embodiment of the present invention use an event camera with high temporal resolution and high dynamic range to facilitate tracking of head movements or gaze, and thus have the advantage of being able to use the refresh rate of the display mounted on the smart glasses as is.
[0063] In addition, because the smart glasses according to the embodiment of the present invention can precisely track gaze using an event camera, they can be used for the foveated rendering employed in existing smart glasses. Because the point of gaze can be determined accurately, they can also be used for image correction of pincushion distortion or barrel distortion.
[0064] Meanwhile, it goes without saying that the technical concept of the present invention may also be applied to a computer-readable recording medium containing a computer program that enables the device and method according to the present embodiment to perform their functions. Furthermore, the technical concept according to various embodiments of the present invention may be implemented in the form of computer-readable code recorded on a computer-readable recording medium. A computer-readable recording medium may be any data storage device that can be read by a computer and store data. For example, a computer-readable recording medium may be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, etc. Additionally, computer-readable code or a program stored on a computer-readable recording medium may be transmitted through a network connected between computers.
[0065] Furthermore, although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described above. Various modifications are possible by those skilled in the art without departing from the essence of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present invention.
Claims
1. A wearable device comprising: a first event camera that detects an event by photographing a user's pupil; an eye tracking unit that tracks a gaze using the event detection result of the first event camera; a generation unit that generates an image based on the eye tracking result of the eye tracking unit; and a display unit that displays the image generated by the generation unit.
2. The wearable device of Claim 1, further comprising: a second event camera that detects an event by capturing an external environment viewed by the user; and a posture estimation unit that estimates the user's head posture using the event detection result of the second event camera, wherein the generation unit generates the image by further referring to the head posture estimated by the posture estimation unit.
3. The wearable device of Claim 1, wherein the wearable device is unable to display images with a refresh rate above a certain level due to motion-to-photon latency.
4. The wearable device of Claim 3, wherein the wearable device is a glass-type device.
5. The wearable device of Claim 1, wherein the generation unit generates a foveated rendered image based on the eye tracking result.
6. The wearable device of Claim 1, wherein the generation unit corrects the generated image based on the eye tracking result.
7. The wearable device of Claim 6, wherein the image correction includes correction for at least one of pincushion distortion and barrel distortion.
8. The wearable device of Claim 7, wherein the correction unit is an artificial intelligence model pre-trained to generate a corrected image by receiving the image generated by the generation unit and the eye tracking result of the eye tracking unit.
9. A method for generating an image, comprising: a step of detecting an event by photographing a user's pupil with a first event camera; a step of tracking a gaze using the event detection result; a step of generating an image based on the eye tracking result; and a step of displaying the generated image.
10. A wearable device comprising: an event camera that detects an event by capturing an external environment viewed by a user; a posture estimation unit that estimates the user's head posture using the event detection result of the event camera; a generation unit that generates an image based on the posture estimation result of the posture estimation unit; and a display unit that displays the image generated by the generation unit.
11. A method for generating an image, comprising: a step of detecting an event by capturing an external environment viewed by a user with an event camera; a step of estimating the user's head posture using the event detection result; a step of generating an image based on the posture estimation result; and a step of displaying the generated image.