Display glasses with auditory enhancement

Display glasses with gaze-tracking and sound processing capabilities enhance auditory perception by isolating and presenting sounds from the direction of gaze, addressing the limitations of conventional hearing aids in noisy environments.

JP7875823B2 · Active · Publication Date: 2026-06-18 · INNOVEGA INC

5 Cites · 0 Cited by

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: INNOVEGA INC
Filing Date: 2021-06-23
Publication Date: 2026-06-18

Patent Timeline

23 Jun 2021: Application
18 Jun 2026: Publication (JP7875823B2)

IPC: G06F3/01; G06F3/16; G09G5/00

CPC: G10L21/10; G06F3/013; G06F3/017; H04R1/028; H04R1/10; H04R1/406; H04R3/005; H04R2227/003

AI Tagging

Application Domain: Input/output for user-computer interaction; Acquiring/recognising eyes


AI Technical Summary

Technical Problem

Conventional hearing aids collect sound from multiple directions, making it difficult for users to isolate and perceive sounds of interest, and existing technologies do not effectively address hearing impairments or enhance auditory perception in noisy environments.

Method used

Display glasses equipped with gaze-tracking technology, microphones, and controllers that isolate and process sound from the direction of the user's gaze, presenting it as text, hand signals, or sound through auditory transducers, and optionally using lip-reading or keyword recognition to enhance auditory perception.

Benefits of technology

Enables users to clearly perceive sounds and speech of interest by isolating and processing sounds from the direction of gaze, improving auditory perception and providing additional auditory enhancement features.

✦ Generated by Eureka AI based on patent content.


Abstract

Some embodiments provide display eyewear with hearing augmentation. In general, one disclosed aspect features a head-mounted device that includes a microphone, a display panel visible to a wearer, an eye-tracking device configured to determine a direction of gaze of a wearer of the head-mounted device, and a controller configured to extract speech from sounds collected by the microphone from the determined direction and present the extracted speech on the display panel.


Description

Technical Field

[0001] Cross-Reference to Related Applications: This application claims priority to U.S. Patent Application No. 16/915,951, filed on June 29, 2020, the content of which is incorporated herein by reference in its entirety.

[0002] The disclosed technology generally relates to display glasses, and more particularly, some embodiments relate to display glasses having an auditory function.

Summary of the Invention

[0003] In general, one disclosed aspect features a head-mounted device comprising a microphone, a display panel visible to a wearer, a gaze-tracking device configured to determine the direction of gaze of the wearer of the head-mounted device, and a controller configured to extract speech from sound collected by the microphone from the determined direction and to present the extracted speech on the display panel.

[0004] Embodiments of the head-mounted device may include one or more of the following features. In some embodiments, the controller is further configured to present the extracted speech as text on the display panel. In some embodiments, the controller is further configured to present the text as multiple words simultaneously on the display panel. In some embodiments, the controller is further configured to present the text as single words, one at a time, in chronological order on the display panel. In some embodiments, the controller is further configured to present the extracted speech as hand signals on the display panel. In some embodiments, the device comprises an off-axis projector configured to project the extracted words onto the display panel, and the display panel comprises a semi-transparent diffuser. In some embodiments, the display panel is either transparent or occluded. In some embodiments, the microphone is a directional microphone. In some embodiments, the microphone comprises an array of microphone elements. In some embodiments, the device comprises an auditory transducer, and the controller is further configured to supply the auditory transducer with audio representing the isolated sounds, which the auditory transducer then renders. In some embodiments, a hearing aid system comprises the auditory transducer. In some embodiments, the device comprises an auditory transducer, and the controller is further configured to supply the auditory transducer with audio representing the extracted speech, which the auditory transducer then renders. In some embodiments, the auditory transducer comprises at least one of a speaker, an ear speaker, and a bone conduction auditory system. In some embodiments, the device comprises an auditory transducer and an additional microphone configured to collect additional sound from the sides and rear of the wearer's head, and the controller is further configured, in response to the additional sound collected by the additional microphone representing a predetermined keyword, to supply the auditory transducer with audio representing the additional sound, which the auditory transducer then renders. In some embodiments, the head-mounted device is eyeglasses. In some embodiments, the head-mounted device is an augmented reality headset.

[0005] In general, one disclosed aspect features a non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor of a computing component, the machine-readable storage medium comprising instructions that cause the hardware processor to perform a method for a head-mounted device, the method comprising: determining the direction of gaze of the wearer of the head-mounted device; collecting sound emanating from the determined direction; extracting speech from the collected sound; and presenting the extracted speech on a display panel of the head-mounted device that is visible to the wearer.

[0006] Embodiments of the non-transitory machine-readable storage medium may include one or more of the following features. In some embodiments, the method further includes presenting the extracted speech as text on the display panel. In some embodiments, the method further includes presenting the text as multiple words simultaneously on the display panel. In some embodiments, the method further includes presenting the text as single words, one at a time, in chronological order on the display panel. In some embodiments, the method further includes projecting the text onto the display panel using an off-axis projector, wherein the display panel is a semi-transparent diffuser. In some embodiments, the display panel is either transparent or occluded. In some embodiments, the method further includes presenting the extracted speech as hand signals on the display panel. In some embodiments, the method further includes supplying audio representing the collected sound to an auditory transducer of the head-mounted device, wherein the auditory transducer renders the audio. In some embodiments, the method further includes supplying audio representing the extracted speech to an auditory transducer of the head-mounted device, wherein the auditory transducer renders the audio. In some embodiments, the method further includes collecting additional sound from the sides and rear of the wearer's head and, in response to the additional sound representing a predetermined keyword, supplying audio representing the additional sound to an auditory transducer of the head-mounted device, wherein the auditory transducer renders the audio. In some embodiments, the head-mounted device is eyeglasses. In some embodiments, the head-mounted device is an augmented reality headset.

[0007] In general, one disclosed aspect features a head-mounted device comprising a display panel visible to the wearer of the head-mounted device, an eye-tracking device configured to determine the direction of the wearer's gaze, a camera configured to capture an image of a person's mouth in the determined direction, and a controller configured to extract speech from the image and present the extracted speech on the display panel.

[0008] Embodiments of the head-mounted device may include one or more of the following features. In some embodiments, the controller is further configured to present the extracted speech as text or hand signals. In some embodiments, the camera is further configured to capture an image of a person's mouth, and the controller is further configured to extract speech from the image of the person's mouth. In some embodiments, the camera is further configured to capture an image of a person's hands, and the controller is further configured to extract speech from the image of the person's hands.

[0009] In general, one disclosed aspect features a non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor of a computing component, the machine-readable storage medium comprising instructions that cause the hardware processor to perform a method for a head-mounted device, the method comprising: determining the direction of gaze of the wearer of the head-mounted device; capturing an image of a person in the determined direction; extracting speech from the image; and presenting the extracted speech on a display panel of the head-mounted device that is visible to the wearer.

[0010] Embodiments of the non-transitory machine-readable storage medium may include one or more of the following features. In some embodiments, the method further includes presenting the extracted speech as text or hand signals. In some embodiments, the method further includes capturing an image of a person's mouth and extracting speech from the image of the person's mouth. In some embodiments, the method further includes capturing an image of a person's hands and extracting speech from the image of the person's hands.

Brief Description of the Drawings

[0011] The present disclosure, in accordance with one or more embodiments, is described in detail with reference to the following figures. The figures are provided for purposes of illustration only and merely depict typical or exemplary embodiments.

[0012] [Figure 1] This figure shows examples of use cases according to several embodiments of the technology disclosed herein.

[0013] [Figure 2] This figure further illustrates the use cases of Figure 1 according to some embodiments of the technology disclosed herein.

[0014] [Figure 3] This figure shows display glasses equipped with a multidirectional microphone according to some embodiments of the technology disclosed herein.

[0015] [Figure 4] This figure shows display glasses equipped with multiple narrow-angle microphones according to some embodiments of the technology disclosed herein.

[0016] [Figure 5] This figure shows display glasses equipped with a camera and multiple narrow-angle microphones according to some embodiments of the technology disclosed herein.

[0017] [Figure 6] A diagram showing an exemplary auditory augmentation process according to an embodiment of the technology of the present disclosure.

[0018] [Figure 7] A diagram showing a keyword-based auditory augmentation process according to an embodiment of the technology of the present disclosure.

[0019] [Figure 8] A diagram showing a lip-reading image-based auditory augmentation process according to an embodiment of the technology of the present disclosure.

[0020] [Figure 9] A diagram showing a hand-signal reading image-based auditory augmentation process according to an embodiment of the technology of the present disclosure.

[0021] [Figure 10] A diagram of an exemplary computing component that can be used to implement various features of the embodiments described in the present disclosure.

[0022] The drawings are not exhaustive and do not limit the present disclosure to the exact forms disclosed.

Modes for Carrying Out the Invention

[0023] Many people experience some degree of hearing loss. Hearing loss can occur when some part of the ear or auditory system is not functioning properly. The U.S. Centers for Disease Control and Prevention (CDC) has identified four types of hearing loss. Conductive hearing loss occurs when sound cannot pass through the outer or middle ear. Sensorineural hearing loss occurs when there is a problem in the inner ear or the auditory nerve. Mixed hearing loss is a combination of conductive and sensorineural hearing loss. Auditory neuropathy spectrum disorder is a hearing loss in which sound enters the ear normally but, because of damage to the inner ear or the auditory nerve, is not organized in a way that the brain can understand.

[0024] Hearing aids have been developed to help people with hearing loss. While conventional hearing aids are extremely useful, they collect sound from many directions, making it difficult for the listener to isolate and perceive the sound of interest.

[0025] The technology disclosed herein is beneficial to people with hearing impairments, but also to people without hearing impairments. For example, the technology disclosed herein can enable users without hearing impairments to perceive sounds and speech that they would not otherwise be able to perceive.

[0026] Embodiments of the technology of the present disclosure provide display glasses capable of collecting and isolating sounds of interest according to the direction of the user's gaze. That is, the display glasses can determine the direction in which the user is looking and collect and isolate sounds coming from that direction. In some embodiments, the display glasses can perform speech recognition and present the recognized speech on the display of the display glasses, for example as text or hand signals. In some embodiments, the display glasses can also reproduce the isolated sounds for the user, for example as sound in the earpieces of the display glasses. In some embodiments, the display glasses can also reproduce the recognized speech for the user, for example as sound in the earpieces of the display glasses.

[0027] The display glasses can collect sound using microphones. Any microphone technology may be used in the various embodiments, for example condenser, electret condenser, dynamic, ribbon, carbon, piezoelectric, fiber-optic, laser, liquid, and micro-electromechanical systems (MEMS) microphones, and combinations thereof. Various embodiments may use a variety of microphone polar patterns, including, for example, omnidirectional, unidirectional, cardioid, hypercardioid, supercardioid, subcardioid, lobar, bidirectional, shotgun, and boundary or pressure-zone microphone (PZM) patterns, and combinations thereof. In embodiments having multiple microphones or microphone elements, the selection of microphones or microphone elements may be controlled by gaze, gesture, voice, or other user interface controls.

[0028] Various embodiments can use acoustic modulation to improve hearing. For example, acoustic modulation may include selecting different volume levels for different frequencies. The user may control acoustic modulation by gaze direction, gestures, voice, or other user interface controls. In this way, the user can mix sounds to improve the perception of desired sounds and reduce or eliminate unwanted sounds.
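The per-frequency volume selection in [0028] can be illustrated with a short sketch. It assumes the audio arrives as a block of samples and that the user's gaze, gesture, or voice choices have already been turned into a list of frequency bands with gains; the band format and the name `apply_band_gains` are hypothetical, and a naive discrete Fourier transform is used for readability rather than speed.

```python
import cmath
import math


def apply_band_gains(samples, sample_rate, bands):
    """Scale each frequency component of `samples` by a user-chosen gain.

    `bands` is a hypothetical list of (low_hz, high_hz, gain) tuples; the
    first band containing a frequency wins, and uncovered frequencies keep
    a gain of 1.0. Uses a naive O(n^2) DFT for clarity, not speed.
    """
    n = len(samples)
    # Forward DFT of the sample block.
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bin k and its mirror bin n - k share the same physical frequency,
        # so both receive the same gain and the output stays real.
        freq = min(k, n - k) * sample_rate / n
        gain = 1.0
        for low, high, band_gain in bands:
            if low <= freq < high:
                gain = band_gain
                break
        spectrum[k] *= gain
    # Inverse DFT; keep the real part (the imaginary part is rounding noise).
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

For example, muting the 15 to 25 Hz band of a test block sampled at 64 Hz removes a 20 Hz tone while leaving a 4 Hz tone intact. A real device would more likely use an FFT or a filter bank running on audio-rate signals.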

[0029] In some embodiments, the display glasses may include a camera for capturing an image of a person in the user's line of sight, and lip-reading technology may be used to extract the person's speech from the image. In these embodiments, the display glasses may present the recognized speech to the user as text, hand signals, sound, or a combination thereof.

[0030] Figure 1 illustrates use cases according to some embodiments of the technology of this disclosure. At the top of Figure 1 is an exemplary scene 102 as perceived by a user of the display glasses. In exemplary scene 102, four people are speaking simultaneously, and their voices are labeled “Voice 1,” “Voice 2,” “Voice 3,” and “Voice 4,” from left to right. In this example, the user's gaze rests on the rightmost person, as indicated by the reticle 106.

[0031] At the bottom of Figure 1, an exemplary display 104 produced by the display glasses is shown in relation to an exemplary scene 102. The exemplary display 104 includes four people. Furthermore, the exemplary display 104 presents the text of the speech of the person on the right, labeled as "Voice 4". In this example, the display glasses isolate the voice of the person on the right in the exemplary scene 102, recognize that person's speech, and present that speech as text in the exemplary display 104.

[0032] Figure 2 further illustrates the use cases of Figure 1 according to some embodiments of the technology of this disclosure. At the top of Figure 2 is an exemplary scene 202, similar to exemplary scene 102 of Figure 1 but at a different point in time. In exemplary scene 202, four people are again speaking simultaneously. In this example, the user's gaze rests on the second person from the left, as indicated by the reticle 206.

[0033] At the bottom of Figure 2, an exemplary display 204 produced by the display glasses is shown in relation to an exemplary scene 202. The exemplary display 204 includes four people. Furthermore, the exemplary display 204 presents the text of the speech of the second person from the left, labeled "Voice 2". In this example, the display glasses isolate the voice of the second person from the left in the exemplary scene 202, recognize that person's speech, and present that speech as text in the exemplary display 204.

[0034] Figure 3 shows exemplary display glasses equipped with a multidirectional microphone according to several embodiments of the technology of this disclosure. Referring to Figure 3, display glasses 300 can be implemented using glasses 302. The glasses may include a front section, one or more temples, and one or more lenses. The front section may rest on the user's nose bridge. Each temple may rest on the user's ears. However, while the embodiments described are implemented using glasses, it should be understood that other embodiments may be implemented using any structure that can be worn on the user's head. For example, such structures may include monocles, glasses, headbands, hats, masks, augmented reality headsets, retinal display generating devices that are placed on or inside the eye, and so on.

[0035] The display glasses 300 may include one or more eye-tracking devices 304. The eye-tracking device 304 may be implemented as a conventional eye-tracking device that detects the orientation of the eyeball and determines the direction of the user's gaze from that orientation. Other eye-tracking techniques may be used instead of, or in addition to, this technique, for example pupil tracking, tracking of other anatomical features of the eye, tracking of landmarks in contact lenses, and the use of data from accelerometers in contact lenses or intraocular lenses. In general, any type of eye-tracking device can be used. As used herein, the term "contact lens" means a lens that comes into contact with ocular tissue and may include corneal lenses, intraocular lenses, and the like.

[0036] The display glasses 300 may include one or more microdisplay panels 306. Each microdisplay panel 306 may be positioned above or within the user's resting line of sight. Each microdisplay panel 306 may be implemented as a digital display panel, such as an LCD, LCOS, or OLED display panel, although any type of display capable of performing the functions described herein may be used. The display may utilize waveguide technology, light field technology, off-axis projection, holographic reflection, femtoprojectors, or any other means of generating an image near, on, or inside the eye. The microdisplay panel 306 may be transparent or occluded. Transparent displays may utilize technologies such as geometric optics, waveguide relays, semi-transparent diffusers with projection, multi-pinpoint reflectors, direct retinal projection, and in-lens femtoprojectors, in any combination with MEMS scanners, LCOS, OLEDs, LCDs, and LEDs. Occluded displays may utilize technologies such as geometric optics, waveguide relays, reflectors/diffusers with projection, and multi-pinpoint reflectors, combined with occluding shields and in any combination with MEMS scanners, LCOS, OLEDs, LCDs, and LEDs.

[0037] In some embodiments, the display glasses 300 may include a user interface that allows the user to transition the microdisplay panel 306 between a transparent state and an opaque state. In some embodiments, the display glasses 300 may include an off-axis projector configured to project images such as text and hand signals onto the microdisplay panel 306. In such embodiments, the microdisplay panel 306 may be equipped with a semi-transparent diffuser. In some embodiments, all or part of the microdisplay panel 306 may be placed within a contact lens or an intraocular lens.

[0038] The display glasses 300 may include a multidirectional microphone 308. The microphone 308 may be attached to the front of the display glasses 300, for example to the nosepiece of the glasses 302. The coverage area of the multidirectional microphone 308 may be selected according to the user's needs. The multidirectional microphone 308 may include one or more elements, for example multiple elements arranged in a fan-shaped pattern facing different directions, as shown in Figure 3. Each element may use a directional pattern, such as a lobar pattern, and may have its own channel. Each channel may be modulated by gaze or by other means.
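One way the per-element channels of a fan-shaped microphone could be modulated by gaze is sketched below. This is an assumption rather than the disclosed design: each element is described only by the azimuth it faces, its gain falls off as a Gaussian of the angle between that azimuth and the gaze, and the names `element_gains`, `mix_channels`, and the `width_deg` parameter are hypothetical.

```python
import math


def element_gains(element_azimuths_deg, gaze_azimuth_deg, width_deg=30.0):
    """Return one normalized gain per microphone element.

    The element facing closest to the gaze gets the largest gain; the
    Gaussian falloff of width `width_deg` is an illustrative choice.
    """
    gains = []
    for az in element_azimuths_deg:
        # Wrap the angular difference into [-180, 180).
        diff = (az - gaze_azimuth_deg + 180.0) % 360.0 - 180.0
        gains.append(math.exp(-0.5 * (diff / width_deg) ** 2))
    total = sum(gains)
    return [g / total for g in gains]


def mix_channels(channels, gains):
    """Weighted sum of per-element sample lists (all of the same length)."""
    return [
        sum(g * ch[i] for g, ch in zip(gains, channels))
        for i in range(len(channels[0]))
    ]
```

With elements facing -60, -30, 0, 30, and 60 degrees and the gaze at 30 degrees, the fourth element dominates the mix while its neighbours still contribute a little, which softens transitions as the gaze moves.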

[0039] The display glasses 300 may include one or more auditory transducers. In the example of Figure 3, the auditory transducer may be implemented as a pair of speakers 310 mounted on the temples of the glasses 302 near the user's ears. In some embodiments, the auditory transducer may be implemented as a speaker, an ear speaker, a bone conduction hearing system, or a combination thereof. In some embodiments, the auditory transducer may be part of a hearing aid system.

[0040] In some embodiments, the display glasses 300 may include a controller 312. The controller 312 can process data generated by the eye-tracking device 304 to determine the direction of the user's gaze, process sound collected by the microphone 308, generate sound and supply it to the speakers 310, and generate images and supply them to the microdisplay panel 306. The controller 312 may be located in a temple and/or in some other part of the display glasses 300.

[0041] In some embodiments, the display glasses 300 may include one or more rear/side microphones 318 for capturing sounds emanating from outside the user's field of view. These microphones 318 may be positioned at the rear of the temples of the glasses 302, as shown in Figure 3.

[0042] It should be understood that the embodiment of Figure 3 may be used, in whole or in part, in conjunction with other embodiments described herein.

[0043] Figure 4 shows display glasses equipped with multiple narrow-angle microphones according to several embodiments of the technology of this disclosure. Referring to Figure 4, display glasses 400 can be implemented using glasses 402. The glasses may include a front section, one or more temples, and one or more lenses. The front section may rest on the user's nose bridge. Each temple may rest on the user's ears. However, while the embodiments described are implemented using glasses, it should be understood that other embodiments may be implemented using any structure that can be worn on the user's head. For example, such structures may include headbands, hats, masks, and so on.

[0044] The display glasses 400 may include, for example, one or more eye-tracking devices 304, one or more microdisplay panels 306, one or more auditory transducers, one or more rear/side microphones 318, and a controller 312, as described above.

[0045] The display glasses 400 may include multiple narrow-angle microphones 408. The microphones 408 may be mounted along the front of the display glasses 400. The angle β of the microphones 408 may be selected according to the user's needs.
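Choosing which narrow-angle microphones to use for a given gaze can be sketched as a coverage test: a microphone is treated as active when the gaze lies within β/2 of its axis. Representing each microphone by the azimuth of its axis, the fall-back to the nearest microphone, and the function names are illustrative assumptions, not details from the disclosure.

```python
def _wrap_deg(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def active_microphones(mic_axes_deg, beta_deg, gaze_azimuth_deg):
    """Return indices of microphones whose coverage cone contains the gaze.

    Microphone i is assumed to cover mic_axes_deg[i] +/- beta_deg / 2.
    """
    active = [
        i for i, axis in enumerate(mic_axes_deg)
        if abs(_wrap_deg(gaze_azimuth_deg - axis)) <= beta_deg / 2.0
    ]
    if not active:
        # Gaze falls outside every cone: fall back to the closest microphone.
        active = [min(
            range(len(mic_axes_deg)),
            key=lambda i: abs(_wrap_deg(gaze_azimuth_deg - mic_axes_deg[i])),
        )]
    return active
```

Turning on only the microphones listed in the result and muting the rest corresponds to the switching technique described in [0053]; a gaze on the boundary between two cones activates both.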

[0046] It should be understood that the embodiment of Figure 4 may be used, in whole or in part, in conjunction with other embodiments described herein.

[0047] Figure 5 shows display glasses equipped with a camera and multiple narrow-angle microphones according to several embodiments of the technology of this disclosure. Referring to Figure 5, display glasses 500 can be implemented using glasses 502. The glasses may include a front section, one or more temples, and one or more lenses. The front section may rest on the user's nose bridge. Each temple may rest on the user's ears. However, while the embodiments described are implemented using glasses, it should be understood that other embodiments may be implemented using any structure that can be worn on the user's head. For example, such structures may include headbands, hats, masks, and the like.

[0048] The display glasses 500 may include, for example, one or more eye-tracking devices 304, one or more microdisplay panels 306, one or more auditory transducers, one or more rear/side microphones 318, and a controller 312, as described above. The display glasses 500 may also include, for example, a plurality of narrow-angle microphones 408, as described above.

[0049] The display glasses 500 may include a camera 510. The controller 312 can receive images from the camera 510 and process those images. Based on the received and/or processed images, the controller 312 can supply images to the microdisplay panel 306. The camera 510 may be implemented as a digital camera or the like, although any type of camera capable of performing the functions described herein may be used. In some implementations, the camera 510 can serve as an image source for the microdisplay panel 306, either at the same time as or after image capture by the camera 510.

[0050] It should be understood that the embodiment of Figure 5 may be used, in whole or in part, in conjunction with other embodiments described herein.

[0051] Figure 6 shows an exemplary auditory augmentation process 600 according to some embodiments of the technology of this disclosure. Process 600 can be performed, for example, by the display glasses shown in Figures 3, 4, and 5. The elements of the processes disclosed herein are presented in a particular order; however, one or more elements of each process may be performed in a different order, performed simultaneously, or omitted.

[0052] Referring to Figure 6, process 600 may include determining the gaze direction in 602. The gaze direction represents the direction in which the user of the display glasses is looking. In the examples of Figures 3, 4, and 5, the gaze direction can be determined by the controller 312 of the display glasses based on a signal received from the eye-tracking device 304.
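The step at 602 can be sketched as converting eye-tracker output into a gaze direction. The sketch assumes, hypothetically, that the eye-tracking device 304 reports a yaw and pitch angle for each eye in degrees; it simply averages the two eyes and ignores vergence, which a real controller would likely need to handle. `EyeSample` and `gaze_direction` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class EyeSample:
    """Hypothetical eye-tracker reading for one eye, in degrees.

    yaw_deg > 0 means looking right; pitch_deg > 0 means looking up.
    """
    yaw_deg: float
    pitch_deg: float


def gaze_direction(left: EyeSample, right: EyeSample):
    """Combine both eyes into one head-frame unit vector (x right, y up, z forward)."""
    yaw = math.radians((left.yaw_deg + right.yaw_deg) / 2.0)
    pitch = math.radians((left.pitch_deg + right.pitch_deg) / 2.0)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )
```

The horizontal component of this vector, for example `math.degrees(math.atan2(x, z))`, is the gaze azimuth that the microphone-selection and beamforming steps at 604 would consume.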

[0053] Process 600 may include, in 604, collecting sound emanating from the gaze direction. In the example of Figure 3, sound can be collected by the wide-angle microphone 308, which can supply an auditory signal representing the collected sound to the controller 312. In the examples of Figures 4 and 5, sound can be collected by one or more narrow-angle microphones 408, which can supply an auditory signal representing the collected sound to the controller 312. In some embodiments, the controller 312 can isolate sound emanating from the gaze direction by turning on one or more microphones 408 that face the gaze direction and turning off the other microphones 408. In some embodiments, the controller 312 can isolate sound emanating from the gaze direction using multiple microphones 408 and beamforming techniques. Other techniques may be used instead of, or in addition to, these techniques.
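The disclosure does not specify which beamforming technique is used. Delay-and-sum beamforming, one common choice, is sketched below under these assumptions: the microphones 408 lie on a straight line along the front of the glasses, each described by its x position in metres, sound arrives as a plane wave, and delays are rounded to whole samples. The function name and parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C


def delay_and_sum(channels, mic_x_m, steer_deg, sample_rate):
    """Steer a linear microphone array toward `steer_deg` and average.

    0 degrees is straight ahead; positive angles point toward +x.
    Delays are rounded to whole samples for simplicity.
    """
    theta = math.radians(steer_deg)
    # A plane wave from angle theta reaches a mic at position x earlier by
    # x * sin(theta) / c; delaying each channel by that amount aligns them.
    delays = [round(x * math.sin(theta) / SPEED_OF_SOUND * sample_rate) for x in mic_x_m]
    base = min(delays)
    delays = [d - base for d in delays]  # shift so every delay is non-negative
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t - d
            acc += ch[idx] if 0 <= idx < n else 0.0
        out.append(acc / len(channels))
    return out
```

Steering toward the gaze azimuth from 602 makes sound from that direction add coherently, while sound from other directions adds out of step and is attenuated. Fractional delays or frequency-domain beamformers would allow finer steering than whole-sample delays.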

[0054] Referring again to Figure 6, process 600 may include extracting speech from the collected sound in 606. In the examples of Figures 3, 4, and 5, the controller 312 can extract speech from the collected sound. The speech can be extracted from the collected sound using any technique, such as conventional speech recognition techniques.

[0055] Referring again to Figure 6, process 600 may include, in 608, presenting the extracted speech on one or more display panels of the display glasses. The presentation may occur in real time or near real time. In the examples of Figures 3, 4, and 5, the controller 312 may present the extracted speech on one or more of the microdisplay panels 306. In some embodiments, the extracted speech may be presented as text. For example, the text may be presented as multiple words simultaneously, such as a sentence, or as words in chronological order, that is, one word at a time. In some embodiments, the extracted speech may be presented as hand signals, for example in American Sign Language (ASL). In some embodiments, the display glasses may translate the speech into another language and present the text in that language.
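The two text presentation modes in [0055] can be sketched as a generator of caption frames for the microdisplay. The `mode` strings, the `max_words` limit, and the function name are assumptions made for illustration.

```python
from typing import Iterator, List


def caption_frames(words: List[str], mode: str, max_words: int = 8) -> Iterator[str]:
    """Yield successive caption frames for the display panel.

    mode == "sentence": several words at once, up to max_words per frame.
    mode == "word": one word per frame, in chronological order.
    """
    if mode == "word":
        yield from words
    elif mode == "sentence":
        for i in range(0, len(words), max_words):
            yield " ".join(words[i:i + max_words])
    else:
        raise ValueError(f"unknown mode: {mode!r}")
```

For example, `list(caption_frames("where is the train station".split(), "sentence", max_words=3))` gives `["where is the", "train station"]`, while `"word"` mode shows the five words one after another.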

[0056] In some embodiments, in addition to presenting the extracted speech on a display panel, the display glasses may also present the collected sound or extracted speech to the user as sound. Referring again to Figure 6, process 600 may include, in 610, supplying audio representing the isolated sound or the extracted speech to the auditory transducer of the display glasses, and, in 612, rendering the audio with the auditory transducer. In the examples of Figures 3, 4, and 5, the controller 312 may supply audio representing the collected sound or extracted speech to one or more speakers 310, which render the audio as sound for the user. In various embodiments, the sound may be rendered by speakers, ear speakers, bone conduction hearing systems, cochlear implants, other hearing improvement devices, or combinations thereof. In some embodiments, the display glasses may translate the speech into another language and present the speech in that language. Process 600 may then return to 602.
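Process 600 as a whole can be summarized as one pass of a loop. In this sketch each hardware- or model-specific step is injected as a callable, since the disclosure leaves those techniques open; the function name and the representation of audio as a list of floats are hypothetical. The caller repeats the call to return to 602.

```python
from typing import Callable, List, Optional


def process_600_step(
    determine_gaze: Callable[[], float],
    collect_sound: Callable[[float], List[float]],
    extract_speech: Callable[[List[float]], str],
    show_text: Callable[[str], None],
    play_audio: Optional[Callable[[List[float]], None]] = None,
) -> str:
    """Run one pass of the gaze-directed augmentation loop and return the speech."""
    gaze = determine_gaze()          # 602: gaze direction from the eye tracker
    sound = collect_sound(gaze)      # 604: isolate sound from that direction
    speech = extract_speech(sound)   # 606: speech recognition
    show_text(speech)                # 608: present on the display panel
    if play_audio is not None:       # 610/612: optional audio rendering
        play_audio(sound)
    return speech
```

Injecting the steps keeps the control flow testable with stand-in functions and lets the same loop drive either the switched-microphone or the beamforming variant of step 604.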

[0057] In some embodiments, the display glasses may include one or more side and/or rear microphones that collect sound from the sides and/or rear of the user and provide audio representing the collected sound to the controller of the display glasses. The controller may perform speech recognition on the collected sound and, upon recognizing one or more predetermined keywords, present the sound and/or speech to the user. Figure 7 shows a keyword-based auditory augmentation process 700 according to an embodiment of the technology of the present disclosure.

[0058] Referring to Figure 7, process 700 may include collecting sound from the sides and rear of the wearer's head in 702. In the examples of Figures 3, 4, and 5, one or more wide-angle microphones 318, which may be positioned to face the sides and rear of the wearer's head, can collect sound and provide the controller 312 with an audio representation of the sound.
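The disclosure does not say how the controller determines the direction from which a sound collected by the side and rear microphones originates, which the off-center keyword presentation in 706 relies on. One common approach, named here as an assumption and not as the disclosed method, is to estimate the time difference of arrival (TDOA) between two microphones by cross-correlation and convert it to an angle. The function names, the 343 m/s speed of sound, and the sample rate and spacing in the example are illustrative. A single microphone pair cannot distinguish front from back, so a device would likely combine several pairs.

```python
import math
from typing import Sequence


def best_lag(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which `other` best matches `ref`.

    Hypothetical helper. A positive result means the sound reached the `ref`
    microphone first. This is brute-force cross-correlation, adequate for
    short frames; long frames would normally use an FFT-based correlation.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            ref[n] * other[n + lag]
            for n in range(len(ref))
            if 0 <= n + lag < len(other)
        )
        if score > best_score:
            best, best_score = lag, score
    return best


def lag_to_angle(lag: int, sample_rate: float, mic_spacing_m: float,
                 speed_of_sound: float = 343.0) -> float:
    """Convert a sample lag to an arrival angle in degrees from the pair's broadside."""
    s = speed_of_sound * lag / (sample_rate * mic_spacing_m)
    s = max(-1.0, min(1.0, s))  # guard against |s| > 1 caused by noisy lag estimates
    return math.degrees(math.asin(s))
```

With a click that arrives 3 samples later at the second microphone, `best_lag` returns 3, and at an assumed 48 kHz sample rate and 0.15 m spacing `lag_to_angle(3, 48000, 0.15)` gives roughly 8.2 degrees off broadside.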

[0059] Referring again to Figure 7, process 700 may include, in 704, determining whether a keyword is present in the collected sound. The keyword may include, for example, the user's name. In some embodiments, the display glasses include a user interface that allows the user to set one or more predetermined keywords. In the examples of Figures 3, 4, and 5, the controller 312 may perform speech recognition on the collected sound and use word-matching techniques to determine whether any of the recognized words match any of the predetermined keywords. However, any suitable technique may be used.
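The word-matching step in 704 can be sketched as follows, assuming a speech recognizer (not shown) has already turned the collected sound into a text transcript. `find_keywords` is a hypothetical helper, and exact whole-word matching is only one simple choice; a device might instead use phonetic or fuzzy matching to tolerate recognition errors.

```python
import re
from typing import Iterable, List

# Assumed tokenization: lowercase letters, digits, and apostrophes form words.
_TOKEN = re.compile(r"[a-z0-9']+")


def find_keywords(transcript: str, keywords: Iterable[str]) -> List[str]:
    """Return the user-configured keywords that appear as whole words in a transcript.

    Hypothetical helper. Multi-word keywords (e.g. "fire alarm") must appear as
    a contiguous run of words. Matching is case-insensitive and ignores punctuation.
    """
    words = _TOKEN.findall(transcript.lower())
    hits = []
    for keyword in keywords:
        kw_words = _TOKEN.findall(keyword.lower())
        n = len(kw_words)
        if n and any(words[i:i + n] == kw_words for i in range(len(words) - n + 1)):
            hits.append(keyword)
    return hits
```

Because matching is on whole words, `find_keywords("Hey, Alex! Over here.", ["alex"])` reports a hit, while a transcript containing only "Alexander" does not.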

[0060] If no keyword is detected, process 700 may return to 702 and continue collecting sound. If a keyword is detected, process 700 may, in 706, alert the user to the keyword. In some embodiments, the alert may include supplying audio representing the collected sound to an auditory transducer, which renders it as sound. In the examples of Figures 3, 4, and 5, the controller 312 may supply the collected sound or recognized speech to one or more speakers 310, which render it as sound for the user. In some embodiments, the alert may include presenting the keyword on one or more displays of the display glasses, for example as text or hand signals. In the examples of Figures 3, 4, and 5, the controller 312 may present the keyword on one or more of the microdisplay panels 306. In some embodiments, the keyword may be presented as sound, or off-center on the display, to indicate the direction from which the sound originates. After alerting the user to the keyword, process 700 may return to 702.
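Presenting the keyword off-center to indicate where the sound came from requires some mapping from direction to screen position. The disclosure does not specify one; the sketch below assumes a simple linear mapping from horizontal direction (azimuth) to a pixel column. The azimuth convention, the function name, and the panel width are hypothetical.

```python
def keyword_x_position(azimuth_deg: float, panel_width: int) -> int:
    """Map a sound's horizontal direction to a column on the display panel.

    Hypothetical linear mapping, not the disclosed method.
    azimuth_deg: 0 = straight ahead, negative = wearer's left,
    positive = wearer's right, +/-180 = directly behind
    (values outside [-180, 180] are clamped).
    Left sounds are drawn toward the left edge, right sounds toward the
    right edge, and a sound straight ahead maps to the center column.
    """
    a = max(-180.0, min(180.0, azimuth_deg))
    return round((a + 180.0) / 360.0 * (panel_width - 1))
```

On an assumed 640-pixel-wide panel, a sound from the wearer's left (-90 degrees) places the keyword at column 160 and a sound from the right (90 degrees) at column 479. A sound from directly behind is ambiguous under this mapping and lands at an edge, which is one reason a design might add an explicit "behind" indicator.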

[0061] In some embodiments, the display glasses may include a camera for capturing images of people speaking, and lip-reading techniques may be applied to the captured images. Figure 8 shows a lip-reading image-based auditory augmentation process 800 according to an embodiment of the technology of the present disclosure.

[0062] Referring to Figure 8, process 800 may include determining the gaze direction in 802. The gaze direction represents the direction in which the user of the display glasses is looking. In the examples of Figures 3, 4, and 5, the gaze direction can be determined by the controller 312 of the display glasses based on a signal received from the eye-tracking device 304.

[0063] Process 800 may include capturing an image of the mouth of a person in the line of sight in 804. In the example in Figure 5, the image can be captured by camera 530, which can supply a video signal representing the captured image to controller 312.
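Capturing the mouth of the person in the line of sight (804) implies locating the gaze direction within the camera image. The disclosure does not describe this step; the sketch below assumes camera 530 is rigidly aligned with the eye-tracker's reference frame, uses a simple pinhole-camera model, and ignores parallax between the eye and the camera. Function names, field-of-view values, and the crop size are hypothetical.

```python
import math
from typing import Tuple


def gaze_to_pixel(yaw_deg: float, pitch_deg: float, width: int, height: int,
                  hfov_deg: float, vfov_deg: float) -> Tuple[int, int]:
    """Project a gaze direction onto the camera image (pinhole model, assumed alignment).

    yaw_deg: positive = right of center; pitch_deg: positive = above center.
    Returns the (x, y) pixel, clamped to the image bounds.
    """
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)   # focal length in pixels
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    x = width / 2 + fx * math.tan(math.radians(yaw_deg))
    y = height / 2 - fy * math.tan(math.radians(pitch_deg))    # image y grows downward
    return (min(max(round(x), 0), width - 1), min(max(round(y), 0), height - 1))


def crop_box(center: Tuple[int, int], size: int, width: int,
             height: int) -> Tuple[int, int, int, int]:
    """Square crop (x0, y0, x1, y1) around `center`, shifted to stay inside the image.

    Assumes size <= width and size <= height.
    """
    cx, cy = center
    half = size // 2
    x0 = min(max(cx - half, 0), width - size)
    y0 = min(max(cy - half, 0), height - size)
    return (x0, y0, x0 + size, y0 + size)
```

For a 640x480 camera with an assumed 90-by-60-degree field of view, a straight-ahead gaze maps to pixel (320, 240), and a 100-pixel crop around it is (270, 190, 370, 290). The cropped region would then be passed to the speech-extraction step in 806.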

[0064] Referring again to Figure 8, process 800 may include, in 806, extracting speech from the captured image. In the example of Figure 5, the controller 312 may extract the speech from the captured image using any suitable technique, such as conventional lip-reading techniques. These techniques may be implemented, for example, with a machine learning model.

[0065] Referring again to Figure 8, process 800 may include, in 808, presenting the extracted speech on one or more display panels of the display glasses. The presentation may be in real time or near real time. In the example of Figure 5, the controller 312 may present the extracted speech on one or more of the microdisplay panels 306, for example, as described above.

[0066] As described above, in some embodiments, in addition to presenting the extracted speech on the display panel, the display glasses may also present the collected sound or extracted speech to the user as sound. Process 800 may then return to 802.

[0067] In some embodiments, the display glasses may include a camera for capturing images of people's hand movements, and hand-signal reading techniques may be applied to the captured images. Figure 9 shows a hand signal reading image-based auditory augmentation process 900 according to an embodiment of the technology of the present disclosure.

[0068] Referring to Figure 9, process 900 may include determining the gaze direction in 902. The gaze direction represents the direction in which the user of the display glasses is looking. In the examples of Figures 3, 4, and 5, the gaze direction can be determined by the controller 312 of the display glasses based on a signal received from the eye-tracking device 304.

[0069] Process 900 may include, in 904, capturing an image of the hands of a person in the line of sight. In the example in Figure 5, the image can be captured by camera 530, which can supply a video signal representing the captured image to controller 312.

[0070] Referring again to Figure 9, process 900 may include, in 906, extracting speech from the captured image. In the example of Figure 5, the controller 312 may extract the speech from the captured image using any suitable technique, such as conventional hand-signal recognition techniques. These techniques may be implemented, for example, with a machine learning model.

[0071] Referring again to Figure 9, process 900 may include, in 908, presenting the extracted speech on one or more display panels of the display glasses. The presentation may be in real time or near real time. In the example of Figure 5, the controller 312 may present the extracted speech on one or more of the microdisplay panels 306, for example, as described above. For example, the extracted speech may be presented as hand signals.

[0072] As described above, in some embodiments, in addition to presenting the extracted speech on the display panel, the display glasses may also present the collected sound or extracted speech to the user as sound. Process 900 may then return to 902.
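Processes 800 and 900 share the same loop and differ only in how speech is extracted from the captured image. The sketch below expresses that shared structure with injected callables; every name is hypothetical, and the lip-reading model (process 800) or hand-signal recognizer (process 900) is a placeholder, since the disclosure leaves those techniques open.

```python
from typing import Any, Callable, Optional, Tuple

Gaze = Tuple[float, float]  # (yaw_deg, pitch_deg): hypothetical convention


def run_image_augmentation(
    get_gaze: Callable[[], Gaze],
    capture_region: Callable[[Gaze], Any],
    extract_speech: Callable[[Any], Optional[str]],
    present: Callable[[str], None],
    iterations: int,
) -> None:
    """Run steps 802-808 (or 902-908) once per iteration.

    `extract_speech` is where a lip-reading model or a hand-signal recognizer
    would plug in; both are hypothetical placeholders here. A device would loop
    indefinitely; `iterations` bounds the loop so it can be tested.
    """
    for _ in range(iterations):
        gaze = get_gaze()               # 802 / 902: determine gaze direction
        image = capture_region(gaze)    # 804 / 904: capture mouth or hands in line of sight
        speech = extract_speech(image)  # 806 / 906: lip reading or hand-signal recognition
        if speech:
            present(speech)             # 808 / 908: show on the microdisplay panels
```

Keeping the extractor as a parameter lets the same loop serve both processes, and a frame in which nothing is recognized (the extractor returns `None`) simply produces no output before the loop returns to gaze determination.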

[0073] Figure 10 shows a block diagram of an exemplary computer system 1000 that can carry out the embodiments described herein. The computer system 1000 includes a bus 1002 or other communication mechanism for communicating information and one or more hardware processors 1004 connected to the bus 1002 for processing information. The hardware processors 1004 may be, for example, one or more general-purpose microprocessors.

[0074] The computer system 1000 further includes main memory 1006, such as random access memory (RAM), cache, and/or other dynamic storage devices, connected to the bus 1002 for storing information and instructions to be executed by the processor 1004. The main memory 1006 may also be used to store temporary variables or other intermediate information during execution of instructions by the processor 1004. Once such instructions are stored in a storage medium accessible to the processor 1004, the computer system 1000 becomes a special-purpose machine customized to perform the operations specified in the instructions.

[0075] The computer system 1000 further includes read-only memory (ROM) 1008 or another static storage device connected to the bus 1002 for storing static information and instructions for the processor 1004. A storage device 1010, such as a magnetic disk, optical disk, or USB thumb drive (flash drive), is provided and connected to the bus 1002 for storing information and instructions.

[0076] The computer system 1000 may be connected via the bus 1002 to a display 1012, such as a liquid crystal display (LCD) (or touchscreen), for displaying information to the computer user. An input device 1014, including alphanumeric and other keys, is connected to the bus 1002 for communicating information and command selections to the processor 1004. Another type of user input device is a cursor control unit 1016, such as a mouse, trackball, or cursor direction keys, for communicating direction information and command selections to the processor 1004 and for controlling cursor movement on the display 1012. In some embodiments, the same direction information and command selections as the cursor control unit may be received via touches on a touchscreen, without a cursor.

[0077] The computer system 1000 may include a user interface module for implementing a GUI, which may be stored in mass storage as executable software code executed by the computing device. This module and other modules may include, by way of example, components such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

[0078] Generally, when used herein, words such as “component,” “engine,” “system,” “database,” and “datastore” can refer to logic embodied in hardware or firmware, or to a set of software instructions written in a programming language such as Java, C, or C++, possibly having entry and exit points. Software components may be compiled and linked into an executable program, installed in a dynamic link library, or written in an interpreted programming language such as BASIC, Perl, or Python. It will be understood that software components may be callable by other components or by themselves, and may even be called in response to the detection of an event or interrupt. Software components configured to run on a computing device may be provided on a computer-readable medium such as a compact disk, digital video disk, flash drive, magnetic disk, or any other tangible medium, or as a digital download (initially stored in a compressed or installable format, which may require installation, decompression, or decryption prior to execution). Such software code, in part or in whole, may be stored on the memory device of the computing device for execution by the computing device. Software instructions may be embedded in firmware such as an EPROM. It will be further understood that hardware components may consist of connected logic units such as gates and flip-flops, and / or programmable units such as a programmable gate array or a processor.

[0079] Computer system 1000 may implement the techniques described herein using customized hardwired logic, one or more ASICs or FPGAs, firmware, and/or program logic which, in combination with the computer system, causes or programs computer system 1000 to be a special-purpose machine. According to one embodiment, the techniques described herein are performed by computer system 1000 in response to processor 1004 executing one or more sequences of one or more instructions contained in main memory 1006. Such instructions may be read into main memory 1006 from another storage medium, such as storage device 1010. Execution of the sequences of instructions contained in main memory 1006 causes processor 1004 to perform the process steps described herein. In alternative embodiments, hardwired circuitry may be used in place of or in combination with software instructions.

[0080] As used herein, the term “non-transitory media” and similar terms refer to any media that store data and/or instructions that cause a machine to operate in a specific way. Such non-transitory media may include non-volatile media and/or volatile media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 1010. Volatile media include dynamic memory, such as main memory 1006. Common forms of non-transitory media include, for example, floppy disks, flexible disks, hard disks, solid-state drives, magnetic tape or any other magnetic data storage medium, CD-ROMs, any other optical data storage medium, any physical medium with a pattern of holes, RAM, PROM, EPROM, FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.

[0081] Non-transitory media are distinct from, but may be used in conjunction with, transmission media. Transmission media participate in transferring information between non-transitory media. For example, transmission media include coaxial cables, copper wire, and optical fibers, including the wires that make up bus 1002. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.

[0082] The computer system 1000 further includes a communication interface 1018 connected to the bus 1002. The communication interface 1018 provides two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, the communication interface 1018 may be an Integrated Services Digital Network (ISDN) card, cable modem, satellite modem, or modem that provides a data communication connection to a corresponding type of telephone line. As another example, the communication interface 1018 may be a local area network (LAN) card (or a WAN component communicating with a WAN) that provides a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, the communication interface 1018 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.

[0083] A network link typically provides data communication to other data devices over one or more networks. For example, a network link can provide connection to data equipment operated by a host computer or an Internet Service Provider (ISP) over a local network. An ISP provides data communication services over a worldwide packet data communication network now commonly referred to as the “Internet.” Both local networks and the Internet use electrical, electromagnetic, or optical signals to carry digital data streams. Examples of transmission media include signals over various networks that carry digital data to and from computer system 1000, as well as signals over a communication interface 1018 on a network link.

[0084] The computer system 1000 can send messages and receive data, including program code, via a network, network links, and communication interface 1018. In the case of the internet, a server can send requested code for an application program via the internet, an ISP, a local network, and communication interface 1018.

[0085] The received code may be executed by the processor 1004 as it is received, or it may be stored in the storage device 1010 or another non-volatile storage device for later execution.

[0086] Each of the processes, methods, and algorithms described herein may be embodied in code components executed by one or more computer systems or computer processors, including computer hardware, and may be automated in whole or in part. One or more computer systems or computer processors may also be configured to support execution of the relevant operations in a “cloud computing” environment or as “software as a service” (SaaS). Processes and algorithms may be implemented in part or in whole in application-specific circuits. The various features and processes described above may be used independently of each other or combined in various ways. Various combinations and subcombinations are intended to be included within the scope of this disclosure, and in some implementations, certain methods or process blocks may be omitted. Furthermore, the methods and processes described herein are not limited to any particular order, and the blocks or states associated therewith may be executed in any other appropriate order, in parallel, or in any other manner. Blocks or states may be added to or removed from the disclosed exemplary embodiments. The execution of a particular action or process need not reside within a single machine; it may also be deployed across several machines and distributed among computer systems or computer processors.

[0087] As used herein, circuits may be implemented using any form of hardware, or a combination of hardware and software. For example, a circuit can be configured by implementing one or more processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logic components, software routines, or other mechanisms. In an implementation, the various circuits described herein may be implemented as separate circuits, or some or all of the described functions and features may be shared among one or more circuits. Various features or elements of a function may be described individually, or claimed as separate circuits, but these features and functions can be shared among one or more common circuits, and such descriptions do not require or suggest that separate circuits are necessary to implement such features or functions. If all or part of a circuit is implemented using software, such software may be implemented to run on a computing system or processing system capable of performing the functions described with respect to such software, such as computer system 1000.

[0088] As used herein, the term “or” may be construed in either an inclusive or an exclusive sense. Furthermore, descriptions of resources, operations, or structures in the singular should not be read to exclude the plural. Conditional language, such as “can,” “could,” “might,” or “may,” unless specifically stated otherwise or otherwise understood within the context in which it is used, is generally intended to convey that certain embodiments include certain features, elements, and/or steps, while other embodiments do not.

[0089] Terms and phrases used herein, and variations thereof, unless expressly stated otherwise, should be construed as open-ended rather than limiting. Adjectives such as “conventional,” “traditional,” “ordinary,” “standard,” and “known,” and terms of similar meaning, should not be construed as limiting the items described to a given time period or to items available as of a given time, but instead should be read to encompass conventional, traditional, ordinary, or standard technologies that may be available or known now or at any time in the future. The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to,” or other similar phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases are absent.

Claims

1. A head-mounted device, comprising: a first microphone positioned at the front end of the device and directed forward of the wearer, for collecting a first sound emitted from in front of the wearer's head; a display panel visible to the wearer; an eye-tracking device configured to determine the direction of the wearer's gaze; a plurality of second microphones positioned at both the side and rear ends of the device, for collecting second sounds emitted from the sides and rear of the wearer's head; an auditory transducer; and a controller configured to: extract speech from the first sound collected by the first microphone from the determined direction; display the extracted speech on the display panel; in response to the second sound collected by the plurality of second microphones representing a predetermined keyword, supply the second sound to the auditory transducer, the auditory transducer rendering the second sound; and display the keyword at a position on the display panel indicating the direction from which the second sound was emitted.

2. The head-mounted device according to claim 1, wherein the controller is further configured to display the extracted speech as text on the display panel.

3. The head-mounted device according to claim 1, wherein the controller is further configured to present the extracted speech as hand signals on the display panel.

4. A head-mounted device, comprising: a first microphone positioned at the front end of the device and directed forward of the wearer, for collecting a first sound emitted from in front of the wearer's head; a display panel visible to the wearer; an eye-tracking device configured to determine the direction of the wearer's gaze; a plurality of second microphones positioned at both the side and rear ends of the device, for collecting second sounds emitted from the sides and rear of the wearer's head; and a controller configured to: extract a first speech from the first sound collected by the first microphone from the determined direction; display the extracted first speech on the display panel; in response to the second sound representing a predetermined keyword, extract a second speech from the second sound collected by the plurality of second microphones; and display the extracted second speech at a position on the display panel indicating the direction from which the second sound was emitted.

5. The head-mounted device according to claim 1, wherein the controller is further configured to supply to the auditory transducer audio representing the first sound collected by the first microphone from the determined direction, and the auditory transducer renders that audio.

6. The head-mounted device according to claim 1, further comprising a hearing aid system that includes the auditory transducer.

7. The head-mounted device according to claim 1, wherein the speech extracted from the first sound is a first speech, and the controller is further configured to extract a second speech from the second sound collected by the plurality of second microphones and to supply to the auditory transducer audio representing either or both of the first speech and the second speech, the auditory transducer rendering the audio.

8. The head-mounted device according to claim 1, wherein the head-mounted device is a pair of glasses.

9. A non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor of a computing component, the instructions causing the hardware processor to perform a method relating to a head-mounted device, the method comprising: determining the direction of the gaze of a wearer of the head-mounted device; collecting a first sound emitted from the determined direction using a first microphone, the first microphone being positioned at the front end of the device, directed forward of the wearer, and collecting the first sound emitted from in front of the wearer's head; extracting a first speech from the collected first sound; presenting the extracted first speech on a display panel of the head-mounted device, the display panel being visible to the wearer; collecting a second sound from the sides and rear of the wearer's head using second microphones, the second microphones being positioned at both the side and rear ends of the device and collecting the second sound emitted from the sides and rear of the wearer's head; in response to the second sound representing a predetermined keyword, supplying audio representing the second sound to an auditory transducer of the head-mounted device, the auditory transducer rendering the audio; and displaying the keyword at a position on the display panel indicating the direction from which the second sound was emitted.

10. The non-transitory machine-readable storage medium according to claim 9, wherein the head-mounted device is a pair of glasses.

11. The head-mounted device according to claim 4, wherein the controller is further configured to display either or both of the extracted first speech and the second speech as text on the display panel.

12. The head-mounted device according to claim 4, wherein the controller is further configured to present either or both of the extracted first speech and the second speech as hand signals on the display panel.

13. The head-mounted device according to claim 4, further comprising an auditory transducer, wherein the controller is further configured to supply to the auditory transducer an audio signal representing the first sound collected by the first microphone from the determined direction, and the auditory transducer renders the audio signal.

14. The head-mounted device according to claim 13, further comprising a hearing aid system that includes the auditory transducer.

15. The head-mounted device according to claim 4, further comprising an auditory transducer, wherein the controller is further configured to supply to the auditory transducer audio representing either or both of the extracted first speech and second speech, and the auditory transducer renders the audio.

16. The head-mounted device according to claim 4, wherein the head-mounted device is a pair of glasses.

17. The non-transitory machine-readable storage medium according to claim 9, wherein the method further comprises: extracting a second speech from the second sound; and displaying either or both of the extracted first speech and second speech as text on the display panel.

18. The non-transitory machine-readable storage medium according to claim 9, wherein the method further comprises projecting the extracted first speech onto the display panel using an off-axis projector, and wherein the display panel is a semi-transparent diffuser.

19. The non-transitory machine-readable storage medium according to claim 9, wherein the method further comprises: extracting a second speech from the second sound; and presenting either or both of the extracted first speech and second speech as hand signals on the display panel.

20. The non-transitory machine-readable storage medium according to claim 9, wherein the method further comprises supplying audio representing the collected first sound to the auditory transducer of the head-mounted device, the auditory transducer rendering the audio.

21. The non-transitory machine-readable storage medium according to claim 9, wherein the method further comprises: extracting a second speech from the second sound; and supplying audio representing either or both of the extracted first speech and second speech to the auditory transducer of the head-mounted device, the auditory transducer rendering the audio.