18 results about "Voice activity" patented technology

Voice activity detection is an essential component of many audio systems, such as automatic speech recognition and speaker recognition.

An electronic device and method for audio processing

PCT designated stage | WO2026116709A1 | Microphones, Loudspeakers, Voice activity, Speech sound

A method for audio processing performed by an electronic device is provided. The method includes detecting at least one of a first voice activity near a first device and a second voice activity near a second device using a voice recognition module associated with the first device, comparing the first voice activity and the second voice activity to determine whether the first voice activity and the second voice activity exceed a predetermined threshold, and outputting the at least one of the first voice activity or the second voice activity through the second device in response to determining that the first voice activity and the second voice activity exceed the predetermined threshold.

Owner:SAMSUNG ELECTRONICS CO LTD

A voice and visual interaction control method for safe driving

Pending | CN122275933A | Interaction control, Driver/operator

This invention discloses a safe driving voice and visual interaction control method. The method includes: synchronously collecting and preprocessing the driver's visual data and voice interaction data; extracting key visual features based on the preprocessed visual data and calculating visual state feature values; triggering standardized voice interaction based on the visual state feature values, calculating a voice activity score in conjunction with the preprocessed voice interaction data, and determining the driver's voice response delay level based on the voice activity score; matching based on a predefined 3D virtual guide action sequence according to the voice response delay level, triggering a linkage response after matching to form a non-intrusive driving reminder with light, sound, and shape linkage; and completing closed-loop feedback control based on the execution state of the 3D virtual guide action sequence and the non-intrusive driving reminder with light, sound, and shape linkage. The method provided by this invention can reduce the monitoring misjudgment rate and achieve non-intrusive safety reminders.

Owner:SHANGHAI CHANGXING SOFTWARE CO LTD

Audio system and method for voice activity detection

Active | CN115868178B | Microphone signal, Voice activity

Audio systems, methods, and processor instructions are provided herein that detect voice activity of a user and provide an output voice signal. The systems, methods, and instructions receive a plurality of microphone signals and combine the plurality of microphone signals according to a first combination and a second combination. The first combination produces a primary signal having an enhanced response in a direction of a mouth of the user, and the second combination produces a reference signal having a reduced response in the direction of the mouth of the user. The primary signal and the reference signal are added and subtracted to produce a sum signal and a difference signal, respectively. The sum signal is compared to the difference signal and an output voice signal is provided based on the comparison.

Owner:BOSE CORP
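
The sum-and-difference step in the Bose abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample frames are hypothetical, and the abstract does not say how the comparison is turned into a decision, so the sketch stops at computing the two energies. One fact that holds for any pair of frames is that the energy of the sum minus the energy of the difference equals four times the inner product of the primary and reference signals, so the comparison effectively measures how correlated the two signals are.

```python
def frame_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def sum_difference_energies(primary, reference):
    """Energies of (primary + reference) and (primary - reference)."""
    if len(primary) != len(reference):
        raise ValueError("frames must have the same length")
    sum_signal = [p + r for p, r in zip(primary, reference)]
    diff_signal = [p - r for p, r in zip(primary, reference)]
    return frame_energy(sum_signal), frame_energy(diff_signal)


def energy_gap(primary, reference):
    """E_sum - E_diff, which equals 4 * sum(p * r) for any two frames."""
    e_sum, e_diff = sum_difference_energies(primary, reference)
    return e_sum - e_diff


# Hypothetical frames, chosen only to make the arithmetic easy to follow.
primary = [1.0, -1.0, 1.0, -1.0]
uncorrelated = [1.0, 1.0, -1.0, -1.0]   # inner product with primary is 0
scaled_copy = [0.5, -0.5, 0.5, -0.5]    # inner product with primary is 2

gap_uncorrelated = energy_gap(primary, uncorrelated)  # 0.0
gap_correlated = energy_gap(primary, scaled_copy)     # 8.0
```

How the gap (or a ratio of the two energies) maps to "voice" or "no voice" depends on the microphone geometry and the two combinations, which the abstract does not specify.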

Electronic device and method for audio processing

Pending | US20260148741A1 | Sound input/output, Speech recognition, Voice activity, Speech sound

A method for audio processing performed by an electronic device is provided. The method includes detecting at least one of a first voice activity near a first device and a second voice activity near a second device using a voice recognition module associated with the first device, comparing the first voice activity and the second voice activity to determine whether the first voice activity and the second voice activity exceed a predetermined threshold, and outputting the at least one of the first voice activity or the second voice activity through the second device in response to determining that the first voice activity and the second voice activity exceed the predetermined threshold.

Owner:SAMSUNG ELECTRONICS CO LTD

Methods and devices for encoding and / or decoding spatial background noise within a multi-channel input signal

Pending | US20260179627A1 | Speech analysis, Algorithm, Voice activity

The present document describes a method (600) for encoding a multi-channel input signal (101) which comprises N different channels. The method (600) comprises, for a current frame of a sequence of frames, determining (601) whether the current frame is an active frame or an inactive frame using a signal and/or a voice activity detector, and determining (602) a downmix signal (103) based on the multi-channel input signal (101), wherein the downmix signal (103) comprises N channels or fewer. In addition, the method (600) comprises determining (603) upmixing metadata (105) comprising a set of parameters for generating, based on the downmix signal (103), a reconstructed multi-channel signal (111) comprising N channels, wherein the upmixing metadata (105) is determined in dependence on whether the current frame is an active frame or an inactive frame. The method (600) further comprises encoding (604) the upmixing metadata (105) into a bitstream.

Owner:DOLBY LABORATORIES LICENSING CORP

Voice activity intelligent detection method and device, electronic equipment and storage medium

Pending | CN122417064A | Computer hardware, Voice activity

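This application provides an intelligent voice activity detection method and apparatus, an electronic device, and a storage medium, relating to the technical field of speech signal processing. The method includes: upon determining that a communication start word is present in an audio signal to be recognized, determining a voice activity detection (VAD) threshold based on the energy of each audio frame of the audio signal to be recognized; determining a sliding time window length for an audio signal to be detected based on the jitter rate of that signal; and determining a voice activity detection result for the audio signal to be detected based on a comparison between the window speech energy and the VAD threshold, where the window speech energy is the speech energy of the audio signal to be detected within the sliding time window length. Because the VAD threshold is determined from the energy of each audio frame to be recognized, it reflects the actual energy of the audio signal to be recognized and adapts quickly to dynamically changing ambient noise, which improves the accuracy of voice activity detection.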

Owner:GUANGZHOU HOTAPPS TECH LTD
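
The three steps in the abstract above (threshold from frame energies, window length from jitter rate, windowed energy compared with the threshold) can be sketched as follows. The abstract does not give the formulas, so every rule here is an assumption made for illustration: the threshold is taken as a fraction of the mean frame energy, the window grows linearly with a jitter rate clamped to 0..1, and all function names, parameters, and numbers are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def vad_threshold(reference_energies, scale=0.25):
    """Assumed rule: a fixed fraction of the mean reference frame energy."""
    return scale * sum(reference_energies) / len(reference_energies)


def window_length(jitter_rate, base_frames=2, max_extra_frames=4):
    """Assumed rule: more jitter (clamped to 0..1) gives a longer window."""
    jitter_rate = min(max(jitter_rate, 0.0), 1.0)
    return base_frames + round(jitter_rate * max_extra_frames)


def detect(energies, threshold, win):
    """Flag each frame whose trailing-window mean energy exceeds the threshold."""
    decisions = []
    for i in range(len(energies)):
        window = energies[max(0, i - win + 1):i + 1]
        decisions.append(sum(window) / len(window) > threshold)
    return decisions


# Hypothetical data: four reference frames from the start-word audio,
# then six frames of audio to be detected.
threshold = vad_threshold([4.0, 4.0, 4.0, 4.0])       # 1.0
win = window_length(jitter_rate=0.0)                  # 2 frames
decisions = detect([0.0, 0.0, 4.0, 4.0, 0.0, 0.0], threshold, win)
# decisions == [False, False, True, True, True, False]
```

With a two-frame window the decision stays active for one frame after the energy drops, which shows how a longer window smooths the result when the input is more jittery.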

Method for determining an activity of an intrinsic voice of a user of a hearing device, hearing device, and hearing device system

Active | US12652502B2 | Speech analysis, Hearing aids signal processing, Transducer, Hearing aid

A method for detecting activity of the own voice of a wearer of a hearing device by way of a signal processing apparatus of the hearing device. A first input signal is generated by a first input transducer, and a second input signal is generated by a second input transducer. The two input signals are supplied to a detection unit of the signal processing apparatus, which has a neural network and an input stage, which is connected in front of the neural network. Information signals are generated by the input stage on the basis of the two input signals and the information signals are evaluated by the neural network. A detection result is output by the detection unit based on the evaluation of the information signals by the neural network.

Owner:SIVANTOS PTE LTD

Automated multi-speaker and multi-lingual speech analysis

PCT designated stage | WO2026142921A1 | Semantic vector, Systems analysis

Exemplary systems and methods use a combination of application modules and a neural network architecture for multi-speaker and multi-language speech analysis. The exemplary system can receive a natural language input, which it decomposes into plural segments. A sub-group of the plural segments is accumulated in a buffer, where each segment represents a period during which voice activity is detected. The sub-groups are analyzed for voice activity of multiple speakers, and one or more text segments are generated based on the speakers. A semantic vector for each text segment is generated and stored in vector memory. Relevant data associated with each semantic vector is retrieved from the vector memory based on a similarity measure, and a response including specified information extracted from the one or more text segments is generated based on at least the relevant data.

Owner:ERESTECH

Speech enhancement method and device based on double microphone array, equipment and medium

Pending | CN122392555A | Beam direction, Noise

The specification provides a speech enhancement method based on a dual-microphone array. The method comprises: acquiring the surrounding acoustic signals collected by each of the two microphones in the array; determining coherence information between the two microphones from those signals; judging a speech activity state based on the coherence information and, when the state is speech-absent, updating a noise covariance matrix of the spatial filter; and determining a gain function value of the post-filter based on the coherence information. The two collected acoustic signals undergo beamforming by the spatial filter to obtain acoustic signals for at least one beam direction. The acoustic signal for any beam direction is then filtered by the post-filter to obtain an estimate of the speech component contained in the acoustic signal for that beam direction.

Owner:ZHEJIANG GEELY HLDG GRP CO LTD +1

Method and system for detecting speech activities and method and system for speech improvement

Active | DE602021056325T2 | Speech analysis, Earpiece/earphone attachments, Voice activity, Medicinal chemistry

Owner:SHENZHEN SHOKZ CO LTD

A method and system for scene-aware real-time accompaniment and structured memory generation

Pending | CN122334482A | Engineering, Voice activity

This invention discloses a method and system for real-time accompaniment and structured memory generation based on scene perception, relating to the fields of artificial intelligence, augmented reality, and mobile computing. The system includes core modules such as an audio and video acquisition module, a scene intent recognition and scheduling module, a sensor parameter dynamic configuration module, a real-time dialogue perception engine, and a structured information extraction module. This invention achieves dynamic optimization of hardware parameters driven by scene semantics, balancing perception accuracy and device power consumption. It realizes speaker recognition without training through a large language model, and adopts a local priority architecture to protect data privacy. It can achieve seamless continuous accompaniment after a single trigger, completing real-time understanding of dialogue content, multi-dimensional structured information extraction, and local memory generation. At the same time, through a voice activity adaptive acquisition mechanism and a runtime voice command reconfiguration mechanism, it achieves low-power acquisition and touchless parameter adjustment, making it suitable for various scenarios such as medical consultation, business negotiation, and daily social interaction.

Owner:CHENGDU RUIMIXIN TECHNOLOGY CO LTD

Adaptive speech control system and method for low bandwidth and noisy environments

Pending | CN122090833A | Speech recognition, Transmission, Noise, Environmental perception

This invention discloses an adaptive voice control system and method for low-bandwidth and noisy environments, belonging to the field of speech recognition and intelligent control technology. It includes: a noise reduction processing module for enhancing the original audio signal; an environmental perception module for detecting speech activity and monitoring network status based on the enhanced noise-reduced signal; a collaborative control module for dynamically and collaboratively deciding on the target encoding bitrate and speech recognition model version by querying a built-in mapping table based on the perceived information; a dynamic encoding module for compressing the enhanced noise-reduced signal at the target bitrate; and a lightweight inference module for loading the model version to recognize the enhanced noise-reduced signal and generate control commands. This invention effectively solves the technical challenge of simultaneously ensuring low latency and high accuracy in voice control systems under dynamically changing low-bandwidth and high-noise environments by introducing real-time collaborative optimization of noise reduction preprocessing and encoding / recognition strategies. It is suitable for complex scenarios such as smart homes and industrial control.

Owner:WUHAN HUABO COMM CO LTD
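
The collaborative control step in the abstract above, which picks a target encoding bitrate and a recognition model version by querying a built-in mapping table, might look like the lookup below. The table keys, bitrates, model names, and the bandwidth cut-off are all hypothetical placeholders; the abstract only says that such a table exists and that it is keyed on the perceived speech activity and network status.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    bitrate_kbps: int
    model_version: str


# Hypothetical mapping table: (speech_active, bandwidth_tier) -> policy.
POLICY_TABLE = {
    (True, "good"): Policy(24, "full"),
    (True, "poor"): Policy(8, "lite"),
    (False, "good"): Policy(0, "idle"),
    (False, "poor"): Policy(0, "idle"),
}


def bandwidth_tier(available_kbps, good_above_kbps=32):
    """Reduce the measured bandwidth to a coarse tier (assumed cut-off)."""
    return "good" if available_kbps >= good_above_kbps else "poor"


def choose_policy(speech_active, available_kbps):
    """Look up bitrate and model version for the current conditions."""
    return POLICY_TABLE[(speech_active, bandwidth_tier(available_kbps))]


# Example: speech detected on a constrained link selects the small model.
policy = choose_policy(speech_active=True, available_kbps=10)
```

A flat table keeps the decision cheap and predictable at run time, at the cost of having to choose the tiers and entries in advance.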

A medical speech recognition-based gastroscopy report generation system and method

Pending | CN122245588A | Image analysis, Biological models, Adaptive denoising, Voice activity

This invention belongs to the field of speech recognition technology. It discloses a system and method for generating gastrointestinal endoscopy reports based on medical speech recognition. The system includes a speech acquisition and processing module for acquiring real-time spoken speech from doctors during gastrointestinal endoscopy, performing adaptive noise reduction and mask occlusion compensation on the spoken speech, and outputting a speech segment sequence; a speech activity detection module for receiving the speech segment sequence, calculating the prior probability of speech activity based on the transition relationships of different operational states in the endoscopy procedure, and performing speech activity detection on the speech segment sequence based on the prior probability of speech activity to filter out speech event sequences consistent with the examination procedure; and a speech-image alignment module for transcribing the speech event sequences into medical speech, acquiring endoscopic images at corresponding times, and completing the ambiguous spoken content in the speech event sequence based on the endoscopic images to generate speech description data. This improves the automation level of examination recording.

Owner:SHANGHAI HAOKANGYUN MEDICAL TECHNOLOGY DEVELOPMENT CO LTD
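
One way to read the speech activity detection module above is as a Bayesian update: the procedure state supplies a prior probability of speech, and the acoustic evidence for a segment turns that prior into a posterior. The sketch below follows that reading. It is an assumption, not the patented method: the state names, transition probabilities, per-state speech probabilities, and likelihood values are all hypothetical.

```python
# Hypothetical procedure states with transition probabilities P(next | current).
TRANSITIONS = {
    "insertion": {"insertion": 0.6, "inspection": 0.4},
    "inspection": {"inspection": 0.7, "biopsy": 0.3},
    "biopsy": {"biopsy": 0.5, "inspection": 0.5},
}

# Hypothetical probability that the doctor is dictating in each state.
SPEECH_GIVEN_STATE = {"insertion": 0.1, "inspection": 0.6, "biopsy": 0.3}


def speech_prior(current_state):
    """Prior P(speech) for the next segment, averaged over likely next states."""
    return sum(
        prob * SPEECH_GIVEN_STATE[next_state]
        for next_state, prob in TRANSITIONS[current_state].items()
    )


def speech_posterior(prior, likelihood_speech, likelihood_noise):
    """Bayes' rule for a two-class (speech vs. noise) decision."""
    evidence = prior * likelihood_speech + (1.0 - prior) * likelihood_noise
    return prior * likelihood_speech / evidence


# Example: the same acoustic evidence is weighed by the state-dependent prior.
prior = speech_prior("insertion")              # 0.6 * 0.1 + 0.4 * 0.6
posterior = speech_posterior(prior, likelihood_speech=0.8, likelihood_noise=0.2)
```

A segment would then be kept as a speech event when the posterior exceeds a chosen cut-off, so identical audio can be accepted in one procedure state and rejected in another.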

Adaptive spatial filtering system

PCT designated stage | WO2026149993A1 | Noise, Engineering

The invention relates to a spatial filtering system (4) comprising: - at least one first spatial filter (8), each filter being arranged to attenuate noise in a predefined direction; - a second spatial filter (9), arranged to attenuate noise in an optimized direction defined in real time; - a filter selector (10), arranged to select an optimal spatial filter; - a voice activity detector (12) arranged to detect the target voice; at least one parameter of the second spatial filter (9) being recalculated when an output signal of the voice activity detector is equal to a first value representative of an absence of the target voice.

Owner:OROSOUND

Electronic device having a microphone and method for controlling the same

Pending | US20260197579A1 | Noise, Engineering

Disclosed are an electronic device and a method for controlling the same that may improve sensing and recognition of a user's voice even in ambient noise by installing an Air Conduction Microphone (ACM) and a Bone Conduction Microphone (BCM) together and selectively using at least one of them depending on the use environment. The electronic device includes a user input unit, an ACM, a BCM, and a controller. The controller is configured to execute an application, detect a voice activity of a user based on a BCM sensing signal received from the BCM, and, in response to detecting the voice activity, control a mixing signal generated by synthesizing the BCM sensing signal and an ACM sensing signal received from the ACM to be input to the application.

Owner:LG ELECTRONICS INC
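
The controller behaviour in the abstract above, where voice activity is detected on the bone-conduction signal and a mix of both microphones is passed to the application only when voice is present, can be sketched as below. The energy-based detector, the threshold, and the equal mixing weights are assumptions for illustration; the abstract does not say how the activity is detected or how the two signals are synthesized.

```python
def bcm_voice_active(bcm_frame, energy_threshold=0.01):
    """Assumed detector: mean squared BCM amplitude above a fixed threshold."""
    energy = sum(x * x for x in bcm_frame) / len(bcm_frame)
    return energy > energy_threshold


def frame_for_application(acm_frame, bcm_frame, bcm_weight=0.5):
    """Return the ACM/BCM mix when the BCM indicates speech, otherwise None."""
    if len(acm_frame) != len(bcm_frame):
        raise ValueError("frames must have the same length")
    if not bcm_voice_active(bcm_frame):
        return None
    return [
        (1.0 - bcm_weight) * a + bcm_weight * b
        for a, b in zip(acm_frame, bcm_frame)
    ]


# Hypothetical frames: ambient noise reaches only the air-conduction microphone.
silent = frame_for_application([0.2, 0.2], [0.0, 0.0])     # None
spoken = frame_for_application([0.2, -0.2], [0.4, -0.4])   # mixed frame
```

Gating on the bone-conduction signal is attractive because it picks up the wearer's own voice through vibration and is largely insensitive to airborne noise.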

Control device and method for activating a vehicle function by voice sample authentication

Pending | CN122290608A | Engineering, Voice activity

This invention relates to a control device and method for activating vehicle functions by authenticating voice samples. The control device (100) detects voice commands using at least one sensor (102) installed in the vehicle and filters voice samples from the voice commands using a voice activity recognition module (104). It examines the voice samples for recorded or imitated voice using a spoofing identification module (106), extracts a voice signature from the voice samples using a voice signature extractor (108), and authenticates the voice signature against a pre-recorded voice signature from a user login process using an authenticator (110). The control device (100) then extracts an entity and intent from the voice signature using an entity and intent extractor (112) and verifies the entity and intent simultaneously for multiple users against pre-configured access data using a verification module (114). After verification, the control device (100) executes the intent to activate the vehicle functions.

Owner:ROBERT BOSCH GMBH +1

An overlap speech separation method, system, device and medium based on characteristic space orthogonal projection

Pending | CN122157689A | Speech analysis, Biological models, Cosine similarity, Characteristic space

The application discloses an overlapping speech separation method and system based on feature space orthogonal projection, a device and a medium, and relates to the technical field of speech recognition. The method comprises the following steps: real-time detection of a collected speech sequence is performed through a pre-trained segmentation model, speech active segments and non-active segments are divided, and multi-speaker overlapping speech segments are accurately located. For the overlapping segments, the identity of the main speaker is determined and the voiceprint features thereof are extracted by using context tracking information, and a known speaker feature subspace is constructed. Then, the mixed voiceprint features of the overlapping speech are extracted, the mixed voiceprint features are decomposed into a subspace parallel component and an orthogonal vertical component through matrix projection operation, so that residual features without the main speaker information are obtained, the cosine similarity with candidate speakers is calculated, the speaker with the highest similarity is selected as the secondary speaker, and the logical separation of the overlapping speech is realized. The known speaker features are eliminated through mathematical projection operation, and unknown speakers can be accurately identified in the residual space.

Owner:TIANJIN UNIVERSITY OF TECHNOLOGY +1
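
The projection step in the abstract above can be illustrated with plain vectors. This minimal sketch assumes the known-speaker subspace is spanned by a single voiceprint vector (the abstract allows a more general subspace), and the three-dimensional embeddings and speaker labels are hypothetical toy values. It removes the component of the mixed voiceprint that is parallel to the main speaker, then picks the candidate whose voiceprint has the highest cosine similarity with the residual.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual_after_projection(mixed, main):
    """Subtract from `mixed` its component parallel to `main`."""
    scale = dot(mixed, main) / dot(main, main)
    return [m - scale * v for m, v in zip(mixed, main)]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def pick_secondary(mixed, main, candidates):
    """Name of the candidate most similar to the residual voiceprint."""
    residual = residual_after_projection(mixed, main)
    return max(candidates, key=lambda name: cosine(residual, candidates[name]))


# Hypothetical toy embeddings.
main_speaker = [1.0, 0.0, 0.0]
mixed_voiceprint = [0.8, 0.6, 0.1]
candidates = {"B": [0.1, 1.0, 0.0], "C": [0.0, 0.1, 1.0]}

residual = residual_after_projection(mixed_voiceprint, main_speaker)
secondary = pick_secondary(mixed_voiceprint, main_speaker, candidates)  # "B"
```

The residual is orthogonal to the main speaker's vector by construction, so the similarity scores are not inflated by whatever the candidates share with the main speaker. If the mixed voiceprint lies entirely in the main speaker's direction, the residual is zero and the cosine is undefined, so a real system would need to handle that case.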

System and method for automated multi-speaker and multi-lingual speech analysis

Pending | US20260178838A1 | Natural language data processing, Speech recognition, Semantic vector, Systems analysis

Exemplary systems and methods use a combination of application modules and a neural network architecture for multi-speaker and multi-language speech analysis. The exemplary system can receive a natural language input, which it decomposes into plural segments. A sub-group of the plural segments is accumulated in a buffer, where each segment represents a period during which voice activity is detected. The sub-groups are analyzed for voice activity of multiple speakers, and one or more text segments are generated based on the speakers. A semantic vector for each text segment is generated and stored in vector memory. Relevant data associated with each semantic vector is retrieved from the vector memory based on a similarity measure, and a response including specified information extracted from the one or more text segments is generated based on at least the relevant data.

Owner:ERESTECH
