271 results about "Voice activity detection" patented technology

Voice activity detection (VAD), also known as speech activity detection or speech detection, is a speech-processing technique that detects the presence or absence of human speech. Its main uses are in speech coding and speech recognition. It can facilitate speech processing and can also be used to deactivate some processes during non-speech sections of an audio session. For example, it avoids unnecessary coding and transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving computation and network bandwidth.
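
A minimal sketch of the basic idea follows. It assumes 16 kHz floating-point samples, non-overlapping 10 ms frames (160 samples) and a fixed, hand-picked energy threshold. None of these values comes from the text above, and practical detectors use adaptive thresholds and richer features.

```python
import math


def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return the mean energy of each."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(samples, frame_len=160, threshold=1e-3):
    """Return one flag per frame: True if the frame energy exceeds the (assumed) threshold."""
    return [e > threshold for e in frame_energies(samples, frame_len)]


if __name__ == "__main__":
    silence = [0.0] * 320
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
    print(energy_vad(silence + tone))  # [False, False, True, True]
```

A VoIP sender could skip encoding the frames flagged False, which is the bandwidth saving described above.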

Method and an apparatus for voice activity detection

Status: Inactive | Publication: US20120232896A1 | Tags: Easy to adapt, Fast processing, Speech recognition, Decision combination, Speech sound
A voice activity detection apparatus (1) comprising: a signal condition analyzing unit (3) which analyzes at least one signal parameter of an input signal to detect a signal condition SC of said input signal; at least two voice activity detection units (4-i) having different voice detection characteristics, wherein each voice activity detection unit (4-i) separately performs voice activity detection on said input signal to provide a voice activity detection decision VADD; and a decision combination unit (5) which combines the voice activity detection decisions VADDs provided by said voice activity detection units (4-i), depending on the detected signal condition SC, to provide a combined voice activity detection decision cVADD.
Owner:HUAWEI TECH CO LTD
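
The abstract does not say how the decisions are combined. The sketch below assumes the signal condition SC is simply a high/low SNR class. It also assumes an "any detector" rule at low SNR and a majority rule at high SNR. Both rules and all names are illustrative, not the patented method.

```python
from enum import Enum


class SignalCondition(Enum):
    """Hypothetical signal conditions SC derived from an SNR estimate."""
    HIGH_SNR = "high_snr"
    LOW_SNR = "low_snr"


def classify_condition(snr_db, snr_threshold_db=15.0):
    """Hypothetical signal condition analyzer: threshold an SNR estimate in dB."""
    if snr_db >= snr_threshold_db:
        return SignalCondition.HIGH_SNR
    return SignalCondition.LOW_SNR


def combine_decisions(vadds, condition):
    """Combine per-detector decisions VADD into a combined decision cVADD.

    Assumed rule: at low SNR report speech if any detector fires (avoid clipping
    weak speech); at high SNR require a strict majority (avoid false alarms).
    """
    if condition is SignalCondition.LOW_SNR:
        return any(vadds)
    return sum(vadds) * 2 > len(vadds)


if __name__ == "__main__":
    vadds = [True, False, False]  # e.g. decisions from three differently tuned detectors
    print(combine_decisions(vadds, classify_condition(5.0)))   # True
    print(combine_decisions(vadds, classify_condition(25.0)))  # False
```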

Method and apparatus for comfort noise generation in speech communication systems

A method for generating comfort noise, which may be used in a variety of electronic devices, includes receiving (705) a plurality of information frames indicative of speech plus background noise, estimating (710) one or more background noise characteristics based on the plurality of information frames, and generating a comfort noise signal (715) based on the one or more background noise characteristics. The method may further include generating a speech signal (720) from the plurality of information frames, and generating an output signal (725) by switching between the comfort noise signal and the speech signal based on a voice activity detection.
Owner:GOOGLE TECH HLDG LLC
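
The switching idea can be sketched minimally as below. The sketch assumes per-frame VAD flags are already available. It generates white Gaussian comfort noise matched only in RMS level, whereas real codecs usually also match the noise spectrum. The function names are hypothetical.

```python
import math
import random


def estimate_noise_rms(frames, vad_flags):
    """Estimate the background-noise RMS from frames the VAD marked as non-speech."""
    noise = [x for frame, speech in zip(frames, vad_flags) if not speech for x in frame]
    if not noise:
        return 0.0
    return math.sqrt(sum(x * x for x in noise) / len(noise))


def comfort_noise(length, rms, rng):
    """Generate white Gaussian noise at the requested RMS level."""
    return [rng.gauss(0.0, rms) for _ in range(length)]


def output_frames(frames, vad_flags, seed=0):
    """Pass speech frames through; replace non-speech frames with comfort noise."""
    rng = random.Random(seed)
    rms = estimate_noise_rms(frames, vad_flags)
    return [list(frame) if speech else comfort_noise(len(frame), rms, rng)
            for frame, speech in zip(frames, vad_flags)]
```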

Acoustic echo devices and methods

Hands-free phones with voice activity detection that compares a frame power estimate with an adaptive frame noise power estimate, automatic gain control with fast adaptation and minimal speech distortion, echo cancellation updated in the frequency domain with step-size optimization and smoothed spectral whitening, and echo suppression with adaptive talking-state transitions.
Owner:TEXAS INSTR INC
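
The abstract names the comparison but not the noise-update rule. A minimal sketch follows. It assumes frame power estimates are already computed and uses a simple minimum-tracking noise estimate with a fixed speech-to-noise ratio threshold. The parameters `ratio` and `rise` are illustrative assumptions.

```python
def adaptive_vad(frame_powers, ratio=4.0, rise=1.01, floor=1e-10):
    """Flag frames whose power exceeds `ratio` times an adaptive noise power estimate.

    Hypothetical update rule: the noise estimate drops immediately to any lower
    frame power and otherwise creeps upward by a factor `rise` per frame, so
    short speech bursts barely raise it while a louder steady background is
    eventually tracked.
    """
    decisions = []
    noise = max(frame_powers[0], floor) if frame_powers else floor
    for power in frame_powers:
        decisions.append(power > ratio * noise)
        if power < noise:
            noise = max(power, floor)
        else:
            noise = noise * rise
    return decisions
```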

System and method for winding audio content using a voice activity detection algorithm

A system and method for locating a preferable playback start location after a winding or rewinding action in an audio playing device. In response to an adjustment of the playing location for audio content to a desired playing position, the system determines whether at least one non-speech or silent period of at least a predetermined duration exists within the vicinity of the desired playing position. If at least one such non-speech or silent period exists within the vicinity of the desired playing position, the system adjusts the playing position to fall within one of the at least one non-speech period or silent period.
Owner:WSOU INVESTMENTS LLC
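
A minimal sketch of the repositioning step follows. It assumes per-frame VAD flags (True = speech) are already available and measures both the predetermined duration and the search vicinity in frames. The function names and the nearest-frame rule are assumptions.

```python
def silent_runs(vad_flags, min_len):
    """Return (start, end) frame ranges (end exclusive) of non-speech runs of at least min_len."""
    runs, start = [], None
    for i, speech in enumerate(list(vad_flags) + [True]):  # sentinel closes a trailing run
        if not speech and start is None:
            start = i
        elif speech and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


def adjust_start(vad_flags, desired, min_len, window):
    """Snap the desired start frame into the closest qualifying silent run within +/- window frames."""
    best, best_dist = desired, None
    for start, end in silent_runs(vad_flags, min_len):
        candidate = min(max(desired, start), end - 1)  # frame of this run closest to desired
        dist = abs(candidate - desired)
        if dist <= window and (best_dist is None or dist < best_dist):
            best, best_dist = candidate, dist
    return best
```

If no qualifying silent run lies within the window, the desired position is returned unchanged.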

Device And Method For Voice Activity Detection

A device includes a sound signal analyser configured to determine whether a sound signal comprises speech. The device further includes a microphone system configured to discriminate sounds emanating from sources located in different directions from the microphone system, so that only sounds emanating from a certain range of directions are included as signals possibly containing speech.
Owner:SONY CORP

Echo cancellation in telephones with multiple microphones

The present invention is directed to a telephone equipped with multiple microphones that provides improved performance when the telephone operates in speaker-phone mode. For example, the multiple microphones can be used to improve voice activity detection, which, in turn, can improve echo cancellation. In addition, the multiple microphones can be configured as an adaptive microphone array and used to reduce the effects of (i) room reverberation when a near-end user is speaking, and/or (ii) acoustic echo when a far-end user is speaking.
Owner:AVAGO TECH WIRELESS IP SINGAPORE PTE

Method for adaptively adjusting sound effect and equipment thereof

Status: Active | Publication: CN102436821A | Tags: Reduce the impact on use, Improve experience, Speech analysis, Environmental noise, Engineering
A method for adaptively adjusting a sound effect, and equipment thereof, are disclosed. The method comprises the following steps: acquiring an energy value of the current environmental noise; receiving a first trigger instruction and adjusting the current output volume according to the energy value of the current environmental noise; when the energy value of the current environmental noise is larger than a first threshold, carrying out treble enhancement processing; and when the energy value of the current environmental noise is less than a second threshold, carrying out bass enhancement processing. In the method, sound data are collected and voice activity detection is performed on them. When the first trigger instruction is received, the current output volume can be adjusted according to the current environmental noise energy value, and the frequency response can be adjusted through treble or bass enhancement. A better sound effect can thus be obtained, and the method is easy to implement.
Owner:HYTERA COMM CORP
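
The threshold logic can be sketched as follows. The sketch assumes a linear volume-versus-noise mapping capped at `max_volume` and assumes the first threshold is not below the second. Both the mapping and the parameter names are illustrative, not taken from the patent.

```python
def adjust_sound_effect(noise_energy, base_volume, first_threshold, second_threshold,
                        gain_per_unit=0.1, max_volume=1.0):
    """Return (output_volume, enhancement) for the current environmental noise energy.

    Assumptions: volume grows linearly with noise energy up to max_volume, and
    first_threshold >= second_threshold.
    """
    volume = min(max_volume, base_volume + gain_per_unit * noise_energy)
    if noise_energy > first_threshold:
        enhancement = "treble"  # noisy surroundings: boost high frequencies
    elif noise_energy < second_threshold:
        enhancement = "bass"  # quiet surroundings: boost low frequencies
    else:
        enhancement = None
    return volume, enhancement
```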

Voice activity detection and wake-up method and device

The invention provides a voice activity detection and wake-up method and device, and relates to the technical field of machine learning speech recognition. The method includes the steps of acquiring voice activity detection data and wake-up data, and performing Fbank feature extraction on the voice activity detection data and wake-up data to obtain voice Fbank feature data; inputting the voice Fbank feature data to a binary neural network model to obtain binarized neural network output result data; and according to a preset backend evaluation strategy, processing the binarized neural network output result data, determining a voice start position and a voice end position of the voice activity detection data, and detecting wake-up word data in the wake-up data. The system framework of the invention can be applied to voice activity detection and voice wake-up technologies at the same time, and can implement accurate, fast, low-delay, small-model and low-power voice activity detection technologies and voice wake-up technologies.
Owner:TSINGHUA UNIV

Multi-band structure self-adaptive filter switching method for AEC (acoustic echo cancellation)

Status: Active | Publication: CN106782593A | Tags: Achieving Convergence Speed Advantage, Overcome speed, Speech analysis, Multi band, Adaptive filter
The invention discloses a multi-band structure self-adaptive filter switching method for AEC (acoustic echo cancellation). First, a far-end voice signal is acquired; voice endpoint detection is performed, and a VAD (voice activity detection) flag bit and an improved envelope decision threshold are output; the voice signal is fed to a loudspeaker to serve as the desired signal and is also input into a self-adaptive filter. The self-adaptive filter adopts a switchable multi-band structure and a corresponding self-adaptive algorithm; the filter parameters are adjusted according to feedback information using the least mean square criterion, and the optimal solution is obtained. The switching method takes voice characteristics fully into account while guaranteeing the steady-state misadjustment, and it optimizes the trade-off between convergence rate and algorithm complexity while exploiting the algorithm's convergence-rate advantage. In practical echo cancellation, a single algorithm cannot easily meet varied and changing demands; the switching algorithm gives users more options and is of great significance for self-adaptive echo cancellation.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
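
The switchable multi-band structure is not described in enough detail to reproduce, but the least-mean-square adaptation it builds on can be sketched. The example below is a plain full-band normalized LMS (NLMS) echo canceller. This is a swapped-in simplification, not the patented multi-band method. The tap count, the step size `mu` and the demo echo path are arbitrary assumptions.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Full-band NLMS: adapt an FIR echo-path estimate and return (error, weights).

    far_end: loudspeaker (reference) samples; mic: microphone samples containing echo.
    The error signal e = mic - estimated echo is the echo-reduced output.
    """
    w = [0.0] * taps
    history = [0.0] * taps  # most recent far-end sample first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        y = sum(wi * xi for wi, xi in zip(w, history))  # echo estimate
        e = d - y
        norm = eps + sum(xi * xi for xi in history)  # input power normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        errors.append(e)
    return errors, w


if __name__ == "__main__":
    rng = random.Random(1)
    far = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
    echo_path = [0.5, -0.3, 0.2]  # hypothetical room impulse response
    mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n >= k)
           for n in range(len(far))]
    err, weights = nlms_echo_canceller(far, mic)
    print([round(v, 3) for v in weights[:3]])  # approximately the echo path
```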

Signal presence detection using bi-directional communication data

A system and method for using bi-directional conversation data to improve signal presence detection are disclosed. A detector module is adapted to communicate with a signal enhancement module. The detector module collects data from both the transmit direction and the receive direction of a data connection. The collected data from both directions are used to classify the data in at least one of the two directions. Responsive to the classification, the signal enhancement module enhances data in either the transmit direction or the receive direction. Using data from both directions thus improves data classification accuracy. In one embodiment, the detector module applies a voice activity detection (VAD) process to detect the presence or absence of voice data in the collected data.
Owner:DITECH NETWORKS

Method, device and electronic equipment for voice activity detection

Status: Active | Publication: CN102044242A | Tags: Capable of self-adaptive adjustment, Improve the performance of voice activation detection, Speech analysis, Time domain, Operation mode
The embodiment of the invention discloses a method, device and electronic equipment for voice activity detection. The method comprises the following steps: acquiring time-domain classification parameters and frequency-domain classification parameters from audio frames; acquiring first distances between the time-domain classification parameters and their long-term moving average over historical background noise frames; acquiring second distances between the frequency-domain classification parameters and their long-term moving average over historical background noise frames; and determining whether the audio frames are foreground voice frames or background noise frames according to the first distances, the second distances, and a group of decision polynomials based on the first and second distances, wherein at least one coefficient in the decision polynomial group is a variable that can change with the operating mode of voice activity detection or with the characteristics of the input signal. This technical scheme gives the decision criterion an adaptive adjustment capability, thereby improving voice activity detection performance.
Owner:HUAWEI TECH CO LTD
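
A minimal sketch of the decision rule follows. It assumes the per-frame time-domain and frequency-domain parameters are already computed and that the first few frames are background noise. It also uses a hypothetical polynomial a*d1 + b*d2 + c*d1*d2 compared with a threshold. Passing different `coeffs` mimics coefficients that change with the operating mode.

```python
def classify_frames(params, coeffs=(1.0, 1.0, 0.0), threshold=3.0, alpha=0.95, init_frames=5):
    """Label each frame as speech (True) or background noise (False).

    params: list of (time_param, freq_param) tuples, one per frame (hypothetical
    features, e.g. log energy and a spectral measure).
    coeffs: (a, b, c) of the hypothetical decision polynomial a*d1 + b*d2 + c*d1*d2.
    The first `init_frames` frames are assumed to be background noise.
    """
    if not params:
        return []
    a, b, c = coeffs
    n0 = max(1, min(init_frames, len(params)))
    mean_t = sum(t for t, _ in params[:n0]) / n0  # long-term averages over noise frames
    mean_f = sum(f for _, f in params[:n0]) / n0
    labels = []
    for t, f in params:
        d1 = abs(t - mean_t)  # first distance (time domain)
        d2 = abs(f - mean_f)  # second distance (frequency domain)
        is_speech = a * d1 + b * d2 + c * d1 * d2 > threshold
        labels.append(is_speech)
        if not is_speech:
            # only background-noise frames update the long-term moving averages
            mean_t = alpha * mean_t + (1.0 - alpha) * t
            mean_f = alpha * mean_f + (1.0 - alpha) * f
    return labels
```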

Method and apparatus to facilitate voice activity detection and coexistence manager decisions

A system and method to facilitate voice activity detection and coexistence manager decisions are provided, which include identifying a connection that uses a first resource and a content stream corresponding to the connection, where the first resource conflicts with a second resource. The content of the content stream is classified into multiple levels based on a value of the content, and a priority is then assigned to the first and second resources based on the level of the content of the first resource.
Owner:QUALCOMM INC

Single-channel voice enhancement method and system

The invention provides a single-channel voice enhancement method and system. The method comprises the following steps: extracting a noise signal from a noisy voice signal through voice activity detection; performing outer-ear, middle-ear and inner-ear simulation on the noisy voice signal and on the noise signal through peripheral analysis; obtaining, through feature extraction, the energy difference between the simulated noisy voice signal and the simulated noise signal in each time-frequency unit; generating masking values from the energy difference of each time-frequency unit and weighting them to obtain a masked signal; and reconstructing the voice signal from the masked signal and the simulated noisy voice signal to obtain an enhanced voice signal. The invention reduces damage to the target voice signal, achieves better denoising, and maintains higher voice quality in environments with multiple noise sources.
Owner:WUXI RES INST OF APPLIED TECH TSINGHUA UNIV +1

Noise suppressing multi-microphone headset

A new type of headset that employs adaptive noise suppression, multiple microphones, a voice activity detection (VAD) device, and unique mechanisms to position it correctly on either ear for use with phones, computers, and wired or wireless connections of any kind is described. In various embodiments, the headset employs combinations of new technologies and mechanisms to provide the user a unique communications experience.
Owner:JAWBONE INNOVATIONS LLC

Method and system for speech processing for enhancement and detection

A method for discriminating noise from signal in a noise-contaminated signal involves decomposing a frame of samples of the signal into decorrelated components, and using a difference between probability distributions of the noise contributions and the signal contributions to identify signal and noise. A Gaussian distribution is used to determine whether the components are only noise whereas a Laplacian distribution is used to determine whether the components contain the signal. Such discrimination may be used in speech enhancement or voice activity detection apparatus.
Owner:RPX CLEARINGHOUSE
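
One way to realize the Gaussian-versus-Laplacian comparison is a likelihood test on the decorrelated components of a frame. The sketch below assumes the decomposition (for example, a DCT or KLT) has already been done. It fits each zero-mean model by maximum likelihood and declares "signal" when the Laplacian fits better. The function names and this exact test are illustrative assumptions.

```python
import math


def gaussian_loglik(c):
    """Log-likelihood of zero-mean Gaussian components with ML-fitted variance."""
    n = len(c)
    var = sum(x * x for x in c) / n + 1e-12  # small constant avoids log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def laplacian_loglik(c):
    """Log-likelihood of zero-mean Laplacian components with ML-fitted scale."""
    n = len(c)
    b = sum(abs(x) for x in c) / n + 1e-12
    return -n * (math.log(2 * b) + 1.0)


def contains_signal(c):
    """True if the heavy-tailed (speech-like) Laplacian model explains the components better."""
    return laplacian_loglik(c) > gaussian_loglik(c)
```

Sparse, heavy-tailed components (typical of speech) favor the Laplacian model, while components of similar magnitude (closer to Gaussian noise) favor the Gaussian model.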

Voice activity detection method in complex background noise

Status: Active | Publication: CN102194452A | Tags: Differentiate voice, Distinguish background noise, Speech analysis, Background noise, Speech sound
The invention discloses a voice activity detection method for complex background noise. The method comprises the following steps in sequence: (1) applying the TEO (Teager Energy Operator) to the data; (2) pre-emphasizing the input data x(n); (3) band-pass filtering; (4) framing and windowing; (5) calculating the evolution value of the autocorrelation of each frame and its standard deviation; (6) calculating Stat_i for the first 20 frames, together with its mean mean(Stat_i) and standard deviation std(Stat_i), and comparing std(Stat_i) with a preset threshold to judge whether voice is present; (7) processing the subsequent data; (8) calculating Stat_i over FrameN consecutive frames and making a secondary decision according to mean(Stat_i) and std(Stat_i); (9) taking the minimum speech interval Speech_min as 100-200 ms and the minimum silence duration Silence_min as 500-1,000 ms, judging that voice has started when Status_final is 0 and status equals 1 for Ns consecutive frames (Ns depends on FrameN), and judging that voice has ended when Status_final is 1 and status equals 0 for NE consecutive frames (NE also depends on FrameN), thereby determining the actual end points of the voice.
Owner:XI'AN FENGHUO ELECTRONIC TECHNOLOGY CO LTD (西安烽火电子科技有限责任公司)
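
Two of the steps above are easy to sketch: the discrete Teager energy operator of step (1), and the start/end hysteresis of step (9). In the sketch below, the per-frame 0/1 status is assumed to be already computed. The parameters `ns` and `ne` stand in for Ns and NE, whose derivation from FrameN and the minimum durations is not given.

```python
def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1] (interior samples only)."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def endpoints(status, ns, ne):
    """Turn per-frame 0/1 decisions into (start, end) speech segments with hysteresis.

    Speech starts after ns consecutive 1s while in silence and ends after ne
    consecutive 0s while in speech; end indices are exclusive.
    """
    segments, in_speech, run, start = [], False, 0, 0
    for i, s in enumerate(status):
        if not in_speech:
            run = run + 1 if s == 1 else 0
            if run >= ns:
                in_speech, start, run = True, i - ns + 1, 0
        else:
            run = run + 1 if s == 0 else 0
            if run >= ne:
                segments.append((start, i - ne + 1))
                in_speech, run = False, 0
    if in_speech:  # close a segment still open at the end of the data
        segments.append((start, len(status)))
    return segments
```

For a pure tone A*cos(Omega*n), the operator returns the constant A^2 * sin^2(Omega), which is why it tracks both amplitude and frequency.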

Activity detection by joint human and object detection and tracking

A computing device includes a communication interface, a memory, and processing circuitry. The processing circuitry is coupled to the communication interface and to the memory and is configured to execute the operational instructions to perform various functions. The computing device is configured to process a video frame of a video segment on a per-frame basis and based on joint human-object interactive activity (HOIA) to generate a per-frame pairwise human-object interactive (HOI) feature based on a plurality of candidate HOI pairs. The computing device is also configured to process the per-frame pairwise HOI feature to identify a valid HOI pair among the plurality of candidate HOI pairs and to track the valid HOI pair through subsequent frames of the video segment to generate a contextual spatial-temporal feature for the valid HOI pair to be used in activity detection.
Owner:FUTUREWEI TECH INC

System and method for reducing VoIP (voice over internet protocol) communication resource overhead

The invention discloses a system for reducing VoIP (voice over internet protocol) communication resource overhead, comprising an input layer, a convolution layer, a sub-sampling layer and an output layer, each layer consisting of feature maps and each feature map containing neurons. A method of using the system to reduce VoIP communication resource overhead specifically includes: (1) training a convolutional neural network; (2) initializing the convolutional neural network; (3) inputting the speech to be detected into a VAD (voice activity detection) system; (4) extracting, frame by frame, the MFCC voice feature parameters and their first-order difference parameters; (5) composing the parameters of each frame into a one-dimensional feature map that is fed into the convolutional neural network system; (6) the convolutional neural network system outputting, in order, a result [x, y] for each frame of the speech to be detected, and the VAD system making a judgment and recording the results. Using the convolutional neural network in the VAD system reduces the VAD misjudgment rate, saves computation time and bandwidth, and reduces VoIP voice resource overhead while maintaining communication quality.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Voice acquiring method and device adopting a plurality of microphones

The invention provides a voice acquisition method and device using a plurality of microphones. The method comprises the following steps: acquiring voice with the plurality of microphones, each microphone corresponding to a different voice acquisition channel, to obtain the voice signal of each channel; performing analog-to-digital conversion on the voice signals to obtain digital voice signals; framing the PCM binary data of the digital voice signals to obtain a short-time stationary audio signal for each frame of PCM binary data; performing voice activity detection on the short-time stationary audio signals frame by frame, to determine whether each frame is a voice frame or a non-voice frame; performing voice quality detection on fragment audio files corresponding to the voice frames, using a preset frame number as the step size, and saving the fragment audio files of qualified quality; and splicing the saved qualified fragment audio files to synthesize a complete audio file.
Owner:SPEAKIN TECH CO LTD
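
A minimal single-channel sketch of the framing, selection and splicing steps follows. It assumes per-frame VAD flags are already available. A caller-supplied quality check, `is_good`, stands in for the unspecified voice quality detection. The function names are hypothetical.

```python
def frame_signal(samples, frame_len):
    """Split PCM samples into non-overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]


def assemble(frames, vad_flags, step, is_good):
    """Keep voice frames, group them `step` frames at a time, keep the fragments
    that pass `is_good`, and splice the survivors into one sample list."""
    voice = [f for f, v in zip(frames, vad_flags) if v]
    output = []
    for i in range(0, len(voice), step):
        fragment = [s for f in voice[i:i + step] for s in f]
        if is_good(fragment):
            output.extend(fragment)
    return output
```

For 16-bit PCM, one plausible (assumed) quality check is rejecting clipped fragments, e.g. `lambda frag: max(abs(s) for s in frag) < 32767`.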

Voice activity detection apparatus and method

A voice activity detection method comprising the steps of: (a) estimating, in a noise power estimator, the noise power within a signal having a speech component and a noise component; and (b) calculating a likelihood ratio for the presence of speech in the signal from the noise power estimated in step (a) and a complex Gaussian statistical model.
Owner:KK TOSHIBA
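
A minimal sketch of step (b) follows. It assumes per-bin power spectra and the noise power estimates from step (a) are given. It uses the standard complex-Gaussian log-likelihood ratio per frequency bin, log L_k = gamma_k * xi_k / (1 + xi_k) - ln(1 + xi_k), where gamma_k is the a posteriori SNR. Estimating the a priori SNR xi_k as max(gamma_k - 1, 0) and the decision threshold are assumptions, not details from the abstract.

```python
import math


def speech_present(power_spectrum, noise_power, threshold=0.15):
    """Return (decision, mean log-likelihood ratio) for one frame.

    power_spectrum: |X_k|^2 per frequency bin.
    noise_power: estimated noise power per bin (the output of step (a)).
    threshold: assumed decision threshold on the mean log-likelihood ratio.
    """
    llrs = []
    for p, n in zip(power_spectrum, noise_power):
        gamma = p / max(n, 1e-12)  # a posteriori SNR
        xi = max(gamma - 1.0, 0.0)  # assumed maximum-likelihood a priori SNR estimate
        llrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    if not llrs:
        return False, 0.0
    mean_llr = sum(llrs) / len(llrs)
    return mean_llr > threshold, mean_llr
```

With this a priori SNR estimate, each bin's term reduces to gamma - 1 - ln(gamma) when gamma > 1 and to 0 otherwise, so bins at or below the noise level contribute nothing.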

Audio signal segmentation algorithm

The present invention discloses an audio signal segmentation algorithm comprising the following steps. First, an audio signal is provided. Then, an audio activity detection (AAD) step is applied to divide the audio signal into at least one noise segment and at least one noisy audio segment. Then, an audio feature extraction step is used on the noisy audio segment to obtain multiple audio features. Then, a smoothing step is applied. Then, multiple speech frames and multiple music frames are discriminated. The speech frames and the music frames compose at least one speech segment and at least one music segment. Finally, the speech segment and the music segment are segmented from the noisy audio segment.
Owner:NAT CHENG KUNG UNIV

Echo reduction system

The present invention relates to a method for reducing an echo in a microphone signal generated by a microphone, comprising echo compensating the microphone signal by subtracting an estimated echo signal from the microphone signal to generate an echo compensated signal, detecting a speech activity of a local speaker on the basis of the microphone signal and the estimated echo signal and suppressing a residual echo in the echo compensated signal on the basis of the detected speech activity to obtain an output signal. The invention further relates to a system for processing a microphone signal generated by a microphone, comprising echo compensation filtering means configured to receive and echo compensate the microphone signal to output an echo compensated signal based on the received microphone signal, a speech activity detection means configured to detect speech activity of a local speaker by receiving and analyzing the echo compensated signal and to output a detection signal and a residual echo suppressing means configured to receive the detection signal and to receive and filter the echo compensated signal on the basis of the detection signal to output an output signal.
Owner:CERENCE OPERATING CO

Intelligent voice mixing method and device for multi-party voice communication

The invention discloses an intelligent voice mixing method and device for multi-party voice communication, belonging to the technical field of multimedia. The method comprises the following steps: during voice communication, obtaining the current frame data of all active voice channels except the local terminal; obtaining the voice activity detection results of the current frame data of all active voice channels and the short-time average energy of all active voice channels; selecting the voice channels for mixing according to these voice activity detection results, the short-time average energy of the active voice channels, the number of voice channels carrying valid voice, and the gating identifiers corresponding to the active voice channels; and performing superposition mixing on the current frame data of the selected voice channels and outputting the resulting mixed data. The method and device reduce the noise generated in multi-party voice communication, improve voice clarity, and improve the execution efficiency of multi-party voice communication.
Owner:GUANGZHOU HUADUO NETWORK TECH
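
A simplified sketch of the channel selection and superposition mixing follows. It assumes 16-bit PCM frames and per-channel VAD flags. A fixed cap, `max_channels`, replaces the gating identifiers and the valid-voice channel count, which the abstract does not detail. The function names are hypothetical.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def mix_frames(channel_frames, vad_flags, max_channels=3):
    """Mix the loudest voice-active channels for one frame period.

    channel_frames: dict channel_id -> list of int16 samples (local terminal excluded).
    vad_flags: dict channel_id -> VAD result for this frame.
    Returns (selected_channel_ids, mixed_samples).
    """
    active = [cid for cid in channel_frames if vad_flags.get(cid)]
    active.sort(key=lambda cid: short_time_energy(channel_frames[cid]), reverse=True)
    selected = active[:max_channels]
    if not selected:
        return [], []
    length = min(len(channel_frames[cid]) for cid in selected)
    mixed = []
    for n in range(length):
        total = sum(channel_frames[cid][n] for cid in selected)
        mixed.append(max(-32768, min(32767, total)))  # saturate to the int16 range
    return selected, mixed
```

Saturation is the simplest overflow guard for superposition mixing; it can distort when several loud talkers overlap, which is one reason to cap the number of mixed channels.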

Voiceprint identification method and system

Status: Inactive | Publication: CN108766445A | Tags: Sharp angular boundaries, Improve accuracy, Speech analysis, Pattern recognition, Network model
The invention provides a voiceprint identification method and system. The method comprises the following steps: extracting features of voiced frames in a training corpus through VAD (voice activity detection); training a neural network model with the A-softmax loss function, which enlarges the inter-class angular margins of the voiced-frame features and constrains their intra-class angles; determining deep voiceprint features of a target to be registered according to the trained neural network model, and registering the target and its deep voiceprint features in a voiceprint database; determining the deep voiceprint features of the to-be-registered target according to the trained neural network model; and performing identification according to the similarity between each deep voiceprint feature in the voiceprint database and the deep voiceprint feature of the to-be-registered target. The invention further provides a voiceprint identification system. An advantage of the method is that the A-softmax loss function constrains the intra-class angle, so that the embedding vectors of different classes are separated by clear angular margins, which improves discriminability and identification accuracy.
Owner:AISPEECH CO LTD