184 results about "Pitch period" patented technology

The pitch period is the fundamental period of a speech signal, that is, the reciprocal of its fundamental frequency. Pitch itself is the perceptual correlate of fundamental frequency: it reflects the vibration frequency of the vocal cords during the production of voiced sounds such as vowels. In practice, pitch is generally equated with the fundamental frequency of the signal.
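
As a rough illustration of the definition above, the following Python sketch estimates the pitch period of a synthetic tone by picking the lag with the highest normalized autocorrelation. The function name `estimate_pitch_period`, the 8 kHz sample rate, and the lag bounds are assumptions chosen for the example; they do not come from any of the patents listed here.

```python
import math

def estimate_pitch_period(signal, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a, b = signal[:-lag], signal[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical test tone: 200 Hz sampled at 8 kHz, so one period spans 40 samples.
sample_rate = 8000
f0 = 200.0
signal = [math.sin(2 * math.pi * f0 * n / sample_rate) for n in range(400)]

period = estimate_pitch_period(signal, min_lag=20, max_lag=60)
print(period, sample_rate / period)  # pitch period in samples, f0 estimate in Hz
```

Dividing the sample rate by the returned lag gives the fundamental frequency estimate. The lag bounds matter: a range that also contains twice the true period would admit an octave error.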

System and method for providing high-quality stretching and compression of a digital audio signal

An adaptive “temporal audio scaler” is provided for automatically stretching and compressing frames of audio signals received across a packet-based network. Prior to stretching or compressing segments of a current frame, the temporal audio scaler first computes a pitch period for each frame for sizing signal templates used for matching operations in stretching and compressing segments. Further, the temporal audio scaler also determines the type or types of segments comprising each frame. These segment types include “voiced” segments, “unvoiced” segments, and “mixed” segments which include both voiced and unvoiced portions. The stretching or compression methods applied to segments of each frame are then dependent upon the type of segments comprising each frame. Further, the amount of stretching and compression applied to particular segments is automatically variable for minimizing signal artifacts while still ensuring that an overall target stretching or compression ratio is maintained for each frame.
Owner:MICROSOFT TECH LICENSING LLC

Decimated Bisectional Pitch Refinement

A method and system for refining a pitch period estimate starting from a coarse pitch, useful for performing frame loss concealment in an audio decoder as well as for other applications. A normalized correlation at the coarse pitch lag is computed and used as the current best candidate. The normalized correlation is then evaluated at the midpoint of the refinement pitch range on either side of the current best candidate. If the normalized correlation at either midpoint is greater than that at the current best lag, the midpoint with the maximum correlation is selected as the current best lag. After each iteration, the refinement range is decreased by a factor of two and centered on the current best lag. This bisectional search continues until the pitch has been refined to an acceptable tolerance or until the refinement range has been exhausted. During each step of the bisectional pitch refinement, the signal is decimated to reduce the complexity of computing the normalized correlation.
Owner:AVAGO TECH INT SALES PTE LTD
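
The bisectional search described in this abstract can be sketched in Python as follows. This is only a minimal illustration: the decimation step is omitted, the stopping tolerance is fixed at one sample, and the names `normalized_correlation` and `bisectional_refine` are assumptions rather than terms from the patent.

```python
import math

def normalized_correlation(signal, lag):
    """Normalized correlation between the signal and itself delayed by `lag`."""
    a, b = signal[:-lag], signal[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def bisectional_refine(signal, coarse_lag, refinement_range):
    """Refine a coarse pitch lag by repeatedly probing both sides of the best lag."""
    best_lag = coarse_lag
    best_corr = normalized_correlation(signal, best_lag)
    step = refinement_range // 2
    while step >= 1:  # stop once the range is exhausted (one-sample tolerance)
        center = best_lag
        for candidate in (center - step, center + step):
            if 1 <= candidate < len(signal):
                corr = normalized_correlation(signal, candidate)
                if corr > best_corr:
                    best_lag, best_corr = candidate, corr
        step //= 2  # halve the range, now centered on the current best lag
    return best_lag

# Hypothetical input: a sine with a true period of 40 samples and a coarse guess of 37.
signal = [math.sin(2 * math.pi * n / 40) for n in range(400)]
refined = bisectional_refine(signal, coarse_lag=37, refinement_range=16)
print(refined)
```

With a range of 16 samples, the search evaluates at most two candidates per halving step instead of every lag in the range, which is where the complexity saving comes from.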

Refinement of pitch detection

Successive pitch periods / frequencies are accurately determined in an audio equivalent signal. Using a suitable conventional pitch detection technique, an initial value of the pitch frequency / period is determined for so-called pitch detection segments of the audio equivalent signal. Based on the determined initial value, a refined value of the pitch frequency / period is determined. To this end, the signal is divided into a sequence of pitch refinement segments. Each pitch refinement segment is associated with at least one of the pitch detection segments. The pitch refinement segments are filtered to extract a frequency component with a frequency substantially corresponding to an initially determined pitch frequency of an associated pitch detection segment. The successive pitch periods / frequencies are determined in the filtered signal.
Owner:NXP BV +1

Voice-activity detection using energy ratios and periodicity

A voice activity detector (100) filters (204) out noise energy and then computes a high-frequency (2400 Hz to 4000 Hz) versus low-frequency (100 Hz to 2400 Hz) signal energy ratio (224), total voiceband (100 Hz to 4000 Hz) signal energy (214), and signal periodicity (208) on successive frames of signal samples. Signal periodicity is determined by estimating the pitch period (206) of the signal, determining a gain value of the signal over the pitch period as a function of the estimated pitch period, and estimating a periodicity of the signal over the pitch period as a function of the estimated pitch period and the gain value. Voice is detected (230–232) in a segment if either (a) the difference between the average high-frequency versus low-frequency signal energy ratio and the present segment's high-frequency versus low-frequency energy ratio either exceeds (310) a high threshold value or is exceeded (312) by a low threshold value, or (b) the average periodicity of the signal is lower (306) than a low threshold value, or (c) the difference between the average total signal energy and the present segment's total energy exceeds (304) a threshold value and the average periodicity of the signal is lower (304) than a high threshold value, or (d) the average total signal energy exceeds (412) a minimum average total signal energy by a threshold value and voice has been detected (410) in the preceding segment.
Owner:AVAYA INC

Speech Coding System to Improve Packet Loss Concealment

A method of significantly reducing error propagation due to voice packet loss, while still greatly profiting from long-term pitch prediction, is achieved by adaptively limiting the maximum value of the pitch gain for the first pitch cycle within one frame. In a speech coding system for encoding a speech signal, a plurality of speech frames are classified into a plurality of classes depending on whether the first pitch cycle is included in one subframe or in several subframes. The pitch gain is set to a value significantly smaller than 1 for the subframes covering the first pitch cycle, and the pitch gain reduction is compensated by increasing the coded excitation codebook size or by adding one more stage of excitation for the subframes covering the first pitch cycle.
Owner:HUAWEI TECH CO LTD

Lost frame compensating method, audio encoding apparatus and audio decoding apparatus

A frame loss compensating method in which, even when an audio codec that relies on past sound source information, such as an adaptive codebook, is used as a main layer, the degradation in quality of the decoded audio of a lost frame and the following frames is small. In this method, it is assumed that a pitch period ‘T’ and a pitch gain ‘g’ have been obtained as encoded information of a current frame. The sound source information of a preceding frame is expressed by use of a single pulse, and a pulse position ‘b’ and a pulse amplitude ‘a’ are used as encoded information for compensation. The encoded sound source signal is then a vector containing a pulse of amplitude ‘a’ at a position that precedes the front position of the current frame by ‘b’. This vector is used as the content of the adaptive codebook, so that a vector containing a pulse of amplitude (g×a) at position (T−b) of the current frame can be used as the adaptive codebook vector of the current frame. This vector is used to synthesize a decoded signal. The pulse position ‘b’ and pulse amplitude ‘a’ are then decided such that the difference between the synthesized signal and the input signal is minimized.
Owner:PANASONIC CORP
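
The single-pulse construction in this abstract reduces to simple index arithmetic, sketched below in Python. The function name and the example values are hypothetical; the sketch only shows why a pulse placed ‘b’ samples before the frame start reappears at position T−b with amplitude g×a once a pitch lag of T is applied.

```python
def adaptive_codebook_vector(frame_len, pitch_period, pitch_gain, pulse_pos, pulse_amp):
    """Build the current-frame vector implied by one pulse in the previous frame.

    The pulse sits `pulse_pos` samples before the frame start (index -pulse_pos).
    Pitch prediction with lag `pitch_period` copies it to index
    pitch_period - pulse_pos, scaled by `pitch_gain`.
    """
    index = pitch_period - pulse_pos
    if not 0 <= index < frame_len:
        raise ValueError("predicted pulse falls outside the current frame")
    vector = [0.0] * frame_len
    vector[index] = pitch_gain * pulse_amp
    return vector

# Hypothetical values: T = 5, g = 0.5, b = 2, a = 4.0, frame of 8 samples.
vec = adaptive_codebook_vector(frame_len=8, pitch_period=5, pitch_gain=0.5,
                               pulse_pos=2, pulse_amp=4.0)
print(vec)
```

In the method itself, ‘b’ and ‘a’ would be searched so that the signal synthesized from this vector best matches the input; that search is not shown here.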

Method and apparatus for estimating harmonic information, spectral envelope information, and degree of voicing of speech signal

A degree of voicing is extracted using the characteristic of harmonic peaks existing in a constant period by converting an input speech or audio signal to a speech signal of the frequency domain, selecting the greatest peak in a first pitch period of the converted speech signal as a harmonic peak, thereafter selecting a peak having the greatest spectral value among peaks existing in each peak search range of the speech signal as a harmonic peak, extracting harmonic spectral envelope information by performing interpolation of the selected harmonic peaks, extracting non-harmonic spectral envelope information by performing interpolation of the non-harmonic peaks, and comparing the two pieces of envelope information to each other.
Owner:SAMSUNG ELECTRONICS CO LTD

Voice noise reduction method for conference terminal based on neural network model

The invention provides a voice noise reduction method for a conference terminal based on a neural network model. The method comprises the following steps: S1, an audio file is collected by the conference terminal device to generate a digital audio signal in the time domain; S2, the digital audio signal is framed, and a short-time Fourier transform is performed; S3, the amplitude spectrum in the frequency domain is mapped into frequency bands, and the Mel-frequency cepstral coefficients are then computed; S4, first-order and second-order differential coefficients are calculated from the Mel-frequency cepstral coefficients, a pitch correlation coefficient is calculated on each frequency band, and pitch period features and VAD features are further extracted; S5, the characteristic parameters of the audio are used as the input of the neural network model, the neural network is trained offline to learn the frequency band gains that produce the noise-reduced speech, and the trained weights are frozen; S6, the trained neural network model is used to generate the frequency band gains, the output gains are mapped onto the spectrum, the phase information is added, and the noise-reduced speech signal is reconstructed through an inverse Fourier transform. The advantage of the method is that real-time noise reduction can be achieved.
Owner:FUJIAN STAR NET WISDOM TECH CO LTD

Method of and apparatus for pitch period estimation

A pitch period of a signal is estimated by identifying a peak candidate of the signal as a peak and estimating the pitch period of the signal based on a time difference between the identified peak and a previous peak of the signal. An error-concealment apparatus includes a history block for storing signal data input to a decoder, an error likelihood detector, and a pitch period estimator. The error likelihood detector directs an input of the decoder to data of the signal data in the history block offset an estimated signal pitch period back in time responsive to a determination that data from a receiver has been lost or corrupted. The pitch period estimator estimates the pitch period of the signal via identification of peaks of the signal data.
Owner:TELEFON AB LM ERICSSON (PUBL)
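
The peak-based estimate in this abstract, a pitch period taken as the time difference between an identified peak and the previous peak, can be sketched in Python as follows. The peak test used here (a local maximum above a fixed threshold) is an assumption made for the example; the abstract does not say how peak candidates are qualified.

```python
import math

def pitch_periods_from_peaks(signal, threshold):
    """Return the distance in samples between each identified peak and the previous one."""
    periods = []
    previous_peak = None
    for n in range(1, len(signal) - 1):
        # Assumed peak rule: strict local maximum that exceeds the threshold.
        is_peak = (signal[n] > threshold
                   and signal[n] > signal[n - 1]
                   and signal[n] > signal[n + 1])
        if is_peak:
            if previous_peak is not None:
                periods.append(n - previous_peak)
            previous_peak = n
    return periods

# Hypothetical input: a sine whose peaks fall 40 samples apart.
signal = [math.sin(2 * math.pi * n / 40) for n in range(200)]
periods = pitch_periods_from_peaks(signal, threshold=0.5)
print(periods)
```

For error concealment as described, the most recent entry would be the offset used to read replacement data from the history block.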

Speech encoding method, apparatus and program

A speech encoding method, apparatus and program wherein an input speech signal is divided into a plurality of frames each having a predetermined length, each of the frames is subdivided into a plurality of subframes, a predictive pitch period of a subframe in a to-be-encoded current frame is obtained by using pitch periods of at least two frames of the current frame and past and future frames with respect to the current frame; a pitch period of a subframe in the current frame is obtained by using the predictive pitch period, a relative pitch pattern codebook storing a plurality of relative pitch patterns representing fluctuations in pitch periods of a plurality of subframes is prepared, and a change in pitch period of plural subframes is expressed with one relative pitch pattern selected from the relative pitch pattern codebook.
Owner:KK TOSHIBA

Packet loss concealment for voice over packet networks

A method to reduce memory requirements for a packet loss concealment algorithm in the event of packet loss in a receiver of pulse code modulated voice signals. Packet losses are concealed by using the spectral analysis filter memory to smooth a signal gap and by using a technique for determining a maximum repeatable waveform range instead of using the pitch period to reproduce lost packets. The invention uses fewer processing resources and results in improved performance compared to a packet loss concealment algorithm under G.711 Appendix I standards.
Owner:TELOGY NETWORKS

Linear prediction speech coding method and speech synthesis method

The invention discloses a linear prediction speech coding method and a speech synthesis method. The linear prediction speech coding method includes the following steps: the speech is preprocessed; second-order backward linear prediction is carried out on the preprocessed speech to obtain a residual signal; wavelet decomposition and compression are carried out on the residual signal to obtain wavelet coefficients, which are vector-quantized; meanwhile, the pitch period and gain parameters of the residual signal and the voiced/unvoiced characteristic of each sub-band are calculated and each scalar-quantized. The speech synthesis method is based on the linear prediction speech coding method. The technical scheme of the invention can reduce the effect of noise on the quality of decoded speech, limit the deterioration of speech quality when the voiced/unvoiced decision is wrong, and improve the performance of coding unvoiced speech or background noise.
Owner:北京迅光达通信技术有限公司

Method and apparatus for concealing jitter buffer expansion and contraction

Inactive; US7099820B1; Disadvantages changing; Problems changing; Error prevention; Transmission systems; Contraction method; Audio frequency
Methods for concealing audible distortions resulting from changes in jitter buffer size include receiving an audio stream, storing the audio stream in a jitter buffer, and determining a pitch period associated with the audio stream. To expand the jitter buffer, a method includes inserting additional audio data that has a duration corresponding to an integer multiple of the pitch period into the audio stream. To contract the jitter buffer, a method includes removing a portion of the audio stream having a duration corresponding to an integer multiple of the pitch period.
Owner:CISCO TECH INC
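
A minimal Python sketch of the expansion and contraction idea follows: audio is inserted or removed in whole multiples of the pitch period so that the waveform stays in phase at the splice. The function names are hypothetical, and operating on the tail of the buffer is an assumption made to keep the example short.

```python
def expand(audio, pitch_period, cycles=1):
    """Lengthen the audio by repeating its last pitch period `cycles` times."""
    last_cycle = audio[-pitch_period:]
    return audio + last_cycle * cycles

def contract(audio, pitch_period, cycles=1):
    """Shorten the audio by removing `cycles` whole pitch periods from its end."""
    cut = pitch_period * cycles
    if cut >= len(audio):
        raise ValueError("cannot remove more audio than is buffered")
    return audio[:-cut]

# Hypothetical buffer: three cycles of a waveform with a pitch period of 4 samples.
audio = [0, 1, 2, 3] * 3
longer = expand(audio, pitch_period=4)
shorter = contract(audio, pitch_period=4)
print(len(audio), len(longer), len(shorter))
```

Because the inserted or removed span is an exact multiple of the period, a perfectly periodic signal remains perfectly periodic after either operation.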

Audio signal quality enhancement apparatus and method

An audio signal quality enhancement apparatus and method. The apparatus includes a pitch calculating unit to extract a pitch period of an audio signal, a frequency domain transforming unit to transform the audio signal to a frequency domain, a frequency band dividing unit to classify the transformed audio signal into audio signals for each of the plurality of frequency bands based on the extracted pitch period, and a pitch enhancement unit to determine a gain based on a volume of the transformed audio signal, and to generate an output signal by multiplying each of the classified audio signals with respect to each of the plurality of frequency bands by the gain, thereby enhancing quality of the audio signal.
Owner:SAMSUNG ELECTRONICS CO LTD +1

Method and device for performing frame erasure concealment on higher-band signal

A method for performing frame erasure concealment for a higher-band signal involves calculating a periodic intensity of the higher-band signal with respect to pitch period information of a lower-band signal and comparing the periodic intensity to a preconfigured threshold. If the periodic intensity is greater than or equal to the preconfigured threshold, the frame erasure concealment is performed with a pitch period repetition based method. If the periodic intensity is less than the preconfigured threshold, the frame erasure concealment is performed with a previous frame data repetition based method. A device for performing frame erasure concealment includes a periodic intensity calculation module, a pitch period repetition module, and a previous frame data repetition module. The pitch period repetition module performs the frame erasure concealment with the pitch period repetition based method, and the previous frame data repetition module performs it with the previous frame data repetition based method.
Owner:HUAWEI TECH CO LTD
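
The threshold decision in this abstract can be sketched in Python as follows. The abstract does not define how the periodic intensity is computed; here it is assumed to be the normalized correlation of the higher-band history at the lower-band pitch lag, and the threshold of 0.7 is likewise a made-up value.

```python
import math

def periodic_intensity(history, pitch_period):
    """Assumed measure: normalized correlation of the history at the pitch lag."""
    a, b = history[:-pitch_period], history[pitch_period:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def conceal_frame(history, pitch_period, frame_len, threshold=0.7):
    """Pick a concealment method from the periodic intensity and build the frame."""
    if periodic_intensity(history, pitch_period) >= threshold:
        cycle = history[-pitch_period:]
        frame = [cycle[i % pitch_period] for i in range(frame_len)]
        return "pitch_period_repetition", frame
    return "previous_frame_repetition", history[-frame_len:]

# Hypothetical higher-band history with a true period of 4 samples.
history = [0, 1, 0, -1] * 10
periodic = conceal_frame(history, pitch_period=4, frame_len=6)
aperiodic = conceal_frame(history, pitch_period=3, frame_len=6)
print(periodic[0], aperiodic[0])
```

The second call passes a lag that does not match the signal, so the intensity falls below the threshold and the sketch falls back to repeating the previous frame data.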

Method for Varying Speech Speed

Active; US20080140391A1; Facilitate deceleration; Facilitate acceleration; Ear treatment; Digital computer details; Speech rate; Speech sound
A method for varying speech speed is provided. The method includes the following steps: receive an original speech signal; calculate a pitch period of the original speech signal; define search ranges according to the pitch period; find a maximum within each of the search ranges of the original speech signal; divide the original speech signal into speech sections according to the maxima; obtain a speed-varied speech signal by applying a speed-varying algorithm to each speech section of the original speech signal according to a speed-varying command; and finally, output the speed-varied speech signal.
Owner:MICRO-STAR INTERNATIONAL
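
The steps listed in this abstract can be sketched in Python as follows. Several details are assumptions made for the example: each search range is taken to be one pitch period long, "faster" drops every second section, and "slower" plays every section twice. The abstract does not specify the speed-varying algorithm itself.

```python
def split_at_pitch_maxima(signal, pitch_period):
    """Cut the signal into sections that start at the maximum of each search range."""
    marks = []
    for start in range(0, len(signal) - pitch_period + 1, pitch_period):
        window = signal[start:start + pitch_period]
        marks.append(start + window.index(max(window)))
    # Samples before the first mark and after the last one are left out of this sketch.
    return [signal[a:b] for a, b in zip(marks, marks[1:])]

def vary_speed(sections, command):
    """Assumed algorithm: drop alternate sections to speed up, repeat each to slow down."""
    out = []
    for i, section in enumerate(sections):
        if command == "faster" and i % 2 == 1:
            continue
        out.extend(section)
        if command == "slower":
            out.extend(section)
    return out

# Hypothetical signal with a pitch period of 4 samples and a peak at offset 1.
signal = [0, 3, 1, 0] * 4
sections = split_at_pitch_maxima(signal, pitch_period=4)
faster = vary_speed(sections, "faster")
slower = vary_speed(sections, "slower")
print(len(sections), len(faster), len(slower))
```

Cutting at the maxima means every section boundary falls at the same phase of the waveform, so dropping or repeating a section does not introduce a discontinuity in a periodic signal.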

Concealing lost packets in a sub-band coding decoder

An electronic device for reconstructing a lost packet in a Sub-Band Coding (SBC) decoder is described. The electronic device includes a processor and instructions stored in memory. The electronic device detects a lost packet, obtains a zero-input response of a synthesis filter bank and obtains a coarse pitch estimate. The electronic device also obtains a fine pitch estimate based on the zero-input response and the coarse pitch estimate. The electronic device selects a last pitch period based on the fine pitch estimate and uses samples from the last pitch period for the lost packet.
Owner:QUALCOMM INC

Method and apparatus for tonal modification of voice

Inactive; CN101354889A; Realize adaptive pitch shifting; Easy to operate; Speech analysis; Self adaptive; Speech sound
The invention discloses a speech tone modification method and a device thereof, which are used to realize adaptive tone modification of speech. The method comprises the following steps: the received speech is subjected to pitch detection to determine its pitch period; the pitch period range to which that pitch period belongs is determined; the tone modification parameters for that range are obtained from a preset correspondence between pitch period ranges and tone modification parameters; and the tone modification parameters are used to carry out tone modification processing on the speech. The method and device thereby avoid the prior-art requirement that the user change the tone by a fixed amount through manually setting the tone-raising or tone-lowering amplitude, which makes operation more convenient for the user and improves the accuracy of tone modification.
Owner:VIMICRO CORP
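
The preset correspondence between pitch period ranges and tone modification parameters amounts to a range lookup, sketched below in Python. The range boundaries and the semitone values are invented for the example; the abstract gives no concrete figures.

```python
import bisect

# Hypothetical table: pitch period ranges in samples (at an assumed 8 kHz rate)
# and the tone shift, in semitones, assigned to each range.
RANGE_UPPER_BOUNDS = [40, 80]   # ranges: <= 40, 41..80, > 80
SHIFT_SEMITONES = [-4, 0, 4]    # one entry per range

def tone_modification_parameter(pitch_period):
    """Look up the shift for the range that contains the detected pitch period."""
    return SHIFT_SEMITONES[bisect.bisect_left(RANGE_UPPER_BOUNDS, pitch_period)]

for detected in (30, 40, 41, 100):
    print(detected, tone_modification_parameter(detected))
```

Keeping the mapping in a table means the ranges can be retuned without touching the lookup code.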

Packet Loss Concealment for Speech Coding

Active; US20120323567A1; Improving packet loss concealment; Reducing and limiting energy; Speech analysis; Speech code; Packet loss concealment
A speech coding method of significantly reducing error propagation due to voice packet loss, while still greatly profiting from a pitch prediction or Long-Term Prediction (LTP), is achieved by limiting or reducing a pitch gain only for the first subframe or the first two subframes within a speech frame. The method is used for a voiced speech class; a pitch cycle length is compared to a subframe size to decide to reduce the pitch gain for the first subframe or the first two subframes within the frame. Speech coding quality loss due to the pitch gain reduction is compensated by increasing a bit rate of a second excitation component or adding one more stage of excitation component only for the first subframe or the first two subframes within the speech frame.
Owner:HUAWEI TECH CO LTD
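
The gain-limiting rule in this abstract can be sketched in Python as follows. The direction of the comparison (one subframe when the pitch cycle fits inside a subframe, two otherwise) and the ceiling of 0.5 are assumptions made for the example; the abstract only says that the pitch cycle length is compared to the subframe size.

```python
def limit_pitch_gains(pitch_gains, pitch_lag, subframe_size, max_gain=0.5):
    """Cap the pitch gain of the first one or two subframes of a voiced frame."""
    # Assumed rule: a pitch cycle longer than one subframe spills into the second.
    limited = 1 if pitch_lag <= subframe_size else 2
    return [min(g, max_gain) if i < limited else g
            for i, g in enumerate(pitch_gains)]

# Hypothetical frame with four subframes of 40 samples each.
short_lag = limit_pitch_gains([0.9, 0.9, 0.9, 0.9], pitch_lag=30, subframe_size=40)
long_lag = limit_pitch_gains([0.9, 0.9, 0.9, 0.9], pitch_lag=60, subframe_size=40)
print(short_lag, long_lag)
```

The compensation step, spending more bits on the second excitation component for the limited subframes, is not modeled here.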

Pitch detection method and apparatus

A pitch detection method and apparatus. The pitch detection apparatus includes: a data rearrangement unit which rearranges voice data on the basis of a center peak of the voice data included in a single frame; a decomposition unit which decomposes the rearranged voice data into even symmetrical components on the basis of the center peak; and a pitch determination unit which obtains a segment correlation value between a reference point and one or more local peaks of the even symmetrical components, and determines the location of the local peak with the maximum segment correlation value among the obtained values as the pitch period.
Owner:SAMSUNG ELECTRONICS CO LTD

Voiceprint recognition method based on pitch period mixed characteristic parameters

The invention provides a voiceprint recognition method based on pitch period mixed characteristic parameters. The method comprises the following steps of voice signal acquisition and input, voice signal preprocessing and voice signal combined characteristic parameter extraction, i.e. a pitch period, LPCC, delta LPCC, energy, first order difference of energy and GFCC characteristic parameters are extracted to be combined into multidimensional characteristic vectors together, the multidimensional characteristic vectors are screened by adopting a discrete binary particle swarm optimization algorithm, the voice model of a speaker is obtained by introducing universal background model UBM training, and finally test voice is recognized by utilizing a GMM-UBM model. Compared with a mode that voiceprint recognition is performed through single voice signal characteristic parameter, recognition accuracy of the voiceprint recognition and system stability are effectively enhanced by adopting the combined characteristic parameters and using the voiceprint recognition system of the GMM-UBM model.
Owner:芽米科技(广州)有限公司

Speaker recognition method for deliberately pretended voices

The invention provides a speaker recognition method for deliberately pretended voices. First, a recording scheme is set up in an anechoic room, free of noise and reflections, for eight kinds of deliberately pretended voice: raised tone, lowered tone, fast speaking, slow speaking, nose pinching, mouth covering, object biting (holding a pencil in the mouth) and chewing (chewing gum). Then, after pre-classification by pitch period, Mel-frequency cepstral coefficients and a Gaussian mixture model are used to recognize the speaker under pretense. Finally, adaptive group adjustment is applied to achieve high-quality speaker recognition of pretended voices. The method can be applied to cases in which criminals conceal their identities by pretending their voices.
Owner:NANJING UNIV OF POSTS & TELECOMM

Anti-noise low-bitrate speech coding method and decoding method

The invention provides an audio data coding method and a decoding method. The coding method comprises the following steps: original audio is obtained, non-speech data in the original audio is removed through endpoint detection, and speech section data are obtained; pre-enhancement is carried out on each frame of speech data, and the speech energy is calculated after part of the noise interference is removed; the pitch period of each frame of speech data is calculated by analyzing the periodicity and the voiced/unvoiced state of all sub-bands, and spectrum parameters are enhanced through a multi-layer neural network model; speech frames are clustered using the spectrum parameters, pitch periods and energy, so that each speech section is composed of adjacent frames with similar characteristics; the mean spectrum parameters, pitch period and energy of each section, together with the number of frames in each section, are calculated and then quantized; the quantized speech parameters are coded, and a speech data packet is generated. High speech quality can be maintained at an extremely low bit rate.
Owner:东莞市凌进精密制造有限公司

Method and device for hiding throw-away frame

The invention relates to a lost-frame concealment device and method. The pitch period of the current lost frame is derived from the pitch period of the last good frame before the loss, and the excitation signal of the current lost frame is recovered from the excitation signal of the most recent good frame. The invention avoids the buzzing effect caused by consecutive lost frames, and it applies energy attenuation to the excitation signal, which reduces the audible contrast perceived by the receiver and improves sound quality.
Owner:HUAWEI TECH CO LTD

Method and apparatus for reducing access delay in discontinuous transmission packet telephony systems

Speech at the beginning of a talkspurt in a discontinuous transmission (DTX) packet telephony system is sped up to help make up for an access delay incurred during channel allocation. Incoming speech frames are buffered, a pitch period for a current portion of the signal is estimated, and then a pitch period's worth of the signal is cut from that portion. This is continued until the original access delay, as estimated from the time lag between the commencement of voice input for the talkspurt and the notification that a channel is available, is eliminated. The remainder of the talkspurt is then transmitted without such compression.
Owner:AMERICAN TELEPHONE & TELEGRAPH CO
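
The catch-up loop in this abstract can be sketched in Python as follows: one pitch period is cut from each buffered portion until the removed audio covers the access delay. The function name, the choice to cut from the end of each portion, and the sample counts are assumptions made for the example.

```python
def compress_talkspurt_start(frames, pitch_periods, access_delay):
    """Cut one pitch period per frame until `access_delay` samples have been removed."""
    removed = 0
    output = []
    for frame, period in zip(frames, pitch_periods):
        if removed < access_delay and period < len(frame):
            frame = frame[:-period]  # drop one pitch period's worth of samples
            removed += period
        output.append(frame)
    return output, removed

# Hypothetical talkspurt: five 10-sample frames, pitch period 4, delay of 10 samples.
frames = [[0] * 10 for _ in range(5)]
output, removed = compress_talkspurt_start(frames, [4] * 5, access_delay=10)
print([len(f) for f in output], removed)
```

Once the delay has been absorbed, the remaining frames pass through unchanged, matching the statement that the rest of the talkspurt is transmitted without compression.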