1135 results about "Voice pitch" patented technology

Pitch is an integral part of the human voice. The pitch of the voice is defined as the "rate of vibration of the vocal folds". The sound of the voice changes as the rate of vibration varies: as the number of vibrations per second increases, so does the pitch, and the voice sounds higher.

Personal message service with enhanced text to speech synthesis

A server in a network gathers textual information, such as news items, E-mail and the like. From that information, the server develops or identifies messages for use by individual subscribers. The same server that accumulates the text messages or another server in the network converts the textual information in each message to a sequence of speech synthesizer instructions. The converted messages, containing the sequences of speech synthesizer instructions, are transmitted to each identified subscriber's terminal device. A synthesizer in the terminal generates an audio waveform signal, representing the speech information, in response to the instructions. In the preferred embodiment, the terminals utilize concatenative type speech synthesizers, each of which has an associated vocabulary of stored fundamental sound samples. The instructions identify the sound samples, in order. The instructions also provide parameters for controlling characteristics of the signal generated during waveform synthesis for each sound sample in each sequence. For example, the instructions may specify the pitch, duration, amplitude, attack envelope and decay envelope for each sample. The division of the text to speech synthesis processing between the server and the terminals places the cost of the front end processing in the server, which is a shared resource. As a result, the hardware and software of the terminal may be relatively simple and inexpensive. Also, it is possible to upgrade the quality of the synthesis by upgrading the server software, without modifying the terminals.
Owner:GOOGLE LLC

Systems and methods for reducing speech intelligibility while preserving environmental sounds

Active | US20090306988A1 | Reduced speech intelligibility | Secret communication | Speech synthesis | Syllable | Relative energy
An audio privacy system reduces the intelligibility of speech in an audio signal while preserving prosodic information, such as pitch, relative energy and intonation so that a listener has the ability to recognize environmental sounds but not the speech itself. An audio signal is processed to separate non-vocalic information, such as pitch and relative energy of speech, from vocalic regions, after which syllables are identified within the vocalic regions. Representations of the vocalic regions are computed to produce a vocal tract transfer function and an excitation. The vocal tract transfer function for each syllable is then replaced with the vocal tract transfer function from another prerecorded vocalic sound. In one aspect, the identity of the replacement vocalic sound is independent of the identity of the syllable being replaced. A modified audio signal is then synthesized with the original prosodic information and the modified vocal tract transfer function to produce unintelligible speech that preserves the pitch and energy of the speech as well as environmental sounds.
Owner:FUJIFILM BUSINESS INNOVATION CORP

Acoustic control system for electronic musical instrument

An acoustic control system for an electronic musical instrument, which can obtain optimal acoustic characteristics irrespective of whether or not a low range speaker is attached to a body thereof. A piano body has high range and mid range speakers. A low range speaker is removable from the body. A stand-excluding switch designates a first speaker-use mode for using the high range and mid range speakers alone, and a stand-including switch designates a both speaker-use mode for using both the high range and mid range speakers and the low range speaker. A ROM stores stand-excluding and stand-including factors for setting acoustic characteristics in the two modes, respectively. A CPU reads one of the factors, which corresponds to the mode designated by the stand-excluding or stand-including switch. A tone generator circuit generates a musical tone signal to be reproduced in one of the modes, based on the read factor.
Owner:KAWAI MUSICAL INSTR MFG CO

Controller and interface for home security, monitoring and automation having customizable audio alerts for SMA events

A single platform for controller functionality for each of security, monitoring and automation, as well as providing a capacity to function as a bidirectional Internet gateway, is provided. Embodiments of the present invention provide such functionality by virtue of a configurable architecture that enables a user to adapt the system for the user's specific needs. Such configurability includes associating selected audio event tones with selected events associated with sensors and zones coupled to the system.
Owner:ICONTROL NETWORKS

Telephone terminal

A telephone terminal device such as a portable telephone performs music playback processes with respect to use-specified music data in which tempos, tone colors, and pitches are specifically processed to suit different uses while tone color assignment and musical score are commonly shared among different uses, or common-use music data that are partially modified to suit a specific use in reproduction such as production of incoming call melody sound, hold sound, background music (BGM) during conversation in progress, karaoke accompaniment sound, and music for appreciation.
Owner:YAMAHA CORP

Voice intelligibility enhancement system

Intelligibility of a human voice projected by a loudspeaker in an environment of high ambient noise is enhanced by processing a voice signal in accordance with the frequency response characteristics of the human hearing system. Intelligibility of the human voice is derived largely from the pattern of frequency distribution of voice sounds, such as formants, as perceived by the human hearing system. Intelligibility of speech in a voice signal is enhanced by filtering and expanding the voice signal with a transfer function that approximates an inverse of equal loudness contours for tones in a frontal sound field for humans of average hearing acuity.
Owner:DTS

Translating emotion to braille, emoticons and other special symbols

A system for incorporating emotional information in a communication stream from a first person to a second person, including an electronic image analyzer for determining an emotional component of a speaker or presenter using subsystems such as facial expression recognition, hand gesture recognition, body movement recognition, and voice pitch analysis. A symbol generator generates one or more symbols such as emoticons, graphic symbols, or text modifications (bolding, underlining, etc.) corresponding to the emotional aspects of the presenter or speaker. The emotional symbols are merged with the audio or visual information from the speaker and are presented to one or more recipients.
Owner:IBM CORP +1

Prototype waveform phase modeling for a frequency domain interpolative speech codec system

A system and method are provided that employ a frequency domain interpolative CODEC system for low bit rate coding of speech, which comprises a linear prediction (LP) front end adapted to process an input signal and provide LP parameters that are quantized and encoded over predetermined intervals and used to compute an LP residual signal. An open loop pitch estimator adapted to process the LP residual signal, a pitch quantizer, and a pitch interpolator are also provided and supply a pitch contour within the predetermined intervals. Also provided is a signal processor responsive to the LP residual signal and the pitch contour and adapted to perform the following: provide a voicing measure, where the voicing measure characterizes a degree of voicing of the input speech signal and is derived from several input parameters that are correlated to degrees of periodicity of the signal over the predetermined intervals; extract a prototype waveform (PW) from the LP residual and the open loop pitch contour for a number of equal sub-intervals within the predetermined intervals; normalize the PW by a gain value of the PW; encode a magnitude of the PW; and separate stationary and nonstationary components of the PW using a low complexity alignment process and a filtering process that introduce no delay. The ratio of the energy of the nonstationary component of the PW to that of the stationary component of the PW is averaged across 5 subbands to compute the nonstationarity measure as a frequency dependent vector entity. A measure of the degree of voicing of the residual is also computed using open loop pitch gain, pitch variance, relative signal power, PW correlation and PW nonstationarity in low frequency subbands. The nonstationarity measure and voicing measure are encoded using a 6-bit spectrally weighted vector quantization scheme using a codebook partitioned based on a voiced / unvoiced decision. At the decoder, a stationary component of PW is reconstructed as a weighted combination of the previous PW phase vector, a random phase perturbation and a fixed phase vector obtained from a voiced pitch pulse.
Owner:HUGHES NETWORK SYST

Gesture synthesizer for electronic sound device

A MIDI-compatible gesture synthesizer is provided for use with a conventional music synthesizer to create musically realistic sounding gestures. The gesture synthesizer is responsive to one or more user controllable input signals, and includes several transfer function models that may be user-selected. One transfer function models properties of muscles using Hill's force-velocity equation to describe the non-linearity of muscle activation. A second transfer function models the cyclic oscillation produced by opposing effects of two force sources representing the cyclic oppositional action of muscle systems. A third transfer function emulates the response of muscles to internal electrical impulses. A fourth transfer function provides a model representing and altering virtual trajectory of gestures. A fifth transfer function models visco-elastic properties of muscle response to simulated loads. The gesture synthesizer outputs continuous pitch data, tone volume and tone timbre information. The continuous pitch data is combined with discrete pitch data provided by the discrete pitch generator within the conventional synthesizer, and the combined signal is input to a tone generator, along with the tone volume and tone timbre information. The tone generator outputs tones that are user-controllable in real time during performance of a musical gesture.
Owner:LONGO NICHOLAS

First draft-switching controller for personal ANR system

An active noise control system for use in testing hearing using a pure tone audiometry testing procedure and employing multiple switching controllers with pre-filtering means and a switch to select any one controller to provide a predetermined one and having the ability to configure each switching controller so that the maximum threshold shift occurs for the frequency of the test tone and for modifying each test tone in accordance with a standard calibration frequency.
Owner:GENTEX CORP

Bandwidth extension method, bandwidth extension apparatus, program, integrated circuit, and audio decoding apparatus

A bandwidth extension method is provided which allows reduction of the computation amount in bandwidth extension and suppression of deterioration of quality in the bandwidth to be extended. In the bandwidth extension method: a low frequency bandwidth signal is transformed into a QMF domain to generate a first low frequency QMF spectrum; pitch-shifted signals are generated by applying different shifting factors on the low frequency bandwidth signal; a high frequency QMF spectrum is generated by time-stretching the pitch-shifted signals in the QMF domain; the high frequency QMF spectrum is modified; and the modified high frequency QMF spectrum is combined with the first low frequency QMF spectrum.
Owner:PANASONIC INTELLECTUAL PROPERTY CORP OF AMERICA

Speech encoder adaptively applying pitch preprocessing with warping of target signal

Inactive | US20010023395A1 | Efficient and effective of signal | Reduce bitrate | Speech analysis | Target signal | Closed loop
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech through CELP (code excited linear prediction) and other associated modeling parameters is generated for higher quality decoding and reproduction. The speech encoder employs various encoding schemes based upon parameters including an available transmission bit rate. In addition, the speech encoder is operable to identify and apply an optimal encoding scheme for a given speech signal. The speech encoder may apply code-excited linear prediction when the available bit rate is above a predetermined upper threshold. Pitch preprocessing, including continuous warping, may be applied when the available bit rate is below a predetermined lower threshold. The encoder considers varying characteristics of the speech signal including the long term prediction mode of a previous frame, a spectral difference between the line spectral frequencies of a current and a previous frame, a predicted pitch lag, an open loop pitch lag, a closed loop pitch lag, a pitch gain, and a pitch correlation.
Owner:SAMSUNG ELECTRONICS CO LTD

Optimizing pitch and other speech stimuli allocation in a cochlear implant

Errors in pitch (frequency) allocation within a cochlear implant are corrected in order to provide a significant and profound improvement in the quality of sound perceived by the cochlear implant user. In one embodiment, the user is stimulated with a reference signal, e.g., the tone “A” (440 Hz) and then the user is stimulated with a probe signal, separated from the reference signal by an octave, e.g., high “A” (880 Hz). The user adjusts the location where the probe signal is applied, using current steering, until the pitch of the probe signal, as perceived by the user, matches the pitch of the reference signal, as perceived by the user. In this manner, the user maps frequencies to stimulation locations in order to tune his or her implant system to his or her unique cochlea.
Owner:ADVANCED BIONICS AG
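
The octave relationship in the abstract above (a 440 Hz reference and an 880 Hz probe) can be checked numerically. The following is a minimal sketch, assuming twelve-tone equal temperament, where an octave is a doubling of frequency and spans 12 semitones. The function name `semitones_between` is hypothetical and is not taken from the patent.

```python
import math


def semitones_between(f_ref_hz, f_probe_hz):
    """Signed interval from f_ref_hz to f_probe_hz in equal-tempered semitones.

    Assumption: twelve-tone equal temperament, so one octave
    (a frequency ratio of 2) equals 12 semitones.
    """
    if f_ref_hz <= 0 or f_probe_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_probe_hz / f_ref_hz)


if __name__ == "__main__":
    # Reference "A" and the probe one octave above it, as in the abstract.
    print(semitones_between(440.0, 880.0))
```

A probe that the user perceives as less than 12 semitones above the reference would suggest that the stimulation location needs adjusting, which is the mismatch the abstract describes correcting.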

Self-contained real-time gait therapy device

A self-contained, real-time self-use gait therapy device with a gait sensor, actuator, output speaker, and battery receptacle enclosed in a component case with a belt clip. Based upon step duration, step impact force, or step form data generated by the gait sensor, the actuator drives the speaker, producing beeps on a real-time basis with the pitch of each beep being a function of the step duration, step impact force, or step form for each step of the user. The speaker output is monitored and used by the user on a real-time basis to modify and improve his or her gait.
Owner:IBM CORP

System And Method For Compressing And Reconstructing Audio Files

Inactive | US20080243518A1 | Easy to calculate | Expand the range of equipment | Speech analysis | Frequency spectrum | Waveform shaping
A system and method for the improved compression of audio signals and the restoration and enhancement of audio recordings missing high frequency content. In the preferred embodiment, different context models are applied to increase the compression ratio of spectral information, quantization coefficients and other information. Context models and arithmetic compression are used for final compression. The time-frequency amplitude envelope and degree of tonality parameters are extracted from the low frequency component. An estimate of the high frequency component is performed by applying a multiband distortion effect, waveshaping, to the low frequency content. Control of tonality is achieved by varying the number of bands within the multiband framework. A filterbank is used that roughly shapes the reconstructed high frequency component according to an estimation of the most probable shape.
Owner:SOUND GENETICS INC

Karaoke apparatus

A karaoke apparatus includes a sound effect processing system provided in a microprocessor. The system decodes standard song data from an internal storage, or from an external storage connected to an extended system interface, by a song decoding module, and corrects the pitches of singing voices by a pitch correcting system, so that the pitches of the singing voices are corrected to, or close to, the pitches of the standard song. The singing voices are processed with harmony adding, tonal modification and speed-changing by a harmony adding system to produce the effect of a chorus composed of three voice parts. A pitch evaluating system compares the pitch sequence of the singing voices with the pitch sequence of the standard song and draws a voice graph that visually shows the difference between the pitches of the singing voices and the pitches of the standard song, while providing a score and comments on the singing voices. Therefore, a singer can immediately be aware of the effect of his / her performance, which increases the amusement of karaoke singing.
Owner:MULTAK TECH DEV

Humming transcription system and methodology

A humming transcription system and methodology is capable of transcribing an input humming signal into a standard notational representation. The disclosed humming transcription technique uses a statistical music recognition approach to recognize an input humming signal, model the humming signal into musical notes, and decide the pitch of each music note in the humming signal. The humming transcription system includes an input means accepting a humming signal, a humming database recording a sequence of humming data for training note models and pitch models, and a statistical humming transcription block that transcribes the input humming signal into musical notations, in which the note symbols in the humming signal are segmented by phone-level Hidden Markov Models (HMMs) and the pitch value of each note symbol is modeled by Gaussian Mixture Models (GMMs), and thereby outputs a musical query sequence for music retrieval in later music search steps.
Owner:ACER INC +1

Methods and apparatus for encoding and decoding data transmitted over telephone lines

A method of generation and detection of acoustic signals carrying alpha-numeric data for reducing the effects of noise and thereby improving signal transmission without the need to boost the power of the signal. In accordance with one aspect of the invention, a special DTMF acoustic signal consisting of a combination of two frequencies will be generated for the representation of a particular alpha-numeric character that will be similar to the standard DTMF tone. The difference between the amplitudes of both frequencies varies during the time of generation of the DTMF tone in such a way that both signals will arrive at a final destination detector, i.e., an interactive voice response board, at least for a portion of the generation time, at a relatively similar amplitude, thereby substantially increasing the probabilities of being recognized as a DTMF signal.
Owner:CIDWAY TECH
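
For context on the abstract above, a standard DTMF tone is the sum of one low-group and one high-group sine wave, and a detector can measure the energy at each of the eight frequencies. The sketch below generates a tone and detects it with the Goertzel algorithm. It uses the standard DTMF frequency table. The function names and the `low_amp` / `high_amp` parameters are hypothetical, and the amplitudes are held constant here, so the time-varying amplitude difference that the patent describes is not modeled.

```python
import math

# Standard DTMF frequency groups (Hz) and keypad layout.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_tone(key, sample_rate=8000, duration=0.05, low_amp=0.5, high_amp=0.5):
    """Return samples of the two-frequency tone for one keypad character."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c != -1:
            break
    else:
        raise ValueError("not a DTMF key: %r" % key)
    f_low, f_high = ROW_HZ[r], COL_HZ[c]
    n = int(sample_rate * duration)
    return [
        low_amp * math.sin(2 * math.pi * f_low * t / sample_rate)
        + high_amp * math.sin(2 * math.pi * f_high * t / sample_rate)
        for t in range(n)
    ]


def goertzel_power(samples, freq, sample_rate):
    """Signal power near one frequency, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row and column frequency and map them to a key."""
    r = max(range(4), key=lambda i: goertzel_power(samples, ROW_HZ[i], sample_rate))
    c = max(range(4), key=lambda i: goertzel_power(samples, COL_HZ[i], sample_rate))
    return KEYPAD[r][c]
```

Calling `detect_key(dtmf_tone("5", low_amp=0.2, high_amp=0.8))` exercises the case where the two frequencies arrive at different amplitudes, which is the situation the abstract aims to compensate for. This simple detector applies no amplitude-balance check, whereas a real receiver may reject tones whose two components differ too much.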

Singing synthesis parameter data estimation system

There is provided a singing synthesis parameter data estimation system that automatically estimates singing synthesis parameter data for automatically synthesizing a human-like singing voice from an audio signal of input singing voice. A pitch parameter estimating section 9 estimates a pitch parameter, by which the pitch feature of an audio signal of synthesized singing voice is brought closer to the pitch feature of the audio signal of input singing voice, based on at least both the pitch feature and lyric data with specified syllable boundaries of the audio signal of input singing voice. A dynamics parameter estimating section 11 converts the dynamics feature of the audio signal of input singing voice to a relative value with respect to the dynamics feature of the audio signal of synthesized singing voice, and estimates a dynamics parameter, by which the dynamics feature of the audio signal of synthesized singing voice is brought closer to the dynamics feature of the audio signal of input singing voice that has been converted to the relative value.
Owner:NAT INST OF ADVANCED IND SCI & TECH

Melody retrieval system

Inactive | US20070163425A1 | Lower matching costs | Penalize cost | Gearworks | Musical toys | Natural language processing | Frequency spectrum
A music retrieval system that takes an input melody as the query. In one embodiment, changes or differences in the distribution of energy across the frequency spectrum over time are used to find breakpoints in the input melody in order to separate it into distinct notes. In another embodiment the breakpoints are identified based on changes in pitch over time. A confidence level is preferably associated with each breakpoint and / or note extracted from the input melody. The confidence level is based on one or more of: changes in pitch, absolute values of a spectral energy distribution indicator, relative values of the spectral energy distribution indicator, and the energy level of the input melody. The process of matching the input melody with songs in the music database is based on minimizing a cost computation that takes into account errors in the insertion and deletion of notes, and penalizes these errors in accordance with the confidence levels of the breakpoints and / or notes.
Owner:PERCEPTION DIGITAL TECH BVI
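
The matching step in the abstract above resembles a weighted edit distance. The sketch below is one possible reading, not the patent's cost function: it assumes each query note carries a confidence in [0, 1], charges that confidence for discarding or mismatching the note, and charges a fixed cost for a reference note that the query lacks. The function name, the cost values and the use of MIDI note numbers are all assumptions made for illustration.

```python
def melody_match_cost(query, reference, missing_note_cost=1.0):
    """Confidence-weighted edit distance between a query and a reference melody.

    query:     list of (pitch, confidence) pairs extracted from the input melody
    reference: list of pitches for one song in the database
    Hypothetical cost model: dropping or mismatching a query note costs its
    confidence, so doubtful notes are cheap to discard.
    """
    n, m = len(query), len(reference)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + query[i - 1][1]
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + missing_note_cost
    for i in range(1, n + 1):
        pitch, confidence = query[i - 1]
        for j in range(1, m + 1):
            mismatch = 0.0 if pitch == reference[j - 1] else confidence
            dp[i][j] = min(
                dp[i - 1][j - 1] + mismatch,        # align the two notes
                dp[i - 1][j] + confidence,          # treat the query note as spurious
                dp[i][j - 1] + missing_note_cost,   # reference note missing from query
            )
    return dp[n][m]


if __name__ == "__main__":
    # A spurious low-confidence note (62) between two confident notes.
    query = [(60, 1.0), (62, 0.2), (64, 1.0)]
    print(melody_match_cost(query, [60, 64]))
```

Under this cost model, a song is retrieved by computing the cost against every reference melody and taking the minimum. A practical system would also need transposition and tempo invariance, which this sketch leaves out.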

Apparatus and method for analyzing a sound signal using a physiological ear model

An apparatus for analyzing a sound signal is based on an ear model for deriving, for a number of inner hair cells, an estimate for a time-varying concentration of transmitter substance inside a cleft between an inner hair cell and an associated auditory nerve from the sound signal so that an estimated inner hair cell cleft contents map over time is obtained. This map is analyzed by means of a pitch analyzer to obtain a pitch line over time, the pitch line indicating a pitch of the sound signal for respective time instants. A rhythm analyzer is operative for analyzing envelopes of estimates for selected inner hair cells, the inner hair cells being selected in accordance with the pitch line, so that segmentation instants are obtained, wherein a segmentation instant indicates an end of the preceding note or a start of a succeeding note. Thus, a human-related and reliable sound signal analysis can be obtained.
Owner:FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG EV

System and method for automatic creation of digitally enhanced ringtones for cellphones

There is provided herein a system and method for automatically enhancing or creating ringtones for use on cellular phones. In the preferred embodiment, a user will be able to create an enhanced ringtone that is recognizable and distinct when played via a cell phone. The user will preferably begin by selecting the input audio material. An automatic analysis of the audio material is preferably conducted in order to determine a synthetic tone series that best represents it. An enhanced ringtone is then created by combining the original audio material with the tone series to form a unified audio work. The instant invention is primarily intended for use with cell phones but would also be useful elsewhere and this is especially so where the device that plays the enhanced audio material has limited audio capabilities.
Owner:MAGIX

Pitch detection of speech signals

Pitch detection of speech signals has numerous applications in karaoke, voice recognition and scoring. While most of the existing techniques rely on time domain methods, the invention utilizes frequency domain methods. There is provided a method and system for determining the pitch of speech from a speech signal. The method includes the steps of: producing or obtaining the speech signal; classifying the speech signal into voiced, unvoiced or silence sections using speech signal energy levels; applying a Fourier Transform to the speech signal and obtaining speech signal parameters; determining peaks of the Fourier transformed speech signal; tracking the speech signal parameters of the determined peaks to select partials; and determining the pitch from the selected partials using a two-way mismatch error calculation.
Owner:STMICROELECTRONICS ASIA PACIFIC PTE
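
As a rough illustration of frequency-domain pitch estimation, the sketch below scores each candidate fundamental by summing spectral magnitudes at its first few harmonics and returns the best-scoring candidate. This harmonic-sum scoring is a simpler stand-in and is not the two-way mismatch calculation named in the abstract. The voiced / unvoiced classification and partial tracking steps are omitted, and all function names and default parameters are assumptions.

```python
import cmath
import math


def spectrum_magnitude(samples, freq, sample_rate):
    """Magnitude of the discrete-time Fourier transform at a single frequency."""
    w = -2j * math.pi * freq / sample_rate
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(samples)))


def estimate_pitch(samples, sample_rate, f_min=80, f_max=400, step=5, harmonics=4):
    """Return the candidate fundamental (Hz) with the largest harmonic sum."""
    best_f0, best_score = None, -1.0
    for f0 in range(f_min, f_max + 1, step):
        score = sum(
            spectrum_magnitude(samples, k * f0, sample_rate)
            for k in range(1, harmonics + 1)
            if k * f0 < sample_rate / 2  # ignore harmonics above Nyquist
        )
        if score > best_score:
            best_f0, best_score = f0, score
    return float(best_f0)


def harmonic_signal(f0, sample_rate=8000, n_samples=800, amps=(1.0, 0.5, 0.33, 0.25)):
    """Synthetic voiced-like test signal: a few decaying harmonics of f0."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t / sample_rate)
            for k, a in enumerate(amps))
        for t in range(n_samples)
    ]


if __name__ == "__main__":
    print(estimate_pitch(harmonic_signal(200), 8000))
```

Using a fixed number of harmonics per candidate keeps a sub-harmonic such as 100 Hz from outscoring the true 200 Hz fundamental in this synthetic case, because half of its harmonics fall where the signal has no energy. Real speech is noisier, which is one reason methods such as two-way mismatch also penalize predicted harmonics that are absent from the measured spectrum.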

Song practice support device

A technique of enabling a singer to auditorily recognize how to change the way of singing is provided. The CPU (11) of a karaoke device (1) associates model voice data stored in a model voice data storage area (14c) with inputted learner voice data in a time axis direction. Then the CPU (11) shifts the pitch of the learner voice data so that it may coincide with the corresponding pitch of the model voice data according to the result of the association, compresses or extends the section (mora) of the learner voice data in the time axis direction so that the section length of the learner voice data coincides with the corresponding section length of the model voice data, and outputs the resultant learner voice data to a voice processing section (18). The voice processing section (18) converts the learner voice data supplied from the CPU (11) into an analog signal and generates the sound from a loudspeaker (19).
Owner:YAMAHA CORP

Music performance information converting method with modification of timbre for emulation

A method of converting performance information is carried out by the steps of receiving identification information that identifies a target tone generator different from an available tone generator, reading out first performance information that indicates a music performance in the form of a sequence of tones, and that includes timbre information specifying a timbre of the tones, and changing the timbre information included in the read first performance information based on the received identification information so as to generate second performance information including the changed timbre information adapted to the target tone generator. In this manner, the available tone generator can process the second performance information to generate the sequence of tones having a timbre as if generated by the target tone generator.
Owner:YAMAHA CORP

Method and apparatus for extending the bandwidth of a speech signal

A bandwidth extension module, and an associated method and computer-readable medium, suitable for use in artificially extending the bandwidth of a lowband speech signal. The bandwidth extension module comprises a band-pass filter configured to produce a band-pass signal from the lowband speech signal; at least one carrier frequency modulator, each carrier frequency modulator configured to pitch-synchronously modulate the band-pass signal about a respective carrier frequency, the at least one carrier frequency modulator collectively producing a highband speech signal component; a synthesis filter configured to determine a highband speech signal based on the highband speech signal component; and a summation module configured to combine the lowband speech signal with the highband speech signal to obtain a bandwidth-extended speech signal.
Owner:APPLE INC

Method and device for gain quantization in variable bit rate wideband speech coding

Active | US20050251387A1 | Speech analysis | Digital computer details | Wideband speech coding | Computer science
The present invention relates to a gain quantization method and device for implementation in a technique for coding a sampled sound signal processed, during coding, by successive frames of L samples, wherein each frame is divided into a number of subframes and each subframe comprises a number N of samples, where N<L. In the gain quantization method and device, an initial pitch gain is calculated based on a number f of subframes, a portion of a gain quantization codebook is selected in relation to the initial pitch gain, and pitch and fixed-codebook gains are jointly quantized. This joint quantization of the pitch and fixed-codebook gains comprises, for the number f of subframes, searching the gain quantization codebook in relation to a search criterion. The codebook search is restricted to the selected portion of the gain quantization codebook and an index of the selected portion of the gain quantization codebook best meeting the search criterion is found.
Owner:NOKIA TECHNOLOGIES OY

Lyric display method, lyric display computer program and lyric display apparatus

Performance data and lyric data stored in an external storage device 35 are read out in accordance with the progression of a song. The performance data is sent to a tone generator 36, so that melody and accompaniment tones are reproduced by use of the performance data. The lyric data is sent to a display control circuit 14, so that lyrics represented by the lyric data are displayed on a display unit 12. The timing at which the lyrics are reproduced is determined by relative time data ΔT added to the lyric data, so that the lyrics are displayed at the right positions corresponding to bar and beat positions. When a bar has lengthy lyrics, the width of the bar to be displayed on the display unit 12 is adjusted. As a result, the position of the displayed lyrics corresponds to the progression of the song, so that users can keep time in both playing a musical instrument and singing when they sing a song while playing the instrument.
Owner:YAMAHA CORP

Audio-video systems supporting merged audio streams

An audio / video processing system combines a locally generated audio signal with pre-recorded audio / video programming to produce combined audio / video output. The audio / video system allows users to generate sounds locally and mix them with the audio content of a pre-recorded audio / video program, and allows the combined output to be presented by the video displays and speakers of a home audio / video system. The audio / video system provides independent sound characteristic control capability, such as volume control settings, voice and tone alteration settings and equalization settings, for the various sound components produced in the process of mixing locally generated sounds. The sound components produced in the process of mixing include locally generated sound components, such as the voice sound component and the musical instrument sound components, and the sound components of the pre-recorded audio program, such as voice, musical instrument and background sound components. The pre-recorded audio / video programs may be obtained on a pay-per-view basis.
Owner:AVAGO TECH WIRELESS IP SINGAPORE PTE