27 results about "Prosody" patented technology

In linguistics, prosody is concerned with those elements of speech that are not individual phonetic segments (vowels and consonants) but are properties of syllables and larger units of speech, including linguistic functions such as intonation, tone, stress, and rhythm. Such elements are known as suprasegmentals.

Voice dictation with audio large language model

Pending | GB2702128A | Semantic analysis | Speech recognition | Context data | Parallel processing

A method comprising: receiving audio data 102; generating a transcription 151 comprising a sequence of terms 152, such as “Buy some tomatoes and bananas. Change tomatoes to potatoes”; parallel processing the audio data and the transcription using a multimodal large language model (LLM) 150 to identify one or more revision terms 152R, for example “Change”, each specifying a revision action to perform on at least one other term in the sequence, in this instance “tomatoes”; and modifying the transcription 151M accordingly, giving “Buy some potatoes and bananas”. Identifying the revision term(s) may be based on a corresponding user intent 154 determined for each respective term in the sequence; for example, the user 10 does not intend the final transcription to include “tomatoes”. For each term in the sequence, parallel processing may comprise correlating its speech characteristics 156, such as pitch, tone or prosody information determined from the audio data, with its corresponding linguistic context 158 determined from the transcription. Transcription correction may be based on a revision token inserted into the sequence, the token indicating a number N of terms for replacement and their corresponding replacement terms. User context data 104 may be obtained to tailor the LLM to a particular user. [Figure 1A]

Owner: GOOGLE LLC
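The revision-token idea in the abstract above can be illustrated with a small sketch. The abstract does not describe the token's format or how it is resolved, so everything here is an assumption made for illustration: the `RevisionToken` class, its `targets` and `replacements` fields, the `apply_revisions` helper, and the rule of replacing the most recent matching term. The sketch also assumes the model has already removed the spoken instruction ("Change tomatoes to potatoes") and inserted the token in its place.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class RevisionToken:
    """Hypothetical token: N terms to replace and their N replacement terms."""
    targets: List[str]
    replacements: List[str]

    def __post_init__(self):
        if len(self.targets) != len(self.replacements):
            raise ValueError("each target term needs exactly one replacement")


def apply_revisions(sequence: List[Union[str, RevisionToken]]) -> List[str]:
    """Resolve each revision token against the terms that precede it."""
    out: List[str] = []
    for item in sequence:
        if isinstance(item, str):
            out.append(item)
            continue
        for old, new in zip(item.targets, item.replacements):
            # Assumed rule: replace the most recent earlier occurrence.
            for i in range(len(out) - 1, -1, -1):
                if out[i].lower() == old.lower():
                    out[i] = new
                    break
    return out


# Terms from the abstract's example, with the spoken instruction
# represented by a single (hypothetical) revision token where N = 1.
sequence = ["Buy", "some", "tomatoes", "and", "bananas",
            RevisionToken(targets=["tomatoes"], replacements=["potatoes"])]
print(" ".join(apply_revisions(sequence)))  # Buy some potatoes and bananas
```

In the patent, deciding which terms are revision terms relies on the multimodal LLM and on speech characteristics such as prosody; this sketch covers only the final text substitution step.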

A method and system for prosody perception calculation based on curvature sequence

Pending | CN122173781A | Manufacturing computing systems | Evaluation result | Algorithm

The application discloses a rhythm perception and calculation method and system based on a fixed-dimension curvature sequence, belonging to the technical field of data processing and signal analysis. The method includes: obtaining sequence data and discretizing it into a 60-dimensional normalized curvature sequence; presetting a reference curvature value of 2π and a total potential parameter of 120/π, and constructing a reference curvature sequence; calculating the point-by-point absolute deviation between the two sequences and taking a weighted sum to obtain the repulsion R; calculating the compatibility P by the formula P = 1 - R/Ψ_total; and outputting a rhythm evaluation result according to the value of P. The system provides a functional module for each step of the method. The application addresses two problems in the prior art, namely that rhythm evaluation cannot be quantified and that the calculation is complicated. The calculation is simple and efficient, can be adapted to rhythm evaluation of a variety of conventional signals, and is highly practical.

Owner: 成罡
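The calculation in the abstract above can be sketched roughly as follows. Only the fixed dimension (60), the reference curvature value (2π), the parameter 120/π, and the formula P = 1 - R/Ψ_total come from the abstract. The rest is assumed for illustration: that Ψ_total equals the stated total potential parameter, that the input is resampled by linear interpolation, that curvature is the usual discrete estimate from first and second differences, that the sequence is normalized to sum to 2π, that the reference sequence is uniform, and that all weights are 1. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

DIMS = 60                      # fixed dimension (from the abstract)
REF_CURVATURE = 2 * math.pi    # reference curvature value (from the abstract)
PSI_TOTAL = 120 / math.pi      # total potential parameter (assumed to be Psi_total)


def curvature_sequence(signal: Sequence[float], dims: int = DIMS) -> List[float]:
    """Assumed discretization: resample, estimate curvature, normalize to sum 2*pi."""
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    n = dims + 2  # two extra points so every interior point has both neighbours
    resampled = []
    for j in range(n):
        pos = j * (len(signal) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        resampled.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    raw = []
    for i in range(1, n - 1):
        d1 = (resampled[i + 1] - resampled[i - 1]) / 2
        d2 = resampled[i + 1] - 2 * resampled[i] + resampled[i - 1]
        raw.append(abs(d2) / (1 + d1 * d1) ** 1.5)
    total = sum(raw)
    if total < 1e-12:          # effectively straight input: no curvature to share out
        return [0.0] * dims
    return [REF_CURVATURE * k / total for k in raw]


def compatibility(curv: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """P = 1 - R / Psi_total, with R the weighted sum of absolute deviations."""
    ref = [REF_CURVATURE / len(curv)] * len(curv)   # assumed uniform reference
    if weights is None:
        weights = [1.0] * len(curv)                 # assumed equal weights
    repulsion = sum(w * abs(k - r) for w, k, r in zip(weights, curv, ref))
    return 1 - repulsion / PSI_TOTAL


signal = [math.sin(2 * math.pi * t / 100) for t in range(200)]
print(round(compatibility(curvature_sequence(signal)), 3))
```

Under these assumptions, a sequence that matches the reference exactly gives R = 0 and therefore P = 1, and P falls as curvature concentrates in fewer positions.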

Data processing method and apparatus

Pending | CN122290562A | Improve interpretability | Improve consistency | Time information | Audio frequency

This specification provides a data processing method and apparatus, wherein the data processing method includes: preprocessing initial audio data to obtain audio data, and inputting the audio data into an audio recognition model to obtain initial text containing prosody identifiers; determining at least one text unit corresponding to the initial text, and determining time information corresponding to each of the at least one text unit based on the audio data; updating the initial text in the prosody identifier dimension based on the time information corresponding to each of the at least one text unit to obtain target text corresponding to the audio data, wherein the target text and the audio data are used to train an audio generation model.
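One way to picture the update step in the abstract above is to re-derive each prosody identifier from the timing of the text units. The abstract does not say how time information maps to identifiers, so this sketch assumes pause-based break markers: the `TextUnit` class, the `#1` to `#3` marker scheme, and the thresholds in `pause_to_marker` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextUnit:
    text: str
    start: float       # start time in seconds, determined from the audio
    end: float         # end time in seconds, determined from the audio
    marker: str = ""   # prosody identifier from the recognition model


def pause_to_marker(pause: float) -> str:
    """Hypothetical thresholds mapping a pause length to a break marker."""
    if pause >= 0.40:
        return "#3"
    if pause >= 0.15:
        return "#2"
    if pause >= 0.05:
        return "#1"
    return ""


def update_markers(units: List[TextUnit]) -> str:
    """Rebuild the text, replacing each marker with one derived from timing."""
    parts = []
    for current, following in zip(units, units[1:]):
        pause = max(0.0, following.start - current.end)
        parts.append(current.text + pause_to_marker(pause))
    if units:
        # No following unit to measure a pause against: keep the model's marker.
        parts.append(units[-1].text + units[-1].marker)
    return " ".join(parts)


units = [
    TextUnit("buy", 0.00, 0.20, "#1"),
    TextUnit("some", 0.22, 0.45, "#3"),
    TextUnit("potatoes", 0.95, 1.50, "#3"),
]
print(update_markers(units))  # buy some#3 potatoes#3
```

In this example the model's marker after "buy" is dropped because the measured pause is only about 20 ms, while the long pause after "some" keeps its strong break.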
