Speech interaction: related patents

17 results about "Speech interaction" patented technology

Speech interruption decision method and system based on multi-granularity semantic completeness prediction

Pending | CN122313981A | Semantic feature | Data mining

This invention discloses a speech interruption decision method and system based on multi-granularity semantic completeness prediction, belonging to the field of speech interaction technology. The method includes: extracting multi-granularity semantic features and combining them into a current multi-granularity semantic feature vector; calculating the offset of the current multi-granularity semantic feature vector from a standard semantic pause model to obtain a semantic offset feature vector; performing similarity matching and weighted voting between the semantic offset feature vector and a historical interruption decision case library to obtain the current predicted interruption confidence; and generating an interruption command when the current predicted interruption confidence exceeds a dynamic threshold. By introducing a standard semantic pause model and an analogical reasoning mechanism over historical cases, the invention addresses the shortcomings of existing approaches, in which interruption decisions rely on simple energy thresholds or semantic completeness probabilities, make no use of user expression habits or contextual experience, and have low accuracy. It thereby achieves more intelligent and accurate speech interruption decisions.

Owner:GUANGZHOU JIUSI INTELLIGENT TECH CO LTD
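
The decision flow described in the abstract of CN122313981A (offset from a pause model, similarity-weighted voting over past cases, threshold comparison) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the use of cosine similarity, the top-`k` selection, the 0/1 case labels, and all function names are assumptions, and the threshold is simply passed in because the abstract does not say how it is made dynamic.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def interruption_confidence(features, pause_model, case_library, k=3):
    """Weighted vote over the k most similar historical cases.

    case_library: list of (offset_vector, label) pairs, where label is 1 if the
    past decision was "interrupt" and 0 otherwise (hypothetical encoding).
    """
    # Offset of the current feature vector from the standard pause model.
    offset = [f - p for f, p in zip(features, pause_model)]
    scored = sorted(
        ((cosine(offset, case_offset), label) for case_offset, label in case_library),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Negative similarities contribute no vote.
    weights = [max(sim, 0.0) for sim, _ in scored]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * label for w, (_, label) in zip(weights, scored)) / total

def should_interrupt(features, pause_model, case_library, threshold):
    """Issue an interruption only when confidence exceeds the supplied threshold."""
    return interruption_confidence(features, pause_model, case_library) > threshold
```

In a real system the threshold would presumably be recomputed per user or per context before each call; here the caller supplies it.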

Cross-modal attention fusion method and device based on voiceprint features and language semantics

Pending | CN122266366A | Speech recognition | Personalization | Feature extraction

The application relates to the technical field of speech recognition, and discloses a cross-modal attention fusion method and device based on voiceprint features and language semantics, which comprises the following steps: performing a feature extraction operation on speech data to obtain voiceprint features; based on a sentiment perception mask mechanism and a context sentiment memory unit, fusing sentiment features into text data to obtain semantic features; constructing a cross-modal relationship graph of the voiceprint features and the semantic features; determining a time sequence dependency relationship between the voiceprint features and the semantic features based on the edge weights generated by the cross-modal relationship graph; projecting the voiceprint features and the semantic features into the same semantic space based on the time sequence dependency relationship, and performing a feature reconstruction fusion operation based on a reconstruction loss function to obtain voiceprint and semantic fusion features; and adjusting a dialogue strategy determined based on the voiceprint and semantic fusion features; and using the adjusted dialogue strategy to generate interactive response data with historical interactive data, so that a high-accuracy and personalized speech interaction experience is realized.

Owner:GUANGDONG GUANGXIN COMM SERVICES COMPANY

Cross-modal attention fusion method and device based on voiceprint features and language semantics

Active | CN122266366B | Personalization | Feature extraction

Owner:GUANGDONG GUANGXIN COMM SERVICES COMPANY

An interaction method, apparatus, product, electronic device, and medium

Pending | CN122266353A | Digital data information retrieval | Speech recognition | Engineering | Human–robot interaction

The present disclosure provides an interaction method, apparatus, product, electronic device, and medium, relating to the technical field of artificial intelligence. The method comprises: receiving input information from a user and generating a control result according to information associated with the input; injecting the control result into the decoding process of a speech interaction model used to generate target speech, as a constraint on the model output; and generating target speech that matches the control result based on the speech interaction model. Injecting the control result places a mandatory constraint on the direction of generation, so the model follows the control result when generating target speech and the generation process is actively regulated. The generated speech therefore matches the interaction demand expressed by the input, which improves the pertinence and rationality of human-computer interaction.

Owner:BEIJING XIAOMI MOBILE SOFTWARE CO LTD +1
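
One common way to impose an external constraint on a decoder is to mask candidate tokens at each step. The abstract of CN122266353A does not say how its control result is injected, so the sketch below swaps in plain token masking over a greedy step purely as an illustration; the function name, the dictionary-of-scores representation, and the example tokens are all assumptions.

```python
def constrained_greedy_step(scores, allowed):
    """Pick the highest-scoring token among those the control result allows.

    scores:  dict mapping candidate token -> model score for this step
    allowed: set of tokens permitted by the (hypothetical) control result
    """
    candidates = {tok: s for tok, s in scores.items() if tok in allowed}
    if not candidates:
        raise ValueError("control result excludes every candidate token")
    return max(candidates, key=candidates.get)
```

For example, with scores `{"play": 0.2, "stop": 0.7, "pause": 0.1}` and an allowed set of `{"play", "pause"}`, the step returns `"play"` even though the unconstrained model would have preferred `"stop"`.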

Speech data processing method and system based on large language model

Pending | CN122177097A | Speech recognition | Dynamic reasoning | Acoustics

The application discloses a speech data processing method and system based on a large language model, relates to the technical field of speech data processing, and comprises the following steps: performing multi-channel feature decomposition on a received original speech signal and constructing an acoustic state representation tensor; constructing a semantic candidate distribution space, generating multiple sets of semantic hypothesis vectors, and constructing a semantic evolution path graph; generating a semantic uncertainty function that represents semantic ambiguity and sensitivity to speech disturbance; and dynamically constructing a reasoning depth control parameter, inputting it to a multi-layer reasoning path scheduling unit of the large language model, constructing an intention structure vector, and mapping the intention structure vector into a structured semantic output. This addresses two prior-art problems: in complex acoustic environments acoustic features are difficult to separate accurately, which lowers understanding accuracy for highly ambiguous speech, and dynamic reasoning ability to cope with semantic uncertainty is lacking. The technical effects are improved semantic understanding precision and ambiguity resolution in speech interaction, and a reduced business misjudgment rate and risk.

Owner:GUANGDONG JINWAN INFORMATION TECH CO LTD

Voice interaction method, device and electronic equipment

Active | CN121075331B | Speech recognition | Spoken language | Intent recognition

The application provides a speech interaction method and device and electronic equipment, and relates to the technical field of speech processing. The method comprises the following steps: acquiring speech information input by a user, and acquiring historical intention text of the user; inputting the speech information into a speech encoder of a spoken language understanding model to obtain acoustic coding features output by the speech encoder; inputting the historical intention text into a text encoder of the spoken language understanding model to obtain text coding features output by the text encoder; inputting the acoustic coding features and the text coding features into an intention recognition module of the spoken language understanding model to obtain an intention recognition result output by the intention recognition module, so as to be used for speech interaction. The application can improve the accuracy of the intention recognition result and accurately acquire the real intention of the user.

Owner:HANGZHOU QIUGUOJIHUA TECHNOLOGY CO LTD

A multimodal cognitive input driven adaptive memory engine construction method and optimization system

Pending | CN122333342A | Data pack | Engineering

The application discloses a method for constructing an adaptive memory engine driven by multi-modal cognitive input, together with an optimization system, and relates to the technical field of artificial intelligence. The method includes the following steps. S1: acquire and preprocess multi-modal cognitive input data, which includes text semantic data, visual perception data, speech interaction data, and environment state data; each kind of data undergoes normalization, noise reduction, and feature extraction, followed by modal feature alignment across the modalities. By collecting and preprocessing multi-modal data such as text, vision, voice, and environment state, and by aligning modal features with a cross-modal attention mechanism to eliminate modal heterogeneity, the method can accurately capture users' multidimensional cognitive needs. In addition, hierarchical intent recognition and scene understanding, combined with a hierarchical retrieval strategy over a three-layer memory structure, balance the timeliness and completeness of memory responses.

Owner:EMDOOR ELECTRONICS SCINENCE & TECH CO LTD

An environment perception intelligent assistance method and device based on speech recognition and a medium

Pending | CN122369439A | Syllable | Environmental perception

This invention discloses an environmental perception intelligent assistance method, device, and medium based on speech recognition, relating to the field of intelligent speech interaction technology. The method includes: acquiring continuous speech signals and obtaining speech segments through framing, spectral denoising, and silence detection; dividing syllable boundaries; extracting syllable sequences based on the syllable boundaries and converting them into pinyin; generating a candidate path set by combining a vocabulary dictionary; reading the candidate path set; matching the word texts in the candidate paths with the syllable sequences to obtain candidate word segments; grouping and filtering the candidate word segments to generate stable segments; sorting the stable segments in chronological order to obtain a stable segment set; determining a state word group splicing threshold based on the stable segment set; splicing and summarizing the stable segments to obtain a state word group set; determining a state sequence based on the continuity relationship between state word groups in the current speech segment and the previous speech segment; and sorting the state sequence in chronological order to obtain a state sequence set. This invention improves the accuracy of semantic information extraction.

Owner:BOYIN HEARING EQUIP (SUZHOU) CO LTD

Speech synthesis method and system based on bone conduction signal and lip image fusion

Active | CN116343793B | Retain explanatory power | Preserve high-quality anti-noise performance | Character and pattern recognition | Biological models | Generative adversarial network | Speech input

The present application relates to a speech synthesis method and system based on the fusion of bone conduction signals and lip images, comprising the following steps: synchronously acquiring a bone conduction signal and a lip movement image signal while the user's speech input is collected; determining single-modality features in the time domain and the spatial domain from the bone conduction signal and the lip movement image signal; applying a generative adversarial network with a cross-modal attention mechanism and a mel-spectrogram fusion method to the two single-modality feature sets to build a speech model and obtain a collaborative cross-modal feature representation; and recognizing specific phrases and instructions from that representation with a neural network model, then synthesizing speech with a vocal synthesis model. The algorithm yields a shared collaborative representation across modalities, compensates for the representational limits of each modality on its own, and improves speech synthesis under high noise interference or in silent mode, thereby broadening the situations in which speech interaction is feasible.

Owner:NAT INNOVATION INST OF DEFENSE TECH PLA ACAD OF MILITARY SCI

Server-side processing method and server for actively initiating a conversation, as well as a speech interaction system with the ability to actively initiate a conversation.

Active | DE602020073023T2 | Digital data information retrieval | Automatic exchanges | Medicine | Processing

Owner:AISPEECH CO LTD

Speech interaction method based on multi-modal noise suppression

Pending | CN122135696A | Speech recognition | Inference methods | Data matching | Sound sources

The application discloses a speech interaction method based on multi-modal noise suppression and belongs to the technical field of sound extraction. The application controls a microphone to collect speech data in a dialogue scene, and controls an image acquisition device to shoot a lip movement video in the dialogue scene; determines a data matching result according to the speech data, the lip movement video and a preset large language model, wherein the data matching result comprises a time matching result and a sound source matching result; and determines the de-noising result according to the data matching result and the speech data, thereby realizing the beneficial effect of improving the accuracy of speech recognition.

Owner:TIANHE COLLEGE GUANGDONG POLYTECHNIC NORMAL UNIV

A method, apparatus, device, and readable storage medium for speech segmentation.

Pending | CN122313977A | Speech segmentation | Confusion

This application discloses a speech segmentation method, apparatus, device, and readable storage medium, relating to the field of speech interaction technology. It includes: performing real-time semantic analysis on the user's speech stream locally on the terminal device to determine the semantic integrity status, and dynamically selecting a target threshold from a preset set of silence thresholds to extract audio segments and send them to the cloud; the cloud performs endpoint detection on the audio segments, obtains the start and end timestamps of the speech, and calculates the time interval between the current segment and the previous segment; when the interval is less than or equal to a preset listening window, it performs a semantic correlation judgment on the two segments based on rules and a semantic model, and decides whether to splice them. This application solves the problems of response delay and semantic confusion caused by fixed thresholds through semantically driven dynamic threshold adjustment and cloud-based semantic-level splicing decision-making, thereby improving the response speed and recognition accuracy of speech interaction.

Owner:AISPEECH CO LTD
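
The two decisions described in the abstract of CN122313977A (choosing a silence threshold on the device from the semantic completeness state, then deciding in the cloud whether to splice adjacent segments) might look roughly like this. It is a sketch under stated assumptions: the threshold values, the state names, the function names, and the `is_related` callback standing in for the rules and semantic model are all hypothetical.

```python
# Hypothetical preset set of silence thresholds, in milliseconds.
SILENCE_THRESHOLDS_MS = {
    "complete": 300,    # utterance looks finished: cut quickly for a fast response
    "incomplete": 900,  # utterance looks unfinished: wait longer before cutting
}

def pick_silence_threshold(semantic_state):
    """Device side: select the target silence threshold for the current state."""
    return SILENCE_THRESHOLDS_MS[semantic_state]

def should_splice(prev_end_ms, cur_start_ms, listening_window_ms,
                  is_related, prev_text, cur_text):
    """Cloud side: splice only if the gap fits the listening window AND the
    two segments are judged semantically related."""
    interval = cur_start_ms - prev_end_ms
    if interval > listening_window_ms:
        return False
    return bool(is_related(prev_text, cur_text))
```

Checking the time interval first keeps the semantic check, which is likely the more expensive step, off the path for segments that are too far apart to belong together.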

Speech interaction method, system and device based on multi-modal large model and storage medium

Pending | CN122116895A | Semantic analysis | Speech recognition | Personalization | Speech sound

The embodiment of the application relates to the technical field of data processing, and discloses a voice interaction method, system and device based on a multi-modal large model and a storage medium, the method comprising: receiving a language instruction of a user; converting the language instruction into voice text; determining whether the voice text is a fixed instruction, if yes, directly executing, otherwise, executing the next step; obtaining visual information of a current interface, the multi-modal large model performing semantic understanding and information acquisition according to the voice text and the visual information, realizing user intention reasoning, and outputting the reasoning result; and displaying the output result. For non-fixed instructions, the multi-modal large model performs semantic understanding and information acquisition according to the voice text and the visual information, and realizes user intention reasoning. The reasoning ability of the multi-modal large model is used to dynamically adapt to diversified and personalized voice demands of users without modifying system codes or expanding a preset instruction library.

Owner:HUIZHOU DESAY SV AUTOMOTIVE
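
The routing step in the abstract of CN122116895A (execute fixed instructions directly, otherwise hand the voice text plus the current interface's visual information to a multi-modal model) can be illustrated with a small dispatcher. This is only a sketch: the instruction table, the normalization by lowercasing, and the `get_screen_info` and `multimodal_model` callables are hypothetical stand-ins, not part of the source.

```python
from typing import Callable, Dict

# Hypothetical table of fixed instructions and their direct handlers.
FIXED_INSTRUCTIONS: Dict[str, Callable[[], str]] = {
    "open the window": lambda: "window opened",
    "turn on the radio": lambda: "radio on",
}

def handle_voice_text(voice_text: str,
                      get_screen_info: Callable[[], str],
                      multimodal_model: Callable[[str, str], str]) -> str:
    """Run a fixed instruction directly; otherwise defer to the multi-modal model."""
    key = voice_text.strip().lower()
    if key in FIXED_INSTRUCTIONS:
        return FIXED_INSTRUCTIONS[key]()
    # Non-fixed instruction: combine the text with what is currently on screen.
    visual_info = get_screen_info()
    return multimodal_model(voice_text, visual_info)
```

Only the fallback path touches the model, so new phrasings can be served without adding entries to the fixed table, which matches the benefit the abstract claims.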

Intelligent assessment methods, devices, equipment, and storage media for classroom teacher-student interaction

Active | CN121640156B | Improve accuracy | Effectively reflect the classroom atmosphere | Data science | Data structure

This application belongs to the field of educational informatization technology, specifically disclosing an intelligent evaluation method, device, equipment, and storage medium for classroom teacher-student interaction. Through this application, verbal interaction features of teachers and students in the classroom are determined based on recognition results; non-verbal interaction features are determined based on detection results; a quantitative matrix of classroom interaction is generated based on the verbal and non-verbal interaction features, and a classroom interaction feature tensor is constructed based on the quantitative matrix; intelligent evaluation of classroom teacher-student interaction is performed based on the classroom interaction feature tensor using a target tensor clustering model. By using the above method, the verbal and non-verbal interaction features of teachers and students in the classroom are determined based on the teaching video to be evaluated, and then intelligent evaluation is performed based on a target tensor clustering model that can reveal high-dimensional interaction relationships while maintaining the integrity of the data structure, thereby effectively improving the accuracy and fairness of evaluating teacher-student interaction.

Owner:HUAZHONG NORMAL UNIV

Out-of-vehicle speech interaction method, out-of-vehicle speech interaction apparatus, and vehicle

Pending | EP4769388A1 | Alarms | Speech recognition | Speech input | Acoustics

An out-of-vehicle speech interaction method, an out-of-vehicle speech interaction apparatus, and a vehicle. The out-of-vehicle speech interaction method comprises: when a vehicle is in a sentry mode and changes from a standby state to a non-standby state, broadcasting a speech prompt according to a preset rule (S10); acquiring a speech input from a user outside the vehicle (S20); and executing a response operation on the basis of the speech input from the user outside the vehicle (S30). The capability of a conventional sentry mode is reused, and the conventional sentry mode is combined with an out-of-vehicle speech interaction capability, thereby realizing an out-of-vehicle speech interaction experience triggered by a state change of the sentry mode.

Owner:GEELY AUTOMOBILE INST (NINGBO) CO LTD
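
Steps S10 to S30 in the abstract of EP4769388A1 amount to a small state-triggered flow: a prompt is broadcast when sentry mode leaves standby, after which outside speech is accepted and answered. The sketch below models that flow under assumptions; the class and state names, the prompt wording, and the `respond` callback are hypothetical, and the "preset rule" for the prompt is reduced to a fixed string.

```python
from enum import Enum

class SentryState(Enum):
    STANDBY = "standby"
    ACTIVE = "active"  # any non-standby state, simplified to a single value

class OutOfVehicleAssistant:
    def __init__(self, respond):
        self.state = SentryState.STANDBY
        self.respond = respond  # callable mapping speech text to a response
        self.log = []           # record of prompts and responses, for inspection

    def on_state_change(self, new_state):
        """S10: broadcast a prompt only on a standby -> non-standby transition."""
        triggered = (self.state is SentryState.STANDBY
                     and new_state is not SentryState.STANDBY)
        self.state = new_state
        if triggered:
            self.log.append("prompt: Hello, how can I help?")
        return triggered

    def on_speech(self, text):
        """S20/S30: accept outside speech and respond, but never while in standby."""
        if self.state is SentryState.STANDBY:
            return None
        result = self.respond(text)
        self.log.append(result)
        return result
```

Gating `on_speech` on the sentry state reflects the abstract's point that the interaction is triggered by the state change rather than by an always-on listener.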
