58 results about "Speech segmentation" patented technology

Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural languages. The term applies both to the mental processes used by humans, and to artificial processes of natural language processing.

Bimodal human-to-human conversation sentiment analysis system and method based on machine learning

Status: Active | Patent: CN106503805A | Advantages: comprehensive features; improved accuracy | Tags: semantic analysis, machine learning, speech segmentation, single sentence
The invention provides a bimodal human-to-human conversation sentiment analysis system and method based on machine learning. The system comprises a speech recognition module, a text deep-layer feature extraction module, a speech segmentation module, an acoustic feature extraction module, a feature fusion module and a sentiment analysis module. The speech recognition module recognizes the speech content and time labels; the text deep-layer feature extraction module extracts deep-layer word-level and sentence-level text features; the speech segmentation module segments single-sentence speech from the entire recording; the acoustic feature extraction module extracts acoustic features of the speech; the feature fusion module fuses the extracted text deep-layer features with the acoustic features; and the sentiment analysis module obtains the sentiment polarity of the speech under analysis. By integrating the text and audio modalities for recognizing conversation sentiment and fully exploiting both word-vector and sentence-vector features, the method improves recognition accuracy.
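As an illustration of the feature fusion step described above, a minimal early-fusion sketch simply concatenates a text-derived vector with an acoustic vector before classification. The vector sizes and contents below are hypothetical, not taken from the patent.

```python
def fuse_features(text_features, acoustic_features):
    """Early fusion: concatenate the two modality vectors into one vector
    that a downstream sentiment classifier would consume."""
    return list(text_features) + list(acoustic_features)

# Toy example: a 3-dim sentence embedding fused with a 2-dim acoustic
# summary (e.g. mean pitch, mean energy) into one 5-dim feature vector.
fused = fuse_features([0.12, -0.40, 0.88], [180.0, 0.65])
print(len(fused))  # 5
```

Concatenation is the simplest fusion strategy; the patented system fuses richer deep-layer features, but the interface is the same: one joint vector per utterance.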
Owner:山东心法科技有限公司

Emotional Chinese text human voice synthesis method

The invention discloses an emotional Chinese text human voice synthesis method, which mainly comprises the steps of: (1) constructing an emotional corpus; and (2) performing emotional speech synthesis based on waveform splicing. The emotional corpus is established by: (11) segmenting terms and acquiring their parts of speech; (12) performing speech segmentation and acquiring the audio data corresponding to the segmented terms based on speech data features and text corpora; and (13) performing emotion analysis and acquiring emotional feature values of terms, clauses and whole sentences based on text term segmentation and audio features. The emotional speech synthesis based on waveform splicing is implemented by: (21) segmenting terms in, and performing emotion analysis on, the text to be synthesized, acquiring the parts of speech of words, sentence patterns and emotional features of that text; (22) selecting the optimal corpus, matching against the text eigenvalues to obtain the optimal corpus set; and (23) performing speech synthesis and waveform splicing, extracting a word audio sequence set from the corpus set, and synthesizing the audio to output the final speech. The method synthesizes and outputs a natural human voice with emotional features.
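Step (23) rests on waveform splicing: concatenating pre-recorded audio units in order. A minimal sketch follows; the optional linear crossfade at each join is an assumption added here to soften clicks, not something the abstract specifies.

```python
def splice_waveforms(segments, crossfade=0):
    """Concatenate lists of PCM samples; optionally overlap-add a linear
    crossfade of `crossfade` samples at each join to avoid clicks."""
    out = []
    for seg in segments:
        if out and crossfade > 0:
            n = min(crossfade, len(out), len(seg))
            tail, head = out[-n:], list(seg[:n])
            mixed = [t * (1 - i / n) + h * (i / n)
                     for i, (t, h) in enumerate(zip(tail, head))]
            out = out[:-n] + mixed + list(seg[n:])
        else:
            out.extend(seg)
    return out

# Plain concatenation of two word-level audio units.
print(splice_waveforms([[0.1, 0.2], [0.3, 0.4]]))  # [0.1, 0.2, 0.3, 0.4]
```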
Owner:SOUTHEAST UNIV

Single-channel unsupervised target speaker speech extraction method

The embodiment of the invention discloses a single-channel unsupervised target speaker speech extraction method comprising a teacher speech detection step and a teacher speech model training step. The teacher speech detection step comprises: obtaining speech data from a classroom recording; processing the speech signals; speech segmentation and modeling, in which the classroom speech is segmented at equal length, the corresponding MFCC features are extracted for each speech segment, and a GMM is built for each segment according to its MFCC features; and teacher speech detection, in which the similarity between the GMM of each segment outside the teacher speech class and the GGMM is calculated, and segments whose score is smaller than a set threshold are tagged as teacher speech, thus obtaining the final teacher speech class. The teacher GGMM model training step comprises: clustering the speech data obtained in S3; obtaining an initial teacher speech class; and training the GGMM model according to the initial teacher speech class. The method can effectively improve the adaptability and intelligence of the system in real applications, laying a foundation for subsequent applications and research.
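The segmentation-and-modeling step can be shown in miniature. The sketch below cuts a signal into equal-length segments and fits a single 1-D Gaussian per segment, a deliberately simplified stand-in for the per-segment MFCC+GMM models in the patent; average log-likelihood serves as the similarity score.

```python
import math

def segment_equal(samples, seg_len):
    """Cut a recording into equal-length segments (trailing partial dropped)."""
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]

def fit_gaussian(values):
    """Fit mean and variance: a toy stand-in for fitting a GMM to MFCCs."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-9)

def log_likelihood(values, model):
    """Similarity of a segment to a model, as average log-likelihood."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)
               for v in values) / len(values)

segments = segment_equal([0.0, 0.1, 0.0, 0.9, 1.0, 0.8], 3)
teacher_model = fit_gaussian(segments[1])   # pretend segment 1 is teacher speech
print(log_likelihood(segments[1], teacher_model) >
      log_likelihood(segments[0], teacher_model))  # True
```

A real system would fit multi-component GMMs over MFCC frames (e.g. with scikit-learn's GaussianMixture), but the detect-by-likelihood logic is the same.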
Owner:SHANTOU UNIV

System and method for detecting identity impersonation in telephone satisfaction survey

The invention provides a system and a method for detecting identity impersonation in telephone satisfaction surveys, addressing the following problems: previous telephone satisfaction surveys could detect identity impersonation only for single-channel telephone speech; the speech processing method was rough; and telephone survey speech contains a variety of non-effective speech such as noise and ring-back tones. The system is composed of a to-be-detected speech library 101, a preprocessing module 102, a speaker speech segmentation module 103, a respondent voiceprint library 104, a voiceprint training module 105, a respondent speech database 106, a verification speech selection module 107, a respondent verification speech library 108, a voiceprint verification module 109, a score statistical analysis module 110 and a detection report generation module 111. Identity impersonation is detected using voiceprint recognition and speaker speech segmentation technology, and a clear, readable identity impersonation detection report is finally produced, reflecting the authenticity of the survey data in the telephone satisfaction survey.
Owner:XIAMEN KUAISHANGTONG INFORMATION TECH CO LTD

Interactive teaching method and device, storage medium and electronic equipment

The invention relates to an interactive teaching method and device, a storage medium and electronic equipment. The method comprises the steps of: obtaining teaching interaction audio information collected by an audio collection device when an interaction request message is received, wherein the teaching interaction audio information comprises audio information from teachers or from students; performing voice segmentation processing on the teaching interaction audio information to obtain multiple pieces of sub-audio information; obtaining a text recognition result corresponding to the teaching interaction audio information by feeding the multiple pieces of sub-audio information into a pre-trained voice recognition model, the long short-term memory (LSTM) network layer having been removed from the model structure of the voice recognition model; generating a teaching material query instruction according to the text recognition result; querying a target material matched with the teaching interaction audio information according to the teaching material query instruction; and adding the target material to the currently displayed teaching courseware so as to display the material retrieved for the teaching interaction audio information.
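The flow above (segment, recognize, query, display) can be sketched as a pipeline. The stub functions below are hypothetical placeholders for the ASR model and the courseware material index, not the patented components.

```python
def teaching_pipeline(segments, recognize, query_materials):
    """Recognize each sub-audio segment, join the text, and look up materials."""
    text = " ".join(recognize(seg) for seg in segments)
    return query_materials(text)

# Toy stubs in place of the pre-trained recognizer and the material index.
result = teaching_pipeline(
    ["seg-a", "seg-b"],
    recognize=lambda seg: {"seg-a": "pythagorean", "seg-b": "theorem"}[seg],
    query_materials=lambda text: "materials matching: " + text,
)
print(result)  # materials matching: pythagorean theorem
```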
Owner:NEW ORIENTAL EDUCATION & TECH GRP CO LTD

Speech segmentation, recombination and output method and device

The invention discloses a speech segmentation, recombination and output method and device, used for reducing the occupied IO port resources and the production and manufacturing cost while providing a speech content alteration function. The device comprises a flash memory module and a single-chip microcomputer module. The flash memory module stores several segments of speech broadcasting information and an address directory corresponding to those segments. The single-chip microcomputer module comprises a speech segmentation unit, a segment processing unit and a reading unit: the speech segmentation unit decomposes target speech information into several ordered segments of speech information; the segment processing unit converts the segments decomposed by the speech segmentation unit into address directory numbers, in order; and the reading unit reads and outputs the corresponding speech broadcasting information from the flash memory module, in order, according to the converted address directory numbers, so that recombination and output of the segmented speech are realized.
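The address-directory scheme can be sketched as follows. The directory contents are stand-ins for stored audio clips, since the patent works with real flash memory addresses rather than strings.

```python
# Hypothetical directory: address-directory number -> stored speech segment.
FLASH_DIRECTORY = {
    0: "current temperature",
    1: "twenty",
    2: "degrees",
}

def play_message(segment_numbers, directory=FLASH_DIRECTORY):
    """Read the stored segments in directory-number order and output them,
    recombining segmented speech from a fixed pool of stored clips."""
    return " ".join(directory[n] for n in segment_numbers)

print(play_message([0, 1, 2]))  # current temperature twenty degrees
```

Because any message is just an ordered list of directory numbers, the speech content can be altered by changing the number sequence, without re-recording audio or adding IO ports.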
Owner:苏州蓝博控制技术有限公司

Rail transit standard entity relationship automatic completion method based on artificial intelligence

The invention discloses an artificial-intelligence-based method for automatically completing entity relationships in rail transit standards. The method comprises the steps of: constructing an entity relationship completion model, and inputting the rail transit specification and the part-of-speech segmentation of its nouns into the model; judging whether an input specification sentence is a simple sentence; if yes, searching for entity-related attributes in the rail transit specification and generating an entity relationship triple; if not, extracting the latter-sentence attribute words and entities of the specification, matching the former-sentence entities and latter-sentence attribute words in an n:n manner, or judging whether the former sentence follows subject-verb-object grammar and the latter sentence is an object complement; if yes, directly matching the former-sentence entities with the objects and the latter-sentence keywords with the object entities to generate entity relationship triples; and if not, outputting the entities whose vocabulary relevancy exceeds a threshold together with their entity relationships to generate the triples, thereby obtaining specifications with a complete semantic structure and finishing the automatic completion of rail transit specification entity relationships.
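A toy version of the simple-sentence branch: detect that a clause has no internal separators, then emit (entity, relation, value) triples. The tokenization, separator set, and sample sentence are illustrative assumptions, not the patent's grammar analysis.

```python
def is_simple_sentence(tokens):
    """Heuristic stand-in for the simple-sentence judgment:
    a clause with no internal separators counts as simple."""
    return not any(t in {",", ";"} for t in tokens)

def make_triples(entity, attributes):
    """Emit one (entity, relation, value) triple per attribute found."""
    return [(entity, rel, val) for rel, val in attributes]

tokens = ["rail", "gauge", "shall", "be", "1435", "mm"]
if is_simple_sentence(tokens):
    print(make_triples("rail", [("gauge", "1435 mm")]))
    # [('rail', 'gauge', '1435 mm')]
```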
Owner:XIAN UNIV OF TECH

An automatic generation system for character Chinese lip animation

The invention discloses a system for automatically generating Chinese mouth-shape animations for characters. The dialogue text filtering and coding module performs phrase segmentation, pinyin mouth-shape coding, whole-syllable reading mark setting and coding filtering on the dialogue text, and generates and outputs the dialogue mouth-shape codes, the dialogue whole-syllable reading code identifiers and the dialogue mouth-shape filter coding sequence. The dialogue speech segmentation module performs speech sampling and speech energy statistics on the dialogue audio, and generates and outputs a sequence of candidate dialogue speech segmentation results. The dialogue segmentation coding integration module, connected to the dialogue text filtering and coding module and the dialogue speech segmentation module, integrates and corrects the candidate segmentation results and generates and outputs the dialogue segmentation code sequence. The character Chinese mouth animation generation module, connected to the dialogue segmentation coding integration module, generates and outputs the character's Chinese mouth-shape animation from that sequence. With this process, the production of a character's entire Chinese mouth animation can be completed automatically without loading a corresponding voice library.
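The speech-energy statistics feeding the candidate segmentation sequence can be sketched as: compute per-frame energy, then propose low-energy frames as candidate boundaries between spoken units. Frame length and threshold below are illustrative values, not from the patent.

```python
def frame_energies(samples, frame_len):
    """Mean squared energy of each fixed-length frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def boundary_candidates(energies, threshold):
    """Low-energy frames are candidate split points between spoken units."""
    return [i for i, e in enumerate(energies) if e < threshold]

signal = [0.9, 0.8, 0.0, 0.0, 0.7, 0.9]      # loud - silent - loud
energies = frame_energies(signal, 2)
print(boundary_candidates(energies, 0.1))    # [1]
```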
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

Keyword search method based on unified representation

The invention belongs to the technical field of voice signal processing, and in particular relates to a keyword search method based on unified representation. The method comprises the following steps: training a neural network speech auto-encoder with a bottleneck layer on abundant speech data to obtain an acoustic representation vector extractor; training a neural network text auto-encoder with a bottleneck layer on abundant text data to obtain a language representation vector extractor; extracting the corresponding acoustic and language representation vectors from paired speech and text segments to train a unified vector extractor; obtaining query vectors for a text keyword through the language representation vector extractor and the unified vector extractor; obtaining query vectors for a spoken keyword through the acoustic representation vector extractor and the unified vector extractor; and, for the speech to be queried, obtaining a number of index vectors segment by segment through the acoustic representation vector extractor and the unified vector extractor, then calculating the distances between the query vectors and the index vectors: if a distance is smaller than a preset threshold, the keyword is considered hit.
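The final matching step reduces to a nearest-vector test in the unified space. A minimal sketch follows; Euclidean distance is an assumption here, as the abstract does not name a specific metric.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_keyword(query_vec, index_vecs, threshold):
    """Indices of segments whose unified vector lies within `threshold`
    of the query vector; those segments count as keyword hits."""
    return [i for i, v in enumerate(index_vecs)
            if euclidean(query_vec, v) < threshold]

# One nearby segment vector hits; the distant one does not.
print(search_keyword([0.0, 0.0], [[0.1, 0.0], [5.0, 5.0]], 1.0))  # [0]
```

Because both text and spoken keywords map into the same unified space, the identical search routine serves both query types.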
Owner:TSINGHUA UNIV

Speech separation method, speech separation device, electronic equipment and storage medium

The application provides a speech separation method, a speech separation device, electronic equipment and a storage medium. The speech separation method includes: obtaining the original audio and extracting a spectrogram feature sequence from it with a sliding time window; inputting the spectrogram feature sequence into a pre-trained speech segmentation model to obtain an embedded feature sequence; inputting the embedded feature sequence into a pre-trained speech clustering model to obtain the predicted label sequence corresponding to the embedded feature sequence; and performing single-speaker voice restoration based on the predicted label sequence to generate the separated speech. The method, device, equipment and storage medium of the present application address the problem of unsatisfactory speech separation: they can separate the speech segments belonging to a single speaker from a short speech audio file in which multiple people speak alternately, and can accurately estimate the number of speakers using contextual information.
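The first stage, extracting a spectrogram feature sequence with a sliding time window, starts by framing the signal into overlapping windows. Window length and hop below are illustrative; a real system would apply an FFT to each window to obtain the spectrogram columns.

```python
def sliding_windows(samples, win_len, hop):
    """Overlapping frames of `win_len` samples advanced by `hop` samples,
    the framing step that precedes spectrogram computation."""
    return [samples[i:i + win_len]
            for i in range(0, len(samples) - win_len + 1, hop)]

print(sliding_windows([1, 2, 3, 4, 5], win_len=3, hop=2))
# [[1, 2, 3], [3, 4, 5]]
```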
Owner:北京远鉴信息技术有限公司