
Multi-user dialogue audio recognition method and system based on machine learning

A machine learning-based recognition technology in the computer field. It addresses the problems of existing methods, which consider neither role (speaker) recognition nor personalized scenarios and therefore achieve poor segmentation and role-recognition accuracy.

Status: Inactive. Publication date: 2017-11-17
广州心语心伴互联网信息服务有限公司 (Guangzhou Xinyu Xinban Internet Information Service Co., Ltd.)
Cites 13; cited by 30

AI Technical Summary

Problems solved by technology

The statistical-distance approach has one advantage: it needs no step of training a model on sample data. It directly assumes that, over a short time span, the Gaussian Mixture Model (GMM) distributions of different speakers' acoustic models differ, and it segments the audio based on this difference. The approach can therefore be applied to any voice-role segmentation task. Its drawbacks, however, are obvious. As a general-purpose segmentation method, it considers neither personalized scenarios nor role recognition, so both segmentation accuracy and role-recognition accuracy are relatively low.
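The following is a minimal sketch of statistical-distance segmentation as described above, using the Generalized Likelihood Ratio (GLR) distance. The design is an assumption for illustration. Each window is modelled by a single 1-D Gaussian instead of a GMM over multi-dimensional acoustic features. The two windows of `win` frames slide across the signal, and change points are taken at local maxima of the distance curve above a fixed `threshold`. The function names and parameters are hypothetical and not taken from the patent.

```python
import math
from typing import Dict, List


def _ml_log_var(xs: List[float]) -> float:
    """Log of the maximum-likelihood variance of a 1-D sample (floored to avoid log(0))."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-8))


def glr_distance(left: List[float], right: List[float]) -> float:
    """GLR distance between two windows, each modelled by its own 1-D Gaussian.

    Equals log L(left | own model) + log L(right | own model) - log L(both | shared model);
    larger values mean the two windows are less likely to come from the same speaker.
    """
    both = left + right
    return 0.5 * (
        len(both) * _ml_log_var(both)
        - len(left) * _ml_log_var(left)
        - len(right) * _ml_log_var(right)
    )


def detect_changes(features: List[float], win: int, threshold: float) -> List[int]:
    """Return frame indices where the GLR distance curve has a local maximum >= threshold."""
    scores: Dict[int, float] = {}
    for t in range(win, len(features) - win + 1):
        # Compare the `win` frames before t with the `win` frames starting at t.
        scores[t] = glr_distance(features[t - win:t], features[t:t + win])
    changes = []
    for t, s in scores.items():
        prev_s = scores.get(t - 1, float("-inf"))
        next_s = scores.get(t + 1, float("-inf"))
        if s >= threshold and s >= prev_s and s > next_s:
            changes.append(t)
    return changes
```

For example, `detect_changes([0.0, 1.0] * 10 + [10.0, 11.0] * 10, win=5, threshold=5.0)` returns `[20]`, the first frame of the second "speaker". No labelled training data is involved, which is the advantage noted above. The same property is also the weakness: nothing in the distance says *who* is speaking.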




Embodiment Construction

[0054] Embodiments of the technical solution of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solution more clearly. They are examples only and do not limit the scope of protection of the present invention.

[0055] It should be noted that, unless otherwise specified, technical or scientific terms used in this application have the ordinary meaning understood by those skilled in the art to which the present invention belongs.

[0056] Figure 1 shows a flow chart of the machine learning-based multi-person dialogue audio role recognition method provided by the first embodiment of the present invention. The method of this embodiment specifically includes the following steps: the labeled voice data is trained u...
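The first step trains a UBM-GMM (universal background model) on voice data. The embodiment text is truncated here, so the sketch below only illustrates the general technique: fitting a Gaussian mixture to pooled feature frames with expectation-maximisation (EM). Several points are assumptions for illustration: 1-D scalar features stand in for real multi-dimensional acoustic features such as MFCC vectors; initialisation uses quantiles; the iteration count is fixed; and `train_ubm` is a hypothetical name. Plain EM ignores the labels. How the patent uses them at this step is not stated in the visible text.

```python
import math
from typing import List, Tuple

GMM = Tuple[List[float], List[float], List[float]]  # (weights, means, variances)


def _gauss(x: float, mean: float, var: float) -> float:
    """1-D Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def train_ubm(frames: List[float], k: int, iters: int = 50, var_floor: float = 1e-3) -> GMM:
    """Fit a k-component 1-D GMM to pooled frames with expectation-maximisation."""
    n = len(frames)
    ordered = sorted(frames)
    # Deterministic initialisation: means at evenly spaced quantiles, shared global variance.
    means = [ordered[int((c + 0.5) * n / k)] for c in range(k)]
    mu = sum(frames) / n
    variances = [max(sum((x - mu) ** 2 for x in frames) / n, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability (responsibility) of each component for each frame.
        resp = []
        for x in frames:
            p = [w * _gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow for outlying frames
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-10
            weights[c] = nc / n
            means[c] = sum(r[c] * x for r, x in zip(resp, frames)) / nc
            variances[c] = max(
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, frames)) / nc, var_floor
            )
    return weights, means, variances
```

On two well-separated groups of frames centred at -5 and 5, `train_ubm(data, k=2)` recovers means near -5 and 5 with weights near 0.5. A practical UBM would use many more components over multi-dimensional features and would normally run in log space.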



Abstract

The invention provides a machine learning-based method for recognizing roles in multi-person dialogue audio. The method includes the following steps: training labeled voice data with a UBM-GMM algorithm to obtain a UBM-GMM model; performing secondary segmentation on the voice data to be recognized and clustering the speech to obtain voice samples with cluster labels; extracting part of the cluster-labeled voice samples, feeding them into the UBM-GMM model, and performing alignment training to obtain an alignment training model; performing identity recognition with the alignment training model to collect the voice clips that share the same identity and classify them together; and outputting the voice data of each person in the dialogue. Because the method intelligently samples the roles in the audio itself to train the role recognition model, it greatly improves the precision of voice segmentation and role restoration and optimizes the model automatically. It thereby solves two problems of conventional methods: low segmentation and recognition precision, and the inability to optimize dynamically.
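The abstract does not say how "alignment training" updates the UBM-GMM from the cluster-labeled samples. One common way to specialise a UBM to a small amount of speaker data is MAP (maximum a posteriori) adaptation of the component means, as in GMM-UBM speaker verification. The sketch below *assumes* that interpretation. It covers only the alignment-training and identity-recognition steps. Further assumptions: 1-D features; a relevance factor of 16 as the default (a commonly used value); and assignment of each clip to the adapted model with the highest average log-likelihood. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

GMM = Tuple[List[float], List[float], List[float]]  # (weights, means, variances), 1-D


def _log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _component_log_probs(x: float, gmm: GMM) -> List[float]:
    w, mu, var = gmm
    return [math.log(wc) + _log_gauss(x, m, v) for wc, m, v in zip(w, mu, var)]


def _log_sum_exp(vals: List[float]) -> float:
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def map_adapt_means(ubm: GMM, samples: List[float], relevance: float = 16.0) -> GMM:
    """MAP-adapt the UBM means toward one cluster's samples; weights and variances are kept."""
    w, mu, var = ubm
    counts = [0.0] * len(w)
    sums = [0.0] * len(w)
    for x in samples:
        logs = _component_log_probs(x, ubm)
        norm = _log_sum_exp(logs)
        for c, lp in enumerate(logs):
            post = math.exp(lp - norm)  # posterior of component c given x under the UBM
            counts[c] += post
            sums[c] += post * x
    new_mu = []
    for c in range(len(w)):
        alpha = counts[c] / (counts[c] + relevance)  # more data -> trust the cluster more
        data_mean = sums[c] / counts[c] if counts[c] > 0 else mu[c]
        new_mu.append(alpha * data_mean + (1.0 - alpha) * mu[c])
    return list(w), new_mu, list(var)


def avg_log_likelihood(clip: List[float], gmm: GMM) -> float:
    return sum(_log_sum_exp(_component_log_probs(x, gmm)) for x in clip) / len(clip)


def group_by_identity(clips: List[List[float]], models: Dict[str, GMM]) -> Dict[str, List[int]]:
    """Assign each clip to the adapted model that scores it highest; collect indices per identity."""
    groups: Dict[str, List[int]] = {name: [] for name in models}
    for i, clip in enumerate(clips):
        best = max(models, key=lambda name: avg_log_likelihood(clip, models[name]))
        groups[best].append(i)
    return groups
```

Usage: adapt one model per cluster, for example `model_a = map_adapt_means(ubm, cluster_a_samples)`. Then call `group_by_identity(clips, {"A": model_a, "B": model_b})` to collect clips that share an identity. Only the means move, so each speaker model stays anchored to the UBM when a cluster contributes few frames.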

Description

Technical field
[0001] The present invention relates to the field of computer technology, and in particular to a machine learning-based multi-person dialogue audio recognition method and system.

Background
[0002] Dialogue audio contains the dialogue of two or more roles. Identifying and extracting what each role says and converting it into a text dialogue is of great significance for the in-depth analysis and application of audio content.

[0003] Existing dialogue segmentation techniques are mainly based on statistical distance methods, such as the Bayesian Information Criterion (BIC) and the Generalized Likelihood Ratio (GLR). The statistical-distance approach has the advantage that it needs no step of training a model on sample data, and directly assumes that there are differences in the distribution of the Gaussian Mixture Model (GMM) of the acoustic model of different people in a short perio...
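The BIC method named above can be sketched as a single hypothesis test at one candidate boundary. It compares "one speaker" (one Gaussian for both windows) with "two speakers" (one Gaussian per window) and penalizes the extra parameters. This is a minimal sketch under stated assumptions: diagonal-covariance Gaussians instead of GMMs, a penalty weight `lam` of 1.0, and hypothetical function names.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector, e.g. an acoustic feature frame


def _log_det_diag_cov(frames: List[Frame]) -> float:
    """Log-determinant of the maximum-likelihood diagonal covariance of `frames`."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for j in range(dim):
        mean = sum(f[j] for f in frames) / n
        var = sum((f[j] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-8))  # floor guards against constant dimensions
    return total


def delta_bic(left: List[Frame], right: List[Frame], lam: float = 1.0) -> float:
    """Delta-BIC of 'two speakers' (one Gaussian per window) versus 'one speaker' (shared Gaussian).

    A positive value favours placing a speaker change between `left` and `right`.
    """
    both = list(left) + list(right)
    n, n1, n2 = len(both), len(left), len(right)
    dim = len(both[0])
    # Log-likelihood gain of modelling the two windows separately (constants cancel).
    gain = 0.5 * (n * _log_det_diag_cov(both)
                  - n1 * _log_det_diag_cov(left)
                  - n2 * _log_det_diag_cov(right))
    # The two-Gaussian model has 2 * dim extra free parameters (a mean and a variance per dimension).
    penalty = 0.5 * lam * 2 * dim * math.log(n)
    return gain - penalty
```

Identical windows yield a gain of zero, so the penalty makes the result negative (no change). Windows whose feature statistics differ strongly yield a large positive value. As the background notes, this decision uses no trained speaker model, so it segments but does not identify roles.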


Application Information

Patent type & authority: Application (China)
IPC (8): G10L15/04, G10L15/06, G10L15/18, G10L17/04, G06K9/62
CPC: G10L15/04, G10L15/063, G10L15/18, G10L17/04, G06F18/23
Inventors: 谢兵 (Xie Bing), 龚永源 (Gong Yongyuan)
Owner: 广州心语心伴互联网信息服务有限公司 (Guangzhou Xinyu Xinban Internet Information Service Co., Ltd.)