Acoustic model training method and device, computer equipment, and storage medium

Acoustic model training technology, applied in speech analysis, speech recognition, instruments, and related fields, that addresses the time and cost of training acoustic models and achieves cost savings and improved performance.

Active Publication Date: 2017-10-10
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present invention provide an acoustic model training method and device, computer equipment, and a storage medium, to address the prior-art problem that training acoustic models is time-consuming and costly.


Examples

Embodiment 1

[0025] Figure 1 is a flowchart of the acoustic model training method provided by Embodiment 1 of the present invention. This embodiment applies to obtaining an acoustic model through training. The method can be executed by an acoustic model training device, which can be implemented in software and/or hardware. As shown in Figure 1, the method specifically includes:

[0026] S101. Acquire supervised speech data and unsupervised speech data, wherein the supervised speech data is speech data with manual annotation, and the unsupervised speech data is speech data with machine annotation.

[0027] Specifically, the supervised speech data may be speech data manually annotated in advance, manually annotated speech data purchased in advance, or both. The unsupervised speech data can be obtained from online Internet products, for example from anonymous user traffic of Baidu Search or Baidu Input Method. These unsupervised speech data have not been manua...
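
The distinction drawn in S101 can be pictured as a tag on each utterance that records who produced its annotation. The following is a minimal sketch, not the patent's implementation: the `Utterance` class, its field names, and the sample transcripts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    audio_id: str      # hypothetical identifier for the recording
    transcript: str    # annotation text
    label_source: str  # "manual" (supervised) or "machine" (unsupervised)


def split_by_label_source(
    utterances: List[Utterance],
) -> Tuple[List[Utterance], List[Utterance]]:
    """Return (supervised, unsupervised) lists based on who produced the label."""
    supervised = [u for u in utterances if u.label_source == "manual"]
    unsupervised = [u for u in utterances if u.label_source == "machine"]
    return supervised, unsupervised


# Toy corpus: one manually annotated utterance, two machine-annotated ones.
corpus = [
    Utterance("utt-001", "turn on the light", "manual"),
    Utterance("utt-002", "weather today", "machine"),
    Utterance("utt-003", "play some music", "machine"),
]
supervised, unsupervised = split_by_label_source(corpus)
```

Keeping the label source on each record lets later steps treat the two kinds of data differently without maintaining separate storage.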

Embodiment 2

[0043] Figure 2 is a flowchart of the acoustic model training method provided by Embodiment 2 of the present invention. Embodiment 2 is a further optimization of Embodiment 1. As shown in Figure 2, the method includes:

[0044] S201. Acquire supervised speech data and unsupervised speech data, wherein the supervised speech data is speech data with manual annotation, and the unsupervised speech data is speech data with machine annotation.

[0045] S202. Filter the unsupervised speech data by means of confidence filtering.

[0046] Unsupervised speech data obtained directly from online Internet products usually contains low-quality data, such as incomplete speech data, noisy and unclear speech data, or commonplace speech data of low utilization value. Confidence filtering means may include, for example, user profiles, text features, or acoustic likelihoods. Through confidence filtering means, relatively high-quality speech data is ...
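
One simple way to picture confidence filtering is a threshold on a per-utterance score. The sketch below assumes each machine-annotated utterance already carries a single `confidence` value in [0, 1]. How that value would be derived from user profiles, text features, or acoustic likelihoods is not specified here, and the class name, field names, and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MachineLabeledUtterance:
    audio_id: str      # hypothetical identifier
    transcript: str    # machine-generated annotation
    confidence: float  # assumed score in [0, 1]; its derivation is not shown


def filter_by_confidence(
    utterances: List[MachineLabeledUtterance], threshold: float = 0.8
) -> List[MachineLabeledUtterance]:
    """Keep only utterances whose confidence is at least `threshold`."""
    return [u for u in utterances if u.confidence >= threshold]


candidates = [
    MachineLabeledUtterance("utt-101", "navigate home", 0.95),
    MachineLabeledUtterance("utt-102", "uh", 0.40),
    MachineLabeledUtterance("utt-103", "call mom", 0.80),
]
kept = filter_by_confidence(candidates, threshold=0.8)
```

Raising the threshold trades data volume for annotation quality, so in practice it would be tuned rather than fixed.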

Embodiment 3

[0052] Figure 3 is a schematic structural diagram of the acoustic model training device in Embodiment 3 of the present invention. As shown in Figure 3, the acoustic model training device 3 includes:

[0053] A data acquisition module 310, configured to acquire supervised speech data and unsupervised speech data, wherein the supervised speech data is speech data with manual annotation, and the unsupervised speech data is speech data with machine annotation;

[0054] A feature extraction module 320, configured to extract speech features from the supervised speech data and the unsupervised speech data;

[0055] A model training module 330, configured to use a deep learning network structure to perform multi-task learning, comprising a supervised learning task and an unsupervised learning task, on the speech features of the supervised speech data and the unsupervised speech data, so as to train and obtain an acoustic model.
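
The multi-task idea behind the model training module can be sketched with a tiny feed-forward network. This sketch assumes a common multi-task layout, one hidden layer shared by both tasks and a separate output layer per task; that layout is an assumption for illustration, not a structure confirmed by the text above. The class name, weights, and two-dimensional feature vector are hypothetical toy values.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(v: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(v)
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


class MultiTaskAcousticNet:
    """Assumed layout: one shared hidden layer, one output head per task."""

    def __init__(self, shared: Matrix, sup_head: Matrix, unsup_head: Matrix):
        self.shared = shared
        self.heads: Dict[str, Matrix] = {
            "supervised": sup_head,
            "unsupervised": unsup_head,
        }

    def forward(self, features: Vector, task: str) -> Vector:
        # Both tasks pass through the same hidden layer ...
        hidden = relu(matvec(self.shared, features))
        # ... and differ only in the output head that is applied.
        return softmax(matvec(self.heads[task], hidden))


net = MultiTaskAcousticNet(
    shared=[[0.5, -0.2], [0.1, 0.3]],
    sup_head=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    unsup_head=[[0.2, 0.1], [0.4, -0.3], [0.0, 0.6]],
)
frame = [1.0, 2.0]  # toy two-dimensional speech feature vector
sup_posteriors = net.forward(frame, "supervised")
unsup_posteriors = net.forward(frame, "unsupervised")
```

Because the hidden layer is shared, gradients from both tasks would update the same parameters during training, which is how machine-annotated data can influence the representation used for the manually annotated task.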

[0056] In a preferred embodiment, the network structure of the deep learning in...


Abstract

An embodiment of the invention discloses an acoustic model training method and device, computer equipment, and a storage medium. The method comprises: acquiring supervised speech data and unsupervised speech data, wherein the supervised speech data is speech data with manual annotation and the unsupervised speech data is speech data with machine annotation; extracting speech features from the supervised and unsupervised speech data; and using a deep learning network structure to perform multi-task learning, comprising a supervised learning task and an unsupervised learning task, on the speech features of the supervised and unsupervised speech data, thereby training an acoustic model. Semi-supervised acoustic model training based on multi-task learning reduces the cost of the manually annotated speech data needed to train an acoustic model, so speech recognition performance can continue to improve without requiring expensive manually annotated speech data.
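
A multi-task objective of this kind is often written as a weighted sum of per-task losses. The sketch below is an assumption for illustration, not the patent's stated objective: it uses mean cross-entropy for each task and a hypothetical `unsup_weight` factor to scale the contribution of the machine-annotated data.

```python
import math
from typing import List, Sequence, Tuple

# Each example pairs predicted class probabilities with the index of its label.
Example = Tuple[Sequence[float], int]


def mean_cross_entropy(batch: List[Example]) -> float:
    """Average negative log-probability assigned to the labeled class."""
    return sum(-math.log(probs[label]) for probs, label in batch) / len(batch)


def multitask_loss(
    sup_batch: List[Example],
    unsup_batch: List[Example],
    unsup_weight: float = 0.5,  # hypothetical weighting, not from the source
) -> float:
    """Supervised loss plus a down-weighted loss on machine-annotated data."""
    return mean_cross_entropy(sup_batch) + unsup_weight * mean_cross_entropy(unsup_batch)


# Toy batches with uniform two-class predictions.
sup_batch = [([0.5, 0.5], 0), ([0.5, 0.5], 1)]
unsup_batch = [([0.5, 0.5], 0)]
loss = multitask_loss(sup_batch, unsup_batch, unsup_weight=0.5)
```

Down-weighting the machine-annotated term is one way to limit the effect of annotation errors; other weightings are equally plausible.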

Description

Technical field

[0001] Embodiments of the present invention relate to speech recognition technology, and in particular to an acoustic model training method and device, computer equipment, and a storage medium.

Background

[0002] In recent years, speech technology has begun to change the way we live and work. Speech recognition takes speech as its research object: through speech signal processing and pattern recognition, a machine automatically recognizes and understands human spoken language. It is a convenient mode of human-machine interaction that is now widely used in the mobile Internet and other fields, and it involves disciplines such as signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, and artificial intelligence. Speech recognition technology allows a machine to convert speech signals into corresponding text or commands through a process of recognition and understanding. ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G10L15/06
CPC: G10L15/063, G10L2015/0631, G10L15/16
Inventor: 黄斌, 彭一平, 李先刚
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD