Sichuan dialect identification method and apparatus, acoustic model training method and apparatus, and equipment

An acoustic model training and Sichuan dialect identification technology, applied in speech recognition, speech analysis, instruments, etc. It addresses the problems of a high recognition error rate and low recognition efficiency, achieving a good recognition effect and reducing training time.

Inactive Publication Date: 2018-11-16
SICHUAN UNIV


Problems solved by technology

[0003] The acoustic model training method, device, equipment and medium provided by the embodiments of the present invention solve the technical problem of the high recognition error rate of models in the prior art.
[0004] The Sichuan dialect recognition method, device, equipment and medium provided by the embodiments of the present invention solve the technical problem of low recognition efficiency in dialect recognition in the prior art.



Examples


Example 1

[0029] Refer to figure 1, which is a flow chart of the acoustic model training method provided by an embodiment of the present invention. The specific process shown in figure 1 is described in detail below.

[0030] Step S101, collecting voice data of Sichuan dialect.

[0031] The Sichuan dialect speech data may be collected through a microphone, or extracted from a video by a software tool. For example, the user inputs a piece of speech through the microphone of a mobile phone, or the speech played in a video is captured. No specific limitation is made here.
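The microphone or file capture described above can be sketched with Python's standard-library `wave` module. This is an illustrative stand-in, not the patent's implementation; the file name and the sine-tone generator (standing in for real microphone input) are invented for the demo:

```python
import math
import struct
import wave

def write_demo_wav(path, sr=16000, dur=0.1, freq=440.0):
    """Write a short mono 16-bit sine tone as a stand-in for captured speech."""
    n = int(sr * dur)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(sr)
        for i in range(n):
            s = int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / sr))
            w.writeframes(struct.pack("<h", s))

def load_wav(path):
    """Read a mono 16-bit wav file into a list of samples plus its sample rate."""
    with wave.open(path, "rb") as w:
        sr = w.getframerate()
        raw = w.readframes(w.getnframes())
        samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return samples, sr

write_demo_wav("demo.wav")
samples, sr = load_wav("demo.wav")
```

A real system would feed `samples` and `sr` into the feature extraction step of the next paragraphs.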

[0032] The speech data consists of utterances spoken in the Sichuan dialect.

[0033] The Sichuan dialect speech data may be one frame or multiple frames in length; preferably, it is multiple frames.

[0034] In this embodiment, the Sichuan dialect speech data may be speech signals in wav, mp3, or other formats.
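The per-frame view of speech data used throughout these steps can be illustrated with a standard framing-and-windowing pass. The 25 ms frame length and 10 ms hop are common defaults assumed here, not values stated in the patent:

```python
import numpy as np

def frame_signal(signal, sr, frame_ms=25, hop_ms=10):
    """Slice a 1-D signal into overlapping frames and apply a Hamming window."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    # Build an index matrix: row k selects samples [k*hop, k*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = np.asarray(signal, dtype=float)[idx]
    return frames * np.hamming(frame_len)

sig = np.random.randn(16000)          # one second of dummy audio at 16 kHz
frames = frame_signal(sig, sr=16000)  # shape: (n_frames, samples_per_frame)
```

Each row of `frames` is one frame; features such as MFCCs would then be computed per row, which is what "each frame of speech features" refers to later in the text.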

Example 2

[0074] Corresponding to the acoustic model training method of the first embodiment, figure 3 shows an acoustic model training device that applies the method of the first embodiment in one-to-one correspondence. As shown in figure 3, the acoustic model training device 400 includes an acquisition module 410, a feature extraction module 420, a first training module 430, and a second training module 440. The functions implemented by the acquisition module 410, the feature extraction module 420, the first training module 430, and the second training module 440 correspond one-to-one with the corresponding steps in the first embodiment.

[0075] The acquisition module 410 is used for collecting speech data of the Sichuan dialect.

[0076] The feature extraction module 420 is configured to perform feature extraction on the speech data of the Sichuan dialect to obtain speech features.

[0077] As an implementation manner, the feat...
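The module layout of device 400 described in [0074]-[0076] can be sketched as a plain Python class. The attribute names follow the reference numerals in the text, but the internals are hypothetical placeholders, not the patented implementation:

```python
class AcousticModelTrainingDevice:
    """Sketch of device 400: four modules wired in the order of Example 1."""

    def __init__(self, acquire, extract, train_gmm_hmm, train_lstm):
        self.acquisition_module = acquire            # module 410
        self.feature_extraction_module = extract     # module 420
        self.first_training_module = train_gmm_hmm   # module 430
        self.second_training_module = train_lstm     # module 440

    def run(self):
        data = self.acquisition_module()
        feats = self.feature_extraction_module(data)
        labeled = self.first_training_module(feats)   # per-frame labels
        return self.second_training_module(labeled)   # target acoustic model

# Toy wiring with stub callables, just to show the data flow:
device = AcousticModelTrainingDevice(
    acquire=lambda: [0.0, 0.1],
    extract=lambda d: [x * 10 for x in d],
    train_gmm_hmm=lambda f: [(x, int(x > 0.5)) for x in f],
    train_lstm=lambda lf: {"n_labeled": len(lf)},
)
model = device.run()
```

Each stub stands in for the corresponding module's real behavior; only the one-to-one correspondence with the steps of the first embodiment is taken from the text.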

Example 3

[0084] Refer to Figure 4, which is a flow chart of the Sichuan dialect recognition method provided by an embodiment of the present invention. The specific process shown in Figure 4 is described in detail below.

[0085] Step S201, acquiring voice data input by a user.

[0086] The voice data input by the user may be collected through a microphone. For example, a user inputs a piece of voice data through the microphone of a mobile phone.

[0087] Wherein, the length of the voice data may be multiple frames or one frame.

[0088] In this embodiment, the voice data may be voice signals in wav, mp3 or other formats. Here, no specific limitation is made.

[0089] Step S202, using the target acoustic model obtained by the above acoustic model training method, together with a preset Sichuan dialect dictionary and language model, to recognize the voice data and obtain a recognition result.

[0090] Wherein, the Sichuan dialect dictionary is generated in advance based on a large amoun...
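A toy sketch of how a pronunciation dictionary enters decoding: collapse the acoustic model's per-frame phone labels into a phone sequence, then segment that sequence into words via dictionary lookup. The lexicon entries are invented, and a real decoder would combine acoustic scores with the language model rather than do a greedy lookup:

```python
from itertools import groupby

lexicon = {("n", "i"): "你", ("h", "ao"): "好"}   # hypothetical dictionary entries

def collapse(frame_labels):
    """Merge runs of identical per-frame labels into a phone sequence."""
    return tuple(k for k, _ in groupby(frame_labels))

def lookup(phones, lexicon):
    """Greedy longest-match segmentation of the phone sequence into words."""
    out, i = [], 0
    while i < len(phones):
        for j in range(len(phones), i, -1):
            if phones[i:j] in lexicon:
                out.append(lexicon[phones[i:j]])
                i = j
                break
        else:
            i += 1  # skip a phone that matches no dictionary entry
    return "".join(out)

phones = collapse(["n", "n", "i", "i", "i", "h", "h", "ao", "ao"])
result = lookup(phones, lexicon)  # → "你好"
```

The language model mentioned in the text would then rescore or constrain candidate word sequences; that step is omitted here.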



Abstract

The invention belongs to the field of speech recognition technology and provides a Sichuan dialect identification method and apparatus, an acoustic model training method and apparatus, and equipment. The acoustic model training method comprises: collecting Sichuan dialect speech data; performing feature extraction on the Sichuan dialect speech data to obtain speech features; training on the speech features with a hidden Markov model-Gaussian mixture model, acquiring a classification label for each frame of speech features, and generating to-be-processed speech features carrying the classification labels; and training on the to-be-processed speech features with a deep delayed LSTM model to obtain a target acoustic model. The acoustic model training method provided by the invention effectively saves the time needed for acoustic model training, improves training efficiency, and improves recognition efficiency and recognition accuracy.
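The key idea of the pipeline is that the HMM-GMM stage supplies a classification label for every feature frame before the LSTM is trained. The sketch below illustrates only that labeling idea, using a toy nearest-centroid rule in place of real HMM-GMM alignment; the feature dimensions, class count, and data are all invented for the demo:

```python
import numpy as np

# Two synthetic clusters of 13-dimensional "feature frames" (MFCC-sized).
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (50, 13)),
                   rng.normal(5, 1, (50, 13))])

# Stand-in for the trained GMM-HMM: two assumed class centroids.
centroids = np.array([np.zeros(13), np.full(13, 5.0)])

# Assign each frame the label of its nearest centroid.
dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)

# "To-be-processed speech features with classification labels": one
# (feature_vector, label) pair per frame, ready for LSTM training.
labeled_frames = list(zip(feats, labels))
```

In the patented method the labels would come from the HMM-GMM stage rather than this toy rule, and the labeled frames would then train the deep delayed LSTM.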

Description

technical field

[0001] The present invention relates to the technical field of speech recognition, and in particular to a Sichuan dialect recognition method and apparatus, an acoustic model training method and apparatus, and equipment.

Background technique

[0002] The speech recognition task is to map an acoustic signal containing natural language pronunciation onto the speaker's word sequence. From the 1980s until around 2009-2012, state-of-the-art speech recognition systems combined hidden Markov models (HMM) with Gaussian mixture models (GMM), but their performance did not reach a commercial level. In 2009, Hinton introduced deep neural networks (DNN) to speech recognition researchers, and in 2010 a major breakthrough occurred: through the joint efforts of Hinton's team and the Microsoft, Google, and IBM research teams, replacing the GMM with a DNN reduced the relative recognition error rate by nearly 30%. Subsequent speech recognition research has been almost entirely based on deep learning, mostly imp...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L15/14; G10L15/16; G10L15/08; G10L15/06; G10L15/26; G10L15/02
CPC: G10L15/02; G10L15/063; G10L15/08; G10L15/144; G10L15/16; G10L15/26; G10L2015/0633
Inventors: 张蕾, 应汪洋, 章毅, 郭际香, 陈媛媛
Owner: SICHUAN UNIV