Spoken language understanding method based on Dirichlet variational auto-encoder and related equipment
Autoencoder and spoken language understanding technology, applied in the computer field. It addresses problems such as data scarcity and the poor robustness and diversity of generated sentences, which degrade spoken language understanding, and it reduces the cost of labeling.
Examples
Embodiment 1
[0033] The first embodiment of the present invention provides a spoken language understanding method based on a Dirichlet variational autoencoder. Referring to Figure 1, the method includes:
[0034] S11. Use the Dirichlet variational autoencoder to sample the training corpus to generate a sampled corpus;
[0035] S12. Perform data enhancement based on the sampled corpus;
[0036] S13. Generate training corpus.
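Steps S11 to S13 can be sketched as a small pipeline. This is only an illustrative outline, not the patented implementation: `build_training_corpus`, `toy_sampler`, and the rule of keeping only previously unseen sentences in S12 are all assumptions, and the sampler is a placeholder for a trained Dirichlet variational autoencoder.

```python
import random
from typing import Callable, List

Sentence = List[str]


def build_training_corpus(
    seed_corpus: List[Sentence],
    sampler: Callable[[Sentence], Sentence],  # hypothetical stand-in for the trained autoencoder
    n_samples: int,
    rng: random.Random,
) -> List[Sentence]:
    # S11: sample from the existing corpus to obtain a sampled corpus.
    sampled = [sampler(rng.choice(seed_corpus)) for _ in range(n_samples)]

    # S12: data enhancement. Assumption: keep only sentences not already present.
    seen = {tuple(s) for s in seed_corpus}
    new_sentences: List[Sentence] = []
    for sentence in sampled:
        key = tuple(sentence)
        if key not in seen:
            seen.add(key)
            new_sentences.append(sentence)

    # S13: the training corpus is the original corpus plus the new sentences.
    return list(seed_corpus) + new_sentences


def toy_sampler(sentence: Sentence) -> Sentence:
    # Placeholder only: a real sampler would decode from the latent space.
    return sentence + ["please"]


seed = [["book", "a", "flight"], ["play", "some", "jazz"]]
corpus = build_training_corpus(seed, toy_sampler, n_samples=5, rng=random.Random(0))
```

Passing the sampler as a callable keeps the pipeline independent of the model, so the toy placeholder can be replaced without touching the S11 to S13 logic.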
[0037] According to the inventor's research, the joint learning model, like most natural language processing tasks, faces a serious data scarcity problem. In addition, the nearly unlimited domain space of spoken language understanding data sets and the labor-intensive labeling work make the sparsity problem even more serious. Traditional data enhancement and generation methods, however, rely on enhancement / generation functions, and the sentences they generate are usually not robust or diverse enough. This will cause the joint learning model to have problems such ...
Embodiment 2
[0083] The second embodiment of the present invention provides a system, which includes:
[0084] A sampled corpus generation module configured to use the Dirichlet variational autoencoder to sample the training corpus to generate a sampled corpus;
[0085] A data enhancement module configured to perform data enhancement according to the sampled corpus;
[0086] A training corpus generation module configured to generate the training corpus.
[0087] In the second embodiment of the present invention, the sampled corpus generation module specifically includes:
[0088] A first sub-module configured to initialize an empty corpus M, given the number of sampled corpora n; and a second sub-module configured to loop over S1121-S1124: S1121, when the number of corpora in M is less than n, select a real word sequence w; S1122, infer the approximate posterior parameters by the inverse gamma distribution function approximation method; S1123, sample through the variational distribution q_φ(w|z); S1...
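The sampling loop can be sketched as follows. This is a minimal sketch under several assumptions. It reads the "inverse gamma distribution function approximation" as the inverse Gamma CDF approximation commonly used with Dirichlet variational autoencoders, F^{-1}(u; α, β) ≈ β^{-1}(u α Γ(α))^{1/α} with β = 1, which is generally regarded as reasonable mainly for small α. The names `encode`, `decode`, `sample_dirichlet_approx`, and `sample_corpus` are hypothetical. Because the source text is truncated after S1123, the final step of decoding the latent sample and appending the result to M is a guess.

```python
import math
import random
from typing import Callable, List, Sequence

Sentence = List[str]


def sample_dirichlet_approx(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Approximate Dirichlet draw via v_k = (u_k * a_k * Gamma(a_k)) ** (1 / a_k)."""
    log_v = []
    for a in alpha:
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
        # Work in log space: log v = (log u + log a + lgamma(a)) / a
        log_v.append((math.log(u) + math.log(a) + math.lgamma(a)) / a)
    top = max(log_v)  # subtract the maximum so exp() cannot underflow to all zeros
    v = [math.exp(x - top) for x in log_v]
    total = sum(v)
    return [x / total for x in v]  # normalise the Gamma draws onto the simplex


def sample_corpus(
    real_corpus: List[Sentence],
    n: int,
    encode: Callable[[Sentence], Sequence[float]],  # hypothetical: w -> posterior parameters
    decode: Callable[[Sequence[float]], Sentence],  # hypothetical: latent z -> sentence
    rng: random.Random,
) -> List[Sentence]:
    corpus_m: List[Sentence] = []            # initialise the empty corpus M
    while len(corpus_m) < n:                 # loop while M holds fewer than n entries
        w = rng.choice(real_corpus)          # S1121: select a real word sequence w
        alpha = encode(w)                    # S1122: infer approximate posterior parameters
        z = sample_dirichlet_approx(alpha, rng)  # S1123: draw a latent sample
        corpus_m.append(decode(z))           # assumption: remaining steps decode and store
    return corpus_m


# Toy stand-ins so the sketch runs; a trained encoder and decoder would replace them.
templates = [["book", "a", "flight"], ["play", "some", "jazz"], ["set", "an", "alarm"]]
toy_encode = lambda w: [0.3 + 0.1 * len(w), 0.5, 0.8]
toy_decode = lambda z: templates[max(range(len(z)), key=z.__getitem__)]

z_example = sample_dirichlet_approx([0.3, 0.5, 0.8], random.Random(1))
sampled_m = sample_corpus(templates, 4, toy_encode, toy_decode, random.Random(0))
```

Normalising in log space is a numerical precaution added for this sketch; it does not change the sampled proportions.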
Embodiment 3
[0093] It should be noted that, based on the same inventive concept as the first and second embodiments above, the third embodiment of the present invention provides a device including a radio frequency (RF) circuit 310, a memory 320, an input unit 330, a display unit 340, an audio circuit 350, a WiFi module 360, a processor 370, a power supply 380, and other components. The memory 320 stores a computer program that can run on the processor 370. When the processor 370 executes the computer program, it implements step S110, step S120, step S130, step S140, and step S150 described in the first embodiment; or step S210, step S220, step S230, step S240, step S250, and step S260 described in the second embodiment; or step S301, step S302, step S303, and step S304 described in the third embodiment.
[0094] In the specific implementation process, when the processor executes the computer program, any one of the implementation modes in t...