Spoken language understanding method based on Dirichlet variational auto-encoder and related equipment

An autoencoder-based spoken language understanding technique, applied in the computer field, that addresses data scarcity and the poor robustness and diversity of generated sentences, both of which degrade spoken language understanding, and that reduces the cost of labeling.

Active Publication Date: 2020-09-29

JIANGHAN UNIVERSITY

Cites: 5; Cited by: 2

Problems solved by technology

[0004] However, the joint learning model, like most natural language processing tasks, faces a serious data scarcity problem.

In addition, the near-infinite domain space and labor-intensive labeling of spoken language understanding datasets make the sparsity problem even more serious.

However, traditional data enhancement and generation methods rely on enhancement/generation functions, and the sentences they generate usually lack robustness and diversity.

This leads to overfitting and poor generalization in the joint learning model, which degrades spoken language understanding. This is the key problem to be solved by the present invention.

Examples

Embodiment 1

[0033] The first embodiment of the present invention provides a spoken language understanding method based on a Dirichlet variational autoencoder. Referring to Figure 1, the method includes:

[0034] S11. Use the Dirichlet variational autoencoder to sample the training corpus to generate a sampled corpus;

[0035] S12. Perform data enhancement based on the sampled corpus;

[0036] S13. Generate training corpus.
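
Steps S11-S13 can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the patented implementation: `build_training_corpus` and `toy_sampler` are hypothetical names, the sampler is a word-substitution stand-in for the Dirichlet variational autoencoder, and "data enhancement" is assumed here to mean keeping the sampled sentences that are not already in the corpus.

```python
import random
from typing import Callable, List


def toy_sampler(corpus: List[str], n: int, seed: int = 0) -> List[str]:
    """Hypothetical stand-in for the Dirichlet VAE sampler (S11).

    It replaces one token of a randomly chosen sentence with a token
    from the corpus vocabulary. A real system would decode from a
    learned latent space instead.
    """
    rng = random.Random(seed)
    vocab = sorted({tok for sent in corpus for tok in sent.split()})
    sampled = []
    for _ in range(n):
        tokens = rng.choice(corpus).split()
        tokens[rng.randrange(len(tokens))] = rng.choice(vocab)
        sampled.append(" ".join(tokens))
    return sampled


def build_training_corpus(
    labeled: List[str],
    sampler: Callable[[List[str], int], List[str]],
    n_samples: int,
) -> List[str]:
    """Run S11-S13 and return the enlarged training corpus."""
    sampled = sampler(labeled, n_samples)  # S11: generate the sampled corpus
    seen = set(labeled)
    new_sentences = []
    for sentence in sampled:               # S12: assumed enhancement step,
        if sentence not in seen:           # keep only unseen sentences
            seen.add(sentence)
            new_sentences.append(sentence)
    return labeled + new_sentences         # S13: generate the training corpus
```

Passing the sampler as a callable keeps the pipeline independent of how the sampled corpus is produced, so a trained model could replace `toy_sampler` without changing the other steps.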

[0037] According to the inventor's research, the joint learning model, like most natural language processing tasks, faces a serious data scarcity problem. In addition, the nearly unlimited domain space of spoken language understanding datasets and their labor-intensive labeling make the sparsity problem even more serious. However, traditional data enhancement and generation methods rely on enhancement/generation functions, and the generated sentences are usually less robust and diverse. This will cause the joint learning model to have problems such ...

Embodiment 2

[0083] The second embodiment of the present invention provides a system, which includes:

[0084] A sampled corpus generation module configured to use the Dirichlet variational autoencoder to sample the training corpus to generate a sampled corpus;

[0085] A data enhancement module configured to perform data enhancement according to the sampled corpus;

[0086] The training corpus generation module is configured to generate training corpus.

[0087] In the second embodiment of the present invention, the sampled corpus generation module specifically includes:

[0088] A first sub-module configured to initialize an empty corpus M, given the number of sampled corpora n; and a second sub-module configured to loop over S1121-S1124 while the number of corpora in M is less than n:

S1121. Select a real word sequence w;

S1122. Infer the approximate posterior parameters by the inverse gamma distribution function approximation method;

S1123. Sample through the variational distribution q_φ(w|z); S1...
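
The sampling loop can be sketched as follows. This is a hedged illustration rather than the claimed procedure. It assumes the inverse Gamma CDF approximation x ≈ (u·α·Γ(α))^(1/α) / β with u drawn uniformly from (0, 1), which is commonly used for Dirichlet variational autoencoders, and obtains a Dirichlet sample by normalizing the Gamma draws. The `encode` and `decode` callables are hypothetical placeholders for the trained networks. Because the source text is truncated after S1123, the final step (decoding the latent sample and adding the result to M) is an assumption.

```python
import math
import random
from typing import Callable, List, Sequence


def approx_gamma_sample(alpha: float, rng: random.Random, beta: float = 1.0) -> float:
    """Approximate inverse-CDF draw from Gamma(alpha, rate=beta).

    Uses x ~= (u * alpha * Gamma(alpha)) ** (1 / alpha) / beta, computed
    in log space. The approximation is intended for small alpha.
    """
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
    log_x = (math.log(u) + math.log(alpha) + math.lgamma(alpha)) / alpha
    log_x -= math.log(beta)
    return max(math.exp(log_x), 1e-300)  # floor guards against underflow to 0


def sample_dirichlet(alphas: Sequence[float], rng: random.Random) -> List[float]:
    """Normalize independent Gamma draws to get a point on the simplex."""
    draws = [approx_gamma_sample(a, rng) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_corpus(
    corpus: Sequence[str],
    n: int,
    encode: Callable[[str], Sequence[float]],
    decode: Callable[[Sequence[float]], str],
    seed: int = 0,
) -> List[str]:
    """Fill an initially empty corpus M until it holds n sampled items."""
    rng = random.Random(seed)
    M: List[str] = []                      # first sub-module: empty corpus M
    while len(M) < n:                      # second sub-module: loop while |M| < n
        w = rng.choice(corpus)             # S1121: select a real word sequence w
        alphas = encode(w)                 # S1122: approximate posterior parameters
        z = sample_dirichlet(alphas, rng)  # S1123: draw a latent sample
        M.append(decode(z))                # assumed: remaining steps are truncated
    return M


def toy_encode(w: str) -> List[float]:
    """Hypothetical encoder: concentration parameters from sentence length."""
    return [0.5 + 0.1 * len(w.split()), 0.8, 0.3]


def toy_decode(z: Sequence[float]) -> str:
    """Hypothetical decoder: pick a template by the largest latent weight."""
    templates = ["book a flight", "play some music", "set an alarm"]
    return templates[max(range(len(z)), key=lambda i: z[i])]
```

A call such as `sample_corpus(["book a flight to boston"], 4, toy_encode, toy_decode)` returns four generated strings. Working in log space avoids evaluating Γ(α) directly, which overflows for large α.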

Embodiment 3

[0093] It should be noted that, based on the same inventive concept as the first and second embodiments above, the third embodiment of the present invention provides a device including a radio frequency (RF) circuit 310, a memory 320, an input unit 330, a display unit 340, an audio circuit 350, a WiFi module 360, a processor 370, a power supply 380, and other components. The memory 320 stores a computer program that can run on the processor 370. When executing the computer program, the processor 370 implements steps S110, S120, S130, S140, and S150 described in the first embodiment; or steps S210, S220, S230, S240, S250, and S260 described in the second embodiment; or steps S301, S302, S303, and S304 described in the third embodiment.

[0094] In the specific implementation process, when the processor executes the computer program, any one of the implementation modes in t...


Abstract

The invention discloses a spoken language understanding method based on a Dirichlet variational auto-encoder, belonging to the technical field of computers. The method comprises the following steps: sampling the training corpus with the Dirichlet variational auto-encoder to generate a sampled corpus; performing data enhancement according to the sampled corpus; and generating the training corpus. A semi-supervised learning method based on the Dirichlet variational auto-encoder is introduced into the modeling process of spoken language understanding, so that latent semantic features of the original data are learned and high-quality new data are generated, which reduces the labeling cost and improves the spoken language understanding model.

Description

Technical field

[0001] The present invention relates to the field of computer technology, and in particular to a spoken language understanding method based on a Dirichlet variational autoencoder and related equipment.

Background technique

[0002] A task-based dialogue system is a human-computer interaction system that helps users complete specific tasks through multiple rounds of dialogue. It is a research direction that has received widespread attention and has broad application prospects. Many research institutions and technology companies now work on task-based dialogue systems, such as Alibaba's Tmall Genie, Apple's Siri, and Microsoft's Cortana (Xiaona). Spoken language understanding is a core technology for building task-based dialogue systems. It parses the natural language input by the user into a structured semantic expression that the computer can understand. This expression contains the semantic unit that best represents the user's inten...


Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L15/06, G10L15/18, G06F40/30, G06F40/263

CPC: G10L15/1822, G10L15/063, Y02D10/00

Inventor: 高望, 朱珣, 邓宏涛, 王煜炜, 曾凡琮

Owner: JIANGHAN UNIVERSITY
