Method and system for extending spoken language dialogue system corpora

A dialogue system and corpus technology, applied in the creation of semantic tools, natural language data processing, special data processing applications, etc. It can solve problems such as a low recognition rate, failure to parse user sentences, and the semantic parser failing to find a matching rule, with the effects of reducing maintenance costs, improving robustness, and enhancing usability.

Active Publication Date: 2017-05-10
SAMSUNG ELECTRONICS CHINA R&D CENT +1
Cites: 5; cited by: 7

AI Technical Summary

Problems solved by technology

[0010] In practical applications, the biggest problem with the natural language understanding unit is that, because of the complexity of language, the system cannot cover all grammars, so user sentences that are misrecognized or semantically incomprehensible still appear frequently.
In particular, differences in dialects, regional language habits, and individual user habits make it difficult to build the corpus of a spoken dialogue system; even a very large corpus often cannot correctly parse everything users say.
Therefore, even when the speech recognition system recognizes the user's sentence, the semantic parser in the natural language understanding unit may fail to find a rule that parses it, resulting in a low overall recognition rate.
Developers therefore have to spend a great deal of time updating and maintaining the corpus to keep the recognition rate high.

Embodiment Construction

[0037] In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be further described in detail below with reference to the accompanying drawings and examples.

[0038] To address the problems of the prior art, the present invention discloses a method for expanding the corpus of a spoken language parser beyond its original corpus. The method collects user sentences that cannot be parsed and analyzes them again to obtain candidate results that may be correct. The user selects the candidate sentence that matches his or her intended meaning, and it is stored in a secondary corpus created exclusively for that user (that is, the corpus that the present invention adds beyond the original corpus). This extends the main corpus of the system (that is, the original corpus), enhances the robustness of semantic analysis, and improves the accuracy and coverage of the corpus. When the analysis by the semantic analys...
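The collect, re-analyze, select, and store loop described in [0038] can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the class `CorpusExtender`, the toy `MAIN_CORPUS`, the exact-match lookup standing in for the semantic parser, and the use of `difflib.get_close_matches` as the "secondary analysis" are all hypothetical choices, since this excerpt does not say how candidates are produced. Keying each secondary-corpus entry by the user's original wording is likewise an assumption about the storage format.

```python
from __future__ import annotations

import difflib

# Hypothetical original (main) corpus: sentence -> semantic frame.
MAIN_CORPUS = {
    "play some music": "intent=play_music",
    "what is the weather today": "intent=query_weather;date=today",
    "set an alarm for seven": "intent=set_alarm;time=07:00",
}


class CorpusExtender:
    """Hypothetical sketch: collect -> re-analyze -> user selects -> store per user."""

    def __init__(self, main_corpus: dict[str, str]) -> None:
        self.main_corpus = main_corpus
        self.secondary_corpus: dict[str, dict[str, str]] = {}  # user id -> own entries
        self.unparsed: list[tuple[str, str]] = []              # (user id, sentence)

    def parse(self, user: str, sentence: str) -> str | None:
        # Exact lookup stands in for the real semantic parser (assumption).
        personal = self.secondary_corpus.get(user, {})
        frame = personal.get(sentence) or self.main_corpus.get(sentence)
        if frame is None:
            # Collect the sentence that could not be parsed.
            self.unparsed.append((user, sentence))
        return frame

    def candidates(self, sentence: str, n: int = 3) -> list[str]:
        # "Secondary analysis", approximated here by fuzzy string matching
        # against the main corpus; most similar candidate first.
        return difflib.get_close_matches(sentence, list(self.main_corpus), n=n, cutoff=0.5)

    def select(self, user: str, sentence: str, chosen: str) -> None:
        # Store the user's confirmed interpretation in that user's secondary corpus only.
        self.secondary_corpus.setdefault(user, {})[sentence] = self.main_corpus[chosen]


if __name__ == "__main__":
    ext = CorpusExtender(MAIN_CORPUS)
    print(ext.parse("alice", "whats the weather today"))  # None: not in any corpus
    options = ext.candidates("whats the weather today")
    ext.select("alice", "whats the weather today", options[0])
    print(ext.parse("alice", "whats the weather today"))  # now parsed for alice
    print(ext.parse("bob", "whats the weather today"))    # still None for bob
```

Keeping the extension per user mirrors the "exclusive" secondary corpus in [0038]: one user's confirmed phrasings do not change how other users' sentences are parsed.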

Abstract

The invention discloses a method for extending spoken language dialogue system corpora. The method includes the following steps: secondary semantic analysis is conducted on a sentence that cannot be analyzed, to obtain candidate analysis results; if the user selects a candidate from these results, a mapping rule is formed between the sentence and the selected candidate, the generated rule is added to a user-specific preprocessing rule library, and the selected candidate sentence and its semantic information are stored in a user-specific auxiliary corpus; when a user sentence is later analyzed, semantic analysis is conducted using the user-specific preprocessing rule library and a rule-assisted main corpus generated on the basis of the auxiliary corpus. The invention further discloses a corresponding system. The method and system improve the robustness of the corpora, reduce the cost of maintaining them, provide a correction function for the spoken language system, and enhance its usability.
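The abstract's two per-user resources and its parse-time flow (rewrite with the user's preprocessing rules, then analyze against the main corpus supplemented by the user's auxiliary corpus) might look like the following minimal Python sketch. The names `UserProfile`, `confirm_candidate`, and `analyse`, the plain-dictionary storage, the exact-match lookup, and the lookup order are all assumptions; the abstract does not specify how the "rule-assisted main corpus" is actually built.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical per-user resources named in the abstract."""
    # Mapping rules: user's original sentence -> candidate sentence the user chose.
    preprocessing_rules: dict[str, str] = field(default_factory=dict)
    # Auxiliary corpus: chosen candidate sentence -> its semantic information.
    auxiliary_corpus: dict[str, str] = field(default_factory=dict)


def confirm_candidate(profile: UserProfile, sentence: str, candidate: str, semantics: str) -> None:
    # Form a mapping rule between the unparsed sentence and the selected candidate,
    # and keep the candidate with its semantics in the user's auxiliary corpus.
    profile.preprocessing_rules[sentence] = candidate
    profile.auxiliary_corpus[candidate] = semantics


def analyse(profile: UserProfile, main_corpus: dict[str, str], sentence: str) -> str | None:
    # 1. Preprocess: rewrite the sentence if one of the user's mapping rules matches.
    rewritten = profile.preprocessing_rules.get(sentence, sentence)
    # 2. Analyze against the main corpus, falling back to the user's auxiliary corpus.
    #    Exact lookup stands in for the real semantic parser (assumption).
    if rewritten in main_corpus:
        return main_corpus[rewritten]
    return profile.auxiliary_corpus.get(rewritten)


if __name__ == "__main__":
    main = {"turn on the living room light": "intent=light_on;room=living_room"}
    alice = UserProfile()
    print(analyse(alice, main, "lights on in the lounge"))  # None before confirmation
    confirm_candidate(alice, "lights on in the lounge",
                      "turn on the living room light", "intent=light_on;room=living_room")
    print(analyse(alice, main, "lights on in the lounge"))  # parsed via the mapping rule
```

Rewriting before parsing means the main corpus itself never has to change for one user's phrasing, which is where the reduced maintenance cost claimed in the abstract would come from.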

Description

Technical field

[0001] This application relates to technology for expanding the corpus of a spoken language dialogue system, and in particular to a method and system for expanding the corpus of a spoken language dialogue system.

Background

[0002] A spoken dialogue system is a computer system that communicates with people by voice. It is a kind of dialogue system; compared with a general text dialogue system, it differs mainly in having speech recognition and speech synthesis modules.

[0003] A dialogue system mainly consists of an input recognizer/decoder, a natural language understanding unit, a dialogue manager, task managers, a natural language generator unit, and an output renderer.

[0004] The core of the spoken dialogue system is the natural language understanding unit, which often contains a huge text corpus and consists of three main modules: proper noun recognition, part-of-speech tagging, and a semantic parser (Sem...
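To make the three natural language understanding modules named in [0004] concrete, here is a toy Python sketch of how proper noun recognition, part-of-speech tagging, and a rule-based semantic parser might be chained. The gazetteer, lexicon, tag-sequence rule format, and function names are all hypothetical; a real system would rely on much larger resources and trained models. The second demo call shows the failure mode described in [0010]: a sentence whose tag sequence matches no rule cannot be parsed.

```python
from __future__ import annotations

# Hypothetical toy resources.
PROPER_NOUNS = {"beijing": "CITY", "shanghai": "CITY"}
POS_LEXICON = {"show": "VERB", "the": "DET", "weather": "NOUN", "in": "PREP"}
# Assumed rule format: a tag sequence maps to an intent.
SEMANTIC_RULES = {("VERB", "DET", "NOUN", "PREP", "CITY"): "query_weather"}


def recognize_proper_nouns(tokens: list[str]) -> list[tuple[str, str | None]]:
    """Module 1: tag tokens found in the proper-noun gazetteer; others stay None."""
    return [(t, PROPER_NOUNS.get(t)) for t in tokens]


def tag_parts_of_speech(tagged: list[tuple[str, str | None]]) -> list[tuple[str, str]]:
    """Module 2: give every untagged token a POS tag ("UNK" if not in the lexicon)."""
    return [(t, tag or POS_LEXICON.get(t, "UNK")) for t, tag in tagged]


def semantic_parse(tagged: list[tuple[str, str]]) -> dict[str, str] | None:
    """Module 3: find a rule for the tag sequence; None means the sentence cannot be parsed."""
    intent = SEMANTIC_RULES.get(tuple(tag for _, tag in tagged))
    if intent is None:
        return None
    # Fill slots from proper nouns, e.g. {"city": "beijing"}.
    slots = {tag.lower(): t for t, tag in tagged if tag in PROPER_NOUNS.values()}
    return {"intent": intent, **slots}


def understand(sentence: str) -> dict[str, str] | None:
    tokens = sentence.lower().split()
    return semantic_parse(tag_parts_of_speech(recognize_proper_nouns(tokens)))


if __name__ == "__main__":
    print(understand("Show the weather in Beijing"))  # matches the rule
    print(understand("Weather Beijing please"))       # None: no matching rule
```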

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/27, G06F17/30
CPC: G06F16/36, G06F40/289
Inventors: 周进华 (Zhou Jinhua), 崔计平 (Cui Jiping)
Owner: SAMSUNG ELECTRONICS CHINA R&D CENT