Method and device for identifying redundant components of spoken language

A technology for identifying redundant components in spoken language, applied in the field of natural language processing. It addresses the unclear definition and misidentification of redundant components in spoken language, with the stated effects of easier capability expansion, reduced interference, and good adaptability.

Pending Publication Date: 2021-10-01
EMOTIBOT TECH LTD

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a method and device for identifying redundant components of spoken language, so as to solve the problems of unclear definition and misidentification of such components.

Embodiment Construction

[0054] The present invention is further described below in conjunction with the accompanying drawings.

[0055] In a spoken dialogue scene, people differ in living habits, region, personality, and Mandarin proficiency, so almost everyone speaks differently. After the spoken dialogue is transcribed by ASR, the text often contains many redundant components. Typical redundant components include: modal particles or interjections such as "uh" and "um", meaningless pronouns such as "that" and "um", punctuation marks, and repeated components such as "I, I, and I".

[0056] These redundant components clearly interfere with the machine's subsequent natural language understanding. At present, redundant components cannot be defined effectively and accurately during recognition. For example, "hurriedly busy" is a single unit, not a repeated character. Because of the inaccurate definition, the effective components in the sentence will be ident...
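The misidentification described in [0056] can be illustrated with a minimal sketch. The function name `naive_repeat_spans`, the `protected` word list, and the example word 谢谢 ("thank you") are all assumptions made for illustration and do not come from the patent text. The sketch shows a plain "adjacent duplicates are redundant" rule, which flags a genuine stutter but also flags a word that simply contains a doubled character.

```python
def naive_repeat_spans(tokens, protected=frozenset()):
    """Return half-open (start, end) spans of tokens a naive rule would drop.

    A run of identical adjacent tokens is treated as a repetition: every copy
    except the last is marked redundant. `protected` is a hypothetical list of
    whole words that must not be split (an illustrative stand-in, not the
    patent's method).
    """
    spans = []
    i = 0
    while i < len(tokens):
        j = i
        # Extend j to the last token of the run that starts at i.
        while j + 1 < len(tokens) and tokens[j + 1] == tokens[i]:
            j += 1
        # Flag the run unless the joined run is a protected whole word.
        if j > i and "".join(tokens[i:j + 1]) not in protected:
            spans.append((i, j))
        i = j + 1
    return spans


# A genuine stutter: the first two "I" tokens are redundant.
stutter = naive_repeat_spans(["I", "I", "I", "want", "to", "go"])

# Character-level input where the doubled character is one word:
# the naive rule wrongly flags the first character.
word = naive_repeat_spans(list("谢谢你"))

# With the (hypothetical) protected list, the word is left intact.
guarded = naive_repeat_spans(list("谢谢你"), protected={"谢谢"})
```

A fixed protected list is only a stopgap. The abstract instead describes training a dedicated repeated component recognition model, which is presumably meant to learn this distinction from data.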

Abstract

The invention discloses a method and device for identifying redundant components of spoken language. The method comprises the following steps: receiving a spoken language corpus and a training corpus; classifying the redundant components in the spoken language corpus to obtain redundant components and repeated components; training on the training corpus according to a preset scene and the redundant components to obtain a redundant component recognition model; training on the training corpus according to the repeated components to obtain a repeated component recognition model; and recognizing spoken language text with the redundant component recognition model and the repeated component recognition model to obtain spoken language text marked with redundant components. This addresses the unclear definition and misrecognition of spoken redundant components in the prior art.
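The steps in the abstract can be sketched as a toy pipeline. Everything here is an assumption for illustration: the tag names `RED` (scene-dependent redundant), `REP` (repeated), and `O` (keep), the function names, the "booking" scene, and the simple majority-count "models". The patent does not state which model type is trained, so frequency counts stand in for a real learned model.

```python
from collections import Counter

# Each training example: (scene, tokens, tags). The tags encode the result of
# the classification step: RED = redundant, REP = repeated, O = keep.
corpus = [
    ("booking", ["uh", "I", "I", "want", "a", "ticket"],
                ["RED", "REP", "O", "O", "O", "O"]),
    ("booking", ["um", "bye", "bye"], ["RED", "O", "O"]),
    ("booking", ["I", "I", "need", "uh", "help"],
                ["REP", "O", "O", "RED", "O"]),
]


def train_redundant_model(corpus, scene, threshold=0.5):
    """Collect tokens mostly tagged RED within the given (preset) scene."""
    red, total = Counter(), Counter()
    for ex_scene, tokens, tags in corpus:
        if ex_scene != scene:
            continue
        for tok, tag in zip(tokens, tags):
            total[tok] += 1
            red[tok] += tag == "RED"
    return {tok for tok in total if red[tok] / total[tok] > threshold}


def train_repeat_model(corpus, threshold=0.5):
    """Collect tokens whose adjacent duplicates were mostly tagged REP."""
    rep, total = Counter(), Counter()
    for _, tokens, tags in corpus:
        for k in range(len(tokens) - 1):
            if tokens[k] == tokens[k + 1]:
                total[tokens[k]] += 1
                rep[tokens[k]] += tags[k] == "REP"
    return {tok for tok in total if rep[tok] / total[tok] > threshold}


def recognize(tokens, redundant_model, repeat_model):
    """Mark each token with REP, RED, or O using both models."""
    marked = []
    for k, tok in enumerate(tokens):
        followed_by_same = k + 1 < len(tokens) and tokens[k + 1] == tok
        if followed_by_same and tok in repeat_model:
            marked.append((tok, "REP"))
        elif tok in redundant_model:
            marked.append((tok, "RED"))
        else:
            marked.append((tok, "O"))
    return marked


redundant_model = train_redundant_model(corpus, scene="booking")
repeat_model = train_repeat_model(corpus)
result = recognize(["um", "I", "I", "I", "want", "bye", "bye"],
                   redundant_model, repeat_model)
```

In this toy data the repeated "I" is learned as a repetition while "bye bye" is left intact, because its duplicates were never tagged `REP` in training. Keeping two separate models mirrors the abstract's split between scene-dependent redundant components and repeated components.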

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a method and device for identifying redundant components of spoken language.

Background technique

[0002] The spoken dialogue scene is an important and common one in the field of natural language processing. In a spoken dialogue scene, the text produced by ASR (automatic speech recognition) often contains many redundant components. Typical redundant components are modal particles or interjections, referential pronouns, punctuation marks, repeated components, etc. This redundant content affects subsequent natural language understanding and needs to be identified. The existing technology mainly uses rule-based methods to identify modal particles, interjections, repeated components, and punctuation marks, and uses machine learning or deep learning models to identify other redundant components. However, the rule method defines redundant components on the ...

Application Information

IPC(8): G06F16/332, G06F16/33, G06F16/35, G06F40/242, G06N20/00
CPC: G06F16/3329, G06F16/3344, G06F16/35, G06F40/242, G06N20/00
Inventor: 简仁贤, 范敏, 苏畅, 吴文杰
Owner EMOTIBOT TECH LTD