Text normalizing method and device

A text normalization technology, applied in the field of text processing, that addresses the problem of colloquial expressions such as repetition degrading machine simultaneous interpretation and translation quality, and achieves the effect of improving accuracy.

Pending Publication Date: 2020-07-21
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

However, because machine simultaneous interpretation is limited by speech recognition and machine translation technology, its overall translation quality lags far behind that of human simultaneous interpretation. In particular, on some speaking occasions speakers improvise around a theme and their own ideas, so the speech contains colloquial phenomena such as incomplete semantics, repetition, and modal particles. These phenomena degrade the quality of subsequent translation and can even lead to completely wrong translations.
In response, existing machine simultaneous interpretation products usually use rule-based methods to remove some colloquial expressions, but the results are poor, which greatly harms subsequent machine simultaneous interpretation.

Embodiment Construction

[0051] In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments of the present invention will be further described in detail below in conjunction with the drawings and implementations.

[0052] Because colloquial expressions often appear in a speaker's speech, the speech recognition text that is fed into machine translation also contains colloquial words. To address this, the embodiments of the present invention provide a text normalization method and device. Normalization features are extracted from the current speech data and from its corresponding recognition text, respectively. The word vector of each word unit in the recognition text, the feature vectors corresponding to the normalization features, and a pre-built text normalization model are then used to determine a label for each word unit. The label is used to identify whether the word unit s...
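The input described in [0052] can be illustrated with a minimal sketch. It assumes that the word vector and the feature vectors of each word unit are simply concatenated before being passed to the model; the source does not state how they are combined, so this is only one plausible arrangement. The function name `build_model_inputs` and the example features (a pause duration as a speech feature, a repetition flag as a text feature) are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def build_model_inputs(
    word_vectors: Sequence[Sequence[float]],
    speech_features: Sequence[Sequence[float]],
    text_features: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Build one input vector per word unit.

    Assumption (not stated in the source): the word vector, the speech
    feature vector, and the text feature vector of a word unit are
    concatenated in that order.
    """
    if not (len(word_vectors) == len(speech_features) == len(text_features)):
        raise ValueError("every word unit needs a word vector and both feature vectors")
    return [
        list(w) + list(s) + list(t)
        for w, s, t in zip(word_vectors, speech_features, text_features)
    ]


# Hypothetical example with three word units.
word_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 2-dimensional word vectors
speech_features = [[0.8], [0.0], [0.1]]              # e.g. pause duration (hypothetical)
text_features = [[1.0], [0.0], [0.0]]                # e.g. repetition flag (hypothetical)

model_inputs = build_model_inputs(word_vectors, speech_features, text_features)
```

Each row of `model_inputs` would then be fed to the text normalization model, which outputs a label for the corresponding word unit.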

Abstract

The invention discloses a text normalization method and device. The method comprises: obtaining voice data and the recognition text corresponding to the voice data; obtaining the word units in the recognition text and the word vector corresponding to each word unit; extracting normalization features, specifically, extracting voice features from the voice data and text features from the recognition text; inputting the word vectors and the feature vectors corresponding to the normalization features into a pre-constructed text normalization model, and obtaining a label for each word unit from the output of the model, the labels at least comprising deletion; and normalizing the recognition text according to the labels of the word units to obtain a normalized speech recognition text. With this scheme, colloquial speech recognition text becomes easier to understand and more standardized in expression.
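The final step of the abstract, normalizing the recognition text according to the per-word-unit labels, can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract only states that the labels at least include deletion, so the `keep` label, the function name `normalize_text`, and the example sentence are assumptions made here for demonstration.

```python
from typing import Sequence

# Label names are assumptions; the source only says the labels
# "at least" include deletion.
DELETE = "delete"
KEEP = "keep"


def normalize_text(word_units: Sequence[str], labels: Sequence[str], sep: str = " ") -> str:
    """Drop every word unit labeled DELETE and join the remaining units."""
    if len(word_units) != len(labels):
        raise ValueError("each word unit needs exactly one label")
    kept = [unit for unit, label in zip(word_units, labels) if label != DELETE]
    return sep.join(kept)


# Hypothetical recognition text with a filler word and a repetition.
units = ["um", "I", "I", "think", "this", "works"]
labels = [DELETE, DELETE, KEEP, KEEP, KEEP, KEEP]

normalized = normalize_text(units, labels)
```

In this sketch the labels are supplied by hand; in the described method they would come from the output of the text normalization model. For languages written without spaces, `sep=""` could be passed instead.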

Description

Technical field

[0001] The invention relates to the field of text processing, in particular to a text normalization method and device.

Background art

[0002] Simultaneous interpretation refers to translating the content of a speech to the audience continuously and in real time, without interrupting the speaker. Its biggest feature is high efficiency: the audience can obtain information in a timely manner. It is widely used on important occasions such as international conferences and diplomatic negotiations. With the development of artificial intelligence technology, machine simultaneous interpretation has now appeared. Its biggest advantage is that the translation rate does not decrease due to fatigue. However, because machine simultaneous interpretation is limited by speech recognition and machine translation technology, the overall translation quality of machine simult...

Application Information

IPC (8): G10L15/26, G10L15/25, G10L15/02, G06F40/289, G06F40/30, G06N3/04, G06N3/08, G06K9/62
CPC: G10L15/26, G10L15/25, G10L15/02, G06N3/08, G06N3/044, G06N3/045, G06F18/24
Inventor 赵超
Owner BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD