Corpus generation method and device, translation model training method, translation model translation method, equipment and medium

A corpus and model technology in the computer field that addresses the scarcity of text for some languages, with the stated effects of a richer variety of corpus types, improved accuracy, and improved efficiency.

Active Publication Date: 2020-11-03
TENCENT TECH (SHENZHEN) CO LTD
Cites: 6; cited by: 3

AI Technical Summary

Problems solved by technology

[0003] At present, corpora are generally obtained by collecting large amounts of text from network resources. For some rare languages, however, relatively little text can be collected from such resources, so how to obtain translation corpora for rare languages is a problem that needs to be addressed.




Embodiment Construction

[0064] To help readers better understand the technical solutions provided by the embodiments of this application, a detailed description is given below with reference to the accompanying drawings and specific implementations.

[0065] To help those skilled in the art better understand the technical solutions of this application, the terms used in this application are introduced below.

[0066] 1. Artificial intelligence technology: a comprehensive discipline covering a wide range of fields and involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. Artificial intelligence software technology mainly covers several major directions, such as computer vision technology...



Abstract

The invention provides a corpus generation method and device, a translation model training method, a translation model translation method, equipment and a medium, and relates to the technical field of artificial intelligence. The corpus generation method comprises: acquiring a target image; identifying, in the target image, an object image area containing a first language text and a second language text, where the two texts are displayed correspondingly in the object image area; extracting the first language text and the second language text from the object image area; and generating a translation annotation corpus between the first language and the second language from the extracted first language text and second language text.
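The corpus generation steps in the abstract can be illustrated with a minimal Python sketch. It covers only the final steps (sorting extracted texts by language and pairing them into a translation annotation corpus) and assumes that image acquisition, detection of the object image area, and OCR have already produced text boxes with coordinates. The names `TextBox`, `detect_language`, `pair_texts` and `to_corpus_lines` are hypothetical; Chinese and English are assumed as the first and second languages; and the nearest-center pairing rule and tab-separated output are illustrative choices, not the method claimed in the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward


@dataclass
class TextBox:
    """One OCR result inside an object image area (hypothetical structure;
    the source does not specify the OCR output format)."""
    text: str
    box: Box


def detect_language(text: str) -> str:
    """Crude script heuristic (assumption, not the patent's method):
    'zh' if at least half of the letters are CJK Unified Ideographs
    (U+4E00..U+9FFF), 'en' otherwise, 'unknown' if there are no letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return "unknown"
    cjk = sum(1 for c in letters if "\u4e00" <= c <= "\u9fff")
    return "zh" if 2 * cjk >= len(letters) else "en"


def _center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def pair_texts(boxes: List[TextBox], src: str = "zh",
               tgt: str = "en") -> List[Tuple[str, str]]:
    """Greedily pair each source-language box, top to bottom, with the
    nearest unused target-language box (squared distance between centers).
    This pairing rule is an illustrative assumption."""
    src_boxes = sorted((b for b in boxes if detect_language(b.text) == src),
                       key=lambda b: b.box[1])
    tgt_boxes = [b for b in boxes if detect_language(b.text) == tgt]
    used = set()
    pairs = []
    for s in src_boxes:
        sx, sy = _center(s.box)
        best: Optional[int] = None
        best_d = float("inf")
        for i, t in enumerate(tgt_boxes):
            if i in used:
                continue
            tx, ty = _center(t.box)
            d = (sx - tx) ** 2 + (sy - ty) ** 2
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            used.add(best)
            pairs.append((s.text, tgt_boxes[best].text))
    return pairs


def to_corpus_lines(pairs: List[Tuple[str, str]]) -> List[str]:
    """Emit one 'source<TAB>target' line per pair (assumed output format),
    collapsing internal whitespace so tabs/newlines cannot break the layout."""
    def clean(s: str) -> str:
        return " ".join(s.split())
    return [f"{clean(a)}\t{clean(b)}" for a, b in pairs]


if __name__ == "__main__":
    # A bilingual sign: each Chinese line has its English translation below it.
    boxes = [TextBox("出口", (40, 0, 60, 20)), TextBox("Exit", (35, 25, 65, 40)),
             TextBox("入口", (40, 50, 60, 70)), TextBox("Entrance", (25, 75, 75, 90))]
    print(to_corpus_lines(pair_texts(boxes)))  # ['出口\tExit', '入口\tEntrance']
```

Greedy nearest-center matching suits signs where each translation sits directly beside or below its source line; layouts with interleaved or multi-column text would likely need a stronger alignment rule.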

Description

Technical field

[0001] This application relates to the field of computer technology, in particular to artificial intelligence, and provides methods, devices, equipment and media for corpus generation, translation model training, and translation.

Background

[0002] Natural language processing (NLP) is widely used, for example in machine translation and machine question answering. NLP usually relies on various artificial intelligence models, and the accuracy of each model depends heavily on its corpus: if the corpus is insufficient, the trained model performs poorly in application. Obtaining more corpus has therefore become an urgent problem in NLP.

[0003] At present, corpora are generally obtained by collecting a large number of texts from network resources, but for some rare corpora, the texts collected on network resources a...


Application Information

Patent type & authority: Application (China)
IPC(8): G06K9/20; G06K9/62; G06F40/58; G06F16/951
CPC: G06F40/58; G06F16/951; G06V10/22; G06F18/22; G06F18/214
Inventors: 张忱 (Zhang Chen), 黄杰 (Huang Jie), 袁星宇 (Yuan Xingyu)
Owner: TENCENT TECH (SHENZHEN) CO LTD