Automatic acquisition system of Chinese and Lao bilingual parallel texts and implementation method

A technique for the automatic collection of parallel corpora, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the lack of tools and methods for collecting Chinese-Lao corpora and thereby eases the scarcity of corpus resources.

Inactive Publication Date: 2015-11-04
广西达译科技有限公司 +1

AI Technical Summary

Problems solved by technology

At present, corpus collection tools and methods for Chinese-Lao bilingual parallel texts are lacking.

Abstract

The invention discloses a system for automatically acquiring Chinese-Lao bilingual parallel texts and a method of implementing it. The method covers the automatic discovery, automatic extraction, and automatic cleaning of Chinese-Lao bilingual parallel information, in three steps. First, a group of keywords for the texts to be collected is defined, websites are searched through a search engine, and webpages are collected to obtain search results; the results are filtered and screened, and those that pass are stored in a search result database. Second, the webpages in the search result database are accessed and the Chinese-Lao bilingual parallel information is extracted automatically. Finally, the extracted parallel information is filtered, and the filtered Chinese-Lao bilingual parallel data is stored in a Chinese-Lao bilingual parallel corpus. The system and method provide basic data for Chinese and Lao language research and machine translation applications, address the data-source problem faced by text collection and research personnel, and contribute to the development of automatic bilingual parallel text acquisition and of natural language processing for Chinese and Lao.
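
The three stages named in the abstract (filter search results, extract parallel text, filter and store) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the abstract does not say how pages are filtered or how sentences are aligned, so the Unicode-block script test, the adjacent-line pairing rule, the length-ratio threshold, and all function and table names here are assumptions. No real search engine is called; search results are passed in as plain `(url, snippet)` tuples.

```python
import sqlite3

# Assumed script ranges: CJK Unified Ideographs and the Lao Unicode block.
CJK = (0x4E00, 0x9FFF)
LAO = (0x0E80, 0x0EFF)

def script_ratio(text, block):
    """Fraction of non-space characters whose code point lies in the block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    lo, hi = block
    return sum(lo <= ord(c) <= hi for c in chars) / len(chars)

def filter_search_results(results):
    """Stage 1 (hypothetical rule): keep each URL once, and only if its
    snippet contains both Chinese and Lao characters."""
    seen, kept = set(), []
    for url, snippet in results:
        bilingual = script_ratio(snippet, CJK) > 0 and script_ratio(snippet, LAO) > 0
        if bilingual and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept

def extract_pairs(lines, threshold=0.5):
    """Stage 2 (hypothetical rule): pair a mostly-Chinese line with the
    mostly-Lao line that directly follows it."""
    pairs = []
    for first, second in zip(lines, lines[1:]):
        if script_ratio(first, CJK) >= threshold and script_ratio(second, LAO) >= threshold:
            pairs.append((first.strip(), second.strip()))
    return pairs

def filter_pairs(pairs, max_ratio=4.0):
    """Stage 3 (hypothetical rule): drop duplicates and pairs whose
    character-length ratio is implausible for a translation."""
    seen, kept = set(), []
    for zh, lo in pairs:
        if not zh or not lo or (zh, lo) in seen:
            continue
        ratio = len(lo) / len(zh)
        if 1 / max_ratio <= ratio <= max_ratio:
            seen.add((zh, lo))
            kept.append((zh, lo))
    return kept

def store_pairs(conn, pairs):
    """Write the cleaned pairs into an assumed two-column corpus table."""
    conn.execute("CREATE TABLE IF NOT EXISTS corpus (zh TEXT, lo TEXT)")
    conn.executemany("INSERT INTO corpus VALUES (?, ?)", pairs)
    conn.commit()

if __name__ == "__main__":
    page = ["你好", "ສະບາຍດີ", "Menu", "谢谢", "ຂອບໃຈ", "你好", "ສະບາຍດີ"]
    conn = sqlite3.connect(":memory:")
    store_pairs(conn, filter_pairs(extract_pairs(page)))
    for row in conn.execute("SELECT zh, lo FROM corpus"):
        print(row)
```

Adjacent-line pairing only suits pages that interleave the two languages line by line; pages that put each language in its own column or on its own URL would need a different alignment step.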

Description

Technical field

[0001] The invention relates to the field of computer application technology, in particular to a system and an implementation method for the automatic collection of a Chinese-Lao bilingual parallel corpus.

Background art

[0002] A "parallel corpus" (Parallel Texts) consists of texts written in different languages that stand in a "translation relationship" to one another. In computational linguistics it is distinguished from a "comparable corpus" (Comparable Texts), whose texts are also written in different languages and on the same subject but have no direct "translation relationship" between them.

[0003] Parallel corpora of various kinds have existed throughout human history. The Rosetta Stone unearthed in Egypt, whose inscription is carved in two languages and three scripts, is a well-known ancient parallel corpus. By comparing the writing on the stele, the French philologist Champollion deciphered the hieroglyphs of ancient Egypt. In addition, contract agreemen...

Application Information

IPC(8): G06F17/28
Inventors: 温家凯, 农强, 刘连芳, 刘永俊
Owner: 广西达译科技有限公司