Method for automatically extracting bilingual translation dictionary from internet

A technique for automatically extracting translation dictionaries, applied in special data processing applications, instruments, electronic digital data processing, and related fields. It achieves a short update cycle and a small workload, and it overcomes performance bottlenecks.

Inactive Publication Date: 2011-12-28
TSINGHUA UNIV +1

AI Technical Summary

Problems solved by technology

[0027] The technical problem to be solved by the present invention is how to construct a bilingual dictionary from the Internet quickly and effectively, without relying on any external resources.

Examples


Embodiment Construction

[0064] The method proposed by the present invention for automatically extracting a bilingual translation dictionary from the Internet is described below with reference to the accompanying drawing and an embodiment. English is used as the foreign language in the embodiment.

[0065] Figure 1 shows the flow chart for generating a bilingual dictionary with this method. The method includes the following steps:

[0066] Step 1: Extract bracketed bilingual vocabulary and well-structured bilingual vocabulary from Chinese-English bilingual web pages. When extracting bracketed bilingual vocabulary, traverse the content to the left of the left bracket from right to left, using a change of language attribute or a punctuation mark as the boundary; the language attribute is either Chinese or English. For example, given "..., XXX is unhappy (unhappy)": if "XXX" is Chinese, then "XXX is unhappy (unhappy)" is extracted as a bracketed bilingual word; if "XXX" is not Chinese, then extract ...
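
The right-to-left scan in Step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation. The function names, the regular expression, and the character tests are assumptions made for illustration: a character counts as Chinese if it lies in the CJK Unified Ideographs block, and bracket content counts as English if it consists of ASCII letters, spaces, hyphens, and apostrophes. Any other character to the left of the bracket (punctuation, Latin letters, whitespace) is treated as the boundary.

```python
import re

# Assumption: "Chinese" means the CJK Unified Ideographs block (U+4E00..U+9FFF).
def is_chinese(ch: str) -> bool:
    return "\u4e00" <= ch <= "\u9fff"

# Assumption: an English bracket is ASCII letters, spaces, hyphens, apostrophes,
# enclosed in either ASCII or full-width parentheses.
BRACKET_RE = re.compile(r"[(（]([A-Za-z][A-Za-z \-']*)[)）]")

def extract_bracket_pairs(text: str) -> list[tuple[str, str]]:
    """Return (chinese_candidate, english) pairs found around brackets."""
    pairs = []
    for match in BRACKET_RE.finditer(text):
        i = match.start() - 1
        # Walk leftwards from the left bracket while the characters are Chinese.
        # Punctuation or a change of language ends the scan.
        while i >= 0 and is_chinese(text[i]):
            i -= 1
        chinese = text[i + 1 : match.start()]
        if chinese:  # skip brackets with no Chinese text directly to the left
            pairs.append((chinese, match.group(1).strip()))
    return pairs

print(extract_bracket_pairs("今天,他很不开心(unhappy)。"))
```

The candidate returned here is the whole Chinese run to the left of the bracket. According to the abstract, a later step truncates such candidates to the exactly translated part, which this sketch does not attempt.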


Abstract

The invention discloses a method for automatically extracting a bilingual translation dictionary from the Internet. The method comprises the following steps: extracting bracketed bilingual words and well-structured bilingual words from Chinese and foreign-language bilingual web pages; truncating the extracted bracketed bilingual words to obtain exactly translated bracketed bilingual words; merging the well-structured bilingual words and the exactly translated bracketed bilingual words by word root; for a given Chinese word, searching for corresponding translations among the well-structured bilingual words and, if any are found, ignoring the translations from the bracketed bilingual words, otherwise searching for corresponding translations among the bracketed bilingual words; and processing all foreign-language words by the same method to obtain the final bilingual translation dictionary. The invention can construct the bilingual translation dictionary quickly, effectively, and automatically according to the word frequency of the bilingual words, without relying on any external resources.
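
The lookup priority described in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented procedure: `count_pairs` and `lookup` are hypothetical helper names, each source is modelled as a mapping from a Chinese word to a frequency counter of its translations, and ranking candidates by descending frequency is one assumed reading of "according to the word frequency".

```python
from collections import Counter, defaultdict

def count_pairs(pairs):
    """Count how often each (chinese, translation) pair was observed."""
    table = defaultdict(Counter)
    for chinese, translation in pairs:
        table[chinese][translation] += 1
    return table

def lookup(chinese, structured, bracket):
    """Prefer well-structured pairs; fall back to bracketed pairs only if absent."""
    source = structured if chinese in structured else bracket
    # Assumption: candidates are ranked by descending observed frequency.
    return [t for t, _ in source.get(chinese, Counter()).most_common()]

structured = count_pairs([("词典", "dictionary"), ("词典", "dictionary"), ("词典", "lexicon")])
bracket = count_pairs([("词典", "wordbook"), ("翻译", "translation")])

print(lookup("词典", structured, bracket))  # bracketed "wordbook" is ignored
print(lookup("翻译", structured, bracket))  # falls back to the bracketed source
```

Keeping the two sources in separate tables makes the priority rule a single membership test, so the bracketed translations never mix with the well-structured ones for the same word.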

Description

Technical field

[0001] The invention relates to the technical field of statistical natural language processing, in particular to a method for automatically extracting bilingual translation dictionaries from the Internet.

Background technique

[0002] Whether it is scientific research or daily life, people have a high degree of exposure to and dependence on foreign languages. Traditional translation dictionaries mainly come from manual collation and editing, with a long generation cycle, slow update, and low coverage. Existing methods for generating translation dictionaries based on the Internet rely on a variety of natural language processing technologies and machine learning technologies. These methods may become performance bottlenecks when processing large-scale data, and rely on pre-established resources.

[0003] The bilingual translation dictionary we constructed comes from the Internet. In addition to traditional vocabulary, it can also cover current popular vocabula...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/28
Inventor: 周立柱, 韩军, 刘娟, 张崇, 茹立云, 佟子健
Owner: TSINGHUA UNIV