Chinese-English unknown word translation method combining Web mining, multiple features and supervised learning

A supervised-learning technique for unregistered (out-of-vocabulary) words, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses problems such as overly simple correlation evaluation and incomplete representation of translation-candidate features, improving coverage and translation accuracy.

Status: Inactive
Publication Date: 2012-09-12
SHANGHAI JILIAN NETWORK TECH CO LTD
2 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

The method is intended to solve problems in existing methods, such as incomplete representation of translation-candidate features and overly simple correlation evaluation.

Examples

Embodiment Construction

[0045] The present invention is further described below by way of embodiments and the accompanying drawings.

[0046] Preprocessing and Parameter Estimation

[0047] The overall process of the present invention is as follows. A Chinese unregistered word is submitted as a query to the Google search engine, which returns a set of search-result snippets, and translation candidates are extracted from this snippet set. At the same time, heuristic rules and information entropy, combined with some of the features, are used to filter noise and produce an initial ranking, and the top few entries are kept as the final translation candidates. Then, features of the translation candidates are extracted, and SVM and Ranking SVM are used to evaluate the correlation between the translation candidates and the unregistered word. This process requires preprocessing the returned snippet set, that is, setting the search engine to return English web pages for the Chinese-English translation of unregistered words; removing hyperlinks, et...
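The candidate-extraction and initial-ranking step described in [0047] can be illustrated with a small sketch. It is not the patent's actual procedure: the stopword list, the n-gram length limit, the boundary-entropy score and the function names `english_runs`, `entropy` and `rank_candidates` are all assumptions made for illustration, and the snippets are assumed to have been retrieved and preprocessed already.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stopword list; an assumption, not taken from the patent.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def english_runs(snippet):
    """Split a mixed Chinese-English snippet into runs of consecutive English words."""
    runs = re.findall(r"[A-Za-z]+(?:[ -][A-Za-z]+)*", snippet)
    return [re.split(r"[ -]", run.lower()) for run in runs]


def entropy(counter):
    """Shannon entropy (in bits) of a Counter of context tokens."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def rank_candidates(snippets, max_len=4, top_k=5):
    """Return the top_k (candidate, score) pairs extracted from the snippets."""
    freq = Counter()
    left_ctx = defaultdict(Counter)
    right_ctx = defaultdict(Counter)
    run_id = 0
    for snippet in snippets:
        for words in english_runs(snippet):
            run_id += 1
            for i in range(len(words)):
                for j in range(i + 1, min(i + max_len, len(words)) + 1):
                    gram = words[i:j]
                    # Heuristic noise filter (assumed): no stopword at either
                    # edge and no one-letter words.
                    if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                        continue
                    if any(len(w) < 2 for w in gram):
                        continue
                    cand = " ".join(gram)
                    freq[cand] += 1
                    # A run boundary is treated as a distinct context each time,
                    # so free-standing phrases get high boundary entropy.
                    left = words[i - 1] if i > 0 else f"<s{run_id}>"
                    right = words[j] if j < len(words) else f"</s{run_id}>"
                    left_ctx[cand][left] += 1
                    right_ctx[cand][right] += 1
    # Assumed score: frequency boosted by the weaker of the two boundary entropies.
    scored = {
        cand: n * (1.0 + min(entropy(left_ctx[cand]), entropy(right_ctx[cand])))
        for cand, n in freq.items()
    }
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

In this sketch a fragment such as "machine" that is always followed by the same word has zero right-boundary entropy and therefore scores below the complete phrase, which is one plausible way entropy can separate whole translations from partial ones.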

Abstract

The invention belongs to the technical field of multimedia information processing and specifically relates to a Chinese-English unknown word translation method combining Web mining, multiple features and supervised learning. The method comprises four steps: preprocessing and parameter estimation; acquiring translation candidates from the Web; representing the translation candidates with multiple features; and ranking and evaluating the translation results. The invention acquires corpora through Web mining, represents the translation candidates with multiple features, and ranks the translation candidates through supervised learning. Compared with traditional methods, the method offers simple corpus acquisition and preprocessing, a comprehensive feature representation of the translation candidates, and high accuracy of the translation results. Unknown word translation is one of the important and difficult problems in text processing; the invention provides an effective Chinese-English unknown word translation method, which therefore has significant application value in machine translation and cross-language information retrieval.
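To make the ranking step more concrete, the following is a minimal sketch of a linear Ranking SVM trained with the pairwise hinge loss and plain subgradient descent. It only illustrates the general technique; the two-dimensional feature vectors, the hyperparameters and the function names `train_ranking_svm` and `rank` are hypothetical and do not come from the patent, which does not specify its features or training setup in the text shown here.

```python
import random


def train_ranking_svm(groups, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Learn a weight vector w for ranking.

    groups: one list per source word, each a list of (feature_vector, relevance).
    Within a group, a candidate with higher relevance should score at least
    1.0 above a candidate with lower relevance (pairwise hinge loss).
    """
    rng = random.Random(seed)
    # Pairwise transform: one difference vector per (better, worse) pair.
    pairs = []
    for cands in groups:
        for xi, yi in cands:
            for xj, yj in cands:
                if yi > yj:
                    pairs.append([a - b for a, b in zip(xi, xj)])
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            violated = margin < 1.0
            for k in range(dim):
                # Subgradient of reg/2 * |w|^2 + max(0, 1 - w.d)
                grad = reg * w[k] - (d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


def rank(w, candidates):
    """Sort (name, feature_vector) pairs by descending score w.x."""
    def score(item):
        return sum(wk * xk for wk, xk in zip(w, item[1]))
    return sorted(candidates, key=score, reverse=True)
```

A usage example with made-up feature values (say, a normalized frequency and a co-occurrence score, both assumptions): train on a few labeled groups, then call `rank(w, [("good", [0.8, 0.7]), ("bad", [0.3, 0.2])])` to order new candidates for an unseen word.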

Description

Technical field

[0001] The invention belongs to the technical field of multimedia information processing, and in particular relates to a method for translating Chinese-English unregistered words.

Background

[0002] With the rapid development of social informatization and the Internet, new words, terms, popular expressions, etc. appear on the Internet in an endless stream. These new named entities are numerous and constantly updated, and they cannot all be found in existing bilingual dictionaries; such words are called unregistered words. The translation of unregistered words refers to inputting unregistered words in the source language and outputting their translation in the target language. With the continuous development of Machine Translation and Cross-language Information Retrieval (CLIR), it is extremely important to be able to translate unregistered words quickly and accurately. It is one of the impor...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06F17/30
Inventor: 张玥杰, 苏艳霞, 金城, 薛向阳
Owner: SHANGHAI JILIAN NETWORK TECH CO LTD