Cross-language retrieval method oriented to big data

A big data-oriented technology, applied in the field of cross-language retrieval, that addresses problems such as the mismatch between the query language and the document language.

Active Publication Date: 2017-02-01
GLOBAL TONE COMM TECH
AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a big data-oriented cross-language retrieval method that solves the problem of the query language differing from the document language in cross-language information retrieval.

Examples


Detailed Description of the Embodiments

[0044] To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the examples. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

[0045] The application principle of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0046] As shown in Figure 1, the big data-oriented cross-language retrieval method of the embodiment of the present invention includes the following steps:

[0047] S101: Construct separate dictionary trees (tries) of Chinese entries and English entries from Wikipedia's Chinese-English comparable corpus;

[0048] S102: Look up each query word in the dictionary tree for its language; if the word is found, return the corresponding entry number;

[0049] S103: According to the ent...
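
Steps S101 and S102 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the class and function names (`EntryTrie`, `build_tries`, `find_entry`), the character-range language check, the lowercasing of English titles, and the idea that aligned Chinese and English entries share one entry number are all assumptions made for the example. The text of step S103 is truncated in this record, so it is not sketched.

```python
class TrieNode:
    """One node of a dictionary tree; entry_id is set only at the end of a title."""
    __slots__ = ("children", "entry_id")

    def __init__(self):
        self.children = {}
        self.entry_id = None


class EntryTrie:
    """Dictionary tree mapping entry titles to entry numbers (hypothetical helper)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, title, entry_id):
        node = self.root
        for ch in title:
            node = node.children.setdefault(ch, TrieNode())
        node.entry_id = entry_id

    def lookup(self, title):
        node = self.root
        for ch in title:
            node = node.children.get(ch)
            if node is None:
                return None  # no entry starts with this character sequence
        return node.entry_id  # None if the title is only a prefix of an entry


def is_chinese(text):
    # Assumption: a query counts as Chinese if it contains any CJK unified ideograph.
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def build_tries(aligned_entries):
    """S101 (sketch): build one trie per language from aligned (zh, en) title pairs."""
    zh_trie, en_trie = EntryTrie(), EntryTrie()
    for entry_id, (zh_title, en_title) in enumerate(aligned_entries):
        zh_trie.insert(zh_title, entry_id)
        en_trie.insert(en_title.lower(), entry_id)
    return zh_trie, en_trie


def find_entry(query, zh_trie, en_trie):
    """S102 (sketch): search the trie that matches the query language."""
    if is_chinese(query):
        return zh_trie.lookup(query)
    return en_trie.lookup(query.lower())


# Toy stand-in for the Wikipedia comparable corpus (illustrative data only).
aligned = [("信息检索", "Information retrieval"), ("机器翻译", "Machine translation")]
zh_trie, en_trie = build_tries(aligned)
print(find_entry("机器翻译", zh_trie, en_trie))
print(find_entry("Information Retrieval", zh_trie, en_trie))
```

A trie keeps lookup cost proportional to the length of the query word rather than to the number of entries, which is one plausible reason for choosing it over a flat list at Wikipedia scale.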

Abstract

The invention discloses a big data-oriented cross-language retrieval method comprising the following steps: constructing a bilingual word vector model from the cross-language links between Chinese and English Wikipedia entries; translating the query with the bilingual word vector model; and finally constructing a new query from the candidate translations and executing retrieval with it. The bilingual word vector model takes a source-language query vector as input and outputs similarity scores for target-language documents that are semantically similar to the query vector; the query translation process uses the result of canonical correlation analysis. Approaching the problem from the angle of automatic query translation and exploiting the semantic similarity of documents in different languages, the method finds a semantic space shared by the two languages and translates the query semantically in that shared space, thereby realizing automatic query translation.
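
The query translation idea in the abstract can be illustrated with a small sketch. It assumes that source-language and target-language word vectors have already been projected into one shared space (for example by canonical correlation analysis, which is not implemented here), averages the query's word vectors, and ranks target-language words by cosine similarity. The function names, the averaging step, the word-level ranking, and the toy three-dimensional vectors are all assumptions for illustration; the record does not give the actual model or its parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_vector(terms, vectors):
    """Average the vectors of the query terms that have a vector (assumed pooling)."""
    known = [vectors[t] for t in terms if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def candidate_translations(terms, source_vectors, target_vectors, top_k=2):
    """Rank target-language words by similarity to the query in the shared space."""
    q = query_vector(terms, source_vectors)
    if q is None:
        return []
    scored = [(cosine(q, vec), word) for word, vec in target_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:top_k]]


# Toy vectors, assumed to be already projected into the shared space.
zh_vectors = {"检索": [0.9, 0.1, 0.0], "翻译": [0.0, 0.2, 0.9]}
en_vectors = {
    "retrieval": [0.8, 0.2, 0.1],
    "translation": [0.1, 0.1, 0.9],
    "dictionary": [0.0, 1.0, 0.0],
}

# Build the new target-language query from the best candidate translation.
new_query = " ".join(candidate_translations(["检索"], zh_vectors, en_vectors, top_k=1))
print(new_query)
```

The new query string would then be passed to an ordinary monolingual retrieval system over the target-language documents.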

Description

Technical field

[0001] The invention belongs to the technical field of cross-language retrieval, and in particular relates to a big data-oriented cross-language retrieval method.

Background

[0002] With the continuous development of information technology and the deepening of cultural exchange around the world, the Internet has gradually become a global repository of shared multilingual information. How to obtain cross-language information that satisfies users, quickly and accurately, from this mass of information is an urgent problem in the multilingual information age.

[0003] Cross-language information retrieval is an important means of obtaining multilingual information. Cross-language information retrieval (CLIR) refers to information retrieval techniques or methods in which a query constructed in one language retrieves information expressed in one or more other languages. As a branch of the information retr...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06F17/28
CPC: G06F16/243, G06F40/30, G06F40/56
Inventors: 程国艮, 巢文涵, 王文声
Owner: GLOBAL TONE COMM TECH