Vietnamese-English cross-language text retrieval method and system based on inter-word weighted association model

A cross-language retrieval technology based on weighted association, applied in digital data information retrieval, instrumentation, computing, etc., that addresses problems such as word mismatch, query topic drift, and cross-language retrieval performance that is inferior to monolingual retrieval.

Status: Inactive
Publication Date: 2019-03-29
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS

AI Technical Summary

Problems solved by technology

[0003] Researchers worldwide have studied cross-language information retrieval methods and systems in depth from many angles and have achieved substantial results. However, the open problems in cross-language information retrieval research have not been fully resolved. Among the problems that receive the most attention are query topic drift, which is more severe in cross-language retrieval than in monolingual retrieval, and word mismatch. These problems often cause cross-language retrieval to perform worse than monolingual retrieval.


Examples


Embodiment Construction

[0064] The technical solutions of the present invention will be further described in non-limiting detail below with reference to the embodiments and the accompanying drawings.

[0065] 1. To better illustrate the technical solution of the present invention, the related concepts involved are introduced as follows:

[0066] Suppose CLIRdoc = {d_1, d_2, …, d_n} is the set of relevant target-language documents obtained from the cross-language initial retrieval results, where d_i (1 ≤ i ≤ n) is the i-th document in the target-language document set CLIRdoc, and d_i = {t_1, t_2, …, t_m, …, t_p}. Each t_m (m = 1, 2, …, p) is called a target-language feature-term item (FTI), abbreviated as feature item, and generally consists of a word or phrase. The feature item weight set corresponding to d_i is W_i = {w_i1, w_i2, …, w_im, …, w_ip}, where w_im is, for the m-th feature item t_m of the i-th document d_i, the corres...
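
As a rough illustration of the notation in [0066], the sketch below stores each document d_i as a mapping from feature items t_m to weights w_im. The TF-IDF weighting, the function name `build_weighted_docs`, and the sample documents are assumptions made for illustration only; the text above is truncated before it states how w_im is computed.

```python
import math
from collections import Counter


def build_weighted_docs(tokenized_docs):
    """Map each document d_i to {feature item t_m: weight w_im}.

    Assumption: weights are a simple TF-IDF variant. The source text
    does not specify the weighting scheme.
    """
    n = len(tokenized_docs)

    # Document frequency: in how many documents each feature item occurs.
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))

    weighted = []
    for tokens in tokenized_docs:
        tf = Counter(tokens)
        weighted.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens)) * (math.log(n / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return weighted


# Hypothetical sample: two already-tokenized English documents.
docs = [["query", "expansion", "query"], ["cross", "language", "query"]]
clir_doc = build_weighted_docs(docs)
```

Here `clir_doc[i]` plays the role of the pair (d_i, W_i): its keys are the feature items of the i-th document and its values are the corresponding weights.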


Abstract

The invention discloses a Vietnamese-English cross-language text retrieval method and system based on an inter-word weighted association mode. The method comprises the following steps: translating a Vietnamese user query into an English query with a machine translation module, and submitting the English query to a text retrieval module to retrieve English documents; performing relevance judgment with a user relevance feedback information extraction module to obtain a set of user-feedback relevant English documents; preprocessing these documents with an English document preprocessing module to obtain an initial-retrieval relevant English document library; building an English feature word weighted association rule library with a weighted association mode mining module; building an English expansion word library with an expansion word generation module; using a query expansion implementation module to combine the expansion words with the original query and resubmit the new query to the text retrieval module, obtaining the final English result documents; and translating the final English result documents into Vietnamese through a final result display module and returning them to the user. The method and system can effectively improve cross-language retrieval performance and have good practical application value and promotion prospects.
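
The module sequence in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented algorithm: the confidence measure (sum of min(w_q, w_c) over co-occurring documents, divided by the total weight of q), the thresholds, and every function and parameter name (`mine_rules`, `expansion_terms`, `expand_query`, `cross_language_search`, and the callables passed to it) are hypothetical. Feedback documents are assumed to be term-to-weight mappings.

```python
from collections import defaultdict


def mine_rules(query_terms, feedback_docs, min_conf=0.5):
    """Return {(q, c): confidence} for rules q -> c from feedback documents.

    feedback_docs: list of {term: weight} dicts for relevant English documents.
    Assumption: confidence = sum(min(w_q, w_c)) / sum(w_q). This is an
    illustrative measure, not the one defined in the patent.
    """
    query_set = set(query_terms)
    antecedent = defaultdict(float)  # q -> total weight of q
    joint = defaultdict(float)       # (q, c) -> weighted co-occurrence
    for doc in feedback_docs:
        for q in doc.keys() & query_set:
            antecedent[q] += doc[q]
            for c, w_c in doc.items():
                if c not in query_set:
                    joint[(q, c)] += min(doc[q], w_c)
    rules = {}
    for (q, c), w in joint.items():
        if antecedent[q] > 0 and w / antecedent[q] >= min_conf:
            rules[(q, c)] = w / antecedent[q]
    return rules


def expansion_terms(rules, top_k=3):
    """Rank consequent terms by their best rule confidence; keep top_k."""
    best = {}
    for (_, c), conf in rules.items():
        best[c] = max(conf, best.get(c, 0.0))
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, _ in ranked[:top_k]]


def expand_query(query_terms, feedback_docs, min_conf=0.5, top_k=3):
    """Combine the original query terms with the mined expansion terms."""
    rules = mine_rules(query_terms, feedback_docs, min_conf)
    return list(query_terms) + expansion_terms(rules, top_k)


def cross_language_search(vi_query, translate_vi_en, search, judge_relevant,
                          to_weighted_doc, translate_en_vi):
    """End-to-end flow; all callables are hypothetical stand-ins for modules."""
    en_terms = translate_vi_en(vi_query)                      # machine translation
    first_pass = search(en_terms)                             # initial retrieval
    relevant = [d for d in first_pass if judge_relevant(d)]   # user feedback
    feedback_docs = [to_weighted_doc(d) for d in relevant]    # preprocessing
    new_query = expand_query(en_terms, feedback_docs)         # mining + expansion
    return [translate_en_vi(d) for d in search(new_query)]    # final results


# Hypothetical feedback documents with made-up weights.
feedback = [
    {"retrieval": 0.6, "query": 0.5, "expansion": 0.4},
    {"retrieval": 0.7, "feedback": 0.7, "expansion": 0.1},
    {"retrieval": 0.2, "noise": 0.1},
]
expanded = expand_query(["retrieval"], feedback, min_conf=0.3, top_k=2)
```

Passing the translation, retrieval, and relevance-judgment steps in as callables keeps the sketch independent of any particular machine translation service or search engine, neither of which is named in the text above.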

Description

Technical field

[0001] The invention belongs to the field of text information retrieval, and in particular relates to a Vietnamese-English cross-language text retrieval method and system based on an inter-word weighted association mode, which is suitable for cross-language text information retrieval tasks such as retrieving English documents with Vietnamese queries.

Background technique

[0002] Cross-language information retrieval refers to the technology of retrieving information resources in other languages with queries in one language. Vietnamese-English cross-language information retrieval is the problem of retrieving English documents with Vietnamese queries. Vietnamese, the language in which the query is expressed, is called the source language, and English, the language of the retrieved documents, is called the target language. With the increasingly close communication between China and ASEAN countries, it is urgent and important to s...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/332
CPC: G06F16/3337
Inventor: 黄名选
Owner: GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS