Indonesian-Chinese cross-language retrieval method and system based on matrix weighted association model

A matrix-weighted association model technology, applied in fields such as digital data information retrieval, instruments, and computing, that addresses problems such as query topic drift, word mismatch, and cross-language retrieval performance that falls short of monolingual retrieval.

Inactive Publication Date: 2019-04-16
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS

AI Technical Summary

Problems solved by technology

[0003] Scholars around the world have studied cross-language information retrieval methods and systems in depth from different angles and have achieved rich results. However, the open problems in cross-language information retrieval research have not been completely resolved. Among the problems that most need solving and that receive the most attention are query topic drift and word mismatch, both of which are more severe in cross-language retrieval than in monolingual retrieval. These problems often leave cross-language retrieval performance below that of monolingual retrieval.



Embodiment Construction

[0056] The technical solution of the present invention will be described in further non-limiting detail below in conjunction with the embodiments and accompanying drawings.

[0057] 1. To better illustrate the technical solution of the present invention, the relevant concepts are introduced below:

[0058] Assume that the set of initially retrieved relevant documents in the target language (Target Language, TL), obtained after cross-language retrieval of the user query, is $TLdoc = \{tld_1, tld_2, \ldots, tld_n\}$, where $tld_i$ ($1 \le i \le n$) denotes the $i$-th document in the target-language document set $TLdoc$. Each document is $tld_i = \{t_1, t_2, \ldots, t_m, \ldots, t_p\}$, where $t_m$ ($m = 1, 2, \ldots, p$) is called a target-language feature-term item (Feature-term Item, FTI), or feature item for short, and generally consists of characters, words, or phrases. The feature-item weight set corresponding to $tld_i$ is $W_i = \{w_{i1}, w_{i2}, \ldots, w_{im}, \ldots, w_{ip}\}$, where $w_{im}$ is, for the $i$-th document $tld_i$, the $m$-th...
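The document set and per-document weight sets defined above can be pictured as a document-by-feature-item weight matrix. The following is a minimal Python sketch under stated assumptions: the function name `build_weight_matrix` is hypothetical, documents are assumed to be already segmented into feature items, and raw term frequency is used as the weight only because the weighting scheme is not given in the text shown here.

```python
from collections import Counter


def build_weight_matrix(docs):
    """Build a document-by-feature-item weight matrix.

    docs: list of documents, each a list of feature items (tokens).
    Returns (terms, W), where terms is the sorted vocabulary and
    W[i][m] is the weight of terms[m] in document i.

    ASSUMPTION: the weight is the raw term frequency. The source text
    does not specify the weighting scheme at this point.
    """
    # Vocabulary: every distinct feature item across all documents.
    terms = sorted({t for d in docs for t in d})
    W = []
    for d in docs:
        counts = Counter(d)
        # Counter returns 0 for items absent from the document.
        W.append([counts[t] for t in terms])
    return terms, W


if __name__ == "__main__":
    # Toy target-language document set (illustrative tokens only).
    tl_doc = [
        ["cross", "language", "retrieval", "retrieval"],
        ["query", "expansion", "retrieval"],
    ]
    terms, W = build_weight_matrix(tl_doc)
    for row in W:
        print(dict(zip(terms, row)))
```

Each row of `W` plays the role of one weight set $W_i$, and each column corresponds to one feature item $t_m$; a different weighting (for example a normalized or TF-IDF weight) would change only the line that fills the row.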


Abstract

The invention discloses an Indonesian-Chinese cross-language retrieval method and system based on a matrix weighted association model. The method translates an Indonesian user query into a Chinese query with a machine translation module and submits the Chinese query to a text retrieval module to retrieve Chinese documents. A module for extracting and preprocessing the top-ranked initially retrieved documents then builds a database of those documents. A matrix weighted association rule mining module oriented to Indonesian-Chinese cross-language retrieval is called to build a matrix weighted association rule library, and a cross-language query expansion word generation module builds an expansion word library. A cross-language query expansion implementation module submits the combined new query to the text retrieval module for a second retrieval, which yields the Chinese documents of the final retrieval result. Finally, a final result display module submits the final retrieval result to the machine translation module for translation into Indonesian documents, which are returned to the user. The method is suited to cross-language text retrieval systems for ASEAN countries; it improves cross-language retrieval performance and has good application value and promotion prospects.
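The module sequence in the abstract (translate, retrieve, take the top-ranked documents, derive expansion words, retrieve again) can be sketched as a small Python pipeline. Everything here is an illustrative assumption rather than the patented method: the dictionary `ID_TO_ZH`, the toy `CORPUS`, and all function names are hypothetical, retrieval is plain term-count scoring, and the expansion step uses simple co-occurrence counting as a stand-in for matrix weighted association rule mining. The final step of translating results back into Indonesian is omitted.

```python
from collections import Counter

# Hypothetical toy dictionary standing in for the machine translation module.
ID_TO_ZH = {"pencarian": "检索", "informasi": "信息"}

# Hypothetical toy Chinese corpus, already segmented into feature items.
CORPUS = {
    "d1": ["信息", "检索", "查询", "扩展"],
    "d2": ["检索", "查询", "文档"],
    "d3": ["天气", "预报"],
}


def translate_query(id_terms):
    """Translate Indonesian query terms to Chinese (dictionary lookup stand-in)."""
    return [ID_TO_ZH[t] for t in id_terms if t in ID_TO_ZH]


def retrieve(query_terms, corpus, top_k=2):
    """Rank documents by summed query-term counts; keep the top_k matches."""
    scores = {
        doc_id: sum(tokens.count(q) for q in query_terms)
        for doc_id, tokens in corpus.items()
    }
    matched = [d for d in scores if scores[d] > 0]
    # Sort by descending score, breaking ties by document id.
    return sorted(matched, key=lambda d: (-scores[d], d))[:top_k]


def expansion_terms(query_terms, top_docs, corpus, n_terms=1):
    """Pick expansion words that co-occur with query terms in top documents.

    ASSUMPTION: co-occurrence counting replaces the rule-mining step.
    """
    query = set(query_terms)
    counts = Counter()
    for doc_id in top_docs:
        tokens = set(corpus[doc_id])
        if tokens & query:
            counts.update(tokens - query)
    return sorted(counts, key=lambda t: (-counts[t], t))[:n_terms]


def cross_language_search(id_terms, corpus):
    """Run translate -> retrieve -> expand -> retrieve again."""
    zh_query = translate_query(id_terms)
    top_docs = retrieve(zh_query, corpus)
    extra = expansion_terms(zh_query, top_docs, corpus)
    final_docs = retrieve(zh_query + extra, corpus)
    return zh_query, extra, final_docs


if __name__ == "__main__":
    print(cross_language_search(["pencarian", "informasi"], CORPUS))
```

The two-pass shape is the point of the sketch: the first retrieval exists only to supply documents from which expansion words are drawn, and the second retrieval uses the original query combined with those words.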

Description

Technical field

[0001] The invention belongs to the field of text information retrieval, and specifically relates to an Indonesian-Chinese cross-language retrieval method and system based on a matrix weighted association model. It is applicable to fields such as cross-language text information retrieval in which Chinese documents are retrieved with Indonesian queries.

Background

[0002] Cross-language information retrieval refers to the technology of retrieving information resources in other languages with a query in one language. Indonesian-Chinese cross-language information retrieval means retrieving Chinese documents with queries in Indonesian. Indonesian, the language used to express the query, is called the source language, and Chinese, the language of the retrieved documents, is called the target language. With the increasingly close exchanges between China and ASEAN countries, the research on cross-lingual information retrieval methods for ASEAN languages ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/332
CPC: G06F16/3334, G06F16/3337
Inventor 黄名选
Owner GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS