Cross-language topic detection method and system

A cross-language topic detection technology, applied in natural language translation, special data processing applications, instruments, etc. It addresses problems such as translation noise, semantic deviation, and scarce resources, with the effect of improving accuracy.

Active Publication Date: 2016-12-07
MINZU UNIVERSITY OF CHINA

AI Technical Summary

Problems solved by technology

Cross-language detection methods based on machine translation and dictionaries suffer because each language has its own characteristics: translating from the source language to the target language introduces semantic deviation and noise, which changes the meaning expressed in the source-language news report and reduces the accuracy of text and topic similarity calculations.
Therefore, the translation strategy cannot fundamentally improve the performance of cross-lingual topic detection.
The main difficulty of cross-language topic detection methods based on parallel corpora is that parallel corpora are difficult to obtain and resources are scarce.



Embodiment Construction

[0019] The technical solutions of the present invention will be further described in detail below through the drawings and embodiments.

[0020] The embodiments of the present invention provide a cross-language topic detection method and system that improve the accuracy of cross-language document similarity calculation. Cross-language topic detection is realized by building topic models based on LDA and then aligning topics across the two languages.
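As an illustration of the LDA topic-model construction mentioned in [0020], the following is a minimal sketch of a collapsed Gibbs sampler. It is not taken from the patent: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions chosen for illustration, and the patent text shown here does not state which inference procedure is used.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch only).

    docs: list of tokenized documents (lists of strings).
    Returns (theta, phi, vocab), where theta[d][k] is the document-topic
    distribution and phi[k][v] is the topic-word distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    topic_total = [0] * K                      # tokens assigned to topic k overall
    assignments = []                           # topic assignment of every token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


# Hypothetical toy corpus for one language.
docs = [
    ["economy", "market", "trade", "market"],
    ["trade", "economy", "bank"],
    ["football", "match", "goal", "match"],
    ["goal", "football", "team"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2, iters=100)
```

In a two-language setting, one such model would be fitted per language, and the resulting `theta` matrices are what a later alignment step would compare.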

[0021] The cross-language topic detection method provided by the embodiment of the present invention is described in detail below with reference to Figures 1 to 7:

[0022] As shown in Figure 1, the method includes steps 101-103:

[0023] Step 101: Construct a comparable corpus of the first language and the second language. In this embodiment, the first language is Tibetan and the second language is Chinese, by way of example.

[0024] (1) Tibetan-Chinese dictionary construction

[0025] As shown in Figure 3, a...


Abstract

The invention discloses a cross-language topic detection method and system. The method comprises the following steps: building a comparable corpus of a first language and a second language; building a first-language topic model and a second-language topic model on the basis of the comparable corpus; and, on the basis of the document-topic probability distributions generated by the two topic models, determining the alignment of first-language topics and second-language topics through similarity judgment, so as to realize cross-language topic detection. The system comprises a first generation module, a second generation module and a detection module. The method and system improve the accuracy of cross-language document similarity calculation: topic models are built on the basis of LDA (Latent Dirichlet Allocation), and cross-language topic detection is realized through cross-language topic alignment.
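The alignment step in the abstract can be sketched roughly as follows. This is an assumption-laden illustration, not the patented procedure: it assumes the comparable corpus supplies document pairs (row `d` of each matrix refers to the same pair), uses cosine similarity as the similarity judgment, and applies a hypothetical `threshold` of 0.5. The function names `cosine` and `align_topics` and the example matrices are invented for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align_topics(theta_a, theta_b, threshold=0.5):
    """Align topics of language A to topics of language B.

    theta_a[d][i]: probability of topic i in the language-A document of pair d.
    theta_b[d][j]: probability of topic j in the language-B document of pair d.
    Each topic is represented by its column, i.e. its weight across all
    document pairs; two topics are aligned when their columns are similar.
    Returns {topic_a: (topic_b, similarity)} for matches at or above threshold.
    """
    if len(theta_a) != len(theta_b):
        raise ValueError("both matrices must cover the same document pairs")
    cols_a = list(zip(*theta_a))
    cols_b = list(zip(*theta_b))
    alignment = {}
    for i, col_a in enumerate(cols_a):
        sims = [cosine(col_a, col_b) for col_b in cols_b]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= threshold:
            alignment[i] = (best, sims[best])
    return alignment


# Hypothetical document-topic distributions for four comparable document pairs.
theta_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
theta_b = [[0.2, 0.8], [0.1, 0.9], [0.9, 0.1], [0.7, 0.3]]
alignment = align_topics(theta_a, theta_b)
# Here topic 0 of language A aligns with topic 1 of language B, and vice versa.
```

Representing a topic by its column of per-document weights avoids comparing vocabularies across languages directly, which is why the sketch needs only the document pairing and not a translation of the topic words.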

Description

Technical field

[0001] The invention relates to the technical field of cross-language topic detection, in particular to a cross-language topic detection method and system based on a comparable corpus.

Background art

[0002] Research on cross-language topic detection helps people of different countries and nationalities share knowledge, strengthens network information security in the countries and ethnic regions concerned, promotes the economic and cultural development of China's ethnic regions, promotes national unity, and provides important support for building a "harmonious society" and a social environment of "scientific development".

[0003] Currently, there are three main approaches to cross-language topic detection, based respectively on machine translation, bilingual dictionaries, and bilingual parallel corpora. For cross-language detection methods based on machine translation and dictionaries, since each language has its own characteristics, there will be semantic deviations and...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28
CPC: G06F40/47
Inventors: 孙媛, 赵倩
Owner: MINZU UNIVERSITY OF CHINA