A text analysis method and device for large-scale multilingual data

Text analysis and multilingual technology, applied in text database clustering/classification, text database querying, unstructured text data retrieval, etc., to improve clustering accuracy.

Pending Publication Date: 2019-05-07
INFORMATION RES INST OF SHANDONG ACAD OF SCI
Cites: 2; Cited by: 4

AI Technical Summary

Problems solved by technology

[0003] Most existing text analysis methods are designed for a single language. When applied to multilingual text, they often produce poor results, mainly because they tend to consider only the information of a single language and therefore struggle to discover the latent correlations among multiple languages.


Detailed Description of Embodiments

[0049] It should be noted that the following detailed description is exemplary and intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

[0050] It should be noted that the terminology used herein is only for describing specific embodiments and is not intended to limit the exemplary embodiments according to the present disclosure. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well. It should also be understood that the terms "comprises" and/or "includes", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

[0051] One or more embodiments provide a text analysis method for large-scale multilingual data, the method includes the follow...


Abstract

The invention discloses a text analysis method and device for large-scale multilingual data. The method comprises the following steps: collecting large-scale multilingual text data and storing it in a corresponding database; performing entity matching on the multilingual text data in the database using a Markov logic network; and performing cluster analysis on the matched multilingual text data using the ML-PIB algorithm to obtain a target clustering result across the different language information. In this way, the associations contained in the different language information are effectively mined and clustering quality is improved.
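The abstract names a Markov logic network (MLN) for the entity-matching step but gives no rules or weights. As a rough illustration only: in an MLN the probability of a possible world x is proportional to exp(Σ_i w_i·n_i(x)), where n_i(x) is the number of true groundings of formula i. When every atom except one candidate Match(a, b) is observed evidence, the conditional match probability reduces to comparing the two worlds in which that atom is true or false. The Python sketch below does exactly that. The predicates (`dict_translation`, `same_transliteration`, `type_conflict`), the rules, and their weights are hypothetical and not taken from the patent, and the ML-PIB clustering step is not reproduced.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

# Observed evidence about one candidate pair of entity mentions
# (e.g. a Chinese mention and an English mention).
# Predicate names are HYPOTHETICAL, chosen for illustration only.
Evidence = Dict[str, bool]


@dataclass
class WeightedFormula:
    name: str
    weight: float
    # Returns True if the ground formula is satisfied in the world where
    # the query atom Match(a, b) has truth value `match`.
    holds: Callable[[Evidence, bool], bool]


def implies(premise: bool, conclusion: bool) -> bool:
    return (not premise) or conclusion


def match_probability(formulas: List[WeightedFormula], ev: Evidence) -> float:
    """P(Match(a, b) = True | evidence) for a single unknown query atom.

    In an MLN, P(world) is proportional to exp(sum_i w_i * n_i(world)).
    With only one unknown atom there are two candidate worlds, so we
    score both and normalize.
    """
    def score(match: bool) -> float:
        return sum(f.weight for f in formulas if f.holds(ev, match))

    s_true, s_false = score(True), score(False)
    # Subtract the max before exponentiating for numerical stability.
    m = max(s_true, s_false)
    e_true, e_false = math.exp(s_true - m), math.exp(s_false - m)
    return e_true / (e_true + e_false)


# HYPOTHETICAL rules and weights; a real system would learn the weights.
RULES = [
    WeightedFormula("dictionary translation => match", 2.0,
                    lambda ev, m: implies(ev["dict_translation"], m)),
    WeightedFormula("same transliteration => match", 1.5,
                    lambda ev, m: implies(ev["same_transliteration"], m)),
    WeightedFormula("entity type conflict => not match", 3.0,
                    lambda ev, m: implies(ev["type_conflict"], not m)),
    WeightedFormula("prior: not match", 1.0,
                    lambda ev, m: not m),
]

if __name__ == "__main__":
    ev = {"dict_translation": True,
          "same_transliteration": False,
          "type_conflict": False}
    print(round(match_probability(RULES, ev), 3))  # prints 0.731
```

With dictionary-translation evidence and no type conflict, the two worlds differ in score by 1.0, so the result is the logistic function of 1.0 (about 0.731). A full MLN would learn the weights from data and reason jointly over many interdependent query atoms, typically with approximate inference such as MC-SAT or Gibbs sampling.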

Description

Technical field

[0001] The disclosure relates to the field of multilingual text analysis, in particular to a text analysis method and device for large-scale multilingual data.

Background

[0002] With the rapid development of the Internet and the deepening trend of globalization, network data has grown explosively, and the era of big data has arrived. Web text contains a large amount of multilingual text data. At the same time, with the rise of translation systems, many texts are also translated into other languages, which further promotes the generation of multilingual text data.

[0003] Most existing text analysis methods are designed for a single language. When applied to multilingual text, they often produce poor results, mainly because they tend to consider only the information of a single language. It is difficult to effectively discover potential correlation inf...

Claims


Application Information

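Patent Type & Authority: Application (China)
IPC(8): G06F16/35; G06F16/33
Inventors: 杨子江 (Yang Zijiang), 于俊凤 (Yu Junfeng), 朱世伟 (Zhu Shiwei), 徐蓓蓓 (Xu Beibei), 魏墨济 (Wei Moji), 李晨 (Li Chen), 李思思 (Li Sisi), 刘翠芹 (Liu Cuiqin), 李宪毅 (Li Xianyi)
Owner: INFORMATION RES INST OF SHANDONG ACAD OF SCI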