Method for multi-granularity representation of text information

A text information representation technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problem that single-granularity text representation cannot provide an optimal representation of the text, and achieves superior performance, high robustness and stability while avoiding complex operations.

Inactive Publication Date: 2009-03-04
INST OF AUTOMATION CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0005] To solve the problem that the single-granularity text representation of the prior art cannot provide an optimal representation of text, the purpose of the present invention is to study multi-granularity text representation methods and improve the performance of text information representation, thereby advancing intelligent text processing tasks such as information retrieval, text classification, text clustering, and text content analysis. To this end, the present invention provides a text information representation method based on multi-granularity text features.

Examples

Example

[0051] This example takes text representation in a text classification application and uses the n-gram method to implement the multi-granularity representation of text, where different values of n correspond to different granularities. In the multi-granularity text learning stage, a separate n-gram model is learned for each n, and the n-gram text models are then integrated into a multi-layer structure model. This embodiment illustrates the implementation process with a three-layer n-gram model composed of Unigram, Bigram and Trigram text models; the text features used are the grams of each order. The details are as follows:

[0052] 1. Learning of multi-granularity text model

[0053] Unigram, Bigram, and Trigram represent text representation models with a granularity of 1 to 3, respectively. N-grams are learned for each of them, and the probability d...
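
The learning step for the three granularities can be sketched as follows. This is only an illustration under stated assumptions, not the patent's procedure: the source text is truncated before it specifies the estimator, so the sketch assumes plain relative-frequency estimates with no smoothing, and it assumes each document is already a list of tokens. The function names `extract_ngrams`, `learn_ngram_model` and `learn_multi_granularity_models` are hypothetical.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all n-grams of order n in a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def learn_ngram_model(documents, n):
    """Estimate a distribution over n-grams of order n.

    Assumption: relative frequency over the whole training set,
    without smoothing. The source does not state the estimator.
    """
    counts = Counter()
    for tokens in documents:
        counts.update(extract_ngrams(tokens, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def learn_multi_granularity_models(documents, orders=(1, 2, 3)):
    """Learn one model per granularity: Unigram, Bigram, Trigram by default."""
    return {n: learn_ngram_model(documents, n) for n in orders}


# Toy training set; tokens could equally be characters or words.
corpus = [["a", "b", "a"], ["b", "a"]]
models = learn_multi_granularity_models(corpus)
for n, model in models.items():
    print(n, model)
```

Keeping one dictionary per order mirrors the layered structure described above: each layer can be inspected or reweighted on its own before the layers are integrated.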

Abstract

The invention relates to a text information representation method based on multi-granularity text features. In the method, multi-granularity text representation models are trained and then integrated, forming a multi-granularity integrated representation of text information. The method solves the problem of integrating multi-granularity text features based on overall weights and local text features. It is robust and stable with respect to corpus scale and sparse data. By acquiring the multi-granularity semantic space mapping of the text, the semantic structures implied in the text can be represented accurately and fully. Based on the correlation between multi-granularity text features, the advantages of fine-granularity and coarse-granularity text representation can be used together. With training corpora of different scales, the method's text representation performance is better than that of single-granularity text representation methods. Although the multi-granularity text representation method has a multilayer structure, the relations between the layers are clear, and the complex parameter-adjustment operations required by many text representation methods are eliminated.

Description

Technical field

[0001] The present invention relates to the technical fields of intelligent information processing, information retrieval and natural language processing, and in particular to a text information representation method. The method is used for text representation in information retrieval and other text processing applications, so that text can be processed and analyzed by a computer, and it shows high performance in information retrieval, text classification, and other text processing tasks.

Background

[0002] Text processing is an important part of information processing technology and plays a central role in information retrieval, text information analysis, natural language processing and other fields. The first step in text processing is to represent text information in a form that can be analyzed by computer programs. The quality of the text information representation method directly affects the effectiveness and efficiency of text processing. Especi...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/21
Inventor: 戴汝为, 朱远平, 王春恒
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI