Automatic end-to-end English text structure analysis method based on pipeline mode

An automatic text analysis technology, applied in semantic analysis, character and pattern recognition, natural language data processing, etc., that improves analysis accuracy

Active Publication Date: 2017-10-20
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

At the same time, aiming at the problem that linguistic features cannot dig out deeper semantics, by carefully analyzing the characteristics of non-explicit discours


Examples

Embodiment Construction

[0051] The present invention will be described in detail below with reference to the accompanying drawings and examples. The experimental methods used in the following examples are conventional methods unless otherwise specified.

[0052] The first stage is training. As shown in Figure 1, the process is as follows:

[0053] 1. Prepare the training corpus. The implementation steps are as follows:

[0054] (1) Use sections 02-21 of the Penn Discourse Treebank (PDTB) version 2.0 as the training corpus. For explicit discourse relations, extract the connective, the argument spans (Arg1, Arg2), the discourse relation category, and the corresponding original text, and obtain the corresponding part-of-speech tags and syntactic parses. For non-explicit discourse relations, extract the argument spans, the discourse relation category, and the corresponding part-of-speech tagging and syntactic analysis results of the original...
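The corpus split described in step (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Relation` record and the `select_training` helper are hypothetical names, and the record layout is an assumption that does not mirror the PDTB file format. Part-of-speech tags and parses are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Relation:
    # Hypothetical in-memory record; NOT the PDTB on-disk format.
    section: int                      # corpus section number, e.g. 2 for section 02
    rel_type: str                     # "Explicit" or a non-explicit type such as "Implicit"
    sense: str                        # discourse relation category
    arg1: str                         # text of Arg1
    arg2: str                         # text of Arg2
    connective: Optional[str] = None  # present only for explicit relations


def select_training(
    relations: List[Relation], first: int = 2, last: int = 21
) -> Tuple[list, list]:
    """Keep relations from sections first..last and split them by type."""
    explicit, non_explicit = [], []
    for rel in relations:
        if not (first <= rel.section <= last):
            continue  # outside the training sections
        if rel.rel_type == "Explicit":
            # Explicit relations keep their connective.
            explicit.append((rel.connective, rel.arg1, rel.arg2, rel.sense))
        else:
            # Non-explicit relations have no connective to extract.
            non_explicit.append((rel.arg1, rel.arg2, rel.sense))
    return explicit, non_explicit
```

Keeping the two groups separate from the start matches the text, which extracts connectives only for explicit relations and treats non-explicit relations with a different feature set.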

Abstract

The invention relates to an automatic end-to-end English text structure analysis method based on a pipeline mode and belongs to the technical field of natural language processing applications. For explicit discourse relation identification, to address the shortcomings of the bag-of-words feature vectorization used in traditional methods, the method proposes a feature representation and computation scheme that combines a hybrid convolution tree kernel with a polynomial kernel, handling syntactic features and flat features separately. This greatly reduces the dimensionality of the feature vector while still fully expressing the detailed information in the features. For non-explicit discourse relation identification, linguistic features cannot mine deeper semantics, and traditional methods suffer from data sparsity and semantic gaps. The method therefore provides a deep-learning-based identification model for non-explicit discourse relations, built on a detailed analysis of the characteristics of non-explicit relation identification and exploiting the advantages of word-pair features. Compared with the prior art, the precision of the whole end-to-end system is improved.
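To make the idea of combining a convolution tree kernel with a polynomial kernel concrete, here is a minimal sketch. It is not the patent's formulation, which this excerpt does not give. The assumptions are: the tree kernel is the standard subset-tree kernel with decay `lam`; trees are encoded as nested tuples; the tree kernel is normalized; and the two kernels are mixed linearly with a weight `alpha`. All function names are hypothetical.

```python
import math


def internal_nodes(tree):
    """Collect every internal node; a node is (label, child, ...), a str is a word."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(internal_nodes(child))
    return found


def production(node):
    """Grammar production at a node: its label plus the kind and label of each child."""
    return (node[0], tuple(("word", c) if isinstance(c, str) else ("node", c[0])
                           for c in node[1:]))


def common_fragments(n1, n2, lam):
    """Decay-weighted count of common tree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # word children already matched via the production
            score *= 1.0 + common_fragments(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=0.4):
    """Subset-tree kernel: sum over all node pairs of the two trees."""
    return sum(common_fragments(a, b, lam)
               for a in internal_nodes(t1) for b in internal_nodes(t2))


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel over flat (non-syntactic) feature vectors."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def combined_kernel(a, b, alpha=0.5, lam=0.4):
    """Assumed linear mix: alpha * normalized tree kernel + (1 - alpha) * polynomial kernel."""
    (tree_a, vec_a), (tree_b, vec_b) = a, b
    norm = math.sqrt(tree_kernel(tree_a, tree_a, lam) * tree_kernel(tree_b, tree_b, lam))
    k_tree = tree_kernel(tree_a, tree_b, lam) / norm if norm else 0.0
    return alpha * k_tree + (1.0 - alpha) * poly_kernel(vec_a, vec_b)
```

Each instance is a `(parse_tree, flat_vector)` pair, for example `(("S", ("NP", "he"), ("VP", "left")), [1, 0])`. Because the tree kernel compares parse trees directly, the syntactic features never have to be expanded into a high-dimensional bag-of-words vector. That is the dimensionality benefit the abstract describes.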

Description

Technical field

[0001] The present invention relates to an end-to-end automatic analysis method for English discourse structure based on a pipeline mode, and in particular to an explicit discourse relation analysis method based on the combination of a hybrid convolution tree kernel and a polynomial kernel, and a non-explicit discourse relation analysis method based on deep learning. The invention belongs to the technical field of natural language processing applications.

Background

[0002] Discourse analysis has always been a core task of natural language processing. The discourse context information and discourse-level semantic information it provides are of great significance to other natural language processing tasks such as machine translation, sentiment analysis, and automatic question answering. Discourse structure analysis is one of the important approaches to discourse analysis, which aims to study the composition ...

Application Information

IPC(8): G06F17/27, G06K9/62
CPC: G06F40/211, G06F40/30, G06F18/2414, G06F18/2411
Inventor: 鉴萍, 张鹏程, 黄河燕
Owner: BEIJING INSTITUTE OF TECHNOLOGY