Automatic analysis method for Chinese syntax based on corpus and tree-structure pattern matching

A tree-structure pattern-matching technology, applied in special data processing applications, instruments, electrical digital data processing, etc., can solve problems such as a large amount of calculation, sparse data, and ignoring linguistic semantic constraints and associations, achieving high efficiency, improved average precision and recall, and large processing granularity.

Inactive Publication Date: 2008-12-24
NANJING UNIV

AI Technical Summary

Problems solved by technology

The main problems of statistical methods are data sparsity, the neglect of contextual structure information, ...

Embodiment Construction

[0032] 1. Build a data support platform for pattern matching.

[0033] Taking the syntactic treebank as the corpus resource, related processing algorithms derive the syntactic sub-tree bank, the syntactic pattern library, the syntactic sub-pattern library, the pattern specification library, the pattern index library, the statistical sentence pattern library, etc., which together provide the processing platform for syntactic analysis based on pattern matching. Since the core data structure in the present invention is the tree-structured pattern, it is strictly defined as follows.
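As a rough illustration of deriving such libraries from a treebank, the following minimal sketch collects every internal subtree and indexes the label sequence of its direct children. The tuple-based tree format, the function names, and the choice of "direct children" as the pattern are all assumptions made for illustration; the source does not specify the actual derivation algorithms or storage layout.

```python
from collections import Counter, defaultdict

# Hypothetical tree format: (label, [children]); a leaf has an empty child list.

def subtrees(tree):
    """Yield the subtree rooted at every internal node (a toy 'sub-tree bank')."""
    label, children = tree
    if children:
        yield tree
        for child in children:
            yield from subtrees(child)

def child_pattern(tree):
    """Simplest possible pattern of a subtree: labels of its direct children."""
    return tuple(child[0] for child in tree[1])

def build_libraries(treebank):
    """Return (pattern frequencies, index from child sequence to root labels)."""
    pattern_counts = Counter()        # toy 'pattern library' with counts
    pattern_index = defaultdict(set)  # toy 'pattern index library'
    for tree in treebank:
        for sub in subtrees(tree):
            pat = child_pattern(sub)
            pattern_counts[(sub[0], pat)] += 1
            pattern_index[pat].add(sub[0])
    return pattern_counts, pattern_index

# One illustrative tree: S -> NP VP, NP -> n, VP -> v NP
treebank = [
    ("S", [("NP", [("n", [])]),
           ("VP", [("v", []), ("NP", [("n", [])])])]),
]
counts, index = build_libraries(treebank)
```

Indexing by child sequence lets a matcher look up which phrase labels a given sequence of constituents can reduce to; a real system would presumably index richer patterns than direct children.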

[0034] Definition of syntax pattern: For a syntax tree, draw a line from left to right that only passes through the nodes in the tree. If the nodes on this line satisfy the following constraints, the sequence of nodes is a syntax pattern.

[0035] The nodes on this line form a proper subset C of the set D of all nodes in the tree, and:

[0036] (1) No node in C lies on any path of successor nodes starting from another node in C;

[0037] (2) No other node ...
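Constraint (1) and the proper-subset condition can be checked mechanically. The sketch below assumes a hypothetical representation in which the tree is given as a child-to-parent map; it does not implement constraint (2), whose text is truncated in the source.

```python
def ancestors(node, parent):
    """Yield the ancestors of `node`, nearest first, using a child->parent map."""
    while node in parent:
        node = parent[node]
        yield node

def satisfies_constraint_1(candidate, parent, all_nodes):
    """Check that C is a proper subset of D and that no node in C is a
    successor (descendant) of another node in C. Constraint (2) is NOT checked."""
    c = set(candidate)
    if not c < set(all_nodes):  # strict subset test
        return False
    return all(a not in c for n in c for a in ancestors(n, parent))

# Illustrative tree: S -> NP1 VP, VP -> v NP2 (node names are hypothetical)
parent = {"NP1": "S", "VP": "S", "v": "VP", "NP2": "VP"}
all_nodes = {"S", "NP1", "VP", "v", "NP2"}

ok_cut = satisfies_constraint_1(["NP1", "v", "NP2"], parent, all_nodes)
bad_cut = satisfies_constraint_1(["NP1", "VP", "NP2"], parent, all_nodes)
```

Here `["NP1", "v", "NP2"]` passes, while `["NP1", "VP", "NP2"]` fails because NP2 is a successor of VP.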


Abstract

The invention discloses an automatic analysis method for Chinese syntax based on corpora and tree-structure pattern matching. Based on deep analysis and complete segmentation of an annotated Chinese corpus, and using the syntactic patterns and semantic collocation correspondences extracted from that corpus, the method performs pattern matching and conversion on the sentences to be processed and obtains an optimal syntactic analysis result through semantic disambiguation. The automatic syntactic analysis system of the invention comprises a module for extracting, storing and retrieving syntactic patterns from a syntactic treebank, a sentence pattern statistics module, a syntactic pattern matching module, a local conversion module for approximate patterns, and a semantic disambiguation module. Experiments show that, compared with traditional syntactic analysis, the method pays more attention to combining overall matching with local conversion of syntactic patterns, has large processing granularity and high efficiency, and improves average precision and recall by about 10 percent.

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a new automatic Chinese syntactic analysis method and processing system, namely a method and system based on a corpus and tree-structure pattern matching.

Background technique

[0002] Automatic syntactic analysis, from a formal point of view, transforms a linear sequence of language elements (words) into a hierarchical, three-dimensional structure of language chunks; from a logical point of view, it determines the internal relationships among the language elements and their combinations in a sentence.

[0003] Syntactic parsing is one of the key technologies in natural language processing research, and its results directly affect the understanding of natural language sentences. Natural language understanding is the basis of many language processing technologi...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 陈家骏, 张亮, 戴新宇, 尹存燕
Owner: NANJING UNIV