State transition and neural network-based Chinese chunk parsing method

A state-transition and neural-network technique, applied in natural language data processing, special data processing applications, instruments, etc., that addresses problems such as the inability to make full use of chunk-level and long-distance information features.

Active Publication Date: 2016-10-12
NANJING UNIV
Cites: 1; Cited by: 20

AI Technical Summary

Problems solved by technology

[0006] Purpose of the invention: The models used in current Chinese chunk analysis cannot make full use of chunk-level and long-distance information features, and they require complex combined feature templates to be designed by hand. The present invention proposes a method based on state transition and neural networks to alleviate these limitations and improve the accuracy of Chinese chunk analysis.


Examples

Embodiment 1

[0176] The model parameters in this embodiment are first trained, following the method given in the specification's additional explanation of model parameter training, on 9978 sentences from 728 files of the Chinese Penn Treebank (CTB) 4.0. The file numbers run from chtb_001.fid to chtb_899.ptb; the numbers are not consecutive, so there are only 728 files.

[0177] In this embodiment, the complete process of analyzing the chunks of a sentence with the state transition and neural network-based Chinese chunk analysis method of the present invention is as follows:

[0178] Step 1-1, define the Chinese chunk types. Twelve types are defined on the basis of the Chinese Penn Treebank CTB 4.0: ADJP, ADVP, CLP, DNP, DP, DVP, LCP, LST, NP, PP, QP, VP. For their specific meanings, see step 1-1 in the specification;

[0179] Step 1-2, determine the...
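The chunk types from step 1-1 can be turned into per-word labels, which is the usual way to treat chunking as a tagging problem. The following Python sketch illustrates this under a stated assumption: it uses the common BIO scheme (B-X opens a chunk of type X, I-X continues it, O marks words outside any chunk). The text shown here does not say which tag scheme the patent uses, so the scheme and the function names `chunks_to_tags` and `tags_to_chunks` are illustrative only.

```python
# The 12 chunk types listed in step 1-1.
CHUNK_TYPES = ["ADJP", "ADVP", "CLP", "DNP", "DP", "DVP",
               "LCP", "LST", "NP", "PP", "QP", "VP"]


def chunks_to_tags(n_words, chunks):
    """Convert (start, end, type) chunks (end exclusive) into per-word tags.

    Assumption of this sketch: the BIO scheme, not necessarily the patent's.
    """
    tags = ["O"] * n_words
    for start, end, ctype in chunks:
        if ctype not in CHUNK_TYPES:
            raise ValueError(f"unknown chunk type: {ctype}")
        if not 0 <= start < end <= n_words:
            raise ValueError("chunk span out of range")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError("chunks must not overlap")
        tags[start] = "B-" + ctype
        for i in range(start + 1, end):
            tags[i] = "I-" + ctype
    return tags


def tags_to_chunks(tags):
    """Recover (start, end, type) chunks from a BIO tag sequence."""
    chunks = []
    start, ctype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last chunk
        continues = tag.startswith("I-") and ctype == tag[2:]
        if start is not None and not continues:
            chunks.append((start, i, ctype))
            start, ctype = None, None
        if tag.startswith("B-"):
            start, ctype = i, tag[2:]
    return chunks
```

For a four-word sentence whose first two words form an NP and whose last word forms a VP, `chunks_to_tags(4, [(0, 2, "NP"), (3, 4, "VP")])` yields one tag per word, and `tags_to_chunks` maps those tags back to the same spans. Because chunks are non-nested and non-overlapping, the two representations carry the same information.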

Embodiment 2

[0205] The algorithms used in the present invention are all implemented in C++. The machine used for the experiments in this embodiment has an Intel(R) Core(TM) i7-5930K processor with a clock frequency of 3.50 GHz and 64 GB of memory. The model parameters are first trained, following the method given in the specification's additional explanation of model parameter training, on 9978 sentences from 728 files of the Chinese Penn Treebank (CTB) 4.0 (file numbers from chtb_001.fid to chtb_899.ptb; the numbers are not consecutive, so there are only 728 files). The experimental test performs chunk analysis on 5290 sentences from 110 files (file numbers from chtb_900.fid to chtb_1078.ptb; the numbers are not consecutive, so there are only 110 files). The results are shown in Table 7:

[0206] Ta...
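The train/test partition described in paragraph [0205] is defined purely by file number, so it can be reproduced with a small filter. The Python sketch below is an illustration only (the patent's own implementation is in C++); the function names and the assumption that every file is named `chtb_<number>.<extension>` are mine, inferred from the file names quoted in the text.

```python
import re

TRAIN_RANGE = (1, 899)    # chtb_001 .. chtb_899, per the embodiment
TEST_RANGE = (900, 1078)  # chtb_900 .. chtb_1078, per the embodiment

# Assumed naming pattern: "chtb_" + digits + "." + extension (e.g. .fid, .ptb).
_FILE_RE = re.compile(r"^chtb_(\d+)\.\w+$")


def file_number(name):
    """Return the numeric part of a CTB file name, or None if it does not match."""
    match = _FILE_RE.match(name)
    return int(match.group(1)) if match else None


def split_files(names):
    """Partition file names into (train, test) lists by file number.

    Names that do not match the pattern or fall outside both ranges are skipped.
    """
    train, test = [], []
    for name in names:
        number = file_number(name)
        if number is None:
            continue
        if TRAIN_RANGE[0] <= number <= TRAIN_RANGE[1]:
            train.append(name)
        elif TEST_RANGE[0] <= number <= TEST_RANGE[1]:
            test.append(name)
    return train, test
```

Because the file numbers are not consecutive, the number of files in each part has to be counted from the directory listing rather than derived from the range bounds.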

Abstract

The invention proposes a Chinese chunk parsing method based on state transition and neural networks. The method converts the chunk parsing task into a sequence labeling task and labels a sentence within a state transition-based framework. During labeling, a feedforward neural network scores the transition operations available in each state. Distributed representations of words and part-of-speech tags, learned by a bidirectional long short-term memory (LSTM) neural network model, serve as additional information features for the labeling model, which improves the accuracy of chunk parsing. Compared with other Chinese chunk parsing technologies, the method has three advantages: the state transition-based framework allows chunk-level features to be added more flexibly, the neural network learns how to combine features automatically, and the bidirectional LSTM model contributes useful additional information features. The combination of the three effectively improves the accuracy of chunk parsing.
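The labeling loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the patented implementation: it decodes greedily (the text here does not say which search strategy is used), it uses BIO-style actions for a single chunk type, its weights are random rather than trained, and it omits the bidirectional LSTM features entirely. All names (`feedforward_scores`, `greedy_tag`, `featurize`) are hypothetical.

```python
import math
import random

ACTIONS = ["B-NP", "I-NP", "O"]  # toy action set; BIO scheme is an assumption


def feedforward_scores(features, w_hidden, w_out):
    """One tanh hidden layer followed by a linear output layer (no biases)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)))
              for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def featurize(position, tags):
    """Hypothetical state features: bias, chunk-open flag, open chunk length.

    The last feature is a chunk-level feature: it depends on the chunk built
    so far, which the state (the tag history) makes available.
    """
    open_len = 0
    for tag in reversed(tags):
        if tag == "O":
            break
        open_len += 1
        if tag.startswith("B-"):
            break
    return [1.0, 1.0 if open_len else 0.0, float(open_len)]


def greedy_tag(n_words, actions, featurize_fn, score_fn):
    """Tag left to right; each state is (position, tags chosen so far)."""
    tags = []
    for position in range(n_words):
        scores = score_fn(featurize_fn(position, tags))
        best = max(range(len(actions)), key=lambda a: scores[a])
        tags.append(actions[best])
    return tags


# Untrained random weights: 3 input features, 4 hidden units, 3 actions.
_rng = random.Random(0)
W_HIDDEN = [[_rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
W_OUT = [[_rng.uniform(-1, 1) for _ in range(4)] for _ in range(len(ACTIONS))]

predicted = greedy_tag(5, ACTIONS, featurize,
                       lambda f: feedforward_scores(f, W_HIDDEN, W_OUT))
```

The point of the sketch is the division of labor: the state carries the partial analysis, `featurize` can read chunk-level information from it, and the network turns those features into one score per transition. With untrained weights the predicted tags are meaningless; a real system would learn the weights from the treebank and feed in word and part-of-speech representations as well.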

Description

Technical field

[0001] The invention relates to a method for shallow syntactic analysis of Chinese by computer, in particular to a method for automatically analyzing Chinese chunks that combines state transition with neural networks.

Background

[0002] Chinese syntactic analysis is a basic task in Chinese information processing. Broad demand for its applications has attracted a great deal of related research, which has driven the rapid development of the related technologies. Because of factors such as the high complexity of the problem itself, full syntactic analysis has low accuracy and low speed, which limits its practicality. Chunk analysis, also known as shallow syntactic analysis, differs from full syntactic analysis, whose goal is a complete syntax tree of the sentence: it only needs to identify certain non-nested phrase components such as noun phrases and verb phrases. Since its recognition target is non-nesting and non-overlapping phrase components that me...

Application Information

IPC(8): G06F17/27
CPC: G06F40/211, G06F40/289
Inventors: 戴新宇, 程川, 陈家骏, 黄书剑, 张建兵
Owner: NANJING UNIV