Combined processing method of word segmentation, part-of-speech tagging, entity identification and syntactic analysis

A joint processing technology for part-of-speech tagging and entity recognition, applicable to electronic digital data processing, special data processing applications, and natural language data processing. It addresses problems such as error propagation between tasks and the resulting decline in task accuracy.

Active Publication Date: 2018-07-13
BEIJING INSTITUTE OF TECHNOLOGY
7 Cites, 32 Cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to solve a problem of the traditional approach: when word segmentation, part-of-speech tagging, entity recognition, and constituent (component) syntactic analysis are performed as a pipeline, errors propagate from one task to the next and task accuracy decreases. The joint processing method of taggin
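
The error propagation problem described above can be illustrated with a small calculation. This is a minimal sketch, not taken from the patent: it assumes (hypothetically) that each pipeline stage is correct with a fixed probability when its input is correct, that stages fail independently, and that any upstream error spoils everything downstream. The function name `pipeline_accuracy` and all accuracy figures are made up for illustration.

```python
from math import prod


def pipeline_accuracy(stage_accuracies):
    """Estimate end-to-end accuracy of a pipeline as the product of stage accuracies.

    Assumption (illustrative only): stages fail independently and an upstream
    error always makes the final result wrong.
    """
    return prod(stage_accuracies)


# Hypothetical per-stage accuracies; these numbers are NOT from the patent.
stages = {
    "word segmentation": 0.97,
    "part-of-speech tagging": 0.95,
    "entity recognition": 0.90,
    "syntactic analysis": 0.85,
}

end_to_end = pipeline_accuracy(stages.values())
print(f"end-to-end accuracy under these assumptions: {end_to_end:.3f}")
```

Under these assumptions the end-to-end figure is lower than the accuracy of any single stage, which is the effect a joint model is meant to avoid.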

Examples


Embodiment Construction

[0084] This embodiment describes the complete process of the "joint processing method of word segmentation, part-of-speech tagging, entity recognition and syntactic analysis" from building a model to training a model and then using the model to analyze Chinese sentences.

[0085] Figure 1 is a flow chart of the implementation of the method proposed in the present invention. To illustrate the relevant content more clearly, it is described together with the other drawings.

[0086] Step A: In accordance with the purpose of the present invention, construct a joint model. This includes defining the joint model structure, the feature templates, the transition (transfer) action set of the joint model, the calculation method of the feature vector, the training method of the joint model, and the loss function of the joint model. Specifically:

[0087] Step A.1: Define the joint model structure:

[0088] First construct the n-gram bi-LSTM neural network, the structure...

Abstract

The invention discloses a combined processing method for word segmentation, part-of-speech tagging, entity identification, and syntactic analysis, and belongs to the technical field of natural language processing. The core idea of the method is as follows: first, a combined model is constructed step by step; second, existing entity data and constituent (component) syntax tree data are used to construct combined syntax tree data; third, training data is extracted from the combined syntax tree data; fourth, the training data is used to train the combined model; finally, the trained combined model analyzes the Chinese sentences to be analyzed and produces a combined syntax tree as the analysis result. The method effectively avoids the error propagation problem, and because it is a transition-based (transfer-based) analysis method, its execution speed is ensured.
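
To make the idea of a transition-based analysis that yields one combined tree more concrete, here is a minimal sketch. The action names (`SHIFT`, `APPEND`, `UNARY`, `REDUCE`), the `Node` class, and the labels in the example are all hypothetical and chosen for illustration; the patent's actual transition action set is not shown in this excerpt and may differ. The sketch only replays a given action sequence; choosing the actions is the job of the trained model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                     # POS tag, entity type, or phrase label
    children: list = field(default_factory=list)   # empty for word nodes
    text: str = ""                                 # characters of a word node


def run(chars, actions):
    """Replay a sequence of (hypothetical) transition actions over characters."""
    stack, i = [], 0
    for act in actions:
        name = act[0]
        if name == "SHIFT":        # start a new word with a POS tag
            stack.append(Node(label=act[1], text=chars[i]))
            i += 1
        elif name == "APPEND":     # extend the word on top of the stack
            if not stack or stack[-1].children:
                raise ValueError("APPEND needs a word node on top of the stack")
            stack[-1].text += chars[i]
            i += 1
        elif name == "UNARY":      # wrap the top node, e.g. with an entity type
            stack.append(Node(label=act[1], children=[stack.pop()]))
        elif name == "REDUCE":     # merge the top two nodes into a phrase
            right, left = stack.pop(), stack.pop()
            stack.append(Node(label=act[1], children=[left, right]))
        else:
            raise ValueError(f"unknown action: {name}")
    if i != len(chars) or len(stack) != 1:
        raise ValueError("action sequence does not yield a single complete tree")
    return stack[0]


def bracket(node):
    """Render a tree in bracketed form."""
    if not node.children:
        return f"({node.label} {node.text})"
    return "(" + node.label + " " + " ".join(bracket(c) for c in node.children) + ")"


actions = [
    ("SHIFT", "PN"), ("SHIFT", "VV"), ("SHIFT", "NR"), ("APPEND",),
    ("UNARY", "LOC"), ("REDUCE", "VP"), ("REDUCE", "IP"),
]
print(bracket(run(list("我爱北京"), actions)))
```

In this sketch a single action sequence decides word boundaries, part-of-speech tags, an entity label, and phrase structure together, so no stage consumes the fixed output of an earlier one.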

Description

Technical field

[0001] The invention relates to a joint processing method for word segmentation, part-of-speech tagging, entity recognition, and syntactic analysis, and belongs to the technical field of natural language processing.

Background technique

[0002] Word segmentation, part-of-speech tagging, entity recognition, and syntactic analysis are all important basic tasks in the field of natural language processing. Word segmentation enables the model to accurately identify the words in a sentence. In some languages, such as Chinese, sentences contain no explicit word boundary markers and no spaces between words, yet text analysis often needs to operate at the word level, so word segmentation has become a necessary basic task. Part-of-speech tagging determines the grammatical category of each word in a sentence and labels the word with its part of speech. Entity recognition is to identify entities with specific mea...
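
The word segmentation task described in the background can be pictured as labeling each character with its position inside a word. The sketch below uses the common B/M/E/S scheme (begin, middle, end, single-character word); this encoding is an assumption chosen for illustration and is not stated to be the representation used in the patent. The function names are hypothetical.

```python
def words_to_bmes(words):
    """Convert a list of words into per-character B/M/E/S position tags."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):      # a word ends at this character
            words.append(current)
            current = ""
    if current:                    # keep a trailing unfinished word
        words.append(current)
    return words


words = ["我", "爱", "北京"]
tags = words_to_bmes(words)
print(tags)
print(bmes_to_words("".join(words), tags))
```

Because the unsegmented sentence carries no spaces, the tags are the only record of where words begin and end.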

Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/355, G06F40/211, G06F40/295
Inventor 郭平常薇辛欣
Owner BEIJING INSTITUTE OF TECHNOLOGY