Coordinated training-based dual-language named entity identification method

A named entity recognition technique, applied in the field of natural language processing (NLP), that addresses the cross-domain performance degradation of supervised learning methods, their unsuitability for low-resource languages, and the unsatisfactory performance of unsupervised methods. It achieves reduced domain dependence, improved annotation consistency, and strong generalization ability.

Active Publication Date: 2014-06-11

BEIJING INSTITUTE OF TECHNOLOGY

5 Cites, 34 Cited by

AI Technical Summary

Problems solved by technology

Among statistical methods, supervised learning performs well on named entity recognition, but it has two shortcomings. First, it requires a large amount of labeled data to learn accurately, so it is not suitable for languages with relatively poor resources. Second, when the existing labeled data and the data to be tagged do not belong to the same domain, the performance of supervised learning drops significantly.

The performance of unsupervised methods is not satisfactory.

Embodiment Construction

[0019] The specific implementation manner of the present invention will be described in further detail below in conjunction with the accompanying drawings.

[0020] A bilingual named entity recognition method based on collaborative training, comprising the following steps:

[0021] Step 1. Initialize the bilingual sequence tagging models: train the Chinese and English sequence tagging models, Cmodel(s) and Cmodel(t), on the labeled corpora Ls and Lt respectively, which are aligned at the Chinese-English sentence level. Three named entity types are annotated in the corpus: PER (person name), LOC (place name), and ORG (organization name). The BIO tag set is used, giving 7 tags for all words: B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG, and O. Chinese uses single-character features, single-word features, and combinations of 2-3 characters or words; English uses feature templates that combine the word, its part of speech, and the case of its initial letter.
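The tag set and the English feature template of Step 1 can be illustrated with a minimal Python sketch. This is not the patent's implementation: the helper names `spans_to_bio` and `english_features` and the feature keys are hypothetical, and the part-of-speech tags are assumed to come from an external tagger.

```python
from typing import Dict, List, Tuple

ENTITY_TYPES = ["PER", "LOC", "ORG"]
# B- and I- tags for each entity type, plus O for tokens outside any entity.
BIO_TAGS = [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")] + ["O"]


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, type) entity spans into BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def english_features(words: List[str], pos: List[str], i: int) -> Dict[str, object]:
    """Word, part-of-speech, and initial-letter-case features for token i.

    The feature keys are illustrative; the source only names the feature kinds.
    """
    return {
        "word": words[i].lower(),
        "pos": pos[i],
        "init_cap": words[i][:1].isupper(),
    }


words = ["John", "Smith", "works", "in", "Beijing"]
pos = ["NNP", "NNP", "VBZ", "IN", "NNP"]  # assumed output of an external POS tagger
tags = spans_to_bio(len(words), [(0, 2, "PER"), (4, 5, "LOC")])
```

Here `BIO_TAGS` contains exactly the 7 tags listed in Step 1, and `tags` assigns one of them to each token of the example sentence.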

[0022]...

Abstract

The invention discloses a named entity recognition method based on bilingual collaborative training, belonging to the technical field of natural language processing in computer science. Parallel Chinese and English sentence datasets are treated as two different views of one dataset for bilingual collaborative training; a log-linear model is used to correct the projected labels during projection; and the consistency of bilingually aligned named entity annotations is introduced as a measure for estimating label confidence when the model predicts unseen examples. Compared with the prior art, the method reduces the domain dependence of named entity recognition, combines the advantages of recognition in both languages, resolves some of the recognition ambiguity that arises in monolingual recognition, and is particularly suitable for synchronous bilingual named entity recognition on large-scale corpora.
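The idea of using bilingual annotation consistency as a confidence measure can be sketched as follows. The source does not give the scoring formula, so everything here is an assumption made for illustration: the agreement ratio, the 0.8 threshold, the function names, and the representation of word alignment as (Chinese index, English index) pairs.

```python
from typing import List, Tuple

Alignment = List[Tuple[int, int]]  # (Chinese token index, English token index)


def entity_type(tag: str) -> str:
    """Strip the B-/I- prefix: 'B-PER' -> 'PER', 'O' -> 'O'."""
    return tag.split("-", 1)[1] if "-" in tag else "O"


def alignment_consistency(zh_tags: List[str], en_tags: List[str],
                          alignment: Alignment) -> float:
    """Assumed score: among aligned token pairs where at least one side is
    inside an entity, the fraction whose entity types agree."""
    relevant = agree = 0
    for i, j in alignment:
        zt, et = entity_type(zh_tags[i]), entity_type(en_tags[j])
        if zt == "O" and et == "O":
            continue  # pairs outside any entity carry no evidence either way
        relevant += 1
        agree += zt == et
    # Design choice of this sketch: no entity on either side counts as consistent.
    return agree / relevant if relevant else 1.0


def select_confident(pairs, threshold: float = 0.8):
    """Keep the sentence pairs whose predicted tags are consistent enough
    to be added to the labeled sets in a co-training round."""
    return [p for p in pairs if alignment_consistency(*p) >= threshold]


alignment = [(0, 0), (1, 2), (2, 3)]
zh_tags = ["B-PER", "O", "B-LOC"]               # e.g. 约翰 / 在 / 北京
en_agree = ["B-PER", "O", "O", "B-LOC"]         # John / lives / in / Beijing
en_disagree = ["B-PER", "O", "O", "B-ORG"]      # English model mislabels Beijing

pairs = [(zh_tags, en_agree, alignment), (zh_tags, en_disagree, alignment)]
confident = select_confident(pairs)
```

In this toy example the first pair scores 1.0 and is kept, while the second scores 0.5 and is held back from the next training round.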

Description

Technical field

[0001] The invention relates to a method for recognizing bilingual named entities. It is especially suitable for recognizing named entities in large-scale cross-domain bilingual corpora as a preprocessing step for machine translation, and belongs to the technical field of natural language processing (NLP) in computer science.

Background technique

[0002] A named entity is the proper name of a unique entity. Named entity recognition is an important basic technical problem in natural language processing, and has become one of the technical bottlenecks in multilingual information processing tasks such as cross-language information retrieval and machine translation.

[0003] Researchers have developed many models for named entity recognition. Because rule-based methods do not generalize well across different types of languages, statistical methods have received extensive attention in recent years. Among the...

Application Information

IPC(8): G06F17/28

Inventor: 黄河燕, 史树敏, 李业刚

Owner: BEIJING INSTITUTE OF TECHNOLOGY
