Chinese organization name abbreviation recognition system adopting context feature matching

A context feature matching and recognition technology, applied in special data processing applications, instruments, calculations, etc., that addresses two problems: organization name abbreviations cannot be recognized when the corresponding full names are unavailable, and abbreviations with irregular composition are difficult to recognize.

Inactive Publication Date: 2014-09-10
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

However, whichever identification method is used, recognizing the abbreviation of an institution name depends on its full name. If the full name corresponding to an abbreviation is not included in the corpus, the abbreviation will not be recognized.
In addition, existing methods assume by default that the Chinese characters making up an abbreviation come from the full name and appear in the same order as in the full name, which makes it difficult to recognize abbreviations that do not meet these conditions.

Examples

Embodiment

[0061] Referring to Figure 1, the training part first trains to obtain the context features of organization names and of noise words, then intersects them to obtain the intersection features and the unique features. At the end of training, the noise word list is expanded by supplementing it with Chinese surnames and place names. Training finally yields three sets: the final interference vocabulary set, the intersection feature set, and the unique feature set of organization names. The recognition part of Figure 1 uses these three sets with a feature matching algorithm to recognize organization name abbreviations in the test corpus.
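The set operations described in [0061] can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `build_feature_sets`, its parameters, and the toy data are hypothetical, and it assumes that the "unique features" are those organization name context features that do not also occur as noise word context features.

```python
def build_feature_sets(org_features, noise_features, noise_words, surnames, place_names):
    """Derive the three sets that training is said to produce.

    All argument names are illustrative assumptions; each is an iterable of strings.
    """
    org = set(org_features)
    noise = set(noise_features)
    # Features shared by organization names and noise words.
    intersection = org & noise
    # Assumption: "unique" means seen only around organization names.
    unique = org - noise
    # Expand the noise word list with Chinese surnames and place names.
    interference = set(noise_words) | set(surnames) | set(place_names)
    return interference, intersection, unique


# Toy data, invented purely for illustration.
interference, intersection, unique = build_feature_sets(
    org_features={"于", "校长"},
    noise_features={"于", "去"},
    noise_words={"上海"},
    surnames={"王"},
    place_names={"北京"},
)
print(sorted(intersection), sorted(unique), sorted(interference))
```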

[0062] The present invention includes the following two modules:

[0063] Module 1: Training module

[0064] 1) First, train to obtain the pre-features, post-features, and weakly credible feature pairs of institution names;

[0065] 2) Secondly, three types of contextual feature...
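As a rough sketch of step 1), pre-features and post-features could be collected from an annotated corpus as shown below. Several things here are assumptions rather than details from the text: the function name `extract_context_features`, the input format (token lists with an annotated organization name span), and the choice of a one-token window on each side. Weakly credible feature pairs are not defined in this excerpt, so the sketch leaves them out.

```python
from collections import Counter


def extract_context_features(sentences):
    """Count tokens immediately before and after annotated organization names.

    sentences: list of (tokens, (start, end)) pairs, where tokens[start:end]
    is the annotated organization name (hypothetical input format).
    """
    pre, post = Counter(), Counter()
    for tokens, (start, end) in sentences:
        if start > 0:
            pre[tokens[start - 1]] += 1   # token just before the name
        if end < len(tokens):
            post[tokens[end]] += 1        # token just after the name
    return pre, post


# Toy corpus, invented for illustration.
corpus = [
    (["他", "在", "华东师范大学", "工作"], (2, 3)),
    (["华东师范大学", "校长", "发言"], (0, 1)),
]
pre, post = extract_context_features(corpus)
print(pre.most_common(), post.most_common())
```

Counting occurrences rather than only collecting a set leaves room for frequency-based filtering, which is a design choice of this sketch and not something the excerpt specifies.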

Abstract

The invention discloses a Chinese organization name abbreviation recognition system that uses context feature matching. The system first trains to obtain two feature sets: the set of features unique to organization names, and the set of features in the intersection of distractor word context features and organization name context features. It then uses these features to recognize organization name abbreviations, and screens the recognized abbreviations by means of a distractor word list and expansion operations. The advantage of the system is that recognition does not depend on the full names of organizations or on the composition form of the abbreviations; abbreviations can be recognized from the context features of organization names alone.
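To make the recognition idea concrete, here is a minimal sketch of matching by context and then screening with a distractor list. It is one possible reading, not the patent's algorithm: the function name `recognize_abbreviations` is hypothetical, candidates are assumed to be single tokens of an already segmented sentence, and the pre-feature and post-feature sets are assumed to be the union of the unique and intersection features from training.

```python
def recognize_abbreviations(tokens, pre_features, post_features, distractors):
    """Return tokens whose neighbors match a context feature and that
    are not in the distractor word list (illustrative assumption)."""
    found = []
    for i, cand in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        # Context match: the preceding or following token is a known feature.
        matched = prev_tok in pre_features or next_tok in post_features
        # Screening: drop candidates that appear in the distractor list.
        if matched and cand not in distractors:
            found.append(cand)
    return found


# Toy sentence and feature sets, invented for illustration.
tokens = ["他", "毕业", "于", "华师大", "，", "住", "于", "上海"]
result = recognize_abbreviations(
    tokens,
    pre_features={"于"},
    post_features={"校长"},
    distractors={"上海"},
)
print(result)
```

Note that the candidate itself is never compared against any full name, which mirrors the stated advantage that recognition relies on context alone.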

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a system for recognizing organization name abbreviations based on context features.

Background technique

[0002] Named entity recognition has become a basic task in natural language processing and plays an important role in information extraction, syntactic analysis, and machine translation. Person names, place names, and organization names are the three most important types of named entities. Research on recognizing the first two types is already extensive and detailed, and recognizing organization names accurately and efficiently is of great significance. Abbreviations are a common phenomenon in text. However, because they take many forms, follow only weak regularities, and allow multiple abbreviations for the same full name, they are difficult to recognize.

[0003] At present, the iden...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 杨静, 郝娟, 潘云, 裴逸钧, 杜泽宇
Owner: EAST CHINA NORMAL UNIV