Method for automatically identifying Chinese name

An automatic recognition technique for Chinese names, applied in natural language data processing and special data processing applications. It addresses the problem that splitting text into single characters does not match how people understand Chinese semantics and therefore does not meet practical needs.

Status: Inactive. Publication Date: 2017-01-25
DATAGRAND TECH INC

AI Technical Summary

Problems solved by technology

For example, if the name Zhang Ziyi is split into "Zhang", "Zi", and "Yi" according to a character list, the resulting single characters do not match how people understand Chinese semantics, so the segmentation does not meet the requirements.


Embodiment Construction

[0061] The present invention is described in further detail below through specific embodiments and with reference to the accompanying drawings. The overall process has two parts: the labeled data training part and the name recognition part. The specific steps of the labeled data training part are as follows:

[0062] The name data training part can use, as its basic corpus, a large number of training texts in which Chinese names are marked. The specific operations include:

[0063] 1. Divide words into the following types according to the position where they appear:

[0064] H_1: Appears at the head of a Chinese name and is a single-character surname, for example: Zhang, Li, Wang;

[0065] M_1: Appears in the middle of a Chinese name, for example: Xiao, Xue, Hai;

[0066] T_1: Appears at the tail of a Chinese name, for example: Wen, Bin, Tao;

[0067] N_1: Does not appear in any of the above positions of a Chinese name (None);...
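The position types above can be illustrated with a minimal Python sketch that counts how often each character appears with each tag in a labeled corpus and turns the counts into per-character prior probabilities. This is not the patent's implementation: the function names, the `(text, names)` corpus format, and the two sample sentences are hypothetical. Because the type list in the source is truncated, the sketch uses only the four tags shown and assumes every labeled name has a single-character surname.

```python
from collections import Counter, defaultdict

# Only the four tags visible in the text; the source list is truncated.
HEAD, MIDDLE, TAIL, NONE = "H_1", "M_1", "T_1", "N_1"


def tag_name(name):
    """Tag each character of a labeled name.

    Assumption: the first character is a single-character surname (H_1),
    the last character is the tail (T_1), anything between is M_1.
    """
    if len(name) < 2:
        return []
    middle = [(ch, MIDDLE) for ch in name[1:-1]]
    return [(name[0], HEAD)] + middle + [(name[-1], TAIL)]


def position_priors(corpus):
    """Estimate P(tag | character) from (text, labeled_names) pairs."""
    counts = defaultdict(Counter)
    for text, names in corpus:
        rest = text
        for name in names:
            for ch, tag in tag_name(name):
                counts[ch][tag] += 1
            # Blank out one labeled occurrence so it is not counted as N_1.
            rest = rest.replace(name, " ", 1)
        for ch in rest:
            if not ch.isspace():
                counts[ch][NONE] += 1
    return {
        ch: {tag: n / sum(c.values()) for tag, n in c.items()}
        for ch, c in counts.items()
    }


# Hypothetical two-sentence corpus, each text paired with its marked names.
corpus = [
    ("张小明来了", ["张小明"]),
    ("李明明天来", ["李明"]),
]
priors = position_priors(corpus)
print(priors["明"])
```

In this toy corpus the character 明 occurs twice as a name tail and once outside any name, so its tail probability is 2/3. A real corpus would need far more labeled text for such estimates to be meaningful.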

Abstract

The invention relates to the technical field of computer applications, and in particular to a method for automatically identifying Chinese names. The method comprises a labeled data training part and a name identification part. Chinese names admit a very large number of character combinations, so they are identified automatically by an algorithm rather than by listing names endlessly; as new names keep appearing, it is clearly impractical to add every name to a dictionary. Judging names automatically from their distribution in context is strongly affected by that context, and some rare Chinese names make automatic identification considerably harder. The method is intended to handle these complex situations comprehensively.

Description

Technical field

[0001] The present invention relates to a method for automatically recognizing Chinese names, in particular to counting, from a Chinese training corpus in which names are marked, the prior probabilities of specific Chinese characters appearing at different positions of Chinese names, and then judging and extracting Chinese name fragments.

Background technique

[0002] Names of people often appear in Chinese articles, such as "Wang Jianguo", "Zhang Xiaoming", and "Li Ming". How a computer system can automatically identify which fragments are names is the problem this patent sets out to solve.

[0003] The difficulty is that in Chinese only characters, sentences, and paragraphs are delimited by obvious delimiters, whereas words have no formal delimiter. Example sentence: "Actress Ziyi Zhang won the prize". In English it divides naturally into the words Actress / Ziyi / Zhang / won / the / prize, each word can expres...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/279, G06F40/30
Inventors: 陈运文, 纪达麒, 桂洪冠, 江永青, 张健
Owner: DATAGRAND TECH INC