Method and device for extracting Chinese names of people and places

A technology for extracting Chinese personal names and place names, applied in special data processing applications, instruments, and electronic digital data processing. It addresses the low recognition rate for new words and the low labeling accuracy of existing methods, reducing dirty data and improving accuracy.

Active Publication Date: 2016-05-11
XIAMEN MEIYA PICO INFORMATION
Cites: 4 | Cited by: 6

AI Technical Summary

Problems solved by technology

[0005] 1. The rule-based method identifies the composition rules of personal names and place names and matches sample data against those rules. It is efficient and matches quickly, but it has obvious shortcomings: its rule coverage must be continuously refined, its recognition rate is low for ambiguous words and new words, and its labeling accuracy is low.
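To make the rule-based approach concrete, here is a minimal sketch of composition-rule matching. The surname table, the place-suffix table, and both rules (a surname followed by one or two Han characters; two Han characters followed by an administrative suffix) are hypothetical illustrations, not rules taken from this patent or from any particular system.

```python
import re

# Hypothetical, deliberately tiny rule tables; a real rule base would be far larger.
SURNAMES = ["欧阳", "王", "李", "张", "陈"]  # compound surname listed first so it is tried first
PLACE_SUFFIXES = ["省", "市", "县", "区", "镇", "村"]

# Assumed rule 1: a surname followed by one or two Han characters is a candidate personal name.
NAME_RULE = re.compile("(?:" + "|".join(SURNAMES) + ")[\u4e00-\u9fff]{1,2}")
# Assumed rule 2: two Han characters followed by an administrative suffix is a candidate place name.
PLACE_RULE = re.compile("[\u4e00-\u9fff]{2}(?:" + "|".join(PLACE_SUFFIXES) + ")")


def rule_match(text):
    """Return (surface, label) pairs for every rule hit, ordered by start position."""
    hits = [(m.start(), m.group(), "PER") for m in NAME_RULE.finditer(text)]
    hits += [(m.start(), m.group(), "LOC") for m in PLACE_RULE.finditer(text)]
    return [(surface, label) for _, surface, label in sorted(hits)]


print(rule_match("王小明在厦门市"))  # [('王小明', 'PER'), ('厦门市', 'LOC')]
print(rule_match("张三和李四"))      # [('张三和', 'PER'), ('李四', 'PER')]
```

The second call shows the weakness described above: the greedy rule absorbs the conjunction 和 into the first name, so speed comes at the cost of labeling accuracy on ambiguous strings.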




Embodiment Construction

[0034] To further illustrate the embodiments, the present invention is accompanied by drawings. These drawings form part of the disclosure of the present invention; they mainly illustrate the embodiments and, together with the related description in the specification, explain the operating principles of the embodiments. With reference to them, those skilled in the art will understand other possible implementations and advantages of the present invention. Components in the figures are not drawn to scale, and similar reference symbols generally denote similar components.

[0035] The present invention will be further described in conjunction with the accompanying drawings and specific embodiments.

[0036] Referring to Figure 1, the present invention proposes a method for extracting Chinese personal names and place names, comprising the following steps:

[0037] S1, convert the received text into UTF-8 enc...
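Step S1 is truncated in this excerpt, so the following is only a minimal sketch of what a UTF-8 normalization step could look like. It assumes the received text arrives as raw bytes in an unknown encoding and that trying a fixed list of candidate encodings is acceptable; the candidate list and the function name `to_utf8` are hypothetical, not taken from the patent.

```python
# Candidate source encodings, tried in order. This list is an assumption, not from the patent.
# "utf-8-sig" accepts UTF-8 with or without a byte-order mark and strips the BOM if present.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030", "big5")


def to_utf8(raw: bytes) -> bytes:
    """Decode raw bytes with the first candidate encoding that succeeds, then re-encode as UTF-8."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
        return text.encode("utf-8")
    # Last resort: keep whatever decodes and replace undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


gbk_bytes = "王小明在厦门市".encode("gb18030")
print(to_utf8(gbk_bytes).decode("utf-8"))  # 王小明在厦门市
```

Trying UTF-8 first is a deliberate choice: multi-byte GB18030 text rarely forms valid UTF-8, whereas many UTF-8 byte sequences would decode "successfully" but wrongly as GB18030.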


Abstract

The invention belongs to the field of natural language processing in computational linguistics, and particularly relates to a method and a device for extracting Chinese personal names and place names. The method comprises the following steps: S1, converting the text into UTF-8 (8-bit Unicode Transformation Format) encoding; S2, presetting a text threshold L and judging whether the text length T is greater than L: if T > L, the text is divided into paragraphs with an extension paragraphing method and the process goes to step S3 after paragraphing; if T is less than or equal to L, the process goes directly to step S3; S3, preprocessing the text to remove dirty data; S4, performing part-of-speech tagging on the individual Chinese characters of the preprocessed text, then splitting and combining the tagged characters into words; S5, marking the phrases in the text that match target phrases and calculating the matching results. The method and device can be widely applied to named-entity recognition in fields such as search engines, machine translation, and data mining.
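The control flow of steps S2 to S5 can be sketched as below. This is a toy under heavy assumptions, not the patented implementation: the threshold value, the reading of "extension paragraphing" (cut every L characters, then extend each cut to the next sentence end), the definition of dirty data (HTML tags, URLs, whitespace), the character tag lexicon standing in for a real part-of-speech tagger, the target phrase dictionary, and "matching result" meaning a per-phrase hit count are all hypothetical.

```python
import re

# Hypothetical resources; the patent excerpt does not publish its threshold, lexicons, or tag set.
THRESHOLD_L = 20
SENTENCE_END = "。！？!?"
CHAR_TAGS = {"王": "SUR", "李": "SUR", "张": "SUR", "市": "LOCSUF", "省": "LOCSUF", "县": "LOCSUF"}
TARGET_PHRASES = {"王小明": "PER", "厦门市": "LOC"}


def paragraph(text, limit=THRESHOLD_L):
    """S2 (assumed reading): if len(text) > limit, cut about every `limit`
    characters, extending each cut forward to the next sentence end."""
    if len(text) <= limit:
        return [text]
    parts, start = [], 0
    while start < len(text):
        cut = min(start + limit, len(text))
        while cut < len(text) and text[cut - 1] not in SENTENCE_END:
            cut += 1  # extend so a sentence (and any name in it) is never split
        parts.append(text[start:cut])
        start = cut
    return parts


def clean(part):
    """S3 (assumed): drop HTML tags, ASCII URLs and whitespace as 'dirty data'."""
    text = re.sub(r"<[^>]+>", "", part)
    text = re.sub(r"https?://[\x21-\x7e]+", "", text)
    return re.sub(r"\s+", "", text)


def tag_and_combine(text):
    """S4 (toy stand-in): tag each character from CHAR_TAGS, then combine tagged
    characters into candidate words: surname plus one or two following
    characters, or one to three characters closed by a place suffix."""
    tags = [CHAR_TAGS.get(ch, "O") for ch in text]
    words = []
    for i, tag in enumerate(tags):
        if tag == "SUR":
            words.extend(text[i:i + n] for n in (2, 3) if i + n <= len(text))
        elif tag == "LOCSUF":
            words.extend(text[i - n:i + 1] for n in (1, 2, 3) if i - n >= 0)
    return words


def extract(text):
    """S2 to S5: return {phrase: (label, hit count)} for candidates matching target phrases."""
    results = {}
    for part in paragraph(text):
        for word in tag_and_combine(clean(part)):
            if word in TARGET_PHRASES:  # S5: mark the match and accumulate its count
                label, count = results.get(word, (TARGET_PHRASES[word], 0))
                results[word] = (label, count + 1)
    return results


print(extract("<p>王小明昨天到了厦门市。</p>王小明很开心！"))
# {'王小明': ('PER', 2), '厦门市': ('LOC', 1)}
```

In practice `CHAR_TAGS` would be replaced by the output of a trained character-level tagger and `TARGET_PHRASES` by the real target lexicon; the over-generation in `tag_and_combine` is then filtered by the S5 match.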

Description

Technical Field

[0001] The invention belongs to the field of natural language processing in computational linguistics, and in particular relates to a method and device for extracting Chinese personal names and place names.

Background Art

[0002] As times change, information has grown explosively. To extract useful information from massive data, many fields are vigorously researching related technologies, and this has become a hot spot in analysis. Because of the complexity and ambiguity of Chinese itself, however, research on extracting Chinese personal names and place names lags behind that for English.

[0003] Patent document CN104182423A discloses a method for automatically recognizing Chinese personal names based on conditional random fields: by studying the characteristics of Chinese personal names and combining them with a statistical probability model, it constructs an automatic recognition system for Chinese personal names. Patent document CN103870489A disc...


Application Information

Patent Type & Authority: Applications (China)
IPC (IPC8): G06F17/27
CPC: G06F40/295
Inventors: 陈泽青, 苏再添, 吴少华
Owner XIAMEN MEIYA PICO INFORMATION