Chinese address word segmentation and annotation method

A Chinese address word segmentation and annotation technology, applied in the field of data processing, that addresses the low accuracy of existing approaches and achieves high accuracy.

Active Publication Date: 2015-09-23
SHENZHEN AUDAQUE DATA TECH
5 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

Applying general-purpose word segmentation and tagging or entity recognition technology to Chinese addresses yields limited accuracy, only about 80%.

Embodiment Construction

[0027] The technical solution and beneficial effects of the present invention will be apparent through the detailed description of specific embodiments of the present invention in conjunction with the accompanying drawings.

[0028] The present invention adopts a word segmentation framework based on the conditional random field (CRF) model, implemented with the open-source CRF++ tool. CRF++ is a well-known open-source CRF toolkit, and the invention regards it as the CRF tool with the best overall performance at the time of writing.
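CRF++ reads training data as plain text with one token per line, whitespace-separated columns, the label in the last column, and a blank line between sequences. The following is a minimal sketch of producing that layout from a manually segmented address. It assumes character-level tokens and a B/M/E/S tag set (begin, middle, end, single); the tag set and the function name `to_crfpp_rows` are illustrative assumptions, not taken from the patent text.

```python
def to_crfpp_rows(words):
    """Turn a list of segmented words into CRF++-style training rows.

    Each character becomes one row "char<TAB>tag"; the sequence ends
    with a blank line, which CRF++ uses as the sequence separator.
    The B/M/E/S tag set is an assumption made for this sketch.
    """
    rows = []
    for word in words:
        if len(word) == 1:
            tags = ["S"]  # single-character word
        else:
            tags = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        rows.extend(f"{ch}\t{tag}" for ch, tag in zip(word, tags))
    return "\n".join(rows) + "\n\n"


if __name__ == "__main__":
    # A hand-segmented address fragment used only as sample input.
    print(to_crfpp_rows(["广东省", "深圳市", "南山区"]), end="")
```

Character-level rows keep the token column independent of any prior segmentation, so the same layout can serve as input at prediction time with the tag column omitted.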

[0029] Corresponding to the annotation labels applied to addresses, the present invention defines the following address concepts:

[0030] Province: the first-level administrative region stipulated by the "National Geographical Name and Address Data Specification", including: provinces, municipalities directly under the Central Government, autonomous regions, and special administrative regions;

[0031] City: the...

Abstract

The present invention relates to a Chinese address word segmentation and annotation method. The method comprises: step 11, selecting manually segmented and annotated address data as training data; step 12, replacing each single Arabic numeral or English character, and each run of consecutive Arabic numerals or English characters, with a specified single Arabic numeral or English character; step 13, converting the training data into the data format required by the CRF++ tool; step 14, defining a feature template file; step 15, building a word segmentation model and an annotation model, respectively, using the CRF++ tool; step 16, applying the same replacement to the single and consecutive Arabic numerals or English characters present in the address; step 17, performing word segmentation and annotation using the CRF++ tool; and step 18, restoring the Arabic numerals or English characters that were present before the replacement. The Chinese address word segmentation and annotation method according to the present invention achieves high accuracy.
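The replacement and recovery in steps 12, 16, and 18 can be sketched as follows. This is one possible reading, not the patent's implementation: it assumes ASCII digits and letters, uses "0" and "A" as the specified placeholder characters, and restores the originals in order of appearance. The names `normalize` and `restore` are hypothetical.

```python
import re

DIGIT_PLACEHOLDER = "0"   # assumed placeholder for any run of digits
LETTER_PLACEHOLDER = "A"  # assumed placeholder for any run of letters
_RUN = re.compile(r"[0-9]+|[A-Za-z]+")
_PLACEHOLDER = re.compile(r"[0A]")


def normalize(address):
    """Replace each run of digits or letters with one placeholder.

    Returns the normalized text and the list of original runs, in
    order, so they can be put back after segmentation.
    """
    originals = []

    def _sub(match):
        run = match.group(0)
        originals.append(run)
        return DIGIT_PLACEHOLDER if run[0].isdigit() else LETTER_PLACEHOLDER

    return _RUN.sub(_sub, address), originals


def restore(tokens, originals):
    """Put the original runs back into segmented tokens, left to right.

    After normalize(), every "0" or "A" in the text is a placeholder,
    so each one is swapped for the next saved original.
    """
    pending = iter(originals)
    return [_PLACEHOLDER.sub(lambda _m: next(pending), tok) for tok in tokens]


if __name__ == "__main__":
    text, saved = normalize("高新中一道9号软件大厦713")
    print(text, saved)
    # Tokens below stand in for the segmenter's output on the normalized text.
    print(restore(["高新中一道", "0号", "软件大厦", "0"], saved))
```

Collapsing every number to one symbol means the model sees a single pattern for house and room numbers regardless of their digits, which is a plausible motivation for the substitution.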

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a Chinese address word segmentation and annotation method.

Background art

[0002] When people fill in a recipient address, office address, home address, or other address information, they usually write the province, city, district, house number, residential area, room number, and other information together as a single string, such as "Guangdong Shenzhen Nanshan District 713, Software Building, Science and Technology Park, No. 9, High-tech Middle Road". This style of writing is suitable for human reading. When a machine is to recognize the structural details of an address, the first processing step is word segmentation and annotation: the long input text is divided into individual words, and the attributes of the words are marked. For example, the word segmentation result of the address information in the above example may be: "Guangdong / Shenzhen, Nanshan Distr...
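Dividing the input text into words can be illustrated with a small decoding step. The sketch below assumes a character-level tagger that emits one B/M/E/S tag per character (begin, middle, end, single); both the tag set and the function name `tags_to_words` are assumptions made for illustration and are not stated in the text above.

```python
def tags_to_words(chars, tags):
    """Rebuild words from per-character B/M/E/S tags.

    A word is closed whenever an "E" (end) or "S" (single) tag is
    seen; any trailing unfinished characters are kept as a last word.
    """
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    # Sample tags written by hand, standing in for a tagger's output.
    print(" / ".join(tags_to_words("广东省深圳市", ["B", "M", "E", "B", "M", "E"])))
```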

Application Information

IPC(8): G06F17/27
CPC: G06F40/20
Inventor: 王明兴, 贾西贝
Owner SHENZHEN AUDAQUE DATA TECH