Chinese address word segmentation tagging method

A Chinese address word segmentation and tagging technology, applied in the field of data processing, that addresses the low accuracy of general-purpose approaches and achieves high accuracy.

Active Publication Date: 2017-09-01
SHENZHEN AUDAQUE DATA TECH
AI Technical Summary

Problems solved by technology

Applying general-purpose word segmentation and tagging or entity recognition technology to Chinese addresses yields an accuracy of only about 80%.

Examples

Embodiment Construction

[0027] The technical solution and beneficial effects of the present invention will be apparent through the detailed description of specific embodiments of the present invention in conjunction with the accompanying drawings.

[0028] The present invention adopts a word segmentation framework based on the conditional random field (CRF) model, implemented with the open-source CRF++ tool. CRF++ is a well-known open-source CRF toolkit and is regarded here as the CRF tool with the best overall performance at present.
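As a rough illustration of how segmented addresses are typically prepared for a CRF-based segmenter, the sketch below turns a list of manually segmented words into character-level rows in the column layout CRF++ reads (one token per line, whitespace-separated columns, the tag in the last column, a blank line between sequences). The B/M/E/S tag scheme, the function names, and the sample words are assumptions made for illustration; the excerpt does not state which tag set the invention uses, and the separate labeling model is not covered here.

```python
def words_to_rows(words):
    """Map segmented words to (character, tag) pairs.

    Assumed tag scheme (not taken from the patent text):
    B = first character of a word, M = middle, E = last, S = single-character word.
    """
    rows = []
    for word in words:
        if len(word) == 1:
            rows.append((word, "S"))
            continue
        rows.append((word[0], "B"))
        rows.extend((ch, "M") for ch in word[1:-1])
        rows.append((word[-1], "E"))
    return rows


def to_crfpp_block(words):
    """Render one address as a CRF++ sequence: tab-separated columns, blank line at the end."""
    lines = [f"{ch}\t{tag}" for ch, tag in words_to_rows(words)]
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Illustrative segmentation of an address prefix (hypothetical sample data).
    print(to_crfpp_block(["广东省", "深圳市", "南山区"]), end="")
```

Each call to `to_crfpp_block` yields one sequence; concatenating the blocks for all training addresses gives a file in the shape CRF++ expects for training.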

[0029] Corresponding to the labels used to tag addresses, the present invention defines the following address concepts:

[0030] Province: the first-level administrative region stipulated by the "National Geographical Name and Address Data Specification", including: provinces, municipalities directly under the Central Government, autonomous regions, and special administrative regions;

[0031] City: the...

Abstract

The invention relates to a Chinese address word segmentation and tagging method. The method includes:
  • Step 11: manually segment and label selected address data to serve as training data.
  • Step 12: replace each single Arabic numeral or English letter, and each run of consecutive Arabic numerals or English letters, with a specified single Arabic numeral or English letter.
  • Step 13: convert the training data into the format required by the CRF++ tool.
  • Step 14: define the feature template.
  • Step 15: build the word segmentation model and the labeling model separately with the CRF++ tool.
  • Step 16: in the address to be processed, replace each single Arabic numeral or English letter, and each run of consecutive Arabic numerals or English letters, with the specified single Arabic numeral or English letter.
  • Step 17: perform word segmentation with the CRF++ tool.
  • Step 18: restore the Arabic numerals or English letters that were present before replacement.
The Chinese address word segmentation and tagging method of the present invention has a high accuracy rate.
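The replace-then-restore idea behind steps 12, 16 and 18 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the placeholder characters `0` and `A`, the function names, the restriction to ASCII digits and letters, and the sample address are all hypothetical choices, since the abstract only says that a "specified" single character is used.

```python
import re

# Maximal runs of ASCII digits or ASCII letters (full-width forms are not handled here).
RUN = re.compile(r"[0-9]+|[A-Za-z]+")


def normalize(address, digit="0", letter="A"):
    """Replace every digit run with `digit` and every letter run with `letter`.

    Returns the normalized text plus the original runs, in order of appearance.
    """
    originals = []

    def repl(match):
        run = match.group()
        originals.append(run)
        return digit if run[0].isdigit() else letter

    return RUN.sub(repl, address), originals


def restore(words, originals, digit="0", letter="A"):
    """Put the original runs back into segmented words, left to right."""
    pending = list(originals)
    restored = []
    for word in words:
        chars = []
        for ch in word:
            if ch in (digit, letter) and pending:
                chars.append(pending.pop(0))
            else:
                chars.append(ch)
        restored.append("".join(chars))
    return restored


if __name__ == "__main__":
    text, runs = normalize("高新中一道9号软件大厦713")
    print(text, runs)
    # A hand-written stand-in for the segmenter's output on the normalized text.
    print(restore(["高新中一道", "0号", "软件大厦", "0"], runs))
```

Because every digit or letter run is collapsed to one character before training and before prediction, the model sees the same small set of symbols regardless of the actual house or room number, and the recorded runs allow the original text to be recovered after segmentation.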

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a Chinese address word segmentation tagging method.

Background

[0002] When people fill in a recipient address, office address, home address or other address information, they usually write the province, city, district, house number, residential area, room number and other information together as a single string, such as "Guangdong Shenzhen Nanshan District 713, Software Building, Science and Technology Park, No. 9, High-tech Middle Road". This way of writing is suitable for human reading, but when a machine must recognize the structural details of an address, the first processing step is word segmentation and labeling: the input long text is divided into individual words, and the attribute of each word is marked. For example, the word segmentation result of the address information in the above example may be: "Guangdong / Shenzhen, Nanshan Distr...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27
CPC: G06F40/20
Inventors: 王明兴, 贾西贝
Owner: SHENZHEN AUDAQUE DATA TECH