Chinese address compound word segmentation technology based on rules and statistical models

A Chinese address word segmentation technology applied in the field of geographic information. It addresses the problems that existing word segmentation relies on a single segmentation mode, does not combine different techniques, and cannot effectively identify new words. It achieves strong pattern recognition ability and good discrimination of ambiguous addresses while guaranteeing efficiency.

Inactive Publication Date: 2015-08-19
裴克铭管理咨询(上海)有限公司

AI Technical Summary

Problems solved by technology

The address matching methods in the prior art use many address word segmentation modes. Some rely on a single segmentation mode and do not combine different techniques, or combine them inefficiently. As a result, segmentation based on rules alone cannot effectively recognize new words, and segmentation based on a statistical model alone is slow. Combining the two to improve segmentation accuracy while preserving efficiency has become an urgent need.

Detailed description of embodiments

[0008] The present invention will be described in detail below in conjunction with specific embodiments.

[0009] A Chinese address compound word segmentation technology based on rules and statistical models combines a conditional random field (CRF) model with a rule-optimized maximum matching algorithm to segment addresses. To use the CRF model, the associated features inside the address information are extracted, and the model is trained on the training data set created in the preprocessing stage, so that it can automatically segment address information and identify address elements. The CRF model has strong pattern recognition ability: it can identify cells that are missing from the database, and it discriminates well between ambiguous addresses, which helps to distinguish address elements correctly. The rule-optimized maximum matching alg...
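The training-data preparation and feature extraction mentioned above might look like the following minimal sketch. It assumes a character-level B/M/E/S tagging scheme and a small context-window feature template, both common choices for CRF-based Chinese segmentation; the source does not state which scheme or features the invention uses, and the function names `to_bmes`, `char_features`, and `from_bmes` are hypothetical.

```python
def to_bmes(words):
    """Turn a pre-segmented address into (character, tag) training pairs.

    Assumed tag set: B = begin, M = middle, E = end, S = single-character word.
    """
    pairs = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            pairs.extend((ch, "M") for ch in word[1:-1])
            pairs.append((word[-1], "E"))
    return pairs


def char_features(chars, i):
    """Context features for position i (a hypothetical feature template)."""
    prev_c = chars[i - 1] if i > 0 else "<BOS>"
    next_c = chars[i + 1] if i < len(chars) - 1 else "<EOS>"
    return {
        "cur": chars[i],
        "prev": prev_c,
        "next": next_c,
        "prev+cur": prev_c + chars[i],
        "cur+next": chars[i] + next_c,
        "is_digit": chars[i].isdigit(),
    }


def from_bmes(chars, tags):
    """Rebuild words from characters and predicted tags."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in ("E", "S"):  # a word ends here
            words.append(buf)
            buf = ""
    if buf:  # tolerate a sequence that ends mid-word
        words.append(buf)
    return words


# Illustrative address, already segmented by hand for training.
sample = ["上海市", "张江路", "88", "号"]
pairs = to_bmes(sample)
chars = [ch for ch, _ in pairs]
tags = [tag for _, tag in pairs]
features = [char_features(chars, i) for i in range(len(chars))]
```

The per-character feature dictionaries and tags are the kind of input a CRF toolkit typically trains on; at prediction time, `from_bmes` converts the predicted tags back into address elements.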

Abstract

The invention discloses a Chinese address compound word segmentation technology based on rules and a statistical model. Addresses are segmented by combining a conditional random field model with a rule-optimized maximum matching algorithm. The conditional random field model extracts associated features from inside the address information and is trained on a data set created in the preprocessing stage, so that it can automatically segment address information and identify address elements. The conditional random field model has strong pattern recognition capability: it can identify cells that are missing from the database and discriminates well between ambiguous addresses, so that address elements are distinguished correctly. The MMSEG algorithm is fast and precise when it is supported by good dictionary data. The technology combines the two algorithms so that they complement each other, which effectively improves address matching accuracy and improves word segmentation accuracy while preserving efficiency.
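One way the two approaches could complement each other is sketched below. This is an assumption for illustration, not the patented procedure: the dictionary pass is plain forward maximum matching rather than the full MMSEG rule set, the text does not say how the two algorithms are actually combined, and `fallback` is a hypothetical callable standing in for the trained statistical model.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy longest-match segmentation.

    Returns (token, known) pairs, where known is True when the token was
    found in the dictionary. Characters with no match become single tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + size]
            if size == 1 or cand in dictionary:
                out.append((cand, cand in dictionary))
                i += size
                break
    return out


def hybrid_segment(text, dictionary, fallback):
    """Use dictionary matches where available; hand unknown runs to fallback."""
    result, unknown = [], ""
    for token, known in forward_max_match(text, dictionary):
        if known:
            if unknown:  # flush the pending out-of-dictionary run
                result.extend(fallback(unknown))
                unknown = ""
            result.append(token)
        else:
            unknown += token
    if unknown:
        result.extend(fallback(unknown))
    return result


# Hypothetical dictionary; the road name "张江" is deliberately left out.
dictionary = {"上海市", "浦东新区", "路", "号"}
# Stand-in for the statistical model: treats each unknown run as one word.
segments = hybrid_segment("上海市浦东新区张江路88号", dictionary, lambda s: [s])
print(segments)
```

Under this split, the fast dictionary lookup handles everything it recognizes, and the slower model is consulted only for the spans the dictionary misses, which is one plausible reading of improving accuracy while preserving efficiency.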

Description

Technical field

[0001] The invention belongs to the technical field of geographic information, and in particular relates to a Chinese address compound word segmentation technology based on rules and statistical models.

Background

[0002] Address matching is the process of establishing a correspondence between a textual address description and its geographic coordinates. The address matching service follows specific steps to find a match for an address. First, the address is standardized. The server then searches the address matching reference data for potential locations and assigns a score to each candidate location based on its proximity to the address. Finally, the candidate with the highest score is matched to the address. At present, there are many address word segmentation modes in the address matching methods in the prior art, some of which have single word segmentation modes, do not use different technologies in combination, or the combination efficiency of different ...
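The matching steps described in the background (standardize, search the reference data, score each candidate, keep the best) can be sketched as follows. The normalization rules, the Jaccard token-overlap score, the reference entries, and the coordinates are all assumptions made for illustration; the source does not specify how the score is computed.

```python
FULLWIDTH_DIGITS = {ord(f): ord(a) for f, a in zip("０１２３４５６７８９", "0123456789")}


def standardize(address):
    """Hypothetical normalization: drop whitespace, map full-width digits to ASCII."""
    return "".join(address.split()).translate(FULLWIDTH_DIGITS)


def score(query_tokens, candidate_tokens):
    """Jaccard overlap between two sets of address elements (assumed metric)."""
    q, c = set(query_tokens), set(candidate_tokens)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def best_match(query_tokens, reference):
    """Return the (tokens, coordinates) reference entry with the highest score."""
    return max(reference, key=lambda entry: score(query_tokens, entry[0]))


# Made-up reference data; the coordinates are placeholders, not real locations.
reference = [
    (["上海市", "浦东新区", "张江路", "88", "号"], (1.0, 1.0)),
    (["上海市", "徐汇区", "漕溪路", "88", "号"], (2.0, 2.0)),
]

clean = standardize(" 上海市 张江路８８号 ")
# Assume a segmenter has already split the cleaned address into elements.
query = ["上海市", "张江路", "88", "号"]
tokens, coords = best_match(query, reference)
```

The quality of the score depends directly on the segmentation step: if the address elements are split incorrectly, the overlap with the reference entries drops, which is why segmentation accuracy matters for address matching.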

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventors: 沈启明, 密铁宾
Owner: 裴克铭管理咨询(上海)有限公司