A Chinese address semantic tagging method based on the Bayes word segmentation algorithm

A Chinese address semantic tagging technology, applied in computing, geographic information databases, natural language data processing, and related fields, that achieves fast and accurate semantic parsing.

Active Publication Date: 2017-03-22
WUHAN INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

Chinese address strings have no explicit separator between their elements, and existing techniques for identifying Chinese address elements still struggle with parsing completeness, the diversity of address forms, and ambiguity.

Examples

Embodiment 1

[0076] The specific implementation process of the present invention will be described below by taking the Chinese address "Yenlord Food Plaza, No. 137, Dongma Road, Nankai District" as an example.

[0077] P1: Set the annotation relationship table, which can be designed as shown in Table 1.

[0078] P2: Obtain a set T of NT pre-segmented and tagged Chinese address records as the training corpus, T = {T_i}, where T_i is the i-th address record and 1 ≤ i ≤ NT.

[0079] P3: Perform statistical learning on the set T. The specific steps of statistical learning include:

[0080] P31: For each segmented word in the set T, count the frequency of the word and the frequency with which it appears together with its immediately preceding word, and store both in the word frequency dictionary Word_dic;

[0081] P32: Record each word together with its corresponding tag, and store the result in the tagging relationship dictionary Taging_dic;

[0082] P3...
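Steps P31 and P32 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the corpus contents, the tag names (Table 1 is not reproduced in this excerpt), the `<s>` start marker, and the layout of `Word_dic` as two counters are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical training corpus T: each address T_i is a list of (word, tag) pairs.
# Words and tag names are illustrative only; the real tags come from Table 1.
T = [
    [("南开区", "DISTRICT"), ("东马路", "ROAD"), ("137号", "HOUSE_NO")],
    [("南开区", "DISTRICT"), ("东马路", "ROAD"), ("99号", "HOUSE_NO")],
]

BOS = "<s>"  # assumed marker used as the "previous word" of the first word


def learn(corpus):
    """Build Word_dic (P31) and Taging_dic (P32) from a tagged corpus."""
    word_freq = Counter()               # word -> occurrences
    pair_freq = Counter()               # (previous word, word) -> occurrences
    taging_dic = defaultdict(Counter)   # word -> tag -> occurrences
    for address in corpus:
        prev = BOS
        for word, tag in address:
            word_freq[word] += 1
            pair_freq[(prev, word)] += 1
            taging_dic[word][tag] += 1
            prev = word
    # Assumed layout: both frequency tables live under one dictionary.
    word_dic = {"word": word_freq, "pair": pair_freq}
    return word_dic, taging_dic


Word_dic, Taging_dic = learn(T)
```

Keeping a counter of tags per word, rather than a single tag, preserves the information needed later if the same word appears with different tags in the corpus.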

Embodiment 2

[0094] Embodiment 1 above covers the case in which the address information contains no words whose tags are uncertain. Next, the specific implementation process of the present invention is described using the Chinese address "Lane 98, Bixiu Road, Minhang District, Shanghai" as an example.

[0095] A1: Set the annotation relationship table, which can be designed as shown in Table 1.

[0096] A2: Obtain a set T of NT pre-segmented and tagged Chinese address records as the training corpus, T = {T_i}, where T_i is the i-th address record and 1 ≤ i ≤ NT.

[0097] A3: Perform statistical learning on the set T. The specific steps of statistical learning include:

[0098] A31: For each segmented word in the set T, count the frequency of the word and the frequency with which it appears together with its adjacent word, and store both in the word frequency dictionary Word_dic;

[0099] A32: Count each word and the tagging relationship corres...
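The remaining steps of this embodiment are truncated here, so the following Python sketch rests on two loudly stated assumptions: that a word with an "uncertain" tag is one observed with more than one tag in the training corpus, and that the tagging mode list mentioned in the abstract records the tag sequence of each training address. The corpus, words, and tag names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical pre-segmented, tagged corpus; words and tags are illustrative only.
T = [
    [("上海市", "CITY"), ("闵行区", "DISTRICT"), ("碧秀路", "ROAD"), ("98弄", "LANE")],
    [("上海市", "CITY"), ("闵行区", "DISTRICT"), ("98弄", "HOUSE_NO")],
]


def tagging_modes(corpus):
    """Assumed form of the tagging mode list: tag sequence -> times observed."""
    return Counter(tuple(tag for _, tag in address) for address in corpus)


def uncertain_words(corpus):
    """Words seen with more than one tag (assumed meaning of 'uncertain')."""
    tags = defaultdict(set)
    for address in corpus:
        for word, tag in address:
            tags[word].add(tag)
    return {word: sorted(seen) for word, seen in tags.items() if len(seen) > 1}


modes = tagging_modes(T)
ambiguous = uncertain_words(T)
```

Under these assumptions, a tagger could prefer, for an ambiguous word, the tag that makes the whole address match a frequently observed tag sequence; whether the patent resolves uncertainty this way is not visible in the excerpt.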

Abstract

The invention provides a Chinese address semantic tagging method based on the Bayes word segmentation algorithm. The method comprises the steps of: S1, presetting a tagging relationship list for semantic tagging of Chinese address data; S2, acquiring pre-segmented and tagged training corpus data; S3, performing statistical learning on the training corpus data to obtain a word frequency dictionary, a tagging relationship dictionary and a tagging mode list; S4, inputting the address string to be tagged and performing full segmentation on it; S5, acquiring the word segmentation scheme with the maximum probability based on the word frequency dictionary and the Bayes word segmentation algorithm; S6, tagging the word segmentation scheme according to the tagging relationship dictionary to obtain a tagging result. The tagging relationship list defines a normalized tagging template; statistical learning on the training corpus data yields a database comprising the word frequency dictionary, the tagging relationship dictionary and the tagging mode list; and matching against this trained database produces tagged Chinese addresses with semantic information, so that semantic parsing of Chinese address data can be completed rapidly and accurately.
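Steps S4 to S6 can be sketched in Python as below. The abstract does not give the probability formula, so this sketch assumes a bigram chain-rule score with add-one smoothing; the dictionary contents, tag names, `<s>` marker, and function names are hypothetical stand-ins for the output of S3.

```python
import math
from collections import Counter

BOS = "<s>"  # assumed start-of-address marker

# Hypothetical results of statistical learning (S3); values are illustrative.
word_freq = Counter({"南开区": 2, "东马路": 2, "东": 1, "马路": 1, "137号": 1})
pair_freq = Counter({(BOS, "南开区"): 2, ("南开区", "东马路"): 2, ("东马路", "137号"): 1})
# Simplified here to one tag per word.
taging_dic = {"南开区": "DISTRICT", "东马路": "ROAD", "137号": "HOUSE_NO",
              "东": "DIRECTION", "马路": "ROAD"}

# How often each word (or BOS) was seen as the previous word.
context_freq = Counter()
for (prev, _), count in pair_freq.items():
    context_freq[prev] += count


def full_segmentation(text, vocab):
    """S4: yield every split of text into words that all appear in vocab."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            for rest in full_segmentation(text[end:], vocab):
                yield [head] + rest


def log_prob(words):
    """S5: log of the product of P(word | previous word), add-one smoothed."""
    vocab_size = len(word_freq)
    score, prev = 0.0, BOS
    for word in words:
        p = (pair_freq[(prev, word)] + 1) / (context_freq[prev] + vocab_size)
        score += math.log(p)
        prev = word
    return score


def tag_address(text):
    """S4-S6: segment, keep the most probable scheme, then look up tags."""
    candidates = list(full_segmentation(text, word_freq))
    if not candidates:
        return []
    best = max(candidates, key=log_prob)
    return [(word, taging_dic[word]) for word in best]


result = tag_address("南开区东马路137号")
```

With these toy counts the string has two full segmentations, and the one that keeps "东马路" as a single word scores higher than the one that splits it into "东" and "马路". Enumerating every segmentation is exponential in the worst case; a dynamic-programming search over the same scores would be the usual choice for long inputs.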

Description

Technical field

[0001] The invention relates to the technical field of Chinese address parsing, in particular to a Chinese address semantic tagging method based on a Bayesian word segmentation algorithm.

Background technique

[0002] With the development of the Internet, more and more network content comes directly from information uploaded and shared by users. Websites that provide lifestyle and consumer platforms receive thousands of pieces of merchant address information uploaded by users, and a large part of this address information is free text, with no explicit structure and no additional descriptive information to convey its implicit semantics.

[0003] In Internet location services, a geographic location can be expressed in many ways, and the Chinese address is one of them. A standardized Chinese address should contain complete administrative divisions, and be expressed in the order of administrative divisions (province / city / county / township / village), roads...

Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/29, G06F40/289
Inventors: 黄爽, 李晓林, 谢婷婷, 严柯, 刘志杰, 段艳会, 张玉敏
Owner: WUHAN INSTITUTE OF TECHNOLOGY