A Chinese address semantic tagging method based on Bayesian word segmentation algorithm

A Chinese address semantic annotation technology, applied in computing, geographic information databases, natural language data processing, and related fields, to achieve fast and accurate semantic analysis of address data.

Active Publication Date: 2019-05-28

WUHAN INSTITUTE OF TECHNOLOGY



Problems solved by technology

Chinese address strings contain no explicit separator between address elements, and existing techniques for identifying those elements still face difficulties with completeness of parsing, the diversity of address forms, and ambiguity.


Examples


Embodiment 1

[0076] The specific implementation process of the present invention will be described below by taking the Chinese address "Yenlord Food Plaza, No. 137, Dongma Road, Nankai District" as an example.

[0077] P1: Set the annotation relationship table, which can be designed as shown in Table 1.

[0078] P2: Obtain a set T of NT pre-segmented and tagged pieces of Chinese address data as the training corpus, where T = {T_i}, each piece of Chinese address data is T_i, and 1 ≤ i ≤ NT.

[0079] P3: Perform statistical learning on the set T. The specific steps of statistical learning include:

[0080] P31: For the segmented words in the set T, count the frequency of each word and the co-occurrence frequency of each word with the word immediately preceding it, and store the counts in the word frequency dictionary Word_dic;

[0081] P32: Count each word and the tagging relationship corresponding to the word, and store it in the tagging relationship dictionary Taging_dic;

[0082] P3...
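Steps P31 and P32 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: the toy corpus, the Chinese renderings of the example address words, the tag names (Table 1 is not reproduced in this excerpt), the `<s>` start marker, and the function name `learn` are all hypothetical. The source stores word and word-pair frequencies together in Word_dic; the sketch keeps them in two separate counters (`word_dic` and `pair_dic`) for readability.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each record is a pre-segmented, tagged address,
# given as a list of (word, tag) pairs. Tag names are placeholders.
corpus = [
    [("南开区", "district"), ("东马路", "road"), ("137号", "house_number")],
    [("南开区", "district"), ("黄河道", "road"), ("20号", "house_number")],
]

def learn(corpus):
    """Count word frequencies, (previous word, word) frequencies, and word-tag pairs."""
    word_dic = Counter()                # word -> frequency (P31)
    pair_dic = Counter()                # (previous word, word) -> frequency (P31)
    taging_dic = defaultdict(Counter)   # word -> tag -> frequency (P32)
    for record in corpus:
        prev = "<s>"                    # assumed start-of-address marker
        for word, tag in record:
            word_dic[word] += 1
            pair_dic[(prev, word)] += 1
            taging_dic[word][tag] += 1
            prev = word
    return word_dic, pair_dic, taging_dic

word_dic, pair_dic, taging_dic = learn(corpus)
```

Counting the pair `(previous word, word)` rather than an unordered pair follows the wording of P31, which refers to each word and its adjacent previous word.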

Embodiment 2

[0094] Embodiment 1 above covers the case in which the address information contains no words whose tag is uncertain. Next, the specific implementation process of the present invention will be described by taking the Chinese address "Lane 98, Bixiu Road, Minhang District, Shanghai" as an example.

[0095] A1: Set the annotation relationship table, which can be designed as shown in Table 1.

[0096] A2: Obtain a set T of NT pre-segmented and tagged pieces of Chinese address data as the training corpus, where T = {T_i}, each piece of Chinese address data is T_i, and 1 ≤ i ≤ NT.

[0097] A3: Perform statistical learning on the set T. The specific steps of statistical learning include:

[0098] A31: For the segmented words in the set T, count the frequency of each word and the co-occurrence frequency of each word with its adjacent word, and store the counts in the word frequency dictionary Word_dic;

[0099] A32: Count each word and the tagging relationship corres...


Abstract

The invention provides a Chinese address semantic tagging method based on the Bayes word segmentation algorithm. The method comprises the steps of: S1, presetting a tagging relationship list for semantic tagging of Chinese address data; S2, acquiring pre-segmented and tagged training corpus data; S3, performing statistical learning on the training corpus data to obtain a word frequency dictionary, a tagging relationship dictionary and a tagging mode list; S4, inputting the address string to be tagged and performing full segmentation on it; S5, obtaining the word segmentation scheme with the maximum probability based on the word frequency dictionary and the Bayes word segmentation algorithm; S6, tagging the word segmentation scheme according to the tagging relationship dictionary to obtain a tagging result. The tagging relationship list defines a normalized tagging template; statistical learning on the training corpus data yields a database comprising the word frequency dictionary, the tagging relationship dictionary and the tagging mode list; and matching against this trained database produces tagged Chinese addresses carrying semantic information, so that semantic parsing of Chinese address data can be completed rapidly and accurately.
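Steps S4 to S6 can be sketched in Python as follows. This is a hedged illustration rather than the patented procedure: the excerpt does not give the probability formula, so the sketch assumes a bigram model with add-one smoothing. The toy dictionaries, the Chinese renderings of the example address, the tag names, the `<s>` start marker, and the function names `full_segment`, `log_prob`, `best_segmentation`, and `tag_words` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical dictionaries, as statistical learning (S3) might produce them.
word_dic = Counter({"南开区": 2, "南开": 1, "区": 1, "东马路": 2,
                    "东": 1, "马路": 1, "137号": 2})
pair_dic = Counter({("<s>", "南开区"): 2, ("南开区", "东马路"): 2,
                    ("东马路", "137号"): 2})
taging_dic = {"南开区": Counter({"district": 2}),
              "东马路": Counter({"road": 2}),
              "137号": Counter({"house_number": 2})}

def full_segment(text, vocab):
    """S4: enumerate every split of text into words found in vocab."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            for tail in full_segment(text[end:], vocab):
                results.append([head] + tail)
    return results

def log_prob(words, word_dic, pair_dic):
    """Log-probability under an assumed bigram model with add-one smoothing."""
    context = Counter()                 # how often each word appears as a predecessor
    for (prev, _), count in pair_dic.items():
        context[prev] += count
    vocab_size = len(word_dic)
    total, prev = 0.0, "<s>"
    for w in words:
        total += math.log((pair_dic[(prev, w)] + 1) / (context[prev] + vocab_size))
        prev = w
    return total

def best_segmentation(text, word_dic, pair_dic):
    """S5: pick the candidate segmentation with the highest probability."""
    candidates = full_segment(text, word_dic)
    return max(candidates, key=lambda ws: log_prob(ws, word_dic, pair_dic))

def tag_words(words, taging_dic):
    """S6: attach the most frequent tag recorded for each word."""
    return [(w, taging_dic[w].most_common(1)[0][0] if w in taging_dic else "unknown")
            for w in words]

best = best_segmentation("南开区东马路137号", word_dic, pair_dic)
tagged = tag_words(best, taging_dic)
```

Enumerating every candidate grows exponentially with the length of the string; a dynamic-programming search over split positions would be the usual choice for longer addresses, and the exhaustive form is kept here only to mirror the "full segmentation" wording of S4.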

Description

Technical field

[0001] The invention relates to the technical field of Chinese address analysis, and in particular to a Chinese address semantic tagging method based on a Bayesian word segmentation algorithm.

Background

[0002] With the development of the Internet, more and more network content comes directly from information uploaded and shared by users. Websites that provide consumer lifestyle platforms receive thousands of pieces of merchant address information uploaded by users, and a large part of this address information is free text, with no additional descriptive information indicating its structure or implicit semantics.

[0003] In Internet location services, a geographic location can be expressed in many ways, and the Chinese address is one of them. A standardized Chinese address should contain complete administrative divisions, and be expressed in the order of administrative divisions (province / city / county / township / village), roads...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F17/27, G06F16/35

CPC: G06F16/29, G06F40/289

Inventor: 黄爽, 李晓林, 谢婷婷, 严柯, 刘志杰, 段艳会, 张玉敏

Owner WUHAN INSTITUTE OF TECHNOLOGY
