Address entity co-reference resolution method based on density clustering algorithm

Density clustering applied to address entity coreference resolution, addressing the general difficulty of resolving address entities and the specific difficulty of identifying abbreviations and aliases.

Active Publication Date: 2019-12-24
南京安链数据科技有限公司
Cites: 6; Cited by: 1

AI Technical Summary

Problems solved by technology

[0004] To address the difficulty of address entity coreference resolution, and in particular of identifying abbreviations and aliases, the present invention proposes an address entity coreference resolution method based on a density clustering algorithm. The method obtains address coordinates from an electronic map, clusters the addresses in two consecutive passes using an improved distance measure for density clustering, unifies the different names given to the same building, and groups the addresses located in the same building, thereby achieving coreference resolution of address entities.

Examples

Detailed Description of the Embodiments

[0039] The technical solution of the present invention is further described below with reference to the accompanying drawings:

[0040] The present invention proposes an address entity coreference resolution method based on a density clustering algorithm. As shown in Figures 1 and 2, the steps are as follows:

[0041] S1. Store all original addresses to be processed in an address data set, one after another. Each original address consists of address information and a unique ID. The address information is generally made up of words and numbers, for example "Ninghai Road Street Shanxi Road Room 1806, World Trade Center Building, No. 67"; the ID may be a number or a code and serves mainly to identify the original address. Process the original addresses in the data set with the address resolution (geocoding) function of an electronic map interface to obtain the geographic coordinates and the formatted address corresponding to each original address. The geographic coordinates ...
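
Step S1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not name the map service, so the geocoder is passed in as a plain function, and the names `OriginalAddress`, `GeocodedAddress`, `geocode_all`, and `FAKE_MAP` are hypothetical. The coordinates and formatted string in `FAKE_MAP` are made-up placeholders, and collecting the IDs of addresses that fail to resolve is an added convenience, not something the source describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class OriginalAddress:
    id: str    # unique ID (a number or a code)
    text: str  # free-form address information


@dataclass(frozen=True)
class GeocodedAddress:
    id: str
    text: str
    lat: float
    lon: float
    formatted: str  # formatted address returned by the map interface


# Hypothetical geocoder shape: address text -> (lat, lon, formatted) or None.
Geocoder = Callable[[str], Optional[Tuple[float, float, str]]]


def geocode_all(
    dataset: List[OriginalAddress], geocode: Geocoder
) -> Tuple[List[GeocodedAddress], List[str]]:
    """Resolve every address; return the resolved records and the unresolved IDs."""
    resolved: List[GeocodedAddress] = []
    failed: List[str] = []
    for addr in dataset:
        hit = geocode(addr.text)
        if hit is None:
            failed.append(addr.id)
            continue
        lat, lon, formatted = hit
        resolved.append(GeocodedAddress(addr.id, addr.text, lat, lon, formatted))
    return resolved, failed


# Stand-in for a real map service; the values below are placeholders.
FAKE_MAP = {
    "Ninghai Road Street Shanxi Road Room 1806, World Trade Center Building, No. 67":
        (32.06, 118.77, "World Trade Center Building"),
}

dataset = [
    OriginalAddress("A001", "Ninghai Road Street Shanxi Road Room 1806, "
                            "World Trade Center Building, No. 67"),
    OriginalAddress("A002", "an address the stand-in map does not know"),
]
resolved, failed = geocode_all(dataset, FAKE_MAP.get)
```

Passing the geocoder as a function keeps the data-set handling independent of whichever electronic map interface is actually used.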

Abstract

The invention provides an address entity co-reference resolution method based on a density clustering algorithm. The method comprises the following steps: S1, processing an address data set with the address resolution function of an electronic map interface to obtain geographic coordinates and formatted addresses; S2, calculating a geographic distance and a name distance from the geographic coordinates and the formatted addresses, and performing a first density clustering pass to obtain a plurality of clusters; S3, re-clustering with a distance measure that is strict on name distance and loose on geographic distance to obtain a plurality of super-clusters; and S4, computing a name for each super-cluster and using it as the name of the building. By combining the textual and geographic information of an address, the method accurately finds the addresses located in the same building and greatly reduces, or even eliminates, the influence of address aliases, abbreviations, and typos.
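
The two clustering passes in S2 to S4 can be illustrated with the minimal sketch below. It is not the patented metric, which is not reproduced in the text above. The assumptions are: each record already carries a building `name` taken from its formatted address; name distance is `1 - difflib.SequenceMatcher.ratio()`; geographic distance is the haversine distance; the first pass is tight on geography and loose on names (the abstract only states the thresholds for the second pass); a cluster is summarized by its centroid and most frequent name; and all thresholds (50 m / 0.8, then 300 m / 0.2) and sample coordinates are invented for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def name_distance(a, b):
    """0.0 for identical names, approaching 1.0 for unrelated ones (assumed measure)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def make_rule(geo_eps_m, name_eps):
    """Two items are neighbours only if BOTH distances are within their thresholds."""
    def rule(a, b):
        return (haversine_m(a["coord"], b["coord"]) <= geo_eps_m
                and name_distance(a["name"], b["name"]) <= name_eps)
    return rule


def dbscan(items, is_neighbor, min_pts=1):
    """Plain DBSCAN over a neighbour predicate; returns one label per item, -1 = noise."""
    labels = [None] * len(items)

    def neighbors(i):
        return [j for j in range(len(items)) if is_neighbor(items[i], items[j])]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:     # only core points expand the cluster
                queue.extend(more)
    return labels


def top_name(members):
    return Counter(m["name"] for m in members).most_common(1)[0][0]


def resolve(addresses):
    # Pass 1 (assumed): tight geography, loose name.
    groups = {}
    for addr, label in zip(addresses, dbscan(addresses, make_rule(50.0, 0.8))):
        groups.setdefault(label, []).append(addr)
    clusters = [{
        "coord": (sum(m["coord"][0] for m in ms) / len(ms),
                  sum(m["coord"][1] for m in ms) / len(ms)),
        "name": top_name(ms),
        "members": ms,
    } for ms in groups.values()]
    # Pass 2: strict name, loose geography, giving super-clusters.
    supers = {}
    for c, label in zip(clusters, dbscan(clusters, make_rule(300.0, 0.2))):
        supers.setdefault(label, []).extend(c["members"])
    # Name each super-cluster by its most frequent member name (assumed rule).
    return [(top_name(ms), sorted(m["id"] for m in ms)) for ms in supers.values()]


addresses = [
    {"id": 1, "name": "World Trade Center Building", "coord": (32.0600, 118.7700)},
    {"id": 2, "name": "World Trade Center",          "coord": (32.0601, 118.7701)},
    {"id": 3, "name": "Wanda Center",                "coord": (32.0300, 118.7800)},
    {"id": 4, "name": "Wanda Center",                "coord": (32.1400, 118.9000)},
    {"id": 5, "name": "World Trade Center Building", "coord": (32.0615, 118.7700)},
]
result = resolve(addresses)
```

In this sample, records 1 and 2 merge in the first pass because they are a few metres apart, record 5 joins them in the second pass because its name matches exactly even though it lies farther away, and the two "Wanda Center" records stay separate because they are kilometres apart. The result is a list of pairs, not a dictionary keyed by name, because distinct buildings can share a name.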

Description

Technical field

[0001] The invention relates to a method for address entity coreference resolution and belongs to the technical field of machine learning.

Background art

[0002] Address data is usually semi-structured or even unstructured. Most of it is filled in by hand without unified, standardized processing, so it inherits the imprecision of natural language. Many places also have aliases or abbreviations. For example, "National Chuangyuan" is short for "National Leading Talent Pioneering Park": the two names refer to the same building (or group of buildings), yet they differ so much that a general-purpose program has difficulty assigning them to one category. Conversely, many identical names refer to different places. For example, a city can have several Wanda Centers, so addresses containing "Wanda Center" cannot all be assigned to the same building. In addition, due to human factors such as cleri...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/28, G06F16/29
CPC: G06F16/29, G06F16/285, Y02D10/00
Inventors: 袁栩栩, 李一明
Owner: 南京安链数据科技有限公司