Road entity data deduplication method and device, computing equipment and medium

A road entity data deduplication technique, applied in the field of data processing, that addresses the inability to effectively identify duplicate road information and improves both the effectiveness of deduplication and processing efficiency.

Active Publication Date: 2019-09-10
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, because publishers word their information differently, even for the same road change event, the information content released by ...



Examples


Example Embodiment

[0029] Example one

[0030] Figure 1 is a flowchart of a road entity data deduplication method provided in the first embodiment of the present invention. This embodiment applies to removing duplicates from road source data obtained from the Internet, such as network media data related to roads, and especially to cases where full-text analysis is required to determine whether pieces of road source data describe the same road entity event. Road entity data can be understood as the effective data in the road source data that has processing value. The method can be executed by a road entity data deduplication device, which can be implemented in software and/or hardware and can be integrated in any computing device, including but not limited to a server.

[0031] As shown in Figure 1, the road entity data deduplication method provided in this embodiment may include:

[0032] S110. Obtain at least one road source data, and classify ...

Example Embodiment

[0043] Example two

[0044] Figure 2 is a flowchart of a road entity data deduplication method provided in the second embodiment of the present invention. This embodiment is further optimized and extended on the basis of the above embodiment. As shown in Figure 2, the method can include:

[0045] S210. Obtain at least one piece of road source data, and classify it into at least one data subset according to road entity event type, where each data subset corresponds to one road entity event type and the road source data is used to describe road entity events.
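
A minimal sketch of the classification in S210, assuming a simple keyword lookup. The event type names, the keywords, and the function names are hypothetical; the source does not say how event types are recognised.

```python
from collections import defaultdict

# Hypothetical event types and trigger keywords; the source does not list them.
EVENT_KEYWORDS = {
    "road_closure": ("closed", "closure"),
    "road_opening": ("opened", "opens to traffic"),
    "road_construction": ("construction", "repair"),
}


def classify_event_type(text):
    """Return the first event type whose keyword occurs in the text, else None."""
    lowered = text.lower()
    for event_type, keywords in EVENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return event_type
    return None


def split_into_subsets(road_source_data):
    """Group texts so that each subset corresponds to exactly one event type."""
    subsets = defaultdict(list)
    for text in road_source_data:
        event_type = classify_event_type(text)
        if event_type is not None:  # texts with no recognised event are dropped
            subsets[event_type].append(text)
    return dict(subsets)
```

Grouping first means that later matching only compares texts describing the same kind of event.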

[0046] S220. Use the first words obtained by word segmentation of the text content corresponding to each piece of road source data in each data subset as road candidate words.

[0047] The first words in this embodiment are a certain number of words that have the characteristics of describing road names. For example, in the process of parsing the text content corresponding to each roa...
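
One way to picture road candidate words is sketched below. It is an illustration only: a regular expression over English text stands in for a real word segmenter, and the suffix list, the `max_candidates` limit, and the function name are assumptions rather than details from the source.

```python
import re

# Assumed suffixes that mark a phrase as road-name-like.
ROAD_SUFFIXES = ("Road", "Street", "Avenue", "Expressway", "Bridge")

# One or more capitalised words followed by a road-type suffix.
ROAD_PATTERN = re.compile(
    r"\b(?:[A-Z][a-z]+ )+(?:%s)\b" % "|".join(ROAD_SUFFIXES)
)


def road_candidate_words(text, max_candidates=3):
    """Return up to max_candidates distinct phrases that look like road names."""
    candidates = []
    for match in ROAD_PATTERN.finditer(text):
        phrase = match.group(0)
        if phrase not in candidates:
            candidates.append(phrase)
        if len(candidates) == max_candidates:
            break
    return candidates
```

For Chinese text, the regular expression would be replaced by a proper segmenter plus a filter on road-name features.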

Example Embodiment

[0061] Example three

[0062] Figure 3 is a flowchart of a road entity data deduplication method provided in the third embodiment of the present invention. This embodiment is further optimized and extended on the basis of the foregoing embodiment. As shown in Figure 3, the method can include:

[0063] S310. Obtain at least one piece of road source data, and classify it into at least one data subset according to road entity event type, where each data subset corresponds to one road entity event type and the road source data is used to describe road entity events.

[0064] S320. Determine the road name and geographic area name in the text content corresponding to each piece of road source data in each data subset.

[0065] S330. For at least one road name and at least two geographical area names determined from the text content corresponding to each piece of road source data in each data subset, according to the affiliation relationship between the road and the geogra...
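
The text of S330 is cut off, so what the method does with the affiliation relationship is not stated here. Purely as an illustration of what an affiliation lookup between a road and geographic areas could look like, the sketch below keeps only the mentioned areas that a road is known to belong to. The gazetteer contents and the function name are hypothetical.

```python
# Hypothetical gazetteer: road name -> geographic areas the road belongs to.
ROAD_TO_AREAS = {
    "Binhe Road": {"Haidian District", "Beijing"},
    "Jiefang Street": {"Chaoyang District", "Beijing"},
}


def areas_affiliated_with_road(road_name, area_names):
    """Keep only the mentioned areas that the road belongs to, in input order."""
    known_areas = ROAD_TO_AREAS.get(road_name, set())
    return [area for area in area_names if area in known_areas]
```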


Abstract

The embodiment of the invention discloses a road entity data deduplication method and device, computing equipment and a medium. The method comprises: acquiring road source data and classifying it into at least one data subset according to road entity event type, where each data subset corresponds to one road entity event type and the road source data is used to describe road entity events; determining a road name and a geographic area name in the text content corresponding to each piece of road source data in each data subset; and, according to the road name and the geographic area name in the text content corresponding to each piece of road source data, performing text matching against historical road source data belonging to the same road entity event type as the corresponding data subset, and determining the duplicate texts in each data subset. According to the embodiment of the invention, the effectiveness of deduplicating road entity data in Internet data can be improved, which in turn improves the processing efficiency for massive road entity data.
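
The matching step in the abstract can be sketched as follows, assuming that road names and area names have already been extracted. Exact equality on a (road name, area name) pair stands in for the text matching; the actual matching rule is not given in this excerpt, and the data layout and function name are assumptions.

```python
def find_duplicates(subsets, history):
    """Flag texts whose (road, area) pair already exists for the same event type.

    subsets: {event_type: [(text, road_name, area_name), ...]}
    history: {event_type: set of (road_name, area_name)} from earlier data
    Returns {event_type: [duplicate texts]}.
    """
    duplicates = {}
    for event_type, records in subsets.items():
        # Only history of the same event type is consulted.
        seen = history.get(event_type, set())
        duplicates[event_type] = [
            text
            for text, road_name, area_name in records
            if (road_name, area_name) in seen
        ]
    return duplicates
```

Restricting the lookup to history of the same event type keeps a closure report from being treated as a duplicate of an opening report on the same road.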

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of data processing, and in particular to a method, device, computing device and medium for deduplication of road entity data.

Background technique

[0002] For map products, timely and effective acquisition of road change information in the real world, updating the map, and improving the timeliness of the map are important factors to ensure the accuracy of map data and user satisfaction. There is a large amount of road update information on the Internet. Therefore, capturing, mining and operating road information on the Internet is an important link to ensure the quality of map road data. In the face of massive Internet road information (for example, webpage news, Weibo official documents and public account articles, etc.), the probability of duplicate information is very high. Deduplication processing of Internet road information is very important for reducing labor costs and impro...


Application Information

IPC(8): G06F16/29, G06F16/35, G06F17/27
CPC: G06F16/29, G06F16/35, G06F40/295
Inventor: 马赛李江龙李烜赫
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD