
Road entity data deduplication method and device, computing equipment and medium

A data processing technology for road entity data. It addresses the inability to effectively identify duplicate road intelligence, and improves both deduplication effectiveness and processing efficiency.

Active Publication Date: 2019-09-10
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 7 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] However, because intelligence is published in different forms of expression, the content released by different media differs even for the same road change event. As a result, deduplication based on web links or text similarity cannot effectively determine whether intelligence items are duplicates.



Examples


Embodiment 1

[0030] Figure 1 is a flowchart of the road entity data deduplication method provided by Embodiment 1 of the present invention. This embodiment is applicable to deduplicating road source data obtained from the Internet (for example, various network media data related to roads) that concerns the same road entity, especially where full-text analysis is needed to determine whether pieces of road source data describe a repeated road entity event. Road entity data can be understood as the valid data in the road source data that is worth processing. The method can be executed by a road entity data deduplication device, which can be implemented in software and/or hardware and integrated into any computing device, including but not limited to a server.

[0031] As shown in Figure 1, the road entity data deduplication method provided in this embodiment may include:

[0032] S110. Obtain at least one piece of road source data,...

Embodiment 2

[0044] Figure 2 is a flowchart of the road entity data deduplication method provided by Embodiment 2 of the present invention. This embodiment further optimizes and extends the foregoing embodiments. As shown in Figure 2, the method may include:

[0045] S210. Obtain at least one piece of road source data, and classify it into at least one data subset according to road entity event type, where one data subset corresponds to one type of road entity event and the road source data describes road entity events.
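The grouping in S210 can be sketched as follows. This is a minimal illustration that assumes each piece of road source data already carries an event-type label; the `RoadSourceRecord` class, its fields, and `classify_by_event_type` are hypothetical names not taken from the patent, and how the event type is determined is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RoadSourceRecord:
    """One piece of road source data (hypothetical schema for illustration)."""
    record_id: str
    event_type: str  # road entity event type label, assumed to be precomputed
    text: str        # text content describing the road entity event


def classify_by_event_type(records):
    """Split records into data subsets so that each subset holds one event type."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record.event_type].append(record)
    # Return a plain dict so later lookups do not silently create empty subsets.
    return dict(subsets)
```

Each resulting subset can then be processed on its own, so later matching only compares records that share an event type.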

[0046] S220. Segment the text content of each piece of road source data in each data subset, and use the resulting first words as road candidate words.

[0047] In this embodiment, the first words are a number of words that describe road names. For example, when parsing the text content of each piece of road source data, the forward maximum matc...
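Paragraph [0047] is truncated but appears to refer to forward maximum matching for segmentation. The sketch below shows a generic forward maximum matching segmenter plus a filter that keeps only dictionary road names as candidate words. The road-name dictionary, the `max_len` window, and both function names are assumptions for illustration; the patent's actual dictionary and parameters are not given in this excerpt.

```python
def forward_maximum_matching(text, dictionary, max_len=None):
    """Segment text left to right, always taking the longest dictionary word.

    Characters not covered by any dictionary word become single-character tokens.
    """
    if max_len is None:
        # The window only needs to be as long as the longest dictionary entry.
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(max_len, 1)  # guard against a zero-width window
    tokens = []
    i = 0
    while i < len(text):
        # Try the widest window first, then shrink until a word matches;
        # a single character is always accepted so the loop makes progress.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def road_candidate_words(text, road_dictionary):
    """Return the segmented words that are known road names (hypothetical dictionary)."""
    return [w for w in forward_maximum_matching(text, road_dictionary) if w in road_dictionary]
```

With a dictionary containing both "长安" and "长安街", the greedy longest match keeps "长安街" as a single road candidate instead of splitting it.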

Embodiment 3

[0062] Figure 3 is a flowchart of the road entity data deduplication method provided by Embodiment 3 of the present invention. This embodiment further optimizes and extends the foregoing embodiments. As shown in Figure 3, the method may include:

[0063] S310. Obtain at least one piece of road source data, and classify it into at least one data subset according to road entity event type, where one data subset corresponds to one road entity event type and the road source data describes road entity events.

[0064] S320. Determine the road name and the geographical area names in the text content of each piece of road source data in each data subset.
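One simple way to pick out road names and geographical area names from segmented tokens is a suffix heuristic. The suffix lists and `tag_road_and_area_names` below are hypothetical and only illustrate the idea; this excerpt does not say how the patent recognizes the names.

```python
# Hypothetical, illustrative suffix lists (not taken from the patent).
ROAD_SUFFIXES = ("路", "街", "大道", "高速", "桥")
AREA_SUFFIXES = ("省", "市", "区", "县", "镇")


def tag_road_and_area_names(tokens):
    """Split segmented tokens into road names and geographical area names by suffix."""
    roads, areas = [], []
    for token in tokens:
        if len(token) < 2:
            continue  # a bare suffix character is not a name on its own
        if token.endswith(ROAD_SUFFIXES):
            roads.append(token)
        elif token.endswith(AREA_SUFFIXES):
            areas.append(token)
    return roads, areas
```

`str.endswith` accepts a tuple, so each token is checked against all suffixes in one call; road suffixes are tested first, so a token matching both lists is treated as a road.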

[0065] S330. For the at least one road name and at least two geographical area names determined from the text content of each piece of road source data in each data subset, according to the affiliation relationship between the road and the ...
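Step S330 is cut off in this excerpt, so its exact rule is unknown. The sketch below shows one plausible reading under a stated assumption: a hypothetical `affiliation` table maps each road name to the set of area names that contain it, and for each road only the extracted area names it is affiliated with are kept. The table contents and `resolve_road_areas` are illustrative, not the patent's definition.

```python
def resolve_road_areas(road_names, area_names, affiliation):
    """Keep, for each road, only the extracted area names it belongs to.

    affiliation: hypothetical mapping from road name to the set of area names
    that contain the road (for example, loaded from existing map data).
    """
    resolved = {}
    for road in road_names:
        known_areas = affiliation.get(road, set())
        # Preserve the order in which the area names appeared in the text.
        resolved[road] = [area for area in area_names if area in known_areas]
    return resolved
```

A road missing from the table ends up with an empty area list, which a caller could treat as "area unresolved".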



Abstract

The embodiment of the invention discloses a road entity data deduplication method and device, computing equipment and a medium. The method comprises: acquiring road source data, which describes road entity events, and classifying it into at least one data subset according to road entity event type, with one data subset corresponding to one event type; determining a road name and a geographic area name in the text content of each piece of road source data in each data subset; and, based on these road names and geographic area names, performing text matching against historical road source data of the same road entity event type as the corresponding data subset to determine the duplicate texts in each data subset. The embodiments of the invention can improve the effectiveness of deduplicating road entity data in Internet data, and thereby improve the efficiency of processing massive amounts of road entity data.
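The matching step described in the abstract can be approximated as follows. This minimal sketch assumes exact matching on the key (event type, road name, area name) against an index of historical road source data; the tuple layout and the function names `build_history_index` and `find_duplicates` are hypothetical, and the patent's actual text-matching criterion may be looser.

```python
from collections import defaultdict


def build_history_index(history):
    """Index historical records by (event_type, road_name, area_name).

    history: iterable of (record_id, event_type, road_name, area_name) tuples.
    """
    index = defaultdict(set)
    for record_id, event_type, road, area in history:
        index[(event_type, road, area)].add(record_id)
    return index


def find_duplicates(new_records, index):
    """Map each new record id to the historical ids sharing its key, if any."""
    duplicates = {}
    for record_id, event_type, road, area in new_records:
        # .get() avoids inserting empty entries into the defaultdict.
        matches = index.get((event_type, road, area))
        if matches:
            duplicates[record_id] = sorted(matches)
    return duplicates
```

Including the event type in the key mirrors the restriction that matching happens only within historical data of the same road entity event type: a report about the same road and area under a different event type is not flagged.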

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of data processing, and in particular to a method, device, computing device and medium for deduplicating road entity data.

Background

[0002] For map products, acquiring real-world road change information in a timely and effective manner, updating the map accordingly, and keeping the map current are important factors in ensuring map data accuracy and user satisfaction. The Internet contains a large amount of road update information, so capturing, mining and operating on Internet road information is an important link in ensuring the quality of map road data. Given the massive volume of Internet road information (for example, web news, official Weibo posts and public account articles), the probability of duplicate information is very high. Deduplication processing of Internet road information is very important for reducing labor costs and impro...


Application Information

IPC(8): G06F16/29; G06F16/35; G06F17/27
CPC: G06F16/29; G06F16/35; G06F40/295
Inventors: 马赛, 李江龙, 李烜赫
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD