Semantic deduplication method and device for uniform resource locators, equipment and medium

The method applies semantic analysis to uniform resource locators in the network field. It addresses the problem of URL misjudgment and reduces the number of URLs that are mistakenly deleted.

Active Publication Date: 2018-12-07
SF TECH
6 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0014] It can be seen that the current URL semantic deduplication method suffers from URL misjudgment.

Examples

Detailed Description of Embodiments

[0035] The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.

[0036] It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other.

[0037] The present application will be described in detail below with reference to the accompanying drawings and embodiments.

[0038] Figure 1 shows an exemplary flow chart of a URL semantic deduplication method provided by an embodiment of the present application. The method comprises the following steps:

[0039] Step 110: determine the hash value of each URL.

[0040] Specifically, first, segmen...
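
Paragraph [0040] is truncated in this copy, so the actual hashing procedure is not available here. As a rough illustration only, the sketch below assumes that a URL is segmented into its host and path segments and that a SimHash-style fingerprint is computed over those segments, so that URLs sharing most segments tend to have fingerprints with a small Hamming distance. The function names `segment_url`, `simhash`, and `hamming`, the 64-bit width, and the use of MD5 as the per-token hash are all assumptions and do not come from the patent text.

```python
import hashlib
from urllib.parse import urlsplit


def segment_url(url: str) -> list[str]:
    """Split a URL into its host plus non-empty path segments (assumed scheme)."""
    parts = urlsplit(url)
    tokens = [parts.netloc]
    tokens += [seg for seg in parts.path.split("/") if seg]
    return tokens


def simhash(tokens: list[str], bits: int = 64) -> int:
    """SimHash-style fingerprint over position-tagged tokens (an assumption)."""
    weights = [0] * bits
    for pos, tok in enumerate(tokens):
        # Tag each token with its position so "/a/b" and "/b/a" differ.
        digest = hashlib.md5(f"{pos}:{tok}".encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # use the first 64 bits
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the summed token votes are positive.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    u1 = "http://abc.com/yun/task/1"
    u2 = "http://abc.com/yun/task/2"
    print(segment_url(u1))
    print(hamming(simhash(segment_url(u1)), simhash(segment_url(u2))))
```

Under these assumptions, two URLs could be treated as similar when the Hamming distance between their fingerprints falls below some chosen cutoff; the cutoff itself would have to come from the full specification.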

Abstract

The invention discloses a semantic deduplication method and apparatus for uniform resource locators, a device, and a medium. The method comprises: determining a hash value of each uniform resource locator (URL); partitioning the URLs into a plurality of URL sets based on the hash value of each URL, wherein any two URLs in each URL set are similar; constructing a spanning tree for each URL set; pruning the spanning tree of each URL set according to a preset branch number threshold to obtain a pruned spanning tree for each URL set; and traversing the pruned spanning tree of each URL set to obtain the deduplicated URL sets. The technical scheme of the embodiments can effectively reduce the number of URLs deleted by mistake.
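
The abstract does not spell out how the spanning tree is built or what pruning does to a branch. The sketch below makes one possible reading concrete. It assumes that each URL set is stored as a prefix tree over path segments, that a node with more children than the branch number threshold has those children merged into a single placeholder child, and that traversal emits one URL per root-to-leaf path. The names `build_tree`, `merge`, `prune`, `traverse`, and `dedupe`, the `{*}` placeholder, and the merge rule itself are illustrative assumptions, not the patented procedure.

```python
from urllib.parse import urlsplit

PLACEHOLDER = "{*}"  # hypothetical marker for a merged branch


def build_tree(urls):
    """Build a nested-dict prefix tree: scheme://host, then path segments."""
    root = {}
    for url in urls:
        parts = urlsplit(url)
        node = root.setdefault(f"{parts.scheme}://{parts.netloc}", {})
        for seg in parts.path.split("/"):
            if seg:
                node = node.setdefault(seg, {})
    return root


def merge(a, b):
    """Recursively copy subtree b into subtree a."""
    for key, child in b.items():
        merge(a.setdefault(key, {}), child)
    return a


def prune(node, threshold):
    """Collapse all children into one placeholder where fan-out > threshold."""
    if len(node) > threshold:
        merged = {}
        for child in node.values():
            merge(merged, child)
        node = {PLACEHOLDER: merged}
    return {key: prune(child, threshold) for key, child in node.items()}


def traverse(node, prefix=""):
    """Yield one URL per root-to-leaf path.

    Simplification: a URL that is a strict prefix of another is not emitted.
    """
    for key, child in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if child:
            yield from traverse(child, path)
        else:
            yield path


def dedupe(urls, threshold=10):
    """Prune below each host separately, then list the surviving paths."""
    tree = build_tree(urls)
    pruned = {host: prune(sub, threshold) for host, sub in tree.items()}
    return sorted(traverse(pruned))


if __name__ == "__main__":
    urls = [f"http://abc.com/yun/task/{i}" for i in range(1, 101)]
    urls.append("http://abc.com/yun/list")
    for u in dedupe(urls, threshold=10):
        print(u)
```

In this reading, the threshold controls how aggressive the deduplication is: a small threshold collapses almost every varying segment, while a large one leaves functionally distinct sibling paths such as `list` and `task` untouched.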

Description

Technical field

[0001] The present disclosure relates to the field of network technologies, and in particular to a method, apparatus, device, and medium for semantic deduplication of Uniform Resource Locators (URLs).

Background

[0002] In web applications, different URLs correspond to different functional interfaces. Extracting these URLs is the first task in many practical applications. For example, security penetration testing and URL page traffic statistics both require finding the URLs that exist in the system. During URL extraction, deduplication can greatly reduce the number of redundant URLs and improve the efficiency of follow-up work. The following situation arises during URL deduplication:

[0003] Group A URL list:

[0004] http://abc.com/yun/task/1

[0005] http://abc.com/yun/task/2

[0006] …

[0007] http://abc.com/yun/task/100

[0008] As shown in the URL list of Group A abov...
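
To make the Group A situation concrete, the sketch below regenerates the list and compares exact string deduplication with a naive structural key. It assumes the elided entries run from 3 to 99. The key used here, which replaces all-digit path segments with a `{num}` placeholder, is only an assumed stand-in for a simple semantic deduplication rule; the truncated text does not say which rule the background art uses, and `structural_key` is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit

# Group A from paragraphs [0003]-[0007], assuming the "…" covers 3 to 99.
group_a = [f"http://abc.com/yun/task/{i}" for i in range(1, 101)]


def structural_key(url: str) -> str:
    """Replace all-digit path segments with a placeholder (assumed rule)."""
    parts = urlsplit(url)
    segs = [
        "{num}" if re.fullmatch(r"\d+", seg) else seg
        for seg in parts.path.split("/")
    ]
    return f"{parts.scheme}://{parts.netloc}" + "/".join(segs)


exact = set(group_a)                                # string-level deduplication
structural = {structural_key(u) for u in group_a}   # structure-level deduplication

print(len(exact), len(structural))  # prints "100 1"
```

Exact matching keeps all 100 entries because every string differs, whereas the structural key maps them to a single entry, which is the kind of reduction the background section describes.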

Application Information

IPC(8): G06F17/30
Inventors: 张振海, 罗剑江, 胡泽柱
Owner: SF TECH