Method and device for recognizing similar webpages
A technique for recognizing similar web pages, applied in special data processing applications, instruments, electrical digital data processing, and related fields. It addresses problems such as long time consumption, errors, and low efficiency, and compensates for that low efficiency.
Examples
Embodiment 1
[0064] This embodiment of the present invention provides a method for identifying similar webpages; see Figure 1. The method flow includes:
[0065] 101: Obtain Hypertext Markup Language (HTML) element information of a first webpage to be classified and HTML element information of a second webpage with known category information, respectively;
[0066] 102: Calculate the similarity between the first webpage and the second webpage according to the HTML element information of the first webpage and the second webpage;
[0067] 103: When the similarity is greater than a preset similarity threshold, determine that the first webpage and the second webpage are similar webpages.
[0068] The embodiment of the present invention obtains the HTML element information of the first webpage to be classified and the second webpage of the known category, and calculates the similarity according to the HTML element information corresponding to the two webpages to determine whether the two webpag...
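Steps 101 to 103 can be illustrated with a minimal Python sketch. The excerpt does not say which HTML element information is compared or how the similarity is computed, so the sketch assumes tag-name counts as the element information and a weighted Jaccard ratio as the similarity. The names TagCollector, element_info, similarity and is_similar, and the default threshold of 0.8, are hypothetical and not taken from the patent.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Counts every opening tag seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def element_info(html):
    """Step 101 (assumed form): reduce a page to a multiset of tag names."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def similarity(info_a, info_b):
    """Step 102 (assumed measure): weighted Jaccard ratio of two tag multisets."""
    union = sum((info_a | info_b).values())
    if union == 0:
        return 0.0
    return sum((info_a & info_b).values()) / union


def is_similar(html_a, html_b, threshold=0.8):
    """Step 103: the pages are similar when the similarity exceeds the threshold."""
    return similarity(element_info(html_a), element_info(html_b)) > threshold
```

Calling is_similar with the page to be classified and a page of known category returns True only when the ratio is strictly greater than the threshold, which matches the "greater than" wording of step 103.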
Embodiment 2
[0070] This embodiment of the present invention provides a method for identifying similar webpages; see Figure 2. The method flow includes:
[0071] 201: Obtain the Hypertext Markup Language (HTML) element information of the first webpage to be classified and the HTML element information of the second webpage with known category information, respectively.
[0072] Step 201 may specifically include:
[0073] 2011: Acquire the Document Object Model (DOM) structure information of the first webpage according to the Uniform Resource Locator (URL) of the first webpage to be classified.
[0074] A web crawler fetches the webpage information of the first webpage to be classified from the specified URL. The webpage information is the HTML code of the webpage, and the DOM structure information of the first webpage is obtained from that HTML code.
[0075] 2012: Acquire the DOM structure information of the second webpage with known category information from t...
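The crawl-and-parse idea in step 2011 can be sketched with the Python standard library. This is an illustration under stated assumptions, not the patent's implementation: the excerpt does not define the form of the DOM structure information, so the sketch represents it as a list of root-to-element tag paths. The names fetch_html, DomPathBuilder and dom_paths are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def fetch_html(url, timeout=10):
    """Download the HTML code of the page at the given URL."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class DomPathBuilder(HTMLParser):
    """Records the tag path from the root to every element, in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray closing tags; otherwise unwind to the matching opener.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def dom_paths(html):
    """Assumed DOM structure information: one tag path per element."""
    builder = DomPathBuilder()
    builder.feed(html)
    builder.close()
    return builder.paths
```

For a live page the two pieces would be combined as dom_paths(fetch_html(url)). The same dom_paths function could be applied to stored HTML of the second webpage, since only the first page needs to be crawled in step 2011.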
Embodiment 3
[0117] This embodiment of the present invention provides a device for identifying similar webpages; see Figure 4. The device includes:
[0118] The first obtaining module 401 is configured to obtain the Hypertext Markup Language (HTML) element information of the first webpage to be classified and the HTML element information of the second webpage with known category information, respectively;
[0119] The calculation module 402 is configured to calculate the similarity between the first webpage and the second webpage according to HTML element information of the first webpage and the second webpage;
[0120] The determining module 403 is configured to determine that the first webpage and the second webpage are similar webpages when the similarity is greater than a preset similarity threshold.
[0121] The embodiment of the present invention obtains the HTML element information of the first webpage to be classified and the second webpage of the known category, and calculates the...
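The module layout of the device can be sketched as a small Python class whose fields correspond to modules 401 to 403. This is only one way to arrange the three responsibilities. The class name SimilarPageDevice and the stand-in functions tag_set and jaccard are hypothetical; the stand-ins use a simple set of tag names, which is an assumption rather than the patent's definition of HTML element information.

```python
import re
from dataclasses import dataclass
from typing import Callable


def tag_set(html):
    """Stand-in obtaining step: the set of opening tag names in the HTML."""
    return set(re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html))


def jaccard(info_a, info_b):
    """Stand-in calculation step: Jaccard ratio of two tag-name sets."""
    union = info_a | info_b
    if not union:
        return 0.0
    return len(info_a & info_b) / len(union)


@dataclass
class SimilarPageDevice:
    obtain: Callable[[str], set]             # role of obtaining module 401
    calculate: Callable[[set, set], float]   # role of calculation module 402
    threshold: float                         # preset similarity threshold

    def determine(self, first_html, second_html):
        """Role of determining module 403: compare similarity to the threshold."""
        first_info = self.obtain(first_html)
        second_info = self.obtain(second_html)
        return self.calculate(first_info, second_info) > self.threshold
```

Passing the obtaining and calculation steps in as callables keeps the determining logic independent of how element information is extracted or compared, so either step can be swapped without touching the other two.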