A method for identifying link blocks of web pages based on block tree
A method and technology for identifying link blocks, applied in the fields of special data processing applications, instruments, and electrical digital data processing. It addresses problems in accurately judging link blocks, such as interference with link-block judgment and the number of links being ignored. It achieves a flexibly adjustable quantity scale, fast recognition, and good results.
Embodiment Construction
[0100] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
[0101] Referring to Figure 1, which is a flowchart of the present invention, the method for identifying web page link blocks based on a block tree comprises the following steps:
[0102] Step 1: Input a collection of web pages. Step 1 comprises the following sub-steps:
[0103] Step 1.1, encoding identification: first determine the encoding format of the web page, such as UTF-8 or GB2312;
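Step 1.1 does not say how the encoding format is obtained. Below is a minimal sketch, assuming the raw bytes of the page are available and that checking for a byte-order mark, then a `charset` declaration, then a trial UTF-8 decode is an acceptable strategy. The function name `detect_encoding` and the `gb18030` fallback (a superset of GB2312) are hypothetical choices, not part of the patent.

```python
import codecs
import re

# Matches declarations such as <meta charset="utf-8"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE)


def detect_encoding(raw: bytes, default: str = "gb18030") -> str:
    """Guess the encoding of an HTML document from its raw bytes.

    Hypothetical helper: the detection order and the fallback encoding
    are assumptions for illustration only.
    """
    # 1. A UTF-8 byte-order mark is the most reliable signal.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # 2. Look for a charset declaration near the top of the document.
    match = _CHARSET_RE.search(raw[:4096])
    if match:
        name = match.group(1).decode("ascii").lower()
        try:
            return codecs.lookup(name).name  # normalize, e.g. "utf8" -> "utf-8"
        except LookupError:
            pass  # unknown charset name; fall through to trial decoding
    # 3. Trial-decode as UTF-8; otherwise assume a Chinese encoding.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return default
```

The returned name can then be passed to `raw.decode(...)` so that the character scan in Step 1.2 works on text rather than bytes.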
[0104] Step 1.2, web page reading: scan the HTML document of the web page to be identified character by character, and identify the start position and the end position respectively;
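The character scan in Step 1.2 can be sketched as a single left-to-right pass that records each start position and pairs it with the next end position. The actual start and end characters are missing from this excerpt, so the sketch takes them as parameters; the function name `scan_positions` and the use of `<` and `>` in the example are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def scan_positions(html: str, start_char: str, end_char: str) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs found by scanning `html` once.

    Hypothetical helper: `start_char` and `end_char` stand in for the
    delimiters elided from the source text.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, ch in enumerate(html):
        if ch == start_char:
            # A newer start position replaces an earlier unmatched one, so no
            # other start character lies between a recorded start and its end.
            start = i
        elif ch == end_char and start is not None:
            spans.append((start, i))
            start = None  # wait for the next start position
    return spans
```

For example, `scan_positions("<p>hi</p>", "<", ">")` yields the spans `(0, 2)` and `(5, 8)`. The single pass keeps the scan linear in the document length.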
[0105] Define the following concepts:
[0106] Word
[0107] The starting position starts with the character "", and there is no string of characters "" between the two;
[0108] The end position starts with the character "", and...