Web page information block extracting method and apparatus
A method and apparatus for extracting information blocks, applied in the field of extracting cohesive regions in web pages, which can solve problems such as lack of versatility.
Embodiment Construction
[0026] figure 1
[0027] figure 2
[0028] .
[0029] In the repeated pattern discovery unit 203, a suffix tree of the HTML tag token stream is constructed, and all repeated patterns and corresponding occurrences are retrieved from the suffix tree.
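The retrieval step in [0029] can be illustrated with a minimal sketch. It assumes the HTML has already been reduced to a list of tag tokens, and it swaps in a naive, uncompressed suffix trie (quadratic in the input length) for the suffix tree named in the text, purely to keep the example short. The names `build_suffix_trie`, `repeated_patterns`, and the `min_len` parameter are hypothetical and do not come from the patent.

```python
def build_suffix_trie(tokens):
    """Insert every suffix of `tokens` into a trie of nested dicts.

    Each node records the start positions of all suffixes passing through it,
    i.e. every occurrence of the token sequence spelled by its path.
    """
    root = {"children": {}, "starts": []}
    for start in range(len(tokens)):
        node = root
        for tok in tokens[start:]:
            node = node["children"].setdefault(tok, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def repeated_patterns(tokens, min_len=2):
    """Return {pattern: [start positions]} for patterns occurring at least twice."""
    root = build_suffix_trie(tokens)
    found = {}
    stack = [((), root)]
    while stack:
        path, node = stack.pop()
        for tok, child in node["children"].items():
            # A path shared by two or more suffixes is a repeated pattern.
            if len(child["starts"]) >= 2:
                pattern = path + (tok,)
                if len(pattern) >= min_len:
                    found[pattern] = child["starts"]
                stack.append((pattern, child))
    return found


# Illustrative token stream (made up for this example): two list items.
tokens = ["<li>", "<a>", "</a>", "</li>", "<li>", "<a>", "</a>", "</li>"]
patterns = repeated_patterns(tokens)
```

Here `patterns` maps each repeated tag sequence to the positions where it starts in the token stream. A production implementation would more likely use a compressed suffix tree built in linear time.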
[0030] An example suffix tree with an input token stream and six token suffixes is shown in Figure 4. The suffix tree for the token stream is defined as the tuple (Σ, C, E, N, S, φ, <):
[0031] Σ is the input token alphabet.
[0032] C is the input token sequence. Each token c ∈ C satisfies c ∈ Σ.
[0033] E is the set of arcs in the suffix tree. Each arc e∈E in the suffix tree represents a token in ∑.
[0034] N is the set of internal nodes within the suffix tree.
[0035] S is the set of leaf nodes.
[0036] φ represents the dummy suffix tree root.
[0037] < is the ordering relation on nodes: if n_2 is a node in the sub-suffix tree rooted at node n_1, then n_1 < n_2.
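The components defined above can be sketched as a small data structure. This is an illustration under stated assumptions, not the patent's implementation: the class `Node` and the helpers `add_child` and `precedes` are hypothetical names, each arc carries a single token as in [0033], and the relation < is assumed to be strict (a node does not precede itself).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    parent: Optional["Node"] = None
    # Outgoing arcs (the set E), each labeled with one token from the alphabet.
    children: Dict[str, "Node"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        """Leaf nodes form the set S; nodes with children form the set N."""
        return not self.children


def add_child(parent: Node, token: str) -> Node:
    """Attach a new node below `parent` via an arc labeled `token`."""
    child = Node(parent=parent)
    parent.children[token] = child
    return child


def precedes(n1: Node, n2: Node) -> bool:
    """Return True if n1 < n2, i.e. n2 lies in the subtree rooted at n1.

    Assumption: the relation is strict, so precedes(n, n) is False.
    """
    node = n2.parent
    while node is not None:
        if node is n1:
            return True
        node = node.parent
    return False


# Illustrative tree (made up for this example): the root plays the role of φ.
root = Node()
li = add_child(root, "<li>")
a = add_child(li, "<a>")
p = add_child(root, "<p>")
```

With this tree, `precedes(root, a)` and `precedes(li, a)` hold, while `precedes(p, a)` does not, because `a` is not in the subtree rooted at `p`.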
[0038] If two nodes n_i and n_j satisfy the relationship n_i < n_j, the path n conn...