Method for determining the core block of a web page based on DOM (Document Object Model) node text density
The determination method uses DOM tree technology and is applied in the field of web page core block determination algorithms. It addresses problems such as easily lost density information, insufficient use of noise data, and difficult application integration, and it achieves good discrimination.
Embodiment Construction
[0054] The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
[0055] This embodiment uses an actual page from The New York Times as an example. The page contains many pictures, text passages, and links. The articles on the page are the core content of the web page.
[0056] First, the page is parsed into a DOM tree. A fragment of the page's code is taken as an example, as follows:
[0057]
[0058]
[0059] The ellipses in the code stand for other node information, which is omitted to simplify the representation. The code is parsed into a DOM tree as shown in Figure 1.
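The parsing step can be illustrated with a minimal sketch. It assumes Python's standard `html.parser` module; the `Node`, `TreeBuilder`, `parse_dom`, and `dump` names and the `SAMPLE` markup are hypothetical and are not taken from the patent or from the page used in this embodiment.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical DOM node: tag name, parent, children, and direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""  # text found directly inside this node


class TreeBuilder(HTMLParser):
    """Builds a simple tree of Node objects while the parser streams tokens."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def parse_dom(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def dump(node, depth=0):
    """Return the tree as indented lines, one tag per line."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


# Illustrative markup only, not the page from this embodiment.
SAMPLE = ("<html><body><div><p>Some article text.</p>"
          "<a href='#'>link</a></div><img src='x.png'></body></html>")
tree = parse_dom(SAMPLE)
print("\n".join(dump(tree)))
```

A production system would more likely rely on a full HTML parser that follows the browser error-recovery rules; the sketch only shows the tree shape that the later density computation walks over.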
[0060] Next, the DOM tree of the entire page is processed to obtain the text density of each node and the sum of the densities of its child nodes. The results are as follows:
[0061] : Chars=6094, Tags=541, LinkChars=3243, LinkTags=445, Density=4.18771, densitySum=4.18549
[0062] : Chars=6094, Tags=533, LinkChars=3243, LinkTags...
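The density formula itself is not reproduced in this excerpt. As an assumption, the sketch below uses the composite text density known from the DOM-based content extraction literature, which combines the Chars, Tags, LinkChars, and LinkTags counts of a node with the link ratio of the page body. The function names, the handling of zero counts, and the use of the [0061] figures as body totals are illustrative choices, not details confirmed by the source.

```python
import math


def text_density(chars, tags, link_chars, link_tags,
                 body_chars, body_link_chars):
    """Composite text density of one node (assumed formula, see lead-in).

    density = (C / T) * log_base(arg), where
      base = ln(C / (C - LC) * LC + (LCb / Cb) * C + e)
      arg  = (C / LC) * (T / LT)
    """
    if chars == 0:
        return 0.0
    # Assumption: zero counts are replaced by 1 to avoid division by zero.
    non_link_chars = (chars - link_chars) or 1
    tags = tags or 1
    link_chars = link_chars or 1
    link_tags = link_tags or 1
    body_chars = body_chars or 1

    base = math.log(chars / non_link_chars * link_chars
                    + body_link_chars / body_chars * chars
                    + math.e)
    arg = (chars / link_chars) * (tags / link_tags)
    return (chars / tags) * math.log(arg, base)


def density_sum(child_densities):
    """Sum of the text densities of a node's direct children."""
    return sum(child_densities)


# Counts quoted in [0061]; the body totals are assumed to equal this node's
# Chars and LinkChars, since [0061] and [0062] list the same values for both.
d = text_density(chars=6094, tags=541, link_chars=3243, link_tags=445,
                 body_chars=6094, body_link_chars=3243)
print(round(d, 3))
```

Under these assumptions the computed value is close to the Density quoted in [0061]. The densitySum value cannot be checked the same way, because the child node figures are not included in this excerpt.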