Webpage splitting method

A webpage segmentation technique, applied in the Internet field, that addresses the problems of destroying the webpage structure and making pages inconvenient for users to browse. It achieves high execution efficiency, avoids separating related content, and is easy to implement.

Active Publication Date: 2011-05-25
江西国云科技有限公司
4 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

Obviously, such a segmentation result is inconvenient for users to browse.
[0010] In summary, prior-art web page segmentation methods destroy the original web page structure to varying degrees, so there is an urgent need for a web page segmentation method that preserves the original web page structure well.



Examples


Embodiment 1

[0087] As in the previous embodiment, the web page segmentation method of this embodiment consists of segmentation based on multi-line blocks followed by merging based on topic blocks. The two steps are described in turn below.

[0088] 1. Segmentation based on multi-line blocks. As shown in Figure 4, the segmentation method based on multi-line blocks includes the following steps:

[0089] Step 1: Traverse the DOM tree from bottom to top and set each node's multi-line block attribute value.

[0090] Each web page can be represented by a DOM tree, and the DOM tree representation of the web page can be obtained through the interface provided by the browser. A bottom-up traversal of the DOM tree is performed, and if a leaf node is encountered during the traversal, its multi-line block attribute is set to zero. If a non-leaf node is encountered, its child nodes are traversed first, and the y coordinate value of the child node is recorded. After all t...
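
The bottom-up traversal in [0090] can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `Node` class and its `y` and `multiline` fields are hypothetical stand-ins for rendered DOM nodes, and the rule applied once all children have been visited is an assumption, because the source text is truncated before it states that rule. For that reason the rule is passed in as a replaceable function, with `distinct_rows` as one plausible guess.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a rendered DOM node."""
    tag: str
    y: float = 0.0                      # top y coordinate reported by the browser
    children: List["Node"] = field(default_factory=list)
    multiline: Optional[int] = None     # the "multi-line block" attribute


def distinct_rows(child_ys: List[float]) -> int:
    """ASSUMED rule (not from the source): 1 if the children span several rows."""
    return 1 if len(set(child_ys)) > 1 else 0


def mark_multiline(node: Node,
                   rule: Callable[[List[float]], int] = distinct_rows) -> None:
    """Post-order (bottom-up) traversal that sets node.multiline."""
    if not node.children:
        node.multiline = 0              # leaf nodes get zero, per [0090]
        return
    child_ys: List[float] = []
    for child in node.children:         # children are processed before the parent
        mark_multiline(child, rule)
        child_ys.append(child.y)        # record each child's y coordinate
    node.multiline = rule(child_ys)     # placeholder for the elided rule


# Usage: a list whose two items are laid out on different rows.
page = Node("ul", y=0, children=[Node("li", y=0), Node("li", y=20)])
mark_multiline(page)
```

Keeping the decision rule separate from the traversal means the post-order walk, which the source does describe, stays unchanged if the actual rule differs from the guess used here.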


Abstract

The invention provides a webpage splitting method comprising the following steps: step 1, acquiring the document object model (DOM) tree of the webpage to be split; step 2, traversing the DOM tree, taking each node that contains only one basic multi-line node as a basic block, and merging the scattered leaf nodes lying between any two such nodes into a basic block, wherein once the parent node of a basic multi-line node is taken as a basic block, the basic multi-line node itself is no longer used as a basic block; and step 3, finding the subject blocks among the basic blocks and combining each subject block with several non-subject blocks that are adjacent to it and follow it. The method better preserves the original webpage structure while splitting the webpage into blocks, which avoids scattering links that belong to the same subject or class and prevents subject or classification tags from being separated from their corresponding links. The method executes efficiently and is easy to implement.
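
Step 3 of the abstract, merging subject blocks with the non-subject blocks that follow them, can be illustrated with a short Python sketch. Several things here are assumptions rather than details from the source: the `Block` class is hypothetical, the `is_subject` flag is taken as given because the abstract does not say how subject blocks are detected, merging continues until the next subject block because no other stopping condition is stated, and non-subject blocks that appear before the first subject block are left as standalone groups.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical basic block: some text plus a precomputed subject flag."""
    text: str
    is_subject: bool


def merge_subject_blocks(blocks: List[Block]) -> List[List[Block]]:
    """Group each subject block with the non-subject blocks that follow it."""
    groups: List[List[Block]] = []
    for block in blocks:
        starts_group = (
            block.is_subject                  # a subject block always opens a group
            or not groups                     # nothing to attach to yet
            or not groups[-1][0].is_subject   # ASSUMPTION: no subject block precedes it
        )
        if starts_group:
            groups.append([block])
        else:
            groups[-1].append(block)          # attach to the preceding subject block
    return groups


# Usage: two headings, each followed by its links, after an unrelated logo.
blocks = [
    Block("logo", False),
    Block("News", True),
    Block("link a", False),
    Block("link b", False),
    Block("Sports", True),
    Block("link c", False),
]
groups = merge_subject_blocks(blocks)
```

In this example the result is three groups: the logo on its own, "News" with its two links, and "Sports" with its one link, so each heading stays together with the links it labels.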

Description

Technical field

[0001] The present invention relates to the technical field of the Internet and, in particular, to a web page segmentation method.

Background

[0002] Internet technology has greatly changed people's daily lives, and accessing the Internet is becoming ever more convenient. In some cases, however, access is subject to various restrictions. Because most web pages are designed for large monitors, a page often contains a great deal of content, and in some cases people do not need or want to browse the entire page.

[0003] For example, when a device with a small display (such as a mobile phone) is used to surf the Internet, the screen generally cannot display a complete web page, which is inconvenient for users to browse. Moreover, the Internet access bandwidth of mobile devices is low. If you want to download the entire web page to the mobile devi...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 邓铸辉, 陈启华, 王向东, 钱跃良, 林守勋
Owner: 江西国云科技有限公司