Webpage content processing method and device and storage medium

A webpage content processing technology, applied in the fields of website content management, word processing, image data processing, etc. It addresses the problem of inaccurate text extraction, which degrades the user's reading experience, and has the stated effects of improving the user experience, a wide application range, and simple implementation.

Active Publication Date: 2018-12-07
ZTE CORP
5 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

Prior-art methods can extract the required text content from a webpage, but because webpage layouts and label attributes change frequently, the extraction is often inaccurate, which degrades the user's reading experience.


Examples

Embodiment Construction

[0077] In order to understand the characteristics and technical contents of the embodiments of the present invention in more detail, the implementation of the embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. The attached drawings are only for reference and description, and are not intended to limit the present invention.

[0078] In the description of the embodiments of the present invention, it should be noted that, unless otherwise specified and limited, the terms "first/second" are used only to distinguish similar objects and do not represent a specific ordering. It is to be understood that "first/second" may be interchanged in a specific order or sequence where permitted, such that the embodiments of the invention described herein are capable of practice in sequ...

Abstract

The invention discloses a webpage content processing method comprising the following steps: determining, in the histogram statistical array of a webpage snapshot, the starting position and the ending position of the longest continuous image in the longitudinal axis direction; determining, based on the resolution of the webpage snapshot, a first starting position and a first ending position of the text in the webpage snapshot in the longitudinal axis direction, located between that starting position and ending position; and determining, based on the rendering tree of the webpage snapshot, a second starting position and a second ending position of the text in the webpage snapshot in the longitudinal axis direction, located between the first starting position and the first ending position. The invention also discloses a webpage content processing device and a storage medium.
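The first step of the abstract can be illustrated with a minimal sketch. The abstract does not say how the histogram statistical array is built, so the following assumes a grayscale snapshot given as a list of pixel rows, with one histogram entry per row counting non-background pixels. The names `row_histogram`, `longest_continuous_run`, `background`, and `threshold` are hypothetical and are not taken from the patent.

```python
from typing import List, Optional, Tuple


def row_histogram(pixels: List[List[int]], background: int = 255) -> List[int]:
    """Count non-background pixels in each row (assumed histogram definition)."""
    return [sum(1 for p in row if p != background) for row in pixels]


def longest_continuous_run(
    hist: List[int], threshold: int = 0
) -> Optional[Tuple[int, int]]:
    """Return (start, end) row indices, inclusive, of the longest run of
    consecutive rows whose count exceeds `threshold`; None if no row does.
    On ties, the earliest run is kept."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    # Append a sentinel row that never exceeds the threshold, so a run
    # reaching the last row is closed inside the loop.
    for y, count in enumerate(list(hist) + [threshold]):
        if count > threshold:
            if start is None:
                start = y  # a new run begins here
        elif start is not None:
            end = y - 1  # the run ended on the previous row
            if best is None or (end - start) > (best[1] - best[0]):
                best = (start, end)
            start = None
    return best


# Hypothetical 7-row snapshot: 255 is background, 0 is content.
snapshot = [
    [255, 255, 255],
    [255, 0, 255],
    [255, 255, 255],
    [0, 0, 255],
    [255, 0, 255],
    [0, 0, 0],
    [255, 255, 255],
]
hist = row_histogram(snapshot)
run = longest_continuous_run(hist)
print(hist, run)
```

Under these assumptions, the returned pair bounds the tallest unbroken block of content in the snapshot. The later steps in the abstract, which narrow the range using the snapshot resolution and the rendering tree, are not sketched here because the visible text does not describe them in enough detail.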

Description

Technical field

[0001] The invention relates to webpage content extraction technology for Internet browsers, and in particular to a webpage content processing method and device, and a storage medium.

Background technique

[0002] In the prior art, when news or novel text on a website spans multiple pages, the content of those pages can be extracted and spliced into a single webpage at the user's request, so as to avoid frequent page-turning operations. To extract the required text content from a webpage, the commonly used prior-art methods are either to locate the nearby text from the label position of the "next page" button in the webpage, or to manually traverse the webpages of various websites, record the label attributes that correspond to the text, and then use those attributes to find the label element of the body text. Although this method can realize the extraction of the required text content in the webpage, because...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/986, G06F40/106, G06F40/14, G06T2207/30176, G06T7/70, G06F16/951, G06F16/958
Inventor: 曹刚
Owner: ZTE CORP