Webpage main body information extraction method and device and storage medium

A webpage main body information extraction technology, applied in digital data information retrieval, website content management, network data retrieval, etc.

Active Publication Date: 2021-03-19
中科大数据研究院
14 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

[0003] However, current web pages contain a great deal of useless data in addition to the main information. For example, a news web page carries advertisements alongside the news itself. This useless data not only occupies a lot of Internet resources but also reduces the efficiency with which users obtain information.

Examples

Embodiment Construction

[0086] Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with aspects of the invention as recited in the appended claims.

[0087] The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein and in the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as use...

Abstract

The invention provides a method, a device, and a storage medium for extracting webpage main body information. The method comprises two steps. Webpage source code obtaining: receive webpage address information and, according to that address information, obtain the source code of a webpage on the Internet, where the source code comprises at least one node and each node comprises at least one label. Main body information extraction: traverse each node in the webpage source code, judge from the label in the node whether the node's information is main body information, and if so, extract the information in the node as the main body information. The webpage is first located on the Internet through the webpage address information, and its main body part is then recognized by processing the source code, so that a user browsing the webpage can view the main body information directly. On one hand, this reduces the network resources occupied by useless information outside the main body and raises the utilization rate of Internet resources; on the other hand, it improves the efficiency with which users obtain information.
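
The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes that "label" means an HTML tag, and the tag sets MAIN_BODY_TAGS and SKIPPED_TAGS, the class MainBodyExtractor, and the functions fetch_source and extract_main_body are all hypothetical names and rules chosen for illustration, since the abstract does not state how a label is judged.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical rule: text inside these tags counts as main body information.
MAIN_BODY_TAGS = {"h1", "p"}
# Hypothetical rule: anything nested inside these tags is never main body.
SKIPPED_TAGS = {"script", "style", "nav", "aside", "footer"}


def fetch_source(url):
    """Step 1: obtain the webpage source code from a webpage address."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class MainBodyExtractor(HTMLParser):
    """Step 2: visit every node and judge it by the tags that enclose it."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open tags from the root down to the current node
        self.parts = []  # extracted main body text fragments

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if any(tag in SKIPPED_TAGS for tag in self.stack):
            return
        if any(tag in MAIN_BODY_TAGS for tag in self.stack):
            self.parts.append(text)


def extract_main_body(html_source):
    """Return the text fragments judged to be main body information."""
    parser = MainBodyExtractor()
    parser.feed(html_source)
    parser.close()
    return parser.parts
```

Calling extract_main_body(fetch_source(address)) chains the two steps for a given webpage address. A fixed tag whitelist is the simplest possible judging rule; a practical system would likely need a richer criterion.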

Description

Technical field

[0001] The invention relates to the technical field of computer data mining, in particular to a method, device and storage medium for extracting webpage main body information.

Background technique

[0002] At the beginning of this century, people obtained information from the outside world mainly through media channels such as newspapers, radio stations, and television stations. With the advancement of science and technology, people now obtain information in various ways, for example by using electronic devices such as mobile phones or computers to browse web pages on the Internet.

[0003] However, current web pages contain a great deal of useless data in addition to the main information. For example, a news web page carries advertisements alongside the news itself. This useless data not only occupies a lot of Internet resources but also reduces the efficiency with which users obtain information. Therefore, how ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/958, G06F16/951, G06F16/9535
CPC: G06F16/951, G06F16/9535, G06F16/958
Inventor: 李玺, 冯凯, 王元卓
Owner: 中科大数据研究院