Webpage information extracting method, device and terminal

A web page information and text extraction technology, applied in the electronic field, that addresses problems such as degraded retrieval results and wasted user reading time, and improves extraction speed.

Active Publication Date: 2015-01-07

GUANGZHOU KINGSOFT NETWORK TECH

5 Cites, 18 Cited by


AI Technical Summary

Problems solved by technology

[0003] Web page information includes text content, advertisement information, email login information, and so on, and the text content is generally located in the middle of the web page display interface. In existing technical solutions, the crawler searches the entire web page every time it extracts useful information. In fact, the crawler only needs to extract the text content blocks in the web page display interface; also searching other information in the interface, such as advertisements, inevitably degrades the retrieval results and wastes the user's reading time.



Embodiment Construction

[0073] The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0074] Referring to Figure 1, Figure 1 is a flowchart of the first embodiment of the web page information extraction method proposed by the present invention. As shown in the figure, the information extraction method in this embodiment includes:

[0075] S101. Parse the web page information to obtain its tag tree, where the tag tree includes a plurality of nodes, and each node of the tag tre...
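Step S101 can be illustrated with Python's standard-library `html.parser.HTMLParser`. This is a minimal sketch under assumptions, not the patented implementation: the `Node` and `TagTreeBuilder` classes and the `build_tag_tree` function are hypothetical names, the patent does not say which parser it uses, and the handling of void elements and of `script`/`style` text is a choice made here. Each `Node` stands in for one content block, as the abstract describes.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are attached but never left open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these tags is not page content, so it is skipped (an assumption).
SKIPPED_TEXT_TAGS = {"script", "style"}


class Node:
    """One tag-tree node (hypothetical structure, not taken from the patent)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # ordered mix of child Nodes and text strings

    @property
    def children(self):
        return [item for item in self.items if isinstance(item, Node)]

    def text(self):
        """Text of this node and all of its descendants, in document order."""
        parts = [item if isinstance(item, str) else item.text() for item in self.items]
        return " ".join(part for part in parts if part)


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from HTML using an explicit stack of open nodes."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.stack[-1])
        self.stack[-1].items.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: attach the node, never open it.
        self.stack[-1].items.append(Node(tag, parent=self.stack[-1]))

    def handle_endtag(self, tag):
        # Close back to the nearest open node with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack[-1].tag in SKIPPED_TEXT_TAGS:
            return
        stripped = data.strip()
        if stripped:
            self.stack[-1].items.append(stripped)


def build_tag_tree(html):
    """Parse an HTML string and return the synthetic root of its tag tree."""
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

For example, `build_tag_tree(html).children` yields the top-level element nodes, and `node.text()` returns the text of the content block that a node covers, which is what a later traversal step would score.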


Abstract

The embodiment of the invention discloses a webpage information extracting method. The method comprises analyzing webpage information and obtaining the tag tree of the webpage information, wherein the tag tree comprises a plurality of nodes, and every node corresponds to one content block of the webpage information; obtaining a pre-established webpage information word library, wherein the webpage information word library comprises multiple types of word sets, and every word in the word sets corresponds to one weight; according to the pre-established webpage information word library, obtaining the text content blocks of the webpage information by traversing the tag tree of the webpage information; according to the text content blocks of the webpage information, extracting at least one content element of the webpage information. The embodiment of the invention also discloses a webpage information extracting device and terminal. The webpage information extracting method, device and terminal can increase the webpage information extracting speed.
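The abstract's core loop (a pre-established word library of typed, weighted word sets; a traversal of the tag tree; selection of text content blocks; extraction of a content element) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the word sets and their weights, the tokenization, the sum-of-weights scoring rule, the `threshold` parameter, and the single `body` content element are all hypothetical, since the abstract does not specify them. The `Node` dataclass is likewise a hypothetical stand-in for a parsed tag-tree node.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tag-tree node: a tag, its own text, and child nodes."""
    tag: str
    text: str = ""  # text directly inside this node
    children: list = field(default_factory=list)


# Hypothetical word library: several typed word sets, each word with a weight.
WORD_LIBRARY = {
    "body": {"research": 2.0, "report": 1.5, "according": 1.0},
    "noise": {"login": -3.0, "advertisement": -3.0, "email": -2.0, "copyright": -2.0},
}


def build_weight_table(library):
    """Flatten the typed word sets into one word -> weight lookup.

    If a word appears in several sets, the set listed last wins.
    """
    table = {}
    for word_set in library.values():
        table.update(word_set)
    return table


def score(text, weights):
    """Sum the weights of every library word occurring in the text."""
    return sum(weights.get(tok, 0.0) for tok in re.findall(r"[a-z]+", text.lower()))


def find_text_blocks(root, weights, threshold=1.0):
    """Depth-first traversal keeping nodes whose own text scores >= threshold."""
    blocks, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.text and score(node.text, weights) >= threshold:
            blocks.append(node)
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(node.children))
    return blocks


def extract_content(root, library, threshold=1.0):
    """Return the extracted content element(s); here only the body text."""
    weights = build_weight_table(library)
    blocks = find_text_blocks(root, weights, threshold)
    return {"body": " ".join(block.text for block in blocks)}
```

Scoring each node on its own text rather than its whole subtree is a design choice in this sketch: it stops a wrapper `div` from being selected merely because its children contain body words, while login and advertisement blocks fall below the threshold through their negative weights.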

Description

Technical Field

[0001] The present invention relates to the field of electronic technology, and in particular to a web page information extraction method, device and terminal.

Background

[0002] A search engine includes a crawler, an indexer, and a retriever. The crawler collects information on the Internet and writes it into a database; the indexer extracts index items from the information collected by the crawler to generate an index table for the document library; and the retriever uses the index table of the document library to find documents related to the query submitted by the user and displays those documents to the user. Therefore, whether the search engine can ultimately show the user satisfactory search results depends largely on the information extracted by the crawler, and that information is determined by the crawler's extraction method.

[0003] Web page information includes...
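To make the indexer/retriever division in [0002] concrete, here is a minimal inverted-index sketch. It is an assumption, not part of the patent: `build_index`, `retrieve`, and `tokenize` are hypothetical names, and the regex tokenizer and AND-only query semantics are chosen for brevity. It also shows why the crawler's extraction matters: any advertisement or login text the crawler stores becomes index terms that user queries can match.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexer: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Retriever: sorted ids of documents containing every query term (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)
```

For instance, indexing a few crawled pages with `build_index({1: "...", 2: "..."})` and calling `retrieve(index, "crawler pages")` returns only the ids whose stored text contains both terms.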


Application Information


IPC(8): G06F17/30

CPC: G06F16/95

Inventor: 邝锐强 (Kuang Ruiqiang)

Owner: GUANGZHOU KINGSOFT NETWORK TECH
