Method and device for extracting contents of index pages and search engine

An extraction method and extraction device, applied in the Internet field, that address the problem of reduced spider coverage of network resources.

Active Publication Date: 2015-12-09
BEIJING QIHOO TECH CO LTD
Cites: 4 | Cited by: 1

AI Technical Summary

Problems solved by technology

[0009] Because of the missing-link problem described above, the spider's coverage of network resources is reduced.

Embodiment Construction

[0098] Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangements of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

[0099] It should also be understood that, for convenience of description, the parts shown in the drawings are not drawn to scale.

[0100] The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application, or its uses.

[0101] Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the descript...

Abstract

The embodiments of the invention disclose a method and device for extracting the contents of index pages, and a search engine. The method comprises: extracting the contents of the current index page in an index page series of a specified website, and comparing the subject link set of the current index page with the historical subject link set extracted the previous time; and, in response to the subject link set of the current index page having no intersection with the historical subject link set, locating the next index page and extracting its contents. With the embodiments of the invention, the spider's inclusion rate and coverage of network resources can be increased without increasing traffic cost.
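The stopping rule in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical `fetch_page(n)` callable that returns the set of subject links on page `n` of the series (an empty set past the last page), a `history` set saved from the previous extraction, and a `max_pages` safety cap. None of these names come from the patent, and how subject links are identified on a page is left out.

```python
from typing import Callable

# Hypothetical fetcher: returns the set of subject-link URLs on index page n
# (1-based) of the specified website, or an empty set past the last page.
FetchPage = Callable[[int], set[str]]


def extract_index_series(fetch_page: FetchPage, history: set[str],
                         max_pages: int = 50) -> list[str]:
    """Return subject links not in `history`, walking the index-page series.

    Extraction moves on to the next index page only while the current page's
    subject-link set has no intersection with the historical set.
    """
    new_links: list[str] = []
    for page_no in range(1, max_pages + 1):
        links = fetch_page(page_no)
        if not links:
            break  # ran past the end of the series
        # Keep only links not extracted last time (sorted for determinism).
        new_links.extend(sorted(links - history))
        if links & history:
            break  # overlap: later pages were already covered last time
        # No intersection: every link here is new, so further new items may
        # have been pushed onto the next page; continue with that page.
    return new_links


if __name__ == "__main__":
    # Toy site: last run saw a3 and a2 on page 1; four items were added since.
    pages = {1: {"a7", "a6"}, 2: {"a5", "a4"}, 3: {"a3", "a2"}, 4: {"a1"}}
    print(extract_index_series(lambda n: pages.get(n, set()), {"a3", "a2"}))
    # ['a6', 'a7', 'a4', 'a5']
```

In the toy run, pages 1 and 2 share nothing with the history, so the walk continues; page 3 overlaps it, so the walk stops without fetching page 4.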

Description

Technical Field

[0001] The invention relates to Internet technology, and in particular to a method and device for extracting index page content, and to a search engine.

Background

[0002] HTML (HyperText Markup Language) documents, as Internet resources, are connected by hyperlinks, much like threads woven into a web. Search engines use spiders (web crawlers, also known as web spiders) to discover web resources. The spider sits at the most upstream point of the search engine's data flow and is responsible for collecting resources from Internet sites into local databases for subsequent retrieval, which makes it one of the most important data sources for a search engine. The goal of a spider is to discover and crawl all valuable web pages on the Internet.

[0003] At present, most Internet sites organize their resources as index pages with pagination. When new resources are added, older resources are shifted backward or forward through the series of paginated pages. For spide...
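Why re-crawling only the first index page loses links can be shown with a toy model in Python. It assumes a fixed page size and that new items are prepended newest first, so older items shift toward later pages; the function names are hypothetical, and the model ignores every other aspect of a real site.

```python
def paginate(items: list[str], page_size: int) -> list[list[str]]:
    """Split a newest-first item list into fixed-size index pages."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def split_new_items(existing: list[str], added: list[str],
                    page_size: int) -> tuple[list[str], list[str]]:
    """Return (new items on page 1, new items pushed onto later pages).

    Both input lists are newest first. Adding resources prepends them,
    so older entries shift backward through the pagination series.
    """
    pages = paginate(added + existing, page_size)
    if not pages:
        return [], []
    new = set(added)
    on_first_page = [x for x in pages[0] if x in new]
    on_later_pages = [x for page in pages[1:] for x in page if x in new]
    return on_first_page, on_later_pages


if __name__ == "__main__":
    print(split_new_items(["a3", "a2", "a1"], ["a7", "a6", "a5", "a4"], 2))
    # (['a7', 'a6'], ['a5', 'a4'])
```

With three existing items, four new ones and two items per page, two of the new items land on page 2, which a spider that revisits only page 1 would never see.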

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/9566
Inventor: 郑燕琴 (Zheng Yanqin)
Owner: BEIJING QIHOO TECH CO LTD