Method for extracting network data and web crawler system

A webpage data extraction method, applied in the field of data analysis, that addresses the inability of existing crawlers to capture certain webpages and reduces research and development costs.

Status: Inactive
Publication Date: 2007-12-19
Inventor: 李沫南
0 Cites, 90 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a general data extraction method for search engine systems or other systems that need to extract data from Web pages, so as to solve the problem that existing Web crawler systems cannot capture webpages whose content is generated by scripts, as typified by AJAX.
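The problem described in [0006] can be illustrated with a minimal sketch. It assumes a hypothetical page (`RAW_HTML`) whose visible text is written by a script at load time, and a hypothetical helper `visible_text` built on Python's standard `html.parser`. A crawler that only parses the markup returned by the server, without executing scripts, finds no text to index.

```python
from html.parser import HTMLParser

# Hypothetical page: the visible text is produced by a script at load time.
RAW_HTML = """
<html><body>
<div id="content"></div>
<script>
document.getElementById("content").textContent = "Loaded by script";
</script>
</body></html>
"""


class TextExtractor(HTMLParser):
    """Collects non-empty text nodes, skipping anything inside <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script source is not page text, so it is ignored here.
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a script-unaware crawler would see in the markup."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(visible_text(RAW_HTML))
```

The string "Loaded by script" exists only inside the script source, so a parser that does not run scripts never sees it as page content.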



Examples


Embodiment Construction

[0031] The present invention is described below with reference to the drawings and embodiments. In the following, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, the invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form to facilitate an understanding of the present invention.

[0032] In this application, the term "component" is intended to refer to a computer-related entity: hardware, a combination of hardware and software, software, or software in execution. For example, a component includes, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. For example, both a program running on a server and the server can be computer components. One or more components may reside within a process and/or threa...


Abstract

A web crawler system for extracting webpage data comprises a first component and a second component. The first component provides data extraction tasks to the second component and receives the execution results of those tasks from it. The second component communicates with the webpage server to obtain webpage data, operates on the DOM model to extract data, describes the extracted data, and sends the extracted data together with its description to the first component.
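The division of work between the two components can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the names `Task`, `Result`, `Coordinator`, `Extractor`, and `FAKE_SERVER` are hypothetical, the webpage server is replaced by an in-memory dictionary, the components talk through a direct method call (the abstract does not say how they communicate), and the DOM is built with Python's `xml.dom.minidom`, which does not execute scripts.

```python
from dataclasses import dataclass
from xml.dom import minidom

# Hypothetical stand-in for a webpage server: URL -> well-formed XHTML.
FAKE_SERVER = {
    "http://example.test/item": (
        "<html><body><h1>Widget</h1><p class='price'>9.99</p></body></html>"
    ),
}


@dataclass
class Task:
    url: str
    tag: str  # name of the elements whose text should be extracted


@dataclass
class Result:
    url: str
    description: str  # what the extracted values represent
    values: list


class Extractor:
    """Second component: obtains page data, walks the DOM, describes the data."""

    def run(self, task):
        dom = minidom.parseString(FAKE_SERVER[task.url])
        values = [
            "".join(n.data for n in el.childNodes if n.nodeType == n.TEXT_NODE)
            for el in dom.getElementsByTagName(task.tag)
        ]
        return Result(task.url, f"text of <{task.tag}> elements", values)


class Coordinator:
    """First component: issues extraction tasks and receives their results."""

    def __init__(self, extractor):
        self.extractor = extractor
        self.results = []

    def submit(self, task):
        result = self.extractor.run(task)
        self.results.append(result)
        return result


coordinator = Coordinator(Extractor())
result = coordinator.submit(Task("http://example.test/item", "h1"))
print(result.description, result.values)
```

Keeping the extraction logic behind a single `run` call means the first component never touches page markup directly; it only sees extracted values and their description, which matches the split described in the abstract.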

Description

Technical field

[0001] The present invention generally relates to data analysis, and more specifically relates to a method and a system for extracting data from webpages using a Web crawler.

Background

[0002] With the development of computer and Internet technologies, search engines have become an important way for users of Web clients (e.g., computers) to obtain information. Generally, users provide search engines with keywords they are interested in, and the search engines generate pages based on those keywords to help users discover and visit new Uniform Resource Locators (URLs). To achieve this goal, the search engine uses an indexer to query a pre-established index data structure and generate the keyword result page provided to the user, and uses Web crawlers (also called "spiders" or "robots") to regularly visit webpages and extract the textual information and other related webpage attributes from the webpage resource l...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 李沫南
Owner: 李沫南