Collection method for vertical data of web spider

A data collection and web spider technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It provides more accurate information without depleting local resources and has a good application prospect.

Active Publication Date: 2013-02-13
KUNSHAN DINGSHENG DATA SERVICES CO LTD +1

AI Technical Summary

Problems solved by technology

[0002] With the rapid expansion of network information, the amount of information on the Internet keeps increasing. Search engines such as Google and Baidu are free and open to all users, and they all strive to deliver the best possible search results. However, these search engines are not designed to search for information in a specific field, while people often need a search engine to find field-specific information within a large volume of data. For example, a search for "Suzhou Tourism" on Baidu returns some travel information, but much of it has expired, and some results date from a year ago or even earlier. For highly time-sensitive information such as travel information, general-purpose search engines such as Baidu clearly cannot meet user needs.
[0003] Web search was developed because there are too many web pages on the Internet for users to find the ones they want. Two problems motivate vertical search. First, finding information in a specific field through open web search takes a lot of time. For example, if a job seeker enters "java development" in Google, the results are all about Java development skills; to find "java development" job postings, the user has to keep paging through the results. Second, the number of domain-specific websites grows day by day. Taking recruitment as an example, there are now hundreds of domestic recruitment websites: besides portals such as 51job, Chinahr, and Zhaopin, each region has its own recruitment portals. Users who want comprehensive recruitment information must therefore open one website after another, which is time-consuming and labor-intensive.

Examples

Embodiment Construction

[0030] The present invention will be further described below in conjunction with the accompanying drawings.

[0031] Figure 1 shows the system block diagram of the network vertical crawling system. The system takes its input from the Internet (initially the user-specified collection of starting seed URLs in the URL class library, which can contain one or more URLs), parses the server address indicated by a URL in the URL class library, establishes a connection, sends a request, and receives data. It stores the obtained webpage data in the original webpage database, extracts the link information from it and puts that into the webpage structure database, and puts the URLs still to be captured into the URL class library. This keeps the whole process recursive until the URL library is empty. Because the web spider vertical search system provides retrieval services, it must save the original text of each webpage, and the collected webpage should be sto...
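
The crawl loop described in [0031] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the names `crawl`, `LinkExtractor`, `raw_pages`, and `link_structure` are hypothetical, the two databases are stood in for by plain dictionaries, the `fetch` callable is supplied by the caller so the example needs no network access, and the `seen` set that prevents re-queuing a URL is an added assumption that the text does not mention.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch):
    """Run the crawl loop until the URL library is empty.

    fetch is a hypothetical callable: URL -> HTML text, or None on failure.
    """
    url_library = deque(dict.fromkeys(seed_urls))  # URLs waiting to be fetched
    seen = set(url_library)   # assumption: never queue the same URL twice
    raw_pages = {}            # stand-in for the original webpage database
    link_structure = {}       # stand-in for the webpage structure database

    while url_library:
        url = url_library.popleft()
        html = fetch(url)
        if html is None:
            continue
        raw_pages[url] = html  # keep the original text of the page

        parser = LinkExtractor()
        parser.feed(html)
        outgoing = [urljoin(url, href) for href in parser.links]
        link_structure[url] = outgoing

        for link in outgoing:
            if link not in seen:
                seen.add(link)
                url_library.append(link)

    return raw_pages, link_structure


# Usage with an in-memory "site" instead of real network access.
SITE = {
    "http://example.test/": '<a href="/jobs">Jobs</a> <a href="/about">About</a>',
    "http://example.test/jobs": '<a href="/">Home</a>',
    "http://example.test/about": "<p>No links here.</p>",
}
pages, structure = crawl(["http://example.test/"], SITE.get)
```

Passing the fetcher in as a parameter keeps the queue logic separate from the connection handling, so the loop can be exercised against canned pages before it is pointed at real servers.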

Abstract

The invention discloses a collection method for vertical data of a web spider. The collection method comprises the following steps: 1) a client establishes a URL (uniform resource locator) class library and a Page class library; 2) a client process connects to a server process; 3) the client generates a request message body and sends it to the server; 4) the client acquires the webpage header information and webpage body information; and 5) the client analyzes the webpage header information and saves the webpage body information that meets the requirements, completing the collection of the webpage data. The disclosed method can provide more accurate information to users and better meet their search requirements; its algorithm is accurate and stable and does not deplete local resources, so the method has a good application prospect.
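
Steps 3 to 5 of the abstract can be illustrated with a small sketch. It rests on several assumptions that the abstract does not state: the request is taken to be an HTTP/1.1 GET, the response is handled as raw bytes already read from the connection (the socket handling of step 2 is left out), and the acceptance rule in `is_satisfactory` (status 200 and an HTML content type) is purely hypothetical. All function names are illustrative.

```python
from urllib.parse import urlsplit


def build_request(url):
    """Step 3: build the request message for a URL (HTTP/1.1 GET assumed)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Step 4: separate header information from body information."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def is_satisfactory(status, headers):
    """Step 5: hypothetical acceptance rule, not taken from the patent."""
    return status == 200 and headers.get("content-type", "").startswith("text/html")


def collect(url, raw_response, page_library):
    """Analyze the header and save the body only if it passes the check."""
    status, headers, body = split_response(raw_response)
    if is_satisfactory(status, headers):
        page_library[url] = body
        return True
    return False


# Usage with a canned response instead of a live server.
page_library = {}
canned = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html>ok</html>"
)
saved = collect("http://example.test/jobs?q=java", canned, page_library)
```

Checking the header before storing the body means pages that fail the rule are never written to the page library, which is one plausible reading of "saves the satisfactory webpage body information".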

Description

Technical field

[0001] The invention relates to the technical field of information analysis and capture, in particular to a vertical data collection method for web spiders.

Background technique

[0002] With the rapid expansion of network information, the amount of information on the Internet keeps increasing. Search engines such as Google and Baidu are free and open to all users, and they all strive to deliver the best possible search results. However, these search engines are not designed to search for information in a specific field, while people often need a search engine to find field-specific information within a large volume of data. For example, a search for "Suzhou Tourism" on Baidu returns some travel information, but much of it has expired, and some results date from a year ago or even earlier. For very time-sensitive informat...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/08, G06F17/30
Inventor: 丁国平
Owner: KUNSHAN DINGSHENG DATA SERVICES CO LTD