Data processing system, method and device for acquiring website resources

A data processing system and data processing device, applied in the field of data processing, that address problems such as poor resource timeliness and long data update cycles, providing a better search experience and improving the rendering rate.

Active Publication Date: 2015-01-14
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0006] To address the defects of the prior art, embodiments of the present invention provide a data processing system, method and device for obtaining website resources, which can overcome defects of the prior art such as long data update cycles and poor resource timeliness.


Examples

Example 1

[0050] Figure 1 is a block diagram of a data processing system for acquiring website resources according to an embodiment of the present invention. Referring to Figure 1, the data processing system 1 includes a data screening device 10, a webpage parsing server 20 and a database 30, each of which is described below.

[0051] The data screening device 10 is used to receive the webpage data captured by the web crawler, screen the received webpage data during the receiving process, and send the screened webpage data related to the specified website to the webpage parsing server 20.

[0052] Optionally, in an implementation of this embodiment, the data screening device 10 may communicate directly with the web crawler and continuously receive webpage data, or may communicate with a database that stores the webpage data captured by the web crawler and continuously receive the webpage data, or may communicate with the data forwarding device for forwardin...

Example 2

[0065] The data processing system 1 shown in Figure 1 is suitable for obtaining resources of various types of websites (e.g. news websites, video websites, education and scientific research websites, military websites, etc.). For video websites, displaying video resources in the form of pictures can improve user experience, so the present invention further provides a preferred data processing system for obtaining video website resources. As shown in Figure 2A, the data processing system 2 includes a picture processing subsystem 40 in addition to the data screening device 10, the webpage parsing server 20 and the database 30. These are described separately below. Although the data screening device 10, the webpage parsing server 20 and the database 30 are not described again in detail, all three can have all the features of the embodiment illustrated in Figure 1, which are not repeated here.

[0066] In this embodiment, the data s...

Example 3

[0078] The data processing system according to the embodiment of the present invention has been described above with reference to the drawings, and the data processing method according to the embodiment of the present invention will be described below with reference to the drawings.

[0079] Figure 3 is a schematic flowchart of a data processing method for obtaining website resources according to an embodiment of the present invention. Referring to Figure 3, the method includes:

[0080] 300: Receive webpage data captured by the web crawler, and screen the received webpage data during the receiving process to obtain webpage data related to the specified website.

[0081] Optionally, in an implementation manner of this embodiment, during the process of receiving the webpage data, the received webpage data is screened according to the URL regular expression of the specified website.

[0082] 302: Perform parsing and processing on the webpage data related to the spe...
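The optional URL-based screening of step 300 can be illustrated with a minimal sketch. It assumes pages arrive as a stream of (URL, HTML) pairs; the `Page` type, the `screen` function and the `video.example.com` pattern are hypothetical names chosen for illustration and do not come from the patent text.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Page(NamedTuple):
    """One crawled page (hypothetical shape, not from the patent)."""
    url: str
    html: str


# Assumed URL regular expression for one specified website.
SPECIFIED_SITE = re.compile(r"^https?://(?:www\.)?video\.example\.com/watch/\d+$")


def screen(pages: Iterable[Page], pattern: re.Pattern = SPECIFIED_SITE) -> Iterator[Page]:
    """Yield matching pages one at a time, as they are received.

    Because this is a generator, each page is screened while the stream
    is still being consumed, rather than after a full batch is stored.
    """
    for page in pages:
        if pattern.match(page.url):
            yield page


incoming = [
    Page("https://video.example.com/watch/42", "<html>...</html>"),
    Page("https://news.example.org/story/7", "<html>...</html>"),
    Page("http://www.video.example.com/watch/1001", "<html>...</html>"),
]
kept = [page.url for page in screen(incoming)]
```

In this sketch only the two `video.example.com` pages pass the screen; a real deployment would presumably keep one pattern per specified website.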

Abstract

The invention discloses a data processing system, method and device for acquiring website resources. The system comprises a data screening device, a webpage parsing server and a database. The data screening device receives webpage data captured by web crawlers, screens the received data during receiving, and transmits the screened webpage data related to specified websites to the webpage parsing server. The webpage parsing server parses the webpage data related to the specified websites according to preset parsing strategies to obtain first structured data and saves the first structured data to the database. The database performs data fusion on the first structured data received within a preset time period to obtain second structured data describing the resources of the specified websites. The system can shorten the update cycle of website resources, improve their timeliness, increase the image rate of video resources of video websites, and improve user experience.
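The fusion step can be pictured with a small sketch. The abstract does not specify the fusion rule, so everything here is an assumption: records are dictionaries carrying a `resource_id` and a `received_at` timestamp (both hypothetical field names), and within the preset time window later non-empty values overwrite earlier ones.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def fuse(records: Iterable[Record], window_start: float, window_end: float) -> List[Record]:
    """Merge records received in [window_start, window_end) per resource.

    Assumed rule (not from the patent): for each resource, apply records
    in arrival order and let later non-empty fields overwrite earlier ones.
    """
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        if window_start <= rec["received_at"] < window_end:
            grouped[rec["resource_id"]].append(rec)

    fused: List[Record] = []
    for resource_id, group in grouped.items():
        merged: Record = {"resource_id": resource_id}
        for rec in sorted(group, key=lambda r: r["received_at"]):
            for field, value in rec.items():
                if field in ("resource_id", "received_at"):
                    continue
                if value not in (None, ""):
                    merged[field] = value
        fused.append(merged)
    return fused


first_structured = [
    {"resource_id": "v42", "received_at": 10.0, "title": "Episode 1", "thumbnail": ""},
    {"resource_id": "v42", "received_at": 20.0, "title": "Episode 1 (HD)", "thumbnail": "t.jpg"},
    {"resource_id": "v7", "received_at": 95.0, "title": "Outside the window"},
]
second_structured = fuse(first_structured, window_start=0.0, window_end=60.0)
```

Here the two `v42` records collapse into one entry with the later title and the non-empty thumbnail, while `v7` is left for a later window. Other fusion rules (for example, per-field priority by source) would fit the same interface.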

Description

Technical field

[0001] The present invention relates to the field of data processing, and more specifically, to a data processing system, method and device for obtaining website resources.

Background

[0002] Search engines provide users with search services based on website resources included in databases (website resources are usually described in structured data). The search results of a search engine are directly related to the website resources included in the database. Therefore, to improve the user experience, website resources need to be updated in a timely manner.

[0003] In the prior art, website resources are usually updated in the following manner: first, wait for a web crawler (spider) to grab a large number of webpages, store the grabbed webpages in a first database and build an index; then screen the full set of webpages and perform structured data analysis (this operation is usually triggered manually), and the analysis results are stored ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30; H04L29/06
CPC: G06F16/951
Inventor: 鲁晓莹, 李进, 刘世戟, 刘鸿宇
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD