
Information crawling method and device

A configuration-file and component technique, applied in the Internet field, that addresses complex code maintenance, low result accuracy, and slow crawling speed. It improves the efficiency of information crawling, reduces coding work, and lowers the technical threshold.

Status: Inactive
Publication Date: 2018-02-23
JINGDONG TECH HLDG CO LTD
5 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0003] A search engine platform can collect open information from across the entire Internet. The collected data is broad in scope, frequently updated, and large in volume. However, most of the collected information is fuzzy, the results are not very accurate, and the data must be cleaned. When accurate data is required, crawlers are therefore usually used to crawl the information precisely.
[0004] Accurate crawling with crawlers usually requires custom coding for each type of target page. This approach offers flexible data collection, accurate results, and a controllable crawling direction, but it requires a large workload, the code is complex and difficult to maintain, and the crawling speed is limited by the performance of a single machine.
In addition, because existing methods usually load the page and download all of its content before crawling, they consume a lot of resources when multiple web pages need to be crawled, resulting in a slow crawling speed.




Embodiment Construction

[0031] Example embodiments will now be described more fully with reference to the accompanying drawings. The example embodiments can, however, be implemented in various forms and should not be construed as limited to the examples set forth herein. Rather, these embodiments are provided so that the present disclosure is thorough and complete and fully conveys the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics can be combined in one or more embodiments in any suitable way. In the following description, many specific details are provided to give a sufficient understanding of the embodiments of the present disclosure. However, those skilled in the art will realize that the technical solutions of the present disclosure can be practiced without one or more of the specific details, or that other methods, components, devices, steps, etc. can be used. In other cases, the known technical solution...



Abstract

The disclosure provides an information crawling method and device. The method comprises: acquiring a configuration file generated according to the webpage structure and the information crawling needs of a target website; and executing an information crawling task for the website according to the configuration file. The information crawling method provided herein improves information crawling efficiency.
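The two steps named in the abstract (acquire a configuration file, then run the crawl according to it) can be illustrated with a minimal sketch. The configuration schema, the field names, and the regex-based extraction rules below are assumptions made for illustration only; the source does not specify the file format or the extraction mechanism. The page content is an inline sample so the sketch runs without network access.

```python
import json
import re

# Hypothetical configuration file content. The "site" and "fields" keys and
# the use of regular expressions as extraction rules are illustrative
# assumptions, not the schema described in the patent.
CONFIG_JSON = """
{
  "site": "example-shop",
  "fields": {
    "title": "<h1[^>]*>(.*?)</h1>",
    "price": "<span class='price'>(.*?)</span>"
  }
}
"""


def load_config(text):
    """Step 1: acquire the configuration describing what to extract."""
    return json.loads(text)


def extract_fields(html, config):
    """Step 2: run the extraction rules from the configuration on a page."""
    result = {}
    for name, pattern in config["fields"].items():
        match = re.search(pattern, html, re.DOTALL)
        # A field whose rule does not match is recorded as None.
        result[name] = match.group(1).strip() if match else None
    return result


# Inline sample page standing in for a downloaded target page.
SAMPLE_HTML = (
    "<html><body><h1>Widget</h1>"
    "<span class='price'>9.99</span></body></html>"
)

config = load_config(CONFIG_JSON)
record = extract_fields(SAMPLE_HTML, config)
print(record)
```

In this sketch, supporting a new page type means writing a new configuration rather than new code, which is the kind of saving in coding work the summary describes.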

Description

Technical field

[0001] The present disclosure relates to the field of Internet technology, and in particular, to an information crawling method and device for crawling information by configuring a crawling template.

Background technique

[0002] With the advent of the big data era, the importance of data has become increasingly prominent, and the collection of large amounts of data has become more and more important. At present, the methods of data collection are mainly divided into using internal data for collection and using the Internet for collection. A common technique is to encode the data that needs to be collected to capture the specified data. When using the Internet for data collection, it can be divided into using search engines for collection and using crawlers for crawling.

[0003] The search engine platform can collect open information on the Internet across the entire network, with a wide range of collected data, fast update frequency, and large amount of collected...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F9/445, G06F9/48, G06F9/50
CPC: G06F16/9566, G06F9/44505, G06F9/4881, G06F9/5027, G06F16/951
Inventor: 苑海江, 党启贺
Owner: JINGDONG TECH HLDG CO LTD