An Object Storage Based Crawler Network Path Tracing Method

A technology involving object storage and network paths, applied to path tracing research in software engineering. It addresses the problem of heavy disk I/O load and improves I/O efficiency, decouples the systems involved, and ensures retrieval efficiency.

Active Publication Date: 2020-06-09

广州探迹科技有限公司

Cites: 4; Cited by: 0


AI Technical Summary

Problems solved by technology

The existing approach causes a heavy disk I/O load because it must read from and write to the disk frequently. It also still leaves the same data jointly maintained by two systems, so it cannot fundamentally avoid read-write conflicts.

Method used


Embodiment

[0022] As shown in Figure 1, in this embodiment, a crawler network path tracing method based on object storage comprises the following steps:

[0023] 1. Deploy the object storage system and the log processor.

[0024] The object storage system is a file storage system built on the HBase distributed storage system and can store petabyte-scale (PB-level) files. Files on the object storage system are deleted (DELETE), created (POST), and rewritten (PUT) by calling its HTTP interface with the corresponding parameters. Note that the object storage system of the present invention provides only file deletion, creation, and rewriting; it does not support incremental writes to a file.
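To make the write constraint concrete, here is a minimal in-memory sketch of the create/rewrite/delete contract described in [0024]. The class and method names are hypothetical, the real system is reached over HTTP rather than through Python calls, and the rule that POST refuses an existing key while PUT refuses a missing one is an assumption about how the two verbs are separated. A `get` method is included because the abstract says result files are read directly from the store, even though [0024] lists only the three write operations.

```python
class ObjectStoreError(Exception):
    """Raised when a request violates the store's create/rewrite/delete rules."""


class InMemoryObjectStore:
    """Hypothetical stand-in: whole-object create, rewrite, delete, and read only."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def post(self, key: str, data: bytes) -> None:
        # Create (POST): assumed to fail if the object already exists.
        if key in self._objects:
            raise ObjectStoreError(f"{key!r} already exists; rewrite it with put()")
        self._objects[key] = bytes(data)

    def put(self, key: str, data: bytes) -> None:
        # Rewrite (PUT): replaces the entire object; there is no partial update.
        if key not in self._objects:
            raise ObjectStoreError(f"{key!r} does not exist; create it with post()")
        self._objects[key] = bytes(data)

    def delete(self, key: str) -> None:
        # Delete (DELETE): assumed to fail if the object is missing.
        if key not in self._objects:
            raise ObjectStoreError(f"{key!r} does not exist")
        del self._objects[key]

    def get(self, key: str) -> bytes:
        # Read: implied by the abstract, which fetches result files directly.
        if key not in self._objects:
            raise ObjectStoreError(f"{key!r} does not exist")
        return self._objects[key]


def append_by_rewrite(store: InMemoryObjectStore, key: str, extra: bytes) -> None:
    """No incremental writes: appending means reading the whole object and PUTting it back."""
    store.put(key, store.get(key) + extra)
```

The `append_by_rewrite` helper shows the cost of the restriction: every append transfers the whole object again, which is why frequently growing data is a poor fit for this kind of store.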

[0025] The log processor is a singleton that processes the received results and separates them by crawler. The results of each crawler are written to that crawler's own result path log file, which makes it easy for downstream systems to read and index them.
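A minimal Python sketch of the singleton log processor in [0025], assuming the result path log is a local file per crawler with one tab-separated line `source_url<TAB>object_key` per result. The file naming scheme, the line format, and the `record` method are hypothetical; the source does not specify them. Local append-only files are an assumption here, chosen because the object storage system itself does not support incremental writes.

```python
import threading
from pathlib import Path


class LogProcessor:
    """Hypothetical singleton that routes each crawl result to its crawler's result path log."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls, log_dir):
        # Singleton: the first call creates the instance; later calls return it unchanged.
        with cls._instance_lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance.log_dir = Path(log_dir)
                instance.log_dir.mkdir(parents=True, exist_ok=True)
                instance._write_lock = threading.Lock()
                cls._instance = instance
        return cls._instance

    def log_path(self, crawler_name: str) -> Path:
        # One result path log file per crawler (naming scheme is an assumption).
        return self.log_dir / f"{crawler_name}.result_path.log"

    def record(self, crawler_name: str, source_url: str, object_key: str) -> None:
        # Append one index line: source URL -> key of the result file in object storage.
        line = f"{source_url}\t{object_key}\n"
        with self._write_lock:
            with self.log_path(crawler_name).open("a", encoding="utf-8") as fh:
                fh.write(line)
```

Because `__new__` returns the existing instance, later calls ignore their `log_dir` argument; the write lock keeps concurrent result handlers in the same process from interleaving lines.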

...


Abstract

The invention discloses a crawler network path tracing method based on object storage. The method comprises establishing an object storage system and a log recorder. The log recorder generates a result path log, which records an index from the source URL of each crawling result to the corresponding crawler result file on the object storage system. When an external system needs the crawled data, it obtains the crawler result file directly from the object storage system through this index. Introducing the object storage system increases file read and write speed, and establishing the result path log lets an external system retrieve data by looking it up in the log instead of searching the database, which avoids the possibility of read-write conflicts.
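The lookup side described in the abstract can be sketched as follows, assuming a hypothetical tab-separated log format (`source_url<TAB>object_key`). `build_index` and `fetch_result` are illustrative names, and `fetch_object` stands in for whatever read call the real object storage exposes (for example an HTTP GET), which the source does not name. The point of the sketch is that the external system touches only the log and the object store, never the crawler's database.

```python
from typing import Callable, Dict, Iterable


def build_index(log_lines: Iterable[str]) -> Dict[str, str]:
    """Parse result path log lines (source_url<TAB>object_key) into a lookup table."""
    index: Dict[str, str] = {}
    for raw in log_lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        source_url, object_key = line.split("\t", 1)
        # Assumed: if a URL was crawled again, the later line points at the newer file.
        index[source_url] = object_key
    return index


def fetch_result(index: Dict[str, str], source_url: str,
                 fetch_object: Callable[[str], bytes]) -> bytes:
    """Resolve a source URL through the log index, then read the file from object storage."""
    object_key = index.get(source_url)
    if object_key is None:
        raise KeyError(f"no result recorded for {source_url}")
    return fetch_object(object_key)
```

For a quick local check, a plain dict can play the object store: `fetch_result(index, url, store.__getitem__)`.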

Description

Technical Field

[0001] The invention belongs to the field of path tracing research in software engineering, and in particular relates to a crawler network path tracing method based on object storage.

Background Art

[0002] A web crawler is a program or script that automatically captures information from the World Wide Web according to certain rules. In current practice, most crawler network path tracing takes the crawler task as its basic unit. For example, the default behavior of the open-source crawler framework pyspider is to store results in a database. If an external system needs to retrieve that data, there is no convenient retrieval method: it can only scan the database, and it must also modify the status of the result data in the database so that already-processed results are excluded the next time. As a result, the data in the database must be maintained jointly by two systems, causing great uncer...
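For contrast, the prior-art pattern in [0002] can be illustrated with an in-memory SQLite table. This is a generic sketch, not pyspider's actual schema or API: the `result` table, its `processed` column, and both functions are hypothetical. It shows how the crawler and the external consumer both write to the same rows, so the status flag ends up maintained by two systems.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the crawler's result database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE result ("
        " url TEXT PRIMARY KEY,"
        " body TEXT,"
        " processed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def crawler_save(conn: sqlite3.Connection, url: str, body: str) -> None:
    # System 1 (crawler): storing a result resets its status flag to "unprocessed".
    conn.execute(
        "INSERT OR REPLACE INTO result (url, body, processed) VALUES (?, ?, 0)",
        (url, body),
    )
    conn.commit()


def consumer_poll(conn: sqlite3.Connection) -> list:
    # System 2 (external consumer): with no index, it scans for unprocessed rows...
    rows = conn.execute(
        "SELECT url, body FROM result WHERE processed = 0 ORDER BY url"
    ).fetchall()
    # ...then writes the status flag back so these rows are skipped next time.
    # With concurrent writers, a row rewritten by the crawler between the SELECT
    # and this UPDATE would be marked processed without its new body being read.
    conn.executemany(
        "UPDATE result SET processed = 1 WHERE url = ?", [(url,) for url, _ in rows]
    )
    conn.commit()
    return rows
```

Both functions update the `processed` column, which is the shared-maintenance coupling the invention removes by moving retrieval onto the result path log.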


Application Information


Patent Type & Authority: Patents (China)

IPC (8): G06F16/18; G06F16/953; G06F16/182

CPC: G06F16/1815; G06F16/182; G06F16/951

Inventors: 陈开冉 (Chen Kairan), 邓楚健 (Deng Chujian)

Owner: 广州探迹科技有限公司
