Semi-automatic vertical crawler generation tool and method

A semi-automatic generation technology, applied in the field of search engines, that addresses the low efficiency of manual template configuration, grammatical errors in hand-written templates, and the inability to check template content visually, thereby reducing workload.

Active Publication Date: 2014-11-12
威海天之卫网络空间安全科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present invention aim to provide a semi-automatic vertical crawler generation tool and method that solve the problems of existing vertical crawlers: crawler templates must be configured by hand, manual configuration is prone to grammatical errors, template content cannot be checked visually, and work efficiency is low.


Embodiment Construction

[0020] To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments. The specific embodiments described here serve only to explain the invention, not to limit it.

[0021] The application principle of the present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0022] As shown in Figure 1, the semi-automatic vertical crawler generation tool of this embodiment consists mainly of an automatic crawler generation tool module 1 and a crawler module 2.

[0023] The automatic crawler generation tool module 1 lets the user create a new template or open an existing one using lex-yacc technology, performs lexical and grammatical analysis on the template file, maintains the symbol ta...


Abstract

The invention discloses a semi-automatic vertical crawler generation tool and method. In the method, the user either creates a new template or opens an existing one using lex-yacc technology; the template file undergoes lexical and grammatical analysis, during which a symbol table is maintained and a syntax tree is built. A tree template structure is constructed from the template content by looking up the lexical and grammatical rules and by storing and processing the data produced during analysis. Template nodes are then added, modified, or deleted in the tree according to the content to be extracted; each node records its jump relation, an XPath expression, and a data storage mode. Finally, the template is saved. The tool comprises an automatic crawler generation tool module and a crawler module. Because template content is generated automatically rather than configured by hand, templates can be configured more conveniently and quickly, and the workload of the staff involved is greatly reduced.
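
The tree template idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `TemplateNode`, its field names, the `attr:` storage-mode convention, and the JSON save format are all hypothetical choices made for illustration, and the lex-yacc parsing step is omitted. The standard library's `xml.etree.ElementTree` supports only a limited subset of XPath, which is enough for this example.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field


@dataclass
class TemplateNode:
    """One node of the tree template (all field names are hypothetical)."""
    name: str
    xpath: str                 # XPath expression locating the content
    storage: str = "text"      # data storage mode, e.g. "text" or "attr:href"
    jump: bool = False         # jump relation: extracted values are links to follow
    children: list[TemplateNode] = field(default_factory=list)

    def find(self, name: str) -> TemplateNode | None:
        """Depth-first search for a node by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def delete(self, name: str) -> bool:
        """Remove the first descendant called `name`; True if one was removed."""
        for i, child in enumerate(self.children):
            if child.name == name:
                del self.children[i]
                return True
            if child.delete(name):
                return True
        return False


def extract(node: TemplateNode, page: ET.Element) -> list[str]:
    """Apply one node's XPath to a parsed page, honoring its storage mode."""
    matches = page.findall(node.xpath)
    if node.storage.startswith("attr:"):
        attr = node.storage.split(":", 1)[1]
        return [m.get(attr, "") for m in matches]
    return [(m.text or "").strip() for m in matches]


def save_template(root: TemplateNode) -> str:
    """Serialize the whole tree; JSON is an assumed format, not the patent's."""
    return json.dumps(asdict(root), indent=2)


# Build a small template, then modify one node and delete another.
root = TemplateNode("list_page", ".")
root.children.append(TemplateNode("item_link", ".//a", storage="attr:href", jump=True))
root.children.append(TemplateNode("headline", ".//h1"))
root.children.append(TemplateNode("obsolete", ".//span"))

root.find("headline").xpath = ".//h2"   # modify a node
root.delete("obsolete")                  # delete a node

# Apply the template to a tiny well-formed page.
page = ET.fromstring(
    "<html><body><h2>News</h2><a href='/a/1'>one</a><a href='/a/2'>two</a></body></html>"
)
links = extract(root.find("item_link"), page)
titles = extract(root.find("headline"), page)
saved = save_template(root)
```

Keeping the jump relation, XPath expression, and storage mode together on each node means a visual editor could show the template as a tree and validate each node as it is edited, instead of leaving syntax errors to surface in a hand-written file.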

Description

Technical field

[0001] The invention belongs to the technical field of search engines, and in particular relates to a semi-automatic vertical crawler generation tool and method.

Background art

[0002] With the development of search engine technology, vertical search applications serving specific fields have begun to emerge. A vertical crawler selectively visits target links on Internet pages according to a specific goal in order to obtain page information. It does not pursue broad coverage; it focuses on a single field or industry and selects the next page to crawl from the URL queue according to the search engine's indexing strategy. Although vertical crawlers offer advantages such as higher accuracy compared with batch crawlers and incremental crawlers, they still require manual intervention in the configuration of crawler templates, a requirement that follows from the nature of vertical crawlers.

[0003] The efficiency of manual template configurat...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/951
Inventors: 陈新蕾, 吕芳, 魏玉良, 刘扬, 黄俊恒, 王佰玲
Owner: 威海天之卫网络空间安全科技有限公司