Method and system for paging selection in a big data web crawler
The invention applies web crawler and big data technology to paging selection for big data web crawlers. It addresses problems such as being unable to locate label information and being unable to crawl webpage data in a loop. Its effects are to prevent process interruption, improve processing efficiency, and improve crawling efficiency.
Embodiment
[0112] Based on the configuration steps and corresponding configuration modules of the present invention, the crawler script of the paging configuration part is as follows:
    name: 'nextpage',
    css: '#ess_ctrl193591_ListC_AspNetPager > table > tbody > tr > td:nth-child(2) > a',
    type: 'list',
    regex: 'Next page',
    rule: {
        name: 'Href',
        keys: [
            {
                name: 'Href',
                type: 'pagelink',
                css: 'a'
            },
            {
                name: 'title',
                type: 'text',
                css: 'a'
            },
            {
                name: 'txt',
                type: 'text',
                css: 'a'
            }
        ]
    }
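As an illustration only, the sketch below shows one way a paging configuration of this kind could drive a crawl loop: among the anchors returned by the pager selector, take the one whose text matches the regex value, follow its link, and stop when no such anchor remains. The patent does not disclose the interpreter for the script, so the functions findNextPage and crawlAllPages, the in-memory pages table, and the example URLs are all hypothetical stand-ins rather than the invention's actual implementation.

```javascript
// Hypothetical stand-in for fetched pages: for each URL, the anchors that the
// pager's CSS selector would return (text and href only).
const pages = {
  '/list?page=1': [{ text: 'Next page', href: '/list?page=2' }],
  '/list?page=2': [
    { text: 'Previous page', href: '/list?page=1' },
    { text: 'Next page', href: '/list?page=3' },
  ],
  '/list?page=3': [{ text: 'Previous page', href: '/list?page=2' }],
};

// Subset of the paging configuration used by this sketch.
const nextPageConfig = { name: 'nextpage', type: 'list', regex: 'Next page' };

// Return the href of the first anchor whose text matches config.regex,
// or null when the page has no such anchor (assumed to mean "last page").
function findNextPage(anchors, config) {
  const pattern = new RegExp(config.regex);
  const hit = anchors.find((a) => pattern.test(a.text) && a.href);
  return hit ? hit.href : null;
}

// Follow "next page" links until none is found, a URL repeats,
// or maxPages is reached. Returns the URLs in visiting order.
function crawlAllPages(startUrl, config, fetchAnchors, maxPages = 100) {
  const visited = [];
  const seen = new Set();
  let url = startUrl;
  while (url && !seen.has(url) && visited.length < maxPages) {
    seen.add(url);
    visited.push(url);
    url = findNextPage(fetchAnchors(url), config);
  }
  return visited;
}

const visited = crawlAllPages(
  '/list?page=1',
  nextPageConfig,
  (url) => pages[url] || []
);
console.log(visited);
```

The seen set and the maxPages cap are defensive choices of this sketch: they keep the loop from running forever if a pager links back to an earlier page.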
[0137] The crawler script is as follows:
    name: 'liuyugaikuang',
    url: 'http://www.gdwater.gov.cn/yszx/ysgk/lygk',
    keys: [{
        name: 'news',
        css: 'body > div.wrap > div > div.glcom.clearfix > div.gl-right > ul > li',
        type: 'list',
        rule: ...