A big data web crawler page selection method and system
The invention relates to web crawler and big data technology, applied to paging selection for big data web crawlers. It addresses problems such as being unable to locate page elements, being unable to crawl web page data in a loop, and being unable to locate tag information. Its effects are to prevent interruption of the crawl process and to improve both processing efficiency and crawling efficiency.
Embodiment
[0112] Based on the configuration steps and the corresponding configuration modules of the present invention, the paging configuration portion of the crawler script is as follows:
    name: 'nextpage',
    css: '#ess_ctrl193591_ListC_AspNetPager>table>tbody>tr>td:nth-child(2)>a',
    type: 'list',
    regex: 'next page',
    rule: {
      name: 'Href',
      keys: [
        {
          name: 'Href',
          type: 'pagelink',
          css: 'a'
        },
        {
          name: 'title',
          type: 'text',
          css: 'a'
        },
        {
          name: 'txt',
          type: 'text',
          css: 'a'
        }
      ]
    }
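As a rough illustration of how a paging configuration of this kind might drive a crawl loop, the following sketch picks the "next page" link by matching the `regex` field against link text and follows it until no further page is found. It is not the patented implementation: the function names `findNextPage` and `crawlPages`, the `{ text, href }` anchor shape, the `fetchAnchors` callback (standing in for "fetch the page and select elements with the configured `css`"), the `maxPages` limit, and the `example.com` URLs are all assumptions made for this example.

```javascript
// Hypothetical paging config, reduced to the fields this sketch uses.
const pagingConfig = {
  name: 'nextpage',
  type: 'list',
  regex: 'next page',
};

// Pick the "next page" link among anchors already matched by the CSS selector.
// `anchors` has an assumed shape: [{ text, href }].
function findNextPage(anchors, config, currentUrl) {
  const pattern = new RegExp(config.regex);
  const hit = anchors.find((a) => pattern.test(a.text) && a.href);
  if (!hit) return null;
  // Resolve relative links against the page they were found on.
  return new URL(hit.href, currentUrl).href;
}

// Follow next-page links until none is found, a page repeats,
// or maxPages is reached. `fetchAnchors(url)` is a hypothetical callback
// that returns the anchors matched on that page.
function crawlPages(startUrl, config, fetchAnchors, maxPages = 100) {
  const visited = [];
  const seen = new Set();
  let url = startUrl;
  while (url && !seen.has(url) && visited.length < maxPages) {
    seen.add(url);
    visited.push(url);
    url = findNextPage(fetchAnchors(url), config, url);
  }
  return visited;
}

// Usage with an in-memory stand-in for a three-page listing.
const fakeSite = {
  'http://example.com/list': [{ text: 'next page', href: 'list?page=2' }],
  'http://example.com/list?page=2': [
    { text: 'previous page', href: 'list' },
    { text: 'next page', href: 'list?page=3' },
  ],
  'http://example.com/list?page=3': [
    { text: 'previous page', href: 'list?page=2' },
  ],
};
const pages = crawlPages(
  'http://example.com/list',
  pagingConfig,
  (url) => fakeSite[url] || []
);
console.log(pages);
```

The set of already visited URLs and the page limit are there so that a listing whose last page links back to itself does not keep the loop running indefinitely.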
[0136] The crawler script is as follows:
    name: 'liuyugaikuang',
    url: 'http://www.gdwater.gov.cn/yszx/ysgk/lygk',
    keys: [{
      name: 'news',
      css: 'body>div.wrap>div>div.glcom.clearfix>div.gl-right>ul>li',
      ...


