The invention relates to a web crawler system, in particular to a mobile terminal web crawler system, whose operation comprises the following steps: 1, starting from the URLs (Uniform Resource Locators) of one or more initial web pages, obtaining the URLs on those initial pages; 2, during crawling, continuously extracting new URLs from the current page and putting them into a queue until a given stop condition of the system is met; 3, filtering out links irrelevant to the topic according to a given webpage analysis algorithm, retaining the useful links, and putting them into the queue of URLs to be crawled; 4, selecting from the queue, according to a given search strategy, the URL of the web page to be crawled next, and repeating the process until a given condition of the system is met; 5, storing all the web pages captured by the crawler. The technical scheme can effectively overcome the defects of the prior art that search results contain a large amount of irrelevant information, that coverage is low, and that queries formulated on the basis of semantic information are difficult to support.
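The five steps describe a focused (topic-driven) crawl loop. The sketch below is illustrative only and makes several assumptions that the abstract leaves open. Pages come from a hypothetical in-memory `Web` dictionary rather than HTTP. A toy keyword score on anchor text (`relevance`, hypothetical) stands in for the unspecified "webpage analysis algorithm". Best-first ordering by that score is used as the "search strategy", and a `max_pages` limit serves as the stop condition. `focused_crawl` and its parameters are likewise hypothetical names, not the patent's method.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory "web" standing in for HTTP fetching:
# URL -> (page text, list of (link URL, anchor text)).
Web = Dict[str, Tuple[str, List[Tuple[str, str]]]]


def relevance(text: str, topic_terms: Set[str]) -> float:
    """Toy topic score (an assumption): share of topic terms that occur in text."""
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(web: Web, seeds: List[str], topic_terms: Set[str],
                  max_pages: int = 10, threshold: float = 0.0) -> Dict[str, str]:
    store: Dict[str, str] = {}                 # step 5: captured pages
    seen: Set[str] = set(seeds)
    # Step 1: seed URLs enter the queue with the highest priority.
    # heapq is a min-heap, so scores are stored negated.
    queue: List[Tuple[float, str]] = [(-1.0, url) for url in seeds]
    heapq.heapify(queue)
    # Stop condition (assumed): queue exhausted or max_pages pages stored.
    while queue and len(store) < max_pages:
        _, url = heapq.heappop(queue)          # step 4: best-first selection
        if url not in web:
            continue                           # unreachable page, skip it
        text, links = web[url]
        store[url] = text
        for link, anchor in links:             # step 2: extract new URLs
            if link in seen:
                continue
            score = relevance(anchor, topic_terms)
            if score > threshold:              # step 3: drop off-topic links
                seen.add(link)
                heapq.heappush(queue, (-score, link))
    return store


# Usage on a tiny hypothetical site.
web: Web = {
    "a": ("mobile crawler home", [("b", "mobile crawler design"),
                                  ("c", "cooking recipes"),
                                  ("d", "mobile apps")]),
    "b": ("crawler design page", [("e", "crawler queue")]),
    "c": ("recipes", []),
    "d": ("apps page", []),
    "e": ("queue page", []),
}
pages = focused_crawl(web, ["a"], {"mobile", "crawler"})
print(list(pages))  # ['a', 'b', 'd', 'e']
```

The off-topic link to `c` is discarded in step 3, and `b` (score 1.0) is crawled before `d` and `e` (score 0.5). Replacing the priority queue with a plain FIFO queue would turn the same loop into an ordinary breadth-first crawler, which is the unfocused behaviour the abstract contrasts against.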