The invention discloses a multithreading-based web crawler system comprising a URL (Uniform Resource Locator) processing module, a web crawling module, a web analysis module and a web storage module. The URL processing module obtains the host name, the port number and the file name of each URL by processing it with a URL class. The web crawling module crawls web content in partitions and stores each captured page in a temporary storage module. The web analysis module extracts URLs, handles URL redirection, checks the URLs for duplicates and deletes the duplicate URLs. When a file is to be stored, the web storage module judges whether the file already exists: if the file does not exist, the crawled file is stored directly; if the file exists and the content obtained by the current crawl is greater than the content obtained by the previous crawl, the original file is overwritten; otherwise, the newly crawled file is discarded. In operation, a web page to be matched against a regular expression is first input and a web request is sent; a private function is then triggered to obtain the matched content; finally, the specific information containing the keywords is obtained. The system thereby achieves high crawling speed and high efficiency.
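The URL processing, duplicate removal and storage rule described above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the abstract names no programming language, and the function names `split_url`, `dedupe` and `store_page` are hypothetical. The sketch further assumes that "more content" means a larger size in bytes, that the default ports are 80 for http and 443 for https, and that duplicate URLs are detected by exact string comparison.

```python
import os
import tempfile
from urllib.parse import urlsplit


def split_url(url):
    """Return (host name, port number, file name) for an http/https URL.

    Assumption: when the URL carries no explicit port, the scheme default
    is used (443 for https, otherwise 80).
    """
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    port = parts.port if parts.port is not None else default_port
    filename = parts.path or "/"
    return parts.hostname, port, filename


def dedupe(urls):
    """Drop repeated URLs while keeping first-seen order.

    Assumption: two URLs are duplicates only if their strings are identical.
    """
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def store_page(path, content):
    """Apply the storage rule to freshly crawled bytes.

    Returns "stored" if no file existed, "overwritten" if the new content
    is larger than the existing file, and "discarded" otherwise.
    Assumption: "more content" is measured as size in bytes.
    """
    if not os.path.exists(path):
        with open(path, "wb") as handle:
            handle.write(content)
        return "stored"
    if len(content) > os.path.getsize(path):
        with open(path, "wb") as handle:
            handle.write(content)
        return "overwritten"
    return "discarded"


if __name__ == "__main__":
    print(split_url("http://example.com:8080/docs/index.html"))
    print(dedupe(["http://a.example/", "http://b.example/", "http://a.example/"]))
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "index.html")
        print(store_page(target, b"<html>first crawl</html>"))
        print(store_page(target, b"<html>a longer second crawl</html>"))
        print(store_page(target, b"<html>short</html>"))
```

Comparing sizes keeps the check inexpensive, but it cannot detect a page whose content changed without growing; a content hash would be one alternative, which the abstract does not describe.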