Timely and efficient crawling method for Internet information
A technology for Internet and web page information, applied in the information field, that reduces misjudgments, reduces the scope and complexity of crawling, and simplifies resource allocation.
Detailed Description of the Embodiments
[0037] A specific embodiment of the present invention is shown in Figure 1. The steps are detailed below.
[0038] 1. Information collection and collation (as shown in Figure 2)
[0039] 1. Collecting the URL addresses of relevant information
[0040] Based on the predetermined topic, first select a small number of topic keywords (for example, 3 to 5). Enter these keywords into a general-purpose search engine to obtain lists of query results, then organize the results and extract their URLs to obtain a set of URL addresses of relevant information.
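The collation step can be illustrated with a minimal Python sketch. It assumes the search-engine results have already been gathered into a dictionary mapping each topic keyword to the list of result URLs it returned; how those results are fetched is not specified here, and the names `normalize_url` and `collect_seed_urls` are hypothetical. The sketch merges the lists, normalizes each URL (lowercase scheme and host, fragment dropped), skips non-web links, and removes duplicates while keeping first-seen order.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host and drop the #fragment so duplicates collapse."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def collect_seed_urls(results_by_keyword: dict[str, list[str]]) -> list[str]:
    """Merge per-keyword result URLs into one de-duplicated list (first-seen order).

    Assumption: results_by_keyword maps each topic keyword to the result URLs
    that a general search engine returned for it.
    """
    seen: set[str] = set()
    seeds: list[str] = []
    for urls in results_by_keyword.values():
        for url in urls:
            norm = normalize_url(url)
            if not norm.startswith(("http://", "https://")):
                continue  # skip non-web links such as mailto:
            if norm not in seen:
                seen.add(norm)
                seeds.append(norm)
    return seeds


if __name__ == "__main__":
    # Hypothetical keywords and results, for illustration only.
    results = {
        "flood warning": ["https://Example.com/a#top", "http://news.example.org/x"],
        "heavy rain": ["https://example.com/a", "mailto:editor@example.com"],
    }
    print(collect_seed_urls(results))
```

With the sample input, the two spellings of `https://example.com/a` collapse into one entry and the `mailto:` link is dropped, leaving two seed URLs.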
[0041] 2. Initial URL setting and web page crawling
[0042] Select Internet crawler software (such as Heritrix or Nutch) and configure the URL addresses obtained in sub-step 1 of step 1 as the seed URLs in the software. Set parameters such as the number of pages to crawl (determined in advance) in the software, and then apply a general Internet crawling method (without topic-relevance judgment or timeliness prediction)...
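The "general" crawl described here, seeded with the collected URLs and bounded only by a preset page count, amounts to a breadth-first traversal of hyperlinks. The following is a minimal Python sketch of that idea under stated assumptions; it is not Heritrix or Nutch configuration. The function names `crawl` and `LinkExtractor` are hypothetical, `fetch` is an assumed caller-supplied function that returns a page's HTML (or `None` on failure), and `max_pages` stands in for the predetermined number of pages. As the text specifies, no topic-relevance or timeliness filtering is applied.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    seeds: list[str],
    fetch: Callable[[str], Optional[str]],  # hypothetical: URL -> HTML or None
    max_pages: int,  # stands in for the predetermined page count
) -> dict[str, str]:
    """Breadth-first crawl from the seeds with no topic or timeliness filtering."""
    queue = deque(seeds)
    queued = set(seeds)
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # fetch failed or page missing; move on
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and strip #fragments before queueing.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in queued:
                queued.add(link)
                queue.append(link)
    return pages


if __name__ == "__main__":
    # An in-memory "web" so the sketch runs offline.
    web = {
        "https://a.example/": '<a href="/b">b</a> <a href="https://c.example/">c</a>',
        "https://a.example/b": '<a href="/">home</a> <a href="/d#sec">d</a>',
        "https://c.example/": "",
        "https://a.example/d": "",
    }
    print(list(crawl(["https://a.example/"], web.get, max_pages=3)))
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so it can be exercised against an in-memory page set as above; a real deployment would rely on the crawler software's own fetching, politeness, and storage facilities.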