Data processing method based on web crawlers and structured storage
A structured storage and data processing technology, applied in the fields of network data retrieval, network data indexing, electronic digital data processing, etc. It addresses the problems that results are not comprehensive enough and cannot meet the requirements of the application, with the effect of reducing data source comparison, improving efficiency, and ensuring accuracy and completeness.
Detailed Description of Embodiments
[0024] The present invention is described in detail below in conjunction with the accompanying drawing:
[0025] As shown in Figure 1, a data processing method based on web crawlers and structured storage includes the following steps:
[0026] Step 1: Determine the data source and configure the web crawler system;
[0027] Step 2: Configure the data processing interface according to the characteristics of the data source and the preset metadata structure;
[0028] Step 3: Filter and sort the data and files obtained by the web crawlers, and filter and sort the information on the website's pages according to URL address. Non-duplicate data enters the database and is copied by the system platform. During the copying process, similar news items that fall within 48 hours of each other are compared: compare the title, the text at the start of the paragraph, and the text at the end of the paragraph, or compare the word segmentation of the text, with greater than or equal to 80% believe the information to record and m...
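The filtering described in Step 3 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation. The `Article` record, the `is_similar` and `deduplicate` helpers, the 80-character edge length, and the use of Jaccard token overlap as the "80%" measure are all assumptions introduced here, since the source text is truncated before it states exactly how the threshold is applied. A simple regex tokenizer stands in for a real word-segmentation step.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)   # comparison window named in Step 3
THRESHOLD = 0.8                # "greater than or equal to 80%" (interpretation assumed)


@dataclass
class Article:                 # hypothetical record type, not from the source
    url: str
    title: str
    body: str
    fetched_at: datetime


def tokens(text):
    """Stand-in for word segmentation: lowercase word tokens as a set."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(a, b):
    """Jaccard overlap of the two token sets (assumed similarity measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_similar(a, b, edge=80):
    """Compare two articles only if they fall within the 48-hour window."""
    if abs(a.fetched_at - b.fetched_at) > WINDOW:
        return False
    # Option 1: title, leading text, and trailing text all match.
    same_edges = (
        a.title == b.title
        and a.body[:edge] == b.body[:edge]
        and a.body[-edge:] == b.body[-edge:]
    )
    # Option 2: token overlap of the full text reaches the threshold.
    return same_edges or overlap(a.body, b.body) >= THRESHOLD


def deduplicate(articles):
    """Drop repeated URLs first, then near-duplicates of already kept items."""
    seen_urls = set()
    kept = []
    for art in sorted(articles, key=lambda a: a.fetched_at):
        if art.url in seen_urls:
            continue
        seen_urls.add(art.url)
        if any(is_similar(art, k) for k in kept):
            continue
        kept.append(art)
    return kept
```

In this sketch the URL check runs before the text comparison, so exact repeats are discarded cheaply and only new URLs pay for the pairwise comparison. For Chinese-language sources, `tokens` would need to be replaced with a proper word segmenter.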