A method for collecting batch encrypted data for news media
A method for collecting encrypted data, applied in network data indexing, network data retrieval, other database retrieval and related fields. It addresses the problems of increasing data collection difficulty, poor collection stability, and high collection cost, with the effects of a reduced data collection workload, fast running speed, and improved data collection efficiency.
Examples
Embodiment 1
[0039] Embodiment 1: a method for collecting batch encrypted data of news media. First, the website urls and site names to be collected are added to a database. The method further comprises the following steps:
[0040] S1, setting up a url deduplication set implemented in redis and a url queue implemented in redis, and adding the website urls and site names from the database to both the deduplication set and the url queue;
[0041] S2, the processor spawns multiple puppeteer processes to consume the data in the redis url queue from step S1;
[0042] S3, setting up an html queue implemented in redis; after the web page data html is obtained, it is added to the redis html queue, and a marking process is set for the html queue, which is used to distinguish list-page html from content-page html;
[0043] S4, analyze the data in the html queue implemented by redi...
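Steps S1 to S3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the redis set and queues are replaced by in-memory stand-ins so the example runs without a server, threads stand in for the puppeteer processes, `fake_render` is a hypothetical placeholder for a real page load, and the `page_type` field is an assumed way of carrying the list/content mark.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
import json
import threading


class InMemoryStore:
    """Hypothetical stand-in for the redis set and queues named in S1 and S3."""

    def __init__(self):
        self.seen_urls = set()     # plays the role of the url deduplication set
        self.url_queue = deque()   # plays the role of the url queue
        self.html_queue = deque()  # plays the role of the html queue
        self._lock = threading.Lock()

    def add_url(self, url, site_name, page_type):
        """S1: enqueue a url only if the deduplication set has not seen it."""
        with self._lock:
            if url in self.seen_urls:
                return False
            self.seen_urls.add(url)
            task = {"url": url, "site": site_name, "page_type": page_type}
            self.url_queue.append(json.dumps(task))
            return True

    def pop_url(self):
        """Take the oldest pending url task, or None when the queue is empty."""
        with self._lock:
            return self.url_queue.popleft() if self.url_queue else None

    def push_html(self, record):
        """S3: store rendered html together with its list/content mark."""
        with self._lock:
            self.html_queue.append(json.dumps(record))


def fake_render(url):
    """Placeholder for a puppeteer page load; returns canned html."""
    return f"<html><body>rendered {url}</body></html>"


def worker(store, render=fake_render):
    """S2: consume url tasks until the queue is empty, pushing marked html."""
    while True:
        raw = store.pop_url()
        if raw is None:
            return
        task = json.loads(raw)
        store.push_html({**task, "html": render(task["url"])})


def run(store, n_workers=4):
    """Start several consumers; the pool waits for all of them on exit."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_workers):
            pool.submit(worker, store)
```

In a real deployment the three containers would presumably map onto a redis set and two redis lists, which would let separate processes share them; the in-memory version only shows the control flow.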
Embodiment 2
[0045] Embodiment 2: On the basis of Embodiment 1, in step S2, the multiple puppeteer processes are kept alive. When a process is idle, its browser status information is saved in a text file and the file is marked as to be called. When a url in the redis url queue needs to be parsed, the text file of a randomly chosen puppeteer process marked as to be called is read, and the file's status is then changed to being called. This reduces memory usage and improves browser opening speed.
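The idle-browser bookkeeping in Embodiment 2 might look like the sketch below. The source does not say what the saved browser status information contains, so the JSON layout, the `ws_endpoint` field, the status strings, and all function names here are assumptions made for illustration. The sketch is also not safe against two processes acquiring the same file at once; a real system would need a lock or an atomic rename.

```python
import json
import random
import tempfile
from pathlib import Path

IDLE = "to_be_called"   # assumed label for an idle browser
BUSY = "being_called"   # assumed label for a browser in use


def park_browser(state_dir, name, ws_endpoint):
    """Save an idle browser's state to a text file marked as to be called."""
    path = Path(state_dir) / f"{name}.txt"
    state = {"name": name, "ws_endpoint": ws_endpoint, "status": IDLE}
    path.write_text(json.dumps(state))
    return path


def acquire_browser(state_dir, rng=random):
    """Pick a random idle state file, mark it as being called, return its state."""
    idle = []
    for path in sorted(Path(state_dir).glob("*.txt")):
        state = json.loads(path.read_text())
        if state["status"] == IDLE:
            idle.append((path, state))
    if not idle:
        return None  # every browser is currently in use
    path, state = rng.choice(idle)
    state["status"] = BUSY
    path.write_text(json.dumps(state))
    return state


def release_browser(state_dir, name):
    """Mark a browser's state file as to be called again after use."""
    path = Path(state_dir) / f"{name}.txt"
    state = json.loads(path.read_text())
    state["status"] = IDLE
    path.write_text(json.dumps(state))
```

A caller would park each browser after it finishes a page, acquire one when a url arrives, reconnect using the saved state, and release it afterwards.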
Embodiment 3
[0046] Embodiment 3: On the basis of Embodiment 1, in step S4, a marking process is set, specifically an html mark, to monitor whether there is data to be parsed in the redis html queue; if there is, the processor calls the html tag parsing process to parse the html tags.
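One possible shape for the monitoring and parsing described in Embodiment 3 is sketched here. The record format (JSON with `url`, `page_type`, and `html` fields), the choice to extract links from list pages and text from content pages, and all names are assumptions; the source only states that the html queue is monitored and that an html tag parser is called when data is present. A `deque` stands in for the redis html queue, and Python's standard `html.parser` stands in for the unspecified parsing program.

```python
from collections import deque
from html.parser import HTMLParser
import json


class LinkAndTextParser(HTMLParser):
    """Collects anchor hrefs and visible text while walking the html tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def parse_record(raw):
    """Parse one queued record according to its list/content mark."""
    record = json.loads(raw)
    parser = LinkAndTextParser()
    parser.feed(record["html"])
    parser.close()
    if record["page_type"] == "list":
        # List pages yield further urls to collect.
        return {"url": record["url"], "page_type": "list", "links": parser.links}
    # Content pages yield the article text.
    return {"url": record["url"], "page_type": "content", "text": " ".join(parser.text)}


def drain(html_queue):
    """Parse records for as long as the html queue has data waiting."""
    results = []
    while html_queue:
        results.append(parse_record(html_queue.popleft()))
    return results
```

A long-running monitor would call `drain` in a loop with a short sleep or a blocking pop between passes, rather than returning once the queue is empty.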