5 results about "Web scraping" patented technology

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

Network anti-crawling method, system and computer device

Active | CN117527281B | Web site | Software engineering

This application relates to a method, system, and computer device for preventing web scraping. The method includes: when a client sends an access request to a target website, determining whether the target website has configured anti-scraping measures based on preset anti-scraping information; if the target website has configured anti-scraping measures, injecting an SDK into the access response, generating a first access response, and returning the first access response to the client; the access response is generated by the target website based on the access request; the client asynchronously submits its runtime information to a human-machine interface verification unit based on the first access response; when the client sends an access request to a subpage of the target website, determining whether the interface corresponding to the subpage is a protected interface; if the interface corresponding to the subpage is a protected interface, the human-machine interface verification unit performs human-machine interface verification based on the client's runtime information; if the human-machine interface verification result is successful, an access request is sent to the subpage. This method can prevent the target website from being accessed by malicious programs.

Owner: ZHONGAN INFORMATION TECH SERVICES CO LTD
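
The flow described in this abstract (inject an SDK into responses from protected sites, collect the client's runtime information, then verify it before serving a protected subpage interface) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `AntiCrawlGateway` class, the `SDK_TAG` script path, the runtime-information fields, and the `looks_human` heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical SDK snippet; the abstract does not say what the SDK looks like.
SDK_TAG = '<script src="/sdk/collect.js" async></script>'


def looks_human(info: dict) -> bool:
    """Placeholder human-machine check (assumed fields, not from the patent)."""
    return info.get("webdriver") is False and bool(info.get("has_pointer_events"))


@dataclass
class AntiCrawlGateway:
    protected_sites: set[str]   # preset anti-scraping configuration
    protected_paths: set[str]   # subpage interfaces that need verification
    runtime_info: dict[str, dict] = field(default_factory=dict)

    def handle_site_request(self, site: str, html: str) -> str:
        """Return the site's response, with the SDK injected if the site is protected."""
        if site not in self.protected_sites:
            return html
        if "</body>" in html:
            return html.replace("</body>", SDK_TAG + "</body>", 1)
        return html + SDK_TAG

    def submit_runtime_info(self, client_id: str, info: dict) -> None:
        """Called asynchronously by the injected SDK running in the client."""
        self.runtime_info[client_id] = info

    def allow_subpage(self, client_id: str, path: str) -> bool:
        """Forward the subpage request only if it is unprotected or the client verifies."""
        if path not in self.protected_paths:
            return True
        info = self.runtime_info.get(client_id)
        return info is not None and looks_human(info)
```

A client that never ran the SDK has no runtime information on record, so in this sketch it is refused on protected interfaces while unprotected pages stay reachable.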

Web scraping through use of proxies, and applications thereof

Pending | HK40134646A | Mechanical engineering | Web scraping

The invention relates to a computer-implemented method for processing web scraping jobs, using a plurality of database servers (404A-404N) that operate independently of one another, each configured to manage data storage for at least a portion of a job database (314) that stores the status of web scraping jobs while the jobs are being executed. The method comprises: receiving a web scraping request from a client computing device (102); when the web scraping request is received, selecting one of the plurality of database servers (404A-404N) that is identified as enabled in a table (1008), and sending a job description specified by the web scraping request to the selected database server (404A-404N) for storage in the job database (314) as a pending web scraping job; repeatedly checking the health of each of the plurality of database servers (404A-404N); and, based on the health checks, determining whether each of the plurality of database servers (404A-404N) is to be enabled or disabled in the table (1008).

Owner: OXYLABS UAB
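
The routing scheme in this abstract (an enabled/disabled table refreshed by health checks, with each new job stored as pending on an enabled server) might look something like the sketch below. The `JobRouter` class, the probe callables, the in-memory lists standing in for the job database, and the random choice among enabled servers are assumptions for illustration; the abstract does not specify how a server is picked or how health is measured.

```python
import random
from typing import Callable


class JobRouter:
    """Toy model of health-checked routing of scraping jobs to database servers."""

    def __init__(self, probes: dict[str, Callable[[], bool]]):
        self.probes = probes
        # The "table": server name -> enabled flag.
        self.enabled = {name: True for name in probes}
        # Stand-in for each server's portion of the job database.
        self.pending: dict[str, list[dict]] = {name: [] for name in probes}

    def run_health_checks(self) -> None:
        """Probe every server and update its enabled flag (call this repeatedly)."""
        for name, probe in self.probes.items():
            try:
                self.enabled[name] = bool(probe())
            except Exception:
                # A probe that raises is treated as an unhealthy server.
                self.enabled[name] = False

    def submit(self, job: dict) -> str:
        """Store the job as pending on one enabled server and return that server's name."""
        candidates = [name for name, ok in self.enabled.items() if ok]
        if not candidates:
            raise RuntimeError("no enabled database server")
        name = random.choice(candidates)  # selection policy is an assumption
        self.pending[name].append({**job, "status": "pending"})
        return name
```

Because the servers operate independently, a disabled server only shrinks the candidate list; new jobs keep flowing to the servers still marked enabled.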

Dynamic optimization of request parameters for proxy servers

Pending | CN122372546A | Engineering | Web crawler

The present disclosure relates to dynamic optimization of request parameters for a proxy server. The systems and methods provided herein extend task fulfillment and target the step of the web scraping process in which a client submits a request to a web crawler. They allow more complex requests to be defined for the web crawler in order to receive more specific data. In one aspect, a method for extracting and collecting data from a network by a service provider infrastructure includes the steps of inspecting parameters of a request received from a user's device, adjusting the request parameters according to pre-established scraping logic, selecting a proxy according to criteria of the pre-established scraping logic, sending the adjusted request to a target through the selected proxy, inspecting metadata received from the target, and forwarding the data to the user's device.

Owner: OKOSILA BOSE PTE LTD

A webpage content intelligent crawling method and system based on data analysis

Pending | CN122285979A | Documentation | Data science

This invention discloses a method and system for intelligent web page content crawling based on data analysis, belonging to the field of web page crawling technology. The method includes identifying candidate content blocks and calculating their Shannon entropy, generating an importance score by combining text density; constructing a global vocabulary based on the candidate content blocks and calculating inverse document frequencies of terms, defining a topic-specific factor, multiplying the importance score by the topic-specific factor to obtain a comprehensive priority, selecting candidate content blocks with a comprehensive priority higher than the average as target crawling blocks, and generating crawling rules for each target crawling block; and using the generated crawling rules to extract content from the initial HTML source code. This invention improves the content differentiation of web page crawling by combining text density analysis and Shannon entropy evaluation, and significantly enhances the accuracy of the crawling rules through in-depth analysis of web page structure and visual elements.

Owner: TIANJIN HONGCHENG TECHNOLOGY CO LTD
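
The scoring pipeline in this abstract can be illustrated with a small sketch. The abstract names the ingredients (Shannon entropy, text density, inverse document frequency, a topic-specific factor, and an above-average cutoff) but not the exact formulas, so the choices here are assumptions: entropy is taken over word frequencies, text density is text length divided by HTML length, the importance score is their product, and the topic-specific factor is the mean smoothed IDF of a block's terms. Rule generation is not covered.

```python
import math
import re
from collections import Counter


def tokens(html: str) -> list[str]:
    """Lowercased word tokens of a block, with tags stripped."""
    return re.findall(r"\w+", re.sub(r"<[^>]+>", " ", html).lower())


def shannon_entropy(words: list[str]) -> float:
    """Entropy in bits of the word-frequency distribution: H = -sum(p * log2 p)."""
    total = len(words)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(words).values())


def text_density(html: str) -> float:
    """Share of the block's characters that are text rather than markup."""
    return len(re.sub(r"<[^>]+>", "", html)) / len(html) if html else 0.0


def select_blocks(blocks: list[str]) -> list[int]:
    """Return indexes of blocks whose priority exceeds the average priority."""
    tokenized = [tokens(block) for block in blocks]
    # Global vocabulary: document frequency of each term across candidate blocks.
    df = Counter(term for words in tokenized for term in set(words))
    n = len(blocks)
    priorities = []
    for block, words in zip(blocks, tokenized):
        importance = shannon_entropy(words) * text_density(block)
        idfs = [math.log(n / df[term]) + 1 for term in set(words)]
        topic_factor = sum(idfs) / len(idfs) if idfs else 0.0
        priorities.append(importance * topic_factor)
    average = sum(priorities) / n if n else 0.0
    return [i for i, p in enumerate(priorities) if p > average]
```

A navigation list that repeats one word has zero entropy and therefore zero priority under these assumptions, while a paragraph of varied prose scores above the average and is kept as a target block.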

Systems and methods for automated assessment of media content for sincerity

Pending | US20260154504A1 | Natural language analysis | Semantic analysis | Scoring algorithm | MediaFLO

A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause a system to perform a method for analyzing media content, the method comprising: ingesting, via one or more web scraping techniques, media content from one or more online sources; preprocessing, via one or more natural language processing tools, the media content comprising the steps of: tokenizing the media content into a plurality of components, removing stop words from the plurality of components, lemmatizing the plurality of components, and normalizing the plurality of components; analyzing, via the one or more natural language processing tools, the preprocessed media content for one or more sentiments; generating, via one or more scoring algorithms, a score for each of the one or more sentiments; compiling the score for each of the one or more sentiments to generate a final score; and displaying the final score on one or more client devices.

Owner: Y2 CONSULTING LLC
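
The preprocessing and scoring steps claimed in this abstract (tokenize, remove stop words, lemmatize, normalize, score each sentiment, compile a final score) can be sketched with the standard library alone. Everything concrete here is a toy assumption: the stop-word list, the lookup-table lemmatizer, the two sentiment labels and their word lists, and the "sincere minus evasive" final score. A real system would presumably use an NLP toolkit and a trained model, and the scraping and display steps are left out.

```python
import re

# Toy resources (assumptions for illustration only).
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "we", "our", "to"}
LEMMAS = {"promised": "promise", "promises": "promise", "apologies": "apology", "lied": "lie", "lies": "lie"}
LEXICON = {
    "sincere": {"apology", "honest", "promise"},
    "evasive": {"lie", "deny", "excuse"},
}


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, lemmatize via lookup, then normalize to lowercase."""
    tokens = re.findall(r"[A-Za-z']+", text)                    # tokenize
    tokens = [t for t in tokens if t.lower() not in STOP_WORDS]  # remove stop words
    tokens = [LEMMAS.get(t.lower(), t) for t in tokens]          # lemmatize (toy lookup)
    return [t.lower().strip("'") for t in tokens]                # normalize


def sentiment_scores(tokens: list[str]) -> dict[str, float]:
    """Score each sentiment as the fraction of tokens found in its word list."""
    n = len(tokens) or 1
    return {label: sum(t in words for t in tokens) / n for label, words in LEXICON.items()}


def final_score(scores: dict[str, float]) -> float:
    """Compile per-sentiment scores into one number (assumed combination rule)."""
    return scores["sincere"] - scores["evasive"]
```

For example, `preprocess("We promised an honest apology")` yields `["promise", "honest", "apology"]`, which scores 1.0 for the sincere label and 0.0 for the evasive one under these toy word lists.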
