
A system capable of identifying automatic collection of web page information

A technology for identifying the automatic collection of webpage information, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problem that existing approaches only increase the difficulty of webpage collection and analysis without fundamentally eliminating search engine collection, and it achieves the effect of eliminating such collection.

Active Publication Date: 2016-02-17
国科(上海)企业发展有限公司
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0007] 2. Some website information is private or copyrighted, and many webpages contain information such as back-end databases, user privacy data, and passwords.
As the parsing process in Figure 2 shows, building dynamic webpages with dynamic technology only increases the difficulty of webpage collection and analysis; it does not fundamentally eliminate collection by search engines.


Examples


Embodiment Construction

[0036] Referring to the accompanying drawings, a system capable of identifying automatic collection of web page information includes an anti-collection classifier building module, an automatic collection identification module, and an anti-collection online processing module. The anti-collection classifier building module mainly uses computer programs to learn from, and distinguish between, historical automatic collection of web information and normal web page access behavior; it provides the trained model used for automatic collection identification. The automatic collection identification module loads the classifier to automatically identify the automatic collection behavior of search engine programs, and adds the IP segment where each identified collection program is located to a blacklist, which is used for subsequent online interception of automatic collection behavior. The anti-collection online processing module is mai...
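The blacklist step described above can be illustrated with a minimal Python sketch. The patent text does not say how an "IP segment" is delimited, so the /24 prefix length, the class name `SegmentBlacklist`, and its methods are assumptions made purely for illustration.

```python
import ipaddress


class SegmentBlacklist:
    """Blacklist of IP segments (hypothetical helper, not from the patent)."""

    def __init__(self, prefix_len: int = 24):
        # Assumption: a "segment" is an IPv4 block of this prefix length.
        self.prefix_len = prefix_len
        self._segments = set()

    def _segment_of(self, ip: str):
        # strict=False masks the host bits: 203.0.113.57 -> 203.0.113.0/24
        return ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def add(self, ip: str) -> None:
        """Blacklist the whole segment containing an identified collector IP."""
        self._segments.add(self._segment_of(ip))

    def is_blocked(self, ip: str) -> bool:
        """Return True if the visitor's segment was blacklisted earlier."""
        return self._segment_of(ip) in self._segments


blacklist = SegmentBlacklist()
blacklist.add("203.0.113.57")  # IP flagged by the identification module
print(blacklist.is_blocked("203.0.113.200"))  # same /24 segment
print(blacklist.is_blocked("198.51.100.7"))   # different segment
```

Blocking by segment rather than by single address also blocks every other host in that block, so a longer prefix reduces false positives at the cost of coverage.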



Abstract

The invention discloses a system and a method for identifying the automatic acquisition of webpage information. The system comprises an anti-acquisition classifier constructing module, an automatic acquisition identifying module, and an anti-acquisition online processing module. The anti-acquisition classifier constructing module mainly uses a computer program to learn from, and distinguish between, historical automatic acquisition of web information and normal webpage access behaviors. The automatic acquisition identifying module uses the anti-acquisition classifier from the previous step to automatically identify the automatic acquisition behavior of search engine programs, and adds the IP (Internet Protocol) segment where each identified acquisition program is located to a blacklist. The anti-acquisition online processing module mainly judges and processes accessing users automatically online. The system and method overcome deficiencies in the prior art: the historical webpage access behaviors of a website are analyzed, an automatic acquisition classifier is established, automatic acquisition by robots is identified, and webpage anti-grabbing is realized through that identification.
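As a rough illustration of the classifier construction and identification steps, the Python sketch below trains a nearest-centroid classifier on labelled historical sessions and then labels a new session. The source does not name the features or the learning algorithm, so the two features (requests per minute and the share of requests for static assets), the nearest-centroid method, the function names, and the toy training rows are all assumptions, not the patent's method or data.

```python
import math
from statistics import mean

# Toy, made-up history: ([requests_per_minute, static_asset_ratio], label).
HISTORY = [
    ([120.0, 0.02], "crawler"),
    ([90.0, 0.00], "crawler"),
    ([4.0, 0.65], "normal"),
    ([2.0, 0.80], "normal"),
]


def train_centroids(samples):
    """Build the classifier: one mean feature vector (centroid) per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def classify(centroids, features):
    """Identify a session by the label of the closest centroid."""
    return min(
        centroids,
        key=lambda label: math.dist(centroids[label], features),
    )


centroids = train_centroids(HISTORY)
print(classify(centroids, [100.0, 0.0]))  # high request rate, no static assets
print(classify(centroids, [3.0, 0.7]))    # low request rate, many static assets
```

The features here are unscaled, so the request rate dominates the distance; a real deployment would normalize features and would likely use a stronger model trained on actual access logs.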

Description

Technical field

[0001] The invention relates to the technical field of dynamic analysis of webpages, in particular to a system capable of identifying the automatic collection of webpage information.

Background technique

[0002] With the development of the Internet, more and more Internet sites have appeared in a wide variety of forms, such as news, blogs, forums, SNS, and Weibo. According to the latest statistics from CNNIC this year, China now has 485 million Internet users and more than 1.3 million domain names of various sites. In today's Internet information explosion, search engines have become the most important tool for people to find Internet information.

[0003] Search engines mainly crawl website information automatically, preprocess it, and build indexes after word segmentation. After a user enters a search term, the search engine automatically finds the most relevant results for that user. After more than ten years of development, the search engine technology ...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 张炜金军吴杨梓江岩
Owner: 国科(上海)企业发展有限公司