System and method for acquiring intelligent network information based on WEB content and structure mining

An intelligent network information collection technology, applied in the fields of network data retrieval, web data retrieval using information identifiers, and special data processing applications, with the effect of saving hardware and network resources and improving information collection.

Active Publication Date: 2016-11-16
厦门博卡斯通信息科技有限公司

AI Technical Summary

Problems solved by technology

However, as people's requirements for various information services are getting higher and higher, the traditional information collection based on the entire W

Examples

Embodiment Construction

[0047] Figure 1 is a structural schematic diagram of an intelligent network information collection system in an embodiment of the present invention.

[0048] The intelligent network information collection system in this embodiment comprises the following components:

[0049] A protocol processor 1, used to obtain data from web pages according to the WEB protocol;

[0050] In some embodiments, the protocol processor handles protocol processing for all web protocols.

[0051] In some embodiments, web page data is obtained through web protocols such as HTTP, FTP, Gopher and BBS. HTTP, the Hypertext Transfer Protocol, defines how a browser requests a World Wide Web document from a World Wide Web server and how the server transmits the document to the browser. From a layering point of view, HTTP is a transaction-oriented application layer protocol and an important basis for the reliable exchange of files (i...
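As a rough illustration of a protocol processor that picks a protocol from the URL and then obtains the page data, the following is a minimal Python sketch, not the patented implementation. The names `SUPPORTED_SCHEMES`, `select_protocol` and `fetch` are hypothetical. The sketch assumes the standard-library `urllib`, which handles HTTP, HTTPS and FTP; Gopher and BBS are not covered by `urllib` and would need their own handlers.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Hypothetical set of schemes this sketch accepts (an assumption, not from the patent).
SUPPORTED_SCHEMES = {"http", "https", "ftp"}


def select_protocol(url: str) -> str:
    """Return the lower-cased scheme of `url`, or raise ValueError if unsupported."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported protocol: {scheme or '(none)'}")
    return scheme


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw bytes of `url` after checking that its protocol is supported."""
    select_protocol(url)
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Validating the scheme before opening the connection keeps unsupported or malformed URLs from reaching the network layer at all.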

Abstract

The invention discloses a system for acquiring intelligent network information based on WEB content and structure mining. The system comprises a protocol processor, a webpage mark extractor connected to the protocol processor, a URL processor, a leading edge analyzer connected to the URL processor, a URL database connected to the webpage mark extractor, and an acquisition monitor connected to the URL database. The system analyzes Web content and hyperlink structure to determine the relevance of a webpage to the leisure travel domain, uses that relevance to determine the order of acquisition, and thereby realizes intelligent network information acquisition. The invention also discloses an acquisition method. The method comprises: extracting metadata from a webpage; when a new URL link is detected in the webpage, analyzing the relevance of the new URL to the acquisition theme to generate a list of URLs to be accessed; and, during acquisition, monitoring the multi-threaded acquisition process and optimizing it by evaluating its progress. This greatly improves the recognition rate of relevant webpages and optimizes the whole acquisition process.
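The step of scoring newly detected URLs against the acquisition theme and ordering the to-be-accessed list can be sketched as below. This is a minimal illustration under stated assumptions, not the method claimed in the patent: the topic vocabulary `TOPIC_TERMS`, the term-frequency `relevance` function, the equal weighting of anchor text and parent page text, and the `Frontier` class are all hypothetical.

```python
import heapq
import re
from dataclasses import dataclass, field

# Hypothetical topic vocabulary for a leisure-travel crawl (an assumption).
TOPIC_TERMS = {"travel", "hotel", "tour", "beach", "resort"}


def relevance(text: str) -> float:
    """Fraction of word tokens in `text` that are topic terms (0.0 if no tokens)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in TOPIC_TERMS for token in tokens) / len(tokens)


@dataclass
class Frontier:
    """To-be-accessed URL list, ordered so the most relevant URL is popped first."""
    _heap: list = field(default_factory=list)
    _seen: set = field(default_factory=set)

    def add(self, url: str, anchor_text: str, parent_text: str) -> bool:
        """Score and queue a newly detected URL; return False if already seen."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # Assumed weighting: link (anchor) text and parent page content count equally.
        score = 0.5 * relevance(anchor_text) + 0.5 * relevance(parent_text)
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-score, url))
        return True

    def pop(self) -> tuple:
        """Remove and return (url, score) for the most relevant queued URL."""
        negated_score, url = heapq.heappop(self._heap)
        return url, -negated_score
```

A priority queue is used here because a topic-driven crawl only ever needs the single best candidate next; a real system would likely replace the keyword ratio with a stronger content and link-structure model.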

Description

Technical Field

[0001] The invention relates to the field of data collection and processing, in particular to an intelligent network information collection system and method based on WEB content and structure mining.

Background

[0002] In the era of the network information explosion, the amount of information has become extremely large, and it has become increasingly difficult to find valuable information in this overwhelming ocean of information. To address this problem, many machine learning methods already exist, such as web page ranking methods that make predictions based on user requests. But even with a very sophisticated ranking algorithm, even the best crawler may not be able to retrieve useful information from a page if there is no subject indexing.

[0003] With the rapid expansion of information on the WEB, various WEB-based services are also gradually prospering. As the foundation and an important part of these i...

Application Information

IPC(8): G06F17/30
CPC: G06F16/955
Inventor: 黄杨
Owner: 厦门博卡斯通信息科技有限公司