Method and system for extracting information of content in Internet

A content information extraction technology, applied in the field of Internet content information extraction methods and systems. It addresses the problems that existing approaches limit users' initiative in extracting information, reduce users' browsing speed, and cannot obtain and save text content, with the effect of speeding up page display and improving browsing speed.

Active Publication Date: 2007-12-26
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

At present, the RSS feeds provided by many information content sites do not cover all the information on the site, but only a small part of the content. Content that is not provided through RSS cannot be obtained with existing technology, which limits users' initiative in extracting information.



Example Embodiment

[0047] The present invention will be further described in detail below through specific embodiments and drawings.

[0048] The core idea of the present invention is to actively obtain the source code of the target webpage, extract the address links in it, and then actively obtain the source code of the pages those links point to, thereby obtaining the required content information.
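The link-extraction part of this idea can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `LinkCollector` and `extract_links` are hypothetical, and the sketch assumes that the page source is HTML and that address links appear as `href` attributes of `<a>` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative addresses are resolved against the target webpage.
                self.links.append(urljoin(self.base_url, value))


def extract_links(source, base_url):
    """Return all address links found in the page source, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(source)
    collector.close()
    return collector.links
```

Each returned address could then be fetched in turn to obtain the source code of the corresponding content page.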

[0049] Fig. 2 is a schematic diagram of the structure of the Internet content information extraction system according to the present invention. Referring to Figure 2, the Internet content information extraction system 21 includes:

[0050] Setting unit 201: used to provide users with an interface for setting the target webpage and the predetermined extraction conditions, and to save the settings; through the setting interface, the user can customize the target webpage of the target information content site to be accessed (the target webpage is generally an index webpage) and customize the predetermined extraction con...


Abstract

The method comprises: a) getting the source code of the target webpage; b) extracting the address links matching a preset extracting term from said source code of the target webpage; c) according to each extracted address link, getting the source code of its corresponding content webpage; d) extracting the content information matching the preset extracting term from the content webpage. The system thereof comprises: a setting unit used for presetting a target webpage and an extracting term; a first acquisition unit used for getting the address links from the target webpage source code; and a second acquisition unit used for getting the content information from the content webpage source code.
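Steps a) to d) and the three units can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the implementation described in the patent: the extracting terms are assumed to be regular expressions, the names `ExtractionSettings`, `acquire_links`, and `acquire_content` are hypothetical, and page retrieval is passed in as a `fetch` function so that the sketch does not depend on any particular HTTP client.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urljoin


@dataclass
class ExtractionSettings:
    """Plays the role of the setting unit: holds the preset values."""
    target_url: str
    link_pattern: str     # assumption: regex an address link must match
    content_pattern: str  # assumption: regex whose group 1 captures the content


# Simplified href matcher; assumes quoted attribute values.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def acquire_links(settings: ExtractionSettings,
                  fetch: Callable[[str], str]) -> List[str]:
    """First acquisition unit: steps a) and b)."""
    source = fetch(settings.target_url)            # a) get target page source
    links: List[str] = []
    for href in HREF_RE.findall(source):           # b) extract matching links
        url = urljoin(settings.target_url, href)
        if re.search(settings.link_pattern, url) and url not in links:
            links.append(url)
    return links


def acquire_content(settings: ExtractionSettings, links: List[str],
                    fetch: Callable[[str], str]) -> Dict[str, str]:
    """Second acquisition unit: steps c) and d)."""
    results: Dict[str, str] = {}
    for url in links:
        page = fetch(url)                          # c) get content page source
        match = re.search(settings.content_pattern, page, re.DOTALL)
        if match:                                  # d) keep the matching content
            results[url] = match.group(1).strip()
    return results
```

In use, `fetch` could wrap a real HTTP request; for a quick check it can be a lookup in a dictionary that maps addresses to page source.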

Description

Technical field

[0001] The invention relates to the technical fields of computers and the Internet, in particular to a method and system for extracting Internet content information.

Background art

[0002] With the development of the Internet, the information content it contains has reached a massive scale, but this content is scattered across thousands of sites on the Internet, which makes browsing very inconvenient. Under such circumstances, more and more attention is being paid to Internet content extraction technology, which can actively extract information content and provide raw data for content aggregation, content mining, content publishing and other services.

[0003] Internet content extraction and search engines are different concepts. A search engine uses the keywords entered by the user to find web pages that have a certain relationship with those keywords, and lists and displays the addresses of these web pages that...


Application Information

IPC(8): H04L12/28, H04L29/06
Inventor: 郭欣
Owner: TENCENT TECH (SHENZHEN) CO LTD