
Method and system for extracting information of content in Internet

A content information extraction system technology, applied in the field of Internet content information extraction methods and extraction systems. It addresses the problems of limited user initiative in extracting information, reduced browsing speed, and the inability to obtain and save text content, with the effect of speeding up browsing and display.

Active Publication Date: 2007-12-26
TENCENT TECH (SHENZHEN) CO LTD
0 Cites, 62 Cited by

AI Technical Summary

Problems solved by technology

At present, the RSS feeds provided by many information content sites do not cover all the information on the site, but only a small part of it. Content that is not provided through RSS cannot be obtained with the existing technology, which limits the user's initiative in extracting information.
[0010] 3) The text content cannot be obtained and saved through RSS.
Current RSS provides only a link to the address of the text, not the text itself. Users must visit the URL that the link points to in order to browse the text, which reduces the user's browsing speed.



Examples


Embodiment Construction

[0047] The present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0048] The core idea of the present invention is to actively obtain the source code of the target web page, extract the address links it contains, then actively obtain the source code of each linked page and extract the required content information from it.
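The core idea in [0048] can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function name `extract_from_site`, the use of regular expressions as extraction conditions, and the in-memory `FAKE_SITE` that stands in for the network are all hypothetical and do not come from the patent text.

```python
import re
from typing import Callable, Dict, List
from urllib.parse import urljoin


def extract_from_site(
    target_url: str,
    fetch: Callable[[str], str],
    link_pattern: str,
    content_pattern: str,
) -> Dict[str, List[str]]:
    """Index page -> matching address links -> content of each linked page."""
    # Actively obtain the source code of the target (index) page.
    index_source = fetch(target_url)

    # Extract the address links and keep only those matching the condition.
    # (Assumption: the condition is a regular expression applied to the href.)
    hrefs = re.findall(r'href=["\']([^"\']+)["\']', index_source)
    links = [urljoin(target_url, h) for h in hrefs if re.search(link_pattern, h)]

    # Obtain the source code of each link and extract the required content.
    results: Dict[str, List[str]] = {}
    for link in links:
        page_source = fetch(link)
        results[link] = re.findall(content_pattern, page_source, flags=re.DOTALL)
    return results


# Hypothetical offline stand-in for the network, so the sketch runs anywhere.
FAKE_SITE = {
    "http://example.com/index.html":
        '<a href="/news/1.html">One</a> <a href="/about.html">About</a>',
    "http://example.com/news/1.html":
        "<html><h1>Title one</h1><p>Body one.</p></html>",
}

result = extract_from_site(
    "http://example.com/index.html",
    FAKE_SITE.__getitem__,
    link_pattern=r"^/news/",
    content_pattern=r"<p>(.*?)</p>",
)
print(result)
```

Passing `fetch` as a parameter keeps the two-stage logic separate from how pages are downloaded; a real deployment would supply a function that performs an HTTP request instead of a dictionary lookup.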

[0049] FIG. 2 is a schematic structural diagram of a system for extracting Internet content information according to the present invention. Referring to Fig. 2, the extraction system 21 of the Internet content information includes:

[0050] Setting unit 201: provides the user with a setting interface for the target web page and the predetermined extraction conditions, and saves the settings. Through the setting interface, the user can customize the target web page of the target information content site to be visited (the target web page is generally an index page) and customize predetermined ex...
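One way to picture what a setting unit might save is a small settings record that can be written out and read back. The sketch below is hypothetical: the class name `ExtractionSettings`, its field names, and the choice of JSON as the storage format are assumptions made for illustration, since the text above does not say how the settings are represented.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ExtractionSettings:
    # URL of the target (index) page chosen by the user.
    target_page: str
    # Assumed shape of the extraction conditions: plain strings, one list for
    # address links and one for content. The source does not specify this.
    link_conditions: List[str] = field(default_factory=list)
    content_conditions: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the settings so they can be saved."""
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, text: str) -> "ExtractionSettings":
        """Rebuild the settings from their saved form."""
        return cls(**json.loads(text))


settings = ExtractionSettings(
    target_page="http://example.com/index.html",
    link_conditions=["/news/"],
    content_conditions=["<p>"],
)
saved = settings.to_json()
restored = ExtractionSettings.from_json(saved)
print(restored == settings)
```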


Abstract

The method comprises: a) getting the source code of the target webpage; b) extracting the address links that match a preset extracting term from the source code of the target webpage; c) getting, according to each extracted address link, the source code of its corresponding content webpage; d) extracting the content information that matches the preset extracting term from the content webpage. The system comprises: a setting unit for presetting a target webpage and an extracting term; a first acquisition unit for getting the address links from the target webpage source code; and a second acquisition unit for getting the content information from the content webpage source code.
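The division of work between the two acquisition units in the abstract might look roughly like the following. This is a sketch under assumptions, not the patent's design: the class and method names are hypothetical, the link term is treated as a substring of the href, and the content term is treated as a pair of start and end markers. The pages are served from an in-memory dictionary so the example runs offline.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urljoin


class _AnchorCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


class FirstAcquisitionUnit:
    """Gets address links from the target webpage source code."""

    def __init__(self, fetch: Callable[[str], str], link_term: str) -> None:
        self.fetch = fetch
        self.link_term = link_term  # assumption: substring the href must contain

    def get_links(self, target_url: str) -> List[str]:
        collector = _AnchorCollector()
        collector.feed(self.fetch(target_url))
        # Resolve relative links against the target page's URL.
        return [urljoin(target_url, h) for h in collector.hrefs
                if self.link_term in h]


class SecondAcquisitionUnit:
    """Gets content information from the content webpage source code."""

    def __init__(self, fetch: Callable[[str], str],
                 start_marker: str, end_marker: str) -> None:
        self.fetch = fetch
        self.start_marker = start_marker  # assumption: content lies between
        self.end_marker = end_marker      # these two markers

    def get_content(self, link: str) -> Optional[str]:
        source = self.fetch(link)
        start = source.find(self.start_marker)
        if start == -1:
            return None
        start += len(self.start_marker)
        end = source.find(self.end_marker, start)
        if end == -1:
            return None
        return source[start:end].strip()


# Hypothetical pages standing in for a real site.
pages = {
    "http://example.com/list.html":
        '<a href="post/7.html">Seven</a><a href="/help.html">Help</a>',
    "http://example.com/post/7.html":
        '<div id="body"> Hello reader </div>',
}
first = FirstAcquisitionUnit(pages.__getitem__, "post/")
second = SecondAcquisitionUnit(pages.__getitem__, '<div id="body">', "</div>")
links = first.get_links("http://example.com/list.html")
contents = [second.get_content(link) for link in links]
print(links, contents)
```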

Description

Technical field

[0001] The invention relates to the technical fields of computers and the Internet, in particular to a method and system for extracting Internet content information.

Background technique

[0002] With the development of the Internet, the information content it contains has reached a massive scale, but this information content is scattered across thousands of sites on the Internet, which makes browsing very inconvenient. Under such circumstances, more and more attention is being paid to Internet content extraction technology, which can actively extract information content and provide raw data for services such as content aggregation, content mining, and content publishing.

[0003] The extraction of Internet information content and search engines are different concepts. A search engine uses the keywords entered by the user to find web pages that have a certain relationship with those keywords, and lists and displays the addresses of these web pages that...


Application Information

IPC(8): H04L12/28, H04L29/06
Inventor: 郭欣
Owner: TENCENT TECH (SHENZHEN) CO LTD