Method for picking up web information needed by user in web page

A technology concerning users and Web pages, applied in the Internet field, that addresses the high time complexity and poor scalability of existing approaches and makes information extraction algorithms more flexible.

Active Publication Date: 2007-12-26
SHENZHEN SHI JI GUANG SU INFORMATION TECH
0 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0027] The present invention provides a method for extracting the Web information required by a user from a Web page. It addresses the high time complexity and poor scalability of the prior art, which extracts the required Web information by analyzing the content of that information.

Embodiment Construction

[0054] The present invention provides an information extraction method based on the HTML grammar standard. It decouples the Web page information segmentation and information extraction algorithms from the specific information content, which makes these algorithms more versatile and flexible.

[0055] As shown in Figure 2, the method provided by the present invention for extracting the Web information required by a user from a Web page includes the following steps:

[0056] Step S11: According to the order of the HTML text corresponding to the Web page, select a number of HTML tags as tag ruler elements to generate a tag ruler, and store the tag ruler in the system;

[0057] Step S12: The system matches the HTML text sequentially against the HTML tag elements in the tag ruler, splits the Web information at the matched HTML tags, and stores each resulting Web information block together with the position, in the text, of the HTML tag that contains the block;

[0058] Step S1...
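Steps S11 and S12 above can be illustrated with a minimal Python sketch. It rests on one plausible reading of the text, not on the patent's own implementation: the tag ruler is taken to be an ordered list of selected tag names, "matching in sequence" is taken to mean finding each ruler element in turn among the opening tags of the HTML text, and each information block is taken to be the raw HTML between one matched tag and the next. The names `build_tag_ruler`, `split_by_ruler`, and `Block`, the regex-based tag scanner, and the use of character offsets as the "position information" are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical tag scanner: matches opening tags such as <td> or <a href=...>
# and captures the tag name. Closing tags like </td> are not matched.
# A production system would use a tolerant HTML tokenizer instead.
TAG_RE = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


@dataclass
class Block:
    tag: str       # ruler tag that opens this block
    position: int  # character offset of that tag in the HTML text (assumed)
    text: str      # raw HTML from the end of this tag to the next matched tag


def build_tag_ruler(html: str, selected: set[str]) -> list[str]:
    """Step S11 (sketch): keep the selected tag names in document order."""
    return [m.group(1).lower() for m in TAG_RE.finditer(html)
            if m.group(1).lower() in selected]


def split_by_ruler(html: str, ruler: list[str]) -> list[Block]:
    """Step S12 (sketch): match ruler elements in order, then cut the text."""
    hits = []  # (tag name, start offset, end offset) of each matched tag
    i = 0      # index of the next ruler element to look for
    for m in TAG_RE.finditer(html):
        if i < len(ruler) and m.group(1).lower() == ruler[i]:
            hits.append((ruler[i], m.start(), m.end()))
            i += 1
    blocks = []
    for k, (name, start, end) in enumerate(hits):
        # A block runs until the next matched ruler tag, or to the end of text.
        stop = hits[k + 1][1] if k + 1 < len(hits) else len(html)
        blocks.append(Block(name, start, html[end:stop]))
    return blocks


page = '<td><font color=red>147</font></td><td><a href="x.htm">View</a></td>'
ruler = build_tag_ruler(page, {"td"})
for block in split_by_ruler(page, ruler):
    print(block.position, block.text)
```

Because the ruler records only tag names and their order, the splitting never inspects the text inside the blocks, which is the property the description credits for better scalability than content analysis.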

Abstract

The method comprises: selecting, according to the order of the HTML text corresponding to the Web page, a number of HTML tags as tag ruler elements to generate a tag ruler, and saving the tag ruler in the system; the system sequentially matching the HTML text against the HTML tag elements in the tag ruler and splitting the Web information at the matched HTML tags; saving the resulting Web information blocks together with the positions of their HTML tags in the text; the user determining, from the Web information required, the position of the corresponding HTML tag in the HTML text and informing the system of it; and the system looking up and extracting the corresponding split Web information block.
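The last steps of the abstract (the user reports the position of an HTML tag, and the system looks up the stored block) amount to a keyed lookup. The sketch below assumes that blocks are stored as `(position, text)` pairs with character offsets as positions; `BlockStore` and both method names are hypothetical, and the `containing` method, which accepts any offset inside a block rather than the exact tag position, goes beyond what the abstract describes.

```python
from bisect import bisect_right
from typing import Optional


class BlockStore:
    """Hypothetical store of split blocks, ordered by tag position."""

    def __init__(self, blocks: list[tuple[int, str]]):
        self._blocks = sorted(blocks)                   # sort by tag offset
        self._starts = [pos for pos, _ in self._blocks]

    def by_tag_position(self, pos: int) -> Optional[str]:
        """Exact lookup: the user reports the tag's offset in the HTML text."""
        i = bisect_right(self._starts, pos) - 1
        if i >= 0 and self._starts[i] == pos:
            return self._blocks[i][1]
        return None

    def containing(self, offset: int) -> Optional[str]:
        """Looser lookup (an assumption): block whose tag starts at or before offset."""
        i = bisect_right(self._starts, offset) - 1
        return self._blocks[i][1] if i >= 0 else None


store = BlockStore([(35, "View"), (0, "147")])
print(store.by_tag_position(35))  # user-reported tag position
print(store.containing(40))       # any offset inside the second block
```

For exact positions a plain dictionary would serve equally well; the sorted list with binary search is used here only so that the looser `containing` lookup also runs in logarithmic time.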

Description

Technical Field

[0001] The invention relates to the Internet, and in particular to a method for extracting the Web information required by a user from a Web page.

Background Art

[0002] Hypertext Markup Language (HTML) is the text markup language currently used on the WWW. HTML uses a series of tags that enable Web browsers to structure Web pages.

[0003] For example, the following piece of HTML text, when displayed in the IE browser, appears as shown in Figure 1.

[0004] <font color=red>147 <font color=red>734 Learn <A href="http://www.cnplayer.com/upload/2006/2/13/200621323483592551238218.torrent" target=_blank> CPA2005 learning materials-accounting, economic law, tax law, ISO classic materials <a href="http://bbs.fkee.com/" target="_blank"> Related discussion <A href="http://www.cnplayer.com/bt/study/210591.htm" target="_blank">View <font color=red> 1354M

[0005] In the above HTML text,,, Such symbols ar...
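To show how tags, rather than content, give such a page its structure, the sketch below feeds a fragment modeled on the example HTML text of the background section to Python's standard `html.parser.HTMLParser` and records start tags and text in document order. The fragment is a hand-simplified illustration with closing tags added, not the patent's exact text, since parts of the original markup were lost; the `TagLister` class is hypothetical.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Records start tags and non-blank text in document order."""

    def __init__(self):
        super().__init__()
        self.events = []  # ("tag", name) or ("text", data) tuples

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase; attrs is ignored here.
        self.events.append(("tag", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            self.events.append(("text", data.strip()))


fragment = ('<font color=red>147</font> '
            '<A href="http://www.cnplayer.com/bt/study/210591.htm" '
            'target="_blank">View</A> <font color=red>1354M</font>')
parser = TagLister()
parser.feed(fragment)
parser.close()
print(parser.events)
```

Because `HTMLParser` reports tag names in lowercase, `<A ...>` and `<a ...>` are treated alike, which matters when the same tag is written in both cases, as in the example above.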

Application Information

IPC(8): H04L12/58; G06F17/30; G06F17/00; G06F40/00; G06F40/143
CPC: G06F17/2247; G06F17/227; G06F17/30896; G06F16/986; G06F40/154; G06F40/143
Inventor: 程凯
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH