Webpage theme information extraction method

A method for extracting topic information from web pages, applied in the network field, that addresses problems such as inconvenience in finding resources, lost search goals, and loss of confidence in search engines.

Inactive Publication Date: 2014-06-04
DALIAN LINGDONG TECH DEV

AI Technical Summary

Problems solved by technology

As network resources grow richer and the volume of network information keeps expanding, people depend on the network more and more, yet it becomes harder for service objects to quickly find the specific resources they need among the vast resources of the Internet.
Extracting useful resources from massive amounts of information is an urgent problem. The main information expressed by a Web page is usually hidden in a large amount of irrelevant structure and text, which prevents users from quickly obtaining topic information and limits the usability of Web pages; service objects tend to lose track of their goals when querying information, or get biased results.
As a result, when browsing search results, many service objects spend a lot of time and energy on pages unrelated to what they searched for, which makes them lose confidence in search engines and leads to a loss of service objects.

Embodiment Construction

[0060] The present invention is a method for extracting web page topic information, and its specific implementation includes the following steps:

[0061] A. Topic information extraction method. In this application background, the method mainly studies the HTML-structure-based approach mentioned above. In the field of Web information retrieval, the relevance of the retrieval results and the speed of retrieval are two indicators for evaluating a Web retrieval system. If the noise content in the original web page is not removed, the retrieval system also builds its index on the noise content, so a web page may be returned as a result merely because the query term appears in its noise content, even though the topic content of the page may be completely irrelevant to that query term. To address this problem,
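The indexing problem described above can be illustrated with a minimal sketch. It is not the patented method: the two sample pages, the split of each page into "topic" and "noise" text, and the function names build_index and search are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample pages; the topic/noise split is assumed to be known here.
PAGES = {
    "p1": {"topic": "tree based web page block extraction",
           "noise": "home login weather sports"},
    "p2": {"topic": "local weather forecast for the week",
           "noise": "home login contact"},
}


def build_index(texts):
    """Build a simple inverted index: word -> set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in texts.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(query, with_noise):
    """Return the ids of pages matching a one-word query.

    with_noise=True indexes topic and noise text together;
    with_noise=False indexes the topic text only.
    """
    texts = {
        pid: (p["topic"] + " " + p["noise"]) if with_noise else p["topic"]
        for pid, p in PAGES.items()
    }
    return sorted(build_index(texts).get(query.lower(), set()))


print(search("weather", with_noise=True))   # ['p1', 'p2']
print(search("weather", with_noise=False))  # ['p2']
```

When the noise text is indexed, page p1 is returned for "weather" only because the word appears in its navigation bar; indexing the topic text alone removes that false match.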

[0062] Divide the page into blocks according to its layout tags. The block nodes determine the granularity of the blocks. TABLE...
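A minimal sketch of block division by layout tags, using Python's standard html.parser module. Only TABLE is named in the text above; the use of TD as a finer block node, the class name BlockSplitter, and the function split_into_blocks are assumptions made for illustration, not the patent's own procedure.

```python
from html.parser import HTMLParser


class BlockSplitter(HTMLParser):
    """Group page text under the innermost open block tag."""

    def __init__(self, block_tags):
        super().__init__()
        self.block_tags = set(block_tags)
        self.blocks = []  # one entry per block node, in document order
        self.stack = []   # indices into self.blocks for currently open blocks

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in self.block_tags:
            self.blocks.append({"tag": tag, "parts": []})
            self.stack.append(len(self.blocks) - 1)

    def handle_endtag(self, tag):
        if self.stack and self.blocks[self.stack[-1]]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.blocks[self.stack[-1]]["parts"].append(text)


def split_into_blocks(html, block_tags):
    """Return (tag, text) pairs for every block node that holds text."""
    parser = BlockSplitter(block_tags)
    parser.feed(html)
    parser.close()
    return [(b["tag"], " ".join(b["parts"])) for b in parser.blocks if b["parts"]]


SAMPLE = "<table><tr><td>Site menu</td><td>Main article text</td></tr></table>"

print(split_into_blocks(SAMPLE, {"table"}))
# [('table', 'Site menu Main article text')]
print(split_into_blocks(SAMPLE, {"table", "td"}))
# [('td', 'Site menu'), ('td', 'Main article text')]
```

The choice of block nodes sets the granularity: with TABLE alone the whole table is one block, while adding TD as a block node separates the menu cell from the article cell.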


Abstract

The invention discloses a method for extracting webpage topic information. The method comprises the following steps: representing individual service problems with a tree-shaped structural representation of information; logically representing the structured problems; and solving the individual problems in order. Because the tree-shaped structural representation is used to describe individual service problems, problems from various fields and modes are represented with three basic elements (the service content element, the service object element, and other elements), and each individual service problem is divided into a basic element layer, a basic information layer, and a sub-information layer. In this way the individual service problems of information systems in most fields are structured, and recommendation rules for individual service can be set. The method uses a weighted search solving method in which the reasoning result related to the customer's current purchase history has the largest weight and the weights of later results decrease in order of the purchase sequence, so that new resources of interest to the service object are recommended.

Description

Technical field

[0001] The invention relates to network technology, in particular to a method for extracting web page topic information.

Background technique

[0002] With the popularization of the Internet and the development of information technology, a large number of information resources have been formed. As network resources grow richer and the volume of network information keeps expanding, people depend on the network more and more, yet it becomes harder for service objects to quickly find the specific resources they need among the vast resources of the Internet. Extracting useful resources from massive amounts of information is an urgent problem. The main information expressed by a Web page is usually hidden in a large amount of irrelevant structure and text, which prevents users from quickly obtaining topic information and limits the usability of Web pages; clients tend to lose their g...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 郑世超, 刘立堂
Owner: DALIAN LINGDONG TECH DEV