Webpage information analysis method and device

A webpage information analysis method, applied in the field of webpage information analysis, that addresses problems such as the inability to handle complex pages and a low parsing success rate.

Active Publication Date: 2014-05-14
BEIJING QIHOO TECH CO LTD
Cites: 5; Cited by: 11

AI Technical Summary

Problems solved by technology

When extracting product page information from e-commerce websites, traditional shopping search generally maintains only one set of templates. Faced with complex pages, this approach is powerless: it can successfully parse only some of the products, so the parsing success rate is relatively low.

Embodiment Construction

[0038] Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present disclosure and to fully convey its scope to those skilled in the art.

[0039] As shown in Figure 1, the webpage information parsing method of the present invention includes the following steps:

[0040] Step S110: before parsing starts, the URL of the webpage to be parsed must first be obtained. Because not every webpage URL can be parsed, whether the page can be parsed must be judged from its URL. What is of interest is naturally the product features,...
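The text of Step S110 is truncated, but the Abstract states that parseability is judged by first matching the URL's domain against an analyzable domain name set and then matching the URL against that domain's URL feature set. The sketch below illustrates that two-level lookup under stated assumptions: the domains, feature names, and regular-expression representation of a "URL feature" are all hypothetical, as is the choice to accept subdomains of an analyzable domain.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical configuration (illustrative only, not from the patent): each
# analyzable domain maps to its URL feature set, and each URL feature is a
# regular expression recognizing one type of goods URL on that site.
URL_FEATURE_SETS: Dict[str, Dict[str, re.Pattern]] = {
    "shop.example.com": {
        "item_page": re.compile(r"^/item/\d+\.html$"),
        "detail_page": re.compile(r"^/detail\?id=\d+$"),
    },
    "mall.example.org": {
        "product_page": re.compile(r"^/p/[A-Za-z0-9]+$"),
    },
}


def match_domain(host: str) -> Optional[str]:
    """Return the analyzable domain that `host` equals or is a subdomain of."""
    for domain in URL_FEATURE_SETS:
        # Assumption: subdomains such as www.shop.example.com also qualify.
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def match_url_feature(url: str) -> Optional[Tuple[str, str]]:
    """Return (domain, URL feature name) if the URL is parseable, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    domain = match_domain(host)
    if domain is None:
        return None  # domain not in the analyzable set, so not parseable
    # Match path (plus query string, if any) against that domain's features.
    target = parts.path + ("?" + parts.query if parts.query else "")
    for name, pattern in URL_FEATURE_SETS[domain].items():
        if pattern.match(target):
            return domain, name
    return None  # known domain, but no URL feature recognizes this URL
```

For example, `match_url_feature("https://www.shop.example.com/detail?id=7")` locates the `shop.example.com` feature set and returns `("shop.example.com", "detail_page")`, while a cart page on the same site returns `None`. Keeping a separate feature set per domain means one site can expose several differently shaped goods URLs without affecting other sites.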

Abstract

The invention provides a webpage information analysis method comprising the steps of: matching the URL (uniform resource locator) of a webpage to be analyzed against the domain names in an analyzable domain name set; locating the corresponding URL feature set according to the successfully matched domain name; matching the URL of the webpage against the URL features in that URL feature set; locating the corresponding goods template set according to the successfully matched URL feature; matching the webpage against the goods templates in that goods template set; analyzing the webpage according to the successfully matched goods template; and feeding back the analysis result. The invention correspondingly provides a webpage information analysis device. With this method and device, the various types of goods URLs on a website can be identified accurately, and different types of goods templates are used to match and identify different types of goods URLs, so that as much of the goods information on a webpage as possible can be analyzed.
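The final stage of the Abstract (match the page against a goods template set, parse with the template that matched, feed back the result) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `GoodsTemplate` structure, its "signature" markers, and the regex-based field extractors are hypothetical stand-ins, and a production parser would more likely use DOM or XPath queries than regular expressions over raw HTML.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GoodsTemplate:
    """Hypothetical goods template: page signature plus field extractors."""
    name: str
    signature: List[str]  # substrings that must all occur in the page
    extractors: Dict[str, re.Pattern] = field(default_factory=dict)

    def matches(self, html: str) -> bool:
        # The template "matches" when every signature marker is present.
        return all(marker in html for marker in self.signature)

    def extract(self, html: str) -> Dict[str, str]:
        # Apply each field extractor; capture group 1 holds the value.
        result = {}
        for key, pattern in self.extractors.items():
            m = pattern.search(html)
            if m:
                result[key] = m.group(1).strip()
        return result


def parse_with_template_set(html: str, templates: List[GoodsTemplate]) -> Optional[dict]:
    """Try templates in order; parse with the first whose signature matches."""
    for template in templates:
        if template.matches(html):
            return {"template": template.name, "goods": template.extract(html)}
    return None  # no template in this set fits the page


# Hypothetical template set located via one URL feature: two page layouts
# (ordinary item page, promotion page) for the same type of goods URL.
TEMPLATE_SET = [
    GoodsTemplate(
        name="single_item",
        signature=['class="item-title"', 'class="price"'],
        extractors={
            "title": re.compile(r'class="item-title">([^<]+)<'),
            "price": re.compile(r'class="price">([^<]+)<'),
        },
    ),
    GoodsTemplate(
        name="promotion_item",
        signature=['id="promo-box"'],
        extractors={
            "title": re.compile(r"<h1[^>]*>([^<]+)</h1>"),
            "price": re.compile(r'data-promo-price="([^"]+)"'),
        },
    ),
]
```

Because each URL feature points at its own template set, a page whose layout defeats one template can still be parsed by another in the same set, which is the stated advantage over maintaining a single template set per site.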

Description

Technical field

[0001] The invention relates to webpage information analysis technology, in particular to an information analysis and extraction method, and a corresponding device, for use when the website addresses of webpages are diverse and the information on those webpages is diverse.

Background art

[0002] With the continuous development of e-commerce websites, and in order to display product information better and impress shoppers, website pages are becoming more and more complicated. This poses considerable challenges for shopping search engines that extract product information from these websites. First, the URL of a product page may take various forms; second, the product page information may also be presented in various forms. When traditional shopping search extracts product page information from e-commerce websites, it generally maintains only one set of templates. When encountering such a complex pa...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30; G06Q30/00
CPC: G06F16/958
Inventors: 周雷 (Zhou Lei), 高扬 (Gao Yang), 姜鑫 (Jiang Xin), 曹晴 (Cao Qing), 牛杏媛 (Niu Xingyuan)
Owner: BEIJING QIHOO TECH CO LTD