Web site product detailed information classification crawling and product information base establishing method

A method for constructing a product information base, applied in the field of Internet web crawlers, which addresses the low crawling efficiency of traditional crawlers.

Active Publication Date: 2014-07-16
CHONGQING UNIV OF POSTS & TELECOMM
4 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

In the entire process of data mining, web crawlers play an important role. The web pages captured by web crawlers are the data sources for big data analysis. These data will directly affect the accuracy of data mining. However, traditional web

Embodiment Construction

[0035] With the widespread use of Internet technology in the 21st century and the explosive growth of information, people have entered the era of big data. Web sites offer a wide variety of products, so crawling and analyzing a site's product classification sub-pages and building a product information database from them is an important step in data mining. For Web sites whose products carry product codes, the way the product information database is built directly affects the accuracy of subsequent data mining on the site's products. To address these problems, the present invention proposes a method that crawls the detailed product information of a Web site by classification to obtain the source files of the product classification sub-pages, then extracts the product details and classification information from those files and builds a product information database.

[0036] The present invention consists of two parts: crawling the sub-pages of each classification, and establishing the product information database.
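The second part, the product information base, can be pictured as a table that maps each product id to its category id and other details. The following is only a minimal sketch under stated assumptions: the table name `product`, the columns `name` and `price`, and the helper names `build_product_base` and `category_of` are hypothetical and do not come from the patent text, which does not specify a storage schema here.

```python
import sqlite3


def build_product_base(records):
    """Create an in-memory product table and load (product_id, category_id, name, price) rows.

    The schema is an assumption made for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        "product_id TEXT PRIMARY KEY, "
        "category_id TEXT NOT NULL, "
        "name TEXT, "
        "price REAL)"
    )
    # A product crawled twice keeps only its latest row.
    conn.executemany("INSERT OR REPLACE INTO product VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def category_of(conn, product_id):
    """Look up the category id mapped to a product id, or None if unknown."""
    row = conn.execute(
        "SELECT category_id FROM product WHERE product_id = ?", (product_id,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical records, as they might be extracted from crawled sub-pages.
conn = build_product_base([
    ("p1", "c10", "Phone", 199.0),
    ("p2", "c20", "Lamp", 15.5),
])
```

Keying the table on the product id reflects the stated focus on sites whose products carry product codes; any other detail fields would simply be extra columns.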

[0037] Below in...

Abstract

The invention discloses a web page crawling method for obtaining Web site product information. First, the home pages of the site's first-level product classifications are crawled, and the links to the next-level classification home pages are obtained by analyzing the crawled source files. Crawling then proceeds level by level until all classification home pages of the site have been crawled. By analyzing the source files of all classification sub-pages, the page-turning elements and the number of pages in each classification are obtained, the classification sub-page links are generated, and the classification sub-pages are crawled from those links. Meanwhile, by analyzing the crawled source files of the product classification sub-pages, the detailed information and classification information of the products are extracted, the mapping between site product id, classification id, and other detailed information is established, and a product information base is built.
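The level-by-level crawl and the generation of sub-page links described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the `subcat` link class, the `data-total-pages` attribute, the `?page=N` URL pattern, and reading the page count from the classification home page are all hypothetical, and `fetch` is an injected function so the sketch runs on in-memory pages instead of a real site.

```python
import re
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect href values of <a> tags that carry a given class attribute."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == self.css_class and attrs.get("href"):
            self.links.append(attrs["href"])


def extract_links(html, css_class):
    parser = LinkParser(css_class)
    parser.feed(html)
    return parser.links


def crawl_category_homes(fetch, first_level_urls):
    """Crawl classification home pages breadth-first, one level after another."""
    seen, queue, pages = set(), deque(first_level_urls), {}
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        pages[url] = fetch(url)
        # Links to next-level classification home pages (assumed class "subcat").
        queue.extend(extract_links(pages[url], "subcat"))
    return pages


def sub_page_links(url, html):
    """Generate sub-page links from the page count in the page-turning element."""
    match = re.search(r'data-total-pages="(\d+)"', html)
    total = int(match.group(1)) if match else 1
    return [f"{url}?page={n}" for n in range(1, total + 1)]


# Hypothetical two-level site held in memory.
SITE = {
    "/c/1": '<a class="subcat" href="/c/1-1">Phones</a>'
            '<div class="pager" data-total-pages="2"></div>',
    "/c/1-1": '<div class="pager" data-total-pages="3"></div>',
}
pages = crawl_category_homes(SITE.__getitem__, ["/c/1"])
links = sub_page_links("/c/1-1", pages["/c/1-1"])
```

Generating the sub-page links from a page count, rather than following "next page" links one at a time, means every sub-page URL of a classification is known before any of them is fetched.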

Description

Technical field

[0001] The invention relates to the field of Internet web crawlers. For Web sites whose products carry product numbers, a web crawler is used to establish the mapping between product ids, category ids, and other detailed information.

Background technique

[0002] With the rapid development of Internet technology, the continuous enrichment of Web site products, and people's growing awareness of the value of information, demand has grown for mining useful information from the massive product information of Web sites. Accurately classifying massive product information and establishing a product information database is an important basis for mining useful information. In the entire process of data mining, web crawlers play an important role. The web pages captured by web crawlers are the data sources for big data analysis, and these data directly affect the accuracy of data mining. Comprehensive crawling of page information, such a l...

Application Information

IPC(8): G06F17/30
CPC: G06F16/958
Inventor: 雒江涛, 申健, 杨军超, 刘勇, 高伟, 邓生雄, 王小平
Owner CHONGQING UNIV OF POSTS & TELECOMM