Method and device for extracting POI (Point of Interest) data from webpages

A webpage data extraction technology, applied in the Internet field, that achieves accurate extraction

Active Publication Date: 2015-04-08
BEIJING QIHOO TECH CO LTD
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, webpages containing POI data are buried among a vast number of other webpages: out of hundreds of billions of webpages, they account for less than one percent. The prior art offers no sufficiently accurate and fast method for extracting POI data.

Examples

Embodiment Construction

[0045] Specific implementations of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following examples illustrate the present invention but are not intended to limit its scope.

[0046] Figure 1 is a flow chart of the steps of a method for extracting POI data from webpages according to an embodiment of the present invention. Referring to Figure 1, the method includes:

[0047] S101: Obtain multiple webpages containing POI data;

[0048] It should be noted that the webpages can be obtained in various ways. In this embodiment, multiple webpages containing POI data are obtained according to the URLs of preset target websites. Other methods can also be used; this embodiment places no restriction on them.

[0049] S102: Perform address pattern clustering on the webpage according to the URL address of each webpage, so as to obtain multiple ...

Abstract

The invention discloses a method and a device for extracting POI (Point of Interest) data from webpages, and relates to the technical field of POI extraction. The method comprises the following steps: acquiring a plurality of webpages containing POI data; clustering the webpages by address pattern according to the URL (Uniform Resource Locator) of each webpage; ranking the resulting address patterns by the number of webpages corresponding to each pattern; selecting the N address patterns with the largest numbers of webpages; and extracting the POI data contained in the webpages corresponding to each of those N address patterns. With this method and device, large numbers of webpages containing POI data can be identified more quickly among hundreds of billions of webpages, and the POI data can be extracted from them more accurately.
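The clustering, ranking, and top-N selection steps of the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the source does not specify how an address pattern is derived from a URL, so the rule in `url_pattern` (keep the host, replace any path segment containing a digit with `*`) is hypothetical, as are the function names and the sample URLs.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit


def url_pattern(url: str) -> str:
    """Map a URL to a coarse address pattern.

    Hypothetical rule (not taken from the patent): keep the host and
    replace every path segment that contains a digit with '*'.
    """
    parts = urlsplit(url)
    segments = [
        "*" if re.search(r"\d", seg) else seg
        for seg in parts.path.split("/")
        if seg  # skip empty segments produced by leading or doubled slashes
    ]
    return parts.netloc + "/" + "/".join(segments)


def top_patterns(urls, n):
    """Cluster URLs by address pattern and return the n largest clusters.

    Returns a list of (pattern, [urls]) pairs, largest cluster first.
    """
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_pattern(url)].append(url)
    # Rank patterns by the number of webpages that fall under each one.
    ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    return ranked[:n]


# Illustrative input only; example.com is a placeholder site.
sample_urls = [
    "http://example.com/shop/123",
    "http://example.com/shop/456",
    "http://example.com/shop/789",
    "http://example.com/about",
    "http://example.com/contact",
]
best = top_patterns(sample_urls, 1)
```

Here `best` holds the single most populated pattern together with its URLs. The POI extraction step itself would then run only on the pages in the selected clusters; it is omitted because the source text does not describe it.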

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a method and device for extracting POI data from webpages.

Background

[0002] POI is the abbreviation of "Point of Interest". The data in each POI generally includes information such as name, category, longitude, latitude, nearby hotels, restaurants and shops, and can be used as a location identifier in an electronic map. Map search requires a large amount of POI data as a search source, and POI data is mainly obtained through purchase, cooperation or self-construction.

[0003] The Internet contains a great deal of geographic location information that can be used as POI data. For example, a company gives its address and contact information on its homepage, and a food website gives the specific location and ordering phone number of each restaurant. Webpages like these, which contain POI data, provide a rich source of data for map search.

[0004] However, these ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/335, G06F16/35, G06F16/9535
Inventor: 魏少俊
Owner: BEIJING QIHOO TECH CO LTD