
A Strategy for Vertical Crawler Data Classification Integration

A data classification and data integration technology, applied in the field of vertical search engines. It addresses unstructured effective information, the lack of personalized optimization of returned results, and a low level of data processing, so that users can avoid multiple searches.

Inactive Publication Date: 2018-01-05
XIAMEN UNIV
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0007] Existing general-purpose search strategies aim to obtain as much data as possible, but their level of data processing is relatively low. The prominent problems are: too much invalid information (noisy data), insufficient effective information, unstructured effective information, and no mechanism for personalized optimization of returned results.



Embodiment Construction

[0030] The present invention will be further described below through specific embodiments.

[0031] Referring to Figure 1, a strategy for the classification and integration of vertical crawler data comprises two parts: a classification system with its mapping mechanism, and a dynamic classification data integration mechanism. The classification system and its mapping mechanism include the following steps:

[0032] 1) Construction of the benchmark category system;

[0033] 2) Construction of the category system of the crawler target website;

[0034] 3) Construction of the category system mapping mechanism.

[0035] Step 1). The benchmark category system is the category system of the integrating website itself. It serves as the benchmark with which the category systems of other websites are aligned. It can be constructed as a three-level category hierarchy, and its architecture is shown in Table 1 below. There are four main attributes: category ID, large categ...
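The relationship between a benchmark category system and a target website's categories can be sketched as a lookup table. This is only an illustrative sketch, not the patent's implementation: the source text is truncated after "category ID, large categ...", so the `medium` and `small` fields, the site name `site_a`, the category strings, and the function `map_to_benchmark` are all hypothetical names chosen to reflect a three-level hierarchy.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class BenchmarkCategory:
    """One row of a benchmark category table (field names are assumptions)."""
    category_id: int
    large: str    # first-level category
    medium: str   # hypothetical second-level category
    small: str    # hypothetical third-level category


# Hypothetical benchmark category system, keyed by category ID.
BENCHMARK: Dict[int, BenchmarkCategory] = {
    1001: BenchmarkCategory(1001, "Electronics", "Phones", "Smartphones"),
    1002: BenchmarkCategory(1002, "Electronics", "Phones", "Feature phones"),
}

# Hypothetical mapping table: (site, source category path) -> benchmark ID.
# The source category string is stored unchanged, so the target website's
# own category system stays intact alongside the benchmark one.
CATEGORY_MAP: Dict[Tuple[str, str], int] = {
    ("site_a", "Mobile > Smart Phones"): 1001,
    ("site_a", "Mobile > Basic Phones"): 1002,
}


def map_to_benchmark(site: str, source_category: str) -> Optional[BenchmarkCategory]:
    """Return the benchmark category for a crawled item, or None if unmapped."""
    category_id = CATEGORY_MAP.get((site, source_category))
    if category_id is None:
        return None  # unmapped source category; left for later review
    return BENCHMARK[category_id]


if __name__ == "__main__":
    print(map_to_benchmark("site_a", "Mobile > Smart Phones"))
    print(map_to_benchmark("site_a", "Tablets"))
```

Keeping the mapping in a separate table, rather than rewriting crawled records, is one way to align several websites with a single benchmark while leaving each source classification untouched.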


Abstract

The invention provides a strategy for the classification and integration of vertical crawler data. The strategy comprises two parts: a classification system with its mapping mechanism, and a dynamic classification data integration mechanism. The classification system and its mapping mechanism involve the following steps: (1) building a benchmark category system; (2) building the category system of the crawler target website; and (3) building a category system mapping mechanism. The advantages of the strategy are that the category systems obtained after vertical crawler data capture and unstructured data parsing can be effectively integrated, the completeness of the source classification system is maintained, and the source classification system can be tracked dynamically.
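The abstract does not say how dynamic tracking of the source classification system is carried out. One simple possibility, offered here purely as an assumption, is to compare the set of category names seen on a target website in two successive crawls. The function name `diff_category_snapshots` and the sample category strings are hypothetical.

```python
from typing import Dict, List, Set


def diff_category_snapshots(previous: Set[str], current: Set[str]) -> Dict[str, List[str]]:
    """Report source categories that appeared or disappeared between two crawls."""
    return {
        "added": sorted(current - previous),    # new on the target website
        "removed": sorted(previous - current),  # no longer present
    }


if __name__ == "__main__":
    earlier = {"Mobile > Smart Phones", "Mobile > Basic Phones"}
    later = {"Mobile > Smart Phones", "Mobile > Foldables"}
    print(diff_category_snapshots(earlier, later))
```

Categories reported as added would still need an entry in the category mapping before their items could be aligned with the benchmark system.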

Description

Technical field

[0001] The invention relates to the technical field of vertical search engines, in particular to a strategy for classification and integration of vertical crawler data.

Background technique

[0002] With the explosive growth of webpage information, the use value of search engines is getting higher and higher, becoming an indispensable tool for network users, providing users with information navigation and query services. It integrates numerous webpage resources on the Internet, provides relevant webpages according to keywords inquired by users, and sorts them according to their relevance. It is the entrance of the entire Internet. At present, the comprehensive search engine is the main force to provide users with query services, but its comprehensiveness determines that it cannot meet the professional people's demand for precise information services in specialized fields. Users' demand for information is diversified, so the service model of search engines wi...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 吴梅红, 洪志令
Owner: XIAMEN UNIV