Classified site mining method and device and searching method and system

A site mining method and device, applied in the fields of website content management, network data indexing, and network data retrieval. It addresses the cumbersome keyword-determination process, low accuracy, and time- and labor-intensive operation of earlier approaches, and achieves accurate classification of site search results, high recall and precision, and simpler operation.

Status: Inactive; Publication date: 2015-04-22
BEIJING QIHOO TECH CO LTD +1
Cites: 1; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] The above scheme has the following defects:
1. Keyword matching requires a large number of keywords to be determined. Determining them is tedious and requires manual participation, and their completeness cannot be guaranteed.
2. Keyword matching suffers from synonymy: the same meaning may be expressed in different ways, so matching performs poorly and accuracy is low.
3. The webpage keyword-matching procedure is relatively complicated and requires extensive computation over all webpages, which makes it time-consuming, labor-intensive, and hard to implement in practice.

Embodiment Construction

[0043] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope will be fully conveyed to those skilled in the art.

[0044] Figure 1 shows a flow chart of a method for mining classified sites according to an embodiment of the present invention. As shown in Figure 1, the method includes:

[0045] Step S110, for a category of sites to be mined, determine one or more basic sites belonging to the category.

[0046] Step S120, extract the webpage content of the basic sites, and mine the links to other sites that are recommended and/or referenced in ...
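Step S120 can be illustrated with a short sketch. It is not the patented implementation: it assumes that every `<a href>` on a page of a basic site counts as a recommendation or reference, and that "another site" means a different host name. The names `LinkCollector` and `external_site_links` are hypothetical; only the Python standard library (`html.parser`, `urllib.parse`) is used.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects <a href> targets, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links such as "/about".
                    self.links.append(urljoin(self.base_url, value))


def external_site_links(page_url: str, html: str) -> set[str]:
    """Return the host names of other sites linked from one basic-site page.

    Assumption: a "site" is identified by its host name.
    """
    own_host = urlparse(page_url).hostname
    collector = LinkCollector(page_url)
    collector.feed(html)
    hosts = set()
    for link in collector.links:
        parsed = urlparse(link)
        # Keep only web links that leave the current site.
        if parsed.scheme in ("http", "https") and parsed.hostname and parsed.hostname != own_host:
            hosts.add(parsed.hostname)
    return hosts


if __name__ == "__main__":
    page = (
        '<p>Recommended: <a href="https://news.example.org/it">IT news</a>, '
        '<a href="/about">About us</a>, <a href="http://dev.example.net">Dev</a></p>'
    )
    print(sorted(external_site_links("https://tech.example.com/index.html", page)))
    # prints ['dev.example.net', 'news.example.org']
```

Internal links and non-web schemes such as `mailto:` are discarded, so only candidate sites for the category remain.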

Abstract

The invention discloses a classified site mining method and device and a search method and system. The classified site mining method includes the steps of: for a category of sites to be mined, determining one or more basic sites belonging to the category; extracting the webpage content of the basic sites, and mining the links to other sites that are recommended and/or quoted in that content; and adding the links of one or more of the other sites to a site set of the category. The technical scheme uses the recommendation and quotation relationships of the basic sites to mine other sites belonging to the same category as the basic sites, thereby obtaining the site set of the category. The classified site mining method and device and the search method and system are simple and intuitive in principle, overcome the complicated operation and poor practicability of the prior art, and are more convenient and effective, with higher recall and accuracy. Users can therefore obtain fast, effective, and accurate search results for classified sites, and their search requirements are met.
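The three steps of the abstract, plus one possible search-side use of the resulting site set, can be sketched as follows. This is a minimal illustration under stated assumptions: the outbound links of each basic site are taken as already extracted into a dictionary, the `min_refs` vote threshold is a hypothetical filter not described in the source, and `classified_search` is a hypothetical example of how a search system might use the category site set.

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import urlparse


def mine_category_sites(
    basic_sites: set[str],
    outlinks: dict[str, set[str]],
    min_refs: int = 1,
) -> set[str]:
    """Build a category's site set from its basic sites.

    basic_sites: sites already known to belong to the category (step 1).
    outlinks: other sites recommended or quoted by each basic site (step 2).
    min_refs: hypothetical threshold; a candidate must be linked from at
        least this many distinct basic sites.
    """
    votes = Counter()
    for site in basic_sites:
        # Ignore self-links; count each basic site's vote once per candidate.
        for other in outlinks.get(site, set()) - {site}:
            votes[other] += 1
    # Step 3: add the mined sites to the category's site set.
    mined = {site for site, n in votes.items() if n >= min_refs}
    return set(basic_sites) | mined


def classified_search(results: list[str], category_sites: set[str]) -> list[str]:
    """Hypothetical filter: keep result URLs hosted on the category's sites."""
    return [url for url in results if urlparse(url).hostname in category_sites]


if __name__ == "__main__":
    basic = {"tech.example.com", "dev.example.net"}
    links = {
        "tech.example.com": {"news.example.org", "dev.example.net", "shop.example.biz"},
        "dev.example.net": {"news.example.org"},
    }
    it_sites = mine_category_sites(basic, links, min_refs=2)
    print(sorted(it_sites))
    # prints ['dev.example.net', 'news.example.org', 'tech.example.com']
    hits = ["https://news.example.org/a", "https://shop.example.biz/b", "https://tech.example.com/c"]
    print(classified_search(hits, it_sites))
    # prints ['https://news.example.org/a', 'https://tech.example.com/c']
```

With `min_refs=1` the sketch adds every linked site, which is closest to the abstract's wording; raising the threshold is one way to trade recall for precision when a basic site links to off-topic sites.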

Description

Technical Field

[0001] The invention relates to the technical field of data mining, and in particular to a method and device for mining classified sites and to a search method and system.

Background

[0002] When a search engine needs to index specific types of Internet resources, the sites usually have to be classified first. For example, to collect IT technology resources, some IT technology sites are first mined, and a specific strategy is then adopted to crawl the resource pages on those sites according to their characteristics.

[0003] In the prior art, mining specific types of sites first requires classifying the content of the sites, which is the first step of the collection work. At present, sites are generally classified by keyword matching: a set of keywords is defined first, and each webpage is then checked for whether it contains these keywords. The type of all web p...
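The prior-art keyword matching described in [0003] can be sketched in a few lines, which also makes the synonym problem from [0004] concrete. This is a simplified illustration, not any specific system: tokenization by `\w+`, lowercase keywords, and "one keyword hit assigns the category" are all assumptions, and `keyword_classify` is a hypothetical name.

```python
from __future__ import annotations

import re


def keyword_classify(page_text: str, category_keywords: dict[str, set[str]]) -> set[str]:
    """Return every category with at least one keyword present in the page."""
    # Lowercase and split into word tokens; punctuation is dropped.
    words = set(re.findall(r"\w+", page_text.lower()))
    # Assumption: a single keyword hit is enough to assign the category.
    return {cat for cat, kws in category_keywords.items() if words & kws}


if __name__ == "__main__":
    keywords = {"it": {"programming", "software", "compiler"}}
    print(keyword_classify("A tutorial on software testing.", keywords))
    # prints {'it'}
    print(keyword_classify("Notes on writing code in Rust.", keywords))
    # prints set(), because "code" is not in the keyword list
```

The second page is clearly IT content, but it is missed because it uses a word outside the manually chosen list, and every page must still be scanned against every keyword set.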

Application Information

IPC(8): G06F17/30
CPC: G06F16/972, G06F16/951, G06F16/958
Inventors: 王智广 (Wang Zhiguang), 魏少俊 (Wei Shaojun)
Owner: BEIJING QIHOO TECH CO LTD