Domain-oriented network information search method

A network information search technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses problems such as the poor timeliness of web page information

Inactive Publication Date: 2013-04-17
BEIJING INFORMATION SCI & TECH UNIV

AI Technical Summary

Problems solved by technology

[0002] With the rapid growth of webpage information, the total number of webpages has exceeded 3.5 billion at present, and it is increasing at a rate of one million every day. This will cause the timeliness of webpage in




Embodiment Construction

[0049] The application examples of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0050] The present invention takes the salt lake industry as an example and develops a domain-oriented network information search method. The specific process is as follows:

[0051] In the first step, domain experts provide a curated collection of domain websites. These websites serve as the initial URLs of the spider and as training web pages for classifier training. In this case, the domain website collection was compiled from a survey questionnaire; see Figure 4:

[0052] It can be seen that salt lake chemical workers favor chemical industry websites and CNKI (HowNet) literature websites. These websites can be used as initial URLs and placed in the queue of URLs to be crawled, supplying URLs for subsequent crawler collection. Here are some related URLs:

[0053] China Chemical Network: http://china.chemnet.com/

[00...
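The seeding step described in [0051] and [0052] can be illustrated with a minimal Python sketch of a queue of URLs to be crawled. This is an illustration under stated assumptions, not the patent's implementation: the class name `UrlFrontier`, the duplicate check, and the rule that only hosts of the seed URLs are accepted are all hypothetical choices made here. The only URL taken from the source is China Chemical Network.

```python
from collections import deque
from urllib.parse import urldefrag, urlsplit


class UrlFrontier:
    """FIFO queue of URLs to crawl that never accepts the same URL twice."""

    def __init__(self, seed_urls):
        self._queue = deque()
        self._seen = set()
        # Assumption: the hosts of the expert-provided seeds define the
        # domain website set, and links to other hosts are ignored.
        self.allowed_hosts = {urlsplit(u).netloc.lower() for u in seed_urls}
        for url in seed_urls:
            self.add(url)

    def add(self, url):
        """Enqueue a URL if it is new and belongs to a seed host."""
        url, _fragment = urldefrag(url.strip())  # drop "#..." anchors
        if urlsplit(url).netloc.lower() not in self.allowed_hosts:
            return False
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def pop(self):
        """Return the next URL to crawl, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


# Seed with the one URL listed in the source text.
frontier = UrlFrontier(["http://china.chemnet.com/"])
frontier.add("http://china.chemnet.com/#top")  # same page once the fragment is dropped
frontier.add("http://example.org/page")        # not a seed host, so rejected
```

A crawler loop would call `pop()` to fetch the next page and `add()` for each link found on it, so the seed list supplied by the domain experts bounds where the spider can go.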



Abstract

The invention provides a domain-oriented network information search method. The method unifies data on a single platform, searches information from multiple data sources, and supports structured, semi-structured and unstructured data formats. The method, which is a network information acquisition method, includes the following steps: domain experts specify a set of domain websites, submit keywords according to domain characteristics, and create a domain keyword library; an information acquisition strategy is formulated from link and content analysis, and target web pages within the domain websites are then acquired; and the acquired web page information is extracted, filtered and classified, a database is created, and the information is stored in an inverted index. The method is implemented by a web page acquisition spider module, a classifier training module and a data index module. The method is highly adaptable and topic-relevant, and a vertical search engine built around it achieves high recall and precision.
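As a rough illustration of the filtering and indexing steps named in the abstract, the Python sketch below keeps pages that match a domain keyword library and stores them in an inverted index. It is a toy under stated assumptions: the keyword set, the `min_hits` threshold, the sample pages, and all function names are hypothetical, and the simple keyword filter stands in for the trained classifier, whose details are not given in this text.

```python
import re
from collections import defaultdict

# Hypothetical keyword library; the source does not list its keywords.
DOMAIN_KEYWORDS = {"salt", "lake", "potash", "lithium"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_relevant(text, keywords=DOMAIN_KEYWORDS, min_hits=2):
    """Crude topic filter: keep a page if enough distinct keywords occur."""
    return len(set(tokenize(text)) & keywords) >= min_hits


def build_inverted_index(pages):
    """Map each term to the sorted list of relevant page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        if not is_relevant(text):
            continue  # filtered out as off-topic
        for term in tokenize(text):
            index[term].add(page_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of pages containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Made-up sample pages for illustration only.
pages = {
    1: "Potash production at a salt lake",
    2: "Lithium extraction from salt lake brine",
    3: "Football results from last weekend",
}
index = build_inverted_index(pages)
```

With this sample data, `search(index, "salt lake")` returns the two on-topic pages, while the off-topic page never enters the index. The design point is that lookups cost one dictionary access per query term rather than a scan of every stored page.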

Description

Technical field

[0001] The invention is a domain-oriented network information search method and relates to technologies such as improved topic crawler collection strategies and web page content extraction and classification.

Background technique

[0002] With the rapid growth of web page information, the total number of web pages has now exceeded 3.5 billion and is increasing at a rate of one million per day. As a result, the web page information indexed by general search engines has poor timeliness, and it is difficult to meet the needs of different professional users. The rapid development of the Internet poses a huge challenge to web information search, and domain-oriented vertical search engines emerged in response.

[0003] Search engines based on topic web crawlers (that is, fourth-generation search engines) have become a hot research direction for current search engines. Vertical search engines focus on a specif...


Application Information

IPC(8): G06F17/30
Inventor 张健冯飞胡亮齐林张小栓徐晓莉邢晓辉魏宗洋王楠甘露刘菁
Owner BEIJING INFORMATION SCI & TECH UNIV