Distributed searching method based on cloud computing

A distributed search method based on cloud computing, applied in the field of distributed search. It addresses the large amount of time and storage space consumed by massive network data, the fact that users understand word meanings differently, and the inability to accurately capture the user's search intent, so as to achieve more accurate and richer search results.

Inactive Publication Date: 2014-03-05
TONGJI UNIV

AI Technical Summary

Problems solved by technology

[0006] (1) Massive network data: network information is vast in volume and wide in coverage, so computing over and storing it consumes a great deal of time and storage space.
[0007] (2) User differences: users have different background knowledge and therefore understand word meanings differently, so different users mean different things by the same search term.
[0008] (3) Time dependence of retrieval: the same request issued by a user at different periods or stages still returns exactly the same results, so the search does not adapt to the user.
[0009] (4) Expression of search terms: because users lack domain knowledge and the query interfaces of search engines are limited, the user's search intent cannot be captured accurately.

Examples

Embodiment Construction

[0042] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0043] As shown in Figure 1, the present invention provides a distributed search method based on cloud computing, comprising the following steps:

[0044] Step (1): crawl network files in various formats, including HTML, PPT, EXCEL and PDF files, with distributed web crawlers;

[0045] Step (2): parse the files fetched by the crawler through distributed parallel extraction into a custom document table format, extracting the text, title, author and other relevant information;

[0046] Specifically, the format is: URL + title + parsing time + author + source + text + pr value + category + link.

[0047] Here, url is the webpage link; title is the title of the webpage; parsing time is the date of parsing; author is the author of the webpage, with the initial value "unknown"; source refers to the source of the ...
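The document table format in [0046] can be pictured as a simple record type. The following is a minimal sketch, not the patent's implementation: the class name `DocumentRecord`, the Python field names, the field types, and every default other than the author value "unknown" are assumptions made for illustration. In particular, the source text does not define "pr value" or "link" in the portion shown, so treating them as a numeric score and a list of outgoing links is a guess.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class DocumentRecord:
    """One row of the custom document table (hypothetical field names)."""

    url: str                  # webpage link
    title: str                # title of the webpage
    parsing_time: date        # date on which the page was parsed
    author: str = "unknown"   # initial value stated in the text
    source: str = ""          # assumption: empty until a source is known
    text: str = ""            # extracted body text
    pr_value: float = 0.0     # assumption: a numeric ranking score
    category: str = ""        # assumption: empty until classified
    link: List[str] = field(default_factory=list)  # assumption: outgoing links


# Build one record and list its fields in the order given in [0046].
record = DocumentRecord(
    url="http://example.com/page",
    title="Example page",
    parsing_time=date(2014, 3, 5),
)
field_order = list(asdict(record).keys())
```

Keeping the fields in the same order as the source makes it straightforward to serialize a record as one row of the document table, whatever storage backend is used.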

Abstract

The invention discloses a distributed search method based on cloud computing. The method comprises the following steps: network files in various formats are crawled by a distributed web crawler; the crawled files are parsed by distributed parallel extraction into a custom document table format; the extracted document content is stored in a distributed database to build a document table database; an index table, also in a custom format, is built from the document table database using parallel computing; the index files are imported into an index database, which supplies index data to the searcher; and the search results are ordered using PageRank and an optimized online ranking algorithm. Because the method uses distributed storage and computing together with an improved and optimized ranking algorithm, the search results are more accurate; because it uses semantic keyword expansion, the search results are richer.
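The indexing and ranking steps named in the abstract can be illustrated on a single machine. The sketch below is an assumption-laden toy: it builds a plain inverted index and runs textbook power-iteration PageRank, not the patent's custom index table format or its optimized online ranking algorithm, and it is neither distributed nor parallel. The function names `build_index`, `pagerank` and `search`, the damping factor 0.85, and the sample documents are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase, whitespace-separated term to the ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Textbook power-iteration PageRank; pages without outlinks spread rank uniformly."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared by every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def search(term: str, index: Dict[str, Set[str]], rank: Dict[str, float]) -> List[str]:
    """Return matching document ids, highest PageRank first, ties broken by id."""
    hits = index.get(term.lower(), set())
    return sorted(hits, key=lambda d: (-rank.get(d, 0.0), d))


# Hypothetical sample data: three documents and the links between them.
docs = {"a": "cloud search", "b": "cloud computing", "c": "distributed search"}
links = {"a": ["b"], "b": ["c"], "c": ["b"]}
index = build_index(docs)
rank = pagerank(links)
results = search("cloud", index, rank)
```

In this sample, document b is linked from both a and c, so it outranks a for the query "cloud". A production system following the abstract would shard both the index and the rank computation across nodes.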

Description

Technical field

[0001] The invention relates to a distributed search method, in particular to a distributed search method based on cloud computing for fast retrieval over large data.

Background

[0002] With the rapid development of the Internet, the World Wide Web (WWW) has become a huge information space that provides users with valuable information resources. Faced with so many information resources, browsing through them step by step in a browser is very inconvenient. How to obtain the required information from the WWW quickly and accurately has become a crucial issue. The emergence of search engines has greatly improved people's ability to collect information. However, existing search engines still have problems and difficulties in terms of search efficiency, information maintenance, information repetition, network and sites, and load.

[0003] At present, most search engines are centralized in terms of architecture. That i...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/951, G06F16/2471
Inventor: 向阳陈佑雄张依杨平宇张波袁书寒
Owner: TONGJI UNIV