Solr webpage sorting optimization method based on big data
A ranking optimization method using big data technology, applied in the fields of information retrieval and big data, that addresses two gaps: the lack of research on web page ranking, and the failure to account for the topic drift defect of the PageRank algorithm.
Examples
Embodiment 1
[0047] As shown in Figures 1-5.
[0048] A big data-based method for optimizing Solr web page ranking, comprising the following steps:
[0049] 1) Build a Solr search engine comprising an information extraction module, a storage module, an index module and a retrieval module; the information extraction module crawls and parses web pages from the Internet through Nutch; the storage module stores the crawled and parsed web pages in the database; the index module passes the file information in the database to the indexing tool in the Solr search engine and builds an index; the retrieval module responds to the user's query request and displays the results to the user through the Browse interface that comes with the Solr search engine;
[0050] 2) Web page importance calculation: the weight distribution of the PageRank algorithm is improved through KMeans clustering, yielding the Cluster-PageRank algorithm;
[0051] 2-1) Extract the we...
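The clustering named in step 2) can be illustrated with a minimal KMeans sketch. It assumes each page has already been reduced to a numeric feature vector; the feature extraction of step 2-1) is truncated in the text above, so the toy vectors, the function name `kmeans`, and the choice of initial centroids are all hypothetical and only serve to show the assignment and update steps of the standard algorithm.

```python
from math import dist


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's-algorithm KMeans. Returns (labels, centroids)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


# Hypothetical 2-D feature vectors for four pages (two apparent topics).
page_vectors = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
labels, centroids = kmeans(page_vectors, init_centroids=[page_vectors[0], page_vectors[2]])
print(labels)
```

The resulting labels group the first two pages together and the last two together; such labels are one plausible way to decide which linking pages count as topic-related in the weighting described in the later embodiments.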
Embodiment 2
[0065] The big data-based Solr web page ranking optimization method is as described in Embodiment 1, except that the formula of the Cluster-PageRank algorithm in step 2) is:
[0066] PR(A) = (1 - m) + m [ α × Σ_{P_i ∈ W_1} PR(P_i) / L(i) + β × Σ_{P_j ∈ ...
Embodiment 3
[0069] The big data-based Solr web page ranking optimization method is as described in Embodiment 2, except that α > β. The invention holds that linking pages related to the topic should receive higher weights, while linking pages unrelated to the topic should receive lower weights. This prevents the topic drift caused by topic-unrelated pages having excessively high PageRank values, so α should be greater than β.
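The effect of choosing α > β can be illustrated with a small iterative sketch. Several things here are assumptions, not statements from the patent: the second sum of the formula is truncated in the text, so the code assumes it mirrors the first sum over the linking pages that are not topic-related; a linking page is treated as topic-related when it carries the same cluster label as the target page; L(i) is taken to be the out-link count of page i; and the values m = 0.85, α = 0.7, β = 0.3 and the function name `cluster_pagerank` are hypothetical.

```python
def cluster_pagerank(links, cluster, m=0.85, alpha=0.7, beta=0.3, iters=50):
    """Iterate PR(A) = (1 - m) + m * (alpha * related + beta * unrelated).

    links:   dict mapping each page to the list of pages it links to
    cluster: dict mapping each page to its cluster label (assumed given)
    """
    pages = list(links)
    # Build the reverse map: which pages link to each page.
    inlinks = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            inlinks[dst].append(src)

    pr = {p: 1.0 for p in pages}
    for _ in range(iters):
        new_pr = {}
        for a in pages:
            # Contribution of linking pages in the same cluster as `a`.
            related = sum(pr[p] / len(links[p])
                          for p in inlinks[a] if cluster[p] == cluster[a])
            # Contribution of linking pages in a different cluster (assumed form).
            unrelated = sum(pr[p] / len(links[p])
                            for p in inlinks[a] if cluster[p] != cluster[a])
            new_pr[a] = (1 - m) + m * (alpha * related + beta * unrelated)
        pr = new_pr
    return pr


# Toy graph: S links to both A and B; S and A share a cluster, B does not.
links = {"S": ["A", "B"], "A": [], "B": []}
cluster = {"S": 0, "A": 0, "B": 1}
ranks = cluster_pagerank(links, cluster)
print(ranks["A"] > ranks["B"])
```

In this toy graph A and B each receive exactly one link from the same source, yet A ends with the higher score because its linking page is in its own cluster and is weighted by α rather than β.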