Solr webpage sorting optimization method based on big data

A ranking optimization method using big data technology, applied in the fields of information retrieval and big data, that addresses gaps in earlier work on web page ranking, including the failure to account for the topic drift defect of the PageRank algorithm.

Inactive Publication Date: 2016-07-27
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

However, the above-mentioned patent does not take into account the topic drift defect in the PageRank algorithm and the time...



Examples


Embodiment 1

[0047] As shown in Figures 1-5.

[0048] A method for sorting and optimizing Solr web pages based on big data, comprising the following steps:

[0049] 1) Build a Solr search engine comprising an information extraction module, a storage module, an index module and a retrieval module. The information extraction module crawls and parses web pages from the Internet through Nutch; the storage module stores the crawled and parsed web pages in a database; the index module passes the file information in the database to the indexing tool of the Solr search engine and builds an index; the retrieval module responds to the user's query request and displays the results to the user through the Browse interface that comes with the Solr search engine;

[0050] 2) Web page importance calculation: The weight distribution of the PageRank algorithm is improved through KMeans clustering, and the Cluster-PageRank algorithm is obtained;

[0051] 2-1) Extract the we...

Embodiment 2

[0065] The big data-based Solr web page ranking optimization method is as described in Embodiment 1, except that the formula of the Cluster-PageRank algorithm in step 2) is:

[0066] $PR(A) = (1 - m) + m [ \alpha \times \sum_{P_i \in W_1} \frac{PR(P_i)}{L(i)} + \beta \times \sum_{P_j \in \ldots} \ldots$

Embodiment 3

[0069] The big data-based Solr web page ranking optimization method is as described in Embodiment 2, except that α > β. The invention holds that linking pages related to the topic should receive a higher weight, while linking pages unrelated to the topic should receive a lower weight. This prevents the topic drift that occurs when pages unrelated to the topic obtain an excessively high PageRank value, so α should be greater than β.
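The weighting described in Embodiments 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the formula in [0066] is truncated in this text, so the sketch assumes the second sum runs over linking pages outside the topic cluster of page A, that W 1 is the set of linking pages in the same KMeans cluster as A, and that L(i) is the number of outgoing links of page i, as in standard PageRank. The function name `cluster_pagerank` and the default values of m, α and β are hypothetical.

```python
def cluster_pagerank(links, cluster, m=0.85, alpha=0.7, beta=0.3, iterations=50):
    """Iterate a cluster-weighted PageRank (illustrative sketch only).

    links:   dict mapping each page to the list of pages it links to
    cluster: dict mapping each page to its cluster id (e.g. from KMeans)
    m:       damping factor; alpha/beta weight same-cluster and
             other-cluster in-links (assumed split, alpha > beta)
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Build the inbound-link table from the outbound one.
    inbound = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].append(src)

    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            same = 0.0   # contribution of linking pages in A's cluster
            other = 0.0  # contribution of linking pages in other clusters
            for p in inbound[a]:
                share = pr[p] / len(links[p])  # PR(P) / L(P)
                if cluster[p] == cluster[a]:
                    same += share
                else:
                    other += share
            new_pr[a] = (1 - m) + m * (alpha * same + beta * other)
        pr = new_pr
    return pr


# Page A is linked from B (same cluster); page D is linked from C (other cluster).
links = {"B": ["A"], "C": ["D"]}
cluster = {"A": 0, "B": 0, "C": 0, "D": 1}
ranks = cluster_pagerank(links, cluster)
print(ranks["A"] > ranks["D"])
```

With α > β, a link from a page in the same cluster transfers more rank than an otherwise identical link from a page in another cluster, which is the behavior Embodiment 3 relies on to limit topic drift.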


Abstract

The invention relates to a Solr webpage sorting optimization method based on big data. The method adds a webpage importance degree and a webpage time factor to the existing Solr sorting algorithm by means of Solr's external field ("external domain") concept. It addresses the inaccurate ranking that results because the native Solr sorting algorithm considers only how well the search terms match the webpage text, so that webpages with high text relevance, high authority and greater timeliness are ranked near the top. The improved sorting algorithm works well in the Solr search engine and improves both webpage ranking quality and user experience.
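As a rough illustration of the idea in the abstract, the sketch below combines a text relevance score with an importance value and a time factor, and formats per-document values as `key=value` lines, the layout Solr's ExternalFileField reads. Everything specific here is an assumption: the text shown does not give the combination formula, so the multiplicative boost, the exponential decay, the weights, and the names `time_factor`, `combined_score` and `external_file_lines` are hypothetical, and it is assumed that the "external domain" refers to ExternalFileField.

```python
def time_factor(age_days, half_life_days=365.0):
    """Hypothetical freshness score: 1.0 for a new page, halving every half-life."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(text_score, importance, age_days,
                   w_importance=0.3, w_time=0.2):
    """Boost a text relevance score by importance and freshness.

    The weighted multiplicative form is an assumption for illustration;
    it is not taken from the patent text.
    """
    boost = 1.0 + w_importance * importance + w_time * time_factor(age_days)
    return text_score * boost


def external_file_lines(values):
    """Format {doc_id: value} as sorted 'doc_id=value' lines."""
    return [f"{doc_id}={value:.4f}" for doc_id, value in sorted(values.items())]


# Example: per-document importance values written in key=value form.
importance = {"doc2": 0.5, "doc1": 1.0}
for line in external_file_lines(importance):
    print(line)
```

Keeping importance and time values outside the main index means they could be refreshed without re-indexing the page text, which is one plausible reason for using an external field here.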

Description

Technical field

[0001] The invention relates to a big data-based Solr webpage sorting optimization method, which belongs to the technical field of information retrieval and big data.

Background

[0002] In the era of big data, the amount of information handled by search engines has exploded. The user enters keywords, and the search engine system sorts the search results according to its ranking rules and displays them to the user. A reasonable web page ranking algorithm provides users with better search results and is one of the key technologies of a search engine system.

[0003] Solr is a capable search engine system built on Lucene. It provides a richer query language than Lucene and meets the needs of search engines in data acquisition, parsing, word segmentation, and so on. At present, Solr's ranking rules are inherited from Lucene's text relevance scoring model, which ranks results according to the relevance of the web page text. However, Solr's...


Application Information

IPC(8): G06F17/30
CPC: G06F16/955
Inventor: 袁东风, 张艳, 徐秀珊
Owner: SHANDONG UNIV