A Web Page Ranking Method Based on Random Forest Algorithm

A ranking method based on the random forest algorithm, applied in the field of web page ranking. It addresses the poor search experience and low-quality information of existing approaches, and aims to return better, more targeted, and more accurate search results.

Active Publication Date: 2020-10-02
GUANGDONG KINGPOINT DATA SCI & TECH CO LTD
13 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

With existing approaches, the user's search experience is poor and the returned information is of low quality.

Embodiment Construction

[0041] The above and other technical features and advantages of the present invention will be described in more detail below in conjunction with the accompanying drawings.

[0042] Figure 1 shows a flow chart of the method of the present invention for ranking web pages based on the random forest algorithm. The method comprises the following steps:

[0043] Step S1: obtain the keywords and keyword candidate words corresponding to the search web pages.

[0044] Specifically, statistics-based semantic analysis is performed on the user's search terms to segment them into keywords, and a set number of words similar to the keywords are then retrieved from the thesaurus as keyword candidate words.
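
Step S1 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy `THESAURUS`, its similarity scores, the `STOPWORDS` set, and the function names are hypothetical, and simple tokenization stands in for the statistics-based semantic analysis, which the text does not specify.

```python
import re

# Hypothetical toy thesaurus: keyword -> (similar word, similarity score) pairs.
THESAURUS = {
    "laptop": [("notebook", 0.92), ("computer", 0.85), ("ultrabook", 0.80)],
    "cheap": [("affordable", 0.90), ("budget", 0.88), ("inexpensive", 0.86)],
}

# Hypothetical stopword list used to discard non-keywords.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "in"}


def extract_keywords(query):
    """Split a query into unique lowercase tokens, dropping stopwords.

    This is only a stand-in for the semantic analysis in step S1.
    """
    keywords = []
    for token in re.findall(r"[a-z0-9]+", query.lower()):
        if token not in STOPWORDS and token not in keywords:
            keywords.append(token)
    return keywords


def candidate_words(keywords, thesaurus, limit):
    """Return up to `limit` of the most similar thesaurus words per keyword."""
    result = {}
    for keyword in keywords:
        ranked = sorted(thesaurus.get(keyword, []), key=lambda pair: pair[1], reverse=True)
        result[keyword] = [word for word, _ in ranked[:limit]]
    return result


keywords = extract_keywords("Cheap laptop for the office")
candidates = candidate_words(keywords, THESAURUS, limit=2)
```

The `limit` argument plays the role of the "set number" of similar words; a keyword missing from the thesaurus simply gets an empty candidate list.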

[0045] Step S2: calculate the word frequency and weight of the keywords or keyword candidate words corresponding to the search web pages.

[0046] Specifically, the word frequency of a keyword or keyword candidate word is calculated as:

[0047]

[0048] In the formula, $tf_{i,j}$ is the frequency of ...
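
The formula itself is not reproduced in this text. As a minimal sketch, the code below assumes the conventional definition of term frequency, in which the frequency of word i in page j is its number of occurrences in the page divided by the total number of words in the page. Whether the patent uses exactly this normalization is an assumption, and the function name `keyword_tf` is hypothetical.

```python
from collections import Counter


def keyword_tf(keywords, page_tokens):
    """Conventional term frequency: occurrences of each keyword / total tokens in the page.

    Assumed definition; the patent's own formula is not shown in the source.
    """
    counts = Counter(page_tokens)
    total = len(page_tokens)
    if total == 0:
        return {keyword: 0.0 for keyword in keywords}
    return {keyword: counts[keyword] / total for keyword in keywords}


# A page already split into word tokens (9 tokens in total).
page = "random forest ranks web pages with a random forest".split()
tf = keyword_tf(["random", "web", "missing"], page)
```

A keyword that never appears in the page gets a frequency of 0, and an empty page yields 0 for every keyword rather than a division error.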

Abstract

The invention provides a web page ranking method based on a random forest algorithm. The method includes the steps of: obtaining keywords and keyword candidate words corresponding to search web pages; calculating the word frequencies and weights of the keywords or keyword candidate words corresponding to the search web pages; calculating PR values of quality-related indexes of the search web pages; calculating pivot values and weight values of the search web pages; calculating the relevance between the recently browsed web pages and the search web pages, and the product of the TF-IDF values of the keywords and keyword candidate words of the recently browsed web pages; determining whether an output index is larger than a set threshold, wherein the output index is the product of the number of times a user has browsed the search web pages for longer than a stipulated access time and a certain function of the web page dwell time meeting the conditions; establishing a random forest model and recording the corresponding result; and calculating the final search web page scores and ranking the pages accordingly. Compared with the prior art, the traditional HITS algorithm is improved to a certain extent by means of a random forest method, the user's service experience is improved, and the information returned is better and more accurate.
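
For background on the HITS algorithm that the abstract says is improved, here is a minimal sketch of the textbook HITS iteration. This is not the patent's modified method. The link graph, the function name `hits`, and the iteration count are hypothetical, and mapping the abstract's "pivot values" to HITS hub scores is an assumption.

```python
def hits(links, iterations=50):
    """Textbook HITS on a link graph given as {page: [pages it links to]}.

    Returns (hub, authority) score dicts, each L2-normalized.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    authority = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        authority = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                authority[target] += hub[source]
        norm = sum(v * v for v in authority.values()) ** 0.5 or 1.0
        authority = {page: v / norm for page, v in authority.items()}

        # Hub update: sum the authority scores of all pages linked to.
        hub = {page: sum(authority[t] for t in links.get(page, [])) for page in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {page: v / norm for page, v in hub.items()}

    return hub, authority


# Hypothetical three-page graph: pages "a" and "b" both link to "c".
graph = {"a": ["c"], "b": ["c"], "c": []}
hub, authority = hits(graph)
```

In this toy graph, "c" receives all of the authority weight, while "a" and "b" share the hub weight equally. According to the abstract, the patented method feeds such link-based values, together with the other computed indexes, into a random forest model rather than ranking on them directly.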

Description

Technical field

[0001] The invention relates to the technical field of web page ranking, and in particular to a method for ranking web pages based on a random forest algorithm.

Background

[0002] With the rapid development of computer technology, people have more and faster ways to obtain information, but the explosive growth of information has made it harder to obtain information accurately. Providing users with the information they want, quickly and well, has therefore become very important. Search engines such as Baidu and Google were created to help people quickly and accurately find what they need in the vast ocean of information. An excellent search engine should give users the most important and valuable web page information they need and rank it first, and its service should be simple and user-friendly, so that users can search and, within a short period of time, get satisfactory relevant searc...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/9532, G06F16/9535, G06F40/284, G06N20/00
CPC: G06F16/9535, G06F40/284, G06N20/00
Inventor: 陶波, 许飞月, 陈乐焱, 简宋全
Owner: GUANGDONG KINGPOINT DATA SCI & TECH CO LTD