Stop word mining method and device, search method and device, evaluation method and device

A stop word mining device and technique, applied in the Internet field, that addresses the low accuracy of traditional stop word mining and achieves higher accuracy, shorter browsing time, and reduced storage space.

Active Publication Date: 2019-03-26
SHENZHEN SHI JI GUANG SU INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0004] In view of the low accuracy of traditional stop word mining, it is necessary to provide a stop word mining method that improves accuracy.
[0005] In addition, it is also necessary to provide a stop word mining device that improves accuracy.

Examples


Embodiment Construction

[0055] The stop word mining method and device, and the technical solution of the evaluation method and device for stop word mining algorithms, are described in detail below with reference to specific embodiments and the accompanying drawings.

[0056] As shown in Figure 1, in one embodiment, a stop word mining method comprises the following steps:

[0057] Step S102: acquire query logs.

[0058] Specifically, the query log records the information generated when a user enters a query string to perform a query and when the user triggers a query result. The query log includes the query string, the webpage addresses obtained from the query, the behavior of modifying the query string, the behavior of triggering a webpage address, and the correspondence between query strings and webpage addresses.
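The contents of a query log listed in [0058] can be pictured as a simple record type. The following is only a minimal sketch: the class name `QueryLogEntry`, its field names, and the helper `triggered_pairs` are hypothetical and do not come from the patent text, which does not specify a storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class QueryLogEntry:
    """One hypothetical query-log record; every field name is illustrative."""
    query: str                                               # query string entered by the user
    result_urls: List[str] = field(default_factory=list)     # webpage addresses obtained from the query
    triggered_urls: List[str] = field(default_factory=list)  # addresses the user triggered
    modified_to: Optional[str] = None                        # new query string, if the user modified this one


def triggered_pairs(entries: List[QueryLogEntry]) -> Set[Tuple[str, str]]:
    """Collect the (query string, webpage address) correspondences created by trigger behavior."""
    return {(e.query, url) for e in entries for url in e.triggered_urls}


log = [
    QueryLogEntry(
        query="the weather in beijing",
        result_urls=["http://example.com/a", "http://example.com/b"],
        triggered_urls=["http://example.com/a"],
        modified_to="beijing weather",
    ),
    QueryLogEntry(
        query="beijing weather",
        result_urls=["http://example.com/a"],
        triggered_urls=["http://example.com/a"],
    ),
]

pairs = triggered_pairs(log)
```

Keeping the modification target on the record itself makes it straightforward to derive both the modified-query term sets and the query-to-address correspondences from one pass over the log.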

[0059] Step S104: obtain the inverse document frequency of the query word in...


Abstract

The invention discloses a stop word mining method comprising the following steps: a query log is obtained; at least one kind of attribute information is obtained from among the inverse document frequency of the search terms in the search strings recorded in the query log, the relative term weights of the search terms, the search term sets generated by search-string modification behavior, and the set of correspondences between search strings and web page URLs generated by trigger behavior; and a stop word set is generated according to the attribute information. The invention also provides a stop word mining device, a search method and device, and an evaluation method and device for stop word mining algorithms. The stop word mining method and device increase the accuracy of the mined stop words. The search method and device simplify the original search string by removing stop words, so that more relevant web pages can be found and search accuracy increases. The evaluation method and device evaluate stop word mining algorithms by cross validation, so that the best algorithm can be identified by comparison.
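Of the attributes named in the abstract, inverse document frequency is the easiest to illustrate. The sketch below treats each search string in the log as one document and uses the textbook formula log(N / df); this excerpt does not give the patent's exact formula, so the formula, the whitespace tokenization, the function names, and the threshold value are all assumptions. It covers only one attribute, whereas the abstract allows several to be combined.

```python
import math
from collections import Counter


def inverse_document_frequency(search_strings):
    """Return IDF per term, treating each search string as one document.

    Assumption: terms are separated by whitespace and compared case-insensitively.
    """
    total = len(search_strings)
    doc_freq = Counter()
    for s in search_strings:
        doc_freq.update(set(s.lower().split()))  # count each term once per string
    return {term: math.log(total / count) for term, count in doc_freq.items()}


def low_idf_candidates(search_strings, threshold):
    """Terms whose IDF is at or below a hypothetical threshold are stop word candidates."""
    idf = inverse_document_frequency(search_strings)
    return {term for term, value in idf.items() if value <= threshold}


searches = [
    "the weather in beijing",
    "the price of gold",
    "the best phone",
    "beijing hotels",
]
candidates = low_idf_candidates(searches, threshold=0.5)
```

With these four strings, "the" occurs in three of them and is the only term whose IDF falls below 0.5, so it is the only candidate. On a real log the threshold would have to be tuned, for example with the cross-validation evaluation the abstract mentions.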

Description

Technical field

[0001] The present invention relates to Internet technology, in particular to a stop word mining method and device, a search method and device, and an evaluation method and device for stop word mining algorithms.

Background art

[0002] Stop words are query words that search engines automatically ignore when indexing web pages or processing query requests. Stop words usually appear very frequently and carry no practical meaning, such as "the", "a", "的", "了". Removing such words helps reduce the scale of web search and improve the accuracy of search results.

[0003] There are two traditional ways to mine stop words: manual selection according to a certain standard, and automatic mining from web documents and search engine logs. Manual selection requires a lot of manpower and is inefficient. There are two ways to automatically mine stop words from web documents and search engine logs. One is to use random sampling to genera...
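The effect of ignoring stop words when processing a query request, described in the background above, can be sketched as follows. The function name `simplify_query`, the whitespace tokenization, and the fallback to the original string when every term is a stop word are assumptions made for illustration, not details taken from the patent.

```python
def simplify_query(query, stop_words):
    """Drop stop words from a query string (assumes whitespace-separated terms)."""
    kept = [term for term in query.split() if term.lower() not in stop_words]
    # Assumption: if every term is a stop word, keep the original query
    # rather than searching with an empty string.
    return " ".join(kept) if kept else query


stop_words = {"the", "a", "of"}
simplified = simplify_query("the history of the internet", stop_words)
unchanged = simplify_query("the a", stop_words)
```

Whitespace splitting does not segment Chinese text, so handling stop words such as "的" or "了" would require a word segmenter in place of `split()`.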


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/951, G06F16/953
CPC: G06F16/951
Inventor: 赵耀胡熠刘磊程佳
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH