Method for detecting search engine cheat based on small sample set

A search engine cheating detection technology in the fields of special-purpose data processing applications, instruments, and electric digital data processing. It addresses the high cost of acquiring training samples, improves detection performance, and overcomes the problem of sample imbalance.

Status: Inactive. Publication date: 2009-01-21
INST OF AUTOMATION CHINESE ACAD OF SCI
Cites: 0; Cited by: 21

AI Technical Summary

Problems solved by technology

[0008] Existing machine learning methods require samples that are costly to acquire, and traditional learning algorithms struggle to perform well when the training samples are imbalanced. To address these problems, the present invention aims to reduce the human effort and cost of obtaining samples while still achieving good results on imbalanced data. To this end, it provides a search engine Web cheating detection method based on a small sample set.
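The abstract names an ensemble undersampling strategy as the remedy for imbalance. A minimal sketch of that idea, assuming each site is a numeric feature vector and cheating sites are the minority class: the normal class is repeatedly undersampled to the size of the cheating class, one base learner is trained per balanced subset, and their scores are averaged. The nearest-centroid base learner, the helper names (`NearestCentroid`, `ensemble_undersample`, `predict`), `n_models` and the averaging rule are hypothetical stand-ins, not the patent's own choices.

```python
import random
from statistics import fmean


def centroid(rows):
    # Component-wise mean of equal-length feature vectors.
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Hypothetical toy base learner standing in for the real classifier."""

    def fit(self, cheating, normal):
        self.c_cheat = centroid(cheating)
        self.c_normal = centroid(normal)
        return self

    def score(self, x):
        # 1.0 at the cheating centroid, 0.0 at the normal centroid.
        d_c = sq_dist(x, self.c_cheat)
        d_n = sq_dist(x, self.c_normal)
        total = d_c + d_n
        return 0.5 if total == 0 else d_n / total


def ensemble_undersample(cheating, normal, n_models=5, seed=0):
    # Assumes cheating sites are the minority class.
    rng = random.Random(seed)
    k = min(len(cheating), len(normal))
    models = []
    for _ in range(n_models):
        # Each model sees all minority samples plus an equal-sized random
        # draw (without replacement) from the majority class.
        subset = rng.sample(normal, k)
        models.append(NearestCentroid().fit(cheating, subset))
    return models


def predict(models, x):
    # Ensemble output: average cheating score over all balanced models.
    return fmean(m.score(x) for m in models)


cheating = [[0.9, 0.8], [0.85, 0.9]]
normal = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.05, 0.1], [0.1, 0.05], [0.2, 0.2]]
models = ensemble_undersample(cheating, normal)
print(round(predict(models, [0.9, 0.9]), 3), round(predict(models, [0.1, 0.1]), 3))
```

Because every base learner sees a balanced subset, no single model is dominated by the majority class, and averaging over different random draws keeps more of the majority-class information in play than a single undersample would.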



Detailed Description of the Embodiments

[0026] The present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the described embodiments are only intended to facilitate the understanding of the present invention and do not have any limiting effect on it.

[0027] Because the algorithm involves repeated resampling and iteration, a single-machine implementation should preferably use a processor of at least 2 GHz and at least 1 GB of memory. The method can be implemented in any common programming language.

[0028] The overall process of the Web cheating detection method based on a small sample set proposed by the present invention is shown in Figure 1, and the specific data flow of each step is given in Figures 2, 3 and 4. The preprocessing part (step S1) prepares the data for the entire cheating detection task; step S2 is the iterative expansion process of the...
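Paragraph [0028] breaks off before step S2 is fully described; the abstract says the training set is expanded by iterating classifier-based self-learning together with link-based learning. A minimal sketch of the classifier-based half only, assuming feature vectors and a toy nearest-centroid scorer: in each round the current labeled sets define the model, unlabeled sites scored with high confidence are moved into the training set, and the loop stops when a round adds nothing. The names `self_train` and `score`, the 0.9 confidence cut-off and the stopping rule are hypothetical; the link-based learning step is not shown.

```python
from statistics import fmean


def centroid(rows):
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(x, c_cheat, c_normal):
    # Hypothetical base classifier: relative closeness to the cheating centroid.
    d_c, d_n = sq_dist(x, c_cheat), sq_dist(x, c_normal)
    return 0.5 if d_c + d_n == 0 else d_n / (d_c + d_n)


def self_train(cheating, normal, unlabeled, threshold=0.9, max_rounds=10):
    cheating, normal, pool = list(cheating), list(normal), list(unlabeled)
    for _ in range(max_rounds):
        # Retrain once per round on the current (expanded) training set.
        c_cheat, c_normal = centroid(cheating), centroid(normal)
        new_cheat, new_normal, remaining = [], [], []
        for x in pool:
            s = score(x, c_cheat, c_normal)
            if s >= threshold:
                new_cheat.append(x)
            elif s <= 1 - threshold:
                new_normal.append(x)
            else:
                remaining.append(x)  # not confident enough; retry next round
        if not new_cheat and not new_normal:
            break  # converged: nothing new could be labeled
        cheating += new_cheat
        normal += new_normal
        pool = remaining
    return cheating, normal, pool


c, n, left = self_train([[0.9, 0.9]], [[0.1, 0.1]],
                        [[0.85, 0.95], [0.15, 0.05], [0.5, 0.5]])
print(len(c), len(n), left)  # → 2 2 [[0.5, 0.5]]
```

The two clear-cut sites are absorbed in the first round, while the ambiguous site at (0.5, 0.5) stays unlabeled; only confident predictions are allowed to grow the training set, which limits the errors that self-training can feed back into itself.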


Abstract

The invention relates to Internet information retrieval and discloses a method for detecting Web cheating based on a small sample set, in order to combat search engine cheating, which is becoming increasingly severe. Because detection samples are costly to collect, the invention iteratively executes classifier-based self-learning together with a link-learning process based on the Internet's topological structure, continuously expanding the training set so that search engine cheating can be detected from a small sample set. An ensemble undersampling strategy is adopted in the identification process, making full use of the information contained in the many high-reputation websites that exist widely on the Internet. Finally, label propagation along the Internet's topological structure, based on the predicted degree of cheating, is carried out to optimize the detection results. Experiments show that the method can effectively detect cheating behavior.
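The final step, label propagation along the Web's topological structure based on the predicted degree of cheating, can be sketched as follows. This is an assumption-laden illustration rather than the patent's update rule: hyperlinks are treated as undirected neighbor relations, each site's score is repeatedly set to a mix of its own classifier prediction and the mean score of its neighbors (weighted by `alpha`), and sites with known labels stay clamped. The function name `propagate`, `alpha`, the iteration count and the clamping are hypothetical choices.

```python
def propagate(links, prior, seeds=None, alpha=0.5, iters=20):
    """Smooth predicted cheating degrees over the hyperlink graph.

    links: dict site -> list of sites it links to
    prior: dict site -> classifier's predicted cheating degree in [0, 1]
           (must cover every site that appears in links)
    seeds: dict site -> known label (1.0 cheating, 0.0 normal), kept fixed
    """
    seeds = seeds or {}
    # Assumption: a link in either direction makes two sites neighbors.
    nbrs = {v: set() for v in prior}
    for u, outs in links.items():
        for v in outs:
            nbrs[u].add(v)
            nbrs[v].add(u)
    score = {v: seeds.get(v, p) for v, p in prior.items()}
    for _ in range(iters):
        new = {}
        for v in prior:
            if v in seeds:
                new[v] = seeds[v]  # labeled sites are not changed
            elif nbrs[v]:
                avg = sum(score[u] for u in nbrs[v]) / len(nbrs[v])
                new[v] = alpha * prior[v] + (1 - alpha) * avg
            else:
                new[v] = prior[v]  # isolated site keeps its own prediction
        score = new
    return score


links = {"s1": ["x"], "s2": ["x"], "x": []}
prior = {"s1": 0.9, "s2": 0.9, "x": 0.5}
result = propagate(links, prior, seeds={"s1": 1.0, "s2": 1.0})
print(result["x"])  # 0.75
```

An ambiguous site (prior 0.5) whose only neighbors are known cheating sites is pulled up to 0.75, which is the kind of correction that propagating predictions along the link structure is meant to provide.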

Description

Technical field

[0001] The invention relates to the technical field of information retrieval and search engines, and provides a method for detecting search engine cheating from small samples.

Background art

[0002] The Internet, the largest information repository ever, is still growing exponentially, and Internet search has become part of people's daily life. The CNNIC report released in July 2006 stated that search engines ranked first among the Internet services most frequently used by netizens, at 66.3%.

[0003] N. Eiron and colleagues ranked 100 million web pages with the well-known PageRank algorithm and found that 11 of the top 20 websites were pornographic websites, which had obtained their high rankings by tampering with hyperlinks. According to a survey by the US Bureau of Business Investigation, e-commerce sales in the United States reached US$114.1 billion in 2006, an increase of 22.7% from US$93 billion in 2005. In 2007, the first quarter...
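The Eiron observation rests on how PageRank rewards in-links. A minimal sketch of the textbook power iteration (damping factor 0.85, a conventional value assumed here) shows the effect on a toy graph: page `t` receives extra links from a hypothetical five-page link farm, while page `n` has only the same single honest in-link, so `t` ends up ranked above `n`. The graph and page names are illustrative only and do not reproduce the cited experiment.

```python
def pagerank(links, d=0.85, iters=100):
    """Power-iteration PageRank; links maps page -> list of pages it links to."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += d * rank[u] / len(outs)
        rank = new
    return rank


links = {"A": ["B", "t", "n"], "B": ["C"], "C": ["A"], "t": ["A"], "n": ["A"]}
# Hypothetical link farm: five throwaway pages that all point at t.
links.update({f"farm{i}": ["t"] for i in range(5)})
r = pagerank(links)
print(f"t={r['t']:.3f} n={r['n']:.3f}")
```

Each farm page contributes only its baseline share of rank, but because nothing links to `n` except `A`, those small contributions are enough to lift `t` above an otherwise comparable page, which is the manipulation that link-based detection has to undo.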


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30
Inventors: 耿光刚 (Geng Guanggang), 王春恒 (Wang Chunheng), 戴汝为 (Dai Ruwei), 李秋丹 (Li Qiudan), 朱远平 (Zhu Yuanping)
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI