Method for detecting search engine cheat based on small sample set

Search engine cheating detection technology, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the high cost of acquiring samples, improves detection performance, and overcomes the problem of sample imbalance.

Status: Inactive. Publication date: 2011-09-07
INST OF AUTOMATION CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0008] The samples required by existing machine learning methods are costly to acquire, and traditional learning algorithms struggle to perform well when learning from imbalanced samples. To solve these problems, the present invention aims to reduce the human effort and cost of obtaining samples while still achieving good results on imbalanced data. To this end, the present invention provides a method for detecting search engine Web cheating based on a small sample set.
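A common way to reduce labeling effort is self-training: a classifier trained on a few labeled hosts pseudo-labels the unlabeled hosts it is most confident about and retrains on the enlarged set. The sketch below illustrates only that general idea, under stated assumptions: each host is a numeric feature vector, label 1 means spam and 0 means normal, and a nearest-centroid classifier is a hypothetical stand-in for the base classifier. The names `self_train`, `margin` and `max_rounds` are illustrative and do not come from the patent.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def self_train(labeled: Dict[str, Tuple[Vector, int]],
               unlabeled: Dict[str, Vector],
               margin: float = 0.5,
               max_rounds: int = 10) -> Dict[str, Tuple[Vector, int]]:
    """Grow a small labeled set (1 = spam, 0 = normal) by self-training.

    Hypothetical base learner: nearest centroid. A host is pseudo-labeled only
    when its distances to the two class centroids differ by at least `margin`.
    """
    labeled = dict(labeled)
    pool = dict(unlabeled)
    for _ in range(max_rounds):
        spam = [v for v, y in labeled.values() if y == 1]
        normal = [v for v, y in labeled.values() if y == 0]
        if not spam or not normal or not pool:
            break
        c_spam, c_normal = centroid(spam), centroid(normal)
        added = {}
        for host, v in pool.items():
            d_spam, d_normal = math.dist(v, c_spam), math.dist(v, c_normal)
            if abs(d_spam - d_normal) >= margin:  # keep confident predictions only
                added[host] = (v, 1 if d_spam < d_normal else 0)
        if not added:
            break  # nothing confident enough: stop expanding
        labeled.update(added)
        for host in added:
            del pool[host]
    return labeled


seed = {"a": ((0.9, 0.8), 1), "b": ((0.1, 0.2), 0)}
pool = {"c": (0.85, 0.9), "d": (0.15, 0.1), "e": (0.5, 0.5)}
grown = self_train(seed, pool)
```

In this toy run, host `c` is pseudo-labeled spam and `d` normal, while the ambiguous host `e` is left unlabeled. The confidence margin trades coverage against the risk of feeding wrong pseudo-labels back into training.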

Detailed Description of Embodiments

[0026] The present invention will be described in detail below in conjunction with the accompanying drawings. It should be noted that the described embodiments are only intended to facilitate the understanding of the present invention, rather than limiting it in any way.

[0027] The algorithm involves multiple resampling and iterative processes. For a single-machine implementation of the method, the processor clock frequency should therefore preferably be at least 2 GHz, with at least 1 GB of memory. The method can be written in any common programming language.
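The repeated resampling mentioned here matches the integrated (ensemble) down-sampling strategy named in the abstract. The sketch below shows one generic form of ensemble down-sampling, under the assumption that spam is the minority class: each ensemble member sees all spam samples plus an equally sized random subset of normal samples, and member scores are averaged. The nearest-centroid scorer and the names `train_balanced_ensemble`, `spam_score` and `n_models` are hypothetical choices for illustration, not the patent's exact procedure.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Tuple[Vector, Vector]  # (spam centroid, normal centroid)


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty sequence of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train_balanced_ensemble(spam: List[Vector], normal: List[Vector],
                            n_models: int = 5, seed: int = 0) -> List[Model]:
    """Each member is trained on all spam plus a same-sized normal subset."""
    rng = random.Random(seed)
    k = min(len(spam), len(normal))
    c_spam = centroid(spam)
    models = []
    for _ in range(n_models):
        subset = rng.sample(normal, k)  # down-sample the majority class
        models.append((c_spam, centroid(subset)))
    return models


def spam_score(models: List[Model], x: Vector) -> float:
    """Average member score in [0, 1]; larger means more spam-like."""
    scores = []
    for c_spam, c_normal in models:
        d_s, d_n = math.dist(x, c_spam), math.dist(x, c_normal)
        scores.append(0.5 if d_s + d_n == 0 else d_n / (d_s + d_n))
    return sum(scores) / len(scores)


spam = [(0.9, 0.9), (0.8, 0.95)]
normal = [(0.1, 0.1), (0.2, 0.1), (0.15, 0.2), (0.05, 0.05), (0.1, 0.3), (0.2, 0.2)]
models = train_balanced_ensemble(spam, normal, n_models=5)
```

Because no member ever sees more normal than spam samples, no single model is swamped by the majority class, while the ensemble as a whole still draws on many different normal samples.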

[0028] The overall process of the proposed Web cheating detection method based on a small sample set is shown in Figure 1, and the data flow of each step is given in Figures 2, 3 and 4. The preprocessing part (step S1) prepares the data for the entire cheating detection task; step S2 is the iterat...
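The description of step S2 is cut off in this excerpt. According to the abstract, the training set is expanded by alternating classifier-based self-learning with link-based learning over the Web's topological structure, exploiting widely present, highly reputable sites. The sketch below illustrates only a hypothetical link-based expansion step. The labeling rules are assumptions chosen for illustration and are not taken from the patent. A host linked to by several known-normal hosts and by no known spam host is labeled normal. A host with several out-links to known spam hosts is labeled spam. The names `link_expand` and `min_votes` are illustrative.

```python
from typing import Dict, List


def link_expand(out_links: Dict[str, List[str]], labels: Dict[str, int],
                min_votes: int = 2, max_rounds: int = 10) -> Dict[str, int]:
    """Hypothetical link-based training-set expansion (1 = spam, 0 = normal).

    Assumed rules (illustrative, not from the patent):
      * >= min_votes out-links to known spam hosts -> spam (link-farm behaviour);
      * >= min_votes in-links from known normal hosts and none from spam -> normal.
    """
    labels = dict(labels)
    hosts = set(out_links) | {t for ts in out_links.values() for t in ts}
    in_links: Dict[str, List[str]] = {h: [] for h in hosts}
    for src, targets in out_links.items():
        for t in targets:
            in_links[t].append(src)
    for _ in range(max_rounds):
        new = {}
        for h in hosts - set(labels):
            from_normal = sum(1 for s in in_links[h] if labels.get(s) == 0)
            from_spam = sum(1 for s in in_links[h] if labels.get(s) == 1)
            to_spam = sum(1 for t in out_links.get(h, []) if labels.get(t) == 1)
            if to_spam >= min_votes:
                new[h] = 1
            elif from_normal >= min_votes and from_spam == 0:
                new[h] = 0
        if not new:
            break  # no host met either rule this round
        labels.update(new)  # each round only reads labels fixed in earlier rounds
    return labels


out_links = {
    "edu1": ["news", "blog"], "edu2": ["news", "blog"],
    "farm1": ["farm2", "farm3"], "farm2": ["farm1", "farm3"], "farm3": ["farm1", "farm2"],
}
seed = {"edu1": 0, "edu2": 0, "farm1": 1, "farm2": 1}
expanded = link_expand(out_links, seed)
```

In the full loop described in the abstract, hosts labeled this way would be added to the training set before the next round of classifier self-learning.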

Abstract

The invention relates to Internet information retrieval and discloses a method for detecting Web cheating on the basis of a small sample set, to combat increasingly severe search engine cheating. Because detection samples are costly to collect, the invention iteratively alternates classifier-based self-learning with a link-based learning process over the Internet's topological structure to continuously expand the training set, so that search engine cheating can be detected from a small sample set. An integrated (ensemble) down-sampling strategy is adopted in the identification process, and the information contained in the many widely present, highly reputable websites on the Internet is fully exploited. Finally, labels are propagated along the Internet topological structure according to the predicted cheating degree to optimize the detection results. Experiments show that the method can effectively detect cheating behavior.
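The final refinement step, label propagation based on predicted cheating degree, can be illustrated with a generic score-smoothing scheme. This is a sketch under assumptions, not the patent's exact rule. Each host starts from its predicted spam degree s0 in [0, 1] and is repeatedly pulled toward the mean score of its link neighbours (links treated as undirected), using s(h) = alpha * s0(h) + (1 - alpha) * mean of s(n) over neighbours n. The function name `propagate_scores` and the parameters `alpha` and `n_iter` are hypothetical.

```python
from typing import Dict, List, Set


def propagate_scores(out_links: Dict[str, List[str]], score: Dict[str, float],
                     alpha: float = 0.7, n_iter: int = 20) -> Dict[str, float]:
    """Smooth predicted spam degrees over the undirected link graph.

    s_new(h) = alpha * s0(h) + (1 - alpha) * mean(s(n) for n in neighbours(h)).
    Hosts without neighbours keep their original prediction; links to hosts
    missing from `score` are ignored.
    """
    nbrs: Dict[str, Set[str]] = {h: set() for h in score}
    for src, targets in out_links.items():
        for t in targets:
            if src in nbrs and t in nbrs and src != t:
                nbrs[src].add(t)
                nbrs[t].add(src)
    s = dict(score)
    for _ in range(n_iter):
        s = {h: (alpha * score[h]
                 + (1 - alpha) * sum(s[n] for n in nbrs[h]) / len(nbrs[h]))
             if nbrs[h] else score[h]
             for h in score}
    return s


links = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
predicted = {"a": 0.9, "b": 0.9, "c": 0.45, "d": 0.1}
refined = propagate_scores(links, predicted)
flagged = {h for h, v in refined.items() if v > 0.5}  # hypothetical 0.5 threshold
```

Here the borderline host `c` (0.45), embedded in a tightly linked group of likely spam hosts, converges to about 0.57 and crosses the threshold, while the isolated host `d` keeps its score. Since `alpha > 0`, each update is a contraction, so the iteration converges.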

Description

Technical field

[0001] The invention relates to the technical fields of information retrieval and search engines, and specifically to a method for detecting search engine cheating when only a small number of samples is available.

Background art

[0002] As the largest repository of information ever created, the Internet is still growing exponentially, and Internet search has become part of people's daily lives. According to a report released by CNNIC in July 2006, search engines accounted for 66.3% of the Internet services that users used most frequently.

[0003] Researchers including N. Eiron used the well-known PageRank algorithm to rank 100 million web pages and found that 11 of the top 20 websites were pornographic websites that had obtained their high rankings by manipulating hyperlinks. According to a survey by the US Business Research Bureau, e-commerce sales in the United States reached 114.1 billion US dollars in 2006, an increase of 22.7% compared with 93 billion US dollars in 2...
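To see why hyperlink manipulation pays off under PageRank, the sketch below runs plain power-iteration PageRank on a toy graph before and after a link farm is attached to one page. This is a textbook PageRank with damping factor 0.85, where pages with no out-links spread their rank uniformly. The graph, the page names, and the function `pagerank` are illustrative assumptions and are not taken from the cited study.

```python
from typing import Dict, List


def pagerank(out_links: Dict[str, List[str]], d: float = 0.85,
             n_iter: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank; dangling pages spread their rank uniformly."""
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(n_iter):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            for t in targets:
                new[t] += d * rank[v] / len(targets)
        rank = new
    return rank


base = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": []}
farmed = dict(base)
farm_pages = [f"farm{i}" for i in range(10)]
farmed["spam"] = farm_pages  # target page links back to its farm
for p in farm_pages:
    farmed[p] = ["spam"]  # every farm page links to the target
before, after = pagerank(base), pagerank(farmed)
```

In this toy example, the target's rank rises from about 0.05 to about 0.37, making it the highest-ranked page, even though no legitimate page links to it.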

Application Information

Patent type & authority: Patent (China)
IPC(8): G06F17/30
Inventors: Geng Guanggang (耿光刚), Wang Chunheng (王春恒), Dai Ruwei (戴汝为), Li Qiudan (李秋丹), Zhu Yuanping (朱远平)
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI