Picture-oriented fraudulent webpage identification method, system, device and medium

A recognition method and webpage technology, applied in computer security devices, character and pattern recognition, network data retrieval and related fields. It addresses problems such as slow speed, inability to obtain valid information and poor detection results, and achieves fast detection and calculation speed together with high precision and recall.

Active Publication Date: 2021-01-12
SHANDONG BITTEL INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] In other words, both the expert system method and the machine learning method have limitations. Both presuppose that the webpage contains relatively rich text. Fraudulent webpages, especially those selling counterfeit drugs, commonly consist of a large number of stacked pictures, with all of the effective information displayed inside the pictures. Existing methods therefore cannot obtain any effective information from such pages, and detection results are unsatisfactory. If OCR is applied to every picture, recognition is slow and the results are poor.


Examples

Embodiment 1

[0046] The method of the present invention for identifying fraudulent webpages that consist mainly of pictures comprises the following steps:

[0047] S100. Collect fraudulent webpages that consist mainly of pictures to construct webpage samples;

[0048] S200. For each fraudulent webpage, extract the tag tree information with a webpage tag tree extraction tool, encode the tag tree as characters, construct a tag tree sequence from the characters corresponding to the tags, and use that sequence as a fraudulent tag tree sequence;
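
Step S200 can be illustrated with a minimal sketch. The patent text shown here does not give the actual character encoding or traversal order, so the `TAG_CODES` table, the `UNKNOWN` placeholder and the choice of document order (start tags as they are encountered) are assumptions made only for illustration; the function name `tag_tree_sequence` is likewise hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical tag-to-character table; the real encoding is not given in the source.
TAG_CODES = {"html": "h", "head": "e", "body": "b", "div": "d",
             "img": "i", "a": "a", "p": "p"}
UNKNOWN = "?"  # assumed placeholder for tags missing from the table


class TagSequenceParser(HTMLParser):
    """Collects one character per start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.codes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        self.codes.append(TAG_CODES.get(tag, UNKNOWN))


def tag_tree_sequence(html_text):
    """Encode the tags of a page as a character string (assumed scheme)."""
    parser = TagSequenceParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.codes)


page = ("<html><body><div><img src='a.jpg'><img src='b.jpg'></div>"
        "<a href='#'>buy</a></body></html>")
print(tag_tree_sequence(page))  # prints "hbdiia"
```

A picture-heavy page yields a sequence dominated by the image character, so structure alone can be compared even when the page contains almost no text.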

[0049] The malicious value corresponding to each fraudulent tag tree sequence is initialized from sample statistics; this malicious value is the malicious value of the malicious keyword;

[0050] S300. Construct a feature library from the fraudulent tag tree sequences together with the update time and malicious value corresponding to each sequence. The update time of the above f...
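
One possible shape for the feature library of step S300 is sketched below: each fraudulent tag tree sequence maps to its update time and malicious value. The class and field names are hypothetical. The `clean` method is an assumption as well: the source only names a feature library cleaning module, and the age-based rule used here is a guess at what such cleaning might look like, not the patent's rule.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FeatureEntry:
    update_time: datetime   # when the sequence was last added or refreshed
    malicious_value: float  # malicious value associated with the sequence


class FeatureLibrary:
    """Maps a fraudulent tag tree sequence to its update time and malicious value."""

    def __init__(self):
        self.entries = {}

    def add(self, sequence, malicious_value, now):
        # Adding an existing sequence overwrites its value and refreshes its time.
        self.entries[sequence] = FeatureEntry(now, malicious_value)

    def clean(self, now, max_age_days):
        """Assumed cleaning rule: drop entries not updated within max_age_days."""
        stale = [seq for seq, entry in self.entries.items()
                 if (now - entry.update_time).days > max_age_days]
        for seq in stale:
            del self.entries[seq]
        return len(stale)


library = FeatureLibrary()
library.add("hbdiia", 0.9, datetime(2021, 1, 1))
library.add("hbp", 0.4, datetime(2021, 3, 1))
removed = library.clean(now=datetime(2021, 3, 15), max_age_days=30)
print(removed, sorted(library.entries))  # prints: 1 ['hbp']
```

Storing the update time next to each sequence is what makes this kind of time-based maintenance possible without rescanning the original samples.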

Embodiment 2

[0062] The system of the present invention for identifying fraudulent webpages that consist mainly of pictures includes a collection module, a tag extraction module, a fraudulent tag tree module, a malicious value initialization module, a feature library initialization module, a preliminary judgment module for the webpage to be tested, a suspected fraudulent webpage judgment module, a fraudulent webpage judgment module and a feature library cleaning module. The system can execute the method disclosed in Embodiment 1.

[0063] The collection module collects fraudulent webpages that consist mainly of pictures to construct webpage samples.

[0064] The tag extraction module extracts the tag tree information with a webpage tag tree extraction tool, encodes the tag tree as characters, and constructs the tag tree sequence from the characters corresponding to the tags; or, it is...

Embodiment 3

[0078] An apparatus according to the present invention includes at least one memory and at least one processor. The at least one memory stores a machine-readable program, and the at least one processor calls the machine-readable program to execute the method disclosed in Embodiment 1.

Abstract

The invention discloses a picture-oriented fraudulent webpage identification method, system, device and medium. It belongs to the technical field of fraudulent webpage identification and aims to overcome the shortcomings of fraud detection on webpages that consist mainly of pictures, so that such webpages can be identified quickly and effectively. The method comprises the following steps: constructing a feature library from the fraudulent tag tree sequences together with the update time and malicious value corresponding to each sequence; calculating the similarity between each tag tree sequence to be tested and the fraudulent tag tree sequences in the feature library and, where the similarity is higher than a threshold, determining that the corresponding webpage to be tested is a suspicious fraudulent webpage; for each suspicious fraudulent webpage, determining it to be a fraudulent webpage if the malicious value of its malicious keyword meets a preset value; and updating the malicious value corresponding to the new fraudulent tag tree sequence, then adding the new sequence with its update time and malicious value to the feature library.
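
The two-stage decision described in the abstract can be sketched as follows. The source does not name the similarity measure or the threshold values, so the use of `difflib.SequenceMatcher`, the defaults `sim_threshold=0.8` and `malice_threshold=0.5`, and the function names are all assumptions. How the malicious keyword value is obtained for a picture-based page is not shown in this excerpt, so it is passed in as a plain number.

```python
from difflib import SequenceMatcher


def best_similarity(candidate, fraud_sequences):
    """Highest similarity between the candidate and any library sequence."""
    return max(
        (SequenceMatcher(None, candidate, seq).ratio() for seq in fraud_sequences),
        default=0.0,  # an empty library matches nothing
    )


def classify(candidate, fraud_sequences, keyword_malice,
             sim_threshold=0.8, malice_threshold=0.5):
    """Two-stage check: structural similarity first, then keyword malice."""
    # Stage 1: the page is suspicious only if its tag tree sequence is
    # more similar to a known fraudulent sequence than the threshold.
    if best_similarity(candidate, fraud_sequences) <= sim_threshold:
        return "not suspicious"
    # Stage 2: a suspicious page is fraudulent if the malicious value of
    # its malicious keyword meets the preset value.
    if keyword_malice >= malice_threshold:
        return "fraudulent"
    return "suspicious"


known = ["hbdiia"]
print(classify("hbdiia", known, keyword_malice=0.9))  # prints "fraudulent"
print(classify("hbdiia", known, keyword_malice=0.1))  # prints "suspicious"
print(classify("ppp", known, keyword_malice=0.9))     # prints "not suspicious"
```

Running the cheap structural comparison first means the keyword check is needed only for the pages that pass the similarity threshold.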

Description

Technical field

[0001] The invention relates to the technical field of fraudulent webpage identification, in particular to a method, system, device and medium for identifying fraudulent webpages that consist mainly of pictures.

Background art

[0002] There are usually two methods for detecting whether a webpage contains fraudulent information: the expert system method and the machine learning method. The expert system method extracts the main content of the webpage, including the title, abstract and body, and then judges whether the webpage contains fraudulent information according to the keywords and other rule information stored in the expert system. In this method, rule information such as feature words must be maintained manually, which requires a lot of manpower. The machine learning method extracts a large amount of webpage content and classifies it into fraudulent webpages and non-fraudulent webpages. After the classifi...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F21/12, G06K9/34, G06F21/56, G06F16/958, G06F16/33, G06F16/23
CPC: G06F21/128, G06F21/56, G06F16/958, G06F16/3344, G06F16/23, G06V30/153
Inventor: 刘广卫, 梁彦博, 王兆丽, 曹佃国, 乔志刚, 张笃强, 张安波
Owner: SHANDONG BITTEL INTELLIGENT TECH CO LTD