Webpage sensitive information detection method and device and electronic equipment

A technique for detecting sensitive information on webpages, applied in the field of information security, that addresses the problems of a high false alarm rate and increased manual workload.

Active Publication Date: 2018-04-20
HANGZHOU ANHENG INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

Because this method treats non-sensitive content containing keywords as sensitive information, it will filter out a large number o...



Examples


Embodiment 1

[0065] An embodiment of the present invention provides a method for detecting sensitive webpage information. As shown in Figure 1, the method includes the following steps:

[0066] S101: Obtain webpage content of the website to be detected.

[0067] As shown in Figure 2, the webpage content acquisition process includes the following steps:

[0068] S201: Obtain the page address of the website to be detected.

[0069] S202: Save the page address in the system database module.

[0070] S203: Perform page access according to the page address, and extract page content as web page content.

[0071] In a specific implementation, parsing starts from the initial page of the website to be detected in order to obtain the page addresses (webpage links) of that website. Each page link is then stored in the system database module, which ensures that the same page link is not stored more than once. The system database module then extracts saved page links that have not b...
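Steps S201 to S203 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `crawl`, `LinkParser`, `fetch` and `max_pages` are hypothetical, an in-memory set stands in for the system database module, and restricting the crawl to links on the same host is an assumption the text does not state. Pages are supplied through an injected `fetch` function so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; fetch(url) returns HTML or None."""
    site = urlparse(start_url).netloc
    seen = {start_url}            # assumption: a set replaces the database module
    pending = deque([start_url])  # saved addresses that have not been visited yet
    pages = {}
    while pending and len(pages) < max_pages:
        url = pending.popleft()
        html = fetch(url)         # S203: access the page
        if html is None:
            continue
        pages[url] = html         # S203: keep the page content
        parser = LinkParser()
        parser.feed(html)         # S201: obtain page addresses
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]
            # S202: store each same-site address only once
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                pending.append(link)
    return pages


# Usage with a hypothetical in-memory website instead of real HTTP requests.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/b">B</a>',
    "http://example.test/b": '<a href="http://other.test/x">external</a>',
}
pages = crawl("http://example.test/", SITE.get)
print(sorted(pages))
```

The `seen` set is what prevents a link from being stored twice, and the `pending` queue plays the role of the saved links that have not yet been visited.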

Embodiment 2

[0106] An embodiment of the present invention provides a device for detecting sensitive webpage information. As shown in Figure 7, the device includes:

[0107] The first webpage content acquisition module 71 is used to acquire the webpage content of the website to be detected;

[0108] The first judging module 72 is used to judge whether the target keyword is included in the web page content, wherein the target keyword is a keyword related to the information to be detected; the information to be detected is preset sensitive information;

[0109] The second webpage content acquisition module 73 is used to extract, when the judgment result of the first judging module is yes, the target webpage content that lies within a preset range of the target keyword in the webpage content;

[0110] The second judging module 74 is used to judge whether the target webpage content contains associated keywords, wherein the associated keywords are keywords associated with the target keywords in the prese...
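The work of modules 72 and 73 can be sketched as follows. This is an illustration only: the function name `extract_windows`, the `radius` parameter, and the choice to measure the preset range in characters on both sides of the keyword are assumptions, since the text shown here does not say how the range is defined.

```python
def extract_windows(text, keyword, radius=20):
    """Return the text within `radius` characters of each keyword occurrence.

    An empty list means the target keyword is absent (module 72 answers "no").
    """
    windows = []
    start = text.find(keyword)
    while start != -1:
        lo = max(0, start - radius)                         # clip at the start
        hi = min(len(text), start + len(keyword) + radius)  # clip at the end
        windows.append(text[lo:hi])
        start = text.find(keyword, start + len(keyword))
    return windows


# Usage with a hypothetical page text and target keyword.
page = "Welcome. Admin password: hunter2, account root. Goodbye."
for snippet in extract_windows(page, "password", radius=10):
    print(repr(snippet))
```

Each returned snippet is the target webpage content that the second judging module would then search for associated keywords.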

Embodiment 3

[0121] An embodiment of the present invention provides an electronic device. As shown in Figure 8, the electronic device includes a processor 80, a memory 81, a bus 82 and a communication interface 83; the processor 80, the communication interface 83 and the memory 81 are connected through the bus 82. The processor 80 is used to execute executable modules stored in the memory 81, such as computer programs. When the processor executes the computer program, the steps of the methods described in the method embodiments are realized.

[0122] The memory 81 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection between the system network element and at least one other network element is realized through at least one communication interface 83 (which can be wired or wireless), and the Internet, wide area network, local...


Abstract

The invention provides a method, a device and electronic equipment for detecting sensitive webpage information, and relates to the technical field of information security. The method comprises the following steps: obtaining the webpage content of a website to be detected; judging whether the webpage content contains a target keyword, wherein the target keyword is a keyword associated with preset sensitive information; if so, extracting the target webpage content that lies within a preset range of the target keyword in the webpage content; judging whether the target webpage content contains an associated keyword, wherein the associated keyword is a keyword associated with the target keyword in a preset associated keyword library; if so, computing a weighted sum over the associated keywords to obtain a weighted score; and, when the score is greater than a preset threshold, determining that the website to be detected contains the information to be detected. Because the method applies a dual judgment, on the target keyword and on the associated keywords, to the webpage content of the website to be detected, it reduces the false alarm rate of automatic detection of sensitive webpage information, decreases the workload of manual review, improves working efficiency and reduces labor cost.
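The dual judgment and weighted score described in the abstract can be sketched in Python as below. The abstract does not give the exact form of the weighted sum, so this sketch assumes that each associated keyword found near the target keyword contributes its weight once. The names `weighted_score` and `is_sensitive`, the example keywords, the weights and the threshold are all hypothetical.

```python
def weighted_score(snippet, weights):
    """Sum the weights of the associated keywords present in `snippet`.

    Assumption: a keyword counts once, however often it occurs.
    """
    return sum(w for kw, w in weights.items() if kw in snippet)


def is_sensitive(text, target, weights, radius=30, threshold=4):
    """Two-stage check: target keyword first, then nearby associated keywords."""
    pos = text.find(target)
    while pos != -1:
        lo = max(0, pos - radius)
        hi = min(len(text), pos + len(target) + radius)
        # Report a hit only when the score exceeds the preset threshold.
        if weighted_score(text[lo:hi], weights) > threshold:
            return True
        pos = text.find(target, pos + len(target))
    return False


# Hypothetical associated keyword library for the target keyword "name".
WEIGHTS = {"ID number": 3, "phone": 2, "address": 2}

leak = "name: Li Lei, ID number: 0000, phone: 0000"
form = "Please enter your name and phone to subscribe."

print(is_sensitive(leak, "name", WEIGHTS, radius=50))  # score 5 > 4
print(is_sensitive(form, "name", WEIGHTS, radius=50))  # score 2 <= 4
```

The second page contains the target keyword but scores below the threshold, which is the kind of false alarm that a check on the target keyword alone would raise.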

Description

Technical field

[0001] The present invention relates to the technical field of information security, in particular to a method, device and electronic equipment for detecting sensitive webpage information.

Background technique

[0002] With the rapid development of information technology and the Internet, webpages have become one of the important ways for various organizations, units and individuals to publish and obtain information, and hundreds of millions of webpages are updated and browsed every day. However, not all information on the web is legal or civil. Due to reasons such as hacking, information leakage, and unethical behavior of netizens, various uncivilized information and some illegally leaked sensitive information (such as commercial secrets, etc.) also exist on the webpage.

[0003] In order to ensure that information is not illegally leaked and Internet content is green and healthy, many website content auditors and companies need to manually check a large nu...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/951, G06F16/955, G06F40/289
Inventors: 沈晓峰, 范渊
Owner: HANGZHOU ANHENG INFORMATION TECH CO LTD