Association rule mining-based method for determining public webpage involving personal information, electronic equipment and storage medium

The determination method belongs to information network technology and applies to digital data information retrieval, network data retrieval, network data indexing, and related areas. It addresses problems such as low work efficiency and personal information leakage, and reduces the risk of violating laws and regulations.

Pending Publication Date: 2022-05-10
国家计算机网络与信息安全管理中心黑龙江分中心

AI Technical Summary

Problems solved by technology

[0005] In order to overcome the above-mentioned defects in the prior art, the present invention provides a method for judging webpages related to personal information publicity based on associa


Examples


Embodiment 1

[0062] As shown in Figure 1, Embodiment 1 provides a method for judging webpages involving personal information disclosure based on association rule mining, including the following steps:

[0063] Step S1: crawl the announcement webpages to form a webpage set W, then manually label them to form a personal information webpage set WP and a non-personal information webpage set WN.

[0064] Step S11: crawl all announcement webpages on the website and parse each webpage's TITLE, webpage content, attachment names, and attachment content, forming a webpage set W whose elements contain the parts of the announcement that help identify personal information:

[0065] $W = \{Webpage_1, Webpage_2, Webpage_3, \ldots\}$ (1)

[0066] In formula (1), each Webpage is the concatenated string of one webpage's TITLE, webpage content, attachment names, and attachment content.
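The construction of W in formula (1), together with the manual split into WP and WN from Step S1, can be sketched as follows. This is only an illustration under stated assumptions: the `Announcement` record, its field names, the `has_personal_info` label field, and the newline separator are hypothetical and do not come from the patent text, and crawling itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Announcement:
    """One crawled announcement page (hypothetical record layout)."""
    title: str
    content: str
    attachment_name: str = ""
    attachment_content: str = ""
    has_personal_info: bool = False  # manual label from Step S1 (assumed field)


def to_webpage(a: Announcement) -> str:
    """Concatenate the four webpage elements into one Webpage string.

    The newline separator is an assumption; the source only says the
    elements are joined into a single string.
    """
    parts = (a.title, a.content, a.attachment_name, a.attachment_content)
    return "\n".join(p for p in parts if p)


def build_sets(announcements):
    """Return (W, WP, WN): all pages, labeled-personal pages, and the rest."""
    W = [to_webpage(a) for a in announcements]
    WP = [to_webpage(a) for a in announcements if a.has_personal_info]
    WN = [to_webpage(a) for a in announcements if not a.has_personal_info]
    return W, WP, WN
```

Every page in W ends up in exactly one of WP or WN, so the two labeled sets partition the crawl.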

[0067] Step S12: Carry out man...

Embodiment 2

[0102] Embodiment 2 of the present application provides an electronic device in the form of a general-purpose computing device. Components of the electronic device may include, but are not limited to: one or more processors or processing units, a memory for storing computer programs that can run on the processors, and a bus connecting the different system components (including the memory and the one or more processors or processing units).

[0103] When the one or more processors or processing units run the computer program, they execute the steps of the method described in Embodiment 1. The types of processors used include central processing units, general-purpose processors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof.

[0104] Wherein, the bus refers to one or more of several types of bus structures, including a me...

Embodiment 3

[0106] Embodiment 3 of the present application provides a storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method described in Embodiment 1 are implemented.

[0107] It should be noted that the storage medium described in this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage dev...


Abstract

The invention discloses an association rule mining-based method for determining public webpages involving personal information, together with an electronic device and a storage medium. It belongs to the field of analysis and determination of announcement webpages that disclose personal information, and aims to solve two problems: the low efficiency of relying on manual review, and the need to control leakage of personal information. The method mainly comprises: crawling announcement webpages to form a webpage set W, and forming a personal information webpage set WP and a non-personal information webpage set WN through manual annotation; performing Chinese word segmentation and part-of-speech tagging on each webpage Webpage in WP to form a personal information webpage word segmentation set WPP, among other steps; and finally applying a frequent item set FI to judge whether a new announcement webpage involves personal information. Using association rule mining, the method automatically discovers announcement webpages that involve personal information and greatly improves working efficiency.
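The last two stages summarized above (mining a frequent item set FI from the segmented set WPP, then applying FI to a new page) can be sketched roughly as follows. This is a minimal Apriori-style illustration, not the patent's procedure: the function names, the `min_support` threshold, the `max_size` cap, and the decision rule (a page is flagged when it contains every word of some frequent itemset with at least two words) are all assumptions. Word segmentation is assumed to have been done already, so each page arrives as a list of tokens.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    result = {}
    # Level 1 candidates: every distinct token.
    current = [frozenset([i]) for i in sorted({i for t in tsets for i in t})]
    size = 1
    while current and size <= max_size:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in tsets if c <= t) for c in current}
        frequent = {c: k / n for c, k in counts.items() if k / n >= min_support}
        result.update(frequent)
        # Join frequent itemsets of this level into candidates one item larger.
        size += 1
        current = list({a | b for a, b in combinations(list(frequent), 2)
                        if len(a | b) == size})
    return result


def involves_personal_info(tokens, fi, min_size=2):
    """Assumed rule: flag the page if it contains a whole frequent itemset."""
    tokens = set(tokens)
    return any(len(s) >= min_size and s <= tokens for s in fi)


# Toy segmented pages standing in for WPP (illustrative data only).
WPP = [
    ["name", "ID number", "phone"],
    ["name", "ID number", "award"],
    ["name", "phone", "address"],
]
FI = frequent_itemsets(WPP, min_support=0.6)
```

Requiring at least two co-occurring words keeps a single common word such as "name" from flagging a page on its own; a real deployment would tune that rule and the support threshold against the labeled sets.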

Description

Technical field

[0001] The present invention relates to the technical field of webpage analysis and determination, and in particular to an association rule mining-based method for determining public webpages involving personal information, an electronic device, and a storage medium.

Background technique

[0002] Publishing announcements on websites has become a common way for organizations to inform the public, solicit opinions, and improve their work. Most organizations' portal websites have a separate announcement column that gathers the organization's announcement information. These announcements include the publication of catalogs of rights and interests, project environmental impact assessments, charging standards, examination results, award lists, and so on. Some of these announcements include personal names, mobile phone numbers, ID numbers, and home address...


Application Information

IPC(8): G06F21/62, G06F16/9035, G06F16/951, G06F16/2458, G06F40/289
CPC: G06F21/6245, G06F16/9035, G06F16/951, G06F16/2465, G06F40/289
Inventor: 于佳华, 刘琨, 常远, 张光耀, 孙巍
Owner 国家计算机网络与信息安全管理中心黑龙江分中心