A digital tag-based file leakage prevention and traceability evidence system
By constructing a file leakage prevention and traceability evidence collection system based on digital tags, the invention solves the problem of fine-grained access control and leakage tracking for files throughout their entire lifecycle on the intranet. The system enables multi-dimensional recording of file behavior and traceability evidence collection, thereby improving file security and emergency response efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NO 30 INST OF CHINA ELECTRONIC TECH GRP CORP
- Filing Date
- 2023-12-08
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies cannot effectively prevent fine-grained access control and leakage risks of files throughout their entire lifecycle on the intranet, and cannot perform multi-dimensional recording and source tracing, making it difficult to trace the source after a file is leaked.
By constructing a file leakage prevention and source tracing system based on digital tags, and utilizing terminal agent units and behavior auditing centers, the system enables multi-dimensional tag attribute storage and chained recording of files. Combined with artificial intelligence analysis of file behavior, it performs fine-grained access control and threat assessment, and conducts source tracing and evidence collection after file leakage.
It enables behavior auditing and fine-grained access control throughout the entire file lifecycle, improves the ability to detect abnormal file behavior and the speed of emergency response, and can promptly find the source of leakage and complete the closed loop of incident handling.
Figure CN117675373B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of network security protection technology, and in particular relates to a file leakage prevention and traceability evidence collection system based on digital tags.
Background Technology
[0002] Currently, new information technologies, with data as their prominent feature, are developing rapidly.
[0003] Traditional security solutions mainly consist of passive defense measures such as firewalls, intrusion detection, and antivirus software, which primarily guard against network attacks that originate from outside the network.
[0004] Effectively auditing the entire lifecycle of files within an intranet, from creation and opening to editing, transmission, and destruction, is a crucial aspect of enterprise data security. Li Jie's paper, "Design and Implementation of a Tag-Based File Control System," describes tag-based file access control technology. It focuses on secure file access control through file attributes and policies, solving the problem of allowing specific users to access files only in specific external environments such as specific networks or terminals. However, it cannot achieve finer-grained access control, such as control over file editing and transmission, and therefore cannot effectively prevent file leakage risks. Furthermore, it lacks multi-dimensional chain recording of the entire file lifecycle from creation to destruction, including information on all legitimate readers and whether the file content has been modified or edited. This makes it difficult to effectively analyze and predict risks such as file copying, modification, and human-caused leakage. In the event of a file leak, it also cannot support source tracing and evidence collection based on the file itself.
Summary of the Invention
[0005] The purpose of this invention is to overcome the problems of existing technologies by disclosing a file leakage prevention and traceability evidence collection system based on digital tags. This system constructs multi-dimensional tag attributes for electronic files using digital tag technology. Through tag chain storage, it covers the entire lifecycle of a file, from creation to deletion, including operations such as file creation, opening, editing, sending, and deletion. It relies on intelligent algorithms to analyze the static characteristics and dynamic behaviors of files, providing a comprehensive situational assessment and security threat evaluation. Simultaneously, it implements fine-grained access control based on the security attributes in the tags. When a file is leaked, the chain information of the tags can be used for traceability evidence collection, completing a closed-loop event handling process.
[0006] The objective of this invention is achieved through the following technical solution:
[0007] A file leakage prevention and traceability evidence collection system based on digital tags, the file leakage prevention and traceability evidence collection system includes: a terminal agent unit and a behavior audit center, the terminal agent unit completing information interaction with the behavior audit center through the network;
[0008] The terminal agent unit is deployed on each terminal computer within the local area network. It achieves fine-grained access control and behavior auditing of intranet files based on tag attributes by embedding tags, verifying tags, auditing behavior, and controlling access to intranet files.
[0009] The behavior audit center is deployed on the local area network backend server. It receives file attribute and tag information reported by terminal agent units, realizes intranet file asset presentation, digital tag management, security policy management, and behavior log auditing. Based on artificial intelligence technology, it realizes comprehensive presentation of file behavior status and threat prediction, and realizes automatic threat handling according to preset security policies.
[0010] According to a preferred implementation, when a user logs into a terminal computer to operate on a file, the terminal agent unit intercepts the file operation, checks the file's tag information, prohibits any operation on files without tags, and allows access to files with tags based on tag attributes. At the same time, it updates the file's new tag information, forms a tag chain around the timeline, and uploads the information to the behavior audit center for unified storage, analysis, and situation presentation, and completes file behavior threat detection, emergency response handling, and file source tracing and evidence collection.
[0011] According to a preferred embodiment, the digital tag used by the terminal agent unit is a data segment bound to a file, containing the file's basic attributes, security attributes, circulation attributes, and behavioral attribute information, and can be embedded into the file and circulate along with the file.
[0012] According to a preferred embodiment, the basic attributes include the identity information of the electronic document, including: the security classification of the document, the file name, and the file path;
[0013] The security attributes include the document's password processing and operation permission information, including: password processing methods such as encryption, signing, and adding watermarks, as well as read, write, and print operation permissions;
[0014] The circulation attributes include relevant information about the file during the circulation process, including: the sender, receiver, and transmission time of the file;
[0015] The behavioral attributes include information about the user's actions related to the file, including: file operation method, operator, and operation time.
[0016] According to a preferred embodiment, the tag embedding process includes: when a file is generated on a terminal computer, the embedding module obtains multi-dimensional attribute information of the file, generates a digital tag for the file, and uses a digital watermarking method to write the digital tag into a preset area of the file based on preset rules to complete the tag embedding.
[0017] According to a preferred embodiment, the terminal agent unit implements access control and behavior auditing for file operations based on the user layer and the kernel layer. The operation types include creating, reading, writing, deleting, renaming, and modifying permissions.
[0018] According to a preferred embodiment, the digital tag management process includes:
[0019] After collecting the file attribute and tag information reported by the terminal agent unit, the behavior audit center cleans and standardizes the data, stores it uniformly, and completes the "attribute-tag-tag digest" mapping table.
[0020] During storage, the reported tag information is verified for integrity. If inconsistent tags are found, an alarm is triggered, and the security administrator takes security action. The security administrator edits the security attributes of the file tags to complete the "tag-policy" mapping and achieve secure access control of the files.
[0021] The audit center categorizes and stores the collected file tags, constructing a tag chain for each file to record the file's behavior information throughout its entire lifecycle from creation to destruction, providing data support for tracing and evidence collection after a file leak.
[0022] According to a preferred embodiment, the file behavior threat detection process includes: performing file behavior analysis and threat discovery based on file attribute and tag information reported by the terminal;
[0023] First, data preprocessing is performed: the file attributes and tag information are read from the database, and the data is cleaned, deduplicated, and standardized. At the same time, the time period factor is processed to suit the needs of modeling.
[0024] Second, data analysis is performed on the preprocessed data, including clustering, file label determination, behavioral association analysis, frequency quantification, and time period clustering analysis.
[0025] Finally, behavioral profiles are completed and used for source tracing, threat alerts, and emergency response.
[0026] The aforementioned main solution of the present invention and its various further alternative solutions can be freely combined to form multiple solutions, all of which are solutions that can be adopted and are claimed by the present invention. Those skilled in the art, after understanding the solution of the present invention, will realize that there are many combinations based on existing technology and common knowledge, all of which are technical solutions to be protected by the present invention, and will not be exhaustively listed here.
[0027] The beneficial effects of this invention are:
[0028] Based on digital tagging technology, the system of this invention constructs multi-dimensional attribute tags for intranet files. These tags cover all file behaviors, including file creation, opening, editing, renaming, copying, sending, and deletion. Each file operation generates new tag information, which is stored in a chain to form a file tag chain covering the entire lifecycle of a file, from creation to destruction. By setting tag security attributes, fine-grained behavior auditing and access control based on digital tags are achieved, effectively addressing the ways in which confidential files are easily leaked, such as unauthorized viewing, manual screen copying, content tampering, and unauthorized transmission.
[0029] Meanwhile, this invention relies on artificial intelligence algorithms to analyze the static characteristics and dynamic behavior of files, and utilizes a security risk assessment index system to comprehensively present the situation of file behavior and analyze security threats, thereby improving the ability to detect abnormal file behavior and the speed of emergency response. When a file is leaked, the chain information of the file tags can be used to trace the source of the leak and find the problematic links in a timely manner.
Attached Figure Description
[0030] Figure 1 is a schematic diagram of the architecture of the file leakage prevention and traceability evidence collection system of this invention;
[0031] Figure 2 is a schematic diagram of the workflow of the file leakage prevention and traceability evidence collection system of this invention;
[0032] Figure 3 is a flowchart of the digital tag embedding process of the file leakage prevention and traceability evidence collection system of this invention;
[0033] Figure 4 is a tag chain data diagram of the file leakage prevention and traceability evidence collection system of this invention;
[0034] Figure 5 is a flowchart of the behavior analysis of the file leakage prevention and traceability evidence collection system of this invention.
Detailed Implementation
[0035] The following specific examples illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, unless otherwise specified, the following embodiments and features described therein can be combined with each other.
[0036] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.
[0037] In the description of this invention, it should be noted that the terms "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," and "outer," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, or the orientation or positional relationship commonly used when the product of this invention is in use. They are only for the convenience of describing this invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this invention. In addition, the terms "first," "second," "third," etc., are only used to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0038] Furthermore, terms such as "horizontal," "vertical," and "sag" do not imply that components must be absolutely horizontal or suspended, but rather that they can be slightly tilted. For example, "horizontal" simply means that its direction is more horizontal relative to "vertical," and does not mean that the structure must be completely horizontal, but can be slightly tilted.
[0039] In the description of this invention, it should also be noted that, unless otherwise explicitly specified and limited, the terms "set," "install," "connect," and "link" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.
[0040] Furthermore, it should be noted that, unless otherwise specified, the structures, connections, positions, power sources, etc. involved in this invention are all things that a person skilled in the art can know without creative effort based on the prior art.
[0041] Example 1:
[0042] Referring to Figure 1, a file leakage prevention and traceability evidence collection system based on digital tags is illustrated. The system includes a terminal agent unit and a behavior audit center. The terminal agent unit exchanges information with the behavior audit center through the network.
[0043] The terminal agent unit is deployed on each terminal computer within the local area network. It achieves fine-grained access control and behavior auditing of intranet files based on tag attributes by embedding tags, verifying tags, auditing behavior, and controlling access to intranet files.
[0044] The behavior audit center is deployed on the local area network backend server. It receives file attribute and tag information reported by the terminal agent unit, realizes the presentation of intranet file assets, digital tag management, security policy management, and behavior log auditing. Based on artificial intelligence technology, it realizes the comprehensive presentation of file behavior status and threat prediction, and realizes automatic threat handling according to the preset security policy.
[0045] As shown in Figure 2, when a user logs into the terminal computer to operate on a file, the terminal agent unit intercepts the file operation and checks the file's tag information. It prohibits any operation on files without tags and allows access to files with tags based on tag attributes. At the same time, it updates the file's tag information, forms a tag chain along the timeline, and uploads the information to the behavior audit center for unified storage, analysis, and situation presentation. The behavior audit center then completes file behavior threat detection, emergency response and handling, and file source tracing and evidence collection.
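The workflow in this paragraph can be illustrated with a minimal sketch. It is not the patented implementation: the names TagRecord, TaggedFile, and handle_operation are hypothetical, a tag is reduced to a set of permitted operation names, and the upload to the behavior audit center is modeled as a plain callback. The sketch also assumes that attempts on untagged files are simply refused without a chain entry, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class TagRecord:
    """One behavioral entry appended to a file's tag chain (hypothetical shape)."""
    operator: str
    operation: str
    timestamp: float
    allowed: bool


@dataclass
class TaggedFile:
    name: str
    # None models a file with no embedded tag; a set lists the permitted operations.
    permissions: Optional[Set[str]] = None
    chain: List[TagRecord] = field(default_factory=list)


def handle_operation(
    f: TaggedFile,
    operator: str,
    operation: str,
    timestamp: float,
    report: Callable[[str, TagRecord], None],
) -> bool:
    """Decide whether an intercepted file operation may proceed."""
    if f.permissions is None:
        # Untagged files: every operation is prohibited.
        return False
    allowed = operation in f.permissions
    record = TagRecord(operator, operation, timestamp, allowed)
    f.chain.append(record)   # extend the tag chain along the timeline
    report(f.name, record)   # stand-in for the upload to the audit center
    return allowed
```

In this sketch both permitted and refused operations on tagged files are recorded, so the chain also reflects blocked attempts.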
[0046] This invention's system is based on digital tagging technology, constructing multi-dimensional tag attributes for electronic documents. These attributes cover all document behaviors, including creation, opening, editing, sending, and deletion. Through tag-chain storage, it covers the entire lifecycle of a document, from creation to destruction, embedding tags and tag digest values into the document. Based on digital tags, it performs fine-grained behavior auditing and access control. By leveraging artificial intelligence algorithms to analyze the static characteristics and dynamic behaviors of documents, it provides a comprehensive situational awareness and security threat assessment. In the event of a document leak, the chain of tags allows for traceability and evidence collection, completing a closed-loop incident handling process.
[0047] To make the objectives, technical solutions, and advantages of this invention clearer, the specific implementation of this invention will be described in detail.
[0048] (1) Terminal Agent Unit
[0049] The terminal agent unit is deployed on the terminal computer and receives tag information and policy information synchronized from the behavior audit center. It has functions such as tag embedding, tag verification, file behavior auditing, and file operation access control for intranet files, realizing fine-grained secure access control and behavior auditing of intranet files based on tag attributes. The main functional designs are as follows:
[0050] 1) Design of multi-dimensional attribute tags for documents
[0051] A digital tag is a data segment bound to a file. It can contain multi-dimensional attribute information of the file, such as basic attributes, security attributes, circulation attributes, and behavioral attributes, and can be embedded into the file and circulate with the file.
[0052] The basic attributes mainly include the electronic document's identity information, such as the document's security level, file name, and file path. The security attributes mainly include the document's cryptographic processing and operation permissions, such as cryptographic processing methods like encryption, signing, and adding watermarks, as well as operation permissions such as read, write, and print. The circulation attributes mainly record information about the file during its circulation, such as the file's sender, receiver, and transmission time. The behavioral attributes mainly record the user's operations on the file, such as the file operation method, operator, and operation time.
[0053] An encoding rule is established for the different file attributes, mapping all attribute information into the file's digital tag. In this invention, the length of the file digital tag is set to 512 bits.
[0054] The multidimensional attribute tag information constructed for files in this invention is shown in the table below.
[0055]
[0056] 2) Embedding digital tags
[0057] Tag embedding is a crucial part of this system. Only when files are tagged can fine-grained behavior auditing and access control be achieved. Users cannot perform any operations on files without embedded tags.
[0058] File tag embedding is implemented by the terminal agent software. When a file is generated on the terminal computer, the embedding module automatically obtains the file's multi-dimensional attribute information, generates a file digital tag, and embeds it into the file. Tag embedding must not affect the use of the original file, and must provide a certain degree of security, concealment, and resistance to tampering. This invention uses digital watermarking technology to write, according to certain rules, the file's digital tag and the digest value calculated from that tag into a specific area of the file. The written content does not affect the normal use of the original file, and digital watermarking provides a certain degree of security, concealment, and robustness. The specific digital tag embedding process is shown in Figure 3.
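The pairing of a tag with its digest value can be sketched as follows. This is only a simplified stand-in: it appends the tag and digest as a plain trailer, which has none of the concealment or robustness of a real digital watermark, and the text does not specify the watermarking method or the digest algorithm. The MAGIC marker, the use of SHA-256, and the function names are assumptions.

```python
import hashlib
from typing import Optional

MAGIC = b"DTAG"    # hypothetical marker identifying the tag area
TAG_LEN = 64       # 512-bit tag
DIGEST_LEN = 32    # SHA-256 digest (assumed algorithm)


def embed_tag(content: bytes, tag: bytes) -> bytes:
    """Append the tag and its digest to the file content."""
    if len(tag) != TAG_LEN:
        raise ValueError("tag must be 64 bytes (512 bits)")
    digest = hashlib.sha256(tag).digest()
    return content + MAGIC + tag + digest


def extract_tag(data: bytes) -> Optional[bytes]:
    """Return the embedded tag, None if untagged; raise if the tag was altered."""
    trailer_len = len(MAGIC) + TAG_LEN + DIGEST_LEN
    if len(data) < trailer_len:
        return None
    trailer = data[-trailer_len:]
    if not trailer.startswith(MAGIC):
        return None
    tag = trailer[len(MAGIC):len(MAGIC) + TAG_LEN]
    digest = trailer[len(MAGIC) + TAG_LEN:]
    if hashlib.sha256(tag).digest() != digest:
        raise ValueError("tag digest mismatch: tag may have been tampered with")
    return tag
```

A terminal agent could use a check like extract_tag both to refuse operations on untagged files and to detect modified tags before reporting them.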
[0059] 3) Digital tag synchronization
[0060] The file attribute information, file tags, and tag digest values generated by the terminal agent software are transmitted to the behavior audit center via socket communication. The digital tag management module then verifies the integrity of the tag information and manages its unified storage. The behavior audit center establishes a mapping between attributes, tags, and policies by constructing a file attribute information table, a file tag table, and a policy information control table. The behavior audit center can adjust policy information dynamically, which in turn modifies the file tags, and then synchronizes the modification to the terminal side.
[0061] 4) Terminal file behavior auditing and access control
[0062] Terminal agent software, based on the user layer and kernel layer, implements access control and behavior auditing for file operations, including creation, reading, writing, deletion, renaming, and permission modification.
[0063] Specifically, in the Windows system, a file filter driver is written to filter I/O operation requests. A file monitoring module is inserted between the file system driver (FSD) and the I/O manager. When an operation is found to be inconsistent with the policy information, the operation is intercepted and recorded, and the transmission process is interrupted; if it is consistent with the policy information, the operation is allowed and recorded.
[0064] In domestic operating systems, the file monitoring kernel module is registered through the Linux Security Modules (LSM) framework. By hooking system calls related to file operations (such as __NR_write, __NR_delete_module, and __NR_readdir), it implements access control and behavior auditing for specified files.
[0065] (2) Behavioral Audit Center
[0066] The behavior auditing center is deployed on the server, receiving file attribute and tag information reported by terminal agent software. It enables the presentation of intranet file assets, digital tag management, security policy management, and behavior log auditing. Based on intelligent technology, it provides a comprehensive view of file behavior and threat prediction, and can automatically handle threats according to pre-configured security policies. The main functional designs are as follows:
[0067] 1) Digital Tag Management
[0068] After collecting the file attribute and tag information reported by the terminal agent software, the behavior audit center software cleans and standardizes the data, stores it uniformly, and completes the "attribute-tag-tag digest" mapping table.
[0069] During storage, the reported tag information is verified for integrity. If inconsistent tags are found, an alarm is raised and the security administrator takes appropriate action. The security administrator can edit the security attributes of file tags to complete the "tag-policy" mapping and enable secure access control. The audit center categorizes and stores the collected file tags, creating a tag chain for each file to record its entire lifecycle from creation to destruction, providing data support for tracing and evidence collection after file leaks.
[0070] The specific chain structure is shown in Figure 4. Each file tag chain is uniquely identified by the file's basic attributes. Every time a file's status changes, a file tag is generated and then added to the tag chain in chronological order.
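A per-file tag chain of this kind can be sketched as an append-only list kept in chronological order. Because Figure 4 is not reproduced here, the exact record layout is unknown; linking each entry to the digest of the previous one is an assumption added to show how tampering with stored history could be detected, and the TagChain class and its field names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


class TagChain:
    """Append-only chain of tags for one file, identified by file_id."""

    def __init__(self, file_id: str) -> None:
        self.file_id = file_id            # derived from the file's basic attributes
        self.entries: List[Dict] = []

    @staticmethod
    def _digest(prev: str, tag: Dict) -> str:
        payload = json.dumps(tag, sort_keys=True).encode("utf-8")
        return hashlib.sha256(prev.encode("ascii") + payload).hexdigest()

    def append(self, tag: Dict) -> Dict:
        """Add a tag generated by a status change; tags must arrive in time order."""
        if self.entries and tag["time"] < self.entries[-1]["tag"]["time"]:
            raise ValueError("tags must be appended in chronological order")
        prev = self.entries[-1]["digest"] if self.entries else ""
        entry = {"tag": tag, "prev": prev, "digest": self._digest(prev, tag)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every link; False means some stored entry was altered."""
        prev = ""
        for entry in self.entries:
            if entry["prev"] != prev or entry["digest"] != self._digest(prev, entry["tag"]):
                return False
            prev = entry["digest"]
        return True

    def trace(self, operation: str) -> List[Dict]:
        """Return all tags recording the given operation, oldest first."""
        return [e["tag"] for e in self.entries if e["tag"]["operation"] == operation]
```

After a leak, a query such as chain.trace("send") would list every recorded transmission of the file, which is the kind of lookup that source tracing relies on.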
[0071] 2) Behavioral detection, analysis, and profiling
[0072] Based on file attribute and tag information reported by the terminal, file behavior analysis is performed to detect threats. This embodiment utilizes Spark to build a related analysis platform.
[0073] First, data preprocessing is performed: file attributes and tag information are read from the database, and the data is cleaned, deduplicated, and standardized. In addition, to suit modeling needs, the time period factor can be converted to a dimensionless value. For example, when aggregating by day, the 24-hour clock time is normalized to the range [0,1].
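The daily normalization mentioned in this paragraph amounts to dividing the time elapsed since midnight by the length of a day. A minimal sketch, with a hypothetical function name:

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def normalize_time_of_day(ts: datetime) -> float:
    """Map a timestamp's clock time to a fraction of the day."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    return seconds / SECONDS_PER_DAY
```

With second resolution the result is at least 0 and strictly below 1, so midnight maps to 0.0 and noon to 0.5.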
[0074] Next, data analysis is performed on the preprocessed data, including clustering, file label determination, behavioral association analysis, frequency quantification, and time-period clustering analysis. Specifically, Spark MLlib can be used for the basic analysis, with its K-means implementation for clustering and FPGrowth for association rules.
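As an illustration of time-period clustering, the sketch below runs a small one-dimensional K-means (Lloyd's algorithm) over normalized access times. It is a pure-Python stand-in so that the example is self-contained; it is not the Spark MLlib implementation the embodiment refers to, and the function name and the deterministic quantile-based initialization are assumptions.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iterations: int = 20) -> Tuple[List[float], List[int]]:
    """Cluster scalar values into k groups; returns (centers, label per value)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic initialization: pick evenly spaced quantiles as starting centers.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Recompute each center as the mean of its members; keep empty clusters in place.
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, labels
```

For example, access times normalized to fractions of the day would separate into a daytime group and a small night-time group, and the latter could then be examined as potentially abnormal behavior.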
[0075] The final step is to create a behavioral profile, which is then used for source tracing, threat alerts, and emergency response. The main process is shown in Figure 5.
[0076] 3) Risk assessment indicator system
[0077] The accuracy of file behavior security analysis depends on the construction of a risk assessment indicator system. This invention focuses on the file itself, conducting security analysis based on the file's purpose, user operations on the file, and daily office behaviors to identify abnormal behaviors and promptly detect file leakage risks. This invention constructs a three-level indicator evaluation system, as shown in the table below, and dynamically adjusts the weights based on the evaluation results.
[0078]
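A weighted, multi-level indicator system can be evaluated by rolling leaf scores up through the hierarchy. The sketch below shows only that roll-up. Since the indicator table is not reproduced in this text, every indicator name and weight in the example is hypothetical, and the dynamic weight adjustment described above is not modeled.

```python
from typing import Dict


def risk_score(node: Dict, observations: Dict[str, float]) -> float:
    """Weighted average of child scores; leaves read their score from observations."""
    children = node.get("children")
    if not children:
        return observations.get(node["name"], 0.0)
    total_weight = sum(child["weight"] for child in children)
    weighted = sum(child["weight"] * risk_score(child, observations) for child in children)
    return weighted / total_weight


# Hypothetical three-level tree: categories, sub-categories, leaf indicators.
EXAMPLE_TREE = {
    "name": "file_risk",
    "children": [
        {"name": "file_purpose", "weight": 0.3, "children": [
            {"name": "classification", "weight": 1.0, "children": [
                {"name": "security_level", "weight": 1.0},
            ]},
        ]},
        {"name": "user_operations", "weight": 0.5, "children": [
            {"name": "access", "weight": 0.6, "children": [
                {"name": "off_hours_access", "weight": 1.0},
            ]},
            {"name": "transfer", "weight": 0.4, "children": [
                {"name": "bulk_copy", "weight": 1.0},
            ]},
        ]},
        {"name": "office_behavior", "weight": 0.2, "children": [
            {"name": "recipients", "weight": 1.0, "children": [
                {"name": "unusual_recipient", "weight": 1.0},
            ]},
        ]},
    ],
}
```

Leaf scores are assumed to lie between 0 and 1, so the rolled-up score stays in the same range and can be compared against an alert threshold.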
[0079] Based on digital tagging technology, the system of this invention constructs multi-dimensional attribute tags for intranet files. These tags cover all file behaviors, including file creation, opening, editing, renaming, copying, sending, and deletion. Each file operation generates new tag information, which is stored in a chain to form a file tag chain covering the entire lifecycle of a file, from creation to destruction. By setting tag security attributes, fine-grained behavior auditing and access control based on digital tags are achieved, effectively addressing the ways in which confidential files are easily leaked, such as unauthorized viewing, manual screen copying, content tampering, and unauthorized transmission.
[0080] Meanwhile, this invention relies on artificial intelligence algorithms to analyze the static characteristics and dynamic behavior of files, and utilizes a security risk assessment index system to comprehensively present the situation of file behavior and analyze security threats, thereby improving the ability to detect abnormal file behavior and the speed of emergency response. When a file is leaked, the chain information of the file tags can be used to trace the source of the leak and find the problematic links in a timely manner.
[0081] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A file leakage prevention and traceability evidence collection system based on digital tags, characterized in that, The document leakage prevention and traceability system includes: a terminal agent unit and a behavior audit center, wherein the terminal agent unit completes information interaction with the behavior audit center via the network; The terminal agent unit is deployed on each terminal computer within the local area network. It achieves fine-grained access control and behavior auditing of intranet files based on tag attributes by embedding tags, verifying tags, auditing behavior, and controlling access to intranet files. The behavior audit center is deployed on the local area network backend server. It receives file attribute and tag information reported by the terminal agent unit, realizes the presentation of intranet file assets, digital tag management, security policy management, and behavior log auditing. Based on artificial intelligence technology, it realizes the comprehensive presentation of file behavior status and threat prediction, and realizes automatic threat handling according to the preset security policy. 
The digital tag used by the terminal agent unit is a data segment bound to a file, containing the file's basic attributes, security attributes, circulation attributes, and behavioral attribute information, and can be embedded into the file and circulate along with the file; The basic attributes include the electronic document's identity information, including: the document's security level, file name, and file path; The security attributes include the document's password processing and operation permission information, including: password processing methods such as encryption, signing, and adding watermarks, as well as read, write, and print operation permissions; The circulation attributes include relevant information about the file during the circulation process, including: the sender, receiver, and transmission time of the file; The behavioral attributes include information about the user's actions on the file, including: file operation method, operator, and operation time; The digital tag management process includes: After collecting the file attribute and tag information reported by the terminal agent unit, the behavior audit center cleans and standardizes the data, stores it uniformly, and completes the mapping table of "attribute-tag-tag digest". During storage, the reported tag information is verified for integrity. If inconsistent tags are found, an alarm is triggered, and the security administrator takes security action. The security administrator edits the security attributes of the file tags to complete the "tag-policy" mapping and achieve secure access control of the files. The audit center categorizes and stores the collected file tags, constructing a tag chain for each file to record the file's behavior information throughout its entire lifecycle from creation to destruction, providing data support for tracing and evidence collection after a file leak.
2. The document leakage prevention and traceability evidence collection system as described in claim 1, characterized in that, When a user logs into a terminal computer and performs file operations, the terminal agent unit intercepts the file operations, checks the file's tag information, prohibits any operations on untagged files, and allows access to tagged files based on tag attributes. At the same time, it updates the file's new tag information, forms a tag chain around the timeline, and uploads the information to the behavior audit center for unified storage, analysis, and situational awareness, and completes file behavior threat detection, emergency response and handling, and file source tracing and evidence collection.
3. The document leakage prevention and traceability evidence collection system as described in claim 1, characterized in that, The tag embedding process includes: when a file is generated on a terminal computer, the embedding module obtains the file's multi-dimensional attribute information, generates a digital tag for the file, and uses a digital watermarking method to write the digital tag into a preset area of the file based on preset rules to complete the tag embedding.
4. The document leakage prevention and traceability evidence collection system as described in claim 1, characterized in that, The terminal agent unit, based on the user layer and kernel layer, implements access control and behavior auditing for file operations, including creation, reading, writing, deletion, renaming, and permission modification.
5. The document leakage prevention and traceability evidence collection system as described in claim 2, characterized in that, The file behavior threat detection process includes: analyzing file behavior based on file attribute and tag information reported by the terminal, and conducting threat discovery; First, data preprocessing is performed, which involves reading the file attributes and tag information from the database, cleaning the data, deduplicating the data, and standardizing the data. At the same time, considering the needs of modeling, the time period factor is processed. Second, data analysis is performed, including clustering, file label determination, behavioral association analysis, frequency quantification, and time period clustering analysis of the preprocessed data. Finally, behavioral profiles are completed, enabling source tracing, threat alerts, and emergency response.