A document privacy protection system and method based on organization information intensive management

By integrating and coordinating the management of organizational information authentication servers and converged document editors, sensitive information can be identified and hidden in real time, solving the problems of decentralized organizational information management and document privacy protection, and achieving efficient and secure information collaboration and privacy protection.

CN122241758A · Pending · Publication Date: 2026-06-19 · 玺链科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
玺链科技有限公司
Filing Date
2026-03-23
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, the decentralized management of organizational information leads to low efficiency in information retrieval, makes it difficult to form unified security control, and there is a disconnect between document editing and privacy protection. The application of AI large models brings new privacy risks, and there is a lack of privacy protection measures throughout the entire process.

Method used

An organizational information authentication server provides centralized management, and a fusion document editor works together with a privacy protection engine to identify and hide sensitive information in real time. A mapping relationship between the sensitive information and digital identifiers is established, and a two-layer document structure is generated for precise control.

Benefits of technology

It has enabled centralized management of organizational information, improved the efficiency of information retrieval, eliminated the risk of leakage of sensitive information during the document generation stage, and ensured the efficiency and security of cross-entity and cross-system information collaboration.

✦ Generated by Eureka AI based on patent content.

[Figure CN122241758A_ABST]

Abstract

This invention discloses a document privacy protection system based on centralized management of organizational information, comprising: an organizational information authentication server configured to store real-name authenticated organizational information, wherein the organizational information is divided into multiple data packets according to the subject type, and each data packet is bound to a unique corresponding digital identifier; a fusion document editor communicatively connected to the organizational information authentication server, used to provide a text editing interface and to call the organizational information as a reference library in real time; a privacy protection engine coupled to both the fusion document editor and the organizational information authentication server, used to identify and hide sensitive information in the edited text, and to establish a mapping relationship between sensitive information and the digital identifier; and a digital identifier management module used to visually encapsulate and output the processed text and the digital identifier. This invention solves the problems of scattered storage of organizational information, duplicate authentication, and unclear responsibilities.

Description

Technical Field

[0001] This invention belongs to the field of information security technology, specifically relating to a document privacy protection system and method.

Background Technology

[0002] With the deepening of digital transformation, the electronic information of five types of organizations—individuals, families, enterprises, communities, and governments—is experiencing explosive growth. However, existing technologies suffer from a prominent problem of fragmented organizational information management. Personal ID cards, medical insurance cards, and property ownership certificates are stored on different platforms; enterprise qualifications and legal representative information are scattered across systems such as industry and commerce, taxation, and banking; and family, community, and government information lacks a unified, centralized authentication portal. This fragmented management model leads to low information retrieval efficiency and makes it difficult to achieve unified security control at the organizational level. Furthermore, existing technologies such as electronic signatures and QR codes are only used for information presentation or signature verification and lack the ability to centrally bind and manage information from multiple organizations, failing to meet the integrated management needs of organizational information in the digital age.

[0003] On the other hand, there is a significant disconnect between document editing and privacy protection, and the application of AI large-scale models brings new privacy risks. Existing document editing tools lack the ability to identify organizational information, and users can only perform post-processing anonymization after the document is completed. This "expose first, protect later" model has serious security lags. When users upload documents to AI large-scale models for training or analysis, the organizationally sensitive information included in the document may be absorbed by the model and used in subsequent outputs, causing irreversible privacy leaks. Currently, there is a lack of effective means to control the data fed to AI at the source of document writing, making it impossible to achieve full-process privacy protection from information generation to use. There is currently no effective comprehensive solution to the above technical problems.

[0004] It should be noted that the above description of the technical background is only for the purpose of providing a clear and complete explanation of the technical solutions of the present invention and facilitating understanding by those skilled in the art. It should not be assumed that the above technical solutions are known to those skilled in the art simply because they have been described in the background section of this invention.

Summary of the Invention

[0005] The purpose of this invention is to overcome the shortcomings of the prior art and provide a document privacy protection system and method based on the centralized management of organizational information.

[0006] This application discloses a document privacy protection system based on centralized management of organizational information, comprising: an organizational information authentication server configured to store real-name authenticated organizational information, wherein the organizational information is divided into multiple data packets according to the subject type, and each data packet is bound to a unique corresponding digital identifier; a fusion document editor communicatively connected to the organizational information authentication server, used to provide a text editing interface and to call the organizational information as a reference library in real time; a privacy protection engine connected to both the fusion document editor and the organizational information authentication server, used to identify sensitive information in the edited text and hide or replace it, while establishing a mapping relationship between sensitive information and the digital identifier; and a digital identifier management module used to visually encapsulate and output the processed text and the digital identifier.

[0007] Furthermore, the aforementioned privacy protection engine includes: an identification unit, used to perform multi-dimensional matching of text fields against the organizational information in the aforementioned reference library based on regular expressions and deep learning models, and to attach a structured label to each text field that matches, the structured label including at least the data packet type, the information category, and the corresponding unique digital identifier of the sensitive information; a one-click hiding unit, used to respond to a user trigger by uniformly replacing all text fields that carry the aforementioned structured labels with structured placeholders; and an association binding unit, used to establish an index mapping table among the aforementioned structured placeholders, the replaced original organizational information, and the aforementioned digital identifiers.

[0008] Furthermore, the aforementioned digital identifier is at least one of an electronic signature, an encrypted feature label, and a QR code; the aforementioned electronic signature is issued by a nationally recognized Certificate Authority (CA) corresponding to the subject type of each organization's information, and its generation and verification rely on asymmetric encryption algorithms; the aforementioned system interfaces with the aforementioned Certificate Authority to verify the validity of the electronic signature and establish the mapping relationship.

[0009] Furthermore, the aforementioned integrated document editor dynamically loads the organizational information of the multiple data packets associated with the currently logged-in user from the aforementioned organization information authentication server as a real-time reference library, based on the user's identity and permissions.

[0010] Furthermore, the aforementioned privacy protection engine receives text content from the aforementioned integrated document editor in real time and identifies sensitive information in the text that falls within the scope of the aforementioned plurality of data packets by intelligently matching it against the reference library in the aforementioned organization information authentication server.

[0011] Furthermore, the aforementioned plurality of data packets include: an individual data packet, used to store organizational information containing identity information of an individual who has been verified by real name, and to establish a mapping relationship with the individual's personal electronic seal issued by a nationally recognized digital certificate authority; a family data packet, used to collect information on family members and shared assets, and to establish a mapping relationship with a family electronic signature seal generated based on the composite authorization of the member's personal CA certificate; an enterprise data packet, used to manage the legal person qualification and operational information of an enterprise, and to establish a mapping relationship with the enterprise's electronic seal issued by an enterprise digital certificate authority; a community data packet, used to store information on community and rural collective organizations, and to establish a mapping relationship with the electronic seal issued by the corresponding digital certificate authority of the community and rural collective organization; and a government data packet, used to carry public service information of government agencies, and to establish a mapping relationship with the government's electronic seal issued by a government digital certificate authority.

[0012] This application also discloses a document privacy protection method based on centralized management of organizational information, including the following steps:
S1. By connecting to multiple nationally recognized digital certificate authorities and performing multi-factor composite authentication on each entity, a centralized authentication database of multiple data packets is established. Organizational information is classified and authenticated according to the entity, and a mapping relationship is established with the corresponding digital identifiers issued by the CAs;
S2. The above-mentioned integrated document editor is launched, and the reference libraries of the multiple data packets associated with the current user are loaded;
S3. The edited text is identified in real time, and structured labels are attached to the text fields that match the reference library, marking them as items to be hidden;
S4. In response to the one-click hide command, all items to be hidden that carry the above-mentioned structured labels are replaced with placeholders, and a mapping relationship is established among the placeholders, the original information, and the digital identifiers;
S5. The text including placeholders and the above-mentioned digital identifiers are visually encapsulated and output.

[0013] Furthermore, the specific logic for real-time identification in step S3 above satisfies the following formula:

[0014] (Equation 1)

[0015] In the formula, M(t) represents whether the text fragment t is sensitive information, D is the set of organizational information in the above-mentioned organizational information authentication server, Sim(t,d) is the similarity calculation function, and θ is the preset matching threshold.
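Equation 1 itself is not reproduced in this text. One form consistent with the symbol definitions in [0015], offered here as an assumption rather than as the filed formula, is a thresholded maximum similarity over the set D:

```latex
M(t) =
\begin{cases}
1, & \max_{d \in D} \operatorname{Sim}(t, d) \ge \theta \\
0, & \text{otherwise}
\end{cases}
```

Under this reading, M(t) = 1 marks the fragment t as sensitive. Whether the comparison with θ is strict and how Sim is normalized are not stated in the source.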

[0016] Furthermore, the data structure for establishing the mapping relationship in step S4 above satisfies the following formula:

[0017] (Equation 2)

[0018] In the formula, Index(P) is the index value corresponding to the placeholder P; the encryption function performs encryption using the server private key; the user-identifier term denotes the user's identity; RawData is the original sensitive information; and StampID is the associated unique digital identifier.

[0019] Furthermore, in the visual encapsulation and output of step S5 above, the output document has a two-layer structure: the top layer is readable text including placeholders together with the graphic codes of the aforementioned digital identifiers, and the bottom layer is a pointer link to the mapping relationship in the aforementioned organization information authentication server.
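The two-layer output of step S5 can be pictured as a small record with a readable top layer and a pointer-only bottom layer. The following is a minimal sketch under stated assumptions: the class name `TwoLayerDocument`, the field names, the placeholder syntax, and the `authserver://` pointer scheme are all hypothetical and do not come from the source.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TwoLayerDocument:
    # Top layer: readable text with placeholders, plus the payloads that would
    # be rendered as graphic codes (seal image, QR code) of the identifiers.
    text: str
    identifier_codes: list[str]
    # Bottom layer: only a pointer to the mapping kept on the authentication
    # server; the original sensitive values are never stored in the document.
    mapping_pointer: str


doc = TwoLayerDocument(
    text="Applicant ID {{individual:id_card_number#1}} signed.",  # hypothetical placeholder syntax
    identifier_codes=["STAMP-IND-001"],                           # hypothetical identifier
    mapping_pointer="authserver://mappings/doc-0001",             # hypothetical URI scheme
)

# Serialized form that could be embedded in a PDF attachment or a web page.
serialized = json.dumps(asdict(doc), ensure_ascii=False)
```

Because the bottom layer holds a pointer rather than the data, anyone who copies the document obtains no original values; restoration depends entirely on what the server is willing to return.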

[0020] The beneficial effects of this invention are as follows:

[0021] First, by using the aforementioned organizational information authentication server, the organizational information of five types of entities—individuals, families, enterprises, communities, and governments—is centrally authenticated and managed, and bound to a unique digital identifier. This fundamentally solves the fragmentation problems of scattered storage, duplicate authentication, and unclear responsibilities of organizational information in the traditional model, and achieves the data governance goal of "one-time authentication, unified management of five entities, and universal applicability."

[0022] Second, through the collaborative operation of the aforementioned integrated document editor and the aforementioned privacy protection engine, the reference library is called in real time to identify sensitive information and respond to the one-click hiding command when the user edits text. This upgrades privacy protection from "post-event desensitization" to "source prevention and control", completely eliminating the risk of sensitive information being exposed at the document generation stage. It is especially suitable for AI large model data uploading scenarios, ensuring that the original privacy data never enters the model training or inference channel.

[0023] Third, a three-way mapping relationship among placeholders, original information, and digital identifiers is established through the above-mentioned association binding unit. The digital identifier management module encapsulates this relationship into a two-layer document structure: a cleaned, secret-free surface layer and an underlying layer for authorized tracing. This achieves precise control, with no private data in external dissemination and restoration available under internal authorization. Legitimate certificate holders can retrieve the original information after identity verification by scanning the electronic signature or QR code. All access behaviors are recorded in the above-mentioned index mapping table to form a complete audit log.

[0024] Fourth, by basing the generation and verification of digital identifiers on asymmetric encryption algorithms, the unforgeability and legal validity of electronic signatures, encrypted labels, and QR codes are ensured. They can be widely used in scenarios such as "paperless processing" of government services, enterprise bidding, sharing of medical records, government information disclosure, and family asset inheritance. While ensuring data security, they significantly improve the efficiency of information collaboration across entities, systems, and regions.

Description of the Drawings

[0025] Figure 1 is a schematic diagram of a document privacy protection system based on the centralized management of organizational information in one embodiment of the present invention.

[0026] Figure 2 is a flowchart of a document privacy protection method based on the centralized management of organizational information in one embodiment of the present invention.

[0027] The reference numerals in the above figures:

[0028] Privacy protection system 100, organization information authentication server 10, integrated document editor 20, privacy protection engine 30, digital identifier management module 40, individual data package 11, family data package 12, enterprise data package 13, community data package 14, government data package 15, identification unit 31, one-click hiding unit 32, association binding unit 33, index mapping table 34; S1 to S5 are steps.

Detailed Implementation

[0029] To better understand this invention, the following embodiments are provided in conjunction with the accompanying drawings. It should be understood that the embodiments of this invention are for illustrative purposes only and not for limiting the invention; the scope of protection of this invention is defined solely by the claims. The embodiments provided are merely preferred embodiments and are not intended to limit the invention in any way. Those skilled in the art can make changes, equivalent substitutions, or modifications based on the content of this invention to form different implementations. However, any changes and modifications, and any equivalent substitutions made to the method of this invention without departing from the inventive concept are within the scope of protection of this invention.

[0030] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0031] It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0032] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of exemplary embodiments according to the invention. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms “comprising” and / or “including” are used in this specification, they indicate the presence of features, steps, operations, and / or combinations thereof.

[0033] First, the "Five-Data Package" system proposed in this invention does not simply divide data into five categories, but rather constructs a global organizational information digital identification framework covering five major social entities: individuals, families, enterprises, communities, and governments. The core of this framework lies in encapsulating all key data generated by each social entity in its social, economic, and political activities into corresponding data packages—namely, individual data packages, family data packages, enterprise data packages, community data packages, and government data packages—through rigidly bound digital identifiers, thereby achieving the inseparability of entity identity and data content at the data source.

[0034] Please refer to Figure 1, which is a schematic diagram of a document privacy protection system 100 based on centralized management of organizational information according to an embodiment of the present invention. As shown in Figure 1, the document privacy protection system 100 includes: an organization information authentication server 10, an integrated document editor 20, a privacy protection engine 30, and a digital identifier management module 40.

[0035] The aforementioned organization information authentication server 10 is configured to store organization information that has been verified by real name. The organization information is divided into multiple data packages according to the subject type, such as individual data package 11, family data package 12, enterprise data package 13, community data package 14, and government data package 15. Each data package is bound to a unique corresponding digital identifier.

[0036] The aforementioned organization information authentication server 10 is the data core and root of trust of this invention. It is not configured as simple centralized storage; rather, it strictly divides the information of all entities in society that require privacy protection into five independent data packages based on their legal attributes and social roles. Individual data package 11 is used to store organizational information containing the identity information of individuals who have been verified by real name, and establishes a mapping relationship with the personal electronic seal issued to the individual by a nationally recognized digital certificate authority. Family data package 12 is used to collect information on family members and shared assets, and establishes a mapping relationship with the family electronic signature seal generated based on the composite authorization of the members' personal CA certificates. Enterprise data package 13 is used to manage the legal person qualification and operational information of enterprises, and establishes a mapping relationship with the enterprise electronic seal issued by the enterprise digital certificate authority. Community data package 14 is used to store information on community and rural collective organizations, and establishes a mapping relationship with the electronic seal issued by the corresponding digital certificate authority of the community or rural collective organization. Government data package 15 is used to carry public service information of government agencies, and establishes a mapping relationship with the government electronic seal issued by the government digital certificate authority.

[0037] This "five-package separation, unified management" architecture enables each type of organizational information to establish a mapping relationship with its unique digital identifier (electronic signature / tag / QR code) at the source, forming a rigid binding. This provides a reliable data foundation for real-time identification, precise hiding, and controllable restoration of permissions during the subsequent document editing process.
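The binding between each data package and its digital identifier can be illustrated with a small in-memory model. This is a minimal sketch, not the patented implementation: the names `PackageType`, `DataPackage`, `AuthServer`, the `stamp_id` field, and all sample values are hypothetical, and CA verification is left out entirely.

```python
from dataclasses import dataclass, field
from enum import Enum


class PackageType(Enum):
    INDIVIDUAL = "individual"
    FAMILY = "family"
    ENTERPRISE = "enterprise"
    COMMUNITY = "community"
    GOVERNMENT = "government"


@dataclass
class DataPackage:
    package_type: PackageType
    stamp_id: str  # unique digital identifier bound to this package (hypothetical format)
    records: dict[str, str] = field(default_factory=dict)  # info category -> value


class AuthServer:
    """Toy stand-in for the organizational information authentication server."""

    def __init__(self) -> None:
        self._by_stamp: dict[str, DataPackage] = {}

    def register(self, package: DataPackage) -> None:
        # Enforce the one-to-one binding between a package and its identifier.
        if package.stamp_id in self._by_stamp:
            raise ValueError(f"stamp_id already bound: {package.stamp_id}")
        self._by_stamp[package.stamp_id] = package

    def lookup(self, stamp_id: str) -> DataPackage:
        return self._by_stamp[stamp_id]


server = AuthServer()
server.register(DataPackage(PackageType.INDIVIDUAL, "STAMP-IND-001",
                            {"id_card_number": "110101199003070011"}))
server.register(DataPackage(PackageType.ENTERPRISE, "STAMP-ENT-001",
                            {"unified_social_credit_code": "91110000MA01ABCD2X"}))
```

Rejecting a second registration under the same `stamp_id` is one simple way to express the "rigid binding" the paragraph describes; a real deployment would tie that check to the CA-issued credential rather than to a string key.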

[0038] The aforementioned integrated document editor 20 is communicatively connected to the aforementioned organization information authentication server 10, providing a text editing interface and calling the aforementioned organization information as a reference library in real time. The integrated document editor 20 is the user interaction entry point and privacy protection trigger of this invention. It establishes a real-time communication connection with the aforementioned organization information authentication server 10. In addition to providing a regular text editing interface, it can dynamically load organization information from the aforementioned organization information authentication server 10 based on the currently logged-in user's identity and permissions, using the associated data packages 11, 12, 13, 14, or 15 as a real-time reference library. This means that when a user enters text in the editor, the editor's backend continuously compares the entered text content with the real-name authenticated organization information (such as personal ID numbers, unified social credit codes, family property information, etc.) stored on the server. This upgrades the integrated document editor 20 from a traditional "passive recording tool" to an intelligent creation platform with "actively identifying sensitive information capabilities," providing an accurate identification basis for subsequent one-click privacy hiding.
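Loading a per-user reference library according to identity and permissions can be sketched as a filter over the server's packages. Everything named here (`SERVER_PACKAGES`, `USER_PERMISSIONS`, `load_reference_library`, the sample users and values) is a hypothetical illustration; the source does not specify the permission model or the storage layout.

```python
# Hypothetical server contents: stamp_id -> (package type, {info category: value}).
SERVER_PACKAGES = {
    "STAMP-IND-001": ("individual", {"id_card_number": "110101199003070011"}),
    "STAMP-FAM-001": ("family", {"property_certificate_number": "BJ-2020-0012345"}),
    "STAMP-ENT-001": ("enterprise", {"unified_social_credit_code": "91110000MA01ABCD2X"}),
}

# Hypothetical permission table: user -> stamp_ids that user may reference.
USER_PERMISSIONS = {
    "alice": {"STAMP-IND-001", "STAMP-FAM-001"},
    "bob": {"STAMP-ENT-001"},
}


def load_reference_library(user_id: str) -> dict[str, tuple[str, str, str]]:
    """Return value -> (package type, info category, stamp_id) for one user."""
    library = {}
    for stamp_id in USER_PERMISSIONS.get(user_id, set()):
        package_type, records = SERVER_PACKAGES[stamp_id]
        for category, value in records.items():
            library[value] = (package_type, category, stamp_id)
    return library


alice_library = load_reference_library("alice")
```

Keying the library by value makes the later exact-match lookup a single dictionary access; fuzzy or semantic matching would need a different index.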

[0039] The aforementioned privacy protection engine 30 is connected to both the aforementioned integrated document editor 20 and the organization information authentication server 10, and is used to identify and hide sensitive information in the edited text while establishing a mapping relationship between the sensitive information and the aforementioned digital identifiers. The privacy protection engine 30 is the core processing unit of this invention and performs the three-in-one function of "identification, replacement, binding". First, it receives text content from the integrated document editor 20 in real time and, through intelligent matching against the reference library in the organization information authentication server 10, accurately identifies sensitive information in the text (such as ID card numbers, enterprise qualification numbers, and property certificate numbers) that falls within the scope of the individual, family, enterprise, community, or government data package. Second, in response to the user-triggered one-click hiding command, it uniformly replaces all identified sensitive fields with structured placeholders (such as "{{individual data package: ID card number}}"), forming privacy-cleaned text that can be published externally or input into AI large models. Finally, it establishes a three-way mapping relationship in the background database, associating the placeholders, the replaced original sensitive information, and the unique digital identifier (electronic signature / encrypted label / QR code) bound to the data package. This allows the cleaned document not only to completely shield private content on the surface, but also to trace and restore the original information accurately in lawful scenarios, with the digital identifier serving as the authorization key, thereby achieving a privacy protection mechanism that is "invisible on the surface and controllable at the bottom layer".

[0040] The aforementioned digital identifier management module 40 is used to visually encapsulate and output the processed text and the aforementioned digital identifier. The aforementioned digital identifier management module 40 is the output encapsulation and permission visualization unit of this invention, and its core function is to integrate and output the purified text processed by the aforementioned privacy protection engine 30 with the corresponding digital identifier.

[0041] Specifically, when generating the final document (such as a PDF, webpage, or structured data file), the aforementioned digital identifier management module 40 not only retains the privacy-cleaned text, including structured placeholders, as the surface content, but also embeds or attaches, in visual graphic form, the digital identifiers (i.e., electronic signatures, encrypted feature labels, or QR codes) bound to the individual, family, enterprise, community, or government data packages associated with this text, forming a two-layer encapsulation structure of "text + signature". This encapsulation gives the output document dual attributes. When it is released externally or submitted to an AI large model, third parties can only read the placeholder text with the sensitive information hidden and cannot obtain the original organizational information, while the digital identifier attached to the document serves as the only authorization entry point. Those with legitimate permissions can scan the QR code or verify the electronic signature to initiate a request to the aforementioned organization information authentication server 10, which accurately restores the hidden original information based on the mapping relationship established in the background, thereby unifying "clean surface dissemination" with "controllable underlying data".
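The authorized restoration path, together with the audit logging mentioned among the beneficial effects, can be sketched as follows. This is an illustrative assumption, not the source's protocol: `INDEX_TABLE`, `AUTHORIZED`, `AUDIT_LOG`, `restore`, and the placeholder syntax are hypothetical, and identity verification (QR scan, signature check against a CA) is reduced to a simple membership test.

```python
from datetime import datetime, timezone

# Hypothetical server-side state.
INDEX_TABLE = {
    "{{individual:id_card_number#1}}": {
        "raw": "110101199003070011",
        "stamp_id": "STAMP-IND-001",
    },
}
AUTHORIZED = {"STAMP-IND-001": {"alice"}}  # stamp_id -> users allowed to restore
AUDIT_LOG = []


def restore(text: str, requester: str) -> str:
    """Put back the originals the requester is authorized for; log every attempt."""
    for placeholder, entry in INDEX_TABLE.items():
        if placeholder not in text:
            continue
        allowed = requester in AUTHORIZED.get(entry["stamp_id"], set())
        AUDIT_LOG.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "requester": requester,
            "placeholder": placeholder,
            "granted": allowed,
        })
        if allowed:
            text = text.replace(placeholder, entry["raw"])
    return text


clean = "Applicant ID {{individual:id_card_number#1}} signed."
```

Denied requests are logged as well as granted ones, so the log reflects every access attempt rather than only successful restorations.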

[0042] The comparison database stored in the organization information authentication server 10 of the present invention is essentially an "organization information sieve" that has undergone multi-CA authentication and multi-factor composite verification. Its construction process is as follows.

[0043] First, the system connects to various nationally recognized digital certificate authorities, including personal CAs, enterprise CAs, and government CAs, to conduct multi-factor composite authentication for five types of entities: individuals, families, enterprises, communities, and governments. Specifically, personal authentication includes at least ID card verification, facial recognition, and binding of a personal digital certificate; enterprise authentication includes at least a unified social credit code, legal representative identity, and binding of a corporate digital certificate; family authentication is based on composite authorization of members' personal CA certificates and proof of shared assets; community authentication is based on the rural collective economic organization registration certificate and its corresponding CA certificate; and government authentication is based on the organization code certificate and the government CA certificate.

[0044] After each type of entity is authenticated, all its organizational information (such as identity information, qualification information, asset information, and authorization relationships) is categorized into corresponding data packages (individual, family, enterprise, community, and government data packages), and a mapping relationship is established with the corresponding legitimate electronic signature of that entity. The above-mentioned comparison database is like a fine sieve: the longitude lines are the five data package categories (individual, family, enterprise, community, and government), the latitude lines are the multi-element information (identity, qualification, assets, and relationships) under each entity, the mesh size is defined by the matching rules of the identification units (regular expressions, deep learning models, and similarity thresholds), and the mesh material is based on the trusted foundation of electronic signatures issued by multiple CAs.

[0045] When a user edits a document, the aforementioned integrated document editor 20 streams the input text through the sieve in real time. Any text field caught by the mesh (i.e., matching authenticated information in the database) is tagged with a structured label by the identification unit 31 (the label includes at least the data package type, information category, and corresponding unique digital identifier). This process is equivalent to "hanging" the sensitive information that requires privacy protection on the sieve.
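The "mesh" test, in the terms of [0015] (a fragment t, the set D, a similarity function Sim, and a threshold θ), can be sketched with a stand-in similarity measure. The source does not specify Sim; `difflib.SequenceMatcher` is substituted here purely for illustration, and the function names and the default threshold of 0.9 are assumptions.

```python
from difflib import SequenceMatcher


def sim(t: str, d: str) -> float:
    """Stand-in similarity in [0, 1]; the source leaves Sim unspecified."""
    return SequenceMatcher(None, t, d).ratio()


def is_sensitive(t: str, library: list[str], theta: float = 0.9) -> bool:
    """True when at least one entry of D is similar enough to the fragment t."""
    return any(sim(t, d) >= theta for d in library)


# Hypothetical authenticated values held in the reference library.
D = ["110101199003070011", "91110000MA01ABCD2X"]
```

A character-level ratio only catches near-literal matches. Semantic cases such as the property example in the description would need the embedding-based model the text mentions, with Sim computed over vectors instead of strings.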

[0046] When a user triggers the one-click hide command, the aforementioned privacy protection engine 30 only needs to batch-replace the already tagged fields with placeholders. Because recognition was completed during editing, no recognition pass is needed at this point, which allows a millisecond-level privacy protection response. The mapping relationship among the replaced original information, the placeholders, and the digital identifiers is stored in the index mapping table 34 by the association binding unit 33 for subsequent authorized restoration.
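The batch replacement and the construction of the index mapping table can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `hide_all`, the label tuple layout, and the placeholder syntax are hypothetical; labels are assumed not to overlap; and the raw value is stored in plain form, whereas the source describes encryption with a server key.

```python
def hide_all(text, labels):
    """Replace every tagged span with a placeholder and build the mapping table.

    labels: list of (start, end, package_type, category, stamp_id) tuples,
    assumed non-overlapping.
    """
    index_table = {}
    pieces, cursor = [], 0
    for n, (start, end, package_type, category, stamp_id) in enumerate(sorted(labels), 1):
        placeholder = "{{" + f"{package_type}:{category}#{n}" + "}}"
        # Three-way binding: placeholder -> original value + digital identifier.
        # (Encryption of the stored value is omitted in this sketch.)
        index_table[placeholder] = {"raw": text[start:end], "stamp_id": stamp_id}
        pieces += [text[cursor:start], placeholder]
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), index_table


text = "Applicant ID 110101199003070011 signed."
start = text.index("110101199003070011")
labels = [(start, start + 18, "individual", "id_card_number", "STAMP-IND-001")]
clean_text, index_table = hide_all(text, labels)
```

Since the spans were computed during editing, the hide step is a single linear pass over the text, which is consistent with the fast response the paragraph describes.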

[0047] Through the aforementioned "sieve" mechanism, this invention achieves full-process privacy protection through "precise pre-marking, one-click batch hiding, and reliable authorization restoration".

[0048] It is worth noting that the aforementioned privacy protection engine 30 specifically includes: an identification unit 31, a one-click hiding unit 32, an association binding unit 33, and an index mapping table 34.

[0049] The aforementioned identification unit 31 is used to perform multi-dimensional matching between text fields and organizational information in the aforementioned comparison library based on regular expressions and deep learning models. Upon successful matching, a structured tag is attached to the text field. This structured tag includes at least the data packet type, information category, and corresponding unique numerical identifier of the sensitive information. The aforementioned one-click hiding unit 32 is used to respond to user triggering by uniformly replacing all text fields with the attached structured tags with structured placeholders. The aforementioned association and binding unit 33 is used to establish an index mapping table 34 between the structured placeholders, the replaced original organizational information, and the aforementioned numerical identifier.
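The tagging behaviour of the identification unit 31 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `LibraryEntry`, `TaggedField`, `tag_fields`, and the `ID_NUMBER` pattern are hypothetical, the regular expression is a simplified format check for an 18-character ID number, and a literal string search stands in for the deep learning model described in the text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    """One authenticated item in the comparison library (hypothetical layout)."""
    package_type: str   # e.g. "individual", "family", "enterprise"
    category: str       # e.g. "ID number", "name"
    value: str          # the authenticated organizational information
    stamp_id: str       # unique number of the bound digital identifier


@dataclass(frozen=True)
class TaggedField:
    """A text span plus the structured tag attached to it."""
    start: int
    end: int
    text: str
    package_type: str
    category: str
    stamp_id: str


# Simplified fixed-format pattern: 17 digits followed by a digit or X.
ID_NUMBER = re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")


def tag_fields(text, library):
    """Return tagged spans for every library entry found in the text."""
    tags = []
    by_value = {entry.value: entry for entry in library}

    # Pass 1: regex locates fixed-format candidates, the library confirms them.
    for m in ID_NUMBER.finditer(text):
        entry = by_value.get(m.group())
        if entry is not None:
            tags.append(TaggedField(m.start(), m.end(), m.group(),
                                    entry.package_type, entry.category,
                                    entry.stamp_id))

    # Pass 2: literal search for the remaining entries
    # (a stand-in for the semantic model, not an implementation of it).
    for entry in library:
        if ID_NUMBER.fullmatch(entry.value):
            continue
        for m in re.finditer(re.escape(entry.value), text):
            tags.append(TaggedField(m.start(), m.end(), m.group(),
                                    entry.package_type, entry.category,
                                    entry.stamp_id))

    return sorted(tags, key=lambda t: t.start)
```

Each returned `TaggedField` carries the three tag elements named above (data packet type, information category, unique numerical identifier), so a later hiding step can work from the tags alone without re-running recognition.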

[0050] The aforementioned identification unit 31, the aforementioned one-click hiding unit 32, and the aforementioned association binding unit 33 together constitute the core functional chain of the aforementioned privacy protection engine 30, forming a complete closed loop of "accurate identification - quick replacement - reliable binding".

[0051] The aforementioned identification unit 31 employs a dual matching strategy combining regular expressions and a deep learning model. The regular expressions are responsible for quickly locating sensitive fields with fixed formats, such as ID card numbers and unified social credit codes. The deep learning model is a semantic similarity calculation model based on Bidirectional Encoder Representations from Transformers (BERT), a Robustly Optimized BERT Pretraining Approach (RoBERTa), or a variant thereof. Its training data includes at least anonymized organizational information samples and their corresponding semantic variants from the aforementioned organizational information authentication server. The deep learning model then identifies implicit organizational information in unstructured text based on semantic understanding (e.g., matching "my three-bedroom apartment in Chaoyang District" with property certificate information). Through multi-dimensional matching, it accurately compares and labels text fields with organizational information in the reference database of the aforementioned organizational information authentication server 10.

[0052] The aforementioned one-click hiding unit 32 responds to a user-triggered hiding command (such as clicking the "AI Privacy Protection" button or pressing a shortcut key) by uniformly replacing all marked sensitive fields with structured placeholders (e.g., "{{individual data package: ID card number}}" and "{{enterprise data package: unified social credit code}}"), forming privacy-cleaned text that can be safely disseminated. The aforementioned association binding unit 33 establishes and maintains the aforementioned index mapping table 34 in the background database; the index mapping table 34 also records every access request to the mapping relationship, forming an audit log.
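The batch replacement performed by the one-click hiding unit 32 might look like the following Python sketch. The names `TaggedField` and `hide_all` are hypothetical, and the `#n` sequence suffix inside each placeholder is an assumption added here so that two fields of the same category still yield distinct index keys; the source does not specify how such collisions are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedField:
    """A pre-tagged span in the document (hypothetical layout)."""
    start: int
    end: int
    package_type: str
    category: str
    stamp_id: str


def hide_all(text, tags):
    """Replace every tagged span with a placeholder in one pass.

    Returns the cleaned text and the rows to be bound in the mapping table.
    """
    pieces = []
    mapping = []
    cursor = 0
    for n, tag in enumerate(sorted(tags, key=lambda t: t.start), start=1):
        if tag.start < cursor:
            continue  # skip a span that overlaps one already replaced
        # "#n" is an assumed disambiguator, not part of the source format.
        placeholder = "{{" + f"{tag.package_type}: {tag.category} #{n}" + "}}"
        pieces.append(text[cursor:tag.start])
        pieces.append(placeholder)
        mapping.append({
            "placeholder": placeholder,
            "raw_data": text[tag.start:tag.end],
            "stamp_id": tag.stamp_id,
        })
        cursor = tag.end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping
```

Because the spans were tagged while the user typed, this step is a single linear pass over the text, which is consistent with the fast response the description attributes to pre-marking.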

[0053] The aforementioned index mapping table 34 uses structured placeholders as index keys and records, for each placeholder, the original sensitive information, which of the five data packet types the information belongs to, and the storage path or encryption credential of the unique digital identifier (electronic signature / encrypted label / QR code) bound to that data packet. This ensures that the purified document completely shields private content when it is released to the public, while in legally authorized scenarios (such as after identity verification through digital identifiers) the original information can be accurately restored based on the index mapping table 34, achieving the privacy protection goal of "surface purification and underlying traceability".
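A minimal in-memory sketch of such an index mapping table, including the audit log of access requests, is given below in Python. The class and field names (`MappingEntry`, `IndexMappingTable`, `stamp_ref`) are hypothetical, and the boolean `authorized` argument is a placeholder for the digital-identifier verification, which is not modelled here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MappingEntry:
    raw_data: str       # original sensitive information
    package_type: str   # one of the five data packet types
    stamp_ref: str      # storage path or credential of the digital identifier


@dataclass
class IndexMappingTable:
    entries: dict = field(default_factory=dict)    # placeholder -> MappingEntry
    audit_log: list = field(default_factory=list)  # one record per request

    def bind(self, placeholder, entry):
        """Record the mapping for one placeholder."""
        self.entries[placeholder] = entry

    def restore(self, placeholder, requester, authorized):
        """Return the original information only for an authorized request.

        Every request is logged, whether or not it is granted.
        """
        entry = self.entries.get(placeholder)
        granted = bool(authorized) and entry is not None
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "placeholder": placeholder,
            "requester": requester,
            "granted": granted,
        })
        return entry.raw_data if granted else None
```

Logging refused requests as well as granted ones is a design choice of this sketch; the description only states that every access request to the mapping relationship is recorded.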

[0054] In one embodiment of the present invention, the aforementioned digital identifier is at least one of electronic signature, encrypted feature label, and QR code; the aforementioned electronic signature is issued by a nationally recognized digital certificate authority corresponding to the information subject type of each organization, and its generation and verification rely on asymmetric encryption algorithms; the aforementioned privacy protection system 100 interfaces with the aforementioned digital certificate authority to verify the validity of the seal / signature and establish a mapping relationship.

[0055] For example, the aforementioned digital identifier is the authorization carrier and trust anchor of this invention. Its specific form is at least one of an electronic signature, an encrypted feature label, and a QR code. The entire process of its generation and verification relies on asymmetric encryption algorithms (such as the SM2 Elliptic Curve Public Key Cryptography Algorithm (China State Commercial Cryptography Administration Office) or the RSA asymmetric encryption algorithm), thereby constructing a "non-forgeable and non-repudiable" secure trust system. During the generation phase, the system uses a private key to digitally sign the verified organizational information (such as the personal identity information corresponding to the individual data package and the enterprise qualification information corresponding to the enterprise data package), and encodes the signature result into the graphic features of an electronic seal, the data field of an encrypted label, or the matrix pattern of a QR code, so that each digital identifier establishes a mapping relationship with a specific data package subject, forming a rigid binding. During the verification phase, any third party holding the public key can scan or parse the digital identifier to verify whether the associated organizational information was issued by a legitimate certification authority and has not been tampered with, but cannot directly obtain the original sensitive data. To view the hidden original organizational information, the requester must use the digital identifier as an authorization credential, combined with the holder's identity authentication (such as facial recognition or a private key signature), and send a request to the aforementioned organizational information authentication server 10. After the server verifies the request, it accurately returns the original information according to the aforementioned index mapping table 34, thereby realizing a unified mechanism for privacy protection and access control that ensures "identifiers are publicly verifiable and data authorization is traceable".

[0056] Please refer to Figure 2, which is a flowchart of a document privacy protection method based on centralized organizational information management in one embodiment of the present invention. As shown in Figure 2, the present invention provides a document privacy protection method based on centralized management of organizational information, comprising the following steps:
S1. By connecting to multiple nationally recognized digital certificate authorities and performing multi-factor composite authentication on each entity, a centralized authentication database of multiple data packets is established; organizational information is classified and authenticated according to the entity, and a mapping relationship is established with the corresponding digital identifiers issued by the CAs;
S2. The above-mentioned integrated document editor is launched, and the multiple data packet reference database associated with the current user is loaded;
S3. The edited text is identified in real time, and the text fields that match the reference database are attached with structured tags and marked as items to be hidden;
S4. In response to the one-click hide command, all items to be hidden that carry the above-mentioned structured tags are replaced with placeholders, and a mapping relationship is established among the placeholders, the original information, and the digital identifiers;
S5. The text including the placeholders and the above-mentioned digital identifiers is visually encapsulated and output.

[0057] It is worth noting that step S1 of this invention, in the generation and authentication of digital identifiers, adopts a multi-CA, multi-element, and multi-platform collaborative mechanism for authenticating and certifying global organizational information. Specifically, the system does not rely on a single digital certificate authority or a single authentication source, but integrates multiple nationally recognized CA security authentication systems (including the online resident identity card function certificate of public security departments such as the Ministry of Public Security of the People's Republic of China, the electronic business license service area of market supervision departments, and the CA security authentication systems of various industry authorities) and dynamically adopts multi-dimensional combinations of authentication elements according to the type of social entity: for individuals, real-name authentication is performed by integrating multiple elements such as identity documents, biometrics, and mobile phone numbers; for enterprises, multiple elements such as the unified social credit code, legal person information, and corporate accounts are verified to confirm their legal identity; for families, family units are bound through proof of family member identity; and for government entities, authentication is performed based on authorization credentials from the government system and the identity information of public officials. Through this multi-CA interoperability and mutual recognition, multi-element cross-verification, and multi-platform data collaboration, unique and unforgeable electronic signatures (digital identifiers) are generated for all types of organizations, including individuals, families, enterprises, communities, and governments, ensuring that each signature is cryptographically bound to the corresponding entity's global organizational information.

[0058] It is worth noting that, in one embodiment of the present invention, the specific logic for real-time identification in step S3 above satisfies the following formula:

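[0059] $$M(t)=\begin{cases}\text{True}, & \exists\, d\in D:\ \mathrm{Sim}(t,d)\ge\theta\\ \text{False}, & \text{otherwise}\end{cases}$$ (Equation 1)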

[0060] In the formula, M(t) represents whether the text fragment t is sensitive information, D is the set of organizational information in the above-mentioned organizational information authentication server 10, Sim(t,d) is the similarity calculation function, and θ is the preset matching threshold.

[0061] In this embodiment, a mathematical formula is used to precisely define how the identification unit 31 determines whether a text fragment is sensitive information. The core of this embodiment lies in introducing a dual determination mechanism of similarity calculation function and preset threshold.

[0062] For example, M(t) represents the Boolean result of judging whether text fragment t is sensitive information. D is the complete set of real-name authenticated organizational information stored in the aforementioned organizational information authentication server 10 (covering all authentication fields, such as ID card numbers, unified social credit codes, and property certificate numbers, in the data packages of individuals, families, enterprises, social organizations, and government agencies). Sim(t,d) is a similarity calculation function that can be implemented based on edit distance, cosine similarity, or deep learning semantic matching models, and is used to quantify the degree of matching between text fragment t and a piece of organizational information d in the reference library. θ is a preset matching threshold that can be dynamically adjusted according to the data type: for example, a high threshold of 0.99 is set for fields with fixed formats such as ID card numbers, and a medium threshold of 0.85 is set for semantically ambiguous fields such as addresses. Text fragment t is judged as sensitive information (returns True) if and only if there exists at least one piece of organizational information d whose similarity to t is greater than or equal to the threshold θ; otherwise, it is judged as non-sensitive information (returns False).
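The decision rule described above translates directly into a few lines of Python. This is a sketch only: the function names `sim` and `is_sensitive` are hypothetical, and `difflib.SequenceMatcher.ratio()` from the standard library is used as a stand-in similarity function, since the text leaves the choice among edit distance, cosine similarity, and semantic models open.

```python
from difflib import SequenceMatcher


def sim(t, d):
    """Stand-in similarity in [0, 1]; any Sim(t, d) could be substituted."""
    return SequenceMatcher(None, t, d).ratio()


def is_sensitive(t, library, theta):
    """M(t): True iff some d in the library satisfies Sim(t, d) >= theta."""
    return any(sim(t, d) >= theta for d in library)
```

With an empty library the function returns False for every fragment, which matches the "there exists at least one d" condition.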

[0063] This formulaic definition transforms the subjective "sensitive information identification" into an objective and quantifiable calculation process, ensuring both the accuracy and consistency of identification, and providing a clear and programmable trigger basis for the subsequent one-click hiding unit 32.

[0064] In one embodiment of the present invention, the Sim(t,d) function can be a combination of multiple models (e.g., for fixed-format fields, regular expressions + edit distance are used; for semantic fields, deep learning models such as BERT are used to calculate vector similarity), and the threshold θ can be dynamically adjusted (e.g., different thresholds are set for different data packet types and different fields). For example, for fixed-format fields such as ID card numbers and unified social credit codes, regular expressions are used for precise matching, and the similarity is set to 1 or 0; for fields such as addresses and names, similarity calculation based on edit distance (Levenshtein Distance) is used, and a preset threshold of 0.85 is set; for more complex semantic information, a BERT-based semantic similarity model is used to convert text fragments and reference library information into vectors respectively, calculate cosine similarity, and set a preset threshold of 0.78.
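The combination of matching strategies and per-field thresholds can be sketched in Python as follows. The dictionary `STRATEGIES`, its keys, and the function names are hypothetical. The thresholds 0.85 and 0.78 are taken from the text, while 0.99 for fixed-format fields follows the earlier ID-card example. The bag-of-words cosine similarity is only a lightweight stand-in for BERT sentence vectors, which are not computed here.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def exact_similarity(a, b):
    """Fixed-format fields: similarity is 1 or 0."""
    return 1.0 if a == b else 0.0


def edit_similarity(a, b):
    """Edit distance normalised to [0, 1] by the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(a, b):
    """Cosine of word-count vectors (stand-in for embedding vectors)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


# field kind -> (similarity function, threshold theta)
STRATEGIES = {
    "fixed_format": (exact_similarity, 0.99),
    "address_or_name": (edit_similarity, 0.85),
    "semantic": (cosine_similarity, 0.78),
}


def matches(t, d, kind):
    """Apply the strategy and threshold configured for this field kind."""
    sim, theta = STRATEGIES[kind]
    return sim(t, d) >= theta
```

Keeping the similarity function and its threshold together in one table makes it straightforward to adjust θ per data packet type or per field, as the embodiment suggests.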

[0065] It is worth noting that, in one embodiment of the present invention, the data structure for establishing the mapping relationship in step S4 above satisfies the following formula:

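[0066] $$\mathrm{Index}(P)=\mathrm{Enc}_{\text{server private key}}\big(\text{user identifier},\ \mathrm{RawData},\ \mathrm{StampID}\big)$$ (Equation 2)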

[0067] In the formula, Index(P) is the index value corresponding to the placeholder P; the encryption function is performed using the server's private key; the user identifier identifies the current user; RawData is the original sensitive information; and StampID is the associated unique numerical identifier.

[0068] In this embodiment, a mathematical formula is used to define how the aforementioned association binding unit 33 securely binds the placeholder, the original sensitive information, and the digital identifier. The core of this embodiment lies in using a server private key encryption mechanism to ensure the immutability and traceability of the mapping relationship. Index(P) represents the unique encrypted index value corresponding to the placeholder P in the aforementioned index mapping table 34. The encryption function is an asymmetric encryption function that encrypts using the private key of the aforementioned organization information authentication server 10. The user identifier is the current user's identity and is used for permission verification. RawData is the original sensitive information that is being replaced (such as the ID number "410***************"). StampID is the unique digital identifier bound to the data packet to which this sensitive information belongs (such as the serial number of a personal electronic seal).

[0069] Through this encrypted index structure, the system does not store plaintext mapping relationships in the background, but rather ciphertext index values encrypted with the private key. Even if an external attacker obtains the index mapping table 34, they cannot reconstruct the original information. In legitimately authorized scenarios (such as after a user verifies their identity via a digital identifier), the system uses the corresponding public key to decrypt Index(P), extracts the user identifier, and confirms that it matches the requester before RawData is released for the user to view. This ensures the "surface cleanliness of documents" and provides tamper-proof and auditable technical support for the "traceability of underlying data".

[0070] It is worth noting that the aforementioned Index(P) or mapping relationship is primarily stored in the index mapping table 34 of the aforementioned organization information authentication server 10. The underlying "pointer link" of the aforementioned document only includes a unique, unpredictable "reference ID" pointing to a specific mapping entry on that server, rather than including the encrypted original data itself.

[0071] In one embodiment of the present invention, a user scans the digital identifier on the document and sends a request to the server carrying the reference ID and the user's identity credentials. The aforementioned organization information authentication server 10 verifies the request, decrypts the data, and returns the original data. In this way, the encrypted original data never leaves the secure environment of the aforementioned organization information authentication server 10 until an authorized request is served, which conforms to the core principle of "data usable but not visible" and also mitigates the risk of offline cracking.
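The reference-ID lookup can be sketched as a small server-side class in Python. The names `AuthServer`, `register`, and `resolve` are hypothetical, `secrets.token_urlsafe` is one way (assumed here) to obtain an unpredictable reference ID, and the boolean `credential_ok` stands in for verification of the user's identity credentials. Encryption of the stored data is omitted from the sketch.

```python
import secrets


class AuthServer:
    """Holds mapping entries; documents carry only opaque reference IDs."""

    def __init__(self):
        # reference ID -> (set of user IDs allowed to restore, raw data)
        self._entries = {}

    def register(self, allowed_users, raw_data):
        """Store one mapping entry and return its unpredictable reference ID."""
        reference_id = secrets.token_urlsafe(16)
        self._entries[reference_id] = (frozenset(allowed_users), raw_data)
        return reference_id

    def resolve(self, reference_id, user_id, credential_ok):
        """Return the original data only for a verified, permitted user."""
        record = self._entries.get(reference_id)
        if record is None or not credential_ok:
            return None
        allowed_users, raw_data = record
        return raw_data if user_id in allowed_users else None
```

Since the document embeds only the reference ID, a party that copies the document offline obtains nothing to attack except an opaque token.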

[0072] It is worth noting that, in one embodiment of the present invention, when visualizing the encapsulation output in step S5 above, the output document has a two-layer structure: the top layer is readable text including placeholders and graphic codes of the above-mentioned numerical identifiers, and the bottom layer is a set of pointer links pointing to one or more mapping relationships in the above-mentioned organization information authentication server 10.

[0073] The visualization encapsulation output defined in step S5 of this embodiment constructs a two-layer document structure that resolves the contradiction between the "transmissibility" and "verifiability" of privacy documents. The surface layer is a public view that users can directly read or submit to an AI large model. It includes the structured placeholders (such as "{{individual data package: ID card number}}" and "{{enterprise data package: unified social credit code}}") substituted by the one-click hiding unit 32, as well as the graphic codes (i.e., electronic signature images, encrypted labels, or QR codes) of the digital identifiers of the data packets associated with the document. This surface layer ensures that any third party who obtains the document sees only the desensitized placeholder text and cannot access the original sensitive information. At the same time, the graphic codes of the digital identifiers serve as publicly visible access points. The underlying layer consists of invisible pointer links embedded in the document or stored in the document metadata. These links, in encrypted form, point to the index mapping table 34 established by the aforementioned association binding unit 33 in the organization information authentication server 10. Each placeholder maps precisely to its corresponding encrypted index value. When a user with legitimate permissions (such as one who has completed identity authentication through a digital identifier) scans the surface QR code or clicks the electronic signature, the system sends a request to the server through the underlying pointer link. The server decrypts the information based on the aforementioned index mapping table 34 and returns the corresponding original organization information. This achieves the dual goals of "surface-level public dissemination without leakage and underlying-level authorized access that can be restored." It satisfies the privacy protection needs in scenarios such as AI large model data upload and document disclosure, while retaining the ability to accurately trace the original information when necessary.
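One possible in-memory representation of this two-layer document is sketched below in Python. The class names `PointerLink` and `TwoLayerDocument`, the field names, and the JSON serialisation are all assumptions made for illustration; the description does not prescribe a file format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PointerLink:
    """Bottom-layer link from one placeholder to a server-side mapping entry."""
    placeholder: str
    reference_id: str   # opaque ID; the original data stays on the server


@dataclass
class TwoLayerDocument:
    surface_text: str                                      # text with placeholders
    identifier_codes: list = field(default_factory=list)   # e.g. QR code payloads
    pointer_links: list = field(default_factory=list)      # bottom layer

    def public_view(self):
        """What any recipient, or an AI large model, gets to read."""
        return self.surface_text

    def to_json(self):
        """Serialise both layers; no original sensitive value is included."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

In this sketch neither layer contains the original sensitive value: the surface layer holds placeholders, and the bottom layer holds only references that the server must resolve.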

[0074] In one embodiment of the present invention, the invention is also embedded into word processing software or web page editors as a plug-in. The present invention encapsulates core components such as an organizational information authentication server, a converged document editor, a privacy protection engine, and a digital identifier management module into lightweight plug-ins or extension components, enabling seamless integration into mainstream word processing software (such as Microsoft Word and WPS Office) or web page editors (such as online document browsers and multi-function text editors).

[0075] This plug-in deployment approach offers three key technological advantages: First, users can obtain "one-click hiding" privacy protection within a familiar software environment without changing their existing document editing habits, significantly reducing learning costs and promotion barriers. Second, the plug-in can access authentication data from the backend organization's information authentication server in real time, ensuring the accuracy and timeliness of sensitive information identification. Third, the plug-in architecture enables the technical solution of this invention to quickly adapt to the application needs of different platforms and scenarios. Whether for individual users, enterprise users, or government office scenarios, users can obtain the ability to protect privacy from the source of document editing through simple plug-in installation, thereby greatly enhancing the commercial value and industrial application prospects of this invention.

[0076] The core of this invention lies in the fact that only by using "global organizational information"—that is, the complete identity attribute information corresponding to individuals, families, enterprises, communities, governments, and other entities in legal and socio-economic activities—as a complete element for CA authentication and electronic signature generation can the technical goal of "one-click privacy protection" be truly achieved. If the digital identifier is only bound to partial information (e.g., only bound to an individual's ID number without binding to their family relationship, corporate position, or government status), then when a user uses the same identifier on different platforms and in different scenarios, the system will not be able to accurately determine the scope of permissions that the current operator should possess, nor will it be able to automatically implement the minimum necessary principle of privacy protection strategy in cross-domain data flow. Conversely, because each of the five data packets in this invention is bound to the global organizational information of the corresponding entity, the system can accurately identify the multiple identity attributes of the operating entity when receiving any data packet. Thus, when the user triggers the "one-click privacy protection" function, the system can automatically and accurately determine the accessibility scope of the current data in different scenarios based on the global organizational information embedded in the data packet, without requiring the user to repeatedly manually set permissions—this technical effect can only be achieved under the premise of constructing the digital identifier with global organizational information as a complete authentication element.

[0077] Example 1: Personal User Document Privacy Protection

[0078] This embodiment uses the example of an individual user, "Zhang San," using this system to write a document involving information about jointly owned family assets to illustrate the implementation process of the present invention in detail.

[0079] First, the privacy protection system 100 executes step S1 to establish a centralized authentication database of the five data packets. The privacy protection system 100 connects to multiple nationally recognized digital certificate authorities and performs multi-factor authentication on each entity, establishing a centralized authentication database of multiple data packets. Specifically, when Zhang San uses the system for the first time, he needs to complete multi-factor authentication of his personal identity. The privacy protection system 100 connects to the trusted identity authentication platform (CyberTrusted Identity (CTID) platform) of the Ministry of Public Security of the People's Republic of China, collecting Zhang San's ID card information, facial biometric features, and real-name mobile phone number. After cross-verification, a unique digital identifier for Zhang San is generated, and his identity information is stored in individual data packet 11. Simultaneously, the privacy protection system 100 connects to the electronic business license service area of the market supervision department to authenticate the information of the enterprise for which Zhang San is the legal representative, storing it in enterprise data packet 13. The privacy protection system 100 also connects to the civil affairs department and the real estate registration system to authenticate his family members and jointly owned property information, storing it in family data packet 12. Each of the above data packets is mapped to a corresponding digital identifier issued by a certificate authority. Individual data packet 11 is associated with Zhang San's personal electronic seal, enterprise data packet 13 is associated with the company's electronic seal, and family data packet 12 is associated with the family electronic signature seal generated based on the composite authorization of the family members' personal CA certificates.

[0080] Next, the system executes step S2, launching the integrated document editor 20 and loading the comparison database. After Zhang San logs into the system, he launches the integrated document editor 20. Based on Zhang San's login identity and permissions, the privacy protection system 100 dynamically loads the organizational information of the data packets associated with the user from the organizational information authentication server 10 as a real-time comparison database. At this time, the integrated document editor 20 loads the following comparison data in the background: individual data packet 11 includes Zhang San's name, ID number, mobile phone number, address, etc.; family data packet 12 includes family member Li Si, the address of the jointly owned family property, shared vehicle information, etc.; and enterprise data packet 13 includes the enterprise name, unified social credit code, registered address, etc.

[0081] Then, the system executes step S3 to identify the edited text in real time and attach structured tags. Zhang San writes text in the editor that involves personal identity information and jointly owned family assets. The privacy protection system 100 receives the text content through the privacy protection engine 30 and performs multi-dimensional matching of the text fields against the organization information in the reference library based on regular expressions and deep learning models. When "Zhang San" appears in the text, the privacy protection system 100 matches the personal name in individual data packet 11; when "110101199001011234" appears, it matches the ID number in individual data packet 11; when "Room 101, Building 1, XX Community, Chaoyang District, Beijing" appears, it matches the family property address in family data packet 12; when "Beijing Real Estate Certificate No. 123456" appears, it matches the real estate certificate number in family data packet 12; when "Li Si" appears, it matches the family member name in family data packet 12. After successful matching, the identification unit 31 attaches a structured tag to the text field; the tag includes the data packet type, the information category to which the sensitive information belongs, and the unique number of the corresponding digital identifier. For example, the tag attached to the ID number field includes individual data packet 11, the ID number information category, and the unique number of Zhang San's personal electronic seal.

[0082] After that, the privacy protection system 100 executes step S4 to respond to the one-key hiding instruction and establish a mapping relationship. After Zhang San finishes writing the document and is about to send it to the intermediary agency, to protect privacy, Zhang San clicks the one-key hiding button on the toolbar of the integrated document editor 20. In response to the user's trigger, the privacy protection system 100 uniformly replaces all text fields with attached structured tags with structured placeholders. The original "Zhang San" is replaced with "personal name", the ID number is replaced with "personal ID number", the family property address is replaced with "family property address", the real estate certificate number is replaced with "family real estate certificate number", and the family member name is replaced with "family member name". At the same time, the association binding unit 33 establishes an index mapping table 34 among the structured placeholder, the original organization information being replaced, and the digital identification. The index mapping table 34 is encrypted using the server private key to bind the user identity identifier, the original sensitive information, and the associated digital identification unique number, and is only stored in the organization information authentication server 10. The client does not retain any original sensitive information.

[0083] Finally, the system executes step S5 to perform visual encapsulation and output. The privacy protection system 100 visually encapsulates and outputs text including placeholders and numerical identifiers. The output document has a two-layer structure: the top layer is readable text including placeholders and a QR code graphic of Zhang San's electronic personal seal; the bottom layer is a pointer link to the mapping relationship in the organization information authentication server 10. After receiving the document, the recipient, i.e., the intermediary, needs to initiate an authorization request to the system through the numerical identifier if it needs to verify or restore sensitive information. After verifying the recipient's permissions, the privacy protection system 100 decrypts and retrieves the original information from the organization information authentication server 10 based on the index value of the placeholders, realizing on-demand and controllable information restoration.

[0084] Example 2: Privacy Protection of Enterprise User Contract Documents

[0085] This embodiment uses the example of a legal representative of a company, "XX Technology Co., Ltd.," using this system to draft a procurement contract between the company and its supplier to illustrate the implementation process of the present invention in detail.

[0086] First, the privacy protection system 100 executes step S1 to establish a centralized authentication database for the five data packages. When registering with the system, XX Technology Co., Ltd. needs to complete multi-factor composite authentication of its corporate entity. The privacy protection system 100 connects to the electronic business license service area of the market supervision department, verifying multiple elements such as the unified social credit code, legal person information, and corporate bank account. After authentication, an electronic seal is generated for the company, and the company's qualification information is stored in enterprise data package 13. Simultaneously, the company's legal representative, Mr. Wang, needs to complete personal identity authentication. His personal information is stored in individual data package 11, and an authorization relationship is established between it and the company's electronic seal.

[0087] Next, the system executes step S2, launching the integrated document editor 20 and loading the comparison database. After logging into the system, the legal representative, Mr. Wang, launches the integrated document editor 20. Based on Mr. Wang's login credentials, the privacy protection system 100 dynamically loads the associated data packages from the organization information authentication server 10 as a real-time comparison database. Since Mr. Wang has enterprise administrator privileges, the following comparison data is loaded in the background: enterprise data package 13 includes the enterprise name "XX Technology Co., Ltd.", unified social credit code, registered address, bank account information, etc.; individual data package 11 includes Mr. Wang's name, position, contact information, etc. In addition, because the contract involves supplier information, Mr. Wang manually authorizes the loading of the publicly available authentication information of the supplier company.

[0088] Then, the privacy protection system 100 executes step S3, recognizing the edited text in real time and attaching structured tags. Wang is writing a procurement contract in the integrated document editor 20, which includes corporate information of the purchaser and supplier. The privacy protection system 100 uses the privacy protection engine 30 to recognize sensitive information in the text in real time. When the purchaser's company name, unified social credit code, and registered address appear in the text, the privacy protection system 100 matches them with the information in the enterprise data package 13 and successfully recognizes them; when the supplier's company name, unified social credit code, and registered address appear in the text, the privacy protection system 100 matches them with the supplier's authentication information and successfully recognizes them. The recognition unit 31 attaches structured tags to the above fields, including the data package type (enterprise data package 13), the information category (company name, unified social credit code, or registered address), and the corresponding unique digital identifier (company electronic seal number).
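As a rough illustration of the tagging in step S3, the sketch below attaches a structured tag to every occurrence of a reference value in the text. It assumes exact string matching only, which is simpler than the matching the recognition unit 31 may actually perform; the `StructuredTag` class, the `tag_sensitive_fields` function, and the seal identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredTag:
    start: int          # offset of the matched field in the text
    end: int            # offset one past the matched field
    package_type: str   # data package type
    category: str       # information category
    stamp_id: str       # unique digital identifier (hypothetical format)


def tag_sensitive_fields(text, reference_library):
    """reference_library: iterable of (value, package_type, category, stamp_id)."""
    tags = []
    for value, package_type, category, stamp_id in reference_library:
        if not value:
            continue  # an empty reference value would match everywhere
        pos = text.find(value)
        while pos != -1:
            tags.append(StructuredTag(pos, pos + len(value),
                                      package_type, category, stamp_id))
            pos = text.find(value, pos + len(value))
    return sorted(tags, key=lambda t: t.start)


library = [
    ("XX Technology Co., Ltd.", "enterprise data package 13", "company name", "SEAL-XX"),
    ("YY Technology Co., Ltd.", "enterprise data package 13", "company name", "SEAL-YY"),
]
text = "Purchaser: XX Technology Co., Ltd. Supplier: YY Technology Co., Ltd."
tags = tag_sensitive_fields(text, library)
```

Each tag records where the field sits in the text, so a later step can replace it without searching again.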

[0089] Subsequently, the privacy protection system 100 executes step S4, responding to the one-click hide command and establishing a mapping relationship. After the contract is finalized, Mr. Wang prepares to send it to the supplier for confirmation. To protect the sensitive information of both parties, he clicks the one-click hide button. The privacy protection system 100 then replaces every sensitive field that carries a structured tag with a structured placeholder: the purchaser's company name is replaced with "purchaser's company name", its unified social credit code with "purchaser's unified social credit code", and its registered address with "purchaser's registered address"; the supplier's information is replaced with corresponding placeholders in the same way. At the same time, the association binding unit 33 establishes an index mapping table 34 between the placeholders and the original information. Because this embodiment involves two parties, the purchaser and the supplier, the privacy protection system 100 maps each party's information to that party's own digital identifier: the purchaser's information is mapped to the electronic seal of XX Technology Co., Ltd., and the supplier's information is mapped to the electronic seal of YY Technology Co., Ltd.
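The replacement and mapping in step S4 might look roughly like the sketch below. It is an illustration under stated assumptions: tags are plain dictionaries, the `hide_tagged_fields` and `make_tag` helpers are hypothetical, and the bracketed placeholder format with a `#n` suffix is an assumption added so that each placeholder is unique. The index mapping table 34 is modeled as an in-memory dictionary with no encryption.

```python
def make_tag(text, value, party, category, stamp_id):
    """Build a tag for the first occurrence of `value` (helper for the example)."""
    start = text.index(value)
    return {"start": start, "end": start + len(value),
            "party": party, "category": category, "stamp_id": stamp_id}


def hide_tagged_fields(text, tags):
    """Replace each tagged span with a placeholder and build the mapping table."""
    hidden_parts, mapping_table, cursor = [], {}, 0
    for n, tag in enumerate(sorted(tags, key=lambda t: t["start"]), start=1):
        placeholder = f"[{tag['party']}'s {tag['category']}#{n}]"
        hidden_parts.append(text[cursor:tag["start"]])   # untouched text before the field
        hidden_parts.append(placeholder)
        mapping_table[placeholder] = {
            "raw": text[tag["start"]:tag["end"]],        # original sensitive value
            "stamp_id": tag["stamp_id"],                 # seal the value is bound to
        }
        cursor = tag["end"]
    hidden_parts.append(text[cursor:])
    return "".join(hidden_parts), mapping_table


text = "Purchaser: XX Technology Co., Ltd. Supplier: YY Technology Co., Ltd."
tags = [
    make_tag(text, "XX Technology Co., Ltd.", "purchaser", "company name", "SEAL-XX"),
    make_tag(text, "YY Technology Co., Ltd.", "supplier", "company name", "SEAL-YY"),
]
hidden, table = hide_tagged_fields(text, tags)
```

Keeping the seal identifier next to each original value is what later allows restoration to be limited to the holder of the matching seal.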

[0090] Finally, the privacy protection system 100 executes step S5 to perform visual encapsulation and output. The privacy protection system 100 visually encapsulates and outputs the contract text, including the placeholders and digital identifiers. In the output PDF document, the electronic seal graphic codes of both the purchaser and the supplier are attached below the surface-layer placeholder text. After receiving the document, the recipient (here, the supplier) can decrypt and verify it using its own company's electronic seal. Doing so reveals only the sensitive information related to the supplier itself, such as its name and address, while the purchaser's sensitive information remains hidden, achieving fine-grained access control.

[0091] Example 3: Cross-departmental collaborative privacy protection of government documents

[0092] This embodiment uses the example of a staff member of a municipal government department drafting a public service notice that involves collaboration among multiple departments to illustrate the implementation process of the present invention.

[0093] First, the privacy protection system 100 executes step S1 to establish a centralized authentication database for the five data packages. When deploying this system, a municipal government service center needs to complete multi-element composite authentication of government entities. The privacy protection system 100 connects to the digital certificate issuing authority to verify the government system's authorization credentials and the identity information of public officials. After authentication, it generates an electronic government seal and stores information such as government agency information, departmental responsibilities, and commonly used document numbering rules in data package 15. Simultaneously, relevant government departments such as the Municipal Civil Affairs Bureau, the Municipal Finance Bureau, and the Municipal Human Resources and Social Security Bureau also complete authentication, generating their respective electronic government seals and establishing data packages 15.

[0094] Next, the system executes step S2, launching the integrated document editor 20 and loading the comparison database. After logging into the privacy protection system 100, Li, a staff member of the government service center, launches the integrated document editor 20. Based on Li's login credentials, the privacy protection system 100 dynamically loads the associated data packages from the organization information authentication server 10 as a real-time comparison database. Because Li is responsible for drafting official documents that involve collaboration across multiple departments, the integrated document editor 20 loads the following comparison data in the background: the government service center's data package 15, the Municipal Civil Affairs Bureau's data package 15, the Municipal Finance Bureau's data package 15, the Municipal Human Resources and Social Security Bureau's data package 15, and Li's personal data package 11.

[0095] Then, the privacy protection system 100 executes step S3, identifying the edited text in real time and attaching structured tags. Li writes an announcement about the distribution of subsidies for the benefit of the people in the integrated document editor 20. The content cites documents from multiple departments and contains sensitive monetary information. The privacy protection system 100, through the privacy protection engine 30, identifies sensitive information in the text in real time. When "Civil Affairs Bureau" appears in the text, the system matches the department name in the Municipal Civil Affairs Bureau's data package 15; when "Civil Affairs Document [2024] No. 12" appears, the system matches the document number in the Municipal Civil Affairs Bureau's data package 15; when "Municipal Finance Bureau" appears, the system matches the department name in the Municipal Finance Bureau's data package 15; when "Finance Document [2024] No. 35" appears, the system matches the document number in the Municipal Finance Bureau's data package 15; when "2023 Minimum Living Standard" and "1080 yuan / month" appear, the system matches the subsidy standard information in the Municipal Human Resources and Social Security Bureau's data package 15. The identification unit attaches structured tags to the above fields. Unlike in the personal and enterprise scenarios, the tags in the government document scenario must additionally carry an information security level, such as internally public, sensitive, or confidential, so that the fields can later be hidden differently according to different permissions.

[0096] Afterwards, the system executes step S4, responding to the one-click hide command and establishing a mapping relationship. After completing the initial draft, Mr. Li prepares to send the document to the relevant departments for countersigning. Because the document includes undisclosed subsidy amounts and internal document numbers, Mr. Li clicks the one-click hide button. The privacy protection system 100 replaces every sensitive field that carries a structured tag with a structured placeholder and treats the fields differently according to their security level. Department names are replaced with "department name," document numbers are replaced with "document number," and subsidy standard information is replaced with "the annual minimum living allowance standard published by the department name." At the same time, the association binding unit 33 establishes an index mapping table 34 between the placeholders and the original information. The privacy protection system 100 records the issuing department's digital identifier for each piece of sensitive information: document information from the Municipal Civil Affairs Bureau is mapped to the Municipal Civil Affairs Bureau's electronic seal, document information from the Municipal Finance Bureau is mapped to the Municipal Finance Bureau's electronic seal, and subsidy standard information from the Municipal Human Resources and Social Security Bureau is mapped to the Municipal Human Resources and Social Security Bureau's electronic seal.

[0097] Finally, the privacy protection system 100 executes step S5 to perform visual encapsulation and output. The privacy protection system 100 visually encapsulates and outputs the document, including placeholders and digital identifiers. The output document has a two-layer structure: the top layer contains readable text including placeholders and the electronic seal of the document issuing unit, namely the government service center; the bottom layer contains pointer links to mapping relationships in the organization information authentication server, respectively associated with the electronic seals of relevant departments such as the Municipal Civil Affairs Bureau, the Municipal Finance Bureau, and the Municipal Human Resources and Social Security Bureau. When the document is sent to the Municipal Civil Affairs Bureau for countersigning, staff use its electronic seal to decrypt it, restoring only sensitive information related to the bureau, such as "Civil Affairs Document [2024] No. 12," while information from other departments remains hidden. Similarly, when sent to the Municipal Finance Bureau, only information related to the Finance Bureau is restored; when sent to the Municipal Human Resources and Social Security Bureau, only information related to subsidy standards is restored. When the final version is released, the administrator can choose to hide all information or partially restore it according to permissions, achieving flexible control of visibility by department during the countersigning stage and unified desensitization during the release stage.
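The two-layer output and per-department restoration can be sketched as follows. This is only an illustration: the `TwoLayerDocument` class, the `restore_for` function, the pointer keys such as `idx-1`, the bracketed placeholder tokens, and the seal identifiers are all hypothetical, and the cryptographic decryption step is reduced to a plain comparison of seal identifiers.

```python
from dataclasses import dataclass


@dataclass
class TwoLayerDocument:
    surface_text: str   # top layer: readable text containing placeholders
    issuer_stamp: str   # seal of the issuing unit shown on the top layer
    pointers: dict      # bottom layer: placeholder -> key in the server-side mapping


def restore_for(document, server_mapping, stamp_id):
    """Restore only the fields bound to the seal presented by the recipient."""
    text = document.surface_text
    for placeholder, key in document.pointers.items():
        entry = server_mapping[key]
        if entry["stamp_id"] == stamp_id:   # stand-in for seal-based decryption
            text = text.replace(placeholder, entry["raw"])
    return text


server_mapping = {
    "idx-1": {"raw": "Civil Affairs Document [2024] No. 12", "stamp_id": "SEAL-CIVIL"},
    "idx-2": {"raw": "Finance Document [2024] No. 35", "stamp_id": "SEAL-FINANCE"},
}
doc = TwoLayerDocument(
    surface_text="Per [document number#1] and [document number#2], subsidies will be issued.",
    issuer_stamp="SEAL-SERVICE-CENTER",
    pointers={"[document number#1]": "idx-1", "[document number#2]": "idx-2"},
)

civil_view = restore_for(doc, server_mapping, "SEAL-CIVIL")
public_view = restore_for(doc, server_mapping, "SEAL-UNKNOWN")
```

Because the document carries only pointers, the original values stay on the server, and a recipient whose seal matches no entry sees the surface text unchanged.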

[0098] It is worth noting that in this invention, the aforementioned electronic seals, including electronic official seals, electronic private seals, and electronic signature seals, are collectively referred to as digital identifiers. Essentially, they are a set of digital signature data generated based on asymmetric encryption algorithms, and a mapping relationship is established with real-name authenticated organizational information (such as personal ID cards, unified social credit codes of enterprises, etc.) to achieve rigid binding.

[0099] The electronic signature generation process is as follows: The system uses the private key from a digital certificate issued by a nationally recognized certificate authority to sign the organization's information and unique identifier, generating tamper-proof seal data, which is then visually presented as a graphic code (such as a QR code or signature image). During verification, any third party holding the corresponding public key can verify the authenticity and integrity of the signature, but cannot directly obtain the original organization information. Through this mechanism, the electronic signature in this invention combines identity authentication, authorization, and operation traceability functions.
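The sign-with-private-key, verify-with-public-key flow described above can be illustrated with a toy example. The sketch uses textbook RSA over a SHA-256 digest, built from two well-known Mersenne primes, purely to show the principle; it has no padding and is not secure, and nothing here indicates which algorithm, key size, or certificate format the invention actually uses. The function names and the identifier `ORG-0001` are hypothetical.

```python
import hashlib

# Toy key material built from two known Mersenne primes. Illustration only.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (requires Python 3.8+)


def digest(org_info: str, unique_id: str) -> int:
    """Hash the organization information and identifier into an integer below N."""
    data = f"{unique_id}|{org_info}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(org_info: str, unique_id: str) -> int:
    """Produce seal data with the private exponent."""
    return pow(digest(org_info, unique_id), D, N)


def verify(org_info: str, unique_id: str, seal: int) -> bool:
    """Check the seal with the public key only; the seal itself exposes just a digest."""
    return pow(seal, E, N) == digest(org_info, unique_id)


seal = sign("XX Technology Co., Ltd.", "ORG-0001")
ok = verify("XX Technology Co., Ltd.", "ORG-0001", seal)
tampered = verify("XX Technology Co., Ltd. (edited)", "ORG-0001", seal)
```

Applying the public exponent to the seal yields only a hash digest, which is consistent with the statement that a verifier cannot read the original organization information from the seal data.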

[0100] The primary advantage of this invention lies in its realization of "building a privacy firewall from the source of document writing," overturning the traditional, reactive privacy protection model of "post-event desensitization and passive response." Through the aforementioned organizational information authentication server 10, organizational information of five types of entities—individuals, families, enterprises, communities, and governments—is centrally authenticated and managed, and bound to a unique digital identifier (electronic signature / encrypted label / QR code). Then, through the aforementioned integrated document editor 20, sensitive information is identified in real time by calling a reference library when the user inputs text. This makes privacy protection no longer dependent on secondary processing after the document is completed or compliance commitments from third-party service providers. Instead, from the moment the text is generated, through the collaborative operation of the aforementioned privacy protection engine 30's identification unit 31, the aforementioned one-click hiding unit 32, and the aforementioned association binding unit 33, sensitive information is replaced with structured placeholders and a traceable mapping relationship is established before output. Finally, the aforementioned digital identifier management module 40 encapsulates it into a two-layer document structure of "surface purification and decryption, and underlying authorization and traceability."

[0101] The core value of this mechanism is that when any organization or individual uploads data to an AI large model, publishes documents externally, or shares information across departments, the original sensitive information does not appear in the disseminated content, while legitimately authorized access can still be granted precisely through digital identifiers. This addresses the risk of privacy leakage caused by uploading data to AI large models, the risk of uncontrolled permissions in cross-system data sharing, and the problem of excessive information exposure during the circulation of electronic documents. It builds a new privacy protection technology system that ensures "data is usable but invisible, and permissions are controllable and traceable."

[0102] The embodiments of the present invention described above can be implemented in various hardware, software code, or combinations thereof. For example, embodiments of the present invention can be program code that executes the above methods in a Digital Signal Processor (DSP). The present invention can also relate to various functions executed by a computer processor, digital signal processor, microprocessor, or Field Programmable Gate Array (FPGA). The processor described above can be configured to perform specific tasks according to the present invention, which are accomplished by executing machine-readable software code or firmware code defining the specific methods disclosed in the present invention. The software code or firmware code can be developed in different programming languages and in different formats or forms. The software code can also be compiled for different target platforms. However, different code styles, types, and languages of the software code performing tasks according to the present invention, and other types of configuration code, do not depart from the spirit and scope of the present invention.

[0103] Therefore, those skilled in the art will recognize that although embodiments of the present invention have been shown and described in detail herein, many other variations or modifications conforming to the principles of the present invention can be directly determined or derived from the disclosure of the present invention without departing from the spirit and scope of the invention. Therefore, the scope of the present invention should be understood and recognized as covering all such other variations or modifications.

Claims

1. A document privacy protection system based on centralized organizational information management, comprising: an organization information authentication server, configured to store organization information that has passed real-name verification, wherein the organization information is divided into multiple data packages according to subject type and each data package is bound to a unique corresponding digital identifier; an integrated document editor, connected to the organization information authentication server, configured to provide a text editing interface and to call the organization information in real time as a reference database; a privacy protection engine, coupled to the integrated document editor and to the organization information authentication server respectively, configured to identify sensitive information in the edited text and hide or replace it while establishing a mapping relationship between the sensitive information and the digital identifier; and a digital identifier management module, configured to visually encapsulate and output the processed text together with the digital identifier.

2. The document privacy protection system according to claim 1, characterized in that the privacy protection engine includes: a recognition unit, used to perform multi-dimensional matching of text fields against the organizational information in the reference library based on regular expressions and deep learning models, and to attach a structured tag to a text field after a successful match, wherein the structured tag includes at least the data package type, the information category, and the corresponding unique digital identifier of the sensitive information; a one-click hide unit, used to respond to a user trigger and replace all text fields carrying the structured tags with structured placeholders; and an association binding unit, used to establish an index mapping table among the structured placeholders, the replaced original organizational information, and the digital identifier.

3. The document privacy protection system according to claim 1, characterized in that the digital identifier is at least one of an electronic signature, an encrypted feature label, and a QR code; the electronic signature is issued by a nationally recognized digital certificate authority corresponding to the information subject type of each organization, and its generation and verification rely on asymmetric encryption algorithms; and the document privacy protection system interfaces with the digital certificate authority to verify the validity of the electronic signature and establish a mapping relationship.

4. The document privacy protection system according to claim 1, characterized in that the privacy protection engine receives text content from the integrated document editor in real time and identifies sensitive information in the text that falls within the range of the plurality of data packages by intelligently matching it with the reference library in the organization information authentication server.

5. The document privacy protection system according to claim 1, characterized in that the integrated document editor dynamically loads the organizational information from the multiple data packages associated with the currently logged-in user as a real-time reference library, based on the user's identity and permissions.

6. The document privacy protection system according to any one of claims 1 to 5, characterized in that the plurality of data packages includes: a personal data package, used to store organizational information containing the identity information of all individuals who have passed real-name verification, and to establish a mapping relationship with the personal electronic seal issued to the individual by a nationally recognized digital certificate authority; a family data package, used to collect information on family members and shared assets, and to establish a mapping relationship with the family electronic signature seal generated based on the composite authorization of the members' personal CA certificates; an enterprise data package, used to manage corporate legal entity qualifications and operational information, and to establish a mapping relationship with the enterprise electronic seal issued by the enterprise digital certificate authority; a community data package, used to store information about community and rural collective organizations, and to establish a mapping relationship with the electronic seals issued by the digital certificate issuing authorities corresponding to the community and rural collective organizations; and a government data package, used to carry public service information of government agencies, and to establish a mapping relationship with the government electronic seals issued by government digital certificate issuing authorities.

7. A document privacy protection method based on the document privacy protection system according to any one of claims 1 to 6, comprising the following steps: S1. by connecting with multiple nationally recognized digital certificate authorities and conducting multi-element composite authentication for each entity, establishing a centralized authentication database of multiple data packages, wherein organizational information is authenticated according to the entity and a mapping relationship is established with the corresponding digital identifier issued by the CA; S2. starting the integrated document editor and loading the reference libraries of the multiple data packages associated with the current user; S3. recognizing the edited text in real time, attaching structured tags to text fields that match the reference library, and marking them as items to be hidden; S4. in response to the one-click hide command, replacing all items to be hidden that carry the structured tags with placeholders, and establishing a mapping relationship among the placeholders, the original information, and the digital identifiers; S5. visually encapsulating and outputting the text, including the placeholders and the digital identifier.

8. The document privacy protection method according to claim 7, characterized in that the specific logic for real-time identification in step S3 satisfies the following formula: where M(t) represents whether text fragment t is sensitive information, D is the set of organizational information in the organization information authentication server, Sim(t,d) is the similarity calculation function, and θ is the preset matching threshold.
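The formula referred to in claim 8 does not appear in this text. One form that would be consistent with the stated definitions of M(t), D, Sim(t,d), and θ is a thresholded maximum similarity; this is a hedged reading for illustration only, not the claimed formula, and the use of 1 and 0 as the output values is an assumption.

```latex
M(t) =
\begin{cases}
1, & \displaystyle\max_{d \in D} \operatorname{Sim}(t, d) \ge \theta \\
0, & \text{otherwise}
\end{cases}
```

Under this reading, a text fragment is treated as sensitive when its best match against any entry in D reaches the preset threshold.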

9. The document privacy protection method according to claim 7, characterized in that the data structure for establishing the mapping relationship in step S4 satisfies the following formula: where Index(P) is the index value corresponding to placeholder P; the formula further includes an encryption function that encrypts using the server private key, and a user identifier; RawData is the original sensitive information; and StampID is the associated unique digital identifier.
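The formula referred to in claim 9 is likewise absent from this text. A form that would fit the listed terms is an encryption of the concatenated fields; the symbols E, K_s, and UID, the concatenation operator, and the ordering of the fields are all assumptions introduced here for illustration, not the claimed formula.

```latex
\operatorname{Index}(P) = E_{K_s}\!\left(\mathrm{UID} \,\|\, \mathrm{RawData} \,\|\, \mathrm{StampID}\right)
```

Here E with subscript K_s stands for the encryption function keyed with the server private key, and UID stands for the user identifier.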

10. The document privacy protection method according to claim 7, characterized in that, in step S5, when the output is visually encapsulated, the output document has a two-layer structure: the top layer is readable text including placeholders and a graphic code of the digital identifier, and the bottom layer is a pointer link to the mapping relationship in the organization information authentication server.