File processing method and device, electronic equipment, storage medium and program product

By converting user identification information into replacement item index values and embedding the corresponding replacement items into the file content, the problem that identification information is separated from file content in sensitive file tracing is solved, achieving high resistance to damage and accurate leak tracing of files.

CN122196980A · Pending · Publication Date: 2026-06-12 · CHINA UNIONPAY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA UNIONPAY
Filing Date
2026-01-16
Publication Date
2026-06-12

Abstract

Embodiments of the present application provide a file processing method and device, electronic equipment, storage medium and program product. When a download request for a target file is received, a substitution item index value of a replaceable part in the target file is generated according to user identification information of a requester through a mapping rule; a substitution operation is performed on the replaceable part based on the substitution item index value, and a downloadable file corresponding to the target file is generated; in the case of leakage of the downloadable file, the user identification information is restored according to the replaced part in the downloadable file, and the source of leakage is determined based on the restored user identification information. Through the mapping rule, the user identification information is converted into the substitution item index value, and the substitution item corresponding to the index value is embedded in the file content through the substitution operation. Therefore, when the file is leaked, the user identification information can be determined based on the substitution item embedded in the file content, so as to determine the source of leakage. By fusing the user identification information and the file content, the anti-damage property of the file is improved.

Description

Technical Field

[0001] This application relates to the field of information processing, and more particularly to a document processing method, apparatus, electronic device, storage medium, and program product.

Background Technology

[0002] Many platforms currently offer file download functionality, and some files may contain sensitive information. If these files are illegally downloaded and publicly disseminated, they may cause public opinion risks or data security issues.

[0003] Currently, tracing the source of sensitive documents relies on embedding identifiable identifying information within the documents, such as visible watermarking technology and invisible character technology.

[0004] However, in these existing methods, the identification information is separated from the file content, resulting in poor resistance to damage.

Summary of the Invention

[0005] The document processing methods, apparatus, electronic devices, storage media, and program products provided in this application are used to improve the resistance of documents to damage.

[0006] In a first aspect, embodiments of this application provide a document processing method, including:

[0007] Upon receiving a download request for a target file, the system generates a replacement index value for the replaceable part of the target file based on the requester's user identification information and mapping rules.

[0008] Based on the replacement item index value, a replacement operation is performed on the replaceable part to generate a downloadable file corresponding to the target file;

[0009] In the event of a leak of the downloadable file, user identification information is restored based on the replaced portion of the downloadable file, and the source of the leak is determined based on the restored user identification information.

[0010] Optionally, upon receiving a download request for the target file, generating replacement item index values for the replaceable parts of the target file based on the requester's user identifier information and mapping rules includes:

[0011] When there are multiple replaceable parts, upon receiving a download request for the target file, a mixed cardinality decomposition is performed based on the user identifier information of the requester and the number of replaceable items in each replaceable part of the target file to obtain the replaceable item index value for each replaceable part.

[0012] Optionally, the step of performing a mixed cardinality decomposition based on the requester's user identifier information and the number of replaceable items in each replaceable part of the target file to obtain the replaceable item index value for each replaceable part includes:

[0013] For any replaceable part in the target file other than the last replaceable part, referred to as the target replaceable part:

[0014] The first product is obtained by multiplying together the numbers of replaceable items of the target replaceable part and of all replaceable parts after it, and the first result is obtained by taking the value corresponding to the requester's user identifier information modulo the first product;

[0015] The second result is obtained by multiplying together the numbers of replaceable items of all replaceable parts after the target replaceable part;

[0016] The ratio of the first result to the second result is rounded down and then increased by a preset value to obtain the replacement index value of the target replaceable part.

[0017] Optionally, the step of performing a mixed cardinality decomposition based on the requester's user identifier information and the number of replaceable items in each replaceable part of the target file to obtain the replaceable item index value for each replaceable part includes:

[0018] For the last replaceable portion in the target file:

[0019] The third result is obtained by taking the modulo between the value corresponding to the user identifier information of the requester and the number of replaceable items in the last replaceable part;

[0020] Based on the ratio of the third result to the default value, a fourth result is obtained;

[0021] The ratio of the third result to the fourth result is rounded down and then increased by a preset value to obtain the replacement index value of the last replaceable part.

[0022] Optionally, in the event of a leaked downloadable file, restoring user identification information based on the replaced portion of the downloadable file and determining the source of the leak based on the restored user identification information includes:

[0023] In the event of a leaked downloadable file, the downloadable file is compared with the target file to determine the replaced portion of the downloadable file;

[0024] Based on the replacement index value corresponding to the replaced portion of the downloadable file, the user identification information is restored, and the source of the leak is determined based on the restored user identification information.

[0025] Optionally, restoring the user identification information based on the replacement index value corresponding to the replaced portion in the downloadable file includes:

[0026] The first difference corresponding to the target replaced part is determined based on the difference between the replacement index value corresponding to the target replaced part and the preset value;

[0027] The second product of the target replaced part is determined based on the product of the number of replaceable items corresponding to all replaced parts after the target replaced part; wherein, when the target replaced part is the last replaced part, the product of the number of replaceable items corresponding to all replaced parts after it is the default value;

[0028] The identifier of the target replaced part is determined based on the product of the first difference and the second product;

[0029] The target replaced part is any replaced part in the downloadable file. After the identifier of each replaced part is determined, the user identifier information is restored by summing the identifiers of each replaced part.
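
The restoration steps above amount to evaluating a mixed-radix number. The following is a minimal Python sketch under stated assumptions: the preset value and the default value are both 1, and the names `restore_user_id`, `indexes` and `counts` are hypothetical and do not appear in the application.

```python
from typing import Sequence


def restore_user_id(indexes: Sequence[int], counts: Sequence[int],
                    preset: int = 1) -> int:
    """Rebuild the numeric user ID from the index of each replaced part.

    indexes[i] is the replacement item index read from the i-th replaced part;
    counts[i] is the number of replaceable items of that part.
    """
    if len(indexes) != len(counts):
        raise ValueError("indexes and counts must have the same length")
    user_id = 0
    weight = 1  # product of the counts of all later parts; 1 for the last part
    # Walk from the last part to the first so the weight is built incrementally.
    for index, count in zip(reversed(indexes), reversed(counts)):
        first_difference = index - preset
        if not 0 <= first_difference < count:
            raise ValueError(f"index {index} is out of range for {count} items")
        user_id += first_difference * weight  # identifier of this replaced part
        weight *= count
    return user_id
```

For instance, with two parts holding 2 and 3 items, `restore_user_id([2, 3], [2, 3])` evaluates (2 - 1) * 3 + (3 - 1) * 1 and returns 5.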

[0030] Optionally, the method further includes:

[0031] At least one replaceable section is set in the target file, and the replaceable section includes multiple replaceable items.

[0032] Optionally, the method further includes:

[0033] The number of replacement combinations is determined based on the number of replaceable parts and the number of replaceable items corresponding to each replaceable part;

[0034] When the number of replacement combinations is greater than or equal to the upper limit of the number of users corresponding to the user identification information, the target file is determined to be an allowed download file.

[0035] Optionally, the method further includes:

[0036] When the number of possible replacement combinations is less than the maximum number of users, additional replacements are added.
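
The capacity check described above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and reading "additional replacements" as adding further replaceable parts of a fixed size is an assumption, since the text does not say whether parts or items are added.

```python
from math import prod
from typing import Sequence


def replacement_combinations(counts: Sequence[int]) -> int:
    """Number of distinct downloadable files the target file can produce."""
    return prod(counts)


def is_download_allowed(counts: Sequence[int], max_users: int) -> bool:
    """True when every possible user can receive a distinct combination."""
    return replacement_combinations(counts) >= max_users


def extra_parts_needed(counts: Sequence[int], max_users: int,
                       items_per_new_part: int) -> int:
    """How many additional parts of the given size would close the gap.

    Assumption: capacity is extended by adding whole replaceable parts.
    """
    if items_per_new_part < 2:
        raise ValueError("a new part needs at least two replaceable items")
    total = replacement_combinations(counts)
    extra = 0
    while total < max_users:
        total *= items_per_new_part
        extra += 1
    return extra
```

Four parts with 5, 6, 8 and 9 items give 2160 combinations, which is far short of an 11-digit user ID space; under the assumption above, eight more ten-item parts would be needed.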

[0037] Optionally, the replaceable parts include at least one of punctuation marks, non-critical text, and formatting styles.

[0038] In a second aspect, embodiments of this application provide a document processing apparatus, including:

[0039] The first processing module is used to generate replacement item index values for replaceable parts in the target file based on the user identification information of the requester and mapping rules when a download request for the target file is received.

[0040] The second processing module is used to perform a replacement operation on the replaceable part based on the replacement item index value, and generate a downloadable file corresponding to the target file;

[0041] The third processing module is used to restore user identification information based on the replaced portion of the downloadable file in the event that the downloadable file is leaked, and to determine the source of the leak based on the restored user identification information.

[0042] In a third aspect, embodiments of this application provide an electronic device, including: a memory and a processor;

[0043] The memory stores computer-executable instructions;

[0044] The processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method of the first aspect and / or of the various possible implementations of the first aspect as described above.

[0045] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the method of the first aspect and / or of the various possible implementations of the first aspect.

[0046] In a fifth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the method of the first aspect and / or of the various possible implementations of the first aspect.

[0047] The file processing method, apparatus, electronic device, storage medium, and program product provided in this application, upon receiving a download request for a target file, generate a replacement item index value for the replaceable portion of the target file based on the requester's user identification information and a mapping rule; perform a replacement operation on the replaceable portion based on the replacement item index value, generating a downloadable file corresponding to the target file; and, in the event of a leaked downloadable file, restore the user identification information based on the replaced portion in the downloadable file and determine the source of the leak based on the restored user identification information. The solution in this application, based on mapping rules, converts user identification information into a replacement item index value and embeds the replacement item corresponding to the index value into the file content through a replacement operation. Therefore, in the event of a file leak, the user identification information can be determined based on the replacement item embedded in the file content, thereby identifying the source of the leak. By integrating user identification information with the file content, the file's resistance to damage is improved.

Attached Figure Description

[0048] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0049] Figure 1 is a first flowchart of the document processing method provided in this application;

[0050] Figure 2 is a second flowchart of the document processing method provided in this application;

[0051] Figure 3 is a schematic structural diagram of the document processing apparatus provided in this application;

[0052] Figure 4 is a schematic structural diagram of the electronic device provided in this application.

[0053] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.

Detailed Implementation

[0054] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.

[0055] In various sectors such as finance, government affairs, and healthcare, document download functionality is widely used across different platforms. For example, platforms such as banks, payment institutions, and government departments need to provide users (e.g., institutional clients, merchants, internal employees) with documents containing sensitive information, such as transaction records, merchant qualification certificates, and customer information forms. These documents may contain sensitive information, and if they are illegally downloaded and publicly disseminated, it could lead to public opinion risks or data security issues.

[0056] Therefore, after a file containing sensitive information (referred to as a sensitive file) is leaked, it is particularly important to trace its source in order to locate the source of the leak and take appropriate measures.

[0057] Currently, tracing the origin of sensitive documents relies on embedding identifiable identifying information within the files, such as:

[0058] 1. Visible watermarking technology

[0059] This method identifies the downloader of a file by adding a visible watermark (such as text or an image). However, this method is easily removed by malicious copying or image processing, and the watermark information may be cropped or covered.

[0060] 2. Invisible character technology

[0061] Invisible special characters or metadata can be embedded in a file, and the source can be traced by parsing the characters. However, such characters may be lost after being recognized by OCR (Optical Character Recognition) or stripped off due to file format conversion (such as PDF to Word).

[0062] The common drawback of the above methods is that the identification information is separated from the file content and lacks deep integration with the file content, resulting in poor resistance to damage.

[0063] To address this issue, this application proposes a file processing method that, based on mapping rules, transforms user identification information into replacement item index values and embeds the corresponding replacement items into the file content through a replacement operation. Therefore, in the event of a file leak, the user identification information can be determined based on the replacement items embedded in the file content, thereby identifying the source of the leak. By integrating user identification information with the file content, the file's resistance to damage is improved.

[0064] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will now be described with reference to the accompanying drawings.

[0065] Figure 1 is a first flowchart of the document processing method provided in this application. As shown in Figure 1, the method includes:

[0066] S101. Upon receiving a download request for the target file, the replacement index value of the replaceable part in the target file is generated based on the user identification information of the requester and through dynamic mapping rules.

[0067] In this embodiment of the application, the target file includes a replaceable portion, and the replaceable portion has corresponding replaceable items.

[0068] For example, the target file may include one or more replaceable parts, each of which may correspond to one or more replaceable items.

[0069] For example, replaceable portions refer to areas or elements in the target file that can be replaced without affecting the overall content. These can be areas or elements in text or images. Areas or elements may include, for example, punctuation marks, non-critical text, and formatting styles. Non-critical text refers to text portions in the target file that do not affect the overall content, such as examples, explanations, and annotations.

[0070] Replaceable items refer to the optional replacement content of replaceable parts in the target file. Replaceable items can include punctuation marks with similar functions, text expressions with similar semantics, etc.

[0071] The mapping rules are used to map user identification information to replacement item index values. The replacement item index value refers to the sequence number or label of the target replacement item corresponding to the replaceable part. The target replacement item is one of the multiple replaceable items corresponding to the replaceable part. For example, if the replaceable part includes five replaceable items, the replacement item index value can indicate the 5th item, the 4th item, etc.

[0072] For example, user identification information may include a user ID. The number of digits in the user ID determines the maximum number of users. For instance, if a user ID has 11 decimal digits (each digit from 0 to 9), at most $10^{11}$ distinct user IDs can be assigned, one per user, meaning the maximum number of users is $10^{11}$.

[0073] In this embodiment of the application, upon receiving a download request for a target file, the replacement item index value is determined based on the requester's user identification information and the mapping rules. It is understood that, since the mapping rules are used to map user identification information and replacement item index values, the replacement item index value for the replaceable portion of the target file can be determined based on the requester's user identification information.

[0074] S102. Perform a replacement operation on the replaceable part based on the replacement item index value to generate a downloadable file corresponding to the target file.

[0075] In this embodiment, after determining the replacement item index value of the replaceable part in the target file, the target replacement item for the replaceable part is determined from the set of replaceable items corresponding to the replaceable part based on the replacement item index value; the content of the replaceable part in the target file is replaced with the content of the target replacement item, generating a downloadable file corresponding to the target file. The set of replaceable items may include one or more replaceable items.

[0076] The target file is the original template file containing the replaceable parts, and it is not directly provided to users for download. The downloadable file is the final file that users are allowed to download. It is generated by replacing the replaceable parts, and the content of the replaceable parts differs for different users.

[0077] For example, when the target file includes multiple replaceable parts, after determining the replacement item index value of each replaceable part in the target file, the target replacement item for each replaceable part is determined from the set of replaceable items corresponding to each replaceable part according to the replacement item index value; the content of each replaceable part in the target file is replaced with the content of the corresponding target replacement item to generate a downloadable file corresponding to the target file.
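
The replacement operation of S102 can be illustrated with a short Python sketch. The representation below is an assumption made for illustration: the target file is modeled as a list of segments, where a plain string is fixed content and a list of strings is the set of replaceable items of one replaceable part. The names `Segment`, `TEMPLATE` and `build_downloadable` are hypothetical.

```python
from typing import List, Sequence, Union

# Fixed text, or the set of replaceable items of one replaceable part.
Segment = Union[str, List[str]]

TEMPLATE: List[Segment] = [
    "The report covers March",
    [",", ";"],                                  # punctuation with similar function
    " and ",
    ["for example", "for instance", "e.g."],     # non-critical text
    " the merchant list.",
]


def build_downloadable(template: Sequence[Segment],
                       indexes: Sequence[int]) -> str:
    """Replace each replaceable part with the item chosen by its 1-based index."""
    part_count = sum(1 for s in template if not isinstance(s, str))
    if part_count != len(indexes):
        raise ValueError("one index is required per replaceable part")
    pieces = []
    position = 0
    for segment in template:
        if isinstance(segment, str):
            pieces.append(segment)               # fixed content is copied unchanged
            continue
        index = indexes[position]
        if not 1 <= index <= len(segment):
            raise ValueError(f"index {index} is out of range")
        pieces.append(segment[index - 1])        # target replacement item
        position += 1
    return "".join(pieces)
```

Calling `build_downloadable(TEMPLATE, [2, 3])` selects the second punctuation mark and the third wording, so two users with different index values receive texts that read the same but differ in these details.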

[0078] S103. In the event of a leaked downloadable file, determine the user identification information based on the replaced portion of the downloadable file, and determine the source of the leak based on the determined user identification information.

[0079] Because the mapping rules map different user identification information to different replacement item index values, the content of the replaced parts of the downloadable file differs from user to user, which embeds the user identification information into the downloadable file. Specifically, each replaced part of the downloadable file corresponds to a replaceable part of the target file and is obtained by replacing that replaceable part.

[0080] Accordingly, when a downloadable file is leaked, the content of each replaced part in the leaked file is extracted and matched with the set of replacement items to determine the index value of the replacement item corresponding to each replaced part; then, according to the reverse algorithm of the mapping rules or the lookup table, the user identification information is restored from the replacement item index value; based on the user identification information, the user corresponding to the leak source can be located.
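
Extracting the index values from a leaked file, as described above, might look like the following Python sketch. It assumes a hypothetical segment-list model of the target file (plain strings for fixed content, lists of strings for the replaceable items of one part) and assumes the leaked text still matches the fixed content exactly; `extract_indexes` and `TEMPLATE` are illustrative names only.

```python
import re
from typing import List, Optional, Sequence, Union

Segment = Union[str, List[str]]

TEMPLATE: List[Segment] = [
    "The report covers March",
    [",", ";"],
    " and ",
    ["for example", "for instance", "e.g."],
    " the merchant list.",
]


def extract_indexes(template: Sequence[Segment],
                    leaked: str) -> Optional[List[int]]:
    """Return the 1-based item index of every replaced part, or None if no match."""
    pattern_parts = []
    item_sets = []
    for segment in template:
        if isinstance(segment, str):
            pattern_parts.append(re.escape(segment))
        else:
            # Longest items first, so the regex tries the longest candidate
            # before any shorter item that is a prefix of it.
            ordered = sorted(segment, key=len, reverse=True)
            group = "|".join(re.escape(item) for item in ordered)
            pattern_parts.append("(" + group + ")")
            item_sets.append(segment)
    match = re.fullmatch("".join(pattern_parts), leaked)
    if match is None:
        return None
    # Match each captured text against its item set to recover the index.
    return [items.index(text) + 1
            for items, text in zip(item_sets, match.groups())]
```

The exact-match requirement is a simplification of this sketch; a leak that was retyped or partially edited would need a more tolerant comparison than `re.fullmatch`.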

[0081] The file processing method provided in this application, based on mapping rules, converts user identification information into replacement item index values and embeds the replacement items corresponding to the index values into the file content through a replacement operation. Therefore, in the event of a file leak, the user identification information can be determined based on the replacement items embedded in the file content, thereby identifying the source of the leak. By integrating user identification information with the file content, the file's resistance to damage is improved.

[0082] On the basis of the embodiment shown in Figure 1, this embodiment describes step S101 in detail. Step S101, in which, upon receiving a download request for the target file, replacement item index values for the replaceable parts of the target file are generated according to the requester's user identifier information and mapping rules, may include:

[0083] S201. When there are multiple replaceable parts, upon receiving a download request for the target file, the system performs a mixed cardinality decomposition based on the user identifier information of the requester and the number of replaceable items in each replaceable part of the target file to obtain the replaceable item index value for each replaceable part.

[0084] For example, mixed cardinality decomposition (also known as mixed-radix decomposition) refers to decomposing the numerical value corresponding to the user identifier information of the requester into multiple numbers, each with a different cardinality. That is, each replaceable part has a different cardinality. The cardinality corresponding to each replaceable part can be the product of the numbers of replaceable items of that replaceable part and of all the replaceable parts after it.

[0085] For example, the target file includes four replaceable parts, which are named the first replaceable part, the second replaceable part, the third replaceable part, and the fourth replaceable part according to their order of appearance in the target file. The cardinality of the first replaceable part is the product of the number of replaceable items corresponding to the four replaceable parts; the cardinality of the second replaceable part is the product of the number of replaceable items corresponding to the second, third, and fourth replaceable parts; the cardinality of the third replaceable part is the product of the number of replaceable items corresponding to the third and fourth replaceable parts; and the cardinality of the fourth replaceable part is the product of the number of replaceable items corresponding to the fourth replaceable part.
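
The cardinalities in this four-part example are suffix products of the item counts. A small Python sketch follows; the function name `part_cardinalities` and the counts used in the usage note are hypothetical.

```python
from typing import List, Sequence


def part_cardinalities(counts: Sequence[int]) -> List[int]:
    """cardinalities[i] = counts[i] * counts[i + 1] * ... * counts[-1]."""
    cardinalities = [0] * len(counts)
    running = 1
    # Accumulate the product from the last part back to the first.
    for i in range(len(counts) - 1, -1, -1):
        running *= counts[i]
        cardinalities[i] = running
    return cardinalities
```

With assumed counts of 5, 6, 8 and 9 items, `part_cardinalities([5, 6, 8, 9])` returns `[2160, 432, 72, 9]`.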

[0086] Optionally, any replaceable part other than the last replaceable part in the target file is called the target replaceable part:

[0087] The first product is obtained by multiplying together the numbers of replaceable items of the target replaceable part and of all replaceable parts after it. The first result is obtained by taking the value corresponding to the requester's user identifier information modulo the first product.

[0088] Furthermore, the second result is obtained by multiplying together the numbers of replaceable items of all replaceable parts after the target replaceable part.

[0089] Then, the ratio of the first and second results is rounded down and incremented by a preset value to obtain the replacement index value for the target replaceable part. Modulo and division operations usually produce an integer sequence starting from 0. Increasing the preset value makes the index value more consistent with conventional identification habits or directly correspond to the preset replacement list.

[0090] The replacement item index value of the target replaceable part depends only on the numbers of replaceable items of that part and of the parts after it, so the total product over all replaceable parts in the file does not need to be obtained or calculated. The algorithm is recursive and the computational complexity of each step is constant, which reduces computational resource consumption and improves the system's response speed and throughput under highly concurrent requests.
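
One way to see why each step is cheap: since the first product equals the part's own item count multiplied by the second result, the identity (U mod (N·Q)) div Q = (U div Q) mod N holds for non-negative integers, so the same index values can be obtained with a single `divmod` per part, working from the last part to the first. The sketch below is an equivalent formulation offered for illustration, not the procedure as worded in the application; `replacement_indexes_iterative` is a hypothetical name and the preset value is assumed to be 1.

```python
from typing import List, Sequence


def replacement_indexes_iterative(user_value: int, counts: Sequence[int],
                                  preset: int = 1) -> List[int]:
    """Mixed-radix digits computed with one divmod per replaceable part."""
    if user_value < 0:
        raise ValueError("user_value must be non-negative")
    indexes = [0] * len(counts)
    remaining = user_value
    for i in range(len(counts) - 1, -1, -1):
        # The remainder is this part's 0-based digit; the quotient carries on
        # to the parts that appear earlier in the file.
        remaining, digit = divmod(remaining, counts[i])
        indexes[i] = digit + preset
    return indexes
```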

[0091] It should be noted that the last replaceable part can refer to the last replaceable part in the target file. The target replaceable part refers to any replaceable part other than the last replaceable part. Based on the above example, the replacement index value of each replaceable part other than the last replaceable part can be obtained.

[0092] For example, the target file includes four replaceable parts, which are referred to as the first replaceable part, the second replaceable part, the third replaceable part, and the fourth replaceable part in the order in which they appear in the target file. The fourth replaceable part is the last replaceable part.

[0093] When the target replaceable part is the first replaceable part, the first product is the product of the number of replaceable items corresponding to the first replaceable part, the number of replaceable items corresponding to the second replaceable part, the number of replaceable items corresponding to the third replaceable part, and the number of replaceable items corresponding to the fourth replaceable part; the second product is the product of the number of replaceable items corresponding to the second replaceable part, the number of replaceable items corresponding to the third replaceable part, and the number of replaceable items corresponding to the fourth replaceable part.

[0094] When the target replaceable part is the second replaceable part, the first product is the product of the number of replaceable items corresponding to the second replaceable part, the number of replaceable items corresponding to the third replaceable part, and the number of replaceable items corresponding to the fourth replaceable part; the second product is the product of the number of replaceable items corresponding to the third replaceable part and the number of replaceable items corresponding to the fourth replaceable part.

[0095] When the target replaceable part is the third replaceable part, the first product is the product of the number of replaceable items corresponding to the third replaceable part and the number of replaceable items corresponding to the fourth replaceable part; the second product is the number of replaceable items corresponding to the fourth replaceable part.

[0096] Optionally, for the last replaceable part in the target file:

[0097] The third result is obtained by taking the modulo between the value corresponding to the user's identifier information and the number of replaceable items in the last replaceable part; the fourth result is obtained by the ratio of the third result to the default value; the ratio of the third result to the fourth result is rounded down and then increased by a preset value to obtain the index value of the replaceable item in the last replaceable part.

[0098] Since there are no replaceable parts after the last replaceable part, the replacement item index value of the last replaceable part can be determined in this way.

[0099] For example, both the default value and the preset value can be 1. The preset value converts the 0-based indexing used in computation into the 1-based indexing used by people and business logic. Since there are no other replaceable parts after the last replaceable part, the product over them is an empty product, which by mathematical definition equals 1; this is why the default value is 1.

[0100] For example, based on the following formula, mixed cardinality decomposition is performed on the requester's user identification information according to the number of replaceable items of each replaceable part, to obtain the replacement item index value for each replaceable part:

[0101] $$x_i = \left\lfloor \frac{U \bmod P_i}{Q_i} \right\rfloor + 1, \qquad P_i = \prod_{j=i}^{n} N_j, \qquad Q_i = \prod_{j=i+1}^{n} N_j$$

[0102] Here, $x_i$ is the replacement item index value of the $i$-th replaceable part and $n$ is the number of replaceable parts; $U \bmod P_i$ refers to the remainder obtained by performing the modulo operation on $U$ with $P_i$ as the divisor; $U$ is the numerical value corresponding to the user identification information of the requester; $P_i$ is the product of the numbers of replaceable items of the $i$-th replaceable part and of all subsequent replaceable parts; $Q_i$ is the product of the numbers of replaceable items of all replaceable parts after the $i$-th replaceable part, and equals 1 for the last replaceable part; $N_j$ refers to the number of replaceable items of the $j$-th replaceable part; 1 refers to the preset value. Modulo and division operations usually produce an integer sequence starting from 0. The preset value completes the conversion from starting from 0 to starting from 1, making the index value more in line with conventional identification habits or directly corresponding to the preset list of replaceable items.
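
The formula described in the paragraph above can be written out directly in Python. This is a minimal sketch under the stated assumption that the preset value is 1; the names `replacement_indexes`, `user_value` and `counts` are hypothetical.

```python
from math import prod
from typing import List, Sequence


def replacement_indexes(user_value: int, counts: Sequence[int],
                        preset: int = 1) -> List[int]:
    """Mixed cardinality decomposition of user_value over the item counts."""
    if user_value < 0:
        raise ValueError("user_value must be non-negative")
    if any(count < 1 for count in counts):
        raise ValueError("every replaceable part needs at least one item")
    indexes = []
    for i in range(len(counts)):
        first_product = prod(counts[i:])       # this part and all later parts
        second_result = prod(counts[i + 1:])   # later parts only; empty product is 1
        first_result = user_value % first_product
        indexes.append(first_result // second_result + preset)
    return indexes
```

Recomputing both products for every part keeps the code close to the wording of the formula; precomputing the suffix products once would avoid the repeated multiplications.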

[0103] For example, if the user ID is 2015, and the user ID is 11 digits long, the user ID could be 00000002015. In the case where the target file contains four replaceable parts, referred to as the first replaceable part, second replaceable part, third replaceable part, and fourth replaceable part in the order they appear in the target file:

[0104] The index value of the replacement item in the first replaceable part is:

[0105] ;

[0106] The index value of the replacement item in the second replaceable part is:

[0107] ;

[0108] The index value of the replacement item in the third replaceable part is:

[0109] ;

[0110] The index value of the replacement item in the fourth replaceable part is:

[0111] ;

[0112] The index values of the replacement items corresponding to each replaceable part are 5, 4, 8, and 9, respectively. That is, the first replaceable part uses the 5th item in the replacement items corresponding to the first replaceable part, the second replaceable part uses the 4th item in the replacement items corresponding to the second replaceable part, the third replaceable part uses the 8th item in the replacement items corresponding to the third replaceable part, and the fourth replaceable part uses the 9th item in the replacement items corresponding to the fourth replaceable part.
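The decomposition in this example can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: the function name `index_values` is hypothetical, the counts 6, 8, and 9 for the second to fourth parts are the ones that appear in the restoration example of [0126], and the count of 10 for the first part is an assumption (the text does not state it; any count of at least 5 gives the same result for this user ID).

```python
from math import prod


def index_values(user_id: int, counts: list[int], preset: int = 1) -> list[int]:
    """Mixed-radix decomposition of user_id into 1-based replacement item indices.

    counts[i] is the number of replacement items of the (i+1)-th replaceable part.
    """
    indices = []
    for i in range(len(counts)):
        # Product of the counts of this part and all later parts.
        first_product = prod(counts[i:])
        # Product of the counts of all later parts; the empty product is 1,
        # which plays the role of the default value for the last part.
        second_product = prod(counts[i + 1:])
        indices.append((user_id % first_product) // second_product + preset)
    return indices


counts = [10, 6, 8, 9]  # the first count (10) is an assumed value
print(index_values(2015, counts))  # [5, 4, 8, 9]
```

Because Python integers are unbounded, leading zeros in an 11-digit ID such as 00000002015 do not matter once the ID is parsed as the integer 2015.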

[0113] In this embodiment, when there are multiple replaceable parts, upon receiving a download request for the target file, a mixed cardinality decomposition is performed based on the requester's user identification information and the number of replaceable items in each replaceable part of the target file to obtain the replaceable item index value for each replaceable part. By losslessly expanding one-dimensional user identification information into a multi-dimensional fingerprint sequence embedded in the document content, the covert embedding of user identity is achieved without changing the file's external function and format.

[0114] It should be noted that when there is only one replaceable part, the replacement item index value of the replaceable part is 1.

[0115] Figure 2 is a flowchart of the file processing method provided by this application. As shown in Figure 2, step S103 (in the event of a leak of the downloadable file, restoring the user identification information based on the replaced part in the downloadable file, and determining the source of the leak based on the restored user identification information) includes:

[0116] S301. In the event of a leaked downloadable file, compare the downloadable file with the target file to determine the replaced portion in the downloadable file.

[0117] For example, since the downloadable file is generated by replacing the replaceable parts of the target file, the difference between the two is that the downloadable file includes the replaced parts, while the target file includes the replaceable parts. Therefore, after comparing the downloadable file with the target file, the parts in the downloadable file that are inconsistent with the target file are the replaced parts.

[0118] S302. Based on the replacement index value of the replaced part in the downloadable file, restore the user identification information, and determine the source of the leak based on the restored user identification information.

[0119] It should be noted that the replacement index value of the replaced portion in the downloadable file is the same as the replacement index value of the replaceable portion in the target file. The replacement index value of the replaceable portion in the target file is generated based on identification information; therefore, the user identification information can be reconstructed based on the replacement index value of the replaced portion in the downloadable file. The user corresponding to the user identification information is the user who leaked the downloadable file, thus identifying the source of the leak.
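The comparison in S301 and the index lookup needed for S302 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: the target file is modelled as a hypothetical list `parts`, in which a plain string is fixed text and a list of strings holds the replacement items of one replaceable part, and no replacement item of a part is a prefix of another item of the same part. The function name `find_replaced_indices` and the sample text are also hypothetical.

```python
def find_replaced_indices(parts: list, leaked: str) -> list[int]:
    """Walk the leaked text against the target file model and return the
    1-based replacement item index found at each replaceable part."""
    indices = []
    pos = 0
    for part in parts:
        if isinstance(part, str):
            # Fixed text must be identical in both files.
            if not leaked.startswith(part, pos):
                raise ValueError("leaked text diverges from the target file")
            pos += len(part)
        else:
            # Replaceable part: find which replacement item was embedded.
            for number, item in enumerate(part, start=1):
                if leaked.startswith(item, pos):
                    indices.append(number)
                    pos += len(item)
                    break
            else:
                raise ValueError("no known replacement item matches")
    if pos != len(leaked):
        raise ValueError("leaked text has trailing content")
    return indices


parts = [
    "Please review the ",
    ["example text", "reference content", "sample wording"],
    " below",
    [".", ";", "!"],
]
print(find_replaced_indices(parts, "Please review the reference content below;"))  # [2, 2]
```

The returned indices are the input to the restoration step; a leaked copy that has been retyped or partially edited would need a more tolerant comparison than exact prefix matching.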

[0120] Optionally, a first difference is determined based on the difference between the replacement index value corresponding to the target replaced part and a preset value; a third product is determined based on the product of the number of replaceable items corresponding to all replaced parts after the target replaced part; wherein, when the target replaced part is the last replaced part, the product of the number of replaceable items corresponding to all replaced parts after it is the default value.

[0121] Based on the product of the first difference and the third product, the identifier of the target replaced part is determined; the target replaced part is any replaced part in the downloadable file. After the identifier of each replaced part is determined, the user identification information is restored according to the sum of the identifiers of each replaced part.

[0122] The restoration process is broken down into a series of simple arithmetic operations (subtraction, multiplication, and addition), which has low computational complexity and can quickly process leaked files, improving the efficiency and response speed of tracing the source.

[0123] For example, user identification information is restored based on the replacement index value of the replaced portion in the downloadable file, according to the following formula:

[0124] $$U = \sum_{i=1}^{n} (I_i - 1) \times P_{i+1}, \qquad P_{i+1} = \prod_{j=i+1}^{n} N_j, \qquad P_{n+1} = 1$$

[0125] Here, $U$ is the restored numerical value corresponding to the user identification information; $I_i$ is the replacement item index value corresponding to the $i$-th replaced part, that is, the replacement item index value of the $i$-th replaceable part; $P_{i+1}$ is the product of the numbers of replacement items of the $(i+1)$-th replaced part and all subsequent replaced parts, that is, of the $(i+1)$-th replaceable part and all subsequent replaceable parts, and equals the default value 1 for the last replaced part; $N_j$ is the number of replacement items of the $j$-th replaced part, that is, of the $j$-th replaceable part; $n$ is the number of replaced parts; and the 1 subtracted from $I_i$ is the preset value.

[0126] For example, after comparing the downloadable file and the target file, it is determined that there are four replaced parts, which are named the first replaced part, the second replaced part, the third replaced part, and the fourth replaced part according to their order of appearance in the downloadable file. The replacement item index values corresponding to each replaced part are 5, 4, 8, and 9, respectively. Then the user ID is: (5-1)×6×8×9+(4-1)×8×9+(8-1)×9+(9-1)×1=1728+216+63+8=2015.
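The restoration arithmetic can be checked with a short Python sketch. The function name `restore_user_id` is hypothetical, and the first count (10) is an assumed value; the restoration never uses it, since only the counts of later parts enter each product. Evaluating the sum term by term gives 1728 + 216 + 63 + 8 = 2015, the user ID used in [0103].

```python
from math import prod


def restore_user_id(indices: list[int], counts: list[int], preset: int = 1) -> int:
    """Inverse of the mixed-radix decomposition: sum of (index - preset)
    times the product of the replacement item counts of all later parts."""
    total = 0
    for i, index in enumerate(indices):
        first_difference = index - preset
        # Empty product for the last part is 1 (the default value).
        later_product = prod(counts[i + 1:])
        total += first_difference * later_product
    return total


counts = [10, 6, 8, 9]  # first count is assumed and does not affect the result
print(restore_user_id([5, 4, 8, 9], counts))  # 2015
```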

[0127] In this embodiment, in the event of a leaked downloadable file, the downloadable file is compared with the target file to determine the replaced portion of the downloadable file. Based on the replacement index value corresponding to the replaced portion, user identification information is restored, and the source of the leak is determined based on the restored user identification information. The forward process involves encoding user information (through mapping rules) into the index value and performing the replacement; the reverse process in this embodiment decodes the user information from the replacement result (index value), highlighting the consistency and reversibility of the solution.

[0128] The document processing method provided in this application, based on the above embodiments, further includes:

[0129] S401. Set at least one replaceable section in the target file, and the replaceable section includes multiple replacement items.

[0130] For example, in a downloadable file, not all content is unreplaceable; some punctuation, examples, words, and even content can be replaced without affecting the original meaning. The file provider can then set multiple replaceable parts and replacement items. By pre-setting replaceable parts and their replacements in the target file, the replacement operation ensures that the integrity of the file content is not affected.

[0131] For example, the replaceable part is not limited to characters; images or other visually distinguishable content can also be used as the replaceable part.

[0132] For example, replaceable parts can be set manually by business personnel, or set automatically, for example by having a model detect replaceable content and generate the replacements.

[0133] Optionally, after setting at least one replaceable part in the target file, the number of replacement item combinations is determined based on the number of replaceable parts and the number of replaceable items corresponding to each replaceable part. When the number of replacement item combinations is greater than or equal to the upper limit of the number of users corresponding to the user identification information, the target file is determined to be an allowed download file. By controlling the number of replacement item combinations to be greater than or equal to the upper limit of the number of users, it is ensured that a unique replacement item combination is matched for each user identification information.

[0134] It should be noted that when the target file is determined to be an allowed download file, the corresponding downloadable file is made available for download by the user.

[0135] Optionally, if the number of replacement item combinations is less than the upper limit of the number of users, additional replacement items are added so that a unique replacement item combination can be matched to each user.

[0136] For example, suppose there are $n$ replaceable parts in total and the $i$-th replaceable part has $N_i$ replacement items. The number of distinct replacement item combinations is then $\prod_{i=1}^{n} N_i$, that is, the file can represent $\prod_{i=1}^{n} N_i$ individual users. Before publishing the file, check whether $\prod_{i=1}^{n} N_i$ is greater than or equal to the planned upper limit of the number of users. If it is, the file can be published; otherwise, additional replacement content is required.
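The pre-publication check can be sketched as below. The function names and the sample counts and user limits are hypothetical; the comparison uses "greater than or equal to", following [0133].

```python
from math import prod


def combination_count(counts: list[int]) -> int:
    """Number of distinct replacement item combinations."""
    return prod(counts)


def is_allowed_download(counts: list[int], user_limit: int) -> bool:
    """True when every user up to user_limit can get a unique combination."""
    return combination_count(counts) >= user_limit


counts = [10, 6, 8, 9]
print(combination_count(counts))          # 4320
print(is_allowed_download(counts, 4000))  # True
print(is_allowed_download(counts, 5000))  # False

# Supplementing replacement items: adding a fifth part with 3 items
# raises the capacity to 12960, enough for a limit of 5000 users.
print(is_allowed_download(counts + [3], 5000))  # True
```

With user IDs numbered from 0, a capacity of 4320 covers IDs 0 through 4319; larger IDs would wrap around under the modulo operation and collide with smaller ones.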

[0137] Optionally, the replaceable parts include at least one of punctuation marks, non-critical text, and formatting styles.

[0138] Non-critical text refers to text portions of a file that do not affect the overall content, such as example descriptions and annotations. For example, replace "." with "," in the replaceable section; replace "example text" with "reference content" in the replaceable section; change the font color of the replaceable section from black to gray.

[0139] By selecting punctuation marks, non-critical text, or formatting styles as alternatives, the concealment of identification information is significantly improved without affecting the content of the document, ensuring that the replacement operation has no substantial impact on the readability of the document and the business logic.
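As an illustration of how punctuation and non-critical text could be swapped in, the sketch below builds a downloadable text from a target file model and a list of 1-based replacement item index values. The model is an assumption made for this example: a hypothetical list `parts` in which a plain string is fixed text and a list of strings holds the replacement items of one replaceable part. Formatting styles such as font color are not modelled here, and the function name `render_downloadable` is hypothetical.

```python
def render_downloadable(parts: list, indices: list[int]) -> str:
    """Pick the indexed replacement item for each replaceable part and
    join everything into the downloadable text."""
    remaining = iter(indices)
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)                    # fixed text is copied unchanged
        else:
            out.append(part[next(remaining) - 1])  # indices are 1-based
    if next(remaining, None) is not None:
        raise ValueError("more index values than replaceable parts")
    return "".join(out)


parts = [
    "Please review the ",
    ["example text", "reference content"],  # non-critical text
    " below",
    [".", ";"],                             # punctuation
]
print(render_downloadable(parts, [2, 1]))  # Please review the reference content below.
```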

[0140] In some examples, at least two of the following replacement methods are performed on the replaceable portion: character replacement, image replacement, and format replacement.

[0141] Image replacement refers to replacing replaceable parts with image-based identifiers, such as inserting a specific pattern into a file or replacing sample text with an image containing user ID information.

[0142] By employing a multi-dimensional replacement strategy with redundant embedding of the identification information, the resistance of the identification to destruction is enhanced. Even if one form of embedding is destroyed, for example when the file is passed through OCR or photographed, the identification information can still be preserved by the other forms.

[0143] In some examples, natural language processing models are used to analyze the semantics and structure of the file content and dynamically generate replacements that fit the file content.

[0144] Natural language processing models are models that analyze the semantics and structure of text using machine learning algorithms, for example to generate replacements based on context.

[0145] By dynamically generating replacement items through natural language processing models, we ensure that the replacement items have minimal visual and functional impact on the document content, improve the integration of replacement items with text content, reduce the likelihood of users perceiving the identification information through semantic consistency, and adapt to the needs of different business scenarios.

[0146] In some examples, an encryption algorithm (such as a hash function or symmetric encryption) is introduced in the mapping process between user ID and replacement index to convert the user ID into an encrypted value, which is then mapped to the replacement index through a number system conversion.

[0147] For example, a user ID is first encrypted using an encryption algorithm to generate a fixed-length ciphertext. Then, the ciphertext is split into multiple numerical segments, each corresponding to an index of a replacement part. For instance, the user ID "2015" is encrypted to generate the ciphertext "a1b2c3d4", which is split into four segments: "a1", "b2", "c3", and "d4", each mapped to an index value of a replacement part.

[0148] The encrypted mapping rules increase the complexity of the replacement index generation, preventing attackers from reverse engineering user IDs through simple mathematical calculations. The encryption algorithm significantly improves the resistance to reverse engineering of the replacement rules, preventing the identification information from being cracked; the encrypted value is not directly related to the user ID, reducing the risk of identification information being identified; the encryption algorithm can support longer user ID lengths, adapting to the needs of large-scale user groups.
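One way such a keyed mapping might look is sketched below. It is an assumption-laden illustration, not the scheme of the application: instead of splitting a ciphertext into segments as in [0147], it computes an HMAC-SHA256 of the user ID with a secret key, reduces the result modulo the number of replacement item combinations, and then applies the mixed cardinality decomposition. Because a hash is not invertible, tracing here recomputes the indices for each known user ID rather than decoding them. The function names, the key, and the counts are all hypothetical.

```python
import hashlib
import hmac
from math import prod


def keyed_index_values(user_id: str, key: bytes, counts: list[int]) -> list[int]:
    """Map a user ID to 1-based replacement item indices via a keyed hash."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the 256-bit digest to the range of available combinations.
    value = int.from_bytes(digest, "big") % prod(counts)
    return [
        (value % prod(counts[i:])) // prod(counts[i + 1:]) + 1
        for i in range(len(counts))
    ]


def trace_candidates(indices: list[int], key: bytes, counts: list[int],
                     known_user_ids: list[str]) -> list[str]:
    """Return every known user ID whose keyed indices match the leaked ones."""
    return [u for u in known_user_ids
            if keyed_index_values(u, key, counts) == indices]


key = b"demo-key"        # placeholder secret, for illustration only
counts = [10, 6, 8, 9]   # assumed replacement item counts
leaked = keyed_index_values("00000002015", key, counts)
print(trace_candidates(leaked, key, counts, ["00000002015", "00000002016"]))
```

With this design two different user IDs can map to the same combination, so a deployment would have to detect and resolve collisions at assignment time; the uniqueness guarantee of the plain decomposition no longer holds automatically.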

[0149] In some examples, the replacement process is divided into a base layer and an extension layer: the base layer handles core identification information (such as user ID), and the extension layer handles auxiliary information (such as download time and file type). For example, the base layer embeds the user ID by replacing characters, and the extension layer embeds the download timestamp by replacing the format.

[0150] The base layer uses fixed replacement rules (such as number system conversion) to generate replacement items corresponding to the user ID, while the extension layer uses dynamic rules (such as timestamp hashing) to generate replacement items corresponding to auxiliary information. During file download, the system performs replacement operations in both the base and extension layers simultaneously.

[0151] The layered replacement mechanism ensures that even if the extension layer is compromised, the base layer still provides effective traceability by separating core identification information from auxiliary information. Furthermore, the core identification information (user ID) exists independently within the base layer, reducing the risk of traceability failure due to extension layer malfunction. The extension layer can embed more business-related data (such as download time and file type), enhancing the richness of traceability information. The layered design requires attackers to compromise multiple replacement layers simultaneously to completely eliminate the identification information, significantly improving security.

[0152] In some examples, machine learning models (such as generative adversarial networks, GANs) are used to predict user acceptance of replacements and select the replacement with the least impact on users. For example, in financial documents, the model analyzes user acceptance of replacing "example text" with "reference content" and prioritizes the replacement with the least perceived difference for users.

[0153] Train a machine learning model based on user behavior data. Input a user ID and file content, and output the optimal combination of replacement options. The model learns from historical data the user's acceptance of different replacement options (such as photocopying behavior and file modification history) and dynamically optimizes the replacement strategy.

[0154] Machine learning-assisted replacement uses a data-driven approach to select replacements that have the least impact on user perception, reducing the likelihood of users actively removing the identified information. Furthermore, minimizing the impact of replacements on user perception reduces the motivation for users to actively remove the identified information. By learning from user behavior data, it generates replacements that better align with user habits, reducing the risk of the identified information being recognized. The model can dynamically adjust the replacement strategy based on changes in user behavior, adapting to the needs of different business scenarios.

[0155] Figure 3 is a schematic diagram of the file processing device provided by this application. As shown in Figure 3, the file processing device 40 provided in this embodiment includes:

[0156] The first processing module 41 is used to generate replacement item index values for replaceable parts in the target file based on the user identification information of the requester and mapping rules when a download request for the target file is received.

[0157] The second processing module 42 is used to perform a replacement operation on the replaceable part based on the replacement item index value and generate a downloadable file corresponding to the target file.

[0158] The third processing module 43 is used to restore user identification information based on the replaced part of the downloadable file in the event of a leak, and to determine the source of the leak based on the restored user identification information.

[0159] In one possible implementation, the first processing module 41 is specifically used to, when there are multiple replaceable parts, upon receiving a download request for the target file, perform mixed cardinality decomposition based on the user identifier information of the requester and the number of replaceable items in each replaceable part of the target file to obtain the replaceable item index value of each replaceable part.

[0160] In one possible implementation, the first processing module 41 is specifically used for any replaceable portion in the target file other than the last replaceable portion, referred to as the target replaceable portion:

[0161] The numbers of replaceable items corresponding to the target replaceable part and to all replaceable parts after the target replaceable part are multiplied to obtain the first product, and the value corresponding to the user identifier information of the requester is taken modulo the first product to obtain the first result;

[0162] The numbers of replaceable items corresponding to all replaceable parts after the target replaceable part are multiplied to obtain the second result;

[0163] The ratio of the first result to the second result is rounded down and then increased by a preset value to obtain the replacement index value of the target replaceable part.

[0164] In one possible implementation, the first processing module 41 is specifically configured to process the last replaceable portion of the target file:

[0165] The third result is obtained by taking the modulo between the value corresponding to the user's identifier information and the number of replaceable items in the last replaceable part;

[0166] The fourth result is obtained based on the ratio of the third result to the default value;

[0167] The ratio of the third and fourth results is rounded down and then increased by a preset value to obtain the replacement index value of the last replaceable part.

[0168] In one possible implementation, the third processing module 43 is specifically used to compare the downloadable file with the target file in the event of a leaked downloadable file, and determine the replaced part in the downloadable file;

[0169] Based on the replacement index value corresponding to the replaced part in the downloadable file, restore the user identification information, and determine the source of the leak based on the restored user identification information.

[0170] In one possible implementation, the third processing module 43 is specifically used to determine the first difference corresponding to the target replaced part based on the difference between the replacement index value corresponding to the target replaced part and the preset value.

[0171] The second product of the target replaced part is determined by the product of the number of replaceable items corresponding to all replaced parts after the target replaced part; wherein, when the target replaced part is the last replaced part, the product of the number of replaceable items corresponding to all replaced parts after it is the default value;

[0172] Based on the product of the first difference and the second product, the identifier of the replaced part of the target is determined;

[0173] The target replaced part is any replaced part in the downloadable file. After the identifier of each replaced part is determined, the user identifier information is restored based on the sum of the identifiers of each replaced part.

[0174] In one possible implementation, a fourth processing module 44 is also included, for setting at least one replaceable portion in the target file, the replaceable portion including multiple replaceable items.

[0175] In one possible implementation, the fourth processing module 44 is further configured to determine the number of replacement item combinations based on the number of replaceable parts and the number of replaceable items corresponding to each replaceable part;

[0176] When the number of replacement combinations is greater than or equal to the upper limit of the number of users corresponding to the user identification information, the target file is determined to be an allowed download file.

[0177] In one possible implementation, the fourth processing module 44 is also used to supplement replaceable items when the number of replacement combination items is less than the upper limit of the number of users.

[0178] The file processing device provided in this embodiment can execute the method provided in the above method embodiment. Its implementation principle and technical effect are similar, and will not be described in detail here.

[0179] Figure 4 is a schematic diagram of the structure of the electronic device provided by this application. As shown in Figure 4, the electronic device 50 provided in this embodiment includes at least one processor 501 and a memory 502. Optionally, the electronic device 50 further includes a communication component 503. The processor 501, memory 502, and communication component 503 are connected via a bus.

[0180] In a specific implementation, the at least one processor 501 executes computer-executable instructions stored in the memory 502, causing the at least one processor 501 to perform the above-described method.

[0181] The specific implementation process of processor 501 can be found in the above method embodiments, and its implementation principle and technical effect are similar. It will not be repeated here.

[0182] In the above embodiments, it should be understood that the processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this invention can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules within the processor.

[0183] The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk storage device.

[0184] The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, the buses shown in the accompanying drawings are not limited to a single bus or a single type of bus.

[0185] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.

[0186] This application also provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement the above-described method.

[0187] The aforementioned readable storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The readable storage medium can be any available medium accessible to a general-purpose or special-purpose computer.

[0188] An exemplary readable storage medium is coupled to a processor, enabling the processor to read information from and write information to the readable storage medium. Of course, the readable storage medium can also be a component of the processor. The processor and the readable storage medium can reside in an Application Specific Integrated Circuit (ASIC). Alternatively, the processor and the readable storage medium can exist as discrete components in the device.

[0189] The division of units is merely a logical functional division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or other forms.

[0190] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0191] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0192] If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0193] Those skilled in the art will understand that all or part of the steps of the above-described method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above-described method embodiments; and the aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.

[0194] Finally, it should be noted that other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed herein, and is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims

1. A file processing method, characterized in that it includes: upon receiving a download request for a target file, generating a replacement item index value for the replaceable part in the target file based on the user identification information of the requester and a mapping rule; performing a replacement operation on the replaceable part based on the replacement item index value to generate a downloadable file corresponding to the target file; and, in the event of a leak of the downloadable file, restoring the user identification information based on the replaced part in the downloadable file, and determining the source of the leak based on the restored user identification information.

2. The method according to claim 1, characterized in that the step of, upon receiving a download request for a target file, generating replacement item index values for replaceable parts of the target file based on the requester's user identifier information and mapping rules includes: when there are multiple replaceable parts, upon receiving a download request for the target file, performing a mixed cardinality decomposition based on the user identifier information of the requester and the number of replaceable items in each replaceable part of the target file to obtain the replaceable item index value for each replaceable part.

3. The method according to claim 2, characterized in that, The step of performing a mixed cardinality decomposition based on the requester's user identifier information and the number of replaceable items in each replaceable part of the target file to obtain the replaceable item index value for each replaceable part includes: Any replaceable portion in the target file, excluding the last replaceable portion, is called a target replaceable portion: The first product is obtained by multiplying the number of replaceable items corresponding to the target replaceable part and all replaceable parts after the target replaceable part, and the first result is obtained by taking the modulo result of the first product with the value corresponding to the user identifier information of the requester. Multiply the number of replacement items corresponding to all replacement parts after the target replacement part to obtain the second result; The ratio of the first result to the second result is rounded down and then increased by a preset value to obtain the replacement index value of the target replaceable part.

4. The method according to claim 2, characterized in that, The step of performing a mixed cardinality decomposition based on the requester's user identifier information and the number of replaceable items in each replaceable part of the target file to obtain the replaceable item index value for each replaceable part includes: For the last replaceable portion in the target file: The third result is obtained by taking the modulo between the value corresponding to the user identifier information of the requester and the number of replaceable items in the last replaceable part; Based on the ratio of the third result to the default value, a fourth result is obtained; The ratio of the third result to the fourth result is rounded down and then increased by a preset value to obtain the replacement index value of the last replaceable part.

5. The method according to claim 1, characterized in that, In the event of a leaked downloadable file, restoring user identification information based on the replaced portion of the downloadable file, and determining the source of the leak based on the restored user identification information, includes: In the event of a leaked downloadable file, the downloadable file is compared with the target file to determine the replaced portion of the downloadable file; Based on the replacement index value corresponding to the replaced portion of the downloadable file, the user identification information is restored, and the source of the leak is determined based on the restored user identification information.

6. The method according to claim 5, characterized in that, The step of restoring user identification information based on the replacement index value corresponding to the replaced portion in the downloadable file includes: The first difference corresponding to the target replaced part is determined based on the difference between the replacement index value corresponding to the target replaced part and the preset value; The second product of the target replaced part is determined based on the product of the number of replaceable items corresponding to all replaced parts after the target replaced part; wherein, when the target replaced part is the last replaced part, the product of the number of replaceable items corresponding to all replaced parts after it is the default value; Based on the product of the first difference and the second product, the identifier of the replaced part of the target is determined; The target replaced part is any replaced part in the downloadable file. After the identifier of each replaced part is determined, the user identifier information is restored by summing the identifiers of each replaced part.

7. The method according to any one of claims 1-6, characterized in that the method further comprises: setting at least one replaceable part in the target file, each replaceable part including multiple replaceable items.

8. The method according to claim 7, characterized in that the method further comprises: determining the number of replacement combinations based on the number of replaceable parts and the number of replaceable items of each replaceable part; and, when the number of replacement combinations is greater than or equal to the upper limit of the number of users corresponding to the user identification information, determining the target file to be a file that is allowed to be downloaded.

9. The method according to claim 8, characterized in that the method further comprises: when the number of replacement combinations is less than the upper limit of the number of users, adding additional replacements.
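
The capacity check in claims 8 and 9 can be illustrated with a short sketch. It assumes that the number of replacement combinations is the product of the per-part item counts, which is what a mixed-radix encoding would imply; the claims only say the number is determined from those counts. The function names are hypothetical.

```python
import math
from typing import Sequence


def combination_count(counts: Sequence[int]) -> int:
    """Number of distinct replacement combinations (assumed to be a product)."""
    return math.prod(counts)


def is_download_allowed(counts: Sequence[int], max_users: int) -> bool:
    """True when every user can be given a distinct combination."""
    return combination_count(counts) >= max_users
```

With parts of 4, 5 and 3 replaceable items there are 60 combinations, which is enough for 50 users but not for 100. Adding one more part with 2 items raises the count to 120, which covers 100 users.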

10. The method according to any one of claims 1-6, characterized in that the replaceable part includes at least one of punctuation marks, non-critical text, and formatting styles.

11. A file processing device, characterized by comprising: a first processing module, configured to, when a download request for a target file is received, generate a replacement item index value of a replaceable part in the target file based on the user identification information of the requester through a mapping rule; a second processing module, configured to perform a replacement operation on the replaceable part based on the replacement item index value and generate a downloadable file corresponding to the target file; and a third processing module, configured to, in the event that the downloadable file is leaked, restore the user identification information based on the replaced part in the downloadable file and determine the source of the leak based on the restored user identification information.

12. An electronic device, characterized by comprising: a memory and a processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method according to any one of claims 1-10.

13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1-10.

14. A computer program product, characterized by comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.