Concurrent editing based document processing method and apparatus, and electronic device
By aligning the structural tags and semantic information of document blocks, the system identifies document block rearrangement and movement operations during concurrent editing, solving the problem of poor accuracy in merging results during concurrent editing and achieving efficient and accurate document merging processing.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents(China)
- Current Assignee / Owner: YUDONGYUAN (BEIJING) INFORMATION TECH CO LTD
- Filing Date: 2026-02-06
- Publication Date: 2026-06-23
AI Technical Summary
Existing technologies have poor accuracy in merging documents edited concurrently by multiple users, especially when adjusting and rearranging document structures, making it difficult to guarantee semantic integrity and accuracy.
The method acquires each user's target version document, aligns the documents based on the structural tags and semantic information of their document blocks, identifies reordering and moving operations on document blocks, performs text difference processing, generates candidate editing and merging information, and finally generates the target merged document.
It improves the accuracy and semantic integrity of document merging results in concurrent editing scenarios, supports semi-automatic merging and full-process auditable playback, and meets auditing and accountability requirements.
Figure CN121659906B_ABST
Description
Technical Field
[0001] This application relates to the field of computer technology, and more specifically, to a document processing method, apparatus, and electronic device based on concurrent editing.
Background Technology
[0002] With the continuous development of technology, the collaborative model of knowledge-based organizations has shifted from synchronous collaboration in the same location to a hybrid model that combines real-time collaboration among multiple people with asynchronous offline review. Remote and hybrid work has become a long-term norm, which means that the same document is repeatedly modified, reviewed, and rolled back by multiple people at different times, in different locations, and under different network conditions. Document merging and auditing have therefore become routine infrastructure.
[0003] In existing technologies, the results of concurrent editing by multiple users are usually merged by performing a character-level difference comparison across the concurrently edited versions to quickly locate the changes, which are then merged.
[0004] However, the above method can lead to poor accuracy in the merging results.
Summary of the Invention
[0005] The purpose of this application is to address the shortcomings of the prior art by providing a document processing method, apparatus, and electronic device based on concurrent editing, thereby improving the accuracy of document merging and modification in concurrent editing scenarios.
[0006] To achieve the above objectives, the technical solutions adopted in the embodiments of this application are as follows:
[0007] In a first aspect, embodiments of this application provide a document processing method based on concurrent editing, including:
[0008] Based on the edit submission requests submitted by different users for the same original document, obtain the target version document for each user separately;
[0009] Based on the block information of each document block in each target version document and the block information of each document block in the original document, the structure tags of each document block in each target version document and the structure tags of each document block in the original document are determined respectively. The block information includes: structured location information, semantic information and persistent identifiers assigned to the document blocks during initialization.
[0010] Based on the structural tags of each document block in each target version document and the structural tags of each document block in the original document, determine the same document block between each target version document and the original document;
[0011] Based on the same document block in the original document, perform text difference processing on the same document block in each of the target version documents to generate candidate editing and merging information;
[0012] Based on the candidate edit and merge information, the original document is edited to obtain the target merged document.
[0013] Optionally, determining the structural tags of each document block in each target version document and the structural tags of each document block in the original document based on the block information of each document block in each target version document and the block information of each document block in the original document includes:
[0014] Based on the structured location information of each document block in the target version document and the structured location information of each document block in the original document, respectively, determine the structure path information of each document block in the target version document and the structure path information of each document block in the original document;
[0015] Based on the semantic information of each document block in the target version document and the semantic information of each document block in the original document, the semantic summary information of each document block in the target version document and the semantic summary information of each document block in the original document are determined respectively.
[0016] Based on the persistent identifier assigned to the document block during initialization, the global identifier of each document block in the target version document and the global identifier of each document block in the original document are determined respectively.
[0017] Based on the structural path information, semantic summary information and global identifier of each document block in the target version document, determine the structural tags of each document block in the target version document;
[0018] Based on the structural path information, semantic summary information, and global identifier of each document block in the original document, the structural tags of each document block in the original document are determined.
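As a non-limiting illustration, a structural tag combining the three components above might be represented as follows. This is a minimal sketch under stated assumptions: the names `StructureTag`, `semantic_summary`, and `build_tag`, the path syntax `kind[index]`, and the use of a truncated SHA-256 digest of normalized text as the semantic summary are all hypothetical and are not specified by this application.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureTag:
    path: str       # structural path information, e.g. "chapter[2]/paragraph[3]"
    summary: str    # semantic summary information (assumed here: a text digest)
    global_id: str  # global identifier derived from the persistent identifier


def semantic_summary(text: str) -> str:
    """Hypothetical summary: digest of case- and whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def build_tag(location: list[tuple[str, int]], text: str, persistent_id: str) -> StructureTag:
    """Combine structured location, semantic information, and persistent ID into one tag."""
    path = "/".join(f"{kind}[{index}]" for kind, index in location)
    return StructureTag(path=path, summary=semantic_summary(text), global_id=persistent_id)


tag = build_tag(
    [("chapter", 2), ("paragraph", 3)],
    "The system supports three import methods: A, B, and C",
    "blk-0007",
)
print(tag.path)  # chapter[2]/paragraph[3]
```

In this sketch the tag is immutable, so the same tag value can safely be used as a dictionary key during alignment.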
[0019] Optionally, determining the same document block between each target version document and the original document based on the structure tags of each document block in each target version document and the structure tags of each document block in the original document includes:
[0020] Based on the structural path information of each document block in each target version document and the structural path information of each document block in the original document, determine whether there are document blocks with the same structural path information in each target version document and the original document;
[0021] If so, then the document blocks with the same structural path information in each target version document and the original document shall be regarded as the same document block between each target version document and the original document;
[0022] If not, then based on the global identifiers of each document block in each target version document and the global identifiers of each document block in the original document, determine whether there are document blocks with the same global identifier in each target version document and the original document;
[0023] If so, then the document block with the same global identifier in each target version document and the original document shall be regarded as the same document block between each target version document and the original document.
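The two-stage lookup described above (structural path first, global identifier as a fallback) can be sketched as follows. The `Tag` type and the function name `find_same_block` are illustrative assumptions, and the example paths and identifiers are invented for the demonstration.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    path: str       # structural path information
    global_id: str  # global identifier


def find_same_block(target_tag: Tag, original_tags: list[Tag]) -> Optional[Tag]:
    """Return the original block regarded as the same block, or None if there is none."""
    by_path = {t.path: t for t in original_tags}
    by_id = {t.global_id: t for t in original_tags}
    # Stage 1: a block with the same structural path is regarded as the same block.
    if target_tag.path in by_path:
        return by_path[target_tag.path]
    # Stage 2: otherwise fall back to the global identifier.
    return by_id.get(target_tag.global_id)


original = [Tag("sec[1]/p[1]", "b1"), Tag("sec[1]/p[2]", "b2")]
same_by_path = find_same_block(Tag("sec[1]/p[1]", "b1"), original)
same_by_id = find_same_block(Tag("sec[2]/p[1]", "b2"), original)  # block was moved
no_match = find_same_block(Tag("sec[3]/p[1]", "b9"), original)    # newly added block
```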
[0024] Optionally, the method further includes:
[0025] Based on the semantic summary information of the same document block in each of the target version documents and the semantic summary information of the same document block in the original document, the determined same document block between each target version document and the original document is verified.
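One possible form of this verification is a similarity check between the two semantic summaries. The sketch below assumes, purely for illustration, that a semantic summary is the set of lower-cased tokens of the block text and that two blocks pass verification when their Jaccard similarity reaches a threshold of 0.5; neither the summary form nor the threshold is specified by this application.

```python
def token_summary(text: str) -> frozenset[str]:
    """Hypothetical semantic summary: the set of lower-cased whitespace tokens."""
    return frozenset(text.lower().split())


def verify_same_block(target_text: str, original_text: str, threshold: float = 0.5) -> bool:
    """Accept the alignment only if the two summaries are sufficiently similar."""
    a, b = token_summary(target_text), token_summary(original_text)
    if not a and not b:
        return True  # two empty blocks are trivially the same
    return len(a & b) / len(a | b) >= threshold


accepted = verify_same_block(
    "The system supports four import methods",
    "The system supports three import methods",
)
rejected = verify_same_block(
    "Totally unrelated sentence here",
    "The system supports three import methods",
)
```

A failed check would indicate that two blocks sharing a path or identifier no longer carry related content, so the alignment should not be trusted without review.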
[0026] Optionally, the method further includes:
[0027] If each of the target version documents and the original document contains a target document block with the same global identifier but different structural path information, then a target tag is generated for the target document block. The target tag is used to indicate that the target document block has undergone move or rearrangement editing.
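The rule above can be sketched as a comparison of two mappings from global identifier to structural path. The function name `detect_moves`, the dictionary layout, and the tag value `"moved"` are assumptions made for this illustration only.

```python
def detect_moves(original: dict[str, str], target: dict[str, str]) -> dict[str, dict[str, str]]:
    """Tag blocks whose global identifier is unchanged but whose structural path differs.

    Both arguments map global_id -> structural path.
    """
    moves = {}
    for global_id, old_path in original.items():
        new_path = target.get(global_id)
        if new_path is not None and new_path != old_path:
            moves[global_id] = {"tag": "moved", "from": old_path, "to": new_path}
    return moves


original_paths = {"b1": "sec[1]/p[1]", "b2": "sec[1]/p[2]"}
target_paths = {"b1": "sec[1]/p[1]", "b2": "sec[2]/p[1]"}
moves = detect_moves(original_paths, target_paths)
```

Because the block is recognized as moved, the relocation does not have to be reported as a deletion at the old position plus an insertion at the new one.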
[0028] Optionally, the step of performing text difference processing on the same document block in each of the target version documents based on the same document block in the original document to generate candidate edit merge information includes:
[0029] The text content of the same document block in each of the target version documents is compared with the text content of the same document block in the original document using character-level difference processing to determine the editing information corresponding to each of the target version documents; the editing information includes the editing content performed on the same document block in the target version document;
[0030] Based on the editing information corresponding to each target version document, conflict marker information for the same document block is generated for each target version document; the conflict marker information includes first marker information or second marker information; the first marker information is used to indicate that there is no conflict in the editing of the same document block, and the second marker information is used to indicate that there is a conflict in the editing of the same document block;
[0031] Based on the editing information corresponding to each target version document and the conflict marker information, candidate editing and merging information is generated.
[0032] Optionally, generating candidate edit merge information based on the editing information corresponding to each target version document and the conflict marker information includes:
[0033] If the conflict marker information is the first marker information, then the editing information corresponding to each target version document is concatenated to generate candidate editing merge information;
[0034] If the conflict marker information is the second marker information, then each editing information is classified according to the editing information corresponding to each target version document, and candidate editing merging information is generated based on the classification results.
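The character-level difference processing and the two marker cases can be sketched with Python's standard `difflib` module. This is an illustrative sketch rather than the claimed implementation: the overlap rule used to decide whether edits conflict, the marker values `"no_conflict"` and `"conflict"`, and the choice to keep conflicting edits grouped per user (standing in for the classification step) are all assumptions.

```python
from difflib import SequenceMatcher

Edit = tuple[int, int, str]  # (start, end) span in the original text and its replacement


def char_edits(original: str, edited: str) -> list[Edit]:
    """Character-level edits that turn `original` into `edited`."""
    matcher = SequenceMatcher(None, original, edited, autojunk=False)
    return [(i1, i2, edited[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]


def overlaps(a: Edit, b: Edit) -> bool:
    """Assumed conflict rule: the two spans in the original text touch or intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def candidate_merge(original: str, versions: dict[str, str]) -> dict:
    edits_by_user = {user: char_edits(original, text) for user, text in versions.items()}
    users = list(edits_by_user)
    conflict = any(
        overlaps(a, b)
        for i, u in enumerate(users) for v in users[i + 1:]
        for a in edits_by_user[u] for b in edits_by_user[v]
        if a != b  # identical edits from two users are not a conflict
    )
    if conflict:
        # Second marker: keep the edits grouped per user for later classification.
        return {"marker": "conflict", "edits": edits_by_user, "merged": None}
    # First marker: combine all edits, applying them right to left so offsets stay valid.
    merged = original
    for start, end, replacement in sorted({e for es in edits_by_user.values() for e in es},
                                          reverse=True):
        merged = merged[:start] + replacement + merged[end:]
    return {"marker": "no_conflict", "edits": edits_by_user, "merged": merged}


clean = candidate_merge("alpha beta gamma",
                        {"A": "ALPHA beta gamma", "B": "alpha beta GAMMA"})
clash = candidate_merge("alpha beta gamma",
                        {"A": "ALPHA beta gamma", "C": "omega beta gamma"})
```

In the first call the two users touch disjoint spans, so the edits are combined; in the second call both users rewrite the first word, so the result is marked as a conflict and no merged text is produced.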
[0035] Optionally, after obtaining the target merged document, the process further includes:
[0036] Based on the target merged document and the original document, a patch file is generated, which includes at least: the modified document blocks and the modified content;
[0037] Generate a merge ledger, which includes at least: the structural path information of the document blocks in each target version document of each user who submitted the edit request, the edit content of the document blocks for which the edit was performed, conflict information, and the merge strategy adopted for the conflicted edits.
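For illustration, a patch file and a ledger of this kind could be produced as sketched below. The unified diff format, the JSON ledger layout, the field names, and the strategy value `"manual-review"` are assumptions chosen for the example; this application does not prescribe a file format.

```python
import difflib
import json


def make_patch(original_blocks: dict[str, str], merged_blocks: dict[str, str]) -> str:
    """Unified diff covering only the modified blocks, keyed by block identifier."""
    lines = []
    for block_id, old_text in original_blocks.items():
        new_text = merged_blocks.get(block_id, "")
        if new_text == old_text:
            continue  # unmodified blocks are left out of the patch
        lines.extend(difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(),
            fromfile=f"original/{block_id}", tofile=f"merged/{block_id}", lineterm=""))
    return "\n".join(lines)


def make_ledger(entries: list[dict]) -> str:
    """Serialize ledger entries so the merge can be replayed and audited."""
    return json.dumps(entries, indent=2, ensure_ascii=False)


original_blocks = {"blk-1": "The system supports three import methods.",
                   "blk-2": "Unchanged paragraph."}
merged_blocks = {"blk-1": "The system supports four import methods.",
                 "blk-2": "Unchanged paragraph."}
patch = make_patch(original_blocks, merged_blocks)
ledger = make_ledger([{
    "user": "A",
    "path": "chapter[1]/paragraph[2]",
    "edit": "three -> four",
    "conflict": True,
    "strategy": "manual-review",
}])
```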
[0038] In a second aspect, embodiments of this application also provide a document processing apparatus based on concurrent editing, including: an acquisition module, a determination module, a generation module, and a processing module;
[0039] The acquisition module is used to acquire the target version document of each user based on the editing submission requests submitted by different users for the same original document;
[0040] The determining module is used to determine the structural tags of each document block in each target version document and the structural tags of each document block in the original document based on the block information of each document block in each target version document and the block information of each document block in the original document, wherein the block information includes: structured location information, semantic information and persistent identifiers assigned to the document blocks during initialization.
[0041] The determining module is further used to determine the same document block between each target version document and the original document based on the structure tags of each document block in each target version document and the structure tags of each document block in the original document;
[0042] The generation module is used to perform text difference processing on the same document block in each of the target version documents based on the same document block in the original document, and generate candidate editing and merging information;
[0043] The processing module is used to edit the original document according to the candidate edit and merge information to obtain the target merged document.
[0044] Optionally, the determining module is specifically used to determine the structural path information of each document block in the target version document and the structural path information of each document block in the original document based on the structured position information of each document block in the target version document and the structured position information of each document block in the original document, respectively.
[0045] Based on the semantic information of each document block in the target version document and the semantic information of each document block in the original document, the semantic summary information of each document block in the target version document and the semantic summary information of each document block in the original document are determined respectively.
[0046] Based on the persistent identifier assigned to the document block during initialization, the global identifier of each document block in the target version document and the global identifier of each document block in the original document are determined respectively.
[0047] Based on the structural path information, semantic summary information and global identifier of each document block in the target version document, determine the structural tags of each document block in the target version document;
[0048] Based on the structural path information, semantic summary information, and global identifier of each document block in the original document, the structural tags of each document block in the original document are determined.
[0049] Optionally, the determining module is specifically used to determine whether there are document blocks with the same structural path information in each target version document and the original document, based on the structural path information of each document block in each target version document and the structural path information of each document block in the original document;
[0050] If so, then the document blocks with the same structural path information in each target version document and the original document shall be regarded as the same document block between each target version document and the original document;
[0051] If not, then based on the global identifiers of each document block in each target version document and the global identifiers of each document block in the original document, determine whether there are document blocks with the same global identifier in each target version document and the original document;
[0052] If so, then the document block with the same global identifier in each target version document and the original document shall be regarded as the same document block between each target version document and the original document.
[0053] Optionally, the apparatus further includes a verification module;
[0054] The verification module is used to verify the determined same document block between each target version document and the original document based on the semantic summary information of the same document block in each target version document and the semantic summary information of the same document block in the original document.
[0055] Optionally, the generation module is further configured to generate a target tag for the target document block if each of the target version documents and the original document have target document blocks with the same global identifier but different structural path information, and the target tag is used to indicate that the target document block has undergone move or rearrangement editing.
[0056] Optionally, the generation module is specifically used to perform character-level difference processing on the text content of the same document block in each of the target version documents and the text content of the same document block in the original document to determine the editing information corresponding to each of the target version documents; the editing information includes the editing content performed on the same document block in the target version document;
[0057] Based on the editing information corresponding to each target version document, conflict marker information for the same document block is generated for each target version document; the conflict marker information includes first marker information or second marker information; the first marker information is used to indicate that there is no conflict in the editing of the same document block, and the second marker information is used to indicate that there is a conflict in the editing of the same document block;
[0058] Based on the editing information corresponding to each target version document and the conflict marker information, candidate editing and merging information is generated.
[0059] Optionally, the generation module is specifically used to, if the conflict marking information is the first marking information, concatenate the editing information corresponding to each of the target version documents to generate candidate editing merge information;
[0060] If the conflict marker information is the second marker information, then each editing information is classified according to the editing information corresponding to each target version document, and candidate editing merging information is generated based on the classification results.
[0061] Optionally, the generation module is further configured to generate a patch file based on the target merged document and the original document, wherein the patch file includes at least: the modified document block and the modified content;
[0062] Generate a merge ledger, which includes at least: the structural path information of the document blocks in each target version document of each user who submitted the edit request, the edit content of the document blocks for which the edit was performed, conflict information, and the merge strategy adopted for the conflicted edits.
[0063] In a third aspect, embodiments of this application provide an electronic device, including: a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor. When the electronic device is running, the processor communicates with the storage medium via the bus, and the processor executes the machine-readable instructions to implement the document processing method based on concurrent editing as provided in the first aspect.
[0064] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when executed by a processor, performs the document processing method based on concurrent editing as provided in the first aspect.
[0065] The beneficial effects of this application are:
[0066] This application provides a document processing method, apparatus, and electronic device based on concurrent editing, comprising: obtaining the target version document of each user according to the editing submission requests submitted by different users for the same original document; determining the structural tags of each document block in each target version document and the structural tags of each document block in the original document according to the block information of each document block in each target version document and the block information of each document block in the original document; determining the same document block between each target version document and the original document according to the structural tags of each document block in each target version document and the structural tags of each document block in the original document; performing text difference processing on the same document block in each target version document according to the same document block in the original document to generate candidate editing merge information; and editing the original document according to the candidate editing merge information to obtain the target merged document. This method dynamically generates structural tags for each document block in the target version document and the original document based on editing requests submitted by different users to the same original document. These structural tags contain both structural and content information of the document blocks. Based on these tags, document blocks in the target version document and the original document are first aligned, and rearrangement and movement operations are effectively identified, avoiding their impact on the merging result. Secondly, based on the aligned document blocks, internal text difference processing is performed according to the content of the document blocks to effectively determine candidate editing merging information. Finally, based on this candidate information, the edits from different users are merged in the original document to generate the target merged document. This concurrent editing and merging process ensures convergence between edits submitted by different users while maintaining the semantic integrity and accuracy of the merging result, thus improving the accuracy of editing and merging results in concurrent editing scenarios.
[0067] In addition, generating patch files and a merge ledger makes the merge easy to replay and verify in the same environment, which facilitates automated execution in a pipeline and meets auditing and accountability needs.
Attached Figure Description
[0068] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0069] Figure 1 is a first flowchart of the document processing method based on concurrent editing provided in an embodiment of this application;
[0070] Figure 2 is a second flowchart of the document processing method based on concurrent editing provided in an embodiment of this application;
[0071] Figure 3 is a third flowchart of the document processing method based on concurrent editing provided in an embodiment of this application;
[0072] Figure 4 is a fourth flowchart of the document processing method based on concurrent editing provided in an embodiment of this application;
[0073] Figure 5 is a fifth flowchart of the document processing method based on concurrent editing provided in an embodiment of this application;
[0074] Figure 6 is a schematic diagram of a document processing apparatus based on concurrent editing provided in an embodiment of this application;
[0075] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0076] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. It should be understood that the accompanying drawings in this application are for illustrative and descriptive purposes only and are not intended to limit the scope of protection of this application. Furthermore, it should be understood that the schematic drawings are not drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of this application. It should be understood that the operations in the flowcharts may not be implemented in sequence, and steps without logical contextual relationships may be reversed or implemented simultaneously. In addition, those skilled in the art, guided by the content of this application, may add one or more other operations to the flowcharts, or remove one or more operations from the flowcharts.
[0077] Furthermore, the described embodiments are merely some, not all, of the embodiments of this application. The components of the embodiments of this application described and illustrated herein can typically be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.
[0078] It should be noted that the term "comprising" will be used in the embodiments of this application to indicate the presence of the features declared thereafter, but does not exclude the addition of other features.
[0079] As knowledge-based organizations undergo a long-term transformation towards remote and hybrid work models, collaborative editing has evolved from an auxiliary tool into core infrastructure supporting enterprise operations. Current mainstream collaborative editing systems primarily rely on two types of underlying consistency algorithms: Operational Transformation (OT) and Conflict-free Replicated Data Types (CRDT). Both aim to achieve eventual consistency across multiple replicas under abnormal conditions such as network partitioning, out-of-order message delivery, or concurrent writes. However, these mechanisms only guarantee convergence at the data level and do not address the integrity of semantic units or the fidelity of intent. For example, when multiple users simultaneously adjust the structure or replace terminology in the same paragraph, the system can ensure that all changes are synchronized without omission, but it cannot determine whether the resulting contradictions in content disrupt the original logic or consistency of expression. The result can be a document that is formally consistent but semantically wrong, which severely undermines document credibility.
[0080] In offline merging scenarios, traditional version control systems widely employ line- or character-level difference comparison and three-way merging strategies. While these technologies are mature and stable, their core limitation lies in their lack of understanding of document structure. When users perform common editing actions such as moving entire paragraphs, rearranging chapters, or copying across sections, the system often misinterprets these as a large number of deletions and additions. This not only amplifies the scope of conflicts but also introduces review noise, making it difficult for reviewers to distinguish between substantive rewriting and positional relocation. Manual contextual analysis is required to determine the true intent.
[0081] Structured difference techniques for specific formats have made some progress, but their applicability is limited. For example, for structured data such as XML / JSON, there are already three-way merging schemes based on abstract syntax trees or node path matching; for PDF or table processing, there are also methods that achieve content alignment through coordinate positioning and layout analysis. However, these schemes rely heavily on predefined data models or layout information, which makes them difficult to generalize to rich text documents dominated by natural language (such as reports, manuals, and technical white papers), and still less able to handle complex structures such as mixed text and graphics, dynamic numbering, and linked citations.
[0082] To address the aforementioned issues, this solution provides a document processing method based on concurrent editing. Without altering existing editing habits or the underlying storage architecture, it achieves block-level alignment prioritizing structure over content, accurately identifies movement and rearrangement, exposes semantic conflicts with fine granularity, and supports semi-automatic merging and fully auditable playback, thus enabling efficient and accurate document merging processing.
[0083] Figure 1 is a first flowchart of the document processing method based on concurrent editing provided in an embodiment of this application. The method can be executed by a processing device such as a computer. As shown in Figure 1, the method includes:
[0084] S101. Based on the editing submission requests submitted by different users for the same original document, obtain the target version document for each user.
[0085] The original document may refer to the ancestor version of the document, that is, the version that the different users last agreed upon. Starting from the original document, each user can make and submit their own modifications, thereby generating that user's target version document.
[0086] For example, suppose the original document contains the statement "The system supports three import methods: A, B, and C". Different users have made different modifications to this statement in the original document. User A modifies it to "The system supports four import methods: A, B, C, and D", while user B modifies it to "The system supports two import methods: A and B". Therefore, the statement recorded in the target version document obtained from user A is "The system supports four import methods: A, B, C, and D", while the statement recorded in the target version document obtained from user B is "The system supports two import methods: A and B".
[0087] Because different users have made their own modifications to the original document, a corresponding target version document will be generated for each user.
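As a non-limiting sketch of step S101, the edit submission requests can be grouped by user to obtain one target version document per user. The `EditSubmission` type, its fields, the version labels, and the rule that a later submission by the same user replaces an earlier one are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditSubmission:
    user: str
    base_version: str  # version of the original document the user started from
    content: str       # the user's full edited document


def collect_target_versions(submissions: list[EditSubmission],
                            original_version: str) -> dict[str, str]:
    """Return {user: target version document} for submissions against one original."""
    targets: dict[str, str] = {}
    for sub in submissions:
        if sub.base_version != original_version:
            continue  # edited a different original document; not part of this merge
        targets[sub.user] = sub.content  # assumed: the latest submission per user wins
    return targets


submissions = [
    EditSubmission("A", "v1", "The system supports four import methods: A, B, C, and D"),
    EditSubmission("B", "v1", "The system supports two import methods: A and B"),
    EditSubmission("C", "v0", "An edit against an older original document"),
]
targets = collect_target_versions(submissions, "v1")
```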
[0088] S102. Based on the block information of each document block in each target version document and the block information of each document block in the original document, determine the structure tags of each document block in each target version document and the structure tags of each document block in the original document.
[0089] The block information includes: structured location information, semantic information, and persistent identifiers assigned to document blocks during initialization.
[0090] Based on the document's structured information, the document can be divided into the smallest units, resulting in multiple document blocks. Each chapter, paragraph, list item, table cell, etc., in the document can be considered a document block.
[0091] Based on the document partitioning rules mentioned above, each target version document and the original document can first be divided into document blocks, to obtain each document block in each target version document and each document block in the original document.
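The block division can be illustrated with a deliberately simplified splitter. The sketch assumes a plain-text input in which blocks are separated by blank lines and a chapter heading starts with "# "; these input conventions and the function name `split_into_blocks` are hypothetical, and real documents would also yield list items, table cells, and other block types.

```python
def split_into_blocks(text: str) -> list[dict]:
    """Split text into chapter and paragraph blocks with structured location information."""
    blocks = []
    chapter = 0
    paragraph = 0
    for chunk in text.split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue
        if chunk.startswith("# "):
            chapter += 1
            paragraph = 0  # paragraph numbering restarts in each chapter
            blocks.append({"kind": "chapter",
                           "location": [("chapter", chapter)],
                           "text": chunk[2:]})
        else:
            paragraph += 1
            blocks.append({"kind": "paragraph",
                           "location": [("chapter", chapter), ("paragraph", paragraph)],
                           "text": chunk})
    return blocks


doc = "# Intro\n\nFirst para.\n\nSecond para.\n\n# Usage\n\nThird para."
blocks = split_into_blocks(doc)
```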
[0092] Based on the block information of each document block in the target version document, structural tags can be generated for each document block in the target version document; at the same time, based on the block information of each document block in the original document, structural tags can be generated for each document block in the original document.
[0093] The block information for each document block may include, but is not limited to, structured location information, semantic information, and a persistent identifier assigned to the document block during initialization. Structured location information indicates the specific location of the document block within the document, while semantic information characterizes the specific content of the document block. The persistent identifier assigned during initialization can be a globally unique block ID that the system generates for each document block when the original ancestor version is first structured and parsed. This block ID is then inherited by the corresponding document blocks in each target version document generated after an edit commit, and by the corresponding document blocks in each new original document generated after modifications are agreed upon. Therefore, the persistent identifier does not change when the content or path of the document block is modified.
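The three kinds of block information described above can be pictured as a small record. The following is a minimal sketch only; the class name `BlockInfo` and its field names are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BlockInfo:
    """Hypothetical record for one document block (names are illustrative)."""
    block_id: str  # persistent identifier assigned during initialization
    path: str      # structured location information
    text: str      # semantic information (the block content)


original_block = BlockInfo(
    block_id="blk-023",
    path="/document/Chapter 2/Section 2.3/Paragraph 2",
    text="The system should respond within 100ms",
)

# Editing or moving a block changes its path and text, while the
# persistent identifier is carried over unchanged.
edited_block = replace(
    original_block,
    path="/document/Chapter 4/Section 4.2/Paragraph 1",
    text="The system should respond within 100 ms",
)
```

Because the record is frozen, an edit always produces a new record, which mirrors the idea that each target version document carries its own copy of the block while inheriting the same identifier.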
[0094] S103. Based on the structure tags of each document block in each target version document and the structure tags of each document block in the original document, determine the same document block between each target version document and the original document.
[0095] In some embodiments, the document block structure of each target version document and the original document can be aligned first based on the structure tags of each document block in each target version document and the structure tags of each document block in the original document. This will find the same document block between the target version document and the original document, that is, find which document block in the target version document represents the same document block as a document block in the original document.
[0096] For example: document block a located on page 1, paragraph 2 in target version document 1 and document block b located on page 2, paragraph 3 in the original document belong to the same document block; document block a located on page 2, paragraph 5 in target version document 2 and document block b located on page 2, paragraph 3 in the original document belong to the same document block.
[0097] Only after aligning the document blocks of the target version document with those of the original document can fine-grained text difference comparison be performed between the aligned document blocks. Simultaneously, it allows for precise identification of document block movements and rearrangements, avoiding amplification of conflicts.
[0098] It is worth noting that since both the target version document and the original document contain multiple document blocks, multiple groups of identical document blocks will be found between each target version document and the original document. Each group records the association between the document blocks in the different documents that point to the same document block.
[0099] S104. Based on the same document block in the original document, perform text difference processing on the same document block in each target version document to generate candidate editing and merging information.
[0100] This example illustrates the concept using any single document block aligned between the target version document and the original document.
[0101] After aligning the document blocks, the same document block in the original document can be used as a benchmark. Text differencing is then performed on the same document block in each target version document and its counterpart in the original document to generate a minimum editing script for each target version document. This minimum editing script contains the actual changes made to the same document block in the target version document compared to its counterpart in the original document. Based on the minimum editing scripts for each target version document, candidate edit merging information is generated. This candidate edit merging information is generated by integrating the minimum editing scripts for each target version document after conflict determination.
[0102] S105. Based on the candidate editing and merging information, edit the original document to obtain the target merged document.
[0103] Based on the candidate editing and merging information, a merge review can be conducted to determine the final merging method. The original document is then edited according to that merging method, so that the edits submitted by different users are merged into the original document, resulting in the target merged document. The target merged document then serves as the new original document, that is, the agreed baseline for the next round of editing by different users.
[0104] In summary, the document processing method based on concurrent editing provided in this embodiment includes: obtaining the target version document for each user based on the editing submission requests submitted by different users for the same original document; determining the structural tags of each document block in each target version document and the structural tags of each document block in the original document based on the block information of each document block in each target version document and the block information of each document block in the original document; determining the same document block between each target version document and the original document based on the structural tags of each document block in each target version document and the structural tags of each document block in the original document; performing text difference processing on the same document block in each target version document based on the same document block in the original document to generate candidate editing and merging information; and editing the original document based on the candidate editing and merging information to obtain the target merged document. This method dynamically generates structural tags for each document block in the target version document and the original document based on editing requests submitted by different users to the same original document. These structural tags contain both structural and content information of the document blocks. Based on these tags, document blocks in the target version document and the original document are first aligned, and rearrangement and movement operations are effectively identified, avoiding their impact on the merging result. Secondly, based on the aligned document blocks, internal text difference processing is performed according to the content of the document blocks to effectively determine candidate editing merging information. 
Finally, based on this candidate information, the edits from different users are merged in the original document to generate the target merged document. This concurrent editing and merging process ensures convergence between edits submitted by different users while maintaining the semantic integrity and accuracy of the merging result, thus improving the accuracy of editing and merging results in concurrent editing scenarios.
[0105] Figure 2 is a flowchart of the document processing method based on concurrent editing provided in this application embodiment. As shown in Figure 2, optionally, in step S102, determining the structure tags of each document block in each target version document and the structure tags of each document block in the original document, based on the block information of each document block in each target version document and the block information of each document block in the original document, includes:
[0106] S201. Based on the structured location information of each document block in the target version document and the structured location information of each document block in the original document, determine the structured path information of each document block in the target version document and the structured path information of each document block in the original document.
[0107] The structured location information of a document block refers to its specific position within the document. For example, it might be located on a specific page, in a specific paragraph or line, within a specific chapter, or as a table within a specific page and paragraph. This structured location information allows for precise identification of each document block. Each document block has unique structured location information, thus enabling the unique identification of a single document block.
[0108] Based on the structured location information of each document block in the target version document, structural path information of each document block in the target version document can be generated. Similarly, based on the structured location information of each document block in the original document, structural path information of each document block in the original document can be generated.
[0109] S202. Based on the semantic information of each document block in the target version document and the semantic information of each document block in the original document, determine the semantic summary information of each document block in the target version document and the semantic summary information of each document block in the original document.
[0110] The semantic information of a document block refers to its content, that is, the specific content information contained within the document block. Key information is extracted from the semantic information of a document block, and a semantic summary is generated from the extracted key information, either by arranging it in a fixed order or by hashing it. The semantic summary information therefore indicates the general content of the document block; that is, it shows roughly what the document block is about.
[0111] Semantic summary information of document blocks is used to compare the similarity between document blocks to help verify whether two document blocks are the same document block.
[0112] To adapt to different types of documents, the semantic summary information of document blocks can be made into pluggable components. For simple scenarios, the keywords of the document blocks can be used directly as semantic summary information; for complex scenarios, vector fingerprints can be generated instead of keywords.
[0113] Pluggable means that the system defines only a clear interface, whose input is a document block and whose output is a certain feature or similarity, while the specific implementation can be replaced. For example, semantic summary information can be implemented using keyword signatures or vector fingerprints.
[0114] Vector fingerprinting compresses the semantics of a document block into a fixed-length numerical vector (such as 384-dimensional or 768-dimensional), typically obtained using a Chinese semantic encoding model. When comparing whether two document blocks are synonymous, instead of comparing vocabulary overlap, the cosine similarity of the two vectors is calculated.
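The pluggable comparison described above can be sketched as follows. This is an illustration under assumptions: the interface name `BlockSimilarity`, the word-overlap (Jaccard) score used for the keyword variant, and the `toy_embed` placeholder are all hypothetical. A real deployment would supply an actual semantic encoding model in place of `toy_embed`.

```python
import math
from typing import Callable, Protocol, Sequence


class BlockSimilarity(Protocol):
    """Hypothetical pluggable interface: two block texts in, one score out."""

    def similarity(self, a: str, b: str) -> float: ...


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no direction to compare
    return dot / (norm_u * norm_v)


class KeywordSimilarity:
    """Simple scenario: compare keyword sets (Jaccard overlap is an assumption)."""

    def similarity(self, a: str, b: str) -> float:
        ka, kb = set(a.lower().split()), set(b.lower().split())
        union = ka | kb
        return len(ka & kb) / len(union) if union else 1.0


class VectorSimilarity:
    """Complex scenario: compare fixed-length vector fingerprints."""

    def __init__(self, embed: Callable[[str], Sequence[float]]):
        self.embed = embed  # any text-to-vector function can be plugged in

    def similarity(self, a: str, b: str) -> float:
        return cosine_similarity(self.embed(a), self.embed(b))


def toy_embed(text: str) -> Sequence[float]:
    """Placeholder only: NOT a semantic model, just makes the sketch runnable."""
    return [float(len(text)), float(text.count(" "))]


keyword_checker: BlockSimilarity = KeywordSimilarity()
vector_checker: BlockSimilarity = VectorSimilarity(toy_embed)
```

Either checker can be passed to the alignment step, which only depends on the `similarity` method and not on how the score is produced.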
[0115] S203. Based on the persistent identifiers assigned to document blocks during initialization, determine the global identifiers of each document block in the target version document and the global identifiers of each document block in the original document.
[0116] Based on the persistent identifier of each document block, a corresponding global identifier can be assigned to each document block in the target version document, and a corresponding global identifier can also be assigned to each document block in the original document.
[0117] For example, if the persistent identifier of document block 1 is 'a' and the persistent identifier of document block 2 is 'b', then the global identifier 'a' is assigned to the document blocks in the target version document that point to the same content as document block 1; the global identifier 'b' is assigned to the document blocks in the target version document that point to the same content as document block 2; the global identifier 'a' is assigned to the document blocks in the original document that point to the same content as document block 1, and the global identifier 'b' is assigned to the document blocks in the original document that point to the same content as document block 2. In other words, the persistent identifiers assigned to document blocks during initialization will be carried over and inherited in subsequent user-generated target version documents.
[0118] S204. Based on the structural path information, semantic summary information, and global identifier of each document block in the target version document, determine the structural tags of each document block in the target version document.
[0119] S205. Based on the structural path information, semantic summary information, and global identifier of each document block in the original document, determine the structural tags of each document block in the original document.
[0120] Whether for a target version document or the original document, the structural path information, semantic summary information, and global identifier of any document block can be combined to obtain the structural tag of that document block.
[0121] Of course, in practical applications, the structural tags of document blocks are not limited to the three types of information mentioned above; they may also include hash information that represents the modification type of the document block.
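Combining the three items into one structure tag might look like the sketch below. The names `StructureTag`, `semantic_summary`, and `build_tag` are hypothetical, and the keyword extraction rule (the longest distinct words) is an assumption chosen only to show the "fixed order, then hash" idea.

```python
import hashlib
from typing import NamedTuple


class StructureTag(NamedTuple):
    """Hypothetical structure tag: path + semantic summary + global identifier."""
    path: str
    summary: str
    global_id: str


def semantic_summary(text: str, top_n: int = 5) -> str:
    """Summarize a block as a short hash of its keywords.

    Assumption: the keywords are the top_n longest distinct words. They are
    sorted into a fixed order before hashing, so word order in the block
    does not change the summary.
    """
    words = {w.strip(".,:;\"'()").lower() for w in text.split()}
    words.discard("")
    keywords = sorted(words, key=lambda w: (-len(w), w))[:top_n]
    canonical = " ".join(sorted(keywords))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def build_tag(path: str, text: str, global_id: str) -> StructureTag:
    return StructureTag(path, semantic_summary(text), global_id)


tag = build_tag(
    "/document/Chapter 2/Section 2.3/Paragraph 2",
    "The system should respond within 100ms",
    "blk-023",
)
```

An exact hash only detects identical keyword sets; a vector fingerprint would be needed where a graded similarity is required.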
[0122] Figure 3 is a flowchart of the document processing method based on concurrent editing provided in this application embodiment. As shown in Figure 3, optionally, in step S103, determining the same document block between each target version document and the original document, based on the structure tags of each document block in each target version document and the structure tags of each document block in the original document, includes:
[0123] S301. Based on the structural path information of each document block in each target version document and the structural path information of each document block in the original document, determine whether there are document blocks with the same structural path information in each target version document and the original document.
[0124] In some embodiments, path matching can be prioritized. If path matching fails, matching can be performed based on the global identifier of the document block. Finally, auxiliary verification can be performed based on the semantic summary information of the document block to ensure that the document blocks can be accurately aligned and that special modifications such as movement and rearrangement of the document blocks can be accurately identified.
[0125] Taking document block matching and alignment between two target version documents and the original document as an example, we can search for document blocks with the same structural path information in the three documents respectively.
[0126] S302. If so, then the document blocks with the same structural path information in each target version document and the original document shall be regarded as the same document blocks between each target version document and the original document.
[0127] If there are document blocks with consistent structural path information, then the three document blocks with consistent structural path information in the three documents will be regarded as the same document block between each target version document and the original document.
[0128] For example, if document block a in target version document 1, document block b in target version document 2, and document block c in the original document have the same structural path information, then document blocks a, b, and c are considered the same document block across target version document 1, target version document 2, and the original document. All three document blocks point to the same content.
[0129] S303. If not, then determine whether there are document blocks with the same global identifier in each target version document and the original document, based on the global identifier of each document block in each target version document and the global identifier of each document block in the original document.
[0130] If no document block with the same structural path information can be found among the three documents, the global identifier of each document block can be used to search for a document block with the same global identifier among the three documents.
[0131] S304. If so, then the document block with the same global identifier in each target version document and the original document shall be regarded as the same document block between each target version document and the original document.
[0132] If a document block with a globally consistent identifier can be found, then the three document blocks with the same globally consistent identifier in the three documents can be used as the same document block between each target version document and the original document.
[0133] Taking original document O, target version document L, and target version document R as an example, the process begins with a first round of matching based on the structural path information of the document blocks. For documents O, L, and R, document blocks with the same path are considered as the same document block. Document blocks that do not match in the first round are then matched based on their global identifiers, with document blocks sharing the same global identifier being considered as the same document block. For document blocks whose global identifiers match successfully but whose structural paths differ, it can be considered that the document blocks have been moved or rearranged.
[0134] Optionally, the method further includes: verifying the same document block between the determined target version documents and the original document based on the semantic summary information of the same document block in each target version document and the semantic summary information of the same document block in the original document.
[0135] In some embodiments, the same document block identified based on the structural path information and global identifier of the document block can be further verified by judging the semantic similarity of the document blocks based on the semantic summary information of the same document block, and finally determine whether they really belong to the same document block.
[0136] For example, based on the structural path information and global identifier of document blocks, if it is determined that document block a in the target version document L and document block c in the original document are the same document block, and document block b in the target version document R and document block c in the original document are the same document block, then the semantic similarity between document block a and document block c can be calculated based on the semantic summary information of document block a and document block c to verify whether document block a and document block c truly belong to the same document block. Similarly, the semantic similarity between document block b and document block c can be calculated based on the semantic summary information of document block b and document block c to verify whether document block b and document block c truly belong to the same document block.
[0137] Optionally, the method further includes: if each target version document and the original document contain target document blocks with the same global identifier but different structural path information, then a target tag for the target document block is generated, and the target tag is used to indicate that the target document block has undergone move or rearrangement editing.
[0138] In some embodiments, if a document block in the target version document has the same global identifier as a document block in the original document, but the structural path information is different, it can be assumed that the user has moved or rearranged the document block in the original document, and thus the document block can be marked in the target version document.
[0139] To illustrate with a specific example:
[0140] Original document O: In Section 2.3 "Performance Metrics" of the original document, there is a paragraph whose corresponding document block has the global identifier blk-023 in its structure tag. Its structural path information is /document/Chapter 2/Section 2.3/Paragraph 2, and its content is "The system should respond within 100ms".
[0141] Target version document L: The user moved the "Performance Metrics" section into Section 4.2 "Technical Requirements", thus changing the document block's structural path information to /document/Chapter 4/Section 4.2/Paragraph 1. However, the global identifier of the document block remains blk-023, and the content has only been slightly modified.
[0142] Target version document R: The user did not move this paragraph, but changed "100ms" to "150ms" in the original position.
[0143] First, locate the document block with the global identifier blk-023 in the original document O. Then, search for a document block with the same global identifier in the target version document L. If found, but the structural path information of this document block in the target version document L differs from that in the original document O, mark this document block in the target version document L as moved or rearranged. This document block may not have undergone any content modification, only movement or rearrangement. If traditional character difference comparison is used, this document block will be misidentified as "deletion + addition," obviously leading to a merge error.
[0144] If a document block with the same structural path information as the original document O is found in the target version document R, and the global identifier is also the same, then the target version document R is considered to have made an in-situ modification to the document block, that is, a content modification, and can be merged later through text difference processing within the document block.
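The two-round matching and the move marking described above can be sketched with the O / L / R example. This is a minimal illustration, not the claimed implementation: `Block`, `Match`, and `align` are hypothetical names, and the semantic verification step is omitted, so a different block that now occupies an old path would need that extra check.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    global_id: str
    path: str
    text: str


class Match(NamedTuple):
    base: Block    # block in the original document
    target: Block  # same block in a target version document
    moved: bool    # True when matched by identifier at a different path


def align(base: List[Block], target: List[Block]) -> List[Match]:
    target_by_path = {t.path: t for t in target}
    target_by_id = {t.global_id: t for t in target}
    matches: List[Match] = []
    matched_ids = set()
    leftovers: List[Block] = []

    # Round 1: blocks with the same structural path are the same block.
    for b in base:
        t = target_by_path.get(b.path)
        if t is not None:
            matches.append(Match(b, t, moved=False))
            matched_ids.add(t.global_id)
        else:
            leftovers.append(b)

    # Round 2: remaining blocks are matched by global identifier. Their
    # paths necessarily differ, so they are marked as moved or rearranged.
    for b in leftovers:
        t = target_by_id.get(b.global_id)
        if t is not None and t.global_id not in matched_ids:
            matches.append(Match(b, t, moved=True))
            matched_ids.add(t.global_id)
    return matches


doc_o = [Block("blk-023", "/document/Chapter 2/Section 2.3/Paragraph 2",
               "The system should respond within 100ms")]
doc_l = [Block("blk-023", "/document/Chapter 4/Section 4.2/Paragraph 1",
               "The system should respond within 100ms")]
doc_r = [Block("blk-023", "/document/Chapter 2/Section 2.3/Paragraph 2",
               "The system should respond within 150ms")]

match_l = align(doc_o, doc_l)[0]  # found in round 2: moved
match_r = align(doc_o, doc_r)[0]  # found in round 1: edited in place
```

With the move recorded on the match, the later text difference step compares the block with its counterpart instead of reporting a deletion in Chapter 2 and an addition in Chapter 4.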
[0145] In addition, in some embodiments, the structure tag of each document block may also contain content hash information. After identifying the same document block between the target version document and the original document, the content hash information of the document block in the target version document can be compared with that of the same document block in the original document to determine whether the content or only the format of the document block was modified. If it is determined to be a format modification, subsequent text difference processing within the document block can be skipped. For example, if only spaces / line breaks in the document block are changed, or the number of characters changed is less than a preset threshold N, it can be considered a format change, and content comparison can be skipped.
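One way to picture the content hash check is to hash the block text with all whitespace removed, so that changes to spaces and line breaks leave the hash unchanged. The sketch below assumes that normalization rule and uses hypothetical function names; the threshold-N variant is not shown.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash of the block text with all whitespace removed (an assumption)."""
    normalized = "".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_format_only_change(base_text: str, target_text: str) -> bool:
    """True when the texts differ but only in spaces or line breaks."""
    return (base_text != target_text
            and content_hash(base_text) == content_hash(target_text))


reflowed = is_format_only_change(
    "The system should respond\nwithin 100ms",
    "The system should respond within 100ms",
)
edited = is_format_only_change(
    "The system should respond within 100ms",
    "The system should respond within 150ms",
)
```

Blocks for which the check returns True can bypass the character-level comparison entirely.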
[0146] Figure 4 is a flowchart of the document processing method based on concurrent editing provided in this application embodiment. As shown in Figure 4, optionally, in step S104, performing text difference processing on the same document block in each target version document, based on the same document block in the original document, to generate candidate edit merge information includes:
[0147] S401. Perform character-level difference processing on the text content of the same document block in each target version document and the text content of the same document block in the original document to determine the editing information corresponding to each target version document.
[0148] Editing information includes the edits performed on the same document block in the target version of the document.
[0149] Optionally, character-level difference processing can be performed on the same document blocks that have been successfully matched. Continuing with the above example, the text content difference processing can be performed on the same document blocks that are paired between the target version document L and the original document O to determine the changes made by the document block in the target version document L compared with the document block in the original document O, thereby generating the editing information corresponding to the target version document L. That is, by comparing the specific content of the two document blocks, it can be determined what modifications have been made to the document block in the target version document L.
[0150] Similarly, the text content of the same document block paired between the target version document R and the original document O is differentially processed to determine the changes made by the document block in the target version document R compared with the document block in the original document O, thereby generating the editing information corresponding to the target version document R.
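Character-level difference processing between a matched pair of blocks can be sketched with Python's standard `difflib.SequenceMatcher`. This is a stand-in for illustration: the application does not name a diff algorithm, `difflib` does not guarantee a minimum edit script, and the `Edit` record and helper names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple


class Edit(NamedTuple):
    op: str        # "replace", "delete", or "insert"
    start: int     # start offset in the original block text
    end: int       # end offset in the original block text (exclusive)
    new_text: str  # text that takes the place of original[start:end]


def char_diff(base: str, target: str) -> List[Edit]:
    """Editing information of `target` relative to `base`, per character."""
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    return [
        Edit(tag, i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_edits(base: str, edits: List[Edit]) -> str:
    """Replay edits from right to left so earlier offsets stay valid."""
    out = base
    for e in sorted(edits, key=lambda e: e.start, reverse=True):
        out = out[:e.start] + e.new_text + out[e.end:]
    return out


text_o = "The system should respond within 100ms"
text_r = "The system should respond within 150ms"
edits_r = char_diff(text_o, text_r)
```

Because every edit is expressed as offsets into the original block text, the editing information from different target version documents shares one coordinate system, which is what the conflict check needs.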
[0151] S402. Based on the editing information corresponding to each target version document, generate conflict marker information for the same document block in each target version document.
[0152] The conflict marker information includes first marker information or second marker information; the first marker information is used to indicate that there is no conflict in editing the same document block, and the second marker information is used to indicate that there is a conflict in editing the same document block.
[0153] Based on the editing information corresponding to each target version document, modification conflicts in different target version documents can be determined and conflict marker information can be generated.
[0154] In some embodiments, if the modifications made to the same document block in the target version document L and the target version document R do not overlap or can be applied sequentially, it is determined that the two versions do not conflict in editing the same document block, and the first marker information is generated.
[0155] In other embodiments, if a user of either target version document L or target version document R modifies the structural elements of the same document block, such as changing the number of table columns or changing required fields, or if the two users make contradictory modifications to the text content of the same document block, it is determined that their edits to the same document block conflict, and the second marker information is generated.
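For textual edits, the overlap test can be sketched as an interval check over offsets in the original block text. The sketch rests on assumptions: the marker values are placeholder strings, touching ranges are conservatively treated as overlapping, identical edits on both sides are treated as non-conflicting, and structural conflicts (such as table column changes) are not covered.

```python
from typing import List, Tuple

# (start, end, new_text) measured against the original block text
Edit = Tuple[int, int, str]

FIRST_MARKER = "no_conflict"  # placeholder for the first marker information
SECOND_MARKER = "conflict"    # placeholder for the second marker information


def ranges_touch(a: Edit, b: Edit) -> bool:
    """Conservative check: ranges that overlap or share a boundary."""
    return a[0] <= b[1] and b[0] <= a[1]


def conflict_marker(edits_l: List[Edit], edits_r: List[Edit]) -> str:
    for a in edits_l:
        for b in edits_r:
            # The same change made by both users is not a contradiction.
            if a != b and ranges_touch(a, b):
                return SECOND_MARKER
    return FIRST_MARKER


# Original text: "The system should respond within 100ms"
reword_start = [(0, 3, "This")]  # L rewrites the first word
latency_150 = [(34, 35, "5")]    # R changes 100ms to 150ms
latency_200 = [(33, 34, "2")]    # L changes 100ms to 200ms

independent = conflict_marker(reword_start, latency_150)
contradictory = conflict_marker(latency_200, latency_150)
```

Treating touching ranges as conflicts errs on the side of sending more cases to review, which fits a semi-automatic merging flow.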
[0156] S403. Generate candidate edit merge information based on the editing information and conflict marker information corresponding to each target version document.
[0157] Based on the editing information corresponding to each target version document and the conflict marker information for the same document block in each target version document, the candidate edit merge information can be generated automatically.
[0158] Additionally, a list of items requiring manual confirmation can be generated, which may include: which version of the terminology definition to use, whether to update a certain reference, and other items awaiting confirmation.
[0159] Optionally, in step S403, candidate edit merge information is generated based on the editing information and conflict marker information corresponding to each target version document, including: if the conflict marker information is the first marker information, then the editing information corresponding to each target version document is concatenated to generate candidate edit merge information.
[0160] In some embodiments, if the conflict marker information for the same document block is the first marker information, that is, if the modifications made to the same document block by the target version document L and the target version document R do not conflict, then the modifications made to the same document block in the two versions of the document can be merged and recorded in the candidate edit merge information. In this way, when merging in a subsequent process, the modifications of both versions can be retained simultaneously.
[0161] If the conflict marker information is the second marker information, then the editing information is classified according to the editing information corresponding to each target version document, and candidate editing merge information is generated based on the classification results.
[0162] In other embodiments, if the conflict marker information for the same document block is the second marker information, that is, if there is a conflict between the modifications made to the same document block by the target version document L and the target version document R, two sub-versions can be copied separately, one carrying the changes made by the target version document L to the document block and the other carrying the changes made by the target version document R to the document block. The system can automatically merge the two sub-versions according to pre-set executable rules, such as: definition class statements take precedence over examples, required fields cannot be deleted, cross-references must be consistent, etc., and record them in the candidate edit merge information.
[0163] It should be noted that the candidate edit merge information is not the result of merging the editing information from the two target version documents. Rather, it is the editing information from the two target version documents, organized for display. The editing information from both target version documents is retained in the candidate edit merge information. For modifications that do not conflict, the changes from both versions are simply placed together. For modifications that conflict, the changes from both versions are displayed by category: structural changes are displayed together, and textual changes are displayed together. The final merge is confirmed manually based on the candidate edit merge information.
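The organization of candidate edit merge information described above can be sketched as follows. The record layout, the `EditInfo` fields, and the dictionary keys are hypothetical; the point is only that nothing is discarded, and that conflicting edits are grouped by category for review.

```python
from typing import Dict, List, NamedTuple


class EditInfo(NamedTuple):
    source: str       # which target version document made the edit
    kind: str         # "structural" or "textual"
    description: str  # human-readable summary of the change


def build_candidate(block_id: str,
                    edits_l: List[EditInfo],
                    edits_r: List[EditInfo],
                    conflict: bool) -> Dict[str, object]:
    all_edits = list(edits_l) + list(edits_r)
    if not conflict:
        # First marker: the edits of both versions are simply placed together.
        return {"block": block_id, "conflict": False, "edits": all_edits}
    # Second marker: keep every edit, grouped by category for manual review.
    grouped: Dict[str, List[EditInfo]] = {"structural": [], "textual": []}
    for e in all_edits:
        grouped[e.kind].append(e)
    return {"block": block_id, "conflict": True, "edits_by_kind": grouped}


edits_from_l = [EditInfo("L", "structural", "table columns 3 -> 4")]
edits_from_r = [EditInfo("R", "textual", "100ms -> 150ms")]

clean = build_candidate("blk-023", edits_from_l, edits_from_r, conflict=False)
disputed = build_candidate("blk-023", edits_from_l, edits_from_r, conflict=True)
```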
[0164] Optionally, reviewers can select or add explanations to the list of items requiring manual confirmation. Based on this, the system generates the final merged draft: replacing the content of the target document blocks, merging sub-versions if necessary, and automatically updating the table of contents, numbering, cross-references, and table indexes to ensure consistency within the document.
[0165] Figure 5 is a flowchart of the document processing method based on concurrent editing provided in this application embodiment. As shown in Figure 5, optionally, after the target merged document is obtained in step S105, the method further includes:
[0166] S501. Generate a patch file based on the target merged document and the original document.
[0167] The patch file should include at least: the modified document block and the modified content.
[0168] In some embodiments, based on the final generated target merged document, the target merged document can be compared with the original document to export the final modifications as a structure path-addressed patch file. The patch file can explicitly indicate which document block was modified and what specific changes were made within that document block.
[0169] S502. Generate a merged ledger, which includes at least: the structure path information of the document blocks in each target version document of each user who submitted the edit request, the edit content of the document blocks for which the edit was performed, conflict information, and the merge strategy used for conflicting edits.
[0170] A merged ledger can also be generated at the same time. It can record the modification time, the participants (i.e., the users who participated in the modification), the paths of the document blocks involved, the original changes made to the document blocks by different users, the conflict type, the conflict handling strategy applied, and the final merging method selected when generating the target merged document, so as to facilitate auditing and playback.
[0171] The patch file can be replayed in other environments using the same path to obtain consistent results.
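A replay step of this kind might look like the sketch below. It assumes a patch is a list of (structure path, expected old text, new text) tuples and a document is a path-to-text mapping; the `replay` function and the precondition check on the old text are assumptions added for the example, not details given in this application.

```python
from typing import Dict, List, Optional, Tuple

# One patch entry: (structure path, expected old text or None, new text or None).
Patch = List[Tuple[str, Optional[str], Optional[str]]]


def replay(document: Dict[str, str], patch: Patch) -> Dict[str, str]:
    """Apply a path-addressed patch to a copy of the document."""
    result = dict(document)
    for path, old, new in patch:
        # Assumed safeguard: refuse to apply when the block is not in the expected state.
        if result.get(path) != old:
            raise ValueError(f"block at {path} does not match the expected content")
        if new is None:
            result.pop(path, None)  # the block was removed
        else:
            result[path] = new      # the block was added or replaced
    return result


patch: Patch = [
    ("/ch1/p1", "Budget is 10k.", "Budget is 12k."),
    ("/ch1/p3", None, "Owner: QA."),
]
doc = {"/ch1/p1": "Budget is 10k.", "/ch1/p2": "Deadline is May."}
replayed = replay(doc, patch)
```

Because every entry is addressed by path and checked against the expected old content, replaying the same patch on an identical copy of the original document yields the same result, and replaying it on a diverged copy fails loudly instead of silently producing a different document.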
[0172] The above process starts with structured block partitioning and structure tag alignment. It then proceeds through intra-block difference processing of document blocks, conflict determination against the original document, sub-version merging, structured patching, and writing the merged ledger to the storage system in a persistent, verifiable, and reusable manner. Together, these steps form a complete implementation path for concurrent editing and merging, from the acquisition of the target version documents and the original document to the generation of replayable results. The entire process does not change the way the existing editor and repository (in version control systems, the on-disk data structure that holds files, directories, and metadata) are used. It only adds structure-level alignment and recording before and after merging, and therefore has good scalability.
[0173] In summary, the document processing method based on concurrent editing provided in this embodiment includes: obtaining the target version document for each user based on the editing submission requests submitted by different users for the same original document; determining the structural tags of each document block in each target version document and the structural tags of each document block in the original document based on the block information of each document block in each target version document and the block information of each document block in the original document; determining the same document block between each target version document and the original document based on the structural tags of each document block in each target version document and the structural tags of each document block in the original document; performing text difference processing on the same document block in each target version document based on the same document block in the original document to generate candidate editing and merging information; and editing the original document based on the candidate editing and merging information to obtain the target merged document. This method dynamically generates structural tags for each document block in the target version document and the original document based on editing requests submitted by different users to the same original document. These structural tags contain both structural and content information of the document blocks. Based on these tags, document blocks in the target version document and the original document are first aligned, and rearrangement and movement operations are effectively identified, avoiding their impact on the merging result. Secondly, based on the aligned document blocks, internal text difference processing is performed according to the content of the document blocks to effectively determine candidate editing merging information. 
Finally, based on this candidate information, the edits from different users are merged in the original document to generate the target merged document. This concurrent editing and merging process ensures convergence between edits submitted by different users while maintaining the semantic integrity and accuracy of the merging result, thus improving the accuracy of editing and merging results in concurrent editing scenarios.
[0174] In addition, because a patch file and a merged ledger are generated, the merge is easy to replay and verify in the same environment, which is convenient for automatic execution in a pipeline and also meets auditing and accountability needs.
[0175] The following describes the apparatus, device, and storage medium used to implement the concurrent editing-based document processing method provided in this application. The specific implementation process and technical effects are described above and will not be repeated below.
[0176] Figure 6 is a schematic diagram of a document processing device based on concurrent editing provided in an embodiment of this application. The functions implemented by this device correspond to the steps executed by the method described above. This device can be understood as the aforementioned server, or the server's processor, or as a component that is independent of the aforementioned server or processor and implements the functions of this application under the control of the server. As shown in Figure 6, the device may include: an acquisition module 100, a determination module 200, a generation module 300, and a processing module 400;
[0177] The acquisition module 100 is used to acquire the target version document for each user based on the editing submission requests submitted by different users for the same original document;
[0178] The determination module 200 is used to determine the structure tags of each document block in each target version document and the structure tags of each document block in the original document based on the block information of each document block in each target version document and the block information of each document block in the original document, wherein the block information includes: structured location information, semantic information and persistent identifiers assigned to the document blocks during initialization.
[0179] The determination module 200 is used to determine the same document block between each target version document and the original document based on the structure tags of each document block in each target version document and the structure tags of each document block in the original document;
[0180] The generation module 300 is used to perform text difference processing on the same document block in each target version document based on the same document block in the original document, and generate candidate editing and merging information;
[0181] The processing module 400 is used to edit the original document based on the candidate editing and merging information to obtain the target merged document.
[0182] Optionally, the determination module 200 is specifically used to determine the structural path information of each document block in the target version document and the structural path information of each document block in the original document based on the structured location information of each document block in the target version document and the structured location information of each document block in the original document, respectively.
[0183] Based on the semantic information of each document block in the target version document and the semantic information of each document block in the original document, the semantic summary information of each document block in the target version document and the semantic summary information of each document block in the original document are determined respectively.
[0184] Based on the persistent identifiers assigned to document blocks during initialization, the global identifiers of each document block in the target version document and the global identifiers of each document block in the original document are determined respectively.
[0185] Based on the structural path information, semantic summary information and global identifier of each document block in the target version document, determine the structural tags of each document block in the target version document;
[0186] Based on the structural path information, semantic summary information, and global identifier of each document block in the original document, determine the structural tags of each document block in the original document.
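The composition of a structural tag from these three parts can be sketched as follows. The concrete encodings are assumptions made for illustration: the structure path is rendered from a list of positional indices, the semantic summary is approximated by a digest of the normalized block text, and the global identifier is formed by prefixing the persistent identifier with a document-family identifier shared by all versions. `StructureTag`, `build_tag`, and `doc_family` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StructureTag:
    path: str       # structural path information
    summary: str    # semantic summary information
    global_id: str  # global identifier


def build_tag(location: Sequence[int], text: str,
              persistent_id: str, doc_family: str) -> StructureTag:
    """Combine location, content, and identity into one tag for a document block."""
    # Structured location [2, 1, 3] becomes the path "/2/1/3".
    path = "/" + "/".join(str(i) for i in location)
    # Assumed summary: a short digest of case- and whitespace-normalized text.
    normalized = " ".join(text.lower().split())
    summary = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    # Assumed global identifier: stable across every version of the same document.
    global_id = f"{doc_family}:{persistent_id}"
    return StructureTag(path, summary, global_id)


tag_original = build_tag([2, 1, 3], "Budget  is 10k.", "blk-7", "doc-42")
tag_target = build_tag([2, 1, 4], "budget is 10K.", "blk-7", "doc-42")
```

In this example the block moved from `/2/1/3` to `/2/1/4`, so the paths differ while the global identifier and the summary still agree. A digest only detects identical normalized text; a summary that tolerates small edits would need a similarity-preserving representation instead.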
[0187] Optionally, the determination module 200 is specifically used to determine whether there are document blocks with the same structural path information in each target version document and the original document, based on the structural path information of each document block in each target version document and the structural path information of each document block in the original document.
[0188] If so, then the document blocks with the same structural path information in each target version document and the original document shall be regarded as the same document blocks between each target version document and the original document;
[0189] If not, then based on the global identifiers of each document block in each target version document and the global identifiers of each document block in the original document, determine whether there are document blocks with the same global identifier in each target version document and the original document.
[0190] If so, then the document block with the same global identifier in each target version document and the original document will be regarded as the same document block between each target version document and the original document.
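The two-stage matching described above (structure path first, global identifier as the fallback) can be sketched as follows. The `Block` type and the `align` function are hypothetical, and documents are assumed to be flat lists of blocks.

```python
from typing import Dict, List, NamedTuple, Optional


class Block(NamedTuple):
    path: str       # structural path information
    global_id: str  # global identifier
    text: str


def align(original: List[Block], target: List[Block]) -> Dict[str, Optional[str]]:
    """Map each target block's path to the path of the same block in the original."""
    by_path = {b.path: b for b in original}
    by_gid = {b.global_id: b for b in original}
    matches: Dict[str, Optional[str]] = {}
    for blk in target:
        match = by_path.get(blk.path)             # stage 1: same structure path
        if match is None:
            match = by_gid.get(blk.global_id)     # stage 2: same global identifier
        matches[blk.path] = match.path if match is not None else None
    return matches


original = [Block("/1", "g1", "Intro"), Block("/2", "g2", "Scope")]
target = [
    Block("/1", "g1", "Intro, revised"),
    Block("/3", "g2", "Scope"),   # moved: path changed, identifier kept
    Block("/4", "g9", "New"),     # newly added block
]
alignment = align(original, target)
```

A path hit alone does not prove that two blocks are the same, because after a reordering a different block may occupy the old path. The semantic-summary verification described in the following paragraphs addresses that case.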
[0191] Optionally, it may also include a verification module;
[0192] The verification module is used to verify the same document blocks between the determined target version documents and the original document based on the semantic summary information of the same document block in each target version document and the semantic summary information of the same document block in the original document.
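A verification check of this kind might be sketched as below. The application does not fix how semantic summaries are compared; the example assumes the summary is a set of lowercase word tokens and accepts a match when the Jaccard overlap reaches a hypothetical threshold of 0.5. `summarize` and `verify_match` are illustrative names.

```python
import re
from typing import FrozenSet


def summarize(text: str) -> FrozenSet[str]:
    """Assumed semantic summary: the set of lowercase word tokens in the block."""
    return frozenset(re.findall(r"\w+", text.lower()))


def verify_match(original_text: str, target_text: str, threshold: float = 0.5) -> bool:
    """Accept an aligned pair only if the two summaries overlap enough (Jaccard)."""
    a, b = summarize(original_text), summarize(target_text)
    if not a and not b:
        return True  # two empty blocks are trivially the same
    return len(a & b) / len(a | b) >= threshold


same_block = verify_match("The budget is 10k for Q1", "The budget is 12k for Q1")
different_block = verify_match("The budget is 10k", "Deadline moved to May")
```

Here the first pair shares five of seven distinct tokens and passes, while the second pair shares none and is rejected, so a pair that was aligned only because it occupies the same path can be filtered out.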
[0193] Optionally, the generation module 300 is further configured to generate a target marker for the target document block if there are target document blocks in each target version document and the original document that have the same global identifier but different structural path information. The target marker is used to indicate that the target document block has undergone move or rearrangement editing.
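The marker generation above can be sketched as follows, assuming each document is summarized as a mapping from global identifier to structure path. `MoveMarker` and `detect_moves` are hypothetical names used only for the example.

```python
from typing import Dict, List, NamedTuple


class MoveMarker(NamedTuple):
    global_id: str
    old_path: str  # structure path in the original document
    new_path: str  # structure path in the target version document


def detect_moves(original_paths: Dict[str, str],
                 target_paths: Dict[str, str]) -> List[MoveMarker]:
    """Mark every block whose identifier is unchanged but whose path differs."""
    markers: List[MoveMarker] = []
    for gid, old_path in original_paths.items():
        new_path = target_paths.get(gid)
        if new_path is not None and new_path != old_path:
            markers.append(MoveMarker(gid, old_path, new_path))
    return markers


original_paths = {"g1": "/1/1", "g2": "/1/2", "g3": "/2/1"}
target_paths = {"g1": "/1/1", "g2": "/2/2", "g4": "/1/2"}
moves = detect_moves(original_paths, target_paths)
```

Only `g2` is marked: `g1` kept its path, `g3` no longer exists in the target version, and `g4` is a new block, so none of those is a move or rearrangement.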
[0194] Optionally, the generation module 300 is specifically used to perform character-level difference processing on the text content of the same document block in each target version document and the text content of the same document block in the original document to determine the editing information corresponding to each target version document; the editing information includes the editing content performed on the same document block in the target version document;
[0195] Based on the editing information corresponding to each target version document, conflict marker information for the same document block is generated for each target version document; the conflict marker information includes first marker information or second marker information; the first marker information is used to indicate that there is no conflict in the editing of the same document block, and the second marker information is used to indicate that there is a conflict in the editing of the same document block;
[0196] Based on the editing information and conflict marker information corresponding to each target version document, candidate editing and merging information is generated.
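The character-level difference processing and conflict marking can be sketched with Python's standard `difflib` module. The choice of `difflib`, the edit representation (a span in the original text plus replacement text), and the overlap rule used to decide a conflict are all assumptions for illustration; the marker values simply mirror the first and second marker information.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Edit = Tuple[int, int, str]  # (start, end) span in the original text, replacement text

NO_CONFLICT = "first_marker"  # editing of the block does not conflict
CONFLICT = "second_marker"    # editing of the block conflicts


def char_edits(base: str, version: str) -> List[Edit]:
    """Character-level diff of one user's block text against the original block."""
    matcher = SequenceMatcher(None, base, version, autojunk=False)
    return [
        (i1, i2, version[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def conflict_marker(edits_a: List[Edit], edits_b: List[Edit]) -> str:
    """Assumed rule: different edits that touch the same span of the original conflict."""
    for s1, e1, r1 in edits_a:
        for s2, e2, r2 in edits_b:
            touching = s1 <= e2 and s2 <= e1
            if touching and (s1, e1, r1) != (s2, e2, r2):
                return CONFLICT
    return NO_CONFLICT


base = "The budget is 10k and the deadline is May."
user_a = "The budget is 12k and the deadline is May."
user_b = "The budget is 10k and the deadline is June."
user_c = "The budget is 15k and the deadline is May."

edits_a = char_edits(base, user_a)
edits_b = char_edits(base, user_b)
edits_c = char_edits(base, user_c)
```

Users A and B change different parts of the block, so their edits receive the no-conflict marker, while A and C rewrite the same character differently and receive the conflict marker. Treating adjacent spans as touching is a conservative choice; a looser rule would flag fewer blocks for review.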
[0197] Optionally, the generation module 300 is specifically used to concatenate the editing information corresponding to each target version document if the conflict marking information is the first marking information, and generate candidate editing merge information;
[0198] If the conflict marker information is the second marker information, then the editing information is classified according to the editing information corresponding to each target version document, and candidate editing merge information is generated based on the classification results.
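The two branches above can be sketched as follows. Edits are assumed to be (start, end, replacement) spans in the original block text. The classification into "insert", "delete", and "replace" is a hypothetical scheme chosen for the example, since the categories are not enumerated here; `candidate_merge` and `apply_edits` are likewise illustrative names.

```python
from typing import Dict, List, Tuple

Edit = Tuple[int, int, str]  # (start, end) span in the original text, replacement text


def apply_edits(base: str, edits: List[Edit]) -> str:
    """Apply non-overlapping edits from right to left so earlier offsets stay valid."""
    text = base
    for start, end, replacement in sorted(edits, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


def candidate_merge(conflict: bool, edits_by_user: Dict[str, List[Edit]]) -> dict:
    """No conflict: concatenate all edits. Conflict: group edits by kind for review."""
    if not conflict:
        combined = sorted({e for edits in edits_by_user.values() for e in edits})
        return {"status": "auto", "edits": combined}
    groups: Dict[str, List[Tuple[str, Edit]]] = {}
    for user, edits in edits_by_user.items():
        for start, end, replacement in edits:
            if start == end:
                kind = "insert"
            elif replacement == "":
                kind = "delete"
            else:
                kind = "replace"
            groups.setdefault(kind, []).append((user, (start, end, replacement)))
    return {"status": "needs_review", "groups": groups}


base = "The budget is 10k and the deadline is May."
clean = candidate_merge(False, {"a": [(15, 16, "2")], "b": [(38, 41, "June")]})
merged_text = apply_edits(base, clean["edits"])
disputed = candidate_merge(True, {"a": [(15, 16, "2")], "c": [(15, 16, "")]})
```

In the conflict-free case the combined edits can be applied directly, which here produces a block containing both users' changes. In the conflict case nothing is applied; the grouped edits are kept as candidates for a strategy or a reviewer to choose from.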
[0199] Optionally, the generation module 300 is also used to generate a patch file based on the target merged document and the original document, wherein the patch file includes at least: the modified document block and the modified content;
[0200] Generate a merged ledger, which includes at least: the structure path information of the document blocks in each target version document of each user who submitted the edit request, the edit content of the document blocks for which the edit was performed, conflict information, and the merge strategy used for the conflicted edits.
[0201] The above-described device is used to execute the method provided in the foregoing embodiments, and its implementation principle and technical effect are similar, so they will not be described again here.
[0202] These modules can be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). Alternatively, when a module is implemented as program code scheduled by a processing element, the processing element can be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. Furthermore, these modules can be integrated together and implemented as a system-on-a-chip (SoC).
[0203] The modules described above can be connected or communicate with each other via wired or wireless connections. Wired connections may include metal cables, optical fibers, hybrid cables, or any combination thereof. Wireless connections may include connections via LAN, WAN, Bluetooth, ZigBee, or NFC, or any combination thereof. Two or more units can be combined into a single module, and any module can be divided into two or more units. Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems and devices described above correspond to the processes in the method embodiments and are not repeated here.
[0204] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. The device may be a computing device with data processing capabilities.
[0205] The device includes a processor 801, a storage medium 802, and a bus 803. The storage medium 802 stores program instructions that can be executed by the processor 801. When the electronic device is running, the processor 801 communicates with the storage medium 802 through the bus 803. The processor 801 executes the program instructions to implement the document processing method based on concurrent editing as described in the embodiment.
[0206] The storage medium 802 stores program code, which, when executed by the processor 801, causes the processor 801 to perform various steps in the concurrent editing-based document processing method according to various exemplary embodiments of this application as described in the "Exemplary Methods" section above.
[0207] The processor 801 can be a general-purpose processor, such as a central processing unit (CPU), digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, capable of implementing or executing the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly manifested as being executed by a hardware processor, or executed by a combination of hardware and software modules within the processor.
[0208] Storage medium 802, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The storage medium can include at least one type of storage medium, such as flash memory, hard disk, multimedia card, card-type storage medium, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic storage medium, magnetic disk, optical disk, etc. The storage medium may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. In the embodiments of this application, storage medium 802 can also be a circuit or any other device capable of implementing storage functions for storing program instructions and / or data.
[0209] Optionally, this application also provides a program product, such as a computer-readable storage medium, including a program that, when executed by a processor, performs the above-described method embodiments.
[0210] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and other division methods are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
[0211] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0212] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or in a combination of hardware and software functional units.
[0213] The integrated units implemented as software functional units described above can be stored in a computer-readable storage medium. These software functional units, stored in a storage medium, include several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute some steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
Claims
1. A document processing method based on concurrent editing, characterized in that the method includes: based on the edit submission requests submitted by different users for the same original document, obtaining the target version document for each user separately; based on the block information of each document block in each target version document and the block information of each document block in the original document, determining the structural tags of each document block in each target version document and the structural tags of each document block in the original document respectively, wherein the block information includes: structured location information, semantic information, and persistent identifiers assigned to the document blocks during initialization; based on the structural tags of each document block in each target version document and the structural tags of each document block in the original document, determining the same document block between each target version document and the original document; based on the same document block in the original document, performing text difference processing on the same document block in each of the target version documents to generate candidate editing and merging information; and based on the candidate editing and merging information, editing the original document to obtain the target merged document; wherein the step of performing text difference processing on the same document block in each of the target version documents based on the same document block in the original document to generate candidate editing and merging information includes: performing character-level difference processing on the text content of the same document block in each of the target version documents and the text content of the same document block in the original document to determine the editing information corresponding to each of the target version documents, wherein the editing information includes the editing content performed on the same document block in the target version document; based on the editing information corresponding to each target version document, generating conflict marker information for the same document block for each target version document, wherein the conflict marker information includes first marker information or second marker information, the first marker information is used to indicate that there is no conflict in the editing of the same document block, and the second marker information is used to indicate that there is a conflict in the editing of the same document block; and based on the editing information corresponding to each target version document and the conflict marker information, generating the candidate editing and merging information.
2. The method according to claim 1, characterized in that the step of determining the structural tags of each document block in each target version document and the structural tags of each document block in the original document based on the block information of each document block in each target version document and the block information of each document block in the original document includes: based on the structured location information of each document block in the target version document and the structured location information of each document block in the original document, determining the structural path information of each document block in the target version document and the structural path information of each document block in the original document respectively; based on the semantic information of each document block in the target version document and the semantic information of each document block in the original document, determining the semantic summary information of each document block in the target version document and the semantic summary information of each document block in the original document respectively; based on the persistent identifier assigned to the document block during initialization, determining the global identifier of each document block in the target version document and the global identifier of each document block in the original document respectively; based on the structural path information, semantic summary information, and global identifier of each document block in the target version document, determining the structural tags of each document block in the target version document; and based on the structural path information, semantic summary information, and global identifier of each document block in the original document, determining the structural tags of each document block in the original document.
3. The method according to claim 2, characterized in that the step of determining the same document block between each target version document and the original document based on the structural tags of each document block in each target version document and the structural tags of each document block in the original document includes: based on the structural path information of each document block in each target version document and the structural path information of each document block in the original document, determining whether there are document blocks with the same structural path information in each target version document and the original document; if so, regarding the document blocks with the same structural path information in each target version document and the original document as the same document block between each target version document and the original document; if not, based on the global identifiers of each document block in each target version document and the global identifiers of each document block in the original document, determining whether there are document blocks with the same global identifier in each target version document and the original document; and if so, regarding the document block with the same global identifier in each target version document and the original document as the same document block between each target version document and the original document.
4. The method according to claim 2, characterized in that the method further includes: based on the semantic summary information of the same document block in each of the target version documents and the semantic summary information of the same document block in the original document, verifying the determined same document block between each target version document and the original document.
5. The method according to claim 3, characterized in that the method further includes: if each of the target version documents and the original document contains a target document block with the same global identifier but different structural path information, generating a target marker for the target document block, wherein the target marker is used to indicate that the target document block has undergone move or rearrangement editing.
6. The method according to claim 1, characterized in that the step of generating candidate editing and merging information based on the editing information corresponding to each target version document and the conflict marker information includes: if the conflict marker information is the first marker information, concatenating the editing information corresponding to each target version document to generate the candidate editing and merging information; and if the conflict marker information is the second marker information, classifying the editing information according to the editing information corresponding to each target version document, and generating the candidate editing and merging information based on the classification results.
7. The method according to claim 1, characterized in that, after the target merged document is obtained, the method further includes: based on the target merged document and the original document, generating a patch file, wherein the patch file includes at least: the modified document blocks and the modified content; and generating a merged ledger, wherein the merged ledger includes at least: the structural path information of the document blocks in each target version document of each user who submitted the edit request, the edit content of the document blocks for which the edit was performed, conflict information, and the merge strategy adopted for the conflicting edits.
8. A document processing device based on concurrent editing, characterized in that the device includes: an acquisition module, a determination module, a generation module, and a processing module; the acquisition module is used to acquire the target version document of each user based on the editing submission requests submitted by different users for the same original document; the determination module is used to determine the structural tags of each document block in each target version document and the structural tags of each document block in the original document based on the block information of each document block in each target version document and the block information of each document block in the original document, wherein the block information includes: structured location information, semantic information, and persistent identifiers assigned to the document blocks during initialization; the determination module is further used to determine the same document block between each target version document and the original document based on the structural tags of each document block in each target version document and the structural tags of each document block in the original document; the generation module is used to perform text difference processing on the same document block in each of the target version documents based on the same document block in the original document, and generate candidate editing and merging information; the processing module is used to edit the original document according to the candidate editing and merging information to obtain the target merged document; the generation module is specifically used to perform character-level difference processing on the text content of the same document block in each of the target version documents and the text content of the same document block in the original document to determine the editing information corresponding to each of the target version documents, wherein the editing information includes the editing content performed on the same document block in the target version document; based on the editing information corresponding to each target version document, to generate conflict marker information for the same document block for each target version document, wherein the conflict marker information includes first marker information or second marker information, the first marker information is used to indicate that there is no conflict in the editing of the same document block, and the second marker information is used to indicate that there is a conflict in the editing of the same document block; and, based on the editing information corresponding to each target version document and the conflict marker information, to generate the candidate editing and merging information.
9. An electronic device, characterized in that the device includes: a processor, a storage medium, and a bus, wherein the storage medium stores program instructions executable by the processor; when the electronic device is running, the processor communicates with the storage medium via the bus, and the processor executes the program instructions to implement the document processing method based on concurrent editing as described in any one of claims 1 to 7.