Method, device and equipment for optimizing document comparison result and storage medium

CN115759032B · Active · Publication Date: 2026-06-12 · IFLYTEK CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: IFLYTEK CO LTD
Filing Date: 2022-11-18
Publication Date: 2026-06-12

Abstract

This application discloses a method, apparatus, device, and storage medium for optimizing document comparison results. The method includes: obtaining initial comparison results of a first document and a second document, wherein the initial comparison results include several initial difference items; determining the target segment in a target document in which each initial difference item is located, wherein the target document is the first document or the second document; determining target difference items from each target segment based on semantic comparison results of each target segment with several preset tags, wherein the target difference items correspond to the preset tags and each contains at least one initial difference item; and obtaining the set of target difference items as the optimized comparison result between the first document and the second document. The above scheme can improve the accuracy of document comparison results.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to a method, apparatus, device, and storage medium for optimizing document comparison results.

Background Technology

[0002] As informatization levels continue to rise, enterprises produce more and more document-type deliverables, and reviewing them consumes significant human and material resources. Current practice typically involves manually reviewing the different versions of a document, identifying the differences, and then making the appropriate decisions. However, this requires reading every version manually, and the experience of reviewing the differences is poor.

Summary of the Invention

[0003] This application provides at least a method, apparatus, device, and storage medium for optimizing document comparison results.

[0004] The first aspect of this application provides a method for optimizing document comparison results, comprising: obtaining initial comparison results of a first document and a second document, wherein the initial comparison results include a plurality of initial difference items; determining the target segment in a target document in which each initial difference item is located, wherein the target document is either the first document or the second document; determining target difference items from each target segment based on semantic comparison results of each target segment with a plurality of preset tags, wherein the target difference items correspond to each preset tag and contain at least one initial difference item; and obtaining a set of each target difference item to obtain an optimized comparison result between the first document and the second document.

[0005] The semantic comparison results include semantic similarity. Determining the target difference items from each target segment based on the semantic comparison results of each target segment with the preset tags includes: obtaining the semantic similarity between each character in each target segment and each preset tag; and determining the target difference items from the target segment based on each semantic similarity.

[0006] Determining the target difference items from the target segment based on each semantic similarity includes: for each preset tag, determining the first character in each target segment whose semantic similarity to the preset tag is greater than or equal to a preset semantic similarity as the first character of the target difference item corresponding to the preset tag, and determining the last character in each target segment whose semantic similarity to the preset tag is greater than or equal to the preset semantic similarity as the last character of the target difference item corresponding to the preset tag; in each target segment, taking the first character, the last character, and the characters between them as a candidate target difference item corresponding to the preset tag; and taking the candidate target difference items that contain an initial difference item as the target difference items corresponding to the preset tag.

[0007] After taking the candidate target difference items that contain an initial difference item as the target difference items corresponding to the preset tag, the method further includes: determining whether the semantic similarity between each character in each target difference item and the preset tag is greater than or equal to the preset semantic similarity; and, in response to the target difference item containing a target character whose semantic similarity to the preset tag is less than the preset semantic similarity, taking the character that is adjacent to the target character on the side facing the initial difference item as the new first character or last character of the target difference item.

[0008] Before determining the target difference items from each target segment based on the semantic comparison results of each target segment with the preset tags, the method further includes: receiving a selection instruction from the user that selects preset tags from several tags, the tags being related to the usage scenario of the document; and, in response to the selection instruction, using the selected tags as the preset tags.

[0009] After obtaining the set of target difference items as the optimized comparison result between the first document and the second document, the method further includes displaying the initial comparison results and the optimized comparison results. The initial comparison results include the number of initial difference items and / or the position of each initial difference item in the target document. The optimized comparison results include the number of target difference items and / or the position of each target difference item in the target document.

[0010] The process of obtaining the initial comparison results of the first document and the second document includes: obtaining layout resources and comparison resources, wherein the layout resources include the layout information to be compared, and the comparison resources include the comparison methods corresponding to each layout information; classifying the first document and the second document according to the layout information to obtain the text content corresponding to each layout information; comparing the text content corresponding to each layout information using the comparison methods corresponding to each layout information to obtain the initial difference items corresponding to each layout information; and obtaining the initial comparison results of the first document and the second document based on the initial difference items corresponding to each layout information.

[0011] A second aspect of this application provides an optimization apparatus for document comparison results, comprising: an acquisition module for acquiring initial comparison results of a first document and a second document, the initial comparison results including a plurality of initial difference items; a processing module for determining the target segment in a target document where each initial difference item is located, the target document being either the first document or the second document; a target difference item determination module for determining target difference items from each target segment based on semantic comparison results of each target segment with a plurality of preset tags, the target difference items corresponding to each preset tag and including at least one initial difference item; and a result acquisition module for acquiring a set of target difference items to obtain an optimized comparison result between the first document and the second document.

[0012] A third aspect of this application provides an electronic device, including a memory and a processor, wherein the processor is configured to execute program instructions stored in the memory to implement the above-described method for optimizing document comparison results.

[0013] The fourth aspect of this application provides a computer-readable storage medium having program instructions stored thereon, which, when executed by a processor, implement the above-mentioned method for optimizing document comparison results.

[0014] The above scheme, after obtaining the initial comparison results of the first and second documents, uses the semantic comparison results between the target segments where each initial difference item is located and the preset tags to determine the target difference items. Each target difference item includes at least one initial difference item. While retaining the original difference item results, it achieves the effect of integrating the initial difference items, optimizes the initial comparison results, and facilitates user access.

[0015] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this application.

Attached Figure Description

[0016] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this application and, together with the specification, serve to explain the technical solutions of this application.

[0017] Figure 1 is a flowchart of an embodiment of the method for optimizing document comparison results of this application.

[0018] Figure 2 is a schematic flowchart of the sub-steps of step S13 in an embodiment of the method for optimizing document comparison results of this application.

[0019] Figure 3 is a schematic diagram of the workflow of the comparison model in an embodiment of the method for optimizing document comparison results of this application.

[0020] Figure 4 is a schematic diagram of the display mode in an embodiment of the method for optimizing document comparison results of this application.

[0021] Figure 5 is a schematic structural diagram of an embodiment of the apparatus for optimizing document comparison results of this application.

[0022] Figure 6 is a schematic structural diagram of an embodiment of the electronic device of this application.

[0023] Figure 7 is a schematic structural diagram of an embodiment of the computer-readable storage medium of this application.

Detailed Implementation

[0024] The embodiments of this application will now be described in detail with reference to the accompanying drawings.

[0025] In the following description, specific details such as particular system architectures, interfaces, and technologies are presented for illustrative purposes rather than for limiting purposes, in order to provide a thorough understanding of this application.

[0026] In this document, the term "and / or" merely describes an association between related objects and indicates that three relationships can exist. For example, A and / or B can represent three cases: A alone, both A and B, and B alone. In addition, the character " / " generally indicates an "or" relationship between the objects before and after it. Furthermore, "a plurality" in this document means two or more. Moreover, the term "at least one" in this document means any one of a plurality of objects or any combination of at least two of them. For example, including at least one of A, B, and C can mean including any one or more elements selected from the set consisting of A, B, and C.

[0027] Referring to Figure 1, Figure 1 is a flowchart of an embodiment of the method for optimizing document comparison results of this application. Specifically, the method may include the following steps:

[0028] Step S11: Obtain the initial comparison results of the first document and the second document. The initial comparison results include several initial difference items.

[0029] The initial comparison results of the first and second documents can be received from another device, or they can be produced by this device performing document comparison on the first and second documents. Here, "this device" refers to a device capable of executing the method for optimizing document comparison results described in the embodiments of this disclosure.

[0030] An initial difference item is a point at which the first document and the second document differ. For example, if the content of the first document is "12,500 yuan" and the content of the second document is "12,600 yuan", then the initial difference item between the two documents is the single digit that differs ("5" in the first document, "6" in the second).
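As an illustration only, the following minimal Python sketch shows one way initial difference items could be computed for the example above. It uses the standard-library `difflib.SequenceMatcher`; the function name `initial_difference_items` and the tuple layout are assumptions made for this sketch, and the application does not state which comparison algorithm is used.

```python
from difflib import SequenceMatcher


def initial_difference_items(first: str, second: str):
    """Return one (start, end, first_text, second_text) tuple per differing span.

    start and end are zero-based offsets into `first`, with end exclusive.
    This is a hypothetical helper, not the method claimed in the application.
    """
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    items = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep replace / insert / delete spans only
            items.append((i1, i2, first[i1:i2], second[j1:j2]))
    return items


diffs = initial_difference_items("12,500 yuan", "12,600 yuan")
print(diffs)
```

Here the only differing span is the digit "5" (replaced by "6"), which matches the single initial difference item described in the paragraph above.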

[0031] Step S12: Determine the target segment in the target document where each initial difference item is located. The target document is either the first document or the second document.

[0032] The target segment can be the paragraph containing the initial difference item, the line containing the initial difference item, that line plus a preset number of lines before and after it, or even just a few characters before and after the initial difference item. For example, the preset number can be one line or more.
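A minimal sketch of the "line plus a preset number of lines before and after" option follows. The function name `target_segment`, the zero-based line index, and the default of one context line are assumptions for illustration.

```python
def target_segment(lines, diff_line, context=1):
    """Return (index of first line in the segment, segment text).

    lines:     the target document split into lines
    diff_line: zero-based index of the line holding the initial difference item
    context:   preset number of lines to include before and after that line
    """
    start = max(0, diff_line - context)               # clamp at document start
    end = min(len(lines), diff_line + context + 1)    # clamp at document end
    return start, "\n".join(lines[start:end])


document = ["Party A: ...", "Amount: 12,500 yuan", "Date: ...", "Signature: ..."]
print(target_segment(document, diff_line=1, context=1))
```

Setting `context=0` reduces the segment to the single line that contains the initial difference item.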

[0033] In some application scenarios, the target document is the first document. In other application scenarios, the target document is the second document. For example, if the first document is a template document and the second document is a document to be confirmed, then the target document can be the document to be confirmed, or the target document can be the template document. Whether the target document is the first or the second document is not specifically defined here.

[0034] Step S13: Based on the semantic comparison results of each target segment with several preset tags, determine the target difference items from each target segment. The target difference items correspond to each preset tag and contain at least one initial difference item.

[0035] The semantic comparison results between each target segment and the preset tags can be obtained by semantically comparing each target segment with each preset tag separately. The preset tags can be defined as needed for the scenario; for example, preset tags can be time, address, amount, date, phone number, ID number, and so on. Depending on the scenario, preset tags can also be book titles, movie titles, furniture names, and so on. The specific types of preset tags are not limited here.

[0036] A target difference item corresponding to a preset tag means that the target difference item belongs to that preset tag. For example, suppose part of the content of the first document is "This is Street C, District B, City A", part of the content of the second document is "This is Street C, District D, City A", the target document is the first document, the target segment is "This is Street C, District B, City A", the preset tag is address, and the initial difference item is the single character "B". Then the target difference item can be the whole address "Street C, District B, City A".

[0037] Step S14: Obtain the set of each target difference item to get the optimized comparison result between the first document and the second document.

[0038] Optionally, the set of target difference items can itself be used as the optimized comparison result between the first and second documents. Alternatively, the position of each target difference item in the target document can be derived from its position in the target segment, and the set of target difference items together with their positions in the target document can then be used as the optimized comparison result. Of course, in other embodiments, the specific way of obtaining the optimized comparison result can be determined according to the user's needs.
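The second option above (carrying segment positions over to document positions) can be sketched as follows. The `TargetDifference` record, the input tuple layout, and the de-duplication of identical items are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetDifference:
    tag: str        # preset tag the item belongs to
    text: str       # text of the target difference item
    doc_start: int  # zero-based offset in the target document
    doc_end: int    # exclusive end offset in the target document


def optimized_result(found):
    """Collect target difference items and map them to document positions.

    found: iterable of (segment_offset, tag, seg_start, seg_end, segment_text),
           where segment_offset is where the segment begins in the document
           and seg_start / seg_end (exclusive) locate the item in the segment.
    """
    items = {
        TargetDifference(tag, text[s:e], offset + s, offset + e)
        for offset, tag, s, e, text in found
    }  # a set, so the same item reached from two initial differences appears once
    return sorted(items, key=lambda d: (d.doc_start, d.tag))


segment = "This is Street C, District B, City A"
print(optimized_result([(100, "address", 8, len(segment), segment)]))
```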

[0039] The above scheme, after obtaining the initial comparison results of the first and second documents, uses the semantic comparison results between the target segments where each initial difference item is located and the preset tags to determine the target difference items. Each target difference item includes at least one initial difference item. While retaining the original difference item results, it achieves the effect of integrating the initial difference items, optimizes the initial comparison results, and facilitates user access.

[0040] In some disclosed embodiments, the semantic comparison results include semantic similarity. Referring also to Figure 2, Figure 2 is a schematic flowchart of the sub-steps of step S13 in an embodiment of the method for optimizing document comparison results of this application. As shown in Figure 2, step S13 above may include the following steps:

[0041] Step S131: Obtain the semantic similarity between each character in each target segment and each preset tag.

[0042] One method for obtaining the semantic similarity between each character in each target segment and each preset tag can be: encoding each character and each preset tag separately, and then taking the inner product of the encoding of each character and the encoding of each preset tag to obtain the semantic similarity between the character and each preset tag. The encoding method can be to encode both the content of the character and its position in the target segment. Of course, this method of calculating semantic similarity is only one example. In other embodiments, the method of calculating semantic similarity can also be by clustering each character and each preset tag, and determining the similarity based on the distance between each character and the preset tag in the clustering results. In some embodiments, any method other than clustering that can calculate the similarity between the two can also be used; no specific limitation is made here.
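The inner-product variant described above can be sketched in a few lines of plain Python. The encodings here are hand-written toy vectors; in practice they would come from whatever encoder is used, which the application does not specify. The function names are assumptions for illustration.

```python
def inner_product(u, v):
    """Inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def similarity_by_tag(char_encodings, tag_encodings):
    """Return {tag: [similarity of character 0, character 1, ...]}.

    char_encodings: one vector per character of the target segment
    tag_encodings:  {tag name: vector}
    """
    return {
        tag: [inner_product(char_vec, tag_vec) for char_vec in char_encodings]
        for tag, tag_vec in tag_encodings.items()
    }


# Toy 2-dimensional encodings, purely illustrative.
chars = [[1.0, 0.0], [0.5, 0.5]]
tags = {"amount": [1.0, 0.0], "date": [0.0, 1.0]}
print(similarity_by_tag(chars, tags))
```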

[0043] Step S132: Based on the semantic similarity, determine the target difference items from the target segment.

[0044] The method for determining target difference items from target segments based on semantic similarity can be as follows: For each preset tag, determine the first character in each target segment whose semantic similarity to the preset tag is greater than or equal to the preset semantic similarity, and use it as the first character of the target difference item corresponding to the preset tag. Also, determine the last character in each target segment whose semantic similarity to the preset tag is greater than or equal to the preset semantic similarity, and use it as the last character of the target difference item corresponding to the preset tag. Within each target segment, the first character, the last character, and the characters between the first and last characters are used as candidate target difference items corresponding to the preset tag. Then, the candidate target difference items containing the initial difference items are used as the target difference items corresponding to the preset tag.

[0045] For example, suppose the target segment contains 10 characters and the preset tag is 'a'. The semantic similarities between these 10 characters and the preset tag 'a' are 0.3, 0.5, 0.8, 0.4, 0.8, 0.9, 0.7, 0.6, 0.5, and 0.4, respectively. If the preset semantic similarity is 0.6, then the first character in the target segment whose semantic similarity to the preset tag 'a' is greater than or equal to the preset semantic similarity is the 3rd character from the left, and the last such character is the 8th character from the left. The candidate target difference item therefore consists of the 3rd through 8th characters. If the initial difference item lies within the 3rd through 8th characters, this candidate is the target difference item corresponding to the preset tag 'a'. If the initial difference item does not lie within the 3rd through 8th characters, the candidate is not a target difference item corresponding to the preset tag 'a'.
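The worked example above can be reproduced with a short sketch. Indices in the code are zero-based and inclusive, so the 3rd through 8th characters appear as `(2, 7)`. The function names are assumptions for illustration.

```python
def candidate_span(similarities, threshold):
    """First and last index whose similarity is >= threshold, or None."""
    hits = [i for i, s in enumerate(similarities) if s >= threshold]
    if not hits:
        return None
    return hits[0], hits[-1]


def target_span(similarities, threshold, diff_start, diff_end):
    """Keep the candidate only if it contains the initial difference item.

    diff_start / diff_end: inclusive zero-based range of the initial difference.
    """
    span = candidate_span(similarities, threshold)
    if span is None:
        return None
    first, last = span
    return span if first <= diff_start and diff_end <= last else None


scores = [0.3, 0.5, 0.8, 0.4, 0.8, 0.9, 0.7, 0.6, 0.5, 0.4]
print(candidate_span(scores, 0.6))     # characters 3 to 8, zero-based (2, 7)
print(target_span(scores, 0.6, 5, 5))  # initial difference at the 6th character
print(target_span(scores, 0.6, 8, 8))  # initial difference at the 9th character
```

The second call keeps the candidate because the 6th character lies inside it; the third call returns `None` because the 9th character lies outside it.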

[0046] After the candidate target difference items that contain an initial difference item have been taken as the target difference items corresponding to the preset tag, the following steps can be performed: determine whether the semantic similarity between each character in each target difference item and the preset tag is greater than or equal to the preset semantic similarity. In response to a target difference item containing a target character whose semantic similarity to the preset tag is less than the preset semantic similarity, the character adjacent to the target character on the side facing the initial difference item is taken as the new first or last character of the target difference item.

[0047] Continuing the previous example, the target segment contains 10 characters and the preset tag is 'a'. The semantic similarities between these 10 characters and the preset tag 'a' are 0.3, 0.5, 0.8, 0.4, 0.8, 0.9, 0.7, 0.6, 0.5, and 0.4, respectively, and the preset semantic similarity is 0.6. If the initial difference item is the 6th character, then the 3rd through 8th characters form the target difference item corresponding to the preset tag 'a'. The semantic similarity between the 2nd character of this target difference item (the 4th character of the target segment) and the preset tag 'a' is 0.4, which is below the preset semantic similarity. This character is therefore identified as the target character, and the character adjacent to it on the side facing the initial difference item is taken as the new first character of the target difference item. In this case, the new first character is the 5th character of the original target segment; that is, the new target difference item consists of the 5th through 8th characters of the original target segment. This example illustrates taking the character adjacent to the target character on the side facing the initial difference item as the new first character. Taking a character as the new last character works in the same way and is not elaborated further here.
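A minimal sketch of this refinement step follows. It scans outward from the initial difference item and stops at the nearest low-similarity character on each side; the handling of several low-similarity characters on the same side is an assumption of this sketch, since the text only describes the single-character case. Indices are zero-based and inclusive.

```python
def trim_span(similarities, threshold, span, diff_start, diff_end):
    """Shrink a target difference item around the initial difference item.

    span:                  (first, last) of the current target difference item
    diff_start / diff_end: inclusive range of the initial difference item
    """
    first, last = span
    # Walk left from the initial difference item to the nearest weak character.
    for i in range(diff_start - 1, first - 1, -1):
        if similarities[i] < threshold:
            first = i + 1  # neighbour on the side facing the initial difference
            break
    # Walk right from the initial difference item to the nearest weak character.
    for j in range(diff_end + 1, last + 1):
        if similarities[j] < threshold:
            last = j - 1
            break
    return first, last


scores = [0.3, 0.5, 0.8, 0.4, 0.8, 0.9, 0.7, 0.6, 0.5, 0.4]
print(trim_span(scores, 0.6, (2, 7), 5, 5))  # 5th to 8th characters, zero-based (4, 7)
```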

[0048] In some disclosed embodiments, target difference items can also be filtered in other ways. For example, the similarity between adjacent characters can be used to determine whether they belong to the same preset tag. If the similarity between adjacent characters is greater than or equal to a preset similarity, the adjacent characters are considered to belong to the same preset tag; if it is less than the preset similarity, they are considered not to belong to the same preset tag. The preset similarity can be the preset semantic similarity or can be set as needed. As another example, suppose the preset tag is a mobile phone number tag. A mobile phone number is generally 11 digits long, so if a detected target difference item has only 2 digits, it clearly does not belong to the mobile phone number tag and can be discarded. Optionally, because such a target difference item still contains an initial difference item even though it does not correspond to the preset tag, the initial difference item can be shown in the final optimized comparison result, marked by a tag or by other means, for the user to view. In other embodiments, the target difference item can simply be discarded and not reflected in the optimized comparison result.
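The length-based filter for the mobile phone number tag might look like the sketch below. The rule table `EXPECTED_DIGITS`, the tag name `"phone_number"`, and the function name are hypothetical; only the 11-digit rule itself comes from the paragraph above.

```python
# Hypothetical rule table: expected digit count per tag (assumption of this sketch).
EXPECTED_DIGITS = {"phone_number": 11}


def passes_length_rule(tag, text):
    """Return False when a tag has a digit-count rule and the text violates it."""
    expected = EXPECTED_DIGITS.get(tag)
    if expected is None:
        return True  # no rule registered for this tag, keep the item
    digit_count = sum(ch.isdigit() for ch in text)
    return digit_count == expected


print(passes_length_rule("phone_number", "13"))           # too short, discard
print(passes_length_rule("phone_number", "13800138000"))  # 11 digits, keep
```

An item that fails the rule can either be dropped or kept only as a flagged initial difference item, matching the two options described above.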

[0049] Prior to performing step S13, the method further includes receiving a selection instruction from a user to choose a preset tag from several tags. The tags are related to the document's usage scenario. In response to the selection instruction, the selected tag is used as the preset tag.

[0050] For example, tags may include time, address, amount, date, phone number, ID number, etc. Users can select one or more tags as preset tags. For example, users can select amount and date as preset tags. Tags are related to the use scenario of the document; specifically, if scenario one focuses on the amount in the document, the amount can be set as the tag; if scenario two focuses on the address in the document, the address can be set as the tag.

[0051] In some disclosed embodiments, step S13 described above can be performed by a comparison model. For a better understanding of how the comparison model works, refer to Figure 3, which is a schematic diagram of the workflow of the comparison model in an embodiment of the method for optimizing document comparison results of this application. As shown in Figure 3, the input to the comparison model can include the preset tags and the target segment. The comparison model first encodes the preset tags and the characters in the target segment separately, and then takes the inner product of each encoded preset tag with each encoded character in the target segment to obtain the semantic similarity between each character and each preset tag. In Figure 3, "target segment × preset label N" denotes the inner product of each character in the target segment with preset tag N. Then, based on the semantic similarity, the first character, the last character, and the characters between them are determined for the target difference item, and the target difference item is output.

[0052] The document comparison result optimization method provided in this embodiment may further include a comparison model training step. The training step includes defining labels, which can be determined according to the scenario; for example, labels can be time, address, amount, date, phone number, and ID number, etc. Then, relevant data is collected for model training. The training task involves extracting relevant segments with corresponding labels from the input text content. The model has three main training objectives: the start, end, and intermediate sequences of the answer. The answer refers to the extracted relevant segments. In other embodiments, the model training objective can also be the answer; in other words, the model can directly output relevant segments containing the start, end, and intermediate sequences of the answer. After obtaining these three training objectives, the final answer can be obtained through post-processing. Post-processing includes filtering the answer by matching the beginning and end of the answer, removing bad answers, etc., and updating the start or end of the answer. Specific filtering methods can refer to the above-described updating methods for target differences, which will not be repeated here.

[0053] In some disclosed embodiments, after performing step S14 above, the following step may also be performed: displaying the initial comparison results and the optimized comparison results. The initial comparison results also include the number of initial difference items and / or the position of each initial difference item in the target document. The optimized comparison results include the number of target difference items and / or the position of each target difference item in the target document.

[0054] In some application scenarios, the initial comparison results and the optimized comparison results can be displayed in part or in full. Displaying them in full can include showing the corresponding difference items (target difference items or initial difference items), the number of difference items, and their positions in the target document. Displaying them in part can include showing only the difference items and their number without their positions in the target document, or showing only the difference items and their positions in the target document without their number, and so on.

[0055] To better understand how the initial comparison results and the optimized comparison results are displayed in the embodiments of this disclosure, refer to Figure 4, which is a schematic diagram of the display mode in an embodiment of the method for optimizing document comparison results of this application. As shown in Figure 4, the first column of the table shows the baseline document (which can be the first document) and the comparison document (which can be the second document). The second column shows the initial difference items in the initial comparison results, and the third column shows their number. The fourth column shows the target difference items in the optimized comparison results, and the fifth column shows their number. Taking a baseline document containing the date June 12, 2022 and a comparison document containing the date April 25, 2022 as an example, the initial comparison results indicate that the initial difference items are the 6th character and the consecutive 8th and 9th characters (positions are counted in the date string as written in the documents), for a total of two initial difference items. In the optimized comparison results, the target difference item runs from the first character to the last character, that is, the target difference item is the whole date April 25, 2022. The optimized comparison results therefore contain one target difference item.
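The counts in this example can be reproduced with a small sketch that groups initial difference items under the target difference item that contains them. Spans are zero-based and inclusive, and the 10-character length of the date string is an assumption of this sketch; the function name and the dictionary layout are hypothetical.

```python
def summarize(initial_spans, target_spans):
    """Build the numbers shown in the display table.

    Each span is an inclusive (start, end) pair of zero-based character indices.
    """
    merged = {
        target: [s for s in initial_spans
                 if target[0] <= s[0] and s[1] <= target[1]]
        for target in target_spans
    }
    return {
        "initial_count": len(initial_spans),
        "target_count": len(target_spans),
        "merged": merged,  # which initial items each target item absorbs
    }


# 6th character -> (5, 5); 8th and 9th characters -> (7, 8);
# whole date, assumed to be 10 characters long -> (0, 9).
print(summarize([(5, 5), (7, 8)], [(0, 9)]))
```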

[0056] The initial comparison results of the first and second documents can be obtained by acquiring layout resources and comparison resources. Layout resources include the layout information to be compared, and comparison resources include the comparison methods corresponding to each layout information. Then, the first and second documents are classified according to their layout information to obtain the text content corresponding to each layout information. Next, the text content corresponding to each layout information is compared using the comparison methods corresponding to each layout information to obtain the initial differences for each layout information. Finally, based on the initial differences for each layout information, the initial comparison results of the first and second documents are obtained.

[0057] Different manufacturers offer different resources (layout resources and comparison resources). By obtaining the resource identifier input by the user, the corresponding layout and comparison resources can be retrieved. Based on layout information, the first and second documents can be classified into layout types such as text extraction, cover extraction, header extraction, footer extraction, table of contents extraction, table extraction, stamp extraction, and handwriting extraction. In other words, layout includes the document's main text layout, cover layout, header layout, etc. Using the comparison methods corresponding to each layout information, the text content corresponding to each layout information is compared to obtain the initial differences. Specifically, the main text of the two documents can be compared using the corresponding comparison method to obtain the initial differences in the main text, and the header can be compared using the corresponding comparison method to obtain the initial differences in the header, and so on. Other layouts are similar and will not be elaborated on here.
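One way to picture "a comparison method per layout" is a dispatch table from layout name to comparison function, as in the sketch below. The layout names, the two comparison functions, and the use of `difflib` are assumptions for illustration; the application does not say which methods the comparison resources contain.

```python
from difflib import SequenceMatcher


def compare_characters(first_text, second_text):
    """Character-level comparison: list of (first_part, second_part) that differ."""
    m = SequenceMatcher(None, first_text, second_text, autojunk=False)
    return [(first_text[i1:i2], second_text[j1:j2])
            for op, i1, i2, j1, j2 in m.get_opcodes() if op != "equal"]


def compare_lines(first_text, second_text):
    """Line-level comparison: list of (first_lines, second_lines) that differ."""
    a, b = first_text.splitlines(), second_text.splitlines()
    m = SequenceMatcher(None, a, b, autojunk=False)
    return [("\n".join(a[i1:i2]), "\n".join(b[j1:j2]))
            for op, i1, i2, j1, j2 in m.get_opcodes() if op != "equal"]


def initial_comparison(first_by_layout, second_by_layout, comparison_resources):
    """Apply each layout's comparison method to that layout's text content."""
    return {
        layout: compare(first_by_layout.get(layout, ""),
                        second_by_layout.get(layout, ""))
        for layout, compare in comparison_resources.items()
    }


resources = {"body": compare_characters, "header": compare_lines}  # hypothetical
first = {"body": "total 12,500 yuan", "header": "Contract v1"}
second = {"body": "total 12,600 yuan", "header": "Contract v2"}
print(initial_comparison(first, second, resources))
```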

[0058] After the initial difference items for each layout have been obtained, they can be filtered. For example, filtering can be based on user attention levels, which can be entered in advance. For instance, users can select layouts; unselected layouts have lower attention levels, and the initial difference items corresponding to these lower-attention layouts can be discarded. The system can also receive filtering logic imported by the user, and filtering based on this logic yields the initial comparison results for the first and second documents.
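A minimal sketch of this filtering step, assuming the per-layout initial difference items are held in a dictionary and that user-imported filtering logic can be represented as a predicate function. Both of these representations, and the function name, are assumptions for illustration.

```python
def filter_initial_differences(diffs_by_layout, selected_layouts, keep=None):
    """Drop unselected layouts, then apply optional user-supplied logic.

    diffs_by_layout:  {layout name: [initial difference item, ...]}
    selected_layouts: layouts the user marked as relevant
    keep:             optional predicate (layout, item) -> bool
    """
    result = {}
    for layout, items in diffs_by_layout.items():
        if layout not in selected_layouts:
            continue  # lower-attention layout, discard its differences
        if keep is not None:
            items = [item for item in items if keep(layout, item)]
        result[layout] = list(items)
    return result


diffs = {"body": ["5 -> 6"], "footer": ["page 3 -> page 4"]}
print(filter_initial_differences(diffs, selected_layouts={"body"}))
```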

[0059] In some disclosed embodiments, before classifying the first and second documents based on layout information to obtain the text content corresponding to each layout information, the following steps may also be performed:

[0060] The data to be compared is typically Word, PDF, or image files containing paragraphs or tables. Performing OCR on both the baseline document and the comparison document yields structured input data, which facilitates the subsequent comparison. To prevent incomplete data during file parsing, all input formats can first be converted into images before OCR is performed.

[0061] Then, the OCR-recognized content from the two documents is concatenated. The watermark position and the proportion of each character covered by the watermark in the watermark frame are then detected. Based on this proportion, the characters covered by the watermark are normalized to obtain a format that can be compared.
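The proportion of a character covered by the watermark can be computed from bounding boxes, as in the sketch below. The text does not specify how the covered characters are normalized, so the rule used here (replacing characters whose coverage reaches a threshold with a placeholder) is purely an assumption for illustration, as are the box format `(x0, y0, x1, y1)` and all function names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed format


def coverage_ratio(char_box: Box, watermark_box: Box) -> float:
    """Fraction of the character box area that the watermark box overlaps."""
    cx0, cy0, cx1, cy1 = char_box
    wx0, wy0, wx1, wy1 = watermark_box
    overlap_w = max(0.0, min(cx1, wx1) - max(cx0, wx0))
    overlap_h = max(0.0, min(cy1, wy1) - max(cy0, wy0))
    area = (cx1 - cx0) * (cy1 - cy0)
    return (overlap_w * overlap_h) / area if area > 0 else 0.0


def normalize_covered(chars: List[Tuple[str, Box]], watermark_box: Box,
                      threshold: float = 0.5, placeholder: str = "#") -> str:
    """Assumed normalization: mask characters whose coverage >= threshold."""
    return "".join(
        placeholder if coverage_ratio(box, watermark_box) >= threshold else ch
        for ch, box in chars
    )


chars = [("A", (0, 0, 10, 10)), ("B", (10, 0, 20, 10)), ("C", (20, 0, 30, 10))]
watermark = (8, 0, 22, 10)
normalized = normalize_covered(chars, watermark)
```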

[0062] In some application scenarios, the optimization of document comparison results can be performed by a document comparison system that integrates a semantic granularity extraction capability. The text content of the line containing an initial difference item is fed into the semantic granularity extraction capability to identify semantic granularity information, which includes each semantic granularity entity and its specific position in the line. In other words, the specific content of each character and the specific position of each character in the line can be identified. The corresponding target difference item can then be obtained by calculating the semantic similarity with the preset tags.
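The per-character thresholding recited in claim 1 can be sketched as follows. The similarity function is passed in as a parameter because the text does not fix a particular model; the toy vocabulary-based similarity, the inclusive index spans, and all function names are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int]  # inclusive (start, end) character indices in a segment


def candidate_span(segment: str, tag: str,
                   similarity: Callable[[str, str], float],
                   threshold: float) -> Optional[Span]:
    """First and last characters whose similarity to the tag reaches the threshold."""
    hits = [i for i, ch in enumerate(segment) if similarity(ch, tag) >= threshold]
    return (hits[0], hits[-1]) if hits else None


def target_difference_items(segment: str, tags: List[str],
                            initial_diffs: List[Span],
                            similarity: Callable[[str, str], float],
                            threshold: float) -> Dict[str, str]:
    """Keep, per tag, the candidate span only if it contains an initial difference."""
    results: Dict[str, str] = {}
    for tag in tags:
        span = candidate_span(segment, tag, similarity, threshold)
        if span is None:
            continue
        start, end = span
        if any(start <= s and e <= end for s, e in initial_diffs):
            results[tag] = segment[start:end + 1]
    return results


# Toy similarity (assumption): 1.0 if the character belongs to the tag's vocabulary
VOCAB = {"amount": set("0123456789,"), "date": set("/-")}
toy_similarity = lambda ch, tag: 1.0 if ch in VOCAB[tag] else 0.0

segment = "Total: 1,200 yuan due"
initial_diffs = [(9, 9)]  # the character "2" was reported as changed
items = target_difference_items(segment, ["amount", "date"], initial_diffs,
                                toy_similarity, threshold=0.5)
```

In this toy run the single-character difference is widened to the whole amount, which is the integrating effect the scheme aims for; a real system would replace `toy_similarity` with the output of its semantic model.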

[0063] Furthermore, this solution optimizes the comparison results by standardizing the initial differences and adding semantic granularity information (target differences and their number, etc.) for easier user viewing. It fully considers the semantic granularity information of the differences and standardizes and merges the results. While retaining the initial comparison results, the system adds semantic-level document comparison results, allowing users to select the comparison mode according to their specific scenarios, thus improving user satisfaction with the system.

[0064] In addition, this solution uses the preset labels to optimize the initial comparison results only after those results have been obtained. Compared with applying the preset labels during the first comparison of the two documents in order to produce the initial comparison results, this approach requires less computation and is faster.

[0065] The entity executing the document comparison result optimization method can be a document comparison result optimization device. For example, the document comparison result optimization method can be executed by a terminal device, server, or other processing device. The terminal device can be a user equipment (UE), computer, mobile device, user terminal, terminal, cellular phone, cordless phone, personal digital assistant (PDA), handheld device, computing device, in-vehicle device, wearable device, etc. In some possible implementations, the document comparison result optimization method can be implemented by a processor calling computer-readable instructions stored in memory.

[0066] Please refer to Figure 5, which is a schematic diagram of an embodiment of the document comparison result optimization device of this application. The document comparison result optimization device 20 includes an acquisition module 21, a processing module 22, a target difference item determination module 23, and a result acquisition module 24. The acquisition module 21 is used to acquire the initial comparison results of the first document and the second document, the initial comparison results including a plurality of initial difference items; the processing module 22 is used to determine the target segment in the target document where each initial difference item is located, the target document being either the first document or the second document; the target difference item determination module 23 is used to determine the target difference item from each target segment based on the semantic comparison results of each target segment with a plurality of preset tags, the target difference item corresponding to each preset tag and containing at least one initial difference item; the result acquisition module 24 is used to acquire the set of each target difference item to obtain the optimized comparison result between the first document and the second document.
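One possible way to wire the four modules together is sketched below. Each module is modeled as a callable so that any concrete implementation can be plugged in; the class name, the callable signatures, and the choice of the first document as the target document are assumptions for the example, not details taken from this application.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # inclusive character indices of an initial difference item


@dataclass
class ComparisonResultOptimizer:
    # Module 21: returns the initial difference items for two documents
    acquire: Callable[[str, str], List[Span]]
    # Module 22: returns the target segment containing one initial difference item
    locate_segment: Callable[[str, Span], str]
    # Module 23: returns target difference items for a segment and the preset tags
    determine_targets: Callable[[str, Span, Sequence[str]], List[str]]

    def run(self, first_doc: str, second_doc: str,
            preset_tags: Sequence[str]) -> List[str]:
        initial_items = self.acquire(first_doc, second_doc)
        optimized: List[str] = []
        for item in initial_items:
            # Assumption: the first document serves as the target document
            segment = self.locate_segment(first_doc, item)
            for target in self.determine_targets(segment, item, preset_tags):
                # Module 24: collect the set of target difference items
                if target not in optimized:
                    optimized.append(target)
        return optimized
```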

[0067] The above scheme, after obtaining the initial comparison results of the first document and the second document, uses the semantic comparison results between the target segment where each initial difference item is located and the preset tag to determine the target difference item. Each target difference item includes at least one initial difference item. On the basis of retaining the original difference item results, it achieves the effect of integrating the initial difference items and optimizes the initial comparison results.

[0068] The functions of each module can be found in the implementation example of the optimization method for document comparison results, and will not be repeated here.

[0069] Please refer to Figure 6, which is a schematic diagram of the structure of an embodiment of the electronic device of this application. The electronic device 30 includes a memory 31 and a processor 32. The processor 32 is used to execute program instructions stored in the memory 31 to implement the steps in any of the above-described document comparison result optimization method embodiments. In a specific implementation scenario, the electronic device 30 may include, but is not limited to, a microcomputer or a server. In addition, the electronic device 30 may also include mobile devices such as laptops and tablets, which are not limited here.

[0070] Specifically, processor 32 controls itself and memory 31 to implement the steps in the method embodiment for optimizing the document comparison results described above. Processor 32 can also be referred to as a CPU (Central Processing Unit). Processor 32 may be an integrated circuit chip with signal processing capabilities. Processor 32 can also be a general-purpose processor, digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. A general-purpose processor can be a microprocessor or any conventional processor. Furthermore, processor 32 can be implemented using integrated circuit chips.

[0071] The above scheme, after obtaining the initial comparison results of the first document and the second document, uses the semantic comparison results between the target segment where each initial difference item is located and the preset tag to determine the target difference item. Each target difference item includes at least one initial difference item. On the basis of retaining the original difference item results, it achieves the effect of integrating the initial difference items and optimizes the initial comparison results.

[0072] Please refer to Figure 7, which is a schematic diagram of a computer-readable storage medium according to an embodiment of the present application. The computer-readable storage medium 40 stores program instructions 41 that can be executed by a processor. The program instructions 41 are used to implement the steps in any of the above-described document comparison result optimization method embodiments.

[0073] The above scheme, after obtaining the initial comparison results of the first document and the second document, uses the semantic comparison results between the target segment where each initial difference item is located and the preset tag to determine the target difference item. Each target difference item includes at least one initial difference item. On the basis of retaining the original difference item results, it achieves the effect of integrating the initial difference items and optimizes the initial comparison results.

[0074] In some embodiments, the functions or modules of the apparatus provided in this disclosure can be used to perform the methods described in the above method embodiments. The specific implementation can be referred to the description of the above method embodiments, and for the sake of brevity, it will not be repeated here.

[0075] The description of the various embodiments above tends to emphasize the differences between them. For the parts that are the same or similar, the embodiments can be referred to one another, and for the sake of brevity, these parts will not be repeated here.

[0076] In the several embodiments provided in this application, it should be understood that the disclosed methods and apparatus can be implemented in other ways. For example, the apparatus implementations described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection of devices or units may be electrical, mechanical, or other forms.

[0077] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

Claims

1. A method for optimizing document comparison results, characterized in that it comprises: obtaining initial comparison results of a first document and a second document, wherein the initial comparison results include several initial difference items; determining the target segment in a target document where each of the initial difference items is located, wherein the target document is the first document or the second document; determining target difference items from each target segment based on semantic comparison results of each target segment with several preset tags, wherein the target difference items correspond to the respective preset tags and each includes at least one initial difference item; and acquiring the set of the target difference items to obtain the optimized comparison result between the first document and the second document; wherein the semantic comparison result includes semantic similarity, and the step of determining target difference items from each target segment based on the semantic comparison results of each target segment with several preset tags includes: obtaining the semantic similarity between each character in each target segment and each preset tag; for each preset tag, determining the first character in each target segment whose semantic similarity to the preset tag is greater than or equal to a preset semantic similarity as the first character of the target difference item corresponding to the preset tag, and determining the last character in each target segment whose semantic similarity to the preset tag is greater than or equal to the preset semantic similarity as the last character of the target difference item corresponding to the preset tag; in each target segment, taking the first character, the last character, and the characters between the first character and the last character as a candidate target difference item corresponding to the preset tag; and taking the candidate target difference items that contain an initial difference item as the target difference items corresponding to the preset tag.

2. The method according to claim 1, characterized in that, after taking the candidate target difference items that contain an initial difference item as the target difference items corresponding to the preset tag, the method further includes: determining whether each character in each of the target difference items has a semantic similarity to the preset tag greater than the preset semantic similarity; and, in response to the presence of a target character in the target difference item whose semantic similarity to the preset tag is less than the preset semantic similarity, taking the character in the target difference item that is adjacent to the target character on the side facing the initial difference item as the new first character or last character.

3. The method according to any one of claims 1-2, characterized in that, before determining the target difference items from each target segment based on the semantic comparison results of each target segment with several preset tags, the method further includes: receiving a user's selection instruction for choosing a preset tag from several tags, wherein the tags are related to the document's usage scenario; and, in response to the selection instruction, using the selected tag as the preset tag.

4. The method according to any one of claims 1-2, characterized in that, after acquiring the set of the target difference items to obtain the optimized comparison result between the first document and the second document, the method further includes: displaying the initial comparison results and the optimized comparison result, wherein the initial comparison results also include the number of initial difference items and/or the position of each initial difference item in the target document, and the optimized comparison result includes the number of target difference items and/or the position of each target difference item in the target document.

5. The method according to any one of claims 1-2, characterized in that obtaining the initial comparison results of the first document and the second document includes: acquiring layout resources and comparison resources, wherein the layout resources include the layout information to be compared, and the comparison resources include the comparison methods corresponding to each piece of layout information to be compared; classifying the first document and the second document by layout based on the layout information to obtain the text content corresponding to each piece of layout information; comparing the text content corresponding to each piece of layout information using the comparison method corresponding to that layout information to obtain the initial difference items corresponding to each piece of layout information; and obtaining the initial comparison results of the first document and the second document based on the initial difference items corresponding to each piece of layout information.

6. An optimization device for document comparison results, characterized in that it comprises: an acquisition module, used to acquire initial comparison results of a first document and a second document, wherein the initial comparison results include several initial difference items; a processing module, used to determine the target segment in a target document where each of the initial difference items is located, wherein the target document is the first document or the second document; a target difference item determination module, used to determine target difference items from each target segment based on semantic comparison results of each target segment with several preset tags, wherein the target difference items correspond to the respective preset tags and each includes at least one initial difference item; and a result acquisition module, used to acquire the set of the target difference items to obtain the optimized comparison result between the first document and the second document; wherein the semantic comparison result includes semantic similarity, and determining target difference items from each target segment based on the semantic comparison results of each target segment with several preset tags includes: obtaining the semantic similarity between each character in each target segment and each preset tag; for each preset tag, determining the first character in each target segment whose semantic similarity to the preset tag is greater than or equal to a preset semantic similarity as the first character of the target difference item corresponding to the preset tag, and determining the last character in each target segment whose semantic similarity to the preset tag is greater than or equal to the preset semantic similarity as the last character of the target difference item corresponding to the preset tag; in each target segment, taking the first character, the last character, and the characters between the first character and the last character as a candidate target difference item corresponding to the preset tag; and taking the candidate target difference items that contain an initial difference item as the target difference items corresponding to the preset tag.

7. An electronic device, characterized in that it includes a memory and a processor, the processor being configured to execute program instructions stored in the memory to implement the method according to any one of claims 1 to 5.

8. A computer-readable storage medium having program instructions stored thereon, characterized in that, when the program instructions are executed by a processor, they implement the method according to any one of claims 1 to 5.