Text sequence alignment method and device, electronic equipment and computer readable medium
By generating a comparison identifier pair matrix variable and a text relationship array, segment-to-segment text sequence alignment is achieved. This addresses the accuracy problem of point-to-point comparison and improves both the accuracy of the comparison results and the efficiency of comparison.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- PAI TECH CO LTD
- Filing Date
- 2022-04-08
- Publication Date
- 2026-06-30
AI Technical Summary
Existing point-to-point text sequence alignment methods result in poor accuracy of the alignment results, and users need to spend time searching for non-corresponding text objects.
By generating a comparison identifier pair matrix variable, the comparison identifier pair groups of the text information to be compared and the initial text information are determined. The comparison identifier pairs that meet the preset conditions are selected according to the text relationship array, and the pairs are aligned and displayed in a segment-to-segment manner on the display device.
It improves the accuracy of text sequence alignment results, saves users time in finding non-corresponding text objects, and increases alignment efficiency.
Figure CN116933758B_ABST
Description
Technical Field
[0001] Embodiments of this disclosure relate to the field of computer technology, and more particularly to text sequence alignment methods, apparatuses, electronic devices, and computer-readable media.
Background Technology
[0002] Text sequence alignment is the process of arranging two or more text sequences according to certain rules to determine their similarity or even homology. Currently, the common method for aligning text sequences is to arrange them using a point-to-point alignment method.
[0003] However, when comparing text sequences using the above method, the following technical problems often arise:
[0004] First, in point-to-point alignment, each alignment involves only two point-level text objects. Point-level text objects cannot represent the local context, resulting in poor accuracy of text sequence alignment results.
[0005] Second, when the text sequences are not arranged in one-to-one correspondence, users must spend considerable time searching for the text object that aligns with a given text object, which wastes user time.
Summary of the Invention
[0006] The summary portion of this disclosure is intended to provide a brief overview of the concepts, which will be described in detail in the detailed description portion. This summary portion is not intended to identify key or essential features of the claimed technical solutions, nor is it intended to limit the scope of the claimed technical solutions.
[0007] Some embodiments of this disclosure provide text sequence alignment methods, apparatuses, electronic devices, and computer-readable media to address one or more of the technical problems mentioned in the background section above.
[0008] In a first aspect, some embodiments of this disclosure provide a text sequence alignment method, the method comprising: generating a comparison identifier pair matrix variable based on an initial text information quantity and a quantity of text information to be compared, wherein the initial text information quantity is the quantity of initial text information included in an initial text information sequence, and the quantity of text information to be compared is the quantity of text information to be compared included in a text information sequence to be compared; for a first identifier of each piece of text information to be compared in the text information sequence to be compared, and a second identifier of each piece of initial text information in the initial text information sequence, performing the following steps: determining a group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information; generating, based on the text information sequence to be compared and the initial text information sequence, a text association coefficient corresponding to each identifier pair to be compared in the group of identifier pairs to be compared, to obtain a text relationship array; selecting, based on the text relationship array, the identifier pair to be compared that meets a preset association coefficient condition from the group of identifier pairs to be compared as a comparison identifier pair; updating the comparison identifier pair into the comparison identifier pair matrix variable according to the first identifier and the second identifier; generating a text alignment information group based on the updated comparison identifier pair matrix variable, wherein each piece of text alignment information in the text alignment information group includes a text information segment to be compared and the initial text information segment corresponding to it; and displaying, in alignment on an associated display device, the text information segment to be compared and the initial text information segment included in each piece of text alignment information.
[0009] In a second aspect, some embodiments of this disclosure provide a text sequence alignment apparatus, comprising: a first generation unit configured to generate a comparison identifier pair matrix variable based on an initial text information quantity and a quantity of text information to be compared, wherein the initial text information quantity is the quantity of initial text information included in an initial text information sequence, and the quantity of text information to be compared is the quantity of text information to be compared included in a text information sequence to be compared; an execution unit configured to, for a first identifier of each piece of text information to be compared in the text information sequence to be compared, and a second identifier of each piece of initial text information in the initial text information sequence, perform the following steps: determining a group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information; generating, based on the text information sequence to be compared and the initial text information sequence, a text association coefficient corresponding to each identifier pair to be compared in the group of identifier pairs to be compared, to obtain a text relationship array; selecting, based on the text relationship array, the identifier pair to be compared that meets a preset association coefficient condition from the group of identifier pairs to be compared as a comparison identifier pair; and updating the comparison identifier pair into the comparison identifier pair matrix variable according to the first identifier and the second identifier; a second generation unit configured to generate a text alignment information group based on the updated comparison identifier pair matrix variable, wherein each piece of text alignment information in the text alignment information group includes a text information segment to be compared and the initial text information segment corresponding to it; and a display unit configured to display, in alignment on an associated display device, the text information segment to be compared and the initial text information segment included in each piece of text alignment information in the text alignment information group.
[0010] In a third aspect, some embodiments of this disclosure provide an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect above.
[0011] In a fourth aspect, some embodiments of this disclosure provide a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any implementation of the first aspect above.
[0012] The above embodiments of this disclosure have the following beneficial effects: the text sequence alignment method of some embodiments of this disclosure improves the accuracy of text sequence alignment results. Specifically, alignment results are inaccurate because, in the point-to-point alignment method, each alignment involves only two point-level text objects, and point-level text objects cannot represent the local context. Based on this, the text sequence alignment method of some embodiments of this disclosure first generates a comparison identifier pair matrix variable based on the initial text information quantity and the quantity of text information to be compared. Here, the initial text information quantity is the number of pieces of initial text information included in the initial text information sequence, and the quantity of text information to be compared is the number of pieces of text information to be compared included in the text information sequence to be compared. The comparison identifier pair matrix variable can thus be used to store the comparison identifier pairs used for sequence alignment. Then, for the first identifier of each piece of text information to be compared in the text information sequence to be compared, and the second identifier of each piece of initial text information in the initial text information sequence, the following steps are performed. The first step determines the group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information. The second step generates, based on the text information sequence to be compared and the initial text information sequence, the text association coefficient corresponding to each identifier pair to be compared in the group, thus obtaining a text relationship array. This text relationship array serves as the basis for determining the comparison identifier pair.
The third step selects, based on the text relationship array, the identifier pair to be compared that satisfies the preset association coefficient condition from the group of identifier pairs to be compared as the comparison identifier pair. The fourth step updates the comparison identifier pair into the comparison identifier pair matrix variable according to the first identifier and the second identifier, so that the determined comparison identifier pair is added to the comparison identifier pair matrix variable. Then, based on the updated comparison identifier pair matrix variable, a text alignment information group is generated. Each piece of text alignment information in this group includes a text information segment to be compared and the corresponding initial text information segment, so the two segments in each piece of text alignment information can be compared in a segment-to-segment manner. Finally, the text information segment to be compared and the initial text information segment included in each piece of text alignment information are displayed in alignment on the associated display device. The displayed segments therefore appear aligned, which allows text sequences to be compared segment to segment. Furthermore, because each alignment in the segment-to-segment method involves two segment-level text objects, and segment-level text objects can represent the local context, the accuracy of the text sequence comparison results is improved.
Attached Figure Description
[0013] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and elements are not necessarily drawn to scale.
[0014] Figure 1 is a flowchart of some embodiments of the text sequence alignment method according to the present disclosure;
[0015] Figure 2 is a schematic structural diagram of some embodiments of the text sequence alignment apparatus according to the present disclosure;
[0016] Figure 3 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
Detailed Implementation
[0017] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.
[0018] It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings. Unless otherwise specified, the embodiments and features described in this disclosure can be combined with each other.
[0019] It should be noted that the concepts of "first" and "second" mentioned in this disclosure are used only to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or their interdependencies.
[0020] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".
[0021] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
[0022] This disclosure will now be described in detail with reference to the accompanying drawings and embodiments.
[0023] Figure 1 shows a flow 100 of some embodiments of a text sequence alignment method according to this disclosure. The text sequence alignment method includes the following steps:
[0024] Step 101: Generate a comparison identifier pair matrix variable based on the quantity of initial text information and the quantity of text information to be compared.
[0025] In some embodiments, the execution entity (e.g., a client) of the text sequence alignment method can generate a comparison identifier pair matrix variable based on the initial text information quantity and the quantity of text information to be compared. Here, the initial text information quantity is the number of pieces of initial text information included in the initial text information sequence, and the quantity of text information to be compared is the number of pieces of text information included in the text information sequence to be compared. The text information sequence to be compared can be a sequence of text currently input by the user. The initial text information sequence can be a sequence of text input by the user at a historical time. For example, the text information sequence to be compared can be x = [a, b, d, e, fgh], in which case the quantity of text information to be compared is 5. The initial text information sequence can be y = [b, c, de, f, g, h], in which case the quantity of initial text information is 6.
[0026] In practice, the executing entity can set the initial value of the comparison identifier pair matrix variable to a zero-identifier-pair matrix whose number of columns is the quantity of text information to be compared plus 1 and whose number of rows is the quantity of initial text information plus 1. The zero-identifier-pair matrix is a matrix in which every element is (0,0). In other words, the comparison identifier pair matrix variable can be regarded as a matrix variable whose initial value is a zero matrix. For the example above, the comparison identifier pair matrix variable is a 7×6 zero-identifier-pair matrix.
[0027] Optionally, before step 101, the executing entity may first receive the text information sequence to be compared. Then, in response to determining that the text information sequence to be compared is empty, it may control a communicatively connected audio playback device to play a preset re-entry prompt tone. The audio playback device may be a speaker built into the executing entity. The preset re-entry prompt tone prompts the user to re-enter the text information sequence to be compared. For example, it may be an audio message saying "Please re-enter the text to be compared." Thus, when the received text information sequence to be compared is empty, the user is prompted to enter it again.
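As an illustration of paragraph [0026], the following minimal Python sketch builds the zero-identifier-pair matrix for the example sequences x and y. The function name `init_pair_matrix` and the nested-list representation are assumptions made for this example; the disclosure does not prescribe a data structure.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def init_pair_matrix(num_to_compare: int, num_initial: int) -> List[List[Pair]]:
    """Return a matrix of (0, 0) pairs (hypothetical helper).

    Rows:    quantity of initial text information + 1
    Columns: quantity of text information to be compared + 1
    """
    rows = num_initial + 1
    cols = num_to_compare + 1
    return [[(0, 0) for _ in range(cols)] for _ in range(rows)]


# Example sequences from paragraph [0025].
x = ["a", "b", "d", "e", "fgh"]          # text information to be compared
y = ["b", "c", "de", "f", "g", "h"]      # initial text information

matrix = init_pair_matrix(len(x), len(y))
print(len(matrix), len(matrix[0]))       # number of rows, number of columns
```

For the example sequences this yields 7 rows and 6 columns, matching the 7×6 matrix described above.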
[0028] Step 102: For the first identifier of each text information to be compared in the text information sequence to be compared, and the second identifier of each initial text information in the initial text information sequence, perform the following steps:
[0029] Step 1021: Determine the group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information.
[0030] In some embodiments, the executing entity may determine the group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information in various ways. The first identifier may be the sequence number of the text information to be compared within the text information sequence to be compared, counted from 1. The second identifier may be the sequence number of the initial text information within the initial text information sequence, counted from 1.
[0031] In practice, the executing entity can first determine the difference between the first identifier and 1 as the first identifier to be compared, and then determine the difference between the second identifier and 1 as the second identifier to be compared. Next, the first identifier to be compared and the second identifier are combined, in that order, into one identifier pair to be compared. Then, the first identifier and the second identifier to be compared are combined, in that order, into another identifier pair to be compared. Finally, the resulting identifier pairs to be compared are combined into a group of identifier pairs to be compared.
[0032] Optionally, before sequentially combining the combined pairs of identifiers to be compared into a group of identifier pairs to be compared, the aforementioned execution entity may perform the following steps:
[0033] The first step is to generate a target first identifier group based on the in-segment quantity threshold for text to be compared. This threshold is the maximum number of pieces of text information to be compared that each segment-level text object of the text information sequence to be compared may include. A segment-level text object is a text object composed of at least one piece of text information. The threshold can be set by the user. In practice, the executing entity can select every integer greater than or equal to 0 in the range from the difference between the first identifier and the threshold to the difference between the first identifier and 1 as a target first identifier, thus obtaining the target first identifier group. Each selected target first identifier is therefore greater than or equal to 0 and less than or equal to the difference between the first identifier and 1.
[0034] The second step is to generate a target second identifier group based on the in-segment quantity threshold for initial text. This threshold is the maximum number of pieces of initial text information that each segment-level text object of the initial text information sequence may include. The threshold can be set by the user. In practice, the executing entity can select every integer greater than or equal to 0 in the range from the difference between the second identifier and the threshold to the difference between the second identifier and 1 as a target second identifier, thus obtaining the target second identifier group. Each selected target second identifier is therefore greater than or equal to 0 and less than or equal to the difference between the second identifier and 1.
[0035] The third step is to combine each target first identifier in the target first identifier group with each target second identifier in the target second identifier group to form an identifier pair to be compared.
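The construction of the group of identifier pairs to be compared in paragraphs [0031] to [0035] can be sketched as follows. This is only one reading of the text: it assumes that the two point-level pairs of paragraph [0031] are (first identifier − 1, second identifier) and (first identifier, second identifier − 1), and the names `candidate_pairs`, `max_seg_x`, and `max_seg_y` are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def candidate_pairs(i: int, j: int, max_seg_x: int, max_seg_y: int) -> List[Pair]:
    """Build the group of identifier pairs to be compared for identifiers (i, j).

    i, j       -- 1-based first and second identifiers
    max_seg_x  -- in-segment quantity threshold for text to be compared
    max_seg_y  -- in-segment quantity threshold for initial text
    """
    # Point-level pairs (assumed reading of paragraph [0031]).
    pairs: List[Pair] = [(i - 1, j), (i, j - 1)]

    # Target first identifiers: integers >= 0 in [i - max_seg_x, i - 1].
    target_first = range(max(0, i - max_seg_x), i)
    # Target second identifiers: integers >= 0 in [j - max_seg_y, j - 1].
    target_second = range(max(0, j - max_seg_y), j)

    # Every combination of a target first and a target second identifier.
    pairs.extend((a, b) for a in target_first for b in target_second)
    return pairs


group = candidate_pairs(2, 3, max_seg_x=2, max_seg_y=2)
print(group)
```

Clipping the lower bound at 0 implements the requirement that every selected target identifier be greater than or equal to 0.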
[0036] Step 1022: Based on the text information sequence to be compared and the initial text information sequence, generate the text association coefficient corresponding to each identifier pair to be compared in the group of identifier pairs to be compared, and obtain the text relationship array.
[0037] In some embodiments, the execution entity can generate the text association coefficient corresponding to each identifier pair to be compared in the group of identifier pairs to be compared, based on the text information sequence to be compared and the initial text information sequence, thereby obtaining a text relationship array. In practice, the execution entity can generate the text similarity corresponding to an identifier pair to be compared and use it as the text association coefficient.
[0038] Specifically, in response to the identifier pair to be compared being a combination of the first identifier and the second identifier, the executing entity may determine the first identifier in the identifier pair to be compared as a first quantity, and the second identifier in the identifier pair to be compared as a second quantity. Then, the piece of text information to be compared at the position given by the first quantity in the text information sequence to be compared may be determined as the first text information to be compared, and the piece of initial text information at the position given by the second quantity in the initial text information sequence may be determined as the second text information to be compared. Afterwards, the text similarity between the first text information to be compared and the second text information to be compared may be determined as the text association coefficient. For example, the text similarity may be a Jaccard similarity.
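Paragraph [0038] names Jaccard similarity as one possible text similarity. A minimal sketch follows; treating each text as a set of characters is an assumption of this example (the disclosure does not say how the text is tokenized), as is returning 0.0 when both texts are empty.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of two texts, each viewed as a set of characters."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty texts are given similarity 0.0.
        return 0.0
    return len(set_a & set_b) / len(union)


print(jaccard_similarity("abc", "bcd"))  # 2 shared characters out of 4 distinct
```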
[0039] In some optional implementations of certain embodiments, in response to the fact that the identification pair to be compared is a combination of the target first identification number in the target first identification number group and the target second identification number in the target second identification number group, the following steps are performed:
[0040] The first step is to determine the text information to be compared that corresponds to the target first identifier in the text information sequence to be compared as the target text information to be compared. Specifically, the target text information to be compared is the piece of text information to be compared located at the position given by the target first identifier in the text information sequence to be compared.
[0041] The second step is to determine the initial text information that corresponds to the target second identifier in the initial text information sequence as the target initial text information. Specifically, the target initial text information is the piece of initial text information located at the position given by the target second identifier in the initial text information sequence.
[0042] The third step is to aggregate the pieces of text information to be compared, from the position given by the sum of the target first identifier and 1 up to the position given by the first identifier in the text information sequence to be compared, into a first aggregated text information segment to be compared. In practice, the aggregation can consist of combining the pieces of text information together in order.
[0043] The fourth step is to aggregate the pieces of initial text information, from the position given by the sum of the target second identifier and 1 up to the position given by the second identifier in the initial text information sequence, into a first aggregated initial text information segment.
[0044] The fifth step is to determine the text similarity between the first aggregated text information segment to be compared and the first aggregated initial text information segment as the aggregated segment similarity.
[0045] The sixth step is to determine the text correlation coefficient by summing the text similarity between the target text information to be compared and the target initial text information and the aggregated segment similarity.
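The six steps of paragraphs [0040] to [0045] can be sketched as below. Several points are assumptions of this example rather than statements of the disclosure: character-set Jaccard similarity is used as the text similarity, aggregation is plain concatenation, the aggregated range runs from the target identifier plus 1 up to the current identifier, and a target identifier of 0 (which has no corresponding text in a 1-based sequence) is treated as an empty string. The name `segment_coefficient` is hypothetical.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Character-set Jaccard similarity (assumed similarity measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def segment_coefficient(x: List[str], y: List[str],
                        i: int, j: int, a: int, b: int) -> float:
    """Text association coefficient for the target pair (a, b) at identifiers (i, j).

    x, y are the sequences to be compared / initial; i, j are 1-based identifiers;
    0 <= a < i and 0 <= b < j are the target first and second identifiers.
    """
    # Steps 1-2: texts located at the target identifiers (empty if the target is 0).
    target_x = x[a - 1] if a >= 1 else ""
    target_y = y[b - 1] if b >= 1 else ""
    # Steps 3-4: aggregate positions a+1..i of x and b+1..j of y (1-based).
    seg_x = "".join(x[a:i])
    seg_y = "".join(y[b:j])
    # Step 5: similarity of the aggregated segments.
    seg_sim = jaccard(seg_x, seg_y)
    # Step 6: sum of the point similarity and the aggregated segment similarity.
    return jaccard(target_x, target_y) + seg_sim


x = ["a", "b", "d", "e", "fgh"]
y = ["b", "c", "de", "f", "g", "h"]
print(segment_coefficient(x, y, i=4, j=3, a=2, b=2))  # segments "de" vs "de"
```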
[0046] Optionally, the executing entity may also generate a text information distance as a text association coefficient for each pair of identifiers in the pair of identifiers to be compared, based on the text information sequence to be compared and the initial text information sequence. The text information distance may be Euclidean distance or cosine distance.
[0047] Step 1023: Based on the text relationship array, select the identifier pair to be compared that meets the preset association coefficient condition from the group of identifier pairs to be compared as the comparison identifier pair.
[0048] In some embodiments, the execution entity may select, based on the text relationship array, the identifier pair to be compared that meets a preset association coefficient condition from the group of identifier pairs to be compared as the comparison identifier pair. When the text association coefficients in the text relationship array are text similarities, the preset association coefficient condition can be "the corresponding text association coefficient is the largest".
[0049] Optionally, when the text association coefficients represent text information distances, the executing entity may select the identifier pair to be compared with the smallest text association coefficient from the group of identifier pairs to be compared as the comparison identifier pair.
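The selection rule of paragraphs [0048] and [0049] amounts to an argmax over similarities or an argmin over distances. A minimal sketch follows, in which `select_pair` and the `use_distance` flag are hypothetical names and ties are broken in favor of the earliest pair (a choice the disclosure does not specify).

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def select_pair(pairs: Sequence[Pair], coefficients: Sequence[float],
                use_distance: bool = False) -> Pair:
    """Pick the comparison identifier pair from the group of candidate pairs.

    coefficients[k] is the text association coefficient of pairs[k].
    Similarity -> largest coefficient wins; distance -> smallest wins.
    """
    if not pairs or len(pairs) != len(coefficients):
        raise ValueError("pairs and coefficients must be non-empty and equal in length")
    choose = min if use_distance else max
    best = choose(range(len(pairs)), key=lambda k: coefficients[k])
    return pairs[best]


group: List[Pair] = [(1, 3), (2, 2), (1, 2)]
relation_array = [0.2, 0.9, 0.5]
print(select_pair(group, relation_array))                     # similarity case
print(select_pair(group, relation_array, use_distance=True))  # distance case
```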
[0050] Step 1024: Update the alignment identifier pair to the alignment identifier pair matrix variable according to the first identifier and the second identifier.
[0051] In some embodiments, the executing entity can update the comparison identifier pair into the comparison identifier pair matrix variable based on the first identifier and the second identifier. In practice, the executing entity can first determine the sum of the first identifier and 1 as the column number, and then determine the sum of the second identifier and 1 as the row number. Afterward, it can write the comparison identifier pair into the comparison identifier pair matrix variable at the position given by that row number and column number. The update can be performed by replacement.
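The update of paragraph [0051] can be sketched as follows, assuming the row and column numbers are counted from 1 so that they map to zero-based list indices by subtracting 1. The helper name `update_matrix` and the nested-list matrix are assumptions of this example.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def update_matrix(matrix: List[List[Pair]], first_id: int, second_id: int,
                  pair: Pair) -> None:
    """Write the comparison identifier pair into the matrix by replacement."""
    column_number = first_id + 1   # 1-based column number
    row_number = second_id + 1     # 1-based row number
    matrix[row_number - 1][column_number - 1] = pair


# 7x6 zero-identifier-pair matrix for the example sequences (6 initial, 5 to compare).
matrix = [[(0, 0) for _ in range(6)] for _ in range(7)]
update_matrix(matrix, first_id=4, second_id=3, pair=(2, 2))
print(matrix[3][4])
```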
[0052] Step 103: Generate a text alignment information group based on the updated comparison identifier pair matrix variable.
[0053] In some embodiments, the execution entity can generate a text alignment information group based on the updated comparison identifier pair matrix variable. Each piece of text alignment information in the text alignment information group includes a text information segment to be compared and the initial text information segment corresponding to it. A text information segment to be compared consists of at least one consecutively arranged piece of text information to be compared. An initial text information segment consists of at least one consecutively arranged piece of initial text information. In practice, the execution entity can first, in response to both the first sequence number variable and the second sequence number variable being greater than zero, perform the following comparison identifier pair array generation steps:
[0054] The first step is to determine the first identifier of the comparison identifier pair stored in the comparison identifier pair matrix variable at the position corresponding to the first and second sequence number variables as the first loop identifier variable, and to determine the second identifier of that comparison identifier pair as the second loop identifier variable. The first sequence number variable is a numeric variable whose initial value is the quantity of text information to be compared. The second sequence number variable is a numeric variable whose initial value is the quantity of initial text information.
[0055] The second step is to determine the difference between the first sequence number variable and the first loop identifier variable as the first variable difference.
[0056] The third step is to determine the difference between the second sequence number variable and the second loop identifier variable as the second variable difference.
[0057] The fourth step is, in response to determining that the sum of the first variable difference and the second variable difference is greater than 1, to aggregate the pieces of text information to be compared, from the position given by the sum of the first loop identifier variable and 1 up to the position given by the first sequence number variable in the text information sequence to be compared, into a second aggregated text information segment to be compared, and to aggregate the pieces of initial text information, from the position given by the sum of the second loop identifier variable and 1 up to the position given by the second sequence number variable in the initial text information sequence, into a second aggregated initial text information segment.
[0058] The fifth step is to combine the second aggregated text information segment to be compared and the second aggregated initial text information segment into text alignment information. This combination can be achieved through splicing.
[0059] Then, the first sequence number variable is updated to the first loop identifier variable, and the second sequence number variable is updated to the second loop identifier variable. In response to both the updated first sequence number variable and the updated second sequence number variable being greater than zero, the above comparison identifier pair array generation steps are executed again.
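The loop of paragraphs [0053] to [0059] is a backward walk through the stored identifier pairs. The sketch below is one reading of it; `build_alignment` is a hypothetical name, segments are kept as lists rather than spliced strings, and the matrix contents are filled in by hand purely for illustration (they are not the output of any run of the method).

```python
from typing import List, Tuple

Pair = Tuple[int, int]
Alignment = Tuple[List[str], List[str]]


def build_alignment(matrix: List[List[Pair]], x: List[str], y: List[str]) -> List[Alignment]:
    """Walk back through the matrix and collect (segment of x, segment of y) pairs."""
    i, j = len(x), len(y)            # first and second sequence number variables
    group: List[Alignment] = []
    while i > 0 and j > 0:
        loop_i, loop_j = matrix[j][i]            # stored pair: row j+1, column i+1 (1-based)
        diff_i, diff_j = i - loop_i, j - loop_j  # first and second variable differences
        if diff_i + diff_j > 1:
            # Positions loop_i+1..i of x and loop_j+1..j of y (1-based).
            group.append((x[loop_i:i], y[loop_j:j]))
        i, j = loop_i, loop_j
    return group                     # collected from the end of the sequences backward


x = ["a", "b", "d", "e", "fgh"]
y = ["b", "c", "de", "f", "g", "h"]
matrix = [[(0, 0) for _ in range(len(x) + 1)] for _ in range(len(y) + 1)]
# Hand-written illustrative entries, indexed as matrix[second_id][first_id].
matrix[6][5] = (4, 3)
matrix[3][4] = (2, 2)
matrix[2][2] = (2, 1)
matrix[1][2] = (1, 0)

for seg_x, seg_y in build_alignment(matrix, x, y):
    print(seg_x, "<->", seg_y)
```

With these entries the walk pairs "fgh" with "f, g, h", then "d, e" with "de", then "b" with "b". The step from (2, 2) to (2, 1) has a difference sum of 1 and therefore produces no alignment information.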
[0060] Step 104: Display, in alignment on the associated display device, the text information segment to be compared and the initial text information segment included in each piece of text alignment information in the text alignment information group.
[0061] In some embodiments, the execution entity can display, in alignment on the associated display device, the text information segment to be compared and the initial text information segment included in each piece of text alignment information in the text alignment information group. In practice, the execution entity can render the text information segment to be compared and the initial text information segment included in a piece of text alignment information on the display device according to a rendering template corresponding to that text alignment information, so that the two segments share the same rendering style. The rendering template is a style template used for rendering and may include a font background color. In that case, the execution entity can render both segments on the display device with that font background color, so that the rendered text information segment to be compared and the rendered initial text information segment have the same font background color.
[0062] Optionally, the executing entity can first sort the text alignment information in the text alignment information group according to the order, within the initial text information sequence, of the initial text information corresponding to each piece of text alignment information, thus obtaining a text alignment information sequence. This ensures that the initial text information appears in the text alignment information sequence in the same order as in the initial text information sequence. Then, for each piece of text alignment information in the text alignment information sequence, the text information segment to be compared and the initial text information segment it includes can be displayed in the target interface area. The target interface area is the area of the current interface used to display the text information segment to be compared and the initial text information segment of one piece of text alignment information. Specifically, the text information segment to be compared can be displayed in the first area of the target interface area, and the initial text information segment can be displayed in the second area of the target interface area. The first area can be the right-hand side of the target interface area. The second area can be the left-hand side of the target interface area.
[0063] Optionally, the executing entity may display the text that is identical in the text information segment to be compared and the initial text information segment in the first area and the second area using a first preset color. For example, the first preset color may be black. Text that the text information segment to be compared adds relative to the initial text information segment may be displayed in the first area using a second preset color. For example, the second preset color may be green. Text that is missing from the text information segment to be compared relative to the initial text information segment may be displayed in the second area using a third preset color. For example, the third preset color may be red.
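One possible way to derive the three color classes is sketched below. The disclosure does not state how identical, added, and missing text are detected; this sketch assumes a character-level diff using Python's standard `difflib.SequenceMatcher`, and the function name `color_spans` is hypothetical.

```python
from difflib import SequenceMatcher

# Example colors taken from the paragraph above.
FIRST_COLOR, SECOND_COLOR, THIRD_COLOR = "black", "green", "red"


def color_spans(initial_segment, compared_segment):
    """Return (first_area, second_area) as lists of (text, color) spans.

    first_area holds the segment to be compared, second_area the initial
    segment. The diff strategy is an assumption, not the patented method.
    """
    first_area, second_area = [], []
    matcher = SequenceMatcher(None, initial_segment, compared_segment, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical text is shown in both areas in the first color.
            first_area.append((compared_segment[j1:j2], FIRST_COLOR))
            second_area.append((initial_segment[i1:i2], FIRST_COLOR))
            continue
        if j2 > j1:
            # Text present only in the segment to be compared: added text.
            first_area.append((compared_segment[j1:j2], SECOND_COLOR))
        if i2 > i1:
            # Text present only in the initial segment: missing text.
            second_area.append((initial_segment[i1:i2], THIRD_COLOR))
    return first_area, second_area


first, second = color_spans("the quick fox", "the quick brown fox")
print(first)
print(second)
```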
[0064] Optionally, the execution entity may also perform the following steps. First, in response to detecting a user's downward swipe operation on the first area, where the text information segment to be compared displayed in the first area has slid out of the first area and is not the last text information segment to be compared in the text alignment information sequence, the execution entity displays the next text information segment to be compared in the text alignment information sequence in the first area, and displays the initial text information segment corresponding to that next segment in the second area. The downward swipe operation can be an operation of swiping down to view subsequent content. In this case, the text information segment to be compared displayed in the first area slides out through the top of the first area. This allows the user to swipe down to view the next text information segment to be compared and simultaneously view the corresponding initial text information segment. Second, in response to detecting a user's downward swipe operation on the second area, where the initial text information segment displayed in the second area has slid out of the second area and is not the last initial text information segment in the text alignment information sequence, the execution entity displays the next initial text information segment in the text alignment information sequence in the second area, and displays the text information segment to be compared corresponding to that next initial text information segment in the first area. This allows the user to swipe down to view the next initial text information segment and simultaneously view the corresponding text information segment to be compared.
[0065] Optionally, the execution entity may also perform the following steps. First, in response to detecting a user's upward swipe operation on the first area, where the text information segment to be compared displayed in the first area has slid out of the first area and is not the first text information segment to be compared in the text alignment information sequence, the execution entity displays the previous text information segment to be compared in the text alignment information sequence in the first area, and displays the initial text information segment corresponding to that previous segment in the second area. The upward swipe operation can be an operation of swiping up to view previously displayed content. In this case, the text information segment to be compared displayed in the first area slides out through the bottom of the first area. This allows the user to swipe up to view the previous text information segment to be compared and simultaneously view the corresponding initial text information segment. Second, in response to detecting a user's upward swipe operation on the second area, where the initial text information segment displayed in the second area has slid out of the second area and is not the first initial text information segment in the text alignment information sequence, the execution entity displays the previous initial text information segment in the text alignment information sequence in the second area, and displays the text information segment to be compared corresponding to that previous initial text information segment in the first area. This allows the user to swipe up to view the previous initial text information segment and simultaneously view the corresponding text information segment to be compared.
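The downward and upward swipe handling in the two paragraphs above amounts to moving one shared position through the text alignment information sequence, so that both areas always show a matching pair. A minimal sketch follows. The class `AlignedViewer`, its methods, and the `slid_out` flag are hypothetical names chosen for illustration, and gesture detection itself is assumed to be done elsewhere.

```python
class AlignedViewer:
    """Keeps the first and second areas on the same text alignment entry."""

    def __init__(self, alignment_sequence):
        # Each entry is a (segment_to_be_compared, initial_segment) tuple.
        self.sequence = list(alignment_sequence)
        self.index = 0

    def visible(self):
        compared, initial = self.sequence[self.index]
        return {"first_area": compared, "second_area": initial}

    def swipe_down(self, slid_out=True):
        """Downward swipe on either area: advance unless already at the last entry."""
        if slid_out and self.index < len(self.sequence) - 1:
            self.index += 1
        return self.visible()

    def swipe_up(self, slid_out=True):
        """Upward swipe on either area: go back unless already at the first entry."""
        if slid_out and self.index > 0:
            self.index -= 1
        return self.visible()


viewer = AlignedViewer([("compared 1", "initial 1"), ("compared 2", "initial 2")])
print(viewer.swipe_down())
print(viewer.swipe_up())
```

Because both areas read from the same index, a swipe on either area updates the pair together, which is the effect the paragraphs describe.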
[0066] The above content, as an inventive point of this disclosure, solves the second technical problem mentioned in the background art: "When the various text sequences are not arranged in a one-to-one correspondence, the user needs to spend a long time searching for another text object aligned with a text object, resulting in a waste of user time." The factor that leads to this waste of user time is that, when the text sequences are not arranged in one-to-one correspondence, the user must search at length for the text object aligned with a given text object. Addressing this factor saves user time. To achieve this effect, this disclosure displays the corresponding text information segment to be compared or initial text information segment when the user swipes up or down. This allows the user to swipe down to view the next text information segment to be compared while simultaneously viewing the corresponding initial text information segment. It also allows the user to swipe up to view the previous text information segment to be compared while simultaneously viewing the corresponding initial text information segment. Therefore, users do not need to spend a long time searching for the text object aligned with a given text object, which saves user time.
[0067] Optionally, after step 101, the executing entity can generate a text correlation coefficient matrix variable based on the quantity of text information to be compared and the initial text information quantity. In practice, the executing entity can set the initial value of the text correlation coefficient matrix variable to a zero matrix whose number of columns is the quantity of text information to be compared plus 1 and whose number of rows is the initial text information quantity plus 1.
[0068] Optionally, after step 1024, the executing entity can also update the text correlation coefficient corresponding to the comparison identifier pair to the text correlation coefficient matrix variable based on the first identifier and the second identifier. In practice, the executing entity can write the text correlation coefficient to the position of the text correlation coefficient matrix variable located at the column given by the first identifier and the row given by the second identifier. The update can be performed by replacement.
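The initialization and update of the text correlation coefficient matrix variable can be sketched as below. The sketch assumes that the first identifier selects the column and the second identifier selects the row, which is one reading of the paragraph above; the function names are hypothetical.

```python
def new_coefficient_matrix(num_compared, num_initial):
    """Zero matrix with (num_initial + 1) rows and (num_compared + 1) columns."""
    return [[0.0] * (num_compared + 1) for _ in range(num_initial + 1)]


def update_coefficient(matrix, first_id, second_id, coefficient):
    """Replace the entry at row second_id, column first_id (assumed indexing)."""
    matrix[second_id][first_id] = coefficient


# Three pieces of text information to be compared, two pieces of initial text.
matrix = new_coefficient_matrix(num_compared=3, num_initial=2)
update_coefficient(matrix, first_id=2, second_id=1, coefficient=0.5)
for row in matrix:
    print(row)
```

Each row is built as a separate list so that replacing one entry does not affect other rows.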
[0069] The above embodiments of this disclosure have the following beneficial effects: the text sequence alignment method of some embodiments of this disclosure improves the accuracy of text sequence alignment results. Specifically, the accuracy of text sequence alignment results is poor because, in the point-to-point alignment method, each alignment involves only two point-level text objects, and point-level text objects cannot represent the local context. Based on this, the text sequence alignment method of some embodiments of this disclosure first generates a comparison identifier pair matrix variable based on the initial text information quantity and the quantity of text information to be compared. The initial text information quantity is the number of pieces of initial text information included in the initial text information sequence, and the quantity of text information to be compared is the number of pieces of text information to be compared included in the text information sequence to be compared. Thus, the comparison identifier pair matrix variable can be used to store the comparison identifier pairs for sequence alignment. Then, for the first identifier of each piece of text information to be compared in the text information sequence to be compared, and the second identifier of each piece of initial text information in the initial text information sequence, the following steps are performed. In the first step, the group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information is determined. In the second step, based on the text information sequence to be compared and the initial text information sequence, a text correlation coefficient is generated for each identifier pair in the group of identifier pairs to be compared, thus obtaining a text relationship array. This text relationship array can be used as the basis for determining the comparison identifier pair. In the third step, based on the text relationship array, the identifier pair that satisfies the preset correlation coefficient condition is selected from the group of identifier pairs to be compared as the comparison identifier pair. In the fourth step, the comparison identifier pair is updated to the comparison identifier pair matrix variable based on the first identifier and the second identifier. This allows the determined comparison identifier pair to be added to the comparison identifier pair matrix variable. Then, based on the updated comparison identifier pair matrix variable, a text alignment information group is generated. Each piece of text alignment information in this group includes a text information segment to be compared and the corresponding initial text information segment. Therefore, the text information segment to be compared and the corresponding initial text information segment in each piece of text alignment information can be compared in a segment-to-segment alignment manner. Finally, the text information segment to be compared and the initial text information segment included in each piece of text alignment information in the text alignment information group are aligned and displayed on the associated display device. Thus, the displayed text information segments to be compared and initial text information segments present an aligned display effect, allowing text sequences to be compared in a segment-to-segment alignment manner. Furthermore, because each alignment in the segment-to-segment alignment method involves at least two point-level text objects grouped into segment-level text objects, and segment-level text objects can represent the local context, the accuracy of the text sequence comparison results is improved.
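The second to fourth steps summarized above can be illustrated with a small sketch. Several points are assumptions rather than statements of this disclosure: the text correlation coefficient is taken to be a character-level edit distance (the claims mention a text information distance without fixing the metric here), the preset condition is taken to be "smallest coefficient", identifiers are treated as zero-based list indices, and the group of identifier pairs to be compared is supplied by the caller because its construction is not described in this section. All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, used here as an assumed text information distance."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        cur = [i]
        for j, char_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                        # deletion
                           cur[j - 1] + 1,                     # insertion
                           prev[j - 1] + (char_a != char_b)))  # substitution
        prev = cur
    return prev[-1]


def select_comparison_pair(candidate_pairs, compared, initial):
    """Build the text relationship array and pick the pair with the smallest coefficient."""
    relation_array = [edit_distance(compared[first], initial[second])
                      for first, second in candidate_pairs]
    best = min(range(len(candidate_pairs)), key=relation_array.__getitem__)
    return candidate_pairs[best], relation_array


def update_pair_matrix(pair_matrix, first_id, second_id, pair):
    """Store the selected pair at row second_id, column first_id (assumed layout)."""
    pair_matrix[second_id][first_id] = pair


compared = ["the cat sat", "hello world"]
initial = ["hello word", "the cat sits"]
pair_matrix = [[None] * (len(compared) + 1) for _ in range(len(initial) + 1)]

# Hypothetical group of identifier pairs to be compared for first_id=0, second_id=0.
pair, relation_array = select_comparison_pair([(0, 0), (0, 1)], compared, initial)
update_pair_matrix(pair_matrix, 0, 0, pair)
print(pair, relation_array)
```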
[0070] With further reference to Figure 2, as an implementation of the methods shown in the above figures, this disclosure provides some embodiments of a text sequence alignment device. These device embodiments correspond to the method embodiments shown in Figure 1, and the device can be applied to various electronic devices.
[0071] As shown in Figure 2, a text sequence alignment device 200 in some embodiments includes: a first generation unit 201, an execution unit 202, a second generation unit 203, and a display unit 204. The first generation unit 201 is configured to generate a comparison identifier pair matrix variable based on the initial text information quantity and the quantity of text information to be compared, wherein the initial text information quantity is the number of pieces of initial text information included in the initial text information sequence, and the quantity of text information to be compared is the number of pieces of text information to be compared included in the text information sequence to be compared. The execution unit 202 is configured to perform the following steps for the first identifier of each piece of text information to be compared in the text information sequence to be compared and the second identifier of each piece of initial text information in the initial text information sequence: determining the group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information; generating, based on the text information sequence to be compared and the initial text information sequence, a text correlation coefficient corresponding to each identifier pair in the group of identifier pairs to be compared, to obtain a text relationship array; selecting, based on the text relationship array, an identifier pair that meets the preset correlation coefficient condition from the group of identifier pairs to be compared as the comparison identifier pair; and updating the comparison identifier pair to the comparison identifier pair matrix variable based on the first identifier and the second identifier. The second generation unit 203 is configured to generate a text alignment information group based on the updated comparison identifier pair matrix variable, wherein each piece of text alignment information in the text alignment information group includes a text information segment to be compared and an initial text information segment corresponding to the text information segment to be compared. The display unit 204 is configured to align and display, on the associated display device, the text information segment to be compared and the initial text information segment included in each piece of text alignment information in the text alignment information group.
[0072] It can be understood that the units described in the device 200 correspond to the steps of the method described with reference to Figure 1. Therefore, the operations, features, and beneficial effects described above for the method also apply to the device 200 and the units contained therein, and will not be repeated here.
[0073] Reference is now made to Figure 3, which shows a structural schematic of an electronic device 300 suitable for implementing some embodiments of the present disclosure. The electronic devices in some embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in Figure 3 is merely an example and should not be construed as limiting the functionality and scope of the embodiments of this disclosure.
[0074] As shown in Figure 3, the electronic device 300 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 301, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The processing device 301, ROM 302, and RAM 303 are interconnected via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
[0075] Typically, the following devices can be connected to the I/O interface 305: input devices 306 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 307 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; and communication devices 309. The communication device 309 allows the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. Although Figure 3 shows an electronic device 300 with various devices, it should be understood that not all of the devices shown are required to be implemented or present. More or fewer devices may alternatively be implemented or present. Each box shown in Figure 3 can represent one device or multiple devices as needed.
[0076] In particular, according to some embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, some embodiments of this disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication device 309, or installed from storage device 308, or installed from ROM 302. When the computer program is executed by processing device 301, it performs the functions defined in the methods of some embodiments of this disclosure.
[0077] It should be noted that, in some embodiments of this disclosure, the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In some embodiments of this disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In some embodiments of this disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.
[0078] In some implementations, clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP (Hypertext Transfer Protocol), and can interconnect with digital data communication (e.g., communication networks) of any form or medium. Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
[0079] The aforementioned computer-readable medium may be included in the aforementioned electronic device, or it may exist independently without being assembled into the electronic device. The aforementioned computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: generate a comparison identifier pair matrix variable based on the initial text information quantity and the quantity of text information to be compared, wherein the initial text information quantity is the number of pieces of initial text information included in the initial text information sequence, and the quantity of text information to be compared is the number of pieces of text information to be compared included in the text information sequence to be compared; for the first identifier of each piece of text information to be compared in the text information sequence to be compared, and the second identifier of each piece of initial text information in the initial text information sequence, perform the following steps: determine the group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information; generate, based on the text information sequence to be compared and the initial text information sequence, a text correlation coefficient for each identifier pair in the group of identifier pairs to be compared, to obtain a text relationship array; select, based on the text relationship array, an identifier pair that satisfies the preset correlation coefficient condition from the group of identifier pairs to be compared as the comparison identifier pair; and update the comparison identifier pair to the comparison identifier pair matrix variable based on the first identifier and the second identifier; generate a text alignment information group based on the updated comparison identifier pair matrix variable, wherein each piece of text alignment information in the text alignment information group includes a text information segment to be compared and an initial text information segment corresponding to the text information segment to be compared; and align and display, on the associated display device, the text information segment to be compared and the initial text information segment included in each piece of text alignment information in the text alignment information group.
[0080] Computer program code for performing operations of some embodiments of this disclosure can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0081] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0082] The units described in some embodiments of this disclosure can be implemented in software or hardware. The described units can also be provided in a processor; for example, a processor may be described as including a first generation unit, an execution unit, a second generation unit, and a display unit. The names of these units do not, in some cases, limit the units themselves; for example, the first generation unit may also be described as "a unit that generates a comparison identifier pair matrix variable based on the initial text information quantity and the quantity of text information to be compared."
[0083] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.
[0084] The above description is merely a selection of preferred embodiments of this disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the embodiments of this disclosure is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-described inventive concept. For example, it also covers technical solutions formed by replacing the above-described features with (but not limited to) technical features having similar functions disclosed in the embodiments of this disclosure.
Claims
1. A text sequence alignment method, comprising: generating a comparison identifier pair matrix variable based on an initial text information quantity and a quantity of text information to be compared, wherein the initial text information quantity is the number of pieces of initial text information included in an initial text information sequence, and the quantity of text information to be compared is the number of pieces of text information to be compared included in a text information sequence to be compared; for the first identifier of each piece of text information to be compared in the text information sequence to be compared, and the second identifier of each piece of initial text information in the initial text information sequence, performing the following steps: determining a group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information; generating, based on the text information sequence to be compared and the initial text information sequence, a text correlation coefficient corresponding to each identifier pair in the group of identifier pairs to be compared, to obtain a text relationship array; selecting, according to the text relationship array, an identifier pair that meets a preset correlation coefficient condition from the group of identifier pairs to be compared as a comparison identifier pair; and updating the comparison identifier pair to the comparison identifier pair matrix variable based on the first identifier and the second identifier; generating a text alignment information group based on the updated comparison identifier pair matrix variable, wherein each piece of text alignment information in the text alignment information group includes a text information segment to be compared and an initial text information segment corresponding to the text information segment to be compared; and aligning and displaying, on an associated display device, the text information segment to be compared and the initial text information segment included in each piece of text alignment information in the text alignment information group.
2. The method according to claim 1, wherein, before generating the comparison identifier pair matrix variable based on the initial text information quantity and the quantity of text information to be compared, the method further comprises: receiving the text information sequence to be compared; and, in response to determining that the text information sequence to be compared is empty, controlling a communicatively connected audio playback device to play a preset retransmission prompt tone.
3. The method according to claim 1, wherein generating, based on the text information sequence to be compared and the initial text information sequence, the text correlation coefficient corresponding to each identifier pair in the group of identifier pairs to be compared comprises: generating, based on the text information sequence to be compared and the initial text information sequence, a text information distance corresponding to each identifier pair in the group of identifier pairs to be compared as the text correlation coefficient.
4. The method according to claim 3, wherein selecting, according to the text relationship array, an identifier pair that meets the preset correlation coefficient condition from the group of identifier pairs to be compared as the comparison identifier pair comprises: selecting the identifier pair with the smallest text correlation coefficient from the group of identifier pairs to be compared as the comparison identifier pair.
5. The method according to claim 1, wherein aligning and displaying, on the associated display device, the text information segment to be compared and the initial text information segment included in each piece of text alignment information in the text alignment information group comprises: sorting the text alignment information in the text alignment information group according to the order of the initial text information corresponding to each piece of text alignment information in the text alignment information group, to obtain a text alignment information sequence; and, for each piece of text alignment information in the text alignment information sequence, displaying the text information segment to be compared and the initial text information segment included in the text alignment information in a target interface area.
6. The method according to claim 5, wherein displaying the text information segment to be compared and the initial text information segment included in the text alignment information in the target interface area comprises: displaying the text information segment to be compared included in the text alignment information in a first area of the target interface area, and displaying the initial text information segment included in the text alignment information in a second area of the target interface area.
7. The method according to claim 6, wherein displaying the text information segment to be compared and the initial text information segment included in the text alignment information in the target interface area further comprises: displaying, in the first area and the second area, the text that is identical in the text information segment to be compared and the initial text information segment in a first preset color; displaying, in the first area, the text that the text information segment to be compared adds relative to the initial text information segment in a second preset color; and displaying, in the second area, the text that is missing from the text information segment to be compared relative to the initial text information segment in a third preset color.
8. A text sequence alignment device, comprising: a first generation unit configured to generate a comparison identifier pair matrix variable based on an initial text information quantity and a quantity of text information to be compared, wherein the initial text information quantity is the number of pieces of initial text information included in an initial text information sequence, and the quantity of text information to be compared is the number of pieces of text information to be compared included in a text information sequence to be compared; an execution unit configured to perform the following steps for the first identifier of each piece of text information to be compared in the text information sequence to be compared and the second identifier of each piece of initial text information in the initial text information sequence: determining a group of identifier pairs to be compared corresponding to the text information to be compared and the initial text information; generating, based on the text information sequence to be compared and the initial text information sequence, a text correlation coefficient corresponding to each identifier pair in the group of identifier pairs to be compared, to obtain a text relationship array; selecting, according to the text relationship array, an identifier pair that meets a preset correlation coefficient condition from the group of identifier pairs to be compared as a comparison identifier pair; and updating the comparison identifier pair to the comparison identifier pair matrix variable based on the first identifier and the second identifier; a second generation unit configured to generate a text alignment information group based on the updated comparison identifier pair matrix variable, wherein each piece of text alignment information in the text alignment information group includes a text information segment to be compared and an initial text information segment corresponding to the text information segment to be compared; and a display unit configured to align and display, on an associated display device, the text information segment to be compared and the initial text information segment included in each piece of text alignment information in the text alignment information group.
9. An electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-7.
10. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.