Method for bidirectional conversion between table data and text content
By generating a sequential baseline sequence and a sequential indicator fragment to record the field arrangement trajectory, and constructing a sequential offset sequence for segmentation, the problem of inconsistent field order in the bidirectional conversion between tabular data and text content is solved, thereby improving data consistency and system reliability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI FOREIGN SERVICE INFORMATION TECH CO LTD
- Filing Date
- 2026-05-15
- Publication Date
- 2026-06-12
AI Technical Summary
In the process of bidirectional conversion between tabular data and text content, existing technologies lack traceable evidence of field order after semantic rearrangement, resulting in inconsistent field order during reverse conversion, which affects data consistency and system reliability.
By generating a sequential baseline sequence and a sequential indicator fragment, the arrangement trajectory of fields in the text content is recorded, and a sequential offset sequence is constructed for segmentation, identifying stable segments and disturbed segments, and adjusting the field arrangement order based on the degree of traceability of changes.
When text content is converted back into tabular data, the continuity and parsability of the field order are maintained, which improves data consistency and system reliability and avoids distortion of order information across multiple conversions.
Figure CN122197826A_ABST
Description
Technical Field
[0001] This invention relates to the field of bidirectional conversion technology between tabular data and text content, and specifically to a method for implementing bidirectional conversion between tabular data and text content.
Background Technology
[0002] Bidirectional conversion between tabular data and text content refers to a data processing method in a computer system that converts between tabular data with row and column structures (such as database tables or spreadsheets) and text content in natural language or semi-structured form. Its core lies in achieving the mapping and reconstruction between structured information and unstructured semantics. In existing technologies, the process typically begins with table-to-text conversion. This involves extracting field names, data values, and their logical relationships from the table through header recognition, field semantic parsing, and row and column relationship modeling. Descriptive text is then generated based on preset templates or rules. Some solutions also incorporate natural language generation techniques to improve fluency. Conversely, text-to-table conversion involves word segmentation, entity recognition, and semantic decoding; natural language processing techniques such as parsing and relation extraction are used to extract the corresponding fields, attributes, and their values from the text. These are then mapped and filled based on predefined data structures or dynamically inferred table structures to reconstruct the table. The entire bidirectional conversion process typically includes multiple stages such as data preprocessing, structure recognition, semantic modeling, mapping rule construction, intermediate representation generation (such as JSON or tag-based structures), conversion execution, and result verification. The intermediate representation layer is often used to unify the expression of different data formats to improve the accuracy and reversibility of the conversion. Consistency verification and error correction mechanisms may also be introduced to address information loss or ambiguity during the conversion process, thereby achieving a relatively stable and universal bidirectional conversion function between tabular data and text content.
[0003] The existing technology has the following shortcomings: In the process of bidirectional conversion between tabular data and text content, when converting tabular data to text content, the field order in the original tabular data is rearranged according to semantic importance or language expression habits to improve the readability and naturalness of the text content. This changes the presentation order of the fields in the text content. Since this rearrangement process only focuses on the text expression effect and does not record the field order information in the original tabular data, the change in field order lacks traceability. As a result, when converting the text content back to tabular data, it is difficult to restore the original field arrangement structure based on the text content. Existing technology cannot adjust the field arrangement order when converting text content back to tabular data based on the traceability of the order change when the field order is semantically rearranged during the conversion of tabular data to text content. This will cause the field order of the restored tabular data to be inconsistent with the original structure, resulting in errors in application scenarios that rely on field order for data calculation, logical processing, or interface display, seriously affecting data consistency and system reliability.
[0004] The information disclosed in the background section is only intended to enhance the understanding of the background of this disclosure, and therefore may include information that does not constitute prior art known to those skilled in the art.
Summary of the Invention
[0005] The purpose of this invention is to provide a method for bidirectional conversion between tabular data and text content, so as to solve the problems in the background art mentioned above.
[0006] To achieve the above objectives, the present invention provides the following technical solution: a method for bidirectional conversion between tabular data and text content, specifically including the following steps:
S1. Generate a sequential baseline sequence of the fields in the table data, construct a sequence indicator fragment based on the preset encoding symbols, embed the sequence indicator fragment into the field content to form combined data, and record the arrangement trajectory of the fields in the text content.
S2. Establish the field correspondence based on the sequential baseline sequence and the arrangement trajectory, generate the sequential offset sequence, and determine, based on the sequential offset sequence, whether the field order of the table data is semantically rearranged when it is converted into text content.
S3. When the field order is semantically rearranged, divide the sequential offset sequence into segments to identify stable segments and disturbed segments, and determine the traceability of the order change based on the segment distribution and connection relationship.
S4. When converting text content into tabular data, perform field arrangement and reorganization based on the traceability of the order change, the sequential baseline sequence, and the sequential offset sequence, thereby adjusting the field arrangement order according to the traceability of the order change.
S5. During the bidirectional conversion process, continuously track the sequential offset sequence and the traceability of the order change, and adjust the embedding distribution of sequence indicator fragments in the field content based on the change trend to maintain the correspondence between the sequential baseline sequence and the arrangement trajectory.
[0007] Preferably, S1 is as follows: The fields in the table data are sequentially labeled according to their original arrangement order, and consecutive serial numbers are assigned to each field. The field identifiers are then associated with their corresponding serial numbers to construct a sequential baseline sequence. Based on the sequence number in the sequential baseline sequence, the sequence is encoded according to the preset encoding symbols. The preset encoding symbols are a set of fixed characters that are pre-defined to identify sequence information. The sequence number is mapped to the corresponding character combination to form a sequence indicator fragment, and the sequence indicator fragment is combined with the field identifier. Sequence indicator fragments are embedded into field content to form combined data. When the tabular data is converted into text content, the combined data is recorded sequentially according to the output order of the fields in the text content, generating the arrangement trajectory of the fields in the text content.
[0008] Preferably, S2 specifically includes the following steps:
S201. Based on the correspondence between field identifiers and sequence numbers in the sequential baseline sequence, and combined with the arrangement order of the sequence indicator fragments in the arrangement trajectory of the fields in the text content, the field identifiers in the sequential baseline sequence and the sequence indicator fragments in the arrangement trajectory are matched to establish the field correspondence between the sequence number position of the field in the sequential baseline sequence and its sequence number position in the arrangement trajectory.
S202. Based on the position of each field in the sequential baseline sequence and the position of each field in the arrangement trajectory, calculate the position difference for each field, and sort the position differences according to the position order in the arrangement trajectory to form a sequential offset sequence.
S203. Based on the distribution of the sequence number position differences in the sequential offset sequence, the sequential offset sequence is judged. When every position difference in the sequential offset sequence is zero or shows a consistent trend, it is determined that the field order of the table data has not been semantically rearranged when it is converted into text content. When the sequential offset sequence contains inconsistent position differences, it is determined that the field order of the table data has been semantically rearranged when it is converted into text content.
[0009] Preferably, S203 is as follows: Based on the arrangement order of the position differences of each index in the sequential offset sequence, each position difference is read item by item, and a distribution sequence of position differences is constructed according to the arrangement order. At the same time, the adjacent change relationship between position differences is recorded to form a change sequence of position differences. Based on the change sequence of the position difference of the serial number, the change direction and change magnitude between adjacent position differences of the serial number are compared. The position differences of the serial number with the same change direction and the same change magnitude are divided into the same change segment, and the position differences of the serial number with different change directions or different change magnitudes are divided into different change segments, thus forming the segment distribution result of the position difference of the serial number. Based on the segment distribution results of the sequence number position difference, the order offset sequence is determined. When the segment distribution result contains only a single segment and the sequence number position difference is zero or shows a consistent trend, it is determined that the field order of the table data has not undergone semantic rearrangement when converted to text content. When the segment distribution result contains multiple segments and there are inconsistent sequence number position differences, it is determined that the field order of the table data has undergone semantic rearrangement when converted to text content.
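The segmentation rule in S203 can be illustrated with a minimal Python sketch. It assumes that a "change segment" is a maximal run of identical adjacent changes (same direction and same magnitude) and that an offset sequence with at most one such run counts as not rearranged. The function names are hypothetical and do not appear in the disclosure.

```python
from itertools import groupby


def change_sequence(offsets):
    """Adjacent change between consecutive position differences."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


def change_segments(offsets):
    """Group identical adjacent changes (same direction, same magnitude) into runs."""
    return [list(run) for _, run in groupby(change_sequence(offsets))]


def is_rearranged(offsets):
    """Assumed reading of S203: more than one change segment means rearranged."""
    return len(change_segments(offsets)) > 1
```

Under these assumptions, the offset sequence `[1, -1, 0]` produces the changes `[-2, 1]`, which fall into two segments, so it is judged as rearranged; `[0, 0, 0]` produces a single segment and is judged as not rearranged.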
[0010] Preferably, S3 specifically includes the following steps:
S301. When the field order is semantically rearranged, the position difference of each number is read continuously according to the arrangement order of the position difference of the number in the sequential offset sequence, and the segmentation is performed according to the change direction and change magnitude between adjacent position differences. The position differences of the number with the same change direction and the same change magnitude are divided into the same segment to form the segmentation result of the sequential offset sequence.
S302. After completing the segment division, analyze the variation characteristics of the position difference of the serial number in each segment. Identify the segments in which the position difference of the serial number changes consistently or in a continuous unidirectional direction as stable segments, and identify the segments in which the position difference of the serial number changes in direction or the magnitude of the change is discontinuous as disturbed segments, thus forming the identification results of stable segments and disturbed segments.
S303. Based on the segment distribution of stable and disturbed segments and the connection relationship between adjacent segments, the distribution position and segment length of each segment in the sequential offset sequence are statistically analyzed. Combined with the proportion of stable segments in the segment distribution and the connection order between stable and disturbed segments, the traceability of sequence changes is determined.
[0011] Preferably, S303 is as follows: Based on the segment distribution of stable and disturbed segments, the distribution position of each segment in the sequential offset sequence is marked, and the number of sequence position differences contained in each segment is counted to form a segment length set. At the same time, the arrangement order of each segment in the sequential offset sequence is recorded. Based on the set of segment lengths, the ratio between the sum of stable segment lengths and the total length of the sequential offset sequence is calculated, and the adjacent connection relationships between stable segments and disturbed segments are extracted according to the segment arrangement order to form a segment connection sequence; The traceability of sequence changes is classified into levels based on the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence, as well as the segment connection sequence. The first level of traceability is determined when the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence is greater than a preset threshold and the number of consecutive occurrences of stable segments in the segment connection sequence is greater than a preset number. The second level of traceability is determined when the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence is less than or equal to a preset threshold and stable segments and disturbed segments alternate in the segment connection sequence.
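The level classification in S303 might be sketched as follows. The sketch takes already-labeled segments as input, so it makes no claim about how stable and disturbed segments are identified. The threshold values, the level labels `"first"` and `"second"`, and the `"undetermined"` fallback for cases the text does not cover are all assumptions made for illustration.

```python
def classify_traceability(segments, ratio_threshold=0.6, min_consecutive=1):
    """Classify traceability from labeled segments.

    segments: list of (kind, length) tuples in sequence order, where kind is
    "stable" or "disturbed". Thresholds are hypothetical preset values.
    """
    total = sum(length for _, length in segments)
    stable = sum(length for kind, length in segments if kind == "stable")
    ratio = stable / total if total else 0.0

    # Segment connection sequence: the kinds in arrangement order.
    kinds = [kind for kind, _ in segments]

    # Longest run of consecutive stable segments.
    longest = run = 0
    for kind in kinds:
        run = run + 1 if kind == "stable" else 0
        longest = max(longest, run)

    # Stable and disturbed segments strictly alternate.
    alternating = len(set(kinds)) == 2 and all(
        a != b for a, b in zip(kinds, kinds[1:])
    )

    if ratio > ratio_threshold and longest > min_consecutive:
        return "first"
    if ratio <= ratio_threshold and alternating:
        return "second"
    return "undetermined"  # case not covered by the two stated conditions
```

For instance, with the default assumed thresholds, `[("stable", 4), ("stable", 3), ("disturbed", 1)]` has a stable ratio of 7/8 and two consecutive stable segments, so it is classified as the first level.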
[0012] Preferably, S4 is as follows: When converting text content into tabular data, the sequential offset sequence and the sequential base sequence are jointly parsed based on the traceability of the sequence change. The field identifiers parsed from the text content are matched with the position differences of the serial numbers in the sequential offset sequence, and the initial position of each field is determined by combining the correspondence between the field identifiers and serial numbers in the sequential base sequence. After the initial positions of each field are determined, the field arrangement and recombination path is selected according to the degree of traceability of the sequence change. When the degree of traceability of the sequence change corresponds to the first degree of traceability, the position of the sequence number in the sequence base sequence is used as the basis for field arrangement. When the degree of traceability of the sequence change corresponds to the second degree of traceability, the initial position of each field is offset by the difference of the position of the sequence offset sequence to form the recombination position of each field. The field identifiers are sorted and rearranged according to the reorganization position of each field to form a field arrangement result. Table data is then generated based on the field arrangement result to realize the adjustment of the field arrangement order when the text content is reverse-converted into table data according to the traceability of the order change.
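One possible reading of the two recombination paths in S4 is sketched below. It assumes that the "initial position" of a field is its 1-based position in the parsed text and that each position difference is the baseline position minus the text position, listed in text order. The text leaves both points open, so this is an interpretation rather than the disclosed procedure, and the function name and level labels are hypothetical.

```python
def reorder_fields(parsed_fields, baseline, offsets, level):
    """Reorder fields parsed from text back into table order.

    parsed_fields: field identifiers in the order they appear in the text.
    baseline: dict mapping field identifier -> original serial number.
    offsets: position differences in text order (assumed: original - current).
    level: "first" or "second" traceability level (hypothetical labels).
    """
    if level == "first":
        # First level: arrange directly by the baseline serial numbers.
        return sorted(parsed_fields, key=lambda field: baseline[field])

    # Second level: shift each field's text position by its recorded offset.
    positions = {
        field: position + offset
        for position, (field, offset) in enumerate(
            zip(parsed_fields, offsets), start=1
        )
    }
    return sorted(parsed_fields, key=lambda field: positions[field])
```

With the text order `["age", "name", "address"]` and offsets `[1, -1, 0]`, both paths return `["name", "age", "address"]` when the recorded data is intact; under this reading the paths differ only in which record they rely on.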
[0013] Preferably, S5 is as follows: During the bidirectional conversion process, the sequence offset sequence is collected and recorded sequentially according to the conversion execution order. After each conversion, the difference in the sequence position in the sequence offset sequence is updated. At the same time, the traceability of sequence change is repeatedly determined based on the updated sequence offset sequence, forming a continuous tracking sequence of sequence offset sequence and traceability of sequence change. Based on the continuous tracking sequence, the changing trend of the position difference of the sequence number in the sequential offset sequence is analyzed. By comparing the changing direction and magnitude of the position difference of the sequence number in the sequential offset sequence at adjacent time points, the changing trend sequence is extracted, and the embedding distribution adjustment parameters of the sequence indicator segment in the field content are determined based on the changing trend sequence. Based on the embedding distribution adjustment parameters, the embedding distribution of the sequence indicator fragment in the field content is adjusted. By changing the embedding position and number of the sequence indicator fragment in the field content, the correspondence between the field identifier and the sequence number position is maintained between the sequence reference sequence and the arrangement trajectory.
[0014] The technical effects and advantages provided by the present invention in the above technical solution are as follows: 1. This invention introduces a sequence reference sequence, sequence indicator fragments, and arrangement trajectory records during the conversion of tabular data into text content. This allows the field sequence information, originally implicit in the table structure, to be explicitly expressed and embedded into the text content, thereby achieving the synchronous carrying of sequence information without affecting text readability. Furthermore, by constructing a sequence offset sequence and performing segmentation and change feature analysis on it, the field sequence change is elevated from a simple positional difference to a change expression form with structural features. Based on the distribution and connection relationship of stable and disturbed segments, the traceability of sequence change is determined, enabling the field sequence change to have quantifiable and hierarchical descriptive capabilities. Compared with existing technologies that rely solely on templates or simple mappings, this solution can maintain the continuity and parsability of sequence information even in complex situations where the field sequence undergoes semantic rearrangement.
[0015] 2. In the stage of converting text content into tabular data, this invention introduces the traceability of order changes as a core control parameter. By combining the order baseline sequence and the order offset sequence to perform field arrangement and recombination, the field arrangement process is transformed from a fixed rule recovery process to an adaptive adjustment process based on the degree of change. This allows for the selection of matching recombination paths under different order disturbance scenarios, improving the accuracy of field order recovery. Simultaneously, during the bidirectional conversion process, the order offset sequence and the traceability of order changes are continuously tracked, and the embedding distribution of order indicator fragments in the field content is dynamically adjusted according to the change trend. This enables the expression density and position of order information to be adaptively optimized according to the changing state, thereby maintaining a stable correspondence between the order baseline sequence and the arrangement trajectory. This scheme avoids the problem of gradual distortion of order information during multiple conversions, improving data consistency, structure recovery accuracy, and overall system reliability.
Attached Figure Description
[0016] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this invention. For those skilled in the art, other drawings can be obtained based on these drawings.
[0017] Figure 1 is a schematic flowchart of the method of the present invention.
Detailed Implementation
[0018] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, they are provided so that the description of this disclosure will be more complete and fully convey the concept of the exemplary embodiments to those skilled in the art.
[0019] As shown in Figure 1, this invention provides a method for bidirectional conversion between tabular data and text content, which includes the following steps: S1. Generate a sequential baseline sequence of the fields in the table data, construct a sequence indicator fragment based on the preset encoding symbols, embed the sequence indicator fragment into the field content to form combined data, and record the arrangement trajectory of the fields in the text content. In this embodiment, S1 specifically refers to: The fields in the table data are sequentially labeled according to their original arrangement order, and consecutive serial numbers are assigned to each field. The field identifiers are then associated with their corresponding serial numbers to construct a sequential baseline sequence. When assigning order to fields in tabular data, one can iterate through the fields item by item according to their natural arrangement in the table, such as reading the field positions sequentially from left to right or top to bottom, and assigning each field an incrementing integer as a sequence identifier. In practice, one can read the column index or column position of a field in the table, assign the first field a starting number, and then number each subsequent field according to its arrangement in the table, forming a continuous sequence of numbers. At the same time, the name or unique identifier of each field is bound to the corresponding number, generating a set of paired mapping relationships. This set is the sequential baseline sequence. For example, a table containing the fields "name, age, address" can be assigned the values 1, 2, and 3 in order, forming the correspondence "name-1, age-2, address-3". In this way, the order information originally implicit in the table structure can be transformed into explicit and recordable data, providing a basis for comparison when the order changes in the text content, thereby supporting the order reconstruction in the reverse recovery process.
[0020] Tabular data refers to a structured collection of information organized in rows and columns. Fields represent data columns or data items with independent meaning in a table. The original arrangement order refers to the natural arrangement of fields in the table. Sequence labeling is the process of numbering this positional relationship. Continuous serial numbers represent integer identifiers that are sequentially increased without interruption from a fixed starting point. Field identifiers are data tags used to uniquely distinguish different fields and can be field names or field numbers. The sequence reference sequence is an ordered set composed of field identifiers and corresponding continuous serial numbers, used to describe the original order relationship of fields in tabular data. By combining these elements, the structural order in tabular data can be transformed into a clear data description form, making the sequence information recordable and comparable.
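The construction of the sequential baseline sequence described above can be sketched in a few lines of Python. The sketch assumes the fields are supplied as a list in their original column order and that numbering starts at 1, as in the "name-1, age-2, address-3" example; the function name is hypothetical.

```python
def build_baseline(fields):
    """Bind each field identifier to a consecutive serial number starting at 1."""
    return {name: serial for serial, name in enumerate(fields, start=1)}


baseline = build_baseline(["name", "age", "address"])
print(baseline)  # {'name': 1, 'age': 2, 'address': 3}
```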
[0021] Based on the sequence number in the sequential baseline sequence, the sequence is encoded according to the preset encoding symbols. The preset encoding symbols are a set of fixed characters that are pre-defined to identify sequence information. The sequence number is mapped to the corresponding character combination to form a sequence indicator fragment, and the sequence indicator fragment is combined with the field identifier. When encoding the sequence numbers in a sequential baseline sequence, a fixed character set can be pre-defined, such as an encoded character set composed of specific symbols, letters, or combinations of markers. Then, consecutive sequence numbers are converted into corresponding character combinations according to a predetermined mapping relationship. In specific implementation, the sequence number can be decomposed into several bit identifiers based on its numerical value, and then mapped to elements in the character set and concatenated to form a unique character sequence. This character sequence is the sequence indicator fragment. Subsequently, the sequence indicator fragment is concatenated or embedded with field identifiers to form a combination of content carrying sequence information. For example, sequence number one can be mapped to the specific symbol combination "AA", sequence number two can be mapped to "AB", and then combined with field names to form the form "AA_name" and "AB_age". In this way, numerical sequence information can be converted into recognizable character markers in text, which will not be confused with ordinary semantic content in subsequent text processing, while ensuring that the sequence information can be parsed and restored during the conversion process.
[0022] The sequence number in the sequential baseline sequence represents the position number of the field in the original arrangement. The preset encoding symbol refers to the predefined set of characters used to represent sequential information. This set of characters is usually composed of special characters or character combinations that do not conflict with business data. Encoding processing refers to the process of converting numerical sequence numbers into character representations. A pre-defined set of fixed characters used to identify sequential information ensures that the encoding results are consistent and unique. The corresponding character combination refers to the character sequence obtained by converting the sequence number through a mapping relationship. The sequence indicator fragment is the identification content formed by the corresponding character combination to represent the sequential position of the field. Through the combination of these elements, the sequential information that originally relied on numerical expression can be transformed into a character expression form suitable for the text environment, making the sequential information embeddable and parsable in the text content.
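A minimal sketch of the encoding step, assuming the preset encoding symbols are the uppercase letters A to Z and that every fragment has a fixed width of two characters, which reproduces the "AA" and "AB" mapping in the example above. The alphabet, the width, and the function names are illustrative assumptions, not values fixed by the disclosure.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed preset encoding symbols
WIDTH = 2                                # assumed fixed fragment length


def encode(serial):
    """Map a serial number (starting at 1) to a fixed-width character fragment."""
    value = serial - 1
    if not 0 <= value < len(ALPHABET) ** WIDTH:
        raise ValueError("serial number out of range for this alphabet and width")
    chars = []
    for _ in range(WIDTH):
        value, digit = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def decode(fragment):
    """Parse a sequence indicator fragment back into its serial number."""
    value = 0
    for char in fragment:
        value = value * len(ALPHABET) + ALPHABET.index(char)
    return value + 1


def tag_field(serial, field_name):
    """Combine the fragment with the field identifier, e.g. 'AA_name'."""
    return f"{encode(serial)}_{field_name}"
```

Here `encode(1)` yields `"AA"`, `encode(2)` yields `"AB"`, and `decode` inverts the mapping, so the order information stays parsable after it has been embedded in text.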
[0023] Sequence indicator fragments are embedded into field content to form combined data. When the tabular data is converted into text content, the combined data is recorded sequentially according to the output order of the fields in the text content, generating the arrangement trajectory of the fields in the text content.
[0024] When embedding sequence indicator fragments into field content to form combined data, a fixed position can be selected for insertion within the field content. For example, a sequence indicator fragment can be added to the prefix of the field content, or it can be concatenated within the field content using delimiters, thereby generating combined data containing sequence information. When converting tabular data into text content, each combined data is recorded sequentially into a sequence set according to the field output order during text generation. This sequence set reflects the actual arrangement of the combined data in the text. In specific implementation, a sequence record list can be maintained synchronously during text generation. Whenever a combined data is output to the text, the sequence indicator fragment corresponding to that combined data is written into the list in the output order, thus forming an arrangement trajectory. For example, if the fields "name", "age", and "address" are rearranged in the text to "age", "name", and "address", the output order of the recorded sequence indicator fragments is "AB, AA, AC". This method can explicitly record the actual arrangement order in the text so that the original structure can be restored later.
[0025] Field content represents the specific data value corresponding to the field. Embedding refers to inserting sequence indicator fragments into a specified position in the field content to form a unified data unit. Combined data refers to the overall information carrier containing field content and sequence indicator fragments. Converting tabular data into text content represents the process of generating continuous text from structured fields according to certain logic. The output order of fields in the text content represents the actual order of appearance of the corresponding content in the text. The arrangement trajectory of fields in the text content is an ordered sequence formed by recording sequence indicator fragments according to the output order, which is used to describe the arrangement path of fields in the text. Through the combination of these contents, the process of field order change can be completely recorded, making the order information traceable.
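The embedding and trajectory recording described above might look like the following sketch. It assumes the fragment is embedded as a bracketed prefix of each field's content and that the text generator receives the output order explicitly; the bracket format, the separator, and the function name are hypothetical choices.

```python
def render_text(row, fragments, output_order):
    """Emit combined data in output order and record the arrangement trajectory.

    row: dict mapping field identifier -> field content.
    fragments: dict mapping field identifier -> sequence indicator fragment.
    output_order: field identifiers in the order chosen for the text.
    """
    parts = []
    trajectory = []
    for field in output_order:
        fragment = fragments[field]
        # Combined data: fragment embedded as a prefix of the field content.
        parts.append(f"[{fragment}]{field}: {row[field]}")
        # Record the fragment at the moment the field is written out.
        trajectory.append(fragment)
    return "; ".join(parts), trajectory


text, trajectory = render_text(
    {"name": "Li", "age": 30, "address": "Shanghai"},
    {"name": "AA", "age": "AB", "address": "AC"},
    ["age", "name", "address"],
)
print(trajectory)  # ['AB', 'AA', 'AC']
```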
[0026] S2. Establish the field correspondence based on the sequential baseline sequence and the arrangement trajectory, generate the sequential offset sequence, and determine whether the field order of the table data is semantically rearranged when it is converted into text content based on the sequential offset sequence. In this embodiment, S2 specifically includes the following steps: S201. Based on the correspondence between field identifiers and sequence numbers in the sequential baseline sequence, and combined with the arrangement order of the sequence indicator segments in the arrangement trajectory of the fields in the text content, the field identifiers in the sequential baseline sequence and the sequence indicator segments in the arrangement trajectory are matched to establish the field correspondence between the sequence number position of the field in the sequential baseline sequence and the sequence number position in the arrangement trajectory. When establishing field correspondences, a sequential baseline sequence can be used as the original reference. The ordered set consisting of each field identifier and its corresponding sequence number is traversed. Simultaneously, the order of sequence indicator fragments is extracted from the arrangement trajectory of the fields in the text content. The corresponding sequence number information is then parsed back from these sequence indicator fragments. Finally, matching is performed based on the correspondence between field identifiers and sequence numbers to determine the original sequence number position of the same field in the sequential baseline sequence and its current sequence number position in the arrangement trajectory. Specifically, a mapping table can be constructed with field identifiers as keys and sequence numbers as values. Then, the sequence indicator fragments in the arrangement trajectory are parsed one by one to obtain the sequence number sequence. 
Each sequence number is then assigned a current position index according to its arrangement order. A one-to-one correspondence is established by comparing the original sequence number position with the current position index. For example, if the sequential baseline sequence is "Name-1, Age-2, Address-3", and the sequence indicator fragment in the arrangement trajectory is parsed as "2, 1, 3", then the current position indices are "Age-1, Name-2, Address-3". This method clarifies the positional relationship of each field in the two sequences, thus providing a foundation for subsequent offset calculations.
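The matching in S201 can be illustrated with a minimal sketch. It assumes the sequence indicator fragments have already been parsed into plain integers; the function name `build_field_correspondence` and the tuple layout `(original position, current position)` are illustrative choices, not part of the described method.

```python
def build_field_correspondence(baseline, trajectory_numbers):
    """Map each field to (original position, current position).

    baseline: (field_id, sequence_number) pairs in original table order.
    trajectory_numbers: sequence numbers parsed from the indicator
    fragments, listed in the order the fields appear in the text.
    """
    # Mapping table with field identifiers as keys and sequence numbers as values.
    field_to_number = dict(baseline)
    # Inverse lookup, used to find the field that a parsed number belongs to.
    number_to_field = {number: field for field, number in field_to_number.items()}

    correspondence = {}
    for current_pos, number in enumerate(trajectory_numbers, start=1):
        field = number_to_field[number]  # raises KeyError for an unknown number
        correspondence[field] = (number, current_pos)
    return correspondence


baseline = [("Name", 1), ("Age", 2), ("Address", 3)]
print(build_field_correspondence(baseline, [2, 1, 3]))
# {'Age': (2, 1), 'Name': (1, 2), 'Address': (3, 3)}
```

The result lists the fields in text order, so "Age" (original position 2) appears first with current position 1, matching the example above.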
[0027] The sequence baseline sequence represents an ordered set composed of field identifiers and serial numbers, used to describe the arrangement order of fields in the original table. The correspondence between field identifiers and serial numbers represents a one-to-one mapping relationship between fields and their position numbers. The arrangement order of sequence indicator segments in the arrangement trajectory of fields in the text content represents the sequence of sequence indicator segments corresponding to the actual output order of fields in the text. The field correspondence between the serial number position of a field in the sequence baseline sequence and the serial number position in the arrangement trajectory represents the mapping relationship between the original sequence position and the current sequence position of the same field. By combining these elements, the positional changes of fields in different sequences can be clearly associated, providing accurate basic data for the subsequent generation of sequence offset sequences.
[0028] S202. Based on the sequence number position of each field in the sequential baseline sequence and its sequence number position in the arrangement trajectory, calculate the position difference for each field, and arrange the position differences in the order of the arrangement trajectory to form the sequential offset sequence. To do this, the original position of each field in the sequential baseline sequence and its current position in the arrangement trajectory are extracted from the field correspondence. The fields are then processed one by one in their order of appearance in the arrangement trajectory: the two positions are compared, the position difference is calculated, and this difference is recorded as the offset of the field. In a specific implementation, a field list can first be generated in arrangement-trajectory order. For each field, the original sequence number position in the sequential baseline sequence and the current position index in the arrangement trajectory are looked up in the field correspondence, and the offset is obtained by comparing the two. The offsets are then arranged in the order of the arrangement trajectory to form an ordered offset sequence. For example, if the sequential baseline sequence is "name-1, age-2, address-3" and the order in the arrangement trajectory is "age, name, address", then the current positions are "age-1, name-2, address-3". Taking each offset as the original position minus the current position, the offset sequence is "+1, -1, 0". This process transforms the change in field order into a quantifiable numerical sequence, providing a basis for subsequent judgment.
[0029] The field correspondence represents the positional mapping of the same field between the sequential baseline sequence and the arrangement trajectory. The sequence number position of each field in the sequential baseline sequence represents the position number of the field in the original arrangement. The sequence number position of each field in the arrangement trajectory represents the position number of the field in the text output order. The sequence number position difference represents the numerical expression of the position change of the field between the two sequences. The sequential offset sequence represents the ordered set of values formed by arranging all sequence number position differences in the order of the arrangement trajectory. By combining these elements, the process of field order change can be described in numerical form, making the order change analyzable and comparable.
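The offset calculation of S202 can be sketched as follows. The sign convention (original position minus current position) is an assumption inferred from the worked example "+1, -1, 0"; the function name `offset_sequence` is illustrative.

```python
def offset_sequence(baseline, trajectory_fields):
    """Return the position differences in arrangement-trajectory order.

    baseline: (field_id, sequence_number) pairs in original table order.
    trajectory_fields: field identifiers in the order they appear in the text.
    Assumed convention: offset = original position - current position.
    """
    original = dict(baseline)
    return [
        original[field] - current
        for current, field in enumerate(trajectory_fields, start=1)
    ]


baseline = [("name", 1), ("age", 2), ("address", 3)]
print(offset_sequence(baseline, ["age", "name", "address"]))  # [1, -1, 0]
```

When the text keeps the table order, every offset is zero, which is the simplest case of an unrearranged field order.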
[0030] S203. Based on the distribution of the position difference of the serial number in the sequential offset sequence, the sequential offset sequence is judged. When the position difference of each serial number in the sequential offset sequence is zero or shows a consistent trend, it is determined that the field order of the table data has not been semantically rearranged when it is converted into text content. When there are inconsistent position differences of serial numbers in the sequential offset sequence, it is determined that the field order of the table data has been semantically rearranged when it is converted into text content.
[0031] In this embodiment, S203 specifically refers to: Based on the arrangement order of the sequence number position differences in the sequential offset sequence, each position difference is read item by item, and a distribution sequence of position differences is constructed according to the arrangement order. At the same time, the adjacent change relationship between position differences is recorded to form a change sequence of position differences. When processing the sequential offset sequence, the position differences are read item by item in the order in which they appear in the sequence, that is, in the actual output order of the fields in the text content, and the values read are assembled into a position difference distribution sequence in that same order. During reading, the change between each pair of adjacent position differences, including its direction and magnitude, is recorded, thereby constructing the change sequence of position differences. In a specific implementation, the list of values in the sequential offset sequence can be traversed, with each item appended to the distribution sequence in turn and the current value compared with the previous value to record the change. For example, if the sequential offset sequence is "+1, -1, 0", then the distribution sequence keeps the values "+1, -1, 0". The adjacent change relationships are "reversal from +1 to -1" and "incremental change from -1 to 0", thus forming the change sequence "reversal, increment". In this way, a plain sequence of offset values is transformed into a sequence containing change trend information, providing a basis for subsequent analysis of the order change pattern. 
The arrangement order of the position difference of each index in the sequential offset sequence indicates that the offset is organized according to the arrangement order of the fields in the text. Reading item by item means extracting values one by one in this order. The sequence of position difference distribution indicates the set of offsets that maintains the original arrangement order. The adjacent change relationship indicates the change characteristics between two consecutive offsets. The sequence of position difference change indicates the sequence formed by recording these change characteristics in order.
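A minimal sketch of how the change sequence might be built is given below. The source does not define "reversal" precisely; here it is assumed to mean that two adjacent offsets have opposite signs, which reproduces the "reversal, increment" example. The labels and the function name `change_sequence` are illustrative.

```python
def change_sequence(offsets):
    """Record (direction, magnitude) for each pair of adjacent offsets."""
    changes = []
    for prev, curr in zip(offsets, offsets[1:]):
        step = curr - prev
        if prev * curr < 0:
            direction = "reversal"   # assumed meaning: sign flips between neighbours
        elif step > 0:
            direction = "increase"
        elif step < 0:
            direction = "decrease"
        else:
            direction = "unchanged"
        changes.append((direction, abs(step)))
    return changes


print(change_sequence([1, -1, 0]))  # [('reversal', 2), ('increase', 1)]
```

The distribution sequence itself is simply the offset list in its original order, so only the adjacent-change records need to be computed.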
[0032] Based on the change sequence of the sequence number position differences, the change direction and change magnitude between adjacent position differences are compared. Position differences with the same change direction and the same change magnitude are assigned to the same change segment, and position differences with a different change direction or a different change magnitude are assigned to different change segments, thus forming the segment distribution result of the position differences. When segmenting the change sequence, the changes between adjacent position differences are analyzed one by one in sequence order. First, it is determined whether the direction of change is consistent, i.e., whether adjacent differences increase, decrease, or remain unchanged in the same direction. Then, it is determined whether the magnitude of change is consistent, i.e., whether the amount of change between adjacent differences is the same. Consecutive position differences that satisfy both conditions are grouped into the same change segment. When adjacent differences show an inconsistent direction or a different magnitude of change, a new change segment is started. Segmenting the whole sequence in this way yields the complete segment distribution result. In practice, adjacent elements of the change sequence can be compared by traversal. For example, if the change sequence is "increasing, increasing, reversing, increasing", the first two "increasing" entries are assigned to the same segment, "reversing" forms a new segment, and the final "increasing" entry, whose direction of change differs from that of the preceding segment, forms another new segment, giving the segment distribution result "incremental segment, reversal segment, incremental segment". This process allows continuous change trends to be grouped and expressed.
The direction and magnitude of change between adjacent position differences represent the change characteristics between two consecutive differences. A consistent change direction indicates the same change trend, and a consistent change magnitude indicates the same degree of change. The same change segment represents a set of consecutive differences with the same change characteristics. An inconsistent change direction or a different change magnitude indicates a shift in the change trend or degree. Different change segments represent sets of differences assigned to different groups. The segment distribution result of the position differences represents the set of consecutive segments divided according to change characteristics, and reflects the structural characteristics of the overall order change.
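The grouping of the change sequence can be sketched with the standard-library `itertools.groupby`, which merges consecutive equal entries. This is only one possible realization; the function name `segment_changes` and the `(label, run length)` output format are assumptions for illustration. Entries may be plain direction labels, as in the example above, or `(direction, magnitude)` tuples, in which case both must match for two entries to share a segment.

```python
from itertools import groupby


def segment_changes(changes):
    """Group consecutive identical change entries into (entry, count) segments."""
    return [(key, len(list(group))) for key, group in groupby(changes)]


changes = ["increasing", "increasing", "reversing", "increasing"]
print(segment_changes(changes))
# [('increasing', 2), ('reversing', 1), ('increasing', 1)]
```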
[0033] Based on the segment distribution results of the sequence number position difference, the order offset sequence is determined. When the segment distribution result contains only a single segment and the sequence number position difference is zero or shows a consistent trend, it is determined that the field order of the table data has not undergone semantic rearrangement when converted to text content. When the segment distribution result contains multiple segments and there are inconsistent sequence number position differences, it is determined that the field order of the table data has undergone semantic rearrangement when converted to text content.
[0034] When making the final determination on the sequential offset sequence, an overall analysis can be performed based on the segment distribution result of the position differences. Specifically, the number of segments in the segment distribution result and the change characteristics of the position differences within each segment are first counted, and the two are then judged jointly. When the segment distribution result contains only a single segment, it is further checked whether all position differences within that segment are zero, or whether they keep the same direction and a consistent magnitude of change throughout. For example, if the position differences are "0, 0, 0" (all zero) or "+1, +2, +3" (a continuous change in one direction with a constant step), it can be determined that the field order has not undergone semantic rearrangement. When the segment distribution result contains multiple segments, there is a break in the change trend of the position differences. For example, if the position differences are "+1, +2, -1, 0", the direction and magnitude of change shift partway through the sequence, and it can be determined that the field order has undergone semantic rearrangement. In this way, the overall pattern of order change is transformed into a structural feature that can be judged.
[0035] A single segment indicates that the entire sequence of position difference values maintains a consistent change characteristic without any interruption in the trend. All position differences are zero, indicating that all fields are in completely consistent positions in the two sequences. A consistent trend indicates that the position differences maintain the same direction in the sequence and the magnitude of change is continuous and stable. Semantic rearrangement indicates that the arrangement order of fields in the text content has undergone a structural adjustment compared to the original order. Inconsistent position differences indicate that there are discontinuous changes in direction or magnitude in the sequence. By combining these elements, it is possible to clearly determine whether the field order has undergone semantic rearrangement.
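The decision rule can be condensed into a short sketch. It assumes that a signed step between adjacent offsets captures both direction and magnitude, so that a single run of equal steps corresponds to "zero or a consistent trend". The function name `is_semantically_rearranged` is illustrative.

```python
from itertools import groupby


def is_semantically_rearranged(offsets):
    """Return True when the offsets split into more than one change segment."""
    # Signed step between neighbours: the sign is the direction, the size the magnitude.
    steps = [curr - prev for prev, curr in zip(offsets, offsets[1:])]
    # Each run of equal steps is one change segment.
    segment_count = sum(1 for _ in groupby(steps))
    return segment_count > 1


print(is_semantically_rearranged([0, 0, 0]))      # False
print(is_semantically_rearranged([1, 2, 3]))      # False
print(is_semantically_rearranged([1, 2, -1, 0]))  # True
```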
[0036] S3. When the field order is semantically rearranged, the sequential offset sequence is divided into segments to identify stable segments and disturbed segments, and the traceability of the order change is determined based on the segment distribution and connection relationship. In this embodiment, S3 specifically includes the following steps: S301. When the field order is semantically rearranged, the position differences are read continuously in their arrangement order in the sequential offset sequence, and segmentation is performed according to the change direction and change magnitude between adjacent position differences. Position differences with the same change direction and the same change magnitude are assigned to the same segment, forming the segmentation result of the sequential offset sequence. When the field order is semantically rearranged, the sequential offset sequence serves as the input data. Each position difference is read in turn, in the actual output order of the fields in the text content, and compared with the previous one to extract the direction and magnitude of change. The direction of change is determined by whether the value increases, decreases, or remains constant, and the magnitude is the amount of change between adjacent differences. When the direction of change is consistent across multiple consecutive position differences and the magnitude of change is the same, these position differences are grouped together and recorded as one segment. When the direction or the magnitude of change differs, the current segment ends and a new segment begins, which completes the segmentation of the entire sequence. 
For example, if the sequence of positional differences is "+1, +2, +3, -1, -2, 0", the first three differences are divided into one segment because they are increasing and have the same magnitude of change. The subsequent "-1, -2" are divided into another segment, and "0" forms a separate segment. In this way, parts with consistent continuous change patterns can be aggregated, providing a foundation for subsequent analysis of sequential change structures.
[0037] Semantic rearrangement of field order indicates that the order of fields in the text content has changed relative to the original table order. The difference in sequence position indicates the positional offset of the field in the original sequence and the current sequence. Continuous reading indicates that the difference data is extracted one by one according to the sequence order. The direction and magnitude of change between adjacent sequence position differences indicate the trend and degree of change between two consecutive differences. Segment division indicates that the sequence is grouped according to the change characteristics. Sequence position differences with the same direction and magnitude of change indicate that consecutive differences have the same trend and degree of change. The same segment indicates a set of consecutive differences divided into the same group. The segment division result of the sequence offset sequence indicates a set of multiple consecutive segments after being divided according to the change characteristics, which is used to reflect the structural distribution of field order changes.
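The segmentation of S301 can be sketched as a single greedy pass. The sketch assumes that a segment's step is fixed by its first two values and that a value breaking the step starts the next segment; the source does not say how boundary values are assigned, so this tie-breaking is an assumption. The function name `split_into_segments` is illustrative.

```python
def split_into_segments(offsets):
    """Split offsets into runs whose adjacent values change by a constant step."""
    segments = []
    current = []
    step = None  # step of the current segment; unknown until it holds two values
    for value in offsets:
        if not current:
            current = [value]
        elif step is None or value - current[-1] == step:
            step = value - current[-1]
            current.append(value)
        else:
            # Direction or magnitude changed: close the segment, start a new one.
            segments.append(current)
            current, step = [value], None
    if current:
        segments.append(current)
    return segments


print(split_into_segments([1, 2, 3, -1, -2, 0]))
# [[1, 2, 3], [-1, -2], [0]]
```

The output reproduces the three segments of the example above.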
[0038] S302. After completing the segment division, analyze the variation characteristics of the position difference of the serial number in each segment. Identify the segments in which the position difference of the serial number changes consistently or in a continuous unidirectional direction as stable segments, and identify the segments in which the position difference of the serial number changes in direction or the magnitude of the change is discontinuous as disturbed segments, thus forming the identification results of stable segments and disturbed segments. After segmenting, the positional differences within each segment can be analyzed item by item to extract the relationship between all differences within the segment. The overall change characteristics of the segment can be determined by traversing the direction and magnitude of changes in adjacent differences within the segment. In practice, all differences within a segment can be scanned sequentially. If all differences within a segment maintain the same value, or if the differences continuously increase or decrease in a unified direction with consistent magnitudes, the segment is identified as a stable segment. If the direction of the differences changes from positive to negative or vice versa, or if the magnitudes of changes between adjacent differences are inconsistent, the segment is identified as a disturbed segment. For example, the segment "+1, +2, +3" is continuously increasing with consistent changes and can be identified as a stable segment, while the segment "+2, -1, +3" shows a change in direction and is identified as a disturbed segment. In this way, segments with different change characteristics can be classified, providing a structural basis for subsequent sequential change analysis.
[0039] The changing characteristics of the positional differences within each segment indicate the changing patterns of the differences within the segment. Consistent changes in positional differences indicate that all differences within the segment are the same or have the same trend. Segments with continuous unidirectional changes indicate that the differences increase or decrease continuously in the same direction and the magnitude of change remains stable. Stable segments indicate segments with continuous and consistent changing patterns. Changes in the direction of positional differences indicate a reversal of the changing trend within the segment. Discontinuous changes in magnitude indicate inconsistent degrees of change between adjacent differences. Disturbed segments indicate segments with unstable changing patterns or abrupt changes. These elements allow for a clear classification of the changing characteristics within a segment, thereby distinguishing between stable and unstable parts of sequential changes.
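A minimal sketch of the stable/disturbed classification follows. It treats a segment as stable when all adjacent steps are equal (which covers both "same value" and "constant increase or decrease") and as disturbed otherwise. Treating a single-value segment as stable is an assumption, since the source does not address that case; the function name `classify_segment` is illustrative.

```python
def classify_segment(segment):
    """Label a segment of position differences as 'stable' or 'disturbed'."""
    # Signed steps between neighbours; one distinct step means one direction
    # and one magnitude throughout the segment.
    steps = {curr - prev for prev, curr in zip(segment, segment[1:])}
    # Assumption: a segment with a single value (no steps) counts as stable.
    return "stable" if len(steps) <= 1 else "disturbed"


print(classify_segment([1, 2, 3]))   # stable
print(classify_segment([2, -1, 3]))  # disturbed
```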
[0040] S303. Based on the segment distribution of stable and disturbed segments and the connection relationship between adjacent segments, the distribution position and segment length of each segment in the sequential offset sequence are statistically analyzed. Combined with the proportion of stable segments in the segment distribution and the connection order between stable and disturbed segments, the traceability of sequence changes is determined.
[0041] In this embodiment, S303 specifically refers to: Based on the segment distribution of stable and disturbed segments, the distribution position of each segment in the sequential offset sequence is marked, and the number of sequence position differences contained in each segment is counted to form a segment length set. At the same time, the arrangement order of each segment in the sequential offset sequence is recorded. When processing the segment distribution of stable and disturbed segments, a sequential offset sequence can be used as a basis. The start and end positions of each segment in the sequence can be marked. For example, the segment range can be determined by recording the start and end indices of the segment in the sequence. Based on this, the number of positional differences contained within the segment can be counted to obtain the length information of each segment. In specific implementation, the segment list can be traversed, the boundary position of each segment can be recorded, and the number of positional differences contained in the segment can be calculated by subtracting the start position from the end position and adding one, thus forming a set of segment lengths. For example, if the sequential offset sequence is divided into three segments, corresponding to the first to third positions, the fourth to fifth positions, and the sixth position, then the set of segment lengths is "three, two, one". At the same time, the segments are arranged and recorded according to their order of appearance in the sequential offset sequence to obtain the complete segment arrangement order. Through this processing, the structural distribution of segments in the sequence can be transformed into quantifiable length and position information, providing a data foundation for subsequent analysis of the degree of sequence change. 
Among them, the segment distribution of stable segments and perturbation segments represents the arrangement of different types of segments in the sequential offset sequence. The distribution position of each segment in the sequential offset sequence represents the specific start and end range of each segment in the entire sequence. The number of sequence position differences contained in each segment represents the number of elements inside the segment. The segment length set represents the ordered set composed of the segment lengths in the order of arrangement. By combining these elements, the segment structure can be transformed from a simple classification into a descriptive form with position and scale information, which is used to characterize the overall distribution characteristics of the sequence change.
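The position marking and length counting can be sketched as below. Positions are 1-based to match the "first to third positions" wording, and the length is computed as end minus start plus one, as described. The dictionary layout and the function name `segment_layout` are illustrative assumptions.

```python
def segment_layout(segments):
    """Record kind, start, end and length for each segment, in sequence order.

    segments: list of (kind, values) pairs, where kind is 'stable' or 'disturbed'.
    """
    layout = []
    start = 1  # 1-based position in the sequential offset sequence
    for kind, values in segments:
        end = start + len(values) - 1
        layout.append(
            {"kind": kind, "start": start, "end": end, "length": end - start + 1}
        )
        start = end + 1
    return layout


segments = [("stable", [1, 2, 3]), ("stable", [-1, -2]), ("stable", [0])]
layout = segment_layout(segments)
print([(s["start"], s["end"]) for s in layout])  # [(1, 3), (4, 5), (6, 6)]
print([s["length"] for s in layout])             # [3, 2, 1]
```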
[0042] Based on the set of segment lengths, the ratio between the sum of stable segment lengths and the total length of the sequential offset sequence is calculated, and the adjacent connection relationships between stable segments and disturbed segments are extracted according to the segment arrangement order to form a segment connection sequence; When processing the set of segment lengths, we can first sum the lengths of all stable segments to obtain the sum of stable segment lengths. Simultaneously, we can count the number of position differences for all indices in the sequential offset sequence to obtain the total length of the sequential offset sequence. Then, by comparing the two, we can calculate the proportion of stable segment lengths in the overall sequence. Based on this, we analyze adjacent segments one by one according to their arrangement in the sequential offset sequence, identifying the connection type between each pair of adjacent segments. For example, stable segments are connected to stable segments, stable segments are connected to disturbed segments, or disturbed segments are connected to stable segments. These are recorded in the order of appearance, thus forming a segment connection sequence. For example, if the segment arrangement is "stable segment, stable segment, disturbed segment, stable segment", then the connection relationship is "stable-stable, stable-disturbed, disturbed-stable". This processing can transform the structural relationships in the segment distribution into an analyzable connection sequence, reflecting the connection between stable and disturbed parts during the sequence change process. Among them, the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence represents the proportion of stable segments in the overall sequence, which is used to measure the degree of stability maintained during sequence changes. 
The adjacency connection relationship between stable segments and perturbation segments represents the adjacency combination form of different types of segments in the sequence. The segment connection sequence represents the ordered set formed by recording the connection relationship between adjacent segments according to the segment arrangement order. Through the combination of these elements, the characteristics of sequence changes can be characterized from both the quantitative distribution and structural connection dimensions.
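The stable-length ratio and the segment connection sequence can be sketched as follows. The segment lengths in the usage example are made-up values for illustration only, and the function names `stable_ratio` and `connection_sequence` are assumptions.

```python
def stable_ratio(segments):
    """Share of the offset sequence covered by stable segments.

    segments: list of (kind, length) pairs in sequence order.
    """
    total = sum(length for _, length in segments)
    stable = sum(length for kind, length in segments if kind == "stable")
    return stable / total if total else 0.0


def connection_sequence(segments):
    """Connection type between each pair of adjacent segments, in order."""
    kinds = [kind for kind, _ in segments]
    return [f"{left}-{right}" for left, right in zip(kinds, kinds[1:])]


# Hypothetical lengths, chosen only to illustrate the calculation.
segments = [("stable", 3), ("stable", 2), ("disturbed", 3), ("stable", 2)]
print(stable_ratio(segments))         # 0.7
print(connection_sequence(segments))
# ['stable-stable', 'stable-disturbed', 'disturbed-stable']
```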
[0043] The traceability of sequence changes is classified into levels based on the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence, as well as the segment connection sequence. The first level of traceability is determined when the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence is greater than a preset threshold and the number of consecutive occurrences of stable segments in the segment connection sequence is greater than a preset number. The second level of traceability is determined when the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence is less than or equal to a preset threshold and stable segments and disturbed segments alternate in the segment connection sequence.
[0044] When classifying the traceability of sequence changes, we can first quantify the proportion of stable segments in the overall sequence based on the ratio of the sum of stable segment lengths to the total length of the sequence offset sequence. Then, we can analyze the connection patterns between segments by combining the segment connection sequence. In practice, we can first set a preset threshold to distinguish different states, for example, comparing the ratio of the sum of stable segment lengths to the total length of the sequence offset sequence with this preset threshold. At the same time, we can traverse the segment connection sequence, count the number of consecutive occurrences of stable segments in the sequence, and compare this number of consecutive occurrences with a preset number. In comparison, when the ratio exceeds a preset threshold and the number of consecutive occurrences of stable segments exceeds a preset number, the current sequence change state is classified into the first traceability level. For example, when the proportion of stable segment length is relatively large and the segment connection shows a continuous and stable structure, it is classified into this level. When the ratio does not exceed a preset threshold and the segment connection sequence shows an alternating pattern of stable segments and disturbed segments, the sequence change state is classified into the second traceability level. For example, when the segment arrangement is "stable segment, disturbed segment, stable segment, disturbed segment", it is classified into this level. In this way, the sequence change can be hierarchically distinguished from continuous and stable to frequent disturbances.
[0045] The traceability of sequence changes indicates the degree to which sequence change information can be identified and utilized during reverse recovery. The hierarchical division indicates that sequence changes are divided into different levels according to different judgment conditions. The preset threshold indicates the numerical boundary used to distinguish different levels. The number of consecutive occurrences of stable segments indicates the number of consecutive stable segments in the segment connection sequence. The preset number indicates the counting standard used to judge the degree of continuous stability. The first traceability indicates the case where the proportion of stable segments is large and the connection structure is continuous. The alternation of stable segments and perturbation segments indicates that the two types of segments are arranged alternately in the segment connection sequence. The second traceability indicates the case where the stability is low and the perturbation is frequent. Through the combination of these elements, a hierarchical characterization of the complexity of sequence changes can be achieved.
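The two-level classification can be sketched as below. The source gives no concrete values for the preset threshold or the preset number, so the defaults `ratio_threshold=0.6` and `min_stable_run=1` are placeholders. Interpreting "the number of consecutive occurrences of stable segments" as the longest run of adjacent stable segments, and returning `None` when neither condition holds, are also assumptions.

```python
def traceability_level(kinds, ratio, ratio_threshold=0.6, min_stable_run=1):
    """Return 1, 2, or None for a sequence of segment kinds.

    kinds: 'stable' / 'disturbed' labels in segment order.
    ratio: stable length divided by total offset-sequence length.
    ratio_threshold, min_stable_run: placeholder presets (not from the source).
    """
    longest = run = 0
    for kind in kinds:
        run = run + 1 if kind == "stable" else 0
        longest = max(longest, run)

    # Alternation: every pair of adjacent segments has different kinds.
    alternating = len(kinds) > 1 and all(
        left != right for left, right in zip(kinds, kinds[1:])
    )

    if ratio > ratio_threshold and longest > min_stable_run:
        return 1
    if ratio <= ratio_threshold and alternating:
        return 2
    return None  # case not covered by the two levels described


print(traceability_level(["stable", "stable", "disturbed", "stable"], 0.7))     # 1
print(traceability_level(["stable", "disturbed", "stable", "disturbed"], 0.5))  # 2
```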
[0046] S4. When converting text content into tabular data, perform field arrangement and reorganization based on the traceability of order changes, the sequential baseline sequence, and the sequential offset sequence, and adjust the field arrangement order according to the traceability of order changes. In this embodiment, S4 specifically refers to: When converting text content into tabular data, the sequential offset sequence and the sequential baseline sequence are jointly parsed based on the traceability of the order change. The field identifiers parsed from the text content are matched with the position differences in the sequential offset sequence, and the initial position of each field is determined by combining the correspondence between field identifiers and sequence numbers in the sequential baseline sequence. In practice, the text content is parsed first to extract field identifiers, and the current position of each field in the text is determined in combination with the position differences of the sequential offset sequence. The parsing path is selected according to the traceability of the order change: when the traceability is high, the sequential baseline sequence is used as the reference; when the traceability is low, the sequential offset sequence is used for auxiliary correction. Specifically, the sequence indicator fragments are identified in the text content and the corresponding sequence of position differences is reconstructed. A mapping between field identifiers and position differences is then established and compared with the correspondence between field identifiers and sequence numbers in the sequential baseline sequence. By associating the current position with the original position, the initial position of each field in the original table structure can be determined. 
For example, if the original order is "Name-1, Age-2, Address-3", and the text parsing order is "Age, Name, Address", the position difference can be used to determine that "Age" corresponds to original position 2 and "Name" corresponds to original position 1, thus providing a basic positional basis for subsequent reorganization.
[0047] Converting text content into tabular data represents the process of recovering the structured field arrangement from continuous text. The degree of traceability of order changes indicates the extent to which order change information can be utilized during recovery. The sequential offset sequence and the sequential baseline sequence represent the numerical descriptions of the current and original arrangement states of the fields, respectively. Joint parsing indicates that both sequences are used together for a comprehensive analysis of field positions. The correspondence between field identifiers and sequence numbers represents the mapping between a field and its original position number. The initial position of each field represents the target position of the field in the original table structure. Through the combination of these elements, the positions of fields can be associated across the two sequences, providing an accurate positioning basis for subsequent field arrangement and reorganization.
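The joint parsing step can be sketched as follows. It assumes each offset equals the original position minus the current position, so the original position is recovered as current position plus offset. The cross-check against the baseline is an added safeguard for illustration rather than something the source specifies, and the function name `initial_positions` is hypothetical.

```python
def initial_positions(baseline, parsed_fields, offsets):
    """Recover each field's position in the original table structure.

    baseline: (field_id, sequence_number) pairs in original table order.
    parsed_fields: field identifiers in the order parsed from the text.
    offsets: position differences reconstructed from the indicator fragments.
    """
    original = dict(baseline)
    positions = {}
    for current, (field, offset) in enumerate(zip(parsed_fields, offsets), start=1):
        recovered = current + offset  # assumed: offset = original - current
        if recovered != original[field]:
            # Illustrative safeguard: the two sequences disagree for this field.
            raise ValueError(f"inconsistent position for field {field!r}")
        positions[field] = recovered
    return positions


baseline = [("Name", 1), ("Age", 2), ("Address", 3)]
print(initial_positions(baseline, ["Age", "Name", "Address"], [1, -1, 0]))
# {'Age': 2, 'Name': 1, 'Address': 3}
```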
[0048] After the initial position of each field is determined, the field arrangement and reorganization path is selected according to the traceability of order changes, and a different arrangement process is executed for each level. The level corresponding to the traceability of order changes is determined first. When it corresponds to the first traceability level, the field order change is relatively stable, and the serial-number positions in the sequential baseline sequence are used directly as the basis for field arrangement; that is, the fields are reordered according to their original serial numbers. When it corresponds to the second traceability level, the field order is heavily disturbed. In this case, the initial position of each field is offset using the serial-number position differences in the sequential offset sequence: the current position of each field is adjusted by its corresponding difference to form the reorganization position of that field.
For example, if the original sequence number is "1, 2, 3", the initial position is resolved to "2, 1, 3", and the corresponding position difference is "+1, -1, 0", then the arrangement structure can be restored to the original order or close to the original order through offset adjustment. Through this hierarchical processing, different recombination strategies can be adopted under different order change states.
[0049] The field arrangement and reorganization path is the field arrangement method selected according to the order change. The first traceability level indicates a state in which order changes are small and the structure is relatively stable. The serial-number position in the sequential baseline sequence is the field's position number in the original table. The field arrangement basis is the reference information used to determine the field order. The second traceability level indicates a state in which order changes are large and disturbed. The offset calculation is the process of adjusting a field's position by its serial-number position difference. The reorganization position of each field is the field's target position in the table after the arrangement adjustment. Combining these elements allows fields to be rearranged appropriately under different order change conditions.
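The two reorganization paths can be illustrated with a short sketch. The text does not fix the sign of the position difference or state exactly which position it is added to, so the code assumes that the difference equals original position minus current position and that the second-level path adds it to the current text position. The function name and the integer level codes are hypothetical.

```python
from typing import List


def recombination_positions(
    level: int,
    baseline_positions: List[int],
    current_positions: List[int],
    offsets: List[int],
) -> List[int]:
    """Return the reorganization position of each field, in text order."""
    if level == 1:
        # First traceability level: use the baseline serial numbers directly.
        return list(baseline_positions)
    if level == 2:
        # Second traceability level (assumed): current position plus difference.
        return [cur + off for cur, off in zip(current_positions, offsets)]
    raise ValueError(f"unknown traceability level: {level}")


# Fields appear at text positions 1, 2, 3 with original serial numbers 2, 1, 3.
first_level = recombination_positions(1, [2, 1, 3], [1, 2, 3], [1, -1, 0])
second_level = recombination_positions(2, [2, 1, 3], [1, 2, 3], [1, -1, 0])
```

Under these assumptions both paths give the same result when the recorded differences are exact; they would diverge only when the offset sequence is incomplete or out of date, a case the text does not spell out.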
[0050] The field identifiers are sorted and rearranged according to the reorganization position of each field to form a field arrangement result. Table data is then generated based on the field arrangement result to realize the adjustment of the field arrangement order when the text content is reverse-converted into table data according to the traceability of the order change.
[0051] After determining the reorganization positions of each field, the field identifiers can be sorted and rearranged according to their respective reorganization positions. Specifically, a dataset can be constructed first, using field identifiers as elements and reorganization positions as sorting keys. Then, the data is sorted from smallest to largest according to the reorganization positions. The sorted sequence of field identifiers is used as the new field arrangement result. Based on this, the field contents corresponding to the field identifiers are sequentially filled into the table structure according to the sorted order, completing the reconstruction of the table data. For example, if the reorganization positions of the fields "Name, Age, Address" are "1, 3, 2" respectively, then the arrangement result after sorting according to the reorganization positions is "Name, Address, Age". This order is then used to fill the corresponding columns in the table, forming table data that conforms to the target order. This method transforms the arrangement relationship obtained through position calculations into an actual data structure output, giving the restoration of the field order a clear execution path.
[0052] Field identifiers are data tags used to distinguish different fields, usually corresponding to field names or field unique codes. Sorting and rearranging refers to the process of arranging field identifiers in an orderly manner according to the reorganization position of the fields. Field arrangement result refers to the field order sequence formed after sorting. Tabular data refers to the structured data collection constructed according to the field arrangement result. Through the synergistic effect of these elements, the field position calculation result can be transformed into a specific data organization form, realizing the accurate restoration of field order during the reverse conversion of text content.
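The sort-and-fill step in [0051] amounts to sorting field identifiers by their reorganization position and emitting values in that order. A minimal sketch follows; the function name `rebuild_row` and the sample values are hypothetical, and a real table would hold many rows rather than one.

```python
from typing import Any, Dict, List, Tuple


def rebuild_row(
    values: Dict[str, Any], reorg_positions: Dict[str, int]
) -> List[Tuple[str, Any]]:
    """Order field identifiers by reorganization position, then attach their content."""
    ordered_fields = sorted(reorg_positions, key=reorg_positions.get)  # ascending
    return [(field_id, values[field_id]) for field_id in ordered_fields]


# Reorganization positions "1, 3, 2" for "Name, Age, Address", as in the example.
reorg_positions = {"Name": 1, "Age": 3, "Address": 2}
values = {"Name": "Li Lei", "Age": 30, "Address": "Shanghai"}  # illustrative data
row = rebuild_row(values, reorg_positions)
column_order = [field_id for field_id, _ in row]
```

Because Python's `sorted` is stable, fields that share a reorganization position keep their input order, which is one possible tie-breaking rule; the text does not specify one.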
[0053] S5. During the bidirectional conversion process, the sequence offset sequence and the traceability of sequence changes are continuously tracked, and the embedding distribution of sequence indicator fragments in the field content is adjusted based on the change trend to maintain the correspondence between the sequence reference sequence and the arrangement trajectory.
[0054] In this embodiment, S5 specifically refers to the following. During the bidirectional conversion process, the sequential offset sequence is collected and recorded in conversion execution order. After each conversion, the serial-number position differences in the sequential offset sequence are updated, and the traceability of order changes is determined again from the updated sequence, forming a continuous tracking sequence of sequential offset sequences and traceability determinations. In practice, the sequential offset sequence generated by each conversion between table data and text content is stored in chronological order, forming a serialized record set. After each conversion, the existing sequential offset sequence is updated by recalculating the serial-number position difference of each field in the current conversion result, and the traceability determination is re-executed on the updated sequence. Each determination result is recorded together with its sequential offset sequence. For example, the first conversion yields a sequential offset sequence of "+1, 0, -1", corresponding to a certain traceability level; after the second conversion it is updated to "+2, -1, 0", and the determination is performed and recorded again. Over multiple conversions, this produces time-series data of the form "sequence + determination", which reflects how the order change evolves and provides a basis for subsequent trend analysis.
[0055] The bidirectional conversion process represents the execution process of multiple round trip conversions between tabular data and text content. The sequential offset sequence represents the numerical sequence of field position changes after each conversion. Successive acquisition and recording represents the sequential acquisition and storage of each conversion result. The sequence position difference update represents the recalculation of field position changes in the new round of conversion results. The repeated determination of the traceability of sequence changes represents the reassessment of the recoverability of sequence changes after each update. The continuous tracking sequence represents the ordered data set formed by combining the sequential offset sequence with the corresponding determination results in chronological order. Through the synergistic effect of these elements, continuous monitoring and recording of the sequence change process can be achieved.
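One way to hold the continuous tracking sequence is an append-only log of (round, offset sequence, traceability level) records. The sketch below is an assumption about the data layout only: `TrackingLog` is a hypothetical name, and the traceability levels passed in are placeholder values, since the determination itself is made in step S3.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Record = Tuple[int, Tuple[int, ...], int]  # (round number, offsets, level)


@dataclass
class TrackingLog:
    """Chronological record of offset sequences and their traceability levels."""

    records: List[Record] = field(default_factory=list)

    def record(self, offsets: Sequence[int], level: int) -> None:
        round_no = len(self.records) + 1  # conversions are numbered from 1
        self.records.append((round_no, tuple(offsets), level))

    def history_of(self, field_index: int) -> List[int]:
        """Position differences of one field across all recorded conversions."""
        return [offsets[field_index] for _, offsets, _ in self.records]


log = TrackingLog()
log.record([+1, 0, -1], level=1)  # first conversion; level is a placeholder
log.record([+2, -1, 0], level=2)  # second conversion; level is a placeholder
first_field_history = log.history_of(0)
```

The per-field history returned by `history_of` is the input that a later trend analysis would consume.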
[0056] Based on the continuous tracking sequence, the trend of the serial-number position differences in the sequential offset sequence is analyzed. By comparing the direction and magnitude of change of the position differences at adjacent time points, a change trend sequence is extracted, and the embedding distribution adjustment parameters of the sequence indicator fragments in the field content are determined from it. In practice, the continuous tracking sequence is expanded chronologically and the sequential offset sequences at each moment are compared item by item, focusing on how the position difference of the same field changes between adjacent moments. A trend sequence is formed by judging whether the direction of change is consistent between adjacent moments and whether the magnitude of change increases or decreases. In a specific implementation, a difference sequence over time can be constructed for each field. For example, if the position differences of a field in three consecutive conversions are "+1, +2, +3", the trend is judged to be continuously increasing, whereas "+1, -1, +2" is judged to be fluctuating. Once the trend sequence is obtained, the embedding position and embedding density of the sequence indicator fragments in the field content are adjusted parametrically according to the trend type: for example, the embedding frequency is increased for a continuously increasing trend, and the embedding position distribution is adjusted for a fluctuating trend. This keeps the embedding distribution consistent with the order change trend and makes the order information more stable during conversion.
[0057] The continuous tracking sequence represents a combination of sequential offset sequences recorded in chronological order and the traceability of sequence changes. The trend of the position difference indicates the evolution of the position of the same field in different transformation stages. The direction of change indicates whether the difference changes in a positive or negative direction. The magnitude of change indicates the magnitude of the difference change. The trend sequence represents the sequence data formed by continuously recording these change features. The embedding distribution adjustment parameters represent a set of parameters used to control the embedding position and number of sequence indicator fragments in the field content. Through the combination of these elements, the matching between the embedding distribution of sequence indicator fragments and the dynamic features of sequence changes can be achieved.
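The trend judgment can be sketched by looking at the sign of the change between adjacent position differences of one field. The trend labels, the function names, and the shape of the returned parameters are hypothetical. The handling of a decreasing trend is an assumption, because the text only describes the increasing, fluctuating, and stable cases.

```python
from typing import Dict, List, Union


def classify_trend(diffs: List[int]) -> str:
    """Classify one field's position differences across consecutive conversions."""
    deltas = [later - earlier for earlier, later in zip(diffs, diffs[1:])]
    if all(d == 0 for d in deltas):
        return "stable"
    if all(d > 0 for d in deltas):
        return "increasing"
    if all(d < 0 for d in deltas):
        return "decreasing"
    return "fluctuating"


def adjustment_for(trend: str, current_count: int) -> Dict[str, Union[int, bool]]:
    """Map a trend to embedding adjustment parameters (illustrative values)."""
    if trend == "increasing":
        # Increase the embedding frequency.
        return {"count": current_count + 1, "redistribute": False}
    if trend == "fluctuating":
        # Keep the count but change where the fragments are placed.
        return {"count": current_count, "redistribute": True}
    if trend == "stable":
        # Reduce the number of embeddings, keeping at least one.
        return {"count": max(1, current_count - 1), "redistribute": False}
    # Assumption: a decreasing trend leaves the distribution unchanged.
    return {"count": current_count, "redistribute": False}


increasing = classify_trend([1, 2, 3])
fluctuating = classify_trend([1, -1, 2])
params = adjustment_for(increasing, current_count=1)
```

Here "+1, +2, +3" is classified as increasing and "+1, -1, +2" as fluctuating, in line with the example in [0056].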
[0058] Based on the embedding distribution adjustment parameters, the embedding distribution of the sequence indicator fragment in the field content is adjusted. By changing the embedding position and number of the sequence indicator fragment in the field content, the correspondence between the field identifier and the sequence number position is maintained between the sequence reference sequence and the arrangement trajectory.
[0059] When adjusting the sequence indicator fragments according to the embedding distribution adjustment parameters, the control information about the embedding position and number of embeddings in the embedding distribution adjustment parameters can be read first. Then, the content of each field is scanned and located to determine the current embedding position distribution of the sequence indicator fragments. The embedding position is then reconfigured according to the embedding distribution adjustment parameters. For example, the sequence indicator fragment originally located at the beginning of the field can be moved to the middle or end of the field, or multiple embedding points can be added to the field content to improve the coverage density of the sequence information. At the same time, the number of embeddings can be adjusted to match the changing trend. In specific implementation, the field content can be segmented, and a sequence indicator fragment can be inserted at each segment position. The insertion interval can be controlled according to the changing trend. For example, when the difference in the position of the sequence number of a field changes continuously in multiple transformations, the number of embeddings can be increased in the field content. When the change tends to be stable, the number of embeddings can be reduced. In this way, a dynamically distributed sequence identifier can be formed in the field content, so that the correspondence between the field and the original sequence number can still be accurately restored during subsequent parsing.
[0060] The embedding distribution adjustment parameters represent a set of parameters used to control the embedding position and number of sequence indicator fragments in the field content. The sequence indicator fragment represents the encoded fragment used to identify the order information of the field. The field content represents the specific data value corresponding to the field. The embedding distribution represents the spatial distribution of the sequence indicator fragments in the field content. The embedding position represents the specific position point of the sequence indicator fragment inserted in the field content. The embedding number represents the number of times the sequence indicator fragment is inserted in the same field content. The sequence reference sequence represents the reference sequence of the original order of the field. The arrangement trajectory represents the actual arrangement order of the field in the text. Through the synergistic effect of these elements, the distribution of the sequence indicator fragments can be dynamically adjusted during multiple transformations, so that the correspondence between the field identifier and the sequence position can be maintained continuously under different transformation states.
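The segment-and-insert procedure in [0059] can be sketched as below. The marker format `[#n]` stands in for the preset encoding symbols, which this section does not define, and the function names are hypothetical. The sketch also assumes that field content never contains a literal `[#n]`, which a real encoding would need to guarantee.

```python
import math
import re
from typing import List, Tuple

MARKER = "[#{}]"  # hypothetical encoding of a serial number
MARKER_RE = re.compile(r"\[#(\d+)\]")


def embed_fragments(content: str, serial: int, count: int) -> str:
    """Split content into equal-sized segments and put a marker before each.

    At most `count` markers are inserted; fewer when the content is too short
    to be split into `count` non-empty segments.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    size = max(1, math.ceil(len(content) / count))
    segments = [content[i:i + size] for i in range(0, len(content), size)] or [""]
    marker = MARKER.format(serial)
    return "".join(marker + segment for segment in segments)


def strip_fragments(text: str) -> Tuple[str, List[int]]:
    """Remove markers and return the clean content plus the serial numbers found."""
    serials = [int(s) for s in MARKER_RE.findall(text)]
    return MARKER_RE.sub("", text), serials


embedded = embed_fragments("Shanghai", serial=3, count=2)
clean, serials = strip_fragments(embedded)
```

Raising `count` inserts the marker more densely, so the serial number stays recoverable even if part of the field content is rewritten. This corresponds to increasing the number of embeddings when the order keeps changing.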
[0061] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any other combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions according to the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. Computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless means (e.g., infrared, wireless, microwave, etc.). A computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that includes one or more sets of available media. Available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media. Semiconductor media can be solid-state drives.
[0062] It should be understood that in the various embodiments of this application, the order of the above-mentioned processes does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0063] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0064] In the several embodiments provided in this application, it should be understood that the disclosed systems and methods can be implemented in other ways. The embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and other division methods are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
[0065] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0066] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0067] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for bidirectional conversion between tabular data and text content, characterized in that it comprises the following steps:
S1. Generate a sequential baseline sequence for the fields in the table data, construct a sequence indicator fragment based on preset encoding symbols, embed the sequence indicator fragment into the field content to form combined data, and record the arrangement trajectory of the fields in the text content.
S2. Establish the field correspondence based on the sequential baseline sequence and the arrangement trajectory, generate the sequential offset sequence, and determine, based on the sequential offset sequence, whether the field order of the table data is semantically rearranged when it is converted into text content.
S3. When the field order is semantically rearranged, divide the sequential offset sequence into segments to identify stable segments and disturbed segments, and determine the traceability of order changes based on the segment distribution and connection relationship.
S4. When converting text content into tabular data, perform field arrangement and reorganization based on the traceability of order changes, the sequential baseline sequence, and the sequential offset sequence, so that the field order is adjusted according to the traceability of order changes.
S5. During the bidirectional conversion process, continuously track the sequential offset sequence and the traceability of order changes, and adjust the embedding distribution of the sequence indicator fragments in the field content based on the change trend, so as to maintain the correspondence between the sequential baseline sequence and the arrangement trajectory.
2. The method for bidirectional conversion between tabular data and text content according to claim 1, characterized in that, S1 specifically refers to: The fields in the table data are sequentially labeled according to their original arrangement order, and consecutive serial numbers are assigned to each field. The field identifiers are then associated with their corresponding serial numbers to construct a sequential baseline sequence. Based on the sequence number in the sequential baseline sequence, the sequence is encoded according to the preset encoding symbols. The preset encoding symbols are a set of fixed characters that are pre-defined to identify sequence information. The sequence number is mapped to the corresponding character combination to form a sequence indicator fragment, and the sequence indicator fragment is combined with the field identifier. Sequence indicator fragments are embedded into field content to form combined data. When the tabular data is converted into text content, the combined data is recorded sequentially according to the output order of the fields in the text content, generating the arrangement trajectory of the fields in the text content.
3. The method for bidirectional conversion between tabular data and text content according to claim 1, characterized in that S2 specifically includes the following steps:
S201. Based on the correspondence between field identifiers and serial numbers in the sequential baseline sequence, and combined with the arrangement order of the sequence indicator fragments in the arrangement trajectory of the fields in the text content, match the field identifiers in the sequential baseline sequence with the sequence indicator fragments in the arrangement trajectory, so as to establish the field correspondence between the serial-number position of each field in the sequential baseline sequence and its serial-number position in the arrangement trajectory.
S202. Based on the position of each field in the sequential baseline sequence and the position of each field in the arrangement trajectory, calculate the position difference for each field, and arrange the position differences according to the position order in the arrangement trajectory to form the sequential offset sequence.
S203. Judge the sequential offset sequence based on the distribution of the serial-number position differences it contains. When every serial-number position difference in the sequential offset sequence is zero or shows a consistent trend, it is determined that the field order of the table data has not been semantically rearranged when converted into text content. When the sequential offset sequence contains inconsistent serial-number position differences, it is determined that the field order of the table data has been semantically rearranged when converted into text content.
4. The method for bidirectional conversion between tabular data and text content according to claim 3, characterized in that, S203 specifically refers to: Based on the arrangement order of the position differences of each index in the sequential offset sequence, each position difference is read item by item, and a distribution sequence of position differences is constructed according to the arrangement order. At the same time, the adjacent change relationship between position differences is recorded to form a change sequence of position differences. Based on the change sequence of the position difference of the serial number, the change direction and change magnitude between adjacent position differences of the serial number are compared. The position differences of the serial number with the same change direction and the same change magnitude are divided into the same change segment, and the position differences of the serial number with different change directions or different change magnitudes are divided into different change segments, thus forming the segment distribution result of the position difference of the serial number. Based on the segment distribution results of the sequence number position difference, the order offset sequence is determined. When the segment distribution result contains only a single segment and the sequence number position difference is zero or shows a consistent trend, it is determined that the field order of the table data has not undergone semantic rearrangement when converted to text content. When the segment distribution result contains multiple segments and there are inconsistent sequence number position differences, it is determined that the field order of the table data has undergone semantic rearrangement when converted to text content.
5. The method for bidirectional conversion between tabular data and text content according to claim 1, characterized in that S3 specifically includes the following steps:
S301. When the field order is semantically rearranged, read the serial-number position differences continuously according to their arrangement order in the sequential offset sequence, and perform segmentation according to the change direction and change magnitude between adjacent position differences. Serial-number position differences with the same change direction and the same change magnitude are divided into the same segment, forming the segmentation result of the sequential offset sequence.
S302. After completing the segment division, analyze the variation characteristics of the serial-number position differences in each segment. Identify the segments in which the serial-number position differences change consistently or continuously in one direction as stable segments, and identify the segments in which the position differences change direction or change discontinuously in magnitude as disturbed segments, forming the identification results of stable segments and disturbed segments.
S303. Based on the segment distribution of stable and disturbed segments and the connection relationship between adjacent segments, statistically analyze the distribution position and segment length of each segment in the sequential offset sequence. Combined with the proportion of stable segments in the segment distribution and the connection order between stable and disturbed segments, determine the traceability of order changes.
6. The method for bidirectional conversion between tabular data and text content according to claim 5, characterized in that, S303 specifically refers to: Based on the segment distribution of stable and disturbed segments, the distribution position of each segment in the sequential offset sequence is marked, and the number of sequence position differences contained in each segment is counted to form a segment length set. At the same time, the arrangement order of each segment in the sequential offset sequence is recorded. Based on the set of segment lengths, the ratio between the sum of stable segment lengths and the total length of the sequential offset sequence is calculated, and the adjacent connection relationships between stable segments and disturbed segments are extracted according to the segment arrangement order to form a segment connection sequence; The traceability of sequence changes is classified into levels based on the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence, as well as the segment connection sequence. The first level of traceability is determined when the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence is greater than a preset threshold and the number of consecutive occurrences of stable segments in the segment connection sequence is greater than a preset number. The second level of traceability is determined when the ratio between the sum of the lengths of stable segments and the total length of the sequence offset sequence is less than or equal to a preset threshold and stable segments and disturbed segments alternate in the segment connection sequence.
7. The method for bidirectional conversion between tabular data and text content according to claim 1, characterized in that, S4 specifically refers to: When converting text content into tabular data, the sequential offset sequence and the sequential base sequence are jointly parsed based on the traceability of the sequence change. The field identifiers parsed from the text content are matched with the position differences of the serial numbers in the sequential offset sequence, and the initial position of each field is determined by combining the correspondence between the field identifiers and serial numbers in the sequential base sequence. After the initial positions of each field are determined, the field arrangement and recombination path is selected according to the degree of traceability of the sequence change. When the degree of traceability of the sequence change corresponds to the first degree of traceability, the position of the sequence number in the sequence base sequence is used as the basis for field arrangement. When the degree of traceability of the sequence change corresponds to the second degree of traceability, the initial position of each field is offset by the difference of the position of the sequence offset sequence to form the recombination position of each field. The field identifiers are sorted and rearranged according to the reorganization position of each field to form a field arrangement result. Table data is then generated based on the field arrangement result to realize the adjustment of the field arrangement order when the text content is reverse-converted into table data according to the traceability of the order change.
8. The method for bidirectional conversion between tabular data and text content according to claim 1, characterized in that, S5 specifically refers to: During the bidirectional conversion process, the sequence offset sequence is collected and recorded sequentially according to the conversion execution order. After each conversion, the difference in the sequence position in the sequence offset sequence is updated. At the same time, the traceability of sequence change is repeatedly determined based on the updated sequence offset sequence, forming a continuous tracking sequence of sequence offset sequence and traceability of sequence change. Based on the continuous tracking sequence, the changing trend of the position difference of the sequence number in the sequential offset sequence is analyzed. By comparing the changing direction and magnitude of the position difference of the sequence number in the sequential offset sequence at adjacent time points, the changing trend sequence is extracted, and the embedding distribution adjustment parameters of the sequence indicator segment in the field content are determined based on the changing trend sequence. Based on the embedding distribution adjustment parameters, the embedding distribution of the sequence indicator fragment in the field content is adjusted. By changing the embedding position and number of the sequence indicator fragment in the field content, the correspondence between the field identifier and the sequence number position is maintained between the sequence reference sequence and the arrangement trajectory.