Table data processing method, device and equipment, storage medium and program product

By performing structural encoding network processing on tabular data in front of a large language model, adding row and column position information to each cell and performing self-attention interaction, the problem that the large language model cannot fully perceive the two-dimensional structure of the table is solved, and a deep understanding and accurate query of the tabular data is achieved.

CN122240653A · Pending · Publication Date: 2026-06-19 · CHINA MOBILE(ZHEJIANG) RESEARCH & INNOVATION INSTITUTE +2

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHINA MOBILE(ZHEJIANG) RESEARCH & INNOVATION INSTITUTE
Filing Date: 2026-03-05
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing large language models cannot fully perceive and model the two-dimensional grid structure of tables when processing tabular data, resulting in incomplete understanding of tables and difficulty in comprehending and reasoning about complex queries that rely on relative positions and topological relationships in the two-dimensional structure, thus limiting their accuracy and reliability in real-world scenarios.

Method used

By performing structural encoding network processing on the table data before inputting it into the large language model, row and column position information is added to each cell. Self-attention calculation is used to ensure that each cell only interacts with cells in the same row and column, forcing the structural feature perception model to follow the inherent row and column logic of the table, thereby completely and accurately capturing the intra-row and intra-column structural relationships of the table.

Benefits of technology

It significantly enhances the large language model's ability to perceive two-dimensional grid structures, enabling it to parse query semantics based on a deep understanding of table structures, thereby improving the model's accuracy and reliability in processing tabular data.



Abstract

This application discloses a method, apparatus, device, storage medium, and program product for processing tabular data. The method includes: acquiring a first table and a first query condition; inputting the first table into a structure encoding network of a table structure feature perception model for structure encoding processing to obtain a first data sequence; wherein the structure encoding processing includes: adding row position information and column position information to each cell of the first table, and using self-attention calculation to ensure that each cell only interacts with cells in the same row and column; inputting the first data sequence into a large language model to enable the large language model to process the first query condition and obtain a first query result. According to the embodiments of this application, the perception capability of the large language model for two-dimensional grid structures can be enhanced.


Description

Technical Field

[0001] This application belongs to the field of tabular data processing technology, and in particular relates to a tabular data processing method, apparatus, device, storage medium and program product.

Background Technology

[0002] In the era of big data and artificial intelligence, large language models (LLMs) have become core tools for understanding and generating natural language. As their applications deepen, enabling these models to effectively process and understand structured data that is not pure text, especially the tabular data widely found in finance, government affairs, and commerce, has become an important direction for research and practice.

[0003] Currently, the industry has proposed several technical solutions aimed at enhancing the table processing capabilities of large language models. One mainstream approach is to treat tables as an independent data modality, extracting table features through dedicated table encoders and connecting them to the large language model. For example, some methods encode each row of a table as an independent text sequence, or utilize specific neural network architectures to model the set relationships between rows. These encoders typically focus on capturing semantic information from the table content.

[0004] However, existing methods have a significant technical limitation: they cannot fully perceive and model the inherent two-dimensional grid structure of tables. Specifically, a table is a matrix composed of rows and columns, where the data in each cell is influenced by both the context of its row and the attributes of its column. Existing solutions often focus only on unidirectional serialization modeling along either the row or column direction, failing to explicitly and comprehensively encode the precise positional relationships of cells in two-dimensional space (i.e., which row and column they are in) and the resulting row-column interrelationships into the model representation. This results in an incomplete understanding of the table by the model, making it difficult to fully comprehend and reason about complex queries that rely on relative positions and topological relationships within the two-dimensional structure, thus limiting its accuracy and reliability in real-world scenarios.

Summary of the Invention

[0005] This application provides a method, apparatus, device, computer storage medium, and program product for processing tabular data, which can enhance the perception ability of large language models to two-dimensional grid structures.

[0006] In one aspect, embodiments of this application provide a method for processing tabular data, the method comprising: obtaining a first table and a first query condition; inputting the first table into the structure encoding network of a table structure feature perception model for structure encoding processing to obtain a first data sequence, wherein the structure encoding processing comprises adding row position information and column position information to each cell of the first table, and using self-attention calculation so that each cell exchanges information only with cells in the same row and column; and inputting the first data sequence into a large language model to enable the large language model to process the first query condition and obtain a first query result.

[0007] In some possible implementations, based on the above-mentioned table data processing method, the table structure feature perception model further includes a filtering module. Before inputting the first table into the structure encoding network of the table structure feature perception model, the method further includes: inputting the first table and the first query conditions into the filtering module of the table structure feature perception model, filtering based on the semantic relevance of the first query conditions and the first table to obtain the filtered first table; inputting the first table into the structure encoding network of the table structure feature perception model, including: inputting the filtered first table into the structure encoding network of the table structure feature perception model.

[0008] In some possible implementations, filtering based on the semantic relevance of the first query condition and the first table to obtain the filtered first table includes: converting the first query condition, each row of data in the first table, and each column header into corresponding semantic representations; calculating a first similarity between the semantic representation of the first query condition and the semantic representation of each row of data in the first table; calculating a second similarity between the semantic representation of the first query condition and the semantic representation of each column header in the first table; and retaining the rows whose first similarity is higher than a preset threshold and the columns whose second similarity is higher than a preset threshold to obtain the filtered first table.
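The filtering step described above can be sketched as follows. This is a minimal illustration only: the bag-of-words `embed` function stands in for whatever semantic representation the filtering module actually uses, and the names `filter_table`, `row_threshold`, and `col_threshold`, as well as the threshold values, are assumptions that do not come from the application.

```python
import math
from collections import Counter


def embed(text):
    """Toy semantic representation (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_table(header, rows, query, row_threshold=0.1, col_threshold=0.1):
    """Keep rows and columns whose similarity to the query exceeds a threshold."""
    q = embed(query)
    # First similarity: query versus each row of data.
    kept_rows = [row for row in rows if cosine(q, embed(" ".join(row))) > row_threshold]
    # Second similarity: query versus each column header.
    kept_cols = [j for j, name in enumerate(header) if cosine(q, embed(name)) > col_threshold]
    new_header = [header[j] for j in kept_cols]
    new_rows = [[row[j] for j in kept_cols] for row in kept_rows]
    return new_header, new_rows


header = ["city", "revenue", "manager"]
rows = [["hangzhou", "120", "li"], ["ningbo", "95", "wang"], ["wenzhou", "80", "chen"]]
filtered_header, filtered_rows = filter_table(header, rows, "revenue of city hangzhou")
print(filtered_header, filtered_rows)
```

In practice the two thresholds would likely be tuned separately, since row text and column headers differ greatly in length and vocabulary.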

[0009] In some possible implementations, based on the above-mentioned tabular data processing method, the method further includes the following training steps: obtaining a first training sample set; wherein the first training sample set includes multiple first training samples, and the first training samples contain a second table, a second query condition, and a second query result corresponding to the second query condition; inputting the second table into the structure encoding network of the table structure feature perception model for structure encoding processing to obtain a second data sequence; inputting the second data sequence into a large language model to enable the large language model to process the second query condition and obtain a first target query result; determining a first loss value of the first loss function based on the first target query result and the second query result; if the first loss value does not meet the first training stopping condition, keeping the parameters of the large language model unchanged, adjusting the parameters of the structure encoding network according to the first loss value to obtain an updated table structure feature perception model; and returning to inputting the second table into the structure encoding network of the table structure feature perception model until the first loss value meets the first training stopping condition to obtain a trained target table structure feature perception model.
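The training procedure above updates only the structure encoding network while the large language model stays frozen. The toy sketch below illustrates that control flow with two scalar parameters. It is not the application's model: `w_enc`, `w_llm`, the squared-error loss, the learning rate, and the threshold-based stopping condition are all hypothetical stand-ins.

```python
def train_encoder(samples, w_enc=0.0, w_llm=2.0, lr=0.02, loss_stop=1e-6, max_steps=10_000):
    """Gradient descent on w_enc only; w_llm is never updated (frozen)."""
    loss = float("inf")
    for _ in range(max_steps):
        # Forward pass: the encoder output feeds the frozen downstream model.
        loss = sum((w_llm * (w_enc * x) - y) ** 2 for x, y in samples) / len(samples)
        # Assumed form of the training stopping condition: loss below a threshold.
        if loss < loss_stop:
            break
        # Gradient of the mean squared error with respect to w_enc only.
        grad = sum(2 * (w_llm * w_enc * x - y) * w_llm * x for x, y in samples) / len(samples)
        w_enc -= lr * grad
    return w_enc, w_llm, loss


# Toy samples whose targets follow y = 6x, so the optimum is w_enc = 3 when w_llm = 2.
samples = [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]
w_enc, w_llm, final_loss = train_encoder(samples)
print(w_enc, w_llm, final_loss)
```

Freezing the downstream parameter means the encoder must learn to produce inputs the existing model already handles, which is the design choice the paragraph describes.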

[0010] In some possible implementations, based on the above-mentioned tabular data processing method, the first table includes multiple first tables, and the table structure feature perception model also includes a self-attention module. Before inputting the first data sequence into the large language model, the method further includes: obtaining the first data sequence corresponding to each first table; concatenating the first data sequences corresponding to multiple first tables to generate a third data sequence; inputting the third data sequence into the self-attention module of the table structure feature perception model to perform self-attention calculation, so that the first data sequences corresponding to each first table can interact with each other to obtain a fourth data sequence; inputting the first data sequence into the large language model includes: inputting the fourth data sequence into the large language model.

[0011] In some possible implementations, inputting the third data sequence into the self-attention module of the table structure feature perception model for self-attention calculation, so that the first data sequences corresponding to each first table can interact with each other to obtain the fourth data sequence, includes: adding an aggregation identifier to the beginning of the third data sequence; and inputting the third data sequence with the aggregation identifier added into the self-attention module of the table structure feature perception model for self-attention calculation, so that the aggregation identifier can interact with the first data sequences corresponding to each first table to obtain the fourth data sequence.
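As a rough illustration of the multi-table step, the sketch below concatenates two per-table sequences, prepends an aggregation identifier, and runs one unmasked self-attention pass. Several things here are assumptions rather than details from the application: the single head, the identity query/key/value projections, the scaling by the square root of the dimension, and the zero-initialised aggregation vector.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Unmasked single-head attention with identity projections (assumption)."""
    h = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(h) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(h)])
    return out


table_a = [[1.0, 0.0], [0.5, 0.5]]   # first data sequence of one table (toy values)
table_b = [[0.0, 1.0]]               # first data sequence of another table
agg = [0.0, 0.0]                     # aggregation identifier (hypothetical initialisation)

third = [agg] + table_a + table_b    # third data sequence with the identifier in front
fourth = self_attention(third)       # fourth data sequence
print(fourth[0])
```

With a zero query the aggregation position attends uniformly, so its output is the mean of every vector in the concatenated sequence, which shows how a single position can summarise all tables.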

[0012] In some possible implementations, based on the above-mentioned table data processing method involving multiple first tables (i.e., the implementation involving multi-table data interaction), the method further includes the following multi-table training steps: obtaining a second training sample set, wherein the second training sample set includes multiple second training samples, and each second training sample contains multiple third tables, a third query condition, and a third query result corresponding to the third query condition; inputting each third table into the structure encoding network of the table structure feature perception model for structure encoding processing to obtain a fifth data sequence; concatenating the fifth data sequences corresponding to the multiple third tables to generate a sixth data sequence; inputting the sixth data sequence into the self-attention module of the table structure feature perception model for self-attention calculation, so that the fifth data sequences corresponding to the third tables interact with each other to obtain a seventh data sequence; inputting the seventh data sequence into the large language model to enable the large language model to process the third query condition and obtain a second target query result; determining a second loss value of a second loss function based on the second target query result and the third query result; if the second loss value does not meet the second training stopping condition, keeping the parameters of the large language model unchanged and adjusting the parameters of the structure encoding network and the self-attention module according to the second loss value to obtain an updated table structure feature perception model; and returning to the step of inputting each third table into the structure encoding network until the second loss value meets the second training stopping condition, thus obtaining the trained target table structure feature perception model.

[0013] On the other hand, embodiments of this application provide a tabular data processing apparatus, comprising: a data acquisition module for acquiring a first table and a first query condition; a structure encoding module for inputting the first table into the structure encoding network of a table structure feature perception model for structure encoding processing to obtain a first data sequence; wherein the structure encoding processing includes: adding row position information and column position information to each cell of the first table, and using self-attention calculation to ensure that each cell only interacts with cells in the same row and column; and an inference output module for inputting the first data sequence into a large language model to enable the large language model to process the first query condition and obtain a first query result.

[0014] In another aspect, embodiments of this application provide an electronic device, the device including: a processor and a memory storing computer program instructions; the processor executes the computer program instructions to implement a tabular data processing method.

[0015] In another aspect, embodiments of this application provide a computer storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, a table data processing method is implemented.

[0016] In another aspect, embodiments of this application provide a computer program product in which instructions, when executed by the processor of an electronic device, cause the electronic device to perform a tabular data processing method.

[0017] The table data processing method, apparatus, device, and computer storage medium of this application embodiment preprocess the first table using a structural encoding network before inputting it into a large language model. This structural encoding network adds row and column position information to each cell. This feature directly addresses the deficiency in existing technologies that ignore the corresponding positional relationships between elements in rows and columns, explicitly encoding the two-dimensional structural information of the table into the data. Furthermore, the structural encoding network constrains each cell to interact only with cells in the same row and column through self-attention calculation. This mechanism forces the structural feature perception model to strictly follow the inherent row and column logic of the table when extracting features, thereby completely and accurately capturing the intra-row and intra-column structural relationships of the table. Finally, the table data sequence, incorporating precise row and column structural information, is input into the large language model for query processing, enabling the model to parse query semantics based on a deep understanding of the table structure, thereby enhancing the large language model's ability to perceive two-dimensional grid structures.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart illustrating a tabular data processing method provided in one embodiment of this application;
Figure 2 is a flowchart illustrating a tabular data processing method provided in another embodiment of this application;
Figure 3 is a flowchart illustrating a tabular data processing method provided in another embodiment of this application;
Figure 4 is a schematic diagram of the architecture of a tabular data processing method provided in another embodiment of this application;
Figure 5 is a schematic diagram of the architecture of a tabular data processing method provided in another embodiment of this application;
Figure 6 is a schematic diagram of the structure of a tabular data processing device provided in another embodiment of this application;
Figure 7 is a schematic diagram of the structure of an electronic device provided in another embodiment of this application.

Detailed Implementation

[0020] The features and exemplary embodiments of various aspects of this application will be described in detail below. To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only intended to explain this application and not to limit it. For those skilled in the art, this application can be implemented without some of these specific details. The following description of the embodiments is merely to provide a better understanding of this application by illustrating examples.

[0021] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.

[0022] It should be noted that the acquisition, storage, use, and processing of data in this application embodiment all comply with the relevant provisions of national laws and regulations.

[0023] It should be noted that in the embodiments of this application, certain software, components, models and other existing solutions in the industry may be mentioned. These should be regarded as exemplary and are only intended to illustrate the feasibility of implementing the technical solution of this application. However, it does not mean that the applicant has used or necessarily used the solution.

[0024] The existing large language model table processing solutions cannot fully perceive and model the inherent two-dimensional grid structure of tables. The core problem stems from the fact that, at the technical design level, these solutions do not conduct targeted structural modeling design for the two-dimensional matrix nature of table data. They only treat tables as one-dimensional text sequences or single-dimensional collection data, and lack the ability to explicitly model the two-dimensional positional relationships and row-column intersections of tables from the underlying design. Ultimately, this results in the model being unable to fully capture the two-dimensional structural information of tables.

[0025] Specifically, on the one hand, existing table encoders do not introduce a two-dimensional position embedding mechanism for rows and columns, and cannot encode the precise row and column belonging information of cells in the two-dimensional grid into the feature representation. The model cannot perceive the relative positional relationship between cells and loses the basic positional features for understanding the two-dimensional structure of the table, resulting in a lack of original structural information of the table during the feature extraction stage. On the other hand, the attention mechanism and neural network architecture of existing technologies have the design limitation of unidirectional modeling. They may only perform serialized text modeling of the table along the row direction, or only independently represent column attributes, or only model the set relationship between rows. They have not designed an attention mechanism that can simultaneously constrain and capture the cross-relationships within rows, columns, and between rows and columns. As a result, the feature representation of cells can only integrate single-dimensional contextual information and cannot integrate the topological relationship semantics brought about by the cross-row and column intersections in the two-dimensional grid. It is difficult to depict the inherent logical relationship of cells in the table due to their two-dimensional positions. Meanwhile, some solutions forcibly convert two-dimensional tables into one-dimensional text sequences for encoding. This conversion process inevitably loses the original two-dimensional structural association information of the table. Subsequent semantic modeling can only be based on the incomplete structural information and cannot restore the inherent matrix logic of the table. 
Ultimately, this makes it difficult for the model to effectively reason about complex queries that rely on two-dimensional structural topological relationships, thus limiting its accuracy and reliability in processing table data in real-world scenarios.

[0026] The table data processing method of this application preprocesses the first table using a structural encoding network before inputting it into a large language model. This structural encoding network adds row and column position information to each cell. This feature directly addresses the deficiency in existing technologies that ignore the corresponding positional relationships between elements in rows and columns, explicitly encoding the two-dimensional structural information of the table into the data. Furthermore, the structural encoding network constrains each cell to interact only with cells in the same row and column through self-attention calculation. This mechanism forces the structural feature perception model to strictly follow the inherent row and column logic of the table when extracting features, thereby completely and accurately capturing the intra-row and intra-column structural relationships of the table. Finally, the table data sequence, incorporating precise row and column structural information, is input into the large language model for query processing. This enables the model to parse query semantics based on a deep understanding of the table structure, significantly enhancing the large language model's ability to perceive two-dimensional grid structures.

[0027] To address the problems of the prior art, embodiments of this application provide a method, apparatus, device, computer storage medium, and computer program product for processing tabular data. The tabular data processing method provided in this application embodiment will be described first below.

[0028] Figure 1 shows a flowchart of a tabular data processing method according to an embodiment of this application. As shown in Figure 1, the method includes the following steps.

[0029] S101, obtain the first table and the first query condition.

[0030] As an example, the first table can refer to structured data with a two-dimensional grid structure, consisting of intersecting rows and columns, containing at least one header row and several record rows. The intersection of rows and columns is the cell, and each cell stores structured data in text form. It is a structured data object for information extraction and reasoning by a large language model.

[0031] As an example, the first query condition can refer to the information query requirements put forward by the user for the structured data in the first table, presented in the form of natural language text, which is used to instruct the large language model to perform corresponding information extraction, logical reasoning and other operations on the first table.

[0032] Specifically, as an example, structured data from a first table can be read from a data storage medium, which may include a local database, a cloud data warehouse, a distributed file system, or other storage media. During the reading process, the two-dimensional row and column structure of the first table and the integrity of the text data within the cells are maintained, without altering the original structural features of the table. Simultaneously, the system receives first query conditions in natural language text format submitted by the user through a human-computer interaction interface, application programming interface, or other input methods. Basic text preprocessing operations are performed on the first query conditions, including removing invalid special characters, formatting the text, and completing incomplete query semantics, ensuring the textual integrity and semantic validity of the first query conditions, thus providing a foundation for the semantic understanding of subsequent large language models.

[0033] S102, the first table is input into the structure encoding network of the table structure feature perception model and structure encoding is performed to obtain the first data sequence; wherein, the structure encoding process includes: adding row position information and column position information to each cell of the first table, and using self-attention calculation to make each cell only interact with cells in the same row and column.

[0034] As an example, a table structure feature perception model can refer to a neural network model used to extract the two-dimensional structural features and semantic features of a table. This model integrates functional modules such as a structure encoding network, which can convert the two-dimensional structured data of a table into a vector sequence form that is adapted to the processing requirements of large language models, thereby realizing the fusion encoding of table structure features and semantic features.

[0035] As an example, a structural encoding network can refer to a functional network in a table structure feature perception model used to perform two-dimensional structural feature encoding on a table. It is built on the Transformer architecture and integrates a position embedding module and a self-attention module with a table mask matrix, which can realize the addition of table position information and the modeling of row and column interaction relationships.

[0036] As an example, structured encoding processing can refer to the feature encoding process of adding positional information to the cells of a table through a structured encoding network and modeling the information interaction relationship between the rows and columns of the table based on a self-attention mechanism with mask constraints, thereby converting the two-dimensional structured data of the table into a one-dimensional vector sequence.

[0037] As an example, row position information can refer to the positional features of the row to which a cell in a table belongs. It exists in the form of a learnable row position embedding vector, which can be fused into the embedding vectors of all cells in the corresponding row through a broadcast mechanism.

[0038] As an example, column position information can refer to the positional features of the column to which a cell in a table belongs. It exists in the form of a learnable column position embedding vector, which can be fused into the embedding vectors of all cells in the corresponding column through a broadcast mechanism.

[0039] As an example, self-attention computation can refer to the feature computation process that uses a self-attention mechanism and combines a table mask matrix to constrain the scope of attention computation, thereby enabling directional information interaction between table cells. This allows a cell to fuse feature information only with cells in the same row and column.

[0040] As an example, the first data sequence can refer to a one-dimensional cell embedding vector sequence output after structural encoding, which integrates the semantic features and two-dimensional structural features of the table. The sequence length is consistent with the number of cells in the table, and the vector dimension is the preset embedding dimension.

[0041] Specifically, as an example, the first table can be input into the structure encoding network of the table structure feature perception model. The structure encoding network first parses the cell text of the first table, processing the text data of each cell through the tokenizer and embedding layer of the large language model. This processing includes word segmentation, token encoding, embedding-layer lookup, and average pooling, converting the text data of each cell into an initial embedding vector with a preset embedding dimension h. Based on the two-dimensional row and column structure of the first table, the initial embedding vectors of all cells are combined into a three-dimensional embedding matrix $E \in \mathbb{R}^{r \times c \times h}$ that matches the rows and columns of the table, where r is the total number of rows in the first table, c is the total number of columns in the first table, and h is the embedding dimension.
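The cell-embedding step can be illustrated with a small sketch. The whitespace tokenizer and the deterministic `token_vector` function below are hypothetical stand-ins for the tokenizer and embedding layer of the large language model, and `H = 4` is a toy embedding dimension. Only the overall flow (tokenize, look up, average-pool, arrange as r × c × h) follows the description.

```python
H = 4  # preset embedding dimension h (toy value)


def token_vector(token):
    """Hypothetical stand-in for an embedding-layer lookup: a deterministic vector."""
    code = sum(map(ord, token))
    return [((code * (d + 1)) % 7) / 7.0 for d in range(H)]


def embed_cell(text):
    """Tokenize a cell, look up each token, then average-pool into one vector."""
    tokens = text.split() or [""]            # an empty cell maps to the zero vector
    vectors = [token_vector(t) for t in tokens]
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(H)]


table = [
    ["city", "annual revenue"],              # header row
    ["hangzhou", "120"],
    ["ningbo", "95"],
]

# Three-dimensional embedding matrix as nested lists: E[row][column][dimension].
E = [[embed_cell(cell) for cell in row] for row in table]
r, c, h = len(E), len(E[0]), len(E[0][0])
print(r, c, h)
```

Average pooling gives every cell exactly one vector regardless of how many tokens it contains, which is what lets the cells be arranged on a regular r × c grid.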

[0042] Specifically, as an example, structural encoding can be performed on the three-dimensional embedding matrix $E$ of shape $r \times c \times h$ by first adding row and column position information to each cell: the row position information is a row position embedding $P_{row}$ of shape $r \times 1 \times h$, and the column position information is a column position embedding $P_{col}$ of shape $1 \times c \times h$. The row position embeddings and column position embeddings are fused into the three-dimensional embedding matrix through a broadcast mechanism. The fusion calculation formula is as follows: $E' = E + P_{row} + P_{col}$. This yields a three-dimensional embedding matrix $E'$ that integrates position information, so that the embedding vector of each cell carries its own two-dimensional position features.
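The broadcast fusion of position embeddings can be sketched in plain Python. The sketch assumes the fusion is an element-wise addition, and it uses fixed toy values where the description calls for learnable embeddings. The function name `add_position_embeddings` is hypothetical.

```python
def add_position_embeddings(E, row_pos, col_pos):
    """Add one row vector and one column vector to every cell embedding.

    E is indexed E[i][j][d]; row_pos[i] and col_pos[j] are vectors of length h.
    Sharing row_pos[i] across all columns (and col_pos[j] across all rows)
    is what a broadcast over the missing axis would do.
    """
    r, c, h = len(E), len(E[0]), len(E[0][0])
    return [
        [[E[i][j][d] + row_pos[i][d] + col_pos[j][d] for d in range(h)] for j in range(c)]
        for i in range(r)
    ]


r, c, h = 2, 3, 2
E = [[[0.0] * h for _ in range(c)] for _ in range(r)]     # toy all-zero cell embeddings
row_pos = [[0.1 * (i + 1)] * h for i in range(r)]         # stands in for a learnable r x 1 x h tensor
col_pos = [[0.01 * (j + 1)] * h for j in range(c)]        # stands in for a learnable 1 x c x h tensor

E_pos = add_position_embeddings(E, row_pos, col_pos)
print(E_pos[1][2])
```

Because each cell receives the sum of its row vector and its column vector, two cells share a component exactly when they share a row or a column.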

[0043] Specifically, as an example, the three-dimensional embedding matrix incorporating position information can be expanded row-wise into a one-dimensional cell embedding sequence $X \in \mathbb{R}^{(r \cdot c) \times h}$. Masked self-attention computation is then performed on this one-dimensional sequence. First, a table mask matrix TM of dimension $(r \cdot c) \times (r \cdot c)$ is constructed. TM is a block matrix whose diagonal blocks are $c \times c$ matrices consisting entirely of 1s, with the remaining blocks being $c \times c$ diagonal (identity) matrices; it is used to constrain the interaction range of the self-attention computation. The table mask matrix TM is combined with negative infinity to obtain the mask matrix M, so that positions where TM is 1 receive 0 and positions where TM is 0 receive negative infinity, calculated as follows: $M = (1 - TM) \cdot (-\infty)$, with the convention $0 \cdot (-\infty) = 0$. Then, through preset attention weight matrices $W_Q$, $W_K$, $W_V$, linear transformations are applied to the one-dimensional cell embedding sequence $X$ to obtain the query matrix Q, key matrix K, and value matrix V, calculated as follows: $Q = X W_Q$, $K = X W_K$, $V = X W_V$. Attention scores are calculated based on the query matrix Q, key matrix K, and mask matrix M. These scores are then normalized using the softmax function and multiplied by the value matrix V to obtain the updated one-dimensional cell embedding sequence. The calculation formula is as follows: $X' = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{h}} + M\right) V$. This self-attention calculation process, through the constraints of the table mask matrix, ensures that each cell only interacts with cells in the same row and column, achieving the fusion of interactive features within rows, within columns, and between rows and columns. The final output, the updated one-dimensional cell embedding sequence $X'$, is the first data sequence.
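A minimal sketch of the table mask and the masked self-attention follows. It assumes row-wise flattening, so that cell k sits at row k // c and column k % c, and it treats disallowed positions as negative-infinity scores. The identity query/key/value projections and the scaling by the square root of h are simplifying assumptions, not details confirmed by the description.

```python
import math


def table_mask(r, c):
    """TM[a][b] = 1 when flattened cells a and b share a row or a column, else 0."""
    n = r * c
    return [
        [1 if (a // c == b // c or a % c == b % c) else 0 for b in range(n)]
        for a in range(n)
    ]


def masked_self_attention(X, TM):
    """Single-head attention with identity projections (assumption) and a table mask."""
    h = len(X[0])
    out = []
    for a, q in enumerate(X):
        # Disallowed positions get a score of -inf, so softmax gives them weight 0.
        scores = [
            sum(x * y for x, y in zip(q, k)) / math.sqrt(h) if TM[a][b] else -math.inf
            for b, k in enumerate(X)
        ]
        m = max(scores)                                # finite: a cell always sees itself
        exps = [math.exp(s - m) for s in scores]       # math.exp(-inf) evaluates to 0.0
        total = sum(exps)
        out.append([sum(e / total * v[d] for e, v in zip(exps, X)) for d in range(h)])
    return out


TM = table_mask(2, 2)
# Cell 3 (row 1, column 1) carries a large value; cell 0 (row 0, column 0) must not see it.
X = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [9.0, 9.0]]
X_new = masked_self_attention(X, TM)
print(TM)
print(X_new[0])
```

For a 2 × 2 table the mask has all-ones 2 × 2 blocks on the diagonal (same row) and identity blocks elsewhere (same column), matching the block structure described above.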

[0044] S103, the first data sequence is input into the large language model so that the large language model processes the first query condition and obtains the first query result.

[0045] As an example, a large language model can refer to a natural language processing model built on the Transformer decoder architecture. It models text sequences with a self-attention mechanism, can understand natural language and make logical inferences, and is built through three stages: pre-training, supervised fine-tuning, and human feedback training. It can globally perceive and model the semantic information of text sequences.

[0046] As an example, the first query result can refer to the information output by the large language model after combining the tabular feature information in the first data sequence with natural language understanding and logical reasoning of the first query conditions, which can meet the user's query needs. The output format includes natural language text, structured data, etc.

[0047] Specifically, as an example, text encoding can be performed on the first query condition. Using the same tokenizer and embedding layer as for the text encoding of the cells of the first table, the natural language text of the first query condition is put through tokenization, token encoding, embedding layer lookup, and average pooling in sequence. This converts the first query condition into a query embedding sequence with the same embedding dimension as the first data sequence, so that the large language model can fuse the table features and the query semantics.

[0048] Specifically, as an example, the first data sequence can be concatenated with the query embedding sequence to construct the input sequence of the large language model. During the concatenation process, the integrity of the first data sequence is maintained, allowing the large language model to prioritize the perception of table features. The constructed input sequence is then fed into the Transformer decoder of the large language model. The large language model performs global semantic modeling on the input sequence through a self-attention mechanism, fusing the table semantic features and two-dimensional structural features from the first data sequence with the query semantic information from the query embedding sequence to establish a semantic association between table features and query requirements.
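The query encoding and concatenation steps can be sketched as follows. This is an illustration only: the whitespace tokenizer, the tiny vocabulary, the random embedding table, and the choice to place the table sequence before the query embedding are all assumptions made for the example; an actual system would reuse the tokenizer and embedding layer of the large language model.

```python
import numpy as np

def encode_query(text, vocab, embedding_table):
    """Tokenize, map tokens to ids, look up embeddings, then average-pool."""
    tokens = text.lower().split()                    # stand-in tokenizer
    if not tokens:
        raise ValueError("query text is empty")
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]
    vectors = embedding_table[ids]                   # (num_tokens, h)
    return vectors.mean(axis=0, keepdims=True)       # (1, h) pooled query embedding

def build_llm_input(table_seq, query_seq):
    """Concatenate the intact table sequence with the query embedding sequence."""
    if table_seq.shape[1] != query_seq.shape[1]:
        raise ValueError("embedding dimensions differ")
    return np.concatenate([table_seq, query_seq], axis=0)

rng = np.random.default_rng(0)
h = 8
vocab = {"<unk>": 0, "which": 1, "city": 2, "has": 3,
         "the": 4, "highest": 5, "sales": 6}         # hypothetical vocabulary
embedding_table = rng.normal(size=(len(vocab), h))
table_seq = rng.normal(size=(6, h))                  # stand-in first data sequence (6 cells)
query_seq = encode_query("Which city has the highest sales", vocab, embedding_table)
llm_input = build_llm_input(table_seq, query_seq)
```

The table part of `llm_input` is copied unchanged, which mirrors the requirement that the integrity of the first data sequence be maintained during concatenation.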

[0049] Specifically, as an example, the large language model, based on the fused feature information after modeling, performs corresponding information extraction and logical reasoning operations to meet the semantic requirements of the first query condition, accurately responding to the user's query needs and ultimately outputting the first query result that satisfies the first query condition. Depending on the actual application scenario, the first query result can be converted into natural language text, tables, key-value pairs, or other output formats to ensure the semantic accuracy and readability of the first query result.

[0050] The table data processing method of this application preprocesses the first table using a structural encoding network before inputting it into a large language model. This structural encoding network adds row and column position information to each cell. This feature directly addresses the deficiency in existing technologies that ignore the corresponding positional relationships between elements in rows and columns, explicitly encoding the two-dimensional structural information of the table into the data. Furthermore, the structural encoding network constrains each cell to interact only with cells in the same row and column through self-attention calculation. This mechanism forces the structural feature perception model to strictly follow the inherent row and column logic of the table when extracting features, thereby completely and accurately capturing the intra-row and intra-column structural relationships of the table. Finally, the table data sequence, incorporating precise row and column structural information, is input into the large language model for query processing. This enables the model to parse query semantics based on a deep understanding of the table structure, significantly enhancing the large language model's ability to perceive two-dimensional grid structures.

[0051] As another implementation of this application, the table structure feature perception model also includes a filtering module. Before S102, the method may also include the following steps.

[0052] The first table and the first query conditions are input into the filtering module of the table structure feature perception model. The filtering is performed based on the semantic relevance between the first query conditions and the first table to obtain the filtered first table.

[0053] S102 may also include: inputting the filtered first table into the structure encoding network of the table structure feature perception model.

[0054] As an example, the filtering module can be a functional component of the table structure feature perception model, used to dynamically reduce the size of the table based on the semantic relevance between the user's query conditions and the table data. This module quantifies the degree of matching between the query semantics and the semantics of the table rows and columns, and removes redundant rows and columns based on a preset similarity threshold, thereby retaining key data that is highly relevant to the query and improving the efficiency of subsequent processing stages.

[0055] As an example, semantic relevance can refer to the degree of semantic matching between a user's query and tabular data. This relevance is objectively measured by mapping the query text and the text of the table cells to the same vector space (i.e., the semantic representation space) and calculating the similarity between their vectors (such as cosine similarity). The higher the similarity, the closer the query is to the tabular data semantically, and the data should be retained; conversely, it is considered redundant.

[0056] As an example, the filtered first table can refer to the output obtained after the original first table is processed by the filtering module. This table contains only the rows and columns whose semantic relevance to the user's query conditions reaches a preset threshold. Its number of rows is denoted as $r'$ and its number of columns as $c'$, which usually satisfy $r' \le r$ and $c' \le c$ (where r and c are the numbers of rows and columns of the original table). This table retains the key header attributes, the key records, and their corresponding text embeddings, and serves as the input to the structure encoding network.

[0057] Specifically, as an example, the filtering module can embed or connect to a pre-trained semantic matching model trained on a large-scale table question-answering corpus. This model receives the query text and the text of table rows and columns as input and outputs a semantic matching score between the two. The first query condition is first input into the model to complete semantic initialization. Then the header text of each column of the first table and the concatenated text of each record row are input into the initialized semantic matching model, which outputs their semantic matching scores against the query condition column by column and row by row. The filtering module presets a matching score threshold and automatically removes the columns and rows whose scores fall below it; after structural regularization of the retained key rows and columns, the filtered first table is obtained.

[0058] Specifically, as an example, for a vertical domain's tabular data processing scenario, corresponding business attribute tags can be configured for each column header of the first table, and corresponding business content tags can be configured for each record row. Simultaneously, core business tags are extracted for the first query condition to obtain a set of business requirement tags for the query. The filtering module determines semantic relevance by calculating the matching degree between the business tags of the table rows and columns and the set of query business requirement tags: columns and rows that intersect with the query tag set are retained, while columns and rows without tag intersection are removed. For rows and columns with partial tag matching, manually configurable weights are determined based on domain business rules, ultimately retaining key row and column data that meet the relevance requirements to form the filtered first table.
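The tag-intersection rule can be illustrated with a short Python sketch. The tag values and the function name `filter_by_tags` are hypothetical, and the domain-specific weighting of partially matching rows and columns is left out because its rules are not specified here.

```python
def filter_by_tags(column_tags, row_tags, query_tags):
    """Return the indices of rows and columns whose tag sets intersect the query tags."""
    keep_cols = [j for j, tags in enumerate(column_tags) if tags & query_tags]
    keep_rows = [i for i, tags in enumerate(row_tags) if tags & query_tags]
    return keep_rows, keep_cols

# Hypothetical business tags for a three-column, three-row table.
column_tags = [{"region"}, {"revenue", "finance"}, {"staff"}]
row_tags = [{"2024", "revenue"}, {"2023", "staff"}, {"2024", "staff"}]
query_tags = {"revenue", "2024"}      # tags extracted from the query

keep_rows, keep_cols = filter_by_tags(column_tags, row_tags, query_tags)
```

Here only the second column shares a tag with the query, and the second row shares none, so the filtered table keeps rows 0 and 2 and column 1.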

[0059] The table data processing method of this application introduces a filtering module before structural encoding and uses the filtered first table as input to the structural encoding network, thereby achieving semantic-driven simplification of the original table. By calculating the similarity between the first query condition and the semantic representation of the table rows and columns, highly relevant rows and columns are dynamically identified and retained, which reduces computational overhead and improves the information purity of the input data. On this basis, the structural encoding network can complete the embedding and fusion of row position information and column position information within a smaller but more focused data range, as well as self-attention interaction within a limited range, thereby ensuring the synergistic improvement of the accuracy and efficiency of structural perception.

[0060] As another implementation of filtering based on the semantic relevance between the first query condition and the first table to obtain the filtered first table, as shown in Figure 2, this step may also include the following steps.

[0061] S201, convert the first query condition, each row of data of the first table, and each column header of the first table into their corresponding semantic representations.

[0062] As an example, each row of data can refer to the set of cell data corresponding to each record row in the first table, excluding the header row. Each row of data consists of the text data of all cells in that row, taken along the column direction, and is the core part of the table that carries business record information.

[0063] As an example, each column header can refer to the cell text data corresponding to each column in the first table header row. It is used to characterize the attribute meaning of the corresponding column and is the identifying data that defines the semantics of the table column data.

[0064] As an example, semantic representation can refer to converting query conditions, table row data, and table headers in the form of natural language text into low-dimensional dense vectors. These vectors are numerical vectors with a preset embedding dimension, which can represent the semantic information of the text in a unified feature space and are the basic feature form for realizing semantic similarity calculation.

[0065] Specifically, as an example, the set of feature vectors corresponding to the header row can be extracted from the three-dimensional embedding matrix. In this set, the feature vector corresponding to each column is the semantic representation of that column header, ultimately yielding a column-header semantic representation matrix of dimension $c \times h$ (c columns, embedding dimension h), in which each row is the semantic representation of one column header.

[0066] Specifically, as an example, average pooling can be performed on the three-dimensional embedding matrix along the column direction to merge the feature vectors of all cells in each row into a single row-level feature vector, yielding a row-level semantic representation matrix of dimension $r \times h$ (r rows, embedding dimension h). The feature vectors corresponding to all record rows except the header row are then extracted from the row-level semantic representation matrix. Each row in this set is the semantic representation of one row of data, ultimately yielding a row-data semantic representation matrix of dimension $(r-1) \times h$.
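The header extraction and row-level pooling can be sketched in NumPy as follows. The sketch assumes that the header row is stored at index 0 of the three-dimensional embedding matrix; the variable names and the random input are illustrative only.

```python
import numpy as np

def header_and_row_representations(E):
    """Split a (r, c, h) embedding matrix into column-header and record-row vectors."""
    col_headers = E[0]              # header row: one h-dim vector per column, shape (c, h)
    row_level = E.mean(axis=1)      # average pooling over the cells of each row, shape (r, h)
    record_rows = row_level[1:]     # drop the header row, shape (r - 1, h)
    return col_headers, record_rows

rng = np.random.default_rng(0)
r, c, h = 5, 3, 8                   # hypothetical sizes, header row included in r
E = rng.normal(size=(r, c, h))
col_headers, record_rows = header_and_row_representations(E)
```

Both outputs share the embedding dimension h, so the same similarity measure can later be applied to rows and to column headers.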

[0067] Specifically, as an example, the first query condition in natural language text form can be subjected to exactly the same embedding transformation as the table cell text, going through tokenization, token encoding, embedding layer lookup, and average pooling in sequence, to convert the entire query condition text into a query semantic representation vector $q$ of the same dimension as the table feature vectors.

[0068] S202, calculate the first similarity between the semantic representation of the first query condition and the semantic representation of each row of data in the first table.

[0069] As an example, the first similarity can refer to the degree of semantic matching between the semantic representation of the first query condition and the semantic representation of each row of data (each record row) of the first table. It is a quantifiable numerical indicator used to characterize the degree of semantic relevance between the query condition and the table row data.

[0070] Specifically, as an example, the cosine similarity algorithm can be used to calculate the first similarity. This algorithm effectively measures the semantic similarity of two vectors in a unified feature space, and its result lies in the range $[-1, 1]$, where the closer the value is to 1, the higher the semantic relevance. Let the semantic representation of the first query condition be the vector $q$, and let the semantic representation of the $i$-th row of data in the first table be the vector $v_i$. The first similarity between a single row of data and the query condition is calculated as: $\mathrm{sim}(q, v_i) = \frac{q \cdot v_i}{\|q\|\,\|v_i\|}$, where $q \cdot v_i$ is the inner product of vector $q$ and vector $v_i$, $\|q\|$ is the norm (length) of vector $q$, and $\|v_i\|$ is the norm of vector $v_i$.

[0071] Based on the above calculation formula, the cosine similarity between each row feature vector in the row-data semantic representation matrix and the query semantic representation vector q is calculated, ultimately yielding a first similarity set of dimension $r-1$ (one value per record row, where r is the number of rows of the first table including the header row), in which each value is the first similarity between one row of data in the first table and the first query condition.
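The cosine similarity computation over all record rows can be sketched as a single vectorized NumPy function. The function name and the small `eps` guard against zero-length vectors are assumptions added for the example; the same function can be applied unchanged to column-header vectors when computing the second similarity.

```python
import numpy as np

def cosine_similarities(q, vectors, eps=1e-12):
    """Cosine similarity between query vector q (h,) and each row of vectors (n, h)."""
    q_norm = np.linalg.norm(q)
    v_norms = np.linalg.norm(vectors, axis=1)
    # eps avoids division by zero if a vector happens to be all zeros.
    return (vectors @ q) / np.maximum(q_norm * v_norms, eps)

q = np.array([1.0, 0.0])
row_vectors = np.array([
    [1.0, 0.0],    # same direction as q
    [0.0, 1.0],    # orthogonal to q
    [-1.0, 0.0],   # opposite direction
    [1.0, 1.0],    # 45 degrees from q
])
first_similarities = cosine_similarities(q, row_vectors)
```

The four example rows give similarities of 1, 0, -1, and about 0.707, covering the full $[-1, 1]$ range.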

[0072] S203, calculate the second similarity between the semantic representation of the first query condition and the semantic representation of each column header of the first table.

[0073] As an example, the second similarity can refer to the degree of semantic matching between the semantic representation of the first query condition and the semantic representation of each column header in the first table. It is a quantifiable numerical indicator used to characterize the degree of semantic relevance between the query condition and the column attributes of the table.

[0074] Specifically, as an example, the same cosine similarity algorithm as for the first similarity can be used, which keeps the semantic relevance measure uniform across rows and columns and avoids filtering bias caused by different measurement methods. Let the semantic representation of the first query condition be the vector $q$, and let the semantic representation of the $j$-th column header of the first table be the vector $u_j$. The second similarity between a single column header and the query condition is calculated as: $\mathrm{sim}(q, u_j) = \frac{q \cdot u_j}{\|q\|\,\|u_j\|}$, where $q \cdot u_j$ is the inner product of vector $q$ and vector $u_j$, $\|q\|$ is the norm (length) of vector $q$, and $\|u_j\|$ is the norm of vector $u_j$.

[0075] Based on the above calculation formula, the cosine similarity between each row feature vector in the column-header semantic representation matrix and the query semantic representation vector q is calculated, ultimately yielding a second similarity set of dimension $c$ (one value per column of the first table), in which each value is the second similarity between one column header in the first table and the first query condition.

[0076] S204, retain the rows whose first similarity is higher than the row preset threshold and the columns whose second similarity is higher than the column preset threshold to obtain the filtered first table.

[0077] As an example, the row preset threshold can refer to a pre-configured numerical threshold used to determine whether the first query condition has a valid semantic relevance to the table row data. It is an empirical value obtained based on a large number of table question-and-answer samples and is used to filter table record rows that are semantically irrelevant to the query condition.

[0078] As an example, the column preset threshold can refer to a pre-configured numerical threshold used to determine whether the first query condition has a valid semantic relevance to the table header. It is also an empirical value obtained based on a large number of table question-and-answer samples and is used to filter table columns that are semantically irrelevant to the query condition.

[0079] As an example, the filtered first table can refer to the remaining two-dimensional grid structured data after filtering out the record rows with a first similarity lower than the preset threshold for rows and the columns with a second similarity lower than the preset threshold for columns from the original first table. This table retains the header row and record row structure of the original table and only contains key columns and key rows that have effective semantic relevance to the first query condition.

[0080] Specifically, as an example, each first similarity value in the first similarity set can be compared with a preset row threshold one by one, and the record rows with first similarity values higher than the preset row threshold can be marked. All cell data of such marked rows in the original first table can be retained, and the record rows with first similarity values lower than or equal to the preset row threshold can be removed to obtain an intermediate table that retains only the key record rows. The number of columns in the intermediate table is the same as the original first table, and the number of rows is the number of key record rows retained plus 1 header row.

[0081] Specifically, as an example, each second similarity value in the second similarity set can be compared with a preset threshold for the column one by one, and columns with second similarity values higher than the preset threshold can be marked. All cell data of such marked columns can be retained from the intermediate table, and columns with second similarity values lower than or equal to the preset threshold can be removed to obtain a target table that retains only key columns and key rows.

[0082] Specifically, as an example, the row and column structure of the target table can be regularized so that the table keeps its two-dimensional grid structure, the correspondence between the header row and the record rows stays accurate, and the relative positions of the cell data match those in the original table, ultimately resulting in the filtered first table. At the same time, the three-dimensional embedding matrix corresponding to the filtered first table, of dimension $r' \times c' \times h$, is extracted from the original three-dimensional embedding matrix, where $r'$ is the total number of rows in the filtered table and $c'$ is the total number of columns in the filtered table, satisfying $r' \le r$ and $c' \le c$ (r and c being the numbers of rows and columns of the original table). This provides a simplified tabular data foundation for subsequent structure encoding.
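The threshold filtering of S204 and the extraction of the reduced embedding matrix can be sketched as follows. The sketch assumes that the header row sits at index 0 and is always kept; the function name, the thresholds, and the similarity values are hypothetical.

```python
import numpy as np

def filter_table(E, row_sims, col_sims, row_threshold, col_threshold):
    """Keep the header row, record rows above row_threshold, and columns above col_threshold.

    E:        (r, c, h) embedding matrix, header row at index 0
    row_sims: (r - 1,) first similarities, one per record row
    col_sims: (c,)     second similarities, one per column header
    """
    r, c, _ = E.shape
    if row_sims.shape != (r - 1,) or col_sims.shape != (c,):
        raise ValueError("similarity sets do not match the table shape")
    # Record row k in row_sims is table row k + 1 because of the header row.
    keep_rows = np.concatenate(([0], 1 + np.flatnonzero(row_sims > row_threshold)))
    keep_cols = np.flatnonzero(col_sims > col_threshold)
    # np.ix_ selects the row/column cross product and keeps their original order.
    return E[np.ix_(keep_rows, keep_cols)], keep_rows, keep_cols

rng = np.random.default_rng(0)
E = rng.normal(size=(4, 3, 2))              # 1 header row + 3 record rows, 3 columns
row_sims = np.array([0.9, 0.1, 0.5])
col_sims = np.array([0.8, 0.2, 0.6])
E_filtered, keep_rows, keep_cols = filter_table(E, row_sims, col_sims, 0.4, 0.5)
```

In this example the second record row and the second column fall below their thresholds, so the filtered matrix has shape (3, 2, 2) and the retained cells keep their relative order from the original table.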

[0083] It should be noted that the preset thresholds for rows and columns can be flexibly adjusted according to the actual table data scenario and query task requirements. In large-scale table data scenarios, the thresholds can be appropriately increased to further improve the data simplification; in small-sample table data scenarios, the thresholds can be appropriately decreased to avoid the loss of key data.

[0084] The table data processing method of this application realizes fine-grained semantic pruning of the original table by uniformly mapping the first query condition, each row of data and each column header to a semantic representation, calculating their similarity to the query respectively, and performing joint filtering based on independently set row and column preset thresholds. This mechanism not only retains the record rows that are highly relevant to the query intent, but also retains the key field columns that support the intent, thereby reducing the computational load of subsequent structural encoding while ensuring the integrity of the information and the rationality of the structure required to generate the query results.

[0085] As another implementation of this application, the method may further include the following steps.

[0086] Obtain the first training sample set; wherein, the first training sample set includes multiple first training samples, and the first training samples contain the second table, the second query conditions, and the second query results corresponding to the second query conditions.

[0087] As an example, the first training sample set can refer to the set of sample data used to train the table structure feature perception model. It consists of a large number of labeled first training samples. The sample set covers different table structure forms such as single table and multiple tables, as well as diverse table query scenarios. It is the basic data carrier for realizing the model structure encoding capability fitting and optimization.

[0088] As an example, the first training sample can refer to the smallest independent data unit in the first training sample set. It is a triplet data structure containing a second table, a second query condition, and a second query result. The three have a one-to-one semantic relationship, and the annotation information is accurate and unambiguous. A single sample can enable the model to learn in a single training session in a specific table query scenario.

[0089] As an example, the second table can refer to the structured table data in the first training sample that serves as the query object. It is a two-dimensional grid structure table data that can contain single or multiple tables. It has the basic structure of header rows and record rows, which is consistent with the table data structure characteristics in actual application scenarios. It is the training object for model structure encoding.

[0090] As an example, the second query condition can refer to the natural language query requirements proposed for the second table in the first training sample. It is a query statement in text form, covering a variety of table query types such as information extraction, logical reasoning, and data statistics. It serves as the training basis for the model to perceive query semantics.

[0091] As an example, the second query result can refer to the standard answer in the first training sample for the second query condition. It is the labeled result that matches the second query condition. Its form is adapted according to the query type (such as natural language text, numerical data, structured data fragments, etc.) and serves as the benchmark labeled data for judging the accuracy of the model's prediction results and calculating the training loss.

[0092] Specifically, as an example, structured tabular data from real-world application scenarios can be collected as the second table, covering different structural forms such as single tables and multiple tables. Simultaneously, natural language queries from users in the corresponding scenarios can be collected as the raw data for the second query conditions, ensuring the authenticity and diversity of the sample data. The collected second table and second query conditions can be filtered for validity, eliminating invalid tables with incomplete data or disordered row and column structures, and invalid query conditions with ambiguous semantics or no relation to the table data, retaining valid data pairs with complete table structures and clear query semantics.

[0093] Specifically, as an example, for each valid second table and second query condition, professional annotators can provide standard query answers based on the table data content as the second query results. During the annotation process, the accuracy and uniqueness of the answers are guaranteed, and the annotation format is adapted according to the query type (such as annotating specific values for numerical queries and natural language conclusions for information reasoning).

[0094] Specifically, as an example, the labeled second table, second query conditions, and second query result triplet can be combined into the first training sample. All the first training samples are integrated into the first training sample set, and the sample set is divided into training set, validation set, and test set according to a preset ratio (such as 7:2:1) for model training, intermediate validation, and final performance evaluation, respectively. At the same time, the sample set is normalized to ensure the consistency of the table data format.
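The division of the first training sample set by a preset ratio such as 7:2:1 can be sketched as follows. Shuffling before the split and the fixed random seed are assumptions added for reproducibility; the sample triplets shown are placeholders.

```python
import random

def split_samples(samples, ratios=(7, 2, 1), seed=0):
    """Shuffle the samples and split them into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # the remainder goes to the test set
    return train, val, test

# Placeholder triplets: (second table, second query condition, second query result).
samples = [(f"table_{i}", f"query_{i}", f"answer_{i}") for i in range(100)]
train_set, val_set, test_set = split_samples(samples)
```

Each sample lands in exactly one of the three sets, so the validation and test data remain unseen during parameter updates.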

[0095] The second table is input into the structure encoding network of the table structure feature perception model for structure encoding processing to obtain the second data sequence.

[0096] As an example, the second data sequence can refer to the one-dimensional cell embedding vector sequence output by the second table after structural encoding processing by the structural encoding network. It integrates the two-dimensional structural features (row and column positions, row and column interactions) and text semantic features of the second table. The sequence length is consistent with the number of cells in the encoded table, and the vector dimension is the preset embedding dimension h. It is a table feature sequence that is adapted to the input format of large language models.

[0097] Specifically, as an example, the generation process of the second data sequence in this step corresponds completely to the generation process of the first data sequence in this application in terms of structure, process and parameter space; the only difference is that the input object is replaced by the first table in the inference stage and the second table in the training stage. The two belong to the same type of data object at the technical implementation level and do not introduce new structural or behavioral features; therefore, this step does not add any new technical features that need to be explained, but only reuses the structural coding mechanism disclosed in this application.

[0098] The second data sequence is input into the large language model so that the large language model can process the second query condition and obtain the first target query result.

[0099] As an example, the first target query result can refer to the table query prediction result output by the large language model after receiving the second data sequence and combining the semantic information of the second query conditions for natural language understanding and logical reasoning. Its format is consistent with the annotation format of the second query result. It is the model's prediction output for the second query conditions in the current training stage and is used for the calculation of subsequent training loss.

[0100] Specifically, as an example, the output is aligned in form with the second query result, such as being a string, a JSON object, or a structured token sequence; its generation process relies on the inherent instruction-following ability and context modeling ability of the large language model, without changing its original architecture and reasoning logic; this step does not introduce new actions or modules, but only continues the technical path of inputting the first data sequence into the large language model in this application, and connects the intermediate representations of the training phase to the same processing link.

[0101] Based on the first target query result and the second query result, determine the first loss value of the first loss function.

[0102] As an example, the first loss function can refer to a mathematical function used to quantify the degree of deviation between the first target query result output by the large language model and the second query result labeled by the samples. The first loss function is adapted and selected according to the form of the first target query result and the second query result. It is the core basis for measuring the current training effect of the model and guiding the optimization of model parameters.

[0103] As an example, the first loss value can refer to the numerical result calculated after inputting the first target query result and the second query result into the first loss function. This value quantifies the magnitude of the error between the model's prediction result and the standard answer. The smaller the first loss value, the better the model's prediction effect, and vice versa.

[0104] Specifically, as an example, the first loss function can be selected based on the form of the first target query result and the second query result: if the query result is in the form of discrete text generation (such as natural language conclusions or category determinations), the cross-entropy loss function is selected to quantify the probability distribution deviation between the discrete prediction results and the labeled results; if the query result is in the form of continuous numerical values (such as data statistics or numerical extraction results), the mean squared error loss function is selected to quantify the squared error between the continuous numerical prediction values and the labeled values; if the query result is in the form of information matching, the cosine similarity loss function is selected to quantify the semantic similarity deviation between the prediction results and the labeled results.
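The selection of the first loss function by result form can be sketched in NumPy as follows. The dictionary keys, the function names, and the `1 - cosine` form of the cosine similarity loss are assumptions made for this illustration; the cross-entropy is computed over per-token logits against target token ids.

```python
import numpy as np

def cross_entropy(logits, target_ids):
    """Mean token-level cross-entropy; logits (T, V), target_ids (T,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(target_ids)), target_ids].mean())

def mean_squared_error(pred, target):
    """Mean squared error between predicted and labeled numerical values."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def cosine_loss(pred_vec, target_vec, eps=1e-12):
    """1 - cosine similarity, so identical directions give a loss of 0."""
    denom = max(np.linalg.norm(pred_vec) * np.linalg.norm(target_vec), eps)
    return float(1.0 - (pred_vec @ target_vec) / denom)

# Hypothetical mapping from result form to loss function.
LOSS_BY_RESULT_FORM = {
    "text": cross_entropy,          # discrete text generation
    "numeric": mean_squared_error,  # continuous numerical values
    "matching": cosine_loss,        # information matching
}

def first_loss(result_form, prediction, target):
    """Dispatch to the loss function that fits the form of the query result."""
    return LOSS_BY_RESULT_FORM[result_form](prediction, target)

numeric_loss = first_loss("numeric", [1.0, 2.0], [1.0, 4.0])
text_loss = first_loss("text", np.zeros((1, 4)), np.array([2]))
```

With uniform logits over four tokens, the cross-entropy equals $\ln 4$, and the numeric example gives a mean squared error of 2.0.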

[0105] Specifically, as an example, the format of the first target query result output by the large language model and the second query result labeled in the first training sample can be standardized to ensure that the input dimensions and data forms of the two are consistent. The standardized first target query result and the second query result are substituted into the selected first loss function, and the calculation is performed according to the mathematical calculation logic of the loss function. The resulting numerical result is the first loss value, which intuitively reflects the prediction error of the model for the sample in the current training stage.

[0106] Specifically, as an example, if the model is trained using batch training, the average first loss value of all the first training samples in the batch is calculated to obtain the average first loss value of the training batch. The average loss value of the batch is used as the basis for the overall loss of the training, thus avoiding the impact of outliers of individual samples on the model training.

[0107] If the first loss value does not meet the first training stopping condition, keep the parameters of the large language model unchanged, adjust the parameters of the structure encoding network according to the first loss value, and obtain the updated table structure feature perception model; then return to input the second table into the structure encoding network of the table structure feature perception model until the first loss value meets the first training stopping condition, and obtain the trained target table structure feature perception model.

[0108] As an example, the first training stopping condition can refer to a pre-set criterion for terminating model training. It can be a single or combined criterion. When the first loss value meets this condition, it is determined that the training effect of the model has reached the preset requirements, and the iterative training of the model is terminated. This is the core principle for controlling the model training process and preventing overfitting or underfitting.

[0109] As an example, the target table structure feature perception model can refer to the final trained model obtained after multiple iterations of training when the first loss value meets the first training stopping condition. The structural encoding network parameters of this model are optimized and fitted, and it has the ability to accurately extract and encode table structure features, which can effectively support subsequent table data processing tasks.

[0110] Specifically, as an example, the calculated first loss value (or batch average first loss value) can be compared with a preset first training stopping condition. The first training stopping condition is one or a combination of the following: the first loss value is less than a preset loss threshold; the first loss value converges over multiple consecutive training rounds (its change is less than a preset threshold); the number of training iterations reaches a preset maximum; and the accuracy of the model on the validation set reaches a preset accuracy threshold. If the first loss value meets the training stopping condition, the model training is terminated, and the current table structure feature perception model is taken as the target table structure feature perception model. Otherwise, the parameter adjustment stage is entered.
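The stopping check described above can be sketched as follows. This is a minimal illustration rather than the application's implementation: the `StopConfig` class, the `should_stop` function, and every threshold value are hypothetical, and the sketch assumes that meeting any single criterion is enough to stop, which is only one of the combinations the text permits.

```python
from dataclasses import dataclass


@dataclass
class StopConfig:
    # All values below are illustrative assumptions, not taken from the source.
    loss_threshold: float = 0.05      # stop when the loss falls below this
    converge_delta: float = 1e-3      # largest change that counts as converged
    converge_rounds: int = 3          # consecutive rounds that must converge
    max_iterations: int = 1000        # hard cap on training iterations
    accuracy_threshold: float = 0.95  # validation accuracy target


def should_stop(loss_history, iteration, val_accuracy, cfg):
    """Return True if any one of the four stopping criteria is met."""
    current = loss_history[-1]
    if current < cfg.loss_threshold:
        return True
    # Convergence: the last `converge_rounds` changes are all below the delta.
    recent = loss_history[-(cfg.converge_rounds + 1):]
    if len(recent) == cfg.converge_rounds + 1 and all(
        abs(a - b) < cfg.converge_delta for a, b in zip(recent, recent[1:])
    ):
        return True
    if iteration >= cfg.max_iterations:
        return True
    return val_accuracy >= cfg.accuracy_threshold
```

Here `loss_history` would hold the first loss value (or batch average) of each training round so far; a stricter combined criterion could be obtained by requiring several of the checks to hold at once.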

[0111] Specifically, as an example, all the original parameters of the large language model can be kept strictly unchanged, without any updates or adjustments, and only the parameters of the structural encoding network of the table structure feature perception model are optimized. Using the backpropagation algorithm, the first loss value is backpropagated through the structural encoding network to calculate the gradient of each of its parameters, including the position embedding parameters, the attention weight matrices corresponding to the table mask matrix, and the inter-layer parameters of the Transformer module. These parameters are then updated based on the calculated gradients using an optimizer such as stochastic gradient descent or Adam, yielding a table structure feature perception model with updated parameters.
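The selective update can be illustrated with a small sketch in which scalar values stand in for full weight tensors. The parameter names, the gradient values, and the `sgd_step` helper are all hypothetical; a plain gradient descent step is used here for brevity, and an Adam update would follow the same trainable/frozen split.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one plain gradient descent update to trainable parameters only."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }


# Hypothetical scalar parameters standing in for full weight tensors.
params = {"llm.weight": 1.0, "encoder.row_embed": 0.5, "encoder.attn_q": -0.2}
grads = {"llm.weight": 0.3, "encoder.row_embed": 0.1, "encoder.attn_q": -0.4}

# Only the structure encoding network is trainable; the LLM stays frozen.
trainable = {name for name in params if name.startswith("encoder.")}

updated = sgd_step(params, grads, trainable)
```

After the step, `updated["llm.weight"]` is unchanged even though a gradient was available for it, while both encoder parameters have moved against their gradients.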

[0112] Specifically, as an example, the second table in the first training sample set can be re-input into the structure encoding network of the table structure feature perception model after parameter update, and the steps of obtaining the second data sequence through structure encoding, obtaining the first target query result through large model inference, calculating the first loss value, and determining the training stopping condition can be repeated until the first loss value meets the first training stopping condition, and the iterative training is terminated.

[0113] Specifically, as an example, when the first loss value meets the first training stopping condition, all iterative training operations of the model can be stopped, and the final parameter-optimized table structure feature perception model can be used as the trained target table structure feature perception model. At the same time, the performance of the model can be evaluated on the test set to verify its generalization ability on unseen table query samples, thus ensuring the actual application effect of the model.

[0114] The table data processing method of this application separates the parameter update paths of the large language model and the structure encoding network during the training phase. It uses a first training sample set to provide supervision signals, driving the structure encoding network to learn and generate more discriminative table structure representations. On this basis, a first loss function is used to quantify the prediction bias, and efficient targeted optimization of the table front-end encoding module is achieved by freezing the parameters of the large language model and adjusting only the parameters of the structure encoding network. The final target table structure feature perception model can significantly enhance the understanding of the two-dimensional structure and semantic relationships of the table without changing the original capabilities of the large language model, thereby solving the problems of high computational overhead and low training efficiency caused by full parameter fine-tuning in the prior art.

[0115] As another implementation of this application, in order to improve the understanding of cross-table comparisons, as shown in Figure 3, there are multiple first tables, and the table structure feature perception model further includes a self-attention module. Before S103, the method may also include the following steps.

[0116] S301, Obtain the first data sequence corresponding to each first table.

[0117] As an example, multiple first tables can refer to a collection of two-dimensional grid structured table data with cross-table semantic relationships. There are logical relationships between the tables in terms of business attributes, data records or fields. They are the target data objects for cross-table query tasks. Each table contains the basic structure of header rows and record rows.

[0118] As an example, the first data sequence can refer to a sequence of one-dimensional cell embedding vectors that integrates the two-dimensional structural features and semantic features of a single table after the structure encoding network of the table structure feature perception model is used to process the structure of a single table. Each table corresponds to a unique first data sequence, and the embedding dimension of the sequence is a preset h dimension.

[0119] Specifically, as an example, a parallel encoding strategy with batch input of multiple tables can be adopted. The structure encoding network of the table structure feature perception model runs in parallel computing mode: the multiple first tables are packed into table tensors in batches and input simultaneously into multiple parallel encoding branches of the structure encoding network. Each branch independently completes the structure encoding of a single table, and the branches synchronously output the first data sequence corresponding to each first table. After encoding, the tensors are concatenated to form the set of first data sequences for the multiple tables. This method improves the structure encoding efficiency for multiple first tables and is suitable for processing large-scale multi-table data.

[0120] S302, concatenate the first data sequences corresponding to multiple first tables to generate a third data sequence.

[0121] As an example, concatenation can mean joining the first data sequences corresponding to the multiple first tables end to end according to preset rules to form a single one-dimensional feature sequence. During concatenation, the internal feature order of each first data sequence remains unchanged; only the sequences of the different tables are joined together.

[0122] As an example, the third data sequence can refer to a one-dimensional embedding vector sequence that integrates the single-table structure and semantic features of multiple tables after concatenating the first data sequences corresponding to multiple first tables. This sequence only realizes the physical combination of the multi-table sequences and has not yet captured the cross-table semantic association information between the tables. The sequence embedding dimension is still h-dimensional.

[0123] Specifically, as an example, the business relevance of multiple first tables to the first query condition can be calculated in advance through methods such as table field matching and business attribute annotation, resulting in a relevance score for each table. The first data sequences are then sorted from highest to lowest according to their respective table relevance scores, and the sorted first data sequences are concatenated to generate a third data sequence. This method places the feature sequences of tables highly relevant to the query condition at the forefront of the third data sequence, facilitating the subsequent self-attention module to prioritize capturing cross-table semantic relationships between highly relevant tables.
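The relevance-ordered concatenation can be sketched as below. The table names, embedding values, and relevance scores are made up for illustration, and `concat_by_relevance` is a hypothetical helper; how the scores are actually computed (field matching, business attribute annotation) is outside the sketch.

```python
def concat_by_relevance(sequences, scores):
    """Concatenate per-table sequences, most relevant table first.

    sequences: dict mapping table name -> list of embedding vectors
    scores:    dict mapping table name -> relevance score (higher is better)
    """
    order = sorted(sequences, key=lambda name: scores[name], reverse=True)
    third_sequence = []
    for name in order:
        # The internal order of each table's sequence is left unchanged.
        third_sequence.extend(sequences[name])
    return order, third_sequence


# Hypothetical first data sequences (h = 2) and relevance scores.
seqs = {
    "orders": [[0.1, 0.2], [0.3, 0.4]],
    "customers": [[0.5, 0.6]],
    "regions": [[0.7, 0.8], [0.9, 1.0]],
}
scores = {"orders": 0.4, "customers": 0.9, "regions": 0.1}

order, third = concat_by_relevance(seqs, scores)
```

The length of the resulting third data sequence is the sum of the lengths of the individual first data sequences, and the embedding dimension is unchanged.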

[0124] S303, input the third data sequence into the self-attention module of the table structure feature perception model to perform self-attention calculation, so that the first data sequences corresponding to each first table can interact with each other to obtain the fourth data sequence.

[0125] As an example, the self-attention module can refer to a functional module in the table structure feature perception model that is specifically used to realize the cross-table semantic association modeling of multi-table data sequences. It is built based on the Transformer architecture and includes a self-attention calculation layer and a feature fusion layer. It can capture the semantic association information between the first data sequences corresponding to multiple tables and realize the interaction and fusion of cross-table features.

[0126] As an example, cross-table information interaction can refer to the process of using self-attention computation to enable the embedding vectors in the first data sequence corresponding to multiple first tables to generate attention weights for each other, thereby realizing the transmission and fusion of semantic information between the features of each table, and thus capturing cross-table semantic relationships such as field association, data matching, and logical deduction between tables.

[0127] As an example, the fourth data sequence can refer to the one-dimensional embedding vector sequence output by the third data sequence after the self-attention module calculates the self-attention. This sequence is a joint encoding feature sequence of multi-table data, and the embedding dimension is still h-dimensional, which can be directly adapted to the input format of large language models.

[0128] Specifically, as an example, a cross-table attention mask matrix can be introduced during the self-attention calculation. This mask matrix only allows attention weights to be calculated between embedding vectors belonging to different first tables, and blocks repeated attention interactions between embedding vectors within the same table (interactions within a single table have already been completed in the structural encoding stage). The attention scores at the blocked positions are set to negative infinity before normalization, so that their attention weights become zero. This allows the self-attention module to focus solely on capturing cross-table semantic relationships, reducing redundant calculations on features within a single table and improving the accuracy of cross-table information interaction.
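A minimal sketch of such a cross-table mask, assuming each position in the concatenated sequence is tagged with the index of the table it came from. The helper names `cross_table_mask` and `masked_softmax` and the example scores are hypothetical; the sketch shows a single attention row rather than a full multi-head layer.

```python
import math


def cross_table_mask(table_ids):
    """mask[i][j] is True when positions i and j belong to different tables."""
    return [[a != b for b in table_ids] for a in table_ids]


def masked_softmax(scores, allowed):
    """Softmax over one row of scores; blocked positions are set to -inf."""
    masked = [s if ok else -math.inf for s, ok in zip(scores, allowed)]
    finite = [m for m in masked if m != -math.inf]
    if not finite:
        return [0.0] * len(scores)  # no allowed position: all weights zero
    peak = max(finite)
    exps = [math.exp(m - peak) for m in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Positions 0-1 come from table 0, positions 2-3 from table 1.
table_ids = [0, 0, 1, 1]
mask = cross_table_mask(table_ids)

# Attention weights for position 0: only positions 2 and 3 can receive weight.
weights = masked_softmax([1.0, 2.0, 3.0, 3.0], mask[0])
```

Because the same-table positions are set to negative infinity before the softmax, they receive exactly zero weight, and the remaining weight is distributed over the other table's positions.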

[0129] S103 may also include: inputting a fourth data sequence into a large language model.

[0130] Specifically, as an example, the fourth data sequence serves as a replacement input for the first data sequence in this application. Its technical positioning is an enhanced table representation for multi-table scenarios. This sequence already contains the original structural encoding results of each first table and its cross-table interaction information, possessing richer contextual expression capabilities than a single-table first data sequence. When input into the large language model, its position in the LLM input sequence is consistent with the position of the first data sequence in this application, that is, it is located before the user query conditions or inserted as a prefix embedding at the beginning of the LLM input. Based on the multi-table joint semantics carried by the fourth data sequence, the large language model combines the first query conditions to perform inference and generation, and outputs a first query result covering multi-source information. This replacement does not change the original architecture and parameters of the large language model, but only updates the semantic granularity and scope of its input representation.

[0131] The table data processing method of this application embodiment obtains the first data sequence corresponding to each first table, concatenates them into a third data sequence, and then generates a fourth data sequence by realizing cross-table information interaction through a self-attention module. This enables the large language model to perform reasoning based on a unified, integrated, and structurally semantic input representation when processing complex query tasks involving multiple related tables. This collaborative mechanism breaks through the information boundary of single-table encoding and improves the understanding of higher-order semantics such as cross-table entity relationships, business logic chains, and multi-dimensional indicator comparisons, thereby effectively solving the technical problem of lacking multi-table joint modeling capabilities in the prior art.

[0132] As another implementation of S303, this step may also include the following steps.

[0133] Add an aggregation identifier at the very beginning of the third data sequence.

[0134] As an example, an aggregation identifier (table token) is a learnable special vector whose dimension is consistent with that of the embedding vectors of each unit in the first data sequence. The aggregation identifier can be an embedding vector that is initialized from a standard normal distribution or as a zero vector and then trained and optimized. Its function is to serve as a hub node for cross-table semantic aggregation. In this embodiment, the aggregation identifier is placed at the beginning of the third data sequence, forming an extended input sequence. Its role is to guide the self-attention mechanism to actively focus on modeling the global associations between multiple tables, rather than being limited to the local interactions within each first data sequence. The aggregation identifier does not carry any cell content from the original tables, but through the subsequent self-attention calculation it can dynamically aggregate the structural and semantic features of the first data sequences corresponding to each first table, forming a condensed representation of the overall semantics of the multiple tables.

[0135] The third data sequence with the added aggregation identifier is input into the self-attention module of the table structure feature perception model for self-attention calculation, so that the aggregation identifier interacts with the first data sequence corresponding to each first table to obtain the fourth data sequence.

[0136] As an example, self-attention computation can refer to a multi-head self-attention mechanism based on the Transformer architecture, which takes an extended sequence containing aggregation identifiers as input and outputs a sequence of equal length.

[0137] This application may, for example, implement information interaction by means of bidirectional attention weight allocation between the aggregation identifier and each first data sequence. This means includes: uniformly mapping the extended sequence into three sets of matrices, namely query (Q), key (K), and value (V); calculating the similarity between the query vector generated at the position of the aggregation identifier and the key vectors generated at the positions of all first data sequences; and weighting and aggregating the value vectors corresponding to the positions of all first data sequences according to these similarities, thereby updating the output representation at the position of the aggregation identifier.
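The update at the aggregation identifier's position can be sketched as single-head scaled dot-product attention with one query vector. This is a simplified illustration under stated assumptions: the learned Q, K, and V projections are omitted (the vectors are used directly), only one head is shown, and the `aggregate` helper and its inputs are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def aggregate(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights and the weighted sum of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output
```

In this sketch, `query` plays the role of the query vector at the aggregation identifier's position, while `keys` and `values` come from the positions of the first data sequences; the returned `output` is the updated representation at the identifier's position.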

[0138] This application can also achieve information interaction by having the aggregation identifier participate independently in the full-sequence attention calculation. This means includes: not imposing mask restrictions on the aggregation identifier during the standard self-attention calculation, so that it can freely attend to any position in the third data sequence, including all unit embeddings of all first data sequences. Further, this application can also achieve information interaction by using the aggregation identifier as an additional learnable query vector that performs cross-attention with the key-value pairs of the fixed-length first data sequences. In this means, the aggregation identifier does not participate in the intra-sequence position encoding, but drives the cross-table feature extraction of each first data sequence as an independent query. Based on any of the above means, this application obtains a fourth data sequence representing the joint semantics of the multiple tables, and the output vector of this sequence at the position of the aggregation identifier is the global summary representation that integrates the structural and semantic information of each first table.

[0139] The tabular data processing method of this application embodiment achieves centralized modeling of the structure and semantic information of multiple tables by adding an aggregation identifier at the beginning of the third data sequence and having it interact with the first data sequence corresponding to each first table through self-attention interaction. By using the aggregation identifier as a hub node, not only is the efficiency and expressive power of information fusion between multiple tables improved, but also a semantically clear and structurally compact cross-table summary input is provided for subsequent large language models, thereby effectively supporting complex reasoning tasks for multi-source tabular data.

[0140] As another implementation of this application, the method may further include the following steps.

[0141] As an example, a second training sample set is obtained; wherein, the second training sample set includes multiple second training samples, and the second training samples contain multiple third tables, third query conditions, and third query results corresponding to the third query conditions.

[0142] As an example, the second training sample set can refer to a set of labeled samples specifically used to train the multi-table data processing capabilities of the table structure feature perception model. It consists of a large number of second training samples, covering real-world application scenarios involving multiple tables such as financial analysis, government data query, and business intelligence. It serves as the basic data carrier for achieving the fitting and optimization of the model's multi-table cross-table semantic modeling capabilities.

[0143] As an example, the second training sample can refer to the smallest independent data unit in the second training sample set. It is a multi-table triplet data structure containing multiple third tables, third query conditions, and third query results. The three have a one-to-one cross-table semantic relationship, and the annotation information is accurate and unambiguous. A single sample can realize the model's single training learning in a specific multi-table query scenario.

[0144] As an example, the third table can refer to multiple semantically related two-dimensional grid structured table data in the second training sample that serve as cross-table query objects. There are logical relationships between the tables in terms of fields, business attributes, or data records. It serves as the training object for multi-table encoding and cross-table reasoning. Each table contains the basic structure of a header row and record rows.

[0145] As an example, the third query condition can refer to the natural language cross-table query requirements proposed in the second training sample for multiple third tables. It is a query statement in text form, covering a variety of multi-table query types such as cross-table data extraction, cross-table logical reasoning, and cross-table data statistics. It is the training basis for the model to perceive the semantics of cross-table queries.

[0146] As an example, the third query result can refer to the standard answer in the second training sample for the third query condition. It is the labeled result that matches the third query condition. Its form is adapted according to the cross-table query type (such as natural language text, numerical values, structured data fragments, etc.). It is the benchmark labeled data for judging the accuracy of the model's multi-table prediction results and calculating the training loss.

[0147] The third table is input into the table encoding network of the table structure feature perception model for encoding processing, resulting in the fifth data sequence.

[0148] As an example, the fifth data sequence can refer to the global feature sequence of a single table output after a single third table has been fully encoded by a table encoding network. It is a low-dimensional dense vector sequence that integrates the two-dimensional structural features of the single table (row and column positions, row and column interactions) with textual semantic features. Each third table corresponds to a unique fifth data sequence.

[0149] The fifth data sequences corresponding to the multiple third tables are concatenated to generate the sixth data sequence.

[0150] As an example, the fifth data sequence corresponding to multiple third tables can be an independent fifth data sequence obtained by processing all third tables in the same second training sample as described above; there is no preset order constraint between the sequences, but consistency within the sample must be maintained; the sixth data sequence can be a long sequence formed by connecting multiple fifth data sequences end to end in a fixed order (e.g., according to the order in which the tables appear in the sample); its total length is the sum of the lengths of each fifth data sequence, and the vector dimension remains unchanged; this sequence has not yet introduced a cross-table interaction mechanism and is only used as a data container.

[0151] The sixth data sequence is input into the self-attention module of the table structure feature perception model for self-attention calculation, so that the fifth data sequences corresponding to each third table can interact with each other to obtain the seventh data sequence.

[0152] As an example, the self-attention module can be the Transformer module defined in this application for implementing semantic alignment and structural fusion between multiple tables; its parameters are independent of the structural encoding network and are specifically used to model the dependencies between different table representations; the seventh data sequence can be the updated sequence output by the self-attention module after the sixth data sequence is processed, with the same length as the sixth data sequence, but the vector at each position has been aggregated with information from the corresponding regions of other third tables; this sequence achieves semantic enhancement at the cross-table granularity.

[0153] The seventh data sequence is input into the large language model so that the large language model can process the third query condition and obtain the second target query result.

[0154] As an example, the seventh data sequence, as a structurally enhanced multi-table joint representation, is placed at the beginning of the input sequence of the large language model. The third query condition, in the form of natural language text, is processed by the native tokenizer of the large language model to generate a corresponding token sequence, which is appended after the seventh data sequence. The second target query result can be an answer autonomously generated by the large language model based on the cross-table structural semantics contained in the seventh data sequence and the semantic instructions of the third query condition; its form is consistent with that of the third query result, and it is used for subsequent loss calculation.
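The ordering of the model input can be sketched as a simple prefix assembly. The `build_llm_input` helper is hypothetical, and the sketch assumes that the query tokens have already been mapped to embedding vectors of the same dimension h as the table representation; the tokenizer and the embedding lookup themselves are not shown.

```python
def build_llm_input(table_embeddings, query_token_embeddings):
    """Place the table representation first, then the query token embeddings."""
    dims = {len(v) for v in table_embeddings + query_token_embeddings}
    if len(dims) != 1:
        raise ValueError("all embeddings must share the same dimension h")
    return table_embeddings + query_token_embeddings


# Hypothetical values: one table vector followed by two query token vectors.
llm_input = build_llm_input([[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]])
```

The dimension check reflects the requirement that the table-side sequence be directly compatible with the input format of the large language model.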

[0155] Based on the second target query result and the third query result, the second loss value of the second loss function is determined.

[0156] As an example, the second loss function can be a supervised objective function that measures the semantic or formal consistency between the second target query result and the third query result; its specific form depends on the type of the third query result: when the third query result is a classification label, cross-entropy loss is used; the second loss value can be a scalar value calculated by the second loss function on the current second training sample, used to quantify the model prediction bias; this value serves as the gradient source for backpropagation, driving parameter updates.

[0157] If the second loss value does not meet the second training stopping condition, the parameters of the large language model are kept unchanged, the parameters of the structure encoding network and the self-attention module are adjusted according to the second loss value to obtain an updated table structure feature perception model, and the method returns to the step of inputting the third table into the table encoding network of the table structure feature perception model, until the second loss value meets the second training stopping condition and the trained target table structure feature perception model is obtained.

[0158] As an example, the second training stopping condition can be a pre-defined convergence criterion, including but not limited to: the second loss value being lower than a threshold, the loss no longer decreasing over multiple consecutive rounds of validation set testing, and the number of training steps reaching an upper limit. This condition is independent of the first training stopping condition defined in this application and is used specifically in the multi-table training phase. Keeping the parameters of the large language model unchanged can mean setting the gradient to zero for all parameters of the large language model during backpropagation and not performing any parameter update on them. This strategy belongs to the prefix fine-tuning paradigm and ensures both the efficiency of training and the stability of the general capabilities of the backbone model. Adjusting the parameters of the structure encoding network and the self-attention module can mean performing gradient descent updates only on the learnable parameters of the structure encoding network (including the position embeddings and the table mask self-attention layers) and of the self-attention module (including the aggregation identifier embedding and the cross-table attention layers). This process enables the model to gradually learn how to extract, from the original structure of the multiple tables, joint representations that are beneficial to downstream query responses.

[0159] The tabular data processing method of this application constructs a multi-table supervision signal by acquiring a second training sample set, performs structure-aware encoding on each third table using a table encoding network, and concatenates them to form a sixth data sequence. Then, it uses a self-attention module to achieve cross-table semantic interaction and generate a seventh data sequence. This sequence, together with the third query conditions, is input into a large language model, which drives the structure encoding network and the self-attention module to optimize collaboratively while freezing its backbone parameters. Ultimately, the model is able to jointly extract key information from multiple logically related but physically separated tables and support complex natural language query responses, thereby solving the problems of existing technologies where multi-table processing relies on external systems, lacks end-to-end structure modeling capabilities, and has high training costs.

[0160] The technical solutions of the embodiments of this application will be further described below with reference to the accompanying drawings.

[0161] Figure 4 is a schematic diagram of the architecture of a tabular data processing method according to an embodiment of this application. As shown in Figure 4, multi-table data and user queries are first input into a multi-table encoder, which is built from a structure encoding network, a filtering module, and a self-attention module. This encoder performs structure-aware encoding of the tables and cross-table information aggregation. The encoder outputs a compact representation that integrates multi-table semantic and structural information, namely the multi-table semantic representation, which is represented by a single table token embedding. This table token embedding is placed at the front of the large model's input sequence, forming a complete input sequence together with the user query and other prompts. During the training phase, the original backbone parameters of the large language model are frozen, and only the additional adaptation parameters shown in the figure (corresponding to the newly added attention parameters and encoder parameters related to the table token) are updated, thereby achieving efficient prefix fine-tuning. Finally, the processed sequence is decoded by the large language model to output the query results.

[0162] Figure 5 is a schematic diagram of the architecture of a tabular data processing method provided in another embodiment of this application. As shown in Figure 5, each table is first dynamically cropped based on the user query to filter out irrelevant rows and columns and focus on key data. Subsequently, in the structure encoding network, the cropped table data is combined with the added row and column position embeddings and fed into a series of Transformer layers containing a self-attention mechanism based on a table mask matrix. This mechanism ensures that cells interact only with cells in the same row and column, thereby generating a global representation of each table. The global representations of the different tables are concatenated and fed into another set of Transformer layers (i.e., the self-attention module) for cross-table interaction and information aggregation. A learnable table token embedding is prepended to this sequence and integrates global information during cross-table interaction, ultimately generating a multi-table semantic representation (table token embedding) that represents the entire multi-table input as the encoder's output.
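The table mask matrix mentioned above can be sketched as a boolean matrix over flattened cells. The `table_mask` helper and the row-major flattening are assumptions made for illustration; the application does not specify how cells are indexed.

```python
def table_mask(num_rows, num_cols):
    """mask[i][j] is True when cells i and j share a row or a column.

    Cells are flattened in row-major order: index = row * num_cols + col.
    """
    cells = [(r, c) for r in range(num_rows) for c in range(num_cols)]
    return [[ri == rj or ci == cj for (rj, cj) in cells] for (ri, ci) in cells]


# A 2 x 3 table: cell 0 is at (row 0, col 0).
mask = table_mask(2, 3)
```

For the 2 x 3 example, cell 0 may attend to cells 0, 1, and 2 (same row) and to cell 3 (same column), but not to cells 4 and 5. Positions where the mask is False would be excluded from the attention calculation.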

[0163] Based on the tabular data processing method provided in the above embodiments, this application also provides specific implementations of the tabular data processing apparatus. Please refer to the following embodiments.

[0164] First, referring to Figure 6, the tabular data processing apparatus 60 provided in this application embodiment includes the following modules: a data acquisition module 601, used to acquire the first table and the first query condition; a structure encoding module 602, used to input the first table into the structure encoding network of the table structure feature perception model and perform structure encoding processing to obtain the first data sequence, wherein the structure encoding processing includes adding row position information and column position information to each cell of the first table and using self-attention calculation so that each cell only interacts with cells in the same row and column; and an inference output module 603, used to input the first data sequence into the large language model so that the large language model can process the first query condition and obtain the first query result.

[0165] In some embodiments, the tabular data processing apparatus 60 may further include the following modules: a filtering processing module, used to input the first table and the first query condition into the filtering module of the table structure feature perception model before the first table is input into the structure encoding network of the table structure feature perception model, and to filter the first table based on its semantic relevance to the first query condition to obtain the filtered first table; and an encoding input module, used to input the filtered first table into the structure encoding network of the table structure feature perception model.

[0166] In some embodiments, the filtering processing module includes: a semantic transformation submodule, used to convert the first query condition, each row of data in the first table, and each column header of the first table into corresponding semantic representations; a row similarity submodule, used to calculate the first similarity between the semantic representation of the first query condition and the semantic representation of each row of data in the first table; a column similarity submodule, used to calculate the second similarity between the semantic representation of the first query condition and the semantic representation of each column header of the first table; and a threshold filtering submodule, used to retain the rows whose first similarity is higher than a preset row threshold and the columns whose second similarity is higher than a preset column threshold, resulting in the filtered first table.
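The threshold filtering described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of this application: the semantic representations are taken as precomputed vectors, cosine similarity is assumed as the similarity measure (the text does not name one), and `cosine` and `filter_table` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def filter_table(headers, rows, query_vec, row_vecs, header_vecs, row_thr, col_thr):
    """Keep rows and columns whose similarity to the query exceeds the thresholds.

    row_vecs[i] and header_vecs[j] are the (precomputed) semantic representations
    of row i and column header j; how they are produced is outside this sketch.
    """
    keep_rows = [i for i, rv in enumerate(row_vecs) if cosine(query_vec, rv) > row_thr]
    keep_cols = [j for j, hv in enumerate(header_vecs) if cosine(query_vec, hv) > col_thr]
    new_headers = [headers[j] for j in keep_cols]
    new_rows = [[rows[i][j] for j in keep_cols] for i in keep_rows]
    return new_headers, new_rows
```

Separate row and column thresholds are kept as distinct parameters because the text defines a preset row threshold and a preset column threshold independently.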

[0167] In some embodiments, the tabular data processing apparatus 60 may further include the following modules: a sample acquisition module, used to acquire a first training sample set, wherein the first training sample set includes multiple first training samples, and each first training sample contains a second table, a second query condition, and a second query result corresponding to the second query condition; a single-table encoding module, used to input the second table into the structure encoding network of the table structure feature perception model, perform structure encoding processing, and obtain the second data sequence; a single-table reasoning module, used to input the second data sequence into the large language model so that the large language model processes the second query condition and obtains the first target query result; a loss calculation module, used to determine the first loss value of the first loss function based on the first target query result and the second query result; and a parameter optimization module, used, when the first loss value does not meet the first training stopping condition, to keep the parameters of the large language model unchanged, adjust the parameters of the structure encoding network based on the first loss value to obtain the updated table structure feature perception model, and return to the step of inputting the second table into the structure encoding network of the table structure feature perception model, until the first loss value meets the first training stopping condition, thereby obtaining the trained target table structure feature perception model.
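The training loop above, in which the large language model stays frozen and only the structure encoding network is updated, can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption for illustration: the encoder and the frozen model are each reduced to a single scalar weight (`w_enc`, `w_llm`), the loss is mean squared error, and the stopping condition is a loss threshold; the application does not specify any of these.

```python
def loss_fn(samples, w_enc, w_llm):
    """Mean squared error of the toy pipeline: prediction = w_llm * (w_enc * x)."""
    return sum((w_llm * (w_enc * x) - y) ** 2 for x, y in samples) / len(samples)

def train_encoder(samples, w_enc, w_llm, lr=0.02, stop_loss=1e-6, max_steps=1000):
    """Gradient descent on the encoder weight only; w_llm is never modified."""
    loss = loss_fn(samples, w_enc, w_llm)
    steps = 0
    while loss > stop_loss and steps < max_steps:  # training stopping condition
        # Gradient of the loss with respect to w_enc (w_llm acts as a constant).
        grad = sum(
            2 * (w_llm * (w_enc * x) - y) * w_llm * x for x, y in samples
        ) / len(samples)
        w_enc -= lr * grad  # update the encoder; the frozen weight is untouched
        loss = loss_fn(samples, w_enc, w_llm)
        steps += 1
    return w_enc, w_llm, loss
```

The gradient still flows through the frozen weight `w_llm`, which is what allows the encoder to adapt its output to a model whose parameters do not change.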

[0168] In some embodiments, the first table includes a plurality of first tables, and the tabular data processing apparatus 60 may further include the following modules: a multi-table sequence module, used to obtain the first data sequence corresponding to each first table before the first data sequence is input into the large language model; a sequence concatenation module, used to concatenate the first data sequences corresponding to the multiple first tables to generate a third data sequence; and a cross-table interaction module, used to input the third data sequence into the self-attention module of the table structure feature perception model for self-attention calculation, so that the first data sequences corresponding to the respective first tables interact with each other to obtain the fourth data sequence. The inference output module 603 is also used to input the fourth data sequence into the large language model.

[0169] In some embodiments, the cross-table interaction module includes: an identifier addition submodule, used to add an aggregation identifier at the very beginning of the third data sequence; and an identifier interaction submodule, used to input the third data sequence with the added aggregation identifier into the self-attention module of the table structure feature perception model to perform self-attention calculation, so that the aggregation identifier interacts with the first data sequence corresponding to each first table to obtain the fourth data sequence.
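The concatenation and aggregation steps of the cross-table interaction module can be sketched in Python as follows. This is a simplified illustration, not the implementation of this application: the aggregation identifier is passed in as a fixed vector rather than a learned embedding, attention uses a single head with identity projections, and the names `self_attention` and `cross_table_interaction` are hypothetical.

```python
import math

def self_attention(x):
    """Unmasked single-head self-attention with identity Q/K/V projections."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[t] for w, v in zip(weights, x)) for t in range(d)])
    return out

def cross_table_interaction(table_seqs, agg_token):
    """Concatenate per-table sequences, prepend the aggregation identifier, attend.

    table_seqs: one list of vectors per table (the per-table data sequences).
    Returns the sequence after attention; position 0 holds the aggregated vector.
    """
    third = [vec for seq in table_seqs for vec in seq]  # concatenated sequence
    with_agg = [agg_token] + third                      # identifier at the very front
    return self_attention(with_agg)
```

Because no mask is applied at this stage, the vector at position 0 can draw on every position of every table, which corresponds to the aggregation role described for the identifier.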

[0170] In some embodiments, the tabular data processing apparatus 60 may further include the following modules: a sample acquisition module, used to acquire a second training sample set, wherein the second training sample set includes multiple second training samples, and each second training sample contains multiple third tables, a third query condition, and a third query result corresponding to the third query condition; a multi-table encoding module, used to input each third table into the table encoding network of the table structure feature perception model for encoding processing to obtain the fifth data sequence; a sequence concatenation module, used to concatenate the fifth data sequences corresponding to the multiple third tables to generate a sixth data sequence; a cross-table interaction module, used to input the sixth data sequence into the self-attention module of the table structure feature perception model for self-attention calculation, so that the fifth data sequences corresponding to the respective third tables interact with each other to obtain the seventh data sequence; a multi-table reasoning module, used to input the seventh data sequence into the large language model so that the large language model processes the third query condition and obtains the second target query result; a loss calculation module, used to determine the second loss value of the second loss function based on the second target query result and the third query result; and a parameter optimization module, used, when the second loss value does not meet the second training stopping condition, to keep the parameters of the large language model unchanged, adjust the parameters of the structure encoding network and the self-attention module based on the second loss value to obtain the updated table structure feature perception model, and return to the step of inputting the third table into the table encoding network of the table structure feature perception model, until the second loss value meets the second training stopping condition, thereby obtaining the trained target table structure feature perception model.

[0171] Figure 7 shows a schematic diagram of the hardware structure of the electronic device provided in an embodiment of this application.

[0172] The electronic device may include a processor 701 and a memory 702 storing computer program instructions.

[0173] Specifically, the processor 701 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits that can be configured to implement the embodiments of this application.

[0174] Memory 702 may include mass storage for data or instructions. By way of example and not limitation, memory 702 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, memory 702 may include removable or non-removable (or fixed) media. Where appropriate, memory 702 may be internal or external to the electronic device. In a particular embodiment, memory 702 is non-volatile solid-state memory.

[0175] Memory may include read-only memory (ROM), random access memory (RAM), disk storage media devices, optical storage media devices, flash memory devices, and electrical, optical, or other physical / tangible memory storage devices. Therefore, typically, memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software including computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the methods according to one aspect of this disclosure.

[0176] The processor 701 reads and executes computer program instructions stored in the memory 702 to implement any of the tabular data processing methods in the above embodiments.

[0177] In one example, the electronic device may also include a communication interface 703 and a bus 710. As shown in Figure 7, the processor 701, the memory 702, and the communication interface 703 are connected through the bus 710 and communicate with one another over it.

[0178] The communication interface 703 is mainly used to realize communication between various modules, devices, units and / or equipment in the embodiments of this application.

[0179] Bus 710 includes hardware, software, or both, that couples the components of the electronic device together. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable buses, or combinations of two or more of these. Where appropriate, bus 710 may include one or more buses. Although specific buses are described and illustrated in embodiments of this application, this application contemplates any suitable bus or interconnect.

[0180] Furthermore, in conjunction with the tabular data processing methods in the above embodiments, an embodiment of this application may provide a computer storage medium for implementing them. The computer storage medium stores computer program instructions; when these computer program instructions are executed by a processor, they implement any of the tabular data processing methods in the above embodiments.

[0181] This application also provides a computer program product, including a computer program that, when executed by a processor, implements any of the tabular data processing methods described in the above embodiments.

[0182] It should be clarified that this application is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this application is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this application.

[0183] The functional blocks shown in the above block diagram can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. Programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave. "Machine-readable medium" can include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, etc. Code segments can be downloaded via computer networks such as the Internet, intranets, etc.

[0184] It should also be noted that the exemplary embodiments mentioned in this application describe methods or systems based on a series of steps or apparatus. However, this application is not limited to the order of the above steps; that is, the steps can be performed in the order mentioned in the embodiments, or in a different order, or several steps can be performed simultaneously.

[0185] The aspects of this disclosure have been described above with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block in the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that these instructions, executable via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions / actions specified in one or more blocks of the flowchart illustrations and / or block diagrams. Such a processor can be, but is not limited to, a general-purpose processor, a special-purpose processor, a special application processor, or a field-programmable logic circuit. It is also understood that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can also be implemented by special-purpose hardware performing the specified functions or actions, or can be implemented by a combination of special-purpose hardware and computer instructions.

[0186] The above are merely specific embodiments of this application. Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the systems, modules, and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here. It should be understood that the protection scope of this application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered by the protection scope of this application.

Claims

1. A method for processing tabular data, characterized in that the method includes: acquiring a first table and a first query condition; inputting the first table into the structure encoding network of a table structure feature perception model for structure encoding processing to obtain a first data sequence, wherein the structure encoding processing includes: adding row position information and column position information to each cell of the first table, and using self-attention calculation to make each cell interact only with cells in the same row and column; and inputting the first data sequence into a large language model so that the large language model processes the first query condition and obtains a first query result.

2. The method according to claim 1, characterized in that the table structure feature perception model further includes a filtering module, and before inputting the first table into the structure encoding network of the table structure feature perception model, the method further includes: inputting the first table and the first query condition into the filtering module of the table structure feature perception model, and filtering the first table based on the semantic relevance between the first query condition and the first table to obtain the filtered first table; wherein the step of inputting the first table into the structure encoding network of the table structure feature perception model includes: inputting the filtered first table into the structure encoding network of the table structure feature perception model.

3. The method according to claim 2, characterized in that the step of filtering based on the semantic relevance between the first query condition and the first table to obtain the filtered first table includes: converting the first query condition, each row of data in the first table, and each column header of the first table into corresponding semantic representations; calculating the first similarity between the semantic representation of the first query condition and the semantic representation of each row of data in the first table; calculating the second similarity between the semantic representation of the first query condition and the semantic representation of each column header of the first table; and retaining the rows whose first similarity is higher than a preset row threshold and the columns whose second similarity is higher than a preset column threshold to obtain the filtered first table.

4. The method according to claim 1, characterized in that the method further includes: obtaining a first training sample set, wherein the first training sample set includes multiple first training samples, and each first training sample contains a second table, a second query condition, and a second query result corresponding to the second query condition; inputting the second table into the structure encoding network of the table structure feature perception model for structure encoding processing to obtain the second data sequence; inputting the second data sequence into the large language model so that the large language model processes the second query condition and obtains the first target query result; determining the first loss value of the first loss function based on the first target query result and the second query result; and, if the first loss value does not meet the first training stopping condition, keeping the parameters of the large language model unchanged, adjusting the parameters of the structure encoding network according to the first loss value to obtain the updated table structure feature perception model, and returning to the step of inputting the second table into the structure encoding network of the table structure feature perception model, until the first loss value meets the first training stopping condition, to obtain the trained target table structure feature perception model.

5. The method according to claim 1, characterized in that, The first table includes multiple first tables, and the table structure feature perception model further includes a self-attention module. Before inputting the first data sequence into the large language model, the method further includes: Obtain the first data sequence corresponding to each first table; The first data sequences corresponding to multiple first tables are concatenated to generate a third data sequence; The third data sequence is input into the self-attention module of the table structure feature perception model to perform self-attention calculation, so that the first data sequences corresponding to each first table can interact with each other to obtain the fourth data sequence. The step of inputting the first data sequence into the large language model includes: The fourth data sequence is input into the large language model.

6. The method according to claim 5, characterized in that, The step of inputting the third data sequence into the self-attention module of the table structure feature perception model for self-attention calculation, so as to enable information interaction between the first data sequences corresponding to each first table, to obtain the fourth data sequence, includes: Add an aggregation identifier at the very beginning of the third data sequence; The third data sequence with the added aggregation identifier is input into the self-attention module of the table structure feature perception model to perform self-attention calculation, so that the aggregation identifier interacts with the first data sequence corresponding to each first table to obtain the fourth data sequence.

7. The method according to any one of claims 5-6, characterized in that the method further includes: obtaining a second training sample set, wherein the second training sample set includes multiple second training samples, and each second training sample contains multiple third tables, a third query condition, and a third query result corresponding to the third query condition; inputting each third table into the table encoding network of the table structure feature perception model for encoding processing to obtain the fifth data sequence; concatenating the fifth data sequences corresponding to the multiple third tables to generate the sixth data sequence; inputting the sixth data sequence into the self-attention module of the table structure feature perception model to perform self-attention calculation, so that the fifth data sequences corresponding to the respective third tables interact with each other to obtain the seventh data sequence; inputting the seventh data sequence into the large language model so that the large language model processes the third query condition and obtains the second target query result; determining the second loss value of the second loss function based on the second target query result and the third query result; and, if the second loss value does not meet the second training stopping condition, keeping the parameters of the large language model unchanged, adjusting the parameters of the structure encoding network and the self-attention module according to the second loss value to obtain the updated table structure feature perception model, and returning to the step of inputting the third table into the table encoding network of the table structure feature perception model, until the second loss value meets the second training stopping condition, to obtain the trained target table structure feature perception model.

8. A tabular data processing device, characterized in that the device includes: a data acquisition module, used to acquire the first table and the first query condition; a structure encoding module, used to input the first table into the structure encoding network of the table structure feature perception model, perform structure encoding processing, and obtain the first data sequence, wherein the structure encoding processing includes: adding row position information and column position information to each cell of the first table, and using self-attention calculation to make each cell interact only with cells in the same row and column; and an inference output module, used to input the first data sequence into the large language model so that the large language model processes the first query condition and obtains the first query result.

9. An electronic device, characterized in that, The device includes: a processor and a memory storing computer program instructions; When the processor executes the computer program instructions, it implements the tabular data processing method as described in any one of claims 1-7.

10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer program instructions, which, when executed by a processor, implement the tabular data processing method as described in any one of claims 1-7.

11. A computer program product, characterized in that, When the instructions in the computer program product are executed by the processor of the electronic device, the electronic device performs the tabular data processing method as described in any one of claims 1-7.