Financial business-based model data processing method and apparatus, device, and medium

By splitting the decoder layers of a large model into virtual layers and distributing them evenly across computing cards based on performance information, the method addresses low computing card efficiency, achieving higher computing card utilization and cost savings.

WO2026138270A1 · PCT designated stage · Publication Date: 2026-07-02 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date
2025-11-19
Publication Date
2026-07-02

Abstract

The present application relates to the field of financial technology and the field of model data processing, and provides a financial business-based model data processing method and apparatus, a device, and a medium. The method comprises: acquiring a financial business model to be processed and a preset number of stages; parsing a decoder layer into virtual layers, and on the basis of the vocabulary of an embedding layer and the dimension of the decoder layer, determining the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to a linear transformation layer; and on the basis of the preset number of stages and the total number of virtual layers in said financial business model, determining the number of virtual layers in each model block, and on the basis of the number of virtual layers in each model block, deploying each model block into a compute card in a corresponding stage. The method of the present application improves the computational efficiency of model training and reduces the computational consumption of model training.

Description

Methods, apparatus, equipment, and media for processing model data based on financial business.

[0001] This application claims priority to Chinese Patent Application No. 202411929138.X, filed on December 25, 2024, entitled “Method, Apparatus, Device and Medium for Processing Model Data Based on Financial Business”, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the fields of financial technology and model data processing, and more specifically, to a method, apparatus, device, and medium for processing model data based on financial business.

Background Technology

[0003] With the development of large model technology, large models have been widely used in financial scenarios. Model training is a key step in producing a large model, and it usually requires a large amount of data and computing power. Since the parameter scale of a large model usually exceeds the carrying capacity of a single computing card, hundreds or even thousands of computing cards are needed to complete the training.

[0004] Currently, hardware optimization methods for training large models are costly, while software optimization methods struggle to achieve the desired computational efficiency. Therefore, improving model training efficiency and the computational efficiency of computing cards have become urgent problems to be solved.

Summary of the Invention

[0005] The purpose of this application is to provide a method, apparatus, device, and medium for processing model data based on financial business, in order to solve the technical problem of low computational efficiency during the training of large financial business models.

[0006] In a first aspect, this application discloses a method for processing model data based on financial business, including:

[0007] Obtain the financial business model to be processed and the number of preset stages; wherein, the financial business model to be processed is a pre-built large model used to process the transaction data of financial business, the financial business model to be processed includes an embedding layer, a linear transformation layer and at least one decoder layer, and the number of preset stages represents the number of model blocks of the financial business model to be processed, and each stage corresponds to at least one computing card.

[0008] The decoder layer is parsed into a virtual layer. Based on the vocabulary of the embedding layer and the dimension of the decoder layer, the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer are determined. The virtual layer includes an attention layer and a multilayer perceptron layer. The attention layer includes a normalization layer and a multi-head attention layer. The multilayer perceptron layer includes another normalization layer and a feedforward layer. The vocabulary represents the set of words used by the large model.

[0009] Based on the preset number of stages and the total number of virtual layers in the financial business model to be processed, the number of virtual layers in each model block is determined, and the model block is deployed to the computing card of the corresponding stage according to the number of virtual layers in the model block.

[0010] Optionally, the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transform layer are determined based on the vocabulary of the embedding layer and the dimension of the decoder layer, including:

[0011] Based on the vocabulary of the embedding layer and the dimensions of the decoder layer, the performance information of the embedding layer, the linear transform layer, the attention layer, and the multilayer perceptron layer is determined; wherein, the performance information includes memory usage information and computational cost information;

[0012] Based on the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer, the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer are determined.

[0013] Optionally, based on the vocabulary of the embedding layer and the dimensions of the decoder layer, the performance information of the embedding layer, the performance information of the linear transform layer, the performance information of the attention layer, and the performance information of the multilayer perceptron layer are determined, including:

[0014] Obtain the input data dimension of the decoder layer, and determine a first ratio based on the dimension of the decoder layer and the input data dimension of the decoder layer; wherein, the first ratio represents the ratio between the input data dimension of the decoder layer and the dimension of the decoder layer;

[0015] Obtain the number of query heads and the number of key-value heads in the multi-head attention layer, and determine a second ratio based on the number of query heads and the number of key-value heads; wherein, the second ratio represents the ratio between the number of query heads and the number of key-value heads;

[0016] Obtain the dimension of the feedforward layer, and determine a third ratio based on the dimension of the feedforward layer and the dimension of the decoder layer; wherein, the third ratio represents the ratio between the dimension of the feedforward layer and the dimension of the decoder layer;

[0017] A fourth ratio is determined based on the vocabulary of the embedding layer and the dimension of the decoder layer; wherein the fourth ratio represents the ratio between the size of the vocabulary of the embedding layer and the dimension of the decoder layer.

[0018] Based on the first ratio, the second ratio, the third ratio, and the fourth ratio, the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer are determined.
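The four ratios described in the preceding steps can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the `ModelShape` container, its field names, and the example sizes are hypothetical, and each "ratio between A and B" is assumed to mean A divided by B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical container for the sizes named in the text."""
    input_dim: int         # input data dimension of the decoder layer
    hidden_dim: int        # dimension of the decoder layer
    num_query_heads: int   # query heads in the multi-head attention layer
    num_kv_heads: int      # key-value heads in the multi-head attention layer
    ffn_dim: int           # dimension of the feedforward layer
    vocab_size: int        # size of the embedding layer's vocabulary


def compute_ratios(shape: ModelShape) -> tuple:
    """Return (first, second, third, fourth) ratio, each assumed to be A / B."""
    first = shape.input_dim / shape.hidden_dim
    second = shape.num_query_heads / shape.num_kv_heads
    third = shape.ffn_dim / shape.hidden_dim
    fourth = shape.vocab_size / shape.hidden_dim
    return first, second, third, fourth


# Illustrative sizes only; they are not taken from the application.
example = ModelShape(input_dim=8192, hidden_dim=4096, num_query_heads=32,
                     num_kv_heads=8, ffn_dim=14336, vocab_size=128256)
ratios = compute_ratios(example)
```

Expressing every size relative to the decoder layer dimension keeps the later per-layer estimates in one common unit, which is presumably why the text works with ratios rather than raw sizes.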

[0019] Optionally, the computational complexity information in the embedding layer performance information is 0; the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer are determined based on the first ratio, the second ratio, the third ratio, and the fourth ratio, including:

[0020] The performance information of the attention layer is determined based on the first ratio and the second ratio;

[0021] The performance information of the multilayer perceptron layer is determined based on the first ratio and the third ratio.

[0022] The performance information of the linear transformation layer is determined based on the first ratio and the fourth ratio.

[0023] The performance information of the embedding layer is determined based on the first ratio and the fourth ratio.

[0024] Optionally, the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer are determined based on the performance information of the embedding layer, the performance information of the linear transformation layer, the performance information of the attention layer, and the performance information of the multilayer perceptron layer, including:

[0025] The number of virtual layers corresponding to the embedding layer is determined based on the performance information of the embedding layer, the performance information of the multilayer perceptron layer, and the performance information of the attention layer.

[0026] The number of virtual layers corresponding to the linear transformation layer is determined based on the performance information of the linear transformation layer, the performance information of the attention layer, and the performance information of the multilayer perceptron layer.

[0027] Optionally, the number of virtual layers corresponding to the embedding layer is determined based on the performance information of the embedding layer, the performance information of the multilayer perceptron layer, and the performance information of the attention layer, including:

[0028] Based on the memory usage information of the embedding layer, the memory usage information of the multilayer perceptron layer, and the memory usage information of the attention layer, the memory usage information of the virtual layer corresponding to the embedding layer is determined;

[0029] Based on the memory usage information of the embedding layer and the memory usage information of the virtual layer corresponding to the embedding layer, a first threshold for the number of virtual layers corresponding to the embedding layer is determined; wherein, the first threshold represents the maximum value of the number of virtual layers corresponding to the embedding layer.

[0030] The number of virtual layers corresponding to the embedding layer is determined based on the first threshold.

[0031] Optionally, the number of virtual layers corresponding to the linear transformation layer is determined based on the performance information of the linear transformation layer, the performance information of the attention layer, and the performance information of the multilayer perceptron layer, including:

[0032] Based on the memory usage information of the linear transformation layer, the memory usage information of the attention layer, and the memory usage information of the multilayer perceptron layer, the memory usage information of the virtual layer corresponding to the linear transformation layer is determined.

[0033] Based on the memory usage information of the linear transformation layer and the memory usage information of the virtual layer corresponding to the linear transformation layer, a second threshold is determined for the number of virtual layers corresponding to the linear transformation layer; wherein, the second threshold represents the maximum value of the number of virtual layers corresponding to the linear transformation layer.

[0034] Based on the computational complexity information of the linear transformation layer, the computational complexity information of the attention layer, and the computational complexity information of the multilayer perceptron layer, the computational complexity information of the virtual layer corresponding to the linear transformation layer is determined.

[0035] Based on the computational complexity information of the linear transformation layer and the computational complexity information of the virtual layer corresponding to the linear transformation layer, a third threshold is determined for the number of virtual layers corresponding to the linear transformation layer; wherein, the third threshold represents the minimum value of the number of virtual layers corresponding to the linear transformation layer.

[0036] The number of virtual layers corresponding to the linear transformation layer is determined based on the second threshold and the third threshold.
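The text does not give formulas for the second and third thresholds, so the following Python sketch only illustrates the idea of bounding the count from above by memory and from below by computation. The rounding directions, the midpoint selection rule, and all function and parameter names are assumptions made for illustration.

```python
import math


def layer_count_bounds(head_memory: float, vlayer_memory: float,
                       head_compute: float, vlayer_compute: float) -> tuple:
    """Return (minimum, maximum) virtual-layer counts for the linear transformation layer.

    Assumption: the maximum rounds the memory ratio up and the minimum
    rounds the computation ratio down.
    """
    maximum = math.ceil(head_memory / vlayer_memory)     # second threshold
    minimum = math.floor(head_compute / vlayer_compute)  # third threshold
    return minimum, maximum


def pick_layer_count(minimum: int, maximum: int) -> int:
    """Pick a count inside [minimum, maximum]; the midpoint is an assumed rule."""
    if minimum > maximum:
        raise ValueError("thresholds are inconsistent: minimum exceeds maximum")
    return (minimum + maximum) // 2


# Illustrative numbers in arbitrary units; not taken from the application.
bounds = layer_count_bounds(head_memory=5.0, vlayer_memory=2.0,
                            head_compute=3.0, vlayer_compute=2.0)
count = pick_layer_count(*bounds)
```

Any integer within the two thresholds would satisfy the description; the midpoint is used here only to make the sketch concrete.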

[0037] Optionally, the computational complexity information of the virtual layer corresponding to the embedding layer is 0. The number of virtual layers in each model block is determined based on the preset number of stages and the total number of virtual layers in the financial business model to be processed, including:

[0038] The quotient obtained by dividing the total number of virtual layers in the financial business model to be processed by the preset number of stages is determined as the number of layers to be adjusted, and the remainder of that division is determined as the number of layers to be allocated.

[0039] Based on the number of layers to be adjusted, the number of virtual layers corresponding to the linear transformation layer, the performance information of the virtual layer corresponding to the embedding layer, and the performance information of the virtual layer corresponding to the linear transformation layer, determine the number of virtual layers in the last model block of the financial business model to be processed;

[0040] The number of virtual layers in each model block is determined based on the preset number of stages, the number of layers to be adjusted, the number of layers to be allocated, and the number of virtual layers in the last model block.
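A simplified Python sketch of this allocation follows. It assumes the total is divided by the number of stages, that leftover layers go to the earliest stages, and that the first and last model blocks give up part of their budget to the embedding layer and the linear transformation layer respectively. The function name, parameter names, and those placement rules are assumptions; the application's own adjustment of the last model block may differ.

```python
from typing import List, Tuple

VIRTUAL_LAYERS_PER_DECODER_LAYER = 2  # one attention layer + one MLP layer


def plan_stages(num_decoder_layers: int, embed_equiv: int, head_equiv: int,
                num_stages: int) -> Tuple[List[int], List[int]]:
    """Return (per-stage virtual-layer budgets, per-stage real decoder virtual layers)."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    decoder_virtual = VIRTUAL_LAYERS_PER_DECODER_LAYER * num_decoder_layers
    total = decoder_virtual + embed_equiv + head_equiv
    # Quotient: layers every stage receives; remainder: layers still to be allocated.
    per_stage, leftover = divmod(total, num_stages)
    # Assumption: the leftover layers are spread over the earliest stages.
    budgets = [per_stage + (1 if i < leftover else 0) for i in range(num_stages)]
    real = list(budgets)
    real[0] -= embed_equiv   # first block also hosts the embedding layer
    real[-1] -= head_equiv   # last block also hosts the linear transformation layer
    if min(real) < 0:
        raise ValueError("too many stages for this model")
    return budgets, real


# Illustrative sizes only; they are not taken from the application.
budgets, real_layers = plan_stages(num_decoder_layers=32, embed_equiv=1,
                                   head_equiv=3, num_stages=4)
```

With these example sizes each stage gets a budget of 17 virtual layers, and the real decoder virtual layers per stage are 16, 17, 17, and 14, which still sum to the 64 virtual layers produced by 32 decoder layers.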

[0041] In a second aspect, this application discloses an apparatus for processing model data based on financial business, comprising:

[0042] The acquisition unit is used to acquire the financial business model to be processed and the number of preset stages. The financial business model to be processed is a pre-built large model used to process the transaction data of financial business. The financial business model to be processed includes an embedding layer, a linear transformation layer and at least one decoder layer. The number of preset stages represents the number of model blocks of the financial business model to be processed. Each stage corresponds to at least one computing card.

[0043] The parsing unit is used to parse the decoder layer into virtual layers. Based on the vocabulary of the embedding layer and the dimension of the decoder layer, it determines the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer. The virtual layers include an attention layer and a multilayer perceptron layer. The attention layer includes a normalization layer and a multi-head attention layer. The multilayer perceptron layer includes another normalization layer and a feedforward layer. The vocabulary represents the set of words used by the large model.

[0044] The allocation unit is used to determine the number of virtual layers in each model block according to the preset number of stages and the total number of virtual layers in the financial business model to be processed, and to deploy the model block to the computing card of the corresponding stage according to the number of virtual layers in the model block.

[0045] In a third aspect, embodiments of this application provide a text data generation device based on user needs, including: a memory and a processor;

[0046] The memory stores computer-executable instructions;

[0047] The processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method of the first aspect and / or the various possible implementations of the first aspect as described above.

[0048] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the method of the first aspect and / or the various possible implementations of the first aspect.

[0049] In a fifth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the method of the first aspect and / or the various possible implementations of the first aspect.

[0050] Based on the above technical solutions, the method, apparatus, device, and medium for processing model data based on financial business provided in this application obtain a preset number of stages and the financial business model to be processed, and split each decoder layer in the financial business model into virtual layers. Based on information such as the vocabulary of the embedding layer and the dimension of the decoder layer, the number of virtual layers corresponding to the embedding layer and the number corresponding to the linear transformation layer are determined, which gives the total number of virtual layers in the financial business model. Based on that total, the number of virtual layers in each model block is determined, and the model blocks are deployed to the computing cards of the corresponding stages for computation and training of the financial business model. Splitting the decoder layer makes each layer of the model smaller, so the model is easier to split and allocate to the corresponding stages, which improves the utilization rate of the computing cards. Expressing the embedding layer and the linear transformation layer as numbers of virtual layers allows the whole model to be counted in a single unit, making the split more even and consistent and improving overall computational efficiency.

Brief Description of the Drawings

[0051] Figure 1 is a flowchart illustrating a method for processing model data based on financial business, as provided in an embodiment of this disclosure.

[0052] Figure 2 is a schematic diagram of an exemplary decoder layer provided in an embodiment of this disclosure;

[0053] Figure 3 is a flowchart illustrating a method for processing model data based on financial business, as provided in an embodiment of this disclosure.

[0054] Figure 4 is a flowchart illustrating a method for processing model data based on financial business provided in an embodiment of this disclosure;

[0055] Figure 5 is a structural block diagram of a model data processing device based on financial business provided in an embodiment of this disclosure;

[0056] Figure 6 is a structural block diagram of a model data processing device based on financial business provided in an embodiment of this disclosure;

[0057] Figure 7 is a structural block diagram of an electronic device provided in an embodiment of this disclosure;

[0058] Figure 8 is a block diagram illustrating an electronic device according to an exemplary embodiment.

[0059] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.

Detailed Description

[0060] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.

[0061] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, storage, use, processing, transmission, provision, disclosure, and application of the relevant data all comply with the relevant laws, regulations, and standards of the relevant countries and regions, have taken necessary confidentiality measures, do not violate public order and good morals, and provide corresponding operation portals for users to choose to authorize or refuse.

[0062] Furthermore, the technical solution involved in this application, which involves big data analysis of user information (including but not limited to personal biometrics, identity data, consumption data, asset data, electronic terminal operation data, etc.) and the use of artificial intelligence technology for automated decision-making, and makes decisions that have a significant impact on personal rights based on the results of automated decision-making, provides users with corresponding operation entry points for users to choose to agree to or reject the results of automated decision-making; if the user chooses to reject, the process will proceed to the expert decision-making process.

[0063] It should be noted that the processing method, apparatus, equipment and medium for model data based on financial business provided in this application can be used in the field of financial technology, or in any field other than financial technology. The application field of the processing method, apparatus, equipment and medium for model data based on financial business in this application is not limited.

[0064] First, the terms used in this application are explained:

[0065] NVLink: NVLink is a bus and its communication protocol developed and launched by NVIDIA.

[0066] Llama3 model: The Llama3 model is a new generation of large-scale language models. The Llama3 model adopts the standard pure decoder Transformer architecture and has multiple versions with parameters ranging from 8 billion to 400 billion.

[0067] Compute Card: A Compute Card is a piece of hardware designed for efficient data computing and is mainly used in fields such as scientific computing, big data processing, deep learning, and artificial intelligence.

[0068] for loop: The for loop is a loop statement in programming languages. The loop statement consists of two parts: the loop body and the loop condition. Its expression is: for(single expression; condition expression; final loop body){middle loop body;}.

[0069] With the development of large-scale model technology, large-scale models are increasingly being applied to various business scenarios, such as financial services and other fintech fields. Model training is a crucial step in utilizing large-scale models. Training large-scale models in the fintech field typically requires a large amount of data and computing power. Depending on the scale of the model, the number of computing cards used ranges from hundreds to thousands, requiring continuous computation for dozens of days or more. Therefore, given limited computing resources, improving the computational efficiency of each card has become a significant challenge in the model training process.

[0070] Currently, there are generally two ways to improve model computation efficiency. One is to optimize at the hardware level, for example by using NVLink technology, which improves computation efficiency by increasing the communication bandwidth between cards. However, hardware optimization is costly and not feasible in application scenarios involving large-scale computation. The other is to optimize at the software level, that is, to improve computational efficiency by optimizing the computational logic of large models, for example through tensor parallelism and pipeline parallelism. Tensor parallelism can divide a large model strictly evenly, making the computational load on each card completely consistent. However, tensor parallelism is highly complex to implement, and the strict partitioning and allocation of parameters itself requires a certain amount of time and computational resources, affecting overall computational efficiency. Pipeline parallelism, on the other hand, usually distributes all layers approximately evenly to different computing cards based on the memory usage of each layer's parameters. Since it only considers the memory usage of each layer's parameters and does not estimate the memory usage of intermediate variables or the floating-point operations performed during computation, the partitioning is not balanced enough. This can easily lead to uneven workloads, with some computing cards having to wait for others, resulting in insufficient utilization of computing cards and affecting overall computational efficiency.

[0071] The method, apparatus, equipment, and medium for processing model data based on financial business provided in this application are intended to solve the above-mentioned technical problems of the prior art.

[0072] The specific application scenario of this application is data processing of financial business models. The method splits the financial business model evenly according to the preset number of stages and distributes the resulting model blocks to different computing cards for computation, thereby improving computing efficiency and reducing the training cost of the model.

[0073] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will now be described with reference to the accompanying drawings.

[0074] Figure 1 is a flowchart illustrating a method for processing model data based on financial business provided in an embodiment of this disclosure. This method can be executed by a device for processing model data based on financial business.

[0075] As shown in Figure 1, the method includes the following steps:

[0076] S101. Obtain the financial business model to be processed and the number of preset stages; wherein, the financial business model to be processed is a pre-built large model used to process the transaction data of financial business, the financial business model to be processed includes an embedding layer, a linear transformation layer and at least one decoder layer, and the number of preset stages represents the number of model blocks of the financial business model to be processed, and each stage corresponds to at least one computing card.

[0077] For example, a financial business model to be processed is obtained. This model is a pre-built large model used to process transaction data from financial transactions, enabling risk management, transaction analysis, etc., based on the data. The pre-built large model can be, for example, the Llama3 model and its derivatives. The financial business model to be processed can include an embedding layer, a linear transformation layer, and at least one decoder layer. The embedding layer, located in the first layer of the model, converts the original text data into vectors based on the model's vocabulary. The vocabulary enhances the model's language processing capabilities and encoding efficiency; that is, the size of the vocabulary directly affects the model's language processing ability—the larger the vocabulary, the stronger the model's text processing capability. The linear transformation layer (LM Head), located in the last layer of the model, generates text or performs language modeling tasks based on the vocabulary size. The decoder layer processes the vector sequence input from the embedding layer, generates a corresponding output sequence based on the input sequence, obtains the mapping relationship between input and output, and generates text based on the mapping relationship.

[0078] Obtain the preset number of stages, that is, the number of stages in the pipeline used to train the financial business model to be processed. The preset number of stages represents the number of model blocks of the financial business model to be processed, where a model block is a part of that model. Each model block is deployed to the computing card of its corresponding stage, and each stage corresponds to at least one computing card, which completes the computing tasks of the model training process.

[0079] In this embodiment, no specific limitation is made on the type of large model.

[0080] S102. Parse the decoder layer into a virtual layer. Based on the vocabulary of the embedding layer and the dimension of the decoder layer, determine the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer. The virtual layer includes an attention layer and a multi-layer perceptron layer. The attention layer includes a normalization layer and a multi-head attention layer. The multi-layer perceptron layer includes another normalization layer and a feedforward layer. The vocabulary represents the set of words used by the large model.

[0081] For example, the decoder layer is parsed into a virtual layer. One decoder layer can be parsed into multiple virtual layers, where the virtual layer can include an attention layer and a multilayer perceptron layer. The attention layer includes a normalization layer and a multi-head attention layer, and the multilayer perceptron layer includes another normalization layer and a feedforward layer. For any decoder layer in the financial business model to be processed, the decoder layer is parsed into two independent parts: an attention layer and a multilayer perceptron layer.

[0082] Figure 2 is a schematic diagram of an exemplary decoder layer provided in an embodiment of this disclosure. As shown in Figure 2, the structure of the decoder layer can be parsed into an attention layer and a multilayer perceptron layer. The attention layer, located above the dashed line, includes a normalization layer (Norm layer) and a multi-head attention layer. The normalization layer normalizes the input content x and passes the normalized content to the multi-head attention layer. The multi-head attention layer calculates the importance of each word in the input sequence to the other words, thereby capturing global contextual information. The multilayer perceptron layer, i.e., the MLP layer in the figure, is located below the dashed line and includes another normalization layer (Norm layer) and a feedforward layer. This second normalization layer normalizes the intermediate quantity h and passes the normalized intermediate quantity to the feedforward layer. The feedforward layer is responsible for extracting higher-level semantic features.

[0083] The calculation process of the decoder layer in Figure 2 can be expressed as follows:

$$h = x + \mathrm{attention}(\mathrm{norm}(x))$$

$$y = h + \mathrm{mlp}(\mathrm{norm}(h))$$

[0084] Where x represents the input, y represents the output, and h represents the intermediate quantity generated by the input x after passing through the attention layer. norm() represents the processing function of the normalization layer, attention() represents the processing function of the multi-head attention layer, and mlp() represents the processing function of the feedforward layer. This expression means that the intermediate quantity h represents the input x plus the normalized input x after processing by the multi-head attention layer; the output y represents the intermediate quantity h plus the normalized intermediate quantity h after processing by the feedforward layer.
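The split of one decoder layer into two virtual layers can be illustrated with a minimal Python sketch. The bodies of `norm`, `attention`, and `mlp` below are toy stand-ins chosen only so the example runs; they are assumptions and not the model's actual operators. The sketch shows only that chaining the attention virtual layer and the multilayer perceptron virtual layer reproduces the computation of the whole decoder layer described above.

```python
from typing import List

Vector = List[float]


def norm(v: Vector) -> Vector:
    # Toy normalization (assumption): scale by the largest magnitude.
    scale = max(abs(a) for a in v) or 1.0
    return [a / scale for a in v]


def attention(v: Vector) -> Vector:
    # Toy stand-in for multi-head attention (assumption): mix with the mean.
    mean = sum(v) / len(v)
    return [0.5 * a + 0.5 * mean for a in v]


def mlp(v: Vector) -> Vector:
    # Toy stand-in for the feedforward layer (assumption): elementwise ReLU.
    return [max(0.0, a) for a in v]


def attention_virtual_layer(x: Vector) -> Vector:
    # First virtual layer: h = x + attention(norm(x))
    return [a + b for a, b in zip(x, attention(norm(x)))]


def mlp_virtual_layer(h: Vector) -> Vector:
    # Second virtual layer: y = h + mlp(norm(h))
    return [a + b for a, b in zip(h, mlp(norm(h)))]


def decoder_layer(x: Vector) -> Vector:
    # Whole decoder layer written out directly from the two expressions.
    h = [a + b for a, b in zip(x, attention(norm(x)))]
    return [a + b for a, b in zip(h, mlp(norm(h)))]
```

Because the second virtual layer depends only on h, the two virtual layers can be placed on different computing cards as long as h is passed between them.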

[0085] Based on the vocabulary of the embedding layer and the dimension of the decoder layer, determine the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer. For example, obtain the vector dimension of the embedding layer, i.e., the dimension of the vectors into which the embedding layer converts text, and obtain the output dimension of the linear transformation layer, i.e., the size of its output vector. Based on the vocabulary size of the embedding layer and its vector dimension, determine the memory usage of the embedding layer as the first usage; based on the vocabulary size of the embedding layer, its vector dimension, and the output dimension of the linear transformation layer, determine the memory usage of the linear transformation layer as the second usage; based on the vector dimension of the embedding layer and the dimension of the decoder layer, determine the memory usage of the decoder layer as the third usage. Based on the first and third usages, determine the numerical relationship (ratio) between the memory usage of the embedding layer and that of the decoder layer as the first value, and based on the first value, determine the number of virtual layers corresponding to the embedding layer. Based on the second and third usages, determine the numerical relationship (ratio) between the memory usage of the linear transformation layer and that of the decoder layer as the second value, and based on the second value, determine the number of virtual layers corresponding to the linear transformation layer. For example, a first value of 0.7 means that the memory usage of the embedding layer is 0.7 times that of the decoder layer. Rounding up gives 1, so the embedding layer is treated as equivalent to 1 decoder layer. Since a decoder layer can be parsed into an attention layer and a multilayer perceptron layer, the number of virtual layers corresponding to the embedding layer is determined to be 2.
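The worked example above (a ratio of 0.7 rounded up to 1 decoder layer, hence 2 virtual layers) can be sketched as follows. The function name and the constant are hypothetical; the memory values are passed in as plain numbers, since how they are measured is not fixed here.

```python
import math

# Each decoder layer is parsed into an attention layer and an MLP layer.
VIRTUAL_LAYERS_PER_DECODER_LAYER = 2


def equivalent_virtual_layers(layer_memory: float, decoder_layer_memory: float) -> int:
    """Number of virtual layers a layer is treated as equivalent to (illustrative)."""
    ratio = layer_memory / decoder_layer_memory
    # Round the ratio up to whole decoder layers, then convert to virtual layers.
    return math.ceil(ratio) * VIRTUAL_LAYERS_PER_DECODER_LAYER
```

For instance, `equivalent_virtual_layers(0.7, 1.0)` gives 2, matching the example in the text.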

[0086] In other words, the number of virtual layers corresponding to an embedding layer can be seen as the number of virtual layers that an embedding layer is equivalent to; the number of virtual layers corresponding to a linear transformation layer can be seen as the number of virtual layers that a linear transformation layer is equivalent to.

[0087] S103. Based on the preset number of stages and the total number of virtual layers in the financial business model to be processed, determine the number of virtual layers in each model block, and deploy the model block to the computing card of the corresponding stage according to the number of virtual layers in the model block.

[0088] For example, the total number of virtual layers in the financial business model is determined based on the number of virtual layers corresponding to the embedding layer, the number of virtual layers corresponding to the linear transformation layer, and the number of decoder layers in the financial business model. Based on the preset number of stages and the total number of virtual layers in the financial business model, the number of virtual layers in each model block is determined, and the model blocks are deployed to the corresponding stage's computing card according to the number of virtual layers in each model block. For example, the financial business model can be divided into a preset number of stage model blocks, and the number of virtual layers in each model block is determined sequentially, layer by layer, to ensure that the number of virtual layers in each model block is the same. For example, the total number of virtual layers can be divided by the number of stages to obtain the number of virtual layers corresponding to each model block. If the result is an integer, it can be evenly distributed; if the result is a decimal, it is rounded up to obtain an integer. The virtual layers corresponding to the embedding layer are located in the first model block, and the virtual layers corresponding to the linear transformation layer are located in the last model block.
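A minimal sketch of this allocation, assuming the virtual layers are simply listed in model order and cut into consecutive blocks. The function names and the string labels are hypothetical. The "emb" and "head" entries are placeholder slots standing for the virtual-layer equivalents of the embedding layer and the linear transformation layer; they are not separately deployable pieces.

```python
import math
from typing import List


def build_virtual_layers(num_decoder_layers: int, emb_virtual: int, head_virtual: int) -> List[str]:
    # Embedding slots come first, linear transformation (head) slots come last.
    layers = ["emb"] * emb_virtual
    for i in range(num_decoder_layers):
        layers += [f"attn{i}", f"mlp{i}"]  # two virtual layers per decoder layer
    layers += ["head"] * head_virtual
    return layers


def split_into_blocks(layers: List[str], num_stages: int) -> List[List[str]]:
    # Virtual layers per block: total divided by stages, rounded up.
    per_block = math.ceil(len(layers) / num_stages)
    return [layers[i * per_block:(i + 1) * per_block] for i in range(num_stages)]
```

With 6 decoder layers, 2 embedding slots, 2 head slots, and 4 stages, the 16 virtual layers split evenly into blocks of 4. When the total is not divisible by the number of stages, rounding up leaves the last block with fewer virtual layers in this sketch.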

[0089] Based on the number of virtual layers in the model block, the model block is deployed to the computing card of the corresponding stage, and the computing card is instructed to perform calculations.

[0090] The method for processing model data based on financial business provided in this embodiment obtains the preset number of stages and the financial business model to be processed. It then splits the decoder layer of the financial business model into virtual layers. Based on information such as the vocabulary of the embedding layer and the dimensions of the decoder layer, the number of virtual layers corresponding to the embedding layer and the linear transformation layer is determined, resulting in the total number of virtual layers for the financial business model. Based on the total number of virtual layers in the financial business model, the number of virtual layers in each model block is determined, and the model blocks are deployed to the computing cards of the corresponding stages for computation and training. By splitting the decoder layer, the content of each layer in the model is made smaller, making the model easier to split and allocate to the corresponding stages, thus improving the utilization rate of the computing cards. By determining the number of virtual layers corresponding to the embedding layer and the linear transformation layer, the total number of virtual layers for the entire model can be uniformly obtained, making the model split more even and consistent, and improving the overall computational efficiency.

[0091] Figure 3 is a flowchart illustrating a method for processing model data based on financial business, as provided in an embodiment of this disclosure.

[0092] In this embodiment, determining the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer based on the vocabulary of the embedding layer and the dimension of the decoder layer includes: determining the performance information (memory usage and computational load) of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer based on the vocabulary of the embedding layer and the dimension of the decoder layer; and determining the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer based on the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer.

[0093] As shown in Figure 3, the method includes the following steps:

[0094] S301. Obtain the financial business model to be processed and the number of preset stages; wherein, the financial business model to be processed is a pre-built large model used to process the transaction data of financial business, the financial business model to be processed includes an embedding layer, a linear transformation layer and at least one decoder layer, and the number of preset stages represents the number of model blocks of the financial business model to be processed, and each stage corresponds to at least one computing card.

[0095] For example, this step can refer to step S101 above, and will not be repeated here.

[0096] S302. Based on the vocabulary of the embedding layer and the dimension of the decoder layer, determine the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer; wherein the performance information includes memory usage information and computational load information.

[0097] For example, the length of the input data sequence of the decoder layer is obtained, and a first ratio, denoted as $\rho_s$, is determined based on the dimension of the decoder layer and the length of the input data sequence. The first ratio is the ratio between the dimension of the input data of the decoder layer and the dimension of the decoder layer.

[0098] Obtain the number of query heads and the number of key-value heads in the multi-head attention layer. Based on these numbers, determine the second ratio, denoted as $\rho_{kv}$. The second ratio represents the ratio of the number of query heads to the number of key-value heads. The multi-head attention layer uses a multi-head attention mechanism, in which each head, i.e., each attention subspace, independently processes a set of query, key, and value matrices. The number of query heads is the number of heads corresponding to the query matrix. The number of key-value heads is the number of heads corresponding to the key matrix and the value matrix.

[0099] Obtain the dimension of the feedforward layer, and determine the third ratio, denoted as $\rho_{mlp}$, based on the dimension of the feedforward layer and the dimension of the decoder layer. The third ratio represents the ratio between the dimension of the feedforward layer and the dimension of the decoder layer. The dimension of the feedforward layer is the dimension of its intermediate layer, i.e., the layer between the input layer and the output layer of the feedforward layer, and represents the number of neurons in that intermediate layer.

[0100] The fourth ratio, denoted as $\rho_{head}$, is determined based on the vocabulary of the embedding layer and the dimension of the decoder layer. The fourth ratio represents the ratio between the vocabulary size of the embedding layer and the dimension of the decoder layer.

[0101] Based on the first ratio, second ratio, third ratio, and fourth ratio, the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer is determined. The performance information includes memory usage information and computational load information; the memory usage information represents the memory consumption of the layer, and the computational load information represents the computational load of the layer.

[0102] The advantage of this setup is that by determining the first ratio, second ratio, third ratio, and fourth ratio based on each parameter, the ratio relationship of each layer can be determined, and the performance information of each layer can be determined based on the ratio relationship. This allows the computational load and memory usage between layers to be accurately quantified, facilitating subsequent processing.

[0103] In this embodiment, the computational complexity information in the embedding layer performance information is 0. The performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer are determined based on a first ratio, a second ratio, a third ratio, and a fourth ratio. This includes: determining the performance information of the attention layer based on the first and second ratios; determining the performance information of the multilayer perceptron layer based on the first and third ratios; determining the performance information of the linear transformation layer based on the first and fourth ratios; and determining the performance information of the embedding layer based on the first and fourth ratios.

[0104] For example, the computational load information in the performance information of the embedding layer is preset to 0.

[0105] Based on the first ratio and the second ratio, the performance information of the attention layer is determined. For example, the memory usage information of the attention layer is determined using a preset formula for the memory usage of the attention layer, and the computational load information of the attention layer is determined using a preset formula for the computational load of the attention layer.

[0106] The preset formula for the memory usage of the attention layer can be: $M_{attn} = 2 + 2\rho_{kv} + 6\rho_s$

[0107] Here, $M_{attn}$ represents the memory usage of the attention layer. $M_{attn}$ equals 2, plus twice the second ratio $\rho_{kv}$, plus 6 times the first ratio $\rho_s$.

[0108] The preset formula for the computational load of the attention layer can be: $F_{attn} = 2 + 2\rho_{kv} + 3\rho_s$

[0109] Here, $F_{attn}$ represents the computational load of the attention layer. $F_{attn}$ equals 2, plus twice the second ratio $\rho_{kv}$, plus 3 times the first ratio $\rho_s$.

[0110] Based on the first ratio and the third ratio, the performance information of the multilayer perceptron layer is determined. For example, the memory usage information in the performance information of the multilayer perceptron layer is determined by using a preset formula for the memory usage of the multilayer perceptron layer, and the computational load information in the performance information of the multilayer perceptron layer is determined by using a preset formula for the computational load of the multilayer perceptron layer based on the third ratio.

[0111] The preset formula for the memory usage of the multilayer perceptron layer can be: $M_{mlp} = 3\rho_{mlp} + 2\rho_s\rho_{mlp} + 2\rho_s$

[0112] Here, $M_{mlp}$ represents the memory usage of the multilayer perceptron layer. $M_{mlp}$ equals 3 times the third ratio, plus 2 times the product of the first and third ratios, plus 2 times the first ratio.

[0113] The preset formula for the computational load of the multilayer perceptron layer can be: $F_{mlp} = 3\rho_{mlp}$

[0114] Here, $F_{mlp}$ represents the computational load of the multilayer perceptron layer. $F_{mlp}$ equals 3 times the third ratio.

[0115] Based on the first ratio and the fourth ratio, the performance information of the linear transformation layer is determined. The memory usage information in the performance information of the linear transformation layer is determined using a preset formula for the memory usage of the linear transformation layer. Based on the fourth ratio, the computational load information in the performance information of the linear transformation layer is determined using a preset formula for the computational load of the linear transformation layer.

[0116] The preset formula for the memory usage of the linear transformation layer can be: $M_{head} = \rho_{head} + \rho_s + \rho_{head}\rho_s$

[0117] Here, $M_{head}$ represents the memory usage of the linear transformation layer. $M_{head}$ equals the fourth ratio, plus the first ratio, plus the product of the fourth ratio and the first ratio.

[0118] The preset formula for the computational load of the linear transformation layer can be: $F_{head} = \rho_{head}$

[0119] Here, $F_{head}$ represents the computational load of the linear transformation layer. $F_{head}$ equals the fourth ratio.

[0120] Based on the first ratio and the fourth ratio, and according to the preset formula for the memory usage of the embedding layer, the memory usage information in the performance information of the embedding layer is determined.

[0121] The preset formula for the memory usage of the embedding layer can be: $M_{emb} = \rho_{head} + \rho_s$

[0122] Here, $M_{emb}$ represents the memory usage of the embedding layer. $M_{emb}$ equals the fourth ratio plus the first ratio.

[0123] The advantage of this setting is that by calculating the memory usage and computational load of each layer in the model based on each ratio, a more accurate calculation of the computational load and memory usage of each layer can be obtained. This can serve as an effective basis for allocating the model and solves the problem of uneven allocation caused by considering only the memory usage of each layer without considering the computational load in the existing technology.
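The preset formulas above can be collected into a small Python sketch. The class and function names (`Ratios`, `Perf`, `attention_perf`, and so on) are hypothetical, and the four ratios are taken as given inputs rather than derived from model parameters. The values are relative quantities expressed through the ratios, not byte or FLOP counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ratios:
    s: float     # first ratio, rho_s
    kv: float    # second ratio, rho_kv
    mlp: float   # third ratio, rho_mlp
    head: float  # fourth ratio, rho_head


@dataclass(frozen=True)
class Perf:
    memory: float   # memory usage information (M)
    compute: float  # computational load information (F)


def attention_perf(r: Ratios) -> Perf:
    return Perf(memory=2 + 2 * r.kv + 6 * r.s, compute=2 + 2 * r.kv + 3 * r.s)


def mlp_perf(r: Ratios) -> Perf:
    return Perf(memory=3 * r.mlp + 2 * r.s * r.mlp + 2 * r.s, compute=3 * r.mlp)


def head_perf(r: Ratios) -> Perf:
    # Linear transformation layer.
    return Perf(memory=r.head + r.s + r.head * r.s, compute=r.head)


def embedding_perf(r: Ratios) -> Perf:
    # The computational load of the embedding layer is preset to 0.
    return Perf(memory=r.head + r.s, compute=0.0)
```

As an illustration with made-up ratios `Ratios(s=1.0, kv=0.5, mlp=4.0, head=8.0)`, the attention layer gets memory 9 and load 6, while the multilayer perceptron layer gets memory 22 and load 12.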

[0124] S303. Based on the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer, determine the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer.

[0125] For example, the number of virtual layers corresponding to the embedding layer is determined based on the memory usage information of the embedding layer, the multilayer perceptron layer, and the attention layer. For instance, the memory usage of the embedding layer can be divided by the sum of the memory usage of the multilayer perceptron layer and the attention layer. The result is the first quotient, which indicates how many multilayer perceptron layer and attention layer pairs the embedding layer is equivalent to. Rounding the first quotient up to the nearest integer and multiplying it by 2 gives the number of virtual layers corresponding to the embedding layer.

[0126] In other words, the multilayer perceptron layer and the attention layer are regarded as a pair of virtual layers, and the memory usage of the multilayer perceptron layer and the attention layer are regarded as the memory usage of a pair of virtual layers. Based on the memory usage information of the embedding layer, the memory usage information of the multilayer perceptron layer, and the memory usage information of the attention layer, the memory usage of the embedding layer is determined to be equivalent to the memory usage of how many pairs of virtual layers, and the number of pairs of virtual layers corresponding to the embedding layer is obtained. The number of pairs of virtual layers corresponding to the embedding layer is then multiplied by 2 to obtain the number of virtual layers corresponding to the embedding layer.

[0127] Based on the performance information of the linear transformation layer, the attention layer, and the multilayer perceptron layer, the number of virtual layers corresponding to the linear transformation layer can be determined. For example, the memory usage of the linear transformation layer can be divided by the sum of the memory usage of the multilayer perceptron layer and the attention layer. The ratio between the linear transformation layer and the multilayer perceptron layer and the attention layer in terms of memory usage is the second quotient, which indicates how many multilayer perceptron layers and attention layers the linear transformation layer is equivalent to in terms of memory usage.

[0128] In other words, based on the memory usage information of the linear transformation layer, the multilayer perceptron layer, and the attention layer, we determine how many pairs of virtual layers the memory usage of the linear transformation layer is equivalent to, and thus obtain the number of virtual layers corresponding to the linear transformation layer at the memory usage level.

[0129] The linear transformation layer's computational cost is divided by the sum of the computational costs of the attention layer and the multilayer perceptron layer. This yields the linear transformation layer's multiplier relationship with the multilayer perceptron layer and the attention layer in terms of computational cost. This is the third quotient, which indicates how many layers of multilayer perceptron and attention layers the linear transformation layer is equivalent to in terms of computational cost.

[0130] In other words, the multilayer perceptron layer and the attention layer are regarded as a pair of virtual layers, and the sum of their computational loads is regarded as the computational load of a pair of virtual layers. Based on the computational load information of the linear transformation layer, the multilayer perceptron layer, and the attention layer, it is determined how many pairs of virtual layers the computational load of the linear transformation layer is equivalent to, thus obtaining the number of virtual layers corresponding to the linear transformation layer in terms of computational load.

[0131] The average of the second and third quotients is determined, and this average is rounded up to obtain the number of virtual layers corresponding to the linear transformation layer.
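The quotient-based steps above can be sketched as follows. The function names are hypothetical. The second function follows the text literally: it averages the second and third quotients and rounds up, without the doubling that the text applies to the embedding layer.

```python
import math


def embedding_virtual_layers(m_emb: float, m_attn: float, m_mlp: float) -> int:
    # First quotient: embedding memory relative to one (attention + MLP) pair.
    first_quotient = m_emb / (m_attn + m_mlp)
    # Round up to whole pairs, then count two virtual layers per pair.
    return math.ceil(first_quotient) * 2


def head_virtual_layers(m_head: float, f_head: float,
                        m_attn: float, f_attn: float,
                        m_mlp: float, f_mlp: float) -> int:
    # Second quotient: memory usage relative to one pair of virtual layers.
    second_quotient = m_head / (m_attn + m_mlp)
    # Third quotient: computational load relative to one pair of virtual layers.
    third_quotient = f_head / (f_attn + f_mlp)
    # Average the two views and round up, as described in the text.
    return math.ceil((second_quotient + third_quotient) / 2)
```

Averaging the memory view and the load view keeps either one from dominating the count for the linear transformation layer.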

[0132] The advantage of this setup is that, since the computational load and memory usage of the multilayer perceptron layer and the attention layer are different, and there is a one-to-one correspondence between the multilayer perceptron layer and the attention layer, treating the multilayer perceptron layer and the attention layer as a pair of virtual layers can reduce the error caused by the difference in computational load and memory usage of the virtual layers. It also allows for the determination of the number of virtual layers corresponding to the linear transformation layer in terms of computational load, the number of virtual layers corresponding to the linear transformation layer in terms of memory usage, and the number of virtual layers corresponding to the embedding layer. Determining the number of virtual layers corresponding to the linear transformation layer and the number of virtual layers corresponding to the embedding layer ensures that the corresponding number of virtual layers is accurate and reliable, facilitating subsequent allocation.

[0133] In this embodiment, determining the number of virtual layers corresponding to the embedding layer based on the performance information of the embedding layer, the multilayer perceptron layer, and the attention layer includes: determining the memory usage information of the virtual layers corresponding to the embedding layer based on the memory usage information of the embedding layer, the multilayer perceptron layer, and the attention layer; determining a first threshold for the number of virtual layers corresponding to the embedding layer based on the memory usage information of the embedding layer and the memory usage information of the virtual layers corresponding to the embedding layer; wherein, the first threshold represents the maximum value of the number of virtual layers corresponding to the embedding layer; and determining the number of virtual layers corresponding to the embedding layer based on the first threshold.

[0134] Specifically, based on the memory usage information of the embedding layer, the multilayer perceptron layer, and the attention layer, the memory usage information of the virtual layer corresponding to the embedding layer is determined. Then, based on the memory usage information of the embedding layer and the corresponding virtual layer, a first threshold for the number of virtual layers corresponding to the embedding layer is determined. For example, a threshold determination function can be pre-set to determine the threshold for the number of virtual layers corresponding to the embedding layer, and the threshold for the number of virtual layers corresponding to the linear transformation layer.

[0135] The preset threshold determination function can be, for example:

[0136] Here, `approximately_layers(X,X0,X1)` represents the threshold determination function. `X` represents the memory or computational cost information of the layer whose threshold needs to be determined; for example, `X` could be the memory cost information of the embedding layer. `X0` and `X1` represent the memory or computational cost of the attention layer or multilayer perceptron layer; for example, `X0` could be the memory cost information of the multilayer perceptron layer, and `X1` could be the memory cost information of the attention layer. Specifically, `X1` represents the memory or computational cost of the virtual layer that is structurally closer to the layer whose threshold needs to be determined. For example, if `X` is the memory cost information of the embedding layer, then `X1` is the memory cost information of the attention layer; if `X` is the memory cost information of the linear transformation layer, then `X1` is the memory cost information of the multilayer perceptron layer.

[0137] `n = floor(X / (X0+X1))` represents the number of pairs of virtual layers corresponding to the layer X for which the threshold needs to be determined, rounded down. Here, `(X0+X1)` is the sum of the memory usage of the attention layer and the multilayer perceptron layer, or the sum of their computational loads; that is, `(X0+X1)` represents the memory usage or computational load of a pair of virtual layers. `X / (X0+X1)` is the multiple of a pair of virtual layers that layer X corresponds to. For example, if X is the memory usage information of the embedding layer, X0 is the memory usage information of the multilayer perceptron layer, and X1 is the memory usage information of the attention layer, then `X / (X0+X1)` represents the ratio of the memory usage of the embedding layer to the memory usage of a pair of virtual layers, i.e., how many pairs of virtual layers the memory usage of the embedding layer is equivalent to. `floor(X / (X0+X1))` is the integer `n` obtained by rounding this multiple down.

[0138] N = {2n, 2n+1, 2(n+1)} represents the set of candidate thresholds for the number of virtual layers corresponding to the layer for which the threshold needs to be determined. Since n is the number of pairs of virtual layers rounded down, determining the threshold directly from n would introduce a large error. Therefore, three possible thresholds are enumerated. Here, N represents the set of candidate thresholds, and $N_i$ represents the element at position i in set N. For example, $N_1$ is the first number in the set, i.e., the first candidate threshold 2n; $N_2$ is the second number in the set, i.e., the second candidate threshold 2n+1; and $N_3$ is the third number in the set, i.e., the third candidate threshold 2(n+1).

[0139] Q = {n(X0+X1), n(X0+X1)+X1, (n+1)(X0+X1)} represents the set of memory usage or computational load values corresponding to the candidate thresholds in set N, and $Q_i$ represents the element at position i in set Q. For example, $Q_1$ is the first number in the set, i.e., the memory usage or computational load n(X0+X1) corresponding to the first candidate threshold 2n. $Q_2$ is the second number in the set, i.e., the memory usage or computational load n(X0+X1)+X1 corresponding to the second candidate threshold 2n+1. $Q_3$ is the third number in the set, i.e., the memory usage or computational load (n+1)(X0+X1) corresponding to the third candidate threshold 2(n+1).

[0140] i = argmin{|X-n(X0+X1)|, |X-n(X0+X1)-X0|, |X-(n+1)(X0+X1)|}, where i represents the position in the set of the element with the minimum error. The error is the absolute value of the difference between the memory usage information of the layer whose threshold needs to be determined and the memory usage information corresponding to a candidate threshold, or the absolute value of the difference between the computational load information of that layer and the computational load information corresponding to a candidate threshold. For example, if X is the memory usage information of the embedding layer, then the error is the absolute value of the difference between the memory usage information of the embedding layer and the memory usage information corresponding to the candidate threshold.

[0141] argmin{|X-n(X0+X1)|, |X-n(X0+X1)-X0|, |X-(n+1)(X0+X1)|} represents the position of the minimum error. For example, if the minimum error is |X-n(X0+X1)|, then i = 1; if the minimum error is |X-n(X0+X1)-X0|, then i = 2; and if the minimum error is |X-(n+1)(X0+X1)|, then i = 3.

[0142] return $N_i$, $Q_i$ means returning the candidate threshold at the position of the minimum error together with its corresponding memory usage or computational load. The returned candidate threshold $N_i$ is taken as the threshold for the number of virtual layers, and the returned $Q_i$ is taken as the memory usage or computational load of the corresponding virtual layers. For example, if X is the memory usage information of the embedding layer, X0 is the memory usage information of the multilayer perceptron layer, X1 is the memory usage information of the attention layer, and i = 1 is obtained from the errors, then $N_1$ is 2n, that is, the threshold for the number of virtual layers corresponding to the embedding layer is 2n; and $Q_1$ is n(X0+X1), that is, the memory usage of the virtual layers corresponding to the embedding layer is n(X0+X1).
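The threshold determination function described step by step above might be sketched in Python as below. This is an illustrative reading of the description rather than the disclosure's exact listing: the errors are computed as the absolute difference between X and each entry of the set Q, and the middle entry of Q is taken as n(X0+X1)+X1, which is an assumption of the sketch. Positions are 0-based in the code, while the text counts from 1.

```python
import math
from typing import Tuple


def approximately_layers(x: float, x0: float, x1: float) -> Tuple[int, float]:
    """Return (candidate threshold N_i, matching cost Q_i) with the smallest error."""
    pair = x0 + x1                # cost of one pair of virtual layers
    n = math.floor(x / pair)      # whole pairs that fit into x
    candidates = [2 * n, 2 * n + 1, 2 * (n + 1)]      # set N
    costs = [n * pair, n * pair + x1, (n + 1) * pair]  # set Q (middle term: assumption)
    errors = [abs(x - q) for q in costs]
    i = errors.index(min(errors))  # position of the minimum error (first on ties)
    return candidates[i], costs[i]
```

With the made-up values `approximately_layers(9.0, 22.0, 9.0)`, n is 0, the candidate costs are 0, 9, and 31, and the function returns `(1, 9.0)`: the layer is closest in cost to a single extra virtual layer.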

[0143] By inputting the memory usage information of the embedding layer, the multilayer perceptron layer, and the attention layer into the preset threshold determination function, approximately_layers($M_{emb}$, $M_{mlp}$, $M_{attn}$) is obtained. According to the function, n = floor($M_{emb}$ / ($M_{mlp}$ + $M_{attn}$)).

[0144] And based on n, the set N is determined as: N={2n,2n+1,2(n+1)}

[0145] And determine the set Q as: Q = {n($M_{mlp}$ + $M_{attn}$), n($M_{mlp}$ + $M_{attn}$) + $M_{mlp}$, (n+1)($M_{mlp}$ + $M_{attn}$)}

[0146] And determine the position i of the minimum error: i = argmin{|$M_{emb}$ - n($M_{mlp}$ + $M_{attn}$)|, |$M_{emb}$ - n($M_{mlp}$ + $M_{attn}$) - $M_{mlp}$|, |$M_{emb}$ - (n+1)($M_{mlp}$ + $M_{attn}$)|}

[0147] Determine the value of i, and based on the value of i, return the first threshold for the number of virtual layers corresponding to the embedding layer and the memory usage information of the virtual layers corresponding to the embedding layer.

[0148] Based on the memory usage information of the embedding layer, the multilayer perceptron layer, and the attention layer, a first threshold is determined for the number of virtual layers corresponding to the embedding layer, using a preset threshold determination function. This first threshold represents the maximum number of virtual layers corresponding to the embedding layer, indicating the number of virtual layers obtained from the memory usage perspective. Since the computational complexity information of the embedding layer is 0, the number of virtual layers corresponding to the embedding layer obtained from the computational complexity perspective is 0. An integer between the first threshold and 0 is then determined as the number of virtual layers corresponding to the embedding layer. For example, the number of virtual layers corresponding to the embedding layer can be determined to be 0, or any integer between the first threshold and 0.

[0149] The advantage of this setup is that by determining the first threshold for the number of virtual layers corresponding to the embedding layer based on the memory usage information of the embedding layer, the multilayer perceptron layer, and the attention layer, the number of virtual layers corresponding to the embedding layer can be accurately determined. Treating the multilayer perceptron layer and the attention layer as a pair of virtual layers reduces errors. Selecting the number of virtual layers corresponding to the embedding layer between 0 and the first threshold allows the number to be adjusted as needed, which helps address memory overflow situations.

[0150] In this embodiment, the number of virtual layers corresponding to the linear transformation layer is determined based on the performance information of the linear transformation layer, the attention layer, and the multilayer perceptron layer. This includes: determining the memory usage information of the virtual layers corresponding to the linear transformation layer based on the memory usage information of the linear transformation layer, the attention layer, and the multilayer perceptron layer; determining a second threshold for the number of virtual layers corresponding to the linear transformation layer based on the memory usage information of the linear transformation layer and the memory usage information of the virtual layers corresponding to the linear transformation layer; wherein the second threshold represents the maximum value of the number of virtual layers corresponding to the linear transformation layer; determining the computational workload information of the virtual layers corresponding to the linear transformation layer based on the computational workload information of the linear transformation layer, the attention layer, and the multilayer perceptron layer; determining a third threshold for the number of virtual layers corresponding to the linear transformation layer based on the computational workload information of the linear transformation layer and the computational workload information of the virtual layers corresponding to the linear transformation layer; wherein the third threshold represents the minimum value of the number of virtual layers corresponding to the linear transformation layer; and determining the number of virtual layers corresponding to the linear transformation layer based on the second threshold and the third threshold.

[0151] Specifically, based on the memory usage information of the linear transformation layer, the attention layer, and the multilayer perceptron layer, and using a preset threshold determination function, the memory usage information of the virtual layers corresponding to the linear transformation layer is determined. Then, based on the memory usage information of the linear transformation layer and of the corresponding virtual layers, a second threshold for the number of virtual layers corresponding to the linear transformation layer is determined. That is, the memory usage information of the linear transformation layer, the attention layer, and the multilayer perceptron layer is input into the preset threshold determination function approximately_layers($M_{head}$, $M_{attn}$, $M_{mlp}$), and the memory usage information of the virtual layers corresponding to the linear transformation layer and the second threshold are obtained through calculation. The second threshold represents the maximum number of virtual layers corresponding to the linear transformation layer, that is, the number of virtual layers corresponding to the linear transformation layer from the memory usage perspective.

[0152] Based on the computational load information of the linear transformation layer, the attention layer, and the multilayer perceptron layer, the computational load information of the virtual layers corresponding to the linear transformation layer is determined. Then, based on the computational load information of the linear transformation layer and of the corresponding virtual layers, a third threshold for the number of virtual layers corresponding to the linear transformation layer is determined. Specifically, the computational load information of the linear transformation layer, the attention layer, and the multilayer perceptron layer is input into the preset threshold determination function approximately_layers($F_{head}$, $F_{attn}$, $F_{mlp}$), and the computational load information of the virtual layers corresponding to the linear transformation layer and the third threshold are obtained through calculation. The third threshold represents the minimum number of virtual layers corresponding to the linear transformation layer, that is, the number of virtual layers corresponding to the linear transformation layer from the computational load perspective.

[0153] Based on the second threshold and the third threshold, an integer between the third threshold and the second threshold is determined as the number of virtual layers corresponding to the linear transformation layer. For example, the third threshold itself can be used as the number of virtual layers corresponding to the linear transformation layer, or any other integer between the third threshold and the second threshold can be used.

[0154] The advantage of this setting is that by determining the number of virtual layers corresponding to the linear transformation layer based on the second and third thresholds, the exact number of virtual layers corresponding to the linear transformation layer can be determined, and a numerical range can be given, which facilitates subsequent adjustments based on the actual scenario.

[0155] S304. Based on the preset number of stages and the total number of virtual layers in the financial business model to be processed, determine the number of virtual layers in each model block, and deploy the model block to the computing card of the corresponding stage according to the number of virtual layers in the model block.

[0156] For example, this step can be referred to step S103 above, and will not be repeated here.

[0157] The method for processing model data based on financial business provided in this embodiment obtains the preset number of stages and the financial business model to be processed. It then splits the decoder layer of the financial business model into virtual layers. Based on information such as the vocabulary of the embedding layer and the dimensions of the decoder layer, the number of virtual layers corresponding to the embedding layer and the linear transformation layer is determined, resulting in the total number of virtual layers for the financial business model. Based on the total number of virtual layers in the financial business model, the number of virtual layers in each model block is determined, and the model blocks are deployed to the computing cards of the corresponding stages for computation and training. By splitting the decoder layer, the content of each layer in the model is made smaller, making the model easier to split and allocate to the corresponding stages, thus improving the utilization rate of the computing cards. By determining the number of virtual layers corresponding to the embedding layer and the linear transformation layer, the total number of virtual layers for the entire model can be uniformly obtained, making the model split more even and consistent, and improving the overall computational efficiency.

[0158] Figure 4 is a flowchart illustrating a method for processing model data based on financial business, as provided in an embodiment of this disclosure.

[0159] In this embodiment, the number of virtual layers in each model block is determined based on the preset number of stages and the total number of virtual layers in the financial business model to be processed. This includes: determining the quotient of the total number of virtual layers in the financial business model to be processed divided by the preset number of stages as the number of layers to be adjusted; determining the remainder of the total number of virtual layers in the financial business model to be processed divided by the preset number of stages as the number of layers to be allocated; determining the number of virtual layers in the last model block of the financial business model to be processed based on the number of layers to be adjusted and the number of virtual layers corresponding to the linear transformation layer; and determining the number of virtual layers in each model block based on the preset number of stages, the number of layers to be adjusted, the number of layers to be allocated, and the number of virtual layers in the last model block.

[0160] As shown in Figure 4, the method includes the following steps:

[0161] S401. Obtain the financial business model to be processed and the number of preset stages; wherein, the financial business model to be processed is a pre-built large model used to process the transaction data of financial business, the financial business model to be processed includes an embedding layer, a linear transformation layer and at least one decoder layer, and the number of preset stages represents the number of model blocks of the financial business model to be processed, and each stage corresponds to at least one computing card.

[0162] For example, this step can refer to step S101 above, and will not be repeated here.

[0163] S402. Parse the decoder layer into a virtual layer. Based on the vocabulary of the embedding layer and the dimension of the decoder layer, determine the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer. The virtual layer includes an attention layer and a multi-layer perceptron layer. The attention layer includes a normalization layer and a multi-head attention layer. The multi-layer perceptron layer includes another normalization layer and a feedforward layer. The vocabulary represents the set of words used by the large model.

[0164] For example, this step can refer to step S102 above, and will not be repeated here.

[0165] S403. Determine the quotient of the total number of virtual layers in the financial business model to be processed divided by the preset number of stages as the number of layers to be adjusted, and determine the remainder of the same division as the number of layers to be allocated.

[0166] For example, if memory resources are sufficient, optimization techniques are used during the training of the financial business model to be processed. To achieve maximum speed, the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer can be set to the minimum value within their intervals. That is, the number of virtual layers corresponding to the embedding layer, $n_{emb}$, is 0, and the number of virtual layers corresponding to the linear transformation layer, $n_{head}$, is the third threshold.

[0167] If a memory overflow occurs in the last layer or the first layer, the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer can each be adjusted to an integer close to the maximum value within its interval.
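The choice between the two ends of each interval can be sketched as follows. This is a simplified illustration: the function name and the `memory_overflow` flag are hypothetical, and the sketch picks the interval maximum itself where the text only requires an integer close to the maximum.

```python
def choose_layer_counts(first_threshold, second_threshold, third_threshold,
                        memory_overflow=False):
    """Pick (n_emb, n_head) from their allowed intervals.

    n_emb lies in [0, first_threshold]; n_head lies in
    [third_threshold, second_threshold].
    """
    if third_threshold > second_threshold:
        raise ValueError("third threshold (minimum) exceeds second threshold (maximum)")
    if memory_overflow:
        # Simplification: use the maximum of each interval.
        return first_threshold, second_threshold
    # Memory is sufficient: use the minimum of each interval for speed.
    return 0, third_threshold


print(choose_layer_counts(4, 6, 3))
print(choose_layer_counts(4, 6, 3, memory_overflow=True))
```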

[0168] The number of virtual layers corresponding to all decoder layers, $n_{dec}$, the number of virtual layers corresponding to the embedding layer, $n_{emb}$, and the number of virtual layers corresponding to the linear transformation layer, $n_{head}$, are summed to obtain the total number S of virtual layers in the financial business model to be processed: $S = n_{emb} + n_{head} + n_{dec}$.

[0169] Based on the preset number of stages $n_{stage}$ and the total number of virtual layers S in the financial business model to be processed, S is divided by $n_{stage}$ and the quotient is rounded down to the nearest integer; the rounded quotient is used as the number of layers to be adjusted, p, that is, $p = \lfloor S / n_{stage} \rfloor$. The remainder q of S divided by $n_{stage}$ is determined as the number of layers to be allocated, that is, the number of layers left over after the layers to be adjusted have been allocated.
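As a small worked illustration of this quotient-and-remainder step, the sketch below computes S, p, and q with integer division. The function name and the sample layer counts are hypothetical.

```python
def split_total_layers(n_emb, n_head, n_dec, n_stage):
    """Return (S, p, q): total virtual layers, layers to be adjusted,
    and layers to be allocated."""
    if n_stage <= 0:
        raise ValueError("n_stage must be positive")
    s = n_emb + n_head + n_dec
    p, q = divmod(s, n_stage)  # p = floor(S / n_stage), q = S - p * n_stage
    return s, p, q


# Hypothetical model: 32 decoder layers split into 64 virtual layers,
# n_emb = 0, n_head = 2, deployed over 8 stages.
print(split_total_layers(0, 2, 64, 8))
```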

[0170] The advantage of this setup is that it obtains the total number of virtual layers of the financial model to be processed, and by taking the integer quotient of the total number of virtual layers according to the preset number of stages, it can first determine the number of layers to be adjusted and whether there is a remainder, which facilitates the subsequent average distribution of the financial model to be processed.

[0171] S404. Based on the number of layers to be adjusted, the number of virtual layers corresponding to the linear transformation layer, the performance information of the virtual layers corresponding to the embedding layer, and the performance information of the virtual layers corresponding to the linear transformation layer, determine the number of virtual layers in the last model block of the financial business model to be processed.

[0172] For example, the number of virtual layers $V_{stage-1}$ in the last model block of the financial business model to be processed can be determined from the number of layers to be adjusted and the number of virtual layers corresponding to the linear transformation layer, using a preset formula for determining the last model block. The formula can be, for example: $V_{stage-1} = \max\{1,\ p - n_{head} + 1\}$

[0173] Here, $V_{stage-1}$ is the number of virtual layers in the last model block, and max{} takes the maximum value. The formula indicates that the number of virtual layers in the last model block is the larger of 1 and the number of layers to be adjusted p minus the number of virtual layers $n_{head}$ corresponding to the linear transformation layer, plus 1.

[0174] If $V_{stage-1}$ equals 1, that is, $\max\{1,\ p - n_{head} + 1\} = 1$, then $p - n_{head} \le 0$, i.e., the number of layers to be adjusted p is less than or equal to the number of virtual layers $n_{head}$ corresponding to the linear transformation layer. At this point, the number of layers to be adjusted in the last stage is insufficient to accommodate all the virtual layers corresponding to the linear transformation layer. To preserve the integrity of the linear transformation layer, all the virtual layers corresponding to the linear transformation layer are allocated to the model block of the last stage, and all the virtual layers corresponding to the embedding layer are allocated to the model block of the first stage. Based on the number of remaining virtual layers and the number of remaining stages, the new number of layers to be adjusted and the new number of layers to be allocated are recalculated.
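The last-block rule and the condition that triggers the recalculation can be sketched as follows. The function name and the returned flag are hypothetical; the formula is the example formula given in the text.

```python
def last_block_layers(p, n_head):
    """Return (V_last, needs_recalculation) for the last model block.

    V_last = max(1, p - n_head + 1). When V_last is 1, p <= n_head, so
    the last stage cannot hold the linear transformation layer within p
    layers and the p/q split has to be recomputed.
    """
    v_last = max(1, p - n_head + 1)
    return v_last, v_last == 1


print(last_block_layers(8, 2))  # the head fits within p layers
print(last_block_layers(2, 3))  # the head does not fit
```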

[0175] A pre-defined allocation preference is set, which includes a memory allocation preference and a computational allocation preference. The preset allocation preference is used to update the number of layers to be adjusted and the number of layers to be allocated. Specifically, the memory allocation preference indicates that the number of layers to be adjusted and the number of layers to be allocated are updated first according to the memory usage information, while the computational allocation preference indicates that the number of layers to be adjusted and the number of layers to be allocated are updated first according to the computational load information.

[0176] If the current allocation preference is the memory allocation preference, that is, allocation is based on memory usage information, then the number of remaining virtual layers is the total number of virtual layers S minus the number of virtual layers corresponding to the embedding layer, $n_{emb}$, and minus the number of virtual layers corresponding to the linear transformation layer, $n_{head}$. That is, the number of remaining virtual layers equals the number of virtual layers corresponding to the decoder layers, $n_{dec}$. The number of remaining stages is the preset number of stages $n_{stage}$ minus the first and last stages, that is, $n_{stage} - 2$.

[0177] Based on the number of remaining virtual layers $n_{dec}$ and the number of remaining stages $n_{stage} - 2$, determine the new number of layers to be adjusted, p: $p = \lfloor n_{dec} / (n_{stage} - 2) \rfloor$

[0178] This formula indicates that the number of remaining virtual layers $n_{dec}$ is divided by the number of remaining stages $n_{stage} - 2$, the quotient is rounded down to the nearest integer, and the resulting integer is used as the new number of layers to be adjusted, p.

[0179] Based on the number of remaining virtual layers $n_{dec}$, the new number of layers to be adjusted p, and the number of remaining stages $n_{stage} - 2$, determine the new number of layers to be allocated, q: $q = n_{dec} - p(n_{stage} - 2)$

[0180] This formula indicates that the product of the new number of layers to be adjusted p and the number of remaining stages $n_{stage} - 2$ is subtracted from the number of virtual layers $n_{dec}$ corresponding to the decoder layers to obtain the new number of layers to be allocated, q.

[0181] If the current allocation preference is the computational allocation preference, that is, allocation is based on computational load information, then the number of remaining virtual layers is the total number of virtual layers S minus the number of virtual layers $n_{head}$ corresponding to the linear transformation layer. That is, the number of remaining virtual layers equals the number of virtual layers corresponding to the decoder layers, $n_{dec}$. Since the computational load of the virtual layers corresponding to the embedding layer is 0, meaning the embedding layer adds no computational overhead to the first stage, the number of remaining stages is the preset number of stages $n_{stage}$ minus the last stage, that is, $n_{stage} - 1$.

[0182] Based on the number of remaining virtual layers $n_{dec}$ and the number of remaining stages $n_{stage} - 1$, determine the new number of layers to be adjusted, p: $p = \lfloor n_{dec} / (n_{stage} - 1) \rfloor$

[0183] This formula indicates that the number of remaining virtual layers $n_{dec}$ is divided by the number of remaining stages $n_{stage} - 1$, the quotient is rounded down to the nearest integer, and the resulting integer is used as the new number of layers to be adjusted, p.

[0184] Based on the number of remaining virtual layers $n_{dec}$, the new number of layers to be adjusted p, and the number of remaining stages $n_{stage} - 1$, determine the new number of layers to be allocated, q: $q = n_{dec} - p(n_{stage} - 1)$

[0185] This formula indicates that the product of the new number of layers to be adjusted p and the number of remaining stages $n_{stage} - 1$ is subtracted from the number of virtual layers $n_{dec}$ corresponding to the decoder layers to obtain the new number of layers to be allocated, q.
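The two recalculation variants differ only in how many stages remain for the decoder virtual layers. A minimal sketch follows, assuming the preference is passed as the string "memory" or "compute"; those labels and the function name are hypothetical.

```python
def recompute_p_q(n_dec, n_stage, preference):
    """Recompute (p, q) after reserving stages for the embedding/head layers.

    "memory":  first and last stages are reserved -> n_stage - 2 remain.
    "compute": only the last stage is reserved    -> n_stage - 1 remain.
    """
    if preference == "memory":
        remaining_stages = n_stage - 2
    elif preference == "compute":
        remaining_stages = n_stage - 1
    else:
        raise ValueError("preference must be 'memory' or 'compute'")
    if remaining_stages <= 0:
        raise ValueError("not enough stages left for the decoder layers")
    # p = floor(n_dec / remaining_stages), q = n_dec - p * remaining_stages
    return divmod(n_dec, remaining_stages)


print(recompute_p_q(64, 8, "memory"))
print(recompute_p_q(64, 8, "compute"))
```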

[0186] If the new number of layers to be adjusted, p, is even or the new number of layers to be allocated, q, is 0, then the remaining layers can be evenly distributed among the model blocks corresponding to each stage.

[0187] If the new number of layers to be allocated q is greater than 1, and the memory usage or computational load allocated to the first or last stage is less than that of the other stages, then the stage with the lowest memory usage or computational load is selected first, and a layer to be allocated is assigned to that stage. If the memory usage or computational load of the last stage is less than that of the other stages, a layer to be allocated is assigned to the last stage, the number of virtual layers in the model block of the last stage is updated to $V_{stage-1} = V_{stage-1} + 1$, and the number of layers to be allocated is updated to q = q - 1, that is, the number of layers still to be allocated after this assignment. If the memory usage or computational load of the first stage is less than that of the other stages, a layer to be allocated is assigned to the first stage, and the number of layers to be allocated is likewise updated to q = q - 1.

[0188] The advantage of this setup is that by starting from the last model block and splitting from back to front, the model splitting can be guaranteed not to damage the original layers, and it is easier to split, making the splitting more uniform.

[0189] S405. Determine the number of virtual layers in each model block based on the preset number of stages, the number of layers to be adjusted, the number of layers to be allocated, and the number of virtual layers in the last model block.

[0190] For example, starting from the model block corresponding to the last stage, from back to front, the number of virtual layers in each model block is determined sequentially according to the preset number of stages, the number of layers to be adjusted, the number of layers to be allocated, and the number of virtual layers in the last model block. The financial business model to be processed is then split into model blocks according to the number and order of virtual layers.

[0191] If the number of layers to be allocated is still greater than 0 after this allocation, meaning unallocated layers remain, these layers are distributed evenly among the middle stages.

[0192] For example, a step size step can be determined, where step represents the stage interval at which the q layers to be allocated are inserted. That is, when the financial business model to be processed cannot be divided evenly among the stages, the leftover virtual layers are inserted at intervals of step stages. The formula for determining step can be, for example: $step = \lfloor (n_{stage} - 2) / q \rfloor$ if $q > 0$, and $step = 0$ otherwise.

[0193] Here, step represents the step size. This formula indicates that when the number of layers to be allocated, q, is greater than 0, i.e., there are layers to be allocated, step equals the quotient of the number of remaining stages $n_{stage} - 2$ divided by q, rounded down; otherwise, when there are no layers to be allocated, step is 0.

[0194] The interval control amount m is then determined from the step size step. The interval control amount controls the interval at which the layers to be allocated are assigned, so as to offset the unevenness caused by the differing computation or memory usage of the extra layers. The formula for determining m is: $m = step - 1$ if $step > 1$, and $m = 0$ otherwise.

[0195] Here, m represents the interval control amount. This formula indicates that when the step size step is greater than 1, that is, when there are layers to be allocated and a step size exists, m equals step minus 1. Since the memory usage and computational load of the attention layer and the multilayer perceptron layer differ, distributing the layers to be allocated evenly across the computing cards of the stages requires controlling the memory usage or computational load of the virtual layers in each stage. Using the interval control amount to select the stages into which the layers to be allocated are inserted makes the allocation more uniform. In all other cases, for example when there are no layers to be allocated and no step size, m is 0.

[0196] Allocate to the model blocks corresponding to the number of stages in turn from back to front according to the preset allocation function. The preset allocation function can be, for example:

[0197] for j from 0 to $n_{stage} - 3$:

[0198] If $j < q \cdot step$ and $\mathrm{mod}(j, step) = m$: $V_{stage-2-j} = p + 1$

[0199] Otherwise: $V_{stage-2-j} = p$

[0200] Here, for denotes a loop in which j starts at 0 and ends at $n_{stage} - 3$; $V_j$ represents the number of virtual layers in the model block of stage j, with $V_0$ being the first model block and $V_{stage-1}$ the last; and mod() is the modulo function, so mod(j, step) = m means that the remainder of j divided by step is m. The formula means that when both j < q * step and mod(j, step) = m hold, the model block of stage stage-2-j has p + 1 virtual layers, that is, the number of layers to be adjusted plus 1; otherwise, the model block of stage stage-2-j has p virtual layers, that is, the number of layers to be adjusted.
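The back-to-front allocation over the middle stages can be sketched in runnable form. The function name is hypothetical, and the step and m rules are taken from their verbal descriptions in the text. A guard for step = 0 is added so that the modulo is never evaluated with a zero divisor; in that case the condition j < q * step is already false.

```python
def allocate_middle_stages(n_stage, p, q):
    """Return {stage_index: virtual_layer_count} for stages 1 .. n_stage - 2."""
    step = (n_stage - 2) // q if q > 0 else 0  # stage interval between extras
    m = step - 1 if step > 1 else 0            # interval control amount
    layers = {}
    for j in range(n_stage - 2):               # j = 0 .. n_stage - 3
        stage = n_stage - 2 - j                # walk from back to front
        gets_extra = step > 0 and j < q * step and j % step == m
        layers[stage] = p + 1 if gets_extra else p
    return layers


# Hypothetical case: 6 stages, p = 3, two leftover layers.
print(allocate_middle_stages(6, 3, 2))
```

In this hypothetical case step is 2 and m is 1, so every second middle stage, counting from the back, receives one extra virtual layer. The first and last stages are not assigned by this loop.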

[0201] The beneficial effect of this setting is that by first determining the number of layers to be adjusted and determining whether there are layers to be allocated, a more accurate allocation of the financial model to be processed can be performed; by allocating from back to front, the integrity of specific layers in the model block can be effectively ensured; by judging whether there are layers to be allocated and according to information such as the step size determined by the layers to be allocated, the remaining virtual layers can be evenly inserted into the model blocks corresponding to the number of stages according to the step size, making the allocation more uniform.

[0202] The method for processing model data based on financial business provided in this embodiment obtains the preset number of stages and the financial business model to be processed, and splits the decoder layer of the financial business model into virtual layers. Based on information such as the vocabulary of the embedding layer and the dimension of the decoder layer, it determines the number of virtual layers corresponding to the embedding layer and to the linear transformation layer, which gives the total number of virtual layers for the entire financial business model. Based on that total, it determines the number of virtual layers in each model block and deploys each model block to the computing card of the corresponding stage to compute and train the financial business model. Splitting the decoder layer makes each layer in the model smaller, so the model is easier to split and allocate to the corresponding stages, which improves the utilization rate of the computing cards. Determining the number of virtual layers corresponding to the embedding layer and the linear transformation layer allows the total number of virtual layers for the entire model to be obtained on a uniform basis, making the model split more even and consistent and improving the overall computational efficiency.

[0203] Figure 5 is a structural block diagram of a device for processing model data based on financial business provided by an embodiment of the present disclosure.

[0204] For ease of description, only parts related to the embodiments of the present disclosure are shown. Referring to Figure 5, the device 500 for processing model data based on financial business includes: an acquisition unit 501, a parsing unit 502, and an allocation unit 503.

[0205] The acquisition unit 501 is used to acquire the financial business model to be processed and the number of preset stages; wherein, the financial business model to be processed is a pre-built large model used to process the transaction data of financial business, the financial business model to be processed includes an embedding layer, a linear transformation layer and at least one decoder layer, and the number of preset stages represents the number of model blocks of the financial business model to be processed, and each stage corresponds to at least one computing card.

[0206] The parsing unit 502 is used to parse the decoder layer into a virtual layer, and determine the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer based on the vocabulary of the embedding layer and the dimension of the decoder layer; wherein, the virtual layer includes an attention layer and a multilayer perceptron layer, the attention layer includes a normalization layer and a multi-head attention layer, and the multilayer perceptron layer includes another normalization layer and a feedforward layer;

[0207] The allocation unit 503 is used to determine the number of virtual layers in each model block according to the preset number of stages and the total number of virtual layers in the financial business model to be processed, and to deploy the model block to the computing card of the corresponding stage according to the number of virtual layers in the model block.

[0208] Figure 6 is a structural block diagram of a processing device for model data based on financial business provided in an embodiment of this disclosure.

[0209] Based on the embodiment shown in Figure 5, the parsing unit 502 includes an occupancy determination module 5021 and a quantity determination module 5022.

[0210] The occupancy determination module 5021 is used to determine the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer based on the vocabulary of the embedding layer and the dimension of the decoder layer; wherein the performance information includes memory usage information and computational load information.

[0211] The quantity determination module 5022 is used to determine the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer based on the performance information of the embedding layer, the performance information of the linear transformation layer, the performance information of the attention layer, and the performance information of the multilayer perceptron layer.

[0212] In one example, the occupancy determination module 5021 also includes:

[0213] The first ratio submodule is used to obtain the input data dimension of the decoder layer and determine a first ratio based on the dimension of the decoder layer and the input data dimension of the decoder layer; wherein, the first ratio represents the ratio between the input data dimension of the decoder layer and the dimension of the decoder layer.

[0214] The second ratio submodule is used to obtain the number of query heads and the number of key-value heads of the multi-head attention layer, and determine the second ratio based on the number of query heads and the number of key-value heads; wherein, the second ratio represents the ratio between the number of query heads and the number of key-value heads;

[0215] The third ratio submodule is used to obtain the dimension of the feedforward layer and determine the third ratio based on the dimension of the feedforward layer and the dimension of the decoder layer; wherein, the third ratio represents the ratio between the dimension of the feedforward layer and the dimension of the decoder layer.

[0216] The fourth ratio submodule is used to determine a fourth ratio based on the vocabulary of the embedding layer and the dimension of the decoder layer; wherein the fourth ratio represents the ratio between the size of the vocabulary of the embedding layer and the dimension of the decoder layer.

[0217] The occupancy determination submodule is used to determine the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer based on the first ratio, the second ratio, the third ratio, and the fourth ratio.

[0218] In one example, the occupancy determination submodule is specifically used for:

[0219] Based on the first ratio and the second ratio, the performance information of the attention layer is determined;

[0220] Based on the first ratio and the third ratio, the performance information of the multilayer perceptron layer is determined;

[0221] Based on the first ratio and the fourth ratio, the performance information of the linear transformation layer is determined;

[0222] The performance information of the embedding layer is determined based on the first ratio and the fourth ratio.
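The four ratios described above, and the pairing of ratios with layers, can be illustrated with a minimal Python sketch. The `ModelConfig` container, its field names, and the `RATIO_INPUTS` table are hypothetical names introduced here for illustration only. The sketch stops at the ratios themselves, because the formulas that turn ratios into memory usage and computational load are not restated in this passage.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container; field names are illustrative assumptions."""
    hidden_dim: int       # dimension of the decoder layer
    input_dim: int        # input data dimension of the decoder layer
    num_query_heads: int  # query heads of the multi-head attention layer
    num_kv_heads: int     # key-value heads of the multi-head attention layer
    ffn_dim: int          # dimension of the feedforward layer
    vocab_size: int       # size of the embedding layer's vocabulary


def compute_ratios(cfg: ModelConfig) -> Dict[str, float]:
    """Return the first to fourth ratios as defined in the description."""
    return {
        "first": cfg.input_dim / cfg.hidden_dim,
        "second": cfg.num_query_heads / cfg.num_kv_heads,
        "third": cfg.ffn_dim / cfg.hidden_dim,
        "fourth": cfg.vocab_size / cfg.hidden_dim,
    }


# Which ratios feed the performance information of each layer.
RATIO_INPUTS: Dict[str, Tuple[str, str]] = {
    "attention": ("first", "second"),
    "multilayer_perceptron": ("first", "third"),
    "linear_transformation": ("first", "fourth"),
    "embedding": ("first", "fourth"),
}

# Example values are invented for illustration, not taken from the source.
example = ModelConfig(hidden_dim=4096, input_dim=8192, num_query_heads=32,
                      num_kv_heads=8, ffn_dim=16384, vocab_size=131072)
ratios = compute_ratios(example)
```

Keeping the ratio computation separate from the per-layer estimates makes it straightforward to substitute whatever memory and load formulas a given embodiment uses.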

[0223] In one example, the quantity determination module 5022 includes:

[0224] The embedding quantity submodule is used to determine the number of virtual layers corresponding to the embedding layer based on the performance information of the embedding layer, the performance information of the multilayer perceptron layer, and the performance information of the attention layer.

[0225] The linear quantity submodule is used to determine the number of virtual layers corresponding to the linear transformation layer based on the performance information of the linear transformation layer, the performance information of the attention layer, and the performance information of the multilayer perceptron layer.

[0226] In one example, the embedding quantity submodule is specifically used for:

[0227] Based on the memory usage information of the embedding layer, the memory usage information of the multilayer perceptron layer, and the memory usage information of the attention layer, determine the memory usage information of the virtual layer corresponding to the embedding layer;

[0228] Based on the memory usage information of the embedding layer and the memory usage information of the virtual layer corresponding to the embedding layer, a first threshold for the number of virtual layers corresponding to the embedding layer is determined; wherein, the first threshold represents the maximum number of virtual layers corresponding to the embedding layer.

[0229] Based on the first threshold, the number of virtual layers corresponding to the embedding layer is determined.
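One way the embedding-layer steps above might be realized is sketched below. The exact formulas are not given in this passage, so every formula here is an assumption: the memory of a reference virtual layer is taken as the mean of the attention and multilayer perceptron footprints, the first threshold is the embedding memory expressed in those units and rounded up, and the threshold itself is used as the count. The function name and parameters are hypothetical.

```python
import math


def embedding_virtual_layers(mem_embedding: float,
                             mem_attention: float,
                             mem_mlp: float) -> int:
    """Estimate how many virtual layers the embedding layer counts as.

    All three inputs are memory usage figures in the same unit.
    """
    # Assumption: a reference virtual layer uses the mean of the two footprints.
    mem_virtual = (mem_attention + mem_mlp) / 2.0
    # Assumption: the first threshold (maximum count) is the embedding memory
    # measured in reference virtual layers, rounded up, and at least 1.
    first_threshold = max(1, math.ceil(mem_embedding / mem_virtual))
    # Assumption: the count is set to the threshold itself.
    return first_threshold
```

For instance, with an embedding footprint of 6.0 and attention and multilayer perceptron footprints of 1.0 and 3.0, this sketch treats the embedding layer as three virtual layers.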

[0230] In one example, the linear quantity submodule is specifically used for:

[0231] Based on the memory usage information of the linear transformation layer, the memory usage information of the attention layer, and the memory usage information of the multilayer perceptron layer, determine the memory usage information of the virtual layer corresponding to the linear transformation layer;

[0232] Based on the memory usage information of the linear transformation layer and the memory usage information of the virtual layer corresponding to the linear transformation layer, a second threshold for the number of virtual layers corresponding to the linear transformation layer is determined; wherein, the second threshold represents the maximum number of virtual layers corresponding to the linear transformation layer.

[0233] Based on the computational load information of the linear transformation layer, the computational load information of the attention layer, and the computational load information of the multilayer perceptron layer, the computational load information of the virtual layer corresponding to the linear transformation layer is determined.

[0234] Based on the computational load information of the linear transformation layer and the computational load information of the virtual layer corresponding to the linear transformation layer, a third threshold is determined for the number of virtual layers corresponding to the linear transformation layer; wherein, the third threshold represents the minimum value of the number of virtual layers corresponding to the linear transformation layer.

[0235] The number of virtual layers corresponding to the linear transformation layer is determined based on the second threshold and the third threshold.
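The linear transformation layer is bounded from above by memory (second threshold) and from below by computation (third threshold). The following sketch shows one possible way to combine the two bounds. The formulas are assumptions made for illustration: reference values are the mean of the attention and multilayer perceptron figures, the upper bound is rounded up, the lower bound is rounded down, and the smallest feasible count is chosen. The function name and parameters are hypothetical.

```python
import math


def linear_virtual_layers(mem_linear: float, mem_attention: float, mem_mlp: float,
                          load_linear: float, load_attention: float,
                          load_mlp: float) -> int:
    """Pick a virtual-layer count for the linear transformation layer."""
    # Assumption: reference virtual layer = mean of attention and MLP figures.
    mem_virtual = (mem_attention + mem_mlp) / 2.0
    load_virtual = (load_attention + load_mlp) / 2.0
    # Second threshold: maximum count, derived from memory usage.
    second_threshold = max(1, math.ceil(mem_linear / mem_virtual))
    # Third threshold: minimum count, derived from computational load.
    third_threshold = max(1, math.floor(load_linear / load_virtual))
    # Assumption: use the minimum count, capped by the memory-based maximum
    # when the two bounds conflict.
    return min(third_threshold, second_threshold)
```

Choosing the lower bound keeps the last model block as light as the computational estimate allows, while the cap prevents the count from exceeding what the memory estimate supports.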

[0236] In one example, allocation unit 503 includes:

[0237] The remainder determination module is used to divide the total number of virtual layers in the financial business model to be processed by the preset number of stages, and to determine the quotient as the number of layers to be adjusted and the remainder as the number of layers to be allocated.

[0238] The first layer number determination module is used to determine the number of virtual layers in the last model block of the financial business model to be processed based on the number of layers to be adjusted, the number of virtual layers corresponding to the linear transformation layer, the performance information of the virtual layer corresponding to the embedding layer, and the performance information of the virtual layer corresponding to the linear transformation layer.

[0239] The second layer number determination module is used to determine the number of virtual layers in each model block based on the preset number of stages, the number of layers to be adjusted, the number of layers to be allocated, and the number of virtual layers in the last model block.
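The quotient and remainder step can be sketched as follows. This is a simplified illustration only: it spreads the remainder over the leading stages, which is an assumption, and it does not model the separate adjustment of the last model block described above. The function name `split_virtual_layers` is hypothetical.

```python
from typing import List


def split_virtual_layers(total_layers: int, num_stages: int) -> List[int]:
    """Distribute virtual layers over pipeline stages as evenly as possible."""
    if num_stages < 1 or total_layers < num_stages:
        raise ValueError("need at least one virtual layer per stage")
    # Quotient: number of layers to be adjusted (base layers per stage).
    # Remainder: number of layers to be allocated (left over after the split).
    layers_to_adjust, layers_to_allocate = divmod(total_layers, num_stages)
    # Assumption: the leftover layers go one each to the first stages.
    return [layers_to_adjust + (1 if stage < layers_to_allocate else 0)
            for stage in range(num_stages)]


# Example: 70 virtual layers over 4 stages.
blocks = split_virtual_layers(70, 4)
```

In the example, the quotient is 17 and the remainder is 2, so the first two stages receive 18 virtual layers and the last two receive 17.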

[0240] Figure 7 is a structural block diagram of an electronic device provided in an embodiment of this disclosure. The electronic device may be a terminal device or a server. As shown in Figure 7, the electronic device 700 includes: at least one processor 702; and a memory 701 communicatively connected to the at least one processor 702. The memory stores instructions that can be executed by the at least one processor 702. The instructions are executed by the at least one processor 702 to enable the at least one processor 702 to execute the processing method for model data based on financial business disclosed in this disclosure.

[0241] The electronic device 700 also includes a receiver 703 and a transmitter 704. The receiver 703 is used to receive instructions and data sent by other devices, and the transmitter 704 is used to send instructions and data to external devices.

[0242] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.

[0243] According to embodiments of this disclosure, this disclosure also provides a computer program product comprising: a computer program stored in a readable storage medium, at least one processor of an electronic device being able to read the computer program from the readable storage medium, and the at least one processor executing the computer program causing the electronic device to perform the scheme provided in any of the above embodiments.

[0244] Figure 8 is a block diagram illustrating an electronic device according to an exemplary embodiment. The device may be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, etc.

[0245] Device 800 may include one or more of the following components: processing component 802, memory 804, power supply component 806, multimedia component 808, audio component 810, input / output (I / O) interface 812, sensor component 814, and communication component 816.

[0246] Processing component 802 typically controls the overall operation of device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. Processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the methods described above. Furthermore, processing component 802 may include one or more modules to facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.

[0247] Memory 804 is configured to store various types of data to support the operation of device 800. Examples of this data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, etc. Memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0248] Power supply component 806 provides power to various components of device 800. Power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 800.

[0249] Multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundaries of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 808 includes a front-facing camera and / or a rear-facing camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and / or the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

[0250] Audio component 810 is configured to output and / or input audio signals. For example, audio component 810 includes a microphone (MIC) configured to receive external audio signals when device 800 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 804 or transmitted via communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

[0251] I / O interface 812 provides an interface between processing component 802 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0252] Sensor assembly 814 includes one or more sensors for providing state assessments of various aspects of device 800. For example, sensor assembly 814 may detect the on / off state of device 800, the relative positioning of components such as the display and keypad of device 800, changes in the position of device 800 or a component of device 800, the presence or absence of user contact with device 800, the orientation or acceleration / deceleration of device 800, and temperature changes of device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor assembly 814 may also include an accelerometer, a gyroscope, a magnetometer, a pressure sensor, or a temperature sensor.

[0253] Communication component 816 is configured to facilitate wired or wireless communication between device 800 and other devices. Device 800 can access wireless networks based on communication standards, such as WiFi, 2G, or 3G, or combinations thereof. In one exemplary embodiment, communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

[0254] In an exemplary embodiment, device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.

[0255] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 804 including instructions, which can be executed by a processor 820 of the device 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc.

[0256] A non-transitory computer-readable storage medium, wherein when the instructions in the storage medium are executed by the processor of a terminal device, the terminal device is able to execute the aforementioned processing method for model data based on financial business.

[0257] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily essential to this application.

[0258] It should be further noted that although the steps in the flowchart are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowchart may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these sub-steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.

[0259] It should be understood that the above-described device embodiments are merely illustrative, and the device of this application can also be implemented in other ways. For example, the division of units / modules in the above embodiments is only a logical functional division, and there may be other division methods in actual implementation. For example, multiple units, modules, or components may be combined, or integrated into another system, or some features may be ignored or not executed.

[0260] Furthermore, unless otherwise specified, the functional units / modules in the various embodiments of this application can be integrated into one unit / module, or each unit / module can exist physically separately, or two or more units / modules can be integrated together. The integrated units / modules described above can be implemented in hardware or as software program modules.

[0261] When integrated units / modules are implemented in hardware, the hardware can be digital circuits, analog circuits, etc. The physical implementation of the hardware structure includes, but is not limited to, transistors, memristors, etc. Unless otherwise specified, the processor can be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, and ASIC, etc. Unless otherwise specified, the storage unit can be any suitable magnetic or magneto-optical storage medium, such as Resistive Random Access Memory (RRAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Enhanced Dynamic Random Access Memory (EDRAM), High-Bandwidth Memory (HBM), Hybrid Memory Cube (HMC), etc.

[0262] If the integrated unit / module is implemented as a software program module and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a memory and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard drive, magnetic disk, or optical disk.

[0263] In the above embodiments, the descriptions of each embodiment have their own emphasis. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments. The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as the combination of these technical features does not contradict each other, it should be considered within the scope of this specification.

[0264] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this application are indicated by the following claims.

[0265] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims

1. A method for processing model data based on financial business, characterized in that the method comprises: obtaining a financial business model to be processed and a preset number of stages; wherein the financial business model to be processed is a pre-built large model used to process the transaction data of financial business, the financial business model to be processed includes an embedding layer, a linear transformation layer, and at least one decoder layer, the preset number of stages represents the number of model blocks of the financial business model to be processed, and each stage corresponds to at least one computing card; parsing the decoder layer into virtual layers, and determining, based on the vocabulary of the embedding layer and the dimension of the decoder layer, the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer; wherein the virtual layer includes an attention layer and a multilayer perceptron layer, the attention layer includes a normalization layer and a multi-head attention layer, the multilayer perceptron layer includes another normalization layer and a feedforward layer, and the vocabulary represents the set of words used by the large model; and determining, based on the preset number of stages and the total number of virtual layers in the financial business model to be processed, the number of virtual layers in each model block, and deploying each model block to the computing card of the corresponding stage according to the number of virtual layers in that model block.

2. The method according to claim 1, characterized in that, The number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transform layer are determined based on the vocabulary of the embedding layer and the dimensions of the decoder layer, including: Based on the vocabulary of the embedding layer and the dimensions of the decoder layer, the performance information of the embedding layer, the linear transform layer, the attention layer, and the multilayer perceptron layer is determined; wherein, the performance information includes memory usage information and computational cost information; Based on the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer, the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer are determined.

3. The method according to claim 2, characterized in that, Based on the vocabulary of the embedding layer and the dimensions of the decoder layer, the performance information of the embedding layer, the linear transform layer, the attention layer, and the multilayer perceptron layer is determined, including: Obtain the input data dimension of the decoder layer, and determine a first ratio based on the dimension of the decoder layer and the input data dimension of the decoder layer; wherein, the first ratio represents the ratio between the input data dimension of the decoder layer and the dimension of the decoder layer; Obtain the number of query heads and the number of key-value heads in the multi-head attention layer, and determine a second ratio based on the number of query heads and the number of key-value heads; wherein, the second ratio represents the ratio between the number of query heads and the number of key-value heads; Obtain the dimension of the feedforward layer, and determine a third ratio based on the dimension of the feedforward layer and the dimension of the decoder layer; wherein, the third ratio represents the ratio between the dimension of the feedforward layer and the dimension of the decoder layer; A fourth ratio is determined based on the vocabulary of the embedding layer and the dimension of the decoder layer; wherein the fourth ratio represents the ratio between the size of the vocabulary of the embedding layer and the dimension of the decoder layer. Based on the first ratio, the second ratio, the third ratio, and the fourth ratio, the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer are determined.

4. The method according to claim 3, characterized in that, The computational complexity information in the embedding layer performance information is 0; based on the first ratio, the second ratio, the third ratio, and the fourth ratio, the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer are determined, including: The performance information of the attention layer is determined based on the first ratio and the second ratio; The performance information of the multilayer perceptron layer is determined based on the first ratio and the third ratio; The performance information of the linear transformation layer is determined based on the first ratio and the fourth ratio; The performance information of the embedding layer is determined based on the first ratio and the fourth ratio.

5. The method according to claim 2, characterized in that, Based on the performance information of the embedding layer, the linear transformation layer, the attention layer, and the multilayer perceptron layer, the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer are determined, including: The number of virtual layers corresponding to the embedding layer is determined based on the performance information of the embedding layer, the performance information of the multilayer perceptron layer, and the performance information of the attention layer. The number of virtual layers corresponding to the linear transformation layer is determined based on the performance information of the linear transformation layer, the performance information of the attention layer, and the performance information of the multilayer perceptron layer.

6. The method according to claim 5, characterized in that, The number of virtual layers corresponding to the embedding layer is determined based on the performance information of the embedding layer, the performance information of the multilayer perceptron layer, and the performance information of the attention layer, including: Based on the memory usage information of the embedding layer, the memory usage information of the multilayer perceptron layer, and the memory usage information of the attention layer, the memory usage information of the virtual layer corresponding to the embedding layer is determined; Based on the memory usage information of the embedding layer and the memory usage information of the virtual layer corresponding to the embedding layer, a first threshold for the number of virtual layers corresponding to the embedding layer is determined; wherein, the first threshold represents the maximum number of virtual layers corresponding to the embedding layer; The number of virtual layers corresponding to the embedding layer is determined based on the first threshold.

7. The method according to claim 5, characterized in that, Based on the performance information of the linear transformation layer, the performance information of the attention layer, and the performance information of the multilayer perceptron layer, the number of virtual layers corresponding to the linear transformation layer is determined, including: Based on the memory usage information of the linear transformation layer, the memory usage information of the attention layer, and the memory usage information of the multilayer perceptron layer, the memory usage information of the virtual layer corresponding to the linear transformation layer is determined. Based on the memory usage information of the linear transformation layer and the memory usage information of the virtual layer corresponding to the linear transformation layer, a second threshold is determined for the number of virtual layers corresponding to the linear transformation layer; wherein, the second threshold represents the maximum value of the number of virtual layers corresponding to the linear transformation layer. Based on the computational complexity information of the linear transformation layer, the computational complexity information of the attention layer, and the computational complexity information of the multilayer perceptron layer, the computational complexity information of the virtual layer corresponding to the linear transformation layer is determined. Based on the computational complexity information of the linear transformation layer and the computational complexity information of the virtual layer corresponding to the linear transformation layer, a third threshold is determined for the number of virtual layers corresponding to the linear transformation layer; wherein, the third threshold represents the minimum value of the number of virtual layers corresponding to the linear transformation layer. 
The number of virtual layers corresponding to the linear transformation layer is determined based on the second threshold and the third threshold.

8. The method according to claim 6 or 7, characterized in that, The computational complexity of the virtual layer corresponding to the embedding layer is 0. Based on the preset number of stages and the total number of virtual layers in the financial business model to be processed, the number of virtual layers in each model block is determined, including: The total number of virtual layers in the financial business model to be processed is divided by the preset number of stages, the quotient is determined as the number of layers to be adjusted, and the remainder is determined as the number of layers to be allocated; Based on the number of layers to be adjusted, the number of virtual layers corresponding to the linear transformation layer, the performance information of the virtual layer corresponding to the embedding layer, and the performance information of the virtual layer corresponding to the linear transformation layer, the number of virtual layers in the last model block of the financial business model to be processed is determined; The number of virtual layers in each model block is determined based on the preset number of stages, the number of layers to be adjusted, the number of layers to be allocated, and the number of virtual layers in the last model block.

9. A processing device for model data based on financial business, comprising: An acquisition unit, used to acquire the financial business model to be processed and the preset number of stages; wherein, the financial business model to be processed is a pre-built large model used to process the transaction data of financial business, the financial business model to be processed includes an embedding layer, a linear transformation layer and at least one decoder layer, the preset number of stages represents the number of model blocks of the financial business model to be processed, and each stage corresponds to at least one computing card; A parsing unit, used to parse the decoder layer into virtual layers, and determine the number of virtual layers corresponding to the embedding layer and the number of virtual layers corresponding to the linear transformation layer based on the vocabulary of the embedding layer and the dimension of the decoder layer; wherein, the virtual layer includes an attention layer and a multilayer perceptron layer, the attention layer includes a normalization layer and a multi-head attention layer, the multilayer perceptron layer includes another normalization layer and a feedforward layer, and the vocabulary represents the set of words used by the large model; An allocation unit, used to determine the number of virtual layers in each model block according to the preset number of stages and the total number of virtual layers in the financial business model to be processed, and to deploy each model block to the computing card of the corresponding stage according to the number of virtual layers in that model block.

10. An electronic device, characterized in that it comprises: a processor, and a memory communicatively connected to the processor; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory to implement the method as described in any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions, which, when executed by a processor, are used to implement the method as described in any one of claims 1 to 8.

12. A computer program product, characterized in that it comprises a computer program that, when executed by a processor, implements the method of any one of claims 1 to 8.