A training data screening method and apparatus for model supervised fine-tuning
By extracting sparse activation features from within the model and performing adaptive enhancement, the problem of the disconnect between mechanistic interpretability research and data selection is solved, achieving efficient model training data selection and improving the model's learning ability and training efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TIANJIN UNIV
- Filing Date
- 2026-04-14
- Publication Date
- 2026-06-19
AI Technical Summary
Existing mechanistic interpretability studies cannot be effectively translated into model optimization actions, and existing data screening methods ignore the internal state of the model, resulting in low screening accuracy and inefficiency.
By extracting hidden state representations from the target network layer of the source model, performing sparse projection processing using a sparse computing accelerator, selecting multidimensional sparse activation features, enhancing the model through adaptive decoding weights, and using significant positive information to filter training data, supervised fine-tuning of the model is achieved.
It improves the observability and learning ability of the model training process, enhances the performance of the model on specific tasks, saves computing resources, and improves training efficiency.
Description
Technical Field
[0001] This application relates to the field of model training and optimization technology, and specifically to a training data selection method and apparatus for supervised fine-tuning of a model.
Background Technology
[0002] Mechanistic interpretability (MI) is a research approach that reverse engineers the internal mechanisms of neural networks to understand "how" a model (such as a large language model) "works." Its goal is to reveal the specific paths of information flow, representation formation, and decision generation when a model processes tasks. However, existing MI research remains at the level of "explaining" model behavior, or is applied only to interventions during inference. Such inference-time intervention often suffers from high latency and instability in practical applications, and there is currently no general process for transforming these deep insights into the model's internal workings (i.e., the specific mechanisms by which information is represented, transmitted, and transformed) into proactive guidance signals that directly optimize model construction during the training phase.
[0003] At the same time, existing data screening techniques have significant limitations: on the one hand, they usually treat the model to be trained as a black box, mainly relying on external signals to judge the utility of the data and ignoring the direct feedback of the model's internal state to the data; on the other hand, when the scale of data increases, some data screening methods based on external indicators often cannot surpass simple random selection, leading to questions about their effectiveness.
Summary of the Invention
[0004] In view of the above problems, this application provides a training data selection method and apparatus for supervised model fine-tuning, which is used to solve at least one of the above problems.
[0005] The first aspect of this application provides a training data selection method for supervised fine-tuning of a model, comprising: extracting hidden state representations from the results of a target network layer of a source model processing a text synthesis sequence, and invoking a sparse computing accelerator to perform sparse projection processing on the hidden state representations to obtain multidimensional sparse activation features, wherein the text synthesis sequence includes a text description and an execution result of a target task; performing a feature selection operation based on an activation frequency threshold on the features in each dimension of the multidimensional sparse activation features to obtain a candidate feature set, wherein the candidate feature set is stored in a memory with a multi-level caching mechanism; performing an operation on an activation scalar based on adaptive decoding weights to obtain a feature influence vector, and using the feature influence vector to enhance the residual weights of the source model to obtain an enhanced source model, wherein the activation scalar is obtained by performing an operation based on adaptive encoding weights on the results of the target network layer processing verification data; using the original result of the source model executing the target task and the enhanced result of the enhanced source model executing the target task to obtain significant positive information for verifying the causal relationship between the input and output of the source model, and using the significant positive information to perform a selection operation on the candidate feature set to obtain a target feature set; and using the target feature set to perform data selection processing based on feature activation magnitude on an original training dataset to obtain a target training dataset for supervised fine-tuning of a target model.
[0006] According to an embodiment of this application, the above-mentioned invoking a sparse computing accelerator to perform sparse projection processing on the hidden state representation to obtain multidimensional sparse activation features includes: deploying a trained sparse autoencoder into a hardware acceleration device to obtain a sparse computing accelerator, wherein the trained sparse autoencoder is used to spatially map the hidden features generated by the source model during the execution of the target task, and the hardware acceleration device includes an image processing device, a tensor processing device, a neural network processing device, or an application-specific integrated circuit; invoking the sparse computing accelerator to obtain the adaptive encoding weights in the process of the target network layer processing the text synthesis sequence, and performing linear activation processing on the hidden state representation and the adaptive encoding weights to achieve sparse projection processing on the hidden state representation to obtain multidimensional sparse activation features.
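The sparse projection step described above can be illustrated with a minimal sketch. It assumes the common ReLU form of a sparse autoencoder encoder, ReLU(W_enc · h + b_enc); the application only states that a linear activation operation is applied to the hidden state representation and the adaptive encoding weights, so the ReLU form, the bias term, and the names `sae_encode`, `w_enc`, and `b_enc` are assumptions made for illustration. The hardware accelerator is not modelled.

```python
from typing import List


def sae_encode(hidden: List[float],
               w_enc: List[List[float]],
               b_enc: List[float]) -> List[float]:
    """Map a dense hidden state to sparse feature activations.

    Assumed form: ReLU(W_enc @ hidden + b_enc), with one row of w_enc
    per sparse feature. This is a sketch, not the patented computation.
    """
    activations = []
    for row, bias in zip(w_enc, b_enc):
        pre_activation = sum(w * h for w, h in zip(row, hidden)) + bias
        activations.append(max(0.0, pre_activation))  # ReLU zeroes inactive features
    return activations


# Toy example: a 3-dimensional hidden state and 4 hypothetical features.
hidden_state = [1.0, -2.0, 0.5]
w_enc = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
    [-1.0, -1.0, 0.0],
]
b_enc = [0.0, 0.0, -0.5, 0.0]
features = sae_encode(hidden_state, w_enc, b_enc)
```

In this toy input the second feature receives a negative pre-activation and is zeroed, which is the sparsity the encoder is meant to produce.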
[0007] According to embodiments of this application, the above-described feature selection operation based on the activation frequency threshold, performed on the features in each dimension of the multidimensional sparse activation features to obtain a candidate feature set, includes: obtaining the marker positions of keyword elements in the target task corpus to which the text synthesis sequence belongs, wherein the target task corpus is stored in a first cache memory, and the keyword elements connect the text description and the execution result of the target task; calculating, using the marker positions, the feature activation frequency of each dimension of the multidimensional sparse activation features in the target task corpus, wherein the feature activation frequency is the proportion of samples in the target task corpus on which the feature is activated to the total number of samples in the target task corpus; comparing the feature activation frequency with the activation frequency threshold to obtain a comparison result; and if the comparison result indicates that the feature activation frequency is greater than or equal to the activation frequency threshold, taking the corresponding feature as a candidate feature, thereby obtaining the candidate feature set.
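The frequency-threshold selection above can be sketched as follows. The sketch assumes that a feature counts as "activated" on a sample when its activation at the keyword-element position is greater than zero; the application does not define this criterion, and the function name and toy values are hypothetical.

```python
from typing import List


def select_candidate_features(activations: List[List[float]],
                              freq_threshold: float) -> List[int]:
    """Return indices of features whose activation frequency meets the threshold.

    activations[s][j] is the activation of feature j on sample s, read at the
    keyword-element position. "Activated" is assumed to mean a value > 0.
    """
    n_samples = len(activations)
    n_features = len(activations[0])
    candidates = []
    for j in range(n_features):
        active_samples = sum(1 for sample in activations if sample[j] > 0.0)
        frequency = active_samples / n_samples  # share of samples where j fires
        if frequency >= freq_threshold:
            candidates.append(j)
    return candidates


# Toy corpus: 4 samples, 3 features.
corpus_activations = [
    [1.2, 0.0, 0.3],
    [0.8, 0.0, 0.0],
    [2.1, 0.4, 0.0],
    [0.5, 0.0, 0.9],
]
candidate_set = select_candidate_features(corpus_activations, freq_threshold=0.5)
```

Here feature 0 fires on every sample, feature 2 on exactly half, and feature 1 on a quarter, so a threshold of 0.5 keeps features 0 and 2.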
[0008] According to an embodiment of this application, the above-mentioned operation of the activation scalar based on adaptive decoding weights to obtain the feature influence vector includes: obtaining the hidden state representation of the verification data generated during the processing of the verification data stored in the first cache memory by the target network layer; calling a sparse computing accelerator with a trained sparse autoencoder to perform an operation based on adaptive encoding weights on the hidden state representation of the verification data to obtain the activation scalar; during the processing of the verification data by the target network layer, calling the sparse computing accelerator to obtain the adaptive decoding weights of the current verification data processing stage, and performing an operation between the adaptive decoding weights and the activation scalar to obtain the feature influence vector.
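A minimal sketch of the feature influence vector follows. It assumes that the operation between the adaptive decoding weights and the activation scalar is a scalar multiplication of one feature's decoder direction, as is common when steering with sparse autoencoder features; the application does not specify the operation, and `feature_influence_vector` and `decoder_direction` are hypothetical names.

```python
from typing import List


def feature_influence_vector(activation: float,
                             decoder_direction: List[float]) -> List[float]:
    """Scale one feature's decoder direction by its activation scalar.

    Assumed form: influence = activation * decoder_direction.
    """
    return [activation * d for d in decoder_direction]


# Toy example: activation scalar 2.0 and a 3-dimensional decoder direction.
influence = feature_influence_vector(2.0, [0.5, -1.0, 0.25])
```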
[0009] According to an embodiment of this application, the above-mentioned method of using feature influence vectors to enhance the residual weights of the source model to obtain an enhanced source model includes: extracting the verification hidden state representation from the processing result of the target network layer on the verification data stored in the first cache memory, and performing a calculation on the verification hidden state representation and the feature influence vector to obtain the enhanced verification hidden state representation; using the enhanced verification hidden state representation to enhance the weights of the residual layer of the source model to obtain the enhanced source model, and storing the parameters of the enhanced source model in the second cache memory.
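The enhancement step above can be sketched under the assumption that the calculation on the verification hidden state representation and the feature influence vector is an element-wise addition scaled by a strength factor. The `strength` parameter and the function name are hypothetical, and the sketch stops at the enhanced hidden state: writing it back into the residual-layer weights and caching the parameters are not modelled.

```python
from typing import List


def enhance_hidden_state(hidden: List[float],
                         influence: List[float],
                         strength: float = 1.0) -> List[float]:
    """Add a scaled feature influence vector to a hidden state (assumed form)."""
    if len(hidden) != len(influence):
        raise ValueError("hidden state and influence vector must have equal length")
    return [h + strength * v for h, v in zip(hidden, influence)]


# Toy example: amplify the influence vector by a factor of 2.
enhanced = enhance_hidden_state([1.0, 2.0, 3.0], [1.0, -2.0, 0.5], strength=2.0)
```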
[0010] According to an embodiment of this application, the above-mentioned method of obtaining significant positive information for verifying the causal relationship between the input and output of the source model by using the original result of the source model executing the target task and the enhanced result of the enhanced source model executing the target task includes: scoring the original result and the enhanced result stored in the third cache memory using a preset evaluation index to obtain the score of the original result and the score of the enhanced result; and calculating the score of the original result and the score of the enhanced result to obtain significant positive information.
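As an illustration of the scoring step, the sketch below assumes that the significant positive information is the difference between the enhanced score and the original score, kept only when it is positive. The application leaves both the evaluation index and the calculation unspecified, so the difference form, the feature indices, and the scores are hypothetical.

```python
def performance_gain(original_score: float, enhanced_score: float) -> float:
    """Score improvement from amplifying a feature (assumed to be a difference)."""
    return enhanced_score - original_score


# Hypothetical evaluation scores per candidate feature: (original, enhanced).
scores = {0: (0.5, 0.75), 2: (0.5, 0.25), 7: (0.5, 0.5)}
gains = {feat: performance_gain(orig, enh) for feat, (orig, enh) in scores.items()}

# Keep only features whose amplification improved the score.
positive = {feat: gain for feat, gain in gains.items() if gain > 0.0}
```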
[0011] According to an embodiment of this application, the above-mentioned filtering operation on the candidate feature set using significant positive information to obtain the target feature set includes: sorting the candidate feature set in descending order using significant positive information to obtain a descending-ordered candidate feature set, and selecting multiple candidate features ranked first from the descending-ordered candidate feature set to construct the target feature set.
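The descending sort and selection of the first-ranked features can be sketched as below. The gain values and the choice of `k` are hypothetical; the application does not state how many features are kept.

```python
from typing import Dict, List


def top_k_features(gains: Dict[int, float], k: int) -> List[int]:
    """Return the k feature indices with the largest gains, in descending order."""
    ranked = sorted(gains, key=lambda feat: gains[feat], reverse=True)
    return ranked[:k]


# Toy example: four candidate features with hypothetical gains.
target_features = top_k_features({11: 0.02, 4: 0.31, 9: 0.17, 23: 0.08}, k=2)
```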
[0012] According to an embodiment of this application, the above-mentioned data filtering process based on feature activation magnitude of the original training dataset using the target feature set to obtain the target training dataset includes: concatenating each input sample and its corresponding label value in the original training dataset based on the label position to obtain a concatenated training dataset; calculating the feature activation magnitude value of each concatenated training data at the label position using the target feature set to obtain the feature resonance score of each concatenated training data; sorting and filtering the concatenated training dataset according to the feature resonance score of each concatenated training data to obtain an initial target training dataset; and decoupling the target input sample and its corresponding label value at the label position of each initial target training data in the initial target training dataset to obtain a target training dataset for supervised fine-tuning of the model.
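The concatenate, score, rank, and decouple steps above can be sketched as follows. The sketch assumes that the feature resonance score is the sum of the target-feature activations at the label position; the application does not state the aggregation. `toy_activations` is a stand-in for running the source model and the sparse autoencoder, and all names and data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, str]  # (input text, label text)


def select_training_data(dataset: Sequence[Sample],
                         activations_at_label: Callable[[str], List[float]],
                         target_features: Sequence[int],
                         keep: int,
                         separator: str = "\n") -> List[Sample]:
    """Rank samples by feature resonance score and keep the top `keep`."""
    scored = []
    for text, label in dataset:
        joined = text + separator + label           # concatenate input and label
        acts = activations_at_label(joined)         # activations at the label position
        score = sum(acts[j] for j in target_features)  # assumed aggregation: sum
        scored.append((score, text, label))
    scored.sort(key=lambda item: item[0], reverse=True)
    # Decouple: return the input and label as separate fields again.
    return [(text, label) for _, text, label in scored[:keep]]


def toy_activations(joined: str) -> List[float]:
    """Stand-in extractor: feature 0 counts digits, feature 1 counts 'sum'."""
    return [float(sum(ch.isdigit() for ch in joined)), float(joined.count("sum"))]


data = [("add 2 and 3", "5"), ("say hello", "hello"), ("sum 10 and 20", "30")]
selected = select_training_data(data, toy_activations, target_features=[0, 1], keep=2)
```

With the toy extractor the three samples score 3, 0, and 7 respectively, so the two arithmetic samples are kept and the unrelated one is dropped.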
[0013] According to embodiments of this application, the target task includes at least one of a translation task, a summary generation task, and a mathematical reasoning task; wherein the source model includes a translation source model, a summary generation source model, and a mathematical reasoning source model; wherein the target model includes a translation target model, a summary generation target model, and a mathematical reasoning target model; wherein the text synthesis sequence includes at least one of a translation text sequence, a summary generation text sequence, and a mathematical reasoning text sequence; wherein the verification data includes at least one of a text sample for the translation task, a text sample for the summary generation task, and a text sample for the mathematical reasoning task.
[0014] A second aspect of this application provides a training data selection apparatus for supervised fine-tuning of a model, comprising: a multidimensional sparse activation feature acquisition module, used to extract hidden state representations from the results of the target network layer of the source model processing the text synthesis sequence, and to invoke a sparse computing accelerator to perform sparse projection processing on the hidden state representations to obtain multidimensional sparse activation features, wherein the text synthesis sequence includes a text description and an execution result of the target task; a candidate feature set acquisition module, used to perform a feature selection operation based on an activation frequency threshold on the features in each dimension of the multidimensional sparse activation features to obtain a candidate feature set, wherein the candidate feature set is stored in a memory with a multi-level caching mechanism; a source model enhancement module, used to perform an operation on the activation scalar based on adaptive decoding weights to obtain a feature influence vector, and to use the feature influence vector to enhance the residual weights of the source model to obtain an enhanced source model, wherein the activation scalar is obtained by performing an operation based on adaptive encoding weights on the results of the target network layer processing the verification data; a target feature set acquisition module, used to obtain, from the original result of the source model executing the target task and the enhanced result of the enhanced source model executing the target task, significant positive information for verifying the causal relationship between the input and output of the source model, and to use the significant positive information to perform a selection operation on the candidate feature set to obtain a target feature set; and a training dataset selection module, used to perform data selection processing based on feature activation magnitude on the original training dataset using the target feature set to obtain a target training dataset for supervised fine-tuning of the target model.
[0015] A third aspect of this application provides an electronic device comprising: one or more processors; and a memory for storing one or more computer programs, wherein the one or more processors execute the one or more computer programs to implement the steps of the method described above.
[0016] A fourth aspect of this application also provides a computer-readable storage medium having a computer program or instructions stored thereon, which, when executed by a processor, implement the steps of the above-described method.
[0017] The fifth aspect of this application also provides a computer program product, including a computer program or instructions that, when executed by a processor, implement the steps of the above-described method.
[0018] The training data selection method for supervised model fine-tuning provided in this application extracts sparse activation features that truly drive the task from the model's hidden states, more closely resembling the model's actual learning mechanism. This upgrades the model's execution process for the target task from "data-driven" to "feature-driven." By processing the hidden state representations generated during the model's execution of the target task, the training process is transformed from a black box into an observable process. This solves the technical problem that existing mechanisms cannot effectively translate interpretability results into model optimization actions, proactively enhancing the model's learning ability and effectively avoiding the idleness and waste of interpretable information in the model optimization process. Furthermore, the training data selection method for supervised model fine-tuning provided in this application addresses the problems of low selection accuracy and low data efficiency caused by treating the model as a black box in existing data selection methods. It achieves the goal of efficiently improving the model's performance on a specific task using a small amount of high-value data, effectively saving computational resources (such as hardware resources used for computation) and improving the model's training efficiency.
Attached Figure Description
[0019] The above-mentioned contents, other objects, features and advantages of this application will become clearer from the following description of embodiments with reference to the accompanying drawings, in which:
[0020] Figure 1 shows an application scenario diagram of the training data selection method for supervised model fine-tuning according to an embodiment of this application.
[0021] Figure 2 shows a flowchart of a training data selection method for supervised model fine-tuning according to an embodiment of this application.
[0022] Figure 3 shows a framework diagram of an interpretability-driven data filtering method according to an embodiment of this application.
[0023] Figure 4 shows a structural diagram of a training data selection apparatus for supervised model fine-tuning according to an embodiment of this application.
[0024] Figure 5 shows a block diagram of an electronic device suitable for implementing a training data selection method for supervised model fine-tuning according to an embodiment of this application.
Detailed Implementation
[0025] The embodiments of this application will now be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of this application. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of this application for ease of explanation. However, it will be apparent that one or more embodiments may be implemented without these specific details. Furthermore, descriptions of well-known structures and technologies are omitted in the following description to avoid unnecessarily obscuring the concepts of this application.
[0026] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of this application. The terms “comprising,” “including,” etc., as used herein indicate the presence of the stated features, steps, operations, and / or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.
[0027] All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein are to be interpreted in a manner consistent with the context of this specification, and not in an idealized or overly rigid way.
[0028] When using expressions such as "at least one of A, B and C", they should generally be interpreted in accordance with the meaning that is commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" should include, but is not limited to, a system having A alone, a system having B alone, a system having C alone, a system having A and B, a system having A and C, a system having B and C, and / or a system having A, B and C, etc.).
[0029] With the widespread application of large language models across various fields, how to efficiently select post-training data to improve model performance on specific tasks has become a research hotspot. In existing research on mechanistic interpretability, researchers reverse engineer large language models to reveal the causal mechanisms behind their behavior. Recent work has shown that fine-grained semantic information is encoded in model representations, from neurons corresponding to specific concepts, to steering vectors controlling high-level attributes such as factuality and safety, to sparse features related to downstream tasks. Sparse Autoencoders (SAEs) have emerged as a key tool for addressing the superposition problem that is pervasive in neural networks, in which a single neuron often encodes multiple different semantic concepts (polysemanticity), making interpretation difficult. By decomposing dense representations into sparse, semantically explicit features, sparse autoencoders significantly improve model transparency.
[0030] However, most current research on mechanistic interpretability remains at the level of "explaining" model behavior, or is applied only to interventions during inference, such as controlling model output by applying offset vectors or fine-tuning reward models for safety alignment. Such inference-time intervention often suffers from high latency and instability in practical applications, and there is currently no general process for transforming these deep insights into the model's internal workings into proactive guidance signals that directly optimize model construction during the training phase.
[0031] Meanwhile, in data screening techniques, the existing paradigm has shifted from prioritizing data quantity to emphasizing data quality. Current mainstream automated screening methods typically rely on external models or self-evaluation metrics to score data quality, or employ methods such as the law of continued entropy to balance data quality and diversity, thereby enhancing model robustness. However, these methods have significant limitations: on the one hand, they often treat the model to be trained as a black box, relying primarily on external signals to judge the utility of the data, ignoring the direct feedback from the model's internal state; on the other hand, recent research indicates that as the data scales, these externally metric-based screening methods often struggle to outperform simple random selection, casting doubt on their effectiveness.
[0032] In summary, while existing technologies have revealed the internal characteristics of models through mechanistic interpretability research, such as features identified by sparse autoencoders, they have failed to effectively apply these characteristics to the crucial data selection process. Furthermore, existing data selection methods suffer from inefficiency in large-scale scenarios due to a lack of awareness of the model's internal state. Therefore, there is an urgent need for a technical solution that can break down these barriers and directly guide data selection using interpretability signals from within the model.
[0033] Figure 1 shows an application scenario diagram of the training data selection method for supervised model fine-tuning according to an embodiment of this application.
[0034] As shown in Figure 1, the application scenario 100 according to this embodiment may include a model training and optimization scenario. Network 104 serves as a medium for providing communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. Network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
[0035] Users can use the first terminal device 101, the second terminal device 102, and the third terminal device 103 to interact with the server 105 via the network 104 to receive or send messages, etc. Various communication client applications can be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social media platform software, etc. (for example only).
[0036] The first terminal device 101, the second terminal device 102, and the third terminal device 103 can be various electronic devices with displays and support web browsing, including but not limited to smartphones, tablets, laptops, and desktop computers.
[0037] Server 105 can be a server that provides various services, such as a backend management server that supports websites browsed by users using the first terminal device 101, the second terminal device 102, and the third terminal device 103 (this is just an example). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.
[0038] It should be noted that the training data selection method for supervised model fine-tuning provided in this application embodiment can generally be executed by server 105. Correspondingly, the training data selection device for supervised model fine-tuning provided in this application embodiment can generally be located in server 105. The training data selection method for supervised model fine-tuning provided in this application embodiment can also be executed by a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and / or server 105. Correspondingly, the training data selection device for supervised model fine-tuning provided in this application embodiment can also be located in a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and / or server 105.
[0039] It should be understood that the number of terminal devices, networks, and servers shown in Figure 1 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0040] Based on the scenario described in Figure 1, the training data selection method for supervised model fine-tuning according to the disclosed embodiments is described in detail below with reference to Figures 2-3.
[0041] Figure 2 shows a flowchart of a training data selection method for supervised model fine-tuning according to an embodiment of this application.
[0042] As shown in Figure 2, the training data selection method for supervised model fine-tuning in this embodiment includes operations S210 to S250.
[0043] In operation S210, hidden state representations are extracted from the processing results of the text synthesis sequence in the target network layer of the source model, and the sparse computing accelerator is called to perform sparse projection processing on the hidden state representations to obtain multidimensional sparse activation features. The text synthesis sequence includes the text description and execution results of the target task.
[0044] The source model (or model, original model) and target model involved in this application are, for example, large models, namely deep learning models with billions of parameters or more that have been pre-trained on large-scale data. Such models include large language models, large visual models, large multimodal models, and large basic science models.
[0045] Large Language Models (LLMs) are large-scale neural network models based on deep learning, typically employing the Transformer architecture, used to process large amounts of language data and generate high-quality text; while neural network models are machine learning techniques that simulate the neural networks of the human brain to achieve artificial intelligence-like capabilities.
[0046] A large visual model is a deep learning model with a massive number of parameters for processing visual information. It can process massive amounts of image data and extract image features from that data, thereby achieving deep learning on image data.
[0047] A large multimodal model is an artificial intelligence model that can simultaneously process and understand multiple types of data (such as text, images, audio, and video).
[0048] A large basic science model is an artificial intelligence model specifically designed for scientific tasks. It can perform deep learning and knowledge reasoning on scientific data and intelligently schedule scientific research tools, thereby promoting scientific research.
[0049] The aforementioned sparse computing accelerator is a hardware acceleration device that deploys trained sparse autoencoders (SAEs).
[0050] In operation S220, a feature filtering operation based on the activation frequency threshold is performed on the features in each dimension of the multidimensional sparse activation features to obtain a candidate feature set, which is stored in a memory with a multi-level caching mechanism.
[0051] In operation S230, an operation based on adaptive decoding weights is performed on the activation scalar to obtain a feature influence vector. The feature influence vector is then used to enhance the residual weights of the source model to obtain the enhanced source model. The activation scalar is obtained by performing an operation based on adaptive encoding weights on the results of the target network layer processing the verification data.
[0052] The parameters of both the source model and the enhanced source model are stored in a cache to improve the efficiency of training data selection.
[0053] In operation S240, the original output of the source model performing the target task and the enhanced output of the enhanced source model performing the target task are used to obtain significant positive information for verifying the causal relationship between the input and output of the source model. The significant positive information is then used to perform a screening operation on the candidate feature set to obtain the target feature set.
[0054] The original results, enhanced results, and target feature set are all stored in a cache memory.
[0055] In operation S250, the original training dataset is processed by filtering based on the feature activation magnitude using the target feature set to obtain a training dataset for supervised fine-tuning of the target model.
[0056] According to embodiments of this application, the target task includes at least one of a translation task, a summary generation task, and a mathematical reasoning task; wherein the source model includes a translation source model, a summary generation source model, and a mathematical reasoning source model; wherein the target model includes a translation target model, a summary generation target model, and a mathematical reasoning target model; wherein the text synthesis sequence includes at least one of a translation text sequence, a summary generation text sequence, and a mathematical reasoning text sequence; wherein the verification data includes at least one of a text sample for the translation task, a text sample for the summary generation task, and a text sample for the mathematical reasoning task.
[0057] The training data selection method for supervised model fine-tuning provided in this application extracts sparse activation features that truly drive the task from the model's hidden states, more closely resembling the model's actual learning mechanism. This upgrades the model's execution process for the target task from "data-driven" to "feature-driven." By processing the hidden state representations generated during the model's execution of the target task, the training process is transformed from a black box into an observable process. This solves the technical problem that existing mechanisms cannot effectively translate interpretability results into model optimization actions, proactively enhancing the model's learning ability and effectively avoiding the idleness and waste of interpretable information in the model optimization process. Furthermore, the training data selection method for supervised model fine-tuning provided in this application addresses the problems of low selection accuracy and low data efficiency caused by treating the model as a black box in existing data selection methods. It achieves the goal of efficiently improving the model's performance on a specific task using a small amount of high-value data, effectively saving computational resources (such as hardware resources used for computation) and improving the model's training efficiency.
[0058] The training data selection method for supervised model fine-tuning provided in this application is described in further detail below through specific embodiments and with reference to Figure 3.
[0059] Figure 3 shows a framework diagram of an interpretability-driven data selection method according to an embodiment of this application.
[0060] The Interpretability-Guided Data Selection (IGDS) method provided in this application obtains training data for supervised model fine-tuning through multi-stage processing of the hidden state representations. As shown in Figure 3, it includes a preparation stage, a task feature identification stage, and a training data selection stage.
[0061] In the preparation stage, multidimensional sparse activation features are extracted using trained SAEs deployed on hardware acceleration devices. First, a text synthesis sequence (or data samples related to the target task) is input into the source model, and the hidden state representation output by a specified target network layer of the source model (such as layer 12 shown in Figure 3(a)) is extracted. For example, as shown in Figure 3, the text synthesis sequence related to the target task is input into a source model consisting of N Transformer blocks (N is a positive integer), each using MHA (Multi-Head Attention) and an FFN (Feed-Forward Network), to obtain hidden state representations. The trained SAEs in the sparse computing accelerator then encode/decode the hidden states, mapping the dense hidden states to multidimensional sparse activation features. This achieves decoupling and interpretability of the model's internal semantics. The output of this step is a set of position-feature level sparse activation values (for example, Figure 3(a) shows position 2291 with activation value 32.6).
[0062] The task feature recognition stage includes a high-frequency candidate feature recall sub-stage (as shown in Figure 3(b)) and an intervention filtering sub-stage for causal verification (as shown in Figure 3(c)). In the high-frequency candidate feature recall sub-stage, activation extraction is performed on validation samples (i.e., validation data) related to the target task to obtain the cross-sample activation frequency of each feature. A frequency threshold filter then selects the candidate feature set with stable, high-frequency activation in the target task, so that only features with statistical significance for the task are retained, while noise and occasional activations are filtered out. In the intervention filtering sub-stage for causal verification, an intervention experiment is performed on the candidate features: the activation amplitude of the target feature is amplified, and the original model output is compared with the enhanced output. The performance gain is calculated to quantify the causal contribution of the feature to task performance. Features are sorted in descending order of performance gain, and the Top-K features are selected as the final task feature set, retaining only core features with positive causal influence and excluding spuriously correlated features.
[0063] In the training data selection stage (as shown in Figure 3(d)), the task feature set and the candidate training data (i.e., the candidate dataset) are taken as input, and the SAEs are connected to the model (such as the LLM (Large Language Model) shown in Figure 3) to calculate the feature resonance score of each data point. The data are sorted by resonance score, and the feature-resonant data that maximally activate the key features of the task are selected (Figure 3(d) shows the data scores and filtering). The final refined dataset (i.e., the feature resonance dataset) retains only the samples that contribute most to the activation of the task features, and is then used for supervised fine-tuning (SFT) of the target model.
[0064] As can be seen from the above specific embodiments and Figure 3, the training data selection method provided in this application solves two technical problems in the existing large language model optimization process: the disconnect between mechanistic interpretability research and model training optimization, and the low training efficiency and poor targeting caused by the lack of internal model feedback mechanisms in traditional data selection methods. Specifically, this is reflected in the following two aspects:
[0065] First, it addresses the technical challenge of effectively translating existing mechanistic interpretability findings into model optimization actions. Current mechanistic interpretability research, such as feature extraction using sparse autoencoders, while revealing the existence of decoupled, human-understandable components within the model (such as neurons or manipulation vectors for specific concepts), primarily limits these findings to post-hoc analysis and interpretation or intervention control during reasoning. Existing technologies lack a universal conversion mechanism to map these deep insights into the model's internal workings into specific operations guiding model training. This results in the inability to proactively enhance model capabilities during training, despite identifying key internal features controlling model behavior, leading to the idleness and waste of interpretability information in the model optimization process.
[0066] Secondly, this approach addresses the problems of low accuracy and inefficiency in existing data screening methods, which treat the model as a black box. Traditional data screening strategies primarily rely on external scoring models or superficial quality and diversity metrics, neglecting the internal state and processing mechanisms of the model being trained. Due to a lack of awareness of the model's internal causal mechanisms, existing methods struggle to identify "feature-resonant data" that truly activates and enhances the model's key capabilities for solving specific tasks. Blindly selecting data often necessitates extensive fine-tuning with massive amounts of data to achieve the desired results, or leads to significant performance degradation as the data size decreases. This fails to achieve the goal of efficiently improving model performance on specific tasks using a small amount of high-value data, resulting in wasted computational resources and low training efficiency.
[0067] According to an embodiment of this application, the above-mentioned invoking a sparse computing accelerator to perform sparse projection processing on the hidden state representation to obtain multidimensional sparse activation features includes: deploying a trained sparse autoencoder into a hardware acceleration device to obtain a sparse computing accelerator, wherein the trained sparse autoencoder is used to spatially map the hidden features generated by the source model during the execution of the target task, and the hardware acceleration device includes an image processing device, a tensor processing device, a neural network processing device, or an application-specific integrated circuit; invoking the sparse computing accelerator to obtain the adaptive encoding weights in the process of the target network layer processing the text synthesis sequence, and performing linear activation processing on the hidden state representation and the adaptive encoding weights to achieve sparse projection processing on the hidden state representation to obtain multidimensional sparse activation features.
[0068] The process of obtaining the multidimensional sparse activation features provided in this application is described in detail below through specific embodiments.
[0069] This application leverages the ability of sparse autoencoders (SAEs) to decompose the dense hidden states of a model (e.g., the source model) into sparse, high-dimensional, and interpretable feature sets.
[0070] For a given input sequence $x$ (i.e., a text synthesis sequence containing a text description of the target task and an execution result (standard execution result)), this application first extracts the hidden state representation $h \in \mathbb{R}^{d_{\text{model}}}$ from a specific layer $l$ through forward propagation, where $\mathbb{R}$ represents the space of real numbers and $d_{\text{model}}$ represents the hidden layer dimension of the model. Subsequently, the sparse autoencoder uses the encoder weight matrix $W_{\text{enc}} \in \mathbb{R}^{d_{\text{sae}} \times d_{\text{model}}}$ to project the dense representation into a sparse feature activation vector $a(h)$ (i.e., the multidimensional sparse activation features), as shown in formula (1), where $d_{\text{sae}}$ represents the hidden layer dimension of the SAEs, $b_{\text{enc}}$ is the bias parameter of the sparse autoencoder, and $\sigma(\cdot)$ denotes the activation processing applied by the encoder:
[0071] $a(h) = \sigma\left(W_{\text{enc}}\, h + b_{\text{enc}}\right) \in \mathbb{R}^{d_{\text{sae}}}$ (1).
[0072] where $\mathbb{R}^{d_{\text{sae}}}$ denotes the real vector space of the SAEs and $\mathbb{R}^{d_{\text{model}}}$ denotes the real vector space of the model. Each dimension $i$ of this vector corresponds to a feature $f_i$ (i.e., a multidimensional sparse activation feature corresponding to the hidden state representation of the target network layer of the source model). The feature space comprises activation features in multiple dimensions, with the value $a_i(h)$ reflecting the activation strength of each feature. The goal of this application is to explore this vast feature space and precisely locate the subset of features that have a causal impact on the target task.
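The encoding step around formula (1) can be illustrated with a minimal sketch. The function name `sae_encode`, the toy weights, and the use of ReLU as the encoder activation are assumptions made for illustration only; the application does not specify the activation function or any concrete parameter values.

```python
from typing import List


def sae_encode(h: List[float],
               w_enc: List[List[float]],
               b_enc: List[float]) -> List[float]:
    """Project a dense hidden state h (length d_model) onto d_sae features.

    w_enc has one row per SAE feature. ReLU is an ASSUMED activation;
    the source text only speaks of "activation processing".
    """
    activations = []
    for row, bias in zip(w_enc, b_enc):
        pre_activation = sum(w * x for w, x in zip(row, h)) + bias
        activations.append(max(0.0, pre_activation))  # assumed ReLU
    return activations


# Hypothetical toy values: d_model = 2, d_sae = 3.
h = [1.0, -2.0]
w_enc = [[1.0, 0.0],
         [0.0, 1.0],
         [0.5, 0.5]]
b_enc = [0.0, 0.0, 1.0]

features = sae_encode(h, w_enc, b_enc)
print(features)  # [1.0, 0.0, 0.5]
```

In this toy case the second feature is inactive (its pre-activation is negative), which is the sparsity behaviour the paragraph above relies on: each non-zero entry gives the activation strength of one feature.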
[0073] According to embodiments of this application, the above-described feature filtering operation based on activation frequency threshold for each dimension of the multidimensional sparse activation features to obtain a candidate feature set includes: obtaining the marker positions of keyword elements in the target task corpus to which the text synthesis sequence belongs, wherein the target task corpus is stored in a first cache memory, and the keyword elements are used to connect the text description and execution result of the target task; calculating the feature activation frequency of each dimension of the multidimensional sparse activation features in the target task corpus using the marker positions, wherein the feature activation frequency represents the proportion of the number of activated samples in the target task corpus to the total number of samples in the target task corpus; performing a comparison operation between the feature activation frequency and the activation frequency threshold to obtain a comparison result; and if the comparison result is greater than or equal to the activation frequency threshold, using the feature corresponding to the comparison result as a candidate feature to obtain a candidate feature set.
[0074] The process of obtaining the candidate feature set in the high-frequency candidate feature recall sub-stage of this application is described in detail below through specific embodiments.
[0075] To identify multidimensional sparse activation features, this application employs a coarse-to-fine filtering strategy, which refines the massive feature space through two consecutive steps: high-frequency candidate feature recall and intervention filtering for causal verification.
[0076] In the high-frequency candidate feature recall sub-stage, key features for a specific task should remain stably active during task execution. Based on this principle, this application first identifies candidate features that are highly relevant to the target task.
[0077] Using a small corpus $D_{\text{task}} = \{x_1, \dots, x_N\}$ containing $N$ ($N$ is a positive integer) task-related prior samples (i.e., the target task corpus to which the text synthesis sequence belongs), this application monitors feature activation at the key token position (i.e., the marker position of the keyword element, e.g., the position of the last token at the end of the prompt). Let $a_i(x_j)$ denote the activation value of feature $f_i$ for sample $x_j$. If the activation frequency of a feature (defined as the proportion of samples in $D_{\text{task}}$ on which the feature is activated) reaches a preset threshold $\tau$ (e.g., 80%), the feature is selected as a candidate feature. The candidate feature set $F_{\text{cand}}$ is formally defined as shown in formula (2):
[0078] $F_{\text{cand}} = \left\{ f_i \;\middle|\; \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}\left[a_i(x_j) > 0\right] \ge \tau \right\}$ (2).
[0079] where $\mathbb{1}[\cdot]$ is an indicator function that returns 1 if the condition is true, and 0 otherwise. This high-frequency candidate feature recall operation effectively filters out the vast majority of irrelevant features, thus providing a manageable set of candidate features for subsequent rigorous validation.
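The recall step around formula (2) can be sketched as follows. This is an illustrative example under stated assumptions: the function name `recall_candidates`, the toy activation matrix, and the convention that a feature counts as "activated" when its value is greater than zero are all hypothetical choices, not details confirmed by the application.

```python
from typing import List, Set


def recall_candidates(activations: List[List[float]],
                      threshold: float) -> Set[int]:
    """Return indices of features whose activation frequency meets the threshold.

    activations[j][i] is the activation of feature i on sample j, read at
    the key token position. "Activated" is ASSUMED to mean value > 0.
    """
    n_samples = len(activations)
    n_features = len(activations[0])
    candidates = set()
    for i in range(n_features):
        active_count = sum(1 for sample in activations if sample[i] > 0.0)
        if active_count / n_samples >= threshold:
            candidates.add(i)
    return candidates


# Hypothetical corpus of 5 samples and 3 SAE features.
acts = [
    [0.9, 0.0, 2.1],
    [1.2, 0.0, 0.0],
    [0.4, 3.0, 1.5],
    [0.7, 0.0, 0.8],
    [1.1, 0.0, 0.3],
]

cands = recall_candidates(acts, threshold=0.8)
print(sorted(cands))  # [0, 2]
```

Feature 0 fires on all five samples and feature 2 on four of five (80%), so both pass the 80% threshold used as an example in the text, while feature 1 fires only once and is discarded as an occasional activation.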
[0080] According to an embodiment of this application, the above-mentioned operation of the activation scalar based on adaptive decoding weights to obtain the feature influence vector includes: obtaining the hidden state representation of the verification data generated during the processing of the verification data stored in the first cache memory by the target network layer; calling a sparse computing accelerator with a trained sparse autoencoder to perform an operation based on adaptive encoding weights on the hidden state representation of the verification data to obtain the activation scalar; during the processing of the verification data by the target network layer, calling the sparse computing accelerator to obtain the adaptive decoding weights of the current verification data processing stage, and performing an operation between the adaptive decoding weights and the activation scalar to obtain the feature influence vector.
[0081] The process of obtaining the feature influence vector will be explained in more detail below through specific implementation methods.
[0082] The feature influence vector is obtained during the intervention filtering sub-stage of causal verification. Features that are continuously activated in the task data may only reflect general language patterns rather than task-specific mechanisms. To filter out features that truly have causal power, this application uses a small validation set (or validation dataset) $D_{\text{val}}$ containing $M$ ($M$ is a positive integer) samples to implement targeted interventions and perform a rigorous feature filtering step. For each candidate feature $f_i$ in the candidate feature set, this application quantifies the causal contribution of the feature to the task by measuring the gain that activating the feature brings to model performance. Specifically, the feature vector corresponding to the target feature is added to the model's residual stream, and the resulting change in task-specific performance metrics is evaluated quantitatively. That is, for each validation sample (i.e., validation data) $x \in D_{\text{val}}$ with hidden state $h$ at the target network layer, this application calculates the feature influence vector $v_i$ by multiplying the feature's current activation value by the corresponding adaptive decoder weight, as shown in formula (3):
[0083] $v_i = a_i(h) \cdot W_{\text{dec},i}$ (3).
[0084] where $a_i(h)$ represents the scalar activation value of feature $f_i$ on hidden state $h$, and $W_{\text{dec},i}$ represents the decoder weight vector of this feature.
[0085] According to an embodiment of this application, the above-mentioned method of using feature influence vectors to enhance the residual weights of the source model to obtain an enhanced source model includes: extracting the verification hidden state representation from the processing result of the target network layer on the verification data stored in the first cache memory, and performing a calculation on the verification hidden state representation and the feature influence vector to obtain the enhanced verification hidden state representation; using the enhanced verification hidden state representation to enhance the weights of the residual layer of the source model to obtain the enhanced source model, and storing the parameters of the enhanced source model in the second cache memory.
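[0086] This application generates two outputs: one from the original model (or source model), denoted $y^{\text{orig}}$, and the other, denoted $y^{\text{enh}}$, from the corresponding "enhanced" model (i.e., the enhanced model or the enhanced source model), in which the feature influence vector $v_i$ is added directly to the residual stream. With $h$ denoting the hidden state of the target network layer and $\tilde{h}$ the enhanced hidden state, this is shown in formula (4):
[0087] $\tilde{h} = h + v_i$ (4).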
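The intervention described around formulas (3) and (4) can be sketched in a few lines. The helper names `influence_vector` and `enhance_hidden_state` and all numeric values are hypothetical; the sketch assumes the influence vector is the scalar activation times the feature's decoder weight vector, and that it is added to the hidden state without any extra scaling factor, as the surrounding text suggests.

```python
from typing import List


def influence_vector(activation: float, w_dec_row: List[float]) -> List[float]:
    """Scale the decoder weight vector of one feature by its scalar activation."""
    return [activation * w for w in w_dec_row]


def enhance_hidden_state(h: List[float], v: List[float]) -> List[float]:
    """Add the influence vector to the residual-stream hidden state."""
    if len(h) != len(v):
        raise ValueError("hidden state and influence vector must have equal length")
    return [x + delta for x, delta in zip(h, v)]


# Hypothetical toy values with d_model = 3.
h = [0.5, -1.0, 2.0]        # hidden state at the target layer
w_dec_i = [0.0, 1.0, -0.5]  # decoder weight vector of candidate feature i
a_i = 2.0                   # activation of feature i on this hidden state

v = influence_vector(a_i, w_dec_i)
h_enh = enhance_hidden_state(h, v)
print(v)      # [0.0, 2.0, -1.0]
print(h_enh)  # [0.5, 1.0, 1.0]
```

In a real model the enhanced hidden state would replace the layer output during the forward pass (for example through a forward hook), so that the original and enhanced outputs can be generated and compared; that wiring depends on the framework and is left out here.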
[0088] According to an embodiment of this application, the above-mentioned method of obtaining significant positive information for verifying the causal relationship between the input and output of the source model by using the original result of the source model executing the target task and the enhanced result of the enhanced source model executing the target task includes: scoring the original result and the enhanced result stored in the third cache memory using a preset evaluation index to obtain the score of the original result and the score of the enhanced result; and calculating the score of the original result and the score of the enhanced result to obtain significant positive information.
[0089] According to an embodiment of this application, the above-mentioned filtering operation on the candidate feature set using significant positive information to obtain the target feature set includes: sorting the candidate feature set in descending order using significant positive information to obtain a descending-ordered candidate feature set, and selecting multiple candidate features ranked first from the descending-ordered candidate feature set to construct the target feature set.
[0090] The process of constructing the target feature set in this application will be further explained in detail below through specific implementation methods.
[0091] Let $S(y^{\text{orig}}_j)$ and $S(y^{\text{enh}}_j)$ denote the task performance scores of the original output (or original result, original output result) and the enhanced output (or enhanced result, enhanced output result) for the $j$-th validation sample, respectively, where the enhanced output is obtained by amplifying feature $f_i$. The causal contribution $\Delta_i$ of this feature is defined as the average performance improvement obtained on the validation set, as shown in formula (5):
[0092] $\Delta_i = \frac{1}{M} \sum_{j=1}^{M} \left[ S\left(y^{\text{enh}}_j\right) - S\left(y^{\text{orig}}_j\right) \right]$ (5).
[0093] where $M$ is the number of samples in the validation set. A significantly positive $\Delta_i$ (i.e., significant positive information) indicates that amplifying this feature consistently improves the model's task capability, thus verifying that this feature is a positive causal driving factor. Finally, this application sorts all candidate features by $\Delta_i$ and selects the top K features (K is a positive integer) to form the final verified feature set (the task feature set) $F_{\text{task}}$. This intervention-based screening mechanism ensures that features with a clear benefit to the target task are retained.
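The causal-contribution calculation and Top-K selection around formula (5) can be sketched as follows. The function names, feature indices, and scores are hypothetical. Dropping features whose average gain is not positive before ranking is an assumption drawn from the statement that only features with positive causal influence are retained; the application does not state how "significantly positive" is tested.

```python
from typing import Dict, List


def causal_gain(orig_scores: List[float], enh_scores: List[float]) -> float:
    """Average per-sample score improvement of the enhanced output."""
    if len(orig_scores) != len(enh_scores):
        raise ValueError("score lists must cover the same validation samples")
    diffs = [enh - orig for orig, enh in zip(orig_scores, enh_scores)]
    return sum(diffs) / len(diffs)


def select_top_k(gains: Dict[int, float], k: int) -> List[int]:
    """Rank features by gain (descending) and keep the top k positive ones."""
    positive = [(feat, gain) for feat, gain in gains.items() if gain > 0.0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return [feat for feat, _ in positive[:k]]


# Hypothetical scores on a validation set of M = 2 samples.
orig = [0.5, 0.25]
enhanced_by_feature = {
    7: [0.75, 0.75],
    3: [0.5, 0.0],
    9: [0.75, 0.25],
}

gains = {feat: causal_gain(orig, enh) for feat, enh in enhanced_by_feature.items()}
task_features = select_top_k(gains, k=2)
print(gains)          # {7: 0.375, 3: -0.125, 9: 0.125}
print(task_features)  # [7, 9]
```

Feature 3 lowers the average score when amplified, so it is excluded even though it passed the frequency filter, which is the distinction between correlated and causally useful features that this sub-stage is meant to draw.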
[0094] According to an embodiment of this application, the above-mentioned data filtering process based on feature activation magnitude of the original training dataset using the target feature set to obtain the target training dataset includes: concatenating each input sample and its corresponding label value in the original training dataset based on the label position to obtain a concatenated training dataset; calculating the feature activation magnitude value of each concatenated training data at the label position using the target feature set to obtain the feature resonance score of each concatenated training data; sorting and filtering the concatenated training dataset according to the feature resonance score of each concatenated training data to obtain an initial target training dataset; and decoupling the target input sample and its corresponding label value at the label position of each initial target training data in the initial target training dataset to obtain a target training dataset for supervised fine-tuning of the model.
[0095] The process of obtaining the target training dataset in the above embodiments of this application will be further described in detail below through specific implementation methods.
[0096] After the causally verified task-related feature set $F_{\text{task}}$ has been determined, this subset is used to assess the value of each candidate data point $x$ in a large data pool $D_{\text{pool}}$. The core assumption of this application is that the most valuable data for fine-tuning is that which maximally activates the task-related causal mechanisms within the model.
[0097] To achieve this goal, this application introduces the Feature-Resonant Score (FRS, i.e., feature activation amplitude), denoted as $\mathrm{FRS}(x)$. The score is calculated by aggregating the activation amplitudes of all task features at the same key token marker position $t$, which is consistent with the position used in the task feature recognition stage shown in Figure 3. The score is defined as shown in formula (6):
[0098] $\mathrm{FRS}(x) = \sum_{f_i \in F_{\text{task}}} a_i(x, t)$ (6).
[0099] where $a_i(x, t)$ represents the activation value of task feature $f_i$ at position $t$ for input $x$. This score prioritizes data that strongly resonate with the specific task features, i.e., data that maximize the activation of the subset of features supporting the required capabilities. Finally, all data points are sorted by their feature resonance scores, and a subset is selected according to a preset ratio. This process ultimately generates the training dataset $D_{\text{train}}$, that is, a high-performance subset of the original corpus that contains highly targeted, task-related signals.
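The scoring and ratio-based selection around formula (6) can be sketched as follows. The function names, sample identifiers, and activation values are hypothetical. Summation is assumed as the aggregation over task features, and keeping at least one sample when the ratio rounds down to zero is an illustrative design choice rather than something the application specifies.

```python
from typing import List, Sequence


def feature_resonant_score(activations: Sequence[float],
                           task_features: Sequence[int]) -> float:
    """Sum the activations of the task features at the key token position.

    Summation is an ASSUMED form of the "aggregation" named in the text.
    """
    return sum(activations[i] for i in task_features)


def select_by_ratio(samples: List[str],
                    scores: List[float],
                    ratio: float) -> List[str]:
    """Keep the highest-scoring fraction of samples (at least one sample)."""
    keep = max(1, int(len(samples) * ratio))
    order = sorted(range(len(samples)), key=lambda j: scores[j], reverse=True)
    return [samples[j] for j in order[:keep]]


# Hypothetical pool of 4 samples, each with activations over 3 SAE features.
task_features = [0, 2]
pool = ["a", "b", "c", "d"]
pool_activations = [
    [1.0, 5.0, 0.5],
    [0.0, 9.0, 0.25],
    [2.0, 0.0, 2.0],
    [0.5, 0.0, 0.5],
]

scores = [feature_resonant_score(acts, task_features) for acts in pool_activations]
selected = select_by_ratio(pool, scores, ratio=0.5)
print(scores)    # [1.5, 0.25, 4.0, 1.0]
print(selected)  # ['c', 'a']
```

Sample "b" has the largest single activation in the pool, but it falls on a feature outside the task feature set, so it scores lowest. This shows why the score depends on the causally verified features rather than on overall activation magnitude.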
[0100] This application sets up multiple experiments to comprehensively verify the effectiveness and robustness of the provided method, and conducts extensive experimental verification and in-depth mechanistic analysis based on the experimental results. The experimental environment covers diverse downstream tasks and models with different architectures, and the provided method is rigorously compared with competitive mainstream data selection baselines. This application selects three core tasks (translation, mathematical reasoning, and summary generation) to verify the effectiveness of the technical solution, using COMET (Crosslingual Optimized Metric for Evaluation of Translation, a neural network-based translation evaluation metric), accuracy (pass@8: pass rate in 8 samples), and ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation, a recall-oriented summary evaluation metric based on unigram overlap) as the respective quantitative evaluation metrics. Experiments are conducted on Large Language Model 1, Large Language Model 2, and Large Language Model 3 and their corresponding sparse autoencoders (SAEs). To comprehensively evaluate the proposed IGDS framework, this application establishes a rigorous benchmark that includes existing mainstream algorithms and standard control groups. In terms of comparison strategies, three representative categories of methods were selected: first, data quality-based methods, specifically the Instruction-Following Difficulty (IFD) strategy and the Loss strategy, which prioritizes low cross-entropy samples; second, the ZIP (compression) strategy based on data diversity, which focuses on maximizing semantic coverage; and third, the Random selection strategy, which serves as a robust baseline for verifying the necessity of intelligent screening. In addition, two standard control groups were set up: the original performance group, which evaluates the zero-shot capability of the base model, and the group fine-tuned on the full dataset, which serves as a reference for measuring data efficiency and performance improvement.
[0101] Regarding task feature recognition: This application first verifies the effectiveness of the features, and the results are shown in Table 1. For each model and task, this application reports the proportion of high-frequency candidate features among all SAE features (Recall column, where 100‱ = 1%), the specific feature with the highest positive impact (Feature column), and its corresponding performance improvement value (Δ column).
[0102] The feature recognition process first applies a frequency filter to a vast search space containing millions of potential SAE features. This initial step significantly reduces the size of the search space. As shown in the Recall column, high-frequency candidate features constitute only a tiny fraction of the total, typically just a few basis points (‱). From this streamlined feature pool, the causal verification step of this application consistently identifies task features that, when amplified, positively impact performance on the validation set. The degree of this impact (see the amplification column, i.e., the Δ column) is often quite significant.
[0103] In specific cases, this application observed particularly outstanding effects. For example, as shown in Table 1, Large Language Model 1 achieved a performance improvement of 12 points on the mathematical task, while Large Language Model 3 achieved a significant gain of +8.34 points on the translation task. In Table 1, an entry in the Feature column such as L14_p11575 denotes the feature at position 11575 in layer 14 of the target network; other entries in the Feature column are read in the same way. Table 1 reports the single-sample accuracy increase, ΔACC (Accuracy); the summary generation metric increase, ΔROUGE-1; and the translation metric increase, ΔCOMET. The consistent discovery of such high-impact features across different models and tasks strongly demonstrates that the framework of this application can reliably identify key features that are not merely correlated with, but causally related to, the model's task-solving ability.
[0104] Table 1: Task Feature Recognition Results
[0105]
[0106] Regarding the performance of the IGDS framework: Based on the identified features, this application selects the top 50% of the candidate data pool for fine-tuning the base model. Table 2 provides a comprehensive comparison with all baseline models and control groups (i.e., the original group before fine-tuning, the full fine-tuning group, the random sampling fine-tuning group, the cross-entropy sampling fine-tuning group, the difficulty-following sampling fine-tuning group, and the compressed sampling fine-tuning group shown in Table 2). The best results of all fine-tuning methods in each model-task combination are marked in italics, and the percentage in parentheses indicates the gap relative to the full SFT performance. The results show that in all tested models and tasks, IGDS (i.e., the method of this application described in Table 2) consistently and significantly outperforms all other data selection methods. Furthermore, in various scenarios, the method of this application even surpasses the model performance of the full data fine-tuned version. As shown in parentheses in Table 2, IGDS even achieves an astonishing +17.4% relative gain on the Large Language Model 1 mathematical task. This phenomenon strongly validates the core hypothesis of this application: data selection through the model's own causal verification mechanism is an effective strategy for achieving accurate model optimization.
[0107] Table 2 Performance Comparison of the Three Tasks
[0108]
[0109] Based on the aforementioned training data selection method for supervised model fine-tuning, this application also provides a training data selection apparatus for supervised model fine-tuning. The apparatus is described in detail below with reference to Figure 4.
[0110] Figure 4 shows a structural diagram of a training data selection apparatus for supervised model fine-tuning according to an embodiment of this application.
[0111] As shown in Figure 4, the training data screening device 400 for supervised model fine-tuning includes a multidimensional sparse activation feature acquisition module 410, a candidate feature set acquisition module 420, a source model enhancement module 430, a target feature set acquisition module 440, and a training dataset screening module 450.
[0112] The multidimensional sparse activation feature acquisition module 410 is used to extract hidden state representations from the processing results of the text synthesis sequence by the target network layer of the source model, and to call a sparse computing accelerator to perform sparse projection processing on the hidden state representations to obtain multidimensional sparse activation features. The text synthesis sequence includes the text description and execution result of the target task. In one embodiment, the multidimensional sparse activation feature acquisition module 410 can be used to perform the operation S210 described above, which will not be repeated here.
[0113] The candidate feature set acquisition module 420 is used to perform a feature filtering operation based on an activation frequency threshold on the features in each dimension of the multidimensional sparse activation features to obtain a candidate feature set, wherein the candidate feature set is stored in a memory with a multi-level caching mechanism. In one embodiment, the candidate feature set acquisition module 420 can be used to perform the operation S220 described above, which will not be repeated here.
[0114] The source model enhancement module 430 is used to perform adaptive decoding weight-based operations on the activation scalar to obtain a feature influence vector, and then use the feature influence vector to enhance the source model with residual weights to obtain an enhanced source model. The activation scalar is obtained by performing adaptive encoding weight operations on the processing results of the target network layer on the validation data. In one embodiment, the source model enhancement module 430 can be used to perform the operation S230 described above, which will not be repeated here.
[0115] The target feature set acquisition module 440 is used to obtain significant positive information for verifying the causal relationship between the input and output of the source model by utilizing the original output of the source model performing the target task and the enhanced output of the enhanced source model performing the target task. It then uses this significant positive information to perform a filtering operation on the candidate feature set to obtain the target feature set. In one embodiment, the target feature set acquisition module 440 can be used to perform the operation S240 described above, which will not be repeated here.
[0116] The training dataset filtering module 450 is used to perform data filtering processing on the original training dataset based on the feature activation magnitude using the target feature set, to obtain a training dataset for supervised fine-tuning of the target model. In one embodiment, the training dataset filtering module 450 can be used to perform the operation S250 described above, which will not be repeated here.
[0117] According to embodiments of this application, any multiple modules among the multidimensional sparse activation feature acquisition module 410, candidate feature set acquisition module 420, source model enhancement module 430, target feature set acquisition module 440, and training dataset screening module 450 can be merged into one module, or any one of these modules can be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules can be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of this application, at least one of the multidimensional sparse activation feature acquisition module 410, candidate feature set acquisition module 420, source model enhancement module 430, target feature set acquisition module 440, and training dataset screening module 450 can be at least partially implemented as hardware circuits, such as field-programmable gate arrays (FPGAs), programmable logic arrays (PLAs), systems-on-a-chip, systems-on-a-substrate, systems-on-package, application-specific integrated circuits (ASICs), or any other reasonable means of integrating or packaging circuits, or implemented in software, hardware, or firmware, or in any appropriate combination of any of these three implementation methods. Alternatively, at least one of the multidimensional sparse activation feature acquisition module 410, candidate feature set acquisition module 420, source model enhancement module 430, target feature set acquisition module 440, and training dataset filtering module 450 can be at least partially implemented as a computer program module, which can perform corresponding functions when the computer program module is run.
[0118] Figure 5 shows a block diagram of an electronic device suitable for implementing a training data selection method for supervised model fine-tuning, according to an embodiment of this application.
[0119] As shown in Figure 5, an electronic device 500 according to an embodiment of this application includes a processor 501, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 502 or a program loaded from storage section 508 into random access memory (RAM) 503. The processor 501 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and / or an associated chipset and / or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), etc. The processor 501 may also include onboard memory for caching purposes. The processor 501 may include a single processing unit or multiple processing units for performing different actions of the method flow according to an embodiment of this application.
[0120] RAM 503 stores various programs and data required for the operation of electronic device 500. Processor 501, ROM 502, and RAM 503 are interconnected via bus 504. Processor 501 executes various operations of the method flow according to embodiments of this application by executing programs in ROM 502 and / or RAM 503. It should be noted that the programs may also be stored in one or more memories other than ROM 502 and RAM 503. Processor 501 may also execute various operations of the method flow according to embodiments of this application by executing programs stored in said one or more memories.
[0121] According to embodiments of this application, the electronic device 500 may further include an input / output (I / O) interface 505, which is also connected to a bus 504. The electronic device 500 may also include one or more of the following components connected to the input / output (I / O) interface 505: an input section 506 including a keyboard, mouse, etc.; an output section 507 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card, modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the input / output (I / O) interface 505 as needed. A removable medium 511, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 510 as needed so that computer programs read from it can be installed into the storage section 508 as needed.
[0122] This application also provides a computer-readable storage medium, which may be included in the device / apparatus / system described in the above embodiments; or it may exist independently and not assembled into the device / apparatus / system. The computer-readable storage medium carries one or more programs, which, when executed, implement the method according to the embodiments of this application.
[0123] According to embodiments of this application, the computer-readable storage medium can be a non-volatile computer-readable storage medium, which may include, but is not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this application, the computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. For example, according to embodiments of this application, the computer-readable storage medium may include ROM 502 and / or RAM 503 and / or one or more memories other than ROM 502 and RAM 503 described above.
[0124] Embodiments of this application also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowchart. When the computer program product is run on a computer system, the program code is used to cause the computer system to implement the methods provided in the embodiments of this application.
[0125] When the computer program is executed by the processor 501, it performs the functions defined in the system / apparatus of this application embodiment. According to the embodiments of this application, the systems, apparatuses, modules, units, etc., described above can be implemented by computer program modules.
[0126] In one embodiment, the computer program may rely on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, and may be downloaded and installed via the communication section 509, and / or installed from a removable medium 511. The program code contained in the computer program can be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination thereof.
[0127] In such an embodiment, the computer program can be downloaded and installed from a network via communication section 509, and / or installed from removable medium 511. When the computer program is executed by processor 501, it performs the functions defined in the system of this application embodiment. According to embodiments of this application, the systems, devices, apparatuses, modules, units, etc., described above can be implemented by computer program modules.
[0128] According to embodiments of this application, program code for executing the computer programs provided in the embodiments of this application can be written in any combination of one or more programming languages. Specifically, these computer programs can be implemented using high-level procedural and / or object-oriented programming languages, and / or assembly / machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code can be executed entirely on the user's computing device, partially on the user's device, partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).
[0129] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0130] Those skilled in the art will understand that the features described in the various embodiments of this application can be combined and / or integrated in various ways, even if such combinations or integrations are not explicitly described in this application. In particular, the features described in the various embodiments of this application can be combined and / or integrated in various ways without departing from the spirit and teachings of this application. All such combinations and / or integrations fall within the scope of this application.
[0131] The embodiments of this application have been described above. However, these embodiments are merely illustrative and not intended to limit the scope of this application. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. Without departing from the scope of this application, those skilled in the art can make various substitutions and modifications, all of which should fall within the scope of this application.
Claims
1. A training data filtering method for model supervised fine-tuning, characterized in that the method includes:
extracting a hidden state representation from the result of the target network layer of a source model processing a text synthesis sequence, and invoking a sparse computing accelerator to perform sparse projection processing on the hidden state representation to obtain multidimensional sparse activation features, wherein the text synthesis sequence includes the text description and execution result of a target task;
performing a feature filtering operation based on an activation frequency threshold on the features in each dimension of the multidimensional sparse activation features to obtain a candidate feature set, wherein the candidate feature set is stored in a memory with a multi-level caching mechanism;
performing an operation on an activation scalar based on adaptive decoding weights to obtain a feature influence vector, and using the feature influence vector to perform residual weight enhancement on the source model to obtain an enhanced source model, wherein the activation scalar is obtained by performing an operation, based on adaptive encoding weights, on the result of the target network layer processing verification data;
using the original output of the source model executing the target task and the enhanced output of the enhanced source model executing the target task to obtain significant positive information for verifying the causal relationship between the input and output of the source model, and using the significant positive information to perform a filtering operation on the candidate feature set to obtain a target feature set; and
using the target feature set to perform data filtering processing on an original training dataset based on feature activation magnitude, to obtain a training dataset for supervised fine-tuning of a target model.
2. The method of claim 1, wherein performing sparse projection processing on the hidden state representation using the sparse computing accelerator to obtain the multidimensional sparse activation features includes:
deploying a trained sparse autoencoder to a hardware acceleration device to obtain the sparse computing accelerator, wherein the trained sparse autoencoder is used to spatially map the hidden features generated by the source model during execution of the target task, and the hardware acceleration device includes an image processing device, a tensor processing device, a neural network processing device, or an application-specific integrated circuit; and
invoking the sparse computing accelerator to obtain the adaptive encoding weights of the target network layer during processing of the text synthesis sequence, and performing a linear activation operation on the hidden state representation and the adaptive encoding weights to achieve the sparse projection processing of the hidden state representation, thereby obtaining the multidimensional sparse activation features.
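As a rough illustration only, the sparse projection recited in claim 2 can be sketched as the encoder half of a sparse autoencoder. The claim says only that the hidden state representation and the adaptive encoding weights are "linearly activated"; the ReLU nonlinearity, the bias term, and the names `sparse_project`, `W_enc`, and `b_enc` below are assumptions made for this sketch and are not taken from the application.

```python
from typing import List


def sparse_project(hidden: List[float],
                   enc_weights: List[List[float]],
                   enc_bias: List[float]) -> List[float]:
    """Project one hidden state onto sparse features.

    Assumed form: activation_i = max(0, w_i . h + b_i), i.e. a linear map
    followed by ReLU, as in a typical sparse autoencoder encoder.
    """
    activations = []
    for row, bias in zip(enc_weights, enc_bias):
        pre_activation = sum(w * h for w, h in zip(row, hidden)) + bias
        activations.append(max(0.0, pre_activation))
    return activations


# Toy example: a 2-dimensional hidden state projected onto 3 features.
hidden_state = [1.0, -2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical encoding weights
b_enc = [0.0, 0.0, 1.0]                       # hypothetical encoder bias
features = sparse_project(hidden_state, W_enc, b_enc)
print(features)  # [1.0, 0.0, 0.5]
```

In a real deployment the projection would run on the hardware acceleration device with a tensor library; plain Python lists are used here only to keep the sketch self-contained.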
3. The method of claim 1, wherein performing the feature filtering operation based on the activation frequency threshold on the features in each dimension of the multidimensional sparse activation features to obtain the candidate feature set includes:
obtaining the marker position of a keyword element in the target task corpus to which the text synthesis sequence belongs, wherein the target task corpus is stored in a first cache memory, and the keyword element is used to connect the text description and the execution result of the target task;
calculating, using the marker position, the feature activation frequency of each dimension of the multidimensional sparse activation features in the target task corpus, wherein the feature activation frequency represents the proportion of activated samples in the target task corpus to the total number of samples in the target task corpus;
comparing the feature activation frequency with the activation frequency threshold to obtain a comparison result; and
if the comparison result indicates that the feature activation frequency is greater than or equal to the activation frequency threshold, taking the corresponding feature as a candidate feature to obtain the candidate feature set.
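The frequency-based filtering of claim 3 can be sketched as follows. This is a minimal sketch, not the applicant's implementation: it assumes that the activations at the marker position have already been collected into a samples-by-features table, and that a feature counts as "activated" on a sample when its activation is strictly positive. The function name `candidate_features` and the toy values are hypothetical.

```python
from typing import List


def candidate_features(activations: List[List[float]],
                       freq_threshold: float) -> List[int]:
    """Return indices of features whose activation frequency meets the threshold.

    activations[s][f] is the activation of feature f at the marker position
    of sample s. Frequency = activated samples / total samples.
    """
    n_samples = len(activations)
    n_features = len(activations[0])
    candidates = []
    for f in range(n_features):
        # Assumption: "activated" means a strictly positive activation.
        active = sum(1 for sample in activations if sample[f] > 0.0)
        if active / n_samples >= freq_threshold:
            candidates.append(f)
    return candidates


# Four corpus samples, three sparse features.
acts = [
    [0.9, 0.0, 0.2],
    [0.4, 0.0, 0.5],
    [0.7, 0.3, 0.0],
    [0.1, 0.0, 0.0],
]
selected = candidate_features(acts, freq_threshold=0.5)
print(selected)  # [0, 2]: frequencies are 1.0, 0.25 and 0.5
```

Feature 2 is kept because the comparison in the claim is "greater than or equal to", so a frequency exactly at the threshold qualifies.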
4. The method of claim 1, wherein performing the operation on the activation scalar based on the adaptive decoding weights to obtain the feature influence vector includes:
obtaining the hidden state representation of the verification data generated while the target network layer processes the verification data stored in the first cache memory;
invoking the sparse computing accelerator, on which the trained sparse autoencoder is deployed, to perform an operation based on the adaptive encoding weights on the hidden state representation of the verification data to obtain the activation scalar; and
invoking the sparse computing accelerator, while the target network layer processes the verification data, to obtain the adaptive decoding weights for the current verification data processing stage, and performing an operation on the adaptive decoding weights and the activation scalar to obtain the feature influence vector.
5. The method of claim 1, wherein using the feature influence vector to perform residual weight enhancement on the source model to obtain the enhanced source model includes:
extracting a verification hidden state representation from the result of the target network layer processing the verification data stored in the first cache memory, and performing an operation on the verification hidden state representation and the feature influence vector to obtain an enhanced verification hidden state representation; and
enhancing the weights of the residual layer of the source model using the enhanced verification hidden state representation to obtain the enhanced source model, wherein the parameters of the enhanced source model are stored in a second cache memory.
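One possible reading of how the feature influence vector is formed and applied can be sketched as below. The claims do not state the exact operations, so this sketch assumes that the influence vector is the activation scalar multiplied by the feature's decoder direction, and that the enhancement adds a scaled copy of that vector to the verification hidden state. The names `feature_influence`, `enhance_hidden`, and the `scale` parameter are hypothetical.

```python
from typing import List


def feature_influence(activation: float,
                      decoder_direction: List[float]) -> List[float]:
    """Assumed form: influence = activation scalar * decoder weight row."""
    return [activation * d for d in decoder_direction]


def enhance_hidden(hidden: List[float],
                   influence: List[float],
                   scale: float = 1.0) -> List[float]:
    """Assumed form: enhanced hidden state = hidden + scale * influence."""
    return [h + scale * v for h, v in zip(hidden, influence)]


# Toy example with a 3-dimensional hidden state.
influence = feature_influence(2.0, [0.5, 0.0, -1.0])
enhanced = enhance_hidden([1.0, 1.0, 1.0], influence)
print(influence)  # [1.0, 0.0, -2.0]
print(enhanced)   # [2.0, 1.0, -1.0]
```

How the enhanced hidden state is then written back into the residual layer weights is left unspecified here, since the claim does not give enough detail to sketch it without guessing.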
6. The method of claim 5, wherein obtaining the significant positive information using the original output of the source model executing the target task and the enhanced output of the enhanced source model executing the target task includes:
scoring the original results and the enhanced results stored in a third cache memory using preset evaluation indicators to obtain the scores of the original results and the enhanced results; and
performing a calculation on the scores of the original results and the enhanced results to obtain the significant positive information.
7. The method of claim 6, wherein using the significant positive information to perform the filtering operation on the candidate feature set to obtain the target feature set includes:
sorting the candidate feature set in descending order using the significant positive information to obtain a descending-ordered candidate feature set; and
selecting multiple top-ranked candidate features from the descending-ordered candidate feature set to construct the target feature set.
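The scoring and ranking of claims 6 and 7 can be illustrated with a short sketch. It assumes that the significant positive information for a feature is the score gain of the enhanced result over the original result, and that only features with a strictly positive gain are eligible; the claims say only that the two scores are used in a calculation, so both points are assumptions. The function name, feature indices, and scores are made up for the example.

```python
from typing import Dict, List


def select_target_features(original_score: float,
                           enhanced_scores: Dict[int, float],
                           top_k: int) -> List[int]:
    """Rank candidate features by score gain and keep the top_k.

    enhanced_scores maps a candidate feature index to the evaluation score
    obtained when the source model is enhanced along that feature.
    """
    gains = {f: s - original_score for f, s in enhanced_scores.items()}
    # Assumption: keep only features whose enhancement improves the score.
    positive = [f for f, gain in gains.items() if gain > 0]
    positive.sort(key=lambda f: gains[f], reverse=True)  # descending order
    return positive[:top_k]


scores = {3: 0.72, 7: 0.58, 11: 0.65, 20: 0.80}  # hypothetical metric values
targets = select_target_features(original_score=0.60,
                                 enhanced_scores=scores, top_k=2)
print(targets)  # [20, 3]
```

Feature 7 is dropped because enhancing it lowers the score, and feature 11 falls outside the top two.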
8. The method of claim 3, wherein using the target feature set to perform data filtering processing on the original training dataset based on feature activation magnitude to obtain the target training dataset includes:
concatenating each input sample in the original training dataset with its corresponding label value at the marker position to obtain a concatenated training dataset;
calculating, using the target feature set, the feature activation magnitude of each piece of concatenated training data in the concatenated training dataset at the marker position to obtain a feature resonance score for each piece of concatenated training data;
sorting and filtering the concatenated training dataset according to the feature resonance scores to obtain an initial target training dataset; and
decoupling, at the marker position, the target input sample and its corresponding label value in each piece of initial target training data in the initial target training dataset to obtain the target training dataset for supervised fine-tuning of the model.
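The data filtering steps of claim 8 can be sketched end to end as follows. This is a sketch under stated assumptions: the feature resonance score is taken to be the sum of the absolute activations of the target features at the marker position, the marker string `" => "` is hypothetical, and `acts_at_marker` stands in for running the source model and sparse autoencoder (here replaced by a toy lookup table with invented values).

```python
from typing import Callable, Dict, List, Tuple

MARKER = " => "  # hypothetical keyword element joining input and label


def filter_by_resonance(pairs: List[Tuple[str, str]],
                        target_features: List[int],
                        acts_at_marker: Callable[[str], Dict[int, float]],
                        keep: int) -> List[Tuple[str, str]]:
    """Join, score, sort, keep the top `keep`, then split the pairs again."""
    scored = []
    for sample, label in pairs:
        joined = sample + MARKER + label  # concatenate at the marker
        acts = acts_at_marker(joined)
        # Assumed resonance score: summed magnitude over the target features.
        score = sum(abs(acts.get(f, 0.0)) for f in target_features)
        scored.append((score, joined))
    scored.sort(key=lambda item: item[0], reverse=True)
    kept = []
    for _, joined in scored[:keep]:
        sample, label = joined.split(MARKER, 1)  # decouple at the marker
        kept.append((sample, label))
    return kept


# Toy stand-in for model activations at the marker position.
toy_acts = {
    "2+2 => 4": {20: 0.9, 3: 0.4},
    "hello => world": {7: 0.8},
    "10*3 => 30": {20: 0.5},
}
data = [("2+2", "4"), ("hello", "world"), ("10*3", "30")]
result = filter_by_resonance(data, [20, 3], lambda text: toy_acts[text], keep=2)
print(result)  # [('2+2', '4'), ('10*3', '30')]
```

The pair whose activations fall entirely outside the target feature set receives a score of zero and is filtered out.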
9. The method according to any one of claims 1 to 8, characterized in that:
the target task includes at least one of a translation task, a summary generation task, and a mathematical reasoning task;
the source models include translation source models, summary generation source models, and mathematical reasoning source models;
the target models include a translation target model, a summary generation target model, and a mathematical reasoning target model;
the text synthesis sequence includes at least one of a translated text sequence, a summary-generated text sequence, and a mathematical reasoning text sequence; and
the verification data includes at least one of the following: text samples for translation tasks, text samples for summarization tasks, and text samples for mathematical reasoning tasks.
10. A training data filtering apparatus for model supervised fine-tuning, characterized in that the apparatus includes:
a multidimensional sparse activation feature acquisition module, used to extract a hidden state representation from the result of the target network layer of a source model processing a text synthesis sequence, and to invoke a sparse computing accelerator to perform sparse projection processing on the hidden state representation to obtain multidimensional sparse activation features, wherein the text synthesis sequence includes the text description and execution result of a target task;
a candidate feature set acquisition module, used to perform a feature filtering operation based on an activation frequency threshold on the features in each dimension of the multidimensional sparse activation features to obtain a candidate feature set, wherein the candidate feature set is stored in a memory with a multi-level caching mechanism;
a source model enhancement module, used to perform an operation on an activation scalar based on adaptive decoding weights to obtain a feature influence vector, and to use the feature influence vector to perform residual weight enhancement on the source model to obtain an enhanced source model, wherein the activation scalar is obtained by performing an operation, based on adaptive encoding weights, on the result of the target network layer processing verification data;
a target feature set acquisition module, used to obtain significant positive information for verifying the causal relationship between the input and output of the source model by using the original result of the source model executing the target task and the enhanced result of the enhanced source model executing the target task, and to use the significant positive information to perform a filtering operation on the candidate feature set to obtain a target feature set; and
a training dataset filtering module, used to perform data filtering processing on an original training dataset based on feature activation magnitude using the target feature set, so as to obtain a training dataset for supervised fine-tuning of a target model.