Model scheduling method, apparatus, and device based on capability vector matching

The question text is converted into a question capability vector and compared with the capability vectors of the models in a model library; among the models whose capabilities meet the requirements, the one with the lowest call cost is selected. This addresses the inaccurate model scheduling of existing technologies, optimizes resource utilization, and improves response quality.

CN122309750A · Pending · Publication Date: 2026-06-30 · TSINGHUA UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TSINGHUA UNIVERSITY
Filing Date
2026-05-26
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, AI model scheduling methods cannot finely match the complexity of the problem with the multidimensional capabilities of the model, resulting in resource waste and unstable response quality.

Method used

The question text is converted into a question capability vector and matched against the capability vectors of the models in the model library. Candidate models are those whose capability values cover the question's requirements in every task dimension, and the candidate with the lowest call cost is selected as the target model.

Benefits of technology

This achieves precise model scheduling, avoids resource waste, improves the stability of response quality, and reduces computing resource consumption and cost.

✦ Generated by Eureka AI based on patent content.

[Figure: CN122309750A_ABST]

Abstract

This invention provides a model scheduling method, apparatus, and device based on capability vector matching, relating to the field of artificial intelligence technology. The method includes: converting input question text into a question capability vector; matching at least one candidate model from a model library based on the question capability vector, where the model library includes multiple models and a model capability vector for each model, each model capability vector includes the capability values of the corresponding model in multiple task dimensions, and every capability value in a candidate model's capability vector is greater than or equal to the value of the question capability vector in the corresponding task dimension; and scheduling, based on the calling cost of each candidate model, the target model with the lowest calling cost from the at least one candidate model, the target model being used to determine the target response information for the question text. This invention enables precise model scheduling, avoids wasting system resources, and improves the stability of response quality.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a model scheduling method, apparatus, and device based on capability vector matching.

Background Technology

[0002] With the rapid development of artificial intelligence (AI) technologies such as large language models, numerous AI models with varying functions and focuses have emerged, demonstrating different strengths in different tasks. Therefore, selecting the most suitable and cost-effective model based on the user's input question is a key issue in building an efficient and economical AI service platform.

[0003] In existing technologies, the mapping between task types and models is usually pre-defined based on manual design. Upon receiving a user's input question, the system routes the question to the pre-defined model based on keywords in the question or the task classification results, and then generates a response based on the determined model.

[0004] However, the above approach cannot finely match the complexity of the question with the multidimensional capabilities of the model, which may lead to a mismatch between the question and the selected model, resulting in wasted system resources or unstable response quality.

Summary of the Invention

[0005] This invention provides a model scheduling method, apparatus, and device based on capability vector matching to remedy the defect in the prior art that the question input by the user and the selected model are mismatched, which wastes system resources or makes response quality unstable. The invention achieves more accurate model scheduling, avoids wasting system resources, and improves the stability of response quality.

[0006] This invention provides a model scheduling method based on capability vector matching, comprising:
converting the input question text into a question capability vector;
matching at least one candidate model from a model library based on the question capability vector, wherein the model library includes multiple models and a model capability vector for each model, each model capability vector includes the capability values of the corresponding model in multiple task dimensions, and every capability value in a candidate model's capability vector is greater than or equal to the value of the question capability vector in the corresponding task dimension;
scheduling, based on the calling cost of each candidate model, the target model with the lowest calling cost from the at least one candidate model, the target model being used to determine the target response information for the question text.

[0007] According to the model scheduling method based on capability vector matching provided by the present invention, converting the input question text into a question capability vector includes:
segmenting the question text into multiple word segments;
inputting the word segments into a large language model to obtain the semantic representation vector output by the large language model;
inputting the semantic representation vector into a vector transformation model to obtain the question capability vector output by the vector transformation model, wherein the vector transformation model is trained with a contrastive learning loss or a triplet loss.

[0008] According to the model scheduling method based on capability vector matching provided by the present invention, the vector transformation model is trained in the following manner:
obtaining multiple triplet training samples, each including a sample question, a sample response, and an evaluation result of the sample response;
for each triplet training sample, determining the sample model that generated the sample response and determining the sample model capability vector of that sample model;
inputting the sample question of the triplet training sample into the large language model to obtain the sample semantic representation vector output by the large language model;
inputting the sample semantic representation vector into an initial vector transformation model to obtain the predicted question capability vector output by the initial vector transformation model;
determining loss information based on the evaluation result, the sample model capability vector, and the predicted question capability vector corresponding to each triplet training sample, the loss information including a contrastive learning loss or a triplet loss;
adjusting the model parameters of the initial vector transformation model based on the loss information to obtain the vector transformation model.

[0009] According to the model scheduling method based on capability vector matching provided by the present invention, the method further includes:
obtaining multiple datasets for each of the task dimensions;
for each model in the model library and for each task dimension, testing the model on the multiple datasets of that task dimension to obtain the model's performance score on each dataset;
determining the model's capability value in the task dimension based on its performance scores on the datasets of that task dimension;
determining the model capability vector based on the model's capability values in all of the task dimensions.
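The aggregation in the steps above can be sketched as follows. This is only an illustration: the text does not say how per-dataset scores are combined, so the arithmetic mean, the dimension names, the dataset names, and the 0 to 100 score scale are all assumptions.

```python
from statistics import mean


def capability_vector(scores_by_dimension):
    """Collapse per-dataset scores into one capability value per task dimension.

    Assumption: the capability value is the plain average of the dataset scores.
    """
    return {
        dimension: mean(dataset_scores.values())
        for dimension, dataset_scores in scores_by_dimension.items()
    }


# Hypothetical benchmark results for one model (scores on a 0-100 scale).
scores = {
    "text_understanding": {"dataset_a": 80.0, "dataset_b": 90.0},
    "code_generation": {"dataset_c": 70.0, "dataset_d": 60.0, "dataset_e": 80.0},
}
vector = capability_vector(scores)
```

A weighted mean or a more conservative statistic (for example, the minimum score) could be substituted without changing the rest of the pipeline.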

[0010] According to the model scheduling method based on capability vector matching provided by the present invention, scheduling the target model with the lowest calling cost from the at least one candidate model based on the calling cost of each candidate model includes:
obtaining a cost preference input by the user;
determining the comprehensive cost of each candidate model based on the cost preference and the calling cost of each candidate model;
determining the candidate model with the lowest comprehensive cost as the target model.

[0011] According to the model scheduling method based on capability vector matching provided by the present invention, the method further includes:
inputting the question text into the target model to obtain response information;
inputting the response information into a text quality assessment model to obtain the quality score output by the text quality assessment model, wherein the text quality assessment model is trained on sample texts and score labels;
if the quality score is greater than or equal to a preset score, determining the response information as the target response information.

[0012] According to the model scheduling method based on capability vector matching provided by the present invention, the method further includes:
if the quality score is less than the preset score, obtaining the adjustment suggestions for the response information output by the text quality assessment model;
updating the values of all or some of the task dimensions in the question capability vector based on the adjustment suggestions to obtain an updated question capability vector;
rematching at least one new candidate model from the model library based on the updated question capability vector;
determining a new target model based on the calling cost of the at least one new candidate model.
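The quality check and rematching steps can be pictured as a small loop. The sketch below is hypothetical throughout: the text does not define the form of the adjustment suggestions (here they are assumed to be per-dimension increments), and the preset score of 0.8, the round limit, and all function and model names are illustrative placeholders.

```python
def answer_with_feedback(question_vec, match, pick, respond, assess,
                         preset_score=0.8, max_rounds=3):
    """Return (target_model, response), or None if no response passes the check."""
    for _ in range(max_rounds):
        candidates = match(question_vec)
        if not candidates:
            return None
        target = pick(candidates)
        response = respond(target)
        score, adjustments = assess(response)
        if score >= preset_score:
            return target, response
        # Assumed form of the suggestions: raise the required level per dimension.
        question_vec = {
            dim: need + adjustments.get(dim, 0)
            for dim, need in question_vec.items()
        }
    return None


# Toy stand-ins for the model library, the models, and the quality assessor.
library = {"small": ({"reasoning": 60}, 1.0), "large": ({"reasoning": 90}, 5.0)}


def match(q):
    return [n for n, (vec, _) in library.items() if all(vec[d] >= q[d] for d in q)]


def pick(candidates):
    return min(candidates, key=lambda n: library[n][1])


def respond(name):
    return f"answer from {name}"


def assess(response):
    if "large" in response:
        return 0.9, {}
    return 0.5, {"reasoning": 20}


result = answer_with_feedback({"reasoning": 50}, match, pick, respond, assess)
```

In this toy run the cheaper model is tried first, fails the check, and the raised requirement then excludes it from the second round.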

[0013] According to the model scheduling method based on capability vector matching provided by the present invention, matching at least one candidate model from the model library based on the question capability vector includes:
using the question capability vector as a query vector to search a pre-built vector index and obtain a target model capability vector, wherein the value of each dimension of the target model capability vector is greater than or equal to the value of the corresponding dimension of the query vector, and the vector index is built from the model capability vectors of all models in the model library;
determining the model corresponding to the target model capability vector as the candidate model.

[0014] The present invention also provides a model scheduling device based on capability vector matching, comprising:
a conversion module, used to convert the input question text into a question capability vector;
a matching module, used to match at least one candidate model from a model library based on the question capability vector, wherein the model library includes multiple models and a model capability vector for each model, each model capability vector includes the capability values of the corresponding model in multiple task dimensions, and every capability value in a candidate model's capability vector is greater than or equal to the value of the question capability vector in the corresponding task dimension;
a scheduling module, used to schedule, based on the calling cost of each candidate model, the target model with the lowest calling cost from the at least one candidate model, the target model being used to determine the target response information for the question text.

[0015] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the model scheduling method based on capability vector matching as described above.

[0016] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the model scheduling method based on capability vector matching as described above.

[0017] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the capability vector matching-based model scheduling method as described above.

[0018] The present invention provides a model scheduling method, apparatus, and device based on capability vector matching. The method converts the input question text into a question capability vector and matches at least one candidate model from a model library based on that vector. The model library includes multiple models and a model capability vector for each model, and each model capability vector contains the capability values of the corresponding model across multiple task dimensions. Every capability value in a candidate model's capability vector is greater than or equal to the value of the question capability vector in the corresponding task dimension. Based on the calling cost of each candidate model, the method schedules the target model with the lowest calling cost from the at least one candidate model, and this target model is used to determine the target response information for the question text. On the one hand, quantitative matching between the question capability vector and the model capability vectors ensures that the selected model's capability values cover the question's requirements in every task dimension, which avoids response quality problems caused by insufficient model capability. It also avoids the resource waste caused by calling high-performance models unnecessarily, reducing the consumption of computing resources. Question complexity and multidimensional model capabilities are thus matched in a fine-grained, dynamic way, achieving precise model scheduling. On the other hand, selecting the target model with the lowest calling cost from the qualified candidate models reduces the cost of model scheduling.

Description of the Drawings

[0019] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0020] Figure 1 is a flowchart illustrating the model scheduling method based on capability vector matching provided in an embodiment of the present invention.

[0021] Figure 2 is a schematic diagram of the training process of the vector transformation model.

[0022] Figure 3 is a schematic diagram of the structure of a model scheduling device based on capability vector matching provided in an embodiment of the present invention.

[0023] Figure 4 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0024] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0025] With the rapid development of AI technology, various AI models with different functions and focuses have emerged. These models show distinct advantages in specific tasks such as text generation, logical reasoning, code programming, and multi-turn dialogue. A key issue is therefore how to select the most suitable and cost-effective model from the model library based on the specific characteristics of the question input by the user. Current model scheduling methods mainly rely on static scheduling based on manually designed rules. For example, a correspondence between task types and models is defined in advance, and the question is routed to the corresponding model based on keywords in the question or on the result of task classification, so that the model outputs the response. However, this approach cannot match the complexity of the question with the multidimensional capabilities of the model in a fine-grained, quantitative way. As a result, the selected model may be insufficiently capable of handling the task, or it may be far more capable than the task requires, which wastes resources unnecessarily.

[0026] In view of the above problems, this invention proposes a model scheduling method based on capability vector matching. The question capability vector derived from the question text is matched against the model capability vectors of the models in the model library, and the qualifying model with the lowest calling cost is selected as the target model. This accurately identifies a model that matches the question text while reducing cost, achieving more precise model scheduling, avoiding waste of system resources, and improving the stability of response quality.

[0027] The model scheduling method based on capability vector matching provided in this invention is described below with reference to Figure 1 and Figure 2. This invention is applicable to AI service platform scenarios that use AI models for question text processing, multi-model scheduling, or multi-model collaboration, such as online education question-and-answer systems, content generation platforms, or intelligent customer service systems. The executing entity of this method can be a terminal device, computer, server, or server cluster, a specially designed model scheduling device based on capability vector matching, or such a device installed in the aforementioned electronic equipment. The model scheduling device can be implemented in software, hardware, or a combination of both.

[0028] Figure 1 is a flowchart illustrating the model scheduling method based on capability vector matching provided in an embodiment of the present invention. As shown in Figure 1, the method includes the following steps. Step 101: Convert the input question text into a question capability vector.

[0029] In this step, the question text can be understood as a specific task or question that the user inputs in natural language, such as "Please explain the meaning of artificial intelligence." Each dimension of the question capability vector corresponds to a specific task capability, such as text understanding, code generation, logical reasoning, or multi-turn dialogue. The value of each dimension represents the minimum capability level required in that dimension to solve the question posed by the question text.

[0030] In addition, to facilitate subsequent vector comparison and matching, the generated question capability vector can be normalized, for example, by scaling the values of each dimension to a uniform range, such as [0,100] or [0,1].
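As a minimal illustration of this normalization, the sketch below maps values from an assumed [0,100] scale to [0,1]. The clamping of out-of-range values and the dimension names are assumptions added for the example, and the model capability vectors would need to be placed on the same scale for the later comparison to be meaningful.

```python
def rescale(vector, source_max=100.0):
    """Map values from [0, source_max] to [0, 1], clamping out-of-range inputs."""
    return {
        dim: min(max(value, 0.0), source_max) / source_max
        for dim, value in vector.items()
    }


# Hypothetical raw question capability vector; 120 is deliberately out of range.
question_vector = {"text_understanding": 70, "code_generation": 85, "logical_reasoning": 120}
scaled = rescale(question_vector)
```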

[0031] Step 102: Based on the question capability vector, match at least one candidate model from the model library. The model library includes multiple models and a model capability vector for each model. Each model capability vector includes the capability values of the corresponding model in multiple task dimensions. Every capability value in a candidate model's capability vector is greater than or equal to the value of the question capability vector in the corresponding task dimension.

[0032] In this step, the pre-built model library includes multiple models and a model capability vector for each model. The model capability vector has the same dimensions as the question capability vector, and each dimension holds the capability value for a specific task capability, such as text understanding, code generation, logical reasoning, or multi-turn dialogue.

[0033] After the question capability vector is determined, its value in each task dimension is compared with the capability value in the same task dimension of each model's capability vector in the model library. The models whose capability value in every task dimension is greater than or equal to the corresponding value of the question capability vector are selected as candidate models. For example, if the question capability vector is [Text Comprehension: 70, Code Generation: 85, Logical Reasoning: 75, Multi-Turn Dialogue: 60], the candidate models are those whose capability values are at least 70 for text comprehension, at least 85 for code generation, at least 75 for logical reasoning, and at least 60 for multi-turn dialogue.
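The dimension-by-dimension comparison can be sketched as a simple filter. The question vector below uses the values from the example; the model names and their capability values are invented purely for illustration.

```python
def covers(model_vector, question_vector):
    """True if the model meets or exceeds the requirement in every task dimension."""
    return all(
        model_vector.get(dim, 0) >= required
        for dim, required in question_vector.items()
    )


def match_candidates(question_vector, model_library):
    """Return the names of all models whose capability vector covers the question."""
    return [name for name, vec in model_library.items() if covers(vec, question_vector)]


question = {"text_comprehension": 70, "code_generation": 85,
            "logical_reasoning": 75, "multi_turn_dialogue": 60}

# Hypothetical model library entries.
library = {
    "model_a": {"text_comprehension": 90, "code_generation": 95,
                "logical_reasoning": 90, "multi_turn_dialogue": 85},
    "model_b": {"text_comprehension": 75, "code_generation": 85,
                "logical_reasoning": 75, "multi_turn_dialogue": 60},
    "model_c": {"text_comprehension": 95, "code_generation": 80,  # below 85
                "logical_reasoning": 90, "multi_turn_dialogue": 90},
}
candidates = match_candidates(question, library)
```

Note that `model_c` is excluded even though it is stronger than `model_b` in three dimensions, because a single unmet dimension disqualifies a model.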

[0034] The selected candidate models can be understood as the models that are capable of handling the question text.

[0035] Step 103: Based on the calling cost of each candidate model, schedule the target model with the lowest cost from at least one candidate model. The target model is used to determine the target response information for the question text.

[0036] In this step, the model library also stores the calling cost of each model. The calling cost includes economic cost and time cost. The economic cost may include, for example, application programming interface (API) call fees and the computational overhead of a locally deployed model. The time cost may include, for example, the estimated response time.

[0037] When determining the target model, one possible implementation is to pre-assign corresponding weights to economic and time costs, thereby determining the overall scheduling cost of each model through a weighted average. After selecting candidate models, the model with the lowest scheduling cost among all candidate models can be identified as the target model.
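A weighted combination of the two cost components might look like the following sketch. It assumes both costs have already been normalized to a comparable range such as [0,1]; the weights, cost figures, and model names are hypothetical.

```python
def scheduling_cost(economic_cost, time_cost, w_economic=0.5, w_time=0.5):
    """Weighted average of economic and time cost (both assumed pre-normalized)."""
    return (w_economic * economic_cost + w_time * time_cost) / (w_economic + w_time)


def pick_target(candidates, costs, w_economic=0.5, w_time=0.5):
    """Return the candidate with the lowest weighted scheduling cost."""
    return min(
        candidates,
        key=lambda name: scheduling_cost(*costs[name], w_economic, w_time),
    )


# Hypothetical (economic_cost, time_cost) pairs per candidate model.
costs = {"model_a": (0.9, 0.4), "model_b": (0.3, 0.6)}

balanced = pick_target(["model_a", "model_b"], costs)
speed_first = pick_target(["model_a", "model_b"], costs, w_economic=0.1, w_time=0.9)
```

With equal weights the cheaper `model_b` wins; shifting most of the weight to time cost selects the faster `model_a` instead.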

[0038] In another possible implementation, the target model can be determined according to a preference preset by the user, such as fastest response or lowest cost, by selecting the candidate model with the lowest scheduling cost under that preference.

[0039] Furthermore, by inputting the question text into the determined target model, the target response information corresponding to the question text can be determined.

[0040] The model scheduling method based on capability vector matching provided in this invention converts the input question text into a question capability vector. Based on the question capability vector, at least one candidate model is matched from a model library, which includes multiple models and a model capability vector for each model. Each model capability vector includes the capability values of the corresponding model across multiple task dimensions. Every capability value in a candidate model's capability vector is greater than or equal to the value of the question capability vector in the corresponding task dimension. Based on the calling cost of each candidate model, the target model with the lowest calling cost is scheduled from the at least one candidate model, and this target model is used to determine the target response information for the question text. On the one hand, quantitative matching between the question capability vector and the model capability vectors ensures that the selected model's capability values cover the question's requirements in every task dimension, which avoids response quality problems caused by insufficient model capability. It also avoids the resource waste caused by calling high-performance models unnecessarily, reducing the consumption of computing resources. Question complexity and multidimensional model capabilities are thus matched in a fine-grained, dynamic way, achieving precise model scheduling. On the other hand, selecting the target model with the lowest calling cost from the candidate models that meet the capability requirements reduces the cost of model scheduling.

[0041] In a real-world test scenario, compared with the fixed approach of always calling a single high-performance large model, the model scheduling method based on capability vector matching provided in this embodiment can reduce the average cost per call by about 40% and shorten the response time by about 30% while maintaining response quality (evaluation score > 0.85). The method also offers good scalability and cost control.

[0042] For example, based on the above embodiments, the input question text can be converted into a question capability vector in the following manner: The question text is segmented to obtain multiple word segments. The word segments are input into a large language model to obtain the semantic representation vector output by the large language model. The semantic representation vector is then input into a vector transformation model to obtain the question capability vector output by the vector transformation model. The vector transformation model is trained with a contrastive learning loss or a triplet loss.

[0043] Specifically, the question text can be tokenized to obtain multiple tokens. If the length of the question text exceeds a preset threshold, such as 512 tokens, it can be truncated or segmented to ensure that it matches the input length of the encoder.
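The two options for over-length input, truncation and segmentation, can be sketched as below. The limit of 512 tokens comes from the example in the text; the integer placeholders stand in for real tokens, and how the segments would later be combined is not specified in the text and is left out here.

```python
def truncate(tokens, max_len=512):
    """Keep only the first max_len tokens."""
    return tokens[:max_len]


def segment(tokens, max_len=512):
    """Split the token list into consecutive chunks of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


# Placeholder token sequence longer than the limit.
tokens = list(range(1100))
truncated = truncate(tokens)
segments = segment(tokens)
```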

[0044] In this embodiment, a collaborative encoding approach using a large language model and a vector transformation model can be employed to convert the word segments into a question capability vector. The vector transformation model is typically a lightweight network, such as a small Transformer or BERT-tiny, a compact Bidirectional Encoder Representations from Transformers (BERT) model.

[0045] For example, multiple word segments are input into a large language model, and the last hidden state of the large language model is obtained as a semantic representation vector. This semantic representation vector contains rich semantic information in the question text, which can be understood as a general and deep feature representation of the question text.

[0046] Furthermore, the semantic representation vector can be input into a pre-trained vector transformation model. After processing through multiple attention and fully connected layers, the vector transformation model extracts the semantic features of the question and outputs a fixed-dimensional question capability vector. The dimension of this question capability vector is the same as that of the model capability vectors stored in the model library.

[0047] With this collaborative encoding approach, the large language model extracts the semantic representation vector of the question text, and the vector transformation model converts the semantic representation vector into the question capability vector. This makes full use of the deep semantic understanding of the large language model to keep the question capability vector accurate, while the lightweight vector transformation model keeps the conversion efficient and inexpensive.

[0048] For example, the vector transformation model described above is trained in the following manner: Multiple triplet training samples are obtained. Each triplet training sample includes a sample question, a sample response, and an evaluation result of the sample response. For each triplet training sample, the sample model that generated the sample response is determined, and the sample model capability vector of that sample model is determined. The sample question in the triplet training sample is input into a large language model to obtain the sample semantic representation vector output by the large language model. The sample semantic representation vector is input into an initial vector transformation model to obtain the predicted question capability vector output by the initial vector transformation model. Based on the evaluation result, sample model capability vector, and predicted question capability vector corresponding to each triplet training sample, loss information is determined. The loss information includes a contrastive learning loss or a triplet loss. Based on the loss information, the model parameters of the initial vector transformation model are adjusted to obtain the vector transformation model.

[0049] Specifically, Figure 2 is a schematic diagram illustrating the training process of the vector transformation model. As shown in Figure 2, each triplet training sample includes a sample question, a sample response, and an evaluation result of the sample response. The sample response is the response content obtained by inputting the sample question into one of several different sample models. The evaluation result indicates the quality of the sample response. It can be labeled manually or obtained by inputting the sample question and sample response into an evaluation model that judges whether the quality of the sample response is acceptable.

[0050] For each triplet training sample, it is necessary to determine the sample model that generated the sample response in the triplet training sample, and to determine the sample model capability vector of that sample model, where the sample model capability vector includes the capability values of the sample model in multiple task dimensions.

[0051] During model training, the parameters of the large language model are frozen, and only the lightweight vector transformation model is trained. The sample question from each triplet training sample is input into the large language model, with the prompt designed to constrain the output format so that the output content is interpretable. The sample semantic representation vector output by the large language model is then used for supervised learning of the initial vector transformation model: it is input into the initial vector transformation model to obtain the predicted question capability vector output by that model.

[0052] Furthermore, based on the evaluation results of each sample response, contrastive learning loss or triplet loss can be used to construct loss information. For contrastive learning loss, if the evaluation result of a sample response in a triplet training sample is that it can correctly answer the question, it means that the sample model capability vector of the sample model that generated the sample response should not be less than the predicted question capability vector, and this sample model capability vector and the predicted question capability vector are constructed as a positive sample pair. If the evaluation result of a sample response in a triplet training sample is that it cannot correctly answer the question, it means that the sample model capability vector of the sample model that generated the sample response is less than the predicted question capability vector, and this sample model capability vector and the predicted question capability vector are constructed as a negative sample pair. Then, the Euclidean distance between the two vectors in the positive and negative sample pairs is calculated using the loss function. For positive sample pairs, by minimizing this distance, the sample model capability vector of the sample model that can correctly answer the sample question is encouraged to be close to or greater than the predicted question capability vector. For negative sample pairs, by maximizing this distance or ensuring that the distance exceeds a preset safety boundary value, the predicted question capability vector is kept away from the sample model capability vector of the sample model that cannot correctly answer the sample question.
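The text gives no formula for this loss. One common margin-based form of contrastive loss that fits the description is sketched below as an assumption; the squared terms, the margin value, and the function names are illustrative choices, not details from the source.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(question_vec, model_vec, is_positive, margin=1.0):
    """Margin-based contrastive loss for one (question, model) pair.

    Positive pairs (model answered correctly) are pulled together;
    negative pairs are pushed apart until they are at least `margin` away.
    """
    distance = euclidean(question_vec, model_vec)
    if is_positive:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


pos = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_positive=True)
neg_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_positive=False)
neg_near = contrastive_loss([0.5, 0.5], [0.5, 0.5], is_positive=False)
```

Because Euclidean distance is symmetric, this form only encourages closeness or separation; it does not by itself express that a capable model's vector should be greater than or equal to the question vector in each dimension.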

[0053] For triplet loss, a triplet needs to be constructed for each sample question. The triplet includes an anchor, a positive example, and a negative example. The anchor is the predicted question capability vector for that sample question; the positive example is the capability vector of a sample model known to correctly answer the sample question; and the negative example is the capability vector of a sample model known to fail to correctly answer the sample question. The goal of the loss function is not to directly optimize the absolute distance, but rather to enforce a relative order. That is, the distance between the anchor and the negative example must be significantly greater than the distance between the anchor and the positive example, with the difference exceeding a preset boundary value. In this way, all sample models that can correctly answer the sample question are clustered in the nearest neighbor region of the predicted question capability vector, while sample models that cannot correctly answer the sample question are excluded.
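As an illustration of the relative-order constraint described in this paragraph, a triplet loss on Euclidean distances might look like the sketch below. The function name `triplet_loss` and the default `margin` (standing in for the "preset boundary value") are assumptions for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet loss enforcing d(anchor, negative) >= d(anchor, positive) + margin.

    anchor:   predicted question capability vector
    positive: capability vector of a model that answered correctly
    negative: capability vector of a model that answered incorrectly
    """
    a, p, n = (np.asarray(v, dtype=float) for v in (anchor, positive, negative))
    d_pos = np.linalg.norm(a - p)
    d_neg = np.linalg.norm(a - n)
    # Zero loss once the negative is at least `margin` farther away than the positive.
    return float(max(0.0, d_pos - d_neg + margin))
```

The loss depends only on the difference between the two distances, which is why it constrains the relative order rather than the absolute distance values.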

[0054] After determining the loss information, the model parameters of the initial vector transformation model are adjusted based on the loss information. The above process is repeated until the loss is minimized or the number of iterations reaches the preset number, and the final model is determined as the vector transformation model.

[0055] In addition, during practical application, user feedback and evaluation results on the responses will be continuously collected, and user questions, responses, and evaluation results will be used as new training samples. Based on these new training samples, the vector transformation model will be updated regularly to achieve continuous evolution of the system's capabilities.

[0056] In the above approach, by freezing the parameters of the large language model and training only the vector transformation model, the computational complexity and resource consumption of model training can be reduced. Furthermore, during model training, using contrastive learning loss or triplet loss allows the model to learn the key relative order of relationships between sample vectors, rather than simply fitting absolute values, thus giving the trained vector transformation model stronger generalization ability and robustness.

[0057] For example, the model capability vector of each model in the model library can be determined in the following way: Multiple datasets are obtained for each task dimension. For each model in the model library, the model is tested using multiple datasets for each task dimension to obtain the model's performance score on each dataset. Based on the model's performance score on each dataset for each task dimension, the model's capability value for each task dimension is determined. Based on the model's capability value for each task dimension, the model capability vector is determined.

[0058] Specifically, taking task dimensions including text understanding, code generation, logical reasoning, and multi-turn dialogue as examples, several representative public datasets are selected for each task dimension. For example, the datasets for text understanding include SQuAD and RACE; the datasets for code generation include HumanEval and MBPP; the datasets for logical reasoning include GSM8K and LogiQA; and the datasets for multi-turn dialogue include MultiWOZ and DailyDialog. Testing each model using these datasets ensures the objectivity and fairness of the test results.

[0059] For each model in the model library, tests are conducted sequentially across each task dimension. For each task dimension, the model is run on all selected datasets corresponding to that task dimension, and test results are obtained on each dataset. Various evaluation metrics of the test results are recorded, such as accuracy, F1 score, and BLEU score.

[0060] For each dataset, the performance score can be determined based on the results of each evaluation indicator under that dataset. For example, the result of a representative evaluation indicator can be used as the performance score of the dataset, or the performance score of the dataset can be determined by weighted averaging based on the results of each evaluation indicator and the corresponding weight of each evaluation indicator.

[0061] Since there are multiple datasets under a task dimension, we can pre-assign corresponding weights to each dataset based on its size under that task dimension, its relevance to the business scenario, and the reliability of the evaluation metrics. In this way, we can obtain the capability value under that task dimension by performing a weighted sum or weighted average based on the performance score of each dataset and its corresponding weight.

[0062] By combining the capability values of a model across all task dimensions into a multi-dimensional vector, the model capability vector is obtained, where the value of each dimension represents the quantified performance level of the model in the corresponding task dimension.
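The aggregation described in paragraphs [0061] and [0062] can be sketched as a weighted average per task dimension followed by concatenation into a vector. The function names, the dictionary layout, and the example scores and weights below are hypothetical; the source does not specify concrete values.

```python
def weighted_average(scores, weights):
    """Weighted average of per-dataset performance scores for one task dimension."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("dataset weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight

def capability_vector(dataset_scores, dataset_weights, dimensions):
    """Build the model capability vector, one capability value per task dimension."""
    return [
        weighted_average(dataset_scores[dim], dataset_weights[dim])
        for dim in dimensions
    ]

# Hypothetical scores and weights, for illustration only.
dimensions = ["text_understanding", "code_generation"]
scores = {
    "text_understanding": {"SQuAD": 0.8, "RACE": 0.6},
    "code_generation": {"HumanEval": 0.5, "MBPP": 0.7},
}
weights = {
    "text_understanding": {"SQuAD": 3.0, "RACE": 1.0},
    "code_generation": {"HumanEval": 1.0, "MBPP": 1.0},
}
vector = capability_vector(scores, weights, dimensions)
```

A weighted average, rather than a weighted sum, keeps every capability value on the same scale as the underlying scores, which makes values comparable across task dimensions with different numbers of datasets.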

[0063] For newly added models in the model library, a small dataset can be used to quickly test the model to estimate its capability vector. Subsequently, the capability vector can be dynamically adjusted based on the actual usage of the model.

[0064] In the above approach, the performance level of the model in each task dimension is tested using datasets from each task dimension, thereby obtaining the model capability vector. This enables the standardization, quantification, and unified representation of the capabilities of heterogeneous models, providing a reliable data foundation for subsequent accurate and objective model matching.

[0065] For example, based on the above embodiments, when scheduling the target model with the lowest calling cost from the at least one candidate model based on the calling cost of each candidate model, a cost preference input by the user can be obtained. Based on the cost preference and the calling cost of each candidate model, the comprehensive cost of each candidate model is determined. The candidate model with the lowest comprehensive cost is then determined as the target model.

[0066] Specifically, users can preset a cost preference, such as "lowest cost," "fastest response," or "balanced mode." Based on the user-defined cost preference, the relative weights of economic cost and time cost in the calling cost of each candidate model can be dynamically adjusted. For example, if the cost preference is "lowest cost," the weight of economic cost is higher and the weight of time cost is lower; in one implementation, the weight of economic cost can be set to the maximum and the weight of time cost can be set to 0. If the cost preference is "fastest response," the weight of time cost is higher and the weight of economic cost is lower; in one implementation, the weight of time cost can be set to the maximum and the weight of economic cost can be set to 0.

[0067] After the relative weights of economic cost and time cost are determined, the comprehensive cost of each candidate model is obtained as the weighted sum of that model's economic cost and time cost under these weights. The comprehensive costs of the candidate models are then compared, and the candidate model with the lowest comprehensive cost is selected as the target model.
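A minimal sketch of this preference-weighted selection is given below. The preference names, the weight table `PREFERENCE_WEIGHTS`, and the assumption that economic cost and time cost have already been normalized to a common scale are illustrative choices, not details given in the source.

```python
# Hypothetical mapping from cost preference to (economic weight, time weight).
PREFERENCE_WEIGHTS = {
    "lowest_cost": (1.0, 0.0),
    "fastest_response": (0.0, 1.0),
    "balanced": (0.5, 0.5),
}

def select_target_model(candidates, preference="balanced"):
    """Return the name of the candidate with the lowest comprehensive cost.

    Each candidate is a dict with 'name', 'economic_cost', and 'time_cost',
    where both costs are assumed to be normalized to the same scale.
    """
    if not candidates:
        raise ValueError("at least one candidate model is required")
    w_econ, w_time = PREFERENCE_WEIGHTS[preference]

    def comprehensive_cost(candidate):
        return w_econ * candidate["economic_cost"] + w_time * candidate["time_cost"]

    return min(candidates, key=comprehensive_cost)["name"]

# Hypothetical candidates, for illustration only.
candidates = [
    {"name": "model_a", "economic_cost": 0.2, "time_cost": 0.9},
    {"name": "model_b", "economic_cost": 0.8, "time_cost": 0.1},
    {"name": "model_c", "economic_cost": 0.4, "time_cost": 0.4},
]
```

Under these example values, each of the three preferences selects a different model, which shows how the same candidate set can yield different target models depending on the user's setting.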

[0068] The model's call cost can be dynamically updated based on market prices and system load to maintain the economic efficiency of the scheduling strategy.

[0069] In this embodiment, the comprehensive cost of each candidate model is determined based on the cost preference and the calling cost of each candidate model, and the target model is then determined based on the comprehensive cost of each candidate model. By quantifying the user's subjective preference into computable weight parameters and integrating them with the calling cost, the scheduling can adapt to the user's needs in different scenarios and achieve an optimized allocation of system resources while ensuring the processing quality of the question text.

[0070] For example, based on the above embodiments, after inputting the question text into the target model and obtaining the response information, the response information can also be input into the text quality assessment model to obtain the quality score output by the text quality assessment model. The text quality assessment model is trained based on sample text and score labels. If the quality score is greater than or equal to a preset score, the response information is determined as the target response information.

[0071] Specifically, the pre-trained text quality assessment model can be, for example, a BERT-based text quality discrimination model. This model is trained using a large number of sample texts and their score labels. The sample texts include responses covering various quality issues, and the score labels represent the overall quality score of the sample text. Through supervised learning, the trained text quality assessment model can determine whether a response to a text is accurate and fluent. This model can be fine-tuned online based on manually annotated high-quality responses to improve its discrimination ability.

[0072] After identifying the target model, the question text is input into the target model to obtain the response information output by the target model. Further, the quality of this response information needs to be evaluated. For example, the response information can be input into a text quality assessment model to obtain a quality score output by the model. This quality score can be used to characterize the quality of the response information.

[0073] The quality score is compared with the preset score. If the quality score is greater than or equal to the preset score, it means that the quality of the response information meets the standard. The response information can then be used as the final target response information and output to the user.

[0074] By evaluating the quality of responses, the final target response is only returned to the user if the response meets the quality standards, thus improving the accuracy of the target response.

[0075] If the quality score is less than the preset score, adjustment suggestions for the response information, output by the text quality assessment model, are obtained. Based on the adjustment suggestions, the dimension values of all or some of the task dimensions in the question capability vector are updated to obtain an updated question capability vector. Based on the updated question capability vector, at least one new candidate model is rematched from the model library, and a new target model is determined based on the calling cost of the at least one new candidate model.

[0076] Specifically, if the quality score is lower than the preset score, the quality of the response information is substandard. In this case, the text quality assessment model outputs not only the quality score but also adjustment suggestions for the response information, such as "the response needs stronger logical reasoning ability" or "the response misunderstands the text and uses technical terms incorrectly." These adjustment suggestions indicate capability dimensions that may have been underestimated in the question capability vector.

[0077] After the adjustment suggestions are received, the dimension values of all or some of the task dimensions in the question capability vector can be updated accordingly. For example, for the adjustment suggestion "the response needs stronger logical reasoning ability," the dimension value of the logical reasoning task dimension in the question capability vector can be increased by 0.5. The dimension values are updated using a preset step size, which can be set based on actual conditions or experience, for example to a value between 0.5 and 1.

[0078] The updated question capability vector is used as the new filtering condition, and the model matching and scheduling process is executed again. That is, at least one new candidate model is matched from the model library, where the capability value of each task dimension in the model capability vector of each new candidate model is greater than or equal to the dimension value of the updated question capability vector in the corresponding task dimension.

[0079] After the new candidate models are selected, the new target model with the lowest comprehensive cost is determined based on the calling cost of each new candidate model and the user's cost preference.

[0080] The above process is repeated until the quality score of the response information generated by the determined target model is greater than or equal to the preset score, or until the preset number of repetitions is reached.
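The closed loop of paragraphs [0075] to [0080] can be sketched as below. The three callables `schedule`, `generate`, and `assess` are hypothetical stand-ins for the matching and scheduling step, the target model call, and the text quality assessment model; the assumption that the assessment returns a list of under-estimated task dimensions, and the choice to return the last response unaccepted when the round limit is reached, are also illustrative.

```python
def answer_with_quality_loop(question_vec, schedule, generate, assess,
                             preset_score, step=0.5, max_rounds=3):
    """Reschedule with a raised question capability vector until the response
    quality reaches `preset_score` or `max_rounds` attempts have been made.

    schedule(vec)    -> model name, or None when no model covers `vec`
    generate(model)  -> response information from that model
    assess(response) -> (quality_score, list of task dimensions to raise)
    """
    vec = dict(question_vec)  # copy so the caller's vector is not modified
    last_response = None
    for _ in range(max_rounds):
        model = schedule(vec)
        if model is None:
            break  # no candidate model satisfies the current requirements
        last_response = generate(model)
        score, weak_dims = assess(last_response)
        if score >= preset_score:
            return {"model": model, "response": last_response,
                    "accepted": True, "vector": vec}
        # Raise each flagged dimension by the preset step size before retrying.
        for dim in weak_dims:
            vec[dim] = vec.get(dim, 0.0) + step
    return {"model": None, "response": last_response,
            "accepted": False, "vector": vec}
```

Passing the three steps in as callables keeps the loop independent of any particular model library or assessment model, so the same control flow applies whatever components are plugged in.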

[0081] In this embodiment, when the quality score is less than the preset score, the dimension values of all or some of the task dimensions in the question capability vector are updated based on the adjustment suggestions for the response information, and new candidate models and a new target model are determined based on the updated question capability vector. This forms a dynamic, adaptive closed-loop quality control mechanism that can automatically diagnose capability matching deviations and correct the estimate of the question capability vector, thereby improving the accuracy of the determined target model and the quality of the response information.

[0082] For example, based on the above embodiments, when matching at least one candidate model from the model library based on the question capability vector, the question capability vector can be used as a query vector to search a pre-built vector index and obtain target model capability vectors. The value of each dimension of a target model capability vector is greater than or equal to the value of the corresponding dimension of the query vector. The vector index is constructed based on the model capability vectors of all models in the model library, and the model corresponding to each target model capability vector is determined as a candidate model.

[0083] Specifically, in practical applications, the number of models in the model library may be large, and selecting candidate models by comparing them one by one may be inefficient. To solve this problem, vector databases such as Faiss or Milvus can be used to index the model capability vectors of all models, resulting in a vector index.

[0084] When selecting candidate models, the generated question capability vector is used as the query vector to search the pre-built vector index for all target model capability vectors whose value in every dimension is greater than or equal to the value of the corresponding dimension of the query vector. The models corresponding to these target model capability vectors are then identified as candidate models.
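The matching condition itself, that every dimension of a model capability vector is greater than or equal to the corresponding dimension of the query vector, can be illustrated with a vectorized NumPy check. This sketch does not use Faiss or Milvus: those systems are built around similarity search, and how the dimension-wise condition is evaluated inside such an index is not specified in the source. The function name `match_candidates` and the example values are assumptions.

```python
import numpy as np

def match_candidates(question_vec, model_names, model_matrix):
    """Return the names of models whose capability vector covers the query
    vector in every task dimension.

    model_matrix has one row per model and one column per task dimension.
    """
    query = np.asarray(question_vec, dtype=float)
    matrix = np.asarray(model_matrix, dtype=float)
    # Row-wise test: True only if every dimension meets or exceeds the query.
    covers = np.all(matrix >= query, axis=1)
    return [name for name, ok in zip(model_names, covers) if ok]

# Hypothetical two-dimensional capability vectors, for illustration only.
names = ["model_a", "model_b", "model_c"]
capabilities = [[3.0, 2.0], [1.0, 5.0], [4.0, 4.0]]
matched = match_candidates([2.0, 2.0], names, capabilities)
```

Here `model_b` is excluded because its first capability value falls below the query, even though its second value is the highest of the three. A check of this kind could also serve as a post-filter on results retrieved from a vector index.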

[0085] In this embodiment, vector search can reduce the computational complexity of the model matching process, increase the screening speed of candidate models, and effectively reduce the computational resource consumption caused by full comparison.

[0086] The capability vector matching-based model scheduling device provided by the present invention will be described below. The capability vector matching-based model scheduling device described below can be referred to in correspondence with the capability vector matching-based model scheduling method described above.

[0087] Figure 3 is a schematic diagram of the structure of the model scheduling device based on capability vector matching provided in an embodiment of the present invention. As shown in Figure 3, the capability vector matching-based model scheduling device 300 includes: a conversion module 11, used to convert the input question text into a question capability vector; a matching module 12, used to match at least one candidate model from the model library based on the question capability vector, where the model library includes multiple models and the model capability vector of each model, the model capability vector includes the capability values of the corresponding model in multiple task dimensions, and the capability value of each task dimension in the model capability vector of a candidate model is greater than or equal to the dimension value of the question capability vector in the corresponding task dimension; and a scheduling module 13, used to schedule the target model with the lowest calling cost from the at least one candidate model based on the calling cost of each candidate model, the target model being used to determine the target response information for the question text.

[0088] In one example embodiment, the conversion module 11 is specifically used for: segmenting the question text into multiple word segments; inputting the word segments into a large language model to obtain the semantic representation vector output by the large language model; and inputting the semantic representation vector into the vector transformation model to obtain the question capability vector output by the vector transformation model, where the vector transformation model is trained based on contrastive learning loss or triplet loss.

[0089] In one example embodiment, the vector transformation model is trained as follows: obtaining multiple triplet training samples, each including a sample question, a sample response, and an evaluation result of the sample response; for each triplet training sample, determining the sample model that generated the sample response and determining the sample model capability vector of that sample model; inputting the sample question of the triplet training sample into the large language model to obtain the sample semantic representation vector output by the large language model; inputting the sample semantic representation vector into the initial vector transformation model to obtain the predicted question capability vector output by the initial vector transformation model; determining loss information, which includes contrastive learning loss or triplet loss, based on the evaluation result, the sample model capability vector, and the predicted question capability vector corresponding to each triplet training sample; and adjusting the model parameters of the initial vector transformation model based on the loss information to obtain the vector transformation model.

[0090] In one example embodiment, the device further includes an acquisition module, a testing module, and a determination module, wherein: the acquisition module is used to acquire multiple datasets under each of the task dimensions; the testing module is used, for each model in the model library and for each task dimension, to test the model using the multiple datasets under that task dimension and obtain the performance score of the model on each dataset; the determination module is used to determine the capability value of the model in the task dimension based on the model's performance scores on the datasets under that task dimension; and the determination module is further used to determine the model capability vector of the model based on the model's capability values in each of the task dimensions.

[0091] In one example embodiment, the scheduling module 13 is specifically used for: obtaining a cost preference input by the user; determining the comprehensive cost of each candidate model based on the cost preference and the calling cost of each candidate model; and determining the candidate model with the lowest comprehensive cost as the target model.

[0092] In one example embodiment, the device further includes an input module, wherein: The input module is used to input the question text into the target model and obtain the response information; The input module is also used to input the response information into the text quality assessment model to obtain the quality score output by the text quality assessment model, wherein the text quality assessment model is trained based on sample text and score labels; The determination module is used to determine the response information as the target response information when the quality score is greater than or equal to a preset score.

[0093] In one example embodiment, the device further includes an update module, wherein: the acquisition module is used to acquire adjustment suggestions for the response information, output by the text quality assessment model, when the quality score is less than the preset score; the update module is used to update the dimension values of all or some of the task dimensions in the question capability vector based on the adjustment suggestions to obtain the updated question capability vector; the matching module 12 is also used to rematch at least one new candidate model from the model library based on the updated question capability vector; and the determination module is also used to determine a new target model based on the calling cost of the at least one new candidate model.

[0094] In one example embodiment, the matching module 12 is specifically used for: using the question capability vector as a query vector to search a pre-built vector index and obtain target model capability vectors, where the value of each dimension of a target model capability vector is greater than or equal to the value of the corresponding dimension of the query vector, and the vector index is constructed based on the model capability vectors of all models in the model library; and determining the model corresponding to each target model capability vector as a candidate model.

[0095] The apparatus of this embodiment can be used in any embodiment of the model scheduling method based on capability vector matching. Its specific implementation process and technical effects are similar to those in the model scheduling method based on capability vector matching. For details, please refer to the detailed description in the model scheduling method based on capability vector matching, which will not be repeated here.

[0096] Figure 4 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present invention. As shown in Figure 4, the electronic device may include: a processor 410, a communications interface 420, a memory 430, and a communication bus 440, wherein the processor 410, the communications interface 420, and the memory 430 communicate with each other through the communication bus 440. The processor 410 can call logical instructions in the memory 430 to execute a model scheduling method based on capability vector matching. This method includes: converting the input question text into a question capability vector; matching at least one candidate model from a model library based on the question capability vector, the model library including multiple models and model capability vectors of each model, the model capability vector including capability values of the corresponding model in multiple task dimensions, wherein the capability values of each task dimension in the model capability vector of the candidate model are all greater than or equal to the dimension value of the question capability vector in the corresponding task dimension; and scheduling the target model with the lowest calling cost from at least one candidate model based on the calling cost of each candidate model, the target model being used to determine the target response information for the question text.

[0097] Furthermore, the logical instructions in the aforementioned memory 430 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0098] In another aspect, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the model scheduling method based on capability vector matching provided by the above method embodiments. The method includes: converting input question text into a question capability vector; matching at least one candidate model from a model library based on the question capability vector, wherein the model library includes multiple models and model capability vectors of each model, the model capability vectors including capability values of the corresponding model in multiple task dimensions, and the capability values of each task dimension in the model capability vectors of the candidate models are all greater than or equal to the dimension value of the question capability vector in the corresponding task dimension; and scheduling the target model with the lowest calling cost from at least one candidate model based on the calling cost of each candidate model, wherein the target model is used to determine the target response information of the question text.

[0099] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the model scheduling method based on capability vector matching provided by the above method embodiments. The method includes: converting an input question text into a question capability vector; matching at least one candidate model from a model library based on the question capability vector, the model library including multiple models and model capability vectors of each model, the model capability vector including capability values of the corresponding model in multiple task dimensions, wherein the capability values of each task dimension in the model capability vector of the candidate model are all greater than or equal to the dimension value of the question capability vector in the corresponding task dimension; and scheduling a target model with the lowest calling cost from at least one candidate model based on the calling cost of each candidate model, the target model being used to determine the target response information of the question text.

[0100] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0101] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0102] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A model scheduling method based on capability vector matching, characterized in that it includes: converting the input question text into a question capability vector; matching at least one candidate model from a model library based on the question capability vector, wherein the model library includes multiple models and the model capability vector of each model, the model capability vector includes the capability values of the corresponding model in multiple task dimensions, and the capability value of each task dimension in the model capability vector of the candidate model is greater than or equal to the dimension value of the question capability vector in the corresponding task dimension; and scheduling the target model with the lowest calling cost from the at least one candidate model based on the calling cost of each candidate model, the target model being used to determine the target response information for the question text.

2. The model scheduling method based on capability vector matching according to claim 1, characterized in that converting the input question text into a question capability vector includes: segmenting the question text into multiple word segments; inputting the word segments into a large language model to obtain the semantic representation vector output by the large language model; and inputting the semantic representation vector into the vector transformation model to obtain the question capability vector output by the vector transformation model, wherein the vector transformation model is trained based on contrastive learning loss or triplet loss.

3. The model scheduling method based on capability vector matching according to claim 2, characterized in that the vector transformation model is trained in the following manner: obtaining multiple triplet training samples, each triplet training sample including a sample question, a sample response, and an evaluation result of the sample response; for each of the triplet training samples, determining the sample model that generated the sample response in the triplet training sample, and determining the sample model capability vector of the sample model; inputting the sample question in the triplet training sample into the large language model to obtain the sample semantic representation vector output by the large language model; inputting the sample semantic representation vector into the initial vector transformation model to obtain the predicted question capability vector output by the initial vector transformation model; determining loss information, including contrastive learning loss or triplet loss, based on the evaluation result, the sample model capability vector, and the predicted question capability vector corresponding to each triplet training sample; and adjusting the model parameters of the initial vector transformation model based on the loss information to obtain the vector transformation model.

4. The model scheduling method based on capability vector matching according to claim 1, characterized in that the method further comprises: obtaining a plurality of datasets for each of the task dimensions; for each model in the model library and for each task dimension, testing the model on the plurality of datasets for that task dimension to obtain a performance score of the model on each dataset, and determining the capability value of the model in that task dimension based on those performance scores; and determining the model capability vector based on the capability values of the model in each of the task dimensions.
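
Claim 4 does not state how the per-dataset performance scores are combined into a capability value. As a minimal sketch, the code below assumes an unweighted mean per task dimension; the dimension names, dataset names, and scores are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def capability_vector(scores: Dict[str, Dict[str, float]],
                      dimensions: Sequence[str]) -> Tuple[float, ...]:
    """Build a capability vector from per-dataset scores.

    `scores` maps task dimension -> {dataset name -> performance score}.
    Each capability value is the mean score over that dimension's datasets
    (an assumed aggregation; a weighted mean or a minimum would also fit
    the wording of the claim).
    """
    return tuple(mean(scores[dim].values()) for dim in dimensions)


# Hypothetical benchmark results for one model.
DIMENSIONS = ("reasoning", "coding")
SCORES = {
    "reasoning": {"dataset_a": 0.5, "dataset_b": 0.75},
    "coding": {"dataset_c": 0.25, "dataset_d": 0.75},
}

vector = capability_vector(SCORES, DIMENSIONS)
```

The order of `DIMENSIONS` fixes the order of the vector components, so the same tuple must be used when building question capability vectors.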

5. The model scheduling method based on capability vector matching according to claim 1, characterized in that scheduling the target model with the lowest invocation cost from the at least one candidate model based on the invocation cost of each candidate model comprises: obtaining a cost preference input by a user; determining a comprehensive cost of each candidate model based on the cost preference and the invocation cost of each candidate model; and determining the candidate model with the lowest comprehensive cost as the target model.
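
The claim leaves the form of the cost preference and of the comprehensive cost open. One possible reading, sketched below, treats the invocation cost as two normalized components (price and latency) and the cost preference as a weight between them. The two-component split, the linear formula, and all numbers are assumptions made for illustration.

```python
from typing import Dict, Tuple


def comprehensive_cost(price: float, latency: float, price_weight: float) -> float:
    """Weighted sum of two normalized cost components.

    `price_weight` is the assumed user cost preference in [0, 1]:
    1.0 means only price matters, 0.0 means only latency matters.
    """
    if not 0.0 <= price_weight <= 1.0:
        raise ValueError("price_weight must lie in [0, 1]")
    return price_weight * price + (1.0 - price_weight) * latency


def select_target(candidates: Dict[str, Tuple[float, float]],
                  price_weight: float) -> str:
    """Return the candidate name with the lowest comprehensive cost.

    `candidates` maps model name -> (price, latency); it must be non-empty.
    """
    return min(
        candidates,
        key=lambda name: comprehensive_cost(*candidates[name], price_weight),
    )


# Hypothetical candidates that already passed capability matching.
CANDIDATES = {"model_a": (1.0, 8.0), "model_b": (4.0, 2.0)}
```

A price-sensitive user (`price_weight=1.0`) gets `model_a`, while a latency-sensitive user (`price_weight=0.0`) gets `model_b`.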

6. The model scheduling method based on capability vector matching according to claim 1, characterized in that the method further comprises: inputting the question text into the target model to obtain response information; inputting the response information into a text quality assessment model to obtain a quality score output by the text quality assessment model, wherein the text quality assessment model is trained based on sample texts and score labels; and, if the quality score is greater than or equal to a preset score, determining the response information as the target response information.

7. The model scheduling method based on capability vector matching according to claim 6, characterized in that the method further comprises: if the quality score is less than the preset score, obtaining adjustment suggestions, output by the text quality assessment model, for the response information; updating the dimension values of all or some of the task dimensions in the question capability vector based on the adjustment suggestions to obtain an updated question capability vector; re-matching at least one new candidate model from the model library based on the updated question capability vector; and determining a new target model based on the invocation cost of the at least one new candidate model.
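
The feedback loop of claims 6 and 7 can be sketched as follows. The representation of adjustment suggestions as per-dimension increments, the cap at 1.0, the round limit, and the `run_and_score` callback (which stands in for calling the target model and the text quality assessment model) are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# (name, capability vector, invocation cost); a hypothetical record layout.
Model = Tuple[str, Tuple[float, ...], float]


def pick(question_vec: Sequence[float],
         library: Sequence[Model]) -> Optional[Model]:
    """Cheapest model whose capability covers the question in every dimension."""
    candidates = [m for m in library
                  if all(c >= q for c, q in zip(m[1], question_vec))]
    return min(candidates, key=lambda m: m[2]) if candidates else None


def raise_requirements(question_vec: Sequence[float],
                       suggestions: Dict[int, float]) -> Tuple[float, ...]:
    """Apply suggested increments to the named dimensions, capped at 1.0."""
    return tuple(min(1.0, q + suggestions.get(i, 0.0))
                 for i, q in enumerate(question_vec))


def answer_with_retry(question_vec: Sequence[float],
                      library: Sequence[Model],
                      run_and_score: Callable[[str], Tuple[float, Dict[int, float]]],
                      preset_score: float,
                      max_rounds: int = 3) -> Optional[str]:
    """Return the name of the first model whose response meets the preset score."""
    for _ in range(max_rounds):
        model = pick(question_vec, library)
        if model is None:
            return None
        score, suggestions = run_and_score(model[0])
        if score >= preset_score:
            return model[0]
        # Quality too low: raise the question requirements and re-match.
        question_vec = raise_requirements(question_vec, suggestions)
    return None


# Stubbed example: the small model answers poorly and the assessor suggests
# raising dimension 0 by 0.3, which rules the small model out on the retry.
LIB = [("small", (0.5, 0.5), 1.0), ("large", (0.9, 0.9), 10.0)]
STUB = {"small": (0.4, {0: 0.3}), "large": (0.9, {})}
chosen = answer_with_retry((0.3, 0.3), LIB, lambda name: STUB[name], 0.7)
```

The round limit is a practical safeguard added here; the claims themselves do not specify when the loop should stop.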

8. The model scheduling method based on capability vector matching according to any one of claims 1 to 7, characterized in that matching at least one candidate model from the model library based on the question capability vector comprises: using the question capability vector as a query vector to search a pre-built vector index and obtain a target model capability vector, wherein the value of each dimension of the target model capability vector is greater than or equal to the value of the corresponding dimension of the query vector, and the vector index is constructed based on the model capability vectors of all models in the model library; and determining the model corresponding to the target model capability vector as the candidate model.
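
The claim does not specify the structure of the vector index. As one simple possibility, the sketch below sorts the model capability vectors by their first dimension at build time and uses binary search to skip every model that already fails on that dimension; the remaining dimensions are checked directly. The `DominanceIndex` class and the sample data are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


class DominanceIndex:
    """Pre-built index over model capability vectors, sorted by dimension 0."""

    def __init__(self, models: Dict[str, Tuple[float, ...]]) -> None:
        # Sort once at build time so queries can prune with binary search.
        self._entries = sorted(models.items(), key=lambda kv: kv[1][0])
        self._first_dim = [vec[0] for _, vec in self._entries]

    def query(self, question_vec: Sequence[float]) -> List[str]:
        """Names of models whose vector is >= the query in every dimension."""
        # Entries before `start` have a first-dimension value below the query.
        start = bisect_left(self._first_dim, question_vec[0])
        return [
            name
            for name, vec in self._entries[start:]
            if all(m >= q for m, q in zip(vec, question_vec))
        ]


# Hypothetical model library.
index = DominanceIndex({
    "model_a": (0.2, 0.9),
    "model_b": (0.6, 0.4),
    "model_c": (0.8, 0.8),
})
```

For a large library, a multidimensional structure such as a k-d tree or an R-tree would prune on all dimensions rather than one; the single-dimension sort is used here only to keep the example short.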

9. A model scheduling device based on capability vector matching, characterized by comprising: a conversion module, configured to convert an input question text into a question capability vector; a matching module, configured to match at least one candidate model from a model library based on the question capability vector, wherein the model library includes a plurality of models and a model capability vector of each model, the model capability vector includes capability values of the corresponding model in a plurality of task dimensions, and the capability value of each task dimension in the model capability vector of the candidate model is greater than or equal to the dimension value of the question capability vector in the corresponding task dimension; and a scheduling module, configured to schedule, based on the invocation cost of each candidate model, the target model with the lowest invocation cost from the at least one candidate model, the target model being used to determine target response information for the question text.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the model scheduling method based on capability vector matching according to any one of claims 1 to 8.