Method and apparatus for synergistically adjusting artificial intelligence dataset quality and model performance
By iterative operations and correlation analysis between dataset quality and model performance, the dataset quality is dynamically optimized, solving the problem that dataset evaluation is independent of model optimization. This achieves collaborative optimization of the dataset and the model, improving model performance and stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA ACADEMY OF INFORMATION & COMM
- Filing Date
- 2025-11-21
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, dataset quality assessment is independent of the model training and optimization process, resulting in dataset optimization lagging behind model requirements and making it difficult to achieve collaborative optimization of datasets and models.
Through iterative operations, combined with model building requirements and dataset quality target parameters, a preprocessed dataset is constructed. By analyzing the correlation between dataset quality and model performance, the dataset quality is dynamically optimized to improve model performance. Reinforcement learning, federated learning, and edge computing technologies are used for data collection and processing. A dataset quality feedback matrix and a model performance quantification matrix are designed for evaluation.
It achieves synergistic optimization of dataset quality and model performance, improves model performance and stability, reduces performance fluctuations caused by data quality issues, and ensures high dataset quality and security.
Figure CN121256508B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, for example to a method and apparatus for collaboratively adjusting the quality of artificial intelligence datasets and model performance.
Background Technology
[0002] In the era of large models, high-quality datasets are crucial for building accurate, efficient, and reliable models in artificial intelligence. High-quality datasets can not only improve the generalization ability of models, but also reduce the risk of overfitting, making artificial intelligence systems more reliable and robust.
[0003] In related technologies, the quality of a dataset is assessed based on preset static indicators, such as data integrity, accuracy, and consistency.
[0004] In the process of implementing the embodiments of this disclosure, at least the following problems were found in the related art:
[0005] In related technologies, dataset quality assessment is independent of the model training and optimization process. Dataset optimization often lags behind the needs of the model, making it difficult to achieve collaborative optimization between the dataset and the model.
[0006] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this application, and therefore may include information that does not constitute prior art known to those skilled in the art.
Summary of the Invention
[0007] To provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not intended as a general commentary, nor is it intended to identify key / important components or describe the scope of protection of these embodiments, but rather as a prelude to the detailed description that follows.
[0008] This disclosure provides a method and apparatus for synergistically adjusting the quality of artificial intelligence datasets and model performance to achieve synergistic optimization of dataset quality and model performance.
[0009] In some embodiments, the method for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model includes: one or more iterative operations; the iterative operations include: constructing a preprocessed dataset according to model construction requirements and target parameters of dataset quality indicators; performing quality assessment on the preprocessed dataset to obtain dataset quality indicator assessment parameters, and determining a target dataset that passes the quality assessment based on the dataset quality indicator assessment parameters; training a model using the target dataset to obtain a pre-trained model, and performing performance assessment on the pre-trained model to obtain model performance indicator assessment parameters; if the model performance indicator assessment parameters do not meet a preset standard, performing a correlation analysis between dataset quality and model performance based on the model performance indicator assessment parameters, and generating updated target parameters of dataset quality indicators based on the analysis results; wherein, the convergence condition of the iterative operations is that the model performance indicator assessment parameters reach a preset standard, and the target artificial intelligence model and target artificial intelligence dataset corresponding to the model performance indicator assessment parameters that meet the preset standard are obtained.
[0010] Optionally, based on the model construction requirements and the target parameters of the dataset quality indicators, a preprocessed dataset is constructed, including: conducting a requirements analysis based on the model construction requirements and the target parameters of the dataset quality indicators, extracting requirements information, and constructing a knowledge graph based on the requirements information; determining the target data structure based on the structured requirements indicators in the knowledge graph and the constraints of the knowledge graph, using reinforcement learning; collecting multi-source heterogeneous data that conforms to the target data structure based on a distributed privacy-preserving data collection technology that integrates federated learning and edge computing; and performing data processing, data annotation, and data augmentation on the multi-source heterogeneous data to obtain the preprocessed dataset.
[0011] Optionally, the preprocessed dataset is quality-assessed to obtain dataset quality metric evaluation parameters, including: designing a dataset quality feedback matrix based on the preprocessed dataset; evaluating the quality of the preprocessed dataset based on the dataset quality feedback matrix to determine the dataset quality feedback matrix parameters; and determining the dataset quality metric evaluation parameters based on the dataset quality feedback matrix parameters; wherein the dataset quality feedback matrix is represented by the following formula:
[0012] $X \in \mathbb{R}^{A \times B \times C \times D}$
[0013] where $X = [M_i S_j I_k Q_l]$ is the four-dimensional tensor for dataset quality quantization; $M$ is the set of dataset modalities, $A$ is the number of dataset modality types, and $i$ is the index of the dataset modality, $i = 1, 2, \ldots, A$; $S$ is the set of model usage stages, $B$ is the number of model usage stage types, and $j$ is the index of the model usage stage, $j = 1, 2, \ldots, B$; $I$ is the set of dataset application scenarios, $C$ is the number of dataset application scenario types, and $k$ is the index of the dataset application scenario, $k = 1, 2, \ldots, C$; $Q$ is the set of dataset quality quantization metrics, $D$ is the number of dataset quality quantization metric types, and $l$ is the index of the dataset quality quantization metric, $l = 1, 2, \ldots, D$.
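As an illustration only, the four-axis layout of the dataset quality feedback matrix can be sketched with nested Python lists. The axis labels below (modalities, stages, scenarios, metrics) and the helper names are hypothetical examples; the source fixes only the four axes M, S, I, Q and their sizes A, B, C, D.

```python
# Hypothetical axis labels; the source only defines the four axes themselves.
MODALITIES = ["image", "text"]                # M, A = 2
STAGES = ["training", "evaluation"]           # S, B = 2
SCENARIOS = ["medical", "finance", "retail"]  # I, C = 3
METRICS = ["completeness", "accuracy"]        # Q, D = 2


def empty_quality_tensor():
    """Return X as nested lists of shape A x B x C x D, filled with None."""
    return [
        [[[None for _ in METRICS] for _ in SCENARIOS] for _ in STAGES]
        for _ in MODALITIES
    ]


def set_score(x, modality, stage, scenario, metric, value):
    """Write one quality score into the entry addressed by (i, j, k, l)."""
    i, j = MODALITIES.index(modality), STAGES.index(stage)
    k, l = SCENARIOS.index(scenario), METRICS.index(metric)
    x[i][j][k][l] = value


def shape(x):
    """Return (A, B, C, D) for a nested-list tensor."""
    return (len(x), len(x[0]), len(x[0][0]), len(x[0][0][0]))


X = empty_quality_tensor()
set_score(X, "image", "training", "medical", "completeness", 0.97)
```

Entries left as `None` are parameters that have not been evaluated, which matches the idea that only the target matrix parameters relevant to a given evaluation need to be filled in.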
[0014] Optionally, the preprocessed dataset is evaluated based on the dataset quality feedback matrix to determine the parameters of the dataset quality feedback matrix. This includes: determining the target matrix parameters to be evaluated in the dataset quality feedback matrix according to the evaluation target and the preprocessed dataset; dividing the evaluation task into single-point tasks and distributed tasks according to the size and complexity of the preprocessed dataset; evaluating the preprocessed data according to each of the divided tasks, generating parameter values corresponding to each target matrix parameter, and obtaining the dataset quality feedback matrix parameters.
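The division into single-point and distributed tasks might look like the following minimal sketch. The `EvalTask` fields, the complexity score, and both thresholds are assumptions made for illustration; the source states only that the split depends on the size and complexity of the preprocessed dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EvalTask:
    target_parameter: str  # which entry of the feedback matrix this task computes
    record_count: int      # size of the data slice to evaluate
    complexity: float      # hypothetical complexity score in [0, 1]


def split_tasks(
    tasks: List[EvalTask],
    max_single_records: int = 100_000,   # assumed threshold
    max_single_complexity: float = 0.5,  # assumed threshold
) -> Tuple[List[EvalTask], List[EvalTask]]:
    """Route small, simple tasks to a single node and the rest to a cluster."""
    single, distributed = [], []
    for task in tasks:
        is_small = (
            task.record_count <= max_single_records
            and task.complexity <= max_single_complexity
        )
        (single if is_small else distributed).append(task)
    return single, distributed


tasks = [
    EvalTask("completeness", 5_000, 0.2),
    EvalTask("label_accuracy", 2_000_000, 0.3),
    EvalTask("cross_modal_consistency", 10_000, 0.9),
]
single, distributed = split_tasks(tasks)
```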
[0015] Optionally, methods for collaboratively adjusting the quality of AI datasets and model performance also include: when the dataset quality indicator evaluation parameters fail the quality assessment, performing anomaly analysis on the dataset quality feedback matrix parameters based on the assessment results; determining the target dataset quality feedback matrix parameters based on the anomaly analysis results; and reconstructing the preprocessed dataset based on the target dataset quality feedback matrix parameters.
[0016] Optionally, the pre-trained model is evaluated to obtain model performance evaluation parameters, including: designing a model performance quantization matrix based on the pre-trained model; evaluating the performance of the pre-trained model based on the model performance quantization matrix to determine the model performance quantization matrix parameters; and determining the model performance evaluation parameters based on the model performance quantization matrix parameters; wherein the model performance quantization matrix is represented by the following formula:
[0017] $Y \in \mathbb{R}^{E \times F}$
[0018] where $Y = [y_{mn}]$, and $y_{mn}$ represents the magnitude of the $n$-th objective quantitative indicator of the $m$-th capability sub-item; $E$ represents the number of capability sub-item types, $m = 1, 2, \ldots, E$; $F$ represents the number of objective quantitative indicator types, $n = 1, 2, \ldots, F$.
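A small sketch of how an E x F model performance quantization matrix could be checked against a preset standard follows. The capability names, indicator names, threshold values, and the "higher is better" convention are all assumptions for illustration and are not specified in the source.

```python
CAPABILITIES = ["classification", "robustness"]  # hypothetical sub-items, E = 2
INDICATORS = ["accuracy", "f1"]                  # hypothetical indicators, F = 2

# Y[m][n]: value of indicator n for capability sub-item m (made-up numbers).
Y = [
    [0.93, 0.91],
    [0.88, 0.84],
]

# Assumed preset standard, entry by entry; higher values are treated as better.
THRESHOLDS = [
    [0.90, 0.90],
    [0.85, 0.85],
]


def failing_entries(y, thresholds):
    """List (capability, indicator, value) for entries below their threshold."""
    failures = []
    for m, row in enumerate(y):
        for n, value in enumerate(row):
            if value < thresholds[m][n]:
                failures.append((CAPABILITIES[m], INDICATORS[n], value))
    return failures


failures = failing_entries(Y, THRESHOLDS)
meets_standard = not failures
```

In this toy case only the robustness F1 entry falls short, so the preset standard is not met and the correlation analysis step would be triggered.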
[0019] Optionally, a correlation analysis between dataset quality and model performance is performed based on the model performance index evaluation parameters, and updated dataset quality index target parameters are generated based on the analysis results. This includes: performing a correlation analysis on the model performance index evaluation parameters based on the correlation analysis model to determine the dataset quality index that affects the model performance index; updating the determined dataset quality index based on the model performance index evaluation parameters to obtain the updated dataset quality index target parameters; wherein, the correlation analysis model can determine the mapping relationship between dataset quality index and model performance index.
[0020] Optionally, the association analysis model is trained as follows: collect model performance index evaluation parameters and corresponding dataset quality index evaluation parameters that meet the preset standards from historical data, and perform feature alignment and normalization on the collected data; construct an initial state based on the feature-aligned and normalized model performance index data and dataset quality index data; adjust the dataset quality index data in the initial state, and use the association analysis model to predict new model performance index data based on the adjusted dataset quality index data; adjust the parameters of the association analysis model according to the changes in the model performance index data.
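The source does not specify the form of the association analysis model. The sketch below therefore substitutes a plain linear surrogate trained by stochastic gradient descent, purely to illustrate the described sequence: normalize historical data, predict performance from quality indicators, and adjust the model parameters according to the prediction error. The function names, learning rate, epoch count, and toy data are assumptions.

```python
from typing import List, Tuple


def normalize_columns(rows: List[List[float]]) -> List[List[float]]:
    """Min-max normalize each quality indicator (column) to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Predicted performance for one vector of quality indicators."""
    return b + sum(wi * xi for wi, xi in zip(w, x))


def train_linear_association(
    quality: List[List[float]],
    performance: List[float],
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights so that predicted performance tracks observed performance."""
    w = [0.0] * len(quality[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(quality, performance):
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Toy history: performance depends on the first indicator only.
raw_quality = [[0.6, 10.0], [0.8, 30.0], [1.0, 20.0], [0.7, 40.0]]
performance = [0.5, 0.7, 0.9, 0.6]
quality = normalize_columns(raw_quality)
w, b = train_linear_association(quality, performance)
```

Under this assumed linear form, a large weight marks a quality indicator whose target parameter is worth raising in the next iteration, while a near-zero weight marks one with little influence on the performance indicator.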
[0021] In some embodiments, the apparatus for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model includes: a data engineering module configured to construct a preprocessed dataset according to model construction requirements and target parameters of dataset quality indicators; a dataset quality assessment module configured to assess the quality of the preprocessed dataset, obtain dataset quality indicator assessment parameters, and determine a target dataset that passes the quality assessment based on the dataset quality indicator assessment parameters; a model performance assessment module configured to train a model using the target dataset, obtain a pre-trained model, and assess the performance of the pre-trained model to obtain model performance indicator assessment parameters; further configured to output the target artificial intelligence model and the target artificial intelligence dataset corresponding to the model performance indicator assessment parameters that meet the preset standard when the model performance indicator assessment parameters meet the preset standard; and a correlation analysis module configured to perform correlation analysis between dataset quality and model performance based on the model performance indicator assessment parameters when the model performance indicator assessment parameters do not meet the preset standard, and generate updated target parameters of dataset quality indicators based on the analysis results.
[0022] In some embodiments, the apparatus for coordinating the quality of an artificial intelligence dataset and the performance of a model includes a processor and a memory storing program instructions, the processor being configured to, when running the program instructions, perform the method described above for coordinating the quality of an artificial intelligence dataset and the performance of a model.
[0023] The method and apparatus for collaboratively adjusting the quality of artificial intelligence datasets and model performance provided in this disclosure can achieve the following technical effects:
[0024] In this embodiment, iterative operations tightly integrate dataset quality assessment with model performance evaluation, forming a dynamic closed-loop feedback system. In each iteration, correlation analysis between dataset quality and model performance accurately identifies the root causes of data quality indicators affecting model performance, generating updated target parameters for dataset quality indicators. This allows for targeted optimization of dataset quality in the next iteration, thereby improving model performance. This ensures that dataset quality optimization and model performance improvement occur simultaneously, achieving synergistic optimization of both. Furthermore, by continuously iterating and optimizing dataset quality, the model performance indicator evaluation parameters ultimately reach preset standards, resulting in a high-quality target AI model and target AI dataset. This significantly improves the performance and stability of the target AI model, reducing performance fluctuations caused by data quality issues.
[0025] The above general description and the description below are exemplary and illustrative only and are not intended to limit this application.
Attached Figure Description
[0026] One or more embodiments are illustrated by way of example with reference to the accompanying drawings. These illustrations and drawings do not constitute a limitation on the embodiments. Elements having the same reference numerals in the drawings are shown as similar elements. The drawings are not to be scaled. And wherein:
[0027] Figure 1 is a schematic diagram of a method for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model, provided in an embodiment of this disclosure;
[0028] Figure 2 is a flowchart of a framework for constructing a preprocessed dataset, provided in an embodiment of this disclosure;
[0029] Figure 3 is a flowchart of a framework for quality assessment of a preprocessed dataset, provided in an embodiment of this disclosure;
[0030] Figure 4 is a flowchart of a framework for performance evaluation of a pre-trained model, provided in an embodiment of this disclosure;
[0031] Figure 5 is a flowchart of a framework for training an association analysis model, provided in an embodiment of this disclosure;
[0032] Figure 6 is a schematic diagram of an apparatus for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model, provided in an embodiment of this disclosure;
[0033] Figure 7 is a schematic diagram of another apparatus for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model, provided in an embodiment of this disclosure.
Detailed Implementation
[0034] To provide a more detailed understanding of the features and technical content of the embodiments of this disclosure, the implementation of the embodiments of this disclosure will be described in detail below with reference to the accompanying drawings. The accompanying drawings are for illustrative purposes only and are not intended to limit the embodiments of this disclosure. In the following technical description, for ease of explanation, several details are used to provide a full understanding of the disclosed embodiments. However, one or more embodiments may still be implemented without these details. In other cases, well-known structures and devices may be simplified in their depiction to simplify the drawings.
[0035] The term "correspondence" can refer to an association or binding relationship. The correspondence between A and B means that there is an association or binding relationship between A and B.
[0036] As shown in Figure 1, this disclosure provides a method for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model. The execution subject of this method can be a processor, and the method includes:
[0037] S101, the processor constructs a preprocessed dataset based on the model construction requirements and the target parameters of the dataset quality indicators.
[0038] S102, the processor performs a quality assessment on the preprocessed dataset, obtains dataset quality index assessment parameters, and determines the target dataset that passes the quality assessment based on the dataset quality index assessment parameters.
[0039] S103, the processor uses the target dataset to train the model, obtains a pre-trained model, and evaluates the performance of the pre-trained model to obtain model performance evaluation parameters.
[0040] S104, the processor determines whether the model performance evaluation parameters meet the preset standards. If not, it executes S105 and then returns to S101 for the next iteration; if so, it executes S106.
[0041] S105, the processor performs a correlation analysis between dataset quality and model performance based on the model performance index evaluation parameters, and generates updated dataset quality index target parameters based on the analysis results.
[0042] S106, the processor obtains the target artificial intelligence model and the target artificial intelligence dataset corresponding to the model performance index evaluation parameters that meet the preset standards.
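Steps S101 to S106 can be sketched as a control loop. This is a minimal illustration, not the disclosed implementation: every callable is a hypothetical stand-in for the corresponding step, the `max_iterations` budget is an added safeguard, and a failed quality assessment simply triggers a rebuild rather than the anomaly analysis described elsewhere in the disclosure.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Params = Dict[str, float]


def co_adjust(
    build_dataset: Callable[[Params], Any],                   # S101
    assess_quality: Callable[[Any], Tuple[Params, bool]],     # S102
    train_and_evaluate: Callable[[Any], Tuple[Any, Params]],  # S103
    meets_standard: Callable[[Params], bool],                 # S104
    update_targets: Callable[[Params, Params], Params],       # S105
    quality_targets: Params,
    max_iterations: int = 10,  # assumed budget, not part of the source
) -> Optional[Tuple[Any, Any]]:
    """Iterate S101-S105 until S104 passes, then return (model, dataset)."""
    for _ in range(max_iterations):
        dataset = build_dataset(quality_targets)
        _quality_params, passed = assess_quality(dataset)
        if not passed:
            continue  # simplification: rebuild with unchanged targets
        model, perf_params = train_and_evaluate(dataset)
        if meets_standard(perf_params):
            return model, dataset  # S106
        quality_targets = update_targets(perf_params, quality_targets)
    return None  # budget exhausted without convergence


# Toy stand-ins: accuracy simply tracks the "completeness" target.
result = co_adjust(
    build_dataset=lambda t: {"completeness": t["completeness"]},
    assess_quality=lambda d: (d, d["completeness"] >= 0.5),
    train_and_evaluate=lambda d: ("model", {"accuracy": d["completeness"]}),
    meets_standard=lambda p: p["accuracy"] >= 0.9,
    update_targets=lambda p, t: {"completeness": min(1.0, t["completeness"] + 0.1)},
    quality_targets={"completeness": 0.7},
)
```

Passing the steps in as callables keeps the loop independent of any particular data engineering, assessment, or training backend.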
[0043] In this embodiment, iterative operations tightly integrate dataset quality assessment with model performance evaluation, forming a dynamic closed-loop feedback system. In each iteration, correlation analysis between dataset quality and model performance accurately identifies the root causes of data quality indicators affecting model performance, generating updated target parameters for dataset quality indicators. This allows for targeted optimization of dataset quality in the next iteration, thereby improving model performance. This ensures that dataset quality optimization and model performance improvement occur simultaneously, achieving synergistic optimization of both. Furthermore, by continuously iterating and optimizing dataset quality, the model performance indicator evaluation parameters ultimately reach preset standards, resulting in a high-quality target AI model and target AI dataset. This significantly improves the performance and stability of the target AI model, reducing performance fluctuations caused by data quality issues.
[0044] Optionally, based on the model construction requirements and the target parameters of the dataset quality indicators, a preprocessed dataset is constructed, including: conducting a requirements analysis based on the model construction requirements and the target parameters of the dataset quality indicators, extracting requirements information, and constructing a knowledge graph based on the requirements information; determining the target data structure based on the structured requirements indicators in the knowledge graph and the constraints of the knowledge graph, using reinforcement learning; collecting multi-source heterogeneous data that conforms to the target data structure based on a distributed privacy-preserving data collection technology that integrates federated learning and edge computing; and performing data processing, data annotation, and data augmentation on the multi-source heterogeneous data to obtain the preprocessed dataset.
[0045] In this embodiment, standardized processes and toolchains are used to achieve engineered management of artificial intelligence data. This standardizes the entire lifecycle of data collection, cleaning, labeling, storage, management, and sharing, ensuring high quality, security, and traceability of data. It also constructs high-quality, reusable preprocessed datasets to support efficient training and evaluation testing of artificial intelligence models. As shown in Figure 2, building a preprocessed dataset mainly includes six steps: artificial intelligence data requirements analysis, data design, data acquisition, data processing, data annotation, and data augmentation.
[0046] Optionally, a requirements analysis is performed based on the model construction requirements and the target parameters of the dataset quality indicators to extract requirement information, and a knowledge graph is constructed based on the requirement information. This includes: modeling the knowledge and concepts of relevant general basic domains and industry-specific domains based on full-domain knowledge graph construction technology to form a domain ontology knowledge graph; using a multimodal model to perform natural language processing on the model construction requirements and the target parameters of the dataset quality indicators to extract requirement information; and mapping the requirement information to the domain ontology knowledge graph.
[0047] In this embodiment, by integrating multimodal semantic understanding and knowledge graph reasoning, the intelligent demand parsing technology automatically extracts the implicit demands and multi-dimensional indicators of high-quality artificial intelligence datasets, realizing semantic understanding and structured representation of demands, thereby grasping demands more accurately.
[0048] Optionally, the multimodal model may employ a model based on a multimodal Transformer.
[0049] Optionally, model construction requirements and dataset quality metric target parameters include: natural language requirements documents, images, audio, video, time series, point clouds, structured data, multimodal files, and other modal inputs. The multimodal model can vectorize these multimodal inputs and capture semantic relationships between different modalities through a cross-modal attention mechanism.
[0050] Optionally, the full domain knowledge graph will associate and store the entities, relationships and constraints in the requirements, and use graph neural networks for knowledge reasoning to uncover implicit requirements that are not explicitly stated in the requirements document but can be deduced.
[0051] Optionally, the knowledge graph has 1000 or more nodes and 50 or more relationship types; the multimodal model has 10 billion or more parameters and can integrate 8 or more understanding modalities. In a specific embodiment, during requirements analysis, the requirements parsing accuracy is greater than or equal to 93%, the implicit requirements mining coverage is greater than or equal to 85%, and the requirements parsing response time is less than or equal to 60 seconds per document.
[0052] Optionally, based on the structured demand indicators in the knowledge graph and the constraints of the knowledge graph, the target data structure is determined, including: constructing a "demand-data structure" mapping model based on the structured demand indicators in the knowledge graph using a reinforcement learning agent; converting entity relationships in the knowledge graph into hard constraints for data structure generation; and determining the target data structure based on the "demand-data structure" mapping model and the hard constraints.
[0053] In this embodiment, a dynamic data structure adaptive generation technique based on reinforcement learning and knowledge graph constraints is designed to achieve a precise match between the required data structure and the original requirements.
[0054] Optionally, the core of the "demand-data structure" mapping model is a reinforcement learning agent, whose state space is a demand feature vector (including multi-dimensional parameters such as data type, precision, and correlation), action space is a set of data structure operations (such as adding fields, modifying data types, setting primary key constraints, defining association rules, and other operations), and reward function is the fit between the data structure and the data acquisition / processing stage (calculated by simulating data flow efficiency, storage usage, and query speed).
[0055] Optionally, entity relationships in the knowledge graph (such as "image data" needing to be associated with "label box coordinates") are transformed into hard constraints for data structure generation. The constraint satisfaction problem solver ensures that the generated data structure conforms to domain specifications (such as medical data needing to include required fields such as "patient ID" and "examination time").
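A hard-constraint check of this kind can be illustrated without a full constraint satisfaction solver. The sketch below only verifies required fields; the constraint table, tag names, and field names are hypothetical encodings of the examples given in the text.

```python
from typing import Dict, List, Set

# Hypothetical hard constraints derived from knowledge graph relationships.
HARD_CONSTRAINTS: Dict[str, Set[str]] = {
    "medical": {"patient_id", "examination_time"},
    "image": {"label_box_coordinates"},
}


def violated_constraints(fields: Set[str], tags: List[str]) -> Dict[str, Set[str]]:
    """Return, per tag, the required fields missing from a candidate structure."""
    violations: Dict[str, Set[str]] = {}
    for tag in tags:
        missing = HARD_CONSTRAINTS.get(tag, set()) - fields
        if missing:
            violations[tag] = missing
    return violations


candidate = {"patient_id", "image_path", "label_box_coordinates"}
violations = violated_constraints(candidate, ["medical", "image"])
```

A candidate structure proposed by the reinforcement learning agent would be rejected or repaired whenever the returned dictionary is non-empty.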
[0056] Optionally, the PPO algorithm is used to train the reinforcement learning agent, and dynamic programming is used to iteratively adjust the data structure. Optimization is triggered after each round of new data samples, and automatic adaptation to mainstream data storage formats (Parquet, ORC, JSON) is supported.
[0057] Optionally, the target data structure supports at least eight data types, including complex types such as images, text, time series, point clouds, and multimodal data. The data structure operation set can define at least 30 association rules, including one-to-one, one-to-many, and many-to-many relationships. The reinforcement learning agent undergoes at least 1500 training iterations, and after convergence the degree of match between the data structure and the requirements is at least 96%. The dynamic adjustment of the data structure in a single round of optimization has a response time of at least 20 seconds, supports at least 10 concurrent data structure design paths, and has a storage format compatibility of at least 98%.
[0058] Optionally, based on a distributed privacy-preserving data acquisition technology that integrates federated learning and edge computing, multi-source heterogeneous data conforming to the target data structure is collected, including: deploying a lightweight federated client on edge devices, integrating a feature extraction sub-model, performing local feature encoding on the raw data, uploading feature vectors or model update parameters, and storing the raw data locally; using an improved FedAvg algorithm on the cloud server to aggregate model updates from edge nodes and generate a global data acquisition strategy model; generating a sampling strategy based on the global data acquisition strategy model, and collecting multi-source heterogeneous data conforming to the target data structure according to the sampling strategy.
[0059] In this embodiment, a distributed privacy-preserving data acquisition technology that integrates federated learning and edge computing is adopted to achieve secure aggregation of multi-source heterogeneous data and dynamic acquisition strategy optimization.
[0060] Optionally, during data acquisition, a dynamic sampling mechanism is designed, and fully homomorphic encryption based on the Microsoft SEAL library is used to protect the transmission of model parameters. The feature extraction sub-models include EfficientNet-Lite (suitable for image data) and TinyLSTM (suitable for time-series data). When the improved FedAvg algorithm is employed on the cloud server, adaptive weight adjustment can be introduced to dynamically allocate aggregation weights based on node data quality. The sampling strategy generated by the global data acquisition strategy model includes key acquisition scenarios, sample selection rules, and acquisition frequency.
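The source does not give the formula for the improved FedAvg aggregation. One plausible reading, shown here strictly as an assumption, scales the standard sample-count weight of each edge node by a data quality score before averaging the parameter updates. Encryption is omitted from the sketch.

```python
from typing import List


def quality_weighted_fedavg(
    updates: List[List[float]],   # one parameter vector per edge node
    sample_counts: List[int],     # standard FedAvg weighting term
    quality_scores: List[float],  # assumed per-node data quality in [0, 1]
) -> List[float]:
    """Average node updates with weights proportional to count * quality."""
    raw = [n * q for n, q in zip(sample_counts, quality_scores)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one node needs a positive weight")
    weights = [r / total for r in raw]
    dim = len(updates[0])
    return [sum(w * u[d] for w, u in zip(weights, updates)) for d in range(dim)]


# Two nodes with equal data volume; the second has lower data quality.
global_update = quality_weighted_fedavg(
    updates=[[1.0, 2.0], [3.0, 4.0]],
    sample_counts=[100, 100],
    quality_scores=[1.0, 0.5],
)
```

With these inputs the first node receives a weight of 2/3 and the second 1/3, so the aggregate leans toward the higher-quality node.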
[0061] Optionally, the edge device node types are greater than or equal to eight, including cameras, sensors, medical devices, etc.; the single-node data acquisition latency is less than or equal to 300ms. In a specific embodiment, the federated aggregation accuracy loss can be less than or equal to 2% compared to centralized training, and the encrypted transmission throughput can be greater than or equal to 60MB / s, meeting GDPR privacy compliance requirements. The dynamic sampling strategy can improve the collection coverage of scarce samples by greater than or equal to 40%, and the heterogeneity adaptation of multi-source data can be greater than or equal to 95%.
[0062] Optionally, data processing is performed on the multi-source heterogeneous data, including: data cleaning, feature engineering, and multimodal data alignment.
[0063] In this embodiment, a data cleaning system based on deep learning and a rule engine is designed, integrating various pre-built data cleaning operators (such as deduplication, missing value imputation, and outlier detection), and supporting the dynamic design and expansion of custom operators. Then, based on an AutoML framework (such as AutoKeras), a genetic algorithm is used to dynamically generate optimal feature combinations, supporting feature extraction from images, text, time series, speech, structured data, point clouds, and multimodal data, and automatically adjusting feature engineering strategies based on model performance feedback. Finally, a cross-modal attention mechanism is employed to align data from different modalities, and combined with hard constraints on entity relationships in the knowledge graph, a graph neural network is used to enhance the cross-modal alignment effect. Through adaptive streaming processing and deep learning-driven real-time data processing technology, efficient cleaning and preliminary quality closed-loop control of dynamic data are achieved.
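The idea of pre-built cleaning operators that can be extended with custom ones can be sketched as a list of composable functions. This toy version is rule-based only and omits the deep learning, AutoML, and cross-modal alignment parts; the operator names, the fixed value range used for outlier removal, and the sample records are illustrative assumptions.

```python
from statistics import median
from typing import Callable, Dict, List

Record = Dict[str, object]
Operator = Callable[[List[Record]], List[Record]]


def deduplicate(records: List[Record]) -> List[Record]:
    """Drop exact duplicate records, keeping the first occurrence."""
    seen, out = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def drop_outliers(field: str, low: float, high: float) -> Operator:
    """Build an operator that removes records whose field is outside [low, high]."""
    def op(records: List[Record]) -> List[Record]:
        return [
            r for r in records
            if r.get(field) is None or low <= r[field] <= high
        ]
    return op


def impute_median(field: str) -> Operator:
    """Build an operator that fills missing values with the field median."""
    def op(records: List[Record]) -> List[Record]:
        known = [r[field] for r in records if r.get(field) is not None]
        fill = median(known) if known else None
        return [
            {**r, field: fill if r.get(field) is None else r[field]}
            for r in records
        ]
    return op


def run_pipeline(records: List[Record], operators: List[Operator]) -> List[Record]:
    """Apply each cleaning operator in order."""
    for op in operators:
        records = op(records)
    return records


raw = [
    {"id": 1, "temp": 36.6},
    {"id": 1, "temp": 36.6},   # duplicate
    {"id": 2, "temp": None},   # missing value
    {"id": 3, "temp": 37.0},
    {"id": 4, "temp": 410.0},  # outlier
]
cleaned = run_pipeline(
    raw, [deduplicate, drop_outliers("temp", 30.0, 45.0), impute_median("temp")]
)
```

Outliers are removed before imputation so that the extreme value does not distort the median used to fill the gap.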
[0064] Optionally, the data stream processing throughput in the cleaning operator is greater than or equal to 150,000 records / second, and supports a parallel processing node count greater than or equal to 20. In a specific embodiment, the feature selection accuracy of feature engineering is greater than or equal to 90%, and the feature dimension compression rate is greater than or equal to 50%. It can achieve a cross-modal data alignment accuracy greater than or equal to 95%, supporting alignment rules for greater than or equal to 10 modalities. It can achieve a dynamic strategy adjustment response time less than or equal to 3 seconds, post-processed data integrity greater than or equal to 99%, and an outlier detection rate greater than or equal to 97%.
[0065] Optionally, data annotation is performed on multi-source heterogeneous data, including: screening active learning samples based on an uncertainty sampling strategy; after manually annotating the active learning samples, performing transfer learning on the active learning samples through a pre-annotation model, and pre-annotating unannotated samples based on the learning results; scoring the pre-annotation results, and optimizing the pre-annotation model based on the scoring results.
[0066] In this embodiment, the efficient annotation technique, which combines human-machine collaborative active learning with small-sample semi-supervised training, significantly reduces manual costs and improves annotation quality. Based on an uncertainty sampling strategy (combining prediction entropy and model confidence), active learning samples (such as classification boundary samples and fuzzy samples) that most significantly improve model performance can be selected from massive amounts of unlabeled data and assigned to manual annotation. A CLIP (Contrastive Language-Image Pre-training) multimodal pre-annotation model is used for transfer learning. Category feature prototypes are generated from a small number of labeled active learning samples (greater than or equal to 50 per class), thus pre-annotating unlabeled samples; manual verification and correction are all that is required. Finally, an annotation quality evaluation model is designed based on BERT to score data annotation consistency. The FixMatch algorithm is used to feed the manual correction results back to semi-supervised training, iteratively optimizing the pre-annotation model and forming a closed-loop technology of "screening-pre-annotation-verification-optimization".
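The uncertainty sampling step, which combines prediction entropy with model confidence, can be sketched as follows. The equal weighting `alpha = 0.5`, the function names, and the sample predictions are assumptions; the source does not state how the two signals are combined.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty(probs: List[float], alpha: float = 0.5) -> float:
    """Blend normalized entropy with (1 - top-class confidence)."""
    normalized_entropy = entropy(probs) / math.log(len(probs))
    low_confidence = 1.0 - max(probs)
    return alpha * normalized_entropy + (1.0 - alpha) * low_confidence


def select_for_annotation(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the sample IDs with the highest uncertainty, up to the budget."""
    ranked = sorted(predictions, key=lambda s: uncertainty(predictions[s]), reverse=True)
    return ranked[:budget]


predictions = {
    "a": [0.98, 0.01, 0.01],  # confident prediction
    "b": [0.34, 0.33, 0.33],  # near the classification boundary
    "c": [0.60, 0.30, 0.10],  # moderately uncertain
}
to_annotate = select_for_annotation(predictions, budget=2)
```

The near-uniform sample "b" ranks first, followed by "c", while the confident sample "a" is left for automatic pre-annotation.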
[0067] In one specific embodiment, the intelligent annotation sample size can be greater than or equal to 70% compared to the full annotation, and the annotation accuracy can be greater than or equal to 98.5%. It can improve the efficiency of small sample pre-annotation by greater than or equal to 60%, and the annotation time per sample is less than or equal to 2 seconds. It can reduce the iteration cycle of the semi-supervised model to less than or equal to 30 minutes per round, and achieve an annotation consistency score greater than or equal to 0.95 (Kappa coefficient).
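The text only names a "Kappa coefficient" for annotation consistency. As a hedged sketch, the following assumes Cohen's kappa between two label sequences (for example, pre-annotation output versus manual verification); the variable names and sample labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement from the two annotators' marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels: the pre-annotation model vs. the human reviewer.
pre_annotated = ["cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog", "cat", "dog"]
human_checked = ["dog", "cat", "dog", "dog", "cat", "dog", "cat", "dog", "cat", "dog"]

kappa = cohen_kappa(pre_annotated, human_checked)
meets_target = kappa >= 0.95  # 0.95 is the target stated in the text
print(round(kappa, 3), meets_target)
```

A single disagreement in ten samples already pulls kappa well below the 0.95 target, which illustrates how strict that threshold is.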
[0068] Optionally, data augmentation is performed on multi-source heterogeneous data, including: generating augmented samples consistent with the original multi-source heterogeneous data distribution using a conditional generative adversarial network (GAN); wherein the GAN is trained using MAML (Model-Agnostic Meta-Learning). In this embodiment, an adaptive data augmentation technique that integrates multimodal generative AI and meta-learning achieves a balance between data diversity and distribution consistency. The GAN can generate augmented samples consistent with the original data distribution based on data category labels, modality types, and augmentation parameters. Training the GAN with MAML allows it to quickly adjust its generation strategy on new data types with only 5-8 example samples, avoiding overfitting. Furthermore, a joint evaluation mechanism of Wasserstein distance and entropy can be introduced to dynamically optimize the augmentation parameters, ensuring that the augmented data covers more edge scenarios.
[0069] Optionally, it supports augmentation of 6 or more modal types, including images, text, audio, time series, point clouds, and tables, with a single-sample augmentation time of 50ms or less. The Wasserstein distance between the augmented data and the original data is less than or equal to 0.07. This can improve the class distribution balance by 40% or more; the number of example samples for meta-learning to adapt to new data types is 8 or less. This can improve the training accuracy of the augmented model by 5% or more compared to the un-augmented model.
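The distribution-consistency check can be illustrated with a small sketch. It assumes a single scalar feature and equally sized samples, in which case the empirical Wasserstein-1 distance reduces to the mean absolute difference between sorted values. The function name and data are hypothetical, and the entropy half of the joint evaluation is not modeled here.

```python
def wasserstein_1d(original, augmented):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(original) != len(augmented) or not original:
        raise ValueError("samples must be non-empty and equally sized")
    a = sorted(original)
    b = sorted(augmented)
    # With equal sample sizes, the optimal transport plan pairs sorted values.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical normalized feature values before and after augmentation.
original = [0.10, 0.20, 0.30, 0.40, 0.50]
augmented = [0.52, 0.12, 0.33, 0.21, 0.44]

distance = wasserstein_1d(original, augmented)
within_limit = distance <= 0.07  # 0.07 is the bound stated in the text
print(round(distance, 3), within_limit)
```

For multi-dimensional or unequal-size samples a general optimal-transport solver would be needed; this sketch only covers the simplest case.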
[0070] Optionally, the preprocessed dataset is subjected to quality evaluation to obtain dataset quality indicator evaluation parameters, including: designing a dataset quality feedback matrix based on the preprocessed dataset; evaluating the quality of the preprocessed dataset based on the dataset quality feedback matrix to determine the dataset quality feedback matrix parameters; and determining the dataset quality indicator evaluation parameters based on the dataset quality feedback matrix parameters.
[0071] In this embodiment, by designing a dataset quality feedback matrix, the quality characteristics of different types of datasets can be comprehensively covered, ensuring the comprehensiveness and detail of the evaluation. The preprocessed dataset is evaluated based on the dataset quality feedback matrix, and the parameters of each quality indicator in the matrix are initialized. A detailed analysis of each parameter in the dataset quality feedback matrix identifies the dataset's quality strengths and weaknesses. Based on preset calculation formulas and methods, the specific values of each quality indicator are calculated. Finally, the values of each quality indicator are comprehensively analyzed to form a comprehensive quality evaluation report. As shown in Figure 3, the quality assessment of preprocessed datasets mainly includes two technical modules: the design of the AI dataset quality quantization tensor and the AI dataset quality quantization assessment process.
[0072] Optionally, the dataset quality feedback matrix can be represented by the following formula:
[0073] $X \in \mathbb{R}^{A \times B \times C \times D}$
[0074] Where $X = [M_i S_j I_k Q_l]$ is the four-dimensional tensor for dataset quality quantization; M is the set of dataset modalities, A is the number of dataset modality types, and i is the index of the dataset modality, i=1,2,...,A; S is the set of model usage stages, B is the number of model usage stage types, and j is the index of the model usage stage, j=1,2,...,B; I is the set of dataset application scenarios, C is the number of dataset application scenario types, and k is the index of the dataset application scenario, k=1,2,...,C; Q is the set of dataset quality quantization metrics, D is the number of dataset quality quantization metric types, and l is the index of the dataset quality quantization metric, l=1,2,...,D.
[0075] Optionally, the dataset modality types include 12 categories: structured data, text, image, audio, video, time series data, point cloud, multimodal (image + text), multimodal (audio + text), multimodal (video + text), multimodal (audio + video), and multimodal (text + audio + video). In this case, A=12.
[0076] Optionally, the model can be used in five stages: pre-training, supervised fine-tuning, human feedback reinforcement alignment (RLHF), inference, and evaluation. In this case, B=5.
[0077] Optionally, the dataset application scenarios include general basic scenarios and industry-specific scenarios. Specifically, they comprise 58 specific application scenarios across 19 major categories: general basic fields, scientific research, industrial manufacturing, modern agriculture, smart energy, transportation, financial services, healthcare, education, commerce and consumption, internet security, human resources, public safety, trade and commerce, culture and tourism, emergency management, meteorological services, urban governance, and green and low-carbon development. In this case, C=58.
[0078] Optionally, the dataset quality quantification indicators include 12 types: completeness, standardization, accuracy, consistency, density, diversity, uniformity, correlation, real-time performance, traceability, convenience, and originality. In this case, D=12.
[0079] Optionally, text modality uses natural language as its core carrier and can be used to express thoughts, transmit information, and record content. In technology pre-research, it often involves exploring related technologies such as semantic understanding and information extraction. Image modality presents visual content through pixel arrays and can display visual information such as object shapes and scenes. In technology pre-research, it is a data format used for research on image recognition, feature extraction, and image generation. Speech modality uses sound waves as the propagation medium and carries human speech information. It includes pre-research directions for speech recognition, synthesis, and sentiment analysis, and is used to achieve the conversion between speech and text and related intelligent interactions. Video modality consists of a continuous sequence of image frames and can also contain audio information. It can present dynamic visual scenes, and in technology pre-research, it involves data format for video content analysis, target tracking, and video generation. Structured modality has a clear and standardized data structure, such as a table. The data is organized in an orderly manner, facilitating operations such as querying, statistics, and analysis. In technology pre-research, it is beneficial for conducting modeling based on structured data. Time-series data modalities are data sequences arranged in chronological order, reflecting the changing patterns of data over time. They are often used in technology research and development for techniques such as prediction and anomaly detection of time-series data. Point cloud data modalities consist of a large number of discrete points in space, each containing information such as three-dimensional coordinates. They can be used to represent the three-dimensional shape of objects. 
Technology research and development for point clouds involves techniques such as point cloud processing, segmentation, and reconstruction.
[0080] Optionally, different multimodal approaches integrate multiple data modalities such as text, images, voice, and video. The technology research focuses on data formats such as multimodal data fusion representation, cross-modal retrieval, and multimodal generation.
[0081] Optionally, the pre-training stage refers to training the model on large-scale unlabeled data, enabling the model to learn general feature representations and laying the foundation for subsequent model optimization for specific tasks. This is a crucial stage in technology pre-research for acquiring basic model capabilities. The supervised fine-tuning stage refers to further training the pre-trained model using labeled task-specific data to improve its performance on that specific task. This is a key stage in technology pre-research for adapting the model to specific tasks. The human feedback reinforcement alignment stage refers to training the model using reinforcement learning and other methods, incorporating human feedback on the model's output, to make the model's output more aligned with human intentions and expectations. This is a stage in technology pre-research for improving the model's fit with human needs. The inference stage refers to the model receiving input data after training, processing it internally, and outputting corresponding results to solve practical problems or make decisions. This is a stage in technology pre-research for verifying the model's practicality. The evaluation stage refers to using specific evaluation metrics and methods to comprehensively assess and test the model's performance and effectiveness on relevant tasks to determine whether the model has achieved its expected goals. This is a stage in technology pre-research for verifying the model's quality.
[0082] Optionally, specific application scenarios in the general foundational fields include: cross-domain knowledge transfer, multimodal data fusion, speech recognition, and image recognition. Specific application scenarios in the scientific research field include: experimental data validation, model training dataset optimization, and anomaly detection and noise processing. Specific application scenarios in the industrial manufacturing field include: smart factory operation optimization, product quality defect detection, and manufacturing process optimization. Specific application scenarios in the modern agriculture field include: crop growth monitoring, precision planting decision-making, and agricultural resource optimization. Specific application scenarios in the smart energy field include: dynamic optimization of energy systems, energy consumption prediction, and multi-energy integrated management. Specific application scenarios in the transportation field include: autonomous driving scenario optimization, traffic flow prediction, and vehicle scheduling and route planning. Specific application scenarios in the financial services field include: transaction risk assessment, credit assessment, and investment decision support. Specific application scenarios in the healthcare field include: disease diagnosis assistance, patient health monitoring, and medical resource optimization. Specific application scenarios in the education and teaching field include: intelligent teaching resource recommendation, learning behavior analysis, and teaching quality assessment. Specific application scenarios in the commercial consumption field include: user profile construction, personalized recommendation, and consumption trend analysis. Specific application scenarios in the internet security field include: network attack detection, data privacy protection, and security incident response. In human resources, they include: intelligent recruitment, employee performance evaluation, and talent development path planning. In commerce and trade, they include: supply chain optimization, market dynamics analysis, and product traceability management. In culture and tourism, they include: intelligent recommendation of tourism resources, intelligent evaluation of cultural activities, and tourism route planning. In emergency management, they include: disaster relief operation optimization, emergency drill effectiveness evaluation, and disaster monitoring and assessment. In meteorological services, they include: weather forecast model training, climate change research, and meteorological disaster early warning. In urban governance, they include: urban planning decision support, public service evaluation, and urban operational status monitoring. In green and low-carbon development, they include: carbon emission monitoring and analysis, renewable energy management, and low-carbon pathway planning.
[0083] In this embodiment, a four-dimensional M×S×I×Q tensor for quantifying artificial intelligence dataset quality is designed from four core dimensions: 12 dataset modalities (M), 5 model usage stages (S), 58 industry application scenarios (I), and 12 dataset quality quantification indicators (Q). The core objective is to provide quantifiable, scenario-specific, and non-redundant combinations of quality feedback parameters for each intersection of "specific modality + specific stage + specific scenario," avoiding the problem that generalized parameters adapt poorly to specific scenarios. Each quantitative indicator, after standardization, ranges from 0 to 1.
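One way to hold such a tensor in memory is sketched below. It is an illustration under stated assumptions: the class name, the sparse dictionary storage, and the 1-based indexing convention are choices made here, while the dimension sizes (12, 5, 58, 12) and the [0, 1] value range come from the text.

```python
SHAPE = (12, 5, 58, 12)  # A modalities, B stages, C scenarios, D indicators

class QualityTensor:
    """Sparse store for X[i, j, k, l]; indices are 1-based, as in the text."""

    def __init__(self, shape=SHAPE):
        self.shape = shape
        self._values = {}  # only evaluated entries are stored

    def _check(self, index):
        if len(index) != len(self.shape):
            raise IndexError("expected an (i, j, k, l) index")
        for pos, size in zip(index, self.shape):
            if not 1 <= pos <= size:
                raise IndexError(f"index {pos} outside 1..{size}")

    def set(self, index, value):
        self._check(index)
        if not 0.0 <= value <= 1.0:
            raise ValueError("standardized indicators must lie in [0, 1]")
        self._values[tuple(index)] = float(value)

    def get(self, index):
        """Return the stored value, or None if the entry was never evaluated."""
        self._check(index)
        return self._values.get(tuple(index))

    def capacity(self):
        total = 1
        for size in self.shape:
            total *= size
        return total

tensor = QualityTensor()
tensor.set((1, 1, 1, 1), 0.93)        # hypothetical standardized score
print(tensor.get((1, 1, 1, 1)), tensor.capacity())
```

Sparse storage is used because an evaluation typically targets only a few of the 41,760 possible combinations.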
[0084] Optionally, the preprocessed dataset is evaluated based on the dataset quality feedback matrix to determine the parameters of the dataset quality feedback matrix. This includes: determining the target matrix parameters to be evaluated in the dataset quality feedback matrix according to the evaluation target and the preprocessed dataset; dividing the evaluation task into single-point tasks and distributed tasks according to the size and complexity of the preprocessed dataset; evaluating the preprocessed data according to each of the divided tasks, generating parameter values corresponding to each target matrix parameter, and obtaining the dataset quality feedback matrix parameters.
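The split between single-point and distributed evaluation tasks might look like the following sketch. The thresholds, the complexity score in [0, 1], the shard size, and all names are hypothetical; the text states only that the split depends on dataset size and complexity.

```python
from dataclasses import dataclass

@dataclass
class EvalTask:
    mode: str     # "single-point" or "distributed"
    shards: list  # list of (start, stop) record ranges

def plan_evaluation(num_records, complexity,
                    max_single_records=100_000,
                    max_single_complexity=0.5,
                    shard_size=50_000):
    """Choose an execution mode; all thresholds are illustrative assumptions."""
    small = num_records <= max_single_records
    simple = complexity <= max_single_complexity
    if small and simple:
        return EvalTask("single-point", [(0, num_records)])
    # Otherwise cut the record range into fixed-size shards for parallel workers.
    shards = [(start, min(start + shard_size, num_records))
              for start in range(0, num_records, shard_size)]
    return EvalTask("distributed", shards)

small_plan = plan_evaluation(20_000, complexity=0.2)
large_plan = plan_evaluation(120_000, complexity=0.2)
print(small_plan.mode, large_plan.mode, large_plan.shards)
```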
[0085] In this embodiment, as shown in Figure 3, the evaluation process begins with preparation, which includes five key technical steps: preparing the AI dataset, designing and coding the industry business architecture, selecting matrix parameters and dynamically designing weights, stratified random sampling of the dataset, and aligning test security measures. Specifically, after the preprocessed dataset is prepared, the industry business architecture is designed and coded according to the application scenario of the preprocessed dataset to better fit the needs of dataset quality evaluation. Based on the evaluation objectives and dataset characteristics, the target matrix parameters to be evaluated are selected, and the weights of each parameter are dynamically designed. Stratified random sampling of the dataset ensures the representativeness and diversity of the samples, improving the reliability of the evaluation results. Test security measures are designed and implemented to ensure data security and privacy protection during the evaluation process. Next comes quantization execution, which includes six key technical steps: importing the dataset file to be quantified, initializing quantization parameters, splitting single-point and distributed tasks, real-time evaluation of dataset quality feedback matrix parameters, analysis of phased quantization issues, and manual verification of quantization results. Specifically, the prepared dataset file is imported into the evaluation system to ensure correct data loading. According to the evaluation plan, the quality matrix parameters are initialized to prepare for automated evaluation. The evaluation task is divided into single-point and distributed tasks, and an appropriate evaluation method is selected based on the dataset size and complexity. Automated tools are used to evaluate the dataset in real time, generating preliminary quality feedback matrix parameters. The results of the automated evaluation are analyzed in stages to identify potential quality problems and anomalies. Combined with manual verification, the results of the automated evaluation are reviewed and corrected to ensure accuracy. Finally, quality feedback is performed, which includes three main technical steps: quality matrix parameter anomaly analysis, anomaly summary and classification, and finalization of the quality feedback matrix parameters. Specifically, the parameters in the quality feedback matrix are analyzed in depth to identify outliers and quality problems. The identified anomalies are summarized and classified, and their root causes are analyzed. Based on the analysis results, the quality feedback matrix parameters are finalized to form a complete quality evaluation report.
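The stratified random sampling step in the preparation phase can be sketched as below. The function name, the proportional allocation rule, the fixed seed, and the use of modality as the stratum key are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum (at least one record each)."""
    rng = random.Random(seed)  # fixed seed keeps the evaluation reproducible
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for label in sorted(strata):
        group = strata[label]
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical dataset: 80 text records and 20 image records.
records = [{"id": i, "modality": "text" if i < 80 else "image"}
           for i in range(100)]
sample = stratified_sample(records, key=lambda r: r["modality"], fraction=0.1)

text_count = sum(r["modality"] == "text" for r in sample)
image_count = sum(r["modality"] == "image" for r in sample)
print(len(sample), text_count, image_count)
```

Sampling per stratum keeps minority modalities represented, which simple random sampling over the whole dataset would not guarantee.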
[0086] For example, depending on the evaluation objectives and the characteristics of the dataset, it is necessary to evaluate the data completeness of text modality data in the pre-training stage and in cross-domain knowledge transfer scenarios in general basic domains, so as to determine the specific parameter values corresponding to [M1, S1, I1, Q1] in the dataset quality feedback matrix.
[0087] Optionally, the method for collaboratively adjusting the quality of AI datasets and model performance further includes: when the dataset quality indicator evaluation parameters fail the quality assessment, performing anomaly analysis on the dataset quality feedback matrix parameters based on the assessment results; determining the target dataset quality feedback matrix parameters based on the anomaly analysis results; and reconstructing the preprocessed dataset based on the target dataset quality feedback matrix parameters.
[0088] In this embodiment, if the dataset quality metric evaluation parameters fail the quality assessment, anomaly analysis is performed on the dataset quality feedback matrix parameters based on the evaluation results to identify specific quality issues. Based on the anomaly analysis results, the target dataset quality feedback matrix parameters that need optimization are determined, clarifying the optimization direction. Based on the target dataset quality feedback matrix parameters, the preprocessed dataset is reconstructed to optimize dataset quality and meet the requirements for model training.
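A simple form of this anomaly analysis is to compare each evaluated entry against a per-indicator threshold and rank the failures by shortfall. The sketch below assumes that approach; the thresholds, the default value, and the function name are hypothetical and not taken from the source.

```python
def find_target_parameters(matrix_params, thresholds, default_threshold=0.8):
    """Return failing (index, shortfall) pairs, largest shortfall first.

    `matrix_params` maps (i, j, k, l) to a standardized value in [0, 1];
    `thresholds` maps a quality-indicator index l to its required minimum.
    """
    failing = []
    for index, value in matrix_params.items():
        required = thresholds.get(index[3], default_threshold)
        if value < required:
            failing.append((index, round(required - value, 6)))
    failing.sort(key=lambda item: item[1], reverse=True)
    return failing

# Hypothetical evaluated entries of the quality feedback matrix.
matrix_params = {
    (1, 1, 1, 1): 0.97,  # passes its 0.95 requirement
    (1, 1, 1, 4): 0.62,  # fails its 0.90 requirement
    (2, 1, 1, 6): 0.75,  # fails the default 0.80 requirement
}
thresholds = {1: 0.95, 4: 0.90}

targets = find_target_parameters(matrix_params, thresholds)
print(targets)
```

The ranked list indicates which entries to address first when the preprocessed dataset is reconstructed.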
[0089] Optionally, the target dataset includes a training dataset and an evaluation dataset. The training dataset is used for model training, and the evaluation dataset is used for performance evaluation of the pre-trained model.
[0090] Optionally, the pre-trained model is evaluated to obtain model performance evaluation parameters, including: designing a model performance quantization matrix based on the pre-trained model; evaluating the performance of the pre-trained model based on the model performance quantization matrix to determine the model performance quantization matrix parameters; and determining the model performance evaluation parameters based on the model performance quantization matrix parameters.
[0091] In this embodiment, the design of a model performance quantification matrix comprehensively covers various performance characteristics of the model, ensuring the comprehensiveness and detail of the evaluation. Specific quantification indicators are designed for models with different capabilities and application scenarios, improving the relevance and effectiveness of the evaluation. The performance of the pre-trained model is evaluated based on the model performance quantification matrix, initializing the parameters of each performance indicator in the matrix. A detailed analysis of each parameter in the model performance quantification matrix identifies the model's performance strengths and weaknesses. Based on preset calculation formulas and methods, the specific values of each performance indicator are calculated. Finally, the values of each performance indicator are comprehensively analyzed to form a comprehensive quality evaluation report.
[0092] Optionally, the model performance quantization matrix can be represented by the following formula:
[0093] $Y \in \mathbb{R}^{E \times F}$
[0094] Where $Y = [y_{mn}]$, and $y_{mn}$ represents the magnitude of the nth objective quantitative indicator of the mth capability sub-item; E represents the number of capability sub-item types, m=1, 2, ..., E; F represents the number of objective quantitative indicator types, n=1, 2, ..., F.
[0095] In this embodiment, as shown in Figure 4, the model performance mainly includes four aspects: general basic capabilities, typical application capabilities, industry-specific capabilities, and trusted security capabilities, totaling 51 quantitative feedback capability sub-items. Specifically, these include 23 general basic capability items, 9 typical application capability items, 16 industry-specific capability items, and 3 trusted security capability items. For each capability sub-item in different aspects, 21 different combinations of objective quantitative indicators are used for quantitative evaluation. In this case, E=51 and F=21.
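The layout of this E×F matrix can be sketched as follows. The group sizes and F=21 come from the text; the row ordering, the use of None for indicators that do not apply to a sub-item, and the helper names are assumptions.

```python
# Capability groups and their sub-item counts, as stated in the text.
CAPABILITY_GROUPS = {
    "general basic": 23,
    "typical application": 9,
    "industry-specific": 16,
    "trusted security": 3,
}
E = sum(CAPABILITY_GROUPS.values())  # number of capability sub-items
F = 21                               # number of objective indicator types

def empty_performance_matrix():
    """E x F matrix; None marks an indicator not used for that sub-item."""
    return [[None] * F for _ in range(E)]

def group_row_ranges():
    """Map each group to its half-open (start, stop) row range, 0-based."""
    ranges, start = {}, 0
    for name, count in CAPABILITY_GROUPS.items():
        ranges[name] = (start, start + count)
        start += count
    return ranges

def row_mean(matrix, row):
    """Average of the indicators actually measured for one sub-item."""
    measured = [v for v in matrix[row] if v is not None]
    return sum(measured) / len(measured) if measured else None

Y = empty_performance_matrix()
Y[0][0] = 0.90  # hypothetical scores for the first sub-item
Y[0][1] = 0.80
ranges = group_row_ranges()
print(E, ranges["trusted security"], row_mean(Y, 0))
```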
[0096] Optionally, different models possess different general-purpose basic capabilities. These models include large language models, visual language models, and speech language models.
[0097] Optionally, the general foundational capabilities of a large language model include: comprehension, generation, reasoning, general knowledge, subject-specific knowledge, memory, multilingualism, long text processing, coding, and role-playing dialogue. Specific tasks related to comprehension include: intent recognition, sentiment analysis, text classification, reading comprehension, and conversational question-answering comprehension. Objective quantitative indicators for comprehension include: accuracy, F1 score, and BERTScore. Specific tasks related to generation include: machine translation, automatic summarization, dialogue generation, content creation, content expansion, and paraphrasing. Objective quantitative indicators for generation include: accuracy, BLEU, COMET, METEOR, and ROUGE. Specific tasks related to reasoning include: common sense reasoning, causal reasoning, analogical reasoning, mathematical reasoning, temporal reasoning, spatial logic reasoning, and complex logic reasoning. Objective quantitative indicators for reasoning include: accuracy, F1 score, EM (exact match), and the accuracy of the reasoning process. Specific tasks related to general knowledge include: world knowledge, social common sense, everyday common sense, and objective facts. Objective quantitative indicators for general knowledge include: accuracy and F1 score. Specific tasks for subject-specific competence include: professional knowledge and domain knowledge. Objective quantitative indicators for subject-specific competence include: accuracy and F1 score. Specific tasks for memory competence include: cross-language question answering, cross-language classification, cross-language sentence similarity search, cross-language reasoning, and cross-language text alignment. Objective quantitative indicators for memory competence include: accuracy and task completion rate. Specific tasks for multilingual competence include: cross-language question answering, cross-language classification, cross-language sentence similarity search, cross-language reasoning, and cross-language text alignment. Objective quantitative indicators for multilingual competence include: accuracy, F1 score, recall, BLEU, COMET, and METEOR. Specific tasks for long-text competence include: single-document question answering, multi-document question answering, long-text character count, and key information retrieval. Objective quantitative indicators for long-text competence include: accuracy, F1 score, and ROUGE. Specific tasks for coding competence include: intelligent code generation, code debugging, code explanation, code commenting, R&D knowledge Q&A, and code inspection. Objective quantitative indicators for coding competence include: Pass@k, error detection rate, and error correction rate. Specific tasks related to role-based dialogue capabilities include: intelligent customer service role, knowledge management role, and office assistant role. Objective quantitative indicators for role-based dialogue capabilities include: accuracy.
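The text names Pass@k as a coding indicator without defining how it is computed. As a hedged illustration, the sketch below uses the commonly used unbiased estimator, 1 - C(n-c, k)/C(n, k), where n code samples are generated per problem and c of them pass the tests. Whether the patent intends this exact estimator is an assumption.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples passes.

    n: samples generated per problem, c: samples that passed, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical run: 10 generations for one problem, 3 of which pass.
p1 = pass_at_k(n=10, c=3, k=1)
p5 = pass_at_k(n=10, c=3, k=5)
print(round(p1, 4), round(p5, 4))
```

The per-problem estimates would then be averaged over the benchmark to obtain one matrix entry.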
[0098] Optionally, the general fundamental capabilities of a visual language model include: visual understanding, visual generation, visual reasoning, visual retrieval, aesthetic understanding, and media creation. Specific tasks of visual understanding include: image description, image quality assessment, behavior recognition, relationship recognition, OCR, video question answering, and video description. Objective quantitative indicators for visual understanding include: accuracy, BLEU, METEOR, ROUGE, mAP, and CER. Specific tasks of visual generation include: text-to-image, text-to-video, and image-to-text video. Objective quantitative indicators for visual generation include: LPIPS, FID, and FVD. Specific tasks of visual reasoning include: mathematical calculation, graph analysis, relationship reasoning, and attribute reasoning. Objective quantitative indicators for visual reasoning include: accuracy, BLEU, ROUGE, and inference process accuracy. Specific tasks of visual retrieval include: text-to-image retrieval, image-to-text retrieval, image-text joint retrieval, and dynamic image-text sequence association retrieval. Objective quantitative indicators for visual retrieval include: accuracy and recall. The specific tasks of aesthetic understanding include: aesthetic identification ability, aesthetic generation ability, aesthetic quality assessment, and aesthetic interpretation ability. The specific tasks of media creation ability include: image and visual design generation, video content creation, visual content integration, and style and creative matching. Objective quantitative indicators of media creation ability include: accuracy rate.
[0099] Optionally, the general basic capabilities of a speech language model include: speech recognition, speech synthesis, speech understanding, voice reproduction, music generation, music information retrieval, and real-time voice interaction. Specific tasks of speech recognition include: text recognition, real-time speech recognition, and subtitle generation. Objective quantitative indicators for speech recognition include: WCR (Text Recognition Response) and SCR (Speech Recognition Response). Specific tasks of speech synthesis include: multi-scene text synthesis, speech naturalness, pronunciation accuracy, and speech rate and rhythm control. Specific tasks of speech understanding include: contextual understanding and semantic reasoning. Objective quantitative indicators for speech understanding include: accuracy. Specific tasks of voice reproduction include: timbre imitation, pronunciation imitation, and intonation imitation. Specific tasks of music generation include: melody generation, arrangement generation, style imitation, lyric generation, emotional expression, and long-duration generation. Objective quantitative indicators for music generation include: accuracy. Specific tasks of music information retrieval include: audio feature extraction, music classification, and lyric alignment. Objective quantitative indicators for music information retrieval include: accuracy. The specific tasks of real-time voice interaction capabilities include: voice command response, voice dialogue fluency, and emotion adaptation. Objective quantitative indicators of real-time voice interaction capabilities include: accuracy.
[0100] Optionally, typical application capabilities include: intelligent customer service applications, knowledge management applications, data analysis applications, office assistant applications, content creation applications, code assistant applications, web page assistant applications, intelligent agent applications, and embodied intelligence applications. Specific tasks of intelligent customer service applications include: intent recognition, multi-turn dialogue consistency, context sensitivity, language diversity, sentiment recognition and support, intelligent response, role-playing, risk identification, dialogue topic classification and text analysis, and avoidance of unreasonable requests and sensitive topics. Objective quantitative indicators for intelligent customer service applications include: accuracy. Specific tasks of knowledge management applications include: knowledge retrieval capabilities, knowledge question-answering capabilities, knowledge extraction capabilities, reading comprehension capabilities, multilingual translation, entity recognition efficiency, sentiment classification accuracy, syntactic analysis depth, text summarization generation capabilities, and text creation adaptability. Objective quantitative indicators for knowledge management applications include: accuracy. Specific tasks of data analysis applications include: data interpretation, data quality inspection, data question answering, data analysis and prediction, data visualization, and data protection. Objective quantitative indicators for data analysis applications include: accuracy. Specific tasks of office assistant applications include: operating system applications, PPT processing applications, and mind mapping applications. Objective quantitative indicators for office assistant applications include: accuracy. Specific tasks for content creation applications include: article writing applications and official document writing applications. 
Objective quantitative indicators for content creation applications include: accuracy. Specific tasks for code assistant applications include: intelligent code generation, code debugging, code explanation, R&D knowledge Q&A, code commenting, code inspection, and code translation. Objective quantitative indicators for code assistant applications include: accuracy. Specific tasks for web assistant applications include: search intent understanding ability, query generation and rewriting, search result summarization ability, information value judgment ability, search result evaluation and ranking ability, search result information extraction and integration ability, and citation annotation ability. Objective quantitative indicators for web assistant applications include: accuracy. Specific tasks for intelligent agent applications include: memory ability, reasoning ability, planning ability, tool use, and task processing. Objective quantitative indicators for intelligent agent applications include: accuracy. Specific tasks for embodied intelligence applications include: real-world scenario question answering, spatial object manipulation, visual-language alignment, and autonomous environmental exploration. Objective quantitative indicators for embodied intelligence applications include: accuracy.
[0101] Optionally, the industries to which industry-specific capabilities belong include: finance, healthcare, software, education, law, research, government affairs, telecommunications, energy, transportation, media, e-commerce, industry, design, automotive, and embodied intelligence. Specific capabilities in the finance industry include: financial intent understanding, financial slot identification, financial emotion recognition, dialogue subject identification, financial knowledge understanding, financial computing capabilities, financial analysis capabilities, financial interpretation capabilities, and financial compliance identification. The objective quantitative indicator for the finance industry includes: accuracy. Specific capabilities in the healthcare industry include: medical vocabulary, medical concepts, diagnostic suggestions, treatment planning, emergency identification, first aid guidance, dialogue-based medical record generation, and medical event extraction. The objective quantitative indicator for the healthcare industry includes: accuracy. Specific capabilities in the software industry include: user requirement understanding and transformation, code generation, code review capabilities, code optimization suggestions, documentation generation capabilities, intelligent testing capabilities, and intelligent operation and maintenance capabilities. The objective quantitative indicator for the software industry includes: accuracy. Specific competencies in the education industry include: understanding teaching content, analyzing user needs, providing curriculum design suggestions, understanding examination methods, applying educational technology, interpreting education policies, understanding student psychology and behavior, and explaining education industry terminology. Objective quantitative indicators for the education industry include: accuracy rate. 
Specific competencies in the legal industry include: case analysis, case collection, interpretation of legal provisions, answering legal questions, document generation, contract review, sentencing prediction, bar exam preparation, legal risk warning, and transcript compilation and summarization. Objective quantitative indicators for the legal industry include: accuracy rate. Specific competencies in the scientific research industry include: literature review ability, experimental design ability, paper writing ability, data analysis ability, scientific research innovation ability, and interdisciplinary integration ability. Objective quantitative indicators for the scientific research industry include: accuracy rate. Specific competencies in the government sector include: policy understanding, policy interpretation, providing service information, navigating service procedures, handling user feedback, providing solutions, government decision-making, and risk analysis. Objective quantitative indicators for the government sector include: accuracy rate. Specific capabilities in the telecommunications industry include: communication network understanding, service quality assessment, fault diagnosis, user behavior analysis, data traffic management, security protection, interpretation of communication policies, technical support, network optimization, and explanation of telecommunications industry terminology. Objective quantitative indicators for the telecommunications industry include: accuracy. Specific capabilities in the energy industry include: intelligent inspection, video surveillance, power load forecasting, fault diagnosis and early warning, equipment lifespan prediction, intelligent control, and personnel monitoring. Objective quantitative indicators for the energy industry include: accuracy. 
Specific capabilities in the transportation industry include: multi-source traffic information fusion, route planning generation, traffic command decision support, traffic simulation, route optimization, traffic intelligence processing, information security analysis, compliance, and real-time decision-making. Objective quantitative indicators for the transportation industry include: accuracy. Specific capabilities in the media industry include: content creation, news reporting, advertising and marketing, and film and television literature creation. Objective quantitative indicators for the media industry include: accuracy. Specific capabilities for the e-commerce industry include: product description understanding, user review analysis, intent recognition, transaction processing, query parsing and analysis, and shopping assistance. Objective quantitative indicators for the e-commerce industry include: accuracy. Specific capabilities for the industrial sector include: principle-based R&D, forward-looking design, efficient simulation, refined testing, intelligent control, and scientific operation and maintenance. Objective quantitative indicators for the industrial sector include: accuracy. Specific capabilities for the design industry include: architectural auxiliary design, architectural drawing review guidance, and comprehensive architectural scoring. Objective quantitative indicators for the design industry include: accuracy. Specific capabilities for the automotive industry include: intelligent cockpit and interaction, automotive marketing, automotive understanding and knowledge, vehicle user guides, autonomous driving, and simulation systems. Objective quantitative indicators for the automotive industry include: accuracy. Specific capabilities for the embodied intelligence industry include: multimodal perception, multimodal interaction, and motion control algorithms. Objective quantitative indicators for the embodied intelligence industry include: accuracy.
[0102] Optionally, trusted security capabilities include: content security, ethical security, and model security. Specific tasks of content security include: risk-based denial of responses, content compliance, and geopolitics. Objective quantitative indicators for content security include: accuracy. Specific tasks of ethical security include: value alignment (values), moral ethics (moral views), ideology, and fairness / discrimination. Objective quantitative indicators for ethical security include: accuracy. Specific tasks of model security include: model attacks, model threats, personal privacy, and institutional privacy. Objective quantitative indicators for model security include: accuracy.
[0103] Optionally, accuracy is the ratio of the number of correctly answered questions to the total number of questions, calculated using the following formula:
[0104] $$\mathrm{Accuracy} = \frac{T}{F}$$
[0105] Where $T$ is the number of questions answered correctly, and $F$ is the total number of questions.
[0106] Optionally, recall refers to the proportion of sentences correctly identified as similar by the model out of the total number of actual similar sentences, and is calculated using the following formula:
[0107] $$\mathrm{Recall} = \frac{TP}{TP + FN}$$
[0108] Where TP is the number of sentences that the model identifies as similar and that are actually similar, and FN is the number of sentences that the model identifies as dissimilar but are actually similar.
[0109] Optionally, the F1 score is the harmonic mean of precision and recall, where precision is the proportion of true positive samples among the samples the model identifies as positive, and recall is the proportion of actual positive samples that the model correctly identifies. The formulas are as follows:
[0110] $$\mathrm{Precision} = \frac{tp}{tp + fp}$$
[0111] $$\mathrm{Recall} = \frac{tp}{tp + fn}$$
[0112] $$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
[0113] Where $tp$ is the number of true positive samples, $fp$ is the number of false positive samples, and $fn$ is the number of false negative samples.
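As an illustration of the precision, recall, and F1 definitions in [0109] to [0113], the following minimal Python sketch computes all three from raw counts. The function name and the convention of returning 0.0 for an empty denominator are assumptions made for this example and are not part of the disclosure.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    # Returning 0.0 when a denominator is empty is a convention assumed for this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Example: 8 true positives, 2 false positives, 4 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
```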
[0114] Optionally, BERT Score is a text similarity evaluation metric based on the BERT model, used to measure the semantic similarity between generated and reference texts. It mainly has three core metrics:
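[0115] $$R_{\mathrm{BERT}} = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} x_i^{\top} \hat{x}_j$$
[0116] $$P_{\mathrm{BERT}} = \frac{1}{|\hat{x}|} \sum_{\hat{x}_j \in \hat{x}} \max_{x_i \in x} x_i^{\top} \hat{x}_j$$
[0117] $$F_{\mathrm{BERT}} = \frac{2 \cdot P_{\mathrm{BERT}} \cdot R_{\mathrm{BERT}}}{P_{\mathrm{BERT}} + R_{\mathrm{BERT}}}$$
[0118] Where $x$ is the reference text, $\hat{x}$ is the candidate text, $x_i$ is a token of the reference text, and $\hat{x}_j$ is a token of the candidate text.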
[0119] Optionally, BLEU (Bilingual Evaluation Understudy) is an evaluation metric used for machine translation tasks. It scores the machine translation output by its n-gram overlap with one or more reference translations, and is precision-oriented overall. The calculation formula is:
[0120] $$\mathrm{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$
[0121] Where $BP$ is the brevity penalty factor, $w_n$ is the weight of the n-grams of order $n$ (typically equal for all values of $n$), $p_n$ is the modified n-gram precision, and $N$ is the maximum n-gram order. An n-gram is a series of n consecutive items (such as syllables, letters, words, or primitives) in a text. Here, "item" usually refers to a word, but it can also be a character or other linguistic unit.
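The BLEU computation described in [0119] to [0121] can be sketched as follows. This is a simplified, single-reference, unsmoothed version intended only to show how clipped n-gram precision, equal weights, and the brevity penalty combine. The function names and the whitespace tokenization are assumptions for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often as it occurs in the reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with equal weights w_n = 1 / max_n and no smoothing."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is taken as zero here
    log_avg = sum(math.log(p) / max_n for p in precisions)
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(log_avg)


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
score = bleu(candidate, reference, max_n=2)
```

Production evaluations normally use multiple references and smoothing, which this sketch omits.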
[0122] Optionally, the COMET evaluation criterion is a deep learning-based method for assessing the quality of machine translation. This criterion quantitatively evaluates the quality of machine translation by comparing the semantic similarity between machine translation results and human translation results.
[0123] Optionally, the METEOR metric is a lexical semantic similarity metric used to evaluate the accuracy of machine translation systems. The calculation formula is as follows:
[0124]
[0125] Where $P_{exact}$ is the precision of the model's translation results in exactly matching the reference answer, $P_{stem}$ is the precision of the model's translation results in matching the reference answer after stemming, and $Penalty$ is a penalty term.
[0126] Optionally, ROUGE is a recall-oriented summarization evaluation method that measures how much information from the reference text is contained in the generated text. ROUGE metrics typically include multiple sub-metrics, such as ROUGE-N and ROUGE-L, which measure the similarity between the generated text and the reference text from different perspectives.
[0127] Optionally, Rouge-N calculates the n-gram overlap between the generated text and the reference text, using the following formula:
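[0128] $$\mathrm{ROUGE\text{-}N} = \frac{\text{Number of n-grams in common}}{\text{Number of n-grams in reference}}$$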
[0129] Where N is the length of the n-gram, i.e. the number of consecutive words, Number of n-grams in common is the number of n-grams common to the generated text and the reference text, and Number of n-grams in reference is the number of all n-grams in the reference text. An n-gram is a series of n consecutive items (such as syllables, letters, words or primitives) in a text. Here, "item" usually refers to a word, but it can also be a character or other linguistic unit.
[0130] Optionally, Rouge-L calculates the similarity between the generated text and the reference text based on the longest common subsequence (LCS), using the following formula:
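[0131] $$\mathrm{ROUGE\text{-}L} = \frac{(1 + \beta^2) \, R_{lcs} \, P_{lcs}}{R_{lcs} + \beta^2 P_{lcs}}$$
[0132] Where $P_{lcs}$ is the LCS-based precision, $R_{lcs}$ is the LCS-based recall, and $\beta$ is a parameter used to adjust the relative weight of recall and precision.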
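A minimal sketch of the two ROUGE variants described in [0127] to [0132] is given below. It assumes whitespace-tokenized inputs, a single reference, and the common F-measure form for ROUGE-L in which β weights recall against precision. The function names are chosen for this example only.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Share of reference n-grams that also occur in the candidate (recall-oriented)."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    ref_total = sum(ref.values())
    if ref_total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / ref_total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming over two rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """LCS-based F-measure; beta > 1 gives more weight to recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    p_lcs = lcs / len(candidate)
    r_lcs = lcs / len(reference)
    return (1 + beta ** 2) * r_lcs * p_lcs / (r_lcs + beta ** 2 * p_lcs)


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
r1 = rouge_n(candidate, reference, n=1)
rl = rouge_l(candidate, reference)
```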
[0133] Optionally, EM measures whether the model's generated answer is completely identical to the standard answer. If the model's answer is identical to the standard answer in content, order, spelling, etc., it is considered an Exact Match; otherwise, it is considered a Mismatch. The calculation formula is:
[0134]
[0135] Optionally, the accuracy of the reasoning process is the degree to which the model's reasoning steps or processes match the standard answer or expert reasoning steps when solving a series of problems. The calculation formula is:
[0136]
[0137] Where n is the number of steps in the reasoning process.
[0138] Optionally, the task completion rate is typically calculated based on the ratio of the number of tasks actually completed to the number of tasks that should have been completed. The formula is:
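[0139] $$\text{Task completion rate} = \frac{\text{Number of tasks actually completed}}{\text{Number of tasks that should have been completed}}$$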
[0140] Optionally, Pass@k is a metric used to evaluate the performance of a code generation model. It measures the probability that at least one of the multiple candidate solutions generated by the model will pass the test correctly. The calculation formula is as follows:
[0141] $$\mathrm{Pass@}k = 1 - (1 - p)^{k}$$
[0142] Where $k$ is the number of candidate solutions generated, and $p$ is the probability of each candidate solution passing the test.
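As a worked illustration of Pass@k, the sketch below assumes that each of the k candidate solutions passes the test independently with the same probability p, so that the probability of at least one passing is 1 − (1 − p)^k. This independence assumption and the function name are made for this example. In practice, code generation benchmarks often estimate Pass@k from sampled generations rather than from a known p.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent candidates passes, each with pass probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - p) ** k


# With p = 0.5, three candidates give 1 - 0.5**3 = 0.875.
example = pass_at_k(0.5, 3)
```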
[0143] Optionally, the error detection rate is typically calculated based on the ratio of correctly detected errors to the total number of errors. The formula is:
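[0144] $$\text{Error detection rate} = \frac{\text{Number of correctly detected errors}}{\text{Total number of errors}}$$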
[0145] Optionally, the error repair rate is typically calculated based on the ratio of correctly repaired errors to the total number of errors. The formula is:
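[0146] $$\text{Error repair rate} = \frac{\text{Number of correctly repaired errors}}{\text{Total number of errors}}$$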
[0147] Optionally, word recognition accuracy (WCR) represents the percentage of correctly recognized words. The calculation formula is:
[0148]
[0149] Where WCR is the character recognition accuracy, D is the number of deleted incorrect characters, I is the number of inserted incorrect characters, S is the number of replaced incorrect characters, and N is the total number of characters in the standard word sequence.
[0150] Optionally, Sentence Recognition Accuracy (SCR) represents the proportion of correctly recognized sentences, calculated using the following formula:
[0151] $$\mathrm{SCR} = \frac{H}{N}$$
[0152] Where SCR is the sentence recognition accuracy, H is the number of sentences that are correctly recognized, and N is the total number of sentences.
[0153] Optionally, the character error rate (CER) represents the difference between the number of recognized characters and the original number of characters. This difference is primarily calculated through substitution, deletion, and insertion errors, using the following formula:
[0154] $$\mathrm{CER} = \frac{S + D + I}{N}$$
[0155] Where S is the number of characters replaced, D is the number of characters deleted, I is the number of characters inserted, and N is the original number of characters.
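The substitution, deletion, and insertion counts used by the character error rate are usually obtained from a minimum edit distance alignment between the reference and the recognized text. The sketch below takes that approach. It returns only the total S + D + I (the Levenshtein distance) divided by N, and the function names are assumptions for this example.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning reference into hypothesis."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference character
                            curr[j - 1] + 1,      # insertion of a hypothesis character
                            prev[j - 1] + cost))  # substitution or exact match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (S + D + I) / N, where N is the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


cer_example = character_error_rate("kitten", "sitting")
```

Because insertions are counted against the reference length, the value can exceed 1 when the recognized text is much longer than the reference.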
[0156] Optionally, the mean average precision (mAP) is the mean of the average precisions of all categories, calculated using the following formula:
[0157] $$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
[0158] Where $N$ is the number of categories, and $AP_i$ is the average precision of the i-th category.
[0159] Optionally, LPIPS is a deep learning-based image similarity metric used to evaluate perceptual differences between images, calculated as follows:
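[0160] $$\mathrm{LPIPS}(x, y) = \sum_{k} \frac{w_k}{N_k} \left\| f_k(x) - f_k(y) \right\|_2^2$$
[0161] Where $x$ and $y$ are the two images being compared, $f_k$ is the feature extraction function of the k-th layer, $w_k$ is the weighting coefficient, and $N_k$ is the normalization term.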
[0162] Optionally, FID is the similarity between the generated image and the real image, calculated using the following formula:
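[0163] $$\mathrm{FID} = \left\| \mu_r - \mu_g \right\|^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)$$
[0164] Where $\mu_r$ is the mean vector of the real image features, $\mu_g$ is the mean vector of the generated image features, $\Sigma_r$ is the covariance matrix of the real image features, and $\Sigma_g$ is the covariance matrix of the generated image features.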
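The Fréchet distance between two Gaussian feature distributions, on which FID is based, can be sketched with NumPy as follows. The sketch assumes the mean vectors and covariance matrices have already been estimated from image features (in practice taken from a pretrained Inception network), and it computes the trace of the matrix square root from the eigenvalues of the covariance product. The function name is chosen for this example.

```python
import numpy as np


def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """||mu_r - mu_g||^2 + Tr(sigma_r + sigma_g - 2 (sigma_r sigma_g)^(1/2))."""
    mu_r, mu_g = np.asarray(mu_r, dtype=float), np.asarray(mu_g, dtype=float)
    sigma_r, sigma_g = np.asarray(sigma_r, dtype=float), np.asarray(sigma_g, dtype=float)
    diff = mu_r - mu_g
    # The trace of the matrix square root equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g; small negative values from rounding are clipped.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)


identity = np.eye(2)
same = frechet_distance([0.0, 0.0], identity, [0.0, 0.0], identity)
shifted = frechet_distance([0.0, 0.0], identity, [1.0, 0.0], identity)
```

A lower value indicates that the generated feature distribution is closer to the real one. The same computation applies to FVD when the statistics are taken from video features.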
[0165] Optionally, FVD is the similarity between the generated video and the real video, calculated using the following formula:
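[0166] $$\mathrm{FVD} = \left\| \mu_r - \mu_g \right\|^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)$$
[0167] Where $\mu_r$ is the mean vector of the real video features, $\mu_g$ is the mean vector of the generated video features, $\Sigma_r$ is the covariance matrix of the real video features, and $\Sigma_g$ is the covariance matrix of the generated video features.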
[0168] Based on the 51 different capability sub-items and 21 specific quantitative indicator dimensions of the above artificial intelligence model, a unified two-dimensional matrix of artificial intelligence model performance quantitative indicators is designed to measure the overall performance capability of a specific artificial intelligence model. The matrix is in the form of a 51x21 two-dimensional matrix. Among them, the specific quantitative objective indicators included in different capability sub-items have different focuses and compositions. Objective indicators that are not included can be directly marked as 0, while other included objective indicators can be directly calculated and tested according to the calculation formula (all ranging from 0 to 1).
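As an illustration of the 51×21 layout described in [0168], the sketch below fills a matrix from a sparse mapping of (capability sub-item, indicator) positions to scores and leaves every indicator that a sub-item does not include at 0. The index assignments and the function name are hypothetical; the disclosure does not fix a row or column order.

```python
N_CAPABILITIES = 51  # capability sub-items (rows)
N_INDICATORS = 21    # quantitative indicator dimensions (columns)


def build_performance_matrix(scores):
    """Build the 51x21 matrix from {(row, column): value}; indicators not included stay at 0."""
    matrix = [[0.0] * N_INDICATORS for _ in range(N_CAPABILITIES)]
    for (row, col), value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError("indicator values are expected to lie in [0, 1]")
        matrix[row][col] = value
    return matrix


# Hypothetical example: sub-item 0 reports indicator 0, sub-item 50 reports indicator 20.
performance = build_performance_matrix({(0, 0): 0.9, (50, 20): 0.5})
```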
[0169] Optionally, a correlation analysis between dataset quality and model performance is performed based on the model performance index evaluation parameters, and updated dataset quality index target parameters are generated based on the analysis results. This includes: performing a correlation analysis on the model performance index evaluation parameters based on the correlation analysis model to determine the dataset quality index that affects the model performance index; updating the determined dataset quality index based on the model performance index evaluation parameters to obtain the updated dataset quality index target parameters; wherein, the correlation analysis model can determine the mapping relationship between dataset quality index and model performance index.
[0170] In this embodiment, an association analysis model can accurately identify the key metrics that have the greatest impact on model performance from among numerous dataset quality metrics. The identified dataset quality metrics are then updated based on the model performance metric evaluation parameters to obtain updated target parameters for the dataset quality metrics. For example, if the model performs poorly on a specific metric, the association analysis model can suggest increasing the weight of the relevant dataset quality metric or adjusting its target value, thereby enabling targeted optimization of the dataset in the next iteration. Guided by the association analysis model, the dataset optimization process becomes more efficient and targeted.
[0171] Optionally, the association analysis model is trained as follows: collect model performance index evaluation parameters and corresponding dataset quality index evaluation parameters that meet the preset standards from historical data, and perform feature alignment and normalization on the collected data; construct an initial state based on the feature-aligned and normalized model performance index data and dataset quality index data; adjust the dataset quality index data in the initial state, and use the association analysis model to predict new model performance index data based on the adjusted dataset quality index data; adjust the parameters of the association analysis model according to the changes in the model performance index data.
[0172] In this embodiment, through a reinforcement learning-based dynamic association analysis method, the association analysis model can accurately capture the dynamic mapping relationship between the 12×5×58×12 dataset quality tensor and the 51×21 model performance parameter two-dimensional matrix, providing strong support for the dynamic closed-loop iterative optimization of the artificial intelligence model. By collecting model performance index evaluation parameters and dataset quality index evaluation parameters that meet preset standards from historical data, the association analysis model can be continuously iteratively optimized, thereby continuously improving the analytical capabilities of the association analysis model in practical applications.
[0173] Optionally, as shown in Figure 5, performing feature alignment and normalization on the collected data includes: feature alignment, matrix parameter normalization, and feature dimensionality reduction.
[0174] In this embodiment, a deep learning autoencoder compresses high-dimensional data quality parameters into feature vectors of the same dimension as the model performance parameters while preserving key information. Specifically, a four-dimensional convolutional autoencoder is designed, taking a 12×5×58×12 four-dimensional data quality tensor as input. The encoder progressively compresses the dimension through 4D convolutional layers (kernel size 3×3×3×3, stride 2), outputting a 51×21 feature matrix (with the same dimension as the model performance parameters) after global pooling. The decoder reconstructs the input tensor through transposed convolutional layers, training with the goal of minimizing reconstruction error to ensure that the compressed features retain the key information of the original data quality. Then, the AI data quality and model performance quantization parameter matrices are standardized, mapping the data of each dimension to the [0, 1] interval using a Min-Max normalization method.
[0175] $$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
[0176] Where $X_{norm}$ is the normalized data, $X$ is the original data, and $X_{min}$ and $X_{max}$ are the minimum and maximum values of the data in this dimension, respectively.
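A minimal NumPy sketch of the Min-Max normalization step is shown below. It treats each column of the matrix as one dimension and maps a constant column to 0. Both choices are assumptions for this example, since the disclosure does not state the normalization axis or how a zero range is handled.

```python
import numpy as np


def min_max_normalize(matrix):
    """Map each column to [0, 1] using (X - X_min) / (X_max - X_min)."""
    x = np.asarray(matrix, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    span = x_max - x_min
    safe_span = np.where(span == 0, 1.0, span)  # avoid division by zero for constant columns
    return np.where(span == 0, 0.0, (x - x_min) / safe_span)


normalized = min_max_normalize([[1, 10], [3, 10], [5, 10]])
```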
[0177] Finally, feature dimensionality reduction is performed. For the compressed 2D matrix of data quality parameters (51×21), principal component analysis (PCA) is used to compress the data quality features from 51×21 to 51×15, retaining the main data quality information while reducing redundancy between features, resulting in a cumulative contribution rate of over 95%. Specifically, PCA is used to select the eigenvectors corresponding to the 15 largest eigenvalues as principal components, constructing a 21×15 projection matrix. This projection matrix is then used to reduce the dimensionality of the data quality parameter 2D matrix, resulting in a 51×15 dimensionality-reduced matrix. For the 2D matrix of model performance parameters (51×21), feature selection is also performed. The random forest algorithm is used to evaluate feature importance, selecting the 15 main features that have the greatest impact on model performance.
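The PCA step in [0177] can be sketched with NumPy as follows. The sketch uses random placeholder data, so the retained variance ratio it reports does not reflect the 95% figure stated above. The function name and the choice to center the columns before projection are assumptions for this example.

```python
import numpy as np


def pca_project(x, n_components=15):
    """Project the columns of x onto the eigenvectors of the n_components largest eigenvalues."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)        # 21x21 for a 51x21 input
    eigvals, eigvecs = np.linalg.eigh(covariance)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # indices of the largest eigenvalues
    projection = eigvecs[:, order]                     # 21x15 projection matrix
    explained = float(eigvals[order].sum() / eigvals.sum())
    return centered @ projection, projection, explained


rng = np.random.default_rng(0)
quality_features = rng.random((51, 21))  # placeholder for the compressed data quality matrix
reduced, projection, explained_ratio = pca_project(quality_features)
```

Here `reduced` has shape 51×15 and `projection` has shape 21×15, matching the dimensions given in the text.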
[0178] Optionally, the association analysis model employs an attention-enhanced Transformer deep learning network to learn the nonlinear mapping relationship between the compressed and dimensionality-reduced quality features (51×15) and performance parameters (51×15).
[0179] Optionally, the network input layer of the association analysis model is used to receive the compressed quality feature matrix (51×15); the self-attention layer is used to calculate the association weights between the 51 sample dimensions; the cross-attention layer is used to associate the 15 performance index dimensions with the quality feature dimensions; and the output layer is used to predict the model performance parameters through the fully connected layer.
[0180] Optionally, the loss function of the association analysis model uses a weighted combination of the mean squared error (MSE) and the mean absolute error (MAE) as the optimization objective. This takes into account the worst-case behavior of the two core optimization objectives. The network parameters are updated through the backpropagation algorithm, enabling the model to initially learn the association analysis mapping network DATA-MODEL between the two.
[0181] Optionally, the loss function is:
[0182] $$L = \alpha \cdot \mathrm{MSE} + \beta \cdot \mathrm{MAE}$$
[0183] Where $\alpha$ is the mean squared error weighting coefficient, and $\beta$ is the mean absolute error weighting coefficient.
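One plausible reading of the weighted loss is a linear combination of the two error terms, α·MSE + β·MAE. The sketch below assumes that form, which the text does not confirm, and uses plain Python lists in place of network outputs. The function name and default weights are illustrative.

```python
def weighted_loss(predicted, target, alpha=0.5, beta=0.5):
    """Assumed form: alpha * MSE + beta * MAE over paired predictions and targets."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("predicted and target must be non-empty and of equal length")
    n = len(predicted)
    mse = sum((p - t) ** 2 for p, t in zip(predicted, target)) / n
    mae = sum(abs(p - t) for p, t in zip(predicted, target)) / n
    return alpha * mse + beta * mae


# One prediction is off by 2: MSE = 4/3, MAE = 2/3, so the loss is 0.5 * 4/3 + 0.5 * 2/3 = 1.0.
loss_example = weighted_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The squared term penalizes large deviations more heavily, while the absolute term keeps the loss less sensitive to outliers.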
[0184] Optionally, the reinforcement learning dynamic closed-loop feedback method aims to treat the adjustment of data quality as the action of the reinforcement learning agent, and the change of model performance as the reward feedback given by the environment. By continuously trying different data quality adjustment strategies, the agent learns the optimal data quality control strategy that can maximize the model performance reward.
[0185] In this embodiment, the standardized two-dimensional matrix of data quality parameters and the two-dimensional matrix of model performance parameters are input into the reinforcement learning system as the initial state, with the state vector S_t = [dq_1, dq_2, ..., mp_1, mp_2, ...]. Here, dq represents the data quality features, and mp represents the model performance index. At this point, the agent is in the initial state and has not yet made any adjustments to the data quality. The action space of the agent, which provides a closed-loop feedback mechanism for data quality and model performance, is defined as all possible AI data engineering actions that may change the parameters of the two-dimensional matrix of data quality parameters, such as weighted adjustments to certain data quality parameters and selection of further data cleaning strategies. Each action has a strong correlation with one or more elements in the data quality matrix, and each action has a corresponding technical execution cost; for example, data cleaning may consume certain computational resources and time. The reward function is a key component of this method, used to measure the benefit gained by the agent after taking a certain action, i.e., the degree of improvement in model performance. Specifically, after the agent takes an action to adjust the data quality, the model performance is re-evaluated and compared with the previous performance. If some core key performance indicators in the model performance matrix parameters improve, and the improvement exceeds a certain threshold, a positive reward is given; conversely, if performance declines or the improvement falls short of expectations, a negative reward is given. Simultaneously, considering the execution cost of the action, the corresponding cost value is deducted from the reward. 
For example, if a data cleaning action improves the model's accuracy but consumes significant computing resources, the reward value will be the positive reward for performance improvement minus the cost value of that action. The mathematical expression of the reward function is as follows:
[0186] $$R = \alpha \sum_{i=1}^{n} \Delta P_i - \beta C$$
[0187] Where $R$ is the reward value for a data quality operation, $\alpha$ is the model performance improvement weighting coefficient used to adjust the influence of performance improvement on the reward, $\Delta P_i$ is the change in the i-th model performance index, $n$ is the number of model performance indices involved in the calculation, $\beta$ is the cost weighting coefficient used to weigh the impact of the cost of data quality enhancement actions on the reward, and $C$ is the computing power consumption and other economic costs incurred in executing the action.
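As a small worked example of the reward described in [0185] to [0187], the sketch below assumes the reward is the weighted sum of performance changes minus the weighted action cost. The function name, default coefficients, and sample values are illustrative only.

```python
def reward(delta_p, cost, alpha=1.0, beta=0.1):
    """Assumed form: alpha * sum of performance changes minus beta * execution cost."""
    return alpha * sum(delta_p) - beta * cost


# Two indices improve and one declines; a cleaning action with cost 0.2 is charged against the gain.
r_example = reward([0.02, 0.03, -0.01], cost=0.2)
```

With these values the performance term is 0.04 and the cost term is 0.02, so the action earns a small positive reward. A more expensive action with the same gain would be penalized.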
[0188] When an agent takes an action, the two-dimensional matrix of data quality parameters changes accordingly. This leads to changes in the input data of the model during training or inference, which in turn causes changes in the two-dimensional matrix of model performance parameters. During state transitions, detailed information about each action, reward, and state change is recorded so that the agent can learn effective behavioral strategies. For example, if the agent performs an action to fill in missing values in a dataset, the relevant indicators of the integrity dimension in the data quality matrix improve. After retraining, the model's performance matrix parameters improve, thus achieving a transition from the initial state to the new state.
[0189] Optionally, a deep Q-network algorithm is used for reinforcement learning. A Q-network is constructed, with the number of neurons in the output layer equal to the size of the action space. The Q-network predicts the Q-value of each action based on the current state and selects the action with the largest Q-value as the action to be executed. An experience replay mechanism is used: each state transition (S_t, A_t, R_{t+1}, S_{t+1}) is stored in an experience replay pool, and a batch of samples is randomly drawn from the pool for network training to break the correlation between data and improve training stability. At the same time, a target Q network is introduced, and the parameters of the Q network are periodically copied to the target Q network to calculate the target Q value, reducing oscillations during training.
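The experience replay pool and greedy action selection described in [0189] can be sketched with the Python standard library as follows. The class and function names are assumptions for this example, and the Q-network itself is omitted: `greedy_action` simply takes a list of already predicted Q-values.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity pool of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped when full

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between consecutive transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def greedy_action(q_values):
    """Index of the action with the largest predicted Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


pool = ReplayBuffer(capacity=3)
for step in range(5):
    pool.push(step, 0, 0.0, step + 1)
batch = pool.sample(2)
```

In training, exploration (for example ε-greedy selection) is usually added on top of the greedy choice; the text above describes only the greedy step.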
[0190] In this embodiment, a dynamic association model and a reinforcement learning algorithm are combined for joint training. In each training step, the Q-network first selects an action A_t based on the current state and adjusts the data quality to obtain new data quality features. These new features are then input into the DATA-MODEL network for association analysis to predict model performance metrics. Based on the difference between the actual model performance and the predicted performance, the loss is calculated, and the parameters of the static association analysis Transformer network are updated. Simultaneously, based on the reward R_{t+1} and the new state S_{t+1}, the parameters of the Q-network are updated using the Q-learning algorithm, i.e.:
[0191] $$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]$$
[0192] Where $\alpha$ is the learning rate and $\gamma$ is the discount factor.
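The Q-learning update can be illustrated with a tabular sketch in which a nested list stands in for the Q-network. In a deep Q-network the same temporal-difference target would instead be computed with the target network and minimized by gradient descent. The function name and the sample values are assumptions for this example.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step in place and return the updated Q(state, action)."""
    best_next = max(q_table[next_state])       # max over actions in the next state
    td_target = reward + gamma * best_next     # R_{t+1} + gamma * max_a Q(S_{t+1}, a)
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Two states and two actions; state 1 already has estimated values 1.0 and 2.0.
q_table = [[0.0, 0.0], [1.0, 2.0]]
updated = q_update(q_table, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
```

Here the target is 1.0 + 0.9 × 2.0 = 2.8, and with a learning rate of 0.5 the entry moves halfway from 0.0 to the target.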
[0193] Optionally, the design employs a Bayesian optimization algorithm to optimize the hyperparameters in the model. For the Transformer network in the dynamic association analysis model, the specific updated hyperparameters include four parameters: the number of hidden layers, the number of hidden layer neurons, the learning rate, and the regularization coefficient. For the deep Q-network, the dynamically updated hyperparameters include three core network parameters: the learning rate, the discount factor, and the size of the experience replay pool. Bayesian optimization constructs a surrogate model (such as a Gaussian process) of the objective function (such as the model's performance metrics on the validation set). Based on historical hyperparameter values and corresponding objective function values, it intelligently selects the next set of hyperparameters for experimentation to quickly find the optimal hyperparameter combination and continuously iterate to improve model performance.
[0194] The method for collaboratively adjusting the quality of artificial intelligence datasets and model performance provided in this disclosure establishes a standardized technical framework for closed-loop feedback between the quality of artificial intelligence datasets and model performance, clarifies the dynamic interaction between the two, provides a unified technical framework and process for the industry, promotes collaboration between artificial intelligence dataset quality assessment and large model training, guides enterprises to efficiently develop and optimize artificial intelligence systems, reduces development costs, improves the final performance and stability of models, promotes the large-scale application of artificial intelligence technology in various fields, and helps the high-quality development of the artificial intelligence industry.
[0195] As shown in Figure 6, an embodiment of this disclosure provides an apparatus for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model, comprising: a data engineering module 601, a dataset quality assessment module 602, a model performance assessment module 603, and a correlation analysis module 604. The data engineering module 601 is configured to construct a preprocessed dataset according to model construction requirements and dataset quality indicator target parameters. The dataset quality assessment module 602 is configured to perform quality assessment on the preprocessed dataset, obtain dataset quality indicator assessment parameters, and determine a target dataset that passes the quality assessment based on the dataset quality indicator assessment parameters. The model performance assessment module 603 is configured to train a model using the target dataset, obtain a pre-trained model, and perform performance assessment on the pre-trained model to obtain model performance indicator assessment parameters; it is also configured to output the target artificial intelligence model and the target artificial intelligence dataset corresponding to the model performance indicator assessment parameters that meet the preset standard when the model performance indicator assessment parameters meet the preset standard. The correlation analysis module 604 is configured to perform correlation analysis between dataset quality and model performance based on the model performance indicator assessment parameters when the model performance indicator assessment parameters do not meet the preset standard, and generate updated dataset quality indicator target parameters based on the analysis results.
[0196] In this embodiment, the data engineering module 601 constructs the required preprocessed dataset according to pre-defined requirements, and then the dataset quality assessment module 602 assesses the quality of the preprocessed dataset to obtain dataset quality index assessment parameters. If the dataset quality index assessment parameters pass the quality assessment, a target dataset that has passed the quality assessment is obtained; otherwise, if the dataset quality index assessment parameters fail the quality assessment, feedback needs to be sent to the data engineering module 601 based on the assessment results, so that the data engineering module 601 can reconstruct the preprocessed dataset based on the feedback results. Using the target dataset that has passed the quality assessment, the model performance assessment module 603 can train the model to obtain a pre-trained model and assess the performance of the pre-trained model to obtain model performance index assessment parameters. If the model performance index assessment parameters do not meet the preset standard, the correlation analysis module 604 performs correlation analysis between dataset quality and model performance, thereby generating updated dataset quality index target parameters. This allows the data engineering module 601 to regenerate the preprocessed dataset based on the updated dataset quality index target parameters, thus achieving collaborative iterative optimization of dataset quality and model performance. When the model performance evaluation parameters meet the preset standards, the model performance evaluation module 603 can output the target artificial intelligence model and target artificial intelligence dataset corresponding to the model performance evaluation parameters that meet the preset standards. 
Furthermore, the model performance evaluation parameters that meet the preset standards and the dataset quality evaluation parameters can be fed back to the association analysis module 604 to iteratively optimize the association analysis model in the association analysis module 604, further improving the analytical capabilities of the association analysis model.
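The interaction between modules 601 to 604 can be summarized as a control loop. The sketch below passes each module in as a callable, so it shows only the order of operations and the two feedback paths. All function names, signatures, and the `max_rounds` limit are hypothetical and are not defined by the disclosure.

```python
def co_optimize(build_dataset, assess_quality, train_and_evaluate, analyze_correlation,
                targets, max_rounds=10):
    """Iterate dataset construction, quality assessment, model evaluation and correlation analysis."""
    feedback = None
    for _ in range(max_rounds):
        dataset = build_dataset(targets, feedback)           # data engineering module 601
        quality, passed = assess_quality(dataset)            # dataset quality assessment module 602
        if not passed:
            feedback = quality                               # return assessment results to module 601
            continue
        feedback = None
        model, performance, meets_standard = train_and_evaluate(dataset)  # module 603
        if meets_standard:
            return model, dataset                            # target model and target dataset
        targets = analyze_correlation(performance, targets)  # correlation analysis module 604
    return None  # preset standard not reached within max_rounds


# Toy stand-ins: the "dataset" is just the numeric target, and the standard is met at 3.
result = co_optimize(
    build_dataset=lambda targets, feedback: targets,
    assess_quality=lambda dataset: (dataset, dataset >= 1),
    train_and_evaluate=lambda dataset: (f"model-{dataset}", dataset, dataset >= 3),
    analyze_correlation=lambda performance, targets: targets + 1,
    targets=1,
)
```

With these stand-ins the loop runs three rounds, raising the quality target after each failed performance check, and then returns the model and dataset from the round that meets the standard.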
[0197] As shown in Figure 7, this disclosure provides another apparatus 700 for collaboratively adjusting the quality of artificial intelligence datasets and model performance, including a processor 701 and a memory 702. Optionally, the apparatus may further include a communication interface 703 and a bus 704. The processor 701, communication interface 703, and memory 702 can communicate with each other via the bus 704. The communication interface 703 can be used for information transmission. The processor 701 can call logical instructions in the memory 702 to execute the method for collaboratively adjusting the quality of artificial intelligence datasets and model performance described in the above embodiments. Furthermore, the logical instructions in the memory 702 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. The memory 702, as a computer-readable storage medium, can be used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of this disclosure. The processor 701 executes functional applications and data processing by running the program instructions/modules stored in the memory 702, thereby implementing the method for collaboratively adjusting the quality of artificial intelligence datasets and model performance described in the above embodiments. The memory 702 may include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required for at least one function; the data storage area may store data created through use of the terminal device. Furthermore, the memory 702 may include high-speed random access memory and may also include non-volatile memory.
[0198] This disclosure provides an electronic device, including: an electronic device body, and the aforementioned apparatus for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model. The apparatus for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model is installed in the electronic device body. The installation relationship described herein is not limited to placement within the electronic device, but also includes installation connections with other components of the electronic device, including but not limited to physical connections, electrical connections, or signal transmission connections. Those skilled in the art will understand that the apparatus for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model can be adapted to suitable electronic device bodies, thereby realizing other feasible embodiments.
[0199] This disclosure provides a computer-readable storage medium storing computer-executable instructions configured to perform the aforementioned method for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model.
[0200] The foregoing description and accompanying drawings fully illustrate embodiments of this disclosure to enable those skilled in the art to practice them. Other embodiments may include structural, logical, electrical, procedural, and other changes. The embodiments represent only possible variations. Individual components and functions are optional unless explicitly required, and the order of operation may vary. Parts and features of some embodiments may be included in or replace parts and features of other embodiments. Moreover, the terminology used in this application is for describing embodiments only and is not intended to limit the technical solutions described herein.
Claims
1. A method for collaboratively adjusting the quality of artificial intelligence datasets and model performance, applied to the development and optimization of artificial intelligence systems, characterized in that the method includes one or more iterative operations, each iterative operation including: Based on the model construction requirements and dataset quality indicator target parameters, a preprocessed dataset is constructed. This construction includes: deploying a lightweight federated client on edge devices, integrating a feature extraction sub-model, performing local feature encoding on the raw data, uploading feature vectors or model update parameters, and storing the raw data locally; using an improved FedAvg algorithm on a cloud server to aggregate model updates from edge nodes and generate a global data acquisition strategy model; generating a sampling strategy based on the global data acquisition strategy model, and collecting multi-source heterogeneous data conforming to the target data structure according to the sampling strategy; performing data processing, data annotation, and data augmentation on the multi-source heterogeneous data to obtain the preprocessed dataset. The model construction requirements and dataset quality indicator target parameters include multiple modal inputs such as natural language requirement documents, images, audio, video, time series, point clouds, structured data, and multimodal files. The quality of the preprocessed dataset is evaluated to obtain dataset quality indicator evaluation parameters, and the target dataset that passes the quality evaluation is determined based on the dataset quality indicator evaluation parameters. The model is trained using the target dataset to obtain a pre-trained model, and the performance of the pre-trained model is evaluated to obtain model performance evaluation parameters.
When the model performance evaluation parameters do not meet the preset standard, a correlation analysis between dataset quality and model performance is performed based on the model performance evaluation parameters, and updated dataset quality indicator target parameters are generated based on the analysis results. This correlation analysis includes: using a correlation analysis model to perform a correlation analysis on the model performance evaluation parameters to determine the dataset quality indicators that affect model performance, and updating the determined dataset quality indicators based on the model performance evaluation parameters to obtain the updated dataset quality indicator target parameters. The correlation analysis model determines the mapping relationship between dataset quality indicators and model performance indicators. The convergence condition for the iterative operation is that the model performance evaluation parameters reach the preset standard, whereupon the target artificial intelligence model and target artificial intelligence dataset corresponding to the model performance evaluation parameters that have reached the preset standard are obtained.
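For orientation, the aggregation step recited in claim 1 builds on federated averaging. The claim does not state what the "improved" FedAvg variant changes, so the sketch below shows only the standard FedAvg rule, a sample-count-weighted average of client parameters. The function name `fedavg` and the flat-list parameter layout are assumptions made for illustration.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Standard FedAvg: average client parameters weighted by local sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    aggregated = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must report the same parameter shape")
        for idx, value in enumerate(params):
            # Each client contributes in proportion to its share of the data.
            aggregated[idx] += value * n / total
    return aggregated
```

Only parameter updates and sample counts reach the aggregator in this sketch, which matches the claim's requirement that raw data stay on the edge device.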
2. The method according to claim 1, characterized in that constructing the preprocessed dataset based on the model construction requirements and dataset quality indicator target parameters further includes: conducting a requirements analysis based on the model construction requirements and dataset quality indicator target parameters to extract requirements information, and constructing a knowledge graph based on the requirements information; and determining the target data structure based on the structured requirements indicators in the knowledge graph, using reinforcement learning and knowledge graph constraints.
3. The method according to claim 1, characterized in that evaluating the quality of the preprocessed dataset to obtain the dataset quality indicator evaluation parameters includes: designing a dataset quality feedback matrix based on the preprocessed dataset; evaluating the quality of the preprocessed dataset based on the dataset quality feedback matrix to determine the dataset quality feedback matrix parameters; and determining the dataset quality indicator evaluation parameters based on the dataset quality feedback matrix parameters. The dataset quality feedback matrix is represented by the following formula: $X \in \mathbb{R}^{A \times B \times C \times D}$, where $X = [M_i S_j I_k Q_l]$ is the four-dimensional tensor for dataset quality quantization; $M$ is the set of dataset modalities, $A$ is the number of dataset modality types, and $i$ is the index of the dataset modality, $i = 1, 2, \ldots, A$; $S$ is the set of model usage stages, $B$ is the number of model usage stage types, and $j$ is the index of the model usage stage, $j = 1, 2, \ldots, B$; $I$ is the set of dataset application scenarios, $C$ is the number of dataset application scenario types, and $k$ is the index of the dataset application scenario, $k = 1, 2, \ldots, C$; $Q$ is the set of dataset quality quantization metrics, $D$ is the number of dataset quality quantization metric types, and $l$ is the index of the dataset quality quantization metric, $l = 1, 2, \ldots, D$.
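As an illustration of how the four-dimensional tensor X in claim 3 can be addressed, the sketch below stores it as nested lists indexed by modality, model usage stage, application scenario, and quality metric. The label sets are hypothetical (the claim fixes only the roles of M, S, I, and Q, not their members), and the helper names are assumptions.

```python
from typing import List

# Hypothetical label sets: the claim fixes only the roles of M, S, I and Q.
MODALITIES = ["image", "audio"]                         # M, so A = 2
STAGES = ["training", "inference"]                      # S, so B = 2
SCENARIOS = ["scenario_1"]                              # I, so C = 1
METRICS = ["completeness", "accuracy", "consistency"]   # Q, so D = 3


def empty_quality_tensor() -> List[List[List[List[float]]]]:
    """Allocate X as an A x B x C x D nested list of zeros."""
    return [[[[0.0 for _ in METRICS] for _ in SCENARIOS] for _ in STAGES]
            for _ in MODALITIES]


def set_score(x, modality: str, stage: str, scenario: str,
              metric: str, value: float) -> None:
    """Write one entry of X by label (indices are 0-based here, 1-based in the claim)."""
    i = MODALITIES.index(modality)
    j = STAGES.index(stage)
    k = SCENARIOS.index(scenario)
    l = METRICS.index(metric)
    x[i][j][k][l] = value


def shape(x) -> tuple:
    """Return (A, B, C, D) for a non-empty tensor."""
    return (len(x), len(x[0]), len(x[0][0]), len(x[0][0][0]))
```

Each entry then answers one question of the form "how does this modality score on this metric, at this stage, in this scenario".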
4. The method according to claim 3, characterized in that evaluating the quality of the preprocessed dataset based on the dataset quality feedback matrix to determine the dataset quality feedback matrix parameters includes: determining, based on the evaluation objectives and the preprocessed dataset, the target matrix parameters that need to be evaluated in the dataset quality feedback matrix; dividing the evaluation task into single-point tasks and distributed tasks based on the size and complexity of the preprocessed dataset; and evaluating the preprocessed dataset according to the divided tasks and generating the parameter values corresponding to the target matrix parameters, to obtain the dataset quality feedback matrix parameters.
5. The method according to claim 3, characterized in that the method further includes: if the dataset quality indicator evaluation parameters fail the quality assessment, performing anomaly analysis on the dataset quality feedback matrix parameters based on the assessment results; determining the target dataset quality feedback matrix parameters based on the anomaly analysis results; and reconstructing the preprocessed dataset based on the target dataset quality feedback matrix parameters.
6. The method according to claim 1, characterized in that evaluating the performance of the pre-trained model to obtain the model performance evaluation parameters includes: designing a model performance quantization matrix based on the pre-trained model; evaluating the performance of the pre-trained model based on the model performance quantization matrix to determine the model performance quantization matrix parameters; and determining the model performance evaluation parameters based on the model performance quantization matrix parameters. The model performance quantization matrix is represented by the following formula: $Y \in \mathbb{R}^{E \times F}$, where $Y = [y_{mn}]$ and $y_{mn}$ represents the magnitude of the $n$th objective quantitative indicator of the $m$th capability sub-item; $E$ represents the number of capability sub-item types, $m = 1, 2, \ldots, E$; $F$ represents the number of objective quantitative indicator types, $n = 1, 2, \ldots, F$.
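One way to compare the matrix Y of claim 6 against a preset standard is an element-wise threshold check, sketched below. The claim does not define the form of the preset standard, so the per-cell minimum matrix, the assumption that higher values are better, and the names `failing_cells` and `meets_preset_standard` are all illustrative.

```python
from typing import List, Sequence, Tuple


def failing_cells(y: Sequence[Sequence[float]],
                  minimums: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return 1-based (m, n) positions where y_mn is below its assumed minimum."""
    if len(y) != len(minimums):
        raise ValueError("Y and the threshold matrix must both have E rows")
    failures = []
    for m, (row, min_row) in enumerate(zip(y, minimums), start=1):
        if len(row) != len(min_row):
            raise ValueError("Y and the threshold matrix must both have F columns")
        for n, (value, minimum) in enumerate(zip(row, min_row), start=1):
            if value < minimum:  # assumption: higher is better for every indicator
                failures.append((m, n))
    return failures


def meets_preset_standard(y, minimums) -> bool:
    """True when no capability sub-item misses any of its indicator minimums."""
    return not failing_cells(y, minimums)
```

Returning the failing (m, n) positions, rather than only a pass/fail flag, identifies which capability sub-items and indicators fell short, which is the kind of input a subsequent correlation analysis would need.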
7. The method according to any one of claims 1 to 6, characterized in that the correlation analysis model is trained as follows: collecting, from historical data, model performance evaluation parameters that meet the preset standard and the corresponding dataset quality evaluation parameters, and performing feature alignment and normalization on the collected data; constructing an initial state based on the feature-aligned and normalized model performance indicator data and dataset quality indicator data; in the initial state, adjusting the dataset quality indicator data, and using the correlation analysis model to predict new model performance indicator data based on the adjusted dataset quality indicator data; and adjusting the parameters of the correlation analysis model based on changes in the model performance indicator data.
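The normalization and parameter-adjustment steps of claim 7 can be illustrated with a deliberately simple stand-in. The claim does not fix the model family, so the sketch assumes a one-variable linear model fitted by batch gradient descent on min-max-normalized historical pairs; it does not reproduce the state-based adjustment procedure of the claim. The function names, learning rate, and epoch count are assumptions.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one indicator column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def fit_linear(quality: Sequence[float], performance: Sequence[float],
               lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Fit performance ~ w * quality + b by batch gradient descent on squared error."""
    if len(quality) != len(performance) or not quality:
        raise ValueError("need paired, non-empty historical records")
    w, b = 0.0, 0.0
    n = len(quality)
    for _ in range(epochs):
        errors = [w * q + b - p for q, p in zip(quality, performance)]
        grad_w = 2.0 * sum(e * q for e, q in zip(errors, quality)) / n
        grad_b = 2.0 * sum(errors) / n
        # Adjust the model parameters against the prediction error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w: float, b: float, quality: float) -> float:
    """Predict a performance indicator from an adjusted quality indicator."""
    return w * quality + b
```

With a fitted model, `predict` estimates how a proposed change in a normalized quality indicator would move the performance indicator.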
8. A device for collaboratively adjusting the quality of artificial intelligence datasets and model performance, applied to the development and optimization of artificial intelligence systems, characterized in that the device includes: The data engineering module is configured to construct a preprocessed dataset based on model construction requirements and dataset quality indicator target parameters. This construction includes: deploying a lightweight federated client on edge devices, integrating a feature extraction sub-model, performing local feature encoding on the raw data, uploading feature vectors or model update parameters, and storing the raw data locally; using an improved FedAvg algorithm on a cloud server to aggregate model updates from edge nodes and generate a global data acquisition strategy model; generating a sampling strategy based on the global data acquisition strategy model and collecting multi-source heterogeneous data conforming to the target data structure according to the sampling strategy; and performing data processing, data annotation, and data augmentation on the multi-source heterogeneous data to obtain the preprocessed dataset. The model construction requirements and dataset quality indicator target parameters include various modal inputs, including natural language requirement documents, images, audio, video, time series, point clouds, structured data, and multimodal files. The dataset quality assessment module is configured to assess the quality of the preprocessed dataset, obtain dataset quality indicator assessment parameters, and determine the target dataset that passes the quality assessment based on the dataset quality indicator assessment parameters.
The model performance evaluation module is configured to train the model using the target dataset to obtain a pre-trained model, and to evaluate the performance of the pre-trained model to obtain model performance evaluation parameters; it is also configured, when the model performance evaluation parameters meet the preset standard, to output the target artificial intelligence model and the target artificial intelligence dataset corresponding to those parameters. The correlation analysis module is configured, when the model performance evaluation parameters do not meet the preset standard, to perform a correlation analysis between dataset quality and model performance based on the model performance evaluation parameters, and to generate updated dataset quality indicator target parameters based on the analysis results. This correlation analysis includes: performing a correlation analysis on the model performance evaluation parameters using the correlation analysis model to determine the dataset quality indicators that affect model performance; updating the determined dataset quality indicators based on the model performance evaluation parameters to obtain the updated dataset quality indicator target parameters; and establishing the mapping relationship between the dataset quality indicators and the model performance indicators using the correlation analysis model.
9. An apparatus for collaboratively adjusting the quality of an artificial intelligence dataset and the performance of a model, comprising a processor and a memory storing program instructions, characterized in that the processor is configured to, when running the program instructions, perform the method for collaboratively adjusting the quality of artificial intelligence datasets and model performance as described in any one of claims 1 to 7.